based on big data, upon which the operational
convenience of a business can rely. Machine
learning tasks are categorized according to the
desired output of the learned system. The
applications of machine learning are as diverse as
the applications of big data: adaptive websites,
bioinformatics, computational advertising, information
retrieval, credit card fraud detection, medical
diagnosis, natural language processing and stock
market analysis are some of the areas where machine
learning has found use.
2 PERSONALIZATION FOR SOCIAL NETWORK
Social networks such as Facebook, Twitter and
LinkedIn generate huge amounts of diverse data
in a short period of time. Such social media data
require big data analytics to produce meaningful
information for both information consumers and
data generators. This section discusses in detail
the impact of different big data analytics tools and
techniques on the processing of social network data.
Big data analytics techniques and tools include
predictive analytics, data mining, statistical
analysis, complex SQL, data visualization, artificial
intelligence and natural language processing. The
analysis of structured and unstructured data from
social networks forms the basis of social network
analytics (Balabanovic, 1997). Blogs, microblogs and
wikis also contribute to social network analytics data
sets. Although social media offer many sources of
information, we are mainly concerned with
user-generated content, such as sentiments, images,
videos and bookmarks, and with the interactive
relationships among people, organizations and
products. These two classes of information are
used by various big data analytics tools such as
Hadoop and the MapReduce framework, Apache Pig,
Apache Hive, Jaql and NoSQL databases. When
user-posted information is used in the analysis, the
approach is called content-based analytics; when
relationships between entities are used, it is known
as structure-based analytics.
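To make the distinction concrete, the following minimal Python sketch works on hypothetical toy data (the users, posts and links are invented). The content-based view derives features from what users posted, here naive term counts, while the structure-based view derives features from relationships, here the degree of each node. Both feature choices are illustrative assumptions, not the behavior of any of the tools listed above.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: user posts (content) and friend links (structure).
posts = {
    "alice": "great phone, great camera",
    "bob": "the camera is poor",
    "carol": "great service",
}
edges = [("alice", "bob"), ("alice", "carol"), ("bob", "carol"), ("carol", "dave")]


def content_based_profile(text: str) -> Counter:
    """Content-based analytics: features come from what a user posted
    (here, naive lowercase term counts with punctuation stripped)."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    return Counter(t for t in tokens if t)


def structure_based_degree(edge_list) -> dict:
    """Structure-based analytics: features come from relationships
    (here, the degree of each node in an undirected graph)."""
    degree = defaultdict(int)
    for u, v in edge_list:
        degree[u] += 1
        degree[v] += 1
    return dict(degree)


profiles = {user: content_based_profile(text) for user, text in posts.items()}
degrees = structure_based_degree(edges)
```

Here `profiles["alice"]` counts "great" twice, and `degrees` shows that "carol" is the best-connected user even though "dave" posted nothing: the two views can rank the same users quite differently.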
Social networks consist of millions of connected
objects, and analyzing data from those objects is
computationally intensive and expensive. Hence
two different approaches can be followed: the
parallelization approach and the graph database
approach (Adomavicius, 2005). The parallelization
approach divides a huge data set into smaller
subsets and uses the computational power of cloud
computing to process them in parallel. MapReduce
and Pregel from Google pioneered this approach.
However, open-source initiatives such as Hadoop
are gaining popularity in social network analytics,
and Spark and Hama are also gaining ground in
social network data research (Burke, 2007).
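The split, process and merge pattern of the parallelization approach can be sketched in Python as below. This is an illustration under assumptions: the helper names are hypothetical, the per-subset task (partial degree counts over an edge list) is chosen only as an example, and a local thread pool stands in for the cloud cluster the text describes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Divide a large data set into roughly equal, disjoint subsets."""
    if not items:
        return []
    size = -(-len(items) // n_chunks)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_degrees(edge_chunk):
    """Work done independently on one subset: partial degree counts."""
    partial = Counter()
    for u, v in edge_chunk:
        partial[u] += 1
        partial[v] += 1
    return partial


def parallel_degrees(edges, n_workers=4):
    """Hypothetical driver: split the edges, process chunks in parallel, merge."""
    chunks = split_into_chunks(edges, n_workers)
    # Threads keep the sketch self-contained; a real deployment would send
    # each chunk to a separate process or cluster node instead.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(count_degrees, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)  # merge step: add the partial counts
    return total


edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "e")]
degrees = parallel_degrees(edges, n_workers=2)
```

Partial degree counts from disjoint subsets can simply be added, so the merge step here is a sum; metrics that depend on paths spanning several subsets need more coordination between workers.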
The MapReduce framework consists of a Map phase,
which emits key/value pairs, and a Reduce phase,
which processes each key together with the list of
values grouped under it. Any MapReduce
application contains several hot spots, the points at
which the application supplies its own logic: Input
Reader, Map, Partition, Compare, Reduce and Output
Writer. MapReduce is considered to enable scalable
computation of graph-based metrics over social
networks; for example, it is used to determine
betweenness centrality. In social network analytics,
MapReduce jobs are chained to estimate shortest
paths in a graph. A blocking mechanism is an
important part of MapReduce that deals with
machine failures when processing social network
data.
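The two phases, the six hot spots and job chaining can be illustrated with a single-process Python simulation. It is a sketch, not Hadoop's API: `run_mapreduce`, `bfs_map` and `bfs_reduce` are hypothetical names, the hot spots appear only as comments, unweighted breadth-first distances stand in for the shortest-path estimation mentioned above, and fault handling (the blocking mechanism) is not modeled. Each loop iteration is one job whose output records become the next job's input.

```python
from collections import defaultdict

INF = float("inf")


def run_mapreduce(records, map_fn, reduce_fn, n_partitions=2):
    """Single-process simulation of one MapReduce job (hypothetical helper)."""
    partitions = [defaultdict(list) for _ in range(n_partitions)]
    for record in records:                        # hot spot: Input Reader
        for key, value in map_fn(record):         # hot spot: Map -> key/value pairs
            # hot spot: Partition (choose the reducer responsible for this key)
            partitions[hash(key) % n_partitions][key].append(value)
    output = []
    for part in partitions:
        for key in sorted(part):                  # hot spot: Compare (sort keys)
            # hot spot: Reduce, applied to (key, list of values)
            output.extend(reduce_fn(key, part[key]))
    return output                                 # hot spot: Output Writer


def bfs_map(record):
    """Re-emit the node's own state and propose distances to its neighbours."""
    node, (dist, adjacency) = record
    yield node, ("node", dist, adjacency)
    if dist < INF:
        for neighbour in adjacency:
            yield neighbour, ("dist", dist + 1, None)


def bfs_reduce(node, values):
    """Keep the adjacency list and the smallest proposed distance."""
    adjacency, best = [], INF
    for kind, dist, adj in values:
        if kind == "node":
            adjacency = adj
        best = min(best, dist)
    yield node, (best, adjacency)


def distances(records):
    return {node: dist for node, (dist, _) in records}


def shortest_paths(graph, source):
    """Chain MapReduce jobs until no distance changes."""
    records = [(n, (0 if n == source else INF, adj)) for n, adj in graph.items()]
    while True:
        next_records = run_mapreduce(records, bfs_map, bfs_reduce)
        if distances(next_records) == distances(records):
            return distances(next_records)
        records = next_records


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": [], "f": []}
result = shortest_paths(graph, "a")
```

The loop stops when a job leaves every distance unchanged, which for an unweighted graph happens one job after the farthest reachable node has been assigned its distance; unreachable nodes such as "f" keep an infinite distance.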
The preprocessor cleans, integrates, selects and
transforms the knowledge base of users and items
into relevant user and item data stores. Various
types of filters are then applied to the data stored in
these databases. Filtering algorithms can be
broadly classified into memory-based and
model-based algorithms. A recommender system
using a memory-based algorithm learns at each
point in time from all previous instances.
After a recommendation, the system immediately
knows the outcome of the prediction and uses this
feedback for further recommendations. Memory-based
algorithms use similarity metrics to measure the
distance between two users or two items, and
aggregation measures to generate the prediction.
Model-based methods use user and item information
to build a model that generates the recommendations.
The most widely used model-based algorithms are
based on Bayesian classifiers, neural networks,
fuzzy logic, genetic algorithms and singular value
decomposition (SVD) techniques.
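The two families of filtering algorithms can be contrasted on a toy rating matrix in Python with NumPy. Assumptions: the ratings are invented, the function names are hypothetical, cosine similarity with a similarity-weighted average is just one common choice of similarity metric and aggregation measure, and the model-based part uses a truncated SVD of the zero-filled matrix, a simplification (treating missing ratings as zeros) that practical systems usually avoid.

```python
import numpy as np

# Hypothetical user x item rating matrix; 0 means "not rated".
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
], dtype=float)


def cosine_similarity(u, v):
    """Similarity metric between two users' rating vectors."""
    norm = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / norm) if norm else 0.0


def predict_memory_based(R, user, item):
    """Memory-based: aggregate other users' ratings of `item`,
    weighted by their similarity to `user`."""
    num = den = 0.0
    for other in range(R.shape[0]):
        if other == user or R[other, item] == 0:
            continue  # skip the user themself and users who did not rate the item
        sim = cosine_similarity(R[user], R[other])
        num += sim * R[other, item]
        den += abs(sim)
    return num / den if den else 0.0


def predict_model_based(R, k=2):
    """Model-based: build a rank-k SVD model once; the returned matrix
    holds a predicted score for every (user, item) pair."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


memory_pred = predict_memory_based(R, user=1, item=1)
model_scores = predict_model_based(R, k=2)
```

`predict_memory_based` scans the stored ratings every time it is called, whereas `predict_model_based` computes the factorization once and each prediction is then a lookup such as `model_scores[1, 1]`.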
The Personalization Technique for Social Recommender Systems using Machine Learning