algorithm for detecting the attraction of each participant towards the debated topics and, at the same time, the probability that a topic follows another topic within the conversation.
Most of the existing approaches for analyzing text with social network analysis (SNA) tools target systems that provide explicit referencing mechanisms, such as forums or blogs. The reason is that in such systems the participant social network can easily be constructed from the order in which messages are sent and from their recipients. Still, a few systems have attempted to apply SNA tools to chat conversations. One such tool was built by Sundararajan (2010) to analyze the content published by the participants in 8 different courses, in order to observe how the respect and influence earned by each participant affect their efforts to "collaborate, learn new and conceptual knowledge" and their satisfaction with the outcome of the courses. Unfortunately, the author does not mention whether this analysis is done manually or automatically.
Moreover, a standard SNA method is used to evaluate the participants in terms of their centrality, betweenness, in-degree, out-degree, etc. in the network, which are purely quantitative measures. In contrast, we are interested in what the participants communicate (the topics they know or are interested in) and in the interaction patterns between the different concepts debated in the conversation, which is part of a qualitative evaluation of the participants, the topics and the conversation as a whole.
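The degree measures mentioned above only count links. As a minimal illustration, the following Python sketch computes the in-degree and out-degree of each participant from a hypothetical list of (sender, recipient) message pairs. The participant names, the data and the `degree_counts` helper are invented for this example, and the other measures (centrality, betweenness) are omitted.

```python
from collections import Counter

# Hypothetical reply data: one (sender, recipient) pair per message.
messages = [
    ("ana", "bob"),
    ("bob", "ana"),
    ("carl", "ana"),
    ("ana", "carl"),
    ("bob", "carl"),
]


def degree_counts(edges):
    """Return (in_degree, out_degree) Counters for a directed multigraph."""
    out_degree = Counter(sender for sender, _ in edges)
    in_degree = Counter(recipient for _, recipient in edges)
    return in_degree, out_degree


in_deg, out_deg = degree_counts(messages)
for person in sorted(set(in_deg) | set(out_deg)):
    # Counter returns 0 for participants that never appear on one side.
    print(person, "in:", in_deg[person], "out:", out_deg[person])
```

Nothing in these counts reflects what the participants actually discussed, which is the limitation of purely quantitative SNA noted above.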
A closer approach was undertaken by Tuulos and Tirri (2004). The authors present a semi-supervised system that combines topic modelling and SNA to improve information retrieval from chat conversations. For their analysis, they used conversations taken from SearchIRC.com, which allowed them to use simple heuristics to identify to whom each utterance is addressed (and therefore to build the social network). For this participant network, the in-degree, out-degree and PageRank of each participant are determined. Next, the authors use a set of existing conversations to estimate the probability of each word appearing in conversations about different topics, so that these probabilities can be reused when analyzing new conversations. Finally, they evaluate how much each SNA technique improves information retrieval, taking the results provided by topic modelling as the baseline.
However, their setting gives them two advantages. First, they know both how many and which topics should be present in the conversation, so they know what constitutes off-topic content and can discard it. Second, they chose topics from clearly distinct domains (Bible, C++, Philosophy, Physics, Politics, Win2000), which simplifies the task of identifying the topic to which a given concept corresponds. Neither of these facts can be exploited in our approach. Because our system has no learning phase, it can analyze texts debating any topic without being limited to the ones that were learnt, which makes it general in use. At the same time, it can distinguish between concepts from the same or similar conceptual areas. The examples presented in this paper deliberately contain concepts from a single domain (Human Computer Interaction) to show that the approach works even at this level, without requiring different topics to be debated in the same conversation.
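For illustration only, the per-topic word probabilities used by Tuulos and Tirri (2004) can be approximated by a much simpler model than theirs: relative word frequencies per topic with add-one (Laplace) smoothing, used to assign a new utterance to its most likely topic in naive Bayes fashion. The training data, the topic labels and the helpers `word_probabilities` and `most_likely_topic` are hypothetical, and this is not the topic model used in their work.

```python
import math
from collections import Counter

# Hypothetical labelled training data: topic -> list of utterances.
training = {
    "physics": ["energy is conserved", "mass and energy are related"],
    "politics": ["the election is near", "voters choose the party"],
}


def word_probabilities(corpus):
    """Estimate P(word | topic) with add-one (Laplace) smoothing."""
    vocab = {w for texts in corpus.values() for t in texts for w in t.split()}
    probs = {}
    for topic, texts in corpus.items():
        counts = Counter(w for t in texts for w in t.split())
        total = sum(counts.values())
        probs[topic] = {w: (counts[w] + 1) / (total + len(vocab)) for w in vocab}
    return probs, vocab


def most_likely_topic(utterance, probs, vocab):
    """Pick the topic maximising the summed log-probability of known words.

    Words never seen in training are ignored; topics have equal priors.
    """
    words = [w for w in utterance.split() if w in vocab]
    return max(probs, key=lambda t: sum(math.log(probs[t][w]) for w in words))


probs, vocab = word_probabilities(training)
print(most_likely_topic("energy and mass", probs, vocab))
```

Such a classifier only recognises the topics it was trained on, which illustrates the dependence on a learning phase that the approach presented here avoids.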
The paper continues with a short overview of the PageRank algorithm. Then, we present the application that has been developed and several results obtained by employing the PageRank method adapted for CSCL chats. The paper ends with an analysis of these results and with our conclusions regarding how the quality of the results can be improved.
2 OVERVIEW OF THE PAGERANK ALGORITHM
Because previous research has modelled an online conversation as a graph with implicit and explicit links between utterances (Rebedea et al., 2011), we considered the PageRank algorithm (Page et al., 1998) a candidate for analyzing the conversation graph. PageRank was initially designed to analyze a set of web pages in order to extract the relative importance of each page within that set (Page et al., 1998). The algorithm expresses the probability that a web surfer will “find” the considered page within a limited number of steps (clicking on the links from one page to another). It is a customization of a “random walk” on a graph, which in turn is modelled as a Markov chain in which the states are pages and the transitions are the links between pages, all links leaving a given page being equally probable.
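The random-walk interpretation can be sketched with power iteration. The following Python sketch assumes the widely used damping-factor formulation: with probability 0.85 the surfer follows one of the current page's links (each equally likely), otherwise it jumps to a uniformly random page, and the score of pages without forward links is spread evenly. These choices, the `pagerank` helper and the small utterance graph are assumptions for illustration, not the adaptation for CSCL chats described in this paper.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank over a dict: node -> list of nodes it links to.

    Assumed formulation: follow a random out-link with probability `damping`,
    otherwise jump to a uniformly random node. Dangling nodes (no out-links)
    redistribute their score uniformly over all nodes.
    """
    nodes = sorted(set(links) | {v for targets in links.values() for v in targets})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        # Random-jump share plus the evenly spread dangling score.
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            targets = links.get(u, [])
            for v in targets:
                # Each link leaving u carries an equal share of u's score.
                new_rank[v] += damping * rank[u] / len(targets)
        if sum(abs(new_rank[u] - rank[u]) for u in nodes) < tol:
            return new_rank
        rank = new_rank
    return rank


# Hypothetical utterance graph: each utterance links to the ones it replies to.
conversation = {"u2": ["u1"], "u3": ["u1"], "u4": ["u2", "u3"]}
for utterance, score in sorted(pagerank(conversation).items()):
    print(utterance, round(score, 3))
```

The random-jump term makes every node reachable from every other one, so the iteration converges to a single stationary distribution whose values sum to 1; in this example the utterance that both replies point to (u1) receives the highest score.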
The formal definition given in the initial paper describing PageRank (Page et al., 1998) was: if u is a web page; Fu (forward links), the pages referred
UsingPageRankforDetectingtheAttractionbetweenParticipantsandTopicsinaConversation
295