the document being classified. (Calado et al., 2003)
analyzed several distinct linkage similarity measures
and determined which ones provide the best results
in predicting the category of a document. They also
proposed a Bayesian network model that takes advan-
tage of both the information provided by a content-
based classifier and the information provided by the
document link structure. (Zhou and Burges, 2007) extended
their transductive learning framework by combining
the Laplacians defined on each view. Moreover,
(Macskassy, 2007) proposes to merge an inferred net-
work and the link network into one global network.
Then, he applies to that network an iterative classification
algorithm based on relaxation labeling, described
in (Macskassy and Provost, 2007), a baseline algorithm
in semi-supervised classification. Another related
algorithm, “stacked sequential learning”,
has been used to augment an arbitrary base
learner so as to make it aware of the labels of connected
examples. (Maes et al., 2009) extended this
last algorithm in order to decrease an intrinsic bias
due to the iterative classification process. For their part,
(Tang et al., 2009) solve a multiple graph clustering
problem where each graph is approximated by matrix
factorization with a graph-specific factor and a
factor common to all graphs. Finally, more recently,
(Backstrom and Leskovec, 2011) proposed to learn
the weights of a so-called “supervised random walk”
using both the information from the network structure
and the attribute data.
People retrieval, or expert finding, has also been
intensively studied in recent years. (McCallum et al., 2007)
proposed to apply their successful Author-Recipient-Topic
(ART) model to an expert retrieval task. To that end,
they extended the ART model to the Role-Author-Recipient-Topic
model in order to represent people’s roles explicitly.
During the same period, (Mimno and
McCallum, 2007) introduced yet another topic based
model, namely, the Author-Persona-Topic model for
the problem of matching papers with reviewers. This
family of work tries to find latent variables that explain
topic and community formation and, indirectly, uses
these latent variables to compute the similarities, which
is completely different from our approach. More
related to our work, (Balog et al., 2009) proposed to
model the process of expert search by introducing a
theoretical language modeling framework. More
recently, (Smirnova and Balog, 2011) proposed to
extend this model with a user-oriented aspect in order
to balance the choice of expert candidate against the time
needed by the user to contact that person. However, these
frameworks are mono-modal (i.e. they work only on
document terms) and do not consider any social or
link information. Moreover, they include no form of
pseudo-relevance feedback to enrich the
submitted query.
6 CONCLUSIONS
In this work we presented a global framework for people
retrieval in a collection of socially-labelled documents,
which extends the classical paradigm of document
retrieval by focusing on people and social roles.
This framework may be applied to a wide range of
retrieval tasks involving multi-view aspects. Our approach
consists of separating the problem into two
phases: in the first one (at the document level), we
define valuable similarity measures exploiting direct
(i.e. one-step) and indirect (i.e. two-step, as in traditional
pseudo-relevance feedback) relations between
the query and the targeted collection. In this way, we
are also able to capture cross-modal similarities in order
to improve the final ranking. It appears that combining
these similarities by a simple mean after score
studentization offers a performance level that more
complex combination schemes (for instance, learning
the combination weights by logistic regression when
the task can be formulated as a supervised prediction
problem) are not able to beat.
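As an illustration of the combination step mentioned above, the following minimal sketch studentizes the scores of each similarity measure (zero mean, unit standard deviation) and then averages them per candidate. It is only a sketch under stated assumptions, not the implementation evaluated in this work: the function names, the example scores, and the use of the population standard deviation are all hypothetical choices made for the example.

```python
from statistics import mean, pstdev


def studentize(scores):
    """Rescale one list of scores to zero mean and unit standard deviation.

    Assumption: the population standard deviation is used; the text does
    not say which estimator is applied.
    """
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All candidates tie under this measure: it carries no ranking signal.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def combine_by_mean(score_lists):
    """Studentize each measure's scores, then average them per candidate."""
    normalized = [studentize(scores) for scores in score_lists]
    return [mean(column) for column in zip(*normalized)]


# Hypothetical scores of three candidates under two similarity measures
# that live on very different scales.
direct = [0.9, 0.5, 0.1]
indirect = [12.0, 30.0, 3.0]

combined = combine_by_mean([direct, indirect])
# Candidate indices sorted from best to worst combined score.
ranking = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
```

Studentizing first puts every measure on a comparable scale, so that a measure with a large numeric range does not dominate the plain mean.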
REFERENCES
Backstrom, L. and Leskovec, J. (2011). Supervised random
walks: predicting and recommending links in social
networks. In Proceedings of the Fourth International
Conference on Web Search and Web Data Mining,
WSDM 2011, Hong Kong, China, pages 635–644.
Balog, K., Azzopardi, L., and de Rijke, M. (2009). A lan-
guage modeling framework for expert finding. Inf.
Process. Manage., 45(1):1–19.
Calado, P., Cristo, M., Moura, E., Ziviani, N., Ribeiro-Neto,
B., and Gonçalves, M. (2003). Combining link-based
and content-based methods for web document classification.
In Proceedings of the Twelfth International
Conference on Information and Knowledge Management
(CIKM 2003), pages 394–401. ACM.
Chakrabarti, S., Dom, B., and Indyk, P. (1998). Enhanced
hypertext categorization using hyperlinks. In Proceed-
ings of the 1998 ACM SIGMOD International Confer-
ence on Management of Data, pages 307–318.
Clinchant, S., Renders, J.-M., and Csurka, G. (2007). Trans-
media pseudo-relevance feedback methods in multi-
media retrieval. In CLEF, pages 569–576.
Cohn, D. A. and Hofmann, T. (2000). The missing link -
a probabilistic model of document content and hyper-
text connectivity. In Neural Information Processing
Systems conference (NIPS 2000), pages 430–436.
Fisher, M. and Everson, R. (2003). When are links useful?
Experiments in text classification. In Advances in in-
formation retrieval: proceedings of the 25th European
KDIR 2011 - International Conference on Knowledge Discovery and Information Retrieval