considered approaches have initially been evaluated by applying the
algorithms to data extracted from the PubMed repository. The produced
clustering solutions have been validated on two different datasets
using two cluster validation measures: F-measure and Silhouette Index
(SI). On the first dataset, the two Bipartite Correlation Clustering
(BCC) algorithms have, on average, slightly outperformed the
Partitioning-based algorithm with respect to SI. On the second dataset,
the Merge-Split PBC algorithm has also demonstrated better performance
than the other two algorithms. This algorithm analyzes the correlations
between two clustering solutions and, based on the discovered patterns,
treats the clusters in different ways. In addition, unlike the
Partitioning-based clustering algorithm, the two BCC algorithms do not
need prior knowledge of the optimal number of clusters in order to
produce a good clustering solution. The BCC algorithms are also more
suitable for the considered expertise retrieval context, because each
cluster is modelled by a list of domain-specific topics, i.e.
analogously to the experts’ expertise profiles.
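
As an illustration of the two validation measures named above, the following is a minimal Python sketch and not the implementation used in the experiments. It assumes Euclidean distance between points for SI (an expertise-similarity measure would take its place in the expert setting) and the class-weighted variant of the F-measure that is common in document clustering. The function names silhouette_index and f_measure are hypothetical.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def silhouette_index(points, labels):
    """Mean silhouette width s(i) = (b - a) / max(a, b) over all points."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("SI needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # common convention: s(i) = 0 for singleton clusters
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


def f_measure(true_labels, cluster_labels):
    """Class-weighted F-measure: sum over classes of (n_c / n) * best F."""
    n = len(true_labels)
    pairs = list(zip(true_labels, cluster_labels))
    score = 0.0
    for c in set(true_labels):
        n_c = sum(1 for t in true_labels if t == c)
        best = 0.0
        for k in set(cluster_labels):
            n_k = sum(1 for p in cluster_labels if p == k)
            n_ck = sum(1 for t, p in pairs if t == c and p == k)
            if n_ck == 0:
                continue
            precision, recall = n_ck / n_k, n_ck / n_c
            best = max(best, 2 * precision * recall / (precision + recall))
        score += (n_c / n) * best
    return score


if __name__ == "__main__":
    # Toy data (assumed for illustration): two well-separated groups.
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    found = [0, 0, 1, 1]
    print(silhouette_index(pts, found))
    print(f_measure(["a", "a", "b", "b"], found))
```

SI needs no ground truth, since it only compares within-cluster and between-cluster distances, whereas the F-measure requires reference class labels. This is one reason the two measures are usually reported side by side.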
For future work, we aim to pursue further comparison and evaluation of
the three proposed clustering approaches on richer data extracted from
different online sources.
ICAART 2018 - 10th International Conference on Agents and Artificial Intelligence