AN EMPIRICAL COMPARISON OF LABEL PREDICTION ALGORITHMS ON AUTOMATICALLY INFERRED NETWORKS

Omar Ali, Giovanni Zappella, Tijl De Bie, Nello Cristianini

Abstract

Predicting the label of a network node from the labels of the remaining nodes is an area of growing interest in machine learning, as many types of data are naturally represented as nodes in a graph. As more methods are proposed for this task, comparing their performance becomes increasingly important. In this paper we present an extensive experimental comparison of 15 different methods on 15 different labelled networks, and we release all datasets and source code. We also release a further set of networks that were not used in this study, because not all benchmarked methods could handle very large datasets. Besides the release of data, protocols and algorithms, the key contribution of this study is that in each of the 225 combinations we tested, the best performance, both in accuracy and in running time, was achieved by the same algorithm: Online Majority Vote. This is also one of the simplest methods to implement.
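
The abstract names Online Majority Vote without describing it. As a rough illustration only, the sketch below assumes the method predicts each node's label by a weighted majority vote over those neighbours whose labels have already been revealed, with nodes presented one at a time. The function name `online_majority_vote`, the `default` label used when no neighbour has been revealed, and the tie-breaking behaviour are assumptions made for this example, not details taken from the paper.

```python
from collections import defaultdict


def online_majority_vote(edges, labels, order, default=1):
    """Hypothetical sketch of an online weighted majority vote on a graph.

    edges:   iterable of (u, v, weight) tuples for an undirected graph
    labels:  dict mapping node -> true label (revealed after each prediction)
    order:   sequence of nodes in the order they are presented
    default: assumed fallback label when no neighbour has been revealed yet
    """
    # Build an adjacency list for the undirected weighted graph.
    neighbours = defaultdict(list)
    for u, v, w in edges:
        neighbours[u].append((v, w))
        neighbours[v].append((u, w))

    revealed = {}     # labels seen so far
    predictions = {}  # predicted label for each node
    for node in order:
        # Sum edge weights per label over already-revealed neighbours.
        votes = defaultdict(float)
        for other, w in neighbours[node]:
            if other in revealed:
                votes[revealed[other]] += w
        # Ties go to the label that first received a vote (an assumption).
        predictions[node] = max(votes, key=votes.get) if votes else default
        # The true label becomes available only after the prediction is made.
        revealed[node] = labels[node]
    return predictions


edges = [("a", "b", 1.0), ("b", "c", 2.0), ("a", "c", 0.5)]
labels = {"a": 1, "b": 1, "c": -1}
predictions = online_majority_vote(edges, labels, order=["a", "b", "c"])
mistakes = sum(predictions[n] != labels[n] for n in labels)
```

In this toy run the first node falls back to the default label, and the number of mistakes over the sequence is the kind of accuracy measure one might track when comparing methods.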

Paper Citation


in Harvard Style

Ali O., Zappella G., De Bie T. and Cristianini N. (2012). AN EMPIRICAL COMPARISON OF LABEL PREDICTION ALGORITHMS ON AUTOMATICALLY INFERRED NETWORKS. In Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 2: ICPRAM, ISBN 978-989-8425-99-7, pages 259-268. DOI: 10.5220/0003695702590268


in Bibtex Style

@conference{icpram12,
author={Omar Ali and Giovanni Zappella and Tijl De Bie and Nello Cristianini},
title={AN EMPIRICAL COMPARISON OF LABEL PREDICTION ALGORITHMS ON AUTOMATICALLY INFERRED NETWORKS},
booktitle={Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 2: ICPRAM},
year={2012},
pages={259-268},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003695702590268},
isbn={978-989-8425-99-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 2: ICPRAM
TI - AN EMPIRICAL COMPARISON OF LABEL PREDICTION ALGORITHMS ON AUTOMATICALLY INFERRED NETWORKS
SN - 978-989-8425-99-7
AU - Ali O.
AU - Zappella G.
AU - De Bie T.
AU - Cristianini N.
PY - 2012
SP - 259
EP - 268
DO - 10.5220/0003695702590268