Dimensionality Reduction for Supervised Learning in Link Prediction Problems

Antonio Pecli, Bruno Giovanini, Carla C. Pacheco, Carlos Moreira, Fernando Ferreira, Frederico Tosta, Júlio Tesolin, Marcio Vinicius Dias, Silas Filho, Maria Claudia Cavalcanti, Ronaldo Goldschmidt

Abstract

In recent years, considerable attention has been devoted to research on complex networks and their properties. Collaborative environments, social networks and recommender systems are popular examples of complex networks that have emerged recently and are objects of interest in academia and industry. Many studies model complex networks as graphs and tackle the link prediction problem, a major open question in network evolution. The problem consists of predicting the likelihood that an association will appear between two nodes of a graph that are not yet connected. One approach to this problem is based on supervised learning for binary classification. Although the curse of dimensionality is a long-standing obstacle in machine learning, little effort has been made to address it in the link prediction scenario. This paper therefore evaluates the effects of dimensionality reduction as a preprocessing stage before the binary classifier is built in link prediction applications. Two dimensionality reduction strategies are evaluated: Principal Component Analysis (PCA) and Forward Feature Selection (FFS). Experiments with three different datasets and four traditional machine learning algorithms show that dimensionality reduction with PCA and FFS can improve model precision in this kind of problem.
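To make the setting concrete, the following is a minimal sketch of link prediction cast as binary classification, with greedy forward feature selection applied before the classifier is built. It is not the authors' implementation. The toy graph, the four topological features, the nearest-centroid classifier and the leave-one-out precision score are all assumptions chosen for illustration, and every function name is hypothetical. PCA is omitted because it would normally rely on a linear algebra library.

```python
import math
from itertools import combinations

# Toy training snapshot (hypothetical data): an undirected graph given as an edge list.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"),
         ("c", "d"), ("d", "e"), ("e", "f")]
# Hypothetical links that appear later; they define the positive class.
FUTURE = {frozenset(("a", "d")), frozenset(("c", "e"))}

def adjacency(edges):
    """Build a node -> set-of-neighbours map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def pair_features(adj, u, v):
    """Topological scores for one unconnected pair (feature choice is an assumption)."""
    common = adj[u] & adj[v]
    union = adj[u] | adj[v]
    return [
        len(common),                                        # common neighbours
        len(common) / len(union) if union else 0.0,         # Jaccard coefficient
        len(adj[u]) * len(adj[v]),                          # preferential attachment
        sum(1.0 / math.log(len(adj[z])) for z in common),   # Adamic-Adar index
    ]

def build_dataset(adj, future):
    """One row per unconnected pair; label 1 if the link appears in `future`."""
    X, y = [], []
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:
            X.append(pair_features(adj, u, v))
            y.append(1 if frozenset((u, v)) in future else 0)
    return X, y

def centroid(rows, cols):
    """Mean of the selected columns over the given rows."""
    return [sum(r[c] for r in rows) / len(rows) for c in cols]

def loo_precision(X, y, cols):
    """Leave-one-out precision of a nearest-centroid classifier on columns `cols`."""
    tp = fp = 0
    for i, (xi, yi) in enumerate(zip(X, y)):
        train = [(x, t) for j, (x, t) in enumerate(zip(X, y)) if j != i]
        pos = centroid([x for x, t in train if t == 1], cols)
        neg = centroid([x for x, t in train if t == 0], cols)
        d_pos = sum((xi[c] - p) ** 2 for c, p in zip(cols, pos))
        d_neg = sum((xi[c] - n) ** 2 for c, n in zip(cols, neg))
        if d_pos < d_neg:  # predicted "link will appear"
            tp += yi
            fp += 1 - yi
    return tp / (tp + fp) if tp + fp else 0.0

def forward_feature_selection(X, y, score):
    """Greedily add the feature that most improves `score`; stop when none does."""
    selected, best = [], -1.0
    remaining = list(range(len(X[0])))
    while remaining:
        cand_score, cand = max((score(X, y, selected + [c]), c) for c in remaining)
        if cand_score <= best:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return selected, best

adj = adjacency(EDGES)
X, y = build_dataset(adj, FUTURE)
selected, best = forward_feature_selection(X, y, loo_precision)
print("selected feature indices:", selected, "precision:", best)
```

The selection loop is a wrapper method: it treats the classifier as a black box and only needs a scoring function, so a different learner or metric could be substituted without changing `forward_feature_selection`.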



Paper Citation


in Harvard Style

Pecli A., Giovanini B., Pacheco C. C., Moreira C., Ferreira F., Tosta F., Tesolin J., Vinicius Dias M., Filho S., Claudia Cavalcanti M. and Goldschmidt R. (2015). Dimensionality Reduction for Supervised Learning in Link Prediction Problems. In Proceedings of the 17th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-096-3, pages 295-302. DOI: 10.5220/0005371802950302


in Bibtex Style

@conference{iceis15,
author={Antonio Pecli and Bruno Giovanini and Carla C. Pacheco and Carlos Moreira and Fernando Ferreira and Frederico Tosta and Júlio Tesolin and Marcio Vinicius Dias and Silas Filho and Maria Claudia Cavalcanti and Ronaldo Goldschmidt},
title={Dimensionality Reduction for Supervised Learning in Link Prediction Problems},
booktitle={Proceedings of the 17th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2015},
pages={295-302},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005371802950302},
isbn={978-989-758-096-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 17th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - Dimensionality Reduction for Supervised Learning in Link Prediction Problems
SN - 978-989-758-096-3
AU - Pecli A.
AU - Giovanini B.
AU - Pacheco C. C.
AU - Moreira C.
AU - Ferreira F.
AU - Tosta F.
AU - Tesolin J.
AU - Vinicius Dias M.
AU - Filho S.
AU - Claudia Cavalcanti M.
AU - Goldschmidt R.
PY - 2015
SP - 295
EP - 302
DO - 10.5220/0005371802950302