Extraction of Biographical Data from Wikipedia

Robert Viseur



Using the content of Wikipedia articles is common in academic research. However, the practicalities are rarely analysed. Our research focuses on extracting biographical information about personalities from Belgium. The paper is divided into three sections. The first describes the state of the art in data extraction from Wikipedia. The second presents a case study on extracting data from the biographies of Belgian personalities: different solutions are discussed, and the adopted solution is implemented. The third discusses the quality of the extraction. On the basis of this case study, we also propose practical recommendations for researchers wishing to use Wikipedia.


Paper Citation

in Harvard Style

Viseur R. (2013). Extraction of Biographical Data from Wikipedia. In Proceedings of the 2nd International Conference on Data Technologies and Applications - Volume 1: DATA, ISBN 978-989-8565-67-9, pages 248-252. DOI: 10.5220/0004595302480252
