A Flexible System for a Comprehensive Analysis of Bibliographical Data

Sahar Vahdati, Andreas Behrend, Gereon Schüller, Rainer Manthey


Scientific literature is now easily accessible, but a comprehensive analysis of the contents of research papers and the interrelationships between them is often missing. Scientists therefore usually perform a time-consuming bibliographical analysis before they can actually start their research. This manual process includes identifying the most important research trends, major papers, promising approaches and established conference series, as well as searching for the most active groups on a specific research topic. In addition, scientists have to collect related academic literature to avoid reinventing already published results. Although a large number of literature management systems have been developed to support researchers in these tasks, the analysis of bibliographical data they offer is still quite limited. In this paper, we identify some of the missing analysis features and show how they could be implemented using data about author affiliations, reference relations and additional metadata, automatically generated from a set of research articles. The resulting prototypical implementation points the way towards the design of a general and extensible bibliographic analysis system.


Paper Citation

in Harvard Style

Vahdati S., Behrend A., Schüller G. and Manthey R. (2014). A Flexible System for a Comprehensive Analysis of Bibliographical Data. In Proceedings of the 10th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-758-023-9, pages 143-151. DOI: 10.5220/0004799201430151

in Bibtex Style

author={Sahar Vahdati and Andreas Behrend and Gereon Schüller and Rainer Manthey},
title={A Flexible System for a Comprehensive Analysis of Bibliographical Data},
booktitle={Proceedings of the 10th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},

in EndNote Style

JO - Proceedings of the 10th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - A Flexible System for a Comprehensive Analysis of Bibliographical Data
SN - 978-989-758-023-9
AU - Vahdati S.
AU - Behrend A.
AU - Schüller G.
AU - Manthey R.
PY - 2014
SP - 143
EP - 151
DO - 10.5220/0004799201430151