Natural Language Processing Techniques for Document Classification in IT Benchmarking - Automated Identification of Domain Specific Terms

Matthias Pfaff, Helmut Krcmar

2015

Abstract

In the domain of IT benchmarking, collected data are often stored as natural language text and are therefore intrinsically unstructured. To ease data analysis and evaluation across different types of IT benchmarking approaches, a semantic representation of this information is crucial. Thus, identifying conceptual (semantic) similarities is the first step in developing integrative data management in this domain. Since an ontology is a specification of such a conceptualization, an association of terms, relations between terms, and related instances must be developed. Building on previous research, we present an approach for automated term extraction using natural language processing (NLP) techniques. Terms are automatically extracted from existing IT benchmarking documents, yielding a domain-specific dictionary. The extracted terms are representative of each document, describe its purpose and content, and serve as a basis for the ontology development process in the domain of IT benchmarking.
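The abstract does not spell out the extraction pipeline, so the following is only a rough sketch of one common baseline for automated term extraction, not the authors' method. It assumes tokenization, stopword removal and tf-idf weighting, keeps the top-ranked terms per document and merges them into a dictionary. The stopword list, the function names (`tokenize`, `candidate_terms`, `tfidf_terms`, `build_dictionary`) and the sample documents are all hypothetical. No lemmatization is performed, so inflected forms such as "ticket" and "tickets" remain separate terms.

```python
import math
import re
from collections import Counter

# Hypothetical mini stopword list for illustration only; a real pipeline
# would load a complete German stopword list instead.
STOPWORDS = {"der", "die", "das", "und", "für", "von", "mit", "ist", "pro", "im"}


def tokenize(text):
    """Lowercase the text and split it into runs of letters (a-z plus ä, ö, ü, ß)."""
    return re.findall(r"[a-zäöüß]+", text.lower())


def candidate_terms(text, min_len=3):
    """Keep tokens that are not stopwords and have at least min_len characters.

    No lemmatisation is applied, so inflected forms stay distinct.
    """
    return [t for t in tokenize(text) if t not in STOPWORDS and len(t) >= min_len]


def tfidf_terms(docs, top_k=5):
    """Return, for each document name, its top_k terms ranked by tf-idf."""
    tokenized = {name: candidate_terms(text) for name, text in docs.items()}
    n_docs = len(tokenized)

    # Document frequency: number of documents in which each term occurs.
    df = Counter()
    for terms in tokenized.values():
        df.update(set(terms))

    result = {}
    for name, terms in tokenized.items():
        tf = Counter(terms)
        total = len(terms)
        # Relative term frequency times log inverse document frequency;
        # a term occurring in every document gets weight 0.
        scores = {t: (c / total) * math.log(n_docs / df[t]) for t, c in tf.items()}
        # Sort by descending score, breaking ties alphabetically for determinism.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        result[name] = [t for t, s in ranked[:top_k] if s > 0]
    return result


def build_dictionary(per_doc_terms):
    """Merge the per-document terms into one sorted, de-duplicated dictionary."""
    return sorted({t for terms in per_doc_terms.values() for t in terms})


if __name__ == "__main__":
    # Hypothetical sample documents, not data from the paper.
    docs = {
        "server.txt": "Kosten für Server und Betrieb der Server im Rechenzentrum",
        "helpdesk.txt": "Tickets im Helpdesk und Kosten pro Ticket im Helpdesk",
    }
    per_doc = tfidf_terms(docs, top_k=2)
    print(per_doc)
    print(build_dictionary(per_doc))
```

Run as a script, this selects `server` and `betrieb` for the first sample and `helpdesk` and `ticket` for the second. The term `kosten` scores zero because it occurs in both documents. Merging `ticket` and `tickets` would require an additional lemmatization step.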



Paper Citation


in Harvard Style

Pfaff M. and Krcmar H. (2015). Natural Language Processing Techniques for Document Classification in IT Benchmarking - Automated Identification of Domain Specific Terms. In Proceedings of the 17th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-096-3, pages 360-366. DOI: 10.5220/0005462303600366


in Bibtex Style

@conference{iceis15,
author={Matthias Pfaff and Helmut Krcmar},
title={Natural Language Processing Techniques for Document Classification in IT Benchmarking - Automated Identification of Domain Specific Terms},
booktitle={Proceedings of the 17th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2015},
pages={360-366},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005462303600366},
isbn={978-989-758-096-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 17th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - Natural Language Processing Techniques for Document Classification in IT Benchmarking - Automated Identification of Domain Specific Terms
SN - 978-989-758-096-3
AU - Pfaff M.
AU - Krcmar H.
PY - 2015
SP - 360
EP - 366
DO - 10.5220/0005462303600366
ER -