AdaptIE - Using Domain Language Concept to Enable Domain Experts in Modeling of Information Extraction Plans
Wojciech M. Barczyński, Felix Förster, Falk Brauer, Daniel Schuster
2010
Abstract
Implementing domain-specific Information Extraction (IE) technologies to retrieve structured information from unstructured data is a challenging and complex task. It requires both IE expertise (e.g., in linguistics) and domain knowledge, provided by a domain expert who is aware of, say, the specifics of the text corpus and the entities of interest. While the IE expert role is addressed by several approaches, less has been done to involve domain experts in the process of IE development. Our approach targets this issue. We provide a base platform for collaboration among experts through IE plan modeling languages, which are used to compose basic IE operators into complex IE flows. We provide each expert with a language adapted to their respective expertise: IE experts work with a fine-grained view of IE execution, and domain experts with a coarse-grained view. We use the Model Driven Architecture (MDA) concept to enable transitions among the languages and the operators provided by an algebraic IE framework. To demonstrate the applicability of our approach, we implemented an Eclipse-based tool, AdaptIE, and demonstrate it in a real-world scenario for the SAP Community Network.
Paper Citation
in Harvard Style
Barczyński W. M., Förster F., Brauer F. and Schuster D. (2010). AdaptIE - Using Domain Language Concept to Enable Domain Experts in Modeling of Information Extraction Plans. In Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-8425-04-1, pages 249-256. DOI: 10.5220/0002902602490256
in Bibtex Style
@conference{iceis10,
author={Wojciech M. Barczyński and Felix Förster and Falk Brauer and Daniel Schuster},
title={AdaptIE - Using Domain Language Concept to Enable Domain Experts in Modeling of Information Extraction Plans},
booktitle={Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2010},
pages={249-256},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002902602490256},
isbn={978-989-8425-04-1},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - AdaptIE - Using Domain Language Concept to Enable Domain Experts in Modeling of Information Extraction Plans
SN - 978-989-8425-04-1
AU - Barczyński W. M.
AU - Förster F.
AU - Brauer F.
AU - Schuster D.
PY - 2010
SP - 249
EP - 256
DO - 10.5220/0002902602490256