AdaptIE

Using Domain Language Concept to Enable Domain Experts in Modeling of

Information Extraction Plans

Wojciech M. Barczy

nski, Felix F

orster, Falk Brauer

SAP AG, SAP Research CEC Dresden, Chemnitzer Str. 48, Dresden, Germany

Daniel Schuster

Technische Universit

at Dresden, Dresden, Germany

Keywords:

Information extraction, Domain adaption, User interface, Process modeling.

Abstract:

Implementing domain speciﬁc Information Extraction (IE) technologies to retrieve structured information

from unstructured data is a challenging and complex task. It requires both IE expertise (e.g., in linguistics)

and domain knowledge, provided by a domain expert who is aware of, say, the text corpus speciﬁcs and entities

of interest. While the IE expert role is addressed by several approaches, less has been done in enabling domain

experts in the process of IE development. Our approach targets this issue. We provide a base platform for

collaboration of experts through IE plan modeling languages used to compose basic IE operators into complex

IE ﬂows. We provide each of the experts with a language that is adapted to their respective expertise. IE

experts leverage a ﬁne grained view and domain experts use a coarse grain view on execution of IE. We

use Model Driven Architecture concept to enable transition among the languages and operators provided by

an algebraicIE framework. To prove applicability of our approach we implemented an Eclipse based tool –

AdaptIE– and demonstrate it in a real world scenario for the SAP Community Network.

1 INTRODUCTION

The problem of managing unstructured text and

present the information in structured form, com-

monly known as Information Extraction (IE), is be-

coming increasingly important. The amount of

human-generated information (unstructured data) is

constantly growing and contains a lot of valuable

information, comprising Web 2.0 data as well as

enterprise-internal wikis, technical blogs, and com-

munity portals. In recent years there has been a con-

siderable amount of interest in developing IE systems,

e.g., (Reiss et al., 2008; Shen et al., 2007). Many ex-

isting frameworks are general purpose extraction sys-

tems - they use algebraic operators (similar to SQL)

or logical programming languages (based on Data-

log) to provide extraction functionality. However, us-

ing those kinds of systems imposes severe usability

challenges on domain experts. The usability of such

systems is important, because (domain) knowledge is

a key element in achieving high accuracy of IE (see,

e.g., (Sarawagi, 2008)). As domain knowledge we

denote both knowledge about semantic and document

structure. Contemporary solutions require from do-

main experts to familiarize themselves with the IE

system, to gain technical expertise, and focus on the

extraction task itself instead of contributing their do-

main knowledge.

In our work we approach the usability issue of IE

languages from a user perspective. In particular we

specify this research question as: how to foster the

collaboration between IE and domain experts to make

them contribute their distinct knowledge into the de-

velopment of a domain-speciﬁc IE system (DES)?

Our approach addresses this question by providing an

IE language platform based on an algebraic extrac-

tion framework, introduced in Section 3, (e.g., (Shen

et al., 2008)) and the concept of domain speciﬁc lan-

guages from Model Driven Architecture (MDA) re-

search (Bosch and Dittrich, 2004). We provide each

of the experts with an IE plan modeling language that

is adapted to their respective expertise. An IE expert

has a ﬁne grain view (e.g., regular expression used in

an IE operator) and a domain expert leverages a coarse

249

M. Barczyñski W., Förster F., Brauer F. and Schuster D. (2010).

AdaptIE - Using Domain Language Concept to Enable Domain Experts in Modeling of Information Extraction Plans.

In Proceedings of the 12th International Conference on Enterprise Information Systems - Databases and Information Systems Integration, pages

249-256

DOI: 10.5220/0002902602490256

 SciTePress

grain view (an IE operator is seen as a black box) on

a IE plan they both work on. We use the concept of

model transformation to enable transition among each

of languages and algebraic model of an IE framework.

Therefore, created extraction plans are stored in an IE

repository for later reuse. Experts’ work follows a

process that structures actions in creating DES from

Generic Extraction System (GES). GES provides a set

of generic IE operators that performs most common

IE tasks, e.g., regular expression extraction.

We implemented our approach as an Eclipse based

(http://www.eclipse.org) graphical modeling tool –

AdaptIE– for both the IE experts as well as the domain

experts. To demonstrate beneﬁts of our approach we

apply it in a scenario of text extraction of an SDN fo-

rum (http://forums.sdn.sap.com). AdaptIE was also

used to construct IE plans used for entity recognition

in a retrieval system presented in (Brauer et al., 2009).

Our contributions are: we show how a concept of

domain speciﬁc languages can be applied in IE re-

search to foster collaboration between different kinds

of experts in order to simplify creation of DES. More-

over we provided a process model for creating DES

from GES and special-purpose languages for IE ex-

traction task modeling for each of the experts. Our

work can be seen as an effort to ﬁnd synergies be-

tween different research areas to bring new value to

information systems research stated in the claremont

report (Agrawal et al., 2008). In particular, authors

of the report name as one of the challenges, the de-

velopment of new declarative languages that enable

non-experts (in our case non-IE experts) in develop-

ing programs (IE plans, respectively) for complex in-

formation systems (IE system).

We start our discussion by identifying related

work in Section 2. Our work relies on a generic IE

framework that we introduce in Section 3. We fur-

ther generalize and describe in detail a process for cre-

ating IE systems for different target domains in Sec-

tion 4. Section 5 provides insight into AdaptIE, which

is built on top of an algebraic IE framework. We val-

idate our concepts by considering the problem of ex-

tracting products and error messages from the SDN in

(Section 6). Section 7 concludes and discusses future

directions of our work.

2 RELATED WORK

IE frameworks. GATE (Bontcheva et al., 2004) and

UIMA (Ferruci and Lally, 2004) proposed architec-

tural approaches for Information Extraction systems

which at that time provided a common infrastructure

for IE. Both provide an object oriented framework

for black box composition of Information Extraction

modules in a pipeline. However, combined mod-

ules have to share the same semantics and incorporate

information extraction logic and domain knowledge

within the code itself.

IE languages. To overcome the drawbacks of archi-

tectural approaches, recent work propose languages

for operator composition and separation of domain

knowledge from the code. Notable approaches in this

area are (Reiss et al., 2008) and (DeRose et al., 2007).

Reiss et al. proposed an SQL like language, called

AQL, to combine a very small set of well deﬁned al-

gebraic operators (Reiss et al., 2008). The aim is to

simplify using Information Extraction technology and

bring it closer to non IE experts familiar with SQL. The

drawback of this approach is the limitation for IE ex-

perts to plug custom operators. DeRose et al. present

a concept of extraction plans as a representation of

extraction programs composition. This approach was

extended by a Datalog like language for plan creation

- XLOG (Shen et al., 2007). The authors state that

XLOG might help IE engineers to speed up IE plans

constructions by relaying on the clear semantic of

logic programming language. Our approach extends

these approaches in three major points. First, we pro-

vide a clear separation between extraction logic and

domain knowledge without limiting IE experts in ex-

tending the extraction logic by introducing new op-

erators. Next, we provide a mechanism and user

interfaces for customizing and composing operators

into easy understandable extraction programs, pre-

compiled via code generation and ready to use by

the domain users. Last, we enable domain experts,

who are not familiar to IE, to contribute their domain

knowledge.

MDA and MDE. We realize our approach using do-

main speciﬁc languages (DSL) and model driven ar-

chitecture (MDA). The foundations of metamodeling

in general, independent from a particular technology

are presented in (Favre, 2004a). A metamodel is in-

troduced there and incrementally reﬁned in (Favre,

2004b). Petrasch et al. explain in (Petrasch and

Meimberg, 2006) in detail the MDA standard deﬁned

by the Object Management Group and lays down the

fundamentals of model-driven software architecture.

AdaptIE tool is used to create domain speciﬁc lan-

guages for each of the users.

3 CONCEPTS AND

BACKGROUND

This section presents in detail an algebraic extraction

framework, which is introduced in (Barczynski et al.,

ICEIS 2010 - 12th International Conference on Enterprise Information Systems

250

2009). The framework provides a set of generic oper-

ators and algebra which deﬁnes how to compose and

extend them in order to perform extraction. This clear

semantic allows us to make our approach generic for

all operators, which are described in the terms of the

framework. Moreover we introduce domain indepen-

dent and domain speciﬁc operators. As mentioned in

Section 2, there are other algebraic IE frameworks,

which could be used as well. Operators work on

annotations, which are extracted parts (fragments of

text) of a document, such as the title of a HTML page

or a recognized product. They contain semantic meta-

data: the entity type (e.g., SAP Product), and the ex-

tracted entity itself (e.g., NetWeaver 2004s) with, if

available, a unique id (e.g., an OKKAM Id (Bouquet

et al., 2008)). An operator takes as input a set of anno-

tations and returns new ones. Every operator has a set

of parameters for customizing its functionality. Those

parameters can be general (e.g., name and id) and do-

main speciﬁc (e.g., regular expression for ErcRegEx).

Table 1: Example of operators in the IE framework.

Operator Name Description

Import Imports documents from a given data source.

Basic Operators

ErcRegEx Extracts text using a regular expression.

RxR Extracts text between two regular expressions.

SentenceEx Detects sentences.

NounGrpEx Creates annotations for nouns groups.

ErcDict Extracts using a dictionary.

Relation Operators

ErcRel Relates annotations with objects in structured data.

Relate Combines two entities to one complex entity

Set Operators

Group Groups annotations by document id.

Operators. Each framework operator (see Table 1)

is considered atomic if it conducts a single and in-

divisible task in IE. Tasks include extracting entities,

identifying relationships, or combining extracted en-

tities. In our framework we distinguish the follow-

ing groups of atomic operators: Basic operators are

used to extract entities from unstructured documents.

Grammatical rules or dictionaries can be used for this

purpose. Relation operators combine extracted anno-

tations to complex entities or map complex entities

to structured data. Set operators, known in query lan-

guages, such as SQL, allow applying group, union and

aggregation on extracted entities. Complex operators

- atomic operators (as well as complex operators) can

be chained together to build complex extraction op-

erators. Complex operators encapsulate a complete

task in the process of IE (e.g., extracting the relation

between product and error message).

Domain adaptation. Both types of operators are ab-

Figure 1: Instantiating a GES to DES.

stract and only provide a skeleton (structure and func-

tionality), but without conﬁguration, they lack the do-

main knowledge necessary for the execution (see Fig-

ure 1). By using a generic approach for IE (Generic

Extraction System - GES) we gain extensibility and

generality. But this gain comes along with the need to

provide additional information in the framework - do-

main adaptation, user customization and composition

(see Figure 1). Such a customized framework we call

Domain Speciﬁc Extraction System (DES). As such

we speak of domain-independent operators and after

being provided with domain speciﬁc values for pa-

rameters of domain-speciﬁc operators. The method-

ology for creating DES is provided in Section 4.

4 USER-CENTRIC IE

This section provides a classiﬁcation of user roles in

IE, followed by an overview and an explanation of

our approach. Speciﬁcally we explain the sequence

of steps, called IE (domain) adaptation process.

4.1 IE Adaptation Process

Users of IE Systems. Different users are involved in

the process of extracting data and as a consequence,

they use IE systems to fulﬁll different tasks. We clas-

sify users by their level of expertise and distinguish

two such types. We leave out end users that use a

completed DES because this issue does not bring any-

thing new to our discussion.

First, we have IE experts (e.g., IE Consultant and

IE Engineer) that are familiar with the concepts of IE.

They know about NLP, operators, about variability of

their settings, and also know about the right order in

which they have to be arranged (e.g., noun grouping

before applying other extraction operators). These

users are able to create extraction plans by combin-

ing and conﬁguring the IE operators appropriately.

Second, there are domain experts (e.g., Product

Manager). They concentrate on their speciﬁc area and

AdaptIE - Using Domain Language Concept to Enable Domain Experts in Modeling of Information Extraction Plans

251

Figure 2: IE adaptation.

contribute their domain knowledge rather than focus

on the extraction task itself. Regarding IE, they do not

need to be aware of system internals. They usually do

not have IE expertise and they are, thus, not able to

deal with the complexity of IE. They need to formu-

late complex queries against unstructured data. They

require a tool, which let to focus on their task.

IE adaptation Process. Figure 2 shows an overview

of our approach. The base is an IE framework (GES)

- see Figure 2a. As the domain expert (Figure 2c)

does not have the expertise to deal with the complex-

ity of creating a complex extraction program, the IE

expert (Figure 2f) uses an adaptation process (Fig-

ure 2b) to prepare and pre-conﬁgure operators for the

domain expert. It is the initial step; afterwards whole

process consists of iterations over IE plan by both

experts. First the IE expert prepares operators with

generic conﬁguration to the domain expert. Adapta-

tion consists of combining and adjusting atomic oper-

ators appropriately as well as performing a customiza-

tion tailored to speciﬁc domain expert’s needs. Ad-

ditionally – in next iterations after receiving feedback

from the domain expert – the IE expert tunes operators

based on his IE knowledge, e.g., by setting caching

strategies. As such, a set of domain-speciﬁc extrac-

tion operators can be created and stored in a reposi-

tory. The adaptation is done by using an expert graph-

ical interface. As a result domain expert is provided

with necessary operators in his IE modeling language.

The language uses familiar vocabulary (e.g., names

Figure 3: Actions necessary to provide DES.

of operators) and hides all ﬁne-grained tuning param-

eters. Using this language, the domain expert con-

tributes his domain knowledge (e.g., knowledge about

the structure of forum threads) using an intuitive GUI

(see Figure 7), which allows modeling concrete ex-

traction programs. Subsequently, a domain expert can

import and export previously deﬁned (complex) oper-

ators from repository (Figure 2e). Moreover an IE ex-

pert can access adapted operators to speciﬁc domain.

Final result of several iterations is a DES, depicted

in Figure 2d.

4.2 Users’ Roles and Actions

Several steps have to be performed to incorporate the

needed information in the GES, and to create a DES.

Those steps form an adaptation process which is de-

picted in Figure 3 (see also Figure 2b and Figure 2e).

In the following we will discuss the necessary steps

and focus on interaction between IE and domain ex-

perts.

Template Deﬁnition. The IE expert is solely respon-

sible for deﬁning templates. Templates are a mean to

describe the structure of each operator type. They can

be written by deﬁning XML-ﬁles by hand or by using

the IE expert interface. Parameters, their types, and

default values can be deﬁned. Every template con-

sists of parameters common to all operators (e.g. id,

name, and version), but also allows the deﬁnition of

operator-speciﬁc parameters. The usage of an opera-

tor may be limited to a certain context. For instance,

in contrast to other operators, the Import-operator

must not have predecessor operators, or ErcDict, the

dictionary-based operator needs to be connected to an

external data source providing the dictionary. Tem-

plates serve three purposes. First, they decouple IE

plan modeling from the concrete implementation in

the IE system. Second, they help to automate the pro-

cess of extending the system with new operators (add

to the tool GUI) and third, they constrain the applica-

tion of an operator.

Domain Adaptation. An IE system needs to be

conﬁgured appropriately for one application domain,

thus, domain-speciﬁc information has to be added.

Reﬁning Operators. Reﬁning operators means to pro-

ICEIS 2010 - 12th International Conference on Enterprise Information Systems

252

vide values for parameters contained in atomic op-

erators. We distinguish common parameters (such

as name, and version), performance parameters (e.g.

caching, memory usage), and domain parameters (e.g.

location of dictionary). Parameters may be left un-

speciﬁed by the IE expert, in that case, we speak of

partial reﬁnement.

Composing Operators. Operators might require (e.g.,

for achieving higher precision) the execution of other

operators before they can be used. This usually in-

cludes operators for importing data or for natural lan-

guage (pre-) processing, such as annotating sentences

and noun groups. Thus, it is possible to combine

atomic extraction operators in the right order to model

complex extraction operators or to combine atomic

and complex operators to build a complete extraction

program. This is called operator composition and can

be done before or after operator reﬁnement.

Both IE experts and domain experts are responsi-

ble for performing the domain adaptation. For the ex-

pert it means to translate requirements formulated by

a domain expert to the world of IE and (partially) pre-

conﬁgure domain-speciﬁc operators. The domain ex-

pert, based on his domain knowledge, completes the

partial reﬁnement and models the ﬁnal extraction plan

using a DES (Figure 2d).

Expert Conﬁguration. The IE expert, based on his

knowledge or in response to the domain expert re-

quest, performs ﬁne-grained tuning. For example, he

redeﬁnes operators for achieving higher recall.

User Customization. After providing the domain-

knowledge, the IE expert performs ﬁnal adjustments

to hide the complexity and the details from the do-

main expert, thus customizing the system to domain

expert requirements. We introduce here different cat-

egories of customization and how they are performed

to provide user-centric IE. We denote those categories

dimensions of customization. Customization can be

applied on the following dimensions:

Operator Parameters. We allow the IE expert to hide

parameters, set them read-only and provide default

values. This provides a higher level of abstraction for

the domain expert.

Operator Composition. Complex operators are cre-

ated by combining atomic operators. They hide inter-

nal structure and complexity of extraction ﬂows. Cer-

tain steps of the IE process are not of interest for the

domain expert but nevertheless important and needed

in overall process.

Operator Documentation. As soon as an operator is

conﬁgured and provided with parameters, it is ready

Figure 4: Model transformation is used to implement the

user customization.

to be used in a domain-speciﬁc context. As such, the

semantics of the operator speciﬁed. This should be

reﬂected in its description, help, and examples. We

use a description-attribute for this purpose.

Debug and Runtime Feedback. If the IE expert

changes the IE language for the domain expert it

should affect also error and debug messages. The do-

main expert should see ”Product extractor failed...”

instead of ”ErcRgx operator...”. This dimension is

changed implicitly by changes in the IE language.

DES. The domain expert uses the DES to build ex-

traction programs. He can model extraction plans us-

ing prepared complex operators. Besides, if there is

a need, he can contact the IE expert and request for

the tuning the operators. The IE expert will perform

one of the actions shown in Figure 3. We allow the

domain expert to perform domain adaptation by oper-

ator composition and parameter adjustments.

Operator Persistence. All domain-speciﬁc operators

are persisted in an operator repository. The major aim

of an operator repository is to store operators, for fu-

ture retrieval and reuse. Domain-speciﬁc operators

are associated and persisted within the domain they

are applied to. Domain experts can choose a domain

they want to work with, and load a proﬁle containing

all the operators for this speciﬁc domain.

5 IMPLEMENTATION

AdaptIE tool is built on top of Eclipse Modeling

Framework. We use different metamodels to provide

abstract syntax for our extraction languages. Addi-

tionally, we use GMF (GMF, 2009) to provide con-

crete syntax through graphical user interfaces. Fig-

ure 4 shows the 4-layer metamodel architecture. Ele-

ments on one layer are said to be instances of elements

from the above layer and specify elements on the layer

below. On M0 are concrete IE plans modeled in UIs in

AdaptIE - Using Domain Language Concept to Enable Domain Experts in Modeling of Information Extraction Plans

253

Figure 5: IE expert (a) and domain expert (b) metamodels

(simpliﬁed).

M1. On the very left in Figure 4 (level M1) we have

extraction programs or complex operators conform-

ing to the expert metamodel. The M2 is described by

means of its metamodel - M3 (Ecore (EMF, 2008)).

The same applies for composition programs written

using the end-user (domain expert) DSL. Those pro-

grams conform to the end-user metamodel.

In this work we analyze concrete instances of “ex-

pert operators” located on M1. We generate elements

for the end-user metamodel, which resides on M2 i.e.,

we need to cross the layers. The transformation is vi-

sualized by the dashed bold arrow in Figure 4. We de-

scribe model transformations by deﬁning relations be-

tween two models. The mapping between source and

target model is described in terms of their metamod-

els i.e., one level higher than input and output model.

In our case we describe rules using Ecore elements.

The transformation itself has to be deﬁned as a

model conforming to a metamodel deﬁning model

transformation semantics. Java and the ATLAS Trans-

formation Language (ATL) (ATLAS, 2006) provides

means to deﬁne and execute a model transformation.

In the prototype we used Java, but we consider to use

ATL in the next versions of AdaptIE. An excerpt of

the IE expert metamodel is shown in the UML class

diagram in Figure 5a. Intuitively, it allows model-

ing an instance of ComplexOperator by combining

AbstractOperators with Connections. The User

Customization class deﬁnes a user-centric adjust-

ment. The domain expert metamodel (Figure 5b) al-

lows to nest InnerOperators until a LeafOperator

(representing an atomic operator) is used, which does

not allow further nesting.

We use model transformation for two purposes:

First, it allows combining complex operators created

in the expert language using the domain expert lan-

guage. Second, we can use model transformation

for implementing the user customization. This takes

the form of analyzing concrete instances of the ex-

pert metamodel and generating elements for the do-

main expert metamodel. For example, a parameter

marked ”invisible” will not appear in the domain ex-

pert model.

6 EVALUATION

As a method to proof applicability of our method-

ology, we have selected a scenario based evaluation

(Hevner et al., 2004). Because we want to show that

our approach can be implemented and used to solve

issues in a real world scenario.

Scenario. We consider analysis of relation between

SAP Products and error messages (Java and ABAP

exceptions) in SDN forum. Results of such analysis

can help to understand which kind of problems de-

velopers and consultants encounter. In this scenario a

product manager is our domain expert. He knows the

forum, conventions used by its members well. More-

over he has access to structured data about products,

error messages, etc. He starts by selecting the data

source for analysis, e.g., RSS feed. The manager con-

ﬁgures the Import-operator himself. He provides the

number of items to be imported and the database con-

nectivity information. The next step is to extract prod-

ucts and error messages. As the domain expert is

not interested in how to extract things, therefore he

uses an operator repository and search extractors for

his domain. There are available Java, and ABAP ex-

ceptions extractors, but no operator for SAP Product.

Therefore he contacts an IE expert, and asks him for

such an operator. Moreover he provides the IE expert

with a taxonomy about SAP Products - SAP Terms

database. SAP Terms database contains all terminolo-

gies used in documentation about a product and list of

product’s subcomponents.

The IE expert does not need to understand SDN

forum data, so she can focus on her task to provide

an operator, which is able to use the taxonomy. Us-

ing AdaptIE, she composes operators as shown in

Figure 6b. The IE expert starts by applying pre-

processing operators for a morphological analysis.

First, she extracts sentences (SentenceEx) followed

by noun groups (NounGrpEx), thus, rejecting preposi-

tions, conjunctions, and relative pronouns that are not

considered important. Next, she applies ErcDict, re-

alizing a connection to the SAP Term database. The

ICEIS 2010 - 12th International Conference on Enterprise Information Systems

254

Figure 7: Visual tools for IE expert (a) and end-user (b).

Figure 6: (a) Plan for extracting products and errors

(b) Complex operator (IE Expert’s view).

IE expert provides database access information and

decides to set the isCached property to keep ex-

tracted entities in memory thus reducing database ac-

cess time. Next, she saves the new IE plan as a oper-

ator and store it in repository. The IE expert speciﬁes

additionally that a domain expert will see SAP prod-

ucts as a atomic operator.

Now, the product manager can import the ready-

to-use operators and connect them to build the ﬁ-

nal extraction program (Figure 6a). He knows that

most of forum members put in their post about prob-

lems SAP products and error names together in the

HTML title. Moreover he knows that his product

is used in two scenarios with software components

written in Java and Abap. Therefore he selects

to extract both Java and ABAP exceptions. Fig-

ure 6 shows the complete extraction program mod-

eled by the product manager, and the complex oper-

ator for identifying SAP Products as modeled by the

IE expert. If operators do not perform as expected,

he may consult the expert again. So he does not

need to spend a lot of time trying to ﬁnd a mistake

in, e.g., regular expression for Java Exceptions ex-

traction: ([a-zA-Z]*\\.)*[a-zA-Z]+(Exception

|Error)[a-zA-Z]*. As such, they can both work on

reﬁnement iteratively and use the operator repository

(powered by the transformation engine) for exchang-

ing operators.

Tools. The current implementation of AdaptIE in-

cludes tools for the IE and domain experts. Figure 7a

shows the IE expert’s tool. It allows to create ex-

traction programs and to model complex operators.

The expert has access to all parameters and conﬁgu-

ration options. A low-level view on operators allows

ﬁne-tuning the execution of complex IE plans. Fur-

thermore, the tool allows him to perform a user cus-

tomization, i.e., renaming operators and parameters,

setting parameters, marking them mandatory, visible,

or read-only as well as providing default values for

the end-user. Once operators have been created, they

can be associated with a target domain and archived

in the repository.

The domain expert tool (Figure 7b) allows for

building the ﬁnal extraction program by combining

previously deﬁned domain-speciﬁc operators. Access

AdaptIE - Using Domain Language Concept to Enable Domain Experts in Modeling of Information Extraction Plans

255

to the repository and import of complex operators

is supported. We are currently investigating a more

declarative UI, which follows the document structure.

7 CONCLUSIONS

To summarize, this paper contribute an IE language

platform that can be a base for involving domain ex-

perts in IE plan modeling. We presented a process

for creating a domain-speciﬁc IE system. We showed

how our approach can be implemented using MDE

and a generic IE framework. As a proof of concept

we have developed AdaptIE and show how it can be

successfully applied in a real world scenario. As a fu-

ture work, we want to continue to bring IE to causal

users in two directions. The ﬁrst direction is further

investigation on IE languages for non IE expert users.

The second direction is work on automatic genera-

tion (driven by information available in, e.g., database

schema or descriptions of OLAP cubes) of IE plan to

simplify performing ad-hock Business analysis over

unstructured data.

REFERENCES

Agrawal, R., Ailamaki, A., Bernstein, P. A., Brewer, E. A.,

Carey, M. J., Chaudhuri, S., Doan, A., Florescu, D.,

Franklin, M. J., Molina, H. G., Gehrke, J., Gruenwald,

L., Haas, L. M., Halevy, A. Y., Hellerstein, J. M., Ioan-

nidis, Y. E., Korth, H. F., Kossmann, D., Madden, S.,

Magoulas, R., Ooi, B. C., O’Reilly, T., Ramakrishnan,

R., Sarawagi, S., Stonebraker, M., Szalay, A. S., and

Weikum, G. (2008). The claremont report on database

research. SIGMOD Rec., 37(3):9–19.

ATLAS (2006). Atlas Transformation Language (ATL)

User Manual v0.7. Nantes.

Barczynski, W. M., Brauer, F., Loeser, A., and Mocan, A.

(2009). Algebraic information extraction of enterprise

data: Methodology and operators. In IK-KR Workshop

at 20th International Joint Conference on Artiﬁcial In-

telligence 2009 (to be published).

ezivin, J. and Heckel, R., editors (2005). Language

Engineering for Model-Driven Software Develop-

ment, 29. February - 5. March 2004, volume 04101

of Dagstuhl Seminar Proceedings. Internationales

Begegnungs- und Forschungszentrum f

ur Informatik

(IBFI), Schloss Dagstuhl, Germany.

Bontcheva, K., Tablan, V., Maynard, D., and Cunningham,

H. (2004). Evolving gate to meet new challenges in

language engineering. Natural Language Engineer-

ing, 10(3-4):349–373.

Bosch, J. and Dittrich, Y. (2004). Domain-Speciﬁc Lan-

guages for a Changing World. http://www.ide.hk-

r.se/ bosch/papers/dslincw.ps.

Bouquet, P., Stoermer, H., Niederee, C., and Mana, A.

(2008). Entity Name System: The Backbone of an

Open and Scalable Web of Data. In ICSC 2008,

number CSS-ICSC 2008-4-28-25 in CSS-ICSC, pages

554–561. IEEE Computer Society.

Brauer, F., Barczynski, W., Hackenbroich, G., Schramm,

M., Mocan, A., and Foerster, F. (2009). Rankie:

Document retrieval on ranked entity graphs (demo).

In 35th conference International Conference on Very

Large Data Bases (VLDB) 2009.

DeRose, P., Shen, W., 0002, F. C., Doan, A., and Ramakr-

ishnan, R. (2007). Building structured web commu-

nity portals: A top-down, compositional, and incre-

mental approach. In (Koch et al., 2007), pages 399–

410.

EMF (2008). Eclipse Modeling Framework. Documen-

tation available at http://www.eclipse.org/modeling/

emf/.

Favre, J.-M. (2004a). Foundations of meta-pyramids: Lan-

guages vs. metamodels - episode ii: Story of thotus

the baboon1. In (B

ezivin and Heckel, 2005).

Favre, J.-M. (2004b). Foundations of model (driven) (re-

verse) engineering : Models - episode i: Stories of

the ﬁdus papyrus and of the solarus. In (B

ezivin and

Heckel, 2005).

Ferruci, D. and Lally, A. (2004). Uima: an architectural ap-

proach to unstructured information processing in the

corporate research environment. Natural Language

Engineering, 10(3-4):327–348.

GMF (2009). http://gmf.eclipse.org.

Hevner, A. R., March, S. T., Park, J., and Ram, S. (2004).

Design science in information systems research. MIS

Quarterly, 28(1).

Koch, C., Gehrke, J., Garofalakis, M. N., Srivastava, D.,

Aberer, K., Deshpande, A., Florescu, D., Chan, C. Y.,

Ganti, V., Kanne, C.-C., Klas, W., and Neuhold, E. J.,

editors (2007). Proceedings of the 33rd International

Conference on Very Large Data Bases, University of

Vienna, Austria, September 23-27, 2007. ACM.

Petrasch, R. and Meimberg, O. (2006). Model Driven Archi-

tecture Eine praxisorientierte Einfhrung in die MDA.

dpunkt.verlag.

Reiss, F., Raghavan, S., Krishnamurthy, R., Zhu, H., and

Vaithyanathan, S. (2008). An algebraic approach to

rule-based information extraction. In ICDE, pages

933–942. IEEE.

Sarawagi, S. (2008). Information extraction. Foundations

and Trends in Databases, 1(3):261–377.

Shen, W., DeRose, P., McCann, R., Doan, A., and Ramakr-

ishnan, R. (2008). Toward best-effort information ex-

traction. In SIGMOD ’08: Proceedings of the 2008

ACM SIGMOD international conference on Manage-

ment of data, pages 1031–1042, New York, NY, USA.

ACM.

Shen, W., Doan, A., Naughton, J. F., and Ramakrishnan, R.

(2007). Declarative information extraction using dat-

alog with embedded extraction predicates. In (Koch

et al., 2007), pages 1033–1044.

ICEIS 2010 - 12th International Conference on Enterprise Information Systems

256