A MDE APPROACH FOR LANGUAGE ENGINEERING

Francisco Gortázar, Abraham Duarte and Micael Gallego

Department of Computer Science, Universidad Rey Juan Carlos, Tulipán, Móstoles, Madrid, Spain

Keywords:

Model Driven Engineering, Language Engineering, abstract syntax, concrete syntax, DSLs.

Abstract:

Many development tools in modern Integrated Development Environments (IDEs) make intensive use of abstract syntax tree (AST) representations of the software. This is the case for refactoring tools, code formatters, and content assistants, among others. Such an AST is usually an instance of an object-oriented abstract syntax model. We propose to center Language Engineering (LE) on this model. We propose to use UML as the abstract syntax metamodel because UML tools provide code generators that implement models in different programming languages. Besides an abstract syntax, a language also needs a concrete syntax. We are concerned with textual languages, whose concrete syntax is usually given as a BNF grammar. Instead, we propose to stereotype the abstract syntax model by means of a profile aimed at concrete syntax definition. By applying Model Driven Engineering (MDE) practices, several development artifacts can be automatically generated.

1 INTRODUCTION

Development tools in modern IDEs rely on ASTs of the program (Boshernitsan, 2001; Clark et al., 2004; Herranz and Nogueira, 2005). An AST is a representation of the program that is being edited. Development tools that rely on ASTs to perform their tasks include refactoring tools, design pattern extractors, call graph visualizers, type hierarchy visualizers, and content assistants, among others. IDEs with AST-based tools include Eclipse (http://www.eclipse.org), NetBeans (http://www.netbeans.org), and IntelliJ IDEA (http://www.jetbrains.com/idea/).

ASTs are generally object oriented and conform to the abstract syntax model (Vainsencher and Black, 2006; Jones, 2003). The abstract syntax model represents the abstract syntax of the language. We propose to make the abstract syntax model the central piece of LE. We then follow an MDE approach to generate different development tools. MDE is a Software Engineering methodology focused on the integration of bodies of knowledge from different research communities (Favre, 2004). The MDE approach is strongly based on models and on the model-to-model transformations that drive application generation.

UML is a well-suited modeling language for abstract syntax model definition. It is a standard language for object-oriented modeling, and several tools support it. Furthermore, UML tools provide code generators that can produce code implementing the models in different programming languages, such as Java, C#, and C++.

Languages consist of an abstract syntax and a concrete syntax. There are two kinds of concrete syntaxes: graphical and textual. We are concerned with textual syntaxes. Thus, our proposal is aimed at generating development tools for textual languages. The concrete syntax is usually represented by means of some form of BNF context-free grammar. These grammars are used in reference manuals as the de facto standard for defining the concrete syntax of languages (Paakki, 1995).

Gortázar F., Duarte A. and Gallego M. (2007). A MDE APPROACH FOR LANGUAGE ENGINEERING. In Proceedings of the Second International Conference on Evaluation of Novel Approaches to Software Engineering, pages 80-86. DOI: 10.5220/0002586800800086. SciTePress.

Some advantages of using BNF grammars in language definitions follow:

• Context-free grammars, and BNF grammars in particular, are well known to language engineers.

• Context-free grammars have a long tradition in compiler construction, solid underlying theories, and a large body of documentation (Blasband, 2001).

• There is a large number of automatic tools aimed at compiler construction, such as lexer and parser generators.

However, grammars as a concrete syntax definition formalism also present some drawbacks:

• Duplication of information. The structure of the language is represented in both the grammar and the abstract syntax model. Keeping both specifications synchronized is error-prone.

• Structures of the grammar need to be transformed into their counterparts in the abstract syntax model. For instance, lists and boolean properties in the abstract syntax have different representations in the grammar.

To take advantage of the MDE methodology for Language Engineering, some authors propose a metamodel for context-free grammar definition (Wimmer and Kramler, 2005; Muller et al., 2006; Fondement et al., 2006). Grammars are then defined by means of models that conform to such a metamodel. However, this approach does not solve the problem of keeping the abstract and concrete models in sync.

Instead, we propose to stereotype the abstract syntax model by means of a UML profile (Gortázar et al., 2007). This profile is aimed at concrete syntax definition. Our approach is compatible with grammar metamodels, because models based on those metamodels can be automatically generated from the annotated abstract syntax model. Although defining the abstract and concrete syntaxes in the same model limits the language to a single concrete syntax, having more than one concrete syntax for a language is unusual. In return, unifying both definitions in one model keeps both syntaxes in sync. It is possible to use a single model because some constructions of both domains are equivalent, such as lists or inheritance, among others (Wimmer and Kramler, 2005; Alanen and Porres, 2003; Antoniol et al., 2003; Hedin and Magnusson, 2003; Lieberherr, 2005; Wile, 1997).

In this paper we propose to apply MDE to the automatic generation of development tools from a specification given in UML, thus bridging Model Driven Development and Language Engineering. For this purpose, we propose a new concrete syntax specification to be used within a model-driven approach.

2 ABSTRACT SYNTAX TREES

Abstract syntax represents the structure of the language and is present in every computer language (Boshernitsan, 2001; Clark et al., 2004; Herranz and Nogueira, 2005). An AST represents the structure of a program as a tree. ASTs hide syntactic details such as reserved words or punctuation symbols. The most common ASTs are object oriented (Jones, 2003). In modern IDEs, ASTs are a central repository, and development tools rely heavily on them (Figure 1).

Codeformatters

View#1

Softwaremetrics

Refactorings

Searchtools

Editors

View#2

View#3

AST

Figure 1: AST dependencies in modern IDEs.
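As a rough illustration of a tool that works purely against an object-oriented AST, the following Python sketch defines a few node classes and a toy metric that walks the tree. The names `Node`, `Call`, `Block`, `walk`, and `count_calls` are hypothetical and are not taken from any IDE. Note that the nodes store only structure, not keywords or punctuation.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """Base class of a hypothetical abstract syntax model."""

    def children(self) -> List["Node"]:
        return []


@dataclass
class Call(Node):
    callee: str  # only structure is kept; "(", ")" and ";" are not stored


@dataclass
class Block(Node):
    statements: List[Node] = field(default_factory=list)

    def children(self) -> List[Node]:
        return list(self.statements)


def walk(node: Node) -> Iterator[Node]:
    """Yield the node and all of its descendants in pre-order."""
    yield node
    for child in node.children():
        yield from walk(child)


def count_calls(root: Node) -> int:
    """A toy 'software metric': the number of call nodes in the tree."""
    return sum(1 for n in walk(root) if isinstance(n, Call))


tree = Block([Call("open"), Block([Call("read"), Call("close")])])
```

Other tools, such as a search tool or a code formatter, could reuse the same `walk` traversal, which is one reason the AST acts as a shared repository.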

Because ASTs are so important in modern IDEs, language specification sometimes starts by modeling the abstract syntax, with the concrete and abstract syntaxes defined separately. Examples include SableCC, MPS, and XMF-Mosaic, among others.

AST quality is gaining importance as a result of the open architecture of IDEs, since third parties can contribute development tools to the IDE by means of plug-ins. These plug-ins interact with the AST to perform their tasks. Thus, the abstract syntax model has to be comprehensible and easy to use (Bloch, 2006).

Figure 2 shows package dependencies of three Eclipse plug-ins for development support in three different programming paradigms (imperative, object-oriented, and functional). The dependencies shown in model a) correspond to the EclipseFP project, a framework that provides Haskell support in Eclipse. The Halamo subpackage, in the Core package, contains the Haskell abstract syntax model.

The dependencies shown in model b) correspond to the Java Development Tools (JDT), a set of plug-ins that support Java programming in the Eclipse environment. The DOM subpackage, in the Core package, contains the Java abstract syntax model.

SableCC 3.0: http://sablecc.org

Java Development Tools (JDT): http://www.eclipse.org/jdt


[Figure 2: package dependency diagrams in three panels, a), b), and c).]

Figure 2: Some package dependencies from different development tools.

The dependencies shown in model c) correspond to the C/C++ Development Tooling, which provides C/C++ support in the Eclipse environment. The DOM subpackage, in the Core package, contains the C/C++ abstract syntax model.

As can be seen in Figure 2, most development tools included in the three plug-ins rely on the abstract syntax model. This is the case for editors, refactoring tools, content assistants, and code formatters, among others.

3 MODEL DRIVEN LANGUAGE ENGINEERING: METACET’S APPROACH

Our proposal is based on the abstract syntax model of the language, as it is an essential part of development tools. We propose a methodology, called MetaCET, for textual language design and automatic generation of development tools. This methodology applies MDE principles to language development. MetaCET is based on modeling the abstract syntax of the target language in UML. The concrete syntax is provided by means of a UML profile aimed at this task, called the Concrete Syntax profile. This profile is described elsewhere (Gortázar et al., 2007).

We call the resulting model the language model. From this model, several tools aimed at development in the target language can be automatically derived. Figure 3 shows the general approach of MetaCET.

In the rest of this section, we use as an example the Statechart language, as defined in (Fondement et al., 2006). Our intention is to obtain a parser for this language by applying MDE principles to language engineering. An explanation of each step follows.

[Figure 3: diagram with the elements Abstract Syntax Model, Concrete Syntax Profile, Language Model, and Artifact.]

Figure 3: An overview of the approach.

3.1 Modeling the Abstract Syntax of the Target Language

Given the importance of ASTs in modern IDEs, we propose a Language Engineering approach that is focused on the abstract syntax model. We use UML as the abstract syntax metamodel. Our intention in doing so is to take advantage of the code generation usually available within UML tools. Furthermore, UML is the de facto standard modeling language for object-oriented modeling.

Any abstract syntax model contains language concepts, represented as UML classes or interfaces. Relations between concepts are represented as UML associations. Finally, basic properties of concepts are represented as attributes with basic data types. Figure 4 presents an example, taken from (Fondement et al., 2006), of the abstract syntax of a statechart language. This language will be used to illustrate our proposal.

ENASE 2007 - International Conference on Evaluation on Novel Approaches to Software Engineering

[Figure 4: UML class diagram. Classes: StateMachine, StateVertex, State, CompositeState, SimpleState, PseudoState (attribute -kind: PseudoStateKind), Transition, and Event. Enumeration: PseudoStateKind (literal initial). Association ends: -transitions, -states, -container, -ingoing, -target, -outgoing, -source, -top, and -trigger; two of the ends carry the multiplicity 0..1.]

Figure 4: Fragment of an abstract syntax model.
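To make the mapping from UML to code concrete, the following Python sketch shows what an implementation of the statechart abstract syntax model could look like. It is not the output of any particular UML tool. The text only states that CompositeState and SimpleState specialize State; the remaining inheritance links, the typing of the association ends, and the `name` attributes are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class PseudoStateKind(Enum):
    INITIAL = "initial"


@dataclass(eq=False)
class Event:
    name: str  # assumed attribute


@dataclass(eq=False)
class StateVertex:
    ingoing: List["Transition"] = field(default_factory=list)
    outgoing: List["Transition"] = field(default_factory=list)


@dataclass(eq=False)
class Transition:
    source: StateVertex
    target: StateVertex
    trigger: Optional[Event] = None  # multiplicity 0..1

    def __post_init__(self) -> None:
        # Keep both ends of the bidirectional associations consistent.
        self.source.outgoing.append(self)
        self.target.ingoing.append(self)


@dataclass(eq=False)
class State(StateVertex):  # assumed to specialize StateVertex
    name: str = ""  # assumed attribute


@dataclass(eq=False)
class SimpleState(State):
    pass


@dataclass(eq=False)
class CompositeState(State):
    states: List[StateVertex] = field(default_factory=list)
    transitions: List[Transition] = field(default_factory=list)


@dataclass(eq=False)
class PseudoState(StateVertex):  # assumed to specialize StateVertex
    kind: PseudoStateKind = PseudoStateKind.INITIAL


@dataclass(eq=False)
class StateMachine:
    top: State


# An AST is then simply an instance of this model.
off = SimpleState(name="Off")
on = SimpleState(name="On")
switch = Transition(off, on, Event("switch"))
machine = StateMachine(CompositeState(name="Lamp", states=[off, on], transitions=[switch]))
```

Classes become concepts, associations become typed references or lists, and basic properties become attributes, which mirrors the three kinds of elements listed above.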

3.2 Modeling the Concrete Syntax of the Target Language

The concrete syntax of a language defines the notation used to write or build documents in that language. There are two kinds of notations: textual notations and graphical notations. In this paper we are concerned with textual notations.

We propose to annotate the abstract syntax model with the concrete syntax. This annotation is performed by means of stereotypes from a UML profile we have defined: the Concrete Syntax profile.

A UML profile is an extension mechanism provided by UML. It may contain information from a domain that is not supported directly by UML. A profile contains stereotype definitions which, when applied to elements of a UML model, provide the semantics of the domain.

In MetaCET, we provide a profile that contains the stereotypes necessary to augment an abstract syntax model with the concrete syntax. This stereotyped abstract syntax model is called the language model (see Figure 5).

Keeping the abstract and concrete syntaxes separate might make it possible to define different concrete syntaxes for the same abstract syntax. However, this is not very common. On the other hand, defining the language in a single model has some advantages, such as keeping both syntaxes in sync. Furthermore, there are several similarities between EBNF and object-oriented modeling (Wimmer and Kramler, 2005). For instance, inheritance ↔ alternation (Wimmer and Kramler, 2005; Alanen and Porres, 2003), associations with an upper-bound multiplicity of n ↔ EBNF repetitions (Alanen and Porres, 2003), and enumerations ↔ choice between static strings (Alanen and Porres, 2003), among others.
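The three correspondences can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Feature`, `ClassDef`, `class_rule`, and `enum_rule` are hypothetical names, the model is reduced to class names, subclasses, and typed features, and repetition is written with ISO-style braces.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Feature:
    name: str
    type: str
    many: bool = False  # True when the upper bound is greater than 1


@dataclass
class ClassDef:
    name: str
    subclasses: List[str] = field(default_factory=list)
    features: List[Feature] = field(default_factory=list)


def class_rule(c: ClassDef) -> str:
    """Derive an EBNF rule from the structure of one class."""
    if c.subclasses:
        # inheritance <-> alternation
        body = " | ".join(c.subclasses)
    else:
        # multi-valued association <-> repetition
        body = " ".join(f"{{ {f.type} }}" if f.many else f.type for f in c.features)
    return f"{c.name} ::= {body}"


def enum_rule(name: str, literals: List[str]) -> str:
    """enumeration <-> choice between static strings"""
    return f"{name} ::= " + " | ".join(f'"{lit}"' for lit in literals)


state = ClassDef("State", subclasses=["CompositeState", "SimpleState"])
composite = ClassDef("CompositeState", features=[Feature("states", "State", many=True)])
```

What such a derivation cannot recover from the class structure alone are the keywords and the order of the properties, which is the information the Concrete Syntax profile adds.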

The EBNF grammar corresponding to the language model shown in Figure 5 is shown in Figure 6. The inheritance relationship between State and its subclasses, CompositeState and SimpleState, is represented in the EBNF grammar as a choice rule (the State rule). Symbols on the left represent non-terminals, and symbols in bold represent literals.

Grammar details that cannot be represented directly in UML can be specified by means of the Concrete Syntax profile. This is the case for grammar terminals and for the arrangement of UML properties within the syntax definition of the class.

[Figure 5: UML class diagram of the language model. Contents recovered from the diagram:]

<<syntax>> <<root>> StateMachine {value = "StateMachine" name top, tokens = Tokens}
    <<tokenRef>> -name : String {value = identifier}

<<LanguageElement>> State
    <<tokenRef>> -name : String {value = identifier}
    <<syntax>> -initial : boolean {value = "initial"}

<<syntax>> CompositeState {value = initial "CompositeState" name "{" (states | transitions)! "}"}

<<syntax>> SimpleState {value = initial "State" name}

<<syntax>> Transition {value = "Transition" "from" source "to" target trigger}
    <<tokenRef>> -source : String {value = identifier}
    <<tokenRef>> -target : String {value = identifier}

<<syntax>> Event {value = name}

<<enumeration>> Tokens
    <<tokenDef>> identifier {pattern = [:jletter:][:jletterdigit:]*}

PseudoState
    -kind : PseudoStateKind

<<enumeration>> PseudoStateKind
    initial

StateVertex

Association ends: <<syntaxList>> -transitions *; -states; -container 0..1; -top 1; -ingoing *; -outgoing *; <<optional>> {previousSyntaxDescription = "on"} -trigger 0..1

Figure 5: An annotated abstract syntax model for the StateChart language.
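One way to picture how a stereotype with a tagged value attaches concrete syntax to a class is a Python decorator that stores the value on the class. This is a loose analogy and not the profile itself: `syntax` and `split_value` are hypothetical helpers, and the value strings are written with spaces between items for readability.

```python
import re
from typing import List, Tuple


def syntax(value: str):
    """Attach a <<syntax>>-like annotation, with its 'value', to a class."""
    def decorate(cls):
        cls.syntax_value = value
        return cls
    return decorate


@syntax('initial "State" name')
class SimpleState:
    def __init__(self, name: str, initial: bool = False):
        self.name = name
        self.initial = initial


@syntax('"Transition" "from" source "to" target trigger')
class Transition:
    pass


def split_value(value: str) -> List[Tuple[str, str]]:
    """Split a value into ('literal', text) and ('property', name) items."""
    items = []
    for quoted, word in re.findall(r'"([^"]*)"|(\w+)', value):
        # A quoted match leaves 'word' empty; a bare word is a property reference.
        items.append(("literal", quoted) if word == "" else ("property", word))
    return items
```

For `SimpleState`, `split_value(SimpleState.syntax_value)` separates the terminal `State` from the properties `initial` and `name`, which is the kind of information (terminals and property order) that the class structure alone does not carry.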

StateMachine::=<ID>State

State::=CompositeState

|SimpleState

CompositeState::=[]

(State|Transition)

SimpleState::=[]<ID>

Transition::=<ID>

<ID>[Event]

Event::=<ID>

StateMachine

initialCompositeState

initialState

Transitionfrom

toon

Figure 6: EBNF for the StateChart language.

3.3 Tool Generation

Several tools can be generated automatically from the language model. Such specifications are usually used to generate a parser. However, this is just one of many possible tools. For instance, a call graph visualizer could be generated from this model. The visualizer obtains its information by querying the AST.


Models for different tasks can be derived from the language model. These derived models can be tailored to specific domains such as design pattern detection, parser construction, or software metrics. At the end of the process, code must be generated from the lowest-level models, and code generators must be defined for this purpose. Consequently, low-level models must be specific enough, in terms of the target technology, for code generation to be possible.

Our proposal does not exclude other approaches. For instance, grammar models such as those defined in (Wimmer and Kramler, 2005) and (Alanen and Porres, 2003) can be automatically generated from the language model.

4 VALIDATION: A PARSER FOR AST CONSTRUCTION

This section presents a particular application of the methodology. We have stated that ASTs are key in modern development tools such as those present in common IDEs. Therefore, we have chosen to validate our proposal by automatically generating a parser for AST construction from the language model. The parser takes a document in the target language and builds its corresponding AST.

Our proposal for parser generation is based on three different levels (Figure 7). Each level requires different knowledge. This separation into levels allows experts in different domains to focus on different parts of the language engineering process. Finally, we build the parser using a parser generator or compiler-compiler. Parser generators provide their own language for parser specification, so we need to transform the language model into the appropriate specification. We use model-to-model transformations to derive such a specification.

4.1 First Level: Language Model

The first level is the specification of the language by means of the abstract syntax model annotated with the concrete syntax (the language model), as described in Section 3. This model is independent of any kind of grammar. The model might even be hard to process with the most common analysis (parsing) methods. However, this is not a key aspect at this level.

4.2 Second Level: Parser Model

The second level is the specification of a context-free grammar for the language. This grammar is represented by means of an object-oriented model, called the parser model. It is not just a grammar model, because elements in the model are related to elements in the abstract syntax model.

[Figure 7: diagram with the elements Language Model, Parser Model, Generator Model, Parser Profile, Generator Profile, Parser, and Code Generation.]

Figure 7: Parser development for a target language with the MetaCET’s approach.

The parser model is independent of the analysis method used. Problems common to several of these methods can be solved at this level. Thus, when models at the lower level are generated, they are free of such problems.

The parser model is obtained automatically from the language model by means of a model-to-model transformation. The transformation produces the parser model and annotates it with stereotypes from a profile we have defined, called Parser. Our profile has the same intention as other approaches, such as those of (Wimmer and Kramler, 2005; Muller et al., 2006; Fondement et al., 2006). In fact, the metamodels of those approaches could also be used as the target of our transformation.
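A minimal sketch of such a transformation is given below, under strong simplifying assumptions: the language model is reduced to a dictionary, the hypothetical `Rule` class stands in for the parser model, and its `builds` field plays the role of the link between a grammar element and the abstract syntax class it instantiates. The actual transformation and the Parser profile are not shown in this paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Rule:
    nonterminal: str
    alternatives: List[List[str]]
    builds: str  # abstract syntax class that this rule instantiates


def to_parser_model(language_model: Dict[str, dict]) -> List[Rule]:
    """Model-to-model step: one grammar rule per class of the language model."""
    rules = []
    for cls, info in language_model.items():
        if info.get("subclasses"):
            # A class with subclasses becomes a choice among them.
            alternatives = [[sub] for sub in info["subclasses"]]
        else:
            # Otherwise the annotated syntax gives a single alternative.
            alternatives = [list(info["syntax"])]
        rules.append(Rule(cls, alternatives, builds=cls))
    return rules


language_model = {
    "State": {"subclasses": ["CompositeState", "SimpleState"]},
    "SimpleState": {"syntax": ['["initial"]', '"State"', "<ID>"]},
    "Event": {"syntax": ["<ID>"]},
}
parser_model = to_parser_model(language_model)
```

Keeping the `builds` link is what distinguishes this model from a plain grammar: a later step can use it to decide which AST node a successful rule application should create.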

4.3 Third Level: Generator Model

The third level is the specification of the generator model. This model is tailored to a specific parser generator. The generator model is obtained automatically from the parser model. Two different model-to-model transformations have been defined. The first one produces a model for the JavaCC parser generator. The second one produces a model for the Cup parser generator.

The generator model is based on a specific kind of context-free grammar: LL, LALR, etc. For instance, JavaCC requires an LL grammar. Cup, on the other hand, requires an LALR grammar.
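One practical consequence of this difference is that left-recursive rules, which LALR generators accept, cannot be handled by LL generators. The following Python sketch checks a grammar for direct left recursion. It is an assumption-laden illustration, not part of MetaCET: the `Grammar` representation and the function name are hypothetical, and indirect left recursion is not detected.

```python
from typing import Dict, List

# Each nonterminal maps to its alternatives; each alternative is a list of symbols.
Grammar = Dict[str, List[List[str]]]


def directly_left_recursive(grammar: Grammar) -> List[str]:
    """Return the nonterminals with an alternative that starts with themselves."""
    return [
        nt
        for nt, alternatives in grammar.items()
        if any(alt and alt[0] == nt for alt in alternatives)
    ]


# A left-recursive list of states: acceptable for LALR, not for LL.
lalr_style: Grammar = {"States": [["States", "State"], ["State"]]}

# The same language (one or more states) without left recursion.
# The empty list stands for the empty alternative.
ll_style: Grammar = {
    "States": [["State", "Rest"]],
    "Rest": [["State", "Rest"], []],
}
```

A transformation towards an LL-based generator model would need to rewrite rules of the first form into the second, whereas a transformation towards an LALR-based one could keep them as they are.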

4.4 Code Generation: Parser Construction

Code generation is performed in two steps. First, a textual specification for the parser generator is produced from the generator model. Second, the parser generator takes this textual specification and generates the parser. The aim of the parser is to parse documents written in the target language and to build their corresponding AST. This AST is an instance of the abstract syntax model.
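To show what "building an AST that is an instance of the abstract syntax model" amounts to, here is a small hand-written recursive descent parser in Python. It is not the parser generated by JavaCC or Cup. The grammar it follows is an assumed reading of the StateChart syntax (an optional `initial` keyword, braces around the contents of a composite state, and an optional `on` trigger), and the AST classes are a simplified, hypothetical version of the abstract syntax model.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Transition:
    source: str
    target: str
    trigger: Optional[str] = None


@dataclass
class State:
    name: str
    initial: bool = False


@dataclass
class SimpleState(State):
    pass


@dataclass
class CompositeState(State):
    states: List[State] = field(default_factory=list)
    transitions: List[Transition] = field(default_factory=list)


@dataclass
class StateMachine:
    name: str
    top: State


class Parser:
    """Recursive descent parser; keywords are not reserved in this sketch."""

    def __init__(self, text: str) -> None:
        # Simplified lexer: words and braces; any other character is skipped.
        self.tokens = re.findall(r"[A-Za-z_]\w*|[{}]", text)
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, literal: Optional[str] = None) -> str:
        """Consume one token; if 'literal' is given, the token must equal it."""
        token = self.peek()
        if token is None or (literal is not None and token != literal):
            raise SyntaxError(f"expected {literal or 'identifier'}, found {token}")
        self.pos += 1
        return token

    def state_machine(self) -> StateMachine:
        self.expect("StateMachine")
        return StateMachine(self.expect(), self.state())

    def state(self) -> State:
        initial = self.peek() == "initial"
        if initial:
            self.expect("initial")
        if self.peek() == "State":
            self.expect("State")
            return SimpleState(self.expect(), initial)
        self.expect("CompositeState")
        composite = CompositeState(self.expect(), initial)
        self.expect("{")
        while self.peek() != "}":
            if self.peek() == "Transition":
                composite.transitions.append(self.transition())
            else:
                composite.states.append(self.state())
        self.expect("}")
        return composite

    def transition(self) -> Transition:
        self.expect("Transition")
        self.expect("from")
        source = self.expect()
        self.expect("to")
        target = self.expect()
        trigger = None
        if self.peek() == "on":
            self.expect("on")
            trigger = self.expect()
        return Transition(source, target, trigger)


tree = Parser(
    "StateMachine Lamp initial CompositeState Top {"
    " initial State Off State On"
    " Transition from Off to On on switch"
    " Transition from On to Off }"
).state_machine()
```

Each parsing method corresponds to one grammar rule and returns an object of the matching AST class, so the keywords and braces are consumed but never stored in the tree.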

5 CONCLUSION

Many development tools make intensive use of abstract syntax trees. This is the case for refactoring tools, code formatters, and content assistants, among others. Such an AST is usually an instance of an object-oriented model that represents the language’s abstract syntax. In this paper we have proposed a Language Engineering methodology that is focused on this abstract syntax model. We chose UML as the abstract syntax metamodel because UML tools provide code generators that implement models in different programming languages. Furthermore, UML is the de facto standard for object-oriented modeling.

We have proposed to stereotype the abstract syntax model by means of a UML profile, called Concrete Syntax, aimed at concrete syntax specification. This stereotyped abstract syntax model avoids the need to synchronize separate abstract and concrete syntax definitions. From this stereotyped abstract syntax model, several development artifacts can be automatically generated by applying MDE practices. We have given an overview of the generation of a common development component: a parser for AST construction.

Cup: LALR parser generator for Java (http://www2.cs.tum.edu/projects/cup/)

ACKNOWLEDGEMENTS

This work has been partially supported by MCyT TIN2005-08943-C02-02 and URJC-CM-2006-CET-0603.

REFERENCES

Alanen, M. and Porres, I. (2003). A relation between context-free grammars and meta object facility metamodels.

Antoniol, G., Penta, M. D., and Merlo, E. (2003). YAAB (Yet Another AST Browser): Using OCL to navigate ASTs. In IWPC ’03: Proceedings of the 11th IEEE International Workshop on Program Comprehension, page 13, Washington, DC, USA. IEEE Computer Society.

Blasband, D. (2001). Parsing in a hostile world. In Proceedings of the Eighth Working Conference on Reverse Engineering (WCRE’01), page 291. IEEE Computer Society.

Bloch, J. (2006). How to design a good API and why it matters. In OOPSLA ’06: Companion to the 21st ACM SIGPLAN conference on Object-oriented programming languages, systems, and applications, pages 506–507. ACM Press.

Boshernitsan, M. (2001). Harmonia: A flexible framework for constructing interactive. Technical report.

Clark, T., Evans, A., Sammut, P., and Willans, J. (2004). Applied Metamodelling: A Foundation for Language Driven Development. Xactium.

Favre, J.-M. (2004). Towards a basic theory to model model driven engineering. In 3rd Workshop in Software Model Engineering (WiSME 2004).

Fondement, F., Schneckenburger, R., Gérard, S., and Muller, P.-A. (2006). Metamodel-Aware Textual Concrete Syntax Specification. Technical report.

Gortázar, F., Duarte, A., and Gallego, M. (2007). Representing languages in UML. In Proceedings of the 2nd Conference on Evaluation of Novel Approaches to Software Engineering.

Hedin, G. and Magnusson, E. (2003). JastAdd: an aspect-oriented compiler construction system. Sci. Comput. Program., 47(1):37–58.

Herranz, A. and Nogueira, P. (2005). More than parsing. In López Fraguas, F. J., editor, Spanish V Conference on Programming and Languages (PROLE 2005), pages 193–202. Thomson Paraninfo.

Jones, J. (2003). Abstract syntax tree implementation idioms. In Proceedings of the 10th Conference on Pattern Languages of Programs (PLoP’03).

Lieberherr, K. J. (2005). Object-oriented programming with class dictionaries. LISP and Symbolic Computation, 1:185–212.


Muller, P.-A., Fleurey, F., Fondement, F., Hassenforder, M., Schneckenburger, R., Gérard, S., and Jézéquel, J.-M. (2006). Model-driven analysis and synthesis of concrete syntax. In MoDELS, pages 98–110.

Paakki, J. (1995). Attribute grammar paradigms: a high-level methodology in language implementation. ACM Comput. Surv., 27(2):196–255.

Vainsencher, D. and Black, A. P. (2006). A pattern language for extensible program representation. In Proceedings of the Pattern Languages of Programming Conference.

Wile, D. S. (1997). Abstract syntax from concrete syntax. In ICSE ’97: Proceedings of the 19th international conference on Software engineering, pages 472–480, New York, NY, USA. ACM Press.

Wimmer, M. and Kramler, G. (2005). Bridging grammarware and modelware. In MoDELS Satellite Events, pages 159–168.
