Parsing Abstract Syntax Graphs with ModelCC

Luis Quesada, Fernando Berzal and Juan-Carlos Cubero

Department of Computer Science and Artiﬁcial Intelligence

University of Granada, CITIC, 18071, Granada, Spain

Keywords:

Model-driven Software Development, Parser Generators, Abstract Syntax Graphs.

Abstract:

The tight coupling between language design and language processing in traditional language processing tools

is avoided by model-based parser generators such as ModelCC. By decoupling language speciﬁcation from

language processing, ModelCC avoids the limitations imposed by traditional parser generators, which con-

strain language designers to speciﬁc kinds of grammars. Apart from providing an alternative approach to

language speciﬁcation, ModelCC incorporates reference resolution within the parsing process. Instead of re-

turning mere abstract syntax trees, ModelCC is able to obtain abstract syntax graphs from its input string.

Moreover, such abstract syntax graphs are not restricted to directed acyclic graphs, since ModelCC supports

anaphoric, cataphoric, and recursive references.

1 INTRODUCTION

A formal language represents a set of strings (Jurafsky

and Martin, 2009). Formal languages consist of an al-

phabet, which describes the basic symbol or character

set of the language, and a grammar, which describes

how to write valid sentences of the language (Gins-

burg, 1975; Harrison, 1978). In Computer Science,

formal languages are used, among other things, for

the precise deﬁnition of data formats and the syntax

of programming languages.

Most existing language speciﬁcation techniques

(Aho et al., 2006) require the language designer to

provide a textual speciﬁcation of the language gram-

mar. The proper speciﬁcation of such a grammar is

a nontrivial process that depends on the lexical and

syntax analysis techniques to be used, since each kind

of technique requires the grammar to comply with a

specific set of constraints. Each analysis technique is characterized by its expressive power, which determines whether a given analysis technique is suitable for a particular language. The

most signiﬁcant constraints on formal language spec-

iﬁcation originate from the need to consider context-

sensitivity, the need to perform an efﬁcient analy-

sis, and some techniques’ inability to resolve conﬂicts

caused by grammar ambiguities.

As an alternative approach, model-based language

speciﬁcation techniques (Kleppe, 2007) decouple lan-

guage design from language processing and automat-

ically generate the corresponding language grammar,

thus making the language design process less ardu-

ous.

While, in general, the result of the parsing process

is an abstract syntax tree that corresponds to a valid

parsing of the input text according to the language

concrete syntax, nothing prevents the model-based

language designer from modeling non-tree structures.

Typically, syntax analysis defers some analy-

sis tasks to later stages in the language processing

pipeline, such as reference resolution and other se-

mantic checks. However, a model-driven parser gen-

erator can be employed to automate some parts of this

process.

ModelCC (Quesada et al., 2011) is a model-based

parser generator that includes support for dealing with

references between language elements, thus incor-

porating the reference resolution that is traditionally

hand-crafted with the help of a symbol table into the

parsing process.

In this paper, we explain how ModelCC (Quesada

et al., 2011) is able to resolve references and obtain

abstract syntax graphs as the result of the parsing pro-

cess, rather than the traditional abstract syntax trees

obtained from conventional parser generators.

Section 2 introduces model-based language specification. Section 3 explains the reference resolution support in the ModelCC model-based parser generator. Section 4 introduces a working example that illustrates abstract syntax graph parsing. Finally, Section 5 presents our conclusions and future work.

Quesada L., Berzal F. and Cubero J.

Parsing Abstract Syntax Graphs with ModelCC.

DOI: 10.5220/0004671601510157

In Proceedings of the 2nd International Conference on Model-Driven Engineering and Software Development (MODELSWARD-2014), pages 151-157

ISBN: 978-989-758-007-9

© 2014 SCITEPRESS (Science and Technology Publications, Lda.)

2 BACKGROUND

In its most general sense, a model is anything used in

any way to represent something else. In such sense,

a grammar is a model of the language it deﬁnes. In

Software Engineering, data models are also common.

Data models explicitly determine the structure of data.

Roughly speaking, they describe the elements they

represent and the relationships existing among them.

From a formal point of view, it should be noted that

data models and grammar-based language speciﬁca-

tions are not equivalent, even though both of them

can be used to represent data structures. A data model

can express relationships a grammar-based language

speciﬁcation cannot, and does not need to comply

with the constraints a grammar-based language specification has to comply with. Describing a data model is generally easier than describing the corresponding grammar-based language specification.

In practice, when we want to build a complex data

structure from the contents of a ﬁle, the implementa-

tion of the mandatory language processor needed to

parse the ﬁle requires the software engineer to build a

grammar-based language speciﬁcation for the data as

represented in the ﬁle and also to implement the con-

version from the parse tree returned by the parser to

the desired data structure, which is an instance of the

data model that describes the data in the ﬁle.

Whenever the language speciﬁcation has to be

modiﬁed, the language designer has to manually

propagate changes throughout the entire language

processor tool chain, from the speciﬁcation of the

grammar deﬁning the formal language (and its adap-

tation to speciﬁc parsing tools) to the correspond-

ing data model. These updates are time-consuming,

tedious, and error-prone. By making such changes

labor-intensive, the traditional language processing

approach hampers the maintainability and evolution

of the language used to represent the data (Kats et al.,

2010).

Moreover, it is not uncommon for different appli-

cations to use the same language. For example, the

compiler, different code generators, and other tools

within an IDE, such as the editor or the debugger,

typically need to grapple with the full syntax of a

programming language. Unfortunately, their mainte-

nance typically requires keeping several copies of the

same language speciﬁcation synchronized.

The idea behind model-based language specification is that, starting from a single abstract syntax model (ASM) that represents the core concepts in a language, language designers can develop one or several concrete syntax models (CSMs). These CSMs can suit the specific needs of the desired textual or graphical representation. The ASM-CSM mapping can be performed, for instance, by annotating the abstract syntax model with the constraints needed to transform the elements in the abstract syntax into their concrete representation.

Figure 1: Traditional language processing.

(Diagram elements: Context-Free Grammar, e.g. BNF; Conceptual Model; Attribute Grammar; Concrete Syntax Model; Abstract Syntax Model; Textual Representation; Parser, with input and output; Abstract Syntax Tree; instance.)

Figure 2: Model-based language processing.

(Diagram elements: Context-Free Grammar, e.g. BNF; Conceptual Model; Concrete Syntax Model; Abstract Syntax Model; Textual Representation; Parser, with input and output; Abstract Syntax Graph; instance.)

This way, the ASM representing the language can

be modiﬁed as needed without having to worry about

the language processor and the peculiarities of the

chosen parsing technique, since the corresponding

language processor will be automatically updated.

Finally, as the ASM is not bound to a particu-

lar parsing technique, evaluating alternative and/or

complementary parsing techniques is possible with-

out having to propagate their constraints into the lan-

guage model. Therefore, by using an annotated ASM,

model-based language speciﬁcation completely de-

couples language speciﬁcation from language pro-

cessing, which can be performed using whichever

parsing techniques are suitable for the formal lan-

guage implicitly deﬁned by the abstract model and its

concrete mapping.

A diagram summarizing the traditional language

design process is shown in Figure 1, whereas the cor-

responding diagram for the model-based approach is

shown in Figure 2.

It should be noted that ASMs may represent non-

tree structures. Hence the use of the ‘abstract syntax

graph’ term in Figure 2.

Table 1: The metadata annotations supported by the ModelCC model-based parser generator.

Constraints on...      Annotation       Function
...patterns            @Pattern         Pattern matching definition of basic language elements.
                       @Value           Field where the recognized input element will be stored.
...delimiters          @Prefix          Element prefix(es).
                       @Suffix          Element suffix(es).
                       @Separator       Element separator(s) in lists of elements.
...cardinality         @Optional        Optional elements.
                       @Minimum         Minimum element multiplicity.
                       @Maximum         Maximum element multiplicity.
...evaluation order    @Associativity   Element associativity (e.g. left-to-right).
                       @Composition     Eager or lazy composition for nested composites.
                       @Priority        Element precedence level/relationships.
...composition order   @Position        Define an element member position relative to another.
                       @FreeOrder       All the element member positions may vary.
...references          @ID              Identifier of a language element.
                       @Reference       Reference to a language element.
Custom constraints     @Constraint      Custom user-defined constraint.

ModelCC (Quesada et al., 2011) is a parser generator that supports a model-based approach to the

design of language processing systems. Its starting

ASM is created by deﬁning classes that represent lan-

guage elements and establishing relationships among

those elements. Once the ASM is established, con-

straints can be imposed over language elements and

their relationships as annotations in order to produce

the desired ASM-CSM mapping.

The ASM is built on top of basic language el-

ements, which can be viewed as the tokens in the

model-driven speciﬁcation of a language. ModelCC

provides the necessary mechanisms to combine those

basic elements into more complex language con-

structs, which correspond to the use of concatenation,

selection, and repetition in the syntax-driven speciﬁ-

cation of languages.

In ModelCC, the constraints imposed over ASMs

to deﬁne a particular ASM-CSM mapping are de-

clared as metadata annotations on the model itself.

Now supported by all the major programming plat-

forms, metadata annotations are often used in re-

ﬂective programming and code generation (Fowler,

2002). Table 1 summarizes the set of constraints sup-

ported by ModelCC for establishing ASM-CSM map-

pings.

When the ASM represents a tree-like structure, a

model-based parser generator is equivalent to a traditional grammar-based parser generator in terms of expressive power. When the ASM represents non-tree

structures, reference resolution techniques can be em-

ployed to make model-based parser generators more

powerful than grammar-based ones, as we will see in

the next Section.

3 REFERENCE RESOLUTION SUPPORT IN MODELCC

Reference resolution consists of ﬁnding the object a

reference refers to and, in the case of ModelCC, au-

tomatically linking the reference to the corresponding

object instantiation. Reference resolution leads to ab-

stract syntax graphs instead of trees in model-driven

language processing.

In ModelCC, an object reference is embodied by

a subset of the elements in its full object deﬁnition.

This subset of elements acts as an identiﬁer (or key

in database terms) that, when found in the input text,

can be recognized as a reference to the corresponding

object in the model and linked to its instantiation in

the ASM.

References in ModelCC can be anaphoric, when

they are preceded by the corresponding object deﬁ-

nition; cataphoric, when the references precede the

deﬁnition; and recursive, when they appear within the

deﬁnition they refer to.

Subsection 3.1 introduces the @ID metadata an-

notation, which allows the speciﬁcation of identi-

ﬁers for language elements. Subsection 3.2 presents

the @Reference metadata annotation, which allows

the speciﬁcation of references to other language el-

ements.

3.1 The @ID Annotation

ModelCC uses an @ID metadata annotation to support reference specification. This annotation is applied to a subset of the members of a language element model. This subset determines the syntax of references to particular instances of such elements in the concrete syntax of the corresponding language. That is, any appearance of the same set of values will be interpreted as a reference to the same instance of the referred language element.

The use of references is resolved in our imple-

mentation of ModelCC by the introduction of gram-

mar productions that characterize such references and

semantic actions that map them to the corresponding

language elements.

In Figure 3, the @ID annotation is employed to

identify users by a single number.

However, the @ID annotation can be used to-

gether with other ModelCC annotations, such as

@FreeOrder, which allows the members of a lan-

guage element to be shufﬂed in their textual represen-

tation, and @Preﬁx and @Sufﬁx, which add syntactic

sugar to the incarnation of the abstract syntax model

as a concrete textual language.

The inadvertent deﬁnition of two entities of the

same class with the same identiﬁer results in a run-

time warning produced by ModelCC when parsing its

input.
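The identifier behaviour described above can be pictured as a lookup table keyed by element class and @ID values. The following Python sketch is only an illustration under stated assumptions: the class IdentifierRegistry and its methods are hypothetical names, not part of ModelCC, and keeping the first definition after a duplicate is an assumed policy, since the text only states that a warning is produced.

```python
import warnings


class IdentifierRegistry:
    """Hypothetical table mapping (element class, @ID values) to an instance."""

    def __init__(self):
        self._entries = {}

    def define(self, class_name, key, instance):
        slot = (class_name, key)
        if slot in self._entries:
            # Two entities of the same class share an identifier: warn.
            # Assumption: the first definition is kept.
            warnings.warn(f"duplicate {class_name} identifier {key!r}")
            return self._entries[slot]
        self._entries[slot] = instance
        return instance

    def lookup(self, class_name, key):
        # Returns None when no element with this identifier was defined.
        return self._entries.get((class_name, key))
```

Keying on the class name as well as the identifier values means that two elements of different classes may reuse the same identifier without triggering the warning.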

3.2 The @Reference Annotation

ModelCC resorts to the @Reference metadata anno-

tation to complete its support for reference resolu-

tion. The @Reference annotation applies to indi-

vidual members of any language element, provided

that the referenced types contain at least one @ID-

annotated member in their model.

Whenever a language element member is anno-

tated with @Reference, the corresponding grammar

productions are modiﬁed so that they refer to the sym-

bol corresponding to the element reference speciﬁca-

tion rather than the symbol that corresponds to its full

specification. These productions are then associated with a semantic action that resolves the references at

the end of the parsing process, in order to support

cataphoric and recursive references, apart from the

anaphoric references that could be resolved on the ﬂy

during the parsing process.
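Deferring resolution until all input has been read is what makes cataphoric references possible. The Python sketch below illustrates the idea with users identified by a number and messages that refer to them. It is not ModelCC code: User, Message, PendingRef, and resolve are hypothetical names, and the real tool performs this step through generated semantic actions.

```python
from dataclasses import dataclass


@dataclass
class User:
    number: int  # plays the role of the @ID member
    name: str


@dataclass
class Message:
    sender: object    # plays the role of a @Reference member
    receiver: object  # plays the role of a @Reference member
    text: str


@dataclass
class PendingRef:
    """Placeholder left where a reference identifier was read."""
    key: int


def resolve(elements):
    """Link every PendingRef to the User with the same number."""
    users = {e.number: e for e in elements if isinstance(e, User)}
    for e in elements:
        if isinstance(e, Message):
            for member in ("sender", "receiver"):
                ref = getattr(e, member)
                if isinstance(ref, PendingRef):
                    if ref.key not in users:
                        raise LookupError(f"unresolved user reference: {ref.key}")
                    setattr(e, member, users[ref.key])
    return elements


parsed = [
    Message(PendingRef(1), PendingRef(2), "hi"),     # cataphoric: users come later
    User(1, "ann"),
    User(2, "bob"),
    Message(PendingRef(2), PendingRef(1), "hello"),  # anaphoric: users already seen
]
resolve(parsed)
```

After resolution both messages point at the same two User objects, so the result is a graph with shared nodes rather than a tree.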

In Figure 3, the textual syntax of messages in-

cludes numbers that, as identiﬁers, refer to particular

users. ModelCC will parse such identiﬁers, recognize

the references, resolve them, and return the correct

object graph.



[@@@.@

[@@@.@

[@@.@

ssssssssssssssssss

[@@@.@

[@@.@

[

I]]U

[

Figure 3: ModelCC speciﬁcation of Messages, their

senders, and their receivers.

4 A WORKING EXAMPLE

In this section, we present an example language that

allows the speciﬁcation and rendering of complex 3D

objects using the reference resolution capabilities of

ModelCC.

First, we will outline the features we wish to in-

clude in our 3D object speciﬁcation language. Then,

we will provide the full language speciﬁcation for

ModelCC by deﬁning an abstract syntax model,

which will be annotated to specify the desired ASM-

CSM mapping. Lastly, we will see an example input

and output pair for our 3D object speciﬁcation lan-

guage.

4.1 Language Description

Our 3D object speciﬁcation language is designed to

support the following features:

• A special section, denoted by the “scene” key-

word, delimits the statements that will be used for

rendering the scene.

• The deﬁnition of custom objects, which are iden-

tiﬁed by an object name. As references can be

lazily resolved, recursion is allowed.

• Scoped statements, delimited by “{” and “}”, that

allow the speciﬁcation of lists of statements that

will run in a new scope.

• Composite statements, delimited by “[” and “]”,

that allow the speciﬁcation of lists of statements

that will run in the current scope.

• Repeated statements that allow the repetition of

a statement, a group of statements, or a block of

statements, a number of times.

• Draw statements, which draw either basic objects (e.g. a cube) or user-defined objects. Draw statements allow the specification of a numeric parameter. The “next” keyword, when used as this numeric parameter, is replaced at run time by the current parameter decreased by one, and draw statements will not run when the parameter is 0.

• Scale transformation statements, which support the specification of a combination of x, y, and z values in any order, or a single scaling factor that will be applied to the three axes.

• Rotate transformation statements, which support the specification of the angle and a combination of x, y, and z axis values in any order.

• Translate transformation statements, which support the specification of a combination of x, y, and z values in any order.

• Color setting statements, which support the specification of a combination of red, green, blue, and alpha values in any order, and allow either absolute (by default) or relative color adjustments.

Figure 4: ModelCC definition of a 3D object specification language. ModelCC reference resolution support is used to allow the specification of complex 3D objects in the Definition class.

(UML class diagram. Classes and members as listed in the diagram:)

Scene: - definitions : Definition[] (0..*); - content : Statement; + draw()
Definition: - @ID name : ObjectName; - content : Statement
ObjectName: - @Value name : String
Statement: + run()
BlockStatement: - content : Statement[] (0..*)
GroupStatement: - content : Statement[] (0..*)
RepeatStatement: - @Suffix("times") times : Parameter; - content : Statement; + @Autorun checkArguments()
DrawStatement: - object : SceneObject; - @Optional parameter : Parameter; + @Autorun checkArguments()
SceneObject: + draw()
CubeObject
DefinedObject: - @Reference ref : Definition
ColorStatement: - @Prefix("red") @Optional red : Literal; - @Prefix("green") @Optional green : Literal; - @Prefix("blue") @Optional blue : Literal; - @Prefix("alpha") @Optional alpha : Literal; - relative : Relative; + @Autorun checkArguments()
Relative
ScaleStatement: - @Prefix("x") @Optional x : Literal; - @Prefix("y") @Optional y : Literal; - @Prefix("z") @Optional z : Literal; - @Optional all : Literal; + @Autorun checkArguments()
RotateStatement: - @Prefix("angle") @Optional angle : Literal; - @Prefix("x") @Optional x : Literal; - @Prefix("y") @Optional y : Literal; - @Prefix("z") @Optional z : Literal; + @Autorun checkArguments()
TranslateStatement: - @Prefix("x") @Optional x : Literal; - @Prefix("y") @Optional y : Literal; - @Prefix("z") @Optional z : Literal; + @Autorun checkArguments()
Parameter: + intValue() : int; + doubleValue() : double
Literal
IntegerLiteral: - @Value val : int
RealLiteral: - @Value val : double

(Class-level and association annotations shown in the diagram: @FreeOrder, @Optional, @Priority(1), @Priority(2), @Prefix("scene"), @Prefix("define"), @Prefix("draw"), @Prefix("repeat"), @Prefix("color"), @Prefix("scale"), @Prefix("rotate"), @Prefix("translate"), @Prefix("["), @Suffix("]"), @Prefix("{"), @Suffix("}"), @Pattern("[A-Za-z0-9_]+"), @Pattern("cube"), @Pattern("relative"), @Pattern("next").)
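The interplay between recursive definitions and the “next” parameter of draw statements can be sketched as follows. This Python fragment is a simplified illustration, not the ModelCC implementation: Definition and draw are hypothetical names, draw statements are reduced to (target, parameter) pairs, every draw statement carries an explicit parameter, and non-positive values are treated like 0 as a defensive assumption.

```python
NEXT = "next"  # stands for the "next" keyword


class Definition:
    """A user-defined object whose content is a list of draw statements."""

    def __init__(self, name):
        self.name = name
        self.content = []  # list of (target, parameter) pairs


def draw(target, parameter, current, out):
    """Run one draw statement; `current` is the enclosing draw's parameter."""
    value = current - 1 if parameter == NEXT else parameter
    if value <= 0:
        return  # draw statements do not run when the parameter is 0
    if isinstance(target, Definition):
        for inner_target, inner_parameter in target.content:
            draw(inner_target, inner_parameter, value, out)
    else:
        out.append(target)  # a basic object such as "cube"


# A recursive definition: the object graph contains a cycle,
# because the content of `spiral` refers back to `spiral` itself.
spiral = Definition("spiral")
spiral.content = [("cube", 1), (spiral, NEXT)]

out = []
draw(spiral, 3, 0, out)
```

Because each recursive draw decreases the parameter by one, the recursion terminates even though the underlying object graph is cyclic.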

4.2 ModelCC Implementation

In ModelCC, the abstract syntax model is designed,

then mapped to a concrete syntax model by imposing

constraints by means of metadata annotations on the

abstract syntax model.

The resulting model can be processed by

ModelCC to automatically generate the correspond-

ing parser. The UML class diagram in Figure 4

presents our annotated 3D object speciﬁcation lan-

guage model.

The reference support extension we propose in this paper can be observed in the Definition, ObjectName, and DefinedObject classes. The name member of the Definition class is annotated with @ID, which means that a Definition instance can be identified by an ObjectName. Then, the ref member of a DefinedObject is annotated with @Reference, which means that, in textual form, a DefinedObject can refer to a Definition by its ObjectName. ModelCC reference resolution allows references to be resolved during the parsing process and makes the implementation of a traditional symbol table unnecessary.

It should be noted that certain constraints cannot

be expressed in the abstract syntax model. How-

ever, these constraints can be expressed as custom

constraints using the @Constraint annotation. In our

example, some statements corresponding to elements

in our model, such as draw statements and repeat

statements, will not accept real values as parameters.

These custom semantic constraints are implemented

in the checkArguments() methods of the language el-

ements classes corresponding to those statements.
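A custom check of this kind might look like the following Python sketch. The class names mirror those in the model, but the code is only an illustration: the method name check_arguments and the convention that returning False rejects the element are assumptions, not the documented ModelCC behaviour.

```python
class IntegerLiteral:
    def __init__(self, val: int):
        self.val = val


class RealLiteral:
    def __init__(self, val: float):
        self.val = val


class RepeatStatement:
    def __init__(self, times, content):
        self.times = times      # the repetition count parameter
        self.content = content  # the repeated statement

    def check_arguments(self) -> bool:
        # Assumed convention: returning False rejects this element.
        # Repeat statements do not accept real values as parameters.
        return not isinstance(self.times, RealLiteral)
```

Keeping the check inside the language element class places the semantic constraint next to the member it constrains, instead of in a separate validation pass.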

ModelCC is able to automatically generate a

grammar from the ASM deﬁned by a class model and

the ASM-CSM mapping deﬁned as a set of metadata

annotations on the class model. References in that

grammar are automatically resolved by ModelCC so

that further work is not needed.

define trunk {
  color red 0.87 green 0.50 blue 0.10 alpha 1
  draw cube
  repeat 10 times [
    scale x 1.02 z 1.02 y 0.98
    color relative red -0.03 green -0.02 blue -0.01
    draw cube
  ]
}

define leaves {
  color red 0.2 green 0.9 blue 0.3 alpha 0.9
  translate x -1
  {
    scale z 0.6 y 0.05
    repeat 100 times [
      color relative red +0.005 alpha -0.005
      translate x -0.04 y -0.3
      draw cube
    ]
  }
}

define palmtree {
  repeat 8 times [
    draw trunk
    translate y 1
  ]
  repeat 3 times [
    translate y -0.5
    scale 0.7
    rotate angle 8 y 1
    repeat 15 times [
      rotate angle 24 y 1
      draw leaves
    ]
  ]
}

scene {
  draw palmtree
}

Figure 5: Specification of a palmtree in our 3D object specification language.

4.3 Example of 3D Object Speciﬁcation

Figures 5 and 6 illustrate the specification and rendering of a 3D palmtree in our 3D object specification language. The palmtree object is defined as eight trunk sections with leaves on the top.

Figure 6: Rendering of the palm tree from Figure 5.

5 CONCLUSIONS AND FUTURE WORK

ModelCC is a model-based parser generator that em-

ploys metadata annotations to implement ASM-CSM

mappings.

We have described how ModelCC supports ref-

erence resolution and allows parsing abstract syntax

graphs rather than conventional abstract syntax trees,

as obtained by traditional grammar-driven parser gen-

erators.

We have demonstrated the use of ModelCC reference resolution support with a fully functional abstract syntax graph parser for a 3D object specification language.

In the future, we plan to apply model-based lan-

guage speciﬁcation techniques to problems such as

data integration. We also plan to implement metadata

annotations that support more complex scoping rules

for reference resolution.

ACKNOWLEDGEMENTS

Work partially supported by research project

TIN2012-36951, “NOESIS: Network-Oriented

Exploration, Simulation, and Induction System”,

cofunded by the Spanish Ministry of Economy and

the European Regional Development Fund (FEDER).

REFERENCES

Aho, A. V., Lam, M. S., Sethi, R., and Ullman, J. D. (2006). Compilers: Principles, Techniques, and Tools. Addison Wesley, 2nd edition.

Fowler, M. (2002). Using metadata. IEEE Software, 19(6):13–17.

Ginsburg, S. (1975). Algebraic and automata theoretic properties of formal languages. North-Holland.

Harrison, M. A. (1978). Introduction to Formal Language Theory. Reading, Mass: Addison-Wesley Publishing Company.

Jurafsky, D. and Martin, J. H. (2009). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition. Prentice Hall, 2nd edition.

Kats, L. C. L., Visser, E., and Wachsmuth, G. (2010). Pure and declarative syntax definition: Paradise lost and regained. In Proceedings of the ACM International Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA’10), pages 918–932.

Kleppe, A. (2007). Towards the generation of a text-based IDE from a language metamodel. volume 4530 of Lecture Notes in Computer Science, pages 114–129.

Quesada, L., Berzal, F., and Cubero, J.-C. (2011). A language specification tool for model-based parsing. In Proceedings of the 12th International Conference on Intelligent Data Engineering and Automated Learning. Lecture Notes in Computer Science, volume 6936, pages 50–57.

ParsingAbstractSyntaxGraphswithModelCC

157