Translating Platform-independent Code into Natural Language Texts

Håkan Burden and Rogardt Heldal

Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg,

Gothenburg, Sweden

Keywords:

Model Transformations, Executable and Translatable UML, Grammatical Framework.

Abstract:

Understanding software artifacts is not only time-consuming; without the proper training and experience it can be impossible. From a model-driven perspective there are two benefits from translating platform-independent

models into natural language texts: First, the non-functional properties of the solution have already been

omitted meaning that the translations focus on describing the functional behaviour of the system. Second,

the platform-independent models are reusable across platforms and so are the translations generated from

them. As a proof-of-concept a platform-independent Action language is translated into natural language texts

through the framework of model transformations.

1 INTRODUCTION

In MDA the platform-independent model, PIM,

should be a bridge between the speciﬁcations in the

computationally-independent model, CIM, and the

platform-speciﬁc model, PSM (Miller and Mukerji,

2003; Mellor et al., 2004). Thus it is important that

the PIM is clear and articulate (Lange et al., 2006) to

convey the intentions and motivations in the CIM as

well as correctly describe the PSM (Perry and Wolf,

1992).

Since the PSM can be automatically generated

from the PIM all changes to the software can be done

at PIM-level or on the transformations. In this way the

PIM and the PSM are in synchronisation with each

other. Keeping the CIM and the PIM synchronised is not as easy since there are, as yet, no automatic transformations from CIM to PIM. Here the translation

of the PIM into textual representations can serve as a

means of validation of the PIM, in regard to the CIM,

during development or to make it easier for new de-

velopers to comprehend the structure and behaviour

of the system (Arlow et al., 1999).

Claims have been made that comprehensibility

is more important than completeness if models are

used for communication between stakeholders (Mo-

hagheghi and Aagedal, 2007). But if the stakeholders

want to know if the PIM is correct with regards to the

software speciﬁcations, completeness is just as im-

portant. Understanding the annotation and testing of

a model requires an understanding of object-oriented

design, knowledge of the used models and experience

of using the modelling tools (Arlow et al., 1999). Nat-

ural language on the other hand is suitable for stake-

holders without the necessary expertise in models and

tools (Spreeuwenberg et al., 2010).

Contributions

This paper shows i) how a platform-independent Ac-

tion language can be translated into natural language

texts ii) by putting natural language generation of

software behaviour within the perspective of model-

driven software development iii) with transformation

rules that are reusable across domains and platforms.

Overview

Section 2 presents the theoretical framework for the

study. The tools, technologies and transformations

that are used in the study are explained together with

examples of translations in section 3. The study is

then put in a more general context in the discussion,

section 4, before the conclusion is given in section 5.

Finally, possibilities to further explore the results are

presented in section 6.

2 THEORETICAL FRAMEWORK

2.1 Natural Language Generation

Burden H. and Heldal R.
Translating Platform-independent Code into Natural Language Texts.
DOI: 10.5220/0004312102810290
In Proceedings of the 1st International Conference on Model-Driven Engineering and Software Development (MODELSWARD-2013), pages 281-290
ISBN: 978-989-8565-42-6
© 2013 SCITEPRESS (Science and Technology Publications, Lda.)

Natural Language Generation (NLG; (Reiter and Dale, 1997)) is a theoretical framework for describing the transformation from software internal models of information into natural language representations. The content, its layout and the internal order

of the generated text is dependent on who the reader

is, the purpose of the text and by which means it is

displayed. Traditionally NLG is broken down into a

three-stage pipeline; text planning, sentence planning

and linguistic realisation (Reiter and Dale, 1997).

Text Planning Text planning is to decide on what in-

formation in the original model to communicate

to the readers.

Sentence Planning The second stage deﬁnes the

structure of the individual sentences. This is also

the time for choosing the terms that are going to

be used for the different concepts. The original

software model has now been transformed into an

intermediate linguistic model, a grammar.

Linguistic Realisation In the last stage the linguistic

model is used to generate text with correct word

order and word forms. Through the linguistic re-

alisation the intermediate model has been trans-

formed into natural language text.
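The three stages can be pictured as a chain of functions. The following Python sketch is only an illustration and not the implementation used in the study; the dictionary-based model and the names `text_planning`, `sentence_planning` and `linguistic_realisation` are assumptions made for this example.

```python
# Hypothetical toy model: one assignment statement from a software model.
model = {"kind": "assign", "target": "room_number", "value": "0", "source_line": 6}


def text_planning(m):
    """Decide WHAT to communicate: keep the behaviour, drop bookkeeping."""
    return {key: value for key, value in m.items() if key != "source_line"}


def sentence_planning(content):
    """Decide HOW to say it: choose terms and a sentence structure."""
    return {
        "subject": content["target"].replace("_", " "),
        "verb": "get",
        "object": "the value of " + content["value"],
    }


def linguistic_realisation(plan):
    """Produce the surface text: word order and word forms."""
    verb = plan["verb"] + "s"  # third person singular, enough for this toy case
    return f'{plan["subject"]} {verb} {plan["object"]}'


def generate(m):
    return linguistic_realisation(sentence_planning(text_planning(m)))


print(generate(model))
```

In the study itself the intermediate result of sentence planning is a grammar rather than a dictionary, so the last stage is handled by the grammar formalism instead of hand-written string code.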

2.2 Related Work

Nicolás and Toval (Nicolás and Álvarez, 2009) pro-

vide a systematic literature review on the textual gen-

eration from software models. This is a good starting

point for a broader investigation into the topic. In their

study there is no evidence of text generation from

platform-independent Action languages that specify

software behaviour.

Recently there has been a surge of publications on generating natural language from source code. Rastkar et al. (Rastkar et al., 2011) generate English for crosscutting concerns, functionality that is defined in multiple modules, from Java code. As a result of the scattered nature of the crosscutting concerns they are difficult to handle during software evolution. Having a natural language summary for each part of the concern and where it is implemented helps developers handle software change tasks. Sridhara et al. (Sridhara et al., 2010; Sridhara et al., 2011) have also investigated natural language generation from Java code. Their motivation is that understanding code is a time-consuming activity and that accurate descriptions can both summarise the algorithmic behaviour of the code and reduce the amount of code a developer needs to read for comprehension. The automatic generation of summaries from code means that it is easy to keep descriptions and system synchronised. An example of a translation from Java to English is found in Figure 1, taken from (Sridhara et al., 2010). Another approach to textual summarisations of Java code is given by Haiduc et al. (Haiduc et al., 2010). They claim that developers spend more time reading and navigating code than actually writing it. Common to these publications is that they need some technique for filtering out the non-functional properties from the source code before translation into natural language.

Java statement

    if (saveAuctions())

English translation

    /* If save auctions succeeds */

Figure 1: Example translation of Java to English.

There are also contributions on using grammars

to translate platform-independent speciﬁcations into

natural language. One such attempt is the translation between the Object Constraint Language (OCL; (Warmer and Kleppe, 2003)) and English (Hähnle

et al., 2002; Burke and Johannisson, 2005). This work

was followed up by a study on natural language gen-

eration of platform-independent contracts on system

operations (Heldal and Johannisson, 2006), where the

contracts were deﬁned as OCL constraints and speci-

ﬁed the pre- and post-conditions of system operations.

3 EXPLORATORY CASE STUDY

In order to explore how a platform-independent Ac-

tion language can be translated into natural language

texts Executable and Translatable UML is used to

encode the PIM and deﬁne the transformation rules.

Instead of generating text straight from the PIM the

Grammatical Framework works as an intermediate

modelling language to handle the linguistic properties

of the text. In this way the MDA process is integrated

with the process of natural language generation.

3.1 Executable and Translatable UML

Executable and Translatable UML (xtUML; (Starr, 2001; Mellor and Balcer, 2002)) evolved from merging the Shlaer-Mellor methodology (Shlaer and Mellor, 1992) with the UML (http://www.uml.org/) and is a graphical programming language for encoding platform-independent models. BridgePoint (http://www.mentor.com/products/sm/model_development/bridgepoint/) was chosen as the xtUML tool.

Three kinds of diagrams are used for the graphical modelling together with a textual Action language. The diagrams are component diagrams, class diagrams and state-machines. There is a clear hierarchical structure between the different diagrams; state-machines are only found within classes, classes are only found within components. Action language can be used in all three kinds of diagram to define their functional behaviour. The diagrams and Action language will be further explained using simplified examples taken from the problem domain chosen for the proof-of-concept implementation, a hotel reservation system.

Figure 2: An xtUML component diagram.

3.1.1 Diagrams

The xtUML component diagram follows the deﬁni-

tion given by UML. In Fig. 2 there is an example of

a component diagram. It consists of two components,

Hotel and User, connected across an interface.

Fig. 3 shows the class diagram that resides within

the Hotel component in the component diagram. The

xtUML classes and associations are more restricted

than in UML. Only those differences that are interesting for the case study are mentioned. In UML the associations between classes can be given a descriptive association name, while in xtUML the associations are automatically given names of the form RN, where N is a unique natural number. In Fig. 3 Room

is associated to Reservation over the association R2.

The BookingProcess has no operations, instead the

dynamic behaviour is deﬁned by the statemachine re-

siding within, marked by the icon in the top-left cor-

ner of the BookingProcess class.

Figure 3: An xtUML class-diagram.

In xtUML a statemachine comprises states,

events, transitions and procedures (Mellor and Balcer,

2002). Fig. 4 shows the statemachine that describes

Figure 4: An xtUML statemachine.

the lifecycles of individual instances of a Booking-

Process. Given the statemachine there are two pos-

sible transitions from the state Searching; either the

event add_room is triggered and the BookingProcess

transits to the Adding rooms state or cancel is trig-

gered and the new state is Canceling. If another

event is triggered while a BookingProcess is in the

Searching state, the event is either ignored or an error is thrown. The states can contain procedures; both events and procedures are defined by the Action language.
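The behaviour of the Searching state described above can be sketched as a transition table. This Python fragment is an illustration only; the `strict` flag, which chooses between ignoring an unexpected event and raising an error, is an assumption introduced for the example and not part of xtUML or BridgePoint.

```python
# Transitions out of the Searching state, as described in the text.
TRANSITIONS = {
    ("Searching", "add_room"): "Adding rooms",
    ("Searching", "cancel"): "Canceling",
}


def fire(state, event, strict=False):
    """Return the next state; unexpected events are ignored or raise an error."""
    key = (state, event)
    if key in TRANSITIONS:
        return TRANSITIONS[key]
    if strict:
        raise ValueError(f"event {event!r} cannot happen in state {state!r}")
    return state  # event ignored, the instance stays where it is


print(fire("Searching", "add_room"))
```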

3.1.2 Action Language

An important property of xtUML is the Action lan-

guage. It is a textual programming language that is in-

tegrated with the graphical models, sharing the same

meta-model (Shlaer and Mellor, 1992). Since the

Action language shares the same metamodel as the

graphical models it can be used to deﬁne how values

and class instances are manipulated (Larman, 2004) as

well as how the classes change their state (Shlaer and

Mellor, 1992). Action language can be used to deﬁne

the calls between the components as described by the

interfaces or to control the ﬂow of calls through the

ports of the components. An example of how the Ac-

tion language can be used is given in Fig. 7. The code

details a simple algorithm for ﬁnding available rooms

and resides within the Searching state of Fig. 4. The

example will be further explained in section 3.5.

The number of syntactical constructs is deliber-

ately kept small. The reason is that each construction

in the Action Language shall be easy to translate to

any programming language enabling the PIM to be

reused for different PSMs. Over the years a number

of different Action languages have been implemented

(Mellor and Balcer, 2002) and in 2010 OMG released their own standard, ALF (http://www.omg.org/spec/ALF/).

TranslatingPlatform-independentCodeintoNaturalLanguageTexts

283

The abstract syntax:

    cat Exp
    fun Sum  : Exp × Exp → Exp
        EInt : Int → Exp

The concrete syntax:

    lincat Exp = Str
    lin Sum n m = "the sum of" ++ n ++ "and" ++ m
        EInt i  = i.s

Figure 5: A small GF grammar.

3.1.3 Translating the Models

The xtUML model can be translated into a Platform-

Speciﬁc Model by a model compiler. A model com-

piler traverses the metamodel of the PIM and maps

each concept into the corresponding concepts of the

target language, while preserving the structure of the

PIM. Since the platform-speciﬁc code is generated

from the model, it is possible for the code and the

models to always be in synchronization with each

other since all updates and changes to the system are

done at the PIM-level, never by touching the code.
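The idea of a model compiler can be sketched as a traversal that applies one rule per metamodel concept while keeping the nesting of the model. The Python fragment below is a toy illustration under stated assumptions: the dictionary layout of `pim`, the rule table `JAVA_LIKE_RULES` and the `int` attribute type are invented for the example and do not reflect how BridgePoint's model compilers are implemented.

```python
# Hypothetical PIM fragment: a component containing classes with attributes.
pim = {
    "component": "Hotel",
    "classes": [{"name": "Room", "attributes": ["number", "beds"]}],
}

# One rule per metamodel concept; swapping the rule table retargets the output.
JAVA_LIKE_RULES = {
    "component": lambda name, body: f"package {name.lower()};\n{body}",
    "class": lambda name, body: f"class {name} {{\n{body}}}\n",
    "attribute": lambda name: f"    int {name};\n",  # attribute type is assumed
}


def compile_model(model, rules):
    """Traverse the model top-down and map each concept with its rule."""
    classes = ""
    for cls in model["classes"]:
        attributes = "".join(rules["attribute"](a) for a in cls["attributes"])
        classes += rules["class"](cls["name"], attributes)
    return rules["component"](model["component"], classes)


print(compile_model(pim, JAVA_LIKE_RULES))
```

Because the rules are keyed on metamodel concepts rather than on this particular model, the same table would apply to any model with the same shape.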

3.2 Grammatical Framework

Grammatical Framework (GF; (Ranta, 2011)) is a domain-specific language for defining Turing-complete grammars (Chomsky, 1959).

3.2.1 GF Grammars

GF separates the grammars into abstract and concrete syntaxes (McCarthy, 1962). The abstract syntax is defined by two finite sets, categories (cat) and functions (fun). The categories are used as building blocks and define the arguments and return values of the functions. From an NLG view the categories are the content and the functions the structure of the text. In the concrete syntax each category and function is given a linearisation definition (lincat and lin respectively). These definitions give the sentences their structure and the terminology to be used for the concepts.

A small example of a GF grammar is given in Fig.

5. In the concrete syntax the linearisation of expres-

sions is deﬁned as strings. Integers are represented by

their string values which are obtained by record selec-

tion, i.s (Ranta, 2011). The linearisation rule for Sum

is then deﬁned by concatenating the string arguments

into their corresponding slots.

An abstract syntax tree defines in which order the functions of the abstract syntax are to be used. A text with multiple readings is ambiguous and will be parsed into one abstract tree for each possible reading, but each tree is linearised into exactly one text. (GF website: http://www.grammaticalframework.org/)

Given the example above the sentence the sum of

3 and 5 will have the tree

Sum (EInt 3) (EInt 5)

The transformation from abstract tree to text is

called linearisation. Linearisation corresponds to the

linguistic realisation of NLG. This transformation is

a built-in property of GF (Ljunglöf, 2011; Angelov,

2011).
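To make the relation between abstract trees and linearisation concrete, the small grammar of Fig. 5 can be mimicked in Python. This is a sketch of the idea only: GF performs linearisation itself, and the classes `EInt` and `Sum` and the function `linearise` below are stand-ins written for this illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class EInt:
    value: int


@dataclass(frozen=True)
class Sum:
    left: "Exp"
    right: "Exp"


Exp = Union[EInt, Sum]


def linearise(tree: Exp) -> str:
    """Map an abstract syntax tree to its single English string."""
    if isinstance(tree, EInt):
        return str(tree.value)
    if isinstance(tree, Sum):
        return f"the sum of {linearise(tree.left)} and {linearise(tree.right)}"
    raise TypeError(f"not an Exp: {tree!r}")


tree = Sum(EInt(3), EInt(5))
print(linearise(tree))  # the sum of 3 and 5
```

Each tree yields exactly one string, which mirrors the observation that linearisation is deterministic even when parsing is not.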

3.2.2 The GF Resource Library

In the Resource Grammar Library (RGL; (Ranta,

2009)) a common abstract syntax has 24 different implementations in the form of concrete syntaxes.

Among the concrete languages are English, Catalan

and Japanese. The resource grammars have a shared

interface which hides the complexity of each concrete

language behind abstract function calls. Just as a pro-

grammer can use a Java API without knowing how

the methods are implemented, the resource grammars

support grammar development through an interface

that speciﬁes how grammatical structures can be de-

veloped (Ranta, 2008). The implementation of each

function can be retrieved from the source code and its

documentation.

3.3 Model-to-Text Transformations

The automatic translation from software models to

natural language texts consists of two transforma-

tions, see Fig. 6. The ﬁrst transformation takes the

software model and reshapes it to an intermediate lin-

guistic model by performing text and sentence plan-

ning. The second transformation is the linguistic real-

isation when the linguistic model is used to generate

natural language text.

Both transformations are examples of uni-

directional and automatic transformations (Stevens,

2007). The ﬁrst transformation is a reverse engineer-

ing translation since the level of abstraction is higher

in the target models than in the source models and

the two models are deﬁned by different metamodels

(Mens and Gorp, 2006).

Each transformation consists of a set of rules

(Kleppe et al., 2005) and an algorithm for how to ap-

ply the rules (Mellor et al., 2004). Since the rules of

both transformations are deﬁned according to their re-

spective meta-models they are reusable for all models

that conform to the same meta-model (Atkinson and

MODELSWARD2013-InternationalConferenceonModel-DrivenEngineeringandSoftwareDevelopment

284

Figure 6: From platform-independent models to natural language texts.

Kuhne, 2003; Mellor et al., 2004). The transforma-

tions can even be applied to partial xtUML models,

enabling textual feedback throughout development on

all changes and updates, even if the models need fur-

ther reﬁning.

3.4 Defining the Grammar

The abstract grammar of the Action language speci-

ﬁes two main categories, expressions and statements.

Expressions can be of two kinds, sentences or noun

phrases.

3.4.1 Expressions

From a linguistic point of view a sentence, abbrevi-

ated as S, expresses a proposition about the world it

inhabits. An example from the Action language is

x == y, represented in English as x equals y. The

proposition itself does not claim to be true or false,

that is dependent on the context of its evaluation. A characteristic of English propositions is that they follow the form subject-predicate-object; in the example above x is the subject, equals is the predicate and y is the object.

In natural languages, both subjects and objects can

have more complicated structures, an example being

the sum of n and m, written n + m in Action language.

Such a structure is referred to as a noun phrase, ab-

breviated as NP. The result of combining the two ex-

amples is the expression x == n + m, translated as x

equals the sum of n and m. (Expressions such as x

== y == n + m can not be formed since the expres-

sions on either side of the equality sign have to refer

to members of the program. From a linguistic point

of view the expressions have to be NPs.)

This distinction between expressions as sentences

and noun phrases is captured in the abstract grammar

by the two categories SExpr and NPExpr. The abstract

syntax for the equality function then becomes

equality : NPExpr × NPExpr → SExpr

with the concrete syntax for English deﬁned using the

resource grammars

equality x y = mkS (pred (mkV2 "equal") x y)

The function mkV2 takes a string value and returns a

verb that expects two NPs, a subject (x) and an object

(y). The function pred then takes the verb and the two

NPs in order to return an intermediate structure that is

passed on to mkS. The result of applying mkS is a sen-

tence on the form x equals y where both x and y can be

complex NPs. In order to handle agreement between

subject and verb the linearisation categories for nouns

and verbs have to be more complex than just strings.

Exactly how complicated is not a problem for those

using the RGL as an API for grammar development,

it has already been dealt with by the RGL developers.

Instead, the complexity lies in applying the appropri-

ate functions from the API in the right order.

Both the S- and NP-expressions are derived from

the xtUML metamodel where they are encoded as

subtypes of the metaclass Value or as instances of

Variable. In the above example for equality both

the binary operation and the NPExpr are deﬁned as

Values. Recursively analysing the left and right expressions of the operation shows that x and y are instances of the metaclass Variable with their respective names. Unary operations, attribute references and

parameters for events and operations are other sub-

classes of Value.
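The split between sentence and noun-phrase expressions can be illustrated with two Python types. The sketch below works on plain strings, so it sidesteps the agreement handling that the RGL provides; the class and function names (`NPExpr`, `SExpr`, `variable`, `plus`, `equality`) mirror the categories in the text but are otherwise assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NPExpr:
    text: str


@dataclass(frozen=True)
class SExpr:
    text: str


def variable(name: str) -> NPExpr:
    return NPExpr(name)


def plus(n: NPExpr, m: NPExpr) -> NPExpr:
    return NPExpr(f"the sum of {n.text} and {m.text}")


def equality(x: NPExpr, y: NPExpr) -> SExpr:
    # Both sides must be noun phrases, so x == y == n + m is rejected.
    if not (isinstance(x, NPExpr) and isinstance(y, NPExpr)):
        raise TypeError("equality expects two noun phrases")
    return SExpr(f"{x.text} equals {y.text}")


x, n, m = variable("x"), variable("n"), variable("m")
print(equality(x, plus(n, m)).text)  # x equals the sum of n and m
```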

3.4.2 Statements

While expressions can be either noun phrases or sentences, all statements are sentences. An example of this is the Action language’s return statement return x, where x could be an NP, such as the sum of n and m, as well as a sentence, n equals m. The solution is to have two abstract functions defining the return statement, one for returning noun phrases and one for returning sentences

returnNP : NPExpr → Stmt

returnS : SExpr → Stmt

For the concrete syntax a more general phrasing than

return n is used since it can be unclear for non-

programmers to whom n is returned and what this

means. This decision highlights how the abstract

syntax deﬁnes the text planning of the natural lan-

guage generation while the concrete syntax deﬁnes

the words to be used for different concepts and how

these words are to be strung together, i.e. the sentence

planning.

The ﬁrst function for return statements is imple-

mented in a fashion similar to the one used for equal-

ity expressions

returnNP n =

mkS (pred n (mkNP the_Det (mkN "result")))

and returns statements such as the result is the sum of

n and m for return n + m. For returning sentences

TranslatingPlatform-independentCodeintoNaturalLanguageTexts

285

other functions from the RGL are used since the type

of the argument is different

returnS s =

mkS (mkCl (mkNP the_Det (mkN "result")) s)

As an example the result is x equals y is the equivalent

translation for return x == y.

Finally, a program is deﬁned as a list of statements

fun sequence : [Stmt] → Prgm
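The two return functions and the program constructor can be sketched in the same string-based style. Again this is only an illustration in Python: the names `return_np`, `return_s` and `sequence` follow the abstract functions above, while rendering a program as one statement per line is an assumption of the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NPExpr:
    text: str


@dataclass(frozen=True)
class SExpr:
    text: str


@dataclass(frozen=True)
class Stmt:
    text: str


def return_np(n: NPExpr) -> Stmt:
    """returnNP : NPExpr -> Stmt"""
    return Stmt(f"the result is {n.text}")


def return_s(s: SExpr) -> Stmt:
    """returnS : SExpr -> Stmt"""
    return Stmt(f"the result is {s.text}")


def sequence(stmts: List[Stmt]) -> str:
    """A program is a list of statements, rendered one per line."""
    return "\n".join(stmt.text for stmt in stmts)


program = [return_np(NPExpr("the sum of n and m")), return_s(SExpr("x equals y"))]
print(sequence(program))
```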

3.5 Translations

The diagram in Fig. 7 shows an example of a pro-

gram written in Action language side-by-side with its

translation where the Action code resides within the

Searching state shown in Fig. 4. The generated text

is an example of a controlled natural language (CNL;

(Wyner et al., 2010)) where the described language is

a subset of a natural language. A common aspect of

such languages is that they are perceived as lacking

in naturalness (Clark et al., 2009) and that the sen-

tences have a repetitive structure inherited from the

source model. This can also be a beneﬁt since it al-

lows readers to quickly recognise and interpret the

different sentence structures (Clark et al., 2009).

The Action language is platform-independent in

the sense that it makes no assumptions on how col-

lections are to be implemented, all collections are

treated as sets. This is exempliﬁed on line 6 where

many Rooms are selected and stored as a set using

the variable rooms. On line 7 a for-loop is used to

iterate over the set. On the other side, the Action

language is not independent from the object-oriented

modelling paradigm. This shows in lines 1 and 2

where an instance of an object is created and then

associated to another object. To interpret the Action

language requires an understanding of the implicit in-

formation encoded in the paradigm of object-oriented

languages (Arlow et al., 1999). The aim of the translation is to make such information explicit without being too lengthy. Another aspect of the underlying design choices of the Action language is shown in the

naming convention for traversing across associations.

Here the unique association names are used, which

have no relevance for the domain. In the translation

to natural language texts, association names such as R2 are therefore not mentioned.

Just as graphical models the Action language is

supposed to deliver a high-level view of the system.

But the abstraction gets muddled by language-speciﬁc

details such as the association names and the object-

oriented syntax, concepts that are not meaningful to

all stakeholders (Forward and Lethbridge, 2008).

The quality of the generated text depends on meaningful names having been assigned to classes, parameters etc. If the class Reservation were named RSV instead, the translation would generate sentences such as res refers to an RSV, making the generated texts harder to comprehend.

On line 2 the statement relate res to self

across R4 could have been translated as relate res

to self. But what does it mean that two objects are

related? From an object-oriented view it means that

they can access each other’s public attributes and op-

erations. The translation tries to capture this without

going into details about the fundamentals of object-

oriented design, substituting the reference self for

the deﬁnite form of the class name of the referent, the

BookingProcess.
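The phrasing choices for relate statements can be sketched as a small rewrite rule. Note that the study derives its translations from the xtUML metamodel; the regular expression below operates on the statement text purely for the sake of a short, self-contained Python illustration, and `translate_relate` is a hypothetical helper.

```python
import re

# relate <a> to <b> across R<N>;
RELATE = re.compile(r"relate\s+(\w+)\s+to\s+(\w+)\s+across\s+R\d+\s*;")


def translate_relate(statement: str, owner_class: str) -> str:
    """Translate a relate statement, leaving out the association name."""
    match = RELATE.fullmatch(statement.strip())
    if match is None:
        raise ValueError(f"not a relate statement: {statement!r}")
    # 'self' is replaced by the definite form of the owning class name.
    names = [f"the {owner_class}" if name == "self" else name
             for name in match.groups()]
    return f"{names[0]} and {names[1]} share information"


print(translate_relate("relate res to self across R4;", "BookingProcess"))
```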

The Action code ﬁnishes by sending a signal

across the interface to the User component. De-

pending on if a room was found or not different

signals are sent. Here the name of the interface,

HotelInterfaces is substituted for the more infor-

mative User which is found by traversing the meta-

model across the interface and its ports to the receiv-

ing component.

The signals exemplify a challenge for generat-

ing summarisations; should the parameters be trans-

lated using the parameter name, its deﬁning ex-

pression or both? In the case of the message the

expression is more descriptive than the name but

for the room:room_number parameter both name

and expression would be useful. The value of the

process_id is less informative than the parameter

name (process_id is included as a parameter to en-

sure that the right instance of BookingProcess gets

the reply from the User). To make an informed de-

cision on the best phrasing in each case would re-

quire a semantic analysis of the values of the parame-

ter expressions in comparison to the parameter names,

something that is not supported by the transformation

language.

4 DISCUSSION

4.1 Changing the Language

Different stakeholders have different needs in terms of

the content of the summarisations, e.g. the developers

want a quick introduction to the functionality of the

system (Sridhara et al., 2010) while domain experts

want to validate that certain requirements are met and

maintained (Arlow et al., 1999). This can be accom-

modated by using different transformation rules for

generating the grammars. One transformation can

MODELSWARD2013-InternationalConferenceonModel-DrivenEngineeringandSoftwareDevelopment

286

Action language                               Natural language summarisation

create object instance res of Reservation;    res refers to a Reservation
relate res to self across R4;                 res and the BookingProcess share information
res.check_in = param.in;                      res’s check in gets the value of the given in
res.check_out = param.out;                    res’s check out gets the value of the given out
res.guests = param.quantity;                  res’s guests gets the value of the given quantity
room_number = 0;                              room number gets the value of 0
select many rooms from instances of Room;     rooms refers to many Rooms
for each room in rooms                        for each room in rooms
  relate self to room across R6;                the BookingProcess and room share information
  select many ress related by                   ress refers to many Reservations
      room -> Reservation[R2];
  for each res in ress                          for each res in ress
    if (res.check_in > param.out                  if res’s check in is greater than the given out
        or res.check_out < param.in)              or res’s check out is less than the given in
        and room.beds == param.quantity           and room’s beds equals the given quantity
      room_number = room.getNumber();             then room number gets the value of room’s get Number
      break;                                      %the for-loop is terminated
    end if;
  end for;
  if room_number > 0                            if room number is greater than 0
    break;                                        %the for-loop is terminated
  end if;
end for;
if room_number == 0                           if room number equals 0, then a cancellation
  send HotelInterfaces::                      with process id and message is sent to User
      cancellation(process_id:self.process_id,
          message:"No available rooms.");
else                                          else a confirm room with process id and room
  send HotelInterfaces::                      is sent to User
      confirm_room(process_id:self.process_id,
          room:room_number);
end if;

Figure 7: An example of Action language code with natural language summarisation.

then generate a grammar that produces summarisa-

tions for the developers while another transformation

is aimed towards the needs of the domain experts. The

result is a shared abstract syntax that is realised by dif-

ferent concrete syntaxes to ﬁt their respective needs

using different functions from the RGL.

Some stakeholders might prefer another language

than English. This can be facilitated by the multi-

lingual aspect of the Grammatical Framework. In

this approach the lexicon (or domain vocabulary) of

the grammar is generated from the Action language.

However, it is not obvious that the domain concepts

share their names across languages. There are two ways to overcome this challenge. The naïve way is to ensure that the modelling elements use the terminology of the desired target language; with this approach the lexicon is automatically generated in the desired language. The other solution is to manually develop

a lexicon per desired language, as explained in (An-

gelov and Ranta, 2009). Since the abstract functions

deﬁned by the RGL are language-independent, the

same rules can be used for all desired languages. In

this way the structure and content of the texts are pre-

served but with language-speciﬁc implementations of

the sentences.
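The multilingual idea, one abstract tree realised by several concrete syntaxes, can be sketched as follows. The Python fragment is an illustration only: the tuple encoding of trees, the `CONCRETE` table and the Swedish wording are assumptions made for the example, whereas in GF the language-specific work is delegated to the RGL.

```python
# Abstract syntax trees as nested tuples: ("Sum", left, right) or ("EInt", n).
CONCRETE = {
    "Eng": {"Sum": "the sum of {0} and {1}"},
    "Swe": {"Sum": "summan av {0} och {1}"},
}


def linearise(tree, lang):
    """Linearise one abstract tree with the concrete syntax chosen by lang."""
    if tree[0] == "EInt":
        return str(tree[1])
    template = CONCRETE[lang][tree[0]]
    return template.format(*(linearise(arg, lang) for arg in tree[1:]))


tree = ("Sum", ("EInt", 3), ("EInt", 5))
for lang in ("Eng", "Swe"):
    print(lang, linearise(tree, lang))
```

The tree is built once; only the table entry consulted during linearisation changes with the target language.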

It is important to remember that any changes to the

grammars are made through the transformation rules.

As a consequence the transformation experts need to

know the grammar that is used to model the texts well

enough to implement the changes. It also means that

neither the software modellers nor the customers need

to know how the text is generated or how to formulate model transformations. When the transformations have been defined the translations are generated at the push of a button. The generation can then be

repeated and reused for all models that conform to the

same metamodel as the transformation rules (Atkin-

son and Kuhne, 2003; Mellor et al., 2004).

TranslatingPlatform-independentCodeintoNaturalLanguageTexts

287

4.2 The Complexity of Model

Transformation

The complexity of the model transformations does not

lie in the complexity of the transformation rules but

in the complexity of the modelling language they are

applied to (Jézéquel et al., 2012).

On the target end of the transformation a knowl-

edge of linguistics in general and the grammar API

is needed to utilise the different categories and func-

tions of the grammar in an efﬁcient way. The alter-

native to grammars would be to generate text straight

from the models with the tedious work of making sure

that there is congruence between the verbs and the

noun phrases as well as taking care of aspects like

a reservation but an interface.
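To see why hand-written text generation becomes tedious, consider choosing the indefinite article from spelling alone. The Python function below is a deliberately naive sketch written for this illustration (`naive_indefinite` is a hypothetical helper): it handles the two words from the example above but fails for user, which is pronounced with an initial consonant sound.

```python
def naive_indefinite(noun: str) -> str:
    """Pick 'a' or 'an' from the first letter only (deliberately naive)."""
    article = "an" if noun[0].lower() in "aeiou" else "a"
    return f"{article} {noun}"


for noun in ("reservation", "interface", "user"):
    print(naive_indefinite(noun))  # the last line is not correct English
```

A resource grammar hides such cases behind its API, which is the motivation for using a grammar as the intermediate model.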

4.3 Text vs Models

Another beneﬁt of natural language translations of

textual software models embedded in graphical model

elements is that they make it possible to use any preferred text editor to search for concepts and actions that should be in the text. Different modelling tools have their own support for searching with different interfaces; learning how to use them all is a tall order for stakeholders (Arlow et al., 1999).

5 CONCLUSIONS

The proposed way of translating Action code differs

from previous work on code summarisation in that the

platform-independent models already have ﬁltered

away the non-functional properties of the software,

leaving the functional properties exposed. In comparison to previous research on generating natural language texts from software models this is the first attempt to generate natural language descriptions of software behaviour from platform-independent code.

The PIM can be reused to generate a number of different platform-specific models that include the usage of different APIs, programming languages, connections to operating systems and deployment on hardware. Since the functionality of the system is captured in the PIM, the generated text gives a natural language summary of the system’s behaviour regardless of how this behaviour is implemented. This

means that the generated text can be used across plat-

forms and updated by re-generation whenever the

PIM is changed to reﬂect new requirements or bug-

ﬁxing. So, instead of having one framework for trans-

lating Java, another framework for translating C and

a third for C++, a general framework for translating

platform-independent code can be reused across plat-

forms independently of how the system is realised.

6 FUTURE WORK

The mapping rules that define the transformation from PIM to PSM add the non-functional features that determine a certain combination of platform-specific details. Summaries generated from the mappings could then describe the different profiles and properties of the system, such as safety and persistence.

The challenges of natural language generation from the combination of textual and graphical models are an interesting area to explore further. A case study is planned that includes transformation rules that map the structure of the state machines onto the generated translations. In this way the translations will give an overall structure of the software that follows the lifecycles of the system’s classes and objects.

ACKNOWLEDGEMENTS

This work was partially funded by the National Grad-

uate School of Language Technology in Sweden and

the Center of Language Technology in Gothenburg.

The authors would like to thank Toni Siljamäki for sharing his insights into model transformations.

REFERENCES

Angelov, K. (2011). The Mechanics of the Grammatical

Framework. PhD thesis, Chalmers University of Tech-

nology, Gothenburg, Sweden.

Angelov, K. and Ranta, A. (2009). Implementing Con-

trolled Languages in GF. In Controlled Natural Lan-

guage, Workshop on Controlled Natural Language,

CNL 2009, Marettimo Island, Italy, June 8-10, 2009,

volume 5972 of Lecture Notes in Computer Science,

pages 82–101. Springer Verlag.

Arlow, J., Emmerich, W., and Quinn, J. (1999). Literate Modelling - Capturing Business Knowledge with the UML. In Selected papers from the First International Workshop on The Unified Modeling Language UML’98: Beyond the Notation, pages 189–199, London, UK. Springer-Verlag.

Atkinson, C. and Kuhne, T. (2003). Model-driven devel-

opment: a metamodeling foundation. IEEE Software,

20(5):36–41.

Burke, D. A. and Johannisson, K. (2005). Translating Formal Software Specifications to Natural Language. In Blache, P., Stabler, E. P., Busquets, J., and Moot, R., editors, LACL, volume 3492 of Lecture Notes in Computer Science, pages 51–66. Springer Verlag.

MODELSWARD2013-InternationalConferenceonModel-DrivenEngineeringandSoftwareDevelopment

288

Chomsky, N. (1959). On certain formal properties of gram-

mars. Information and Control, 2:137–167.

Clark, P., Murray, W. R., Harrison, P., and Thompson, J. A.

(2009). Naturalness vs. Predictability: A Key Debate

in Controlled Languages. In Controlled Natural Lan-

guage, Workshop on Controlled Natural Language,

CNL 2009, Marettimo Island, Italy, June 8-10, 2009,

volume 5972 of Lecture Notes in Computer Science,

pages 65–81. Springer Verlag.

Forward, A. and Lethbridge, T. C. (2008). Problems and

Opportunities for Model-Centric Versus Code-Centric

Software Development: A Survey of Software Pro-

fessionals. In Proceedings of the 2008 international

workshop on Models in Software Engineering, MiSE

’08, pages 27–32, New York, NY, USA. ACM.

Hähnle, R., Johannisson, K., and Ranta, A. (2002). An Authoring Tool for Informal and Formal Requirements Specifications. In Kutsche, R.-D. and Weber, H., editors, FASE, volume 2306 of Lecture Notes in Computer Science, pages 233–248. Springer.

Haiduc, S., Aponte, J., Moreno, L., and Marcus, A. (2010).

On the Use of Automated Text Summarization Tech-

niques for Summarizing Source Code. In Antoniol,

G., Pinzger, M., and Chikofsky, E. J., editors, WCRE,

pages 35–44. IEEE Computer Society.

Heldal, R. and Johannisson, K. (2006). Customer Validation

of Formal Contracts. In OCL for (Meta-)Models in

Multiple Application Domains, pages 13–25, Genova,

Italy.

Jézéquel, J.-M., Combemale, B., Derrien, S., Guy, C., and

Rajopadhye, S. (2012). Bridging the Chasm Between

MDE and the World of Compilation. Journal of Soft-

ware and Systems Modeling (SoSyM), pages 1–17.

Kleppe, A., Warmer, J., and Bast, W. (2005). MDA Explained: The Model Driven Architecture: Practice and Promise. Addison-Wesley Professional.

Lange, C. F. J., Bois, B. D., Chaudron, M. R. V., and De-

meyer, S. (2006). An experimental investigation of

UML modeling conventions. In Nierstrasz, O., Whit-

tle, J., Harel, D., and Reggio, G., editors, MoDELS,

volume 4199 of Lecture Notes in Computer Science,

pages 27–41. Springer.

Larman, C. (2004). Applying UML and Patterns: An Intro-

duction to Object-Oriented Analysis and Design and

Iterative Development (3rd Edition). Prentice Hall

PTR, Upper Saddle River, NJ, USA.

Ljunglöf, P. (2011). Editing Syntax Trees on the Surface.

In Nodalida’11: 18th Nordic Conference of Compu-

tational Linguistics, volume 11, Riga, Latvia. NEALT

Proceedings Series.

McCarthy, J. (1962). Towards a Mathematical Science of

Computation. In Proceedings of the Information Pro-

cessing Congress, pages 21–28. North-Holland.

Mellor, S. J. and Balcer, M. (2002). Executable UML: A

Foundation for Model-Driven Architectures. Addison-

Wesley Longman Publishing Co., Inc., Boston, MA,

USA.

Mellor, S. J., Kendall, S., Uhl, A., and Weise, D. (2004).

MDA Distilled. Addison Wesley Longman Publishing

Co., Inc., Redwood City, CA, USA.

Mens, T. and Gorp, P. V. (2006). A Taxonomy of Model

Transformation. Electronic Notes in Theoretical Com-

puter Science, 152:125–142.

Miller, J. and Mukerji, J. (2003). MDA Guide Version 1.0.1.

Technical report, Object Management Group (OMG).

Mohagheghi, P. and Aagedal, J. (2007). Evaluating quality

in model-driven engineering. In MISE ’07: Proceed-

ings of the International Workshop on Modeling in

Software Engineering, page 6, Washington, DC, USA.

IEEE Computer Society.

Nicolás, J. and Álvarez, J. A. T. (2009). On the generation of requirements specifications from software engineering models: A systematic literature review. Information & Software Technology, 51(9):1291–1307.

Perry, D. E. and Wolf, A. L. (1992). Foundations for the

study of software architecture. SIGSOFT Softw. Eng.

Notes, 17:40–52.

Ranta, A. (2008). Grammars as software libraries. In Huet,

G., Plotkin, G., Lévy, J.-J., and Bertot, Y., editors,

From semantics to computer science: essays in honor

of Gilles Kahn. Cambridge University Press.

Ranta, A. (2009). The GF Grammar Resource Library. Lin-

guistic Issues in Language Technology, 2(2).

Ranta, A. (2011). Grammatical Framework: Program-

ming with Multilingual Grammars. CSLI Publica-

tions, Stanford.

Rastkar, S., Murphy, G. C., and Bradley, A. W. J. (2011).

Generating natural language summaries for crosscut-

ting source code concerns. In ICSM, pages 103–112.

IEEE.

Reiter, E. and Dale, R. (1997). Building applied natural

language generation systems. Nat. Lang. Eng., 3:57–

87.

Shlaer, S. and Mellor, S. J. (1992). Object lifecycles: model-

ing the world in states. Yourdon Press, Upper Saddle

River, NJ, USA.

Spreeuwenberg, S., Van Grondelle, J., Heller, R., and Grijzen, G. (2010). Design of a CNL to involve domain experts in modelling. In Rosner, M. and Fuchs, N., editors, CNL 2010 Second Workshop on Controlled Natural Languages. CEUR Workshop Proceedings.

Sridhara, G., Hill, E., Muppaneni, D., Pollock, L., and

Vijay-Shanker, K. (2010). Towards automatically gen-

erating summary comments for Java methods. In Pro-

ceedings of the IEEE/ACM international conference

on Automated software engineering, ASE ’10, pages

43–52, New York, NY, USA. ACM.

Sridhara, G., Pollock, L., and Vijay-Shanker, K. (2011).

Automatically detecting and describing high level ac-

tions within methods. In Proceedings of the 33rd

International Conference on Software Engineering,

ICSE ’11, pages 101–110, New York, NY, USA.

ACM.

Starr, L. (2001). Executable UML: How to Build Class

Models. Prentice Hall PTR, Upper Saddle River, NJ,

USA.

Stevens, P. (2007). A Landscape of Bidirectional Model

Transformations. In Lämmel, R., Visser, J., and

Saraiva, J., editors, GTTSE, volume 5235 of Lecture

Notes in Computer Science, pages 408–424. Springer.

TranslatingPlatform-independentCodeintoNaturalLanguageTexts

289

Warmer, J. and Kleppe, A. (2003). The Object Con-

straint Language: Getting Your Models Ready for

MDA. Addison-Wesley Longman Publishing Co.,

Inc., Boston, MA, USA, 2nd edition.

Wyner, A., Angelov, K., Barzdins, G., Damljanovic, D.,

Davis, B., Fuchs, N., Hoefler, S., Jones, K., Kaljurand,

K., Kuhn, T., Luts, M., Pool, J., Rosner, M., Schwit-

ter, R., and Sowa, J. (2010). On controlled natural lan-

guages: Properties and prospects. In Fuchs, N. E., ed-

itor, Proceedings of the Workshop on Controlled Nat-

ural Language (CNL 2009), volume 5972 of Lecture

Notes in Computer Science, pages 281–289, Berlin /

Heidelberg, Germany. Springer Verlag.

MODELSWARD2013-InternationalConferenceonModel-DrivenEngineeringandSoftwareDevelopment

290