An Intermediate Language for Compilation to Scripting Languages
Paola Giannini and Albert Shaqiri
Computer Science Institute, DiSIT, Università del Piemonte Orientale
Via Teresa Michel 11, 15121 Alessandria, Italy
Keywords:
Scripting Languages, Functional Languages, Intermediate Language, Translation.
Abstract:
In this paper we introduce an intermediate language for the translation of F#, a polymorphically typed functional language relying on the .NET platform, to different scripting languages, such as Python and JavaScript. This intermediate language (IL for short) is an imperative language with constructs that make it possible to move a code fragment outside its definition environment during the translation. Definitions of names (variables and functions) are given in blocks, as in Python (and JavaScript), and do not have to statically precede their use. We present a translation of a core F# (including mutable variables) into IL.
1 INTRODUCTION
Implementing an application in JavaScript (or any other dynamically typed language) can cause problems due to the absence of static type checking. Such problems can lead to unexpected application behaviour followed by onerous debugging. Although dynamic type checking and automatic type casting shorten the programming time, they introduce serious difficulties in the maintenance of medium to large applications. This is the reason why dynamically typed languages are used mostly for prototyping and quick scripting.
We propose to deal with these problems by using dynamically typed languages as “assembly languages” to which we translate source code written in F#, which is statically typed. In this way, we take advantage of the F# type checker and type inference system, as well as other F# constructs and paradigms such as pattern matching, classes, discriminated unions, namespaces, etc., and we may use the safe imperative features introduced via F# mutable variables. There are also the advantages of using an IDE such as Microsoft Visual Studio (code organization, debugging tools, IntelliSense, etc.).
To provide translation to different target languages we introduce an intermediate language, IL for short. This is useful, for instance, for translating to Python, which does not have complete support for functions as a first-class concept, or for translating to JavaScript, with or without libraries such as jQuery.
Our aim is to prove the correctness of the compilers produced. To do that we formalize IL and the translation from the source language to IL. The language IL is imperative and has some of the characteristics that make scripting languages flexible but difficult to check, such as blocks in which definition and use of variables may be interleaved, and in which the use of a variable may precede its definition. (IL is partly inspired by IntegerPython, see (Ranson et al., 2008).) Therefore, the proof of correctness of the translation from the source language F# to IL already covers most of the gap from F# to the target scripting languages. In IL we also have some constructs that may be used to safely manipulate fragments of open code.
The paper is organized as follows. In Section 2, we introduce the challenges of the translation from F# to Python and JavaScript via some examples that led us to introduce our intermediate language. We also outline the translation from IL to both JavaScript and Python. In Section 3 we define the fragment of F# used as source language, and in Section 4 we formalize IL. The formal translation from F# to IL is defined in Section 5, where it is stated to preserve the dynamic semantics of F#. In Section 6 we compare our work with the work of others, and finally in Section 7 we summarize our work, briefly discussing the implementation issues and highlighting our plans for future work.
This work has been partially supported by MIUR CINA-Compositionality, Interaction, Negotiation, Autonomicity for the future ICT society.
Giannini P. and Shaqiri A.
An Intermediate Language for Compilation to Scripting Languages.
DOI: 10.5220/0004588600920103
In Proceedings of the 8th International Joint Conference on Software Technologies (ICSOFT-EA-2013), pages 92-103
ISBN: 978-989-8565-68-6
Copyright © 2013 SCITEPRESS (Science and Technology Publications, Lda.)
2 TRANSLATION BY EXAMPLES: DESIGN CHOICES
In the fragment of F# we consider as source of our translation we have the typical functional language constructs: function definition and application, integers, booleans, addition and the conditional expression, and an imperative fragment including mutable variables, assignment, and sequences of expressions. On the left-hand side of an assignment there must be a variable that was introduced with the mutable modifier.
2.1 Sequences of Expressions
Many F# constructs can be directly mapped to JavaScript (or Python), but when this is not the case we obtain a semantically equivalent behaviour by using the primitives offered by the target language. E.g., in F# a sequence of expressions is itself an expression, while in JavaScript and Python it is a statement. Suppose we want to translate a piece of code that calculates a Fibonacci number, binds the result to a name, and also stores the information whether the result is even or odd. In Fig. 1 we have one possible F# implementation.
let z = 7
let mutable even = false
let x =
    let rec fib x =
        if x < 3 then 1
        else fib(x - 1) + fib(x - 2)
    let temp = fib z
    even <- (temp % 2 = 0)
    temp
x
Figure 1: F# program containing sequence of expressions.
As we can see, on the right-hand side of let x= we have a sequence of expressions: the definition of the function fib followed by the definition of temp, etc. This sequence is, in F#, an expression. If we directly map this code into JavaScript we obtain the code of Fig. 2, which is syntactically incorrect: on the right-hand side of an assignment we must have an expression, while a sequence of expressions is, in JavaScript, a statement. To transform a sequence of statements into an expression in JavaScript, we wrap the sequence into a function, and to execute it we call the function, i.e., we use a JavaScript closure and application. Also, the whole program is wrapped into an entry point function. In this way, the code of Fig. 3 is correct.
var z = 7;
var even = false;
var x =
    var fib = function (x) {
        if (x < 3) return 1;
        else return fib(x-1)+fib(x-2)};
    var temp = fib(z);
    even = (temp % 2) == 0;
    temp;
return x;
Figure 2: Naive translation into JavaScript of sequence of expressions.
(function() {
    var z = 7;
    var even = false;
    var x = (function () {
        var fib = function (x) {
            if (x < 3) return 1;
            else return fib(x-1)+fib(x-2)};
        var temp = fib(z);
        even = (temp % 2) == 0;
        return temp })();
    return x })();
Figure 3: Correct JavaScript translation.
Unfortunately, the same cannot be done in Python, as its support for closures is partial. So we have to define a temporary function, say temp1, in the global scope, and to execute it we have to call temp1 in the place where the original sequence should be. However, variables such as even would then be out of the scope of their definition, and this would make the translation wrong. To obtain a semantically equivalent behaviour, we have to pass the variable even to temp1 by reference, since it may be modified in the body of temp1. Note that this problem is not present in JavaScript, where the closure is defined and called in the scope of even. Another problem in Python is related to lambdas, whose body must be an expression (not a sequence). So we define the function temp2, whose body contains the statements that should be placed where an expression is expected. In Fig. 4 we can see the translation of the F# code into Python. The class ByRef is used to wrap the mutable variable even to obtain a parameter called by reference. The Python code generator inserts the needed wrapping and unwrapping before and after the call of temp1, and in the body of temp1.
def temp1(w, z):
    def temp2(w, fib, x):
        if (x < 3): return 1
        else: return fib(x-1)+fib(x-2)
    fib = lambda x: temp2(w, fib, x)
    temp = fib(z)
    w.value = ((temp % 2) == 0)
    return temp
def __main__():
    z = 7
    even = False
    wrapper1 = ByRef(even)
    x = temp1(wrapper1, z)
    even = wrapper1.value
    return x
__main__();
Figure 4: Correct Python translation.
The problem we illustrated above occurs whenever in the target language we get a statement where an expression is expected. Since the target languages handle the situation differently, we abstract from this specific problem and consider the more general problem of moving “open code” from its context, replacing it with an expression having the same behaviour. Taking inspiration from work on dynamic binding, see (Nanevski, 2003), and recent work by the authors, see (Ancona et al., 2013), we define a pair of boxing/unboxing constructs that we call stm2exp and exc. The construct stm2exp wraps “open code” (in this case a sequence of expressions), providing the information on the environment needed for its execution, that is, the mutable and immutable variables occurring in it. This construct defines a value, similar to a function closure. The construct exc is used to execute the code contained in stm2exp. To do this it must provide values for the immutable variables, in our example the variable z, and bindings of the mutable variables to variables in the current environment, since when executing the code we have to modify the variable even.
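The boxing/unboxing idea can be approximated directly in Python. The following is only a minimal sketch under our own assumptions: the class ByRef is a hypothetical one-slot box (the paper names the class but does not show it), and the functions stm2exp and exc below are illustrative stand-ins for the IL constructs, not the code produced by the compiler. Mutable variables are passed as boxes keyed by a global name (EV), immutable ones by value.

```python
class ByRef:
    """Hypothetical one-slot box used to pass a mutable variable by reference."""
    def __init__(self, value):
        self.value = value


def stm2exp(body):
    """Box open code. `body` expects a dict of ByRef boxes (mutable variables,
    keyed by global name) and a tuple of values (immutable variables)."""
    return body


def exc(boxed, rebinding, values):
    """Execute boxed code, rebinding global names to boxes of the caller."""
    return boxed(rebinding, values)


def fib_block(refs, values):
    w = refs["EV"]   # mutable variable, unbound through the global name EV
    (u,) = values    # immutable variable, only an initial value is needed

    def fib(x):
        return 1 if x < 3 else fib(x - 1) + fib(x - 2)

    temp = fib(u)
    w.value = (temp % 2 == 0)
    return temp


def main(z=7):
    y = stm2exp(fib_block)
    even = ByRef(False)
    x = exc(y, {"EV": even}, (z,))
    return x, even.value
```

Calling main() runs the boxed block in an environment where EV is rebound to the box of even, so the assignment inside the block is visible to the caller, while u only receives a copy of the value of z.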
With these constructs, the F# code of Fig. 1 would be translated into the IL code in Fig. 5. All the let constructs are translated to variable definitions. The sequence of statements on the right-hand side of “let x=” is packed into a stm2exp expression. Its first component is the translation of the sequence of statements; the second, w->EV, says that in the execution environment there should be a rebinding of the global name EV to a variable. Such a variable may (in this case will) be modified by the execution of the code through assignment to the local variable w. The third component says that a value for u must be provided. The variable u is not modified by the execution of the code. We choose to use global names to unbind/rebind mutable variables, w in our example, so that local variables can be consistently renamed without affecting the semantics of the construct, just like formal parameters of functions. Names such as EV are, instead, global to the whole program.
def y = stm2exp(
    def fib =
        fun x ->
            if x < 3 then 1
            else (fib (x-1) + fib (x-2));
    def temp = fib u;
    w <- temp % 2 = 0;
    temp,
    w->EV, u);
def z = 7;
def even = false;
def x = exc(y, EV->even, z);
x
Figure 5: Translation of F# sequence of expressions in the intermediate language.
To obtain the result that we would have by evaluating the sequence of statements in the current environment, the variable x is assigned the exc expression applied to y, which is bound to stm2exp(···). The name EV is bound to the (mutable) variable even, and the variable u is assigned the value of the variable z. Regarding the different treatment of mutable and immutable variables, notice that, even though our intermediate language is imperative, we know, since we are translating F# code, that some variables are immutable, so we only have to provide an initial value for them.
The constructs stm2exp and exc have different translations into the target languages JavaScript and Python. In particular, for JavaScript we can take advantage of the fact that the closure wrapping the code can be inlined in the position where we have exc, so we can substitute both the mutable and immutable variables, whereas the translation to Python treats the two kinds of variables differently.
2.2 Dynamic Type Checking
JavaScript, and many dynamically typed languages, lack a rigorous type system. On the contrary, in F# if we write a function that adds two integers, say:
let add x y = x + y
we get
val add : int -> int -> int
because, even though we do not specify type information, the interpreter infers the type shown after the function definition. Therefore, there is no way of calling add with arguments that are not of type integer. However, if our translation into the intermediate code produced a function whose body was simply x+y, which in turn could be translated into the corresponding expression in both JavaScript and Python, the target JavaScript function could be called as, e.g., add("foo")(1), obtaining the string "foo1", which is not what we wanted. In Python the situation would be better, in the sense that we cannot call add on a string and an integer; however, due to overloading we can call it on two floating-point numbers, obtaining a floating-point number. To prevent this, the translation into the intermediate language, which follows, inserts dynamic checks on the parameters of functions.
def add = fun x ->
    def x1 = check(int, x);
    fun y ->
        def y1 = check(int, y);
        x1 + y1;
These checks are translated into dynamic type checking in JavaScript and Python. In JavaScript we use the function checkInt (that we defined), which returns its argument if it is an integer and fails, raising an exception, otherwise:
var add = function (x) {
    var x1 = checkInt(x);
    return function(y) {
        var y1 = checkInt(y);
        return x1 + y1 } }
Similarly for Python:
def temp__1(y, x):
    y1 = checkInt(y)
    return (x + y1)
def temp__2(x):
    x1 = checkInt(x)
    return lambda y: temp__1(y, x1)
add = lambda x: temp__2(x)
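The paper does not show the body of checkInt. The following is one possible Python definition, given only as a sketch: the name check_int, the choice of TypeError as the exception, and the explicit exclusion of booleans (bool is a subclass of int in Python) are our assumptions, not the authors' implementation.

```python
def check_int(value):
    """Return `value` unchanged if it is an integer, otherwise raise TypeError.

    bool is rejected explicitly because it is a subclass of int in Python.
    """
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    raise TypeError("expected int, got " + type(value).__name__)


def add(x):
    """Curried addition with a dynamic check on each parameter."""
    x1 = check_int(x)

    def add_x(y):
        y1 = check_int(y)
        return x1 + y1

    return add_x
```

With these checks add(2)(3) still evaluates to 5, while add("foo") and add(1)(2.5) fail at the point where the offending argument is received.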
3 CORE F#
The syntax of the core F# language is presented in Fig. 6. We sacrificed minimality for clarity, including constructs such as let, let mutable, and let rec that are used in programming practice and that raise challenges in the translation to dynamic languages. We also did not introduce imperative features through reference types, but through mutable variables, since this is closer to the imperative style of programming. Moreover, we present a typed version of F# without type inference, since this is performed by the F# compiler. In the type system we omit type variables, as they do not add complexity to the translation.
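\[
\begin{array}{rcl}
e & ::= & x \mid n \mid \mathtt{tr} \mid \mathtt{fls} \mid e\ \mathtt{+}\ e \mid \mathtt{if}\ e\ \mathtt{then}\ e\ \mathtt{else}\ e \\
  & \mid & \mathtt{fun}\ x{:}T\ \mathtt{->}\ e \mid \mathtt{let}\ [\mathtt{mutable}]\ x{=}e\ \mathtt{in}\ e \\
  & \mid & e\ e \mid \mathtt{let\ rec}\ \overline{x{:}T{=}v}\ \mathtt{in}\ e \mid x\ \mathtt{<-}\ e \mid e, e \\
T & ::= & \mathtt{int} \mid \mathtt{bool} \mid T \to T \\
v & ::= & n \mid \mathtt{tr} \mid \mathtt{fls} \mid \mathtt{fun}\ x{:}T\ \mathtt{->}\ e
\end{array}
\]
Figure 6: Syntax of F#.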
In the grammar for expressions, in Fig. 6, the square brackets “[...]” delimit an optional part of the syntax, we use x, y, z for variable names, and the overbar sequence notation is used according to (Igarashi et al., 2001). For instance, “$\overline{x{:}T{=}v}$” stands for “$x_1{:}T_1{=}v_1 \cdots x_n{:}T_n{=}v_n$”. The empty sequence is denoted by “$\emptyset$”. For an F# expression e, the free variables of e, FV(e), are defined in the standard way. An expression e is closed if FV(e) = $\emptyset$.
The let rec construct introduces mutually recursive variables. Variable names in this construct are meant to be bound to functions (as seen for fib in the example of Fig. 1). The let construct (followed by an optional mutable modifier) binds the variable x to the value resulting from the evaluation of the expression on the right-hand side of = in the evaluation of the body of the construct. As usual, the notation let f x=$e_1$ in $e_2$ is a shorthand for let f=fun x:T -> $e_1$ in $e_2$, where T is the type of $e_1$. Similarly for let rec. In the concrete syntax of the examples, as in F#, “,” and in are replaced by a line break without indentation.
When the let construct is followed by mutable the variable introduced is mutable. Only mutable variables may be used on the left-hand side of an assignment. This restriction is enforced by the type system of the language. The type system also enforces the restriction that the body of a function cannot contain free mutable variables, even though it may contain bound mutable variables. So, the function f in Fig. 7 is not correct, whereas the definition of g that follows is correct.
let mutable z = 0
let f x =
    if (x > 0) then z <- x
    else z <- -x
    z
let g x =
    let mutable w = 0
    if (x > 0) then w <- x
    else w <- -x
    w
z
Figure 7: Typing functions in F#.
A type environment $\Gamma$ is defined by:
$\Gamma ::= x{:}T, \Gamma \mid x{:}T!, \Gamma \mid \emptyset$
that is, $\Gamma$ associates variables with types, possibly followed by !. If the type is followed by !, this means that the variable was introduced with the mutable modifier. Let $\dagger$ denote either ! or the empty string, and let $dom(\Gamma) = \{x \mid x{:}T\dagger \in \Gamma\}$. We assume that for any variable x there is at most one associated type in $\Gamma$. We say that the expression e has type T in the environment $\Gamma$ if the judgement
$\Gamma \vdash e : T$
is derivable from the rules of Fig. 8. In the rules of Fig. 8, with $\Gamma[\Gamma']$ we denote the type environment such that $dom(\Gamma[\Gamma']) = dom(\Gamma) \cup dom(\Gamma')$ and:
• if $x{:}T\dagger \in \Gamma'$ then $x{:}T\dagger \in \Gamma[\Gamma']$, and
• if $x{:}T\dagger \in \Gamma$ and $x \notin dom(\Gamma')$, then $x{:}T\dagger \in \Gamma[\Gamma']$.
In the following we describe the most interesting rules.
Consider rule (TYABS): to type the body of a function we need assumptions on its free variables and formal parameter. From the definition of $\Gamma[\Gamma']$ we have that the assumptions on its free variables must coincide with the ones present in the environment of the definition of the function. Moreover, none of them may have been declared as mutable. However, in the environment in which the function is defined, $\Gamma[\Gamma']$, there can be mutable variables, as long as they are not needed to type the body of the function. In the example of Fig. 7, if the definition of the function f were typable, it would have to be typed from the environment $\Gamma[\Gamma'] = z{:}\mathtt{int}!$; therefore, to type its body we would have to use the environment $z{:}\mathtt{int}!, x{:}\mathtt{int}$, i.e., $\Gamma' = z{:}\mathtt{int}!$. However, this is not possible, since $\Gamma'$ may not contain mutable variables. Instead, the definition of g, which is again typed in $\Gamma[\Gamma'] = z{:}\mathtt{int}!$, not having z free in its body, can be typed from $x{:}\mathtt{int}$, by defining $\Gamma' = \emptyset$.
The rules (TYLET) and (TYLETMUT) bind a variable, x, to the expression $e_1$ in the expression $e_2$. So the expression $e_2$ is typed in a type environment in which x is associated with the type of $e_1$.
In the rule (TYLETMUT) the type is followed by ! so that inside $e_2$ the variable x may be used on the left-hand side of an assignment, see rule (TYASSIGN).
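The side condition of (TYABS), that a function body has no free mutable variables, can be checked separately from the rest of the type system. The sketch below is an illustration under our own assumptions: expressions are encoded as tagged Python tuples (an encoding that is not part of the paper), the helper names are hypothetical, and f and g are simplified versions of the functions of Fig. 7 (the conditional is dropped, since the core language has no comparison or negation).

```python
def free_vars(e):
    """Free variables of a core expression encoded as a tagged tuple."""
    tag = e[0]
    if tag == "var":                      # ("var", x)
        return {e[1]}
    if tag in ("num", "bool"):            # ("num", n) or ("bool", b)
        return set()
    if tag == "fun":                      # ("fun", x, body)
        return free_vars(e[2]) - {e[1]}
    if tag in ("let", "letmut"):          # (tag, x, e1, e2)
        return free_vars(e[2]) | (free_vars(e[3]) - {e[1]})
    if tag == "assign":                   # ("assign", x, e)
        return {e[1]} | free_vars(e[2])
    # "add", "app", "seq" and "if" only contain sub-expressions
    return set().union(*(free_vars(sub) for sub in e[1:]))


def function_ok(fun, mutable_in_scope):
    """True if `fun` uses no variable declared mutable in its environment."""
    return not (free_vars(fun) & mutable_in_scope)


# Simplified f of Fig. 7: fun x -> (z <- x, z), with z mutable outside f.
f = ("fun", "x", ("seq", ("assign", "z", ("var", "x")), ("var", "z")))

# Simplified g of Fig. 7: fun x -> let mutable w = 0 in (w <- x, w).
g = ("fun", "x",
     ("letmut", "w", ("num", 0),
      ("seq", ("assign", "w", ("var", "x")), ("var", "w"))))
```

Here function_ok(f, {"z"}) is False because z is free in f, whereas function_ok(g, {"z"}) is True: the mutable variable w is bound inside g.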
Our core F# language has imperative features, so for the definition of the operational semantics we use a store. The runtime configurations are pairs “expression, store”, $e \mid \rho$, where a store $\rho$ is a mapping between locations and values:
$l_1 \mapsto v_1, \ldots, l_n \mapsto v_n$
In Fig. 9 we define:
• runtime expressions, which are expressions including locations (generated by the evaluation of mutable variable definitions);
• evaluation contexts defining, in conjunction with rule (CTX-F), the reduction strategy of the language, which is call-by-value with left-to-right evaluation; and
• the rules for the evaluation relation, $\longrightarrow$.
In the rules, with $e[x := e']$ we denote the result of substituting x with $e'$ in e, with renaming if needed. Moreover, $\rho[x \mapsto v]$ is defined by: $\rho[x \mapsto v](x) = v$, and $\rho[x \mapsto v](y) = \rho(y)$ when $x \neq y$.
The evaluation of the sum expression assumes that the operands are integers, and returns n, which is the numeral corresponding to the sum of the values of $n_1$ and $n_2$. For the conditional expression we have two rules corresponding to the (boolean) value of the condition. Both the evaluation of the application, rule (APP-F), and of let, rule (LET-F), substitute x with its value in the body of the construct. This is in accord with the fact that x is immutable. Instead, for a variable defined mutable, rule (LETMUT-F), a new location l is generated and added to the store with the initial value v, and the variable x is substituted with l. Therefore, during evaluation, expressions may contain locations. Indeed, since variables on the left-hand side of assignments were always introduced by let mutable, when an assignment is evaluated, rule (ASSIGN-F), we have a configuration $l\ \mathtt{<-}\ v \mid \rho$, which is evaluated by changing the value of the location l to be v. The evaluation of let rec, rule (REC-F), produces the body e in which each variable $x_i$ is substituted with a let rec expression with body $v_i$, so that if $x_i$ is evaluated all the variables $\overline{x}$ will be substituted with their definitions $\overline{v}$. Evaluation of a location, rule (VAR-F), produces the value associated with it in the store. Finally, in rule (CTX-F) the context E selects the first sub-expression to be evaluated. We can show that evaluation is deterministic.
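The different treatment of immutable and mutable variables can be illustrated with a small interpreter. This is a sketch under stated assumptions, not the semantics of Fig. 9 itself: it is written in big-step style and uses an environment instead of substitution, expressions are encoded as tagged tuples, and the class Loc and the function evaluate are hypothetical names. Only numbers, sums, let, let mutable, assignment, sequences and variables are covered.

```python
class Loc:
    """A store location; each instance is a distinct key of the store."""


def evaluate(e, env, store):
    """Evaluate `e`, where `env` maps variables to values (immutable) or to
    locations (mutable), and `store` maps locations to values."""
    tag = e[0]
    if tag == "num":                      # ("num", n)
        return e[1]
    if tag == "var":                      # ("var", x)
        bound = env[e[1]]
        return store[bound] if isinstance(bound, Loc) else bound
    if tag == "add":                      # ("add", e1, e2), left to right
        left = evaluate(e[1], env, store)
        right = evaluate(e[2], env, store)
        return left + right
    if tag == "let":                      # ("let", x, e1, e2): bind the value
        value = evaluate(e[2], env, store)
        return evaluate(e[3], {**env, e[1]: value}, store)
    if tag == "letmut":                   # ("letmut", x, e1, e2): new location
        value = evaluate(e[2], env, store)
        loc = Loc()
        store[loc] = value
        return evaluate(e[3], {**env, e[1]: loc}, store)
    if tag == "assign":                   # ("assign", x, e): update the store
        value = evaluate(e[2], env, store)
        store[env[e[1]]] = value
        return value
    if tag == "seq":                      # ("seq", e1, e2): discard first value
        evaluate(e[1], env, store)
        return evaluate(e[2], env, store)
    raise ValueError("unsupported expression: %r" % (tag,))


# let mutable x = 1 in (x <- x + 41, x)
program = ("letmut", "x", ("num", 1),
           ("seq",
            ("assign", "x", ("add", ("var", "x"), ("num", 41))),
            ("var", "x")))
```

Evaluating program in an empty environment and store allocates one location for x, updates it through the assignment, and returns the updated content when x is read.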
The typing rules in Fig. 8 are for the (source) expression language, so they do not include a rule for locations. To type run-time expressions we need a store environment $\Sigma$ assigning types to locations. The type judgement should therefore be:
$\Gamma \mid \Sigma \vdash e : T$
and the typing rule for locations is
$\Gamma \mid \Sigma \vdash l : \Sigma(l)$ (TYLOCF)
All the other rules are obtained by putting $\Gamma \mid \Sigma$ on the left-hand side of $\vdash$ in the typing rules of Fig. 8.
Definition 1. A store $\rho$ is well-typed with respect to a type environment $\Gamma$ and a store environment $\Sigma$, written $\Gamma \mid \Sigma \vdash \rho$, if $dom(\rho) = dom(\Sigma)$, and for all $l \in dom(\rho)$ we have that $\Gamma \mid \Sigma \vdash \rho(l) : \Sigma(l)$.
Types are preserved by reduction, and progress holds, as the following two theorems state.
Theorem 2 (Preservation). Let $\Gamma \mid \Sigma \vdash e : T$, and let $\rho$ be such that $\Gamma \mid \Sigma \vdash \rho$. If $e \mid \rho \longrightarrow e' \mid \rho'$, then $\Gamma \mid \Sigma' \vdash e' : T$, for some $\Sigma' \supseteq \Sigma$ such that $\Gamma \mid \Sigma' \vdash \rho'$.
Theorem 3 (Progress). Let $\emptyset \mid \Sigma \vdash e : T$; then either e is a value or for any store $\rho$ such that $\emptyset \mid \Sigma \vdash \rho$ there are $e'$ and $\rho'$ such that $e \mid \rho \longrightarrow e' \mid \rho'$.
ICSOFT2013-8thInternationalJointConferenceonSoftwareTechnologies
96
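\[ \Gamma \vdash n : \mathtt{int} \quad \text{(TYNUM)} \qquad\qquad \Gamma \vdash \mathtt{tr}, \mathtt{fls} : \mathtt{bool} \quad \text{(TYBOOL)} \]
\[ \dfrac{\Gamma \vdash e_1 : \mathtt{int} \qquad \Gamma \vdash e_2 : \mathtt{int}}{\Gamma \vdash e_1\ \mathtt{+}\ e_2 : \mathtt{int}} \ \text{(TYSUM)} \]
\[ \dfrac{\Gamma \vdash e : \mathtt{bool} \qquad \Gamma \vdash e_1 : T \qquad \Gamma \vdash e_2 : T}{\Gamma \vdash \mathtt{if}\ e\ \mathtt{then}\ e_1\ \mathtt{else}\ e_2 : T} \ \text{(TYIF)} \]
\[ \dfrac{\Gamma'[x{:}T] \vdash e : T' \qquad \forall y, T''.\ y{:}T''! \notin \Gamma'}{\Gamma[\Gamma'] \vdash \mathtt{fun}\ x{:}T\ \mathtt{->}\ e : T \to T'} \ \text{(TYABS)} \]
\[ \dfrac{\Gamma \vdash e_1 : T \to T' \qquad \Gamma \vdash e_2 : T}{\Gamma \vdash e_1\ e_2 : T'} \ \text{(TYAPP)} \]
\[ \dfrac{x{:}T\dagger \in \Gamma}{\Gamma \vdash x : T} \ \text{(TYVAR)} \]
\[ \dfrac{\Gamma \vdash e_1 : T \qquad \Gamma[x{:}T] \vdash e_2 : T'}{\Gamma \vdash \mathtt{let}\ x{=}e_1\ \mathtt{in}\ e_2 : T'} \ \text{(TYLET)} \]
\[ \dfrac{\Gamma[\overline{x{:}T}] \vdash v_i : T_i \ \ (1 \le i \le n) \qquad \Gamma[\overline{x{:}T}] \vdash e : T'}{\Gamma \vdash \mathtt{let\ rec}\ \overline{x{:}T{=}v}\ \mathtt{in}\ e : T'} \ \text{(TYREC)} \]
\[ \dfrac{\Gamma \vdash e_1 : T \qquad \Gamma[x{:}T!] \vdash e_2 : T'}{\Gamma \vdash \mathtt{let\ mutable}\ x{=}e_1\ \mathtt{in}\ e_2 : T'} \ \text{(TYLETMUT)} \]
\[ \dfrac{\Gamma \vdash e : T \qquad x{:}T! \in \Gamma}{\Gamma \vdash x\ \mathtt{<-}\ e : T} \ \text{(TYASSIGN)} \]
\[ \dfrac{\Gamma \vdash e_1 : T \qquad \Gamma \vdash e_2 : T'}{\Gamma \vdash e_1, e_2 : T'} \ \text{(TYSEQ)} \]
Figure 8: Typing rules of core F#.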
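\[
\begin{array}{rcll}
e & ::= & \cdots \mid l & \text{runtime expression} \\
E & ::= & [\,] \mid E\ \mathtt{+}\ e \mid n\ \mathtt{+}\ E \mid \mathtt{if}\ E\ \mathtt{then}\ e\ \mathtt{else}\ e \mid E\ e \mid v\ E \mid \mathtt{let}\ [\mathtt{mutable}]\ x{=}E\ \mathtt{in}\ e & \text{evaluation contexts} \\
  & \mid & u\ \mathtt{<-}\ E \mid E, e &
\end{array}
\]
\[
\begin{array}{ll}
n_1\ \mathtt{+}\ n_2 \mid \rho \longrightarrow n \mid \rho \quad \text{if } \tilde{n} = \tilde{n}_1 +_{\mathtt{int}} \tilde{n}_2 & \text{(SUM-F)} \\
\mathtt{if}\ \mathtt{tr}\ \mathtt{then}\ e_1\ \mathtt{else}\ e_2 \mid \rho \longrightarrow e_1 \mid \rho & \text{(IFTRUE-F)} \\
\mathtt{if}\ \mathtt{fls}\ \mathtt{then}\ e_1\ \mathtt{else}\ e_2 \mid \rho \longrightarrow e_2 \mid \rho & \text{(IFFALSE-F)} \\
(\mathtt{fun}\ x{:}T\ \mathtt{->}\ e)\ v \mid \rho \longrightarrow e[x := v] \mid \rho & \text{(APP-F)} \\
\mathtt{let}\ x{=}v\ \mathtt{in}\ e \mid \rho \longrightarrow e[x := v] \mid \rho & \text{(LET-F)} \\
\mathtt{let\ rec}\ \overline{x{:}T{=}v}\ \mathtt{in}\ e \mid \rho \longrightarrow e[x_i := (\mathtt{let\ rec}\ \overline{x{:}T{=}v}\ \mathtt{in}\ v_i) \mid 1 \le i \le n] \mid \rho & \text{(REC-F)} \\
\mathtt{let\ mutable}\ x{=}v\ \mathtt{in}\ e \mid \rho \longrightarrow e[x := l] \mid \rho[l \mapsto v] \quad l \notin dom(\rho) \text{ new} & \text{(LETMUT-F)} \\
l\ \mathtt{<-}\ v \mid \rho \longrightarrow v \mid \rho[l \mapsto v] \quad l \in dom(\rho) & \text{(ASSIGN-F)} \\
v, e \mid \rho \longrightarrow e \mid \rho & \text{(SEQ-F)} \\
l \mid \rho \longrightarrow v \mid \rho \quad \text{if } \rho(l) = v & \text{(VAR-F)}
\end{array}
\]
\[ \dfrac{e \mid \rho \longrightarrow e' \mid \rho' \qquad E \neq [\,]}{E[e] \mid \rho \longrightarrow E[e'] \mid \rho'} \ \text{(CTX-F)} \]
Figure 9: Operational semantics of core F#.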
4 INTERMEDIATE LANGUAGE
The intermediate language, IL, is an imperative language with three syntactic categories: expressions, statements and blocks. We introduce the construct that wraps code that needs to be moved from its definition environment, and the one that executes such code in the runtime environment.
The syntax of IL is presented in Fig. 10. We introduce the distinction between expressions and statements, as many target languages do. This facilitates the translation process and prevents some errors while building the intermediate abstract syntax tree, see (Appel, 1998) for a similar choice. Blocks are sequences of statements or expressions ended by an expression. In our translation we flatten the nested structure of let constructs, so we need blocks in which definitions and expressions/statements may be intermixed. Moreover, since we do not have a specific let rec construct, the use of a variable may precede its definition, e.g., when defining mutually recursive (or simply recursive) functions. Statements may be either assignments or variable definitions. Our compiler handles many more statements, but these are enough to show the ideas behind the design of IL. Our intermediate language is inspired (especially for the block structure) by IntegerPython, see (Ranson et al., 2008).
\[
\begin{array}{rcl}
bl & ::= & st;bl \mid e;bl \mid e \\
st & ::= & x\ \mathtt{<-}\ e \mid \mathtt{def}\ x{=}e \\
e & ::= & x \mid n \mid \mathtt{tr} \mid \mathtt{fls} \mid e\ \mathtt{+}\ e \mid \mathtt{fun}\ x\ \mathtt{->}\ \{bl\} \mid e\ e \\
  & \mid & \mathtt{if}\ e\ \mathtt{then}\ \{bl\}\ \mathtt{else}\ \{bl\} \mid \mathtt{check}(T_p, e) \\
  & \mid & \mathtt{stm2exp}(\{bl\}, \overline{y \mapsto Y}, \overline{x}) \\
  & \mid & \mathtt{exc}(e, \overline{Y \mapsto y}, \overline{e}) \\
T_p & ::= & \mathtt{int} \mid \mathtt{bool} \\
v & ::= & n \mid \mathtt{tr} \mid \mathtt{fls} \mid \mathtt{fun}\ x\ \mathtt{->}\ \{bl\} \\
  & \mid & \mathtt{stm2exp}(\{bl\}, \overline{y \mapsto Y}, \overline{x})
\end{array}
\]
Figure 10: Syntax of IL.
Variables are statically scoped, in the sense that, if there is a definition of the variable x in a block, all the free occurrences of x in the block refer to this definition. However, we can have occurrences of x preceding its definition. E.g.,
def f = fun y -> { x };
def x = 5;
f 2
correctly returns 5, whereas the following code would
produce a run-time error:
def x = 7;
if (x > 3) then {
    def f = fun y -> { x };
    f 2;
    def x = 5;
    3 }
else { 4 }
else { 4 }
since when f is called the variable x, defined in the inner block, has not yet been assigned a value. Instead, if x was not defined in the inner block, as in the following
def x = 7;
if (x > 3) then {
    def f = fun y -> { x };
    f 2 }
else { 4 }
the block would return 7, since x is bound in the enclosing block. This is also the behaviour in JavaScript and Python.
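The three IL fragments above can be mirrored in Python, which makes the claim about the target language easy to check. This is an illustrative sketch: each IL block is modelled by a Python function body (an assumption of this example, since Python's if does not open a new scope), and the function names are ours.

```python
def forward_reference():
    f = lambda y: x      # x is defined later in the same block
    x = 5
    return f(2)


def use_before_definition():
    x = 7

    def inner_block():
        f = lambda y: x  # refers to the x of inner_block, defined below
        f(2)             # x has no value yet, so Python raises NameError
        x = 5
        return 3

    return inner_block() if x > 3 else 4


def enclosing_definition():
    x = 7

    def inner_block():
        f = lambda y: x  # no x in inner_block: refers to the enclosing x
        return f(2)

    return inner_block() if x > 3 else 4
```

The first and third functions return 5 and 7 respectively, while the second fails at the call f(2) with a NameError, which plays the role of the run-time error described for IL.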
The construct stm2exp is used to move a block, bl, outside its definition context. To produce a closed term, the mutable variables free in bl, $\overline{y}$, are unbound by associating them with global names $\overline{Y}$ not subject to renaming. The variables $\overline{x}$, instead, are the immutable variables free in bl, i.e., they are not modified by the execution of bl. The metavariables X, Y, Z are used to denote names.
The operational semantics of IL, see Fig. 11, is given by defining a reduction relation for blocks. So our configurations are pairs “block, store”. In order to specify the order of reduction we define evaluation contexts for blocks, containing evaluation contexts for expressions. As for F#, we have to add locations, l, to the syntax of expressions, as they are generated during the evaluation of blocks. Moreover, we add two constructs wrapping blocks: {bl} and eval(bl). The first is used to do the initial allocation of variables needed to reproduce the previously described semantics, and the second to execute a block in a position where an expression would be required. Note that these expressions are not in IL, but are just introduced to describe its semantics.
As for F#, the evaluation contexts of Fig. 11 specify a call-by-value, left-to-right reduction strategy.
The first rule, (ALLOC), is used before the evaluation of a block to allocate the variables defined in the block. The function def, mapping a block to the set of variables defined in it, is defined by:
• $def(e) = \emptyset$,
• $def(e;bl) = def(x\ \mathtt{<-}\ e;bl) = def(bl)$, and
• $def(\mathtt{def}\ x{=}e;bl) = \{x\} \cup def(bl)$.
The initial value of the locations is set to undefined, ?, so if a variable is accessed before the evaluation of an assignment or a definition for this variable, undErr is returned. Note that this will never happen for IL programs that are translations of F# programs. After this initial allocation a block will not contain free variables (but locations).
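The function def and the allocation step can be sketched as follows. Everything in this example is an assumption made for illustration: a block is encoded as a Python list of tagged tuples, locations are named lc1, lc2, ... after the example later in this section, the sentinel UNDEFINED stands for the value ?, and variables are listed in definition order (the paper defines def as a set) only to make the location names deterministic.

```python
UNDEFINED = object()   # plays the role of the undefined value "?"


def defined_vars(block):
    """Variables defined by the `def` statements at the top level of a block,
    in order of first definition."""
    names = (item[1] for item in block if item[0] == "def")
    return list(dict.fromkeys(names))


def alloc(block, store):
    """Allocate a fresh location, initialised to UNDEFINED, for every variable
    defined in `block`, and return the variable-to-location substitution."""
    locations = {}
    for name in defined_vars(block):
        loc = "lc%d" % (len(store) + 1)
        store[loc] = UNDEFINED
        locations[name] = loc
    return locations


def read(loc, store):
    """Access a location, failing if it has not been initialised yet."""
    if store[loc] is UNDEFINED:
        raise RuntimeError("undErr")
    return store[loc]


# Top-level block of Fig. 5; right-hand sides are left as opaque strings.
fig5_block = [
    ("def", "y", "stm2exp(...)"),
    ("def", "z", 7),
    ("def", "even", False),
    ("def", "x", "exc(y, EV->even, z)"),
    ("exp", "x"),
]
```

Applied to fig5_block with an empty store, alloc yields the locations lc1 to lc4 for y, z, even and x, all initially undefined, so that reading any of them before its definition is evaluated fails.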
\[
\begin{array}{rcll}
e & ::= & \cdots \mid l \mid \{bl\} \mid \mathtt{eval}(bl) & \text{runtime expression} \\
S & ::= & l\ \mathtt{<-}\ E;bl \mid \mathtt{def}\ l{=}E;bl \mid E;bl \mid E & \text{block evaluation context} \\
E & ::= & [\,] \mid E\ \mathtt{+}\ e \mid n\ \mathtt{+}\ E \mid E\ e \mid v\ E \mid \mathtt{if}\ E\ \mathtt{then}\ \{bl\}\ \mathtt{else}\ \{bl\} \mid \mathtt{check}(T_p, E) & \text{expression evaluation context} \\
  & \mid & \mathtt{exc}(E, \overline{Z \mapsto l}, \overline{e}) \mid \mathtt{exc}(v, \overline{Z \mapsto l}, \overline{v}\, E\, \overline{e}) \mid \mathtt{eval}(S) &
\end{array}
\]
\[
\begin{array}{ll}
\{bl\} \mid \rho \longrightarrow bl[\overline{x := l}] \mid \rho[\overline{l \mapsto\ ?}] \quad \text{if } \overline{x} = def(bl),\ \overline{l} \notin dom(\rho) \text{ new} & \text{(ALLOC)} \\
l\ \mathtt{<-}\ v;bl \mid \rho \longrightarrow bl \mid \rho[l \mapsto v] & \text{(ASSIGN)} \\
\mathtt{def}\ l{=}v;bl \mid \rho \longrightarrow bl \mid \rho[l \mapsto v] & \text{(DEF)} \\
v;bl \mid \rho \longrightarrow bl \mid \rho & \text{(EXP)} \\
n_1\ \mathtt{+}\ n_2 \mid \rho \longrightarrow n \mid \rho \quad \text{if } \tilde{n} = \tilde{n}_1 +_{\mathtt{int}} \tilde{n}_2 & \text{(SUM)} \\
(\mathtt{fun}\ x\ \mathtt{->}\ \{bl\})\ v \mid \rho \longrightarrow \{bl[x := l]\} \mid \rho[l \mapsto v] \quad l \notin dom(\rho) \text{ new} & \text{(APP)} \\
\mathtt{if}\ \mathtt{tr}\ \mathtt{then}\ \{bl_1\}\ \mathtt{else}\ \{bl_2\} \mid \rho \longrightarrow \{bl_1\} \mid \rho & \text{(IFTRUE)} \\
\mathtt{if}\ \mathtt{fls}\ \mathtt{then}\ \{bl_1\}\ \mathtt{else}\ \{bl_2\} \mid \rho \longrightarrow \{bl_2\} \mid \rho & \text{(IFFALSE)} \\
\mathtt{check}(T_p, v) \mid \rho \longrightarrow v \mid \rho \quad \text{if } typeof(v) = T_p & \text{(TYPEYES)} \\
\mathtt{check}(T_p, v) \mid \rho \longrightarrow typeErr \quad \text{if } typeof(v) \neq T_p & \text{(TYPENO)} \\
\mathtt{exc}(\mathtt{stm2exp}(\{bl\}, \overline{y \mapsto Y}, \overline{x}), \overline{Z \mapsto l'}, \overline{v}) \mid \rho \longrightarrow & \text{(STTOEXP)} \\
\quad \mathtt{eval}(\{(bl[\overline{x := l}])[y_i := l'_j \mid Y_i = Z_j,\ 1 \le i \le n]\}) \mid \rho[\overline{l \mapsto v}] \quad \text{if } \overline{Y} \subseteq \overline{Z},\ \overline{l} \notin dom(\rho) \text{ new} & \\
\mathtt{eval}(v) \mid \rho \longrightarrow v \mid \rho & \text{(EVAL)} \\
l \mid \rho \longrightarrow v \mid \rho \quad \text{if } \rho(l) = v & \text{(LOCDEF)} \\
l \mid \rho \longrightarrow undErr \mid \rho \quad \text{if } \rho(l) =\ ? & \text{(LOCUND)}
\end{array}
\]
\[ \dfrac{e \mid \rho \longrightarrow e' \mid \rho' \qquad S \neq [\,]}{S[e] \mid \rho \longrightarrow S[e'] \mid \rho'} \ \text{(CTX)} \]
\[ \dfrac{e \mid \rho \longrightarrow err \qquad err = typeErr \lor undErr \qquad S \neq [\,]}{S[e] \mid \rho \longrightarrow err} \ \text{(CTXERROR)} \]
Figure 11: Runtime expressions, evaluation contexts and operational semantics rules for IL.
Rules (ASSIGN) and (DEF) continue the execution of the expressions/statements in a block in a store in which the value of location l is v. So after this the value of l is not undefined. Rule (EXP) throws away the value of an expression and continues the execution of the block. The rules for + and if are trivial. Rule (APP) allocates a location in the memory, assigning the value of the actual parameter to it; then the location is substituted for the formal parameter in the body of the function. Note that, being in an imperative language, the formal parameter could be modified in the body of the function; however, this change would not be visible in the calling environment, since the location is new. After this allocation the execution continues with the evaluation of the body {bl}, i.e., applying rule (ALLOC). The rules (TYPEYES) and (TYPENO) check whether a value is of the right primitive type. The function typeof from values to types is defined by: typeof(tr) = typeof(fls) = bool, typeof(n) = int, and is undefined for the other values. The evaluation of the exc construct, rule (STTOEXP), expects the first argument to be a stm2exp such that the names of its unbindings are a subset of the names of the rebindings provided by exc. If this is the case, it allocates new locations for the immutable variables $\overline{x}$ (as in rule (APP) for the formal parameter); for the unbound variables $\overline{y}$, instead, it substitutes the associated locations (via the correspondence of the names in $\overline{Y}$ and $\overline{Z}$). So through assignment to the (local) variables in $\overline{y}$ the execution environment may be modified. The resulting block is wrapped in the eval construct. Rule (EVAL) returns its value. (Evaluation inside eval is done by the (CTX) rule.) Finally, access to a location may return undErr if the location has not been initialized by an assignment or a definition statement. Rule (CTX) evaluates the first sub-expression selected by the evaluation context. In case the evaluation produces an error, rule (CTXERROR) returns the error at the top level. Note that, given a block bl, if there are S and e such that bl = S[e], then S is unique. So evaluation is deterministic.
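The substitution built by rule (STTOEXP) can be sketched in isolation. The function rebind below is a hypothetical helper written for this illustration: dictionaries stand for the unbinding and rebinding sequences, locations are strings named as in the example that follows, and raising ValueError when a global name has no rebinding is our choice (in the formal semantics the rule simply does not apply).

```python
def rebind(unbindings, immutables, rebindings, values, store):
    """Compute the variable-to-location substitution of rule (STTOEXP).

    unbindings: {local mutable variable y: global name Y}
    immutables: list of immutable variables x
    rebindings: {global name Z: location provided by exc}
    values:     list of values for the immutable variables
    """
    missing = set(unbindings.values()) - set(rebindings)
    if missing:
        raise ValueError("no rebinding for names: %s" % sorted(missing))
    if len(immutables) != len(values):
        raise ValueError("one value per immutable variable is required")
    substitution = {}
    # Immutable variables get fresh locations holding the provided values.
    for name, value in zip(immutables, values):
        loc = "lc%d" % (len(store) + 1)
        store[loc] = value
        substitution[name] = loc
    # Mutable variables are mapped to the locations supplied by the caller.
    for local, global_name in unbindings.items():
        substitution[local] = rebindings[global_name]
    return substitution
```

For the program of Fig. 5, with w unbound through EV, EV rebound to the location of even, and the value 7 provided for u, the variable w is mapped to the existing location of even, while u receives a fresh location initialised to 7.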
An
IL
program is a closed block, bl. The initial
configuration for a program is {bl} | [].
Let us look at an example of evaluation. Consider
the program of Fig. 5. Applying rule (ALLOC) to the
block enclosed in brackets we get the configuration
bl | ρ where bl is
def lc1 = stm2exp(...);
def lc2 = 7;
def lc3 = fls;
def lc4 = exc(lc1, EV->lc3, lc2);
lc4
and $\rho = [\mathtt{lc1} \mapsto {?},\ \mathtt{lc2} \mapsto {?},\ \mathtt{lc3} \mapsto {?},\ \mathtt{lc4} \mapsto {?}]$.
Applying (DEF) three times we get $bl_1 \mid \rho_1$ where
$bl_1$ = def lc4 = exc(lc1, EV->lc3, lc2);lc4 and
$\rho_1 = [\mathtt{lc1} \mapsto \mathtt{stm2exp(...)},\ \mathtt{lc2} \mapsto 7,\ \mathtt{lc3} \mapsto \mathtt{fls},\ \mathtt{lc4} \mapsto {?}]$.
From rule (CTX) where $S$ is def lc4 = $E$;lc4 and $E$ is exc([], EV->lc3, lc2), applying rule (LOCDEF) we get $bl_2 \mid \rho_1$ where $bl_2$ is
def lc4 = exc(stm2exp(...), EV->lc3, lc2);lc4.
From rule (CTX) where $S_1$ is def lc4 = $E_1$;lc4 and $E_1$ is exc(stm2exp(...), EV->lc3, []), applying rule (LOCDEF) we get $bl_3 \mid \rho_1$ where $bl_3$ is
def lc4 = exc(stm2exp(...), EV->lc3, 7);lc4.
Again by rule (CTX) where $S_2$ is def lc4 = $E_2$;lc4 and $E_2 = [\,]$, and applying rule (STTOEXP), we get def lc4 = eval({$bl_4$});lc4 $\mid \rho_1$, where $bl_4$ is
def fib =
fun x ->
if x < 3 then 1
else (fib (x-1) + fib (x-2));
def temp = fib 7;
lc3 <- temp % 2 = 0;
temp
The evaluation proceeds inside the eval construct, with rule (CTX) where $S_3$ is def lc4 = $E_3$;lc4 and $E_3$ is eval([]), applying rule (ALLOC), and producing the configuration $bl_5 \mid \rho_2$ where $\rho_2 = [\mathtt{lc1} \mapsto \mathtt{stm2exp(...)},\ \mathtt{lc2} \mapsto 7,\ \mathtt{lc3} \mapsto \mathtt{fls},\ \mathtt{lc4} \mapsto {?},\ \mathtt{lc5} \mapsto {?},\ \mathtt{lc6} \mapsto {?}]$, and $bl_5$ is def lc4 = eval({$bl_6$});lc4 where $bl_6$ is
AnIntermediateLanguageforCompilationtoScriptingLanguages
99
def lc5 =
fun x ->
if x < 3 then 1
else (lc5 (x-1) + lc5 (x-2));
def lc6 = lc5 7;
lc3 <- lc6 % 2 = 0;
lc6
We can see how recursion is handled and how the assignment to lc3, when evaluated, modifies the location of the initial variable even.
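The effect of this evaluation can be pictured in one of the target languages. The following Python sketch is our own illustration, not the output of the compiler: Ref and extruded_block are hypothetical names, the mutable variable even is modelled by a mutable cell playing the role of location lc3, and the immutable variable is passed by value.

```python
class Ref:
    """A mutable cell standing in for a store location such as lc3."""

    def __init__(self, value):
        self.value = value


def extruded_block(even_loc, z):
    # Hypothetical counterpart of the stm2exp(...) value: the block receives
    # a location for the mutable variable and a value for the immutable one.
    def fib(x):
        return 1 if x < 3 else fib(x - 1) + fib(x - 2)

    temp = fib(z)
    even_loc.value = (temp % 2 == 0)  # corresponds to lc3 <- temp % 2 = 0
    return temp


even = Ref(False)                  # lc3 initially holds fls
result = extruded_block(even, 7)   # plays the role of exc(lc1, EV->lc3, lc2)
```

Passing the cell rather than its content is what lets the assignment inside the extruded block reach the variable of the enclosing environment.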
5 TRANSLATION OF CORE F# INTO IL
In our translation we flatten the let constructs, transforming them into definitions of the corresponding variables followed by the translation of the expression in their body. Therefore, we have to take into account the fact that in an IL block we may have forward binding. E.g., if
let y = 3 in
if ( y = 3) then (
let f = (fun x -> y)
let y = 5
(f 0) )
else 4
is translated into
def y = 3;
if ( y = 3) then (
def f = (fun x -> { y });
def y = 5;
(f 0) )
else 4
then the translation is incorrect, since in the IL code the occurrence of y in the body of f is bound to the definition of y that follows. As a result, the F# expression evaluates to 3 whereas its translation in IL evaluates to 5. In the translation we use renaming to resolve this problem.
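The same capture can be observed directly in Python, where a name assigned anywhere in a function body is local to the whole body and closures look it up when they are called. The sketch below is only an illustration; the function names naive and renamed and the fresh name y1 are our own.

```python
def naive():
    # Direct transcription of the flattened block: both definitions use y.
    y = 3
    if y == 3:
        f = lambda x: y   # y is looked up when f is called, not here
        y = 5             # rebinds the same name that f refers to
        return f(0)
    return 4


def renamed():
    # The inner definition is renamed, so f still sees the outer y.
    y = 3
    if y == 3:
        f = lambda x: y
        y1 = 5            # hypothetical fresh name for the inner y
        return f(0)
    return 4
```

Here naive() yields 5, reproducing the incorrect behaviour, while renamed() yields 3 as the F# source does.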
As explained in Section 2, sequences of expressions are mapped to sequences of statements, and we use the stm2exp and exc constructs to simulate the behaviour of a sequence of statements with an expression. So we define two translations of F# expressions: the first to IL expressions, $[\![\cdot]\!]^{I,M}_{ex}$, and the second to IL blocks, $[\![\cdot]\!]^{I,M}_{bl}$. The translations are parametrized by the sets of the immutable variables, $I$, and mutable variables, $M$, of the context of the F# expression that is translated. The translations produce, in addition to an IL expression/block, also a sequence of top level definitions of variables bound to stm2exp expressions. In the following we present the translations for function definitions, sequences of expressions, and the let construct, which exemplify the technique used.
In the formal definition of the translation, $\delta$ is a metavariable denoting a declaration of a variable, def x=e, and $\overline{\delta}$ a sequence of declarations separated by “;” (semicolon).
The translations of F# function definitions to IL blocks or expressions,
$[\![\mathtt{fun}\ x{:}T\ \mathtt{->}\ e]\!]^{I,M}_{bl}$ and $[\![\mathtt{fun}\ x{:}T\ \mathtt{->}\ e]\!]^{I,M}_{ex}$,
are both equal to:
$(\mathtt{fun}\ x\ \mathtt{->}\ \{\mathtt{def}\ y = \mathtt{check}(T, x);\ bl[x := y]\},\ \overline{\delta})$
where $[\![e]\!]^{I \cup \{x\},M}_{bl} = (bl, \overline{\delta})$. So the translation of a function produces a function whose body is the translation of the body (to a block) of the original function. In the translation of the body of the function the variable $x$ is added to the set of free immutable variables $I$. The formal parameter is replaced with a new variable resulting from the type checking of the original parameter. See the discussion about dynamic type checking in Section 2.
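As an illustration of the shape of the translated function, here is a minimal Python sketch for the F# function fun x:int -> x + 1. The helper check is a hypothetical stand-in for the IL construct of the same name: we assume it returns its argument when the run-time type matches and signals a type error otherwise, and the name translated is ours.

```python
def check(expected_type, value):
    # Hypothetical stand-in for IL's check(T, x): return the value if it has
    # the expected run-time type, otherwise signal a type error.
    if not isinstance(value, expected_type):
        raise TypeError(
            f"expected {expected_type.__name__}, got {type(value).__name__}"
        )
    return value


# F# source:  fun x:int -> x + 1
def translated(x):
    y = check(int, x)  # def y = check(T, x)
    return y + 1       # the body, with x replaced by y
```

The body only ever uses the checked variable y, so an ill-typed argument coming from native code is rejected at the function boundary.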
In the following, we introduce the definition of the wrapping needed to extrude a block from its definition environment and how the construct exc rebinds it in the run-time environment.
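Definition 4. Given an IL block $bl$, and the disjoint sets of variables $I = \{\overline{x}\}$ and $M = \{\overline{y}\}$, let blockToExp($bl$, $I$, $M$) be
$(\mathtt{exc}(z,\ \overline{Y} \mapsto \overline{y},\ \overline{x}),\ \delta)$
where:
$\delta$ is def $z{:}T''$ = stm2exp($bl$, $\overline{y} \mapsto \overline{Y}$, $\overline{x}$),
$z$ is a new variable and $\overline{Y}$ are new names.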
Let blockToExp($bl$, $I$, $M$) = $(e, \delta)$. We can prove that, for all stores $\rho$, $\{\delta; e\} \mid \rho \longrightarrow^{*} v \mid \rho'$ if and only if $\{bl\} \mid \rho \longrightarrow^{*} v \mid \rho''$. So the evaluation of the definition $\delta$ followed by the generated expression produces the same result as the evaluation of the original block. The difference in the content of the final stores is due to the fact that the evaluation of the definition $\delta$ allocates a location and assigns it the stm2exp expression, to subsequently substitute this value for the location in the exc expression. However, since the variable $z$ is new, it does not interfere with the evaluation of the original block/expression.
To give the translation of both sequences of expressions and of the let constructs, we first define the set of variables defined at the top level of an F# expression, and then the renaming needed to avoid the capture of forward definitions described at the beginning of this section.
Definition 5. 1. Let $e$ be an F# expression. The function $\mathit{def}^{\#}(e)$, returning the set of variables defined at the top level of $e$, is defined as follows:
ICSOFT2013-8thInternationalJointConferenceonSoftwareTechnologies
100
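$\mathit{def}^{\#}(\mathtt{let}\ [\mathtt{mutable}]\ x = e_1\ \mathtt{in}\ e_2) = \{x\} \cup \mathit{def}^{\#}(e_2)$,
$\mathit{def}^{\#}(\mathtt{let\ rec}\ x{:}T = v\ \mathtt{in}\ e) = \{x\} \cup \mathit{def}^{\#}(e)$,
$\mathit{def}^{\#}(e_1, e_2) = \mathit{def}^{\#}(e_1) \cup \mathit{def}^{\#}(e_2)$, and
$\mathit{def}^{\#}(e) = \emptyset$ for all other expressions $e$.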
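2. Let $e$ be an F# expression, and $\overline{x}$ a set of variables. The function $rn(e, \overline{x})$ renames the top level definitions of the variables $\overline{x}$ in $e$ as follows:
if $e$ is $\mathtt{let}\ [\mathtt{mutable}]\ x = e_1\ \mathtt{in}\ e_2$, then $rn(e, \overline{x})$ is
  $\mathtt{let}\ [\mathtt{mutable}]\ x = e_1\ \mathtt{in}\ rn(e_2, \overline{x})$  if $x \notin \overline{x}$
  $\mathtt{let}\ [\mathtt{mutable}]\ z = e_1\ \mathtt{in}\ rn(e_2\{x \mapsto z\}, \overline{x})$  if $x \in \overline{x}$ and $z$ is new
if $e$ is $\mathtt{let\ rec}\ \overline{y}{:}\overline{T} = \overline{v}\ \mathtt{in}\ e'$, then $rn(e, \overline{x})$ is
  $\mathtt{let\ rec}\ \overline{y}{:}\overline{T} = \overline{v}\ \mathtt{in}\ rn(e', \overline{x})$  if $\overline{y} \cap \overline{x} = \emptyset$
  $\mathtt{let\ rec}\ \overline{z}{:}\overline{T} = (\overline{v}\{\overline{y} \mapsto \overline{z}\})\ \mathtt{in}\ rn(e'\{\overline{y} \mapsto \overline{z}\}, \overline{x})$  if $\overline{y} \cap \overline{x} \neq \emptyset$ and $\overline{z}$ are new
if $e$ is $e_1, e_2$ then $rn(e, \overline{x})$ is $rn(e_1, \overline{x}), rn(e_2, \overline{x})$
$rn(e, \overline{x})$ is $e$ for all other expressions $e$.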
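The two functions of Definition 5 can be sketched in Python on a very small expression type. This is a simplified illustration under our own assumptions: the AST classes, the helper names def_top, fv, subst and rn, and the fresh-name scheme are hypothetical, and only the non-recursive let case is covered (let rec and sequences are omitted).

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Fun:
    param: str
    body: object


@dataclass(frozen=True)
class App:
    fn: object
    arg: object


@dataclass(frozen=True)
class Let:
    name: str
    bound: object
    body: object


def def_top(e):
    """Variables defined at the top level of e (let case only)."""
    if isinstance(e, Let):
        return {e.name} | def_top(e.body)
    return set()


def fv(e):
    """Free variables of e."""
    if isinstance(e, Var):
        return {e.name}
    if isinstance(e, Fun):
        return fv(e.body) - {e.param}
    if isinstance(e, App):
        return fv(e.fn) | fv(e.arg)
    if isinstance(e, Let):
        return fv(e.bound) | (fv(e.body) - {e.name})
    return set()


def subst(e, x, z):
    """Rename the free occurrences of x in e to z (z is assumed fresh)."""
    if isinstance(e, Var):
        return Var(z) if e.name == x else e
    if isinstance(e, Fun):
        return e if e.param == x else Fun(e.param, subst(e.body, x, z))
    if isinstance(e, App):
        return App(subst(e.fn, x, z), subst(e.arg, x, z))
    if isinstance(e, Let):
        body = e.body if e.name == x else subst(e.body, x, z)
        return Let(e.name, subst(e.bound, x, z), body)
    return e


_fresh = count(1)  # assumed not to clash with names used in the program


def rn(e, names):
    """Rename the top-level definitions of the variables in `names`."""
    if not isinstance(e, Let):
        return e
    if e.name not in names:
        return Let(e.name, e.bound, rn(e.body, names))
    z = f"{e.name}_{next(_fresh)}"
    return Let(z, e.bound, rn(subst(e.body, e.name, z), names))


# let f = (fun x -> y) in (let y = 5 in f 0), from the opening example
e1 = Fun("x", Var("y"))
e2 = Let("y", Const(5), App(Var("f"), Const(0)))
to_rename = def_top(e2) & fv(e1)
renamed = rn(e2, to_rename)
```

In the example, the inner definition of y is renamed because y is both defined at the top level of the body and free in the function bound to f, so the occurrence of y inside f keeps referring to the outer definition.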
The translation of an F# sequence of expressions to an IL block is:
$[\![e_1, e_2]\!]^{I,M}_{bl} = (bl_1; bl_2,\ \overline{\delta}; \overline{\delta}')$
where:
$[\![e_1]\!]^{I,M}_{bl} = (bl_1, \overline{\delta})$,
$[\![rn(e_2, \overline{z})]\!]^{I,M}_{bl} = (bl_2, \overline{\delta}')$ and $\overline{z} = \mathit{def}^{\#}(e_2) \cap FV(e_1)$.
The translation of the sequence is the sequence of blocks which are the translations of the two expressions to blocks. However, before translating the second expression, $e_2$, we rename all the variables defined in it that are free in $e_1$, since in $e_1$ these variables are bound to their definitions in the enclosing environment. In this way we preserve the semantics of the source language F#.
The translation of an F# sequence of expressions to an IL expression is:
$[\![e_1, e_2]\!]^{I,M}_{ex} = (e,\ \delta; \overline{\delta})$
where:
$[\![e_1, e_2]\!]^{I,M}_{bl} = (bl, \overline{\delta})$ and
blockToExp($bl$, $I$, $M$) = $(e, \delta)$.
That is, we first translate the sequence to a block, and then return an exc expression and the definition of a new variable bound to an stm2exp expression, see Definition 4. Note that the sets of mutable and immutable variables of the environment are needed to generate the correct matching for the expressions exc and stm2exp.
The translation of the let construct to an IL block is:
$[\![\mathtt{let}\ x = e_1\ \mathtt{in}\ e_2]\!]^{I,M}_{bl} = (\mathtt{def}\ x = e_1';\ bl,\ \overline{\delta}; \overline{\delta}')$
where
$[\![e_1]\!]^{I,M}_{ex} = (e_1', \overline{\delta})$ and
$[\![rn(e_2, \overline{z})]\!]^{I \cup \{x\},M}_{bl} = (bl, \overline{\delta}')$ with $\overline{z} = \mathit{def}^{\#}(e_2) \cap FV(e_1)$.
That is, we translate $e_1$ into an IL expression and the body of the let, $e_2$, into a block. For the translation of $e_2$ the variable $x$ is added to the immutable variables of the context. Before translating $e_2$ we rename all the variables defined in $e_2$ that are free in $e_1$ (as for the translation of sequences of expressions).
The translation of let mutable differs only in the fact that, in the translation of $e_2$, the variable $x$, being mutable, is added to $M$.
Note that this translation produces a block: the definition of $x$ followed by a block. Moreover, the translation of the expression on the right-hand side of the definition of $x$, that is $e_1$, must be an IL expression. Looking at the F# code of Fig. 1, this means that the following F# expression:
let rec fib x =
    if x < 3 then 1
    else fib(x - 1) + fib(x - 2)
let temp = fib z
even <- (temp % 2 = 0)
temp
which is a sequence of expressions, must be translated to an IL expression.
The translation of a let expression to an IL expression is defined as the translation of a sequence of expressions to an IL expression, in which $[\![\mathtt{let}\ x = e_1\ \mathtt{in}\ e_2]\!]^{I,M}_{bl}$ replaces $[\![e_1, e_2]\!]^{I,M}_{bl}$.
Properties of the Translation. The translation preserves the dynamic semantics of F# expressions. That is, let $e$ be an F# program, and $[\![e]\!]^{\emptyset,\emptyset}_{bl} = (bl, \overline{\delta})$. Then $e \mid [\,] \longrightarrow^{*} v \mid \rho$ if and only if $\{\overline{\delta}; bl\} \mid [\,] \longrightarrow^{*} v \mid \rho'$ for some $\rho'$. From this result and the fact that F# programs do not get stuck, we can derive that the IL translation of an F# program does not evaluate to an error or get stuck.
6 COMPARISONS WITH OTHER WORK
Similar projects exist and are based on similar translation techniques, although, as far as we know, we are the first to introduce an intermediate language allowing translation to many target languages. Pit, see (Fahad, 2012), and FunScript, see (Bray, 2013), are open source F# to JavaScript compilers. They support only translation to JavaScript. FunScript has support for integration with JavaScript code. Websharper, see (Intellifactory, 2012), is a professional web and mobile development framework. As of version 2.4 an open
AnIntermediateLanguageforCompilationtoScriptingLanguages
101
source license is available. It is a very rich framework offering extensions for ExtJs, jQuery, Google Maps, WebGL and many more. Again, it supports only JavaScript. F# Web Tools is an open source tool whose main objective is not the translation to JavaScript; instead, it tries to solve the difficulties of web programming: “the heterogeneous nature of execution, the discontinuity between client and server parts of execution and the lack of type-checked execution on the client side”, see (Petříček and Syme, 2012). It does so by using meta-programming and monadic syntax. One of its features is translation to JavaScript. Finally, a translation from OCaml bytecode to JavaScript is provided by Ocsigen, and described in (Vouillon and Balat, 2011).
On the theoretical side, a framework integrating statically and dynamically typed (functional) languages is presented in (Matthews and Findler, 2009). Support for dynamic languages is provided with ad hoc constructs in Scala, see (Moors et al., 2012). A construct similar to stm2exp is studied in recent work by one of the authors, see (Ancona et al., 2013), where it is shown how to use it to realize dynamic binding and meta-programming, an issue we are planning to address. The only work, to our knowledge, that proves the correctness of a translation from a statically typed functional language with imperative features to a scripting language (namely JavaScript) is (Fournet et al., 2013).
7 CONCLUSIONS AND FUTURE WORK
In this paper we introduced IL, an intermediate language for the translation of a significant fragment of F# to scripting languages such as Python and JavaScript. The translation is shown to preserve the dynamic semantics of the original language. A preliminary version of this paper was presented at ICTCS 2012, see (Giannini et al., 2012), which has no published proceedings. We have a prototype implementation of the compiler that can be found at http://www.bluestormproject.org/. The compiler is implemented in F# and is based on two metaprogramming features offered by the .NET platform: quotations and reflection. Our future work will be, on the practical side, to use the intermediate language to integrate F# code and JavaScript or Python native code. (Some of the features of IL, such as dynamic type checking, were originally introduced for this purpose.) A previous implementation of the translation supported other features such as namespacing, classes, pattern matching, discriminated unions, etc. We are in the process of adding them to the current implementation, since some of these features have poor or no support at all in JavaScript or Python. On the theoretical side, we are planning to complete the proofs of correctness of the translations. We need to formalize our target languages Python and JavaScript, and then prove the correctness of the translation from IL to them. (We anticipate that these proofs will be easier than the one from F# to IL.) Moreover, we want to formalize the integration of native code, and more generally meta-programming, along the lines of recent work by the authors, see (Ancona et al., 2013). We are also considering extending the type system for the intermediate language with polymorphic types, which is, as shown in (Ahmed et al., 2011), nontrivial.
ACKNOWLEDGEMENTS
We warmly thank Daniele Mantovani for his support and involvement in the topic of the paper. We also thank the anonymous referees of a previous version of the paper for pointing out some problems which led to a substantial revision of the intermediate language. Any misinterpretation of their suggestions is, of course, our responsibility.
REFERENCES
Ahmed, A., Findler, R. B., Siek, J. G., and Wadler, P.
(2011). Blame for all. In Proceedings of POPL 2011,
Austin, TX, USA, ACM, pages 201–214.
Ancona, D., Giannini, P., and Zucca, E. (2013). Reconciling positional and nominal binding. In ITRS 2012, EPTCS.
Appel, A. W. (1998). Modern Compiler Implementation in ML. Cambridge University Press.
Bray, Z. (2013). Funscript. http://tomasp.net/files/funscript/tutorial.html.
Fahad, M. S. (2012). Pit - F Sharp to JS compiler. http://pitfw.org/.
Fournet, C., Swamy, N., Chen, J., Dagand, P.-É., Strub, P.-Y., and Livshits, B. (2013). Fully abstract compilation to JavaScript. In POPL, pages 371–384. ACM.
Giannini, P., Mantovani, D., and Shaqiri, A. (2012). Lever-
aging dynamic typing through static typing. ICTCS
2012. http://ictcs.di.unimi.it/papers/paper
4.pdf.
Igarashi, A., Pierce, B., and Wadler, P. (2001). Featherweight Java: A minimal core calculus for Java and GJ. ACM TOPLAS, 23(3):396–450.
Intellifactory (2012). Websharper 2010 platform. http://websharper.com/.
Matthews, J. and Findler, R. B. (2009). Operational semantics for multi-language programs. ACM Trans. Program. Lang. Syst., 31(3).
ICSOFT2013-8thInternationalJointConferenceonSoftwareTechnologies
102
Moors, A., Rompf, T., Haller, P., and Odersky, M. (2012).
Scala-virtualized. In Kiselyov, O. and Thompson,
S., editors, Proceedings of PEPM 2012, Philadelphia,
Pennsylvania, USA, ACM, pages 117–120.
Nanevski, A. (2003). From dynamic binding to state
via modal possibility. In PPDP’03, pages 207–218.
ACM.
Petříček, T. and Syme, D. (2012). AFAX: Rich client/server web applications in F#. http://www.scribd.com/doc/54421045/Web-Apps-in-F-Sharp.
Ranson, J. F., Hamilton, H. J., and Fong, P. W. L. (2008). A semantics of Python in Isabelle/HOL. Technical Report CS-2008-04, CS Department, University of Regina, Saskatchewan.
Vouillon, J. and Balat, V. (2011). From bytecode to JavaScript: the js_of_ocaml compiler. http://www.pps.univ-paris-diderot.fr/balat/publi.php.
AnIntermediateLanguageforCompilationtoScriptingLanguages
103