NUMBER THEORY-BASED INDUCTION OF DETERMINISTIC CONTEXT-FREE L-SYSTEM GRAMMAR
Ryohei Nakano
Department of Computer Science, Chubu University, 1200 Matsumoto-cho, Kasugai 487-8501, Japan
Naoya Yamada
Department of Computer Science and Engineering, Nagoya Institute of Technology
Gokiso-cho Showa-ku, Nagoya 466-8555, Japan
Keywords:
Grammatical induction, L-system, Number theory, Plant model.
Abstract:
This paper addresses grammatical induction of deterministic context-free L-systems (D0L-systems). Considering the parallel nature of L-system production and the deterministic context-free nature of D0L-systems, we take a number theory-based approach. Here the D0L-system grammar is limited to one or two production rules. Basic equations for the methods are derived and used to narrow down the parameter value ranges. Our experiments using plant models showed that the proposed methods induced the original production rules very efficiently.
1 INTRODUCTION
L-systems were originally developed by Linden-
mayer as a mathematical theory of plant development
(Prusinkiewicz and Lindenmayer, 1990). The central
concept of L-systems is rewriting. In general, rewrit-
ing is a mechanism for generating complex objects
from a simple initial object using production rules.
The most extensively studied rewriting systems
operate on character strings, and Chomsky’s work on
formal grammars is well known. Formal grammars
and L-systems are both string rewriting systems, but
the essential difference between them is that in formal
grammars productions are applied sequentially while
in L-systems productions are applied in parallel.
The reverse process of rewriting is grammatical
induction, which infers a set of production rules given
a set of strings. Grammatical induction of formal
grammars has been studied for decades and induction
of context-free grammars is still an open problem.
Induction of L-system grammars is also an open problem and has been little explored so far. L-systems can be classified along two axes: (1) deterministic or stochastic, and (2) context-free or context-sensitive.
McCormack (1993) addressed computer graphics modeling through evolution of context-free L-systems. Nevill-Manning (1996) proposed a simple algorithm called Sequitur, which reveals structure like context-free grammars from a wide range of sequences, but with little success for grammatical induction of deterministic context-free L-system grammars. Schlecht et al. (2007) proposed statistical structural inference for microscopic 3D images through learning a stochastic L-system model. Damasevicius (2010) addressed structural analysis of DNA sequences through evolution of stochastic context-free L-system grammars.
This paper addresses grammatical induction of deterministic context-free L-systems (D0L-systems). Considering the parallel nature of L-system production and the deterministic context-free nature of D0L-systems, we take a number theory-based approach. Here the D0L-system grammar is limited to one or two production rules. Our experiments using plant models showed that the proposed methods induced the original production rules quite efficiently.
2 D0L-SYSTEMS
D0L-systems. The simplest class of L-systems is the D0L-system (deterministic context-free L-system). A D0L-system is defined as G = (V, C, ω, P),
where V and C denote sets of variables and constants,
Nakano R. and Yamada N..
NUMBER THEORY-BASED INDUCTION OF DETERMINISTIC CONTEXT-FREE L-SYSTEM GRAMMAR.
DOI: 10.5220/0003088101940199
In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (KDIR-2010), pages 194-199
ISBN: 978-989-8425-28-7
Copyright © 2010 SCITEPRESS (Science and Technology Publications, Lda.)
ω is an initial string called the axiom, and P is a set of production rules. A variable is a symbol that is replaced in rewriting, and a constant is a symbol that remains fixed in rewriting and is used to control turtle graphics.
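The parallel rewriting of a D0L-system can be illustrated with a minimal Python sketch. The function name `rewrite` and the representation of P as a dictionary from variables to right sides are assumptions made for illustration; the one-rule grammar in the demo is a hypothetical example.

```python
def rewrite(axiom: str, rules: dict, n: int) -> str:
    """Rewrite `axiom` n times, replacing every variable in parallel."""
    s = axiom
    for _ in range(n):
        # Symbols without a rule are constants and are copied unchanged.
        s = "".join(rules.get(symbol, symbol) for symbol in s)
    return s


if __name__ == "__main__":
    # Hypothetical one-rule grammar used only for illustration.
    print(rewrite("A", {"A": "A[+A]A"}, 2))
```

Because every symbol of the current string is looked up before any replacement is emitted, all variables are rewritten simultaneously, which is the property that distinguishes L-systems from sequentially applied formal grammars.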
Notation. The notation below is used in the following sections. Production rules are assumed to take the following form.
rule A : A → ????????
rule B : B → ??????
$Y$: the given string.
$n$: the number of rewritings.
$Z^{(n)}$: the string obtained after $n$ rewritings.
$\alpha_A, \alpha_B, \alpha_K$: the numbers of occurrences of variables A, B and constant K in the right side of rule A.
$\beta_A, \beta_B, \beta_K$: the numbers of occurrences of variables A, B and constant K in the right side of rule B.
$y_A, y_B, y_K$: the numbers of occurrences of variables A, B and constant K in $Y$.
$z_A^{(n)}, z_B^{(n)}, z_K^{(n)}$: the numbers of occurrences of variables A, B and constant K in $Z^{(n)}$.
3 INDUCTION OF L-SYSTEM GRAMMAR HAVING ONE RULE
When a D0L-system has only one production rule, number theory-based induction is easy. The method proposed below is called LGIN1 (L-system Grammar Induction based on Number theory for 1 rule). Here the following situation is assumed.
n = ?, axiom : A
rule A : A → ????????
Given string Y, we are asked to estimate the number of rewritings n and to induce rule A. By simple observation we have the following.
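$$z_A^{(n)} = \alpha_A^{\,n} \tag{1}$$
$$z_K^{(n)} = \begin{cases} \dfrac{1-\alpha_A^{\,n}}{1-\alpha_A}\,\alpha_K & \text{if } \alpha_A \neq 1 \\ n\,\alpha_K & \text{if } \alpha_A = 1 \end{cases} \tag{2}$$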
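Then we obtain the following, which we call the basic equations of LGIN1.
$$y_A = \alpha_A^{\,n} \tag{3}$$
$$y_K = \begin{cases} \dfrac{1-\alpha_A^{\,n}}{1-\alpha_A}\,\alpha_K & \text{if } \alpha_A \neq 1 \\ n\,\alpha_K & \text{if } \alpha_A = 1 \end{cases} \tag{4}$$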
From eq.(3) we get candidate pairs $(\alpha_A, n)$ by factorizing $y_A$ into prime factors. For each candidate pair we get $\alpha_K$ from eq.(4) for each constant K.
Since the given string Y includes the right side of rule A as a substring, we exhaustively extract from Y every substring having $\alpha_A$ A's and $\alpha_K$ K's to form a rule A candidate. Then we rewrite the axiom n times using the rule A candidate and check whether the obtained string is equal to Y. If the equality holds, the rule A candidate is a solution.
Example.
n = 3, axiom : A
rule A : A → A[+A]A
Consider string Y shown below.
A[+A]A[+A[+A]A]A[+A]A[+A[+A]A[+A[+A]
A]A[+A]A]A[+A]A[+A[+A]A]A[+A]A
By scanning Y we get the following values.
$$y_A = 27, \quad y_{+} = 13, \quad y_{[} = 13, \quad y_{]} = 13 \tag{5}$$
Since $y_A = 3^3$, we have the following two candidate sets.
(i) $n = 1$, $\alpha_A = 27$, $\alpha_{+} = 13$, $\alpha_{[} = 13$, $\alpha_{]} = 13$
(ii) $n = 3$, $\alpha_A = 3$, $\alpha_{+} = 1$, $\alpha_{[} = 1$, $\alpha_{]} = 1$
The case n = 1 is always trivial and is therefore discarded. By scanning Y, we find the following two substrings having three A's, one +, one [, and one ]:
A → A[+A]A
A → A]A[+A
By rewriting the axiom n (= 3) times with each of them and checking equality with Y, we select the original rule A.
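The LGIN1 procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `rewrite`, `power_pairs` and `lgin1` are hypothetical, candidate pairs $(\alpha_A, n)$ are found by enumerating bases rather than by explicit prime factorization (both yield the same pairs), and only the case $\alpha_A \geq 2$, $n \geq 2$ is handled.

```python
import math
from collections import Counter


def rewrite(axiom, rules, n):
    """Rewrite `axiom` n times, replacing every variable in parallel."""
    s = axiom
    for _ in range(n):
        s = "".join(rules.get(symbol, symbol) for symbol in s)
    return s


def power_pairs(y_a):
    """All (alpha_A, n) with alpha_A >= 2, n >= 2 and alpha_A ** n == y_a."""
    pairs = []
    for alpha in range(2, math.isqrt(y_a) + 1):
        power, n = alpha, 1
        while power < y_a:
            power *= alpha
            n += 1
        if power == y_a:
            pairs.append((alpha, n))
    return pairs


def lgin1(y, var="A"):
    """Return every (n, right side) whose n-fold rewriting of `var` equals y."""
    counts = Counter(y)
    y_a = counts[var]
    solutions = []
    for alpha_a, n in power_pairs(y_a):
        # Number of rule applications: 1 + alpha_A + ... + alpha_A**(n-1).
        applications = (y_a - 1) // (alpha_a - 1)
        constants = {k: c for k, c in counts.items() if k != var}
        if any(c % applications for c in constants.values()):
            continue  # eq.(4) has no integer solution alpha_K
        target = {k: c // applications for k, c in constants.items()}
        target[var] = alpha_a
        length = sum(target.values())
        seen = set()
        for i in range(len(y) - length + 1):
            candidate = y[i:i + length]
            if candidate in seen:
                continue
            seen.add(candidate)
            if dict(Counter(candidate)) != target:
                continue  # wrong symbol counts for a rule A candidate
            if rewrite(var, {var: candidate}, n) == y:
                solutions.append((n, candidate))
    return solutions
```

Applied to the string Y of the example above, `lgin1(Y)` keeps only the pair n = 3 with right side `A[+A]A`; the window `A]A[+A` has the required symbol counts but fails the final rewriting test.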
4 INDUCTION OF L-SYSTEM GRAMMAR HAVING TWO RULES
When a D0L-system has two production rules, the induction becomes considerably more complicated. The method explained below is called LGIN2 (L-system Grammar Induction based on Number theory for 2 rules). In this case the following is assumed.
n = ?, axiom : A
rule A : A → ????????
rule B : B → ?????
Given string Y, we are asked to estimate the number of rewritings n and to induce rules A and B.
LGIN2 proceeds in the following order.
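(1) Derivation of Basic Equations. Focusing on variables, we consider the growth of the numbers of occurrences of A and B.
$$(1 \;\; 0)\, T^{n} = \left( z_A^{(n)} \;\; z_B^{(n)} \right), \qquad T = \begin{pmatrix} \alpha_A & \alpha_B \\ \beta_A & \beta_B \end{pmatrix} \tag{6}$$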
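Let $\lambda_1$ and $\lambda_2$ be the eigenvalues of matrix $T$, and $v_1$ and $v_2$ be their eigenvectors. Then we have the following.
$$T V = V \Lambda, \qquad V = (v_1 \;\; v_2), \qquad \Lambda = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} \tag{7}$$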
By simple calculation we get the following.
$$T^{n} = V \Lambda^{n} V^{-1}, \qquad \Lambda^{n} = \begin{pmatrix} \lambda_1^{\,n} & 0 \\ 0 & \lambda_2^{\,n} \end{pmatrix} \tag{8}$$
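By substituting the eigenvectors into the above we have the following.
$$T^{n} = \begin{pmatrix} D_1 & \alpha_B x_n \\ \beta_A x_n & D_2 \end{pmatrix} \tag{9}$$
$$D_1 = \alpha_A x_n - (\alpha_A \beta_B - \alpha_B \beta_A)\, x_{n-1} \tag{10}$$
$$D_2 = \beta_B x_n - (\alpha_A \beta_B - \alpha_B \beta_A)\, x_{n-1} \tag{11}$$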
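$$x_n = \begin{cases} \dfrac{\lambda_1^{\,n} - \lambda_2^{\,n}}{\lambda_1 - \lambda_2} & \text{if } \lambda_1 \neq \lambda_2 \\ n\,\alpha_A^{\,n-1} & \text{if } \lambda_1 = \lambda_2 \end{cases} \tag{12}$$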
From eqs.(6) and (9) we have the following, which we call the basic equations of LGIN2.
$$y_A = \alpha_A x_n - (\alpha_A \beta_B - \alpha_B \beta_A)\, x_{n-1} \tag{13}$$
$$y_B = \alpha_B x_n \tag{14}$$
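The basic equations can be checked numerically against direct iteration of eq.(6). The following Python sketch is illustrative only: the function names are hypothetical, $n \geq 1$ is assumed, and $x_n$ is computed with the linear recurrence of eq.(15) started from the assumed value $x_0 = 0$ together with $x_1 = 1$, which avoids evaluating the eigenvalues explicitly.

```python
def counts_by_iteration(alpha_a, alpha_b, beta_a, beta_b, n):
    """(z_A, z_B) after n rewritings of axiom A, by repeated use of T."""
    z_a, z_b = 1, 0  # the axiom is a single A
    for _ in range(n):
        z_a, z_b = (z_a * alpha_a + z_b * beta_a,
                    z_a * alpha_b + z_b * beta_b)
    return z_a, z_b


def x_sequence(alpha_a, alpha_b, beta_a, beta_b, n):
    """[x_0, ..., x_n] with x_0 = 0, x_1 = 1 and the recurrence of eq.(15)."""
    trace = alpha_a + beta_b
    det = alpha_a * beta_b - alpha_b * beta_a
    xs = [0, 1]
    for _ in range(2, n + 1):
        xs.append(trace * xs[-1] - det * xs[-2])
    return xs


def counts_by_basic_equations(alpha_a, alpha_b, beta_a, beta_b, n):
    """(y_A, y_B) from eqs.(13) and (14); assumes n >= 1."""
    xs = x_sequence(alpha_a, alpha_b, beta_a, beta_b, n)
    det = alpha_a * beta_b - alpha_b * beta_a
    y_a = alpha_a * xs[n] - det * xs[n - 1]
    y_b = alpha_b * xs[n]
    return y_a, y_b
```

For the variable counts of model ex06 in Section 5.2 (X taken as A and F as B, so $\alpha_A = 4$, $\alpha_B = 3$, $\beta_A = 0$, $\beta_B = 2$), both functions give $y_B = 360$ for n = 4 and $y_B = 1488$ for n = 5, the values quoted there.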
(2) Narrowing down of Variable Parameters. From eq.(14) we have candidate pairs $(\alpha_B, x_n)$ by factorizing $y_B$. Equation (13) can be used to narrow down the value ranges of $\alpha_A$, $\beta_A$, and $\beta_B$. For example, considering n = 2 we have $z_A^{(2)} = \alpha_A^2 + \alpha_B \beta_A$ and $z_B^{(2)} = \alpha_A \alpha_B + \alpha_B \beta_B$. Using these as lower bounds, we have $y_A \geq z_A^{(2)} \geq \alpha_A^2$ and $y_B \geq z_B^{(2)} = \alpha_A \alpha_B + \alpha_B \beta_B$.
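The factorization step and the simplest of the bounds above can be sketched as follows. The function names are hypothetical, and the sketch covers only the enumeration of $(\alpha_B, x_n)$ pairs and the bound $\alpha_A^2 \leq y_A$; the further use of eq.(13) to restrict $\beta_A$ and $\beta_B$ is not shown.

```python
import math


def divisor_pairs(y_b):
    """All pairs (alpha_B, x_n) of positive integers with alpha_B * x_n == y_b."""
    pairs = []
    for d in range(1, math.isqrt(y_b) + 1):
        if y_b % d == 0:
            pairs.append((d, y_b // d))
            if d != y_b // d:
                pairs.append((y_b // d, d))
    return sorted(pairs)


def alpha_a_upper_bound(y_a):
    """Largest alpha_A permitted by y_A >= alpha_A ** 2 (valid for n >= 2)."""
    return math.isqrt(y_a)
```

The number of pairs returned equals the number of divisors of $y_B$, so a value of $y_B$ with many divisors produces a larger candidate set for the later steps.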
(3) Estimating the Number of Rewritings. At this stage we have candidates of $(\alpha_A, \alpha_B, \beta_A, \beta_B, x_n)$. For each candidate set we estimate the number of rewritings n in the following way. As for $x_n$, we can easily show that $x_n$ is an integer and strictly increasing. Moreover, by simple calculation we have the following recurrence formula.
$$x_n = x_{n-1}\,(\alpha_A + \beta_B) - x_{n-2}\,(\alpha_A \beta_B - \alpha_B \beta_A) \tag{15}$$
Starting with $x_1 = 1$ and $x_2 = \alpha_A + \beta_B$, we increase n and find the n whose $x_n$ is equal to the candidate value of $x_n$. If $x_n$ exceeds the candidate value, the candidate is discarded.
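The search for n described in this step can be sketched with the recurrence of eq.(15). The function name `estimate_n` is hypothetical, and the `max_n` guard is an added assumption so that the loop also terminates for degenerate parameter sets in which the sequence does not grow.

```python
def estimate_n(alpha_a, alpha_b, beta_a, beta_b, x_target, max_n=64):
    """Return n with x_n == x_target, or None if the candidate is discarded."""
    trace = alpha_a + beta_b
    det = alpha_a * beta_b - alpha_b * beta_a
    x_prev, x_cur = 0, 1  # x_0 (assumed start value) and x_1
    n = 1
    while n <= max_n:
        if x_cur == x_target:
            return n
        if x_cur > x_target:
            return None  # x_n has passed the candidate value
        # eq.(15): x_{n+1} = trace * x_n - det * x_{n-1}
        x_prev, x_cur = x_cur, trace * x_cur - det * x_prev
        n += 1
    return None
```

Starting from $x_0 = 0$ and $x_1 = 1$ reproduces $x_2 = \alpha_A + \beta_B$, so the sketch agrees with the initial values given in the text.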
(4) Narrowing down of Constant Parameters. For each constant K we repeat the following. Using the equations below we can calculate $r_A^{(n)}$ and $r_B^{(n)}$, the numbers of times variables A and B are rewritten during n rewritings.
$$r_A^{(n)} = 1 + z_A^{(1)} + z_A^{(2)} + \cdots + z_A^{(n-1)} \tag{16}$$
$$r_B^{(n)} = z_B^{(1)} + z_B^{(2)} + \cdots + z_B^{(n-1)} \tag{17}$$
Then we have the following equation, whose coefficients and solutions are integers.
$$r_A^{(n)} \alpha_K + r_B^{(n)} \beta_K = y_K \tag{18}$$
In general, this is an indeterminate equation and can be solved easily using the extended Euclidean algorithm.
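Solving eq.(18) for non-negative integers can be sketched as follows. The function names are hypothetical, and the sketch assumes $r_A^{(n)} \geq 1$ and $r_B^{(n)} \geq 1$; when $r_B^{(n)} = 0$ the equation does not constrain $\beta_K$ and needs separate handling.

```python
def rewriting_counts(alpha_a, alpha_b, beta_a, beta_b, n):
    """(r_A, r_B) of eqs.(16) and (17) for axiom A."""
    z_a, z_b = 1, 0
    r_a = r_b = 0
    for _ in range(n):
        r_a += z_a
        r_b += z_b
        z_a, z_b = (z_a * alpha_a + z_b * beta_a,
                    z_a * alpha_b + z_b * beta_b)
    return r_a, r_b


def ext_gcd(a, b):
    """Return (g, s, t) with a*s + b*t == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, s, t = ext_gcd(b, a % b)
    return g, t, s - (a // b) * t


def nonnegative_solutions(r_a, r_b, y_k):
    """All (alpha_K, beta_K) >= 0 with r_a*alpha_K + r_b*beta_K == y_k."""
    if r_a < 1 or r_b < 1:
        raise ValueError("this sketch assumes r_a >= 1 and r_b >= 1")
    g, s, t = ext_gcd(r_a, r_b)
    if y_k % g:
        return []  # no integer solution at all
    a0, b0 = s * (y_k // g), t * (y_k // g)  # one particular solution
    step_a, step_b = r_b // g, r_a // g
    # General solution: (a0 + k*step_a, b0 - k*step_b) for integer k.
    k_min = -(a0 // step_a)  # smallest k with a0 + k*step_a >= 0
    k_max = b0 // step_b     # largest k with b0 - k*step_b >= 0
    return [(a0 + k * step_a, b0 - k * step_b)
            for k in range(k_min, k_max + 1)]
```

Since $r_A^{(n)}$ and $r_B^{(n)}$ grow quickly with n, the list of non-negative solutions is typically very short, which is what makes this step an effective filter.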
(5) Generate-and-test of Rule Candidates. Now we have candidates of $(\alpha_A, \alpha_B, \beta_A, \beta_B, \alpha_K, \beta_K)$. Since the given string Y always includes the right sides of rules A and B as substrings, we exhaustively extract from Y the following two kinds of substrings:
(a) a substring having $\alpha_A$ A's, $\alpha_B$ B's, and $\alpha_K$ K's to form a rule A candidate,
(b) a substring having $\beta_A$ A's, $\beta_B$ B's, and $\beta_K$ K's to form a rule B candidate.
Then for each combination of rule A and rule B candidates, we rewrite the axiom n times using the candidates and check whether the obtained string is equal to Y. If the equality holds, the pair of candidates is a solution.
5 EXPERIMENTS
We evaluate the proposed LGIN1 and LGIN2 using plant models. The experiments were performed using a PC with a 3.0 GHz CPU and 2MB main memory.
5.1 Experiments using LGIN1
Plant models ex01p, ex02p and ex03p are drawn from (Prusinkiewicz and Lindenmayer, 1990), and ex01y, ex02y and ex03y are their corresponding variations with smaller n.
(ex01p) n = 5, axiom : F
rule : F → F[+F]F[-F]F
(ex01y) n = 4, axiom : F
rule : F → F[+F]F[-F]F
Shown below is string Y for ex01y, whose length is 1,561.
F[+F]F[F]F[+F[+F]F[F]F]F[+F]F[F]F[F[+F]F[F]F]
F[+F]F[F]F[+F[+F]F[F]F[+F[+F]F[F]F]F[+F]F[F]F[
F[+F]F[F]F]F[+F]F[F]F]F[+F]F[F]F[+F[+F]F[F]F]
F[+F]F[F]F[F[+F]F[F]F]F[+F]F[F]F[F[+F]F[F]F[
+F[+F]F[F]F]F[+F]F[F]F[F[+F]F[F]F]F[+F]F[F]F]
F[+F]F[F]F[+F[+F]F[F]F]F[+F]F[F]F[F[+F]F[F]F]
F[+F]F[F]F[+F[+F]F[F]F[+F[+F]F[F]F]F[+F]F[F]F[
F[+F]F[F]F]F[+F]F[F]F[+F[+F]F[F]F[+F[+F]F[F]F
]F[+F]F[F]F[F[+F]F[F]F]F[+F]F[F]F]F[+F]F[F]F[
+F[+F]F[F]F]F[+F]F[F]F[F[+F]F[F]F]F[+F]F[F]F[
F[+F]F[F]F[+F[+F]F[F]F]F[+F]F[F]F[F[+F]F[F]F
]F[+F]F[F]F]F[+F]F[F]F[+F[+F]F[F]F]F[+F]F[F]F[
F[+F]F[F]F]F[+F]F[F]F]F[+F]F[F]F[+F[+F]F[F]F]
F[+F]F[F]F[F[+F]F[F]F]F[+F]F[F]F[+F[+F]F[F]F[
+F[+F]F[F]F]F[+F]F[F]F[F[+F]F[F]F]F[+F]F[F]F]
F[+F]F[F]F[+F[+F]F[F]F]F[+F]F[F]F[F[+F]F[F]F]
F[+F]F[F]F[F[+F]F[F]F[+F[+F]F[F]F]F[+F]F[F]F[
F[+F]F[F]F]F[+F]F[F]F]F[+F]F[F]F[+F[+F]F[F]F]
F[+F]F[F]F[F[+F]F[F]F]F[+F]F[F]F[F[+F]F[F]F[
+F[+F]F[F]F]F[+F]F[F]F[F[+F]F[F]F]F[+F]F[F]F[
+F[+F]F[F]F[+F[+F]F[F]F]F[+F]F[F]F[F[+F]F[F]F
]F[+F]F[F]F]F[+F]F[F]F[+F[+F]F[F]F]F[+F]F[F]F[
F[+F]F[F]F]F[+F]F[F]F[F[+F]F[F]F[+F[+F]F[F]F
]F[+F]F[F]F[F[+F]F[F]F]F[+F]F[F]F]F[+F]F[F]F[
+F[+F]F[F]F]F[+F]F[F]F[F[+F]F[F]F]F[+F]F[F]F]
F[+F]F[F]F[+F[+F]F[F]F]F[+F]F[F]F[F[+F]F[F]F]
F[+F]F[F]F[+F[+F]F[F]F[+F[+F]F[F]F]F[+F]F[F]F[
F[+F]F[F]F]F[+F]F[F]F]F[+F]F[F]F[+F[+F]F[F]F]
F[+F]F[F]F[F[+F]F[F]F]F[+F]F[F]F[F[+F]F[F]F[
+F[+F]F[F]F]F[+F]F[F]F[F[+F]F[F]F]F[+F]F[F]F]
F[+F]F[F]F[+F[+F]F[F]F]F[+F]F[F]F[F[+F]F[F]F]
F[+F]F[F]F
Figures 1 to 6 show plant graphics for these six D0L-systems.
Figure 1: Model ex01p. Figure 2: Model ex01y.
(ex02p) n = 5, axiom : F
rule : F → F[+F]F[-F][F]
(ex02y) n = 4, axiom : F
rule : F → F[+F]F[-F][F]
(ex03p) n = 4, axiom : F
rule : F → FF-[-F+F+F]+[+F-F-F]
(ex03y) n = 3, axiom : F
rule : F → FF-[-F+F+F]+[+F-F-F]
Figure 3: Model ex02p. Figure 4: Model ex02y.
Figure 5: Model ex03p. Figure 6: Model ex03y.
For these six D0L-systems LGIN1 successfully
found the original grammars. When n = 4, LGIN1
found a grammar with n = 2 as another solution. On
the other hand, when n is a prime number, LGIN1
found the original grammar as a unique solution.
Table 1: CPU time of LGIN1.
model   n   string length   CPU time (sec)
ex01p   5        7,811           0.093
ex01y   4        1,561           0.063
ex02p   5        9,373           0.082
ex02y   4        1,873           0.158
ex03p   4       11,116           0.776
ex03y   3        1,388           0.051
The CPU time required by LGIN1 is shown in Table 1. LGIN1 finished its processing within one second for each example. When we increase the number of rewritings with the production rule fixed, the string length naturally gets much larger, but the processing time does not always increase; see ex02, for example. This probably happened because n = 4 has the divisor 2, which requires additional search, while n = 5 has no divisor other than 1 and itself.
5.2 Experiments using LGIN2
Plant models ex04p, ex05p and ex06p are drawn from (Prusinkiewicz and Lindenmayer, 1990), and ex04y, ex05y and ex06y are their corresponding variations with smaller n.
(ex04p) n = 7, axiom : X
rule : X → F[+X]F[-X]+X
rule : F → FF
(ex04y) n = 5, axiom : X
rule : X → F[+X]F[-X]+X
rule : F → FF
Shown below is string Y for ex04y whose length is
1,512.
FFFFFFFFFFFFFFFF[+FFFFFFFF[+FFFF[+FF[+F[+X]F[X] + X
]FF[F[+X]F[X]+ X] + F[+X]F[X] + X]FFFF[FF[+F[+X]F[
X]+ X]FF[F[+X]F[X] + X]+ F[+X]F[X] + X]+ FF[+F[+X]F[X
] + X]FF[F[+X]F[X] + X] + F[+X]F[X] + X]FFFFFFFF[FFFF[
+FF[+F[+X]F[X] + X]FF[F[+X]F[X]+ X] + F[+X]F[X]+ X]F
FFF[FF[+F[+X]F[X] + X]FF[F[+X]F[X] + X]+ F[+X]F[X]
+X]+ FF[+F[+X]F[X] + X]FF[F[+X]F[X] + X]+ F[+X]F[X] +
X]+ FFFF[+FF[+F[+X]F[X] + X]FF[F[+X]F[X] + X] + F[+X]F
[X]+ X]FFFF[FF[+F[+X]F[X] + X]FF[F[+X]F[X] + X] + F[
+X]F[X] + X] + FF[+F[+X]F[X] + X]FF[F[+X]F[X]+ X] + F[+
X]F[X] + X]FFFFFFFFFFFFFFFF[FFFFFFFF[+FFFF[+FF[+F[
+X]F[X] + X]FF[F[+X]F[X]+ X] + F[+X]F[X]+ X]FFFF[FF
[+F[+X]F[X]+ X]FF[F[+X]F[X]+ X] + F[+X]F[X]+ X] + FF[
+F[+X]F[X] + X]FF[F[+X]F[X] + X]+ F[+X]F[X] + X]FFFFF
FFF[FFFF[+FF[+F[+X]F[X] + X]FF[F[+X]F[X] + X]+ F[+X
]F[X] + X]FFFF[FF[+F[+X]F[X] + X]FF[F[+X]F[X]+ X]+
F[+X]F[X] + X] + FF[+F[+X]F[X] + X]FF[F[+X]F[X] + X]+ F
[+X]F[X] + X] + FFFF[+FF[+F[+X]F[X] + X]FF[F[+X]F[X]
+X]+ F[+X]F[X] + X]FFFF[FF[+F[+X]F[X] + X]FF[F[+X]F
[X]+ X] + F[+X]F[X] + X] + FF[+F[+X]F[X] + X]FF[F[+X]F[
X]+ X] + F[+X]F[X]+ X] + FFFFFFFF[+FFFF[+FF[+F[+X]F[X
] + X]FF[F[+X]F[X] + X] + F[+X]F[X] + X]FFFF[FF[+F[+X]
F[X] + X]FF[F[+X]F[X] + X] + F[+X]F[X] + X] + FF[+F[+X]F
[X]+ X]FF[F[+X]F[X] + X] + F[+X]F[X]+ X]FFFFFFFF[FF
FF[+FF[+F[+X]F[X] + X]FF[F[+X]F[X]+ X] + F[+X]F[X]+
X]FFFF[FF[+F[+X]F[X]+ X]FF[F[+X]F[X] + X] + F[+X]F[
X]+ X] + FF[+F[+X]F[X] + X]FF[F[+X]F[X]+ X] + F[+X]F[
X]+ X]+ FFFF[+FF[+F[+X]F[X] + X]FF[F[+X]F[X]+ X]+ F[+
X]F[X] + X]FFFF[FF[+F[+X]F[X]+ X]FF[F[+X]F[X] + X]
+F[+X]F[X] + X] + FF[+F[+X]F[X]+ X]FF[F[+X]F[X] + X]+
F[+X]F[X] + X
Figures 7 to 12 show plant graphics for these six D0L-systems.
Figure 7: Model ex04p. Figure 8: Model ex04y.
(ex05p) n = 7, axiom : X
rule : X → F[+X][-X]FX
rule : F → FF
(ex05y) n = 5, axiom : X
rule : X → F[+X][-X]FX
rule : F → FF
Figure 9: Model ex05p. Figure 10: Model ex05y.
(ex06p) n = 5, axiom : X
rule : X → F-[[X]+X]+F[+FX]-X
rule : F → FF
(ex06y) n = 4, axiom : X
rule : X → F-[[X]+X]+F[+FX]-X
rule : F → FF
Figure 11: Model ex06p. Figure 12: Model ex06y.
For these six D0L-systems having two production rules, LGIN2 found exactly the original grammars as unique solutions.
The CPU time required by LGIN2 is shown in Table 2. LGIN2 finished each task within seconds. When we increase the number of rewritings with the production rules fixed, the processing time does not always increase; see ex06, for example. This happened partly because n = 4 has a larger search space than n = 5; that is, $y_F = 360$ for n = 4 has 22 divisors (excluding 1 and 360) while $y_F = 1488$ for n = 5 has 18 divisors (excluding 1 and 1488).
Table 2: CPU time of LGIN2.
model   n   string length   CPU time (sec)
ex04p   7       13,956           0.680
ex04y   5        1,512           0.132
ex05p   7       12,863           0.667
ex05y   5        1,391           0.126
ex06p   5        6,263           1.440
ex06y   4        1,551           4.228
6 CONCLUSIONS
This paper proposed two methods for grammatical induction of D0L-systems having one or two production rules and simple axioms. Basic equations for the methods are derived and used to narrow down the parameter value ranges. In our experiments using plant models, the methods found the original grammars very efficiently. In the future we plan to extend our induction methods to wider classes of L-systems.
ACKNOWLEDGEMENTS
This work was supported by Grants-in-Aid for Sci-
entific Research (C) 22500212 and Chubu University
Grant 22IS27A.
REFERENCES
Damasevicius, R. (2010). Structural analysis of regulatory DNA sequences using grammar inference and support vector machine. Neurocomputing, 73:633–638.
McCormack, J. (1993). Interactive evolution of L-system grammars for computer graphics modelling. Complex Systems: From Biology to Computation, IOS Press, Amsterdam, 118–130.
Nevill-Manning, C. G. (1996). Inferring sequential structure. Ph.D. thesis, Dept. of Computer Science, Univ. of Waikato, New Zealand.
Prusinkiewicz, P. and Lindenmayer, A. (1990). The Algorithmic Beauty of Plants. Springer-Verlag, New York.
Schlecht, J., Barnard, K., Spriggs, E., and Pryor, B. (2007). Inferring grammar-based structure models from 3D microscopy data. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, 1–8.