In this paper we consider the following D0L-system
having two rules. Here n denotes the number
of rewritings, while A and B denote variables.
axiom : A, n =?
rule A : A →?????
rule B : B →???????
We have three strings: Y, mY, and Z. Y is a normal
string generated using the original grammar, while
mY is a string generated by applying transmutation to
Y. Z is a string generated using a grammar candidate,
where a grammar candidate means a candidate set of
n, rules A and B.
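The way a grammar (or grammar candidate) produces a string such as Y or Z can be sketched as below. This is a minimal illustration of D0L rewriting, not the authors' code; the function name `rewrite` and the two example rules are hypothetical, since the actual rules are left unspecified above.

```python
def rewrite(axiom, rules, n):
    """Apply the rules in parallel to every symbol, n times (D0L rewriting)."""
    s = axiom
    for _ in range(n):
        # Symbols without a rule (constants such as F, +, -, [, ]) are copied unchanged.
        s = "".join(rules.get(ch, ch) for ch in s)
    return s


# Hypothetical rules chosen only for illustration; they are not the paper's rules.
example_rules = {"A": "F[+A]B", "B": "FB"}
print(rewrite("A", example_rules, 2))  # prints F[+F[+A]B]FB
```

Under this reading, a grammar candidate is simply a triple of n, rule A and rule B, and Z is the string it generates from the axiom A.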
Transmutation
Among four types of transmutation, only d-type is
considered here since we want to know how LGIC2
works for d-type. We assume transmutation occurs
locally around the center of Y, considering two rates:
coverage rate $P_c$ and occurrence rate $P_o$. $P_c$
represents the proportion of transmutation area to the
whole Y, while $P_o$ represents the probability of
transmutation in the area. Thus, overall transmutation
rate $P_t$ is calculated as follows:

$$P_t = P_c \times P_o \qquad (1)$$
Valid Transmutation
Simple transmutation may generate an invalid string,
that is, a string that cannot be drawn through
turtle graphics. To keep the transmutation valid, the
numbers of left and right square brackets are monitored
and controlled if necessary. That is, in the
transmutation area the number $count_\ell$ of left square
brackets should be larger than or equal to the number
$count_r$ of right ones. Moreover, when the transmutation
ends, we ensure $count_\ell = count_r$ by adding right
brackets if necessary. Using such control, we get a valid
transmuted string mY from the normal string Y.
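The bracket control described above can be sketched as follows. Several points are assumptions rather than details from the text: that d-type transmutation deletes symbols (consistent with len(mY) < len(Y) for d-type), that a right bracket which would violate the count condition is simply dropped, and the names `transmute_d` and `is_balanced`. The sketch enforces only the two stated conditions inside the transmutation area.

```python
import random


def is_balanced(s):
    """True if brackets in s are matched, so the string can be drawn by a turtle."""
    depth = 0
    for ch in s:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def transmute_d(y, p_c, p_o, rng):
    """Delete symbols in a central area of y while keeping brackets controlled."""
    area_len = round(len(y) * p_c)          # coverage rate fixes the area size
    start = (len(y) - area_len) // 2        # area is centred in y
    end = start + area_len
    kept = []
    count_l = count_r = 0
    for ch in y[start:end]:
        if rng.random() < p_o:              # occurrence rate: delete this symbol
            continue
        if ch == "]":
            if count_r >= count_l:          # keeping it would break count_l >= count_r
                continue                    # assumed control: drop the bracket
            count_r += 1
        elif ch == "[":
            count_l += 1
        kept.append(ch)
    kept.extend("]" * (count_l - count_r))  # close brackets left open in the area
    return y[:start] + "".join(kept) + y[end:]


y = "FF[+F[-F]F]FF[-F]FF"
m_y = transmute_d(y, p_c=1.0, p_o=0.3, rng=random.Random(0))
print(m_y, is_balanced(m_y))
```

When the area covers a bracket-balanced stretch of Y, the result stays balanced; how the original method treats an area that starts inside an open bracket is not specified in the text and is not handled here.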
3 LGIC2: EMERGENT INDUCTION OF L-system GRAMMAR
This section explains an emergent induction method,
LGIC2 (L-system GI with error Correction, ver.2)
(Nakano, 2013b), with a slight modification. Given a
transmuted string mY, LGIC2 generates grammar
candidates, aiming at discovering the original grammar.
Basic Framework
The basic framework of LGIC2 is simple. Since the
right side of each production rule appears repeatedly
in mY, we extract frequently appearing substrings
from mY to form rule candidates. Such rule
candidates are then combined with a reasonable n, the
number of rewritings, to form a grammar candidate.
The main drawback of this approach is the
combinatorial growth in the number of grammar
candidates. Without any pruning it took ten days
to finish two thirds of the processing for mY whose
length is about 4,000. Thus, we need to narrow
down candidates to get only promising ones. LGIC2
introduces the following three pruning techniques.
Pruning by Frequency
Since the right side of each original rule appears
many times in mY, we can discard less frequent
substrings whose frequency is less than min_frq. The
threshold value may depend on the length of mY. In
our experiments we use min_frq = 50.
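Pruning by frequency can be sketched as below. This is an illustrative version only: the upper bound `max_len` on substring length and the counting of overlapping occurrences are assumptions, and the function name `frequent_substrings` is hypothetical.

```python
from collections import Counter


def frequent_substrings(m_y, min_frq, max_len):
    """Return substrings of m_y (length 1..max_len) occurring at least min_frq times."""
    counts = Counter(
        m_y[i:i + k]
        for k in range(1, max_len + 1)       # candidate lengths (max_len is assumed)
        for i in range(len(m_y) - k + 1)     # every start position, overlaps included
    )
    # Discard substrings whose frequency is less than min_frq.
    return {s: c for s, c in counts.items() if c >= min_frq}


# Toy string; a small threshold is used because the string is short.
candidates = frequent_substrings("F[+F]F[+F]F[+F]", min_frq=3, max_len=4)
print(sorted(candidates))
```

For a string of a few thousand symbols the same call would use the threshold from the text, min_frq = 50.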
The Number of Rewritings
Now that we have a pair of rule candidates, we
consider how to determine n, the number of rewritings.
The suitable n depends on the transmutation type.
For r-type or i-type we tried to find n generating the
longest Z which satisfies len(Z) ≤ len(mY) since
len(Y) ≤ len(mY). For d-type, however, the situation
is different since len(mY) < len(Y). Thus, we should
select n generating the shortest Z which satisfies
len(mY) < len(Z).
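The selection of n can be sketched as follows. The sketch assumes that len(Z) does not decrease as n grows, and it introduces a hypothetical cap `max_n` to stop on grammars that never grow; the name `choose_n` and the example rules are also illustrative only.

```python
def choose_n(axiom, rules, len_my, d_type, max_n=30):
    """Pick the number of rewritings n for a pair of rule candidates.

    d_type=True : smallest n whose Z satisfies len(mY) < len(Z).
    d_type=False: largest n whose Z satisfies len(Z) <= len(mY) (r-type, i-type).
    Returns None if no suitable n is found within max_n rewritings.
    """
    z = axiom
    prev_n = None
    for n in range(max_n + 1):
        if len(z) > len_my:
            # First Z longer than mY: take it for d-type, the previous one otherwise.
            return n if d_type else prev_n
        prev_n = n
        z = "".join(rules.get(ch, ch) for ch in z)  # one more parallel rewriting
    return None


# Hypothetical rules; Z has length 1, 6, 12, 19 for n = 0, 1, 2, 3.
example_rules = {"A": "F[+A]B", "B": "FB"}
print(choose_n("A", example_rules, len_my=15, d_type=True))   # prints 3
print(choose_n("A", example_rules, len_my=15, d_type=False))  # prints 2
```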
Pruning by Goodness of Fit
The goodness of fit is a statistical measure which
evaluates how well a model (Z) fits to observed data
(mY). The goodness of fit can be evaluated by $\chi^2$
values. Let the numbers of symbol occurrences in mY
and Z be $\{y_i\}$ and $\{z_i\}$ respectively. Then calculate
$\{p_i = z_i / \mathrm{len}(Z)\}$, and we have the following
$\chi^2$ value. Here $I$ is the number of all kinds of
variables and constants.

$$\chi^2 = \sum_{i=1}^{I} \frac{\left(y_i - \mathrm{len}(mY) \times p_i\right)^2}{\mathrm{len}(mY) \times p_i} \qquad (2)$$
We discard a grammar candidate if $\chi^2$ is greater than
max_chi2. For r-type or i-type transmutation we used
max_chi2 = 150 since the value gets very large, more
than 200 for high transmutation rate $P_t$, even for the
original grammar. For d-type, however, the fitting
turned out to be extremely good. Thus, we can use a
very small value max_chi2 = 10 for d-type.
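The computation in Eq. (2) and the pruning test can be sketched as below. Returning infinity when a symbol of mY never occurs in Z (expected count zero) is an assumption, since the text does not say how that case is handled; the function names are hypothetical.

```python
import math
from collections import Counter


def chi_square(m_y, z):
    """Chi-square value of Eq. (2): observed counts from mY, proportions from Z."""
    y_counts = Counter(m_y)
    z_counts = Counter(z)
    chi2 = 0.0
    for sym in set(y_counts) | set(z_counts):   # all variables and constants
        p = z_counts[sym] / len(z)              # p_i = z_i / len(Z)
        expected = len(m_y) * p                 # len(mY) * p_i
        if expected == 0:
            return math.inf                     # assumed handling of a symbol absent from Z
        chi2 += (y_counts[sym] - expected) ** 2 / expected
    return chi2


def passes_fit(m_y, z, max_chi2):
    """A candidate is discarded when its chi-square value exceeds max_chi2."""
    return chi_square(m_y, z) <= max_chi2


print(chi_square("AAAB", "AB"))  # expected counts are 2 and 2, so the value is 1.0
```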
Similarity between Two Strings
As the similarity between two strings, LGIC2
employs the longest common subsequence (LCS)
(Cormen, Leiserson and Rivest, 1990). Let LCS($S_1$, $S_2$)
KDIR 2014 - International Conference on Knowledge Discovery and Information Retrieval