Linguistic Applications of Finite Automata with Translucent Letters
Benedek Nagy^1 and László Kovács^2
^1 Faculty of Informatics, University of Debrecen, PO Box 12, Debrecen, Hungary
^2 Department of Information Technology, University of Miskolc, Miskolc-Egyetemváros, Hungary
Keywords:
Finite Automata, Mildly Context-sensitive Languages, Natural Languages, Formal Linguistics, Free-order
Languages.
Abstract:
Finite automata with translucent letters do not read their input strictly from left to right as traditional finite automata do, but for each internal state of such a device, certain letters are translucent, that is, in this state the automaton cannot see them. We address the word problem of these automata, both in the deterministic and in the nondeterministic case. Some interesting examples from formal language theory and from a segment of the Hungarian language are shown using automata with translucent letters.
1 INTRODUCTION
The finite automaton is a fundamental computing de-
vice for accepting languages. Its deterministic ver-
sion (DFA) and its nondeterministic version (NFA)
both accept exactly the regular languages, and they are used in many areas like compiler construction, text editors, computational linguistics, etc. For
regular languages the word problem is decidable by
a real-time (i.e., linear) computation. However, the
expressiveness of regular languages is quite limited,
and thus, these automata are too weak for several fur-
ther applications. Accordingly, much more powerful
models of automata have been introduced and studied
like, e.g., pushdown automata, linear-bounded auto-
mata, and Turing machines. But this larger expressive power comes at the price that certain algorithmic questions like the word problem or the emptiness problem become more complex or even undecidable. Hence,
when dealing with applications, for example in natu-
ral language processing or concurrency control, it is
of importance to find models of automata that rec-
oncile two contrasting goals: they have sufficient ex-
pressiveness and, at the same time, a moderate degree
of complexity.
An interesting question in application of formal
languages is whether the sentences of natural lan-
guages (NL) can be modeled with a class of the
Chomsky hierarchy or not. There are many differ-
ent views and approaches in the literature regarding
the position of natural languages in this hierarchy. It
is folklore that finite languages are regular, and hence the simplest model for NL uses a regular approach based on the idea that the set of all sentences produced by humans is finite (only a finite number of people have lived, and each of them said/wrote only a finite number of sentences). There are also approaches which use context-free grammars (Gazdar, 1982), while others, like Matthews (Matthews, 1979), consider NL even more complex than the recursively enumerable languages. Since we do not understand all features of the human brain, one may believe that the human brain can do more complex ‘computations’ than computers/Turing machines can.
The first finite state model for NL was developed
in 1955 by Hockett (Hockett, 1955). Later, in 1969, Reich (Reich, 1969) argued that NLs are not self-embedding to any arbitrary degree. Another argument for regularity is given, among others, by Sullivan (Sullivan, 1980): an individual neuron in the brain can be modeled by a finite automaton, and the brain contains only a finite number of neurons, thus the resulting structure also corresponds to a finite automaton. The number of states of the resulting non-deterministic finite automaton is approximated by 10^(10^9). Thus the only way to prove that NLs are more complex than regular grammars is to exhibit some infinite sequences of grammatical sentences (Kornai, 1985). Chomsky (Chomsky, 1957) himself argues that English and other NLs are not regular languages. The most famous example for non-regular behaviour is the following pattern (Wintner, 2002):
A white male (whom a white male)^n (hired)^n hired another white male.
Even if one accepts the argumentation that the given structure cannot be realized to an infinite depth, it is clear that the corresponding regular grammar/finite automaton for a finite depth may have a very large complexity. Thus, from a practical point of view it is worth finding more compact grammars that can describe the special behaviour of the different natural languages.
There are also very strong arguments that NLs are not context-free. In the area of formal languages, in conjunction with computational linguistics and natural language processing, the notion of “mildly context-sensitive languages” (Joshi, 2010; Mery et al., 2006) has appeared and is used.
These classes are proper subclasses of the class of
context-sensitive languages and proper superclasses
of the class of context-free languages. They con-
tain some typical examples of non-context-free lan-
guages that show important features of natural lan-
guages, e.g., {a^n b^n c^n | n ≥ 0}, {ww | w ∈ {a, b}^*} and {a^n b^m c^n d^m | m, n ≥ 0}. At the same time mildly
context-sensitive languages share many of the nice
properties with the context-free languages. For ex-
ample, they have semi-linear Parikh images and their
parsing complexity is polynomially bounded.
An interesting extension of the class of finite auto-
mata that vastly increases its expressive power, the so-
called finite automaton (or finite-state acceptor) with
translucent letters (NFAwtl for short) was introduced
in (Nagy and Otto, 2011). The idea of this new model
comes from the research of cooperative distributed
systems of stateless deterministic restarting automata
with window size 1 (Nagy and Otto, 2012a; Nagy
and Otto, 2012b). An NFAwtl does not read its in-
put strictly from left to right as the traditional finite
automaton does, but for each of its internal states,
certain letters are translucent, that is, in this state the
NFAwtl cannot see them. Accordingly, it may read
(and erase) a letter from the middle or the end of
the given input. They accept certain non-regular and
even some non-context-free languages, but all lan-
guages accepted by NFAwtls have semi-linear Parikh
images (Nagy and Otto, 2012a). These issues are important from the point of view of linguistic applications.
In contrast to the classical finite-state acceptors, the
deterministic variants of the NFAwtls, the so-called
DFAwtls, are less expressive than the nondeterminis-
tic ones (Nagy and Otto, 2012b). In this paper, as a continuation of the work started in (Nagy and Otto, 2011), we consider the word problem of these automata by showing a nondeterministic/deterministic linear-time algorithm that decides whether a given word is accepted by an NFAwtl/DFAwtl, respectively. Some linguistic examples are also shown to demonstrate the efficiency of NFAwtls/DFAwtls.
This paper is structured as follows. In Section 2
we recall the definition of the finite automata with
translucent letters and we also present an example. In
Section 3 the word problem (parsing) is considered with some further examples, while in Section 4 we show examples of the usage of finite automata with translucent letters for modeling some phenomena of the Hungarian language. In our investigation, the Hungarian language was selected; it differs from English in the following important aspects:
it is an agglutinative language (in Hungarian most grammatical information is given through suffixes: cases, conjugation, etc.),
it has no dominant word order (as we will see, in Hungarian, several orders of the words of a sentence can be used, each emphasizing a slightly different special meaning),
it has a reduced use of postpositions (since suffixes are often equivalent to English prepositions, there are only a few postpositions in Hungarian).
In Section 5 we summarize our work and give
some open problems for future work.
2 FINITE-STATE ACCEPTORS
WITH TRANSLUCENT
LETTERS
In this section we fix our notation and recall the def-
inition and some basic facts about the finite automata
with translucent letters.
A nondeterministic finite automaton (NFA) is described by a tuple A = (Q, Σ, I, F, δ), where Q is a finite set of internal states, Σ is a finite alphabet of input letters, I ⊆ Q is the set of initial states, F ⊆ Q is the set of final states, and δ : Q × Σ → 2^Q is a transition relation. If |I| = 1 and |δ(q, a)| ≤ 1 holds for all q ∈ Q and all a ∈ Σ, then A is a deterministic finite automaton (DFA).
An NFA A works as follows. It is given an input string w ∈ Σ^*, and A starts its computation/run in a state q_0 that is chosen nondeterministically from the set I of all initial states. This configuration is encoded as q_0 w. Now it reads the first letter of w, say a, thereby deleting this letter, and it changes its internal state to a state q_1 that is chosen nondeterministically from the set δ(q_0, a). Should δ(q_0, a) be empty, then A gets stuck (in this run), otherwise, it continues its run by
ICAART2013-InternationalConferenceonAgentsandArtificialIntelligence
462
reading letters until w has been consumed completely. We say that A accepts w with a run if A is in a final state q_f ∈ F after reading w completely at the end of this run. By L(A) we denote the set of all strings w ∈ Σ^* for which A has an accepting computation in the sense described above.
It is well known that the class L(NFA) of languages L(A) that are accepted by NFAs coincides with the class REG of regular languages, and that DFAs accept exactly the same languages.
Now we recall a variant of the nondeterministic
finite automata that does not process its input strictly
from left to right (Nagy and Otto, 2011).
Definition 1. A finite-state acceptor with translucent letters (NFAwtl) is defined as a 7-tuple A = (Q, Σ, $, τ, I, F, δ), where Q is a finite set of internal states, Σ is a finite alphabet of input letters, $ ∉ Σ is a special symbol that is used as an endmarker, τ : Q → 2^Σ is a translucency mapping, I ⊆ Q is a set of initial states, F ⊆ Q is a set of final states, and δ : Q × Σ → 2^Q is a transition relation. For each state q ∈ Q, the letters from the set τ(q) are translucent for q, that is, in state q the automaton A does not see these letters. A is called deterministic, abbreviated as DFAwtl, if |I| = 1 and if |δ(q, a)| ≤ 1 for all q ∈ Q and all a ∈ Σ.
An NFAwtl A = (Q, Σ, $, τ, I, F, δ) works as follows. For an input word w ∈ Σ^*, it starts in a nondeterministically chosen initial state q ∈ I with the word w · $ on its input tape. A single step computation of A is as follows. Assume that w = a_1 a_2 ··· a_n for some n ≥ 1 and a_1, . . . , a_n ∈ Σ. Then A looks for the first occurrence from the left of a letter that is not translucent for the current state q, that is, if w = uav such that u ∈ (τ(q))^* and a ∉ τ(q), then A nondeterministically chooses a state q' ∈ δ(q, a), erases the letter a from the tape, thus producing the tape contents uv · $, and its internal state is set to q'. In state q' the automaton considers the tape uv$ and continues the process by another single step computation, looking for the first visible letter of uv in state q'. In case δ(q, a) = ∅, A halts without accepting. Finally, if w ∈ (τ(q))^*, then A reaches the $-symbol and the computation halts. In this case A accepts if q is a final state; otherwise, it does not accept. Observe that this definition also applies to configurations of the form q · $, that is, q · ε · $ ⊢_A Accept holds if and only if q is a final state. A word w ∈ Σ^* is accepted by A if there exists an initial state q_0 ∈ I and a computation q_0 w · $ ⊢_A^* Accept, where ⊢_A^* denotes the reflexive transitive closure of the single-step computation relation ⊢_A. Now L(A) = { w ∈ Σ^* | w is accepted by A } is the language accepted by A.
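To make the acceptance condition above concrete, the following short Python sketch simulates an NFAwtl directly on this definition (a minimal illustration, not the authors' implementation: states are encoded as strings, tau maps a state to its set of translucent letters, and delta maps state–letter pairs to sets of successor states; all names are our own choices).

def nfawtl_accepts(word, initial, final, tau, delta):
    """Naive acceptance test: True if some computation of the NFAwtl accepts word."""
    def run(state, tape):
        for i, letter in enumerate(tape):
            if letter in tau.get(state, set()):
                continue                               # translucent: invisible in this state
            # first visible letter found: erase it and branch over delta(state, letter)
            successors = delta.get((state, letter), set())
            return any(run(q, tape[:i] + tape[i + 1:]) for q in successors)
        # only translucent letters (or nothing) remain: the endmarker $ is reached
        return state in final
    return any(run(q, word) for q in initial)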
The classical nondeterministic finite automaton (NFA) is obtained from the NFAwtl by removing the endmarker $ and by ignoring the translucency relation τ, and the deterministic finite-state acceptor (DFA) is obtained from the DFAwtl in the same way. Thus, the NFA (DFA) can be interpreted as a special type of NFAwtl (DFAwtl). Accordingly, all regular languages are accepted by DFAwtls. Moreover, DFAwtls are much more expressive than standard DFAs, as shown by the following example.
Example 1. Let A = (Q, Σ, $, τ, I, F, δ), where Q = {q_0, q_1, q_2}, I = {q_0} = F, Σ = {a, b, c, d}, and the functions τ and δ are defined as follows:
τ(q_0) = ∅,       δ(q_0, a) = {q_1},   δ(q_0, b) = {q_2},
τ(q_1) = {a, b},  δ(q_1, c) = {q_0},
τ(q_2) = {b},     δ(q_2, d) = {q_0},
and δ(q, x) = ∅ for all other pairs (q, x) ∈ Q × Σ. Observe that A is in fact a DFAwtl.
It can be shown that L(A) contains only words w with |w|_a = |w|_c and |w|_b = |w|_d (but not all such words); moreover, L(A) ∩ (a^* · b^* · c^* · d^*) = {a^n b^m c^n d^m | n, m ≥ 0}, and thus this language is not context-free. This shows that already DFAwtls accept non-context-free languages.
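As a small check, the DFAwtl of Example 1 can be written down as data for the (hypothetical) nfawtl_accepts sketch above:

tau = {"q0": set(), "q1": {"a", "b"}, "q2": {"b"}}
delta = {("q0", "a"): {"q1"}, ("q0", "b"): {"q2"},
         ("q1", "c"): {"q0"}, ("q2", "d"): {"q0"}}

print(nfawtl_accepts("aabbccdd", {"q0"}, {"q0"}, tau, delta))  # True: a^2 b^2 c^2 d^2
print(nfawtl_accepts("aabccdd", {"q0"}, {"q0"}, tau, delta))   # False: |w|_b differs from |w|_d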
An NFAwtl A = (Q, Σ, $, τ, I, F, δ) is described more transparently by a graph, similar to the graph representation of standard NFAs. A state q ∈ Q is represented by a node labelled with q, where the node of an initial state p is marked by a special incoming edge without a label, and the node of a final state p is marked by a special outgoing edge with label (τ(p))^*. For each state q ∈ Q and each letter a ∈ Σ \ τ(q), if δ(q, a) = {q_1, . . . , q_s}, then there is a directed edge labelled ((τ(q))^*, a) from the node corresponding to state q to the node corresponding to state q_i for each i = 1, . . . , s. The graph representation of the DFAwtl A of Example 1 is given in Figure 1. (Using the notation ∅^* = {ε}.)
According to the definition, an NFAwtl may accept a word without processing it completely. This, however, is only a convenience that makes the definition simpler, since for every NFAwtl A one can effectively construct an NFAwtl B such that L(B) = L(A), but for each word w ∈ L(B), each accepting computation of B on input w consists of |w| many reading steps plus a final step that accepts the empty word (i.e., the input is totally processed). Nevertheless, it is an open problem whether the same fact holds for DFAwtls in general. If A is an NFAwtl on Σ that accepts only with totally processed input, then by removing the translucency relation from A, we obtain a standard NFA A' that accepts a language letter equivalent to the original language L(A); moreover, L(A) ⊆ L(A').
LinguisticApplicationsofFiniteAutomatawithTranslucentLetters
463
Figure 1: The graphical representation of the DFAwtl A of Example 1.
Figure 2: Divided tape model of NFAwtl.
3 THE PARSING PROBLEM
In this section we address the word problem, i.e.,
how we can decide if a given word is accepted by an
NFAwtl/DFAwtl.
One may feel that in every step of our automaton the input has to be scanned from the beginning in order to find an occurrence of the letter indicated by the transition; in this way the time complexity of the word problem looks quadratic. In the next part of this section we show a representation method which gives a lower complexity for the word problem.
Let us start this section with a description of how our automata could work. Our aim is to reach easily the next occurrence of a given letter. To do so, first, let us divide the tape into as many parts as the cardinality of the alphabet (in a similar way as a one-tape Turing machine can simulate a multi-tape Turing machine (Hopcroft and Ullman, 1969)). See Figure 2 as well. In each division exactly one of the letters is used: there is a bijection from Σ to the parts of the tape.
Then in each state those parts of the tape are not used that are assigned to translucent letters. With this picture in mind, one can construct a ‘linked-list’ data structure of the tape content in a linear number of steps (in the length of the input). The notation w_n is used for the n-th letter of the word w. The linked-list structure contains the array STORE with dimension |w| and |Σ| pointers. A pointer HEAD_a shows the first occurrence of letter a in w. At that position in the array there is a value that shows the second occurrence of letter a in w, etc. At the last occurrence of a there is a special marker that shows that there are no further occurrences of the letter a. The formal algorithm is shown in Figure 3. Starting from the end of the word we easily find the last occurrence of every type of letter in the word, and we mark these places of STORE by special markers (NULL). Further preprocessing the input, we put in every place of STORE the value that indicates the next occurrence of the given letter. Finally, the pointers HEAD_j show the first occurrences of the letters.
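A possible Python rendering of this preprocessing (cf. Figure 3) is sketched below; it is only an illustration under our own naming, with 1-based positions as in the description above and None playing the role of NULL.

def preprocess(word):
    """Build the linked lists: HEAD[a] is the first position of letter a in word,
    STORE[i] the position of the next occurrence of the letter standing at position i."""
    store = [None] * (len(word) + 1)        # index 0 is unused (positions are 1-based)
    head = {a: None for a in set(word)}
    for i in range(len(word), 0, -1):       # scan the word from right to left
        letter = word[i - 1]
        store[i] = head[letter]             # the previous head becomes the successor
        head[letter] = i                    # position i is now the first known occurrence
    return head, store

# preprocess("abaababb") yields HEAD = {'a': 1, 'b': 2} and
# STORE[1..8] = 3, 5, 4, 6, 7, None, 8, None (cf. Example 2 below).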
Then the constructed data structure is used in the computation as a representation of the input word (see Figure 4). By the constructed data structure one can easily access the first occurrence of any letter a of the word w by the pointer HEAD_a. When the machine should read a letter a, one needs only a constant number (at most |Σ| − 1) of comparisons: if HEAD_a > HEAD_b for some non-translucent letter b occurring in the unprocessed part of the input (here the value NULL is considered as ∞), then the machine gets stuck; otherwise a is read by the machine, which is done by erasing the current head of the list representing a: let HEAD_a = STORE_{HEAD_a}. When all lists are empty, i.e., HEAD_a = NULL for every input letter a, then the input is fully processed.
In this way a DFAwtl processes the input in linear time: linear preprocessing and linear processing (the number of simple operations (comparisons and assignment statements) is bounded by (|Σ| + 2|w|) + (|Σ| − 1)|w|).
Therefore the word problem for DFAwtls is almost as simple as for DFAs; this family of languages has this very pleasant and effective property. (Note that here we allowed comparing numbers in fixed time, although the numbers stored in the array STORE depend on the length of the input.)
For an NFAwtl, the word problem using our representation is linear with a nondeterministic version of our algorithm (i.e., the problem is in the class NLIN).
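Putting the pieces together, the linear-time decision procedure for a DFAwtl can be sketched as follows (again an illustration assuming the preprocess helper above; since the automaton is deterministic, delta is taken here to map a state–letter pair to a single state).

def dfawtl_run(word, initial, final, tau, delta):
    """Decide acceptance of word by a DFAwtl using the linked-list representation."""
    head, store = preprocess(word)
    state = initial
    while True:
        # heads of the letters that still occur and are visible in the current state
        visible = {a: p for a, p in head.items()
                   if p is not None and a not in tau.get(state, set())}
        if not visible:
            return state in final               # only translucent letters remain: $ reached
        letter = min(visible, key=visible.get)  # the leftmost visible occurrence
        if (state, letter) not in delta:
            return False                        # the automaton gets stuck
        head[letter] = store[head[letter]]      # erase this occurrence in O(1)
        state = delta[(state, letter)]

Each iteration erases one letter and uses at most |Σ| comparisons, which matches the bound given above.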
In the rest of this section we present some additional examples. The next example is one of the most essential context-free languages.
Example 2. The Dyck language is accepted by the DFAwtl shown in Figure 5. The input abaababb is represented by HEAD_a = 1, HEAD_b = 2, STORE = 35467∞8∞ (where ∞ represents the value NULL as discussed before). The run of the machine on this input is as follows:
State q_0: HEAD_a = 1 (HEAD_a < HEAD_b) is changed to STORE_{HEAD_a} = STORE_1 = 3; the next state is q_1.
State q_1: HEAD_b = 2 is changed to STORE_{HEAD_b} = STORE_2 = 5; the next state is q_0.
ICAART2013-InternationalConferenceonAgentsandArtificialIntelligence
464
Input: the alphabet Σ and the input word w.
Output: linked lists for every element a ∈ Σ, stored in a |w|-long array STORE, and the HEADs of the lists.
Initialization: For every a ∈ Σ let HEAD_a = NULL.
For i = |w| downto 1 step −1 do
  Let STORE_i = HEAD_{w_i}
  Let HEAD_{w_i} = i
EndFor.
Figure 3: Preprocessing algorithm to obtain the linked list representation of the input word.
Figure 4: Linked list model of NFAwtl with preprocessed input.
State q_0: HEAD_a = 3 (HEAD_a < HEAD_b) is changed to STORE_{HEAD_a} = STORE_3 = 4; the next state is q_1.
State q_1: HEAD_b = 5 is changed to STORE_{HEAD_b} = STORE_5 = 7; the next state is q_0.
State q_0: HEAD_a = 4 (HEAD_a < HEAD_b) is changed to STORE_{HEAD_a} = STORE_4 = 6; the next state is q_1.
State q_1: HEAD_b = 7 is changed to STORE_{HEAD_b} = STORE_7 = 8; the next state is q_0.
State q_0: HEAD_a = 6 (HEAD_a < HEAD_b) is changed to STORE_{HEAD_a} = STORE_6 = ∞; the next state is q_1.
State q_1: HEAD_b = 8 is changed to STORE_{HEAD_b} = STORE_8 = ∞; the next state is q_0.
The input is empty (both HEAD_a and HEAD_b are NULL) and q_0 is a final state, so the input is accepted.
The input abbabaab is represented by HEAD_a = 1, HEAD_b = 2, STORE = 435687∞∞. The run of the machine on this input is as follows:
State q_0: HEAD_a = 1 (HEAD_a < HEAD_b) is changed to STORE_{HEAD_a} = STORE_1 = 4; the next state is q_1.
State q_1: HEAD_b = 2 is changed to STORE_{HEAD_b} = STORE_2 = 3; the next state is q_0. The first two letters of the input have already been processed.
State q_0: HEAD_a = 4 and HEAD_b = 3, therefore HEAD_a > HEAD_b and this transition cannot be executed. The machine gets stuck. This input is not accepted: abbabaab is not in the Dyck language.
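The two runs above can be reproduced with the sketches of this section (assuming the hypothetical dfawtl_run helper and the DFAwtl of Figure 5):

tau = {"q0": set(), "q1": {"a"}}
delta = {("q0", "a"): "q1", ("q1", "b"): "q0"}

print(dfawtl_run("abaababb", "q0", {"q0"}, tau, delta))  # True: accepted
print(dfawtl_run("abbabaab", "q0", {"q0"}, tau, delta))  # False: the run gets stuck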
Further, we present two non-context-free languages that are closely related to basic mildly context-sensitive languages.
Figure 5: The DFAwtl of Example 2: I = F = {q_0}, τ(q_0) = ∅, δ(q_0, a) = {q_1}, τ(q_1) = {a}, δ(q_1, b) = {q_0}.
Figure 6: The DFAwtl of Example 3.
Example 3. Let A = (Q, Σ, $, τ, I, F, δ), where Q = {q_0, q_1, q_2}, I = {q_0} = F, Σ = {a, b, c}, and the functions τ and δ are defined as follows:
τ(q_0) = {b, c},  δ(q_0, a) = {q_1},
τ(q_1) = {a, c},  δ(q_1, b) = {q_2},
τ(q_2) = {a, b},  δ(q_2, c) = {q_0},
and δ(q, x) = ∅ for all other pairs (q, x) ∈ Q × Σ. The language {w ∈ {a, b, c}^* | |w|_a = |w|_b = |w|_c} is accepted by this DFAwtl. Its graphical representation is shown in Figure 6.
An example run on input abbacbcca is as follows:
Representation of the input: HEAD_a = 1, HEAD_b = 2, HEAD_c = 5, STORE = 43697∞8∞∞.
Starting from the initial state q_0, an a is read by changing HEAD_a from 1 to STORE_1 = 4.
Then in state q_1 a b is read by increasing HEAD_b from 2 to STORE_2 = 3.
In state q_2 a c is read by changing HEAD_c from 5 to STORE_5 = 7.
LinguisticApplicationsofFiniteAutomatawithTranslucentLetters
465
Now the system is in q_0 and an a is read: HEAD_a is changed to STORE_4 = 9.
In state q_1, HEAD_b is increased to STORE_3 = 6.
In state q_2, HEAD_c is changed to STORE_7 = 8.
In state q_0, HEAD_a is increased to STORE_9 = ∞.
In state q_1, HEAD_b is increased to STORE_6 = ∞.
In state q_2, HEAD_c is changed to STORE_8 = ∞, and the system arrives at q_0.
Since all lists are empty (all HEADs are NULL), the input is fully processed and the system is in its final state, so the input is accepted.
Note that in this example all letters that are not being read in a step are translucent, therefore the run does not need any comparisons. The machine may get stuck only if one of the letters runs out.
The language of the previous example intersected with the regular language a^* b^* c^* gives the language {a^n b^n c^n | n ≥ 0}. The language {a^n b^m c^n d^m | n, m ≥ 0} can be recognized in a similar way: it can be obtained as the intersection of the DFAwtl language presented in Example 1 and the regular language a^* b^* c^* d^*. These languages belong to mildly context-sensitive language families and they are important from the linguistic point of view. The next language is closely connected to the copy language, which also belongs to mildly context-sensitive language families.
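On the level of membership tests, this intersection can be sketched as follows (an illustration assuming the hypothetical dfawtl_run helper from Section 3; the regular constraint a^*b^*c^* is checked with a regular expression):

import re

tau = {"q0": {"b", "c"}, "q1": {"a", "c"}, "q2": {"a", "b"}}
delta = {("q0", "a"): "q1", ("q1", "b"): "q2", ("q2", "c"): "q0"}

def member_anbncn(w):
    # shape a...ab...bc...c checked by the regular expression, counts by the DFAwtl
    return bool(re.fullmatch(r"a*b*c*", w)) and dfawtl_run(w, "q0", {"q0"}, tau, delta)

print(member_anbncn("aabbcc"))  # True
print(member_anbncn("aabcc"))   # False: too few b's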
Example 4. The disjoint copy language {w w' | w ∈ {a, b}^*, w' ∈ {a', b'}^*, w' = h(w), where the alphabetic morphism h maps the letters to their primed versions} is accepted in the following way. Consider the DFAwtl shown in Figure 7. The language accepted by the DFAwtl of Figure 7 intersected with the regular language (a + b)^* (a' + b')^* gives exactly the disjoint copy language.
4 APPLICATIONS IN A NATURAL LANGUAGE: MODELING SOME STRUCTURES OF THE HUNGARIAN LANGUAGE
In the previous section we gave some examples of how NFAwtls/DFAwtls can be used to model languages that are not context-free and are closely connected to the main mildly context-sensitive languages. In this section we make a further step: we use NFAwtls to present some features of a natural language, namely, of the Hungarian language.
In our investigation, the Hungarian language was selected, which differs from English in many aspects. As we already mentioned, the Hungarian language is an agglutinative language, it has no dominant word order, and there is a reduced use of postpositions.
Figure 7: The graphical representation of the DFAwtl of Example 4: I = F = {q_0}, τ(q_0) = ∅, δ(q_0, a) = {q_1}, δ(q_0, b) = {q_2}, τ(q_1) = {a, b}, δ(q_1, a') = {q_0}, τ(q_2) = {a, b}, δ(q_2, b') = {q_0}.
Our next example models the two types of conjugation in the Hungarian language. Hungarian is a free word order language, therefore our new model can be used effectively. In our example we consider a very small segment of the language, focusing on this phenomenon.
Example 5. The ‘alphabet’ used, i.e., the Hungarian words, can be seen in Tables 1 and 2; the conjugation of the verbs can be seen in Table 2.
One of the most widely investigated distinguishing features of languages is the ordering of subject (S), object (O) and verb (V) within a sentence. Theoretically there are seven different possibilities, and each of them is represented by a set of living languages. According to the statistical analysis (Dryer and Haspelmath, 2012), the dominating sequence is the SOV order with about 565 languages, while the smallest group is the cluster of 4 languages with OSV order. An example of OSV order can be found in the Nadëb language, where the sentence ‘the child sees the jaguar’ is given as
awad (jaguar) kalapéé (child) hapúh (to see).
(Note that in Hungarian it also makes sense to use the OSV order: Jaguárt a gyerek lát. (a gyerek = the child, lát = (can) see). However, in Hungarian this sentence may have an additional meaning: it emphasizes the child, i.e., not the adult (or anybody else), but the child sees the jaguar.)
In a significant number of languages, no single dominant order can be found. These languages use a relatively free word order. According to some recent approaches (Dryer, 1997), the SOV order has a marginal role in the categorization of the languages, as in many languages some of these components can be eliminated from the sentences; the corresponding clause is pronominal or it is expressed by some verbal affixes. It is argued that a more useful typology is one based on two more basic features: whether the language is OV or VO, and whether it is SV or VS.
ICAART2013-InternationalConferenceonAgentsandArtificialIntelligence
466
Table 1: Hungarian words used in Example 5.
person: én (I), te (you), ő (she/he), Jani (Johnny), Mari (Mary)
thing: a könyv (the book), az újság (the newspaper), a kenyér (the bread), a keksz (the biscuit), a sajt (the cheese)
object: a könyvet, az újságot, a kenyeret, a kekszet, a sajtot

Table 2: Hungarian verbs used in Example 5.
definite (I / you / she-he-it): eszem, eszed, eszi (eat); olvasom, olvasod, olvassa (read)
indefinite (I / you / she-he-it): eszek, eszel, eszik (eat); olvasok, olvasol, olvas (read)
situation (I / you / she-he-it): fekszem, fekszel, fekszik (lay); vagyok (am), vagy (are), van (is) (be)
Correct sentences in Hungarian can be formed in the following way: a person or a thing and a situation verb can be paired in any order (respecting the person of the conjugation), e.g., “Én fekszem.”, “Vagyok én.”, “Fekszel te.”, “Te vagy.”, “Ő fekszik.”, “Fekszik Jani.”, “Mari van.”, “Az újság van.”, “A könyv fekszik.”
A person and a verb in indefinite form can also be paired in any order (respecting the person of the conjugation), e.g., “Én eszek.”, “Olvasok én.”, “Te eszel.”, “Olvas ő.”, “Mari eszik.”, “Eszik Jani.”
A person, a verb in definite form and an object can also be grouped to form a sentence in any order (respecting the person of the conjugation), e.g., “Én eszem a kenyeret.”, “A kenyeret te eszed.”, “Eszi Mari a kenyeret.”, “Én az újságot olvasom.”, “Te a könyvet olvasod.”, “A kekszet eszi Jani.”, “Én eszem kenyeret.”, “Olvassa ő a könyvet.”
Therefore the automaton may look first for the subject (having everything else translucent), and then, depending on the subject, the automaton looks for the verb (with everything else translucent). If the definite form of a verb is used with a person, then the automaton also checks the existence of the object.
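This strategy can be illustrated by a toy DFAwtl over word categories, written as data for the simulator sketched in Section 3. The encoding below is our own rough simplification: person agreement is ignored, and the category symbols SUBJ-P (personal subject), SUBJ-T (thing subject), V-SIT, V-INDEF, V-DEF and OBJ stand for whole Hungarian words; it only illustrates the translucency-based search and is not a grammar of the actual fragment.

# s0 looks for the subject, sP/sT then look for a suitable verb,
# sO still needs an object, and f is the only accepting state
tau = {"s0": {"V-SIT", "V-INDEF", "V-DEF", "OBJ"},   # everything but the subject is translucent
       "sP": {"OBJ"},                                # the object may precede the verb
       "sT": set(), "sO": set(), "f": set()}
delta = {("s0", "SUBJ-P"): "sP", ("s0", "SUBJ-T"): "sT",
         ("sP", "V-SIT"): "f", ("sP", "V-INDEF"): "f", ("sP", "V-DEF"): "sO",
         ("sT", "V-SIT"): "f",
         ("sO", "OBJ"): "f"}

# "Eszi Mari a kenyeret" (V-DEF SUBJ-P OBJ) and "A könyv fekszik" (SUBJ-T V-SIT)
print(dfawtl_run(("V-DEF", "SUBJ-P", "OBJ"), "s0", {"f"}, tau, delta))  # True
print(dfawtl_run(("SUBJ-T", "V-SIT"), "s0", {"f"}, tau, delta))         # True
print(dfawtl_run(("SUBJ-P", "V-DEF"), "s0", {"f"}, tau, delta))         # False: object missing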
From the theoretical viewpoint, the main interest in modeling natural language grammars focuses on sentences of unlimited length. Regarding the languages belonging to the class accepted by NFAwtls, the following condition must be met (Nagy and Otto, 2011): any language accepted by an NFAwtl contains a regular sublanguage that is letter equivalent to the language itself. Allowing an unbounded length, this regular sublanguage must have the structure
S_1 S_2^* S_3,
where S_1, S_2, S_3 are of finite length. Thus the pattern accepted by the NFAwtl is letter equivalent to this pattern. In the simplest cases, the repeated part is equal to
the repetition of a fixed subsequence, or
the repetition of arbitrary permutations of the elements of the subsequence, or
the permutation of equal numbers of elements of each of the different symbols of the subsequence.
Considering the first case, the pattern includes the repetition of the same element:
S_1 w^* S_3.
A sample sentence can be given as
Péter szereti Annát, Évát, Katit, Marikát, ... (Peter likes Anna, Eve, Kate, Mary, ...)
This kind of pattern can be described with a simple regular grammar, thus no translucent symbols are needed. Thus, the power of the NFAwtl can be demonstrated on such patterns where the ordering of the elements can be arbitrary. The next sentence demonstrates this kind of pattern:
Anna egy könyvet olvas arról, hogy egy könyvet olvas Zoli arról, hogy olvas Maria egy könyvet arról, hogy Tibor olvas egy könyvet arról, ... (Anna is reading a book about that Zoli is reading a book about that Maria is reading a book about that Tibor is reading a book about that ...)
In this example, the order of some words within the repeating subsentence can be arbitrary, as the semantic role is encoded in the Hungarian language by case-marking (inflection) of the words. The word “könyvet” is the accusative form of the stem “könyv” (book). Thus all of the following sentences can be used:
hogy Anna könyvet olvas
hogy könyvet olvas Anna
hogy Anna olvas könyvet
hogy olvas Anna könyvet
LinguisticApplicationsofFiniteAutomatawithTranslucentLetters
467
Figure 8: The graphical representation of the DFAwtl for (x(yzv)^! w)^*.
The pattern of the corresponding subsentence can be given as
(x(yzv)^! w)^*,
where the symbol s^! denotes the set of all permutations of the elements of s. In general, the form of the sublanguage is
(S_1 S_2^! S_3)^*.
The pattern (x(yzv)^! w)^* can be validated with the NFAwtl given in Figure 8.
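One possible automaton of this kind, written as data for the simulator sketched in Section 3, is the following DFAwtl; it is our own reconstruction of the idea behind Figure 8, so the state names and the exact transition structure are assumptions and need not coincide with the figure.

# read x; then y while z and v are translucent; then z while v is translucent;
# then v; then w; and start the next block (acceptance only with an empty remainder)
tau = {"p0": set(), "p1": {"z", "v"}, "p2": {"v"}, "p3": set(), "p4": set()}
delta = {("p0", "x"): "p1", ("p1", "y"): "p2", ("p2", "z"): "p3",
         ("p3", "v"): "p4", ("p4", "w"): "p0"}

print(dfawtl_run("xzvywxyvzw", "p0", {"p0"}, tau, delta))  # True: two permuted blocks
print(dfawtl_run("xzvw", "p0", {"p0"}, tau, delta))        # False: y is missing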
Theoretically, the language of the pattern (S_1 S_2^! S_3)^* with finite S_2 could also be accepted by a DFA or generated by a regular grammar. This DFA/grammar would have to contain the explicit description of every possible permutation. Thus the size of the grammar is O(|S_2|!) times larger than the size of the corresponding NFAwtl. Thus the main benefit of the NFAwtl is the compact representation form of the grammar.
In natural languages, the semantic role can be represented at the syntax level in different ways. The two most usual encoding methods are inflection and the relative position of the words. In the Hungarian language, where free orders are accepted, the different permutations usually convey marginally different semantic contents like emphasis or the opinion of the speaker. For example, taking the sentence
Maria sokat olvas (Mary reads a lot)
the following permutations can be constructed:
Maria sokat olvas (correct, natural)
Maria olvas sokat (rare use, special situation)
olvas Maria sokat (special situation)
olvas sokat Maria (special situation)
sokat Maria olvas (very special situation, sounds
strange)
sokat olvas Maria (correct, natural)
The complexity of natural languages can be well demonstrated by the phenomenon that the acceptance of a permutation depends on the semantics of the situation. In the next example a similar sentence is used:
Maria olvas egy könyvet (Mary reads a book)
The related permutations are
Maria olvas egy könyvet (correct)
Maria egy könyvet olvas (correct)
olvas egy könyvet Maria (correct)
olvas Maria egy könyvet (correct)
egy könyvet Maria olvas (sounds strange)
egy könyvet olvas Maria (correct).
There are also cases in Hungarian where the order of the words has a key role in the correct interpretation of the sentence. Let us take the following example:
Péter segítette Jóskát (acc) Tomit (acc) kifesteni (Peter helped John to paint Tom)
The sentence
Péter segítette Tomit (acc) Jóskát (acc) kifesteni (Peter helped Tom to paint John)
describes, on the other hand, a very different situation.
5 CONCLUSIONS
A large group of natural languages has no single dominant word order: several permutations of the words are grammatical. Traditional finite automata represent every possible order explicitly, resulting in a huge/complex structure. The presented finite automata with translucent letters can be used for a more compact modeling of free word order in natural languages. The NFAwtl automaton provides a more expressive and precise description of the grammar than
ICAART2013-InternationalConferenceonAgentsandArtificialIntelligence
468
the base (regular) finite automaton. The next step of the investigation is to cover also the non-regular and non-context-free elements of natural languages, since DFAwtls and NFAwtls are good candidates to handle some of these features.
Another main result is that we provided a linear-time algorithm for the word problem. The considered algorithm is nondeterministic for NFAwtls and deterministic for DFAwtls. Thus, it remains open to give an efficient deterministic algorithm for NFAwtl languages.
It is also an interesting question whether all languages accepted by DFAwtls can be accepted in such a way that all the input letters are erased during the computation.
ACKNOWLEDGEMENTS
The publication was supported by the TÁMOP-4.2.2/C-11/1/KONV-2012-0001 project. The project has been supported by the European Union, co-financed by the European Social Fund.
REFERENCES
Chomsky, N. (1957). Syntactic Structures. The Hague: Mouton & Co.
Dryer, M. S. (1997). On the six-way word order typology.
Studies in Language, 21:69–103.
Dryer, M. S. and Haspelmath, M., editors (2012). The World Atlas of Language Structures. Max Planck Digital Library, Munich.
Gazdar, G. (1982). Natural languages and context-free lan-
guages. Linguistics and Philosophy, 4:469–473.
Hockett, C. (1955). A manual of phonology. Waverly Press,
Baltimore.
Hopcroft, J. E. and Ullman, J. D. (1969). Formal lan-
guages and their relation to automata. Addison-
Wesley Longman Publishing Co., Boston.
Joshi, A. K. (2010). Mildly Context-Sensitive Grammars.
http://www.kornai.com/MatLing/mcsfin.pdf.
Kornai, A. (1985). Natural languages and the Chomsky hierarchy. Proc. of EACL'85, pages 1–7.
Matthews, R. J. (1979). Are the grammatical sentences of a language a recursive set? Synthese, 40:209–224.
Mery, B., Amblard, M., Durand, I., and Retoré, C. (2006). A case study of the convergence of mildly context-sensitive formalisms for natural language syntax: from minimalist grammars to multiple context-free grammars. INRA Rapport de recherche, nr. 6042.
Nagy, B. and Otto, F. (2011). Finite-state acceptors with translucent letters. In BILC 2011 – 1st International Workshop on AI Methods for Interdisciplinary Research in Language and Biology, ICAART 2011 – 3rd International Conference on Agents and Artificial Intelligence, pages 3–13.
Nagy, B. and Otto, F. (2012a). CD-systems of stateless de-
terministic R-automata with window size one. Journal
of Computer and System Sciences, 78:780–806.
Nagy, B. and Otto, F. (2012b). On globally deter-
ministic CD-systems of stateless R-automata with
window size one. International Journal of
Computer Mathematics, pages online first, doi:
10.1080/00207160.2012.688820.
Reich, P. A. (1969). The finiteness of natural languages. Language, 45:831–843.
Sullivan, W. J. (1980). Syntax and linguistic semantics in
stratificational theory. Current approaches to syntax,
pages 301–327.
Wintner, S. (2002). Formal language theory for natural lan-
guage processing. Proc. of ACL’02, pages 71–76.
LinguisticApplicationsofFiniteAutomatawithTranslucentLetters
469