be adapted to guess lexical categories of extraneous
words from context. However, in most of them this
would require a major modification of the parser.
Take for instance DCGs (Definite Clause Gram-
mars, (Pereira and Warren, 1980)), where lexical rules
would appear as exemplified by:
noun --> [borogove].
If the lexicon does not explicitly include the word
“borogove” among the nouns, the parser would simply
fail upon encountering it. One could admit unknown
nouns through the following rule:
noun --> [_].
But since this rule would indiscriminately accept
any word as a noun (and similar rules would have
to be included in order to treat possible extraneous
words in any other category), this approach would
mislead the parser into trying countless paths that are
doomed to fail, and might even generate wrong results.
In contrast, we can parse extraneous words
through Womb Grammar by anonymizing the category
and its features rather than the word itself, e.g.
word(Category,[Number,Gender],borogove), which
more accurately represents what we know and what
we don’t. The category and features will become efficiently
instantiated through constraint satisfaction,
taking into account all the properties that must be satisfied
by this word in interaction with its context.
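The idea of leaving the category and features open, as the unbound Prolog variables in word(Category,[Number,Gender],borogove) do, can be mimicked by representing an unknown word with the full set of candidate values, to be narrowed as contextual constraints apply. A minimal Python sketch (all names are illustrative, not the paper’s implementation):

```python
# Sketch: an extraneous word starts with every category and feature
# value still possible, mirroring an unbound variable in
# word(Category, [Number, Gender], borogove).

CATEGORIES = {"det", "adj", "noun", "verb"}
NUMBERS = {"sing", "plur"}

def unknown_word(form):
    """All candidate analyses are initially open."""
    return {"form": form,
            "category": set(CATEGORIES),
            "number": set(NUMBERS)}

def constrain(word, slot, allowed):
    """Intersect a slot with the values a contextual constraint allows."""
    word[slot] &= set(allowed)
    if not word[slot]:
        raise ValueError(f"no analysis left for {word['form']!r}")
    return word

w = unknown_word("borogove")
# Context: it heads a plural noun phrase, so it must be a plural noun.
constrain(w, "category", {"noun"})
constrain(w, "number", {"plur"})
print(w["category"], w["number"])   # {'noun'} {'plur'}
```

Each constraint only ever shrinks the candidate sets, so the order of application does not matter, much as in constraint satisfaction proper.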
Of course, it would be most interesting to derive
the meaning of the word that “does not belong”.
While Womb Grammars do not yet have
a complete way of treating semantics, the clues they
can provide regarding syntactic category can serve to
guide a subsequent semantic analysis, or to bypass
the need for a complete semantic analysis through the
concomitant use of ontologies relevant to domain-specific
uses of our parser. In general, we are not necessarily
interested in capturing the exact meaning of each unrecognised
word, but rather in inferring its relation to
known words. The problem can be cast as the (automatic)
extraction of a portion of the hypernym relation
involving the extraneous word, using the actual
document or additional sources as corpora (see (Clark
et al., 2012)).
For instance, in the poem “Jabberwocky”, by
Lewis Carroll, nonsense words are interspersed
within English text with correct syntax. Our target
lexicon, which we might call Wonderland Lexicon or
WL, can be to some extent reconstructed from the sur-
rounding English words and structure by modularly
applying the constraints for English. Thus, “boro-
goves” must be labelled as a noun in order not to
violate a noun phrase’s exigency for a head noun.
In other noun phrases, the extraneous words can be
recognised only as adjectives. This is the case for
“the manxome foe” and “his vorpal sword”, once
the following constraints are applied: adjectives must
precede nouns, a noun phrase can have only one
head noun, and determiners are unique within a noun
phrase. In the case of “the slithy toves”, where there
are two WL words, the constraint that the head noun
is obligatory implies that one of these two words is
a noun, and the noun must be “toves” rather than
“slithy” (which is identified as an adjective as in the
two previous examples) in order not to violate the
precedence constraint between nouns and adjectives.
In other cases we may not be able to unambiguously
determine the category, for instance the WL word
“frabjous” preceding the English word “day” may re-
main ambiguous no matter how we parse it, if it satis-
fies all the constraints either as a determiner or as an
adjective².
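The reasoning just traced for “the slithy toves” can be replayed mechanically: enumerate candidate labellings and keep those satisfying the three constraints cited above. A Python sketch (the lexicon and category inventory are assumed for illustration):

```python
from itertools import product

# Known English words have fixed categories; WL words are open.
LEXICON = {"the": {"det"}}
OPEN = {"det", "adj", "noun"}      # candidates for an unknown (WL) word

def consistent(cats):
    """The three NP constraints: one head noun, at most one
    determiner, adjectives precede the noun."""
    if cats.count("noun") != 1:
        return False
    if cats.count("det") > 1:
        return False
    noun_at = cats.index("noun")
    return not any(c == "adj" and i > noun_at
                   for i, c in enumerate(cats))

def label(phrase):
    """Return every constraint-satisfying labelling of the phrase."""
    domains = [LEXICON.get(w, OPEN) for w in phrase]
    return [dict(zip(phrase, cats))
            for cats in product(*domains) if consistent(cats)]

print(label(["the", "slithy", "toves"]))
# [{'the': 'det', 'slithy': 'adj', 'toves': 'noun'}]
```

As in the text, the unique surviving labelling makes “toves” the noun and “slithy” an adjective; phrases like “frabjous day” would instead return several labellings, reflecting the residual ambiguity.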
Two of the poem’s noun phrases (“the Jubjub
bird” and “the Tumtum tree”) provide ontological
as well as lexical information (under the reasonable
assumption that capitalised words must be proper
nouns, coupled with the fact that, as proper nouns,
these words do not violate any constraints). Our adaptation
of Womb Grammars includes a starting-point,
domain-dependent ontology (which could, of course,
initially be empty) that can be augmented with
such ontological information as the facts that Tumtums
are trees and Jubjubs are birds. Similarly, input
such as “Vrilligs are vampires” would result in additions
to the ontology as well as in lexical recognition.
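Such updates amount to asserting hypernym (is-a) facts as phrases are parsed. A hypothetical Python sketch of this growing ontology (the class and its facts are illustrative, not the paper’s actual implementation):

```python
# Sketch of a tiny is-a ontology grown from parsed noun phrases such
# as "the Jubjub bird" or copular input such as "Vrilligs are vampires".

class Ontology:
    def __init__(self):
        self.isa = {}                      # word -> set of hypernyms

    def add(self, word, hypernym):
        self.isa.setdefault(word, set()).add(hypernym)

    def hypernyms(self, word):
        """Transitive closure of the is-a relation for word."""
        seen, todo = set(), [word]
        while todo:
            for parent in self.isa.get(todo.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    todo.append(parent)
        return seen

onto = Ontology()
onto.add("jubjub", "bird")      # from the NP "the Jubjub bird"
onto.add("tumtum", "tree")      # from the NP "the Tumtum tree"
onto.add("vrillig", "vampire")  # from "Vrilligs are vampires"
onto.add("vampire", "monster")  # an assumed pre-existing domain fact

print(onto.hypernyms("vrillig"))    # the set {'vampire', 'monster'}
```

Starting from a possibly empty domain ontology, each parsed phrase contributes one such edge, and queries over the transitive closure recover what is known about an extraneous word.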
Some input might even allow us to equate
some extraneous words with their English equivalents.
For instance, if instead of having in the same
poem the noun phrases “his vorpal sword” and “the
vorpal blade”, we’d encountered “his vorpal sword”
and “the cutting blade”, we could bet on approximate
synonymy between “vorpal” and “cutting”, on the basis
of our English ontology having established semantic
similarity between “sword” and “blade”.
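The inference just described can be sketched as: two adjectives observed to modify semantically similar head nouns become candidate synonyms. In the Python sketch below, the similarity pairs are assumed stand-ins for what the English ontology would provide:

```python
# Hypothetical sketch: propose adjective synonymy when an unknown
# adjective and a known one modify semantically similar head nouns.
SIMILAR = {frozenset({"sword", "blade"})}   # assumed ontology output

def similar(n1, n2):
    return n1 == n2 or frozenset({n1, n2}) in SIMILAR

def synonym_candidates(observations):
    """observations: (adjective, head_noun) pairs from parsed NPs."""
    out = set()
    for a1, h1 in observations:
        for a2, h2 in observations:
            if a1 != a2 and similar(h1, h2):
                out.add(frozenset({a1, a2}))
    return out

obs = [("vorpal", "sword"), ("cutting", "blade")]
print(synonym_candidates(obs))
```

On the example from the text this proposes the single pair {“vorpal”, “cutting”}; the proposals are of course only hypotheses, to be weighed against further evidence.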
Similarly, extraneous words that repeat might al-
low a domain-dependent ontology to help determine
their meaning. Taking once more the example of “his
vorpal sword” and “the vorpal blade”, by consulting
the ontology in addition to the constraints we can not only
determine that “vorpal” is an adjective, but also that
it probably refers to some quality of cutting objects.
It would be most interesting to carefully study under
which conditions such ontological inferences would
be warranted.
² Which precise constraints are defined for a given language
subset is left to the grammar designer; those in this
paper are meant to exemplify more than to prescribe.
ICAART 2015 - International Conference on Agents and Artificial Intelligence