Handling Larger Problem Sizes. Our implementation has no efficiency problems at the current scale: all experiments run in under a minute on an ordinary laptop. But since the problem itself is NP-complete, we will potentially run into difficulties as the number of logical variables grows, whether by increasing the number of example sentences, by making the sentences longer (and therefore more ambiguous), or by allowing larger subtrees in the CSP.
One main problem is the number of parse trees, which can grow exponentially in the length of the sentences. If we also split the trees into all possible subtrees, the number grows even further. One possible solution we want to explore is to move away from formulating the problem in terms of parse trees and instead refer to the states in the parse chart. The chart has polynomial size, in contrast to the exponential growth of the trees, and it should be possible to translate the chart directly into a complex logical formula instead of having to go via parse trees.
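To make the contrast concrete, the following sketch (not part of our implementation; the chart layout and the category set are hypothetical illustrations) compares the number of binary parse trees over an n-word sentence with the number of cells in a CKY-style chart, and shows how each (span, category) chart item could be mapped to a boolean variable of the logical formula:

```python
from math import comb

def num_binary_trees(n):
    # Catalan number C(n-1): binary parse trees over n leaves (exponential).
    return comb(2 * (n - 1), n - 1) // n

def num_chart_cells(n):
    # Spans (i, j) with 0 <= i < j <= n in a CKY-style chart (quadratic).
    return n * (n + 1) // 2

def chart_variables(n, categories):
    # One boolean variable per (start, end, category) chart item; a
    # SAT/CSP encoding could constrain these items directly, without
    # ever enumerating individual parse trees.
    var = {}
    for start in range(n):
        for end in range(start + 1, n + 1):
            for cat in categories:
                var[(start, end, cat)] = len(var) + 1
    return var

for n in (5, 10, 20):
    print(n, num_binary_trees(n), num_chart_cells(n))

# A 3-word sentence with two categories yields 6 spans x 2 = 12 variables.
print(len(chart_variables(3, ["NP", "VP"])))
```

Even for modest sentence lengths the tree count dwarfs the chart size, which is what makes a chart-based encoding attractive: the number of variables stays polynomial in the input.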
Other Grammar Formalisms. Currently our algorithm uses the GF resource grammar library, but there are other formalisms and resource grammars for which we hope the method can prove useful, such as the HPSG resource grammars developed within the DELPH-IN collaboration,² or grammars created for the XMG metagrammar compiler.³
8 CONCLUSION
In this paper we have shown that it is possible to learn a grammar from a very limited number of example sentences, provided we can make use of a large-scale resource grammar. In most cases, around 10 example sentences are enough to obtain a grammar with good coverage.
There is still work left to be done, including performing more evaluations on different kinds of grammars and example treebanks. But we hope that this idea can find use in areas such as computer-assisted language learning, domain-specific dialogue systems, computer games, and more. We will especially focus on ways to apply the method in computer-assisted language learning. However, a thorough evaluation of the suitability of the extracted grammars has to be conducted for each of these applications and remains future work.
² http://www.delph-in.net/wiki/index.php/Grammars
³ http://xmg.phil.hhu.de/
NLPinAI 2020 - Special Session on Natural Language Processing in Artificial Intelligence