always learns from its experience and can handle future situations better. In this
way, the system becomes increasingly robust to unseen sentences in real-world testing.
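The paper gives no code for this feedback cycle. The following is a minimal sketch under stated assumptions. It uses a bag-of-words multinomial naive Bayes classifier as a stand-in, which is not necessarily the classifier used in this work. The names `IncrementalNaiveBayes` and `evolve`, the seed sentences, and the semantic labels are all hypothetical. Each incoming sentence is classified, and only misclassified sentences are folded back into the training counts, so the model grows from its own errors. Passing every annotated sentence to `update` would work just as well.

```python
import math
from collections import Counter, defaultdict
from typing import Iterable, Optional, Tuple


class IncrementalNaiveBayes:
    """Hypothetical stand-in classifier: multinomial naive Bayes over words.

    All statistics are plain counts, so one annotated sentence can be added
    at a time without retraining from scratch.
    """

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha                       # add-alpha (Laplace) smoothing
        self.label_counts: Counter = Counter()   # training sentences per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocab: set = set()

    def update(self, sentence: str, label: str) -> None:
        """Fold one annotated sentence into the counts."""
        words = sentence.lower().split()
        self.label_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocab.update(words)

    def predict(self, sentence: str) -> Optional[str]:
        """Return the label with the highest log posterior (None if untrained)."""
        words = sentence.lower().split()
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
            score = math.log(n / total)
            for w in words:
                score += math.log((self.word_counts[label][w] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def evolve(model: IncrementalNaiveBayes, stream: Iterable[Tuple[str, str]]) -> int:
    """Classify each (sentence, gold_label) pair and learn only from mistakes.

    Returns how many sentences were misclassified and fed back as training data.
    """
    mistakes = 0
    for sentence, gold in stream:
        if model.predict(sentence) != gold:
            model.update(sentence, gold)  # the corrected example joins the training set
            mistakes += 1
    return mistakes


# Hypothetical seed data (e.g. produced automatically) and hypothetical labels.
model = IncrementalNaiveBayes()
model.update("hello captain", "greet")
model.update("where is the clinic", "ask_location")

new_data = [
    ("good morning captain", "greet"),   # already handled by the seed model
    ("thanks for your help", "thank"),   # unseen label: misclassified, then learned
    ("thanks for the help", "thank"),    # now classified correctly
]
mistakes = evolve(model, new_data)
print(mistakes)
```

Because the model keeps raw counts, each correction costs one `update` call rather than a full retraining pass. That property makes this kind of stand-in convenient for illustrating a system that keeps learning after deployment.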
6 Conclusions
The lack of well-annotated data is one of the biggest problems for most
training-based dialogue systems.
In this paper, we explore an evolutionary language processing approach to building a
natural language understanding system for the dialogue systems of a virtual human
training project. The initial training data are generated automatically with a finite
state machine. The language understanding system is first trained on these
automatically generated data and then improves as more data come in, as the
experimental results show.
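The text does not describe the finite state machine itself. The sketch below assumes a hypothetical acyclic FSM whose transitions each emit a phrase and, optionally, one slot-value pair of a semantic frame. Enumerating every path from the start state to a final state then yields sentences paired with frames, which could serve as automatically built initial training data. The function `generate`, the state names, the phrases, and the slots are invented for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# A transition emits a phrase, moves to the next state, and optionally
# records one (slot, value) pair of the semantic frame.
Transition = Tuple[str, str, Optional[Tuple[str, str]]]
FSM = Dict[str, List[Transition]]


def generate(fsm: FSM, start: str, finals: Set[str]) -> List[Tuple[str, Dict[str, str]]]:
    """Enumerate every path from `start` to a final state.

    Assumes the FSM is acyclic; a reachable cycle would make the recursion
    unbounded. Each path yields (sentence, frame), where the frame collects
    all tags emitted along that path.
    """
    examples: List[Tuple[str, Dict[str, str]]] = []

    def walk(state: str, words: List[str], frame: Dict[str, str]) -> None:
        if state in finals:
            examples.append((" ".join(words), dict(frame)))
        for phrase, nxt, tag in fsm.get(state, []):
            new_frame = dict(frame)
            if tag is not None:
                slot, value = tag
                new_frame[slot] = value
            walk(nxt, words + [phrase], new_frame)

    walk(start, [], {})
    return examples


# Hypothetical toy machine: two speech acts times two objects gives four sentences.
toy_fsm: FSM = {
    "S": [("can you", "A", ("act", "request")),
          ("you must", "A", ("act", "order"))],
    "A": [("move the clinic", "END", ("object", "clinic")),
          ("send medicine", "END", ("object", "medicine"))],
}

training_data = generate(toy_fsm, "S", {"END"})
for sentence, frame in training_data:
    print(sentence, frame)
```

Every path is enumerated, so a state with many alternative phrases multiplies the number of examples that pass through it. This is one plausible way a single FSM can end up producing an unbalanced training set.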
The quality and composition of the training set affect the system's ability to
process sentences. How to build a balanced training set with a single finite state
machine remains one of our important open problems. Ongoing research also includes
improving pruning approaches and finding new ways to integrate semantic knowledge
into our classifier.
References
1. Swartout, W., et al.: Toward the Holodeck: Integrating Graphics, Sound, Character and
Story. In: Proceedings of the 5th International Conference on Autonomous Agents (2001)
2. Charniak, E.: Statistical Parsing with a Context-free Grammar and Word Statistics.
In: Proceedings of AAAI-97, pp. 598-603 (1997)
3. Collins, M.: Three Generative, Lexicalised Models for Statistical Parsing. In:
Proceedings of the 35th ACL, pp. 16-23 (1997)
4. Miller, S., Bobrow, R., Ingria, R., Schwartz, R.: Hidden Understanding Models of
Natural Language. In: Proceedings of the ACL Meeting, pp. 25-32 (1994)
5. Schwartz, R., Miller, S., Stallard, D., Makhoul, J.: Language Understanding Using
Hidden Understanding Models. In: Proceedings of ICSLP'96, pp. 997-1000 (1996)
6. Macherey, K., Och, F.J., Ney, H.: Natural Language Understanding Using Statistical
Machine Translation. In: Proceedings of EUROSPEECH, Denmark, pp. 2205-2208 (2001)
7. Papineni, K.A., et al.: Feature-Based Language Understanding. In: Proceedings of
EuroSpeech'97, Greece, vol. 3, pp. 1435-1438 (1997)
8. Gorin, A.L., Riccardi, G., Wright, J.H.: How May I Help You? Speech Communication 23,
113-127 (1997)
9. Minker, W., Bennacef, S.K., Gauvain, J.L.: A Stochastic Case Frame Approach for
Natural Language Understanding. In: Proceedings of ICSLP, pp. 1013-1016 (1996)
10. Gildea, D., Jurafsky, D.: Automatic Labeling of Semantic Roles. Computational
Linguistics 28(3), 245-288 (2002)
11. Fleischman, M., Kwon, N., Hovy, E.: Maximum Entropy Models for FrameNet
Classification. In: Proceedings of EMNLP, Sapporo, Japan (2003)
12. Sampson, G.: Evolutionary Language Understanding. Cassell, New York/London (1996)
13. Brown, P.F., et al.: Class-Based n-gram Models of Natural Language. Computational
Linguistics 18(4), 467-479 (1992)