6 Future Work
We have described a complex system that integrates visual context into language
processing. The system is able to differentiate between relevant and irrelevant context
items, choosing only those entities of the described visual scene that appear to be
important for a given sentence. In contrast to previous approaches, our system can
identify the applicable information in a diverse, dynamic context, revising its choice
whenever additional information is received and thereby guiding the focus of a model of
human visual attention.
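As a rough illustration of this selection process, the following sketch shows how the chosen referent can change as a sentence arrives word by word. This is not our actual implementation; the scene entities, their feature sets, and the simple overlap score are invented for the example.

```python
# Minimal sketch (not the implementation described above): the entity
# names, feature sets, and overlap score are invented for illustration.
SCENE = {
    "red_ball":  {"red", "ball", "round"},
    "blue_ball": {"blue", "ball", "round"},
    "red_box":   {"red", "box", "square"},
}

def select_referent(words_so_far, scene=SCENE):
    """Pick the scene entity whose features best match the words seen so far."""
    scores = {name: len(feats & set(words_so_far))
              for name, feats in scene.items()}
    best = max(scores, key=scores.get)          # ties: first entity wins
    return best if scores[best] > 0 else None   # no match -> no referent yet

sentence = ["the", "red", "ball"]
for i in range(1, len(sentence) + 1):
    prefix = sentence[:i]
    print(prefix, "->", select_referent(prefix))
# ['the'] -> None
# ['the', 'red'] -> red_ball (tied with red_box)
# ['the', 'red', 'ball'] -> red_ball
```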
So far, the integration of context information is restricted to descriptions of static scenes.
In the future, we will also investigate changing representations of context to model the
influence of a dynamic environment.
At the moment, the selected referents influence the saliency of a picture over time, but
once a scene of reference has been selected for part of a sentence, even major changes in
saliency due to low-level cues in the picture cannot call this choice into question.
To remove this limitation, a goal of future research will be to model the influence of
bottom-up attention on the choice of visual referents for linguistic input.
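The sketch below illustrates this limitation and one conceivable remedy. The entity identifiers, saliency values, and the margin-based override rule are hypothetical assumptions for the example, not part of the current system.

```python
# Hypothetical sketch of the limitation and a possible remedy. The entity
# ids, saliency values, and the margin-based override rule are assumptions,
# not part of the current system.
def attended_entity(committed, saliency, override_margin=None):
    """committed: referent chosen from linguistic input (or None).
    saliency: entity id -> current bottom-up saliency.
    override_margin: if set, a sufficiently more salient competitor
    may reopen the committed choice (the proposed future behaviour)."""
    most_salient = max(saliency, key=saliency.get)
    if committed is None:
        return most_salient                 # no linguistic commitment yet
    if override_margin is None:
        return committed                    # current behaviour: locked in
    if saliency[most_salient] - saliency[committed] > override_margin:
        return most_salient                 # bottom-up cue questions choice
    return committed

saliency = {"red_ball": 0.2, "blue_ball": 0.9}
print(attended_entity("red_ball", saliency))                       # red_ball
print(attended_entity("red_ball", saliency, override_margin=0.5))  # blue_ball
```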