Towards a System Architecture for Integrating Cross-modal Context in Syntactic Disambiguation

Patrick McCrae and Wolfgang Menzel

CINACS Graduate Research Group

Department of Informatics, Hamburg University

Vogt-Kölln-Straße 30, 22527 Hamburg, Germany

Abstract. Most natural language utterances are inherently ambiguous, which results in semantic underspecification. Yet, despite the omnipresence of ambiguity, human communication succeeds in most cases and even displays remarkable robustness, in marked contrast to the majority of natural language applications today. The reason is that, in processing an ambiguous utterance, humans also integrate information from sources other than the utterance itself and thus have access to additional knowledge that enriches the semantic specification guiding disambiguation. One important source of such additional semantic knowledge in humans is sensory input from cross-modal perception.

In this paper we describe a system architecture, motivated by effects observed in human sentence processing, that makes it possible to study the integration of cross-modal context knowledge in syntactic parsing. We hypothesise that integrating cross-modal context into syntactic constraint dependency parsing will significantly and substantially improve the accuracy of structural ambiguity resolution.

1 Introduction

Considering the complexity of factors contributing to successful natural language interaction, human language processing is surprisingly robust against ambiguous or ungrammatical input; in fact, it is much more robust than the majority of present-day natural language processing (NLP) systems. An important reason for this robustness in human sentence processing is access to information sources other than the linguistic material. In computing an utterance’s overall meaning, humans do not analyse the linguistic material in isolation but also incorporate additional linguistic and extra-linguistic information to establish reference and resolve various types of ambiguity. Typical sources of additional knowledge include cross-modal sensory information [1], discourse history, context [2] and common sense or world knowledge [3]. All of these help constrain the utterance context and thus facilitate the computation of the most plausible sentence meaning given the overall context.

McCrae P. and Menzel W. (2007). Towards a System Architecture for Integrating Cross-modal Context in Syntactic Disambiguation. In Proceedings of the 4th International Workshop on Natural Language Processing and Cognitive Science, pages 228-237. DOI: 10.5220/0002430602280237. © SciTePress

One extensively studied example of syntactic ambiguity is prepositional phrase (PP) attachment. It is of particular interest because in many cases disambiguation cannot be performed on linguistic grounds alone. Consider the classic example (S1).

(S1) The man saw the woman with a telescope.

(S1) has at least two readings, depending on which constituent the PP with a telescope is taken to attach to. In the first reading, with a telescope modifies the act of seeing and functions as the thematic INSTRUMENT of seeing. In the second reading, the PP with a telescope modifies the woman and thus functions as the thematic COMITATIVE, expressing the concept of companionship.

Observe that, from a purely linguistic point of view, both readings and their structural interpretations are equally valid. A conclusive attachment preference requires additional, extra-linguistic information, which other sources such as context may provide.

A key source of contextual information in human communication is cross-modal sensory perception [4], [5]. Given such context information, e. g. by additionally seeing an image of a man looking through the telescope, we have sufficient extra information to favour one reading over the other and hence arrive at different dependency structure representations in (D1): attachment PP1 for the thematic role INSTRUMENT and attachment PP2 for the thematic role COMITATIVE.

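(D1) [Figure not reproduced: the two dependency structures for (S1), attachment PP1 of the PP to the verb saw (INSTRUMENT) and attachment PP2 to the noun woman (COMITATIVE).]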
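To make the contrast in (D1) concrete, the sketch below encodes the two readings of (S1) as dependency trees, mapping each token position to a (head, label) pair. It is only an illustration: the head-table encoding and the labels (DET, SUBJ, OBJ, PP, PN) are assumptions and do not reproduce CDG’s actual output format or its German label set.

```python
# Illustrative encoding of the two readings of (S1); not CDG's output format.
# Token positions are 1-based; head 0 stands for the virtual root.
TOKENS = ["The", "man", "saw", "the", "woman", "with", "a", "telescope"]

# PP1 (INSTRUMENT): "with" depends on the verb "saw".
PP1 = {1: (2, "DET"), 2: (3, "SUBJ"), 3: (0, "ROOT"), 4: (5, "DET"),
       5: (3, "OBJ"), 6: (3, "PP"), 7: (8, "DET"), 8: (6, "PN")}

# PP2 (COMITATIVE): identical except that "with" depends on the noun "woman".
PP2 = {**PP1, 6: (5, "PP")}


def head_of(tree, position):
    """Return the head word of the token at a 1-based position."""
    head, _label = tree[position]
    return "ROOT" if head == 0 else TOKENS[head - 1]


def differing_edges(a, b):
    """Positions whose (head, label) pair differs between two analyses."""
    return sorted(p for p in a if a[p] != b[p])


print(head_of(PP1, 6), head_of(PP2, 6))  # saw woman
print(differing_edges(PP1, PP2))         # [6]
```

The two analyses differ in a single edge, which is why PP attachment can be framed as choosing one head for the preposition.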

While a large number of NLP techniques for handling PP attachment exist, the majority of these approaches rely on lexical or syntactic properties of the input sentence or on statistical distribution patterns of the preposition relative to its attachment constituent. All of these approaches are based on properties of the sentence material alone; none of them incorporates extra-sentential context information, although utterance context is known to direct semantic and structural interpretation in human sentence processing [2]. If we wish to enable an NLP system to make informed, context-dynamic decisions on structurally ambiguous utterances, we need to provide the basis for such decisions by making suitable representations of context information available for integration into the syntactic decision process.

Starting out from CDG, a weighted-constraint dependency syntax parser for German, our goal is to develop an architecture capable of integrating cross-modal context in syntactic parsing. The architectural integration of cross-modal context simulates the effects of the cognitive processes during human sentence processing. We hypothesise that integrating cross-modal context will significantly and substantially improve the accuracy of structural ambiguity resolution.

The structure of this paper is as follows: the Literature Review section outlines current approaches to PP attachment, one notoriously hard-to-handle type of structural ambiguity, and provides an overview of the NLP challenges in assigning thematic roles. From these we derive general requirements for our Context Integration Architecture. In Section 3 we present the architecture, first conceptually and then with an implementation focus. In the Future Work section we outline which cognitive and linguistic phenomena we intend to study.

2 Literature Review

Most heuristics for syntactic ambiguity resolution in broad-coverage NLP applications today are based on the syntactic or syntacto-lexical properties of the sentence material rather than on contextual semantics. These heuristics derive their answer to a syntactic problem from a primarily syntax-oriented procedure. Given that accurate thematic role assignment is equivalent to correct syntactic disambiguation, a viable alternative is to achieve syntactic disambiguation via semantic analysis. We will now illustrate the limitations of primarily syntactic pathways to syntactic disambiguation, using PP attachment as an example. Note, however, that the architecture proposed subsequently is not limited to the syntactic phenomenon of PP attachment.

2.1 Current NLP Approaches to Structural Disambiguation: PP Attachment

Some of the central techniques for PP attachment disambiguation today include syntactic approaches such as Right Association/Late Closure and Minimal Attachment, and syntacto-lexical approaches such as statistical methods and case frames.

With reported attachment accuracies of 92% for German [6] and 94.9% [7] to 95.77% [8] for English, statistical approaches to PP attachment based on machine learning are the most successful and most frequently used ones in broad-coverage parsers today. Statistical methods make predictions for PP attachment based on patterns extracted from large text corpora. These approaches favour the attachment that has been found most frequently in the corpus. Note, however, that statistical approaches rely on corpus-wide averages and are therefore blind to context as well: their decision is static rather than contextually dynamic.

Case frames provide a more semantically inclined approach: they base PP attachment decisions on a formalisation of the verb’s syntactic and semantic attachment requirements anchored in the lexicon. With case frames, the attachment decision is based on the verb’s syntactic argument selection criteria. These may be extended to include semantic constraints, which additionally permit an evaluation of the semantic fit of the constituents in question. This model is supported by experimental evidence from priming, which indicates that syntactic schemas are activated during sentence processing [9]. To our knowledge, however, present-day case-frame implementations do not include extra-sentential context in the assessment of the arguments’ semantic fit. We therefore highlight the need to extend the case frame model to include cross-modal context as a key component for the assessment of semantic fit in thematic role assignment.


2.2 Challenges in Assigning Thematic Roles

Since a thematic role describes a relation of a noun phrase (NP) to a verb phrase (VP), approaches to thematic role assignment can, in principle, be based on the properties of the NP, the VP or both. As a first step towards a semantically enhanced case-frame approach to thematic role assignment, it therefore seems promising to examine the ontological properties of the PP’s NP to assess whether the PP can actually fill a specific thematic role, say INSTRUMENT, for a given verb. (S4.1) and (S4.2) illustrate the challenge in this task: the same NP can be a good INSTRUMENT for one sense of the polysemous verb (S4.1) but not for another (S4.2).

(S4.1) He cut the apple off [with a knife].
PP: INSTRUMENT; Context: Picking an apple.

(S4.2) ?? He cut her off [with a knife].
PP: INSTRUMENT; Context: A conversation.

This raises the question of whether we can use semantic features of the NP in question to assess its suitability for filling a given thematic role slot. In the literature, attempts have been made to define feature-based thematic role hierarchies as a basis for deciding on an NP’s suitability for a given thematic role slot [10], [11]. The most prominent, and also the most frequently investigated, NP feature is animacy [12], whose importance in human sentence processing has also been shown with neurophysiological measures such as event-related potentials (ERPs) [13].

To our knowledge, however, none of the noun-feature-based categorisation approaches to thematic role assignment achieves coverage over a broader range of verbs, let alone generality. We interpret this lack of generalisability of feature requirements on thematic role slots as a clear manifestation of the thematic role filler’s semantic dependence upon verbal argument structure. By assigning the totality of all thematic selection criteria to the verb, Dowty [14] has dispelled all hopes of constructing a generalised, i. e. verb-independent, noun-feature-based thematic role hierarchy.

Dowty’s view that verb meaning is dominant is supported by evidence from priming experiments suggesting that the thematic roles AGENT, PATIENT and INSTRUMENT are intrinsically integrated into the situation schema referenced by the verb [15]. Ferretti et al. found that verbs prime nouns which are commonly perceived as good fillers for the aforementioned thematic roles. They conclude that thematic role knowledge is closely and inseparably coupled to verb meaning.

In our view, this experimental evidence motivates modelling thematic role assignment in our architecture in close proximity to verb meaning. We also acknowledge the need to step beyond intrasentential semantic relations in thematic role assignment and to include additional semantic knowledge from utterance context.
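The verb-centred view can be pictured with sense-specific case frames, in which each verb sense lists the semantic features its role fillers must carry. The minimal sketch below applies this idea to (S4.1) and (S4.2); the sense labels, feature names and dictionary layout are hypothetical and are not taken from CDG’s lexicon.

```python
# Hypothetical sense-specific case frames: each (verb, sense) pair maps a
# thematic role to the semantic features its filler must carry.
CASE_FRAMES = {
    ("cut off", "sever"): {
        "AGENT": {"animate"},
        "PATIENT": {"physical_object"},
        "INSTRUMENT": {"cutting_tool"},
    },
    ("cut off", "interrupt"): {   # this sense offers no INSTRUMENT slot
        "AGENT": {"animate"},
        "PATIENT": {"animate"},
    },
}

# Hypothetical ontological features of candidate filler nouns.
NOUN_FEATURES = {
    "knife": {"physical_object", "cutting_tool"},
    "apple": {"physical_object"},
}


def fits(verb, sense, role, noun):
    """True if `noun` carries every feature that `role` requires for this sense."""
    frame = CASE_FRAMES.get((verb, sense), {})
    if role not in frame:
        return False  # the sense has no slot for this role at all
    return frame[role] <= NOUN_FEATURES.get(noun, set())  # subset test


print(fits("cut off", "sever", "INSTRUMENT", "knife"))      # True  (S4.1)
print(fits("cut off", "interrupt", "INSTRUMENT", "knife"))  # False (S4.2)
```

The noun’s features stay the same in both calls; only the verb sense changes the verdict. Choosing between the two senses is exactly where the contexts Picking an apple and A conversation would have to come in.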

Note that we have excluded here the effect upon thematic role assignment that arises from the semantic interactions between the preposition and the NP within the PP, which jointly determine the PP’s overall semantic content.


2.3 The Extent to which Context Influences Thematic Role Assignment

It is not a new claim that context and world knowledge are required for thematic role assignment. As early as 1968, McCawley [16] argued that selectional restrictions must make reference to world knowledge and that some of this knowledge will be verb-specific. McCawley’s selectional restrictions effectively are the verb’s perspective on thematic role assignment. Crain and Steedman [2] also argue for a turn away from structural towards semantic and context-based approaches to the resolution of local ambiguity in natural language. Their position arises from experimental evidence demonstrating that thematic role assignment can be context-directed for English reduced relatives [2]. Their findings are further supported by evidence from other techniques such as eye-tracking experiments [12].

While the importance of context in thematic role assignment is unchallenged, the question remains just how strong its contribution to ambiguity resolution is. While contextual information can in some cases be the key to fully disambiguating an utterance, in other cases it may lead the hearer astray. Trueswell et al. [12] argue for taking into account both the strength and the local relevance of contextual constraints. Hence, context information is preference-directing in character. It should therefore not be modelled as a rigid grammatical rule but rather as a preference indicator that dynamically adjusts to contextual influences. Our architecture should permit exactly this: the modelling of preference gradients ranging from hard (must be met) to soft (can be but need not be met).

3 Proposed Architecture for Cross-Modal Context Integration

3.1 Conceptual View

Our architectural approach centres around CDG, a weighted-constraint dependency syntax parser downloadable from http://nats-www.informatik.uni-hamburg.de. Based on a full-form lexicon and a weighted constraint grammar, CDG produces labelled dependency structures analogous to those in (D1) in a three-step process:

1. The sentence is submitted to a range of external components, such as a part-of-speech tagger and a chunk tagger, with a request for feedback.

2. When feedback from the external components has been received, CDG applies the unary constraints of the grammar. Every constraint violation incurs a penalty. The more important (‘the harder’) the constraint, the more severe its violation penalty.

3. Once compliance with the unary constraints has been checked, CDG applies the grammar’s binary constraints and searches for the dependency structure with the lowest overall penalty. The overall penalty score of a structure is the product of all penalty scores incurred during steps 2 and 3. CDG then outputs the best-ranked dependency parse.
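The scoring in steps 2 and 3 can be illustrated with a small sketch. It assumes the convention of weighted constraint dependency grammar in which each constraint carries a weight between 0.0 and 1.0, a weight of 0.0 marks a hard constraint, and the score of an analysis is the product of the weights of all constraints it violates. A smaller product then means a heavier overall penalty, so the least penalised structure is the one with the highest product. The constraint names, weights and violation lists are hypothetical, and the exhaustive comparison stands in for CDG’s actual search.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    name: str
    weight: float  # assumed convention: 0.0 = hard (must be met), near 1.0 = soft
    arity: int     # 1 = unary (checked in step 2), 2 = binary (checked in step 3)


def score(violated):
    """Multiply the weights of all violated constraints.

    No violations give 1; violating any hard constraint (weight 0.0) gives 0.0,
    which rules the analysis out.
    """
    return math.prod(c.weight for c in violated)


def best_parse(candidates):
    """Return the name of the least penalised analysis (highest product)."""
    return max(candidates, key=lambda name: score(candidates[name]))


# Hypothetical constraints and violation lists for candidate analyses of (S1).
verb_pp = Constraint("penalise PP attached to verb", 0.9, 1)
noun_pp = Constraint("penalise PP attached to noun", 0.7, 1)
agreement = Constraint("subject-verb agreement", 0.0, 2)

candidates = {
    "PP1": [verb_pp],                    # score 0.9
    "PP2": [noun_pp],                    # score 0.7
    "PP2-broken": [noun_pp, agreement],  # score 0.0, excluded
}
print(best_parse(candidates))  # PP1
```

Because the weights are fixed in the grammar, the winner here does not change with the situation the sentence describes; only an additional, context-dependent input could shift it.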


To improve structural disambiguation for PP attachment, CDG currently integrates an external component, the PP Attacher, with the following properties:

a. The PP Attacher is integrated in step 1 above. Note that at that stage CDG has not yet performed any syntactic parsing and hence has no higher syntactic information available. The PP Attacher can therefore only process its input at token level.

b. Based on attachment observations from large text corpora, the PP Attacher assigns scores for the attachment of a preposition to other words in the same sentence.

c. Scores are provided for attachment to all verbs in the sentence as well as for attachment to all nouns to the left of the preposition.

d. For large texts, the PP Attacher attains an accuracy of 92% [6].
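Properties b. and c. can be mimicked in a few lines. In this sketch the score table is keyed only on the preposition and the candidate head word, and its values are invented for illustration rather than taken from [6]; the tag names and the `context` parameter are likewise hypothetical. The point is structural: whatever context is passed in, the returned scores are the same.

```python
# Hypothetical corpus-derived scores keyed on (preposition, head word) only;
# the numbers are made up for illustration.
ATTACH_SCORES = {("with", "saw"): 0.6, ("with", "man"): 0.3, ("with", "woman"): 0.4}


def candidate_heads(tags, prep_index):
    """Property c.: every verb, plus every noun to the left of the preposition."""
    return [i for i, tag in enumerate(tags)
            if tag == "VERB" or (tag == "NOUN" and i < prep_index)]


def pp_attacher(tokens, tags, prep_index, context=None):
    """Score each candidate head; `context` is accepted but never consulted."""
    prep = tokens[prep_index]
    return {tokens[i]: ATTACH_SCORES.get((prep, tokens[i]), 0.0)
            for i in candidate_heads(tags, prep_index)}


tokens = ["the", "man", "saw", "the", "woman", "with", "a", "telescope"]
tags = ["DET", "NOUN", "VERB", "DET", "NOUN", "PREP", "DET", "NOUN"]
print(pp_attacher(tokens, tags, 5, context="man looks through telescope"))
print(pp_attacher(tokens, tags, 5, context="woman carries telescope"))
# both print {'man': 0.3, 'saw': 0.6, 'woman': 0.4}
```

Note also that the PP kernel noun telescope never enters the lookup, mirroring the observation that attachment probabilities are based on the preposition alone.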

The PP Attacher’s properties above have the following implications. Property a. implies that the current PP Attacher component is ignorant of sentence structure as well as of supra-sentential context. Property b. implies that the scores returned by the predictor are constant for a given pair of preposition and attachment word. This means we can expect exactly one static prediction for the disambiguation of (S1)-like sentences, irrespective of the context in which they appear. This is a direct consequence of the fact that attachment probabilities are based on the preposition alone; the PP kernel noun’s contribution to the overall PP meaning is presently ignored. Property c. implies that the PP Attacher evaluates attachment at word level and not at phrase level. While d. attests high disambiguation accuracy, properties a., b. and c. indicate that the PP Attacher will not perform as well in dynamically changing contexts.

To enable our architecture to simulate cross-modal context integration as observed in human sentence processing, we propose the following enhancements:

1. Enable semantic analysis capabilities in the parser; specifically, enable the parser to assign thematic roles correctly.

2. Have a plausibility assessment guide thematic role assignment. Perform the plausibility assessment based on cross-modal context information managed in a separate Context Model.

3. Add suitable constraints to the grammar which direct attachment preferences based on the context-sensitively assigned thematic roles. Define these constraints such that parsing accuracy outside of the modelled context domain remains uncompromised.

For a full account of the PP Attacher see [6].

Note, however, that the PP Attacher can be combined with other external components capable of making predictions on higher syntactic properties such as phrase structure. The PP Attacher uses the input of those components to gain access to higher syntactic information which CDG cannot provide at the early stage in the parsing process when the PP Attacher is called.

As for the representation of context information, we believe that the most suitable representation is a machine-readable context ontology of cross-modal propositional knowledge. For the modelled domain, it should contain a representation of all relevant objects and their syntactically significant relationships to each other.
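For illustration, a context ontology of propositional knowledge can be approximated by a set of subject-relation-object facts. The minimal sketch below assumes such a triple store with a transitive is_a relation; the relation names and the scene are hypothetical, and the Context Model proper is an ontology created with an ontology editor, as described in Section 3.2.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    subject: str
    relation: str
    obj: str


# Hypothetical scene: the man is looking through the telescope.
CONTEXT = {
    Fact("man", "looks_through", "telescope"),
    Fact("woman", "stands_near", "tree"),
    Fact("telescope", "is_a", "optical_instrument"),
    Fact("optical_instrument", "is_a", "instrument"),
}


def is_a(model, thing, category):
    """Follow is_a links transitively (assumes the hierarchy has no cycles)."""
    if thing == category:
        return True
    return any(is_a(model, f.obj, category)
               for f in model if f.subject == thing and f.relation == "is_a")


def related(model, subject, obj):
    """All relations that hold from `subject` to `obj` in the scene."""
    return {f.relation for f in model if f.subject == subject and f.obj == obj}


print(is_a(CONTEXT, "telescope", "instrument"))  # True
print(related(CONTEXT, "man", "telescope"))      # {'looks_through'}
```

Only objects and relations relevant to the ambiguity need to be present, in line with the claim that the representation need not be exhaustive.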

We maintain that the domain-specific representation of extra-linguistic knowledge need not be exhaustive as long as it is sufficient to support structural disambiguation. Further, the nature and origin of the context knowledge are arbitrary. This means that, in principle, the Context Model can function as a unifying interface for a large variety of contextual data. In robotics, for example, the Context Model can be fed with propositional knowledge derived from cross-modally integrated sensor data such as video or audio data. Effectively, this permits the integration of propositional knowledge from low-level sensory data into the syntax-semantic interface.

It is also worth mentioning that in this architectural approach we assume that contextual information is indeed congruent with utterance meaning. Strictly speaking, context information is just one cognitively very plausible and frequently utilised guiding principle for directing structural ambiguity resolution. It is, however, possible to imagine a scenario in which context and utterance meaning diverge, i. e. a scenario in which context suggests one meaning while the speaker actually intended another. Speakers may purposely employ such semantic contrast to achieve special communicative effects, e. g. in humorous, ironic or metaphorical speech. The proposed architecture is designed to detect such cases of divergence between utterance and context semantics through violations of the corresponding constraints.

3.2 Implementation View

To realise the conceptual approach presented in the preceding subsection, several changes to CDG’s existing standard implementation for German need to be made. We now outline these changes in detail.

The existing standard lexicon needs to be extended to include thematic role selection criteria for all verbs within the domain modelling scope. The existing standard grammar needs to be extended to perform thematic role assignment. Thematic roles will be assigned based on a combination of the verb’s lexical properties and plausibility scores obtained from a Plausibility Predictor Component (PPC). The PPC manages the communication between the parser and the Reasoning Component (RC), which accesses and queries the Context Model (CM). Queries are formulated by the PPC based on the input sentence material and the lexical information it receives from CDG. The Context Model is created manually using an ontology editor. The aforementioned components are integrated into the Context Integration Architecture (CIA) shown in Fig. 1.

For the sake of argument, we omit here the discussion of the additional levels of complexity introduced by the need to perform suitable sensor data fusion prior to integration into the Context Model. Our assumption is that context information in the Context Model has already been fused across modalities.


[Figure: the CDG Parser loads the Extended German Standard Lexicon, the Extended German Standard Grammar and a sentence containing structural ambiguity. It sends the sentence plus lexical info to the Plausibility Predictor Component (PPC) and requests plausibility scores; the PPC sends a query to the Reasoning Component (RC); the RC accesses context information in the Context Model and returns context knowledge; the PPC returns plausibility scores to CDG.]

Fig. 1. The Context Integration Architecture (CIA).

Context integration proceeds along the following sequence of steps:

1. The extended German Standard Grammar and Lexicon as well as the input sentence containing structural ambiguity are loaded into CDG.

2. CDG submits the ambiguous sentence plus additional static information from the Lexicon to the PPC and requests plausibility scores for structural disambiguation.

3. Based on the input received from CDG, the PPC formulates a query and submits it to the RC.

4. The RC accesses context information in the Context Model, which contains a representation of the sentence context under consideration. The RC also performs reasoning on the context information to produce context knowledge.

5. The RC returns its context knowledge to the PPC.

6. Based on the context knowledge received from the RC, the PPC assigns plausibility scores and returns them to CDG. The CDG-internal syntax parsing process now starts. The grammar’s integration constraints consider any thematic role assignment preferences arising from the plausibility scores received.
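Steps 2 to 6 can be traced with a toy end-to-end sketch for (S1). All class names, the evidence mapping from scene relations to thematic roles, the score values and the fall-back rule are hypothetical; the sketch only shows how context knowledge could turn into plausibility scores that tip an otherwise static attachment decision, not how CDG’s integration constraints are actually defined.

```python
from dataclasses import dataclass

# Hypothetical mapping from scene relations to the thematic role they support.
EVIDENCE = {
    "INSTRUMENT": {"looks_through", "uses"},  # the agent operates the PP noun
    "COMITATIVE": {"carries", "holds"},       # the patient is accompanied by it
}


@dataclass
class ContextModel:
    """CM: a set of (subject, relation, object) facts describing the scene."""
    facts: set


class ReasoningComponent:
    """RC: answers queries against the Context Model (steps 4 and 5)."""

    def __init__(self, cm):
        self.cm = cm

    def relations(self, subject, obj):
        return {r for s, r, o in self.cm.facts if s == subject and o == obj}


class PlausibilityPredictor:
    """PPC: formulates queries and turns context knowledge into scores (steps 3, 6)."""

    def __init__(self, rc, neutral=0.5):
        self.rc = rc
        self.neutral = neutral  # score used when the context is silent

    def scores(self, agent, patient, pp_noun):
        result = {}
        for role, owner in (("INSTRUMENT", agent), ("COMITATIVE", patient)):
            supported = self.rc.relations(owner, pp_noun) & EVIDENCE[role]
            result[role] = 1.0 if supported else self.neutral
        return result


def choose_attachment(plausibility, static_preference="PP1"):
    """Stand-in for an integration constraint: follow the context when it
    discriminates, otherwise keep the context-free preference."""
    if plausibility["INSTRUMENT"] > plausibility["COMITATIVE"]:
        return "PP1"  # attach the PP to the verb
    if plausibility["COMITATIVE"] > plausibility["INSTRUMENT"]:
        return "PP2"  # attach the PP to the noun
    return static_preference


def resolve(cm):
    """Steps 2 to 6 for (S1): agent 'man', patient 'woman', PP noun 'telescope'."""
    ppc = PlausibilityPredictor(ReasoningComponent(cm))
    return choose_attachment(ppc.scores("man", "woman", "telescope"))


print(resolve(ContextModel({("man", "looks_through", "telescope")})))  # PP1
print(resolve(ContextModel({("woman", "carries", "telescope")})))      # PP2
print(resolve(ContextModel(set())))                                    # PP1 (fall-back)
```

The fall-back to the static preference mirrors enhancement 3 in Section 3.1: outside the modelled context domain, the decision is left as it would have been without context.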

4 Conclusion

In this paper we have outlined a system architecture motivated by context integration effects during human sentence processing. Our architecture couples a reasoning component operating on a context ontology with a semantically enabled syntax parser. Contextual influences are integrated via context-dependent plausibility assessments. Since the Context Model is accessed before a sentence is parsed, the syntactic structure obtained can respond dynamically to changing context representations. Our architecture therefore permits the simulation of cross-modal context integration in human sentence processing.

We have also hypothesised that integrating cross-modal context into syntactic constraint dependency parsing will significantly and substantially improve the accuracy of structural ambiguity resolution.


5 Future Work

In this paper we have focused on applying the proposed architecture to PP attachment. We will also study parsing accuracy for other types of German syntactic ambiguity, such as subject-object ambiguity or genitive-dative ambiguity in feminine singular nouns.

Another interesting area of study opened by the proposed architecture is constraint relaxation. Here, the effect of systematic modifications to the grammar’s constraint penalty scores upon overall parsing accuracy will be investigated.

We will also use the proposed architecture to model more complex phenomena in human sentence processing such as cross-modal compensation, i. e. the reliance on contextual information to achieve improved robustness to ungrammatical or incomplete input.

By extending the mechanism for populating the Context Model from manual to automated, processing can extend to continuous data streams, which would permit an expansion from sentence-by-sentence operation to continuous operation. Application domains in which the Context Model is filled with a continuous flow of propositional knowledge obtained from different cross-modal sources (e. g. from a robot’s camera and microphone) are of particular interest. By looping contextual representations based on parsing results back into the Context Model, the architecture may be extended to build up a discourse history.

References

1. Knoeferle, Pia S. (2005). The Role of Visual Scenes in Spoken Language Comprehension: Evidence from Eye-Tracking (PhD Thesis). Saarbrücken: Universität des Saarlandes.

2. Crain, Stephen & Steedman, Mark (1985). On not Being Led up the Garden Path: the Use of Context by the Psychological Syntax Parser. In D. Dowty, L. Karttunen and A. Zwicky (Eds.), Natural language parsing: Psychological, computational, and theoretical perspectives (pp. 320–358). Cambridge University Press.

3. Lieberman, Henry, Faaborg, Alexander, Daher, Waseem, & Espinosa, José (2005). How to Wreck a Nice Beach You Sing Calm Incense. International Conference on Intelligent User Interfaces, IUI 2005, January 9–12, 2005.

4. Watanabe, Katsumi (2001). Cross-modal Interactions in Humans (PhD Thesis). Pasadena, California: California Institute of Technology.

5. Tanenhaus, Michael, Spivey-Knowlton, Michael J., Eberhard, Kathleen M. et al. (1995). Integration of visual and linguistic information in spoken language comprehension. Science (Volume 268), 16 June 1995, pp. 1632–1634.

6. Foth, Kilian & Menzel, Wolfgang (2006). The Benefit of Stochastic PP Attachment to a Rule-Based Parser. In Proceedings of the 21st International Conference on Computational Linguistics. Sydney: Coling-ACL-2006.

7. Lüdtke, Dirk & Sato, Satoshi (2003). Fast Base NP Chunking with Decision Trees: Experiments on Different POS Tag Settings. In A. Gelbukh (Ed.), Computational linguistics and intelligent text processing, Springer LNCS, 2003, pp. 136–147.

8. Kudo, Taku & Matsumoto, Yuji (2000). Use of Support Vector Learning for Chunk Identification. In Proceedings of CoNLL-2000 and LLL-2000. Lisbon, Portugal.

9. Auble, Pamela & Franks, Jeffery J. (1983). Sentence comprehension processes. Journal of Verbal Learning and Verbal Behavior (22), pp. 395–405.

10. Grimshaw, Jane B. (1990). Argument structure. Cambridge MA: MIT Press.

11. Simpson, Jane (1991). Warlpiri morpho-syntax: A lexicalist approach. Dordrecht: Kluwer Academic Publishers.

12. Trueswell, John C., Tanenhaus, Michael K., & Garnsey, Susan M. (1994). Semantic Influences on Parsing: Use of Thematic Role Information in Syntactic Ambiguity Resolution. Journal of Memory and Language (33), pp. 285–318.

13. Kuperberg, Gina R., Kreher, Donna A., Sitnikova, Tatiana, Caplan, David N., & Holcomb, Phillip J. (2006). The Role of Animacy and Thematic Relationships in Processing Active English Sentences: Evidence from Event-Related Potentials. In Brain and Language, in press.

14. Dowty, David (1989). On the Semantic Content of the Notion of ‘Thematic Role’. In Gennaro Chierchia, Barbara H. Partee and Raymond Turner (Eds.), Properties, types and meaning (Volume II: Semantic issues). Dordrecht: Kluwer Academic Publishers, pp. 69–130.

15. Ferretti, Todd R., McRae, Ken & Hatherell, Andrea (2001). Integrating Verbs, Situation Schemas, and Thematic Role Concepts. Journal of Memory and Language (33), pp. 516–547.

16. McCawley, J. D. (1968). The Role of Semantics in Grammar. In E. Bach & R. T. Harms (Eds.), Universals in linguistic theory. New York: Holt, Rinehart, & Winston.