minimize the destabilizing effect context has on concepts, such as that of Machery (2015), or others that assume a more intermediate position, such as that of Mazzone and Lalumera (2009), who, while acknowledging the fundamental role context may play in concepts, maintain that mental concepts also possess a characterizing stable nucleus.
On the other hand, cognitive neuroscience is now
starting to consider in a systematic way how context
interacts with neural responses (Stark et al., 2018).
The way context drives language comprehension depends on the effects of context on the conceptual scaffolding of the listener, which, in turn, results from the listener's neural responses in combination with that context.
The kind of ambiguity addressed in this paper is the canonical case of structural ambiguity, technically known as Prepositional Phrase Attachment, where a sentence includes a prepositional phrase that can be attached to more than one higher-level phrase (Hindle and Rooth, 1993). Attachment resolution is context dependent; we deal specifically with the case in which it depends on the visual context.
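The classic textbook instance is a sentence like "She saw the man with the telescope". As a minimal illustration (the sentence, the tree encoding, and the helper function are our own, not part of the model described here), the two readings can be encoded as parse trees and distinguished programmatically:

```python
# Two bracketings of the classic PP-attachment example, as nested tuples
# of the form (label, child, child, ...). Purely illustrative.
vp_attach = ("S", ("NP", "She"),
             ("VP", ("V", "saw"),
                    ("NP", "the man"),
                    ("PP", "with the telescope")))   # she uses the telescope

np_attach = ("S", ("NP", "She"),
             ("VP", ("V", "saw"),
                    ("NP", ("NP", "the man"),
                           ("PP", "with the telescope"))))  # the man has it

def pp_parent(tree):
    """Label of the constituent that directly dominates the PP."""
    label, *children = tree
    for child in children:
        if isinstance(child, tuple):
            if child[0] == "PP":
                return label
            found = pp_parent(child)
            if found is not None:
                return found
    return None

print(pp_parent(vp_attach))  # VP attachment: instrumental reading
print(pp_parent(np_attach))  # NP attachment: modifier reading
```

The VP attachment corresponds to the instrumental reading (seeing through the telescope), the NP attachment to the reading in which the man holds it; only context, here visual context, can tell the two apart.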
Specifically, given a sentence admitting two or more candidate interpretations, together with an image that depicts its content, the task is to choose the correct interpretation of the sentence based on the image. We thus address the problem of selecting the interpretation of an ambiguous sentence that matches the content of a given image.
This type of inference is frequently called for in
human communication occurring in a visual environ-
ment, and is crucial for language acquisition, when
much of the linguistic content refers to the visual sur-
roundings of the child (Bates et al., 1995; Bornstein and Cote, 2004).
This kind of task is also fundamental to the prob-
lem of grounding vision in language, by focusing on
phenomena of linguistic ambiguity, which are preva-
lent in language, but typically overlooked when using
language as a medium for expressing understanding
of visual content. Due to such ambiguities, a super-
ficially appropriate description of a visual scene may
in fact not be sufficient for demonstrating a correct
understanding of the relevant visual content.
From the neurocomputational point of view, our
model is based on Nengo (https://www.nengo.ai),
the implementation of Eliasmith’s Neural Engineer-
ing Framework (NEF) (Eliasmith, 2013). The basic
semantic component within NEF is the so-called Semantic Pointer Architecture (SPA) (Thagard, 2011), which determines how concepts are represented as dynamic neural assemblies. The model works by extracting the three entities involved from the input sentence and identifying their categories.
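The core SPA operation is binding high-dimensional vectors, the semantic pointers, by circular convolution. The following self-contained NumPy sketch illustrates the principle outside of Nengo; the dimensionality and the role and filler names are illustrative only, not those of our model:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 256  # dimensionality of the semantic pointers (illustrative choice)

def pointer(d, rng):
    """A random unit vector standing in for a semantic pointer."""
    v = rng.standard_normal(d)
    return v / np.linalg.norm(v)

def bind(a, b):
    """Circular convolution, SPA's binding operator, via the FFT."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))

def unbind(c, a):
    """Approximate unbinding: bind c with the involution of a."""
    return bind(c, np.concatenate(([a[0]], a[1:][::-1])))

# Toy vocabulary of random pointers; names are ours, purely for illustration.
DOG, CHASE, CAT, AGENT, VERB, THEME = (pointer(D, rng) for _ in range(6))

# A single pointer encoding "dog chases cat" as a sum of role-filler bindings.
s = bind(AGENT, DOG) + bind(VERB, CHASE) + bind(THEME, CAT)

# Unbinding the THEME role yields a noisy vector whose nearest
# vocabulary item is CAT.
est = unbind(s, THEME)
sims = {name: float(est @ v)
        for name, v in (("DOG", DOG), ("CAT", CAT), ("CHASE", CHASE))}
```

The whole structured representation lives in one fixed-size vector, which is what allows a neural population of fixed size to carry it.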
Early experimental results show that the proposed computational model disambiguates such sentences reliably.
1.1 A Framework for Neural Semantics
The two main requirements guiding our choice of a neural framework for this work are biological plausibility and the ability to model at a level abstract enough to deal with full images and with words in sentences. The two requirements clearly pull in opposite directions.
Today the legacy of connectionism has been taken
up by the family of algorithms collected under the
name deep learning. Unlike earlier artificial neural networks, deep learning models succeed in highly complex cognitive tasks, even reaching human-like performance in some visual tasks (VanRullen, 2017).
However, the level of biological plausibility of deep learning algorithms is in general even lower than that of connectionism: these models were developed with engineering goals in mind, and exploring cognition is not on the agenda of this research community (Plebe and Grasso, 2019). In our model we will also include
a very simple deep learning component, but only for
the low-level analysis of the images. This choice keeps the model simpler, by exploiting the ease with which deep learning models process visual stimuli. It would also have been easy to solve the crucial part of our problem, semantic disambiguation, with deep learning, but this would have had little value as a cognitive model.
Currently, the neural framework that can simu-
late the widest range of cognitive tasks, by adopting a
unified methodology with a reasonable degree of bi-
ological plausibility, is Nengo (Neural ENGineering
Objects) (Eliasmith, 2013). The idea behind Nengo dates back to 2003, with the original NEF (Neural Engineering Framework) (Eliasmith and Anderson, 2003), which defines a general methodology for the construction of large cognitive models, informed by a number of key neuroscientific concepts. In brief, the three main concepts are the following:
• The Representation Principle: neural representa-
tions are defined by the combination of nonlinear
encoding of spikes over a population of neurons,
and weighted decoding over the same populations
of neurons and over time;
• The Transformation Principle: transformations of
neural representations are functions of the vari-
ables represented by neural populations. Trans-
formations are determined using an alternately
weighted decoding;
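As a toy, rate-based illustration of these first two principles (not the actual spiking implementation in Nengo; the neuron model and all parameters below are our own simplifications), the decoders of a small population can be found by least squares, and an alternately weighted decoding of the same activities computes a nonlinear function of the represented variable:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 60                          # neurons in the population
xs = np.linspace(-1, 1, 200)    # values of the represented variable

# Representation: each neuron nonlinearly encodes x, here as a rectified
# linear rate with a random gain, bias, and preferred direction (+/-1).
enc = rng.choice([-1.0, 1.0], size=N)
gain = rng.uniform(0.5, 2.0, size=N)
bias = rng.uniform(-1.0, 1.0, size=N)
A = np.maximum(0.0, (gain * enc)[:, None] * xs + bias[:, None]).T  # (200, N)

# Decoding: least-squares weights recover x from the population activity.
d_x = np.linalg.lstsq(A, xs, rcond=None)[0]
err_x = np.sqrt(np.mean((A @ d_x - xs) ** 2))

# Transformation: an alternately weighted decoding of the SAME activities
# computes a function of x, here f(x) = x**2.
d_sq = np.linalg.lstsq(A, xs ** 2, rcond=None)[0]
err_sq = np.sqrt(np.mean((A @ d_sq - xs ** 2) ** 2))
```

Both decodings read out the identical population activity; only the decoding weights change, which is what makes transformations of representations cheap in the NEF.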
NCTA 2020 - 12th International Conference on Neural Computation Theory and Applications