2 BASIS VECTORS AND BEHAVIORAL STATE
In quantum probability theory a vector space
(technically, a Hilbert space) represents all possible
outcomes for questions we could ask about a system.
A basis is a set of linearly independent vectors that,
in linear combination, can represent every vector in
the vector space. The basis vectors define the coordinate
system and correspond to elementary observations.
Put another way, the intersection of all subspaces
containing the basis vectors, that is, their linear span,
constitutes the vector space. A vector represents the
state of the system, given by the superposition of the
basis vectors according to their coefficients (Hughes,
1989; Isham, 1989). Historically, quantum
probability has been applied to physical systems but
the same analysis can refer to other types of systems,
including animals and software agents. Animals can be
viewed as behavior systems, that is, sets of
behaviors that are organized around biological
functions and goals, e.g., feeding (Timberlake and
Silva, 1995), defense (Fanselow, 1994), or sex
(Domjan, 1994). Software agents, on the other hand,
are formally defined as systems that (learn to) act in
virtual environments. Not surprisingly,
reinforcement learning in software agents has taken
concepts and methods from operant conditioning
theory. In turn, software learning agents
can be understood as computational models of
operant conditioning.
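The definition of a basis given at the start of this section (linearly independent vectors whose linear combinations reach every vector in the space) can be checked numerically. The sketch below is only an illustration: the vectors `e1`, `e2` and `v` are hypothetical values chosen for the example, and the pair is deliberately not orthogonal to show that the general definition does not require orthogonality.

```python
import numpy as np

# Two hypothetical vectors in the plane; any linearly independent pair works.
e1 = np.array([1.0, 0.0])
e2 = np.array([1.0, 1.0])
basis = np.column_stack([e1, e2])  # basis vectors as matrix columns

# Linear independence: the matrix of basis vectors has full column rank.
is_basis = np.linalg.matrix_rank(basis) == basis.shape[1]

# Any vector in the plane is a unique linear combination of e1 and e2.
v = np.array([3.0, 2.0])
coefficients = np.linalg.solve(basis, v)  # solves basis @ c = v
reconstructed = coefficients[0] * e1 + coefficients[1] * e2

print(is_basis, coefficients, reconstructed)
```

The coefficients returned by `np.linalg.solve` play the role of the coefficients of the superposition: they say how much of each basis vector is needed to build the given vector.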
We define two bases according to the
dichotomies reinforcement vs. punishment and
positive vs. negative in Fig. 1. The former, which we
call Frequency, takes values ranging from a
maximum number of responses per unit time
(Reinforcement) to the absence of response
(Punishment); the latter, which we call Applies, takes
values from “the response always applies the
outcome” (Positive) to “the response always
removes the outcome” (Negative). The values in
between indicate various response frequencies, that
is, probabilities that the animal responds, and
various probabilities that the outcome follows the
response, respectively.
The relation of the two bases is undetermined, in
the sense that even in the simplest reinforcement
schedules (fixed/variable ratio/interval schedules)
we cannot observe with certainty how the response
affects the outcome and how the outcome affects the
frequency of responding at the same time. This
uncertainty is aggravated in more complex
compound schedules.
The problem is thus how to determine the
behavioral state of an animal given this uncertainty.
Several models have been proposed to explain
patterns of operant behavior, some of which use
probabilities (see Staddon and Cerutti, 2003, for a
recent survey). We argue that the inherent
uncertainty in operant conditioning cannot be
represented using classical probability (Kolmogorov,
1933), and that we need quantum probability
instead.
The behavioral state of the animal is represented
using the state vector, a unit-length vector denoted
$|\Psi\rangle$ in bra-ket notation. We need to find out which
linear combination of the basis vectors results in a
given behavioral state and with which probability.
We start with a single question in Fig. 2, about
whether the response applies the outcome. In this
case $|\text{Positive}\rangle$ and $|\text{Negative}\rangle$ are the basis states, so
we can write $|\Psi\rangle = a|\text{Positive}\rangle + b|\text{Negative}\rangle$, where
$a$ and $b$ are amplitudes (coefficients) that reflect
the components of the state vector along the
different basis vectors. The answer to the question is
certain when the state vector $|\Psi\rangle$ exactly coincides
with one basis vector. For instance, if “the response
always applies the outcome”, then $|\Psi\rangle = |\text{Positive}\rangle$.
In that case the probability of Positive is 1. Since
the basis vectors are orthogonal, that is, since they
represent mutually exclusive answers, we know that
“the response removes the outcome” with probability 0,
corresponding to a zero projection onto the
subspace for Negative.
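The unit-length requirement on the state vector constrains the two amplitudes. The short derivation below makes this explicit. It assumes, beyond what is stated above, that the basis vectors are normalized as well as orthogonal, so that the cross terms vanish and each basis vector has inner product 1 with itself; the values 0.8 and 0.6 are purely illustrative.

```latex
\langle\Psi|\Psi\rangle
  = |a|^2 \,\langle\text{Positive}|\text{Positive}\rangle
  + |b|^2 \,\langle\text{Negative}|\text{Negative}\rangle
  = |a|^2 + |b|^2 = 1,
\qquad \text{e.g. } a = 0.8,\; b = 0.6:\quad 0.64 + 0.36 = 1 .
```

The certain case discussed above is the special case $a = 1$, $b = 0$.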
Figure 2: State space with the Applies subspace
(corresponding to the question whether response applies
outcome) and Positive-Negative basis vectors. The blue
vertical line represents the projection of $|\Psi\rangle$ on $|\text{Positive}\rangle$.
To determine the probability of Positive we use a
projector, $P_{\text{Positive}}$, which takes the vector $|\Psi\rangle$ and
lays it down on the subspace spanned by $|\text{Positive}\rangle$,
that is, $P_{\text{Positive}}|\Psi\rangle = a|\text{Positive}\rangle$. Then, the probability
that the response applies the outcome is equal to the
squared length of the projection, $\|P_{\text{Positive}}|\Psi\rangle\|^2$. The
same applies to the probability associated with
$b|\text{Negative}\rangle$.
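The projection rule just described can be sketched numerically. The following is a minimal illustration, not the authors' implementation: the amplitudes `a = 0.8` and `b = 0.6` are hypothetical values chosen only so that the state has unit length, and the helper names `projector` and `probability` are introduced here for the example.

```python
import numpy as np

# Orthonormal basis vectors for the Applies question.
positive = np.array([1.0, 0.0])
negative = np.array([0.0, 1.0])

# Hypothetical amplitudes, chosen only so that a**2 + b**2 = 1.
a, b = 0.8, 0.6
psi = a * positive + b * negative


def projector(basis_vector):
    """Projector |v><v| onto the ray spanned by a unit vector v."""
    return np.outer(basis_vector, basis_vector.conj())


def probability(proj, state):
    """Squared length of the projection of `state` under `proj`."""
    projected = proj @ state
    return float(np.vdot(projected, projected).real)


p_positive = probability(projector(positive), psi)
p_negative = probability(projector(negative), psi)

print(p_positive, p_negative, p_positive + p_negative)
```

With these amplitudes the two probabilities come out at approximately 0.64 and 0.36, that is, the squared amplitudes, and they sum to 1 because the state vector has unit length.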
Quantum Probability in Operant Conditioning - Behavioral Uncertainty in Reinforcement Learning