promising work extends this paradigm to interactive (i.e., multi-agent) settings (Gmytrasiewicz and Doshi, 2005; Doshi and Gmytrasiewicz, 2005). In these models, the recursive modeling of other agents’ beliefs and reasoning is made explicit in a framework called the I-POMDP, which extends classical POMDPs.
Computable (i.e., finitely nested) instances of these models simply stop the recursive reflection after a finite number of levels, allowing agents to make the best possible decision with the information they choose to represent. In essence, they are (expected-utility-maximizing) decision-theoretic models that represent and reason with finitely many levels of the interactive belief hierarchy and make no (common knowledge) assumptions regarding the levels that are not modeled.
This finite interactive decision-theoretic model is the assumed modeling framework throughout the current work, where we study a particular interactive sequential game, namely two-stage seller-offers bargaining under incomplete information (Samuelson, 1984). It is known that this game has a unique Perfect Bayesian Equilibrium (Sobel and Takahashi, 1983) if it is assumed that the seller’s belief about the buyer’s valuation is commonly known.
In this work, we do not assume that the seller’s (first-order) belief is commonly known. Instead, we cast the problem in the finite interactive decision-theoretic framework, within which we derive optimal strategies for three types of agents of successively increasing epistemological sophistication.
Our first main contribution is the observation of a systematic, regular (monotonic) relationship between the epistemology of the problem and the agents’ optimal behavior. Secondly, this regularity is exploited to devise a belief generation scheme that generates beliefs that are “evenly dispersed” across an entire space of beliefs, which is equivalent to sampling the higher-order belief at “evenly dispersed” locations. Thirdly, solutions to these sample beliefs are precomputed offline for later use in the online stage, which consists of a binary search through the space of solved beliefs to identify the closest sample belief, in order to approximate the opponent’s future behavior more accurately.
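The online lookup step can be illustrated with a minimal sketch. It assumes, purely for illustration, that each precomputed sample belief can be summarized by a single scalar key and that the keys are stored in increasing order; the names `closest_index`, `sample_keys` and `precomputed` are hypothetical and do not come from the paper.

```python
from bisect import bisect_left
from typing import Sequence


def closest_index(sorted_keys: Sequence[float], query: float) -> int:
    """Return the index of the key nearest to `query`.

    `sorted_keys` must be sorted in increasing order. Ties are
    resolved in favor of the smaller key.
    """
    if not sorted_keys:
        raise ValueError("no sample beliefs available")
    pos = bisect_left(sorted_keys, query)  # binary search, O(log n)
    if pos == 0:
        return 0
    if pos == len(sorted_keys):
        return len(sorted_keys) - 1
    before, after = sorted_keys[pos - 1], sorted_keys[pos]
    return pos - 1 if query - before <= after - query else pos


# Hypothetical offline table: evenly spaced scalar keys (an assumption)
# paired with placeholder solutions computed ahead of time.
sample_keys = [0.0, 0.25, 0.5, 0.75, 1.0]
precomputed = [f"solution_{i}" for i in range(len(sample_keys))]


def lookup(query: float) -> str:
    """Online stage: fetch the solution of the closest sample belief."""
    return precomputed[closest_index(sample_keys, query)]
```

Because the keys are sorted, each lookup costs a logarithmic number of comparisons in the number of sample beliefs, which is what makes precomputing a dense set of samples affordable in the online stage.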
In the next section, we describe the model(s) used throughout this paper and introduce the necessary notation. In the following three sections, we describe, respectively, the deliberative reasoning process for each of the three strategy levels. Our main contributions with respect to (higher-order) belief sampling, identification and updating are presented in the context of the discussion of the most sophisticated agent type studied in this paper, the L3-Buyer (Section 5). In the final section, we summarize our contributions, describe ongoing work and discuss relevant open questions.
2 PRELIMINARIES
Throughout this paper, it is assumed that the seller’s valuation is c = 0 and that the buyer’s valuation v satisfies 0 ≤ v ≤ 1. These are assumed to be commonly known; the exact value of v is the buyer’s private information. The seller’s belief about the buyer’s valuation has the distribution F(v) = v, 0 ≤ v ≤ 1. We also assume that trade is feasible, i.e., that c ≤ v. The mechanism is simple: the seller makes a first offer x_1, which the buyer may choose to accept; if the buyer rejects it, the seller makes a second and final offer x_2. The buyer’s strategic decision consists of choosing a decision boundary d(x_1): it accepts the first offer x_1 if v ≥ d(x_1).
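The acceptance rule and its consequence for the seller can be sketched as follows. Since the seller’s belief is F(v) = v on [0, 1], the probability that a buyer with boundary d accepts the first offer is 1 − F(d(x_1)) = 1 − d(x_1). The function names and the example boundary are illustrative assumptions, not taken from the paper.

```python
from typing import Callable

Boundary = Callable[[float], float]  # maps a first offer x_1 to d(x_1)


def accepts_first_offer(v: float, x1: float, boundary: Boundary) -> bool:
    """Buyer's rule from the text: accept x_1 if and only if v >= d(x_1)."""
    return v >= boundary(x1)


def first_offer_accept_probability(x1: float, boundary: Boundary) -> float:
    """Seller's probability that x_1 is accepted under F(v) = v on [0, 1].

    P(v >= d) = 1 - F(d) = 1 - d, with d clipped to the support [0, 1].
    """
    d = min(max(boundary(x1), 0.0), 1.0)
    return 1.0 - d


def example_boundary(x1: float) -> float:
    """Hypothetical boundary, chosen only to make the example concrete."""
    return min(1.0, 1.5 * x1)
```

For instance, with `example_boundary` and a first offer of 0.4, the boundary is 0.6, so a buyer with valuation 0.7 accepts while one with valuation 0.5 waits for the second offer.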
If agreement is reached on the first day, the payoffs are x_1 and v − x_1 to the seller and buyer, respectively. If agreement is reached on the second day, the payoffs are δ·x_2 and δ·(v − x_2), respectively, where δ is the discount factor applied to second-day payoffs. Otherwise, the payoff is 0 to each player.
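The payoff structure can be written out as a small function. This is a sketch under one added assumption, flagged in the code: in the final stage the buyer accepts the second offer whenever doing so yields a non-negative payoff (v ≥ x_2). The text above does not state the second-stage acceptance rule at this point, and the function name and signature are illustrative.

```python
from typing import Callable, Tuple


def payoffs(
    v: float,
    x1: float,
    x2: float,
    boundary: Callable[[float], float],
    delta: float,
) -> Tuple[float, float]:
    """Return (seller_payoff, buyer_payoff) for one play of the game.

    v        -- buyer's valuation in [0, 1]
    x1, x2   -- seller's first and second offers
    boundary -- buyer's decision boundary d(.) for the first offer
    delta    -- discount factor applied to second-day payoffs
    """
    if v >= boundary(x1):
        # Agreement on the first day: no discounting.
        return x1, v - x1
    if v >= x2:
        # ASSUMPTION: the buyer accepts the final offer if it is not a loss.
        return delta * x2, delta * (v - x2)
    # No agreement: both players receive 0.
    return 0.0, 0.0
```

The seller’s valuation c = 0 is built in, so the seller’s payoff is simply the (possibly discounted) price.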
Agents may form other relevant beliefs and higher-order beliefs; for example, the seller may form a belief about the buyer’s valuation, the buyer may form a second-order belief about the seller’s first-order belief about its (i.e., the buyer’s) valuation, and so on. None of these beliefs is assumed to be commonly known.
2.1 Notations
The following notation will be used throughout.
B_X(p). Belief maintained by agent X (either seller S or buyer B) about p, where p is the object of the agent’s belief and may be a ground proposition or another agent’s belief about something (this will be clear from the context).
U(s). A uniform belief supported over a space s, where s may be a (finite or countable) set of ground propositions or a set of beliefs.
E[v]. Expected value of random variable v.
p_Accept(x). The probability that offer x is accepted.
p_Reject(x). The probability that offer x is rejected (equal to 1 − p_Accept(x)).
π_1(·). The expected utility function for the entire (2-stage) sequential bargaining game. We denote argmax π_1(·) by Π_1(·).
π_2(·). The expected utility function for the last (i.e. second) stage of the bargaining game. We denote