A FOUNDATION FOR INFORMED NEGOTIATION
John Debenham and Simeon Simoff
University of Technology, Sydney
PO Box 123, Broadway, NSW 2007, Australia
Keywords:
Intelligent Agents, Decision Support, Agents for Internet Computing.
Abstract:
Approaches to the construction of agents that are to engage in competitive negotiation are often founded
on game theory. In such an approach the agents are endowed with utility functions and assumed to be utility
optimisers. In practice the utility function is derived in the context of massive uncertainties both in terms of the
agent’s priorities and of the raw data or information. To address this issue we propose an agent architecture that
is founded on information theory, and that manages uncertainty with entropy-based inference. Our negotiating
agent engages in multi-issue bilateral negotiation in a dynamic information-rich environment. The agent
strives to make informed decisions. The agent may assume that the integrity of some of its information decays
with time, and that a negotiation may break down under certain conditions. The agent makes no assumptions
about the internals of its opponent — it focuses only on the signals that it receives. It constructs two probability
distributions over the set of all deals: first, the probability that its opponent will accept a deal, and second, the probability that a deal will, in time, prove to have been acceptable to the agent itself.
1 INTRODUCTION
We propose an agent architecture that is founded on
information theory, and that manages uncertainty with
entropy-based inference. This architecture aims to ad-
dress the problem of defining a utility function for a
game-theoretic agent in the context of massive uncer-
tainties both in terms of the agent’s priorities and of
the raw data or information. Our agent does not necessarily have a utility function and so is not necessarily a utility optimiser — it may have less selfish aims such as simply 'playing fair'. This Negotiating
Agent, NA, engages in bilateral bargaining with an op-
ponent, OP. It strives to make informed decisions in
an information-rich environment that includes market
data and general sources including the Internet. NA
attempts to fuse the negotiation with the information
generated both by and because of it. It reacts to in-
formation derived from its opponent and from the en-
vironment, and proactively seeks missing information
that may be of value.
NA is founded on information theory. Game the-
ory tells us what to do, and what outcome to expect,
in many well-known negotiation situations, but these
strategies and expectations are derived from assump-
tions about the internals of the opponent. Game theo-
retic analyses of bargaining are founded on the notion
of agents as utility optimizers in the presence of com-
plete and incomplete information about their oppo-
nents (Muthoo, 1999). Two probability distributions
form the foundation of both the offer evaluation and
the offer making processes. They are both over the set
of all deals and are based on all information available
to the agent. The first distribution is the probability
that any deal is acceptable to OP. The second distribution is the probability that any deal will prove to be acceptable to NA — this distribution generalizes the notion of utility.
NA makes no assumptions about the internals of OP, in particular whether it has a utility function. NA
does make assumptions about: the way in which the
integrity of information will decay, preferences that
its opponent may have for some deals over others,
and conditions that may lead to breakdown. It also
assumes that unknown probabilities can be inferred
using maximum entropy probabilistic logic (MacKay,
2003) that is based on random worlds (Halpern,
2003). The maximum entropy probability distribu-
tion is “the least biased estimate possible on the given
information; i.e. it is maximally noncommittal with
regard to missing information” (Jaynes, 1957). In the
absence of knowledge about OP's decision-making
apparatus, NA assumes that the “maximally noncom-
mittal” model is the correct model on which to base
its reasoning.
A preference relation is an assumption that NA
makes about OP's preferences for some deals over
others. For example, that she prefers to pay a lower
price to a higher price. A single-issue preference rela-
tion assumes that she prefers deals on the basis of one
issue alone, independent of the values of the other is-
sues. A preference relation may be assumed prior to
the negotiation, or during it based on the offers made.
For example, the opponent may display a preference
for items of a certain color; (Faratin et al., 2003) de-
scribes a basis for ordering colors. The preference re-
lations illustrated here are single-issue orderings, but
the agent’s reasoning operates equally well with any
preference relation as long as it may be expressed in
Horn clause logic.
Bilateral bargaining is known to be inherently in-
efficient (Myerson and Satterthwaite, 1983). (Bulow
and Klemperer, 1996) shows that a seller is better off
with an auction that attracts n+1 buyers than bargain-
ing with n individuals, no matter what the bargain-
ing protocol is. (Neeman and Vulkan, 2000) shows that the weaker bargaining types will fare better in exchanges, leading to a gradual migration towards exchanges. These results hold for agents who aim to optimize their utility, and so they do limit the scope of the work described here.
2 INFORMED AGENTS
NA operates in an information-rich environment. The
integrity of its information, including information ex-
tracted from the Internet, will decay in time. The way
in which this decay occurs will depend on the type of
information, and on the source from which it is drawn.
Little appears to be known about how the integrity of
information, such as news-feeds, decays.
One source of NA's information is the signals received from OP. These include offers to NA, and the acceptance or rejection of NA's offers. If OP rejected NA's offer of $8 two days ago, then what is NA's belief now in the proposition that OP will accept another offer of $8? Perhaps it is around 0.1. A linear model is used to model the integrity decay of these beliefs, and when the probability of a decaying belief approaches 0.5¹ the belief is discarded. This choice of a linear model is independent of the bargaining method; the model of decay could equally be exponential, quadratic or whatever.

¹ A sentence probability of 0.5 represents "maybe, maybe not".
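As a concrete illustration, a minimal sketch of such a decay rule is given below (Python; the decay rate, starting probabilities and discard band are invented for illustration and are not taken from the paper).

```python
# A minimal sketch of linear integrity decay towards 0.5, assuming a fixed
# decay rate per time step; the names and constants are illustrative only.
DECAY_PER_STEP = 0.05     # assumed linear decay rate per time step
DISCARD_BAND = 0.05       # discard a belief once it is this close to 0.5

def decayed_probability(p0: float, steps: int, rate: float = DECAY_PER_STEP) -> float:
    """Move a sentence probability linearly towards 0.5, one step at a time."""
    if p0 >= 0.5:
        return max(0.5, p0 - rate * steps)
    return min(0.5, p0 + rate * steps)

def still_useful(p: float, band: float = DISCARD_BAND) -> bool:
    """A belief whose probability has drifted into the 'maybe, maybe not'
    band around 0.5 carries almost no information and is discarded."""
    return abs(p - 0.5) > band

# Example: OP rejected an offer of $8 two days ago. A belief that started
# at 0.0 when the rejection arrived has decayed for two steps towards 0.5.
p = decayed_probability(0.0, steps=2)   # about 0.1
print(p, still_useful(p))
```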
2.1 Interaction Protocol
The agents communicate using sentences in a first-
order language L. This includes the exchange, ac-
ceptance and rejection of offers. L contains the fol-
lowing predicates: Offer(δ), Accept(δ), Reject(δ) and
Quit(.), where Offer(δ) means “the sender is offering
you a deal δ”, Accept(δ) means “the sender accepts
your deal δ”, Reject(δ) means “the sender rejects your
deal δ” and Quit(.) means “the sender quits — the ne-
gotiation ends”.
Two negotiation protocols are described. First, negotiation without decay, in which all offers stand for the entire negotiation. Second, negotiation with decay, in which offers stand only if accepted by return — NA represents OP's offers as beliefs with sentence probabilities that decay in time.
NA and OP each exchange offers alternately at suc-
cessive discrete times (Kraus, 2001). They enter into
a commitment if one of them accepts a standing offer.
The protocol has three stages:
1. Simultaneous, initial, binding offers from both
agents;
2. A sequence of alternating offers, and
3. An agent quits and walks away from the negotia-
tion.
The negotiation ceases either in the second stage, if one of the agents accepts a standing offer, or in the third stage, if one agent quits and the negotiation breaks down.
In the first stage the agents simultaneously send Of-
fer(.) messages to each other. These initial offers are
taken as limits on the range of values that are consid-
ered possible. This is crucial to the method described
in Sec. 3 where there are domains that would other-
wise be unbounded. The exchange of initial offers "stakes out the turf" on which the subsequent negotiation will take place. In the second stage an Offer(.) message is interpreted as an implicit rejection, Reject(.), of the opponent's offer on the table.
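For concreteness, the following sketch shows one way the utterances of L and the three protocol stages could be represented; the class and field names are assumptions made for illustration only, not the authors' implementation.

```python
# A minimal sketch of the negotiation utterances and protocol stages.
# The deal representation and all names are illustrative assumptions.
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Tuple

Deal = Tuple[str, str]          # (tau, omega): sender's terms, receiver's terms

class Utterance(Enum):
    OFFER = auto()              # Offer(delta): "the sender is offering you a deal delta"
    ACCEPT = auto()             # Accept(delta): "the sender accepts your deal delta"
    REJECT = auto()             # Reject(delta): "the sender rejects your deal delta"
    QUIT = auto()               # Quit(.): "the sender quits -- the negotiation ends"

@dataclass
class Message:
    utterance: Utterance
    deal: Optional[Deal]        # None for Quit(.)
    timestamp: float            # messages are time-stamped on arrival

# Stage 1: simultaneous, binding initial offers bound the deal space.
# Stage 2: alternating Offer(.) messages; an Offer is read as an implicit
#          Reject of the offer currently on the table.
# Stage 3: an Accept(.) closes the deal, or a Quit(.) breaks the negotiation down.
```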
2.2 Agent Architecture
Incoming messages from all sources are time-stamped and placed in an "In Box", X, as they arrive. NA has a knowledge base K and a belief set B. Each of these two sets contains statements in L. K contains statements that are generally true, such as ∀x(Accept(x) ↔ ¬Reject(x)) — i.e. an agent does one thing or the other. The belief set B = {β_i} contains statements that are each qualified with a given sentence probability, B(β_i), that represents the agent's belief in the truth of the statement. These sentence probabilities may decay in time.
The distinction between the knowledge base K and
the belief set B is simply that K contains unqualified
statements and B contains statements that are quali-
fied with sentence probabilities. K and B play differ-
ent roles in the method described in Sec. 3.
NA's actions are determined by its "strategy". A strategy is a function S : K × B → A, where A is the set of actions. At certain distinct times the function S is applied to K and B and the agent does something. The set of actions, A, includes sending Offer(.), Accept(.), Reject(.) and Quit(.) messages to OP. The way in which S works is described in Sec. 5. Momentarily before the S function is activated, a "revision function" R is activated:

R : (K × B) → (K × B)

R clears the "In Box", and stores the messages either in B with a given sentence probability or in K.
A deal, δ, is a commitment for the sender to do something, τ (the sender's "terms"), subject to the receiver committing to do something, ω (the receiver's "terms"): δ = (τ, ω). NA may have a real-valued utility function U : T → ℝ, where T is the set of terms. If so, then for any deal δ = (τ, ω) the expression U(ω) − U(τ) is called the surplus of δ. An agent may be unable to specify a utility function either precisely or with certainty.²
Sec. 4 describes a predicate
NAAcc(.) that represents the “acceptability” of a deal.
NA uses three things to make offers: an estimate of
the likelihood that OP will accept any offer [Sec. 3],
an estimate of the likelihood that NA will, in hind-
sight, feel comfortable accepting any particular offer
[Sec. 4], and an estimate of when OP may quit and
leave the negotiation [Sec. 5].
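A minimal sketch of this revise-then-act cycle, assuming the In Box is a simple message list and that R and S are supplied as callables (all names are illustrative):

```python
# A minimal sketch of the agent cycle: the revision function R empties the
# "In Box" into K (unqualified) or B (with a sentence probability), then the
# strategy S maps (K, B) to an action. All names are illustrative.
from typing import Callable, List, Tuple

Action = str                                   # e.g. "Offer", "Accept", "Reject", "Quit"

def agent_step(in_box: List[dict],
               K: set, B: dict,
               R: Callable[[List[dict], set, dict], Tuple[set, dict]],
               S: Callable[[set, dict], Action]) -> Action:
    """One activation of the agent: revise, then act."""
    K, B = R(in_box, K, B)     # R clears the In Box into K or B
    in_box.clear()
    return S(K, B)             # S : K x B -> A chooses the next action
```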
2.3 Random Worlds
Let G be the set of all positive ground literals that can be constructed using the predicate, function and constant symbols in L. A possible world is a valuation function v : G → {⊤, ⊥}. V denotes the set of all possible worlds, and V_K denotes the set of possible worlds that are consistent with a knowledge base K (Halpern, 2003).
A random world for K is a probability distribution W_K = {p_i} over V_K = {v_i}, where W_K expresses an agent's degree of belief that each of the possible worlds is the actual world. The derived sentence probability of any σ ∈ L, with respect to a random world W_K, is:

P_{W_K}(σ) ≜ Σ_n { p_n : σ is true in v_n }    (1)
A random world W_K is consistent with the agent's beliefs B if (∀β ∈ B)(B(β) = P_{W_K}(β)). That is, for each belief, its derived sentence probability as calculated using Eqn. 1 is equal to its given sentence probability.

² The often-quoted oxymoron "I paid too much for it, but it's worth it", attributed to the movie producer Samuel Goldwyn, illustrates that intelligent agents may choose to negotiate with uncertain utility.
The entropy of a discrete random variable X with probability mass function {p_i} is (MacKay, 2003): H(X) = −Σ_n p_n log p_n, where p_n ≥ 0 and Σ_n p_n = 1. Let W_{K,B} be the "maximum entropy probability distribution over V_K that is consistent with B". Given an agent with K and B, its derived sentence probability for any sentence σ ∈ L is:

(∀σ ∈ L)  P(σ) ≜ P_{W_{K,B}}(σ)    (2)
Using Eqn. 2, the derived sentence probability for any belief β_i is equal to its given sentence probability, so the term sentence probability is used without ambiguity.
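For a small language, the random-worlds machinery can be sketched directly: enumerate the valuations consistent with K, pick the distribution over those worlds that maximizes entropy subject to the constraints contributed by B, and read derived sentence probabilities off with Eqn. 1 and Eqn. 2. The code below is an illustrative reconstruction (using scipy for the constrained optimization), not the authors' implementation.

```python
# A sketch of maximum-entropy inference over random worlds (illustrative).
# Worlds are assignments of truth values to the ground literals in G;
# K_consistent filters the worlds consistent with K; beliefs is a list of
# (sentence, given probability) pairs, each sentence a predicate over worlds.
from itertools import product
import numpy as np
from scipy.optimize import minimize

def max_entropy_worlds(literals, K_consistent, beliefs):
    worlds = [w for w in product([True, False], repeat=len(literals))
              if K_consistent(dict(zip(literals, w)))]
    m = len(worlds)

    def neg_entropy(p):
        p = np.clip(p, 1e-12, 1.0)
        return float(np.sum(p * np.log(p)))

    cons = [{"type": "eq", "fun": lambda p: np.sum(p) - 1.0}]
    for sentence, prob in beliefs:                 # each belief is a linear constraint (Eqn. 1)
        mask = np.array([1.0 if sentence(dict(zip(literals, w))) else 0.0
                         for w in worlds])
        cons.append({"type": "eq",
                     "fun": lambda p, mask=mask, prob=prob: float(mask @ p) - prob})

    res = minimize(neg_entropy, np.full(m, 1.0 / m),
                   bounds=[(0.0, 1.0)] * m, constraints=cons)
    p = res.x

    def derived(sentence):                         # Eqn. 2: P(sigma) under W_{K,B}
        return float(sum(pi for pi, w in zip(p, worlds)
                         if sentence(dict(zip(literals, w)))))
    return derived
```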
3 ESTIMATING P(OPAcc(.))
NA does two different things. First, it reacts to offers received from OP — this is described in Sec. 4. Second, it sends offers to OP. This section describes the estimation of P(OPAcc(δ)), where the predicate OPAcc(δ) means "the deal δ is acceptable to OP". When a negotiation commences NA may have no information about OP or about prior deals. If so, then the initial offers may only be based on past experience or circumstantial information.³ So the opening offers are simply taken as given.
In the four sub-sections following, NA is attempting to sell something to OP. In Secs. 3.1 and 3.2 NA's terms τ are to supply a particular good, and OP's terms ω are money — in those examples the amount of money ω is the subject of the negotiation. In Secs. 3.3 and 3.4 NA's terms are to supply a particular good together with some negotiated warranty period, and OP's terms are money — in those examples the amount of money p and the length of the warranty period w are the subject of the negotiation.
3.1 One Issue Without Decay
The unary predicate OPAcc(x) means "the amount of money $x is acceptable to OP". NA is interested in whether the unary predicate OPAcc(x) is true for various values of $x. NA assumes the following preference relation on the OPAcc predicate:

κ_1 : ∀x, y ((x > y) → (OPAcc(x) → OPAcc(y)))
³ In rather dire circumstances King Richard III of England is reported to have initiated a negotiation with remarkably high stakes: "A horse! a horse! my kingdom for a horse!" [William Shakespeare]. Fortunately for Richard, a person named Catesby was nearby, and advised Richard to retract this rash offer ("Withdraw, my lord"), and so Richard's intention to honor his commitments was not put to the test.
Suppose that NA's opening offer is ω̄, and OP's opening offer is ω̲, where ω̲ < ω̄. Then K now contains two further sentences: κ_2 : ¬OPAcc(ω̄) and κ_3 : OPAcc(ω̲). There are now ω̄ − ω̲ possible worlds, and the maximum entropy distribution is uniform.

Suppose that NA knows its true valuation for the good, u_na, and that NA has decided to make an "expected-utility-optimizing" offer: x = (ω̄ + u_na)/2. This offer is calculated on the basis of the preference ordering κ_1 and the two signals that NA has received from OP. The response is in terms of only NA's valuation u_na and the signal Reject(ω̄) — it is independent of the signal Offer(ω̲), which implies that ω̲ is acceptable.
In the standard game theoretic analysis of bargaining (Muthoo, 1999), NA assumes that OP has a utility, u_op, that it lies in some interval [u̲, ū], and that the expected value of u_op is uniformly distributed on that interval. On the basis of these assumptions NA then derives the expected-utility-optimizing offer (ū + u_na)/2. These two offers differ only in the term ū in the game-theoretic result and ω̄ in the maximum entropy result. The game theoretic approach relies on estimates for u̲ and ū:

E([u̲, ū] | Reject(ω̄) ∧ Accept(ω̲))
If OP has a utility — and it may not — then if OP is rational: u̲ ≤ ω̲ ≤ ū. The inherent inefficiency of bilateral bargaining (Myerson and Satterthwaite, 1983) shows for an economically rational OP that u_op, and so consequently ū, may be greater than ω̄. There is no reason to suspect that ū and ω̄ will be equal.
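To make the single-issue case concrete, the sketch below recovers the "expected-utility-optimizing" offer numerically: under the uniform maximum entropy distribution P(OPAcc(x)) falls linearly from 1 at OP's opening offer to 0 at NA's opening offer, and maximizing (x − u_na) × P(OPAcc(x)) over integer prices lands at roughly (ω̄ + u_na)/2. The opening offers and valuation are invented numbers.

```python
# A sketch: seller NA's expected-surplus-optimizing offer under the uniform
# maximum entropy model. Opening offers and valuation are invented numbers.
omega_low, omega_high = 10, 30      # OP's and NA's opening offers
u_na = 16                           # NA's true valuation of the good

def p_opacc(x: int) -> float:
    """Linear acceptance probability implied by the uniform distribution
    over the omega_high - omega_low monotone possible worlds."""
    return (omega_high - x) / (omega_high - omega_low)

best = max(range(omega_low, omega_high + 1),
           key=lambda x: (x - u_na) * p_opacc(x))
print(best, (omega_high + u_na) / 2)   # both are 23, i.e. (omega_high + u_na) / 2
```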
3.2 One Issue With Decay
As in the previous example, suppose that the opening offers at time t_0 are taken as given and are ω̄ and ω̲. Then K contains κ_1, κ_2 and κ_3. Suppose L contains n consecutive, integer constants in the interval [ω̲, ω̄], where n = ω̄ − ω̲ + 1, that represent various amounts of money. κ_1 induces a total ordering on the sentence probabilities for OPAcc(x) on the interval [ω̲, ω̄], where the probabilities are 0 at ω̄ and 1 at ω̲.
Suppose that at time t_1 NA makes an offer ω_na which is rejected by OP, who has replied at time t_2 with an offer of ω_op, where ω̲ ≤ ω_op ≤ ω_na ≤ ω̄. At time t_3, B contains β_1 : OPAcc(ω_na) and β_2 : OPAcc(ω_op). Suppose that there is some level of integrity decay on these two beliefs: 0 < B(β_1) < 0.5 < B(β_2) < 1. Then V_K contains n + 1 possible worlds ranging from "all false" to "all true", each containing n literals. So a random world for K will consist of n + 1 probabilities {p_i}, where, say, p_1 is the probability of "all true", and p_{n+1} is the probability of "all false". W_{K,B} will be the distribution that maximizes −Σ_n p_n log p_n subject to the constraints: p_n ≥ 0, Σ_n p_n = 1, Σ_{n=1..ω̄−ω_na+1} p_n = B(β_1) and Σ_{n=1..ω̄−ω_op+1} p_n = B(β_2).
The optimization of entropy, H, subject to linear constraints is described in Sec. 3.2.1 below. W_{K,B} is:

p_n = B(β_1) / (ω̄ − ω_na + 1)              if 1 ≤ n ≤ ω̄ − ω_na + 1
p_n = (B(β_2) − B(β_1)) / (ω_na − ω_op)     if ω̄ − ω_na + 1 < n < ω̄ − ω_op + 2
p_n = (1 − B(β_2)) / (ω_op − ω̲ + 1)         if ω̄ − ω_op + 2 ≤ n ≤ ω̄ − ω̲ + 2
Using Eqn. 2, for ω_op ≤ x ≤ ω_na:

P(OPAcc(x)) = B(β_1) + ((ω_na − x) / (ω_na − ω_op)) × (B(β_2) − B(β_1))    (3)

These probability estimates are used in Sec. 5 to calculate NA's next offer.
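Eqn. 3 is straightforward to implement; the sketch below interpolates between the two decayed sentence probabilities over the interval bounded by the most recent pair of offers (the numbers in the example are invented).

```python
# A sketch of Eqn. 3: P(OPAcc(x)) for omega_op <= x <= omega_na, interpolating
# between the decayed sentence probabilities of the two most recent signals.
def p_opacc(x: float, omega_na: float, omega_op: float,
            b1: float, b2: float) -> float:
    """b1 = B(beta_1) for OPAcc(omega_na) (NA's rejected offer),
    b2 = B(beta_2) for OPAcc(omega_op) (OP's standing offer)."""
    assert omega_op <= x <= omega_na
    return b1 + (omega_na - x) / (omega_na - omega_op) * (b2 - b1)

# Example with invented values: offers $20 and $12, beliefs 0.2 and 0.9.
print(p_opacc(16, omega_na=20, omega_op=12, b1=0.2, b2=0.9))  # 0.55
```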
The values for P(OPAcc(x)) in the region ω_op ≤ x ≤ ω_na are derived from only two pieces of information: the two signals Reject(ω_na) and Offer(ω_op), each qualified with the time at which it arrived and the decay rate on its integrity. The assumptions in the analysis given above are: the choice of values for ω̲ and ω̄ — which do not appear in Eqn. 3 in any case — and the choice of the "maximally non-committal" distribution.
If the agents continue to exchange offers then new beliefs will be acquired and the integrity of old beliefs will decay. If the next pair of offers lies within the interval [ω_op, ω_na], and if the integrity of β_1 and β_2 decays, then the sentence probabilities of β_1 and β_2 will be inconsistent with those of the two new beliefs, due to the total ordering of sentence probabilities on [ω̲, ω̄] induced by κ_1. This inconsistency is resolved by the revision function R, which here discards the inconsistent older beliefs, β_1 and β_2, in favor of the more recent beliefs. If the agents continue in this way then the sentence probabilities for the OPAcc predicate are given simply by Eqn. 3 using the most recent values for ω_na and ω_op.
The analysis given above requires that values be specified for the opening offers ω̲ and ω̄. The only part of the probability distribution that depends on the values chosen for ω̲ and ω̄ is the two "tails" of the distribution, so the choice of values for these two opening offers is unlikely to affect the estimates. The two tails are necessary to "soak up" the otherwise unallocated probability.
3.2.1 Maximizing Entropy
If X is a discrete random variable taking a finite number of possible values {x_i} with probabilities {p_i} then the entropy is the average uncertainty removed by discovering the true value of X, and is given by H = −Σ_n p_n log p_n. The direct optimization of
H subject to a number, θ, of linear constraints of the form Σ_n p_n g_k(x_n) = ḡ_k for given constants ḡ_k, where k = 1, . . . , θ, is a difficult problem. Fortunately this problem has the same unique solution as the maximum likelihood problem for the Gibbs distribution (Pietra et al., 1997). The solution to both problems is given by:

p_n = exp(Σ_{k=1..θ} λ_k g_k(x_n)) / Σ_m exp(Σ_{k=1..θ} λ_k g_k(x_m))    (4)
for n = 1, 2, · · ·, where the constants {λ_i} may be calculated using Eqn. 4 together with the three sets of constraints: p_n ≥ 0, Σ_n p_n = 1 and Σ_n p_n g_k(x_n) = ḡ_k. The distribution in Eqn. 4 is known as the Gibbs distribution.
Calculating the expressions for the values of {p_n} given in the example above in Sec. 3.2 does not require the full evaluation of the expressions in Eqn. 4. That equation shows that there are just three different values among the {p_n}; applying simple algebra to that fact together with the constraints yields the expressions given.
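One standard way to compute the constants {λ_k} of Eqn. 4 is to minimize the convex dual, the log-partition function minus the constraint targets, whose gradient vanishes exactly when the linear constraints are satisfied. The sketch below does this with scipy and is an illustrative reconstruction rather than the authors' solver.

```python
# A sketch: fit the Gibbs distribution of Eqn. 4 by minimizing the convex
# dual log Z(lambda) - lambda . g_bar, whose gradient vanishes exactly when
# the linear constraints sum_n p_n g_k(x_n) = g_bar_k are satisfied.
import numpy as np
from scipy.optimize import minimize

def gibbs_fit(G: np.ndarray, g_bar: np.ndarray) -> np.ndarray:
    """G[k, n] = g_k(x_n); g_bar[k] = target expectation. Returns {p_n}."""
    def dual(lam):
        scores = lam @ G                       # sum_k lambda_k g_k(x_n)
        return float(np.log(np.sum(np.exp(scores))) - lam @ g_bar)
    lam = minimize(dual, np.zeros(len(g_bar))).x
    w = np.exp(lam @ G)
    return w / w.sum()                         # Eqn. 4

# Example with invented numbers: 5 outcomes, one constraint E[g] = 0.3,
# where g is an indicator of the first two outcomes.
G = np.array([[1.0, 1.0, 0.0, 0.0, 0.0]])
p = gibbs_fit(G, np.array([0.3]))
print(p)   # first two probabilities sum to ~0.3, remaining mass is uniform
```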
3.3 Two Issues Without Decay
The above approach to single-issue bargaining generalizes without modification to multi-issue bargaining; it is illustrated here with two issues only for ease of presentation. The problem considered is the sale of an item with 0, . . . , 4 years of warranty. The terms being negotiated specify an amount of money p and the number of years of warranty w. The predicate OPAcc(w, p) now means "OP will accept the offer to purchase the good with w years warranty for $p".
NA assumes the following two preference orderings, and K contains:

κ_11 : ∀x, y, z ((x > y) → (OPAcc(y, z) → OPAcc(x, z)))
κ_12 : ∀x, y, z ((x > y) → (OPAcc(z, x) → OPAcc(z, y)))

As in Sec. 3.1, these sentences conveniently reduce the number of possible worlds. The number of possible worlds will be finite as long as K contains two statements of the form ¬OPAcc(4, a) and OPAcc(0, b) for some a and b. Suppose that NA's initial offer was "4 years warranty for $21" and OP's initial offer was "no warranty for $10". K now contains:

κ_13 : ¬OPAcc(4, 21)        κ_14 : OPAcc(0, 10)

These two statements, together with the restriction to integers only, limit the possible values of w and p in OPAcc(w, p) to a 5 × 10 matrix.
Suppose that NA knows its utility function for the good with 0, . . . , 4 years warranty, and that its values are $11.00, $11.50, $12.00, $13.00 and $14.50 respectively. Suppose that NA uses the strategy S(n) which is described in Sec. 5 — the details of that strategy are not important now. If NA uses that strategy with n = 2, then NA offers Offer(2, $16), which suppose OP rejects and counters with Offer(1, $11). Then with n = 2 again, NA offers Offer(2, $14), which suppose OP rejects and counters with Offer(3, $13). P(OPAcc(w, p)) now is:
        w = 0    w = 1    w = 2    w = 3    w = 4
p = 20  0.0000   0.0000   0.0000   0.0455   0.0909
p = 19  0.0000   0.0000   0.0000   0.0909   0.1818
p = 18  0.0000   0.0000   0.0000   0.1364   0.2727
p = 17  0.0000   0.0000   0.0000   0.1818   0.3636
p = 16  0.0000   0.0000   0.0000   0.2273   0.4545
p = 15  0.0000   0.0000   0.0000   0.2727   0.5454
p = 14  0.0000   0.0000   0.0000   0.3182   0.6364
p = 13  0.0455   0.0909   0.1364   1.0000   1.0000
p = 12  0.0909   0.1818   0.2727   1.0000   1.0000
p = 11  0.1364   1.0000   1.0000   1.0000   1.0000
and the expected-utility-optimizing offer is Offer(4, $18). If NA makes that offer then the expected surplus is $0.95. The matrix above contains the "maximally non-committal" values for P(OPAcc(w, p)); those values are recalculated each time a signal arrives. The example demonstrates how NA is able to conduct multi-issue bargaining in a focussed way without making assumptions about OP's internals, in particular whether OP is aware of a utility function (Osborne and Rubinstein, 1990).
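The reported offer can be checked mechanically by scanning the matrix for the deal that maximizes (p − U(w)) × P(OPAcc(w, p)); the sketch below uses the utilities and probabilities given above and recovers Offer(4, $18) with an expected surplus of about $0.95.

```python
# A sketch: pick the expected-surplus-optimizing offer from the Sec. 3.3
# matrix. Utilities and probabilities are the values given in the text.
utility = {0: 11.00, 1: 11.50, 2: 12.00, 3: 13.00, 4: 14.50}   # U(w years warranty)

p_opacc = {  # P(OPAcc(w, p)) for w = 0..4, rows p = 20 down to 11
    20: [0.0000, 0.0000, 0.0000, 0.0455, 0.0909],
    19: [0.0000, 0.0000, 0.0000, 0.0909, 0.1818],
    18: [0.0000, 0.0000, 0.0000, 0.1364, 0.2727],
    17: [0.0000, 0.0000, 0.0000, 0.1818, 0.3636],
    16: [0.0000, 0.0000, 0.0000, 0.2273, 0.4545],
    15: [0.0000, 0.0000, 0.0000, 0.2727, 0.5454],
    14: [0.0000, 0.0000, 0.0000, 0.3182, 0.6364],
    13: [0.0455, 0.0909, 0.1364, 1.0000, 1.0000],
    12: [0.0909, 0.1818, 0.2727, 1.0000, 1.0000],
    11: [0.1364, 1.0000, 1.0000, 1.0000, 1.0000],
}

best = max(((w, p) for p in p_opacc for w in range(5)),
           key=lambda wp: (wp[1] - utility[wp[0]]) * p_opacc[wp[1]][wp[0]])
w, p = best
print(best, round((p - utility[w]) * p_opacc[p][w], 2))   # (4, 18) 0.95
```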
3.4 Two Issues With Decay
Following from the previous section, suppose that K contains κ_11, κ_12, κ_13 and κ_14. The two preference orderings κ_11 and κ_12 induce a partial ordering on the sentence probabilities in the P(OPAcc(w, p)) array [as in Sec. 3.3], from the top-left where the probabilities are 0, to the bottom-right where the probabilities are 1. There are fifty-one possible worlds that are consistent with K.
Suppose that B contains: β_11 : OPAcc(2, 16), β_12 : OPAcc(2, 14), β_13 : OPAcc(1, 11) and β_14 : OPAcc(3, 13) — this is the same offer sequence as considered in Sec. 3.3 — and, with a 10% decay in integrity for each time step: P(β_11) = 0.4, P(β_12) = 0.2, P(β_13) = 0.7 and P(β_14) = 0.9. Belief β_11 is inconsistent with K ∪ {β_12} as together they violate the sentence probability ordering induced by κ_11 and κ_12. Resolving this issue is a job for the belief revision function R, which discards the older, and weaker, belief β_11.
Eqn. 4 is used to calculate the distribution W_{K,B}, which has just five different probabilities in it. The resulting values for the three λ's are: λ_12 = 2.8063, λ_13 = 2.0573 and λ_14 = 2.5763.
P(OPAcc(w, p)) now is:
        w = 0    w = 1    w = 2    w = 3    w = 4
p = 20  0.0134   0.0269   0.0286   0.0570   0.0591
p = 19  0.0269   0.0537   0.0571   0.1139   0.1183
p = 18  0.0403   0.0806   0.0857   0.1709   0.1774
p = 17  0.0537   0.1074   0.1143   0.2279   0.2365
p = 16  0.0671   0.1343   0.1429   0.2849   0.2957
p = 15  0.0806   0.1611   0.1714   0.3418   0.3548
p = 14  0.0940   0.1880   0.2000   0.3988   0.4139
p = 13  0.3162   0.6324   0.6728   0.9000   0.9173
p = 12  0.3331   0.6662   0.7088   0.9381   0.9576
p = 11  0.3500   0.7000   0.7447   0.9762   0.9978
In this array, the derived sentence probabilities for the three sentences in B — 0.2000 at (2, $14), 0.7000 at (1, $11) and 0.9000 at (3, $13) — are exactly their given values.
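A minimal sketch of the revision step that catches the β_11/β_12 clash is given below: whenever two beliefs about OPAcc violate the partial ordering induced by κ_11 and κ_12 (more warranty or a lower price can only make a deal more acceptable to OP), the older belief is discarded. The class and field names are illustrative.

```python
# A sketch of the revision function's consistency check: kappa_11 and kappa_12
# require P(OPAcc(w, p)) to be non-decreasing in w and non-increasing in p,
# so an older belief that breaks that ordering against a newer one is dropped.
from dataclasses import dataclass
from typing import List

@dataclass
class Belief:
    w: int        # warranty years
    p: int        # price
    prob: float   # given sentence probability
    t: int        # arrival time

def dominates(a: Belief, b: Belief) -> bool:
    """Deal a is at least as attractive to OP as deal b (kappa_11, kappa_12)."""
    return a.w >= b.w and a.p <= b.p

def revise(beliefs: List[Belief]) -> List[Belief]:
    kept = list(beliefs)
    for a in beliefs:
        for b in beliefs:
            # ordering violated: the more attractive deal has the lower probability
            if dominates(a, b) and a.prob < b.prob:
                older = min(a, b, key=lambda x: x.t)
                if older in kept:
                    kept.remove(older)
    return kept

# The Sec. 3.4 example: beta_11 = OPAcc(2,16) with 0.4 clashes with
# beta_12 = OPAcc(2,14) with 0.2, and being older it is discarded.
B = [Belief(2, 16, 0.4, t=1), Belief(2, 14, 0.2, t=2),
     Belief(1, 11, 0.7, t=3), Belief(3, 13, 0.9, t=4)]
print([(b.w, b.p) for b in revise(B)])   # [(2, 14), (1, 11), (3, 13)]
```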
4 ESTIMATING P(NAAcc(.))
The proposition NAAcc(δ) means: "δ is acceptable to NA". This section describes how NA attaches a conditional probability to the proposition, P(NAAcc(δ) | I_t), in the light of information I_t. The meaning of "acceptable to NA" is described below. This is intended to put NA in the position of "looking back on it, I made the right decision at the time" — this is a vague notion but it makes sense to the authors. The idea is for NA to accept a deal δ when P(NAAcc(δ) | I_t) ≥ α for some threshold value α that is one of NA's mental states.
P(NAAcc(δ) | I_t) is derived from conditional probabilities attached to four other propositions:

P(Suited(ω) | I_t),
P(Good(OP) | I_t),
P(Fair(δ) | I_t ∪ {Suited(ω), Good(OP)}), and
P(Me(δ) | I_t ∪ {Suited(ω), Good(OP)}),

meaning respectively: "terms ω are perfectly suited to my needs", "OP will be a good agent for me to be doing business with", "δ is generally considered to be a good deal for NA", and "on strictly subjective grounds, δ is acceptable to NA". The last two of these four probabilities factor out both the suitability of ω and the appropriateness of the opponent OP. The difference between the third and fourth is that the third captures the concept of "a good market deal" and the fourth a strictly subjective "what ω is worth to NA". The "Me(.)" proposition is related to the concept of a private valuation in game theory.
To determine P(Suited(ω) | I_t): if there are sufficiently strong preference relations to establish extrema for this distribution then they may be assigned the extreme values 0.0 or 1.0. NA is repeatedly asked to provide probability estimates for the offer ω that yields the greatest reduction in entropy for the resulting distribution (MacKay, 2003). This continues until NA considers the distribution to be "satisfactory". This is tedious, but the "preference acquisition bottleneck" appears to be an inherently costly business (Castro-Schez et al., 2004).
Determining P(Good(OP) | I_t) involves an assessment of the reliability of OP. For some retailers (sellers), information — of varying reliability — may be extracted from sites that rate them. For individuals, this may be done either through assessing their reputation established during prior trades (Ramchurn et al., 2003; Sierra and Debenham, 2005), or by the inclusion of some third-party escrow service that is then rated for "reliability" instead.
P(Fair(δ) | I_t ∪ {Suited(ω), Good(OP)}) is determined by market data. As for dealing with Suited, if the preference relations establish extrema for this distribution then extreme values may be assigned. Independently of this, real market data, qualified with given sentence probabilities, is fed into the distribution. The revision function R identifies and removes inconsistencies, and missing values are estimated using the maximum entropy distribution.
Determining P(Me(δ) | I_t ∪ {Suited(ω), Good(OP)}) is a subjective matter. It is specified using the same device as used for Fair, except that the data is fed in by hand "until the distribution appears satisfactory". To start this process, first identify those δ that NA "would never accept" — they are given a probability of 0.0 — and second those δ that NA "would be delighted to accept" — they are given a probability of 1.0. The Me proposition links the information-theory approach with "private valuations" in game theory.
There is no causal relationship between the four
probability distributions as they have been defined,
with the possible exception of the third and fourth. To
link the probabilities associated with the five proposi-
tions, the probabilities are treated as epistemic prob-
abilities and the nodes form a simple Bayesian net.
The weights on the four arcs of the Bayesian net are a
subjective representation of what “acceptable” means
to NA. The resulting net divides the problem of esti-
mating P(NAAcc) into four simpler sub-problems.
The conditionals on the Bayesian network are subjective — they are easy to specify because twelve of them are zero, namely those for the cases in which NA believes that either Me or Suited is "false". For example, if the conditionals (set by NA) are:

P(NAAcc | Me, Suited, Good, Fair) = 1.0
P(NAAcc | Me, Suited, ¬Good, Fair) = 0.1
P(NAAcc | Me, Suited, Good, ¬Fair) = 0.4
P(NAAcc | Me, Suited, ¬Good, ¬Fair) = 0.05

then, with probabilities of 0.9 on each of the four evidence nodes, the probability P(NAAcc) = 0.75.
It then remains to manage the acquisition of information I_t from the available sources to, if necessary, increase P(NAAcc(δ) | I_t) so that δ is acceptable. The conditional probabilities on the net represent an agent's priorities for a deal, and so they are specified for each
class of deal.
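The marginalization implied by such a table can be sketched as follows. Treating the four evidence nodes as independent is an assumption of this sketch, not a claim about the authors' net, so the value it prints is illustrative rather than a reproduction of the 0.75 quoted above.

```python
# A sketch: marginalize a conditional probability table for NAAcc over the
# four evidence nodes Me, Suited, Good, Fair, *assuming* the evidence nodes
# are independent. Conditionals with Me or Suited false are zero.
from itertools import product

cpt = {  # P(NAAcc | Me, Suited, Good, Fair): the non-zero rows from the text
    (True, True, True, True): 1.0,
    (True, True, False, True): 0.1,
    (True, True, True, False): 0.4,
    (True, True, False, False): 0.05,
}

def p_naacc(p_me, p_suited, p_good, p_fair):
    evidence = (p_me, p_suited, p_good, p_fair)
    total = 0.0
    for states in product([True, False], repeat=4):
        weight = 1.0
        for s, p in zip(states, evidence):
            weight *= p if s else (1.0 - p)
        total += weight * cpt.get(states, 0.0)   # zero when Me or Suited is false
    return total

print(round(p_naacc(0.9, 0.9, 0.9, 0.9), 3))
```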
The NAAcc predicate generalizes the notion of utility. Suppose that NA knows its utility function U. If the conditionals on the Bayesian net are as in the previous paragraph and if either P(Me(.)) or P(Suited(.)) is zero then P(NAAcc(.)) will be zero. If the conditional probabilities on the Bayesian net are 1.0 when Me is true and 0.0 otherwise then P(NAAcc) = P(Me). Then define: P(Me(τ, ω)) = ½ × (1 + (U(ω) − U(τ)) / (U(ω̄) − U(τ))) for U(ω) > U(τ) and zero otherwise, where ω̄ = arg max_ω U(ω).⁴ A bargaining threshold α > 0.5 will then accept offers for which the surplus is positive. In this way NAAcc represents utility-based bargaining with a private valuation.
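Written out, the mapping from surplus to P(Me) and the acceptance test against the threshold α look like this (the utility values are invented for illustration):

```python
# A sketch of the utility-based special case: P(Me) maps a positive surplus
# into (0.5, 1], so a threshold alpha > 0.5 accepts exactly the deals with
# positive surplus. The utility values here are invented.
def p_me(u_omega: float, u_tau: float, u_best: float) -> float:
    """u_best = U(omega_bar), the utility of the best possible terms."""
    if u_omega <= u_tau:
        return 0.0
    return 0.5 * (1.0 + (u_omega - u_tau) / (u_best - u_tau))

alpha = 0.55
deal_ok = p_me(u_omega=13.0, u_tau=12.0, u_best=20.0) >= alpha
print(p_me(13.0, 12.0, 20.0), deal_ok)   # 0.5625 True
```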
NAAcc is also intended to be able to represent apparently irrational bargaining situations (e.g. "I've just got to have that hat"), as well as tricky multi-issue
problems such as those typical in eProcurement. It
enables an agent to balance the degree of suitability
of the terms offered with the reliability of the oppo-
nent and with the fairness of the deal.
5 NEGOTIATION STRATEGIES
Sec. 3 estimated the probability distribution,
P(OPAcc), that OP will accept an offer, and Sec. 4
estimated the probability distribution, P(NAAcc),
that NA should be prepared to accept an offer. These
two probability distributions represent the opposing
interests of the two agents NA and OP. P(OPAcc)
will change every time an offer is made, rejected or
accepted. P(NAAcc) will change as the background
information changes. This section discusses NA’s
strategy S.
Bargaining can be a game of bluff and counter-bluff in which an agent may even not intend to close the deal if one should be reached. A basic conundrum in any offer-exchange bargaining is: it is impossible to force your opponent to reveal information about their position without revealing information about your own position. Further, by revealing information about your own position you may change your opponent's position, and so on.⁵ This infinite regress, of speculation and counter-speculation, is avoided here by ignoring the internals of the opponent and by focussing on what is known for certain — that is: what information is contained in the signals received and when those signals arrived.

⁴ The introduction of ω̄ may be avoided by defining P(Me(τ, ω)) ≜ 1 / (1 + exp(−β × (U(ω) − U(τ)))) for U(ω) ≥ U(τ) and zero otherwise, where β is some constant. This is the sigmoid transfer function used in some neural networks. This function is near-linear for U(ω) ≈ U(τ), and is concave, or "risk averse", outside that region. The transition between these two behaviors is determined by the choice of β.

⁵ This is reminiscent of Werner Heisenberg's indeterminacy relation, or Unbestimmtheitsrelationen: "you can't measure one feature of an object without changing another" — with apologies.
A fundamental principle of competitive bargaining
is “never reveal your best price”, and another is “never
reveal your deadline if you have one” (Sandholm
and Vulkan, 1999). It is not possible to be prescrip-
tive about what an agent should reveal. All that can
be achieved is to provide strategies that an agent may
choose to employ. The following are examples of
such strategies.
An agent's strategy S is a function of the information I_t that it has at time t. That information will be represented in the agent's K and B, and will have been used to calculate P(OPAcc) and P(NAAcc). Simple strategies choose an offer only on the basis of P(OPAcc), P(NAAcc) and α. The greedy strategy S⁺ chooses arg max_δ {P(NAAcc(δ)) | P(OPAcc(δ)) ≫ 0}; it is appropriate for an agent that believes OP is desperate to trade. The expected-acceptability-to-NA-optimizing strategy chooses arg max_δ {P(OPAcc(δ)) × P(NAAcc(δ)) | P(NAAcc(δ)) ≥ α}; it is appropriate for a confident agent that is not desperate to trade. A third strategy chooses arg max_δ {P(OPAcc(δ)) | P(NAAcc(δ)) ≥ α}; it optimizes the likelihood of trade — it is a good strategy for an agent that is keen to trade without compromising its own standards of acceptability.
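Over a finite set of candidate deals the three strategies reduce to one-liners; the sketch below assumes the two distributions are available as dictionaries keyed by deal, and the floor argument stands in for "well above zero" in the greedy strategy. All names are illustrative.

```python
# A sketch of the three simple strategies over a finite set of candidate
# deals, given the two distributions as dictionaries {deal: probability}.
def greedy(p_op, p_na, floor=0.2):
    """S+: best deal for NA among deals OP is reasonably likely to accept."""
    candidates = [d for d in p_na if p_op[d] >= floor]
    return max(candidates, key=lambda d: p_na[d])

def expected_acceptability(p_op, p_na, alpha):
    """Maximize P(OPAcc) x P(NAAcc) over deals NA itself finds acceptable."""
    candidates = [d for d in p_na if p_na[d] >= alpha]
    return max(candidates, key=lambda d: p_op[d] * p_na[d])

def likelihood_of_trade(p_op, p_na, alpha):
    """Maximize the chance of agreement without dropping below alpha."""
    candidates = [d for d in p_na if p_na[d] >= alpha]
    return max(candidates, key=lambda d: p_op[d])
```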
An approach to issue-tradeoffs is described in (Faratin et al., 2003). The bargaining strategy described there attempts to make an acceptable offer by "walking round" the iso-curve of NA's previous offer (that has, say, an acceptability of α_na ≥ α) towards OP's subsequent counter offer. In terms of the machinery described here, an analogue is to use the strategy arg max_δ {P(OPAcc(δ)) | P(NAAcc(δ) | I_t) ≥ α_na} for α = α_na. This is reasonable for an agent that is attempting to be accommodating without compromising its own interests. Presumably such an agent will have a policy for reducing the value α_na if her deals fail to be accepted. The complexity of the strategy in (Faratin et al., 2003) is linear in the number of issues. The strategy described here does not have that property, but it benefits from using P(OPAcc), which contains footprints of the prior offer sequence — see Sec. 3.4 — in that distribution more recent offers have stronger weights.
6 CONCLUSIONS
The negotiating agent achieves its goal of reaching informed decisions whilst making no assumptions about the internals of its opponent. The agent is founded on information theory and is 'driven' by real-time information flows derived from its environment, which includes the Internet. Existing text and data mining bots have been used to feed information into NA in experiments, including a negotiation between two agents attempting to swap a mobile phone for a digital camera with no cash involved.
NA has five ways of leading a negotiation towards
a positive outcome. First, by making more attractive
offers to OP. Second, by reducing its threshold α.
Third, by acquiring information to hopefully increase
the acceptability of offers received. Fourth, by en-
couraging OP to submit more attractive offers. Fifth,
by encouraging OP to accept NA's offers. The first
two of these have been described. The third has been
implemented but is not described here. The remaining
two are the realm of argumentation-based negotiation
which is the next step in this project. The integrated
way in which NA manages both the negotiation and
the information acquisition should provide a sound
basis for an argumentation-based negotiator.
(Halpern, 2003) discusses problems with the
random-worlds approach, and notes particularly rep-
resentation and learning. Representation is particu-
larly significant here — for example, the logical con-
stants in the price domains could have been given
other values, and, as long as they remained or-
dered, and as long as the input values remained un-
changed, the probability distributions would be unal-
tered. Learning is not an issue now as the distributions
are kept as simple as possible and are re-computed
each time step. The assumptions of maximum entropy
probabilistic logic exploit the agent’s limited rational-
ity by attempting to assume “precisely no more than
is known”. But, the computations involved will be
substantial if the domains in the language L are large,
and will be infeasible if the domains are unbounded.
If the domains are large then preference relations such
as κ_1 can simplify the computations substantially.
Much has not been described here including: the
data and text mining software, the use of the Bayesian
net to prompt a search for information that may lead
to NA raising — or perhaps lowering — its acceptability threshold, and the way in which the incoming
information is structured to enable its orderly acqui-
sition (Debenham, 2004). We have not described the
belief revision and the identification of those random
worlds that are consistent with K.
The following issues are presently being investi-
gated. The random worlds computations are per-
formed each time the knowledge, K, or beliefs, B, alter — there is scope for using approximate updating techniques interspersed with the exact calculations. The offer accepting machinery operates independently from the offer making machinery, but not vice versa — this may mean that better deals could have been struck under some circumstances.
REFERENCES
Bulow, J. and Klemperer, P. (1996). Auctions versus nego-
tiations. American Economic Review, 86(1):180–194.
Castro-Schez, J., Jennings, N., Luo, X., and Shadbolt, N.
(2004). Acquiring domain knowledge for negotiating
agents: a case study. International Journal of Human-
Computer Studies, 61(1):3 – 31.
Debenham, J. (2004). A bargaining agent aims to
‘play fair’. In proceedings Twenty-fourth Interna-
tional Conference on Innovative Techniques and Ap-
plications of Artificial Intelligence, pages 173–186.
Springer-Verlag: Heidelberg, Germany.
Faratin, P., Sierra, C., and Jennings, N. (2003). Using
similarity criteria to make issue trade-offs in auto-
mated negotiation. Journal of Artificial Intelligence,
142(2):205–237.
Halpern, J. (2003). Reasoning about Uncertainty. MIT
Press.
Jaynes, E. (1957). Information theory and statistical me-
chanics: Part I. Physical Review, 106:620 – 630.
Kraus, S. (2001). Strategic Negotiation in Multiagent Envi-
ronments. MIT Press.
MacKay, D. (2003). Information Theory, Inference and
Learning Algorithms. Cambridge University Press.
Muthoo, A. (1999). Bargaining Theory with Applications.
Cambridge UP.
Myerson, R. and Satterthwaite, M. (1983). Efficient mecha-
nisms for bilateral trading. Journal of Economic The-
ory, 29:1–21.
Neeman, Z. and Vulkan, N. (2000). Markets versus ne-
gotiations. Technical report, Center for Rationality
and Interactive Decision Theory, Hebrew University,
Jerusalem.
Osborne, M. J. and Rubinstein, A. (1990). Bargaining and
Markets. Academic Press.
Pietra, S. D., Pietra, V. D., and Lafferty, J. (1997). Inducing
features of random fields. IEEE Transactions on Pat-
tern Analysis and Machine Intelligence, 19(2):380–
393.
Ramchurn, S., Jennings, N., Sierra, C., and Godo, L.
(2003). A computational trust model for multi-agent
interactions based on confidence and reputation. In
Proceedings 5th Int. Workshop on Deception, Fraud
and Trust in Agent Societies.
Sandholm, T. and Vulkan, N. (1999). Bargaining with dead-
lines. In Proceedings of the National Conference on
Artificial Intelligence (AAAI).
Sierra, C. and Debenham, J. (2005). An information-based
model for trust. In Dignum, F., Dignum, V., Koenig,
S., Kraus, S., Singh, M., and Wooldridge, M., editors,
Proceedings Fourth International Conference on Au-
tonomous Agents and Multi Agent Systems AAMAS-
2005, pages 497–504, Utrecht, The Netherlands.
ACM Press, New York.