A FOUNDATION FOR INFORMED NEGOTIATION
John Debenham and Simeon Simoff
University of Technology, Sydney
PO Box 123, Broadway, NSW 2007, Australia
Keywords:
Intelligent Agents, Decision Support, Agents for Internet Computing.
Abstract:
Approaches to the construction of agents that are to engage in competitive negotiation are often founded
on game theory. In such an approach the agents are endowed with utility functions and assumed to be utility
optimisers. In practice the utility function is derived in the context of massive uncertainties both in terms of the
agent’s priorities and of the raw data or information. To address this issue we propose an agent architecture that
is founded on information theory, and that manages uncertainty with entropy-based inference. Our negotiating
agent engages in multi-issue bilateral negotiation in a dynamic information-rich environment. The agent
strives to make informed decisions. The agent may assume that the integrity of some of its information decays
with time, and that a negotiation may break down under certain conditions. The agent makes no assumptions
about the internals of its opponent — it focuses only on the signals that it receives. It constructs two probability
distributions over the set of all deals: first, the probability that its opponent will accept a deal, and second, the probability that a deal will, in time, prove to have been acceptable to the agent itself.
1 INTRODUCTION
We propose an agent architecture that is founded on
information theory, and that manages uncertainty with
entropy-based inference. This architecture aims to ad-
dress the problem of defining a utility function for a
game-theoretic agent in the context of massive uncer-
tainties both in terms of the agent’s priorities and of
the raw data or information. Our agent does not necessarily have a utility function and so is not necessarily a utility optimiser — it may have less selfish aims such as simply 'playing fair'. This Negotiating
Agent, NA, engages in bilateral bargaining with an op-
ponent, OP. It strives to make informed decisions in
an information-rich environment that includes market
data and general sources including the Internet. NA
attempts to fuse the negotiation with the information
generated both by and because of it. It reacts to in-
formation derived from its opponent and from the en-
vironment, and proactively seeks missing information
that may be of value.
NA is founded on information theory. Game the-
ory tells us what to do, and what outcome to expect,
in many well-known negotiation situations, but these
strategies and expectations are derived from assump-
tions about the internals of the opponent. Game theo-
retic analyses of bargaining are founded on the notion
of agents as utility optimizers in the presence of com-
plete and incomplete information about their oppo-
nents (Muthoo, 1999). Two probability distributions
form the foundation of both the offer evaluation and
the offer making processes. They are both over the set
of all deals and are based on all information available
to the agent. The first distribution is the probability
that any deal is acceptable to OP. The second distribution is the probability that any deal will prove to be acceptable to NA — this distribution generalizes the notion of utility.
NA makes no assumptions about the internals of OP, in particular whether it has a utility function. NA
does make assumptions about: the way in which the
integrity of information will decay, preferences that
its opponent may have for some deals over others,
and conditions that may lead to breakdown. It also
assumes that unknown probabilities can be inferred
using maximum entropy probabilistic logic (MacKay,
2003) that is based on random worlds (Halpern,
2003). The maximum entropy probability distribu-
tion is “the least biased estimate possible on the given
information; i.e. it is maximally noncommittal with
regard to missing information” (Jaynes, 1957). In the
absence of knowledge about OP's decision-making
apparatus, NA assumes that the “maximally noncom-
mittal” model is the correct model on which to base
its reasoning.
A preference relation is an assumption that NA
makes about OP's preferences for some deals over
others. For example, that she prefers to pay a lower
price to a higher price. A single-issue preference rela-
tion assumes that she prefers deals on the basis of one
issue alone, independent of the values of the other is-
sues. A preference relation may be assumed prior to
the negotiation, or during it based on the offers made.
For example, the opponent may display a preference
for items of a certain color; (Faratin et al., 2003) de-
scribes a basis for ordering colors. The preference re-
lations illustrated here are single-issue orderings, but
the agent’s reasoning operates equally well with any
preference relation as long as it may be expressed in
Horn clause logic.
Bilateral bargaining is known to be inherently in-
efficient (Myerson and Satterthwaite, 1983). (Bulow
and Klemperer, 1996) shows that a seller is better off
with an auction that attracts n+1 buyers than bargain-
ing with n individuals, no matter what the bargain-
ing protocol is. (Neeman and Vulkan, 2000) shows that the weaker bargaining types will fare better in exchanges, leading to a gradual migration towards exchanges. These results hold for agents who aim to optimize their utility, and so they do limit the scope of the work described here.
2 INFORMED AGENTS
NA operates in an information-rich environment. The
integrity of its information, including information ex-
tracted from the Internet, will decay in time. The way
in which this decay occurs will depend on the type of
information, and on the source from which it is drawn.
Little appears to be known about how the integrity of
information, such as news-feeds, decays.
One source of NA's information is the signals received from OP. These include offers to NA, and the acceptance or rejection of NA's offers. If OP rejected NA's offer of $8 two days ago, then what is NA's belief now in the proposition that OP will accept another offer of $8? Perhaps it is around 0.1. A linear model is used to model the integrity decay of these beliefs, and when the probability of a decaying belief approaches 0.5¹ the belief is discarded. This choice of a linear model is independent of the bargaining method; the model of decay could equally be exponential, quadratic or whatever.

¹ A sentence probability of 0.5 represents "maybe, maybe not".
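As a concrete illustration, a minimal sketch of such a decay rule is given below (Python; the decay rate, starting probabilities and discard band are invented for illustration and are not taken from the paper).

```python
# A minimal sketch of linear integrity decay towards 0.5, assuming a fixed
# decay rate per time step; the names and constants are illustrative only.
DECAY_PER_STEP = 0.05     # assumed linear decay rate per time step
DISCARD_BAND = 0.05       # discard a belief once it is this close to 0.5

def decayed_probability(p0: float, steps: int, rate: float = DECAY_PER_STEP) -> float:
    """Move a sentence probability linearly towards 0.5, one step at a time."""
    if p0 >= 0.5:
        return max(0.5, p0 - rate * steps)
    return min(0.5, p0 + rate * steps)

def still_useful(p: float, band: float = DISCARD_BAND) -> bool:
    """A belief whose probability has drifted into the 'maybe, maybe not'
    band around 0.5 carries almost no information and is discarded."""
    return abs(p - 0.5) > band

# Example: OP rejected an offer of $8 two days ago. A belief that started
# at 0.0 when the rejection arrived has decayed for two steps towards 0.5.
p = decayed_probability(0.0, steps=2)   # about 0.1
print(p, still_useful(p))
```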
2.1 Interaction Protocol
The agents communicate using sentences in a first-
order language L. This includes the exchange, ac-
ceptance and rejection of offers. L contains the fol-
lowing predicates: Offer(δ), Accept(δ), Reject(δ) and
Quit(.), where Offer(δ) means “the sender is offering
you a deal δ”, Accept(δ) means “the sender accepts
your deal δ”, Reject(δ) means “the sender rejects your
deal δ” and Quit(.) means “the sender quits — the ne-
gotiation ends”.
Two negotiation protocols are described. First, negotiation without decay, in which all offers stand for the entire negotiation. Second, negotiation with decay, in which offers stand only if accepted by return — NA represents OP's offers as beliefs with sentence probabilities that decay in time.
NA and OP each exchange offers alternately at suc-
cessive discrete times (Kraus, 2001). They enter into
a commitment if one of them accepts a standing offer.
The protocol has three stages:
1. Simultaneous, initial, binding offers from both
agents;
2. A sequence of alternating offers, and
3. An agent quits and walks away from the negotia-
tion.
The negotiation ceases either in the second stage, if one of the agents accepts a standing offer, or in the third stage, if one agent quits and the negotiation breaks down.
In the first stage the agents simultaneously send Of-
fer(.) messages to each other. These initial offers are
taken as limits on the range of values that are consid-
ered possible. This is crucial to the method described
in Sec. 3 where there are domains that would other-
wise be unbounded. The exchange of initial offers "stakes out the turf" on which the subsequent negotiation will take place. In the second stage an Offer(.) message is interpreted as an implicit rejection, Reject(.), of the opponent's offer on the table.
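For concreteness, the following sketch shows one way the utterances of L and the three protocol stages could be represented; the class and field names are assumptions made for illustration only, not the authors' implementation.

```python
# A minimal sketch of the negotiation utterances and protocol stages.
# The deal representation and all names are illustrative assumptions.
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Tuple

Deal = Tuple[str, str]          # (tau, omega): sender's terms, receiver's terms

class Utterance(Enum):
    OFFER = auto()              # Offer(delta): "the sender is offering you a deal delta"
    ACCEPT = auto()             # Accept(delta): "the sender accepts your deal delta"
    REJECT = auto()             # Reject(delta): "the sender rejects your deal delta"
    QUIT = auto()               # Quit(.): "the sender quits -- the negotiation ends"

@dataclass
class Message:
    utterance: Utterance
    deal: Optional[Deal]        # None for Quit(.)
    timestamp: float            # messages are time-stamped on arrival

# Stage 1: simultaneous, binding initial offers bound the deal space.
# Stage 2: alternating Offer(.) messages; an Offer is read as an implicit
#          Reject of the offer currently on the table.
# Stage 3: an Accept(.) closes the deal, or a Quit(.) breaks the negotiation down.
```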
2.2 Agent Architecture
Incoming messages from all sources are time-stamped and placed in an "In Box", X, as they arrive. NA has a knowledge base K and a belief set B. Each of these two sets contains statements in L. K contains statements that are generally true, such as ∀x(Accept(x) ↔ ¬Reject(x)) — i.e. an agent does one thing or the other. The belief set B = {β_i} contains statements that are each qualified with a given sentence probability, B(β_i), that represents the agent's belief in the truth of the statement. These sentence probabilities may decay in time.
The distinction between the knowledge base K and
the belief set B is simply that K contains unqualified
statements and B contains statements that are quali-
fied with sentence probabilities. K and B play differ-
ent roles in the method described in Sec. 3.
NA's actions are determined by its "strategy". A strategy is a function S : K × B → A, where A is the set of actions. At certain distinct times the function S is applied to K and B and the agent does something. The set of actions, A, includes sending Offer(.), Accept(.), Reject(.) and Quit(.) messages to OP. The way in which S works is described in Sec. 5. Momentarily before the S function is activated, a "revision function" R is activated:

R : (K × B) → (K × B)

R clears the "In Box", and stores the messages either in B with a given sentence probability or in K.
A deal, δ, is a commitment for the sender to do something, τ (the sender's "terms"), subject to the receiver committing to do something, ω (the receiver's "terms"): δ = (τ, ω). NA may have a real-valued utility function U : T → ℝ, where T is the set of terms. If so, then for any deal δ = (τ, ω) the expression U(ω) − U(τ) is called the surplus of δ. An agent may be unable to specify a utility function either precisely or with certainty.²
Sec. 4 describes a predicate
NAAcc(.) that represents the “acceptability” of a deal.
NA uses three things to make offers: an estimate of
the likelihood that OP will accept any offer [Sec. 3],
an estimate of the likelihood that NA will, in hind-
sight, feel comfortable accepting any particular offer
[Sec. 4], and an estimate of when OP may quit and
leave the negotiation [Sec. 5].
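A minimal sketch of this revise-then-act cycle, assuming the In Box is a simple message list and that R and S are supplied as callables (all names are illustrative):

```python
# A minimal sketch of the agent cycle: the revision function R empties the
# "In Box" into K (unqualified) or B (with a sentence probability), then the
# strategy S maps (K, B) to an action. All names are illustrative.
from typing import Callable, List, Tuple

Action = str                                   # e.g. "Offer", "Accept", "Reject", "Quit"

def agent_step(in_box: List[dict],
               K: set, B: dict,
               R: Callable[[List[dict], set, dict], Tuple[set, dict]],
               S: Callable[[set, dict], Action]) -> Action:
    """One activation of the agent: revise, then act."""
    K, B = R(in_box, K, B)     # R clears the In Box into K or B
    in_box.clear()
    return S(K, B)             # S : K x B -> A chooses the next action
```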
2.3 Random Worlds
Let G be the set of all positive ground literals that can be constructed using the predicate, function and constant symbols in L. A possible world is a valuation function v : G → {⊤, ⊥}. V denotes the set of all possible worlds, and V_K denotes the set of possible worlds that are consistent with a knowledge base K (Halpern, 2003).
A random world for K is a probability distribution W_K = {p_i} over V_K = {v_i}, where W_K expresses an agent's degree of belief that each of the possible worlds is the actual world. The derived sentence probability of any σ ∈ L, with respect to a random world W_K, is:

P_{W_K}(σ) ≜ Σ_n { p_n : σ is true in v_n }    (1)
A random world W_K is consistent with the agent's beliefs B if (∀β ∈ B)(B(β) = P_{W_K}(β)). That is, for each belief, its derived sentence probability as calculated using Eqn. 1 is equal to its given sentence probability.

² The often-quoted oxymoron "I paid too much for it, but it's worth it", attributed to the movie producer Samuel Goldwyn, illustrates that intelligent agents may choose to negotiate with uncertain utility.
The entropy of a discrete random variable X with probability mass function {p_i} is (MacKay, 2003): H(X) = −Σ_n p_n log p_n, where p_n ≥ 0 and Σ_n p_n = 1. Let W_{K,B} be the "maximum entropy probability distribution over V_K that is consistent with B". Given an agent with K and B, its derived sentence probability for any sentence σ ∈ L is:

(∀σ ∈ L)  P(σ) ≜ P_{W_{K,B}}(σ)    (2)
Using Eqn. 2, the derived sentence probability for any belief β_i is equal to its given sentence probability, so the term sentence probability is used without ambiguity.
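For a small language, the random-worlds machinery can be sketched directly: enumerate the valuations consistent with K, pick the distribution over those worlds that maximizes entropy subject to the constraints contributed by B, and read derived sentence probabilities off with Eqn. 1 and Eqn. 2. The code below is an illustrative reconstruction (using scipy for the constrained optimization), not the authors' implementation.

```python
# A sketch of maximum-entropy inference over random worlds (illustrative).
# Worlds are assignments of truth values to the ground literals in G;
# K_consistent filters the worlds consistent with K; beliefs is a list of
# (sentence, given probability) pairs, each sentence a predicate over worlds.
from itertools import product
import numpy as np
from scipy.optimize import minimize

def max_entropy_worlds(literals, K_consistent, beliefs):
    worlds = [w for w in product([True, False], repeat=len(literals))
              if K_consistent(dict(zip(literals, w)))]
    m = len(worlds)

    def neg_entropy(p):
        p = np.clip(p, 1e-12, 1.0)
        return float(np.sum(p * np.log(p)))

    cons = [{"type": "eq", "fun": lambda p: np.sum(p) - 1.0}]
    for sentence, prob in beliefs:                 # each belief is a linear constraint (Eqn. 1)
        mask = np.array([1.0 if sentence(dict(zip(literals, w))) else 0.0
                         for w in worlds])
        cons.append({"type": "eq",
                     "fun": lambda p, mask=mask, prob=prob: float(mask @ p) - prob})

    res = minimize(neg_entropy, np.full(m, 1.0 / m),
                   bounds=[(0.0, 1.0)] * m, constraints=cons)
    p = res.x

    def derived(sentence):                         # Eqn. 2: P(sigma) under W_{K,B}
        return float(sum(pi for pi, w in zip(p, worlds)
                         if sentence(dict(zip(literals, w)))))
    return derived
```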
3 ESTIMATING P(OPAcc(.))
NA does two different things. First, it reacts to offers received from OP — this is described in Sec. 4. Second, it sends offers to OP. This section describes the estimation of P(OPAcc(δ)), where the predicate OPAcc(δ) means "the deal δ is acceptable to OP". When a negotiation commences NA may have no information about OP or about prior deals. If so, then the initial offers may only be based on past experience or circumstantial information.³ So the opening offers are simply taken as given.
In the four sub-sections following, NA is attempting to sell something to OP. In Secs. 3.1 and 3.2 NA's terms τ are to supply a particular good, and OP's terms ω are money — in those examples the amount of money ω is the subject of the negotiation. In Secs. 3.3 and 3.4 NA's terms are to supply a particular good together with some negotiated warranty period, and OP's terms are money — in those examples the amount of money p and the length of the warranty period w are the subject of the negotiation.
3.1 One Issue Without Decay
The unary predicate OPAcc(x) means "the amount of money $x is acceptable to OP". NA is interested in whether the unary predicate OPAcc(x) is true for various values of $x. NA assumes the following preference relation on the OPAcc predicate:

κ_1 : ∀x, y ((x > y) → (OPAcc(x) → OPAcc(y)))
³ In rather dire circumstances King Richard III of England is reported to have initiated a negotiation with remarkably high stakes: "A horse! a horse! my kingdom for a horse!" [William Shakespeare]. Fortunately for Richard, a person named Catesby was nearby, and advised Richard to retract this rash offer ("Withdraw, my lord"), and so Richard's intention to honor his commitments was not put to the test.
Suppose that NA's opening offer is ω̄, and OP's opening offer is ω̲, where ω̲ < ω̄. Then K now contains two further sentences: κ_2 : ¬OPAcc(ω̄) and κ_3 : OPAcc(ω̲). There are now ω̄ − ω̲ possible worlds, and the maximum entropy distribution is uniform.

Suppose that NA knows its true valuation for the good, u_na, and that NA has decided to make an "expected-utility-optimizing" offer: x = (ω̄ + u_na)/2. This offer is calculated on the basis of the preference ordering κ_1 and the two signals that NA has received from OP. The response is in terms of only NA's valuation u_na and the signal Reject(ω̄) — it is independent of the signal Offer(ω̲), which implies that ω̲ is acceptable.
In the standard game theoretic analysis of bargaining (Muthoo, 1999), NA assumes that OP has a utility, u_op, that it lies in some interval [u̲, ū], and that the expected value of u_op is uniformly distributed on that interval. On the basis of these assumptions NA then derives the expected-utility-optimizing offer (ū + u_na)/2. These two offers differ only in the term ū in the game-theoretic result and ω̄ in the maximum entropy result. The game theoretic approach relies on estimates for u̲ and ū:

E([u̲, ū] | Reject(ω̄) ∧ Accept(ω̲))
If OP has a utility — and it may not — then if OP is rational: u̲ ≤ ω̲ ≤ ū. The inherent inefficiency of bilateral bargaining (Myerson and Satterthwaite, 1983) shows for an economically rational OP that u_op, and so consequently ū, may be greater than ω̄. There is no reason to suspect that ū and ω̄ will be equal.
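To make the single-issue case concrete, the sketch below recovers the "expected-utility-optimizing" offer numerically: under the uniform maximum entropy distribution P(OPAcc(x)) falls linearly from 1 at OP's opening offer to 0 at NA's opening offer, and maximizing (x − u_na) × P(OPAcc(x)) over integer prices lands at roughly (ω̄ + u_na)/2. The opening offers and valuation are invented numbers.

```python
# A sketch: seller NA's expected-surplus-optimizing offer under the uniform
# maximum entropy model. Opening offers and valuation are invented numbers.
omega_low, omega_high = 10, 30      # OP's and NA's opening offers
u_na = 16                           # NA's true valuation of the good

def p_opacc(x: int) -> float:
    """Linear acceptance probability implied by the uniform distribution
    over the omega_high - omega_low monotone possible worlds."""
    return (omega_high - x) / (omega_high - omega_low)

best = max(range(omega_low, omega_high + 1),
           key=lambda x: (x - u_na) * p_opacc(x))
print(best, (omega_high + u_na) / 2)   # both are 23, i.e. (omega_high + u_na) / 2
```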
3.2 One Issue With Decay
As in the previous example, suppose that the opening offers at time t_0 are taken as given and are ω̄ and ω̲. Then K contains κ_1, κ_2 and κ_3. Suppose L contains n consecutive, integer constants in the interval [ω̲, ω̄], where n = ω̄ − ω̲ + 1, that represent various amounts of money. κ_1 induces a total ordering on the sentence probabilities for OPAcc(x) on the interval [ω̲, ω̄], where the probabilities are 0 at ω̄ and 1 at ω̲.
Suppose that at time t_1 NA makes an offer ω_na which is rejected by OP, who has replied at time t_2 with an offer of ω_op, where ω̲ ≤ ω_op ≤ ω_na ≤ ω̄. At time t_3, B contains β_1 : OPAcc(ω_na) and β_2 : OPAcc(ω_op). Suppose that there is some level of integrity decay on these two beliefs: 0 < B(β_1) < 0.5 < B(β_2) < 1. Then V_K contains n + 1 possible worlds ranging from "all false" to "all true", each containing n literals. So a random world for K will consist of n + 1 probabilities {p_i}, where, say, p_1 is the probability of "all true", and p_{n+1} is the probability of "all false". W_{K,B} will be the distribution that maximizes −Σ_n p_n log p_n subject to the constraints: p_n ≥ 0, Σ_n p_n = 1, Σ_{n=1..ω̄−ω_na+1} p_n = B(β_1) and Σ_{n=1..ω̄−ω_op+1} p_n = B(β_2).
The optimization of entropy, H, subject to linear constraints is described in Sec. 3.2.1 below. W_{K,B} is:

p_n = B(β_1) / (ω̄ − ω_na + 1)              if 1 ≤ n ≤ ω̄ − ω_na + 1
p_n = (B(β_2) − B(β_1)) / (ω_na − ω_op)     if ω̄ − ω_na + 1 < n < ω̄ − ω_op + 2
p_n = (1 − B(β_2)) / (ω_op − ω̲ + 1)         if ω̄ − ω_op + 2 ≤ n ≤ ω̄ − ω̲ + 2
Using Eqn. 2, for ω_op ≤ x ≤ ω_na:

P(OPAcc(x)) = B(β_1) + ((ω_na − x) / (ω_na − ω_op)) × (B(β_2) − B(β_1))    (3)

These probability estimates are used in Sec. 5 to calculate NA's next offer.
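Eqn. 3 is straightforward to implement; the sketch below interpolates between the two decayed sentence probabilities over the interval bounded by the most recent pair of offers (the numbers in the example are invented).

```python
# A sketch of Eqn. 3: P(OPAcc(x)) for omega_op <= x <= omega_na, interpolating
# between the decayed sentence probabilities of the two most recent signals.
def p_opacc(x: float, omega_na: float, omega_op: float,
            b1: float, b2: float) -> float:
    """b1 = B(beta_1) for OPAcc(omega_na) (NA's rejected offer),
    b2 = B(beta_2) for OPAcc(omega_op) (OP's standing offer)."""
    assert omega_op <= x <= omega_na
    return b1 + (omega_na - x) / (omega_na - omega_op) * (b2 - b1)

# Example with invented values: offers $20 and $12, beliefs 0.2 and 0.9.
print(p_opacc(16, omega_na=20, omega_op=12, b1=0.2, b2=0.9))  # 0.55
```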
The values for P(OPAcc(x)) in the region ω_op ≤ x ≤ ω_na are derived from only two pieces of information: the two signals Reject(ω_na) and Offer(ω_op), each qualified with the time at which it arrived and the decay rate on its integrity. The assumptions in the analysis given above are: the choice of values for ω̲ and ω̄ — which do not appear in Eqn. 3 in any case — and the choice of the "maximally non-committal" distribution.
If the agents continue to exchange offers then new beliefs will be acquired and the integrity of old beliefs will decay. If the next pair of offers lies within the interval [ω_op, ω_na], and if the integrity of β_1 and β_2 decays, then the sentence probabilities of β_1 and β_2 will be inconsistent with those of the two new beliefs, due to the total ordering of sentence probabilities on [ω̲, ω̄] induced by κ_1. This inconsistency is resolved by the revision function R, which here discards the inconsistent older beliefs, β_1 and β_2, in favor of the more recent beliefs. If the agents continue in this way then the sentence probabilities for the OPAcc predicate are given simply by Eqn. 3 using the most recent values for ω_na and ω_op.
The analysis given above requires that values be specified for the opening offers ω̲ and ω̄. The only part of the probability distribution that depends on the values chosen for ω̲ and ω̄ is the two "tails" of the distribution, so the choice of values for these two opening offers is unlikely to affect the estimates. The two tails are necessary to "soak up" the otherwise unallocated probability.
3.2.1 Maximizing Entropy
If X is a discrete random variable taking a finite number of possible values {x_i} with probabilities {p_i} then the entropy is the average uncertainty removed by discovering the true value of X, and is given by H = −Σ_n p_n log p_n. The direct optimization of
H subject to a number, θ, of linear constraints of the form Σ_n p_n g_k(x_n) = ḡ_k for given constants ḡ_k, where k = 1, . . . , θ, is a difficult problem. Fortunately this problem has the same unique solution as the maximum likelihood problem for the Gibbs distribution (Pietra et al., 1997). The solution to both problems is given by:

p_n = exp(Σ_{k=1..θ} λ_k g_k(x_n)) / Σ_m exp(Σ_{k=1..θ} λ_k g_k(x_m))    (4)
for n = 1, 2, · · ·, where the constants {λ_i} may be calculated using Eqn. 4 together with the three sets of constraints: p_n ≥ 0, Σ_n p_n = 1 and Σ_n p_n g_k(x_n) = ḡ_k. The distribution in Eqn. 4 is known as the Gibbs distribution.
Calculating the expressions for the values of {p_n} given in the example above in Sec. 3.2 does not require the full evaluation of the expressions in Eqn. 4. That equation shows that there are just three different values among the {p_n}; applying simple algebra to that fact together with the constraints yields the expressions given.
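One standard way to compute the constants {λ_k} of Eqn. 4 is to minimize the convex dual, the log-partition function minus the constraint targets, whose gradient vanishes exactly when the linear constraints are satisfied. The sketch below does this with scipy and is an illustrative reconstruction rather than the authors' solver.

```python
# A sketch: fit the Gibbs distribution of Eqn. 4 by minimizing the convex
# dual log Z(lambda) - lambda . g_bar, whose gradient vanishes exactly when
# the linear constraints sum_n p_n g_k(x_n) = g_bar_k are satisfied.
import numpy as np
from scipy.optimize import minimize

def gibbs_fit(G: np.ndarray, g_bar: np.ndarray) -> np.ndarray:
    """G[k, n] = g_k(x_n); g_bar[k] = target expectation. Returns {p_n}."""
    def dual(lam):
        scores = lam @ G                       # sum_k lambda_k g_k(x_n)
        return float(np.log(np.sum(np.exp(scores))) - lam @ g_bar)
    lam = minimize(dual, np.zeros(len(g_bar))).x
    w = np.exp(lam @ G)
    return w / w.sum()                         # Eqn. 4

# Example with invented numbers: 5 outcomes, one constraint E[g] = 0.3,
# where g is an indicator of the first two outcomes.
G = np.array([[1.0, 1.0, 0.0, 0.0, 0.0]])
p = gibbs_fit(G, np.array([0.3]))
print(p)   # first two probabilities sum to ~0.3, remaining mass is uniform
```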
3.3 Two Issues Without Decay
The above approach to single-issue bargaining generalizes without modification to multi-issue bargaining; it is illustrated here with two issues only for ease of presentation. The problem considered is the sale of an item with 0, . . . , 4 years of warranty. The terms being negotiated specify an amount of money p and the number of years of warranty w. The predicate OPAcc(w, p) now means "OP will accept the offer to purchase the good with w years warranty for $p".
NA assumes the following two preference orderings, and K contains:

κ_11 : ∀x, y, z ((x > y) → (OPAcc(y, z) → OPAcc(x, z)))
κ_12 : ∀x, y, z ((x > y) → (OPAcc(z, x) → OPAcc(z, y)))

As in Sec. 3.1, these sentences conveniently reduce the number of possible worlds. The number of possible worlds will be finite as long as K contains two statements of the form ¬OPAcc(4, a) and OPAcc(0, b) for some a and b. Suppose that NA's initial offer was "4 years warranty for $21" and OP's initial offer was "no warranty for $10". K now contains:

κ_13 : ¬OPAcc(4, 21)        κ_14 : OPAcc(0, 10)

These two statements, together with the restriction to integers only, limit the possible values of w and p in OPAcc(w, p) to a 5 × 10 matrix.
Suppose that NA knows its utility function for the good with 0, . . . , 4 years warranty, and that its values are $11.00, $11.50, $12.00, $13.00 and $14.50 respectively. Suppose that NA uses the strategy S(n) which is described in Sec. 5 — the details of that strategy are not important now. If NA uses that strategy with n = 2, then NA offers Offer(2, $16), which suppose OP rejects and counters with Offer(1, $11). Then with n = 2 again, NA offers Offer(2, $14), which suppose OP rejects and counters with Offer(3, $13). P(OPAcc(w, p)) now is:
        w = 0    w = 1    w = 2    w = 3    w = 4
p = 20  0.0000   0.0000   0.0000   0.0455   0.0909
p = 19  0.0000   0.0000   0.0000   0.0909   0.1818
p = 18  0.0000   0.0000   0.0000   0.1364   0.2727
p = 17  0.0000   0.0000   0.0000   0.1818   0.3636
p = 16  0.0000   0.0000   0.0000   0.2273   0.4545
p = 15  0.0000   0.0000   0.0000   0.2727   0.5454
p = 14  0.0000   0.0000   0.0000   0.3182   0.6364
p = 13  0.0455   0.0909   0.1364   1.0000   1.0000
p = 12  0.0909   0.1818   0.2727   1.0000   1.0000
p = 11  0.1364   1.0000   1.0000   1.0000   1.0000
and the expected-utility-optimizing offer is Offer(4, $18). If NA makes that offer then the expected surplus is $0.95. The matrix above contains the "maximally non-committal" values for P(OPAcc(w, p)); those values are recalculated each time a signal arrives. The example demonstrates how NA is able to conduct multi-issue bargaining in a focussed way without making assumptions about OP's internals, in particular whether OP is aware of a utility function (Osborne and Rubinstein, 1990).
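The reported offer can be checked mechanically by scanning the matrix for the deal that maximizes (p − U(w)) × P(OPAcc(w, p)); the sketch below uses the utilities and probabilities given above and recovers Offer(4, $18) with an expected surplus of about $0.95.

```python
# A sketch: pick the expected-surplus-optimizing offer from the Sec. 3.3
# matrix. Utilities and probabilities are the values given in the text.
utility = {0: 11.00, 1: 11.50, 2: 12.00, 3: 13.00, 4: 14.50}   # U(w years warranty)

p_opacc = {  # P(OPAcc(w, p)) for w = 0..4, rows p = 20 down to 11
    20: [0.0000, 0.0000, 0.0000, 0.0455, 0.0909],
    19: [0.0000, 0.0000, 0.0000, 0.0909, 0.1818],
    18: [0.0000, 0.0000, 0.0000, 0.1364, 0.2727],
    17: [0.0000, 0.0000, 0.0000, 0.1818, 0.3636],
    16: [0.0000, 0.0000, 0.0000, 0.2273, 0.4545],
    15: [0.0000, 0.0000, 0.0000, 0.2727, 0.5454],
    14: [0.0000, 0.0000, 0.0000, 0.3182, 0.6364],
    13: [0.0455, 0.0909, 0.1364, 1.0000, 1.0000],
    12: [0.0909, 0.1818, 0.2727, 1.0000, 1.0000],
    11: [0.1364, 1.0000, 1.0000, 1.0000, 1.0000],
}

best = max(((w, p) for p in p_opacc for w in range(5)),
           key=lambda wp: (wp[1] - utility[wp[0]]) * p_opacc[wp[1]][wp[0]])
w, p = best
print(best, round((p - utility[w]) * p_opacc[p][w], 2))   # (4, 18) 0.95
```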
3.4 Two Issues With Decay
Following from the previous section, suppose that K contains κ_11, κ_12, κ_13 and κ_14. The two preference orderings κ_11 and κ_12 induce a partial ordering on the sentence probabilities in the P(OPAcc(w, p)) array [as in Sec. 3.3], from the top-left where the probabilities are 0, to the bottom-right where the probabilities are 1. There are fifty-one possible worlds that are consistent with K.
Suppose that B contains: β_11 : OPAcc(2, 16), β_12 : OPAcc(2, 14), β_13 : OPAcc(1, 11) and β_14 : OPAcc(3, 13) — this is the same offer sequence as considered in Sec. 3.3 — and, with a 10% decay in integrity for each time step: P(β_11) = 0.4, P(β_12) = 0.2, P(β_13) = 0.7 and P(β_14) = 0.9. Belief β_11 is inconsistent with K ∪ {β_12} as together they violate the sentence probability ordering induced by κ_11 and κ_12. Resolving this issue is a job for the belief revision function R, which discards the older, and weaker, belief β_11.
Eqn. 4 is used to calculate the distribution W_{K,B}, which has just five different probabilities in it. The resulting values for the three λ's are: λ_12 = 2.8063, λ_13 = 2.0573 and λ_14 = 2.5763.
P(OPAcc(w, p)) now is:
        w = 0    w = 1    w = 2    w = 3    w = 4
p = 20  0.0134   0.0269   0.0286   0.0570   0.0591
p = 19  0.0269   0.0537   0.0571   0.1139   0.1183
p = 18  0.0403   0.0806   0.0857   0.1709   0.1774
p = 17  0.0537   0.1074   0.1143   0.2279   0.2365
p = 16  0.0671   0.1343   0.1429   0.2849   0.2957
p = 15  0.0806   0.1611   0.1714   0.3418   0.3548
p = 14  0.0940   0.1880   0.2000   0.3988   0.4139
p = 13  0.3162   0.6324   0.6728   0.9000   0.9173
p = 12  0.3331   0.6662   0.7088   0.9381   0.9576
p = 11  0.3500   0.7000   0.7447   0.9762   0.9978
In this array, the derived sentence probabilities for the three sentences in B — 0.2000 at (2, $14), 0.7000 at (1, $11) and 0.9000 at (3, $13) — are exactly their given values.
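A minimal sketch of the revision step that catches the β_11/β_12 clash is given below: whenever two beliefs about OPAcc violate the partial ordering induced by κ_11 and κ_12 (more warranty or a lower price can only make a deal more acceptable to OP), the older belief is discarded. The class and field names are illustrative.

```python
# A sketch of the revision function's consistency check: kappa_11 and kappa_12
# require P(OPAcc(w, p)) to be non-decreasing in w and non-increasing in p,
# so an older belief that breaks that ordering against a newer one is dropped.
from dataclasses import dataclass
from typing import List

@dataclass
class Belief:
    w: int        # warranty years
    p: int        # price
    prob: float   # given sentence probability
    t: int        # arrival time

def dominates(a: Belief, b: Belief) -> bool:
    """Deal a is at least as attractive to OP as deal b (kappa_11, kappa_12)."""
    return a.w >= b.w and a.p <= b.p

def revise(beliefs: List[Belief]) -> List[Belief]:
    kept = list(beliefs)
    for a in beliefs:
        for b in beliefs:
            # ordering violated: the more attractive deal has the lower probability
            if dominates(a, b) and a.prob < b.prob:
                older = min(a, b, key=lambda x: x.t)
                if older in kept:
                    kept.remove(older)
    return kept

# The Sec. 3.4 example: beta_11 = OPAcc(2,16) with 0.4 clashes with
# beta_12 = OPAcc(2,14) with 0.2, and being older it is discarded.
B = [Belief(2, 16, 0.4, t=1), Belief(2, 14, 0.2, t=2),
     Belief(1, 11, 0.7, t=3), Belief(3, 13, 0.9, t=4)]
print([(b.w, b.p) for b in revise(B)])   # [(2, 14), (1, 11), (3, 13)]
```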
4 ESTIMATING P(NAAcc(.))
The proposition NAAcc(δ) means: "δ is acceptable to NA". This section describes how NA attaches a conditional probability to the proposition, P(NAAcc(δ) | I_t), in the light of information I_t. The meaning of "acceptable to NA" is described below. This is intended to put NA in the position of "looking back on it, I made the right decision at the time" — this is a vague notion but it makes sense to the authors. The idea is for NA to accept a deal δ when P(NAAcc(δ) | I_t) ≥ α for some threshold value α that is one of NA's mental states.
P(NAAcc(δ) | I_t) is derived from conditional probabilities attached to four other propositions:

P(Suited(ω) | I_t),
P(Good(OP) | I_t),
P(Fair(δ) | I_t ∪ {Suited(ω), Good(OP)}), and
P(Me(δ) | I_t ∪ {Suited(ω), Good(OP)}),

meaning respectively: "terms ω are perfectly suited to my needs", "OP will be a good agent for me to be doing business with", "δ is generally considered to be a good deal for NA", and "on strictly subjective grounds, δ is acceptable to NA". The last two of these four probabilities factor out both the suitability of ω and the appropriateness of the opponent OP. The difference between the third and fourth is that the third captures the concept of "a good market deal" and the fourth a strictly subjective "what ω is worth to NA". The "Me(.)" proposition is related to the concept of a private valuation in game theory.
To determine P(Suited(ω) | I_t): if there are sufficiently strong preference relations to establish extrema for this distribution then they may be assigned the extreme values 0.0 or 1.0. NA is repeatedly asked to provide probability estimates for the offer ω that yields the greatest reduction in entropy for the resulting distribution (MacKay, 2003). This continues until NA considers the distribution to be "satisfactory". This is tedious, but the "preference acquisition bottleneck" appears to be an inherently costly business (Castro-Schez et al., 2004).
Determining P(Good(OP) | I_t) involves an assessment of the reliability of OP. For some retailers (sellers), information — of varying reliability — may be extracted from sites that rate them. For individuals, this may be done either through assessing their reputation established during prior trades (Ramchurn et al., 2003; Sierra and Debenham, 2005), or by the inclusion of some third-party escrow service that is then rated for "reliability" instead.
P(Fair(δ) | I_t ∪ {Suited(ω), Good(OP)}) is determined by market data. As for dealing with Suited, if the preference relations establish extrema for this distribution then extreme values may be assigned. Independently of this, real market data, qualified with given sentence probabilities, is fed into the distribution. The revision function R identifies and removes inconsistencies, and missing values are estimated using the maximum entropy distribution.
Determining P(Me(δ) | I_t ∪ {Suited(ω), Good(OP)}) is a subjective matter. It is specified using the same device as used for Fair, except that the data is fed in by hand "until the distribution appears satisfactory". To start this process, first identify those δ that NA "would never accept" — they are given a probability of 0.0 — and second those δ that NA "would be delighted to accept" — they are given a probability of 1.0. The Me proposition links the information-theory approach with "private valuations" in game theory.
There is no causal relationship between the four
probability distributions as they have been defined,
with the possible exception of the third and fourth. To
link the probabilities associated with the five proposi-
tions, the probabilities are treated as epistemic prob-
abilities and the nodes form a simple Bayesian net.
The weights on the four arcs of the Bayesian net are a
subjective representation of what “acceptable” means
to NA. The resulting net divides the problem of esti-
mating P(NAAcc) into four simpler sub-problems.
The conditionals on the Bayesian network are subjective — they are easy to specify because twelve of them are zero, namely those for the cases in which NA believes that either Me or Suited is "false". For example, if the conditionals (set by NA) are:

P(NAAcc | Me, Suited, Good, Fair) = 1.0
P(NAAcc | Me, Suited, ¬Good, Fair) = 0.1
P(NAAcc | Me, Suited, Good, ¬Fair) = 0.4
P(NAAcc | Me, Suited, ¬Good, ¬Fair) = 0.05

then, with probabilities of 0.9 on each of the four evidence nodes, the probability P(NAAcc) = 0.75.
It then remains to manage the acquisition of information I_t from the available sources to, if necessary, increase P(NAAcc(δ) | I_t) so that δ is acceptable. The conditional probabilities on the net represent an agent's priorities for a deal, and so they are specified for each
class of deal.
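The marginalization implied by such a table can be sketched as follows. Treating the four evidence nodes as independent is an assumption of this sketch, not a claim about the authors' net, so the value it prints is illustrative rather than a reproduction of the 0.75 quoted above.

```python
# A sketch: marginalize a conditional probability table for NAAcc over the
# four evidence nodes Me, Suited, Good, Fair, *assuming* the evidence nodes
# are independent. Conditionals with Me or Suited false are zero.
from itertools import product

cpt = {  # P(NAAcc | Me, Suited, Good, Fair): the non-zero rows from the text
    (True, True, True, True): 1.0,
    (True, True, False, True): 0.1,
    (True, True, True, False): 0.4,
    (True, True, False, False): 0.05,
}

def p_naacc(p_me, p_suited, p_good, p_fair):
    evidence = (p_me, p_suited, p_good, p_fair)
    total = 0.0
    for states in product([True, False], repeat=4):
        weight = 1.0
        for s, p in zip(states, evidence):
            weight *= p if s else (1.0 - p)
        total += weight * cpt.get(states, 0.0)   # zero when Me or Suited is false
    return total

print(round(p_naacc(0.9, 0.9, 0.9, 0.9), 3))
```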
The NAAcc predicate generalizes the notion of utility. Suppose that NA knows its utility function U. If the conditionals on the Bayesian net are as in the previous paragraph and if either P(Me(.)) or P(Suited(.)) is zero then P(NAAcc(.)) will be zero. If the conditional probabilities on the Bayesian net are 1.0 when Me is true and 0.0 otherwise then P(NAAcc) = P(Me). Then define: P(Me(τ, ω)) = ½ × (1 + (U(ω) − U(τ)) / (U(ω̄) − U(τ))) for U(ω) > U(τ) and zero otherwise, where ω̄ = arg max_ω U(ω).⁴ A bargaining threshold α > 0.5 will then accept offers for which the surplus is positive. In this way NAAcc represents utility-based bargaining with a private valuation.
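Written out, the mapping from surplus to P(Me) and the acceptance test against the threshold α look like this (the utility values are invented for illustration):

```python
# A sketch of the utility-based special case: P(Me) maps a positive surplus
# into (0.5, 1], so a threshold alpha > 0.5 accepts exactly the deals with
# positive surplus. The utility values here are invented.
def p_me(u_omega: float, u_tau: float, u_best: float) -> float:
    """u_best = U(omega_bar), the utility of the best possible terms."""
    if u_omega <= u_tau:
        return 0.0
    return 0.5 * (1.0 + (u_omega - u_tau) / (u_best - u_tau))

alpha = 0.55
deal_ok = p_me(u_omega=13.0, u_tau=12.0, u_best=20.0) >= alpha
print(p_me(13.0, 12.0, 20.0), deal_ok)   # 0.5625 True
```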
NAAcc is also intended to be able to represent apparently irrational bargaining situations (e.g. "I've just got to have that hat"), as well as tricky multi-issue
problems such as those typical in eProcurement. It
enables an agent to balance the degree of suitability
of the terms offered with the reliability of the oppo-
nent and with the fairness of the deal.
5 NEGOTIATION STRATEGIES
Sec. 3 estimated the probability distribution,
P(OPAcc), that OP will accept an offer, and Sec. 4
estimated the probability distribution, P(NAAcc),
that NA should be prepared to accept an offer. These
two probability distributions represent the opposing
interests of the two agents NA and OP. P(OPAcc)
will change every time an offer is made, rejected or
accepted. P(NAAcc) will change as the background
information changes. This section discusses NA’s
strategy S.
Bargaining can be a game of bluff and counter-bluff in which an agent may even not intend to close the deal if one should be reached. A basic conundrum in any offer-exchange bargaining is: it is impossible to force your opponent to reveal information about their position without revealing information about your own position. Further, by revealing information about your own position you may change your opponent's position, and so on.⁵ This infinite regress, of speculation and counter-speculation, is avoided here by ignoring the internals of the opponent and by focussing on what is known for certain — that is: what information is contained in the signals received and when those signals arrived.

⁴ The introduction of ω̄ may be avoided by defining P(Me(τ, ω)) ≜ 1 / (1 + exp(−β × (U(ω) − U(τ)))) for U(ω) ≥ U(τ) and zero otherwise, where β is some constant. This is the sigmoid transfer function used in some neural networks. This function is near-linear for U(ω) ≈ U(τ), and is concave, or "risk averse", outside that region. The transition between these two behaviors is determined by the choice of β.

⁵ This is reminiscent of Werner Heisenberg's indeterminacy relation, or Unbestimmtheitsrelationen: "you can't measure one feature of an object without changing another" — with apologies.
A fundamental principle of competitive bargaining
is “never reveal your best price”, and another is “never
reveal your deadline if you have one” (Sandholm
and Vulkan, 1999). It is not possible to be prescrip-
tive about what an agent should reveal. All that can
be achieved is to provide strategies that an agent may
choose to employ. The following are examples of
such strategies.
An agent's strategy S is a function of the information I_t that it has at time t. That information will be represented in the agent's K and B, and will have been used to calculate P(OPAcc) and P(NAAcc). Simple strategies choose an offer only on the basis of P(OPAcc), P(NAAcc) and α. The greedy strategy S⁺ chooses arg max_δ {P(NAAcc(δ)) | P(OPAcc(δ)) ≫ 0}; it is appropriate for an agent that believes OP is desperate to trade. The expected-acceptability-to-NA-optimizing strategy chooses arg max_δ {P(OPAcc(δ)) × P(NAAcc(δ)) | P(NAAcc(δ)) ≥ α}; it is appropriate for a confident agent that is not desperate to trade. A third strategy chooses arg max_δ {P(OPAcc(δ)) | P(NAAcc(δ)) ≥ α}; it optimizes the likelihood of trade — it is a good strategy for an agent that is keen to trade without compromising its own standards of acceptability.
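Over a finite set of candidate deals the three strategies reduce to one-liners; the sketch below assumes the two distributions are available as dictionaries keyed by deal, and the floor argument stands in for "well above zero" in the greedy strategy. All names are illustrative.

```python
# A sketch of the three simple strategies over a finite set of candidate
# deals, given the two distributions as dictionaries {deal: probability}.
def greedy(p_op, p_na, floor=0.2):
    """S+: best deal for NA among deals OP is reasonably likely to accept."""
    candidates = [d for d in p_na if p_op[d] >= floor]
    return max(candidates, key=lambda d: p_na[d])

def expected_acceptability(p_op, p_na, alpha):
    """Maximize P(OPAcc) x P(NAAcc) over deals NA itself finds acceptable."""
    candidates = [d for d in p_na if p_na[d] >= alpha]
    return max(candidates, key=lambda d: p_op[d] * p_na[d])

def likelihood_of_trade(p_op, p_na, alpha):
    """Maximize the chance of agreement without dropping below alpha."""
    candidates = [d for d in p_na if p_na[d] >= alpha]
    return max(candidates, key=lambda d: p_op[d])
```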
An approach to issue-tradeoffs is described in (Faratin et al., 2003). The bargaining strategy described there attempts to make an acceptable offer by "walking round" the iso-curve of NA's previous offer (that has, say, an acceptability of α_na ≥ α) towards OP's subsequent counter offer. In terms of the machinery described here, an analogue is to use the strategy arg max_δ {P(OPAcc(δ)) | P(NAAcc(δ) | I_t) ≥ α_na} for α = α_na. This is reasonable for an agent that is attempting to be accommodating without compromising its own interests. Presumably such an agent will have a policy for reducing the value α_na if her deals fail to be accepted. The complexity of the strategy in (Faratin et al., 2003) is linear in the number of issues. The strategy described here does not have that property, but it benefits from using P(OPAcc), which contains footprints of the prior offer sequence — see Sec. 3.4 — in that distribution more recent offers have stronger weights.
6 CONCLUSIONS
The negotiating agent achieves its goal of reaching informed decisions whilst making no assumptions about the internals of its opponent. The agent is founded on information theory and is 'driven' by real-time information flows derived from its environment, which includes the Internet. Existing text and data mining bots have been used to feed information into NA in experiments, including a negotiation between two agents attempting to swap a mobile phone for a digital camera with no cash involved.
NA has five ways of leading a negotiation towards
a positive outcome. First, by making more attractive
offers to OP. Second, by reducing its threshold α.
Third, by acquiring information to hopefully increase
the acceptability of offers received. Fourth, by en-
couraging OP to submit more attractive offers. Fifth,
by encouraging OP to accept NA's offers. The first
two of these have been described. The third has been
implemented but is not described here. The remaining
two are the realm of argumentation-based negotiation
which is the next step in this project. The integrated
way in which NA manages both the negotiation and
the information acquisition should provide a sound
basis for an argumentation-based negotiator.
(Halpern, 2003) discusses problems with the
random-worlds approach, and notes particularly rep-
resentation and learning. Representation is particu-
larly significant here — for example, the logical con-
stants in the price domains could have been given
other values, and, as long as they remained or-
dered, and as long as the input values remained un-
changed, the probability distributions would be unal-
tered. Learning is not an issue now as the distributions
are kept as simple as possible and are re-computed
each time step. The assumptions of maximum entropy
probabilistic logic exploit the agent’s limited rational-
ity by attempting to assume “precisely no more than
is known”. But, the computations involved will be
substantial if the domains in the language L are large,
and will be infeasible if the domains are unbounded.
If the domains are large then preference relations such
as κ_1 can simplify the computations substantially.
Much has not been described here including: the
data and text mining software, the use of the Bayesian
net to prompt a search for information that may lead
to NA raising — or perhaps lowering — its acceptability threshold, and the way in which the incoming
information is structured to enable its orderly acqui-
sition (Debenham, 2004). We have not described the
belief revision and the identification of those random
worlds that are consistent with K.
The following issues are presently being investi-
gated. The random worlds computations are per-
formed each time the knowledge, K, or beliefs, B, alter — there is scope for using approximate updating techniques interspersed with the exact calculations. The offer accepting machinery operates independently from the offer making machinery, but not vice versa — this may mean that better deals could have been struck under some circumstances.
REFERENCES
Bulow, J. and Klemperer, P. (1996). Auctions versus nego-
tiations. American Economic Review, 86(1):180–194.
Castro-Schez, J., Jennings, N., Luo, X., and Shadbolt, N.
(2004). Acquiring domain knowledge for negotiating
agents: a case study. International Journal of Human-
Computer Studies, 61(1):3 – 31.
Debenham, J. (2004). A bargaining agent aims to
‘play fair’. In proceedings Twenty-fourth Interna-
tional Conference on Innovative Techniques and Ap-
plications of Artificial Intelligence, pages 173–186.
Springer-Verlag: Heidelberg, Germany.
Faratin, P., Sierra, C., and Jennings, N. (2003). Using
similarity criteria to make issue trade-offs in auto-
mated negotiation. Journal of Artificial Intelligence,
142(2):205–237.
Halpern, J. (2003). Reasoning about Uncertainty. MIT
Press.
Jaynes, E. (1957). Information theory and statistical me-
chanics: Part I. Physical Review, 106:620 – 630.
Kraus, S. (2001). Strategic Negotiation in Multiagent Envi-
ronments. MIT Press.
MacKay, D. (2003). Information Theory, Inference and
Learning Algorithms. Cambridge University Press.
Muthoo, A. (1999). Bargaining Theory with Applications.
Cambridge UP.
Myerson, R. and Satterthwaite, M. (1983). Efficient mecha-
nisms for bilateral trading. Journal of Economic The-
ory, 29:1–21.
Neeman, Z. and Vulkan, N. (2000). Markets versus ne-
gotiations. Technical report, Center for Rationality
and Interactive Decision Theory, Hebrew University,
Jerusalem.
Osborne, M. J. and Rubinstein, A. (1990). Bargaining and
Markets. Academic Press.
Pietra, S. D., Pietra, V. D., and Lafferty, J. (1997). Inducing
features of random fields. IEEE Transactions on Pat-
tern Analysis and Machine Intelligence, 19(2):380–
393.
Ramchurn, S., Jennings, N., Sierra, C., and Godo, L.
(2003). A computational trust model for multi-agent
interactions based on confidence and reputation. In
Proceedings 5th Int. Workshop on Deception, Fraud
and Trust in Agent Societies.
Sandholm, T. and Vulkan, N. (1999). Bargaining with dead-
lines. In Proceedings of the National Conference on
Artificial Intelligence (AAAI).
Sierra, C. and Debenham, J. (2005). An information-based
model for trust. In Dignum, F., Dignum, V., Koenig,
S., Kraus, S., Singh, M., and Wooldridge, M., editors,
Proceedings Fourth International Conference on Au-
tonomous Agents and Multi Agent Systems AAMAS-
2005, pages 497–504, Utrecht, The Netherlands.
ACM Press, New York.