Conflict Resolution of Production-Marketing Collaborative Planning
based on Multi-Agent Self-adaptation Negotiation
Hao Li, Ting Pang, Yuying Wu and Guorui Jiang
School of Economics and Management, Beijing University of Technology,
No. 100, Pingleyuan, Chaoyang District, Beijing, China
Keywords: Production-Marketing Collaborative Conflict, Multi-Agent, Self-adaptation Negotiation, RBF Neural
Network, Q-reinforcement Learning.
Abstract: To overcome the lack of adaptability and learning ability in traditional negotiation, we take supply chain
production-marketing collaborative planning negotiation as the research object, design a five-element
negotiation model, adopt a negotiation strategy based on Q-reinforcement learning, and optimize that
strategy with an RBF neural network that predicts the opponent's information and adjusts the extent of
concession. Finally, we give an example which verifies that, compared with un-optimized Q-reinforcement
learning, the proposed strategy enhances the ability of the negotiation Agents, reduces the number of
negotiation rounds, and improves the efficiency of resolving the conflicts of production-marketing
collaborative planning.
1 INTRODUCTION
To meet dynamically changing market requirements quickly and in time, the retailer and the
manufacturer in a supply chain establish contracts early and draw up merchandise procurement plans.
Production-marketing planning has thus changed from simple trading to consideration of the overall
interests of the supply chain. However, because the objectives of different enterprises usually differ,
disagreements and conflicts arise frequently.
Distributed Agent technology has characteristics such as interaction, autonomy and learning (Wang,
2013), and it is not restricted by time and space. Using multi-agent-based negotiation in supply chain
production-marketing collaborative planning not only resolves conflicts but also addresses the problem of
enterprise decentralization. Many scholars have applied Agent technology to supply chain negotiation.
For example, Kumar proposed a multi-agent system that selects the best supplier through automatic
negotiation based on cost, distance and quality (Kumar, 2011); Sara used a multi-agent system to simulate
a multi-echelon supply chain, controlling inventory and cost by sharing information and predictive
knowledge (Sara, 2012). To adapt to the environment and the opponent's dynamic information, scholars in
the intelligent negotiation field began to introduce self-learning mechanisms into negotiation. For
instance, Cheng learned the opponent's utility function using SVM (Cheng, 2009). Q-reinforcement
learning, proposed by Watkins (Watkins, 1992), is an algorithm that is independent of an environment
model: each action an Agent takes during negotiation has a return value Q, which is used to evaluate the
Agent's present action, predict its next action, and generate proposals by calculating Q (Sui, 2010). Many
studies have introduced Q-reinforcement learning into collaborative negotiations (Shen, 2012; Ariel,
2013) to resolve conflicts effectively and optimize the collaborative effect.
The studies above all improve the operational effectiveness of the supply chain, but some shortcomings
remain. There are still few studies on supply chain negotiation that adopt adaptive algorithms; the
self-learning ability and adaptability of negotiation Agents are relatively poor; most negotiation
strategies based on Q-reinforcement learning are not self-adaptive, few studies adjust the Q-value
according to the opponent's behaviour, and the convergence speed is still slow. To address these problems,
we propose a multi-agent adaptive negotiation method for the conflict of supply chain
production-marketing collaborative planning.
Considering the multiple issues between retailers and manufacturers, we establish a negotiation strategy
based on Q-reinforcement learning to ensure that both sides make different degrees of concession, resolve
conflicts and obtain satisfactory results. At the same time, we use an RBF neural network to optimize the
negotiation strategy: it predicts the opponent's action and adjusts the Q-value, which improves the speed
of convergence and reduces the number of negotiation rounds.
2 NEGOTIATION MODEL
Suppose the negotiation model is defined as $H = \{G, X, W, T, O\}$; each element is defined as follows:
(1) $G$ is the set of negotiation Agents, $G = \{A_m, A_r\}$, where $A_m$ is the manufacturer Agent and $A_r$ is the retailer Agent;
(2) $X$ is the set of negotiation issues. Supposing the issues are quantitative, $X = \{x_1, x_2, \ldots, x_n\}$, where $n$ is the number of negotiation issues; the issues may be price, quantity, date of delivery, defect rate and so on;
(3) $W$ is the set of negotiation issue weights, $W = \{\omega_1, \omega_2, \ldots, \omega_n\}$, where $\omega_i$ is the preference value of an Agent on issue $x_i$ ($1 \le i \le n$); we suppose both Agents have the same preference values;
(4) $T$ is the set of negotiation time limits, $T = \{T_m, T_r\}$, where $T_m$ and $T_r$ respectively are the maximum numbers of negotiation rounds set by the manufacturer Agent and the retailer Agent;
(5) $O$ is the set of issue boundary values, $O = \{O_m, O_r\}$, where $O_{m,x_i} \in [O_{m,x_i}^{\min}, O_{m,x_i}^{\max}]$ denotes the range of values that the manufacturer Agent can accept on issue $x_i$, and $O_{r,x_i} \in [O_{r,x_i}^{\min}, O_{r,x_i}^{\max}]$ denotes the range of values that the retailer Agent can accept on issue $x_i$.
Let $t$ denote the $t$-th negotiation round; $O_{m,x}^{t}$ and $O_{r,x}^{t}$ respectively denote the comprehensive values over all issues that the manufacturer Agent and the retailer Agent propose at the $t$-th round, as in (1) and (2):

$O_{m,x}^{t} = \sum_{i=1}^{n} \left( O_{m,x_i}^{t}\,\omega_i \right)$   (1)

$O_{r,x}^{t} = \sum_{i=1}^{n} \left( O_{r,x_i}^{t}\,\omega_i \right)$   (2)

where $O_{m,x_i}^{t}$ and $O_{r,x_i}^{t}$ respectively are the proposal values on issue $x_i$ that the manufacturer Agent and the retailer Agent put forward at the $t$-th round. As the round number $t$ increases, $O_{m,x_i}^{t}$ decreases within the acceptable range $[O_{m,x_i}^{\min}, O_{m,x_i}^{\max}]$, while $O_{r,x_i}^{t}$ increases within the acceptable range $[O_{r,x_i}^{\min}, O_{r,x_i}^{\max}]$. When the absolute value of the difference between $O_{m,x_i}^{t}$ and $O_{r,x_i}^{t}$ is less than $l$ ($l$ is a positive number smaller than 0.5), the negotiation succeeds, and the final trading value $O_{x_i}^{t}$ of $x_i$ is their average, as in (3):

$O_{x_i}^{t} = \dfrac{O_{m,x_i}^{t} + O_{r,x_i}^{t}}{2}$   (3)
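To make the model concrete, the following Python sketch (our own illustration, not code from the paper; names such as NegotiationModel, comprehensive and settle are hypothetical) encodes the five-element model, the weighted comprehensive values of (1)-(2), the agreement threshold $l$, and the settlement rule of (3).

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass
class NegotiationModel:
    """Five-element negotiation model H = {G, X, W, T, O} for one pair of Agents."""
    issues: Dict[str, float]                  # X with weights W: issue name -> weight w_i
    bounds_m: Dict[str, Tuple[float, float]]  # O_m: manufacturer's [min, max] per issue
    bounds_r: Dict[str, Tuple[float, float]]  # O_r: retailer's [min, max] per issue
    t_max_m: int                              # T_m: manufacturer's round limit
    t_max_r: int                              # T_r: retailer's round limit
    l: float = 0.2                            # agreement threshold (l < 0.5)

    def comprehensive(self, proposal: Dict[str, float]) -> float:
        # Eq. (1)/(2): weighted sum of the issue values in one proposal
        return sum(proposal[x] * w for x, w in self.issues.items())

    def agreement_reached(self, prop_m: Dict[str, float], prop_r: Dict[str, float]) -> bool:
        # Agreement test on the weighted comprehensive values, as used in the
        # worked example of Section 5
        return abs(self.comprehensive(prop_m) - self.comprehensive(prop_r)) < self.l

    def settle(self, prop_m: Dict[str, float], prop_r: Dict[str, float]) -> Dict[str, float]:
        # Eq. (3): the final trading value of each issue is the per-issue average
        return {x: (prop_m[x] + prop_r[x]) / 2 for x in self.issues}
```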
3 THE NEGOTIATION
STRATEGY BASED ON
Q-REINFORCEMENT
LEARNING
The Q-reinforcement learning method performs action $a_t$ in state $s_t$ according to the Q function, which gives the discounted cumulative reward the action obtains (Shen, 2007):

$Q(s_t, a_t) = r_t + \gamma \max\{Q(s_{t+1}, a_{t+1})\}$

where $r_t$ denotes the reward the Agent receives after transferring from state $s_t$ to state $s_{t+1}$; its value can be positive, negative or zero. $\gamma$ is the discount factor, and $Q(s_{t+1}, a_{t+1})$ is the expectation after the Agent transfers to state $s_{t+1}$. During Q-reinforcement learning the Agent experiences a series of time steps; at every time step the learning steps are: (1) observe the current state $s_t$; (2) select and perform an action $a_t$; (3) observe the next state $s_{t+1}$; (4) receive the reinforcement signal and adjust the Q-expectation according to the established Q-value formula.
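As a minimal illustration of this update rule (our own sketch; the tabular dictionary representation and the name q_update are assumptions, not the paper's implementation), one learning step could look as follows.

```python
def q_update(Q, s_t, a_t, r_t, s_next, actions, gamma=0.9):
    """One Q-reinforcement learning step:
    Q(s_t, a_t) = r_t + gamma * max_a Q(s_next, a).

    Q       : dict mapping (state, action) -> Q-value
    actions : iterable of actions available in the next state s_next
    """
    best_next = max(Q.get((s_next, a), 0.0) for a in actions)  # max Q(s_{t+1}, a_{t+1})
    Q[(s_t, a_t)] = r_t + gamma * best_next                    # reward plus discounted expectation
    return Q
```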
Following the idea of Q-reinforcement learning, we add the reinforcement learning Q-value to the proposal process of negotiation and give each Agent a reward so that each Agent makes a concession to some extent and both Agents reach agreement as soon as possible. We assume that $r_t$ is positive when the negotiation succeeds, negative when the negotiation fails, and 0 while the negotiation is still in progress. During the negotiation the Q-value increases constantly, so the current $\max Q(s_{t+1}, a_{t+1})$ is the Q-expectation obtained from the last negotiation round. Based on the above hypothesis, the Q-expectation of an Agent during negotiation is $Q^{t-1}$.
First, we define the Q-expectation of the Agents as
ICAART2014-InternationalConferenceonAgentsandArtificialIntelligence
210
shown in (4), (5).
The initial Q-expectation that the retailer Agent forms on issue $x_i$ at the $t$-th negotiation round:
$Q_{r}^{t} = O_{r,x_i}^{\min} + \left(O_{r,x_i}^{\max} - O_{r,x_i}^{\min}\right) d_{x_i}^{t} - O_{r,x_i}^{t}$   (4)
The initial Q-expectation that the manufacturer Agent forms on issue $x_i$ at the $t$-th negotiation round:
$Q_{m}^{t} = O_{m,x_i}^{\min} + \left(O_{m,x_i}^{\max} - O_{m,x_i}^{\min}\right) d_{x_i}^{t} - O_{m,x_i}^{t}$   (5)
To control the growth speed of the Q-value, we define the Q-value as an average expectation, as shown in (6) and (7).
The average Q-expectation of the retailer Agent on issue $x_i$ at the $t$-th negotiation round:
$Q_{r,x_i}^{t} = \dfrac{Q_{r}^{t-1}}{t}$   (6)
The average Q-expectation of the manufacturer Agent on issue $x_i$ at the $t$-th negotiation round:
$Q_{m,x_i}^{t} = \dfrac{Q_{m}^{t-1}}{t}$   (7)
The discount factor $\gamma$ controls the changing speed of the reward value and also affects the concession degree of both Agents in Q-reinforcement learning.
The discount factor $\gamma$ of the retailer Agent on issue $x_i$ at the $t$-th negotiation round:
$\gamma_{r}^{t} = 1 - \dfrac{\hat{O}_{r,x_i}^{t}}{t\, O_{r,x_i}^{t-1}}$   (8)
where $\hat{O}_{r,x_i}^{t}$ is the proposal value that the retailer Agent predicts the manufacturer Agent will offer on issue $x_i$ at the $t$-th negotiation round.
The discount factor $\gamma$ of the manufacturer Agent on issue $x_i$ at the $t$-th negotiation round:
$\gamma_{m}^{t} = 1 - \dfrac{\hat{O}_{m,x_i}^{t}}{t\, O_{m,x_i}^{t-1}}$   (9)
where $\hat{O}_{m,x_i}^{t}$ is the proposal value that the manufacturer Agent predicts the retailer Agent will offer on issue $x_i$ at the $t$-th negotiation round.
Each Agent makes concessions on the basis of the reward value at every negotiation round; the concession degree and each proposal value are defined as shown in (10) and (11).
The proposal value that the retailer Agent offers on issue $x_i$ at the $t$-th negotiation round:
$O_{r,x_i}^{t} = O_{r,x_i}^{\min} + Q_{r,x_i}^{t}$   (10)
The proposal value that the manufacturer Agent offers on issue $x_i$ at the $t$-th negotiation round:
$O_{m,x_i}^{t} = O_{m,x_i}^{\min} + Q_{m,x_i}^{t}$   (11)
4 NEGOTIATION STRATEGY
OPTIMIZATION BASED ON
RBF NEURAL NETWORK
4.1 Design of the Network Structure
To reach agreement as soon as possible, we optimize the discount factor $\gamma$ in Q-reinforcement learning and make reasonable concessions by predicting the opponent's proposal value with an RBF neural network (Shi, 2009). Taking the retailer Agent as an example, to approximate $\hat{O}_{r,x_i}^{t}$ we design a three-layer feed-forward network, as shown in Figure 3.
Figure 3: RBF neural network (input layer with nodes $\bar{O}_{x_i}^{H}$, $\bar{O}_{r,x_i}^{t-1}$, $\bar{O}_{m,x_i}^{t-1}$; hidden layer with centres $C_1, C_2, \ldots, C_S$; output layer giving $\hat{O}_{r,x_i}^{t}$).
The input layer contains three nodes with input vector $O = [o_1, o_2, o_3] = [\bar{O}_{x_i}^{H}, \bar{O}_{r,x_i}^{t-1}, \bar{O}_{m,x_i}^{t-1}]$. $\bar{O}_{x_i}^{H}$ is the average historical negotiation result of both Agents on issue $x_i$, as shown in (12); $\bar{O}_{r,x_i}^{t-1}$ is the average proposal of the retailer Agent on issue $x_i$ over the first $t-1$ rounds, as shown in (13); $\bar{O}_{m,x_i}^{t-1}$ is the average proposal of the manufacturer Agent on issue $x_i$ over the first $t-1$ rounds, as shown in (14). The hidden layer contains $S$ nodes; $C_p = [c_{p1}, c_{p2}, c_{p3}]^{T}$ ($1 \le p \le S$) is the data centre of the $p$-th node, with the same dimension as $O$; $\varphi = [\varphi(O, C_1), \varphi(O, C_2), \ldots, \varphi(O, C_S)]$ is the output matrix of the hidden layer, and $\varphi(\cdot)$ is the radial basis function, a Gaussian function that maps the input layer directly to the hidden layer, as shown in (15);
ConflictResolutionofProduction-marketingCollaborativePlanningbasedonMulti-AgentSelf-adaptationNegotiation
211
$E = [e_1, e_2, \ldots, e_S]^{T}$ is the output weight matrix. The output layer contains only one node, a simple linear weighted sum of the hidden layer output matrix, which gives the possible proposal value $\hat{O}_{r,x_i}^{t}$ that the retailer Agent predicts the manufacturer Agent will offer.
The average historical negotiation result:

$\bar{O}_{x_i}^{H} = \dfrac{1}{k}\sum_{j=1}^{k} O_{x_i}^{j}$   (12)
where $k$ is the number of historical negotiations on $x_i$ and $O_{x_i}^{j}$ is the result of the $j$-th historical negotiation.
The average proposal of the retailer Agent over the first $t-1$ rounds:

$\bar{O}_{r,x_i}^{t-1} = \dfrac{1}{t-1}\sum_{j=1}^{t-1} O_{r,x_i}^{j}$   (13)
The average proposal of the manufacturer Agent over the first $t-1$ rounds:

$\bar{O}_{m,x_i}^{t-1} = \dfrac{1}{t-1}\sum_{j=1}^{t-1} O_{m,x_i}^{j}$   (14)
The Gaussian radial basis function:

$\varphi(O, C_p) = \exp\!\left(-\dfrac{\lVert O - C_p\rVert^{2}}{2\sigma_p^{2}}\right), \quad p = 1, \ldots, S$   (15)
where $\sigma_p$ is the width of the $p$-th hidden node; its size determines the shape of the function.
The output value of the network output layer:

$\hat{O}_{r,x_i}^{t} = \sum_{p=1}^{S} \varphi(O, C_p)\, e_p$   (16)
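The forward pass of this predictor reduces to a few lines; the sketch below (our own code, with assumed names rbf_predict, centers, widths and weights) implements Eqs. (15)-(16) directly.

```python
import numpy as np

def rbf_predict(o, centers, widths, weights):
    """Forward pass of the three-layer RBF predictor (a sketch of Eqs. (12)-(16)).

    o       : length-3 input vector [avg historical result, avg own proposals, avg opponent proposals]
    centers : (S, 3) array of hidden-node data centres C_p
    widths  : (S,) array of node widths sigma_p
    weights : (S,) array of output weights e_p
    Returns the predicted opponent proposal O_hat for the current round.
    """
    o = np.asarray(o, dtype=float)
    dist2 = np.sum((centers - o) ** 2, axis=1)      # ||O - C_p||^2 for every hidden node
    phi = np.exp(-dist2 / (2.0 * widths ** 2))      # Eq. (15): Gaussian basis outputs
    return float(phi @ weights)                     # Eq. (16): linear weighted sum
```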
4.2 Network Parameter Learning
(1) The data centre parameters of the hidden layer are updated with the K-means clustering algorithm; the specific steps are as follows:

Step 1: Randomly select $S$ data samples as the initial data centres $C_p$ of the hidden layer ($1 \le p \le S$);

Step 2: Group the input vector $O = [o_1, o_2, o_3] = [\bar{O}_{x_i}^{H}, \bar{O}_{r,x_i}^{t-1}, \bar{O}_{m,x_i}^{t-1}]$ by the nearest cluster centre: if $\lVert O - C_s\rVert = \min_{p}\lVert O - C_p\rVert$ ($1 \le s \le S$), the sample $[o_1, o_2, o_3]$ belongs to class $\psi_s$;

Step 3: Calculate the average of the samples in class $\psi_p$ and update it as the new data centre, $C_p = \dfrac{1}{N_p}\sum_{O \in \psi_p} O$, where $N_p$ is the number of samples of the $p$-th node;

Step 4: If the difference between the new cluster centres and the original ones is less than $\varepsilon$, the obtained cluster centres are the final basis-function centres of the RBF neural network; if it is more than $\varepsilon$, return to Step 2;

Step 5: Determine the widths of the basis functions: $\sigma_p = \dfrac{1}{N_p}\sum_{O \in \psi_p} \lVert O - C_p\rVert^{2}$.
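A compact version of this clustering procedure is sketched below (our own illustrative code; fit_rbf_centers and its parameters are assumptions). It follows Steps 1-5, taking the square root of the Step 5 spread so that the returned widths are on the scale of a distance, which is one common convention.

```python
import numpy as np

def fit_rbf_centers(samples, S, eps=1e-3, max_iter=100):
    """K-means style centre/width estimation for the RBF hidden layer.

    samples : (N, 3) array of training input vectors O
    S       : number of hidden nodes
    Returns (centers, widths).
    """
    rng = np.random.default_rng(0)
    centers = samples[rng.choice(len(samples), S, replace=False)]   # Step 1
    labels = np.zeros(len(samples), dtype=int)
    for _ in range(max_iter):
        # Step 2: assign every sample to its nearest centre
        dists = np.linalg.norm(samples[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centre as the mean of its cluster
        new_centers = np.array([
            samples[labels == p].mean(axis=0) if np.any(labels == p) else centers[p]
            for p in range(S)
        ])
        # Step 4: stop when the centres move by less than eps
        if np.linalg.norm(new_centers - centers) < eps:
            centers = new_centers
            break
        centers = new_centers
    # Step 5: width of each node from the mean squared spread of its cluster
    widths = np.array([
        np.mean(np.sum((samples[labels == p] - centers[p]) ** 2, axis=1))
        if np.any(labels == p) else 1.0
        for p in range(S)
    ])
    return centers, np.sqrt(widths)
```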
(2) The output-layer weights are learned by the gradient-descent algorithm. Set the output error of a sample as $D = \dfrac{1}{2}\left(b_i - \sum_{p=1}^{S}\varphi(O, C_p)\, e_p\right)^{2}$, where $b_i$ is the expected output. The weights are then updated as $E \leftarrow E - \eta\,\dfrac{\partial D}{\partial E}$, where $\eta$ is the learning rate of the gradient-descent algorithm.
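For the output weights, one gradient-descent step on the squared error $D$ can be sketched as follows (again our own code; update_weights, phi and target are assumed names, with target playing the role of the expected output $b_i$).

```python
import numpy as np

def update_weights(weights, phi, target, lr=0.2):
    """One gradient-descent step for the RBF output weights.

    weights : (S,) current output weights E
    phi     : (S,) hidden-layer outputs phi(O, C_p) for one sample
    target  : expected output b_i for that sample
    lr      : learning rate eta
    """
    weights = np.asarray(weights, dtype=float)
    phi = np.asarray(phi, dtype=float)
    error = target - phi @ weights          # b_i - sum_p phi_p * e_p
    # dD/dE_p = -error * phi_p, so descending the gradient adds lr * error * phi
    return weights + lr * error * phi
```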
5 AN EXAMPLE OF SELF-ADAPTATION NEGOTIATION
We give an example to illustrate the feasibility of this strategy and the effectiveness of the negotiation method. Assume that in the manufacturing supply chain of an electronic product the retailer submits an order plan to the manufacturer, including price and quantity, but the two sides have conflicts over the collaborative plan; to avoid reaching an impasse, they start to negotiate using the proposed negotiation strategy. The items of this plan are regarded as issues; here we list part of the data ($n = 2$). Suppose the negotiation time limits are $T_r = 20$ and $T_m = 25$, the threshold $l = 0.2$, the number of hidden nodes $S = 5$ and the learning rate $\eta = 0.2$; the issue boundaries and weights are shown in Table 1.
Table 1: Data of the example.

Issue             $[O_{r,x_i}^{\min},\, O_{r,x_i}^{\max}]$    $[O_{m,x_i}^{\min},\, O_{m,x_i}^{\max}]$    $W$
$x_1$ (price)     [15, 60]                                    [10, 55]                                    0.4
$x_2$ (quantity)  [20, 100]                                   [40, 120]                                   0.2
Using the proposed adaptive negotiation method, we
ICAART2014-InternationalConferenceonAgentsandArtificialIntelligence
212
simulate conflict resolution for supply chain production-marketing collaborative planning. The simulated proposals (price and quantity over time) submitted by the manufacturer Agent and the retailer Agent are shown in Figure 4; the proposal of either Agent at each round is expressed as (price, quantity). It can be seen that the manufacturer and the retailer keep making concessions. At the fifth round the retailer Agent submits the proposal (38.00, 77) and the manufacturer Agent submits (38.15, 77); the difference between the comprehensive values of the two Agents over the two issues is 0.09, which is less than 0.2, so the negotiation finishes. According to formula (3), the final result is (38.08, 77) after 5 rounds of negotiation, a satisfactory outcome for both sides. We then run the simulation with the plain Q-reinforcement learning algorithm; the results are shown in Figure 5, whose parameters have the same meaning as in Figure 4. The final result is (43.35, 80) after 10 rounds. Comparing Figure 4 with Figure 5, we find that the Q-reinforcement learning algorithm optimized by the RBF neural network reduces the number of negotiation rounds and improves the efficiency of resolving the production-marketing collaborative planning conflict.
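For reference, the experimental setting of Table 1 can be written down directly as configuration data; the sketch below is our own (the variable names are hypothetical and the paper's actual simulation code is not available).

```python
# Parameters of the worked example (Section 5, Table 1).
issue_weights = {"price": 0.4, "quantity": 0.2}           # W
bounds_r = {"price": (15, 60), "quantity": (20, 100)}     # retailer ranges O_r
bounds_m = {"price": (10, 55), "quantity": (40, 120)}     # manufacturer ranges O_m
T_r, T_m = 20, 25                                         # round limits
l, S, eta = 0.2, 5, 0.2                                   # threshold, hidden nodes, learning rate

# A run alternates rounds: each Agent predicts the opponent's next offer with the
# RBF predictor, derives its concession from the Q-expectation, and the loop stops
# once the weighted difference of the two offers falls below l; each issue is then
# settled at the average of the two final offers, as in Eq. (3).
```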
6 CONCLUSIONS
Resolving the conflicts of production-marketing collaborative planning is an important guarantee of low-cost, high-efficiency operation of the supply chain, and multi-agent self-adaptive negotiation is an efficient way to resolve such conflicts. We construct a negotiation model, propose a negotiation strategy based on Q-reinforcement learning that makes both Agents concede to some extent, and use an RBF neural network to predict the opponent's information and optimize the negotiation strategy.
Figure 4: The result of simulation by the adaptive
negotiation strategy.
The experiment shows that, compared with using Q-reinforcement learning alone, the new method reduces the number of negotiation rounds and improves the efficiency of conflict resolution. In future work we will further study multi-agent self-adaptive negotiation methods for resolving supply chain conflicts, exploring other learning mechanisms to improve the intelligence and adaptability of the supply chain.
Figure 5: The result of simulation by Q-reinforcement
learning algorithm.
ACKNOWLEDGEMENTS
This research was financially supported by the National Natural Science Foundation of China (71371018, 71071005) and the Beijing Philosophy and Social Science Fund of China (13JDJGB037).
REFERENCES
Hao, J. Y., Cheng H. F., 2012. An Adaptive Bilateral
Negotiating Strategy over Multiple Items. In
Proceedings of IEEE International Conferences on
Web Intelligence and Intelligent Agent Technology.
Wang, G., Wong, T. N., Yu, C.X., 2013. A Computational
Model for Multi-agent E-commerce Negotiations with
Adaptive Negotiation Behaviors. In Journal of
Computational Science.
Kumar V., Mishra N., 2011. A Multi-agent Self Correcting
Architecture for Distributed Manufacturing Supply
Chain. In IEEE Systems Journal.
Sara S., Ali S., Reza S., 2012. Applying Agent-Based
System and Negotiation Mechanism in Improvement
of Inventory Management and Customer Order
Fulfillment in Multi Echelon Supply Chain. In Arabian
Journal for Science and Engineering.
Yu C., Gao J., Gu M. H., et al., 2009. Automatic
Negotiation Decision Model Based on Machine
Learning. In Journal of Software.
Watkins C., Dayan P., 1992. Q-learning. In
Machine Learning.
Sui X., Cai G.Y., Shi L, 2010. Multi-agent Negotiation
Strategy and Algorithm Based on Q-Learning. In
Computer Engineering.
Chun S., Lei L., Fan L., et al., 2012. An Adaptive
Market-driven Agent Based on Multi-agent
ConflictResolutionofProduction-marketingCollaborativePlanningbasedonMulti-AgentSelf-adaptationNegotiation
213
Reinforcement Learning for Automated Negotiation.
In International Journal of Digital Content Technology
and its Applications.
Ariel M., Analía A., 2013. A Reinforcement Learning
Approach to Improve the Argument Selection
Effectiveness in Argumentation-based Negotiation. In
Expert Systems with Applications.
Shen, J., 2007. Hierarchical Reinforcement Learning
Theory and Method. In Harbin Engineering University
Press.
Shi, Z. Z., 2009. Neural Network. In Higher Education
Press, Beijing.
ICAART2014-InternationalConferenceonAgentsandArtificialIntelligence
214