
First, we discuss and analyze the results presented in Table 3a. In the Laptop domain, which has a small domain size, the pre-trained model outperforms the baseline in terms of both utility and win rate. This is because the small domain size allows the pre-trained model to efficiently capture data patterns and effectively demonstrate its generalizability. In the IS BT domain, the utility and win rate of the pre-trained model are lower than those of the baseline. However, given that 54% of the agents achieved the same utility, the pre-trained model performs at a level comparable to the baseline. In addition, although the pre-trained model underperforms in terms of utility in the Itexvs and thompson domains, its win rate is equal to or higher than that of the baseline. In the Grocery domain, the pre-trained model outperforms the baseline in terms of utility, performing on par with or better than the baseline overall. In summary, the pre-trained model, which is trained on seven time-dependent strategies, exhibits comparable or superior performance to the baseline across various domains, indicating a high level of generalization ability.
Second, we discuss and analyze the results presented in Table 3b. In the Laptop domain, similar to Table 3a, the pre-trained model outperforms the baseline in terms of both utility and win rate. However, in the IS BT and thompson domains, its performance is lower, whereas in the Itexvs and Grocery domains, its performance improves. Although performance varies by domain, the pre-trained model overall demonstrates equal or better performance. Therefore, the pre-trained model, which is additionally trained on behavior-dependent strategies, also exhibits high generalizability.
Finally, we discuss and analyze the results presented in Table 3c. The Laptop domain shows results similar to Table 3a and Table 3b, with the pre-trained model outperforming the baseline in terms of both utility and win rate. Although its performance in the IS BT domain is lower, the pre-trained model outperforms the baseline in terms of utility in all other domains. The pre-trained model selects the model with the highest average utility for each domain; thus, achieving higher utility in four of the five domains indicates that the pre-trained model outperforms the baseline. Therefore, the pre-trained model trained on time-dependent strategies, behavior-dependent strategies, and ANAC agents also demonstrates high generalizability.
In addition, a comparative evaluation was performed among the three pre-trained models: T7-(Pre), TB9-(Pre), and TBA13-(Pre). The evaluation criteria include the average utility obtained by each model against the seven time-dependent agents in each domain. Furthermore, for each domain, the top rate is defined as the percentage of opponents against which each model achieves the highest utility among the four models, including the baseline. Table 4 presents the results of this comparison.
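As a concrete illustration, the following is a minimal sketch of how the average utility and top rate could be computed from per-opponent results. The data layout (a mapping from model name and domain to one utility per opponent agent), all names, and the tie-handling choice are our own assumptions rather than the paper's implementation.

```python
import numpy as np

# Hypothetical layout: utilities[model][domain] is a sequence with one
# average utility per opponent agent (here, the seven time-dependent agents).
def average_utility(utilities, model, domain):
    return float(np.mean(utilities[model][domain]))

def top_rates(utilities, models, domain):
    """Percentage of opponents for which each model attains the highest
    utility among all compared models (baseline included)."""
    per_opp = np.stack([np.asarray(utilities[m][domain], dtype=float)
                        for m in models])        # shape: (n_models, n_opponents)
    best = per_opp.max(axis=0)                   # best utility per opponent
    # A tie at the top counts toward every tied model.
    return {m: 100.0 * float(np.mean(per_opp[i] >= best - 1e-9))
            for i, m in enumerate(models)}
```

Under this layout, calling top_rates with the four model names and a domain would yield the four top-rate entries of one row of a table such as Table 4.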
In Table 4, TBA13-(Pre) demonstrates the highest utility and top rate in the three domains with small domain sizes, and it also achieves generally high utility and top rate in the Grocery domain. However, in the thompson domain, the utility and top rate of TB9-(Pre) and TBA13-(Pre) are lower than in the other domains. This shows that a pre-trained model incorporating successful ANAC champion agents tends to exhibit higher generalizability than the other models. However, in more complex domains with larger domain sizes and higher conflict levels, a pre-trained model specialized in time-dependent strategies performs better.
7 EVALUATION ON PERFORMANCE IMPROVEMENT THROUGH FINE-TUNING
7.1 Experimental Settings
To improve the learned negotiation strategy against specific individual agents, fine-tuning was performed using the parameters of the pre-trained model as the initial parameters. The model fine-tuned from the pre-trained T7-(Pre) is referred to as T7-(FT); similarly, the models fine-tuned from TB9-(Pre) and TBA13-(Pre) are referred to as TB9-(FT) and TBA13-(FT), respectively.
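The following is a minimal sketch of this setup, assuming a PyTorch policy network; the network architecture, checkpoint file name, optimizer, and learning rate are illustrative assumptions, not the configuration used in this work.

```python
import copy
import torch
import torch.nn as nn

# Hypothetical policy network; the actual architecture is not specified here.
class PolicyNet(nn.Module):
    def __init__(self, obs_dim=3, act_dim=1):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, act_dim))

    def forward(self, x):
        return self.net(x)

# Load the pre-trained parameters, e.g., those of T7-(Pre).
pretrained = PolicyNet()
pretrained.load_state_dict(torch.load("t7_pre.pt"))

# Fine-tuning starts from the pre-trained parameters instead of a random
# initialization; continued RL training against a single target opponent
# then yields the fine-tuned model, e.g., T7-(FT).
finetuned = copy.deepcopy(pretrained)
optimizer = torch.optim.Adam(finetuned.parameters(), lr=1e-4)
```

Starting from the pre-trained parameters means the fine-tuned model only has to specialize an already general strategy to one opponent, rather than learn a strategy from scratch.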
7.2 Experimental Results & Discussion
Table 5, Table 6, and Table 7 compare the performance of the fine-tuned models (T7-(FT), TB9-(FT), and TBA13-(FT)) with their respective baselines and pre-trained models. Each table shows the average utility and top rate for each domain. In addition, the "up-rate (P→F)", the percentage of opponent agents for which utility improved from the pre-trained model to the fine-tuned model, is calculated for each domain; a sketch of this metric is given below.
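The following is a minimal sketch of the up-rate (P→F) computation; the data layout (dicts mapping each opponent agent to the average utility obtained by the pre-trained and fine-tuned models) is a hypothetical choice for illustration.

```python
def up_rate(pre_utils, ft_utils):
    """Percentage of opponent agents whose utility improves after fine-tuning.

    pre_utils, ft_utils: dicts mapping opponent agent name -> average utility
    for the pre-trained and fine-tuned models, respectively.
    """
    improved = sum(ft_utils[a] > pre_utils[a] for a in pre_utils)
    return 100.0 * improved / len(pre_utils)
```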
First, we discuss and analyze the results presented in Table 5. The up-rate in three of the five domains is 0%, indicating no performance change after fine-tuning the pre-trained model. This suggests that the pre-trained model was already well optimized: it effectively learned general patterns from multiple opponents, resulting in high generalizability. However, in the IS BT domain, 43% of the agents improved in utility after fine-tuning.