(Pardoe and Stone, 2006)). Additionally, Kitts and Leblanc (2004) suggested computing a myopic (one-shot) profit-maximizing bid given learned regression models of expected position and payment per click. One problem with learning-based approaches is that they do not prescribe what should be done in the absence of any information about the adversaries. Moreover, they assume that adversary behavior is stationary and, thus, that past behavior is a good predictor of future behavior. In fact, learning may take some time before its prescriptions are effective, and the opponents will often be learning themselves, creating complex interactions between the learning algorithms, with policies that are unlikely to be stationary.
We steer away from learning-based approaches entirely, with our bidding policy determined by a simulation-based equilibrium estimate. We do so not to suggest that learning is a lost cause; rather, we follow a precise research agenda: developing an agent that plays an equilibrium strategy alone allows us to directly measure the efficacy of a pure game-theoretic approach. Success of our approach will, thus, make a good case for equilibrium as an initial prediction and strategic prescription, while further online exploration may or may not lead an agent to play other, more promising strategies.
In order to apply simulation-based game-theoretic techniques to bidding, we first need to abstract the complex environment of TAC/AA into a computationally tractable, restricted class of bidding strategies. To this end, we make a dramatic simplification, considering only bidding strategies that are linear in an estimate of an advertiser’s value per click v, i.e., b(v) = αv. The motivation for this restriction comes from the theory of single-item auctions (Krishna, 2002), where equilibria are often linear in bidder valuations, as well as from other game-theoretic treatments of far simpler models of keyword auctions (Vorobeychik, 2009). Note that this bidding function is entirely myopic, as it has no temporal dependence (nor does it use any other state information about the game that may be available). On the other hand, it is very simple to implement and highly intuitive: an agent only has to determine what fraction of its value it wishes to bid. Indeed, particularly because the GSP pricing mechanism resembles the Vickrey auction, a very natural strategy would be to bid one’s value, setting α = 1. As we demonstrate below, this “truthful bidding” turns out to be a very poor strategy in our context.
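In code, the restricted strategy class amounts to one scalar per agent. The sketch below, assuming per-keyword value-per-click estimates are already available, applies b(v) = αv to every keyword; the helper name `linear_bids` and the keyword labels are hypothetical, and α = 1 gives the truthful bid discussed above.

```python
from typing import Dict


def linear_bids(values_per_click: Dict[str, float], alpha: float) -> Dict[str, float]:
    """Hypothetical helper: bid b(v) = alpha * v on every keyword.

    The rule is myopic: it uses no day index, inventory level, or other
    game state, only the current value-per-click estimate.
    """
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    return {keyword: alpha * v for keyword, v in values_per_click.items()}


# Hypothetical value-per-click estimates, for illustration only.
values = {"kw1": 2.0, "kw2": 3.0}
truthful = linear_bids(values, alpha=1.0)  # bid one's full value
shaded = linear_bids(values, alpha=0.5)    # bid half of one's value
```

Searching over this class reduces to choosing a single number α, which is what keeps the simulation-based equilibrium analysis computationally tractable.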
While we now have a concrete class of bidding strategies to focus on, we have one more question to answer before we can proceed to the actual analysis: as the value per click is not given directly, how do we derive it from the TAC/AA specification and/or game experience? We devote the next section to this question.
4.3 Estimating Value per Click
A value per click of an advertiser a for a keyword q is the expected revenue from a click,

\[ v_a = \Pr\{\text{conversion} \mid \text{click}\}\, E[R_{aq} \mid \text{conversion}]. \]
Revenue from a conversion depends entirely on whether the manufacturer in the keyword (user preference) matches the advertiser’s specialty. If the manufacturer is specified in the keyword, the revenue is $15 if it matches the specialty and $10 otherwise. If not, the expected revenue is \( 15 \times \tfrac{1}{3} + 10 \times \tfrac{2}{3} = \tfrac{35}{3} \approx 11.67 \), as there is a 1/3 chance of a specialty match.
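The value-per-click formula and the revenue rule can be combined directly. The following is a sketch under stated assumptions: the conversion probability is passed in as a number (estimating it is the subject of the rest of this section), the function names are hypothetical, and the manufacturer labels are illustrative only. `Fraction` keeps the 35/3 exact.

```python
from fractions import Fraction
from typing import Optional


def revenue_per_conversion(keyword_manufacturer: Optional[str], specialty: str) -> Fraction:
    """Expected revenue from one conversion under the rule described above.

    keyword_manufacturer is None when the keyword does not name a manufacturer.
    """
    if keyword_manufacturer is not None:
        # Manufacturer named in the keyword: $15 on a specialty match, else $10.
        return Fraction(15) if keyword_manufacturer == specialty else Fraction(10)
    # No manufacturer named: the user's preference matches with probability 1/3.
    return 15 * Fraction(1, 3) + 10 * Fraction(2, 3)


def value_per_click(p_conversion: float, keyword_manufacturer: Optional[str],
                    specialty: str) -> float:
    """v = Pr{conversion | click} * E[R | conversion] (hypothetical helper)."""
    return p_conversion * float(revenue_per_conversion(keyword_manufacturer, specialty))
```

For instance, `revenue_per_conversion(None, "pg")` returns `Fraction(35, 3)`, matching the expected revenue computed above.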
To compute the conversion probability, we need to estimate two things: the proportion of focused shoppers and the (expected) value of \( I_d \). We begin with the former, assuming that an estimate of \( I_d \) is available. Since the proportion of focused shoppers actually depends on agent policies, we obtain an initial estimate using an arbitrary fixed policy, use the result to estimate bidding equilibria, and then refine the estimate using equilibrium bidding policies.³
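The estimate-then-refine procedure can be summarized as a short loop. This sketch assumes, for simplicity, that all agents play the same linear policy, so a profile is represented by a single α; `estimate_proportions` (simulating the game under a fixed profile) and `estimate_equilibrium` (the simulation-based equilibrium analysis) are hypothetical callables standing in for machinery not spelled out here. With the default `rounds=1`, the loop performs exactly the single refinement described above.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-ins: a symmetric profile is summarized by one alpha, and
# proportions are focused-shopper estimates (e.g. one per simulation day).
ProportionEstimator = Callable[[float], List[float]]
EquilibriumEstimator = Callable[[List[float]], float]


def refine_focused_estimates(
    initial_alpha: float,
    estimate_proportions: ProportionEstimator,
    estimate_equilibrium: EquilibriumEstimator,
    rounds: int = 1,
) -> Tuple[float, List[float]]:
    """Estimate focused-shopper proportions, then re-estimate them under equilibrium play."""
    alpha = initial_alpha                          # arbitrary fixed policy
    proportions = estimate_proportions(alpha)      # initial estimate
    for _ in range(rounds):
        alpha = estimate_equilibrium(proportions)  # equilibrium given current estimate
        proportions = estimate_proportions(alpha)  # refine under the equilibrium policy
    return alpha, proportions
```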
If we fix agent policies, the proportion of focused shoppers on a given day for a keyword q can be computed as the ratio of the empirical fraction of clicks that result in purchases to the estimated conversion probability of a focused shopper. We average such empirical proportions for every simulation day over 100–130 simulations to obtain a daily estimate of the expected proportion of focused shoppers for each keyword. We further average the resulting empirical proportions of focused shoppers over keyword classes (that is, over the 6 F1 keywords in one case and over the 9 F2 keywords in the other; the F0 class consists of a single keyword). Thus, we end up with empirical proportions of focused shoppers for the three classes of keywords, shown in Figure 2. Two features of this plot are worth noting. First, the proportions are essentially the same for all keyword classes. This is not very surprising, as there is no strong a priori reason to expect them to differ. Second, the proportions follow a damped harmonic oscillation pattern. These oscillations are caused by nonstationarity in the state transition process: a higher proportion of focused shoppers yields a higher conversion probability and, therefore, more sales, which in turn cause a drop in conversion probability due to exhausted capacity
³ In practice, it turned out that our estimates of focused-shopper proportions were not very sensitive to the specifics of the bidding policy within our linear strategy space.
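The focused-shopper estimate described in the body text (per-day ratios, averaged over simulations and then over keyword classes) can be sketched as follows. The data layout (one dictionary per simulation mapping `(day, keyword)` to click and purchase counts), the precomputed focused-shopper conversion probabilities keyed by day and keyword, the function name, and the choice to skip days with no clicks are all assumptions for illustration, not details given in the text.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical layout of one simulation's log: (day, keyword) -> (clicks, purchases).
SimLog = Dict[Tuple[int, str], Tuple[int, int]]


def focused_proportions_by_class(
    logs: List[SimLog],
    focused_conv_prob: Dict[Tuple[int, str], float],
    keyword_class: Dict[str, str],
) -> Dict[Tuple[int, str], float]:
    """Daily focused-shopper proportion per keyword class (e.g. 'F0', 'F1', 'F2')."""
    # Step 1: per simulation, day, and keyword, divide the empirical fraction of
    # clicks that led to purchases by a focused shopper's conversion probability.
    per_keyword: Dict[Tuple[int, str], List[float]] = defaultdict(list)
    for log in logs:
        for (day, kw), (clicks, purchases) in log.items():
            if clicks == 0:
                continue  # assumption: no clicks gives no usable ratio that day
            ratio = (purchases / clicks) / focused_conv_prob[(day, kw)]
            per_keyword[(day, kw)].append(ratio)
    # Step 2: average over simulations to get a daily estimate for each keyword.
    daily = {key: mean(vals) for key, vals in per_keyword.items()}
    # Step 3: average the daily estimates over the keywords in each class.
    per_class: Dict[Tuple[int, str], List[float]] = defaultdict(list)
    for (day, kw), p in daily.items():
        per_class[(day, keyword_class[kw])].append(p)
    return {key: mean(vals) for key, vals in per_class.items()}
```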
ICAART 2011 - 3rd International Conference on Agents and Artificial Intelligence
38