(P. Rusmevichientong, 2006), which is problematic
with low-frequency keywords. In addition, long-tail
search engine marketing campaigns have become very
popular and have attracted attention in research (B. Skiera,
2010). Skiera et al. argue in their empirical study that
focusing on the long tail is not very profitable because
the top 20% of keywords in terms of search volume
already covered 94.32% of the conversions (B. Skiera,
2010). Nonetheless, our tests show that it can be
valuable to find profitable patterns in the long tail.
Another research stream in web search focuses
on finding concepts in search queries and
relationships between search and interest (G. Xu, 2009;
M. Pasca, 2007). As far as we know, there has been
no work on aggregating search queries in the long tail
from a search engine marketing perspective.
3 METHOD
Our method is a recursive algorithm (Algorithm 1). It
aggregates search queries according to a given
target metric and provides viable keyword
definitions for later deployment in Google AdWords.
Typical target metrics for optimizing search engine
marketing campaigns are click-through rate, conversion
rate, cost-per-click and cost-per-conversion. The
algorithm has four parameters:
• training data (T)
• offset phrase (o)
• minimal size (mS)
• quality measure (mQ).
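For reference, the four target metrics named above follow their standard definitions. The sketch below computes them from aggregated counts. The `QueryStats` record and its field names are assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    """Aggregated counts for one search query (hypothetical record layout)."""
    impressions: int = 0
    clicks: int = 0
    conversions: int = 0
    cost: float = 0.0


def _ratio(numerator, denominator):
    # Return None instead of raising when the denominator is zero.
    return numerator / denominator if denominator else None


def click_through_rate(s):
    # Clicks per impression.
    return _ratio(s.clicks, s.impressions)


def conversion_rate(s):
    # Conversions per click.
    return _ratio(s.conversions, s.clicks)


def cost_per_click(s):
    # Total cost divided by the number of clicks.
    return _ratio(s.cost, s.clicks)


def cost_per_conversion(s):
    # Total cost divided by the number of conversions.
    return _ratio(s.cost, s.conversions)
```

Each metric has a different denominator (impressions, clicks or conversions), which is why the attribute used for the minimal size depends on the chosen target metric.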
The training data for the algorithm has to contain
a list of different search engine queries and the
attributes required for calculating the target metric (e.g.
number of impressions and number of clicks for
click-through rate as target metric). The algorithm divides
the whole set of available search queries into distinct
subsets according to the most frequent phrase in
the set that contains the offset phrase. The subsets are
split further until the stop criterion is reached. The
stop criterion is the minimal size of the subset: a subset
that does not exceed it is not split further. The
target metric determines the attribute used for calculating
the minimal size (e.g. the target metric conversion rate
determines the number of clicks as minimal size because
the conversion rate depends on the number of clicks).
If a subset cannot be divided any further, the target
metric in the subset is calculated. If the target
metric in the subset fulfils the requirements of the
quality measure, the keyword definitions for the subset are
generated.
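The recursion of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under several assumptions that the text does not fix: queries are tuples of words, a phrase is a contiguous word sequence, the most frequent phrase must be strictly longer than the offset phrase, frequency counts the number of queries containing a phrase, and ties are broken by shorter length and then alphabetically. The row layout, the helper names, the `size` and `quality` callbacks and the handling of a set with no candidate phrase are hypothetical. The output only records which subset produced a definition, not the final modifiers.

```python
from collections import Counter


def phrases(words):
    """All contiguous word sequences of a query (a tuple of words)."""
    n = len(words)
    return {words[i:j] for i in range(n) for j in range(i + 1, n + 1)}


def contains(query, phrase):
    """True if `phrase` occurs in `query` as a contiguous word sequence."""
    k = len(phrase)
    return any(query[i:i + k] == phrase for i in range(len(query) - k + 1))


def most_frequent_phrase(rows, offset):
    """Most frequent phrase that contains `offset` and is longer than it."""
    counts = Counter()
    for row in rows:
        for p in phrases(row["query"]):
            if len(p) > len(offset) and contains(p, offset):
                counts[p] += 1
    if not counts:
        return None
    # Assumed tie-break: higher count, then shorter phrase, then alphabetical.
    return min(counts, key=lambda p: (-counts[p], len(p), p))


def get_keywords(rows, offset, min_size, min_quality, size, quality, out=None):
    out = [] if out is None else out
    mfp = most_frequent_phrase(rows, offset)
    if mfp is None:
        # Assumption: nothing left to split on, so evaluate the set as it is.
        if rows and quality(rows) > min_quality:
            out.append(("C", offset))
        return out
    a = [r for r in rows if r["query"] == mfp]
    b = [r for r in rows if r["query"] != mfp and contains(r["query"], mfp)]
    c = [r for r in rows if not contains(r["query"], mfp)]
    if size(a) > min_size and quality(a) > min_quality:
        out.append(("A", mfp))
    if size(b) > min_size:
        get_keywords(b, mfp, min_size, min_quality, size, quality, out)
    elif b and quality(b) > min_quality:
        out.append(("B", mfp))
    if size(c) > min_size:
        get_keywords(c, offset, min_size, min_quality, size, quality, out)
    elif c and quality(c) > min_quality:
        out.append(("C", offset))
    return out


def clicks(rows):
    # Size attribute for the target metric conversion rate.
    return sum(r["clicks"] for r in rows)


def conversion_rate(rows):
    total = clicks(rows)
    return sum(r["conversions"] for r in rows) / total if total else 0.0


rows = [
    {"query": ("red", "shoes"), "clicks": 60, "conversions": 6},
    {"query": ("cheap", "red", "shoes"), "clicks": 30, "conversions": 3},
    {"query": ("red", "shoes", "sale"), "clicks": 30, "conversions": 3},
    {"query": ("blue", "hat"), "clicks": 10, "conversions": 0},
]
result = get_keywords(rows, (), 50, 0.05, clicks, conversion_rate)
print(result)
```

In this toy data set the query "blue hat" produces no definition because its subset is too small to split and its conversion rate is below the quality threshold. For large query logs the recursion on subset C may become deep, so an iterative formulation could be preferable.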
Algorithm 1: Keyword Definition Generation.
procedure GETKEYWORDS(T, o, mS, mQ)
    Find the most frequent phrase mfp in the
    available search queries T containing o.
    Divide T into three subsets A, B, C with:
        ∀a ∈ A, mfp = a
        ∀b ∈ B, mfp ⊂ b
        ∀c ∈ C, mfp ⊄ c
    if getSize(A) > mS then
        if getQuality(A) > mQ then
            Generate a new keyword definition
            containing the most frequent phrase
            (mfp).
        end if
    end if
    if getSize(B) > mS then
        GETKEYWORDS(B, mfp, mS, mQ)
    else
        if getQuality(B) > mQ then
            Generate a new keyword definition
            containing the most frequent phrase
            (mfp).
        end if
    end if
    if getSize(C) > mS then
        GETKEYWORDS(C, o, mS, mQ)
    else
        if getQuality(C) > mQ then
            Generate a new keyword definition
            containing the most frequent offset
            phrase (o).
        end if
    end if
end procedure

The requirements of the target platform on which the
optimized search engine marketing campaign is to
be deployed limit the design of possible keyword
definitions. Therefore, a single keyword definition can
only consist of a phrase (single words in a specific
order) combined with one of four different modifiers (Table 1).
Table 2 shows an example set of possible keyword
definitions describing a profitable subset of matching search
queries found by the algorithm.
Table 1: Allowed modifiers for keyword definitions.
Modifier                Matching behaviour and notation
Positive exact match    Matches if a search query equals the phrase: [phrase].
Positive phrase match   Matches if a search query contains the phrase with optional words before or after the phrase: "phrase".
Negative exact match    Doesn't match if the search query equals the phrase: -[phrase].
Negative phrase match   Doesn't match if the search query contains the phrase: -"phrase".
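The four modifiers of Table 1 can be illustrated with a small matcher. This is a sketch under assumptions: queries and phrases are compared word by word after lowercasing and splitting on whitespace, and a set of definitions accepts a query if at least one positive definition matches and no negative definition matches. The combination rule and the function names are hypothetical and are not specified in the text above.

```python
def parse_definition(text):
    """Split a definition such as -"red shoes" into (negative, exact, words)."""
    negative = text.startswith("-")
    body = text[1:] if negative else text
    if len(body) > 2 and body.startswith("[") and body.endswith("]"):
        exact = True
    elif len(body) > 2 and body.startswith('"') and body.endswith('"'):
        exact = False
    else:
        raise ValueError(f"unsupported keyword definition: {text!r}")
    return negative, exact, body[1:-1].lower().split()


def phrase_hits(query_words, phrase_words, exact):
    """Exact match: equal word lists. Phrase match: contiguous sub-sequence."""
    if exact:
        return query_words == phrase_words
    k = len(phrase_words)
    return any(query_words[i:i + k] == phrase_words
               for i in range(len(query_words) - k + 1))


def matches(query, definitions):
    """Assumed rule: one positive hit is required, any negative hit rejects."""
    words = query.lower().split()
    positive_hit = False
    for text in definitions:
        negative, exact, phrase = parse_definition(text)
        if phrase_hits(words, phrase, exact):
            if negative:
                return False
            positive_hit = True
    return positive_hit


definitions = ['"red shoes"', '-[red shoes]', '-"cheap"']
for query in ["red shoes sale", "red shoes", "cheap red shoes", "blue hat"]:
    print(query, matches(query, definitions))
```

With these example definitions, only queries that contain "red shoes" plus additional words, and that do not contain "cheap", are accepted.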
KDIR 2013 - International Conference on Knowledge Discovery and Information Retrieval