3. It is quite plausible, although to our knowledge it has not been published, that some large companies developing their own ad hoc experimentation software use Thompson sampling in its most efficient form in this context, where the algorithm is updated at each decision rather than every N decisions, as sketched below.
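As an illustration of this per-decision variant, the following minimal Beta-Bernoulli sketch updates the posterior immediately after every observed outcome; the variation names and the uniform prior are illustrative choices, not any vendor's implementation.

import random

# Minimal Beta-Bernoulli Thompson sampling, updated at every decision
# rather than in batches of N decisions. Names are illustrative.
class ThompsonSampler:
    def __init__(self, variations):
        # Beta(1, 1) prior (uniform) per variation: alpha counts
        # successes + 1, beta counts failures + 1.
        self.params = {v: [1, 1] for v in variations}

    def choose(self):
        # Draw one sample from each posterior and pick the largest.
        draws = {v: random.betavariate(a, b)
                 for v, (a, b) in self.params.items()}
        return max(draws, key=draws.get)

    def update(self, variation, converted):
        # Posterior update immediately after each observed outcome.
        if converted:
            self.params[variation][0] += 1
        else:
            self.params[variation][1] += 1

# Route each visitor and feed back the 0/1 outcome at once.
sampler = ThompsonSampler(["A", "B"])
variation = sampler.choose()
sampler.update(variation, converted=True)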
In experiments where the objective follows a non-Bernoulli distribution, measuring, for example, browsing time, number of pages visited or total revenue, Thompson sampling cannot be used, since it is not possible to parameterize the reward distribution. Therefore, the main companies, such as Adobe and ABTasty, use other alternatives, mainly variations on ε-greedy algorithms. Other vendors, such as Google, with Google Analytics, and ABTasty, do not disclose whether and how they perform dynamic traffic distribution for objectives that do not follow a Bernoulli distribution.
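The vendors' exact algorithms are not public, but a plain ε-greedy rule illustrates why this family sidesteps the parameterization problem: it only tracks empirical mean rewards, so metrics such as revenue or browsing time can be plugged in directly. The exploration rate and the reward value below are illustrative.

import random

# Plain epsilon-greedy allocation: it needs only running mean rewards,
# so it works for revenue, browsing time, pages visited, etc.
EPSILON = 0.1                    # illustrative exploration rate

counts = {"A": 0, "B": 0}        # decisions served per variation
means = {"A": 0.0, "B": 0.0}     # running mean reward per variation

def choose():
    # Explore with probability EPSILON (or until each arm is tried).
    if random.random() < EPSILON or min(counts.values()) == 0:
        return random.choice(list(counts))
    return max(means, key=means.get)          # otherwise exploit

def update(variation, reward):
    counts[variation] += 1
    # Incremental mean update; reward can be any real-valued metric.
    means[variation] += (reward - means[variation]) / counts[variation]

variation = choose()
update(variation, reward=37.50)  # e.g. basket revenue in euros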
As mentioned above, the type of distribution associated with the rewards or the indicator to be optimized does not have to be known a priori in the possibilistic reward (PR) methods. Thus, they constitute an alternative for dynamic traffic distribution with non-Bernoulli reward distributions.
In (Martín et al., 2020), a variation of dynamic traffic distribution in A/B testing based on PR methods for non-Bernoulli reward distributions is proposed.
3.2 Stopping Criterion
The stopping criterion plays a key role in the execu-
tion of A/B testing experiments. It is used to decide
when a variation is considered to be the best.
The de facto method used to define the stopping criterion in most approaches is based on a classical hypothesis test. However, classical stopping criteria are not very efficient, since they are unable to dynamically stop the test as soon as there is enough evidence to suggest that one variation is better than the others (Scott, 2015).
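For contrast, the classical approach fixes the number of samples per variation up front from the desired significance level and power. The sketch below uses the standard two-proportion sample size formula; the 5% and 6% conversion rates are illustrative.

from statistics import NormalDist

# Fixed-horizon sample size for a two-sided two-proportion z-test.
def samples_per_variation(p1, p2, alpha=0.05, power=0.8):
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # significance quantile
    z_b = NormalDist().inv_cdf(power)           # power quantile
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_a + z_b) ** 2 * var / (p1 - p2) ** 2

# Detecting a lift from 5% to 6% needs roughly 8,000 users per arm,
# and a classical test cannot stop before both arms reach that size.
print(round(samples_per_variation(0.05, 0.06)))  # -> 8155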
Recently, the most innovative companies have been introducing more dynamic stopping criteria that reduce testing costs while achieving the same statistical significance. These new methods, although perfectly applicable to classical A/B testing, come hand in hand with the new methods for dynamic traffic distribution. The multi-armed bandit paradigm is the most popular, since the number of samples to be drawn for each variation is determined dynamically, rather than using classical hypothesis tests to identify the number of samples required to achieve statistical significance.
These new criteria are based on different approaches (Bayesian inference, inequality bounds, etc.). A review of the most important approaches is provided in (Martín et al., 2020), including Google Analytics, which uses a stopping criterion based on a Bayesian approach (Scott, 2015; Google, ), and Adobe Target (Adobe, ), which uses a stopping method based on confidence intervals computed with the Bernstein inequality (Bernstein, 1946). The stopping criteria of Google Analytics and Adobe Target are the most widely used among the main vendors.
The stopping method based on the value remaining used by Google Analytics (Scott, 2015) is very efficient in environments with rewards following a Bernoulli distribution, since it has to know the exact distribution of the expected rewards in order to carry out the simulations, and, in the Bernoulli case, this distribution can be inferred with a Bayesian approach.
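A minimal Monte Carlo sketch of the value remaining rule for Bernoulli rewards follows; the 1% threshold and the 95th percentile are the choices commonly cited for Google Analytics, and the posterior counts in the usage example are illustrative.

import numpy as np

# Value remaining rule (Scott, 2015) for Bernoulli rewards. alpha and
# beta hold the Beta posterior parameters of each variation.
def value_remaining_stop(alpha, beta, n_sims=100_000,
                         threshold=0.01, percentile=95):
    # Simulate conversion rates from each posterior.
    draws = np.random.beta(alpha[:, None], beta[:, None],
                           size=(len(alpha), n_sims))
    best = draws.argmax(axis=0)            # winner in each simulation
    champion = np.bincount(best).argmax()  # most frequent winner
    # Relative regret of sticking with the champion in each simulation.
    value_remaining = (draws.max(axis=0) - draws[champion]) / draws[champion]
    # Stop when the potential lift of switching is almost surely small.
    return np.percentile(value_remaining, percentile) < threshold

# Posteriors after observing 480/5000 and 540/5000 conversions.
print(value_remaining_stop(np.array([481.0, 541.0]),
                           np.array([4521.0, 4461.0])))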
This approach, however, has a drawback: the shape of the reward distribution has to be known or modeled by a family of parameterizable distributions on which priors can be applied. In addition, updating the posterior distributions and the expected value should be tractable, or at least computationally efficient. This is often not the case in real contexts, where the family to which the reward distribution belongs (normal, Poisson, Bernoulli) is unknown. Besides, even if the distribution is known or can be modeled, it can be very difficult to make an efficient inference using, for example, conjugate priors.
To overcome this problem, a new approach was proposed in (Martín et al., 2020), in which the probability distribution of the expected rewards is efficiently approximated by applying the possibilistic reward methods (PR2 and PR3) to the reward in each variation. To do this, only the minimum and maximum reward bounds have to be known, rather than the distribution of each reward. This information is commonly available in real contexts.
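PR2 and PR3 themselves are specified in (Martín et al., 2020); the sketch below only illustrates the bounds-only ingredient they rely on: rewards are rescaled using the known limits, and a distribution for the expected reward is approximated without assuming any parametric family. The bootstrap used here is a stand-in, not the actual PR2/PR3 construction, and the bounds are hypothetical.

import random
from statistics import mean

# Hypothetical bounds, e.g. basket revenue known to lie in [0, 500].
R_MIN, R_MAX = 0.0, 500.0

def expected_reward_draws(rewards, n_boot=10_000):
    # Rescale observed rewards into [0, 1] using only the bounds.
    scaled = [(r - R_MIN) / (R_MAX - R_MIN) for r in rewards]
    # Each bootstrap resample yields one draw of the expected reward;
    # no distribution family (normal, Poisson, ...) is assumed.
    return [mean(random.choices(scaled, k=len(scaled)))
            for _ in range(n_boot)]

Draws of this kind can then feed the same simulation-based stopping rule sketched above.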
Once the density function of the expected reward (Step 3 in PR2 and PR3) has been derived, the simulation and stopping condition techniques used in (Scott, 2015) are applied. In Section 5, which reports a numerical analysis of these methods on checkout process scenarios, these approaches are denoted as PR2 ValRem and PR3 ValRem.
In addition, a stopping criterion computed from the approximations to the probability distributions of the expected reward derived from the PR2 and PR3 methods is also proposed in (Martín et al., 2020), emulating confidence level-based stopping criteria, such as the empirical Bernstein criterion in Adobe Target.
To do this, a function that outputs the percentile value is needed, which will be used as a confidence interval bound.
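As an illustration of this family of criteria, the sketch below computes an empirical Bernstein confidence radius (using the sample variance) for rewards rescaled into [0, 1]; it is not Adobe Target's implementation, and the delta value is illustrative.

from math import log, sqrt
from statistics import mean, variance

# Empirical Bernstein deviation bound for i.i.d. rewards in [0, 1]:
# with probability at least 1 - delta, the true mean exceeds the
# sample mean by at most this radius (and symmetrically below it).
def bernstein_radius(samples, delta=0.05):
    n = len(samples)
    v = variance(samples)  # sample variance (n - 1 denominator)
    return sqrt(2 * v * log(2 / delta) / n) + 7 * log(2 / delta) / (3 * (n - 1))

# Stop when the confidence intervals of the two variations separate.
def intervals_separate(a, b, delta=0.05):
    ra, rb = bernstein_radius(a, delta), bernstein_radius(b, delta)
    return mean(a) - ra > mean(b) + rb or mean(b) - rb > mean(a) + ra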