for applicability of such designs.
A major barrier to implementing adaptive designs
in practice is computational. Bandit problems in the
clinical trials context are typically modeled as MDPs,
which are solved as finite-horizon dynamic programs
(Ahuja and Birge, 2014). However, the problem size
increases exponentially with the number of time
periods, patients, or treatment-outcome combinations,
a phenomenon commonly referred to as the curse of
dimensionality (Powell, 2007). As a result, a direct
application of dynamic programming becomes
computationally prohibitive, and finding an optimal
policy for this high-dimensional problem becomes
challenging (Bertsimas and Mersereau, 2007).
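To give a rough sense of this growth, the following sketch counts states under a simplifying assumption that is ours, not necessarily the exact state definition of the cited model: a state is a vector of nonnegative observation counts, one per health condition, whose total does not exceed the number of patients. The function names and the trial sizes are hypothetical.

```python
from itertools import product
from math import comb


def num_states(n_patients: int, n_conditions: int) -> int:
    """Count vectors of nonnegative integers of length n_conditions
    whose entries sum to at most n_patients (assumed state definition).

    Adding one slack entry turns "sum <= N" into "sum == N" over
    K + 1 entries, so stars and bars gives C(N + K, K).
    """
    return comb(n_patients + n_conditions, n_conditions)


def num_states_brute_force(n_patients: int, n_conditions: int) -> int:
    """Same count by explicit enumeration; only feasible for tiny inputs."""
    return sum(
        1
        for counts in product(range(n_patients + 1), repeat=n_conditions)
        if sum(counts) <= n_patients
    )


if __name__ == "__main__":
    # Hypothetical trial sizes, chosen only for illustration.
    for n_conditions in (2, 4, 6):
        for n_patients in (10, 100, 1000):
            print(n_conditions, n_patients, num_states(n_patients, n_conditions))
```

Even under this simplified count, the state space becomes far too large to enumerate once the number of patients or health conditions grows, and a dynamic program must additionally evaluate every action and outcome at each state.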
Approximation techniques address this problem by
reducing the problem size and the associated
computational burden, which allows users to find a
solution. When the underlying problem is modeled as
an MDP, these techniques are collectively referred to
as approximate dynamic programming (ADP). Several
techniques exist to approximate the value function,
the state space, or both. Although some techniques
are more popular than others, ADP remains more of an
art than a science.
In this study, we use a grid to approximate the
state space. We numerically evaluate the optimality
loss, that is, the loss in objective value relative to
the fully enumerated solution, that results from this
approximation. Our proposed approach has implications
for clinicians and policymakers interested in finding
an efficient yet easily implementable design for large
clinical trials, where existing adaptive designs either
cannot be implemented or do not perform or scale well.
The rest of the paper is organized as follows. §2
provides an overview of the literature. §3 presents the
model and the proposed approximation method. We
present numerical results in §4. We conclude in §5.
2 LITERATURE
Several methods and techniques have been proposed
in the literature for approximate solutions to large
dynamic programs (see Powell, 2007, for a discussion).
The Lagrangian decomposition-based ADP approach is
one such method to approximate the value function
(e.g., Adelman and Mersereau, 2008). The approach
has been used to find approximate solutions in
interactive marketing (e.g., Bertsimas and Mersereau,
2007) and retail assortment (e.g., Caro and Gallien,
2007). Another method is to use polynomials, for
example least squares approximation using Chebyshev
polynomials (Judd, 1998). Ahuja and Birge (2014) is
the only study that has used approximation in the
context of adaptive designs for clinical trials; they
use a truncated-horizon, or limited-lookahead,
approximation method.
Grid-based methods are commonly used to approximate
the state space. These techniques sample a finite
number of points, called the grid, from the entire
state space, compute the values of the grid points,
and approximate the values of the non-grid points
via some form of interpolation (Sandikci, 2010).
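The three steps just described (sample a grid, evaluate the grid points, interpolate elsewhere) can be sketched in one dimension as follows. This is only an illustration under our own assumptions: the state is a single probability in [0, 1], the interpolation is piecewise linear, and `toy_value` is a made-up stand-in for a value function, not one taken from the literature cited here.

```python
from bisect import bisect_right


def interpolate(grid, values, x):
    """Piecewise-linear interpolation of values known on a sorted grid."""
    if not grid[0] <= x <= grid[-1]:
        raise ValueError("x lies outside the grid; extrapolation is not supported")
    # Index i of the interval [grid[i], grid[i + 1]] that contains x.
    i = min(bisect_right(grid, x) - 1, len(grid) - 2)
    weight = (x - grid[i]) / (grid[i + 1] - grid[i])
    return (1 - weight) * values[i] + weight * values[i + 1]


def toy_value(p):
    """Hypothetical concave value function, used only to fill the grid."""
    return p * (1 - p)


# Step 1: sample a finite grid (here, five evenly spaced points).
grid = [k / 4 for k in range(5)]
# Step 2: compute the value at each grid point.
values = [toy_value(p) for p in grid]
# Step 3: approximate the value at a non-grid point by interpolation.
approx = interpolate(grid, values, 0.3)
exact = toy_value(0.3)
```

Because the toy function is concave, the interpolated value cannot exceed the exact one; the gap between `approx` and `exact` is the kind of approximation error a finer grid would reduce.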
There exists a rich literature on grid-based
approximation, including notable studies in the
operations management literature (Monahan, 1982;
Lovejoy, 1991; Aviv and Pazgal, 2005) as well as in
the computer science literature (Hauskrecht, 1997;
Zhou and Hansen, 2001). Sandikci et al. (2013) is an
example of a recent study that uses a grid-based
approximation approach in a healthcare setting to
approximate the position of the patient on the
waiting list.
Grid-based approximation approaches differ in their
grid construction choices, for example, uniform vs.
non-uniform grids and fixed- vs. variable-resolution
grids. In general, all the corner points of the
probability simplex are included in the grid, since
that eliminates the need to extrapolate (see Sandikci,
2010, for a brief overview). In this paper, we use a
fixed-resolution uniform grid, since that allows for
efficient interpolation.
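As a minimal sketch of what a fixed-resolution uniform grid on the probability simplex can look like, the code below enumerates every point whose coordinates are multiples of 1/M and sum to one. The function name `simplex_grid` and the parameter names `dim` and `resolution` (the M above) are our own illustrative choices, and this construction is an assumption about the grid rather than the exact one used in the paper.

```python
from itertools import combinations
from math import comb


def simplex_grid(dim, resolution):
    """Return all points (k_1/M, ..., k_dim/M) with nonnegative integers
    k_i that sum to M = resolution."""
    points = []
    n_slots = resolution + dim - 1
    # Stars and bars: place dim - 1 bars among n_slots positions; the
    # gaps between consecutive bars give the integer counts k_i.
    for bars in combinations(range(n_slots), dim - 1):
        edges = (-1,) + bars + (n_slots,)
        counts = [edges[i + 1] - edges[i] - 1 for i in range(dim)]
        points.append(tuple(c / resolution for c in counts))
    return points


# Hypothetical example: three dimensions, resolution M = 4.
grid = simplex_grid(dim=3, resolution=4)

# The corner points (unit vectors) of the simplex are part of the grid.
corners = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
corners_included = all(corner in grid for corner in corners)

# The grid size follows the stars-and-bars count C(M + dim - 1, dim - 1).
expected_size = comb(4 + 3 - 1, 3 - 1)
```

Since every corner of the simplex is a grid point, any point of the simplex lies inside the convex hull of the grid, which is why no extrapolation is needed. The grid size still grows quickly with `dim`, so the resolution trades accuracy against computation.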
While grid-based approximation methods have been
studied and implemented before, our contribution lies
in the efficient use of such methods in the clinical
trials context, specifically for response-adaptive
designs, thus broadening the practical applicability
of such designs. In a later study, we provide bounds
on the optimality gap while noting that the objective
value obtained by approximation is a lower bound on
the optimal value of the fully enumerated problem.
3 MODEL
We follow the Bayes-adaptive Markov decision process
(BAMDP) model developed by Ahuja and Birge (2014).
The state in the BAMDP model is a vector with
dimension equal to the number of treatment-outcome
combinations, also called health conditions. The
state thus captures the information observed so far
(the history) and is used to derive the distributions
that describe the uncertainty in the transition
probabilities.
We first redefine the state in terms of the fraction
of patient observations within each health condition.
Each state dimension then represents the fraction of
patient observations made so far in a given health