in a learning algorithm, in order to gain insight into the expressivity of learning algorithms.
Within the statistical learning literature, there exist various measures characterizing algorithmic expressivity. For instance, the Vapnik-Chervonenkis (VC) dimension (Vapnik and Chervonenkis, 1971) provides a loose upper bound on algorithmic expressivity in general by characterizing the number of data points that can be exactly classified by the learning algorithm, for any possible labeling of the points. However, the disadvantages of the VC dimension include its inherent dependence on the dimensionality of the space on which the learning algorithm operates (V’yugin, 2015), as well as the fact that it is restricted
to classification problems. Building on the original VC dimension idea, Kearns and Schapire generalized the VC dimension to the fat-shattering dimension, deriving dimension-free bounds under the assumption that the learning algorithm operates within a restricted space (Kearns and Schapire, 1990). Further, Bartlett and Mendelson introduced Rademacher complexity as a more general measure of algorithmic expressivity, eliminating the assumption that learning algorithms are restricted to a particular distribution space (Bartlett and Mendelson, 2003).
In this paper, we establish an alternative general measure of algorithmic expressivity based on the algorithmic search framework (Montañez, 2017a). Because this search framework applies to clustering and optimization (Montañez, 2017b) as well as to the general machine learning problems considered in Vapnik’s learning framework (Vapnik, 1999), such as classification, regression, and density estimation, theoretical derivations of the expressivity of search algorithms using this framework directly apply to the expressivity of many types of learning algorithms.
3 SEARCH FRAMEWORK
3.1 The Search Problem
We formulate machine learning problems as search problems using the algorithmic search framework (Montañez, 2017a). Within the framework, a search
problem is represented as a 3-tuple (Ω,T, F). The fi-
nite search space from which we can sample is Ω.
The subset of elements in the search space that we are
searching for is the target set T . A target function
that represents T is an |Ω|-length vector with entries
having value 1 when the corresponding elements of
Ω are in the target set and 0 otherwise. The external
information resource F is a finite binary string that
provides initialization information for the search and
evaluates points in Ω, acting as an oracle that guides
the search process. In learning scenarios, this is typically a dataset with an accompanying loss function.
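To make this representation concrete, the following sketch encodes a search problem as the 3-tuple (Ω, T, F) in Python. The class name SearchProblem, the toy search space, and the loss-based information resource are illustrative assumptions on our part, not part of the framework's formal definition.

```python
# A minimal sketch (not from the original paper) of a search problem
# (Omega, T, F) from the algorithmic search framework.

import numpy as np

class SearchProblem:
    def __init__(self, omega, target_set, info_resource):
        # Finite search space Omega, given as a list of candidate elements.
        self.omega = list(omega)
        # Target set T, a subset of Omega represented here by element indices.
        self.target_set = set(target_set)
        # External information resource F: any callable that initializes the
        # search and evaluates points (e.g., a dataset with a loss function).
        self.info_resource = info_resource

    def target_vector(self):
        # |Omega|-length binary vector representing the target function:
        # entry i is 1 if the i-th element of Omega lies in T, else 0.
        return np.array([1 if i in self.target_set else 0
                         for i in range(len(self.omega))])

# Toy example: Omega is a small grid of candidate parameters, T contains the
# points with low loss, and F evaluates a (hypothetical) dataset-based loss.
omega = [0.0, 0.5, 1.0, 1.5, 2.0]
loss = lambda w: (w - 1.0) ** 2          # stands in for a dataset + loss
target = [i for i, w in enumerate(omega) if loss(w) <= 0.25]
problem = SearchProblem(omega, target, loss)
print(problem.target_vector())            # prints [0 1 1 1 0]
```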
3.2 The Search Algorithm
Given a search problem, a history of elements already
examined, and information resource evaluations, an
algorithmic search is a process that decides how to
next query elements of Ω. As the search algorithm
samples, it adds the record of points queried and in-
formation resource evaluations, indexed by time, to
the search history. The algorithm uses the history to
update its sampling distribution on Ω. An algorithm
is successful if it queries an element ω ∈T during the
course of its search. Figure 1 visualizes the search
process.
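As a rough illustration of this process, the sketch below (reusing the hypothetical SearchProblem from the previous listing) runs a black-box search loop: at each step i it forms a distribution P_i over Ω from the history, samples a point, records (ω, F(ω)), and reports success if a target element is ever queried. The update_distribution strategy and the uniform baseline are assumptions made for illustration only.

```python
# A minimal sketch (assumed structure, not the paper's implementation) of the
# black-box search loop described above.

import numpy as np

def run_search(problem, update_distribution, num_steps, rng=None):
    rng = rng or np.random.default_rng(0)
    history = []                         # (index, F-evaluation) tuples, indexed by time
    success = False
    for i in range(num_steps):
        # Black-box strategy: map the history to a distribution P_i over Omega.
        p_i = update_distribution(history, len(problem.omega))
        k = rng.choice(len(problem.omega), p=p_i)
        evaluation = problem.info_resource(problem.omega[k])
        history.append((k, evaluation))  # add the record of this query to the history
        if k in problem.target_set:      # success: queried an element of T
            success = True
    return success, history

# Example strategy: uniform random sampling, ignoring the history entirely.
uniform = lambda history, n: np.full(n, 1.0 / n)
success, history = run_search(problem, uniform, num_steps=10)
```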
Figure 1: As a black-box optimization algorithm samples from Ω, it produces an associated probability distribution P_i based on the search history. When a sample ω_k corresponding to location k in Ω is evaluated using the external information resource F, the tuple (ω_k, F(ω_k)) is added to the search history.
3.3 Measuring Performance
Following Montañez, we measure a learning algorithm’s performance using the expected per-query probability of success (Montañez, 2017a). This quantity gives a normalized measure of performance compared to an algorithm’s total probability of success, since the number of sampling steps may vary depending on the algorithm used and the particular run of the algorithm, which in turn affects the total probability of success. Furthermore, the per-query probability of success naturally accounts for sampling procedures that may involve repeatedly sampling the same points in the search space, as is the case with genetic algorithms (Goldberg, 1999; Reeves and Rowe, 2002), allowing this measure to deftly handle search algorithms that manage trade-offs between exploration and exploitation.
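As a hedged sketch of how this quantity can be estimated in practice (reusing the hypothetical SearchProblem and uniform strategy from the earlier listings), the code below averages, over query steps and over repeated runs, the probability mass that each sampling distribution P_i places on the target set T. The estimator and its Monte Carlo averaging are our own illustrative assumptions, not the paper's formal definition.

```python
# A hedged sketch (our own illustration) of estimating the expected per-query
# probability of success: average P_i(omega in T) over the query steps of a run,
# then average over repeated runs.

import numpy as np

def estimate_per_query_success(problem, update_distribution, num_steps,
                               num_runs=1000, seed=0):
    rng = np.random.default_rng(seed)
    target_mask = problem.target_vector().astype(float)
    run_averages = []
    for _ in range(num_runs):
        history = []
        per_step_mass = []
        for i in range(num_steps):
            p_i = update_distribution(history, len(problem.omega))
            per_step_mass.append(float(p_i @ target_mask))  # mass P_i places on T
            k = rng.choice(len(problem.omega), p=p_i)
            history.append((k, problem.info_resource(problem.omega[k])))
        run_averages.append(np.mean(per_step_mass))         # per-query average for this run
    return float(np.mean(run_averages))                     # Monte Carlo estimate of the expectation

# For uniform random sampling, the estimate reduces to |T| / |Omega| (here 0.6).
print(estimate_per_query_success(problem, uniform, num_steps=10))
```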