small k-sized target sets.
Montañez et al. later defined bias, the degree to which an algorithm is predisposed to a fixed target, with respect to the expected per-query probability of success metric, and proved that only a limited number of information resources can be favorable for a given bias (Montañez et al., 2019). Using the search framework, they proved that an algorithm cannot be favorably biased towards many distinct targets simultaneously.
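As a rough sketch (using notation assumed here rather than reproduced verbatim from the cited work), bias measures how much a distribution over information resources raises an algorithm's expected per-query probability of success above that of uniform random sampling:
\[
\operatorname{Bias}(\mathcal{D}, t) \;=\; \mathbb{E}_{F \sim \mathcal{D}}\big[\,q(t, F)\,\big] \;-\; \frac{|t|}{|\Omega|},
\]
where $q(t, F)$ denotes the expected per-query probability of success for target set $t \subseteq \Omega$ under information resource $F$, and $|t|/|\Omega|$ is the success probability of uniform random sampling over the finite search space $\Omega$.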
As machine learning grew in prominence, researchers began to probe the limits of what learning algorithms could achieve. Valiant considered the learnability of a task as the ability to generate a program for performing the task without explicitly programming it (Valiant, 1984). By restricting tasks to a specific context, Valiant demonstrated a set of tasks that is provably learnable.
Schaffer provided an early foundation for the idea of bounding the universal performance of an algorithm (Schaffer, 1994). He analyzed generalization performance, the ability of a learner to classify objects outside of its training set, in a classification task. Using a baseline of uniform sampling over classifiers, he showed that, summed over the set of all learning situations, a learner's generalization performance is zero, making generalization performance a conserved quantity.
Wolpert and Macready demonstrated that the historical performance of a deterministic optimization algorithm provides no a priori justification whatsoever for its continued use over any other alternative going forward (Wolpert and Macready, 1997), implying that there is no rational basis for preferring an algorithm that has performed better so far over one that has not. Furthermore, just as there does not exist a single algorithm that performs better than random on all possible optimization problems, they proved that there also does not exist an optimization problem on which all algorithms perform better than average.
Continuing the application of prior knowledge to learning and optimization, Gülçehre and Bengio showed that the worse-than-chance performance of certain machine learning algorithms can be improved through learning with hints, namely, guidance using a curriculum (Gülçehre and Bengio, 2016). So, while Wolpert and Macready's results might make certain tasks seem futile, Gülçehre and Bengio's empirical results show that there exist alternate means through which prior knowledge can be used to attain better results in both learning and optimization. Dembski and Marks measured the contributions of such prior knowledge using active information (Dembski and Marks II, 2009) and proved the difficulty of finding a good search algorithm for a fixed problem (Dembski and Marks II, 2010) through their concept of a search for a search (S4S). Their work eventually expanded into a formal general theory of search, characterizing the information costs associated with success (Dembski et al., 2013), which served as an inspiration for later developments in machine learning (Montañez, 2017b).
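As an illustrative sketch (the notation $p$ and $q$ is assumed here, not quoted from the cited papers), active information quantifies the advantage that an assisted search with success probability $q$ has over a baseline blind search with success probability $p$:
\[
I_{+} \;=\; \log_2\!\frac{q}{p},
\]
so, for example, an information resource that doubles the baseline probability of success contributes one bit of active information.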
Others have worked towards meaningful bounds on algorithmic success through different approaches. Sterkenburg approached this concept from the perspective of Putnam, who used a diagonalization argument to claim that a universal learning machine is impossible (Sterkenburg, 2019). Sterkenburg followed up on this claim, attempting to find a universal inductive rule by exploring a measure which cannot be diagonalized. Even when attempting to evade Putnam's original diagonalization, Sterkenburg was able to apply a new diagonalization that reinforces Putnam's original claim of the impossibility of a universal learning machine.
There has also been work on proving learn-
ing bounds for specific problems. Kumagai and
Kanamori analyzed the theoretical bounds of param-
eter transfer algorithms and self-taught learning (Ku-
magai and Kanamori, 2019). By looking at the local
stability, or the degree to which a feature is affected
by shifting parameters, they developed a definition for
parameter transfer learnability, which describes the
probability of effective transfer.
2.1 Distinctions from Prior Work
The expected per-query probability of success metric
previously defined in the algorithmic search frame-
work (Montañez, 2017b) tells us, for a given infor-
mation resource, algorithm, and target set, how often
(in expectation) our algorithm will successfully locate
elements of the target set. While this metric is useful
when making general claims about the performance
of an algorithm or the favorability of an algorithm
and information resource to the target set, it lacks
the specificity to make claims about similar perfor-
mance and favorability on a per-iteration basis. This
trade-off calls for a more general metric that can be
used to make both general and specific (per iteration)
claims. For instance, in transfer learning tasks, the
performance and favorability of the last pre-transfer
iteration is more relevant than the overall expected
per-query probability of success. The general proba-
bility of success, which we will define as a particular
decomposable probability-of-success metric, is a tool
through which we can make claims at specific and rel-
evant steps.
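For concreteness, here is a sketch of the expected per-query probability of success in the notation we assume the search framework uses (following the form given in Montañez (2017b)):
\[
q(t, f) \;=\; \mathbb{E}_{\tilde{P}, H}\!\left[\frac{1}{|\tilde{P}|}\sum_{i=1}^{|\tilde{P}|} P_i(\omega \in t) \;\middle|\; f\right],
\]
where $\tilde{P}$ is the sequence of probability distributions the algorithm uses to sample the search space, $H$ is the search history, $t$ is the target set, and $f$ is the information resource. Because the metric averages over the entire sequence, it cannot isolate performance at a single iteration, which motivates the decomposable, per-iteration metrics developed in this paper.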