2.3 Multitask Learning
Deep learning models rely on many hidden layers and large numbers of parameters, and consequently require large amounts of training data. Cancer, like many other medical fields, does not meet this data requirement, particularly because data instances must be labeled through manual labor (Zhang and Yang, 2017). It is therefore a natural candidate for multitask learning (MTL), where useful information from multiple related tasks is used to alleviate data sparsity. MTL has been a promising field in machine learning since its initial formulation by Caruana (1997). Broadly, the goal of MTL is to leverage useful information found across multiple learning tasks to obtain more accurate learners. This objective, of course, assumes that the tasks (or subsets of them) are related. Both empirically and theoretically, jointly learning several tasks has been found to yield better performance than learning each task independently. Depending on the tasks involved, MTL can take different setups, commonly classified as MTL supervised learning, MTL unsupervised learning, MTL semi-supervised learning and MTL online learning, among others.
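Concretely, and as a generic formulation rather than any particular author's model, the joint objective over T related tasks can be written as

\min_{\theta_{sh},\, \theta_1, \dots, \theta_T} \; \sum_{t=1}^{T} \lambda_t \, \mathcal{L}_t(\theta_{sh}, \theta_t),

where \theta_{sh} denotes parameters shared across all tasks, \theta_t the parameters specific to task t, \mathcal{L}_t the loss of task t, and \lambda_t \geq 0 a task weight; the shared/task-specific split and the weights are modeling choices.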
MTL supports the notion that machines can mimic human learning, in which people transfer knowledge across tasks to further other tasks. For instance, the skills of the long jump and of sprinting on the track can reinforce each other and thus improve an athlete's overall performance. MTL is thus simply an inductive transfer mechanism that aims to improve the generalization of machine learning models (Caruana, 1997), which it achieves by leveraging domain-specific information from related tasks trained in parallel. The training signal of the additional tasks therefore acts as an inductive bias, in the general sense of the term: anything that leads an inductive learner to prefer certain hypotheses over others.
2.3.1 Empirical Studies
Most empirical studies of MTL have focused on feature selection problems, where attributes from multi-source data are used in classification or regression experiments. In most cases the features in question are related even though they are derived from different data sources. Exploiting these underlying relations, it has proven easier to jointly select the relevant attributes (features) from the various sources using joint selection regularizers. These regularizers, which are simply selection constraints, have been found to improve the performance of classification models compared to conventional techniques that evaluate features individually per data source. Commonly introduced regularizers include joint sparsity, graph sparse coding, graph self-representation and low rank. It is the inclusion of these elements that has helped MTL handle complex real-world problems such as the diagnosis of neurodegenerative diseases (Bib, 2019). Using structural Magnetic Resonance Imaging (sMRI), researchers have been able to jointly predict several types of clinical scores for these conditions together with subject-specific diagnostic labels. One example of this success is the study of Alzheimer's disease (AD), where clinical scores such as the Mini-Mental State Examination (MMSE) and the Dementia Rating Scale (DRS) are used to grade the functional health of the brain.
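As a minimal sketch of joint feature selection with a joint-sparsity (l2,1-norm) regularizer, in the spirit of the multitask clinical-score prediction just described, one can compare scikit-learn's MultiTaskLasso against independent per-task lasso models; the feature matrix X and score matrix Y below are synthetic stand-ins for sMRI features and (MMSE, DRS) scores, not real data:

import numpy as np
from sklearn.linear_model import MultiTaskLasso, Lasso

rng = np.random.default_rng(0)
n_samples, n_features, n_tasks = 100, 30, 2   # e.g. tasks: MMSE, DRS

# Only a few features are truly informative, and they are shared by tasks.
X = rng.standard_normal((n_samples, n_features))
W_true = np.zeros((n_tasks, n_features))
W_true[:, :5] = rng.standard_normal((n_tasks, 5))
Y = X @ W_true.T + 0.1 * rng.standard_normal((n_samples, n_tasks))

# Joint selection: the l2,1 penalty zeroes out entire feature columns
# across all tasks at once.
mtl = MultiTaskLasso(alpha=0.1).fit(X, Y)
joint_support = np.any(mtl.coef_ != 0, axis=0)

# Baseline: independent l1-regularized selection, one model per task.
ind_support = np.any(
    [Lasso(alpha=0.1).fit(X, Y[:, t]).coef_ != 0 for t in range(n_tasks)],
    axis=0,
)
print("jointly selected features:      ", np.flatnonzero(joint_support))
print("independently selected features:", np.flatnonzero(ind_support))

The design point is that the joint regularizer couples the tasks: a feature is kept or discarded for all tasks together, which is exactly the behavior the joint selection regularizers above exploit.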
As MTL principles specify, classification in this setting is based on predicting a target output. Because the target outputs, such as diagnostic labels and clinical scores, are related, one obtains better results than when each task is learned independently. A similar approach underlies the recent success of self-driving systems: images from cameras mounted on the vehicle are used to detect objects (road signs, traffic lights, etc.), which are then fed into neural networks to train a model for autonomous driving. A more robust system results because the model learns to recognize multiple object types simultaneously.
2.3.2 Multitask Learning Approaches
As discussed above, MTL is simply a form of inductive transfer that improves algorithms by adding an inductive bias. This bias helps a model discriminate between attributes and thus prefer some hypotheses over others. ℓ1 regularization is the most common type of inductive bias in machine learning and is often used to induce a preference for sparse solutions. MTL, in contrast, obtains its inductive bias through auxiliary tasks, whose contributions tilt the model toward certain hypotheses. To achieve its goals, MTL in deep neural networks commonly employs one of two contrasting schemes: hard and soft parameter sharing (Ruder, 2017). The shared element refers to how the hidden layers are shared across tasks.
1. Hard Parameter Sharing: Its application in neural networks goes back to Caruana (1997). The hidden layers are shared between all tasks involved, while a few task-specific output layers are maintained, as sketched below. Owing to its efficiency and simplicity, it is the most commonly used approach to MTL in neural networks.
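As a concrete illustration, a minimal hard parameter sharing architecture can be sketched in PyTorch as follows; the layer sizes and the two task heads (a diagnostic label and a clinical score, echoing the AD example above) are hypothetical choices, not a prescribed design:

import torch
import torch.nn as nn

class HardSharingNet(nn.Module):
    def __init__(self, in_dim=30, hidden=64, n_classes=2):
        super().__init__()
        # Hidden layers shared by all tasks.
        self.shared = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Task-specific output layers.
        self.diagnosis_head = nn.Linear(hidden, n_classes)  # classification task
        self.score_head = nn.Linear(hidden, 1)              # regression task (e.g. MMSE)

    def forward(self, x):
        h = self.shared(x)
        return self.diagnosis_head(h), self.score_head(h)

model = HardSharingNet()
x = torch.randn(8, 30)                      # dummy mini-batch of 8 subjects
logits, score = model(x)
loss_cls = nn.CrossEntropyLoss()(logits, torch.randint(0, 2, (8,)))
loss_reg = nn.MSELoss()(score.squeeze(1), torch.randn(8))
loss = loss_cls + loss_reg                  # joint loss over both tasks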