et al., 2005). Due to the high similarity of adjacent
frames, this process provides highly accurate and reliable
results. The question of how the keyframe itself is to be
colored remains unanswered, though.
A large group of colorization approaches involves a
human operator in order to exploit the vast knowledge
humans have about objects in the real world. Users label a
few pixels within the image with corresponding colors,
which are then propagated to neighboring pixels
with similar intensity or textural patterns (A. Levin,
2004; Yatziv and Sapiro, 2006). The strength of these
approaches is the user. On the one hand, humans
are naturally well trained to match colors from
memory. On the other hand, the selected colors are
plausible on the object or semantic level
instead of being based on low-level image information
such as texture or intensity alone. While these
approaches yield visually pleasing color versions,
the involvement of a human operator is time-consuming
and hinders application to a large number of
images.
The last group consists of fully automatic approaches
which do not rely on any kind of manual interaction.
Instead, they are based on a set of training images
for which the corresponding color information is known
in addition to the grayscale images. These
methods derive a statistical model of the relationship
between intensity and textural patterns on the one side
and realistic colors on the other side. A recent example
is (R. Zhang, 2016), which is based on a Convolutional
Network (ConvNet) trained on millions
of color images. The resulting network is able to colorize
general images and leads to visually pleasing
results.
While approaches such as (R. Zhang, 2016) are
based on millions of training images, our approach
operates with 10-20 images with the additional constraint
that these images show a similar scene (see Figure 1).
Due to the smaller number of images, training
is very efficient and can easily be tailored to specific
scenes for various applications.
The proposed approach applies a Random Forest (RF,
(Breiman, 2001)) to the regression task of estimating
plausible color information when provided with a local
grayscale image patch. Color information is stored as 2D
histograms over chrominance values within the CIE
L*a*b* color space. Although the RF is trained on
only a few images of a similar scene, the usage of
local image patches as well as comparatively large
histograms would lead to a large memory footprint
if naively implemented. Two solutions are proposed
to cope with this problem. First, the color histograms
at the leaves are usually sparse and can thus be stored
in a memory-efficient manner. The second solution
is based on the observation that only tree creation
needs to hold all training samples in memory
(see Section 2.2), while tree training can be executed
on training batches (see Section 2.3). Instead of
precomputing any kind of low-level features, the RF is
applied to the grayscale images directly. The corresponding
node tests compute several implicit features on
the fly (as for example in (Lepetit and Fua, 2006)),
which allows memory- and time-efficient processing
and is furthermore highly adaptable to the specific
colorization task. The training data is augmented with
training images at different scales to enable the forest
to map scaled textures to the correct colors. Observed
color values are rebalanced similarly to (R. Zhang,
2016) to account for the fact that pastel colors occur
more frequently than saturated colors.
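The sparse storage of leaf histograms described above can be
illustrated with a minimal Python sketch. It assumes a regular
32 x 32 binning of the a*b* plane over the range [-128, 128] and
a dictionary keyed by bin index. The bin count, the value range,
and the names ab_bin and SparseLeafHistogram are hypothetical
choices made for illustration and are not taken from the
proposed method.

```python
from collections import defaultdict

# Assumed values for illustration; the bin layout is not given in the text.
AB_MIN, AB_MAX = -128.0, 128.0   # assumed a*/b* value range
BINS = 32                        # assumed number of bins per axis
BIN_WIDTH = (AB_MAX - AB_MIN) / BINS


def ab_bin(a, b):
    """Map an (a*, b*) pair to the integer bin indices of the 2D histogram."""
    # Clamp so that out-of-range or boundary values fall into the edge bins.
    i = min(BINS - 1, max(0, int((a - AB_MIN) / BIN_WIDTH)))
    j = min(BINS - 1, max(0, int((b - AB_MIN) / BIN_WIDTH)))
    return i, j


class SparseLeafHistogram:
    """Keeps only the non-empty bins of a leaf's chrominance histogram."""

    def __init__(self):
        self.counts = defaultdict(float)  # (i, j) -> accumulated weight
        self.total = 0.0

    def add(self, a, b, weight=1.0):
        """Accumulate one training sample with an optional weight."""
        self.counts[ab_bin(a, b)] += weight
        self.total += weight

    def mode(self):
        """Return the centre (a*, b*) of the most populated bin.

        Raises ValueError if no sample has been added yet.
        """
        (i, j), _ = max(self.counts.items(), key=lambda kv: kv[1])
        return (AB_MIN + (i + 0.5) * BIN_WIDTH,
                AB_MIN + (j + 0.5) * BIN_WIDTH)


leaf = SparseLeafHistogram()
for a, b in [(20.0, 30.0), (21.0, 31.0), (-40.0, 10.0)]:
    leaf.add(a, b)
print(len(leaf.counts), "of", BINS * BINS, "bins stored")
```

Only bins that received at least one sample occupy memory, so a
leaf that has seen a handful of similar colors stores a handful
of entries instead of the full grid.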
The contribution of the proposed method is therefore
fivefold:
• Decoupling tree creation and tree training allows
full use of a large number of training samples during
the estimation of the target variable.
• Implicit feature learning makes the computation
of predefined features obsolete.
• A sparse representation of the target variables leads
to memory-efficient RFs.
• Data augmentation increases the robustness of the
colorization with respect to scale.
• Color rebalancing leads to realistically saturated
colors.
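As an illustration of the color rebalancing listed above, the
following sketch weights each chrominance bin inversely to a
mixture of its empirical frequency and a uniform distribution, a
scheme in the spirit of (R. Zhang, 2016). The mixing parameter
lam, its default of 0.5, the example probabilities, and the
function name are assumptions; the exact weighting used by the
proposed method is not specified in this section.

```python
def rebalancing_weights(bin_probs, lam=0.5):
    """Return one weight per bin so that rare colors count more.

    Assumed form: w_q proportional to 1 / ((1 - lam) * p_q + lam / Q),
    scaled such that the expected weight sum(p_q * w_q) equals 1.
    lam must be > 0 if any bin has probability 0.
    """
    q = len(bin_probs)
    raw = [1.0 / ((1.0 - lam) * p + lam / q) for p in bin_probs]
    expected = sum(p * w for p, w in zip(bin_probs, raw))
    return [w / expected for w in raw]


# Hypothetical empirical frequencies: one common (pastel) bin, two rarer ones.
probs = [0.7, 0.2, 0.1]
weights = rebalancing_weights(probs)
for p, w in zip(probs, weights):
    print(f"p = {p:.1f}  weight = {w:.3f}")
```

With lam = 1 all bins receive the same weight, and smaller
values of lam shift more weight towards rare, typically more
saturated colors.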
2 COLORIZATION ALGORITHM
2.1 Preprocessing
The proposed colorization method is based on an RF
(see Sections 2.2-2.4) as regression method. As a
supervised approach, it relies on training data which,
in addition to the grayscale images, provides the
corresponding color information. While ground truth
data is difficult to obtain in many other supervised
machine learning problems such as semantic segmentation,
it essentially comes for free for colorization tasks.
Any color image can be transformed into a
grayscale version, where the latter is used as training
data and the former as reference image. The proposed
method uses the CIE L*a*b* color space to perform
regression, since it decouples luminance from
color information. Thus, training images are converted
from RGB to CIE L*a*b*, where the luminance
L* is used as training input and the a*b* components
as target variable. During prediction, the luminance
Towards Image Colorization with Random Forests