An Alternative to Restricted-Boltzmann Learning for Binary Latent
Variables based on the Criterion of Maximal Mutual Information
David Edelman
University College Dublin, Ireland
Keywords:
Machine Learning, Data Compression, Information Theory, Unsupervised Learning.
Abstract:
The latent binary variable training problem used in the pre-training process for Deep Neural Networks is approached using the Principle (and related Criterion) of Maximum Mutual Information (MMI). This is presented as an alternative to the most widely-accepted 'Restricted Boltzmann Machine' (RBM) approach of Hinton. The primary contribution of the present article is to present the MMI approach as an arguably more natural and logically simpler means to the same ends. Additionally, the relative ease and effectiveness of the approach in application are demonstrated for an example case.
1 INTRODUCTION
As has become evident in recent years, the use of pre-training is crucial to the overall training of Deep Neural Networks. Historically, weight initialisation in the training of feed-forward Neural Networks was carried out by mere pseudo-random sampling, which worked satisfactorily for networks with few hidden layers. The inadequacy of this form of initialisation, however, inhibited research into networks of deeper architecture, and it was the key breakthrough of Hinton in 1999 (Hinton, 1999), introducing new methods for unsupervised 'pre-training', which first enabled the widespread use of networks of deeper architecture and in turn marked the beginning of the resurgence in Neural Network research known as Deep Learning. In essence, the notion of 'pre-training' in a feed-forward network amounts to an iterated succession of unsupervised data compressions proceeding forward through the network before supervised learning or training has begun. The method of compression proposed by Hinton is referred to as the 'Restricted Boltzmann Machine' (hereafter, RBM), a construct which owes its heuristics to an analogy with problems in thermodynamics, and which requires an intricate estimation procedure involving advanced Monte Carlo simulation, including Gibbs Sampling, in a process referred to as Contrastive Divergence. The RBM approach to pre-training has proven effective, and has indeed become one of the most widely-used heuristics for carrying out pre-training. One question worth asking, however, is whether a logically simpler, more direct approach (not involving analogies, heuristics, or intricate simulations and calculations) might be found. It is this question to which the present article addresses itself.
In what follows, a method is proposed based on a probability-based measure called Mutual Information which, it is argued, should be maximal between a pair of variables if one is to be considered an optimal compression of the other. Accordingly, a Maximum Mutual Information (hereafter, MMI) Criterion is introduced and applied in training to approach optimal compression from one network layer to the next.
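For concreteness, the quantity being maximised is the standard (Shannon) mutual information between a visible layer and its binary compression; writing $X$ for the visible layer and $Z$ for the binary latent layer (notation introduced here purely for exposition), the usual discrete form is

$$I(X;Z)=\sum_{x}\sum_{z}p(x,z)\,\log\frac{p(x,z)}{p(x)\,p(z)}=H(Z)-H(Z\mid X),$$

so that the MMI Criterion amounts to choosing the layer's weights so as to make $I(X;Z)$ as large as possible.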
It will be argued that maximising this criterion leads to a practicable algorithm serving a similar purpose to an RBM, and this algorithm is then exhibited as being effective and simple to implement, with a practical example from the Financial Markets used as a demonstration.
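As a preview of the kind of computation the criterion entails (the estimator and training procedure actually used are developed in the sequel), the following minimal sketch maximises an empirical mutual-information surrogate on toy data: the mutual information between each binary input feature and a single deterministic binary latent unit is estimated from sample counts, summed, and improved by a crude random hill-climb. All names, the toy data, and the per-feature surrogate for the full $I(X;Z)$ are illustrative assumptions, not the algorithm proposed in this article.

```python
import numpy as np

def binary_mutual_information(a, b):
    """Empirical mutual information (in nats) between two {0,1}-valued arrays."""
    a, b = np.asarray(a, dtype=int), np.asarray(b, dtype=int)
    mi = 0.0
    for va in (0, 1):
        pa = np.mean(a == va)
        for vb in (0, 1):
            pb = np.mean(b == vb)
            pab = np.mean((a == va) & (b == vb))
            if pab > 0:  # 0 log 0 = 0 by convention
                mi += pab * np.log(pab / (pa * pb))
    return mi

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 8))  # toy binary 'visible' data

def latent(X, w):
    """A single deterministic binary latent unit: 1[Xw > 0]."""
    return (X @ w > 0).astype(int)

# Crude random hill-climb on the summed per-feature mutual information,
# used here purely as an illustrative stand-in for maximising I(X; Z).
w = rng.normal(size=X.shape[1])
best = sum(binary_mutual_information(X[:, j], latent(X, w))
           for j in range(X.shape[1]))
for _ in range(200):
    w_trial = w + 0.1 * rng.normal(size=X.shape[1])
    score = sum(binary_mutual_information(X[:, j], latent(X, w_trial))
                for j in range(X.shape[1]))
    if score > best:
        w, best = w_trial, score
```

In practice the hill-climb would be replaced by a smoother, gradient-based optimiser, but the objective itself is nothing more than the empirical form of the equation displayed above.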
Before proceeding, it should be mentioned that while the methods proposed here for the 'pre-training' problem would generally be applied in place of the RBM methodology, the latter will not be reviewed here. This is because the RBM construct and the advanced techniques involved in its application do not lend themselves well to brief description and explanation, so readers unfamiliar with RBMs would not benefit from an attempt to summarise them here, even in general terms. By contrast, it is believed that a wide variety of readers will be able to follow the (arguably much simpler) approach adopted here for addressing the 'pre-training' problem, where many such readers might not readily be able to grasp and apply the RBM approach without considerable further study.