minimization. First, we have observed some cases where the information-theoretic methods do not necessarily succeed in increasing information content. For example, when the number of neurons increases, the adjustment among neurons becomes difficult, which prevents neural networks from increasing information content. Second, we have a problem of computational complexity. As expected, information or entropy functions give complex learning formulas. This also suggests that the information-theoretic methods can be effective only for relatively small neural networks. Third, we have a problem of compromise between information maximization and error minimization. From the information-theoretic point of view, information on input patterns should be increased. However, neural networks should minimize errors between targets and outputs. We have observed that information maximization and error minimization sometimes contradict each other. This means that it is difficult to find a compromise between information maximization and error minimization in one framework.
We here propose a new information-theoretic method to facilitate information acquisition in neural networks. Instead of dealing directly with the entropy function, we realize a process of information maximization by using the outputs from neurons directly, without normalizing the outputs into approximate probabilities. This direct use of outputs facilitates the process of information maximization and removes much of the computational complexity, as the sketch below illustrates.
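To make the contrast concrete, the following is a minimal sketch in our own notation, not the paper's exact algorithm: entropy_information follows the conventional route of normalizing hidden outputs into firing probabilities before measuring entropy, while direct_output_score is an assumed surrogate that works on the raw outputs themselves, rewarding patterns in which one neuron is strongly active and the rest are nearly silent.

```python
import numpy as np

def hidden_outputs(W, x):
    """Sigmoid outputs of hidden neurons for one input pattern x."""
    return 1.0 / (1.0 + np.exp(-W @ x))

def entropy_information(V):
    """Conventional route: normalize outputs into firing probabilities
    p(j|s), then measure how far their entropy falls below the maximum."""
    P = V / V.sum(axis=1, keepdims=True)            # p(j|s), shape (S, M)
    H = -(P * np.log(P + 1e-12)).sum(axis=1).mean()
    return np.log(V.shape[1]) - H                   # H_max - H

def direct_output_score(V):
    """Direct route (our illustrative surrogate): score the raw outputs;
    one strongly active neuron among silent ones scores high, with no
    normalization and no entropy gradient to differentiate."""
    return (V.max(axis=1) - V.mean(axis=1)).mean()

W = np.random.randn(5, 8) * 0.1                     # 5 hidden, 8 input neurons
X = np.random.rand(20, 8)                           # 20 input patterns
V = np.array([hidden_outputs(W, x) for x in X])
print(entropy_information(V), direct_output_score(V))
```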
In addition, we separate the information acquisition and information use phases. We first try to acquire information content on input patterns. Then, we use the obtained information content to train supervised neural networks. This eliminates the contradiction between information maximization and error minimization. Such a separation has proved useful in the field of deep learning [8], [9], [10], [11]. Unlike those methods, our method actively tries to create the information necessary for supervised learning. A sketch of the two-phase procedure follows.
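The sketch below shows one way the separation could be organized; the hill-climbing update in phase 1 and the least-squares output layer in phase 2 are stand-ins of our own, not the learning rules of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((20, 8))                 # input patterns
T = rng.random((20, 1))                 # supervised targets
W = rng.standard_normal((5, 8)) * 0.1   # input-to-hidden weights

def hidden(W, X):
    return 1.0 / (1.0 + np.exp(-X @ W.T))

# Phase 1: information acquisition.  Hill-climbing on an output-based
# information score (a stand-in for the paper's update rule).
def score(W):
    V = hidden(W, X)
    return (V.max(axis=1) - V.mean(axis=1)).mean()

for _ in range(200):
    W_try = W + 0.05 * rng.standard_normal(W.shape)
    if score(W_try) > score(W):
        W = W_try

# Phase 2: information use.  Error minimization trains only the output
# layer, so the acquired representation is not undone by the error gradient.
V = hidden(W, X)
U = np.linalg.lstsq(V, T, rcond=None)[0]   # least-squares output weights
print("squared error:", ((V @ U - T) ** 2).mean())
```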
2 Theory and Computational Methods
2.1 Simplified Information Maximization
We have developed information-theoretic methods to increase the information content in hidden neurons on input patterns, and we have so far succeeded in increasing the information content considerably [5], [6], [7]. However, the methods were limited to networks with a relatively small number of hidden neurons because of the computational complexity of the information-theoretic formulation. In addition, we found that the obtained information content did not necessarily contribute to improved prediction performance.
The computational complexity of the information-theoretic methods can be reduced by dealing directly with the outputs of the neurons. We try to approximate a state of higher information by directly producing the hidden patterns that real information maximization would achieve, as formalized below.
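To see why targeting hidden patterns directly is a reasonable approximation, recall the limiting behavior usually associated with information maximization in this setting: as the information approaches its maximum, each input pattern $s$ comes to activate a single winning hidden neuron $j^{*}(s)$, with all other neurons silent,
\[
p(j \mid s) \;\longrightarrow\;
\begin{cases}
1, & j = j^{*}(s), \\
0, & \text{otherwise.}
\end{cases}
\]
The simplified method therefore pushes the raw outputs toward such winner-take-all patterns without computing the entropy or its gradient.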
Information in Hidden Neurons. We here explain how to compute the information and how to approximate it for simplification. Let $x_k^s$ and $w_{jk}$ denote the $k$th element of the $s$th input pattern and the connection weight from the $k$th input neuron to the $j$th hidden neuron.
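For orientation, one conventional way to write the quantities involved, assuming $L$ input neurons, $M$ hidden neurons, $S$ equiprobable input patterns, and a sigmoid activation $f$ (our reconstruction of the standard formulation in [5], [6], [7], not necessarily the exact equations used here), is
\[
v_j^s = f\!\left(\sum_{k=1}^{L} w_{jk} x_k^s\right), \qquad
p(j \mid s) = \frac{v_j^s}{\sum_{m=1}^{M} v_m^s}, \qquad
p(j) = \frac{1}{S} \sum_{s=1}^{S} p(j \mid s),
\]
\[
I = \frac{1}{S} \sum_{s=1}^{S} \sum_{j=1}^{M} p(j \mid s) \log \frac{p(j \mid s)}{p(j)}.
\]
Maximizing $I$ requires differentiating through the normalization in $p(j \mid s)$, which is the source of the complex learning formulas mentioned above; the simplified method works on the outputs $v_j^s$ directly instead.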