The adaptation performed in Eq. 2 was motivated by the need for the activation function to be the identity function at the beginning of the training of a new layer. As the training of this layer progresses, the activation function must be gradually converted into the sigmoid function, a transition governed by the variable c, so that after a certain number of iterations the activation function becomes the traditional sigmoid again. Thus, the variable c weights the identity and sigmoid functions, transforming the activation function from the identity at the beginning of the training of the new layer into a sigmoid function at the end.
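A minimal sketch of this weighted activation follows. Eq. 2 itself is not reproduced in this section, so the blend c · φ(n) + (1 − c) · n is an assumption derived from the description above:

import numpy as np

def sigmoid(n):
    return 1.0 / (1.0 + np.exp(-n))

def phi_A(n, c):
    # Assumed form of Eq. 2: sigmoid and identity weighted by c.
    # c = 0 -> pure identity (newly inserted layer);
    # c = 1 -> the traditional sigmoid.
    return c * sigmoid(n) + (1.0 - c) * n

At c = 0 the new layer passes its net input through unchanged; as c grows toward 1 it behaves as a standard sigmoid layer.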
The inclusion of the variable c in the sigmoid activation function (φ) gives the designer the ability to control the influence of a layer on learning: the closer the value of c is to 1, the greater the layer's influence, and the closer to 0, the smaller. The use of the variable c in training is of great importance for the layer-insertion method because, with this device, it is possible to guarantee that the influence of the new layer grows gradually as its weights adjust to the problem, since, by default, these weights are generated randomly for each new layer inserted.
Due to the change in the sigmoid activation function described above, it was necessary to use the derivative of the modified function in the backpropagation algorithm (Eq. 3):
dφ_A/dn = c · y · (1 − y) + (1 − c)    (3)
where y is the output of the traditional sigmoid function and c is the weighting factor.
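A sketch of the corresponding backward pass, implementing Eq. 3 directly (y denotes the sigmoid output, as in the text):

import numpy as np

def sigmoid(n):
    return 1.0 / (1.0 + np.exp(-n))

def phi_A_derivative(n, c):
    # Eq. 3: d(phi_A)/dn = c * y * (1 - y) + (1 - c), with y = sigmoid(n).
    y = sigmoid(n)
    return c * y * (1.0 - y) + (1.0 - c)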
Using the mask D and the activation function modified by the variable c, the insertion of a new layer did not negatively interfere with learning, as can be seen in the following section, where the results obtained in the experiments are presented.
4 RESULTS AND DISCUSSION
CollabNet was applied to a pattern recognition task. The dataset used was the Wisconsin Breast Cancer Dataset, obtained from the Machine Learning Repository of the University of California at Irvine (UCI). This database contains 669 breast tumor records, divided into two classes identified as malignant (M) and benign (B) tumors, each with ten real-valued characteristics computed for each cell nucleus: radius, texture, perimeter, area, smoothness (local variation in radius length), compactness (perimeter² / area − 1.0), concavity (severity of concave portions of the contour), concave points, symmetry, and fractal dimension.
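For reference, a comparable diagnostic version of this dataset ships with scikit-learn; the sketch below assumes it corresponds to the UCI version used in the experiments (note that the scikit-learn copy has 569 records and 30 derived features, the mean, standard error, and worst value of each of the ten per-nucleus measures):

from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X, y = data.data, data.target
print(X.shape)            # (569, 30)
print(data.target_names)  # ['malignant' 'benign']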
For this database, several configurations of the proposed network were tested, varying several initialization parameters for both the network and the new layers: number of neurons and epochs, learning rate, value and increment schedule of the variable c and of the mask D (∆D), and the behavior of the weights during training. However, the number of neurons in the hidden layers is a primary parameter related to the structure of the network. This parameter is defined at the moment the network is created and is immutable from the insertion of the second hidden layer onward, ensuring that each new hidden layer changes only the depth of the network and not its width.
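A minimal structural sketch of this constraint (class and attribute names are hypothetical, not from the paper): the hidden-layer width is fixed at construction, and every insertion only deepens the network:

import numpy as np

class CollabNetSketch:
    """Hypothetical skeleton: fixed hidden width, growing depth."""

    def __init__(self, n_inputs, n_hidden, n_outputs, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n_hidden = n_hidden  # immutable width, set at creation
        self.hidden_weights = [self.rng.standard_normal((n_inputs, n_hidden))]
        self.output_weights = self.rng.standard_normal((n_hidden, n_outputs))

    def insert_hidden_layer(self):
        # Every new hidden layer is (n_hidden x n_hidden):
        # depth grows, width does not.
        self.hidden_weights.append(
            self.rng.standard_normal((self.n_hidden, self.n_hidden)))
        self.c = 0.0  # the new layer starts with no influence (see Section 3)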
4.1 Parameterization
The training parameters were estimated empirically, always considering the inclusion of a new layer. The learning rate, the number of epochs, and the behavior of the weights of the new layer with respect to their initialization are essential parameters for the insertion of a new layer. Finally, there are the parameters related to the behavior of the variable c and the mask D, which define the rates at which these variables change during training.
Among the various parameters of the network, the variable c, which plays a special role in this project, deserves a more comprehensive explanation. This variable is directly related to the inclusion of a new layer, as well as to its transition, so that a newly inserted layer, initially not useful for the learning of the network, can become an element of importance to this learning, as presented in Section 3. The variable c is responsible for controlling the influence of a new layer on training, and this influence is directly proportional to the value of c: the closer c is to its maximum value (1), the higher the influence of the new layer on training. Therefore, the control of c is the great challenge of this proposal, and the way the value of this variable increases during the training of the new layer needs to be parameterized individually. This parameterization was done empirically, with increment values between 0.001 and 0.003 being chosen.
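A sketch of such an increment schedule, using the empirically chosen range above (clamping at 1 and applying the update once per iteration are assumptions, as the paper does not fix these details here):

def increment_c(c, delta_c=0.002):
    # Raise the new layer's influence gradually, clamping at 1;
    # delta_c follows the empirically chosen range (0.001 to 0.003).
    return min(1.0, c + delta_c)

c = 0.0
for step in range(1000):  # with delta_c = 0.002, c reaches 1 after 500 steps
    c = increment_c(c)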
The designer must carefully observe the adjustment of the increment of c, since its correct parameterization has a direct influence on the behavior of the MSE. Figure 6(a) presents an enlarged view of the training of the last layer shown in Figure 6(b), for which a relatively high increment of the variable c, approximately 0.3 per iteration, was defined. At each increment of c, perturbations in the MSE can be perceived. This phenomenon is explained by