run. “Iterations” is the number of training steps performed, where each step is completed when one batch is processed. “Learn. Rate” is the learning rate, i.e., the magnitude of the change the learning algorithm makes to the neural network weights at each update; the higher this value, the larger the changes to the weights during training. “Accuracy” is the percentage of correct classifications the trained neural network model obtained on the test set.
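The role of the learning rate can be illustrated with a single gradient-descent weight update; the weight and gradient values below are hypothetical, chosen only for illustration, while the two rates are the bounds of the range reported for the models:

```python
# One gradient-descent weight update: the learning rate scales the step,
# so a higher rate produces larger changes to the weights per iteration.
weight = 0.5      # hypothetical weight value (illustrative only)
gradient = 2.0    # hypothetical gradient of the loss w.r.t. this weight

for lr in (0.001, 0.01):  # the learning-rate range used by the models
    updated = weight - lr * gradient
    print(lr, updated)
```

The update with the larger rate moves the weight ten times as far, which is why an overly large rate can destabilize training while an overly small one slows it down.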
Model 1 presented an accuracy of 72.5%, the highest reached in this work. It classified 29 of the 40 test samples correctly and 11 incorrectly; of these errors, 6 are false positives and 5 are false negatives.
Model 2 also produced relevant results: as described in Table 1, it presented an accuracy of 70.0%, classifying 28 of the 40 test samples correctly and 12 incorrectly. Of these errors, 5 are false positives and 7 are false negatives.
Continuing in Table 1, Model 3 presented an accuracy of 67.5%, classifying 27 of the 40 test samples correctly and 13 incorrectly. Of these errors, 6 are false positives and 7 are false negatives.
Model 4 showed an accuracy of 65.0%, as described in Table 1, classifying 26 of the 40 test samples correctly and 14 incorrectly. Of these errors, 5 are false positives and 9 are false negatives.
Model 5, as described in Table 1, presented an accuracy of 55.0%, the lowest result among the models listed. It classified only 22 of the 40 test samples correctly and 18 incorrectly; of these errors, 9 are false positives and 9 are false negatives. This is the only model in the results presented that uses 40 MFCCs; despite the low accuracy, it was the best result obtained for that parameter setting.
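The per-model figures above are internally consistent and can be cross-checked: in every case the correct classifications plus the false positives and false negatives account for all 40 test samples, and accuracy is simply the ratio of correct classifications to the total. A minimal sketch:

```python
# Reported results (Table 1): (correct, false positives, false negatives).
models = {
    1: (29, 6, 5),
    2: (28, 5, 7),
    3: (27, 6, 7),
    4: (26, 5, 9),
    5: (22, 9, 9),
}
N_TEST = 40  # test samples per model

for m, (correct, fp, fn) in models.items():
    # The errors must account for every misclassified sample.
    assert correct + fp + fn == N_TEST
    accuracy = 100 * correct / N_TEST
    print(f"Model {m}: {accuracy:.1f}%")
```

Running this reproduces the accuracies of 72.5%, 70.0%, 67.5%, 65.0%, and 55.0% reported above.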
Similarities can be observed among the parameters of the models. The number of layers was always 3 or 4; we observed that values outside this range produced no better results. The number of cells in the hidden layers was between 200 and 300; outside this range no significant results were obtained. The batch size was set to 64 for all models, which stabilized training: repeated training runs produced equivalent and more stable results.
The number of iterations was between 100 and 180; above 200 iterations the models exhibited overfitting, classifying well during training but performing poorly on the unseen test data. The learning rate varied from model to model within a comparatively high range, from 0.001 to 0.01. One parameter that strongly influenced the results was the number of MFCCs: models using 13 or 20 MFCCs proved far more effective than the model using 40. The highest accuracy obtained with 40 MFCCs was 55.0%, a poor result that does not indicate an effective classifier.
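These ranges can be summarized as a sketch of a model inside the effective region of the search space. The exact layer wiring of Models 1-5 is not reported, so the architecture below, built with the Keras API (assuming `tf.keras`), is an illustrative assumption rather than the authors' implementation; the number of MFCC frames per voice answer is likewise an assumed value:

```python
import tensorflow as tf

n_mfcc = 13      # 13 or 20 MFCCs worked best in the experiments; 40 did not
timesteps = 100  # assumed number of MFCC frames per answer (not stated in the paper)

# Illustrative 3-layer network within the reported effective ranges:
# 3-4 layers, 200-300 cells per hidden layer, sigmoid output for the
# binary reliable / not-reliable decision.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(timesteps, n_mfcc)),
    tf.keras.layers.LSTM(256, return_sequences=True),
    tf.keras.layers.LSTM(256),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),  # 0.001-0.01 range
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

# Training would use the fixed batch size and an iteration count kept under the
# overfitting threshold of ~200 reported above, e.g.:
# model.fit(x_train, y_train, batch_size=64, ...)
```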
5 CONCLUSIONS AND FINAL
REMARKS
The work reported in this paper analyzes the performance of an LSTM network in classifying a person's spoken answer as reliable or not. To that end, it was necessary to identify which of the implemented prediction models yielded the best results by checking their accuracy: by direct comparison, the model with the highest number of correct classifications during testing is the most effective. In the final stage of this work, the most relevant results were listed; the most prominent among them is a model that reached an accuracy of 72.5%, i.e., 72.5% of the test samples were classified correctly. It can also be stated that the result is statistically meaningful, since the model performed above chance level. For the context of this study, which is lie detection through the analysis of voice answers, the results obtained in this project can be considered relevant, since no comparable result exists in the literature.
It can be observed that the obtained results are close to those of similar works. A work using technology similar to the experiment performed in this project is that of (Chow and Louie, 2017), whose purpose is to find patterns of lying in the voice (through MFCCs) and in the sample interview transcripts, using a recurrent LSTM neural network as the prediction model. Under those specifications, the best result obtained by (Chow and Louie, 2017) is 63% accuracy, slightly lower than that achieved in this work. One detail is that Chow and Louie used a dataset already prepared for the type of problem to be solved, with a large number of samples, which can significantly
ICAART 2020 - 12th International Conference on Agents and Artificial Intelligence