Table 2: Experimental results of the CNN with New Augmentation, reported as the mean F1 measure achieved over
10-fold cross-validation.
(Papakostas et al., 2017)   0.57     0.60     0.67    0.61
proposed                    0.70     0.51     0.72    0.64
difference                  +22.8%   -15.0%   +7.5%   +4.9%
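The difference row appears to be the relative change of the proposed method's mean F1 measure with respect to the baseline, i.e. (proposed - baseline) / baseline. This is inferred from the reported values and is not a definition stated in the text. A minimal Python sketch that reproduces the row under that assumption (the function name `relative_difference_percent` is hypothetical):

```python
def relative_difference_percent(baseline, proposed):
    """Relative change of `proposed` with respect to `baseline`, in percent.

    Assumed formula (inferred from the table, not stated by the authors):
    100 * (proposed - baseline) / baseline
    """
    return 100.0 * (proposed - baseline) / baseline


# Mean F1 values copied from Table 2, column by column.
baseline_f1 = [0.57, 0.60, 0.67, 0.61]   # (Papakostas et al., 2017)
proposed_f1 = [0.70, 0.51, 0.72, 0.64]   # proposed method

differences = [
    f"{relative_difference_percent(b, p):+.1f}%"
    for b, p in zip(baseline_f1, proposed_f1)
]
print(differences)
```

Under this reading, a positive value means the proposed method improves on the baseline for that column, and a negative value means it falls short.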
Our future work will follow several directions.
First, we would like to increase the robustness of the
proposed method on the given datasets by further
optimizing the learning process of the CNN. We also
believe that speaker-dependent and speaker-independent
experimental setups will lead to further improvement
of the results, as recent research such as (Huang et al.,
2014) and (Zhao et al., 2019) has shown. Within a
speaker-dependent setup, samples
from multiple speakers are used for training; testing
takes place on different samples which belong to the
same set of speakers. In contrast, within a
speaker-independent setup, samples from multiple speakers
are used for training, and testing takes place on samples
that belong to a different set of speakers. Finally,
another future goal could be to experiment with models
that target the language or cultural information of the
speech, or with models that use transfer learning, which
may provide another possible solution to language
independence issues.
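The two experimental setups described above differ only in how samples are assigned to the training and test sets. The following Python sketch illustrates that difference; it is not the authors' protocol. The data layout (a list of `(speaker_id, utterance)` pairs), the function names, and the `test_fraction` and `seed` parameters are all assumptions made for illustration, and every speaker is assumed to have at least two samples.

```python
import random
from collections import defaultdict


def speaker_independent_split(samples, test_fraction=0.25, seed=0):
    """Hold out whole speakers: train and test speakers are disjoint."""
    speakers = sorted({spk for spk, _ in samples})
    rng = random.Random(seed)
    rng.shuffle(speakers)
    n_test = max(1, round(len(speakers) * test_fraction))
    test_speakers = set(speakers[:n_test])
    train = [s for s in samples if s[0] not in test_speakers]
    test = [s for s in samples if s[0] in test_speakers]
    return train, test


def speaker_dependent_split(samples, test_fraction=0.25, seed=0):
    """Hold out samples per speaker: the same speakers occur in both sets."""
    rng = random.Random(seed)
    by_speaker = defaultdict(list)
    for sample in samples:
        by_speaker[sample[0]].append(sample)
    train, test = [], []
    for spk in sorted(by_speaker):
        items = list(by_speaker[spk])
        rng.shuffle(items)
        n_test = max(1, round(len(items) * test_fraction))
        test.extend(items[:n_test])    # different samples, same speaker
        train.extend(items[n_test:])
    return train, test


# Toy data (hypothetical): 4 speakers with 4 utterances each.
samples = [(f"spk{i}", f"utt{i}_{j}") for i in range(4) for j in range(4)]

ind_train, ind_test = speaker_independent_split(samples)
dep_train, dep_test = speaker_dependent_split(samples)
```

In the speaker-independent split no speaker appears on both sides, so the test score reflects generalization to unseen voices. In the speaker-dependent split every speaker appears on both sides, but no individual sample does.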
This research has been co-financed by the European
Union and Greek national funds through the Operational
Program Competitiveness, Entrepreneurship and Innovation,
under the call RESEARCH – CREATE – INNOVATE
(project code: 1EDK-02070).
