cation problems characterized by different numbers of
labels, ranging from the order of 10 to the order of
30,000. The considered task is harder than ordinary
multi-class text classification because a variable
number of labels is associated with each sample
(multi-label). We analysed the performance and the
behaviour of the networks, considering the effects of
the training hyperparameters of the SGD function as
the number of classes and the average number of labels
per sample increase, using a Big Data training source
extracted from the PubMed repository. We performed a
preliminary empirical evaluation of the link between
the SGD hyperparameters and the dataset complexity,
providing an overview of the performance and of the
optimal settings of learning rate, momentum and batch
size for the considered problem.
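To make the role of the three hyperparameters concrete, the following is a minimal sketch of mini-batch SGD with momentum. It assumes classical (heavy-ball) momentum; the exact variant used in the experiments is not restated in this section. The function names, the toy linear model and the toy data are hypothetical and serve only as an illustration. They are not the networks or the PubMed data discussed above.

```python
import random


def sgd_momentum(grad_fn, w, data, lr=0.01, momentum=0.9,
                 batch_size=4, epochs=50, seed=0):
    """Mini-batch SGD with classical momentum (illustrative sketch)."""
    rng = random.Random(seed)
    velocity = [0.0] * len(w)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        # batch_size controls how many samples contribute to each update.
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            g = grad_fn(w, batch)  # gradient averaged over the mini-batch
            # v <- momentum * v - lr * g ; w <- w + v
            velocity = [momentum * v - lr * gi for v, gi in zip(velocity, g)]
            w = [wi + vi for wi, vi in zip(w, velocity)]
    return w


def mse_grad(w, batch):
    """Gradient of the mean squared error for the toy model y = w[0]*x + w[1]."""
    g0 = g1 = 0.0
    for x, y in batch:
        err = w[0] * x + w[1] - y
        g0 += 2 * err * x
        g1 += 2 * err
    n = len(batch)
    return [g0 / n, g1 / n]


# Hypothetical toy data drawn from y = 2x + 1 (assumption, for illustration only).
toy = [(k / 10.0, 2 * (k / 10.0) + 1) for k in range(-10, 11)]
w_fit = sgd_momentum(mse_grad, [0.0, 0.0], toy,
                     lr=0.05, momentum=0.9, batch_size=4, epochs=200)
```

In this sketch the learning rate scales each gradient step, the momentum term carries over a fraction of the previous update, and the batch size sets how many samples are averaged per update. These are the three settings whose interaction with dataset complexity is evaluated above.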
As future work, we plan to investigate the impact of
the hyperparameters on other DNN topologies, and we
are considering building a completely new topology
customized for hierarchical XMTC problems.
In addition, we will investigate the use of the
hierarchical label structure, exploiting the better
performance obtained at higher label levels to correct
the results obtained at deeper levels.