Results for the other/ category: 0.96.
5.3 Hardware and Implementation
The hardware used to run the tests includes an Intel(R) Core(TM) i7-6700HQ CPU @ 2.60 GHz with 16 GB of RAM. The classifier itself has been developed in Python on top of scikit-learn, whose CountVectorizer and Multi-layer Perceptron (MLP) implementations were used to build the feature vectors and to create the neural networks, respectively. The pickle library has been used to persist the neural networks to disk and to load them back into memory. On this hardware, classifying a document generally takes less than 25 milliseconds.
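A minimal sketch of the pipeline just described, assuming a toy two-category dataset; the documents, labels and hyperparameters below are illustrative assumptions, not NETHIC's actual code or corpora:

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neural_network import MLPClassifier

# Toy training data (illustrative only; the real corpora are far larger).
documents = [
    "stock markets and financial trading",
    "bank loans and interest rates",
    "football match results and league tables",
    "tennis tournament and grand slam finals",
]
labels = ["finance", "finance", "sport", "sport"]

# CountVectorizer turns raw text into term-frequency vectors.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(documents)

# A multi-layer perceptron is trained on the vectorized documents.
clf = MLPClassifier(hidden_layer_sizes=(50,), max_iter=2000, random_state=42)
clf.fit(X, labels)

# pickle persists both components, which can later be reloaded for classification.
blob = pickle.dumps((vectorizer, clf))
vectorizer2, clf2 = pickle.loads(blob)

label = clf2.predict(vectorizer2.transform(["football match results"]))[0]
print(label)
```

In NETHIC one such vectorizer/network pair exists per taxonomy node, so pickling each pair independently keeps the hierarchy modular.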
6 CONCLUSION AND FUTURE WORK
This paper presented NETHIC, a classifier for textual contents based on hierarchical taxonomies and neural networks. The reported results showed that this combined approach has several advantages for tackling the given task, including modularity, scalability and overall performance. The system proved successful both on a general domain and on a specific, crime- and terrorism-related one. As far as the latter is concerned, future developments may include applying the classification process to enhance other components of the DANTE project itself, especially those related to the discovery of criminal and terrorist networks from social media through the analysis of the textual contents shared and discussed online. Additional improvements may also involve introducing deep-learning techniques to increase NETHIC's overall accuracy, for instance by adding more hidden layers to each neural network and experimenting with different activation functions (e.g. via frameworks like TensorFlow (Abadi et al., 2015)); this may lead to further refinements of NETHIC's classification mechanism.
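As a sketch of the kind of change envisaged, scikit-learn itself already allows deeper topologies and alternative activation functions before migrating to a full deep-learning framework; the layer sizes and activation below are illustrative assumptions, not NETHIC's configuration:

```python
from sklearn.neural_network import MLPClassifier

# Illustrative only: three hidden layers instead of one, and a
# non-default activation function, configured via scikit-learn.
deeper_clf = MLPClassifier(
    hidden_layer_sizes=(256, 128, 64),  # more hidden layers per network
    activation="tanh",                  # alternative to the default ReLU
    max_iter=500,
)
print(deeper_clf.activation, deeper_clf.hidden_layer_sizes)
```

Moving to TensorFlow would additionally permit custom layer types and GPU-accelerated training, at the cost of a heavier dependency.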
REFERENCES
Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z.,
Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin,
M., Ghemawat, S., Goodfellow, I., Harp, A., Irving,
G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kud-
lur, M., Levenberg, J., Mané, D., Monga, R., Moore,
S., Murray, D., Olah, C., Schuster, M., Shlens, J.,
Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Van-
houcke, V., Vasudevan, V., Viégas, F., Vinyals, O.,
Warden, P., Wattenberg, M., Wicke, M., Yu, Y., and
Zheng, X. (2015). TensorFlow: Large-scale machine
learning on heterogeneous systems. Software avail-
able from tensorflow.org.
Alarcón, R., Sánchez, O., and Mijangos, V. (2009). Ex-
ploiting Wikipedia as a knowledge base: Towards an
ontology of movies. 534:8–16.
Atzeni, P., Polticelli, F., and Toti, D. (2011a). An auto-
matic identification and resolution system for protein-
related abbreviations in scientific papers. In Lec-
ture Notes in Computer Science, volume 6623 LNCS,
pages 171–176.
Atzeni, P., Polticelli, F., and Toti, D. (2011b). Experimenta-
tion of an automatic resolution method for protein ab-
breviations in full-text papers. In 2011 ACM Confer-
ence on Bioinformatics, Computational Biology and
Biomedicine, BCB 2011, pages 465–467.
Atzeni, P., Polticelli, F., and Toti, D. (2011c). A framework
for semi-automatic identification, disambiguation and
storage of protein-related abbreviations in scientific
literature. In Proceedings - International Conference
on Data Engineering, pages 59–61.
Bird, S., Klein, E., and Loper, E. (2009). Natural Language
Processing with Python - Analyzing Text with the Nat-
ural Language Toolkit. O’Reilly.
Chen, J.-C. (2003). Dijkstra’s shortest path algorithm. J. of
Form. Math., 15.
Dalal, M. K. and Zaveri, M. (2011). Automatic text classi-
fication: A technical review. 28.
Forman, G. (2003). An extensive empirical study of feature
selection metrics for text classification. The Journal
of machine learning research, 3:1289–1305.
Hermundstad, A., Brown, K., Bassett, D., and Carlson, J.
(2011). Learning, memory, and the role of neural net-
work architecture. 7.
Koppel, M. and Winter, Y. (2014). Determining if two doc-
uments are written by the same author. J. of the Assoc.
for Information Science and Technology, 65:178–187.
Kowsari, K., Brown, D. E., Heidarysafa, M., Meimandi,
K. J., Gerber, M. S., and Barnes, L. E. (2017). HDLTex:
Hierarchical deep learning for text classification. 2.
Kumar, A., Irsoy, O., Su, J., Bradbury, J., English, R.,
Pierce, B., Ondruska, P., Gulrajani, I., and Socher, R.
(2015). Ask me anything: Dynamic memory networks
for natural language processing. In CoRR.
Lahitani, A. R., Permanasari, A. E., and Setiawan, N. A.
(2016). Cosine similarity to determine similarity mea-
sure: Study case in online essay assessment. In 2016
4th International Conference on Cyber and IT Service
Management, pages 1–6.
Lamar, M., Maron, Y., Johnson, M., and Bienenstock, E.
(2010). SVD and clustering for unsupervised POS
tagging. In ACL.
Lamont, J. (2003). Dynamic taxonomies: keeping up with
changing content. KMWorld, 12.
Li, J., Chen, X., Hovy, E., and Jurafsky, D. (2015). Visu-
alizing and understanding neural models in NLP. In
CoRR.
NETHIC: A System for Automatic Text Classification using Neural Networks and Hierarchical Taxonomies