of searching for keywords and expressions with the
same semantic value as those chosen by the investigator.
That is, instead of searching only for exact occurrences
of the term that the researcher chooses, the system could
also search for synonyms and similar expressions, and rank
the results obtained from such inferences with lower scores
than those that contain a direct occurrence of the typed
term. It would also be extremely important to compare the
proposed framework with machine learning and deep learning
algorithms using a dataset containing conversations and
data directly related to crimes, so that the obfuscation
attempts would be more evident.
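The synonym-aware search suggested above could be sketched as follows. This is only a minimal illustration under stated assumptions, not part of the proposed framework: the synonym table, the two score weights, the function name `search`, and the whitespace tokenization are all hypothetical choices made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical synonym table; a real system might derive it from a thesaurus
# or from word embeddings instead of hard-coding it.
SYNONYMS: Dict[str, List[str]] = {
    "money": ["cash", "funds"],
}

EXACT_SCORE = 1.0    # assumed weight for a direct occurrence of the typed term
SYNONYM_SCORE = 0.5  # assumed lower weight for a match inferred via a synonym


def search(term: str, messages: List[str]) -> List[Tuple[float, str]]:
    """Return (score, message) pairs, best first; non-matching messages are dropped."""
    term = term.lower()
    synonyms = set(SYNONYMS.get(term, []))
    results: List[Tuple[float, str]] = []
    for message in messages:
        # Naive whitespace tokenization; punctuation handling is out of scope here.
        tokens = set(message.lower().split())
        if term in tokens:
            results.append((EXACT_SCORE, message))
        elif tokens & synonyms:
            results.append((SYNONYM_SCORE, message))
    # Stable sort: messages with equal scores keep their original order.
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results


if __name__ == "__main__":
    sample = ["send the cash tonight", "the money is ready", "see you tomorrow"]
    for score, message in search("money", sample):
        print(score, message)
```

Keeping the exact-match and synonym-match weights as separate constants makes the ranking policy explicit: any message containing the typed term is listed before messages that match only through a synonym.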
ACKNOWLEDGEMENTS
This work is supported in part by CNPq - Brazilian
National Research Council (Grants 312180/2019-5 PQ-2
and 465741/2014-2 INCT on Cybersecurity), in part by
the Administrative Council for Economic Defense
(Grant CADE 08700.000047/2019-14), in part by the
General Attorney of the Union (Grant AGU 697.935/2019),
in part by the National Auditing Department of the
Brazilian Health System SUS (Grant DENASUS 4/2021),
in part by the General Attorney’s Office for the National
Treasure (Grant PGFN 23106.148934/2019-67), and
in part by the University of Brasilia (Grant 7129
FUB/EMENDA/DPI/COPEI/AMORIS).
ICSBT 2022 - 19th International Conference on Smart Business Technologies