Tweets 84 89
Fig. 2 shows the performance of the conventional
machine learning classifiers using context-dependent
lexical features on the tweet dataset after word
embeddings. The figure shows that Linear SVM (LSVM)
performs best among the conventional machine learning
classifiers on the tweet dataset. The performance
results for the tweet dataset achieved an overall
accuracy of 65 percent. SVM yields around 55 percent
accuracy, which is higher than the Decision Tree and
Logistic Regression algorithms, and the Naïve Bayes
algorithm performs almost as well as SVM. Because it
achieves the strongest result, LSVM can be used for
lexical feature analysis. The text retrieved from the
unstructured data produced prominent results.
This is because lexical information in structured
data (i.e., personally identifiable information) is
lexical regardless of context. Sensitive data such as
birth dates, names, and gender will always be labeled
as such and will be clearly distinguished from
non-lexical data in the context. As a result, it is
important to note that, regardless of the method, this
type of data is simple to identify.
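To illustrate why context-independent items are simple to identify, the following minimal sketch flags two such fields with fixed patterns. It is not the method evaluated in this paper: the pattern set, the date format (YYYY-MM-DD), the function name `find_context_independent`, and the sample record are all hypothetical choices made for illustration.

```python
import re

# Hypothetical patterns; real formats vary by locale and dataset.
PATTERNS = {
    "birth_date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "gender": re.compile(r"\b(?:male|female)\b", re.IGNORECASE),
}


def find_context_independent(text):
    """Return {label: [matches]} for every pattern that matches the text."""
    found = {}
    for label, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[label] = matches
    return found


# Invented example record, not taken from any dataset in the paper.
record = "Name: J. Doe, born 1990-04-12, gender: Female"
hits = find_context_independent(record)
```

Because the match depends only on the surface form of the value, the surrounding words play no role, which is the property the paragraph above describes.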
Table 3 compares the model's performance with that of
earlier proposed methods. As demonstrated, the deep
learning approach outperforms existing methods such as
TF–IDF and models based on statistically derived
features.
7 CONCLUSIONS
Different approaches were tested on two categories of
lexical data: context-dependent lexical data and
context-independent lexical data. Identifying
context-independent lexical data is far easier than
identifying context-dependent lexical data, regardless
of the approach utilised. Word extraction methods such
as TF–IDF and Count Vectorizer extract features from
the text and reflect the fact that some keywords are
more essential than others in determining a text
category. These approaches, however, ignore the text's
sequential structure. Deep learning algorithms, in
contrast, preserve the sequential structure while
giving more weight to significant terms.
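The point about sequential structure can be seen in a small sketch. It is an illustration under stated assumptions, not the feature pipeline used in the experiments: tokenisation is a plain whitespace split, the IDF is one common smoothed variant, and the function name `tfidf_vectors` and the example sentences are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, one TF-IDF vector per document)."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    n_docs = len(tokenised)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenised for tok in set(toks))
    # Smoothed IDF (an assumed variant) so every term gets a finite weight.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenised:
        counts = Counter(toks)
        length = len(toks) or 1  # guard against empty documents
        vectors.append([counts[t] / length * idf[t] for t in vocab])
    return vocab, vectors


# Invented sentences: the first two contain the same words in a different order.
docs = [
    "my doctor shared my diagnosis",
    "my diagnosis shared my doctor",
    "the weather is nice today",
]
vocab, vecs = tfidf_vectors(docs)
same_vector = vecs[0] == vecs[1]
```

The first two sentences differ in meaning but receive identical vectors, because only term counts enter the computation. A sequence model receives the tokens in order and can therefore tell them apart.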
AI4IoT 2023 - First International Conference on Artificial Intelligence for Internet of things (AI4IOT): Accelerating Innovation in Industry