12 Slavic languages (Liu and Cong, 2013) and
microblogs (Garg and Kumar, 2018b); (iii) glean
insights from GoW for various domains such as text
authorship (Akimushkin et al., 2017); (iv) the
applicability of network science metrics such as power-law
(Choudhury et al., 2010) and spectral distribution (Liang,
2017) has enriched this domain to investigate an
information retrieval approach towards unifying
language-specific text in a language-independent vector
representation.
Owing to the versatility of the project and the need for
native-language experts, a major challenge in introducing a
generic model on GoW is the construction of the dataset:
samples and annotations. However, access to the
built-in NLTK corpora in Python facilitates
the structural analysis of text documents in different
languages. Furthermore, we found new multilingual
datasets for different NLP-centered tasks, such
as the MIRACL dataset¹ for information retrieval and
XCOPA² for causal commonsense reasoning. Such datasets
facilitate structural analysis to find unique patterns in
domain-, language-, and genre-specific GoW for keyword
extraction. Structural analysis acts as a foundation for
designing a language-independent objective function for
information retrieval. An alternative application is to
exploit dynamism to construct mathematical models for
concept drift and evolving trends/events.
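The structural analysis described above can be illustrated with a minimal sketch. It assumes a sliding-window co-occurrence definition of GoW; the function names `build_gow` and `degree_distribution`, the window size, and the regex tokenizer are illustrative assumptions, not the construction used in the cited studies. In practice the input text could come from a corpus reader such as those in NLTK; literal strings are used here to keep the sketch self-contained.

```python
import re
from collections import Counter, defaultdict


def build_gow(text, window=2):
    """Build an undirected graph-of-words (GoW).

    Nodes are lowercased tokens; an edge links two distinct tokens that
    occur within `window` consecutive positions (window=2 means adjacent).
    The regex tokenizer is an assumption chosen for language independence.
    """
    tokens = re.findall(r"\w+", text.lower())
    adjacency = defaultdict(set)
    for i, source in enumerate(tokens):
        neighbours = adjacency[source]  # also registers isolated tokens
        for target in tokens[i + 1 : i + window]:
            if target != source:
                neighbours.add(target)
                adjacency[target].add(source)
    return dict(adjacency)


def degree_distribution(graph):
    """Map each degree value to the number of nodes having that degree."""
    counts = Counter(len(neighbours) for neighbours in graph.values())
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    english = build_gow("the cat sat on the mat and the dog sat on the rug")
    spanish = build_gow("el gato y el perro")
    # The same language-independent statistic is computed for both texts.
    print(degree_distribution(english))
    print(degree_distribution(spanish))
```

The degree distribution is only one of the structural measures mentioned in the text; it is shown here because a heavy-tailed distribution is what a power-law analysis of a larger corpus would examine.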
REFERENCES
Akimushkin, C., Amancio, D. R., and Oliveira Jr, O. N.
(2017). Text authorship identified using the dynamics
of word co-occurrence networks. PLoS ONE,
12(1):e0170527.
Atefeh, F. and Khreich, W. (2015). A survey of techniques
for event detection in Twitter. Computational
Intelligence, 31(1):132–164.
Beliga, S., Meštrović, A., and Martinčić-Ipšić, S. (2015).
An overview of graph-based keyword extraction
methods and approaches. Journal of Information and
Organizational Sciences, 39(1):1–20.
Biswas, S. K., Bordoloi, M., and Shreya, J. (2018). A graph
based keyword extraction model using collective node
weight. Expert Systems with Applications, 97:51–59.
Campos, R., Mangaravite, V., Pasquali, A., Jorge, A.,
Nunes, C., and Jatowt, A. (2020). YAKE! Keyword
extraction from single documents using multiple local
features. Information Sciences, 509:257–289.
Čebirić, Š., Goasdoué, F., Kondylakis, H., Kotzinos, D.,
Manolescu, I., Troullinou, G., and Zneika, M. (2019).
Summarizing semantic graphs: a survey. The VLDB
Journal, 28(3):295–327.
¹ https://project-miracl.github.io/
² https://github.com/cambridgeltl/xcopa#cite
Chernyavskiy, A., Ilvovsky, D., and Nakov, P. (2021).
Transformers: “the end of history” for natural language
processing? In Joint European Conference
on Machine Learning and Knowledge Discovery in
Databases, pages 677–693. Springer.
Choudhury, M., Chatterjee, D., and Mukherjee, A. (2010).
Global topology of word co-occurrence networks: Beyond
the two-regime power-law. In Coling 2010:
Posters, pages 162–170.
Duari, S. and Bhatnagar, V. (2019). sCAKE: Semantic
connectivity aware keyword extraction. Information
Sciences, 477:100–117.
Garg, M. (2021). A survey on different dimensions for
graphical keyword extraction techniques. Artificial
Intelligence Review, pages 1–40.
Garg, M. and Kumar, M. (2016). Review on event detection
techniques in social multimedia. Online Information
Review.
Garg, M. and Kumar, M. (2018a). Identifying influential
segments from word co-occurrence networks using
AHP. Cognitive Systems Research, 47:28–41.
Garg, M. and Kumar, M. (2018b). The structure of word co-
occurrence network for microblogs. Physica A: Statistical
Mechanics and its Applications, 512:698–720.
Garg, M. and Kumar, M. (2022). KEST: A graph-based
keyphrase extraction technique for tweets summarization
using Markov decision process. Expert Systems
with Applications, 209:118110.
Grohe, M. (2020). word2vec, node2vec, graph2vec, x2vec:
Towards a theory of vector embeddings of structured
data. In Proceedings of the 39th ACM SIGMOD-
SIGACT-SIGAI Symposium on Principles of Database
Systems, pages 1–16.
Harary, F. and Norman, R. Z. (1960). Some properties of
line digraphs. Rendiconti del Circolo Matematico di
Palermo, 9(2):161–168.
Hogan, A., Blomqvist, E., Cochez, M., d’Amato, C., Melo,
G. d., Gutierrez, C., Kirrane, S., Gayo, J. E. L., Navigli,
R., Neumaier, S., et al. (2021). Knowledge
graphs. ACM Computing Surveys (CSUR), 54(4):1–37.
Hulth, A. (2003). Improved automatic keyword extraction
given more linguistic knowledge. In Proceedings of
the 2003 conference on Empirical methods in natural
language processing, pages 216–223.
Liang, W. (2017). Spectra of English evolving word
co-occurrence networks. Physica A: Statistical
Mechanics and its Applications, 468:802–808.
Liang, W., Shi, Y., Chi, K. T., Liu, J., Wang, Y., and Cui, X.
(2009). Comparison of co-occurrence networks of the
Chinese and English languages. Physica A: Statistical
Mechanics and its Applications, 388(23):4901–4909.
Liu, H. and Cong, J. (2013). Language clustering with word
co-occurrence networks based on parallel texts. Chinese
Science Bulletin, 58(10):1139–1144.
Magueresse, A., Carles, V., and Heetderks, E. (2020). Low-
resource languages: A review of past work and future
challenges. arXiv preprint arXiv:2006.07264.
Majumder, P., Mitra, M., and Chaudhuri, B. (2002). N-
gram: a language independent approach to IR and NLP.
Towards Pattern Recognition with Network Science and Natural Language Processing for Information Retrieval