12 Slavic languages (Liu and Cong, 2013) and Microblogs
(Garg and Kumar, 2018b); (iii) glean insights from GoW
for various domains such as Text Authorship
(Akimushkin et al., 2017); (iv) the applicability of
network science metrics such as power-law
(Choudhury et al., 2010) and spectral distribution
(Liang, 2017) has enriched this domain, supporting the
investigation of information retrieval approaches that
unify language-specific text in language-independent
vector representations.
Due to the versatility of the project and the need for
native-language experts, a major challenge in introducing
a generic model on GoW is the construction of the dataset:
samples and annotations. However, access to the built-in
NLTK corpora in Python facilitates the structural analysis
of text documents in different languages. Furthermore, we
found new multilingual datasets for different NLP-centered
tasks, such as the MIRACL dataset for information retrieval
and another for causal commonsense. Such datasets facilitate
structural analysis to find unique patterns in
domain/language/genre-specific GoW for keyword extraction.
Structural analysis acts as a foundation for designing a
language-independent objective function for information
retrieval. An alternative application of dynamism is the
construction of mathematical models for concept drift and
evolving trends/events.
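
The structural analysis of a GoW for keyword extraction can be illustrated with a minimal sketch. It assumes that a GoW links two words whenever they co-occur within a fixed sliding window, and that node degree is an acceptable first proxy for keyword importance; neither choice is prescribed by the text. The function names build_gow and rank_by_degree, the window size, and the whitespace tokenizer (used here in place of an NLTK corpus reader so that the example runs without downloads) are all hypothetical.

```python
from collections import defaultdict


def build_gow(tokens, window=3):
    """Build an undirected graph-of-words as an adjacency map.

    Assumption: two distinct tokens are linked when they appear within
    `window` consecutive positions of each other.
    """
    graph = defaultdict(set)
    for i, source in enumerate(tokens):
        # Look ahead at the next (window - 1) tokens.
        for target in tokens[i + 1 : i + window]:
            if source != target:
                graph[source].add(target)
                graph[target].add(source)
    return graph


def rank_by_degree(graph, top_k=3):
    """Return the top_k words by node degree (ties broken alphabetically)."""
    return sorted(graph, key=lambda w: (-len(graph[w]), w))[:top_k]


if __name__ == "__main__":
    # Illustrative sentence; a real study would read tokens from a corpus.
    text = "graph of words maps words to nodes and links words that co-occur"
    tokens = text.lower().split()
    gow = build_gow(tokens, window=3)
    print(rank_by_degree(gow))
```

Because the graph is built from token positions only, the same procedure applies to any language for which a tokenizer is available; degree could be replaced by another network metric without changing the construction step.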
Towards Pattern Recognition with Network Science and Natural Language Processing for Information Retrieval