ics and logic
4. GA13-21988S – Enumeration in informatics and
optimization
5. GP14-13017P – Parameterized Algorithms for
Fundamental Network Problems Related to Con-
nectivity
According to the experts' opinion, these fields are
among the priorities of computer science in the
Czech Republic.
4 CONCLUSION AND FURTHER WORK
We have proposed a software tool for visualizing the
structure of collections of research projects with re-
spect to their content similarity. The approach is
based on the application of latent semantic analysis
and can be easily implemented in the R or Python
language. The results are easy-to-understand im-
ages/graphs that provide a quick overview of the con-
sidered set of projects. In future, this visualization
tool
Communities of similar projects can subsequently be
analyzed in more detail: reports listing the
institutions and researchers participating in the
projects of a community can also be generated.
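The similarity computation summarized above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used for the reported results: the toy corpus, the function name lsa_similarity, the use of raw term counts without any weighting, the number of dimensions k and the edge threshold are all hypothetical choices made for the example.

```python
import numpy as np

# Hypothetical toy corpus: one string per project (title + keywords + abstract).
projects = {
    "P1": "parameterized algorithms graph connectivity network",
    "P2": "graph algorithms network connectivity complexity",
    "P3": "protein structure molecular biology cell",
    "P4": "cell biology protein expression",
}

def lsa_similarity(texts, k=2):
    """Cosine similarities of documents in a k-dimensional LSA space."""
    vocab = sorted({w for t in texts for w in t.split()})
    index = {w: i for i, w in enumerate(vocab)}
    # Term-document count matrix (rows: terms, columns: documents).
    # Assumption: raw counts; a real pipeline would likely apply weighting.
    A = np.zeros((len(vocab), len(texts)))
    for j, t in enumerate(texts):
        for w in t.split():
            A[index[w], j] += 1.0
    # Truncated SVD: keep only the k largest singular values.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    docs = (np.diag(s[:k]) @ Vt[:k, :]).T  # one k-dimensional row per document
    norms = np.linalg.norm(docs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty documents
    docs = docs / norms
    return docs @ docs.T

ids = list(projects)
sim = lsa_similarity([projects[i] for i in ids], k=2)

# Keep an edge only when the similarity reaches a (hypothetical) threshold.
THRESHOLD = 0.8
edges = [(ids[a], ids[b], float(sim[a, b]))
         for a in range(len(ids)) for b in range(a + 1, len(ids))
         if sim[a, b] >= THRESHOLD]
print(edges)
```

The resulting edge list can be passed to any graph drawing or community detection tool; the threshold controls how dense the displayed graph is.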
Plans for further work include the development of
evaluation methods and improvements that mainly
concern:
• Experimenting with Different Representations of
Projects: in this experiment we use only titles,
keywords and abstracts. We will investigate the
effect of including more textual data: full
proposals and descriptions of project results
(abstracts of papers assigned to the project etc.)
• Other Methods of Calculating Similarity: when a
large corpus of textual data is available, we will
use the word2vec model (Mikolov et al., 2013) for
similarity computations
• Enriching the Visualization by Additional Data:
the size of a node can be proportional to the budget
of the project, the opacity of a node can represent
the value of a certain centrality measure in the
graph, and the classification of a project
(fundamental/applied research etc.) can be
represented by different colors
• Employing External Data Sources: in our work,
the edges represent content similarity. We can
also add an additional layer in which edges (in
a different color) represent other connections
among projects (e.g. an edge can link a pair of
projects that have a common institution as a
participant).
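The last two items of the list above can be illustrated by a small sketch. It is only one possible reading of these plans: the project records, the field names, the color palette, the size range and the helper names node_style and institution_edges are hypothetical and are not part of the described tool.

```python
from itertools import combinations

# Hypothetical project records; field names and values are illustrative only.
projects = [
    {"id": "A", "budget": 2.0, "centrality": 0.9, "kind": "fundamental",
     "institutions": {"U1", "U2"}},
    {"id": "B", "budget": 8.0, "centrality": 0.3, "kind": "applied",
     "institutions": {"U2"}},
    {"id": "C", "budget": 4.0, "centrality": 0.6, "kind": "fundamental",
     "institutions": {"U3"}},
]
COLORS = {"fundamental": "blue", "applied": "orange"}  # assumed palette

def node_style(p, max_budget, max_centrality, min_size=5.0, max_size=30.0):
    """Map budget to node size, centrality to opacity, classification to color."""
    size = min_size + (max_size - min_size) * p["budget"] / max_budget
    opacity = p["centrality"] / max_centrality
    return {"size": size, "opacity": opacity, "color": COLORS[p["kind"]]}

def institution_edges(records):
    """Second edge layer: link projects sharing at least one institution."""
    return [(p["id"], q["id"])
            for p, q in combinations(records, 2)
            if p["institutions"] & q["institutions"]]

max_budget = max(p["budget"] for p in projects)
max_centrality = max(p["centrality"] for p in projects)
styles = {p["id"]: node_style(p, max_budget, max_centrality) for p in projects}
extra_layer = institution_edges(projects)
print(styles)
print(extra_layer)
```

Keeping the institution edges in a separate list makes it possible to draw them in a different color on top of the content-similarity edges, or to hide them entirely.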
4.1 Other Possible Applications
Application of the proposed tool is not limited to
the project domain. Analogously, it can be used for
grouping patent proposals etc. In the R&D
environment, other possible applications are:
• Exploration of the structure of research
institutions: each institution can be represented
as a plaintext file containing the titles, keywords
and abstracts of the projects in which the
institution has participated
• Project reviewer matching and/or expert search:
in our setting it is not necessary that all
entities are of the same type. We can analogously
represent researchers together with projects (by
lists of titles of their publications and keywords,
as in (Trigo and Brazdil, 2014)) and calculate
mutual similarities of the type “researcher-project
(proposal)”. Researchers that have the highest
similarity to a given project proposal can be
considered as potential reviewers (after satisfying
possible constraints such as “independence of the
researcher from the reviewed project”). This
principle can also be applied to searching for
experts for a newly prepared project.
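The reviewer matching idea above can be sketched as a ranking by cosine similarity with a simple exclusion list. The sketch assumes that researchers and the proposal have already been mapped to vectors in a common space; the vectors, the names and the function rank_reviewers are hypothetical, and the exclusion set stands in for whatever independence constraints apply.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equally long vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_reviewers(proposal_vec, researchers, excluded, top_n=2):
    """Return the top_n most similar researchers that are not excluded."""
    scored = [(name, cosine(proposal_vec, vec))
              for name, vec in researchers.items()
              if name not in excluded]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]

# Hypothetical vectors in a shared two-dimensional semantic space.
researchers = {
    "R1": [0.9, 0.1],
    "R2": [0.7, 0.3],
    "R3": [0.1, 0.9],
    "R4": [1.0, 0.0],
}
proposal = [1.0, 0.0]
conflicts = {"R4"}  # e.g. a researcher participating in the reviewed project

candidates = rank_reviewers(proposal, researchers, conflicts)
print([name for name, _ in candidates])
```

The same function can serve expert search for a newly prepared project by passing an empty exclusion set.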
REFERENCES
Bonacich, P. (1972). Factoring and weighting approaches
to status scores and clique identification. Journal of
Mathematical Sociology, 2(1):113–120.
Brazdil, P., Trigo, L., Cordeiro, J., Sarmento, R., and
Valizadeh, M. (2015). Affinity mining of documents sets
via network analysis, keywords and summaries. Oslo
Studies in Language, 7(1).
Combe, D., Largeron, C., Egyed-Zsigmond, E., and Géry,
M. (2010). A comparative study of social network
analysis tools. In International Workshop on Web
Intelligence and Virtual Enterprises, volume 2, page 1.
Feldman, R. and Sanger, J. (2007). The text mining hand-
book: advanced approaches in analyzing unstruc-
tured data. Cambridge University Press.
Ma, J., Xu, W., Sun, Y.-h., Turban, E., Wang, S., and Liu,
O. (2012). An ontology-based text-mining method to
cluster proposals for research project selection. Sys-
tems, Man and Cybernetics, Part A: Systems and Hu-
mans, IEEE Transactions on, 42(3):784–790.
Magerman, T., Van Looy, B., and Song, X. (2010). Ex-
ploring the feasibility and accuracy of latent semantic