2 RELATED WORK
Mapping science and scientific processes through citation data has been explored in (Small, 1999), where multiple approaches are reviewed and the data is arranged in different ways. In addition, there are multiple visualization applications for general graph-based data (Bastian et al., 2009) as well as for co-citation and bibliometric networks. These viewers provide different tools to view and visualize a graph based on its properties and to perform different manipulations on the graph data. A tool similar to the work presented here is VOSviewer.
VOSviewer (van Eck and Waltman, 2009) is a tool for the visualization of bibliometric data that combines this with natural language processing to also create term co-occurrence networks from textual information. Collaboration Spotting can also be applied to generic data, and it offers information retrieval methods and novel ways to navigate through the data that conventional bibliometric visualization tools do not provide. CiteSpace (Chen, 2006) is another tool that helps to explore and visualize scientific knowledge domains. The key differences lie in how the retrieval aspects of the navigation are handled. Like VOSviewer, CiteSpace does not offer the information retrieval functionality that is included in Collaboration Spotting. In these visualization platforms, the required data has to be provided beforehand from external datasets; in the case of CiteSpace, it can also be downloaded directly from the Web of Science search interface. The procedure in Collaboration Spotting has the advantage that users can rapidly perform multiple searches and even combine them to create a suitable result graph for their data exploration. In comparison to other systems, the data can come directly from the indexed documents, but a manual blueprint of the data mapping has to be created.

Parts of the retrieval process rely on methods that can be described as relevance feedback through graph navigation. Relevance feedback as a way to refine the information retrieval process has been well defined and explored in the literature (Rocchio, 1971) (Salton and Buckley, 1990), and many approaches use the fully automated pseudo-relevance feedback method to refine queries with good success (Cao et al., 2008). There are even methods that use pseudo-relevance feedback for citation recommendation (Liu et al., 2014), but the authors are not aware of any methods that directly use graph exploration and navigation as a mechanism for applying relevance feedback.
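For readers unfamiliar with the classic formulation, Rocchio's method moves a query vector towards the centroid of the documents judged relevant and away from the centroid of those judged non-relevant, i.e. q' = alpha * q + beta * centroid(relevant) - gamma * centroid(non-relevant); pseudo-relevance feedback simply assumes that the top-k results are relevant. The following minimal sketch illustrates this textbook update on sparse term-weight dictionaries. The function names and the default weights (alpha = 1.0, beta = 0.75, gamma = 0.15, commonly quoted illustrative values) are assumptions for illustration only, not the parameters or the mechanism used by Collaboration Spotting.

```python
from collections import defaultdict
from typing import Dict, List

Vector = Dict[str, float]  # sparse vector: term -> weight


def centroid(docs: List[Vector]) -> Vector:
    """Average of sparse vectors; an empty list yields an empty vector."""
    total: Vector = defaultdict(float)
    for doc in docs:
        for term, weight in doc.items():
            total[term] += weight
    return {t: w / len(docs) for t, w in total.items()} if docs else {}


def rocchio(query: Vector, relevant: List[Vector], non_relevant: List[Vector],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Textbook Rocchio update; terms with non-positive weight are dropped."""
    rel_c = centroid(relevant)
    non_c = centroid(non_relevant)
    updated: Vector = {}
    for t in set(query) | set(rel_c) | set(non_c):
        w = (alpha * query.get(t, 0.0)
             + beta * rel_c.get(t, 0.0)
             - gamma * non_c.get(t, 0.0))
        if w > 0:
            updated[t] = w
    return updated


def pseudo_relevance_feedback(query: Vector, ranked_docs: List[Vector],
                              k: int = 3, **weights: float) -> Vector:
    """Hypothetical helper: treat the top-k results as relevant, no negatives."""
    return rocchio(query, ranked_docs[:k], [], **weights)


# Usage: a user marks one result as relevant and one as non-relevant.
new_query = rocchio({"graph": 1.0},
                    relevant=[{"graph": 1.0, "citation": 1.0}],
                    non_relevant=[{"protein": 1.0}])
print(new_query)  # "protein" is dropped because its weight becomes negative
```

Clipping negative weights to zero is a common convention, since negative term weights have no useful interpretation in most retrieval models.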
Figure 1: A schematic representation of a schema as it is used for the transformation of the data and the navigation in the graph. The publication forms the central point for navigation between the available metadata. Search (START) represents the connection to the search keywords or seed document on which the graph is based; Publication references the actual document.
3 COLLABORATION SPOTTING CITE
Collaboration Spotting is a visualisation and navigation platform for exploring and manipulating large and complex datasets (Agocs et al., 2017). It combines aspects of information retrieval and visual analytics to let users explore their data without a background in data science or related fields. A typical search and navigation process in the Cite version of the web application is performed in multiple steps: retrieval of the relevant documents and construction of the graph, navigation and exploration of the data, and finally refinement of the search through relevance feedback. The following sections explain each of these stages in more detail.
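To make the schema of Figure 1 more concrete, the following sketch turns a list of result records into a graph in which publications form the hub between a Search (START) node and metadata nodes, and then navigates from one metadata node to others through shared publications. The record fields (`id`, `authors`, `keywords`), the node types in `METADATA_FIELDS` and the functions `build_graph` and `navigate` are hypothetical; in the platform itself the mapping is defined by the manually created blueprint.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple

Node = Tuple[str, str]                # (node type, identifier)
Graph = DefaultDict[Node, Set[Node]]  # undirected adjacency map

# Hypothetical blueprint: metadata node type -> field of a result record.
METADATA_FIELDS: Dict[str, str] = {"Author": "authors", "Keyword": "keywords"}


def build_graph(query: str, results: List[dict]) -> Graph:
    """Link a Search (START) node to every result publication and every
    publication to its metadata, so publications act as the hub."""
    adj: Graph = defaultdict(set)

    def link(a: Node, b: Node) -> None:
        adj[a].add(b)
        adj[b].add(a)

    start: Node = ("Search", query)
    for record in results:
        pub: Node = ("Publication", record["id"])
        link(start, pub)
        for node_type, field_name in METADATA_FIELDS.items():
            for value in record.get(field_name, []):
                link(pub, (node_type, value))
    return adj


def navigate(adj: Graph, origin: Node, target_type: str) -> Set[Node]:
    """Return nodes of target_type that share a publication with origin."""
    reached: Set[Node] = set()
    for pub in adj.get(origin, set()):
        if pub[0] != "Publication":
            continue
        reached |= {n for n in adj[pub] if n[0] == target_type and n != origin}
    return reached


results = [
    {"id": "p1", "authors": ["A", "B"], "keywords": ["graphs"]},
    {"id": "p2", "authors": ["B", "C"], "keywords": ["retrieval"]},
]
adj = build_graph("graph visualisation", results)
print(sorted(navigate(adj, ("Author", "B"), "Author")))
# prints [('Author', 'A'), ('Author', 'C')]
```

Routing every step through publication nodes mirrors the role the publication plays in Figure 1: any two metadata types (e.g. keywords and authors) become reachable from one another without defining a direct edge type between them.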
3.1 Information Retrieval and Graphs
The system operates as follows. First, the user performs a full-text search on the indexed documents, and the retrieval process returns a list of items together with their relevance. The Collaboration Spotting platform is not limited to text documents, but the search procedures have been optimized for this application.
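The first step, a full-text search that returns items with a relevance score, can be illustrated with a minimal vector-space sketch. It assumes tf-idf weighting with a smoothed idf, log((1 + N) / (1 + df)) + 1, and cosine similarity for ranking; these choices, the tokenizer and all function names are assumptions for illustration and not the platform's actual scoring function. Because the query is just text, a whole seed document can be passed in place of a few keywords.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def tokenize(text: str) -> List[str]:
    """Lower-case alphanumeric tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def smoothed_idf(corpus: List[List[str]]) -> Dict[str, float]:
    """idf(t) = log((1 + N) / (1 + df(t))) + 1 over the tokenized corpus."""
    n = len(corpus)
    df = Counter(term for doc in corpus for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(tokens: List[str], idf: Dict[str, float]) -> Vector:
    """Raw term frequency times idf; terms unseen in the corpus get weight 0."""
    return {t: c * idf.get(t, 0.0) for t, c in Counter(tokens).items()}


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, docs: Dict[str, str]) -> List[Tuple[str, float]]:
    """Return (doc_id, relevance) pairs sorted by decreasing relevance."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    idf = smoothed_idf(list(tokens.values()))
    q = tfidf(tokenize(query), idf)
    scored = [(doc_id, cosine(q, tfidf(toks, idf))) for doc_id, toks in tokens.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


docs = {
    "d1": "graph navigation of citation data",
    "d2": "protein folding simulation",
    "d3": "citation graph visualisation",
}
ranking = search("citation graph", docs)
print([doc_id for doc_id, _ in ranking])  # prints ['d3', 'd1', 'd2']
```

The shorter document d3 ranks above d1 because cosine similarity normalizes by vector length, so the query terms make up a larger share of d3's weight.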
Parts of the retrieval process are described in more detail in (Rattinger et al., 2018a). The retrieval process takes either full documents or keywords defined by the user to perform the initial search, since source documents are usually available when searching for patents and publications. In this case, keywords are extracted from the different sections and weighted by tf-idf (Ramos et al., 2003). The list of result documents is then transformed into a graph according to a predefined
Collaboration Spotting Cite: An Exploration System for the Bibliographic Information of Publications and Patents