Therefore, these structures allow us to analyze how knowledge spreads across generations of scientists and how these links affect the development of science.
This paper describes an information system called The Gold Tree, whose goal is to visualize academic genealogy trees built from a set of metadata extracted and integrated from multiple sources. It was developed by the Information Management Research Group at the Centro de Ciências Computacionais of the Universidade Federal do Rio Grande (FURG). The proposed system allows researchers to query and track information about their advisors and graduate students at any level. A case study using data from more than 570 thousand theses and dissertations was carried out to validate the system.
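As a rough illustration of what querying advisors and students "at any level" involves, the sketch below builds an advisor-student graph from thesis metadata records and walks it generation by generation. The record fields (`author`, `advisor`), the accent-stripping name key, the sample records, and all function names are hypothetical assumptions; this is not a description of how The Gold Tree itself stores or matches its data.

```python
from collections import defaultdict, deque
import unicodedata


def normalize(name):
    """Hypothetical matching key: drop accents, lowercase, collapse spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    plain = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(plain.lower().split())


def build_graph(records):
    """Index each (author, advisor) record in both directions."""
    advisors = defaultdict(set)  # student -> advisors
    students = defaultdict(set)  # advisor -> students
    for rec in records:
        author, advisor = normalize(rec["author"]), normalize(rec["advisor"])
        advisors[author].add(advisor)
        students[advisor].add(author)
    return advisors, students


def by_level(start, edges, max_depth=None):
    """Breadth-first walk from `start`; returns {level: names at that level}."""
    start = normalize(start)
    seen = {start}
    levels = defaultdict(set)
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand beyond the requested depth
        for nxt in edges.get(node, ()):
            if nxt not in seen:  # guards against duplicates and cycles
                seen.add(nxt)
                levels[depth + 1].add(nxt)
                queue.append((nxt, depth + 1))
    return dict(levels)


# Hypothetical sample records (not taken from the paper's data set).
sample_records = [
    {"author": "Ana Silva", "advisor": "José Souza"},
    {"author": "Bruno Costa", "advisor": "Ana Silva"},
    {"author": "Carla Dias", "advisor": "Bruno Costa"},
]
advisors, students = build_graph(sample_records)
print(by_level("Ana Silva", students))   # graduate students, by generation
print(by_level("Carla Dias", advisors))  # advisors, by generation
```

Passing `advisors` walks upward through the lineage and passing `students` walks downward; `max_depth` limits how many generations are returned. Because the walk is breadth-first, each researcher is reported at the shallowest level at which they appear.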
The rest of this paper is organized as follows. In Section 2, we discuss related work. In Section 3, we present the methodology used to develop the proposed solution. In Section 4, we detail the results obtained. Finally, in Section 5, we draw our conclusions and point out some directions for future work.
2 RELATED WORK
In recent years, several studies have explored the visualization of academic collaboration data. While platforms such as ResearchGate (Yu et al., 2016), Google Citations, and the Web of Science (WoS) classify registered researchers according to the citation indices of their articles and papers (Barabási et al., 2002), other tools such as Pajek (Batagelj and Mrvar, 2002) and PubNet (Douglas et al., 2005) are concerned only with visualizing research networks. There are also solutions that use specific data sources to extract information and generate knowledge from co-authorship relationships (Mena-Chalco and Cesar-Jr, 2013; Laender et al., 2011). The following subsections describe in detail the works used as baselines in the validation of the proposed system.
2.1 Academic Family Tree
Neurotree is a Web database created to document the lineage of academic mentorship in neuroscience (David and Hayden, 2012). The authors present a temporal analysis of the database's growth over a period of seven years, computing the following metrics: the number of researchers and relationships, the monthly growth rate, the fraction of researchers linked in the main graph, the average distance between researchers, and the average number of connections per researcher. In addition, they assess the accuracy of the Neurotree data by comparing it with data reported on the Web sites of five research groups. Finally, to study the relationship between mentorship groups and research areas within neuroscience, they provide a clustering analysis.
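To make the growth metrics listed above concrete, the following sketch computes them for a single snapshot of a mentorship graph. It is an approximation under stated assumptions, not Neurotree's published procedure: edges are treated as undirected, the "main graph" is taken to be the largest connected component, the average distance is the mean shortest-path length over pairs inside that component, and the monthly growth rate is the relative change between two monthly counts. The function names and sample edges are hypothetical.

```python
from collections import defaultdict, deque


def bfs_distances(adj, source):
    """Unweighted shortest-path lengths from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def tree_metrics(edges):
    """Snapshot metrics for (mentor, trainee) pairs read as undirected.

    Assumes at least one edge between two distinct researchers.
    """
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops introduced by data-entry errors
            adj[a].add(b)
            adj[b].add(a)
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    # Largest connected component, found by repeated breadth-first search.
    unvisited, main = set(adj), set()
    while unvisited:
        comp = set(bfs_distances(adj, unvisited.pop()))
        unvisited -= comp
        if len(comp) > len(main):
            main = comp
    # Mean shortest-path length over ordered pairs inside the main component.
    total = pairs = 0
    for src in main:
        for dst, d in bfs_distances(adj, src).items():
            if dst != src:
                total += d
                pairs += 1
    return {
        "researchers": n,
        "relationships": m,
        "main_fraction": len(main) / n,
        "avg_distance": total / pairs if pairs else 0.0,
        "avg_degree": 2 * m / n,  # average connections per researcher
    }


def monthly_growth(prev_count, curr_count):
    """Relative change between two monthly counts (assumes prev_count > 0)."""
    return curr_count / prev_count - 1


sample_edges = [("A", "B"), ("B", "C"), ("D", "E")]  # hypothetical snapshot
stats = tree_metrics(sample_edges)
print(stats)
print(monthly_growth(1000, 1050))
```

Running a breadth-first search from every node of the main component is quadratic in its size, which is acceptable for a sketch; a database with hundreds of thousands of researchers would call for sampling or a dedicated graph library.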
This tree is part of the larger Academic Family Tree¹, which seeks to build a single, interdisciplinary academic genealogy spanning multiple academic fields. Figure 2 presents the result of a query by researcher name.
The contents of the database are entirely crowdsourced and therefore depend completely on human effort. This makes the database highly susceptible to data-entry errors, as well as to incomplete data. Any Web user can add information about researchers and the connections between them, which can degrade the quality of the database and introduce false information.
2.2 Acácia Platform
The Acácia Platform² (Damaceno, 2017) is a system created in 2017 to document formal advising relationships in the context of Brazilian graduate programs. The system uses data registered in the Lattes Platform³, a database of Brazilian researchers' curricula maintained by the Ministry of Science, Technology and Innovation. Currently, the Acácia Platform has over 1 million vertices and relationships. Each vertex represents a researcher, and each edge represents a completed advising relationship between two researchers (advisor and student). Figure 3 presents the result of a query by researcher name. The system displays some bibliometric indices, such as the number of direct and indirect descendants, as well as information about the advising relationships.
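The direct and indirect descendant counts displayed by the Acácia Platform can be approximated with a graph traversal. In this sketch we assume, without confirmation from the platform's documentation, that direct descendants are a researcher's own students and indirect descendants are everyone reachable further down the advising edges who is not also a direct student; the input format, example data, and function name are hypothetical.

```python
from collections import deque


def descendant_counts(students, root):
    """Return (direct, indirect) descendant counts for `root`.

    `students` maps each advisor to an iterable of their students.
    Researchers reachable along several paths are counted once.
    """
    direct = set(students.get(root, ())) - {root}
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in students.get(node, ()):
            if nxt not in seen:  # also protects against cycles from dirty data
                seen.add(nxt)
                queue.append(nxt)
    all_descendants = seen - {root}
    return len(direct), len(all_descendants - direct)


# Hypothetical example: D was co-advised by B and C, so it is counted once.
example = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(descendant_counts(example, "A"))  # -> (2, 2)
```

Counting through a visited set, rather than summing subtree sizes, avoids double-counting researchers who had more than one advisor, which is common in co-advised theses.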
2.3 Science Tree
Created in 2015, the Science Tree⁴ application collects academic genealogy metadata from many countries (Dores et al., 2016). The authors crawl data from a variety of sources, including the Networked Digital Library of Theses and Dissertations (NDLTD), which holds more than 4.5 million theses and dissertations from around the world. They develop a framework to extract academic genealogy trees from these data and provide a series of analyses that describe the main properties of the academic genealogy
¹ https://academictree.org
² http://plataforma-acacia.org
³ http://lattes.cnpq.br
⁴ http://www.sciencetree.net
The Gold Tree: An Information System for Analyzing Academic Genealogy