the isolated individual, no matter how smart or well-
informed he is (Surowiecki, 2005).
According to Estellés-Arolas and Guevara, there
are 40 definitions for the concepts of crowdsourcing
that come from 32 distinct articles published from
2006 to 2011 (Estellés-Arolas & Guevara, 2012). The
term was created by Jeff Howe in his article “The Rise
of Crowdsourcing” in 2006. It is a combination of
“crowd” and “outsourcing” and can be described as
“the act of taking work once performed within an
organization and outsourcing it to the general public
through an open call for participants” (Ridge, 2014).
Crowdvoting is one type of crowdsourcing.
Its objective is to learn the opinions of the crowd
regarding specific issues or products. Here, people
give their opinions and vote on a certain topic
(Simon, Pechuan, & Estelles-Miguel, 2015;
Jimenez-Crespo, 2017; Kitchens & Crane, 2014;
Turban, King, Lee, Liang, & Turban, 2015).
3.4 Related Works
The concept of crowdsourcing is fairly new.
Nevertheless, the idea of crowdsourcing taxonomy
development and evolution was already applied in
scientific publications. There are two steps involved
in the process of developing a new taxonomy:
forming a term corpus and creating hierarchical
relationships between terms. Crowdsourcing can be
used in either one of these steps or in both of them.
The work of forming a term corpus using
crowdsourcing in the first step was introduced by
means of social tagging and folksonomy. The
tagging systems mentioned most often in
scientific publications are the social bookmarking
website Delicious and the photo-sharing site Flickr.
They have features that allow the user to add tags to
existing contents, in contrast to stricter systems like
libraries where a book will have exactly one proper
call number based on content (Heymann & Garcia-
Molina, 2006). These tags together form a
folksonomy and can be used as terms for the
developing taxonomy.
Nevertheless, folksonomy has its disadvantages.
There is no control of synonymy and homonymy,
dates appear in many formats, and typing and
orthographic errors are common. Tags can also contain words
from different languages or even compound words
consisting of more than two words or a mixture of
languages. Combining all tags from a system, we can
find many words that have the same meaning, or the same
word in different forms, e.g., “bag” vs “bags”,
“computer science” vs “computer_science” and
“computer-science” (Peters & Stock, 2007).
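
The surface-form variation described above can be illustrated with a
minimal sketch. The function names `normalize_tag` and `group_tags` and
the normalization rules (lowercasing, unifying separators, a naive
plural rule) are assumptions chosen for illustration and are not taken
from Peters and Stock. Such rules only merge spelling variants; they do
not resolve synonymy or homonymy.

```python
import re
from collections import defaultdict


def normalize_tag(tag):
    """Map a raw tag to a canonical form using simple, illustrative rules."""
    # Lowercase and treat underscores, hyphens and whitespace as one separator.
    text = re.sub(r"[_\-\s]+", " ", tag.strip().lower())
    words = []
    for word in text.split():
        # Naive plural rule (assumption): drop a trailing "s" from longer
        # words, but keep words ending in "ss" such as "class".
        if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
            word = word[:-1]
        words.append(word)
    return " ".join(words)


def group_tags(tags):
    """Group raw tags that share the same canonical form."""
    groups = defaultdict(set)
    for tag in tags:
        groups[normalize_tag(tag)].add(tag)
    return {canonical: sorted(variants) for canonical, variants in groups.items()}
```

Applied to the examples from the text, “bag” and “bags” fall into one
group, and the three spellings of “computer science” into another.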
Besides the approach using folksonomy, there are
other methods to create a term corpus for taxonomy
without using crowdsourcing, such as extracting
words with the top term frequency-inverse document
frequency (TF-IDF) scores (Brooks & Montanez, 2006) or taking
words or phrases in the top-ranked documents that
commonly co-occur with each other across many of
the passages (Sanderson & Croft, 1999).
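
As an illustration of the first of these methods, the following sketch
ranks the words of each document by a TF-IDF score. It assumes
whitespace tokenization, term frequency normalized by document length,
and a natural-logarithm inverse document frequency; the function name
`top_tfidf_terms` is hypothetical, and the exact weighting used by
Brooks and Montanez may differ.

```python
import math
from collections import Counter


def top_tfidf_terms(docs, k=3):
    """Return the k highest-scoring terms of each document by TF-IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by descending score; break ties alphabetically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        results.append(ranked[:k])
    return results
```

A word that occurs in every document receives a score of zero and is
therefore ranked last, which is the intended effect for common words.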
From the term corpus created in the first step,
the creators form hierarchical relationships between
terms and get the final result as a new taxonomy in
the second step. One method is to apply an algorithm,
called SAP, that grows a deeper, bushier tree by
merging saplings created by different users
(Plangprasopchok, Lerman, & Getoor, 2010).
Another method was introduced by Heymann and
Garcia-Molina (Heymann & Garcia-Molina, 2006).
Their idea is to convert tags into tag vectors and
calculate the similarity between tags using the cosine
similarity between tag vectors. The end product is a
tag similarity graph where each tag is represented by
a vertex, and two vertices are connected by an edge if
the similarity of the nodes they represent is above
some set threshold.
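
The construction of such a tag similarity graph can be sketched as
follows. The sketch assumes that each tag vector is a sparse mapping
from a resource identifier to the number of times the tag was applied
to that resource; this representation, the function names, and the
threshold value are illustrative assumptions.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_graph(tag_vectors, threshold):
    """Return the edges (a, b) whose cosine similarity exceeds the threshold."""
    edges = set()
    # Each tag is a vertex; every unordered pair of tags is examined once.
    for a, b in combinations(sorted(tag_vectors), 2):
        if cosine(tag_vectors[a], tag_vectors[b]) > threshold:
            edges.add((a, b))
    return edges
```

Comparing all pairs of tags is quadratic in the number of tags, so this
form is only practical for small tag sets.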
Liu et al. compute a generality score for each tag,
then use an agglomerative hierarchical clustering
approach to generate the concept hierarchy (Liu,
Fang, & Zhang, 2010). Their algorithm follows the same
principle as that of Heymann and Garcia-Molina: tags are
sorted in descending order of a score, here the
generality score. Then the algorithm tries to find the
parent node in the taxonomy tree for each tag. If it
cannot be found, the tag is added as a child of the root.
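
The greedy placement described above can be sketched as follows. The
sketch assumes that a parent is the most similar tag already in the
tree, provided that its similarity exceeds a threshold. The function
name `build_taxonomy`, the caller-supplied `similarity` function, and
the threshold are illustrative assumptions and do not reproduce the
published algorithms in detail.

```python
def build_taxonomy(scores, similarity, threshold, root="ROOT"):
    """Greedily attach each tag to its most similar, already placed tag.

    scores     -- dict mapping each tag to its (generality) score
    similarity -- function (tag_a, tag_b) -> float, supplied by the caller
    threshold  -- minimum similarity needed to accept a parent (assumption)
    Returns a dict mapping each tag to its parent.
    """
    parent = {}
    placed = []
    # Process tags from the highest score to the lowest.
    for tag in sorted(scores, key=lambda t: (-scores[t], t)):
        best, best_sim = None, threshold
        for candidate in placed:
            sim = similarity(tag, candidate)
            if sim > best_sim:
                best, best_sim = candidate, sim
        # If no suitable parent is found, the tag becomes a child of the root.
        parent[tag] = best if best is not None else root
        placed.append(tag)
    return parent
```

Because tags are processed in descending order of score, a tag can only
become the child of a tag with an equal or higher score, which keeps
general terms near the root.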
Agglomerative Hierarchical Clustering (AHC) is
frequently used to build the hierarchy of tags. It relies
on how similar or distant two nodes are when building a
hierarchy. Li et al. proposed an enhanced AHC
framework by skipping the error-prone step of
calculating each tag’s generality and integrating a
topic model to capture thematic correlations among
tags (Li, et al., 2012).
An interesting approach was introduced by
Karampinas and Triantafillou (Karampinas &
Triantafillou, 2012). Rather than calculating the
similarity score between two tags, they use the crowd
to annotate parent-children relationships between
tags. An algorithm, called “CrowdTaxonomy”, was
introduced to grow the taxonomy tree based on the
crowd’s annotations. The algorithm is called on every
vote. This method includes the crowd in both steps of
the taxonomy development process. The crowd is
used to form a term corpus by means of social
tagging, and its members vote to annotate the relationship between two
terms. Hierarchical relationships are built based on