projects. We defined five categories and verified the correctness of the discovered projects using three criteria. Discovery performance varies considerably and depends on the query: for the most popular project categories we defined, RepoFinder performs poorly, while performance improves as the query becomes more specific.
Discovery time is one of RepoFinder's quality factors. Collecting the results with RepoFinder took from 5 to 15 minutes, while the same task on Google took from 1 to 2 hours. The reason is simple: RepoFinder reports projects in a structured and uniform way, showing each project's name, labels, and description. Web pages, instead, mostly report only the project name and sometimes a link to the official site, which slows down the discovery process.
In this paper we did not describe the analyses that can be performed on the retrieved projects by exploiting the tools we have integrated for metrics computation and code smell detection. We chose instead to focus on RepoFinder's discovery functionalities, leaving the demonstration of the analysis phases to future work.
Regarding future work, we identified some direc-
tions in which RepoFinder’s discovery functionalities
can be extended:
• Online query support. As we outlined, the crawling process can be slow, also because of limitations imposed by the Code Forges, and can lead to a partial exploration of the available projects. We plan to combine our local index search with online queries submitted to the supported Code Forges. This hybrid solution should increase the chance of discovering new projects and would leverage the search engines already available on each Code Forge (a minimal sketch of this idea follows the list).
• Similar projects search. As we outlined in Section 5, software categories often have one or more well-known projects. Developers frequently look for alternatives to an existing project rather than searching for software by first identifying its category. This is a common use case, and we plan to integrate automated support for it.
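To make the hybrid discovery idea from the first item more concrete, the sketch below merges hits from a small, hypothetical local index with live results from GitHub's public repository search endpoint. The LOCAL_INDEX data, the helper names, and the duplicate-resolution policy are illustrative assumptions, not RepoFinder's actual implementation.

# Hybrid discovery sketch: merge hits from a local project index with
# live results from GitHub's repository search API.
# The local index content and the merge policy are illustrative assumptions.
import json
import urllib.parse
import urllib.request

# Hypothetical local index: project name -> (labels, description).
LOCAL_INDEX = {
    "jabref": (["bibliography", "java"], "Reference manager"),
    "zotero": (["bibliography", "research"], "Research assistant"),
}

def search_local(query):
    """Return local projects whose labels or description match the query."""
    q = query.lower()
    return [
        {"name": name, "description": desc, "source": "local"}
        for name, (labels, desc) in LOCAL_INDEX.items()
        if q in desc.lower() or any(q in label for label in labels)
    ]

def search_github(query, limit=10):
    """Query GitHub's public repository search endpoint."""
    params = urllib.parse.urlencode({"q": query, "per_page": limit})
    url = "https://api.github.com/search/repositories?" + params
    with urllib.request.urlopen(url) as response:
        payload = json.load(response)
    return [
        {"name": item["full_name"],
         "description": item.get("description") or "",
         "source": "github"}
        for item in payload.get("items", [])
    ]

def hybrid_search(query):
    """Combine local and online results: local hits come first,
    and online hits with an already-seen name are dropped."""
    results = search_local(query)
    seen = {r["name"].lower() for r in results}
    for hit in search_github(query):
        short_name = hit["name"].split("/")[-1].lower()
        if short_name not in seen:
            seen.add(short_name)
            results.append(hit)
    return results

if __name__ == "__main__":
    for project in hybrid_search("bibliography"):
        print(f"[{project['source']}] {project['name']}: {project['description']}")

In an actual integration, the merge step would also need to map the online results onto RepoFinder's project schema (name, labels, description) before they are stored or shown to the user.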
ACKNOWLEDGEMENTS
This work was partly funded by KIE S.r.l. of Milano, Italy (http://www.kie-services.com/). The authors kindly thank this company.