the codebase. Reasonable results are also obtained for the Ops contributors: their low number of commits, combined with their high participation in issues, indicates a focus on operations. Further assessing their behavior, we are also able to identify those who appear more descriptive (verbal), as inferred from the number and length of their comments. Finally, the DevOps contributors appear to participate in both development and operations activities, exhibiting rather high values in all of the aforementioned metrics.
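The role interpretation above rests on grouping contributors by activity metrics. The sketch below illustrates the idea using plain k-means with k = 3 over four per-contributor metrics that mirror the ones mentioned (commits, issues participated in, number of comments, mean comment length). The contributor names and values are invented for illustration. The paper's actual feature set, clustering algorithm, and number of clusters are not given in this section, so treat this as an assumption-laden sketch rather than the authors' method.

```python
from math import dist
from statistics import mean, pstdev

# Hypothetical per-contributor metrics (illustrative values, NOT data from the paper):
# (commits, issues participated in, number of comments, mean comment length in chars)
CONTRIBUTORS = {
    "dev_a": (150, 4, 8, 30.0),
    "dev_b": (130, 6, 10, 35.0),
    "ops_a": (5, 60, 90, 180.0),
    "ops_b": (8, 55, 80, 160.0),
    "devops_a": (110, 50, 70, 150.0),
    "devops_b": (120, 45, 75, 140.0),
}


def standardize(rows):
    """Z-score each metric column so no single metric dominates the distances."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]
    return [tuple((v - m) / s for v, (m, s) in zip(row, stats)) for row in rows]


def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the point
    farthest from its nearest chosen centroid (used here instead of random init)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means: alternate assignment and centroid update; one label per point."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[j] = tuple(map(mean, zip(*members)))
    return labels


names = list(CONTRIBUTORS)
raw = [CONTRIBUTORS[n] for n in names]
labels = kmeans(standardize(raw), k=3)

# Summarize each cluster in original units so it can be read as a role profile.
for j in sorted(set(labels)):
    members = [r for r, lab in zip(raw, labels) if lab == j]
    commits, issues, comments, length = (mean(col) for col in zip(*members))
    group = [n for n, lab in zip(names, labels) if lab == j]
    print(f"cluster {j}: {group} commits={commits:.0f} issues={issues:.0f} "
          f"comments={comments:.0f} avg_len={length:.0f}")
```

Standardizing first matters because raw commit counts and comment lengths sit on very different scales. Without it, the metric with the largest magnitude would dominate the Euclidean distances. Reading the cluster means in original units is what lets a profile such as "few commits, many issue comments" be labeled as an Ops-like role.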
6 CONCLUSIONS
In this work, we proposed a data-driven methodology and, by applying clustering, were able to identify the roles of the contributors who take part in a software project, as well as the distinctive characteristics of their behavior. Future work lies in several directions. First, we could add more metrics to provide an analysis that covers additional development scenarios and roles. Moreover, we could expand our dataset with projects that have different characteristics. Finally, an interesting direction would be to build a recommendation engine able to provide recommendations regarding optimal team formation and/or task allocation based on the characteristics of each contributor.
ACKNOWLEDGEMENTS
This research has been co-financed by the European Regional Development Fund of the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH – CREATE – INNOVATE (project code: T1EDK-02347).
Towards Extracting the Role and Behavior of Contributors in Open-source Projects
543