
we have presented initial results and conceptual ideas.
7 CONCLUSION
Our aim in this paper is to explore how to identify
key contributors responsible for the majority of test-
ing in open-source projects, specifically through the
analysis of test commits within the Apache Spark
project. By focusing on contributors who bear testing
responsibilities (RQ1), we observe that only 9.8% of
contributors are responsible for the majority of
test-related commits. Testing activity is thus concentrated
in a small group of individuals, a distribution more skewed
than the traditional 80/20 Pareto Principle. By examining the
total commits of the top testing contributors over the past
three years, our further analysis uncovered varying
engagement levels among them. We categorized them as
Highly-Active Contributors, Moderately-Active Con-
tributors, and Lowly-Active Contributors (RQ2). Ad-
ditionally, we analyzed the testing focuses of each
activity level across quarterly periods. Highly-Active
Contributors, who are deeply engaged with the Apache Spark
project, also exhibit the highest testing focus of all
clusters, positioning them as the backbone of the project’s
testing practices. Our analysis highlights significant
implications for OSS projects, particularly in fostering
broader participation in testing activities among OSS
contributors and strategizing to achieve a more
balanced involvement across the community.
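As an illustration of the kind of concentration measure summarized above, the following minimal sketch computes the smallest group of contributors whose test commits together exceed a given fraction of all test commits. It is not the procedure used in this study: the function name, the 50% threshold used for "majority", and the toy author list are all assumptions made for the example.

```python
from collections import Counter


def smallest_majority_share(test_commit_authors, majority=0.5):
    """Return (k, share): the fewest contributors whose test commits together
    exceed `majority` of all test commits, and k as a fraction of all
    contributors with at least one test commit.

    `majority=0.5` is an assumed threshold, not one taken from the paper.
    """
    counts = Counter(test_commit_authors)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no test commits given")
    covered = 0
    # Walk contributors from most to fewest test commits.
    for k, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered > majority * total:
            return k, k / len(counts)
    return len(counts), 1.0  # only reached if majority >= 1


# Hypothetical toy data: one author name per test commit.
authors = ["alice"] * 6 + ["bob"] * 2 + ["carol", "dave"]
k, share = smallest_majority_share(authors)
print(k, share)  # prints: 1 0.25
```

In this toy input, one of four contributors accounts for more than half of the test commits, so the share is 0.25. Applied to real commit data, the same computation would yield a figure comparable in kind to the 9.8% reported above, provided test commits are identified and "majority" is defined in the same way.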
ENASE 2025 - 20th International Conference on Evaluation of Novel Approaches to Software Engineering