Analyzing the Developer’s Sentiment in Software Components: A
Decade-Long Study of the Apache Project
Tien Rahayu Tulili, Ayushi Rastogi and Andrea Capiluppi
Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, Faculty of Science and Engineering,
University of Groningen, Groningen, The Netherlands
Keywords:
Complexity, Sentiments, Software Components, File Dependencies, Open Source Software, Software
Architecture.
Abstract:
Open-source software development relies heavily on effective collaboration among developers, with commu-
nication often reflecting emotional responses to the technical challenges encountered. The Apache HTTP
Server (’httpd’) project, a widely used web server, provides a rich dataset to explore how developer sentiment
may be influenced by the complexity of software components.
This study aims to investigate the relationship between developer sentiment and software component complex-
ity in the Apache ’httpd’ project. Specifically, it seeks to determine whether emotional expressions, captured
through sentiment analysis, correlate with the complexity of the components developers work on over a decade
of project development (2015–2024).
We utilized two primary datasets: developer communication from the mailing list and commit data. Sentiment
analysis was conducted using Sentistrength-SE to classify messages as positive or negative. Software com-
ponent complexity was measured using static code analysis tools, and a network model of file dependencies
was created to examine the architectural structure. Statistical tests, including ANOVA and Tukey HSD, were
applied to assess the relationship between sentiment, complexity, and developer contributions.
The results indicate that complexity is not necessarily associated with developers’ sentiments. However, the
most crucial component was significantly affected by sentiments. Developers contributing to more complex
components expressed more negative sentiments, suggesting that complexity may contribute to emotional
strain. These findings offer insights into managing developer well-being and improving project management
strategies in open-source development environments by addressing both technical and emotional factors.
1 INTRODUCTION
Developers are the driving force behind software de-
velopment activities. Their contributions (writing
code, fixing bugs, participating in code reviews, and
submitting commits) are essential for a project’s suc-
cess. How developers manage tasks and collaborate
within teams can directly affect their productivity,
and ultimately influence the sustainability of the
software (Murić et al., 2019; Jalote and Kamma, 2019).
In addition to their contributions and collaboration,
the way developers interact with software compo-
nents (such as modules, libraries, tools, and frame-
works) plays a critical role in shaping the quality and
speed of development.
Software components represent the fundamental
building blocks of a project (Lau, 2006). These com-
ponents (e.g., modules, libraries, frameworks) are
crucial for the software’s functionality and architec-
ture. Developers must manage them effectively to
ensure new features are implemented correctly, de-
pendencies are resolved, and system stability is main-
tained. However, the interdependent and often com-
plex nature of these components poses challenges.
Balancing the maintenance of existing code with the
addition of new features can be difficult. Well-
planned component systems can accelerate develop-
ment, whereas poor management can slow progress
significantly.
Sentiments are frequently expressed during soft-
ware development (Robinson et al., 2016; Murgia
et al., 2014; Murgia et al., 2018), influencing how
developers engage with their work and collaborate
with others. Tight deadlines, excessive workloads,
and communication challenges often give rise to neg-
ative emotions such as frustration, anger, or dissatis-
Tulili, T. R., Rastogi, A. and Capiluppi, A.
Analyzing the Developer’s Sentiment in Software Components: A Decade-Long Study of the Apache Project.
DOI: 10.5220/0013162000003912
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2025) - Volume 1: GRAPP, HUCAPP
and IVAPP, pages 483-494
ISBN: 978-989-758-728-3; ISSN: 2184-4321
Proceedings Copyright © 2025 by SCITEPRESS Science and Technology Publications, Lda.
faction, potentially leading to burnout and a decline in
productivity. Conversely, positive emotions, such as
satisfaction and happiness, arise when problems are
solved successfully, feedback is positive, or the com-
munity offers support. These emotional dynamics can
affect developer engagement, retention, and collabo-
ration (Sage Sharp, 2015; Philipp Ranzhin, 2015).
This study aims to investigate the relationship be-
tween three key aspects of open-source software de-
velopment: developers, components, and sentiments.
Specifically, we focus on understanding how develop-
ers’ sentiments interact with the complexity of soft-
ware components in collaborative development envi-
ronments.
In open-source software projects, developers of-
ten contribute to multiple components or packages,
and several developers may work on the same com-
ponent. To coordinate these efforts and resolve tech-
nical issues, developers rely on communication chan-
nels such as mailing lists and chat platforms. These
exchanges often carry emotional undertones, both
positive and negative (Murgia et al., 2018; Mur-
gia et al., 2014). At times, negative emotions can
escalate, leading to toxic interactions that damage
team cohesion (Sage Sharp, 2015; Philipp Ranzhin,
2015). Research shows that emotions can signifi-
cantly impact aspects of development such as pro-
ductivity and project retention. For example, de-
velopers experiencing dissatisfaction may disengage
from their work or leave the project entirely (Grazi-
otin et al., 2018; Garcia et al., 2013). Previous
studies have also linked negative sentiments in com-
mit messages to bug-related work (Huq et al., 2020),
while others have observed that agile teams frequently
express emotions in response to changing require-
ments (Madampe et al., 2020).
Despite growing interest in the role of emotions
in software development, no studies have specifically
focused on how developer sentiment affects software
components. Additionally, the architectural implications
of developer sentiment have yet to be explored.
Our study addresses this gap by investigating
the Apache ’httpd’ project¹. We aim to determine
whether developer sentiment has a measurable
impact on software components and how component
complexity influences sentiments during the develop-
ment process. To guide our investigation, we pose the
following research questions:
RQ1. Are there differences among developers in
terms of their sentiment involvement during software
development?
Rationale: This question aims to explore whether
distinct groups of developers, based on their sentiment
expressions during communication, emerge in
the software development process. We also examine
their commit activities, contributions, and communication
patterns.
¹ https://httpd.apache.org/
RQ2. To what extent does the complexity of software
components impact developers’ sentiments?
Rationale: This question investigates the relation-
ship between the complexity of software components
and developers’ sentiments. We aim to understand
whether complex components elicit more negative
sentiments.
This paper is structured as follows: Section 2 re-
views related literature. Section 3 details the method-
ology and Section 4 presents results and discusses
their implications. Section 5 addresses threats to va-
lidity while Section 6 concludes with key insights and
future work.
2 RELATED WORK
We structured our review of related work along three
key axes: "sentiments and developers", "developers
and software components", and "sentiments and soft-
ware components". Previous studies have predom-
inantly focused on the first two axes: "sentiments
and developers" and "developers and components."
However, the "sentiments and software components"
topic, particularly in Free/Libre Open Source Soft-
ware (FLOSS), remains unexplored. Specifically, no
research has examined how software components in-
fluence developers’ sentiments and how this, in turn,
impacts project dynamics. Below, we summarize
prior research on the first two axes.
2.1 Sentiments and Developers
Research on developer sentiment during software de-
velopment has gained significant attention. For in-
stance, Tourani et al. (Tourani et al., 2014) ana-
lyzed developer mailing lists to identify expressions
of happiness and distress during development, us-
ing sentiment analysis techniques. Similarly, Fucci
et al. (Fucci et al., 2021) explored developers’ habits
around self-admitted technical debt by classifying is-
sue comments, finding that functional issues tended
to evoke more negative sentiments. In a related study,
Pletea et al. (Pletea et al., 2014) focused on secu-
rity discussions on GitHub, analyzing commit and
pull request comments. They confirmed that security-
related discussions contained more negative emotions
compared to non-security-related ones. However,
these studies examined sentiment only in topic-based
discussions, and only at the level of whole projects during
HUCAPP 2025 - 9th International Conference on Human Computer Interaction Theory and Applications
the development. They did not investigate the inner
levels of a project, such as the component or package
level. Our study addresses this gap by investigating
developers’ sentiments at this more granular level.
Ortu et al. (Ortu et al., 2016) also studied de-
veloper interactions, investigating how psychologi-
cal conditions influence the tone of their commu-
nication. Using the Sentistrength tool, they found
that developers often responded to negative com-
ments with either positive or negative remarks. In
terms of productivity, some studies have examined
the correlation between developer sentiment and ac-
tivity levels, including how sentiments relate to re-
solved/unresolved issues (Valdez et al., 2020), peak
productivity days (Valdez et al., 2020; Guzman et al.,
2014), and the time required to fix issues (Ortu et al.,
2015). Nevertheless, these studies examined developers’
emotional states and activity across the project
as a whole. They did not consider the component
level of the project, which is the focus of our study.
Several other studies have explored the relation-
ship between developer sentiment and software de-
velopment activities. Huq et al. (Huq et al., 2020)
investigated the connection between developer senti-
ment and software bugs by analyzing GitHub com-
mits. Their findings revealed that commits associ-
ated with bug-related activities, such as introducing,
fixing, or preceding bugs, were generally more nega-
tive. Similarly, Robinson et al. (Robinson et al., 2016)
identified a statistical correlation between changes in
developer routines and shifts in sentiment, suggest-
ing that behavioural changes are often accompanied
by emotional changes. However, like the studies
discussed above, this work does not analyze sentiment
at a granular level, such as the package level, which
this study intends to investigate.
2.2 Developers and Components
Wu et al. (Wu et al., 2023) examined the role of
social and technical dependency networks in open-
source software (OSS) communities. They analyzed
network metrics, such as degree centrality, between-
ness centrality, and closeness centrality, to understand
their impact on project success. The study found that
nonlinear relationships exist between the number of
connections within social and technical networks and
OSS success—suggesting that an increase in connec-
tions does not always correlate with project success.
However, Wu et al.’s study did not consider the
sentiment dynamics of developer interactions within
these technical networks. Our research fills this gap
by investigating how sentiments evolve at the compo-
nent level. We explore how sentiment responses dur-
ing communication affect individual software compo-
nents, thus linking the technical and sentiment dimen-
sions of OSS development.
Software modularity is a crucial aspect of
component-based software engineering, enabling sys-
tem evolution and maintenance (Weide and Gibson,
1997; MacCormack et al., 2007). Modular designs
facilitate future adaptations and create ’option value’
for improved designs (MacCormack et al., 2007). Re-
search suggests that higher modularity in software
development can lead to increased productivity and
quality. Studies have shown that modular reuse enhances
productivity and quality and reduces costs in embedded
software development (Sun et al., 2014; Sun
et al., 2016). Higher modularity is associated with improved
development productivity and fewer software
failures (Cataldo and Herbsleb, 2013). It also reduces
development time and coordination efforts (Gomes
and Joglekar, 2008). Developers tend to invest less
cognitive effort in understanding modular code, al-
though they may spend more time on it (Segalotto
et al., 2023). Modularity positively interacts with
developers’ temporal work styles to influence soft-
ware quality and job satisfaction (Foerderer et al.,
2016). However, some developers do not fully uti-
lize advanced modularity techniques, often sticking
to basic object-oriented programming (Fukuda and
Leger, 2015). Additionally, while higher modularity
improves developer retention and reduces bug-fixing
time, increased complexity has the opposite effect (V.
and C. Palvia, 2007). Although these studies address
the modularity of software development, they
disregard developer-related factors such as sentiment,
which may also play an important role in improving
developers’ productivity and software quality. Our
study therefore examines sentiment at the level of
components or packages.
Moreover, Mockus et al. (Mockus et al., 2000),
in a case study, scrutinized the development
process of an open-source project, the Apache
web server, to assess whether open-source and
commercial development processes could be combined.
They quantified several aspects, such as developer
participation, core team size, code ownership,
productivity, defect density, and problem resolution
interval, by analyzing the email archives of source
code change history and problem reports of the project.
Their study concluded by proposing several hypotheses.
One of them concerns the size of the core developer
group (e.g., 10-15 people) that controls the code base
and creates approximately 80% or more of the
new functionality of the project. In a separate
study (Mockus et al., 2002), they tested and
refined these hypotheses on another OSS
application, the Mozilla browser, using the same
types of archives and a similar methodology. One
refined hypothesis states that the core group would
not grow beyond the aforementioned size if it
used only informal, ad hoc means of coordinating its work.
However, Mockus et al.’s studies investigated only
developers’ participation during project development
and did not consider developers’ sentiment.
We fill this gap by adding the developers’ sentiment
to the analysis of components and developers’
activity.
3 METHODOLOGY
3.1 Definitions
The key terms used in this analysis are defined below:
Active Developer: refers to any developer who
pushed commits to the Apache repository be-
tween 2015 and 2024.
Negative Message: refers to a message containing
sentences with negative scores between -3
and -5 and positive scores between +1 and +2.
Strongly negative words such as “really hate”,
“awful”, and “suffer” characterize this range.
Messages whose sentences have equally strong
positive and negative scores, and therefore fall
outside the ranges above, are ignored
(e.g., a message scoring -3 and +3).
Positive Message: refers to a message containing
sentences with positive scores between +3 and +5
and negative scores between -1 and -2. Strongly
positive phrases such as “very cool”, “excellent”, and
“thanks” define this range. Messages with similarly
balanced scores that fall outside the ranges above
are excluded.
Sentence: We divided long messages into
individual sentences. A ‘sentence’ refers to
a group of words starting with a capital letter and
ending with punctuation (e.g., periods, question
marks, and exclamation points).
Developer Writing Negative Sentences
(DWNs): refers to developers who predom-
inantly write negative sentences. DWNs are
identified as those in the top 5% of developers
by the number of negative sentences, based on a
quartile analysis.
Developer Writing Positive Sentences (DWPs):
refers to developers who primarily write positive
sentences, identified with the same criterion as
DWNs but applied to positive sentences.
Component: refers to a community of files linked
by dependencies. Further details on constructing
the component network are in Section 3.4.
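The Negative Message and Positive Message definitions can be read as a rule over per-sentence scores. The sketch below assumes one possible reading, in which the strongest positive and the strongest negative sentence score decide the label. The function name `classify_message` and the label strings are hypothetical and are not part of SentiStrength-SE.

```python
from typing import List, Tuple


def classify_message(sentence_scores: List[Tuple[int, int]]) -> str:
    """Label a message from per-sentence (positive, negative) scores.

    Assumption: positive scores lie in +1..+5, negative scores in -1..-5,
    and the strongest score of each polarity decides the label.
    """
    if not sentence_scores:
        return "ignored"
    strongest_pos = max(pos for pos, _ in sentence_scores)
    strongest_neg = min(neg for _, neg in sentence_scores)
    if strongest_neg <= -3 and strongest_pos <= 2:
        return "negative"
    if strongest_pos >= 3 and strongest_neg >= -2:
        return "positive"
    # Balanced (e.g. -3 and +3) or weak scores on both sides.
    return "ignored"
```

Under this reading, a message with one sentence scored (+1, -5) and all others scored (+1, -1) is labelled negative, while a message scoring -3 and +3 is ignored.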
3.2 Data Sets
In this study, we investigated the Apache commu-
nity, an open-source software community that has ex-
isted since the 1990s, focusing on the project ‘httpd’,
which powers one of the world’s most widely used
servers. The project employs various communica-
tion channels to coordinate its development efforts.
This component-based development project is built
upon over 60 standard Apache modules comprising
hundreds of files. This community meets
the requirements of our study: it implements a
component-based architecture, uses text-based
communication to discuss technical issues, and has
existed for over ten years. Our study spanned a ten-year interval,
from January 2015 to May 2024, encompassing multiple
programming languages, including SQL, Java,
Python, and R.²
For the empirical analysis we utilized three pri-
mary data sources: i) the developer mailing list, which
captures discussions about technical issues related to
the software; ii) the GitHub commits, which we used
to quantify the complexity of components; and iii) the
source code to extract the Apache components. All
the datasets are publicly accessible.
1. Mailing List for Sentiments. We collected
data from the mailing list dedicated to discussing
source code changes and technical issues related
to the HTTP server³. The publicly available
archive of this mailing list⁴ contains 47,104
emails.
2. Commits by Developers. We collected and analyzed
162,289 commits from the project’s GitHub
repository⁵. From the commits metadata, developers
were classified into two groups: 1) Developers
Writing Negative Sentences (DWNs), and 2)
Developers Writing Positive Sentences (DWPs),
as defined in Section 3.1.
3. Source code for Complexity and Dependencies.
We used complexity data provided in the commit
dataset and employed a static code analysis
tool⁶ to retrieve file dependencies. In total,
591 code files were used. A component network based
on these dependencies was then generated using
Infomap (Edler et al., 2024).
² We describe all the steps of our methodology in this
link: https://shorturl.at/RAT7m
³ https://httpd.apache.org/lists.html#http-dev
⁴ https://marc.info/?l=apache-httpd-dev&r=1&w=1
⁵ https://github.com/apache/httpd
3.3 Data Extraction, Preprocessing, and
Sentiment Classification
We followed several steps in our analysis, beginning
with data extraction and preprocessing, followed by
sentiment labelling to address our research inquiries.
Data Extraction We collected two distinct datasets:
we scraped the mailing list archive using Python
scripts and extracted commit data using Git commands.
The mailing list archive was used for the
first dataset mentioned in Section 3.2, while
the commit data was used for the second and third
datasets. The metadata for the mailing list includes
‘Subject’, ‘Sender’, ‘Date’, and ‘Message-Body’, while
the commit metadata includes details such as ‘hash’,
‘msg’, ‘committer_email’, ‘committer_name’,
‘committer_date’, ‘filename’, and ‘complexity’. Furthermore,
we extracted the source files and the complexity
of each source file provided in GitHub commits.
All datasets were stored in a database for subsequent
analysis.
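The commit fields listed above correspond to attributes exposed by the PyDriller library. The following is a minimal sketch of how such rows could be collected, assuming PyDriller 2.x; the helper names `commit_rows` and `repo_commits` are hypothetical, and this is not necessarily the extraction script used in the study.

```python
from typing import Dict, Iterable, Iterator


def commit_rows(commits: Iterable) -> Iterator[Dict[str, object]]:
    """Yield one row per (commit, modified file) with the fields listed above.

    `commits` is any iterable of PyDriller-style commit objects.
    """
    for commit in commits:
        for mod in commit.modified_files:
            yield {
                "hash": commit.hash,
                "msg": commit.msg,
                "committer_email": commit.committer.email,
                "committer_name": commit.committer.name,
                "committer_date": commit.committer_date,
                "filename": mod.filename,
                # May be None when no complexity value is available.
                "complexity": mod.complexity,
            }


def repo_commits(path_or_url: str):
    """Traverse a repository with PyDriller (imported lazily; must be installed)."""
    from pydriller import Repository

    return Repository(path_or_url).traverse_commits()
```

A call such as `commit_rows(repo_commits("https://github.com/apache/httpd"))` would then stream rows that can be written to a database.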
Data Preprocessing As we intend to analyze the
message content for our sentiment analysis, we need
to remove unnecessary lines. Message body con-
tent was cleaned by removing lines starting with ‘>’,
URLs, names, signatures, and greetings, along with
any code syntax or HTML/XML tags. The output was
divided into files for DWNs, DWPs, and sentiment-
tagged messages. Multiple email addresses for a sin-
gle developer were standardized to maintain consis-
tency.
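Part of the cleaning described above can be sketched with regular expressions. The example below is an illustration only: it handles quoted lines, URLs, and HTML/XML tags, and leaves out names, signatures, greetings, and code syntax, which need project-specific rules. The function name `clean_body` and both patterns are assumptions.

```python
import re

URL_RE = re.compile(r"https?://\S+")
TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")


def clean_body(body: str) -> str:
    """Drop quoted lines, then strip URLs and HTML/XML tags."""
    kept = []
    for line in body.splitlines():
        if line.lstrip().startswith(">"):
            continue  # quoted text from an earlier email
        line = URL_RE.sub("", line)
        line = TAG_RE.sub("", line)
        line = " ".join(line.split())  # collapse leftover whitespace
        if line:
            kept.append(line)
    return "\n".join(kept)
```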
Sentiments Classification In our study, we utilized
the sentiment analysis framework proposed by Yadollahi
et al. (Yadollahi et al., 2017) to classify opinion
polarity as either positive, negative, or neutral.
To conduct this analysis, we employed SentiStrength-SE
(Islam and Zibran, 2018), a sentiment tool developed
for the software engineering domain and
utilised in previous studies (Calefato et al.,
2018; Pletea et al., 2014; Chen et al., 2021; El Asri
et al., 2019; Girardi et al., 2021). We used it to
classify messages from the mailing list into two
sentiment classes, negative and positive. This tool provides
positive and negative sentiment scores ranging from
+1 to +5 and -1 to -5, respectively.
⁶ Understand, http://scitools.com
SentiStrength-SE is a dictionary-based classifier
that enhances the original SentiStrength by utilizing
a domain-specific dictionary. It analyzes 490,000
commit messages from 50 open-source projects on
GitHub, providing bipolar scores for sentiment clas-
sification, with scores ranging from +1 to +5 for pos-
itive sentiments and -1 to -5 for negative ones. The
tool is based on six basic emotions: joy, love, anger,
sadness, fear, and surprise.
In line with previous research by Ortu et al. (Ortu
et al., 2015), joy and love indicate positive polarity,
while anger, sadness, and fear indicate negative po-
larity. Surprise is categorized into two sets based on
context: negative (surprise-) and positive (surprise+).
However, our focus remained strictly on the bipolar
sentiments of negative and positive.
Email messages typically contain multiple sen-
tences, each with its own sentiment score. Acknowl-
edging the potential impact of highly positive or neg-
ative sentences on both the writer and reader (Richter
et al., 2010), we adopted a sentence-level analysis ap-
proach. We divided all email messages into individual
sentences, analysing both positive and negative sen-
tences separately. Sentence-ending punctuation (e.g.,
periods, question marks, exclamation points) served
as indicators to delineate sentence boundaries. We
only considered the sentences written by the sender
and ignored quoted sentences that belong to a previous
email (and its sender).
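The sentence-boundary rule described above can be sketched as a split on whitespace that follows sentence-ending punctuation. This is a simplified illustration: the name `split_sentences` is hypothetical, the capital-letter condition is not checked, and abbreviations followed by a space (such as "e.g. ") would be split incorrectly.

```python
import re
from typing import List

SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Split on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]
```

Because the split requires whitespace after the punctuation mark, a host name such as aprlib.apache.org stays inside a single sentence.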
The following is an example of a message⁷: So do you
mean, aprlib.apache.org? Who’s on the PMC for it?
What’s its charter? Is it really a big enough deal
to create a whole new project it to its own project
once we figure out answers to all the above questions?
I’d really hate to break the nomenclature conventions
that we’d all talked about and I thought decided upon,
regarding the hierarchy of projects and components
underneath them. Or is this an "incubator" project?
The message as a whole contains 6 sentences, for which
SentiStrength-SE gave negative scores (-1, -1, -1, -1,
-5, -1) and positive scores (1, 1, 1, 1, 1, 1). We
classified it as a negative message because it matched the
criteria for a Negative Message described in Section 3.1.
⁷ The message does not contain personal data, so it does
not infringe the GDPR. In addition, the mailing list messages
are publicly accessible and have been made public by
the Apache community.
3.4 Post-Processing of the Data
As post-processing activities, we firstly linked the
mailing list and commit datasets by timestamp (year,
month, day) and developer name, manually unifying
inconsistent names and emails across both datasets.
These linked datasets were used in our analysis to an-
swer RQ1 and RQ2.
Secondly, in order to measure complexity, we uti-
lized cyclomatic complexity values obtained from the
PyDriller library (Spadini et al., 2018). The process
involved three steps:
First, we calculated yearly mean complexity val-
ues for each file to account for annual fluctuations.
Second, these values were aggregated to compute
the yearly mean complexity for each component.
Finally, we calculated the overall mean complex-
ity for each component over the years. These val-
ues were used to address RQ2 (see Section 4.2).
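The three aggregation steps above can be sketched with the Python standard library. The record layout (filename, year, complexity), the file-to-component mapping, and the function name `component_complexity` are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def component_complexity(
    records: Iterable[Tuple[str, int, float]],
    file_to_component: Dict[str, str],
) -> Dict[str, float]:
    """records: (filename, year, cyclomatic complexity) per commit-file pair."""
    # Step 1: yearly mean complexity per file.
    per_file_year = defaultdict(list)
    for filename, year, complexity in records:
        per_file_year[(filename, year)].append(complexity)
    file_year_mean = {k: mean(v) for k, v in per_file_year.items()}

    # Step 2: yearly mean complexity per component.
    per_comp_year = defaultdict(list)
    for (filename, year), value in file_year_mean.items():
        per_comp_year[(file_to_component[filename], year)].append(value)
    comp_year_mean = {k: mean(v) for k, v in per_comp_year.items()}

    # Step 3: overall mean per component across years.
    per_comp = defaultdict(list)
    for (component, _year), value in comp_year_mean.items():
        per_comp[component].append(value)
    return {component: mean(v) for component, v in per_comp.items()}
```

Averaging in stages gives each year equal weight within a component, regardless of how many commits touched it in that year.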
Thirdly, we normalized the differences between
positive and negative sentiment counts across all
components annually, using a z-score transformation.
This normalization was critical for answering RQ2.
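For one year, the normalization can be sketched as a z-score over the per-component differences between positive and negative counts. The use of the population standard deviation and the function name `zscore_differences` are assumptions; the text does not state which variant was used.

```python
from statistics import mean, pstdev
from typing import Dict


def zscore_differences(pos: Dict[str, int], neg: Dict[str, int]) -> Dict[str, float]:
    """Z-score of (positive - negative) counts across components for one year."""
    diffs = {c: pos.get(c, 0) - neg.get(c, 0) for c in set(pos) | set(neg)}
    values = list(diffs.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        return {c: 0.0 for c in diffs}
    return {c: (d - mu) / sigma for c, d in diffs.items()}
```

A negative value marks a component whose balance is more negative than the average component in that year.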
Finally, the component network was constructed
by analyzing all files in the Apache ‘httpd’ project as
of May 2024 using a static code analysis tool. Depen-
dencies between files were represented as node pairs
with weights, and Infomap (Edler et al., 2024) was
used to generate the network. Our analysis focused
on the 20 largest components out of a total of 35.
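The weighted node pairs mentioned above can be written as a plain link list, with one "source target weight" line per pair, which is an input format Infomap accepts. The sketch below assumes that the weight of a pair is the number of times the dependency was reported; how weights were actually derived is not stated in the text, and the name `to_link_list` is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def to_link_list(dependencies: Iterable[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Turn (source file, target file) pairs into weighted links.

    Returns (files, lines): files[i - 1] is the file for node id i, and each
    line is "source_id target_id weight".
    """
    weights = Counter(dependencies)  # repeated pairs add up to the weight
    files = sorted({f for pair in weights for f in pair})
    node_id = {f: i for i, f in enumerate(files, start=1)}
    lines = [
        f"{node_id[src]} {node_id[dst]} {w}"
        for (src, dst), w in sorted(weights.items())
    ]
    return files, lines
```

The returned lines can be saved to a text file and passed to Infomap, and the `files` list maps the resulting node ids back to file names.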
3.5 Statistical Approaches
We employed various statistical methods to address
our research questions. For RQ1, we used the Mann-
Whitney test with Bonferroni correction to evaluate
differences. To address RQ2, an analysis of variance
(ANOVA) was conducted to assess complexity differ-
ences across components. Pairwise differences were
further examined using Tukey’s Honestly Significant
Difference (HSD) method, considering both normal-
ized sentiment scores and mean complexity values
(see Section 3.4).
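To make the ANOVA step concrete, the F statistic compares the variance between component groups with the variance within them. The sketch below computes it with the standard library only; it is an illustration, not the analysis script of the study. Obtaining p-values or the Mann-Whitney test would require a statistics package, for example `scipy.stats.f_oneway` and `scipy.stats.mannwhitneyu`.

```python
from statistics import mean
from typing import Sequence


def anova_f(groups: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA F statistic for k groups of observations."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)
```

Here each group would hold the complexity values of one component; a large F value indicates that component means differ by more than the spread within components would explain.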
4 RESULTS AND DISCUSSION
4.1 RQ1: Comparing Apache
Developers
We quantified and analyzed the activity of developers,
specifically focusing on DWNs, DWPs, and the most
active developers, in terms of their commit activities
and the frequency of sentimental interactions during
their contributions to the ‘httpd’ project.
Significant differences were observed among
these groups. Figure 1 illustrates these differences
through boxplots:
(a) The boxplots depict the number of com-
mits/contributions per month. The very dark grey rep-
resents the top ten DWNs, dark grey represents the top
ten DWPs, and very light grey represents the top ten
most active developers.
(b) Another set of boxplots illustrates the number
of positive and negative messages as a fraction of the
total messages sent. In this context, very dark grey
and dark grey correspond to the top ten DWNs and
DWPs, respectively, while light grey and very light
grey indicate the top ten active developers who wrote
positive and negative messages, respectively.
[Figure 1, panel (a): boxplots for the top 10 active developers, the top 10 DWNs, and the top 10 DWPs; y-axis: the proportion of the number of commits and total contributions (in months), log scale. Panel (b): boxplots for the top 10 active developers writing negative messages, the top 10 DWNs, the top 10 active developers writing positive messages, and the top 10 DWPs; y-axis: the proportion of the number of neg/pos sentences and total messages, log scale.]
Figure 1: Boxplots of the top 10 of active developers,
DWNs, DWPs regarding a relative number of commits done
(a) and of positive and negative sentences written (b).
These visualizations highlight the varying levels
of engagement and sentimental expressions among
different developer groups within the ‘httpd’ project.
Figure 2: The Infomap dependency network of twenty soft-
ware components illustrates the structure of the Apache
‘httpd’ project.
From Figures 1a and 1b, we observed that the
most active developers (those who wrote negative
messages) tend to use fewer negative sentences when
communicating with other developers during software
development. To statistically validate this observa-
tion, we conducted a Mann-Whitney U Test with Bon-
ferroni correction, comparing the two groups: the
most active developers writing negative messages and
the top ten DWNs. The resulting p-value was 0.023,
leading us to reject the null hypothesis that these
groups contribute equally in terms of the proportion
of negative messages to total messages.
Additionally, we performed a similar statistical
comparison between the top ten active developers
writing positive messages and DWPs using the Mann-
Whitney U Test with Bonferroni correction. Here, the
obtained p-value was 0.006, prompting us to reject
the null hypothesis (‘these groups contribute equally
in terms of the proportion of positive messages to total
messages’).
Based on these findings, we concluded that de-
spite their high number of commits and extensive
contributions, the top ten developers are more likely
to communicate neutrally or use fewer negative or
positive words. In contrast, the top ten DWNs and
DWPs exhibit similarities in their communication pat-
terns, as shown in the figures. Upon further investiga-
tion of the developers within these groups, we identi-
fied overlaps, indicating that these developers tend to
express both negative and positive sentiments when
communicating with others. This is unsurprising, since
developers, like all people, bring their emotions
into their work; it is also in line with a previous
study that found that negative emotion (e.g., anger)
correlates negatively with situational
power in the workplace (Fitness, 2000).
Moreover, it was noted that many of these devel-
opers worked within the same component, identified
as having high complexity and a significant negative
impact. For detailed insights, refer to Section 4.2.
While prior studies (Pletea et al., 2014; Huq
et al., 2020; Fucci et al., 2021; Valdez et al.,
2020) suggest a general prevalence of negative
emotions in specific contexts (e.g., security, bug-related
activities, resolved/unresolved issues, or functional
issues), our findings indicate that the most
active developers—despite their extensive contribu-
tions—communicate in a more neutral tone, us-
ing fewer emotionally charged words overall. This
contrasts with DWNs and DWPs, who consistently
use both positive and negative sentiments. Further-
more, our findings connect these sentiment patterns to
component-level contexts, uncovering that these ex-
pressive developers often work in high-complexity ar-
eas with notable negative impacts. These insights not
only fill the gap in the granular-level analysis high-
lighted in prior studies but also introduce the novel
observation of how communication styles relate to
developer roles and the technical complexity of their
work.
4.2 RQ2: Sentiment-Driven
Components
Figure 2 presents the network diagram of Apache
‘httpd’, which consists of twenty components derived
from the dependencies between source files (C and
Python) and summarized by Infomap. Each component is
drawn as a circle of variable diameter, and lines of
variable width depict the undirected flow of the
connections. The larger the circle, the more
dependencies the component has; the thicker the link,
the stronger the relation between the two components.
Detailed information about each component is publicly
available (footnote 8).
Upon observation, we noted that one component,
highlighted with a dark orange line, stands out as
the largest and most crucial, being significantly
larger than the others. This component also shows a
high number of dependencies, indicating that it
contains many sub-components that are more strongly
interdependent than those in the other, smaller
components. Using this component network, we further
analyzed the complexity of these twenty components,
as depicted in Figure 3.
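The way a file-level dependency graph collapses into the component-level diagram can be illustrated with a minimal sketch. It assumes that a clustering step (Infomap in this study) has already assigned each file to a component; the file names, the edge list, and the assignment below are hypothetical and are not taken from the ‘httpd’ data. The sketch counts, for each component, the dependencies that touch it (a stand-in for circle size) and, for each pair of components, the dependencies that cross between them (a stand-in for link width).

```python
from collections import Counter

# Hypothetical undirected file-level dependencies (not real httpd data).
file_edges = [
    ("a.c", "b.c"),
    ("a.c", "c.c"),
    ("b.c", "d.c"),
    ("d.c", "e.c"),
    ("f.c", "a.c"),
]

# Hypothetical file-to-component assignment, standing in for a clustering result.
component_of = {
    "a.c": "1:1", "b.c": "1:1", "c.c": "1:1",
    "d.c": "1:2", "e.c": "1:2",
    "f.c": "1:3",
}


def summarize(edges, membership):
    """Aggregate file-level edges into component sizes and link weights.

    node_size[c]      = number of dependencies with at least one end in c
    link_weight[(c,d)] = number of dependencies between components c and d
    """
    node_size = Counter()
    link_weight = Counter()
    for src, dst in edges:
        c_src, c_dst = membership[src], membership[dst]
        node_size[c_src] += 1
        if c_src != c_dst:
            # A cross-component dependency touches both components.
            node_size[c_dst] += 1
            link_weight[tuple(sorted((c_src, c_dst)))] += 1
    return node_size, link_weight


sizes, links = summarize(file_edges, component_of)
for component, size in sizes.most_common():
    print(component, size)
for pair, weight in links.items():
    print(pair, weight)
```

In this toy input, the component with the most internal and external dependencies receives the largest size, which mirrors how the largest circle in a diagram of this kind is identified.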
Figure 3 displays these components, showcasing
sentiments and complexity. Each box’s color indicates
the normalized differences between positive and
negative messages, while the number inside each box
represents the mean complexity values across all
years for that component. Vertical and horizontal
lines denote periods with no commits.
Footnote 8: The list of components, identified by the IDs 1:1 to
1:20, is available at https://shorturl.at/IcQRw
It’s evident from the heatmap that ‘1:1’, the largest
and most crucial component, consistently experiences
high negative sentiments across all years. ‘1:2’ and
‘1:4’ show spikes in negative sentiments in 2022,
while ‘1:3’ exhibits a peak in positive emotions in
2017. The remaining components appear to be less
influenced by either positive or negative sentiments.
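The color scale of such a heatmap can be sketched as a z-score of the difference between positive and negative message counts. The sketch below is illustrative only: the counts are hypothetical, and it assumes that the normalization is done over all component-year cells at once, which the text does not specify.

```python
from statistics import mean, pstdev

# Hypothetical (component, year) -> (positive, negative) message counts.
counts = {
    ("1:1", 2021): (10, 40),
    ("1:1", 2022): (12, 38),
    ("1:2", 2021): (8, 10),
    ("1:2", 2022): (5, 25),
    ("1:3", 2021): (9, 9),
    ("1:3", 2022): (11, 9),
}


def sentiment_zscores(cell_counts):
    """Return the z-score of (positive - negative) for every cell.

    Assumption: mean and standard deviation are taken over all cells.
    """
    diffs = {cell: pos - neg for cell, (pos, neg) in cell_counts.items()}
    values = list(diffs.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # All cells identical: no cell deviates from the mean.
        return {cell: 0.0 for cell in diffs}
    return {cell: (d - mu) / sigma for cell, d in diffs.items()}


z = sentiment_zscores(counts)
for cell in sorted(z):
    print(cell, round(z[cell], 2))
```

Cells with strongly negative z-scores correspond to component-years dominated by negative messages, while values near zero indicate a balance close to the overall average.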
Furthermore, we conducted statistical comparisons of
complexity across these components using ANOVA and
found a highly significant p-value (p < 2e-16),
indicating significant differences among the components.
Subsequently, we performed Tukey HSD post-hoc
tests to identify specific differences between compo-
nents, particularly focusing on comparisons involving
‘1:1’. Our analysis revealed that components ‘1:15’,
‘1:18’, ‘1:17’, ‘1:16’, ‘1:10’, ‘1:6’, and ‘1:7’, while
less impacted by emotional sentiments, differed sig-
nificantly from ‘1:1’ in terms of complexity. Notably,
component ‘1:15’, which ranked highest in complex-
ity, did not significantly differ from ‘1:6’ and ‘1:16’,
but showed significant differences with ‘1:3’, ‘1:18’,
‘1:17’, and ‘1:10’.
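The comparison of complexity across components rests on a one-way analysis of variance, whose test statistic can be computed by hand. The following is a minimal sketch using only the Python standard library; the complexity values are toy numbers chosen for easy arithmetic and are not taken from the study. Obtaining the p-value and the Tukey HSD pairwise comparisons requires the F and studentized range distributions, for which a statistics package (for example SciPy's `scipy.stats.f_oneway` and `scipy.stats.tukey_hsd`) would normally be used.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a dict of label -> values."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (v - mean(values)) ** 2
        for values in groups.values()
        for v in values
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy complexity values per component (hypothetical).
complexity = {
    "1:1": [1.0, 2.0, 3.0],
    "1:6": [2.0, 3.0, 4.0],
    "1:15": [5.0, 6.0, 7.0],
}

f_stat, df_b, df_w = one_way_anova_f(complexity)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F value means that the component means differ by more than the within-component spread would explain, which is the situation in which post-hoc pairwise tests become informative.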
Based on these findings, we concluded that while
sentiments play a notable role in some software com-
ponents, only ‘1:1’ is distinctly impacted by emo-
tional expressions. Complexity, on the other hand,
appears to vary significantly across different compo-
nents and does not necessarily correlate with the emo-
tional sentiments expressed by developers. Addition-
ally, it’s noteworthy that many developers associated
with the top ten DWNs and DWPs worked within
‘1:1’, suggesting their influence on this component’s
dynamics.
Our results highlight a significant shift in focus
from macro-level analyses of sentiment in software
development to a more granular, component-level
perspective. Prior studies (Sun et al., 2014; Sun
et al., 2016; Wu et al., 2023; Mockus et al., 2000;
Cataldo and Herbsleb, 2013; Foerderer et al., 2016;
V. and C. Palvia, 2007) explored developers’ inter-
actions or modularity without addressing sentiment.
In contrast, our study uniquely bridges these gaps by
investigating how sentiments manifest at the compo-
nent level, uncovering patterns that align with tech-
nical complexity and dependencies. For example,
while prior research (Cataldo and Herbsleb, 2013;
Sun et al., 2014; Sun et al., 2016; Foerderer et al.,
2016) associated productivity and developer satisfac-
tion with modularity, our findings reveal that compo-
nents with high dependencies, such as ‘1:1’, consis-
tently exhibit higher negative sentiment, linking sen-
timents to technical complexity and dependencies in
ways previously unexplored.
Furthermore, our results provide a nuanced under-
standing of developer communication. Unlike pre-
vious studies, which often focused on general senti-
ment trends among developers or their impact on pro-
ductivity (Valdez et al., 2020; Guzman et al., 2014;
Ortu et al., 2015), our analysis uncovers specific dy-
namics between sentimentally expressive developers
(DWNs/DWPs) and their activity within critical com-
ponents. The observation that the most active de-
velopers communicate in a more neutral tone, while
DWNs and DWPs are often associated with highly de-
pendent and complex components, offers new insights
into how sentiments might influence—and might be
influenced by—the technical characteristics of soft-
ware components. Moreover, our statistical findings
underscore that while complexity significantly dif-
fers among components, only ‘1:1’ demonstrates a
strong correlation between high complexity and neg-
ative sentiment. This adds a new dimension to the
study of modularity by integrating sentiment as a fac-
tor, thereby extending prior research on modularity’s
impact on productivity and software quality into the
emotional domain.
4.3 Implications for Practitioners and
Researchers
Observation on Developer Behaviour. The observa-
tion that the most active developers tend to use fewer
negative messages than the top negative senders high-
lights potential differences in communication styles
and their impact on project dynamics. Practitioners
can analyze these patterns to better understand how
individual developer behavior influences the over-
all tone and effectiveness of communication within
the community. For instance, profiling the behavior
of individual developers based on their communica-
tion patterns, activity levels, and contributions to the
project may assist leading developers in making in-
formed talent management, team formation, and re-
source allocation decisions within the community.
Additionally, our findings indicate that top devel-
opers, who often express strong emotions, play a piv-
otal role in shaping the software’s architecture. Com-
ponents with higher complexity attract more negative
sentiments, underscoring the challenging relationship
between emotional involvement and technical intrica-
cies. This highlights the necessity for project man-
agers to monitor both technical metrics and the emo-
tional well-being of developers.
Insights from the Communication Within a Component.
Our earlier findings highlight that some software
components may be influenced by sentiments expressed
by developers during software development,
particularly large components with high dependencies,
which are simultaneously handled by more developers.
Practitioners can use this information to identify
specific components within the project where negative
communication is prevalent and implement targeted
interventions to address underlying issues. For
example, regular code reviews could be implemented
specifically for critical components prone to issues
and bugs.
Qualitative Research. While our quantitative analy-
sis provides valuable insights, researchers can com-
plement these findings with qualitative methods such
as interviews and surveys to gain more detailed infor-
mation about the specific components managed along
with the sentiments expressed during their handling.
This can further enrich the understanding of the un-
derlying motivations, perceptions, and experiences of
developers within the open-source software commu-
nity, providing deeper insights into sentiments and
software component dynamics.
Comparative Studies and Generalisability. Our
methodology may be expanded to other open-source
projects or software development communities. Re-
searchers can conduct comparative studies analyzing
communication patterns, developer behaviors, and
project outcomes across diverse contexts. This can
help identify common trends and context-specific fac-
tors influencing dynamics.
5 THREATS TO VALIDITY
Our study’s results are subject to several limitations
that may affect their validity. We describe these
threats to validity below.
Construct Validity. One significant concern is the
potential lack of correspondence between the two
datasets we linked, the mailing list archive and the
commit dataset, which were linked based on time and
developer names. To assess this, we manually checked
100 random samples from both datasets to verify their
linkage. The samples were drawn by stratified random
sampling across different years to ensure
representativeness. In addition, we did not investigate
further to distinguish between negative comments on
specific topics, such as the development process or
the code in the repository.
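The stratified sampling used for the manual check can be sketched as follows. This is a minimal illustration, assuming proportional allocation per year and a fixed random seed for reproducibility; the record layout (`id`, `year`) and the function name `stratified_sample` are hypothetical, and the exact allocation scheme used in the study is not stated in the text.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, total, seed=0):
    """Draw about `total` records, allocated to strata in proportion to size.

    Every non-empty stratum contributes at least one record, so rounding
    can make the result slightly larger or smaller than `total`.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)

    sample = []
    for label in sorted(strata):
        members = strata[label]
        quota = max(1, round(total * len(members) / len(records)))
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample


# Hypothetical linked message-commit pairs, 100 per year for 2015 to 2024.
linked_pairs = [{"id": i, "year": 2015 + i % 10} for i in range(1000)]

check_set = stratified_sample(linked_pairs, key=lambda r: r["year"], total=100)
print(len(check_set), sorted({r["year"] for r in check_set}))
```

Sampling within each year, rather than from the pooled data, guarantees that years with few linked records are still represented in the manually inspected set.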
Conclusion Validity. Our method of selecting only the
top ten DWNs, DWPs, and active developers may bias the
results when comparing these groups of developers.
However, as these are only preliminary results, we
will extend the groups by considering all data points
in future work.
External Validity. Another limitation arises from
our reliance on the Sentistrength-SE tool for senti-
ment labelling of messages. This tool may general-
ize the meaning of sentences without thoroughly ex-
amining their context or the entirety of the message,
potentially leading to inaccuracies in sentiment classi-
fication. Furthermore, the top ten subpopulations may
not generalize to the broader population.
Addressing these limitations is crucial for ensur-
ing the reliability and accuracy of our findings, par-
ticularly in understanding the nuanced relationships
between developer sentiments, commit activities, and
project dynamics within the ‘httpd’ project.
6 CONCLUSION AND FUTURE
WORK
This study highlights the relationship between de-
veloper sentiment and software component complex-
ity within the Apache ‘httpd’ project over nearly a
decade. Our results indicate that components with
higher dependencies tend to generate more negative
sentiments, suggesting that managing technical is-
sues, such as file dependencies and complexity, is not
just a matter of code but also of maintaining developer
well-being.
The findings suggest that technical complexity
and sentiment are not necessarily interconnected,
although complexity varies across the components.
In addition, the most crucial component attracted more
negative sentiment than the rest. By identifying
components that are prone to sentiment responses,
project managers can implement strategies to mitigate
negative sentiments, such as more frequent code re-
views or targeted support for developers working on
complex tasks.
Future work should explore these dynamics in
other open-source projects to validate the generality
of these results and to investigate how sentiment re-
sponses can be better managed in the context of large
distributed teams. Additionally, incorporating
qualitative approaches, such as developer interviews,
could provide further insights into how
sentiment-related strain influences software
development processes. Finally, the causal relationship
between developers and the software, in both
directions, should be investigated further,
particularly at a granular level such as components.
[Figure 3: heatmap image not reproduced. Components shown: 1:15, 1:6, 1:16, 1:1, 1:3, 1:9, 1:2, 1:13, 1:12, 1:5, 1:14, 1:4, 1:7, 1:10, 1:8, 1:20, 1:17, 1:18, 1:19, 1:11; years: 2015 to 2024; color scale: z-score of the difference between the numbers of positive and negative messages, from −5 to 5; each cell is labeled with a complexity value.]
Figure 3: Heatmaps of the difference between the numbers of positive and negative messages and of the complexity in
twenty components yearly.
REFERENCES
Calefato, F., Lanubile, F., Maiorano, F., and Novielli, N.
(2018). Sentiment polarity detection for software de-
velopment. In Proceedings of the 40th International
Conference on Software Engineering, pages 128–128.
Cataldo, M. and Herbsleb, J. D. (2013). Coordination
breakdowns and their impact on development produc-
tivity and software failures. IEEE Transactions on
Software Engineering, 39(3):343–360.
Chen, Z., Cao, Y., Yao, H., Lu, X., Peng, X., Mei, H., and
Liu, X. (2021). Emoji-powered sentiment and emo-
tion detection from software developers’ communica-
tion data. ACM Transactions on Software Engineering
and Methodology (TOSEM), 30(2):1–48.
Edler, D., Holmgren, A., and Rosvall, M. (2024). The
MapEquation software package. https://mapequation.
org.
El Asri, I., Kerzazi, N., Uddin, G., Khomh, F., and Idrissi,
M. J. (2019). An empirical study of sentiments in
code reviews. Information and Software Technology,
114:37–54.
Fitness, J. (2000). Anger in the workplace: an emo-
tion script approach to anger episodes between work-
ers and their superiors, co-workers and subordinates.
Journal of Organizational Behavior: The Interna-
tional Journal of Industrial, Occupational and Orga-
nizational Psychology and Behavior, 21(2):147–162.
Foerderer, J., Kude, T., Mithas, S., and Heinzl, A. (2016).
How temporal work styles and product modularity in-
fluence software quality and job satisfaction. In Pro-
ceedings of the 2016 ACM SIGMIS Conference on
Computers and People Research, volume 44, pages
105–112. ACM.
Fucci, G., Cassee, N., Zampetti, F., Novielli, N., Sere-
brenik, A., and Di Penta, M. (2021). Waiting around
or job half-done? sentiment in self-admitted technical
debt. In 2021 IEEE/ACM 18th International Confer-
ence on Mining Software Repositories (MSR), pages
403–414. IEEE.
Fukuda, H. and Leger, P. (2015). Why do developers not
take advantage of the progress in modularity? In Pro-
ceedings of the 8th International Conference on Bio-
inspired Information and Communications Technolo-
gies (formerly BIONETICS). ACM.
Garcia, D., Zanetti, M. S., and Schweitzer, F. (2013). The
role of emotions in contributors activity: A case study
on the gentoo community. In 2013 International con-
ference on cloud and green computing, pages 410–
417. IEEE.
Girardi, D., Lanubile, F., Novielli, N., and Serebrenik, A.
(2021). Emotions and perceived productivity of soft-
ware developers at the workplace. IEEE Transactions
on Software Engineering, 48(9):3326–3341.
Gomes, P. J. and Joglekar, N. R. (2008). Linking modu-
larity with problem solving and coordination efforts.
Managerial and Decision Economics, 29(5):443–457.
Graziotin, D., Fagerholm, F., Wang, X., and Abrahamsson,
P. (2018). What happens when software developers
are (un)happy. Jnl of Systems and Software, 140:32–
47.
Guzman, E., Azócar, D., and Li, Y. (2014). Sentiment
analysis of commit comments in github: an empirical
study. In Proceedings of the 11th working conference
on mining software repositories, pages 352–355.
Huq, S. F., Sadiq, A. Z., and Sakib, K. (2020). Is developer
sentiment related to software bugs: An exploratory
study on github commits. In 2020 IEEE 27th Intl Conf
on Software Analysis, Evolution and Reengineering
(SANER), pages 527–531.
Islam, M. R. and Zibran, M. F. (2018). Sentistrength-se:
Exploiting domain specificity for improved sentiment
analysis in software engineering text. Journal of Sys-
tems and Software, 145:125–146.
Jalote, P. and Kamma, D. (2019). Studying task processes
for improving programmer productivity. IEEE Trans-
actions on Software Engineering, 47(4):801–817.
Lau, K.-K. (2006). Software component models. In Pro-
ceedings of the 28th international conference on Soft-
ware engineering, pages 1081–1082.
MacCormack, A., Rusnak, J., and Baldwin, C. Y. (2007).
The impact of component modularity on design evo-
lution: Evidence from the software industry. SSRN
Electronic Journal.
Madampe, K., Hoda, R., and Singh, P. (2020). To-
wards understanding emotional response to require-
ments changes in agile teams. In Proceedings of the
ACM/IEEE 42nd International Conference on Soft-
ware Engineering: New Ideas and Emerging Results,
pages 37–40.
Mockus, A., Fielding, R. T., and Herbsleb, J. (2000). A
case study of open source software development: the
apache server. In Proceedings of the 22nd inter-
national conference on Software engineering, pages
263–272.
Mockus, A., Fielding, R. T., and Herbsleb, J. D. (2002).
Two case studies of open source software develop-
ment: Apache and mozilla. ACM Transactions on
Software Engineering and Methodology (TOSEM),
11(3):309–346.
Murgia, A., Ortu, M., Tourani, P., Adams, B., and Demeyer,
S. (2018). An exploratory qualitative and quantita-
tive analysis of emotions in issue report comments of
open source systems. Empirical Software Engineer-
ing, 23:521–564.
Murgia, A., Tourani, P., Adams, B., and Ortu, M. (2014).
Do developers feel emotions? an exploratory analy-
sis of emotions in software artifacts. In Proceedings
of the 11th working conference on mining software
repositories, pages 262–271.
Murić, G., Abeliuk, A., Lerman, K., and Ferrara, E. (2019).
Collaboration drives individual productivity. Proceed-
ings of the ACM on Human-Computer Interaction,
3(CSCW):1–24.
Ortu, M., Adams, B., Destefanis, G., Tourani, P., Marchesi,
M., and Tonelli, R. (2015). Are bullies more produc-
tive? empirical study of affectiveness vs. issue fixing
time. In 2015 IEEE/ACM 12th Working Conference on
Mining Software Repositories, pages 303–313. IEEE.
Ortu, M., Destefanis, G., Counsell, S., Swift, S., Tonelli, R.,
and Marchesi, M. (2016). Arsonists or firefighters?
affectiveness in agile software development. In Agile
Processes, in Software Engineering, and Extreme Pro-
gramming: 17th International Conference, XP 2016,
Edinburgh, UK, May 24-27, 2016, Proceedings 17,
pages 144–155. Springer International Publishing.
Philipp Ranzhin (2015). I ruin developer’s lives with
my code reviews and i’m sorry. https://habr.com/en/
articles/440736/, Last accessed on 09/2019.
Pletea, D., Vasilescu, B., and Serebrenik, A. (2014). Secu-
rity and emotion: sentiment analysis of security dis-
cussions on github. In Proceedings of the 11th work-
ing conference on mining software repositories, pages
348–351.
Richter, M., Eck, J., Straube, T., Miltner, W. H., and Weiss,
T. (2010). Do words hurt? brain activation during the
processing of pain-related words. Pain, 148(2):198–
205.
Robinson, W. N., Deng, T., and Qi, Z. (2016). Devel-
oper behavior and sentiment from data mining open
source repositories. In 2016 49th Hawaii Interna-
tional Conference on System Sciences (HICSS), pages
3729–3738. IEEE.
Sage Sharp (2015). Closing a door. https://sage.thesharps.
us/2015/10/05/closing-a-door/, Last accessed on
09/2023.
Segalotto, M., Bolzan, W., and Farias, K. (2023). Effects
of modularization on developers’ cognitive effort in
code comprehension tasks: A controlled experiment.
In Proceedings of the XXXVII Brazilian Symposium
on Software Engineering, volume 63, pages 206–215.
ACM.
Spadini, D., Aniche, M., and Bacchelli, A. (2018). Py-
Driller: Python Framework for Mining Software
Repositories.
Sun, H., Ha, W., Teh, P.-L., and Huang, J. (2016). A case
study on implementing modularity in software devel-
opment. Journal of Computer Information Systems,
57(2):130–138.
Sun, H., Ha, W., Xie, M., and Huang, J. (2014). Modu-
larity’s impact on the quality and productivity of em-
bedded software development: a case study in a hong
kong company. Total Quality Management & Busi-
ness Excellence, 26(11-12):1188–1201.
Tourani, P., Jiang, Y., and Adams, B. (2014). Monitoring
sentiment in open source mailing lists: exploratory
study on the apache ecosystem. In Proceedings of 24th
Annual International Conference on Computer Science
and Software Engineering, CASCON ’14, pages
34–44. ACM.
V., M. and C. Palvia, P. (2007). Retention and quality in
open source software projects. Americas Conference
on Information Systems.
Valdez, A., Oktaba, H., Gómez, H., and Vizcaíno, A.
(2020). Sentiment analysis in jira software reposito-
ries. In 2020 8th International Conference in Software
Engineering Research and Innovation (CONISOFT),
pages 254–259. IEEE.
Weide, B. and Gibson, D. S. (1997). Behavioral relation-
ships between software components.
Wu, J., Huang, X., and Wang, B. (2023). Social-technical
network effects in open source software communities:
understanding the impacts of dependency networks on
project success. Information Technology & People,
36(2):895–915.
Yadollahi, A., Shahraki, A. G., and Zaiane, O. R. (2017).
Current state of text sentiment analysis from opinion
to emotion mining. ACM Computing Surveys (CSUR),
50(2):1–33.