Empirical Study of Ad Hoc Collaborative Activities in
Software Engineering
Sébastien Cherry
1
,
Pierre N. Robillard
1
1
Software Engineering Research Laboratory, Computer Engineering Department,
École Polytechnique de Montréal, C.P. 6079. succ. Centre-Ville, Montréal, Canada
Abstract. This paper presents empirical research on ad hoc collaborative activi-
ties found in an industrial software engineering setting. We believe that a better
understanding of these activities and their content will help us to propose soft-
ware development process enhancements and also provide some insight into the
tools needed to support communications in a distributed software development
environment. Further details of our motivations are included, followed by a dis-
cussion on our research methodology, and, finally, some results of a prelimi-
nary analysis confirming the significance of our data and the importance of the
observed phenomenon.
1 Introduction
It is well supported in the literature that some problems encountered in software de-
velopment are not attributable to technical factors, but rather to the human aspects of
software engineering [2], [4], [5], [7], [11], [13], [14], [15]. While some aspects, such
as “communication” [7], [15], “coordination” [4], [5] and “collaboration” [2], [13],
are gaining recognition in the research community, some methodological challenges
emerge. Human factors, for example, have been overlooked in the past for many
reasons, but principally because of the difficulty in measuring these facets quantita-
tively [11]. Nevertheless, empirical research in software engineering is growing in
popularity and beginning to be adapted to studying this new topic of interest, namely
people, and methods and techniques are being borrowed which were formerly used in
the human sciences such as psychology and sociology. Like many researchers, we
think that this domain will offer research opportunities for years to come.
This paper presents in-progress empirical research in the context of a case study in
the industry, and explores collaborative work in software engineering; specifically,
the ad hoc collaborative activities that take place during the software development
process. By ad hoc collaborative activities, we mean activities which are not formally
prescribed, and which occur between two or more developers working on a specific
project task and which happen informally and spontaneously. They can take many
forms, such as in peer-to-peer conversations, electronic mail exchanges, and so on.
Details of our motivations are to be found in the next section of this paper, fol-
lowed by a discussion of the research methodology used, including data collection
methods, and the need to place greater emphasis on the analysis techniques that will
Cherry S. and N. Robillard P. (2004).
Empirical Study of Ad Hoc Collaborative Activities in Software Engineering.
In Proceedings of the 1st International Workshop on Computer Supported Activity Coordination, pages 116-125
DOI: 10.5220/0002682701160125
Copyright
c
SciTePress
be used to explore the large quantity of amassed data. Finally, some results of a pre-
liminary analysis are revealed, which confirm both the relevance of the collected data
and the importance of the observed phenomenon.
2 Motivations
2.1 Why collaborative work?
As previously mentioned, a growing number of researchers support the view that
many of the problems that arise during software development could be imputable to
human factors associated with the software engineering process. Perry, Staudenmayer
and Votta (1994) [11], among others, believe that too much attention is given to the
technological aspects of software engineering. They say that one of the reasons fre-
quently mentioned is the difficulty of measuring these human factors quantitatively.
The same comments are also supported by Seaman (1999) [16].
Many approaches have been envisaged to study the human aspects of software en-
gineering. Some researchers have examined the communication occurring during
software development [7], [15], while others have studied the coordination aspect [4],
[5] and still others were interested in the collaborative work [2], [13].
With regard to collaborative work, Robillard and Robillard (2000) [13] have em-
pirically identified four types of collaborative activities performed during the soft-
ware engineering process. They defined “ad hoc” collaborative activities, for exam-
ple, as the work carried out simultaneously by teammates on a particular task of the
project and which is not prescribed by a formal process. One of the conclusions of
this research was that ad hoc collaborative activities can play a major role in team
communication dynamics, accounting for 41% of this dynamics during the case
study. Furthermore, these activities constitute the longest of the working sessions, and
in addition seem to have an important impact on individual activities, since they often
precede long individual working sessions.
Also, Perry, Staudenmayer and Votta (1994) [11] found during another case study
that informal communications take up an average of 75 minutes per day per software
developer. Seaman (1996) [15] also supports the view that this type of communica-
tion is a non-negligible element to be taken into account during a development proc-
ess, and which is essential if developers are to carry out their tasks adequately.
Because they monopolize quite a considerable part of a software project and con-
stitute an important element of it, as established above, exploratory research is essen-
tial to understanding the content of these ad hoc collaborative activities and the com-
munication that ensues, and to measuring the impact, both positive and negative, on
the rest of the development process. We believe that such research will help us to
subsequently propose software engineering process enhancements which will be
better adapted to the human and empirical realities of software development.
By contrast, collaborative software development, also known as “distributed soft-
ware development”, is an increasingly fashionable domain of research these days.
117
Both these expressions refer to software development distributed over time and
over relatively long distances, something that has become quite common business
practice nowadays in cases where it occurs.
However, according to recent research in this domain [5], the distances between
the members of virtual teams tends to obstruct informal communications, resulting in
problems of coordination. This is another important reason for undertaking research
in this field. It will also provide some insight into the tools needed to support com-
munications in a distributed software development environment.
2.2 Why an empirical study?
Empirical research based on the experimental method has been conducted for a long
time now, in many of the human sciences, such as psychology and sociology. It is,
moreover, very often considered to be the only valid scientific method accepted in
these domains. Although empirical research has been conducted in software engineer-
ing for many years, it is on a much smaller scale and only quite recently has it seen an
increase in popularity. One reason for this is the growing interest in the human as-
pects of software engineering [16].
Further arguments supporting this new tendency, and strengthening the evidence
for it, have been expressed by Tichy (1998) [17] in his paper, “Should Computer
Scientists Experiment More?” However, those who uphold this practice in software
engineering believe that, since the quantity of empirical research is on the increase, its
quality should increase as well [10], [18].
3 Research Protocol
3.1 Problem Statement
Research Objectives. As discussed previously, the importance and the necessity for
ad hoc collaborative work and the communications that ensue in software
development are widely supported by many authors [2], [4], [5], [6], [7], [11], [13],
[15]. Although some research has quantified the importance of the phenomenon,
there has been no known attempt to determine and describe the content of that work.
These considerations led us to define the following research objectives:
To observe the collaborative work taking place in a case study in the industry
to design a conceptual model and distinguish some patterns of exchanges.
To characterize the ad hoc collaborative activities found, the communications
that ensue, and to identify and describe their content.
To generate a series of hypotheses which emerges from the results of this re-
search, and which could later be validated by confirmatory research.
118
Theoretical Relevance of the Research. The fact that there has been little or no
empirical study of ad hoc collaborative activities gives this research theoretical rele-
vance. It will make it possible to establish a model of these activities and to gain a
sense of the cognitive aspects involved from which we will be able to generate a
series of hypotheses to create a theoretical base in this domain.
Practical Relevance of the Research. Based on the fact that good collaboration is an
indispensable condition of a software development team working effectively to make
a quality product which meets the needs of the user, in the time required and at the
expected cost, this research is relevant in practice because it will potentially pave the
way to the proposal of improvements to software engineering processes. Also, as
previously mentioned, it will allow us to better understand the informal collaboration
and communication aspects of software engineering, and provide some insight into
the tools needed to support communications in a distributed software development
environment.
3.2 Research Methodology
General Approach. The research is carried out by means of participant observation
within the framework of a case study in an industrial environment. This type of
approach is suitable in our case because this is exploratory research. Also, as
Jorgensen (1989) [8] and Babbie (2001) [1] have stressed, study in the field combined
with participant observation are appropriate when it is not a question of empirically
verifying hypotheses formulated in advance, but rather of inductively generating
theories from observations and from the empirical data collected.
Target-setting. The setting in which the chosen software development team works is
a large enterprise which produces software for commercial purposes. It is a well-
established, mature concern which has been in operation for several years, and where
there exists a clearly defined development process. Nevertheless, even if the
observations are done in a large company, this last also contains some attributes of
smaller organizations since the development of software components is divided into
small teams.
Also, based on a common-sense judgment (face validity) [1], we can say that the
chosen team of eight individuals is representative of the majority of development
teams, with a wide range of ages, amounts of schooling, years of experience in soft-
ware development and length of service in the company.
Data Collection. The following data collection methods were identified from a
preliminary ethnography period within the chosen team which lasted several months.
An initial data collection phase, which took place in the autumn of 2003, is now
completed and was spread out over 8 weeks. The results presented in this paper were
produced from these earliest data. The purpose of this collection was to gather the
119
maximum amount of information from the beginning to the end of the development
of an update (patch) of a given version of the software produced.
The data collected during this first phase includes:
185 hours of audio-video recordings of working sessions over 37 workdays
The capture of a total of 2496 e-mails exchanged by the 8 teammates
A daily backup of the source code and other documents and artifacts found
E-mails were captured automatically, by means of triggers defined in the messag-
ing software used in the company. This capture included both e-mails received and
those sent by teammates, in order to permit cross-validation.
The daily backup of the source code, and the various documents and artifacts, was
available for potential use for subsequent content analysis.
Data Analysis. One of the techniques that will be used for the analysis is Exploratory
Sequential Data Analysis (ESDA) [3]. This technique is suited to exploratory
research, where the objective is to find answers to research questions or to find
patterns among the empirical data and to describe them using, for example, simple
statistical representations.
ESDA allows researchers to define, from these descriptions, hypotheses which are
subsequently verified by means of confirmatory research using statistical inference
methods. However, the important feature of ESDA is that it applies more specifically
to research where the sequential integrity of the data must be preserved.
Of the eight operations proposed by ESDA, encoding is certainly the most impor-
tant. This involves labeling each sequence of data by means of a code formed using a
particular syntax and contained in an exhaustive, exclusive and relatively restricted
category list, and doing so to decrease the variability of the data, as well as to facili-
tate its subsequent manipulation. This encoding makes it possible to transform quali-
tative data into quantitative data, on which it is then possible to perform statistical
analysis [3], [16].
The ESDA process, such as proposed by Fisher and Sanderson (1996) [3], is an it-
erative one, involving the definition of a series of concepts stemming from research
questions of interest. The process will subsequently drive what should be observed
among the collected raw data and what manipulations should be made to obtain de-
rived data on which it is possible to generate theories or define hypotheses. It is itera-
tive because it is often necessary to revisit certain steps; for example, to add, remove
or redefine concepts or categories that are sometimes found intuitively and validated
by their statistical representations.
Research Validity. To satisfy the validity criterion for the research, the empirical
measure must faithfully translate the empirical reality of the measured phenomenon
[1]. To enhance the validity of our research, particular attention is directed to the
definition of the concepts or categories chosen to encode the data. These concept
definitions, which must arise from the ESDA traditions that concern us [3], as well as
from the context of the research field, ensure a degree of representativeness of the
phenomenon under study by common-sense validity (face validity) [1], [8].
The concepts or categories under which the data will be encoded, as well as the
number of categories chosen, will also be very important as far as content validity is
120
concerned [1]. This aspect of validity refers rather to the coverage of the meanings
encompassed by the concepts. Furthermore, the validity of the connections or the
relations (construct validity) [1] should be assured among the concepts forming the
theoretical model appearing from the data. This can be done by means of certain
correlation measures or statistical associations.
Finally, a data triangulation will be made between qualitative and quantitative data,
as well as of data resulting from various sources [16], [18]. Concerning this last point,
other phases of data collection are to be anticipated.
4 Preliminary Results
The results presented in this section are the product of a preliminary analysis concern-
ing four of the eight developers in the team who were observed over a period of 8
hours. The choice of these individuals was not made by means of a sampling method,
but from direct observations in the field: they had been identified as being likely to
work more collaboratively than the others. This choice is justified because the objec-
tive of this research is not to find a magic number indicating the time spent on ad hoc
collaborative work, but rather to investigate the content of these collaborative activi-
ties. It should also be noted that the results below do not take into account e-mail
interactions, but only the peer-to-peer conversations and telephone exchanges.
As can be seen from Figure 1, 51% of the time is spent on ad hoc collaborative
work, as against 49% for the other types of activities. This result tends to corroborate
the observations made in the field that suggested the importance of the phenomenon.
51%
49%
Ad hoc collaborative activities
Other types of activities
Fig. 1. Distribution of time spent on ad hoc collaborative activities in comparison with other
types of activities
Figure 2 indicates the percentage of time spent on ad hoc collaborative activities
by the subjects observed. As was noted in the field, subjects MS2 and MS3 seem to
have spent a large amount of their time collaborating and communicating in a sponta-
neous way with their colleagues. This can be explained by the nature of the work
performed by these subjects. MS2 occupies the position of project manager in the
team, and one of his functions is to circulate relevant information needed by the de-
velopers on his team. When we examine more closely the interactions in which MS2
is involved, we note that, for 78.13% of the time, his colleagues initiate the interac-
tions. We can suggest hypothetically that MS2 constitutes a source of the information
his colleagues require. However, it was noted in the field that a great deal of the in-
formation passed on by MS2 to his teammates is in the form of e-mails. It would be
interesting to investigate this method of communication. MS3 is, for his part, respon-
sible for the infrastructure of the software built, and often the individual consulted to
121
resolve problems. He manages several tasks at the same time, which brings him into
communication with some of his colleagues more often.
47
55
54
48
42 44 46 48 50 52 54 56
Percentage of time spent on ad hoc activities
collaboratives
MS1
MS2
MS3
MS4
Subjects
Fig. 2. Percentage of ad hoc collaborative activities per subject
By contrast, the average duration of the interactions analyzed by the four develop-
ers is 6:31 minutes, and these interactions involve, on average, 2.3 stakeholders. We
should remember that an interaction is defined as a communicative unit which pre-
sents an evident internal continuity, while it breaks with what precedes it and what
follows it [11]. Moreover, these results were based on a total of 82 observed interac-
tions.
Figures 3 and 4 give an initial outline of a distribution as a percentage with regard
to the number of occurrences and time spent on the various categories of ad hoc col-
laborative activities identified. “Cognitive synchronization” exists when two or more
developers exchange information to ensure that they share the same knowledge or the
same representation of the object in question. “Problem resolution” occurs when two
or more developers are aware of the existence of a problem and attempt by different
means to solve the problem or to mitigate it. “Development” occurs when two or
more developers contribute to the development of a new feature or component of the
software. “Management” is the result of two or more developers coordinating and
planning activities such as meetings, common working sessions or setting schedules.
“Conflict resolution” is the process of two or several developers taking part in discus-
sions to resolve a difference of opinion. Ad hoc collaborative activities under the “not
relevant” category group together all interactions which do not concern the project or
the software built.
122
12.63%
8.42%
11.58%
0%
14.74%
52.63%
Cognitive synchronization
Problem resolution
Development
Management
Conflict resolution
Not relevant
Fig. 3. Distribution in number of occurrences of ad hoc collaborative activities identified
24.50%
8.24%
7.37%
3.32%
0%
56.57%
Cognitive synchronization
Problem resolution
Development
Management
Conflict resolution
Not relevant
Fig. 4. Distribution in terms of time spent on ad hoc collaborative activities identified
As Figure 3 suggests, 52.63% of the ad hoc collaborative activities that arose are
forms of cognitive synchronization. This agrees with the direct observations made in
the field. The figure is not surprising when we consider that the exchange of informa-
tion and knowledge constitutes an essential element in software development, which
is to crystallize [16] all the information required in quality software to meet the needs
of the user. As shown in Figure 4, cognitive synchronization occupies 56.57% of the
time spent on ad hoc collaborative work, which also supports the previous finding.
The other significant category, problem resolution, does not seem important in
Figure 3 in terms of number of occurrences. Figure 4, however, suggests that this
activity monopolizes almost a quarter of the time spent on ad hoc collaborative work
by the four subjects observed. It demonstrates that perhaps, while problem resolution
activities are relatively few in number, when they do arise, they monopolize a rather
considerable amount of time. An analysis of the mean time spent by sequence as a
function of ad hoc collaborative activity tends to confirm this, showing that problem
resolution takes up to 9:48 minutes when it occurs, as opposed to the interaction av-
erage of 6:31 minutes.
The results relative to management activities also reveal an interesting finding.
Unlike problem resolution activities, they are relatively numerous considering the
fairly short time that they occupy. This may tend to confirm the theories of certain
authors [7], who maintain that informal communications are necessary in order that
the members of a team can coordinate their activities effectively.
123
5 Conclusion
It is clear, and widely supported, that good collaboration and communication are an
essential condition of the successful delivery of a quality product by a software de-
velopment team, one that meets the user’s needs in a timely fashion and at the ex-
pected cost.
It was revealed by previous research that ad hoc collaborative activities and infor-
mal communications occupy a considerable portion of the time that a developer
spends on a software project. However, no research has attempted to describe the
content of these activities, which leaves a vast field open for exploration.
The empirical research described in the present paper suggests the importance of
investigating this field, because the authors believe that understanding how people
collaborate will make it possible to propose practices to enhance collaboration and
communication within a development team, as well as improve software development
processes.
This article has briefly presented the methodology used to meet our research objec-
tives. It was influenced by previous empirical research which was also aimed at in-
vestigating the human aspects of software engineering, but which was, however,
adapted to the context of the field on which this study focuses.
The embryonic results that were partially presented in this paper, and that arise
from a tiny portion of the considerable quantity of data that was collected, are very
interesting. Although more analysis is needed, a model of data and patterns already
seems to be emerging which will allow us to subsequently form hypotheses which
can be validated by other, confirmatory research, thereby forging a knowledge base in
this, as yet unknown, domain of software engineering.
6 Acknowledgments
This research would not have been possible without the agreement of the company in
which it was conducted, and without the generous participation and patience of the
software development team members from which the data was collected. To all these
people, we extend our grateful thanks.
References
1. Babbie, E.: The Practice of Social Research, 9th ed., Wadsworth Publishing Company,
Belmont, CA (2001)
2. D’Astous, P., Robillard, P.N.: Les aspects de l’échange d’information dans un processus de
génie logiciel, Rapport interne, École Polytechnique de Montréal, EPM/RT-97-06 (1997)
3. Fisher, C., Sanderson, P.: Exploratory Sequential Data Analysis: Exploring Continuous
Observational Data, Interactions, Vol.3, No. 2, Mar. (1996)
4. Grinter, R.E., Herbsleb, J.D., Perry, D.E.: The geography of coordination: Dealing with
distance in R&D work, GROUP'99: International Conference on Supporting Group Work,
Coordination and Negotiation, (1999) 306-315
124
5. Herbsleb, J.D., Grinter, R.E.: Splitting the Organization and Integrating the Code: Con-
way’s Law Revisited, Proceedings, International Conference on Software Engineering, Los
Angeles, CA (1999) 85-95
6. Herbsleb, J.D., Moitra, D.: Guest Editors' introduction: Global software development.
IEEE Software, Vol.18 No.2, March/April (2001) 16-20
7. Herbsleb, J.D., Mockus, A.: An empirical study of speed and communication in globally
distributed software development, IEEE Transactions on Software Engineering, Vol.29,
No.6, June (2003) 481-494
8. Jorgensen, D.L.: Participant Observation: A Methodology for Human Studies, Applied
Social Research Methods Series; v.15, Sage Publications, Newbury Park, CA (1989)
9. Kerbrat-Orrecchioni, C.: Les Interactions Verbales, Armand Colin, Paris (1998)
10. Kitchenham B. et al., Preliminary guidelines for empirical research in software engineer-
ing, IEEE Transactions on Software Engineering, Vol.28, No.8. (2002) 721–734
11. Perry, D.E., Staudenmayer N., Votta, L.G.: People, Organizations, and Process Improve-
ment, IEEE Software, July (1994)
12. Robillard, P.N.: Études des aspects cognitifs applicables au génie logiciel, Rapport interne,
École Polytechnique de Montréal, EPM/RT-96/18 (1996)
13. Robillard, P. N., Robillard, M.P.: Types of Collaborative Work in Software Engineering,
The Journal of System and Software, No.53 (2000) 219-224
14. Robillard, P. N., Kruchten, P., D’Astous, P.: Software Engineering Process with the
UPEDU, Addison Wesley, Pearson Education (2002)
15. Seaman, C.: Organizational Issues in Software Development: An Empirical Study of
Communication, PhD Thesis, Computer Science Department, University of Maryland,
Technical Report CS-TR-3726, UMIACS Technical Report UMIACS-TR-96-94 (1996)
16. Seaman, C.: Qualitative methods in empirical studies of software engineering, IEEE Trans-
actions on Software Engineering, Vol.25, No.4 (1999) 557–572
17. Tichy, W.: Should computer scientists experiment more? Computer, Vol.31, No.5 (1998)
32–40
18. Walker, R.J., Briand, L.C., Notkin, D., Seaman, C.B., Tichy, W.F.: Panel: Empirical Vali-
dation–What, Why, When, and How, Proceedings, International Conference on Software
Engineering, Portland, Oregon (2003) 721-722
125