developer in an integrated development environment
(IDE) is a widely used technique. At the same time,
each tool implements this method independently and
tunes it to a specific task. In this position paper, we
discuss an approach to a code editing process model and
a set of action capturing tools that could be reused for
solving particular problems like defect prediction,
developer performance analysis, or teaching good
programming practices. Our approach is to represent
different aspects of the software development process as
a hierarchical set of annotated events on a single time
scale.
2 RELATED WORK
Document authoring workflows have been studied
in multiple research areas, from psychology and
human-computer interaction (HCI) to software
engineering. A particular process that is widely
represented in psychology and HCI papers is writing or
modifying documents using text editing software. In these
cases, researchers are most interested in the ways
users interact with a computer system, their mental
models, and the factors that influence the interaction,
like user interface complexity (Card et al., 1980), user
experience (Rosson, 1983), or interruptions
(Burmistrov and Leonova, 2003). The experimental data is
usually collected using video recording or by logging
keystrokes and user commands. The process is described
with one or more quantitative characteristics, e.g., the
number of operations per unit of time.
The most common workflow model used in these
studies is the chronologically ordered sequence of
actions. Burmistrov and Leonova (2003) replaced
atomic operations with start and stop marks of events
so that their model could describe nested and
postponed activities. Polson and Kieras (1985) proposed a
generative model where a set of production rules
describes the user behavior. When the editing context
matches a rule, this rule fires and the model generates
a sequence of higher-level editing commands.
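The idea of start and stop marks can be illustrated with a minimal Python sketch. It is not the implementation of Burmistrov and Leonova (2003); the mark format, the Interval type, and the activity names are assumptions made for this example. Open activities are tracked by name rather than on a strict stack, so interleaved (postponed) activities can be paired as well as nested ones.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    name: str
    start: float
    stop: float
    depth: int  # number of activities already open when this one started


def pair_marks(marks):
    """Turn (time, kind, name) marks into closed intervals.

    kind is "start" or "stop". The mark format is hypothetical.
    """
    open_activities = {}  # name -> (start time, depth)
    intervals = []
    for time, kind, name in marks:
        if kind == "start":
            if name in open_activities:
                raise ValueError(f"{name!r} started twice")
            open_activities[name] = (time, len(open_activities))
        elif kind == "stop":
            if name not in open_activities:
                raise ValueError(f"{name!r} stopped before it started")
            start, depth = open_activities.pop(name)
            intervals.append(Interval(name, start, time, depth))
        else:
            raise ValueError(f"unknown mark kind {kind!r}")
    if open_activities:
        raise ValueError(f"activities never stopped: {sorted(open_activities)}")
    # Report intervals in the order in which the activities began.
    return sorted(intervals, key=lambda interval: interval.start)


# Invented example: a typo fix nested inside a longer editing activity.
marks = [
    (0.0, "start", "edit paragraph"),
    (2.0, "start", "fix typo"),
    (3.5, "stop", "fix typo"),
    (6.0, "stop", "edit paragraph"),
]
intervals = pair_marks(marks)
```

A plain sequence of atomic actions cannot express that the typo fix happened inside the paragraph edit; the depth field makes this nesting explicit.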
In software engineering, human behavior models
are widely used for task automation and functional
testing. Tools like Expect (https://core.tcl.tk/expect/index)
and Selenium (https://www.seleniumhq.org/) allow
developers to describe the interaction process as a
program in a domain-specific or a general-purpose
programming language. When these programs are run,
they interact with the system by analyzing its output
and providing necessary input.
At the same time, the workflows that software
developers themselves are using have been studied to a
much lesser extent. Although there is a lot of
research on project management, individual processes of
writing code are not in focus. Well-known books
on best programming practices (Hunt and Thomas,
1999; McConnell, 2004; Martin, 2008) deal mostly
with software architecture and coding techniques.
The process of programming is to some extent
discussed by Carter and Sangler (1997). The book is
concerned with the mental models and strategies
(“mapping” and “packing”) that programmers use while
working on the code.
The main problem of experimental studies of program
authoring processes is their cognitive nature, which
makes it difficult to trace workflow states and
transitions, as they are hidden from the observer. One way to
overcome this difficulty is to ask programmers to ex-
plicitly describe the actions they perform, including
purely mental tasks (Jeffries et al., 1981; Bennedsen
and Caspersen, 2005). The possible states and transi-
tions may be modeled as Markov processes (Kamma,
2014).
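As an illustration of the Markov view, the following Python sketch estimates first-order transition probabilities from an observed sequence of workflow states. The state names and the sequence are invented for this example and are not taken from Kamma (2014); the estimate is a plain relative-frequency count, which is only one possible way to fit such a model.

```python
from collections import Counter, defaultdict


def estimate_transitions(states):
    """Estimate P(next state | current state) by counting adjacent pairs."""
    counts = defaultdict(Counter)
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    probabilities = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        probabilities[current] = {
            following: n / total for following, n in followers.items()
        }
    return probabilities


# Hypothetical self-reported states of one programming session.
observed = ["read", "edit", "test", "edit", "test", "read", "edit"]
transitions = estimate_transitions(observed)
```

In this invented session, "test" is followed by "edit" once and by "read" once, so both transitions get an estimated probability of 0.5.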
At a higher level, the workflow states may be
linked to the tasks the developer solves. These tasks
and their connections to the project issues (bugs or
features) are usually tracked using a version control
system. Each task results in a commit annotated with
a description of the changes and with optional
references to the issue tracker. Additionally, the source
control repository provides data on the timeline and
history of modifications and on the developers involved.
Commit logs seem to be the main source of
information on programming workflows at present
(Hassan and Holt, 2003; Hassan, 2009; D’Ambros
et al., 2010; Rahman and Devanbu, 2013; Rubin et al.,
2014).
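The kind of data a commit log provides can be sketched as follows. The example assumes a log with one commit per line in the form author|timestamp|subject (such a format could be produced, for instance, with git log --pretty=format:"%an|%aI|%s") and assumes that issues are referenced as #N in the commit message. Both conventions, as well as the sample log, are assumptions made for illustration, and author names that contain "|" are not handled.

```python
import re
from dataclasses import dataclass, field

# Assumed convention: issues are referenced as "#<number>".
ISSUE_REF = re.compile(r"#(\d+)")


@dataclass
class Commit:
    author: str
    timestamp: str
    message: str
    issues: list = field(default_factory=list)


def parse_log(text):
    """Parse 'author|timestamp|subject' lines into Commit records."""
    commits = []
    for line in text.strip().splitlines():
        author, timestamp, message = line.split("|", 2)
        issues = [int(number) for number in ISSUE_REF.findall(message)]
        commits.append(Commit(author, timestamp, message, issues))
    return commits


# Invented sample log, not data from any real repository.
log = """\
alice|2018-03-01T10:15:00+00:00|Fix null check in parser (#12)
bob|2018-03-01T11:40:00+00:00|Refactor tokenizer
alice|2018-03-02T09:05:00+00:00|Add tests for #12 and #15
"""
commits = parse_log(log)
```

Each record links a developer, a point on the timeline, and the referenced issues, but it says nothing about what happened between two commits.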
The main artifact of the software developer’s work
is the source code. Modern IDEs provide rich code
processing features including incremental parsing,
type checking, and control flow graph analysis.
Although these features are often available to plugins,
only a limited number of process logging tools
make use of the source code, mostly for defect
prediction. The set of employed features
is usually limited, e.g., to code fragments that have
been copied and pasted (Kim et al., 2004; Hou et al.,
2009), although other structured features (addition and removal
of code, modification of types or function arguments)
may be worth logging as well (Lehnert, 2011).
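A rough approximation of logging code additions and removals is sketched below using the difflib module from the Python standard library. It compares two snapshots of a source file line by line; it does not parse the code, so changes of types or function arguments would show up only as replaced lines. The snapshots and the event dictionary layout are assumptions made for this example.

```python
import difflib


def classify_edit(before, after):
    """Describe the change between two snapshots as a list of edit events."""
    old_lines = before.splitlines()
    new_lines = after.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    events = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged lines are not logged
        events.append(
            {
                "kind": tag,  # "insert", "delete", or "replace"
                "removed": old_lines[i1:i2],
                "added": new_lines[j1:j2],
            }
        )
    return events


# Invented snapshots: a comment line is added to a function body.
before = "def area(w, h):\n    return w * h\n"
after = "def area(w, h):\n    # width times height\n    return w * h\n"
events = classify_edit(before, after)
```

A tool running inside an IDE could instead compare syntax trees, which the incremental parser already maintains, and report changes in terms of declarations rather than lines.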
As this review shows, the task of modeling
workflows used by software developers, and other
specialists whose operations are hard to observe due to their
mental nature, is far from being solved. Although
analysis tools exist for capturing specific process
features (e.g., tracking code copying or extracting in-