TCC (Tracer-Carrying Code): A Hash-based Pinpointable Traceability

Tool using Copy&Paste

Katsuhiko Gondow

, Yoshitaka Arahori

, Koji Yamamoto

, Masahiro Fukuyori

and Riyuuichi Umekawa

Dept. of Computing Science, Tokyo Institute of Technology, Tokyo, Japan

Fujitsu Laboratories, Kanagawa, Japan

Keywords:

Software Traceability, Traceability Link, Hash Value, Software Maintenance, Copy and Paste.

Abstract:

In software development, it is crucially important to effectively capture, record and maintain traceability links

in a lightweight way. For example, we often would like to know “what documents (rationale) a programmer

referred to, to write this code fragment”, which is supposed to be solved by the traceability links. Most of

previous work are retrospective approaches based on information retrieval techniques, but they are likely to

generate many false positive traceability links; i.e., their accuracy is low. Unlike retrospective approaches, this

paper proposes a novel lightweight prospective approach, which we call TCC (tracer-carrying code). TCC

uses a hash value as a tracer (global ID), widely used in distributed version control systems like Git. TCC

automatically embeds a TCC tracer into source code as a side-effect of users’ copy&paste operation, so users

have no need to explicitly handle tracers (e.g., users have no need to copy&pastes URLs). TCC also caches the

referred original text into Git repository. Thus, users can always view the original text later by simply clicking

the tracer, even after its URL or ﬁle path is changed, or the original text is modiﬁed or removed. To show

the feasibility of our TCC approach, we developed several TCC prototype systems for Emacs editor, Google

Chrome browser, Chrome PDF viewer, and macOS system clipboard. We applied them to the development of

a simple iPhone application, which shows a good result; our TCC is quite effective and useful to establish and

maintain pinpointable traceability links in a lightweight way. Also several important ﬁndings are obtained.

1 INTRODUCTION

In software development, it is crucially important to

effectively capture, record and maintain traceability

links in a lightweight way. For example, we often

would like to know “what documents (rationale) a

programmer referred to, to write this code fragment”,

which is supposed to be solved by the traceability

links. Most of previous work are retrospective ap-

proaches based on information retrieval techniques,

but they are likely to generate many false positive

trace links; i.e., their accuracy is low.

In practice, there are many cases where the pro-

grammers know, with pinpoint accuracy, the source

and target of traceability links for the software arti-

facts being developed. For example, consider the situ-

ation where a programmer found out the URL (e.g., to

Q&A sites like Stack Overﬂow), where a workaround

is described after spending several tens of minutes

to resolve an implementation issue (see Sec. 2.3 and

Sec. 5 for concrete examples). We call this a pin-

pointable traceability link (Sec. 2.2).

Unlike retrospective approaches, this paper pro-

poses a novel lightweight prospective approach,

which we call TCC (tracer-carrying code). The main

aim of TCC is to make it easier to certainly establish

and maintain pinpointable traceability links, focusing

on the practical use. Note that TCC does not aim to

establish all pinpointable traceability links. Instead,

TCC aims to establish non-trivial pinpointable trace-

ability links, which are found out by the programmers

after spending several tens of minutes for example,

and thus are thought important by the programmers.

TCC uses a hash value as a tracer (global ID),

widely used in distributed version control systems

like Git. TCC automatically embeds

a TCC tracer

into source code as a side-effect of users’ copy&paste

operation, so users have no need to explicitly han-

dle tracers (e.g., users have no need to copy&pastes

In the current implementation of TCC, the hash value

is simply written in source code as comments.

Gondow, K., Arahori, Y., Yamamoto, K., Fukuyori, M. and Umekawa, R.

TCC (Tracer-Carrying Code): A Hash-based Pinpointable Traceability Tool using Copy&Paste.

DOI: 10.5220/0006837102210232

In Proceedings of the 13th International Conference on Software Technologies (ICSOFT 2018), pages 221-232

ISBN: 978-989-758-320-9

221

URLs). The reason of using copy&paste in TCC is

because our observation suggests that there is a high

correlation between pinpointable traceability links

and copy&paste operations. That is, a copy&paste

often occurs when a programmer wants to use some

information in the referred document (where to copy)

to write a code fragment (where to paste), for exam-

ple.

TCC also caches the referred original whole text

into Git repository. Thus, users can always view the

original text later by simply clicking the tracer, even

after its URL or ﬁle path is changed, or the original

text is modiﬁed or removed.

Although the idea of TCC is quite simple, TCC

has the following important advantages:

• TCC automatically establishes pinpointable trace-

ability links by hooking into users’ copy&paste

operations. Also TCC alleviates the bothersome

management of URL links, memos, and so on.

Once the user has done the TCC’s copy&paste,

the text stored in a Git repository can be always

viewed by simply clicking the TCC tracer, even

after its URL or ﬁle path is changed, or the orig-

inal text is modiﬁed or removed. Furthermore,

TCC can handle the artifacts with no URL or ﬁle

path from the beginning.

• TCC can check the consistency among artifacts

by comparing the hash values (see pair-hash in

Sec. 4.1.2). Also TCC can cope with ﬁne grained

traceability links (see part-hash in Sec. 4.1.3).

• TCC is language-independent, since TCC can be

applied to any text-based ﬁles. Furthermore TCC

can be applied to some binary ﬁles if copy&paste

is available (e.g., PDF ﬁles).

• The Git repositories for other projects can be

safely merged into the Git repositories used by

TCC since the possibility of the hash collisions

is low enough in practice.

• Unlike retrospective approaches, TCC distin-

guishes different versions of the same artifact in-

cluding many common words, since TCC is hash-

based; different versions have different hashes in

practical use.

• The TCC tracer is just a text including hash val-

ues, so even in editors or environments not sup-

ported by TCC, the contents of artifacts can be

retrieved by manually manipulating the Git repos-

itory with the hash values.

• The idea of TCC is very simple, so the TCC im-

plementation is likely to be very small. Actually

the LOC of TCC for Emacs is about 1,000 lines in

Emacs Lisp. (In spite of this fact, the TCC imple-

mentation is not trivial, and even not feasible in

some cases. See Sec. 4.4).

The main contributions of this research are as fol-

lows:

• We have proposed TCC as a novel lightweight

traceability method which automatically embeds

a hash value as the tracer to the copy&pasted text

(Sec. 4).

• We have provided the ﬁrst prototype implemen-

tation of TCC for Emacs, Google Chrome, PDF

viewer and macOS clipboard (Sec.4.3). The full

source code of TCC is publicly available at the

TCC homepage (TCC homepage).

• Our preliminary experiment shows that TCC is

an effective and useful traceability tool with some

important ﬁndings (Sec. 5).

The rest of this paper is organized as follows.

Sec. 2 explains the software traceability and our target

pinpointable traceability links. Motivating examples

of pinpointable traceability links are also presented.

Sec. 3 gives related work. Sec. 4 describes the idea,

design and implementation of TCC. Sec. 5 describes

the preliminary experiment of applying TCC to a sim-

ple iPhone application development. Several import

ﬁndings are also described. Finally Sec. 6 gives the

conclusion.

2 BACKGROUND

2.1 Software Traceability

Although there are various deﬁnitions of software

traceability (Cleland-Huang, et al., 2012) (Cleland-

Huang, et al., 2014), we deﬁne software traceability

as follows:

Software traceability is a property that we can

trace, in both directions, what parts of one

software artifact (e.g., code) reﬂect (e.g. sat-

isfy, implement, etc.) a part of another soft-

ware artifact (e.g., requirement).

Suppose there are two software artifacts A and B.

Also suppose there are a relationship between A and

B like: “B satisﬁes A”, “A is the rationale of B” and

we can reach from A to B, and vice versa. Then, we

ICSOFT 2018 - 13th International Conference on Software Technologies

222

say “a software traceability between A and B is estab-

lished”. A link between A and B is called “traceability

link”.

Software systems are developed to satisfy their re-

quirements, related standards and laws, and so on, but

the requirements are often modiﬁed. So we often need

to modify the corresponding code, too. Conversely,

when a bug is found, we need to refer to the corre-

sponding requirements to correct the code. Therefore

traceability links are indispensable in software devel-

opment.

2.2 Pinpointable Traceability Links

Unfortunately, it is often quite difﬁcult to effectively

and efﬁciently establish traceability links. Upstream

requirements are often ambiguous, vague and con-

ﬂicting (Cleland-Huang, et al., 2012), and their gran-

ularity tends to be coarse, so it is difﬁcult to con-

nect them to the corresponding downstream code

fragments. A non-functional requirement (e.g., se-

curity arrangement) often comes with cross-cutting

concerns, which potentially have complicated one-to-

many (or many-to-many) traceability links, and it is

quite difﬁcult to establish them in reality.

From this observation, in this paper, we limit our

scope to pinpointable traceability links, which can

pinpoint the parts of source and target of the re-

lated artifacts (see also Sec. 1). In other words, pin-

pointable traceability links are links with one-to-one

relationship between the ﬁne-grained parts of text-

based artifacts. At a glance, pinpointable links are

easy to cope with, but actually not. The reasons are as

follows:

• As the size of software systems and their SDK

platforms are rapidly increasing, there are so

many pinpointable links among them. These APIs

and documents are often updated, and their URLs

are often modiﬁed. Thus, the management of pin-

pointable traceability links is a bothersome and er-

ror prone task.

• Even though traceability links are limited to text-

based ones, there are many kinds of media like

plain texts, PDF, HTML/XML documents, MS of-

ﬁce documents, etc., which are stored in various

tools (e.g., version control systems, issue track-

ing systems, Email clients, SNS) and IDEs (e.g.,

Eclipse, Xcode).

• It often takes much time (e.g., several tens of min-

utes) for programmers to ﬁnd out pinpointable

traceability links, since there are several cases

where the API is used intentionally in a non-

intuitive way (See Sec. 2.3 and Sec. 5 for exam-

void signal_handler (int sig) {

:::::::

fprintf must not be used here (see C99)

::::

write (fd, buf, n_buf);

}

Figure 1: write is used instead of fprintf in a UNIX sig-

nal handler.

ple). One major reason of the difﬁculty in recov-

ering traceability links stems from the fact that it

is really difﬁcult to precisely know the program-

mer’s intentions, for example, when the program-

mer used not straightforward or not well-known

ways.

2.3 Motivating Examples of

Pinpointable Traceability Links

2.3.1 Motivating Example #1: fprintf Cannot

Be used in a UNIX Signal Handler

As shown in Fig. 1, in a UNIX signal handler, the

system call write should be used to output some data,

and the library function fprintf must not be used to

do so. The reason of this is because Sec. 7.1.4.4 of

the C standard (C99, 1999) says as follows:

The functions in the standard library are not

guaranteed to be reentrant and may modify

objects with static storage duration. Thus, a

signal handler cannot, in general, call standard

library functions.

This description comes from the fact that data

races can occur if the C standard I/O library functions

are used in a UNIX signal handler as they (including

fprintf) are not reentrant, for example, in the buffer-

ing process (Tahara, et al., 2008).

A traceability link from this code fragment in

Fig. 1 to the above description in the C standard is

obviously pinpointable one. Note that it is quite dif-

ﬁcult for programmers to ﬁnd out this pinpointable

traceability link, if they do not know the possibility of

data race occurrence.

Also note that it is quite difﬁcult to establish this

pinpointable link by the existing retrospective meth-

ods using the similarity between the words in the text,

since the words “fprintf“ and “signal handler” fre-

quently appear in the other places of the C standards.

On the other hand, once we discovered this fact,

we can establish the traceability link with pinpoint ac-

curacy (thus pinpointable), since the target of the link

is only one small fragment of the document (i.e., Sec.

7.1.4.4 of the C standard (C99, 1999)).

TCC (Tracer-Carrying Code): A Hash-based Pinpointable Traceability Tool using Copy&Paste

223

centering

chrome.runtime.onMessageExternal.addListener (

function (request, sender, sendResponse) {

chrome.tabs.query ({active: true, currentWindow: true},

function (tabs) { chrome.tabs.sendMessage (tabs[0].id, type: "TCC",

function (res) { sendResponse (res); return true; });

return true; });

::::::

return

:::::

true; // this line cannot be omitted.

}

);

Figure 2: return true; is required to send a response asynchronously.

2.3.2 Motivating Example #2

The code fragment in Fig. 2 is a part of our TCC im-

plementation TCC for PDF viewer (Sec. 4.3). In Fig. 2,

using chrome.runtime.onMessageExternal, a

callback function is registered to communicate be-

tween two Google Chrome extensions. The problem

here is that the ﬁnal response should be sent back

by calling sendResponse, but was not, since we

accidentally omitted the last return true; by

mistake. This happened because it is not intuitive

for us to tell the Chrome that we wish to send a re-

sponse asynchronously by using this return true;,

although this is clearly written in the Google Chrome

manual (Google Chrome Manual) as follows.

This function becomes invalid when the event

listener returns, unless you return true from

the event listener to indicate you wish to send

a response asynchronously (this will keep the

message channel open to the other end until

sendResponse is called).

The link between the manual of

chrome.runtime.onMessageExternal and the

code fragment in Fig. 2 is another example of

pinpointable traceability link. If the link is lost, it

becomes difﬁcult to know the rationale of why the

last return true; cannot be omitted.

3 RELATED WORK

There are many papers on the techniques for estab-

lishing traceability links, but, to our knowledge, they

do not use the idea of our TCC, i.e., automatically em-

bedding a TCC tracer into artifacts as a side-effect of

users’ copy&paste operation.

3.1 Retrospective Traceability, Feature

Location

There are many retrospective approaches, which try

to automatically recover traceability links by calculat-

ing the similarity between the words in the artifacts,

using information retrieval techniques like Latent Se-

mantic Indexing (LSI) (Marcus, et al., 2003)(Asun-

cion, et al., 2010)(Dekhtyar, et al., 2007)(Delater

and Paech, 2013)(Hayes, et al., 2006)(Gethers, et al.,

2011). There are many feature location techniques

proposed, which try to identify the source code

fragment that implements some functionality (re-

quirement), thus they are a kind of retrospective

traceability approaches speciﬁc to features (Rubin

and Chechik, 2013)(Dit, et al., 2011)(Dit, et al.,

2013)(Ishio, et al., 2013).

Unfortunately, all of them produce only candi-

dates of traceability links by similar word search,

so these approaches tend to lack the accuracy and

produce many false positives. Especially, they can-

not distinguish different versions of the same artifact

(e.g., source code, API document), while our TCC

can distinguish them since different versions have dif-

ferent hash values in practical use.

3.2 Prospective Traceability

There are several prospective approaches, which

try to (semi-)automatically capture traceability links

by monitoring users’ operations (Neumuller and

Grunbacher, 2006)(Asuncion, et al., 2007)(Pinheiro

and Goguen, 1996)(Pohl, 1996)(Medvidovic, et al.,

2003)(Kersten and Murphy, 2005)(Altintas, et al.,

2006)(Asuncion and Taylor, 2009). Unfortunately,

similarly to retrospective approaches, they also

suffer from the lack of accuracy. Furthermore,

some approaches require users to describe some

rules in advance (Pinheiro and Goguen, 1996)(Pohl,

1996)(Asuncion and Taylor, 2009). Thus, the exist-

ing prospective approaches are not lightweight.

ICSOFT 2018 - 13th International Conference on Software Technologies

224

Our TCC is a prospective approach in a broad

sense, but TCC’s accuracy is high and does not re-

quire rules in advance (i.e., TCC is lightweight),

since TCC embeds tracers as a side-effect of users’

copy&paste operation.

3.3 Software Concordance

The Software Concordance (Nguyen and Munson,

2003) integrates both of source code and documents

into XML formats, and connect them using hyper-

text links as traceability links. Software concor-

dance also provides uniform editing and versioning

for Java. This approach seems highly language-

dependent, which may make it difﬁcult to extend the

Software Concordance to other languages.

Unlike the Software Concordance, TCC is

language-independent, and TCC can be in principle

applied to all text-based artifacts, since TCC just em-

beds a hash value to the original copy&pasted text.

3.4 Documentation Tools, Cross

Referencer Tools, IDEs

Documentation tools (e.g., Javadoc, Doxygen) gener-

ate API documents using the user-speciﬁed annota-

tions, which helps users to keep the API documents

consistent, but they do not support traceability links.

(Users have to manually specify the URL in the anno-

tations.)

Cross reference tools (e.g., LXR, GNU Global,

Cxref) automatically generate links between the def-

inition and reference of identiﬁers, which helps users

to understand the source code. The supported links

are limited to identiﬁers (e.g., names of ﬁles, classes,

methods, functions and variables), so they cannot sup-

port general traceability links.

Recent IDEs (e.g., Eclipse, Xcode) navigate users

to the corresponding API manuals by clicking the API

function names in the source code editors. Like cross

reference tools, the supported links are limited to API

function names.

3.5 Literate Programming

In literate programming (Knuth, 1984), source code

and the corresponding document are written in the

same ﬁle, and a dedicated tool of literate program-

ming generates the traditional source code and docu-

ments separately. This approach helps users to keep

source code and its document consistent, but trace-

ability links need to be manually described, when the

document exists outside of the source code (e.g., stan-

dards and laws) or the document is too large to put

them into the same ﬁle.

3.6 Software Watermarking, Software

Birthmarking

There are software watermarking/birthmarking tech-

niques (Jalil, 2009)(Myles and Collberg, 2004),

whose primary goal is to prevent software piracy. In

watermarking, some invisible data is embedded to

programs purposely, while a birthmark is calculated

using some characteristics of the programs.

Both techniques detect software piracy by com-

paring some characteristic values of two artifacts A

and B. Thus, it is a prerequisite that you already have

A and B. (If not, a brute force search is possible, but

impractical.)

On the other hand, in software traceabilities, we

need to reach from the artifact A to B when we

have only A, and B is unknown yet. Thus soft-

ware watermarking/birthmarking are not suitable for

software traceabilities. Furthermore, software wa-

termarking/birthmarking are typically used for the

whole software (coarse granularity), while our TCC

is used for the small fragments of artifacts (ﬁne gran-

ularity).

4 TCC: TRACER-CARRYING

CODE

This section presents the idea, design and implemen-

tation of our TCC (Tracer-Carrying Code). We pro-

pose a novel traceability method called TCC, whose

primary aim is to establish pinpointable traceability

links in an effective and lightweight way.

4.1 Ideas of TCC

4.1.1 Idea #1: TCC Embeds a Hash Value as

Tracer in a Copy&Pasted Text

Consider the situation where a user copy&pastes

some text (e.g., “output "X"” in Fig. 3) from one

ﬁle A to another ﬁle B, with the intention of estab-

lishing a traceability link from B to A. Then, as a side

effect of the copy&paste, TCC automatically embeds

the hash value (0x1234 in Fig. 3) for the whole con-

tent of A to the copy&pasted text

. In our current

implementation, we use SHA-1 (20-bytes length, i.e.,

40 characters in hexadecimal) as a hashing method.

TCC also embeds other information like URL if exists.

See Table. 1 in Sec. 4.2 for the detail.

TCC (Tracer-Carrying Code): A Hash-based Pinpointable Traceability Tool using Copy&Paste

225

The pasted hash values might make the source

code less readable. To avoid this problem, TCC pro-

vides a mechanism to hidden the hash values in the

editors like outlining mechanism.

Also TCC registers the whole content of the ﬁle

A to a Git repository, so the whole content of A can

be retrieved by using the hash value as a key. Thus,

the user can view the content of A by accessing the

Git repository at any time, even after the original ﬁle

content is modiﬁed or removed, or the URL or ﬁle

path is modiﬁed. To make this retrieval easier, TCC

provides a mechanism to do this by simply clicking

the hash value in B’s editor.

Figure 3: TCC’s idea #1.

4.1.2 Idea #2: TCC Detects Inconsistency using

Pair Hash

If the user desires, TCC embeds the pair hash of A

and B (e.g., hash(A+B) in Fig. 4), which is the hash

value for the string-concatenated text of A and B. If

the original ﬁle A or B (on the Internet or ﬁle system,

not in the Git repository) is changed, TCC detects the

change by comparing two pair hashes of the previ-

ously recorded one and the latest one.

Here this pair hash mechanism assumes that the

user always sets up the pair hash after he or she modi-

ﬁes the ﬁle A and/or B into a consistent state between

them. This assumption is required since there is no

other way for TCC to know that A and B get consis-

tent. The pair hash is a kind of declaration by the user

to show the user believes that A and B are consistent.

Note that the pair hash cannot prevent malicious

falsiﬁcation, since any user can set up the pair hash

again after modiﬁcation. In this case, TCC cannot

detect the inconsistency, although the previous con-

sistent versions of A and B can be reverted manually

later from the Git repository.

Figure 4: TCC’s idea #2.

4.1.3 Idea #3: Fine-grained Traceability using

Part Hash

As is often the case, the user wants to establish a

traceability link not to the whole content of the ﬁle,

but to the speciﬁc part of it. In this case, the hash

value for the whole content might produce a false pos-

itive, since only adding one character at the beginning

of the ﬁle implies the change of the whole ﬁle, even

though the speciﬁc part that the user is interested in re-

mains unchanged. Thus, the granularity of the whole

ﬁle is sometimes too coarse.

To solve this problem by enabling ﬁne-grained

granularity, TCC embeds the part hash of the spe-

ciﬁc part that the user speciﬁes (e.g., hash(a part A) in

Fig. 5). The part hash for the ﬁle A (reference target)

is straightforwardly computed for the selected region

in copy&paste, while the region needs to be addition-

ally speciﬁed by the user to compute the part hash for

the ﬁle B (reference source).

Figure 5: TCC’s idea #3.

4.2 TCC’s More Concrete Usage

Examples

This section shows TCC’s more concrete usage ex-

amples using the screenshots of the Emacs version

of our TCC implementations, called TCC for Emacs

ICSOFT 2018 - 13th International Conference on Software Technologies

226

Figure 6: TCC’s copy operation in the ﬁle A.txt.

Figure 7: The copied text with a hash code is pasted into

B.c.

(See Sec. 4.3 for other TCC implementations). TCC

for Emacs is implemented as an Emacs minor-mode,

so TCC can be enabled in editing of any kind of text-

based ﬁles, which means TCC for Emacs is language-

independent.

Figure 8: The content of A.txt is shown with the previ-

ously copied region highlighted (in a different buffer from

A.txt).

Figure 9: TCC tracers are made invisible.

1. TCC copy operation

Consider the situation, like Sec. 4.1, where a user

copy&pastes the text “output "X"” from the ﬁle

A.txt to B.c. First, in the ﬁle A.txt, the user

does the copy operation (Fig. 6). As a side ef-

fect, as described in Sec. 4.1.1, TCC appends the

hash value of A.txt to the copied text. Also TCC

registers A.txt to a Git repository

. Note here

We use Git in our implementation, but other hash-based

that we deliberately designed TCC’s copy oper-

ation to be different from the normal one in the

Emacs or the underlying OS, since TCC’s copy

operation silently modiﬁes the copied text, and

the modiﬁcation of the copied text unintended by

the user is undesirable

. Because of this, TCC for

Emacs does not use the normal copy&paste area

in Emacs (primary selection). Instead, TCC for

Emacs uses the secondary selection as the TCC’s

copy&paste area, which most users do not use.

2. TCC paste operation

After the copy operation, the user pastes the

copied text with a hash value to B.c (Fig. 7). Un-

like TCC’s copy operation, this paste is a nor-

mal one. The underlined text appended by TCC

for Emacs (called TCC tracer) in Fig. 7 is click-

able; when clicked, the content of A.txt will be

shown in the editor with the previously copied re-

gion highlighted (Fig. 8) by retrieving it from the

Git repository. Note that the buffer window is dif-

ferent from the original one of A.txt, thus this

feature always works even after A.txt is removed

or moved in the editor, ﬁle system, or Web server.

As shown in Fig. 7, the TCC tracer is represented

in JSON format for easy processing; the keys in

TCC tracer and their meaning are listed in Ta-

ble. 1. @TCC is just a preamble to show that TCC’s

hash value follows. The TCC tracer in Fig. 7 in-

cludes the hash value of A.txt (ref-hash), the

part hash value of A.txt (ref-part-hash), the

byte offset positions of the copied area in A.txt

(ref-pos), the ﬁle path of A.txt (ref-url), and

the time stamp when copied (timestamp).

The TCC tracer in Fig. 7 is 7-lines long; it makes

the source code difﬁcult to read. To alleviate this,

TCC for Emacs allows the user to make the hash

value invisible (Fig. 9) by a simple key operation

(Ctrl-c t i by default).

3. TCC inconsistency checking

TCC detects the inconsistency between source

and target ﬁles by comparing the hash values in

the TCC tracer. To show this, consider the situa-

tion where the user modiﬁed the ﬁrst line of A.txt

(Fig. 10), but the user forgot to modify the corre-

sponding part of B.c, which results in an incon-

sistent state.

Then TCC compares two hash values between

distributed version control systems can be used alterna-

tively. TCC uses the Git repository at /.tcc by default, but

the existing Git repositories in other projects can also be

used, since no interference occurs there.

Actually, in most Web browsers, modifying the clip-

board in JavaScript is not allowed for security reasons.

TCC (Tracer-Carrying Code): A Hash-based Pinpointable Traceability Tool using Copy&Paste

227

Table 1: Keys in TCC tracer.

Key Description

ref-hash Hash value for the whole con-

tent of target ﬁle

ref-part-hash Hash value for the part content

of target ﬁle

ref-pos Pair of byte offsets of the copied

region in target ﬁle

ref-url URL or ﬁle path of target ﬁle (if

exists)

timestamp Time stamp when this TCC

tracer was appended

my-hash Hash value for the whole con-

tent of source ﬁle

my-part-hash Hash value for the part content

of source ﬁle

my-pos Pair of byte offsets of the speci-

ﬁed region by the user in source

ﬁle

pair-hash Hash value for the whole con-

tent of both of target and source

ﬁle

pair-part-hash Hash value for the part contents

of both of target and source ﬁle

pair-pos Two pairs of byte offsets for the

regions in target and source ﬁle

the previously recorded one and the latest one,

and highlights them in red color if they differ

(Fig. 11). Currently, this is invoked by the user’s

key operation (Ctrl-c t c by default), but it

can be automated, using periodic polling, with-

out the user’s operation. Note that in Fig. 11,

Figure 10: The user modiﬁes the target ﬁle A.txt at the ﬁrst

line.

Figure 11: TCC detects the inconsistency between A.txt

and B.c.

pair-hash and ref-hash are highlighted in red,

but ref-part-hash not since the copied text

Output "X" in A.txt is not modiﬁed. This

demonstrates the idea of TCC’s part hash works

well in this example. Also note that a naive re-

computation of the latest part hash using byte off-

sets ref-pos does not work, since the text Output

"X" is shifted forward by the text insertion at the

ﬁrst line, and thus the hash values calculated by

ref-pos differ from the previously recorded one,

even if the text Output "X" is not modiﬁed. So

our current implementation uses a diff tool (git

diff); TCC reports the inconsistency if the origi-

nal copied text is included in the diff.

4.3 TCC Implementations

Currently, we have implemented the following TCC

prototypes, using 20-bytes length SHA-1 as a hash

method, and Git as a hash-based distributed version

control system:

• TCC for Emacs

• TCC for Google Chrome

• TCC for PDF viewer

• TCC for macOS clipboard

TCC for Emacs supports all the TCC operations

described in Sec. 4.2. Also TCC for Emacs supports

the paste operation by drag&drop, i.e., the TCC tracer

is inserted by dragging a ﬁle icon and then dropping

it on the Emacs editor.

The prototype implementations other than TCC for

Emacs only support the TCC copy operation, since

• The Web pages and PDF ﬁles on the Internet are

usually not editable

• Editing local PDF ﬁles has usually no meaning,

because the change will not be reﬂected to the

original PDF ﬁles on the Internet.

• The API of macOS clipboard provides no way to

edit the ﬁle where the copy operation is done.

TCC for Google Chrome is implemented as a

Chrome extension. For just the same reason that TCC

for Emacs uses a secondary selection (Sec. 4.2), TCC

for Google Chrome does not hook the normal copy

command (e.g., Ctrl-C). Instead, the TCC copy op-

eration is invoked by selecting the context menu Copy

with TCC tracer (Fig. 12) to make the user aware

that the copied text is modiﬁed by adding the TCC

tracer.

TCC for PDF viewer is implemented by modifying

PDF.js

, which is a Chrome extension. The interface

This is the main reason why our current TCC only sup-

ports one way links, not bidirectional ones.

https://mozilla.github.io/pdf.js/

ICSOFT 2018 - 13th International Conference on Software Technologies

228

of the TCC copy operation in TCC for PDF viewer is

the same as TCC for Google Chrome (Fig. 13).

TCC for macOS clipboard is implemented as a

“macOS service”. The macOS services enable the

user to use the features of another application, pass-

ing some data (e.g., the selected text) from one ap-

plication. Thus TCC for macOS clipboard works

for all the macOS applications that supports the ma-

cOS services. TCC for mac OS clipboard can be in-

voked by the service menu (Fig. 14), the context menu

(Fig. 12 and Fig. 13) or by a system keyboard short-

cut (Ctrl-command-C in our setting). Due to the lim-

itations of the clipboard API, TCC for macOS clip-

board only registers the copied text to the Git repos-

itory, not the whole content of the selected ﬁle (see

also Sec. 4.4).

Figure 12: TCC copy operation in TCC for Google Chrome.

Figure 13: TCC copy operation in TCC for PDF viewer.

Figure 14: TCC copy operation in TCC for macOS clip-

board.

4.4 The Difﬁculties in Implementing

TCC

The idea of TCC is quite simple. Unfortunately, how-

ever, this does not imply that the TCC implementation

is trivial and straightforward. Actually we encoun-

tered the following difﬁculties when we developed the

TCC prototypes.

• The current TCC for macOS clipboard cannot

compute the hash value for the whole content of

the ﬁle (ref-hash), since the clipboard API of

macOS provides no way to access to the copied

ﬁle. So the current TCC for macOS clipboard

only computes the hash value for the copied part

(ref-part-hash). This shows the feasibility of

implementing TCC highly depends on the under-

lying API.

Also the macOS service of TCC for macOS clip-

board works well for most macOS applications,

but there are some exceptions (e.g., Microsoft Ex-

cel for macOS).

• There is no simple way to execute a UNIX shell

command from a Chrome extension mainly due to

security reasons. To work around this problem in

TCC for Google Chrome and TCC for PDF viewer,

we needed to implement a simple Web server

(called xhr_git_register.pl)

, which receives

an HTML ﬁle as a POST method, executes Git

commands to register the HTML ﬁle to the Git

repository, and returns a SHA-1 hash value.

• There is no way to obtain the byte offset positions

of the copied area (ref-pos) in the Chrome. The

reasons are twofold.

– The API window.getSelection() returns a

selection object. This selection object includes

relative offsets inside the selected DOM nodes,

but it does not include the byte offsets from the

beginning of the HTML ﬁle.

– The HTML ﬁle obtained through the API

document.documentElement.outerHTML

and the original HTML ﬁle differs in the lexical

level, where spaces, newlines and indentation

are changed. Furthermore, the HTML ﬁle is

normalized according to the XHTML standard,

where for example <li>aaa is changed to

<li>aaa</li>. This means that the byte

offsets are useless even if available.

• The hash value for a ﬁle cannot be embedded in

the same ﬁle correctly, in the sense that the hash

Until 2013, NPAPI plugins were available, which al-

lows us to call native binary code from JavaScript, but not

available now.

TCC (Tracer-Carrying Code): A Hash-based Pinpointable Traceability Tool using Copy&Paste

229

void signal_handler (int sig)

{

// fprintf must not be used here (see C99)

// C99 says: a signal handler cannot,

// in general, call (omitted) @TCC...

write (fd, buf, n_buf);

}

Figure 15: TCC link is successfully established in Exam-

ple. 1.

value embedded in the ﬁle is not the same as the

hash value recomputed for the ﬁle, since the em-

bedded hash value alters the hash value for the

ﬁle.

To work around this problem, (e.g. my-hash (Ta-

ble. 1) is the case), when TCC computes and

checks the hash value, TCC removes the existing

hash value from the ﬁle in advance.

5 PRELIMINARY EXPERIMENT

As a ﬁrst preliminary experiment, we applied TCC

to our two motivating examples (Sec. 2.3); for both

cases, TCC worked well. Fig. 15 shows that the TCC

link is successfully established in the ﬁrst example

(Fig. 1)

, where the TCC link itself is made invisible

as ....

We also applied TCC to our own simple iPhone

application development. The application is a simple

pedometer (Fig. 16), and its source code is about 600

lines in Swift3. As the result, we found that TCC is an

effective and useful traceability tool in the following

ﬁndings:

Figure 16: Screenshot of our pedometer application using

TCC.

• TCC is useful to record the implementation ratio-

nale.

The ﬁgure using TCC for the second example (Fig. 2)

is omitted to save space here.

For example, we would like to output the notiﬁ-

cation in the application (e.g., “You have reached

10,000 steps!”) even when the application is run-

ning in the background. But the motion-related

API of iOS SDK (CoreMotion) is not allowed to

run in the background. However, other famous

pedometer applications like Runtastic deﬁnitely

do this even in the background.

After a half hour struggling on the Internet, we

found out some techniques in combination with

other services that are allowed to run in the back-

ground (e.g., GPS, music) (Apple Programming

Guide)(Apple Ref. Core)(Apple Ref. Back-

Ground)(stack overﬂow). TCC made it easier for

us to record these URLs as a rationale (i.e., the

reason why the GPS service is used, which is un-

necessary at a glance) using the TCC tracer in the

source code.

This TCC tracer is useful since the TCC tracer

saves time to search the rationale again, and also

gives us a sense of relief that we can reach the

contents of URLs at any time, even after they are

updated, moved or removed (the API manuals are

frequently updated).

• TCC is useful to record unresolved issues.

For example, it was unclear how we can increment

an iOS badge on the application icon, which is the

number of unread notiﬁcations in the application.

At the time we developed, we can setup the iOS

badges by incrementing the variable UILocal.

applicationIconBadgeNumber, but the manual

says that the API is deprecated, so we should not

have used it.

After all, we could not ﬁnd the appropriate way in

the manual (perhaps, the manual was not updated

for the latest API), so we could not help but use

the above deprecated API. We described this issue

in the Emacs buffer, and established a traceabil-

ity link to this by copy&pasted from the Emacs

buffer. This shows TCC is useful for unresolved

issues by recording the hash value for anonymous

ﬁles and stores them to the Git repository.

A typical conventional way to show some unre-

solved issue is to write a comment like FIXME:

this API is obsolete. Using TCC, we can

write more detailed long description and embed

the TCC link to the description.

• We did not need the traceability links to most

of the normal API manuals we referred to, since

modern IDEs like Eclipse and Xcode provide

a way to easily view the corresponding manual

from the API function name. Instead, TCC is ef-

fective to record the rationale of corner cases.

ICSOFT 2018 - 13th International Conference on Software Technologies

230

6 CONCLUSION

This paper proposes a novel lightweight prospective

approach to establish and maintain traceability links,

which we call TCC (tracer-carrying code). TCC uses

a hash value as a tracer (global ID), and TCC automat-

ically embeds a TCC tracer into source code as a side-

effect of users’ copy&paste operation. To show the

feasibility of our TCC approach, we developed sev-

eral TCC prototype systems. We applied them to the

development of a simple iPhone application, which

shows a good result. As a future work, we would like

to implement other prototypes, for example, for IDEs

like Eclipse and Xcode, and for ofﬁce software like

MS Excel and Powerpoint.

REFERENCES

J. Cleland-Huang, O. Gotel and A. Zisman (Eds.): Software

and Systems Traceability, ISBN: 978-1-4471-5819-6,

Springer, 2012.

J. Cleland-Huang, O. Gotel, J. H. Hayes, P. Mader and

A. Zisman: Software traceability: trends and future

directions, Proc. on Future of Software Engineering

(FOSE 2014). pp.55–69, ACM, 2014.

Programming languages–C: ISO/IEC 9899:1999.

T. Tahara, K. Gondow, S. Ohsuga: DRACULA: Detec-

tor of Data Races in Signals Handlers, 15th Asia-

Paciﬁc Software Engineering Conf. (APSEC), pp.17–

24, 2008.

A. Marcus and J. I. Maletic: Recovering documentation-

to- source-code traceability links using latent semantic

indexing, ICSE’03, pp.125–135, 2003.

H. U. Asuncion, A. U. Asuncion and R. N. Taylor: Software

traceability with topic modeling, ICSE’10, pp.95–

104, 2010.

A. Dekhtyar, J. H. Hayes, S. Sundaram, A. Holbrook and

O. Dekhtyar: Technique integration for requirements

assessment, 15th IEEE Int. Requirements Engineering

Conf. (RE 2007), pp.141–150, 2007.

A. Delater and B. Paech: Analyzing the tracing of re-

quirements and source code during software develop-

ment, a research preview, 19th Int. Working Conf. on

Requirements Engineering: Foundation for Software

Quality (REFSQ 2013), pp.308–314, 2013.

J. H. Hayes, A. Dekhtyar and S. K. Sundaram: Advanc-

ing candidate link generation for requirements tracing:

the study of methods, IEEE Transactions on Software

Engineering, 32(1), pp.4–19, 2006.

M. Gethers, R. Oliveto, D. Poshyvanyk and A. De Lucia:

On integrating orthogonal information retrieval meth-

ods to improve traceability recovery, 27th IEEE Int.

Conf. on Software Maintenance (ICSM), pp.25–30,

2011.

J. Rubin and M. Chechik: A survey of feature location tech-

niques, in the book of Domain engineering: product

lines, languages, and conceptual models, ISBN-10:

3642366538, Springer, pp.29–58, 2013.

B. Dit, M. Revelle, M. Gethers, D. Poshyvanyk: Feature lo-

cation in source code: a taxonomy and survey, Journal

of software maintenance and evolution: research and

practice, 25(1), pp.53–95, 2011

B. Dit, M. Revelle and D. Poshyvanyk: Integrating in-

formation retrieval, execution and link analysis algo-

rithms to improve feature location in software. Empir-

ical Software Engineering, 18(2), pp.277–309, 2013.

T. Ishio, S. Hayashi, H. Kazato and T. Oshima: On the ef-

fectiveness of accuracy of automated feature location

technique, 20th Working Conf. on Reverse Engineer-

ing (WCRE), pp.381–390, 2013.

C. Neumuller and P. Grunbacher: Automating Software

Traceability in Very Small Companies: A Case Study

and Lessons Learne, 21st IEEE/ACM Int. Conf.

on Automated Software Engineering (ASE), pp.145–

156, 2006.

H. U. Asuncion, F. François, and R. N. Taylor: An end-

to-end industrial software traceability tool, Proc. 6th

joint meeting of the European Software Engineer-

ing Conf. and the ACM SIGSOFT Sympo. on The

Foundations of Software Engineering (ESEC-FSE),

pp.115–124, 2007.

F. A. C. Pinheiro and J. A. Goguen: An Object-Oriented

Tool for Tracing Requirements. IEEE Softw. 13(2),

pp.52–64, 1996.

K. Pohl: PRO-ART: enabling requirements pre-traceability,

Proc. 2nd Int. Conf. on Requirements Engineering,

pp.76–84, 1996.

N. Medvidovic, P. Grunbacher, A. Egyed and B. W. Boehm:

Bridging models across the software lifecycle, Jour-

nal of Systems and Software, 68(3), pp.199â

A¸S-215,

2003.

M. Kersten and G. C. Murphy: Mylar: a degree-of-interest

model for IDEs: Proc. 4th Int. Conf. on Aspect-

Oriented Software Development (AOSD) pp.159–

168, 2005.

I. Altintas, O. Barney, E. Jaeger-Frank: Provenance collec-

tion support in the kepler scientiﬁc workﬂow system,

Int. Provenance and Annotation Workshop, pp.118–

132, 2006.

H. U. Asuncion and R. N. Taylor: Capturing custom link

semantics among heterogeneous artifacts and tools,

Proc. 2009 ICSE Workshop on Traceability in Emerg-

ing Forms of Software Engineering (TEFSE ’09),

pp.1–5, 2009.

T. N. Nguyen and E. V. Munson: The software concor-

dance: a new software document management envi-

ronment, SIGDOC’03: Proc. 21st Annual Int. Conf.

on Documentation, pp.198–205, 2003.

D. E. Knuth: Literate programming, Comput. J., 27(2),

pp.97–111, 1984.

Jalil, Z.: A Review of Digital Watermarking Techniques for

Text Documents. Int. Conf. Information and Multime-

dia Technology, ICIMT 2009, pp.230â

A¸S-234, 2009.

G. Myles and C. S. Collberg:Detecting software theft via

whole program path birthmarks, Proc. 7th Int. Conf.

on Information Security, vol. 3225 of LNCS, pages

404–415. Springer, 2004.

App Programming Guide for iOS: Background Execu-

tion, https://developer.apple. com/library/content/

TCC (Tracer-Carrying Code): A Hash-based Pinpointable Traceability Tool using Copy&Paste

231

documentation/iPhone/Conceptual/iPhoneOSProgram

mingGuide/BackgroundExecution/ BackgroundExe-

cution.html

Apple Reference: Core Location, https://developer. ap-

ple.com/ reference/corelocation/

Apple Reference: allowsBackgroundLocationUp-

dates, https://developer.apple.com/reference/

corelocation/cllocationmanager/1620568-

allowsbackgroundlocationupdates

stack overﬂow: Pedometer in the Background, http://stack

overﬂow.com/questions/17785325/ pedometer-in-the-

background/

Google Chrome Manual: chrome.runtime#onMessage Ex-

ternal, https://developer.chrome.com/extensions/ run-

time #event-onMessageExternal

TCC homepage: http://www.sde.cs.titech.ac.jp/tcc/

ICSOFT 2018 - 13th International Conference on Software Technologies

232