A SEMI-AUTOMATED PROCESS FOR OPEN SOURCE CODE REUSE

Apostolos Kritikos, George Kakarontzas and Ioannis Stamelos

Computer Science Department, Aristotle University of Thessaloniki, 54124 Thessaloniki, Thessaloniki, Greece

Keywords: Reuse, Free Libre / Open Source Software (FLOSS), Reuse process, Software components.

Abstract: It is clear that Free Libre / Open Source Software (FLOSS) has been demonstrating increasing importance

continually for some years now. As a result, millions of lines of code are becoming available online. In

many cases, this code is carefully designed, implemented and tested, and therefore represents a very good

option for reusability. Lately, more and more companies, especially Small and Medium Enterprises (SMEs),

are reusing open source code to develop their own software. Source code forges such as SourceForge,

Google Code etc., serve as component pools providing plenty of alternatives. In this work we are proposing

a semi-automated reuse process model for discovering open source code online, based on the requirements

of the system under design. This model illustrates the greedy approach of a reuse engineer, who wishes to

reuse as much code as he can and implement the least possible.

1 INTRODUCTION

Code reuse is not a new phenomenon. Both software

companies and individual developers know that

there are certain blocks of code which form classic

components in most of the commercial software

projects. Moreover, there is the case where code that

has been developed for specific requirements serves

as a base for a similar project that a future client

requests. We usually refer to this kind of code as

legacy code.

The wide adoption of FLOSS brought collaborative software development to the surface. In addition, the code of this software was made freely available online, allowing everyone to see, alter and, in many cases, even commercialize the derived work. This new development culture led to millions of free lines of code that transformed the WWW into a huge pool of reusable code, which has lately been organized into large code repositories known as forges (SourceForge, Google Code, etc.).

This paper is an experience report that tries to capture the specific, discrete steps the reuse engineer takes in order to reuse as much source code as possible. We then attempt to organize these steps in a semi-automated reuse process model. In this work we use the term ‘reuse engineer’ to identify the role that attempts to reuse code by adapting either the code to the system under development, or the system under development to the retrieved code, or both. The reuse engineer can be any developer, especially in contexts where a systematic reuse program is absent, which is very often the case with SMEs, or an actual engineer who has been assigned the task of retrieving and adapting reusable components in a more systematic reuse approach.

* This work is partially funded by the European Commission in the context of the OPEN-SME “Open-Source Software Reuse Services for SMEs” project, under the grant agreement no. FP7-SME-2008-2 / 243768

The rest of the paper is organized as follows. In

Sec. 2 we propose a place for our model within the

software product's life cycle. Sec. 3 describes case

studies and the observations arising from them, which led

to the process model. Sec. 4 provides a detailed

description of our process model. Sec. 5 discusses

related work. Finally the last section summarizes our

conclusions and provides speculations for further

research.

Kritikos A., Kakarontzas G. and Stamelos I. (2010).

A SEMI-AUTOMATED PROCESS FOR OPEN SOURCE CODE REUSE.

In Proceedings of the Fifth International Conference on Evaluation of Novel Approaches to Software Engineering, pages 179-185

DOI: 10.5220/0002999401790185

 SciTePress

2 THE REUSE PROCESS INSIDE

THE SOFTWARE PRODUCT’S

LIFECYCLE

Software, like any other type of product, has its

lifecycle. In (ISO/IEC 15288, 2002) the phases of a

product’s lifecycle are defined as follows: (1)

Concept (2) Development (3) Production (4)

Utilization (5) Retirement.

Although the nature of the lifecycle of a software

product might consist of slightly different phases, it

is obvious that any attempt for code reuse will take

place in the activity of software construction or the

activities of extension / customization of the

software product. These activities take place during

the phases of development, production and

utilization in the aforementioned product’s lifecycle

scheme.

In order for a software product to be able to

benefit from code reuse, its initial description needs

to be decomposed into small, simple, stand-alone

requirements. Given that such a pre-process was

made, each of the aforementioned requirements

could represent a component to be implemented, or

found from another source, and be reused after

possible adaptation.

The component-based approach, as described in (Crnkovic et al., 2006), is based on code reuse in the sense that existing components are combined in order to form the desired software. As far as the product’s lifecycle in this approach is concerned, (Crnkovic et al., 2006) propose a variation of the Waterfall model, called the Component-based Waterfall model. This modified waterfall model follows the same phases as the classical one, which are: (1) Requirements (2) Design (3) Implementation (4) Verification (5) Maintenance. The only difference is that in each one of these phases we work with components.

Both the abstract product lifecycle model and the

Component-based Waterfall Software product

lifecycle pinpoint the fact that code reuse, as a

process, fits in the phases of code implementation or

maintenance, where source code is being produced.

3 CASE STUDIES: SOFTWARE

DEVELOPMENT BASED ON

CODE REUSE

Reuse engineering is based on covering the requirements of the software product we are about to implement piece by piece. In order to work this way we need to define the notion of a “piece of software”. Most reusable code exists in open source software repositories. Additionally, it is common practice for open source software developers to organize their code in components, bigger or smaller.

Figure 1: System under development decomposed in

components (component tree).

With this in mind we can now go back to

requirements and organize them into possible

components following an approach similar to the

one depicted in figure 1.

Initially, we consider each requirement as a separate component. Then, depending on how complicated a function each of these components encapsulates, we either decompose it into simpler components, dividing its functionality into trivial pieces, or leave it as it is.

Eventually we come up with a tree structure whose root node is the software product itself and whose leaves are the components that need to be implemented in order to implement the software product as a whole.
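The component tree described here can be sketched as a small data structure. The following is a minimal illustration only; the class name `Component`, the method `leaves` and the example requirements are hypothetical and are not part of the process model itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    """A node of the component tree; all names here are illustrative."""
    name: str
    children: List["Component"] = field(default_factory=list)

    def leaves(self) -> List["Component"]:
        """Return the components that must actually be found or implemented."""
        if not self.children:
            return [self]
        found: List["Component"] = []
        for child in self.children:
            found.extend(child.leaves())
        return found


# Hypothetical system: the root is the product, requirements become children,
# and a complicated requirement is decomposed further.
system = Component("System Under Development", [
    Component("User management", [Component("Login"), Component("Registration")]),
    Component("Reporting"),
])

leaf_names = [c.name for c in system.leaves()]
print(leaf_names)
```

The leaves of the tree are the units the reuse engineer would search for; inner nodes only record how the requirements were decomposed.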

Once we have this set of components at hand, we can start searching for their implementations in reusable code repositories. During this component “safari” we might face one of the following situations:

• The component we seek exists: In this case all

we have to do is customize and integrate this

component with the rest of our work.

• The component we seek does not exist, but subsets of it do: In this case we might need to go back to our component tree and extend it by breaking the component we are currently dealing with into simpler ones.

• The component we seek does not exist and dividing it into simpler ones seems more time consuming than actually implementing it: In this case there is no other option but implementing

the component from scratch. Given the fact that

ENASE 2010 - International Conference on Evaluation of Novel Approaches to Software Engineering

in this case the component is usually trivial to implement, we can always refer to development forums or online courses to “reuse” trivial snippets of code (for example reading from or writing to a file, creating a Java Comparator, etc.). Usually a Google search does the trick.

In order to test the effectiveness of code reuse in

action we experimented with two different scenarios

which we describe here.

Searching for a Log-in Component. The objective is to find a reusable component that implements web-based login functionality in the form of a Java Bean.

The code reuse process is the following:

1. We are going for a quick solution (therefore we

use Google) and search for “login java bean”.

2. The second result is entitled “Authenticating

users using a Java Bean”. It looks promising.

3. We find source code available and a good

documentation of what we are trying to do with

this specific component in the web page we

visited. Also personal information about the

author informs us that he is a researcher and a

developer in a software company.

4. There is no sign of copyright but there is no sign

that the code is under any kind of open source

license either.

5. Most probably, after a personal request to the

author we will be able to reuse it.

During this process:

• We needed approximately 30” (seconds) to perform the search

• We needed approximately 5’ (minutes) to take a first look at the integrity of the site

In total: 5.5’ (minutes).

WordNet Handler – Double Code Reuse to Surpass Library Conflict: We have developed a Java class that serves as a handler for WordNet, the lexical database for the English language. In this implementation we use the Java WordNet Library (JWNL) as a means to connect to and handle WordNet. We want to integrate our work into the larger software product we are currently working on.

A major conflict with the JWNL library, while trying to deploy our work as a Java OSGi bundle, forces us to use another Java-compatible WordNet library. Consequently, the handler’s code will need to be rewritten as well.

We have identified the Java WordNet Interface (JWI) as our new library candidate. We would also like to find reusable code to create a new WordNet handler class. While searching the documentation of the JWI library, at the official site of the library, we come across a sample class that implements most of the desired functionality. Instead of adapting the newly discovered reusable code, we decide to try an experiment: to use the code we discovered to adapt our first handler implementation. The fact that WordNet provides specific data makes all library implementations similar and, as expected, their APIs too. Combining this insight with the reusable code we have in hand, we come up with a new Java WordNet Handler in less than an hour. More specifically we needed:

• 10’ to search for reusable code for JWI (the new

library)

• 5’ to become familiar with this code

• 30’ to alter the old handler in order to use JWI

• 10’ to test functionality

In total: 55’ < 1 hour

For consistency we mention that our initial implementation was also a product of code reuse. The corresponding time needed for our initial handler to reach its final version was:

• 1 ½ hours to search for reusable code

• 4 hours to customize and adapt the reusable code to the general needs of the project

• 2 hours to test our final code

In total: 7.5 hours.
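The totals reported for the case studies can be checked with a few lines of arithmetic. The sketch below only restates the figures given above, converted to minutes; the variable names and activity labels are ours.

```python
# Effort figures reported in the case studies, converted to minutes.
login_search = {"search": 0.5, "first look at the site": 5}
jwi_handler = {"search": 10, "familiarization": 5, "adapt old handler": 30, "test": 10}
jwnl_handler = {"search": 90, "customize and adapt": 240, "test": 120}

login_total = sum(login_search.values())   # log-in component search
jwi_total = sum(jwi_handler.values())      # second handler (JWI)
jwnl_total = sum(jwnl_handler.values())    # initial handler (JWNL)

print(login_total, jwi_total, jwnl_total / 60)
```

Under these figures the second handler took 55 of 450 minutes, roughly 12% of the effort spent on the initial one.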

The observations made during the above case studies, and others similar to them, led us to the reuse process that we describe next.

4 A SEMI-AUTOMATED OPEN

SOURCE SOFTWARE REUSE

PROCESS

In this section we try to organize the knowledge derived from the case studies of the previous section into a model. Based on the aforementioned observations we propose an open source software reuse process model (see figure 2). Although it might seem a bit intricate at first glance, once explained it becomes simple to understand and follow.

We start by defining the software product that needs to be developed (from now on we will refer to it as the System Under Development). It can be considered as a single component. Therefore, it may already be available in reusable code repositories. The reuse engineer performs a search of the source code forges. If the search is successful, one or more results are returned. The reuse engineer then proceeds to code adaptation, packs the derived work and the software product is ready to be handed over to the customer. At this point one might notice that no specific methodology for choosing the best component (in case our search returns more than one) is being proposed. While this is true, it is not an omission. In this work we choose to introduce our model in a basic form, revealing its core functionality. Component evaluation was intentionally left as an open issue for future research.

Figure 2: Open Source software reuse process model.

Once the reuse engineer has eliminated the possibility of finding reusable code for all the functionality he needs, he moves on to decomposing the System Under Development into components. This is the point where he, most of the time unintentionally, starts creating the tree of components we described in the previous section.

There is a small possibility that the System Under Development is too simple to be decomposed into discrete components. In this case our model proposes that it should be developed from scratch. Another possible scenario is that decomposing the System Under Development and searching for the derived components would require more time than developing the project from scratch. Once the decision to develop from scratch is made, the System Under Development is implemented, packed and ready to be handed over to the customer.

Most of the time, however, the requirements can be translated into discrete components. In this case, the reuse engineer must start searching for these components, one by one. In our process model this part of the development procedure is highlighted by the decision rhombus entitled “UNIMPLEMENTED COMPONENTS?”. Its role is twofold. On the one hand it starts the loop of trying to find reusable code for the unimplemented components. On the other hand it is the condition that ends the loop, and in essence the whole process, as it keeps track of whether there are any components left unimplemented. When no more functionality needs implementation, the System Under Development is considered finished, is packed and is ready to be handed over to the customer.

For every unimplemented component a sub

process starts in order to decide whether reusable

code can be found to implement the functionality

needed or the source code of this component must be

written from scratch.

As we mentioned in section 3, when breaking components into simpler ones we run the risk of getting lost in the procedure and eventually spending more time finding reusable code for a component than it would actually have taken us to implement it. In order to avoid this kind of pitfall, our process model forces the reuse engineer to consider, for each one of the components, whether it is really worth implementing using reusable code. There are two possible scenarios where searching for reusable code should be discouraged:

• The component needed is very specific; therefore a lot of time might be spent searching, with a high probability of not finding any results.

• The implementation of the component is trivial;

therefore it will take less time for an

experienced programmer to implement it, than

for the reuse engineer to spot reusable code that

will cover its requirement.
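One way to make this decision explicit is to compare rough effort estimates. The sketch below is an assumption of ours, not a rule defined by the process model: the function `worth_searching`, its four parameters and the expected-cost formula are hypothetical, and all inputs are the reuse engineer's own estimates.

```python
def worth_searching(search_minutes: float,
                    adapt_minutes: float,
                    implement_minutes: float,
                    hit_probability: float) -> bool:
    """Compare the expected cost of searching with implementing from scratch.

    All four inputs are estimates made by the reuse engineer (assumptions of
    this sketch, not quantities defined by the process model).
    """
    if not 0.0 <= hit_probability <= 1.0:
        raise ValueError("hit_probability must be between 0 and 1")
    # Search time is always spent; on a hit we adapt, on a miss we implement.
    expected_reuse_cost = (search_minutes
                           + hit_probability * adapt_minutes
                           + (1.0 - hit_probability) * implement_minutes)
    return expected_reuse_cost < implement_minutes


# Very specific component: long search, low chance of a hit.
very_specific = worth_searching(120, 60, 240, 0.1)
# Trivial component: implementing is quicker than searching.
trivial = worth_searching(10, 5, 5, 0.9)
# Common, substantial component: searching is likely to pay off.
common = worth_searching(30, 60, 480, 0.7)
print(very_specific, trivial, common)
```

With these illustrative numbers the two discouraged scenarios come out as not worth searching, while a common and substantial component does.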

If the reuse engineer decides that the component is not worth searching for reusable solutions, he seems to be left with only one option: implementing it from scratch. Looking at the model, though, one notices a step other than implementation from scratch: “SEARCH FOR TRIVIAL CODE”. Developers were reusing code well before open source or the various source code repositories flourished. This need for reuse comes from the empirical observation that some snippets of code are found in most software projects. Reading from or writing to a file, connecting to a database and creating a comparator in Java are functionalities we meet so often in software engineering that they have come to be considered bibliography code. It is this kind of code that we want the reuse engineer to find already implemented by following the “SEARCH FOR TRIVIAL CODE” path. This way, even when seeking reusable components leads to a dead end, the reuse engineer can be sure that he has exhausted every possible way of performing effective code reuse. After retrieving as many snippets of code as possible, the code is adapted to the needs of the component under development, the code of the component is finalized, and the reuse engineer is ready to move on to the next component. Of course, when all kinds of search fail, implementing from scratch is inevitable.
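To illustrate what such trivial, frequently reused snippets look like, here is a small sketch of two of the examples mentioned above, writing to and reading from a file and defining a comparator. It is written in Python rather than Java, and the function names and sample data are hypothetical.

```python
import functools
import os
import tempfile


def compare_by_length_then_alpha(a: str, b: str) -> int:
    """Comparator: shorter strings first, ties broken alphabetically."""
    if len(a) != len(b):
        return len(a) - len(b)
    return (a > b) - (a < b)


def write_lines(path: str, lines: list) -> None:
    """Write the given strings to a text file, one per line."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines))


def read_lines(path: str) -> list:
    """Read a text file back into a list of lines."""
    with open(path, encoding="utf-8") as handle:
        return handle.read().splitlines()


# Round trip through a temporary file, then sort with the comparator.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "components.txt")
    write_lines(path, ["parser", "log", "ui", "db"])
    names = read_lines(path)

ordered = sorted(names, key=functools.cmp_to_key(compare_by_length_then_alpha))
print(ordered)
```

Snippets of this size are usually quicker to adapt from a forum answer or tutorial than to search for as packaged components.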

Finally we examine the case where the component under development cannot be found as is in a repository of open source code and the reuse engineer needs to break it into simpler components of less functionality. No matter how complex a component he is dealing with, the reuse engineer needs to be sure that it does not exist in some code forge. Therefore, as our model illustrates, he performs a search of the forges. After receiving no results he must examine whether the component can be further decomposed into simpler components. If not, this means he is dealing with a component of medium or little complexity, and the process returns to the “SEARCH FOR TRIVIAL CODE” search / implementation step discussed in the previous paragraph. If the component can be decomposed into simpler ones, our process model will consider them as “UNIMPLEMENTED COMPONENTS” and the reuse process will continue as normal.
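The control flow described in this section can be summarized in a short sketch. This is our reading of the textual description, not a reproduction of the flow chart: the function `reuse_process`, its callback parameters and the toy data are hypothetical, and the human decisions of the reuse engineer are stood in for by simple callables.

```python
from collections import deque
from typing import Callable, Dict, List


def reuse_process(system: str,
                  search_forges: Callable[[str], bool],
                  worth_searching: Callable[[str], bool],
                  decompose: Callable[[str], List[str]],
                  search_trivial_code: Callable[[str], bool]) -> Dict[str, str]:
    """Return, for each delivered unit, how it was obtained."""
    outcome: Dict[str, str] = {}
    # First try to find the whole System Under Development in a forge.
    if search_forges(system):
        outcome[system] = "reused from forge"
        return outcome
    pending = deque(decompose(system))
    if not pending:  # too simple to decompose
        outcome[system] = "implemented from scratch"
        return outcome
    while pending:  # the "UNIMPLEMENTED COMPONENTS?" decision
        component = pending.popleft()
        if worth_searching(component):
            if search_forges(component):
                outcome[component] = "reused from forge"
                continue
            parts = decompose(component)
            if parts:  # sub-components become unimplemented components
                pending.extend(parts)
                continue
        # "SEARCH FOR TRIVIAL CODE" path, then from scratch as a last resort.
        if search_trivial_code(component):
            outcome[component] = "built from snippets"
        else:
            outcome[component] = "implemented from scratch"
    return outcome


# Toy data standing in for forges, snippets and the engineer's judgement.
forge = {"login"}
tree = {"app": ["login", "report"], "report": ["csv-export", "sort-rows"]}
snippets = {"sort-rows"}
trivial = {"sort-rows"}

result = reuse_process(
    "app",
    search_forges=lambda c: c in forge,
    worth_searching=lambda c: c not in trivial,
    decompose=lambda c: tree.get(c, []),
    search_trivial_code=lambda c: c in snippets,
)
print(result)
```

In the toy run, `report` is not found in the forge and is decomposed, so only its sub-components appear in the result. Component adaptation, packing and the choice among several search results are deliberately left out of the sketch.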

We define the proposed model as semi-automated because, as this section made clear, the presence of the reuse engineer is considered essential. When we speak about a reuse engineer, we refer to an expert, a software engineer trained to develop software using reusable code. As we already explained, ordinary developers can take on that role or, in more systematic reuse approaches, this role can be assigned to persons with this specific task as part of a development team.

5 RELATED WORK

(Crnkovic et al., 2006) present a modification of the waterfall process for component reuse, in which there are two processes, one for developing reusable components and another for developing systems with these reusable components. The authors discuss in detail the modifications of the activities of the waterfall model for system development with component reuse. To connect these two processes they include an additional process called ‘Component Assessment’, which should be carried out as independently as possible from system development in order to reduce time-to-market. In general an assessment activity should comprise the following: (a) Component discovery, (b) Component selection according to its suitability for current and/or future products, (c) Component verification, and (d) Storage of the component and its metadata for future reference. Our proposed process can be used by reuse engineers to carry out component discovery and selection in a more systematic way when they reuse FLOS software.

Some proposed processes for component

retrieval from the Internet repositories aim at

pushing the automation of this process as much as

possible.

In (Hummel and Atkinson, 2007) a process

called Extreme Harvesting is proposed, which uses

unit tests that are developed in the context of an

agile software development process (e.g. Extreme

Programming) as a search criterion for reusable

components. There are two variants of the Extreme

Harvesting process, definitive harvesting and

speculative harvesting. With speculative harvesting

the reusable components retrieved are close to what

the developers wanted but not a perfect match and

the developers are required to adapt the system for

the integration of the retrieved components.

Therefore the process is not fully automated but is

supported by an Eclipse plug-in.
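The idea of using unit tests as a search criterion can be illustrated with a toy filter. This is not the Extreme Harvesting tool or its algorithm; the functions `passes` and `harvest`, the candidate implementations and the test cases are all hypothetical.

```python
from typing import Callable, Dict, List


def passes(candidate: Callable, test_cases: List[tuple]) -> bool:
    """Return True if the candidate gives the expected result for every case."""
    for args, expected in test_cases:
        try:
            if candidate(*args) != expected:
                return False
        except Exception:
            return False  # a crashing candidate does not satisfy the tests
    return True


def harvest(candidates: Dict[str, Callable], test_cases: List[tuple]) -> List[str]:
    """Keep the names of candidates that satisfy all test cases."""
    return [name for name, fn in candidates.items() if passes(fn, test_cases)]


# Hypothetical "repository" of retrieved implementations of a word counter.
candidates = {
    "split_based": lambda text: len(text.split()),
    "space_count": lambda text: text.count(" ") + 1,   # wrong for empty input
    "broken": lambda text: text.undefined_method(),    # raises AttributeError
}
tests = [(("open source reuse",), 3), (("",), 0)]
accepted = harvest(candidates, tests)
print(accepted)
```

A candidate that passes only some of the tests would correspond to the near match described above, which the developers would then have to adapt.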

Another tool-supported process is proposed in (McCarey et al., 2005), in which the authors describe Rascal, an intelligent agent which oversees the development of new code and uses AI techniques to match the characteristics of the developed code with existing code from reuse repositories (e.g. the SourceForge FLOSS repository). This process aims at more automation than (Hummel and Atkinson, 2007), since the search process is triggered by the intelligent agent and the discovered components are presented to the reuser without his intervention. However, ultimately the developer is responsible for deciding the suitability of the retrieved components and for integrating them into the new system.

In relation to (Hummel and Atkinson, 2007) and (McCarey et al., 2005), our work aims first at understanding the reuse process as a human activity and then at proposing tools for supporting this activity. Although tools such as the ones proposed in (Hummel and Atkinson, 2007) and (McCarey et al., 2005) are undoubtedly useful, our approach currently concentrates more on the reuse process itself, with the hope of better understanding the issues involved. We believe that a better understanding of the issues is also a prerequisite for more effective tool support.

Besides the searching and retrieval of reusable components, which is the basic area of our research, there is also a whole other spectrum of issues in software reuse in general and FLOS software reuse in particular. These include licensing issues and quality issues. There is progress towards supporting these aspects of reuse as well. For example the FOSSology project (Gobeille, 2008) is best known for identifying the licenses of FLOS software, which is a very important factor especially for commercial firms who wish to reuse open source software (Madanmohan and De, 2004). Projects such as SQO-OSS (Samoladas et al., 2008) aim at providing quality related information for reusable software to enhance the trust of the users and re-users of FLOS software.

6 CONCLUSIONS AND FUTURE

WORK

In this experience report we discussed the role of code reuse in a software product’s lifecycle and in the software product’s development process. We tried to present, in the form of case studies, the reuse engineer’s approach to software development using concepts from component-based development. Finally we proposed a semi-automated open source software reuse model in the form of a flow chart and presented how it organizes the steps a reuse engineer takes in order to create a software product with the least possible effort in terms of programming from scratch.

As we pointed out earlier in this paper, this process model is a first attempt at providing a well-defined way of implementing reuse engineering. Currently our model requires the presence of an expert, a reuse engineer, in order to take various kinds of decisions, such as whether a component needs to be broken into simpler ones or not, which of the reusable components discovered should be used in the implementation and why, what kind of adaptation the reusable code needs, and so on.

As future research we would like to examine the possibility of providing an even more automated process model that will be able to deal with some trivial yet essential decisions, such as proposing the best component, based on specific metrics, when the search returns more than one candidate. Another interesting approach could be to try to measure the fitness of a component inside the system under development. By fitness we mean the similarity a component has with the others in terms of design patterns, coding style, quality metrics, etc.

Once it reaches a certain level of maturity, the process model could ultimately be transformed into a tool that uses the open source forges as a reusable software pool, providing a semi-automated way for any developer to discover and evaluate alternatives of free, reusable code.

REFERENCES

ISO/IEC 15288, System Engineering – System Life Cycle Processes, First Edition, ISO/IEC, 2002.

I. Crnkovic, M. Chaudron and S. Larsson. 2006. Component-Based Development Process and Component Lifecycle. In Proceedings of the International Conference on Software Engineering Advances (October 29 - November 03, 2006). ICSEA. IEEE Computer Society, Washington, DC, 44. DOI= http://dx.doi.org/10.1109/ICSEA.2006.28

O. Hummel and C. Atkinson: “Supporting Agile Reuse Through Extreme Harvesting”, in Proc. of the 8th International XP Conference, pp. 28-37, Springer, 2007

F. McCarey, M. Ó Cinnéide and N. Kushmerick: “Rascal: A Recommender Agent for Agile Reuse”, Artificial Intelligence Review, vol. 24, no. 3-4, pp. 253-276, Springer, November 2005

R. Gobeille: “The FOSSology project”, in Proceedings of the 2008 International Working Conference on Mining Software Repositories (MSR '08), pp. 47-50, ACM, 2008

T. R. Madanmohan and R. De’: “Open Source Reuse in Commercial Firms”, IEEE Software, vol. 21, Dec. 2004, pp. 62-69

I. Samoladas, G. Gousios, D. Spinellis and I. Stamelos: “The SQO-OSS Quality Model: Measurement Based Open Source Software Evaluation”, IFIP 20th World Computer Congress, Working Group 2.3 on Open Source Software, pp. 237-248, Springer, 2008
