Supervised Machine Learning for Recovering Implicit Implementation of

Singleton Design Pattern

Abir Nacef

, Sahbi Bahroun

, Adel Khalfallah

and Samir Ben Ahmed

Faculty of Mathematical, Physical and Natural Sciences of Tunis (FST), Computer Laboratory for Industrial Systems,

Tunis El Manar University, Tunisia

Higher Institute of Computer Science (ISI), Limtic Laboratory, Tunis El Manar University, Tunisia

Keywords:

Singleton Implicit Implementation, Features, Datasets, LSTM, Supervised ML Algorithms.

Abstract:

An implicit or indirect implementation of the Singleton design Pattern (SP) is a programming implementation

whose purpose is to restrict the instantiation to a single object without actually using the SP. This structure may

not be faulty or errant but can impact negatively the software quality especially if they are used in inappropriate

contexts. To improve the quality of the source code, the injection of the SP is sometimes mandatory. In order

to assuring that, a speciﬁc structure must be identiﬁed and automatically detected. However, due to their

vague and abstract nature, they can be implemented in various ways, which are not conducive to automatic

and accurate detection. This paper presents the ﬁrst method dedicated to the automatic detection of Singleton

Implicit Implementations (SII) based on supervised Machine Learning (ML) algorithms. In this work, we

deﬁne the different variants of SII, then based on the detailed deﬁnition we propose relevant features and we

create a dataset named FTD (Feature Train Data) according to the corresponding variant. Based on Long Short

Term Memory (LSTM) models, trained by the FTD data we extract features values from Java program. Then

we create another data named SDTD (Singleton Detector Train Data) containing feature combination values to

train the ML classiﬁer. We resolve the problem of automatic detection of SII with different ML algorithms like

KNN, SVM, Naive Bayes and Random Forest for classiﬁcation task. Based on different public Java corpus,

we create and label a data named SDED (Singleton Detector Evaluating Data), this data is used for evaluating

and choosing the appropriate ML model. The empirical results prove the performance of our technique to

automatically detect the SII.

1 INTRODUCTION

The singleton pattern (SP) (Gamma et al., 1994) is a

creational pattern. It represents an abstract solution to

a commonly occurring software design problem. This

pattern is useful when we need to ensure that only one

object of a given class is instantiated.However, in cer-

tain cases the use of this pattern in a class that required

it is absent, or oven accompanied by incomplete struc-

ture. In this case, the injection of the suitable pattern

is mandatory and considered a complex form of refac-

toring that improve the source code quality, and code

reuse.

The SP has a speciﬁc structure which usually con-

sists of name, application, and implementation de-

tails. The use of this pattern makes possible the shar-

ing of design information. This pattern solves a class

of problems, so designers must know when to use it.

In order to apply the SP, it is important to identify the

type of problem resolved by it, and the existing struc-

ture in the source code that needed the use of this pat-

tern.

In This paper, we focus on the deﬁnition and the

automatic detection of SII. The deﬁnition of SII has

many beneﬁts like the reducing of errors in imple-

mentation, improving the quality of code and make

code reuse.

There are many ways to implement SII and there is

no formal-used structure. This variety of implemen-

tations makes the automatic detection of these struc-

tures a hard task. The existing methods dealing with

pattern recovery convert source code to some interme-

diate representation. However, they show poor per-

formance compared to methods used features. At the

same time, the structural analysis doesn’t allow a per-

fect recovery of the pattern intent in different repre-

sentations. Semantic analysis is needed, and can per-

fectly improve the performance of the model in ex-

tracting needed information.

Therefore, to solve this problem, we propose to

354

Nacef, A., Bahroun, S., Khalfallah, A. and Ben Ahmed, S.

Supervised Machine Learning for Recovering Implicit Implementation of Singleton Design Pattern.

DOI: 10.5220/0011836100003464

In Proceedings of the 18th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE 2023), pages 354-361

ISBN: 978-989-758-647-7; ISSN: 2184-4895

 2023 by SCITEPRESS – Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)

deﬁne and extract features from the source code with-

out using any intermediate representation. The rele-

vant features are resulting from the structural and se-

mantic analysis of the Java program. We are based

on the performance of the LSTM to deal with source

code sequence to extract feature values. We create a

set of structured data named FTD corresponding to

each feature for training LSTM classiﬁers. Based on

the extracted features, we create a new structured data

named SDTD to deﬁne the combination values deﬁn-

ing each variant. This data is used to train a super-

vised ML classiﬁer to detect an instance of SII.

In this work, we analyze a Java program using

LSTM classiﬁers, then we use 4 different ML clas-

siﬁers (KNN, SVM, Naive Bayes, and Random For-

est) for the automatic detection of SII.

The main contributions of this paper consist on:

• Deﬁning and analyzing the SII.

• Proposing 21 relevant features for identifying

each variant.

• Constructing 3 datasets; FTD whose total size is,

15000 used for training LSTM classiﬁer. SDTD

containing 7000 samples used for training the Sin-

gleton Implicit Detector (SID). SDED containing

200 java ﬁles collected from the java public cor-

pus and used for the evaluation process.

• Evaluating the created classiﬁers and comparing

the different ML results in terms of precision, re-

call, and F1-Score.

The rest of this paper is organized as follows: In Sec-

tion 2 we establish an overview of related works. In

Section 3, we deﬁne and analyze the SII, and we pro-

pose a set of relevant features. Section 4 presents

the proposed approach process and techniques used.

In Section 5 we represent the different created data

and we discuss the empirical results. In section 6, we

show the conclusion and future works.

2 RELATED WORKS

The identiﬁcation of SII has not been extensively dis-

cussed in the literature. For this reason, we provide

an overview of relevant empirical studies.

The effect of DP on selected quality characteris-

tics and improve software quality has been the sub-

ject of several studies. However, most DPs recogni-

tion approaches focus on the deﬁnition of a typical

structure.

DP recognition methods can be classiﬁed accord-

ing to different criteria and aspects. In this paper, we

categorize detection approaches according to the de-

tection methods used and the analysis styles. The De-

tection methods can be divided into four major cate-

gories: Database query methods, metric-based meth-

ods, UML structures, diagrams, etc. Matrix-based

methods and others.

(Rasool et al., 2010), (Stencel and Wegrzynowicz,

2008) and other works use database query approaches

to recover pattern. In general, this kind of approach

transforms the source code into an intermediate rep-

resentation, such as AST, ASG, XMI, etc. Then they

use SQL queries to extract related pattern information

from the source code representation. However, the

use of database queries cannot recover the instances

of behavioral patterns, and queries are limited to the

available information existing in the intermediate rep-

resentations.

Techniques proposed by (von Detten and Becker,

2011) and (Kim and Boldyreff, 2000) use program-

related metrics method (e.g., aggregations, general-

izations, associations, and interface hierarchies) to re-

cover pattern. (von Detten and Becker, 2011) com-

bine both cluster-based and pattern-based reverse en-

gineering methods. In their approach, they show

that the presence of bad smells in software system

code can falsify metric-based clustering results. An

advanced method, proposed by (Satoru Uchiyama,

2011) combine software metrics and machine learn-

ing to identify candidates for the roles that compose.

Other approaches transform structural and behav-

ioral information into UML structure, graph, or ma-

trix. (Fontana and Zanoni, 2011) exploit a combi-

nation of graph matching and ML techniques. (Yu

et al., 2015) transform GoF patterns and source code

into graphs with classes as nodes and relationships as

edges. Moreover, to obtain ﬁnal instances, the behav-

ioral characteristics of method invocations are com-

pared with the predeﬁned method signature template

of GoF patterns. In this approach, a structural feature

model to represent GoF DPs is introduced. However,

structural features cannot capture the pattern intent.

Most of these methods show good precision and re-

call, but they cannot handle implementation variants

of DP.

Another method proposed by (Chihada et al.,

2015) is based on learning from DPs information for

the recovering process. The DP recognizing is consid-

ered as a learning problem. Similar recent work pro-

posed by (Hussain et al., 2018) treats the problem of

DPs recovering as text categorization. Another work

proposed by (Thaller et al., 2019) uses the neural net-

works to build a feature map for pattern instances. Re-

cently, (Nazar et al., 2022) select a set of relevant fea-

ture codes and combine it with ML to automatically

train a DP detector.

Supervised Machine Learning for Recovering Implicit Implementation of Singleton Design Pattern

355

We can summarize the difference between our

work and the previous works as:

• Similar to other approachs (Chihada et al., 2015),

and (Hussain et al., 2018), we consider the pattern

recovering as learning problem. However, we use

features for reducing the learning space and focus

on relevant information to improve accuracy.

• Similar to (Thaller et al., 2019) and (Nazar et al.,

2022), we use ML and features. But, the use of

speciﬁc and more various data makes the learning

process more efﬁcient.

• (Satoru Uchiyama, 2011) in his approach pro-

poses a method to recover a variant of the SP. One

of the recovered variants represent a speciﬁc im-

plementation using Boolean variable (listing 1 ).

This implementation is considered in our work as

an SII.

All previous work aims to improve the quality of

source code. But we are the ﬁrst to deﬁne the SII and

its automatic detection.

3 SII: DEFINITION AND

REPORTED VARIANTS

In this section, we will give a concrete deﬁnition of

SII with their reported variants. According to the im-

plementation aspect of each variant, we propose a set

of features for their automatic detection.

3.1 Deﬁnition of SII

The SP intent is to ensure that in the same applica-

tion that can be only one created instance of a class

at any time. To implement that, there are many dif-

ferent structures used depending on the context and

the usefulness. These structures are almost based on

common characteristics like ensuring that only one

instance exists, providing a global private static in-

stance to that access with private constructors, and

providing a static method that returns a reference to

this instance. However, in some cases, the use of this

speciﬁc structure (SP) in the class that required it, is

absent. The developer creates a class in his speciﬁc

way corresponding to his knowledge and can prevent

the instantiation by a speciﬁc structure. We named

this speciﬁc implementation, used to prevent instanti-

ation only to one instance, as Singleton Implicit Im-

plementation. There are many ways to implement

that. Based on a detailed analysis of different pub-

lic projects, and developers’ choices to avoid the use

of SP, we identify 4 categories of SII:

• Category 1: Non-Global Access Point

The SP ensures instance reuse. In order to preserve

the instance for future reuse, global access to the in-

stance must be provided. However, away from SP’s

typical implementation, providing public and global

access to the instance may not always be preserved.

Single instantiation can be ensured by restricting ac-

cess to certain areas of code or even making it private

to a single class. There are several ways to achieve

this, like the use of a Boolean variable (listing 1).

Listing 1: Example of SII using Boolean variable as feature.

p u b l i c c l a s s S ing l C

{ p u b l i c s t a t i c bo ol ea n t e s t = f a l s e ;

p u b l i c S in g lC ( ) t h rows E x e p t i o n

{ i f ( ! t e s t ) { t e s t = t r u e ; }

e l s e { throw new E x e p t i o n ( ” . . . ” ) ; } }

p u b l i c s t a t i c S i n g l C g e t I n s t ( ) thro w s

E x e p t i o n { . . . }

. . .

• Category 2: SII with Static Properties.

– Static Data. Monostate described in (Martin,

2002) represents a kind of implementation used

to create one instance of a class. This pattern in-

cludes making all data static except getters and

setters.

– Fully Static Class. Static classes are a solu-

tion for storing a single instance of data to be

accessed globally in an application.

• Category 3: Providing Convenient Access to an

Instance

It consists of giving easy access to objects that

need to be used in many different places. The

general rule is that we want the bounds of the vari-

ables to be as tight as possible and still get the job

done. There are many ways to access objects:

– Object as Parameter in Function. The easiest

and often best solution is to simply pass the de-

sired object as a parameter to the function that

needs it. For example, consider a function for

rendering objects. Rendering requires access

to an object that represents the graphics device

and contains the rendering state. It is common

to pass it to all render functions, usually as a

parameter called context.

– Base Class. The subclass sandbox contains an

abstract sandbox method. Each derived Sand-

box subclass implements the Sandbox meth-

ods with the provided actions. Sandbox classes

work well in the case of minimizing the cou-

pling between the derived classes and the rest

of the program.

– Service Locator. Consist to deﬁne a class

whose sole purpose is to provide global access

ENASE 2023 - 18th International Conference on Evaluation of Novel Approaches to Software Engineering

356

to objects. This generic pattern is called Ser-

vice Locator. It deﬁnes an abstract interface for

a set of operations in which a speciﬁc service

provider implements this interface.

• Category 4: Other Implicit Implementations

It is worth noting that these implementations can

be combined and other implementations to restrict

instantiation, preserve the state of the instance, or

share services can exist.

The use of an enum with ﬁnal ﬁelds can be consid-

ered a multi-thread safe solution. Also, the control

instantiation itself (making a strict condition to

limit the number of instances) can be considered

a naif solution to SII. More than that, we consider

an incomplete structure of Singleton’s typical im-

plementation with certain conditions (have only

one used reference and only one method to create

an instance) as SII which should be detected to be

corrected

Listing 2: Example of Singleton Implicit Implementation

using Enum class.

p u b l i c enum Sou n d System {

INSTANCE( ” H e l l o ” , ” World ” ) ;

p r i v a t e f i n a l O b j e c t f i e l d 1 ;

p r i v a t e f i n a l O b j e c t f i e l d 2 ;

p r i v a t e Soun d S ystem ( O b j e c t f i e l d 1 , O b j e c t

f i e l d 2 ) {

t h i s . f i e l d 1 = f i e l d 1 ;

t h i s . f i e l d 2 = f i e l d 2 ;

}

. . .

3.2 Quality Impact and Trade-offs

SII aims to create a single instance of the class. How-

ever, this goal can not always be achieved, and the

implementation doesn’t play a good solution. Unique

instantiation is not always guaranteed and therefore

can cause unnecessary system overhead and inconsis-

tent behavior.

Classes based on the implementation that doesn’t

provide global access, allow anyone to build it, but it

will fail when we try to build multiple instances. The

downside of this implementation is that the checks to

prevent multiple instantiations are only performed at

runtime. In contrast, due to the nature of the class

structure, the SP guarantees that there is only one in-

stance at compile time.

The beneﬁt of using static classes is that the com-

piler ensures that instance methods are not acciden-

tally added. The compiler guarantees that no instance

of the class can be created. However, the downside

of using static classes is that after decorating a class

with the static keyword, we can never change how it

behaves and the polymorphic option is restricted.

The Service Locator passes objects to the code

that it needs, instead of using a global mechanism to

give some code access to it. That’s dead simple, and

it makes the coupling completely obvious. But, some

objects manually are unnecessary, or actively make

the code harder to read.So, implementing a service

locator instead of SP can cause scalability issues in

high-concurrency environments.

SII aims to create a single instance of the class.

However, sometimes it doesn’t represent the perfect

solution and can provoke many problems, as we have

mentioned. On another side, the use of the typical im-

plementation of SP offers a complete object-oriented

solution. Only one instance of a class is required at

any given time and maintains one state. In this case,

the injection of a typical implementation is manda-

tory.

3.3 Proposed Features

To be able to detect the different variants of SII we are

based on a subset of highlighted features. According

to variant deﬁnition and structure, we have proposed

21 features as represented in Table 1 with the abbre-

viation and the description.

For all variants, common properties must be ver-

iﬁed like; the global object declaration, the accessi-

bility and static property of the class attribute, the ac-

cessibility of the constructor, the global and the static

declaration of the access method and ﬁnally the con-

dition to limit the number of instances; whether by

using Boolean variables, numerical variables count-

ing the number of instances, or by testing the value of

the object itself.

For the implementations based on static aspects

we must check that the class has many static attributes

and static methods, the class has a static inner class

and the static properties of getter and setter methods.

For variants that provide the object reference to the

block it needs, check the use of the reference as a pa-

rameter in the constructor and/or in one or more meth-

ods.

There are several ways to implement a Service

Locator, but the most commonly used are: a static

service locator, which uses attributes for each ser-

vice to store object references, and a dynamic method

that contains java.util.Map of all references of service.

This can be dynamically expanded to support new ser-

vices. For the base class variant ( Base Class), many

mechanisms can be used. The simplest solution is to

pass the object reference as a parameter in the base

class constructor. Another solution is to divide the ini-

tialization into two stages. The manufacturers Do not

take any parameters and simply create objects. Then

Supervised Machine Learning for Recovering Implicit Implementation of Singleton Design Pattern

357

Table 1: Proposed Features

No. Abb. Feature

1 IRE Inheritance relationship (extends)

2 IRI Inheritance relationship (implements)

3 EC Enum Class

4 GD Global class attribute declaration

5 AA Class attribute accessibility

6 SA Static class attribute

7 OA Have only one class attribute

8 CA Constructor accessibility

9 GSAM Global Static Access Method

10 GSSM Global Static Setter Method

11 HOGM Have one method to generate instance

12 SIC Use Static Inner Class

13 CSA Class with Static attributes

14 CFA Class with Final attributes

15 CSM Class with Static Methods

16 COP Constructor with Object Parameter

17 MOP Method with Object Parameter

18 CI Control Instantiation

19 BV Using Boolean Variable to Instantiate

20 MSR Create Map for serves references

21 AMDP abstract method with data parameter

call a method that passes all the necessary data and is

deﬁned directly in the base class. Or simply make the

state of the basic class private and static. Finally, in

the enum class-based implementation, it is necessary

to verify that a class type is declared enum, and that

all variables are declared as ﬁnal static. The number

of class objects declared, and the number of meth-

ods that generate the class instance, play a large role

in checking the consistency between the source code

and the pattern’s intent. Therefore, taking these con-

ditions into account is critical to prevent the model

from generating false predictions (predicting a false

positive as shown in Program 1.3 and 1.4).

3.4 Features Preserving the SP Intent

The intent of the SP is to create only one instance of

a class. Even if the SII doesn’t represent the complete

or the correct solution to resolve the problem well,

an implementation with features that prohibit the pat-

tern intent is not considered an SII. The overarching

features we use to deﬁne SII consistent with the in-

tent of the pattern are ”have only one class attribute”

and ”have only one generating instance”. For exam-

ple, the implementation represented in listing 3 is not

considered as SII because it’s incoherent with single-

ton pattern.

Listing 3: Example of incorrect SP.

n b i n s = 0 ;

p u b l i c s t a t i c S i ng l C 2 g e t I n s ( ) {

i f ( n b i n s == 0 ) {

i n s t a n c e = new Si n g l C2 ( ) ;

n b i n s += 1;}

r e t u r n i n s t a n c e ; }

p u b l i c s t a t i c S i ng l C 2 g e t I n s t 2 ( )

{ r e t u r n new Si n g l C2 ( ) ; } }

Figure 1: Proposed Approach Process.

4 DETECTION PROCESS

In this section, we will present the technical process

of our proposed approach. For recovering the SII, the

proposed process will be divided into two phases as

shown in ﬁgure 1.

4.1 Phase1: Source Code Analysis

In this phase, we started by analyzing the intention

of the SP. Based on this analysis we have determined

as many as possible structures that assure this intent.

These SII as we have named are subsequently studied

for the purpose to extract a set of common characteris-

tics between each variant. At the end of this analysis,

we have identiﬁed 21 features and created 21 datasets

FTD corresponding to each one. Every data is com-

posed of a variety of implementations that can serve

to good learning of the classiﬁer.

To be able to extract feature value, we apply a

structural and semantic analysis of the source code by

the LSTM model.

We have to build 21 LSTM classiﬁers for extract-

ing values for each feature. LSTM classiﬁers are

trained by the corresponding feature’s data. After an-

alyzing the source code and extracting features value,

we pass to the second phase whose purpose is to re-

cover the SII instances.

4.2 Phase 2

The analysis of a different variant of SII makes the

recovery of each one more easy. In fact, based on the

previous analysis, we have created a dataset contain-

ing different features combination.

By creating structured data named SDTD, we

have trained a set of ML classiﬁers. We have selected

ENASE 2023 - 18th International Conference on Evaluation of Novel Approaches to Software Engineering

358

Table 2: Data details.

Data Size Application

FTD 15000 training LSTM classiﬁers

SDTD 5000 training the SID classiﬁer

SDED 200 Evaluate LSTMs and SID

4 ML techniques the most used on DPs and CSs de-

tection (Na

ıve Bayes (NB), KNN, SVM, Random

Forest (RF) and compare them based on the obtained

results.

5 EVALUATION SETUP

In this section, we will present the used datasets for

training and testing the models, the criteria to evaluate

them, and different achieved results.

5.1 Data

We have created and labeled manually three different

structured data; FTD, SDTD and SDED which the

corresponding details are presented in table 2.

FTD. is data created from a variety of snippets of code

whose total size is 15000 samples, used for extract-

ing feature’s value from Java source code. For each

feature, corresponding data is created. The data con-

tains the most needed samples that make true positive

prediction and reduce the false positive (example of

listing 4 5, 6 and 7) rate. Based on obtained results in

each simulation, we try to correctly modify the data

with the goal to obtain the most appropriate data for

the right model training.

Listing 4: Example of correct control instantiation feature.

p r i v a t e s t a t i c C1 i n s t a n c e 1 ;

p r i v a t e C1 i n s t a n c e 1 = n u l l ;

p r i v a t e C1 ( ) { }

p u b l i c s t a t i c C1 g e t I n s t a n c e ( ) {

i f ( i n s t a n c e 1 == n u l l )

{ i n s t a n c e 1 = new C1 ( ) ; }

r e t u r n i n s t a n c e 1 ; }

Listing 5: Example of incorrect Control instantiation fea-

ture

p u b l i c s t a t i c C1 g e t I n s t a n c e ( )

{ i f ( C2 . i n s t a n c e 2 == n u l l )

{ i n s t a n c e 1 = new C1 ( ) ; }

r e t u r n i n s t a n c e 1 ; }

Listing 6: Example of correct Boolean variable feature.

p r i v a t e s t a t i c C1 i n s t a n c e 1 ;

p r i v a t e B oo le an n o I n s t a n c e = tr u e ;

p u b l i c s t a t i c C1 g e t I n s t a n c e ( ) {

i f ( n o I n s t a n c e )

{ i n s t a n c e 1 = new C1 ( ) ;

r e t u r n i n s t a n c e 1 ; }

e l s e { Sys tem . o u t . p r i n t l n ( ” e x i s t ” ) ; }}

Table 3: True and false SII based on class Enum.

Class Features

EC AA CA COP CFA

True True Final/ private True True

Private ﬁnal

False True Final/ private True False

Private ﬁnal

Table 4: True and false SII based on static class.

Class Features

COP CSA SIC CSM

True True True True True

False True True False True

Listing 7: Example of incorrect Boolean variable feature.

p r i v a t e s t a t i c C3 i n s t a n c e 1 ;

p r i v a t e B oo le an n o I n s t a n c e = tr u e ;

p u b l i c s t a t i c C3 g e t I n s t a n c e ( ) {

i f ( n o I n s t a n c e )

{ Sys t em . o u t . p r i n t l n ( ”No i n s t a n c e ” ) ; }

i n s t a n c e 1 = new C1 ( ) ;

r e t u r n i n s t a n c e 1 ; }

SDTD. is structured data containing 5000 samples

used to train the SID. The data contain feature com-

bination values as input (whether Boolean and cate-

gorical types) and two categorical targets (True-SII /

False-SII). To ensure fair classiﬁcation, we make an

equitable number of samples (2500 samples for each

class). The False-SII class should contain the com-

bination of features that is close to SII, but doesn’t

make an SII candidate. The goal in there is to reduce

the number of false-positive and make perfect model

learning. Table 3,4 and 5 shows an extract from the

SDTD.

For a class enum, we must verify that a class type

is an enum (EC) and that all variables are declared as

ﬁnal (CFA), otherwise, it is considered a false candi-

date. Static classes contain static attributes and meth-

ods (COP, CSA) and can use any type of access mod-

iﬁer (private, protected, public, or default) like any

other static member. A static class is represented in

the form of an inner class or a nested class (SIC) oth-

erwise it is considered a faulty implementation. For

SII implemented using control instantiation; indepen-

dent of class attribute accessibility and constructor

accessibility, a class must maintain only one gener-

ated instance (HOGM) in which there exist a block

for controlling the instantiation (CI). If any of these

conditions is violated, the implementation is consid-

ered as False SII.

SDED. is a structured labeled data used for evaluat-

Table 5: True and false SII based on control instantiation.

Class Features

GSAM OA HOGM CI MOP

True True True True True True

False True True False False True

Supervised Machine Learning for Recovering Implicit Implementation of Singleton Design Pattern

359

ing the ML models whether in phase 1 or 2. We have

collected 200 Java ﬁles from a variety of GitHub pub-

lic Java projects

. Based on the analysis of these

projects, we manually select 150 ﬁles with SII and

50 with none.

5.2 The Evaluation Protocol

Precision, recall, and F1 score are standard mea-

sures for statistical evaluation of classiﬁer effective-

ness. The precision rate indicates the fraction of pos-

itive precision which was actually correct. However,

the recall represents the proportion of actual positives

which were identiﬁed correctly. The F1 score is a har-

monic mean of precision and recall into a single num-

ber.

5.3 Experimental results

Regarding our approach being the ﬁrst to deﬁne and

detect SII, we haven’t found closet works for compar-

ing results. That’s why, in this subsection, we restrict

the comparison phase to only one relevant approach

(Thaller et al., 2019).

5.3.1 Obtained Results

• LSTM Results

We have built 21 LSTM classiﬁers and trained them

by the corresponding data to extract each feature

value. The different results presented in table 6 prove

that the LSTM is suitable to extract features value

from Java program, especially when they have an el-

ementary and simple structure (100% of F1-Sore in

RE, RI and EC). However, the model is less efﬁcient

with complex features like OA, HOGM, and CSM

(less than 80% of F1-Score). To deal with that, we

can replace the complex features with more elemen-

tary ones.

• SID Classiﬁer Results

The SID is a classiﬁer using supervised learning with

structured data. To select the appropriate model, we

have tested a set of ML models most used on DP and

CSs extraction (KNN, SVM, RF and NB). The differ-

ent obtained results are represented in table 7. All ML

models perform a good results thanks to the speciﬁc

https://github.com/topics/service-locator?l=java /

https://github.com/topics/service-locator?o=asc&s=forks /

https://github.com/topics/singleton?o=desc&s=updated

/ https://github.com/topics/enum?l=java /

https://github.com/topics/singleton?l=java /

https://github.com/topics/static-class?l=java /

https://github.com/topics/static-variables

Table 6: LSTMs results applied on SDED.

Features Precision (%) Recall (%) F1(%)

IRE 100 100 100

IRI 100 100 100

CSA 96.1 87.49 91.59

EC 100 100 100

CFA 94.13 90.2 92.12

GD 100 95.5 97.69

CSM 74.81 81.26 77.9

AA 92.6 95.12 93.84

COP 88.46 95.4 91.79

SA 94.8 91.74 93.24

MOP 89.7 87.63 88.65

OA 83.4 70.51 76.41

CI 92.36 97.8 95

CA 87.89 89.3 88.58

BV 96.45 91.76 94.04

GSAM 88.8 82.46 85.51

MSR 93.5 89.12 91.25

GSSM 86.23 90.21 88.17

AMDP 87.45 89.56 88.49

HOGM 70.9 77.8 74.18

SIC 90.56 94.78 92.62

Table 7: Comparison of SID Classiﬁers results.

SID

Measures

Precision (%) Recall (%) F1 (%)

RF 99.49 99 99.23

NB 87.90 89.95 87.63

KNN 97.77 98.29 98.01

SVM 96.48 97 96.72

created dataset SDTD. The training data make any

model able to automatically recover SII with higher

accuracy. Except for NB, all models perform very

good and close results (more than 96% in both pre-

cisions, recall, and F1-Score). RF contrarily to NB is

the butter used model, it achieves more than 99% in

different standard measures.

5.3.2 Comparison with State-of-the-Art

Approach

Though our approach is based on features and ML, we

have chosen Feature Maps (FM) (Thaller et al., 2019)

as a similar approach for comparison. The data used

for the evaluation contain 100 Java ﬁles and is con-

structed from Java samples existing in DPB corpus

(Fontana and Zanoni, 2011) (typical implementation)

and samples from SDED containing base class and

service locator implementation (SII). We have repli-

cated the FM approach and evaluated it on recently

created data based on the RF algorithm. The obtained

results are detailed in table 8.

Table 8 show the difference in result between FM

and SID in terms of the correct and incorrect number

of predicted SP instances. Feature Maps is created

to recover typical implementations, so it has the abil-

ENASE 2023 - 18th International Conference on Evaluation of Novel Approaches to Software Engineering

360

Table 8: Comparing SID results with Feature Maps results.

Methods STI SII

Correct Incorrect Correct Incorrect

FM 49 9 24 18

SID 47 11 40 2

Total 58 42

ity to recover correct instances. However, similar im-

plementations (implicit) are not fully trained by the

model, so it fails to correctly predict them. Feature

Maps recover a quiet number of SII as true positive

SP whereas these implementations can negatively af-

fect the source code quality. In contrast, and even if

the structures were very similar, the SID can distin-

guish between the two implementations (47 correct

instances from 58 SP and 40 correct instances from

42 SII).

6 CONCLUSION

In this paper, we have proposed a novel approach to

deﬁne and automatically detect the SII. Based on the

analysis of the SP intent, we deﬁne structure in that

we need to inject the SP. By deﬁning and analyzing

different variants, we propose 21 relevant features that

can make the deﬁnition of each implementation.

For extracting feature values from the Java pro-

gram, we propose to use the LSTM models for syn-

tactical and semantic analysis. To train the LSTM, we

have created and labeled a data named FTD which is

composed of 15000 snippets of code. For the auto-

matic detection of SII, we create a classiﬁer named

SID which uses different ML algorithms. The SID is

trained by a created and labeled data named SDTD.

The data is composed of feature combination values

according to the most existing implementation. The

global size of the data is 5000 samples.

To evaluate the created models, we collect 200

Java ﬁles from different public projects. We manu-

ally label the data used for the evaluation process. We

have selected 4 ML models for the SID and chosen

the perfect one according to their performance. The

empirical results prove that the proposed technique

can correctly recover any SII with higher accuracy

(more than 99% of precision) and outperforms the rel-

evant approach in distinguishing between both types

of SP.

In future work, we try to create more elementary

features to improve the extraction of some complex

ones. The purpose of automatic detection of the SII

is to improve the source code quality by injecting the

SP in the appropriate context. So, in future work, we

will pass on the realization of these complex types of

refactoring using ML. Beginner with the SP we try

to continue the extraction of implicit implementation

with other DPs.

REFERENCES

Chihada, A., Jalili, S., Hasheminejad, S. M. H., and Zan-

gooei, M. H. (2015). Source code and design confor-

mance, design pattern detection from source code by

classiﬁcation approach. Appl. Soft Comput.

Fontana, F. A. and Zanoni, M. (2011). A tool for design

pattern detection and software architecture reconstruc-

tion. Inf. Sci., 181(7):1306–1324.

Gamma, E., Helm, R., Johnson, R., and Vlissides, J. M.

(1994). Design Patterns: Elements of Reusable

Object-Oriented Software. Addison-Wesley Profes-

sional.

Hussain, S., Keung, J., Khan, A. A., Ahmad, A., Cuomo,

S., Piccialli, F., Jeon, G., and Akhunzada, A. (2018).

Implications of deep learning for the automation of

design patterns organization. J. Parallel Distributed

Comput., pages 256–266.

Kim, H. and Boldyreff, C. (2000). A method to recover de-

sign patterns using software product metrics. In Soft-

ware Reuse: Advances in Software Reusability, 6th

International Conerence, Lecture Notes in Computer

Science. Springer.

Martin, R. C. (2002). Agile Software Development, Princi-

ples, Patterns, and Practices. Prentice-Hall.

Nazar, N., Aleti, A., and Zheng, Y. (2022). Feature-based

software design pattern detection. J. Syst. Softw.

Rasool, G., Philippow, I., and M

ader, P. (2010). Design pat-

tern recovery based on annotations. Adv. Eng. Softw.,

41(4):519–526.

Satoru Uchiyama, Atsuto Kubo, H. W. Y. F. (2011). Detect-

ing design patterns in object-oriented program source

code by using metrics and machine learning. Proceed-

ings of the 5th International Workshop on Software

Quality and Maintainability.

Stencel, K. and Wegrzynowicz, P. (2008). Detection of

diverse design pattern variants. In 15th Asia-Paciﬁc

Software Engineering Conference, pages 25–32. IEEE

Computer Society.

Thaller, H., Linsbauer, L., and Egyed, A. (2019). Fea-

ture maps: A comprehensible software representa-

tion for design pattern detection. In 26th IEEE Inter-

national Conference on Software Analysis, Evolution

and Reengineering, SANER 2019, Hangzhou, China,

February 24-27.

von Detten, M. and Becker, S. (2011). Combining clus-

tering and pattern detection for the reengineering of

component-based software systems. In 7th Interna-

tional Conference on the Quality of Software Archi-

tectures, pages 23–32. ACM.

Yu, D., Zhang, Y., and Chen, Z. (2015). A comprehensive

approach to the recovery of design pattern instances

based on sub-patterns and method signatures. J. Syst.

Softw.

Supervised Machine Learning for Recovering Implicit Implementation of Singleton Design Pattern

361