Features and Supervised Machine Learning Based Method for Singleton

Design Pattern Variants Detection

Abir Nacef

, Sahbi Bahroun

, Adel Khalfallah

and Samir Ben Ahmed

Faculty of Mathematical, Physical and Natural Sciences of Tunis (FST), Computer Laboratory for Industrial Systems,

Tunis El Manar University, Tunisia

Higher Institute of Computer Science (ISI), Limtic Laboratory, Tunis El Manar University, Tunisia

Keywords:

Design Patterns, Singleton Variant, RNN-LSTM Classiﬁer, Machine Learning, SD Classiﬁer.

Abstract:

Design patterns codify standard solutions to common problems in software design and architecture. Given

their importance in improving software quality and facilitating code reuse, many types of research are proposed

on their automatic detection. In this paper, we focus on singleton pattern recovery by proposing a method that

can identify orthodox implementations and non-standard variants. The recovery process is based on speciﬁc

data created using a set of relevant features. These features are speciﬁc information deﬁning each variant

which is extracted from the Java program by syntactical and semantic analysis. We are based on the singleton

analysis and different proposed features in ou previous work (Nacef et al., 2022) to create structured data. This

data contains a combination of feature values deﬁning each singleton variant to train a supervised Machine

Learning (ML) algorithm. The goal is not limited to detecting the singleton pattern but also the speciﬁcation

of the implemented variant as so as the incoherent structure that inhib the pattern intent. We use different

ML algorithms to create the Singleton Detector (SD) and compare their performance. The empirical results

demonstrate that our method based on features and supervised ML, can identify any singleton implementation

with the speciﬁc variant’s name achieving 99% of precision, and recall. We have compared the proposed

approach to similar studies namely DPDf and GEML. The results show that the SD outperforms the state-

of-the-art approaches by more than 20% on evaluated data constructed from different repositories; PMART,

DPB and DPDf corpus in terms of precision.

1 INTRODUCTION

Design patterns (Gamma et al., 1994) have an impor-

tant role in the software development process. The

term has become commonplace among software de-

signers, requirements engineers, and software pro-

grammers alike. The language of design patterns is

now necessary to understand and work with software

because it helps to understand the design intent of

pre-developed software. Hence, software reverse en-

gineering and redesign. His recovery can make the

maintenance of source code, more easy and enhance

her existing analysis tools by bringing program under-

standing to the design level. Regarding its important

role in improving program comprehension and re-

engineering, design pattern detection became a more

and more active research ﬁeld and has observed in re-

cent years, a continual improvement in the ﬁeld of

automatic detection. However, there are many difﬁ-

culties in detecting practical design patterns that arise

from the following:

• The non-formalization of the pattern, the variety

in implementations of source codes, and the in-

creasing complexity of software projects

• The design structure does not match the intent:

Even though the identiﬁed pattern instances cor-

respond to their structure, the design intent can be

broken. Which can be considered as the main rea-

son behind the increase in the false positive rate.

Most existing methods convert source code and de-

sign patterns into intermediate representations, such

as rules, models, graphs, products, and languages.

Using these intermediate representations makes it

easier to extract structural elements such as classes,

properties, methods, etc. from the source code. How-

ever, they show poor performance compared to meth-

ods based on extracted features. At the same time,

structural analysis cannot recover the pattern intent of

its different representations, so it is necessary to per-

form semantic analysis on the source code to improve

226

Nacef, A., Bahroun, S., Khalfallah, A. and Ben Ahmed, S.

Features and Supervised Machine Learning Based Method for Singleton Design Pattern Variants Detection.

DOI: 10.5220/0011992100003464

In Proceedings of the 18th International Conference on Evaluation of Novel Approaches to Software Engineer ing (ENASE 2023), pages 226-237

ISBN: 978-989-758-647-7; ISSN: 2184-4895

 2023 by SCITEPRESS – Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)

the accuracy of the recovered model. On the other

hand, extracting semantic information from source

code is still a difﬁcult task due to the complexity

and diverse representation of the code. Therefore,

to solve this problem, we are based on our previous

work (Nacef et al., 2022) to deﬁne singleton variants

rules and create speciﬁc data to recover each variant

from the source code. We have proposed a set of fea-

tures for identifying singleton patterns based on a de-

tailed analysis of the variant’s structure and behaviors.

For analyzing the Java program, we apply syntactical

and semantic analysis by the use of LSTM model to

extract semantic information (features). The LSTM

model is trained by speciﬁcally structured data corre-

sponding to each feature for a classiﬁcation task.

The main goal of our work is the detection of

the singleton design pattern. The SD role has not

been limited only to the recovery of singleton vari-

ants (including non-standard variants and their com-

binations), but also incoherent implementations with

the singleton intent in order to verify that the single-

ton pattern has been implemented correctly. While

designers understand patterns well, developers may

not have as much experience. This can lead to incor-

rect implementation of the pattern or the possibility

of later introducing coding errors that break the pat-

tern stage. The recovery of incorrect implementation

makes possible the refactoring task. If a singleton pat-

tern is detected, the SD can specify the type of the

implemented variant.

In this paper, we propose a Feature-Based Single-

ton Design Pattern Detection approach that uses a set

of features extracted from both syntactical and seman-

tic analyses. We create speciﬁc structured data based

on the feature’s combination values corresponding to

each singleton variant to train a supervised ML model

named SD. Trying to construct data containing vari-

ous implementations, we ameliorate the learning pro-

cess of the SD to recover any implementation of the

singleton pattern. The proposed approach is based on

the detailed analysis of the pattern previously realized

and selected features in (Nacef et al., 2022) to extract

rules for pattern identiﬁcation in the goal to create

the data. This dataset serves as training data to make

learning the SD model. Then we evaluate the pro-

posed approach with a labeled dataset collected from

PMART (Gu

eneuc, 2007), (Gu

eneuc, 2007),

DPB (Fontana et al., 2012), DPDf-corpus (Nazar

et al., 2022) and 94 Java ﬁles extracted from a pub-

licly available GitHub Java Corpus. The detector

makes a very good result, with higher accuracy com-

pared to the state-of-the-art approaches.

The Contribution of the paper can be summarized

as follows:

• We introduce a novel approach called “Singleton

Design Pattern Variant Detection using features

and Supervised Machine Learning” (SD) that use

33 features to recover Singleton pattern variants.

• We create a structured data named DTSD for

training the SD classiﬁer. the DTSD contains

7000 samples (combination of features values).

• We build two labeled data for evaluating the SD;

the ﬁrst is extracted from DPDf-Corpus (we take

only ﬁles with singleton implementation) named

DPDf 2 and we insert the missing variants. The

second is building from the singleton pattern ﬁle

existing in P-MART, DPB and DPDf Corpus.

• We proved that our approach outperforms similar

existing approaches with a substantial margin in

terms of standard measures.

The rest of the paper is organized as follows; we

present the related work and the contribution made to

them in Section2. We indicate the reported singleton

variants and the highlighted features in Section3. In

section 4, we discuss the relevant background of the

related technologies and we present the proposed ap-

proach. Section 5 presents obtained results. An eval-

uation of different ML algorithms and a comparison

between our study and the state-of-the-art are realized

and discussed. Finally, in Section 6 a conclusion and

future work is presented.

2 RELATED WORK

In recent years, design pattern detection became a

more and more active research domain. The problem

of recovering design patterns from the source code

has been faced and discussed in several works. Many

strategies and many techniques are used.

The majority of design patterns mining ap-

proaches transform the source code and design pat-

terns into some intermediate representations such as

an abstract semantic graph, abstract syntax tree, rules,

grammar, etc. . . . The searching methods diversify

also from one to another, and can be classiﬁed as met-

rics, constraint resolver, database queries, eXtended

Positional Grammar (XPG), etc. . . .

Several approaches (Wegrzynowicz and Stencel,

2013),(Combemale et al., 2021), (Ahram, 2021) use

database queries as a technique for extracting pat-

terns. They use Structured Query Language (SQL)

queries to extract pattern-related information, and

produce an intermediate representation of the source

code. In this case, the performances depend enor-

mously on the underlying database and can be scaled

Features and Supervised Machine Learning Based Method for Singleton Design Pattern Variants Detection

227

very well. However, queries are limited to the avail-

able information existing in the intermediate repre-

sentations.

Techniques proposed by (von Detten and Becker,

2011), (Kim and Boldyreff, 2000) use program-

related metrics (e.g., aggregations, generalizations,

associations, and interface hierarchies) from differ-

ent source code representations. The detection of DP

is based on comparing DP and code source metrics

values. This method is computationally efﬁcient be-

cause it reduces the search space through ﬁltration

(Gu

eneuc et al., 2010). An advanced step proposed

by (Satoru Uchiyama, ) combine software metrics and

machine learning to identify candidates for the roles

that compose.

Other techniques are based on graph representa-

tion. The detection approach proposed by (Balanyi

and Ferenc, 2003) combines graph and software met-

rics to perform the recognition process. First, a set

of candidate classes for each DP role are identiﬁed

based on software metrics. These metrics were cho-

sen based on the theoretical description of the DP,

and they were used to establish clear logical rules

that could lead to many false positives. In the second

stage, all candidate class combinations are analyzed

in detail to ﬁnd DP matches. To improve the accuracy

of results, ML methods (decision trees and artiﬁcial

neural networks) are used to ﬁlter as many as possible

false and distinguish similar patterns (Ferenc et al.,

2005). Other work proposed by (Zanoni et al., 2015)

develop a MARPLE tool (Fontana and Zanoni, 2011)

which exploited a combination of graph matching and

ML techniques.

Inﬂuenced by the work given by (Tsantalis et al.,

2006), (Thaller et al., 2019) propose a feature Maps

for pattern instances based neural networks. (Chihada

et al., 2015) propose a design pattern detector that

learns based on the information extracted from each

pattern instance. They treat the design pattern recog-

nition problem as a learning problem. However, re-

cent work proposed by (Hussain et al., 2018) treats the

problem as text categorization, in which they leverage

deep learning algorithms for organizing and selecting

DPs. Recently, (Nazar et al., 2022) selected 15 feature

codes and use machine learning classiﬁers to auto-

matically train a design pattern detector. Another re-

cent work proposed in (Barbudo et al., 2021) present

a novel machine learning-based approach for DPD

named GEML. Like other work, the used method ex-

plores the ML capacity, but (Barbudo et al., 2021) ad-

dressed their limitations by using G3P as a basis of

the proposed approach.

Though we base on ML to extract information

from source code and to detect singleton variant, our

approach differs from the other mentioned approaches

in many ways. The difference that can exist is sum-

marized as under.

• (Stencel and Wegrzynowicz, ) and other work, de-

tect many variants of singleton pattern. However,

we are the ﬁrst to detect non-only the different

variants, but its possible combination and incoher-

ent implementations that inhibit the pattern intent,

with their corresponding names.

• Our approach use ML like many other ap-

proaches, but it is the ﬁrst to create singleton spe-

ciﬁc dataset for training the model (whether at the

level of code analysis or pattern extraction), which

performs the model to better training, i.e. better

results, and make easy to ﬁlter false positive.

• As (Hussain et al., 2018), we use deep learning

algorithms for text categorization, but extracting

a pattern from direct source code is a very hard

task and needs an enormous number of data be-

cause there are many non-standard implementa-

tions. In our work, we consider text categorization

as the ﬁrst step in which we extract needed infor-

mation from the source code. This information

represents features that describe the pattern struc-

ture and behavior. The use of features reduces

the search space, and the size of training data in-

creases the prediction rate and decreases the false

positive number.

• (Nazar et al., 2022) use 15 source code features to

identify 12 Design Patterns, (Fontana and Zanoni,

2011) utilize code metrics and (Thaller et al.,

2019) use feature maps. However, we employ 33

features, especially for the singleton pattern. The

use of speciﬁc features gives a detailed deﬁnition

of the pattern and allows the identiﬁcation of non-

standard implementations.

• The data dedicated to the SD has a size of 7000.

However, P-MART corpus includes 1039 ﬁles,

and DPDf uses a corpus with 1300 ﬁles for train-

ing the classiﬁer to recover many design patterns

with imbalanced nature. The Corpus Used in

both is used for training and evaluating the model.

The number of singleton patterns existing in P-

MART and DPDf corpus is respectively 12, and

100 which is not entirely satisfactory to recover

all implementations.

• Our SD based on extracted features from struc-

tural and semantic analysis of source code and

ML techniques achieved approximately 98% of

precision and recall and prove its capability to

recover any non-standard variant and ﬁlter false

positives. However, DPDf and GEML did not

ENASE 2023 - 18th International Conference on Evaluation of Novel Approaches to Software Engineering

228

exceed 80% in terms of precision, recall and F1

Score.

3 REPORTED VARIANTS AND

PROPOSED FEATURES

The singleton design pattern is used to ensure that a

class has only one instance. That means restricting

class instantiation to a single object (or even to a few

objects only) in a system and providing a global ac-

cess point to it. The recognition of instances of sin-

gleton patterns in source code is difﬁcult, caused by

the different implementations, which are also not for-

mally deﬁned. So that we depart from the singleton

pattern analysis and their variants presented in (Nacef

et al., 2022).

3.1 Reported Variants

Based on the proposed singleton variants deﬁned in

(Gamma et al., 1994), (Stencel and Wegrzynowicz, ),

our approach can effectively recognize the different

variants represented in table 1 and their combinations

form.

Table 1: Reported Singleton variants.

Singleton variants

Eager Instantiation Lazy Instantiation

Placeholder Replaceable Instance

Subclassed Singleton Delegated Construction

Different Access Point Limiton

Social Singleton Generic Singleton

3.2 Selected Features

For the singleton pattern detection, we have to use the

33 features proposed by (Nacef et al., 2022) resulting

from the singleton variants analysis.

This speciﬁc analysis allowed the extraction of the es-

sential information for each variant signature. These

features are presented in table 2.

If we just look at its canonical implementation

(which is quite simple), the intent of the singleton

pattern seems simple. However, careful analysis of

the structure may lead to further constraints being de-

ﬁned. These characteristics reﬂect information that

has forty effects on preserving the singleton intent.

The only way to keep a singleton instance for future

reuse is to store it as a global static variable. With

this, we should check the correctness of the different

variants and count the number of class attributes and

method-generating instances. If the number exceeds

Table 2: Used Features.

Abb. Feature

IRE Inheritance relationship (extends)

IRI Inheritance relationship (implements)

Class accessibility (public, abstract, ﬁ-

nal)

GOD Global class attribute declaration

AA Class attribute accessibility

SR Static class attribute

ON Have only one class attribute

COA Constructor accessibility

HC Hidden Constructor

ILC Instantiate when loading class

GAM Global accessor method

PSI Public Static accessor method

GSM Global setter method

PST Public static setter method

INC Use of inner class

EC Use External Class

RINIC Returning instance by the inner class

RINEC Returning instance by the External class

CS Control instantiation

HGM Have one method to generate instance

DC Double check locking

Return reference of the Singleton in-

stance

CNI

Variable to count the number of in-

stances

CII Create internal static read-only instance

DM Use delegated method

GMS Global accessor synchronized method

IGO Initializing global class attribute

LNI Limit the number of instances

SCI Use string to create instance

SB Static Block

AFL Allowed Friend List

CAFB Control access to friend behavior

Using Singleton class as a type for

generic instantiation

one, then the structure is incorrect and the intention is

inhibited (case of listing 1 and 2.

The common conditions that must be veriﬁed in

most singleton variants are:

• The global access point to get the instance.

• The Constructor modiﬁer to restrict its accessibil-

ity.

• The Control of instantiation; verifying the exis-

tence of conditions that limit the number of cre-

ated instances.

• The veriﬁcation of the number of declared class

attributes and the number of method-creating in-

stances to only one.

Features and Supervised Machine Learning Based Method for Singleton Design Pattern Variants Detection

229

Second, speciﬁc information for each variant should

be veriﬁed. In table 3, we categorize features with

corresponding variants. Relevant information needed

for the identiﬁcation of each variant is regrouped.

This grouping strongly helps SD data creation.

Table 3: Features corresponding to each variant.

Singleton Variants Features

Eager Implementation ILC, SB

Lazy Instantiation

GAM, PSI, CS, HC, DC,

GMS

Different Placeholder INC, RINIC

Replaceable Instance PST, GSM

Subclassed Singleton CA, IRE

Different Access Point EC, RINEC

Delegated Construction DM

Limiton LNI

Social Singleton IRI, AFL, CAFB

Generic Singleton TUR

4 PROPOSED APPROACH

In this section, we will represent the used techniques

and discuss the singleton detection process. Finally,

we are gone to give details about the created data used

for the training phase.

4.1 ML Used Techniques

ML is a subset of artiﬁcial intelligence (AI) that fo-

cuses on creating systems that can learn from data,

identify patterns and make decisions with minimal

human intervention. The power of ML is its capa-

bility to automatically improve performance through

experience. ML have penetrated every aspect of our

lives, and made the hard task easier to resolve, with

higher accuracy.

Algorithms are the engines of ML. In general, two

main types of ML algorithms are used today: super-

vised learning and unsupervised learning. The differ-

ence between the two is deﬁned by the method used

to process the data to make results, the type of input

and output data, and the task that they intend to solve.

In our work, we use the supervised learning type;

which is supplied with information about several en-

tities whose class membership is known and which

produce from this a characterization of each class. In

supervised learning, there are two major types, named

regression and classiﬁcation. In our work, tasks real-

ized have a classiﬁcation type. The ﬁrst step realized

(Nacef et al., 2022) consist of analyzing Java program

by the use of (RNN-LSTM) (Sherstinsky, 2018) to

extract the value of features with binary or multi-class

classiﬁcation. In the second phase, we use the pre-

vious analysis of the singleton pattern and feature’s

value to create labeled structured data for training the

SD classiﬁer. The SD carries out a multi-class classi-

ﬁcation task; each class represents a singleton variant

(if a singleton pattern is implemented) or none other-

wise. For the SD, we built various ML classiﬁers such

as Random Forest (RF) (Elmahdy et al., 2021), Gra-

dient Boosted Tree (GBT) (Murphy, 2012), SVM

(M.Schnyer, 2020), KNN (Peterson, 2009), and Neu-

ral Network (NN) (Doya and Wang, 2022) intending

to compare their results.

Figure 1 illustrates the proposed approach pro-

cess. In the following subsections, we will discuss

these phases one by one.

4.2 First Phase

The detailed analyses of singleton variants make eas-

ier the deﬁning of efﬁcient information and the con-

struction of the training dataset. In (Nacef et al.,

2022) a detailed analysis of the singleton pattern is

realized, a set of variants are identiﬁed, and a 33 rele-

vant features are selected. For extracting feature val-

ues from the Java program, a syntactical and seman-

tic analysis is applied by the use of LSTM.Based on

the analysis and the feature’s value extracted from the

Java program, we deﬁne rules for every singleton vari-

ant, and then we construct the dataset for training the

SD.

4.3 Second Phase

The analysis already done was very useful in terms

of deﬁning the rules for each implementation variant,

which will be the key behind the construction of the

training data DTSD. The strength of our approach is

that we use speciﬁc features, as well as the use of our

own created data.

• Dataset Creation Process. As we know, the data

is the essence of ML, then the more large and

more diverse the dataset is, the better results will

be obtained. If the classiﬁer is trained by com-

pleted and diversiﬁed examples, it will be more

able to predict correct instances, and achieve a

higher accuracy result. Based on the important

role of the data in building a performed model,

we gave importance to the creation of the training

dataset. Contrarily to other approaches, we don’t

limit ourselves to the existing implementation ex-

tracted from the considerate benchmark corpus,

because many implementations can be missed,

or we can have an imbalance in their numbers

(dominance of some implementations over others)

ENASE 2023 - 18th International Conference on Evaluation of Novel Approaches to Software Engineering

230

Figure 1: Flowchart of the proposed approach.

which causes an unfair training process and can

lead to the incapability of the model to recover

all existing implementations. So for training the

model, we create a dataset DTSD based on rules

extracted from singleton variants analyses, and for

evaluating the model, we use other created data

based on DPDf singleton corpus.

• Deﬁning Rules. We have identiﬁed 27 instances

according to different singleton variants. The

class candidates are represented in table 4. Based

on speciﬁc information about each one, we illus-

trate rules to deﬁne them. The rules represent a

combination of values of features. The SD will

be made by learning from information extracted

from the DTSD data. Table 5 shows an exam-

ple of important feature values that made each

candidate instance true. Noting that, based on

these most important features, we can create sev-

eral implementations by playing on the other fea-

tures values (we must respect the identity of each

variant).

To achieve high recall (which will be explained

in the next section), we need to reduce the num-

ber of false-negative predictions. For a singleton

design pattern detection, this leads to creating a

dataset that comports numerous implementation

variants that preserve the meaning of the pattern,

and those that can destroy the intent. As an ex-

ample, the only way to keep a singleton instance

for future reuse is to store it as a global static vari-

able, so that we should verify that there is only

one declared class attribute in different variants

(ON). In the case of different placeholder and dif-

ferent access point variants, the instance is held

as a static attribute of an inner class or external

class, so we should verify the existence of a static

class attribute inside both. Another example, as

the intent of a singleton pattern is to limit the

number of objects to only one (exception Limi-

ton variant), we must verify that there is only one

block for creating an Instance (HGM). The fact

that one of the two features (ON/HGM) is false,

the implementation will be considered an incor-

rect singleton structure (as shown in table 11).

This error inhibits the singleton intent and can

provoke false-negative predictions. This type of

error can be caused by a developer’s unconscious-

ness during the implementation, so we decided to

consider it, and detect similar implementations if

they exist. Listing 1 and 2 represents an exam-

ple of incorrect lazy and eager implementation,

which inhibits the singleton pattern intent. Ta-

ble 5,6,7,8,9,10 present example of rules deﬁning

some singleton variants and table 11 represents an

example of a combination of feature values mak-

ing the implementation error.

Listing 1: Example 1: Singleton Error Implementations.

P u b l i c c l a s s C1 {

P r i v a t e s t a t i c C1 i n s t a n c e = new C1

( ) ;

C1 ( ) {}

p u b l i c s t a t i c C1 g e t I n s t a n c e ( ) {

R e t u r n ( new C1 ( ) ) ; }

}

Listing 2: Example 2: Singleton Error Implementations.

P u b l i c c l a s s C2 {

P r i v a t e s t a t i c C2 i n s t a n c e ;

P r i v a t e C2 ( ) {}

P u b l i c s t a t i c C2 g e t I n s t a n c e 1 ( ) {

i f ( i n s t a n c e = = n u l l ) i n s t a n c e =

new C2 ;

}

P u b l i c s t a t i c C2 g e t I n s t a n c e 2 ( )

{ r e t u r n new C2 ; }

}

Features and Supervised Machine Learning Based Method for Singleton Design Pattern Variants Detection

231

Table 4: SD Classiﬁer classes.

Singleton Variants Classes NO.

Eager Implementation

Simple Implementation 1

Implementation With static Block 2

Simple Implementation-Invalid Implementation 3

Implementation With static Block-Invalid Implementation 4

Lazy Instantiation

Singleton naif 5

Non-thread safe 6

Thread safe with synchronized method 7

Lazy instantiation double lock mechanism 8

Singleton naif-Invalid Implementation 9

Non thread safe-Invalid Implementation 10

Thread safe with synchronized method-Invalid Implementation 11

Lazy instantiation double lock mechanism-Invalid Implementation 12

Different Placeholder

Different Placeholder 13

Different Placeholder-Invalid Implementation 14

Replaceable Instance

Replaceable Instance 15

Replaceable Instance-Invalid Implementation 16

Subclassed Singleton

Subclassed Singleton 17

Subclassed Singleton-Invalid Implementation 18

Different Access Point

Different Access Point 19

Different Access Point -Invalid Implementation 20

Delegated Construction

Delegated Construction 21

Delegated Construction -Invalid Implementation 22

Limiton

Limiton 23

Limiton-Invalid Implementation 24

Social Singleton Social Singleton 25

Generic Singleton Generic Singleton 26

No Singleton No Singleton 27

Table 5: Data extract: Example of features combination for Eager Singleton variant.

Class Combination Values

CA GOD AA SR COA ILC GAM PSI RR SB ON HGM

1 public True Private True private True True True True False True True

2 Public True Private True private True True True True True True True

Table 6: Data extract: Example of features combination for Lazy Instantiation Singleton variant.

Class

Combination Values

CA GOD AA SR COA ILC GAM PSI RR CS DC GMS ON HGM

6 Public True Private True private False True True True True False False True True

7 Public True Private True Private False True True True True False True True True

Table 7: Data extract: Example of features combination for Different Placeholder Singleton variant.

Class

Combination Values

CA COA ILC GAM PSI RR INC RINIC ON HGM

13 public Private False True True True True True True True

Table 8: Data extract: Example of features combination for Replaceable Instance Singleton variant.

Class

Combination Values

CA GAD AA SR COA GAM PSI RR GSM PSS ON HGM

15 public True Private True Private True True True True True True True

ENASE 2023 - 18th International Conference on Evaluation of Novel Approaches to Software Engineering

232

Table 9: Data extract: Example of features combination for Delegated Construction Singleton variant.

Class

Combination Values

CA GOD AA COA CS GAM PSI RR DM ON HGM

21 public True Private Private True True True True True True True

Table 10: Data extract: Example of features combination for Social Singleton variant.

Class

Combination Values

CA COA IRI AFL CAFB

25 Public Private True True True

Table 11: Data extract: Example of features combination for Eager Invalid Implementation Singleton variant.

Class

Combination Values

CA GOD AA SR COA ILC GAM PSI RR SB ON HGM

3 public True private True private True True True True False False False

4 public True private True private True True True True True False False

• Building Singleton Variants Classiﬁer. The use

of an ML algorithm depends on the type of data

to treat and generate, and the type of realized task.

A singleton design pattern detection is a classiﬁ-

cation problem, with structured labeled data. To

deal with this typical problem, we can use dif-

ferent algorithms. In this case, the better model

to use is the one that gives a better result. We

choose to use ﬁve different algorithms the most

already used in classiﬁcation tasks; RF, GBT,

SVM, KNN, and NN.

5 EVALUATION SETUP

This section presents the criteria used to evaluate our

SD and the different results made by it. We compared

results generated from each model, and interpret and

compared them with state-of-the-art approaches.

5.1 Evaluation Protocol

The standard measures to statistically evaluate the ef-

ﬁcacy of classiﬁers are Precision, Recall, and the F1-

Score. Prediction: The prediction rate indicates the

fraction of positive prediction which was actually cor-

rect. It is deﬁned in 1:

Precision =

T P

T P +FP

(1)

Recall: The recall indicates the fraction of actual pos-

itives which were identiﬁed correctly. It is deﬁned in

Recall =

T P

T P +FN

(2)

The F1 score is a way to combine both precision and

recall into a single number. It represents a harmonic

mean of both scores, which is given by this simple

formula 3:

F1Score =

2 ∗ (precision ∗ recall)

(precision + recall)

(3)

5.2 Performed Results

We have created a classiﬁer based on different ML

models. After training the model we have tested it

with the 200 GitHub Java classes referred by DPDf

Corpus proposed by (Nazar et al., 2022) 100 ﬁles

contain singleton implementation and 100 ﬁles do not

contain any type of design patterns i.e. none.

Unfortunately, the corpora used do not contain all

singleton variants, which makes the evaluation of the

SD not completed. To ensure the performance of the

detector in recognizing each variant; we collect other

GitHub Java classes with missing variants. Next, we

took these collected classes and injected blocs that in-

hibit the pattern intention, in order to verify the ability

of the SD classiﬁer to recognize also the incorrect im-

plementations.

After collecting the evaluation set, we labeled

them with the corresponding singleton variant name.

In the ﬁrst step, we analyze the existing java class with

the LSTM classiﬁer to determine every feature’s val-

ues. In the second step, we evaluate the SD classiﬁer

by the use of the resulting dataset containing the fea-

ture’s values. The corresponding results of the SD

classiﬁer are illustrated in table 12.

All used ML algorithms make very good results

thanks to the use of speciﬁc training datasets. Com-

paring the results of each ML algorithm used in the

SD classiﬁer, we can see that all used algorithms are

performed and make closed results. SVM model has

performed excellent results in terms of precision, re-

call, and F1-score, and can correctly identify any Sin-

Features and Supervised Machine Learning Based Method for Singleton Design Pattern Variants Detection

233

Table 12: SD Classiﬁer results.

SD Classiﬁers

Measures (%)

Precision Rappel F1

RF 96.24 96.47 95.81

GBT 98.85 98.65 98.64

KNN 92.3 87.05 86.55

SVM 99.49 99.42 99.41

NN 98.3 98.7 98.49

Figure 2: KNN precision results.

gleton candidate with 100% of precision as shown in

ﬁgure 3 (except one candidate 87%). However, the

KNN algorithm is the classiﬁer that makes fewer re-

sults. Contrary to SVM, KNN fails to detect correctly

some Variant like Lazy instantiation with double lock

mechanism and Replaceable Instance with less than

70% of precision like showing in ﬁgure 2.

5.3 Comparison with Similar Existing

Approaches

In this work, we propose a machine learning-based

method to recover the singleton pattern. To posi-

tion our work against state-of-the-art studies, we se-

lect two recent relevant approaches working also with

Machine Learning. The ﬁrst is proposed by (Barbudo

et al., 2021) named GEML and the second is pro-

posed by (Nazar et al., 2022) named DPDf.

Figure 3: SVM precision results.

5.3.1 The Benchmark DP Detection Approaches

• GEML Approach

GEML is a novel DPD approach based on

ML and grammar-guided genetic programming

(G3P). By the use of software properties, GEML

extracts DP characteristics formulated in terms of

human-readable rules. Then a machine learning

classiﬁer is built based on the established rules to

recover 5 DP roles.

In (Barbudo et al., 2021) a comparison between

GEML and other DPD methods, including both

ML and non-ML are realized. GEML outper-

form MARPLE techniques (Zanoni et al., 2015)

with highly values in terms of accuracy and f1

score. It’s also more competitive than two other

reference DPD tools which are frequently used

for comparative purposes (MLDA, SparT and De-

PATOS).

The experimentation in GEML covers 15 roles of

DP and uses implementations from two reposi-

tories to compare obtained results against other

works. The singleton repository details are:

– DPB: created by the authors of (Fontana et al.,

2012) and used to compare the results with

MARPLE.

– JHotDraw from PMART (Gu

eneuc, 2007)

used to compare with non ML-based methods

like DePATOS (Yu et al., 2018), MLDA (Al-

Obeidallah et al., 2018) and SparT (Xiong and

Li, 2019).

Table 13 illustrate the comparison results of

GEML against other DPD studies. The compar-

ison is conducted based on common ground truth.

Four methods are under study. The use of the

”-” symbol, is to indicate that the particular DP

is not supported by the corresponding approach.

The ﬁrst comparison is based on the DPB corpus.

Both GEML and MARPLE perform good results

in singleton detection, but GEML outperforms

MARPLE by more than 7% and 3% of improves

in terms of accuracy and F1 score. The second

comparison is based on the JHotDraw project.

GEML, MLDA, and SparT can recover all single-

ton instances. Their great performance is related

to the reduced number of true Singleton instances

in the test data.

• DPDf Approach

The approach proposed in (Nazar et al., 2022) is

the ﬁrst to employ lexical-based code features and

ML to recover a wide range of design patterns

with higher accuracy compared to the state-of-the-

art. Our approach also combines semantic anal-

ENASE 2023 - 18th International Conference on Evaluation of Novel Approaches to Software Engineering

234

Table 13: Comparing GEML with the state-of-the-art results.

ML techniques

Singleton corpus

DPB-Corpus JHotDraw

Accuracy (%) F1-score (%) Precision (%) Recall (%) F1-score (%)

MARPLE 88 91 - - -

GEML 95.61 94.11 100 100 100

DePATOS - - - - -

MLDA - - 100 100 100

SparT - - 100 100 100

ysis, features, and ML algorithms as techniques,

but we have focused only on Singleton Variants

recovering. Based on the relevancy measure, we

chose to compare our study with the study pro-

posed in (Nazar et al., 2022).

DPDf. Developed in (Nazar et al., 2022) gener-

ates a Software Syntactic and Lexical Represen-

tation (SSLR) by building a call graph and ex-

tracting 15 source code features. The SSLR is

used as an input to build a word-space geometri-

cal model by applying the Word2Vec algorithm.

Then, a DPDf ML classiﬁer is created and trained

by a labeled dataset and geometrical model.

DPDf Reported Results. Nazar et al. has com-

pared his study with two approaches based on

code features and ML, which are developed by

(Thaller et al., 2019) and (Fontana and Zanoni,

2011). For comparing the results, they uses two

benchmark Corpus:

– P-MART Corpus; containing only 12 Single-

ton implemented classes.

– DPDf Corpus; Contain 100 singleton imple-

mented class.

The compared results, reported by (Nazar et al.,

2022) in the detection of the singleton design pat-

tern, are illustrated in table 14. DPDf has im-

proved more performance in recovering singleton

candidates from the DPDf-corpus but has fairly

recovered it from the P-MART Corpus. These

unbalanced results are caused by the number of

instances of singleton existing in each corpus,

which have a great impact on the learning of the

classiﬁer.

5.3.2 Comparison Against GEML and DPDf

Methods

• Comparison Strategy

Results made by (Nazar et al., 2022) and (Barbudo

et al., 2021) do not refer exactly to the performance of

both approaches to recover singleton pattern instance

because other patterns are included. Therefore, we

have tested the DPDf and GEML with singleton spe-

ciﬁc data. The P-MART Corpus contains a few in-

stances of singleton pattern, contrarily to DPDf and

DPB corpus. Otherwise, they represent a none com-

plete data; some variants are absent. Consequently,

we construct new data for the evaluation; we bring

together all singleton instances existing in PMART,

DPB and DPDf repositories, and complete the data

with missing variants. We also construct an inco-

herent structure by injecting, to correct instances, a

structure that inhibits the singleton intent. We try to

evaluate with complete data containing a variety of

implementations. Details of constructed data are pre-

sented in table 15, with the according to the number

of correct, incorrect, and incoherent samples in each

repository.

• Comparison Results

We reproduce the public works of (Nazar et al., 2022)

and (Barbudo et al., 2021) with only singleton pattern.

Then, we evaluate them by the newly created data.

The experiment results are conducted by the use of

a RF classiﬁer in the evaluation process. Table 16

illustrate obtained results from the experiments.

• Discussion

The comparative results illustrated in table 16 show

the performance of the SD to recover any variant of

the singleton pattern, whatever the implementation is.

The SD reaches the best result (98% of F1 Score),

and outperforms GEML and DPDf by respectively

21% and 24% improvements in terms of F1 Score.

Both GEML and DPDf use approximately a small

sample for training the model, which is not enough

for the best training. The classiﬁer in this case is not

able to recover all singleton implementations. His ca-

pacity enormously depends on the variants existing in

the training data, new variants which not fully trained

cannot be detected. However, by the use of complete

data that is well created as in the case of SD (7000

samples), the classiﬁer will be better trained, and al-

ways gives a great performance in the detection pro-

cess. Our SD has the ability not only to recognize the

existence of singleton patterns, but also the type of

implementation. It has also the capacity of recovering

Features and Supervised Machine Learning Based Method for Singleton Design Pattern Variants Detection

235

Table 14: Comparing DPDf results with MARPLE and Feature Maps.

ML techniques

Singleton Corpus

DPDF Corpus Labelled P-MART

Precision (%) Recall (%) F1-score (%) Precision (%) Recall (%) F1-score (%)

Feature Maps 65 67 65.98 63 59 60.93

MARPLE 74.24 69.23 71 74.23 70.18 72.15

DPDf 81.6 68.22 74.31 43.33 40 41.6

Table 15: Evaluating Data Composition.

Singleton Corpus

PMART DPB DPDf Others

Correct 12 58 100 53

Incorrect - 96 100 -

Incoherent - - - 41

Total size 460

Table 16: Comparing SD with DPDf and GEML

Approaches Measures (%)

Precision Recall F1

DPDf 78.5 70.23 74.13

GEML 81.6 73.4 77.28

SD 98.96 97.63 98.29

incoherent implementations with the singleton intent.

The imbalanced results obtained from the differ-

ent DPD methods are explained by the number and

the variety of implementations existing in the training

data. There is another factor that strongly affects the

results, which is the characteristics of the source code

which are highlighted. DPDf approach uses only 15

features to deﬁne 12 design patterns, GEML propose

23 Grammar operators to describe a variety of design

pattern implementations. However, in our work, we

use 33 features for deﬁning only the singleton pattern.

These enormous numbers of features make a detailed

description of the pattern and provide all information

needed to identify any variant.

6 CONCLUSION AND FUTURE

WORK

We propose in this paper a new singleton design

pattern approach based on features and ML tech-

niques. We are based on previous work (Nacef

et al., 2022), and different proposed features to cre-

ate DTSD dataset. DTSD is used to train the SD

classiﬁer, in which we try to make rules deﬁning a

various number of singleton implementations. In the

dataset, we give a combination of feature values to

categorize each variant. We have tried to be sure that

the data contain the most number of implementations

to perfectly train the model. Next, we build a Single-

ton Detector based on different ML classiﬁers. We

trained the supervised classiﬁer by the labeled dataset

DTSD, and we evaluate its performance with other

data collected from the GitHub Java corpus, and sin-

gleton variants existing in DPDf, DPB and P-MART

corpus.

We apply three standard statistical measures

namely precision, recall, and F1-Score to evaluate the

performance of the created classiﬁers. A comparison

between different used ML algorithms to create the

SD is realized. The different results show that the

use of the created dataset makes any classiﬁer eas-

ily able to recover any variants of the singleton pat-

tern although there is a slight difference in perfor-

mance. The Empirical result shows that our proposed

approach can recover any non-standard singleton vari-

ant, even incorrect implementation destroyed the sin-

gleton intent with approximately 99% on both preci-

sion and recall. Our approach outperforms recent rel-

evance approaches by more than 20% improvements

in terms of standard measures.

While our classiﬁer’s performance is promising,

as furfur work we gonna applies it to recover a wide

range of software design patterns. The choice of

source code as refactoring support is very important

and interesting, but we should not limit ourselves only

to this support, we try to switch to the model as refac-

toring support.

REFERENCES

Ahram, T. Z., editor (2021). Advances in Artiﬁcial Intel-

ligence, Software and Systems Engineering, volume

1213 of Advances in Intelligent Systems and Comput-

ing. Springer.

Al-Obeidallah, M., Petridis, M., and Kapetanakis, S.

(2018). A multiple phases approach for design pat-

terns recovery based on structural and method signa-

ture features. Int. J. Softw. Innov., 6(3):36–52.

Balanyi, Z. and Ferenc, R. (2003). Mining design patterns

from C++ source code. In 19th International Confer-

ence on Software Maintenance (ICSM 2003), pages

305–314. IEEE Computer Society.

Barbudo, R., Ram

ırez, A., Servant, F., and Romero, J. R.

(2021). GEML: A grammar-based evolutionary ma-

ENASE 2023 - 18th International Conference on Evaluation of Novel Approaches to Software Engineering

236

chine learning approach for design-pattern detection.

J. Syst. Softw., 175:110919.

Chihada, A., Jalili, S., Hasheminejad, S. M. H., and Zan-

gooei, M. H. (2015). Source code and design confor-

mance, design pattern detection from source code by

classiﬁcation approach. Appl. Soft Comput., 26:357–

367.

Combemale, B., Kienzle, J., Mussbacher, G., Ali, H.,

Amyot, D., Bagherzadeh, M., Batot, E., Bencomo, N.,

Benni, B., Bruel, J., Cabot, J., Cheng, B. H. C., Collet,

P., Engels, G., Heinrich, R., J

equel, J., Koziolek, A.,

Mosser, S., Reussner, R. H., Sahraoui, H. A., Saini,

R., Sallou, J., Stinckwich, S., Syriani, E., and Wim-

mer, M. (2021). A hitchhiker’s guide to model-driven

engineering for data-centric systems. IEEE Softw.,

38(4):71–84.

Doya, K. and Wang, D. (2022). Announcement of the neural

networks best paper award. Neural Networks, 145:xix.

Elmahdy, S., Ali, T., and Mohamed, M. (2021). Regional

mapping of groundwater potential in ar rub al khali,

arabian peninsula using the classiﬁcation and regres-

sion trees model. Remote. Sens., 13(12):2300.

Ferenc, R., Besz

edes,

A., F

op, L. J., and Lele, J. (2005).

Design pattern mining enhanced by machine learning.

In 21st IEEE International Conference on Software

Maintenance (ICSM 2005), 25-30 September 2005,

Budapest, Hungary, pages 295–304. IEEE Computer

Society.

Fontana, F. A., Caracciolo, A., and Zanoni, M. (2012).

DPB: A benchmark for design pattern detection tools.

In 16th European Conference on Software Main-

tenance and Reengineering, pages 235–244. IEEE

Computer Society.

Fontana, F. A. and Zanoni, M. (2011). A tool for design

pattern detection and software architecture reconstruc-

tion. Inf. Sci., 181(7):1306–1324.

Yann-Ga

el Gu

eneuc P-MARt: Pattern-like Micro Archi-

tecture Repository.

Gamma, E., Helm, R., Johnson, R., and Vlissides, J. M.

(1994). Design Patterns: Elements of Reusable

Object-Oriented Software. Addison-Wesley Profes-

sional, 1 edition.

eneuc, Y., Guyomarc’h, J., and Sahraoui, H. A.

(2010). Improving design-pattern identiﬁcation: a

new approach and an exploratory study. Softw. Qual.

J., 18(1):145–174.

eneuc, Y.-G. (2007). P-mart : Pattern-like micro ar-

chitecture repository.

Hussain, S., Keung, J., Khan, A. A., Ahmad, A., Cuomo,

S., Piccialli, F., Jeon, G., and Akhunzada, A. (2018).

Implications of deep learning for the automation of

design patterns organization. J. Parallel Distributed

Comput., 117:256–266.

Kim, H. and Boldyreff, C. (2000). A method to recover de-

sign patterns using software product metrics. In Soft-

ware Reuse: Advances in Software Reusability, 6th In-

ternational Conerence, volume 1844 of Lecture Notes

in Computer Science, pages 318–335. Springer.

M.Schnyer, D. A. (2020). Support vector machine. In Ma-

chine Learning, pages 101–121. ScienceDirect.

Murphy, K. P. (2012). Machine learning - a probabilis-

tic perspective. Adaptive computation and machine

learning series. MIT Press.

Nacef, A., Khalfallah, A., Bahroun, S., and Ben Ahmed, S.

(2022). Deﬁning and extracting singleton design pat-

tern information from object-oriented software pro-

gram. In Advances in Computational Collective Intel-

ligence, pages 713–726, Cham. Springer International

Publishing.

Nazar, N., Aleti, A., and Zheng, Y. (2022). Feature-based

software design pattern detection. J. Syst. Softw.,

185:111179.

Peterson, L. E. (2009). K-nearest neighbor.

Satoru Uchiyama, Atsuto Kubo, H. W. Y. F. Detecting de-

sign patterns in object-oriented program source code

by using metrics and machine learning. Proceedings

of the 5th International Workshop on Software Qual-

ity and Maintainability.

Sherstinsky, A. (2018). Fundamentals of recurrent neural

network (RNN) and long short-term memory (LSTM)

network. CoRR, abs/1808.03314.

Stencel, K. and Wegrzynowicz, P. Implementation variants

of the singleton design pattern. In On the Move to

Meaningful Internet Systems, series = Lecture Notes

in Computer Science, volume = 5333, pages = 396–

406, publisher = Springer, year = 2008.

Thaller, H., Linsbauer, L., and Egyed, A. (2019). Fea-

ture maps: A comprehensible software representa-

tion for design pattern detection. In 26th IEEE Inter-

national Conference on Software Analysis, Evolution

and Reengineering, pages 207–217.

Tsantalis, N., Chatzigeorgiou, A., Stephanides, G., and

Halkidis, S. T. (2006). Design pattern detection us-

ing similarity scoring. IEEE Trans. Software Eng.,

32(11):896–909.

von Detten, M. and Becker, S. (2011). Combining clus-

tering and pattern detection for the reengineering of

component-based software systems. In 7th Interna-

tional Conference on the Quality of Software Archi-

tectures, pages 23–32. ACM.

Wegrzynowicz, P. and Stencel, K. (2013). Relaxing queries

to detect variants of design patterns. In Proceedings of

the 2013 Federated Conference on Computer Science

and Information Systems, Krak

ow, Poland, September

8-11, 2013, pages 1559–1566.

Xiong, R. and Li, B. (2019). Accurate design pattern

detection based on idiomatic implementation match-

ing in java language context. In 26th IEEE Inter-

national Conference on Software Analysis, Evolution

and Reengineering, pages 163–174. IEEE.

Yu, D., Zhang, P., Yang, J., Chen, Z., Liu, C., and Chen, J.

(2018). Efﬁciently detecting structural design pattern

instances based on ordered sequences. J. Syst. Softw.,

142:35–56.

Zanoni, M., Fontana, F. A., and Stella, F. (2015). On ap-

plying machine learning techniques for design pattern

detection. J. Syst. Softw., 103:102–117.

Features and Supervised Machine Learning Based Method for Singleton Design Pattern Variants Detection

237