Figure 2: 20 most frequent ML techniques used in SE domain. (Horizontal bar chart, axis from 0 to 70; the bar values recovered from the chart are listed below.)
Naive Bayes (NB): 63
Support Vector Machine (SVM): 57
Random Forest (RF): 54
K-Nearest Neighbor (KNN): 40
C4.5 (J48): 33
Multi Layer Perceptron (MLP): 31
Decision Tree (DT): 29
Artificial Neural Network (ANN): 29
Logistic Regression (LR): 28
Bayes Network (BN): 18
Radial Basis Function Network (RBF): 14
Regression Tree (CART): 13
Bagging (Ba): 9
AdaBoost (AB): 9
Boosting (Bo): 8
K-Means (KM): 8
Support Vector Regression (SVR): 7
Convolutional Neural Network (CNN): 7
Recurrent Neural Network (RNN): 7
OneR (OR): 7
problem domains presented, we identified 328 ML techniques and 35 tools used to assist in solving SE problems. We list ML techniques as described in the papers. Even though some techniques, such as neural networks (ANN, CNN, and RNN) and tree classification algorithms (C4.5 (J48), DT, and CART), could be grouped, we chose to keep the original descriptions found in the papers, so that the most used algorithms can be identified exactly. Figure 2 presents the 20 most used techniques and Figure 3 illustrates the 12 most used tools. Below we list the most frequently mentioned techniques and tools.
• Techniques/Methods:
1. Naïve Bayes (Rish et al., 2001) (36% of Papers): NB is a classifier that keeps statistics about each column of data in each class. New examples are classified by a statistical analysis that reports the class closest to the test case. NB classifiers are computationally efficient and tend to perform well on relatively small datasets. Papers: [R1, R2, R3, R5, R7, R11, R13, R14, R21, R23, R24, R31, R36-R38, R56, R57, R60, R64, R66, R73, R74, R75, R77, R82, R83, R88, R91, R97, R99, R100, R102, R105, R107, R110, R111, R115-R120, R122, R126, R128, R129, R131, R132, R142, R145-R150, R152, R154, R155, R161, R162, R164, R168, and R172].
2. Random Forest (Liaw et al., 2002) (30% of Papers): RF is an ensemble learning method that combines multiple weaker learners into a stronger learner. RF generalizes better and is less susceptible to overfitting than a single tree (Breiman, 2001). It can be used for both classification and regression problems. Papers: [R3, R4, R11, R13, R21, R31, R37, R39, R49, R51, R54, R56, R57, R59, R60, R61, R62, R65, R73, R75, R80, R82, R83, R88, R99, R100, R102, R104, R105, R109, R110, R112, R115-R119, R122, R123, R127, R128, R129, R131, R132, R134, R140, R145, R148-R150, R152, R159, and R171].
3. Support Vector Machine (Cristianini et al., 2000) (32% of Papers): SVM builds a linear discriminant function using a small number of critical boundary instances (called support vectors) from each class, ensuring the maximum possible separation (Liu et al., 2002). In the WEKA tool, SVM is available as SMO. Papers: [R1, R2, R4, R7, R11, R16, R22, R24, R31, R32, R38, R42, R51, R57, R59, R60, R61, R62, R63, R66, R67, R68, R70, R73, R74, R77, R80, R82, R83, R84, R86, R89, R90, R91, R94, R105, R108, R110, R111, R115, R119, R120, R122, R124, R125, R126, R128, R143, R145, R146, R161, R162, R164, R165, R171, R172, and R174].
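To make the Naïve Bayes description above more concrete, the idea of keeping statistics about each column of data in each class can be sketched as follows. This is only a minimal illustration, assuming numeric features modeled with per-class Gaussian distributions (one common variant, not necessarily the one used in the surveyed papers). The function names `fit_gaussian_nb` and `predict_gaussian_nb` and the toy data are hypothetical and do not come from the papers or from any tool listed here.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Collect the class prior and the per-column mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    model = {}
    total = len(rows)
    for label, members in by_class.items():
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            # A small floor keeps the variance strictly positive.
            var = sum((x - mean) ** 2 for x in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (len(members) / total, stats)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log-posterior for one example."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for x, (mean, var) in zip(row, stats):
            # Log of the Gaussian density, summed under the independence assumption.
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (lines changed, past defects) per module.
X = [(10.0, 0.0), (12.0, 1.0), (11.0, 0.0), (95.0, 6.0), (110.0, 8.0), (100.0, 7.0)]
y = ["clean", "clean", "clean", "buggy", "buggy", "buggy"]

model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, (105.0, 7.0)))
print(predict_gaussian_nb(model, (9.0, 0.0)))
```

Training only needs one pass over the data to accumulate the per-class statistics, which is consistent with the remark that NB classifiers are computationally efficient.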
• Tools:
1. WEKA (32% of Papers) (Hall et al., 2009): WEKA is a unified workbench that gives researchers access to previously implemented, easily configured ML techniques. Besides providing a toolbox of learning algorithms, WEKA also provides a framework so researchers can implement new algorithms without having to worry about the supporting infrastructure for data manipulation and scheme evaluation. Papers: [R1-R5, R7, R10, R14, R17, R19, R21, R23, R24, R27, R31, R34, R36, R47, R49, R51, R53, R54, R56, R57, R64, R67, R75, R77, R82-R86, R99-R101, R103, R109, R110, R117-R119, R121, R124, R128, R129, R131, R134, R135, R142, R145, R152, R154, R155, R161, R164, and R177].
2. MATLAB (9% of Papers)³: Platform for solving scientific and engineering problems. The matrix-based MATLAB language is used to express computational mathematics. It is applied in ML, signal processing, image processing, machine vision, communications, computational finance, control design, robotics, and many other fields. Papers: [R2, R3, R8, R35, R48, R55, R81, R85, R92, R93, R127, R140, R141, R144, R149, and R160].
3. SCIKIT-LEARN⁴ (8% of Papers): A Python
module that integrates a wide range of ML al-
gorithms for medium-scale supervised and un-
³ Available at: http://la.mathworks.com/
⁴ Available at: https://scikit-learn.org/stable
How Machine Learning Has Been Applied in Software Engineering?