
techniques. We identified fourteen studies that used
datasets based on open-source repositories (S03, S07,
S24, S32, S43, S44, S45, S46, S47, S61, S62, S68,
S70). EzzatiKarami and Madhavji (S20) use a pub-
lic requirements document (PURE dataset). Reahimi
et al. (S54) specify that the dataset was obtained in
a previous real-world project. Gu et al. (S22) se-
lected a dataset of an experiment concerning 91 ef-
fective cases of KLK2 elevator, among others.
Considering the type of document used as a dataset for research purposes, we found papers based on SRS documents (S09, S10, S20, S31) (5%), textual documents (S44, S61, S62) (4%), mobile app data (S04, S12, S46) (4%), review data (S25, S70) (3%), and books (S60) (1%). Once a dataset is gathered, it has to be processed to prepare the data for the ML algorithms. Most studies use the PROMISE dataset (26%); other datasets include iTrust (S24, S65), PURE (S02, S20), SQuAD (S03), and Aurora 2 (S47).
Regarding dataset size, several works use datasets ranging between 14 and 10,000 requirements (S04, S06, S07, S11, S12, S18, S19, S20, S21, S22, S24, S25, S28, S30, S31, S34, S35, S36, S43, S44, S50, S52, S53, S54, S55, S61, S62, S63, S64, S66, S68, S70, S73) (45%), while other works range from 10,000 to 7 million and more (S02, S03, S11, S39, S45, S57) (8%). Some methods consider continuous and categorical variables: five papers use categorical variables (S22, S24, S36, S44, S45), and one uses continuous variables (S22).
4.2.2 RQ2 Findings
In research on the classification of requirements using ML techniques, evaluating the results is an essential part of the process. In this regard, various metrics widely used in the analyzed works have been identified: Accuracy, Precision, Recall, and F-measure. This confirms that these four metrics are the most commonly used, as they are considered the most representative for evaluating an ML model.
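For reference, these four metrics follow their standard definitions in terms of the confusion-matrix counts (true positives TP, false positives FP, true negatives TN, false negatives FN):

\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{Precision} = \frac{TP}{TP + FP},
\]
\[
\text{Recall} = \frac{TP}{TP + FN}, \qquad
\text{F-measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]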
According to the results table, most papers use requirements repositories that are freely accessible. Regarding the size of the datasets, most papers work with fewer than 10,000 requirements, and papers with a minimal number of requirements mainly use them to perform a proof of concept of the proposed approach. Only a small number of papers have datasets with more than 10,000 requirements, which is why deep learning algorithms are used to a lesser extent. This situation may be due to the lack of freely accessible datasets with large numbers of requirements.
These results suggest that the investigation of applying ML techniques to automate RE activities has several open avenues, such as (1) considering other types of requirements specification formats, (2) replicating these works to confirm their results and provide benchmarks of different approaches, (3) extending the existing techniques to larger datasets and other application domains, and (4) paying attention to the efficiency of the approaches.
4.3 RQ3. What Are the Tools Used in
the Field of RE that Apply ML?
4.3.1 RQ3 Results
Important topics in applying ML algorithms to software requirements relate to the technologies and tools used. Several tools and technologies have been applied in the studies involving ML and RE: 20% of the works used open-source tools (S12, S13, S24, S43, S44, S45, S46, S47, S61, S62, S64, S68), 5% indicated visualization tools for presenting the results of ML analyses in a user-friendly way (e.g., dashboards) (S12, S13, S61, S62), and 3% specified chatbots (S03, S12). Dabrowski et al. (S12) do not mention using visualization or chatbot technologies; rather, they focus on evaluating techniques for opinion mining and for searching for feature-related reviews in app reviews. Some authors named their proposals, for example, Review with Categorized Requirements (ReCaRe) (S49), Heuristic Requirements Assistant (HeRA) (S62), retraining Bidirectional Encoder Representations from Transformers (REBERT4RE) (S02), ReqVec (S66), and NOMEN (S26), among others.
Concerning how the proposals were developed, the results indicate that ten works use Python (S04, S08, S18, S20, S28, S46, S54, S72), four use Java (S22, S24, S34, S63), and two use Weka (S04, S56). The remaining papers use MySQL (S22), R (S52), Google Colab (S06), a wiki (S61), the Semi-supervised approach for Feature Extraction (SAFE), Group MAsking (GuMa), Relation-based Unsupervised Summarization (ReUS), and Mining App Review using Association Rule Mining (MARAM) (S12), and APIs of the Apple App Store (S70).
4.3.2 RQ3 Findings
After the analysis, we identified that most approaches use open-source technology. Several of them assign names to the resulting tools, which range from visual tools to chatbots. Regarding programming languages, those commonly used in data science in general, such as Python, R, and Java, predominate. Visual frameworks, such as Weka, are also used. It is important to note that several papers use cloud software to execute their experiments.