Extract-Transform-Load Process for Recognizing Sentiment from
User-Generated Text on Social Media
Afef Walha¹,², Faiza Ghozzi¹,³ and Faiez Gargouri¹,³
¹ MIRACL Laboratory, Sfax, Tunisia
² Higher Institute of Information Science and Multimedia of Gabes (ISIMG), University of Gabes, Tunisia
³ Higher Institute of Information Science and Multimedia of Sfax (ISIMS), University of Sfax, Tunisia
Keywords:
Sentiment, Classification, BPMN, Polarity, ETL, Process, Formalization, Social Media.
Abstract:
In today’s world, business intelligence systems must incorporate opinion mining into their decision-making
process. Sentiment analysis of user-generated content on social media has gained significant attention in recent
years. This method collects user opinions, feelings, and attitudes toward a topic of interest and helps determine
whether their sentiment is positive, neutral, or negative. This paper addresses text classification in sentiment
analysis and presents a solution to the Extract-Transform-Load (ETL) process based on a lexicon approach.
This process involves gathering media clips, converting them into sentiments, and loading them into a social
data warehouse. We provide generic and customizable models to aid designers in integrating pre-processing
techniques and sentiment analysis into the ETL process. By formalizing new ETL concepts, designers can
create a reliable conceptual design for any ETL process related to opinion data integration from social media.
1 INTRODUCTION
Business Intelligence (BI) systems analyze data for decision-making, while social networks facilitate social interactions. Integrating social network data into BI requires considering user experiences and organizational goals. Businesses should evaluate the advantages and difficulties of utilizing social media data (Sinha et al., 2024). Social media platforms like Facebook, Twitter, and Instagram allow people to share their thoughts and interests through user-generated content (UGC). Companies use this data to improve marketing, customer service, and public relations. UGC, like tweets, has given rise to sentiment analysis (Wankhade et al., 2022). Because it deals with human-generated informal text, this field is complex. Most researchers focus on sentiment analysis, which involves using various cleaning techniques and polarity detection methods to identify opinions. Designers working on a data warehouse (DW) may require assistance integrating UGC from diverse social media sources into their Extract-Transform-Load process (Khan et al., 2024). Therefore, it is crucial to model the ETL process for integrating opinions, regardless of the social platform or topic of interest.

This paper proposes ETL4Social-Process-Sentiment, a process for gathering UGC text and converting it into opinions for the Social DW. The paper focuses on modeling complex operations' control and data flows, emphasizing text pre-processing and polarity detection. It also suggests formalizing various ETL concepts, thus enabling designers to reuse and adjust current models or develop new ones that meet their specific requirements.

The manuscript is divided into seven sections. Section 2 discusses opinion integration and notable works. Section 3 outlines the sentiment analysis method and proposed model for ETL4Social-Process-Sentiment. Section 4 highlights its complex operation models, while Section 5 formalizes concepts. Section 6 compares our contribution to existing works. Lastly, Section 7 concludes and provides a foundation for future research.
2 MOTIVATION AND RELATED
WORK
Social DW (SDW) is a central store for analyzing UGC from various social media platforms. We suggest a global schema for all media types, as shown in Figure 1. This model enables decision-makers to analyze opinions expressed by OpinionFact according to favorites count, republished count, and opinion occurrence. The analysis can be done along several dimensions, such as Media ClipDim, DateDim, TimeDim, LocationDim, UserDim, SentimentDim, ContextDim, and TopicDim. This paper focuses on the process that performs complex ETL steps to extract user-generated text, clean it, transform it into opinions, and load it into the SDW, especially to feed SentimentDim, which represents the user's sentiment regarding a social event. It is associated with a "positive", "negative", or "neutral" polarity. The polarity depends on the polarity value, a float in the range [-1, 1] computed by a sentiment classification algorithm. The Sentiment attribute (which belongs to SentimentDim) includes several variants of the polarity, such as "high positive", "positive", and "low positive" for a "positive" polarity.

Walha, A., Ghozzi, F. and Gargouri, F.
Extract-Transform-Load Process for Recognizing Sentiment from User-Generated Text on Social Media.
DOI: 10.5220/0012706100003687
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 19th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE 2024), pages 641-648
ISBN: 978-989-758-696-5; ISSN: 2184-4895
Proceedings Copyright © 2024 by SCITEPRESS Science and Technology Publications, Lda.
Figure 1: Multidimensional Schema of Social DW.
2.1 Challenges of Matching UGC-Text
with Sentiment
Table 1 shows examples of tweets related to "mobile technology". We manually analyzed the human-generated text based on sentiment words, emoticons, and other indicators to determine whether the sentiment expressed in the tweets is positive, negative, or neutral. It's important to note that we only explored a small set of tweets. However, how can we proceed when there are hundreds or thousands of tweets to analyze?
Table 1: Classification of tweets ("mobile technology").
Tweet text | Sentiment | Polarity
WOW the new #Google #Nexus is so beautiful !!! totally boost google's market share in the smartphone. :D | high positive | positive
Houston we have a problem !! My iPad has been restoring for 12+ hours after installing @apple IOS5. This can't be right .... | low negative | negative
@Apple needs to give me a contract deal i get them new customers all the time #teamiphone PrettyAmaazing | neutral | neutral

2.2 Related Work
2.2.1 Sentiment Analysis Approaches
Sentiment analysis determines the polarity of a text. Methods include sentiment lexicon-based approaches, machine learning-based approaches, and deep learning techniques (Li et al., 2022). The sentiment lexicon-based approaches (e.g., (Darwich et al., 2019), (Ojeda-Hernández et al., 2023)) use a sentiment lexicon, such as dictionaries or corpora, to analyze text polarity. These approaches use annotations that describe how the text matches the lexicons. Machine learning (ML) approaches ((Silva et al., 2022), (Li et al., 2023)) classify data to predict emotional polarity. Deep learning techniques are also among the models frequently used in sentiment analysis research (e.g., (Alamoudi and Alghamdi, 2021), (Su and Shen, 2022)). Although analyzing social media data can provide valuable insights, managing this data still presents challenges. These issues include the lack of a clear pattern in conceptual models and the conflicting goals of companies and researchers who use this data.
2.2.2 Social Media Data Integration Approaches
Several approaches propose frameworks that transform social media data into meaningful, valuable information to enable more effective decision-making. They incorporate these data into the existing multidimensional structures. Several studies from the past five years have recommended incorporating user opinions as an analysis dimension in the DW. (Walha et al., 2017) developed a Twitter DW model to evaluate opinions presented by Tweet Fact. It analyzes opinions across dimensions, including Tweet, Date, Time, Location, User, Sentiment, Context, and Topic. Sentiment is categorized as positive, negative, or neutral, and is determined using an opinion analysis algorithm called POLSentiment (Walha et al., 2016). In (Ben Kraiem et al., 2020), a model was developed to analyze tweet data using OLAP. The model examines user activities over time, connections between tweets and respondent users, and tweet sentiment data. The "DTweet-Metadata" dimension of the model provides opinion data on user sentiment and tweet topics. Tweet-Sentiment data is obtained by counting the tweets for each sentiment category. (Valêncio et al., 2020) proposed a DW model for Facebook and Twitter using a normalized constellation schema to support opinion analysis. The approach involves the ETL stage, which eliminates redundant data to improve performance. The model helps in data acquisition, transformation, and loading and can extract valuable insights when human comprehension falls short. Recently, (Moalla et al., 2022) created a data mart to analyze user opinions on social media platforms such as Facebook, Twitter, and YouTube. They implemented an ETL process with three stages and used a supervised learning classification technique for sentiment analysis (Moalla et al., 2018). Despite consistent experimental results, the study's drawback was the need for further formalization and design of the ETL process steps.

Detecting sentiments in UGC text is crucial. Effective pre-processing methods have been defined, but designers need to incorporate them into ETL modeling to align UGC with the social DW.
3 SENTIMENT ANALYSIS AND
ETL MODELING
We employ the POLSentiment lexicon-based approach to analyze sentiment indicators in text and evaluate polarity values. This method, detailed in (Walha et al., 2016), allows us to determine SentimentDim attributes, as defined in Figure 1.
3.1 Overview of POLSentiment
POLSentiment is a sentiment analysis method that relies on dictionaries of expressions conveying positive or negative sentiments in user-generated texts. Opinion dictionaries are formed from sentiment indicators widely used on social media. These indicators can be verbal expressions known as opinion words or graphic symbols known as emoticons. The POLSentiment algorithm is divided into three stages, as illustrated in Figure 2. Step (1) performs a primary "Text Cleaning" phase on the informal UGC text, which removes unknown characters, URLs, punctuation, and repetitive characters. The text is then segmented into tokens Tok 1, Tok 2, ..., Tok n through "Text Tokenization". Step (2) identifies sentiment expressions in the UGC text, including emoticons and opinion words. It also considers the possibility of a modifier preceding an opinion word, such as "not", "little", or "very", which can alter the sentiment that the word expresses. Each sentiment indicator is described by a sentiment and a valence (a float value in [-1, 1]). Step (3), called "Opinion Analysis", determines the sentiment expressed in the UGC entry by computing the text's polarity from the valences of the sentiment indicators extracted in Step (2), namely opinion words, their modifiers, and emoticons. Evaluated on a pre-annotated dataset, this algorithm outperforms other methods.

Figure 2: POLSentiment steps (Walha et al., 2016): (1) Media Clips Pre-processing (Text Cleaning, Text Tokenization into Tok 1, ..., Tok n); (2) Lexicon Extraction (emoticons extraction, opinion terms extraction, extract modifiers, detect lexicon valences from the lexicon collections); (3) Opinion Analysis (calculate text polarity with the POLSentiment algorithms).

Our primary objective is to model the ETL process to match the UGC text with SentimentDim attributes. This paper proposes an ETL process called ETL4Social-Process-Sentiment that assists designers in integrating the main stages of POLSentiment in a generic and reusable manner.
3.2 ETL4Social-Process-Sentiment
Model
ETL4Social-Process-Sentiment is a process that extracts, transforms, and loads user-generated texts into the SDW for easier matching with the SentimentDim dimension. Figure 3 shows the BPMN (Business Process Model and Notation) model of this process, which includes a sequence of ETL operations executed automatically in a specific order.

Figure 3: BPMN model of ETL4Social-Process-Sentiment.

The Crawl Media Clips operation searches social media for UGCs, known as Media Clips, published on a specific topic. The SentimentDim dimension gathers data from informal texts, which must be cleaned before use. The Pre-process Clip-Text operation applies well-known cleaning techniques to this data. It includes a tokenization stage that separates words from the clean text to facilitate sentiment classification. The Detect Opinion Indicators operation identifies sentiment indicators in a given text, such as opinion words and emoticons. Based on these indicators, the Compute Clip-Polarity operation computes the text polarity, ranging from -1 to 1. The Specify Sent-Polarity operation then uses this value to determine the overall sentiment, such as "high positive" or "low positive", as well as the polarity of the media clip, which can be "positive", "negative", or "neutral". Finally, the Load SentimentDim operation loads the data into this dimension.

The ETL4Social-Process-Sentiment model is an abstract view of extracting UGCs, transforming them into attributes of the SentimentDim dimension, and loading them into the DW. It manages the execution order of all the operations involved in this process, while each operation receives input data, requires intermediate data, and produces output data. However, the model does not control the data flow needed to execute these operations. The ETL4Social-Sentiment-Operation models, on the other hand, focus on this aspect of modeling the ETL process.
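To make the role of the Specify Sent-Polarity operation more concrete, the following minimal Python sketch maps a polarity value in [-1, 1] to a (sentiment, polarity) pair. The paper names the categories but does not state the thresholds, so the function name `specify_sent_polarity`, the cut-offs `HIGH_CUTOFF` and `LOW_CUTOFF`, and the rule that only an exactly zero score is neutral are all assumptions made for illustration.

```python
from typing import Tuple

# Hypothetical cut-offs: the text names the sentiment variants
# ("high positive", "positive", "low positive", ...) but gives no thresholds.
HIGH_CUTOFF = 0.6
LOW_CUTOFF = 0.2


def specify_sent_polarity(polarity_value: float) -> Tuple[str, str]:
    """Map a polarity value in [-1, 1] to a (sentiment, polarity) pair."""
    if not -1.0 <= polarity_value <= 1.0:
        raise ValueError("polarity value must lie in [-1, 1]")
    if polarity_value == 0.0:
        # Assumption: only an exactly zero score is treated as neutral.
        return "neutral", "neutral"
    polarity = "positive" if polarity_value > 0 else "negative"
    magnitude = abs(polarity_value)
    if magnitude >= HIGH_CUTOFF:
        sentiment = f"high {polarity}"
    elif magnitude < LOW_CUTOFF:
        sentiment = f"low {polarity}"
    else:
        sentiment = polarity
    return sentiment, polarity


if __name__ == "__main__":
    for value in (0.85, 0.4, 0.0, -0.1):
        print(value, specify_sent_polarity(value))
```

Keeping the cut-offs as module-level constants reflects the customizable intent of the models: a designer could tune them per platform or topic without touching the mapping logic.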
4 MODELING OF ETL4Social-
SENTIMENT-OPERATION
The ETL4Social-Process-Sentiment model converts media clip texts into SentimentDim dimension data. ETL operations are specified with BPMN sub-processes, as shown in Figure 3. The ETL4Social-Sentiment-Operation models manage data flow between tasks and activities. These models are detailed in the following sections.
4.1 Pre-Process Clip-Text Operation
Model
The data collected through the Crawl Media Clips operation is human-generated text. It may therefore contain informal messages with abbreviations, spelling errors, and symbols, which makes sentiment analysis on such content challenging. To address this, we designed a model for the Pre-process Clip-Text operation, illustrated in Figure 4. It comprises a set of activities, including Get media clips, Clean texts, Tokenize text, and Store tokens, each specified by either a BPMN "Sub-Process" element for composite activities or a BPMN "Task" element for atomic activities.

Opinion analysis involves various data-cleaning techniques. Figure 4 shows the "Clean texts" activity applied to each media clip. This activity is a sequence of BPMN tasks, including Clean URLs, Clean users, Clear diacritics, Clean repetitive letters, and Clean stop words. These tasks remove URLs, email addresses, user mentions (@username), diacritical signs, repetitive letters (for example, "veryyyy" is converted to "very"), and stop words (such as "the" and "a"). The result is a collection of cleaned texts. Next, the "Tokenize texts" activity segments each cleaned text into individual words or symbols known as tokens. These tokens are stored in a temporary data object (clip c) and then used to identify opinion indicators in the clip text.
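The cleaning and tokenization activities described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the implementation behind Figure 4: the regular expressions, the tiny `STOP_WORDS` set, the whitespace tokenizer, and the function names `clean_text`, `tokenize`, and `preprocess_clip` are all hypothetical choices.

```python
import re
import unicodedata
from typing import List

# Illustrative stop-word list (assumption); a real setup would load a fuller one.
STOP_WORDS = {"the", "a", "an", "is"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\S+@\S+\.\S+")
MENTION_RE = re.compile(r"@\w+")
# A letter repeated three or more times in a row.
REPEAT_RE = re.compile(r"([^\W\d_])\1{2,}")


def clean_text(text: str) -> str:
    """Apply the cleaning tasks in sequence to one media clip text."""
    text = URL_RE.sub(" ", text)      # Clean URLs
    text = EMAIL_RE.sub(" ", text)    # email addresses, before user mentions
    text = MENTION_RE.sub(" ", text)  # Clean users (@username)
    # Clear diacritics: decompose, then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    text = REPEAT_RE.sub(r"\1", text)  # Clean repetitive letters
    # Clean stop words (case-insensitive match on whitespace-separated words).
    kept = [w for w in text.split() if w.lower() not in STOP_WORDS]
    return " ".join(kept)


def tokenize(text: str) -> List[str]:
    """Whitespace tokenization, which keeps emoticons such as ':D' intact."""
    return text.split()


def preprocess_clip(text: str) -> List[str]:
    """Clean a clip text and return its tokens."""
    return tokenize(clean_text(text))


if __name__ == "__main__":
    print(preprocess_clip("The phone is veryyyy good http://t.co/x @apple :D"))
```

Note that collapsing every run of three or more identical letters to a single letter is a simplification: it turns "veryyyy" into "very" as in the example above, but it would also turn "cooool" into "col", so a dictionary check would be needed in practice.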
4.2 Detect Opinion Indicators Model
POLSentiment categorizes the sentiment of text-based content by analyzing opinion words, modifiers, and emoticons. It identifies sentiment indicators among the tokens present in the text. The model that controls the data flows of the Detect Opinion Indicators operation is depicted in Figure 5. The first step is the "Detect Indicators" activity, which determines whether a given token is an opinion word or an emoticon. It relies on the "Is an opinion word" and "Is an emoticon" tasks, which query the opinion dictionaries "Dict Opinion Words" and "Dict Emoticons", respectively.

The "Is a modifier" activity searches for a modifier before an opinion word within the text segments. If emoticons and opinions are present in the text, the "Store emoticon" and "Store modifier" activities temporarily store them in the objects "clip emoticons" and "clip opinion words", which are eventually stored in the data staging area (DSA).
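A minimal Python sketch of this detection step is given below. The three dictionaries and their valences are toy values invented for illustration, the separate modifier dictionary is an assumption (the text names only the opinion-word and emoticon dictionaries), and `detect_opinion_indicators` is a hypothetical function name.

```python
from typing import Dict, List, Optional, Tuple

# Toy lexicons with made-up valences in [-1, 1] (assumptions, not the real dictionaries).
DICT_OPINION_WORDS: Dict[str, float] = {"good": 0.7, "beautiful": 0.8, "bad": -0.6}
DICT_EMOTICONS: Dict[str, float] = {":D": 0.9, ":)": 0.6, ":(": -0.6}
DICT_MODIFIERS: Dict[str, float] = {"not": -0.8, "very": 0.8, "little": 0.3}

OpinionWord = Tuple[str, Optional[str]]  # (opinion word, preceding modifier or None)


def detect_opinion_indicators(
    clip_tokens: List[str],
) -> Tuple[List[OpinionWord], List[str]]:
    """Return (clip_opinion_words, clip_emoticons) for one tokenized clip."""
    clip_opinion_words: List[OpinionWord] = []
    clip_emoticons: List[str] = []
    for i, token in enumerate(clip_tokens):
        word = token.lower()
        if word in DICT_OPINION_WORDS:  # "Is an opinion word"
            # "Get previous token", then "Is a modifier".
            previous = clip_tokens[i - 1].lower() if i > 0 else None
            modifier = previous if previous in DICT_MODIFIERS else None
            clip_opinion_words.append((word, modifier))  # "Store opinion word"
        elif token in DICT_EMOTICONS:  # "Is an emoticon"
            clip_emoticons.append(token)  # "Store emoticon"
    return clip_opinion_words, clip_emoticons


if __name__ == "__main__":
    print(detect_opinion_indicators(["phone", "not", "good", ":D"]))
```

Storing each opinion word together with its optional modifier keeps the pair available for the polarity computation that follows.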
4.3 Compute Clip-Polarity Model
Based on (Walha et al., 2016), the Compute Clip-Polarity operation calculates the text polarity (TPol) using the valences of opinion words, modifiers, and emoticons. TPol is determined by initializing the polarities generated by opinion words (TPol_W) and emoticons (TPol_E) to 0, retrieving "clip emoticons" and "clip opinion words", and running three activities in parallel: "Compute opn words polarity", "Compute emoticons polarity", and "Count opinion indicators". The last activity counts the opinion words and emoticons in the input text; the result is represented by count_ind.

Figure 4: BPMN model of Pre-process Clip-Text operation.
The TPol score considers the valences of both emoticons and opinion words. The "Compute emoticons polarity" activity adds TPol_E to TPol. "Compute opn words polarity" returns TPol_W, calculated from the valences of opinion words and their modifiers. A BPMN gateway called "has modifier?" checks whether the word (w) has a modifier (m). If there is no modifier, Val_w (the valence of w) is added to TPol_W. Otherwise, both Val_w and Val_m are considered. If Val_w is negative, the opposite of Val_w and the absolute value of Val_m are added to TPol_W. Otherwise, this step is completed by carrying out the "Average: Neg(Val_w), Abs(Val_m)" activity. If Val_m is negative, its opposite is added to TPol_W. Finally, to determine a text's polarity, we add the polarity values of opinion words and emoticons and divide the total by count_ind, that is, TPol = (TPol_W + TPol_E) / count_ind.
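The overall computation can be sketched in Python as shown below. The parts taken from the text are the no-modifier case (Val_w is added to TPol_W) and the final division of TPol_W + TPol_E by count_ind. Everything else is an assumption: TPol_E is taken to be the sum of emoticon valences, a clip without indicators is scored 0, and the modifier branch is delegated to a pluggable function whose default, `stand_in_combine`, is a hypothetical placeholder that does not reproduce the branching of Figure 6.

```python
from typing import Callable, Dict, List, Optional, Tuple


def stand_in_combine(val_w: float, val_m: float) -> float:
    """Hypothetical modifier rule, NOT the branching of Figure 6.

    Averages the magnitudes of the word and modifier valences, keeps the
    sign of the word, and flips it when the modifier valence is negative.
    """
    magnitude = (abs(val_w) + abs(val_m)) / 2
    sign = 1.0 if val_w >= 0 else -1.0
    if val_m < 0:
        sign = -sign
    return sign * magnitude


def compute_clip_polarity(
    clip_opinion_words: List[Tuple[str, Optional[str]]],
    clip_emoticons: List[str],
    word_valences: Dict[str, float],
    modifier_valences: Dict[str, float],
    emoticon_valences: Dict[str, float],
    combine: Callable[[float, float], float] = stand_in_combine,
) -> float:
    """Return TPol = (TPol_W + TPol_E) / count_ind for one clip."""
    tpol_w = 0.0  # polarity generated by opinion words
    tpol_e = 0.0  # polarity generated by emoticons
    for word, modifier in clip_opinion_words:
        val_w = word_valences[word]
        if modifier is None:  # gateway "has modifier?" answers no
            tpol_w += val_w
        else:
            tpol_w += combine(val_w, modifier_valences[modifier])
    for emoticon in clip_emoticons:
        tpol_e += emoticon_valences[emoticon]  # assumption: plain sum
    count_ind = len(clip_opinion_words) + len(clip_emoticons)
    if count_ind == 0:
        return 0.0  # assumption: no indicator means a zero score
    return (tpol_w + tpol_e) / count_ind


if __name__ == "__main__":
    score = compute_clip_polarity(
        [("good", None)], [":D"], {"good": 0.5}, {}, {":D": 1.0}
    )
    print(score)
```

Passing the modifier rule as a parameter keeps the sketch honest about what the text specifies: a designer who has the exact rule from the original algorithm can supply it without changing the rest of the computation.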
Due to diverse social media platforms, designers of
social data warehouses may need help with the ETL
process. Identifying specific ETL concepts can help
modify or create new models. Our method includes
sentiment analysis and introduces new social ETL
modeling concepts.
5 FORMALIZATION OF
ETL4Social-SENTIMENT
CONCEPTS
5.1 "O-Activity" Concept
"O-Activity" is the executable component in a social ETL operation. This concept has three variations: "Social Activity", "Standard Activity", and "Semantic Activity".
Definition 1. O-Activity
An O-Activity (O_A) is defined by the n-tuple (Name_{O_A}, InpFlObj_{O_A}, OutFlObj_{O_A}, InpData_{O_A}, OutData_{O_A}), where:
- Name_{O_A}: is the name of the activity (O_A),
- O_Model_Name: is the name of the ETL operation model containing this activity,
- InpFlObj_{O_A}: is the input flow object of O_A. This object might be an instance of an activity (O-Activity), a gateway, or an event,
- OutFlObj_{O_A}: is the output flow objects of O_A,
- InpData_{O_A} = {Din_1; ...; Din_k}: is a set of input data objects (O-Data) used to execute O_A,
- OutData_{O_A} = {Dout_1; ...; Dout_l}: is a set of data resulting from O_A execution.
5.2 "Social Activity" Concept
"Social Activity" is a step of the ETL operation for UGC data mapping into social DW, and it can be either standard or semantic.

Figure 5: BPMN model of Detect Opinion Indicators operation.
Definition 2. Social Activity
A Social Activity (A_SO) is an n-tuple (Name_{A_SO}, InpFlObj_{A_SO}, OutFlObj_{A_SO}, InpData_{A_SO}, OutData_{A_SO}, StdAct_{A_SO}, SemAct_{A_SO}), where:
- the parameters Name_{A_SO}, InpFlObj_{A_SO}, OutFlObj_{A_SO}, InpData_{A_SO}, and OutData_{A_SO} are already described in Definition 1,
- StdAct_{A_SO} = {Ast_1; ...; Ast_n}: a set of standard activities that are parts of the activity (A_SO),
- SemAct_{A_SO} = {Asm_1; ...; Asm_m}: a set of semantic activities of A_SO.
Example 1. "Detect Indicators" is a social activity that identifies emoticons and opinion words in a UGC text. It involves standard tasks like "Get" and "Store" and semantic tasks, such as identifying opinion words and emoticons, under the "Detect Opinion Indicators" operation (Figure 5). "Detect Indicators" is defined as follows:
A_SO_{DetIndic} = ("Detect Indicators", InpFlObj_{DetIndic}, OutFlObj_{DetIndic}, InpData_{DetIndic}, OutData_{DetIndic}, StdAct_{DetIndic}, SemAct_{DetIndic})
with:
- InpFlObj_{DetIndic} = {Get tokens},
- OutFlObj_{DetIndic} = {Store clip emoticons},
- InpData_{DetIndic} = {Inpclip tokens},
- OutData_{DetIndic} = {clip emoticons, clip opinion words},
- StdAct_{DetIndic} = {Store opinion word, Get previous token, Store modifier, Store emoticon},
- SemAct_{DetIndic} = {Is an opinion word, Is a modifier, Is an emoticon}.
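One possible way to carry these definitions into code is sketched below with Python dataclasses. The class and field names (`OActivity`, `SocialActivity`, `inp_fl_obj`, and so on) are hypothetical renderings of the tuple components, flow and data objects are represented simply by their names as strings, and the instance reproduces the "Detect Indicators" example above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OActivity:
    """Executable component of a social ETL operation (Definition 1)."""
    name: str
    inp_fl_obj: List[str] = field(default_factory=list)  # input flow objects
    out_fl_obj: List[str] = field(default_factory=list)  # output flow objects
    inp_data: List[str] = field(default_factory=list)    # input data objects
    out_data: List[str] = field(default_factory=list)    # output data objects


@dataclass
class StandardActivity(OActivity):
    """Common ETL activity such as Get or Store."""


@dataclass
class SemanticActivity(OActivity):
    """Activity that relies on lexical or semantic resources."""


@dataclass
class SocialActivity(OActivity):
    """Activity composed of standard and semantic activities (Definition 2)."""
    std_act: List[StandardActivity] = field(default_factory=list)
    sem_act: List[SemanticActivity] = field(default_factory=list)


detect_indicators = SocialActivity(
    name="Detect Indicators",
    inp_fl_obj=["Get tokens"],
    out_fl_obj=["Store clip emoticons"],
    inp_data=["Inpclip tokens"],
    out_data=["clip emoticons", "clip opinion words"],
    std_act=[
        StandardActivity(n)
        for n in ("Store opinion word", "Get previous token",
                  "Store modifier", "Store emoticon")
    ],
    sem_act=[
        SemanticActivity(n)
        for n in ("Is an opinion word", "Is a modifier", "Is an emoticon")
    ],
)

if __name__ == "__main__":
    print(detect_indicators.name, len(detect_indicators.std_act),
          len(detect_indicators.sem_act))
```

Modeling the three variants as subclasses of one base class mirrors the statement that each is a variation of the "O-Activity" concept, so generic tooling can treat any of them as an `OActivity`.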
5.3 "Standard Activity" Concept
ETL processes involve common activities like "Get", "Join", "Merge", and "Store" used to promote structured data processing.
Definition 3. Standard Activity
A "Standard Activity" (A_ST) is a variant of the "O-Activity" concept, defined by the n-tuple (Name_{A_ST}, InpFlObj_{A_ST}, OutFlObj_{A_ST}, InpData_{A_ST}, OutData_{A_ST}).
Example 2. The activity "Get previous token" in the "Detect Indicators" social activity, described in Example 1, is an instance of the "Standard Activity" concept. It is defined as:
A_ST_{GetPrevT} = ("Get previous token", InpFlObj_{GetPrevT}, OutFlObj_{GetPrevT}, InpData_{GetPrevT}, OutData_{GetPrevT})
with:
- InpFlObj_{GetPrevT} = {Store opinion word: t},
- OutFlObj_{GetPrevT} = {Is a Modifier: p_t},
- InpData_{GetPrevT} = {t, clip tokens},
- OutData_{GetPrevT} = {p_t}.

Figure 6: BPMN model of Compute Clip-Polarity operation.
5.4 "Semantic Activity" Concept
"Semantic Activity" transforms human-generated text into decisional information using lexical or semantic resources.
Definition 4. Semantic Activity
A "Semantic Activity" (A_SE), a variant of the "O-Activity" concept, is defined by the n-tuple (Name_{A_SE}, InpFlObj_{A_SE}, OutFlObj_{A_SE}, InpData_{A_SE}, OutData_{A_SE}).
Example 3. "Is an emoticon" is an activity under "Detect Indicators", as defined in Example 1 and shown in Figure 5. This activity, used for identifying emoticons in text, is summarized as follows:
A_SE_{IsAnEmot} = ("Is an emoticon", InpFlObj_{IsAnEmot}, OutFlObj_{IsAnEmot}, InpData_{IsAnEmot}, OutData_{IsAnEmot})
with:
- InpFlObj_{IsAnEmot} = {opinion word?},
- OutFlObj_{IsAnEmot} = {emoticon?},
- InpData_{IsAnEmot} = {t, Dict Emot},
- OutData_{IsAnEmot} = {resp}.
6 DISCUSSION
Table 2: ETL4Social-Sentiment vs. opinion integration approaches.
Approaches | Social Media | ETL Design | ETL concept | SA method
(Walha et al., 2017) | TW | yes | yes | no
(Ben Kraiem et al., 2020) | TW | no | no | no
(Valêncio et al., 2020) | FB, TW | no | no | no
(Moalla et al., 2022) | FB, TW, YT | no | no | ML
ETL4Social-Sentiment | Global SDW | yes | yes | POLSentiment
This paper presents a design solution for incorporating sentiment analysis of UGC into the SDW, reducing computational costs, and enabling opinion discovery. We compared our proposal with existing approaches in Section 2.2.2. Table 2 shows the results based on specific criteria.
Social Media. It lists the social media platforms, such as Twitter (TW), Facebook (FB), and YouTube (YT), used by the DW and ETL models.
ETL Design. It verifies whether the approach proposes an ETL process model to map UGC into sentiment.
ETL concepts. It checks whether the approach defines or formalizes the ETL concepts.
SA method. It determines whether the sentiment is generated based on a valid sentiment analysis method.
Although some approaches have proposed solutions to integrate opinions from text UGCs, we note the need to model the ETL processes that transform UGC text into DW data. Our contribution addresses this problem at a conceptual level. The models (cf. Sections 3.2 and 4) serve as design patterns for opinion data integration and simplify the ETL designer's task. Moreover, the ETL4Social-Process-Sentiment and operation models apply to all social media types for sentiment analysis. They provide ETL designers with a standardized approach to optimizing social ETL processes using formalized concepts.
7 CONCLUSION
Integrating opinion data from unstructured text sources into a decisional system can be challenging when designing ETL processes. A social data warehouse can help with this. However, careful handling of user-generated content is required to identify sentiment. Our research aimed to develop practical approaches for sentiment analysis on social media. We proposed design models for the ETL4Social-Sentiment process and operations. These models handle activities and data to match UGC text with the SentimentDim dimension of the SDW. The models are generic and customizable based on the formalized ETL concepts. Big data sources require powerful ETL tools that are efficient in execution cost, transformations, and parallel data processing. To improve our proposal, we plan to use MapReduce as a distributed execution framework to process big data in parallel, saving time and reducing the risk of errors.
REFERENCES
Alamoudi, E. S. and Alghamdi, N. S. (2021). Sentiment classification and aspect-based sentiment analysis on Yelp reviews using deep learning and word embeddings. Journal of Decision Systems, 30(2-3):259–281.
Ben Kraiem, M., Alqarni, M., Feki, J., and Ravat, F. (2020). OLAP operators for social network analysis. Cluster Computing, 23:2347–2374.
Darwich, M., Mohd, S. A., Omar, N., and Osman, N. A. (2019). Corpus-based techniques for sentiment lexicon generation: A review. J. Digit. Inf. Manag., 17(5):296.
Khan, B., Jan, S., Khan, W., and Chughtai, M. I. (2024). An overview of ETL techniques, tools, processes and evaluations in data warehousing. Journal on Big Data, 6.
Li, J., Zhang, Y., Li, J., and Du, J. (2023). The role of sentiment tendency in affecting review helpfulness for durable products: nonlinearity and complementarity. Information Systems Frontiers, 25(4):1459–1477.
Li, S., Liu, F., Zhang, Y., Zhu, B., Zhu, H., and Yu, Z. (2022). Text mining of user-generated content (UGC) for business applications in e-commerce: A systematic review. Mathematics, 10(19):3554.
Moalla, I., Nabli, A., and Hammami, M. (2018). Towards opinions analysis method from social media for multidimensional analysis. In MoMM, pages 8–14.
Moalla, I., Nabli, A., and Hammami, M. (2022). Data warehouse building to support opinion analysis in social media. Social Network Analysis and Mining, 12(1):123.
Ojeda-Hernández, M., López-Rodríguez, D., and Mora, Á. (2023). Lexicon-based sentiment analysis in texts using formal concept analysis. International Journal of Approximate Reasoning, 155:104–112.
Silva, L. M. M., Valêncio, C. R., Zafalon, G. F. D., Columbini, A. C., Filipe, J., Smialek, M., Brodsky, A., and Hammoudi, S. (2022). Feature selection with hybrid bio-inspired approach for classifying multi-idiom social media sentiment analysis. In ICEIS, pages 297–307.
Sinha, S., Narayanan, R. S., and Rakila, R. (2024). Harnessing sentiment analysis methodologies for business intelligence enhancement and governance intelligence evaluation. Journal of Intelligent Systems and Applications in Engineering, 12(11s):166–176.
Su, Y. and Shen, Y. (2022). A deep learning-based sentiment classification model for real online consumption. Frontiers in Psychology, 13:886982.
Valêncio, C. R., Silva, L. M. M., Tenório, W., Zafalon, G. F. D., Colombini, A. C., and Fortes, M. Z. (2020). Data warehouse design to support social media analysis in a big data environment. Journal of Computer Science, pages 126–136.
Walha, A., Ghozzi, F., and Gargouri, F. (2016). A lexicon approach to multidimensional analysis of tweets opinion. In AICCSA, pages 1–8. IEEE.
Walha, A., Ghozzi, F., and Gargouri, F. (2017). ETL4Social-Data: Modeling approach for topic hierarchy. In KEOD, pages 107–118.
Wankhade, M., Rao, A. C. S., and Kulkarni, C. (2022). A survey on sentiment analysis methods, applications, and challenges. Artificial Intelligence Review, 55(7):5731–5780.