Extract-Transform-Load Process for Recognizing Sentiment from User-Generated Text on Social Media

Afef Walha (1,2), Faiza Ghozzi (1,3) and Faiez Gargouri (1,3)

(1) MIRACL Laboratory, Sfax, Tunisia

(2) Higher Institute of Information Science and Multimedia of Gabes (ISIMG), University of Gabes, Tunisia

(3) Higher Institute of Information Science and Multimedia of Sfax (ISIMS), University of Sfax, Tunisia

Keywords:

Sentiment, Classification, BPMN, Polarity, ETL, Process, Formalization, Social Media.

Abstract:

In today's world, business intelligence systems must incorporate opinion mining into their decision-making process. Sentiment analysis of user-generated content on social media has gained significant attention in recent years. This method collects user opinions, feelings, and attitudes toward a topic of interest and helps determine whether their sentiment is positive, neutral, or negative. This paper addresses text classification in sentiment analysis and presents a solution to the Extract-Transform-Load (ETL) process based on a lexicon approach. This process involves gathering media clips, converting them into sentiments, and loading them into a social data warehouse. We provide generic and customizable models to aid designers in integrating pre-processing techniques and sentiment analysis into the ETL process. By formalizing new ETL concepts, designers can create a reliable conceptual design for any ETL process related to opinion data integration from social media.

1 INTRODUCTION

Business Intelligence (BI) systems analyze data for decision-making, while social networks facilitate social interactions. Integrating social network data into BI requires considering user experiences and organizational goals. Businesses should evaluate the advantages and difficulties of utilizing social media data (Sinha et al., 2024). Social media platforms like Facebook, Twitter, and Instagram allow people to share their thoughts and interests through user-generated content (UGC). Companies use this data to improve marketing, customer service, and public relations.

UGC, like tweets, has given rise to sentiment analysis (Wankhade et al., 2022). Because it deals with human-generated informal text, this field is complex. Most researchers focus on sentiment analysis, which involves using various cleaning techniques and polarity detection methods to identify opinions. Designers working on a data warehouse (DW) may require assistance integrating UGC from diverse social media sources into their Extract-Transform-Load process (Khan et al., 2024). Therefore, it is crucial to model the ETL process for integrating opinions, regardless of the social platform or topic of interest.

This paper proposes ETL4Social-Process-Sentiment, a process for gathering UGC text and converting it into opinions for the Social DW. The paper focuses on modeling the control and data flows of complex operations, emphasizing text pre-processing and polarity detection. It also suggests formalizing various ETL concepts, thus enabling designers to reuse and adjust current models or develop new ones that meet their specific requirements. The manuscript is divided into seven sections. Section 2 discusses opinion integration and notable works. Section 3 outlines the sentiment analysis method and the proposed model for ETL4Social-Process-Sentiment. Section 4 highlights its complex operation models, while Section 5 formalizes concepts. Section 6 compares our contribution to existing works. Lastly, Section 7 concludes and provides a foundation for future research.

2 MOTIVATION AND RELATED WORK

Social DW (SDW) is a central storage for analyzing UGC from various social media platforms. We suggest a global schema for all media types, as shown in Figure 1. This model enables decision-makers to analyze opinions expressed by OpinionFact according to favorites count, republished count, and opinion occurrence. The analysis can be done based on several dimensions, such as Media ClipDim, DateDim, TimeDim, LocationDim, UserDim, SentimentDim, ContextDim, and TopicDim. This paper focuses on the process that performs complex ETL steps to extract user-generated text, clean it, transform it into opinions, and load it into the SDW, especially to populate SentimentDim, which represents the user's sentiment regarding a social event. This sentiment is associated with a "positive", "negative", or "neutral" polarity. The polarity depends on the polarity value, a float in the range [-1, 1], computed by a sentiment classification algorithm. The Sentiment attribute (which belongs to SentimentDim) includes several variants of the polarity, such as "high positive", "positive", and "low positive" for a "positive" polarity.

Walha, A., Ghozzi, F. and Gargouri, F.
Extract-Transform-Load Process for Recognizing Sentiment from User-Generated Text on Social Media.
DOI: 10.5220/0012706100003687
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 19th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE 2024), pages 641-648
ISBN: 978-989-758-696-5; ISSN: 2184-4895

Figure 1: Multidimensional Schema of Social DW.

2.1 Challenges of Matching UGC-Text with Sentiment

Table 1 shows examples of tweets related to "mobile technology". We manually analyzed the human-generated text based on sentiment words, emoticons, and other indicators to determine whether the sentiment expressed in the tweets is positive, negative, or neutral. It is important to note that we only explored a small set of tweets. However, how can we proceed when there are hundreds or thousands of tweets to analyze?

Table 1: Classification of tweets ("mobile technology").

| Tweet text | Sentiment | Polarity |
|---|---|---|
| WOW the new #Google #Nexus is so beautiful !!! totally boost google's market share in the smartphone. :D | high positive | positive |
| Houston we have a problem !! My iPad has been restoring for 12+ hours after installing @apple IOS5. This can't be right .... | low negative | negative |
| @Apple needs to give me a contract deal i get them new customers all the time #teamiphone PrettyAmaazing | neutral | neutral |

2.2 Related Work

2.2.1 Sentiment Analysis Approaches

Sentiment analysis determines the polarity of a text. Methods include sentiment lexicon-based approaches, machine learning-based approaches, and deep learning techniques (Li et al., 2022). The sentiment lexicon-based approaches (e.g., (Darwich et al., 2019), (Ojeda-Hernández et al., 2023)) use a sentiment lexicon, such as a dictionary or a corpus, to analyze text polarity. These approaches use annotations that describe how the text matches the lexicons. Machine learning (ML) approaches ((Silva et al., 2022), (Li et al., 2023)) classify data to predict emotional polarity. Deep learning techniques are also frequently used in sentiment analysis research (e.g., (Alamoudi and Alghamdi, 2021), (Su and Shen, 2022)). Although analyzing social media data can provide valuable insights, managing this data still presents challenges. These issues include the need for a clear pattern in conceptual models and the conflicting goals of companies and researchers who use this data.

2.2.2 Social Media Data Integration Approaches

Several approaches propose frameworks that transform social media data into meaningful, valuable information to enable more effective decision-making. They incorporate these data into existing multidimensional structures. Several studies from the past five years have recommended incorporating user opinions as an analysis dimension in the DW.

(Walha et al., 2017) developed a Twitter DW model to evaluate opinions presented by Tweet Fact. It analyzes opinions across dimensions, including Tweet, Date, Time, Location, User, Sentiment, Context, and Topic. Sentiment is categorized as positive, negative, or neutral, and is determined using an opinion analysis algorithm called POLSentiment (Walha et al., 2016). In (Ben Kraiem et al., 2020), a model was developed to analyze tweet data using OLAP. The model examines user activities over time, connections between tweets and respondent users, and tweet sentiment data. The "DTweet-Metadata" dimension of the model provides opinion data on user sentiment and tweet topics. Tweet-Sentiment data is obtained by counting the tweets for each sentiment category.

(Valêncio et al., 2020) proposed a DW model for Facebook and Twitter using a normalized constellation schema to support opinion analysis. The approach involves an ETL stage, which eliminates redundant data to improve performance. The model helps in data acquisition, transformation, and loading and can extract valuable insights when human comprehension falls short. Recently, (Moalla et al., 2022) created a data mart to analyze user opinions on social media platforms such as Facebook, Twitter, and YouTube. They implemented an ETL process with three stages and used a supervised learning classification technique for sentiment analysis (Moalla et al., 2018). Despite consistent experimental results, the study's drawback was the need for further formalization and design of the ETL process steps.

Detecting sentiments in UGC text is crucial. Effective pre-processing methods have been defined, but designers need to incorporate them into ETL modeling to align UGC with the social DW.

3 SENTIMENT ANALYSIS AND ETL MODELING

We employ the POLSentiment lexicon-based approach to analyze sentiment indicators in text and evaluate polarity values. This method, detailed in (Walha et al., 2016), allows us to determine SentimentDim attributes, as defined in Figure 1.

3.1 Overview of POLSentiment

POLSentiment is a sentiment analysis method that uses dictionaries to recognize positive or negative sentiments expressed in user-generated texts. Opinion dictionaries are formed from sentiment indicators widely used on social media. These indicators can be verbal expressions, known as opinion words, or graphic symbols, known as emoticons. The POLSentiment algorithm is divided into three stages, as illustrated in Figure 2.

Step (1) in analyzing informal UGC text is to perform a primary "Text Cleaning" phase that removes unknown characters, URLs, punctuation, and repetitive characters. After this, the text is segmented into Tok 1, Tok 2, ..., Tok n through "Text Tokenization".

Step (2) identifies sentiment expressions in the UGC text, including emoticons and opinion words. It also considers the possibility of a modifier preceding an opinion word, such as "not", "little", or "very", which can alter the sentiment expressed by the word. Each sentiment indicator is described by a sentiment and a valence (a float value in [-1, 1]).

Step (3), called "Opinion Analysis", determines the sentiment expressed in the UGC entry. This step computes the text's polarity based on the valences of the sentiment indicators extracted in Step (2). To achieve this, POLSentiment evaluates the valences of opinion words, their modifiers, and emoticons. This algorithm was evaluated on a pre-annotated dataset and is reported to outperform other methods.

[Figure 2 diagram. (1) Media Clips Pre-processing: Media Clips, Text Cleaning, Text Tokenization (Tok 1, 2, ..., Tok n). (2) Lexicon Extraction: Emoticons extraction, Opinion terms extraction, Extract modifiers, Detect lexicon valences, using the Lexicon Collections. (3) Opinion Analysis: Calculate text polarity (POLSentiment algorithms).]

Figure 2: POLSentiment steps (Walha et al., 2016).

Our primary objective is to model the ETL process to match the UGC text with SentimentDim attributes. This paper proposes an ETL process called ETL4Social-Process-Sentiment that assists designers in integrating the main stages of POLSentiment in a generic and reusable manner.

3.2 ETL4Social-Process-Sentiment Model

ETL4Social-Process-Sentiment is a process that extracts, transforms, and loads user-generated texts into the SDW for easier matching with the SentimentDim dimension. Figure 3 shows its BPMN (Business Process Model and Notation) model, which consists of a sequence of ETL operations executed automatically in a specific order.

Figure 3: BPMN model of ETL4Social-Process-Sentiment.

The Crawl Media Clips operation searches social media for UGCs, known as Media Clips, published on a specific topic. The SentimentDim dimension gathers data from informal texts, which must be cleaned before use. The Pre-process Clip-Text operation applies well-known cleaning techniques to this data. It includes a tokenization stage that separates the words of the clean text to facilitate sentiment classification. The Detect Opinion Indicators operation identifies sentiment indicators in a given text, such as opinion words and emoticons. Based on these indicators, the Compute Clip-Polarity operation computes the text polarity, ranging from -1 to 1. The Specify Sent-Polarity operation then uses this value to determine the overall sentiment, such as "high positive" or "low positive", as well as the polarity of the media clip, which can be "positive", "negative", or "neutral". Finally, the Load SentimentDim operation loads the data into this dimension.

The ETL4Social-Process-Sentiment model is an abstract view of extracting UGCs, transforming them into attributes of the SentimentDim dimension, and loading them into the DW. It manages the execution order of all the operations involved in this process, while each operation receives input data, requires intermediate data, and produces output data. However, the model does not control the data flow needed to execute these operations. The ETL4Social-Sentiment-Operation models, on the other hand, focus on this aspect of modeling the ETL process.
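The Specify Sent-Polarity operation described above can be sketched as a mapping from a polarity value to the two SentimentDim attributes. This is a minimal illustration only: the function name, the three thresholds, and the "high negative" and "negative" sentiment variants (assumed by symmetry with the positive ones) are hypothetical and are not given in the paper.

```python
def specify_sent_polarity(tpol, neutral_band=0.05, low=0.35, high=0.7):
    """Map a polarity value in [-1, 1] to (sentiment, polarity) labels.

    All three thresholds are illustrative assumptions, not values
    taken from the paper.
    """
    if not -1.0 <= tpol <= 1.0:
        raise ValueError("polarity must lie in [-1, 1]")
    magnitude = abs(tpol)
    if magnitude <= neutral_band:
        return ("neutral", "neutral")
    polarity = "positive" if tpol > 0 else "negative"
    if magnitude >= high:
        sentiment = f"high {polarity}"      # e.g. "high positive"
    elif magnitude < low:
        sentiment = f"low {polarity}"       # e.g. "low negative"
    else:
        sentiment = polarity                # plain "positive" / "negative"
    return (sentiment, polarity)
```

A designer would tune the thresholds, or replace them with a lookup table, to match the sentiment variants actually stored in SentimentDim.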

4 MODELING OF ETL4Social-SENTIMENT-OPERATION

The ETL4Social-Process-Sentiment model converts media clip texts into SentimentDim dimension data. ETL operations are specified with BPMN sub-processes, as shown in Figure 3. The ETL4Social-Sentiment-Operation models manage data flow between tasks and activities. These models are detailed in the following sections.

4.1 Pre-Process Clip-Text Operation Model

The data collected through the Crawl Media Clips operation is human-generated text. Therefore, it may contain informal messages with abbreviations, spelling errors, and symbols, which makes sentiment analysis on such content challenging. To address this, we designed a model for the Pre-process Clip-Text operation, illustrated in Figure 4. It comprises a set of activities, including Get media clips, Clean texts, Tokenize texts, and Store tokens, each of which is specified by either a BPMN "Sub-Process" element for composite activities or a BPMN "Task" element for atomic activities.

Opinion analysis involves various data-cleaning techniques. Figure 4 shows the "Clean texts" activity applied to each media clip. This activity is a sequence of BPMN tasks: Clean URLs, Clean users, Clear diacritics, Clean repetitive letters, and Clean stop words. These tasks remove URLs, email addresses, user mentions (@username), diacritical signs, repetitive letters (for example, "veryyyy" is converted to "very"), and stop words (such as "the" and "a"). The result is a collection of cleaned texts. Next, the "Tokenize texts" activity segments each cleaned text into individual words or symbols known as tokens. These tokens are stored in a temporary data object (clip_c) and then used to identify opinion indicators in the clip text.
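The cleaning and tokenization tasks described above can be sketched with regular expressions. This is a minimal sketch under stated assumptions: the function names, the tiny stop-word list, and the emoticon pattern are hypothetical, and stop words are filtered after tokenization here for simplicity, whereas the model lists Clean stop words as a task of the "Clean texts" activity.

```python
import re
import unicodedata

# Tiny illustrative stop-word list (assumption); a real run needs a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "so", "to", "in"}

# Simplified emoticon pattern (assumption), e.g. ":D", ":-)", ";(".
EMOTICON = r"[:;=][\-']?[)(\]\[DPpd/\\|]"
TOKEN_RE = re.compile(rf"{EMOTICON}|#?\w+(?:'\w+)?")


def clean_text(text):
    """Apply the cleaning tasks in sequence and return the cleaned text."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # Clean URLs
    text = re.sub(r"\S+@\S+\.\S+", " ", text)            # email addresses
    text = re.sub(r"@\w+", " ", text)                    # Clean users
    text = unicodedata.normalize("NFKD", text)           # Clear diacritics
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Clean repetitive letters: runs of 3+ identical letters become one.
    text = re.sub(r"([^\W\d_])\1{2,}", r"\1", text)
    return text


def tokenize(text):
    """Split text into word, hashtag, and emoticon tokens."""
    return TOKEN_RE.findall(text)


def preprocess_clip_text(text):
    """Clean a media clip text, tokenize it, and drop stop words."""
    tokens = tokenize(clean_text(text))
    return [t for t in tokens if t.lower() not in STOP_WORDS]
```

Keeping emoticons as single tokens matters because the next operation looks each token up in the emoticon dictionary.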

4.2 Detect Opinion Indicators Model

POLSentiment categorizes the sentiment of text-based content by analyzing opinion words, modifiers, and emoticons. It identifies sentiment indicators among the tokens present in the text. The model that controls the data flows of the Detect Opinion Indicators operation is depicted in Figure 5. The first step is the "Detect indicators" activity, which determines whether a given token is an opinion word or an emoticon. It relies on the "Is an opinion word" and "Is an emoticon" tasks, which query the opinion dictionaries "Dict_Opinion_Words" and "Dict_Emoticons", respectively.

The "Is a modifier" activity searches for a modifier preceding an opinion word within the text segments. If emoticons and opinion words are present in the text, the "Store emoticon" and "Store modifier" activities temporarily store them in the objects "clip_emoticons" and "clip_opinion_words", which are eventually stored in the data staging area (DSA).
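The data flow of this operation can be sketched as a single pass over the tokens. The dictionaries below are hypothetical miniature lexicons with made-up valences, and the separate modifier dictionary is an assumption, since the paper only names the opinion-word and emoticon dictionaries.

```python
# Hypothetical miniature lexicons: token -> valence in [-1, 1].
DICT_OPINION_WORDS = {"beautiful": 0.8, "problem": -0.6, "good": 0.6}
DICT_EMOTICONS = {":D": 0.9, ":)": 0.6, ":(": -0.6}
DICT_MODIFIERS = {"not": -1.0, "very": 1.0, "little": 0.3}  # assumed lexicon


def detect_opinion_indicators(clip_tokens):
    """Return (clip_opinion_words, clip_emoticons) found in the tokens.

    clip_opinion_words holds (word, modifier) pairs, where modifier is
    None when the preceding token is not a known modifier.
    """
    clip_opinion_words = []
    clip_emoticons = []
    for i, t in enumerate(clip_tokens):
        if t.lower() in DICT_OPINION_WORDS:               # "Is an opinion word"
            # "Get previous token", then "Is a modifier"
            p_t = clip_tokens[i - 1].lower() if i > 0 else None
            modifier = p_t if p_t in DICT_MODIFIERS else None
            clip_opinion_words.append((t.lower(), modifier))
        elif t in DICT_EMOTICONS:                         # "Is an emoticon"
            clip_emoticons.append(t)
    return clip_opinion_words, clip_emoticons
```

For example, the tokens `["not", "good", ":D", "phone"]` yield one opinion word with its modifier, `("good", "not")`, and one emoticon, `":D"`.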

4.3 Compute Clip-Polarity Model

Based on (Walha et al., 2016), the Compute Clip-Polarity operation calculates the text polarity (TPol) using the valences of opinion words, modifiers, and emoticons. The operation initializes to 0 the polarities generated by opinion words (TPol_W) and emoticons (TPol_E), retrieves "clip_emoticons" and "clip_opinion_words", and runs three activities in parallel: "Compute opn words polarity", "Compute emoticons polarity", and "Count opinion indicators". The last activity counts the opinion words and emoticons in the input text; the result is represented by count_ind.

Figure 4: BPMN model of Pre-process Clip-Text operation.

The TPol score considers the valences of emoticons and opinion words. The "Compute emoticons polarity" activity adds TPol_E to TPol. "Compute opn words polarity" returns TPol_W, calculated based on opinion valences and modifiers. A BPMN gateway called "has modifier?" checks for a modifier (m) for the word (w). If there is no modifier, Val_w (the valence of w) is added to TPol_W. Otherwise, both Val_w and Val_m are considered. If Val_w is negative, the opposite of Val_w and the absolute value of Val_m are added to TPol_W. Otherwise, this step is completed by carrying out the "Average: Neg(Val_w), Abs(Val_m)" activity. If Val_m is negative, its opposite is added to TPol_W. To determine a text's polarity, we add the polarity values of opinion words and emoticons and divide the total by count_ind, that is, TPol = (TPol_W + TPol_E) / count_ind.
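The final aggregation, TPol = (TPol_W + TPol_E) / count_ind, can be sketched as follows. The exact modifier rule of the BPMN model is not fully recoverable from the description above, so the sketch takes it as a `combine` parameter; `default_combine` is a hypothetical stand-in and not the rule of (Walha et al., 2016). Returning 0.0 when no indicator is found is also an assumption.

```python
def default_combine(val_w, val_m):
    """Hypothetical modifier rule (NOT the paper's exact rule).

    A negative modifier flips the word's valence; otherwise the word's
    valence is averaged with the modifier's strength, keeping its sign.
    """
    if val_m < 0:
        return -val_w
    sign = 1.0 if val_w >= 0 else -1.0
    return (val_w + sign * abs(val_m)) / 2


def compute_clip_polarity(opinion_words, emoticon_valences,
                          combine=default_combine):
    """Compute TPol from indicator valences.

    opinion_words: list of (val_w, val_m) pairs, val_m is None if the
    word has no modifier. emoticon_valences: list of floats.
    """
    tpol_w = 0.0
    tpol_e = 0.0
    for val_w, val_m in opinion_words:       # "Compute opn words polarity"
        # "has modifier?" gateway
        tpol_w += val_w if val_m is None else combine(val_w, val_m)
    for val_e in emoticon_valences:          # "Compute emoticons polarity"
        tpol_e += val_e
    # "Count opinion indicators"
    count_ind = len(opinion_words) + len(emoticon_valences)
    if count_ind == 0:
        return 0.0   # assumption: no indicator is treated as neutral
    return (tpol_w + tpol_e) / count_ind
```

Because every term is a valence in [-1, 1] and the sum is divided by the number of indicators, the result stays in [-1, 1] as long as `combine` also returns values in that range.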

Due to diverse social media platforms, designers of social data warehouses may need help with the ETL process. Identifying specific ETL concepts can help modify or create new models. Our method includes sentiment analysis and introduces new social ETL modeling concepts.

5 FORMALIZATION OF ETL4Social-SENTIMENT CONCEPTS

5.1 ”O-Activity” Concept

"O-Activity" is the executable component in a social ETL operation. This concept has three variations: "Social Activity", "Standard Activity", and "Semantic Activity".

Definition 1. O-Activity

An O-Activity ($O\_A$) is defined by the n-tuple $(Name_{O\_A}, InpFlObj_{O\_A}, OutFlObj_{O\_A}, InpData_{O\_A}, OutData_{O\_A})$, where:

• $Name_{O\_A}$: the name of the activity $O\_A$,

• O-Model_Name: the name of the ETL operation model containing this activity,

• $InpFlObj_{O\_A}$: the input flow object of $O\_A$. This object might be an instance of an activity (O-Activity), a gateway, or an event,

• $OutFlObj_{O\_A}$: the output flow objects of $O\_A$,

• $InpData_{O\_A} = \{Din_1; \ldots; Din_n\}$: a set of input data objects (O-Data) used to execute $O\_A$,

• $OutData_{O\_A} = \{Dout_1; \ldots; Dout_m\}$: a set of data resulting from the execution of $O\_A$.

5.2 ”Social Activity” Concept

"Social Activity" is a step of the ETL operation for mapping UGC data into the social DW, and it can be either standard or semantic.

Figure 5: BPMN model of Detect Opinion Indicators operation.

Definition 2. Social Activity

A Social Activity ($A\_SO$) is an n-tuple $(Name_{A\_SO}, InpFlObj_{A\_SO}, OutFlObj_{A\_SO}, InpData_{A\_SO}, OutData_{A\_SO}, StdAct_{A\_SO}, SemAct_{A\_SO})$, where:

• the parameters $Name_{A\_SO}$, $InpFlObj_{A\_SO}$, $OutFlObj_{A\_SO}$, $InpData_{A\_SO}$, and $OutData_{A\_SO}$ are already described in Definition 1,

• $StdAct_{A\_SO} = \{Ast_1; \ldots; Ast_n\}$: a set of standard activities that are parts of the activity $A\_SO$,

• $SemAct_{A\_SO} = \{Asm_1; \ldots; Asm_m\}$: a set of semantic activities of $A\_SO$.

Example 1. "Detect Indicators" is a social activity that identifies emoticons and opinion words in a UGC text. It involves standard tasks, like "Get" and "Store", and semantic tasks, such as identifying opinion words and emoticons, under the "Detect Opinion Indicators" operation (Figure 5). "Detect Indicators" is defined as follows:

$A\_SO_{DetIndic}$ = ("Detect Indicators", $InpFlObj_{DetIndic}$, $OutFlObj_{DetIndic}$, $InpData_{DetIndic}$, $OutData_{DetIndic}$, $StdAct_{DetIndic}$, $SemAct_{DetIndic}$)

with:

- $InpFlObj_{DetIndic}$ = {Get tokens},

- $OutFlObj_{DetIndic}$ = {Store clip emoticons},

- $InpData_{DetIndic}$ = {Inpclip_tokens},

- $OutData_{DetIndic}$ = {clip_emoticons, clip_opinion_words},

- $StdAct_{DetIndic}$ = {Store opinion word, Get previous token, Store modifier, Store emoticon},

- $SemAct_{DetIndic}$ = {Is an opinion word, Is a modifier, Is an emoticon}.
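Definitions 1 and 2 and Example 1 can be mirrored in code as plain records. The sketch below is one possible encoding, not part of the formalization itself: the class and field names are hypothetical, and flow objects, data objects, and sub-activities are referenced by name (strings) for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OActivity:
    """Definition 1: an executable component of a social ETL operation."""
    name: str
    inp_fl_obj: List[str] = field(default_factory=list)  # input flow objects
    out_fl_obj: List[str] = field(default_factory=list)  # output flow objects
    inp_data: List[str] = field(default_factory=list)    # input data objects
    out_data: List[str] = field(default_factory=list)    # output data objects


@dataclass
class SocialActivity(OActivity):
    """Definition 2: adds the standard and semantic activities it contains."""
    std_act: List[str] = field(default_factory=list)
    sem_act: List[str] = field(default_factory=list)


# Example 1 expressed as an instance.
detect_indicators = SocialActivity(
    name="Detect Indicators",
    inp_fl_obj=["Get tokens"],
    out_fl_obj=["Store clip emoticons"],
    inp_data=["Inpclip_tokens"],
    out_data=["clip_emoticons", "clip_opinion_words"],
    std_act=["Store opinion word", "Get previous token",
             "Store modifier", "Store emoticon"],
    sem_act=["Is an opinion word", "Is a modifier", "Is an emoticon"],
)
```

Modeling "Social Activity" as a subclass reflects that it is a variation of "O-Activity"; a design tool could validate such records before generating the corresponding BPMN elements.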

5.3 ”Standard Activity” Concept

ETL processes involve common activities like "Get", "Join", "Merge", and "Store" used to promote structured data processing.

Definition 3. Standard Activity

"Standard Activity" ($A\_ST$) is a variant of the "O-Activity" concept, defined by the n-tuple $(Name_{A\_ST}, InpFlObj_{A\_ST}, OutFlObj_{A\_ST}, InpData_{A\_ST}, OutData_{A\_ST})$.

Example 2. The activity "Get previous token" in the "Detect Indicators" social activity, described in Example 1, is an instance of the "Standard Activity" concept. It is defined as:

$A\_ST_{GetPrevT}$ = ("Get previous token", $InpFlObj_{GetPrevT}$, $OutFlObj_{GetPrevT}$, $InpData_{GetPrevT}$, $OutData_{GetPrevT}$)

with:

- $InpFlObj_{GetPrevT}$ = {Store opinion word: t},

- $OutFlObj_{GetPrevT}$ = {Is a Modifier: p_t},

- $InpData_{GetPrevT}$ = {t, clip_tokens},

- $OutData_{GetPrevT}$ = {p_t}.

Figure 6: BPMN model of Compute Clip-Polarity operation.

5.4 ”Semantic Activity” Concept

"Semantic Activity" transforms human-generated text into decisional information using lexical or semantic resources.

Definition 4. Semantic Activity

"Semantic Activity" ($A\_SE$), a variant of the "O-Activity" concept, is defined by the n-tuple $(Name_{A\_SE}, InpFlObj_{A\_SE}, OutFlObj_{A\_SE}, InpData_{A\_SE}, OutData_{A\_SE})$.

Example 3. "Is an emoticon" is an activity under "Detect Indicators", as defined in Example 1 and shown in Figure 5. This activity, used for identifying emoticons in text, is summarized as follows:

$A\_SE_{IsAnEmot}$ = ("Is an emoticon", $InpFlObj_{IsAnEmot}$, $OutFlObj_{IsAnEmot}$, $InpData_{IsAnEmot}$, $OutData_{IsAnEmot}$)

with:

- $InpFlObj_{IsAnEmot}$ = {opinion word?},

- $OutFlObj_{IsAnEmot}$ = {emoticon?},

- $InpData_{IsAnEmot}$ = {t, Dict_Emot},

- $OutData_{IsAnEmot}$ = {resp}.

6 DISCUSSION

Table 2: ETL4Social-Sentiment vs. Opinion Integration Approaches.

| Approaches | Social Media | ETL Design | ETL concept | SA method |
|---|---|---|---|---|
| (Walha et al., 2017) | TW | yes | yes | no |
| (Ben Kraiem et al., 2020) | TW | no | no | no |
| (Valêncio et al., 2020) | FB, TW | no | no | no |
| (Moalla et al., 2022) | FB, TW, YT | no | no | ML |
| ETL4Social-Sentiment | Global SDW | yes | yes | POLSentiment |

This paper presents a design solution for incorporating sentiment analysis of UGC into the SDW, reducing computational costs, and enabling opinion discovery. We compared our proposal with the existing approaches presented in Section 2.2.2. Table 2 shows the results based on the following criteria.

• Social Media. It lists the social media platforms, such as Twitter (TW), Facebook (FB), and YouTube (YT), used by the DW and ETL models.

• ETL Design. It verifies whether the approach proposes an ETL process model to map UGC into sentiment.

• ETL concept. It checks whether the approach defines or formalizes the ETL concepts.

• SA method. It determines whether the sentiment is generated based on a valid sentiment analysis method.

Although some approaches have proposed solutions to integrate opinions from text UGCs, we note the need to model the ETL processes that transform UGC text into DW data. In this context, our contribution addresses this problem at a conceptual level. The models (cf. Sections 3.2 and 4) serve as design patterns for opinion data integration and simplify the ETL designer's task. Moreover, the ETL4Social-Process-Sentiment process and operation models apply to all social media types for sentiment analysis. They provide ETL designers with a standardized approach to optimizing social ETL processes using formalized concepts.

7 CONCLUSION

Integrating opinion data from unstructured text sources into a decisional system can be challenging when designing ETL processes. A social data warehouse can help with this. However, careful handling of user-generated content is required to identify sentiment. Our research aimed to develop practical approaches for sentiment analysis on social media. We proposed design models for the ETL4Social-Sentiment process and operations. These models handle activities and data to match UGC text with the SentimentDim dimension of the SDW. The models are generic and customizable based on the formalized ETL concepts. Big data sources require powerful ETL tools that are efficient in execution cost, transformations, and parallel data processing. To improve our proposal, we plan to use MapReduce as a distributed execution framework to process big data in parallel, saving time and reducing the risk of errors.

REFERENCES

Alamoudi, E. S. and Alghamdi, N. S. (2021). Sentiment classification and aspect-based sentiment analysis on Yelp reviews using deep learning and word embeddings. Journal of Decision Systems, 30(2-3):259–281.

Ben Kraiem, M., Alqarni, M., Feki, J., and Ravat, F. (2020). OLAP operators for social network analysis. Cluster Computing, 23:2347–2374.

Darwich, M., Mohd, S. A., Omar, N., and Osman, N. A. (2019). Corpus-based techniques for sentiment lexicon generation: A review. J. Digit. Inf. Manag., 17(5):296.

Khan, B., Jan, S., Khan, W., and Chughtai, M. I. (2024). An overview of ETL techniques, tools, processes and evaluations in data warehousing. Journal on Big Data.

Li, J., Zhang, Y., Li, J., and Du, J. (2023). The role of sentiment tendency in affecting review helpfulness for durable products: nonlinearity and complementarity. Information Systems Frontiers, 25(4):1459–1477.

Li, S., Liu, F., Zhang, Y., Zhu, B., Zhu, H., and Yu, Z. (2022). Text mining of user-generated content (UGC) for business applications in e-commerce: A systematic review. Mathematics, 10(19):3554.

Moalla, I., Nabli, A., and Hammami, M. (2018). Towards opinions analysis method from social media for multidimensional analysis. In MoMM, pages 8–14.

Moalla, I., Nabli, A., and Hammami, M. (2022). Data warehouse building to support opinion analysis in social media. Social Network Analysis and Mining, 12(1):123.

Ojeda-Hernández, M., López-Rodríguez, D., and Mora, Á. (2023). Lexicon-based sentiment analysis in texts using formal concept analysis. International Journal of Approximate Reasoning, 155:104–112.

Silva, L. M. M., Valêncio, C. R., Zafalon, G. F. D., Columbini, A. C., Filipe, J., Smialek, M., Brodsky, A., and Hammoudi, S. (2022). Feature selection with hybrid bio-inspired approach for classifying multi-idiom social media sentiment analysis. In ICEIS, pages 297–307.

Sinha, S., Narayanan, R. S., and Rakila, R. (2024). Harnessing sentiment analysis methodologies for business intelligence enhancement and governance intelligence evaluation. Journal of Intelligent Systems and Applications in Engineering, 12(11s):166–176.

Su, Y. and Shen, Y. (2022). A deep learning-based sentiment classification model for real online consumption. Frontiers in Psychology, 13:886982.

Valêncio, C. R., Silva, L. M. M., Tenório, W., Zafalon, G. F. D., Colombini, A. C., and Fortes, M. Z. (2020). Data warehouse design to support social media analysis in a big data environment. Journal of Computer Science, pages 126–136.

Walha, A., Ghozzi, F., and Gargouri, F. (2016). A lexicon approach to multidimensional analysis of tweets opinion. In AICCSA, pages 1–8. IEEE.

Walha, A., Ghozzi, F., and Gargouri, F. (2017). ETL4Social-Data: Modeling approach for topic hierarchy. In KEOD, pages 107–118.

Wankhade, M., Rao, A. C. S., and Kulkarni, C. (2022). A survey on sentiment analysis methods, applications, and challenges. Artificial Intelligence Review, 55(7):5731–5780.
