Warehousing Data for Brand Health and Reputation with AI-Driven

Scores in NewSQL Architectures: Opportunities and Challenges

Paulo Siqueira

1,2 a

, Rodrigo Dias

1 b

, Jo

ao Silva-Leite

2 c

, Paulo Mann

3 d

, Rodrigo Salvador

2 e

Daniel de Oliveira

2 f

and Marcos Bedo

2 g

Wikki Brasil, Parque Tecnol

ogico, Rio de Janeiro/RJ, Brazil

Institute of Computing, Fluminense Federal University, Niter

oi/RJ, Brazil

Institute of Mathematics and Statistics, Rio de Janeiro State University, Rio de Janeiro/RJ, Brazil

Keywords:

Multidimensional Model, Data Warehousing, Public Relations, Brand Monitoring, NewSQL, HTAP.

Abstract:

This study explores the use of NewSQL systems for brand health and reputation analysis, focusing on multidi-

mensional modeling and Data Warehouses. While row-based and relational OLAP systems (ROLAP) struggle

to ingest large volumes of data and NoSQL alternatives rely on physically coupled models, NewSQL solutions

enable Data Warehouses to maintain their multidimensional schemas, which can be seamlessly implemented

across various physical models, including columnar and key-value structures. Additionally, NewSQL provides

ACID guarantees for data updates, which is instrumental when data curation involves human supervision. To

address these challenges, we propose a Star schema model to analyze brand health and reputation, focusing on

the ingestion of large volumes of data from social media and news sources. The ingestion process also includes

rapid data labeling through a large language model (GPT-4o), which is later reﬁned by human experts through

updates. To validate this approach, we implemented the Star schema in a system called RepSystem and tested it

across four NewSQL systems: Google Spanner, CockroachDB, Snowﬂake, and Amazon Aurora. An extensive

evaluation revealed that NewSQL systems signiﬁcantly outperformed the baseline ROLAP (a multi-sharded

PostgreSQL instance) in terms of: (i) data ingestion time, (ii) query performance, and (iii) maintenance and

storage. Results also indicated that the primary bottleneck of RepSystem lies in the classiﬁcation process,

which may hinder data ingestion. These ﬁndings highlight how NewSQL can overcome the drawbacks of

row-based systems while maintaining the logical model, and suggest the potential for integrating AI-driven

strategies into data management to optimize both data curation and ingestion.

1 INTRODUCTION

The Multidimensional model serves as the logical

foundation for OLAP (Online Analytical Process-

ing) systems and is widely used in its materialized

form within physically row-based database manage-

ment engines that support Data Warehouses (RO-

LAP) (Morfonios et al., 2007; Zhang et al., 2019;

Dehne et al., 2003). Although such engines are de-

https://orcid.org/0000-0003-3652-284X

https://orcid.org/0000-0003-1288-3093

https://orcid.org/0009-0001-7881-6566

https://orcid.org/0000-0002-3597-9170

https://orcid.org/0009-0001-1202-5585

https://orcid.org/0000-0001-9346-7651

https://orcid.org/0000-0003-2198-4670

signed to handle efﬁciently normalized, day-to-day

transactional tasks that follow the store-and-query

pipeline, they struggle in the ingestion of big data

as data typically undergoes a validation workﬂow to

ensure ACID (Atomicity, Consistency, Isolation, and

Durability) properties (Garcia-Molina et al., 2008).

This limits the practical use of multidimensional

models in favor of quick-ingestion NoSQL strategies

with physically-coupled schemas that, in their turn,

may be tricky and difﬁcult to migrate between plat-

forms (Ramzan et al., 2019; Chevalier et al., 2015).

The emergence of NewSQL solutions provides

scalable executions for the mixed HTAP (Hybrid

Transactional/Analytical Processing) paradigm (Val-

duriez et al., 2021; Huang et al., 2020; Grolinger

et al., 2013). They enable Data Warehouses to

continue being speciﬁed through multidimensional

Siqueira, P., Dias, R., Silva-Leite, J., Mann, P., Salvador, R., de Oliveira, D. and Bedo, M.

Warehousing Data for Brand Health and Reputation with AI-Driven Scores in NewSQL Architectures: Opportunities and Challenges.

DOI: 10.5220/0013279800003929

In Proceedings of the 27th International Conference on Enterprise Information Systems (ICEIS 2025) - Volume 1, pages 51-62

ISBN: 978-989-758-749-8; ISSN: 2184-4992

designs and be implemented transparently across

different physical models, including columnar and

key-value schemas, and ensure ACID properties to

transactional-like operations, which is ideal for OLAP

systems requiring data curation (Chereja et al., 2021).

Quality in evolving data, quick ingestion, and fast

responses to OLAP queries are particularly relevant

requirements in brand health and reputation moni-

toring tools. In this context, companies can analyze

and assess topic trends, perform sentiment analysis,

evaluate the impact of stakeholders’ statements, and

measure engagement generated by digital inﬂuencers.

Such systems require fast responsiveness from so-

cial media, print, and digital press to track positional

trends over time (Doorley and Garcia, 2015). Accord-

ingly, decision-making processes focus on acting and

reacting to preserve or enhance the core values and

perceived quality associated with each brand (Wang

et al., 2021). The advantages of a strong reputation

are substantial and wide-ranging: organizations with

solid reputations attract top talent, secure better terms

with suppliers, and increase their margins for PR-

based products and operations (Wang et al., 2021).

Although reputation may seem like an intangible

resource, it can be quantiﬁed and monitored through

various metrics and analytical methods, such as mea-

suring the reach and practical impact of a report (or

social media post). This capacity to assess reputa-

tion enables companies to objectively evaluate, track,

and strategically manage their position over time,

where the present is exponentially more relevant than

the past. This scenario ﬁts perfectly into a multidi-

mensional model, where dimensions represent data

sources, authors, stakeholders, and metrics directly

collected from the impact of posts and articles (here-

after referred simply to as mentions) that mention

the company’s brands. The time-sensitive nature of

OLAP analysis enables grouping mentions into hot

and cold portions based on their origin date within

a pipeline that also aligns with the multidimensional

structure. However, the expected volume of ingested

data also suggests row-based solutions are unsuitable

for the problem, whereas NoSQL solutions require

yet another design that may be hard to migrate on-

premise. The NewSQL approach is still unexplored

for reputational analyses, to the best of the authors’

knowledge (Li and Zhang, 2022; Wang et al., 2021).

In this study, we ﬁll this gap by examining the

behavior of different NewSQL systems considering a

single Star schema designed for reputation and brand

health analysis. This challenge involves not only a

high volume of data ingestion from social media and

press outlets but also requirements where (i) mentions

must be quickly labeled regarding several facets in a

single data pass, which we accomplish by using an

LLM-based classiﬁcation strategy and (ii) dimension

values, new mentions, and data classiﬁcation may

eventually be reﬁned by a human expert through up-

date and removal operations that require consistency.

Accordingly, we implemented the multidimen-

sional model as a system called RepSystem and

instantiated it in four different NewSQL solutions

with varying materialized physical implementations:

Google Spanner, CockroachDB, Snowﬂake, and

Amazon Aurora. Regarding data classiﬁcation, we

addressed the challenge of quantifying corporate rep-

utation by designing a multi-purpose prompt that dis-

cretizes the value of each mention based on: (i) senti-

ment analysis (with ﬁve distinct labels), and (ii) top-

k topics (drawn from a predeﬁned list). Following

an extensive experimental evaluation, we identiﬁed

signiﬁcant differences between the systems in terms

of: (i) data ingestion time, (ii) storage requirements,

(iii) OLAP query time, and (iv) maintenance costs.

The results further indicated that: (i) the analyzed sys-

tems substantially outperform the baseline ROLAP

implementation, which utilizes a PostgreSQL with

multiple shards, and (ii) the primary bottleneck in the

RepSystem is not ingestion nor query performance in

NewSQL, but rather data classiﬁcation.

The remainder of this paper is structured as fol-

lows: Section 2 presents the main concepts and re-

lated work. Section 3 introduces RepSystem and de-

scribes the materials and methods. Section 4 pro-

vides the experimental evaluation and discusses the

key ﬁndings regarding the observed performance. Fi-

nally, Section 5 concludes the study and outlines di-

rections for future work.

2 RELATED WORK

Business Intelligence Applied to Public Relations

(PR) and Social Media. PR enterprises manage

clients’ reputations aiming at developing robust

communicative strategies (Civelek et al., 2016). Data

is sourced from social media platforms and the print

and digital press, each impacting different audience

proﬁles and shaping stakeholders’ perceptions.

Before the advent of advanced AI models, public

relations depended on human content analysis and

annotation to extract more information, including

the identiﬁcation of topics and the interpretation of

the context surrounding each mention (Macnamara,

2005). Consequently, a set of key proprietary

performance indicators (KPIs) was established for

individual companies, typically integrating audience

reach and impact into dashboard monitors that evolve

ICEIS 2025 - 27th International Conference on Enterprise Information Systems

over time (Marr and Schiuma, 2003). However, the

manual nature of this process limited the scale of

insights, as it excluded the analysis of vast volumes

of digital mentions (Fan and Gordon, 2014). AI-

powered systems can automatically label topics and

assess sentiment, which enables PR professionals to

focus on tracking insights related to brand reputation

and detecting crises in their early stages. Despite

these advancements, two signiﬁcant issues are still

open: (i) how to efﬁciently handle a large volume

of mentions while maintaining system performance

and KPI consistency over time, and (ii) how do

those data management practices align with AI-based

solutions to ensure that the ingestion process evolves

consistently and scales with input?

Multidimensional Logical Model. The multi-

dimensional model is a logical abstraction that

represents data structures in a way that facilitates

efﬁcient querying and analysis. It organizes data into

dimensions and facts, with dimensions representing

descriptive attributes (e.g., time, publishers) and

facts representing quantitative metrics (e.g., views,

comments) (Inmon, 2005; Kimball and Ross, 2013;

Golfarelli and Rizzi, 2009). Such tables are often

structured as a Star schema, where a central fact table

is connected to dimension tables, or a snowﬂake

schema, where dimension tables are normalized. The

model is typically implemented using a materialized

strategy, where data is copied and pre-aggregated,

and stored in a Data Warehouse within a DBMS.

Data Warehousing (DW). The DW approach is

a structured process that materializes data from

multiple sources to support business intelligence.

It consists of four key layers that include (i) the

mapping layer of external sources (providing raw

input data), (ii) the stagging layer of data extracted

from external sources, which serves as an inter-

mediary area where data is standardized, cleaned,

and aggregated by ETL (Extract, Transform, Load)

routines, (iii) the warehouse layer, where material-

ized data are stored, and (iv) the presentation layer,

which interfaces with end-users through BI tools and

dashboards. Speciﬁcally, the third layer provides a

physical implementation for the multidimensional

model, which can take many forms (Inmon, 2005).

DBMS Physical Models. In addition to the

traditional row-based implementation of the multi-

dimensional model in relational DBMSs (ROLAP),

other physical models, including NoSQL approaches,

can be used for implementing the Data Warehouse

layer (Cuzzocrea et al., 2013). This allows for the use

of distributed query engines, enabling near-real-time

processing with improved performance. However,

the challenge lies in mapping the logical model to

speciﬁc physical structures and query languages,

which complicates data migration and tracking

schema evolution. Distributed databases, such as

NewSQL and HTAP systems, also have the potential

to implement the multidimensional model across

nodes or clusters using sharding. These systems not

only overcome the limitations of previous approaches

but also ensure a uniﬁed query language and deliver

high availability and data consistency (Huang et al.,

2020; Valduriez et al., 2021).

PostgreSQL. PostgreSQL is a row-based, relational

DBMS that can implement a multidimensional

logical model in a ROLAP architecture by storing

dimensional and fact tables as clustered and indexed

records. Records can be queried using high-level

SQL, with extensions for multidimensional operators,

such as ROLLUP. While query performance can be

signiﬁcantly improved with indexing and sharding,

these techniques require expertise and monitoring to

maintain efﬁciency. Despite these limitations, Post-

greSQL provides a ﬂexible, cost-effective baseline

solution for systems that do not need the specialized

functionality of a dedicated OLAP engine – particu-

larly when the ingestion window can be relaxed.

Amazon Aurora. Aurora extends PostgreSQL’s

basic principles to the AWS environment, providing

a distributed, fault-tolerant storage architecture. The

storage mechanism uses six shards across different

availability zones with auto-scaling and load balanc-

ing. Replication is log-based and takes full advantage

of AWS resources, providing low-latency updates,

synchronous writes, and asynchronous reads. Aurora

fully supports SQL and the PostgreSQL wire pro-

tocol, optimizing query performance with adaptive

caching and auto-scaling distributed in-memory

search operators (Verbitski et al., 2017).

Google Spanner. Spanner is a distributed NewSQL

system that scales horizontally. It relies on a unique

architecture based on proprietary synchronized

clocks (the Google TrueTime API) that ensures

ACID properties with distributed transactions. Span-

ner replication and sharding are based on user-deﬁned

thresholds and occur automatically, enabling auto

load balancing and reducing maintenance overhead.

Spanner’s shards communicate through the Paxos

protocol replicated in multiple zones for redundancy.

In terms of SQL support, Spanner offers full ANSI

SQL compliance for data querying so that results can

Warehousing Data for Brand Health and Reputation with AI-Driven Scores in NewSQL Architectures: Opportunities and Challenges

Data sources

Elastic ETL + Classifier

Extract

Transform

Pre-Load

Schema

Alignment

Multi-task

Prompt

Mention

BI Report

Sentiment %

Topic %

Top Mention

Negative

Product A

XYZ on

Regional News

NewSQL Warehouse

Star-Schema

Logical Model

Figure 1: RepSystem pipeline for PR analysis (social media, and digital and printed press). (a) Sources, (b) Workers for ETL,

be combined with multiple resources from Google

Cloud services (Corbett et al., 2013).

CockroachDB. CockroachDB is a NewSQL built

on a shared-nothing architecture that relies on the

Raft consensus protocol to deliver ACID transac-

tions. Isolation is achieved by a two-phase commit

strategy, while a Multi-Version Concurrency Control

mechanism manages the concurrent transaction

schedule. Its physical model is a key-value storage

with automatic sharding and geographical balancing,

where entries can have multiple keys. The DBMS

is fully compatible with PostgreSQL, including

the Wire protocol and the structure of procedures.

CockroachDB engine optimizes query execution

through distributed planning, and the execution can

be integrated with Kubernetes for orchestration,

enabling seamless deployment and scaling in cloud-

native environments (Taft et al., 2020).

Snowﬂake. Snowﬂake uses a multi-cluster architec-

ture to replicate data across multiple regions to ensure

availability. It uses transaction logs for durability and

micro-partitioned storage to optimize performance

and maintain consistency, which enables efﬁcient

data pruning and reduced scan times. Snowﬂake

is SQL-compliant and a cloud-native DBMS for

warehousing, separating workloads for query and

storage to achieve independent scaling. Optimization

is achieved through virtual abstractions applied to

micro-partitions, enabling isolated compute resources

that scale automatically. Those abstractions beneﬁt

from zero-copy cloning, which allows for fast and

cost-effective data duplication without physically

copying the data (Dageville et al., 2016).

Related Work. Data-driven approaches and AI, par-

ticularly Large Language Models (LLM), have trans-

formed the PR ecosystem, especially in reputation

management. This shift calls for new strategies and

tools in the form of a hybrid intelligence that com-

bines traditional PR with data-driven insights (Santa

Soriano and Torres Vald

es, 2021). The study of Jeong

and Park (2023) shows PR efﬁciency enhancement

with AI. An example of this use is modeling sentiment

towards brands, as shown in the studies of Zhao et al.

(2023) and Ingole et al. (2024), which provide accu-

rate brand rankings and sentiment predictions. The

work of Alqwadri et al. (2021) extends sentiment pre-

diction to forecast consumer reliability, while Ur Rah-

man et al. (2022) applies a deep learning-based solu-

tion to enhance accuracy.

While these studies emphasize the data-driven po-

tential for PR applications, they overlook how to man-

age the large volumes needed to realize these ad-

vancements. This study addresses these challenges

by using NewSQL solutions to maintain the seman-

tic integrity of KPIs and ensure evolving consistency

through a detached multidimensional model. Addi-

tionally, we implement an LLM-based approach for

scalable multi-task data labeling, advancing state-of-

the-art reputation management.

3 MATERIAL AND METHODS

This section introduces RepSystem, a PR system

that consolidates data from social media, digital, and

printed press. We set RepSystem to use third-party

services that collect data from original sources

for generality sake and, as some data come from

digitalized printed press, we use the umbrella term

“clipping” to refer to that data gathering stage. Figure

1 presents the RepSystem data pipeline, following a

materialized perspective.

Data Sourcing and Extraction. RepSystem was

designed to handle inputs from ﬁles (spreadsheets),

API outputs in .xml and .json, and a combination

of all of the above. The extraction process for press

content poses unique challenges due to the existence

ICEIS 2025 - 27th International Conference on Enterprise Information Systems

of paywalls, text varying sizes, and multimedia

links. As for the third-party services, we connected

RepSystem with one clipping tool for social media

and four clipping applications that handle different

media channels, including television, newspapers,

and blogs. Although the ﬁlters that collect relevant

mentions for every monitored brand were deﬁned

on the clipping setup, we also set a dynamic list of

regular expressions (regex) terms associated with

each brand, including company names, brands’

names, and topics of interest to further ﬁlter spurious

data from our experimental analysis.

Schema Alignment. The ETL process begins by

mapping the dataset (rows and columns) to the logical

model using a versioned YAML ﬁle that deﬁnes two

parameters for each attribute of the model. The

parameters are: (i) the original and mapped column

names, and (ii) the extraction and transformation

functions. Users can conﬁgure multiple functions

per column, allowing each attribute to return speciﬁc

types, such as a unique value, a list, or a multi-

dimensional list. A try-catch mechanism ensures

that the process iteratively applies each function

per column in the order of priority deﬁned by the

written order. For instance, to extract the number

of reactions, the YAML ﬁle can initially redirect

to the function ext reacts, which attempts to fetch

the reaction value from a single column. If the

function fails (e.g., the column does not exist, or it is

empty), the function ext aggReacts is activated, which

materializes the reaction values as a new column

generated by the aggregate sum of comments, likes,

and shares. A last priority, a default map function

must also be deﬁned to handle missing columns.

Data Transformation. Collected mentions were

transformed to ensure standardization and prevent

missing entries related to content, title, date, publisher

vehicle, and author. The transformation pipeline

consists of sequential steps (the actions mapped to

functions), with a common yet critical component

being the management of data inconsistencies using

regular expressions (regex). We deﬁned a hierarchical

set of functions, ranging from exact two-way matches

to approximate matches, so that, depending on the

organization and structure of the input source, distinct

regex strategies can be seamlessly applied.

Deterministic, Unique Identiﬁers. The ETL

routines use a hashing method based on each men-

tion’s URL

combined with the channel name to

create a unique identiﬁer for each mention, i.e.,

Digitalized contents were linked to cloud-stored ﬁles.

Source 1 Source 2 Source 3

Distributed

Database

Source N

ETL

LLM

ETL

LLM

ETL

LLM

ETL

LLM

Figure 2: RepSystem data ingestion.

the idMention. This process relies on the SHA-1

algorithm combined with UUID5 casting to produce

a deterministic 128-bit identiﬁer, which, by deﬁni-

tion, prevents duplicate inserts. Finally, the ETL

combines this identiﬁer with the company identiﬁer,

as different mentions may be related to distinct

companies, e.g., two competitor brands in the same

article. This composite key is not only semantically

meaningful but also eliminates the need for locks

during insertion, enabling concurrent inserts and

simplifying transaction scheduling.

Data Ingestion. Mentions are always loaded in

append mode, whereas dimensions are loaded using

a destructive merge policy. The ingestion ﬂow

follows the simpliﬁed diagram in Figure 2, assuming

data are ingested upon user requests (without a

predeﬁned time window) Each request triggers a new

containerized process that fetches a chunk of raw

data from a clipper source, aligns and transforms

the internal information, and sends the content for

LLM-based analysis. The orchestration of this

elastic approach can be managed using an external

balancer or a simple mechanism, such as a message

queue associated with a triggering lambda function.

RepSystem uses transactional control tables to track

the status of launched containers so that if the ETL

process fails at any step, its status is recorded for

reproducibility and debugging purposes.

LLM-based Classiﬁcation. RepSystem analyses

require data content to be evaluated regarding senti-

ment analysis (positive, neutral, negative) and top-k

topic identiﬁcation. Since these two analyses are

different in nature, RepSystem relies on a multi-task

prompting strategy to speed up the analysis, avoiding

the problem of double-passing the data and reducing

maintenance overhead. In a nutshell, RepSystem

ETL simultaneously handles the two tasks, each

associated with a set of mentions and a set of label

pairs (sentiment, top-k topics). For each mention in a

task, there is one corresponding label. Therefore, a

task-shared prompt can be used to unify the represen-

Warehousing Data for Brand Health and Reputation with AI-Driven Scores in NewSQL Architectures: Opportunities and Challenges

COLD

HOT

UUID

STRING

UUID

STRING

wPolarityBase FLOAT

companyName STRING

wReactBase FLOAT

factMention

date DATE

idMention UUID

UUID

STRING

wChannelBase FLOAT

wTopicBase FLOAT

wPolarity FLOAT

score FLOAT

Figure 3: The simpliﬁed Star schema for RepSystem.

tation across all tasks. This single prompt template

is applied to each mention, producing the outputs for

the two tasks as pairs of labels. RepSystem uses a

zero-shot approach alongside the multi-task strategy,

which labels each input according to a pre-trained

encoder for general-purpose PR analysis.

Discrete Prompting. To investigate a suitable

template for the multi-task problem, we use dis-

crete prompting as part of our prompt engineering

strategy (Liu et al., 2023), which involves manually

creating prompts using natural language. Accord-

ingly, we ﬁrst developed an empirical prompt using

the chain-of-thought process and then generated three

additional prompts. Next, we assessed the quality

of the produced prompts by measuring the F1-Score

on a curated and manually annotated dataset, and

calculated the average and standard deviation to

determine the most suitable prompt for data labeling.

Multidimensional Data Model. Figure 3 presents

the RepSystem multidimensional model, which is a

Star schema for the fact table factMention. The

model is mirrored between the “hot” and “cold” por-

tions, with data stored in the former part being the

most recent and data in the latter part being archived.

This separation occurs at the DDL schema level,

enabling tuning and optimization of different work-

loads and indexing. The logical model contains ﬁve-

dimensional tables, each serving a specialized role:

(i) dimPost, which records mention content informa-

tion (entries are relatively large); (ii) dimDataSource,

which tracks the origin of each mention’s data source,

e.g., social media, newspapers, TV, etc.; (iii) dim-

Channel, which enables the identiﬁcation of the spe-

ciﬁc vehicle for the communication, e.g., Social Net

foo, TV foo, etc.; (iv) dimTopic, which structures

the topics of interest related to each company; and

(v) dimPolarity, which deﬁnes the values for senti-

ment analysis. Both dimTopic and dimPolarity di-

mensional values are used as lists of options by the

LLM to solve the multi-task classiﬁcation problem.

The fact table contains references to the di-

mensional tables, with the dimension date being

implicitly captured by the fact table. The numerical

metrics include the number of reactions (sum of likes,

comments, and shares), views, followers (at the time

of the mention), price (if it is a paid mention), and

four scores: weights for polarity ([-1, 1]), channel,

topic, and reactions, along with an overall aggregate

score for the mention ([0, 1]). This score value was

deﬁned as a linear combination to emulate proprietary

metrics, which have their own biases toward speciﬁc

dimensions. In our evaluations, we calculate the score

value as score = w

∑

i∈{Channel, Topic, React}

with

being the polarity weight. Notice LLM-based

classiﬁcation may occasionally miss. Therefore, PR

experts will review the process, correcting relevant

misclassiﬁcations and triggering the adjustment of

associated metrics. These adjustments will be con-

sistently propagated across NewSQL shards without

requiring modiﬁcations to the logical model.

RepSystem implementation. RepSystem was

implemented in Python 3.10.13, venv environment,

with packages numpy 1.26.4, pandas 2.1.4, selenium

4.19.0, Unidecode 1.38.4, beautifulsoup4 4.12.3,

SQLAlchemy 2.0.30, boto3 1.34.161, PyYAML

6.0.1, tenacity 8.5.0. GPT 4o was accessed with

package openai 1.30.3 using a corporate credential

from the Microsoft AzureAI Cloud portal with the

highest available service level agreement for a basic

regular corporate account. RepSystem deployed

on a RepSystem on AWS Cloud Platform using

AWS Lambda with ECS running the ETL as isolated

AWS fargates. The multidimensional logical model

was instanced on several environments, with cloud-

oriented instances (all East-US), namely (i) AWS

RDS for Postgres, (ii) AWS Aurora, (iii) Google

Spanner on Google Cloud Platform, (iv) Cock-

roachDB serverless instance on proprietary cloud,

and (v) Snowﬂake instance running on Amazon AWS.

Queries. The multidimensional model was described

using SQL DDL and instantiated in each DBMS with

minor adjustments for a fairer comparison (type com-

pliance). All of the evaluated queries were expressed

with SQL operators derived from the basic CUBE and

ROLLUP multidimensional operators, which were un-

supported by some DBMSs. Next, we deﬁne ﬁve

ICEIS 2025 - 27th International Conference on Enterprise Information Systems

Table 1: Average F-Measure performance for zero-shoting GPT4o-mini regarding Single Tasks (two data passes) vs. Multi-

Task (just one data pass) by using the best prompts constructed with discrete prompting.

Task Strategy Company A Company B Company C

Sentiment

Single-Task .79 .81 .82

Multi-Task .82 .84 .87

Topic

Single-Task .84 .64 .70

Multi-Task .82 .63 .67

queries that handle different workloads and produce

signiﬁcant insights into various branding reputation

analyses by a senior PR expert, namely: (Q1): the

proportion of mentions by polarity per company,

rolling up on dimension date (lightweight workload

– Overall sentiment towards the brand); (Q2): the av-

erage reaction by channel per company and polarity,

rolling up on dimension date (medium workload – In-

ferred support or detractor channel); (Q3): the most

frequent pairs ⟨company, topic⟩ over time, with a

CUBE on topics (medium workload – Dominant narra-

tives per period); (Q4): highest-ranked channels (sum

of absolute scores) by company aggregate date per

source, rolling up on dimension date (heavy work-

load – Channels impacting your brand); (Q5): top-

k monthly and annual mentions by company, data

source, polarity, top-k

′

topics, and top-k

′′

channels

(heavy workload – Most inﬂuential content over the

period by producer).

Whenever possible, we used SQL-only functions

to reduce the writing, such as in (Q2), which was

reduced to a speciﬁc instance of the Market-Basket

problem solved by the partition-based A-Priori solu-

tion. The same “function-packing” rationale was em-

ployed to UNION result for DBMSs not supporting the

multidimensional operators. Finally, we wrote (Q5)

with LATERAL JOIN/UNNEST to avoid converting win-

dow functions for speciﬁc SQL dialects.

4 EXPERIMENTAL EVALUATION

Dataset Workloads. We curated a dataset by select-

ing two years of data from three real-world compa-

nies across very different domains, totaling over two

million human-annotated fact tuples (4.7GB of raw

text). The dataset covers 12,975 channels, 18 distinct

data sources, and 608 unique topics. We divided the

dataset into 80 stratiﬁed workloads, with stratiﬁcation

based on the original “topic” as the meta attribute for

each company. Next, we removed the sentiment and

topic annotations and submitted the resulting batches

for ETL processing, producing spreadsheets contain-

ing raw content from the clipping sources. Unlike

existing HTAP performance benchmarks (Cole et al.,

2011), this subject-oriented dataset incorporates an

AI-driven annotation aspect into the OLAP problem.

Multi-Task Validation. To assess the viability of

using the proposed multi-task prompting approach

as part of RepSystem’s one-pass labeling stage in

the ETL process, we compared this strategy with the

baseline approach, which utilizes two specialized

prompts (one for sentiment analysis and the other

for top-1 topic classiﬁcation). Table 1 presents the

average F-Measure (a weighted score combining

precision and recall) per company in the dataset

workload, based on a zero-shot comparison between

our multi-task approach and the baseline. This

comparison was conducted using the GPT-4o-mini

LLM (version gpt-4o-mini-2024-07-18) through

the Azure OpenAI API, with a temperature setting

of 0.3. The results show minimal differences in

average performance between the single-task and

multi-task strategies, suggesting that the proposed

approach can effectively utilize a one-pass LLM-

based analysis rather than issuing multiple requests

to label each mention. Accordingly, we adopted

the same zero-shot setup and prompt for the ﬁnal

part of our ETL evaluation in subsequent assessments.

Experimental Setup. The PostgreSQL setup was

conﬁgured with 02 vCores, 4 GiB of RAM, and

32 GiB of storage, instantiated in the East-US

AWS Cloud. For Google Spanner, a comparable

conﬁguration involved provisioning 500 PUs (0.5

nodes), providing approximately 02 vCPUs, 8 GiB

of RAM, and 32 GB of storage. The AWS Aurora

instance was conﬁgured with 02 vCPUs and 4 GiB of

RAM (db.t3.medium class), with automatic storage

scaling and failover across six availability zones.

In Snowﬂake, we used a comparable conﬁguration

with a Small Virtual Warehouse, offering 02 vCores

and 4 GiB of RAM, with automatic scaling and

sharding. Finally, CockroachDB was conﬁgured

with 02 compute units (CUs), suitable for moderate

workloads in a serverless, self-managed environment.

For Aurora, Snowﬂake, and CockroachDB, storage

was set to scale automatically.

Classiﬁcation Elapsed Time. We measured classiﬁ-

cation time separately to assess the individual impact

Warehousing Data for Brand Health and Reputation with AI-Driven Scores in NewSQL Architectures: Opportunities and Challenges

Distribution (%)

Elapsed Time Elapsed Time Elapsed Time

(a) Company A (b) Company B (c) Company C

Distribution (%)

PostgreSQL

AWS Aurora

CockroachDB

Google Spanner

Snowflake

1000 200 300 400 500 3 4.5

ms s

1000 200 300 400 500 2 2.5

ms s

1000 200 300 400 2 2.5

ms s

Figure 4: Distribution of elapsed time required for ingesting data batches into different NewSQL DBMSs. The time spent on

LLM-based classiﬁcation was subtracted from the total elapsed time.

Time (s)

0.75

1.5

Company A Company B Company C

2x Single Prompt

Multi-Task Prompt

Figure 5: Elapsed time for multi-task analysis.

of this process on the ETL pipeline. Although we em-

ployed the highest available service level for the LLM

(ensuring a throughput of 250k tokens per minute),

our measurements indicated that processing times

did not cause sufﬁcient queuing to exceed this limit.

Figure 5 presents the average elapsed time required to

label a mention. Results show that the classiﬁcation

time using the multi-task prompt remained relatively

stable across all companies examined, regardless

of the speciﬁc topic list from which the prompt

was expected to extract a response. These ﬁndings

suggest that (i) data labeling via the LLM is both

a ﬂexible and scalable option for transformations

involving multiple natural language analyses, and

(ii) while scalable per request, classiﬁcation time still

signiﬁcantly impacts the overall data ingestion time.

Thus, we factored this time into the total duration of

the remaining evaluations.

Data Ingestion. We ingested all data batches into

each DBMS using parallel ETL workers instantiated

as Fargate components in an AWS Elastic Container

Service (ECS) cluster, with the maximum number of

connections limited to 80 parallel transactions. Each

mention was ingested individually (single inserts) to

highlight the impact of transaction management for

each compared approach. This stress injection mech-

anism resembles more closely an HTAP environment,

where on-demand inserts can arrive individually. We

carefully subtracted data labeling time from the anal-

ysis, as it is an expensive component of the ETL

pipeline that could skew the timing results.

Figure 4 shows the elapsed time required to

process the inputs and ingest data into each DBMS.

PostgreSQL struggled with ingestion, as data

mentions had to pass through relational constraint

validation. Aurora exhibited improved performance

by taking full advantage of inner sharding per

company. It reduced the PostgreSQL insertion time

by up to 83% (Company A) and maintained lower

buffer usage, avoiding potential overﬂows. Although

Snowﬂake showed the widest performance variation

during ingestion, its performance was poor, as the tool

is designed for bulk-loading data rather than individ-

ual inserts commonly found in OLTP environments.

While outside the AWS cloud, CockroachDB outper-

formed Snowﬂake and AWS Aurora, on average. In

particular, it was also 5× faster than baseline Post-

greSQL. Finally, Google Spanner outperformed the

competitors in data ingestion, requiring nearly 60%

less time than the second-best competitor, on average.

Query Performance. We executed ﬁve separate

queries for each data batch to evaluate query per-

formance across the tested DBMSs. The results re-

vealed signiﬁcant variation in query response times

and highlighted differences in elapsed times among

the evaluated systems. Snowﬂake demonstrated

the fastest query performance overall, outperforming

PostgreSQL by up to ten times. This performance

ICEIS 2025 - 27th International Conference on Enterprise Information Systems

Distribution (%)

Elapsed Time (s) - Company B

Distribution (%)

Elapsed Time (s) - Company C

Distribution (%)

Elapsed Time (s) - Company B

Distribution (%)

Elapsed Time (s) - Company C

Distribution (%)

Elapsed Time (s) - Company B

Distribution (%)

Elapsed Time (s) - Company C

Distribution (%)

Elapsed Time (s) - Company B

Distribution (%)

Elapsed Time (s) - Company C

Distribution (%)

Elapsed Time (s) - Company B

Distribution (%)

Elapsed Time (s) - Company C

Distribution (%)

0.500.25 0.75 1.251.00 1.50 1.75 0.500.25 0.75 1.251.00 1.50 1.75

Elapsed Time (s) - Company A

Query (Q1) Query (Q1) Query (Q1)

Distribution (%)

1 2 43 5 6 1 2 43 5 6 1 2 43 5 6

Elapsed Time (s) - Company A

(Q2)

Distribution (%)

4 6 108 12 14 4 6 108 12 14 4 6 108 12 14

Elapsed Time (s) - Company A

(Q3)

Distribution (%)

2 64 8 10 2 64 8 10 2

64 8 10

Elapsed Time (s) - Company A

Distribution (%)

2.5

7.5

5.0 10.0 12.5 15.0 17.5

Elapsed Time (s) - Company A

PostgreSQL

AWS Aurora

CockroachDB

Google Spanner

Snowflake

0.500.25 0.75 1.251.00 1.50 1.75

2.5

7.5

5.0 10.0 12.5 15.0 17.5 2.5

7.5

5.0 10.0 12.5 15.0 17.5

Query (Q2) Query (Q2) Query (Q2)

Query (Q3) Query (Q3) Query (Q3)

Query (Q4) Query (Q4) Query (Q4)

Query (Q5) Query (Q5) Query (Q5)

Figure 6: Distribution of elapsed time required for querying different companies. Line entries corresponds to queries and

column entries are companies.

Warehousing Data for Brand Health and Reputation with AI-Driven Scores in NewSQL Architectures: Opportunities and Challenges

highlights Snowﬂake’s highly optimized query exe-

cution engine that was particularly evident for queries

(Q4) and (Q5), which involve multiple nested/lateral

joins. Google Spanner was the second-best per-

former, trailing Snowﬂake by up to 35%, but consis-

tently surpassing the other systems regarding elapsed

time for every examined query.

CockroachDB and AWS Aurora exhibited similar

query performance, with CockroachDB substantially

outperforming Aurora in complex queries. Despite

their similarities, PostgreSQL’s non-distributed ar-

chitecture occasionally showed latency spikes, while

AWS Aurora beneﬁted from its high scalability and

inner-sharding mechanisms for batch queries. The

performance per query varied widely, with queries

(Q3) to (Q5) posing more challenges across all

DBMSs. These queries required up to 10× more

time and memory, as observed in the panel insights

for CockroachDB, Snowﬂake, and Spanner, suggest-

ing that query complexity and data relationships heav-

ily inﬂuenced execution times, regardless of the un-

derlying cloud system. PostgreSQL struggled signif-

icantly with heavy workloads (Q4) – (Q5) due to the

lack of distributed processing and limited parallelism

in query execution. However, it managed lightweight

workloads like (Q1) – (Q2) relatively well, highlight-

ing its suitability for midsize query workloads.

Although the same multidimensional model was

implemented by all DBMSs, trade-offs were observed

between querying and ingestion. For instance, while

Snowﬂake excelled in querying, its ingestion perfor-

mance was outpaced by CockroachDB and Spanner,

underscoring the importance of understanding the

speciﬁc workload requirements when implementing

the multidimensional model in a physical system.

Storage Requirements. Figure 7 shows the storage

consumption across all DBMSs after processing

and ingesting the mentions. As expected, Post-

greSQL and AWS Aurora showed similar storage

consumption, with Aurora using nearly half the

space of PostgreSQL. While Snowﬂake relies on

columnar storage and heavy data compression, it

ranked last in overall storage cost due to metadata

retention for each column and insert. CockroachDB

outperformed PostgreSQL and Aurora in terms of

storage management, requiring 5× and 1.3× less

storage than PostgreSQL and Aurora, respectively.

CockroachDB’s controlled replication model pro-

vided additional resilience without signiﬁcantly

increasing storage usage, requiring less space than

all the other systems. Finally, Google Spanner

displayed storage optimization, requiring slightly

more storage than CockroachDB on average. These

Storage (GiB)

2,5

7,5

12,5

17,5

DBMS

Google Spanner

AWS Aurora

CockroachDB

PostgreSQL

Snowflake

Figure 7: Overall storage demanded for each solution.

Table 2: MedianRank aggregation for the average perfor-

mance of compared NewSQL approaches. The ﬁrst position

tie was broken by the next lowest ranking.

DBMS

Ingestion

Storage

MedianRank

A. Aurora 3 4 3 3

G. Spanner 1 2 2 1

CockroachDB 2 3 1 2

Snowﬂake 4 1 4 4

ﬁndings indicate that Aurora mitigated PostgreSQL’s

traditional row-based storage costs, which were

much different than the columnar model employed

by Snowﬂake. Overall, Spanner and CockroachDB’s

distributed design offered robust performance, setting

the benchmark for storage efﬁciency.

Comparison of NewSQL Approaches. We con-

solidated the average performance for ingestion,

querying, and storage requirements into rankings

(1 = best, 5 = worst) to aggregate the individual

performances using the MedianRank strategy (Fagin

et al., 2003). The query performance was ranked

based on the average performance for queries (Q3)

through (Q5). Table 2 summarizes the ranking distri-

bution and ﬁnal positions, where Google Spanner and

CockroachDB achieved the best performance across

all three criteria for the evaluated PR data. Snowﬂake

exhibited the largest trade-off, being the fastest query

solution but with high storage costs. AWS Aurora

ranked third overall, primarily struggling with data

ingestion. In practice, however, budget and long-term

maintenance are important constraints to address

when planning project implementation.

Infrastructure Costs. Figure 8 presents a simpliﬁed

cloud-based cost summary for the previous exper-

iments. Since the comparison is cloud-dependent,

we deﬁned a Relative Cost metric that averages the

total cost billed for the entire evaluation, divided by

ICEIS 2025 - 27th International Conference on Enterprise Information Systems

Relative Cost (%)

1,750

250

500

750

1,000

1,250

1,500

DBMS

PostgreSQL

AWS Aurora

CockroachDB

Google Spanner

Snowflake

Figure 8: Comparison of scaled cloud-based costs.

the (fractional) number of hours it took to complete.

We set PostgreSQL as the reference value (with

100% of Relative Cost) and then linearly scaled the

costs of the remaining DBMSs in comparison to this

reference. Self-hosted CockroachDB offered the

lowest cost baseline, being the cheapest alternative

among the compared NewSQL solutions. AWS

Aurora optimized costs by adjusting resources based

on the workloads but required a higher budget

compared to PostgreSQL. Snowﬂake, with its ﬂexible

consumption-based pricing model, was the most

expensive among the competitors due to struggles

with individual data ingestion. Finally, Spanner’s

costs were proportionally higher than those of Cock-

roachDB and Aurora, reﬂecting potential pricing

differences between the host clouds.

Maintenance Issues. While PostgreSQL main-

tenance has native limitations that require human

expertise for query and load tuning, AWS Aurora

provides insights and maintenance warnings through

the Amazon RDS Console, indicating workload

bottlenecks and offering recommendations for query

optimization. This proprietary cloud strategy is also

observed in Google Spanner, which relies on Google

Cloud monitoring and logging. Snowﬂake and

CockroachDB offer their cloud-agnostic consoles

that provide insights and can even automatically

apply enhancements for workloads. Snowﬂake offers

a more detailed analysis, as storage and queries are

evaluated separately.

Discussion. The evaluation of a real-world dataset for

brand monitoring highlighted performance and scala-

bility differences across various NewSQL approaches

implementing the same logical model for RepSystem.

The ﬁndings showed that these approaches, partic-

ularly Google Spanner and CockroachDB, signiﬁ-

cantly outperformed traditional ROLAP-based sys-

tems in terms of data ingestion, even under the stress

of single-insert operations. Optimized LLM-based

classiﬁcation in a single data pass was a bottle-

neck compared to data ingestion into the NewSQL

DBMSs. Results indicated that Snowﬂake dominated

query performance, with Spanner and CockroachDB

also reaching competitive values. Regarding stor-

age efﬁciency, CockroachDB performed the best, fol-

lowed by Google Spanner.

If all performance criteria (ingestion, query per-

formance, and storage) are equally weighted, Google

Spanner emerges as the top contender, closely fol-

lowed by CockroachDB. Moreover, when considering

cost-effectiveness, CockroachDB, AWS Aurora, and

Google Spanner offered the best value, demonstrat-

ing efﬁciency in handling large-scale deployments at

competitive costs. Finally, for organizations with di-

verse cloud environments or multi-cloud strategies,

CockroachDB and Snowﬂake provided superior ﬂex-

ibility, while Spanner and Aurora offered deeper inte-

gration within their respective cloud ecosystems. Ac-

cordingly, speciﬁc needs related to performance, cost,

and ﬂexibility will likely result in different NewSQL

ﬂavors implementing the same logical model.

5 CONCLUSION AND FUTURE

WORK

We explored the use of NewSQL systems for brand

health analysis, focusing on a logical multidimen-

sional model across various physical implementa-

tions. We found that NewSQL systems, tackling the

logical-physical model separation, can overcome the

limitations of traditional ROLAP systems, simplify

DBMS migration, and enhance schema evolution. We

proposed a simpliﬁed Star schema for PR analysis, in-

tegrating data from social media, digital, and printed

press through the RepSystem application. Unlike ex-

isting HTAP benchmarks, this approach required AI

for multiple data labeling, which we addressed with a

multi-task prompt strategy using a distributed LLM.

Our evaluation revealed performance differ-

ences across different NewSQL solutions, such

as (i) Snowﬂake excelling in complex queries but

struggling with data insertion and (ii) CockroachDB

outperforming the competitors in storage efﬁciency

but being less efﬁcient than Google Spanner in heavy

workloads. Overall, NewSQL systems outperformed

the baseline ROLAP PostgreSQL, with data classiﬁ-

cation being the main identiﬁable bottleneck. Future

work will include evaluating larger databases, simu-

lating near-real-time PR data ﬂows, and developing a

benchmark for brand health and monitoring.

Warehousing Data for Brand Health and Reputation with AI-Driven Scores in NewSQL Architectures: Opportunities and Challenges

ACKNOWLEDGEMENTS

This study was ﬁnanced by the Coordenac¸

ao de

Aperfeic¸oamento de Pessoal de N

ıvel Superior -

Brasil (CAPES), Brazil - Finance Code 001, and by

the Carlos Chagas Filho Research Support Founda-

tion of the Rio de Janeiro State (FAPERJ), Brazil

(Grants E-26/204.238/2024 and E-26/204.544/2024).

REFERENCES

Alqwadri, A., Azzeh, M., and Almasalha, F. (2021). Appli-

cation of machine learning for online reputation sys-

tems. IJAC, 18(3):492–502.

Chereja, I., Hahn, S. M. L., Matei, O., and Avram, A.

(2021). Operationalizing analytics with newsql. In

Soft. Eng. and Alg., pages 249–263. Springer.

Chevalier, M., El Malki, M., Kopliku, A., Teste, O., and

Tournier, R. (2015). How can we implement a mul-

tidimensional data warehouse using nosql? In EIS,

pages 108–130. Springer.

Civelek, M. E., C¸ emberci, M., and Eralp, N. E. (2016). The

role of social media in crisis communication and crisis

management. IJRBSC, 5(3).

Cole, R. et al. (2011). The mixed workload ch-benchmark.

In IWTDS, pages 1–6.

Corbett, J. C. et al. (2013). Spanner: Google’s globally dis-

tributed database. ACM Trans. Comput. Syst., 31(3).

Cuzzocrea, A., Bellatreche, L., and Song, I.-Y. (2013). Data

warehousing and olap over big data: current chal-

lenges and future research directions. In DOLAP, page

67–70. ACM.

Dageville, B. et al. (2016). The snowﬂake elastic data ware-

house. In ICMD, page 215–226, NY, USA. ACM.

Dehne, F., Eavis, T., and Rau-Chaplin, A. (2003). Parallel

multi-dimensional rolap index. In ISCCG, pages 86–

93.

Doorley, J. and Garcia, H. F. (2015). Reputation manage-

ment: The key to successful public relations and cor-

porate communication. Routledge.

Fagin, R., Kumar, R., and Sivakumar, D. (2003). Efﬁcient

similarity search and classiﬁcation via rank aggrega-

tion. In SIGMOD, pages 301–312.

Fan, W. and Gordon, M. D. (2014). The power of social

media analytics. Commun. ACM, 57(6):74–81.

Garcia-Molina, H., Ullman, J., and Widom, J. (2008).

Database Systems. PH, USA, 2 edition.

Golfarelli, M. and Rizzi, S. (2009). Data Warehouse

Design: Modern Principles and Methodologies.

McGraw-Hill Education.

Grolinger, K., Higashino, W. A., Tiwari, A., and Capretz,

M. A. (2013). Data management in cloud environ-

ments: Nosql and newsql data stores. J. of Cloud

Computing: Advances, Sys. and App., 2:1–24.

Huang, D., Liu, Q., Cui, Q., Fang, Z., Ma, X., Xu, F.,

Shen, L., Tang, L., Zhou, Y., Huang, M., Wei, W.,

Liu, C., Zhang, J., Li, J., Wu, X., Song, L., Sun, R.,

Yu, S., Zhao, L., Cameron, N., Pei, L., and Tang, X.

(2020). Tidb: a raft-based htap database. PVLDB,

13(12):3072–3084.

Ingole, A., Khude, P., Kittad, S., Parmar, V., and Ghotkar,

A. (2024). Competitive sentiment analysis for brand

reputation monitoring. In ICETITE, pages 1–7.

Inmon, W. (2005). Building the Data Warehouse. Wiley.

Jeong, J. Y. and Park, N. (2023). Examining the

inﬂuence of artiﬁcial intelligence on public rela-

tions: Insights from the organization-situation-public-

communication (ospc) model. JCRI, 9(7):485–495.

Kimball, R. and Ross, M. (2013). The Data Warehouse

Toolkit: The Deﬁnitive Guide. Wiley.

Li, G. and Zhang, C. (2022). Htap databases: What is new

and what is next. In SIGMOD, page 2483–2488, New

York, NY, USA. ACM.

Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., and Neubig,

G. (2023). Pre-train, prompt, and predict: A system-

atic survey of prompting methods in natural language

processing. ACM Comput. Surv., 55(9).

Macnamara, J. (2005). Media content analysis: Its uses,

beneﬁts and best practice methodology. Asia-Paciﬁc

Public Relations J., 6:1.

Marr, B. and Schiuma, G. (2003). Business performance

measurement – past, present and future. Management

Decision, 41(8):680–687.

Morfonios, K., Konakas, S., Ioannidis, Y., and Kotsis, N.

(2007). Rolap implementations of the data cube. ACM

Comput. Surv., 39(4):12–es.

Ramzan, S., Bajwa, I. S., Kazmi, R., and Amna (2019).

Challenges in nosql-based distributed data storage: A

systematic literature review. Electronics, 8(5).

Santa Soriano, A. and Torres Vald

es, R. M. (2021). En-

gaging universe 4.0: The case for forming a public

relations-strategic intelligence hybrid. Public Rela-

tions Rev., 47(2):102035.

Taft, R. et al. (2020). Cockroachdb: The resilient

geo-distributed sql database. In SIGMOD, page

1493–1509, NY, USA. ACM.

Ur Rahman, M. W., Shao, S., Satam, P., Hariri, S., Padilla,

C., Taylor, Z., and Nevarez, C. (2022). A BERT-based

Deep Learning Approach for Reputation Analysis in

Social Media . In AICCSA, pages 1–8. IEEE.

Valduriez, P., Jimenez-Peris, R., and

Ozsu, M. T.

(2021). Distributed Database Systems: The Case for

NewSQL, pages 1–15. Springer.

Verbitski, A. et al. (2017). Amazon Aurora: Design con-

siderations for high throughput cloud-native relational

databases. In SIGMOD.

Wang, Y., Cheng, Y., and Sun, J. (2021). When public rela-

tions meets social media: A systematic review of so-

cial media related public relations research from 2006

to 2020. Public Relations Rev., 47(4):102081.

Zhang, Y., Zhang, Y., Wang, S., and Lu, J. (2019). Fusion

olap: Fusing the pros of molap and rolap together for

in-memory olap. TKDE, 31(9):1722–1735.

Zhao, R., Gui, L., Yan, H., and He, Y. (2023). Track-

ing brand-associated polarity-bearing topics in user

reviews. TACL, 11:404–418.

ICEIS 2025 - 27th International Conference on Enterprise Information Systems