Bibliometric Insights into Web Scraping and Advanced AI-Based Models for Valuable Business Data

Barba Giuliana, Lazoi Mariangela and Lezzi Marianna

Department of Engineering for Innovation, University of Salento, Campus Ecotekne Via Monteroni, Lecce 73100, Italy

Keywords: Web Scraping, Artificial Intelligence, Natural Language Processing, Business Data Analysis, Sentiment Analysis.

Abstract: The integration of advanced Artificial Intelligence (AI) based models with web scraping techniques opens new opportunities for businesses, streamlining the extraction of valuable insights from the huge amounts of online data. This integration is strategic in overcoming the challenges of extracting dirty data and retrieving missing information, which could otherwise compromise the reliability of business decisions. Despite the growing importance of integrating AI-based models and web scraping techniques in the business context, there is a significant gap in understanding its specific implications. To address this gap, our study uses a systematic literature review (SLR) and bibliometric analysis to examine the implications of the combined use of advanced AI-based models and web scraping in business contexts. The study highlights four distinct clusters that suggest potential research areas: “Machine Learning (ML) for sentiment analysis”, “Artificial Intelligence and Natural Language Processing (NLP) integration”, “Data intelligence and optimization”, and “NLP and Deep Learning (DL) integration”. The paper offers both theoretical and practical contributions, providing a clear overview of emerging research directions in the field of AI-based models and web scraping integration and guiding managers in adopting advanced AI-based models to enhance the value of web data obtained through scraping.

1 INTRODUCTION

Online data are a crucial tool for knowledge genera-

tion and knowledge-based decision support (Rejeb et

al., 2020). In particular, online data enable companies

to consolidate information by transforming it into

useful insights for marketing and service decisions,

such as optimising pricing decisions based on con-

sumer behaviour (Jorge et al., 2020) or segmenting

customers based on their perceptions (Rejeb et al.,

2020). Moreover, gathering online data not only

facilitates a comprehensive analysis of web users'

perceptions of corporate products and services

(Bisconti et al., 2019), but also contributes positively

to the enrichment of the company's information assets

(M. A. Khder, 2021).

Companies that extract insights from online data prove to be more agile and more adaptive to market dynamics (Rejeb et al., 2020), highlighting the crucial importance of the strategic use of web data to drive business decisions and maintain a distinctive competitive position in the global landscape.

https://orcid.org/0009-0004-9138-2063

https://orcid.org/0000-0003-4280-1597

https://orcid.org/0000-0003-3526-8421

Web scraping is a valuable technique for auto-

matic data collection from the internet (Tanasescu et

al., 2022) about customer feedback (Bisconti et al.,

2019) and sentiment about particular products (Jorge

et al., 2020). However, web data from various online

sources such as websites, blogs and social media is

often unstructured and consists of a wide range of het-

erogeneous information such as text, images and

video (Eberendu, 2016). To enhance the value of unstructured data, advanced Artificial Intelligence (AI) based models can extract hidden insights from news and opinions on the web, proving useful in understanding customer attitudes (Chan et al., 2022). This information can support decision-making strategies (Tanasescu et al., 2022) and refine customer targeting (Rejeb et al., 2020).

Giuliana, B., Mariangela, L. and Marianna, L.

Bibliometric Insights into Web Scraping and Advanced AI-Based Models for Valuable Business Data.

DOI: 10.5220/0012686900003690

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 26th International Conference on Enterprise Information Systems (ICEIS 2024) - Volume 1, pages 321-328

ISBN: 978-989-758-692-7; ISSN: 2184-4992


The implementation of advanced AI-based models, together with web scraping, presents novel

opportunities for companies as it facilitates the man-

agement of vast amounts of data from the internet

with the objective of extracting valuable insights (Ar-

junan, 2022). This integration is particularly im-

portant for overcoming the challenges of mining dirty

data and retrieving missing information that could

otherwise affect the reliability of business decisions

(Tanasescu et al., 2022). Several studies reflect the

growing interest in integrating advanced models

based on artificial intelligence and web scraping. For instance, (Kumar et al., 2021) focus on estimating the relevance of online documents, while (Sahu et al., 2022) address the evaluation of customer reviews in the context of e-commerce. Moreover, the study by (Arjunan, 2022) highlights the potential of integrating advanced artificial intelligence-based models to enrich web data. However, the lack of analysis of the implications of integrating AI-based models and web scraping in business contexts reveals a gap that needs to be filled.

Through a Systematic Literature Review (SLR)

and bibliometric analysis, the objective of this study

is to investigate the current key research directions in

the combined use of advanced AI-based models and

web scraping applied in business contexts. This

promises to highlight the main implications in this

emerging field and to help explore a new frontier in

the literature, which still seems to be developing. In

particular, the paper aims to provide an answer to the

following research question: "What are the implica-

tions of applying web scraping integrated with ad-

vanced AI-based models in the business context?". To

answer this research question, a network examination

and visual analysis of textual bibliographic data acquired from the Scopus database were conducted, with

the aim of creating a map of interconnections using

VOSviewer (a software tool for bibliometric analy-

sis).

2 BACKGROUND

2.1 Web Scraping and Advanced

AI-Based Models for Business

Web scraping - also known as web crawling (M. A.

Khder, 2021), web harvesting (Zhao, 2017), web data

extraction (Zhao, 2017), web data scraping (Barbera

et al., 2023), or screen scraping (Arjunan, 2022) - is

an advanced technique for systematically extracting

unstructured data from websites and subsequently

transforming them into structured data to be stored in

a file or database (M. A. Khder, 2021).
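The transformation from unstructured page content to structured records can be illustrated with a minimal sketch. It assumes a hypothetical review page whose items are marked up as `<li class="review" data-rating="...">`; the markup, the class name and the field names are illustrative assumptions, not taken from any of the cited studies. The page is an inline string, so no network request is made.

```python
import csv
import io
from html.parser import HTMLParser


class ReviewParser(HTMLParser):
    """Collect rating and text of every <li class="review"> element."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_review = False
        self._rating = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "review":
            self._in_review = True
            self._rating = attrs.get("data-rating")
            self._text = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. <b>) is still part of the review.
        if self._in_review:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self._in_review:
            rating = int(self._rating) if self._rating else None
            self.rows.append({"rating": rating, "text": "".join(self._text).strip()})
            self._in_review = False


# Hypothetical page content (an assumption made for illustration only).
page = """
<ul>
  <li class="review" data-rating="5">Great wine, fast delivery.</li>
  <li class="review" data-rating="2">Arrived <b>late</b> and damaged.</li>
  <li class="ad">Sponsored link</li>
</ul>
"""

parser = ReviewParser()
parser.feed(page)

# Store the structured rows as CSV (here in memory instead of a file).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["rating", "text"])
writer.writeheader()
writer.writerows(parser.rows)
print(buffer.getvalue())
```

Only the standard library is used; a production scraper would typically add request handling, politeness rules and a more tolerant parser, which are outside the scope of this sketch.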

The significant advantage of web scraping lies in

the automation of the process of searching and ex-

tracting data, eliminating the need to manually copy

information from a website to a file (M. A. Khder,

2021; Tanasescu et al., 2022). Moreover, in the con-

text of data science, web scraping emerges as a tool

of considerable scientific interest, recognized as an

efficient method for collecting big data (Barbera et

al., 2023), thanks to low execution and maintenance

costs (M. A. Khder, 2021).

The applications of web scraping span various

sectors, significantly contributing to the business and

marketing world. In particular, web scraping is

widely employed in the analysis of online user feedback, such as for the support of event management (Bisconti et al., 2019), the prediction of consumers' choices (Corallo et al., 2020), the monitoring of changes in product prices (Jorge et al., 2020) or the investigation of employee opinions (Tanasescu et al., 2022).

Furthermore, in the job search engine sector, there are

several applications of web scraping to define job

profiles (De Mauro et al., 2018) and conduct labour

market surveys (Vankevich & Kalinouskaya, 2021).

On the other hand, AI has revolutionised the business

landscape, by enabling the use of models capable of

handling an enormous stream of heterogeneous data and identifying hidden patterns or trends in large datasets

(Afandizadeh et al., 2023). The application of AI-

based models is increasingly significant in analysing

online data (such as customer reviews or sales infor-

mation) to detect behavioural patterns and identify

preferences (Chan et al., 2022).

The application of these models allows companies

to delineate detailed customer segmentation strate-

gies, opening the door to personalised marketing

campaigns, individualised product suggestions and

customer experiences tailored to specific needs

(Afandizadeh et al., 2023). Therefore, companies can

gain extensive and contextual knowledge from online

data, enabling them to make informed decisions and

develop precise business strategies (Chan et al.,

2022).

AI-based models are pivotal in audio, video, im-

age and text processing, significantly contributing to

the optimisation of diverse business operations and

demonstrating how innovative technologies have

been transformed into practical commercial solutions.

For instance, (Arslan & Cruz, 2022) use an AI-

based model to exploit large corpora of textual docu-

ments while (Sarica et al., 2020) propose the TechNet

model, demonstrating how AI can be used to extract


relational knowledge from complex patent documents. AI can also be used to find optimal candidates, facilitating the recruitment process (Sridevi & Suganthi, 2022).

AI-based models are also used to analyse images

for hot rolling mill process defect detection (Latham

& Giannetti, 2023) or to improve the performance of

harvesting robots (Tang et al., 2020).

2.2 Integration of Web Scraping and

Advanced AI-Based Models in the

Business Domain

The combination of advanced AI-based models and

web scraping presents a modern strategy for obtaining

high quality data from the web by optimizing the data

collection process, eliminating redundant information

and ensuring the delivery of more relevant and clean

data (Arjunan, 2022).

The application of this integration plays a key role

in providing crucial information for understanding

consumer sentiment, thus helping to drive informed

decisions in the business environment (Sahu et al.,

2022).

The synergies of integrating web scraping and AI-based models are evident in various contexts, as shown by (Hao et al., 2023), who extract vulnerable online contracts using web scraping and train an AI-based model to enhance the security of smart contracts. (Thuan et al., 2022) use this integration to simplify the assignment of professional roles based on employee competencies. Furthermore, (Kinne & Resch, 2018) propose an AI-based model that assesses companies' degree of digital innovation by analysing web-scraped data from their websites.

The joint use of AI-based models for sentiment

and opinion recognition, together with web scraping,

is another demonstration of how AI can enhance the

value of data collected from online posts and reviews

(M. A. Khder, 2021). A tangible example is provided

by (Kafeza et al., 2023), who exploit these data

from social media to model customers' actions in or-

der to discover their behavioural patterns. Addition-

ally, (Tanasescu et al., 2022) collect data via web

scraping on employee feedback and integrate it into

an AI-based model to facilitate decision-making pro-

cesses based on the opinions of company employees.

Moreover, other studies show that AI-based mod-

els improve the accuracy of web scraping by provid-

ing more relevant information and cleaner data. In

particular, (Reddy et al., 2021) present an AI-based model that supports the scraping process by summarizing meaningful insights from online documents. (Arjunan, 2022) shows that an AI-based model can be specialized in recognising the relevance of web data and in eliminating superfluous information. On the other hand, (M. Lee & Na, 2023) use this integration to select relevant company information by retrieving customised market data from the web, whereas (Sahu et al., 2022) propose an AI-based model to evaluate the honesty of scraped online reviews.

The combination of AI-based models and web

scraping holds promise, but poses significant chal-

lenges. First, further exploration in the field of AI-

based models is necessary to improve the perfor-

mance and accuracy of results as the current imple-

mentation is in its early stages of development (Sahu

et al., 2022; Tanasescu et al., 2022). Moreover, the

malleability of website design and layout presents a

significant obstacle, as scripts can quickly become

obsolete, and adapting them to hundreds of sites is

nearly impossible (Patnaik & Babu, 2021).

Furthermore, the analysis of sentiments poses

even greater challenges, as it involves identifying

complex phenomena like negation or irony in the an-

alysed texts (Tanasescu et al., 2022).
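The difficulty posed by negation and irony can be illustrated with a minimal lexicon-based sketch. The word lists and example sentences below are hypothetical and chosen only for illustration; they are not the approach of (Tanasescu et al., 2022) or of any other cited study. A naive word count misreads a negated sentence, a simple negation rule repairs that case, and neither handles irony.

```python
# Hypothetical toy lexicons (assumptions for illustration only).
POSITIVE = {"good", "great", "happy"}
NEGATIVE = {"bad", "poor", "unhappy"}
NEGATORS = {"not", "never", "no"}


def tokenize(text):
    """Lower-case the text and strip basic punctuation from each token."""
    return [token.strip(".,!?;:") for token in text.lower().split()]


def polarity(token):
    """Return +1, -1 or 0 for a single token."""
    return (token in POSITIVE) - (token in NEGATIVE)


def naive_score(text):
    """Sum word polarities, ignoring context."""
    return sum(polarity(token) for token in tokenize(text))


def negation_aware_score(text):
    """Flip the polarity of the first sentiment word after a negator."""
    score, flip = 0, False
    for token in tokenize(text):
        if token in NEGATORS:
            flip = True
            continue
        value = polarity(token)
        if value:
            score += -value if flip else value
            flip = False
    return score


negated = "The management is not good"
ironic = "Great, another unpaid weekend"

print(naive_score(negated), negation_aware_score(negated))
print(naive_score(ironic), negation_aware_score(ironic))
```

In this sketch the negated sentence is scored as positive by the naive count and as negative once the negation rule is applied, whereas the ironic sentence is scored as positive by both functions, which is the kind of error that motivates more advanced models.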

3 METHODOLOGY

The research strategy was designed to identify the current key research directions in the combined use of advanced AI-based models and web scraping for business applications. In particular, the research aims to

address the key question: "What are the implications

of applying web scraping integrated with advanced

AI-based models in the business context?".

To address this question, an SLR was conducted as

the main methodology for data collection. This pro-

cess, known for its transparency, scientific rigor, and

replicability, allows for the identification, highlight-

ing, and evaluation of various sources of information,

facilitating the cataloguing and structured compari-

son of results (Del Vecchio et al., 2023; Tranfield et

al., 2003).

Additionally, a bibliometric analysis was con-

ducted to examine the collected sample and gain an

exploratory understanding of the main themes of in-

terest. Bibliometric analysis, introduced by

(Pritchard, 1969), is recognized as a crucial method-

ology for exploring research in various disciplines,

highlighting its nature, and providing particularly val-

uable insights in fields still in development (Donthu,

Kumar, Mukherjee, et al., 2021). As suggested by

(Del Vecchio et al., 2023), the research process is

structured into three main phases: definition of the

search framework and data sample, preliminary anal-

ysis of the sample, and data analysis.


3.1 Definition of Search Schema and

Data Sample

In line with the objective of this study and following the SLR procedure of (Corallo et al., 2021), the keywords were identified for the investigation of the fields of interest: web scraping, artificial intelligence, text mining and business.

The choice of these keywords was based on the

consideration of similar concepts, synonyms, and acronyms identified in the theoretical context. The keywords were combined using Boolean and proximity operators, as shown in Table 1.

Table 1: Query structure used for Scopus.

SCOPUS

TITLE-ABS-KEY

((("web") W/2 ("scrap*" OR "craw*" OR "harvest*" OR "data extract*" OR "data harvest*" OR "data scrap*" OR "automat* scrap*" OR "information extract*")) OR ("screen scrap*" OR "automat* web scrap*" OR "spider*"))

AND

("Machine learning" OR "ML" OR "artificial intelligence" OR "AI" OR "natural language process*" OR "NLP" OR "natural language generat*" OR "NLG" OR "artificial neural network" OR "ANN" OR "neural network" OR "deep neural network" OR "DNN" OR "recurrent neural network" OR "RNN" OR "unsupervised learn*" OR "convolutional neural network" OR "CNN" OR "supervised learn*" OR "learning algorithm" OR "convolutional neural network" OR "CNN" OR "SVM" OR "support vector machine")

AND

("text data" OR "Text mining" OR "data mining" OR "web mining" OR "web data mining" OR "knowledge discovery from text" OR "KDT" OR "information extraction" OR "IE" OR "information retrieval" OR "IR" OR "data enrich*" OR "data augment*" OR "data extract*" OR "data collect*" OR "semantic analysis" OR "sentiment analysis" OR "sentiment recognition" OR "text analysis" OR "data analysis" OR "information retrieval" OR "IR" OR "big data" OR "Text Summarization" OR "Text Generation" OR "Market Intelligence" OR "Predictive Analytics" OR "Reputation Monitoring")

AND

("Business" OR "firm" OR "enterprise" OR "company" OR "organization" OR "manufactur*" OR "industry" OR "market*" OR "account*" OR "sales" OR "decision making" OR "knowledge manag*" OR "strategic planning" OR "financial" OR "HRM" OR "human resource manag*" OR "SCM" OR "supply chain manag*" OR "risk management" OR "retail" OR "CRM" OR "customer relationship manag*" OR "insurance")

The query used for Scopus (Table 1) was adapted for Web of Science (WOS) by modifying the field tag and the proximity operator.
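The way such a query is assembled from keyword groups can be sketched as follows. This is a minimal illustration with shortened, hypothetical keyword lists rather than the full query of Table 1. The WOS variant is an assumption: the paper does not report it, so the topic field tag `TS=` and the `NEAR/n` proximity operator are used here as plausible counterparts of `TITLE-ABS-KEY` and `W/n`.

```python
def or_group(terms):
    """Join quoted terms with OR inside one pair of parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def proximity(anchor, terms, distance, operator):
    """Require the anchor word within `distance` words of any of the terms."""
    return f'("{anchor}" {operator}/{distance} {or_group(terms)})'


def build_query(field_open, operator, scraping_terms, other_groups):
    """AND the proximity clause with the remaining OR-groups inside a field tag."""
    clauses = [proximity("web", scraping_terms, 2, operator)]
    clauses += [or_group(group) for group in other_groups]
    return field_open + " AND ".join(clauses) + ")"


# Shortened, illustrative keyword groups (not the complete lists of Table 1).
scraping_terms = ["scrap*", "crawl*", "harvest*"]
other_groups = [
    ["machine learning", "artificial intelligence", "NLP"],
    ["text mining", "sentiment analysis"],
    ["business", "firm", "enterprise"],
]

scopus_query = build_query("TITLE-ABS-KEY(", "W", scraping_terms, other_groups)
wos_query = build_query("TS=(", "NEAR", scraping_terms, other_groups)  # assumed WOS syntax

print(scopus_query)
print(wos_query)
```

Generating both variants from the same keyword groups keeps the two searches aligned, so that only the field tag and the proximity operator differ between databases.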

Papers containing the keywords in the title, abstract, and keywords were searched in the Scopus and Web of Science electronic scientific databases in December 2023, chosen for their robustness as bibliographic data sources (Pranckutė, 2021). Subsequently, the initial sample of 455 and 129 articles, from Scopus and WOS respectively, was refined to 459 unique articles, with the English language limitation set as the only filter.

All results were exported in RIS format, retaining

all necessary information for conducting the subse-

quent analysis, including title, year, abstract, and key-

words.
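Merging two RIS exports into a set of unique records can be sketched as follows. The sketch assumes the common RIS layout (a two-character tag, two spaces, a hyphen and the value, with `ER` closing each record) and matches records by DOI or, when no DOI is present, by normalised title. The matching rule and the toy records are assumptions made for illustration; the paper does not describe how duplicates were identified.

```python
import re

RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def parse_ris(text):
    """Return a list of records, each mapping an RIS tag to its list of values."""
    records, current = [], {}
    for line in text.splitlines():
        match = RIS_LINE.match(line.strip("\ufeff\r\n"))
        if not match:
            continue
        tag, value = match.group(1), match.group(2).strip()
        if tag == "ER":  # end of record
            if current:
                records.append(current)
            current = {}
        else:
            current.setdefault(tag, []).append(value)
    return records


def record_key(record):
    """Prefer the DOI as identity; fall back to a normalised title."""
    doi = (record.get("DO") or [""])[0].lower().strip()
    if doi:
        return "doi:" + doi
    title = (record.get("TI") or record.get("T1") or [""])[0]
    return "title:" + re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def merge_unique(*record_lists):
    """Concatenate record lists, keeping the first record seen for each key."""
    seen, unique = set(), []
    for records in record_lists:
        for record in records:
            key = record_key(record)
            if key not in seen:
                seen.add(key)
                unique.append(record)
    return unique


# Toy exports standing in for the Scopus and WOS files (hypothetical records).
scopus_export = """TY  - JOUR
TI  - Toy paper A
PY  - 2021
DO  - 10.0000/toy.a
ER  -
TY  - CONF
TI  - Toy paper B
PY  - 2022
ER  -
"""
wos_export = """TY  - JOUR
TI  - Toy Paper A
DO  - 10.0000/TOY.A
ER  -
TY  - JOUR
TI  - Toy paper B!
ER  -
TY  - JOUR
TI  - Toy paper C
ER  -
"""

unique = merge_unique(parse_ris(scopus_export), parse_ris(wos_export))
print(len(unique), [record["TI"][0] for record in unique])
```

A record that carries a DOI in one export but not in the other would not be matched by this rule, so a manual check of near-duplicates remains advisable.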

3.2 Preliminary Analysis of the Sample

The second step of the adopted methodology focuses

on the statistical analysis through MS Excel of

metadata associated with the selected papers. Ini-

tially, we explored the quantity of publications over

the years (Figure 1). The graph reveals that from 1998

to 2008, the production of studies in this field is neg-

ligible. However, from 2009 to 2017, a gradual in-

crease occurs, indicating a growing interest in data

management and its connection to the web within the

scientific community. It is interesting to note that from

2017 to 2023, there is a rapid growth in the number

of studies related to this domain, with a remarkable

growth rate of 100% over six years.
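The paper does not state how the growth rate was computed. One possible reading, sketched below, is the percentage change between the number of papers published in the first and in the last year of the period. The list of publication years is a hypothetical toy sample, not the data behind Figure 1.

```python
from collections import Counter


def publications_per_year(years):
    """Count papers per publication year, sorted chronologically."""
    return dict(sorted(Counter(years).items()))


def growth_rate(counts, start_year, end_year):
    """Percentage change in yearly output between two years."""
    start, end = counts[start_year], counts[end_year]
    return (end - start) / start * 100.0


# Hypothetical publication years (toy data for illustration only).
years = [2017, 2017, 2020, 2020, 2020, 2023, 2023, 2023, 2023]

counts = publications_per_year(years)
print(counts)
print(growth_rate(counts, 2017, 2023))
```

Other definitions, such as a compound annual growth rate, would yield different figures for the same counts, so the chosen formula should be reported alongside the result.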

Additionally, the distribution of papers based on

their respective disciplinary areas was examined.

"Computer Science" emerges as the predominant

area, representing 36.5% of the total articles pub-

lished on this subject. Following in order are the dis-

ciplinary areas of "Engineering" (15.4%), and "Deci-

sion Science" (10.3%), confirming the multidiscipli-

nary nature and broad scope of web scraping and AI-

based models in the business context.

Figure 1: Distribution of papers by year.


3.3 Data Analysis

In the third phase of this research methodology, an in-depth bibliometric analysis was chosen as a suitable method for macro-level evaluation of the sample of 459 selected papers. Additionally, since it is not

influenced by researcher subjectivity, it can help to

decrease reviewer bias during research (van Oorschot

et al., 2018).

In the context of this study, bibliometric analysis

played a crucial role in delineating implications and

future research directions related to the combined use of AI-based models and web scraping within business

applications.

Specifically, through this analysis, the objectives

were to: i) comprehend the most recurring topics in

the fields of analysis; ii) identify relationships among

them; iii) highlight emerging themes and trends over

time. Network analysis and graphical exploration of

textual data were conducted using VOSviewer soft-

ware, known for its ability to create easily interpreta-

ble co-occurrence maps on a large scale (van

Oorschot et al., 2018). Following the guidelines of

(Donthu, Kumar, Mukherjee, et al., 2021), the analysis focused on counting recurring terms in titles, keywords and abstracts, with a minimum frequency of 10. After a preliminary merge of the extracted keywords and the creation of a specific thesaurus, 67 of the 3658 terms examined met the threshold, contributing to a clear definition of the conceptual relationships among the papers' keywords.
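The counting step described above can be sketched in a few lines. This is a simplified illustration and not VOSviewer's implementation: it works on author keywords only, the thesaurus entries and toy papers are hypothetical, and the threshold is lowered from 10 to 2 so that the small example produces links.

```python
from collections import Counter
from itertools import combinations


def normalise(keywords, thesaurus):
    """Lower-case keywords, map synonyms to one label, drop duplicates per paper."""
    cleaned = (keyword.lower().strip() for keyword in keywords)
    return sorted({thesaurus.get(keyword, keyword) for keyword in cleaned})


def cooccurrence(papers, thesaurus, min_occurrences):
    """Return term occurrences and co-occurrence links among frequent terms."""
    term_lists = [normalise(paper, thesaurus) for paper in papers]
    occurrences = Counter(term for terms in term_lists for term in terms)
    kept = {term for term, n in occurrences.items() if n >= min_occurrences}
    links = Counter()
    for terms in term_lists:
        # Terms are sorted, so each pair has one canonical orientation.
        links.update(combinations([term for term in terms if term in kept], 2))
    return occurrences, links


# Hypothetical thesaurus and keyword lists (toy data for illustration only).
thesaurus = {"ml": "machine learning", "nlp": "natural language processing"}
papers = [
    ["Web Scraping", "Machine Learning", "Sentiment Analysis"],
    ["web scraping", "ML", "Big Data"],
    ["NLP", "machine learning", "sentiment analysis"],
]

occurrences, links = cooccurrence(papers, thesaurus, min_occurrences=2)
print(occurrences)
print(links)
```

The resulting link counts play the role of edge weights in a co-occurrence map: terms below the threshold are dropped, and the remaining pairs are weighted by the number of papers in which both terms appear.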

4 EXPLORING FUTURE

RESEARCH DIRECTIONS IN

THE COMBINED USE OF WEB

SCRAPING AND ADVANCED

AI-BASED MODELS

For the data analysis phase, VOSviewer offers three visualizations, namely the network, overlay, and density visualizations, which are discussed in the following sections.

4.1 Network Visualization

Bibliometric analysis enables a network analysis that

relies on term co-occurrence. The co-occurrence map,

presented in Figure 2, clearly displays four main clusters generated by setting a minimum of one item per cluster.

Cluster 1 (red), comprising 22 terms, focuses on the

implementation of “Machine Learning (ML) for senti-

ment analysis”, with an emphasis on big data, learning

algorithms, web scraping and online social networks.

Additionally, the topics cover aspects of financial mar-

kets, risk assessment, e-commerce, investment, data

analysis, and human impacts. This implies an interest

in comprehending human and market dynamics using

sophisticated data analysis techniques.

Cluster 2 (green), consisting of 19 terms, ad-

dresses the “AI and NLP integration” to optimise the

management and analysis of online information. Key

topics include classifying information, semantics,

search engines, extracting web information and man-

aging knowledge, with a specific emphasis on ontol-

ogies and semantic analysis.

Cluster 3 (blue), with 16 terms, focuses on “Data

intelligence and optimization” to improve the extrac-

tion and management of information through data

mining and text mining approaches. This indicates an

emphasis on improving data collection and manipula-

tion operations, alongside decision-making systems.

Cluster 4 (yellow), consisting of 10 terms, focuses

on the “NLP and DL integration”, such as deep neural

networks and convolutional neural networks. This

cluster highlights an interest in advanced natural lan-

guage comprehension to facilitate informed decision-

making, as well as emphasizing the analysis of opin-

ions through deep learning techniques.

Moreover, the position of the largest nodes in Fig-

ure 2 confirms that the extraction and analysis of data,

both structured and unstructured, are crucial aspects

for research in the considered field. Furthermore, the

presence of these central nodes suggests that technol-

ogies, such as web scraping, data mining and text

mining, are strongly integrated with big data and ma-

chine learning, indicating an emphasis on extraction

and analysis of web data as an integral part of the de-

cision-making process and knowledge generation.

Figure 2: Network visualization.


4.2 Overlay Visualization

The overlay map (Figure 3), a variant of the co-occur-

rence map, broadens the temporal perspective by in-

troducing a distinctive chronological dimension. The

use of distinct colours highlights the older terms in

blue, intermediate terms in green, and more recent

terms in yellow. This temporal representation spans

from 2014 to 2022. The choice of this time interval is

crucial as it provides an optimal balance for clearly

perceiving variations in terms over the years.
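The temporal score behind such a map can be sketched as the average publication year of the papers in which a term occurs. This is an assumption about how the overlay colours are derived, and the records below are hypothetical toy data; the period boundaries follow the intervals discussed in this section.

```python
from collections import defaultdict


def average_publication_year(papers):
    """papers: iterable of (year, terms) pairs. Returns {term: mean year}."""
    years = defaultdict(list)
    for year, terms in papers:
        for term in set(terms):
            years[term].append(year)
    return {term: sum(values) / len(values) for term, values in years.items()}


def period(avg_year):
    """Map an average year to the three periods used in this section."""
    if avg_year < 2017:
        return "older"
    if avg_year < 2019:
        return "intermediate"
    return "recent"


# Hypothetical (year, keywords) records, for illustration only.
papers = [
    (2015, ["information retrieval"]),
    (2016, ["information retrieval", "data mining"]),
    (2020, ["data mining", "deep learning"]),
    (2022, ["deep learning"]),
]

scores = average_publication_year(papers)
for term, avg_year in sorted(scores.items(), key=lambda item: item[1]):
    print(term, avg_year, period(avg_year))
```

Because the score is an average, a term used steadily over many years falls in the middle of the scale even if it is still in active use, which should be kept in mind when reading the colours.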

It is interesting to note that the older terms, covering the period from 2014 to 2017, such as information retrieval, information extraction, semantic web, and AI, reflect the early stages of development, when the focus was on information organisation and extraction.

Over the interim period from 2017 to 2019, there

has been a move towards more research in NLP,

learning systems, web crawling and data mining. This

suggests an awareness of the increasing significance

of online information and a drive to create more so-

phisticated systems to handle it.

Finally, the recent terms, from 2019 to the present,

reflect the ongoing development of technologies and

areas of interest. Terms such as web scraping, deep

learning, sentiment analysis, social media, and convolutional neural networks highlight a greater emphasis on advanced data analysis and the application of advanced technologies across various domains (e.g. sales, finance, risk assessment), encompassing those associated with online social dynamics.

Figure 3: Overlay visualization.

4.3 Density Visualization

Density visualization provides a perspective on the

distribution and relevance of terms within the extensive range of analysed papers (Figure 4). Each identified label, together with its surrounding area, takes on a colour and intensity that reflect the density of papers at that specific position.

The presence of key terms such as web crawling,

data mining, AI, NLP systems, ML, sentiment analy-

sis, big data and learning systems highlights points of

dense concentration where scientific research has sig-

nificantly focused. In these clusters, the colour tends

to shift towards dark blue, indicating a higher density

and suggesting that these topics are at the centre of

academic debate.

Conversely, terms like deep learning, sales, and

knowledge management, while important, exhibit a

more widespread distribution characterized by lighter

shades of yellow, suggesting a lower concentration in

specific research areas compared to key terms. This

could indicate that these concepts are addressed in

various contexts and sectors without a specific focus

in a single research area.

The keyword density highlights central and cur-

rent topics in scientific analysis, reflecting a contin-

ued interest in crucial areas such as NLP, ML and data

mining from web sources, with a keen eye on new de-

velopments and emerging challenges.

Figure 4: Item density visualization.

5 CONCLUSIONS

The bibliometric analysis conducted offers a detailed

perspective on the main areas of research and devel-

opment in the field of web scraping and AI-based

models. The four clusters identified (“ML for sentiment analysis”, “AI and NLP integration”, “Data intelligence and optimization”, “NLP and DL integration”) reflect different approaches to the analysis and usage of online information. Several topics emerge

from these clusters, such as the implementation of

ML for sentiment analysis, the integration between


AI and NLP to optimize information management,

the use of data mining and text mining techniques to

improve data extraction and management, and the ap-

plication of advanced technologies such as deep neu-

ral networks to understand and analyse complex

online opinions. Furthermore, the temporal view

highlights a progression in research focus over the

years, with an increase in attention towards advanced

data analysis and the application of these innovative

technologies.

From an academic perspective, the identification

of emerging research trends is one of the most rele-

vant contributions. By analysing the temporal distri-

bution of keywords, the evolution of topics of interest

over time can be identified, providing academics with

a clear overview of emerging research directions.

Furthermore, the analysis provides insights into the

intersections between different disciplines and re-

search fields.

From a managerial perspective, this study provides greater clarity on key trends and themes in the field of web scraping and AI-based models, which can inform strategic decisions regarding investments in technologies to understand online data and optimise their extraction and analysis.

However, this research has some limitations. The

interpretation of the results could be affected by the

subjectivity of the observers and their previous

knowledge in the field, introducing potential biases

into the analysis. Moreover, bibliometric analysis may not fully capture the dynamism and complexity of the context, such as socio-cultural and political factors that could influence research trends. Future research

should integrate theoretical and empirical approaches

to obtain a more complete and in-depth understanding

of the dynamics and challenges in the field of web

scraping combined with AI-based models, with the

aim of contributing to the development of innovative

and impactful solutions.

REFERENCES

Afandizadeh, S., Sharifi, D., Kalantari, N., & Mirzahossein, H. (2023). Using machine learning methods to predict electric vehicles penetration in the automotive market. Scientific Reports, 13(1), 8345. https://doi.org/10.1038/s41598-023-35366-3

Arjunan, T. (2022). Building Business Intelligence Data Extractor using NLP and Python. International Journal for Research in Applied Science & Engineering Technology (IJRASET), 10(X).

Arslan, M., & Cruz, C. (2022). Semantic taxonomy enrichment to improve business text classification for dynamic environments. 2022 International Conference on INnovations in Intelligent SysTems and Applications (INISTA), 1–6. https://doi.org/10.1109/INISTA55318.2022.9894173

Babu, S., Pragathi, B. S., Chinthala, U., & Maheshwaram, S. (2020). Subject Tracking with Camera Movement Using Single Board Computer. Proceedings of 2020 IEEE-HYDCON International Conference on Engineering in the 4th Industrial Revolution, HYDCON 2020, 0–5. https://doi.org/10.1109/HYDCON48903.2020.9242811

Barbera, G., Araujo, L., & Fernandes, S. (2023). The Value of Web Data Scraping: An Application to TripAdvisor. Big Data and Cognitive Computing, 7(3), 121. https://doi.org/10.3390/bdcc7030121

Bisconti, C., Corallo, A., Fortunato, L., & Spennato, A. (2019). Influence parameters correlation in a Twitter event network. Entrepreneurship and Small Business.

Chan, L., Hogaboam, L., & Cao, R. (2022). Artificial Intelligence for Business. https://doi.org/10.1007/978-3-031-05740-3_1

Corallo, A., Del Vecchio, V., Lezzi, M., & Morciano, P. (2021). Shop floor digital twin in smart manufacturing: A systematic literature review. Sustainability (Switzerland), 13(23). https://doi.org/10.3390/su132312987

Corallo, A., Fortunato, L., Spennato, A., Errico, F., & Pedone, A. (2020). Predicting the Consumer’s Purchase Intention of Food Products. 2020 9th International Conference on Industrial Technology and Management (ICITM), 181–185. https://doi.org/10.1109/ICITM48982.2020.9080404

De Mauro, A., Greco, M., Grimaldi, M., & Ritala, P. (2018). Human resources for Big Data professions: A systematic classification of job roles and required skill sets. Information Processing and Management, 54(5), 807–817. https://doi.org/10.1016/j.ipm.2017.05.004

Del Vecchio, V., Lazoi, M., & Lezzi, M. (2023). Digital Twin and Extended Reality in Industrial Contexts: A Bibliometric Review. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Vol. 14218 LNCS. Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-43401-3_18

Donthu, N., Kumar, S., Mukherjee, D., Pandey, N., & Lim, W. M. (2021). How to conduct a bibliometric analysis: An overview and guidelines. Journal of Business Research, 133(April), 285–296. https://doi.org/10.1016/j.jbusres.2021.04.070

Eberendu, A. C. (2016). Unstructured Data: An overview of the data of Big Data. International Journal of Computer Trends and Technology, 38(1), 46–50. https://doi.org/10.14445/22312803/ijctt-v38p109

Hao, Z., Zhang, B., Mao, D., Yen, J., Zhao, Z., Zuo, M., Li, H., & Xu, C.-Z. (2023). A novel method using LSTM-RNN to generate smart contracts code templates for improved usability. Multimedia Tools and Applications, 82(27), 41669–41699. https://doi.org/10.1007/s11042-023-14592-x

Jorge, O., Pons, A., Rius, J., Vintró, C., Mateo, J., & Vilaplana, J. (2020). Increasing online shop revenues with web scraping: A case study for the wine sector. British Food Journal, 122(11), 3383–3401. https://doi.org/10.1108/BFJ-07-2019-0522

Kafeza, E., Rompolas, G., Kyriazidis, S., & Makris, C. (2023). Time-Series Clustering for Determining Behavioral-Based Brand Loyalty of Users Across Social Media. IEEE Transactions on Computational Social Systems, 10(4), 1951–1965. https://doi.org/10.1109/TCSS.2022.3219781

Khder, M. A. (2021). Web scraping or web crawling: State of art, techniques, approaches and application. International Journal of Advances in Soft Computing and its Applications, 13(3), 144–168. https://doi.org/10.15849/ijasca.211128.11

Kinne, J., & Resch, B. (2018). Generating Big Spatial Data on Firm Innovation Activity from Text-Mined Firm Websites. GI_Forum, 1, 82–89. https://doi.org/10.1553/giscience2018_01_s82

Kumar, S. A., Nasralla, M. M., García-Magariño, I., & Kumar, H. (2021). A machine-learning scraping tool for data fusion in the analysis of sentiments about pandemics for supporting business decisions with human-centric AI explanations. PeerJ Computer Science, 7, e713. https://doi.org/10.7717/peerj-cs.713

Latham, S., & Giannetti, C. (2023). A Tool to Combine Expert Knowledge and Machine Learning for Defect Detection and Root Cause Analysis in a Hot Strip Mill. SN Computer Science, 4(5), 628. https://doi.org/10.1007/s42979-023-02104-5

Lee, M., & Na, I. (2023). Enhancing Similar Business Group Recommendation through Derivative Criteria and Web Crawling. KSII Transactions on Internet and Information Systems, 17(10). https://doi.org/10.3837/tiis.2023.10.012

Patnaik, S. K., & Babu, C. N. (2021). Information Retrieval from web with Faster R-CNN Deep Learning Networks: A New Perspective. Proceedings - 6th International Conference on Computer Science and Engineering, UBMK 2021, i, 61–66. https://doi.org/10.1109/UBMK52708.2021.9558956

Pranckutė, R. (2021). Web of Science (WoS) and Scopus: The titans of bibliographic information in today’s academic world. Publications, 9(1). https://doi.org/10.3390/publications9010012

Pritchard, A. (1969). Statistical Bibliography or Bibliometrics? Journal of Documentation, 25, 348–349.

Reddy, K. K. C., Anisha, P. R., Nguyen, N. G., & Sreelatha, G. (2021). A Text Mining using Web Scraping for Meaningful Insights. Journal of Physics: Conference Series, 2089(1). https://doi.org/10.1088/1742-6596/2089/1/012048

Rejeb, A., Rejeb, K., & Keogh, J. G. (2020). Potential Of Big Data For Marketing: A Literature Review. Management Research and Practice, 12(3).

Sahu, S., Divya, K., Rastogi, D. N., Yadav, P. K., & Perwej, D. Y. (2022). Sentimental Analysis on Web Scraping Using Machine Learning Method.

Sarica, S., Luo, J., & Wood, K. L. (2020). TechNet: Technology semantic network based on patent data. Expert Systems with Applications, 142, 112995. https://doi.org/10.1016/j.eswa.2019.112995

Sridevi, G. M., & Suganthi, S. K. (2022). AI based suitability measurement and prediction between job description and job seeker profiles. International Journal of Information Management Data Insights, 2(2), 100109. https://doi.org/10.1016/j.jjimei.2022.100109

Tanasescu, L. G., Vines, A., Bologa, A. R., & Vaida, C. A. (2022). Big Data ETL Process and Its Impact on Text Mining Analysis for Employees’ Reviews. Applied Sciences, 12(15), 7509. https://doi.org/10.3390/app12157509

Tang, Y., Chen, M., Wang, C., Luo, L., Li, J., Lian, G., & Zou, X. (2020). Recognition and Localization Methods for Vision-Based Fruit Picking Robots: A Review. Frontiers in Plant Science, 11(May), 1–17. https://doi.org/10.3389/fpls.2020.00510

Thuan, N. D., Nhut, N. M., Quan, D. M., & Khanh, L. M. D. (2022). Using Blockchain and Artificial Intelligence to build a Job Recommendation System for Students in Information Technology. 2022 RIVF International Conference on Computing and Communication Technologies (RIVF), 364–369. https://doi.org/10.1109/RIVF55975.2022.10013916

Tranfield, D., Denyer, D., & Smart, P. (2003). Towards a Methodology for Developing Evidence-Informed Management Knowledge by Means of Systematic Review. British Journal of Management, 14(3), 207–222. https://doi.org/10.1111/1467-8551.00375

van Oorschot, J. A. W. H., Hofman, E., & Halman, J. I. M. (2018). A bibliometric review of the innovation adoption literature. Technological Forecasting and Social Change, 134(June), 1–21. https://doi.org/10.1016/j.techfore.2018.04.032

Vankevich, A., & Kalinouskaya, I. (2021). Better understanding of the labour market using Big Data. Ekonomia i Prawo, 20(3), 677–692. https://doi.org/10.12775/eip.2021.040

Zhao, B. (2017). Web Scraping. In L. A. Schintler & C. L. McNeely (Eds.), Encyclopedia of Big Data.

ICEIS 2024 - 26th International Conference on Enterprise Information Systems