In this context, social media can be regarded as a
shared platform through which participating citizens
can become truly empowered in smart societies.
For this reason, the latest research efforts focus on
extracting knowledge from social media to
contribute to developing smart-city systems, where
taking Twitter as a sensor has gained increasing
interest because of the real-time nature of its data. In
this regard, two contentious issues might seem to
distort research results: restrictions on data
acquisition and the veracity of the information.
However, such issues do not undermine the
adequacy of Twitter as an information-providing
platform for smart-city applications. On the one
hand, the Twitter Streaming API is only able to
return up to 1% of all content published at a given
time, but Ayora et al. (2018) empirically
demonstrated that neither the incompleteness nor the
latency of Twitter data distorts the data enough to
lead to wrong conclusions. On the other
hand, Doran et al. (2016) presented the reasons for
relying on the collective information conveyed by a
stream of UGC:
(a) users lose credibility within their social
networks when they continuously share posts that
are unlikely to be authentic and truthful, and
(b) false information provided by a few is
unlikely to lead to misleading inferences because
truthful information is usually shared by an
overwhelming majority.
Two projects merit attention within this Smart
Society framework because they develop general-
purpose Twitter-based crowdsensing systems. The
first, TwitterSensing (Costa et al., 2018), detects
and classifies events of interest (e.g. accidents,
floods, or traffic jams) not
only to enhance the quality of wireless sensor
networks but also to detect the areas where new
sensors are required. In this case, information from
events that are currently happening is extracted
when tweets are converted into vector
representations using term frequency-inverse
document frequency (TF-IDF) weighting, which are then fed
into a Multinomial Naive Bayes classifier.
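Assuming a library such as scikit-learn, the TF-IDF plus Multinomial Naive Bayes step can be sketched as follows; the tweets, labels, and category names below are invented for illustration and are not TwitterSensing's actual data or features:

```python
# Minimal sketch: TF-IDF vectors fed into a Multinomial Naive Bayes
# event classifier, as in the TwitterSensing pipeline. The toy tweets
# and labels are illustrative placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

tweets = [
    "huge traffic jam on the ring road this morning",
    "cars stopped for miles, avoid the highway",
    "street flooded after heavy rain downtown",
    "water rising fast near the river park",
]
labels = ["traffic", "traffic", "flood", "flood"]

# TF-IDF turns each tweet into a sparse weighted term vector;
# Multinomial NB then models term weights per event category.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(tweets, labels)

print(model.predict(["road blocked by a traffic accident"]))
```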
Adikari and Alahakoon (2021) proposed an AI-
based system to monitor the 'emotional pulse' of the
city by analysing the emotions collectively
expressed by citizens through data from social media
and online discussion forums. The system carries out
three main tasks. First, primary emotions (i.e. anger,
anticipation, disgust, fear, joy, sadness, surprise, and
trust) are extracted based on a crowdsourced lexicon
for emotion mining (Mohammad & Turney, 2018);
as a result, an emotion profile is created for each
tweet. Second, emotion transitions are modelled
using Markov models. Finally, toxic comments,
which indicate a higher level of negativity than basic
negative emotions, are detected with a deep-learning
multi-label classifier, which employs layers of word
embedding, bidirectional recurrent neural networks
and convolutional neural networks.
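The second step, modelling emotion transitions with Markov models, amounts to estimating a transition-probability matrix from consecutive emotion observations. A minimal first-order sketch (the emotion sequence is invented for illustration):

```python
# Sketch of first-order Markov modelling of emotion transitions:
# estimate P(next emotion | current emotion) from an observed
# sequence. The sequence below is an invented example.
from collections import Counter, defaultdict

sequence = ["joy", "joy", "anger", "fear", "anger", "joy", "trust"]

counts = defaultdict(Counter)
for cur, nxt in zip(sequence, sequence[1:]):
    counts[cur][nxt] += 1

# Normalise each row so the outgoing probabilities sum to 1.
transitions = {
    cur: {nxt: n / sum(row.values()) for nxt, n in row.items()}
    for cur, row in counts.items()
}

print(transitions["anger"])  # {'fear': 0.5, 'joy': 0.5}
```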
In contrast, most recent studies that
integrate social media into a smart-city model focus
on specific domains, such as traffic (Pandhare &
Shah, 2017; Lau, 2017; Salas et al., 2017),
healthcare (Alotaibi et al., 2020) or security (Saura
et al., 2021). For example, Pandhare and Shah
(2017) proposed a system that detects tweets related
to traffic and accidents. After filtering out
stopwords, they determined the importance of the
tokens in a tweet through TF-IDF and then
employed logistic regression and SVM as binary
classifiers (i.e. traffic and non-traffic tweets).
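Assuming scikit-learn, Pandhare and Shah's binary setup could be sketched as below: stopword filtering and TF-IDF weighting followed by logistic regression and a linear SVM. The tweets and labels are invented for illustration:

```python
# Sketch of a binary traffic / non-traffic classification step:
# stopwords are filtered, tokens are TF-IDF weighted, and logistic
# regression and a linear SVM are compared. Toy data only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

tweets = [
    "accident on main street, traffic backed up",
    "heavy traffic near the stadium exit",
    "great concert in the park tonight",
    "new coffee shop opened downtown",
]
labels = [1, 1, 0, 0]  # 1 = traffic-related, 0 = non-traffic

# stop_words='english' drops common function words before weighting.
vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(tweets)

for clf in (LogisticRegression(), LinearSVC()):
    clf.fit(X, labels)
    query = vec.transform(["crash caused a traffic jam"])
    print(type(clf).__name__, clf.predict(query))
```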
Lau (2017) extracted useful driving navigation
information (e.g. road accidents or traffic jams)
from social media (i.e. Twitter and Sina Weibo) to
enhance the effectiveness of Intelligent
Transportation Systems, which provide drivers with
real-time navigation information. First, he employed
a topic model-based method (Latent Dirichlet
Allocation) to learn concepts about traffic events
from an unlabeled corpus of UGC (i.e. message
filtering). Second, he applied an ensemble-based
classification method to detect traffic-related events
automatically (i.e. event identification). In particular,
the ensemble classifier relied on a weighted voting
scheme with three base classifiers, i.e. support
vector machines (SVM), Naïve Bayes, and K-
Nearest Neighbour.
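The weighted-voting step can be sketched with scikit-learn's VotingClassifier; the texts, labels, vote weights, and the linear SVM kernel below are illustrative choices, not Lau's reported configuration:

```python
# Sketch of a weighted (hard) voting ensemble over SVM, Naive Bayes,
# and k-nearest-neighbour base classifiers on TF-IDF features.
# Data, weights, and hyperparameters are invented for the toy example.
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

texts = [
    "multi car accident blocking two lanes",
    "slow traffic jam on the bridge",
    "lovely weather for a walk today",
    "weekend market opens at nine",
]
labels = [1, 1, 0, 0]  # 1 = traffic event, 0 = other

ensemble = make_pipeline(
    TfidfVectorizer(),
    VotingClassifier(
        estimators=[
            ("svm", SVC(kernel="linear")),
            ("nb", MultinomialNB()),
            ("knn", KNeighborsClassifier(n_neighbors=1)),
        ],
        voting="hard",      # majority vote over predicted labels
        weights=[2, 1, 1],  # e.g. trust the SVM vote more
    ),
)
ensemble.fit(texts, labels)
print(ensemble.predict(["accident and traffic on the bridge"]))
```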
Salas et al. (2017) proposed a framework to
analyse real-time traffic incidents using Twitter data,
where the main steps are as follows. First, tweets are
tokenised, and stopwords and special characters are
removed. Second, tweets are classified into traffic-
related and non-traffic-related, and traffic-related
tweets are in turn classified into different event
categories: roadworks, accidents, weather, and social
events; SVM is used as the classifier. Third, the
tweet location is extracted using named-entity
recognition and entity disambiguation based on
Wikipedia. Fourth, the strength of positive or
negative sentiment (ranging from -5 to 5) is
predicted for each tweet. Finally, the level of stress
or relaxation for each tweet (ranging from -5 to 5) is
also determined, so that, for example, a tweet in
which the user is complaining receives a high stress score.
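These steps could be organised as a pipeline of stages. In the sketch below, trivial keyword rules stand in for the SVM classifier and the sentiment/stress scorers, and the Wikipedia-based location step is omitted; all names and rules are illustrative assumptions:

```python
# Skeleton of a staged tweet-analysis pipeline in the spirit of the
# framework above. Each function is a simplified placeholder for the
# technique named in the text (keyword rules instead of SVM, a toy
# word list instead of a real -5..5 sentiment/stress scorer).
import re

STOPWORDS = {"the", "a", "on", "is", "at"}

def tokenise(tweet):
    # Step 1: tokenise, drop stopwords and special characters.
    words = re.findall(r"[a-z]+", tweet.lower())
    return [w for w in words if w not in STOPWORDS]

def classify(tokens):
    # Step 2: traffic vs non-traffic, then event category
    # (a keyword rule stands in for the SVM classifier).
    if "accident" in tokens:
        return "accidents"
    if "roadworks" in tokens:
        return "roadworks"
    return "non-traffic"

def sentiment(tokens):
    # Steps 4-5: toy word scores, clamped to the -5..5 range.
    negative = {"terrible": -4, "accident": -2}
    score = sum(negative.get(w, 0) for w in tokens)
    return max(-5, min(5, score))

tokens = tokenise("Terrible accident on the A12 at junction 5")
print(classify(tokens), sentiment(tokens))
```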
Alotaibi et al. (2020) developed a big-data
analytics tool to detect symptoms and diseases using