ter explorations, latent encapsulation of individuals
by deep neural networks, and temporal information
understanding with gated recurrent unit (GRU) neural
networks (Cho et al., 2014). We also improve on the original method's constant-type bias removal by using time series (Benvenuto et al., 2020), which leads to better results. The available tools for post-survey analysis have also been improved. In addition, we address the creation of synthetic data that approximates real human profiles, so that clients can evaluate and customize their survey definitions before sending them to people.
Profiling people for content recommendations,
such as news recommendations, is a long-standing
practice (Mannens et al., 2013). Automatic detection of fraudulent profiles on social media platforms such as Instagram and Twitter is another common application of people profiling using data mining and clustering techniques (Khaled et al.,
2018). In (Ni et al., 2017), social media data extracted from WeChat (WeChat.com) is used to create individual
profiles and group them based on their occupational
field, using similar NLP techniques to those previ-
ously mentioned. The research in (Schermer, 2011)
discusses the use of data mining in automated pro-
filing processes, with a focus on ethics and potential
discrimination. Use cases include security services
or internal organizations that create profiles to assess
various characteristics of their employees. Profiling
and grouping individuals using data mining and NLP
techniques to extract information from text data is a
common topic in the literature. In (Wibawa et al.,
2022), the authors use AI methods such as traditional
NLP to process application documents for job open-
ings, which enables automatic filtering, evaluation
and prioritization of candidates.
3 SURVEY SETUP
3.1 Survey Formalization
Our aim is to present a survey in as general a form as possible. In doing so, we draw on our experience with the clients of vorteXplore and on the lessons learned from later versions of the framework.
The proposed high-level presentation method consists
of a limited number of questions (configurable on the
client side, on average between 15 and 25) that are either
general in nature or related to an asset shown in the
form of an image, video or extracted text (e.g. arti-
cles, SMS messages, emails, etc.). The formal specifications and components of a survey are explained below.
Groups. Every asset and question that is asked is part of a group. Examples of groups from the use case: Awareness, Prevalence, Sanction, Inspiration, Factual, Sensitivity. In our experience, grouping has proven very useful for characterizing people from multiple perspectives and for organizing assets and questions. It also improves reusability and makes the dataset easier to maintain.
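To make the grouping concrete, the sketch below shows one possible way to attach group labels to assets and questions; the class and field names are illustrative assumptions, not the framework's actual API.

```python
from dataclasses import dataclass

# Hypothetical group labels taken from the use case; in the framework each
# client organization can define its own set of groups.
GROUPS = {"Awareness", "Prevalence", "Sanction",
          "Inspiration", "Factual", "Sensitivity"}

@dataclass(frozen=True)
class Asset:
    asset_id: int
    kind: str    # e.g. "video", "image", "text"
    group: str   # one of GROUPS

@dataclass(frozen=True)
class Question:
    question_id: int
    text: str
    group: str   # one of GROUPS

def by_group(items, group):
    """Return the assets or questions that belong to a single group,
    e.g. for reuse in another survey or for per-group analysis."""
    return [item for item in items if item.group == group]
```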
Assets. A collection of assets representing video files, media posts, SMS, etc. Asset indices also have an optional dependency specification, i.e. the client can specify that an asset should depend on a previously displayed set of other assets: $Deps(A_i) = \{A_j\}_{j \in 1..|Assets|}$. For example, a video or image asset may only make sense after a sequence of previous assets.
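The sketch below illustrates one way such a dependency specification could be stored and checked before assets are displayed; the dictionary contents, function name and variable names are hypothetical, not the framework's implementation.

```python
# Hypothetical dependency specification: maps each asset index to the set of
# asset indices that must already have been shown, i.e. Deps(A_i) = {A_j}.
DEPS = {
    3: {1, 2},   # asset 3 only makes sense after assets 1 and 2
    5: {3},      # asset 5 follows asset 3
}

def validate_order(display_order, deps=DEPS):
    """Check that every asset in display_order appears only after all of
    its declared dependencies; raise ValueError otherwise."""
    seen = set()
    for asset in display_order:
        missing = deps.get(asset, set()) - seen
        if missing:
            raise ValueError(
                f"Asset {asset} shown before its dependencies {sorted(missing)}")
        seen.add(asset)

validate_order([1, 2, 3, 5])      # valid ordering
# validate_order([1, 3, 2])       # would raise: asset 3 requires asset 2 first
```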
Questions. The set of textual questions is denoted by Q. Each element $Q_i \in Q$ has two categories of properties:
1. Structural properties.
• The set of assets that are compatible with this question: $Compat(Q_i) = \{A_j \in Assets\}_j$. The idea behind compatibility is that some questions make sense for every type of asset shown, while others do not, e.g. questions that only apply to video-based assets with a concrete action demonstration.
• Dependencies on previous questions. Internally, the dependencies between questions take the form of a directed acyclic graph, where each node $Q_i$ has a set of dependencies $Deps(Q_i) = \{Q_j\}_j$. This set represents the restriction that $Q_i$ can only be asked as a follow-up to a previous question $Q_k \in Deps(Q_i)$.
2. Scoring properties.
• Attributes. For the use case of IB recognition, some examples are: Team interaction, Offensive language, Rumors, Personal boundaries, Leadership Style (the full list can be found in a table in our repository). These are customizable in the framework, are usually set by the organizations prior to the surveys and are not visible to the respondents. Generally, the client organization strategically uses these inherent characteristics to gain the insights they are looking for in the post-survey analysis. The Attr set represents the collection of attributes used by an application. For each question $Q_i$, a vector of all attributes ordered by indices is given, representing the relative importance of each attribute to the question. The value range is $[0, 1]$, where