The Future of Data-driven Personas: A Marriage of Online Analytics Numbers and Human Attributes

Joni Salminen (1,2), Soon-gyo Jung and Bernard J. Jansen

(1) Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar

(2) Turku School of Economics at the University of Turku, Turku, Finland

Keywords: Data-driven Personas, Automatic Persona Generation, Online Analytics, Customer Segmentation, Marketing, Big Data, Automation.

Abstract: The massive volume of online analytics data about customers has led to novel opportunities for user segmentation. However, getting real value from data remains challenging for many organizations. One of the recent innovations in online analytics is data-driven persona generation, which can be used to create high-quality human representations from online analytics data. This manuscript (a) summarizes the potential of data-driven persona generation for online analytics, (b) characterizes nine open research questions for data-driven persona generation, and (c) outlines a research agenda for making persona analytics more useful for decision makers.

1 INTRODUCTION

Despite the increasing availability of online analytics data (also referred to as “Big Data”), decision makers still struggle to turn customer data into practical insights (Salminen et al., 2017a). For this reason, various approaches for automatic analytics and insight generation have been proposed (Salminen and Jansen, 2018; Wang et al., 2018).

One approach for better understanding customers is the persona technique, popularized by Cooper (2004). A persona is defined as a fictitious person representing an underlying customer or user group, often the core customers of an organization, although personas can also represent the potential or desired users of a system (Cooper, 2004) (see Figure 1 for an example). Personas are deployed for various purposes, e.g., software development, design, marketing, and health informatics (Goodwin and Cooper, 2009). They facilitate the communication of data within an organization, so that decisions can be made with the customers in mind (Long, 2009).

From the analytics perspective, personas group similar customers under one archetype, helping decision makers understand customer needs and wants. While it is not practical to cognitively process thousands of individuals when making customer decisions, keeping a few core customer segments in mind is feasible for humans.

As online analytics data has become more prevalent and accessible, researchers have proposed novel methods for data-driven persona generation that use digital, rather than analog, data for persona creation (Zhang et al., 2016).

Figure 1: Example of a persona profile. A typical persona profile has a name, picture, and text description of the persona.

Data-driven persona generation addresses two major challenges in persona creation: (a) the complexity and cumbersomeness of using large amounts of customer data for creating personas, and (b) the slow and expensive process of creating personas manually. Data-driven personas transform online analytics data into representations that decision makers can easily process (An et al., 2018c; Salminen et al., 2018a). Further, data-driven personas are created rapidly and updated easily, while preserving the privacy of individuals (An et al., 2018b).


Salminen, J., Jung, S. and Jansen, B.

The Future of Data-driven Personas: A Marriage of Online Analytics Numbers and Human Attributes.

DOI: 10.5220/0007744706080615

In Proceedings of the 21st International Conference on Enterprise Information Systems (ICEIS 2019), pages 608-615

ISBN: 978-989-758-372-8

However, to generate useful and accurate data-driven personas without any manual intervention, several challenges must be addressed. These challenges relate to subfields of computer science, such as Image Generation, Natural Language Processing, Topic Modeling, Algorithms, and Human-Computer Interaction, as well as various “softer” topics such as persona perceptions (Salminen et al., 2018c), persona biases (Hill et al., 2017; Salminen et al., 2019b), and value in use (Salminen et al., 2018b).

In this manuscript, we explore a contemporary collection of research challenges related to data-driven persona creation, particularly from the perspective of automatic persona generation. We aim to inspire research within persona studies and related subfields of computer science.

2 THE RISE OF DATA-DRIVEN PERSONAS

2.1 Limitations of Manual Persona Generation

Personas are typically created with qualitative approaches. Brickey et al. (2012) found that 81% of persona creation efforts reported in academic literature have applied qualitative techniques, such as ethnographic fieldwork and interviews. However, manual persona generation has been thoroughly criticized in the literature, the main criticisms being:

Non-Representative Data: Manually created personas typically rely on data that does not represent the whole customer base (Chapman and Milham, 2006).

Lack of Scaling: Because manual analysis relies on human labor, it scales poorly with the big datasets used in online analytics (An et al., 2018b).

High Cost: Manual persona generation is costly; it typically takes several months and costs tens of thousands of dollars. The high cost puts personas out of reach for small and medium-sized businesses and start-ups.

Expiration: Personas tend to expire when customer behavior changes. This is typical of many fast-moving online domains, including online purchase behavior (Salminen et al., 2017b), search behavior (Jansen et al., 2011), and online content consumption (Abbar et al., 2015).

2.2 Advances in Data-driven Persona Generation

To solve the challenges of manual persona generation, researchers have suggested quantitative persona generation. The main techniques are as follows.

Quantitative Analysis of Survey Data: Several prior attempts at data-driven persona creation rely on survey-based data collection (Chapman et al., 2015; Dupree et al., 2016; Vahlo et al., 2017). This survey data is then most typically analyzed via cluster or factor analysis. However, survey-based data collection can be costly and more fallible than behavioral data, owing to the many respondent and researcher biases associated with survey data collection in general (Podsakoff et al., 2003).
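Cluster analysis of survey data can be illustrated with a minimal k-means (Lloyd's algorithm) sketch. The respondents, the three Likert items, and the choice of k = 2 below are hypothetical; the cited studies may use other clustering or factor-analytic procedures. Each resulting centroid would be the numerical starting point for one persona.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means on survey responses X (respondents x items)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct, randomly chosen respondents.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each respondent to the nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical 5-point Likert answers of six respondents to three attitude items.
X = [[5, 4, 1], [4, 5, 2], [5, 5, 1],
     [1, 2, 5], [2, 1, 4], [1, 1, 5]]
labels, centroids = kmeans(X, k=2)
```

The first three respondents end up in one cluster and the last three in the other; each centroid summarizes the typical answers of one candidate persona.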

System Log Data: In addition to survey data, personas can be created from system logs and organizational records describing the users or customers (Brickey et al., 2012). For example, Molenaar (2017) analyzed 400,000 clickstreams from a period of three months, grouping them into common workflows and classifying users into these workflows. Using a similar approach, Zhang et al. (2016) applied hierarchical clustering to generate five data-driven personas from clickstream data.

Procedural Personas: In the video game context, researchers have created procedural personas that capture sequential game-playing choices. The applied techniques include, e.g., evolutionary algorithms and neural networks (Holmgard et al., 2018). The procedural personas are named after their game-playing behaviors (e.g., “Monster Killers”). Rather than being “rounded personas” (Nielsen, 2004) with a name and demographic information, these personas can be seen as virtual agents that model possible game-playing behaviors (Vahlo et al., 2017).

Latent Semantic Analysis (LSA): LSA has been applied to create personas by differentiating users based on their use of language (Miaskiewicz et al., 2009). The weakness of this approach is its dependency on a text corpus, which is not always available in online analytics data. In addition, not using behavioral data (e.g., product engagement) can be considered a weakness.
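A minimal sketch of the LSA step, assuming a small hypothetical term-by-user count matrix: a truncated singular value decomposition projects each user's text into a low-dimensional latent space, where users with similar language end up close together. The terms, counts, and k = 2 are illustrative only and do not reproduce the setup of Miaskiewicz et al. (2009).

```python
import numpy as np

# Hypothetical term-by-user count matrix: rows are terms, columns are four users' texts.
terms = ["price", "discount", "deal", "battery", "screen"]
A = np.array([
    [3, 2, 0, 0],   # price
    [2, 3, 0, 0],   # discount
    [1, 2, 0, 0],   # deal
    [0, 0, 3, 2],   # battery
    [0, 0, 2, 3],   # screen
], dtype=float)

# LSA: keep the k largest singular values and vectors of the term-document matrix.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
user_vectors = (np.diag(s[:k]) @ Vt[:k]).T   # one k-dimensional vector per user

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Users 0 and 1 write about prices, users 2 and 3 about hardware.
same_group = cosine(user_vectors[0], user_vectors[1])
different_group = cosine(user_vectors[0], user_vectors[2])
```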

Discrete Choice Analysis: In the discrete choice methodology for persona creation (Chapman et al., 2015), customers explicitly state their preferences, and a conjoint analysis algorithm is then used to match each respondent to their best-fit persona. The method was developed to answer the criticism that personas lack quantitative information (Chapman et al., 2008), as forced assignment makes it possible to determine the proportional representativeness of personas within the overall user base. The method also makes it possible to compare the algorithmic persona assignments to randomly generated persona assignments. However, its major limitation is that stated preference data can be expensive to collect and can also be less reliable than observed behavioral data. The same limitation applies to creating personas with principal component analysis (Sinha, 2003), which uses preference data from a limited number of customers.
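The forced-assignment idea can be sketched in a simplified form. The persona weight profiles, respondent answers, and dot-product scoring rule below are hypothetical stand-ins for a conjoint model; the point is only that assigning every respondent to exactly one persona yields each persona's share of the user base, which can then be compared with a random-assignment baseline.

```python
import random
from collections import Counter

# Hypothetical persona profiles: importance weights on (price, quality, brand).
personas = {
    "Budget Buyer":   (0.7, 0.2, 0.1),
    "Quality Seeker": (0.1, 0.7, 0.2),
    "Brand Loyalist": (0.1, 0.2, 0.7),
}

def best_fit(stated):
    """Forced assignment: the persona whose weights best match the stated preferences
    (highest dot product). A simple stand-in for conjoint-based estimation."""
    return max(personas, key=lambda p: sum(w * s for w, s in zip(personas[p], stated)))

def shares(assignments):
    """Proportion of respondents assigned to each persona."""
    counts = Counter(assignments)
    return {p: counts[p] / len(assignments) for p in personas}

# Hypothetical stated-preference responses on the same three attributes.
respondents = [(0.8, 0.1, 0.1), (0.6, 0.3, 0.1), (0.2, 0.6, 0.2),
               (0.1, 0.1, 0.8), (0.7, 0.2, 0.1), (0.3, 0.5, 0.2)]
algorithmic = shares([best_fit(r) for r in respondents])

# Baseline: the same number of respondents assigned to personas uniformly at random.
rng = random.Random(42)
baseline = shares([rng.choice(list(personas)) for _ in respondents])
```

Here half of the respondents are assigned to "Budget Buyer", a figure that a purely qualitative persona cannot provide.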

Automatic Persona Generation (APG): APG is defined both as a methodology and as a system for the automatic creation of personas from online analytics data (An et al., 2018b). Automatically generated personas are (1) representative, as APG processes the entire online analytics dataset, (2) behaviorally accurate, inferring patterns from customers’ engagement with products (e.g., digital content, e-commerce products, flight destinations), (3) rapidly generated due to fast processing time, and (4) constantly up-to-date due to periodic refreshing of the data and the associated regeneration of the personas (An et al., 2018b,c).

The following section explains the APG methodology. We focus on this approach as it represents the latest techniques for data-driven persona generation and specifically utilizes online analytics data.

3 AUTOMATIC PERSONA GENERATION

3.1 Data Collection from Online Analytics Platforms

Online analytics platforms (e.g., Google Analytics, YouTube Analytics, Facebook Insights) typically enable automatic collection of user data via application programming interfaces (APIs). Typically, this data is aggregated into segments to protect the privacy of individual users. An example of an aggregated user segment is [Male, 44-55, Qatar]. The segments given by the online analytics platforms typically contain information on the gender, age, and country of the users, which the platforms typically collect upon registration. Various interaction metrics (e.g., clicks, views) can be retrieved for each group. For example, [Female, 24-35, USA] → 1,590 views for Video A.

Note that accessing the analytics data requires authorization from the owner of the analytics property.

Using the APIs of online analytics platforms, APG collects the aggregated data for products and engagement metrics. For example, from YouTube Analytics, APG collects videos and their view counts, whereas, from Google Analytics, APG collects pages and number of sessions.

Figure 2: Output of APG. Denoted areas are: [A] Name and demographic information, [B] Picture, [C] Text description, [D] Life situation information, [E] Topics of interest, [F] Social media comments, [G] Most interested content, and [H] Audience size.

3.2 Data Processing and Persona Profile Generation

After collecting the data from an online analytics platform, APG transforms it into an interaction matrix that captures the interaction between customers and online products (An et al., 2018c,b).
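Building the interaction matrix (one row per customer group, one column per product) from aggregated rows can be sketched as follows. The row format (segment tuple, product identifier, view count) is an assumption about what an analytics export might look like after parsing, not the schema of any specific platform API, and the function name `interaction_matrix` is hypothetical.

```python
import numpy as np

# Hypothetical aggregated rows: ((gender, age bracket, country), product, views).
rows = [
    (("Female", "24-35", "USA"), "Video A", 1590),
    (("Female", "24-35", "USA"), "Video B", 120),
    (("Male", "44-55", "Qatar"), "Video A", 300),
    (("Male", "44-55", "Qatar"), "Video C", 870),
]

def interaction_matrix(rows):
    """Return a groups x products matrix of engagement plus the row/column labels."""
    groups = sorted({g for g, _, _ in rows})
    products = sorted({p for _, p, _ in rows})
    g_idx = {g: i for i, g in enumerate(groups)}
    p_idx = {p: j for j, p in enumerate(products)}
    V = np.zeros((len(groups), len(products)))
    for group, product, value in rows:
        # Sum the metric if the same (group, product) pair appears more than once.
        V[g_idx[group], p_idx[product]] += value
    return V, groups, products

V, groups, products = interaction_matrix(rows)
```

Group and product pairs that never occur in the data stay at zero, which is why the matrix is typically sparse in practice.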

$V$ denotes the $g \times c$ matrix of $g$ customer groups and $c$ online products. Its element $V_{ij}$ can be any metric that reflects the engagement of customer group $G_i$ with product $C_j$. For example, in YouTube Analytics, $V_{ij}$ is the view count of a particular video $C_j$ from customer group $G_i$. The customer groups contain gender, age, and country (e.g., Female, 54-65, South Korea). Using $V$ as the basis, non-negative matrix factorization is applied to detect $p$ latent patterns (An et al., 2018c).
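Non-negative matrix factorization approximates $V \approx WH$ with non-negative $W$ ($g \times p$) and $H$ ($p \times c$). The sketch below uses the classic Lee and Seung multiplicative updates for the squared Frobenius error; APG's actual solver, initialization, and choice of $p$ are not specified here, and the view counts are hypothetical. An off-the-shelf alternative is scikit-learn's `NMF` class.

```python
import numpy as np

def nmf(V, p, n_iter=1000, seed=0, eps=1e-9):
    """Factorise non-negative V (g x c) as W @ H with W (g x p) >= 0 and H (p x c) >= 0,
    using multiplicative updates that decrease ||V - W H||_F^2."""
    rng = np.random.default_rng(seed)
    g, c = V.shape
    W = rng.random((g, p)) + eps
    H = rng.random((p, c)) + eps
    for _ in range(n_iter):
        # eps guards against division by zero; updates keep all entries non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical view counts: 4 customer groups x 4 videos with two viewing patterns.
V = np.array([[90, 80, 0, 5],
              [70, 60, 5, 0],
              [0, 5, 50, 60],
              [5, 0, 40, 70]], dtype=float)
W, H = nmf(V, p=2)
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Each column of `W` weights the customer groups and each row of `H` weights the products for one latent pattern; the relative reconstruction error gives one rough check of how well $p$ patterns explain the data.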

These patterns form the core of the personas, as they represent the customer groups’ product preferences. APG then chooses a representative demographic group for each latent pattern and enriches this demographic group with additional information to produce a complete persona profile (see Figure 2).
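One simple way to turn the factors into persona skeletons is to take, for each latent pattern $k$, the customer group with the largest weight in column $k$ of $W$ and the products with the largest weights in row $k$ of $H$. This argmax rule and the small $W$ and $H$ below are illustrative assumptions; the sketch does not reproduce APG's actual selection and enrichment steps, and `persona_skeletons` is a hypothetical name.

```python
import numpy as np

# Hypothetical NMF output for 4 customer groups and 3 videos with p = 2 patterns.
groups = [("Female", "24-35", "USA"), ("Male", "44-55", "Qatar"),
          ("Female", "54-65", "South Korea"), ("Male", "24-35", "USA")]
products = ["Video A", "Video B", "Video C"]
W = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.1, 0.3],
              [0.6, 0.2]])        # groups x patterns
H = np.array([[5.0, 0.5, 1.0],
              [0.2, 4.0, 3.0]])   # patterns x products

def persona_skeletons(W, H, groups, products, top_n=2):
    """For each pattern: the most strongly associated group as the demographic core,
    plus the top_n products that define the pattern."""
    skeletons = []
    for k in range(W.shape[1]):
        gender, age, country = groups[int(np.argmax(W[:, k]))]
        top = [products[j] for j in np.argsort(H[k])[::-1][:top_n]]
        skeletons.append({"gender": gender, "age": age, "country": country,
                          "top_products": top})
    return skeletons

skeletons = persona_skeletons(W, H, groups, products)
```

Name, picture, text description, and the other profile elements would then be added on top of each skeleton.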


4 FUTURE OF AUTOMATIC PERSONA GENERATION

The vision of APG can be summarized as achieving completely automatic generation of high-quality personas that address the limitations of manual persona creation through the use of online analytics data. “Completely automatic” refers to the elimination of manual steps. “High quality” refers to serving the persona user’s decision needs while accurately representing the underlying data about the customers.

In the following subsections, we present Proposed Research Questions (PRQs) toward this vision.

4.1 Automatic Generation of Persona Pictures

In the current version of APG, a photo for each persona is purchased from online stock photo banks. However, it is difficult and costly to find an appropriate photo for all demographic groups, especially worldwide. A potential solution is to generate persona pictures automatically.

Nascent developments in Deep Learning, particularly Generative Adversarial Networks (GANs) (Goodfellow et al., 2014), have been applied to generating and modifying human faces. This line of work could be used to generate photos that vary by the persona’s age, gender, and ethnicity, possibly using these attributes as conditions in a Conditional GAN architecture (Isola et al., 2017). PRQ1: How to automatically generate persona profile pictures?
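In a conditional GAN, conditioning is commonly implemented by feeding the condition labels to the generator (and the discriminator) alongside the noise vector. The sketch below shows only that input construction, using one-hot encodings of hypothetical age, gender, and ethnicity categories, followed by an untrained single-layer stand-in for the generator network. The category lists, dimensions, and 8 x 8 output are placeholders, and no adversarial training is shown.

```python
import numpy as np

AGES = ["18-24", "25-34", "35-44", "45-54", "55-64"]   # hypothetical brackets
GENDERS = ["female", "male"]
ETHNICITIES = ["group_a", "group_b", "group_c"]        # placeholder labels

def one_hot(value, vocab):
    v = np.zeros(len(vocab))
    v[vocab.index(value)] = 1.0
    return v

def generator_input(noise, age, gender, ethnicity):
    """Concatenate the noise vector with one-hot encoded persona attributes."""
    cond = np.concatenate([one_hot(age, AGES), one_hot(gender, GENDERS),
                           one_hot(ethnicity, ETHNICITIES)])
    return np.concatenate([noise, cond])

rng = np.random.default_rng(0)
z = rng.standard_normal(16)
x = generator_input(z, "35-44", "female", "group_b")

# Untrained linear layer + tanh standing in for a deep generator: produces an
# 8 x 8 "image" with values in [-1, 1]. A real model would be trained adversarially.
W_gen = rng.standard_normal((64, x.size)) * 0.1
image = np.tanh(W_gen @ x).reshape(8, 8)
```

Keeping the noise `z` fixed and changing only the condition labels changes the generated output, which is the property that would let a trained model render the same "person" at different ages or genders.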

4.2 Defining the Optimal Persona Attributes

Persona profiles typically contain name, age, and gender, as well as other demographic information, such as marital status, education level, job, and so on. However, there have been few studies into what information should be included in a persona. This lack of prior work speaks to a need for user studies, including interviews and ethnographic investigations of actual users of personas in the workplace.

Related to this issue of defining the optimal information content of a persona profile, another major limitation of data-driven persona methodologies is that none of them currently infers in-depth insights about the users, such as needs and wants, that are essential for a thorough understanding of the users or customers that the persona portrays (Cooper, 2004).

We summarize these issues in two research questions: PRQ2: What information should automatically generated personas contain? PRQ3: How could that information be automatically inferred from online analytics data?

Two approaches could potentially address PRQ2: (1) defining shared information needs for a given industry and constructing industry-specific templates (e.g., e-commerce, online media and news, e-health); or (2) providing self-selection options that let users build personas by choosing from all available information. In the former case, the selection of persona attributes should depend on the persona users’ information needs, which can be obtained via user studies (Salminen et al., 2018d). Overall, determining the persona users’ information needs relies on implicit or explicit user feedback.

A potential solution for PRQ3 is the use of computational methods for inferring customer attributes from social media. There is a substantial amount of research using social media platforms, such as Twitter, to infer user attributes (Volkova et al., 2015). These studies have inferred, for example, social media users’ psychological traits, socioeconomic status, relationship status, political orientation, and brand preferences, using the profile information, comments, and connections of the user. The applied techniques are diverse, including natural language processing, graph analysis, and various machine learning classifiers. If any of the above attributes are considered critical by persona users in a specific application domain, methods for inferring those attributes and associating them with the automatically generated personas (most likely using probabilistic matching) are called for.
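A minimal sketch of probabilistic matching, assuming an upstream classifier already outputs a probability for each attribute value of a social media user. Treating the attributes as independent, each persona is scored by the product of the probabilities of its attribute values, and the scores are normalized across personas. The attribute values, probabilities, and persona definitions are hypothetical.

```python
# Hypothetical classifier output for one social media user.
inferred = {
    "gender": {"female": 0.8, "male": 0.2},
    "age": {"18-24": 0.1, "25-34": 0.7, "35-44": 0.2},
}

# Hypothetical automatically generated personas (demographic core only).
personas = {
    "Persona 1": {"gender": "female", "age": "25-34"},
    "Persona 2": {"gender": "male", "age": "35-44"},
}

def match_probabilities(inferred, personas):
    """Probability that the user belongs to each persona, assuming independent
    attributes and that the user belongs to exactly one of the listed personas."""
    scores = {}
    for name, attrs in personas.items():
        score = 1.0
        for attr, value in attrs.items():
            score *= inferred[attr].get(value, 0.0)
        scores[name] = score
    total = sum(scores.values())
    return {name: (s / total if total else 0.0) for name, s in scores.items()}

probs = match_probabilities(inferred, personas)
```

The resulting probabilities (about 0.93 versus 0.07 here) could be used either to attach the user's other inferred attributes to the most likely persona or to weight them across personas.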

Moreover, these additional customer attributes could be available on demand, so that persona users could construct their own personas from the ground up by choosing the information elements that matter in their respective industries or use cases.

4.3 Unsupervised Learning of Persona’s Interest

In addition to demographic information, online analytics data contains information on customer preferences that can be inferred from the customers’ engagement with various online products. However, due to the vast number of individual products, they need to be categorized in order to provide a meaningful summary of the customer preferences. Thus, data-driven personas should incorporate unsupervised topic models that can accurately classify the online products based on their features, such as textual descriptions.
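As a much simpler stand-in for a full topic model, the sketch below assigns a product to the topic of a small, hypothetical taxonomy whose seed words are most similar to the product's textual description, using bag-of-words cosine similarity. A production system would more plausibly use learned topics or sentence embeddings; the taxonomy, seed words, and function names are illustrative assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical taxonomy: each topic is described by a handful of seed words.
TAXONOMY = {
    "Sports": "football match goal league player team",
    "Politics": "election government minister parliament vote policy",
    "Technology": "software smartphone app internet device data",
}

def bag_of_words(text):
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def classify(description):
    """Return the taxonomy topic whose seed words are most similar to the description."""
    vec = bag_of_words(description)
    return max(TAXONOMY, key=lambda topic: cosine(vec, bag_of_words(TAXONOMY[topic])))

label = classify("Highlights: the team scored a late goal to win the league match")
```

Aggregating such labels over all products a customer group engages with gives a compact "topics of interest" summary for the corresponding persona.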

This prompts the research question PRQ4: How to generate a universal topic taxonomy for online content? Here, unsupervised learning methods, such as Latent Dirichlet Allocation (LDA) topic modelling (Hong et al., 2018), could be helpful. In addition, Google’s Universal Sentence Encoder (Cer et al., 2018) encodes any text content into embeddings whose similarity to the entries of a known taxonomy can be computed, enabling a hybrid approach.

Table 1: Research and development goals for automatic persona generation.

Persona section addressed | Proposed solution | Ideal outcome | Applicable domains
Persona picture | Automatically producing persona face pictures for the matching demographic variables (age, gender, country). | Eliminates the need for manual acquisition of persona pictures. | Computer Vision, Generative Adversarial Networks
Topics of interest | Automatically creating a taxonomy that is scalable across multiple topical domains. | Eliminates the need for creating an organization- or industry-specific taxonomy each time a new one is added. | Topic Modeling: LDA, LSA; Entity Resolution, Data Mapping
Persona quotes | Providing comments that are relevant for the persona user’s use case and do not distract the persona user from the important attributes of the persona. | Increases empathy and customer insights gained by the persona user; eliminates distraction caused by non-useful comments. | Social Computing; Text Classification, Natural Language Processing
Persona attributes | Determining the persona attributes that correspond to the persona users’ needs in a given decision-making situation and devising methods to infer those attributes from unstructured data such as social media comments. | Satisfies the persona users’ information needs, thereby possibly enabling better decision-making via the use of personas. | Human-Computer Interaction, Information Design, Usability
Overall persona profile | Validating the accuracy, consistency, and usefulness of personas for individuals and organizations. | Ensures that personas are reliable and valid, so that they can be trusted in real decision-making situations. | Case Studies, User Studies, HCI

Another challenge related to inferring additional customer attributes is associating them with persona profiles generated from different source data. For example, Platform A may have information on a persona’s topics of interest, while Platform B has information on the persona’s movie preferences. These seemingly disconnected pieces of information then need to be mapped to each other in order to include both in the persona profile. To create high-quality personas with attributes such as the persona’s goals, needs, and wants, several data sources need to be combined. Therefore, PRQ5: How can we map the personas to online users across different platforms? Approaches studied in the domain of entity resolution could be of help here.
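A rule-based sketch of one entity-resolution step: linking persona profiles from two platforms whose age brackets are defined differently. Profiles are linked only when gender and country agree, preferring the largest overlap between age ranges. The profile fields, the "low-high" bracket format, and the linking rule are hypothetical; real entity resolution would typically add probabilistic scoring and more attributes.

```python
def parse_age(bracket):
    """Turn an age bracket such as '25-34' into an inclusive (low, high) tuple.
    Assumes the 'low-high' format; open-ended brackets like '65+' are not handled."""
    low, high = bracket.split("-")
    return int(low), int(high)

def overlap(a, b):
    """Number of years shared by two inclusive age ranges (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)

def link_personas(personas_a, personas_b):
    """Map each Platform A persona to the compatible Platform B persona with the largest
    age overlap, or to None if no candidate shares gender, country, and some ages."""
    links = {}
    for name_a, pa in personas_a.items():
        best, best_overlap = None, 0
        for name_b, pb in personas_b.items():
            if (pa["gender"], pa["country"]) != (pb["gender"], pb["country"]):
                continue
            o = overlap(parse_age(pa["age"]), parse_age(pb["age"]))
            if o > best_overlap:
                best, best_overlap = name_b, o
        links[name_a] = best
    return links

# Hypothetical personas from two platforms with different age bracketing schemes.
platform_a = {"A1": {"gender": "female", "age": "25-34", "country": "US"},
              "A2": {"gender": "male", "age": "45-54", "country": "QA"}}
platform_b = {"B1": {"gender": "female", "age": "24-35", "country": "US"},
              "B2": {"gender": "male", "age": "18-24", "country": "QA"}}
links = link_personas(platform_a, platform_b)
```

Persona A1 is linked to B1 despite the different brackets, while A2 stays unlinked because no Platform B persona covers its age range.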

4.4 Choosing High-quality Quotes for the Persona

Descriptive quotes are typically shown in the persona profile to provide a better understanding of the customers (Cooper, 2004). However, it has been found that quotes can also distract persona users toward information that is not relevant for their task. For example, Salminen et al. (2018d) found that the ethnicity of the persona affected the persona users’ interpretation of the persona. Moreover, when unfiltered social media comments are shown as persona quotes, the impression of the persona can quickly turn toxic (Salminen et al., 2018d). To counter this issue, Salminen et al. (2019a) have proposed three criteria for the automatic selection of social media comments as persona quotes:

1. Representativeness: the selected comments match the behavioral patterns, topics of interest, and demographics of the corresponding persona.
2. Relevance: the selected comments are helpful for the persona user in his or her purpose for using the persona.
3. Non-toxicity: the selected comments are not offensive to the degree where they would distract the persona user from the other information in the persona profile.

The associated solutions require filtering out the most toxic comments by automatic classification (or dictionary-based methods), but also mapping the comments to the matching personas. In a recent workshop on automatic persona generation, it was suggested that the mapping could be done based on demographic analysis of the social media users’ profile information (when publicly available) or by inferring their gender, age, and interests from the style of their writing (An et al., 2018a). Thus, the research questions are: PRQ6: How can the attributes of the commenting customers be inferred using only the text of the social media comments (when no public profile information is available), in order to select the comments that match a persona’s attributes? PRQ7: How to select the comments most relevant to the end user?
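The three selection criteria can be combined in a simple filtering pipeline. In this sketch, representativeness is approximated by exact matches on (inferred) commenter gender and age, non-toxicity by a tiny hypothetical word list standing in for a trained classifier, and relevance by the share of the persona's topics mentioned in a comment. All data, fields, and function names are illustrative assumptions, not the method of Salminen et al. (2019a).

```python
import re

TOXIC_WORDS = {"idiot", "stupid", "hate"}   # hypothetical lexicon

def words(text):
    return re.findall(r"[a-z]+", text.lower())

def is_toxic(text):
    return any(w in TOXIC_WORDS for w in words(text))

def relevance(text, persona_topics):
    """Share of the persona's topics of interest mentioned in the comment."""
    return len(set(words(text)) & persona_topics) / len(persona_topics)

def select_quotes(comments, persona, k=2):
    # Representativeness: commenter demographics (assumed inferred upstream) match the persona.
    # Non-toxicity: drop comments flagged by the lexicon.
    candidates = [c for c in comments
                  if c["gender"] == persona["gender"] and c["age"] == persona["age"]
                  and not is_toxic(c["text"])]
    # Relevance: rank the remaining comments by topic coverage.
    candidates.sort(key=lambda c: relevance(c["text"], persona["topics"]), reverse=True)
    return [c["text"] for c in candidates[:k]]

persona = {"gender": "female", "age": "25-34", "topics": {"travel", "food", "budget"}}
comments = [
    {"text": "Great budget travel tips, thanks!", "gender": "female", "age": "25-34"},
    {"text": "I hate this stupid video", "gender": "female", "age": "25-34"},
    {"text": "Loved the street food segment", "gender": "female", "age": "25-34"},
    {"text": "Nice travel vlog", "gender": "male", "age": "45-54"},
    {"text": "First!", "gender": "female", "age": "25-34"},
]
quotes = select_quotes(comments, persona)
```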

Moreover, in filtering out toxic comments, we should be cautious of manipulating the actual data and thereby biasing the information shown in the persona profile. Therefore, if the data in fact contains a high number of toxic comments, data-driven personas should display those comments to maintain the truthfulness of the persona, even if some end users might find them offensive. Thus, the challenge of toxicity in personas involves a trade-off between truthfulness and user experience.
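The trade-off can be made explicit as a switch. In the sketch below, `mirror_toxicity=True` keeps the share of toxic quotes shown equal (up to rounding) to their share in the data, favoring truthfulness, while `False` drops toxic comments, favoring user experience. The lexicon-based toxicity check, the function names, and the comments are hypothetical.

```python
import re

TOXIC_WORDS = {"idiot", "stupid", "hate"}   # hypothetical lexicon

def is_toxic(text):
    return any(w in TOXIC_WORDS for w in re.findall(r"[a-z]+", text.lower()))

def sample_quotes(comments, n, mirror_toxicity=True):
    """Pick up to n quotes. Mirroring keeps the toxic share of the shown quotes close to the
    toxic share of all comments; otherwise only non-toxic comments are shown."""
    toxic = [c for c in comments if is_toxic(c)]
    clean = [c for c in comments if not is_toxic(c)]
    if not mirror_toxicity:
        return clean[:n]
    n_toxic = round(n * len(toxic) / len(comments)) if comments else 0
    return toxic[:n_toxic] + clean[:n - n_toxic]

comments = ["Useful review", "I hate this", "Stupid ad again",
            "Helpful, thanks", "Good pace", "idiot host"]
truthful = sample_quotes(comments, 4)
pleasant = sample_quotes(comments, 4, mirror_toxicity=False)
```

With half of the comments toxic, the truthful selection shows two toxic quotes out of four, whereas the filtered selection shows none and would understate how the audience actually talks.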

4.5 Avoiding Biasing the End Users of Personas

One challenge of personas is that, for each selected attribute, only one dominant value can be chosen. For example, the persona can have only one age, even though the customers that the particular persona represents form a distribution of ages. This concern is pronounced in data-driven persona creation methodologies that are based on behavioral or preference patterns, because many demographic groups can share behavioral patterns or preferences. Choosing one dominant value for an attribute, say, gender or ethnicity, can easily result in biased interpretations by persona users (Salminen et al., 2018d). Thus, PRQ8: How can personas be debiased so that oversimplification of the customer base is avoided?

Two solutions can be considered: (1) removing ambiguous information to debias the persona for end users, and (2) purposefully introducing diversity to display the variation in the underlying user base. For example, it is possible to introduce an additional layer of information in the persona profiles (Salminen et al., 2019a). Such an approach could be used to mirror each active information element in a “deeper layer” that holds breakdown information. By showing deeper information, it may be possible to curb the tendency of personas to evoke stereotypical thinking.
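A minimal sketch of the "deeper layer" idea: the profile shows a single dominant value, while a breakdown records the distribution that the single value hides. The age brackets, counts, and the function name `attribute_layers` are hypothetical.

```python
from collections import Counter

def attribute_layers(values):
    """Return the value shown on the profile (the most common one) and a 'deeper layer'
    with the share of every value among the customers the persona represents."""
    counts = Counter(values)
    total = sum(counts.values())
    dominant, _ = counts.most_common(1)[0]
    breakdown = {v: c / total for v, c in counts.most_common()}
    return dominant, breakdown

# Hypothetical age brackets of the customers behind one persona.
ages = ["25-34"] * 40 + ["35-44"] * 35 + ["18-24"] * 25
dominant, breakdown = attribute_layers(ages)
```

The profile would show "25-34", although the breakdown reveals that 60% of the represented customers fall into other brackets.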

The drawback of this approach is that it may reduce the empathy benefits of personas (immersion, understanding) (Cooper, 2004), so that instead of being a believable person, the persona becomes a fragmented group of different people. To maintain the credibility of the persona, the whole needs to be coherent. These perceptual questions are conceptually linked to the evaluation of persona profiles, an area that is critical for the adoption and real use of personas in organizations. Toward that end, our final research question is PRQ9: How to evaluate the usefulness and value generated by data-driven personas?

5 EVALUATING THE QUALITY OF DATA-DRIVEN PERSONAS

Finally, it is not immediately evident how to measure the quality of data-driven personas. For example, how can their accuracy (in terms of correspondence with the data) be verified? Is accuracy even correlated with the perceived usefulness of the personas? In disentangling these questions, researchers have mostly focused on the technical aspects of persona quality (Chapman and Milham, 2006; Chapman et al., 2008). Yet, there is a nascent stream of studies focusing on persona perceptions (Marsden and Haag, 2016; Hill et al., 2017; Salminen et al., 2018d).

For example, Salminen et al. (2018c) developed a Persona Perception Scale that lists several perceptual constructs associated with the use of personas. Of these, at least the following could be considered important for evaluating persona quality: credibility, consistency, completeness, and clarity. In order for personas to be useful, persona users need to perceive them as credible (i.e., trustworthy, reliable). Moreover, the information in data-driven persona profiles needs to be consistent (e.g., topics of interest need to match the quotes); otherwise, there is a risk of confusion among persona users. In turn, if personas are not complete (i.e., do not contain all the information that the persona user needs to accomplish their task), they can hardly be considered useful. Finally, information should be presented clearly; for example, unclear titles or descriptions for the persona content sections are likely to confuse end users (Salminen et al., 2018d).
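Scoring the four constructs from questionnaire data can be sketched as averaging the items within each construct per respondent and then averaging across respondents. The item identifiers, the two responses, and the 7-point scale are hypothetical placeholders rather than the actual items of the Persona Perception Scale.

```python
from statistics import mean

# Hypothetical item IDs per construct (7-point Likert items).
CONSTRUCTS = {
    "credibility": ["cred_1", "cred_2", "cred_3"],
    "consistency": ["cons_1", "cons_2"],
    "completeness": ["comp_1", "comp_2"],
    "clarity": ["clar_1", "clar_2"],
}

def construct_scores(responses):
    """Mean of each respondent's item scores per construct, averaged over respondents."""
    return {construct: mean(mean(r[item] for item in items) for r in responses)
            for construct, items in CONSTRUCTS.items()}

responses = [
    {"cred_1": 6, "cred_2": 7, "cred_3": 5, "cons_1": 6, "cons_2": 6,
     "comp_1": 4, "comp_2": 5, "clar_1": 7, "clar_2": 7},
    {"cred_1": 5, "cred_2": 5, "cred_3": 5, "cons_1": 2, "cons_2": 3,
     "comp_1": 5, "comp_2": 5, "clar_1": 6, "clar_2": 6},
]
scores = construct_scores(responses)
```

A low or widely varying construct score (here, consistency for the second respondent) would point to the profile sections that need revision.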

6 DISCUSSION

Data-driven personas of the future should be low-cost, accurate, and accessible to small and large organizations with varying budgets and needs. However, many challenges must be overcome before reaching this vision.

To investigate these challenges, we formulated nine research questions that deal with various aspects of automatic persona generation. Addressing these questions, we believe, would result in major progress toward creating high-quality personas from customer analytics data. This goal is impactful for real organizations deploying personas for use cases such as product development, design, and marketing.

This manuscript represents a call for action to researchers interested in “humanizing” online analytics, encouraging contributions to the methodological and practical development of data-driven personas. We expect that addressing the research questions proposed here will require several years of active research, with the potential of several new avenues of inquiry in multiple domains. Data-driven persona creation is an ongoing research field with potential for both focused disciplinary and cross-disciplinary research in Algorithms, HCI, Online Analytics, and so on.

Personas are also opening new research avenues for experiments in Computational Social Science, particularly revealing end users’ subjective perceptions and biases about the audience or user groups (Hill et al., 2017; Salminen et al., 2018d). By classifying personas according to their attributes (e.g., age, gender, ethnicity), it is possible to conduct user studies that examine how end users perceive and respond to different personas. Another line of research is to investigate possible algorithmic bias in the automatically generated personas.

While automatically generated personas may not replace numbers in online analytics, they do provide intuitive descriptions of the customer base using quantitative data. In the APG system (Jung et al., 2017), numbers remain available as raw data that end users can download, and as data breakdowns. Thus, data-driven personas can support decision making by providing humanlike renderings of numerical customer data, while providing access to the underlying raw data.

7 CONCLUSION

The advancements in machine learning and Web technologies, combined with online analytics data, show great promise for data-driven persona generation. With these novel methods, it becomes possible to bring personas within the reach of more decision makers in more organizations, enhancing customer-oriented decision making and democratizing personas for all organizations, including corporations, small businesses, and startups using online analytics data.

REFERENCES

Abbar, S., An, J., Kwak, H., Messaoui, Y., and Borge-Holthoefer, J. (2015). Consumers and suppliers: Attention asymmetries. A case study of Al Jazeera’s news coverage and comments. In Computational Journalism Symposium.

An, J., Chunara, R., Crandall, D. J., Frajberg, D., French, M., Jansen, B. J., Kulshrestha, J., Mejova, Y., Romero, D. M., Salminen, J., Sharma, A., Sheth, A., Tan, C., Taylor, S. H., and Wijeratne, S. (2018a). Reports of the Workshops Held at the 2018 International AAAI Conference on Web and Social Media. AI Magazine, 39(4):36–44.

An, J., Kwak, H., Jung, S.-g., Salminen, J., and Jansen, B. J. (2018b). Customer segmentation using online platforms: isolating behavioral and demographic segments for persona creation via aggregated user data. Social Network Analysis and Mining, 8(1).

An, J., Kwak, H., Salminen, J., Jung, S.-g., and Jansen, B. J. (2018c). Imaginary People Representing Real Numbers: Generating Personas from Online Social Media Data. ACM Transactions on the Web (TWEB), 12(3).

Brickey, J., Walczak, S., and Burgess, T. (2012). Comparing semi-automated clustering methods for persona development. IEEE Transactions on Software Engineering, 38(3):537–546.

Cer, D., Yang, Y., Kong, S.-y., Hua, N., Limtiaco, N., John, R. S., Constant, N., Guajardo-Cespedes, M., Yuan, S., and Tar, C. (2018). Universal sentence encoder. arXiv preprint arXiv:1803.11175.

Chapman, C., Krontiris, K., and Webb, J. (2015). Profile CBC: Using Conjoint Analysis for Consumer Profiles. Technical report, Google Research.

Chapman, C., Love, E., Milham, R., ElRif, P., and Alford, J. (2008). Quantitative evaluation of personas as information. In Human Factors and Ergonomics Society 52nd Annual Meeting, pages 1107–1111.

Chapman, C. N. and Milham, R. P. (2006). The personas’ new clothes: Methodological and practical arguments against a popular method. In Human Factors and Ergonomics Society Annual Meeting, volume 50, pages 634–636.

Cooper, A. (2004). The Inmates Are Running the Asylum: Why High Tech Products Drive Us Crazy and How to Restore the Sanity (2nd Edition). Pearson Higher Education.

Dupree, J. L., Devries, R., Berry, D. M., and Lank, E. (2016). Privacy personas: Clustering users via attitudes and behaviors toward security practices. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI2016), pages 5228–5239. ACM.

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014). Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680.

Goodwin, K. and Cooper, A. (2009). Designing for the Digital Age: How to Create Human-Centered Products and Services. Wiley, Indianapolis, IN.

Hill, C. G., Haag, M., Oleson, A., Mendez, C., Marsden, N., Sarma, A., and Burnett, M. (2017). Gender-inclusiveness personas vs. stereotyping: Can we have it both ways? In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI2017), pages 6658–6671. ACM Press.

Holmgard, C., Green, M. C., Liapis, A., and Togelius, J. (2018). Automated Playtesting with Procedural Personas with Evolved Heuristics. IEEE Transactions on Games, PP(99):1–1.

Hong, W., Zheng, X., Qi, J., Wang, W., and Weng, Y. (2018). Project Rank: An Internet Topic Evaluation Model Based on Latent Dirichlet Allocation. In 2018 13th International Conference on Computer Science & Education (ICCSE), pages 1–4. IEEE.

Isola, P., Zhu, J.-Y., Zhou, T., and Efros, A. A. (2017). Image-to-image translation with conditional adversarial networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5967–5976. IEEE.

Jansen, B. J., Sobel, K., and Cook, G. (2011). Classifying ecommerce information sharing behaviour by youths on social networking sites. Journal of Information Science, 37(2):120–136.

Jung, S.-g., An, J., Kwak, H., Ahmad, M., Nielsen, L., and Jansen, B. J. (2017). Persona generation from aggregated social media data. In Extended Abstracts on Human Factors in Computing Systems (CHI2017), pages 1748–1755.

Long, F. (2009). Real or imaginary: The effectiveness of using personas in product design. In Proceedings of the Irish Ergonomics Society Annual Conference, volume 14. Irish Ergonomics Society, Dublin.

Marsden, N. and Haag, M. (2016). Stereotypes and politics: Reflections on personas. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI2016), pages 4017–4031. ACM.

Miaskiewicz, T., Grant, S. J., and Kozar, K. A. (2009). A preliminary examination of using personas to enhance user-centered design. In AMCIS 2009 Proceedings, Article 697.

Molenaar, L. (2017). Data-driven personas: Generating consumer insights with the use of clustering analysis from big data. Master’s thesis, TU Delft, Netherlands.

Nielsen, L. (2004). Engaging Personas and Narrative Scenarios. Doctoral dissertation. Copenhagen Business School.

Podsakoff, P. M., MacKenzie, S. B., Lee, J.-Y., and Podsakoff, N. P. (2003). Common method biases in behavioral research: A critical review of the literature and recommended remedies. Journal of Applied Psychology, 88(5):879.

Salminen, J., Şengün, S., Kwak, H., Jansen, B. J., An, J., Jung, S.-g., Vieweg, S., and Harrell, F. (2018a). From 2,772 segments to five personas: Summarizing a diverse online audience by generating culturally adapted personas. First Monday, 23(6).

Salminen, J. and Jansen, B. J. (2018). Use Cases and Outlooks for Automatic Analytics. arXiv preprint arXiv:1810.00358.

Salminen, J., Jansen, B. J., An, J., Kwak, H., and Jung, S.-g. (2018b). Are personas done? Evaluating their usefulness in the age of digital analytics. Persona Studies, 4(2):47–65.

Salminen, J., Jansen, B. J., An, J., Kwak, H., and Jung, S.-g. (2019a). Automatic Persona Generation for Online Content Creators: Conceptual Rationale and a Research Agenda. In Nielsen, L., editor, Personas - User Focused Design, Human–Computer Interaction Series, pages 135–160. Springer London, London.

Salminen, J., Jung, S.-g., and Jansen, B. J. (2019b). Detecting Demographic Bias in Automatically Generated Personas. In Extended Abstracts on CHI Conference on Human Factors in Computing Systems (CHI2019), Glasgow, UK.

Salminen, J., Kwak, H., Santos, J. M., Jung, S.-g., An, J., and Jansen, B. J. (2018c). Persona Perception Scale: Developing and Validating an Instrument for Human-Like Representations of Data. In Extended Abstracts on ACM Conference on Human Factors in Computing Systems (CHI2018), Montréal, Canada.

Salminen, J., Milenković, M., and Jansen, B. J. (2017a). Problems of data science in organizations: An explorative qualitative analysis of business professionals’ concerns. In Proceedings of International Conference on Electronic Business (ICEB 2017).

Salminen, J., Nielsen, L., Jung, S.-g., An, J., Kwak, H., and Jansen, B. J. (2018d). “Is More Better?”: Impact of Multiple Photos on Perception of Persona Profiles. In Proceedings of ACM Conference on Human Factors in Computing Systems (CHI2018), Montréal, Canada.

Salminen, J., Seitz, S., Jansen, B. J., and Salenius, T. (2017b). Gender Effect on E-Commerce Sales of Experience Gifts: Preliminary Empirical Findings. In Proceedings of International Conference on Electronic Business (ICEB 2017), Dubai.

Sinha, R. (2003). Persona development for information-rich domains. In Extended abstracts of the ACM Conference on Human Factors in Computing Systems (CHI 2003), pages 830–831. ACM.

Vahlo, J., Kaakinen, J. K., Holm, S. K., and Koponen, A. (2017). Digital Game Dynamics Preferences and Player Types. Journal of Computer-Mediated Communication, 22(2):88–103.

Volkova, S., Bachrach, Y., Armstrong, M., and Sharma, V. (2015). Inferring Latent User Properties from Texts Published in Social Media. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI’15, pages 4296–4297, Austin, Texas. AAAI Press.

Wang, Y., Kung, L., and Byrd, T. A. (2018). Big data analytics: Understanding its capabilities and potential benefits for healthcare organizations. Technological Forecasting and Social Change, 126:3–13.

Zhang, X., Brown, H.-F., and Shankar, A. (2016). Data-driven personas: Constructing archetypal users with clickstreams and user telemetry. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI2016), pages 5350–5359. ACM.
