AlertSets: Supporting Exploratory Analysis of Cybersecurity Alerts Through Set Interactions

Franziska Becker¹, Christoph Müller¹, David Karpuk², Tanja Blascheck¹ and Thomas Ertl¹

¹ Institute for Visualization and Interactive Systems (VIS), University of Stuttgart, Germany
² WithSecure Corporation, Finland
Keywords:
Visual Analytics, Cybersecurity, Exploratory Analysis, Sets, Brushing.
Abstract:
Security providers typically deal with large numbers of alerts based on heterogeneous data from many endpoint
sensors. While the number of alerts is generally much smaller than the volume of raw data, most alerts are
false positives that do not reflect genuinely malicious activity. All types of experts work on such alerts, be it to
determine whether they are indeed false positives, to build machine learning models to support their analysis,
or to keep an eye on the current threat landscape. We conducted a design study to support a diverse group
of experts whose working environments are connected to the same alert data. Based on an ongoing industry
project that clusters alerts, we designed and evaluated a visual analytics system which enables exploration via
powerful, easy-to-understand filtering mechanisms framed through set operations. In this article, we describe
our system, give a detailed breakdown of the design process and the lessons we learned. We discuss the results
from expert interviews, which showed that the set-based framing aligns with experts’ intuitive approach to data
analysis and helps users uncover improvement opportunities for the clustering and detection pipelines.
1 INTRODUCTION
Defending the digital infrastructure of companies,
government agencies, and public services against attacks
or espionage attempts has become an increasingly
critical issue in recent years. To deal with the vast
amounts of data collected in such cases, compa-
nies are constantly looking to improve their detection
pipelines and reduce workloads. Working with such
data often requires expertise not easily acquired or
communicated, making cybersecurity experts a scarce
commodity. Analysts who work in security oper-
ations centers (SOC) investigate alerts and decide
whether they refer to malicious activity or are false
positives. However, cybersecurity experts also design
and fine-tune data collection and data processing al-
gorithms that prepare the data for analysts. With the
immense amount of data that can be collected, de-
signing data pipelines that produce data which allows
for the extraction of information that, in turn, leads to
actionable insight is a considerable challenge.
ORCID: Franziska Becker https://orcid.org/0000-0001-6146-7841,
Christoph Müller https://orcid.org/0000-0002-9979-3682,
David Karpuk https://orcid.org/0000-0003-3621-9752,
Tanja Blascheck https://orcid.org/0000-0003-4002-4499,
Thomas Ertl https://orcid.org/0000-0003-4019-2505
Based on an alert clustering project at WithSecure,
we designed a visual analytics system for different ex-
perts to interact with the clustering results. However,
understanding and formulating user needs and tasks
was far from trivial. Analysts in the SOC should be
able to find opportunities to process batches of alerts
at once and judge whether it is safe to do so. Data sci-
entists should instead be able to gain an intuition of
what the clustering is doing, whether the results are
good, and how the model can be improved. Detection
engineers may want to dig deep into the features, rela-
tions between alerts and how the data is interpreted by
the clustering algorithm so that they can improve col-
lection and detection strategies. How to address these
needs was something even our collaborators found
hard to formulate clearly. With limited access to all
potential user groups, finding the right parts of the
data and the right visualizations was markedly tricky.
Which workflow best suits these high-level tasks was
something that only slowly emerged during the de-
sign phase. In our final design, the main focus is to
enable diverse exploration strategies through power-
ful filtering mechanisms. With sets and set operations
as a framing device, our system allows users to inves-
tigate the data from the angle best suited to them. In
this article, we provide contributions in three different
aspects, namely:
Becker, F., Müller, C., Karpuk, D., Blascheck, T. and Ertl, T.
AlertSets: Supporting Exploratory Analysis of Cybersecurity Alerts Through Set Interactions.
DOI: 10.5220/0013376800003912
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2025) - Volume 1: GRAPP, HUCAPP and IVAPP, pages 776-787
ISBN: 978-989-758-728-3; ISSN: 2184-4321
Proceedings Copyright © 2025 by SCITEPRESS Science and Technology Publications, Lda.
- A visual analytics system that employs sets and set operations to allow different types of experts to explore data for their respective tasks.
- An in-depth analysis of system usability and promising usage scenarios via expert interviews.
- Insights and lessons learned throughout the design process, facing challenges like a fuzzy task description, working with data from an in-development industry project, and with limited access to target users.
2 RELATED WORK
Being situated in the domain of cybersecurity and vi-
sualization, we discuss related work on the intersec-
tion of these two topics. In addition, we review pre-
vious works on brushing-and-linking that employ set
operations or logical operations to combine filters.
Visualization for Cybersecurity. According to
three surveys (Komadina et al., 2022; Lavigne and
Gouin, 2014; Jiang et al., 2022) that looked at dif-
ferent areas of visualization for cybersecurity, visual-
izations in such systems span the complete visualiza-
tion toolkit: from basic charts showing statistical val-
ues and pixel-based visualizations to treemaps, node-
link diagrams and parallel coordinate plots. Jiang
et al. (2022) note that many tools also include tables,
and Komadina et al. (2022) show that more straight-
forward visualizations like basic charts are the most
common types of visualizations. Gates and Engle
(2013) list the use of custom visualizations that ex-
perts have a hard time understanding as a common
pitfall. Related to our domain of alert or incident
analysis is the work by Shi et al. (2018), who devel-
oped a radial visualization of alerts that are generated
by intrusion detection systems (IDS) from the 2011
and 2012 VAST challenges. These alerts are based
on network activity, meaning the data is much less
heterogeneous and more structured than alerts in our
case, leading to visualizations that focus more on the
where and when of alerts. The authors note that their
visualizations may face scalability issues with larger
numbers of data points, which makes it harder to
get an overview and do exploratory analysis.
Other works related to cybersecurity alerts in-
clude automatic narrative summaries for incident re-
ports (Gove, 2022), a force-directed graph of net-
work intrusion detection alerts (Hong et al., 2019)
that can show patterns in threats like botnet attacks,
and dashboard applications providing statistics about
alerts (Macedo et al., 2021; Carvalho et al., 2016).
Many related works focus on security analysts as
their target audience, which is sensible for scenar-
ios that do not transcend the boundaries between user
roles in a security context. However, our work con-
siders the needs and opportunities of different user
roles in the same company working with the same
alert data. In that respect, our situation aligns with
two of the five application-agnostic use cases defined
by Gates and Engle (2013): visualization for explo-
ration and visualization as a stepping stone. Regard-
ing the former, only one user role we consider has a
clear question to answer (i. e. if a cluster of alerts can
be treated similarly), whereas the others wish to see
what there is to find in the data. Exploration is conse-
quently key to finding interesting data that could hold
promise for further analysis, which connects to the
second use case of visualization as a stepping stone.
When dealing with ill-formed task descriptions that
can vary between users, visualization can provide the
bridge that brings users to the next point of analy-
sis rather than answering all questions alone. Such
vague tasks can be dealt with by enabling experts to
engage in exploratory search. Exploratory search is a
type of information seeking that can be characterized
by continuous querying and browsing to get a better
understanding of the problem and its possible solu-
tions (Marchionini, 2006; White and Roth, 2009).
Brushing-and-Linking. Brushing itself and
brushing-and-linking are well-known techniques in
the visualization community. In most cases, users can
define a single brush, e.g., in a scatterplot (Becker and
Cleveland, 1987), by directly marking an area of data
items, resulting in a data selection. For linked views,
such a brushing action updates any related visualiza-
tions to indicate the brushed selection (Buja et al.,
1991). Over the years, different types of brushes
have been developed and investigated, though the
simplest form seems to dominate in practical use. In
our system, users can define multiple data subsets by
combining brushes (or filters) using logical (or set)
operations. Similarly, previous works used logical
operations to combine brushes to form complex data
selections. However, designing interactions to create
such complex selections can be challenging.
Early iterations use dedicated interfaces and widgets
to let users manage brushes (Martin and Ward, 1995;
Doleisch et al., 2003), whereas others make use of
graphs (Chen, 2004; Koch et al., 2011) or employ
direct manipulation (Roberts and Wright, 2006).
Dedicated interfaces and graph-like visualizations
often make it easier to create and modify complex
compositions, but can incur additional effort and
disrupt the sensemaking process.
3 SYSTEM DESIGN
The problems our application tackles make it a suit-
able candidate for a design study (Sedlmair et al.,
2012), which is why we opted to follow this method-
ology. Our scenario includes multiple experts, who
do not know precisely how clustering and visualiza-
tion can help solve their problems or whose goals
are fuzzy, as described in section 1. This section de-
scribes the data from the clustering project, how we
process it, which user needs were identified during
the design process and what the final system design
looks like.
3.1 Data Description
Incidents or alerts—terms used interchangeably
throughout this paper—originate from WithSecure’s
analysis pipeline. An automated process sifts through
vast masses of data points from endpoint sensors to
find data that is unusual and worth investigating,
resulting in an alert. Alerts can be understood as doc-
uments of key-value pairs that contain information
about run processes, network connections, console
commands or triggered detection rules. Clustering
algorithms have enjoyed several applications in cy-
bersecurity, for example, to understand the topology
of enterprise networks (Riddle-Workman et al., 2021;
Pavlenko et al., 2022) or to facilitate handling of se-
curity incidents by threat hunters (Silva et al., 2018;
Raj et al., 2020). The clustering pipeline by With-
Secure produces the data ultimately visualized in our
prototype and consists of the following steps: Alerts
are first vectorized in an application-specific way. As
a result, we get very high-dimensional vectors where
each dimension stands for a feature from the collec-
tion of documents, e. g. indicating that Powershell
was executed. Next, a bespoke version of the
DBSTREAM algorithm (Bär et al., 2014), a version of
DBSCAN (Ester et al., 1996) for streaming data, is
used to cluster these vectorized alerts. Millions of
vectors are fed into the clustering algorithm, each
with a dimension of around 400k components, re-
sulting in approximately 20k clusters in our datasets.
While the feature space is large, it is also sparse,
with the maximum number of nonzero features for
a single data point not exceeding 500. This scale
makes the data hard for users to work with, but re-
ducing it requires understanding how features can be
modified—creating a circular dependency of need-
ing to understand the data to reduce its scale, in or-
der to better understand it. As a last step, we cal-
culate a 2D embedding of the cluster centers using
the UMAP algorithm (McInnes et al., 2018), which
is then used to project the alerts. Due to the large
data size, putting all alerts into the dimensionality re-
duction algorithm was not possible. We experimented
with multiple algorithms for the dimensionality re-
duction, which showed the t-SNE algorithm (Van der
Maaten and Hinton, 2008) to be an equally suitable
method, though it usually performs worse in terms of
speed. More information about these experiments can
be found in the supplemental materials at OSF using
https://osf.io/rg5zd/.
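The pipeline above can be sketched roughly as follows. This is an illustrative toy only: the real system uses a bespoke DBSTREAM variant and UMAP, so the stand-ins here (one-hot vectorization and an SVD-based 2D projection, with hypothetical function names) are assumptions, not WithSecure's implementation.

```python
import numpy as np

def vectorize(alerts, vocab):
    """One-hot encode the key=value pairs of each alert document.

    The real feature space is sparse with ~400k dimensions; a dense
    array suffices for this small illustration.
    """
    X = np.zeros((len(alerts), len(vocab)))
    for i, alert in enumerate(alerts):
        for key, value in alert.items():
            feature = f"{key}={value}"
            if feature in vocab:
                X[i, vocab[feature]] = 1.0
    return X

def embed_2d(vectors):
    """Project high-dimensional vectors to 2D via SVD (stand-in for UMAP)."""
    centered = vectors - vectors.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

# Toy alert documents of key-value pairs, as described in Section 3.1.
alerts = [
    {"process": "powershell.exe", "rule": "R1"},
    {"process": "powershell.exe", "rule": "R2"},
    {"process": "cmd.exe", "rule": "R1"},
]
vocab = {f: i for i, f in enumerate(
    sorted({f"{k}={v}" for a in alerts for k, v in a.items()}))}
X = vectorize(alerts, vocab)       # shape: (3 alerts, 4 features)
coords = embed_2d(X)               # 2D positions for the scatter plot
```

In the actual system, the embedding is fitted on cluster centers only and the individual alerts are projected into that space afterwards, because fitting on all alerts was infeasible at this data size.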
3.2 Requirements
Experts voiced a few high-level requirements of what
they wanted to see or do in the application. They ex-
plicitly requested an overview of the complete data
landscape or its most recent snapshot. None of the
different users had this at their disposal in their daily
working lives. Such an overview can provide an idea
of what the “big picture” looks like and simultane-
ously serve as an entry point to finding interesting
subsets of the data to analyze. At the same time,
SOC analysts wanted to drill down to the alert level,
which is the data they are most familiar with and,
consequently, the information they base decisions on.
Finally, as we expect our system to act more as an
intermediate than a final stop, we wanted to include
connection opportunities where users could transfer
data to another tool if they wanted. Since many of our
target users often work in a web browser, using appli-
cations like Kibana, CyberChef or Python Notebooks,
we built our system as a single-page web application.
Apart from these high-level needs, we tried to find
common or essential activities in experts’ problem-
solving workflows. They knew their respective goals,
e. g. propagating resolution labels through clusters to
reduce false positives, but could not clearly articulate
the activities that are vital to reaching their goal. The
complex nature of such problems made precise tasks
or activities hard to pin down. Ultimately, we found
that these vague tasks are likely best served through
exploration, allowing users to learn more about the
data and how to solve their problems whilst interact-
ing with it. To effectively enable exploration of the
data, there must be a means for the user to define and
compare subsets of that data. Exploratory analysis via
comparison is a common strategy in many fields, es-
pecially when the data can naturally be divided into
different groups, e. g. normal and abnormal behavior.
As a baseline, it requires filtering the data into dif-
ferent subsets based on their attributes and analyzing
these subsets. With such functionalities in tow, users
can start a loop of analyzing the current data land-
scape, defining one or more subsets that seem promis-
ing and then going back to studying the updated data
landscape. To make this loop as efficient as possible,
both the action of filtering and the action of analyz-
ing data attributes should be fast and easy to perform.
Going beyond simple subsets, set operations provide a
natural extension to this concept that can increase the
expressive power of the basic filtering interactions it
is based on. Filters are commonly defined as a logi-
cal AND or intersection operation and can only be per-
formed once for any specific feature. Additional set
operations, like union or difference, allow for more
complex selections through chaining.
3.3 Visualization and Interactions
To better illustrate how visualizations and interactions
in our system work, we first consider how the set-
based framing, or brushing with logical operators, is
realized.
3.3.1 Set-Based Framing
Sets in our system are a collection of filters (or
brushes) that define which alerts belong to a set. A
filter consists of the attribute and value(s) it filters
for and the set operation that should be used when
combining it with other filters. Whenever a user de-
fines a new filter in our system—except for the first
one—a popup asks the user to choose the set opera-
tion they want to use for this filter. This prompt de-
picts all available operators (intersection, union, dif-
ference) as icon buttons at the current location of the
mouse cursor (cf. Figure 1). Upon choosing an oper-
ator, the new filter is added to the set and its resulting
alert collection is updated. Filters can be defined for
any alert attribute, including the cluster label, source
organization, and time of recording. Users can de-
fine an arbitrary number of sets, and sets can always
be modified, meaning that filters can be added or re-
moved. In addition, sets themselves can also be com-
bined using set operations. We chose a simple and
intuitive approach to creating and combining filters
for two reasons: First, the overall system complex-
ity should be minimized to reduce adoption barriers.
Second, it should support exploration without inter-
fering with the sensemaking flow.
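The set-as-filter-list idea above can be expressed as a minimal sketch, where each filter carries the set operation used to combine it with the filters before it. Class and function names (`Filter`, `AlertSet`, `resolve`) are illustrative, not taken from the system.

```python
INTERSECT, UNION, DIFFERENCE = "intersect", "union", "difference"

class Filter:
    """A filter stores the attribute and value(s) it filters for,
    plus the set operation used when combining it with prior filters."""
    def __init__(self, attribute, values, op=INTERSECT):
        self.attribute, self.values, self.op = attribute, set(values), op

    def matches(self, alerts):
        return {a["id"] for a in alerts if a.get(self.attribute) in self.values}

class AlertSet:
    """A set is an ordered collection of filters, combined left to right."""
    def __init__(self, filters):
        self.filters = filters

    def resolve(self, alerts):
        if not self.filters:
            return {a["id"] for a in alerts}
        result = self.filters[0].matches(alerts)
        for f in self.filters[1:]:
            ids = f.matches(alerts)
            if f.op == INTERSECT:
                result &= ids
            elif f.op == UNION:
                result |= ids
            else:  # DIFFERENCE
                result -= ids
        return result

alerts = [
    {"id": 1, "cluster": "A", "resolution": "false_positive"},
    {"id": 2, "cluster": "A", "resolution": "malicious"},
    {"id": 3, "cluster": "B", "resolution": "false_positive"},
]
# "All alerts in cluster A, except those already resolved as malicious."
s = AlertSet([
    Filter("cluster", ["A"]),
    Filter("resolution", ["malicious"], op=DIFFERENCE),
])
# s.resolve(alerts) -> {1}
```

Note how the difference operation makes this selection inexpressible with plain AND-combined filters alone, which matches the motivation given in Section 3.2.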
3.3.2 System Components
Our visual analytics system employs a component-
based architecture. Its components are laid out from
top to bottom, with an increasing level of detail go-
ing down, as shown in Figure 2. Framed through the
lens of the famous visual information-seeking mantra
“Overview first, zoom and filter, then details on de-
mand” (Shneiderman, 2003), we describe how our
system lets users explore clusters and their alerts.
Figure 1: Filtering prompt displayed after a filter interac-
tion. The icons denote which set operation will be used
for combining the filter. The red border around the prompt
blinks continuously to direct the user’s attention towards it.
Overview First. The complete dataset is visualized
in two components: the histogram panel ( ) and the
scatter plot ( ). Both serve as overviews and en-
try points into the analysis process. In the histogram
panel, we visualize high-level attributes of the alert
data, such as the resolution label, each in its own his-
togram akin to small multiples. The scatter plot dis-
plays all embedded alert vectors, which are colored
according to one of the attributes in the histogram
panel. Experts may choose which attribute is used
for coloring by clicking on the respective histogram
label. This allows users to quickly gauge the gen-
eral distribution of an attribute, which can be a start-
ing point for deeper analysis. Users can change the
color scale for the scatter plot in the foldable settings
panel ( ) above the scatter plot. It also allows them
to define the dot size and drawing opacity. Both the
histogram panel and the scatter plot provide means to
define or modify a set: the histograms allow for brush-
ing if they show a numerical attribute; otherwise, sin-
gle bars can be clicked to create a filter. In the scatter
plot, experts can click on a single circle to select the
respective alert or select a whole region with the lasso
tool. In addition to showing alerts, the scatter plot has
two additional modes that visualize the clusters. The
first mode keeps the scatter plot as is, but also depicts
information about a data point’s cluster upon hover-
ing: It draws the cluster extent as a polygon and high-
lights all data points belonging to the cluster. Click-
ing on the alert creates a filter based on that alert’s
cluster. The second mode visualizes the active sets
and their cluster polygons, showing overlap between
them (cf. Figure 3). These modes give users an idea
of how clusters are distributed across the embedded
space and how they are connected to specific alerts.
Details on Demand. On the next level, we visual-
ize data about the alerts in any of the defined sets.
First, there is the set info panel ( ). It is a movable
floating window for each set and can be hidden on de-
Figure 2: Overview of the improved system: ( ) histogram panel, ( ) interactive scatter plot, ( ) collapsed settings panel,
( ) advanced selection and set control, ( ) feature set operations, ( ) floating set info panels showing set information,
( ) raw data table, ( ) set feature panel and a ( ) toast that is displayed as long as computationally expensive requests or
operations are being handled. Feature names are pixelated for privacy.
Figure 3: Scatter plot in the second mode, showing colored
polygons for all clusters that belong to the two active sets.
mand. The panel displays general information about
its respective set, like its number of alerts and clus-
ters. In addition, it displays small histograms for the
same attributes as the histogram panel but is limited to
the alerts contained in the set. Lastly, this component
includes action buttons related to the set, such as re-
moving the set or combining it with another set to cre-
ate a new set. A set’s filters are displayed at the bot-
tom of the panel in a list-like fashion. Each list item
shows the attribute that is filtered, from which com-
ponent the filter was created and which set operation
is used for it. Experts may delete a filter with the trash
icon button or copy all filters up to the selected one to
the active set with the copy icon button. Copying fil-
ters allows users to easily create slightly modified ver-
sions of the same set without recreating all its filters.
For the investigation of alert features, we supply users
with the feature operations component ( ), located
right next to the scatter plot. It lets users inspect how
the features of different sets relate to one another by
looking at feature prevalences. Specifically, experts
can choose to which sets they want to apply which set
operations. For example, an expert may want to see
the intersection of features for two sets: The result is
visualized in two bar charts after choosing the appro-
priate sets and set operation via checkboxes. The first
chart shows how many features are in the sets and the
result. The second chart displays the prevalence of
features in the result in a horizontal bar chart. On top
of each bar in the chart, the feature name is shown,
which can be clicked on to create a filter for the re-
spective feature.
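The feature-operations logic described above can be sketched as follows: apply a set operation to the feature sets of two alert sets, then report how prevalent each resulting feature is across the alerts. Function names are hypothetical, chosen only for this illustration.

```python
from collections import Counter

def feature_set(alerts):
    """All features occurring in any alert of the set."""
    return {f for a in alerts for f in a["features"]}

def feature_operation(set_a, set_b, op):
    """Combine the feature sets of two alert sets with a set operation."""
    fa, fb = feature_set(set_a), feature_set(set_b)
    if op == "intersection":
        return fa & fb
    if op == "union":
        return fa | fb
    return fa - fb  # difference

def prevalence(alerts, features):
    """Fraction of alerts in which each selected feature occurs."""
    counts = Counter(f for a in alerts for f in a["features"] if f in features)
    return {f: counts[f] / len(alerts) for f in features}

set_a = [{"features": {"powershell", "net_conn"}},
         {"features": {"powershell"}}]
set_b = [{"features": {"powershell", "reg_write"}}]

common = feature_operation(set_a, set_b, "intersection")  # {"powershell"}
prev = prevalence(set_a + set_b, common)                  # {"powershell": 1.0}
```

The prevalence values correspond to the bar lengths in the second chart of the component, and clicking a bar's feature name would create a `Filter` on that feature.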
Another component that lets users interact with set
data is the raw data table ( ). It is positioned below
the scatter plot and contains an interactive sortable ta-
ble for a set, which can be chosen via an action button
in the set info panel. The table’s columns contain all
the attributes shown in the histogram panel in addi-
tion to the alert ID and respective cluster ID. Each
column can be sorted, and the alert and cluster ID
cells can be clicked to create a filter for the respec-
tive ID. At the bottom of the page, users can find the
final component, the set features panel ( ), where ex-
perts can inspect the feature prevalences in a set and
compare it to the features of individual alerts or clus-
ters. When a set is loaded into the raw data table, it
is also loaded into this component, which shows the
same kind of horizontal bar chart used in the feature
operations component. This bar chart can be sorted
in ascending or descending fashion, filtered through
a text input widget, and lets users create a new fil-
ter based on a feature name. In the raw data table,
experts may select a single alert or cluster with an
icon button pointing downwards toward the set fea-
ture panel. Clicking on such a button highlights the
features that the chosen data has, drawing all other
features with a lower opacity. There are two addi-
tional means of creating filters in the system, one in
the settings panel ( ) and one in the advanced selec-
tion panel ( ). Next to the settings panel is an input
widget where users can enter a feature name to fil-
ter. It includes an autocomplete functionality and also
accepts incomplete strings matching multiple feature
names. The advanced selection panel, which can be
hidden or shown on demand, allows users to enter an
alert ID or cluster ID to filter for. In case an alert ID is
entered, users can also create a filter for its respective
cluster with the click of a button.
3.3.3 Interaction Design
Our system employs common interaction techniques
from the field of information visualization, namely
multiple views and brushing-and-linking. For exam-
ple, brushing a range of bars in one of the histograms
( ) applies a filter to the active selection. This filter
is visualized by reducing the opacity of all histogram
bars outside the brushed region. All data points in
the scatter plot ( ) that belong to a user-defined set
are highlighted, and a polygon is drawn around them.
In addition, the set info panel ( ) is continuously up-
dated to reflect any changes in the active selection. An
overview of the complete data is constantly visible.
At the same time, user-defined sets are visually distin-
guished in the scatter plot. More details are available
in several other components, like the set info panel
( ), the raw data table ( ), the set feature panel ( )
and the feature operations component ( ). Since we
are working with many data points, some operations
or backend requests take longer than a second. In such
cases, our system displays a small toast ( ) for the
duration of the operation, as shown in Figure 2. Upon
completion, the toast is updated to indicate the result
and disappears automatically after a few seconds.
4 EVALUATION
We evaluated our visual analytics system with three
experts working at WithSecure, which this section de-
scribes in detail. The evaluation was carried out on
a previous iteration of our system, which was then
improved based on the evaluation feedback. Specif-
ically, instead of having the different layers in the
scatter plot ( ), we allowed participants to switch be-
tween an alert view and a cluster view. In each view,
the user could see the same components but show only
the data for either the alerts or the cluster centers. An-
other change was the extent of the set-based fram-
ing. Instead of including set operations for each fil-
ter, users could only combine different sets, i.e., data
selections, via set operations. Lastly, instead of the
feature set panel ( ), we showed users a bar chart
of feature prevalences for a single selected alert or
cluster center in combination with a table, which to-
gether constituted the sample info panel ( ). In case
they were in the cluster view, the table showed the ten
closest and ten farthest alerts to the cluster center. If
they were in the alert view, it only showed the selected
sample chosen in the raw data table.
4.1 Methodology
We opted to conduct expert interviews since we tar-
get experts and want to focus on understanding how
experts explore the data and how exploration is me-
diated by our system’s design. Our collaborator re-
cruited three participants who work at different po-
sitions at WithSecure Corporation. One of the par-
ticipants was a senior cybersecurity expert who over-
sees other SOC analysts. He was previously part of
a feedback session, where he saw an earlier iteration
of our system. Another participant, who worked as
a data scientist, had previously worked on the clus-
tering project, though his involvement ended more
than a year before the development of our visual-
ization. The last participant worked at the intersec-
tion of data science and cybersecurity and was not
involved in related projects. Because there are sig-
nificant geographical distances between the authors
and the participants, the interviews were conducted
remotely via Microsoft Teams. We chose to let par-
ticipants interact with the system in a modified ver-
sion of the pair analytics format (Arias-Hernandez
et al., 2011), which was conducted in the following
manner: Our prototype was set up on the machine
of our industry collaborator and co-author of this pa-
per, who shared his screen and performed any requested
actions inside the system. The participant could either
directly specify what they wanted to do or state what
they wanted to see. All participants worked with the
same dataset containing real-world alerts, clustered
using the algorithm our industry collaborators imple-
mented. In addition, two more authors were present
during the interviews and participated in discussions,
helping when it was unclear how to perform a spe-
cific action best and directing the interview to system
parts previously unexplored. We chose this evaluation
method because we wanted participants to explore all
core features of the system, regardless of their visu-
alization expertise and without having to spend time
training to understand all of the system’s capabilities.
All participants received a demo video showcas-
ing the most relevant features of our prototype a few
days before the interview. They were also given a
short form to fill out, which contained questions re-
garding demographic information and their expertise
in cybersecurity and visualization.
The questions and answer options of the question-
naire can be found in the supplemental materials. Ta-
ble 1 shows all participants’ recorded demographic
information and reported expertise. The interviews
lasted 80, 115 and 120 minutes, respectively, and
were recorded as a video. Each interview started
with a short introduction regarding the objective of
the interview, the system’s components and the un-
derlying data. Then, we addressed any questions par-
ticipants had already formed by watching the demo
video. This naturally led to exploring other parts of
the data or other functionalities of the prototype. As
such, participants did not explore the same parts of
the data and experienced the system from their per-
spectives. However, we tried to ensure that each par-
ticipant used all core functionalities at least once. We
continuously asked participants to elaborate on their
thought processes during the interviews while explor-
ing the data. After every core system functionality
had been visited and all of the participant’s questions
had been discussed, we concluded the interview by
asking five questions about their thoughts regarding
our system. Quite frequently, the topics included in
their answers were already touched upon during the
interviews beforehand, as participants naturally com-
mented on what they liked or what they felt was miss-
ing while interacting with the system.
Table 1: Information about participants related to demographic factors and expertise. Participants were asked to report their
expertise in years and rate their familiarity with the topic on a scale from 1 to 5, with the following value labels: none (1),
good (3) and expert (5).
Participant | Age   | Gender | Cybersecurity Expertise    | Visualization Expertise
P1          | 30-45 | male   | 15+ years, rated 3 (good)  | 15+ years, rated 3 (good)
P2          | 30-45 | male   | 4 years, rated 2           | 15 years, rated 3 (good)
P3          | 45+   | male   | 25 years, rated 5 (expert) | 23 years, rated 3 (good)
4.2 Results
This subsection presents participants’ attitudes to-
wards our system by describing the common themes,
ideas and criticisms we found in the interviews. We
discuss the main topics extracted from participants’
responses to questions and comments made during
the interview. Finally, we analyze participants’ ex-
ploration strategies and usage experience.
4.2.1 Questionnaire
After the demonstration of the prototype, we asked
the experts (i) what they liked and (ii) what they dis-
liked about it, (iii) what they thought about the set-
based framing, (iv) how they would use our system
(or a modified version of it) and (v) whether their
view of the data, system or usage opportunities for
either changed after this interview. Overall, partici-
pants viewed the system positively, with one unani-
mous comment being that having an overview via the
scatter plot was a helpful place to start exploring the
data. Other than that, we identified four core themes
throughout their answers.
Integration. Expectedly, all participants mentioned wanting the system to be integrated with other tools inside the company infrastructure that handle the same type of data, i. e. alerts. For example, participant P3 said: “I select these five incidents, and then I want to click a button that opens [their current tool] in new tabs, one for each incident. Then I can jump between the tabs and inspect them.” We expected this issue to
be brought up after having discussed it during previ-
ous meetings in the design process, but it was impos-
sible to resolve in the prototype as WithSecure cannot
share the required details with third parties due to con-
tractual obligations with their customers. Therefore,
we could only resort to displaying unique identifiers,
e. g. for individual alerts, which could eventually be
used to connect to other web-based tools and access
the relevant details.
Missing Data. In a similar vein to the previous topic, participants P1 and P3, both familiar with cybersecurity and the common tasks cybersecurity analysts deal with, mentioned that they felt some data was missing. This included the raw alert data that analysts work on, which is not the same as the raw data available in our system. Alerts that go into our system only have a handful of attributes attached to them, while the remaining attributes, like the specific detection that resulted in the alert or the hierarchy of related processes, are lost to the vectorization process. Interestingly, participant P2 did not mention this aspect. We think this may be connected to another topic we discussed with participant P1, who talked about the two angles from which he thinks this system can be viewed, based on the person who does the viewing: a data-driven and a knowledge-driven angle. For example, participant P2 worked as a senior data scientist and was more interested in understanding the clustering and alerts in terms of the data that is available.
Set-Based Framing. We specifically asked participants for their opinion on the set-based framing, which participant P2 had already voiced as the aspect of our system he liked in the first question. He said that when they work with such data, the operations they would apply would likely also “be based on some kind of set operations to massage the data. And this is just a convenient way of doing it.” For the other participants, these questions seemed harder to answer, with participant P1 explicitly saying: “For me, it’s very difficult to say.” In this context, both P2 and P3 remarked that they would like to be able to hide data they are not interested in. For P2, this was related to not being able to easily see alerts in the scatter plot that have a rare label, such as confirmed, since these are often occluded by the large mass of data points with the label unconfirmed. Participant P3 approached this from a different angle, saying that when investigating an alert, he would want to only see the relevant context for that alert, meaning only alerts from the same organization. Due to the significant differences between organizations, an analyst might be interested in other features or judge them differently based on the organization the alert came from. We understood these comments as participants wanting to extend the set-based filtering to other features, some of which are not part of our data, and also wanting to hide data that they are not interested in visually or that is not a member of any defined set.
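The extended, set-based filtering participants asked for maps naturally onto plain set algebra. As a rough sketch of the idea (the alert schema, field names such as label and organization, and the predicates are our own illustrative assumptions, not the system's actual data model):

```python
# Minimal sketch of set-based alert filtering. The schema below is
# hypothetical: each alert is a dict with "id", "label", and
# "organization" fields.

alerts = [
    {"id": 1, "label": "unconfirmed", "organization": "org-a"},
    {"id": 2, "label": "confirmed",   "organization": "org-a"},
    {"id": 3, "label": "unconfirmed", "organization": "org-b"},
    {"id": 4, "label": "confirmed",   "organization": "org-b"},
]

def define_set(predicate):
    """A set is simply the ids of all alerts matching a predicate."""
    return {a["id"] for a in alerts if predicate(a)}

confirmed = define_set(lambda a: a["label"] == "confirmed")
same_org  = define_set(lambda a: a["organization"] == "org-a")

# Standard set operations combine filters ...
context   = confirmed & same_org   # confirmed alerts from org-a
either    = confirmed | same_org   # matching either condition
remainder = same_org - confirmed   # org-a alerts not yet confirmed

# ... and hiding everything outside the defined sets, as requested,
# is restricting the view to the union of all defined sets.
visible = confirmed | same_org
hidden  = {a["id"] for a in alerts} - visible
```

Framing each filter as a set of alert ids makes combining conditions, and hiding everything outside the defined sets, a single operation, which is essentially the extension participants described.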
Usage Scenarios. The responses to the question about fitting use cases for the system varied, though there was some overlap. Participant P1 said that the best use would be for analysts to save time when investigating an alert. He stated that, provided our tool was integrated into production systems and had access to the missing data, he could see it being used by analysts to increase their productivity by speeding up decisions. Participant P3 agreed that the system, as we presented it, does not contain the information necessary to make decisions about single alerts but that an integrated version might assist him in “false alarm hunting.” He remarked that this would enable detection engineers to use it. Detection engineers maintain “the system”, but were not initially considered to be part of the target audience. Being able to find patterns that identify false alarms takes an immense amount of experience, so any support for that task can be a big help. In addition, he commented that the system “[. . . ] is so much more efficient just for slicing and dicing data than we can do with [their current tool]”, which might enable more people to handle situations where he would otherwise have to step in and help. Participant P1 also mentioned that he could see our system being used for exploratory analysis, to see whether there are new trends, strange behaviors, or even redundant or missing features in the data. This was echoed by participant P2, who voiced that he would use our system to better understand the actual “substance” of the data. During the interview, he also speculated that it might be useful for users who do not yet have much experience with such alert data and could learn about it using our system. These opinions mirror the position of Gates and Engle (2013): in some cases, visualization can best serve its users as a stepping stone.
4.2.2 Exploration Strategies
Although participants did not directly interact with our system themselves, we still observed differences in their exploration strategies based on their instructions. Overall, participants often started by investigating the initial configuration, studying the scatter plot and histogram panel. They usually proceeded by defining a set through interactions with said components and then consulting the attributes of the set in the set info panel. From there, they mostly changed their selection completely or split it into multiple sets. If they decided to investigate sets, they both looked at the raw data and employed set operations to find differences between sets, leading to new or further exploration based on the result. Participants often wanted to inspect the features in a set before using the set operations in the feature operations panel, which was not as easy as they would have liked. Participants tended to spend more time in the view that better matched their expertise and rarely utilized the sample info panel; if it was used, it was mainly in the clustering view. As a consequence of the two latter observations, we replaced the sample info panel with the set feature panel, which lets users inspect features for the complete set as well as single data points therein.
4.2.3 Usage Experience
Looking at the interactions and resulting user experience, we observed that defining sets, comparing them and inspecting their details were easy for participants to understand and gave them a structure for their exploration strategies. They intuitively talked about the data in terms of sets that are defined by specific features or combinations thereof. Our system even helped two participants discover ideas to improve the clustering pipeline or find parts of the data they would like to inspect in more detail. However, there were also several problems, even if users themselves did not necessarily identify them as such. Defining the desired sets sometimes took too many steps, also mentioned as a limitation by participant P2, making it cumbersome and time-consuming. For example, to get three different sets that each include alerts from the same region in the scatter plot but differ in the resolution label, the user has to select the region, select the label and then declare this selection as a new set, each time anew. Analysts also encountered a few situations where they could not create the filter they wanted, like selecting by cluster ID. In addition, not seeing the alerts when looking at the clusters, and vice versa, also hindered users during their exploration.
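The repeated select-and-declare steps could in principle be collapsed into a single split operation. A minimal sketch of that idea, with an invented tuple format for alerts (id, in-region flag, resolution label), since the actual data model is not part of this paper:

```python
from collections import defaultdict

# Hypothetical alerts: (id, inside_selected_region, resolution_label)
alerts = [
    (1, True,  "confirmed"),
    (2, True,  "unconfirmed"),
    (3, True,  "false_positive"),
    (4, False, "confirmed"),
    (5, True,  "unconfirmed"),
]

def split_by(selection, key):
    """Turn one selection into several sets, one per value of `key`."""
    groups = defaultdict(set)
    for alert in selection:
        groups[key(alert)].add(alert[0])  # collect ids per key value
    return dict(groups)

# One region selection, split by resolution label in a single step,
# instead of re-selecting region and label once per desired set.
region = [a for a in alerts if a[1]]
sets_by_label = split_by(region, key=lambda a: a[2])
# e.g. {"confirmed": {1}, "unconfirmed": {2, 5}, "false_positive": {3}}
```

A split operation of this kind would produce all three sets from the example above at once, removing the repeated declare step.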
4.2.4 Improvements
Some usability issues were addressed in a revised it-
eration of our prototype, as described in section 3.
To get preliminary feedback on these changes, one
of our previous participants watched a video show-
casing them—together with five questions in writing.
Both the questions and responses can be found in the
supplemental materials. Overall, our expert viewed
the revision positively, saying that it expands the sys-
tem’s functionality where he sees the highest benefit:
having easier ways of combing through the data.
5 DISCUSSION
We split the discussion of the evaluation results and
what can be learned from them into two groups: those
related to the design of our system and those related
to the overall context in which our work exists, the
sphere of visualization for cybersecurity applications.
5.1 System Functionality and Usability
Analyzing what participants said and how they inter-
acted with the system, the set-based framing seems
to align well with how users think about the data in
the first place and provides an easy mechanism for
them to start exploring. All participants found it an
appropriate means of exploring the data, with one of
them highlighting the concept as what he liked most
about the system as a whole. Framing exploration
through sets and set operations is easy to understand
and simultaneously allows for powerful selection me-
chanics that create visualizations for specific subsets.
In turn, these might show interesting patterns not so
effortlessly found by manually sifting through data.
While all participants were able to explore the data
for their respective focus successfully, we were sur-
prised that they all concentrated so heavily on the res-
olution labels. Throughout discussions in the design
phase, other features oftentimes dominated the con-
versation, e. g. the size of clusters and the number
of unique nonzero features. During the interviews,
participants did not consider these features much and
mainly viewed the data through the lens of resolution
labels at the start of their analysis. This strategy ex-
posed the limits of filtering the data by this feature,
which could quickly become cumbersome. Participant P2 also commented that it was not immediately apparent to him whether an interaction would affect the selection or only highlight data.
In general, we found that the extent to which set
operations could be chained in the system’s previous
iteration was too limited. While it was possible to
combine different conditions, be it the location in the
scatter plot or any of the features in the histograms,
chaining conditions based on the same feature was not
possible in the most straightforward manner. With the
extended filtering functionalities and more opportuni-
ties to create filters, this issue could already be mit-
igated in the revised system version. Based on user
feedback, clusters and alerts should be more closely
connected so that both can be investigated simultane-
ously. A first attempt to tighten this connection was
made in the revised prototype through the layers and
interactions in the scatter plot; more work is needed
to test whether this approach supports users’ workflows. In addition, two practical limitations restrict the utility of our system: missing data and integration. However, these can only be resolved on our industry partner’s side. Lastly, the potential for misinterpretation should also be kept in mind. Using clustering and
dimensionality reduction introduces uncertainty into
the analysis, which can suggest meaning where none
may be found. Distances in the scatter plot may not
correlate with similarity in the data, and clusters may
not provide a fitting grouping of alerts. Hence, users
need to be aware of such potentially confounding fac-
tors. One participant mentioned that he would like
to select alerts in the scatter plot based on distances
in the high-dimensional space. However, this could
introduce a lot of confusion because of discrepancies
between low- and high-dimensional distances and the
unintuitive nature of high-dimensional spaces.
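The gap between low- and high-dimensional distances can be made tangible with synthetic data: projecting random 50-dimensional points to 2-D (here via a plain PCA, standing in for whatever embedding produces the scatter plot) and correlating the pairwise distances before and after shows how weakly plot proximity can reflect actual similarity.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))  # 200 synthetic "alerts", 50 features

# Project to 2-D via PCA (SVD on centered data) as a stand-in for
# the actual embedding behind the scatter plot.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
X2 = Xc @ Vt[:2].T  # coordinates in the first two principal components

def pairwise(points):
    """All pairwise Euclidean distances, as a flat vector."""
    diff = points[:, None, :] - points[None, :, :]
    d = np.sqrt((diff ** 2).sum(-1))
    return d[np.triu_indices(len(points), k=1)]

d_high, d_low = pairwise(X), pairwise(X2)
# Correlation between 50-D and 2-D distances: typically well below 1
# for unstructured data like this, so proximity in the plot is only
# a weak proxy for similarity in the original space.
r = np.corrcoef(d_high, d_low)[0, 1]
```

The low correlation on such unstructured data illustrates exactly the confounding factor discussed above; real alert data with genuine cluster structure would fare better, but the caveat remains.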
5.2 Insights and Lessons Learned
Reflecting on the design process and evaluation of our
visual analytics system for cybersecurity alerts, one
critical lesson emerged: the centrality of collabora-
tion. Initially, our design process adhered to strict,
disconnected user roles. This clear separation mir-
rored the everyday workflows of cybersecurity pro-
fessionals and aligned with the distinct expertise and
responsibilities that define these roles. Analysts, data
scientists, and detection engineers often operate in
separate environments, tackling specific tasks tailored
to their respective skill sets. However, we see un-
tapped potential for collaboration across these roles.
While the division of labor brings efficiency and fo-
cus, it can unintentionally silo expertise, preventing
the exchange of insights that might lead to innovative
solutions. Collaboration, when facilitated effectively,
can help uncover new perspectives on the data, iden-
tify opportunities for system improvement, and ulti-
mately shape better approaches for tackling similar
problems in future projects.
The need for collaboration across roles in cyber-
security is not a new insight. Experts often recog-
nize the value of sharing knowledge but face barri-
ers in practice. A key challenge, as noted by our in-
dustry partner, is the lack of tools designed to fos-
ter meaningful collaboration. Current systems often
do not support users in identifying specific aspects
of the data where their unique expertise could pro-
vide value. As a result, opportunities for improve-
ment are missed because experts are either unaware
of these opportunities or lack the context to engage
with them effectively. This challenge became ev-
ident during one of our interviews. For example,
a participant highlighted adaptations to the feature modeling, such as removing some features originating from PowerShell scripts, that could improve the clustering pipeline. The same expert also speculated about potential refinements to detection rules for identifying malicious activity. These ideas did not surface earlier, despite security experts already looking at the data. We suspect this is because the tools at their disposal did not offer this particular angle on the data. Looking back at the design process,
having different user groups interact with the data
and prototypes in a collaborative setting could also
have sped up and improved the development of our
system. Of course, getting a hold of experts, espe-
cially in the cybersecurity domain, is a well-known
problem (Adams and Snider, 2018; D’Amico et al.,
2016). However, designing visualizations with different user groups in mind could present an approach to improving operational effectiveness through easier collaboration. Additionally, companies may even find use in applying the design study methodology to find opportunities for workflow improvements or for the collection and processing of data, even if it does not result in an application that is integrated into production.
5.3 Future Work
We see two main avenues for future work. First, there
are still many ways in which our system design can
be improved. Although we already added some of
these and received positive feedback, the set fram-
ing could be extended through provenance features,
making it easier to track different sets and filters over
time. Asynchronous insight externalization and an-
notation could help different experts exchange and
store their insights. In this manner, insights and the
data they are tied to are more easily exchanged, and
the system can be evaluated in terms of the insights
it supports best. The second avenue goes in a dif-
ferent direction, as suggested by a participant during
the interviews. He explained that he would be in-
terested in manipulating both the clustering and the
embedding via interactions in our application. This
idea goes in the direction of interactive machine
learning (Fails and Olsen, 2003) and human-machine
teaming (Wenskovitch et al., 2021), where users work
together with machine learning algorithms, providing
feedback that incorporates their knowledge into the
algorithm, which in turn can show patterns in the data
not visible beforehand. While this is an interesting
idea, the scale of the data makes an interactive system
hard to implement; the clustering model used in this work already takes a few hours to compute.
6 CONCLUSION
We designed and evaluated a visual analytics system
for different expert users in a cybersecurity company
to explore high-dimensional clustered alerts. The data
we visualize comes directly from an ongoing project
from an industry partner, with the initial goal of im-
proving their handling of alerts, especially false pos-
itives. Due to fuzzy task descriptions, our design fo-
cuses on exploration through set operations, letting
users easily find and analyze interesting subsets of the
data. Using this framing, we provide experts with a
simple and familiar construct to explore their massive
and sometimes hard-to-understand data. The evaluation showed that different users could employ the system to work on various tasks, even if it served more as a stepping stone than an all-around solution. Participants were able to identify directions for
further analysis and found potential means of improv-
ing the clustering pipeline and detection rules. We
described the design process and the lessons learned
from the development and evaluation. In our case,
creating a generic but powerful system for exploration
allowed different experts to contribute their unique in-
sights and find directions for further investigation.
ACKNOWLEDGEMENTS
Tanja Blascheck is funded by the European Social Fund and the Ministry of Science, Research and Arts Baden-Württemberg.
REFERENCES
Adams, C. N. and Snider, D. (2018). Effective data visual-
ization in cybersecurity. In SoutheastCon, pages 1–8.
Arias-Hernandez, R., Kaastra, L. T., Green, T. M., and
Fisher, B. (2011). Pair analytics: Capturing reasoning
processes in collaborative visual analytics. In Hawaii
Int. Conf. Sys. Sci., pages 1–10.
Becker, R. A. and Cleveland, W. S. (1987). Brushing scat-
terplots. Technometrics, 29(2):127–142.
Buja, A., McDonald, J. A., Michalak, J., and Stuetzle, W.
(1991). Interactive data visualization using focusing
and linking. In Proc. 2nd Conf. Vis. ’91, pages 156–
163.
Bär, A., Casas, P., Golab, L., and Finamore, A. (2014). DBStream: An online aggregation, filtering and processing system for network traffic monitoring. In Int. Wireless Comm. Mob. Comp. Conf., pages 611–616.
Carvalho, V. S., Polidoro, M. J., and Magalhães, J. P. (2016). OwlSight: Platform for real-time detection and visualization of cyber threats. In IEEE 2nd Int. Conf. Big Data Sec. Cloud (BigDataSecurity), IEEE Int. Conf. High Perf. Smart Comp. (HPSC), and IEEE Int. Conf. on Int. Data Sec. (IDS), pages 61–66.
Chen, H. (2004). Compound brushing explained. Informa-
tion Visualization, 3(2):96–108.
D’Amico, A., Buchanan, L., Kirkpatrick, D., and Walczak,
P. (2016). Cyber operator perspectives on security vi-
sualization. In Nicholson, D., editor, Advances in Hu-
man Factors in Cybersecurity, pages 69–81, Cham.
Springer Int. Publishing.
Doleisch, H., Gasser, M., and Hauser, H. (2003). Inter-
active feature specification for focus+context visual-
ization of complex simulation data. In VisSym, pages
239–302. The Eurographics Association.
Ester, M., Kriegel, H.-P., Sander, J., and Xu, X. (1996).
A density-based algorithm for discovering clusters in
large spatial databases with noise. In Proc. Int. Conf.
Know. Disc. Data Mining, KDD’96, pages 226–231.
AAAI Press.
Fails, J. A. and Olsen, D. R. (2003). Interactive machine
learning. In Proc. of the Int. Conf. on Intelligent User
Interfaces, pages 39–45.
Gates, C. and Engle, S. (2013). Reflecting on visualization
for cyber security. In IEEE Int. Conf. on Intelligence
and Security Informatics, pages 275–277.
Gove, R. (2022). Automatic narrative summarization for
visualizing cyber security logs and incident reports.
IEEE Trans. Vis. Comp. Graph., 28(1):1182–1190.
Hong, J., Lee, J., Lee, H., Chang, Y., Choi, K., and Cha,
S. K. (2019). AlertVision: Visualizing security alerts.
In Kang, B. B. and Jang, J., editors, Information Secu-
rity Applications, pages 173–184, Cham. Springer Int.
Publishing.
Jiang, L., Jayatilaka, A., Nasim, M., Grobler, M., Zahedi,
M., and Babar, M. A. (2022). Systematic literature
review on cyber situational awareness visualizations.
IEEE Access, 10:57525–57554.
Koch, S., Bosch, H., Giereth, M., and Ertl, T. (2011). It-
erative integration of visual insights during scalable
patent search and analysis. IEEE Trans. Vis. Comp.
Graph., 17(5):557–569.
Komadina, A., Mihajlović, Ž., and Groš, S. (2022). Analysis of the design space for cybersecurity visualizations in VizSec. In IEEE Symp. Vis. Cyber Sec., pages 1–11.
Lavigne, V. and Gouin, D. (2014). Visual analytics for cy-
ber security and intelligence. The Journal of Defense
Modeling and Simulation, 11(2):175–199.
Macedo, I., Wanous, S., Oliveira, N., Sousa, O., and Praça, I. (2021). A tool to support the investigation and visualization of cyber and/or physical incidents. In Trends Appl. Inf. Sys. Tech., pages 130–140, Cham. Springer Int. Publishing.
Marchionini, G. (2006). Exploratory search: From finding
to understanding. Commun. ACM, 49(4):41–46.
Martin, A. R. and Ward, M. O. (1995). High dimen-
sional brushing for interactive exploration of multi-
variate data. In Visualization ’95, pages 271–278, Los
Alamitos. IEEE Computer Society Press.
McInnes, L., Healy, J., Saul, N., and Großberger, L. (2018).
UMAP: uniform manifold approximation and projec-
tion. Journal of Open Source Software, 3(29):861.
Pavlenko, E. Y., Eremenko, I. S., and Fatin, A. D. (2022).
Computer network clustering methods in cybersecu-
rity problems. Automatic Control and Computer Sci-
ences, 56(8):957–963.
Raj, C., Khular, L., and Raj, G. (2020). Clustering based in-
cident handling for anomaly detection in cloud infras-
tructures. In Int. Conf. Cloud Comp., Data Science &
Engineering, pages 611–616.
Riddle-Workman, E., Evangelou, M., and Adams, N. M.
(2021). Multi-type relational clustering for enterprise
cyber-security networks. Pattern Recognition Letters,
149:172–178.
Roberts, J. C. and Wright, M. (2006). Towards ubiquitous
brushing for information visualization. In Banissi, E.,
editor, Information Visualization, pages 151–156, Los
Alamitos, Calif. IEEE Computer Society.
Sedlmair, M., Meyer, M., and Munzner, T. (2012). Design
study methodology: Reflections from the trenches
and the stacks. IEEE Trans. Vis. Comp. Graph.,
18(12):2431–2440.
Shi, Y., Zhao, Y., Zhou, F., Shi, R., Zhang, Y., and Wang,
G. (2018). A novel radial visualization of intrusion
detection alerts. IEEE Comp. Graph. App., 38(6):83–
95.
Shneiderman, B. (2003). The eyes have it: A task by data
type taxonomy for information visualizations. In The
Craft of Information Visualization, pages 364–371.
Elsevier.
Silva, D., Dell’Amico, M., Hart, M., Roundy, K. A., and
Kats, D. (2018). Hierarchical incident clustering for
security operation centers. In Proc. of KDD Workshop
on Interactive Data Exploration and Analytics. ACM.
Van der Maaten, L. and Hinton, G. (2008). Visualizing data
using t-SNE. Journal of Machine Learning Research,
9(11).
Wenskovitch, J., Fallon, C., Miller, K., and Dasgupta, A.
(2021). Beyond visual analytics: Human-machine
teaming for ai-driven data sensemaking. In IEEE
Workshop on TRust and EXpertise in Visual Analyt-
ics, pages 40–44.
White, R. W. and Roth, R. A. (2009). Exploratory search:
Beyond the query-response paradigm, volume 1. Mor-
gan & Claypool Publishers.