Development of a Procedure for the Processing of Raw Sensor Data

from Smart Devices for Utilisation in Process Mining

Rebecca Bulander

, Bernhard Kölmel

and Marcel Julian Rath

IoS³ - Institut für Smart Systems und Services, Pforzheim University, Tiefenbronner Straße 65, 75175 Pforzheim, Germany

Keywords: Process Mining, Smart Devices, ETL (Extract, Transform, and Load), Microsoft Power Query, Phyphox, Data

Extraction, Data Transformation.

Abstract: This research paper proposes a novel approach to address the blind spot in process mining. The method

involves collecting raw sensor data using a smartphone application and transforming it into an event log using

Microsoft Power Query. The experimental process is then discovered using process mining to reduce the blind

spot in process mining. The study demonstrates the potential of process mining in previously unexplored areas

and shows how enterprises that could not previously benefit from process mining can now do so. The paper

concludes by highlighting the importance of this research in bridging the gap in process mining and making

it accessible to a wider range of businesses.

1 INTRODUCTION

Businesses face a variety of challenges in the current

economic environment. The challenges that

enterprises currently face depend on numerous

factors, such as the industry in which a company

operates, its geographical location, the competition,

and the general economic situation. However, almost

all companies are affected and need to be able to face

the current challenges that have arisen from the yet

ongoing COVID-19 pandemic, including advancing

digitalization, increasing demands for sustainability

and environmental protection, wars and conflicts, as

well as increasingly stressed supply chains and rising

competition. These challenges, as well as their scale

and scope, are pushing companies that want to be

successful in the current and future world of business

to strive for agility, efficiency, resilience, and

adaptability, among other things.

To meet these requirements, companies are

increasingly relying on the extraction and processing

of information relating to their business processes.

Process mining (PM), as a combination of the two

disciplines of process science and data science,

combines the advantages of both and creates potential

https://orcid.org/0000-0002-6841-6577

https://orcid.org/0000-0002-1220-5922

for companies to meet the aforementioned challenges

(van der Aalst, 2016, p. 15 f.).

However, there is a blind spot in the discipline of

process mining due to the fact that not all process data

is or can be collected. This concerns mostly manual

processes which are not or not fully connected to IT-

systems and thus there is no data, or it is not and

cannot be used for process mining. These processes

also include manual processes that cannot be

automated and are spatially distributed, which cannot

or can only with difficulty be embedded in already

existing IT systems. This blind spot provides

potential for the further development of process

mining and for companies to increase their

performance and competitiveness.

The central question of this paper is therefore:

what possibilities are there to make such processes,

which cannot yet be taken into consideration by

process mining, visible for process mining? As well

as what methods and procedures are necessary for

this. This paper focuses on processes that include at

least some manually executed activities, where the

individual process steps or instances take place

spatially distributed at fixed positions and have not

yet been connected to IT-systems.

Bulander, R., KÃ˝ulmel, B. and Rath, M.

Development of a Procedure for the Processing of Raw Sensor Data from Smart Devices for Utilisation in Process Mining.

DOI: 10.5220/0012122700003552

In Proceedings of the 20th International Conference on Smart Business Technologies (ICSBT 2023), pages 193-202

ISBN: 978-989-758-667-5; ISSN: 2184-772X

 2023 by SCITEPRESS – Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)

193

To answer this question, the paper will review the

background of process mining, sensor data collection

via smart devices, extract, transform and load

processes (ETL), and Microsoft Excel Power Query,

as well as the methodologies of data collection

experiment, data preparation and transformation, and

process discovery in process mining software

application.

2 AIMS AND OBJECTIVES

The aim of this research is to explore techniques for

generating and analyzing data on a process that is not

currently mapped and analyzed through process

mining techniques, particularly for small or medium-

sized enterprises (SMEs) that may not be fully

integrated into Industry 4.0.

The process may consist of manually executed

activities that take place in spatially distributed fixed

positions, making it difficult to capture using

traditional methods.

Additionally, the research aims to transform the

generated raw sensor data into a format suitable for

process mining and to demonstrate the effectiveness

of this transformation process.

Ultimately, the objective is to map the modeled

exemplary process in a process mining software

application, confirming the feasibility and validity of

the proposed approach. To achieve these objectives,

the research will generate data on a modeled process,

develop a procedure for transforming and processing

the data into an event log, and map the process in a

process mining software application.

The aim of this paper is to show that the

exemplary process can be modelled and reproduced

within PM from the sensor data generated during the

execution of a process within the production of a

company.

For this purpose, the recorded sensor data must be

transformed and processed into an event log for use

in process mining. The method developed for this

purpose is shown in this paper. The modelled

example process is to be mapped in a process mining

software application. Thus, the possibility as well as

the correctness of the above mentioned data

generation and transformation shall be demonstrated.

3 BACKGROUNDS

This paper covers three main topics: process mining,

smart devices, and extract, transform, and load (ETL).

A fundamental understanding of these topics is

crucial for comprehending the methods used and the

results presented in the subsequent chapters.

However, it is not necessary to have an in-depth

knowledge of the entire spectrum of these topics.

Such a detailed understanding would exceed the

scope of this paper and would not provide any added

value to the research, as there is already a plethora of

literature available on these subjects.

In the following sections of this chapter, we

provide a brief overview of the essential concepts and

principles for each topic to aid in understanding the

research work.

3.1 Process Mining

The objective of process mining (PM) is typically to

derive knowledge and actions from process event

data. Through diverse systems and equipment,

including Data on process events are recorded and

saved using enterprise resource planning systems or

sensor technology in a networked production

(Brzychczy, Gackowiec, and Liebetrau, 2020, p. 2).

Process mining utilises these data.

With the aid of this event data, processes can be

studied, modeled, verified, analyzed, and enhanced

(van der Aalst, 2016, p. 2).

The process mining discipline is regarded as the

link between the superordinate disciplines of process

science and data science (van der Aalst, 2016, p. 15).

PM incorporates approaches from data science

and process science. Advantages can be realised by

employing both model-based process analysis

methodologies and data-centric approaches. (van der

Aalst, 2016, p. 17)

The purpose of process mining is to uncover,

monitor, and enhance actual processes (de Leoni, van

der Aalst, and Dees, 2015).

Many business procedures leave digital footprints

or event data.

These traces, in the form of information regarding

real-world processes, are collected in various storage

systems. This data can then be used to generate event

logs. Process mining presupposes the existence of an

event log in which each entry corresponds to a case,

an activity, and a time stamp. An event log is a

collection of cases, while a case is a trace or sequence

of events. The information included within an event

log can subsequently be utilised by various process

mining tools to discover, analyze, and optimise the

actual real world process. Among other things, it is

possible to design a process model. By drawing

conclusions regarding process bottlenecks, for

example, the process might be modified or enhanced.

ICSBT 2023 - 20th International Conference on Smart Business Technologies

194

In the actual world, the modified process continues to

leave traces, which can be added to the event logs,

which form the basis of process mining. Thus,

process mining enables the drawing of further

conclusions. Consequently, process mining enables

the ongoing discovery, monitoring, analysis, and

modification of processes (van der Aalst, 2016, p.

32).

Hence, the aforementioned event logs are the

basis upon which process mining is constructed. The

spectrum of methods and formats for recording

process data includes databases, individual tables in

Excel format, and text files. Collecting, linking, and

preparing dispersed process data is necessary. This

prepared data constitutes an event log in which all

pertinent event data for the investigated process is

compiled. (cf. Brzychczy, Gackowiec, and Liebetrau,

2020, p. 2).

One can map a great deal of information regarding

a process. This information may include staff

identification, customer numbers, etc., depending on

the process and data scenario. Important is the

structure of the individual event logs. At least four

distinct categories of information are required for

process mining (Bauer et al., 2014, p. 6).

Each event log must include the following four

categories of information:

The Case-ID serves as an identification number

for the case. This case-id represents the particular

process instance. Each instance of a process has its

own unique identification number. This makes it

feasible to not only differentiate between distinct

process instances, but also to compare them (Peters

and Nauroth, 2019, page 15).

The second piece of crucial information in an

event log is the Event-ID. Likewise to the Case-ID is

a unique identifier. Nevertheless, each entry (row) in

an Event-Log, and therefore each event, has its own

unique number. Its primary purpose is to determine

the order of the events. With the Event-ID, activities

can be identified and appropriately categorised if, for

example, there are repetitions of activities inside a

process instance or if activities occur at the same

time. (van der Aalst, 2016, p. 37 ff.)

In addition, each event log must contain

timestamps for each of the mentioned events or

activities (Bauer et al., 2014, p. 6 ff.). These

timestamps facilitate Identify and display process

data such as throughput times and wait periods.

The activity description is the fourth fundamental

piece of information that must be present in event

logs. This is used to develop process models in

process mining, among other things.

3.2 Collection of Sensor Data Through

Smart Devices

This section discusses the collecting of sensor data

using intelligent devices. It is first established what is

meant by the phrase "smart devices" and how these

devices are distinguished from others. The phrase

"smart devices" will also be defined within the scope

of this work. In addition, the methods employed to

create and record sensor data for the purpose of this

work will be presented.

3.2.1 Definition of Smart-Devices

As a result of ongoing technological advancements

and the growing importance and application of the

Internet of Things (IoT), an increasing number of

things can be categorised as smart devices. The

Internet of Things is a network of interconnected

items. These may range from simple sensors to

smartphones. (Silverio Fernández, Renukappa, and

Suresh, 2018).

For an object or device to be recognised as a smart

device, it must possess three primary qualities

(Fernández, Renukappa, and Suresh, 2018, p. 6).

The first requirement is context awareness. The

object must be able to gather data from its

surroundings. This is accomplished, for instance, by

the use of sensors such as cameras, accelerometers,

gyroscopes, and GPS.

Second, the connection property must be satisfied.

The items or devices must be able to connect and

communicate with one another or with other systems.

It is irrelevant which network connectivity is possible

on. It is essential that this possibility even exists.

Smart gadgets, such as smartphones, frequently

utilise network access primarily in the form of an

Internet connection.

Autonomy is the third and final essential property

an object must possess in order to be deemed a smart

gadget. The devices or things must be able to act

alone or autonomously (Silverio Fernández,

Renukappa, and Suresh, 2018, p. 4–9).

As said, numerous devices and items qualify as

smart devices. Due to the scope of this study, only a

subset of these devices will be discussed. These

intelligent devices are cell phones (smartphones).

3.2.2 Acquisition of Sensor Data by Means

of Smart Devices

Many people own a smart device in the form of a

smartphones. These smartphones usually have a large

number of built-in sensors. The most common

Development of a Procedure for the Processing of Raw Sensor Data from Smart Devices for Utilisation in Process Mining

195

sensors found in smartphones are microphones,

cameras, accelerometers, gyroscopes,

magnetometers, pressure sensors, temperature

sensors, proximity sensors, light sensors, and

humidity sensors (Sztyler et al., 2015).

The multitude of information from these different

sensors is mostly processed within the background of

the smartphone (Sztyler et al., 2015). The user of such

a smart mobile phone will most commonly not see

this gathered sensor data. Instead, the data is mostly

used in order to maintain and assure the

functionalities of the device. The data is used, for

example, to recognise whether the smartphone is

being held to an ear. With this recognition, the device

is able to shut off the touchscreen, which in turn will

keep the user from performing unwanted actions.

A number of freely available applications give the

user of such a smart device the ability to see as well

as collect data generated by the in-built sensors.

One application like this is the Phyphox-App.

This app was created by the 2nd Institute of

Physics at RWTH Aachen University. The

application makes it possible to experiment with the

sensor technology built into the smartphone.

In addition to executing prefabricated

experiments for various applications, the experiment

editor allows one to create and execute ones own

experiments. The user is restricted solely by the

device's hardware (Rheinisch-Westfälische

Technische Hochschule (RWTH) Aachen and Staaks,

n.d.).

As stated in earlier chapters and sections, it is

possible to analyse and enhance previously

unmapped or incompletely mapped processes,

particularly tangible production processes within

small and medium-sized enterprises (SMEs), by

generating process data from sensors using process

mining (van der Aalst et al., 2012).

The overall aim of this research is to show the

possibility of such manual, locally bound processes

with which process data can be generated and

processed in such a way that it can be used in process

mining. The use of smartphones in this context is thus

identified as a possibility for processing data. Many

People or for that manner SMEs have smartphones at

their disposal. Thus, the idea arises to use these smart

devices and their sensory systems for the generation

of process data instead of trying to (if even possible)

integrate other sensory systems, often resulting in

high cost. By means of a smartphone and the

application Phyphox, it is possible to generate sensor

data on certain processes that are not otherwise

available due to various circumstances.

By generating this previously unavailable data, it

is possible to also consider these manual, locally

bound processes in process mining and hence reduce

the aforementioned blind spot of process mining and

generate added value for companies by analyzing and

improving the processes that could not be mapped

before (van der Aalst et al., 2012).

3.3 Extract Transform & Load (ETL)

The ETL process is often found in the context of the

two topics of business intelligence and big data or

data science and is thus also closely connected with

process mining. This is not surprising, as the ETL

process is used to extract large amounts of data from

different sources, process them, and then transfer or

load them in the required format into data

warehouses, databases, or other designated data

stores (Li, Kuang, and Liu, 2016).

The ETL process is an integral part of business

intelligence. The topic of business intelligence

describes procedures and methods for gaining

knowledge about aspects and facts within companies

(Dedi and Stainer, 2016, p. 225 ff.).

In addition to its use in process mining, the ETL

process performs essential duties. Typically,

unstructured, or partially structured data serve as the

beginning point in this context. Even in process

mining, the data foundation determines the outcome.

Additionally, the ETL process is used to extract

process data from various sources. The data is utilised

to prepare and transfer extracted data into systems for

further processing (Diba et al., 2019).

The ETL process consists of the three successive

phases of extraction, transformation, and loading. The

extraction phase includes the extraction and

sometimes combination of data in its raw form, which

is necessary for the subsequent acquisition of

knowledge. The sources from which data is extracted

can be very different in nature. The ETL applications

on the market can extract the required data from a

variety of different data sources. Due to the large

selection of data sources and their differences from

each other, the data is usually recorded in many

different types and formats (Du, Ye, and Wang,

2013).

In the subsequent transformation phase, the

extracted data is transformed or prepared in a way that

makes it usable for whatever purpose it is destined to

serve.

After the data is transformed, it can then be loaded

into a number of destinations, for example,

spreadsheets or databases.

ICSBT 2023 - 20th International Conference on Smart Business Technologies

196

3.4 Microsoft Excel Power Query

Microsoft Power Query is an application for

executing ETL processes. Power Query is an add-in

that is available in some Microsoft products. Thus, it

is, for example, included in the Microsoft Excel

spreadsheet programme within the Microsoft Office

365 package. Microsoft Power Query is also included

in Microsoft's business intelligence programme

Microsoft Power BI. Like other ETL applications,

Power Query creates queries with which data is

automatically transformed, modified, or prepared

before it is loaded into the target system. The ETL

process, once set up, can run automatically in the

background. For this purpose, you can specify

whether and at what intervals the created queries are

to be updated.

Microsoft Excel is one of the most popular

applications within companies in the field of data

preparation and transformation. There are many

reasons for this.

On the one hand, Excel has existed since the end

of 1985 and is therefore well known and established

in the market. On the other hand, Microsoft has

continuously expanded, adapted, and improved Excel

since its creation. It offers a wide range of functions

and activities that can be carried out in it (Bahr 2019).

The ETL solution Power Query is used for the

methodical procedure described in the following

chapter with regard to the data preparation of the

collected event data from smart devices. This is

justified by the focus and objective of this research.

The paper focuses on the mapping of physical, locally

bound processes in process mining. These types of

processes are often, but not exclusively, found in the

everyday business of small and medium-sized

enterprises (SMEs). A procedure is developed with

which previously unmappable processes can be

discovered through process mining. The main focus

is that this procedure be as simple, fast, and efficient

as possible. This is to be achieved by using existing

or known tools and methods. Excel Power Query has

already been in use in most companies for a long time,

and the users already have an overview of the

programme. Moreover, in this regard, no new ETL

application needs to be implemented or learned.

4 METHODOLOGIES

This paper pursues the goals formulated in the

motivation and objectives section in order to develop

a procedure for processing sensor data from smart

devices for use in process mining. The three

objectives (data acquisition (via smartphone), data

preparation (via ETL/Power Query) and process

discovery in PM (via Celonis PM-application) build

on each other. Without existing process data no data

preparation is possible and without data preparation

no process discovery is possible. Therefore, a suitable

data source must first be found, and the data obtained

from it processed. Only after the data has been

collected can it be transformed to turn the collected

information about a process into insights that can be

used for development and improvement, ultimately

adding value to an organisation.

The following chapter will give an insight into the

methods utilised, the approaches taken, and the

outcomes reached.

The chapter begins by explaining the

experimental approach to collecting raw process

sensor data.

An overview of the approach taken to transform

the raw sensor data into an event log using ETL

techniques is also given. Since the steps taken within

the ETL-process of this research cannot be presented

and described in detail and in their entirety due to the

limited extent of the type of publication at hand, a

broad overview supported by certain examples will be

provided.

Finally, the chapter presents the discovery of the

experimental process within a process mining

software application, which also serves as a test of the

success of the data transformation approach

previously performed.

4.1 Data Collection Experiment

To be able to create an event log that meets the

requirements of process mining, the data base must

already be of high quality. However, it is more

important to be able to generate information and data

about a process at all. In order to reduce the blind spot

in process mining, there is the potential to make

processes that could not previously be handled with

PM due to a lack of process data and information

visible for PM by using smart devices. For this

purpose, we first put ourselves in the position of an

SME that carries out processes within its production,

some of which are manual in nature and

geographically separated from one another.

These processes are not or are not completely

connected to the IT infrastructure of the company.

Therefore, no data is collected, which in turn is

needed to better understand and improve the

processes through PM.

To obtain this information, the company could go

several ways. For example, it could aim to link

Development of a Procedure for the Processing of Raw Sensor Data from Smart Devices for Utilisation in Process Mining

197

different sensors or entire systems to the processes.

However, this is a major undertaking that is both

time-consuming and cost intensive. If we add to this

the fact that the processes sometimes have small

batch sizes, it becomes obvious that a cost-benefit

analysis would not be directly in favour of the

benefits.

The idea arises that smartphones (smart devices)

also have built-in sensors and can thus draw

conclusions about their environment and what is

happening around them through their context

awareness. The Phyphox application makes it

possible to record and export the raw sensor data of

various smart devices.

To generate the required data, the Phyphox

application is used in conjunction with a smart device.

In order to generate raw sensor data that comes as

close as possible to reality, a model production

process that can also be found in an SME is

developed, and then data and information are

collected using smart devices. The experiment for

data collection thus consists of a model process for

which the required process data can be collected by

means of a smartphone. This modelled process, which

will be described later, has been chosen to best model

and mimic processes that can be and are often found

in small and medium sized enterprises. An example

might be a small metalworking company where, for

example, a workpiece must first be retrieved from its

storage location, then transported to a second location

where it is, for example, cut, then transported to a

third location where it is, for example, finished, and

finally transported to a fourth location within the

company premises to be packed and shipped.

In many real-world scenarios, these instances of

processing—or workstations in this experimental

model—that a workpiece has to pass through are

located in fixed areas and are spatially distributed. In

order to collect data along the whole span of the

process, the smart device needs to follow the

workpiece. Thus, the smartphone will accompany the

workpiece to be processed through all process stages.

This will be achieved by equipping a load carrier box

used for the transportation of the workpiece. This

way, the device will be able to experience similar

environmental influences as the workpiece, from

which process instances or activities can be derived.

4.1.1 Description of Experimental Set-up

To collect process data, a process to be investigated

must first be defined. This process for data collection,

which is a model in the context of the objective of this

work, represents the object of investigation. The

experimental process consists of two types of

activities. The first of these types are transport

activities, and the second are activities in which a

workpiece is processed. The process is shown in

Figure 1 as a BPMN process model.

The experimental process is defined and

modelled. Now it can be transferred to the real world,

and data can be collected about it.

To carry out the experiment, a smart device in the

form of a smartphone that has downloaded and

installed the Phyphox application is needed.

In addition, a load carrier is needed in which the

smart device can follow the process and record data.

4.1.2 Description of the Experiment

To carry out the experimental process and collect

data, an open space (open field) was found. Within

this open space, the premises for the production of an

enterprise were simulated. For this, the four

workstations of the defined experimental production

process were marked in the open space. It does not

matter where exactly the workstations are located, but

when carrying out further experiments (repeating the

process), it must be ensured that the stations are

located in the same places as before, otherwise the

data will not match. This is one aspect to be

considered in terms of data quality.

Now, after setting up simulated workstation, the

Phyphox application can be opened on the smart

device. Within the application, one can set up

different data collection experiments. For these, all

the sensors built into the device are available.

For this particular experimental process, two of

the available sensory systems were chosen to deliver

Figure 1: BPMN-process-model of the experimental process.

Source: Own illustration created with Bizagi-Modeller.

ICSBT 2023 - 20th International Conference on Smart Business Technologies

198

a degree of information efficient enough to draw the

necessary conclusions. These sensory systems were

the location sensory system and the accelerometer

sensory system. By combining location data with data

about acceleration, it can be determined what exact

process instance the workpiece passes through at a

certain point in time.

After setting up the Phyphox Experiment, the

device is stored within the same load carrier as the

workpiece. The experiment is started at the same time

as the exemplary process is started, the process is run,

and at the end of it, the experiment on the app is

stopped and the accumulated data is saved within a

cloud-based storage system. This has been repeated

twelve times. In some runs, the correct order of

instances is followed; in others, instances are

repeated, left out, carried out in a different order, or

all the above combined. This is done to assure that the

data set replicates not only ideal processes but also

incorporates divergences or variances, as the

discovery of these discrepancies between the ideal

process and the way it actually takes place in the real

world is the basis of the value added by process

mining.

After carrying out the experiment several times,

enough data of adequate quality was obtained so that

the data was then ready to be transformed.

4.2 Data Preparation and

Transformation

By fulfilling the goal of obtaining process data from

sensors in smart devices, the results of this procedure,

raw sensor data, can now be further processed and

converted into event logs.

This generated raw sensor data forms the

foundation of the following procedures. It is available

in a certain format. How the data is available

determines the steps that must be taken to convert the

generated data into a usable event log.

The data sets created are saved either as Excel

files or as Text files (CSV) all have the same structure

and are all saved in a common folder within a cloud

storage system. The Excel files each consist of four

sheets. The first sheet of the Excel workbook is

labelled Linear Accelerometer. This sheet contains all

the data collected by the linear accelerometer.

The next sheet, entitled Location, contains all the

data recorded by the GPS, including time stamps,

longitudes, and latitudes, which are required for the

conversion to the event log, but also data on altitude

above sea level, which are not relevant in the context

of this work and are therefore redundant.

Furthermore, the metadata device sheet is created,

which contains information about the terminal device

with which the data was recorded. Such as the product

name or the brand of the unit. The data in this folder

is also not required.

The fourth sheet is called Metadata Time and

contains information about the temporal horizon of

the experiment for sensor data generation. The sheet

shows the elapsed time, as well as the date and time

at the start and end points and at pauses in the

experiment, in two different formats. This data is also

used to create the event log and must therefore be

processed.

Since Power Query is embedded in Excel, the

very first step is to create a new Excel folder. The data

is loaded into this folder after it has been extracted

and transformed into an event log.

Now, within Excel, the saved data can be

extracted and loaded into the Power Query interface.

However, because the data for each process case is

stored in a separate sheet and separate files, they must

all be combined. This is accomplished through the use

of one query. Now all the collected data has been

extracted and combined.

Following the combination, the data will be

transformed by creating further queries all relating to

each other in some ways. By transforming an example

file, one query will be created for this exact example

file, which can later be used to transform each file

containing relevant data in the same way as specified

in the example file. In this case, the first file of the

folder, however any other file in the folder can be

selected.

Within the Power Query Editor, a preview of one

thousand rows of the selected and combined data is

displayed. Also, the Editor offers different

transformation, loading, and extraction actions as

predefined functions. However, other actions can also

be carried out by using the SQL-code necessary.

Depending on the situation, the pre-defined actions

may be sufficient, but for a task like the one at hand,

many special actions had to be made of.

Power Query will create a query from the

transformation actions taken to enhance the data. A

user will extract the data necessary, carry out the

required transformation steps, and finally load the

processed data into its designated location, for

example, an Excel file.

Examples for transformation steps could be the

filtering by the defined coordinates of the chosen

workstations to remove unnecessary data,

calculations, changes in format, or even the

adjustment and integration of units. Using this

method, one can define how data is to be transformed

Development of a Procedure for the Processing of Raw Sensor Data from Smart Devices for Utilisation in Process Mining

199

by means of manual processing of a small part of the

data. These defined transformation steps will then be

applied to all the data by following and repeating the

recorded steps. This means that once the queries

required are defined, large masses of data can be

transformed automatically and repeatedly.

To distinguish and delimit the different process

instances or activities, for example, a filter was

created.

For the activity at workstation 1, for example,

there were geographical limits considered in

combination with certain values for the accelerometer

data. All data that fits both filter criteria can then be

assigned to workstation 1 or activity 1. Through this

method, one of the four essential event log contents,

the activity name, can be found and added. The other

essential contents were found and added in a similar

fashion. However, in detail, each contains similar but

different transformation and processing steps. These

depend on the data available and the targeted

outcome.

As with the extraction, the user has many options

for loading. Processed data can be loaded into new

Excel worksheets but also into tables or existing

folders. However, it can also be loaded into a data

warehouse, for example. The choice here was to load

the finished event log into the file in which the queries

were created.

The result of the ETL process is an event log in

the form of a table within an Excel workbook file.

4.3 Process Discovery in Process

Mining Software Application

The event log created from the raw sensor data is now

to be examined in process mining, and the

experimental process underlying the event log is to be

discovered.

The process mining application of the market

leader Celonis is used for this purpose.

First, a data pool is created. The file containing the

event log is now uploaded into this data pool via the

File Uploads tab. The data scheme can then be

configured prior to the upload.

Afterwards, a new workspace can be created via

the Process Analytics tab. This workspace is now

connected to the created data pool containing the

created event-log, and the discovery of the process

can begin. Celonis has provided several ready-made

tools and templates to explore the process at hand,

such as in the Process Explorer, where the process

model created from the uploaded event log is

displayed and can be explored.

The displayed Figure 2 shows the process model

generated by the Celonis EMS. It is to be said that the

exact content like the designations within the figure

are not important. Also, since the methodological part

Figure 2: Process discovery of the experimental process within the Celonis EMS Application.

Source: extracted from Celonis EMS-System.

ICSBT 2023 - 20th International Conference on Smart Business Technologies

200

of this research was undertaken in German, this

screenshot of the Process Viewer within the Celonis-

EMS shows German activity names. Since the

underlying data-basis (event-log) is in German

language the visualisation of the process within the

Celonis EMS can only be shown in german.

However, it is to be said, that this graphic is merely

included to show the outcome as well as demonstrate

and illustrate the following.

When comparing the ideal process model (Figure

1) to the experiment carried out and the resulting real-

life process map visualized via the Celonis EMS, it

becomes clear that the targeted goal of creating and

following an approach to data transformation that is

able to depict the process as it has happened in the

real world was met.

The figure shows, by comparison with the ideal

BPMN process model, how strongly a seemingly

simple process can deviate from its ideal path. The

figure 2 shows a so-called spaghetti-model showing

all of the model process’ variations. The inscriptions

are not important, merely the fact that the process can

be mapped using this methodology, as well as the

insight that even such a simple process (compare to

Figure 1) can in a real-world scenario be much more

complex are the reasoning behind sowing this figure.

5 CONCLUDING REMARKS &

OUTLOOK

In conclusion, it can be said that the goals set have

been met and that there is a potential to reduce the PM

blind spot. The main task and most of the time spent

were used to plan and implement and carry out the

methodological approach, of which the detailed

description unfortunately is beyond the scope of this

paper. It has been shown that Power Query is a very

powerful tool in the field of ETL.

However, it must also be mentioned that contrary

to Microsoft's statements, an affinity for coding and,

ideally, background knowledge should already exist,

especially for special applications. Looking to the

future, it can be said, when considering current

research, that the topics dealt with in the work will

come more and more to the fore.

There are many possibilities for continuing this

work. More sensors could be connected, or already

installed ones could be integrated as well.

The use of artificial intelligence could have an

influence on the topic. Artificial intelligence (AI) and

machine learning (ML) techniques can be used to

automatically identify the type of process treated and

type sensor data that is being captured (i.e.,

acceleration data vs velocity data) and transform the

data into a format that is suitable for visualising and

analysing using process mining techniques. AI and

ML algorithms can be used to automatically classify

the type of process and sensor data that is being

captured. For example, image recognition algorithms

can be used to identify the type of workpiece that is

being processed, while natural language processing

algorithms can be used to classify textual data. Once

the data has been classified, AI and ML algorithms

can be used to clean and normalise the data, making

it easier to analyse. This can include removing

duplicates, filling in missing values, and normalising

data across different sources. Once the data has been

cleaned and transformed, ML models can be trained

to predict process outcomes or identify areas for

improvement. For example, a model could be trained

to predict when a workpiece is likely to fail based on

sensor data and other process variables. By

combining this with process mining techniques, they

can optimize their processes and improve their overall

performance.

Another opportunity for future research is the use

of more advanced sensors. Smartphones have a

variety of sensors that can be used to capture data

such as location, acceleration, and orientation, but

more advanced sensors could provide more detailed

data and enable more accurate insights. For example,

LiDAR sensors can be used to capture detailed 3D

maps of objects and environments, infrared cameras

can be used to capture temperature data, and 3D

scanners can be used to capture detailed geometrical

information. LiDAR sensors, for example, are already

built into many current smartphone models. Also,

more advanced sensors could be integrated through a

wireless connection like Bluetooth.

Furthermore, other data sources could be

integrated. Smart device data can be integrated with

other data sources, such as enterprise resource

planning (ERP) systems, customer relationship

management (CRM) systems, and other operational

systems, to provide a more comprehensive view of

the process. This integration can help provide

additional context and insights that may not be

available through smart device data alone.

To ensure accurate and reliable insights, it is

important to improve data quality by minimizing

errors and inconsistencies in the data. This may

involve using data validation techniques to ensure

that data is entered correctly, cleaning and filtering

the data to remove duplicates or incomplete records

and ensuring that the data is complete and consistent

across all data sources.

Development of a Procedure for the Processing of Raw Sensor Data from Smart Devices for Utilisation in Process Mining

201

Overall and in conclusion, the set and targeted

goals of this research (data creation, data

transformation and enhancement, process discovery

with process mining) were met. thus, showing one

possibility of how to further reduce the mining blind

spot by experimenting on how this can be achieved,

the potential of generating value added for enterprises

increases, process mining is further developed, and

new potentials that were not apparent beforehand can

be found.

The publication at hand shows that the potential

to further develop the possibilities for further

enhancement within disciplines such as data science

or process mining is not exhausted. And thus, there is

a predominant reason to keep up research efforts.

REFERENCES

Bahr, I. (2019). Capterra Nutzerstudie:

Digitalisierungstrends in deutschen KMU 2019.

[online] Capterra. Available at:

https://www.capterra.com.de/blog/640/digitalisierungs

trends-in-deutschen-kmu-2019 [Accessed 20 Oct.

2022].

Bauer, R. and et al (2014). Multidimensionales Process

Mining. [online] Available at:

http://www.boles.de/teaching/pg_fb10/endberichte/20

14/MPM-Abschlussbericht.pdf.

Brzychczy, E., Gackowiec, P. and Liebetrau, M. (2020).

Data analytic approaches for mining process

Improvement—Machinery utilization use case.

Resources, [online] 9. doi:10.3390/resources9020017.

Dedić, N., Stanier, C. (2016). Measuring the Success of

Changes to Existing Business Intelligence Solutions to

Improve Business Intelligence Reporting. In: Tjoa, A.,

Xu, L.,

de Leoni, M., van der Aalst, W. and Dees, M. (2015). A

general process mining framework for correlating,

predicting, and clustering dynamic behavior based on

event logs. Information Systems.

doi:http://dx.doi.org/10.1016/j.is.2015.07.003.

Diba, K., Batoulis, K., Weidlich, M. and Weske, M. (2019).

Extraction, correlation, and abstraction of event data for

process mining. WIREs Data Mining and Knowledge

Discovery. doi:10.1002/widm.1346.

Du, N., Ye, X. and Wang, J. (2013). A semantic-aware data

generator for ETL workflows. Concurrency and

Computation: Practice and Experience, 28(4),

pp.1016–1040. doi:10.1002/cpe.3028.

Li, J., Kuang, B. and Liu, J. (2016). Script-based

Automation ETL Tool. Proceedings of the 2016 4th

International Conference on Management, Education,

Information and Control (MEICI 2016).

doi:10.2991/meici-16.2016.201.

Rheinisch-Westfälische Technische Hochschule (RWTH)

Aachen and Staaks, S. (n.d.). Phyphox - Dein

Smartphone ist ein mobiles Labor. [online]

phyphox.org. Available at:

https://phyphox.org/de/home-de/ [Accessed 30 Dec.

2022].

Silverio Fernández, M., Renukappa, S. and Suresh, S.

(2018). What is a smart device? A conceptualisation

within the paradigm of the internet of things.

Visualization in Engineering, [online] 6(1), p.3.

doi:10.1186/s4032701800638.

Sztyler, T., Völker, J., Meier, O. and Stuckenschmidt, H.

(2015). Discovery of Personal Processes from Labeled

Sensor Data – An Application of Process Mining to

Personalized Health Care. CEUR Workshop

Proceedings, [online] 1371, pp.31–46. Available at:

http://ceur-ws.org/Vol-1371/paper03.pdf [Accessed 30

Dec. 2022].

van der Aalst, W. (2016a). Process Mining. Encyclopedia

of Database Systems, pp.1–3. doi:10.1007/978-1-4899-

7993-3_1477-2.

van der Aalst, W. (2016b). Process Mining Data Science in

Action. Springer-Verlag Berlin Heidelberg.

van der Aalst, W., Adriansyah, A., de Medeiros, A.K.A.,

Arcieri, F., Baier, T., Blickle, T., Bose, J.C., van den

Brand, P., Brandtjen, R., Buijs, J., Burattin, A.,

Carmona, J., Castellanos, M., Claes, J., Cook, J.,

Costantini, N., Curbera, F., Damiani, E., de Leoni, M.

and Delias, P. (2012). Process Mining Manifesto.

Business Process Management Workshops, [online]

pp.169–194. doi:10.1007/978-3-642-28108-2_19.

ICSBT 2023 - 20th International Conference on Smart Business Technologies

202