To address this issue, we implemented a process
mining tool that builds upon the Power BI
infrastructure. We named the tool Business
Intelligence and process mining (BIpm). BIpm can
discover process models from event logs, plot
them in an understandable way, and show compliance
diagnostics. Moreover, one of the advantages of
BIpm is that it makes Power BI users and BI experts
more familiar with process mining analysis. BIpm
lets BI developers and general data scientists
apply process mining analysis quickly and
easily in the platform that they are already
used to. Compared with the license fees of
commercial process mining tools such as Disco and
Celonis, BIpm is more affordable. It is a free custom
visual, and the only fee that might be charged is for
using Power BI. Meanwhile, under the current
license policy of Microsoft, using Power BI Desktop
is completely free (Microsoft, 2019).
The idea of relating process mining to OLAP
analysis was first introduced by van der Aalst (van
der Aalst, 2013) and was realized by building the
so-called Process Cube paradigm (Bolt and van der
Aalst, 2015). Process cubes organize event data in the
form of an OLAP cube to allow for discovering,
comparing, and analyzing process models by
applying slice and dice filtering functions on the cube
(cross filtering). Here, we continue this line of
research by providing an integrated process mining
solution together with many BI analysis features in a
single platform. This is achieved by our custom
visual for Microsoft Power BI. Power BI is a
powerful self-service BI platform for big data-centric
businesses with many interactive visualizations for
graphical figures, data mining tasks, statistical
analyses, and geographical maps. It also has useful
features such as online dashboards,
customized reports, and online alerting (Ferrari and
Russo, 2016). There are many options for connecting
or importing different data sources into Power BI, as
long as the following constraints are satisfied: 1) there
is a 1 GB limit for each dataset; 2) the maximum
number of rows is 1 million when using DirectQuery
and 2 billion when not using DirectQuery;
3) the maximum number of columns in a
dataset should not exceed 16,000
(Microsoft, 2019). These constraints are not limiting
in most applications.
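The slice-and-dice idea behind process cubes can be illustrated with a
minimal Python sketch. It is not the BIpm implementation or the Process
Cube tooling: the event records, the dimension (customer type), and the
function names slice_events and directly_follows are hypothetical and
chosen only for illustration. The sketch filters the events on one
dimension value and then counts directly-follows relations per slice, so
that the resulting models can be compared.

```python
from collections import Counter, defaultdict

# Hypothetical event records: (case_id, activity, timestamp, customer_type).
# Timestamps are plain integers here; all values are illustrative only.
EVENTS = [
    (1, "Register", 1, "Gold"),
    (1, "Analyze Defect", 2, "Gold"),
    (1, "Repair (Simple)", 3, "Gold"),
    (2, "Register", 1, "Silver"),
    (2, "Analyze Defect", 4, "Silver"),
    (2, "Archive Repair", 5, "Silver"),
]


def slice_events(events, customer_type):
    """Slice: keep only the events whose dimension value matches."""
    return [e for e in events if e[3] == customer_type]


def directly_follows(events):
    """Count how often activity a is directly followed by b within a case."""
    traces = defaultdict(list)
    for case_id, activity, timestamp, _ in events:
        traces[case_id].append((timestamp, activity))
    counts = Counter()
    for steps in traces.values():
        # Order the events of each case by timestamp before pairing them.
        ordered = [activity for _, activity in sorted(steps)]
        counts.update(zip(ordered, ordered[1:]))
    return counts


# One directly-follows model per cell of the (one-dimensional) cube.
gold_model = directly_follows(slice_events(EVENTS, "Gold"))
silver_model = directly_follows(slice_events(EVENTS, "Silver"))
```

Comparing gold_model and silver_model then shows which transitions occur
only for one customer type. In Power BI, the analogous filtering would
be driven by the report's own filters rather than by code like this.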
By using BIpm, business owners, business
analysts, and managers can understand the value of
process mining and come up with improvement
plans for reengineering previous and ongoing
processes or designing new ones in the hope of
achieving better performance and efficiency.
2 BIpm OVERVIEW
In this section, we give an overview of BIpm. First,
we describe how the input data fields should be
prepared and placed in the Fields pane of Power BI.
After that, we illustrate some functional capabilities
and available opportunities in BIpm for a better
understanding of process mining analysis.
2.1 Input Fields
In line with the expected attributes of standard event
logs for process mining, event logs given in Power BI
should have these attribute fields: CaseId (i.e., the
identifier of each case), Activity (i.e., the activity name
associated with an event), and Timestamp (i.e., the
execution time of the activity in the
given case). Moreover, Path threshold and
Activity threshold are optional fields. Other event and
process attributes such as Resource, Cost, lifecycle,
etc. can be used for multidimensional analysis and to
enrich the analysis with further insights. An
example of an event log is shown in Table 1.
Table 1: Sample rows of an example event log.
Case Id  Activity         Timestamp  Resource  Customer Type
1142     Register         11:25      System    Gold
1142     Analyze Defect   12:50      Tester3   Gold
1142     Repair (Simple)  13:25      SolverS1  Gold
1145     Register         11:44      System    Silver
1142     Test Repair      17:12      Tester1   Gold
1142     Restart Repair   18:15      System    Gold
46       Test Repair      05:47      Tester6   Bronze
46       Inform User      06:00      System    Bronze
46       Archive Repair   06:02      System    Bronze
45       Register         19:36      System    Gold
45       Analyze Defect   19:36      Tester3   Gold
45       Repair (Simple)  20:01      SolverC2  Gold
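To make the role of the three required fields concrete, the following
minimal Python sketch turns a few rows of Table 1 into ordered traces,
one per case. It is an illustration under stated assumptions, not the
way BIpm processes its input: the helper names to_minutes and
build_traces are hypothetical, and the sketch assumes that all
timestamps fall on the same day, since Table 1 shows only the time of
day.

```python
from collections import defaultdict

REQUIRED_FIELDS = ("CaseId", "Activity", "Timestamp")

# A few rows from Table 1; only the three required fields are kept.
ROWS = [
    {"CaseId": "1142", "Activity": "Register", "Timestamp": "11:25"},
    {"CaseId": "1142", "Activity": "Analyze Defect", "Timestamp": "12:50"},
    {"CaseId": "1145", "Activity": "Register", "Timestamp": "11:44"},
    {"CaseId": "1142", "Activity": "Repair (Simple)", "Timestamp": "13:25"},
]


def to_minutes(hhmm):
    """Convert an 'HH:MM' string into minutes since midnight (an integer)."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def build_traces(rows):
    """Group events per case and order each case's activities by time."""
    events_per_case = defaultdict(list)
    for row in rows:
        missing = [f for f in REQUIRED_FIELDS if f not in row]
        if missing:
            raise ValueError(f"row is missing required fields: {missing}")
        case_id = int(row["CaseId"])  # numeric case identifier
        event = (to_minutes(row["Timestamp"]), row["Activity"])
        events_per_case[case_id].append(event)
    # Assumption: all events of a case happen on the same day.
    return {
        case_id: [activity for _, activity in sorted(events)]
        for case_id, events in events_per_case.items()
    }


traces = build_traces(ROWS)
```

Each trace is the time-ordered list of activities of one case, which is
the form that process discovery techniques typically start from.
Additional attributes such as Resource or Customer Type could be carried
along in the same way for multidimensional analysis.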
To get a proper output process model, the following
practical points should be considered when
designing the Power BI report:
1. The data type of the “CaseId” field should be
numeric for performance reasons; if it is not, simple
conversions are available. The data type of the
“Timestamp” field can be a time or a series of
integers.
2. Generally, the CaseId, Activity, and Timestamp
attributes should be set to "Don't summarize" so
that they are passed to the custom visual at
row-level granularity. This can be
done in the drop-down menu of each field slot in
the Fields pane.