Authors: Francesco Folino, Massimo Guarascio and Luigi Pontieri
Affiliation: Institute ICAR and National Research Council (CNR), Italy
Keyword(s): Data Mining, Business Process Intelligence, Trace Clustering, Workflow Discovery.
Related Ontology Subjects/Areas/Topics: Artificial Intelligence; Data Mining; Databases and Information Systems Integration; Enterprise Information Systems; Sensor Networks; Signal Processing; Soft Computing
Abstract:
Process discovery (i.e., the automated induction of a behavioral process model from execution logs) is an important tool for business process analysts and managers, who can exploit the extracted knowledge in key process improvement and (re-)design tasks. Unfortunately, when directly applied to the logs of complex and/or lowly-structured processes, such techniques tend to produce low-quality workflow schemas, featuring both poor readability ("spaghetti-like" models) and low fitness (i.e., a low ability to reproduce the log traces). Trace clustering methods alleviate this problem by helping detect different execution scenarios, for each of which a simpler and better-fitting workflow schema can eventually be discovered. However, most of these methods focus only on the sequence of activities performed in each log trace, without fully exploiting the non-structural data (such as case data and environmental variables) available in many real logs, which could help discover more meaningful (context-related) process variants. To overcome these limitations, we propose a two-phase clustering-based process discovery approach, where the clusters are inherently defined through logical decision rules over context data, ensuring a satisfactory trade-off between the readability/explainability of the discovered clusters and the behavioral fitness of the workflow schemas eventually extracted from them. The approach has been implemented in a system prototype, which supports the discovery, evaluation and reuse of such multi-variant process models. Experimental results on a real-life log confirmed its capability to achieve compelling performance with respect to state-of-the-art clustering approaches, in terms of both fitness and explainability.
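The abstract does not spell out the algorithm, so the following is only a minimal, hypothetical Python sketch of the core idea: each cluster is described by a readable rule (a conjunction of conditions over a trace's context attributes), traces are assigned to the first rule they satisfy, and a separate control-flow model is then derived per cluster. Several assumptions apply: the toy log, the attribute names (`amount`, `channel`) and all function names are invented for illustration; the rules are hand-written rather than induced from the log; and a plain directly-follows graph stands in for the workflow schema discovery step, which is not the authors' method.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class Trace:
    case_id: str
    activities: list[str]           # control-flow part of the trace
    context: dict[str, object]      # non-structural data (case data, environment)


# Hypothetical rule format: a list of (attribute, test, human-readable label).
# An empty list is the conjunction TRUE, i.e. a catch-all cluster.
Condition = tuple[str, Callable[[object], bool], str]


def rule_label(conds: list[Condition]) -> str:
    """Render a rule as readable text, which is what makes the cluster explainable."""
    return " AND ".join(label for _, _, label in conds) or "TRUE"


def assign(traces: list[Trace], rules: list[list[Condition]]) -> dict[int, list[Trace]]:
    """Put each trace into the cluster of the first rule whose conditions all hold."""
    clusters: dict[int, list[Trace]] = defaultdict(list)
    for t in traces:
        for i, conds in enumerate(rules):
            if all(a in t.context and test(t.context[a]) for a, test, _ in conds):
                clusters[i].append(t)
                break
    return clusters


def directly_follows(traces: list[Trace]) -> set[tuple[str, str]]:
    """Stand-in 'schema': the set of directly-follows pairs seen in the traces."""
    edges: set[tuple[str, str]] = set()
    for t in traces:
        edges.update(zip(t.activities, t.activities[1:]))
    return edges


# Toy log (invented data, for illustration only).
traces = [
    Trace("c1", ["register", "check", "approve", "pay"], {"amount": 500, "channel": "web"}),
    Trace("c2", ["register", "check", "approve", "pay"], {"amount": 800, "channel": "web"}),
    Trace("c3", ["register", "audit", "check", "reject"], {"amount": 20000, "channel": "branch"}),
    Trace("c4", ["register", "audit", "escalate", "approve", "pay"], {"amount": 15000, "channel": "branch"}),
]
rules: list[list[Condition]] = [
    [("amount", lambda v: v <= 10_000, "amount <= 10000")],
    [],  # catch-all: every remaining trace
]

clusters = assign(traces, rules)
print("global model edges:", len(directly_follows(traces)))
for i, members in sorted(clusters.items()):
    print(rule_label(rules[i]), [t.case_id for t in members],
          "edges:", len(directly_follows(members)))
```

In this sketch each cluster carries its own textual rule, so an analyst can read why a trace landed where it did, and each per-cluster model only contains the behavior of its own variant instead of mixing both scenarios into one graph.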