(e.g., flow times of cases).
The above examples illustrate that Table 1 iden-
tifies a range of novel research challenges in process
mining. In today’s society, event data are collected
about anything, at any time, and at any place. Today’s
process mining tools are able to analyze such data and
can handle event logs with billions of events. These
amazing capabilities also imply a great responsibility.
Fairness, confidentiality, accuracy, and transparency
should be key concerns for any process miner.
4 CONCLUSION
This paper introduced the notion of “Green Data Sci-
ence” (GDS) from four angles: fairness, confidential-
ity, accuracy, and transparency. The possible “pollu-
tion” caused by data science should not be addressed
by legislation alone. We should aim for positive,
technological solutions to protect individuals, orga-
nizations and society against the negative side-effects
of data. As an example, we discussed “green chal-
lenges” in process mining. Table 1 can be viewed as
a research agenda listing interesting open problems.