academic world as well (Scott and Wilkins, 1999;
Lin et al., 2006). These present new concepts in
the form of graph- and language-oriented synthetic
data description, providing greater flexibility in the
description and generation of synthetic data. Gray et
al. (1994) proposed an approach to generate
special-purpose data sets in parallel. It converts a
simple sequential load into a parallel load, which turns a
two-day task into a one-hour task. Bruno and Chaudhuri
(2005) introduce a Data Generation Language
(DGL) to generate databases with complex synthetic
distributions and inter-table correlations. Mans et al.
(2010) proposed experimental frameworks to generate
event data and to specify, develop, test, and validate the
operational performance of systems.
Our approach differs from previously published
approaches in two respects. First, the data are generated
by real information systems, so they always
have the same structure as real-life data. Existing
approaches can also generate “semi-real” data,
but they require more effort, such as investigating the
data schema and how operations in the information
system change the database. Second, the user can design
a business process to control the execution of the
information systems.
6 CONCLUSION
This paper proposes a framework to automatically
generate semi-real data. As the name indicates, the
generated data lie between real-life data and
purely synthetic data. More precisely, they are generated
by automatically operating real information systems,
e.g., the ERP system Dolibarr. Therefore, they have the
same data structure as real-life data. The attribute
values in the data are created based on domain knowledge,
and these may not be as precise as the values in
real-life data.
The framework is implemented as a ProM plugin
that automatically operates information
systems based on a simulation log (derived by simulating
a design model). Based on the generated data
and the designed model, various analysis techniques
can be verified.
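The idea of operating an information system from a simulation log can be illustrated with a minimal sketch. It is not the ProM plugin itself: the event schema (`SimEvent`), the in-memory stand-in for the information system (`InMemorySystem`), and the two handlers are hypothetical names introduced only for illustration. Under these assumptions, each simulated event is mapped to an operation that writes to the system, and events are executed in timestamp order.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SimEvent:
    """One entry of a simulation log (hypothetical schema)."""
    case_id: str
    activity: str
    timestamp: int  # logical time assigned by the simulation
    attributes: Dict[str, object] = field(default_factory=dict)


class InMemorySystem:
    """Stand-in for a real information system: stores rows per table."""

    def __init__(self) -> None:
        self.tables: Dict[str, List[dict]] = {}

    def insert(self, table: str, row: dict) -> None:
        self.tables.setdefault(table, []).append(row)


Handler = Callable[[InMemorySystem, SimEvent], None]


def create_order(system: InMemorySystem, event: SimEvent) -> None:
    # Hypothetical operation: one simulated event yields one "order" row.
    system.insert("order", {"id": event.case_id, **event.attributes})


def create_invoice(system: InMemorySystem, event: SimEvent) -> None:
    # Hypothetical operation: the invoice row refers back to its order.
    system.insert("invoice", {"order_id": event.case_id, **event.attributes})


def replay(log: List[SimEvent], handlers: Dict[str, Handler],
           system: InMemorySystem) -> List[SimEvent]:
    """Run each event against the system in timestamp order.

    Returns the events whose activity has no registered handler.
    """
    skipped = []
    for event in sorted(log, key=lambda e: e.timestamp):
        handler = handlers.get(event.activity)
        if handler is None:
            skipped.append(event)
        else:
            handler(system, event)
    return skipped
```

In a real setting the handlers would drive the actual system, for instance through its user interface or API, so that the resulting database rows carry the structure of real-life data.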
REFERENCES
Bruno, N. and Chaudhuri, S. (2005). Flexible database gen-
erators. In Proceedings of the 31st international con-
ference on Very large data bases, pages 1097–1107.
VLDB Endowment.
Centre, P. B. (2018). DTM Database Tools.
http://www.sqledit.com/. Accessed: 2018-12-05.
Global Software Applications, L. (2018). GSAPPS.
http://www.gsapps.com/. Accessed: 2018-12-05.
Gray, J., Sundaresan, P., Englert, S., Baclawski, K., and
Weinberger, P. J. (1994). Quickly generating billion-
record synthetic databases. In ACM SIGMOD Record,
volume 23, pages 243–252. ACM.
Hoag, J. E. (2008). Synthetic data generation: Theory, tech-
niques and applications. University of Arkansas.
Hoag, J. E. and Thompson, C. W. (2009). A parallel
general-purpose synthetic data generator. In Data
Engineering, pages 103–117. Springer.
IRI, T. C. C. (2018). IRI RowGen.
http://www.iri.com/products/rowgen. Accessed:
2018-12-05.
Jensen, K. (2013). Coloured Petri nets: basic con-
cepts, analysis methods and practical use, volume 1.
Springer Science & Business Media.
Jensen, K. and Kristensen, L. M. (2009). Coloured Petri
nets: Modelling and validation of concurrent systems.
Springer Science & Business Media.
Li, G., de Carvalho, R. M., and van der Aalst, W. M. P.
(2017). Automatic Discovery of Object-Centric Be-
havioral Constraint Models. In BIS 2017, June 28–30,
2017, Proceedings, pages 43–58. Springer.
Li, G., de Carvalho, R. M., and van der Aalst, W. M. P.
(2018a). Configurable event correlation for pro-
cess discovery from object-centric event data. In
2018 IEEE International Conference on Web Services
(ICWS), pages 203–210. IEEE.
Li, G., de Murillas, E. G. L., de Carvalho, R. M., and van der
Aalst, W. M. P. (2018b). Extracting object-centric
event logs to support process mining on databases. In
CAiSE Forum, pages 182–199. Springer.
Lin, P. J. et al. (2006). Development of a synthetic data set
generator for building and testing information discov-
ery systems. In Information Technology: New Gener-
ations, 2006. ITNG 2006. Third International Confer-
ence on, pages 707–712. IEEE.
Mans, R. S., Russell, N. C., van der Aalst, W. M. P., Mole-
man, A. J., and Bakker, P. J. (2010). Schedule-aware
workflow management systems. In Transactions on
Petri nets and other models of concurrency IV, pages
121–143. Springer.
Murata, T. (1989). Petri nets: Properties, analysis and ap-
plications. Proceedings of the IEEE, 77(4):541–580.
Scott, P. D. and Wilkins, E. (1999). Evaluating data
mining procedures: techniques for generating artifi-
cial data sets. Information and software technology,
41(9):579–587.
van der Aalst, W. M. P., Bichler, M., and Heinzl, A. (2018).
Robotic process automation.
van der Aalst, W. M. P. (1998). The application of Petri
nets to workflow management. Journal of circuits,
systems, and computers, 8(01):21–66.
van der Aalst, W. M. P., Li, G., and Montali, M. (2017).
Object-Centric Behavioral Constraints. CoRR technical
report, arXiv.org e-Print archive. Available at
https://arxiv.org/abs/1703.05740.
Zervos, C. (1977). Coloured Petri nets: Their properties
and applications. PhD thesis, University of Michigan,
Michigan.
ICEIS 2019 - 21st International Conference on Enterprise Information Systems