Another member of the Apache family is
Hive (Thusoo et al., 2009), which provides a database-like
view on data in HDFS. It uses so-called SerDe
modules to serialize and deserialize data for querying
via SQL, a well-established query language
for relational, structured data that is supported by familiar
tools. Because Hive's query language closely follows the SQL
standard, JDBC and ODBC database drivers are available to
connect familiar frontend tools for analyzing and
working with the data. Such tools include, for example,
Tableau for visualization and Matlab for data exploration.
Using the schema-on-read principle in combination
with consolidated storage, harmonized access
to the data can be realized. All data can be accessed
via well-defined interfaces, allowing for a fast and
holistic analysis.
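The schema-on-read principle can be illustrated with a minimal Python sketch. It is not Hive's SerDe API; the two deserializer functions are only toy stand-ins for SerDe modules, and all names and sample values (RAW_JSON_LINES, RAW_CSV, SCHEMA, the machine identifiers) are hypothetical. The raw records stay in their source format, and the schema is applied only when the data is read, so that both sources can be queried through one interface.

```python
import csv
import io
import json

# Raw records as they might sit in consolidated storage: kept untouched,
# in whatever format the source produced (all values are made up).
RAW_JSON_LINES = (
    '{"machine": "press-1", "temp": "71.5"}\n'
    '{"machine": "press-2", "temp": "69.0"}\n'
)
RAW_CSV = "machine,temp\nmill-7,80.25\n"

# The schema lives apart from the data and is applied only at read time.
SCHEMA = {"machine": str, "temp": float}


def deserialize_json(raw):
    """Toy stand-in for a SerDe: turn raw JSON lines into dict rows."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


def deserialize_csv(raw):
    """Toy stand-in for a SerDe: turn raw CSV text into dict rows."""
    return list(csv.DictReader(io.StringIO(raw)))


def read_with_schema(raw, deserialize, schema):
    """Apply the schema on read: select and cast the declared columns."""
    return [
        {column: cast(row[column]) for column, cast in schema.items()}
        for row in deserialize(raw)
    ]


def load_all():
    """Harmonized view over both sources through one interface."""
    return (
        read_with_schema(RAW_JSON_LINES, deserialize_json, SCHEMA)
        + read_with_schema(RAW_CSV, deserialize_csv, SCHEMA)
    )


rows = load_all()
# Rough equivalent of: SELECT machine FROM readings WHERE temp > 70
hot = [r["machine"] for r in rows if r["temp"] > 70.0]
print(hot)
```

Because the stored data is never rewritten, a changed or additional schema only requires a new read-time mapping in this sketch, not a new ingestion run.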
6 CONCLUSION
In this work, we presented an architectural framework
and integration chain that enables Big Data analytics
workflows for further application in the manufacturing
domain. By making use of the proposed
concepts, production environments with their special
challenges and requirements can be appropriately
mapped to an overall information management
and data harmonization approach. One of the major insights of
this work is the identification of lacking data
accessibility in current factories. By increasing the
general data accessibility through the proposed integration
approach, it becomes possible to gain deeper
insights into the production processes and to
reveal patterns and correlations across different data
sources by means of machine learning. We have shown how
ingestion processes of these different sorts of
data sets and streams into one coherent data pool can
be realized by making use of established technologies.
Further steps that are required for a more generic
integration of shop floor devices into data lake structures
can be identified especially in the field of OPC
UA data and information modeling. Other topics to be
further investigated are issues related to data governance
and auditing. This is especially important for
cases in which privacy or customer/user data become
relevant and are included in the data lake structure.
REFERENCES
Apache Software Foundation (2009). Apache Avro.
https://avro.apache.org - last accessed Jan-2018.
Ball, G., Runge, C., Ramsey, R., and Barrett, N. (2017).
Systems integration and verification in an advanced
smart factory. In 2017 Annual IEEE International
Systems Conference (SysCon), pages 1–5.
Bonci, A., Pirani, M., and Longhi, S. (2016). A database-
centric approach for the modeling, simulation and
control of cyber-physical systems in the factory of the
future. IFAC-PapersOnLine, 49(12):249–254.
Calder, A. (2009). Information Security Based on ISO
27001/ISO 27002: A Management Guide - Best
Practice. Van Haren Publishing.
Cox, M. and Ellsworth, D. (1997). Managing big data for
scientific visualization. In ACM SIGGRAPH, volume 97,
pages 21–38.
Dean, J. and Ghemawat, S. (2008). MapReduce: Simplified
data processing on large clusters. Commun. ACM,
51(1):107–113.
Fisher, M., Partner, J., Bogoevici, M., and Fuld, I. (2012).
Spring Integration in Action. Manning Publications
Co., Greenwich, CT, USA.
Goodhue, D. L., Wybo, M. D., and Kirsch, L. J. (1992). The
impact of data integration on the costs and benefits of
information systems. MIS Quarterly, pages 293–311.
Hohpe, G. and Woolf, B. (2003). Enterprise Integration
Patterns: Designing, Building, and Deploying Messaging
Solutions. Addison-Wesley Longman Publishing
Co., Inc., Boston, MA, USA.
Kleppmann, M. and Kreps, J. (2015). Kafka, Samza and
the Unix Philosophy of Distributed Data. IEEE Data
Eng. Bull., 38(4):4–14.
O’Leary, D. E. (2014). Embedding AI and Crowdsourcing
in the Big Data Lake. IEEE Intelligent Systems,
29(5):70–73.
Rinaldi, J. (2013). OPC UA - the basics: An OPC UA overview
for those who are not networking gurus. Amazon,
Great Britain.
Runge, C., Lynch, K., Ramsey, R., and Pauline, T. (2016).
Digital product assurance for model-based open
manufacturing of small satellites. In 2016 IEEE
International Conference on Emerging Technologies and
Innovative Business Practices for the Transformation
of Societies (EmergiTech), pages 206–209.
Shvachko, K., Kuang, H., Radia, S., and Chansler, R.
(2010). The Hadoop distributed file system. In 2010
IEEE 26th Symposium on Mass Storage Systems and
Technologies (MSST), pages 1–10.
Technology, I. (2008). ISO/IEC 9075 Database languages -
SQL. Technical report, International Organization for
Standardization.
Technology, I. (2013). ISO/IEC 27002:2013 - Information
technology – Security techniques – Code of practice
for information security management. Technical
report, International Organization for Standardization.
Theorin, A., Bengtsson, K., Provost, J., Lieder, M.,
Johnsson, C., Lundholm, T., and Lennartson, B. (2015). An
event-driven manufacturing information system
architecture. IFAC-PapersOnLine, 48(3):547–554.
Thusoo, A., Sarma, J. S., Jain, N., Shao, Z., Chakka, P.,
Anthony, S., Liu, H., Wyckoff, P., and Murthy, R. (2009).
Hive: A warehousing solution over a map-reduce
framework. Proc. VLDB Endow., 2(2):1626–1629.
Videla, A. and Williams, J. J. (2012). RabbitMQ in action:
distributed messaging for everyone. Manning.
Vinoski, S. (2006). Advanced message queuing protocol.
IEEE Internet Computing, 10(6):87–89.
ICEIS 2018 - 20th International Conference on Enterprise Information Systems