ceives data from FileBeat and also from production
databases (see sec. 3) and provides a fault-tolerant,
durable data stream to the next module, devoted
to data processing, namely Logstash. Kafka supports
the standard publish-subscribe mechanism to manage
its input sources (publishers) as well as its outputs (sub-
scribers); it also arranges data into topics, logical
groupings of messages sharing common features. Each
topic contains a set of partitions, each being a se-
quence of records; a record is the atomic element
used to build the stream. Partitions increase paral-
lelism, which is especially useful with multiple con-
sumers (Logstash instances in our case), and allow
the amount of data held for a given topic to scale. Parti-
tions are generally spread over a set of servers for fault-
tolerance purposes; which partition is managed by which
server, and which consumer is associated with which
partition, are completely configurable to achieve the
best performance.
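To make the topic/partition model concrete, the following minimal sketch creates such a topic with the kafka-python client; the broker address, topic name, partition count, and replication factor are illustrative assumptions, not values from our deployment:

    # Illustrative sketch (kafka-python); names and counts are assumptions.
    from kafka.admin import KafkaAdminClient, NewTopic

    admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
    # Six partitions let up to six consumers in the same group read in
    # parallel; replication_factor=3 keeps each partition on three
    # servers for fault tolerance.
    admin.create_topics([NewTopic(name="machine-logs",
                                  num_partitions=6,
                                  replication_factor=3)])
    admin.close()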
Kafka durably persists all published records
for a specific retention time; when it expires, records
are deleted to free space. Subscribers (Logstash in-
stances) can be configured to consume records from
a specific offset position established by the subscriber
itself, implementing a flexible tail-like operation
that tailors stream consumption to the subscriber's
actual capabilities; moreover, Kafka includes an
acknowledgment mechanism to guarantee that
consumers have received their data. The management of
offset positions is handled by ZooKeeper (Apache
Software Foundation, 2020e), a coordination service
for distributed applications that, like Kafka, is part of
the Apache Software Foundation. ZooKeeper manages
the set of servers running Kafka, preventing race condi-
tions and deadlocks, and exploits the servers' states to
redirect a client whenever its server is down or
the connection is lost.
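The offset-based, tail-like consumption described above can be sketched as follows (again with kafka-python; the topic, partition, and offset are purely illustrative):

    # Illustrative sketch (kafka-python); topic and offset are assumptions.
    from kafka import KafkaConsumer, TopicPartition

    consumer = KafkaConsumer(bootstrap_servers="localhost:9092",
                             enable_auto_commit=False)
    tp = TopicPartition("machine-logs", 0)
    consumer.assign([tp])    # manual partition assignment
    consumer.seek(tp, 42)    # resume from an offset chosen by the subscriber
    for record in consumer:  # stream records from that position onwards
        print(record.offset, record.value)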
2.2 Elaborating Data
The next stage of log analysis is the processing of
data, carried out by the Logstash module (Elastic.co,
2020e). As soon as data arrives (from Kafka in our ar-
chitecture), an event is triggered and stored in a queue,
from which a thread (one for each input source) peri-
odically fetches a batch of events, processes them using
custom filters, and delivers the processed data to other
modules. In the input stage, both the batch size and the
number of running threads are fully configurable, and
the queue itself can be made persistent (on disk rather
than in memory) to cope with unforeseen Logstash
crashes.
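These knobs live in Logstash's settings file; a minimal logstash.yml sketch with illustrative values:

    # logstash.yml -- illustrative values, not our production settings
    pipeline.workers: 4        # threads running the filter/output stages
    pipeline.batch.size: 125   # events fetched from the queue per batch
    queue.type: persisted      # disk-backed queue, survives crashes
    queue.max_bytes: 1gb       # cap on the persistent queue size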
The filter stage is the Logstash core, where several
predefined, customizable filters are available to allow
data to be structured, modified, added, or discarded.
Filters range from standard regular-expression pattern
matching to string-to-number conversion (and vice
versa), checking IP addresses against a list of network
blocks or converting them to geolocation information,
data splitting (e.g., from CSV format) or merging, date
conversion, parsing of unstructured event data into
fields, a JSON format plug-in, and many others. The
last stage of the Logstash pipeline is the output, where
processed data are sent to other modules such as Elastic-
search (as occurs in our architecture, see fig. 1),
files, services, or tools; output plug-ins are available
to support specific connections.
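A minimal pipeline configuration tying the three stages together might look as follows; the topic, grok pattern, and index name are illustrative assumptions, not the paper's actual filters:

    input {
      kafka {
        bootstrap_servers => "localhost:9092"
        topics => ["machine-logs"]
      }
    }
    filter {
      # parse unstructured event data into named fields
      grok {
        match => { "message" => "%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
      }
      # use the parsed timestamp as the event's canonical date
      date { match => ["ts", "ISO8601"] }
    }
    output {
      elasticsearch {
        hosts => ["http://localhost:9200"]
        index => "machine-logs-%{+YYYY.MM.dd}"
      }
    }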
2.3 Visualizing and Mining Processed
Data
The last module of the proposed architecture includes
two different tools: the former, Elasticsearch, is
used to index and search the data previously processed
by the Logstash module, whereas the latter, Kibana,
provides an effective and fruitful visualization of the pro-
cessed data. Elasticsearch (Elastic.co, 2020b) is an
engine that uses standard RESTful APIs and JSON;
it allows indexing and searching the stored data coming
from Logstash.
Elasticsearch runs as a distributed, horizontally
scalable, and fault-tolerant cluster of nodes; indices
can be split into shards over separate nodes to im-
prove both search operations and the system's fault
tolerance.
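Sharding is decided when an index is created; a minimal sketch with the official elasticsearch Python client (index name and counts are illustrative):

    # Illustrative sketch (elasticsearch-py 8.x); values are assumptions.
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")
    # Three primary shards spread the index over separate nodes;
    # one replica per shard adds fault tolerance.
    es.indices.create(index="machine-logs",
                      settings={"number_of_shards": 3,
                                "number_of_replicas": 1})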
Elasticsearch can be used for several purposes, such as:
• log monitoring, adopting an old-style tail -f visu-
alization
• infrastructure monitoring, where either predefined
or customizable metrics can be leveraged to extract
and highlight relevant information
• application performance and availability monitor-
ing
Each document inside Elasticsearch is represented as
a JSON object with a key-value pair for each field;
APIs are provided to support full CRUD operations
over documents; optimized relevance models and
support for typo tolerance, stemming, and bigrams are
included. Full-text search is performed by leverag-
ing the Lucene engine library (Apache Software Foun-
dation, 2020c).
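Indexing a document and running a typo-tolerant full-text query can be sketched as follows (field names and values are illustrative):

    # Illustrative sketch (elasticsearch-py 8.x); data are assumptions.
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")
    # each document is a JSON object with a key-value pair per field
    es.index(index="machine-logs", id="1",
             document={"level": "ERROR", "msg": "wafer alignment failed"})
    es.indices.refresh(index="machine-logs")  # make the document searchable
    # "alignmnet" is deliberately misspelled: "fuzziness" enables the
    # typo tolerance mentioned above
    hits = es.search(index="machine-logs",
                     query={"match": {"msg": {"query": "alignmnet",
                                              "fuzziness": "AUTO"}}})
    print(hits["hits"]["total"])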
The second tool included in this module is
Kibana (Elastic.co, 2020d), which allows us to visual-
ize and navigate the data managed by Elasticsearch, pro-
viding a complete set of tools such as histograms, line
graphs, pie charts, maps, time series, graphs, tables,
tag clouds, and others. Besides, it comes with ma-
chine learning capabilities that help, e.g., in time-se-
ries forecasting and anomaly detection; finally, all