2 THEORY AND RELATED WORK
2.1 Data Quality Monitoring
The Data Management Association states that managing
data quality involves four processes: plan, deploy,
monitor, and act (Brackett and Earley, 2009). The
plan stage assesses the scope of data quality issues.
The deploy stage analyzes data profiles. The monitor
stage monitors data quality and measures it against
business rules. The act stage is the last stage, in
which decisions are made to overcome and resolve data
quality problems. One of the processes in managing
data quality is Data Quality Monitoring (DQM). Data
Quality Monitoring is a framework that
is used to continuously control the quality of data in
an information system, for example by using metrics,
reports, or regular data profiling (Apel et al., 2015).
Ehrlinger and Wöß (2017) divide the data quality
monitoring process into four steps: data profiling and
quality assessment, data quality repository, time
series analytics, and visualization. Data profiling is
a series of activities and
processes to determine metadata in a data set
(Abedjan, 2016). According to Abedjan et al. (2016),
data profiling is divided into three groups: single
column profiling, multiple column profiling, and
dependencies. According to DataFlux Corporation, data
profiling involves three categories of analytical
methods: column discovery, structure discovery, and
relationship discovery (Apel et al., 2015). Data
Quality Assessment is a phase in DQM that is used to
verify the source, quantity, and impact of each data
item that
violates predetermined data quality rules. Data
quality standards consist of five dimensions:
availability, usability, reliability, relevance, and
presentation quality (Cai and Zhu, 2015). The DQ
Repository is divided into two components, namely Data
Quality Metadata (DQMD) and the results of Data
Quality Assessment (DQA). DQMD is a description of the
data schema being assessed, while the DQA results are
kept in a database that stores them over time
(Ehrlinger and Wöß, 2017). Visualization of data
quality results has been examined by Kandel et al.
(2012), who highlight the need to automate this step.
Visualization serves two purposes: the time-series
data stored in the DQ Repository can be plotted
directly, and the results of time series analysis can
be presented to the user.
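The two components of the DQ Repository described above can be illustrated with a minimal sketch. It is not the implementation of Ehrlinger and Wöß (2017) or of this study; the class `DQRepository`, the `completeness` rule, the table and column names, and the sample values are all hypothetical and chosen only to show how assessment results could be stored over time and read back as a time series.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DQRepository:
    """Hypothetical in-memory stand-in for a DQ Repository."""
    # DQMD: description of the schema being assessed (table -> columns)
    dqmd: dict = field(default_factory=dict)
    # DQA results over time: (day, table, column, metric, value)
    dqa_results: list = field(default_factory=list)

    def record(self, day, table, column, metric, value):
        """Append one assessment result to the stored history."""
        self.dqa_results.append((day, table, column, metric, value))

    def series(self, table, column, metric):
        """Return the stored values for one metric, ordered by day."""
        return sorted(
            (d, v) for d, t, c, m, v in self.dqa_results
            if (t, c, m) == (table, column, metric)
        )


def completeness(values):
    """Illustrative rule: share of values that are neither None nor blank."""
    if not values:
        return 0.0
    filled = sum(1 for v in values if v is not None and str(v).strip() != "")
    return filled / len(values)


# Toy snapshots of one column on two days (invented data).
repo = DQRepository(dqmd={"employee": ["id", "name"]})
snapshots = {
    date(2019, 1, 1): ["Ana", "Budi", None, "Citra"],
    date(2019, 1, 2): ["Ana", "Budi", "", ""],
}
for day, names in snapshots.items():
    repo.record(day, "employee", "name", "completeness", completeness(names))

# Time series that a dashboard could plot directly.
name_series = repo.series("employee", "name", "completeness")
```

In this sketch the stored series is what a visualization layer would read; a real repository would more likely be a database table with the same columns.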
2.2 Pureshare
Pureshare is a dashboard development methodology
produced by the vendor Pureshare. Pureshare aims to
make projects associated with measuring and managing
organizational performance easier. The Pureshare
development method starts with planning and design,
which identifies the user needs as well as the
features of the dashboard.
Once the user needs are known, the next step is the
system and data review, which covers controlling the
system, identifying data sources, accessing data, and
measuring the size of the data. The next step is
designing the prototype as quickly as possible to
provide an overview of the dashboard system that will
be created. After the prototypes are created, they
are reviewed together with the user to gather
feedback for further development according to the
user's needs. After the user approves the prototype
and it meets the user's needs, the dashboard
prototype is implemented in the release step.
2.3 Related Work
This research continues previous studies. Amethyst
et al. (2018) studied the use of data profiling,
focusing primarily on analyzing data through
cardinalities, data patterns, and value distributions
using open source applications. The profiling results
are implemented in the form of logic in an open
source application and compared with other open
source applications. Another study used in this
research is by Dwiandriani, whose main focus was
building a data profiling architecture for
calculating null or blank data in a column.
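The profiling measures named above (null or blank counts, cardinalities, data patterns, and value distributions) can be sketched for a single column as follows. This is an illustrative sketch, not the logic used in the cited studies; the function names, the pattern notation (`A` for letters, `9` for digits), and the sample values are assumptions made for the example.

```python
from collections import Counter
import re


def is_blank(value):
    """Treat None and whitespace-only strings as null or blank."""
    return value is None or str(value).strip() == ""


def pattern_of(value):
    """Map letters to 'A' and digits to '9', keeping other characters."""
    text = re.sub(r"[A-Za-z]", "A", str(value))
    return re.sub(r"[0-9]", "9", text)


def profile_column(values):
    """Compute simple single-column profiling measures."""
    present = [v for v in values if not is_blank(v)]
    return {
        "rows": len(values),
        "null_or_blank": len(values) - len(present),
        "cardinality": len(set(present)),  # number of distinct values
        "value_distribution": Counter(present),
        "patterns": Counter(pattern_of(v) for v in present),
    }


# Invented sample column with one empty string and one None.
profile = profile_column(["AB12", "CD34", "", None, "AB12", "x-1"])
```

Here `profile["null_or_blank"]` counts the empty string and the `None`, while the pattern counter groups `AB12` and `CD34` under the same pattern `AA99`.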
3 PROPOSED METHODOLOGY
In developing this data quality monitoring
application, there are two main focuses: the
development of a dashboard for visualization and the
development of the data quality monitoring
architecture. For the architecture, we used the
concept studied by Ehrlinger and Wöß (2017), who
state that a data quality monitoring architecture
starts from determining the data source, followed by
data profiling and data quality assessment, storing
the results, time series analysis, and
Analysis and Design of Data Quality Monitoring Application using Open Source Tools: A Case Study at a Government Agency