It is worth noting that metadata quality dimensions
are strongly interrelated. For example, adding
information to improve completeness may
increase metadata redundancy. It is therefore
recommended to find the right balance so that
no single dimension is heavily degraded in such a case.
Focusing on only one dimension without
considering its correlation with the others may not be
a practical approach for supporting metadata quality.
Therefore, for successful metadata quality
assessment and improvement, data managers should
also consider the existing dependencies between
metadata quality dimensions when prioritizing
them. This would also help
data managers identify the causes that may degrade a
specific dimension. Improving metadata quality
is thus not limited to enhancing the individual quality
dimensions; it is a whole process that involves other
components such as the intended use of data, the
business requirements, the organizational policy, and
the data owners.
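To make the completeness and redundancy trade-off concrete, the sketch below scores a single metadata record on both dimensions. It is a minimal illustration under stated assumptions, not a metric taken from the literature reviewed here: the field list `EXPECTED_FIELDS`, the helper names, and both formulas (completeness as the share of expected fields that are filled, redundancy as the share of filled fields whose normalized value repeats another field in the same record) are hypothetical choices made for this example.

```python
from typing import Mapping, Optional

# Hypothetical schema: fields a record is expected to fill (illustrative only).
EXPECTED_FIELDS = ("title", "description", "creator", "subject", "keywords")

Record = Mapping[str, Optional[str]]


def _normalized_values(record: Record) -> list[str]:
    """Return the non-empty, trimmed, lower-cased values of the expected fields."""
    values = [(record.get(f) or "").strip().lower() for f in EXPECTED_FIELDS]
    return [v for v in values if v]


def completeness(record: Record) -> float:
    """Assumed metric: share of expected fields that hold a non-empty value."""
    return len(_normalized_values(record)) / len(EXPECTED_FIELDS)


def redundancy(record: Record) -> float:
    """Assumed metric: share of filled fields whose value duplicates another filled field."""
    values = _normalized_values(record)
    if not values:
        return 0.0
    return (len(values) - len(set(values))) / len(values)


# A sparse record, then the same record "enriched" by copying existing values.
sparse = {"title": "Sensor readings 2021", "creator": "Lab A"}
enriched = {
    **sparse,
    "description": "Sensor readings 2021",  # copied from the title
    "subject": "Sensors",
    "keywords": "sensors",  # repeats the subject term
}

for name, rec in (("sparse", sparse), ("enriched", enriched)):
    print(f"{name}: completeness={completeness(rec):.1f}, redundancy={redundancy(rec):.1f}")
# sparse: completeness=0.4, redundancy=0.0
# enriched: completeness=1.0, redundancy=0.4
```

Under these assumed definitions, enriching the record by reusing existing values raises completeness from 0.4 to 1.0 but also raises redundancy from 0.0 to 0.4, which is the kind of dependency data managers would need to weigh when prioritizing dimensions.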
5 CONCLUSIONS
Ensuring metadata quality is of great importance since
it directly impacts data quality and, thus, the reliability
of the extracted insights. Therefore, several approaches
have been suggested in the literature to assess and
improve metadata quality. With the emergence of big
data, new challenges related to metadata quality have
been raised by the particular characteristics of big data,
known as the 7 V's. To the best of our knowledge, no
studies have addressed the impact of
the big data 7 V's on the different metadata quality
dimensions. The purpose of this work was therefore to
highlight the quality issues related to metadata in big
data environments. This study analyzed the impact of each
big data characteristic on the most common metadata
quality dimensions. Six metadata quality
dimensions were addressed: accuracy,
usefulness, completeness, consistency, shareability,
and timeliness. We also suggested some recommendations to
address the issues raised. As
future work, we aim to propose a novel quality
framework for big data that addresses the metadata
quality issues raised in this paper and implements the
suggested solutions while considering the different
factors that could impact metadata quality, including
the organizational policy, the project context, and the
business requirements.