Low Level Big Data Processing
Jaime Salvador-Meneses¹, Zoila Ruiz-Chavez¹ and Jose Garcia-Rodriguez²
¹ Universidad Central del Ecuador, Ciudadela Universitaria, Quito, Ecuador
² Universidad de Alicante, Ap. 99. 03080, Alicante, Spain
Keywords:
Big Data, Compression, Processing, Categorical Data, BLAS.
Abstract:
Machine learning algorithms require the data to be held in memory before they can be applied.
Reducing the amount of memory used to represent the data also reduces the number of operations
required to process it. Many current libraries represent the data in the traditional way, which forces
a pass over the whole data set to obtain the desired result. In this paper we propose a technique for
processing categorical information previously encoded using the bit-level schema. The method processes
the data in blocks, which reduces the number of iterations over the original data while maintaining a
processing performance similar to that obtained on the original data. The method requires the information
to be stored in memory, which makes it possible to optimize both the memory consumed by the representation
and the operations required to process it. The experimental results show a slightly lower processing time
than that obtained with traditional implementations, which indicates good performance.
1 INTRODUCTION
The number of attributes (also called the dimension) in
many datasets is large, and many algorithms do not
work well with high-dimensional datasets.
Processing high-dimensional datasets, such as the
censuses conducted in different countries, remains
a challenge (Rai and Singh, 2010).
In the last 20 years, Latin America has tended to
take greater advantage of census information (Feres,
2010). This information is mostly categorical
(variables that take a reduced set of values), so its
representation in digital media can be optimized
given its categorical nature.
A census consists of a set of m observations (also
called records) each of which contains n attributes.
An observation contains the answers to a
questionnaire given by all members of a household
(Bruni, 2004).
This paper proposes a mechanism for processing
categorical information using bit-level operations.
Prior to processing, the information must be encoded
(compressed) into a specific format. The mechanism
compresses the information into packets of a fixed
number of bits (16, 32 or 64), and each packet stores
a certain number of values.
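The packet idea can be illustrated with a minimal sketch. It is not the encoding used in this paper: the helper names `pack` and `unpack`, the fixed number of bits per value, and the layout (first value in the lowest bits of each packet) are assumptions made only for illustration.

```python
def pack(values, bits_per_value, word_bits=64):
    """Pack small non-negative integers into fixed-width packets.

    Hypothetical helper: each packet holds word_bits // bits_per_value
    values, with the first value stored in the lowest bits (assumed layout).
    """
    per_word = word_bits // bits_per_value
    mask = (1 << bits_per_value) - 1
    words = []
    for start in range(0, len(values), per_word):
        word = 0
        for slot, v in enumerate(values[start:start + per_word]):
            if v < 0 or v > mask:
                raise ValueError(f"value {v} does not fit in {bits_per_value} bits")
            # Shift the value into its slot and merge it with a bitwise OR.
            word |= v << (slot * bits_per_value)
        words.append(word)
    return words


def unpack(words, bits_per_value, count, word_bits=64):
    """Recover the first `count` values from packets produced by pack()."""
    per_word = word_bits // bits_per_value
    mask = (1 << bits_per_value) - 1
    out = []
    for word in words:
        for slot in range(per_word):
            if len(out) == count:
                return out
            # Shift the slot down to bit 0 and mask off the neighbours.
            out.append((word >> (slot * bits_per_value)) & mask)
    return out
```

Under these assumptions, a categorical variable with 8 possible values needs 3 bits per value, so a 64-bit packet holds 21 values instead of one.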
Bitwise operations (AND, OR, etc.) are an important
part of modern programming languages because
they allow arithmetic operations to be replaced with
more efficient ones (Seshadri et al., 2015).
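As a small illustration of such replacements, the sketch below shows three standard identities for non-negative integers and powers of two. The function names are hypothetical, and the code only demonstrates that the results are equal; it makes no claim about speed in Python, where any gain depends on the language and hardware.

```python
def times_pow2(x, k):
    """x * 2**k computed with a left shift."""
    return x << k


def div_pow2(x, k):
    """x // 2**k computed with a right shift (x assumed non-negative)."""
    return x >> k


def mod_pow2(x, k):
    """x % 2**k computed with an AND against a mask of k one-bits."""
    return x & ((1 << k) - 1)
```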
This document is organized as follows: Section 2
summarizes the standard that defines the algebraic
operations that can be performed on data sets, as
well as some of the main libraries that implement
it; Section 3 presents an alternative for information
processing based on the BLAS Level 1 standard;
Section 4 presents several results obtained using the
proposed processing method; and, finally, Section 5
presents some conclusions.
2 BLAS SPECIFICATION
In this section we present a summary of some libraries
that implement the BLAS specification and a brief
review of encoding categorical data.
Basic Linear Algebra Subprograms (BLAS) is a
specification that defines low-level routines for
performing linear algebra operations on scalars,
vectors and matrices.
BLAS defines 3 levels:
• BLAS Level 1 defines operations between vectors
and scalars
• BLAS Level 2 defines operations between scalars
Salvador-Meneses, J., Ruiz-Chavez, Z. and Garcia-Rodriguez, J.
Low Level Big Data Processing.
DOI: 10.5220/0007227103470352
In Proceedings of the 10th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2018) - Volume 1: KDIR, pages 347-352
ISBN: 978-989-758-330-8
Copyright © 2018 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved