Authors:
Jaime Salvador-Meneses 1; Zoila Ruiz-Chavez 1 and Jose Garcia-Rodriguez 2
Affiliations:
1 Universidad Central del Ecuador, Ciudadela Universitaria, Quito, Ecuador; 2 Universidad de Alicante, Ap. 99. 03080, Alicante, Spain
Keyword(s):
Big Data, Data Compression, Categorical Data, Encoding.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Computational Intelligence; Data Reduction and Quality Assessment; Evolutionary Computing; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Machine Learning; Pre-Processing and Post-Processing for Data Mining; Soft Computing; Symbolic Systems
Abstract:
In recent years, specialized algorithms have been developed to work with categorical information; however, their performance depends on two important factors: the processing technique (algorithm) and the representation of the information. Many machine learning algorithms require the information to be stored in memory, local or distributed, prior to processing, yet many current compression techniques do not achieve an adequate balance between compression ratio and decompression speed. In this work we propose a mechanism for storing and processing categorical information compressed at the bit level. The method compresses and decompresses by blocks, so that processing the compressed information closely resembles processing the original information. The proposed method keeps the compressed data in memory, which drastically reduces memory consumption. The experimental results show a high compression ratio, while block decompression is very efficient. Both factors contribute to building a system with good performance.
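The bit-level, block-wise idea described in the abstract can be sketched as follows: dictionary-encode a categorical column, pack each code into the minimum number of bits, and store the result in fixed-size blocks so that only the block containing a requested row needs decompressing. This is a minimal illustration under assumed details (function names, block size, and the dictionary-encoding step are not taken from the paper):

```python
# Minimal sketch: bit-level packing of categorical codes, stored and
# decompressed in fixed-size blocks. Names, the block size, and the
# dictionary-encoding step are illustrative assumptions.
import math

def pack_block(codes, bits):
    """Pack integer category codes into bytes, `bits` bits per code."""
    buf = bytearray((len(codes) * bits + 7) // 8)
    for i, code in enumerate(codes):
        for b in range(bits):
            if (code >> b) & 1:
                pos = i * bits + b
                buf[pos // 8] |= 1 << (pos % 8)
    return bytes(buf)

def unpack_block(buf, count, bits):
    """Recover `count` codes from one packed block."""
    codes = []
    for i in range(count):
        code = 0
        for b in range(bits):
            pos = i * bits + b
            if buf[pos // 8] & (1 << (pos % 8)):
                code |= 1 << b
        codes.append(code)
    return codes

def compress_column(values, block_size=1024):
    """Dictionary-encode a categorical column, then bit-pack it block by block."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    bits = max(1, math.ceil(math.log2(len(categories))))
    codes = [index[v] for v in values]
    blocks = [pack_block(codes[i:i + block_size], bits)
              for i in range(0, len(codes), block_size)]
    return categories, bits, blocks

# Only the block holding a requested row is decompressed:
values = ["red", "green", "blue", "blue", "red", "green"] * 100
categories, bits, blocks = compress_column(values, block_size=64)
row = 130
block = blocks[row // 64]                      # one 16-byte block, not 600 rows
code = unpack_block(block, 64, bits)[row % 64]
assert categories[code] == values[row]
```

With 3 categories, each value needs only 2 bits instead of a full machine word, which is the source of the memory reduction the abstract reports; per-block decompression keeps random access cheap because at most one block is unpacked per lookup.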