2 RELATED WORK
To benchmark our results, we use LZMA and BZIP-2.
Among the compression algorithms that have been
applied to power quality data, LZMA and BZIP-2 are
considered among the best (Techeou et al. 2013;
Kraus 2009; Azoff 1994). The features of these
algorithms, along with selected other compression
methods, are briefly described below.
2.1 LZMA
LZMA is a lossless, dictionary-based compression
algorithm that uses a large dictionary. The basic idea
is to use dictionary data structures to identify repeated
symbol sequences within the original file and then
encode these repeated sequences with a range encoder.
LZMA is a variant of the LZ family that differs in the
construction and size of the dictionary used for
compression. As illustrated in Section 4, LZMA
compresses a smart grid data archive by up to 83
percent on average.
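As a hedged illustration of treating LZMA as an off-the-shelf dictionary coder, the sketch below uses Python's standard `lzma` module with an explicit LZMA2 filter chain so that the dictionary size is visible. The CSV record layout produced by the hypothetical `make_readings` helper, the 4 MiB dictionary, and preset 9 are all assumptions chosen for the example, not the settings used in this paper.

```python
import lzma
import random

# Assumed settings: LZMA2 with preset 9 defaults, but an explicit 4 MiB dictionary.
FILTERS = [{"id": lzma.FILTER_LZMA2, "preset": 9, "dict_size": 1 << 22}]

def make_readings(n: int, seed: int = 0) -> bytes:
    """Hypothetical smart-meter CSV: timestamp, meter id, reading (synthetic data)."""
    rng = random.Random(seed)
    lines = []
    for i in range(n):
        hour, minute = (i // 60) % 24, i % 60
        lines.append(f"2013-01-01 {hour:02d}:{minute:02d}:00,M017,{rng.uniform(10, 15):.2f}")
    return "\n".join(lines).encode()

def lzma_saving(data: bytes) -> float:
    """Return the fraction of bytes saved by LZMA (1 - compressed/original)."""
    packed = lzma.compress(data, format=lzma.FORMAT_XZ, filters=FILTERS)
    assert lzma.decompress(packed) == data  # lossless round trip
    return 1 - len(packed) / len(data)

data = make_readings(1000)
print(f"LZMA saved {lzma_saving(data):.1%} on {len(data)} bytes of synthetic data")
```

The percentage printed for this synthetic data says nothing about the archives evaluated in Section 4; it only shows how such a saving figure is computed.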
2.2 BZIP-2
We use this library because of its prominence in
compressing power quality data (Kraus et al. 2009).
BZIP-2 is a block-sorting compressor that combines
the Burrows-Wheeler block-sorting transform with
Huffman coding. It offers relatively faster
compression and decompression than the
conventional LZ77 algorithm, the precursor of
LZMA (LZMA SDK web. 2014). However, BZIP-2
requires more computing resources because of its
block-sorting mechanism. As illustrated in Section 4,
BZIP-2 achieves 79 percent compression on average.
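To show why block sorting helps a subsequent entropy coder, the sketch below implements a naive Burrows-Wheeler transform by sorting all rotations of the input, plus its inverse, and then round-trips data through Python's standard `bz2` module. The naive O(n² log n) rotation sort and the `$` sentinel are simplifications for illustration; bzip2 itself uses an efficient block-sorting routine and additional stages between the transform and Huffman coding.

```python
import bz2

def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column."""
    assert sentinel not in text
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

def inverse_bwt(last: str, sentinel: str = "$") -> str:
    """Rebuild the sorted rotation table column by column, then pick the original row."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(c + row for c, row in zip(last, table))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]

print(bwt("banana"))  # "annb$aa": equal symbols are grouped together

# bzip2 itself is used here as a black box from the standard library.
payload = b"2013-01-01 00:00:00,M017,12.50\n" * 100
packed = bz2.compress(payload, compresslevel=9)
assert bz2.decompress(packed) == payload
```

Grouping runs of equal symbols (the `nn` and `aa` in the output) is what lets the later coding stages in bzip2 exploit local redundancy.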
2.3 Differential Compression
This technique exploits the similarity between
consecutive readings. It examines the relation
between the previous value and the current value
and, based on this relation, decides how to store the
current value in compressed form, which reduces the
overall data size. The technique is useful for data sets
whose readings do not vary much, and in ideal
situations it can achieve up to 98% compression
(Cormack et al. 1987).
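One common concrete form of this idea is delta encoding, sketched below as an assumption rather than as the specific scheme of Cormack et al.: each integer reading is replaced by its difference from the previous reading, and the small differences are then packed with a zigzag variable-length integer code (a standard packing choice, assumed here) so that slowly varying data takes fewer bytes. The `readings` values are hypothetical.

```python
from itertools import accumulate
from typing import List

def delta_encode(values: List[int]) -> List[int]:
    """Keep the first value, then store each reading as a difference from the previous one."""
    return values[:1] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas: List[int]) -> List[int]:
    """Running sum of the deltas restores the original readings."""
    return list(accumulate(deltas))

def zigzag(n: int) -> int:
    """Map signed integers to unsigned ones: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return n << 1 if n >= 0 else ((-n) << 1) - 1

def varint_bytes(values: List[int]) -> bytes:
    """Pack values 7 bits per byte; small magnitudes need only one byte."""
    out = bytearray()
    for v in map(zigzag, values):
        while True:
            byte = v & 0x7F
            v >>= 7
            if v:
                out.append(byte | 0x80)  # more bytes follow
            else:
                out.append(byte)
                break
    return bytes(out)

readings = [1000, 1002, 1001, 1001, 1003]  # hypothetical integer meter readings
deltas = delta_encode(readings)            # [1000, 2, -1, 0, 2]
print(len(varint_bytes(readings)), len(varint_bytes(deltas)))  # 10 6
```

The fewer the variations between readings, the more deltas fit in a single byte, which matches the observation above that the technique suits data with little variation.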
2.4 PPM
Prediction by partial matching (PPM) is a technique
that uses statistical modelling of the data to predict
upcoming symbols and compresses the data
accordingly. In PPM, predictions are essentially
rankings of candidate symbols. The technique uses the
previous n symbols (the context) to build the
statistical model; if no prediction is possible from the
previous n symbols, the previous n-1 symbols are
used, and so on with progressively shorter contexts.
This is repeated for each symbol until the end of the
data. The results in (Moffat 1990) show that the
technique compresses English text to 2.4 bits per
character at a very high speed.
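The context fallback described above can be sketched as follows. This is a simplified, prediction-only model under stated assumptions: it ranks candidate symbols by frequency in the longest context that has been seen and escapes to shorter contexts otherwise, but it omits the escape probabilities and arithmetic coder that a real PPM compressor uses. The class name `ContextModel` and the order limit are hypothetical.

```python
from collections import Counter, defaultdict
from typing import List, Tuple

class ContextModel:
    """Simplified PPM-style predictor (no arithmetic coding, no escape probabilities)."""

    def __init__(self, max_order: int):
        self.max_order = max_order
        self.counts = defaultdict(Counter)  # context string -> counts of following symbols

    def predict(self, history: str) -> Tuple[int, List[str]]:
        # Try the longest context first and fall back to shorter ones.
        for order in range(self.max_order, -1, -1):
            if order > len(history):
                continue
            ranking = self.counts.get(history[len(history) - order:])
            if ranking:
                return order, [sym for sym, _ in ranking.most_common()]
        return -1, []  # nothing seen yet

    def update(self, history: str, symbol: str) -> None:
        # Record the symbol under every context length from 0 to max_order.
        for order in range(min(self.max_order, len(history)) + 1):
            self.counts[history[len(history) - order:]][symbol] += 1

model = ContextModel(max_order=2)
text = "abracadabra"
hits = 0
for i, ch in enumerate(text):
    _, ranking = model.predict(text[:i])
    hits += bool(ranking) and ranking[0] == ch  # top-ranked symbol was correct
    model.update(text[:i], ch)
print(hits, model.predict("abr"))
```

A real PPM coder would turn each ranking into probabilities and pay an escape cost for every fallback step; frequently correct top-ranked predictions are what allow it to spend few bits per character.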
Our proposed strategy applies a pre-processing step
that converts a smart grid archive into a format best
suited to the LZMA and BZIP2 compression methods,
which greatly improves the results they produce.
Hence, in the overall compression pipeline, LZMA
and BZIP2 are treated as black boxes. This
pre-processing greatly improves both the overall
compression time and the compression efficiency.
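The black-box arrangement can be sketched as a pipeline in which the compressor is just a function passed in. The `preprocess` and `postprocess` functions below are hypothetical placeholders, not the transformation of Section 3: purely as an assumption, they drop timestamps when the sampling interval is fixed and store only a start time, interval, and record count. The record layout and sample data are likewise invented for illustration.

```python
import bz2
import lzma
from typing import Callable, List, Tuple

Record = Tuple[int, str, str]  # assumed layout: (unix timestamp, meter id, reading text)

def to_csv(records: List[Record]) -> bytes:
    return "\n".join(f"{t},{m},{r}" for t, m, r in records).encode()

def preprocess(records: List[Record]) -> bytes:
    """Hypothetical step: replace fixed-interval timestamps with start, step, count."""
    times = [t for t, _, _ in records]
    steps = {b - a for a, b in zip(times, times[1:])}
    if len(steps) != 1:
        raise ValueError("needs at least two records with a fixed sampling interval")
    header = f"{times[0]},{steps.pop()},{len(records)}"
    return (header + "\n" + "\n".join(f"{m},{r}" for _, m, r in records)).encode()

def postprocess(blob: bytes) -> List[Record]:
    """Inverse of preprocess: regenerate timestamps from start and step."""
    header, *rows = blob.decode().split("\n")
    start, step, count = map(int, header.split(","))
    out = [(start + i * step, *row.split(",")) for i, row in enumerate(rows)]
    assert len(out) == count
    return out

def compress_archive(records: List[Record], compress: Callable[[bytes], bytes]) -> bytes:
    return compress(preprocess(records))  # the compressor is an opaque black box

def decompress_archive(blob: bytes, decompress: Callable[[bytes], bytes]) -> List[Record]:
    return postprocess(decompress(blob))

records = [(1356998400 + 60 * i, "M017", f"{12 + (i % 5) * 0.25:.2f}") for i in range(1000)]
for comp, decomp in [(lzma.compress, lzma.decompress), (bz2.compress, bz2.decompress)]:
    blob = compress_archive(records, comp)
    assert decompress_archive(blob, decomp) == records
    print(comp.__module__, len(to_csv(records)), len(blob))
```

Because the compressor enters only as a callable, swapping LZMA for BZIP2 requires no change to the pre-processing code.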
3 METHODOLOGY
In this section, we describe the methodology for
pre-processing the smart grid data archive. Each
smart meter record consists of a valid timestamp,
a meter ID and the corresponding current reading
(Nabeel et al. 2013). For high-frequency smart
grid data, the time series is the component that
takes up the most space (Azoff 1994). Figure 1
plots thirty sets of smart grid data. These data
show that the time series accounts for up to 70%
of the total smart grid data and is therefore the
natural target for compression to reduce the
overall data size.
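A share such as the one plotted in Figure 1 can be computed as the fraction of characters occupied by the timestamp field. The sketch below assumes comma-separated records with the timestamp in the first field; the helper name `timestamp_share` and the sample record are hypothetical.

```python
from typing import Iterable

def timestamp_share(lines: Iterable[str], ts_field: int = 0, sep: str = ",") -> float:
    """Fraction of all characters (separators included) taken by the timestamp field."""
    ts_chars = total_chars = 0
    for line in lines:
        ts_chars += len(line.split(sep)[ts_field])
        total_chars += len(line)
    return ts_chars / total_chars if total_chars else 0.0

sample = ["2013-01-01 00:00:00,M017,12.50"]  # hypothetical record: 19 of 30 characters
print(f"{timestamp_share(sample):.1%}")
```

For this single record the timestamp takes 19 of 30 characters, so even a short record is dominated by its time component.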
A basic property of smart grid data that led
to our enhanced technique is the time interval
in time series data. Although the time interval
may vary from one smart grid application to
another, for a particular application, such as
load forecasting, the time interval between
consecutive readings is fixed (Nabeel et
al. 2013). To demonstrate how strictly this
time interval property holds, Figure 2 plots
the number of readings for which the time
interval property
Enhanced LZMA and BZIP2 for Improved Energy Data Compression