and processing requirements. Its simplicity stems
from the linearity of the associated non-adaptive
incoherent projections, which are employed for the
representation and reconstruction of sparse signals.
Recently, the framework of compressive video
sensing (CVS) was introduced as a natural extension
proposing distinct approaches for video acquisition
using a reduced amount of data, while maintaining a
similar reconstruction performance when compared
to standard video compression techniques.
Existing CVS approaches perform separate
encoding of each frame, based on a non-overlapping
block splitting to reduce the storage and computa-
tional costs, by combining full sampling of reference
frames with CS applied on non-reference frames.
Then, at the decoder, the reconstruction is performed
separately (Stanković et al., 2008), or jointly by either
considering a joint sparsity model as in (Kang and
Lu, 2009) or by designing an adaptive sparsifying
basis using neighboring blocks in previously recon-
structed frames (Do et al., 2009; Prades-Nebot et al.,
2009). The major drawbacks are that, since potential
spatio-temporal redundancies are not removed at the
encoder, the corresponding CVS methods usually
result in increased bit-rates, while also being sensitive
to the propagation of reconstruction errors along the
sequence in the case of joint decoding.
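As a rough illustration of the encoding pattern shared by these existing approaches (full sampling of reference frames, block-wise CS on the remaining frames), consider the following minimal Python sketch. It is not taken from any of the cited works: the function names, the Gaussian measurement matrix shared by all blocks, the row-by-row flattening of each block, and the fixed reference period `ref_period` are all assumptions made for illustration only.

```python
import random


def split_into_blocks(frame, B):
    """Split a square frame (list of rows) into non-overlapping B x B blocks.

    Each block is flattened row by row into a list of length B*B
    (the flattening order is an assumption of this sketch).
    """
    N = len(frame)
    if N % B != 0 or any(len(row) != N for row in frame):
        raise ValueError("frame must be square with side divisible by B")
    blocks = []
    for r in range(0, N, B):
        for c in range(0, N, B):
            blocks.append([frame[r + i][c + j]
                           for i in range(B) for j in range(B)])
    return blocks


def gaussian_matrix(M, n, rng):
    """Hypothetical M x n measurement matrix with i.i.d. N(0, 1/M) entries."""
    return [[rng.gauss(0.0, 1.0) / M ** 0.5 for _ in range(n)]
            for _ in range(M)]


def measure(phi, x):
    """Linear measurements: the matrix-vector product of phi and x."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


def encode_sequence(frames, B, M, ref_period, seed=0):
    """Keep every ref_period-th frame in full; apply block-wise CS to the rest."""
    rng = random.Random(seed)
    # Assumption: one matrix shared by all blocks and all frames.
    phi = gaussian_matrix(M, B * B, rng)
    encoded = []
    for t, frame in enumerate(frames):
        if t % ref_period == 0:
            encoded.append(("ref", frame))  # fully sampled reference frame
        else:
            blocks = split_into_blocks(frame, B)
            encoded.append(("cs", [measure(phi, x) for x in blocks]))
    return encoded
```

For instance, with 8 × 8 frames, B = 4, M = 6 and `ref_period = 2`, every second frame is stored in full, while each of the others is represented by 4 blocks × 6 measurements = 24 values instead of 64 pixels. Each frame is encoded on its own, so any redundancy between consecutive frames is left untouched at the encoder, which is the drawback noted above.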
The efficiency of typical video compression
standards, such as MPEGx, in achieving a good
tradeoff between the reconstruction quality and
the associated bit-rates, is primarily based on the
capability of removing potential spatio-temporal
redundancies by means of intra-frame transform
coding and inter-frame motion prediction. However,
an encoder with increased memory and processing
resources is required, which may be prohibitive in
a lightweight remote imaging system. On the other
hand, the use of M-JPEG, which is an intra-frame-
only video compression scheme, has the advantage of
imposing significantly lower processing and memory
requirements on the hardware, but at the cost of
significantly increasing the required bit-rate, which is
restrictive in the case of limited bandwidth.
In the present work, we address the drawbacks of
the previous CVS methods, as well as the limitations
of MPEGx and M-JPEG compression techniques, by
introducing a CVS scheme which could be integrated
in onboard video sensing devices with restricted
resources. In particular, the proposed CVS method
combines a simplified encoding process by embed-
ding a CS module in an M-JPEG-like encoder, along
with a refinement phase based on inter-frame predic-
tion at the decoder (as opposed to MPEGx, where
the prediction errors are formed at the encoder). The
idea of transferring the computational burden of the
motion estimation and compensation processes to the
decoder also appeared in (Jung and Ye, 2010)
in the framework of dynamic magnetic resonance
imaging, where an auxiliary sequence of residual
frames was generated at the decoder recursively
using a set of fully-sampled reference frames in
conjunction with the low-resolution dynamic frames.
Moreover, the required bit-rate of our proposed
encoder can be further decreased by downsampling
the non-reference frames, followed by an additional
super-resolution step at the decoder to restore the
reconstructed frames to their original resolution.
The use of super-resolution as a tool to resize the
frames to their original dimensions is motivated by
recent works on sparse representation-based image
super-resolution via dictionary learning (Freeman
et al., 2002; Yang et al., 2008; Wang et al., 2011;
Zhang et al., 2011), where it has been shown that
a super-resolution method results in images with
superior quality when compared to the commonly
used 2-dimensional interpolation schemes (e.g.,
bilinear, bicubic, spline).
The paper is organized as follows: in Section 2,
the model for the compressed measurements acqui-
sition is introduced. Section 3 describes in detail
the proposed CVS architecture, while a performance
evaluation is carried out in Section 4. Finally,
conclusions and further extensions are outlined in
Section 5.
2 CS MEASUREMENTS MODEL
In the following, we consider for convenience the case
of square N × N frames, although the proposed approach
extends straightforwardly to the general
non-square case. The main disadvantage when we
deal with a remote imaging device with limited capa-
bilities, as mentioned in Section 1, is the high memory
and computational expense when we operate at high
resolutions. This drawback can be alleviated by pro-
ceeding in a block-wise fashion. More specifically, in
the proposed CVS system, each frame is divided into
equally sized B × B non-overlapping blocks. Then, a
measurement vector $g_j$, $j = 1, \ldots, n_B$, is generated for
each one of the $n_B$ blocks via a simple linear model as
follows,
$$g_j = \Phi_j x_j, \tag{1}$$
where $\Phi_j \in \mathbb{R}^{M \times B^2}$ is a suitable measurement matrix
($M \ll B^2$) and $x_j \in \mathbb{R}^{B^2}$ denotes the $j$-th block
of frame $x$, reshaped as a column vector. Although,
in general, a distinct measurement matrix can be as-
DESIGN OF A COMPRESSIVE REMOTE IMAGING SYSTEM COMPENSATING A HIGHLY LIGHTWEIGHT
ENCODING WITH A REFINED DECODING SCHEME