mentioned MVC schemes improve the RD performance, the corresponding bit-rate is increased when sending views individually. In this paper, we present a modified inter-view prediction MVC scheme from the perspective of the viewer’s interactivity. The proposed MVC scheme reduces the bit-rate required to retrieve the requested views (i.e., interactivity) with comparable RD performance.
This paper is structured as follows. Section 2
presents the proposed MVC scheme, showing its relation to the views’ interactivity. Data set description, implementation setup, experiments and results
are presented in Section 3. Finally, conclusions are
given in Section 4.
2 THE PROPOSED MVC SCHEME
In this section, we first give a brief background on the
standard inter-view prediction MVC scheme (Vetro
et al., 2008). Then, we extend the mathematical relationship between the standard MVC scheme and the views’ interactivity in the IMVS system to finally present the proposed MVC scheme.
The standard MVC-HBP scheme has coding efficiency advantages over other schemes (Merkle et al., 2007) in terms of RD performance. An example with eight linearly arranged cameras and a group of pictures (GOP) length of 8 (for simplicity) is shown in Fig. 1(a). This scheme first uses inter-view prediction to provide P pictures for the even camera views ($S_2$, $S_4$ and $S_6$) at $T_0$ and $T_8$ of each GOP from the base view $S_0$. The rest of the pictures in the even camera views are predicted with hierarchical B pictures in the temporal direction (Schwarz et al., 2006). The odd camera views ($S_1$, $S_3$ and $S_5$), in contrast, are obtained by combining inter-view prediction from the two adjacent even views with a hierarchical B coding structure in the temporal direction. For an even number of views, the last view is a special case for prediction: $S_7$ is coded as shown in Fig. 1(a), starting with an inter-view predicted P-frame, followed by hierarchical B-frames, which are also inter-view predicted from the previous view. This coding scheme can be applied to any multiview sequence with more than two views.
In IMVS systems, a user only needs to receive the requested views. The IMVS server therefore extracts the requested views from the whole set of frames, together with their reference frames from other views. The server then sends those frames as a subset in one bit-stream to the user via the network. To retrieve the $i$-th view, $V_i$, the required extracted frames (EF) for one GOP can be generally formulated as

$$EF = I + n \times P + m \times B, \tag{1}$$

where $I$, $P$, and $B$ are the frames related to that specific view, and $n$ and $m$ denote the numbers of P-frames and B-frames, respectively, which depend on the view’s location within the whole set. To determine $n$ and $m$ for extracting $V_i$ in the MVC-HBP scheme for one GOP, with the base view set to $S_0$ as shown in Fig. 1(a), (1) can be rewritten as
$$EF_i = I + \alpha_i P + [(2k+1)\beta_i - (k+1)\delta_i + k]B, \tag{2}$$
where $i \in \{0, 1, 2, \ldots, N-1\}$ denotes the view number, $N$ denotes the number of views, and $[\cdot]$ denotes the number of B-frames. The terms $\alpha_i$, which denotes the number of P-frames, $\beta_i$, which indicates whether $V_i$ is an odd or an even view, and $\delta_i$, which indicates whether $V_i$ is an edge view, are obtained as
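$$\alpha_i = \lceil i/2 \rceil, \quad \forall\, i \in \{0, 1, 2, \ldots, N-1\}, \tag{3}$$

$$\beta_i = i \bmod 2, \quad \forall\, i \in \{0, 1, 2, \ldots, N-1\}, \tag{4}$$

$$\delta_i = i \bmod 2, \quad \forall\, i \in \{0, N-1\}. \tag{5}$$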
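The frame counts in (2)-(5) can be evaluated mechanically. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `extracted_frames` and its parameter names are illustrative, $k$ is not defined in this excerpt and is treated as a free parameter, and $\delta_i$ is taken to be 0 for non-edge views because (5) is stated only for $i \in \{0, N-1\}$.

```python
def extracted_frames(i, n_views, k):
    """Return (I, P, B) frame counts for retrieving view i with base view S0.

    Follows Eqs. (2)-(5). `k` is left as a parameter because the text
    does not define it in this section (assumption).
    """
    if not 0 <= i < n_views:
        raise ValueError("view index out of range")
    alpha = (i + 1) // 2          # Eq. (3): ceil(i / 2) for non-negative integers
    beta = i % 2                  # Eq. (4): 1 for odd views, 0 for even views
    # Eq. (5) covers only the edge views; delta = 0 elsewhere is an assumption.
    delta = i % 2 if i in (0, n_views - 1) else 0
    n_b = (2 * k + 1) * beta - (k + 1) * delta + k   # bracketed term of Eq. (2)
    return 1, alpha, n_b


if __name__ == "__main__":
    # Example values only: 8 views and k = 7 are assumptions for illustration.
    for view in range(8):
        print(view, extracted_frames(view, n_views=8, k=7))
```

With these example values, the last view ($i = 7$) needs more P-frames than any other view, which is the cost that a different choice of base view can lower.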
The base view does not have to be $S_0$, as it is in the standard MVC-HBP scheme (Merkle et al., 2007). The symbol $i$ in (3), (4) and (5) can therefore be replaced with $|B_v - i|$, where $B_v \in \{0, 1, 2, \ldots, N-1\}$ denotes the number of the base view. Thus, (3), (4) and (5) can be rewritten as
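$$\alpha_x = \lceil |x|/2 \rceil, \quad \forall\, i \in \{0, 1, 2, \ldots, N-1\}, \tag{6}$$

$$\beta_x = |x| \bmod 2, \quad \forall\, B_v, i \in \{0, 1, 2, \ldots, N-1\}, \tag{7}$$

$$\delta_x = |x| \bmod 2, \quad \forall\, B_v, i \in \{0, N-1\}, \tag{8}$$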
where $x = B_v - i$. Accordingly, (2) can be rewritten as

$$EF_{(B_v-i)} = I + \alpha_{(B_v-i)} P + [(2k+1)\beta_{(B_v-i)} - (k+1)\delta_{(B_v-i)} + k]B. \tag{9}$$
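As a worked instance of (9), consider the assumed setting $N = 8$ and $B_v = 4$. The derivation below substitutes (6)-(8) into (9) for two views; it assumes that $\delta$ is 0 for a view that is not an edge view, since (8) is stated only for $i \in \{0, N-1\}$.

```latex
% Assumed setting: N = 8 views, base view B_v = 4.
\begin{align*}
% View i = 1 (not an edge view): x = B_v - i = 3
\alpha_{(3)} &= \lceil 3/2 \rceil = 2, \quad
\beta_{(3)} = 3 \bmod 2 = 1, \quad
\delta_{(3)} = 0 \ \text{(assumption)}, \\
EF_{(3)} &= I + 2P + [(2k+1) + k]B = I + 2P + (3k+1)B, \\[4pt]
% View i = 7 (edge view): x = B_v - i = -3, |x| = 3
\alpha_{(-3)} &= \lceil 3/2 \rceil = 2, \quad
\beta_{(-3)} = 3 \bmod 2 = 1, \quad
\delta_{(-3)} = 3 \bmod 2 = 1, \\
EF_{(-3)} &= I + 2P + [(2k+1) - (k+1) + k]B = I + 2P + 2kB.
\end{align*}
```

For comparison, (3) gives $\alpha_7 = \lceil 7/2 \rceil = 4$ for the same edge view when the base view is $S_0$, so moving the base view to $S_4$ halves the number of P-frames needed for that view in this example.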
In (9), it can be noticed that the only term that can be decreased is $\alpha_{(B_v-i)}$, which sets the number of required extracted P-frames, since $\beta_{(B_v-i)}$ and $\delta_{(B_v-i)}$ are binary. Decreasing it reduces the bit-rate needed to retrieve a specific view $V_i$. To decrease the value of $\alpha_{(B_v-i)}$, we propose setting $B_v$ to $\lceil \mathrm{median}\{0, 1, 2, \ldots, N-1\} \rceil$, as shown in Fig. 1(b), which improves the view’s interactivity. Table 2 shows the number of extracted frames needed to retrieve a specific view for one GOP with different base views, computed using (9). Setting $B_v$ to 3 or 4 yields the minimum number of extracted frames. Therefore, in the rest of this paper, the number of views $N$ is set to 8 and $B_v$ is set to 4 (i.e., the base view is set to $S_4$).
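The effect of the base-view choice on the P-frame term can be checked with a short sweep. This is a sketch under stated assumptions rather than a reproduction of Table 2: it evaluates only (6), sums the P-frame counts over all views as an illustrative aggregate (the aggregation is an assumption), and the function names are hypothetical.

```python
from math import ceil
from statistics import median


def p_frames(base_view, view):
    """Eq. (6): P-frames needed to reach `view` from `base_view`."""
    return (abs(base_view - view) + 1) // 2   # ceil(|x| / 2) for integers


def total_p_frames(base_view, n_views):
    """Sum of Eq. (6) over all views (illustrative aggregate, an assumption)."""
    return sum(p_frames(base_view, i) for i in range(n_views))


def best_base_views(n_views):
    """Return the base views with the smallest total, plus all totals."""
    totals = {b: total_p_frames(b, n_views) for b in range(n_views)}
    smallest = min(totals.values())
    return [b for b, t in totals.items() if t == smallest], totals


N = 8                                    # number of views used in the paper
best, totals = best_base_views(N)
proposed = ceil(median(range(N)))        # ceil(median{0, ..., N-1})

print("total P-frames per base view:", totals)
print("base views with the minimum:", best)
print("proposed base view:", proposed)
```

For $N = 8$ the sweep returns base views 3 and 4 as the minimizers, and the ceiling of the median selects 4, which agrees with the choice stated in the text.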
The proposed scheme, shown in Fig. 1(b), first
uses the inter-view prediction to provide P-frames for
VISAPP 2013 - International Conference on Computer Vision Theory and Applications