where $W_i$ is the weighting value assigned to the $i$-th frame based on global motion and $w_{ij}$. Since the metric was developed using the VQEG Phase I test data, which consists of larger frame sizes (SD resolutions, 525-line and 625-line) than the QCIF used in this paper, a modified VSSIM has been used in the proposed solution to adapt it to the smaller resolution. This is accomplished by scaling the weighting coefficient $K_M$, used to calculate $W_i$, and its connected thresholds by a factor of 8, from 16 to 2 (Lu et al., 2004).
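This adaptation amounts to a plain division of the coefficient and its thresholds. The sketch below is a minimal illustration with hypothetical variable names; only the coefficient value (16 scaled to 2) and the scaling factor are stated by Lu et al. (2004), so the threshold values shown are placeholders:

```python
# Minimal sketch of the VSSIM resolution adaptation (hypothetical names).
SCALE = 8                  # scaling factor from SD resolutions to QCIF
K_M_SD = 16                # weighting coefficient used at SD resolutions
K_M_QCIF = K_M_SD / SCALE  # becomes 2, as used in the proposed solution

# The thresholds connected to K_M are scaled the same way; the values
# below are placeholders, not the actual thresholds of Lu et al. (2004).
thresholds_sd = [8, 48]
thresholds_qcif = [t / SCALE for t in thresholds_sd]
```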
NTIA VQM, the National Telecommunications and Information Administration's general-purpose Video Quality Model, is a reduced-reference method consisting of a linear combination of seven objective parameters that measure the perceptual effects of a wide range of impairments such as blurring, block distortion, jerky/unnatural motion, noise (in both the luminance and chrominance channels), and error blocks (Pinson and Wolf, 2004). The perceptual impairment is calculated using comparison functions that have been developed to model visual masking of spatial and temporal impairments. Some features use a comparison function that computes a simple Euclidean distance between two original and two processed feature streams, but most features use either the ratio comparison function or the log comparison function. The VQM general model was included in the Video Quality Experts Group (VQEG) Phase II Full Reference Television (FR-TV) tests (VQEG, 2008).
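For illustration, the general form of these comparison functions can be sketched as follows. The exact parameter and function definitions are specified in (Pinson and Wolf, 2004); the formulas below should be read as assumptions about the common ratio/log/Euclidean forms rather than as the model itself:

```python
import math

def ratio_comparison(f_orig, f_proc):
    # Relative change of a processed feature value w.r.t. the original.
    return (f_proc - f_orig) / f_orig

def log_comparison(f_orig, f_proc):
    # Logarithmic comparison of processed and original feature values.
    return math.log10(f_proc / f_orig)

def euclidean_comparison(o1, o2, p1, p2):
    # Euclidean distance between two original and two processed
    # feature streams, as used by some features.
    return math.sqrt((p1 - o1) ** 2 + (p2 - o2) ** 2)
```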
PEVQ, the Perceptual Evaluation of Video Quality from Opticom, calculates measures from the differences in the luminance and chrominance domains between corresponding frames. Motion information is also used in forming the final measure (PEVQ, 2008). PEVQ has been developed for low bitrates and resolutions such as CIF (352 × 288) and QCIF (176 × 144). PEVQ is a proposed candidate for standardization of an FR video model within VQEG, which is in the process of starting the corresponding verification tests.
4 THE MATHEMATICAL MODEL
The problem can be presented as an observation matrix, $X = [\mathbf{x}_1 \, \mathbf{x}_2 \cdots \mathbf{x}_N]$, where $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N$ are feature vectors generated with different video content and codec setups. Each feature vector $\mathbf{x}_n$ consists of extracted codec parameters denoted $x_1, x_2, \ldots, x_K$. The corresponding quality measures for the different video content, PSNR, PEVQ, SSIM, VSSIM, and NTIA VQM, then correspond to the desired $Y = [y_1 \, y_2 \cdots y_N]$. $X$ and $Y$ can be viewed as training data for a classification, mapping, or regression problem. It is desired to find a function $Z = f(\mathbf{x})$ that maps the given values in $\mathbf{x}$ to a specific value $Z$, e.g. an estimate of PSNR.
There are several different models that can solve this problem, with varying computational complexity. Because a low-complexity solution is required to allow an implementation in a mobile device, multi-linear regression is selected.
The multi-linear model is formulated as:

$$Y = \beta x + \varepsilon \qquad (8)$$

where $\varepsilon$ represents the unpredicted variation. The multi-linear regression estimates the values for $\beta$, denoted $\hat{\beta}$, that can be used to predict $Z$ as

$$\hat{Z} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \ldots + \hat{\beta}_K x_K \qquad (9)$$
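As a concrete illustration, $\hat{\beta}$ can be estimated with ordinary least squares. The sketch below uses NumPy and placeholder data, since the actual extracted codec parameters and quality scores are described elsewhere in the paper:

```python
import numpy as np

# Placeholder training data: N feature vectors (rows) with K extracted
# codec parameters each, and the corresponding true quality scores Y
# (e.g. PSNR values computed for the training sequences).
N, K = 168, 5
X = np.random.rand(N, K)
Y = np.random.rand(N)

# Prepend a column of ones so that beta_hat[0] is the intercept in Eq. (9).
X1 = np.hstack([np.ones((N, 1)), X])

# Least-squares estimate of the regression coefficients.
beta_hat, *_ = np.linalg.lstsq(X1, Y, rcond=None)

def predict(x):
    # Z_hat = beta_0 + beta_1*x_1 + ... + beta_K*x_K, as in Eq. (9).
    return beta_hat[0] + beta_hat[1:] @ x
```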
4.1 Predicted Metric Evaluation
To evaluate the accuracy of the predicted metric, the Pearson linear correlation coefficient is used. It is defined as follows:
$$r_P = \frac{\sum_i (\hat{Z}_i - \hat{Z}_{\mathrm{mean}})(Z_i - Z_{\mathrm{mean}})}{\sqrt{\sum_i (\hat{Z}_i - \hat{Z}_{\mathrm{mean}})^2} \, \sqrt{\sum_i (Z_i - Z_{\mathrm{mean}})^2}} \qquad (10)$$
where $\hat{Z}_{\mathrm{mean}}$ and $Z_{\mathrm{mean}}$ are the mean values of the estimated and true data sets, respectively, and $\hat{Z}_i$ and $Z_i$ are the estimated and true data values for each sequence. This assumes a linear relation between the data sets.
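Eq. (10) maps directly to a few lines of code; a minimal sketch (for two variables this is equivalent to NumPy's built-in np.corrcoef):

```python
import numpy as np

def pearson(z_hat, z):
    # Pearson linear correlation coefficient r_P from Eq. (10).
    z_hat = np.asarray(z_hat, dtype=float)
    z = np.asarray(z, dtype=float)
    num = np.sum((z_hat - z_hat.mean()) * (z - z.mean()))
    den = np.sqrt(np.sum((z_hat - z_hat.mean()) ** 2)) * \
          np.sqrt(np.sum((z - z.mean()) ** 2))
    return num / den
```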
5 VIDEO SOURCE SEQUENCES
To generate training and verification data, sequences with different characteristics (amount of motion, color, heads, animations) were used. The source sequences had QCIF (176 × 144) resolution and were generated with different frame rates (30, 15, 10, and 7.5 frames per second (fps)) and bitrates (approximately 30, 40, 50, 100, 150, and 200 kilobits per second (kbps)). The video sequences were approximately 3 seconds long (90, 45, 30, and 23 frames) and were encoded with the H.264/MPEG-4 AVC reference software, version 12.2, provided by the JVT (JVT, 2000), using the baseline profile.
The sequences for training were Foreman, Cart, Mobile, Shine, Fish, Soccer goal, and Car Phone, resulting in 168 training sequences. For verification, five different parts from a cropped version of the 3G sequence were used, where the five parts have different characteristics. The cropping was made to QCIF without the original letter-box aspect ratio. Varying the bitrate and the frame rate in the same way as for the training data results in 120 verification sequences.