2.2 Analysis of Behaviors by Speaker and Audience by using Time-series Models
The behavioral features of the speaker and the audience can be summarized as follows: (i) the loudness of the speaker's speech, and, detected by image processing, (ii) the number of skin-colored pixels in the speaker's face region and (iii) the number of skin-colored pixels in the face regions of the audience. In lectures, the speaker has to talk to a large audience and immediately grasp their interest in the content being presented. Accordingly, the face movements of the audience are used here to judge whether they are interested in the lecture.
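The paper does not state how skin-colored pixels are detected or how loudness is measured. As a minimal sketch, assuming a fixed Cb/Cr box in YCbCr space (a widely used skin-color heuristic, not necessarily the authors' detector), a face rectangle `face_box` supplied by a separate face detector, and RMS amplitude as the loudness measure, the per-frame features could be computed as follows. All function names and thresholds here are hypothetical.

```python
import math

# Assumption (not from the paper): a pixel counts as skin-colored when its
# chroma lies in the box 77 <= Cb <= 127 and 133 <= Cr <= 173.
CB_RANGE = (77.0, 127.0)
CR_RANGE = (133.0, 173.0)


def rgb_to_cbcr(r, g, b):
    """Convert 8-bit RGB to (Cb, Cr) with the full-range ITU-R BT.601 formulas."""
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def count_skin_pixels(image, face_box):
    """Count skin-colored pixels inside face_box = (top, left, bottom, right).

    image is a list of rows, each row a list of (R, G, B) tuples;
    bottom and right are exclusive, as in Python slicing.
    """
    top, left, bottom, right = face_box
    count = 0
    for row in image[top:bottom]:
        for r, g, b in row[left:right]:
            cb, cr = rgb_to_cbcr(r, g, b)
            if CB_RANGE[0] <= cb <= CB_RANGE[1] and CR_RANGE[0] <= cr <= CR_RANGE[1]:
                count += 1
    return count


def rms_loudness(samples):
    """Root-mean-square amplitude of one audio frame, used as a loudness proxy."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

Applying `count_skin_pixels` to each detected face in every video frame yields the skin-pixel time series for the speaker and for each audience member, and applying `rms_loudness` to consecutive audio frames yields the loudness series.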
In this paper, we assume that the face directions of the speaker and the audience are non-stationary, that is, the characteristics of the audience's behavior change over time and with the content of the lecture. In this section, we propose a method, based on piecewise AR (auto-regressive) modeling, for extracting the "dominant sections" and the corresponding models of the speaker and the audience. Here, a "dominant section" and its model represent a change in the content presented by the speaker or in the interest of the audience. Extracting the dominant sections is therefore essential for analyzing the target lecture.
We assume that the face directions of the speaker and the audience can be modeled by the following non-stationary AR model with time-varying parameters $a_i(t)$:
$$x(t) + \sum_{i=1}^{p} a_i(t)\,x(t-i) = e(t), \tag{2}$$
where $p$ denotes the order of the AR model, and the white-noise sequence $\{e(t)\}$ has the following statistics:
$$E[e(t)] = 0, \qquad E[e(t)\,e(\tau)] = \sigma^2 \delta_{t\tau}, \tag{3}$$
where $\delta_{t\tau}$ denotes the Kronecker delta.
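To make Eqs. (2) and (3) concrete, the sketch below generates a synthetic series whose parameters $a_i(t)$ are piecewise constant, the situation that local stationary sections are meant to capture. The function name `simulate_piecewise_ar`, the zero initial conditions, and the Gaussian noise are assumptions for illustration; the paper does not specify a noise distribution beyond Eq. (3).

```python
import random


def simulate_piecewise_ar(section_params, delta, sigma, seed=0):
    """Generate x(t) from Eq. (2) with a_i(t) held constant on sections of length delta.

    section_params[k] is the list [a_1, ..., a_p] used for k*delta <= t < (k+1)*delta;
    the order p may differ between sections. Samples before t = 0 are taken as zero.
    Returns the series x and the noise sequence e that drove it.
    """
    rng = random.Random(seed)
    n = len(section_params) * delta
    x, e = [], []
    for t in range(n):
        a = section_params[t // delta]
        e_t = rng.gauss(0.0, sigma)  # zero mean, variance sigma**2, as in Eq. (3)
        # Eq. (2) rearranged: x(t) = -sum_i a_i(t) x(t - i) + e(t)
        past = sum(a_i * x[t - i] for i, a_i in enumerate(a, start=1) if t - i >= 0)
        x.append(-past + e_t)
        e.append(e_t)
    return x, e
```

For example, `simulate_piecewise_ar([[-0.8], [0.5, 0.3]], 50, 1.0)` switches from an AR(1) to an AR(2) model after 50 samples. Under the sign convention of Eq. (2), $a_1 = -0.8$ corresponds to $x(t) = 0.8\,x(t-1) + e(t)$.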
When the Yule-Walker method is applied to non-stationary time-series data, the following trade-off must be taken into account. (i) Local section too long: the statistics become more reliable, but it becomes difficult to track changes in the time-varying parameters, so these parameters are estimated less accurately. (ii) Local section too short: conversely, changes in the time-varying parameters are easy to track, but the statistics become less reliable.
It is therefore important to develop a modeling method that takes these non-stationary properties into account.
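For reference, when Eq. (2) is restricted to the $k$-th local section and its parameters are treated as constants $a_i^k$, multiplying Eq. (2) by $x(t-j)$ and taking expectations with Eq. (3) yields the standard Yule-Walker equations below. The sample autocorrelation $r_k(j)$ shown is the usual biased estimate; this choice is an assumption, since the paper does not state which estimator is used.

$$r_k(j) = \frac{1}{\Delta}\sum_{t=k\Delta+j}^{(k+1)\Delta-1} x(t)\,x(t-j), \qquad j = 0, 1, \dots, p,$$

$$\sum_{i=1}^{p} a_i^k\, r_k(|j-i|) = -r_k(j), \quad j = 1, \dots, p, \qquad \hat{\sigma}_k^2 = r_k(0) + \sum_{i=1}^{p} a_i^k\, r_k(i).$$

Each $r_k(j)$ averages at most $\Delta$ products, which is the source of the trade-off above: a longer section lowers the variance of $r_k(j)$ but blends together any changes in $a_i(t)$ that occur inside it.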
Figure 3: Observation section and local stationary section ∆ in the non-stationary time-series data.
From the statistical viewpoint, many researchers have already studied methods for estimating the time-varying parameters of AR models. These methods fall into two categories: (i) estimation methods that introduce a local stationary section (Y. Miyanaga and Hatori, 1991), (Y. Miyoshi and Kakusho, 1987), and (ii) estimation methods that introduce time-varying parameters (E. Watanabe and Mitani, 1997). In block-wise processing of non-stationary time-series data, three factors must be considered: the length of the local stationary section, the learning ability of the local stationary model, and the structure of the local stationary model. These factors are mutually dependent, and it is very difficult to determine appropriate values for them in advance.
In this paper, we propose a method for extracting the "dominant sections" and the corresponding models of the speaker and the audience based on time-series models. Whereas the authors previously proposed an extraction method for the dominant sections based on the prediction error (E. Watanabe and Kohama, 2011a), here we propose a new extraction method based on the change in the estimated parameters. The predicted value $\hat{x}(t)$ in the $k$-th local stationary section can be calculated by
$$\hat{x}(t) = -\sum_{i=1}^{p} a_i^k\, x(t-i), \tag{4}$$
where $a_i^k$ denotes the $i$-th AR parameter estimated in the $k$-th section. The prediction error $E_P^k$ of each section can then be calculated by
$$E_P^k = \frac{1}{\Delta}\sum_{t=k\Delta}^{(k+1)\Delta} \bigl(x(t) - \hat{x}(t)\bigr)^2. \tag{5}$$
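A minimal sketch of Eqs. (4) and (5), assuming the section parameters $a_i^k$ have already been estimated (for example by a per-section Yule-Walker fit). Subtracting Eq. (4) from Eq. (2) shows that if $a_i^k$ equals $a_i(t)$ throughout the section, then $x(t) - \hat{x}(t) = e(t)$, so by Eq. (3) $E_P^k$ is then an estimate of $\sigma^2$; larger values suggest a mismatch between the section's model and the data. The function names are hypothetical, samples before $t = 0$ are treated as zero, and each section is taken as the half-open range $k\Delta \le t < (k+1)\Delta$ so that no sample is counted twice (the upper limit in Eq. (5) as printed includes $t = (k+1)\Delta$).

```python
def predict(x, section_params, delta):
    """One-step prediction of Eq. (4): x_hat(t) = -sum_i a_i^k x(t - i), with k = t // delta.

    section_params[k] is [a_1^k, ..., a_p^k]; terms with t - i < 0 are dropped.
    """
    x_hat = []
    for t in range(len(section_params) * delta):
        a = section_params[t // delta]
        x_hat.append(-sum(a_i * x[t - i] for i, a_i in enumerate(a, start=1) if t - i >= 0))
    return x_hat


def section_errors(x, x_hat, delta):
    """Prediction error E_P^k of Eq. (5): mean squared residual over each section."""
    errors = []
    for start in range(0, len(x_hat), delta):
        sq = [(x[t] - x_hat[t]) ** 2 for t in range(start, start + delta)]
        errors.append(sum(sq) / delta)
    return errors
```

For instance, with `x = [1.0, 2.0, 3.0, 4.0]`, `delta = 2`, and section parameters `[[-1.0], [-2.0]]`, the predictions are `[0, 1.0, 4.0, 6.0]` and the section errors are `[1.0, 2.5]`.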
Analysis of Behaviors by Audience in Lectures by Using Time-series Models