system we chose to use stored movie clips for
simplicity. Although we currently use only one
instructor, more instructors could be involved. Our
system currently includes twelve physiotherapy
moves.
Currently, a set of movie clips showing a full view
of the instructor is used. In each clip the instructor
performs a single cycle of the move, after which
visual feedback of the instructor is displayed. In our
proposed system, if the algorithm recognizes that the
user is performing the physiotherapy move correctly,
it notifies the user that the complete movement was
performed correctly. If the user's movement is
wrong, the system notifies the user and also
indicates which part of the body moved incorrectly.
Our design is divided into the following three
procedures: 1. detection of the markers in the first
frame; 2. tracking of the markers in the remaining
frames to obtain their trajectories; 3. matching of
the motion trajectories with the stored model.
The paper is organized as follows. In Section 2
we briefly introduce, without going into detail,
previous systems related to human movement
analysis. In Section 3 we describe the method used
to detect the markers in the first frame. In Section 4
we explain how the markers are tracked through a
video sequence; tracking is divided into two
procedures, matching and prediction. Matching finds
the correspondence between the objects extracted
from two consecutive frames, and prediction limits
the search region, thereby reducing the execution
time. In Section 5 we discuss the verification of the
trajectories recorded from the marker movements
and their comparison with the stored models. A
trajectory is compared with a model by computing
the difference between their smoothed zero-crossing
potentials. Experimental results are presented in
Section 6, and conclusions in Section 7.
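The prediction and matching steps outlined above can be illustrated with a minimal Python sketch. The constant-velocity predictor, the square search window, the nearest-neighbour matching rule, the use of the first two frames to initialise the velocity, and the names `predict`, `match_in_window`, `track_marker` and `half_size` are all assumptions for illustration; this section does not specify the predictor or matching criterion actually used in Section 4.

```python
import numpy as np


def predict(prev, curr):
    """Constant-velocity prediction (assumed model): next = curr + (curr - prev)."""
    return curr + (curr - prev)


def match_in_window(predicted, candidates, half_size):
    """Index of the candidate nearest to `predicted` inside a square window of
    half-width `half_size`, or None if no candidate falls inside it."""
    best, best_dist = None, np.inf
    for i, c in enumerate(candidates):
        dx, dy = np.abs(c - predicted)
        if dx <= half_size and dy <= half_size:  # ignore points outside the window
            dist = np.hypot(dx, dy)
            if dist < best_dist:
                best, best_dist = i, dist
    return best


def track_marker(p0, p1, detections_per_frame, half_size=5.0):
    """Follow one marker given its (x, y) positions in the first two frames and
    a list of candidate (x, y) detections for each later frame (hypothetical API)."""
    traj = [np.asarray(p0, dtype=float), np.asarray(p1, dtype=float)]
    for cands in detections_per_frame:
        cands = np.asarray(cands, dtype=float)
        pred = predict(traj[-2], traj[-1])
        idx = match_in_window(pred, cands, half_size)
        # If nothing falls inside the window, keep the prediction (assumption).
        traj.append(cands[idx] if idx is not None else pred)
    return np.array(traj)


traj = track_marker((0, 0), (1, 1), [[[2, 2], [9, 9]], [[3, 3.5]], []], half_size=2)
print(traj[-1])  # [4. 5.]
```

Only candidates inside the window around the predicted position are examined, which is how prediction limits the search; a larger `half_size` tolerates faster or less regular motion at a higher cost.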
2 PREVIOUS SYSTEMS
A system called W4 (Haritaoglu, 1998) is a real-time
visual surveillance system for detecting and tracking
people and monitoring their activities in an outdoor
environment. It operates on monocular grayscale
video imagery or on video imagery from an infrared
camera.
Another system, Pfinder (Wren, 1997), is a
real-time system for tracking a person that uses a
multi-class statistical model of color and shape to
segment the person from the background scene. It
finds and tracks people's heads and hands under a
wide range of viewing conditions.
The system used in KidsRoom (Bobick, 1996)
is a tracking system based on "closed-world
regions". These are regions of space and time in
which the specific context of what is in the region
is assumed to be known. These regions are tracked
in real time in domains where object motions are
neither smooth nor rigid and where multiple objects
interact.
3 DETECTION
As mentioned earlier, 12 markers, shown in Figure 2,
are used to locate important points on the human
body. Detection of the markers in the first frame is an
important step in our proposed system. We used a
Gaussian pyramid representation in order to reduce
computation time. Each level of a Gaussian pyramid
has a lower resolution than the previous one. At each
level we performed two operations: 1) low-pass
filtering of the image, and 2) discarding the
odd-numbered rows and columns of the filtered
image. Both operations were applied to the input
image and to the template image. Low-pass filtering
is accomplished by taking a weighted average over
the 5x5 region surrounding each image pixel. For
computational efficiency, the 5x5 weighted averages
were computed with an equivalent separable
weighting function. First, convolving the image with
a 1x5 weighting function yields its 'horizontal'
weighted average. Next, convolving this result with
the transpose of the 1x5 vector (i.e. a 5x1 weighting
function) yields the 'vertical' weighted average. A
Gaussian-like weighting function with values of [.05
.25 .5 .25 .05] was used as the impulse response of
our filter. Given the size of our input frames, we
used a two-level pyramid. As an example, Figure 2
shows a frame of the video sequence at its original
size and at the lower resolution.
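One reduction step of the pyramid described above can be sketched in Python with NumPy. The border handling (edge replication), keeping the 0-based even rows and columns, and the function names `smooth_separable`, `reduce_level` and `gaussian_pyramid` are assumptions. The weights as printed sum to 1.1, so the sketch divides them by their sum to keep a flat image at the same brightness; for comparison, the classic Burt and Adelson generating kernel with a = 0.4 is [.05 .25 .4 .25 .05], which already sums to 1.

```python
import numpy as np

# Impulse response as printed in the text. These values sum to 1.1, so this
# sketch normalizes them to unit sum (an assumption, not stated in the text).
W = np.array([0.05, 0.25, 0.5, 0.25, 0.05])
W = W / W.sum()


def smooth_separable(img, w=W):
    """5x5 weighted average computed as a 1x5 pass followed by a 5x1 pass."""
    pad = len(w) // 2
    # Replicate border pixels so the output has the same size as the input.
    padded = np.pad(np.asarray(img, dtype=float), pad, mode="edge")
    # Horizontal pass: convolve each row with the 1x5 kernel.
    horiz = np.apply_along_axis(lambda r: np.convolve(r, w, mode="valid"), 1, padded)
    # Vertical pass: convolve each column with the 5x1 (transposed) kernel.
    return np.apply_along_axis(lambda c: np.convolve(c, w, mode="valid"), 0, horiz)


def reduce_level(img):
    """One pyramid level: low-pass filter, then keep every second row and column."""
    # Keeps 0-based indices 0, 2, 4, ...; whether "odd numbered" in the text
    # counts from 0 or 1 is not stated, and either choice halves the size.
    return smooth_separable(img)[::2, ::2]


def gaussian_pyramid(img, levels=2):
    """Return [full resolution, half resolution, ...]; levels=2 as in the text."""
    pyramid = [np.asarray(img, dtype=float)]
    for _ in range(levels - 1):
        pyramid.append(reduce_level(pyramid[-1]))
    return pyramid


frame = np.random.default_rng(0).random((240, 320))  # hypothetical frame size
full, half = gaussian_pyramid(frame)
print(full.shape, half.shape)  # (240, 320) (120, 160)
```

Because the symmetric 5x5 kernel is the outer product of the 1x5 vector with itself, the two 1-D passes give the same result as the direct 5x5 average with 10 instead of 25 multiplications per pixel. The same `reduce_level` call would be applied to the marker template.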
Once the resolution of the image and of the template
has been reduced by one pyramid level, the template
representing a marker is searched for in the
lower-resolution image. Once it is located, the
probable locations of the markers are found within
the 2x2 region surrounding the corresponding
positions in the original image. The cross-correlation
function is used for detection and is given by the
following relationship.
VISAPP 2006 - MOTION, TRACKING AND STEREO VISION
450