Figure 4: Sample frames of the distance transformed space
time shape formed from 10 silhouettes.
Figure 5: Sample frames of the gradient of the distance
transformed shape.
shapes with axes x, y and t. The space time shapes are
shown in Figure 3.
4.2 Computation of Euclidean Distance
Transform
To segment the space time shape, the 3D distance
transform based on the Euclidean distance (Felzenszwalb
and Huttenlocher, 2004) is computed.
This transformation assigns to an interior voxel a value
proportional to the Euclidean distance between
that voxel and the nearest boundary voxel. The
computation uses a three-pass algorithm in which
each pass performs a raster scan of the entire
space time shape along one dimension using a 1D mask.
The minimum distance calculation is done by
finding the lower envelope of a set of parabolas,
where each parabola is defined on the basis of the
Euclidean distance between two points
(Felzenszwalb and Huttenlocher, 2004). The
intermediate distance transform values computed in the
first pass are based directly on the Euclidean distance.
In the next two passes, the distance transform values
are computed from the set of parabolas defined on the
boundary voxels in the respective dimension. This
type of distance transform is given by
D_f(p) = min_{q ∈ B} ((p − q)² + f(q))   (2)
where p is a non-boundary point, q is a boundary
point, B is the boundary and f(q) is the value of the
distance measure between points p and q. For every
q ∈ B, the distance transform is bounded by the
parabola rooted at (q, f(q)). In short, the distance
transform value at point p is the minimum of the
lower envelope of the parabolas formed from every
boundary point q. The distance transformation of the
space time shape is shown in Figure 4. It is seen
that the area covered by the torso of the body
has higher values than the area covered by the limbs.
By varying the aspect ratio, the axes x, y and t
can be given different emphasis in the computed
distance transform.
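The three passes described above each reduce to the same 1D operation: taking the lower envelope of parabolas rooted at the values produced by the previous pass. The following is a minimal sketch of that computation; the function names dt_1d and dt_3d are ours, and treating non-shape voxels as zero-distance sources is an assumption, not the paper's exact implementation:

```python
import numpy as np

INF = 1e20

def dt_1d(f):
    """1D squared-distance transform of sampled function f via the
    lower envelope of parabolas (Felzenszwalb & Huttenlocher, 2004)."""
    n = len(f)
    d = np.empty(n)
    v = np.zeros(n, dtype=int)   # roots of parabolas in the envelope
    z = np.empty(n + 1)          # boundaries between envelope parabolas
    k = 0
    z[0], z[1] = -INF, INF
    for q in range(1, n):
        # intersection of the parabola rooted at (q, f[q]) with the
        # rightmost parabola currently in the lower envelope
        s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
        while s <= z[k]:
            k -= 1               # pop parabolas hidden by the new one
            s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
        k += 1
        v[k] = q
        z[k], z[k + 1] = s, INF
    k = 0
    for q in range(n):           # read distances off the lower envelope
        while z[k + 1] < q:
            k += 1
        d[q] = (q - v[k]) ** 2 + f[v[k]]
    return d

def dt_3d(mask):
    """Squared Euclidean distance transform of a 3D binary shape:
    one 1D raster-scan pass along each of the x, y and t axes."""
    f = np.where(mask, INF, 0.0)  # 0 at non-shape voxels, INF inside
    for axis in range(3):
        f = np.apply_along_axis(dt_1d, axis, f)
    return f
```

Applying dt_1d once per axis yields the squared Euclidean distance transform of the whole space time volume in three raster scans, matching the per-dimension passes described above.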
4.3 Segmentation of the 3D Space-time
Shape
Human actions are distinguished by the variation
of the silhouette, and this variation is greater along
the limbs than in the torso. A better representation
of the space time shape is therefore required, one that
emphasizes fast moving parts so that the extracted
features capture the variation needed to represent the
action. Thus, a normalized gradient of the distance
transform is used and, as shown in Figure 5, fast
moving parts such as the limbs have higher values
than the torso region. The gradient of the space time
shape φ(x,y,t) (Blank et al., 2005) is defined as
φ(x,y,t) = U(x,y,t) + K₁·∂²U/∂x² + K₂·∂²U/∂y² + K₃·∂²U/∂t²   (3)
Figure 6: 8-Level segmentation of sample frames of the
space time shape of the “Jumping-Jack” action.
where U(x,y,t) is the distance transformed space time
shape and K_i is the weight applied to the derivative
taken along the i-th axis. The weights associated with the
gradients along each of the axes are usually kept the
same. It is seen that the desired variation emerges
when the time axis is given more emphasis. The fast
moving parts, in this case the hands and legs, have
high values; the region surrounding the torso, which
moves less quickly, has moderate values; and the torso
region, which moves very slowly relative to the limbs,
has very low values. Moreover, this representation
also contains a concatenation of silhouettes from
the previous frame onto the current frame due to the
ACTION RECOGNITION BASED ON MULTI-LEVEL REPRESENTATION OF 3D SHAPE
381
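Equation (3) can be sketched in a few lines. This is an illustration under our own assumptions — central finite differences for the second derivatives, equal default weights, and min-max normalization — since the paper does not specify its discretization:

```python
import numpy as np

def weighted_gradient_emphasis(U, k1=1.0, k2=1.0, k3=1.0):
    """Emphasized space-time shape of Eq. (3):
    phi = U + K1*Uxx + K2*Uyy + K3*Utt, computed with central
    finite differences and then normalized to [0, 1]."""
    # second derivatives along x, y and t (array axes 0, 1, 2)
    uxx = np.gradient(np.gradient(U, axis=0), axis=0)
    uyy = np.gradient(np.gradient(U, axis=1), axis=1)
    utt = np.gradient(np.gradient(U, axis=2), axis=2)
    phi = U + k1 * uxx + k2 * uyy + k3 * utt
    # normalize so regions with large variation stand out
    rng = phi.max() - phi.min()
    return (phi - phi.min()) / rng if rng > 0 else np.zeros_like(phi)
```

With the weights kept equal, as the text suggests is usual, all axes contribute alike; raising k3 relative to k1 and k2 would give the time axis the extra emphasis that highlights the fast moving limbs.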