(3)
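Furthermore, the distance between a joint’s current
position and its initial position in the first frame is
taken as a feature. By calculating this value for each
frame, another time series is obtained, as shown in
Equation-4. If $J_j$ represents the $j$th joint in the
action sequence, $J_j^{1}$ is the initial position of the
joint and $J_j^{c}$ is the location of the $j$th joint in
frame $c$. Here $C$ is the number of observed frames
and $N$ is the number of joints.

$$D_j = \left\{ \lVert J_j^{c} - J_j^{1} \rVert \right\}; \quad c \in \{1, \dots, C\}; \quad j \in \{1, \dots, N\} \qquad (4)$$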
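The distance-to-initial-position feature described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `distance_to_initial` and the input layout (one list of `(x, y, z)` tuples per joint, one tuple per frame) are assumptions made for the example.

```python
import math

def distance_to_initial(joint_track):
    """Per-frame Euclidean distance of one joint from its first-frame position.

    joint_track: list of (x, y, z) tuples, one per frame (assumed layout).
    """
    initial = joint_track[0]
    return [math.dist(position, initial) for position in joint_track]

# Hypothetical track of a single joint over four frames.
track = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 2.0), (1.0, 2.0, 2.0)]
series = distance_to_initial(track)  # one value per frame
```

Applying the function to every joint gives one time series per joint, with the first value always equal to zero.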
After obtaining the first three types of features as
time series, the remaining features are obtained from
the whole observed part of the action sequence instead
of from each frame. First, the 3D angle values between
all joints are examined. In our experiments, however,
some joints gave noisy information while others
provided robust information. We found that the
most robust and useful angles are shoulder-elbow-arm
and crotch-knee-foot; the other angles are
not very useful for recognizing actions. To calculate
the joint angles, the 3D coordinates of the elbow,
shoulder, wrist, knee and ankle joints are considered.
Joint angles are computed for all frames, and then a
histogram is constructed for each joint angle. For a
compact representation, the histograms of
all joint angle values are concatenated into a
one-dimensional array. The order of the histograms in
the array is important for classifying actions.
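The angle histogram feature can be illustrated as follows. This is a sketch under stated assumptions: the number of bins, the 0 to 180 degree range, and the helper names `joint_angle`, `histogram` and `angle_feature` are choices made for the example and are not specified in the text.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at point b formed by the 3D points a-b-c."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0.0:
        return 0.0  # coincident joints; returning 0 is an assumption
    cos_angle = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cos_angle))

def histogram(values, bins=6, lo=0.0, hi=180.0):
    """Count values into equal-width bins over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # hi falls in the last bin
        counts[idx] += 1
    return counts

def angle_feature(angle_series_list, bins=6):
    """Concatenate one histogram per angle, always in the same order."""
    feature = []
    for series in angle_series_list:
        feature.extend(histogram(series, bins))
    return feature

# Hypothetical per-frame angles (degrees) for two joint angles.
elbow_angles = [10.0, 20.0, 100.0, 170.0]
knee_angles = [175.0, 170.0, 165.0, 160.0]
feature = angle_feature([elbow_angles, knee_angles], bins=6)
```

Keeping the list passed to `angle_feature` in a fixed order is what makes the same array position refer to the same joint angle across all sequences.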
Changes in a joint angle affect the
prediction capability of the trained models.
However, joint angles might have similar values in
some actions, as we reported in (Keceli and Can,
2014). For example, the checking watch and crossing
arms actions have similar histograms. Therefore,
joint angles may not provide enough information to
distinguish some actions. How much each joint
moves in each dimension might be important in
some actions. In other words, the total displacement of
joints can be used in addition to joint angle
information. To calculate the displacements of joints,
the relative coordinate values of joints are used.
Euclidean distances in x-y-z dimensions between
consecutive frames are calculated for each joint.
Then, by summing the Euclidean distances of the
joint over consecutive frames, the total displacement
of a joint in a dimension is calculated. For each
joint, the total displacements in x-y-z dimensions are
considered. This allowed us to distinguish actions
that have similar joint angle histograms but
different dimension orientations. For example, the hand
waving and punching actions produce similar angle
histograms. However, the wrist and elbow joints
move in different dimensions. Evaluating
displacements in x-y-z dimensions separately
provides more information to distinguish these
actions. In addition to the displacements in x-y-z
dimensions, the total displacement of each joint in 3D
coordinate space is considered as another feature.
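The displacement features can be sketched as below. The function name `total_displacements` and the input layout are assumptions for illustration; the coordinates are assumed to be already expressed as relative coordinates, as the text requires.

```python
import math

def total_displacements(joint_track):
    """Return (dx, dy, dz, d3) for one joint, summed over consecutive frames.

    dx, dy, dz: total absolute movement along each axis.
    d3: total path length in 3D space.
    joint_track: list of (x, y, z) tuples, one per frame (assumed layout).
    """
    per_axis = [0.0, 0.0, 0.0]
    total_3d = 0.0
    for prev, curr in zip(joint_track, joint_track[1:]):
        for axis in range(3):
            per_axis[axis] += abs(curr[axis] - prev[axis])
        total_3d += math.dist(curr, prev)
    return (*per_axis, total_3d)

# Hypothetical wrist track: moves away and returns to the start.
wrist = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 0.0)]
dx, dy, dz, d3 = total_displacements(wrist)
```

Two actions with similar angle histograms can still differ in which of `dx`, `dy` and `dz` dominates, which is the property the text relies on to separate hand waving from punching.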
4 CLASSIFICATION
After the features are obtained, an Adaboost classifier
is trained for prediction. Adaboost uses the boosting
paradigm to increase classification accuracy.
Boosting constructs a powerful classifier from the
union of weak classifiers and rules. In our earlier
work, we used support vector machine (SVM) and
Random Forest (RF) (Liaw and Wiener, 2002)
algorithms to classify actions after seeing the whole
data sequence (Keceli and Can, 2014). However, when
actions are classified with limited knowledge of
the sequence, SVM and RF algorithms may perform
very poorly, as stated in (Juhl and
Bateman, 2011). Under partial
observation, the features of different actions can be
similar. Especially when less
than 50% of the action sequence is observed,
the discriminative power of the features decreases
dramatically. In this case, a more
discriminative classifier is needed. Therefore, after
testing the SVM and RF classifiers, we decided to use
the Adaboost classifier for low latency action
recognition. Adaboost is beneficial for classification
with partial sequence observation.
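To make the boosting idea concrete, the following is a minimal sketch of discrete two-class Adaboost with decision stumps as the weak classifiers. It is an illustration only: the action recognition task in this paper is multi-class, and the stump learner, the function names and the toy data are assumptions, not the configuration used in the experiments.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best stump."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for sign in (1, -1):
                pred = [sign if row[f] <= thr else -sign for row in X]
                err = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    return best

def stump_predict(stump, row):
    _, f, thr, sign = stump
    return sign if row[f] <= thr else -sign

def adaboost_train(X, y, rounds=10):
    """Train on labels in {-1, +1}; returns a list of (alpha, stump) pairs."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified samples, lower the rest.
        w = [wi * math.exp(-alpha * t * stump_predict(stump, row))
             for wi, row, t in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

# Toy 1D data that no single threshold can separate.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
model = adaboost_train(X, y, rounds=3)
predictions = [adaboost_predict(model, row) for row in X]
```

On this toy data no single stump classifies every sample correctly, while the weighted vote of three stumps does, which is the effect boosting is meant to achieve.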
The Adaboost method was first proposed by
Freund and Schapire (1999). This method is based
on the boosting algorithm. Boosting constructs
powerful predictive models by combining weak
classifiers. A weak model is a predictive model that is
only slightly better than random guessing (an error
rate just below 0.5 in the two-class case), and a
powerful model is a predictive model whose error rate
is as small as
possible. In boosting, a large training data set is split
into three parts. First part is taken and
model is