prediction agent might combine these elements (object, motion, interaction) and tag them as: writing on a board. Depending on the abstraction level, it is sometimes more convenient and beneficial to observe some motion features more closely and attentively than others. In the child example it is more advantageous to keep track of how his actual position deviates from the predicted path at each step, whereas in the whiteboard example it might be more helpful to examine either the final written pattern, e.g., with an OCR (Optical Character Recognition) system, or the motion patterns projected onto the plane, in order to determine the action in progress: writing vs. erasing.
We organize the paper as follows. In the next section we briefly describe other published approaches to the visual recognition of actions or activities. In Sect. 3 the theoretical and mathematical foundations as well as the image processing procedures of the approach are explained. We run experiments in different scenes with different objects/tools; the results are presented in Sect. 4. In Sect. 5 we give some final comments and remarks on the presented work.
2 RELATED WORK
The words activity, task, action, and atomic action appear very frequently in this line of research. In this context we define the following concepts. Atomic action: also called stroke or gesture; these generally describe fast, short, instantaneous, and continuous motion displacements, e.g., a hand twist, lifting an arm, etc. Action: or task; it is composed of an ordered sequence of atomic actions, e.g., drinking, writing, etc. Normally one action is associated with one object. Activity: a series of different actions that share space and time, e.g., cooking, driving, etc. One activity is associated with multiple objects. There are also cases in which these concepts overlap, for example, eating as an activity or an action, or turning the steering wheel as an action or an atomic action. The literature in this area spans from the recognition of atomic actions to the identification of activities from a general, global perspective, that is, where actions and activities are not linked to a specific object. Here we describe some representative examples. In (Ju Sun et al., 2009) the authors tackled the problem of action recognition in video sequences by introducing three levels of context: a point-level context based on a SIFT descriptor (Lowe, 2004), an intra-trajectory context defined by the trajectory of the salient SIFT features, and an inter-trajectory context in which the intra-trajectory is related to the other objects in the scene. The approach in (Kuehne et al., 2012) combines histograms of sparse feature flow with a hidden Markov model (HMM) for action recognition. Global histograms of sparse feature flow are built for each input image and processed by the HMM stage for the recognition of small action units; the action units are then combined into a meaningful sequence for the recognition of the overall task. Recently, in (Chen and Burschka, 2018), in order to predict and label human actions with objects, the authors proposed a graphical representation that links objects with their usual places inside a scene. In this representation they decouple the action regions inside the environment into Location Areas (LA) and Sector Maps (SM). The former is where the action actually occurs, while the latter indicates the transportation path between LAs. Following this terminology, our work focuses on the LAs, since we mainly observe how an object interacts in order to characterize its functionality.
3 APPROACH
Within our analysis framework we identify the following main functional blocks: hand tracking, plane detection, and point projection onto a plane. Algorithm 1 presents an overview of the workflow of the approach.
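Of these blocks, the projection onto the plane admits a simple closed form. As a brief sketch in our own notation (the text does not fix it), let the detected plane be described by a unit normal n and a point p_0 lying on it; a 3D point x, here the hand centroid, is then projected orthogonally as

    x_proj = x - ((x - p_0) · n) n,

which corresponds to the PROJECT-Hand-centroid step of Algorithm 1.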
Algorithm 1: Main Workflow of the Approach.
Result: Projected Motions and Motion Pattern
 1  GET I_3d(k);                 /* rgb-3d image */
 2  DETECT-PLANE;
 3  SEG-3D hand;                 /* 3D segmentation */
 4  init KF_3d;                  /* Kalman filter */
 5  while I_3d(k) ≠ ∅ do
 6      DETECT-PLANE;
 7      KF_3d.predict hand-pos;
 8      SEG-3D hand-pos;
 9      KF_3d.correct hand-pos;
10      PROJECT-Hand-centroid;
11  end
12  run-PCA on( proj-pts );
13  get Motion-Patterns;
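For concreteness, the following is a minimal Python sketch of the Algorithm 1 loop, under stated assumptions: detect_plane and segment_hand_3d are hypothetical stand-ins for the plane detection and 3D hand segmentation blocks (described in the following subsections), and the constant-velocity Kalman filter is an illustrative choice, since the listing does not fix the motion model.

import numpy as np

def project_onto_plane(x, n, p0):
    # Orthogonal projection of a 3D point x onto the plane (unit normal n, point p0).
    return x - np.dot(x - p0, n) * n

class KalmanFilter3D:
    # Constant-velocity Kalman filter over the state [x, y, z, vx, vy, vz].
    def __init__(self, x0, dt=1.0, q=1e-2, r=1e-1):
        self.x = np.hstack([x0, np.zeros(3)])                # state estimate
        self.P = np.eye(6)                                   # state covariance
        self.F = np.eye(6); self.F[:3, 3:] = dt * np.eye(3)  # transition model
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))])    # measurement model
        self.Q = q * np.eye(6)                               # process noise
        self.R = r * np.eye(3)                               # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:3]                                    # predicted hand-pos

    def correct(self, z):
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)             # Kalman gain
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(6) - K @ self.H) @ self.P
        return self.x[:3]                                    # corrected hand-pos

def motion_patterns(frames, detect_plane, segment_hand_3d):
    # Run the Algorithm 1 loop and return the PCA axes of the projected motion.
    n, p0 = detect_plane(frames[0])                          # DETECT-PLANE
    kf = KalmanFilter3D(segment_hand_3d(frames[0], None))    # SEG-3D hand, init KF
    proj_pts = []
    for frame in frames[1:]:
        n, p0 = detect_plane(frame)                          # DETECT-PLANE
        pred = kf.predict()                                  # KF.predict hand-pos
        z = segment_hand_3d(frame, pred)                     # SEG-3D hand-pos
        centroid = kf.correct(z)                             # KF.correct hand-pos
        proj_pts.append(project_onto_plane(centroid, n, p0)) # PROJECT-Hand-centroid
    pts = np.asarray(proj_pts) - np.mean(proj_pts, axis=0)   # center for PCA
    _, s, vt = np.linalg.svd(pts, full_matrices=False)       # run-PCA on(proj-pts)
    return vt, s                                             # principal motion axes

In this sketch the segmentation is assumed to use the predicted position to constrain its search region, mirroring the prediction/correction split in the listing.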
3.1 Hand Tracking and Plane Detection
We assume the object in use is known a priori. For this, an additional object recognition block can be added to the system, or the object can simply be specified manually. In any case, we track the motions of the hand rather than the object's for several reasons: i) the hand is the active actor in the visual environment that