obtained when kernels from spatial, temporal, context,
3D points and depth are combined within the
CMMKL-SVM approach. In particular, the highest
recognition rate (92.83%) was obtained with
a combination of trajectories, HOG, FPFH, depth and
object features. Given the importance of this task for
intelligent robots, our future work will focus on
improving multimodal fusion and reducing
the computational burden by exploiting different
optimization techniques for MKL, allowing the robot
to respond more quickly when interacting with humans
by either imitating or anticipating actions.
This research has been partially supported by the
Industrial Doctorate program of the Government of
Catalonia, and by the European Community through
the FP7 framework program by funding the Vinbot
project (No. 605630) conducted by Ateknea Solutions
