The Cavy dataset contains six dominant interactions performed several times by two or three cavies at different locations. The challenging aspect of the Cavy dataset is that the cavies behave and interact in complicated and unexpected ways. Experiments were performed on the Cavy dataset and the Behave dataset, and these extensive experiments demonstrate the effectiveness of the proposed method: our approach achieved satisfactory results, with a clustering accuracy of up to 68.84% on the Behave dataset and up to 45% on the Cavy dataset. In the future, a more robust tracker needs to be developed to mitigate the effects of tracking errors. Appearance-based and trajectory-based features could also be incorporated alongside optical flow.
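The clustering accuracy figures above are conventionally obtained by optimally matching the unsupervised cluster labels to the ground-truth interaction classes before scoring. A minimal sketch of that evaluation, using the Hungarian algorithm over the cluster/class contingency table (an illustrative reimplementation, not the authors' actual evaluation code):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(true_labels, pred_labels):
    """Best-match accuracy: map each predicted cluster to one ground-truth
    class via the Hungarian algorithm, then score as plain accuracy."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    classes = np.unique(true_labels)
    clusters = np.unique(pred_labels)
    # Contingency table: rows = predicted clusters, cols = true classes.
    overlap = np.zeros((len(clusters), len(classes)), dtype=int)
    for i, c in enumerate(clusters):
        for j, k in enumerate(classes):
            overlap[i, j] = np.sum((pred_labels == c) & (true_labels == k))
    # Maximize total overlap (negate, since the solver minimizes cost).
    rows, cols = linear_sum_assignment(-overlap)
    return overlap[rows, cols].sum() / len(true_labels)

# Toy example: cluster indices are permuted w.r.t. the ground truth,
# but the optimal matching recovers a perfect score.
acc = clustering_accuracy([0, 0, 1, 1], [1, 1, 0, 0])  # -> 1.0
```

Because cluster indices produced by an unsupervised method are arbitrary, scoring without this matching step would penalize label permutations that are in fact perfect clusterings.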
VISAPP 2016 - International Conference on Computer Vision Theory and Applications