Unsupervised Framework for Interactions Modeling between Multiple Objects

Ali Al-Raziqi, Joachim Denzler


Extracting compound interactions involving multiple objects is a challenging task in computer vision because of mutual occlusions between objects, varying group sizes, and issues arising from the tracker. Additionally, single-object activities are less common than activities performed by two or more objects, e.g., gathering, fighting, and running. The purpose of this paper is to address the problem of interaction recognition among multiple objects based on dynamic features in an unsupervised manner. Our main contribution is twofold. First, we introduce a combined framework that uses tracking-by-detection for trajectory extraction and hierarchical Dirichlet processes (HDPs) for latent interaction extraction. Second, we introduce a new dataset, the Cavy dataset, which contains about six dominant interactions performed several times by two or three cavies at different locations. The cavies interact in complicated and unexpected ways, so many interactions occur within a short time, which makes this dataset more challenging to work with. The experiments are performed not only on the Cavy dataset but also on the benchmark dataset Behave. The experiments on both datasets demonstrate the effectiveness of the proposed method. Although our approach is completely unsupervised, we achieved satisfactory results, with a clustering accuracy of up to 68.84% on the Behave dataset and up to 45% on the Cavy dataset.
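The abstract reports clustering accuracy but does not say how it is computed. As a rough illustration only, the sketch below uses one common convention: each discovered cluster is mapped to the ground-truth label that occurs most often in it, and accuracy is the fraction of samples that agree with their cluster's majority label. The function name `clustering_accuracy`, the toy cluster assignments, and the interaction labels are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def clustering_accuracy(cluster_ids, true_labels):
    """Fraction of samples whose label equals the majority label of their cluster.

    Assumption: majority-vote mapping from clusters to labels; the paper may
    use a different evaluation protocol.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("inputs must have the same length")
    if not cluster_ids:
        raise ValueError("inputs must not be empty")

    # Collect the ground-truth labels that fall into each cluster.
    labels_per_cluster = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        labels_per_cluster[cluster][label] += 1

    # Each cluster is credited with the count of its most frequent label.
    correct = sum(
        counts.most_common(1)[0][1] for counts in labels_per_cluster.values()
    )
    return correct / len(cluster_ids)


# Hypothetical toy data: cluster index per clip and its annotated interaction.
clusters = [0, 0, 0, 1, 1, 2, 2, 2]
labels = ["approach", "approach", "fight", "fight", "fight",
          "split", "split", "approach"]

accuracy = clustering_accuracy(clusters, labels)
print(f"{accuracy:.3f}")
```

Majority-vote mapping lets several clusters share one label, which suits nonparametric models such as HDPs where the number of clusters is not fixed in advance; a one-to-one assignment would be a stricter alternative.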



Paper Citation

in Harvard Style

Al-Raziqi A. and Denzler J. (2016). Unsupervised Framework for Interactions Modeling between Multiple Objects. In Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016) ISBN 978-989-758-175-5, pages 509-516. DOI: 10.5220/0005680705090516

in Bibtex Style

author={Ali Al-Raziqi and Joachim Denzler},
title={Unsupervised Framework for Interactions Modeling between Multiple Objects},
booktitle={Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016)},

in EndNote Style

JO - Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016)
TI - Unsupervised Framework for Interactions Modeling between Multiple Objects
SN - 978-989-758-175-5
AU - Al-Raziqi A.
AU - Denzler J.
PY - 2016
SP - 509
EP - 516
DO - 10.5220/0005680705090516