Tracking The Invisible Man - Hidden-object Detection for Complex Visual Scene Understanding
Joanna Isabelle Olszewska
2016
Abstract
Reliable detection of objects of interest in complex visual scenes is of prime importance for video-surveillance applications. While most vision approaches deal with tracking visible or partially visible objects in single or multiple video streams, we propose a new approach to automatically detect all objects of interest in an analyzed scene, even those entirely hidden from a camera view while still being present in the scene. For this purpose, we have developed an innovative artificial-intelligence framework that embeds a computer-vision process fully integrated with symbolic, knowledge-based reasoning. Our system has been evaluated on standard datasets consisting of video streams with real-world objects evolving in cluttered outdoor environments under difficult lighting conditions. The proposed approach shows excellent performance in both detection accuracy and robustness, and outperforms state-of-the-art methods.
References
- Albanese, M., Molinaro, C., Persia, F., Picariello, A., and Subrahmanian, V. S. (2011). Finding unexplained activities in video. In Proceedings of the AAAI International Joint Conference on Artificial Intelligence, pages 1628-1634.
- Bai, L., Lao, S., Jones, G. J. F., and Smeaton, A. F. (2007). Video semantic content analysis based on ontology. In Proceedings of the IEEE International Machine Vision and Image Processing Conference, pages 117-124.
- Berclaz, J., Fleuret, F., Tueretken, E., and Fua, P. (2011). Multiple object tracking using K-shortest paths optimization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(9):1806-1819.
- Bernardin, K. and Stiefelhagen, R. (2008). Evaluating multiple object tracking performance: The CLEAR MOT metrics. EURASIP Journal on Image and Video Processing, 2008:1-10.
- Bhat, M. and Olszewska, J. I. (2014). DALES: Automated Tool for Detection, Annotation, Labelling and Segmentation of Multiple Objects in Multi-Camera Video Streams. In Proceedings of the ACL International Conference on Computational Linguistics Workshop, pages 87-94.
- Chen, L., Wei, H., and Ferryman, J. (2014). ReadingAct RGB-D action dataset and human action recognition from local features. Pattern Recognition Letters, 50:159-169.
- Dai, X. and Payandeh, S. (2013). Geometry-based object association and consistent labeling in multi-camera surveillance. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 3(2):175-184.
- Evans, M., Osborne, C. J., and Ferryman, J. (2013). Multicamera object detection and tracking with object size estimation. In Proceedings of the IEEE International Conference on Advanced Video and Signal Based Surveillance, pages 177-182.
- Ferrari, V., Tuytelaars, T., and Gool, L. V. (2006). Simultaneous object recognition and segmentation from single or multiple model views. International Journal of Computer Vision, 67(2):159-188.
- Ferryman, J., Hogg, D., Sochman, J., Behera, A., Rodriguez-Serrano, J. A., Worgan, S., Li, L., Leung, V., Evans, M., Cornic, P., Herbin, S., Schlenger, S., and Dose, M. (2013). Robust abandoned object detection integrating wide area visual surveillance and social context. Pattern Recognition Letters, 34(7):789-798.
- Fleuret, F., Berclaz, J., Lengagne, R., and Fua, P. (2008). Multicamera people tracking with a probabilistic occupancy map. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(2):267-282.
- Gomez-Romero, J., Patricio, M. A., Garcia, J., and Molina, J. M. (2011). Ontology-based context representation and reasoning for object tracking and scene interpretation in video. Expert Systems with Applications, 38(6):7494-7510.
- Jeong, J.-W., Hong, H.-K., and Lee, D.-H. (2011). Ontology-based automatic video annotation technique in smart TV environment. IEEE Transactions on Consumer Electronics, 57(4):1830-1836.
- Kasturi, R., Goldgof, D., Soundararajan, P., Manohar, V., Garofolo, J., Boonstra, M., Korzhova, V., and Zhang, J. (2009). Framework for performance evaluation of face, text, and vehicle detection and tracking in video: Data, metrics, and protocol. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(2):319-336.
- Lehmann, J., Neumann, B., Bohlken, W., and Hotz, L. (2014). A robot waiter that predicts events by high-level scene interpretation. In Proceedings of the International Conference on Agents and Artificial Intelligence, pages I.469-I.476.
- Mavrinac, A. and Chen, X. (2013). Modeling coverage in camera networks: A survey. International Journal of Computer Vision, 101(1):205-226.
- Natarajan, P. and Nevatia, R. (2005). EDF: A framework for semantic annotation of video. In Proceedings of the IEEE International Conference on Computer Vision Workshops, page 1876.
- Olszewska, J. I. (2012). Multi-target parametric active contours to support ontological domain representation. In Proceedings of the RFIA Conference, pages 779-784.
- Olszewska, J. I. (2013). Multi-scale, multi-feature vector flow active contours for automatic multiple-face detection. In Proceedings of the International Conference on Bio-Inspired Systems and Signal Processing.
- Olszewska, J. I. (2015). Multi-camera video object recognition using active contours. In Proceedings of the International Conference on Bio-Inspired Systems and Signal Processing, pages 379-384.
- Olszewska, J. I. and McCluskey, T. L. (2011). Ontology-coupled active contours for dynamic video scene understanding. In Proceedings of the IEEE International Conference on Intelligent Engineering Systems, pages 369-374.
- Park, H.-S. and Cho, S.-B. (2008). A fuzzy rule-based system with ontology for summarization of multi-camera event sequences. In Proceedings of the International Conference on Artificial Intelligence and Soft Computing, LNCS 5097, pages 850-860.
- Remagnino, P., Shihab, A. I., and Jones, G. A. (2004). Distributed intelligence for multi-camera visual surveillance. Pattern Recognition, 37(4):675-689.
- Riboni, D. and Bettini, C. (2011). COSAR: Hybrid reasoning for context-aware activity recognition. Personal and Ubiquitous Computing, 15(3):271-289.
- Sridhar, M., Cohn, A. G., and Hogg, D. C. (2010). Unsupervised learning of event classes from video. In Proceedings of the AAAI International Conference on Artificial Intelligence, pages 1631-1638.
- Vrusias, B., Makris, D., Renno, J.-P., Newbold, N., Ahmad, K., and Jones, G. (2007). A framework for ontology enriched semantic annotation of CCTV video. In Proceedings of the IEEE International Workshop on Image Analysis for Multimedia Interactive Services, page 5.
- Yilmaz, A., Javed, O., and Shah, M. (2006). Object Tracking: A Survey. ACM Computing Surveys, 38(4):13.
Paper Citation
in Harvard Style
Olszewska J. (2016). Tracking The Invisible Man - Hidden-object Detection for Complex Visual Scene Understanding. In Proceedings of the 8th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-758-172-4, pages 223-229. DOI: 10.5220/0005852302230229
in Bibtex Style
@conference{icaart16,
author={Joanna Isabelle Olszewska},
title={Tracking The Invisible Man - Hidden-object Detection for Complex Visual Scene Understanding},
booktitle={Proceedings of the 8th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2016},
pages={223-229},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005852302230229},
isbn={978-989-758-172-4},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 8th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - Tracking The Invisible Man - Hidden-object Detection for Complex Visual Scene Understanding
SN - 978-989-758-172-4
AU - Olszewska J.
PY - 2016
SP - 223
EP - 229
DO - 10.5220/0005852302230229