directed towards the detection of special-effect
captions, namely moving text and non-horizontally
aligned text.
APPENDIX
Table 1: Performance comparison for caption detection.

Test       The Proposed         Component Based      Texture Based        Edge Based
Video ID   Approach             Approach             Approach             Approach
           Recall   Precision   Recall   Precision   Recall   Precision   Recall   Precision
(1)        93.07%   89.56%      86.35%   81.43%      88.42%   84.71%      91.43%   86.36%
(2)        94.92%   91.30%      88.53%   84.10%      90.77%   88.42%      92.23%   87.10%
(3)        88.24%   78.94%      83.89%   70.67%      85.36%   75.31%      82.35%   77.06%
(4)        90.44%   87.88%      83.07%   77.69%      86.21%   81.35%      88.07%   83.64%
(5)        91.38%   85.74%      87.42%   80.57%      83.46%   77.74%      89.57%   83.33%
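
The per-video figures in Table 1 can be condensed into one summary per approach. The sketch below is illustrative only and is not part of the evaluation reported here. It assumes an unweighted average over the five test videos. It also adds an F1 score computed from the averaged recall and precision, a measure Table 1 does not report. The names `TABLE_1`, `summarize`, and `SUMMARY` are hypothetical.

```python
from statistics import mean

# (recall %, precision %) for test videos (1) to (5), copied from Table 1.
TABLE_1 = {
    "Proposed": [
        (93.07, 89.56), (94.92, 91.30), (88.24, 78.94),
        (90.44, 87.88), (91.38, 85.74),
    ],
    "Component Based": [
        (86.35, 81.43), (88.53, 84.10), (83.89, 70.67),
        (83.07, 77.69), (87.42, 80.57),
    ],
    "Texture Based": [
        (88.42, 84.71), (90.77, 88.42), (85.36, 75.31),
        (86.21, 81.35), (83.46, 77.74),
    ],
    "Edge Based": [
        (91.43, 86.36), (92.23, 87.10), (82.35, 77.06),
        (88.07, 83.64), (89.57, 83.33),
    ],
}


def summarize(rows):
    """Return (mean recall, mean precision, F1 of those means), in percent.

    Assumption: every video is weighted equally. F1 is an added summary
    measure and does not appear in Table 1.
    """
    recall = mean(r for r, _ in rows)
    precision = mean(p for _, p in rows)
    f1 = 2 * recall * precision / (recall + precision)
    return recall, precision, f1


SUMMARY = {name: summarize(rows) for name, rows in TABLE_1.items()}

if __name__ == "__main__":
    for name, (recall, precision, f1) in SUMMARY.items():
        print(f"{name:<16} recall={recall:.2f}%  "
              f"precision={precision:.2f}%  F1={f1:.2f}%")
```

Under this equal-weight assumption the proposed approach has the highest mean recall and mean precision of the four. Weighting each video by its number of ground-truth captions would require counts that Table 1 does not give.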