combinations, we tested all possible combinations of
the thresholds τ_BoW and τ_RGB, and for each recall
point selected the highest precision. Our BoW method
clearly outperforms the baseline method using colour
histograms. However, it is evident that combining the
two still improves detection remarkably.
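The threshold sweep described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy distance values, the threshold grid, and the convention that a frame is flagged when its distance exceeds the threshold are all assumptions made for illustration. The "or" and "and" fusion rules follow the labels in the Figure 3 legend, interpreted here as a logical combination of the two per-feature decisions.

```python
from itertools import product


def detect(d_bow, d_rgb, tau_bow, tau_rgb, mode):
    """Return the set of frame indices flagged as shot boundaries.

    Assumption: a feature "fires" when its frame-to-frame distance
    exceeds its threshold; `mode` is "or" or "and".
    """
    hits = set()
    for i, (b, c) in enumerate(zip(d_bow, d_rgb)):
        bow_hit = b > tau_bow
        rgb_hit = c > tau_rgb
        fired = (bow_hit or rgb_hit) if mode == "or" else (bow_hit and rgb_hit)
        if fired:
            hits.add(i)
    return hits


def precision_recall(detected, truth):
    """Precision and recall of detected boundaries against ground truth.

    Convention (assumed): precision is 1.0 when nothing is detected.
    """
    tp = len(detected & truth)
    precision = tp / len(detected) if detected else 1.0
    recall = tp / len(truth) if truth else 1.0
    return precision, recall


def best_precision_per_recall(d_bow, d_rgb, truth, thresholds, mode):
    """Try every (tau_bow, tau_rgb) pair; keep the best precision per recall."""
    best = {}
    for tau_bow, tau_rgb in product(thresholds, repeat=2):
        detected = detect(d_bow, d_rgb, tau_bow, tau_rgb, mode)
        p, r = precision_recall(detected, truth)
        r = round(r, 3)  # group numerically equal recall values
        best[r] = max(best.get(r, 0.0), p)
    return dict(sorted(best.items()))


# Hypothetical toy data: per-frame distances and ground-truth boundaries.
d_bow = [0.1, 0.9, 0.2, 0.8, 0.3, 0.1]
d_rgb = [0.2, 0.7, 0.6, 0.3, 0.1, 0.2]
truth = {1, 3}
thresholds = [0.0, 0.25, 0.5, 0.75]

curve_or = best_precision_per_recall(d_bow, d_rgb, truth, thresholds, "or")
curve_and = best_precision_per_recall(d_bow, d_rgb, truth, thresholds, "and")
print("or :", curve_or)
print("and:", curve_and)
```

Each returned dictionary maps a recall value to the highest precision reached by any threshold pair, which is the envelope that a precision-recall plot of a combined detector would show.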
[Figure 3 plot: precision-recall curves, Recall (x-axis, 0 to 1)
versus Precision (y-axis, 0 to 1), for RGB, BoW, RGB+BoW (or),
and RGB+BoW (and).]
Figure 3: TRECVid comparison with the baseline method
and with the hybrid of the two methods.
5 CONCLUSIONS
In this work, we adapted visual Bag-of-Words (BoW),
the popular approach for object class detection,
to the low-level video processing task of video shot
boundary detection. To the best of the authors'
knowledge, our work is the first to use the BoW approach
in video shot boundary detection. We utilised the
available efficient implementations, and our method,
which has equal complexity in terms of the number
of parameters, achieved clearly superior performance
to the baseline. This is an interesting result, since
the baseline (colour histogram difference) is at the
core of many top-performing methods. Our method
runs at half frame rate on standard PC hardware
without special optimisation. Moreover, our results
showed that the two, BoW feature histograms and
colour histograms, provide complementary information,
and their combination achieved the best performance.
In future work, we will investigate other low-level
video processing tasks using the BoW approach
and optimise our implementation to run at frame rate
or faster.