detection and hence we created our own ground truth for our application.¹
The data is in the form of grayscale frames of size 320×240 pixels. There are five sequences, namely corridor, diningRoom, lakeSide, library, and park. Each sequence contains different kinds of motion and different zoom levels, so that the object sizes vary considerably.
5.1 Experimental Setup
The parameter F, i.e., the number of frames used for training the model, was fixed at 50. Thus, the first 50 frames are assumed not to contain any special event. At the normal rate of 25 frames per second, this means we assume constancy for only 2 seconds, which is a realistic assumption. The threshold T was set to 0.175, which was determined empirically on the dataset. Future work will include the automatic determination of these two system parameters. Furthermore, the PHOG descriptor uses 30 bins for each histogram, computed over three pyramid levels.
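To make the setup concrete, the following is a minimal Python sketch of the detection loop described above. The simplified PHOG-like descriptor (gradient-orientation histograms with 30 bins pooled over a three-level spatial pyramid) and the L1 distance to the mean descriptor of the training frames are assumptions made for illustration; the threshold T = 0.175 only transfers if the same dissimilarity measure is used.

```python
import numpy as np

def phog_like(frame, bins=30, levels=3):
    """Gradient-orientation histograms pooled over a spatial pyramid.
    A simplified stand-in for the PHOG descriptor (assumption)."""
    gy, gx = np.gradient(frame.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)          # orientation in [0, pi)
    feats = []
    for lvl in range(levels):                        # levels 0..2 -> 1, 4, 16 cells
        for ys in np.array_split(np.arange(frame.shape[0]), 2 ** lvl):
            for xs in np.array_split(np.arange(frame.shape[1]), 2 ** lvl):
                h, _ = np.histogram(ang[np.ix_(ys, xs)], bins=bins,
                                    range=(0.0, np.pi),
                                    weights=mag[np.ix_(ys, xs)])
                feats.append(h / (h.sum() + 1e-9))   # normalize each histogram
    return np.concatenate(feats)

def detect_events(frames, F=50, T=0.175):
    """Train on the first F (assumed event-free) frames, then report a
    frame as an event when its descriptor deviates from the model by
    more than T (L1 distance is an illustrative choice)."""
    model = np.mean([phog_like(f) for f in frames[:F]], axis=0)
    return [i for i in range(F, len(frames))
            if np.abs(phog_like(frames[i]) - model).sum() > T]
```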
5.2 Results on the CVPR Change Detection Dataset
In most published works, including [5], the performance measures used are subjective and do not lend themselves to comparison. The reason is that in problems like video segmentation, it is very difficult to define a good performance measure for the algorithms. Hence, we exploit the fact that our algorithm works on the principle of detecting events and use performance measures for event detection. To evaluate the results of the algorithm quantitatively, we used the detection rate η:
\eta = \frac{\text{number of correct detections}}{\text{number of events in ground truth}} \qquad (4)
The detections of the algorithm and the ground truth often differ by about 20-25 frames, because the algorithm makes hard decisions using a threshold while the ground truth is marked by human observers. This is not a serious problem: at 25 frames per second, 25 frames correspond to a time span of only 1 second, in which generally few events happen, so the difference is negligible for most applications.
Additionally, over-segmentation is expected, because our algorithm is completely unsupervised. Since the threshold is set without any prior knowledge about the video, very small and insignificant changes often cause a new segment to be reported. This is not a cause for concern either, as in most applications this stage is followed by a higher processing stage or a human observer, which can discard these extra segments in a post-processing step. We quantify the effect of over-segmentation with the over-segmentation ratio γ:
\gamma = \frac{\text{number of false detections}}{\text{number of events in ground truth}} \qquad (5)
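As an illustration, the following sketch computes both measures from lists of detected and ground-truth event frames. The one-to-one matching of each detection to the nearest still unmatched ground-truth event within a 25-frame tolerance (1 second at 25 fps) is an assumption for illustration; the exact matching rule is not prescribed above.

```python
def evaluate(detections, ground_truth, tolerance=25):
    """Detection rate (eta, Eq. 4) and over-segmentation ratio (gamma, Eq. 5).
    A detection is correct if it lies within `tolerance` frames of a not yet
    matched ground-truth event; all remaining detections count as false."""
    unmatched = set(range(len(ground_truth)))
    correct = 0
    for d in detections:
        hits = [i for i in unmatched if abs(ground_truth[i] - d) <= tolerance]
        if hits:
            unmatched.remove(min(hits, key=lambda i: abs(ground_truth[i] - d)))
            correct += 1
    eta = correct / len(ground_truth)
    gamma = (len(detections) - correct) / len(ground_truth)
    return eta, gamma

# Example: four ground-truth events, five detections (one spurious split)
# -> eta = 1.0, gamma = 0.25.
print(evaluate([130, 258, 455, 480, 610], [120, 260, 450, 600]))
```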
¹ Readers interested in obtaining the ground truth for their own research or for verification of our methods can contact the authors.