from the drinking mouse which maintains its posture
but uses only its mouth (see Fig. 2). Overall, our
method significantly outperforms the current state-of-
the-art methods for each specific mouse behavior. In
terms of final accuracy, our method improves on (Dollár
et al., 2005), (Wang et al., 2015) and (Jhuang et al.,
2010) by 13.7%, 3.6% and 2.9%, respectively.
4.5 Continuous Video Annotation
To annotate continuous videos, a sliding window is
centered at each frame and both appearance and
contextual features are computed inside it. Once
spatio-temporal features have been computed for all
the sliding windows, a Fisher vector is computed for
each frame from the window centered on that frame.
These Fisher vectors are finally classified by a trained
neural network, and the classification results are taken
as the labels of the corresponding frames. To find a
suitable sliding window size, we conduct an experiment
comparing the percentage agreement with human
annotation for different window sizes, as illustrated in
Fig. 7.
Figure 7: Continuous video annotation with different
window sizes.
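To make this pipeline concrete, the sketch below outlines the frame-wise annotation loop in Python. It is only an illustration under stated assumptions: extract_features, encode_fisher and classifier are hypothetical stand-ins for the appearance/contextual feature extractor, the spatio-temporal stacked Fisher vector encoder and the trained neural network, and are not part of the original implementation.

def annotate_video(frames, window_size, extract_features, encode_fisher, classifier):
    # Assign a behavior label to every frame of a continuous video.
    # extract_features, encode_fisher and classifier are hypothetical
    # placeholders for the components described in earlier sections.
    half = window_size // 2
    labels = []
    for t in range(len(frames)):
        # Sliding window centered on frame t, clipped at the video boundaries.
        start, end = max(0, t - half), min(len(frames), t + half + 1)
        window = frames[start:end]
        # Appearance and contextual features for this window.
        descriptors = extract_features(window)
        # One Fisher vector per frame, computed from its centered window.
        fv = encode_fisher(descriptors)
        # The classifier's prediction becomes the label of frame t.
        labels.append(classifier.predict([fv])[0])
    return labels

The window_size argument corresponds to the sliding window sizes whose agreement with human annotation is compared in Fig. 7.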
5 CONCLUSION
This paper has presented a new approach to
automatically recognizing specific mouse behaviors.
We show that our interest point detector is stable
under illumination changes. Our fused appearance and
contextual features, encoded by the spatio-temporal
stacked Fisher vector, significantly outperform the
other state-of-the-art features. Moreover, the
combination of the Fisher vector and neural networks
improves the performance of our system and gives
higher accuracy than the other state-of-the-art
systems. Overall, our method achieves an average
accuracy of 95.9%, compared to the previous best
result of 93%. Final experiments on the annotation of
continuous video also obtain results (72.9%) on a par
with human annotation, which is reported as 71.6% in
(Jhuang et al., 2010). Future work will include
exploring more discriminative features, incorporating
temporal models and extending the range of behaviors.
We also plan to study social behavior between
multiple mice.
ACKNOWLEDGEMENT
This project is supported by the UK EPSRC under Grant
EP/N011074/1 and the National Natural Science
Foundation of China under Grant 61300111.
REFERENCES
Bishop, C. M. (2006). Pattern Recognition and Machine
Learning.
Burgos-Artizzu, X. P., Dollár, P., Lin, D., Anderson, D. J.,
and Perona, P. (2012, June). Social behavior
recognition in continuous video. IEEE Conference on
Computer Vision and Pattern Recognition.
Chatfield, K., Lempitsky, V. S., Vedaldi, A., and
Zisserman, A. (2011, September). The devil is in the
details: an evaluation of recent feature encoding
methods. British Machine Vision Conference (Vol. 2,
No. 4, p. 8).
Dankert, H., Wang, L., Hoopfer, E. D., Anderson, D. J., and
Perona, P. (2009). Automated monitoring and
analysis of social behavior in Drosophila. Nature
Methods, 6(4), 297-303.
Dollár, P., Rabaud, V., Cottrell, G., and Belongie, S. (2005,
October). Behavior recognition via sparse spatio-
temporal features. 2nd Joint IEEE International
Workshop on Visual Surveillance and Performance
Evaluation of Tracking and Surveillance.
Jaakkola, T. S., and Haussler, D. (1999). Exploiting
generative models in discriminative classifiers.
Advances in Neural Information Processing Systems,
487-493.
Jhuang, H., Garrote, E., Yu, X., Khilnani, V., Poggio, T.,
Steele, A. D., and Serre, T. (2010). Automated home-
cage behavioural phenotyping of mice. Nature
Communications, 1, 68.
Jhuang, H., Serre, T., Wolf, L., and Poggio, T. (2007,
October). A biologically inspired system for action
recognition. IEEE 11th International Conference on
Computer Vision (pp. 1-8).
Laptev, I. (2005). On space-time interest points.
International Journal of Computer Vision, 64(2-3), 107-
123.
Roughan, J. V., Wright-Williams, S. L., and Flecknell, P.
A. (2009). Automated analysis of postoperative
behaviour: assessment of HomeCageScan as a novel