available to train these models for our specific
experiments.
To mitigate the problem of training data scarcity, we propose a method that does not require a large amount of training data to apply deep learning models to mouse behaviour recognition tasks. In our method, we utilize deep learning models pre-trained on human action recognition tasks. First, we retrain these models on the largest currently publicly available mouse behaviour dataset (Jhuang et al., 2010). This step gives the models knowledge of mouse behaviour recognition. Then, we fine-tune these models on data from our specific tasks. Because the models retrained in the first step have already learned about mouse behaviours, we do not need a large amount of data to train them for our specific tasks in the second step.
In the next section, we describe the deep learning
models and the mouse behaviour dataset we used in
the first step of our proposed method. In Section 3,
we present the swimming mouse behaviour
recognition tasks we used to evaluate our method
and the results of our experiments. Finally, in
Section 4, we state our conclusions.
2 METHOD
As described in the previous section, our proposed method has two steps. In the first step, we fine-tune deep learning models that were developed for human action recognition tasks, using the largest publicly available mouse behaviour dataset. In the second step, we train these models again on the data we prepared for our swimming mouse behaviour recognition tasks. In this section, we describe the deep learning models and the mouse behaviour dataset used in the first step of our method.
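The two steps amount to a standard transfer-learning pipeline, summarised in the minimal sketch below. The sketch assumes a PyTorch-style video-classification model whose final classifier is exposed as `model.logits`; the loading function and data loader names are hypothetical placeholders, not code from our implementation.

```python
# A minimal sketch of the two-step procedure described above, under the
# assumptions stated in the text (model.logits, placeholder loaders).
import torch
import torch.nn as nn


def replace_head(model: nn.Module, num_classes: int) -> nn.Module:
    """Swap the final classification layer for one with `num_classes` outputs."""
    model.logits = nn.Linear(model.logits.in_features, num_classes)
    return model


def train(model: nn.Module, loader, epochs: int, lr: float) -> nn.Module:
    """Generic supervised training loop with cross-entropy loss."""
    optimiser = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for clips, labels in loader:
            optimiser.zero_grad()
            loss = criterion(model(clips), labels)
            loss.backward()
            optimiser.step()
    return model


# Step 1: retrain a human-action pre-trained model on the 8 behaviour classes
# of the Jhuang et al. (2010) dataset; Step 2: fine-tune on the smaller
# swimming-mouse data. The loaders below are hypothetical.
# model = load_pretrained_model()
# model = train(replace_head(model, 8), jhuang_loader, epochs=20, lr=1e-2)
# model = train(replace_head(model, num_swim_classes), swim_loader, epochs=10, lr=1e-3)
```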
2.1 The Two-Stream I3D Model
Carreira and Zisserman introduced the Two-Stream Inflated 3D ConvNets (Two-Stream I3D model) (Carreira and Zisserman, 2018), one of the current state-of-the-art deep learning models for human action recognition tasks. As reported, the Two-Stream I3D models achieve 98% accuracy on the UCF-101 human action recognition dataset (Soomro, Zamir and Shah, 2012) and 80.9% accuracy on the HMDB-51 human action recognition dataset (Kuehne et al., 2011). These models are derived from the Inception-V1 model, which uses the Inception module architecture (Szegedy et al., 2015). Each Inception module combines convolution filters of different sizes with pooling kernels in parallel branches, so that features are extracted at multiple scales.
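As an illustration of this multi-branch design, the following condensed sketch shows an Inception-style block in PyTorch; the channel counts are illustrative and do not reproduce the exact Inception-V1 configuration.

```python
# A condensed Inception-style block: parallel branches with different filter
# sizes plus a pooling branch, concatenated along the channel dimension.
# Channel counts are illustrative assumptions.
import torch
import torch.nn as nn


class InceptionBlock(nn.Module):
    def __init__(self, in_ch: int):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 64, kernel_size=1)                 # 1x1 branch
        self.b2 = nn.Sequential(nn.Conv2d(in_ch, 96, 1),
                                nn.Conv2d(96, 128, 3, padding=1))     # 1x1 -> 3x3 branch
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, 16, 1),
                                nn.Conv2d(16, 32, 5, padding=2))      # 1x1 -> 5x5 branch
        self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, 32, 1))              # pooling branch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenate all branch outputs along the channel dimension.
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)
```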
To create an I3D model, all 2D filters and
pooling kernels of an Inception-V1 model are
inflated to 3D by endowing them with an additional
temporal dimension, i.e. n × n filters become n × n ×
n filters, and the weights of the 3D filters are
bootstrapped by repeating the weights of the
respective 2D filters n times along the new temporal
dimension. This bootstrapping method lets the I3D models benefit from the learned parameters of the pre-trained 2D models.
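The following minimal sketch illustrates this inflation on a single convolution weight tensor. The division by n, which keeps the response of the inflated filter on a temporally constant video equal to the 2D response, follows Carreira and Zisserman (2018); the function name is ours.

```python
# Sketch of the weight "inflation" bootstrap: a 2D n x n filter bank is
# repeated n times along a new temporal axis and rescaled by 1/n.
import torch


def inflate_conv_weight(w2d: torch.Tensor) -> torch.Tensor:
    """w2d: (out_ch, in_ch, n, n) 2D weights -> (out_ch, in_ch, n, n, n) 3D weights."""
    n = w2d.shape[-1]
    # Insert a temporal dimension, repeat n times, and rescale by 1/n.
    return w2d.unsqueeze(2).repeat(1, 1, n, 1, 1) / n


# Example: inflating a 7x7 filter bank into a 7x7x7 one.
w2d = torch.randn(64, 3, 7, 7)
w3d = inflate_conv_weight(w2d)
assert w3d.shape == (64, 3, 7, 7, 7)
```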
In this research, we used the same I3D model architectures as described in the paper of Carreira and Zisserman (Carreira and Zisserman, 2018). For the first step of our method, we used models pre-trained on ImageNet data (Russakovsky et al., 2015). As also reported by Carreira and Zisserman, training a complementary model on optical flow data computed from the RGB data can improve the prediction accuracy of the model trained on RGB data alone. Therefore, we also utilized optical flow data in this research, and we experimented with various fusion ratios between the RGB-trained and optical-flow-trained models to find the best fusion ratio for the swimming mouse behaviour recognition tasks. To compute the optical flow data from the RGB data, we used the TV-L1 algorithm (Zach, Pock and Bischof, 2007).
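The sketch below shows how such a pipeline can be set up with OpenCV's TV-L1 implementation (available in the opencv-contrib package) and how the class scores of the two streams can be combined with a fusion weight. The clipping range and the weighted-sum fusion formula are our illustrative assumptions, not code from Carreira and Zisserman.

```python
# TV-L1 optical flow between consecutive frames, plus a simple weighted
# late fusion of the two streams' class scores.
import cv2
import numpy as np

# TV-L1 optical flow (Zach, Pock and Bischof, 2007); requires opencv-contrib.
tvl1 = cv2.optflow.DualTVL1OpticalFlow_create()


def tvl1_flow(prev_bgr: np.ndarray, next_bgr: np.ndarray) -> np.ndarray:
    """Return the (H, W, 2) flow field between two BGR frames."""
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_bgr, cv2.COLOR_BGR2GRAY)
    flow = tvl1.calc(prev_gray, next_gray, None)
    # Clip and rescale to a fixed range before feeding the flow to the network
    # (the [-20, 20] range is an assumption, a commonly used preprocessing choice).
    return np.clip(flow, -20, 20) / 20.0


def fuse_scores(rgb_logits: np.ndarray, flow_logits: np.ndarray, w: float) -> np.ndarray:
    """Weighted late fusion of the two streams; w is the RGB weight in [0, 1]."""
    return w * rgb_logits + (1.0 - w) * flow_logits
```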
2.2 The Mouse Behaviour Dataset
Jhuang et al. (Jhuang et al., 2010) created a dataset to train their mouse behaviour recognition system. They recorded and annotated more than 9,000 video clips (~10 hours of video) of singly housed mice in a home cage. Eight types of behaviour are annotated in this dataset: “drink”, “eat”, “groom”, “hang”, “micro-movement”, “rear”, “rest” and “walk”.
From the recorded video clips, they selected 4,200 clips (~2.5 hours of video) containing the most unambiguous examples of each behaviour to form a subset called the “clipped database”. In this research, we used this subset for the first step of our method.
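For training, the eight behaviour labels need to be mapped to class indices and the clips enumerated. The short sketch below shows one way to do this; the on-disk layout (one directory per behaviour containing the video clips) is an assumption for illustration, not the dataset's documented structure.

```python
# Mapping the eight annotated behaviours to class indices and listing clips.
# The directory layout <root>/<behaviour>/*.avi is an assumed example.
from pathlib import Path

BEHAVIOURS = ["drink", "eat", "groom", "hang",
              "micro-movement", "rear", "rest", "walk"]
LABEL_TO_INDEX = {name: idx for idx, name in enumerate(BEHAVIOURS)}


def list_clips(root: str):
    """Yield (clip_path, class_index) pairs for every clip under `root`."""
    for name in BEHAVIOURS:
        for clip in sorted(Path(root, name).glob("*.avi")):
            yield clip, LABEL_TO_INDEX[name]
```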
3 EXPERIMENTS & RESULTS
The mouse forced swim tests are rodent behavioural tests used to study antidepressant drugs, the antidepressant efficacy of new compounds, and