Table 4: Reduction rate of false expression changes (FECs) between frames when the HMM is added to the Faster R-CNN framework, compared to using Faster R-CNN alone. Boldface numbers indicate the highest score.

Expression transition    Clinic    YouTube
Discomfort - Unhappy     0.337     0.474
Unhappy - Discomfort     0.409     0.563
Discomfort - Other       0.649     0.352
Other - Discomfort       0.610     0.325
Unhappy - Other          0.587     0.489
Other - Unhappy          0.651     0.483
In our experiments, the algorithm is executed on a GTX 1080 Ti GPU and achieves a frame rate of 7 fps. With a more advanced GPU, or by combining the detector with a tracking method, the computation speed can be further increased, which would allow the model to be used in a real-time infant monitoring system.
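To illustrate how a tracking method could raise the effective frame rate, the sketch below interleaves a full detector pass with cheap tracker updates. It is only an assumption-level sketch, not the implementation used in this work: detect_expressions and Tracker are hypothetical placeholders standing in for the Faster R-CNN forward pass and for any fast tracker (e.g., a correlation-filter tracker).

```python
# Minimal sketch (not this paper's implementation) of interleaving a full
# detector pass with lightweight tracker updates to raise the frame rate.
# detect_expressions and Tracker are hypothetical placeholders passed in
# by the caller.

DETECT_EVERY = 5  # run the expensive detector once every N frames

def monitor(frames, detect_expressions, Tracker):
    """Yield a (face_box, expression_label) pair for every frame."""
    tracker, label = None, None
    for i, frame in enumerate(frames):
        if tracker is None or i % DETECT_EVERY == 0:
            # Full detector pass: face bounding box plus expression class.
            box, label = detect_expressions(frame)
            tracker = Tracker(frame, box)
        else:
            # Cheap tracker update carries the face box between detections.
            box = tracker.update(frame)
        yield box, label
```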
5 CONCLUSIONS AND FUTURE WORK
This paper has proposed a near real-time video-based infant monitoring system, using Faster R-CNN combined with a Hidden Markov Model. The HMM increases the stability of decision making over time and reduces the noise in the recognized expressions. Unlike conventional methods, which apply face detection and expression classification as separate stages, we have trained a ConvNet detector that directly outputs the expressions. The experimental results have shown an AP of up to 90.3% for discomfort detection, a substantial accuracy increase (more than 50%) over conventional methods. The high-accuracy discomfort detection can be combined with the analysis of diseases such as GERD. In addition, the consistency of the system output has been evaluated, and the experimental results have shown that the false expression changes between frames can be significantly reduced with a temporal analysis. In the future, as more video sequences of infants become available, we will train our expression classifier and temporal analysis end-to-end.
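As an illustration of the temporal analysis, the following sketch applies Viterbi decoding over per-frame expression probabilities, where high self-transition probabilities suppress isolated single-frame label flips. The state set, transition matrix, and the use of detector scores as emission probabilities are illustrative assumptions, not the exact HMM parameters of our system.

```python
import numpy as np

STATES = ["discomfort", "unhappy", "other"]
# Illustrative transition matrix (an assumption, not the trained model):
# high self-transition probabilities discourage single-frame label changes.
TRANS = np.array([[0.90, 0.05, 0.05],
                  [0.05, 0.90, 0.05],
                  [0.05, 0.05, 0.90]])

def smooth_labels(frame_scores):
    """Viterbi decoding of per-frame class probabilities, shape (T, 3)."""
    T, S = frame_scores.shape
    log_trans = np.log(TRANS)
    # Uniform prior over the first frame's state.
    delta = np.log(np.full(S, 1.0 / S)) + np.log(frame_scores[0])
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = delta[:, None] + log_trans            # score of (from, to)
        back[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + np.log(frame_scores[t])
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):                    # backtrack
        path.append(int(back[t, path[-1]]))
    return [STATES[s] for s in reversed(path)]
```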
REFERENCES
Brahnam, S., Chuang, C.-F., Shih, F. Y., and Slack, M. R.
(2006). Svm classification of neonatal facial images of
pain. In Bloch, I., Petrosino, A., and Tettamanzi, A.
G. B., editors, Fuzzy Logic and Applications, pages
121–128.
Donahue, J., Hendricks, L. A., Rohrbach, M., Venugopalan,
S., Guadarrama, S., Saenko, K., and Darrell, T.
(2017). Long-term recurrent convolutional networks
for visual recognition and description. IEEE Trans.
on Pattern Anal. Mach. Intell., 39(4):677–691.
Everingham, M., Van Gool, L., Williams, C. K. I., Winn,
J., and Zisserman, A. (2010). The pascal visual ob-
ject classes (voc) challenge. International Journal of
Computer Vision, 88(2):303–338.
Fotiadou, E., Zinger, S., Tjon a Ten, W. E., Oetomo, S.,
and de With, P. H. N. (2014). Video-based facial dis-
comfort analysis for infants. In Proc. SPIE 9029, Vi-
sual Information Processing and Communication V,
90290F.
Harrison, D., Sampson, M., Reszel, J., Abdulla, K., Barrowman, N., Cumber, J., Li, C., Nicholls, S., and Pound, C. M. (2014). Too many crying babies: a systematic review of pain management practices during immunizations on YouTube. BMC Pediatrics, 14:134.
Hicks, C., von Baeyer, C., Spafford, P., van Korlaar, I., and Goodenough, B. (2001). The faces pain scale - revised: Toward a common metric in pediatric pain measurement. Pain, 93:173–183.
Jaskowski, S. K. (1998). The flacc: A behavioral scale for
scoring postoperative pain in young children. AACN
Nursing Scan in Critical Care, 8:16.
Li, C., Zinger, S., Tjon a Ten, W. E., and de With, P. H. N.
(2016). Video-based discomfort detection for infants
using a constrained local model. In 2016 Int. Conf.
Systems, Signals and Image Proc. (IWSSIP), pages 1–
4.
Lin, F., Hong, R., Zhou, W., and Li, H. (2018). Facial ex-
pression recognition with data augmentation and com-
pact feature learning. In 2018 25th IEEE Int. Conf. on
Image Processing (ICIP), pages 1957–1961.
Long, J., Shelhamer, E., and Darrell, T. (2015). Fully
convolutional networks for semantic segmentation.
In 2015 IEEE Conf. on Comput. Vis. Patt. Recog.
(CVPR), pages 3431–3440.
Mueller, M., Smith, N., and Ghanem, B. (2017). Context-
aware correlation filter tracking. In 2017 IEEE Conf.
Comp. Vision Pattern Recogn. (CVPR), pages 1387–
1395.
Ren, S., He, K., Girshick, R., and Sun, J. (2017). Faster r-
cnn: Towards real-time object detection with region
proposal networks. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 39(6):1137–1149.
Simonyan, K. and Zisserman, A. (2014). Very deep con-
volutional networks for large-scale image recognition.
arXiv 1409.1556.
Sullivan, M. and Lewis, M. (2003). Emotional expressions
of young infants and children. Infants & Young Chil-
dren, 16:120–142.
Sun, Y., Shan, C., Tan, T., Long, X., Pourtaherian, A.,
Zinger, S., and de With, P. H. N. (2018). Video-based
discomfort detection for infants. Machine Vision and
Applications.
Tavakolian, M. and Hadid, A. (2018). Deep binary representation of facial expressions: A novel framework for automatic pain intensity recognition. In 2018 25th IEEE Int. Conf. on Image Processing (ICIP).