
Hus, V., Gotham, K., and Lord, C. (2014). Standardizing ADOS domain scores: separating severity of social affect and restricted and repetitive behaviors. Journal of Autism and Developmental Disorders, 44(10):2400–2412.
Kay, W., Carreira, J., Simonyan, K., Zhang, B., Hillier, C., Vijayanarasimhan, S., Viola, F., Green, T., Back, T., Natsev, P., et al. (2017). The Kinetics human action video dataset. arXiv preprint arXiv:1705.06950.
Le Couteur, A., Rutter, M., Lord, C., Rios, P., Robertson, S., Holdgrafer, M., and McLennan, J. (1989). Autism diagnostic interview: a standardized investigator-based instrument. Journal of Autism and Developmental Disorders, 19(3):363–387.
Li, J., Bhat, A., and Barmaki, R. (2021). Improving the movement synchrony estimation with action quality assessment in children play therapy. In Proceedings of the 2021 International Conference on Multimodal Interaction, pages 397–406.
Li, J., Chheang, V., Kullu, P., Brignac, E., Guo, Z., Bhat, A., Barner, K. E., and Barmaki, R. L. (2023). MMASD: A Multimodal Dataset for Autism Intervention Analysis. In Proceedings of the 25th International Conference on Multimodal Interaction, ICMI ’23, pages 397–405, New York, NY, USA. Association for Computing Machinery.
Lord, C., Risi, S., Lambrecht, L., Cook, E. H., Leventhal, B. L., DiLavore, P. C., Pickles, A., and Rutter, M. (2000). The Autism Diagnostic Observation Schedule—Generic: A standard measure of social and communication deficits associated with the spectrum of autism. Journal of Autism and Developmental Disorders, 30:205–223.
Lucas, B. D. and Kanade, T. (1981). An iterative image registration technique with an application to stereo vision. In IJCAI’81: 7th International Joint Conference on Artificial Intelligence, volume 2, pages 674–679.
Lyall, K., Croen, L., Daniels, J., Fallin, M. D., Ladd-Acosta, C., Lee, B. K., Park, B. Y., Snyder, N. W., Schendel, D., Volk, H., et al. (2017). The changing epidemiology of autism spectrum disorders. Annual Review of Public Health, 38(1):81–102.
Monté-Rubio, G. C., Falcón, C., Pomarol-Clotet, E., and Ashburner, J. (2018). A comparison of various MRI feature types for characterizing whole brain anatomical differences using linear pattern recognition methods. NeuroImage, 178:753–768.
Nagaoka, C. and Komori, M. (2008). Body movement synchrony in psychotherapeutic counseling: A study using the video-based quantification method. IEICE Transactions on Information and Systems, 91(6):1634–1640.
Nickl-Jockschat, T., Habel, U., Michel, T. M., Manning, J., Laird, A. R., Fox, P. T., Schneider, F., and Eickhoff, S. B. (2012). Brain structure anomalies in autism spectrum disorder—a meta-analysis of VBM studies using anatomic likelihood estimation. Human Brain Mapping, 33(6):1470–1489.
Santana, C. P., de Carvalho, E. A., Rodrigues, I. D., Bastos, G. S., de Souza, A. D., and de Brito, L. L. (2022). rs-fMRI and machine learning for ASD diagnosis: a systematic review and meta-analysis. Scientific Reports, 12(1):6030.
Sherkatghanad, Z., Akhondzadeh, M., Salari, S., Zomorodi-Moghadam, M., Abdar, M., Acharya, U. R., Khosrowabadi, R., and Salari, V. (2020). Automated detection of autism spectrum disorder using a convolutional neural network. Frontiers in Neuroscience, 13:1325.
Srinivasan, S. M., Eigsti, I.-M., Neelly, L., and Bhat, A. N. (2016). The effects of embodied rhythm and robotic interventions on the spontaneous and responsive social attention patterns of children with autism spectrum disorder (ASD): A pilot randomized controlled trial. Research in Autism Spectrum Disorders, 27:54–72.
Sun, C., Junejo, I. N., Tappen, M., and Foroosh, H. (2015). Exploring sparseness and self-similarity for action recognition. IEEE Transactions on Image Processing, 24(8):2488–2501.
Sun, Y., Bao, Q., Liu, W., Fu, Y., Black, M. J., and Mei, T. (2021). Monocular, one-stage, regression of multiple 3D people. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 11179–11188.
Tarr, B., Slater, M., and Cohen, E. (2018). Synchrony and social connection in immersive virtual reality. Scientific Reports, 8(1):3693.
Yan, S., Xiong, Y., and Lin, D. (2018). Spatial temporal graph convolutional networks for skeleton-based action recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32.
Zheng, C., Zhu, S., Mendieta, M., Yang, T., Chen, C., and Ding, Z. (2021). 3D human pose estimation with spatial and temporal transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 11656–11665.
Multi-Modal Framework for Autism Severity Assessment Using Spatio-Temporal Graph Transformers