are already on the market as Digital Assistants to
support accessibility. This study investigates
whether the generative AI used by digital assistants
is suitable for generating descriptions of STEM
graphical content. The literature shows that
generative AI is increasingly used to produce
descriptions of common images, especially photos
and artworks. Scientific content has received little
attention, with the risk of excluding many people
from access to STEM studies and careers. Generative
AI could be profitably exploited to produce
descriptions of complex images, including STEM
content. In this study we analysed the alternative
descriptions automatically generated for STEM
graphical content (with different levels of difficulty)
by five AI Digital Assistants and applications.
Based on tests conducted on even a small set of
images, we can say that 'Seeing AI' is overall
unsuitable because it cannot identify STEM content.
'Be My Eyes' identifies objects correctly and
produces good descriptions of simple content, but
less accurate descriptions of more complex content.
Gemini shows limitations with more complex images,
such as the state diagram, and generates overly
verbose descriptions. Bing Copilot also seems to
perform well with more complex images, both in
identification and in the descriptions, including the
visual description.
As we discussed, the tested AI assistants can be
useful to visually impaired students, although some
very promising assistants are not always reliable,
especially for the complex images typical of STEM
subjects. Moreover, while using the tools we found
that some parts of the descriptions were
inappropriate. Such small mistakes can be
challenging for a student who is trying to learn or
who is taking an exam. In short, current AI assistants
still need improvements in accuracy and accessibility
before they can effectively assist blind students. Last,
AI assistants should be able to adapt their
descriptions to the level of the student, i.e. whether
he/she is new to the subject or has already learned
several concepts.
The study is certainly too limited to establish
whether the tools are mature for interpreting STEM
content. However, it emerges that some tools are
beginning to provide appropriate descriptions, albeit
with many limitations and inaccuracies. A more
in-depth study may provide further guidance. We can
conclude that when image descriptions of STEM
content are generated for the user, they should be
tailored to the student's learning level in a given
subject. Furthermore, the student may be in different
contexts: learning, review/practice, or examination.
The system should also consider these three contexts
to produce appropriate descriptions.
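As a purely illustrative example, the following minimal Python sketch (hypothetical, not part of any of the tested assistants) shows how a request to a generative vision-language model could be parameterized by the student's learning level and current context; the function name, level labels, and prompt wording are all our own assumptions.

    # Hypothetical sketch: tailoring the description request to the
    # student's level and context (names and wording are assumptions).
    LEVEL_HINT = {
        "novice": "Define every technical term and describe the figure step by step.",
        "intermediate": "Assume basic terminology is known; focus on structure and relations.",
        "advanced": "Be concise; report only what is needed to reconstruct the figure.",
    }
    CONTEXT_HINT = {
        "learning": "Explain why each element matters for understanding the concept.",
        "review": "Summarise the key elements to support quick recall.",
        "examination": "Describe only what is shown, without hints or interpretation.",
    }

    def build_description_prompt(subject: str, level: str, context: str) -> str:
        """Compose the textual instruction sent with the image to the model."""
        return (
            f"You are assisting a blind student in a {subject} course. "
            f"Produce an alternative description of the attached figure. "
            f"{LEVEL_HINT[level]} {CONTEXT_HINT[context]}"
        )

    # Example: a state diagram presented to a novice student during an exam.
    print(build_description_prompt("computer science", "novice", "examination"))

Keeping the level and context hints separate from the base request makes it straightforward to cover every combination of the three contexts and the learning levels discussed above.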