ColEnViSon: Color Enhanced Visual Sonifier - A Polyphonic Audio Texture and Salient Scene Analysis

Codruta Orniana Ancuti, Cosmin Ancuti, Philippe Bekaert

Abstract

In this work we introduce a color-based image-to-audio system that enhances scene perception for visually impaired users. Traditional sound-vision substitution systems mainly translate grayscale images into corresponding audio frequencies. However, these algorithms deprive the user of color information, a critical factor in object recognition and in attracting visual attention. We propose an algorithm that translates the scene into sound using classical computer vision techniques. The most salient visual regions are extracted by a hybrid approach that blends the computed saliency map with the segmented image. The selected image region is then simplified based on a reference color map dictionary, and the centroids of the color space are translated into audio via different musical instruments. We chose to encode the audio output as a polyphonic musical composition, reasoning that humans are capable of distinguishing more than one instrument at the same time, and that polyphony also reduces the playing duration. Testing the prototype demonstrates that non-proficient blindfolded participants can easily interpret sequences of colored patterns and can also distinguish, for example, the quantity of a specific color contained in a given image.
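The quantize-then-sonify step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the reference color dictionary, the color-to-instrument pairing, and all function names are invented for demonstration; the paper's actual palette and instrument assignment differ.

```python
import math

# Hypothetical reference color dictionary (names and RGB centroids are
# illustrative, not the paper's actual palette).
REFERENCE_COLORS = {
    "red": (255, 0, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
    "black": (0, 0, 0),
    "white": (255, 255, 255),
}

# Hypothetical color-to-instrument assignment; the paper maps color
# centroids to distinct instruments, but this particular pairing is invented.
INSTRUMENTS = {
    "red": "trumpet",
    "green": "flute",
    "blue": "piano",
    "yellow": "violin",
    "black": "double bass",
    "white": "harp",
}

def nearest_reference(pixel):
    """Snap an (R, G, B) pixel to the closest reference color name."""
    return min(REFERENCE_COLORS,
               key=lambda name: math.dist(pixel, REFERENCE_COLORS[name]))

def sonify_region(pixels):
    """Quantize a region's pixels against the reference palette and return
    (instrument, proportion) pairs -- one simultaneous 'voice' per color
    present, largest proportion first."""
    counts = {}
    for p in pixels:
        name = nearest_reference(p)
        counts[name] = counts.get(name, 0) + 1
    total = len(pixels)
    return sorted(
        ((INSTRUMENTS[name], n / total) for name, n in counts.items()),
        key=lambda pair: -pair[1],
    )

# A toy 'salient region': mostly reddish pixels plus one bluish pixel.
region = [(250, 10, 5)] * 3 + [(10, 20, 240)]
print(sonify_region(region))  # [('trumpet', 0.75), ('piano', 0.25)]
```

Playing the returned voices simultaneously, with loudness proportional to each color's share of the region, is what lets a listener judge "how much" of a given color an image contains.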

References

  1. Arno, P., Capelle, C., Wanet-Defalque, M.-C., Catalan-Ahumada, M., and Veraart, C. (1999). Auditory coding of visual patterns for the blind. Perception, 28(8):1013-1029.
  2. Auvray, M., Hanneton, S., and Lenay, C. (2005). There is something out there: distal attribution in sensory substitution, twenty years later. Journal of Integrative Neuroscience, 4:505-521.
  3. Bach-y-Rita, P., Collins, C. C., Saunders, F. A., White, B., and Scadden, L. (1969). Vision substitution by tactile image projection. Nature, 221:963-964.
  4. Bach-y-Rita, P. and Kercel, S. W. (2003). Sensory substitution and the human-machine interface. Trends in Cognitive Sciences, 7(12):541-546.
  5. Belpaeme, T. (2002). Factors influencing the origins of color categories. PhD Thesis, Artificial Intelligence Lab, Vrije Universiteit Brussel.
  6. Beretta, G. (1990). Color palette selection tools. The Society for Imaging Science and Technology.
  7. Bregman, A. (1990). Auditory scene analysis. MIT Press, Cambridge, MA.
  8. Berlin, B. and Kay, P. (1991). Basic color terms: their universality and evolution. Berkeley: University of California Press.
  9. Capelle, C., Trullemans, C., Arno, P., and Veraart, C. (1998). A real-time experimental prototype for enhancement of vision rehabilitation using auditory substitution. IEEE Transactions on Biomedical Engineering, 45(10):1279-1293.
  10. Clifford, C. W. G., Holcombe, A. O., and Pearson, J. (2004). Rapid global form binding with loss of associated colors. Journal of Vision, 4:1090-1101.
  11. Comaniciu, D. and Meer, P. (1997). Robust analysis of feature spaces: color image segmentation. In Proc. of Computer Vision and Pattern Recognition (CVPR '97).
  12. Comaniciu, D. and Meer, P. (2002). Mean shift: A robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell., 24(5):603-619.
  13. Cronly-Dillon, J., Persaud, K., and Gregory, R. (1999). The perception of visual images encoded in musical form: a study in cross-modality information transfer. Proceedings of the Royal Society B: Biological Sciences, pages 2427-2433.
  14. Fairchild, M. D. (2005). Color Appearance Models, 2nd ed. Wiley.
  15. Felzenszwalb, P. F. and Huttenlocher, D. P. (2004). Efficient graph-based image segmentation. Int. J. Comput. Vision, 59(2):167-181.
  16. Fine, I., MacLeod, D. I. A., and Boynton, G. M. (2003). Surface segmentation based on the luminance and color statistics of natural scenes. Journal of the Optical Society of America A.
  17. Gibson, S. and Harvey, R. (2001). Morphological color quantization. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition.
  18. Heckbert, P. S. (1982). Color image quantization for frame buffer display. ACM SIGGRAPH.
  19. Itti, L., Koch, C., and Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 20(11):1254-1259.
  20. Kelly, K. L. and Judd, D. B. (1976). Color: Universal language and dictionary of names. National Bureau of Standards, Spec. Publ. 440.
  21. Liu, T., Zhou, H., Lin, F., Pang, Y., and Wu, J. (2008). Improving image segmentation by gradient vector flow and mean shift. Pattern Recogn. Lett., 29(1):90-95.
  22. Meijer, P. (1992). An experimental system for auditory image representations. IEEE Transactions on Biomedical Engineering, 39(2):112-121.
  23. Meijer, P. (1998). Cross-modal sensory streams. In Conference Abstracts and Applications, ACM SIGGRAPH.
  24. Mitchell, T. V. and Maslin, M. T. (2007). How vision matters for individuals with hearing loss. International Journal of Audiology, 46(9).
  25. Neisser, U. (1964). Visual search. Scientific American, 210(6):94-102.
  26. Puzicha, J., Held, M., Ketterer, J., Buhmann, J. M., and Fellner, D. W. (2000). On spatial quantization of color images. IEEE Transactions on Image Processing, 9(4):666-682.
  27. Velazquez, R., Pissaloux, E. E., Guinot, J. C., and Maingreaud, F. (2005). Walking using touch: design and preliminary prototype of a non-invasive ETA for the visually impaired. In Proceedings of the IEEE Engineering in Medicine and Biology 27th Annual Conference.
  28. Rossion, B. and Pourtois, G. (2004). Revisiting Snodgrass and Vanderwart's object pictorial set: the role of surface detail in basic-level object recognition. Perception, 33:217-236.
  29. Cheng, S.-C. and Yang, C.-K. (2001). A fast and novel technique for color quantization using reduction of color space dimensionality. Pattern Recognition Letters, 22(8):845-856.
  30. Strybel T. Z., M. M. L. (1998). Auditory apparent motion between sine waves differing in frequency. Perception, 27(4):483-495.
  31. Walther, D. and Koch, C. (2006). Modeling attention to salient proto-objects. Neural Networks, 19:1395-1407.
  32. Wang, J., Thiesson, B., Xu, Y., and Cohen, M. F. (2004). Image and video segmentation by anisotropic kernel mean shift. In Proc. European Conference on Computer Vision (ECCV).
  33. Wilson, R. A. and Keil, F. C. (1999). The MIT Encyclopedia of the Cognitive Sciences (entry: Cerebral Cortex: Top-Down Processing in Vision). MIT Press, Cambridge, MA.


Paper Citation


in Harvard Style

Ancuti C., Ancuti C. and Bekaert P. (2009). ColEnViSon: Color Enhanced Visual Sonifier - A Polyphonic Audio Texture and Salient Scene Analysis. In Proceedings of the Fourth International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2009) ISBN 978-989-8111-69-2, pages 566-572. DOI: 10.5220/0001805105660572


in Bibtex Style

@conference{visapp09,
author={Codruta Orniana Ancuti and Cosmin Ancuti and Philippe Bekaert},
title={ColEnViSon: Color Enhanced Visual Sonifier - A Polyphonic Audio Texture and Salient Scene Analysis},
booktitle={Proceedings of the Fourth International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2009)},
year={2009},
pages={566-572},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001805105660572},
isbn={978-989-8111-69-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Fourth International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2009)
TI - ColEnViSon: Color Enhanced Visual Sonifier - A Polyphonic Audio Texture and Salient Scene Analysis
SN - 978-989-8111-69-2
AU - Ancuti C.
AU - Ancuti C.
AU - Bekaert P.
PY - 2009
SP - 566
EP - 572
DO - 10.5220/0001805105660572