PITCH-SENSITIVE COMPONENTS EMERGE FROM HIERARCHICAL SPARSE CODING OF NATURAL SOUNDS

Engin Bumbacher, Vivienne Ming

2012

Abstract

The neural basis of pitch perception, our subjective sense of the tone of a sound, has been one of the great ongoing debates in neuroscience. Variants of the two classic theories, spectral Place theory and temporal Timing theory, continue to drive new experiments and debates (Shamma, 2004). Here we approach the question of pitch by applying a theoretical model based on the statistics of natural sounds. Motivated by gist research (Oliva and Torralba, 2006), we extended the nonlinear hierarchical generative model developed by Karklin and Lewicki (2003) with a parallel gist pathway. The basic model encodes higher-order structure in natural sounds by capturing variations in the underlying probability distribution. The secondary pathway provides a fast biasing of the model’s inference process based on the coarse spectrotemporal structure of sound stimuli on broader timescales. Adapting our extended model to speech demonstrates that, compared with models without the gist pathway, the learned code describes a more detailed and broader range of statistical regularities that reflect abstract properties of sound such as harmonics and pitch. The spectrotemporal modulation characteristics of the learned code are better matched to the modulation spectrum of speech signals than those of alternate models, and its higher-level coefficients capture information that not only effectively clusters related speech signals but also describes smooth transitions over time, encoding the temporal structure of speech signals. Finally, we find that the model produces a type of pitch-related density component that combines temporal and spectral qualities.

References

  1. Bell, T. and Sejnowski, T. (1995). An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7:1129-1159.
  2. Garofolo, J., Lamel, L., Fisher, W., Fiscus, J., Pallett, D., Dahlgren, N., and Zue, V. (1990). TIMIT Acoustic-Phonetic Continuous Speech Corpus.
  3. Griffiths, T., Buechel, C., Frackowiak, R., and Patterson, R. (1998). Analysis of temporal structure in sound by the human brain. Nature Neuroscience, 6:633-637.
  4. Harding, S., Cooke, M., and Konig, P. (2007). Auditory gist perception: An alternative to attentional selection of auditory streams? In Lecture Notes in Computer Science. Springer.
  5. Karklin, Y. and Lewicki, M. (2003). Learning higher-order structures in natural images. Network: Computation in Neural Systems, 14:483-499.
  6. Karklin, Y. and Lewicki, M. (2005). A hierarchical Bayesian model for learning nonlinear statistical regularities in nonstationary natural signals. Neural Computation, 17:397-423.
  7. Klein, D., Konig, P., and Kording, K. (2003). Sparse spectrotemporal coding of sound. EURASIP J. on Advances in Signal Processing.
  8. Ming, V., Rehn, M., and Sommer, F. (2009). Sparse coding of the auditory image model. UC Berkeley Tech Report.
  9. Oliva, A. and Torralba, A. (2006). Building the gist of a scene: The role of global image features in recognition. Progress in Brain Research.
  10. Olshausen, B. and Field, D. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607-609.
  11. Olshausen, B. and Field, D. (1997). Sparse coding with an overcomplete basis: A strategy employed by V1? Vision Research, 37:3311-3325.
  12. Oxenham, A., Bernstein, J., and Penagos, H. (2004). Correct tonotopic representation is necessary for complex pitch perception. In Proc Natl Acad Sci USA, volume 101, pages 1114-1115.
  13. Patterson, R., Uppenkamp, S., Johnsrude, I., and Griffiths, T. (2002). The processing of temporal pitch and melody information in auditory cortex. Neuron, 36:767-776.
  14. Saul, L. and Roweis, S. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 22:2323.
  15. Shamma, S. (2001). On the role of space and time in auditory processing. Trends in Cognitive Science, 5:340-348.
  16. Shamma, S. (2004). Topographic organization is essential for pitch perception. PNAS, 5:1114-1115.
  17. Shannon, R., Zeng, F., Kamath, V., Wygonski, J., and Ekelid, M. (1995). Speech recognition with primarily temporal cues. Science, 270.
  18. Singh, N. and Theunissen, F. (2003). Modulation spectra of natural sounds and ethological theories of auditory processing. J Acoust Soc Am, 114:3394-3411.
  19. Turner, R. and Sahani, M. (2007). Probabilistic amplitude demodulation. In Lecture Notes in Computer Science, volume 4666, pages 544-551.


Paper Citation


in Harvard Style

Bumbacher E. and Ming V. (2012). PITCH-SENSITIVE COMPONENTS EMERGE FROM HIERARCHICAL SPARSE CODING OF NATURAL SOUNDS. In Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 2: ICPRAM, ISBN 978-989-8425-99-7, pages 219-229. DOI: 10.5220/0003786802190229


in Bibtex Style

@conference{icpram12,
author={Engin Bumbacher and Vivienne Ming},
title={PITCH-SENSITIVE COMPONENTS EMERGE FROM HIERARCHICAL SPARSE CODING OF NATURAL SOUNDS},
booktitle={Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 2: ICPRAM},
year={2012},
pages={219-229},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003786802190229},
isbn={978-989-8425-99-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 2: ICPRAM
TI - PITCH-SENSITIVE COMPONENTS EMERGE FROM HIERARCHICAL SPARSE CODING OF NATURAL SOUNDS
SN - 978-989-8425-99-7
AU - Bumbacher E.
AU - Ming V.
PY - 2012
SP - 219
EP - 229
DO - 10.5220/0003786802190229