Adnan Firoze, M. Shamsul Arifin, Ryana Quadir, Rashedur M. Rahman


The paper presents Bangla word speech recognition using spectral analysis and fuzzy logic. Because human speech is imprecise and ambiguous, fuzzy logic, whose very basis is linguistic ambiguity, could serve as a more precise tool for analysing and recognising human speech. Even though an uttered word is fundamentally a voiced signal, our system revolves around the visual representation of voiced signals: the spectrogram. Although the spectrogram may be perceived as a “visual” entity, it is essentially a matrix of values describing properties of a sound, e.g., energy, frequency and time. In this research, spectral analysis has been chosen over image processing for increased accuracy. The decision-making process of our system is based on fuzzy logic. Experimental results show that our system is 80% accurate on average, compared with 73% for a commercial Hidden Markov Model (HMM) based speech recognizer.
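The abstract does not describe the implementation, so the following is only a rough, hypothetical illustration of the two ideas it names. First, it computes a spectrogram with a short-time Fourier transform, giving a matrix of energy over time (rows) and frequency (columns). Second, it makes a fuzzy-membership decision over that matrix. Every detail here is an assumption and not the authors' method: the Hann window, frame length 64 and hop 32, the per-bin energy profile used as the feature, the triangular membership on L1 distance, and the helper names `spectrogram`, `band_profile`, `closeness`, `recognise` and `tone`.

```python
# Hypothetical sketch: NOT the method of Firoze et al., only an illustration
# of "spectrogram as a matrix" plus a fuzzy-membership decision.
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def spectrogram(signal, frame_len=64, hop=32):
    """Power spectrogram via a naive short-time Fourier transform (STFT).

    Returns a matrix as a list of rows: one row per time frame, one column
    per DFT bin 0..frame_len//2 (frequency), each entry an energy |X|^2.
    """
    if len(signal) < frame_len:
        raise ValueError("signal shorter than one frame")
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT of the windowed frame at bin k (O(N^2), for clarity only).
            x_k = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(abs(x_k) ** 2)
        frames.append(row)
    return frames


def band_profile(spec):
    """Total energy per frequency bin across all frames, normalised to sum to 1."""
    totals = [sum(frame[k] for frame in spec) for k in range(len(spec[0]))]
    total = sum(totals) or 1.0
    return [t / total for t in totals]


def closeness(p, q, width=0.5):
    """Fuzzy membership degree (0..1) of the statement 'p is close to q'.

    Triangular membership on the L1 distance d: 1 when d == 0, falling
    linearly to 0 at d >= width. 'width' is a hypothetical tuning parameter.
    Profiles must come from spectrograms with the same frame_len.
    """
    d = sum(abs(a - b) for a, b in zip(p, q))
    return max(0.0, 1.0 - d / width)


def recognise(signal, templates, **stft_args):
    """Return (label, degree) for the template with the highest fuzzy 'closeness'."""
    profile = band_profile(spectrogram(signal, **stft_args))
    degrees = {label: closeness(profile, t) for label, t in templates.items()}
    best = max(degrees, key=degrees.get)
    return best, degrees[best]


def tone(freq_hz, n=512, fs=8000):
    """Hypothetical test signal: a pure sine at freq_hz sampled at fs Hz."""
    return [math.sin(2 * math.pi * freq_hz * t / fs) for t in range(n)]


if __name__ == "__main__":
    templates = {
        "low": band_profile(spectrogram(tone(500))),
        "high": band_profile(spectrogram(tone(2500))),
    }
    label, degree = recognise(tone(520), templates)
    print(label, round(degree, 2))
```

When run as a script, it builds templates from two pure tones and classifies a 520 Hz tone, which should be labelled `low` with a nonzero membership degree. A real word recognizer would need much richer spectral features and a proper fuzzy rule base. The quadratic-time DFT is used only to keep the sketch free of dependencies.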


Paper Citation

in Harvard Style

Firoze A., Arifin M., Quadir R. and Rahman R. (2011). Bangla Isolated Word Speech Recognition. In Proceedings of the 13th International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 978-989-8425-54-6, pages 73-82. DOI: 10.5220/0003492700730082

in BibTeX Style

author={Adnan Firoze and M. Shamsul Arifin and Ryana Quadir and Rashedur M. Rahman},
booktitle={Proceedings of the 13th International Conference on Enterprise Information Systems - Volume 2: ICEIS},

in EndNote Style

JO - Proceedings of the 13th International Conference on Enterprise Information Systems - Volume 2: ICEIS
SN - 978-989-8425-54-6
AU - Firoze A.
AU - Arifin M.
AU - Quadir R.
AU - Rahman R.
PY - 2011
SP - 73
EP - 82
DO - 10.5220/0003492700730082