6 CONCLUSION
This paper presents the Masry end-to-end text-to-speech system tailored
for Egyptian Arabic, which combines Tacotron 2 with the HiFi-GAN vocoder.
In addition, a novel Egyptian Arabic dataset and its transcriptions were
introduced. The system’s performance was assessed with automatic
evaluation metrics, namely Character Error Rate (CER) and Word Error
Rate (WER), resulting in scores of 7.3 and 22.3, respectively.
Furthermore, a manual evaluation using the Mean Opinion Score (MOS)
yielded a score of 4.48. Our findings indicate that the system’s
performance is close to that of English and Modern Standard Arabic
systems. Future work entails incorporating additional features, such as
emotional speech and multispeaker support, to further enhance the
system’s capabilities.