can be detected by analysis of phoneme length with
a false detection rate of only around 2.5%, because
phonemes at sentence ends tend to be longer than
other ones.
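The idea of using lengthening as a boundary cue can be illustrated with a minimal sketch. It is not the procedure evaluated in this paper: the function name `flag_lengthened`, the input format (a list of label and duration pairs) and the ratio threshold of 1.5 are assumptions chosen only for illustration. Each segment is compared with the mean duration of its own phoneme class, and segments that are much longer than that baseline are reported as sentence-end candidates.

```python
from statistics import mean


def flag_lengthened(segments, ratio_threshold=1.5):
    """Return indices of segments that are unusually long for their phoneme.

    segments: list of (phoneme_label, duration_ms) tuples in utterance order.
    ratio_threshold: hypothetical factor; a segment is flagged when its
        duration exceeds ratio_threshold times the mean duration of all
        segments carrying the same label.
    """
    # Collect durations per phoneme label to obtain a per-class baseline.
    durations_by_label = {}
    for label, duration in segments:
        durations_by_label.setdefault(label, []).append(duration)
    baseline = {label: mean(durs) for label, durs in durations_by_label.items()}

    # A segment is a boundary candidate when it is much longer than
    # what is typical for its phoneme class.
    return [
        index
        for index, (label, duration) in enumerate(segments)
        if duration > ratio_threshold * baseline[label]
    ]


# Illustrative (made-up) durations in milliseconds.
example = [("a", 80), ("t", 50), ("a", 85), ("t", 55), ("a", 200)]
print(flag_lengthened(example))  # [4]
```

In practice the baseline would be estimated from a separate annotated corpus rather than from the utterance being analysed, and the threshold would be tuned against the acceptable rate of false detections.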
ACKNOWLEDGEMENTS
The project was funded by the National Science
Centre on the basis of decision DEC-
2011/03/D/ST6/00914.
SIGMAP 2013 - International Conference on Signal Processing and Multimedia Applications