by HTK, providing a 10-best list, and used in an experiment similar to the one described above. The amount of data is not sufficient to provide statistically significant results, but observations on individual sentences (Table 2) lead to the same conclusion as the main experiment. The recognitions obtained using HTK alone had fewer errors for 6 sentences, and for 5 sentences the number of errors was the same. One sentence was recognised correctly with both models, and one more was recognised correctly using only the HTK acoustic model.
4 RECOGNITION USING LANGUAGE MODEL
Recognition can be conducted by finding, among the provided hypotheses, the one whose sequence of POS tags is most coherent. The tagger calculates $p_{\mathrm{pos}}$, which, by Bayes' theorem, can be used as an additional weight in speech recognition. The probabilities $p_{\mathrm{htk}}$ obtained from the HTK model tend to be very similar for all hypotheses in the 10-best list of a particular utterance. This is why an extra weight $w$ was introduced to favour probabilities from the acoustic model over $p_{\mathrm{pos}}$ received from the tagger. The final measure can be obtained by applying Bayes' rule:
$$p = p_{\mathrm{htk}}^{\,w} \, p_{\mathrm{pos}} \qquad (1)$$
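Equation (1) can be sketched in code as a rescoring of the 10-best list. This is a minimal sketch, assuming each hypothesis already carries $\log p_{\mathrm{htk}}$ and $\log p_{\mathrm{pos}}$; the function names, the tuple layout and the scores in the usage example are hypothetical and are not taken from the experiments. It works in the log domain, where Eq. (1) becomes $w \log p_{\mathrm{htk}} + \log p_{\mathrm{pos}}$, which avoids numerical underflow when many small probabilities are multiplied.

```python
from typing import List, Tuple

# Hypothetical layout of one N-best entry: (label, log p_htk, log p_pos).
Hypothesis = Tuple[str, float, float]


def combined_log_score(log_p_htk: float, log_p_pos: float, w: float) -> float:
    """Log of Eq. (1): log(p_htk**w * p_pos) = w * log p_htk + log p_pos."""
    return w * log_p_htk + log_p_pos


def rescore(nbest: List[Hypothesis], w: float) -> List[Hypothesis]:
    """Return the N-best list sorted by the combined score, best first."""
    return sorted(nbest, key=lambda h: combined_log_score(h[1], h[2], w), reverse=True)


if __name__ == "__main__":
    # Made-up scores: the acoustic log-likelihoods are close to each other,
    # as the text reports for p_htk within one 10-best list.
    nbest = [("A", -1000.0, -12.0), ("B", -1001.0, -9.0), ("C", -1003.0, -10.0)]
    for w in (1.0, 5.0):
        best = rescore(nbest, w)[0][0]
        print(f"w={w}: best hypothesis {best}")  # w=1.0 -> B, w=5.0 -> A
```

With $w = 1$ the tagger score overturns the small acoustic advantage of hypothesis A; a larger $w$ stretches the acoustic differences so that the acoustic model dominates, which is the role the text assigns to the weight.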
Bayes' rule is often used to compute posterior probabilities given observations: it gives the probability that a proposed hypothesis is correct, given an observation. It is also often applied to combine the probabilities of different models. Here, $p_{\mathrm{htk}}$ is the probability of the acoustic units given the hypothesised words, and $p_{\mathrm{pos}}$ is the probability of those words. Strictly, the product should be divided by the probability of the acoustic observation for normalisation. This division can be skipped as long as normalisation is done in another way, or if we accept that the final result is not a proper probability distribution, since it is not normalised to sum to one. This is easily accepted when we are interested only in the hypothesis that maximises the result (its arg max) and do not need proper probability values. Applying linguistic data in speech recognition is necessary because acoustic models alone are not effective enough. However, the model based on the POS tagger does not seem to solve this issue.
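The normalisation argument above can be written out as a short derivation. The symbols $O$ (the acoustic observation of one utterance) and $W$ (one hypothesis from its 10-best list) are introduced here for illustration only, with $p_{\mathrm{htk}} = P(O \mid W)$ and $p_{\mathrm{pos}} = P(W)$:

$$
\begin{aligned}
\hat{W} &= \operatorname*{arg\,max}_{W} P(W \mid O)
         = \operatorname*{arg\,max}_{W} \frac{P(O \mid W)\,P(W)}{P(O)} \\
        &= \operatorname*{arg\,max}_{W} P(O \mid W)\,P(W)
\end{aligned}
$$

The last step holds because $P(O)$ is the same for every hypothesis of a given utterance: dividing all scores by one positive constant cannot change which of them is largest. Equation (1) keeps this unnormalised form and adds the exponent $w$ on the acoustic term.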
5 CONCLUSIONS
It seems that POS tags are too ambiguous to be used effectively in modelling Polish for ASR. Another source of linguistic data has to be used to provide an effective language model.
ACKNOWLEDGEMENTS
We received significant help from Maciej Piasecki of the Technical University of Wrocław, who provided the tagger output, and from Stefan Grocholewski of the Technical University of Poznań, who let us experiment on CORPORA.
REFERENCES
Brill, E. (1995). Transformation-based error-driven learning and natural language processing: A case study in part of speech tagging. Computational Linguistics, December:543–565.
Cozens, S. (1998). Primitive part-of-speech tagging using word length and sentential structure. Computation and Language.
Dębowski, Ł. (2003). A reconfigurable stochastic tagger for languages with complex tag structure. Proceedings of the Workshop on Morphological Processing of Slavic Languages, EACL.
Grocholewski, S. (1995). Założenia akustycznej bazy danych dla języka polskiego na nośniku CD-ROM (Eng.: Assumptions of acoustic database for Polish language). Mat. I KK: Głosowa komunikacja człowiek-komputer, Wrocław, pages 177–180.
Johansson, S., Leech, G., and Goodluck, H. (1978). Manual of Information to Accompany the Lancaster-Oslo/Bergen Corpus of British English, for Use with Digital Computers. Department of English, University of Oslo.
Kucera, H. and Francis, W. (1967). Computational Analysis of Present Day American English. Brown University Press, Providence.
Piasecki, M. (2006). Hand-written and automatically extracted rules for Polish tagger. In Sojka, P., Kopecek, I., and Pala, K., editors, Proceedings of Text, Speech and Dialogue 2006, Lecture Notes in Artificial Intelligence, pages 205–212. Springer.
Przepiórkowski, A. (2004). The IPI PAN Corpus: Preliminary version. IPI PAN.
Przepiórkowski, A. (2006). The potential of the IPI PAN corpus. Poznań Studies in Contemporary Linguistics, 41:31–48.
Przepiórkowski, A. and Woliński, M. (2003). The unbearable lightness of tagging: A case study in morphosyntactic tagging of Polish. Proceedings of the 4th International Workshop on Linguistically Interpreted Corpora (LINC-03), EACL 2003.
Young, S. (1996). Large vocabulary continuous speech recognition: a review. IEEE Signal Processing Magazine, 13(5):45–57.
Young, S., Evermann, G., Gales, M., Hain, T., Kershaw, D.,
Moore, G., Odell, J., Ollason, D., Povey, D., Valtchev,
V., and Woodland, P. (2005). HTK Book. Cambridge
University Engineering Department, UK.
SIGMAP 2008 - International Conference on Signal Processing and Multimedia Applications
180