Furthermore, the results are ordered by relevance,
which is determined by the number of occurrences of
the keywords and by the time at which the related
audio content was broadcast. For each result, the
user can choose to hear only the piece in which the
keyword occurred, the whole program containing this
piece, or only the speech or music of this program.
Table 3: Excerpt of results for search query “Literature”.
[19:53:14] Deutschlandradio Kultur - Literature
"Our history is full of violence": literature as contemporary history by E. L.
Doctorow and Richard Ford. By Johannes Kaiser
01.02.2009 - Program: 19:30-20:00 - Duration: 6m 36s
Hear: All / Music / Speech
[19:14:02] HR1 - hr1 - PRISMA
The magazine in the evening: amusing and informative, relaxing and inspiring!
The most important news of the day with tones and opinions, tips for spare time,
the best from society and lifestyle.
01.02.2009 - Program: 18:05-20:00 - Duration: 3m 23s
Hear: All / Music / Speech
[17:19:19] Deutschlandradio Kultur – Local time
The most important topics of the day
01.02.2009 - Program: 17:07-18:00 - Duration: 7s
Hear: All / Music / Speech
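The ordering described before Table 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say how keyword count and broadcast time are combined, so the sketch assumes that the count is the primary key and that more recent broadcasts win ties. The `Hit` class, its field names, and the sample values are invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Hit:
    """One search result. Field names are illustrative, not from the paper."""
    station: str
    keyword_count: int        # occurrences of the query keywords in the segment
    broadcast_time: datetime  # when the segment was on air


def rank_hits(hits):
    """Sort by keyword count (highest first), then by broadcast time (newest first).

    Assumption: the count dominates and time only breaks ties.
    """
    return sorted(hits, key=lambda h: (h.keyword_count, h.broadcast_time), reverse=True)


# Invented sample data, only to show the tie-breaking behavior.
sample_hits = [
    Hit("A", 1, datetime(2009, 2, 1, 19, 53, 14)),
    Hit("B", 3, datetime(2009, 2, 1, 17, 19, 19)),
    Hit("C", 3, datetime(2009, 2, 1, 19, 14, 2)),
]
ranked_stations = [h.station for h in rank_hits(sample_hits)]
```

With these sample values, the two hits with three keyword occurrences come first, the later broadcast ahead of the earlier one, followed by the single-occurrence hit.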
Thanks to relatively recent improvements in
large-vocabulary ASR systems, recognition of
broadcast news has become possible in real time.
However, problems such as the use of abbreviations,
elements of foreign languages, and acoustic
interference complicate the recognition process.
The combination of informal speech (including
dialect and slang), non-speech utterances, music,
noise, and environmental sounds, frequent speaker
changes, and the impossibility of training the ASR
system on individual speakers results in poor
transcription performance on broadcast news. The
result is a stream of words with fragmented units of
meaning.
Our experiments confirm that an older study of ASR
performance on broadcast news (Whittaker, 1999)
still holds today. That study observed wide
variations, from a maximum of 88% of words correctly
recognized to a minimum of 35%, with a mean of 67%
(our results: 92%, 41%, and 72%, respectively).
Unfortunately, most ASR programs do not provide
additional information: they offer no measure of
confidence, nor do they indicate when they fail to
recognize anything in the audio signal. When the
speech recognizer makes errors, these are deletions,
insertions, and substitutions of words from its own
vocabulary, rather than the kinds of non-word errors
that are generated by optical character recognition.
Recently coined proper nouns, especially names,
contribute significantly to the error because they
cannot be transcribed correctly. It seems unlikely
that error-free ASR will be available in the
foreseeable future.
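To make the figures above concrete, the following sketch shows one common way to count correctly recognized words: align the ASR output with a reference transcript by minimum edit distance and count hits, substitutions, deletions, and insertions. It is an assumption that the percentages quoted above were computed in this way; the function names and the sample sentences are invented for illustration.

```python
def align_counts(reference, hypothesis):
    """Return (hits, substitutions, deletions, insertions) for a
    minimum-edit-distance word alignment of hypothesis against reference."""
    ref, hyp = reference.split(), hypothesis.split()
    # table[i][j] = (edits, hits, subs, dels, ins) aligning ref[:i] with hyp[:j]
    table = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    table[0][0] = (0, 0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        table[i][0] = (i, 0, 0, i, 0)  # every reference word deleted
    for j in range(1, len(hyp) + 1):
        table[0][j] = (j, 0, 0, 0, j)  # every hypothesis word inserted
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            e, h, s, d, n = table[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (e, h + 1, s, d, n)      # correct word
            else:
                diag = (e + 1, h, s + 1, d, n)  # substitution
            e, h, s, d, n = table[i - 1][j]
            dele = (e + 1, h, s, d + 1, n)      # deletion
            e, h, s, d, n = table[i][j - 1]
            ins = (e + 1, h, s, d, n + 1)       # insertion
            table[i][j] = min(diag, dele, ins, key=lambda t: t[0])
    return table[-1][-1][1:]


def percent_correct(reference, hypothesis):
    """Percentage of reference words that were recognized correctly."""
    n_ref = len(reference.split())
    if n_ref == 0:
        raise ValueError("reference transcript is empty")
    hits = align_counts(reference, hypothesis)[0]
    return 100.0 * hits / n_ref


# Invented example: one substitution and one deletion in five words.
counts = align_counts("the magazine in the evening", "the magazine and evening")
score = percent_correct("the magazine in the evening", "the magazine and evening")
```

In this invented example three of the five reference words survive, so the score is 60%. Note that insertions do not lower this particular measure, which is one reason it should be read together with the error types listed above.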
However, the highest precision is not required for
our approach. The goal is not to obtain a correct
transcript, but simply to gather enough semantic
information to generate a characterization that the
system can use to find relevant content. The
interface primarily offers the user the original
audio content from the recordings, because audio is
doubtless a much richer medium of communication.
Voice quality and intonational characteristics are
lost in transcription, and intonational variation
has been widely shown to change the semantics of
even the simplest phrases. Hence, the presentation
of text is intentionally limited, in contrast to
Whittaker (1999).
An advantage of our system, also with respect to
the previously mentioned problem, is its
low-complexity yet efficient MSD. It enables us to
monitor up to 30% more channels, still with good
accuracy, compared to a system with an MSD of higher
complexity. In any case, MSD and ASR often run into
major difficulties because modern broadcasts use
background music under spoken segments.
During an evaluation period of one month, we were
able to process up to four radio channels at the same
time and to integrate the obtained information
automatically into our database for instant use. By
contrast, the monitoring of several data services is
possible without any problems. The limitation
imposed by ASR could be mitigated by splitting the
task across parallel processes, which can reduce the
time lag between the recording and the end of the
indexing process. The current limitations of the
introduced system have to be addressed by more
efficient speech recognition subsystems, more
sophisticated semantic retrieval algorithms, and a
higher degree of parallel processing. Furthermore,
a more natural communication style using a
combination of speech, gesture, and contextual
knowledge should become possible in the future. For
this, a system able to interpret the semantics of
speech is essential.
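The idea of splitting the ASR task for parallel processing, mentioned above, can be sketched as follows. This is a minimal illustration and not the system's implementation: `transcribe_chunk` is a hypothetical stand-in for a call into a real ASR engine, and a thread pool is used on the assumption that the engine runs outside the Python interpreter (for example as an external process or service).

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(samples, chunk_len):
    """Cut a recording (any sequence of samples) into consecutive chunks."""
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


def transcribe_chunk(chunk):
    """Hypothetical placeholder for an ASR call; returns a dummy transcript."""
    return f"<{len(chunk)} samples>"


def transcribe_parallel(samples, chunk_len, workers=4, recognizer=transcribe_chunk):
    """Transcribe all chunks concurrently and return the pieces in broadcast order."""
    chunks = split_into_chunks(samples, chunk_len)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the transcript stays chronological.
        return list(pool.map(recognizer, chunks))


# Dummy recording of ten samples, cut into chunks of four.
pieces = transcribe_parallel(list(range(10)), chunk_len=4)
```

Fixed-length chunks may cut through a word. A practical variant would probably cut at pauses or at the speech/music boundaries that the MSD already provides, but that choice is a suggestion here and not something the text specifies.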
7 CONCLUSIONS
The Digital Radio was extended with the capability
to systematically search for content in DAB/DMB
audio and data services; no major obstacles exist to
extending these principles to HD Radio™, internet
services, podcasts, etc. This functional extension
of a digital receiver adds significant value by
promoting the evolution towards an embedded device
providing innovative functionalities: