technical approaches for creating a scalable IVRS
platform for delivering public services anytime,
anywhere. We have also introduced the combined
use of different voice architectures, which can be used
to create an effective voice platform. Compared to
prior solutions, the IVR platform offers two key
novelties. First, it seamlessly connects Internet-based
users with phone-based users. Both sets of users can
contribute and retrieve audio messages from a
repository in the automated IVR system.
Departments can connect to the IVRS through the Internet
to post audio recordings for automatic broadcast to
mobile phones. Second, the IVR
system scales across geographically
distributed access points, enabling affordable access
via local phone calls (Vashistha, 2012).
IVR systems may become primarily an assistive
device for callers and agents during a conversation.
IVR will help make a conversation more
meaningful by collecting and conveying information
to one or both parties. In that sense, IVR will be
a thin intermediate layer that can amplify the impact
of talk by making it more interactive and by
providing context. Some of the technologies used to
enhance IVR systems are described below.
2.1 Text to Speech (TTS) Systems
The goal of TTS is to convert input text into
natural-sounding speech in order to transmit information from a
machine to a person. For example, a citizen dials an
IVRS number to check the status of an
application he or she has filed, and the IVRS
reads out the status fetched from the concerned
department's server by converting the received text into
speech using a TTS engine. Systems that simply string
together words recorded in isolation produce
artefacts that are often
perceptible. The methodology used in TTS is to
exploit audio representations of speech for
synthesis, together with linguistic analyses of text to
extract correct pronunciations (what is being said in
a given context, in terms of region and language) and
prosody in context (the “melody” of a sentence; how it
is being said). Synthesis systems are commonly
evaluated in terms of three characteristics: accuracy
of rendering the input text (does the TTS system
pronounce, e.g., acronyms, names, URLs, and email
addresses as a knowledgeable human would?),
intelligibility of the resulting voice message
(measured as a percentage of a test set that is
understood), and perceived naturalness of the
resulting speech (does the TTS sound like a
recording of a live human?). A text-to-speech system
can be used to broadcast citizen services, such as
weather information and crop details for farmers and
status updates, in addition to banking and
telecom services.
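
The first evaluation criterion above, accuracy of rendering the input text, depends largely on how the text is normalized before it reaches the TTS engine. The following is a minimal sketch of such a normalization step for the application-status example, not a description of any particular engine. The acronym table and the function names (normalize_for_tts, status_prompt) are hypothetical and introduced only for illustration; the call to an actual TTS engine is deliberately left out.

```python
import re

# Hypothetical expansion table; a real deployment would maintain its own.
ACRONYMS = {"IVRS": "I V R S", "TTS": "T T S"}

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def spell_digits(match):
    """Return the matched digit run as space-separated digit words."""
    return " ".join(DIGIT_WORDS[int(d)] for d in match.group(0))


def normalize_for_tts(text):
    """Rewrite text so that a TTS engine reads acronyms letter by letter
    and numbers digit by digit (an assumed reading convention)."""
    for acronym, spoken in ACRONYMS.items():
        text = re.sub(r"\b%s\b" % re.escape(acronym), spoken, text)
    return re.sub(r"\d+", spell_digits, text)


def status_prompt(application_id, status):
    """Build the sentence that would be handed to the TTS engine."""
    return normalize_for_tts(
        "Your application %s is %s." % (application_id, status)
    )


if __name__ == "__main__":
    print(status_prompt("4207", "approved"))
```

Reading identifiers digit by digit is one possible design choice; whether it suits a given service depends on how callers are used to hearing their application numbers.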
2.2 Automatic Speech Recognition
Automatic speech recognition (ASR) means
understanding voice input and performing any
required task, or the ability to match the voice input
against a provided or acquired vocabulary. The task
is to get a computer to understand
spoken language. By “understand” we mean to react
appropriately and convert the input speech into
another medium, e.g. text. Speech recognition is
therefore sometimes referred to as speech-to-text
(STT). An ASR system is
very important in delivering government services:
there are hundreds of services, and it is extremely
difficult to access them through a common
number without an accurate ASR system.
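
Matching voice input against a provided vocabulary can be sketched as follows, assuming an ASR engine has already produced a text transcript of what the caller said. The service names, department labels, and the route_utterance function are hypothetical examples, and the approximate string matching from Python's standard difflib module stands in for whatever matching a production ASR system would perform.

```python
import difflib

# Hypothetical vocabulary: spoken service name -> responsible department.
SERVICES = {
    "birth certificate": "civil-registry",
    "ration card": "food-supplies",
    "land records": "revenue",
    "weather information": "agriculture",
}


def route_utterance(transcript, cutoff=0.6):
    """Map a recognized transcript to a department, or None if no
    vocabulary entry is similar enough to the transcript."""
    phrase = transcript.strip().lower()
    matches = difflib.get_close_matches(
        phrase, list(SERVICES), n=1, cutoff=cutoff
    )
    if not matches:
        return None  # the caller would be asked to repeat the request
    return SERVICES[matches[0]]


if __name__ == "__main__":
    print(route_utterance("Ration Card"))
    print(route_utterance("land record"))
```

The cutoff parameter trades off tolerance for recognition errors against the risk of routing a caller to the wrong service; its value here is an arbitrary starting point.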
2.3 IP-Telephony
With the introduction of new edge technologies,
Internet Protocol (IP) based networks are
increasingly being used as an alternative to the
traditional circuit-switched telephone network. The
different flavours of IP telephony provide, to varying
degrees, alternative means of originating,
transmitting, and terminating voice and data
transmissions that would otherwise be carried by
the public switched telephone network (PSTN)
(Craig, 2000).
2.3.1 IP based Audio and Video Calling
Audio and video calls can be carried over an IP
network. Through the use of the Session Initiation
Protocol (SIP), point-to-point communications
are no longer restricted to voice calls but can
be extended to multimedia such as video.
IVR systems with live video of the caller
make a more valuable interaction
with the caller possible. With the introduction of full-duplex
video, IVR systems will gain capabilities such as the
ability to read emotions and facial expressions. This