2.3 Robot Description
The prototype system uses a compact industrial
general-purpose robotic arm (ABB IRB 140). The
robot is a six-axis machine with fast acceleration,
a wide working area and a high payload. It is driven
by a high-performance industrial motion control
unit (S4Cplus), which employs the RAPID
programming language. The control unit offers
extensive communication capabilities: FieldBus, two
Ethernet channels and two RS-232 channels. The
serial channel was chosen for communication
between the robot and the developed control system
running on a PC.
Figure 3: Robots used.
The robotic control SW module simplifies the use
of the robot from the main engine's point of view:
it abstracts away the physical communication and
the robot's programming interface. It either
accepts or refuses movement commands issued by the
core engine, depending on each command's
feasibility. When a command is accepted, it is
carried out asynchronously, and the engine is
notified once the command is completed.
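The accept/refuse-then-notify behaviour described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class and method names are invented, the arm is reduced to a one-dimensional position, and the real RAPID/serial layer is hidden behind a placeholder.

```python
import threading

class RobotControlModule:
    """Illustrative sketch of the control module: feasibility check on
    acceptance, asynchronous execution, completion callback."""

    def __init__(self, workspace_limit=0.5, notify=print):
        self.position = 0.0              # simplified 1-D arm position
        self.workspace_limit = workspace_limit
        self.notify = notify             # callback into the core engine

    def move(self, delta):
        """Accept or refuse a movement command, then run it asynchronously."""
        target = self.position + delta
        if not -self.workspace_limit <= target <= self.workspace_limit:
            return False                 # refused: outside the working area
        threading.Thread(target=self._execute, args=(target,)).start()
        return True                      # accepted: will notify on completion

    def _execute(self, target):
        self.position = target           # real code would drive the serial link
        self.notify(f"move to {target:+.2f} completed")
```

The key design point, as in the paper, is that `move` returns immediately with an accept/refuse decision, while the completion notification arrives later on a separate thread.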
2.4 Distributed Computation
Most of the system's modules are developed and run
on a standard PC to which the robot is connected.
Since some of the software modules require
significant computational power, the system's
response time was far from satisfactory when the
whole system ran on a single computer. Therefore,
the most demanding computations (namely object
recognition and voice recognition) were distributed
to other high-performance computers over the
network (TCP connections).
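The request/response pattern behind this offloading can be sketched with plain TCP sockets. Everything here is hypothetical (the paper does not describe its wire protocol): the "worker" stands in for a recogniser running on a remote high-performance machine, and for a self-contained demo it runs in a local thread.

```python
import socket
import threading

def worker(server_sock):
    """Hypothetical remote worker: in the real system this would run the
    object or voice recogniser on a separate machine."""
    conn, _ = server_sock.accept()
    with conn:
        request = conn.recv(1024)            # e.g. a recognition request
        conn.sendall(b"result:" + request.upper())

def offload(data, host, port):
    """Send one request over TCP and block for the remote result."""
    with socket.create_connection((host, port)) as s:
        s.sendall(data)
        s.shutdown(socket.SHUT_WR)
        return s.recv(1024)

server = socket.socket()
server.bind(("127.0.0.1", 0))                # ephemeral local port for the demo
server.listen(1)
threading.Thread(target=worker, args=(server,), daemon=True).start()
print(offload(b"recognise", *server.getsockname()))
```

The PC-side module blocks only for the duration of one request, so the demanding computation itself runs entirely on the remote host.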
3 DIALOGUE STRATEGY
The voice interface between an operator and the
controlled process is provided by a speech
recogniser and a text-to-speech synthesis (TTS)
system, both for the Czech language. The TTS
system, named EPOS, was developed by URE AV
Prague. It offers various male and female voices
with many configurable settings.
The speech recognition is based on a proprietary
isolated-word engine developed in previous
projects. The recogniser is speaker-independent,
noise-robust and phoneme-based, using 3-state
HMMs (Hidden Markov Models) with 32 Gaussians. It
is suitable for large vocabularies (up to 10k words
or short phrases) and allows us to define various
commands and their synonyms.
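The command-and-synonym vocabulary can be pictured as a simple mapping from recognised words to canonical commands. The entries below are invented examples (and in English rather than Czech); the paper only states that synonyms are supported.

```python
# Hypothetical synonym table: recognised word -> canonical command.
SYNONYMS = {
    "lift": "move up",
    "raise": "move up",
    "grab": "pick",
    "take": "pick",
}

def canonical(word):
    """Map a recognised word to its canonical command; unknown words
    pass through unchanged."""
    return SYNONYMS.get(word, word)
```

With such a table, several spoken variants trigger the same internal command, which is what makes large synonym-rich vocabularies practical.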
The dialogue system is event-driven. The events
fall into three fundamental categories: operator
events, scene manager events and device events.
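The three event categories suggest a dispatcher that routes each incoming event to the handler for its source. This is a sketch under assumed names; the paper does not specify how its dialogue manager is structured internally.

```python
from enum import Enum, auto

class EventSource(Enum):
    """The three event categories named in the text."""
    OPERATOR = auto()       # spoken commands from the user
    SCENE_MANAGER = auto()  # discrepancies detected in the scene
    DEVICE = auto()         # state changes reported by the robot

def dispatch(source, payload, handlers):
    """Route an event to the handler registered for its source."""
    return handlers[source](payload)
```

A handler table then keeps the per-category logic (Sections 3.1 and 3.2) separate from the event loop itself.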
3.1 Operator Events
Operator events usually occur in response to the
operator's requests, for example commands that
trigger a robot movement, object detection, the
definition of a new command, or the detection of a
new object. This kind of event can occur at any
time, but the dialogue manager has to decide
whether the request is relevant and feasible or
whether it is just a random speech recognition
error.
Although the acoustic conditions in robotic
applications usually involve high background noise
(servos, air pump), the speech recogniser commonly
achieves a 98% recognition score. If the operator
says a wrong command or a command out of context
(for example, the operator says "drop" but the
robot is not holding anything), the manager asks
him or her for a feasible command instead of the
nonsensical one.
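The context check behind the "drop" example can be sketched as a small feasibility function. The function name, the second context rule and the response strings are invented for illustration; only the empty-gripper "drop" case comes from the text.

```python
def handle_operator_event(command, holding_object):
    """Hypothetical feasibility check: refuse out-of-context commands
    and ask the operator for a sensible one instead."""
    if command == "drop" and not holding_object:
        return "The gripper is empty - please give a feasible command."
    if command == "pick" and holding_object:
        return "Already holding an object - please give a feasible command."
    return "executing '" + command + "'"
```

Filtering commands against the current context in this way is also what lets the manager discard random recognition errors instead of acting on them.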
3.2 Scene Manager Events
This sort of event occurs when the scene manager
detects a discrepancy in the scene, for example
when the operator says "move up" and the robot's
arm moves all the way up until its maximum range
is reached. When this happens, a scene event is
generated and the system announces that the top
position has been reached.
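The "move up until the maximum range" example amounts to clamping the requested motion at the workspace limit and emitting a scene event when the limit is hit. The function and the numeric limit below are illustrative assumptions, not values from the paper.

```python
def check_range(position, step, max_range=1.0):
    """Clamp a 'move up' request at the workspace limit and return a
    scene event string when the top position is reached."""
    target = position + step
    if target >= max_range:
        return max_range, "scene event: top position reached"
    return target, None
```

The dialogue manager can then turn the returned event into the spoken announcement that the top position has been reached.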
Another scene event occurs when the operator
wants to pick up an object, but the system does
not know which one because of multiple objects