an inbound and an outgoing stream from the client’s
side, thus being more costly in bandwidth.
The processing cost, especially for the client, is a
very important factor in deciding which solution
will be implemented. The combination of Java with
X3D content, and the fact that the platform runs in
an X3D-enabled HTML browser, leads to a relatively
high processing cost. The addition of spatial audio
support must therefore be as light as possible in
processing cost. The server processing cost is very
important as well, due to the number of clients a
server may be asked to serve. The first solution
adds very little processing cost to the client, since no
extra modules are required to play back the audio.
The only extra cost is that of the X3D browser
reproducing the audio, which is small compared
with the cost of a new module. The second solution
is light for the client as well, since only one JMF
Player class instance is required for audio playback.
However, that is not the case on the server
side. The invocation of n² threads, where n is the
number of clients, demands much more processing
time than the n threads required by the server of the
first solution. In addition, when the second solution
is used the spatialization of the audio is performed
exclusively on the server, while in the first scenario
each client’s applet is responsible for the process.
The cost of spatializing the audio grows very large
as the number of clients increases. In conclusion, the
first solution comes with a lower processing cost
than the second one.
In terms of complexity the picture remains the
same. The first scenario relies on the internal
operations of the X3D browser for the
spatialization, while the second invokes the
spatialization algorithm n² times, and at the same
time the SIP Spatial Audio Server is assigned the
complex task of mixing n streams for each user,
where n is the number of online clients.
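The growth rates discussed above can be illustrated with a small
sketch. It only restates the counts given in the text (n server
threads for the first solution, n² threads and n² spatialization
calls for the second). The class and method names are hypothetical
and are not part of the EVE platform.

```java
// Illustrative cost model only; names are hypothetical, not EVE code.
final class SpatialAudioCost {

    // First solution: one server thread per connected client.
    static long serverThreadsFirstSolution(int clients) {
        return clients;
    }

    // Second solution: n^2 server threads for n clients, as stated in the text.
    static long serverThreadsSecondSolution(int clients) {
        return (long) clients * clients;
    }

    // Second solution: the spatialization algorithm is invoked n^2 times.
    static long spatializationCallsSecondSolution(int clients) {
        return (long) clients * clients;
    }

    public static void main(String[] args) {
        // Print the thread counts side by side for a few client counts.
        for (int n : new int[] {2, 10, 50}) {
            System.out.println(n + " clients: "
                    + serverThreadsFirstSolution(n) + " vs "
                    + serverThreadsSecondSolution(n) + " server threads");
        }
    }
}
```

Under these counts, ten clients require 10 server threads in the
first solution against 100 in the second.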
To conclude, the first solution is better than the
second in every aspect considered. However, there is
a platform-dependent issue that needs careful
examination in order to avoid a very uncomfortable
situation. Because the first scenario relies on
continuous communication between the VRML
Server and the SIP Audio Server, a breakdown of the
SIP Audio Server could lead to a VRML Server
exception and vice versa. Nevertheless, given the
necessary attention this issue can be managed and,
as a result, given the overall dominance of the first
solution, the first scenario was chosen.
3 DESCRIPTION OF SIP
SPATIAL AUDIO MECHANISM
In this section the mechanism behind the SIP Audio
Server is described. The mechanism consists of
four main components: the SIP component, the
capture component, the RTP component and the
spatialization component. The following
paragraphs describe each of them in turn.
Each EVE client applet features an integrated
SIP client (Figure 4). When the EVE Applet
connects to the Connection Server of the platform, a
unique port is granted for SIP use. The applet passes
this port parameter to the SIP client, which sends a
SIP INVITE message to the SIP server on the
previously mentioned port. Subsequently, the client
waits for the SIP OK message. If the server
accepts the invitation, a server thread is created to
serve the client. The thread establishes an RTP receive
stream with the client, while the client establishes an
RTP send stream with the server thread. When the
client decides to disconnect from the platform, a SIP
BYE message is sent to the server. When the client
receives a SIP OK message from the server, the
session ends.
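The message exchange described above can be summarised as a small
client-side state machine. The following is a minimal sketch that
models only the INVITE, OK and BYE steps named in the text. It does
not use a real SIP stack, and the class, state and method names are
assumptions made for illustration.

```java
// Hypothetical model of the client-side SIP session life cycle.
// No real SIP library is used; only the message order is captured.
final class SipSessionSketch {

    enum State { IDLE, INVITE_SENT, ESTABLISHED, BYE_SENT, TERMINATED }

    private State state = State.IDLE;

    State state() {
        return state;
    }

    // The client sends INVITE to the SIP server on the granted port.
    void sendInvite() {
        require(State.IDLE);
        state = State.INVITE_SENT;
    }

    // An OK after INVITE establishes the session; an OK after BYE ends it.
    void receiveOk() {
        if (state == State.INVITE_SENT) {
            state = State.ESTABLISHED;
        } else if (state == State.BYE_SENT) {
            state = State.TERMINATED;
        } else {
            throw new IllegalStateException("Unexpected OK in state " + state);
        }
    }

    // The client announces that it is leaving the platform.
    void sendBye() {
        require(State.ESTABLISHED);
        state = State.BYE_SENT;
    }

    private void require(State expected) {
        if (state != expected) {
            throw new IllegalStateException(
                    "Expected " + expected + " but was " + state);
        }
    }
}
```

In this sketch the RTP streams would be opened on entering
ESTABLISHED and closed on entering TERMINATED.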
Once the SIP session is established, the client’s
applet invokes the methods for capturing the sound.
First, the list of available capture devices is
examined until one suitable for sending audio
data is found. Next, a processor is instantiated
that receives the captured data and
produces a data source in the specified format that is
continuously filled with captured audio data. This
data source is used by the RTP stream to send the
audio data to the server.
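The device search described above amounts to a linear scan that
stops at the first suitable device. The sketch below shows that scan
in plain Java. The Device type and the encoding names are
hypothetical stand-ins, since the actual capture API calls used by
the applet are not given in the text.

```java
// Hypothetical stand-ins for capture devices; not the applet's real API.
final class CaptureDeviceSelection {

    static final class Device {
        final String name;
        final String[] encodings;

        Device(String name, String... encodings) {
            this.name = name;
            this.encodings = encodings;
        }

        // True if the device can deliver data in the given encoding.
        boolean supports(String encoding) {
            for (String e : encodings) {
                if (e.equalsIgnoreCase(encoding)) {
                    return true;
                }
            }
            return false;
        }
    }

    // Returns the first device supporting the encoding, or null if none does.
    static Device firstSuitable(Device[] devices, String encoding) {
        for (Device d : devices) {
            if (d.supports(encoding)) {
                return d;
            }
        }
        return null;
    }
}
```

The selected device would then be handed to the processor, which
produces the data source consumed by the RTP send stream.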
The RTP manager creates an RTP send stream
and passes to it, as an argument, the data source
produced by the processor. Once this is
accomplished, the stream starts sending the audio
data of the audio source. On the server side, the
server thread that corresponds to the particular client
instantiates an RTP manager, which, in turn, creates a
receive stream that stores the received data in a
buffer file for a constant amount of time. When this
amount of time has elapsed, a second buffer is
written for the same amount of time while the first is
flushed. This procedure is repeated continuously,
with one buffer being filled with data while the other
is being flushed.
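The alternating buffer scheme described above can be sketched as
follows. This is a simplified illustration under stated assumptions:
the buffers are held in memory rather than in files, the current
time is passed in explicitly, and the class and method names are
hypothetical.

```java
// Hypothetical two-buffer scheme: one buffer is filled while the
// other is flushed, swapping roles after a fixed interval.
final class PingPongBuffer {

    private final java.io.ByteArrayOutputStream[] buffers = {
        new java.io.ByteArrayOutputStream(),
        new java.io.ByteArrayOutputStream()
    };
    private final long intervalMillis;
    private int active = 0;
    private long windowStart;

    PingPongBuffer(long intervalMillis, long startMillis) {
        this.intervalMillis = intervalMillis;
        this.windowStart = startMillis;
    }

    // Appends data to the active buffer. If the interval has elapsed,
    // the active buffer is flushed first and the other buffer takes
    // over; the flushed bytes are returned. Otherwise returns null.
    byte[] write(byte[] data, long nowMillis) {
        byte[] flushed = null;
        if (nowMillis - windowStart >= intervalMillis) {
            flushed = buffers[active].toByteArray();
            buffers[active].reset();
            active = 1 - active;
            windowStart = nowMillis;
        }
        buffers[active].write(data, 0, data.length);
        return flushed;
    }

    // Index (0 or 1) of the buffer currently being filled.
    int activeIndex() {
        return active;
    }
}
```

In this sketch the bytes returned by write are what would be passed
on for further processing once a buffer has been flushed.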