the network delivery architecture. Three network
delivery architectures are then considered: Centralized,
Unicast Full Mesh, and, in brief outline, a Hybrid
system. A traffic model is constructed with reference
to NGN core/access partitioning, and comparisons of
resulting traffic are made for each architecture.
2 CONFERENCE MODELS AND SPATIAL AUDIO
Spatialized or 3D audio for virtual multiparty
conferencing has been implemented by Kilgore et al.
(Kilgore et al., 2003), using simple manipulation of
interaural time differences (ITD) and interaural
intensity differences (IID) in accordance with duplex
theory (Cheng and Wakefield, 2001). HRTF-based
systems are known to produce effective spatial
reproduction (Crispien and Ehrenberg, 1995; Evans et al.,
2000) and have been integrated into a conferencing
application under our development.
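As an illustration of the ITD/IID manipulation mentioned above, the following minimal sketch delays and attenuates the far-ear signal. It is not the implementation of Kilgore et al.; the function names, the head radius, the speed of sound, the Woodworth spherical-head ITD approximation and the broadband sine-scaled IID (up to an assumed 6 dB) are all assumptions chosen for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)
HEAD_RADIUS = 0.0875    # m (assumed average head radius)


def itd_seconds(azimuth_rad):
    """Woodworth spherical-head ITD approximation for 0 <= azimuth <= pi/2."""
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (azimuth_rad + math.sin(azimuth_rad))


def pan_itd_iid(mono, azimuth_deg, sample_rate=8000, max_iid_db=6.0):
    """Return (left, right) sample lists for a mono input.

    azimuth_deg is expected in [-90, 90]; positive values place the
    source to the listener's right. The IID model is a hypothetical
    broadband approximation, not a measured one.
    """
    az = math.radians(abs(azimuth_deg))
    # Far-ear lag in whole samples, derived from the ITD.
    delay = round(itd_seconds(az) * sample_rate)
    # Far-ear attenuation: 0 dB straight ahead, max_iid_db at 90 degrees.
    far_gain = 10.0 ** (-max_iid_db * math.sin(az) / 20.0)
    near = list(mono) + [0.0] * delay
    far = [0.0] * delay + [far_gain * s for s in mono]
    return (far, near) if azimuth_deg >= 0 else (near, far)
```

At an azimuth of 0 degrees both channels are identical; as the azimuth grows, the far ear receives a later and quieter copy of the signal. Real IID is frequency dependent, which is one reason HRTF filtering gives a more convincing result.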
Using HRTF-based spatial audio, a participant’s
mono voice stream may be convolved with an HRTF
to give a binaural audio stream whose temporal and
spectral effects mimic a sound source at a
given point in space. Convolving each participant’s
stream with a different HRTF (relating to a different azimuth
and/or elevation), and then mixing the output for all
participants produces an audio space in which each
different speaker’s utterance will appear to emanate
from a different spatial location. As mentioned previ-
ously, this has many benefits for communication and
more importantly allows multiple conversation floors
to emerge through the process of schisming (Egbert,
1997), in which a large conversation floor involving
many participants may fragment into several smaller
floors. Users make use of the cocktail party effect
to ignore other conversations within the audio space,
and to align their speech turns to a conversation floor
of their choosing. As a result many floors may exist
within the space/conference. The floor control mech-
anism of limiting and choosing the number of simul-
taneous speakers is no longer required, as many par-
ticipants may speak simultaneously without masking
each other. Limits to the number of conferees are dis-
cussed later in relation to the delivery architecture.
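The convolve-and-mix procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of the conferencing application: each participant is assumed to have a pair of time-domain filters (head-related impulse responses, the time-domain counterpart of an HRTF), the names convolve and spatial_mix are hypothetical, and the filters in any usage are placeholders rather than measured data.

```python
def convolve(signal, kernel):
    """Direct-form linear convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def spatial_mix(streams, hrirs):
    """Mix mono streams into one binaural (left, right) pair.

    streams: dict mapping participant name -> mono sample list
    hrirs:   dict mapping participant name -> (left_ir, right_ir),
             one filter pair per assumed spatial position
    """
    # Filter every participant's stream once per ear.
    rendered = []
    for name, mono in streams.items():
        left_ir, right_ir = hrirs[name]
        rendered.append((convolve(mono, left_ir), convolve(mono, right_ir)))
    if not rendered:
        return [], []
    # Sum the filtered streams sample by sample, padding shorter ones.
    length = max(max(len(l), len(r)) for l, r in rendered)
    left = [0.0] * length
    right = [0.0] * length
    for l, r in rendered:
        for i, v in enumerate(l):
            left[i] += v
        for i, v in enumerate(r):
            right[i] += v
    return left, right
```

A real-time system would process audio in blocks and would normally use FFT-based convolution; the direct form is used here only to keep the sketch dependency free.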
Where the mixing and HRTF filtering are performed
has direct implications for both the scope of
such an audio space and the resulting network traffic.
The next section introduces the possible architectures,
with a brief discussion on NGN partitioning.
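To make the link between processing location and traffic concrete, the sketch below counts the media streams crossing each user's access link. It is a simplification under stated assumptions, not the traffic model of this paper: it counts streams rather than bandwidth (a binaural server mix carries two channels, a user stream one), it ignores silence suppression and signalling, and the function name and architecture labels are hypothetical.

```python
def access_streams(n_users, architecture):
    """Return per-user (uplink, downlink) stream counts on the access link."""
    if n_users < 2:
        raise ValueError("a conference needs at least two users")
    if architecture == "centralized":
        # One user stream up to the server, one server mix stream back.
        return (1, 1)
    if architecture == "full_mesh":
        # One unicast copy sent to, and one stream received from, every peer.
        return (n_users - 1, n_users - 1)
    raise ValueError("unknown architecture: " + architecture)
```

For five users this gives (1, 1) per user in the centralized case and (4, 4) in the unicast full mesh case, so the full mesh access load grows linearly with conference size while the centralized load stays constant.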
[Figure 1 labels: Core, MRFP, User A, User B, User C; link types: Access Link, Core Link; stream types: User Stream, Server Mix Stream.]
Figure 1: Core/Access Network Division.
2.1 NGN and IMS: Centralized Conferencing
The NGN architecture provides logical division be-
tween service functions and the underlying trans-
port technologies. The transport functions are fur-
ther divided into access and core network functions,
which perform a range of quality of service mech-
anisms including packet filtering, marking, shaping,
buffer management, scheduling and queuing (Knight-
son et al., 2005). The core transport network and its
associated control functions provide a platform to de-
liver traffic for services such as the IMS, and may
be logically separated by technology, ownership or
administrative boundaries. An IMS may be located
within a core network partition, and can provide sup-
port for media services such as audio conferencing.
An Application Server (AS) within the IMS can be
used for conference control, with SIP based session
control through call session control functions (CSCF).
In the NGN/IMS model, ASs have control over audio
mixing and filtering through the media resource
function controller (MRFC), which directly controls the
media resource function processor (MRFP) responsible
for audio processing. The AS and MRFP
may be physically separate; because the audio traffic
dominates the signalling traffic, it is the MRFP
location that is critical.
2.1.1 Mixing
The MRFP allows for a centralized audio conferenc-
ing model, under the control of an application server.
An outline for server based audio mixing for mono-
phonic conferencing is described in (Singh et al.,
2001), including a discussion of the decoding, jitter
buffering and mixing procedure, as well as some per-
formance statistics. Figure 2 shows the additional
filtering process within the MRFP required to pro-
SIGMAP 2007 - International Conference on Signal Processing and Multimedia Applications