Figure 1: The timings between the transmission, reception
and playout of packets. (Timelines: Sender, Receiver, Playout;
labels: $t_k^i$, $a_k^i$, $p_k^i$, ∆, Talkspurt k, Silence k, Talkspurt k+1.)
ternet audio packet traces that, contrary to what was
claimed, the E-NLMS algorithm does not outperform
the original NLMS algorithm. Finally, we propose an
improvement with spike detection that outperforms
both the NLMS and the E-NLMS algorithms.
The remainder of the paper is organized as follows. In
Section 2, we provide some background on playout
delay algorithms. In Section 3, we describe the related
work on which we base our proposal, in particular
the NLMS-based playout algorithms just cited.
In Section 4, we present our ongoing work
on an improvement of the original NLMS algorithm
proposed by DeLeon. Finally, in Section 5, we
conclude and present our current and future work.
2 BACKGROUND
Receivers use a playout buffer to smooth the stream
of audio packets. This smoothing is done by delay-
ing the playout of packets to compensate for variable
network delay. The playout delay can either be constant
throughout the whole session or be adjusted between
talkspurts. Moreover, recent work (Liang
et al., 2001) has shown that, by using a technique
called packet scaling, it is possible to change
the playout delay from packet to packet while keeping
the resulting distortion within tolerable levels. In
this paper, we focus only on the per-talkspurt playout
delay adjustment approach.
Figure 1 shows the different stages of an
audio session. The i-th packet of talkspurt k is sent at
time $t_k^i$, arrives at the receiver at time $a_k^i$, and is held
in the receiver's smoothing playout buffer until time
$p_k^i$, when it is played out. Within a talkspurt, packets
are equally spaced at the sender by time intervals of
length ∆ seconds.
By delaying the playout of packets and dropping
those that arrive after their deadline, we are able to
reconstruct the original periodic form of the stream.
This adjusting mechanism results in a regenerated
stream having stretched or compressed silence peri-
ods compared to the original stream. These changes
are not noticeable to the human ear if they are kept
within tolerably small levels.
In Fig. 1, a packet dropped because of a late arrival is
represented by a dashed line. A packet is artificially
dropped if it arrives after its scheduled deadline $p_k^i$.
The loss percentage can be reduced by increasing the
amount of time that packets stay in the playout buffer.
An efficient playout control algorithm must take into
account the trade-off between loss and delay in order
to keep both parameters as low as possible.
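The trade-off between loss and delay can be illustrated with a minimal sketch. It assumes, as is common for per-talkspurt schemes but not stated explicitly above, that every packet of a talkspurt is scheduled a fixed playout delay after its send time, and that send and arrival times are expressed on a common time axis. The function names and the sample values (in milliseconds) are hypothetical.

```python
from typing import List, Sequence, Tuple


def schedule_talkspurt(
    send_times: Sequence[int],
    arrival_times: Sequence[int],
    playout_delay: int,
) -> Tuple[List[int], List[bool]]:
    """Schedule one talkspurt with a fixed playout delay.

    Assumption: the playout time of each packet is its send time plus
    the talkspurt's playout delay, which preserves the sender's spacing.
    A packet is marked as dropped when it arrives after that deadline.
    """
    if len(send_times) != len(arrival_times):
        raise ValueError("send_times and arrival_times must have equal length")
    playout_times = [t + playout_delay for t in send_times]
    dropped = [a > p for a, p in zip(arrival_times, playout_times)]
    return playout_times, dropped


def loss_rate(dropped: Sequence[bool]) -> float:
    """Fraction of packets dropped because of late arrival."""
    return sum(dropped) / len(dropped) if dropped else 0.0


# Hypothetical talkspurt: packets sent every 20 ms, one delayed by a spike.
send_ms = [0, 20, 40, 60]
arrival_ms = [50, 80, 150, 110]

_, dropped_short = schedule_talkspurt(send_ms, arrival_ms, playout_delay=70)
_, dropped_long = schedule_talkspurt(send_ms, arrival_ms, playout_delay=120)
print(loss_rate(dropped_short), loss_rate(dropped_long))
```

With the shorter delay the third packet misses its deadline, whereas the longer delay plays out every packet at the cost of 50 ms of additional buffering.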
Throughout the paper, we use the notation described
in Table 1. For the validation of our algorithm,
we consider the packet traces generated with
the NeVoT audio tool that are described in (Moon
et al., 1998). We chose these traces because they
were generated during real audio conversations.
In (DeLeon and Sreenan, 1999) and later
in (Shallwani and Kabal, 2003), the authors use,
for the former, traces generated between three hosts in
the US and one host in the UK and, for the latter,
traces generated with the ping program. Those traces are not
available to everyone, and thus we cannot verify the
delay behavior in them. On the other hand, the audio
traces provided by Moon contain the sender and
receiver timestamps of transmitted packets that are
needed for the implementation of any playout delay
control algorithm. In these traces, one 160-byte audio
packet is generated approximately every 20 ms when
there is speech activity. A description of the traces
(reproduced from (Moon et al., 1998)) is given in
Table 2.
A typical sample of packet end-to-end delays is
shown in Fig. 2. A packet is represented by a diamond
and talkspurt boundaries by dashed rectangles.
The x-axis represents the time elapsed at the receiver
since the beginning of the audio session. Only the
variable portion of the end-to-end delay ($d_k^i$) is
represented on the y-axis of Fig. 2. To this end, the
constant component of the end-to-end delay (mostly
caused by the propagation delay) is removed by
subtracting from each packet delay the minimum delay over
the corresponding trace. By considering only the variable
portion of the end-to-end delay, the need for synchronization
between sender and receiver clocks is avoided.
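The subtraction of the minimum delay can be sketched as follows. This is a minimal illustration with hypothetical names and sample timestamps (in milliseconds), not the processing code used for the traces. It shows that a constant offset between the sender and receiver clocks cancels out; clock drift over the session is not handled by this step.

```python
from typing import List, Sequence


def variable_delays(
    send_times: Sequence[int], arrival_times: Sequence[int]
) -> List[int]:
    """Return the variable portion of the end-to-end delay of each packet.

    The raw delay (arrival timestamp minus send timestamp) includes the
    constant delay component and any constant clock offset. Subtracting
    the minimum raw delay over the trace removes both.
    """
    if not send_times or len(send_times) != len(arrival_times):
        raise ValueError("need two non-empty sequences of equal length")
    raw = [a - t for t, a in zip(send_times, arrival_times)]
    base = min(raw)
    return [d - base for d in raw]


# Hypothetical trace; the receiver clock is then shifted by 1000 ms.
send_ms = [0, 20, 40, 60]
arrival_ms = [50, 80, 150, 110]
shifted_ms = [a + 1000 for a in arrival_ms]

print(variable_delays(send_ms, arrival_ms))
print(variable_delays(send_ms, shifted_ms))
```

Both calls yield the same list of variable delays, which is why unsynchronized timestamps are sufficient for this kind of analysis.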
We observe in Fig. 2 the presence of delay spikes.
This phenomenon in end-to-end delay has been pre-
viously reported in the literature (Ramjee et al., 1994;
Bolot, 1993). Delay spikes represent a serious prob-
lem for audio applications since they affect the per-
formance of playout delay adaptation algorithms. A
delay spike is defined as a sudden large increase in the
end-to-end delay followed by a series of packets arriv-
ing almost simultaneously, leading to the completion