AN EFFICIENT PACKETIZATION SCHEME FOR VOIP

A. Estepa, R. Estepa and J. Vozmediano

University of Sevilla

Camino de los Descubrimientos s/n - E41092 Seville

Keywords: VoIP, traffic characterization.

Abstract:

A number of VoIP audio codecs generate Silence Insertion Descriptor (SID) frames during the talk-gaps of a conversation to update the comfort noise generator parameters at the receiver. Under the RFC 3551 packetization scheme, discontinuously generated SID frames cannot be carried in the same IP packet, which increases the conversation's bandwidth consumption.

We define a novel packetization scheme in which a set of non-consecutive SID frames may share the same packet, reducing the overhead while preserving the timing between them. We provide analytical expressions and experimental validation for the bandwidth saving obtained with this new scheme, which reaches up to 14% for the G.729B codec.

This work was supported in part by the Spanish Secretaría de Estado de Universidades y Educación under project number TIC2003-04784-C02-02.

1 INTRODUCTION

Voice over Internet Protocol (VoIP) has been growing exponentially in recent years, owing to the cost savings that come from using the free Internet or corporate networks.

The goal of a VoIP system is to achieve a high quality of service (QoS) at minimum cost. The main impairments affecting VoIP quality are delay, packet loss and the codec's intrinsic quality. Since delay and packet loss are related to the available bandwidth in the network, reducing the conversation's bandwidth requirement is a key goal in the design of VoIP systems. The two main factors determining the bandwidth requirement of a voice stream are the codec and the number of voice frames carried in each packet (packetization rate). The chosen codec should balance bandwidth consumption (codec bit-rate) against listening speech quality, while the packetization rate should balance bandwidth saving (the more codec frames in each packet, the lower the overhead ratio) against delay. The packetization rate can be customized by the user in VoIP clients, and can potentially achieve significant bandwidth savings at the cost of increased packetization delay.
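The trade-off described above can be illustrated with a few lines of Python. This is only a sketch: it assumes the 40-octet IP+UDP+RTP header and the G.729 frame parameters (10-byte frame every 10 ms) quoted later in the paper, and the function names are ours.

```python
# Illustrative values taken from the paper: 40-byte IP+UDP+RTP header,
# G.729 voice frames of 10 bytes generated every 10 ms.
HEADER_BYTES = 40
FRAME_BYTES = 10
FRAME_PERIOD_MS = 10


def overhead_ratio(n_fpp: int) -> float:
    """Fraction of each packet taken by headers when n_fpp frames share it."""
    return HEADER_BYTES / (HEADER_BYTES + n_fpp * FRAME_BYTES)


def packetization_delay_ms(n_fpp: int) -> int:
    """Time needed to collect n_fpp frames before the packet can be sent."""
    return n_fpp * FRAME_PERIOD_MS


if __name__ == "__main__":
    for n in (1, 2, 4, 9):
        print(f"N_fpp={n}: overhead={overhead_ratio(n):.3f}, "
              f"delay={packetization_delay_ms(n)} ms")
```

Increasing the number of frames per packet lowers the header share of each packet but adds one frame period of delay per extra frame.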

Codecs periodically generate compressed voice frames. Low bit-rate codecs are usually equipped with a voice activity detection (VAD) feature, which saves bandwidth by not generating frames during voice inactivity periods. Additionally, some audio codecs such as G.729, G.723.1 or AMR are equipped with a discontinuous transmission (DTX) algorithm which, at the beginning of each voice inactivity period, sends a short frame called a Silence Insertion Descriptor (SID). Reception of a SID frame after a voice frame can be interpreted as an explicit indication of the end of the talk-spurt. In addition, SID frames may be transmitted at any time during the silence interval to update the comfort noise generation parameters. This allows a faithful reproduction of the background noise at the receiver's side, increasing the quality of the conversation at the cost of some additional bandwidth (Estepa et al., 2003). However, SID frames, although short, each trigger the generation of an RTP/UDP/IP packet, which negatively affects the bandwidth consumption of the conversation (Estepa et al., 2003).

Estepa A., Estepa R. and Vozmediano J. (2006).

AN EFFICIENT PACKETIZATION SCHEME FOR VOIP.

In Proceedings of the International Conference on Signal Processing and Multimedia Applications, pages 65-70

DOI: 10.5220/0001569300650070

SciTePress

Voice streams are typically transported using RTP over UDP. The Real-time Transport Protocol (RTP) provides the receiver with two main features in addition to packet sequencing:

1. Payload identification. The Payload-Type field of the RTP header uses the profile identifiers given in RFC 3551 (Schulzrinne, 2003) as a means to identify the codec, the frame type and how the frames of a given codec (voice or SID) are packed. For example, G.723.1 distinguishes three types of frames, marked using two bits at the beginning of each frame: Active in 5.3 mode, Active in 6.3 mode, and SID. For the G.729 codec, by contrast, the frame type is indicated by the size of the packet.

2. Jitter compensation. A timestamp is inserted in the RTP header, which allows the receiver to compensate for network jitter. The receiver of a packet containing voice frames can adjust the play-out instant of the first frame of the packet by using this timestamp. Subsequent frames of the packet are time-consecutive, so they can be played out at the proper instant with no additional information other than the audio profile identifier.

Since the RTP header provides only one timing instant, there would be no way to choose the right play-out time for SID frames in the same IP packet if there were any time-slot gap between them. As a result, the current packetization scheme, RFC 3551 (Schulzrinne, 2003), also requires that only consecutively generated SID frames be carried in the same packet (Zopf, 2002). However, the RTP/UDP/IP header is large compared with the SID frames (typically 2 or 4 bytes), which makes it worth considering letting several SID frames share the same packet.
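The restriction described above can be sketched as a small grouping routine. This is an illustration under our own assumptions: SID frames are represented only by the index of the frame slot in which they were generated, and the function name and example slot values are hypothetical.

```python
HEADER_BYTES = 40  # IP+UDP+RTP header size used in the paper
SID_BYTES = 2      # G.729 SID frame size


def rfc3551_sid_packets(sid_slots, n_fpp):
    """Group SID frames into packets the RFC 3551 way.

    sid_slots: increasing slot indices at which SID frames were generated.
    Only time-consecutive frames (slot difference of 1) may share a packet,
    and a packet holds at most n_fpp frames.
    """
    packets = []
    for slot in sid_slots:
        last = packets[-1] if packets else None
        if last and slot == last[-1] + 1 and len(last) < n_fpp:
            last.append(slot)       # consecutive frame: reuse the open packet
        else:
            packets.append([slot])  # gap (or full packet): new packet, new header
    return packets


if __name__ == "__main__":
    slots = [0, 1, 5, 9, 10, 11]  # hypothetical SID generation instants
    packets = rfc3551_sid_packets(slots, n_fpp=2)
    header = HEADER_BYTES * len(packets)
    payload = SID_BYTES * len(slots)
    print(packets, f"header bytes={header}, SID bytes={payload}")
```

In this example six 2-byte SID frames need four packets, so 160 header bytes are spent to carry 12 bytes of SID payload.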

This paper proposes a new packetization scheme that allows non-consecutively generated SID frames to be transmitted in the same packet. The new scheme does not modify the existing packetization scheme for active frames, and is backward-compatible with the RFC 3551 packetization profile. It saves bandwidth in conversations using SID-capable codecs, especially in scenarios with a large number of potential users, where header compression shows scalability problems. This may be the case for communication between a corporate branch network and the central office, or for the transport service of a VoIP service provider, since the cost of transport can be considered proportional to the required bandwidth.

The remainder of this paper is organized as follows. The mechanism is described in section 2. The analytical expression for the achieved rate reduction is derived in section 3 and experimentally validated in section 4. Finally, section 5 concludes the paper.

2 PROPOSED PACKETIZATION SCHEME

To reduce the bandwidth usage of the current IETF packetization scheme while still obtaining the right play-out time for every SID frame after the first one in each packet, we define a new payload type called multi-SID. To maintain backward compatibility, we redefine the format of the existing RTP payload profile identifier number 13 (Schulzrinne, 2003), currently used to indicate a generic non-proprietary SID frame. This generic SID frame, whose format is defined in (Zopf, 2002), uses the first byte to indicate the power level of the background noise with a number between 0 and 127 (note that the most significant bit is always set to 0).

We propose to extend this format to include our new multi-SID payload type, which is indicated by setting the first bit of the first byte to one, as shown in figure 1. In our format, the remaining bits of the first byte of the multi-SID payload identify the codec (and thus the SID frame size being used). The following bytes carry the first SID frame (e.g. two bytes long in figure 1). After every SID frame except the last one, an additional one-octet field called NF gives the number of frame-period gaps to be inserted before playing the next SID frame.
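The payload layout just described can be sketched as an encoder and decoder. This is a minimal illustration, not a normative format: the paper does not list codec identifier values, so the `SID_SIZE_BY_COD` table and the function names are assumptions made for the example.

```python
# Hypothetical codec identifiers -> SID frame size in bytes. The paper does
# not define COD values; this table is an assumption for illustration only.
SID_SIZE_BY_COD = {1: 2, 2: 4}


def encode_multi_sid(cod, frames_with_gaps):
    """Build a multi-SID payload.

    frames_with_gaps: list of (sid_frame_bytes, nf), where nf is the number of
    frame-period gaps before the NEXT frame; the nf of the last frame is ignored.
    """
    size = SID_SIZE_BY_COD[cod]
    out = bytearray([0x80 | (cod & 0x7F)])  # first bit set to 1 marks multi-SID
    last = len(frames_with_gaps) - 1
    for idx, (frame, nf) in enumerate(frames_with_gaps):
        if len(frame) != size:
            raise ValueError("SID frame size does not match the codec")
        out += frame
        if idx != last:
            if not 0 <= nf <= 255:
                raise ValueError("NF must fit in one octet")
            out.append(nf)  # NF follows every SID frame except the last one
    return bytes(out)


def decode_multi_sid(payload):
    """Return (cod, [(sid_frame, gaps_before_next), ...]); the last gap is None."""
    if not payload or not payload[0] & 0x80:
        raise ValueError("not a multi-SID payload")
    cod = payload[0] & 0x7F
    size = SID_SIZE_BY_COD[cod]
    frames, pos = [], 1
    while pos < len(payload):
        frame = payload[pos:pos + size]
        if len(frame) != size:
            raise ValueError("truncated SID frame")
        pos += size
        nf = payload[pos] if pos < len(payload) else None
        if nf is not None:
            pos += 1
        frames.append((frame, nf))
    return cod, frames


if __name__ == "__main__":
    payload = encode_multi_sid(1, [(b"\x12\x34", 3), (b"\x56\x78", 0)])
    print(payload.hex(), decode_multi_sid(payload))
```

A receiver that sees the first bit cleared would treat the payload as the generic comfort-noise frame of payload type 13, which is what keeps the extension backward-compatible.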

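[Figure 1 (diagram): a sequence of ACT (A) and SID (S) frames as packetized under RFC 3551 and under the proposed scheme. The proposed multi-SID packet carries the IP+UDP+RTP header followed by a 1-byte field (leading bit 1 plus COD) and the SID frames, with an NF field between them.]

Figure 1: Proposed packetization scheme.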

Finally, we propose to use the multi-SID payload type only when two or more SID frames generated within $N_{fpp}$ frame intervals would be forced to travel in different packets under the current RFC 3551 scheme. This implies that the bandwidth consumption will always be less than or equal to that of the RFC 3551 packetization scheme. The bandwidth saving achieved by this procedure is estimated in the next section.

SIGMAP 2006 - INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND MULTIMEDIA

APPLICATIONS

3 BANDWIDTH GAIN ESTIMATION

The aim of this section is to quantify the benefits of our proposed scheme. The reader should keep in mind that the contribution of this paper is the new packetization scheme; the derivation of the analytical expression for the conversation's mean bit-rate was first presented in (Estepa et al., 2005).

3.1 Mean Bit-rate for RFC 3551 Packetization Scheme

We assume that, during voice activity periods, a new packet carrying $N_{fpp}$ compressed-voice (ACT) frames is generated every $T \cdot N_{fpp}$. During voice inactivity periods, a new packet loaded with SID frames is sent according to the RFC 3551 packetization scheme (in the VoIP case). Since SID frame generation is a random process, we use a discrete random variable, $X$, to denote the inter-arrival time (in number of frame periods) between SID frames. Moreover, we assume that SID frame generation is a renewal process.

The mean bit-rate of one conversation can be calculated as the sum of the contributions of the traffic generated during voice activity periods ($R_p$) and voice inactivity periods ($R_{SID}$):

$$R = \rho \cdot R_p + (1 - \rho) \cdot R_{SID} \tag{1}$$

where $\rho$ is the conversation mean activity rate, $R_p$ is the peak rate and $R_{SID}$ is the mean rate during voice inactivity periods caused by the transmission of SID frames. For codecs that do not generate SID frames, $R_{SID} = 0$.

The peak rate depends on both the codec characteristics and the number of frames per packet, and is given by:

$$R_p = \frac{H + N_{fpp} \cdot L_{ACT}}{N_{fpp} \cdot T} \tag{2}$$

where $H$ is the header size of the encapsulating protocols (i.e. 40 octets for the VoIP case), $L_{ACT}$ is the voice frame size and $T$ is the frame generation period of the given codec. Table 1 shows the characteristics of some VoIP codecs.
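As a quick numerical check of equation 2, the peak rate can be evaluated for the codec parameters of Table 1. The sketch below assumes sizes in bytes and the frame period in milliseconds, and converts the result to bit/s; the function name is ours.

```python
def peak_rate_bps(header_bytes, frame_bytes, n_fpp, period_ms):
    """Equation 2, (H + N_fpp * L_ACT) / (N_fpp * T), expressed in bit/s."""
    bits_per_packet = 8 * (header_bytes + n_fpp * frame_bytes)
    # One packet is emitted every n_fpp * period_ms milliseconds.
    return bits_per_packet * 1000 / (n_fpp * period_ms)


if __name__ == "__main__":
    # G.729: L_ACT = 10 bytes, T = 10 ms, H = 40 bytes.
    for n in (1, 2, 4):
        print(f"G.729, N_fpp={n}: {peak_rate_bps(40, 10, n, 10) / 1000:.1f} kbit/s")
```

For G.729 with two frames per packet this gives 24 kbit/s on the wire, three times the 8 kbit/s codec rate, which shows how strongly the header dominates.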

Regarding the $R_{SID}$ term of equation 1, an analytical expression for the VoIP transport case was derived and validated in (Estepa et al., 2004). The derivation separates the contributions of the packet headers ($R^{H}_{SID}$) and of the SID frames ($R^{L}_{SID}$) to the mean bit-rate, so that $R_{SID} = R^{H}_{SID} + R^{L}_{SID}$. The contribution of the SID frames can be obtained by applying the Elementary Renewal Theorem (ERT), which states that the long-term arrival rate of SID frames is the inverse of the expected inter-arrival time ($E[X] \cdot T$).

Table 1: Codec characteristics (frame sizes in bytes).

Codec    Mode (kbit/s)  L_ACT  L_SID  T (ms)  E[X]   P_1
G.729    -              10     2      10      7.33   0
G.723.1  6.3            24     4      30      13.05  0.27
G.723.1  5.3            20     4      30
AMR      4.75           12     5      20      7.47   0
AMR      12.2           31     5      20      7.47   0

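The contribution of the SID frames to the mean bit-rate, $R^{L}_{SID}$, is therefore:

$$R^{L}_{SID} = \frac{L_{SID}}{T \cdot E[X]} \tag{3}$$

where $L_{SID}$ is the size of a SID frame.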

In VoIP, the contribution of the packet headers generated during inactive periods follows the packet generation pattern imposed by RFC 3551, where one packet header is sent for every non-consecutive SID frame ($X = i > 1$). For consecutive SID frames, one packet header is sent every $N_{fpp}$ frames, so both cases must be considered. Since the mean time between SID frames is $E[X] \cdot T$, the header contribution ($R^{H}_{SID}$) can be expressed as:

$$R^{H}_{SID} = P_1 \cdot \frac{H}{N_{fpp} \cdot T \cdot E[X]} + (1 - P_1) \cdot \frac{H}{T \cdot E[X]} \tag{4}$$

where $P_1$ stands for the probability of having two time-consecutive SID frames (i.e. $X = 1$). Thus, for the VoIP case we have an overall mean bit-rate of:

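$$R_{RFC3551} = \rho \cdot \frac{H + N_{fpp} \cdot L_{ACT}}{N_{fpp} \cdot T} + \frac{1 - \rho}{E[X] \cdot T} \cdot \left[ L_{SID} + H \cdot \left( 1 + P_1 \cdot \frac{1 - N_{fpp}}{N_{fpp}} \right) \right] \tag{5}$$

The right-hand term of equation 5 contains $R_{SID}$, which was calculated by adding separately the contributions to the mean bit-rate of the packet headers ($R^{H}_{SID}$) and of the SID frames ($R^{L}_{SID}$).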
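Equation 5 can be evaluated directly. The sketch below assumes the form $R = \rho R_p + (1-\rho)(L_{SID} + H(1 + P_1(1-N_{fpp})/N_{fpp}))/(E[X] \cdot T)$, with sizes in bytes and $T$ in seconds so that the result is in bytes per second. The G.729 parameters come from Table 1; the activity rate of 0.4 is a hypothetical value chosen only for the example.

```python
def rate_rfc3551(rho, H, L_act, L_sid, T, n_fpp, EX, P1):
    """Mean rate under RFC 3551 (equation 5), in bytes/s."""
    r_p = (H + n_fpp * L_act) / (n_fpp * T)          # equation 2
    header = H * (1 + P1 * (1 - n_fpp) / n_fpp)      # header bytes per SID frame
    r_sid = (L_sid + header) / (EX * T)              # equations 3 and 4 combined
    return rho * r_p + (1 - rho) * r_sid             # equation 1


if __name__ == "__main__":
    # G.729 row of Table 1; rho = 0.4 is an assumed activity rate.
    for n in (1, 2, 4, 9):
        r = rate_rfc3551(rho=0.4, H=40, L_act=10, L_sid=2,
                         T=0.010, n_fpp=n, EX=7.33, P1=0.0)
        print(f"N_fpp={n}: {8 * r / 1000:.2f} kbit/s")
```

With $P_1 = 0$, as measured for G.729, the silence-period term does not depend on $N_{fpp}$ at all: every SID frame pays for a full header regardless of the packetization rate.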

3.2 Mean Bit-rate for multi-SID Packetization Scheme

The next step is to derive the mean bit-rate $R_{multiSID}$ for the new packetization scheme. Clearly, $R_p$ does not change. Following the same approach as in (Estepa et al., 2005) for $R_{SID}$, we separate the rate contributions of the packet headers and of the SID frames. The novelty now is the packet generation scheme and, thus, the header contribution (the $R^{H}_{SID}$ term). As the effective size of the packet header depends on the number of conveyed SID frames, an overhead of one extra byte per SID frame must be accounted for whenever the multi-SID packet format is used, as shown in figure 1.

The contribution of the header to the bit-rate can be readily derived by noting that, for an inter-arrival time $X = i < N_{fpp}$, the number of SID frames carried in the packet is $\lceil N_{fpp}/i \rceil$; otherwise packets carry only one SID frame. Also note that for time-consecutive SID frames (i.e. $X = i = 1$) the multi-SID payload is not applied. Thus, the resulting equation for $R^{H}_{SID}$ is:

$$R^{H}_{SID} = \frac{1}{T \cdot E[X]} \left[ \frac{H}{N_{fpp}} \cdot P_1 + \sum_{i=2}^{N_{fpp}-1} \left( \frac{H}{\lceil N_{fpp}/i \rceil} + 1 \right) \cdot P_i + \sum_{i=N_{fpp}}^{\infty} H \cdot P_i \right] \tag{6}$$

where $P_i$ is the probability that $X = i$.

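Combining equations 6, 3 and 2 and reordering, we obtain the overall mean bit-rate using our proposed packetization scheme:

$$R_{multiSID} = \rho \cdot \frac{H + N_{fpp} \cdot L_{ACT}}{N_{fpp} \cdot T} + \frac{1 - \rho}{E[X] \cdot T} \cdot \left[ L_{SID} + H \cdot \left( 1 + P_1 \cdot \frac{1 - N_{fpp}}{N_{fpp}} \right) + \sum_{i=2}^{N_{fpp}-1} P_i \cdot \left( 1 + H \cdot \frac{1 - \lceil N_{fpp}/i \rceil}{\lceil N_{fpp}/i \rceil} \right) \right] \tag{7}$$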

From equations 7 and 5, the bandwidth gain can be directly computed.
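The computation of the gain can be sketched as follows. The code assumes that, in a multi-SID packet carrying $\lceil N_{fpp}/i \rceil$ SID frames, each frame accounts for $H/\lceil N_{fpp}/i \rceil + 1$ header bytes (one extra byte per SID frame, as stated in the text), and that the gain is measured as $1 - R_{multiSID}/R_{RFC3551}$. The inter-arrival distribution `P_EXAMPLE` and the activity rate are hypothetical values, not measured data.

```python
from math import ceil


def header_rate_rfc3551(H, T, n_fpp, EX, p):
    """Header bytes/s during silence under RFC 3551 (equation 4)."""
    p1 = p.get(1, 0.0)
    return (p1 * H / n_fpp + (1 - p1) * H) / (T * EX)


def header_rate_multisid(H, T, n_fpp, EX, p):
    """Header bytes/s during silence with multi-SID packets (equation 6)."""
    total = H / n_fpp * p.get(1, 0.0)
    for i in range(2, n_fpp):
        frames_per_packet = ceil(n_fpp / i)
        total += (H / frames_per_packet + 1) * p.get(i, 0.0)
    tail = 1.0 - sum(p.get(i, 0.0) for i in range(1, n_fpp))  # P(X >= N_fpp)
    return (total + H * tail) / (T * EX)


def mean_rate(rho, H, L_act, L_sid, T, n_fpp, EX, header_rate):
    """Equation 1 with R_SID = header contribution + L_SID / (T * E[X])."""
    r_p = (H + n_fpp * L_act) / (n_fpp * T)
    r_sid = header_rate + L_sid / (T * EX)
    return rho * r_p + (1 - rho) * r_sid


def bandwidth_gain(rho, H, L_act, L_sid, T, n_fpp, p):
    """Relative saving 1 - R_multiSID / R_RFC3551; p maps i to P(X = i)."""
    EX = sum(i * pi for i, pi in p.items())
    old = mean_rate(rho, H, L_act, L_sid, T, n_fpp, EX,
                    header_rate_rfc3551(H, T, n_fpp, EX, p))
    new = mean_rate(rho, H, L_act, L_sid, T, n_fpp, EX,
                    header_rate_multisid(H, T, n_fpp, EX, p))
    return 1 - new / old


# Hypothetical inter-arrival distribution (not measured data).
P_EXAMPLE = {2: 0.2, 4: 0.3, 8: 0.3, 16: 0.2}

if __name__ == "__main__":
    for n in (2, 4, 9):
        g = bandwidth_gain(rho=0.4, H=40, L_act=10, L_sid=2, T=0.010,
                           n_fpp=n, p=P_EXAMPLE)
        print(f"N_fpp={n}: relative saving {g:.1%}")
```

For $N_{fpp} \le 2$ the sum over $i$ is empty, so both schemes give the same rate and the gain is zero; the saving only appears for larger packetization values.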

4 VALIDATION AND NUMERICAL RESULTS

This section presents a comparative study of the mean bit-rate achieved with both packetization schemes (RFC 3551 and multi-SID), which allows us to quantify the benefits of the multi-SID scheme and to validate the equations presented in the previous sections.

4.1 Experiment Setup

The results above have been validated using the test-bed described in (Estepa et al., 2003), where 5 hours of conversations were recorded from an ISDN line in a low-noise office environment (i.e. SNR > 20 dB). The raw audio files were encoded using the G.729B codec. This codec, widely available in VoIP environments, is capable of generating SID frames and is widely referenced in the literature, which lets us compare our results with previous studies. The resulting sequences of frame types generated by the codec (i.e. ACT, SID or NULL) were processed to empirically determine the values of $P_i$ and the conversation activity factor $\rho$. These sequences were also fed to a packetization program that builds the IP packets according to both packetization schemes presented in this paper: RFC 3551 and multi-SID. The output of the program is the mean bit-rate for each conversation. We use these values to validate the analytical expressions derived here and to demonstrate the bandwidth savings that the proposed packetization scheme provides over the traditional RFC 3551. A similar validation procedure can be followed to extend the results to other codecs.
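The processing of a frame-type sequence can be sketched as below. This is not the program used in the experiments; it is a minimal illustration that assumes frame types are coded as 'A' (ACT), 'S' (SID) and 'N' (NULL), that $\rho$ is the fraction of ACT slots, and that the inter-arrival time $X$ counts only inactive slots between successive SID frames.

```python
from collections import Counter


def sid_statistics(frame_types):
    """Estimate rho, E[X] and the distribution P_i from a frame-type sequence."""
    active = sum(1 for f in frame_types if f == "A")
    rho = active / len(frame_types)

    gaps = []
    since_last = None  # inactive slots elapsed since the previous SID frame
    for f in frame_types:
        if f == "A":
            continue  # assumption: active slots do not count towards X
        if f == "S":
            if since_last is not None:
                gaps.append(since_last + 1)
            since_last = 0
        elif since_last is not None:
            since_last += 1

    if not gaps:
        raise ValueError("need at least two SID frames to estimate X")
    counts = Counter(gaps)
    p = {i: c / len(gaps) for i, c in counts.items()}
    mean_x = sum(gaps) / len(gaps)
    return rho, mean_x, p


if __name__ == "__main__":
    sequence = "AASNNSSNAAASNNNS"  # hypothetical codec output, one symbol per frame slot
    rho, mean_x, p = sid_statistics(sequence)
    print(f"rho={rho}, E[X]={mean_x}, P={p}")
```

The estimated `p` dictionary has the same shape as the $P_i$ distribution used in the analytical expressions, so it can be fed directly into an evaluation of equations 5 and 7.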

4.2 Numerical Results

Figure 2 shows the mean bit-rate measured in the aforementioned simulations for both packetization schemes, together with the analytical models given by equations 7 and 5.

Figure 2: Mean Binary Rate for both multi-SID and RFC 3551 packetization schemes.

Figure 3 shows the difference between both packetization schemes in terms of bandwidth. With the G.729 codec and $N_{fpp} = 9$ (which implies a packetization delay of 90 ms), the bandwidth reduction reaches 14%.


Figure 3: Bandwidth saving for multi-SID with respect to RFC 3551: analytical and measurement results.

In the Internet, the network delay and its associated jitter are clearly the dominant factors in the overall delay, leaving no room for large packetization values (typically $N_{fpp} = 2$ or 4 for G.729). In corporate networks, however, where network delay and jitter are low, the admissible packetization delay could allow higher $N_{fpp}$ values, yielding significant bandwidth savings with the multi-SID packetization scheme.

5 CONCLUSIONS

This paper presents a new packetization scheme that can be used in VoIP. This backward-compatible scheme permits sending packets loaded with multiple non-consecutively generated SID frames, reducing the conversation's bandwidth requirement.

An analytical expression for the bandwidth saving obtained with the new packetization scheme has been derived. Experimental validation confirms that the new scheme improves on the existing one by up to 14% for admissible packetization values. The multi-SID packetization scheme is especially meaningful in the backbone of corporate networks, where many voice sources are multiplexed.

REFERENCES

Estepa, A., Estepa, R., and Vozmediano, J. (2003). Packetization and Silence Influence on VoIP Traffic Profiles. Lecture Notes in Computer Science, 2899(1):331–339.

Estepa, A., Estepa, R., and Vozmediano, J. (2004). A New Approach for VoIP Traffic Characterization. IEEE Communications Letters, 8(10):644–647.

Estepa, A., Estepa, R., and Vozmediano, J. (2005). Accurate Prediction of VoIP Traffic Mean Bit Rate. IEE Electronics Letters, 8(10):644–647.

Schulzrinne, H. (2003). RTP Profile for Audio and Video Conferences with Minimal Control. RFC 3551, Internet Engineering Task Force.

Zopf, R. (2002). Real-time Transport Protocol (RTP) Payload for Comfort Noise (CN). RFC 3389, Internet Engineering Task Force.
