2.2.1 Client Logging
We have developed a lightweight client application that requests an RTP (Schulzrinne et al., 1996) stream from a media server, accepts RTP packets, and records session-level statistics. In addition, it can record a trace of every RTP/RTCP packet (packet arrival time, size, sequence number, and the media decode time).
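A minimal sketch of the per-packet trace record such a client might keep is shown below; the field names are our own and are not taken from the actual tool.

```python
from dataclasses import dataclass

@dataclass
class PacketTraceEntry:
    """One row of the per-packet RTP/RTCP trace (illustrative field names)."""
    arrival_time: float     # wall-clock delivery time at the client, in seconds
    size_bytes: int         # packet size in bytes
    sequence_number: int    # RTP sequence number
    decode_time: float      # media decode (playout) deadline for this packet
```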
Every experiment runs two types of client applications: loading clients and probing clients. A loading client is a long-lived session that drives the server to a target level of concurrent requests. To support a large number of simultaneous loading clients, it records only session-level statistics. A probing client is a short-lived session, issued repeatedly once the experiment has launched all loading clients and reached steady state, that collects detailed session statistics. It records both session-level statistics and a trace of the delivered data packets.
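As an illustration of this experimental protocol, the sketch below shows how an experiment could be orchestrated; `start_loading_client`, `run_probing_session`, and all numeric settings are hypothetical placeholders, not part of our tool.

```python
import time

NUM_LOADING_CLIENTS = 200   # assumed target concurrency (placeholder value)
WARMUP_SECONDS = 120        # assumed time to reach steady state (placeholder)
NUM_PROBES = 20             # assumed number of consecutive probing sessions

def run_experiment(start_loading_client, run_probing_session):
    """Launch loading clients, wait for steady state, then probe sequentially."""
    loaders = [start_loading_client() for _ in range(NUM_LOADING_CLIENTS)]
    time.sleep(WARMUP_SECONDS)   # wait until the workload reaches steady state
    # Probing sessions are short-lived and issued one after another.
    probe_stats = [run_probing_session() for _ in range(NUM_PROBES)]
    return loaders, probe_stats
```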
From the trace, we can also derive the number of rebuffering events, which is the number of late-arriving packets observed in the probing session. Late-arriving packets are identified from the packet-arrival offset, the difference between each packet's delivery time and its deadline. Detecting rebuffering events was, however, often problematic because packet transmissions became increasingly bursty as the server workload increased. The timing of these bursts was such that, on occasion, one or two packets would be delayed beyond their delivery deadline. This small amount of over-delayed data caused rebuffering violations in those experiments even when the server was otherwise not saturated. We found that, by re-categorizing these few packets as lost data (instead of late data), we could avoid a rebuffering violation without inducing a size violation. This greatly improved the reliability and reproducibility of our decision surface.
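A minimal sketch of this re-categorization step is given below, assuming the per-packet trace records sketched earlier in this section; the tolerance of two packets mirrors the "one or two packets" observed above but is otherwise our own choice.

```python
def split_late_packets(trace, max_recategorized=2):
    """Separate late-arriving packets from those re-categorized as lost.

    A packet is late when its arrival offset (delivery time minus decode
    deadline) is positive. If only a handful of packets are late, they are
    counted as lost rather than late, so an isolated transmission burst does
    not by itself trigger a rebuffering violation. The threshold of two is an
    illustrative assumption, not a value from this work.
    """
    late = [p for p in trace if p.arrival_time - p.decode_time > 0.0]
    if len(late) <= max_recategorized:
        return [], late      # (late, lost): the few late packets count as lost
    return late, []          # too many late packets to re-categorize
```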
2.2.2 Session Failure and Server Capacity Decision
If the server system is overloaded, a newly arriving streaming request may be either rejected or admitted but then suffer degraded session quality. Among session failures, some can be detected easily from error log files (hard failures), while others need further processing (soft failures). Admission rejection and explicit termination in the middle of a session are hard failures.
Soft failure is a general term for a session whose streaming experience is unacceptable to the user. Duration violations, size violations, and rebuffering violations belong to this category. These are defined as follows (a code sketch of the three checks appears after the list):
• Duration Violation: Any session $s$ that satisfies the inequality condition $\left| \frac{T(s)}{T_s} - 1 \right| > \rho_T$ is considered to violate the duration requirement. $T_s$ is the expected duration of session $s$, $T(s)$ is its measured duration, and $\rho_T$ ($0 < \rho_T < 1$) is the acceptable range of the duration.
• Size Violation: Any session $s$ that satisfies the inequality condition $1 - \frac{B(s)}{B_s} > \rho_B$, where $B(s) < B_s$, is considered to violate the session length requirement. $B_s$ is the expected amount of data bytes received at the client side for session $s$, $B(s)$ is its measured size, and $\rho_B$ ($0 < \rho_B < 1$) is the acceptable range of the bitstream length.
• Rebuffering Violation: Any experiment that has $N$ individual probing statistics and satisfies the inequality condition $\frac{\sum_{s=1}^{N} \{ I(s) + P \cdot R(s) \}}{\sum_{s=1}^{N} T_s} > \rho_Q$ is considered to violate the desired service quality. $I(s)$ is the start-up delay of the measured session $s$, $R(s)$ is the sum of the time periods during which session $s$ was in a rebuffering state, $P$ is the penalty constant assigned per rebuffering event, and $\rho_Q$ ($0 < \rho_Q < 1$) is the acceptable range of the service quality.
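The three conditions translate directly into code. The sketch below is our own, with session and probing statistics passed in as plain numbers and with illustrative default thresholds that are not values from this work.

```python
def duration_violation(T_measured, T_expected, rho_T=0.1):
    """Duration check: |T(s)/T_s - 1| > rho_T (default rho_T is illustrative)."""
    return abs(T_measured / T_expected - 1.0) > rho_T

def size_violation(B_measured, B_expected, rho_B=0.1):
    """Size check: 1 - B(s)/B_s > rho_B, considered only when B(s) < B_s."""
    return B_measured < B_expected and (1.0 - B_measured / B_expected) > rho_B

def rebuffering_violation(probes, P=1.0, rho_Q=0.05):
    """Rebuffering check over N probing sessions.

    `probes` is a list of (I, R, T_expected) tuples: start-up delay, total
    rebuffering time, and expected duration of each probing session.
    """
    weighted_wait = sum(I + P * R for I, R, _ in probes)
    expected_play = sum(T for _, _, T in probes)
    return weighted_wait / expected_play > rho_Q
```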
Duration and size violations are obtainable from session-level statistics, while rebuffering violations are computed from the data packet traces in the client logs. Our failure model excludes the condition $B(s) > B_s$, in which a test session receives more packets than expected, which is caused by packet retransmission.
To evaluate the user's experience, we could directly measure the quality of the voice samples and video images received at the client side (P.862, 2001; Wolf, 2001) or indirectly estimate a user's frustration rate. We prefer the less accurate but real-time quality evaluation method; otherwise, the server capacity decision would take a tremendous amount of time to finalize due to its stepwise nature. For this reason, we chose Keynote's indirect method (Keynote Inc., 2003). The frustration rate proposed by Keynote Inc. is a well-established methodology for quantifying a user's streaming experience. This measure is computed from the waiting time spent on startup, initial buffering, and rebuffering events of the measured session. To minimize false negatives caused by statistical spikes during the experiments, our methodology extends Keynote's rating system by collecting and analyzing multiple probing sessions.
If any session failures are seen at any time during the experimental epoch, the streaming server is labelled as being saturated for the full experimental epoch. Each experimental epoch used to determine the saturation point consists of five 20-minute measurement sets at a possible saturating workload. This repetition ensures a reproducible, internally consistent categorization of the server.
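A sketch of this epoch-level decision is given below, assuming a hypothetical run_measurement_set helper that runs one 20-minute set at the given workload and reports whether any hard or soft session failure occurred.

```python
def epoch_is_saturated(workload, run_measurement_set, num_sets=5):
    """Label the workload as saturating if any of the measurement sets in the
    epoch observes a session failure (hard or soft)."""
    results = [run_measurement_set(workload) for _ in range(num_sets)]
    return any(results)
```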