on this kind of information (usually more than 95%
compression rate). Every modern browser support-
ing HTML5 also supports HTTP compression, so ex-
changing compressed data needs little effort. During
the preparation phase explained above, IEEE 1599
files are gzip-compressed and stored with a custom
file extension (.xgz). The web server is configured to
associate that file extension with the right MIME type
(text/xml) and content encoding (x-gzip). When a client
requests an IEEE 1599 file, the server HTTP response
instructs the browser to enable its HTTP compression
features in order to decompress the file before using
it. In this way, all the advantages of DOM parsing can
be exploited and no overhead is added to the stream,
at the cost of a small initial delay.
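The preparation and delivery steps described above can be sketched as follows. This is a minimal illustration and not the authors' server configuration: the function names `prepare_xgz` and `headers_for` and the sample XML payload are hypothetical. Only the `.xgz` extension, the `text/xml` MIME type and the `x-gzip` content encoding come from the text.

```python
import gzip
from pathlib import PurePosixPath

# Header values taken from the text; everything else here is hypothetical.
XGZ_HEADERS = {"Content-Type": "text/xml", "Content-Encoding": "x-gzip"}


def prepare_xgz(xml_bytes: bytes) -> bytes:
    """Gzip-compress an XML document, as done in the preparation phase."""
    return gzip.compress(xml_bytes)


def headers_for(path: str) -> dict:
    """Return the extra response headers a server would attach to `path`."""
    if PurePosixPath(path).suffix == ".xgz":
        return dict(XGZ_HEADERS)
    return {}


# A repetitive, made-up XML payload: markup of this kind compresses very well.
sample = b"<ieee1599>" + b'<event id="e"/>' * 2000 + b"</ieee1599>"
packed = prepare_xgz(sample)

# The browser performs this step transparently when it sees Content-Encoding.
restored = gzip.decompress(packed)
ratio = 1 - len(packed) / len(sample)
```

In a real deployment the header mapping would normally live in the web server configuration rather than in application code. The function above only makes the association between extension and headers explicit.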
4.2 Infrastructure and Application Design
After preparing the material to make it usable by
an HTML5 browser, let us choose how to make it
available over the net. There are two main alterna-
tives: either using a common Web server and adopt-
ing the progressive download approach, like almost
every “streaming” player for the Web, or setting up
a full-fledged streaming server. While the latter option
permits the use of protocols specifically designed
for streaming (like RTP and RTSP) and may therefore
be more flexible, we have chosen the former
because it is effective for our purposes, easier to
implement and widely used in similar application do-
mains. Moreover, HTTP/TCP traffic is usually bet-
ter accepted by the most common firewall configu-
rations, and less subject to NAT traversal problems.
Finally, Web users seem to be less annoyed by short
pauses during playback than by quality
degradation or loss of information.
On the client side, the fundamental choice is what
media streams to request, when to request them and
how to manage them without clogging the wire or the
buffer. Three possible cases have been studied:
• One Stream at a Time. Among all the available
contents, just the stream currently chosen by the
user for watching or listening is requested and
buffered. When another stream is selected, the
audio (or video) buffer is emptied and the new
stream is loaded. The main advantage of this solution
is that only useful data are sent on the wire: at
any time, the user receives just the stream he/she
requested. The principal drawback, on the other
hand, is that every time the user decides to watch or
listen to another media stream, he/she has to wait a
considerable amount of time for the new contents.
• All the Streams at the Same Time. All the available
contents are requested by the client and sent over
the net. This approach drastically reduces delays
when jumping from one media stream to another,
at the cost of a huge waste of bandwidth caused
by the dispatch of unwanted streams. In order to
reduce network traffic, contents which are not cur-
rently selected by the user may be sent in a low-
quality version, and upgraded to full quality only
when selected. With this approach, a smooth tran-
sition occurs: when the user selects a new media
stream, the client instantly plays the degraded ver-
sion, and switches to full quality as soon as possi-
ble, namely when the buffer is sufficiently full.
• Custom Packetized Streams. Borrowing some
principles from the piggyback forward error cor-
rection technique (Perkins et al., 1998), streams
can be served all together inside a single packet,
containing the active streams in full quality and
the inactive ones in low quality. This implies the
existence of a “smart” server, which does all the
synchronization and packing work, and a “dumb”
client which does not even need to know anything
about IEEE 1599 and its structure.
Although the last option is of some interest, it
requires a custom server-side application and burdens
the server with a lot of computation for each client.
In this paper we will focus on the first two scenarios,
using the first (which is also the simplest) to
concentrate on the synchronization aspects, and then
moving to the second to support multiple media streams
simultaneously.
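The first two request policies can be made concrete with a short sketch. It is only an illustration under stated assumptions: the names `Strategy`, `plan_requests` and `quality_to_play`, the quality labels "full" and "low", the stream names and the buffer threshold are all hypothetical and do not come from the player described in this paper.

```python
from enum import Enum


class Strategy(Enum):
    ONE_AT_A_TIME = "one stream at a time"
    ALL_AT_ONCE = "all the streams at the same time"


def plan_requests(streams, selected, strategy, degrade_inactive=True):
    """Map each stream the client should request to a quality level.

    `streams` is a list of stream names, `selected` the one chosen by the
    user. The labels "full" and "low" stand in for actual encodings.
    """
    if selected not in streams:
        raise ValueError(f"unknown stream: {selected!r}")
    if strategy is Strategy.ONE_AT_A_TIME:
        # Only the stream being watched or listened to travels on the wire.
        return {selected: "full"}
    # All streams are requested; inactive ones may use a low-quality version.
    inactive_quality = "low" if degrade_inactive else "full"
    return {s: ("full" if s == selected else inactive_quality) for s in streams}


def quality_to_play(buffered_seconds, threshold_seconds):
    """After a switch, play the degraded version until the full-quality
    buffer holds at least `threshold_seconds` of material."""
    return "full" if buffered_seconds >= threshold_seconds else "low"


# Hypothetical stream names for one piece.
available = ["audio_studio", "audio_live", "video_live"]
single = plan_requests(available, "audio_live", Strategy.ONE_AT_A_TIME)
parallel = plan_requests(available, "audio_live", Strategy.ALL_AT_ONCE)
```

Here `single` contains only the selected stream, while `parallel` lists every stream with the inactive ones marked as low quality. This mirrors the bandwidth versus switching-delay trade-off discussed above.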
4.3 Audio and Video Synchronization
One of the key features of the IEEE 1599 format is the
description of information which can be used to syn-
chronize otherwise asynchronous and heterogeneous
media. As presented in Section 3.1, every musical
event of interest (notes, time/clef/key signature
changes, etc.) should have its own unique id
inside the spine. Those identifiers can be used to ref-
erence the occurrence of a particular event inside the
various resources available for the piece: the area cor-
responding to a note inside an image of the music
sheet, a word in a text file representing the lyrics, a
particular frame of a video capturing the performance,
a given instant in an audio file, and so on.
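The role of spine identifiers as a common reference can be illustrated with a small sketch. It assumes that, for each audio resource, the instants at which spine events occur are already available as a time-sorted list. The identifiers, file names, timings and function names below are invented for the example and do not reflect the actual IEEE 1599 encoding.

```python
from bisect import bisect_right

# Hypothetical spine identifiers and timings (in seconds) for two recordings.
# Each list is sorted by time, since events occur in spine order.
TRACKS = {
    "studio.mp3": [("e1", 0.0), ("e2", 0.5), ("e3", 1.1), ("e4", 1.6)],
    "live.ogg": [("e1", 2.0), ("e2", 2.7), ("e3", 3.5), ("e4", 4.1)],
}


def event_at(track, seconds):
    """Return the id of the last spine event at or before `seconds`, if any."""
    times = [t for _, t in TRACKS[track]]
    index = bisect_right(times, seconds) - 1
    return TRACKS[track][index][0] if index >= 0 else None


def time_of(track, event_id):
    """Return the instant at which `event_id` occurs in `track`."""
    return dict(TRACKS[track])[event_id]


def switch_position(from_track, seconds, to_track):
    """Find where to resume in `to_track` when leaving `from_track`."""
    event_id = event_at(from_track, seconds)
    if event_id is None:
        return 0.0  # Nothing has sounded yet: start from the beginning.
    return time_of(to_track, event_id)
```

For instance, `switch_position("studio.mp3", 1.2, "live.ogg")` looks up the event sounding at 1.2 s in the studio recording and returns the instant of the same event in the live one. Because both lists refer to the same identifiers, two otherwise unrelated recordings can be aligned event by event.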
For the IEEE 1599 streaming Web player, the Au-
dio layer is the most interesting. Each related audio
or video stream is represented by the tag <track>,
whose attributes give information about its URI and
encoding format. Inside each <track> there are many
<track_event> tags, which are the actual references