The user is in the Movies Space View, where movies can
be searched, overviewed and browsed, before one is
selected to be explored in more detail in the Movie
View (Fig.1a-1b). Tag clouds were adopted in both
views to represent summaries or overviews of the con-
tent in five tracks (subtitles, audio events, soundtrack
mood, and felt emotions), for the power, flexibility,
engagement and fun usually associated with them.
After selecting the Back to the Future movie in
Figure 1b) (top right), it plays in the Movie View in
Figure 2a), with five timelines for the content tracks
shown below the movie, and a selected overview
tag cloud of the content presented on the right (audio
events in the example). The tag cloud is synchronized
with the movie being played, and thus with the video
timeline, to emphasize when the events occur along
the movie. From the timelines, users may select which
moment of the video to watch.
Playing SoundsLike. After pressing the SoundsLike
logo (Fig.2a-b), a challenge appears: an audio
excerpt is highlighted in three audio timelines, with
different zoom levels, below the video, and repre-
sented, to the right, at the center of an undirected
graph displaying similar excerpts, with a field below
for selecting suggested tags or inserting new ones to
classify the current audio excerpt and, hopefully, earn
more points. By presenting the surrounding neigh-
bours, and by allowing users to listen to entire audio
excerpts and watch them, SoundsLike was designed to
support the identification of the current audio excerpt
to be labelled.
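A challenge round can be thought of as a small bundle of state: the excerpt to label, its most similar neighbours shown in the graph, and the tags suggested or entered so far. The following is a minimal illustrative sketch of that state and of awarding points when a tag is submitted; the names (ChallengeRound, submit_tag, POINTS_PER_TAG) and the scoring rule are ours, not taken from SoundsLike's implementation.

```python
from dataclasses import dataclass, field
from typing import List

POINTS_PER_TAG = 10  # hypothetical scoring constant, not from the paper


@dataclass
class ChallengeRound:
    """State of one SoundsLike-style challenge (illustrative names only)."""
    excerpt_id: int            # excerpt highlighted in the timelines and at the graph centre
    neighbour_ids: List[int]   # most similar excerpts shown around it
    suggested_tags: List[str]  # tags offered in the selection field
    submitted_tags: List[str] = field(default_factory=list)
    score: int = 0

    def submit_tag(self, tag: str) -> int:
        """Record a suggested or newly typed tag and award points for it."""
        tag = tag.strip().lower()
        if tag and tag not in self.submitted_tags:
            self.submitted_tags.append(tag)
            self.score += POINTS_PER_TAG
        return self.score
```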
Movie Soundtrack Timelines. Three timelines are
presented below the video (Fig.2b-d and Fig.3): the
top one represents the entire soundtrack, or video
timeline, of the current movie; the second one
presents a zoomed-in timeline view, with the level of
detail chosen by the user by dragging a marker on the
soundtrack timeline; and the third one, also zoomed-
in, presents a close-up of a chosen excerpt as an au-
dio spectrogram. We designed the audio represen-
tation to be similar to the timelines already used in
MovieClouds (Gil et al., 2012) for content tracks
(Figure 1a), with three levels of detail. Audio excerpts
are segments of the entire soundtrack, represented
here as rectangles. In all the timelines (and in the graph),
the current excerpt to be classified is highlighted in
blue, grey represents excerpts not yet classified,
green marks excerpts already classified by this user,
and yellow marks excerpts the user has skipped.
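Because the same colour coding is reused by the timelines and by the graph, it can be centralised in a single mapping from excerpt state to colour. A minimal sketch, with state names and hex values chosen by us purely for illustration:

```python
from enum import Enum


class ExcerptState(Enum):
    CURRENT = "current"            # excerpt currently being classified
    UNCLASSIFIED = "unclassified"  # not yet classified
    CLASSIFIED = "classified"      # already classified by this user
    SKIPPED = "skipped"            # skipped by this user


# Shared by the timelines and the graph so both views always agree
# (hex values are illustrative, not the ones used in SoundsLike).
STATE_COLOURS = {
    ExcerptState.CURRENT: "#2f7bd9",       # blue
    ExcerptState.UNCLASSIFIED: "#9e9e9e",  # grey
    ExcerptState.CLASSIFIED: "#3cb371",    # green
    ExcerptState.SKIPPED: "#e6c229",       # yellow
}
```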
The relation between the timelines is rein-
forced by colour matching of selections and frames:
the colour of the marker in the soundtrack timeline
(white) matches the colour of the zoomed-in timeline
frame, and the colour of the selected audio excerpt
(blue for the current audio) matches the colour of the
spectrogram timeline frame. In addition, the current
position in each timeline is indicated by a ver-
tical red line that moves along the timelines in
synchrony.
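The synchronized red playhead amounts to mapping one playback time onto the x axis of each timeline; the three timelines differ only in the time window they display. A minimal sketch of that mapping, with the function name, parameters, and example values chosen by us as assumptions:

```python
def playhead_x(time_s, window_start_s, window_end_s, width_px):
    """Map a playback time (seconds) to an x position inside a timeline.

    The full soundtrack timeline uses the window (0, movie_duration);
    the zoomed-in and spectrogram timelines use their own, narrower windows.
    Returns None when the time falls outside the timeline's window.
    """
    if not (window_start_s <= time_s <= window_end_s):
        return None
    span = window_end_s - window_start_s
    return (time_s - window_start_s) / span * width_px


# Example: a 2-hour movie with the playhead at 40 minutes.
full_x = playhead_x(2400, 0, 7200, 800)     # position on the soundtrack timeline
zoom_x = playhead_x(2400, 2300, 2700, 800)  # position on a zoomed-in timeline
```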
Audio Similarity Graph. To help the classification
of the current audio excerpt, the excerpt is displayed
and highlighted at the center of a connected graph
representing its similarity relations to the most similar
audio excerpts in the movie (Fig.2b-d and Fig.4). Be-
ing based on a physical particle system, the nodes
repel each other, tending to spread the graph open to
show the nodes, and the graph may be dragged around
with an elastic behaviour (Fig.4b). The similarity
value between two excerpts is expressed as the Eu-
clidean distance between audio histograms computed
from extracted features (further details can be found
in (Langlois and Marques, 2009)), and it is translated
to the graph metaphor as the screen distance between
two nodes: the shorter the edge, the more similar the
excerpts. The nodes use the same colour mapping
as the timelines, and the current excerpt has an ad-
ditional coloured frame to reinforce whether it was
already classified or skipped. Users can hover over a
node to quickly listen to its audio excerpt. On click,
the corresponding movie segment is played, for ad-
ditional context. On double click, that audio becomes
the current audio excerpt to be classified. This is par-
ticularly useful if the users can identify this audio better
than the one they were trying to identify, and had not
classified it before, allowing them to earn points faster;
and also to select similar audios after they have finished
the current classification.
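For the layout, each pair of excerpts therefore only needs a dissimilarity value to act as the edge's rest length. A minimal sketch of that computation, assuming each excerpt is already summarised as a feature histogram of equal length (the actual features follow (Langlois and Marques, 2009); the function names and the choice of k below are ours):

```python
import math


def distance(hist_a, hist_b):
    """Euclidean distance between two audio feature histograms."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(hist_a, hist_b)))


def nearest_neighbours(current_id, histograms, k=6):
    """Return the k excerpts most similar to the current one as
    (excerpt_id, distance) pairs; a shorter distance means a more
    similar excerpt, and it is what the layout can use as edge length."""
    others = (
        (other_id, distance(histograms[current_id], hist))
        for other_id, hist in histograms.items()
        if other_id != current_id
    )
    return sorted(others, key=lambda pair: pair[1])[:k]
```

The mapping keeps the graph metaphor direct: the distance returned here is used, unchanged except for scaling, as the on-screen edge length between the current excerpt and each neighbour.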
Synchronized Views. The views are synchronized
in SoundsLike. Besides the synchronization of the
timelines (section 4), the selection of audio excerpts
is synchronized between the timelines and the graph.
This is achieved by adopting, in both views, the same
updated colours for the excerpts, and by presenting,
on hover (to listen), a bright red frame that tem-
porarily highlights the location of the excerpt every-
where. In addition, when an audio excerpt is selected
to play, a frame is set around the excerpt in the graph
node and in the timelines, and this time also around
the video (in a brownish tone of red), to reinforce
which excerpt is playing (Fig.2c).
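One common way to obtain this behaviour is to have the timelines, the graph and the video subscribe to the same excerpt events, so that each view only re-frames its own representation of the excerpt. A minimal observer-style sketch under that assumption; all class and method names, and the colour values, are invented for illustration:

```python
class View:
    """Stub standing in for a timeline, graph, or video view."""

    def __init__(self, name):
        self.name = name

    def highlight(self, excerpt_id, colour, temporary):
        print(f"{self.name}: frame excerpt {excerpt_id} in {colour} (temporary={temporary})")


class ExcerptEventBus:
    """Broadcasts excerpt events (hover, select-to-play) to every registered view."""

    def __init__(self):
        self.views = []

    def register(self, view):
        self.views.append(view)

    def hover(self, excerpt_id):
        # temporary bright-red frame wherever the excerpt appears
        for view in self.views:
            view.highlight(excerpt_id, colour="#ff0000", temporary=True)

    def play(self, excerpt_id):
        # persistent brownish-red frame in timelines, graph, and video
        for view in self.views:
            view.highlight(excerpt_id, colour="#a0522d", temporary=False)
```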
Labelling, Scoring and Moving On. To label the
current audio excerpt – the actual goal – the users
choose one or more textual labels, or tags, to describe