PERSONAL SOUND BROWSER

A Collection of Tools to Search, Analyze and Collect Audio Files in a LAN

and in the Internet

Sergio Cavaliere, Carmine Colucci

Università di Napoli “Federico II”

Dipartimento di Scienze Fisiche

Via Cinthia, 80129 Napoli (Italy)

Keywords: Multimedia Databases, Audio Browsing.

Abstract: In this paper we present a toolbox for searching for audio files on the Internet, in a Local Area Network or on a single computer. The search is intended both to analyze the collected files and to populate a multimedia archive for further use or analysis. The related tools to interface to a multimedia Data Base and to analyze files are also provided. The toolbox is intended to be open, in the sense that any user may customize it at will by adding proprietary tools and methods. It is freely distributed and open to contributions. The goal has been achieved by building a Matlab Toolbox, which, as is well known, results in an open environment that anybody may customize. Research on music and sound browsing, analysis and classification is a large and open field, in which a large number of different solutions have been proposed in the literature. Deciding which sound parameters are suited to a given kind of search or classification is still an open problem: we therefore provide an open environment of tools and methods which, as an advantage with respect to other tools in the literature, starts from the very first stage of the process, searching and browsing directly from the Internet. The language used also allows straightforward prototyping of new tools. Interested researchers are kindly invited to email the authors for the distribution of the toolbox.

1 INTRODUCTION

An open and very interesting field of research is the search for tools and methods to implement automatic identification, indexing and segmentation mechanisms suitable for musical audio and sounds (Tzanetakis & Cook, 2000; Zhang & Kuo, 2001; Wold et al., 1996; Foote, 1999; Peeters & Rodet, 2002). This introduces a new concept in network and database navigation: navigation in an environment of audio signals and messages. This type of navigation will also make it possible to explore, in automated mode and at the several levels of the acoustical message content, the large mass of sound and musical information that may be found on the Internet, by means of powerful audio search, interpretation and classification tools. The indexing of these multimedia objects poses completely new problems, consisting in the identification and analysis of the audio signal by means of a large set of parameters which describe the audio content and allow both classification and search of sonic material for different purposes, including just listening, using audio data for music composition, or collecting audio material for other purposes.

The starting idea is to provide an instrument which makes it easy to browse the Internet, a personal computer or a Local Area Network, just as directory explorers allow listing directories, searching for particular files based on name, length, format or even content, and displaying file information and features. The same simple paradigm of file browsers should allow searching for audio files based on file name, format and length but, most importantly, on content, sound parameters and other features. This is the idea of our Personal Sound Browser. After the search, the files found may be added to a chosen Multimedia Data Base, in the form of links to their position in the Personal Computer or LAN, or to their Internet URL.

Cavaliere S. and Colucci C. (2006).

PERSONAL SOUND BROWSER - A Collection of Tools to Search, Analyze and Collect Audio Files in a LAN and in the Internet.

In Proceedings of the International Conference on Signal Processing and Multimedia Applications, pages 335-338

DOI: 10.5220/0001571403350338

 SciTePress

Figure 1: The top-level interface of the application. Menu: Add by Web Search, Add by File Search, Query DB.

2 THE PERSONAL SOUND BROWSER TOOLBOX

The toolbox allows searching on the Internet, in a Local Area Network (LAN) or in any storage device of a Personal Computer.

The toolbox provides the user with an interface to an Access Data Base (any other DB may be used instead), through which the user may populate an existing archive. The archive may then be browsed, annotating its sound files, and a search in it may be performed either by example, that is by providing a source file, or by specifying a range of parameters for similarity comparison.

The whole toolbox is organized as a collection of

files and routines which at the top level allow the

following operations (see Figure 1):

• Add by File Search

• Add by Web Search

• Query DB

2.1 Add by File Search and Web

The search, as already stated, may be performed on the computer (on any of its storage devices), just by providing the starting directory and the number of levels to be analyzed in the directory tree. In the current implementation, the parameters for search and listing are grouped as follows:

file features: name, extension

sound features: number of bits, number of channels, sampling rate

options for the download: download with preview, maximum size

In the same GUI, whose functionality is straightforward, the user may select the destination archive, with predefined tables and structure, in which to store any of the files found that the user decides to keep.
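As an illustration of the file search described above, the following minimal Python sketch walks a directory tree down to a given number of levels and reads the sound features from the header of each WAVE file. It is not the toolbox code, which is written in Matlab; the function names `find_audio_files` and `wave_features` are hypothetical, and counting the starting directory as level 0 is our assumption.

```python
import os
import wave

def find_audio_files(start_dir, max_levels, extensions=(".wav",)):
    """List matching files in start_dir and at most max_levels directories below it."""
    start_dir = os.path.abspath(start_dir)
    base_depth = start_dir.rstrip(os.sep).count(os.sep)
    found = []
    for dirpath, dirnames, filenames in os.walk(start_dir):
        depth = dirpath.rstrip(os.sep).count(os.sep) - base_depth
        if depth >= max_levels:
            dirnames[:] = []  # prune: do not descend below the requested level
        for name in sorted(filenames):
            if name.lower().endswith(extensions):
                found.append(os.path.join(dirpath, name))
    return found

def wave_features(path):
    """Read number of bits, number of channels and sampling rate from a WAVE header."""
    with wave.open(path, "rb") as w:
        return {
            "bits": 8 * w.getsampwidth(),
            "channels": w.getnchannels(),
            "rate": w.getframerate(),
        }
```

A listing filtered on sound features could then keep only the paths for which `wave_features` returns, for example, a sampling rate of 44100.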

A second GUI is accessed by choosing the Add by Web Search menu entry.

This GUI is quite similar to the Add by File Search interface and is used to set the search mode. The search is then performed by reading the source page, looking for links to the chosen kind of files and adding these URLs to a list; then, in the same page, links to other pages are searched for, going down the resulting tree with a recursive depth-first search (DFS) algorithm. The search is performed using regular grammars and stops as soon as a chosen number of levels has been traversed or a predetermined number of pages has been visited. The search may be deferred, scheduling it for a definite time, and a shut-down may be programmed for the end of the search. In this case, at the end of the search, the results, that is an html file containing the addresses of the files found, are stored as a log file for further processing.
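The recursive search just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the toolbox implementation: the name `crawl`, the regular expression `HREF` and the `fetch` callback (which returns the HTML of a page, or `None` if it cannot be read) are all hypothetical.

```python
import re
from urllib.parse import urljoin

# Assumed pattern for links; real pages may need a more tolerant expression.
HREF = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)

def crawl(start_url, fetch, extensions=(".wav", ".mp3"), max_levels=2, max_pages=50):
    """Depth-first search for links to sound files, starting from start_url."""
    visited, sounds = set(), []

    def visit(url, level):
        # Stop on the level limit, on the page limit, or on an already visited page.
        if level > max_levels or len(visited) >= max_pages or url in visited:
            return
        visited.add(url)
        html = fetch(url)
        if html is None:
            return
        links = [urljoin(url, href) for href in HREF.findall(html)]
        # First collect the links to the chosen kind of files in this page...
        for link in links:
            if link.lower().endswith(extensions) and link not in sounds:
                sounds.append(link)
        # ...then follow the links to other pages, one branch at a time.
        for link in links:
            if not link.lower().endswith(extensions):
                visit(link, level + 1)

    visit(start_url, 0)
    return sounds
```

Passing `fetch` as a parameter keeps the traversal independent of the network layer: it may wrap an HTTP request, read a saved html file, or, for testing, look pages up in a dictionary.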

Here also the functions are grouped by category, as shown in the following list:

• Source for the search

• Search options: name, extension, number of pages, number of levels for depth search, optional visualization of web pages during the search

• Sound features: number of bits, number of channels, sampling rate

• Download options, e.g. maximum length

• General options: delay the search, save the search, shut down the PC when the search has finished.

While most of the items are readily understood from their names, we point out some relevant features. First of all, a search on the Internet may be started from a specific URL. Browsing at this address means that we are not just looking for sounds but also analyzing the content of the specific URL: we may in fact display the starting page and the linked pages visited during the search, if we choose this kind of navigation, which thus also benefits from information on the context of the sounds.

A second possible source for the search is an html file saved by the user on the local disk, containing the results of a search performed with any search engine. In this case our browser stands on top of professional and efficient commercial search engines, whose work we can refine just by entering

SIGMAP 2006 - INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND MULTIMEDIA

APPLICATIONS

the name of the performed and stored search; from this file our browser will loop over the listed URLs for further analysis of their content.

Finally, a third source may be a search previously performed with the Personal Sound Browser itself, whose results have been stored on our PC as a log file for further analysis.

The search is performed by means of a depth-first traversal algorithm, allowing an exhaustive search of the pages in the tree, up to a programmed number of levels.

The second stage in both GUIs, Add by File Search and Add by Web Search, is that of showing the results of the search, if any, and analyzing individual files. The user may then ask for a download, which is performed by means of a call to the free WGET executable from the GNU project (http://www.gnu.org/software/wget/wget.html). For formats other than WAVE, executables from the LAME project (http://lame.sourceforge.net/) allow audio format conversion.
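A call to these two executables might look like the following Python sketch. The paper does not state which command-line options the toolbox passes, so the options below (`--tries`, `--timeout` and `-O` for wget, `--decode` for lame) are our choice for illustration, and the function names are hypothetical.

```python
import subprocess

def wget_command(url, dest):
    # -O writes the download to dest; --tries and --timeout bound the attempt.
    return ["wget", "--tries=2", "--timeout=30", "-O", dest, url]

def lame_decode_command(mp3_path, wav_path):
    # lame --decode converts an MP3 file into a WAVE file.
    return ["lame", "--decode", mp3_path, wav_path]

def download_as_wave(url, dest_mp3, dest_wav, run=subprocess.run):
    """Download an MP3 file and convert it to WAVE; raises if either step fails."""
    run(wget_command(url, dest_mp3), check=True)
    run(lame_decode_command(dest_mp3, dest_wav), check=True)
```

Both executables must be installed and on the search path; the `run` parameter only exists so that the calls can be replaced by a stub when testing.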

The screen GUI has the following sections:

• List of files: where you may choose the file to download and analyze

• Signal plot

o spectrum: linear scale / dB scale / mel scale for the frequency axis

o spectrogram: linear frequency scale / mel scale

o start time and final time of the signal to display, play

o view web page: option to view the source web page from which the file was downloaded

• Data Base operations

o play-list: a token to be stored in the Data Base, subjectively chosen by the user; it may be a style, a user-defined collection or other

o insert selected sound file

o insert all sound files found

2.2 Browse the Data Base

This functionality provides the common interface to the DB: it allows selecting the files in the DB that match some chosen features, defined by their mean value and dispersion. The parameters are computed on request, one by one, so that the user, by displaying the features over time, may carry out a thorough analysis of the signal.

Most of the features are stored as time-varying features, computed on the basis of a chosen time window, together with their statistical distribution, including mean and standard deviation, for later comparison (Burred & Lerch, 2004).

The parameters chosen so far include, as suggested

by the current literature:

• ZCR zero crossing rate

• RMS root mean square (see Figure 2)

• PITCH

• CENTROID weighted mean frequency

• ROLLOFF (related to energy distribution)

• FLUX (related to energy distribution)

• MFCC Mel frequency cepstral coefficients

These parameters may easily be extended by any user and are a starting point, to be improved along lines easily found in the large literature in the field (Wold 1996, Scheirer 1997, Rossignol 1998, Brown 1999, Lu 2001, Zhang 2001, Zölzer 2002, Tzanetakis 2002, Burred & Lerch 2004).
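To make the notion of time-varying features concrete, the following Python sketch computes three of the parameters listed above (ZCR, RMS and CENTROID) on successive time windows and summarizes each one by its mean and standard deviation. It uses common textbook definitions, which may differ in detail from those of the toolbox; the window size, the hop and all function names are assumptions.

```python
import math
from statistics import mean, pstdev

def frames(signal, size, hop):
    """Split the signal into windows of `size` samples, `hop` samples apart."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]

def zcr(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def rms(frame):
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def centroid(frame, rate):
    """Magnitude-weighted mean frequency in Hz, from a naive DFT (short frames only)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        real = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        imag = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        mags.append(math.hypot(real, imag))
    total = sum(mags)
    return sum(k * rate / n * m for k, m in enumerate(mags)) / total if total else 0.0

def summarize(signal, rate, size=64, hop=32):
    """Return {parameter: (mean, standard deviation)} over all windows."""
    fs = frames(signal, size, hop)
    tracks = {
        "ZCR": [zcr(f) for f in fs],
        "RMS": [rms(f) for f in fs],
        "CENTROID": [centroid(f, rate) for f in fs],
    }
    return {name: (mean(values), pstdev(values)) for name, values in tracks.items()}
```

The per-window lists in `tracks` correspond to the curves a user would display over time, while the (mean, standard deviation) pairs are the values kept for later comparison.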

Figure 2: The screenshot for a time varying parameter:

RMS value and histogram.

Search in the database is performed by means of a similarity criterion: the parameters of a chosen file are displayed in the form of their mean and standard deviation; the user may modify these parameters by hand or apply a multiplicative coefficient to the standard deviation. The search then looks for files whose mean parameter values fall in the programmed range: increasing the multiplicative coefficient of the standard deviation broadens the range of files collected, while further parameter refinement narrows the selection at will, down to a desired class of sounds.
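The range search can be sketched in Python as follows, assuming that each archive entry stores the mean value of every parameter and that the range for a parameter is the example mean plus or minus k times its standard deviation. The data layout and the names `similarity_query` and `overrides` are hypothetical.

```python
def similarity_query(archive, example, k=1.0, overrides=None):
    """Select the files whose mean parameters all fall in the programmed ranges.

    archive:   {file name: {parameter: mean value}}
    example:   {parameter: (mean, standard deviation)} of the chosen file
    k:         multiplicative coefficient applied to the standard deviation
    overrides: {parameter: (low, high)} ranges set by hand by the user
    """
    ranges = {p: (m - k * s, m + k * s) for p, (m, s) in example.items()}
    ranges.update(overrides or {})
    return [
        name
        for name, feats in archive.items()
        if all(low <= feats[p] <= high for p, (low, high) in ranges.items())
    ]
```

With this formulation a larger `k` can only enlarge the result set, while a hand-set range in `overrides` may narrow it on a single parameter.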

3 DEVELOPMENTS

Many improvements are planned in our project, in all phases of its operation.

One improvement is to use any available XML content information for the search (Bellini & Nesi, 2001; Haus & Longari, 2002), and to use text information from the URL by means of Natural Language Processing techniques, in addition to the content information obtained from the signal: the context of the sound file, its description, annotations and similar may in fact add useful information about it.

Other features to be used as means for classification and search will be added, from the large number identified in the literature (Peeters & Rodet, 2002); an example is the kind of thumbnails recently introduced by one of the authors (Evangelista & Cavaliere, 2005).

A second search mode will also be implemented, based on histogram similarity using the Kullback-Leibler divergence or another measure. In this case the user will provide an example file or an entire class of files for the search; the search then returns the files which best fit the statistical distribution of the parameters in the example file.
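Since this search mode is not yet implemented, the following Python sketch only illustrates the idea: the values of one parameter are binned into a histogram, and the archive file whose histogram has the smallest Kullback-Leibler divergence from the example is returned. The binning, the smoothing constant `eps` and the function names are assumptions.

```python
import math

def histogram(values, bins, low, high):
    """Count the values falling in `bins` equal-width bins between low and high."""
    counts = [0] * bins
    for v in values:
        i = int((v - low) / (high - low) * bins)
        counts[min(max(i, 0), bins - 1)] += 1  # clamp out-of-range values
    return counts

def kl_divergence(p_counts, q_counts, eps=1e-9):
    """D(P||Q) in nats; eps smoothing keeps empty bins from giving log(0)."""
    p = [c + eps for c in p_counts]
    q = [c + eps for c in q_counts]
    p_sum, q_sum = sum(p), sum(q)
    return sum((a / p_sum) * math.log((a / p_sum) / (b / q_sum)) for a, b in zip(p, q))

def best_fit(example_counts, archive):
    """archive: {file name: histogram counts}; return the closest file name."""
    return min(archive, key=lambda name: kl_divergence(example_counts, archive[name]))
```

The divergence is not symmetric, so comparing the example against a file and the file against the example may rank files differently; a symmetrized sum of the two is one of the "other measures" that could be used instead.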

We are also working on a parallel version of the program. Parallelism will be achieved by a master computer, which will divide the burden of annotation into chunks and send tasks to slave computers (these are mostly in the LAN, but might reside anywhere in the network). Each slave, as soon as its user decides to open it to parallel processing, will signal its presence on the net and wait for a task. The master will receive the addresses of the slaves which are ready and will send each of them a specific task. The granularity of these tasks is easily identified in the analysis of the individual sound files: the master just sends the address of a file on the Internet; the slave downloads the sound file and, in turn, sends back the computed sound parameters to be stored in the archive for further search.
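The division of work can be illustrated on a single machine with a pool of worker threads standing in for the slave computers. This is only a sketch of the task granularity (one sound file per task), not of the planned network protocol; `annotate_in_parallel` and the `analyze` callback are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def annotate_in_parallel(urls, analyze, workers=4):
    """Master: send one task per sound file address and collect the parameters.

    analyze(url) plays the role of a slave: it downloads the file at `url`
    and returns its computed sound parameters.
    """
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands each address to the next free worker, keeping input order.
        results = pool.map(analyze, urls)
        return dict(zip(urls, results))
```

Because the master only sends addresses and receives small parameter sets, the amount of data exchanged with each worker stays low, whichever transport is eventually used.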

Our project has collected its first encouraging results, showing that it provides a complete set of tools which, installed in a Local Area Network, in a studio, a classroom or a research laboratory, easily supports the efficient paradigm of a parallel archive with distributed storage and distributed processing.

We also found that, in spite of the use of a high-level interpreted language, the efficiency of the program is quite satisfactory, while ease of prototyping makes it easy to experiment with new solutions; on the other hand, a compiled version of the Sound Browser speeds up both search and classification.

REFERENCES

Bellini P., Nesi P., 2001. WEDELMUSIC format: an XML music notation format for emerging applications. In Proceedings of the First International Conference on Web Delivering of Music.

Burred J.J., Lerch A., 2004. Hierarchical Automatic Audio Signal Classification. Journal of the Audio Engineering Society, Vol. 52, No. 7/8.

Evangelista G., Cavaliere S., 2005. Event Synchronous Wavelet transform approach to the extraction of Musical Thumbnails. In Proc. of the DAFX05 International Conference on Digital Audio Effects, Madrid, Spain.

Foote J., 1999. An overview of audio information retrieval. ACM Multimedia Systems, 7:2–10.

Haus G., Longari M., 2002. Towards a Symbolic/Time-Based Music language based on XML. In Proc. First International IEEE Conference on Musical Applications Using XML (MAX2002), New York.

Lu L., Hao J., HongJiang Z., 2001. A robust audio classification and segmentation method. In Proc. ACM Multimedia, Ottawa, Canada.

Pachet F., La Burthe A., Zils A., Aucouturier J.J. Popular music access: The Sony music browser. Journal of the American Society for Information Science and Technology, Volume 55, Issue 12, pages 1037–1044.

Panagiotakis C., Tziritas G., 2005. A Speech/Music Discriminator Based on RMS and Zero-Crossings. IEEE Transactions on Multimedia.

Peeters G., Rodet X., 2002. Automatically selecting signal descriptors for sound classification. In Proceedings of ICMC 2002, Goteborg, Sweden.

Rossignol S., Rodet X., et al., 1998. Features extraction and temporal segmentation of acoustic signals. In Proc. Int. Computer Music Conf. ICMC, pages 199–202. ICMA.

Scheirer E., Slaney M., 1997. Construction and evaluation of a robust multifeature speech/music discriminator. In Proc. Int. Conf. on Acoustics, Speech and Signal Processing ICASSP, pages 1331–1334. IEEE.

Tzanetakis G., Cook P., 2000. MARSYAS: a framework for audio analysis. Organised Sound, Cambridge University Press, 4(3), pages 169–177.

Tzanetakis G., Cook P., 2002. Musical Genre Classification of Audio Signals. IEEE Transactions on Speech and Audio Processing, Vol. 10, No. 5, July, p. 293.

Vinet H., Herrera P., Pachet F., 2002. The Cuidado Project: New Applications Based on Audio and Music Content Description. In Proc. ICMC.

Wold E., Blum T., Keislar D., Wheaton J., 1996. Content-based classification, search and retrieval of audio. IEEE Multimedia, 3(2).

Zhang T., Kuo J., 2001. Audio Content Analysis for Online Audiovisual Data Segmentation and Classification. IEEE Transactions on Speech and Audio Processing, (4):441–457, May.

Zölzer U. (ed.), 2002. DAFX - Digital Audio Effects. John Wiley & Sons.
