
follows. Section 2 explores related approaches. Our model is explained in Section 3. Section 4 discusses integration issues of the model with commercial database management systems. The following section gives practical examples to illustrate possible applications of the model. Finally, our conclusions and directions for further work are presented.
2 RELATED WORK
Content-based retrieval of multimedia data has been explored in several studies. Early attempts addressed the retrieval of images; afterwards, the problem of video retrieval attracted much more attention. Audio, however, has generally been studied only as part of video retrieval, and comparatively little work has been devoted to it. As stated in (Gudivada and Raghavan, 1995), the various approaches can be broadly classified into three categories: keyword-based, feature-based, and concept-based approaches. Keyword-based approaches, the simplest way to model content, rely on free-text manual annotation. In feature-based approaches, a set of features is extracted from the multimedia data and represented in a suitable form. In concept-based approaches, application domain knowledge is used to interpret an object's content, which may require user intervention.
Systems in the first category are mostly based on textual data, so a traditional database management system that supports object retrieval is adequate for this purpose. The latter two categories require further consideration in the context of modeling in multimedia database systems.
In the literature, there are a few commercial and academic works on content-based retrieval systems for auditory data. However, most of these systems support only segmentation and classification of audio data, that is, the signal-processing aspects of audio. Since query languages depend heavily on the underlying data model, our survey also covers some multimedia query languages.
One specific technique in content-based audio retrieval is query-by-humming. The approach in (A. Ghias, 1995) represents the melody contour as the sequence of relative differences in pitch and adopts string matching to search for similar songs.
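As a rough illustration of this idea, the sketch below reduces a pitch sequence to a contour string over the alphabet {U, D, S} (up, down, same) and matches it approximately against stored contours. The pitch values, the error threshold, and the toy song database are assumptions for illustration only and are not taken from the cited work.

    # Minimal query-by-humming sketch: encode relative pitch differences as a
    # U/D/S contour string and match it against stored contours with up to k
    # mismatches. All concrete values below are illustrative placeholders.

    def pitch_contour(pitches):
        """Encode relative pitch differences as a U/D/S string."""
        contour = []
        for prev, cur in zip(pitches, pitches[1:]):
            if cur > prev:
                contour.append("U")
            elif cur < prev:
                contour.append("D")
            else:
                contour.append("S")
        return "".join(contour)

    def matches_with_k_errors(query, melody, k):
        """Slide the query over the melody contour, allowing up to k mismatches."""
        if len(query) > len(melody):
            return False
        for start in range(len(melody) - len(query) + 1):
            window = melody[start:start + len(query)]
            mismatches = sum(1 for a, b in zip(query, window) if a != b)
            if mismatches <= k:
                return True
        return False

    # Toy example: search a tiny "database" of contour strings.
    songs = {"song_a": "UUDSUDDU", "song_b": "DDUUSSDU"}
    query = pitch_contour([60, 62, 64, 63, 63])   # MIDI-like pitch values -> "UUDS"
    hits = [name for name, c in songs.items() if matches_with_k_errors(query, c, k=1)]
    print(hits)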
In the content-based retrieval (CBR) work of the Musclefish company (E. Wold, 1996), statistical values (including means, variances, and autocorrelations) of several time- and frequency-domain measurements are taken to represent perceptual features such as loudness, brightness, bandwidth, and pitch. Since only statistical values are used, this method is suitable only for sounds with a single timbre.
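The sketch below, which is not the Musclefish implementation, illustrates this style of feature vector: per-frame measurements (RMS loudness, spectral centroid as brightness, and bandwidth) are each summarized by their mean, variance, and lag-1 autocorrelation. The frame size, hop size, and synthetic test signal are illustrative assumptions.

    # Sketch of a statistics-over-trajectories feature vector (assumed frame
    # and hop sizes; synthetic test signal).
    import numpy as np

    def frame_signal(x, frame=1024, hop=512):
        n = 1 + max(0, (len(x) - frame) // hop)
        return np.stack([x[i * hop:i * hop + frame] for i in range(n)])

    def per_frame_measurements(frames, sr):
        loudness = np.sqrt((frames ** 2).mean(axis=1))            # RMS energy
        spectra = np.abs(np.fft.rfft(frames, axis=1))
        freqs = np.fft.rfftfreq(frames.shape[1], 1.0 / sr)
        power = spectra.sum(axis=1) + 1e-12
        brightness = (spectra * freqs).sum(axis=1) / power        # spectral centroid
        bandwidth = np.sqrt((spectra * (freqs - brightness[:, None]) ** 2).sum(axis=1) / power)
        return {"loudness": loudness, "brightness": brightness, "bandwidth": bandwidth}

    def summarize(traj):
        centered = traj - traj.mean()
        denom = (centered ** 2).sum() + 1e-12
        autocorr = (centered[:-1] * centered[1:]).sum() / denom   # lag-1 autocorrelation
        return [traj.mean(), traj.var(), autocorr]

    def feature_vector(x, sr=16000):
        frames = frame_signal(x)
        return np.array([v for t in per_frame_measurements(frames, sr).values()
                         for v in summarize(t)])

    # Toy usage on a synthetic tone.
    sr = 16000
    x = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
    print(feature_vector(x, sr))                                  # 9-dimensional vector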
A music and audio retrieval system was proposed in (Foote, 1997), where Mel-frequency cepstral coefficients were taken as features and a tree-structured classifier was built for retrieval.
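To make the idea concrete, the sketch below extracts MFCCs with librosa and trains a scikit-learn decision tree over clip-level MFCC statistics. The decision tree is a simple stand-in for the tree-structured quantizer of the original work, and the synthetic tones and labels are purely illustrative assumptions.

    # Sketch: MFCC clip features plus a tree classifier (a stand-in for the
    # tree-structured quantizer of (Foote, 1997); data is synthetic).
    import numpy as np
    import librosa
    from sklearn.tree import DecisionTreeClassifier

    def clip_features(y, sr=16000, n_mfcc=13):
        """Summarize a clip by the mean and std of its MFCC trajectories."""
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
        return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

    def tone(freq, sr=16000, dur=1.0):
        t = np.arange(int(sr * dur)) / sr
        return np.sin(2 * np.pi * freq * t).astype(np.float32)

    # Illustrative training set: low tones labeled "speechlike", high tones "musiclike".
    clips = [tone(f) for f in (200, 220, 240, 1000, 1100, 1200)]
    labels = ["speechlike"] * 3 + ["musiclike"] * 3
    X = np.stack([clip_features(c) for c in clips])

    classifier = DecisionTreeClassifier(max_depth=3).fit(X, labels)
    print(classifier.predict([clip_features(tone(210)), clip_features(tone(1050))]))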
In (A. Woudstra, 1998), an architecture for modeling and retrieving audiovisual information is proposed. The system presents a general framework for modeling multimedia information and discusses the application of that framework to the specific area of soccer video clips.
In (L. Lu, 2003), an SVM-based approach to content-based classification and segmentation of audio streams is presented for audio/video analysis. In this approach, an audio clip is classified into one of five classes: pure speech, non-pure speech, music, environment sound, and silence. However, the system has no underlying database model for content-based audio retrieval.
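The sketch below illustrates only the classify-then-segment control flow of such an approach with a scikit-learn SVM: each window of a stream is mapped to a feature vector, labeled with one of the five classes, and consecutive windows with the same label are merged into segments. The random feature vectors are placeholders standing in for real audio features and are not part of the cited work.

    # Sketch of SVM window classification followed by segment merging
    # (random placeholder features; only the control flow is illustrated).
    import numpy as np
    from sklearn.svm import SVC

    CLASSES = ["pure speech", "non-pure speech", "music", "environment sound", "silence"]
    rng = np.random.default_rng(0)

    # Illustrative training data: 40 feature vectors per class, shifted per
    # class so the toy problem is separable.
    X_train = np.concatenate([rng.normal(loc=3.0 * i, scale=0.5, size=(40, 8))
                              for i in range(len(CLASSES))])
    y_train = np.repeat(CLASSES, 40)
    svm = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)

    # "Stream": a sequence of window-level feature vectors from two classes.
    stream = np.concatenate([rng.normal(loc=0.0, scale=0.5, size=(5, 8)),   # pure speech
                             rng.normal(loc=6.0, scale=0.5, size=(5, 8))])  # music
    labels = svm.predict(stream)

    # Merge consecutive windows with the same predicted label into segments.
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((start, i - 1, labels[start]))
            start = i
    print(segments)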
(J.Z. Li, 1997) describes a general multimedia query language, called MOQL, based on the ODMG's Object Query Language (OQL). Their approach extends the standard query language, OQL, to facilitate the incorporation of MOQL into existing object-oriented database management systems. However, as stated in (J.Z. Li, 1997), further work is needed to investigate support for audio media and to establish the expressiveness of MOQL.
There are other audio data models that are explored in the context of video (G. Amato, 1998), (A. Hampapur, 1997), (R. Weiss, 1994). However, since the main focus is video, less attention is paid to the audio component.
The main contributions of this work are as follows. We focus primarily on the audio component. Particular attention is given to the integration of the model with commercial database management systems. Finally, we believe that the interoperability of the model is enabled by utilizing signal features that have been standardized within the MPEG-7 framework.
3 AUDIO DATA MODEL
In this section, we present our audio data model, its components, and some details on the representation of audio data. As identified in the work on MPEG-7 (John R. Smith, 2000), audio-visual content can be described at many levels, such as structure, semantics, features, and meta-data. Here, MPEG-7 contributes by standardizing a core set of descriptors and description schemes to enable indexing and retrieval of audio-visual data, as well as interoperability of data resources (MPEG-7, 1999). A descriptor (D) is used to represent a feature that characterizes the audio-visual