PARTIAL TRACKING IN SINUSOIDAL MODELING

An Adaptive Prediction-based RLS Lattice Solution

Leonardo O. Nunes, Paulo A. A. Esquef, Luiz W. P. Biscainho and Ricardo Merched

LPS - DEL/Poli & PEE/COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil

Keywords:

Acoustical signal processing, Sinusoidal modeling, Partial tracking, Linear prediction, Adaptive filtering, Lattice filters.

Abstract:

Partial tracking plays an important role in sinusoidal modeling analysis, being the stage in which the model

parameters are obtained. This is accomplished by coherently grouping the spectral peaks found in each frame

into time-evolving tracks of varying frequency and amplitude. The main difﬁculties faced by partial tracking

algorithms are the analysis of polyphonic signals and the pursuit of tracks exhibiting strong modulations in

frequency and amplitude. In these circumstances, linear prediction over the trajectory of a given track has

been shown to improve partial tracking performance. This paper proposes an adaptive RLS lattice ﬁlter for

the purpose of prediction in partial tracking. A new heuristic which certiﬁes the ﬁlter convergence is also

presented. Computer simulation results are shown to compare the proposed implementation with that of other

predictors. The performance of the proposed solution is similar to that of competing methods, albeit with

reduced computational complexity as well as improved numerical stability.

1 INTRODUCTION

Audio signals are predominantly resonant in nature,

being thus well described by a sum of amplitude- and

frequency-modulated sinusoids. Taking advantage of

that fact, sinusoidal modeling (SM) has been intro-

duced for speech analysis in (McAulay and Quatieri,

1986) and for audio signals in (Smith III and Serra,

1987), being later expanded (Serra, 1997) and modi-

ﬁed (George and Smith, 1992) to suit various audio-

related applications, such as speech synthesis and

modiﬁcations, musical instrument synthesizers, audio

coding, and automatic transcription of music.

The classical MQ sinusoidal analysis algorithm

presented in (McAulay and Quatieri, 1986) can be di-

vided into two separate steps. First, the sinusoidal

components are detected on a frame-by-frame basis,

usually by peak picking from the magnitude spectrum

of the signal computed via the Short-Time Fourier

Transform. The detected peaks are then linked

across the frames to form the partial tracks. Each

track, if correctly detected, models an amplitude- and

frequency-modulated sinusoid.

The main difﬁculties faced by a partial tracking al-

gorithm are in robustly estimating the trajectory of a

partial that exhibits strong frequency modulation (vi-

brato) and/or amplitude modulation (tremolo), as well

as in resolving partial ambiguities that may occur dur-

ing analysis of a polyphonic audio recording.

Several proposals have been made in an attempt to improve partial tracking performance. In (Depalle et al., 1993), hidden Markov models, along with a Viterbi algorithm, are used to model the track trajectory in order to achieve optimum track continuation. Kalman filters have also been considered in (Sterian and Wakefield, 1998) for modeling the behavior of the tracks of musical instrument sounds, provided that a model of the instrument being analyzed is known.

Another approach to partial tracking makes use of autoregressive modeling to predict the evolution of the track parameters over time. In (Lagrange et al., 2003), a predictor based on Burg's method has been shown to be quite effective for this task, especially in situations where crossing frequency trajectories occur. In (Nunes et al., 2007b), an adaptive-filter solution is described for the prediction of the partial tracks. The sequential nature of adaptive filters has been demonstrated to be especially suited to the prediction problem at hand.

This work presents an RLS lattice filter solution to the prediction of the frequency and amplitude components of a given partial track. The proposal, which builds on a previous solution (Nunes et al., 2007b), attains similar performance but with a reduction in computational complexity from $O(n^2)$ to $O(n)$. Moreover, a heuristic that discards unrealistic predictions during the filter's training period is also described.

Nunes, L. O., Esquef, P. A. A., Biscainho, L. W. P. and Merched, R. (2008). PARTIAL TRACKING IN SINUSOIDAL MODELING - An Adaptive Prediction-based RLS Lattice Solution. In Proceedings of the International Conference on Signal Processing and Multimedia Applications, pages 84-91. DOI: 10.5220/0001937000840091. SciTePress.

After this introduction, Section 2 brieﬂy outlines

the processing stages involved in SM analysis, de-

scribing the standard MQ partial tracking algorithm,

and reviewing the adaptive ﬁlter framework for the

problem. In Section 3, the proposed lattice ﬁlter

solution is introduced. Computer simulation results

that illustrate the performance of the proposed pre-

dictor are shown in Section 4. Finally, conclusions

are drawn.

2 SINUSOIDAL ANALYSIS

OVERVIEW

Sinusoidal modeling (McAulay and Quatieri, 1986)

describes an audio signal x(t) as a sum of L sinusoids,

i.e.,

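$$x(t) = \sum_{l=1}^{L} A_l(t)\,\sin(\Psi_l(t)), \qquad (1)$$

with

$$\Psi_l(t) = \Psi_l(0) + \int_{0}^{t} \omega_l(u)\,du, \qquad (2)$$

where $A_l(t)$ is the modulated amplitude, $\Psi_l(t)$ is the modulated phase, and $\omega_l(u)$ is the instantaneous frequency of partial $l$. In practice, Eq. (1) is commonly replaced by a discrete model,

$$x[n] = \sum_{l=1}^{L} A_l[n]\,\sin(\Psi_l[n]). \qquad (3)$$

For a given partial $l$, the approximations $A_l[n] \approx A_l$ and $\Psi_l[n] \approx \Omega_l n + \Psi_l[0]$, where $A_l$ and $\Omega_l$ are constant values, hold true within a sufficiently short $N$-sample frame.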
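As an illustration of Eq. (3) under the stationarity approximation above, the following Python sketch synthesizes one short frame as a sum of constant-amplitude, constant-frequency sinusoids. The function name `synthesize`, the sampling rate, and the partial parameters are hypothetical and chosen only for this example.

```python
import math

def synthesize(partials, num_samples):
    """Sum of stationary sinusoids: x[n] = sum_l A_l * sin(Omega_l * n + Psi_l[0])."""
    x = [0.0] * num_samples
    for amp, omega, phase0 in partials:
        for n in range(num_samples):
            x[n] += amp * math.sin(omega * n + phase0)
    return x

# Hypothetical frame: two partials at 440 Hz and 880 Hz, sampled at 8 kHz.
fs = 8000.0
partials = [
    (1.0, 2 * math.pi * 440.0 / fs, 0.0),          # (A_l, Omega_l in rad/sample, Psi_l[0])
    (0.5, 2 * math.pi * 880.0 / fs, math.pi / 2),
]
frame = synthesize(partials, 160)  # 160 samples = 20 ms at 8 kHz
```

The analysis problem discussed in this paper is the inverse one: recovering the triplets in `partials`, frame after frame, from the samples alone.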

The main objective of a sinusoidal analysis algorithm consists in estimating $A_l$ and $\Omega_l$ across the frames. The typical stages (Serra and Smith III, 1990) involved in the analysis portion of an SM system are illustrated in Fig. 1.

[Figure 1 block diagram: x[n] → Time/Frequency Decomposition → Peak Detection → Partial Tracking → Partial Tracks]

Figure 1: Processing stages of a sinusoidal analysis system.

The ‘time / frequency decomposition’ stage com-

putes the discrete-time Short-Time Fourier Transform

of the audio signal x[n], i.e.,

$$X[m,k] = \mathrm{STFT}(x[n,m]) = \sum_{n=0}^{N-1} w[n]\,x[n+mH]\,e^{-jk\frac{2\pi}{N}n}, \qquad (4)$$

where w[n] is a window function of length N, e.g. the

Hamming window, k is the frequency bin index, m is

the frame index, and H is the frame hop (in samples)

along time. Out of X[m, k], the ‘peak detection’ stage

is supposed to select only those peaks that correspond

to stationary sinusoidal components present in frame

m. If desired, precise estimates for the amplitude and

frequency of the detected peaks can be obtained by

a number of dedicated methods (Lagrange and Marc-

hand, 2007). Finally, the partial tracking is responsi-

ble for coherently grouping peaks across consecutive

frames into the so-called partial tracks.
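The first two stages can be sketched in a few lines of Python. The code below evaluates Eq. (4) directly (a naive DFT, adequate only for small N) and then keeps the local maxima of the magnitude spectrum above a threshold. The helper names, the threshold value, and the test signal are assumptions made for this illustration; they do not reproduce the detection method adopted later in the paper.

```python
import cmath
import math

def hamming(N):
    """Symmetric Hamming window of length N."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def stft_frame(x, m, H, w):
    """X[m, k] = sum_n w[n] x[n + mH] exp(-j 2 pi k n / N), for k = 0..N-1."""
    N = len(w)
    seg = [w[n] * x[n + m * H] for n in range(N)]
    return [sum(seg[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def pick_peaks(X, threshold):
    """Bins k (0 < k < N/2) that are local maxima of |X| above the threshold."""
    mag = [abs(v) for v in X]
    half = len(X) // 2
    return [k for k in range(1, half)
            if mag[k] > threshold and mag[k] > mag[k - 1] and mag[k] >= mag[k + 1]]

# Hypothetical test signal: one sinusoid centred exactly on bin 8 of a 64-point frame.
N, H = 64, 16
x = [math.sin(2 * math.pi * 8 * n / N) for n in range(4 * N)]
X = stft_frame(x, m=1, H=H, w=hamming(N))
peaks = pick_peaks(X, threshold=1.0)
```

In a practical system the direct sum would be replaced by an FFT, and the bin index of each peak would be refined into amplitude and frequency estimates before being handed to the partial tracker.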

2.1 McAulay & Quatieri Algorithm

The objective of this section is twofold: to describe the MQ algorithm, which will be extended in the next section, and to illustrate the difficulties that arise in the partial tracking problem.

The MQ algorithm, proposed in (McAulay and

Quatieri, 1986) for speech analysis and adapted to au-

dio signals in (Smith III and Serra, 1987), is consid-

ered the standard algorithm for partial tracking and

serves as a starting point to many algorithms found in

the literature.

The MQ algorithm is summarized below:

1. To each track $i$ in frame $m$ with corresponding peak frequency $f_i[m]$, the algorithm assigns the closest peak $p$, with frequency $f_p[m+1]$, such that $|f_p[m+1] - f_i[m]| \le \Delta f$. When two tracks dispute the same peak, the one with the closest frequency wins the conflict. The other track searches for the next closest peak.

2. An emerging track is created to accommodate any unassigned peak. If a track stays in the emerging state for more than E frames, its state changes to evolving. If an emerging track does not find a continuation before reaching E frames, it is discarded.

3. If in frame m + 1 a track does not find any peak, it is assigned a vanishing status, and its current amplitude-frequency pair is propagated to the next frame. If the track finds a continuation within S frames, it leaves the vanishing state; otherwise it is considered inactive.

The algorithm performance is strongly dependent

on the choice of parameters {∆ f,E,S}. The role of

each parameter in the algorithm is described below:

• The $\Delta f$ parameter controls the maximum frequency variation allowed and is usually frequency dependent. For instance, $\Delta f = 0.03\,f_i[m]$ is a common choice, since it corresponds to a quarter-tone variation around $f_i[m]$.


• The E parameter is responsible for removing short

tracks, possibly formed by wrongly identiﬁed

peaks.

• The S parameter avoids track discontinuation as a

consequence of missing peaks.
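The frequency-matching step (item 1 of the summary above) can be sketched as follows. This is a simplified greedy variant written for illustration, not the original MQ implementation: all admissible track/peak pairs are sorted by frequency distance and served in that order, so that a disputed peak goes to the closest track and the losing track falls back to its next closest candidate. The function name `mq_match` and the example frequencies are hypothetical, and the state handling governed by E and S is omitted.

```python
def mq_match(track_freqs, peak_freqs, rel_delta=0.03):
    """Greedy frequency matching between tracks at frame m and peaks at frame m+1.

    Returns (assignment, unassigned): assignment[i] is the index of the peak
    continuing track i, or None if no free peak lies within the tolerance;
    unassigned lists the peaks left over, which would seed emerging tracks.
    """
    pairs = []
    for i, fi in enumerate(track_freqs):
        for p, fp in enumerate(peak_freqs):
            dist = abs(fp - fi)
            if dist <= rel_delta * fi:        # |f_p[m+1] - f_i[m]| <= delta_f
                pairs.append((dist, i, p))
    pairs.sort()                              # closest pairs are served first
    assignment = [None] * len(track_freqs)
    taken = set()
    for dist, i, p in pairs:
        if assignment[i] is None and p not in taken:
            assignment[i] = p
            taken.add(p)
    unassigned = [p for p in range(len(peak_freqs)) if p not in taken]
    return assignment, unassigned

# Hypothetical frame: two close tracks competing for nearby peaks, one track left alone.
tracks = [440.0, 450.0, 1000.0]
peaks = [446.0, 452.0, 1100.0]
assignment, new_peaks = mq_match(tracks, peaks)
```

Here the 450 Hz track takes the 452 Hz peak, the 440 Hz track takes the 446 Hz peak, and the 1000 Hz track finds nothing within 3%, so it would enter the vanishing state while the 1100 Hz peak would start an emerging track.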

The above procedure can fail to correctly identify a track continuation in signals with vibrato or in polyphonic audio. Vibrato can lead to a large frequency variation between adjacent frames, thus requiring an overly permissive choice of the ∆f parameter. On the other hand, in polyphonic signals, the occurrence of track trajectories closely spaced in frequency (and even crossing frequency trajectories) demands a very stringent choice of ∆f. Moreover, by considering only the frequency evolution of the tracks, the MQ tracker disregards amplitude continuity, which can lead to audible artifacts in the re-synthesized signal.

The solution presented in the next section aims at circumventing some of the aforementioned problems. By using previously acquired track information to predict the frequency trajectory and comparing the predicted value with the observed peak value, the ∆f parameter can be set to a small value even for tracks with significant frequency variations, thus favoring better performance in the polyphonic case. Prediction can also be applied to the track amplitude to better select the continuation of a given track. All these modifications make the algorithm more robust, in the sense that a single set of well-tuned processing parameters suffices for partial tracking over a wider class of signals.

2.2 Prediction-based Partial Tracking

A rather natural extension to the MQ algorithm is ob-

tained by using predicted values for frequency and

amplitude of a given track, instead of using their last

measured values (Lagrange et al., 2007). This way,

the information acquired up to a determined frame is

used to obtain the best continuation for the track.

Figure 2 depicts a prediction-based partial tracking algorithm, which works as follows: for a given track $i$, the most prominent spectral peaks of the signal at frame $m$ (stored in the amplitude vector $\mathbf{A}[m]$ and the frequency vector $\mathbf{f}[m]$) are compared with the predicted values of amplitude ($\hat{A}_i[m]$) and frequency ($\hat{f}_i[m]$); then a decision heuristic, to be discussed later, selects amplitude $A_i[m]$ and frequency $f_i[m]$ as the best track continuation.

The predicted values are obtained by minimizing a function of the error between the estimated and the "real" values of the peak parameters. Since this minimization is performed sequentially on the track data, the model can be adapted whenever a new input sample (new peak assigned to a given track) is fed to the predictor.

[Figure 2 block diagram: a Predictor, fed with the track parameters up to frame m − 1, outputs the predicted values for frame m; a Decider compares them with the detected peaks A[m] and f[m].]

Figure 2: Prediction scheme for track i at frame m.

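In (Nunes et al., 2007b), a regularized recursive least-squares procedure is proposed for sequentially predicting the amplitude and the frequency of each partial. Defining the output vector with the predicted parameters of the $i$th track by $\mathbf{y}_i[m] = [\hat{A}_i[m] \;\; \hat{f}_i[m]]$ and the input vector with the past $J$ known parameters by $\mathbf{x}_i[m] = [\mathbf{A}_i[m-1] \;\; \mathbf{f}_i[m-1]]$, with

$$\mathbf{A}_i[m-1] = [A_i[m-1] \;\; A_i[m-2] \;\cdots\; A_i[m-J]] \qquad (5)$$

$$\mathbf{f}_i[m-1] = [f_i[m-1] \;\; f_i[m-2] \;\cdots\; f_i[m-J]], \qquad (6)$$

one can write

$$\mathbf{y}_i[m] = \mathbf{x}_i[m]\,\mathbf{W}_i[m-1], \qquad (7)$$

where $\mathbf{W}_i[m]$ is a $2J \times 2$ coefficient matrix.

Given a choice of $\alpha > 0$ and a forgetting factor $0 \ll \lambda \le 1$, the exponentially-weighted regularized least-squares problem (Sayed, 2003) seeks the matrix $\mathbf{W}_i[m]$ that minimizes

$$\lambda^{m+1}\,\mathbf{W}_i^{T}[m]\,\mathbf{P}^{-1}\,\mathbf{W}_i[m] + \sum_{l=0}^{m} \lambda^{m-l}\,\left\|\mathbf{d}_i[l] - \mathbf{x}_i[l]\,\mathbf{W}_i[m]\right\|^{2}, \qquad (8)$$

where $\mathbf{d}_i[m] = [A_i[m] \;\; f_i[m]]$ is the desired-signal vector and $\mathbf{P}^{-1} = \alpha^{-1}\mathbf{I}$, with $\mathbf{I}$ denoting a $J$th-order identity matrix.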

Referring to Figure 2 again, the predicted amplitude and frequency values at frame $m$ are used to choose the best track continuation from the parameter vectors ($\mathbf{A}$ and $\mathbf{f}$) of the detected peaks. The chosen track continuations, $A_i[m]$ and $f_i[m]$, are then employed to update the coefficient matrix to time $m$. This new coefficient matrix, in conjunction with the updated input vector $\mathbf{x}_i[m+1]$, is used to predict the amplitude and frequency values for the track at frame $m+1$.
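The predict-then-update cycle just described can be illustrated with a textbook exponentially weighted RLS recursion. The sketch below is a simplification, not the implementation of (Nunes et al., 2007b): it handles a single scalar parameter (the uncoupled case) with a length-J weight vector instead of the $2J \times 2$ coefficient matrix, and the function names and the sinusoidal test trajectory are assumptions made for the example. The values λ = 0.98 and α = 10 are those quoted in Section 4.1.

```python
import math

def rls_init(order, alpha):
    """Zero weights and P[-1] = alpha * I (inverse regularization alpha^-1 * I)."""
    P = [[alpha if r == c else 0.0 for c in range(order)] for r in range(order)]
    return [0.0] * order, P

def rls_predict(w, x):
    """One-step prediction y[m] = x[m] w[m-1]."""
    return sum(wi * xi for wi, xi in zip(w, x))

def rls_update(w, P, x, d, lam):
    """Exponentially weighted RLS time update with forgetting factor lam."""
    J = len(w)
    Px = [sum(P[r][c] * x[c] for c in range(J)) for r in range(J)]
    denom = lam + sum(x[r] * Px[r] for r in range(J))
    gain = [v / denom for v in Px]
    err = d - rls_predict(w, x)                      # a priori error
    w = [w[r] + gain[r] * err for r in range(J)]
    # P is symmetric, so the row vector x P equals Px transposed.
    P = [[(P[r][c] - gain[r] * Px[c]) / lam for c in range(J)] for r in range(J)]
    return w, P, err

# Hypothetical track: a noiseless sinusoidal trajectory, exactly predictable with J = 2.
track = [math.sin(0.25 * m) for m in range(200)]
w, P = rls_init(order=2, alpha=10.0)
for m in range(2, len(track)):
    x = [track[m - 1], track[m - 2]]     # past J values, most recent first
    prediction = rls_predict(w, x)       # value handed to the decider
    w, P, err = rls_update(w, P, x, track[m], lam=0.98)
```

The update stores and manipulates a J × J matrix for each track, which is the source of the quadratic cost that the lattice solution of Section 3 avoids.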

The decision strategy adopted in (Nunes et al.,

2007b) is similar to that described in Section 2.1 and

to the method proposed in (Lagrange et al., 2007).

During the ﬁrst qJ frames, q ≥ 1, the predicted results

are simply discarded while the ﬁlter is trained. For an

SIGMAP 2008 - International Conference on Signal Processing and Multimedia Applications

evolving track, given the vectors containing the parameters of the detected peaks, $\mathbf{f}[m]$ and $\mathbf{A}[m]$, the candidate peaks are selected such that $|\hat{f}_i[m] - f_p[m]| \le \Delta f$, where $i$ is the track index and $p$ is the peak index. The distance

$$\frac{|\hat{f}_i[m] - f_p[m]|}{\hat{f}_i[m]} + \kappa\,\frac{|\hat{A}_i[m] - A_p[m]|}{\hat{A}_i[m]} \qquad (9)$$

is calculated for all peak candidates and the one nearest to its predicted counterpart is appended to the track trajectory. The treatment of emerging and vanishing tracks follows the guidelines already described in Section 2.1.
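A minimal sketch of this decision rule is given below. It assumes that the distance of Eq. (9) normalizes each deviation by the corresponding predicted value and that amplitudes are expressed on a linear, strictly positive scale; the function name `select_continuation` and the numerical values are hypothetical.

```python
def select_continuation(f_pred, a_pred, peak_freqs, peak_amps, delta_f, kappa=1.0):
    """Return (index, distance) of the candidate peak closest to the prediction.

    Peaks farther than delta_f from the predicted frequency are not candidates.
    Returns (None, inf) when no candidate exists.
    """
    best, best_dist = None, float("inf")
    for p, (fp, ap) in enumerate(zip(peak_freqs, peak_amps)):
        if abs(f_pred - fp) > delta_f:
            continue
        dist = abs(f_pred - fp) / f_pred + kappa * abs(a_pred - ap) / a_pred
        if dist < best_dist:
            best, best_dist = p, dist
    return best, best_dist

# Hypothetical frame: the 438 Hz peak is close in frequency but far in amplitude.
best, dist = select_continuation(
    f_pred=442.0, a_pred=0.8,
    peak_freqs=[438.0, 445.0, 470.0], peak_amps=[0.2, 0.75, 0.8],
    delta_f=0.03 * 442.0, kappa=1.0)
```

With κ = 1 the 445 Hz peak is selected: the amplitude term penalizes the 438 Hz candidate, and the 470 Hz peak is excluded by the ∆f gate.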

The aforementioned scheme uses both amplitude and frequency information in a joint prediction. However, depending on the type of sound source, or even on the level of noise contamination, the track amplitude may behave more unpredictably than the corresponding frequency, thus impairing the estimation of the latter. For these cases, an alternative uncoupled structure can be straightforwardly obtained with two separate predictors, one for the frequency and another for the amplitude of the tracks.

Next, an RLS lattice ﬁlter is presented as a solu-

tion to the prediction problem with an accompanying

decision heuristic.

3 LATTICE FILTER SOLUTION

The adaptive ﬁlter described in the previous Section

can be considered as a ﬁxed order algorithm in the

sense that only time updates are performed in the ﬁl-

ter. Thus, only quantities related to the solution of

a ﬁxed order prediction are propagated. The lattice

solution, on the other hand, uses both time and or-

der recursions (up to a predeﬁned order) to obtain the

predicted values by sequentially solving linear predic-

tion problems of increasing order. The solution thus

obtained exhibits several advantages over the previ-

ous algorithm, including improved numerical behav-

ior, stability, and reduced computational complex-

ity (Sayed, 2003).

The notation used so far has to be adapted due to the order-recursive nature of the lattice filter. An additional sub-index $j$, denoting the $j$th-order solution for a given quantity, will be used throughout this section.

In this work, an a priori lattice filter is employed to predict the frequency and amplitude of the tracks. This is equivalent to the uncoupled version of the prediction scheme described in Section 2.2. Thus, given a choice of $\alpha > 0$ and of a forgetting factor $0 \ll \lambda \le 1$, the lattice filter obtains the weight vector $\mathbf{w}_{i,J}[m]$ that minimizes the following $J$th-order least-squares cost function (Sayed, 2003):

$$\lambda^{m+1}\,\mathbf{w}^{T}_{i,J}[m]\,\mathbf{P}^{-1}\,\mathbf{w}_{i,J}[m] + \sum_{l=0}^{m} \lambda^{m-l-1}\,\left|d_i[l] - \mathbf{x}_i[l]\,\mathbf{w}_{i,J}[m]\right|^{2}, \qquad (10)$$

where, in this case, $\mathbf{x}_i[m]$ can be either $\mathbf{A}_i[m-1]$, for the amplitude predictor, or $\mathbf{f}_i[m-1]$, for the frequency predictor. $d_i[m]$ is the desired signal at time $m$, which can be either $A_i[m]$ or $f_i[m]$, accordingly, whereas $\mathbf{P}^{-1} = \alpha^{-1}\,\mathrm{diag}\{\lambda^{-2}, \lambda^{-3}, \cdots, \lambda^{-(J+1)}\}$ is a regularization matrix. The solution of order $j$ for the $i$th track at frame $m$ can be obtained through the following equations:

$$
\begin{aligned}
\zeta^{f}_{i,j}[m] &= \lambda\,\zeta^{f}_{i,j}[m-1] + \alpha^{2}_{i,j}[m]\,\gamma_{i,j}[m-1] \\
\zeta^{b}_{i,j}[m] &= \lambda\,\zeta^{b}_{i,j}[m-1] + \beta^{2}_{i,j}[m]\,\gamma_{i,j}[m] \\
\delta_{i,j}[m] &= \lambda\,\delta_{i,j}[m-1] + \alpha_{i,j}[m]\,\beta_{i,j}[m-1]\,\gamma_{i,j}[m-1] \\
\rho_{i,j}[m] &= \lambda\,\rho_{i,j}[m-1] + e_{i,j}[m]\,\gamma_{i,j}[m-1] \\
\beta_{i,j+1}[m] &= \beta_{i,j}[m-1] - \kappa^{b}_{i,j}[m-1]\,\alpha_{i,j}[m] \\
\alpha_{i,j+1}[m] &= \alpha_{i,j}[m] - \kappa^{f}_{i,j}[m-1]\,\beta_{i,j}[m-1] \\
e_{i,j+1}[m] &= e_{i,j}[m] - \kappa_{i,j}[m-1]\,\beta_{i,j}[m] \\
\gamma_{i,j+1}[m] &= \gamma_{i,j}[m] - (\gamma_{i,j}[m]\,\beta_{i,j}[m])^{2} / \zeta^{b}_{i,j}[m] \\
\kappa^{b}_{i,j}[m] &= \delta_{i,j}[m] / \zeta^{f}_{i,j}[m] \\
\kappa^{f}_{i,j}[m] &= \delta_{i,j}[m] / \zeta^{b}_{i,j}[m-1] \\
\kappa_{i,j}[m] &= \rho_{i,j}[m] / \zeta^{b}_{i,j}[m]
\end{aligned}
$$

with $\gamma_{i,0}[m] = 1$, $\beta_{i,0}[m] = \alpha_{i,0}[m] = x_i[m]$ and $e_{i,0}[m] = d_i[m]$. Hence the solution at frame $m$ can be calculated by iterating the equations above for $j$ varying from 0 up to the desired predictor order $J$. The time initialization of these quantities is described later in this section.

As can be seen, the lattice ﬁlter does not explicitly

ﬁnd the optimum weight vector. Both weight vector

and predicted values could be computed through ad-

ditional recursions along the adapting procedure. A

simple solution is adopted here to compute only the

predicted value after a given time-update of the ﬁlter.

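For this, the key quantity is the a priori error of the filter, defined as

$$e_{i,J}[m] = d_i[m] - \mathbf{x}_i[m]\,\mathbf{w}_{i,J}[m-1]. \qquad (11)$$

The predicted value for frame $m$ can be written as

$$y_i[m] = \mathbf{x}_i[m]\,\mathbf{w}_{i,J}[m-1], \qquad (12)$$

where $y_i[m]$ can be either $\hat{A}_i[m]$ or $\hat{f}_i[m]$, leading to

$$y_i[m] = -\,e_{i,J}[m]\,\big|_{d_i[m]=0}. \qquad (13)$$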

Hence, in order to obtain the predicted value from the filter at frame $m$, a new quantity, numerically equal to the a priori error with a null desired signal, is used. This way, the predicted value can be calculated through the following order-update recursions:

$$
\begin{aligned}
\alpha_{i,j+1}[m] &= \alpha_{i,j}[m] - \kappa^{f}_{i,j}[m-1]\,\beta_{i,j}[m-1] \\
\beta_{i,j+1}[m] &= \beta_{i,j}[m-1] - \kappa^{b}_{i,j}[m-1]\,\alpha_{i,j}[m] \\
y_{i,j+1}[m] &= y_{i,j}[m] + \kappa_{i,j}[m-1]\,\beta_{i,j}[m],
\end{aligned}
$$

with $y_{i,0}[m+1] = 0$ and $\beta_{i,0}[m] = \alpha_{i,0}[m]$ initialized either as $A_i[m-1]$ or $f_i[m-1]$. Notice that all quantities involved in this calculation are available at time $m-1$, after the corresponding time-update of the lattice filter.

Another issue with the prediction-based partial

tracker is the ﬁlter convergence. Partial tracking per-

formance may be hindered if prediction comes from a

ﬁlter whose convergence has not yet been achieved.

In order to overcome this limitation the following

heuristic is proposed: if the distance from the pre-

dicted value to the last element in the data vector is

too large, the predicted value is ignored and the last

element is used as the predicted value. This criterion

guarantees that the predicted values of frequency and

amplitude being used for comparison are always close

to those of the track parameters, thus avoiding discon-

tinuities in the obtained tracks.
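The heuristic amounts to a simple guard on the predictor output, sketched below. The text does not specify how "too large" is measured, so the relative threshold `max_rel_dev` and its default value are assumptions of this example, as is the function name.

```python
def validate_prediction(predicted, last_value, max_rel_dev=0.03):
    """Fall back to the last track value when the prediction strays too far from it.

    max_rel_dev is a hypothetical relative threshold; the criterion only requires
    that predictions too distant from the last element of the data vector be ignored.
    """
    if abs(predicted - last_value) > max_rel_dev * abs(last_value):
        return last_value
    return predicted

# An unconverged filter may output a wild value; a converged one stays near the track.
wild = validate_prediction(900.0, 440.0)      # rejected, falls back to 440.0
plausible = validate_prediction(443.0, 440.0)  # accepted as is
```

When the prediction is rejected, the tracker behaves for that frame like the MQ algorithm of Section 2.1, which always uses the last measured value.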

The decision algorithm follows that of Section 2.2 with some modifications. The function used to decide over a valid track is defined by Eq. (9). The main change is in the handling of missing peaks. If a track $i$ does not find a suitable peak in frame $m$, it takes the predicted amplitude and frequency. However, in the next frame, a modified two-step ahead predictor is used; in other words, the amplitude and frequency in frame $m+1$ are predicted using the information up to frame $m-1$. If in the next frames the track still does not find a continuation, the same modified scheme proceeds. Formally, if a track is in the vanishing state during $s$ consecutive frames and remains in it at frame $m$, the following predictor is used,

$$y_i[m] = \mathbf{x}_i[m-s]\,\tilde{\mathbf{w}}_i[m-s-1], \qquad (14)$$

where the prediction coefficients are found by minimizing the cost function in Eq. (10) with the desired signal $d_i[m]$ replaced by the corresponding track parameter at frame $m+1$. This way, the optimum predicted value is always used to search for a track continuation, given the available track information.

When a peak is not selected to continue any track, a new track is created to accommodate the peak. The predictor for this new track has to be initialized with the following values: $\gamma_{i,j}[-1] = 1$, $\delta_{i,j}[-1] = \rho_{i,j}[-1] = \beta_{i,j}[-1] = \alpha_{i,j}[-1] = 0$, $\kappa^{b}_{i,j}[-1] = \kappa^{f}_{i,j}[-1] = \kappa_{i,j}[-1] = 0$, $\zeta^{f}_{i,j}[-1] = \alpha^{-1}\lambda^{-2}$, and $\zeta^{b}_{i,j}[-1] = \alpha^{-1}\lambda^{-j-2}$, for $j$ from 0 up to $J-1$, where $\alpha$ without sub-indices denotes the regularization factor. This is the necessary time initialization mentioned earlier. As can be noted, the lattice filter does not need any matrix structure, which reduces the memory use of the algorithm in relation to the RLS solution.

The parameters of the lattice ﬁlter are the predic-

tor order J, the forgetting factor λ, and the regulariza-

tion factor α. The value of J is usually larger than 2

and not bigger than 10 (Lagrange et al., 2007). Al-

though a larger J favors lowering the prediction error,

it may imply an undesirably longer training period for

the ﬁlter. Setting 2 ≤ J ≤ 6 for both the amplitude and

the frequency predictors has been shown to be a good

compromise between the two conﬂicting goals above.

The forgetting factor controls how much the past samples influence the prediction. Adopting λ close to 0.98 has been found to be adequate for the prediction of tracks. The α parameter controls how much the regularization affects the prediction. A high value of α (around 2000) allows the regularization factor to be quickly forgotten. On the other hand, if the prediction coefficients are known to vary little from the initial estimate (as is the case in this paper), a small α helps speed up filter convergence.

It should be noted that the frequency values of a given track usually drift around a fixed center frequency. Including this center frequency in the signal to be predicted can slow down filter convergence, impairing tracking performance. To avoid that, in the proposed system the track frequency prediction is always carried out relative to the frequency first assigned to a given track.

4 COMPUTER SIMULATIONS

This section investigates the performance of the proposed adaptive lattice predictor in comparison with a few other predictors found in the literature. It also illustrates how the partial trackers behave when analyzing natural audio signals.

4.1 Example 1: Test Setup

The first experiment is meant to assess the performance of three predictors: the predictor based on Burg's method presented in (Lagrange et al., 2007), the RLS predictor described in Section 2.2, and the lattice predictor detailed in Section 3. For both the RLS and lattice predictors, λ = 0.98 and α = 10 have been adopted. For the Burg predictor, the length of the observation window was chosen to be equivalent to the duration (in samples) over which the exponential window of the previous methods conveys 90% of its energy (Laakso and Välimäki, 1998). The prediction order of all methods was chosen as 4.
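To give a feeling for the resulting window length, the sketch below computes the number of samples over which an exponential window accumulates a given fraction of its total energy. It assumes that the squared errors are weighted by $\lambda^{k}$, $k = 0, 1, \ldots$, so that the first N weights carry a fraction $1 - \lambda^{N}$ of the total; this reading of the energy criterion is an assumption of the example and may differ in detail from the procedure of (Laakso and Välimäki, 1998).

```python
import math

def effective_length(lam, energy_fraction=0.9):
    """Smallest N such that 1 - lam**N >= energy_fraction, for 0 < lam < 1.

    Assumes exponential weights lam**k, k = 0, 1, ..., whose partial sums
    reach the fraction 1 - lam**N of the total after N terms.
    """
    return math.ceil(math.log(1.0 - energy_fraction) / math.log(lam))

burg_window = effective_length(0.98)
```

Under this assumption, λ = 0.98 corresponds to a window of 114 samples of the track trajectory.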

The artificial frequency track depicted in Figure 3, which simulates the behavior of a partial from a tone played with vibrato, has been used as a test signal. It consists of a frequency variation of sinusoidal nature, centered around 440 Hz, with rate equal to 0.25 rad/s, and amplitude depth of ±7 Hz multiplied by a trapezoidal envelope. White Gaussian noise was added to this signal so as to yield an SNR of 40 dB.
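A test track of this kind can be generated as sketched below. Several details are not given in the text and are assumed here: the track length of 400 samples (read from the axis of Figure 3), the ramp length of the trapezoidal envelope, the interpretation of the vibrato rate as radians per track sample, and the SNR being measured against the power of the frequency deviation around 440 Hz. The function names are hypothetical.

```python
import math
import random

def trapezoid(n, length, ramp):
    """Trapezoidal envelope: linear rise and fall over `ramp` samples, 1 in between."""
    return min(1.0, n / ramp, (length - 1 - n) / ramp)

def vibrato_track(length=400, center=440.0, depth=7.0, rate=0.25,
                  ramp=100, snr_db=40.0, seed=0):
    """Noisy frequency trajectory (Hz) of a partial played with vibrato."""
    rng = random.Random(seed)
    deviation = [depth * trapezoid(n, length, ramp) * math.sin(rate * n)
                 for n in range(length)]
    # Noise power set relative to the power of the deviation (assumed SNR reference).
    power = sum(v * v for v in deviation) / length
    noise_std = math.sqrt(power / 10.0 ** (snr_db / 10.0))
    return [center + v + rng.gauss(0.0, noise_std) for v in deviation]

track = vibrato_track()
```

Feeding `track` sample by sample to a predictor and averaging the squared a priori errors over several noise realizations (different `seed` values) yields error curves of the kind discussed next.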

4.2 Example 1: Results

The mean squared prediction error (MSE) of each

method is displayed in Figure 4. As can be seen,

the performance of the RLS and lattice methods was

equivalent, except for the initial parts of the MSE

curves, which differ due to the use of different reg-

ularization matrices in each case. The Burg predictor

yielded larger MSE and variance, being poorer in per-

formance.

According to this first experiment, both the RLS and lattice prediction filters have similar performance, mainly due to the minimization of the same cost function. The difference between them lies in their computational complexity. The RLS solution has an asymptotic computational complexity of $O(J^2)$, as opposed to $O(J)$ for the proposed lattice solution, where J stands for the prediction order. Since the number of active filters is proportional to the number of active tracks in a given frame, the aforementioned reduction in computational cost can have a great impact on the overall processing load of a sinusoidal analysis system. Moreover, the quantities that need to be saved for each filter between adjacent frames are reduced in the lattice filter, leading to less memory use.

[Figure 3 plot: frequency (Hz, 430 to 450) versus time (samples, 0 to 400).]

Figure 3: Test signal used in the performance evaluation of the predictors.

[Figure 4 plots: prediction error versus time (samples, 0 to 400), in three panels (a) to (c).]

Figure 4: Mean squared prediction error curves (log-scale) for: (a) RLS, (b) lattice, and (c) Burg predictors.

4.3 Example 2: Test Setup

To illustrate the performance of the proposed partial tracker as a whole, a long-duration violin tone played with vibrato has been extracted from a CD recording (sampled at 44.1 kHz). This signal was segmented into frames through an overlap-and-add scheme that employed windows with a duration of 20 ms, without any sidelobes (Depalle and Hélie, 1997), and frame hops of 5 ms. The sinusoids were detected using the method described in (Nunes et al., 2007a). The lattice parameters were the same as those used in the previous example. The decider parameters were arbitrarily selected as follows: κ = 1, ∆f = 3%, E = 60, and S = 8 frames.

[Figure 5 plot: frequency (kHz) versus time (s, 0.2 to 1.2).]

Figure 5: Frequency tracks obtained using the proposed partial tracking algorithm. The signal under analysis is a violin F8 tone played with vibrato.

[Figure 6 plot: frequency (kHz) versus time (s, 0.2 to 1.2), with a colorbar from −160 to −40.]

Figure 6: Spectrogram of the violin tone used in Example 2. The colorbar values are in dB.

4.4 Example 2: Results

The obtained partial tracks of the violin tone can be seen in Figure 5. For comparison purposes, the spectrogram of the same tone is shown in Figure 6. By comparing the spectrogram with the obtained tracks, one can see that the frequency variations that are characteristic of tone partials in vibrato playing were well captured. Moreover, the tracks are smooth and exhibit few discontinuities, which are compatible with partial continuity failures also visible in the spectrogram.

The amplitude variation of the 6th partial track (centered around 8.3 kHz) can be viewed in Figure 7. Although less well-behaved than the frequency variation, the tracked amplitude variation also exhibits coherent behavior. In order to confirm that, the 6th partial was isolated via adequate band-pass filtering of the tone. Next, the per-frame energy of the selected partial was computed, as seen in Figure 8. It can be observed that the evolution of the track amplitude over time closely matches that of the selected partial energy, their cross-correlation coefficient being equal to 0.95.
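A comparison of this kind can be sketched as follows, assuming that the quoted figure is the zero-lag normalized (Pearson) correlation between the two per-frame curves. The function name and the short sequences below are made-up illustrative values, not data from the experiment.

```python
import math

def correlation_coefficient(a, b):
    """Zero-lag normalized correlation (Pearson coefficient) of two equal-length curves."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

# Hypothetical per-frame values in dB: tracked amplitude versus band-pass energy.
track_amp_db = [-60.0, -58.0, -55.0, -57.0, -61.0, -63.0]
partial_energy_db = [-59.0, -58.5, -54.0, -56.0, -62.0, -62.5]
r = correlation_coefficient(track_amp_db, partial_energy_db)
```

A value of `r` close to 1 indicates that the two curves rise and fall together, regardless of any constant offset or scale factor between them.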

[Figure 7 plot: amplitude (dB, −80 to −40) versus time (s, 0.2 to 1.2).]

Figure 7: Amplitude track obtained using the proposed partial tracking algorithm. The plot shows in detail the sixth partial of the violin tone.

[Figure 8 plot: amplitude (dB, −80 to −40) versus time (s, 0.2 to 1.2).]

Figure 8: Overall energy variation of the 6th partial of the violin tone.

5 CONCLUSIONS

This paper presented an adaptive lattice predictor for the partial tracking problem in sinusoidal modeling analysis of audio signals. The proposed method incorporated a novel predictor that significantly reduced both the computational complexity and the memory use in relation to previous methods. A new heuristic to validate the predicted track parameters was also described.

Simulations have shown that, under equivalent test conditions, the lattice predictor performs as effectively as other methods previously reported in the literature, despite the reduced computational cost. In order to confirm that, a real-world violin tone played with vibrato has been subjected to analysis through a sinusoidal modeling system that utilized the lattice predictor within the partial tracking stage. The attained results indicate that the adopted heuristics led to a satisfactory tracking of the tone partials.

The proposed method may be further improved if extended to perform partial tracking in a joint frequency-amplitude prediction scheme. The decision algorithm could also be improved by considering more than one frame, as proposed in (Lagrange et al., 2007).

ACKNOWLEDGEMENTS

The authors wish to thank CNPq, CAPES, and

FAPERJ for supporting this work.

REFERENCES

Depalle, P., Garcia, G., and Rodet, X. (1993). Tracking of partials for additive sound synthesis using hidden Markov models. In Proceedings of the 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 1, pages 225–228, Minneapolis, USA.

Depalle, P. and Hélie, T. (1997). Extraction of spectral peak parameters using a short-time Fourier transform modeling and no sidelobe windows. In 1997 IEEE Workshop Applications of Signal Processing to Audio and Acoustics, New Paltz, USA.

George, E. B. and Smith, M. J. T. (1992). Analysis-by-synthesis/overlap-add sinusoidal modeling applied to the analysis and synthesis of musical tones. Journal of the Audio Engineering Society, 40(6):497–516.

Laakso, T. and Välimäki, V. (1998). Energy-based effective length of the impulse response of a recursive filter. In Proceedings of the 1998 IEEE Conference on Acoustics, Speech, and Signal Processing, volume 3, pages 1253–1256, Seattle, USA.

Lagrange, M. and Marchand, S. (2007). Estimating the instantaneous frequency of sinusoidal components using phase-based methods. Journal of the Audio Engineering Society, 55(5):385–399.

Lagrange, M., Marchand, S., Raspaud, M., and Rault, J.-B. (2003). Enhanced partial tracking using linear prediction. In Proc. of the 6th International Conference on Digital Audio Effects (DAFx'03), London, UK.

Lagrange, M., Marchand, S., and Rault, J.-B. (2007). Enhancing the tracking of partials for the sinusoidal modeling of polyphonic sounds. IEEE Transactions on Audio, Speech, and Language Processing, 15(5):1625–1634.

McAulay, R. J. and Quatieri, T. F. (1986). Speech analysis/synthesis based on a sinusoidal representation. IEEE Transactions on Acoustics, Speech, and Signal Processing, 34(4):744–754.

Nunes, L., Esquef, P., and Biscainho, L. (2007a). Evaluation of threshold-based algorithms for detection of spectral peaks in audio. In Proceedings of the 5th AES-Brazil Conference, pages 66–73, São Paulo, Brazil.

Nunes, L., Merched, R., and Biscainho, L. (2007b). Recursive least-squares estimation of the evolution of partials in sinusoidal analysis. In Proceedings of the 2007 IEEE Conference on Acoustics, Speech, and Signal Processing, volume I, pages 253–256, Honolulu, USA. IEEE.

Sayed, A. (2003). Fundamentals of Adaptive Filtering. Wiley-IEEE.

Serra, X. (1997). Musical sound modeling with sinusoids plus noise. In Poli, G. D., Picialli, A., Pope, S. T., and Roads, C., editors, Musical Signal Processing. Swets & Zeitlinger Publishers.

Serra, X. and Smith III, J. O. (1990). Spectral modeling synthesis: A sound analysis/synthesis system based on deterministic plus stochastic decomposition. Computer Music Journal, 14(4):12–24.

Smith III, J. O. and Serra, X. (1987). PARSHL: An analysis/synthesis program for non-harmonic sounds based on a sinusoidal representation. In Proceedings of the International Computer Music Conference, volume 76 (6), pages 1738–1742, Champaign-Urbana, USA.

Sterian, A. and Wakefield, G. H. (1998). A model-based approach to partial tracking for musical transcription. In Proceedings of the 1998 SPIE Annual Meeting, San Diego, USA.
