2.2 Amplitude Estimation
In contrast to the frequency estimation, which is not
taken directly from the sound analysis results, we use
the SMS analysis results as the basis for estimating the
amplitude values of the harmonic partials.
However, we reduce the number of parameters to
provide a flexible synthesis model that is largely
independent of the preceding sound analysis process.
This also reduces the computational complexity of
the synthesis process. At the same time, our main concern
is to preserve the quality and naturalness of the musical
sound after synthesis, so that the result mimics
real instruments. We have therefore applied three different
methods to the analysis amplitude data: local optimization,
lowpass filter estimation and polynomial fitting.
We start by applying a standard SMS analysis
(Amatriain et al., 2002) to obtain the amplitude pa-
rameters. To increase the number of spectral samples
per Hz and improve the accuracy of the peak detection
process, we apply zero-padding in the time domain,
using a zero-padding factor of 2. The STFT
was performed with a sampling rate of 44.1 kHz and a
Blackman-Harris window with a window size of 1024
points and a hop size of 256 points. From the resulting
frequency spectrum, 100 spectral peaks were detected
and subsequently used to track the harmonic partials
of the sound. The number of partials to be tracked
was set to 20. This analysis was applied to sound
samples from the RWC database (RWC, 2001),
covering all notes in the range of a flute, a
violin and a piano. Based on the amplitude tracking
results, one representative note per instrument
was chosen to provide the basis for the amplitude
values of the RPSM. In the future, this could be
extended to more than one amplitude template, e.g.,
using different templates for the low and the high
notes within the range of an instrument.
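The analysis settings above fix several derived quantities that the rest of this section depends on, such as the spacing of the spectral samples and the number of amplitude values per second and partial. The sketch below computes them under one stated assumption: that a zero-padding factor of 2 means the FFT size is twice the window length (1024 to 2048 points). The function name `stft_timing` is hypothetical.

```python
def stft_timing(sample_rate_hz=44_100, window_size=1024, hop_size=256, zero_pad_factor=2):
    """Derived STFT quantities for the analysis settings given in the text.

    Assumption: the zero-padding factor multiplies the window length to give
    the FFT size (1024 -> 2048 points).
    """
    fft_size = window_size * zero_pad_factor
    return {
        "fft_size": fft_size,
        # spacing between adjacent spectral samples (bins)
        "bin_spacing_hz": sample_rate_hz / fft_size,
        # duration covered by one analysis window
        "window_ms": 1000.0 * window_size / sample_rate_hz,
        # time between consecutive frames
        "hop_ms": 1000.0 * hop_size / sample_rate_hz,
        # amplitude values per second for each tracked partial
        "frame_rate_hz": sample_rate_hz / hop_size,
    }


if __name__ == "__main__":
    for name, value in stft_timing().items():
        print(f"{name}: {value:.3f}")
```

With the default arguments this gives a 2048-point FFT, a bin spacing of about 21.5 Hz (about 43.1 Hz without padding), a 23.2 ms window, a 5.8 ms hop and about 172.3 frames per second. Zero-padding only adds interpolated spectral samples; the width of the window's main lobe is still set by the 1024-point window. Since the SMS analysis yields one amplitude value per partial and frame, the 12680 values reported for the flute note in Section 2.2.1 correspond to 634 frames per partial, or about 3.7 s of sound.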
2.2.1 Local Optimization
The SMS analysis provides one amplitude value for
each harmonic partial and each frame of a given
sound signal. We reduce this number of parameters by
determining the local maxima of each amplitude track.
This reduces the number of parameters to about a
third of the SMS analysis result. For example, for the
flute note (A4, played forte, non-vibrato), the SMS
analysis yields 12680 amplitude values. These are
reduced to 3015 values, which represent all the local
maxima of the 20 harmonic partials.
We determine the local maxima of each amplitude
envelope using the first derivative of the amplitude
envelope function $f_e$. Suppose we want to determine
whether $f_e$ has a maximum at point $x$. If $x$ is a maximum
of $f_e$, then $f_e$ is increasing to the left of $x$ and decreasing
to the right of $x$. The same principle applies to local
minima of $f_e$: if $x$ is a minimum of $f_e$, then $f_e$ is
decreasing to the left of $x$ and increasing to the right of
$x$. In contrast, if $f_e$ is increasing or decreasing on both
sides of $x$, then $x$ is neither a maximum nor a minimum. In
terms of the first derivative, $f_e$ is increasing where
$f_e'$ is positive and decreasing where $f_e'$ is negative,
so a local maximum lies where $f_e'$ changes sign from
positive to negative.
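Because an amplitude track is sampled once per frame, the derivative $f_e'$ can be approximated by the first difference between consecutive frames, and a local maximum is then a frame where that difference changes sign from positive to negative. The sketch below illustrates this criterion; it is not the authors' implementation. The function name `local_maxima` is hypothetical, and two details the text does not specify are assumptions: a flat top (zero differences) is reported once, at its first frame, and the first and last frames are never reported as maxima.

```python
from typing import List, Sequence


def local_maxima(envelope: Sequence[float]) -> List[int]:
    """Return the frame indices of the local maxima of one amplitude track.

    The first difference d[k] = f[k+1] - f[k] stands in for the derivative f_e'.
    A maximum is reported where d changes sign from positive to negative.
    Assumptions (not stated in the paper): zero differences (plateaus) are
    skipped, so a flat top is reported at its first frame, and the track end
    points are not treated as maxima.
    """
    maxima: List[int] = []
    last_sign = 0    # sign of the most recent non-zero difference
    candidate = -1   # frame reached by the most recent rise
    for k in range(len(envelope) - 1):
        d = envelope[k + 1] - envelope[k]
        if d > 0:
            last_sign = 1
            candidate = k + 1
        elif d < 0:
            if last_sign == 1:
                # rising before, falling now: the last risen-to frame is a maximum
                maxima.append(candidate)
            last_sign = -1
    return maxima
```

For example, `local_maxima([0.0, 0.5, 1.0, 0.7, 0.9, 0.2])` returns `[2, 4]`, while a monotonically decreasing track such as `[3, 2, 1]` has no local maxima.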
To compute the shape of each amplitude track,
as required for the synthesis process, we then perform a
one-dimensional linear interpolation between the
local maxima of the track. Figure 2 (top right) shows
an example of amplitude tracks estimated with this
approach, alongside the SMS analysis results (top
left) for a violin note. As can be seen, the shapes of
the estimated tracks are close to the SMS analysis
result, except in the attack and the release parts
of the sound.
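Rebuilding a full-length track from its local maxima amounts to piecewise-linear interpolation over the frame index. The following sketch assumes the maxima are given as strictly increasing frame indices with their amplitudes; the function name `interpolate_envelope` is hypothetical. How frames before the first or after the last maximum are filled is not stated in the text; here they are held at the nearest maximum's value (the same convention as `numpy.interp`), which is an assumption. Holding a constant value there is one reason why a maxima-only representation cannot reproduce a rising attack or a decaying release.

```python
from bisect import bisect_right
from typing import List, Sequence


def interpolate_envelope(n_frames: int,
                         peak_idx: Sequence[int],
                         peak_val: Sequence[float]) -> List[float]:
    """Piecewise-linear amplitude track through the given local maxima.

    peak_idx must be strictly increasing frame indices. Frames outside
    [peak_idx[0], peak_idx[-1]] are held at the nearest maximum's value
    (assumption; the paper does not specify the end-point handling).
    """
    if not peak_idx or len(peak_idx) != len(peak_val):
        raise ValueError("need matching, non-empty index and value lists")
    track: List[float] = []
    for t in range(n_frames):
        if t <= peak_idx[0]:
            track.append(peak_val[0])
        elif t >= peak_idx[-1]:
            track.append(peak_val[-1])
        else:
            # j satisfies peak_idx[j - 1] <= t < peak_idx[j]
            j = bisect_right(peak_idx, t)
            x0, x1 = peak_idx[j - 1], peak_idx[j]
            y0, y1 = peak_val[j - 1], peak_val[j]
            track.append(y0 + (y1 - y0) * (t - x0) / (x1 - x0))
    return track
```

For instance, `interpolate_envelope(7, [1, 5], [0.2, 1.0])` yields approximately `[0.2, 0.2, 0.4, 0.6, 0.8, 1.0, 1.0]`.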
2.2.2 Lowpass Filter Estimation
The second curve-fitting method uses a lowpass
filter to estimate the overall amplitude envelope
of each partial. We apply a third-order Butterworth
lowpass filter to the analysis data and perform zero-phase
digital filtering by processing the input data in both
the forward and reverse directions: after filtering in
the forward direction, the filtered sequence is reversed,
run back through the filter and reversed again. The resulting
sequence has zero phase distortion, and the effective filter
order is doubled. As shown in Figure 2 (bottom left),
the envelope shapes of the estimated amplitude tracks
are similar to those of the local optimization estimation.
However, this estimation takes significantly longer to
compute. As with the local optimization method,
it does not yield an adequate estimate for synthesizing
the attack and the release of the sound signal.
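Forward-backward filtering of this kind is available in SciPy as `scipy.signal.filtfilt`, and a third-order Butterworth design as `scipy.signal.butter`. The sketch below applies them to one amplitude track. The cutoff frequency is not given in the text, so the 5 Hz value is a purely hypothetical placeholder, expressed relative to the frame rate of 44100/256 frames per second implied by the analysis settings; the function name `smooth_track` is also hypothetical. Because the signal passes through the filter twice, the overall magnitude response is the square of the single-pass response, which corresponds to the doubled filter order mentioned above.

```python
import numpy as np
from scipy.signal import butter, filtfilt

# One amplitude value per hop of 256 samples at 44.1 kHz.
FRAME_RATE_HZ = 44100 / 256


def smooth_track(track, cutoff_hz=5.0, order=3):
    """Zero-phase Butterworth lowpass smoothing of one partial's amplitude track.

    cutoff_hz is a hypothetical value; the paper does not state the cutoff.
    """
    # Third-order lowpass designed directly in terms of the frame rate.
    b, a = butter(order, cutoff_hz, btype="low", fs=FRAME_RATE_HZ)
    # filtfilt filters forward, then filters the reversed result and reverses
    # it back, giving zero phase and a squared magnitude response.
    return filtfilt(b, a, np.asarray(track, dtype=float))
```

Unlike a single forward pass, which delays features of the envelope, the forward-backward pass leaves the position of an amplitude peak unchanged, so the smoothed track stays time-aligned with the SMS analysis data.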
2.2.3 Polynomial Fitting
Finally, we performed polynomial fitting to obtain
an estimate of each amplitude track. For each
amplitude envelope, we compute the coefficients of a
polynomial of degree 6 that fits the data (in our case,
the analysis result) in a least-squares sense.
This computation is performed using a Vandermonde
matrix (Meyer, 2000)
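$$
V = \begin{pmatrix}
1 & \alpha_1 & \alpha_1^2 & \cdots & \alpha_1^{n-1} \\
1 & \alpha_2 & \alpha_2^2 & \cdots & \alpha_2^{n-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & \alpha_m & \alpha_m^2 & \cdots & \alpha_m^{n-1}
\end{pmatrix} \tag{1}
$$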
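With $m$ frames and a polynomial of degree 6 ($n = 7$ columns), the least-squares coefficients $c$ minimize $\lVert V c - y \rVert_2$ for the amplitude track $y$. The sketch below builds $V$ with `numpy.vander` (with `increasing=True`, so the columns run $1, \alpha_i, \ldots, \alpha_i^{n-1}$ as in the matrix above) and solves with `numpy.linalg.lstsq`. The function name `fit_track` is hypothetical, and so is the choice of the points $\alpha_i$: here the frame positions are mapped to $[0, 1]$, an assumption that keeps the sixth powers well conditioned.

```python
import numpy as np


def fit_track(track, degree=6):
    """Least-squares polynomial fit of one amplitude track via a Vandermonde matrix.

    Returns the coefficients (constant term first) and the fitted track.
    Assumption: frame positions are mapped linearly to alpha_i in [0, 1].
    """
    y = np.asarray(track, dtype=float)
    alpha = np.linspace(0.0, 1.0, len(y))
    # m x n matrix with rows 1, alpha_i, alpha_i**2, ..., alpha_i**degree
    V = np.vander(alpha, N=degree + 1, increasing=True)
    coeffs, *_ = np.linalg.lstsq(V, y, rcond=None)
    return coeffs, V @ coeffs
```

`numpy.polyfit(alpha, y, 6)` solves the same least-squares problem but returns the coefficients highest degree first; building $V$ explicitly keeps the correspondence with Equation (1) visible.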
TIME DOMAIN ATTACK AND RELEASE MODELING - Applied to Spectral Domain Sound Synthesis
121