dio. For instance, a high resolution graphics model
may have many vertices, and one way to simplify such
models is to remove vertices but still preserve some
of the geometry. In the case of audio, we can think
of the cycle as playing the role of a vertex. An inter-
esting audio signal, such as a single note played on a
piano or guitar, has many cycles evolving over time
at some approximate fundamental frequency $f_0$. It is
possible with spline modeling and cycle interpolation
to remove many cycles but replace them with interpo-
lated versions when rendering, thus arriving at lower
resolution models.
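To make the idea of replacing removed cycles concrete, here is a minimal sketch under a loud assumption: each retained ("key") cycle is stored as a fixed-length list of numbers (for instance, the spline coefficients introduced in Section 2), and the removed cycles between two key cycles are regenerated by linearly blending those lists. The function name `interpolate_cycles` and the choice of linear blending are ours for illustration only; this is one simple possibility, not necessarily the interpolation scheme used in this paper.

```python
def interpolate_cycles(key_a, key_b, num_between):
    """Regenerate `num_between` cycles between two key cycles.

    key_a, key_b: equal-length lists of numbers describing two key cycles
    (hypothetical representation, e.g. spline coefficients).
    Returns a list of blended coefficient lists, ordered from key_a to key_b.
    """
    if len(key_a) != len(key_b):
        raise ValueError("key cycles must have the same number of coefficients")
    cycles = []
    for j in range(1, num_between + 1):
        w = j / (num_between + 1)  # blend weight, strictly between 0 and 1
        cycles.append([(1 - w) * a + w * b for a, b in zip(key_a, key_b)])
    return cycles


# Two key cycles with three coefficients each; regenerate three cycles between them.
between = interpolate_cycles([0.0, 1.0, 0.0], [0.0, 3.0, 0.0], 3)
```

In such a scheme only the key cycles need to be stored, and the number of key cycles kept is what sets the level of detail.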
What problem does LOD for audio solve? In mod-
ern realtime multimedia applications, such as video
games and virtual reality, the demands on computa-
tion and memory continue to rise rapidly. This can
also be compounded by the number of simultaneous
sound sources being used. A common solution is to
use submixes, so that the number of voices (or audio
streams) in play at any time can be limited by grouping
together voices which share common characteristics.
Higher-priority voices, such as those that are in the
foreground and critical to a given scene,
are given more attention. For instance, it may be
that a single key character emits sounds of different
types, say vocalizations, weapon sounds, footsteps,
each with its own audio stream. Each of these sounds
may also have effects ranging from simple filters to
HRTF for spatial localization. The cost of such high
priority detail must be offset by the savings in treating
low priority sounds more economically. If many low
priority sounds are dynamically being grouped into a
particular submix, say a distant group of characters
or objects, it may be worthwhile to work with lower
resolution sounds which can also be mixed prior to
rendering.
2 SPLINE MODELS
The basic spline model is discussed in (Klassen,
2022), which we summarize here. The main enhance-
ment to this model, which we introduce in this pa-
per, is that we allow cycles to be defined without
the requirement that they begin and end with zero-
crossings. We refer to this modified model as the delta
model, since it amounts to introducing a small vertical
shift to the ends of each cycle. This shift is inserted in
the form of a cubic polynomial which can be thought
of as replacing the time axis over the interval of one
cycle. This delta model arose initially to solve the
“shape discontinuity” problem described in (Klassen,
2022).
As a basic class of sounds or signals, we use in-
strument samples with known fundamental frequency.
This allows us to work with the notion of a cycle, or
period, as a basic signal block. Although this is not
well-defined, given that the length and shape of a cycle
can both vary with time, it is often how sound presents
itself, and is quite compelling. (We also note that
these methods can be adapted to work with arbitrary
fixed size audio blocks, but that the methods of cy-
cle interpolation are more clearly demonstrated when
there is an approximate notion of cycle available.) For
example, if we consider a simple recorded sound such
as a single note played on a piano or a guitar, the sig-
nal can be partitioned approximately into cycles based
on zero-crossings. This approach is central to the ba-
sic model in (Klassen, 2022). In the basic model, once
the cycles are determined by the sequence of zeros
$z_i$, $i = 0, \ldots, p$, we do a cubic B-spline fit to the
audio sample data. We assume the audio sample data is
given as a piecewise linear function of time t, mean-
ing that we can choose to interpolate between samples
linearly for the purpose of defining zero-crossings. To
specify the spline function, say $f(t)$, we work with
a uniform sequence of $k$ subintervals on the interval
$[0,1]$, and the vector space of $C^2$ cubic polynomial
splines with dimension n = k + 3. To specify a basis,
we use the knot sequence:
$$t = \{t_0, \ldots, t_N\} = \left\{0, 0, 0, 0, \tfrac{1}{k}, \tfrac{2}{k}, \ldots, \tfrac{k-1}{k}, 1, 1, 1, 1\right\}.$$
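Writing the B-spline basis functions associated to $t$ as
$$B_0(t), B_1(t), \ldots, B_{n-1}(t),$$
we note that $B_0(0) = 1$ and $B_{n-1}(1) = 1$, and all the
other basis splines vanish at both 0 and 1. Writing the
spline as $f(t) = \sum_{i=0}^{n-1} c_i B_i(t)$, we have
$f(0) = c_0$ and $f(1) = c_{n-1}$. Since each cycle begins
and ends at a zero-crossing, we
set $c_0 = c_{n-1} = 0$ and solve for the other $n - 2$
coefficients for each cycle. In order to approximate the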
audio data in one cycle, we find an interpolating cu-
bic spline which matches the (piecewise linear) audio
data function x(t) at n − 2 = k + 1 specified points. A
simple choice of such points is to use the k − 1 interior subinterval
endpoints and then add two more points at the middle
of the first and last subintervals.
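The construction above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the function names (`knot_sequence`, `basis`, `fit_cycle`, `evaluate`) are ours, the basis functions are evaluated with the standard Cox-de Boor recursion, the cycle is assumed to be rescaled to the interval [0,1], and a synthetic sine cycle stands in for the audio data function x(t).

```python
import math

def knot_sequence(k):
    """Knots 0,0,0,0, 1/k, ..., (k-1)/k, 1,1,1,1 on [0, 1]."""
    return [0.0] * 4 + [j / k for j in range(1, k)] + [1.0] * 4

def basis(i, d, knots, t):
    """Cox-de Boor recursion: value of the i-th degree-d B-spline at t."""
    if d == 0:
        if knots[i] <= t < knots[i + 1]:
            return 1.0
        # Close the last nonempty interval on the right so that t = 1 is covered.
        if t == knots[-1] and knots[i] < knots[i + 1] == knots[-1]:
            return 1.0
        return 0.0
    value = 0.0
    if knots[i + d] > knots[i]:
        value += (t - knots[i]) / (knots[i + d] - knots[i]) * basis(i, d - 1, knots, t)
    if knots[i + d + 1] > knots[i + 1]:
        value += ((knots[i + d + 1] - t) / (knots[i + d + 1] - knots[i + 1])
                  * basis(i + 1, d - 1, knots, t))
    return value

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (aug[r][m] - s) / aug[r][r]
    return x

def fit_cycle(x, k):
    """Interpolate x on [0, 1]; return (coefficients c_0..c_{n-1}, knots, points)."""
    if k < 2:
        raise ValueError("this sketch assumes k >= 2 so the k + 1 points are distinct")
    n = k + 3
    knots = knot_sequence(k)
    # k - 1 interior subinterval endpoints plus the midpoints of the first and last subintervals.
    points = sorted([1 / (2 * k), 1 - 1 / (2 * k)] + [j / k for j in range(1, k)])
    # Only B_1 .. B_{n-2} appear as unknowns, because c_0 = c_{n-1} = 0.
    matrix = [[basis(j, 3, knots, s) for j in range(1, n - 1)] for s in points]
    inner = solve(matrix, [x(s) for s in points])
    return [0.0] + inner + [0.0], knots, points

def evaluate(coeffs, knots, t):
    """Evaluate f(t) = sum of c_j B_j(t)."""
    return sum(c * basis(j, 3, knots, t) for j, c in enumerate(coeffs))

# Stand-in for one audio cycle: a sine that starts and ends at zero.
cycle = lambda t: math.sin(2 * math.pi * t)
coeffs, knots, points = fit_cycle(cycle, 15)
```

With k = 15 this produces n = 18 coefficients, of which 16 are free. In practice a numerical library would normally replace the hand-written recursion and solver; they are written out here only to keep the sketch self-contained.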
Figure 1 shows a portion of the graph of an audio
sample, cycles 10 through 15 of a guitar pluck
at approximately 450 Hz, with the spline model in blue.
Here we use the default degree d = 3, an interpolating
cubic spline on each cycle, with k = 15 subintervals
and hence dimension n = 18. This signal is recorded
at 44100 samples per second, so there are around
98 = 44100/450 samples per cycle. The actual num-
ber of samples per cycle is listed at the bottom of each
shaded cycle. Since we are using only 18 data points
to define the interpolating spline, the match to the au-
dio graph is clearly not exact. For this audio sample,
if we use k = 30 and n = 33, the spline graph is diffi-
cult to distinguish from the original.
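The samples-per-cycle figure quoted above can be checked with a small sketch that partitions a signal into cycles at zero-crossings of the piecewise linear sample function. Several things here are assumptions made for illustration: the signal is a synthetic 450 Hz sine rather than the guitar recording, only upward zero-crossings are used as cycle boundaries, and the function name `zero_crossings` is ours.

```python
import math

def zero_crossings(samples):
    """Fractional sample positions where the piecewise linear signal
    crosses zero going upward (from non-positive to positive)."""
    zeros = []
    for i in range(len(samples) - 1):
        a, b = samples[i], samples[i + 1]
        if a <= 0.0 < b:
            # Linear interpolation between sample i and sample i + 1.
            zeros.append(i + a / (a - b))
    return zeros

SAMPLE_RATE = 44100   # samples per second, as in the recording described above
F0 = 450.0            # approximate fundamental frequency in Hz

# One tenth of a second of a synthetic tone with an arbitrary phase offset.
samples = [math.sin(2 * math.pi * F0 * n / SAMPLE_RATE + 0.3) for n in range(4410)]

zeros = zero_crossings(samples)
# Cycle lengths in (fractional) samples, and whole samples falling in each cycle.
lengths = [z1 - z0 for z0, z1 in zip(zeros, zeros[1:])]
counts = [math.ceil(z1) - math.ceil(z0) for z0, z1 in zip(zeros, zeros[1:])]
```

For this idealized tone every cycle spans 44100/450 = 98 samples, so a model with n = 18 (16 free coefficients per cycle) stores roughly one sixth as many numbers as the raw samples. In a real recording the cycle lengths drift from cycle to cycle, which is why the actual counts are listed per cycle in Figure 1.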