coefficients using Huffman, arithmetic, RAR and ZIP
entropy coders. The resulting file lengths were very
close to those obtained with MP3 coding in all cases.
AAC compression was significantly better for similar
audio quality.
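The comparison above depends on how close each entropy coder gets to the information content of the quantized coefficients. The following Python sketch illustrates this. It compares two quantities for one frame: the zeroth-order empirical entropy, which is the rate an ideal arithmetic coder approaches, and the average codeword length of a Huffman code built from the same histogram. The frame values and function names are hypothetical. They are not taken from the experiments reported here.

```python
import heapq
import math
from collections import Counter

def empirical_entropy(symbols):
    """Zeroth-order entropy in bits/symbol (the rate an ideal arithmetic coder approaches)."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def huffman_code_lengths(symbols):
    """Return {symbol: codeword length in bits} for a Huffman code built from the histogram."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {s: 1 for s in counts}  # a lone symbol still needs a 1-bit codeword
    # Heap entries: (weight, unique tie-breaker, {symbol: depth}); the tie-breaker avoids comparing dicts.
    heap = [(c, i, {s: 0}) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

def huffman_avg_length(symbols):
    """Average Huffman codeword length in bits/symbol for this data."""
    lengths = huffman_code_lengths(symbols)
    counts = Counter(symbols)
    return sum(counts[s] * lengths[s] for s in counts) / len(symbols)

# Hypothetical sparse frame of quantized coefficients (illustrative values only).
frame = [0] * 40 + [1] * 8 + [-1] * 6 + [2] * 4 + [4] * 3 + [-2] * 2 + [8]
H = empirical_entropy(frame)
L = huffman_avg_length(frame)
print(f"entropy = {H:.3f} bits/coef, Huffman = {L:.3f} bits/coef")
```

Huffman coding always satisfies H <= L < H + 1 when there are at least two symbols. On sparse, highly skewed frames, an arithmetic coder can therefore gain at most about one bit per coefficient over Huffman coding.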
We also used another method to entropy code the quantized values, for scalability purposes. The FPQ-quantized values are sparse, and we took advantage of this by compressing each bit-plane as a binary image using PNG. The resulting files were between 5% and 23% larger than those obtained with MP3. Nevertheless, files compressed in this way can generate scalable bit-streams at no additional computational cost. Fine-grain scalability can be achieved by further subdividing the bit-planes into smaller binary images. Experiments show that graceful degradation is obtained by successively removing the least significant bit-planes.
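The bit-plane scheme described above can be sketched in Python under the following assumptions:

- Coefficients are split into a sign array and magnitude bit-planes, ordered from most to least significant.
- zlib's DEFLATE, the compression method PNG uses internally, stands in for a full PNG encoder.
- The function names and the frame are hypothetical.

Slicing the list of planes emulates removing the least significant bit-planes from the stream.

```python
import zlib

def to_bit_planes(coefs, n_planes):
    """Split integer coefficients into sign bits and magnitude bit-planes (MSB first)."""
    signs = [1 if c < 0 else 0 for c in coefs]
    mags = [abs(c) for c in coefs]
    assert max(mags) < (1 << n_planes), "n_planes too small for these magnitudes"
    planes = [[(m >> b) & 1 for m in mags] for b in range(n_planes - 1, -1, -1)]
    return signs, planes

def from_bit_planes(signs, planes, n_planes):
    """Rebuild coefficients from the most significant len(planes) planes only."""
    mags = [0] * len(signs)
    for k, plane in enumerate(planes):
        weight = 1 << (n_planes - 1 - k)  # plane k carries bit (n_planes - 1 - k)
        mags = [m + bit * weight for m, bit in zip(mags, plane)]
    return [-m if s else m for s, m in zip(signs, mags)]

def compress_plane(plane):
    """Pack one binary plane 8 bits per byte (zero-padded) and DEFLATE it.
    zlib stands in for the PNG coder used in the text (assumption)."""
    padded = plane + [0] * (-len(plane) % 8)
    packed = bytearray()
    for i in range(0, len(padded), 8):
        byte = 0
        for bit in padded[i:i + 8]:
            byte = (byte << 1) | bit
        packed.append(byte)
    return zlib.compress(bytes(packed), 9)

N_PLANES = 4  # enough for magnitudes up to 15 in this toy frame
frame = [0, 0, 3, -1, 0, 8, 0, -2, 0, 0, 12, 0, 1, 0, 0, -4]  # hypothetical sparse frame
signs, planes = to_bit_planes(frame, N_PLANES)
sizes = [len(compress_plane(p)) for p in planes]
for kept in range(N_PLANES, 0, -1):
    approx = from_bit_planes(signs, planes[:kept], N_PLANES)
    max_err = max(abs(a - b) for a, b in zip(frame, approx))
    print(f"{kept} plane(s) kept: max |error| = {max_err}")
```

Dropping the d least significant planes truncates each magnitude toward zero. Every coefficient's error is then bounded by 2**d - 1, which is one way to read the graceful degradation observed in the experiments.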
4.2 Real-time Performance
When low delay is required, each frame must be coded and sent immediately. Using 512-point temporal windows for the MDCT, the minimum algorithmic delay is 11.6 ms at a sampling rate of 44100 Hz. Using a DCT with non-overlapping 256-point windows, the delay can be reduced to 5.8 ms. Since FPQ and Huffman coding are computationally very cheap operations, only the psychoacoustic model would increase these delay values significantly. However, FPQ uses the masking threshold at the maximum frequency resolution available for a given window length. Even shorter windows are therefore expected to be usable while maintaining acceptable coding gains.
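Both delay figures are consistent with a minimum algorithmic delay of one window length, where N is the window length in samples and f_s the sampling rate. This reading is an assumption; the text gives only the results.

```latex
D_{\min} = \frac{N}{f_s}, \qquad
\frac{512}{44100\,\mathrm{Hz}} \approx 11.6\ \mathrm{ms}, \qquad
\frac{256}{44100\,\mathrm{Hz}} \approx 5.8\ \mathrm{ms}
```

Under this relation, halving the window length halves the delay. This is why shorter windows are attractive, provided FPQ keeps acceptable coding gains with them.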
The price to be paid for low-delay performance is that the Huffman codewords must be updated and sent to the decoder every time new quantization values appear in the incoming frames. During an initial start-up stage, Huffman codewords must be sent for every frame. After this stage, the histogram becomes approximately stable. In this steady-state scenario, the same codewords are nearly optimal for the frames transmitted thereafter. If new values appear, a new Huffman table including them must be generated. Fortunately, this happens only occasionally once the first frames have been coded and sent. The resulting overhead is only a 10-15% increase in the amount of information that must be transmitted.
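The update policy described above can be sketched on the encoder side as follows. The sketch keeps a running histogram and rebuilds and resends the Huffman table only when a frame contains a value the current table cannot code. The class name, the toy frames and the choice to rebuild from the full running histogram are hypothetical. The cost of transmitting the table itself is not modelled, so the sketch does not reproduce the 10-15% figure.

```python
import heapq
from collections import Counter

def huffman_lengths(hist):
    """Huffman codeword lengths for a histogram {value: count}."""
    if len(hist) == 1:
        return {v: 1 for v in hist}
    heap = [(c, i, {v: 0}) for i, (v, c) in enumerate(hist.items())]
    heapq.heapify(heap)
    uid = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, uid, {v: d + 1 for v, d in {**a, **b}.items()}))
        uid += 1
    return heap[0][2]

class LowDelayTableManager:
    """Hypothetical helper: resend the table only when a frame has unseen values."""

    def __init__(self):
        self.hist = Counter()   # running histogram of every value coded so far
        self.table = {}         # current codeword lengths shared with the decoder
        self.tables_sent = 0

    def encode_frame(self, frame):
        """Return (payload bits for this frame, whether a new table had to be sent)."""
        self.hist.update(frame)
        new_values = set(frame).difference(self.table)
        if new_values:
            # Unseen quantized values: rebuild from the running histogram and transmit it.
            self.table = huffman_lengths(self.hist)
            self.tables_sent += 1
        return sum(self.table[v] for v in frame), bool(new_values)

frames = [[0, 0, 1, -1], [0, 2, 0, 1], [0, 0, 1, 2], [1, 0, -1, 0], [0, 4, 0, 0]]
mgr = LowDelayTableManager()
flags = [mgr.encode_frame(f)[1] for f in frames]
print(flags, mgr.tables_sent)
```

In this toy run, the first two frames trigger table transmissions during start-up. The next two reuse the existing table. The last one introduces the new value 4 and forces another update.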
Another possibility is to predefine a fixed “standard” Huffman table for all cases. This relies on the histograms of very different signals being actually quite similar, as shown in Figure 3: approximately the same values (powers of 2 and multiples of powers of 2) are the most probable in all cases.
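The cost of a fixed table can be estimated with a small calculation. A signal coded with a shared table pays its cross-entropy under the pooled model instead of its own entropy. The difference is the Kullback-Leibler divergence, which is zero only when the histograms match. The Python sketch below makes three assumptions:

- Ideal code lengths -log2(p) stand in for actual Huffman codewords.
- The two histograms are hypothetical.
- Values absent from the pooled table would need an escape mechanism, which is not modelled.

```python
import math
from collections import Counter

def entropy(hist):
    """Bits/value of an ideal code matched to this signal's own histogram."""
    n = sum(hist.values())
    return -sum(c / n * math.log2(c / n) for c in hist.values())

def pooled_model(hists):
    """Relative frequencies of the pooled histogram: the shared 'standard' table."""
    pooled = sum(hists, Counter())
    n = sum(pooled.values())
    return {v: c / n for v, c in pooled.items()}

def cross_entropy(hist, model):
    """Bits/value when a signal is coded with ideal lengths -log2(model[v]) from the shared table."""
    n = sum(hist.values())
    return -sum(c / n * math.log2(model[v]) for v, c in hist.items())

# Hypothetical histograms of quantized values for two different signals.
sig_a = Counter({0: 50, 1: 12, -1: 12, 2: 6, -2: 6, 4: 3, 8: 1})
sig_b = Counter({0: 60, 1: 10, -1: 9, 2: 5, -2: 4, 4: 2, 6: 1})

model = pooled_model([sig_a, sig_b])
for name, sig in (("A", sig_a), ("B", sig_b)):
    penalty = cross_entropy(sig, model) - entropy(sig)  # KL divergence, always >= 0
    print(f"signal {name}: own table {entropy(sig):.3f} bits/value, shared table +{penalty:.3f}")
```

The more similar the histograms are, the smaller the penalty. This is the property a predefined table would rely on.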
5 CONCLUSIONS
In this paper, a novel perceptual quantization method for audio coding, called FPQ, has been proposed. This method is an alternative to those used in the most important perceptual audio coders and has several advantages over them: it is much simpler and computationally inexpensive; it can produce scalable bit-streams at no additional computational cost; and it is suitable for real-time applications.
To measure the capabilities of FPQ, a simple audio compression prototype was built in MATLAB. The experimental results showed that the quantized spectral values given by FPQ are very well suited for compression with almost any entropy coding system. Compression rates comparable to those of MP3 coding at the same audio quality were achieved. Very low delay and scalability are also achievable at the expense of slightly higher bit-rates. The compression capacity of the system is expected to improve significantly with more refined psychoacoustic models and entropy coders. The system should still retain the advantages of being simple, computationally inexpensive and scalable.
SIGMAP 2007 - International Conference on Signal Processing and Multimedia Applications