A BIOLOGICALLY INSPIRED HARDWARE MODULE FOR
EMBEDDED AUDITORY SIGNAL PROCESSING APPLICATIONS
Xin Yang, Mokhtar Nibouche, Tony Pipe and Chris Melhuish
Bristol Robotics Laboratory, University of the West of England, Coldharbour Lane, Bristol, U.K.
Keywords:
Bio-inspired system, Auditory subsystems, Acoustic signal processing, Digital filter, FPGA, System on chip.
Abstract:
This paper presents a fully parameterised and highly scalable design prototype of an FPGA (field programmable gate array) implementation of a biologically inspired auditory signal processing system. The system has been captured and simulated using system-level integrated design tools, namely, System Generator™ and AccelDSP™, both from Xilinx™. The implemented hardware auditory periphery model consists of two sub-models: the Patterson's Gammatone filter bank and the Meddis' inner hair cell. The prototype has been successfully ported onto a Virtex™-II Pro FPGA. Ultimately, it can be used as a front-end apparatus in a variety of embedded auditory signal processing applications.
1 INTRODUCTION
The human peripheral auditory system, as illustrated
in Figure 1, consists of the outer, middle and inner ear.
The inner ear, or the cochlea, is a coiled tube filled
with fluid. The sound vibration is transmitted to the
fluid, and then to the basilar membrane of the cochlea.
The stiffness of the basilar membrane decreases expo-
nentially along the length of the cochlea. This makes
the basilar membrane act like a frequency analyser
with the base part (near the oval window) responding
to high frequencies, and the apical part (the far end)
responding to lower frequencies. The sensory cells to
detect the frequencies are the hair cells attached to the
basilar membrane. There are three rows of outer hair
cells (OHC) and one row of inner hair cells (IHC).
For humans, there are approximately 12,000 OHCs
and 3,500 IHCs (Truax, 1999). The movement of the
OHCs and the basilar membrane is conveyed to the
IHCs and causes a depolarisation, which in turn re-
sults in a receptor potential. The IHCs then release neurotransmitters, whose concentration changes give rise to nerve spikes in the auditory nerve.
The human auditory system deals with a wide range of everyday real-life tasks such as pitch detection, sound localisation and speech recognition, to name just a few. It does the job extremely well, and far better than current systems based on acoustic sensor technology. The auditory system consists of the auditory periphery, acting as the front-end sensing apparatus, and of several other regions of the brain up to the auditory cortex.
Figure 1: The human ear (Truax, 1999).
The auditory periphery transduces acoustical data into trains of spikes for higher-level neuronal processing. Based on this, many researchers and engineers believe that modelling and implementing artificial auditory subsystems, especially the cochlea, will yield better performance
in acoustic signal processing. The reported electronic
cochlea implementations fall into two categories.
One is the task-oriented engineering approach,
which treats the whole cochlea as filters in order to
obtain the time/frequency information that can be
used for different kinds of post-processing. These constitute the majority of the reported implementations, including the first electronic cochlea proposed by Lyon and Mead (Lyon and Mead, 1988), some recent
FPGA implementations (Mishra and Hubbard, 2002),
(Leong et al., 2003), and (Wong and Leong, 2006).
The other category is the research-oriented signal
processing approach, which analyses each stage of the
biological acoustic signal processing in the auditory
neurological system. There is less work in this cat-
egory, and most of them focus on the hair cells and
auditory nerve. Lim reported a pitch detection sys-
tem (Lim et al., 1997) based on the Meddis’ inner
hair cell model (Meddis, 1986), then Jones improved
it to a four-stage pitch extraction system (Jones et al.,
2000). A spike-based sound localisation system was
implemented by Ponca (Ponca and Schauer, 2001).
There was an analogue hair cell model implemented
and then improved by van Schaik (van Schaik and
Meddis, 1999), (van Schaik, 2003).
The prototype implementation of the auditory sub-
system in this paper belongs to the second cate-
gory. It models a part of the signal processing
of the human cochlea, by connecting two widely
accepted models, the Patterson’s Gammatone filter
bank (GFB) (Patterson et al., 1992) and the Meddis’
inner hair cell (IHC) (Meddis, 1986). Compared to
other existing work, the proposed hardware imple-
mentation is fully parameterised and highly scalable.
It provides a good platform for further research, and can be deployed as the front end of an embedded auditory signal processing system.
2 PATTERSON’S GAMMATONE
FILTER BANK
The GFB proposed by Patterson (Patterson et al.,
1992) is a set of parallel Gammatone filters, each of
which responds to a specific frequency range. The
Gammatone filter is a bandpass filter whose impulse response envelope follows the gamma distribution well known in statistics. It describes the impulse response measured in the cat's cochlea, which is very similar to the human cochlea. The GFB provides a reasonable trade-off between accuracy in simulating the basilar membrane motion and computational load. Some improved models have been developed based on the original work; however, due to the increased hardware computation burden, the original model is adopted here. The impulse response of a Gammatone
filter is:
h(t) = A\, t^{N-1} e^{-2\pi b t} \cos(2\pi f_c t + \varphi), \qquad t \geq 0,\ N \geq 1 \qquad (1)
where A is an arbitrary factor that is typically used to normalise the peak magnitude to unity; N is the filter order; b is a parameter that determines the duration of the impulse response and thus the filter's bandwidth; f_c is the centre frequency; and ϕ is the phase of the tone.
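For illustration, the sampled impulse response of equation 1 can be computed directly. The following is a minimal sketch, assuming Python with numpy; the bandwidth value b in the example call is only an illustrative placeholder (in practice it is derived from the channel's equivalent rectangular bandwidth).

```python
import numpy as np

def gammatone_ir(fc, b, fs, n_samples, order=4, phase=0.0):
    """Sampled impulse response of equation (1); here A is chosen so that
    the peak magnitude is normalised to unity."""
    t = np.arange(n_samples) / fs
    h = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t + phase)
    return h / np.max(np.abs(h))

# Illustrative call: a 4th-order filter centred at 1 kHz, sampled at 44 kHz.
ir = gammatone_ir(fc=1000.0, b=125.0, fs=44000.0, n_samples=2048)
```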
Slaney developed a digital version of the GFB (Slaney, 1993), and then implemented it in his well-known Matlab™ “Auditory Toolbox” (Slaney, 1998). Each digital Gammatone filter consists of four second-order sections (SOSs), or Infinite Impulse Response (IIR) filters, as illustrated in Figure 2. The general z-domain transfer function of each IIR filter is:
H(z) = \frac{A_0 + A_1 z^{-1} + A_2 z^{-2}}{1 + B_1 z^{-1} + B_2 z^{-2}} \qquad (2)
Figure 2: Slaney's digital Gammatone filter bank.
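As a behavioural, floating-point reference of this structure (not the fixed-point FPGA datapath described later), each SOS of equation 2 can be realised by its difference equation and four sections cascaded to form one channel. A minimal sketch, assuming Python with numpy:

```python
import numpy as np

def sos_filter(x, a0, a1, a2, b1, b2):
    """One second-order section of equation (2), direct form:
    y[n] = a0*x[n] + a1*x[n-1] + a2*x[n-2] - b1*y[n-1] - b2*y[n-2]."""
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = a0 * xn + a1 * x1 + a2 * x2 - b1 * y1 - b2 * y2
        x2, x1 = x1, xn          # shift the input delay line
        y2, y1 = y1, yn          # shift the output delay line
        y[n] = yn
    return y

def gammatone_channel(x, sos_coeffs):
    """One GFB channel: four SOSs in cascade, as in Figure 2.
    sos_coeffs is a list of four (a0, a1, a2, b1, b2) tuples."""
    for coeffs in sos_coeffs:
        x = sos_filter(x, *coeffs)
    return x
```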
3 MEDDIS’ INNER HAIR CELL
MODEL
Meddis introduced the first hair cell model (Meddis, 1986), which describes the transduction between the IHCs and the auditory nerve in a manner quite close to physiology by modelling both the short-term and long-term adaptation characteristics of the IHCs. As with the GFB model, the Meddis' IHC model has also been improved to accommodate new findings in biology; however, for simplicity, the original model is chosen for this prototype implementation. The Meddis' IHC model can be described by a set of four non-linear equations (Meddis, 1986):
k(t) = \begin{cases} \dfrac{g\,(s(t)+A)}{s(t)+A+B} & \text{for } s(t)+A > 0 \\ 0 & \text{for } s(t)+A \leq 0 \end{cases} \qquad (3)

\frac{dq}{dt} = y\,(1 - q(t)) + r\,c(t) - k(t)\,q(t) \qquad (4)

\frac{dc}{dt} = k(t)\,q(t) - l\,c(t) - r\,c(t) \qquad (5)

P(e) = h\,c(t)\,dt \qquad (6)
where k(t) is the permeability; s(t) is the instantaneous amplitude of the input; q(t) is the cell transmitter level; c(t) is the amount of transmitter in the synaptic cleft; P(e) is the spike probability; A, B, g, y, l, x, r and m are constants based on statistics; and h is a proportionality factor, which can be set to different values.
The underlying structure of the model is illustrated in Figure 3 (Meddis, 1986). It is worth noting that there is also a Matlab™ implementation of the Meddis' IHC model in Slaney's “Auditory Toolbox” (Slaney, 1998), which provides a reference for this design prototype.
Figure 3: The Meddis’ inner hair cell model (Meddis,
1986).
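A floating-point reference of equations 3 to 6 can be obtained with a simple per-sample forward-Euler update. The sketch below assumes Python with numpy; the constants are illustrative placeholders only, the actual values being those of (Meddis, 1986) as used in Slaney's toolbox, and the discretisation may differ from the one used in the hardware module.

```python
import numpy as np

# Illustrative constants only; the prototype uses the values from (Meddis, 1986).
A, B, g = 5.0, 300.0, 2000.0
y_repl, l_loss, r_ret, h_scale = 5.05, 2500.0, 6580.0, 50000.0

def meddis_ihc(s, fs):
    """Forward-Euler integration of equations (3)-(5); returns h*c(t), the
    spike probability rate (spikes/s), for an input s[n] sampled at fs."""
    dt = 1.0 / fs
    q, c = 1.0, 0.0                       # transmitter level and cleft contents
    out = np.zeros(len(s))
    for n, s_n in enumerate(s):
        k = g * (s_n + A) / (s_n + A + B) if s_n + A > 0 else 0.0   # eq. (3)
        dq = y_repl * (1.0 - q) + r_ret * c - k * q                 # eq. (4)
        dc = k * q - l_loss * c - r_ret * c                         # eq. (5)
        q += dq * dt
        c += dc * dt
        out[n] = h_scale * c                                        # eq. (6)
    return out

# Illustrative call: 5 ms of a 1 kHz tone at the 44 kHz system rate.
fs = 44000.0
t = np.arange(int(0.005 * fs)) / fs
rate = meddis_ihc(np.sin(2 * np.pi * 1000.0 * t), fs)
```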
4 SYSTEM IMPLEMENTATION
4.1 System Architecture
The proposed system architecture, as illustrated in
Figure 4, consists of a GFB (a set of Gammatone fil-
ters) that mimics the behaviour of the basilar mem-
brane, interfaced in a parallel fashion to the Med-
dis’ IHC module through a bank of buffers. Each
Gammatone filter is combined with a single Meddis' IHC to process a particular range of frequencies (double-lined circle in Figure 4). The GFB
(basilar membrane module) processes the incoming
signal using parallel channels for different frequency
ranges, and generates outputs that represent the vi-
bration displacements along different parts of the bi-
ological basilar membrane. The Meddis’ IHC mod-
ule calculates the probability rate of neural spikes
(spikes/second) corresponding to each output gener-
ated by the GFB module. A channel structure con-
sisting of a Gammatone filter, a buffer and an IHC
(double-lined circle in Figure 4) could be reconfig-
ured to implement any of the system channels (1, 2,
3, . . . , n, . . . , N) through parameterisation. The sys-
tem is also scalable, which means that an arbitrary
number of channels can be generated, again through
parameterisation. Simulation can be carried out either in software or in hardware; however, at this prototyping stage, software simulation is preferred.
The specifications of the complete model are as
follows. First, the “audible” frequency range of the
basilar membrane module has to cover the human au-
ditory range—from 200 Hz to 20 kHz (Truax, 1999).
A direct consequence is that the sampling frequency of the system is set to 44 kHz to allow a reasonable discrete representation. Second, perhaps a little arbitrarily, the number of channels is set to 20. Although the number of GFB channels should ideally match the number of IHCs in the cochlea (3,500), a compromise has to be made because of the constraints on
hardware. Finally, the system must operate in real-
time.
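The paper does not state how the 20 centre frequencies are distributed across the 200 Hz to 20 kHz range; since the filter coefficients are produced by Slaney's toolbox, spacing on the ERB-rate scale is a likely choice. The sketch below, assuming Python with numpy and the Glasberg and Moore ERB-rate formula, is therefore only an assumption about that spacing.

```python
import numpy as np

def erb_number(f):
    """Glasberg & Moore ERB-rate (Cam) value of a frequency in Hz."""
    return 21.4 * np.log10(1.0 + 0.00437 * f)

def erb_to_hz(e):
    """Inverse of erb_number."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def centre_frequencies(f_low=200.0, f_high=20000.0, n_channels=20):
    """Channel centre frequencies equally spaced on the ERB-rate scale."""
    return erb_to_hz(np.linspace(erb_number(f_low), erb_number(f_high), n_channels))

print(centre_frequencies())   # 20 values from 200 Hz up to 20 kHz
```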
The general design methodology relies predomi-
nantly on IP-based blocks to model DSP primitives
such as adders and multipliers using signed fixed-
point bit-serial arithmetic. Appropriate number representation, quantisation and overflow handling of the signals in the system are key to a successful implementation. As such, the system input was coded as 14-bit signed fixed-point numbers with 12 fractional bits. The output of the basilar membrane module was reduced to 10 bits with 8 fractional bits to relieve the computational load of the Meddis inner hair cell module. The output of the system was set to 14 bits with 12 fractional bits, in a similar fashion to the input.
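The effect of these word-length choices can be mimicked in software by quantising each signal to the stated format. A minimal sketch, assuming Python with numpy and round-to-nearest with saturation (which may differ from the exact System Generator™ settings):

```python
import numpy as np

def quantise(x, total_bits, frac_bits):
    """Signed fixed-point quantisation: round to the nearest multiple of
    2**-frac_bits and saturate to the representable range."""
    scale = 2.0 ** frac_bits
    lo = -(2 ** (total_bits - 1))         # most negative integer code
    hi = 2 ** (total_bits - 1) - 1        # most positive integer code
    return np.clip(np.round(np.asarray(x) * scale), lo, hi) / scale

x_in  = quantise(0.1234, 14, 12)   # system input: 14 bits, 12 fractional
bm    = quantise(0.0567, 10, 8)    # basilar membrane output: 10 bits, 8 fractional
x_out = quantise(0.4321, 14, 12)   # system output: 14 bits, 12 fractional
```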
Figure 4: Structure for the whole model and the implemented channel (N: total number of channels; n: channel number).
4.2 Basilar Membrane Module
For the proposed design prototype, only one channel of the GFB was implemented using System Generator™ (Xilinx, 2008b). A channel, or a Gammatone filter, consists of four second-order IIR filters (SOSs), each of which is implemented in the direct-form structure illustrated in Figure 5. The SOS coefficients are calculated using Slaney's “Auditory Toolbox”. There is no
A_2 forward path, although one appears in the transfer function of equation 2, simply because the coefficient A_2 is always zero for all the SOSs of any channel. The numerator coefficients (A_0 and A_1) of each SOS are scaled by the fourth root of the total gain, so that the Gammatone filter as a whole is scaled by the total gain calculated by the toolbox. This scaling narrows the dynamic range of the intermediate signals, and results in a reduced word length for the adders and multipliers. Table 1 presents an example of the calculated coefficients for the first channel of the GFB, where the total gain for this channel is 2.90294 × 10^16. Consequently, all the coefficients A_0 and A_1 in the table are scaled by 1.30529 × 10^4, the fourth root of that gain. It is worth noting that only the coefficient A_1 differs from one SOS to another within the same channel, which makes a hardware optimisation based on a single reusable SOS possible.
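The scaling described above can be expressed compactly: dividing each SOS numerator by the fourth root of the channel's total gain divides the four-SOS cascade by the total gain itself. The helper below is a hypothetical sketch (plain Python) that simply mirrors this arithmetic.

```python
def scale_numerators(sos_coeffs, total_gain):
    """Divide A0 and A1 (A2 is always zero) of each SOS by total_gain**0.25,
    so the cascade of four SOSs is divided by total_gain overall."""
    s = total_gain ** 0.25
    return [(a0 / s, a1 / s, a2 / s, b1, b2) for (a0, a1, a2, b1, b2) in sos_coeffs]

# Channel 1 (Table 1): total gain ~ 2.90294e16, so each numerator is divided
# by its fourth root, ~ 1.30529e4.
```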
Figure 5: A second-order section of the Gammatone filter.
Table 1: Coefficients of the IIR filters in channel 1.

IIR      A_0        A_1        B_1        B_2
SOS 1    0.29665   -0.10187    1.26532    0.56372
SOS 2    0.29665    0.47724    1.26532    0.56372
SOS 3    0.29665    0.13800    1.26532    0.56372
SOS 4    0.29665    0.23736    1.26532    0.56372
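As a quick sanity check on the scaled channel-1 coefficients, the magnitude response of the four-SOS cascade can be evaluated along the unit circle and its peak located. The sketch below assumes Python with numpy, uses the 44 kHz system sampling rate from Section 4.1, and reads the A_1 of SOS 1 as negative (the sign is not clearly legible in the source table).

```python
import numpy as np

# Table 1, channel 1: (A0, A1, B1, B2) per SOS; A2 = 0 for every SOS.
sos = [
    (0.29665, -0.10187, 1.26532, 0.56372),
    (0.29665,  0.47724, 1.26532, 0.56372),
    (0.29665,  0.13800, 1.26532, 0.56372),
    (0.29665,  0.23736, 1.26532, 0.56372),
]

fs = 44000.0
f = np.linspace(0.0, fs / 2.0, 4000)          # evaluation grid up to Nyquist
z = np.exp(1j * 2.0 * np.pi * f / fs)

# H(z) of the cascade: product of (A0 + A1*z^-1) / (1 + B1*z^-1 + B2*z^-2).
h = np.ones_like(z)
for a0, a1, b1, b2 in sos:
    h *= (a0 + a1 / z) / (1.0 + b1 / z + b2 / z ** 2)

peak = int(np.argmax(np.abs(h)))
print(f"peak gain {abs(h[peak]):.3f} at {f[peak]:.0f} Hz")
```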
4.3 Meddis Inner Hair Cell Module
Slaney's Matlab™ implementation (Slaney, 1998) of the Meddis' IHC model must be revised prior to being synthesised by AccelDSP™ (Xilinx, 2008a). This is not only because of the constraints imposed by the AccelDSP™ software, but also because of the demand for a real-time DSP system. The revised implementation processes the input signal in a fixed-point, bit-serial fashion, which leads to an unavoidable error compared to the original floating-point model, as illustrated in Figure 6. The investigation into the reported hardware implementations ((Lim et al., 1997), (Jones et al., 2000)) highlights this error as well. In reality, this is not a critical issue, since the exact numerical values of the spike probability rate output are not essential for spike generation (they can be scaled up using the h coefficient in equation 6); however, the half-wave rectification, the saturation and the adaptation (both long term and short term) characteristics of the output are, in return, quite important (Jones et al., 2000). The Meddis IHC module was generated from this revised model using AccelDSP™. Figure 6 gives a comparison between the generated fixed-point model and the original floating-point model using a 1 kHz sine wave input.
Figure 6: From top to bottom: outputs of the fixed-point Meddis' IHC model, the floating-point Meddis' IHC model and the error, with 1 kHz sine wave input.
4.4 Simulation and Synthesis Results
The test bench illustrated in Figure 7 was built to compare the simulation results of the original Matlab™ floating-point model, the System Generator™ fixed-point model and the FPGA hardware implementation model; however, the simulations at this prototyping stage are carried out only in software. The input signal is generated by Simulink™ blocks and can also be imported from files or even from real-time external events. Through an initialisation script, the total number of channels of the auditory subsystem can be set to an arbitrary number, in this case 20, and the fixed-point model can be configured as any one of the channels. The simulation results, shown in Figure 8, depict the outputs of the 5th channel under consideration for a step input and then a sine wave input, respectively. In the simulation, the hardware model generates outputs that closely match those of the software implementation. The synthesis report in Table 2 indicates that the hardware utilisation is quite low (7% of the slices), except for the multipliers (32%), and that the design can run in real-time. It also implies that only 3 channels can be implemented in parallel on this FPGA chip.
Table 2: Synthesis report of the first channel of the auditory
subsystem.
Target Device XC2VP30
Synthesis Tool XST v10.1.01
Used Slices 993 7%
Used Slice Flip Flops 827 3%
Used 4 input LUTs 2,574 9%
Used RAMB16s 2 1%
Used MULT18X18s 44 32%
Max Frequency 17.505 MHz
5 CONCLUSIONS AND FUTURE
WORK
The paper presents the design and FPGA implemen-
tation of a bio-inspired hardware module that can be
used as a front end apparatus in a variety of embedded
auditory signal processing applications. The imple-
mentation consists of two sub-modules, Patterson’s
GFB and Meddis’ IHC, linked together to mimic the
behaviour of a single frequency channel of the audi-
tory periphery. The proposed design is fully param-
eterised and highly scalable. The design prototype
has been captured and then simulated using two integrated tools, System Generator™ and AccelDSP™, both from Xilinx™. The prototype works as expected, and the design process is much faster than the traditional hardware description language (HDL) design flow. The resulting hardware structure is too large for the target FPGA to accommodate a 20-channel parallel auditory subsystem; therefore, a time-shared multiplexing scheme is envisaged for future implementations.
More optimisation can be achieved for the filter mod-
ules to improve the system performance and reduce
the number of multipliers. The ultimate goal is to
build a complete bio-inspired system that models the
signal processing of the whole human auditory sys-
tem.
REFERENCES
Jones, S., Meddis, R., Lim, S., and Temple, A. (2000). To-
ward a digital neuromorphic pitch extraction system.
Neural Networks, IEEE Transactions on, 11(4):978–
987.
Leong, M. P., Jin, C. T., and Leong, P. H. W. (2003). An
fpga-based electronic cochlea. EURASIP Journal on
Applied Signal Processing, 2003:629–638.
Lim, S., Temple, A., Jones, S., and Meddis, R. (1997).
Vhdl-based design of biologically inspired pitch de-
tection system. In Neural Networks, 1997., International Conference on, volume 2, pages 922–927.
Lyon, R. and Mead, C. (1988). An analog electronic
cochlea. Acoustics, Speech, and Signal Processing
[see also IEEE Transactions on Signal Processing],
IEEE Transactions on, 36(7):1119–1134.
Meddis, R. (1986). Simulation of mechanical to neural
transduction in the auditory receptor. Journal of the
Acoustical Society of America, 79(3):702–711.
Mishra, A. and Hubbard, A. (2002). A cochlear filter imple-
mented with a field-programmable gate array. Circuits
and Systems II: Analog and Digital Signal Processing,
IEEE Transactions on [see also Circuits and Systems
II: Express Briefs, IEEE Transactions on], 49(1):54–
60.
Patterson, R. D., Robinson, K., Holdsworth, J., McKeown,
D., Zhang, C., and Allerhand, M. (1992). Complex
sounds and auditory images. In Cazals, Y., Demany,
L., and Horner, K., editors, Auditory Physiology and
Perception, pages 429–446. Pergamon, Oxford.
Ponca, M. and Schauer, C. (2001). Fpga implementation
of a spike-based sound localization system. In 5th In-
ternational Conference on Artificial Neural Networks
and Genetic Algorithms - ICANNGA2001.
Slaney, M. (1993). An efficient implementation of the
patterson-holdsworth auditory filter bank. Technical
Report 35, Perception Group, Advanced Technology
Group, Apple Computer.
Slaney, M. (1998). Auditory Toolbox.
Truax, B., editor (1999). Handbook for Acoustic Ecology. Cambridge Street Publishing, 2nd edition.
van Schaik, A. (2003). A small analog vlsi inner hair cell
model. In Circuits and Systems, 2003. ISCAS ’03.
Proceedings of the 2003 International Symposium on,
volume 1, pages 17–20.
van Schaik, A. and Meddis, R. (1999). Analog very large-
scale integrated (vlsi) implementation of a model
of amplitude-modulation sensitivity in the auditory
brainstem. Journal of the Acoustical Society of Amer-
ica, 105:811–821.
Wong, C. K. and Leong, P. H. W. (2006). An fpga-based
electronic cochlea with dual fixed-point arithmetic. In
Field Programmable Logic and Applications, 2006.
FPL ’06. International Conference on, pages 1–6.
Xilinx (2008a). AccelDSP User Guide. Xilinx, 10.1.1 edi-
tion.
Xilinx (2008b). System Generator for DSP User Guide.
Xilinx, 10.1.1 edition.
Figure 7: Test bench for the simulation of the prototype implementation of the auditory subsystem.
Figure 8: Simulation results for the 5th channel. (a) the output of the basilar membrane module with step input; (b) the output
of the basilar membrane module with 1 kHz sine input; (c) the output of the Meddis IHC module with step input; (d) the
output of the Meddis IHC module with 1 kHz sine input. For each sub-figure, the output of the fixed-point hardware model
is at the top, the output of the floating-point software module is in the middle, and the calculated error is at the bottom. The
X-axis represents the simulation time, and the Y-axis represents the amplitude of the output.