# AN ASYNCHRONOUS PROGRAMMABLE PARALLEL 2-D IMAGE FILTER CMOS IC BASED ON THE GILBERT VECTOR MULTIPLIER

Rafał Długosz

Swiss Federal Institute of Technology in Lausanne, Institute of Microtechnology, Rue A.-L. Breguet 2, CH-2000, Neuchâtel, Switzerland

Vincent Gaudet

University of Alberta, Department of Electrical and Computer Engineering, ECERF Building, Edmonton, Alberta, T6G 2V4, Canada

Keywords: Parallel and asynchronous 2-D analog filter, Gilbert vector multiplier, Ultra-low power dissipation.

Abstract: A novel, power-efficient, analogue 2-D programmable finite impulse response (FIR) image filter is proposed. The solution is based on the current-mode Gilbert vector multiplier operating in the weak inversion region, which allows for ultra-low power operation. Its main advantage is the asynchronous and parallel calculation of all pixel values without any clock generator. The filter can be programmed with different filter masks, both low-pass and high-pass. An experimental filter integrated circuit with a resolution of 6x1 pixels, realized in a 180 nm CMOS technology, dissipates a measured power of $30\ \mu$W at a data rate of 30 kframes/s. One of the intended applications of the proposed image filter is data compression in wireless endoscopic capsules.

### **1 INTRODUCTION**

The advent of inexpensive CMOS image sensors has recently led to the design of wireless capsules for endoscopic diagnosis (Meng et al., 2004) (Xie et al., 2006). Such capsules usually contain the CMOS image sensor, signal processing circuitry, and a communication block between the capsule and a transceiver located outside the patient's body that records the collected data. Such a system, together with light-emitting diodes and a battery, is placed in a small pill that is hermetically packaged to be robust against enzymes and acid in the human digestive tract. The capsules must last 8 to 24 hours as they travel through the digestive tract while capturing about 50,000 photos that are then used for diagnostic purposes. Thus the capsule must collect and transmit an extremely large amount of information (Miaou, 2005). This problem is usually addressed by different compression techniques, e.g. a discrete wavelet transform (DWT) (Miaou, 2005) and a recently reported near-lossless compression algorithm for images with a Bayer color filter array (CFA) (Xie et al., 2006).

In these systems, analog data is typically captured by a CMOS image sensor and then processed in a specialized digital circuit (Xie et al., 2006). In this paper we propose a programmable, analog, asynchronous, parallel 2-D finite impulse response (FIR) filter that can be used as a preprocessing filter prior to analog-to-digital conversion (ADC) as well as in the filter banks of DWT-based compression algorithms. Our proposed filter uses the current-mode Gilbert vector multiplier circuit operating in the weak inversion regime, leading to ultra-low power dissipation and energy savings (Winstead, 2004), which is one of the main criteria in endoscopic capsule applications. One of the advantages of this approach is the speedup that results from the parallel and asynchronous calculation of all pixel values without using a clock generator.

Research into analog image filters has been conducted for many years. An example image filter with an array of 16x16 pixels that dissipates 165 mW at an I/O data rate of $5 \times 10^7$ events per second has been described by Serrano-Gotarredona *et al.* (2008). This solution allows many chips to be connected into bigger systems with higher resolutions.

Długosz R. and Gaudet V. (2009). AN ASYNCHRONOUS PROGRAMMABLE PARALLEL 2-D IMAGE FILTER CMOS IC BASED ON THE GILBERT VECTOR MULTIPLIER. In Proceedings of the International Conference on Biomedical Electronics and Devices, pages 46-51 DOI: 10.5220/0001536800460051 Copyright © SciTePress

Image filters are also implemented using cellular neural networks (CNN). An example filter of this type, designed in a 0.5  $\mu$ m CMOS technology, with a resolution of 64x64 pixels at an I/O analog data rate of 1 MSps, dissipates 1.5 W (Linan et al., 1999).

The experimental prototype filter described in this paper has been designed for a resolution of 6x1 pixels, but since the circuit has a modular structure, it can be redesigned for higher resolutions. Based on the simulation and measurement results, we project a power dissipation of 22 mW for a 64x64 pixel filter operating at a throughput of 10 Mpixels/s (measurements) or even 4 Gpixels/s (simulations), which is a significant improvement over the solutions described above.

The paper is organized as follows. In the next section we present the circuits that we use for FIR filtering. An experimental filter realized in a TSMC 180 nm CMOS process is described in the subsequent section together with selected measurement results. Finally, the last section concludes the paper.

### **2 GENERAL IDEA OF THE PROPOSED IMAGE FILTER**

A basic 2-D FIR filtering scheme is shown in Fig. 1. Pixels from a 2-D input signal A are first multiplied by filter coefficients from a mask H. The products of these operations are summed, thus producing pixels for an output image B, according to the following equation (given here for a 3x3 mask H):

$$B(x, y) = \sum_{n=1}^{3} \sum_{m=1}^{3} A(x+n-2, y+m-2)h(n,m) \quad (1)$$

When image filtering is implemented in DSP systems, the mask is moved over the input image A and the output pixels are calculated sequentially, which is a time- and power-consuming process.
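The sequential evaluation of Eq. (1) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `fir2d`, the `mask[m][n]` storage order and the zero treatment of pixels outside the image are assumptions made here, not details taken from the paper.

```python
def fir2d(image, mask):
    """Evaluate Eq. (1) for every output pixel, one pixel at a time.

    image: list of rows, addressed as image[y][x].
    mask:  3x3 list, addressed as mask[m][n] (assumed storage order).
    Pixels outside the image are treated as zero (an assumption).
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for m in range(3):          # corresponds to m = 1..3 in Eq. (1)
                for n in range(3):      # corresponds to n = 1..3 in Eq. (1)
                    xs, ys = x + n - 1, y + m - 1   # A(x+n-2, y+m-2) with 1-based n, m
                    if 0 <= xs < width and 0 <= ys < height:
                        acc += image[ys][xs] * mask[m][n]
            out[y][x] = acc
    return out


image = [[1.0] * 3 for _ in range(3)]
ones_mask = [[1] * 3 for _ in range(3)]
filtered = fir2d(image, ones_mask)
```

The four nested loops make the cost grow with the number of pixels, which is the property the sequential DSP approach suffers from.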



Figure 1: Two-dimensional FIR image filtration.

On the other hand, the solution proposed in this paper allows for a parallel calculation of all pixels $B_{xy}$. As a result, the time required to calculate a single image *B* does not depend on the number of pixels in it. The proposed filter is additionally a programmable structure that can be quickly reprogrammed to perform both low-pass and high-pass filtering, which is an important feature, e.g. in realizing 2-D DWT filter banks.

The basic circuit used in the proposed filter is the current-mode Gilbert scalar-by-vector multiplier (GSVM) shown in Fig. 2 (Winstead, 2004). A vector of the output currents $P_{xy}=\{I_{p1}, I_{p2}, ..., I_{pN}\}$ is calculated according to the following formula:

$$P_{xy} = \frac{I_{xy}h^{\mathrm{T}}}{\sum_{i=1}^{N}h_{i}} = I_{xy}|h|^{\mathrm{T}} \quad (2)$$

in which  $h=\{h_1, h_2, ..., h_N\}$  is a vector of currents, which in the proposed application are proportional to the filter coefficients, while  $I_{xy}$  is the current proportional to an input pixel  $A_{xy}$ . The currents  $h_i$  are adjusted off-line and then kept constant during filtering, although they can be reprogrammed very quickly if necessary.
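As a behavioural illustration of Eq. (2), the following sketch computes the normalized products for one pixel. It models only the ideal arithmetic, not the transistor-level circuit; the function name `gsvm_products` and the example current values are assumptions chosen for illustration.

```python
def gsvm_products(i_xy, h):
    """Return the vector P_xy = I_xy * h / sum(h), following Eq. (2).

    i_xy: input pixel current; h: list of coefficient currents h_1..h_N.
    Ideal behavioural model only (no mismatch, no noise).
    """
    total = sum(h)
    if total <= 0:
        raise ValueError("coefficient currents must sum to a positive value")
    return [i_xy * h_i / total for h_i in h]


# Hypothetical values: a 3 uA pixel current and coefficient currents of 1 uA and 2 uA.
products = gsvm_products(3.0, [1.0, 2.0])
```

Because of the normalization by the sum of the $h_i$, the output currents in this ideal model always add up to the input current $I_{xy}$; only the ratios between the coefficient currents matter.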



Figure 2: Gilbert scalar-by-vector multiplier (GSVM).



Figure 3: Calculation scheme in our proposed filter: two possible views.

The calculation scheme in the proposed filter is illustrated in Fig. 3. Each column in this 3-D block contains a single GSVM circuit that provides the vector $P_{xy}$. Notice that the length of this vector, like that of the vector *h*, is equal to the number of filter coefficients that have unique values. For example, the low-pass filter mask shown in Fig. 1 contains only 2 different values, namely {1, 2}, so the length of the vector *h* is in this case equal to 2.



Figure 4: Realization of a pixel block with a programmable connection map.



Figure 5: Example connection maps for: (a) simple lowpass filter from Fig. 1, (b) simple high pass filter, (c) lowpass filter with combined coefficients. In the last case value 2 is realized as 3-1, while value 4 as 3+1.

The GSVM circuit calculates currents $I_p$ that are products of the input pixels $A_{xy}$ and particular coefficients $h_i$. As a result, the structure shown in Fig. 3 provides all necessary partial products $I_p$ (*p* for short) in parallel, and these can then be used to calculate any output pixel $B_{xy}$. To realize this, a proper connection map is required between the signals $p_{xyz}$ and the output signals $B_{xy}$. Notice that particular products $p_{xyz}$ can be used in several output samples, and therefore an equivalent number of copies of each of these signals must be available. A single pixel block, which provides all necessary copies of $p_{xyz}$, is shown in Fig. 4. Particular branches (connections) in this circuit are controlled using configuration bits $d_{n,m,z,l}$ that are programmed offline. These bits establish only the connection map; the values of the filter coefficients ultimately depend on the values of the currents $I_{\rm h}$.



Figure 6: Calculation scheme of the output signal  $B_{xy}$ .

It is convenient to collect the configuration bits *d* into a single 4-dimensional matrix *D*, in which the first two dimensions (*n*, *m*) are determined by the dimensions of the filter mask *H*. The third dimension (*z*) is determined by the length of the vector $P_{xy}$, i.e. by the number of unique filter coefficients, while the fourth dimension is always equal to 2. Theoretically, products from a given vector $P_{xy}$ can be used in the calculation of $n \cdot m$ output samples. The bits at position (*n*, *m*) in the matrix *D* determine how the products from this vector are used to calculate a given signal $B_{xy}$. The third dimension determines which product(s) from a given vector $P_{xy}$ are added to a given signal $B_{xy}$, while the last dimension determines whether the product(s) are added with a positive or a negative sign. Following this principle, an example matrix *D* for the mask shown in Fig. 1 has dimensions 3 x 3 x 2 x 2, so the total number of configuration bits *d* is in this case only 36.
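The organization of the matrix *D* can be illustrated with a short sketch. The helper names, the nested-list layout `d[n][m][z][l]` (with `l = 0` for a positive and `l = 1` for a negative connection) and the example mask are assumptions made for illustration; the mask of Fig. 1 is not reproduced in the text, so a hypothetical 3x3 mask with the two unique values {1, 2} is used.

```python
def build_connection_map(mask, h):
    """Build D as nested lists d[n][m][z][l] of 0/1 configuration bits.

    Each mask entry must be 0 or +/- one of the unique values in h.
    l = 0 marks a product added positively, l = 1 a product added negatively.
    """
    rows, cols = len(mask), len(mask[0])
    d = [[[[0, 0] for _ in h] for _ in range(cols)] for _ in range(rows)]
    for n in range(rows):
        for m in range(cols):
            coeff = mask[n][m]
            if coeff != 0:
                z = h.index(abs(coeff))              # which unique coefficient is used
                d[n][m][z][0 if coeff > 0 else 1] = 1
    return d


def count_bits(d):
    """Total number of configuration bits stored in D."""
    return sum(len(pair) for row in d for cell in row for pair in cell)


def effective_mask(d, h):
    """Recover the mask implied by the connection map and the coefficients h."""
    return [[sum((pair[0] - pair[1]) * h_z for pair, h_z in zip(cell, h))
             for cell in row] for row in d]


mask = [[1, 1, 1], [1, 2, 1], [1, 1, 1]]   # hypothetical low-pass mask
h = [1, 2]                                 # unique coefficient values
d = build_connection_map(mask, h)
total_bits = count_bits(d)                 # 3 * 3 * 2 * 2
```

In this sketch at most one bit is set per mask position; the hardware described in the text also allows several products to be connected to the same output.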

Several example connection maps for different filter masks, both high-pass and low-pass, together with the required currents representing particular filter coefficients $h_i$, are shown in Fig. 5. In the presented examples, two unique filter coefficients are assumed for simplicity.

It is worth noting that when two different filters share the same connection map and differ only in the values of selected coefficients, switching between such filters requires only adjusting the currents $I_{\rm h}$. It is also worth noting that although the number of coefficients in the vector *h* is usually low, these coefficients can be combined to provide an effectively larger number of coefficients, as in the example map shown in Fig. 5 (c).
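As a worked illustration of the combination in Fig. 5 (c), assume coefficient currents in the ratio $h = \{1, 3\}$, as the caption suggests. Applying Eq. (2) to a single pixel current $I_{xy}$ then gives two partial products, whose sum and difference supply the remaining coefficient values:

$$p_1 = \frac{1}{1+3} I_{xy} = \tfrac{1}{4} I_{xy}, \qquad p_2 = \frac{3}{1+3} I_{xy} = \tfrac{3}{4} I_{xy}$$

$$p_2 - p_1 = \tfrac{2}{4} I_{xy}, \qquad p_2 + p_1 = \tfrac{4}{4} I_{xy}$$

Under this assumption the three available weights are in the ratio 1 : 2 : 4, although only two coefficient currents are programmed.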

Products *p* that are added as negative values are summed in a separate junction, then inverted in a single NMOS-type current mirror, and finally added to the output signal, as shown in Fig. 6. A potential problem with this approach occurs when the sum of the negative values is larger than the sum of the positive values, which may turn off the output current mirror. This problem has been solved by introducing an additional constant current $I_{DC}$, which can optionally be added to the output signal in such a case.
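The role of $I_{DC}$ can be illustrated with a simple behavioural sketch of the output stage. The function name `output_current` and the modelling of a turned-off mirror as a clamp at zero are simplifying assumptions made here, not a description of the actual circuit behaviour.

```python
def output_current(products, signs, i_dc=0.0):
    """Combine partial products p into one output current B_xy.

    products: currents selected for this output pixel.
    signs:    +1 or -1 per product (the positive or negative junction).
    i_dc:     optional constant current added to the output.
    """
    positive = sum(p for p, s in zip(products, signs) if s > 0)
    negative = sum(p for p, s in zip(products, signs) if s < 0)
    total = positive + i_dc - negative
    # Assumption: the output current cannot become negative; clamping at zero
    # stands in for the output current mirror turning off.
    return max(total, 0.0)


# Hypothetical currents in uA: the negative sum (4) exceeds the positive sum (1).
without_offset = output_current([2.0, 1.0, 2.0], [-1, 1, -1])
with_offset = output_current([2.0, 1.0, 2.0], [-1, 1, -1], i_dc=4.0)
```

Without the offset the information in the difference is lost, while with a sufficiently large $I_{DC}$ it is preserved as a shifted value that can be corrected later.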

### **3 CMOS IMPLEMENTATION OF THE PARALLEL FILTER AND EXPERIMENTAL RESULTS**

An experimental image processing block has been realized by the authors in a 180 nm CMOS process. In this prototype the image resolution is equal to $6 \times 1$ pixels, while the mask has dimensions $3 \times 1$. Such parameters are sufficient to verify the concept, and since the filter has a modular structure, higher resolutions can be easily realized. For example, a filter with 8x8 pixels and 3x3 masks has been successfully verified in HSPICE simulations.



Figure 7: Experimental programmable image filter realized in three versions with different transistor dimensions.

The internal structure of the chip is shown in Fig. 7. A common problem in Gilbert multipliers whose transistors operate in weak inversion, such as those used in the proposed filter, is the influence of transistor mismatch on the filter accuracy. To assess this effect, the experimental filter has been designed in three versions that differ only in the transistor dimensions (filter 1, filter 2, filter 3). The measurement results show that mismatch has only a limited influence, and only on the DC value at particular filter outputs, which can be easily corrected on the digital side, after analog-to-digital conversion.



Figure 8: Measurements for small signals for the low-pass filter: (top) input signals A3 and A4, (bottom) the filter's outputs B1-B6. Results are for 'filter 2'.



Figure 9: Low-pass filter: (top) HSPICE post-layout simulations and (bottom) the measurement results.

Selected measurement results and post-layout simulations are presented to demonstrate the filter performance. The first and second experiments, shown in Figs. 8 and 9, are for a low-pass filter mask H = [1, 1, 1] with small and large input signals, respectively. Fig. 8 additionally shows the input waveform applied to the A3 and A4 inputs. The other inputs are constant and equal to the bottom value of this pulse signal.



Figure 10: High-pass filter: (top) HSPICE post-layout simulations and (bottom) the measurement results.

Table 1: Example results for the low-pass filter [1,1,1] (see Fig. 9).

| Input [µA] | $A_1$ | $A_2$ | $A_3$ | $A_4$ | $A_5$ | $A_6$ |
|------------|-------|-------|-------|-------|-------|-------|
|            | 2.5   | 2.5   | 5.098 | 5.098 | 2.5   | 2.5   |
| Out [µA]   | $B_1$ | $B_2$ | $B_3$ | $B_4$ | $B_5$ | $B_6$ |
| calculated | 3.880 | 4.461 | 4.757 | 4.757 | 4.176 | 3.595 |
| simulated  | 3.595 | 4.475 | 4.761 | 4.761 | 3.885 | 3.002 |
| measured   | 3.562 | 4.456 | 4.768 | 4.754 | 3.851 | 2.940 |
| er_S/C %   | 15.68 | 0.77  | 0.22  | 0.22  | 16.03 | 32.60 |
| er_M/C %   | 17.50 | 0.27  | -0.60 | 0.20  | 17.88 | 36.03 |
| er_M/S %   | -1.81 | -1.04 | 0.39  | -0.41 | -1.86 | 3.43  |

Table 2: Example results for the high-pass filter [-1,1,0] (see Fig. 10).

| Input [µA] | $A_1$ | $A_2$ | $A_3$ | $A_4$ | $A_5$ | $A_6$ |
|------------|-------|-------|-------|-------|-------|-------|
|            | 2.5   | 2.5   | 5.098 | 5.098 | 2.5   | 2.5   |
| Out [µA]   | $B_1$ | $B_2$ | $B_3$ | $B_4$ | $B_5$ | $B_6$ |
| calculated | 2.195 | 2.420 | 2.186 | 2.420 | 2.654 | 2.645 |
| simulated  | 1.885 | 2.407 | 2.186 | 2.405 | 2.686 | 3.001 |
| measured   | 1.969 | 2.416 | 2.184 | 2.402 | 2.643 | 2.913 |
| er_S/C %   | 32.83 | 1.38  | -0.01 | 1.63  | -3.40 | -37.7 |
| er_M/C %   | 23.94 | 0.40  | 0.23  | 1.86  | 1.12  | -28.4 |
| er_M/S %   | 8.89  | 0.97  | -0.24 | -0.23 | -4.52 | -9.26 |

A detailed comparison for the experiment with large input signals is provided in Table 1. To investigate the dynamic parameters of the filter, the input image *A* was varied during the test. The $A_1$, $A_2$, $A_5$ and $A_6$ input samples were constant during the test and equal to 2.5 $\mu$A, while the others ($A_3$, $A_4$) oscillated in the range between approx. 2.19 and 5.1 $\mu$A. Theoretical (calculated), simulated and measured values of the output pixels are given in the fourth, fifth and sixth rows, respectively.

The results have been compared in terms of the errors given in the last three rows, in which S/C means simulated vs. calculated results, etc. All presented results are for a supply voltage of 0.8 V.

The results for an example high-pass filter with a mask $H = \begin{bmatrix} -1 & 1 & 0 \end{bmatrix}$ are shown in Fig. 10 and compared in detail in Table 2.

Notice that the error values in signals $B_2$-$B_4$ are below 1-2 %. This is regularly observed for different input images and filter masks. On the other hand, the errors at the $B_1$ and $B_6$ outputs, and for the low-pass filter additionally at the $B_5$ output (due to $A_6$ being zeroed), are larger. This is a border effect: these samples are calculated using only two inputs $A_i$.

The noise at the output is at a level of 1 mV (the currents are measured as voltages on 100 k$\Omega$ resistors). The measured dynamic range depends on the average value of the input pixels as well as on the type of the filter mask. Example measured values are as follows:

- a) for large signals (about 3 $\mu$A), low-pass filter: SNR = 36 dB (6 bits); see Fig. 9
- b) for large signals (about 3 $\mu$A), high-pass filter: SNR = 30 dB (5 bits); see Fig. 10
- c) for small signals (about 0.6 $\mu$A), low-pass filter: SNR = 24 dB (4 bits); see Fig. 8
- d) for small signals (about 0.6 $\mu$A), high-pass filter: SNR = 21 dB (3 bits)
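The relation between the SNR values and the bit resolutions listed above can be sketched as follows. The rule of about 6 dB per bit with rounding down is an assumption inferred from the listed pairs; the noise current simply applies Ohm's law to the 1 mV noise level and the 100 kΩ measurement resistors mentioned in the text.

```python
def snr_to_bits(snr_db, db_per_bit=6.0):
    """Approximate resolution in whole bits for a given SNR in dB.

    Assumes roughly 6 dB per bit and rounds down to a whole number of bits.
    """
    return int(snr_db // db_per_bit)


# SNR values listed in the text, in dB.
bits = [snr_to_bits(snr) for snr in (36, 30, 24, 21)]

# 1 mV of noise measured across a 100 kOhm resistor, expressed as a current.
noise_current_a = 1e-3 / 100e3
```

Under these assumptions the four listed cases map to 6, 5, 4 and 3 bits, and the 1 mV noise level corresponds to a noise current of about 10 nA.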

The maximum data rate depends on the values of the input signals and on the transistor dimensions used in the GSVM circuits. Example data rates are as follows:

- a) filter 1 (W/L = 15/5 $\mu$m): 15.15 kframes/s
- b) filter 2 (W/L = 8/2 $\mu$m): 21.74 kframes/s
- c) filter 3 (W/L = 6/1 $\mu$m): 33.33 kframes/s

In our experimental filter each frame consists of 6 pixels. Power dissipation depends on the values of the input currents $I_{xy}$. For small input values of approx. 0.6 $\mu$A the power dissipation is equal to 5 $\mu$W, while for larger values of approx. 3 $\mu$A it equals 30 $\mu$W.

It is worth noting that the filter accuracy, which influences the gray scale depth, does not differ significantly between the simulations and the measurements, although the environmental noise limits the achievable SNR. The noise is kept at a level of 1 mV, which limits the filter accuracy by approx. 6 dB (1 bit). The accuracy is lower in the case of high-pass filters, which is due to the additional signal inversion.

On the other hand, the data rate differs significantly between simulations and measurements. The lower data rate attained in measurements is due to the large capacitances of the pads as well as the measurement setup. The conclusion is that when the filter is part of a bigger system, in which the output analog signals are further processed on the chip, the data rate can be higher and closer to the simulation results. In this case the energy dissipated per calculation of a single pixel will be much smaller.
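The dependence of the energy per pixel on the data rate can be made explicit with a short calculation. The formula, power divided by the number of pixels processed per second, is an assumption about how this figure is defined; the power, frame-rate and pixel-count values are those reported in this paper.

```python
def energy_per_pixel_pj(power_w, frames_per_s, pixels_per_frame=6):
    """Energy per calculated pixel in picojoules.

    Assumed definition: power / (frame rate * pixels per frame).
    """
    return power_w / (frames_per_s * pixels_per_frame) * 1e12


# Small signals: 5 uW, measured at 15 kframes/s and simulated at 350 kframes/s.
measured_small = energy_per_pixel_pj(5e-6, 15e3)
simulated_small = energy_per_pixel_pj(5e-6, 350e3)

# Large signals: 30 uW, simulated at 1 Mframes/s.
simulated_large = energy_per_pixel_pj(30e-6, 1e6)
```

These three cases give roughly 55.6 pJ, 2.4 pJ and 5 pJ, close to the corresponding entries in Table 3. The large-signal measured entry in Table 3 (250 pJ) is not reproduced by this simple formula at 15 kframes/s, which gives about 333 pJ, so it may rest on a different operating point.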

### **4 CONCLUSIONS**

A 2-D analog, current-mode FIR filter has been proposed in this paper. The main building block in our filter is the Gilbert scalar-by-vector multiplier that allows for an ultra-low power dissipation due to transistors operating in weak inversion.

The proposed filter is a programmable solution, with a very simple logic block and a small number of programming bits that allow for the realization of different filter masks, both high-pass and low-pass. One of the main advantages is the parallel and asynchronous calculation of all output pixels without clock generators, which are typically a source of feedthrough noise.

To verify the proposed idea, three experimental image filters with different transistor dimensions have been designed in a 180 nm CMOS technology. Attenuation observed in post-layout HSPICE simulations reaches a level of about 55 dB. Theoretical analysis and measurement results concerning the influence of transistor threshold-voltage mismatch on the filter properties show that, even in the worst-case scenario, an attenuation higher than 36 dB (6 bits) can be achieved for a mismatch at the level of 2-3 %, which is sufficient for many practical applications, e.g. in endoscopic capsules. The lower attenuation attained in measurements is caused by environmental noise present in the input signal, as shown in Fig. 8 (top).

The filter performance is summarized in Table 3. The data rate is given in image frames/s because in the proposed filter this parameter does not depend on the number of pixels in a single frame. The data rate attained in measurements significantly exceeds that usually required in endoscopic capsules, i.e. several frames/s (Xie *et al.*, 2006). This makes it possible to switch off the circuit for most of the time and to save energy, which is one of the key criteria in wireless endoscopic capsules.

Table 3: Summary of the image filter's performance.

| Parameter                        | Small signals              | Large signals    |
|----------------------------------|----------------------------|------------------|
| Supply voltage                   | 0.8 V                      | 0.8 V            |
| Effective data rate (measured)   | 15 kframes/s               | 15 kframes/s     |
| Effective data rate (simulated)  | 350 kframes/s              | 1 Mframes/s      |
| SNR (dB)                         | 21-24 (3-4 bits)           | 30-36 (5-6 bits) |
| Power dissipation (6 pixels)     | 5 µW                       | 30 µW            |
| Energy/pixel (measured)          | 55 pJ                      | 250 pJ           |
| Energy/pixel (simulated)         | 2.3 pJ                     | 5 pJ             |
| Process                          | TSMC CMOS 0.18 µm          |                  |
| Die area (one filter, 6 pixels)  | 350 x 150 µm (0.052 mm²)   |                  |
| Image/mask resolution            | 6 x 1 / 3 x 1              |                  |

### ACKNOWLEDGEMENTS

This work is supported by the EU Marie Curie Outgoing International Fellowship No. 021926.

### REFERENCES

- M. Q.-H. Meng, T. Mei, J. Pu, C. Hu, X. Wang, Y. Chan, "Wireless robotic capsule endoscopy: state-of-the-art and challenges", 5th World Congress on Intelligent Control and Automation (WCICA), Vol. 6, June 2004, pp. 5561 - 5555.
- X. Xie, G. Li, X. Chen, X. Li, Z. Wang, "A Low-Power Digital IC Design Inside the Wireless Endoscopic Capsule", IEEE Journal of Solid-State Circuits, Vol. 41, Issue 11, Nov. 2006, pp. 2390 - 2400.
- S.-G. Miaou, S.-T. Chen, F.-S. Ke, "Capsule endoscopy image coding using wavelet-based adaptive vector quantization without codebook training", 3rd International Conference on Information Technology and Applications (ICITA), Vol. 1, July 2005, pp. 634 - 637.
- C. Winstead, Analog Iterative Error Control Decoders, Ph.D. dissertation, University of Alberta, ECE Department, Edmonton, AB, 2004.
- G. Linan, P. Foldesy, S. Espejo, R. Dominguez-Castro, A. Rodriguez-Vazquez, "A 0.5 µm CMOS $10^6$ transistors analog programmable array processor for real-time image processing", 25th European Solid-State Circuits Conference (ESSCIRC), 1999, pp. 358 - 361.
- R. Serrano-Gotarredona, T. Serrano-Gotarredona, A. Acosta-Jimenez, C. Serrano-Gotarredona, J. A. Perez-Carrasco, A. Linares-Barranco, G. Jimenez-Moreno, A. Civit-Balcells, B. Linares-Barranco, "On Real-Time AER 2D Convolutions Hardware for Neuromorphic Spike Based Cortical Processing", IEEE Trans. on Neural Networks, Vol. 19, No. 7, July 2008, pp. 1196 - 1219.