Contour Learning and Diffusive Processes for Colour Perception
Francisco J. Diaz-Pernas, Mario Martínez-Zarzuela, Miriam Anton-Rodriguez
and David González-Ortega
Imaging and Telematics Group, University of Valladolid, Campus Miguel Delibes, 47011, Valladolid, Spain
Keywords: Computer Vision, Contour Learning, Boundary Detection, Neural Networks, Colour Image Processing, Bio-
Inspired Models.
Abstract: This work proposes a bio-inspired neural architecture called L-PREEN (Learning and Perceptual boundaRy
rEcurrent dEtection neural architecture). L-PREEN has three different feedback interactions that fuse the
bottom-up and top-down contour information of visual areas V1-V2-V4-Infero Temporal. This recurrent
model uses colour, texture, and diffusive features to generate surface perception and contour learning and
recognition processes. We compare the L-PREEN model against other boundary detection methods using the
Berkeley Segmentation Dataset and Benchmark (Martin et al., 2001). The quantitative results obtained
show that L-PREEN performs better than the compared methods.
1 INTRODUCTION
In this paper, a bio-inspired neural architecture called
L-PREEN (Learning and Perceptual boundaRy
rEcurrent dEtection Neural architecture) is proposed.
L-PREEN has three different feedback interactions
that fuse the bottom-up and top-down contour
information of visual areas V1-V2-V4-IT. This
recurrent model uses colour, texture and diffusive
features to generate diffusive surface perception and
contour learning processes.
Several artificial neural models reproduce the
behaviour of the human visual system (Grossberg
and Williamson, 1999; Kokkinos et al., 2008;
Mingolla et al., 1999). However, only a few bio-
inspired proposals in the literature include colour
as a fundamental feature of early image
processing. Grossberg and Huang (Grossberg and
Huang, 2009) proposed ARTSCENE for colour scene
recognition. Their model for colour features
extraction is comprised of three independent R, G,
and B channels, without a formulation of colour
opponent channels. Similarly, using RGB
independent channels, Hong and Grossberg (Hong
and Grossberg, 2004) proposed a bio-inspired
neuromorphic model for removing the variations on
the illumination effects in colour natural scenes.
Using colour opponent signals, Vonikakis et al.
(Vonikakis et al., 2006) proposed a model with
contour extraction in colour images. In their
opponency model, they propose two concentric
square neighbourhoods of opposite (positive and
negative) polarity, within which the opponent
signals are directly subtracted.
2 PROPOSED RECURRENT
BOUNDARY DETECTION
ARCHITECTURE
The L-PREEN architecture detects the most
perceptually significant natural boundaries and
generates surface perception. The L-PREEN
recurrent interactions fuse the bottom-up (BU) and
the top-down (TD) information. The L-PREEN
model is comprised of six main stages (see Figure
1): an Opponent Channel stage (OC), a Contour
Channel stage (CC), a Competitive Fusion stage
(CF), a Cooperative Saliency stage (CS), a Region
Enhancement stage (RE), and a Contour Learning
Neural Network (CLNN). These stages are applied at
three scales s (s=0 small, s=1 medium, s=2 large).
The L-PREEN model processes three signals, one
luminance and two chromatic:

$$I = (R + G + B)/3, \qquad rg = R - G, \qquad by = B - (R + G)/2,$$
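As an illustrative sketch only, not part of the model specification, these opponent signals can be computed from an RGB image as follows (Python/NumPy; the [0, 1] value range and array layout are our assumptions):

import numpy as np

def opponent_channels(rgb):
    """Split an RGB image (H x W x 3, values in [0, 1]) into the
    luminance and two chromatic opponent signals used by L-PREEN."""
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    I  = (R + G + B) / 3.0      # luminance
    rg = R - G                  # red-green opponency
    by = B - (R + G) / 2.0      # blue-yellow opponency
    return I, rg, by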
Figure 1: Scheme of the L-PREEN architecture.
2.1 Contour Channel Stage
The CC stage in L-PREEN models the behaviour of the
simple and complex cells in V1. The activity of
simple cells is obtained through Gabor filters,
$g^{(s)}_{\sigma,\omega,k}$, for three scales (s = 0, 1, 2),
six orientations (k = 0, ..., 5
corresponding to θ = 0°, 30°, 60°, 90°, 120°, 150°),
deviations σ = 1.0, 2.0, 4.0 and frequencies ω =
0.1, 0.07, 0.03.
The model equation for the rg channel is detailed in
(1):

$$\left(C^{(s)}_{ijk},\, S^{(s)}_{ijk}\right) = rg \otimes g^{(s)}_{\sigma,\omega,k} \qquad (1)$$

where ⊗ represents matrix convolution, and $C^{(s)}_{ijk}$ and $S^{(s)}_{ijk}$ are the real and imaginary parts of the Gabor filtering at position (i, j), scale s and orientation k.
The complex cells in L-PREEN receive inputs
from the bottom-up pathway (BU) coming from the
OC stage and the top-down pathways (TD) of the RE-
V4 stage and the CLNN stage (see Figure 1). These
inputs are fused together following equation (2), with
$[w]^{+} = \max(w, 0)$ and gain constants α, β and κ (1.0,
1.0, 0.6):

$$Z^{(s)}_{ijk} = \alpha\left(\left[C^{(s)}_{ijk}\right]^{+} + \left[S^{(s)}_{ijk}\right]^{+}\right) + \beta\left(\left[C^{RE(s)}_{ijk}\right]^{+} + \left[S^{RE(s)}_{ijk}\right]^{+}\right) + \kappa\left(\left[C^{L(s)}_{ijk}\right]^{+} + \left[S^{L(s)}_{ijk}\right]^{+}\right) \qquad (2)$$

where the RE and L superscripts denote the feedback contour signals from the RE-V4 and CLNN stages, respectively.
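A minimal sketch of this rectified fusion, with illustrative argument names for the three (C, S) input pairs, could read:

import numpy as np

ALPHA, BETA, KAPPA = 1.0, 1.0, 0.6   # gains of equation (2)

def rect(x):
    # Half-wave rectification [w]+ = max(w, 0).
    return np.maximum(x, 0.0)

def fuse_complex_cell(C_bu, S_bu, C_re, S_re, C_cl, S_cl):
    # Fuse bottom-up (OC) with top-down RE-V4 and CLNN contour
    # responses into the complex-cell activity Z of equation (2).
    return (ALPHA * (rect(C_bu) + rect(S_bu))
            + BETA * (rect(C_re) + rect(S_re))
            + KAPPA * (rect(C_cl) + rect(S_cl)))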
2.2 Competitive-Cooperative Loop
The inner BU-TD interaction is performed in the
competitive CF and cooperative CS stages through a
multiplicative competitive network with double
inhibition among spatial positions and among
orientations. This recurrent interaction detects,
regulates, and completes boundaries into globally
consistent contrast positions and orientations, while it
suppresses activations from redundant and less
important contours, thus eliminating image noise.
The CF activity, $W^{(s)}_{ijk}$, follows equation (3), where
$G^{(s)}_{e}$ and $G^{(s)}_{i}$ are Gaussian kernels with per-scale
deviations $\sigma^{(s)} = 0.4, 0.8, 1.0$, and
$U^{(s)}_{ijk} = Z^{(s)}_{ijk} + F^{(s)}_{ijk}$ is the fusion
between the BU signal from the CC stage ($Z^{(s)}_{ijk}$,
equation (2)) and the TD signal from the CS stage
($F^{(s)}_{ijk}$, see equation (4)). $[c]^{+} = \max(c, 0)$,
$A_4 = 10.0$, $K_h = 1.0$, $K_f = 5.0$, $C_c = 0.3$
and $C_i = 0.1$.

$$W^{(s)}_{ijk} = \frac{\left[K_f\,U^{(s)}_{ijk} - C_c \sum_{p,q} G^{(s)}_{e}(p-i, q-j)\,U^{(s)}_{pqk} - C_i \sum_{m} G^{(s)}_{i}(m-k)\,U^{(s)}_{ijm}\right]^{+}}{A_4 + K_h\,U^{(s)}_{ijk} + \sum_{p,q} G^{(s)}_{e}(p-i, q-j)\,U^{(s)}_{pqk} + \sum_{m} G^{(s)}_{i}(m-k)\,U^{(s)}_{ijm}} \qquad (3)$$

$$F^{(s)}_{ijk} = \frac{z\!\left(\sum_{p,q} P^{(s)}_{pqk}\,U^{(s)}_{pqk}\right)\, z\!\left(\sum_{p,q} N^{(s)}_{pqk}\,U^{(s)}_{pqk}\right)}{A_5 + \sum_{p,q} \left(P^{(s)}_{pqk}\,U^{(s)}_{pqk} + N^{(s)}_{pqk}\,U^{(s)}_{pqk}\right)} \qquad (4)$$
where $P^{(s)}_{pqk}$ and $N^{(s)}_{pqk}$ are the receptive field
lobes with bipole profile (see Figure 2), $A_5 = 0.001$
is a constant and $z(s) = \max(s - \alpha, 0)$, with α = 0.1.
This competition-cooperation recurrence is
computed iteratively following equations (3)
and (4). In our test simulations, convergence is
reached after 5-6 iterations.
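The sketch below illustrates one possible implementation of this loop. It follows our reconstruction of equations (3) and (4); the Gaussian spatial pooling, the simplified nearest-orientation inhibition, and the bipole lobe kernels P and N passed in as per-orientation arrays are all assumptions, not the authors' code:

import numpy as np
from scipy.ndimage import convolve, gaussian_filter

A4, A5 = 10.0, 0.001
KH, KF, CC, CI = 1.0, 5.0, 0.3, 0.1
Z_ALPHA = 0.1

def z(x):
    # Threshold z(s) = max(s - alpha, 0) of equation (4), alpha = 0.1.
    return np.maximum(x - Z_ALPHA, 0.0)

def cf_step(U, sigma):
    # Competitive Fusion (eq. 3): shunting competition among positions
    # (Gaussian pooling) and, simplified here, among adjacent orientations.
    pos = np.stack([gaussian_filter(U[k], sigma) for k in range(6)])
    ori = 0.5 * (np.roll(U, 1, axis=0) + np.roll(U, -1, axis=0))
    num = np.maximum(KF * U - CC * pos - CI * ori, 0.0)
    return num / (A4 + KH * U + pos + ori)

def cs_step(U, P, N):
    # Cooperative Saliency (eq. 4): both bipole lobes P and N must be
    # active for a position/orientation to receive cooperative support.
    lp = np.stack([convolve(U[k], P[k]) for k in range(6)])
    ln = np.stack([convolve(U[k], N[k]) for k in range(6)])
    return z(lp) * z(ln) / (A5 + lp + ln)

def bu_td_loop(Z, P, N, sigma, iters=6):
    # Iterate eqs. (3)-(4); the paper reports convergence in 5-6 steps.
    # Z has shape (6, H, W): one map per orientation.
    F = np.zeros_like(Z)
    for _ in range(iters):
        U = Z + F                 # fuse BU (CC) and TD (CS) signals
        W = cf_step(U, sigma)     # competitive boundary activity
        F = cs_step(U, P, N)      # cooperative feedback for next pass
    return W, F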
Figure 2: Profile of the bipole receptive field.
2.3 Region Enhancement Stage
Region enhancement performs diffusion processes
on the three opponent channels.
These diffusions yield a coherent colour
homogenization within the significant regions of the
natural scene. This region enhancement corresponds
to surface perception.
For the diffusion process, an iterative scheme
based on equation (5) is proposed. In this equation,
$D^{\gamma(s)}_{ij}$ is the $\gamma$-channel diffusion, $A_7 = 1.0$ is a
constant, $o^{\gamma(s)}_{ij}$ represents the corresponding OC
signal (e.g., rg), $N_{ij}$ are the nearest neighbours to
position (i, j), and
$\Gamma^{(s)}_{pq,ij} = K_e\, e^{-K_p\left(B^{(s)}_{pq} + B^{(s)}_{ij}\right)}$
is the permeability, where $B^{(s)}_{ij}$ is the boundary
signal obtained from the competitive-cooperative loop and
$K_e = 1.0$, $K_p = 10.0$ are positive constants.

$$D^{\gamma(s)}_{ij}[0] = o^{\gamma(s)}_{ij}, \qquad
D^{\gamma(s)}_{ij}[n+1] = \frac{D^{\gamma(s)}_{ij}[n] + \sum_{(p,q) \in N_{ij}} \Gamma^{(s)}_{pq,ij}\, D^{\gamma(s)}_{pq}[n]}{A_7 + \sum_{(p,q) \in N_{ij}} \Gamma^{(s)}_{pq,ij}} \qquad (5)$$
Figure 5 shows L-PREEN diffusion outputs (third
column). These diffusions (surface perception) are
filtered to extract the contours, which act as feedback signals.
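A sketch of such a boundary-gated diffusion, assuming the exponential permeability reconstructed above and a 4-neighbourhood, could read:

import numpy as np

A7, KE, KP = 1.0, 1.0, 10.0

def diffuse(oc, boundary, iters=50):
    # Boundary-gated diffusion of one OC channel (eq. 5): permeability
    # collapses where the boundary signal is strong, so colour spreads
    # only inside regions. The iteration count is our assumption.
    D = oc.copy()                         # D[0] = OC signal
    for _ in range(iters):
        num = D.copy()
        den = np.full_like(D, A7)
        for shift in ((0, 1), (0, -1), (1, 0), (-1, 0)):
            Dn = np.roll(D, shift, axis=(0, 1))         # neighbour value
            Bn = np.roll(boundary, shift, axis=(0, 1))  # neighbour boundary
            g = KE * np.exp(-KP * (boundary + Bn))      # permeability
            num += g * Dn
            den += g
        D = num / den
    return D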
2.4 Contour Learning Neural Network
The CLNN stage includes a SOON neural network
(Antón-Rodríguez et al., 2009) based on Fuzzy ARTMAP
(Carpenter et al., 1992) that merges the contour
information coming from the different scales in order
to learn the ground-truth contours
traced by humans (see Figure 3, bottom).
The contour activity pattern $R_{ij}$ is expressed
following equation (6):

$$R_{ij} = \left(C^{(0)}_{ij0}, S^{(0)}_{ij0}, C^{(0)}_{ij1}, S^{(0)}_{ij1}, \ldots, C^{(0)}_{ij5}, S^{(0)}_{ij5}, \ldots, C^{(s)}_{ijk}, S^{(s)}_{ijk}, \ldots, C^{(2)}_{ij5}, S^{(2)}_{ij5}\right) \qquad (6)$$
where (i, j) is the position and $\left(C^{(s)}_{ijk}, S^{(s)}_{ijk}\right)$ is the pair
corresponding to the real and imaginary parts of the
Gabor filtering (see equation (1)). The SOON network has
two levels of neural layers: the input level C1, with 6
layers and complementary coding, and the
categorization level C2 (see Figure 3, top). Linking
these two levels, there is an adaptive filter with
weights $w$, in which the learnt prototypes of the
generated categories are stored.
In the training phase, the positions belonging to the
human contours of the ground-truth images are taken
as references for contour pattern learning (see Figure 3,
bottom). In the test mode, the weights of the selected
node, $w_D$ (D = winner category), determine the
feedback signals of the CLNN contour learning stage,
$\left(C^{L(s)}_{ijk}, S^{L(s)}_{ijk}\right)$ (see equation (2)).

Figure 3: Top: Scheme of the SOON architecture. Bottom:
Original image with overlapped human contours. The
ground-truth points represented in red are the learning
points.
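As an illustration of how the input pattern and its complementary coding might be assembled (the normalisation to [0, 1] is our assumption; C and S are the per-scale, per-orientation Gabor responses of equation (1)):

import numpy as np

def contour_pattern(C, S, i, j):
    # Build the pattern R_ij of equation (6): the (C, S) Gabor pair at
    # position (i, j) for every scale s and orientation k.
    feats = [f[i, j]
             for s in range(3) for k in range(6)
             for f in (C[s][k], S[s][k])]
    return np.array(feats)

def complement_code(a):
    # Complement coding at the SOON input level C1: each normalised
    # feature a is presented together with 1 - a, preserving amplitude
    # information during Fuzzy-ART matching.
    a = (a - a.min()) / (np.ptp(a) + 1e-9)  # normalisation choice is ours
    return np.concatenate([a, 1.0 - a])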
2.5 Experimental Results
We used all the images of the BSDS300
from the Berkeley Segmentation Dataset (Martin et
al., 2001), 200 images for training and 100 images
for testing, together with their human-segmented
counterparts. The latter were taken as the ground truth
for the learning process and for computing F-values. The
F-value is the harmonic mean of precision (the
fraction of detections that are true positives) and recall (the
fraction of ground-truth boundary pixels detected), at the
optimal detector threshold. Depending on the relative cost
between these measures, the F-value is expressed
according to equation (7), where P stands for precision, R
for recall and α is the relative cost:
=

+
(
1−
)
(7)
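For reference, equation (7) amounts to the following one-liner:

def f_value(precision, recall, alpha=0.5):
    # F-measure of equation (7); alpha = 0.5 yields the usual harmonic
    # mean of precision and recall.
    return (precision * recall) / (alpha * recall + (1.0 - alpha) * precision)

# Example: f_value(0.60, 0.69) is approximately 0.64.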
For the tests performed using L-PREEN we chose
a relative cost α = 0.5.
The average F-value obtained is F = 0.64 (0.60,
0.69), as shown in Figure 4. Figure 5 shows some of the
segmentation results obtained using L-PREEN. We
compare the L-PREEN results with a publicly available
method for boundary detection based on tensor voting
with perceptual grouping in natural scenes. Perceptual
grouping extracts illusory figures or completed
boundaries following the Gestalt visual perception
principles. The comparison method is therefore a
non-neural scheme, different from L-PREEN, for the
perceptual grouping of natural figures in cluttered
backgrounds. Loss et al. (Loss et al.,
2009) proposed an iterative method based on
multiscale tensor voting. Their approach
consists of iteratively removing image segments and
applying a new voting over the remaining segments,
in order to estimate the most reliable saliency. The
tensor representation chosen uses subsets of pixels to
form the tensors, with ball or stick tensor initialization.
This representation was chosen to reduce the number
of tensors, which in turn reduces the computation
time. In their work, they present an evaluation of
their method using two datasets: synthetic fruit
images and the BSDS300 Berkeley dataset. In the
latter evaluation, they use five base segmentation
methods (Gradient Magnitude (GM), Multi-scale
Gradient Magnitude (MGM), Texture Gradient (TG),
Brightness Gradient (BG) and Brightness/Texture
Gradient (BTG)) to generate a Boundary Posterior
Probability map ('segmentation feeders'). This map
is employed as a preprocessing step for their method.
To quantify the results, they obtained the F-value and
precision-recall graphs. The F-values obtained using
the five methods over the 100 test images of the
Berkeley Dataset were 0.57, 0.58, 0.57, 0.60 and
0.62, respectively. L-PREEN obtains an F-value of
0.64, as shown in Table 1, which is better than all
five versions of the comparison method.
We took the Matlab code of the gPb method
(Global Probability of Boundary) (Arbelaez et al.,
2011), which holds third position in the ranking
published in the Berkeley benchmark
(http://www.eecs.berkeley.edu/Research/Projects/CS/vision/bsds/bench/html/images.html),
offered by its authors on the University of California,
Berkeley website, and ran it over the 100 test images,
obtaining an average execution time of 403.29 s per
image, while L-PREEN needs 159.12 s.
Figure 4: Precision-recall curve.
Table 1: Comparative results.

Method                                   F-value
Loss et al.'s method with GM method      0.57
Loss et al.'s method with MGM method     0.58
Loss et al.'s method with TG method      0.57
Loss et al.'s method with BG method      0.60
Loss et al.'s method with BTG method     0.62
L-PREEN                                  0.64
3 CONCLUSIONS
This work presents a new model, L-PREEN, for
detecting boundaries and generating surface perception
in colour natural images. This model is bio-inspired by
processes in the V1, V2, V4 and IT visual areas of the
human visual system.
The L-PREEN model includes orientational filtering,
competition among orientations and positions,
cooperation through bipole profile fields, and contour
learning. The proposed architecture has been
compared with Loss et al.'s method (Loss et al.,
2009), obtaining better results. A major advantage of
the L-PREEN model is its speed when compared to
other methods. L-PREEN can be implemented using
matrix and convolution operations, making it
compatible and scalable with parallel processing
hardware. Exploring such implementations will be
our future work.
Figure 5: Examples of processing results using L-PREEN. First column: original image with overlapped human contours.
Second column: L-PREEN boundary output. Third column: diffusion output (rg channel).
REFERENCES
Antón-Rodríguez, M., Díaz-Pernas, F. J., Díez-Higuera, J.
F., Martínez-Zarzuela, M., González-Ortega, D., Boto-
Giralda, D. (2009). Recognition of coloured and
textured images through a multi-scale neural
architecture with orientational filtering and chromatic
diffusion. Neurocomputing, 72:3713–3725.
Arbelaez, P., Maire, M., Fowlkes, C. and Malik, J. (2011).
Contour Detection and Hierarchical Image
Segmentation. IEEE TPAMI, Vol. 33, No. 5, pp. 898-
916.
Carpenter, G.A., Grossberg, S., Markuzon, N., Reynolds,
J.H., Rosen, D.B. (1992). Fuzzy ARTMAP: A Neural
Network Architecture for Incremental Supervised
Learning of Analog Multidimensional Maps. IEEE
Transactions on Neural Networks, 3(5):698-712.
Grossberg, S., Williamson, J.R., (1999). A self-organizing
neural system for learning to recognize textured scenes.
Vision Research, 39:1385-1406.
Grossberg, S., Huang, T. (2009). ARTSCENE: A neural
system for natural scene classification. Journal of
Vision, 9(4):1–19.
Hong, S., Grossberg, S. (2004). A neuromorphic model for
achromatic and chromatic surface representation of
natural images. Neural Networks, 17:787-808.
Kokkinos, I., Deriche, R., Faugeras, O., Maragos, P.
(2008). Computational analysis and learning for a
biologically motivated model of boundary detection.
Neurocomputing, 71(10-12):1798-1812.
Loss, L., Bebis, G., Nicolescu, M., Skurikhin, A. (2009).
An iterative multi-scale tensor voting scheme for
perceptual grouping of natural shapes in cluttered
backgrounds. Computer Vision and Image
Understanding 113:126–149.
Martin, D., Fowlkes, C., Tal, D., Malik, J. (2001). A
Database of Human Segmented Natural Images and its
Application to Evaluating Segmentation Algorithms
and Measuring Ecological Statistics. Proc. 8th Int'l
Conf. Computer Vision, 2: 416-423.
Mingolla, E., Ross, W., Grossberg, S. (1999). A neural
network for enhancing boundaries and surfaces in
synthetic aperture radar images. Neural Networks,
12:499-511.
Vonikakis, V., Gasteratos, A., Andreadis, I. (2006).
Extraction of Salient Contours in Color Images. 4th
Panhellenic Conference of Artificial Intelligence
(SETN 2006), Lecture Notes in Computer Science,
3955:400-410, Heraklion, Greece.