(Stingl et al., 2013). Our simulated retinal implant therefore consists of 40 × 40 electrode/photodiode pairs, where each electrode is circular with a radius of 50 µm. The electrodes were arranged evenly on a square grid with a center-to-center distance of 70 µm. Pulse trains were generated for a stimulus duration of 200 ms using monophasic cathodic pulses of 1 ms.
The temporal sampling step was set to 0.4 ms.
Thus, one still input image was represented as a pulse train of shape (40, 40, 500), i.e., a spatial input resolution of 40 × 40 with 500 simulated time steps. Pulses occurred at a working frequency of 5 Hz. Regarding the output of the network, we traded off computational complexity against model accuracy by sampling the retinal surface every 25 µm, yielding a spatial output resolution of 112 × 112 px. To obtain a single-time-step output from the neurophysiological network, we followed (Beyeler et al., 2017) and extracted the response at the time step with the highest output response. Please note that this network has no trainable parameters; its convolutional kernels are fixed as described in (Nanduri et al., 2012).
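To make the stimulus encoding concrete, the following is a minimal NumPy sketch of how a grayscale image could be turned into such a pulse-train tensor and how a single frame is read out. The linear pixel-to-amplitude mapping and all function names are illustrative assumptions, not the exact implementation used in the paper.

```python
import numpy as np

def encode_pulse_train(image, duration_ms=200.0, dt_ms=0.4,
                       pulse_width_ms=1.0, freq_hz=5.0):
    """Encode a 40x40 grayscale image (values in [0, 1]) as monophasic
    cathodic pulse trains of shape (40, 40, n_steps).

    ASSUMPTION: pixel intensity linearly scales the pulse amplitude."""
    n_steps = int(round(duration_ms / dt_ms))            # 500 time steps
    period_steps = int(round(1000.0 / freq_hz / dt_ms))  # steps per 5 Hz period
    width_steps = max(1, int(round(pulse_width_ms / dt_ms)))
    train = np.zeros(image.shape + (n_steps,), dtype=np.float32)
    for onset in range(0, n_steps, period_steps):
        # cathodic (negative) pulse, amplitude modulated per pixel
        train[..., onset:onset + width_steps] = -image[..., np.newaxis]
    return train

def max_response_frame(percept):
    """Collapse a (H, W, T) simulated percept to a single frame by taking
    the time step with the largest overall response (cf. Beyeler et al., 2017)."""
    t_max = np.argmax(percept.reshape(-1, percept.shape[-1]).sum(axis=0))
    return percept[..., t_max]
```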
2.2.1 Image Transformation Network
The original input image is processed by the image transformation network (see Figure 3 for details). Since we are interested in generally applicable transformations based on a rather small local neighbourhood (and want to avoid transformations based on image semantics), it consists of only four convolutional blocks with kernels of size 3 × 3. The first three convolutional layers comprise 32 trainable kernels each; the last layer has a single kernel to restore the shape of the original input image.
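Under these constraints, the network could be expressed as follows in PyTorch; the ReLU activations and zero padding are assumptions, since the text only specifies kernel counts and sizes.

```python
import torch.nn as nn

# Minimal sketch of the described four-block architecture (32-32-32-1 kernels,
# all 3x3); activation and padding choices are illustrative assumptions.
transformation_net = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, kernel_size=3, padding=1),  # back to single-channel 40x40
)
```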
2.2.2 Image Reconstruction Task
For the task of image reconstruction, the input image is fed into our proposed image transformation network, and its output is subsequently transformed into a simulated percept using the implemented neurophysiological tensor network (Figure 3, green path) with the parameters described in Section 2.2. Since the output resolution of the transformed visual percept does not match the shape of the original input image (40 × 40 electrodes as input, 112 × 112 sampled positions on the retina as output), the original image is bilinearly interpolated to match the output shape of the neurophysiological network. The dissimilarity between the two is then assessed using the mean-squared-error.
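A minimal sketch of this loss computation, assuming PyTorch tensors in (N, C, H, W) layout; the function name is ours:

```python
import torch.nn.functional as F

def reconstruction_loss(input_image, percept):
    """MSE between the simulated percept (N, 1, 112, 112) and the original
    input (N, 1, 40, 40), bilinearly upsampled to the percept resolution."""
    target = F.interpolate(input_image, size=percept.shape[-2:],
                           mode='bilinear', align_corners=False)
    return F.mse_loss(percept, target)
```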
2.2.3 Classification Task
We chose a simple image classification task with 10 object classes to evaluate the general plausibility of our system. For the task of object classification, we seek a transformation of the input images such that their corresponding visual percepts, generated by the neurophysiological network, lead to an increased classification accuracy compared to their unaltered counterparts. Therefore, after feeding the input image through the transformation network and the spatiotemporal network, the output of the latter is fed to a standard classification convolutional neural network consisting of convolutional blocks followed by a multilayer perceptron (please refer to Figure 3 for an overview). Here, categorical cross-entropy is used as the objective function.
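As a rough illustration, such a classifier head could be sketched as follows; the specific depths, channel widths, and layer sizes are hypothetical, since the text only specifies the overall structure (convolutional blocks plus a multilayer perceptron, trained with cross-entropy on the logits).

```python
import torch.nn as nn

# Hypothetical classifier operating on the 112x112 single-channel percept.
classifier = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 112 -> 56
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 56 -> 28
    nn.Flatten(),
    nn.Linear(32 * 28 * 28, 128), nn.ReLU(),
    nn.Linear(128, 10),  # 10 object classes; pair with nn.CrossEntropyLoss
)
```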
3 EVALUATION
3.1 Image Reconstruction
The proposed image reconstruction task was tested on the popular MNIST data set (Y. LeCun, 1998), comprising binary images of handwritten digits. This data set is of particular interest since, due to its clear figure/ground separation, the qualitative assessment of the learned transformations is easy to grasp. Furthermore, enhancing the visual percepts of digits (and even more so of letters) is an everyday visual task of potentially great importance for patients suffering from RP and treated with a retinal implant. The mean-squared-error was used as the objective function, assessing the dissimilarity between the input image and its virtually perceived version.
For the training of the network, the training set comprised 50000 images belonging to 10 classes of digits (0–9). Training was performed batch-wise (n = 128) for 500 epochs, and a validation set of 10000 different images was evaluated after each epoch. Standard stochastic gradient descent was used for optimization with a fixed learning rate of 0.01.
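A schematic PyTorch training loop with this configuration might look as follows; `train_loader`, the fixed (non-trainable) percept simulator `neuro_net`, and the names `transformation_net` and `reconstruction_loss` refer to the earlier sketches and are assumptions, not the authors' code.

```python
import torch

# SGD with the stated fixed learning rate; only the transformation network
# has trainable parameters, the percept simulator stays frozen.
optimizer = torch.optim.SGD(transformation_net.parameters(), lr=0.01)

for epoch in range(500):
    for images, _ in train_loader:      # batches of n = 128 MNIST images
        percept = neuro_net(transformation_net(images))
        loss = reconstruction_loss(images, percept)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```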
Figure 4a) shows the mean-squared-error (mse) over time, using a logarithmic scale for better visibility. As can be seen, training and validation loss decrease significantly until the validation loss appears to saturate after around 300 epochs. As a quantitative reference, the baseline mse (without image transformation) over the validation set is 0.067, whereas it drops to 0.035 after 500 epochs.
For a qualitative visual comparison, the last two
rows of Figure 5 show exemplary results given an in-
put image from the validation set (first column), its