An Event Camera Simulator for Arbitrary Viewpoints Based on Neural Radiance Fields

Diego Hernández Rodríguez¹,², Motoharu Sonogashira¹,², Kazuya Kitano², Yuki Fujimura², Takuya Funatomi²ᵃ, Yasuhiro Mukaigawa² and Yasutomo Kawanishi¹,²ᵇ

¹ Guardian Robot Project, RIKEN, Kyoto, Japan
² Division of Information Science, Nara Institute of Science and Technology, Nara, Japan
ᵃ https://orcid.org/0000-0001-5588-5932
ᵇ https://orcid.org/0000-0002-3799-4550
Keywords: Event Camera Simulation, Neural Radiance Fields.
Abstract:
Event cameras are novel sensors that offer significant advantages over standard cameras, such as high temporal
resolution, high dynamic range, and low latency. Despite recent efforts, however, event cameras remain rela-
tively expensive and difficult to obtain. Simulators for these sensors are crucial for developing new algorithms
and mitigating accessibility issues. However, existing simulators based on a real-world video often fail to gen-
eralize to novel viewpoints or temporal resolutions, making the generation of realistic event data from a single
scene unfeasible. To address these challenges, we propose enhancing event camera simulators with neural
radiance fields (NeRFs). NeRFs can synthesize novel views of complex scenes from a low-frame-rate video
sequence, providing a powerful tool for simulating event cameras from arbitrary viewpoints. This approach
not only simplifies the simulation process but also allows for greater flexibility and realism in generating event
camera data, making the technology more accessible to researchers and developers.
1 INTRODUCTION
Event cameras represent a paradigm shift in visual
sensing technology, capturing dynamic scenes with
remarkable temporal resolution and high dynamic
range. Unlike conventional frame-based cameras,
event cameras asynchronously record changes in the
intensity of the visual field, offering a unique advan-
tage in scenarios involving fast motion or challenging
lighting conditions. Since these sensors are still rela-
tively expensive and difficult to obtain, various efforts
have been made to create simulators to facilitate their
research further.
Previous simulators aim to generate event data from RGB video by either relying on ultra-high frame rates (Gehrig et al., 2020; García et al., 2016) or by interpolating the video sequence (Hu et al., 2021). This comes with the drawback of not being able to generate more data from a single video. While simulators like ESIM (Rebecq et al., 2018) attempt to tackle this issue with the use of 3D models, generating data that resembles a realistic scene is both time- and labor-intensive, making it unsuitable for researchers
who want to generate data for their own environments.
To address this challenge of generating event data
from arbitrary viewpoints, we propose a simulation framework, shown in Fig. 2. The framework gen-
erates synthetic event camera data using neural radi-
ance fields (NeRFs) (Mildenhall et al., 2020), a re-
cent breakthrough in the field of computer vision that
enables the reconstruction of high-fidelity 3D scenes
from a sparse set of 2D images by leveraging neu-
ral networks to model the volumetric radiance field.
By integrating NeRF with event-based sensing prin-
ciples, we aim to create a versatile framework that
can produce realistic and diverse event camera data,
facilitating the advancement of event-based vision al-
gorithms. Notably, our method focuses on generating
event data from static scenes, allowing for the explo-
ration of how camera motion alone influences event
generation without the added complexity of dynamic
scene changes.
Our approach offers several significant advan-
tages. First, it allows for creating extensive datasets
without the need for labor-intensive data collection
processes. Second, it provides a controlled virtual
environment where various parameters can be mod-
ified to evaluate the robustness of event-based algo-
rithms. Finally, the synthetic data generated through
our method can serve as a valuable resource for train-
ing deep learning models, potentially improving their
performance in real-world applications.
The rest of this paper is organized as follows. In
section 2, we introduce some of the most important
works concerning event camera simulation and ex-
plain their working mechanisms. We also briefly re-
view the formulation of neural radiance fields. In
section 3, we detail the proposed methodology for
synthesizing event camera data using NeRF and dis-
cuss the implementation and integration of these tech-
nologies. In section 4, we present experimental re-
sults demonstrating the effectiveness of our approach,
comparing them to actual event data streams and with
other video-to-event generation pipelines. Finally, in
section 5, we discuss our method’s limitations, possi-
ble extensions, and future work. By bridging the gap
between synthetic data generation and event-based
sensing, our work aims to accelerate research in event
cameras and pave the way for their broader adoption
and application.
2 RELATED WORK
2.1 Event Camera Simulation
Numerous event camera datasets and simulators have
been introduced over the years. In this section, we
review the most relevant ones and their specific ap-
plication scenarios. The number of publicly released
event camera simulators is small. While some of them
build upon previous research, they mostly tackle the
task differently.
Early simulators like (Mueggler et al., 2017) vi-
sually approximated an event stream by detecting sig-
nificant changes in luminance between two succes-
sive frames to create edge-like images that resemble
the output of an event camera. Most of these sim-
ulators did not discuss how to convert the simulated
events into realistic and accurate raw event streams.
Recent approaches like (Zhang et al., 2024) tackle
this by developing a statistics-based local-dynamics-
aware timestamp inference algorithm that enables the
smooth transition to the event stream. Other simula-
tors like (Joubert et al., 2021) attempt to physically
model the unique characteristics of the sensor and its
parameters, while methods such as (Zhu et al., 2019)
and (Hu et al., 2021) take a deep learning-based ap-
proach in order to approximate the outputs of a phys-
ical sensor. However, none of them take into account
the geometry of the scene, nor can they generate an
event stream outside the original path followed by the
camera. In order to circumvent this limitation, (Li et al., 2018) and, most notably, (Rebecq et al., 2018) leverage 3D models to render a scene in which a user-defined camera path can be utilized to generate an event stream. However, this approach requires detailed 3D models whenever a realistic scene is to be simulated.

Figure 1: A pixel u of the intensity image I_t in the event generation model. A positive or negative event is generated when the brightness change exceeds the threshold τ on a logarithmic scale; positive and negative events are represented in blue and red, respectively.
In ESIM (Rebecq et al., 2018), an output event stream E is represented as a sequence of events e_i = (t_i, u_i, p_i), denoting brightness changes asynchronously registered by an image I at time t_i and its pixel location u_i = (x_i, y_i) in the image, with a polarity p_i ∈ {−1, 1}. The polarity of an event indicates a positive or negative change in illumination on a logarithmic scale, quantized by the negative and positive thresholds ±τ. The change in brightness ΔL between two timestamps can be estimated by the difference of the intensity of a pixel u_i of images at times t_i and t_{i−1} on the logarithmic scale. This mechanism is illustrated in Fig. 1 and formulated as follows:

$$p_i = \begin{cases} +1 & \text{if } \Delta L(u_i; t_i) > \tau \\ -1 & \text{if } \Delta L(u_i; t_i) < -\tau \end{cases} \tag{1}$$

$$\Delta L(u_i; t_i) = \log I(u_i; t_i) - \log I(u_i; t_{i-1}) \tag{2}$$
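To make the event-generation model above concrete, the following is a minimal sketch of per-pixel event detection between two intensity frames, following equations (1) and (2). The function name and the array-based formulation are our own illustration, not part of ESIM's actual code.

```python
import numpy as np

def detect_polarities(img_prev, img_curr, tau=0.25, eps=1e-6):
    """Sketch of the event model of Eqs. (1)-(2).

    img_prev, img_curr: intensity images at t_{i-1} and t_i (H x W, linear scale).
    tau: contrast threshold in log-intensity units.
    Returns an (H x W) array of polarities in {-1, 0, +1}.
    """
    # Eq. (2): brightness change on the logarithmic scale.
    delta_l = np.log(img_curr + eps) - np.log(img_prev + eps)

    # Eq. (1): quantize the change by the positive/negative thresholds +-tau.
    polarity = np.zeros_like(delta_l, dtype=np.int8)
    polarity[delta_l > tau] = 1
    polarity[delta_l < -tau] = -1
    return polarity
```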
2.2 Neural Radiance Fields
Neural radiance fields (Mildenhall et al., 2020) represent a scene utilizing a multi-layer perceptron (MLP) F_θ : (x, d) → (c, σ) that maps a position in 3D space x = (x, y, z) and a 2D viewing direction d = (θ, φ) to its corresponding directional emitted radiance, i.e., its color c = (R, G, B) and volume density σ. From this representation, the estimated emitted radiance L̂ at a given pixel u can be calculated using the volume rendering equation (Tagliasacchi and Mildenhall, 2022)
Figure 2: Illustration of our method. We first train a neural radiance field and subsequently simulate a virtual event camera,
adding noise to the simulation.
with quadrature, as follows:
$$\hat{L}(u) = \sum_{k=1}^{N} T_k \left(1 - \exp(-\sigma_k \delta_k)\right) c_k, \tag{3}$$

$$T_k = \exp\left(-\sum_{m=1}^{k-1} \sigma_m \delta_m\right), \tag{4}$$

where σ_k and c_k are the volume density and the emitted radiance, respectively, of a sampled position x_k along the back-projected ray r through a pixel, which has a direction d and an origin o at the camera center. The sample x_k = o + s_k d has a distance s_k from the camera center and a distance of δ_k = s_{k+1} − s_k to its adjacent sample x_{k+1}.
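As an illustration of Eqs. (3) and (4), the sketch below evaluates the quadrature for a single ray from per-sample densities, colors, and depths; the function and variable names are ours, and the samples are assumed to be provided by the NeRF backbone.

```python
import numpy as np

def render_ray(sigmas, colors, depths):
    """Volume-rendering quadrature of Eqs. (3)-(4) for one ray.

    sigmas: (N,)   volume densities sigma_k at the sampled positions x_k.
    colors: (N, 3) emitted radiance c_k at the sampled positions.
    depths: (N,)   distances s_k of the samples from the camera center.
    Returns the estimated radiance L_hat(u) of the pixel.
    """
    # delta_k = s_{k+1} - s_k; give the last sample a very large spacing.
    deltas = np.diff(depths, append=depths[-1] + 1e10)

    # Eq. (4): transmittance T_k = exp(-sum_{m<k} sigma_m * delta_m).
    accumulated = np.concatenate(([0.0], np.cumsum(sigmas * deltas)[:-1]))
    transmittance = np.exp(-accumulated)

    # Eq. (3): alpha-composite the per-sample contributions.
    alphas = 1.0 - np.exp(-sigmas * deltas)
    weights = transmittance * alphas
    return (weights[:, None] * colors).sum(axis=0)
```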
Several advances have been made since the original NeRF paper was first published. Neural-network-based approaches like (Müller et al., 2022) and (Chen et al., 2022) have greatly reduced inference time and increased 3D reconstruction quality, while methods such as (Kerbl et al., 2023) completely forgo a neural representation and opt for a modified differentiable point-based rendering technique. While we utilize (Müller et al., 2022) as our rendering backbone in this paper, it is worth noting that our method is radiance-field agnostic, meaning that the method used to render the radiance field is interchangeable.
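Since the simulator only needs rendered images at arbitrary poses, any backbone exposing a render-from-pose operation can be swapped in. A minimal sketch of such an interface is given below; the class and method names are hypothetical and do not correspond to a specific library.

```python
from typing import Protocol
import numpy as np

class RadianceField(Protocol):
    """Minimal interface assumed from a rendering backbone, e.g. a NeRF MLP,
    Instant NGP, or a Gaussian-splatting renderer."""

    def render(self, pose: np.ndarray, intrinsics: np.ndarray) -> np.ndarray:
        """Render an H x W x 3 linear-RGB image from a 4x4 camera-to-world pose."""
        ...
```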
2.3 Event Cameras and NeRFs
Recent studies have explored the integration of event
cameras with NeRFs. Notable works such as (Klenk
et al., 2023), (Rudnev et al., 2023), and (Hwang et al.,
2023) have demonstrated promising results in con-
structing radiance field representations directly from
event camera data streams. In contrast, this paper
shifts focus from generating radiance fields to de-
riving event representations from existing radiance
fields.
This approach presents several advantages. By
leveraging the continuous and high-resolution nature
of NeRFs, it becomes possible to simulate event data
from arbitrary viewpoints and under varying condi-
tions without the need for specialized hardware. This
flexibility enables the creation of diverse datasets for
training and evaluating event-based algorithms, which
are often limited by the scarcity and cost of event
cameras.
However, simulating realistic event data from
NeRFs introduces unique challenges. Reproducing
sensor-specific noise and latency effects is essential
for generating data that closely mirrors real-world
conditions, and addressing these challenges is critical
to ensuring that the simulated events are both physi-
cally plausible and useful for downstream tasks.
3 METHOD
3.1 Problem Formulation
Event cameras asynchronously detect changes in
pixel brightness, delivering high temporal resolution
and low-latency data. However, their high cost and
limited availability restrict widespread adoption. Re-
searchers rely on simulators to generate synthetic
event streams, yet existing simulators that rely on
RGB videos are limited to the viewpoints present in
the input sequence, preventing the generation of event
data from novel camera paths.
To address these challenges, we propose a NeRF-
based event camera simulator capable of generating
synthetic event data from arbitrary viewpoints.
The simulation consists of two stages: NeRF
training and event data generation. In the NeRF
training stage, the simulator accepts a set of images I = {I_1, . . . , I_A}. Using I, a neural radiance field F_θ is trained. In the event data generation stage, a camera trajectory (a sequence of camera positions and orientations) C = [c_1, . . . , c_B] is input to the simulator f_sim. Then, the simulator generates an event data stream E
along the given camera path C using the trained neural radiance field F_θ:

$$E = f_{\mathrm{sim}}(C; F_\theta) \tag{5}$$
The overall process flow is illustrated in Fig. 2.
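At a high level, the second stage can be read as a loop over the camera trajectory: render the radiance field at each interpolated pose, convert the rendering to log intensity, and compare it with the previous rendering to emit events. The sketch below illustrates this reading; the helpers interpolate_poses, luma, and generate_pixel_events stand in for the steps detailed in the following subsections and are not the authors' actual implementation.

```python
import numpy as np

def simulate_events(trajectory, field, intrinsics, tau=0.25):
    """Sketch of E = f_sim(C; F_theta): render along the path and emit events."""
    events = []                                       # list of (t, u, p) tuples
    poses, times = interpolate_poses(trajectory)
    log_prev = np.log(luma(field.render(poses[0], intrinsics)) + 1e-6)
    for k in range(1, len(poses)):
        log_curr = np.log(luma(field.render(poses[k], intrinsics)) + 1e-6)
        events += generate_pixel_events(log_prev, log_curr,
                                        times[k - 1], times[k], tau)
        log_prev = log_curr
    return events
```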
3.2 Event Data Generation by Sampling
Radiance Fields
Following the methodology behind ESIM's event generation from 3D models, our method approximates the per-pixel value of the intensity image log I(u_i; t_i) at pixel u_i by using the trained F_θ and a selected camera position interpolated from the given camera path C. The color of each pixel of the image at the sampled camera position is calculated by accumulating the contributions of all sampled points along the ray, following equation (3). Since event cameras operate on pixel brightness, we convert the sampled color images using ITU-R Recommendation BT.601 for luma (Union, 2011), i.e., according to the formula:

$$Y(R, G, B) = 0.299R + 0.587G + 0.114B, \tag{6}$$
with RGB channels in linear color space. This yields
the following equation:
$$I(u_i; t_i) = Y\!\left(\hat{L}(u_i; t_i)\right). \tag{7}$$
We generate a pair of logarithmic intensity images log I(u_i; t_i) and log I(u_i; t_{i−1}) based on user-defined parameters, such as the maximum number of events per camera position, the pixel refractory period, and the brightness change threshold τ.
We can then determine the number of predicted events at a certain pixel location during that time window with the following equation:

$$n_i = \frac{|\Delta L(u_i; t)|}{\tau}. \tag{8}$$
According to this number, events are generated with polarity p_i determined by the sign of ΔL(u_i; t).
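A sketch of the per-pixel conversion and event-count computation of Eqs. (6)-(8) is given below, assuming a rendered linear-RGB frame from the radiance field. Spreading the timestamps uniformly within the inter-frame window is one simple choice for illustration, not necessarily the exact scheme used in the implementation.

```python
import numpy as np

def luma(rgb):
    """Eq. (6): ITU-R BT.601 luma from a linear-RGB image of shape (H, W, 3)."""
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

def generate_pixel_events(log_prev, log_curr, t_prev, t_curr, tau=0.25):
    """Eqs. (7)-(8): count threshold crossings per pixel and emit events."""
    delta_l = log_curr - log_prev
    n = np.floor(np.abs(delta_l) / tau).astype(int)   # Eq. (8), rounded down
    polarity = np.sign(delta_l).astype(int)
    events = []
    for y, x in zip(*np.nonzero(n)):
        # Spread the n events of this pixel uniformly over (t_prev, t_curr].
        for j in range(1, n[y, x] + 1):
            t_j = t_prev + j * (t_curr - t_prev) / n[y, x]
            events.append((t_j, (x, y), int(polarity[y, x])))
    return events
```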
3.3 Event Camera Noise
Although less studied than traditional RGB camera
noise, a data stream from an event camera normally
contains events that are not associated with changes
in intensity. These events are considered noise, which
comes from two main sources: photon noise and leak-
age current (Guo and Delbruck, 2023). In low-
brightness conditions, photon noise is the most com-
mon source of noise, while leakage current domi-
nates high-brightness conditions. In some event cam-
era simulators like (Hu et al., 2021), the events that
are generated by photon noise are modeled as a Pois-
son process, in which the noise event rate linearly de-
creases with intensity. Further research on modeling
these noise sources was performed by Cao et al. (Cao et al., 2024), and we leverage their noise model in our experiments.
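As an illustration of the photon-noise component alone, the sketch below injects Poisson-distributed noise events whose rate decreases linearly with intensity, as described above; the rate values are placeholders rather than the calibrated parameters of (Cao et al., 2024).

```python
import numpy as np

def add_photon_noise_events(intensity, t_start, t_end,
                            rate_dark=5.0, rate_bright=0.1, rng=None):
    """Add Poisson noise events over a time window.

    intensity: H x W luma image normalized to [0, 1].
    rate_dark, rate_bright: noise events per pixel per second at intensity
    0 and 1, respectively (placeholder values, not calibrated).
    """
    rng = np.random.default_rng() if rng is None else rng
    duration = t_end - t_start
    # The noise event rate decreases linearly with intensity.
    rate = rate_dark + (rate_bright - rate_dark) * intensity
    counts = rng.poisson(rate * duration)
    events = []
    for y, x in zip(*np.nonzero(counts)):
        for _ in range(counts[y, x]):
            t = rng.uniform(t_start, t_end)
            p = int(rng.choice([-1, 1]))   # noise events get a random polarity
            events.append((t, (x, y), p))
    return events
```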
4 EVALUATION
4.1 Experimental Settings
We utilize a modified version of Instant NGP (Müller
et al., 2022) implemented in PyTorch as our NeRF
backbone. Each scene of the dataset was trained
for 350 epochs with an initial learning rate of 0.01
and with the Adam optimizer. We conduct our
experiments on the dataset provided by Mueggler
et al. (Mueggler et al., 2017) for our comparisons
since it contains images generated by a DAVIS sen-
sor (Brandli et al., 2014), which are used to train
the radiance field, as well as camera positions from
an external tracker, eliminating the need to use
COLMAP (Schönberger and Frahm, 2016) for cam-
era pose estimation.
To perform our tests, we interpolate five equidistant positions between consecutive camera poses along the initial camera path, akin to the frame interpolation performed by V2E.
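The interpolation of intermediate poses can be performed, for instance, by linearly interpolating camera positions and spherically interpolating (slerp) orientations. The sketch below uses SciPy's rotation utilities; this is our own choice of tooling for illustration.

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def interpolate_camera_path(positions, quaternions, steps_between=5):
    """Insert `steps_between` equidistant poses between consecutive key poses.

    positions:   (B, 3) camera centers.
    quaternions: (B, 4) orientations as (x, y, z, w) quaternions.
    Returns the interpolated positions and rotations along the path.
    """
    key_times = np.arange(len(positions))
    slerp = Slerp(key_times, Rotation.from_quat(quaternions))
    # steps_between additional samples strictly between each pair of key poses.
    query = np.linspace(0, len(positions) - 1,
                        (len(positions) - 1) * (steps_between + 1) + 1)
    interp_pos = np.stack([np.interp(query, key_times, positions[:, d])
                           for d in range(3)], axis=1)
    return interp_pos, slerp(query)
```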
4.2 Evaluation Metrics
For evaluation, since it is difficult to directly compare the generated event data and the ground truth, we accumulate events into an image and perform an image-level comparison.
First, an accumulation operation is performed on
both the ground truth and simulated event streams to
generate a frame representation. The accumulation
operation integrates events over time on a frame-by-frame basis, aggregating the changes captured by the sensor. As shown in (Mueggler et al., 2017), a logarithmic intensity image log Î(u; t) can be reconstructed from the event stream at any point in time t by accumulating events e_i = (t_i, u_i, p_i) according to the following function:
$$\log \hat{I}(u; t) = \log I(u; 0) + \gamma(u; t), \tag{9}$$

$$\gamma(u; t) = \sum_{0 < t_i \leq t} p_i\, \tau\, \delta(u - u_i)\, \delta(t - t_i), \tag{10}$$

where I(u; 0) is the rendered image at time t = 0, and δ selects the pixel to be updated on every event (pixel u_i of Î is updated at time t_i).
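A sketch of this accumulation applied to a list of events is given below; it follows Eqs. (9)-(10) directly, with our own function and variable names.

```python
import numpy as np

def accumulate_events(events, log_i0, t_end, tau=0.25):
    """Eqs. (9)-(10): reconstruct log I_hat(u; t_end) from an event stream.

    events: iterable of (t_i, (x_i, y_i), p_i) tuples.
    log_i0: H x W log-intensity image rendered at time t = 0.
    """
    log_img = log_i0.copy()
    for t_i, (x, y), p in events:
        if 0 < t_i <= t_end:
            log_img[y, x] += p * tau    # each event shifts the pixel by +-tau
    return log_img
```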
Figure 3: Comparison of event streams: (a) ground truth, (b) ours (no added noise), (c) ours (with added noise). Positive and negative events are colored red and green, respectively.
We utilize a modified version of this function, which applies a decay parameter to reduce the noise of the generated frame. The accumulator function applies an exponential decay d(t, τ) to equation (9):

$$\log \hat{I}(u; t) = \log\!\left( I(u; 0)\, d(t, \tau) + I(u; t)\left(1 - d(t, \tau)\right) + \sum_{0 < t_i \leq t} p_i\, \tau\, \delta(u - u_i)\, d(t - t_i, \tau) \right), \tag{11}$$

$$d(t, \tau) = \exp\left(-\frac{t}{\tau}\right), \tag{12}$$
Figure 4: Zoomed-in view of a specific region from Figure 3, highlighting finer details of the event streams: (a) ground truth, (b) ours (no added noise), (c) ours (with added noise).
where log I(u; 0) is the logarithm of the intensity of the pixel at the previous accumulated frame, log I(u; t) is a neutral potential, and the decay parameter is the time constant τ. For our experiments we set τ = 1 × 10^5 microseconds and log I(u; 0) = 0.5.
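One possible reading of the decayed accumulator of Eqs. (11)-(12) for a single frame interval is sketched below; `neutral` plays the role of I(u; t) in Eq. (11), `tau_c` is the decay time constant, and the structure is our own interpretation of the formula rather than the exact implementation.

```python
import numpy as np

def decayed_accumulate(events, log_prev, t_elapsed, tau_c=1e5, tau=0.25,
                       neutral=0.5):
    """Eqs. (11)-(12): decayed event accumulation over one frame interval.

    events:    iterable of (t_i, (x_i, y_i), p_i) with 0 < t_i <= t_elapsed.
    log_prev:  H x W log intensity of the previously accumulated frame.
    tau_c:     decay time constant in microseconds; tau: contrast threshold.
    neutral:   neutral potential standing in for I(u; t) in Eq. (11).
    """
    def d(dt):                                   # Eq. (12)
        return np.exp(-dt / tau_c)

    # Decay the previous frame toward the neutral potential.
    frame = log_prev * d(t_elapsed) + neutral * (1.0 - d(t_elapsed))
    # Each event contributes p_i * tau, attenuated by the time since it fired.
    for t_i, (x, y), p in events:
        frame[y, x] += p * tau * d(t_elapsed - t_i)
    return frame
```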
Figure 5: Visual comparison of accumulated frames: (a) ground truth, (b) ours, (c) V2E (Hu et al., 2021), (d) V2CE (Zhang et al., 2024). All frames were obtained after accumulating events according to the process described in section 4.2.
Table 1: Comparison of PSNR (dB) values obtained in scenes from the dataset (Mueggler et al., 2017) (higher is better).

Scene name    Ours     V2E      V2CE
slider        30.01    29.40    29.42
boxes 6DoF    28.32    28.06    28.15
poster        28.04    28.57    28.64
4.2.1 Qualitative Comparison
We compare frame-level results with the real event
camera stream (ground truth). We also compare our accumulated results and those of V2E and V2CE against the real event camera stream (accumulated ground truth).
4.2.2 PSNR of Accumulated Event Frames
To quantitatively measure the correctness of the simulated events, we perform an evaluation based on the peak signal-to-noise ratio (PSNR), a well-known image quality metric (Horé and Ziou, 2010). These PSNR comparisons are summarized in Tab. 1.
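For completeness, the PSNR between an accumulated simulated frame and the accumulated ground-truth frame can be computed as sketched below, assuming both frames are normalized to the same intensity range.

```python
import numpy as np

def psnr(reference, test, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)
```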
4.3 Experimental Results
As demonstrated in Fig. 3, our simulator correctly ap-
proximates the positive and negative events measured
by an actual event camera. It is worth noting that, because noise and hot-pixel simulation are not both included in our experiments, some areas of the simulation appear to register no information; a zoomed-in view of an extreme case is illustrated in Fig. 4.
An example of the qualitative results of the accu-
mulated images is shown in Fig. 5.
While this paper primarily focuses on the applica-
tion of radiance fields for static scene reconstruction,
it is important to note several limitations and potential
avenues for future research.
Radiance fields are able to reconstruct dynamic scenes; however, the NeRF backbone utilized in our experiments cannot represent them, so we leave their implementation as a task for future research.
Our simulator, by its design, does not rely on a
specific representation of radiance fields. This flexi-
bility allows for easy integration with alternative ren-
dering techniques such as Gaussian splatting (Kerbl
et al., 2023).
While our simulator demonstrates promising re-
sults in controlled environments, generalizing these
findings to real-world applications presents additional
challenges. Factors such as varying lighting condi-
tions, occlusions, and reflective surfaces can signifi-
cantly impact the performance and accuracy of radi-
ance field reconstruction.
5 CONCLUSION
In this paper, we introduced a novel method for
event camera simulation using neural radiance fields.
Our approach leverages the capabilities of NeRFs to
synthesize novel views of complex scenes, enabling
the generation of realistic and diverse event camera
data from arbitrary viewpoints. Experimental results
demonstrate that our simulator matches or outper-
forms existing methods in terms of accuracy and re-
alism, providing a valuable tool for the development
and evaluation of event-based vision algorithms. The
key contributions of this work include the integration
of NeRFs with event-based sensing principles and the
development of a versatile and efficient event camera
simulator. We believe that this method represents a
significant advancement in the field of event camera
simulation, making this technology more accessible
to researchers and developers.
REFERENCES
Brandli, C., Berner, R., Yang, M., Liu, S.-C., and Delbruck, T. (2014). A 240 × 180 130 dB 3 µs latency global shutter spatiotemporal vision sensor. IEEE Journal of Solid-State Circuits, 49(10):2333–2341.
Cao, R., Galor, D., Kohli, A., Yates, J. L., and Waller, L.
(2024). Noise2image: Noise-enabled static scene re-
covery for event cameras.
Chen, A., Xu, Z., Geiger, A., Yu, J., and Su, H. (2022).
Tensorf: Tensorial radiance fields. In European Con-
ference on Computer Vision (ECCV).
García, G. P., Camilleri, P., Liu, Q., and Furber, S. (2016).
pydvs: An extensible, real-time dynamic vision sensor
emulator using off-the-shelf hardware. In IEEE Sym-
posium Series on Computational Intelligence, pages
1–7.
Gehrig, D., Gehrig, M., Hidalgo-Carrió, J., and Scaramuzza, D. (2020). Video to events: Recycling video datasets for event cameras. In IEEE/CVF Conference on Computer Vision and Pattern Recognition.
Guo, S. and Delbruck, T. (2023). Low cost and latency
event camera background activity denoising. IEEE
Transactions on Pattern Analysis and Machine Intel-
ligence, 45(1):785–795.
Horé, A. and Ziou, D. (2010). Image quality metrics: PSNR vs. SSIM. In International Conference on Pattern Recognition, pages 2366–2369.
Hu, Y., Liu, S. C., and Delbruck, T. (2021). v2e: From
video frames to realistic DVS events. In IEEE/CVF
Conference on Computer Vision and Pattern Recogni-
tion Workshops (CVPRW). IEEE.
Hwang, I., Kim, J., and Kim, Y. M. (2023). Ev-nerf:
Event based neural radiance field. In Proceedings of
the IEEE/CVF Winter Conference on Applications of
Computer Vision (WACV), pages 837–847.
Joubert, D., Marcireau, A., Ralph, N., Jolley, A., van
Schaik, A., and Cohen, G. (2021). Event camera
simulator improvements via characterized parameters.
Frontiers in Neuroscience, 15.
Kerbl, B., Kopanas, G., Leimkühler, T., and Drettakis,
G. (2023). 3d gaussian splatting for real-time radi-
ance field rendering. ACM Transactions on Graphics,
42(4).
Klenk, S., Koestler, L., Scaramuzza, D., and Cremers, D.
(2023). E-nerf: Neural radiance fields from a moving
event camera. IEEE Robotics and Automation Letters.
Li, W., Saeedi, S., McCormac, J., Clark, R., Tzoumanikas,
D., Ye, Q., Huang, Y., Tang, R., and Leutenegger, S.
(2018). Interiornet: Mega-scale multi-sensor photo-
realistic indoor scenes dataset. In British Machine Vi-
sion Conference.
Mildenhall, B., Srinivasan, P. P., Tancik, M., Barron, J. T.,
Ramamoorthi, R., and Ng, R. (2020). Nerf: Repre-
senting scenes as neural radiance fields for view syn-
thesis. In ECCV.
Mueggler, E., Rebecq, H., Gallego, G., Delbruck, T., and Scaramuzza, D. (2017). The event-camera dataset and simulator: Event-based data for pose estimation, visual odometry, and SLAM. International Journal of Robotics Research, 36:142–149.
Müller, T., Evans, A., Schied, C., and Keller, A. (2022).
Instant neural graphics primitives with a multiresolu-
tion hash encoding. ACM Trans. Graph., 41(4):102:1–
102:15.
Rebecq, H., Gehrig, D., and Scaramuzza, D. (2018). ESIM:
an open event camera simulator. Conference on Robot
Learning (CoRL).
Rudnev, V., Elgharib, M., Theobalt, C., and Golyanik, V.
(2023). Eventnerf: Neural radiance fields from a sin-
gle colour event camera. In Computer Vision and Pat-
tern Recognition (CVPR).
Schönberger, J. L. and Frahm, J.-M. (2016). Structure-
from-motion revisited. In IEEE Conference on Com-
puter Vision and Pattern Recognition.
Tagliasacchi, A. and Mildenhall, B. (2022). Volume rendering digest (for NeRF). arXiv:2209.02417 [cs].
Union, I. T. (2011). Recommendation ITU-R BT.601-7: Studio encoding parameters of digital television for standard 4:3 and wide-screen 16:9 aspect ratios. https://www.itu.int/rec/R-REC-BT.601.
Zhang, Z., Cui, S., Chai, K., Yu, H., Dasgupta, S., Mahbub,
U., and Rahman, T. (2024). V2ce: Video to continu-
ous events simulator.
Zhu, A. Z., Wang, Z., Khant, K., and Daniilidis, K. (2019).
Eventgan: Leveraging large scale image datasets for
event cameras. arXiv preprint arXiv:1912.01584.