recognition tasks and leads to a degradation in
recognition performance.
4.6 Discussion
Our approach improves accuracy without retraining
the recognition model by incorporating exposure
recovery and noise reduction into the image
processing pipeline. This suggests that the proposed
method retrieves features that Lee et al.'s pose
estimation model would otherwise overlook. The two
operators are sketched below.
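The following is a minimal sketch, in PyTorch, of differentiable exposure recovery and noise reduction. The exact operators in our module may differ; here exposure recovery is modeled as a gain-and-gamma curve and noise reduction as depthwise Gaussian smoothing, chosen only because both keep gradients flowing to the hyperparameters. The parameter names (gain, gamma, sigma) are assumptions for this sketch.

```python
import torch
import torch.nn.functional as F

def adjust_exposure(img, gain, gamma):
    """Brighten a low-light image (N, C, H, W) in [0, 1] with a gain/gamma curve."""
    return torch.clamp(gain * img.clamp(min=1e-6) ** gamma, 0.0, 1.0)

def gaussian_denoise(img, sigma, ksize=5):
    """Differentiable stand-in for a denoiser: depthwise Gaussian smoothing.

    `sigma` is a per-image scalar; the kernel is built on the fly so that
    gradients flow back to `sigma`.
    """
    half = ksize // 2
    x = torch.arange(-half, half + 1, dtype=img.dtype, device=img.device)
    k1d = torch.exp(-(x ** 2) / (2.0 * sigma ** 2))
    k1d = k1d / k1d.sum()
    kernel = (k1d[:, None] * k1d[None, :]).expand(
        img.shape[1], 1, ksize, ksize).contiguous()
    return F.conv2d(img, kernel, padding=half, groups=img.shape[1])
```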
To explore the potential performance of the proposed
method, we conducted a grid search over the entire
ExLPose test set. For each input image, we searched
for the optimal parameters, processed the image with
the differentiable image processing module, and fed
the processed image into Lee et al.'s recognition
model. The results are presented in Table 4. As is
evident from this preliminary experiment, pose
estimation accuracy improved significantly across all
subsets, which suggests that the performance of the
proposed method can be enhanced further. Refining
the training method for the optimal parameter
predictor remains future work. A sketch of the
per-image search follows.
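Below is a minimal sketch of the per-image grid search, assuming PyTorch. The parameter names (gain, sigma), the candidate grids, and the helpers process (the image processing module) and loss_fn are illustrative assumptions; pose_model stands for Lee et al.'s frozen recognition model. To establish an upper bound, each candidate is scored here with the pose loss against the ground-truth annotations.

```python
import itertools
import torch

@torch.no_grad()
def grid_search(img, target, process, pose_model, loss_fn):
    """Find the per-image (gain, sigma) that minimizes the frozen pose loss."""
    gains = torch.linspace(1.0, 8.0, steps=8)    # exposure gain candidates (assumed grid)
    sigmas = torch.linspace(0.1, 2.0, steps=8)   # denoising strength candidates (assumed grid)
    best_params, best_loss = None, float("inf")
    for gain, sigma in itertools.product(gains, sigmas):
        restored = process(img, gain=gain, sigma=sigma)
        loss = loss_fn(pose_model(restored), target).item()
        if loss < best_loss:
            best_params, best_loss = (gain.item(), sigma.item()), loss
    return best_params, best_loss
```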
5 CONCLUSIONS
We proposed an image-adaptive learnable module
that improves pose estimation performance in
low-light environments without retraining the
pretrained recognition model. The proposed method
consists of a differentiable image processing module
and an optimal parameter predictor. The differentiable
image processing module restores exposure and
removes noise from low-light images to recover their
latent content. The optimal parameter predictor, a
small FCN, predicts the optimal hyperparameters
used by the processing module. The entire framework
was trained end-to-end, and the optimal parameter
predictor learned to predict appropriate
hyperparameters by referring only to the loss of the
downstream pose estimation task. The experimental
results demonstrated that our approach achieved an
accuracy recovery of up to 11.1% for pretrained pose
estimation models across different levels of low-light
image data. The training step is sketched below.
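A minimal sketch of the end-to-end training step, assuming PyTorch, is given below. The names predictor (the small FCN), process (the differentiable image processing module), and loss_fn are illustrative assumptions. The pretrained pose model stays frozen, matching the no-retraining setting, so the pose loss updates only the predictor.

```python
import torch

def train_step(imgs, targets, predictor, process, pose_model, loss_fn, optim):
    """One end-to-end step: only the parameter predictor is updated."""
    pose_model.eval()                      # pretrained model is never retrained
    for p in pose_model.parameters():
        p.requires_grad_(False)
    params = predictor(imgs)               # small FCN -> per-image hyperparameters
    restored = process(imgs, params)       # differentiable exposure/denoise ops
    loss = loss_fn(pose_model(restored), targets)
    optim.zero_grad()
    loss.backward()                        # gradients reach only the predictor
    optim.step()
    return loss.item()
```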
REFERENCES
Land, E. H. (1977). The retinex theory of color vision.
Scientific American, 237(6), 108-129.
Lore, K. G., Akintayo, A., & Sarkar, S. (2017). LLNet: A
deep autoencoder approach to natural low-light image
enhancement. Pattern Recognition, 61, 650-662.
Shen, L., Yue, Z., Feng, F., Chen, Q., Liu, S., & Ma, J.
(2017). MSR-net: Low-light image enhancement using
deep convolutional network. arXiv preprint
arXiv:1711.02488.
Lv, F., Lu, F., Wu, J., & Lim, C. (2018, September).
MBLLEN: Low-Light Image/Video Enhancement
Using CNNs. In BMVC (Vol. 220, No. 1, p. 4).
Wei, C., Wang, W., Yang, W., & Liu, J. (2018). Deep
retinex decomposition for low-light enhancement.
arXiv preprint arXiv:1808.04560.
Zhang, Y., Zhang, J., & Guo, X. (2019, October). Kindling
the darkness: A practical low-light image enhancer. In
Proceedings of the 27th ACM international conference
on multimedia (pp. 1632-1640).
Lee, S., Rim, J., Jeong, B., Kim, G., Woo, B., Lee, H., ... &
Kwak, S. (2023). Human pose estimation in extremely
low-light conditions. In Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern
Recognition (pp. 704-714).
Tomasi, C., & Manduchi, R. (1998, January). Bilateral
filtering for gray and color images. In Sixth
international conference on computer vision (IEEE Cat.
No. 98CH36271) (pp. 839-846). IEEE.
Liu, W., Ren, G., Yu, R., Guo, S., Zhu, J., & Zhang, L.
(2022, June). Image-adaptive YOLO for object
detection in adverse weather conditions. In Proceedings
of the AAAI Conference on Artificial Intelligence (Vol.
36, No. 2, pp. 1792-1800).
Mosleh, A., Sharma, A., Onzon, E., Mannan, F., Robidoux,
N., & Heide, F. (2020). Hardware-in-the-loop end-to-
end optimization of camera image processing pipelines.
In Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition (pp. 7529-
7538).