DNN Layers Features Reduction for Out-of-Distribution Detection
Mikhaël Presley Kibinda-Moukengue, Alexandre Baussard and Pierre Beauseroy
Computer Science and Digital Society Laboratory (LIST3N), Université de Technologie de Troyes, Troyes, France
{presley.kibinda, alexandre.baussard, pierre.beauseroy}@utt.fr
ORCID: https://orcid.org/0009-0007-1455-6754 (M. P. Kibinda-Moukengue), https://orcid.org/0000-0002-6693-4282 (A. Baussard), https://orcid.org/0000-0002-2883-1303 (P. Beauseroy)
Keywords: Deep Neural Networks, Out-of-Distribution, Detection, Statistical Hypothesis Tests.
Abstract: Decision-making in a number of industries, including environmental management, transportation, and public health, is greatly aided by artificial intelligence systems. Nonetheless, to perform well, these systems require certain usage conditions to be met. For instance, the data fed into a classification neural network must come from the same distribution as the training data to maintain the performance measured during testing. In practice, however, this condition is not always met and is not easy to guarantee. In particular, for image recognition, it is possible to submit images that do not contain any learned class and still receive a firm response from the network. This paper presents an approach to out-of-distribution observation detection applied to deep neural networks (DNNs) for image classification, called DNN Layers Features Reduction for Out-Of-Distribution Detection (DROOD). The principle of DROOD is to construct a decision statistic by successively synthesizing information from the features of all the intermediate layers of the classification network. The method is adaptable to any DNN architecture, and experiments show results that outperform reference methods.
1 INTRODUCTION
From data collection to results deployment, machine learning models, particularly deep neural networks (DNNs), are increasingly used in image classification tasks with remarkable performance. The traditional use of these models requires the training and test samples to be drawn independently and identically distributed. In real-life applications, this condition is not always satisfied. Thus, during inference, testing an observation whose distribution differs from the distribution of the training data, also known as the in-distribution (ID), will produce random, erroneous, or even overconfident predictions. Such an observation is called out-of-distribution (OOD). In recent years, several approaches have been proposed to deal with this problem. Many of them focus on OOD detection for deep learning models (Lee et al., 2018), (Sastry and Oore, 2020), (Kaur et al., 2022). Two groups of OOD detection approaches can be defined: (i) integrated approaches, which directly integrate the detector when training the model by modifying the network architecture or the loss function (Malinin and Gales, 2018), (Winkens et al., 2020), (Zhang et al., 2020); (ii) post-hoc approaches, which integrate the detector at inference
without modifying the weights of the trained network (Lee et al., 2018), (Zisselman and Tamar, 2020), (Raghuram et al., 2021). The integrated approach is inherently computationally intensive, as it requires both calibrating the network weights and performing OOD detection simultaneously during training. The post-hoc approach can provide a significant advantage since it does not require retraining the network for detection. This allows detection to be implemented and adjusted without affecting the classifier's performance. Several works have applied OOD detection using information from the last layer of the neural network (Hendrycks and Gimpel, 2017), (Liu et al., 2020), (Sun and Li, 2022). This last layer is very important in image classification tasks because it is used to make the decision: it assigns a class to the image based on the output probabilities for a given input sample. These approaches have shown that applying OOD detection on the last layer of the neural network enables a good separation between in-distribution and out-of-distribution data. Other OOD detection approaches take advantage of each layer of the neural network (Dziedzic et al., 2022), (Li et al., 2024). In a DNN, each layer plays a distinct role in processing data, and the extent of its contribution can vary from one layer to another. Applying detection to the outputs of each layer allows all the sensitivities and crucial information of the data to be taken into account for robust decision-making.
Figure 1: Overview of the DROOD approach for OOD detection in CNN classifiers. The blue arrows model the flow of the training data, the green arrows the flow of the validation data, and the red arrows the flow of any data to be tested. The main steps of the DROOD approach are framed by dotted lines of different colors, also used in Figure 2, which details the DROOD flow using the equations.
The literature shows a growing trend in OOD detection methods that leverage decision statistics, achieving notable performance improvements, as seen with the MaSF (Max-Simes-Fisher) method (Haroush et al., 2022). The proposed DNN Layers Features Reduction for Out-Of-Distribution Detection (DROOD) method is part of the post-hoc approach to OOD detection. It can be applied to any pre-trained neural network, as it requires only the extraction of the features from all layers of the DNN to detect OOD data. Similar to MaSF, DROOD conducts OOD detection by applying decision statistics across each layer of the DNN. DROOD uses these statistics to measure the distance between elements within a space that represents the class-conditional log-densities. In Figure 1, the main steps of the DROOD approach are depicted using a convolutional neural network (CNN) architecture (an equivalent diagram can be drawn up for transformers).
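As a post-hoc method, DROOD only needs the intermediate feature maps of a frozen, pre-trained classifier. The following minimal sketch shows one common way to collect them in PyTorch with forward hooks; hooking every residual block of a torchvision ResNet34 is an illustrative assumption, not the exact layer selection used in the paper.

```python
import torch
import torchvision

# Sketch: collect per-layer feature maps from a frozen classifier with
# forward hooks, as any post-hoc detector requires. Hooking every
# BasicBlock of a torchvision ResNet34 is an assumption for illustration.
model = torchvision.models.resnet34(weights=None).eval()
features = {}

def make_hook(name):
    def hook(module, inputs, output):
        features[name] = output.detach()  # shape (batch, n_l, h_l, w_l)
    return hook

for name, module in model.named_modules():
    if isinstance(module, torchvision.models.resnet.BasicBlock):
        module.register_forward_hook(make_hook(name))

with torch.no_grad():
    _ = model(torch.randn(1, 3, 32, 32))  # CIFAR10-sized input
print({name: tuple(f.shape) for name, f in features.items()})
```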
Overall, the contributions in this paper can be summarised as follows:
• We propose, for CNN classification models, to reduce each channel of each layer by estimating the probability density of the pattern of the channel input that best fits the trained convolutional filter. Equivalently, for vision transformer models, we propose to reduce the image tokens of each layer (or encoding block) by estimating the probability density of the image token pattern with the strongest response. To reduce channels, assuming their independence, we estimate their joint log-distribution per layer.
• To characterize an image, we evaluate its class-conditional log-probability density at each layer of the network. The detection statistic is deduced, in the obtained representation space, from the average nearest neighbors' distances between the test image and the training ones.
• Overall, the DROOD approach demonstrates promising performance on out-of-distribution datasets, both close to and far from the training data distribution, outperforming the methods used for comparison in this paper.
This paper is organized as follows: Section 2 re-
views recent advancements in out-of-distribution de-
tection. Section 3 describes the proposed DROOD
approach. Section 4 gives some implementation de-
tails and presents the experimental results on several
datasets and for several DNN classification models.
Finally, Section 5 presents the conclusion and perspectives for future work.
2 RELATED WORK
Detecting out-of-distribution data is essential for ensuring the safety and reliability of machine learning systems. Various methods address this challenge, depending on how anomalies are defined. Effective OOD detection requires detectors that can accurately reject inputs that deviate from the training distribution while accepting those within it.
Recently, advanced statistical methods for OOD detection in deep neural networks have been developed. The MLOD (Multitesting-based Layer-wise Out-of-Distribution Detection) approach (Li et al., 2024) is one of them. It extracts feature maps across different layers of a model, applies multiple statistical hypothesis testing techniques to control the True Positive Rate (TPR), and computes p-values based on the empirical distribution of the score function across different layers for decision-making. Another approach, called p-DkNN (Dziedzic et al., 2022), performs statistical tests on latent representations of a pre-trained CNN model. p-DkNN builds on a theoretical analysis of Neyman-Pearson classification and combines it with recent works in selective classification (reject option). The main idea of this approach is to abstain from predicting on out-of-distribution samples while maintaining high precision on the in-distribution datasets.
Among the various existing methods, we focused on four for comparison in this paper, namely ODIN (Liang et al., 2018), Mahalanobis (Lee et al., 2018), OpenPCS-Class (Carvalho et al., 2023), and MaSF (Haroush et al., 2022). ODIN primarily targets the behavior of the final layer by modifying it and pre-processing the input samples. The first modification involves adjusting or controlling the distribution of the network's output probabilities, while the second focuses on assessing the impact of the gradients calculated during training on perturbations applied to the input samples. The Mahalanobis detector is an approach that extracts feature maps of the training data classes across the layers of a pre-trained CNN, assuming these class-conditional feature maps follow a Gaussian distribution to estimate their probability density. The detection score is then computed using the Mahalanobis distance of each test sample with respect to the closest class-conditional distribution. OpenPCS-Class applies principal component analysis (PCA) to project the features of the model's intermediate layers, assuming that the reduced features follow a Gaussian distribution, and then computes the log-likelihood for decision-making purposes. Finally, MaSF is based on test statistics for OOD detection in CNNs that use all the intermediate layer features. The process consists of spatial and channel reduction techniques that produce statistics per layer, and these statistics are combined to define a detection score. The statistical tests are based on the Simes and Fisher tests.
The proposed DROOD approach is based on a statistical framework similar to that of MaSF and is described in the next section. Note that we present the DROOD method considering CNN-based classification models, but this does not affect its generality: it can easily be transposed to transformer models, which are also considered in the experiments.
3 PROPOSED APPROACH
This paper seeks to address the out-of-distribution de-
tection problem through the use of statistical hypoth-
esis testing. The hypothesis to be tested is as follows:
$$\begin{cases} H_0 : X \sim P_{\text{train}} \\ H_1 : X \nsim P_{\text{train}} \end{cases}$$
where $P_{\text{train}}$ represents the training distribution.
As illustrated in Figure 1, for CNN-based classification models, the DROOD approach is an OOD detection method that summarizes the information brought by all channels of a classification network in a detection statistic. The statistic is built step by step, synthesising each channel, then aggregating all channel syntheses by layer, and finally bringing all the layers together in a single statistic to decide whether an input image is in or out of $P_{\text{train}}$. The proposed process to obtain the final decision statistic, considered at macro scale, is similar to MaSF, but the nature of the determined statistics is different and the assumptions that underlie the reduction process are also different.
Following Figure 1, the next sections provide ana-
lytical details of the DROOD steps. We can also refer
to Figure 2, which gives a detailed description of the
steps sequence, including the analytical expressions.
3.1 Preliminary Notations
Let $\mathcal{X}_{\text{train}} = \{(X_{\text{train}}, y)\}$ be the training set, composed of $N_{\text{train}}$ images drawn from $C_{\text{train}}$ classes, and $\mathcal{X}^c_{\text{train}} = \{(X_{\text{train}}, y) \mid y = c\}$ its restriction to class $c$. In the paper, $X^{c,i}_{\text{train}}$ denotes the $i$-th image of $\mathcal{X}^c_{\text{train}}$, with $i$ from 1 to $n^c_{\text{train}}$, the cardinal of $\mathcal{X}^c_{\text{train}}$.
By extension, the validation samples gathered in $\mathcal{X}_{\text{val}}$ are drawn from the same distribution as the training samples in $\mathcal{X}_{\text{train}}$ and are referred to as in-distribution. The class labels in the two sets are the same, so $C_{\text{val}} = C_{\text{train}}$. These sets are used to learn the parameters of the DROOD method. Next, new images are tested to decide whether they are in-distribution or not. To keep similar notations, the $i$-th image of the test set $\mathcal{X}_{\text{test}}$ is noted $X^{c,i}_{\text{test}}$, where $c$ initially corresponds to the "unknown" label. The DNN classification model will predict the class $c \in \{1, \dots, C_{\text{train}}\}$ (each value corresponds to a given category) if the image is detected as in-distribution; otherwise, the image is out-of-distribution and the class $c$ remains "unknown". Also note that the so-called $C_{\text{test}}$ classes in the test set (which are in fact unknown) are of course not all the same as the $C_{\text{train}}$ classes in $\mathcal{X}_{\text{train}}$, to enable out-of-distribution performance analysis of the DROOD approach.
Figure 2: Details of the different steps in the DROOD approach. Frame colors correspond to those in Figure 1. Colored arrows represent the data flow as shown in Figure 1.
The main goal of OOD detection is to decide whether an image $X$ input to a given DNN classification model is in-distribution or not. To this end, we propose a method that uses the features of all the intermediate layers of the DNN. A typical DNN is composed of $L$ layers, and each layer $l \in \{1, \dots, L\}$ consists of $n_l$ channels. Let $F_{j,l} : X \mapsto \mathbb{R}^{w_l \times h_l}$ be the $j$-th channel in layer $l$, where $h_l$ and $w_l$ refer to the size of the channels in layer $l$.
3.2 Spatial Reduction
The first step of DROOD is to produce a statistic for each channel at each layer (light blue frame in Figure 1 and in the first two black frames of Figure 2). A spatial reduction is first applied to summarize the information contained in each channel. Following (Haroush et al., 2022), the maximum value is considered for this purpose:
$$t^{c,i}_{j,l} = \max F_{j,l}(X^{c,i}_k).$$
$t^{c,i}_{j,l}$ corresponds to the largest response amplitude for channel $j$ at layer $l$ and for the image $i$ of class $c$ in the set $k \in \{\text{train}, \text{val}, \text{test}\}$. It relates to the idea of matched filters and can be interpreted as the value at the position in the input features that best fits the trained filter.
Next, the probability $q^{c',i}_{j,l}$ of obtaining a value of the statistic $t^{c'}_{j,l}$ more extreme than the observed one $t^{c,i}_{j,l}$ is estimated for all classes $c'$. Each probability captures how atypical the obtained value is compared to the distribution of $t^{c'}_{j,l}$ for the images of class $c'$. It can be formalized as:
$$q^{c',i}_{j,l} = \min\!\left(\hat{F}(t^{c,i}_{j,l} \mid \mathcal{X}^{c'}_{\text{train}}),\ 1 - \hat{F}(t^{c,i}_{j,l} \mid \mathcal{X}^{c'}_{\text{train}})\right),$$
where $\hat{F}(\cdot)$ is the empirical cumulative distribution function of $t^{c'}_{j,l}$, determined using the training samples $\mathcal{X}^{c'}_{\text{train}}$. Note that $q^{c',i}_{j,l}$ can be interpreted as a p-value in the context of a two-sided test.
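As a sketch of these two operations, the following NumPy snippet computes the per-channel maximum and the two-sided empirical p-value against the training statistics of one class; the function names and array shapes are illustrative assumptions.

```python
import numpy as np

def spatial_reduction(feature_map):
    """Max over spatial positions: (n_channels, h_l, w_l) -> (n_channels,)."""
    return feature_map.reshape(feature_map.shape[0], -1).max(axis=1)

def two_sided_pvalue(t_obs, t_train):
    """Two-sided empirical p-value q = min(F_hat, 1 - F_hat) of an observed
    channel maximum t_obs against the training statistics t_train (1-D array
    of per-image maxima of the same channel for one class c')."""
    f_hat = np.mean(np.asarray(t_train) <= t_obs)  # empirical CDF at t_obs
    return min(f_hat, 1.0 - f_hat)
```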
3.3 Probability Density Estimator
To perform channel reduction, the density $f^c_{j,l}(q)$ for each class $c$ is needed (light green frame in Figure 1 and first black frame in Figure 2). Its Parzen estimator $\hat{f}^c_{j,l}(q \mid Q^c_{\text{train},j,l})$, based on the set $Q^c_{\text{train},j,l} = \{q^{c,i}_{j,l}\}_{i \le n^c_{\text{train}}}$, is then determined using a Gaussian kernel and cross-validation parameter estimation. These estimators $\{\hat{f}^c_{j,l}\}_{c \in \{1,\dots,C_{\text{train}}\}}$ (outputs of the first black frame in Figure 2) are used during channel reduction to estimate conditional probability densities, as described in the next section.
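A minimal sketch of such an estimator, assuming scikit-learn's KernelDensity with a cross-validated bandwidth (the bandwidth grid is an illustrative choice, not a value from the paper):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

def fit_parzen(q_train, bandwidths=np.logspace(-3, 0, 20)):
    """Gaussian-kernel Parzen estimator of the density of the q-values of one
    channel for one class; the bandwidth is selected by cross-validation on
    the training q-values (the bandwidth grid is an assumption)."""
    grid = GridSearchCV(KernelDensity(kernel="gaussian"),
                        {"bandwidth": bandwidths}, cv=5)
    grid.fit(np.asarray(q_train).reshape(-1, 1))
    return grid.best_estimator_

# Evaluating the density: score_samples returns log f_hat(q), so
# f_hat = np.exp(kde.score_samples(np.array([[q_new]]))).
```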
3.4 Channel Reduction
As shown in the green frame in Figure 1 and in the second black frame in Figure 2, the probability density for each class $c'$, $P^{c',i}_{j,l}$, of the input image $X^{c,i}_k$, $k \in \{\text{train}, \text{val}, \text{test}\}$, is deduced for each channel using the estimators $\{\hat{f}^{c'}_{j,l}\}_{c' \in \{1,\dots,C_{\text{train}}\}}$ and the $q^{c',i}_{j,l}$ as
$$P^{c',i}_{j,l} = \hat{f}^{c'}_{j,l}(q^{c',i}_{j,l} \mid Q^{c'}_{\text{train},j,l}).$$
Next, assuming channel independence, channel reduction conditionally to a class $c'$ is deduced by computing the log-joint probability density of the channels:
$$v^{c',i}_l = \sum_j \log P^{c',i}_{j,l}.$$
These log-joint conditional probabilities form a vector $v^i_{k,l}$ in a space $S_l$ of dimension $C_{\text{train}}$, the number of classes in the training set:
$$v^i_{k,l} = \left[ v^{1,i}_l \;\; v^{2,i}_l \;\; \cdots \;\; v^{C_{\text{train}},i}_l \right]^T,$$
where $T$ stands for the transpose operator. Its coordinates characterise how jointly likely the most prominent channel responses are, conditionally to each trained class.
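A sketch of the channel reduction at one layer, reusing the fit_parzen estimators from above (the list-of-lists layout for the per-class, per-channel estimators is an illustrative assumption):

```python
import numpy as np

def channel_reduction(q_values, kdes):
    """Log-joint channel density at one layer, conditionally to each class.
    q_values: q_values[c][j] is the q-statistic of channel j w.r.t. class c.
    kdes: kdes[c][j] is the fitted Parzen estimator of channel j for class c.
    Returns the vector v of dimension C_train."""
    n_classes = len(kdes)
    v = np.empty(n_classes)
    for c in range(n_classes):
        log_p = [kde.score_samples(np.array([[q]]))[0]  # log f_hat
                 for kde, q in zip(kdes[c], q_values[c])]
        v[c] = np.sum(log_p)  # sum of logs: channel independence assumption
    return v
```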
3.5 Nearest Neighbor Method
Following channel reduction, outlier observations should be far from the origin in $S_l$. To evaluate how far an observation lies from the training ones, a nearest neighbor method (purple frame in Figure 1 and in the last black frame of Figure 2) is considered.
The mean Euclidean distance, noted $md_{k,l}$, between $v^i_{k,l}$ and the $N_m$ nearest training samples $\{v^j_{\text{train},l}\}_{j \le N_{\text{train}}}$ in $S_l$ is determined. For $j \le N_{\text{train}}$:
$$d^j_{k,l} = \left\| v^j_{\text{train},l} - v^i_{k,l} \right\|_2,$$
and
$$md_{k,l} = \frac{1}{N_m} \sum_{m=1}^{N_m} d^{(m)}_{k,l},$$
where $d^{(1)}_{k,l} \le d^{(2)}_{k,l} \le \cdots \le d^{(N_{\text{train}})}_{k,l}$ are the sorted distances to the training neighbors. $md_{k,l}$ indicates how well the input sample resembles its $N_m$ nearest training ones.
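A sketch of this step using scikit-learn's NearestNeighbors (the default metric is Euclidean, matching the paper; $N_m = 5$ follows the value retained in the experiments of Section 4):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def mean_knn_distance(v, v_train, n_m=5):
    """Mean Euclidean distance md between a layer vector v (C_train,) and its
    n_m nearest training vectors in v_train (N_train, C_train)."""
    nn = NearestNeighbors(n_neighbors=n_m).fit(v_train)
    dists, _ = nn.kneighbors(v.reshape(1, -1))  # distances sorted ascending
    return float(dists.mean())
```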
3.6 Layer Reduction
To obtain the detection statistic, we finally perform a layer reduction (light orange frame in Figure 1 and in the last black frame of Figure 2). To do so, the probability $p_l$ that the mean distance between one sample at layer $l$ and the training samples could be larger than the obtained value $md_{k,l}$ is estimated for each layer (the p-value of $md_{k,l}$), using an empirical cumulative distribution estimator:
$$p_l = 1 - \hat{F}(md_{k,l} \mid \mathcal{X}_{\text{val}}).$$
To determine the empirical cumulative distribution, we use the validation set $\mathcal{X}_{\text{val}}$ to ensure the independence of the obtained p-values from the training data.
The joint probability of mean distances for all layers, $p$, is computed, assuming layer independence, as the product of the $p_l$ for all $l$:
$$p = \prod_{l=1}^{L} p_l.$$
We can therefore note that the larger this final probability is, the more likely the sample belongs to one of the trained classes, leading to the detector defined below.
3.7 Detector
To implement detection (light grey frame in Figure 1 and output of the third black frame in Figure 2), $p$ is compared to a threshold $\gamma \in [0, 1]$ to decide whether the image has to be classified or discarded. The detector is defined as follows. The tested image is:
$$\begin{cases} \text{ID}, & \text{if } p \ge \gamma \\ \text{OOD}, & \text{if } p < \gamma \end{cases}$$
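The last two steps can be sketched as follows, with md_per_layer the per-layer mean distances of the tested image and md_val_per_layer those of the validation set (the names and data layout are illustrative assumptions):

```python
import numpy as np

def layer_pvalue(md, md_val):
    """Right-tail empirical p-value p_l = 1 - F_hat(md | validation set)."""
    return 1.0 - np.mean(np.asarray(md_val) <= md)

def drood_decision(md_per_layer, md_val_per_layer, gamma):
    """Product of per-layer p-values (layer independence assumption),
    then thresholding: ID if p >= gamma, OOD otherwise."""
    p = np.prod([layer_pvalue(md, md_val)
                 for md, md_val in zip(md_per_layer, md_val_per_layer)])
    return "ID" if p >= gamma else "OOD"
```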
4 EXPERIMENTS
This section describes the data and the DNN classification models, presents the experiments, and discusses the results. It should be noted that we ourselves ran all the simulations of the OOD methods used for performance comparison with DROOD.
4.1 Datasets
As generally considered for OOD experiments, five datasets are used in this paper. CIFAR10 (Krizhevsky, 2009) is used as the in-distribution dataset. It consists of 10 classes and contains 50,000 training images, split into train (40,000) and validation (10,000) sets, and 10,000 test images. This dataset is made up of natural images with one dominant object per image, such as a vehicle, an animal, or a boat.
The remaining four datasets are used as out-of-distribution data. The Large-scale Scene UNderstanding (LSUN) test dataset (Yu et al., 2015) contains 10,000 images with 10 classes representing different environments, both natural and man-made. The Street View House Numbers (SVHN) test dataset (Netzer et al., 2011) contains 26,032 digit images (from 0 to 9), extracted from house number images captured by Google Street View. The TinyImageNet (Le and Yang, 2015) dataset, a subset of the larger ImageNet dataset, contains 200 classes; its test set contains 10,000 images. Finally, the CIFAR100 test dataset (Krizhevsky, 2009), which contains 10,000 images of 100 classes, is made up of natural images close to CIFAR10.
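For illustration, the ID data and the 40,000/10,000 train/validation split can be set up as in the following torchvision sketch; the fixed seed and the bare ToTensor transform are assumptions, not the paper's exact preprocessing.

```python
import torch
import torchvision
import torchvision.transforms as T

# Sketch: CIFAR10 as in-distribution data with the paper's
# 40,000/10,000 train/validation split (seed is an assumption).
transform = T.ToTensor()
full_train = torchvision.datasets.CIFAR10(root="data", train=True,
                                          download=True, transform=transform)
train_set, val_set = torch.utils.data.random_split(
    full_train, [40_000, 10_000],
    generator=torch.Generator().manual_seed(0))
test_set = torchvision.datasets.CIFAR10(root="data", train=False,
                                        download=True, transform=transform)
```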
4.2 Model Architectures
The proposed DROOD method falls under post-hoc out-of-distribution detection methods. In this study, as classification models, we used two CNNs, ResNet34 (He et al., 2016) and DenseNet-BC (Huang et al., 2017), and one transformer, the Vision Transformer (Dosovitskiy et al., 2021), referenced as ViT¹. All these models are trained on the CIFAR10 training set, the considered in-distribution dataset. These models achieved good accuracy on the CIFAR10 test set, reaching 0.9510 with ResNet34, 0.9400 with DenseNet-BC and 0.9852 with ViT.
In the experiments, for MaSF and DROOD meth-
ods, the spatial reduction (or max operation) is ap-
plied to the channels at each layer of the CNN mod-
els. For the ViT model, this operation is performed
on all tokens, excluding the ”class token”, across the
transformer encoding layers. The ”class token” is ex-
cluded because it gathers information from the other
tokens within these encoding layers. Consequently,
we can expect the application of spatial reduction to
the remaining ”image tokens” to be equivalent to the
direct consideration of the ”class token”. However, in
this paper, we want to keep the flow exactly the same
as for the CNN models, for fair comparison. Note
that the MaSF method, initially developed for CNNs, has been adapted to ViT.
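For illustration, the token-level spatial reduction just described can be sketched as follows (a hypothetical helper operating on the output of one ViT encoding block):

```python
import torch

def vit_spatial_reduction(tokens: torch.Tensor) -> torch.Tensor:
    """Max over image tokens, excluding the leading class token.
    tokens: (batch, 1 + n_image_tokens, dim), output of one encoding block."""
    image_tokens = tokens[:, 1:, :]         # drop the class token
    return image_tokens.max(dim=1).values   # (batch, dim)
```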
The OpenPCS-Class method is originally applied
to the Vision Transformer architecture. For our exper-
imental evaluation, we have adapted this approach to
CNN architectures.
¹ ViT weights were downloaded from the Hugging Face website: https://huggingface.co/nateraw/vit-base-patch16-224-cifar10
The Mahalanobis approach was originally pro-
posed for CNN models. The code provided by the
authors cannot be used for the Vision Transformer
architecture, as it requires too much memory space.
Consequently, it is only used as a comparison method with the CNN models.
4.3 Evaluation Metrics
As generally considered, the following metrics are used to evaluate the detection performance: the false positive rate of the OOD data when the true positive rate of the ID data is 95%, denoted FPR95; the true positive rate of the ID data when the FPR of the OOD data is 5%, referred to as TPR95; and the area under the receiver operating characteristic curve (AUC), which quantifies how well a detector can separate ID data from OOD data. In the tables, ↓ (or ↑) indicates that lower (or higher) values are preferable, while bold text highlights the best results in each row.
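As a sketch, these three metrics can be computed from the final detection scores as follows (score convention: higher means more in-distribution, as with the DROOD probability p; the names are illustrative):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def ood_metrics(p_id, p_ood):
    """FPR95, TPR95 and AUC from detection scores, higher = more ID.
    p_id: scores of ID test samples; p_ood: scores of OOD samples."""
    p_id, p_ood = np.asarray(p_id, float), np.asarray(p_ood, float)
    # FPR95: threshold at the 5th percentile of ID scores so that 95% of
    # ID samples are accepted, then count the OOD samples also accepted.
    fpr95 = np.mean(p_ood >= np.percentile(p_id, 5))
    # TPR95: threshold at the 95th percentile of OOD scores so that only
    # 5% of OOD samples are accepted, then count the ID samples kept.
    tpr95 = np.mean(p_id >= np.percentile(p_ood, 95))
    labels = np.concatenate([np.ones_like(p_id), np.zeros_like(p_ood)])
    auc = roc_auc_score(labels, np.concatenate([p_id, p_ood]))
    return fpr95, tpr95, auc
```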
4.4 Experimental Results
All experimental results on the considered DNN
classification models (CNNs and transformers) are
summarized in Table 1 for ResNet34, Table 2 for
DenseNet-BC and Table 3 for ViT.
The hyperparameters of the method are the kernels, the kernel bandwidths for estimating the probability densities, the number of neighbors used to compute the mean distance, and the decision threshold. For the probability density estimators, the chosen kernels are Gaussian and their bandwidths were determined by cross-validation on the training data. The number of neighbors $N_m$ was determined empirically; after experiments, we set $N_m$ to 5, which gives the best results. The detection threshold $\gamma$ was chosen according to the targeted false alarm rate measured on the validation set.
The detection performance of the DROOD approach is very good on both far-out-of-distribution (LSUN, SVHN and TinyImageNet) and near-out-of-distribution (CIFAR100) datasets, outperforming the reference methods most of the time. Based on these results, one can also note that OOD detection performs better when using the ViT model.
In many OOD detection methods, CIFAR10 and CIFAR100 are among the most difficult datasets to evaluate, especially when one of them is used as the in-distribution, due to the close similarities of some classes. Despite this, the DROOD approach achieves the best performance in OOD detection for all DNNs considered, with CIFAR10 used as the ID set and CIFAR100 as the OOD set.
Table 1: Performance results using ResNet34.
TPR95 / FPR95 / AUC (%)
OOD datasets ODIN Mahalanobis OpenPCS-Class MaSF Ours
CIFAR100 33.50 / 52.90 / 85.90 42.68 / 37.54 / 89.08 65.98 / 28.44 / 93.30 83.61 / 20.25 / 96.40 84.83 / 14.52 / 97.14
LSUN 92.90 / 8.50 / 98.60 92.17 / 7.18 / 98.34 98.76 / 0.09 / 99.40 99.73 / 0.19 / 99.81 99.91 / 0.01 / 99.91
SVHN 41.50 / 47.90 / 88.10 98.24 / 3.03 / 99.04 82.95 / 19.23 / 95.93 99.73 / 0.06 / 99.83 99.91 / 0.00 / 99.95
TinyImageNet 89.20 / 17.50 / 97.40 89.82 / 7.35 / 97.91 98.76 / 0.25 / 99.41 99.37 / 0.35 / 99.77 98.54 / 1.12 / 99.63
Average 64.27 / 31.70 / 92.50 80.72 / 13.77 / 96.09 86.61 / 12.00 / 97.01 95.61 / 5.21 / 98.95 95.79 / 3.91 / 99.15
Table 2: Performance results using DenseNet-BC.
TPR95 / FPR95 / AUC (%)
OOD datasets ODIN Mahalanobis OpenPCS-Class MaSF Ours
CIFAR100 33.25 / 55.43 / 84.99 33.96 / 56.40 / 83.60 67.17 / 40.48 / 91.85 55.99 / 42.80 / 89.89 68.00 / 25.58 / 93.90
LSUN 91.00 / 10.15 / 98.07 90.23 / 7.66 / 98.06 85.85 / 27.58 / 95.65 99.61 / 0.14 / 99.86 99.98 / 0.00 / 99.98
SVHN 61.48 / 42.95 / 91.12 96.95 / 3.30 / 98.89 91.04 / 16.23 / 97.24 99.57 / 0.01 / 99.89 99.98 / 0.00 / 99.98
TinyImageNet 86.70 / 16.61 / 97.02 65.26 / 17.47 / 94.63 82.86 / 29.66 / 95.00 98.23 / 2.08 / 99.51 98.27 / 1.96 / 99.53
Average 68.10 / 31.28 / 92.80 71.60 / 21.20 / 93.79 81.73 / 28.48 / 94.93 88.35 / 11.25 / 97.28 91.55 / 6.88 / 98.34
Table 3: Performance results using ViT.
TPR95 / FPR95 / AUC (%)
OOD datasets ODIN OpenPCS-Class MaSF Ours
CIFAR100 71.40 / 16.90 / 95.50 94.51 / 5.34 / 98.70 98.39 / 2.01 / 99.53 99.64 / 0.06 / 99.75
LSUN 93.10 / 6.00 / 98.60 99.75 / 0.15 / 99.80 99.92 / 0.00 / 99.95 99.71 / 0.00 / 99.83
SVHN 93.90 / 5.90 / 98.60 98.83 / 0.60 / 99.60 99.95 / 0.00 / 99.96 99.71 / 0.00 / 99.83
TinyImageNet 80.10 / 11.20 / 97.60 98.38 / 1.34 / 99.65 99.92 / 0.00 / 99.95 99.71 / 0.00 / 99.78
Average 84.62 / 10.00 / 97.57 97.86 / 1.85 / 99.43 99.54 / 0.50 / 99.84 99.69 / 0.01 / 99.79
Figure 3: Performance results for the CIFAR100 ”bus” class: (a) ResNet34, (b) DenseNet-BC, (c) ViT.
Figure 4: Performance results for the CIFAR100 ”cockroach” class: (a) ResNet34, (b) DenseNet-BC, (c) ViT.
Finally, we propose to focus on the ”bus” (Figure 3) and ”cockroach” (Figure 4) classes of CIFAR100. The first one is very similar to the CIFAR10 ”automobile” class, while ”cockroach” is not close to any CIFAR10 class. DROOD is particularly effective for the ”bus” class detection, whatever the DNN model, and outperforms the reference methods. The ”cockroach” class is perfectly detected by almost all methods, as expected.
5 CONCLUSION
The DROOD method is based on a statistical framework for OOD detection. It performs a successive synthesis of statistics using all the features produced by a DNN. The experimental study shows very good detection performance compared to state-of-the-art methods with two image classification networks based on CNNs and one based on transformers, which also demonstrates its ability to perform whatever the model.
We observed variations in performance depending on the chosen DNN and the OOD method, which is to be expected to some extent. However, some existing OOD detection methods appear to be tied to specific neural network architectures, since their performance varies considerably when applied to others. Experiments suggest that our DROOD detection approach is more robust than the others.
As further work, it would of course be interesting to test distances other than the Euclidean distance. As mentioned above, in the transformer architecture, the ”class token” gathers information from the ”image tokens” across the transformer encoding layers for the final classification task. One can therefore expect that the max operation in the MaSF and DROOD methods can be effectively replaced by the use of this ”class token”. Finally, it would also be interesting to experiment with this type of approach in other application fields, such as audio analysis or image segmentation.
REFERENCES
Carvalho, T. M., Vellasco, M. M. B. R., and do Amaral, J.
F. M. Out-of-distribution detection in deep learning
models: A feature space-based approach. In Interna-
tional Joint Conference on Neural Networks, IJCNN,
Gold Coast, Australia, June 18-23, 2023, pages 1–7.
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn,
D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer,
M., Heigold, G., Gelly, S., Uszkoreit, J., and Houlsby,
N. An image is worth 16x16 words: Transformers
for image recognition at scale. In International Con-
ference on Learning Representations, ICLR, May 3-7,
2021.
Dziedzic, A., Rabanser, S., Yaghini, M., Ale, A., Erdogdu,
M. A., and Papernot, N. p-dknn: Out-of-distribution
detection through statistical testing of deep represen-
tations. ArXiv, 2022.
Haroush, M., Frostig, T., Heller, R., and Soudry, D. A statis-
tical framework for efficient out of distribution detec-
tion in deep neural networks. In International Confer-
ence on Learning Representations, ICLR, April 25-29,
2022.
He, K., Zhang, X., Ren, S., and Sun, J. Deep residual
learning for image recognition. In IEEE Conference
on Computer Vision and Pattern Recognition, CVPR,
June 27-30, 2016, Las Vegas, NV, USA, pages 770–
778.
Hendrycks, D. and Gimpel, K. A baseline for detecting mis-
classified and out-of-distribution examples in neural
networks. In International Conference on Learning
Representations, ICLR, April 24-26, 2017, Toulon,
France.
Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger,
K. Q. Densely connected convolutional networks.
In IEEE Conference on Computer Vision and Pattern
Recognition, CVPR, 2017, Honolulu, HI, USA, pages
2261–2269.
Kaur, R., Jha, S., Roy, A., Park, S., Dobriban, E., Sokol-
sky, O., and Lee, I. idecode: In-distribution equivari-
ance for conformal out-of-distribution detection. In
AAAI Conference on Artificial Intelligence, 2022,
volume 36, pages 7104–7114.
Krizhevsky, A. Learning multiple layers of features from
tiny images. Technical report, University of Toronto,
2009, Toronto, Ontario.
Le, Y. and Yang, X. Tiny imagenet visual recognition chal-
lenge. CS 231N, 2015, 7(7):3.
Lee, K., Lee, K., Lee, H., and Shin, J. A simple unified
framework for detecting out-of-distribution samples
and adversarial attacks. In Bengio, S., Wallach, H. M.,
Larochelle, H., Grauman, K., Cesa-Bianchi, N., and
Garnett, R., editors, Advances in Neural Information
Processing Systems 31: Annual Conference on Neural
Information Processing Systems, NeurIPS, December
3-8, 2018, Montréal, Canada.
Li, J., Li, S., Wang, S., Zeng, Y., Tan, F., and Xie,
C. Enhancing out-of-distribution detection with
multitesting-based layer-wise feature fusion. In IEEE
Conference on Artificial Intelligence, CAI, 25-27
June, 2024, Singapore, pages 510–517.
Liang, S., Li, Y., and Srikant, R. Enhancing the reliability
of out-of-distribution image detection in neural net-
works. In International Conference on Learning Rep-
resentations, ICLR, April 30 - May 3, 2018, Vancou-
ver, BC, Canada.
Liu, W., Wang, X., Owens, J., and Li, Y. Energy-based
out-of-distribution detection. In Larochelle, H., Ran-
zato, M., Hadsell, R., Balcan, M., and Lin, H., editors,
Advances in Neural Information Processing Systems,
2020, volume 33, pages 21464–21475.
Malinin, A. and Gales, M. Predictive uncertainty estimation
via prior networks. Advances in Neural Information
Processing Systems 31: Annual Conference on Neural
Information Processing Systems, NeurIPS, December
3-8, 2018, Montréal, Canada.
Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng,
A. Y., et al. Reading digits in natural images with
unsupervised feature learning. In NIPS workshop
on deep learning and unsupervised feature learning,
2011, Granada, page 4.
Raghuram, J., Chandrasekaran, V., Jha, S., and Banerjee,
S. A general framework for detecting anomalous in-
puts to dnn classifiers. In International Conference on
Machine Learning, ICML, 2021, pages 8764–8775.
DNN Layers Features Reduction for Out-of-Distribution Detection
77
Sastry, C. S. and Oore, S. Detecting out-of-distribution ex-
amples with gram matrices. In International Confer-
ence on Machine Learning, ICML, 2020, pages 8491–
8501.
Sun, Y. and Li, Y. DICE: leveraging sparsification for
out-of-distribution detection. In Avidan, S., Brostow,
G. J., Cissé, M., Farinella, G. M., and Hassner, T.,
editors, European Conference on Computer Vision,
ECCV, October 23-27, 2022, Tel Aviv, Proceedings,
Part XXIV, volume 13684, pages 691–708.
Winkens, J., Bunel, R., Guha Roy, A., Stanforth, R., Natara-
jan, V., Ledsam, J. R., MacWilliams, P., Kohli, P.,
Karthikesalingam, A., Kohl, S., Cemgil, T., Eslami, S.
M. A., and Ronneberger, O. Contrastive training for
improved out-of-distribution detection. arXiv, 2020.
Yu, F., Zhang, Y., Song, S., Seff, A., and Xiao, J. LSUN:
construction of a large-scale image dataset using deep
learning with humans in the loop. arXiv, 2015.
Zhang, H., Li, A., Guo, J., and Guo, Y. Hybrid models
for open set recognition. In European Conference on
Computer Vision, ECCV, August 23–28, 2020, Glas-
gow, UK, Proceedings, Part III 16, pages 102–117.
Zisselman, E. and Tamar, A. Deep residual flow for out of
distribution detection. In IEEE Conference on Com-
puter Vision and Pattern Recognition, CVPR, 2020,
pages 13994–14003.