Safe Screening for Logistic Regression with ℓ₀–ℓ₂ Regularization
Anna Deza (https://orcid.org/0000-0002-4849-683X) and Alper Atamtürk (https://orcid.org/0000-0003-1220-808X)
Department of Industrial Engineering and Operations Research, University of California, Berkeley, CA, U.S.A.
Keywords: Screening Rules, Sparse Logistic Regression.
Abstract: In logistic regression, it is often desirable to utilize regularization to promote sparse solutions, particularly for problems with a large number of features compared to available labels. In this paper, we present screening rules that safely remove features from logistic regression with ℓ₀–ℓ₂ regularization before solving the problem. The proposed safe screening rules are based on lower bounds from the Fenchel dual of strong conic relaxations of the logistic regression problem. Numerical experiments with real and synthetic data suggest that a high percentage of the features can be effectively and safely removed a priori, leading to substantial speed-up in the computations.
1 INTRODUCTION
Logistic regression is a classification model used to predict the probability of a binary outcome from a set of features. Its use is prevalent in a large variety of domains, from diagnostics in healthcare (Gramfort et al., 2013; Shevade and Keerthi, 2003; Cawley and Talbot, 2006) to sentiment analysis in natural language processing (Wang and Park, 2017; Yen et al., 2011) and consumer choice modeling in economics (Kuswanto et al., 2015).
Given a data matrix A ∈ ℝ^{m×n} of m observations, each with n features, and binary labels y ∈ {−1, 1}^m, the logistic regression model seeks regression coefficients x ∈ ℝ^n that minimize the convex loss function
L(x) = \frac{1}{m} \sum_{i=1}^{m} \log\left(1 + \exp(-y_i A_i x)\right).
We use A_i to denote the i-th row of matrix A and A_j to denote the j-th column of A. When the number of available features is large compared to the number of observations (labels), i.e., m ≪ n, logistic regression models are prone to overfitting.
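For concreteness, the loss above can be evaluated directly. Below is a minimal NumPy sketch (not from the paper; the function name logistic_loss and the random test data are illustrative choices made here):

```python
import numpy as np

def logistic_loss(A: np.ndarray, y: np.ndarray, x: np.ndarray) -> float:
    """Average logistic loss L(x) = (1/m) * sum_i log(1 + exp(-y_i A_i x))."""
    margins = y * (A @ x)  # y_i * (A_i x) for each observation i
    # np.logaddexp(0, -t) = log(1 + exp(-t)), computed in a numerically stable way
    return float(np.mean(np.logaddexp(0.0, -margins)))

# Quick sanity check on synthetic data: at x = 0 the loss equals log(2) for any data.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))
y = rng.choice([-1.0, 1.0], size=100)
print(logistic_loss(A, y, np.zeros(20)))  # ≈ 0.6931
```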
Such cases require pruning the features to mitigate the risk of overfitting. Regularization is a natural approach for this purpose. Convex ℓ₂-regularization (ridge) (Hoerl and Kennard, 1970) imposes bias by shrinking the regression coefficients x_i, i ∈ [n], toward zero. The ℓ₁-regularization (lasso) (Tibshirani, 1996) and ℓ₁–ℓ₂-regularization (elastic net) (Zou and Hastie, 2005) perform shrinkage of the coefficients
and selection of the features simultaneously. Recently, there has been growing interest in utilizing the exact ℓ₀-regularization (Miller, 2002; Bertsimas et al., 2016) for selecting features in linear regression. Although ℓ₀-regularization introduces non-convexity to regression models, significant progress has recently been made in developing strong models and specialized algorithms for solving medium- to large-scale instances (e.g., Bertsimas and Van Parys, 2017; Atamtürk and Gómez, 2019; Hazimeh and Mazumder, 2020; Han et al., 2020).
We consider logistic regression with ℓ₀–ℓ₂ regularization:

\min_{x \in \mathbb{R}^n} \; L(x) + \frac{1}{\gamma}\|x\|_2^2 + \mu\|x\|_0, \qquad (REG)

and

\min_{x \in \mathbb{R}^n} \; L(x) + \frac{1}{\gamma}\|x\|_2^2 \quad \text{s.t.} \quad \|x\|_0 \le k. \qquad (CARD)
Whereas the ℓ₂-regularization penalty term above encourages shrinking the coefficients, which helps counter the effects of noise present in the data matrix A, the ℓ₀-regularization penalty term in (REG) encourages sparsity, selecting a small number of key features to be used for prediction; this sparsity requirement is modeled as an explicit cardinality constraint in (CARD). Due to the ℓ₀-regularization terms, (REG) and (CARD) are non-convex optimization problems.
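To make the two formulations concrete, the following sketch (reusing numpy and the illustrative logistic_loss helper from above; gamma, mu, and k are user-chosen parameters) evaluates the (REG) objective and checks (CARD) feasibility for a given x. It merely evaluates the quantities; it does not solve the non-convex problems:

```python
def reg_objective(A, y, x, gamma: float, mu: float) -> float:
    """Objective of (REG): L(x) + (1/gamma) * ||x||_2^2 + mu * ||x||_0."""
    return logistic_loss(A, y, x) + np.dot(x, x) / gamma + mu * np.count_nonzero(x)

def card_feasible(x, k: int) -> bool:
    """Feasibility for (CARD): x has at most k nonzero coefficients."""
    return np.count_nonzero(x) <= k
```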
Screening rules refer to preprocessing procedures that discard certain features, leading to a reduction in the dimension of the problem, which, in turn, improves the solution times of the employed algorithms. For ℓ₁-regularized linear regression, El Ghaoui et al. (2010) introduce safe screening rules that are guaranteed to remove only features that are not selected in the solution. Strong rules (Tibshirani, 2011), on the other