Algorithms for Regularized Linear Discriminant Analysis
Jan Kalina (1) and Jurjen Duintjer Tebbens (2,3)
(1) Dept. of Medical Informatics and Biostatistics, Institute of Computer Science AS CR, Prague, Czech Republic
(2) Dept. of Computational Methods, Institute of Computer Science AS CR, Prague, Czech Republic
(3) Dept. of Biophysics and Physical Chemistry, Faculty of Pharmacy in Hradec Králové, Charles University in Prague, Hradec Králové, Czech Republic
Keywords:
Classification Analysis, Regularization, High-dimensional Data, Matrix Decomposition, Computational
Aspects.
Abstract:
This paper is focused on regularized versions of classification analysis and their computation for high-dimensional data. A variety of regularized classification methods have been proposed and we critically discuss their computational aspects. We formulate several new algorithms for regularized linear discriminant analysis, which exploit a regularized covariance matrix estimator shrunken towards a regular target matrix. Numerical linear algebra considerations are used to propose tailor-made algorithms for specific choices of the target matrix. Further, we propose a new classification method based on L_2-regularization of both the group means and the pooled covariance matrix, and accompany it by an efficient algorithm for its computation.
1 INTRODUCTION
Classification analysis methods aim to construct (learn) a decision rule from a training data set which is able to automatically assign new data to one of K groups. Linear discriminant analysis (LDA) is a standard statistical classification method. Throughout the paper, we consider n observations of p variables, observed in K different samples (groups) with p > K >= 2,

  X_{11}, \dots, X_{1 n_1}, \dots, X_{K1}, \dots, X_{K n_K},   (1)

where n = \sum_{k=1}^{K} n_k. LDA assumes the data in each group to come from a Gaussian distribution. The covariance matrix Σ is assumed to be the same across groups and its estimator will be denoted by S.
Various tasks in bioinformatics deal with high-dimensional data, i.e. data with the number of observed variables p exceeding the number of observations. Analyzing data in this so-called large-p/small-n problem is especially important e.g. in gene expression studies. Unfortunately, LDA in its standard form is infeasible for n < p, because the matrix S of size p × p is singular and computing its inverse must be replaced by an appropriate alternative. Available approaches in this context are based e.g. on pseudoinverse matrices, which are however unstable due to a small n (Guo et al., 2007).
Regularized versions of LDA for n ≪ p were proposed (Guo et al., 2007) mainly for applications in bioinformatics. They are based on a regularized estimator of the covariance matrix which is guaranteed to be regular and positive definite even for n ≪ p. Fast computation and numerical stability are key requirements on reliable statistical and data mining procedures (Kogan, 2007; Duintjer Tebbens and Schlesinger, 2007) and remain an important issue also for regularized classification methods (Hastie et al., 2008; Pourahmadi, 2013). Let us now describe the most important approaches and critically discuss their computation.
This paper studies efficient algorithms for computing various regularized versions of LDA. Section 2 formulates several algorithms for regularized LDA, which exploit a regularized covariance matrix estimator shrunken towards a regular target matrix. The computational efficiency of the algorithms is inspected using arguments of numerical linear algebra. For a specific choice of the target matrix, we are able to propose a tailor-made algorithm with a lower computational cost compared to algorithms formulated for a general context. Besides, we propose a new version of LDA based on regularization in the sense of the L_2 norm and accompany it by an efficient algorithm for its computation in Section 3. The classification performance of the methods
is illustrated on real data in Section 4. Finally, Sec-
tion 5 concludes the paper.
2 COMPUTATION OF REGULARIZED LINEAR DISCRIMINANT ANALYSIS
This section is devoted to proposing and comparing new algorithms for a habitually used version of regularized LDA. We use suitable matrix decompositions to propose efficient algorithms either for a general choice of the target matrix T or for its specific choices. To the best of our knowledge, tailor-made algorithms for a specific T have not been described. We compare the new algorithms in terms of their computational costs as well as numerical stability.
We will describe one of the habitually used regularized versions of LDA. It will be denoted as LDA* to avoid confusion, because the concept of regularized discriminant analysis encompasses several different methods (Pourahmadi, 2013; Kalina, 2014). A given target matrix T will be used, which must be a symmetric positive definite (and thus regular) matrix of size p × p. Its most common choices include the identity matrix I_p or a diagonal (non-identity) matrix; other target matrices have been considered by Schäfer and Strimmer (2005).
Let us denote the mean of the observed values in the k-th group (k = 1, ..., K) by X̄_k. LDA* assigns a new observation Z = (Z_1, ..., Z_p)^T to group k if l_k > l_j for every j ≠ k, where the regularized linear discriminant score for the k-th group (k = 1, ..., K) has the form

  l_k = \bar{X}_k^T (S^*)^{-1} Z - \frac{1}{2} \bar{X}_k^T (S^*)^{-1} \bar{X}_k + \log p_k,   (2)

where p_k is the prior probability of observing an observation from the k-th group and

  S^* = \lambda S + (1 - \lambda) T   (3)

for λ ∈ (0,1] denotes a regularized estimator of the covariance matrix across groups. We do not treat the situation with l_k = l_{k'} for k' ≠ k separately, because it occurs with zero probability for data coming from a continuous distribution. Equivalently, LDA* assigns a new observation Z to group k if

  (\bar{X}_k - Z)^T (S^*)^{-1} (\bar{X}_k - Z) = \min_{j=1,\dots,K} (\bar{X}_j - Z)^T (S^*)^{-1} (\bar{X}_j - Z).   (4)
Various versions of the target matrix T have been suggested by Schäfer and Strimmer (2005). We point out that the formula (3) can be justified by Bayesian reasoning. Let us assume normally distributed data with a covariance matrix Σ, while Σ^{-1} is assumed to be a random variable following the Wishart distribution W_p(T/γ, k) for some γ > 0 and an integer k. Following Haff (1980), the mean of the posterior distribution of Σ is equal to (3) up to a normalizing constant, which does not influence the classification rule.
First, the standard approach for computing LDA* may be improved by employing the eigendecomposition of S^* for a fixed λ. A suitable value of λ is found by cross-validation in the form of a grid search over all possible values of λ ∈ (0, 1].
Algorithm 1. LDA* for the general regularization (3) based on eigendecomposition.
Step 1. Compute the matrix

  A = [\bar{X}_1 - Z, \dots, \bar{X}_K - Z]   (5)

of size p × K whose k-th column is X̄_k − Z.
Step 2. Compute S^* according to (3) with a fixed λ ∈ (0,1].
Step 3. Compute the eigendecomposition of S^* as

  S^* = Q^* D^* (Q^*)^T.   (6)

Step 4. Compute the matrix

  B = (D^*)^{-1/2} (Q^*)^T A   (7)

and assign Z to group k if the column of B with the smallest Euclidean norm is the k-th column.
Step 5. Repeat steps 2 to 4 with different values of λ and find the classification rule with the best classification performance.
The group assignment (4) is done by using

  (\bar{X}_j - Z)^T (S^*)^{-1} (\bar{X}_j - Z) = (\bar{X}_j - Z)^T Q^* (D^*)^{-1} (Q^*)^T (\bar{X}_j - Z) = \| (D^*)^{-1/2} (Q^*)^T (\bar{X}_j - Z) \|^2.   (8)
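To make the linear-algebra steps concrete, the following minimal sketch implements Algorithm 1 with NumPy under stated assumptions: Xbar is a hypothetical K × p array of group means, S the p × p pooled covariance estimator, T the target matrix, Z the new observation and lam a fixed value of λ. It illustrates the eigendecomposition route; it is not a reference implementation.

import numpy as np

def lda_star_eig(Xbar, S, T, Z, lam):
    """Sketch of Algorithm 1: classify Z via the eigendecomposition of S*."""
    A = (Xbar - Z).T                      # p x K matrix of differences, cf. (5)
    S_star = lam * S + (1.0 - lam) * T    # regularized estimator (3)
    D, Q = np.linalg.eigh(S_star)         # eigendecomposition S* = Q* D* (Q*)^T, cf. (6)
    B = (Q.T @ A) / np.sqrt(D)[:, None]   # B = (D*)^{-1/2} (Q*)^T A, cf. (7)
    # by (8), the squared column norms of B are the regularized Mahalanobis
    # distances, so the closest group is the column with the smallest norm
    return np.argmin(np.linalg.norm(B, axis=0))   # 0-based group index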
The costs of the algorithm can be reduced by replacing the eigendecomposition of S^* with its Cholesky decomposition

  S^* = L^* (L^*)^T,   (9)

where L^* is a nonsingular lower triangular matrix.
Algorithm 2. LDA* for the general regularization (3) based on Cholesky decomposition.
Step 1. Compute the matrix

  A = [\bar{X}_1 - Z, \dots, \bar{X}_K - Z]   (10)

of size p × K whose k-th column is X̄_k − Z.
Step 2. Compute S^* according to (3) with a fixed λ ∈ (0,1].
AlgorithmsforRegularizedLinearDiscriminantAnalysis
129
Step 3. Compute the Cholesky factor L^* of S^*.
Step 4. Compute the matrix

  B = (L^*)^{-1} A   (11)

and assign Z to group k if the column of B with the smallest Euclidean norm is the k-th column.
Step 5. Repeat steps 2 to 4 with different values of λ and find the classification rule with the best classification performance.
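A corresponding sketch of the Cholesky route (Algorithm 2) follows, with the same hypothetical inputs as above; the reading of (11) as a forward solve with the factor L* is our assumption, and the triangular solver avoids forming any explicit inverse.

import numpy as np
from scipy.linalg import cholesky, solve_triangular

def lda_star_chol(Xbar, S, T, Z, lam):
    """Sketch of Algorithm 2: classify Z via the Cholesky factor of S*."""
    A = (Xbar - Z).T                        # p x K, columns Xbar_k - Z, cf. (10)
    S_star = lam * S + (1.0 - lam) * T      # regularized estimator (3)
    L = cholesky(S_star, lower=True)        # S* = L* (L*)^T, cf. (9)
    B = solve_triangular(L, A, lower=True)  # B = (L*)^{-1} A, cf. (11)
    # column norms of B equal the regularized Mahalanobis distances
    return np.argmin(np.linalg.norm(B, axis=0))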
For specific target matrices, we can further reduce computational costs by using the following algorithm for LDA*. The pooled estimator S can be written in the form S = Y^T Y, where

  Y = [X_{11} - \bar{X}, \dots, X_{1 n_1} - \bar{X}, \dots, X_{K1} - \bar{X}, \dots, X_{K n_K} - \bar{X}]^T   (12)

is of size n × p. Then, using the singular value decomposition (SVD) of Y in the form

  Y = P \Sigma Q^T,   (13)

we can express the eigendecomposition of S as

  S = Y^T Y = (P \Sigma Q^T)^T P \Sigma Q^T = Q \Sigma^2 Q^T.   (14)

The costs will be about 4 · n p^2 floating point operations, thus with p ≫ n the gain is considerable.
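The identity (14) can be realized in a few lines; the sketch below assumes Y is the n × p matrix from (12) with n ≪ p and returns only the n nonzero eigenvalues of S together with the corresponding eigenvectors (the remaining p − n eigenvalues are zero).

import numpy as np

def pooled_eig_via_svd(Y):
    """Nonzero eigenpairs of S = Y^T Y obtained from the thin SVD of Y, cf. (13)-(14)."""
    _, sigma, QT = np.linalg.svd(Y, full_matrices=False)  # Y = P Sigma Q^T
    return sigma**2, QT.T   # eigenvalues sigma_i^2 and eigenvectors (columns of Q)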
Moreover, if

  S^* = \lambda S + (1 - \lambda) I_p, \quad \lambda \in (0,1],   (15)

we immediately obtain the needed eigendecomposition of S^* as

  S^* = \lambda S + (1 - \lambda) I_p = Q \left( \lambda \Sigma^2 + (1 - \lambda) I_p \right) Q^T.   (16)

The SVD can be computed in a backward stable way with all singular values accurate up to the machine precision level (Barlow et al., 2005). For the special case (15), which is commonly denoted as Tikhonov or ridge regularization of S, a more efficient computation can be performed as follows.
Algorithm 3. LDA* for the ridge regularization (15).
Step 1. Compute the matrix

  A = [\bar{X}_1 - Z, \dots, \bar{X}_K - Z]   (17)

of size p × K whose k-th column is X̄_k − Z, and compute the matrix Y in (12).
Step 2. Compute the singular value decomposition of Y as

  Y = P \Sigma Q^T,   (18)

with singular values {σ_1, ..., σ_n}, and complement these singular values with p − n zero values σ_{n+1} = ··· = σ_p = 0.
Step 3. For a fixed λ ∈ (0,1], compute

  D^* = \mathrm{diag}\{\lambda \sigma_1^2 + (1 - \lambda), \dots, \lambda \sigma_p^2 + (1 - \lambda)\}.   (19)

Step 4. Compute the matrix

  B = (D^*)^{-1/2} Q^T A   (20)

and assign Z to group k if the column of B with the smallest Euclidean norm is the k-th column.
Step 5. Repeat steps 3 and 4 with different values of λ and find the classification rule with the best classification performance.
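One way to realize Algorithm 3 in code is sketched below, assuming the same hypothetical inputs as before plus the matrix Y from (12) and λ < 1. It avoids forming the full p × p eigenvector matrix: the p − n eigenvalues of S* belonging to the orthogonal complement of the range of Q all equal 1 − λ, so that part of the distance can be computed from a residual.

import numpy as np

def lda_star_ridge(Xbar, Y, Z, lam):
    """Sketch of Algorithm 3: ridge-regularized LDA* via the thin SVD of Y (lam < 1)."""
    A = (Xbar - Z).T                                      # p x K, cf. (17)
    _, sigma, QT = np.linalg.svd(Y, full_matrices=False)  # thin SVD, cf. (18)
    d = lam * sigma**2 + (1.0 - lam)                      # leading eigenvalues of S*, cf. (19)
    C = QT @ A                                            # coordinates of A in range(Q)
    # squared column norms of B = (D*)^{-1/2} Q^T A, cf. (20), split into the
    # range of Q and its orthogonal complement (eigenvalue 1 - lam there)
    in_range = np.sum(C**2 / d[:, None], axis=0)
    residual = np.sum((A - QT.T @ C)**2, axis=0) / (1.0 - lam)
    return np.argmin(in_range + residual)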
Eigenvalues of S^* evaluated in (19) can be interpreted as regularized eigenvalues. They are however different from shrinkage eigenvalues (Pourahmadi, 2013), which are obtained by regularizing S by applying a penalization criterion directly on the eigenvalues.
3 L_2-REGULARIZED LINEAR DISCRIMINANT ANALYSIS
Disadvantages of SCRDA (shrunken centroids regularized discriminant analysis; Guo et al., 2007) include its computational intensity as well as an inconsistent approach to regularization: the means are modified by an L_1-norm regularization, while the covariance matrix is regularized in the sense of the L_2-norm. Examples of other regularized LDA versions include the Prediction Analysis of Microarrays (PAM) (Tibshirani et al., 2003), which can be described as a diagonalized LDA (DLDA) with means regularized in the L_1-norm.
As an alternative, this section proposes a new regularized version of LDA denoted as L_2-LDA. As a unique feature, the means in each group as well as the pooled covariance matrix are regularized in the same way, i.e. in the L_2-norm. We propose an efficient algorithm for the computation of the method.
The classification rule of L_2-LDA assigns a new observation Z to the k-th group if l_k > l_j for every j ≠ k, where

  l_k = (\bar{X}'_k)^T (S^*)^{-1} Z - \frac{1}{2} (\bar{X}'_k)^T (S^*)^{-1} \bar{X}'_k + \log p_k   (21)
and X̄'_k denotes the mean of the k-th group shrunken towards the overall mean computed across groups. The method can be interpreted as based on an L_2-regularized Mahalanobis distance. As another contrast with the habitually used algorithm of SCRDA, we will estimate the parameter λ in a straightforward way, using an asymptotically optimal value minimizing the mean square error (Schäfer and Strimmer,
BIOINFORMATICS2015-InternationalConferenceonBioinformaticsModels,MethodsandAlgorithms
130
2005). To avoid confusion, the asymptotically optimal value of λ will be denoted by λ^* and the corresponding regularized covariance matrix by

  S^* = \lambda^* S + (1 - \lambda^*) T.   (22)
Algorithm 4. L_2-LDA.
Step 1. Compute λ^* as

  \lambda^* = \frac{2 \sum_{i=2}^{p} \sum_{j=1}^{i-1} \widehat{\mathrm{var}}(S_{ij})}{2 \sum_{i=2}^{p} \sum_{j=1}^{i-1} S_{ij}^2 + \sum_{i=1}^{p} (S_{ii} - 1)^2},   (23)

where \widehat{\mathrm{var}}(S_{ij}) is the maximum likelihood estimator of the variance of the values S_{ij} for a fixed i and j.
Step 2. Compute the eigendecomposition of S^* as

  S^* = Q^* D^* (Q^*)^T.   (24)

Step 3. For a fixed δ ∈ [0,1], compute

  \bar{X}'_k = \delta \bar{X}_k + (1 - \delta) \bar{X}, \quad k = 1, \dots, K.   (25)

Step 4. Assign Z to group k if

  \| (D^*)^{-1/2} (Q^*)^T (\bar{X}'_k - Z) \| = \min_{j=1,\dots,K} \| (D^*)^{-1/2} (Q^*)^T (\bar{X}'_j - Z) \|.   (26)

Step 5. Repeat steps 3 and 4 for various δ and find the optimal classification rule yielding the best classification performance.
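A minimal sketch of Algorithm 4 follows, split into the estimate of λ* and the classification step. The inputs are hypothetical: Xc is the n × p matrix of observations centered as in (12), Xbar the K × p group means, Xbar_all the overall mean, and delta a fixed shrinkage weight. The estimator of var(S_ij) below is a simple moment-based stand-in; the exact estimator is the one of Schäfer and Strimmer (2005).

import numpy as np

def lambda_star(Xc):
    """Assumed realization of (23) for the target T = I_p."""
    n = Xc.shape[0]
    S = Xc.T @ Xc / n                        # covariance estimator built from Xc
    M2 = (Xc**2).T @ (Xc**2) / n             # mean of the squared products Xc_ki * Xc_kj
    var_S = (M2 - S**2) / n                  # assumed ML-style estimate of var(S_ij)
    off = ~np.eye(S.shape[0], dtype=bool)    # off-diagonal mask
    num = var_S[off].sum()                   # equals 2 * sum_{i<j} var_hat(S_ij)
    den = (S[off]**2).sum() + ((np.diag(S) - 1.0)**2).sum()
    return num / den

def l2_lda(Xbar, Xbar_all, S_star, Z, delta):
    """Steps 2-4 of Algorithm 4 for a new observation Z."""
    D, Q = np.linalg.eigh(S_star)                        # cf. (24)
    means = delta * Xbar + (1.0 - delta) * Xbar_all      # shrunken means, cf. (25)
    B = (Q.T @ (means - Z).T) / np.sqrt(D)[:, None]
    return np.argmin(np.linalg.norm(B, axis=0))          # rule (26)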
The main computational costs are in step 2; the eigendecomposition costs about 9 · p^3 floating point operations. Note that we need not (and should never) compute the inverse of S^*, thus avoiding the additional computations of the Mahalanobis distance, which are expensive (of order p^3) and numerically rather unstable.
Algorithm 4 is formulated for a general target matrix T. For a specific choice of T, a computationally cheaper method can be obtained in an analogous way as Algorithms 2 and 3 were obtained from the general Algorithm 1. In particular, the costs of the Cholesky decomposition (9) are about p^3/3 floating point operations. On the other hand, the Cholesky decomposition will suffer from instability when S^* is not positive definite.
Analogous reasoning can be applied to obtain an algorithm for an L_2-regularized version of quadratic discriminant analysis (QDA), which is another standard classification method derived for data following a Gaussian distribution, but without the assumption of a common covariance matrix. A regularized covariance matrix estimator S_k^* in the k-th group will be considered with the optimal value of the regularization parameter, again yielding the best classification performance. The regularized quadratic discriminant score for the k-th group has the form

  q_k = \bar{X}_k^T (S_k^*)^{-1} Z - \frac{1}{2} \bar{X}_k^T (S_k^*)^{-1} \bar{X}_k - \frac{1}{2} Z^T (S_k^*)^{-1} Z - \frac{1}{2} \log |S_k^*| + \log p_k,   (27)

where | · | denotes the determinant of a matrix.
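For completeness, a sketch of evaluating the regularized quadratic score (27) for a single group is given below; S_k_star and p_k are hypothetical inputs, and a Cholesky factorization supplies both the quadratic forms and log|S_k*| without any explicit inverse.

import numpy as np
from scipy.linalg import cho_factor, cho_solve

def q_score(Xbar_k, S_k_star, Z, p_k):
    """Regularized quadratic discriminant score (27) for the k-th group."""
    c = cho_factor(S_k_star, lower=True)
    SinvZ = cho_solve(c, Z)                          # (S_k*)^{-1} Z
    SinvXk = cho_solve(c, Xbar_k)                    # (S_k*)^{-1} Xbar_k
    logdet = 2.0 * np.sum(np.log(np.diag(c[0])))     # log|S_k*| from the Cholesky factor
    return (Xbar_k @ SinvZ - 0.5 * Xbar_k @ SinvXk
            - 0.5 * Z @ SinvZ - 0.5 * logdet + np.log(p_k))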
4 EXAMPLES
We present two examples on real molecular genetic data sets in order to illustrate the behavior of the newly proposed L_2-LDA method and to compare its performance with that of classical classification procedures.
Example 1 contains data from our own cardiovascular genetic study on 24 patients having a cerebrovascular stroke and 24 control persons (Kalina and Zvárová, 2013). There were p = 38 590 gene transcripts measured, which correspond to the whole genome.
In Example 2, a prostate cancer metabolomic data set (Sreekumar et al., 2009) is analyzed, which contains p = 518 metabolites measured over two groups of patients, namely those with a benign prostate cancer (16 patients) and those with other cancer types (26 patients). The task in both examples is to learn a classification rule that discriminates between the two classes of individuals.
In both examples, we computed the classification methods described in this paper using the algorithms of Sections 2 and 3. For comparison, we also computed other available classification methods, including support vector machines (SVM), a classification tree, Kohonen's self-organizing map, and a multilayer perceptron with 2 hidden layers. The regularized versions of LDA were computed with the most common choice T = I_p as well as with the alternative choice

  S^* = \lambda S + (1 - \lambda) s I_p, \quad s = \sum_{i=1}^{p} S_{ii} / p,   (28)

for λ ∈ (0, 1]. We used the default settings of the user-submitted packages accompanying the free software R, which are listed in Table 1. The classification performance is measured by means of Youden's index, which is defined as

  \mathrm{sensitivity} + \mathrm{specificity} - 1.   (29)
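As an illustration of the evaluation criterion (29), a small helper computing Youden's index from binary labels might look as follows; it is independent of the R packages listed in Table 1.

import numpy as np

def youden_index(y_true, y_pred):
    """Youden's index (29) = sensitivity + specificity - 1 for binary labels."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    sensitivity = (y_pred & y_true).sum() / y_true.sum()
    specificity = (~y_pred & ~y_true).sum() / (~y_true).sum()
    return sensitivity + specificity - 1.0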
The results obtained on raw data as well as after a dimensionality reduction reveal that the regularized versions of LDA perform quite similarly.
AlgorithmsforRegularizedLinearDiscriminantAnalysis
131
Table 1: Results of Example 1 and Example 2. LDA* was computed using Algorithm 3 for the choice (15) and Algorithm 2 for (28). L_2-LDA was computed using Algorithm 4. PCA uses 20 principal components.

                                                        Youden's index
Method                   S^*    R package   Function    Example 1    Example 2
SVM                      -      e1071       svm         1.00         1.00
Classification tree      -      tree        tree        0.94         0.97
Self-organizing map      -      kohonen     som         0.88         0.93
Multilayer perceptron    -      nnet        nnet        Infeasible   Infeasible
LDA                      -      MASS        lda         Infeasible   Infeasible
SCRDA                    (15)   rda         rda         1.00         1.00
LDA*                     (15)   -           -           1.00         1.00
LDA*                     (28)   -           -           1.00         1.00
L_2-LDA                  (15)   -           -           1.00         1.00
L_2-LDA                  (28)   -           -           1.00         1.00
PCA ⇒ LDA                -      -           -           0.54         0.90
PCA ⇒ SCRDA              (15)   -           -           0.71         0.92
PCA ⇒ LDA*               (15)   -           -           0.63         0.81
PCA ⇒ LDA*               (28)   -           -           0.63         0.81
PCA ⇒ L_2-LDA            (15)   -           -           0.71         0.92
PCA ⇒ L_2-LDA            (28)   -           -           0.71         0.92
The newly proposed method L_2-LDA, computed with an efficient algorithm, seems to perform comparably with the available regularized methods, whose computation is less efficient. Besides, the choice of the target matrix T does not seem to play an important role. Some of the standard classification methods are infeasible because of the dimensionality (n ≪ p).
Further, we investigated the effect of dimensionality reduction on the classification performance. Principal component analysis (PCA) is performed and the consequent classification is applied on 20 principal components. To explain the notation, the approach denoted as PCA ⇒ L_2-LDA, for example, corresponds to performing L_2-LDA (using Algorithm 4) on 20 principal components of the original data.
The method PCA ⇒ L_2-LDA yields improved results compared to its standard counterpart (PCA ⇒ LDA) and is not outperformed by any other method. It is remarkable that the combination PCA ⇒ LDA* does not improve the results compared to PCA ⇒ LDA. The first three principal components seem rather arbitrary and they explain only 6 %, 6 %, and 5 % of the variability in the data, respectively. Besides, we did not find any clear interpretation of the principal components, and there seems to be no small group of genes responsible for a large portion of the variability of the data.
5 CONCLUSIONS
For high-dimensional continuous data, the main obstacle of the traditional Mahalanobis distance is the singularity of the empirical covariance matrix. As a solution, various regularized versions of LDA have been proposed, which are commonly used to learn a classification rule from high-dimensional data. This paper presents our view that the methodology in its standard form represents a set of ad hoc procedures rather than a coherent approach, moreover without efficient algorithms available for the computation. Besides, we point out some open problems concerning regularized versions of LDA, e.g. specific algorithms tailor-made for a specific choice of the target matrix.
Several new algorithms are proposed for regularized LDA in Section 2. We propose the L_2-regularized version of LDA, advocating our position that the mean and covariance matrix of multivariate data can be estimated by means of the same regularization principle. The new regularized classification method L_2-LDA is proposed in Section 3, accompanied by an efficient algorithm, and its computational costs are discussed. The regularization can be interpreted as a tailor-made correction for a small sample size. For larger sample sizes still satisfying n < p, the effect of the regularization becomes less important.
L_2-LDA can be interpreted as a shrinkage estimator of the covariance matrix, in the light of Stein's result on estimating the mean of multivariate normal data (Stein, 1956; Hastie et al., 2008). It is possible to interpret the method as an approach based on a regularized version of the Mahalanobis distance. The method is reliable under the implicit assumption that the variability is not substantially different across variables. The optimal regularization parameters seem to be reasonable in the classification analysis context. Another possibility is to regularize the within-group covariance matrix instead of regularizing S, which is however computationally more
BIOINFORMATICS2015-InternationalConferenceonBioinformaticsModels,MethodsandAlgorithms
132
intensive. An analysis of two real data sets reveals
its classification performance to be comparable to
available regularized classification methods for high-
dimensional data.
In general, some regularized statistical methods for the analysis of high-dimensional data have been empirically observed to possess reasonable robustness properties with respect to outlying measurements in the data. To give an example, regularized means have been observed to yield a certain local robustness against small departures in the observed data (Tibshirani et al., 2003). Regularization itself cannot ensure robustness against serious outliers (Filzmoser and Todorov, 2011) for continuous data. In the words of robust statistics, regularization does not imply robustness in terms of the breakdown point, and regularized LDA cannot replace robust classification procedures with a high breakdown point (Kalina, 2012). It remains an open problem to systematically investigate the relationship between regularization and statistical robustness for continuous data. Another warning should be given that there is no reason to suppose that the optimal procedure for the regularized model will perform well away from that model (Davies, 2014).
Alternative approaches could be formulated by means of regularization requiring a certain level of sparsity (Chen et al., 2012). Moreover, L_2-LDA can be derived in an alternative way as a Bayesian estimator or as the optimal method in the sense of robust optimization (Xanthopoulos et al., 2013).
As future research, we plan to investigate suitable choices of the target matrix T and to extend the regularized Mahalanobis distance to the context of cluster analysis. From the theoretical point of view, the robustness of LDA regularized in the L_1-norm has not been inspected, nor have regularized versions of the highly robust MWCD estimator (Kalina, 2012). We plan to apply and compare regularized versions of LDA in pattern recognition problems in the analysis of 3D neuroimages of spontaneous brain activity. There, we plan to exploit the new L_2-LDA without the usual sparseness assumption, allowing T to be chosen so as to model the high correlation of neighboring voxels.
ACKNOWLEDGEMENTS
The work was financially supported by the Neuron
Fund for Support of Science. The work of J. Kalina
was supported by the grant GA13-17187S of the
Czech Science Foundation. The work of J. Duintjer
Tebbens was supported by the grant GA13-06684S of
the Czech Science Foundation.
REFERENCES
Barlow, J., Bosner, N., and Drmac, Z. (2005). A new stable
bidiagonal reduction algorithm. Linear Algebra and
its Applications, 397:35–84.
Chen, X., Kim, Y., and Wang, Z. (2012). Efficient mini-
max estimation of a class of high-dimensional sparse
precision matrices. IEEE Transactions on Signal Pro-
cessing, 60:2899–2912.
Davies, P. (2014). Data Analysis and Approximate Mod-
els: Model Choice, Location-Scale, Analysis of Vari-
ance, Nonparametric Regression and Image Analysis.
Chapman & Hall/CRC, Boca Raton.
Duintjer Tebbens, J. and Schlesinger, P. (2007). Improving
implementation of linear discriminant analysis for the
high dimension/small sample size problem. Compu-
tational Statistics & Data Analysis, 52:423–437.
Filzmoser, P. and Todorov, V. (2011). Review of robust multivariate statistical methods in high dimension. Analytica Chimica Acta, 705:2–14.
Guo, Y., Hastie, T., and Tibshirani, R. (2007). Regularized
discriminant analysis and its application in microar-
rays. Biostatistics, 8:86–100.
Haff, L. (1980). Empirical Bayes estimation of the multivariate normal covariance matrix. Annals of Statistics, 8:586–597.
Hastie, T., Tibshirani, R., and Friedman, J. (2008). The
elements of statistical learning. Springer, New York,
2nd edition.
Kalina, J. (2012). Highly robust statistical methods in med-
ical image analysis. Biocybernetics and Biomedical
Engineering, 32(2):3–16.
Kalina, J. (2014). Classification analysis methods for
high-dimensional genetic data. Biocybernetics and
Biomedical Engineering, 34:10–18.
Kalina, J. and Zvárová, J. (2013). Decision support systems in the process of improving patient safety. In E-health Technologies and Improving Patient Safety: Exploring Organizational Factors, pages 71–83. IGI Global, Hershey.
Kogan, J. (2007). Introduction to clustering large and high-
dimensional data. Cambridge University Press, Cam-
bridge.
Pourahmadi, M. (2013). High-dimensional covariance es-
timation. Wiley, New York.
Schäfer, J. and Strimmer, K. (2005). A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statistical Applications in Genetics and Molecular Biology, 32:1–30.
Sreekumar, A. et al. (2009). Metabolomic profiles delineate potential role for sarcosine in prostate cancer progression. Nature, 457:910–914.
Stein, C. (1956). Inadmissibility of the usual estimator for
the mean of a multivariate normal distribution. Pro-
ceedings of the Third Berkeley Symposium on Mathe-
matical Statistics and Probability, 1:197–206.
Tibshirani, R., Hastie, T., and Narasimhan, B. (2003). Class prediction by nearest shrunken centroids, with applications to DNA microarrays. Statistical Science, 18:104–117.
Xanthopoulos, P., Pardalos, P., and Trafalis, T. (2013). Ro-
bust data mining. Springer, New York.
AlgorithmsforRegularizedLinearDiscriminantAnalysis
133