THE STEPWISE RESPONSE REFINEMENT SCREENER (SRRS)

AND ITS APPLICATIONS TO ANALYSIS OF FACTORIAL

EXPERIMENTS

Frederick Kin Hing Phoa

Institute of Statistical Science, Academia Sinica, 128 Academia Rd. Sec. 2, Nangang Dist., Taipei 115, Taiwan R.O.C.

Keywords:

Stepwise Response Reﬁnement Screener (SRRS), Modiﬁed akaike information criterion, Screening experi-

ment, Nonregular fractional factorial design, Supersaturated design.

Abstract:

Two-level supersaturated designs are very useful in the screening experiments and the common goal is to

identify sparse but dominant active factors with low cost. Recently, a new analysis procedure called the

Stepwise Response Reﬁnement Screener (SRRS) method is proposed to screen important effects. This paper

extends this method to the two-level nonregular fractional factorial designs. The applications to several real-

life examples suggest that the SRRS method is able to retrieve similar results as the existing methods do.

Simulation studies show that compared to existing methods in the literature, the SRRS method performs well

in terms of the true model identiﬁcation rate and the average model size.

1 INTRODUCTION

As science and technology have advanced to a higher

level nowadays, investigators are becoming more in-

terested in and capable of studying large-scale sys-

tems. To address these challenges of expensive ex-

perimental costs, research in experimental design has

lately focused on the class of supersaturated designs

(SSD) for their run-size economy and mathematically

novelty. Under the condition of factor sparsity (Box

and Meyer, 1986), these experimentsaims at correctly

identifying the subset of those active factors that have

signiﬁcant impact on the response, so that the whole

investigation can be economically proceed via dis-

carding inactive factors prior to the follow-up experi-

ments.

Traditionally, SSDs are employed only for screen-

ing main effects, and interactions are discarded due

to limited degree of freedom. More reﬁned analy-

sis methods were recently developed and Phoa, Pan

and Xu (2009) provides a comprehensive list of re-

cent analysis methods found in the literature. Candes

and Tao (2007) proposed the Dantzig selector (DS)

and showed that it has some remarkable properties

under some conditions. Phoa, Pan and Xu (2009) im-

plemented the DS in practice, introducing a graphi-

cal procedure via a proﬁle plot for analysis and an

automatic variable selection procedure via a modi-

ﬁed Akaike information criterion (AIC). Tradition-

ally, AIC is used for model selection. For linear mod-

els, it is deﬁned as

AIC = nlog(RSS/n) + 2p (1)

where RSS =

∑

i=1

− ˆy

)

is the residual sum of

squares and p is the number of parameters in the

model. It is known that AIC tends to overﬁt the model

when the sample size is small. Phoa, Pan and Xu

(2009) imposed a heavy penalty on the model com-

plexity and proposed a new modiﬁed AIC for the DS

method, which is deﬁned as

mAIC = nlog(RSS/n) + 2p

(2)

The mAIC typically chooses a smaller model than

AIC.

Recently, Phoa (2011) introduce a new variable

selection approach via the Stepwise Response Reﬁne-

ment Screener (SRRS). The SRRS chooses the best

subset of variables or active factors by two proce-

dures: Factor Screening and Model Searching. This

method has shown its superior model selection ability

via a comparison to ﬁve commonly used methods in

the literature, namely SSVS (Chipman et al., 1997),

SSVS/IBF (Beattie et al., 2002), SCAD (Li and Lin,

2003), PLSVS (Zhang et al., 2007) and the DS (Phoa

et al., 2009a) method. Readers who are interested

in the main idea of the SRRS method are referred

to Phoa (2011) . This paper aims at extending the

SRRS method to the variants of supersaturated exper-

iments. In section 2, we review the notation and the

157

Kin Hing Phoa F. (2012).

THE STEPWISE RESPONSE REFINEMENT SCREENER (SRRS) AND ITS APPLICATIONS TO ANALYSIS OF FACTORIAL EXPERIMENTS.

In Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods, pages 157-161

DOI: 10.5220/0003711101570161

 SciTePress

general procedure of SRRS introduced in Phoa (2011)

. Section 3 discusses the modiﬁcations of the algo-

rithms when the SRRS is used in these extensions. To

demonstrate the value of the SRRS method, a simula-

tion study is performed in section 4. The result shows

that the SRRS method is powerful for analyzing not

only SSDs but also its variant designs. The last sec-

tion gives some concluding remarks.

2 ANALYSIS OF FRACTIONAL

FACTORIAL DESIGNS VIA THE

SRRS METHODS

Fractional factorial designs (FFDs) are classiﬁed into

two broad types: Regular FFDs and Nonregular

FFDs. Regular FFDs are constructed through deﬁn-

ing relations among factors and are described in many

textbooks (Wu and Hamada, 2000). These designs

have been widely used in scientiﬁc researches and

industrial processes because they are simple to con-

struct and to analyze. On the other hands, nonreg-

ular FFDs such as Plackett and Burman (1946) de-

signs, Quaternary-code designs (Phoa and Xu 2009 ,

Zhang et. al. 2011 ) and other orthogonal arrays are

often used in various screening experiments for their

run size economy and ﬂexibility (Wu and Hamada,

2000). Phoa, Xu and Wong (2009) demonstrated the

advantages of using nonregular FFDs using two real-

life toxicological experiments. Phoa, Wong and Xu

(2009) used three real-life chemometrics examples to

show the analysis pitfalls when the interactions are

assumed to be insigniﬁcant without veriﬁcations.

In this section, we extend the use of the SRRS

method to the analysis of fractional factorial designs

(FFDs), including two-level nonregular FFDs and

multi-level FFDs.

2.1 Modiﬁcation of the SRRS Method

Accompanied for the Analysis of

Nonregular Designs

Consider a nonregular FFDs with k

main effects and

n runs, where n < m. There are k

= k

+ 1)/2 in-

teractions between two different main effects. If all

two-factor interactions are considered together with

all main effects, it is possible that k

> m, then the

design matrix is supersaturated. We express the re-

lationship via a linear regression model y = Xβ + ε

where y is an n × 1 vector of observations, X is an

n×k model matrix for k = k

+ k

, β is a k×1 vector

of unknown parameters, and ε is an n × 1 vector of

random errors. Assume that ε ∼ N(0,σ

) is a vector

of independent normal random variables. In addition,

X is assumed to be supersaturated, i.e. n < k. We

denote m to be the number of potentially important

effects (PIEs) and S

inf

to be the inﬂuential set of PIEs

found in the process.

Traditionally, the analysis of nonregular FFDs is

based on two assumptions: the factor sparsity princi-

ple and the effectheredity prinicple. The ﬁrst assump-

tion has been embedded in the SRRS method, but the

second assumption does not. In order to implement

the heredity principle into the SRRS method, the two

procedures of the SRRS method are slightly modiﬁed

and presented in the following steps:

I. SRRS (Heredity Prinicple embedded)–Factor

Screening:

Step 1. Standardize data so that y

has mean 0 and

columns of X have equal lengths.

Step 2. Compute the marginal correlations ρ(X

)

for all main effects X

, i = 1,...,k. (∗)

Step 3. Choose E

such that |ρ(E

)| =

max

|ρ(X

)|. Identify E

as the

ﬁrst PIE and include E

in S

Inf

Step 4. Obtain the estimate β

by regressing y

Step 5. For the next m PIEs E

where j = 1,...,m,

m < n− 2,

(a) Compute the reﬁned response y

j−1

− E

j−1

(b) Compute the marginal correlations

ρ({X

},y

) for all main effects X

i = 1,...,k and all two-factor interactions

, X

∈ S

Inf

. (∗)

such that |ρ(E

)| =

max

}

|ρ({X

},y

)|. (∗)

(d) Obtain the estimate β

by regressing y

on E

(e) Identify E

as a PIE and include E

Inf

if |β

| ≥ γ, where γ is the threshold

of noise level.

(f) Repeat (a) to (e) until E

is not included

in S

Inf

II. SRRS–Model Searching:

Step 6. Perform all-subset search, with the consid-

eration of the heredity principle, for all E

from models with one factor to models with

m factors, where m is minimum between the

ceiling of n/3 or the number of E

in S

inf

. (∗)

ICPRAM 2012 - International Conference on Pattern Recognition Applications and Methods

158

Step 7. Compute mAIC for each model and choose

the ﬁnal model with the smallest mAIC

among all models, and all E

included in the

ﬁnal model are considered to be signiﬁcant

to the response y

The ﬁrst modiﬁcation is in Step 2. Due to the

heredity principle, two-factor interactions are never

be selected as the ﬁrst PIE, so only the marginal corre-

lations of all main effects are compared for selecting

the ﬁrst PIE. The second and third modiﬁcations are

in Step 5. During the search of the j

PIE, not all

two-factor interactions are considered in the compari-

son of marginal correlation. According to the heredity

principle, a two-factor interaction X

is considered in

Step 5(b) if and only if either X

or X

or both parents

main effects have been included in S

Inf

in the pre-

vious searches. Therefore, the modiﬁcations in Step

5 take away a subset of two-factor interactions that

none of their corresponding parent main effects have

been PIEs. The last modiﬁcation is in Step 6. The re-

duced models built in this step must follow the hered-

ity principle in order to avoid the situation that some

signiﬁcant two-factor interactions are included in the

reduced model but none of their parent main effects

have been included.

2.2 Two Illustrating Examples

We illustrate the analysis of nonregular FFDs via the

SRRS method step by step using the following two

examples. The Factor Screening procedures is termi-

nated via the noise threshold in the ﬁrst example and

via the maximum number of PIEs in the second ex-

ample.

Example 1. Consider the cast fatigue experiment

(Wu and Hamada 2000 , section 7.1), a real data set

consisting of seven two-level factors. The design ma-

trix and the response are found in Wu and Hamada

(2000) . When all two-factor interactions are consid-

ered to be as important as the main effects, the design

matrix consists of 21 additional interactions and is su-

persaturated.

In the Factor Screening procedure, the ﬁrst PIE

being identiﬁed is F and its absolute marginal cor-

relation to y

is the highest among all main effects

(0.6672). A regression model between y

and F is

built and the magnitude of the slope estimate |β

| =

0.4576. Then we set the threshold γ = 0.04, about

10% of β

To search for the second PIE, the new response y

is reﬁned by subtracting Fβ

from y

. Then among

all main effects and all the two-factor interactions that

consist of F, FG (the interaction between main ef-

fects F and G) has the highest absolute marginal cor-

Table 1: Factor Screening of Cast Fatigue Experiment Data.

Marginal Continue

m PIE Correlation |β| or Stop

0 F 0.6672 0.4576 Continue

1 FG −0.8980 0.4588 Continue

2 D −0.4677 0.1183 Continue

3 EF −0.6336 0.1442 Continue

4 C 0.5032 0.0758 Continue

5 E −0.5817 0.0785 Continue

6 AE −0.7667 0.1482 Continue

AE −0.6835 0 Stop

PIEs in S

Inf

after Factor Screening:

C, D, E, F, AE, EF, FG

relation (0.8980) to y

and so it is identiﬁed as the

second PIE. A regression model between y

and FG,

F is built and the magnitude of the slope estimate

|β

| = 0.4588 > γ. This means FG is important

enough to be included in the inﬂuential set S

Inf

to-

gether with F.

The procedure continues to search for the next ﬁve

PIEs. Table 1 shows every step of the process of Fac-

tor Screening. Note that in the last step, the absolute

magnitude of the slope estimate of AE is close to 0,

so the search stops and seven PIEs are identiﬁed in the

Factor Screening procedure.

Since there are 12 observations in the data, the

maximum number of active factors is suggested to be

4. There are totally 98 reduced models up to four-

factors models that are constructed from seven PIEs,

but only 49 of them fulﬁll the heredity principle. A

comparison of the mAICs of these 49 reduced models

shows that the two-effects model with F and FG has

the lowest mAIC = −27.82. Thus the SRRS method

suggests that F and FG have signiﬁcant impacts to

the response y

. This result is also recommended by

Wu and Hamada (2000, Section 8.4) and the Dantzig

selector (DS) method in Phoa, Pan and Xu (2009) .

Example 2. Consider the high-performance liq-

uid chromatography (HPLC) experiment (Vander-

Heyden et al., 1999), a real data set consisting of eight

two-level factors. The design matrix and the response

are found in Phoa, Wong and Xu (2009) . When all

two-factor interactions are considered to be as impor-

tant as the main effects, the design matrix consists of

28 additional interactions and is supersaturated.

In the Factor Screening procedure, the ﬁrst PIE

being identiﬁed is E and its absolute marginal cor-

relation to y

is the highest among all main effects

(0.5019). A regression model between y

and E is

built and the magnitude of the slope estimate |β

| =

0.5583. Then we set the threshold γ = 0.05, about

10% of β

To search for the second PIE, the new response y

THE STEPWISE RESPONSE REFINEMENT SCREENER (SRRS) AND ITS APPLICATIONS TO ANALYSIS OF

FACTORIAL EXPERIMENTS

159

Table 2: Factor Screening of HPLC Experiment Data.

Marginal Continue

m PIE Correlation |β| or Stop

0 E −0.5019 0.5583 Continue

1 EF 0.8055 0.7750 Continue

2 F 0.7747 0.4417 Continue

3 H −0.7396 0.3000 Continue

4 FH 0.5897 0.1625 Continue

5 A 0.6922 0.1389 Continue

6 FI −0.5295 0.0893 Continue

7 EI 0.5713 0.0836 Continue

8 AF −0.6587 0.0792 Continue

EF 0.6951 0.0667 Continue

PIEs in S

Inf

after Factor Screening:

A, E, F, H, AF, EF, EI, FH, FI

is reﬁned by subtracting Eβ

from y

. Then among

all main effects and all the two-factor interactions that

consist of E, EF (the interaction between main effects

E and F) has the highest absolute marginalcorrelation

(0.8055) to y

and so it is identiﬁed as the second PIE.

A regression model between y

and EF, E is built and

the magnitude of the slope estimate |β

| = 0.7750 >

γ. This means EF is important enough to be included

in the inﬂuential set S

Inf

together with E.

The procedure continues to search for the next

eight PIEs. Table 2 shows every step of the process of

Factor Screening. Note that in the last step, although

the absolute magnitude of the slope estimate of EF is

0.0667 > γ, the m < n − 2 criterion stops the search

and nine PIEs are identiﬁed in the Factor Screening

procedure.

Since there are 12 observations in the data, the

maximum number of active factors is suggested to be

4. With nine PIEs found in the previous step, there are

totally 255 reduced models up to four-factors models,

but only 102 of them fulﬁll the heredity principle. A

comparison of the mAICs of these 102 reduced mod-

els shows that the three-effects model with E, F and

EF has the lowest mAIC = −6.48. Thus the SRRS

method suggests that E, F and EF have signiﬁcant

impacts to the response y

Phoa, Wong and Xu (2009) previously analyzed

the same data and concluded that an additional effect

H was also signiﬁcant to the response. The mAIC of

the model consisting of E, F, H and EF is −3.95,

which is slightly higher than our suggested model.

The increase of mAIC when H is added comes from

the heavy penalty to the number of factors in the

model. If other penalty terms are used, results may

be different. For example, the original AIC favors the

addition of H. Therefore, H may be barely signiﬁ-

cant and some follow-up experiments are suggested

to investigate the signiﬁcance of H to the response.

Table 3: Summary of Simulation Results in Example 3.

Case I II III IV

Min TMIR 94% 47% 5% 0%

Size 1.00 1.85 2.05 1.06

1st Q. TMIR 97% 97% 44% 15%

Size 1.01 2.01 3.00 2.42

Median TMIR 98% 97% 96% 53%

Size 1.02 2.02 3.00 3.30

3rd Q. TMIR 99% 99% 99% 88%

Size 1.03 2.03 3.01 3.76

Max TMIR 100% 100% 100% 99%

Size 1.06 2.05 3.04 3.98

3 SIMULATION STUDIES

In order to judge the value of the SRRS method, we

randomly generate some models and evaluate the per-

formance of the SRRS method.

Example 3. In this example, we generate data from

the same linear model as in Example 1. Since there

are only 12 observations in the data, the maximum

possible number of active factors is 4. Therefore, we

consider four cases for beta. There are i active fac-

tors for case i, 1 ≤ i ≤ 4. For each case, we generate

500 models where the selection of active factors is

random without replacement, the signs of the active

factors are randomly selected from either positive or

negative, and the magnitudes are randomly selected

from 2 to 10 with replacement. For each model, we

generate data 100 times and obtain the True Model

Identiﬁed Rate (TMIR) and the average model size. In

the simulations we ﬁx γ = 1, which is approximately

equal to 10% of max|β

|. Table 3 gives the summary

statistics of these two quantities among 500 models.

The SRRS method is very effective in identifying

1, 2 and 3 active factors; the TMIR in these cases

are at least 96% in average true model identiﬁed rate

and only a few cases that have average model sizes

slightly higher than the true numbers of active factors.

The performance of the method decreases in identify-

ing 4 active factors. It is mainly because of the limit

posted on the allowed number of active factors, which

leads to a slightly underﬁtting situation.

4 CONCLUDING REMARKS

The Stepwise Response Reﬁnement Screener (SRRS)

method has shown its satisfactory performance on

screening the supersaturated designs in Phoa (2011) .

In this paper, we modify the SRRS method in order to

adapt for analyzing the nonregularFFDs with the con-

ICPRAM 2012 - International Conference on Pattern Recognition Applications and Methods

160

sideration of interactions. Under the validity of the

factor sparsity and effect heredity assumptions, the

calculations needed to carry out the analysis are sim-

ple and easily performed with little computation time.

Simulation suggests that the SRRS method performs

well in most of the cases, except when it is on the line

of maximum number of allowed active factors. In ad-

dition, we cannot ensure that thsi method works well

in every case as its fundamental theorems are still un-

der investigation. Sometimes it may still possible to

reach misleading conclusion, so it is highly recom-

mended that once the suggested set of signiﬁcant fac-

tors is found, a follow-up experiment is needed for

validating the results.

REFERENCES

Beattie, S., Fong, D., and Lin, D. (2002). A two-stage

bayesian model selection strategy for supersaturated

designs. Technometrics, 44:55–63.

Box, G. and Meyer, R. (1986). An analysis for unreplicated

fractional factorials. Technometrics, 28:11–18.

Candes, E. and Tao, T. (2007). The dantzig selector: statis-

tical estimation when p is much larger than n. Annals

of Statistics, 35:2313–2351.

Chipman, H., Hamada, H., and Wu, C. (1997). A bayesian

variable-selection approach for analyzing designed

experiments with complex aliasing. Technometrics,

39:372–381.

Li, R. and Lin, D. (2003). Analysis methods for supersat-

urated design: some comparisons. Journal of Data

Science, 1:249–260.

Phoa, F. (2011). The stepwise response reﬁnement screener

(srrs). (in review).

Phoa, F., Pan, Y., and Xu, H. (2009a). Analysis of super-

saturated designs via the dantzig selector. Journal of

Statistical Planning and Inference, 139:2362–2372.

Phoa, F., Wong, W., and Xu, H. (2009b). The need of con-

sidering the interactions in the analysis of screening

designs. Journal of Chemometrics, 23:545–553.

Phoa, F. and Xu, H. (2009). Quater-fraction factorial design

constructed via quaternary codes. Annals of Statistics,

37:2561–2581.

Phoa, F., Xu, H., and Wong, W. (2009c). The use of nonreg-

ular fractional factorial designs in combination toxic-

ity studies. Food and Chemical Toxicology, 47:2183–

2188.

Plackett, R. and Burman, J. (1946). The design of optimum

multifactorial experiments. Biometrika, 33:305–325.

Vander-Heyden, Y., Jimidar, M., Hund, E., Niemeijer, N.,

Peeters, R., Smeyers-Verbeke, J., D.L., M., and Hoog-

martens, J. (1999). Determination of system suitabil-

ity limits with a robustness test. Journal of Chro-

matography A, 845:145–154.

Wu, C. and Hamada, M. (2000). Experiments: Planning,

Analysis, and Parameter Design Optimization. Wiley,

New York.

Zhang, Q., Zhang, R., and Liu, M. (2007). A method

for screening active effects in supersaturated de-

signs. Journal of Statistical Planning and Inference,

137:235–248.

Zhang, R., Phoa, F., Mukerjee, R., and Xu, H. (2011).

A trigonometric approach to quaternary code designs

with application to one-eighth and one- sixteenth frac-

tions. Annals of Statistics, 39:931–955.

THE STEPWISE RESPONSE REFINEMENT SCREENER (SRRS) AND ITS APPLICATIONS TO ANALYSIS OF

FACTORIAL EXPERIMENTS

161