rived from baselines obtained from target permuta-
tions) are used to filter non-influential outliers and
normal samples. The target-aware paradigm uses
target-labeled data for training, but the diagnostics are
calculated before the corresponding target attribute
value is available (to meet the conditions for pro-
cess control applications). The ensemble model al-
lows us to deal with complex interactions in predic-
tor space and local data structures. Linear methods
based on principal components often fail to detect
outliers and/or contributors in anomaly detection,
especially in the presence of noise. Linear methods
are also sensitive to scaling, and they rarely work
when the target is a non-linear function of the inputs.
Moreover, methods such as partial least squares often
fail to rank the contributors correctly and to separate
relevant from irrelevant contributors. As the number
of irrelevant variables increases, such methods can even
fail to identify influential outliers on relevant
predictors. The proposed method works
equally well for linear and non-linear cases in terms
of diagnostics. Our method can correctly rank outliers
with respect to their effect on the target, rank attributes
that contribute to an outlier score, and filter out
non-influential outliers and normal samples. It is insensitive
to noise and ranks outliers and contributors for a data
sample using a fast, robust, nonparametric technique.
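As an illustration only, the idea of filtering samples against a baseline obtained from target permutations can be sketched as follows. This is a minimal sketch under stated assumptions, not the ensemble-based diagnostic proposed in this paper: the score (each sample's absolute contribution to the covariance between a single predictor and the target), the function names, the number of permutations, and the 0.99 quantile are all hypothetical choices made for the example.

```python
import random
import statistics


def influence_scores(x, y):
    """Stand-in diagnostic (an assumption, not the paper's score):
    each sample's absolute contribution to the x-y covariance."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return [abs((xi - mx) * (yi - my)) for xi, yi in zip(x, y)]


def permutation_threshold(x, y, n_permutations=200, quantile=0.99, seed=0):
    """Upper quantile of the scores obtained after shuffling the target.

    Shuffling y breaks any relation between predictor and target, so the
    resulting scores describe what a sample can reach by chance alone.
    """
    rng = random.Random(seed)
    baseline = []
    for _ in range(n_permutations):
        y_perm = y[:]          # copy, so the caller's target is untouched
        rng.shuffle(y_perm)
        baseline.extend(influence_scores(x, y_perm))
    baseline.sort()
    index = min(len(baseline) - 1, int(quantile * len(baseline)))
    return baseline[index]


def influential_samples(x, y, **kwargs):
    """Indices of samples whose score exceeds the permutation baseline."""
    threshold = permutation_threshold(x, y, **kwargs)
    scores = influence_scores(x, y)
    return [i for i, s in enumerate(scores) if s > threshold]
```

Samples whose score stays below the permutation threshold would be treated as normal or non-influential and dropped from further diagnosis. In practice the stand-in score would be replaced by the diagnostic of interest, and the quantile would be tuned to the acceptable false-alarm rate.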
ACKNOWLEDGEMENTS
This research was partially supported by ONR grant
N00014-09-1-0656. We wish to thank anonymous
referees for comments that improved this work.