also applied both of these steps to the synthetic
datasets and evaluated the impact of the k-value
on LOF. Figure 5 shows the impact of the k-value
on the Local Outlier Factor (LOF). It
demonstrates a simple scenario in which the data
objects belong to a Gaussian cluster, i.e. all the data
objects within the cluster follow a Gaussian
distribution. For each k-value ranging from 3 to
100, the mean, minimum and maximum LOF values
are plotted. It can be observed that, with
increasing k-value, the LOF neither increases nor
decreases monotonically. For example, as shown in
Figure 5, the maximum LOF value fluctuates as the
k-value increases and only eventually
stabilizes, which shows that a single value
of k is insufficient to produce an accurate LOF
value. Therefore, the mean of the LOFs is taken over the range
$k = k_{min}, \ldots, k_{max}$ in order to produce more
stable LOF values. These are shown in Figure 5.
Figure 5: Fluctuation of outlier factors within a Gaussian
cluster.
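The averaging step described above can be illustrated with a minimal sketch. It assumes the plain arithmetic mean $\overline{LOF}(p) = \frac{1}{k_{max} - k_{min} + 1} \sum_{k=k_{min}}^{k_{max}} LOF_k(p)$, a brute-force LOF that uses exactly k neighbours per point (ties broken by index), and distinct data points. The function names, the synthetic data, the isolated point at (8, 8) and the range k = 3, ..., 20 are all hypothetical choices made for illustration; they are not taken from the experiments.

```python
import math
import random


def lof_scores(points, k):
    """Return the LOF of every point for one neighbourhood size k (brute force)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # k nearest neighbours of each point, excluding the point itself
    neighbours = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # k-distance: distance from a point to its k-th nearest neighbour
    k_dist = [dist[i][neighbours[i][-1]] for i in range(n)]
    # local reachability density: inverse of the mean reachability distance
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean ratio of the neighbours' density to the point's own density
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


def mean_lof(points, k_min, k_max):
    """Average each point's LOF over k = k_min, ..., k_max."""
    totals = [0.0] * len(points)
    for k in range(k_min, k_max + 1):
        for i, score in enumerate(lof_scores(points, k)):
            totals[i] += score
    return [t / (k_max - k_min + 1) for t in totals]


# Hypothetical data: one Gaussian cluster plus a single isolated point.
random.seed(0)
cluster = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(150)]
data = cluster + [(8.0, 8.0)]
scores = mean_lof(data, k_min=3, k_max=20)
```

Points inside the cluster receive averaged scores close to 1, while the isolated point receives the largest score. Because the score is averaged over the whole range, it no longer depends on one particular choice of k.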
6 CONCLUSIONS
In this paper, we have proposed a symmetric and
computationally efficient kernel of order-2. Our
proposed kernel obtained a lower MISE than the
previously available kernels and hence produced a
more accurate density estimate. We have also
proposed an outlier detection method that uses our
proposed kernel function in order to construct density
estimates. We have decoupled the density estimation
and the local density based outlier detection steps in
order to preserve the strengths of both. As a
consequence, the resulting framework can be easily
adjusted to any application-specific environment.
Experiments performed on both real and synthetic
datasets indicate that the proposed techniques can
detect outliers efficiently. As future work, we will
focus on the classification of transient faults in
wireless sensor networks using outlier scores.
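The decoupling of density estimation from outlier scoring summarized above can be sketched as two independent functions. This is only an illustration under stated assumptions: the Epanechnikov kernel is used as a well-known symmetric order-2 stand-in and is not the kernel proposed in this paper, the score is a simple neighbour-to-own density ratio used as a placeholder for the local density based step, and the one-dimensional sample, the bandwidth h and the neighbourhood size k are hypothetical.

```python
import math


def epanechnikov(u):
    """Stand-in symmetric order-2 kernel (NOT the kernel proposed in the paper)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kde(sample, h, kernel=epanechnikov):
    """Step 1: build a one-dimensional kernel density estimate with bandwidth h."""
    def density(x):
        return sum(kernel((x - s) / h) for s in sample) / (len(sample) * h)
    return density


def outlier_scores(sample, density, k):
    """Step 2: compare each point's density with that of its k nearest neighbours."""
    scores = []
    for i, x in enumerate(sample):
        others = sorted(
            (s for j, s in enumerate(sample) if j != i),
            key=lambda s: abs(s - x),
        )[:k]
        neighbour_density = sum(density(s) for s in others) / k
        # density(x) > 0 for every sample point, since x contributes kernel(0)
        scores.append(neighbour_density / density(x))
    return scores


# Hypothetical readings: six similar values and one isolated value.
sample = [1.0, 1.1, 0.9, 1.2, 0.8, 1.05, 5.0]
density = kde(sample, h=0.5)
scores = outlier_scores(sample, density, k=3)
```

Because `outlier_scores` only needs a callable that returns a density, a different kernel or bandwidth can be substituted in `kde` without touching the scoring step, which is the kind of adjustability the decoupled design is meant to provide.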
REFERENCES
Aggarwal C. C., 2013. Outlier analysis, Springer,
doi:10.1007/978-1-4614-6396-2.
Barnett V. and Lewis T., 1994. Outliers in statistical
data, Wiley, vol. 3, New York.
Branch J. W., Giannella C., Szymanski B., Wolf R. and
Kargupta H., 2013. “In-Network Outlier Detection in
Wireless Sensor Networks,” Knowledge and
Information Systems, vol. 34 no. 1, pp. 23-54.
Breunig M. M., Kriegel H. P., Ng R. T. and
Sander J., 2000. “LOF: identifying density-based local
outliers,” ACM Sigmod Record, vol. 29 no. 2, pp. 93-104,
doi:10.1145/335191.335388.
Chandola V., Banerjee A. and Kumar V., 2009. “Anomaly
detection: A survey”, ACM Computing Surveys
(CSUR), vol. 41 no. 3: 15, pp. 1-58,
doi:10.1145/1541880.1541882.
Gupta M., Gao J., Aggarwal C.C. and Han J., 2013.
“Outlier detection for temporal data: A survey”, IEEE
Transactions on Knowledge and Data Engineering, vol.
25 no. 1, doi:10.1109/TKDE.2013.184.
Hodge V. J. and Austin J., 2004. “A survey of outlier
detection methodologies,” Artificial Intelligence
Review, vol. 22 no. 2, pp. 85-126,
doi:10.1007/s10462-004-4304-y.
Intel Lab Data, downloaded from
http://db.csail.mit.edu/labdata/labdata.html.
Jin W., Tung A. K. H., Han J. and Wang W., 2006.
“Ranking outliers using symmetric neighborhood
relationship,” Advances in Knowledge Discovery and
Data Mining, Springer Berlin Heidelberg, pp. 577-593,
doi:10.1007/11731139_68.
Knorr E. M. and Ng R. T., 1997. “A Unified
Notion of Outliers: Properties and Computation,”
Proc. KDD. Available at:
http://www.aaai.org/Papers/KDD/1997/KDD97-044.pdf.
Kriegel H. P., Kröger P., Schubert E. and Zimek A., 2009.
“LoOP: local outlier probabilities,” Proc. of the 18th
ACM conference on Information and knowledge
management (CIKM 09), ACM, pp. 1649-1652,
doi:10.1145/1645953.1646195.
Latecki L. J., Lazarevic A. and Pokrajac D., 2007. “Outlier
detection with kernel density functions,” Machine
Learning and Data Mining in Pattern Recognition,
Springer Berlin Heidelberg, pp. 61-75,
doi:10.1007/978-3-540-73499-4_6.
Loftsgaarden D. O. and Quesenberry C. P., 1965. “A
nonparametric estimate of a multivariate density
function,” The Annals of Mathematical Statistics, vol.
36 no. 3, pp. 1049-1051. Available at:
http://projecteuclid.org/euclid.aoms/1177700079.
Marron J. S. and Wand M. P., 1992. “Exact mean
integrated squared error,” The Annals of Statistics,
vol. 20 no. 2, pp. 712-736. Available at:
http://projecteuclid.org/download/pdf_1/euclid.aos/1176348653.
Papadimitriou S., Kitagawa H., Gibbons P. B. and
Faloutsos C., 2003. “LOCI: Fast outlier detection using
the local correlation integral,” Proc. of the 19th
SENSORNETS 2015 - 4th International Conference on Sensor Networks