Geometric Divide and Conquer Classification for High-dimensional Data

Pei Ling Lai, Yang Jin Liang, Alfred Inselberg

Abstract

The Nested Cavities (NC) classifier (Inselberg and Avidan, 2000) gave rise to a powerful new classification approach. For a dataset P and a subset S ⊂ P, the classifier constructs a rule distinguishing the elements of S from those in P − S. NC is a geometrical algorithm that builds a sequence of nested unbounded parallelepipeds of minimal dimensionality containing disjoint subsets of P, from which a hypersurface (the rule) containing the subset S is obtained. Partitioning P − S and S into disjoint subsets is very useful when the original rule is either too complex or imprecise. As illustrated with examples, this separation yields considerable insight into the dataset's structure. Specifically, in one of the problems we studied, two different types of water mines were separated. In another dataset, two distinct types of ovarian cancer were found. The process is developed and illustrated on a sonar dataset with 60 variables and two categories ("mines" and "rocks"), resulting in significant understanding of the domain and simplification of the classification rule. Such a situation is generic and occurs with other datasets, as illustrated by a similar decomposition of a financial dataset that produces two sets of conditions determining gold prices. The divide-and-conquer extension can be automated and also allows the classification of the sub-categories to be done in parallel.
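
The nesting idea in the abstract can be illustrated with a rough sketch. This is not the authors' NC algorithm: it assumes plain axis-aligned bounding boxes in place of the unbounded parallelepipeds of minimal dimensionality, and all names (`bounding_box`, `nested_boxes`, `classify`, `classify_any`) are hypothetical. It alternately wraps the points of S, then the points of P − S caught inside that wrapping, and so on; a point is assigned to S when it lies inside an odd number of the nested boxes. The divide-and-conquer step is shown only as taking the union of rules built for subsets of S that are given in advance.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Box = Tuple[Point, Point]  # (lower corner, upper corner)


def bounding_box(points: Sequence[Point]) -> Box:
    """Smallest axis-aligned box containing all points.

    Assumption: a simple stand-in for the parallelepipeds used by NC.
    """
    lower = tuple(min(coords) for coords in zip(*points))
    upper = tuple(max(coords) for coords in zip(*points))
    return lower, upper


def inside(box: Box, p: Point) -> bool:
    """True if p lies within the box on every coordinate."""
    lower, upper = box
    return all(lo <= x <= hi for lo, x, hi in zip(lower, p, upper))


def nested_boxes(target: Sequence[Point], rest: Sequence[Point],
                 max_depth: int = 10) -> List[Box]:
    """Build nested boxes by alternately wrapping each class.

    The first box wraps the target class, the second wraps the points of
    the other class that fell inside the first, and so on. max_depth
    bounds the loop when the two classes cannot be separated by boxes.
    """
    boxes: List[Box] = []
    wrap, other = list(target), list(rest)
    while wrap and len(boxes) < max_depth:
        box = bounding_box(wrap)
        boxes.append(box)
        # Points of the opposite class caught inside this box are wrapped next.
        wrap, other = [p for p in other if inside(box, p)], wrap
    return boxes


def classify(boxes: Sequence[Box], p: Point) -> bool:
    """Assign p to the target class if it is inside an odd number of nested boxes."""
    depth = 0
    for box in boxes:
        if not inside(box, p):
            break
        depth += 1
    return depth % 2 == 1


def classify_any(rules: Sequence[Sequence[Box]], p: Point) -> bool:
    """Divide-and-conquer variant: p is in S if any sub-rule accepts it."""
    return any(classify(boxes, p) for boxes in rules)
```

Because each sub-rule in `classify_any` is built from its own subset of S, the calls to `nested_boxes` are independent of one another and could be run in parallel, which mirrors the parallelism mentioned in the abstract. How the subsets are chosen is left open here.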

References

  1. Fayyad, U. M., Piatetsky-Shapiro, G., Smyth, P., and Uthurusamy, R. (1996). Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, Cambridge, Mass.
  2. Inselberg, A. (2009). Parallel Coordinates: Visual Multidimensional Geometry and Its Applications. Springer, New York.
  3. Inselberg, A. and Avidan, T. (2000). Classification and Visualization for High-Dimensional Data, in Proc. of KDD, 370-4. ACM, New York.
  4. Kugler, M. (2006). Divide-and-Conquer Large-Scale Support Vector Classification. Ph.D. Thesis, Dept. of CSE, Nagoya Inst. of Tech.
  5. Marchand, M. and Shawe-Taylor, J. (2002). The set covering machine. J. Mach. Learn. Res.
  6. Mavroforakis, M., Sdralis, M., and Theodoridis, S. (2006). A Novel SVM Geometric Algorithm based on Reduced Convex Hulls. Pat. Rec. ICPR Inter. Conf. 564-568.
  7. McBride, B. and Peterson, L. G. (2004). Blind Data Classification Using Hyper-Dimensional Convex Polytopes. Proc. AAAI.
  8. Murthy, S. et al. (1993). OC1: Randomized Induction of Oblique Decision Trees. AAAI.
  9. UCI (2012). Machine Learning Database Repository. www.ics.uci.edu/~mlearn/MLRepository.html.
  10. Wu, X. et al. (2008). Top 10 algorithms in data mining. Knowl. Inf. Syst., 14:1-37.

Paper Citation


in Harvard Style

Lai P. L., Liang Y. J. and Inselberg A. (2012). Geometric Divide and Conquer Classification for High-dimensional Data. In Proceedings of the International Conference on Data Technologies and Applications - Volume 1: DATA, ISBN 978-989-8565-18-1, pages 79-82. DOI: 10.5220/0004050600790082


in Bibtex Style

@conference{data12,
author={Pei Ling Lai and Yang Jin Liang and Alfred Inselberg},
title={Geometric Divide and Conquer Classification for High-dimensional Data},
booktitle={Proceedings of the International Conference on Data Technologies and Applications - Volume 1: DATA},
year={2012},
pages={79-82},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004050600790082},
isbn={978-989-8565-18-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Data Technologies and Applications - Volume 1: DATA
TI - Geometric Divide and Conquer Classification for High-dimensional Data
SN - 978-989-8565-18-1
AU - Lai, P. L.
AU - Liang, Y. J.
AU - Inselberg, A.
PY - 2012
SP - 79
EP - 82
DO - 10.5220/0004050600790082