lieve this discounts its overall performance. Regardless of the algorithm chosen, fitting the decision trees on the smaller datasets takes a negligible amount of time relative to the larger datasets. Moreover, Max Cut
Node Means PCA still provides the previously men-
tioned accuracy benefits. For the larger datasets, such
as CIFAR-100, the Max Cut Node Means PCA algo-
rithm reduced the required CPU time by 94% com-
pared to the baseline algorithm. Thus, Max Cut Node
Means PCA provides significant computational and
accuracy advantages.
4 CONCLUSIONS
In this paper, we propose two important modifications
to the CART methodology for constructing classifica-
tion decision trees. We first introduced a Max Cut based metric for determining the optimal binary split as an alternative to the Gini Impurity method (see the appendix for an O(n log n) implementation of Max Cut along a single feature). We then introduced the use of localized PCA to determine locally viable directions along which to consider splits, and we further modified the traditional PCA algorithm into Means PCA, which finds directions suited to discriminating between classes; illustrative sketches of both ideas are given below. We complement these modifications with a theoretical commentary on how they are reflected in the asymptotic bounds of node splitting and demonstrate that Means PCA improves upon the traditional method.
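To make these two components concrete, the following is a minimal sketch of an O(n log n) threshold search along a single feature. It assumes, purely for illustration, that the cut value of a candidate threshold is the sum of pairwise distances between points that land on opposite sides of the split and carry different class labels; the exact edge weights of our Max Cut metric are given in the appendix and may differ from this simplification, and the function name max_cut_threshold_1d exists only for this sketch.

```python
import numpy as np

def max_cut_threshold_1d(x, y):
    """Search all thresholds along one feature for the maximum-cut split.

    Illustrative assumption: the cut value of a threshold is the sum of
    distances |x_j - x_i| over pairs (i, j) that fall on opposite sides of
    the split and carry different class labels.  The search costs
    O(n log n) for the sort plus O(n * n_classes) for the threshold sweep.
    """
    order = np.argsort(x)
    xs, ys = np.asarray(x, dtype=float)[order], np.asarray(y)[order]
    classes, y_idx = np.unique(ys, return_inverse=True)
    n, n_cls = len(xs), len(classes)

    # Per-class prefix counts and prefix sums of the sorted feature values.
    onehot = np.zeros((n, n_cls))
    onehot[np.arange(n), y_idx] = 1.0
    cnt = np.cumsum(onehot, axis=0)                  # cnt[k, c]: points of class c among the first k+1
    sums = np.cumsum(onehot * xs[:, None], axis=0)   # sums[k, c]: their feature-value total
    cnt_tot, sum_tot = cnt[-1], sums[-1]

    best_cut, best_thr = -np.inf, None
    for k in range(n - 1):                           # left block = sorted points 0..k
        if xs[k] == xs[k + 1]:
            continue                                 # no threshold separates tied values
        right_cnt, right_sum = cnt_tot - cnt[k], sum_tot - sums[k]
        cut = 0.0
        for c in range(n_cls):
            # Pairs joining left points of class c with right points of any other class.
            other_cnt = right_cnt.sum() - right_cnt[c]
            other_sum = right_sum.sum() - right_sum[c]
            cut += cnt[k, c] * other_sum - other_cnt * sums[k, c]
        if cut > best_cut:
            best_cut, best_thr = cut, 0.5 * (xs[k] + xs[k + 1])
    return best_thr, best_cut
```

Similarly, the localized Means PCA step can be read as running PCA on the per-class mean vectors of the samples reaching a node, so that the leading components align with directions that separate the class centroids rather than with raw within-node variance. The sketch below follows that reading; the count-based weighting, the centering choice, and the helper name node_means_pca_directions are illustrative assumptions rather than the exact definition used in the paper.

```python
import numpy as np

def node_means_pca_directions(X, y, n_directions=None):
    """Candidate split directions for a node via a 'Means PCA' style step.

    Illustrative assumption: PCA is applied to the count-weighted class mean
    vectors of the samples at the node, so the leading right singular vectors
    point along directions of maximal between-class spread.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes = np.unique(y)
    counts = np.array([(y == c).sum() for c in classes], dtype=float)
    means = np.vstack([X[y == c].mean(axis=0) for c in classes])    # (n_classes, d)
    overall = np.average(means, axis=0, weights=counts)
    centered = (means - overall) * np.sqrt(counts)[:, None]         # weighted, centered class means
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    if n_directions is None:
        n_directions = max(1, len(classes) - 1)                     # at most C-1 informative directions
    return vt[:min(n_directions, len(s))]                           # each row is a unit-length direction
```

In a full implementation, each node would project its samples onto every returned direction, run the threshold search above on the projected values, and keep the direction and threshold pair with the largest cut value.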
Our extensive experimental analysis included more than 20,000 synthetically generated datasets with training sizes ranging from 100 to 300,000 and informative dimensions ranging from 4 to 50. We further considered both binary and multiclass classification tasks. These experiments demonstrate significant gains in accuracy and reductions in computational time from utilizing localized Means PCA and Max Cut. Furthermore, we show that the accuracy improvements become even more substantial as the dimension of the data and/or the number of classes increases, and that the runtime advantage grows with the dimension and size of the datasets.
We also analyzed real-world datasets; the results indicate that the Max Cut Node Means PCA algorithm remains advantageous on such data. For example, we show that on CIFAR-100 (100 classes, 3,072 dimensions, and 48,000 training points out of 60,000 total) our algorithm yields a 49.4% increase in accuracy compared to the baseline CART model,
while simultaneously reducing the CPU time required
by 94%. This novel algorithm helps bring decision
trees into the world of big, high-dimensional data.
Our experiments demonstrate the significant improvements that Max Cut Node Means PCA brings to constructing classification decision trees for these types of datasets. Further research on how these novel decision trees affect the performance of ensemble methods may lead to even greater advancements in the area.
ACKNOWLEDGEMENTS
This research used the Savio computational cluster re-
source provided by the Berkeley Research Comput-
ing program at the University of California, Berke-
ley (supported by the UC Berkeley Chancellor, Vice
Chancellor for Research, and Chief Information Of-
ficer). This research was supported in part by NSF
award No. CMMI-1760102.