real life, the ‘labels’ in the dataset for classification were provided by medical experts who may have applied some threshold in their minds while partitioning the data. Learning this threshold, and grounding it in theory, is therefore a good idea. Hence, we reason that our algorithm performed well in classification.
5 CONCLUSIONS
High-dimensional data require dimensionality-reduction techniques, for which PCA is usually considered suitable; ICA has not been widely used for time-series data. Classification tasks on high-dimensional data require dimensionality reduction first. We found evidence that ICA can indeed provide better classification than PCA. One of our contributions is the finding that a careful choice of clustering algorithm (PAM instead of k-means) also leads to better performance.
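As a minimal illustration of this pipeline, and not the exact experimental setup of this paper, the sketch below reduces synthetic data with PCA or FastICA and then clusters the reduced data with k-means or PAM. It assumes scikit-learn and scikit-learn-extra (whose KMedoids class implements PAM) are available; the data and the n_components and n_clusters values are placeholders.

# Minimal sketch: reduce dimensionality with PCA or ICA, then cluster
# with k-means or PAM. Assumes scikit-learn and scikit-learn-extra;
# the data, n_components, and n_clusters are illustrative placeholders.
import numpy as np
from sklearn.decomposition import PCA, FastICA
from sklearn.cluster import KMeans
from sklearn_extra.cluster import KMedoids  # PAM implementation

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))  # stand-in for a high-dimensional dataset

for reducer in (PCA(n_components=5), FastICA(n_components=5, random_state=0)):
    Z = reducer.fit_transform(X)  # project onto 5 components
    for clusterer in (KMeans(n_clusters=2, n_init=10, random_state=0),
                      KMedoids(n_clusters=2, method="pam", random_state=0)):
        labels = clusterer.fit_predict(Z)
        print(type(reducer).__name__, type(clusterer).__name__,
              np.bincount(labels))  # cluster sizes per combination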
Our most important contribution is a new algorithm based on semi-supervised learning. We applied it across multiple ICA runs for more stable results, and it provided the best classification performance of the methods we tested.
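The algorithm itself is described in the preceding sections; as a hypothetical sketch of the multiple-run idea only, the code below runs FastICA several times with different random seeds and matches components across runs via absolute correlations, a common way to gauge the stability of ICA estimates. The data and all parameter values are illustrative assumptions.

# Hypothetical sketch: run FastICA with several seeds and measure how
# well components from different runs match, as a crude stability check.
# The paper's actual aggregation over multiple ICA runs may differ.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 20))  # placeholder data

runs = []
for seed in range(5):
    ica = FastICA(n_components=4, random_state=seed, max_iter=1000)
    runs.append(ica.fit_transform(X))  # estimated sources for this seed

# For each later run, match every component of run 0 to its
# best-correlated counterpart (sign and order of ICA components
# are arbitrary, so absolute correlation is used).
ref = runs[0]
for k, S in enumerate(runs[1:], start=1):
    corr = np.abs(np.corrcoef(ref.T, S.T)[:4, 4:])
    print(f"run 0 vs run {k}: best matches {corr.max(axis=1).round(2)}")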
A limitation of this work is that we do not generalize to all kinds of datasets: datasets that are low-dimensional and have many highly sparse columns may not yield good results with ICA. Overall, this work contributes an additional method that uses ICA and may work very well on high-dimensional datasets. Future work may explore many more types of datasets for possible generalization, though our results already give good indications of better performance in higher dimensions. Dimensionality reduction is also very important for visualizing high-dimensional data, so future work may consider using a similar approach to improve visualization.