# A PARALLEL ONLINE REGULARIZED LEAST-SQUARES MACHINE LEARNING ALGORITHM FOR FUTURE MULTI-CORE PROCESSORS

### Tapio Pahikkala, Antti Airola, Thomas Canhao Xu, Pasi Liljeberg, Hannu Tenhunen, Tapio Salakoski

#### Abstract

In this paper we introduce a machine learning system based on a parallel online regularized least-squares learning algorithm implemented on a network-on-chip (NoC) hardware architecture. Three properties make the system particularly suitable for real-time adaptive systems. Firstly, the system is able to learn in an online fashion, a property required in almost all real-life applications of embedded machine learning systems. Secondly, in order to guarantee real-time response in embedded multi-core computer architectures, the learning system is parallelized and able to operate with a limited amount of computational and memory resources. Thirdly, the system can learn to predict several labels simultaneously, which is beneficial, for example, in multi-class and multi-label classification as well as in more general forms of multi-task learning. We evaluate the performance of our algorithm with one to four threads on a quad-core platform. A 16-thread implementation of the algorithm runs on an NoC platform consisting of a 4x4 mesh. Results show that the system is able to learn with minimal computational requirements, and that the parallelization of the learning process considerably reduces the required processing time.
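
As a rough illustration of what online regularized least-squares learning with several labels involves, the following pure-Python sketch keeps the inverse of the regularized covariance matrix up to date with a rank-one (Sherman-Morrison) correction for each new example. It is a sequential sketch under stated assumptions, not the parallel NoC implementation evaluated in the paper; the function names `make_state`, `predict`, and `update` and the parameter `regparam` are hypothetical.

```python
def make_state(n_features, n_labels, regparam=1.0):
    """Return (P, W): P = (regparam * I)^-1 and W = zeros (features x labels)."""
    # Hypothetical helper: the initial state before any example has been seen.
    P = [[(1.0 / regparam if i == j else 0.0) for j in range(n_features)]
         for i in range(n_features)]
    W = [[0.0] * n_labels for _ in range(n_features)]
    return P, W


def predict(W, x):
    """Predict all labels for one example x (list of feature values)."""
    n_labels = len(W[0])
    return [sum(W[i][j] * x[i] for i in range(len(x))) for j in range(n_labels)]


def update(P, W, x, y):
    """Update P and W in place with one example (x, y); y holds one value per label."""
    d = len(x)
    # Px = P x; P is symmetric, so this is also x^T P.
    Px = [sum(P[i][j] * x[j] for j in range(d)) for i in range(d)]
    denom = 1.0 + sum(x[i] * Px[i] for i in range(d))
    gain = [v / denom for v in Px]
    # Prediction error of the current model, computed before W changes.
    err = [yj - pj for yj, pj in zip(y, predict(W, x))]
    for i in range(d):
        # Sherman-Morrison: P <- P - (P x)(P x)^T / (1 + x^T P x)
        for j in range(d):
            P[i][j] -= gain[i] * Px[j]
        # Every label column shares the same gain vector.
        for j in range(len(y)):
            W[i][j] += gain[i] * err[j]
```

In this sketch each label occupies its own column of `W`, and each row of `P` and `W` is updated independently inside the loop, which is one plausible place to split the work across threads. The partitioning actually used in the paper is not described in this abstract.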

#### Paper Citation

#### in Harvard Style

Pahikkala T., Airola A., Canhao Xu T., Liljeberg P., Tenhunen H. and Salakoski T. (2011). **A PARALLEL ONLINE REGULARIZED LEAST-SQUARES MACHINE LEARNING ALGORITHM FOR FUTURE MULTI-CORE PROCESSORS**. In *Proceedings of the 1st International Conference on Pervasive and Embedded Computing and Communication Systems - Volume 1: SAAES, (PECCS 2011)*, ISBN 978-989-8425-48-5, pages 590-599. DOI: 10.5220/0003411405900599

#### in Bibtex Style

@conference{saaes11,

author={Tapio Pahikkala and Antti Airola and Thomas Canhao Xu and Pasi Liljeberg and Hannu Tenhunen and Tapio Salakoski},

title={A PARALLEL ONLINE REGULARIZED LEAST-SQUARES MACHINE LEARNING ALGORITHM FOR FUTURE MULTI-CORE PROCESSORS},

booktitle={Proceedings of the 1st International Conference on Pervasive and Embedded Computing and Communication Systems - Volume 1: SAAES, (PECCS 2011)},

year={2011},

pages={590-599},

publisher={SciTePress},

organization={INSTICC},

doi={10.5220/0003411405900599},

isbn={978-989-8425-48-5},

}

#### in EndNote Style

TY - CONF

JO - Proceedings of the 1st International Conference on Pervasive and Embedded Computing and Communication Systems - Volume 1: SAAES, (PECCS 2011)

TI - A PARALLEL ONLINE REGULARIZED LEAST-SQUARES MACHINE LEARNING ALGORITHM FOR FUTURE MULTI-CORE PROCESSORS

SN - 978-989-8425-48-5

AU - Pahikkala T.

AU - Airola A.

AU - Canhao Xu T.

AU - Liljeberg P.

AU - Tenhunen H.

AU - Salakoski T.

PY - 2011

SP - 590

EP - 599

DO - 10.5220/0003411405900599