A Library to Support the Development of Applications that Process Huge Matrices in External Memory

Jaqueline A. Silveira, Salles V. G. Magalhães, Marcus V. A. Andrade, Vinicius S. Conceição

2013

Abstract

This paper presents a new library, named TiledMatrix, to support the development of applications that process large matrices stored in external memory. The library is based on strategies similar to cache memory management, and its basic purpose is to allow an application originally designed for internal memory processing to be easily adapted for external memory. It provides an interface for external memory access that is similar to the traditional way of accessing a matrix. TiledMatrix was implemented and tested in applications that require intensive matrix processing, such as computing the transpose of a matrix and computing the viewshed and flow accumulation on terrains represented by an elevation matrix. These applications were implemented in two versions: one using TiledMatrix and another using the Segment library included in GRASS, an open source GIS. They were executed on many datasets of different sizes and, according to the tests, all applications ran faster using TiledMatrix than Segment. On average, they were 7 times faster with TiledMatrix and, in some cases, more than 18 times faster. Since processing large matrices in external memory can take hours, this improvement is very significant.
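
The general idea of cache-like access to a disk-resident matrix can be illustrated with a minimal Python sketch. It is not TiledMatrix's actual API or implementation: the class name `TiledMatrixSketch`, the square-tile file layout, the LRU replacement policy, and all parameters are assumptions chosen only for illustration.

```python
import os
import struct
import tempfile
from collections import OrderedDict


class TiledMatrixSketch:
    """Hypothetical example: a matrix of 32-bit ints stored on disk in
    square tiles, with a small LRU cache of tiles kept in memory."""

    def __init__(self, path, rows, cols, tile=4, cache_tiles=2):
        self.rows, self.cols, self.tile = rows, cols, tile
        self.tiles_per_row = (cols + tile - 1) // tile
        self.tile_bytes = tile * tile * 4
        self.cache_tiles = cache_tiles
        self.cache = OrderedDict()  # tile id -> [values, dirty flag]
        self.loads = 0              # number of tiles read from disk
        n_tiles = ((rows + tile - 1) // tile) * self.tiles_per_row
        self.f = open(path, "w+b")
        self.f.write(bytes(n_tiles * self.tile_bytes))  # zero-filled file

    def _evict(self):
        # Drop the least recently used tile, writing it back if modified.
        tid, (values, dirty) = self.cache.popitem(last=False)
        if dirty:
            self.f.seek(tid * self.tile_bytes)
            self.f.write(struct.pack(f"<{len(values)}i", *values))

    def _tile(self, r, c):
        if not (0 <= r < self.rows and 0 <= c < self.cols):
            raise IndexError((r, c))
        tid = (r // self.tile) * self.tiles_per_row + c // self.tile
        if tid in self.cache:
            self.cache.move_to_end(tid)  # mark as most recently used
        else:
            if len(self.cache) >= self.cache_tiles:
                self._evict()
            self.f.seek(tid * self.tile_bytes)
            data = self.f.read(self.tile_bytes)
            values = list(struct.unpack(f"<{self.tile * self.tile}i", data))
            self.cache[tid] = [values, False]
            self.loads += 1
        return self.cache[tid]

    def __getitem__(self, rc):
        r, c = rc
        return self._tile(r, c)[0][(r % self.tile) * self.tile + c % self.tile]

    def __setitem__(self, rc, value):
        r, c = rc
        entry = self._tile(r, c)
        entry[0][(r % self.tile) * self.tile + c % self.tile] = value
        entry[1] = True  # tile must be written back on eviction

    def close(self):
        while self.cache:
            self._evict()
        self.f.close()
```

With such a wrapper, code written for an in-memory matrix keeps its `m[r, c]` access pattern, while only `cache_tiles` tiles reside in memory at any time.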


Paper Citation


in Harvard Style

Silveira, J. A., Magalhães, S. V. G., Andrade, M. V. A. and Conceição, V. S. (2013). A Library to Support the Development of Applications that Process Huge Matrices in External Memory. In Proceedings of the 15th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-8565-59-4, pages 153-160. DOI: 10.5220/0004435001530160


in Bibtex Style

@conference{iceis13,
author={Jaqueline A. Silveira and Salles V. G. Magalhães and Marcus V. A. Andrade and Vinicius S. Conceição},
title={A Library to Support the Development of Applications that Process Huge Matrices in External Memory},
booktitle={Proceedings of the 15th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2013},
pages={153--160},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004435001530160},
isbn={978-989-8565-59-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 15th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - A Library to Support the Development of Applications that Process Huge Matrices in External Memory
SN - 978-989-8565-59-4
AU - Silveira, Jaqueline A.
AU - Magalhães, Salles V. G.
AU - Andrade, Marcus V. A.
AU - Conceição, Vinicius S.
PY - 2013
SP - 153
EP - 160
DO - 10.5220/0004435001530160