Efficient and Distributed DBScan Algorithm Using MapReduce to Detect Density Areas on Traffic Data

Ticiana L. Coelho da Silva, Antônio C. Araújo Neto, Regis Pires Magalhães, Victor A. E. de Farias, José A. F. de Macêdo, Javam C. Machado

Abstract

Mobility data has been fostered by the widespread diffusion of wireless technologies. This data opens new opportunities for discovering the hidden patterns and models that characterise human mobility behaviour. However, due to the huge size of the generated mobility data and the complexity of mobility analysis, new scalable algorithms for efficiently processing such data are needed. In this paper we are particularly interested in using traffic data to find congested areas within a city. To this end we developed a new, efficient, distributed strategy for the DBScan algorithm that uses MapReduce to detect dense areas. We conducted experiments using real traffic data from a Brazilian city (Fortaleza) and compared our approach with centralized and MapReduce-based DBSCAN approaches. Our preliminary results confirm that our approach is scalable and more efficient than its competitors.
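
To make "dense areas" concrete, the following is a minimal sketch of the classic, centralized DBSCAN procedure in Python. It is not the distributed MapReduce strategy proposed in the paper, whose details are not given in this abstract. The function names, the 2D point format, and the sample coordinates are assumptions made for illustration only.

```python
from math import hypot

NOISE = -1  # label for points that belong to no dense area


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i], including i."""
    xi, yi = points[i]
    return [j for j, (xj, yj) in enumerate(points)
            if hypot(xi - xj, yi - yj) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # core point: keep expanding
    return labels


# Hypothetical sample: two dense groups of positions and one isolated point.
sample_points = [(0, 0), (0, 1), (1, 0), (1, 1),
                 (10, 10), (10, 11), (11, 10),
                 (50, 50)]
sample_labels = dbscan(sample_points, eps=1.5, min_pts=3)
```

Each call to region_query scans every point, so this sketch runs in quadratic time. That cost on large inputs is what motivates spatial indexing and the distributed strategies the abstract refers to.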



Paper Citation


in Harvard Style

L. Coelho da Silva T., C. Araújo Neto A., Pires Magalhães R., A. E. de Farias V., A. F. de Macêdo J. and C. Machado J. (2014). Efficient and Distributed DBScan Algorithm Using MapReduce to Detect Density Areas on Traffic Data. In Proceedings of the 16th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-027-7, pages 52-59. DOI: 10.5220/0004891700520059


in Bibtex Style

@conference{iceis14,
author={Ticiana L. Coelho da Silva and Antônio C. Araújo Neto and Regis Pires Magalhães and Victor A. E. de Farias and José A. F. de Macêdo and Javam C. Machado},
title={Efficient and Distributed DBScan Algorithm Using MapReduce to Detect Density Areas on Traffic Data},
booktitle={Proceedings of the 16th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2014},
pages={52-59},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004891700520059},
isbn={978-989-758-027-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 16th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - Efficient and Distributed DBScan Algorithm Using MapReduce to Detect Density Areas on Traffic Data
SN - 978-989-758-027-7
AU - L. Coelho da Silva T.
AU - C. Araújo Neto A.
AU - Pires Magalhães R.
AU - A. E. de Farias V.
AU - A. F. de Macêdo J.
AU - C. Machado J.
PY - 2014
SP - 52
EP - 59
DO - 10.5220/0004891700520059