Mining and Analysis of Apps in Google Play

Shahab Mokarizadeh, Mohammad Tafiqur Rahman, Mihhail Matskin

2013

Abstract

In this paper, we analyze Google Play, the largest Android app store, which provides a wide collection of data on app features (ratings, price, and number of downloads) and descriptions of application functionality. The overall objective of this analysis is to provide in-depth insight into the intrinsic properties of app repositories in general. This allows us to draw a comprehensive picture of the current state of the app market and thereby help application developers understand customers' desires and attitudes and the trends in the market. To this end, we propose an analysis approach that examines a given collection of apps in two directions. In the first direction, we measure the correlation between app features; in the second, we construct clusters of similar applications and then examine their characteristics in association with the features of interest. The examined datasets were collected from Google Play (in 2012) and Android Market (in 2011). In our analysis, we identified a strong correlation between price and number of downloads, and similarly between price and participation. Moreover, by employing a probabilistic topic modeling technique and the K-means clustering method, we found that the categorization system of Google Play does not properly reflect the similarity of applications. We also determined that there is strong competition between app providers producing similar applications.
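The first analysis direction, measuring correlation between app features, can be illustrated with a minimal sketch. The abstract does not say which correlation measure the authors used, so the Pearson coefficient here is an assumption, and the `pearson` helper, the record layout, and the toy values are all hypothetical and not taken from the paper's datasets.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up toy records (price in USD, download count); NOT data from the paper.
apps = [
    {"price": 0.00, "downloads": 50000},
    {"price": 0.99, "downloads": 10000},
    {"price": 1.99, "downloads": 5000},
    {"price": 4.99, "downloads": 500},
]

prices = [app["price"] for app in apps]
downloads = [app["downloads"] for app in apps]
r = pearson(prices, downloads)
print(f"price vs. downloads: r = {r:.3f}")
```

A coefficient near +1 or -1 indicates a strong linear relationship and a value near 0 a weak one. Download counts in app stores tend to be heavily skewed, so a rank-based measure such as Spearman's may be a more robust choice in practice.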


Paper Citation


in Harvard Style

Mokarizadeh S., Rahman M. and Matskin M. (2013). Mining and Analysis of Apps in Google Play. In Proceedings of the 9th International Conference on Web Information Systems and Technologies - Volume 1: BA, (WEBIST 2013) ISBN 978-989-8565-54-9, pages 527-535. DOI: 10.5220/0004502005270535


in Bibtex Style

@conference{ba13,
author={Shahab Mokarizadeh and Mohammad Tafiqur Rahman and Mihhail Matskin},
title={Mining and Analysis of Apps in Google Play},
booktitle={Proceedings of the 9th International Conference on Web Information Systems and Technologies - Volume 1: BA, (WEBIST 2013)},
year={2013},
pages={527-535},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004502005270535},
isbn={978-989-8565-54-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 9th International Conference on Web Information Systems and Technologies - Volume 1: BA, (WEBIST 2013)
TI - Mining and Analysis of Apps in Google Play
SN - 978-989-8565-54-9
AU - Mokarizadeh S.
AU - Rahman M.
AU - Matskin M.
PY - 2013
SP - 527
EP - 535
DO - 10.5220/0004502005270535