than four redundant permissions. Chia et al. (2012)
analyzed the Apps that request the most permissions
across three categories: free Apps, Apps with mature
content, and Apps whose names resemble those of
popular ones. They found that popular Apps request
more permissions than average.
De et al. (2010) addressed the application
recommendation problem. They developed an open source
recommendation system by applying Web mining
techniques to implicit ratings. Other research
focuses on software repository mining, which retrieves
information from sources that are available in
unstructured textual format, such as emails, source
code, and documentation (Hassan, 2008). Zhong and
Michahelles (2013) examined the distribution of sales
and downloads in Google Play. They concluded that
Google Play is a superstar market dominated mostly
by popular Apps. They found that these superstar
Apps account for the vast majority of downloaded
or purchased applications and, at the same time,
receive higher user ratings. Harman et al. (2012)
applied software repository mining to the Blackberry
App store by treating it as a software repository, and
claimed their work to be the first of its kind in the
literature. They
analyzed the relationships among apps in the Blackberry
App store, relating mined features to non-technical
information. They focused on only three features
(rating, price, and downloads) to provide insights to
developers, and they overlooked free apps. Our research
extends their work: we analyze all possible
relationships among the different features of Android
apps, which can help developers understand the current
state of Google Play. Furthermore, we identify the
technical dissimilarity among apps in the same
category, which leads us to cluster them into
technically similar groups.
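The clustering step mentioned above can be illustrated with a minimal sketch. It assumes each app has already been reduced to a vector of topic proportions by a topic model; the app names, the three-topic vectors, the value of k, and the deterministic farthest-point seeding are all hypothetical choices made for illustration, not the configuration used in this study.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equally sized vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(points) for column in zip(*points)]


def kmeans(points: List[Vector], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Plain K-means; returns one cluster label per point and the centroids."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # Assumption: deterministic farthest-point seeding, chosen only so that
    # the example is reproducible. Random seeding is the more common choice.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))

    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = mean_vector(members)
    return labels, centroids


# Hypothetical topic proportions for six apps listed in one store category.
topic_mix: Dict[str, List[float]] = {
    "app_a": [0.8, 0.1, 0.1],
    "app_b": [0.7, 0.2, 0.1],
    "app_c": [0.1, 0.8, 0.1],
    "app_d": [0.2, 0.7, 0.1],
    "app_e": [0.1, 0.1, 0.8],
    "app_f": [0.1, 0.2, 0.7],
}

names = list(topic_mix)
labels, centroids = kmeans([topic_mix[name] for name in names], k=3)

groups: Dict[int, List[str]] = {}
for name, label in zip(names, labels):
    groups.setdefault(label, []).append(name)
print(groups)
```

Apps that share a store category but end up in different groups are the kind of technically dissimilar cases the text refers to.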
5 CONCLUSIONS AND FUTURE WORK
In this paper we proposed an analysis approach
suitable for examining the intrinsic properties of App
repositories in general. As a case study, we
focused on analyzing Google Play, the largest
Android app store. The overall objective of this
analysis effort is to provide in-depth insight into the
intrinsic properties of such app repositories. Using
this approach, we identified strong negative
correlations for the pairs ⟨price, number of downloads⟩
and ⟨price, participation⟩, and a strong positive
correlation for the pair ⟨number of downloads, participation⟩.
Moreover, by employing a probabilistic topic
modeling technique and the K-means clustering method,
we found that the categorization system of Google Play
does not properly reflect the similarity of applications.
We also identified strong competition between
App providers producing similar applications.
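As an illustration of the pairwise correlation analysis summarized above, the following minimal sketch computes a correlation coefficient for each pair of features. Several things here are assumptions: Pearson's r is used because this section does not name the coefficient, "participation" is taken to mean the number of users who rated an app, and the five data points are invented solely to show the sign pattern reported above.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant feature")
    return covariance / sqrt(var_x * var_y)


# Hypothetical values for five paid apps (not data from the study).
features: Dict[str, Sequence[float]] = {
    "price": [0.99, 1.99, 2.99, 4.99, 9.99],
    "downloads": [50000, 20000, 10000, 3000, 500],
    "participation": [4000, 1500, 900, 200, 40],  # assumed: number of raters
}

# One coefficient per unordered pair of features.
correlations: Dict[Tuple[str, str], float] = {
    (a, b): pearson(features[a], features[b])
    for a, b in combinations(features, 2)
}

for (a, b), r in correlations.items():
    print(f"<{a}, {b}>: r = {r:+.2f}")
```

On heavily skewed download counts a rank-based coefficient such as Spearman's may be the more robust choice; Pearson's r is used here only to keep the sketch short.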
As future work, we aim to incorporate
other features of applications, such as reviews,
collected from other commercial repositories, and to
analyze their correlation with the already examined
features (such as ratings) of the apps. Moreover, we aim
to develop a recommendation system that exploits the
identified feature correlations to recommend applications.
REFERENCES
Blei, D. M. (2012). Probabilistic topic models. Commun.
ACM, 55(4):77–84.
Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent
Dirichlet allocation. J. Mach. Learn. Res., 3:993–1022.
Camelin, N., Detienne, B., Huet, S., Quadri, D., and
Lefèvre, F. (2011). Unsupervised concept annotation
using latent Dirichlet allocation and segmental
methods. In Proceedings of the First Workshop on
Unsupervised Learning in NLP, pages 72–81. Association
for Computational Linguistics.
Dokoohaki, N. and Matskin, M. (2012). Mining divergent
opinion trust networks through latent Dirichlet
allocation. In 2012 IEEE/ACM International Conference
on Advances in Social Networks Analysis and Mining,
pages 879–886. IEEE Computer Society.
Harman, M., Jia, Y., and Zhang, Y. (2012). App store
mining and analysis: MSR for app stores. In Proceedings
of the 9th Working Conference on Mining Software
Repositories (MSR ’12), pages 108–111. IEEE.
Hassan, A. E. (2008). The road ahead for mining software
repositories. In Frontiers of Software Maintenance,
2008, pages 48–57. FoSM.
Hatagami, Y. and Matsuka, T. (2009). Text mining with
an augmented version of the bisecting k-means
algorithm. In Proceedings of the 16th International
Conference on Neural Information Processing: Part II,
pages 352–359. Springer-Verlag.
Hu, D. J. (2009). Latent Dirichlet allocation for text, images
and music. Citeseer.
McCallum, A. K. (2012). MALLET: A machine learning
for language toolkit. http://mallet.cs.umass.edu,
Accessed: 30/06/2012.
Minelli, R. and Lanza, M. (2013). Software analytics for
mobile applications - insights & lessons learned. In
17th IEEE European Conference on Software
Maintenance and Reengineering (CSMR 2013). IEEE
Computer Society Press. To appear.
Newman, D. (2011). How to do your own topic modeling.
Collaborative Learning Center, Yale University,
New Haven. http://ydc2.yale.edu/node/362/attachment.
WEBIST 2013 - 9th International Conference on Web Information Systems and Technologies