by an effective bag of instruction words, attaching more weight to the mixture of instruction types rather than to their order of execution.
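As a minimal sketch of this representation, assuming Dalvik mnemonics as the tokens (the mnemonic list below is illustrative input, not data from our corpus), a program reduces to a histogram over instruction words:

from collections import Counter

# Bag of instruction words: a program becomes a mnemonic histogram,
# discarding execution order entirely.
program = ["const/4", "add-int", "add-int", "if-ge", "mul-int", "add-int"]
bag = Counter(program)
print(bag)  # Counter({'add-int': 3, 'const/4': 1, 'if-ge': 1, 'mul-int': 1})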
[Figure 6 plot: accuracy (y-axis, 0.85 to 1.0) versus program collection size (x-axis, 1000 to 10000), one curve per SoC processor count (3 to 8).]
Figure 6: System accuracy, extracted from confusion matrices, shown as a function of ascending program collection size and parametrized by the SoC processor count.
The clustering process we devise bins compute programs into classes, each identified with a unique virtual processor. Much like a VM that compiles bytecode to machine code, our system then assigns virtual to physical processing elements, a step that is both device and SoC vendor specific and is based on the architecture properties attached to each virtual cluster. As evident from our confusion matrix data, there is a small statistical likelihood of a program being paired with a less-than-optimal SoC compute entity. While the runtime performance of such a match may fall below efficiency expectations, program execution nevertheless remains functionally correct. Extending the training program collection and relabeling it is one reasonable mitigation practice to reduce false positive occurrences.
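A minimal sketch of this assignment step, under our own assumptions about its inputs: the capability names and the vendor table below are hypothetical stand-ins for the device and vendor specific data.

# Hypothetical vendor table: physical SoC compute element -> capabilities.
VENDOR_TABLE = {
    "big-core":    {"fp64": True,  "simd": True},
    "little-core": {"fp64": False, "simd": True},
    "dsp":         {"fp64": False, "simd": False},
}

def assign(required):
    """Return the first physical element whose capabilities cover the
    architecture properties attached to a virtual cluster."""
    for element, caps in VENDOR_TABLE.items():
        if all(caps.get(prop, False) for prop, needed in required.items() if needed):
            return element
    # A mismatch costs only efficiency, never correctness, so fall back.
    return next(iter(VENDOR_TABLE))

print(assign({"fp64": True}))  # -> 'big-core'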
By endorsing a data-type-independent bytecode, we assumed symmetric SoC processors, each capable of executing the entire defined Dalvik ISA. This premise holds for the majority of instruction types, but not for all. For example, the double precision format might be supported natively on some cores, while other compute elements resort to less efficient software emulation. To address this, we plan to augment our instruction vocabulary by adding double data type annotations to the binary arithmetic mnemonics, and to let our GMM module respect the processor support level when forming program clusters.
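A sketch of the planned vocabulary change, under our assumptions about the current tokenization (the token shapes are illustrative, not our exact vocabulary): today, typed Dalvik mnemonics would collapse to their base operation, while the augmentation keeps double precision operations distinct.

import re

def to_token(mnemonic):
    """Map a Dalvik mnemonic to a vocabulary token, keeping doubles distinct."""
    base = re.sub(r"/2addr$|/lit\d+$", "", mnemonic)  # drop addressing variants
    op, _, ty = base.partition("-")
    if ty == "double":
        return op + "-double"  # planned: double ops stay type annotated
    return op                  # current: data type independent token

assert to_token("add-int/2addr") == "add"
assert to_token("mul-double") == "mul-double"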
A direct progression of our work is to assume no prior knowledge of the number of SoC compute elements, and to recover both the model fit and the mixture dimension directly from the incomplete program training set, using a combination of the Akaike and Bayesian information criteria.
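A sketch of this selection step, assuming the scikit-learn GMM implementation in place of our own module, and a simple sum of the two criteria as the combination rule (that blend is our assumption, not a result):

import numpy as np
from sklearn.mixture import GaussianMixture

def select_components(X, k_max=8):
    """Fit GMMs over candidate processor counts and return the count
    minimizing AIC + BIC on the program feature matrix X."""
    best_k, best_score = 1, np.inf
    for k in range(1, k_max + 1):
        gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
        score = gmm.aic(X) + gmm.bic(X)  # combined criterion (our assumption)
        if score < best_score:
            best_k, best_score = k, score
    return best_k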
We look forward to a more widespread and publicly available repository of compute programs, to allow for extending our experiments and pursuing more real-world machine code executables. Lastly, we envision our software being incorporated seamlessly into a mobile application platform, to perform the classification task of processor target selection at execution runtime.
ACKNOWLEDGEMENTS
We would like to thank the anonymous reviewers for
their insightful and helpful feedback.
REFERENCES
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In International Symposium on Information Theory, pages 267–281, Budapest, Hungary.
Augonnet, C. (2011). Scheduling Tasks over Multicore Machines Enhanced with Accelerators: a Runtime System's Perspective. PhD thesis, Université Bordeaux 1, Talence, France.
Baeza-Yates, R. and Ribeiro-Neto, B., editors (1999). Modern Information Retrieval. ACM Press Series/Addison Wesley, Essex, UK.
Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. (1990). Introduction to Algorithms. MIT Press/McGraw-Hill Book Company, Cambridge, MA.
Dalvik (2007). Bytecode for Dalvik VM. http://source.android.com/devices/tech/dalvik/dalvik-bytecode.html.
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1–38.
Duda, R. O., Hart, P. E., and Stork, D. G. (2001). Unsupervised learning and clustering. In Pattern Classification, pages 517–601. Wiley, New York, NY.
Fraley, C. and Raftery, A. E. (2002). Model-based clustering, discriminant analysis and density estimation. Journal of the American Statistical Association, 97(458):611–631.
Fraley, C. and Raftery, A. E. (2007). Bayesian regularization for normal mixture estimation and model-based clustering. Journal of Classification, 24(2):155–181.
Kohavi, R. and Provost, F. (1998). Glossary of terms. Machine Learning, 30(2):271–274.
Manning, C. D., Raghavan, P., and Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press, Cambridge, United Kingdom.
McLachlan, G. J. and Basford, K. E. (1988). Mixture Models: Inference and Applications to Clustering. Marcel Dekker, New York, NY.
McLachlan, G. J. and Peel, D. (2000). Finite Mixture Models. John Wiley and Sons, New York, NY.
Ngatchou-Wandji, J. and Bulla, J. (2013). On choosing a mixture model for clustering. Journal of Data Science, 11(1):157–179.