abstraction for the analysis of data-dependent GPU
kernels. OOPSLA ’13, pages 605–622, New York, NY,
USA. ACM.
Collingbourne, P., Cadar, C., and Kelly, P. H. J. (2012).
Symbolic testing of OpenCL code. In Proceedings of
the 7th International Haifa Verification Conference
on Hardware and Software: Verification and Testing,
HVC’11, pages 203–218, Berlin, Heidelberg.
Springer-Verlag.
Collingbourne, P., Donaldson, A. F., Ketema, J., and
Qadeer, S. (2013). Interleaving and Lock-Step Semantics
for Analysis and Verification of GPU Kernels, pages
270–289. Springer Berlin Heidelberg, Berlin,
Heidelberg.
Darte, A. and Schreiber, R. (2005). A linear-time algorithm
for optimal barrier placement. In Proceedings of the
Tenth ACM SIGPLAN Symposium on Principles and
Practice of Parallel Programming, PPoPP ’05, pages
26–35, New York, NY, USA. ACM.
Dhok, M., Mudduluru, R., and Ramanathan, M. K. (2015).
Pegasus: Automatic barrier inference for stable
multithreaded systems. In Proceedings of the 2015
International Symposium on Software Testing and
Analysis, ISSTA 2015, pages 153–164, New York, NY,
USA. ACM.
Ernstsson, A., Li, L., and Kessler, C. (2017). SkePU 2:
Flexible and type-safe skeleton programming for
heterogeneous parallel systems. International Journal of
Parallel Programming.
Hathhorn, C., Becchi, M., Harrison, W. L., and Procter,
A. M. (2012). Formal semantics of heterogeneous
CUDA-C: A modular approach with applications. In
Proceedings Seventh Conference on Systems Software
Verification, SSV 2012, Sydney, Australia, 28-30
November 2012, pages 115–124.
Huisman, M. and Mihelčić, M. (2013). Specification and
verification of GPGPU programs using permission-based
separation logic. In 8th Workshop on Bytecode
Semantics, Verification, Analysis and Transformation
(BYTECODE 2013).
Kirk, D. and Hwu, W.-M. W. (2016). Programming
Massively Parallel Processors: A Hands-on Approach.
Morgan Kaufmann, 3rd edition.
Lamport, L. (1977). Proving the correctness of multiprocess
programs. IEEE Trans. Software Engineering, 3(2).
Lattner, C. and Adve, V. (2004). LLVM: A Compilation
Framework for Lifelong Program Analysis &
Transformation. In Proc. Intl. Symp. on Code generation
and optimization.
Leung, A., Gupta, M., Agarwal, Y., Gupta, R., Jhala, R.,
and Lerner, S. (2012). Verifying GPU kernels by test
amplification. In Proceedings of the 33rd ACM
SIGPLAN Conference on Programming Language Design
and Implementation, pages 383–394.
Li, G. (2010). Formal verification of programs and their
transformations. PhD thesis.
Li, G. and Gopalakrishnan, G. (2010). Scalable SMT-based
verification of GPU kernel functions. In Proceedings
of the Eighteenth ACM SIGSOFT International
Symposium on Foundations of Software Engineering (FSE
’10), pages 187–196.
Li, G. and Gopalakrishnan, G. (2012). Parameterized
verification of GPU kernel programs. In Proceedings of the
2012 IEEE 26th International Parallel and Distributed
Processing Symposium Workshops & PhD Forum,
IPDPSW ’12, pages 2450–2459.
Li, G., Li, P., Sawaya, G., Gopalakrishnan, G., Ghosh, I.,
and Rajan, S. P. (2012a). GKLEE: Concolic Verification
and Test Generation for GPUs. In Proceedings
of the 17th ACM SIGPLAN Symposium on Principles
and Practice of Parallel Programming, PPoPP ’12,
pages 215–224.
Li, P., Li, G., and Gopalakrishnan, G. (2012b). Parametric
flows: Automated behavior equivalencing for symbolic
analysis of races in CUDA programs. In Proceedings
of the International Conference on High Performance
Computing, Networking, Storage and Analysis,
SC ’12, pages 29:1–29:10.
Lin, Z., Gao, X., Wan, H., and Jiang, B. (2015). GLES: A
Practical GPGPU Optimizing Compiler Using Data
Sharing and Thread Coarsening, pages 36–50.
Springer, Cham.
Lv, J., Li, G., Humphrey, A., and Gopalakrishnan, G.
(2011). Performance Degradation Analysis of GPU
Kernels. In CAV EC².
Manna, Z. (1974). Mathematical Theory of Computation.
McGraw-Hill Kogakusha, Tokyo.
McCarthy, J. (1962). Towards a Mathematical Science of
Computation. In IFIP Congress, pages 21–28.
Nickolls, J. and Dally, W. J. (2010). The GPU computing
era. IEEE Micro, 30(2).
O’Boyle, M., Kervella, L., and Bodin, F. (1995).
Synchronization minimization in a SPMD execution model.
Journal of Parallel and Distributed Computing,
29(2):196–210.
Said, M., Wang, C., Yang, Z., and Sakallah, K. (2011).
Generating data race witnesses by an SMT-based analysis.
In Proceedings of the Third International Conference
on NASA Formal Methods, NFM’11, pages 313–327.
Springer-Verlag.
Stöhr, E. A. and O’Boyle, M. F. P. (1997). A graph based
approach to barrier synchronisation minimisation. In
Proceedings of the 11th International Conference on
Supercomputing, ICS ’97, pages 156–163.
Zheng, M., Ravi, V. T., Qin, F., and Agrawal, G. (2011).
GRace: A Low-overhead Mechanism for Detecting
Data Races in GPU Programs. In Proceedings of the
16th ACM Symposium on Principles and Practice of
Parallel Programming, PPoPP ’11, pages 135–146.
Zheng, M., Ravi, V. T., Qin, F., and Agrawal, G. (2014).
GMRace: Detecting data races in GPU programs via a
low-overhead scheme. IEEE Transactions on Parallel
and Distributed Systems, 25(1):104–115.
Analysis of GPGPU Programs for Data-race and Barrier Divergence