Compared to the CPU, the average execution time of each
benchmark version is substantially lower on the GPU. On
average, the generated versions ran 15x faster on the GPU
than on the CPU, mainly because the GPU used offers
considerably more processing power than the CPU used. As on
the CPU, the KPN MoC significantly improved performance for
the majority of the benchmarks on the GPU. In particular,
for SITE, the KPN MoC version is 24% and 5% faster than the
DDF MoC and SDF MoC versions, respectively. For all dynamic
benchmarks, the KPN MoC performed at least 19% faster than
the DDF MoC. The largest difference was recorded for DITE,
where the KPN MoC version is about 51% faster than the DDF
MoC version.
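The relative figures above can be related to speedup factors with a small calculation. The sketch below assumes that "X% faster" means a speedup of 1 + X/100 in execution time, which matches the way the text pairs "about 51% faster" with a factor of 1.51x for the GPU. The function names and the timings are hypothetical and are not measurements from the evaluation.

```python
def speedup(t_baseline: float, t_candidate: float) -> float:
    """Return how many times faster the candidate runs than the baseline."""
    if t_baseline <= 0 or t_candidate <= 0:
        raise ValueError("execution times must be positive")
    return t_baseline / t_candidate


def percent_faster(t_baseline: float, t_candidate: float) -> float:
    """Express the speedup as 'X% faster' (assumed reading: speedup = 1 + X/100)."""
    return (speedup(t_baseline, t_candidate) - 1.0) * 100.0


# Hypothetical end-to-end timings in milliseconds (illustrative only).
t_ddf_ms = 151.0
t_kpn_ms = 100.0

print(f"speedup: {speedup(t_ddf_ms, t_kpn_ms):.2f}x")
print(f"percent faster: {percent_faster(t_ddf_ms, t_kpn_ms):.0f}%")
```

Under this reading, a version that is 24% faster corresponds to a speedup of 1.24x, and one that is 5% faster to 1.05x.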
7 CONCLUSIONS AND FUTURE WORK
We extended a model-based design framework, which uses
different classes of dataflow process networks (DPNs) as
models of computation (MoCs), with Kahn process networks
(KPNs). We modeled a set of benchmarks and automatically
synthesized them for different target hardware architectures
using every MoC supported by the framework, including the
newly integrated KPN MoC. We then evaluated all generated
benchmark versions in terms of code size and end-to-end
performance.
Based on our evaluations, the SDF MoC generated the most
succinct kernel code for static processes. The KPN MoC
additionally supports dynamic behaviors and generated more
compact kernel code than the DDF MoC, which needs additional
lines of code to evaluate actions dynamically at runtime
within kernels. Furthermore, the KPN MoC provided more
efficient implementations for all benchmarks in terms of
end-to-end performance on all target architectures: it ran
up to 1.55x and 1.51x faster than the DDF MoC on the CPU and
the GPU, respectively.
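The difference in code size between the two dynamic MoCs can be illustrated with a minimal Python sketch. It is not the kernel code generated by the framework; the process, its channel names, and both function names are hypothetical. The process reads a control token and forwards a token from one of two data inputs. In the KPN style this is sequential code with blocking reads, whereas the DDF style expresses the same behavior as guarded actions whose firing rules are checked at runtime.

```python
from collections import deque
from queue import Queue


def kpn_select(ctrl: Queue, a: Queue, b: Queue, out: Queue, n: int) -> None:
    """KPN style: sequential code; each get() blocks until a token arrives."""
    for _ in range(n):
        c = ctrl.get()                 # blocking read of the control token
        x = a.get() if c else b.get()  # blocking read of the selected input
        out.put(x)


def ddf_select(ctrl: deque, a: deque, b: deque, out: deque) -> None:
    """DDF style: guarded actions; firing rules are evaluated at runtime."""
    while True:
        if ctrl and ctrl[0] and a:        # action 1: control true, token on a
            ctrl.popleft()
            out.append(a.popleft())
        elif ctrl and not ctrl[0] and b:  # action 2: control false, token on b
            ctrl.popleft()
            out.append(b.popleft())
        else:
            break                         # no action is enabled


# Hypothetical input streams, used for both variants.
control, in_a, in_b = [True, False, True], [1, 2], [10]

qc, qa, qb, qo = Queue(), Queue(), Queue(), Queue()
for q, tokens in ((qc, control), (qa, in_a), (qb, in_b)):
    for t in tokens:
        q.put(t)
kpn_select(qc, qa, qb, qo, n=len(control))
kpn_result = [qo.get() for _ in range(len(control))]

ddf_out: deque = deque()
ddf_select(deque(control), deque(in_a), deque(in_b), ddf_out)
ddf_result = list(ddf_out)

print(kpn_result, ddf_result)
```

Both variants produce the same output stream for the same inputs. The DDF variant carries the extra guard checks that select an enabled action, which is the kind of runtime evaluation the paragraph above attributes to the DDF MoC.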
Future work will explore schemes for efficiently mapping the
models onto heterogeneous architectures to further
accelerate performance.
MODELSWARD 2021 - 9th International Conference on Model-Driven Engineering and Software Development