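Table 2: Results of experiments (POSIX functions).

| Loop | Aggl. | n | 1 CPU: seq. | 1 CPU: for synch | 2 CPUs: time | 2 CPUs: S | 2 CPUs: E | 4 CPUs: time | 4 CPUs: S | 4 CPUs: E | 8 CPUs: time | 8 CPUs: S | 8 CPUs: E |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Loop 1 | - | 1500 | 0,0441 | 0,0462 | 0,0296 | 1,490 | 0,745 | 0,0231 | 1,281 | 0,320 | 0,0288 | 1,028 | 0,128 |
| Loop 1 | - | 2000 | 0,0782 | 0,0814 | 0,0475 | 1,646 | 0,823 | 0,0346 | 2,263 | 0,566 | 0,0390 | 2,005 | 0,251 |
| Loop 1 | - | 2500 | 0,1222 | 0,1271 | 0,0713 | 1,714 | 0,857 | 0,0461 | 2,651 | 0,663 | 0,0680 | 1,797 | 0,225 |
| Loop 1 | + / F, N = 200 | 1500 | 0,0441 | 0,0470 | 0,0299 | 1,475 | 0,737 | 0,0213 | 2,070 | 0,518 | 0,0272 | 1,621 | 0,203 |
| Loop 1 | + / F, N = 200 | 2000 | 0,0782 | 0,0861 | 0,0479 | 1,633 | 0,816 | 0,0350 | 2,234 | 0,559 | 0,0312 | 2,506 | 0,313 |
| Loop 1 | + / F, N = 200 | 2500 | 0,1222 | 0,1300 | 0,0721 | 1,695 | 0,847 | 0,0487 | 2,509 | 0,627 | 0,0461 | 2,651 | 0,331 |
| Loop 2 | +, N = 16 | 1500 | 0,0446 | 0,0452 | 0,0308 | 1,449 | 0,724 | 0,0251 | 1,777 | 0,444 | 0,0247 | 1,806 | 0,226 |
| Loop 2 | +, N = 16 | 2000 | 0,0793 | 0,0801 | 0,0516 | 1,537 | 0,768 | 0,0314 | 2,525 | 0,631 | 0,0318 | 2,494 | 0,312 |
| Loop 2 | +, N = 16 | 2500 | 0,1238 | 0,1249 | 0,0720 | 1,719 | 0,860 | 0,0458 | 2,703 | 0,676 | 0,0382 | 3,241 | 0,405 |
| Loop 3 | - | 1500 | 0,0578 | 0,0580 | 0,0391 | 1,478 | 0,739 | - | - | - | - | - | - |
| Loop 3 | - | 2000 | 0,0843 | 0,0844 | 0,0527 | 1,600 | 0,800 | - | - | - | - | - | - |
| Loop 3 | - | 2500 | 0,1148 | 0,1153 | 0,0648 | 1,772 | 0,886 | - | - | - | - | - | - |
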
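Table 3: Influence of the number of macro-slices on program performance (POSIX functions).

| N | 1 CPU: seq. | 1 CPU: for synch | 2 CPUs: time | 2 CPUs: S | 2 CPUs: E | 4 CPUs: time | 4 CPUs: S | 4 CPUs: E | 8 CPUs: time | 8 CPUs: S | 8 CPUs: E |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 32 | 0,0819 | 0,0908 | 0,0507 | 1,615 | 0,808 | 0,0428 | 1,914 | 0,478 | 0,0370 | 2,214 | 0,277 |
| 64 | 0,0819 | 0,0912 | 0,0498 | 1,645 | 0,823 | 0,0356 | 2,301 | 0,575 | 0,0346 | 2,367 | 0,296 |
| 128 | 0,0819 | 0,0878 | 0,0514 | 1,593 | 0,797 | 0,0324 | 2,528 | 0,632 | 0,0307 | 2,668 | 0,333 |
| 256 | 0,0819 | 0,0876 | 0,0538 | 1,522 | 0,761 | 0,0342 | 2,395 | 0,599 | 0,0364 | 2,250 | 0,281 |
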
Table 4: POSIX and OpenMP functions.

Sequential execution time (Seq., 1 CPU): 0,0819 s for every N.

| N | CPUs | OpenMP, s | POSIX, s | WS, s | (OpenMP %; POSIX %) |
|---|---|---|---|---|---|
| 512 | 1 | 0,0894 | 0,0866 | 0,0837 | (6,8%; 3,5%) |
| 512 | 2 | 0,0587 | 0,0516 | 0,0496 | (18,3%; 4%) |
| 512 | 4 | 0,0392 | 0,0381 | 0,0378 | (3,7%; 0,8%) |
| 512 | 8 | 0,0375 | 0,036 | 0,0359 | (4,5%; 0,3%) |
| 256 | 1 | 0,0888 | 0,0853 | 0,0852 | (4,2%; 3,5%) |
| 256 | 2 | 0,0568 | 0,0501 | 0,0499 | (13,8%; 0,4%) |
| 256 | 4 | 0,0345 | 0,0309 | 0,0306 | (12,7%; 1,0%) |
| 256 | 8 | 0,0386 | 0,036 | 0,0317 | (21,8%; 13,6%) |
| 128 | 1 | 0,0875 | 0,0866 | 0,0856 | (2,2%; 1,2%) |
| 128 | 2 | 0,0576 | 0,0521 | 0,0512 | (12,5%; 1,8%) |
| 128 | 4 | 0,0356 | 0,0318 | 0,0316 | (12,7%; 0,6%) |
| 128 | 8 | 0,0365 | 0,0319 | 0,032 | (14,1%; 0%) |
| 32 | 1 | 0,1001 | 0,0965 | 0,0845 | (18,5%; 14,2%) |
| 32 | 2 | 0,0585 | 0,056 | 0,0509 | (14,9%; 10%) |
| 32 | 4 | 0,0431 | 0,0401 | 0,0371 | (16,2%; 8,1%) |
| 32 | 8 | 0,0372 | 0,0343 | 0,0311 | (19,6%; 10,3%) |

The size of a macro-slice also affects program
locality, a very important factor in program
performance. Table 3 shows how the number of
macro-slices, N (Loop 1, free scheduling with
agglomeration), affects the speed-up S and the
efficiency E. For each number of processors, there
is a particular value of N that optimizes program
performance.
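
The relation between the measured times and the S and E columns can be sketched as follows. This is a minimal illustration, assuming the usual definitions S = T_seq / T_p and E = S / p (they reproduce the Table 3 values); the names `speedup`, `efficiency`, and `best_n` are hypothetical helpers, not part of the original programs, and decimal commas are written as points.

```python
# Times from Table 3 (seconds assumed): N -> {number of CPUs: time}.
T_SEQ = 0.0819  # sequential time, identical for every N in Table 3

TIMES = {
    32:  {2: 0.0507, 4: 0.0428, 8: 0.0370},
    64:  {2: 0.0498, 4: 0.0356, 8: 0.0346},
    128: {2: 0.0514, 4: 0.0324, 8: 0.0307},
    256: {2: 0.0538, 4: 0.0342, 8: 0.0364},
}


def speedup(t_seq, t_par):
    """Speed-up S: sequential time divided by parallel time (assumed definition)."""
    return t_seq / t_par


def efficiency(t_seq, t_par, cpus):
    """Efficiency E: speed-up divided by the number of CPUs (assumed definition)."""
    return speedup(t_seq, t_par) / cpus


def best_n(times, cpus):
    """Return the number of macro-slices N with the smallest time for `cpus` CPUs."""
    return min(times, key=lambda n: times[n][cpus])


if __name__ == "__main__":
    for cpus in (2, 4, 8):
        n = best_n(TIMES, cpus)
        t = TIMES[n][cpus]
        print(f"{cpus} CPUs: best N = {n}, "
              f"S = {speedup(T_SEQ, t):.3f}, E = {efficiency(T_SEQ, t, cpus):.3f}")
```

With the Table 3 times, this selects N = 64 for 2 CPUs and N = 128 for both 4 and 8 CPUs, which illustrates that the best N depends on the number of processors.
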
Table 4 compares the execution times of programs
written with OpenMP and POSIX lock functions for
Loop 1, n=2048. The WS entries give the execution
times when the send and receive functions are
removed from the code. The percentages in brackets
characterize the share of the program execution
time spent on synchronization with OpenMP and
POSIX functions, respectively. The data in Table 4
show that the POSIX functions yield shorter
execution times than the OpenMP functions in every
measured configuration.
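
The bracketed percentages can be approximated from the timings themselves. The sketch below assumes that each percentage is the extra time relative to the WS run, (t - t_WS) / t_WS; this formula is inferred from the numbers (it reproduces the N = 512 row) and is not stated explicitly in the text. The function name `sync_overhead_percent` and the dictionary layout are hypothetical.

```python
# Table 4, row N = 512 (seconds): CPUs -> (OpenMP time, POSIX time, WS time).
ROW_512 = {
    1: (0.0894, 0.0866, 0.0837),
    2: (0.0587, 0.0516, 0.0496),
    4: (0.0392, 0.0381, 0.0378),
    8: (0.0375, 0.0360, 0.0359),
}


def sync_overhead_percent(t_with_sync, t_ws):
    """Extra time caused by send/receive calls, in percent of the WS time.

    Assumed formula, inferred from the table: (t - t_WS) / t_WS * 100.
    """
    return (t_with_sync - t_ws) / t_ws * 100.0


if __name__ == "__main__":
    for cpus, (t_omp, t_posix, t_ws) in ROW_512.items():
        omp = sync_overhead_percent(t_omp, t_ws)
        posix = sync_overhead_percent(t_posix, t_ws)
        print(f"{cpus} CPU(s): OpenMP {omp:.1f}%, POSIX {posix:.1f}%")
```

For the N = 512 row this yields 6.8% and 3.5% on 1 CPU and 18.3% and 4.0% on 2 CPUs, in line with the bracketed values in Table 4.
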