The algorithms presented so far have been implemented in C. In addition, the explicit parallel programming model uses the ConvexPVM programming environment [4,7], while the implicit one relies on compiler directives.
| Configuration | Wall-clock time, one step [s] | Speedup | Efficiency [%] |
|---|---|---|---|
| SPP1200 (seq.) | 11.02 | -- | -- |
| SPP1200 (1 proc.) | 10.06 | -- | -- |
| SPP1200 (2 proc.) | 5.27 | 2.1 | 105 |
| SPP1200 (4 proc.) | 2.90 | 3.8 | 95 |
| SPP1200 (8 proc.) | 1.75 | 6.3 | 79 |
The results in Table 1 were obtained on the SPP1200 for the step problem. They give the wall-clock time for sequential or parallel execution of one selected simulation step; the partitioning time is therefore excluded. The measurements were made on a dedicated SPP1200 subcomplex.
Table 1 shows that the wall-clock time for one simulation step is lower for the parallel version of the algorithm run on one processor than for the sequential version. This is due to the node renumbering required for parallel execution.
Fig. 1 and Fig. 2 show data obtained for the ramp problem on the SPP1600. Two dedicated subcomplexes were used: the first with up to 7 processors from a single SPP hypernode (denoted P in Fig. 2), and the second with up to 16 processors from four SPP hypernodes (at most 4 processors from each hypernode; denoted P4x4). To assess the efficiency of the implicit parallel programming model, Fig. 1 presents CXpa wall-clock execution time profiles obtained on the SPP1600 for one and seven threads. The comparison shows a substantial decrease in execution time in the latter case. However, the CXpa time values cannot be used directly for further analysis because of the substantial overhead introduced by the profiling environment. The profiles also indicate that further improvement would come from reducing the part of the code that is executed sequentially.
Fig. 2 presents the wall-clock execution time for the solution with GMRES under the two programming models. For the explicit programming model (with PVM), efficiency is similar for both subcomplexes, so it is independent of the processing node organization, and the lower bandwidth between hypernodes has no substantial influence. The implicit programming model scales worse than the explicit one, but its scalability remains within an acceptable range, especially within one hypernode.