# Parallel efficiency - simple approach

• Compute in a parallel-efficient way, i.e. use only as many cores as necessary.
• The performance of a program does not increase indefinitely with the number of processors.
• The maximum attainable speedup of a program converges towards a fixed value.
• In practice, the speedup even decreases once the share of communication predominates.

Running a program can often be sped up by using multiple processors. However, every program contains parts that can never be fully parallelized.
This means that even with a huge number of processors, only the parallelizable part of the computing time is ever reduced.
To estimate how well a parallel calculation scales, one uses the so-called speedup factor and the parallel efficiency.

The speedup factor (also: acceleration factor) is the ratio of the computation time of a process on one processor to the computation time of the same process on several processors.

$$S(n) = \frac{t_s}{t_p(n)}$$

$t_s$ is the time needed to run on one processor; $t_p(n)$ is the time needed to run on $n$ processors.

The parallel efficiency of a program is the ratio of the speedup factor $S(n)$ to the number of processors: $$\eta_S=\frac{S(n)}{n}$$
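The two formulas can be sketched directly in code. The function names and the example timings below are illustrative, not taken from a measured run:

```python
def speedup(t_serial, t_parallel):
    """Speedup factor S(n) = t_s / t_p(n)."""
    return t_serial / t_parallel

def parallel_efficiency(t_serial, t_parallel, n):
    """Parallel efficiency eta_S = S(n) / n."""
    return speedup(t_serial, t_parallel) / n

# Illustrative example: a job takes 100 s serially and 30 s on 4 cores.
s = speedup(100.0, 30.0)                      # ~3.33
eta = parallel_efficiency(100.0, 30.0, 4)     # ~0.83
print(f"speedup {s:.2f}, efficiency {eta:.2f}")
```

Note that $\eta_S = 1$ would mean perfect scaling; in practice communication overhead keeps it below 1.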

These quantities can be used to estimate how much of the computing capacity is actually used for the calculation itself, and how much is spent on communication between the parallel processes of a program.

Wasting excessive computing time on communication should be avoided.
As a rule of thumb, the parallel efficiency should be greater than 0.5 ($\eta_S \geq 0.5$).

This can be illustrated with an example. The following table lists measurements from a calculation: the number of iterations per time unit for different numbers of cores. Relative to the serial run, the speedup and parallel efficiency were determined.

| No. of CPUs | Iterations per time unit | Speedup | Parallel efficiency |
|------------:|-------------------------:|--------:|--------------------:|
| 1           | 166                      | –       | –                   |
| 4           | 518                      | 3.1     | 0.78                |
| 8           | 857                      | 5.16    | 0.65                |
| 16          | 879                      | 5.3     | 0.3                 |
| 32          | 861                      | 5.19    | 0.162               |
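The table values can be reproduced from the raw throughput numbers. When iterations per time unit are measured instead of wall time, the speedup is the throughput ratio $S(n) = r(n)/r(1)$:

```python
# Throughput measurements from the table above (cores -> iterations per time unit).
measurements = {1: 166, 4: 518, 8: 857, 16: 879, 32: 861}

r1 = measurements[1]  # serial throughput as the reference
for n, r in measurements.items():
    s = r / r1        # speedup relative to the serial run
    eta = s / n       # parallel efficiency
    print(f"{n:3d} cores: speedup {s:.2f}, efficiency {eta:.2f}")
```

Only the 4- and 8-core runs stay above the $\eta_S \geq 0.5$ rule of thumb.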

It can be seen that this simulation almost reaches its maximum speedup with as few as 8 cores. With 16 cores the speedup is slightly higher, but the parallel efficiency drops significantly. With an even larger number of cores, the speedup even decreases again. This is due to the growing share of communication that has to be maintained in order to compute on so many cores at the same time. The 'optimum' in this case probably lies between 8 and 16 cores; beyond that, no additional reduction of the computing time is to be expected. With 32 cores, the vast majority of the computing time is wasted and could be used more effectively by other users.

*Schematic representation of the speedup factor of a calculation for an increasing number of nodes*

These observations are of course specific to the simulation at hand. Other simulations and programs show their own scaling characteristics, and ideally the user should be aware of them.

The parallel efficiency of your own programs can be checked just as in this example; that level of accuracy is good enough in practice. Run the same simulation several times on different numbers of cores or nodes, and then compare how many iterations were achieved per hour or day.

Alternatively, the total calculation time can also be used as the reference value for the parallel efficiency.
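With total calculation time as the reference, the original definition $S(n) = t_s / t_p(n)$ applies directly, since a shorter time means a higher speedup. The wall times below are made-up illustrative values, not measured data:

```python
# Illustrative wall times for a fixed amount of work (cores -> seconds).
wall_times = {1: 3600.0, 4: 1150.0, 8: 700.0}

t1 = wall_times[1]  # serial wall time as the reference
for n, t in wall_times.items():
    s = t1 / t      # speedup: shorter wall time -> higher speedup
    print(f"{n} cores: speedup {s:.2f}, efficiency {s / n:.2f}")
```

Both approaches yield the same efficiency as long as the amount of work per run is identical.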