Parallel Computing: False Sharing

Calculation of pi:

The program will numerically compute the integral of 4/(1+x*x) from 0 to 1.
The value of this integral is pi.
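The post does not say which quadrature rule is used, so the following is a sketch that assumes the midpoint rule with N equal steps (N, x_i and \Delta x are names introduced here for illustration):

```latex
\int_0^1 \frac{4}{1+x^2}\,dx = 4\arctan(1) = \pi
\qquad\Longrightarrow\qquad
\pi \approx \sum_{i=0}^{N-1} \frac{4}{1+x_i^2}\,\Delta x,
\quad x_i = \left(i + \tfrac{1}{2}\right)\Delta x,
\quad \Delta x = \frac{1}{N}
```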

Serial Code:

The serial implementation of this code is here.
In the measurements reported under Output, the serial code runs faster than the parallel code.
The parallel code is presented in the next section.
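Since the linked listing is not reproduced here, the following is a minimal sketch of what a serial version could look like, assuming a midpoint-rule sum. The function name pi_serial and the parameter num_steps are hypothetical and not taken from the original code.

```c
/* Serial estimate of pi: midpoint-rule integration of 4/(1+x*x) on [0, 1].
 * pi_serial and num_steps are illustrative names, not the author's. */
double pi_serial(long num_steps)
{
    double step = 1.0 / (double)num_steps;  /* width of one slice */
    double sum = 0.0;

    for (long i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;        /* midpoint of slice i */
        sum += 4.0 / (1.0 + x * x);
    }
    return step * sum;
}
```

Calling pi_serial(100000000) from main and printing the result with printf("%.12f\n", ...) is enough to try it out; a larger num_steps gives a finer grid and a longer run.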

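Parallel Code using OpenMP:

The program is parallelized using OpenMP and an SPMD (single program, multiple data) algorithm: every thread runs the same code and works on its own share of the integration steps.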

The program performs poorly because of false sharing. Each thread accumulates its partial result in its own element sum[id], but adjacent elements of this array share a cache line, which causes cache thrashing as the program runs.
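The SPMD pattern described above can be sketched roughly as follows. This is an illustration under assumptions, not the author's listing: only sum[id] comes from the post, while pi_false_sharing, MAX_THREADS, num_steps and requested_threads are hypothetical names, and the cyclic distribution of iterations is assumed. The fallback definitions only exist so that the sketch still builds, serially, when compiled without -fopenmp.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Fallbacks so the sketch still builds (and runs serially) without -fopenmp. */
static int  omp_get_thread_num(void)   { return 0; }
static int  omp_get_num_threads(void)  { return 1; }
static void omp_set_num_threads(int n) { (void)n; }
#endif

#define MAX_THREADS 4   /* assumption: one thread per core of a 4-core machine */

double pi_false_sharing(long num_steps, int requested_threads)
{
    double sum[MAX_THREADS] = {0.0};   /* adjacent doubles: likely one cache line */
    double step = 1.0 / (double)num_steps;
    int nthreads = 1;

    if (requested_threads < 1) requested_threads = 1;
    if (requested_threads > MAX_THREADS) requested_threads = MAX_THREADS;
    omp_set_num_threads(requested_threads);

    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        int nthrds = omp_get_num_threads();   /* may be fewer than requested */
        if (id == 0) nthreads = nthrds;

        /* Cyclic distribution: thread id handles steps id, id+nthrds, ... */
        for (long i = id; i < num_steps; i += nthrds) {
            double x = (i + 0.5) * step;
            sum[id] += 4.0 / (1.0 + x * x);   /* each thread writes its own slot */
        }
    }

    /* Combine the per-thread partial sums after the parallel region. */
    double pi = 0.0;
    for (int i = 0; i < nthreads; i++)
        pi += sum[i] * step;
    return pi;
}
```

Built with gcc -fopenmp, the four elements of sum occupy 32 contiguous bytes, so they will usually fall within a single cache line. How strongly the slowdown shows up can depend on the optimization level, because an optimizing compiler may keep sum[id] in a register inside the loop.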

You can see the complete code here

Output:

Serial Code, Output:

Parallel code using OpenMP, output:

We can clearly see that, in every run, the parallel code takes more time than the serial code.

Experimental Setup:

4-core Intel i7 @ 3.07 GHz, 16 GB RAM
GCC compiler, version 4.9.4

Why this is slower:

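The OpenMP program is slow because of false sharing.
Each thread writes only to its own sum[id], so there is no data race, but adjacent elements of this array sit in the same cache line. Every time one core updates its element, the copies of that line held by the other cores are invalidated, so the line keeps bouncing between cores (cache thrashing) for as long as the program runs.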
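To make the cache-line explanation concrete, here is a sketch of one common remedy that is not part of the original post: padding the array so that each thread's accumulator lands on a different cache line. It assumes a 64-byte cache line and 8-byte doubles; pi_padded, PAD, MAX_THREADS, num_steps and requested_threads are hypothetical names.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Fallbacks so the sketch still builds (and runs serially) without -fopenmp. */
static int  omp_get_thread_num(void)   { return 0; }
static int  omp_get_num_threads(void)  { return 1; }
static void omp_set_num_threads(int n) { (void)n; }
#endif

#define MAX_THREADS 4   /* assumption: one thread per core of a 4-core machine */
#define PAD 8           /* assumption: 64-byte cache line / 8-byte double */

double pi_padded(long num_steps, int requested_threads)
{
    /* Only column 0 of each row is used; the other 7 doubles are padding,
     * so the used elements are 64 bytes apart. */
    double sum[MAX_THREADS][PAD] = {{0.0}};
    double step = 1.0 / (double)num_steps;
    int nthreads = 1;

    if (requested_threads < 1) requested_threads = 1;
    if (requested_threads > MAX_THREADS) requested_threads = MAX_THREADS;
    omp_set_num_threads(requested_threads);

    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        int nthrds = omp_get_num_threads();
        if (id == 0) nthreads = nthrds;

        for (long i = id; i < num_steps; i += nthrds) {
            double x = (i + 0.5) * step;
            sum[id][0] += 4.0 / (1.0 + x * x);   /* no neighbour on this line */
        }
    }

    double pi = 0.0;
    for (int i = 0; i < nthreads; i++)
        pi += sum[i][0] * step;
    return pi;
}
```

The arithmetic is unchanged from the unpadded version; only the memory layout differs. If the slowdown really comes from false sharing, timing the two variants on the same machine should show the gap, although the exact numbers depend on the hardware and compiler flags.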

Wednesday, 19 April 2017
