Wednesday, 19 April 2017

False Sharing

Calculation of pi:

The program numerically computes the integral of 4/(1+x*x) from 0 to 1.
The exact value of this integral is pi, so the numerical result approximates pi.
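To make the calculation concrete, here is a minimal sketch of the integration using the midpoint rule. It is not necessarily the same as the linked serial code; the function name compute_pi_serial and the parameter num_steps are assumptions chosen for illustration.

```c
#include <assert.h>

/* Midpoint-rule approximation of the integral of 4/(1+x*x) on [0, 1].
   The name and the num_steps parameter are illustrative assumptions. */
double compute_pi_serial(long num_steps)
{
    assert(num_steps > 0);
    double step = 1.0 / (double)num_steps;   /* width of each rectangle */
    double sum = 0.0;

    for (long i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;         /* midpoint of rectangle i */
        sum += 4.0 / (1.0 + x * x);          /* height of the integrand */
    }
    return step * sum;                       /* total area, close to pi */
}
```

Calling compute_pi_serial with a larger num_steps gives a closer approximation of pi, at the cost of a longer loop.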


Serial Code:

The serial implementation of this code is here
In the measurements shown under Output, the serial code runs faster than the parallel code.
The parallel code is described next.


Parallel Code using OpenMP:

The program is parallelized using OpenMP and an SPMD algorithm: every thread runs the same code and uses its thread id to pick its share of the work.

The program shows low performance due to false sharing. In particular, sum[id] is unique to each thread, but adjacent elements of this array share a cache line, which causes cache thrashing as the program runs.
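The SPMD pattern with a per-thread sum[id] can be sketched roughly as follows. This is an illustration under assumptions, not the linked code itself: the name compute_pi_spmd, the constant MAX_THREADS and the cyclic distribution of loop iterations are all choices made here. The OpenMP calls are guarded so the file also builds, and runs on one thread, when OpenMP is not enabled.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define MAX_THREADS 4   /* assumption: one thread per core on a 4-core machine */

/* SPMD sketch: each thread accumulates into its own slot sum[id].
   The slots are adjacent doubles, so they can share a cache line. */
double compute_pi_spmd(long num_steps)
{
    assert(num_steps > 0);
    double step = 1.0 / (double)num_steps;
    double sum[MAX_THREADS] = {0.0};
    int nthreads = 1;

#ifdef _OPENMP
    omp_set_num_threads(MAX_THREADS);
#endif
    #pragma omp parallel
    {
        int id = 0, nthrds = 1;
#ifdef _OPENMP
        id = omp_get_thread_num();
        nthrds = omp_get_num_threads();
#endif
        if (id == 0) nthreads = nthrds;      /* record the actual team size */

        /* Cyclic distribution: thread id handles i = id, id + nthrds, ... */
        for (long i = id; i < num_steps; i += nthrds) {
            double x = (i + 0.5) * step;
            sum[id] += 4.0 / (1.0 + x * x);  /* repeated writes to neighbouring slots */
        }
    }

    double pi = 0.0;
    for (int i = 0; i < nthreads; i++)
        pi += step * sum[i];                 /* combine the partial sums */
    return pi;
}
```

With GCC, OpenMP is enabled by compiling with -fopenmp. The result is the same approximation of pi as in the serial case; only the way the work is split differs.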

You can see the complete code here


Output:

Serial Code, Output:



Parallel code using OpenMP, output:



We can clearly see that in every run the parallel code takes more time than the serial code.

Experimental Setup:

4-core Intel i7 @ 3.07 GHz, 16 GB RAM
GCC compiler, version 4.9.4

Why this is slower:

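The OpenMP program is slower because of false sharing.
Each thread writes only to its own sum[id], so no data is logically shared. However, adjacent elements of the array lie on the same cache line, and caches keep data coherent one whole line at a time. Every write by one thread therefore invalidates that line in the other cores' caches, and the line keeps moving between cores (cache thrashing) as the program runs.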
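The layout argument can be checked with a little arithmetic. The sketch below assumes a 64-byte cache line (common on x86, but not stated in this post) and an array that starts on a cache-line boundary; cache_line_of and the stride parameter are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_BYTES 64   /* assumption: typical x86 cache line size */

/* Index of the cache line that holds the slot of thread `id`, for a
   double array starting on a line boundary, where consecutive threads'
   slots are `stride` doubles apart (stride 1 means adjacent elements). */
size_t cache_line_of(size_t id, size_t stride)
{
    assert(stride > 0);
    return (id * stride * sizeof(double)) / CACHE_LINE_BYTES;
}
```

With stride 1 and 8-byte doubles, the slots of threads 0 to 3 all fall on line 0, so four threads are writing to one line. With a stride of 8 doubles (64 bytes), each slot falls on its own line, which is why spacing the per-thread values further apart avoids the effect.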




