My current idea for improving libgomp

Sho Nakatani Wed, 27 Apr 2011 20:14:34 -0700

Hi,

I'm Sho Nakatani, and I have been accepted to Google Summer of Code 2011.

I'm trying to speed up libgomp, the OpenMP implementation in GCC.
As my GSoC project, I'll focus on the OpenMP `task' directive (especially tied tasks).
Around the beginning of April, some members here told me that I could
migrate the OpenMP implementation from Nanos4.
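
For context, here is a minimal sketch of the kind of code the `task' directive
is used for. It is only an illustration: the names fib_task and fib_parallel
are hypothetical and are not taken from BOTS, libgomp, or Nanos4. Tasks are
tied by default in OpenMP 3.0, so no extra clause is needed to get tied tasks.

```c
/* Illustrative sketch only: fib_task and fib_parallel are hypothetical names,
   not code from BOTS, libgomp, or Nanos4. */

/* Each recursive call is spawned as an OpenMP task. Tasks are tied by
   default: once a thread starts a task, only that thread may resume it. */
static long fib_task(int n)
{
    long a, b;

    if (n < 2)
        return n;

    #pragma omp task shared(a) firstprivate(n)
    a = fib_task(n - 1);

    #pragma omp task shared(b) firstprivate(n)
    b = fib_task(n - 2);

    /* Wait for both child tasks before reading a and b. */
    #pragma omp taskwait
    return a + b;
}

/* Open a parallel region and let a single thread create the root task;
   the other threads in the team pick up the child tasks. */
long fib_parallel(int n)
{
    long result = 0;

    #pragma omp parallel
    {
        #pragma omp single
        result = fib_task(n);
    }
    return result;
}
```

Built with `gcc -fopenmp', the runtime (libgomp in GCC's case) schedules these
tasks across the threads; without -fopenmp the pragmas are ignored and the
code runs serially. Real benchmarks usually add a cutoff so that very small
subproblems do not pay the task creation overhead.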

However, my experiment has shown that Nanos4 is not always better than libgomp.
See the graphs below.

Testsuite:
BOTS (the Barcelona OpenMP Tasks Suite), which the Nanos project provides.

Compilers:
gcc with libgomp
icc (Intel C Compiler) with its OpenMP runtime
mcc (Mercurium C Compiler) with Nanos4

Environment:
Each test case is executed on 48 logical CPUs (actually 24 CPUs with Hyper-Threading).
OMP_NUM_THREADS is 48 for parallelized programs.

Results:
Here are 9 graphs, one for each test case.
Each graph compares the performance each compiler provides and the efficiency
of `task' directives.

https://github.com/laysakura/GCC-OpenMP-Speedup/raw/9636d281663a8a7857efd38700c82486ff12ae7b/data/20110427-101530-tuna-protein.eps.png
https://github.com/laysakura/GCC-OpenMP-Speedup/raw/9636d281663a8a7857efd38700c82486ff12ae7b/data/20110427-101530-tuna-fft.eps.png
https://github.com/laysakura/GCC-OpenMP-Speedup/raw/9636d281663a8a7857efd38700c82486ff12ae7b/data/20110427-101530-tuna-fib.eps.png
https://github.com/laysakura/GCC-OpenMP-Speedup/raw/9636d281663a8a7857efd38700c82486ff12ae7b/data/20110427-101530-tuna-floorplan.eps.png
https://github.com/laysakura/GCC-OpenMP-Speedup/raw/9636d281663a8a7857efd38700c82486ff12ae7b/data/20110427-101530-tuna-health.eps.png
https://github.com/laysakura/GCC-OpenMP-Speedup/raw/9636d281663a8a7857efd38700c82486ff12ae7b/data/20110427-101530-tuna-nqueen.eps.png
https://github.com/laysakura/GCC-OpenMP-Speedup/raw/9636d281663a8a7857efd38700c82486ff12ae7b/data/20110427-101530-tuna-sort.eps.png
https://github.com/laysakura/GCC-OpenMP-Speedup/raw/9636d281663a8a7857efd38700c82486ff12ae7b/data/20110427-101530-tuna-sparse.eps.png
https://github.com/laysakura/GCC-OpenMP-Speedup/raw/9636d281663a8a7857efd38700c82486ff12ae7b/data/20110427-101530-tuna-strassen.eps.png

In my opinion, just migrating the Nanos4 implementation would not improve libgomp
performance as a whole.
What I should aim for might be to understand both the libgomp and Nanos4
implementations
and add only the good features of Nanos4 to libgomp.
Does that sound OK? Any opinions and ideas are welcome!

--
Sho Nakatani