Hi,

I'm Sho Nakatani, and I have been accepted to Google Summer of Code 2011.

I'm trying to speed up libgomp, the OpenMP implementation in GCC.
As my GSoC project, I'll focus on the OpenMP `task' directive (especially
`tied task').
Around the beginning of April, some members here told me that I could
migrate the OpenMP implementation from Nanos4.
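
For readers less familiar with the directive, here is a minimal sketch of
the kind of code I mean. It is only an illustration: the function names and
the CUTOFF value are made up for this mail, and it is not taken from the
BOTS sources. Tasks are tied by default because no `untied' clause is given.

```c
/* Hypothetical cutoff: below this size, recurse serially so that
 * task-creation overhead does not dominate. */
#define CUTOFF 10

/* Plain serial recursion, used below the cutoff. */
static long fib_serial(int n)
{
    return n < 2 ? n : fib_serial(n - 1) + fib_serial(n - 2);
}

/* Each recursive call becomes a (tied) task; the parent waits for
 * both children at the taskwait before summing their results. */
static long fib_task(int n)
{
    long x, y;

    if (n < CUTOFF)
        return fib_serial(n);

    #pragma omp task shared(x) firstprivate(n)
    x = fib_task(n - 1);

    #pragma omp task shared(y) firstprivate(n)
    y = fib_task(n - 2);

    #pragma omp taskwait
    return x + y;
}

/* Opens the parallel region; a single thread creates the root task
 * and the other threads in the team pick up the generated tasks. */
long fib_parallel(int n)
{
    long result = 0;

    #pragma omp parallel
    {
        #pragma omp single
        result = fib_task(n);
    }
    return result;
}
```

Calling fib_parallel(30) from main() and building with `gcc -fopenmp' would
exercise libgomp's task scheduler; the number of threads is then taken from
OMP_NUM_THREADS. Without -fopenmp the pragmas are ignored and the code runs
serially with the same result.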

However, my experiment has shown that Nanos4 is not always better than libgomp.
See the graphs below.

Test suite:
  BOTS (Barcelona OpenMP Tasks Suite), which the Nanos project provides.

Compilers:
  gcc with libgomp
  icc (Intel C Compiler) with its OpenMP runtime
  mcc (Mercurium C Compiler) with Nanos4

Environment:
  Each test case is executed on 48 logical CPUs (24 physical CPUs with
  Hyper-Threading).
  OMP_NUM_THREADS is set to 48 for the parallelized programs.

Results:
  Here are 9 graphs, one per test case.
  Each graph compares the performance that each compiler provides and the
  efficiency of the `task' directive.

https://github.com/laysakura/GCC-OpenMP-Speedup/raw/9636d281663a8a7857efd38700c82486ff12ae7b/data/20110427-101530-tuna-protein.eps.png
https://github.com/laysakura/GCC-OpenMP-Speedup/raw/9636d281663a8a7857efd38700c82486ff12ae7b/data/20110427-101530-tuna-fft.eps.png
https://github.com/laysakura/GCC-OpenMP-Speedup/raw/9636d281663a8a7857efd38700c82486ff12ae7b/data/20110427-101530-tuna-fib.eps.png
https://github.com/laysakura/GCC-OpenMP-Speedup/raw/9636d281663a8a7857efd38700c82486ff12ae7b/data/20110427-101530-tuna-floorplan.eps.png
https://github.com/laysakura/GCC-OpenMP-Speedup/raw/9636d281663a8a7857efd38700c82486ff12ae7b/data/20110427-101530-tuna-health.eps.png
https://github.com/laysakura/GCC-OpenMP-Speedup/raw/9636d281663a8a7857efd38700c82486ff12ae7b/data/20110427-101530-tuna-nqueen.eps.png
https://github.com/laysakura/GCC-OpenMP-Speedup/raw/9636d281663a8a7857efd38700c82486ff12ae7b/data/20110427-101530-tuna-sort.eps.png
https://github.com/laysakura/GCC-OpenMP-Speedup/raw/9636d281663a8a7857efd38700c82486ff12ae7b/data/20110427-101530-tuna-sparse.eps.png
https://github.com/laysakura/GCC-OpenMP-Speedup/raw/9636d281663a8a7857efd38700c82486ff12ae7b/data/20110427-101530-tuna-strassen.eps.png


In my opinion, just migrating the Nanos4 implementation would not improve
libgomp performance as a whole.
What I should aim for might be to understand both the libgomp and Nanos4
implementations and add only the good features of Nanos4 to libgomp.
Is that OK? Any opinions and ideas are welcome!

--
Sho Nakatani
