Hi, I'm Sho Nakatani, and I have been accepted into Google Summer of Code 2011.
I'm trying to speed up libgomp, the OpenMP implementation in GCC. As my GSoC project, I'll focus on the OpenMP `task' directive (especially `tied' tasks).

Around the beginning of April, some members here told me that I could migrate the OpenMP implementation from Nanos4. However, my experiments have shown that Nanos4 is not always faster than libgomp. See the graphs below.

Testsuite:
  BOTS (Barcelona OpenMP Tasks Suite), which the Nanos project provides.

Compilers:
  - gcc with libgomp
  - icc (Intel C Compiler) with its OpenMP runtime
  - mcc (Mercurium C Compiler) with Nanos4

Environment:
  Each test case is executed on 48 logical CPUs (24 physical CPUs with Hyper-Threading).
  OMP_NUM_THREADS is set to 48 for the parallelized programs.

Results:
  Here are 9 graphs, one per test case. Each graph compares the performance of the three compilers and the efficiency of their `task' directive implementations.

  https://github.com/laysakura/GCC-OpenMP-Speedup/raw/9636d281663a8a7857efd38700c82486ff12ae7b/data/20110427-101530-tuna-protein.eps.png
  https://github.com/laysakura/GCC-OpenMP-Speedup/raw/9636d281663a8a7857efd38700c82486ff12ae7b/data/20110427-101530-tuna-fft.eps.png
  https://github.com/laysakura/GCC-OpenMP-Speedup/raw/9636d281663a8a7857efd38700c82486ff12ae7b/data/20110427-101530-tuna-fib.eps.png
  https://github.com/laysakura/GCC-OpenMP-Speedup/raw/9636d281663a8a7857efd38700c82486ff12ae7b/data/20110427-101530-tuna-floorplan.eps.png
  https://github.com/laysakura/GCC-OpenMP-Speedup/raw/9636d281663a8a7857efd38700c82486ff12ae7b/data/20110427-101530-tuna-health.eps.png
  https://github.com/laysakura/GCC-OpenMP-Speedup/raw/9636d281663a8a7857efd38700c82486ff12ae7b/data/20110427-101530-tuna-nqueen.eps.png
  https://github.com/laysakura/GCC-OpenMP-Speedup/raw/9636d281663a8a7857efd38700c82486ff12ae7b/data/20110427-101530-tuna-sort.eps.png
  https://github.com/laysakura/GCC-OpenMP-Speedup/raw/9636d281663a8a7857efd38700c82486ff12ae7b/data/20110427-101530-tuna-sparse.eps.png
  https://github.com/laysakura/GCC-OpenMP-Speedup/raw/9636d281663a8a7857efd38700c82486ff12ae7b/data/20110427-101530-tuna-strassen.eps.png

In my opinion, simply migrating the Nanos4 implementation would not improve libgomp performance as a whole. What I should aim for instead might be to understand both the libgomp and Nanos4 implementations and bring only the good features of Nanos4 into libgomp. Is that OK? Any opinions and ideas are welcome!

--
Sho Nakatani