[Bug libgomp/61333] potential target specific performance issue with libgomp

2014-05-28 Thread howarth.at.gcc at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61333 --- Comment #14 from Jack Howarth --- Without optimization flags on a 24-core x86_64 Fedora 15 box, the timings for one, two and four OMP processes are… clang-3.4.0 (clang-omp/openmp) 69.988439 sec: 34.962212 sec: 17.641935 sec gcc 4.6.3

[Bug libgomp/61333] potential target specific performance issue with libgomp

2014-05-28 Thread howarth.at.gcc at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61333 --- Comment #13 from Jack Howarth --- (In reply to Dominique d'Humieres from comment #12) > > Is gcc really optimizing that low by default? ... > > AFAIK the default optimization in gcc is -O0. Now before drawing conclusions > you should answer

[Bug libgomp/61333] potential target specific performance issue with libgomp

2014-05-28 Thread dominiq at lps dot ens.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61333 --- Comment #12 from Dominique d'Humieres --- > Is gcc really optimizing that low by default? ... AFAIK the default optimization in gcc is -O0. Now before drawing conclusions you should answer my question in comment 8.

[Bug libgomp/61333] potential target specific performance issue with libgomp

2014-05-28 Thread howarth.at.gcc at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61333 --- Comment #11 from Jack Howarth --- (In reply to Jakub Jelinek from comment #10) > Also, benchmarking -O0 code is weird. Is gcc really optimizing that low by default? Surely it does at least -O1 when no -O* optimization flag is passed?

[Bug libgomp/61333] potential target specific performance issue with libgomp

2014-05-28 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61333 --- Comment #10 from Jakub Jelinek --- Note, libgomp is optimized for Linux futexes; it has only bare support for other targets, so unless somebody steps up to submit and maintain a port for other OSes, those will keep using pthread_* APIs with no

[Bug libgomp/61333] potential target specific performance issue with libgomp

2014-05-28 Thread dominiq at lps dot ens.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61333 --- Comment #9 from Dominique d'Humieres --- > ... (gcc 4.10.0 r210749) ... Forgot to say: Target: x86_64-apple-darwin13, Core i7, 4 cores, 8 threads, 2.8 GHz (turbo 3.8 GHz), 8 MB cache. Note that the "turbo" mode may make the serial test faster.

[Bug libgomp/61333] potential target specific performance issue with libgomp

2014-05-28 Thread dominiq at lps dot ens.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61333 --- Comment #8 from Dominique d'Humieres --- Some comments: original shell: 1:1.86:2.9 + -Ofast : 1:1.37:1.8 (gcc 4.10.0 r210749). Does this mean that there is a problem with -Ofast and -fopenmp? The wallclock times are: original shell: 4

[Bug libgomp/61333] potential target specific performance issue with libgomp

2014-05-27 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61333 --- Comment #7 from Andrew Pinski --- (In reply to Jack Howarth from comment #6) > There is a call to pthread_cond_timedwait() in the libiomp5 implementation > but I don't see any such calls in libgomp. Perhaps this is related to > the increa

[Bug libgomp/61333] potential target specific performance issue with libgomp

2014-05-27 Thread howarth.at.gcc at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61333 --- Comment #6 from Jack Howarth --- There is a call to pthread_cond_timedwait() in the libiomp5 implementation but I don't see any such calls in libgomp. Perhaps this is related to the increased performance in libiomp5 on darwin?

[Bug libgomp/61333] potential target specific performance issue with libgomp

2014-05-27 Thread howarth.at.gcc at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61333 --- Comment #5 from Jack Howarth --- It would be interesting to find out what Intel openmp is doing differently on darwin since it is significantly faster on four threads.

[Bug libgomp/61333] potential target specific performance issue with libgomp

2014-05-27 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61333 --- Comment #4 from Andrew Pinski --- GCC on Linux performs better because it uses the futex syscall for lower-overhead locking. See config/linux/{mutex,lock}.c. On Darwin, by contrast, it directly calls into pthread_mutex cal

[Bug libgomp/61333] potential target specific performance issue with libgomp

2014-05-27 Thread howarth.at.gcc at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61333 --- Comment #3 from Jack Howarth --- FYI, the timings on clang are for clang 3.4.1 with a merge of current clang-omp github commit f9e2fd7640f8fc06ebe1ef2f065c6158f6b4b6ef and openmp svn trunk from llvm.org at r208472.

[Bug libgomp/61333] potential target specific performance issue with libgomp

2014-05-27 Thread howarth.at.gcc at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61333 --- Comment #2 from Jack Howarth --- Created attachment 32867 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=32867&action=edit heated_plate_gcc.sh shell script to collect timings

[Bug libgomp/61333] potential target specific performance issue with libgomp

2014-05-27 Thread howarth.at.gcc at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61333 --- Comment #1 from Jack Howarth --- Created attachment 32866 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=32866&action=edit heated_plate_openmp.c test code