https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61333
--- Comment #14 from Jack Howarth ---
Without optimization flags on a 24-core x86_64 Fedora 15 box, the timings for
one, two and four OMP processes are...
clang-3.4.0 (clang-omp/openmp) 69.988439 sec: 34.962212 sec: 17.641935 sec
gcc 4.6.3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61333
--- Comment #13 from Jack Howarth ---
(In reply to Dominique d'Humieres from comment #12)
> > Is gcc really optimizing that low by default? ...
>
> AFAIK the default optimization in gcc is -O0. Now before drawing conclusions
> you should answer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61333
--- Comment #12 from Dominique d'Humieres ---
> Is gcc really optimizing that low by default? ...
AFAIK the default optimization in gcc is -O0. Now before drawing conclusions
you should answer my question in comment 8.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61333
--- Comment #11 from Jack Howarth ---
(In reply to Jakub Jelinek from comment #10)
> Also, benchmarking -O0 code is weird.
Is gcc really optimizing that low by default? Certainly it is at least doing
-O1 when not passed a -O* optimization flag?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61333
--- Comment #10 from Jakub Jelinek ---
Note, libgomp is optimized for Linux futexes, it has bare support for other
targets, so unless somebody steps up and submits and maintains a port for other
OSes, those will keep using pthread_* APIs with no
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61333
--- Comment #9 from Dominique d'Humieres ---
> ... (gcc 4.10.0 r210749) ...
Forgot to say: Target: x86_64-apple-darwin13, Core i7, 4 cores, 8 threads,
2.8 GHz (turbo 3.8 GHz), cache 8 MB. Note that the "turbo" mode may make the
serial test faster.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61333
--- Comment #8 from Dominique d'Humieres ---
Some comments:
original shell: 1:1.86:2.9
+ -Ofast : 1:1.37:1.8
(gcc 4.10.0 r210749). Does this mean that there is a problem with -Ofast and
-fopenmp?
The wallclock times are:
original shell: 4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61333
--- Comment #7 from Andrew Pinski ---
(In reply to Jack Howarth from comment #6)
> There is a call to pthread_cond_timedwait() in the libiomp5 implementation
> but I don't see any such calls in libgomp. Perhaps this is related to
> the increa
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61333
--- Comment #6 from Jack Howarth ---
There is a call to pthread_cond_timedwait() in the libiomp5 implementation but
I don't see any such calls in libgomp. Perhaps this is related to the
increased performance in libiomp5 on darwin?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61333
--- Comment #5 from Jack Howarth ---
It would be interesting to find out what Intel OpenMP is doing differently on
darwin since it is significantly faster on four threads.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61333
--- Comment #4 from Andrew Pinski ---
The reason GCC on Linux is better is that it uses the futex syscall for
locking, which has lower overhead. See config/linux/{mutex,lock}.c.
While on Darwin, it directly calls into pthread_mutex cal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61333
--- Comment #3 from Jack Howarth ---
FYI, the timings on clang are for clang 3.4.1 with a merge of current clang-omp
github commit f9e2fd7640f8fc06ebe1ef2f065c6158f6b4b6ef and openmp svn trunk
from llvm.org at r208472/
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61333
--- Comment #2 from Jack Howarth ---
Created attachment 32867
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=32867&action=edit
heated_plate_gcc.sh shell script to collect timings
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61333
--- Comment #1 from Jack Howarth ---
Created attachment 32866
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=32866&action=edit
heated_plate_openmp.c test code