https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79151
--- Comment #6 from Thomas Koenig ---
A few more test cases with a relatively recent trunk.
POWER7:
[tkoenig@gcc1-power7 ~]$ gcc -mcpu=power7 -O3 foo.c && time ./a.out
41.987257
real 0m3.688s
user 0m3.685s
sys 0m0.002s
[tkoenig@gcc1-
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79151
--- Comment #5 from Thomas Koenig ---
(In reply to Richard Biener from comment #3)
> The question is of course whether vector division has comparable latency /
> throughput as the scalar one.
Here's a test case on a rather old CPU, a Core 2 Q820
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79151
--- Comment #4 from Andrew Pinski ---
(In reply to Richard Biener from comment #3)
> The question is of course whether vector division has comparable latency /
> throughput as the scalar one.
On the cores that cavium produces the answer is yes f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79151
--- Comment #3 from Richard Biener ---
The question is of course whether vector division has comparable latency /
throughput as the scalar one.
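
To make the question concrete, here is a minimal sketch of the two forms being compared, written with the GCC vector extension (vector_size). The function names and the expression are illustrative assumptions and are not the test case attached to this report. Whether the packed form wins depends on the target's vector divide latency and throughput, which is exactly what would have to be measured.

```c
/* Illustrative only: names and expression are hypothetical, not from the PR. */

/* GCC/Clang vector extension: two doubles in one 16-byte vector. */
typedef double v2df __attribute__((vector_size(16)));

/* Scalar form: two independent double divisions. */
double quot_sum_scalar(double x, double a, double y, double b)
{
    return x / a + y / b;
}

/* Hand-packed form: pack operands, do one vector division, unpack and add. */
double quot_sum_packed(double x, double a, double y, double b)
{
    v2df num = { x, y };   /* pack the numerators   */
    v2df den = { a, b };   /* pack the denominators */
    v2df q = num / den;    /* element-wise division */
    return q[0] + q[1];    /* unpack and sum        */
}
```

Timing both functions in a loop with the same -O3 and -mcpu/-march flags on a given machine would show whether one packed division is cheaper there than two scalar ones plus the packing overhead.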
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79151
--- Comment #2 from Thomas Koenig ---
Another test case.
It might even be profitable just to look for divisions, because these
are so expensive that packing/unpacking should always be
profitable.
double foo(double a, double b)
{
  return 1/a +
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79151
Richard Biener changed:
           What    |Removed                     |Added
           Keywords|                            |missed-optimization
           Status|