https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101891
--- Comment #9 from Arjan van de Ven ---
I don't have recent measurements since we did this work quite some time ago.
Basically, at the CPU level (speaking for Intel-style CPUs at least), a CPU can
eliminate (meaning: no execution resources used
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101891
Arjan van de Ven changed:
What|Removed |Added
CC||arjan at linux dot intel.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101583
Arjan van de Ven changed:
What|Removed |Added
CC||arjan at linux dot intel.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456
Arjan van de Ven changed:
What|Removed |Added
CC||arjan at linux dot intel.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82153
--- Comment #4 from Arjan van de Ven ---
btw gcc has no issue with just generating cvttsd2si
int roundme2(double A)
{
return A * 4.3;
}
generates
20: f2 0f 59 05 00 00 00    mulsd 0x0(%rip),%xmm0    # 28
27: 00
28: f2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82153
--- Comment #2 from Arjan van de Ven ---
When a conversion is inexact, a truncated result is returned. If a converted
result is larger than the maximum signed doubleword integer, the floating-point
invalid exception is raised, and if this excepti
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: arjan at linux dot intel.com
Target Milestone: ---
#include <math.h>
int roundme(double A)
{
return floor(A * 4.3);
}
leads to
:
0: f2 0f 59 05 00 00 00    mulsd 0x0(%rip),%xmm0    # 8
7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71921
--- Comment #19 from Arjan van de Ven ---
> GCC is not just about x86.
I know that, which is why I know my patch is not correct, but more of a precise
bug report... clearly this needs to be done in a way that does not hurt other
architectures.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71921
--- Comment #17 from Arjan van de Ven ---
(In reply to Andrew Pinski from comment #15)
> Read https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00693.html also. There
> is much more to that thread than just in August IIRC. Some in September and
> i
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71921
--- Comment #16 from Arjan van de Ven ---
A comparable (but optimized to generate smaller asm) testcase is this:
#include
void RELU(float *buffer, int size)
{
float *ptr = (float *) __builtin_assume_aligned(buffer, 64);
int i;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71921
--- Comment #13 from Arjan van de Ven ---
Created attachment 40422
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40422&action=edit
generated ASM with vectorization (with patch / no fast-math)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71921
--- Comment #12 from Arjan van de Ven ---
Created attachment 40421
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40421&action=edit
generated ASM without vectorization (no patch / no fast-math)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71921
--- Comment #11 from Arjan van de Ven ---
Created attachment 40420
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40420&action=edit
generated ASM with vectorization and fast-math (no patch)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71921
--- Comment #10 from Arjan van de Ven ---
Created attachment 40419
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40419&action=edit
generated ASM with vectorization (no patch / no fast-math)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71921
--- Comment #9 from Arjan van de Ven ---
Created attachment 40418
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40418&action=edit
Makefile
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71921
--- Comment #8 from Arjan van de Ven ---
Created attachment 40417
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40417&action=edit
refined test case
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71921
--- Comment #7 from Arjan van de Ven ---
Created attachment 40416
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40416&action=edit
prototype patch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71921
Arjan van de Ven changed:
What|Removed |Added
Version|6.1.1 |6.3.0
--- Comment #6 from Arjan van d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71921
--- Comment #5 from Arjan van de Ven ---
I don't think that's completely true; it does use maxss (the non-vector one)
for this code, so at least something thinks it's safe to use max, just likely
that something is after the vector phase?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71921
--- Comment #2 from Arjan van de Ven ---
I tried with <= and it doesn't seem all too eager to be vectorized that way
either; fast-math works either way
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: arjan at linux dot intel.com
Target Milestone: ---
The program below does not auto-vectorize to use the x86 "maxps" instruction even
though gcc is smart enough to know there is a "maxss" instruction
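The original program is not reproduced in this listing. As a rough illustration only, a scalar max-with-zero loop of the kind discussed in this bug might look like the sketch below. The function name and signature are hypothetical and this is not the attached test case. The compare-and-select form `x > 0 ? x : 0` is the sort of expression that a compiler may turn into a scalar max instruction; whether the loop is also vectorized depends on the compiler version and flags.

```c
#include <stddef.h>

/* Hypothetical sketch, NOT the test case attached to the bug:
   clamp every negative element of buf to zero, in place. */
void relu_sketch(float *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Compare-and-select written as a max against 0.0f. */
        buf[i] = buf[i] > 0.0f ? buf[i] : 0.0f;
    }
}
```

Comparing the output of `gcc -O3 -S` with and without `-ffast-math` is one way to check what a given compiler version generates for such a loop.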