Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: nathanael.schaeffer at gmail dot com
Target Milestone: ---
Created attachment 49013
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49013&action=edit
bug demonstrator
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96512
--- Comment #1 from N Schaeffer ---
Created attachment 49014
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49014&action=edit
even simpler bug demonstrator
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96512
--- Comment #3 from N Schaeffer ---
(In reply to Richard Biener from comment #2)
> With trunk and GCC 10 I see
>
> vbroadcastsd zmm0, QWORD PTR [8+r8*8]
>
> can you check newer GCC? GCC 8.4 is out since some time already and I do
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96512
--- Comment #6 from N Schaeffer ---
Hello,
Working further on this, it seems to be a problem in the assembler step, but
only on some installations.
I have a system where gcc 8.3 to 9 and 10 are good (no bug), while another
system where gcc 8.3,
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: nathanael.schaeffer at gmail dot com
Target Milestone: ---
It seems that trying to zero out two arrays in the same loop results in poor
code being generated by -O3.
If I understand it
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: nathanael.schaeffer at gmail dot com
Target Milestone: ---
When trying to produce a xor mask to negate even elements in an AVX vector, gcc
produces wrong code with -funsafe-math
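
For context on the technique this report mentions: negating an IEEE 754 value amounts to flipping its sign bit, so XOR-ing with a mask whose sign bit is set only in the even lanes negates just those lanes. The scalar sketch below illustrates that on an array of doubles. The name negate_even and the choice of double are assumptions made for illustration; this is not the reporter's AVX code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the bug report): negate v[0], v[2], v[4], ...
   by XOR-ing the sign bit of the binary64 representation. */
void negate_even(double *v, size_t n)
{
    const uint64_t sign_mask = UINT64_C(1) << 63;  /* sign bit of a double */
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i += 2) {
        uint64_t bits;
        memcpy(&bits, &v[i], sizeof bits);  /* well-defined type punning */
        bits ^= sign_mask;                  /* flip the sign, keep the magnitude */
        memcpy(&v[i], &bits, sizeof bits);
    }
}
```

Because the operation works on the integer bit pattern, it does not depend on how the compiler treats floating-point arithmetic.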
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93334
--- Comment #3 from N Schaeffer ---
Hi,
Thanks for pointing out the issue about writing different values. This makes
sense.
However, since memset deals with bytes, whenever the type of array is floating
point data (or anything longer than bytes)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93334
--- Comment #5 from N Schaeffer ---
Elaborating a bit on this:
I can eliminate this problem by using:
-O3 -fno-tree-loop-distribute-patterns -fno-tree-loop-vectorize
I wonder why -fno-tree-loop-distribute-patterns is not enough?
In that cas
: gcc
Version: 9.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: nathanael.schaeffer at gmail dot com
Target Milestone: ---
According to Agner Fog
++
Assignee: unassigned at gcc dot gnu.org
Reporter: nathanael.schaeffer at gmail dot com
With -ffast-math, isnan should return true if passed a NaN value.
Otherwise, how is isnan different from (x != x)?
isnan worked as expected with gcc 4.7, but does not with 4.8.1 and 4.8.2
How can I check if x
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60237
--- Comment #2 from N Schaeffer ---
Thank you for your answer.
My program (which is a computational fluid dynamics solver) is not supposed to
produce NaNs. However, when it does (which means something went wrong), I would
like to abort the progra
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60237
--- Comment #4 from N Schaeffer ---
int my_isnan(double x){
volatile double y=x;
return y!=y;
}
is translated to:
0x00406cf0 <+0>: movsd QWORD PTR [rsp-0x8],xmm0
0x00406cf6 <+6>: xor eax,eax
0x0040
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60237
--- Comment #6 from N Schaeffer ---
-fno-builtin-isnan is also interesting, thanks.
Is there a rationale somewhere for not making isnan() detect NaNs with
-ffinite-math-only?
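
One approach sometimes used when floating-point comparisons cannot be relied on is to inspect the bit pattern directly. The sketch below assumes IEEE 754 binary64 doubles; the names isnan_bits and make_quiet_nan are hypothetical and do not come from this thread. Since it uses only integer operations, it should not be affected by assumptions the compiler makes about floating-point values, though that is worth verifying on the compiler version in question.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if x is a NaN, judged by its bits alone.
   A binary64 NaN has all exponent bits set and a non-zero mantissa. */
int isnan_bits(double x)
{
    uint64_t bits;
    assert(sizeof bits == sizeof x);    /* assumes a 64-bit double */
    memcpy(&bits, &x, sizeof bits);
    bits &= UINT64_C(0x7fffffffffffffff);        /* drop the sign bit */
    return bits > UINT64_C(0x7ff0000000000000);  /* above the infinity pattern */
}

/* Hypothetical helper: builds a quiet NaN from its bit pattern,
   without any floating-point arithmetic. */
double make_quiet_nan(void)
{
    const uint64_t bits = UINT64_C(0x7ff8000000000000);
    double x;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

A caller could then abort on isnan_bits(value) instead of relying on isnan() or on a self-comparison.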
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: nathanael.schaeffer at gmail dot com
Target Milestone: ---
I have found what seems to be a regression.
The following code is not compiled to 256-bit AVX when compiled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98563
--- Comment #1 from N Schaeffer ---
I just found the -mprefer-vector-width=512 option, which forces the use of zmm
registers.
The reported regression however remains.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98563
--- Comment #3 from N Schaeffer ---
I'd like to add that when you say "vectorization of the basic block", the code
generated is actually worse than non-vectorized naive code: it handles all
loads and arithmetic operations in scalar mode (v*sd ins
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: nathanael.schaeffer at gmail dot com
Target Milestone: ---
A simple loop multiplying two arrays, with different multiplicity fails to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114107
--- Comment #3 from N Schaeffer ---
I have not benchmarked.
For 4 vmulpd doing the actual work, there are more than 40 permute/mov
instructions, including 24 vpermd instructions, each with a 3-cycle latency.
That is 6 vpermd per vmulpd.
There
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114107
--- Comment #4 from N Schaeffer ---
... and thank you for your quick reply!
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114107
--- Comment #6 from N Schaeffer ---
Indeed, aarch64 assembly looks very good.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114107
--- Comment #9 from N Schaeffer ---
In addition, optimizing for size with -Os leads to a non-vectorized double-loop
(51 bytes) while the vectorized loop with vbroadcastsd (produced by clang -Os)
leads to 40 bytes.
It is thus also a missed optimi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114107
--- Comment #10 from N Schaeffer ---
Interestingly (and maybe surprisingly), I can get gcc to produce nearly optimal
code using vbroadcastsd with the following options:
-O2 -march=skylake -ftree-vectorize
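
The test case itself is not included in this excerpt. As a rough illustration only, a loop of the kind described (two arrays with different multiplicity, where one scalar can be broadcast across a group of elements) might look like the sketch below. The function name scale_groups, the group size of 4, and the element type double are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel (not the reporter's code): every element of b scales
   a group of 4 consecutive elements of a. A vectorizer can in principle
   broadcast b[i] into one vector register and do a single vector multiply
   per group. */
void scale_groups(double *restrict a, const double *restrict b, size_t ngroups)
{
    assert((a != NULL && b != NULL) || ngroups == 0);
    for (size_t i = 0; i < ngroups; i++) {
        for (size_t l = 0; l < 4; l++) {
            a[4 * i + l] *= b[i];
        }
    }
}
```

Compiling such a function with the options quoted above and inspecting the assembly is one way to compare the code generated by different option sets.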
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114107
--- Comment #12 from N Schaeffer ---
I found the "offending" option, and it seems to be indeed a cost-model problem
as Andrew Pinski said:
good code is generated by:
gcc -O2 -ftree-vectorize -march=skylake (since gcc 6.1)
gcc -O1 -ftre