[Bug other/95971] New: [10 regression] Optimizer converts a false boolean value into a true boolean value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95971

            Bug ID: 95971
           Summary: [10 regression] Optimizer converts a false boolean value
                    into a true boolean value
           Product: gcc
           Version: 10.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: 0xe2.0x9a.0x9b at gmail dot com
  Target Milestone: ---

Hello.

I have found an optimization issue that is triggered by the -O2 optimization option in GCC 10.1.0.

The source code (see below) contains an infinite while(cond){} loop. The loop condition is expected to always evaluate to true. The optimizer incorrectly derives that the loop condition evaluates to false and removes the loop.

It is possible that the issue is related to optimizations of the delete operator in C++.

Reproducibility:

  g++ 10.1.0 -O0: not reproducible
  g++ 10.1.0 -O1: not reproducible
  g++ 10.1.0 -O2: REPRODUCIBLE
  g++ 10.1.0 -O3: not reproducible
  g++ 9.3.0 -O2: not reproducible
  clang++ 10 -O2: not reproducible

Full source code:

$ cat a.cc
void xbool(bool value);

struct A {
  char *a = (char*)1;
  ~A() { delete a; }
  bool isZero() { return a == (void*)0; }
};

int main() {
  A a;
  xbool(a.isZero());
  while(!a.isZero());
  xbool(a.isZero()); // This line isn't required to trigger the issue
  return 0;
}

$ cat b.cc
void xbool(bool value) {}

$ cat Makefile
test:
	g++ -c -O2 a.cc
	g++ -c b.cc
	g++ -o a a.o b.o
	time ./a

Dump of assembler code for function main:
  push   %rbp
  xor    %edi,%edi           // %rdi := false
  sub    $0x10,%rsp
  movq   $0x1,0x8(%rsp)
  callq  xbool(bool)
  mov    $0x1,%edi           // %rdi := true
  callq  xbool(bool)
  lea    0x8(%rsp),%rdi
  callq  A::~A()
  add    $0x10,%rsp
  xor    %eax,%eax
  pop    %rbp
  retq
  mov    %rax,%rbp
  jmpq   main.cold

In the assembler code: The compiler correctly passes zero (false) in the 1st call to function xbool(bool), then incorrectly passes one (true) in the 2nd call to function xbool(bool).

The source code initializes A::a to (char*)1 in order to keep the code as small as possible to trigger the issue. A::a could have been initialized to a valid delete-able heap address, but this would unnecessarily enlarge the source code.

The GCC version string on my machine is "g++ (Gentoo 10.1.0-r1 p2) 10.1.0".

Please confirm the reproducibility of this issue.
[Bug other/95971] [10 regression] Optimizer converts a false boolean value into a true boolean value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95971

--- Comment #2 from Jan Ziak (http://atom-symbol.net) <0xe2.0x9a.0x9b at gmail dot com> ---
Created attachment 48805
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48805&action=edit
b.cc
[Bug other/95971] [10 regression] Optimizer converts a false boolean value into a true boolean value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95971

--- Comment #1 from Jan Ziak (http://atom-symbol.net) <0xe2.0x9a.0x9b at gmail dot com> ---
Created attachment 48804
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48804&action=edit
a.cc
[Bug other/95971] [10 regression] Optimizer converts a false boolean value into a true boolean value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95971

--- Comment #3 from Jan Ziak (http://atom-symbol.net) <0xe2.0x9a.0x9b at gmail dot com> ---
Created attachment 48806
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48806&action=edit
Makefile
[Bug other/95971] [10 regression] Optimizer converts a false boolean value into a true boolean value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95971

Jan Ziak (http://atom-symbol.net) <0xe2.0x9a.0x9b at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #48804|0                           |1
        is obsolete|                            |

--- Comment #5 from Jan Ziak (http://atom-symbol.net) <0xe2.0x9a.0x9b at gmail dot com> ---
Created attachment 48808
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48808&action=edit
a.cc

Initialize A::a to a valid heap pointer, instead of initializing it to (char*)1.
[Bug other/95971] [10 regression] Optimizer converts a false boolean value into a true boolean value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95971

--- Comment #7 from Jan Ziak (http://atom-symbol.net) <0xe2.0x9a.0x9b at gmail dot com> ---
(In reply to Martin Liška from comment #6)
> All right, so it's caused by cdde1:
>
> Assume loop 1 to be finite: it has an exit and -ffinite-loops is on.
>
>     -ffinite-loops
>         Assume that a loop with an exit will eventually take the exit and
>         not loop indefinitely. This allows the compiler to remove loops that
>         otherwise have no side-effects, not considering eventual endless
>         looping as such.
>
>         This option is enabled by default at -O2 for C++ with -std=c++11
>         or higher.

Thank you for the explanation.

Your mindset is forcing me to stop using g++ over time because of reliability concerns during application development.

Sincerely
Jan
[Bug other/95971] [10 regression] Optimizer converts a false boolean value into a true boolean value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95971

--- Comment #9 from Jan Ziak (http://atom-symbol.net) <0xe2.0x9a.0x9b at gmail dot com> ---
(In reply to Martin Liška from comment #8)
> Or you can use -fno-finite-loops option.

I am sorry, but I cannot trust this compiler not to force me again to spend several hours just to learn that -O2 is semantically different from -O1 and -O3.

The meaning of "semantically equivalent" in my mind is different from the meaning of "semantically equivalent" in your mind. Infinite loopiness is in my opinion semantically significant, so the compiler should have printed a warning informing me that it is changing the semantics of the code in question.

With -O3, the assembly code is:

Dump of assembler code for function main:
   <+0>:   sub    $0x8,%rsp
   <+4>:   xor    %edi,%edi
   <+6>:   callq  xbool(bool)
   <+11>:  jmp    main+11

"11: jmp 11" is a prime example of what -ffinite-loops is supposed to prevent from being generated, assuming that -O3 actually does include -ffinite-loops, which I am unable to verify because "g++ --help=optimizers -Q" doesn't accept the -std=gnu++11 option.
[Bug other/95971] [10 regression] Optimizer converts a false boolean value into a true boolean value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95971

--- Comment #10 from Jan Ziak (http://atom-symbol.net) <0xe2.0x9a.0x9b at gmail dot com> ---
I hope you do realize that the code I posted previously is equivalent, or very close to being equivalent, to the following code:

struct President {
  const bool dead = false;
  bool isDead() { return dead; }
} president;

while(!president.isDead());

if(president.isDead()) {
  launch_retaliation_nukes();
}

With -ffinite-loops enabled, the nukes are going to be launched because the only way that the while-loop can terminate is for President::dead to be true and thus the "const bool dead" can be assumed to be true when execution reaches the if-statement after skipping the deleted infinite while loop.
[Bug other/95971] [10 regression] Optimizer converts a false boolean value into a true boolean value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95971

--- Comment #12 from Jan Ziak (http://atom-symbol.net) <0xe2.0x9a.0x9b at gmail dot com> ---
(In reply to Marc Glisse from comment #11)
> while(!a.isZero());
>
> that doesn't look like something you would find in real code. Are you
> waiting for a different thread to modify a? Then you should use an atomic
> operation. Are you waiting for the hardware to change something? Use
> volatile. Do you really want an infinite loop? Spell it out
> if(!a.isZero())for(;;);

The code I sent is a downsized version of a larger code, which means that the posted code isn't the real code.
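
A minimal sketch of the atomic variant suggested in the quoted comment, assuming the loop is meant to wait for a value that another thread will set. The names `wait_until_set` and `check_wait` are hypothetical and do not come from the attached test case. Under the C++11 forward-progress rules an atomic operation counts as observable progress, so a loop that polls an atomic is not one the compiler may assume to be finite.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: spin until `flag` becomes true and return how many
// times the loop body ran. Each iteration performs an atomic load, which is
// an atomic operation in the sense of the forward-progress rules.
unsigned long wait_until_set(const std::atomic<bool>& flag) {
  unsigned long spins = 0;
  while (!flag.load(std::memory_order_acquire)) {
    ++spins;
  }
  return spins;
}

// Single-threaded usage check: the flag is already set, so the wait
// returns immediately without spinning.
void check_wait() {
  std::atomic<bool> flag{true};
  assert(wait_until_set(flag) == 0);
}
```

In a real program a second thread would call `flag.store(true, std::memory_order_release)` at some later point; the check above sets the flag up front only so that it terminates without threads.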
[Bug target/89557] [7/8/9/10 regression] 4*movq to 2*movaps IPC performance regression on znver1 with -Og
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89557

--- Comment #11 from Jan Ziak (http://atom-symbol.net) <0xe2.0x9a.0x9b at gmail dot com> ---
(In reply to Eric Gallager from comment #10)
>
> /usr/bin/time ./a0-7.4 |& egrep -o [0-9]+.*user
>         1.48 real         1.26 user
> /usr/bin/time ./ag-7.4 |& egrep -o [0-9]+.*user
>         0.61 real         0.59 user
> /usr/bin/time ./a1-7.4 |& egrep -o [0-9]+.*user
>         0.57 real         0.55 user
>
> /usr/bin/time ./a0-8.3 |& egrep -o [0-9]+.*user
>         1.27 real         1.21 user
> /usr/bin/time ./ag-8.3 |& egrep -o [0-9]+.*user
>         0.60 real         0.59 user
> /usr/bin/time ./a1-8.3 |& egrep -o [0-9]+.*user
>         0.60 real         0.59 user
> /usr/bin/time ./a3-8.3 |& egrep -o [0-9]+.*user
>         0.45 real         0.43 user
>
> /usr/bin/time ./ag-7.4n |& egrep -o [0-9]+.*user
>         0.60 real         0.59 user
> /usr/bin/time ./ag-8.3n |& egrep -o [0-9]+.*user
>         0.61 real         0.59 user
>
> So, uh, I'm not sure if that's a confirmation, but it's an extra data point.

Interesting. Your measurement is showing that there is no performance regression on your machine when going from ag-7.4 to ag-8.3.

Some questions:

- What CPU was used to obtain your results?

- If you run "perf record ./ag-8.3; perf report", which instructions do you see highlighted when you enter the disassembly of function "mul"?

On Ryzen 3700X, I see:

   3.57%  movdqu 0x70(%rsp), %xmm4
  69.25%  movups %xmm4, 0x30(%rsp)
   9.32%  jmpq   11bb

Thanks.

Sidenote: The mirror http://gcc.fyxm.net at https://gcc.gnu.org/mirrors.html is invalid.
[Bug c++/89557] New: [7/8 regression] 4*movq to 2*movaps IPC performance regression on znver1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89557

            Bug ID: 89557
           Summary: [7/8 regression] 4*movq to 2*movaps IPC performance
                    regression on znver1
           Product: gcc
           Version: 8.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: 0xe2.0x9a.0x9b at gmail dot com
  Target Milestone: ---

Approximate C++ source code:

struct __attribute__((aligned(16))) A {
  union {
    struct { uint64_t a; double b; };
    uint64_t data[2];
  };
};

A a;
a.a = 2;
a.b = x*y;
return a;

CPU: AMD Ryzen 5 1600 Six-Core Processor

GCC 7.4.0 generates (no -march/mtune):

  movq   $2, 0x80(%rsp)
  movsd  %xmm0, 0x88(%rsp)
  mov    0x80(%rsp), %rax
  mov    0x88(%rsp), %rdx
  mov    %rax, 0x30(%rsp)
  mov    %rdx, 0x38(%rsp)

GCC 7.4.0 generates (no -march, -mtune=native):

  movq   $2, 0x80(%rsp)
  movsd  %xmm0, 0x88(%rsp)
  movaps 0x80(%rsp), %xmm6
  movaps %xmm6, 0x30(%rsp)

GCC 8.2.0 generates (no -march/mtune):

  movq   $2, 0x80(%rsp)
  movsd  %xmm0, 0x88(%rsp)
  movdqa 0x80(%rsp), %xmm6
  movaps %xmm6, 0x30(%rsp)

GCC 8.2.0 generates (no -march, -mtune=native):

  movq   $2, 0x80(%rsp)
  movsd  %xmm0, 0x88(%rsp)
  movaps 0x80(%rsp), %xmm6
  movaps %xmm6, 0x30(%rsp)

IPC of an executable which uses the above code (perf stat):

GCC 7.4.0 (no -march/mtune):
      617.233116  task-clock (msec)   #  0.997 CPUs utilized
   4,139,124,553  instructions        #  1.94  insn per cycle

GCC 7.4.0 (no -march, -mtune=native):
     1106.252920  task-clock (msec)   #  1.000 CPUs utilized
   3,995,268,509  instructions        #  1.02  insn per cycle

GCC 8.2.0 (no -march/mtune):
     1096.852485  task-clock (msec)   #  1.000 CPUs utilized
   3,790,839,401  instructions        #  0.97  insn per cycle

GCC 8.2.0 (no -march, -mtune=native):
     1105.693441  task-clock (msec)   #  1.000 CPUs utilized
   4,041,957,928  instructions        #  1.04  insn per cycle

Summary: Using 2*movaps instead of 4*movq severely lowers IPC on znver1 CPUs
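
The approximate fragment in the report can be completed into a compilable sketch along the following lines. This is an assumption about the shape of the code, not the reporter's attached test case: the signature of `mul` and the helper `check_mul` are hypothetical, and the anonymous struct inside the union relies on a GNU extension accepted by g++ and clang++.

```cpp
#include <cassert>
#include <cstdint>

// 16-byte struct forced to 16-byte alignment, as in the report.
struct __attribute__((aligned(16))) A {
  union {
    struct { std::uint64_t a; double b; };  // anonymous struct: GNU extension
    std::uint64_t data[2];
  };
};

static_assert(sizeof(A) == 16, "two 8-byte members");
static_assert(alignof(A) == 16, "set by the aligned(16) attribute");

// Hypothetical signature. The two member assignments are 8-byte stores;
// returning the struct by value is a 16-byte copy, which is the kind of
// copy the report shows being done with a movaps/movdqa pair.
A mul(double x, double y) {
  A a;
  a.a = 2;
  a.b = x * y;
  return a;
}

// Small usage check for the sketch above.
void check_mul() {
  A r = mul(3.0, 4.0);
  assert(r.a == 2);
  assert(r.b == 12.0);
}
```

Compiling this with -Og and inspecting the output of `objdump -d` is one way to see which instructions a given GCC version picks for the 16-byte copy; the exact sequence may differ from the listings in the report.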
[Bug c++/89557] [7/8 regression] 4*movq to 2*movaps IPC performance regression on znver1 with -Og
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89557

Jan Ziak (http://atom-symbol.net) <0xe2.0x9a.0x9b at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[7/8 regression] 4*movq to  |[7/8 regression] 4*movq to
                   |2*movaps IPC performance    |2*movaps IPC performance
                   |regression on znver1        |regression on znver1 with
                   |                            |-Og

--- Comment #1 from Jan Ziak (http://atom-symbol.net) <0xe2.0x9a.0x9b at gmail dot com> ---
Forgot to mention that this happens with -Og optimization level.
[Bug c++/89557] [7/8 regression] 4*movq to 2*movaps IPC performance regression on znver1 with -Og
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89557

--- Comment #3 from Jan Ziak (http://atom-symbol.net) <0xe2.0x9a.0x9b at gmail dot com> ---
(In reply to Jakub Jelinek from comment #2)
> -Og is not meant to generate code with good performance, but code which is
> easy to debug, so benchmarking something with -Og makes no sense.

I agree on the first part of your sentence. On the other hand, -Og is in my opinion the best among the -O? options for C/C++ developers to use during the development cycle and I believe -Og was originally intended to be used by developers, so we should care about its performance because of its presumably non-negligible userbase.

> That said, if znver1 has slow movaps and it is confirmed on something other
> than a microbenchmark, then we should adjust tuning to avoid using it for
> memory copying.

I think a carefully selected use of znver1 16-byte movaps isn't slow, but it is slow at least in the case when it is preceded by two 8-byte stores or more generally by any partial store to the 16 bytes in memory.

A small piece of code makes it possible to clearly demonstrate the cause of a problem and to suggest an optimization rule for the compiler to follow. The IPC data I measured are from a larger application, and I was directed to the seemingly short code fragment by using "perf record" because it is a performance issue in the larger app. The 16-byte struct is fundamental to the application and I cannot avoid using it at this point in time, although I can remove the 16-byte alignment attribute which causes movaps to be generated.

In general, imposing a 16-byte alignment on any C/C++ data structure with size >= 16 bytes shouldn't slow down any program by a factor of 2. It can be expected to increase or decrease performance by say a factor of 1.1 depending on workload. A factor of 2 slowdown is unexpected.

It would be interesting to see what would happen to performance if all data structures in C/C++ codes with size >= 16 bytes were annotated to be aligned to 16 bytes. I don't have performance measurements about such general use of the aligned(16) attribute.

Thank you for your reply.
[Bug c++/89557] [7/8 regression] 4*movq to 2*movaps IPC performance regression on znver1 with -Og
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89557

--- Comment #4 from Jan Ziak (http://atom-symbol.net) <0xe2.0x9a.0x9b at gmail dot com> ---
Without the aligned(16) attribute, the alignment of the struct in my code is 8 bytes and the struct size remains 16 bytes:

GCC 8.2.0 generates (-Og, no -march/mtune):

  movq   $2, 0x80(%rsp)
  movsd  %xmm0, 0x88(%rsp)
  movdqa 0x80(%rsp), %xmm6
  movups %xmm6, 0x30(%rsp)

The movups used here has approximately the same performance as movaps on znver1.
[Bug target/89557] [7/8/9 regression] 4*movq to 2*movaps IPC performance regression on znver1 with -Og
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89557

--- Comment #6 from Jan Ziak (http://atom-symbol.net) <0xe2.0x9a.0x9b at gmail dot com> ---
Created attachment 45897
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45897&action=edit
a.cc: compilable testcase
[Bug target/89557] [7/8/9 regression] 4*movq to 2*movaps IPC performance regression on znver1 with -Og
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89557

--- Comment #7 from Jan Ziak (http://atom-symbol.net) <0xe2.0x9a.0x9b at gmail dot com> ---
Created attachment 45898
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45898&action=edit
Makefile
[Bug target/89557] [7/8/9 regression] 4*movq to 2*movaps IPC performance regression on znver1 with -Og
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89557

--- Comment #8 from Jan Ziak (http://atom-symbol.net) <0xe2.0x9a.0x9b at gmail dot com> ---
Testcase (a.cc) benchmark results. See attached Makefile for further information about compiler options.

Machine 1: Ryzen 5 1600 Six-Core Processor:

  a0-7.4:  0.753795user
  ag-7.4:  0.313097user
  a1-7.4:  0.281629user
  a0-8.3:  0.739894user
  ag-8.3:  0.954584user  <-- performance issue in respect to ag-7.4
  a1-8.3:  0.281554user
  a3-8.3:  0.224067user
  ag-7.4n: 1.032364user  <-- performance issue in respect to ag-7.4
  ag-8.3n: 1.007429user  <-- performance issue in respect to ag-7.4

Machine 2: Intel(R) Xeon(R) CPU E5-2676 v3:

  a0-7.4:  1.02user
  ag-7.4:  0.37user
  a1-7.4:  0.34user
  a0-8.3:  1.01user
  ag-8.3:  0.95user  <-- performance issue in respect to a1-7.4
  a1-8.3:  0.34user
  a3-8.3:  0.27user
  ag-7.4n (-march=znver1): 1.05user  <-- performance issue in respect to ag-7.4
  ag-8.3n (-march=znver1): 0.99user  <-- performance issue in respect to ag-7.4

Machine 3: Intel(R) Celeron(R) CPU N2930:

  a0-7.4:  2.223435user
  ag-7.4:  1.017597user
  a1-7.4:  0.741288user
  a0-8.3:  2.224145user
  ag-8.3:  1.620879user  <-- performance issue in respect to ag-7.4
  a1-8.3:  1.014488user  <-- performance regression in respect to a1-7.4
  a3-8.3:  0.885718user  <-- performance regression in respect to a1-7.4
  ag-7.4n (-march=znver1): n/a
  ag-8.3n (-march=znver1): n/a
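
To put the flagged rows in proportion, the slowdown of ag-8.3 relative to ag-7.4 can be computed directly from the user times above. The sketch below does only that arithmetic; the helper names are hypothetical, and it compares ag-8.3 against ag-7.4 on all three machines for uniformity, even though the Machine 2 row is annotated against a1-7.4.

```cpp
#include <cassert>
#include <cstdio>

// Ratio of a measured user time to a baseline user time, both in seconds.
double slowdown(double seconds, double baseline_seconds) {
  return seconds / baseline_seconds;
}

// Print the ag-8.3 / ag-7.4 ratio for each machine, using the user times
// copied from the benchmark results.
void print_slowdowns() {
  std::printf("Ryzen 5 1600:    %.2fx\n", slowdown(0.954584, 0.313097));
  std::printf("Xeon E5-2676 v3: %.2fx\n", slowdown(0.95, 0.37));
  std::printf("Celeron N2930:   %.2fx\n", slowdown(1.620879, 1.017597));
}
```

The ratios come out at roughly 3.0x, 2.6x and 1.6x respectively, so on these numbers the -Og slowdown from 7.4 to 8.3 appears on all three machines and is largest on the Ryzen.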
[Bug target/89557] [7/8/9/10 regression] 4*movq to 2*movaps IPC performance regression on znver1 with -Og
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89557

--- Comment #9 from Jan Ziak (http://atom-symbol.net) <0xe2.0x9a.0x9b at gmail dot com> ---
(In reply to Richard Biener from comment #5)
> Please provide a compilable testcase.

Done some time ago. Please change the status of this bug from WAITING to some other status.