[Bug target/92462] New: [arm32] -ftree-pre makes a variable to be wrongly hoisted out
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92462 Bug ID: 92462 Summary: [arm32] -ftree-pre makes a variable to be wrongly hoisted out Product: gcc Version: 7.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: aleksei.voity...@bell-sw.com Target Milestone: --- Created attachment 47212 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47212&action=edit reduced testcase from the openjdk sources While compiling openjdk sources I bumped into a bug: when -ftree-pre is enabled the optimizer hoists out reload of a variable which subsequently leads to the infinite loop situation. Below is the relevant piece of code and "new_value" is the variable that gets hoisted out. template inline T Atomic::CmpxchgByteUsingInt::operator()(T exchange_value, T volatile* dest, T compare_value, atomic_memory_order order) const { printf ("Atomic::CmpxchgByteUsingInt::operator: 1: %d, 2: %d\n", exchange_value, compare_value); uint8_t canon_exchange_value = exchange_value; uint8_t canon_compare_value = compare_value; volatile uint32_t* aligned_dest = reinterpret_cast(align_down(dest, sizeof(uint32_t))); size_t offset = (uintptr_t)dest - (uintptr_t)aligned_dest; uint32_t cur = *aligned_dest; uint8_t* cur_as_bytes = reinterpret_cast(&cur); cur_as_bytes[offset] = canon_compare_value; do { uint32_t new_value = cur; reinterpret_cast(&new_value)[offset] = canon_exchange_value; printf ("Atomic::CmpxchgByteUsingInt::operator2: 1: %d, 2: %d\n", new_value, cur); uint32_t res = cmpxchg(new_value, aligned_dest, cur, order); if (res == cur) break; cur = res; } while (cur_as_bytes[offset] == canon_compare_value); return PrimitiveConversions::cast(cur_as_bytes[offset]); } $ g++ -O1 -ftree-pre t.cpp $ ./a.out Atomic::CmpxchgByteUsingInt::operator: 1: 0, 2: 1 Atomic::CmpxchgByteUsingInt::operator2: 1: 0, 2: 0 $ g++ -O1 t.cpp $ ./a.out Atomic::CmpxchgByteUsingInt::operator: 1: 0, 2: 1 Atomic::CmpxchgByteUsingInt::operator2: 1: 0, 2: 256 Below is the assembler of the loop for the correct version: .L7: ldr r4, [sp] str r4, [sp, #4] strbr7, [r5, #-4] mov r2, r4 ldr r1, [sp, #4] mov r0, r6 bl printf cbz r4, .L6 movsr3, #0 str r3, [sp] ldrbr3, [r8]@ zero_extendqisi2 cmp r3, #1 beq .L7 and for the incorrect one: .L7: str r4, [sp, #4] strbr8, [r6] mov r2, r4 ldr r1, [sp, #4] mov r0, r5 bl printf cbz r4, .L6 movsr4, #0 str r4, [sp] ldrbr3, [r7]@ zero_extendqisi2 cmp r3, #1 beq .L7
[Bug target/92462] [arm32] -ftree-pre makes a variable to be wrongly hoisted out
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92462 Aleksei Voitylov changed: What|Removed |Added Status|RESOLVED|UNCONFIRMED Resolution|INVALID |--- --- Comment #4 from Aleksei Voitylov --- (In reply to Wilco from comment #1) > (In reply to Aleksei Voitylov from comment #0) > > Created attachment 47212 [details] > > reduced testcase from the openjdk sources > > > > While compiling openjdk sources I bumped into a bug: when -ftree-pre is > > enabled the optimizer hoists out reload of a variable which subsequently > > leads to the infinite loop situation. > > > > Below is the relevant piece of code and "new_value" is the variable that > > gets hoisted out. > > > > template > > inline T Atomic::CmpxchgByteUsingInt::operator()(T exchange_value, > > T volatile* dest, > > T compare_value, > > atomic_memory_order order) const { > > printf ("Atomic::CmpxchgByteUsingInt::operator: 1: %d, 2: %d\n", > > exchange_value, compare_value); > > uint8_t canon_exchange_value = exchange_value; > > uint8_t canon_compare_value = compare_value; > > volatile uint32_t* aligned_dest > > = reinterpret_cast(align_down(dest, > > sizeof(uint32_t))); > > size_t offset = (uintptr_t)dest - (uintptr_t)aligned_dest; > > uint32_t cur = *aligned_dest; > > uint8_t* cur_as_bytes = reinterpret_cast(&cur); > > cur_as_bytes[offset] = canon_compare_value; > > do { > > uint32_t new_value = cur; > > reinterpret_cast(&new_value)[offset] = > > canon_exchange_value; > > printf ("Atomic::CmpxchgByteUsingInt::operator2: 1: %d, 2: %d\n", > > new_value, cur); > > uint32_t res = cmpxchg(new_value, aligned_dest, cur, order); > > if (res == cur) break; > > cur = res; > > } while (cur_as_bytes[offset] == canon_compare_value); > > return PrimitiveConversions::cast(cur_as_bytes[offset]); > > } > > > > $ g++ -O1 -ftree-pre t.cpp > > $ ./a.out > > Atomic::CmpxchgByteUsingInt::operator: 1: 0, 2: 1 > > Atomic::CmpxchgByteUsingInt::operator2: 1: 0, 2: 0 > > > > $ g++ -O1 t.cpp > > $ ./a.out > > Atomic::CmpxchgByteUsingInt::operator: 1: 0, 2: 1 > > Atomic::CmpxchgByteUsingInt::operator2: 1: 0, 2: 256 > > > > Below is the assembler of the loop for the correct version: > > > > .L7: > > ldr r4, [sp] > > str r4, [sp, #4] > > strbr7, [r5, #-4] > > mov r2, r4 > > ldr r1, [sp, #4] > > mov r0, r6 > > bl printf > > cbz r4, .L6 > > movsr3, #0 > > str r3, [sp] > > ldrbr3, [r8]@ zero_extendqisi2 > > cmp r3, #1 > > beq .L7 > > > > and for the incorrect one: > > > > .L7: > > str r4, [sp, #4] > > strbr8, [r6] > > mov r2, r4 > > ldr r1, [sp, #4] > > mov r0, r5 > > bl printf > > cbz r4, .L6 > > movsr4, #0 > > str r4, [sp] > > ldrbr3, [r7]@ zero_extendqisi2 > > cmp r3, #1 > > beq .L7 > > There are serious aliasing bugs in the source - GCC is quite correct in > assuming that cur and cur_as_bytes[offset] never alias (obviously) and even > optimize away the cmpxchg (no idea why, that appears wrong). Isn't uint32_t cur = *aligned_dest; uint8_t* cur_as_bytes = reinterpret_cast(&cur); the very definition of the pointer aliasing? Regardless, if the function being called is doing atomic operations or not, the variable in question should not be hoisted? This is the essence of this bug. Yes, openjdk code is doing nasty things furtheron (and the code predates builtin gcc operations and is compiled by other compilers as well which may not be aware of builtins), but the bug as it stands does not depend on that logic. > Even if you fix the aliasing bugs, it won't emulate a byte-oriented cmpxchg > correctly, there are bugs in the logic too. > > So I suggest to go back to the drawing board - you can't hack your own > atomic operations and just hope for the best. GCC supports a standard set of > atomic operations for a good reason!
[Bug target/92462] [arm32] -ftree-pre makes a variable to be wrongly hoisted out
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92462 --- Comment #5 from Aleksei Voitylov --- (In reply to Richard Biener from comment #3) > Indeed -fno-strict-aliasing might be a workaround (apart from the atomicity > issue). Also using a character type for the access (uint8_t is _not_ a > character type) would make the alias issue go away. Actually, it is not. However, -fno-tree-pre can be used as such but the point is why a workaround is needed here in the first place.
[Bug target/92462] [arm32] -ftree-pre makes a variable to be wrongly hoisted out
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92462 --- Comment #9 from Aleksei Voitylov --- (In reply to Alexander Monakov from comment #8) > The full preprocessed source is provided and it clearly says > > typedef unsigned char uint8_t; > > in line 10, so it is in fact a character type. > > It's suspicious that cmpxchg_using_helper does not return a value > (incorrectly reduced testcase?) and there's still an aliasing violation when > atomic_cmpxchg_func tries to cast 'dest' from uint8_t* to int*. I think the > report was closed prematurely. It is suspicious regarding atomic functionality - from that point the testcase is over-reduced, you are right. It doesn't invalidate the initial question about the hoisting, though. > Aleksei - always provide output of 'gcc -v' when reporting such bugs, > otherwise people may be unable to reproduce it when there's really a problem > (no way to tell how your compiler was configured or even its exact version). $ g++ -v Using built-in specs. COLLECT_GCC=/home/build/toolchain/gcc-linaro-7.3.1-2018.05-x86_64_arm-linux-gnueabihf/bin/arm-linux-gnueabihf-g++ COLLECT_LTO_WRAPPER=/home/build/toolchain/gcc-linaro-7.3.1-2018.05-x86_64_arm-linux-gnueabihf/bin/../libexec/gcc/arm-linux-gnueabihf/7.3.1/lto-wrapper Target: arm-linux-gnueabihf Configured with: Thread model: posix gcc version 7.3.1 20180425 [linaro-7.3-2018.05 revision d29120a424ecfbc167ef90065c0eeb7f91977701] (Linaro GCC 7.3-2018.05)
[Bug target/92462] [arm32] -ftree-pre makes a variable to be wrongly hoisted out
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92462 Aleksei Voitylov changed: What|Removed |Added Attachment #47212|0 |1 is obsolete|| --- Comment #11 from Aleksei Voitylov --- Created attachment 47260 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47260&action=edit This is a streamlined testcase, no misleading references to external dependencies left
[Bug target/92462] [arm32] -ftree-pre makes a variable to be wrongly hoisted out
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92462 Aleksei Voitylov changed: What|Removed |Added Attachment #47260|0 |1 is obsolete|| --- Comment #12 from Aleksei Voitylov --- Created attachment 47262 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47262&action=edit This is a streamlined testcase, no misleading references to external dependencies left
[Bug target/92462] [arm32] -ftree-pre makes a variable to be wrongly hoisted out
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92462 Aleksei Voitylov changed: What|Removed |Added Status|RESOLVED|UNCONFIRMED Resolution|FIXED |--- --- Comment #13 from Aleksei Voitylov --- I'm reopening it since the new testcase that doesn't have anything to do with the actual atomic operations still fails in the same manner. I changed the name "Atomic" so that to make make it less distractive but even the source itself is quite self-evident now. It could be the case that the optimizer wrongly considers uint8_t type (which is unsigned char) as not aliasing uint32_t in this particular scenario.
[Bug target/92462] [arm32] -ftree-pre makes a variable to be wrongly hoisted out
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92462 --- Comment #17 from Aleksei Voitylov --- (In reply to Andrew Pinski from comment #14) > Have you tested gcc 7.5.0 that was just released? How about gcc 8.x? Have > you tried that. There has been aliasing bugs in gcc before and this might > already been fixed. I just built ToT cross-gcc and the result is the same. $ arm-linux-gnueabihf-gcc --version arm-linux-gnueabihf-gcc (GCC) 10.0.0 20191023 (experimental) Copyright (C) 2019 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
[Bug target/92462] [arm32] -ftree-pre makes a variable to be wrongly hoisted out
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92462 --- Comment #25 from Aleksei Voitylov --- (In reply to Richard Biener from comment #22) > Fixed on trunk. Can arm people verify? I checked the DSE dump only. Bonus > if you manage to create a testcase for the testsuite failing before, passing > now. > > The patch is simple enough to backport if it works. Unfortunately it still fails for arm32. Here is the relevant piece of rtl dump for dse1: **scanning insn=21 mem: (plus:SI (reg/f:SI 125) (reg/v:SI 118 [ offset ])) after canon_rtx address: (plus:SI (plus:SI (reg/f:SI 102 sfp) (reg/v:SI 118 [ offset ])) (const_int -8 [0xfff8])) after cselib_expand address: (plus:SI (minus:SI (reg/f:SI 102 sfp) (reg:SI 116 [ _10 ])) (const:SI (plus:SI (symbol_ref:SI ("*.LANCHOR0") [flags 0x182]) (const_int -8 [0xfff8] after canon_rtx address: (plus:SI (minus:SI (reg/f:SI 102 sfp) (reg:SI 116 [ _10 ])) (const:SI (plus:SI (symbol_ref:SI ("*.LANCHOR0") [flags 0x182]) (const_int -8 [0xfff8] varying cselib base=11:2037069969 offset = 0 processing cselib store [0..1) mems_found = 1, cannot_delete = false **scanning insn=22 mem: (plus:SI (reg/f:SI 102 sfp) (const_int -8 [0xfff8])) after canon_rtx address: (plus:SI (reg/f:SI 102 sfp) (const_int -8 [0xfff8])) gid=1 offset=-8 processing const load gid=1[-8..-4) trying to replace SImode load in insn 22 from SImode store in insn 16 deferring rescan insn with uid = 22. deferring rescan insn with uid = 90. -- replaced the loaded MEM with (reg 135)