[Bug target/92462] New: [arm32] -ftree-pre makes a variable to be wrongly hoisted out

2019-11-11 Thread aleksei.voity...@bell-sw.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92462

Bug ID: 92462
   Summary: [arm32] -ftree-pre makes a variable to be wrongly
hoisted out
   Product: gcc
   Version: 7.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: aleksei.voity...@bell-sw.com
  Target Milestone: ---

Created attachment 47212
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47212&action=edit
reduced testcase from the openjdk sources

While compiling openjdk sources I bumped into a bug: when -ftree-pre is enabled
the optimizer hoists out reload of a variable which subsequently leads to the
infinite loop situation.

Below is the relevant piece of code and "new_value" is the variable that gets
hoisted out.

template
inline T Atomic::CmpxchgByteUsingInt::operator()(T exchange_value,
T volatile* dest,
T compare_value,
atomic_memory_order order) const {
printf ("Atomic::CmpxchgByteUsingInt::operator: 1: %d, 2: %d\n",
exchange_value, compare_value);
uint8_t canon_exchange_value = exchange_value;
uint8_t canon_compare_value = compare_value;
volatile uint32_t* aligned_dest
= reinterpret_cast(align_down(dest,
sizeof(uint32_t)));
size_t offset = (uintptr_t)dest  - (uintptr_t)aligned_dest;
uint32_t cur = *aligned_dest;
uint8_t* cur_as_bytes = reinterpret_cast(&cur);
cur_as_bytes[offset] = canon_compare_value;
do {
uint32_t new_value = cur;
reinterpret_cast(&new_value)[offset] = canon_exchange_value;
printf ("Atomic::CmpxchgByteUsingInt::operator2: 1: %d, 2: %d\n",
new_value, cur);
uint32_t res = cmpxchg(new_value, aligned_dest, cur, order);
if (res == cur) break;
cur = res;
} while (cur_as_bytes[offset] == canon_compare_value);
return PrimitiveConversions::cast(cur_as_bytes[offset]);
}

$ g++ -O1 -ftree-pre t.cpp
$ ./a.out
Atomic::CmpxchgByteUsingInt::operator: 1: 0, 2: 1
Atomic::CmpxchgByteUsingInt::operator2: 1: 0, 2: 0

$ g++ -O1 t.cpp
$ ./a.out
Atomic::CmpxchgByteUsingInt::operator: 1: 0, 2: 1
Atomic::CmpxchgByteUsingInt::operator2: 1: 0, 2: 256

Below is the assembler of the loop for the correct version:

.L7:
ldr r4, [sp]
str r4, [sp, #4]
strbr7, [r5, #-4]
mov r2, r4
ldr r1, [sp, #4]
mov r0, r6
bl  printf
cbz r4, .L6
movsr3, #0
str r3, [sp]
ldrbr3, [r8]@ zero_extendqisi2
cmp r3, #1
beq .L7

and for the incorrect one:

.L7:
str r4, [sp, #4]
strbr8, [r6]
mov r2, r4
ldr r1, [sp, #4]
mov r0, r5
bl  printf
cbz r4, .L6
movsr4, #0
str r4, [sp]
ldrbr3, [r7]@ zero_extendqisi2
cmp r3, #1
beq .L7

[Bug target/92462] [arm32] -ftree-pre makes a variable to be wrongly hoisted out

2019-11-12 Thread aleksei.voity...@bell-sw.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92462

Aleksei Voitylov  changed:

   What|Removed |Added

 Status|RESOLVED|UNCONFIRMED
 Resolution|INVALID |---

--- Comment #4 from Aleksei Voitylov  ---
(In reply to Wilco from comment #1)
> (In reply to Aleksei Voitylov from comment #0)
> > Created attachment 47212 [details]
> > reduced testcase from the openjdk sources
> > 
> > While compiling openjdk sources I bumped into a bug: when -ftree-pre is
> > enabled the optimizer hoists out reload of a variable which subsequently
> > leads to the infinite loop situation.
> > 
> > Below is the relevant piece of code and "new_value" is the variable that
> > gets hoisted out.
> > 
> > template
> > inline T Atomic::CmpxchgByteUsingInt::operator()(T exchange_value,
> > T volatile* dest,
> > T compare_value,
> > atomic_memory_order order) const {
> > printf ("Atomic::CmpxchgByteUsingInt::operator: 1: %d, 2: %d\n",
> > exchange_value, compare_value);
> > uint8_t canon_exchange_value = exchange_value;
> > uint8_t canon_compare_value = compare_value;
> > volatile uint32_t* aligned_dest
> > = reinterpret_cast(align_down(dest,
> > sizeof(uint32_t)));
> > size_t offset = (uintptr_t)dest  - (uintptr_t)aligned_dest;
> > uint32_t cur = *aligned_dest;
> > uint8_t* cur_as_bytes = reinterpret_cast(&cur);
> > cur_as_bytes[offset] = canon_compare_value;
> > do {
> > uint32_t new_value = cur;
> > reinterpret_cast(&new_value)[offset] =
> > canon_exchange_value;
> > printf ("Atomic::CmpxchgByteUsingInt::operator2: 1: %d, 2: %d\n",
> > new_value, cur);
> > uint32_t res = cmpxchg(new_value, aligned_dest, cur, order);
> > if (res == cur) break;
> > cur = res;
> > } while (cur_as_bytes[offset] == canon_compare_value);
> > return PrimitiveConversions::cast(cur_as_bytes[offset]);
> > }
> > 
> > $ g++ -O1 -ftree-pre t.cpp
> > $ ./a.out
> > Atomic::CmpxchgByteUsingInt::operator: 1: 0, 2: 1
> > Atomic::CmpxchgByteUsingInt::operator2: 1: 0, 2: 0
> > 
> > $ g++ -O1 t.cpp
> > $ ./a.out
> > Atomic::CmpxchgByteUsingInt::operator: 1: 0, 2: 1
> > Atomic::CmpxchgByteUsingInt::operator2: 1: 0, 2: 256
> > 
> > Below is the assembler of the loop for the correct version:
> > 
> > .L7:
> > ldr r4, [sp]
> > str r4, [sp, #4]
> > strbr7, [r5, #-4]
> > mov r2, r4
> > ldr r1, [sp, #4]
> > mov r0, r6
> > bl  printf
> > cbz r4, .L6
> > movsr3, #0
> > str r3, [sp]
> > ldrbr3, [r8]@ zero_extendqisi2
> > cmp r3, #1
> > beq .L7
> > 
> > and for the incorrect one:
> > 
> > .L7:
> > str r4, [sp, #4]
> > strbr8, [r6]
> > mov r2, r4
> > ldr r1, [sp, #4]
> > mov r0, r5
> > bl  printf
> > cbz r4, .L6
> > movsr4, #0
> > str r4, [sp]
> > ldrbr3, [r7]@ zero_extendqisi2
> > cmp r3, #1
> > beq .L7
> 
> There are serious aliasing bugs in the source - GCC is quite correct in
> assuming that cur and cur_as_bytes[offset] never alias (obviously) and even
> optimize away the cmpxchg (no idea why, that appears wrong).
Isn't

   uint32_t cur = *aligned_dest;
   uint8_t* cur_as_bytes = reinterpret_cast(&cur);

the very definition of the pointer aliasing? Regardless, if the function being
called is doing atomic operations or not, the variable in question should not
be hoisted? This is the essence of this bug.

Yes, openjdk code is doing nasty things furtheron (and the code predates
builtin gcc operations and is compiled by other compilers as well which may not
be aware of builtins), but the bug as it stands does not depend on that logic.


> Even if you fix the aliasing bugs, it won't emulate a byte-oriented cmpxchg
> correctly, there are bugs in the logic too.
> 
> So I suggest to go back to the drawing board - you can't hack your own
> atomic operations and just hope for the best. GCC supports a standard set of
> atomic operations for a good reason!

[Bug target/92462] [arm32] -ftree-pre makes a variable to be wrongly hoisted out

2019-11-12 Thread aleksei.voity...@bell-sw.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92462

--- Comment #5 from Aleksei Voitylov  ---
(In reply to Richard Biener from comment #3)
> Indeed -fno-strict-aliasing might be a workaround (apart from the atomicity
> issue).  Also using a character type for the access (uint8_t is _not_ a
> character type) would make the alias issue go away.
Actually, it is not. However, -fno-tree-pre can be used as such but the point
is why a workaround is needed here in the first place.

[Bug target/92462] [arm32] -ftree-pre makes a variable to be wrongly hoisted out

2019-11-12 Thread aleksei.voity...@bell-sw.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92462

--- Comment #9 from Aleksei Voitylov  ---
(In reply to Alexander Monakov from comment #8)
> The full preprocessed source is provided and it clearly says
> 
> typedef unsigned char uint8_t;
> 
> in line 10, so it is in fact a character type.
> 
> It's suspicious that cmpxchg_using_helper does not return a value
> (incorrectly reduced testcase?) and there's still an aliasing violation when
> atomic_cmpxchg_func tries to cast 'dest' from uint8_t* to int*. I think the
> report was closed prematurely.
It is suspicious regarding atomic functionality - from that point the testcase
is over-reduced, you are right. It doesn't invalidate the initial question
about the  hoisting, though.

> Aleksei - always provide output of 'gcc -v' when reporting such bugs,
> otherwise people may be unable to reproduce it when there's really a problem
> (no way to tell how your compiler was configured or even its exact version).

$ g++ -v
Using built-in specs.
COLLECT_GCC=/home/build/toolchain/gcc-linaro-7.3.1-2018.05-x86_64_arm-linux-gnueabihf/bin/arm-linux-gnueabihf-g++
COLLECT_LTO_WRAPPER=/home/build/toolchain/gcc-linaro-7.3.1-2018.05-x86_64_arm-linux-gnueabihf/bin/../libexec/gcc/arm-linux-gnueabihf/7.3.1/lto-wrapper
Target: arm-linux-gnueabihf
Configured with:



Thread model: posix
gcc version 7.3.1 20180425 [linaro-7.3-2018.05 revision
d29120a424ecfbc167ef90065c0eeb7f91977701] (Linaro GCC 7.3-2018.05)

[Bug target/92462] [arm32] -ftree-pre makes a variable to be wrongly hoisted out

2019-11-14 Thread aleksei.voity...@bell-sw.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92462

Aleksei Voitylov  changed:

   What|Removed |Added

  Attachment #47212|0   |1
is obsolete||

--- Comment #11 from Aleksei Voitylov  ---
Created attachment 47260
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47260&action=edit
This is a streamlined testcase, no misleading references to external
dependencies left

[Bug target/92462] [arm32] -ftree-pre makes a variable to be wrongly hoisted out

2019-11-14 Thread aleksei.voity...@bell-sw.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92462

Aleksei Voitylov  changed:

   What|Removed |Added

  Attachment #47260|0   |1
is obsolete||

--- Comment #12 from Aleksei Voitylov  ---
Created attachment 47262
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47262&action=edit
This is a streamlined testcase, no misleading references to external
dependencies left

[Bug target/92462] [arm32] -ftree-pre makes a variable to be wrongly hoisted out

2019-11-14 Thread aleksei.voity...@bell-sw.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92462

Aleksei Voitylov  changed:

   What|Removed |Added

 Status|RESOLVED|UNCONFIRMED
 Resolution|FIXED   |---

--- Comment #13 from Aleksei Voitylov  ---
I'm reopening it since the new testcase that doesn't have anything to do with
the actual atomic operations still fails in the same manner.

I changed the name "Atomic" so that to make make it less distractive but even
the source itself is quite self-evident now.

It could be the case that the optimizer wrongly considers uint8_t type (which
is unsigned char) as not aliasing uint32_t in this particular scenario.

[Bug target/92462] [arm32] -ftree-pre makes a variable to be wrongly hoisted out

2019-11-14 Thread aleksei.voity...@bell-sw.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92462

--- Comment #17 from Aleksei Voitylov  ---
(In reply to Andrew Pinski from comment #14)
> Have you tested gcc 7.5.0 that was just released?  How about gcc 8.x? Have
> you tried that.  There has been aliasing bugs in gcc before and this might
> already been fixed.

I just built ToT cross-gcc and the result is the same.

$ arm-linux-gnueabihf-gcc  --version
arm-linux-gnueabihf-gcc (GCC) 10.0.0 20191023 (experimental)
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[Bug target/92462] [arm32] -ftree-pre makes a variable to be wrongly hoisted out

2019-11-18 Thread aleksei.voity...@bell-sw.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92462

--- Comment #25 from Aleksei Voitylov  ---
(In reply to Richard Biener from comment #22)
> Fixed on trunk.  Can arm people verify?  I checked the DSE dump only.  Bonus
> if you manage to create a testcase for the testsuite failing before, passing
> now.
> 
> The patch is simple enough to backport if it works.

Unfortunately it still fails for arm32. Here is the relevant piece of rtl dump
for dse1:

**scanning insn=21
  mem: (plus:SI (reg/f:SI 125)
(reg/v:SI 118 [ offset ]))

   after canon_rtx address: (plus:SI (plus:SI (reg/f:SI 102 sfp)
(reg/v:SI 118 [ offset ]))
(const_int -8 [0xfff8]))

   after cselib_expand address: (plus:SI (minus:SI (reg/f:SI 102 sfp)
(reg:SI 116 [ _10 ]))
(const:SI (plus:SI (symbol_ref:SI ("*.LANCHOR0") [flags 0x182])
(const_int -8 [0xfff8]

   after canon_rtx address: (plus:SI (minus:SI (reg/f:SI 102 sfp)
(reg:SI 116 [ _10 ]))
(const:SI (plus:SI (symbol_ref:SI ("*.LANCHOR0") [flags 0x182])
(const_int -8 [0xfff8]
  varying cselib base=11:2037069969 offset = 0
 processing cselib store [0..1)
mems_found = 1, cannot_delete = false

**scanning insn=22
  mem: (plus:SI (reg/f:SI 102 sfp)
(const_int -8 [0xfff8]))

   after canon_rtx address: (plus:SI (reg/f:SI 102 sfp)
(const_int -8 [0xfff8]))
  gid=1 offset=-8
 processing const load gid=1[-8..-4)
trying to replace SImode load in insn 22 from SImode store in insn 16
deferring rescan insn with uid = 22.
deferring rescan insn with uid = 90.
 -- replaced the loaded MEM with (reg 135)