https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92462
Wilco <wilco at gcc dot gnu.org> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |wilco at gcc dot gnu.org --- Comment #1 from Wilco <wilco at gcc dot gnu.org> --- (In reply to Aleksei Voitylov from comment #0) > Created attachment 47212 [details] > reduced testcase from the openjdk sources > > While compiling openjdk sources I bumped into a bug: when -ftree-pre is > enabled the optimizer hoists out reload of a variable which subsequently > leads to the infinite loop situation. > > Below is the relevant piece of code and "new_value" is the variable that > gets hoisted out. > > template<typename T> > inline T Atomic::CmpxchgByteUsingInt::operator()(T exchange_value, > T volatile* dest, > T compare_value, > atomic_memory_order order) const { > printf ("Atomic::CmpxchgByteUsingInt::operator: 1: %d, 2: %d\n", > exchange_value, compare_value); > uint8_t canon_exchange_value = exchange_value; > uint8_t canon_compare_value = compare_value; > volatile uint32_t* aligned_dest > = reinterpret_cast<volatile uint32_t*>(align_down(dest, > sizeof(uint32_t))); > size_t offset = (uintptr_t)dest - (uintptr_t)aligned_dest; > uint32_t cur = *aligned_dest; > uint8_t* cur_as_bytes = reinterpret_cast<uint8_t*>(&cur); > cur_as_bytes[offset] = canon_compare_value; > do { > uint32_t new_value = cur; > reinterpret_cast<uint8_t*>(&new_value)[offset] = > canon_exchange_value; > printf ("Atomic::CmpxchgByteUsingInt::operator2: 1: %d, 2: %d\n", > new_value, cur); > uint32_t res = cmpxchg(new_value, aligned_dest, cur, order); > if (res == cur) break; > cur = res; > } while (cur_as_bytes[offset] == canon_compare_value); > return PrimitiveConversions::cast<T>(cur_as_bytes[offset]); > } > > $ g++ -O1 -ftree-pre t.cpp > $ ./a.out > Atomic::CmpxchgByteUsingInt::operator: 1: 0, 2: 1 > Atomic::CmpxchgByteUsingInt::operator2: 1: 0, 2: 0 > > $ g++ -O1 t.cpp > $ ./a.out > Atomic::CmpxchgByteUsingInt::operator: 1: 0, 2: 1 > Atomic::CmpxchgByteUsingInt::operator2: 1: 0, 2: 256 > > Below is the assembler of the loop for the correct version: > > .L7: > ldr r4, [sp] > str r4, [sp, #4] > strb r7, [r5, #-4] > mov r2, r4 > ldr r1, [sp, #4] > mov r0, r6 > bl printf > cbz r4, .L6 > movs r3, #0 > str r3, [sp] > ldrb r3, [r8] @ zero_extendqisi2 > cmp r3, #1 > beq .L7 > > and for the incorrect one: > > .L7: > str r4, [sp, #4] > strb r8, [r6] > mov r2, r4 > ldr r1, [sp, #4] > mov r0, r5 > bl printf > cbz r4, .L6 > movs r4, #0 > str r4, [sp] > ldrb r3, [r7] @ zero_extendqisi2 > cmp r3, #1 > beq .L7 There are serious aliasing bugs in the source - GCC is quite correct in assuming that cur and cur_as_bytes[offset] never alias (obviously) and even optimize away the cmpxchg (no idea why, that appears wrong). Even if you fix the aliasing bugs, it won't emulate a byte-oriented cmpxchg correctly, there are bugs in the logic too. So I suggest to go back to the drawing board - you can't hack your own atomic operations and just hope for the best. GCC supports a standard set of atomic operations for a good reason!