[Bug c++/83780] New: False positive alignment error with -fsanitize=undefined with virtual base
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83780

            Bug ID: 83780
           Summary: False positive alignment error with -fsanitize=undefined with virtual base
           Product: gcc
           Version: 7.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: securesneakers at gmail dot com
  Target Milestone: ---

Created attachment 43091
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43091&action=edit
Minimal example that reproduces the issue

The attached program generates false misalignment errors when compiled with
-fsanitize=undefined:

$ g++ --version
g++ (GCC) 7.2.1 20171224
$ uname -s -m
Linux x86_64
$ g++ -std=c++11 -O2 -fsanitize=undefined minimal.cpp && ./a.out
minimal.cpp:9:8: runtime error: constructor call on misaligned address 0x7ffdd1e1e658 for type 'struct Base2', which requires 16 byte alignment

The attached example contains the following hierarchy:

    struct alignas(16) Base1 { };
    struct Base2 : virtual Base1 { };
    struct Base3 : virtual Base2 { };

alignof(Base2) is set to 16 due to the alignment of its base class. But when
Base3 is instantiated, Base2 is placed with an alignment of 8, as it should be
according to the Itanium C++ ABI (because its non-virtual alignment is 8):
https://refspecs.linuxfoundation.org/cxxabi-1.75.html#class-types

Yet the sanitizer complains about the alignment not being 16. It seems that
the sanitizer checks the address using the "normal" alignment when the
"non-virtual alignment" should be used.
[Bug sanitizer/83780] False positive alignment error with -fsanitize=undefined with virtual base
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83780

--- Comment #2 from Ivan Bodrov ---
I have reported the same bug for Clang:
https://bugs.llvm.org/show_bug.cgi?id=35902

Unlike GCC, Clang is also eager to generate unaligned "movaps" instructions,
crashing the program. AFAIK GCC does not generate SSE instructions that often,
but I wonder if it can do something similar.

Clang's unaligned movaps bug: https://bugs.llvm.org/show_bug.cgi?id=35901
[Bug target/110184] [x86] Missed optimisation: atomic operations should use PF, ZF and SF
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110184

Ivan Bodrov changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |securesneakers at gmail dot com

--- Comment #2 from Ivan Bodrov ---
This seems to have been implemented, at least for __atomic_fetch_and, but the
optimization is very fragile and fails when the "lock and" value and the mask
used during checking come from separate literals:

$ cat fragile-fetch-and.c
    void slowpath(unsigned long *p);

    void func_bad(unsigned long *p)
    {
        if (__atomic_fetch_and(p, ~1UL, __ATOMIC_RELAXED) & ~1UL)
            slowpath(p);
    }

    void func_good(unsigned long *p)
    {
        unsigned long mask = ~1UL;
        if (__atomic_fetch_and(p, mask, __ATOMIC_RELAXED) & mask)
            slowpath(p);
    }

Compiling this, we can see that even though the functions are the same, the
first one wasn't optimized:

$ gcc --version
gcc (GCC) 13.2.1 20230801
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ uname -s -m
Linux x86_64
$ gcc -O2 -c fragile-fetch-and.c
$ objdump -d fragile-fetch-and.o

fragile-fetch-and.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <func_bad>:
   0:   48 8b 07                mov    (%rdi),%rax
   3:   48 89 c1                mov    %rax,%rcx
   6:   48 89 c2                mov    %rax,%rdx
   9:   48 83 e1 fe             and    $0xfffffffffffffffe,%rcx
   d:   f0 48 0f b1 0f          lock cmpxchg %rcx,(%rdi)
  12:   75 ef                   jne    3 <func_bad+0x3>
  14:   48 83 fa 01             cmp    $0x1,%rdx
  18:   77 06                   ja     20 <func_bad+0x20>
  1a:   c3                      ret
  1b:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  20:   e9 00 00 00 00          jmp    25 <func_bad+0x25>
  25:   66 66 2e 0f 1f 84 00    data16 cs nopw 0x0(%rax,%rax,1)
  2c:   00 00 00 00

0000000000000030 <func_good>:
  30:   f0 48 83 27 fe          lock andq $0xfffffffffffffffe,(%rdi)
  35:   75 09                   jne    40 <func_good+0x10>
  37:   c3                      ret
  38:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  3f:   00
  40:   e9 00 00 00 00          jmp    45 <func_good+0x15>
[Bug target/110184] [x86] Missed optimisation: atomic operations should use PF, ZF and SF
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110184

--- Comment #3 from Ivan Bodrov ---
Created attachment 56646
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56646&action=edit
Fails to apply optimization for __atomic_fetch_and ZF-flag with separate literals
[Bug rtl-optimization/115802] New: Non-atomic load of static variable moved out of loop despite atomic fences
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115802

            Bug ID: 115802
           Summary: Non-atomic load of static variable moved out of loop despite atomic fences
           Product: gcc
           Version: 14.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: securesneakers at gmail dot com
  Target Milestone: ---

Created attachment 58595
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58595&action=edit
Minimal program to reproduce the issue

The attached program contains:
- A toy mutex (spinlock)
- A toy condition variable
- A non-atomic _static_ flag, protected by the mutex
- A reader thread that waits for the flag to be set using the mutex and the
  condition variable
- A writer thread that sets the flag and notifies the condition variable.

The program can be compiled and run, but will hang:

$ gcc -std=c11 -O2 minimal-executable.c
$ ./a.out

Because the waiting loop:

    mutex_lock(&mtx);
    while (!val)
        cond_wait(&cnd, &mtx);
    mutex_unlock(&mtx);

Has been optimized into an infinite loop:

$ objdump -d a.out
...
    10b0:   e8 7b 01 00 00          call   1230
    10b5:   eb f9                   jmp    10b0

Such a transformation means that the non-atomic load of "val" has been moved
before the "memory_order_seq_cst" load of the "mutex_lock" function.

Making the flag non-static or letting its address escape "fixes" it.

I am using GCC 14.1.1, but this is reproducible for all versions since at
least 4.9.2. I have noticed that Clang shares a similar issue, but only since
Clang 13.
[Bug middle-end/115802] Non-atomic load of static variable moved out of loop despite atomic fences
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115802

--- Comment #3 from Ivan Bodrov ---
Yes, but "wait" operation is not supposed to touch anything. The example shows
typical usage of condition variable, except "real" code usually accesses data
through some pointer, not as a static variable. Or links dynamically to
mutex/condvar code. Once everything is linked statically and LTO is enabled,
this transformation breaks it. Not a common case, but the code seems correct
to me.
[Bug middle-end/115802] Non-atomic load of static variable moved out of loop despite atomic fences
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115802

--- Comment #5 from Ivan Bodrov ---
Without the mutex, threads would race for the first access to the non-atomic
variable. Seq-Cst ordering is only used to simplify the example; it can be
relaxed to acquire/release/relaxed for different operations. I didn't want
people to spend too much time thinking about it.

I guess an even shorter demonstration would be using inline asm with memory
clobbering:

    while (!val)
        __asm__ volatile ("":::"memory");

The above forces GCC to re-load "val" on every iteration and the code compiles
to:

    .L8:
            mov     edx, DWORD PTR val[rip]
            test    edx, edx
            je      .L8

But if the same fence is within the function, it won't have any effect:

    __attribute__((__noinline__))
    void fence(void)
    {
        __asm__ volatile ("":::"memory");
    }

    ...

    while (!val)
        fence();

Compiles to:

    .L8:
            call    func
            jmp     .L8
[Bug middle-end/115802] Non-atomic load of static variable moved out of loop despite atomic fences
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115802

--- Comment #6 from Ivan Bodrov ---
Created attachment 58597
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58597&action=edit
Smaller, but non runnable example.

Full demo code for the above (can't be run, only shows code generation). Can
be compiled as:

$ gcc -O2 -c codegen-demo.c
[Bug middle-end/115802] Non-atomic load of static variable moved out of loop despite atomic fences
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115802

--- Comment #8 from Ivan Bodrov ---
The 2nd example is only intended to show changes in code generation after the
fence is moved to the function that is visible, but is not inlined, which is
the cause of this issue. The code is not supposed to be correct. This is why
the original example has a complete mutex + condvar.
[Bug middle-end/115802] Non-atomic load of static variable moved out of loop despite atomic fences
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115802

--- Comment #9 from Ivan Bodrov ---
Created attachment 58598
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58598&action=edit
Minimal program to reproduce the issue (no condvar)

To further simplify the original code, the condition variable can be excluded
and the waiter code can be replaced with unlock-then-lock:

    mutex_lock(&mtx);
    while (!val) {
        mutex_unlock(&mtx);
        mutex_lock(&mtx);
    }
    mutex_unlock(&mtx);

This might be a better demonstration. Will hang the same way for the same
reason:

$ gcc -O2 minimal-executable-no-cond.c
$ ./a.out