[Bug c++/83780] New: False positive alignment error with -fsanitize=undefined with virtual base

2018-01-10 Thread securesneakers at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83780

Bug ID: 83780
   Summary: False positive alignment error with
-fsanitize=undefined with virtual base
   Product: gcc
   Version: 7.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: securesneakers at gmail dot com
  Target Milestone: ---

Created attachment 43091
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43091&action=edit
Minimal example that reproduces the issue

The attached program generates false misalignment errors when compiled with
-fsanitize=undefined:

$ g++ --version
g++ (GCC) 7.2.1 20171224

$ uname -s -m
Linux x86_64

$ g++ -std=c++11 -O2 -fsanitize=undefined minimal.cpp && ./a.out
minimal.cpp:9:8: runtime error: constructor call on misaligned address
0x7ffdd1e1e658 for type 'struct Base2', which requires 16 byte alignment

The attached example contains the following hierarchy:

struct alignas(16) Base1 { };
struct Base2 : virtual Base1 { };
struct Base3 : virtual Base2 { };

alignof(Base2) is set to 16 due to the alignment of its base class. But when
Base3 is instantiated, Base2 is placed with an alignment of 8, as it should be
according to the Itanium C++ ABI (because its non-virtual alignment is 8):
https://refspecs.linuxfoundation.org/cxxabi-1.75.html#class-types. Yet the
sanitizer complains that the address is not 16-byte aligned.

It seems that the sanitizer checks the address against the "normal" alignment
when the "non-virtual alignment" should be used.
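
The gap between the two alignments can be probed with a small sketch. This is
not the attached program: Base1 and Base2 are taken from the report, while Pad,
Derived and base2_offset_in_derived() are hypothetical names added here so that
the virtual Base2 subobject cannot sit at offset 0. On an Itanium-ABI target I
would expect the returned offset not to be a multiple of 16, but the sketch
does not rely on that and only checks what the language guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct alignas(16) Base1 { };
struct Base2 : virtual Base1 { };

// Hypothetical padding class (not in the report): a vptr plus two longs,
// so that the virtual Base2 subobject is pushed away from offset 0.
struct Pad {
    virtual ~Pad() = default;
    long a = 0;
    long b = 0;
};

// Hypothetical most-derived class used only for this probe.
struct Derived : Pad, virtual Base2 { };

// alignof() reports the complete-object alignment, which has to account
// for the virtual base Base1.
static_assert(alignof(Base1) == 16, "Base1 is declared alignas(16)");
static_assert(alignof(Base2) == 16, "a complete Base2 contains a Base1");

// Returns the offset of the Base2 subobject inside a complete Derived object.
std::size_t base2_offset_in_derived() {
    Derived d;
    auto whole = reinterpret_cast<std::uintptr_t>(&d);
    auto part = reinterpret_cast<std::uintptr_t>(static_cast<Base2*>(&d));
    assert(part >= whole);
    return static_cast<std::size_t>(part - whole);
}
```

Printing base2_offset_in_derived() % 16 next to alignof(Base2) shows whether
the subobject is placed using the smaller non-virtual alignment on a given
compiler and target.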

[Bug sanitizer/83780] False positive alignment error with -fsanitize=undefined with virtual base

2018-01-11 Thread securesneakers at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83780

--- Comment #2 from Ivan Bodrov  ---
I have reported the same bug for Clang:
https://bugs.llvm.org/show_bug.cgi?id=35902

Unlike GCC, Clang also eagerly generates "movaps" instructions for these
misaligned addresses, which crashes the program. AFAIK GCC does not generate
SSE instructions as often, but I wonder whether it can do something similar.

Clang's unaligned movaps bug: https://bugs.llvm.org/show_bug.cgi?id=35901

[Bug target/110184] [x86] Missed optimisation: atomic operations should use PF, ZF and SF

2023-11-19 Thread securesneakers at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110184

Ivan Bodrov  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |securesneakers at gmail dot com

--- Comment #2 from Ivan Bodrov  ---
This seems to have been implemented, at least for __atomic_fetch_and, but the
optimization is very fragile and fails when the "lock and" operand and the mask
used in the check come from separate literals:

$ cat fragile-fetch-and.c
void slowpath(unsigned long *p);
void func_bad(unsigned long *p)
{
    if (__atomic_fetch_and(p, ~1UL, __ATOMIC_RELAXED) & ~1UL)
        slowpath(p);
}
void func_good(unsigned long *p)
{
    unsigned long mask = ~1UL;
    if (__atomic_fetch_and(p, mask, __ATOMIC_RELAXED) & mask)
        slowpath(p);
}

Compiling this, we can see that even though the two functions are equivalent,
the first one wasn't optimized:

$ gcc --version
gcc (GCC) 13.2.1 20230801
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ uname -s -m
Linux x86_64
$ gcc -O2 -c fragile-fetch-and.c 
$ objdump -d fragile-fetch-and.o

fragile-fetch-and.o: file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <func_bad>:
   0:   48 8b 07                mov    (%rdi),%rax
   3:   48 89 c1                mov    %rax,%rcx
   6:   48 89 c2                mov    %rax,%rdx
   9:   48 83 e1 fe             and    $0xfffffffffffffffe,%rcx
   d:   f0 48 0f b1 0f          lock cmpxchg %rcx,(%rdi)
  12:   75 ef                   jne    3 <func_bad+0x3>
  14:   48 83 fa 01             cmp    $0x1,%rdx
  18:   77 06                   ja     20 <func_bad+0x20>
  1a:   c3                      ret
  1b:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  20:   e9 00 00 00 00          jmp    25 <func_bad+0x25>
  25:   66 66 2e 0f 1f 84 00    data16 cs nopw 0x0(%rax,%rax,1)
  2c:   00 00 00 00

0000000000000030 <func_good>:
  30:   f0 48 83 27 fe          lock andq $0xfffffffffffffffe,(%rdi)
  35:   75 09                   jne    40 <func_good+0x10>
  37:   c3                      ret
  38:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  3f:   00
  40:   e9 00 00 00 00          jmp    45 <func_good+0x15>

[Bug target/110184] [x86] Missed optimisation: atomic operations should use PF, ZF and SF

2023-11-19 Thread securesneakers at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110184

--- Comment #3 from Ivan Bodrov  ---
Created attachment 56646
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56646&action=edit
Fails to apply optimization for __atomic_fetch_and ZF-flag with separate
literals

[Bug rtl-optimization/115802] New: Non-atomic load of static variable moved out of loop despite atomic fences

2024-07-05 Thread securesneakers at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115802

Bug ID: 115802
   Summary: Non-atomic load of static variable moved out of loop
despite atomic fences
   Product: gcc
   Version: 14.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: securesneakers at gmail dot com
  Target Milestone: ---

Created attachment 58595
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58595&action=edit
Minimal program to reproduce the issue

The attached program contains:
- A toy mutex (spinlock)
- A toy condition variable
- A non-atomic _static_ flag, protected by the mutex
- A reader thread that waits for the flag to be set using mutex and condition
variable
- A writer thread that sets the flag and notifies the condition variable.

The program can be compiled and run, but will hang:

$ gcc -std=c11 -O2 minimal-executable.c
$ ./a.out

This is because the waiting loop:

mutex_lock(&mtx);
while (!val)
    cond_wait(&cnd, &mtx);
mutex_unlock(&mtx);

has been optimized into an infinite loop:

$ objdump -d a.out
...
    10b0:   e8 7b 01 00 00          call   1230
    10b5:   eb f9                   jmp    10b0

Such a transformation means that the non-atomic load of "val" has been moved
before the "memory_order_seq_cst" load in the "mutex_lock" function. Making the
flag non-static or letting its address escape "fixes" it.

I am using GCC 14.1.1, but this is reproducible with all versions since at
least 4.9.2. I have noticed that Clang has a similar issue, but only since
Clang 13.

[Bug middle-end/115802] Non-atomic load of static variable moved out of loop despite atomic fences

2024-07-05 Thread securesneakers at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115802

--- Comment #3 from Ivan Bodrov  ---
Yes, but the "wait" operation is not supposed to touch anything.

The example shows typical usage of a condition variable, except that "real"
code usually accesses data through some pointer rather than a static variable,
or links dynamically to the mutex/condvar code.

Once everything is linked statically and LTO is enabled, this transformation
breaks it. Not a common case, but the code seems correct to me.

[Bug middle-end/115802] Non-atomic load of static variable moved out of loop despite atomic fences

2024-07-06 Thread securesneakers at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115802

--- Comment #5 from Ivan Bodrov  ---
Without the mutex, the threads would race on the first access to the
non-atomic variable.

Seq-Cst ordering is only used to simplify the example; it can be relaxed to
acquire/release/relaxed for the different operations. I didn't want people to
spend too much time thinking about it.

I guess an even shorter demonstration would use inline asm with a memory
clobber:

while (!val)
   __asm__ volatile ("":::"memory");

The above forces GCC to re-load "val" on every iteration and the code compiles
to:

.L8:
        mov     edx, DWORD PTR val[rip]
        test    edx, edx
        je      .L8

But if the same fence is placed inside a function, it no longer has any effect:

__attribute__((__noinline__)) void fence(void) { __asm__ volatile ("":::"memory"); }
...
while (!val)
    fence();

Compiles to:

.L8:
        call    func
        jmp     .L8
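
For reference, the two variants can be put side by side in one file. This is a
sketch written for this comment, not the attachment: set_val(),
wait_inline_clobber(), wait_call_fence() and smoke_check() are hypothetical
names, and the file is meant for comparing code generation (for example with
-O2 -S), not for demonstrating a correct wait loop.

```cpp
#include <cassert>

static int val;  // non-atomic static flag, as in the report

// Hypothetical setter, so that the flag is written somewhere in this file.
void set_val(int v) { val = v; }

// Compiler-level fence wrapped in a function that is visible but not inlined.
__attribute__((__noinline__)) void fence(void) { __asm__ volatile("" ::: "memory"); }

// Variant 1: the memory clobber sits directly in the loop body.
void wait_inline_clobber() {
    while (!val)
        __asm__ volatile("" ::: "memory");
}

// Variant 2: the same clobber, reached through a call to fence().
void wait_call_fence() {
    while (!val)
        fence();
}

// Single-threaded smoke check: with the flag already set, both loops return
// immediately regardless of how the load of val was compiled.
int smoke_check() {
    set_val(1);
    wait_inline_clobber();
    wait_call_fence();
    assert(val == 1);
    return val;
}
```

Comparing the assembly of wait_inline_clobber() and wait_call_fence() is the
point of the sketch; smoke_check() only confirms that the file builds and that
neither loop spins when the flag is already set.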

[Bug middle-end/115802] Non-atomic load of static variable moved out of loop despite atomic fences

2024-07-06 Thread securesneakers at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115802

--- Comment #6 from Ivan Bodrov  ---
Created attachment 58597
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58597&action=edit
Smaller, but non runnable example.

Full demo code for the above (can't be run, only shows code generation). Can be
compiled as:

$ gcc -O2 -c codegen-demo.c

[Bug middle-end/115802] Non-atomic load of static variable moved out of loop despite atomic fences

2024-07-06 Thread securesneakers at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115802

--- Comment #8 from Ivan Bodrov  ---
The 2nd example is only intended to show how code generation changes after the
fence is moved into a function that is visible but not inlined, which is the
cause of this issue. The code is not supposed to be correct.

This is why the original example has a complete mutex + condvar.

[Bug middle-end/115802] Non-atomic load of static variable moved out of loop despite atomic fences

2024-07-06 Thread securesneakers at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115802

--- Comment #9 from Ivan Bodrov  ---
Created attachment 58598
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58598&action=edit
Minimal program to reproduce the issue (no condvar)

To further simplify the original code, the condition variable can be excluded
and the waiter code can be replaced with unlock-then-lock:

mutex_lock(&mtx);
while (!val) {
    mutex_unlock(&mtx);
    mutex_lock(&mtx);
}
mutex_unlock(&mtx);

This might be a better demonstration. It hangs the same way, for the same
reason:

$ gcc -O2 minimal-executable-no-cond.c
$ ./a.out
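
For readers without the attachment, the shape of the mutex-only variant is
roughly the following. This is a sketch under assumptions, not the attached
file: the toy_mutex type, the exchange-based spinlock and the noinline
attributes are my guesses at a minimal equivalent, and writer() and reader()
are hypothetical names. It is written in C++ with std::atomic, whereas the
attachment is C.

```cpp
#include <atomic>
#include <cassert>

// Toy spinlock (assumed shape; the attachment's mutex is not shown here).
struct toy_mutex {
    std::atomic<bool> locked{false};
};

static toy_mutex mtx;
static int val;  // non-atomic static flag, protected by mtx

// Kept out of line on purpose, so the atomic operations sit behind a call.
__attribute__((__noinline__)) void mutex_lock(toy_mutex* m) {
    // Spin until the previous value was "unlocked".
    while (m->locked.exchange(true, std::memory_order_seq_cst)) {
    }
}

__attribute__((__noinline__)) void mutex_unlock(toy_mutex* m) {
    m->locked.store(false, std::memory_order_seq_cst);
}

// Writer side: sets the flag while holding the lock.
void writer() {
    mutex_lock(&mtx);
    val = 1;
    mutex_unlock(&mtx);
}

// Reader side: the unlock-then-lock wait loop quoted above.
int reader() {
    mutex_lock(&mtx);
    while (!val) {
        mutex_unlock(&mtx);
        mutex_lock(&mtx);
    }
    int seen = val;
    assert(seen != 0);  // leaving the loop implies the flag was observed
    mutex_unlock(&mtx);
    return seen;
}
```

Calling writer() and then reader() on a single thread returns immediately. The
hang described in this report concerns the case where reader() starts first on
one thread and writer() runs later on another; whether this particular sketch
reproduces it may depend on the compiler version and options.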