https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98737
            Bug ID: 98737
           Summary: Atomic operation on x86 no optimized to use flags
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: drepper.fsp+rhbz at gmail dot com
  Target Milestone: ---

Consider the following code:

long a;
_Bool f(long b)
{
  return __atomic_sub_fetch(&a, b, __ATOMIC_RELEASE) == 0;
}
_Bool g(long b)
{
  return (a -= b) == 0;
}

When compiling for x86-64 with the current HEAD as of 20210118 the resulting
code is:

0000000000000000 <f>:
   0:   48 f7 df                neg    %rdi
   3:   48 89 f8                mov    %rdi,%rax
   6:   f0 48 0f c1 05 00 00    lock xadd %rax,0x0(%rip)        # f <f+0xf>
   d:   00 00
   f:   48 01 f8                add    %rdi,%rax
  12:   0f 94 c0                sete   %al
  15:   c3                      retq
  16:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  1d:   00 00 00

0000000000000020 <g>:
  20:   48 29 3d 00 00 00 00    sub    %rdi,0x0(%rip)        # 27 <g+0x7>
  27:   0f 94 c0                sete   %al
  2a:   c3                      retq

The code for f is far too complicated.  All that needs to be different from the
code in g is that the lock prefix must be used for sub.

Probably all __atomic_* builtins have problems with using flags when possible.

This is not an esoteric problem.  I was specifically looking at optimizing the
std::latch implementation for C++20 and this is what would be needed.  Without
a fix a special version would be needed or the current, much worse code is
used.
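For readers who want to experiment with the pattern described above, here is a minimal sketch in C of a latch-style count-down built on the same builtin as f. The names latch_count, latch_count_down, latch_count_down_alt and latch_demo are hypothetical and not part of the report or of libstdc++. The second function expresses the same test through __atomic_fetch_sub, relying on the fact that the new value is zero exactly when the old value equals the amount subtracted. What either form compiles to depends on the compiler version, so this is only a starting point for comparing generated code, not a statement about what GCC emits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counter standing in for the internal count of a latch. */
long latch_count;

/* Same operation as f() in the report: subtract n atomically and
   report whether the resulting value is zero. */
bool latch_count_down(long n)
{
    return __atomic_sub_fetch(&latch_count, n, __ATOMIC_RELEASE) == 0;
}

/* Equivalent test written with fetch_sub: the builtin returns the OLD
   value, and old - n == 0 holds exactly when old == n. */
bool latch_count_down_alt(long n)
{
    return __atomic_fetch_sub(&latch_count, n, __ATOMIC_RELEASE) == n;
}

/* Single-threaded sanity check of the return values (illustration only;
   it does not exercise concurrency). */
void latch_demo(void)
{
    latch_count = 3;
    assert(!latch_count_down(1));     /* 3 -> 2, not yet zero */
    assert(!latch_count_down_alt(1)); /* 2 -> 1, not yet zero */
    assert(latch_count_down(1));      /* 1 -> 0, latch released */
}
```

Compiling this file with -O2 -S and comparing the output for latch_count_down against the non-atomic g from the report is one way to check whether a given compiler uses the flags set by the locked instruction.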