https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70823
            Bug ID: 70823
           Summary: x86_64: __atomic_fetch_and/or/xor() should perhaps use
                    BTR/BTS/BTC if they can
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dhowells at redhat dot com
  Target Milestone: ---

Created attachment 38347
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38347&action=edit
Test source

If given a mask that clears, sets or flips a single bit, and the result is
checked for just that bit and reduced to bool, then the __atomic_fetch_and,
_or and _xor functions should consider using BTR, BTS or BTC as appropriate.
So, something like:

        static __always_inline
        bool test_and_set_bit(unsigned bit, unsigned long *ptr)
        {
                unsigned long mask = 1UL << (bit & (BITS_PER_LONG - 1));
                unsigned long old;

                ptr += bit / BITS_PER_LONG;
                old = __atomic_fetch_or(ptr, mask, __ATOMIC_SEQ_CST);
                return old & mask;
        }

where the mask is constructed by 1UL << bitnr.

As things stand, the example above compiles to a CMPXCHG loop rather than a
BTS instruction:

   b:   89 f9                   mov    %edi,%ecx
   d:   ba 01 00 00 00          mov    $0x1,%edx
  12:   c1 ef 06                shr    $0x6,%edi
  15:   48 d3 e2                shl    %cl,%rdx
  18:   89 f9                   mov    %edi,%ecx
  1a:   48 8b 04 ce             mov    (%rsi,%rcx,8),%rax
  1e:   49 89 c0                mov    %rax,%r8
  21:   48 89 c7                mov    %rax,%rdi
  24:   49 09 d0                or     %rdx,%r8
  27:   f0 4c 0f b1 04 ce       lock cmpxchg %r8,(%rsi,%rcx,8)
  2d:   75 ef                   jne    1e <set_bit+0x13>
  2f:   48 85 fa                test   %rdi,%rdx
  32:   0f 95 c0                setne  %al
  35:   c3                      retq

Could we instead get something like:

        bts     %edi,(%rsi)
        setne   %al
        retq

See the attached test source, which should be compiled to a .s file.

This is the case for all of:

        gcc version 5.3.1 20151207 (Red Hat 5.3.1-2) (GCC)
        gcc version 6.0.0 20160219 (Red Hat Cross 6.0.0-0.1) (GCC)
        gcc version 4.8.5 20150623 (Red Hat 4.8.5-2.x) (GCC)
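For anyone wanting to reproduce this without the attachment, the following is a minimal self-contained sketch of the pattern described above, covering the set, clear and flip cases. It is not the attached test source. The BITS_PER_LONG definition, the helper names test_and_clear_bit and test_and_change_bit, and the bitops_self_check function are assumptions added for illustration; only test_and_set_bit follows the snippet in the report. Which instructions the compiler emits for it depends on the GCC version and options used.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumption: the report does not define BITS_PER_LONG; derive it here. */
#define BITS_PER_LONG ((unsigned)(sizeof(unsigned long) * CHAR_BIT))

/* Atomically set one bit; return whether it was already set. */
static inline bool test_and_set_bit(unsigned bit, unsigned long *ptr)
{
    unsigned long mask = 1UL << (bit & (BITS_PER_LONG - 1));

    ptr += bit / BITS_PER_LONG;
    return (__atomic_fetch_or(ptr, mask, __ATOMIC_SEQ_CST) & mask) != 0;
}

/* Hypothetical companion: atomically clear one bit; return its old value. */
static inline bool test_and_clear_bit(unsigned bit, unsigned long *ptr)
{
    unsigned long mask = 1UL << (bit & (BITS_PER_LONG - 1));

    ptr += bit / BITS_PER_LONG;
    return (__atomic_fetch_and(ptr, ~mask, __ATOMIC_SEQ_CST) & mask) != 0;
}

/* Hypothetical companion: atomically flip one bit; return its old value. */
static inline bool test_and_change_bit(unsigned bit, unsigned long *ptr)
{
    unsigned long mask = 1UL << (bit & (BITS_PER_LONG - 1));

    ptr += bit / BITS_PER_LONG;
    return (__atomic_fetch_xor(ptr, mask, __ATOMIC_SEQ_CST) & mask) != 0;
}

/* Illustrative check: exercise bit 3 of the second word; 0 means success. */
int bitops_self_check(void)
{
    unsigned long words[2] = { 0, 0 };
    unsigned bit = BITS_PER_LONG + 3;
    bool old;

    old = test_and_set_bit(bit, words);      /* bit was clear, is now set */
    assert(!old);
    old = test_and_set_bit(bit, words);      /* bit was already set */
    assert(old);
    assert(words[0] == 0 && words[1] == (1UL << 3));

    old = test_and_clear_bit(bit, words);    /* bit was set, is now clear */
    assert(old);
    old = test_and_change_bit(bit, words);   /* bit was clear, is now set */
    assert(!old);
    old = test_and_change_bit(bit, words);   /* bit was set, is now clear */
    assert(old);
    (void)old;

    return (words[0] == 0 && words[1] == 0) ? 0 : 1;
}
```

Compiling this with something like gcc -O2 -S and reading the generated .s file should show whether a given compiler uses a CMPXCHG loop or a single bit-test instruction for each helper.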