https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244
--- Comment #14 from dhowells at redhat dot com <dhowells at redhat dot com> ---
Okay, I built and booted an x86_64 kernel that had the XXX_bit() and
test_and_XXX_bit() ops altered to use the __atomic_fetch_YYY() built-ins. The
core kernel's .text segment ended up ~8K larger. Examining ext4_resize_begin()
as an example, this statement:
    if (test_and_set_bit_lock(EXT4_RESIZING, &EXT4_SB(sb)->s_resize_flags))
            ret = -EBUSY;
looks like this in the unpatched kernel:
    0xffffffff812169f3 <+122>: lock btsl $0x0,0x3b8(%rax)
    0xffffffff812169fc <+131>: jb 0xffffffff81216a02
    0xffffffff812169fe <+133>: xor %edx,%edx
    0xffffffff81216a00 <+135>: jmp 0xffffffff81216a07
    0xffffffff81216a02 <+137>: mov $0xfffffff0,%edx
    0xffffffff81216a07 <+142>: mov %edx,%eax
and like this in the patched kernel:
    0xffffffff81217414 <+122>: xor %edx,%edx
    0xffffffff81217416 <+124>: lock btsq $0x0,0x3b8(%rax)
    0xffffffff81217420 <+134>: setb %dl
    0xffffffff81217423 <+137>: neg %edx
    0xffffffff81217425 <+139>: and $0xfffffff0,%edx
    0xffffffff81217428 <+142>: mov %edx,%eax
So it looks good here, at least :-)
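For reference, a __atomic_fetch_or()-based test_and_set_bit_lock() would look
roughly like this (a minimal sketch, not the actual patch; the signature and
the ACQUIRE ordering are only my approximation):

    /* Minimal sketch only - not the actual kernel patch.  Assumes x86_64. */
    #define BITS_PER_LONG (8 * sizeof(long))   /* 64 on x86_64 */

    static inline int test_and_set_bit_lock(long nr, volatile unsigned long *addr)
    {
            unsigned long mask = 1UL << (nr & (BITS_PER_LONG - 1));
            unsigned long old;

            addr += nr / BITS_PER_LONG;
            /* Atomically set the bit and fetch the old word; ACQUIRE gives
             * the lock-acquisition ordering the _lock variant implies. */
            old = __atomic_fetch_or(addr, mask, __ATOMIC_ACQUIRE);
            return (old & mask) != 0;
    }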
This also suggests there's an error in the current x86_64 kernel
implementation: the kernel bitops are supposed to operate on machine-word-sized
locations, so it should be using BTSQ, not BTSL. With that corrected, the
__atomic_fetch_or() variant would come out a byte shorter than the kernel's
inline asm - and it involves no conditional jumps.