https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86314
            Bug ID: 86314
           Summary: GCC 7.x and 8.x zero out "eax" before using "rax" in
                    "lock bts"
           Product: gcc
           Version: 7.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: matthieum.147192 at gmail dot com
  Target Milestone: ---

The following C++ code:

using u64 = unsigned long long;

struct Bucket {
    u64 mLeaves[16] = {};
};

struct BucketMap {
    u64 acquire() noexcept {
        while (true) {
            u64 map = mData;

            u64 index = (map & 1) ? 1 : 0;
            auto mask = u64(1) << index;

            auto previous =
                __atomic_fetch_or(&mData, mask, __ATOMIC_SEQ_CST);
            if ((previous & mask) == 0) {
                return index;
            }
        }
    }

    __attribute__((noinline)) Bucket acquireBucket() noexcept {
        acquire();
        return Bucket();
    }

    volatile u64 mData = 1;
};

int main() {
    BucketMap map;
    map.acquireBucket();
    return 0;
}

Generates the following assembly code:

BucketMap::acquireBucket():
        mov     r8, rdi
        mov     rdx, rsi
.L2:
        mov     rax, QWORD PTR [rsi]
        xor     eax, eax
        lock bts        QWORD PTR [rdx], rax
        setc    al
        jc      .L2
        mov     rdi, r8
        mov     ecx, 16
        rep stosq
        mov     rax, r8
        ret
main:
        sub     rsp, 152
        lea     rsi, [rsp+8]
        lea     rdi, [rsp+16]
        mov     QWORD PTR [rsp+8], 1
        call    BucketMap::acquireBucket()
        xor     eax, eax
        add     rsp, 152
        ret

The problem is located in `.L2`:

1. `rax` is initialized with the value read from `*rsi`.
2. `eax` is zeroed out.
3. `rax` is used in `lock bts`; it is now `0` due to (2), resulting in an
   infinite loop.

The problem only occurs when the size of `Bucket::mLeaves` is >= 11, which
switches the zero-initialization of the array from multiple SSE instructions
to the `rep stosq` instruction.

The problem occurs both on my own machine with gcc 7.3.0:

$ gcc --version
gcc (GCC) 7.3.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

On godbolt (see below), it occurs with gcc 7.1, 7.2, 7.3 and 8.1.
It does not seem to occur with gcc 6.3, which uses a less optimal cmpxchg
instead of bts.

Simply specifying `-O3` triggers the issue, no further option necessary.

Godbolt: https://godbolt.org/g/bTBXv2
Stackoverflow: https://stackoverflow.com/q/51020541/147192