https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86314

            Bug ID: 86314
           Summary: GCC 7.x and 8.x zero out "eax" before using "rax" in
                    "lock bts"
           Product: gcc
           Version: 7.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: matthieum.147192 at gmail dot com
  Target Milestone: ---

The following C++ code:

    using u64 = unsigned long long;

    struct Bucket {
        u64 mLeaves[16] = {};
    };

    struct BucketMap {
        u64 acquire() noexcept {
            while (true) {
                u64 map = mData;

                u64 index = (map & 1) ? 1 : 0;
                auto mask = u64(1) << index;

                auto previous =
                    __atomic_fetch_or(&mData, mask, __ATOMIC_SEQ_CST);
                if ((previous & mask) == 0) {
                    return index;
                }
            }
        }

        __attribute__((noinline)) Bucket acquireBucket() noexcept {
            acquire();
            return Bucket();
        }

        volatile u64 mData = 1;
    };

    int main() {
        BucketMap map;
        map.acquireBucket();
        return 0;
    }

Generates the following assembly code:

    BucketMap::acquireBucket():
        mov r8, rdi
        mov rdx, rsi

    .L2:
        mov rax, QWORD PTR [rsi]
        xor eax, eax
        lock bts QWORD PTR [rdx], rax
        setc al
        jc .L2
        mov rdi, r8
        mov ecx, 16
        rep stosq
        mov rax, r8
        ret

    main:
        sub rsp, 152
        lea rsi, [rsp+8]
        lea rdi, [rsp+16]
        mov QWORD PTR [rsp+8], 1
        call BucketMap::acquireBucket()
        xor eax, eax
        add rsp, 152
        ret

The problem is located in `.L2`:

 1. `rax` is initialized with the value read from `*rsi`.
 2. `eax` is zeroed out.
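 3. `rax` is used as the bit index in `lock bts`; it is now `0` due to (2), so
    the instruction tests bit 0 of `mData`, which is already set. The carry flag
    is therefore always set and `jc .L2` loops forever.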
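
To make the data flow concrete, here is a minimal, non-atomic sketch that models one loop iteration in plain C++. The helper names (`bts_model`, `intended_iteration`, `miscompiled_iteration`) are hypothetical and exist only for illustration; the model assumes `lock bts` sets the selected bit and reports its previous value through the carry flag.

```cpp
using u64 = unsigned long long;

// Hypothetical model of `lock bts`: sets bit `index` of `word` and returns the
// bit's previous value (what the CPU would leave in the carry flag).
// Not atomic; it only illustrates the data flow.
bool bts_model(u64& word, u64 index) {
    bool carry = ((word >> index) & 1) != 0;
    word |= u64(1) << index;
    return carry;
}

// One iteration of acquire() as the source intends: the bit index is derived
// from the value just loaded. Returns true when the loop would retry.
bool intended_iteration(u64& data) {
    u64 index = (data & 1) ? 1 : 0;
    return bts_model(data, index);
}

// One iteration as the generated `.L2` block behaves: the loaded value is
// discarded and the index register is zeroed before the bit test.
bool miscompiled_iteration(u64& data) {
    u64 index = data;  // mov rax, QWORD PTR [rsi]
    index = 0;         // xor eax, eax
    return bts_model(data, index);
}
```

Starting from `mData == 1`, `intended_iteration` picks index 1, finds that bit clear and stops retrying, whereas `miscompiled_iteration` always tests bit 0, which is already set, so it reports "retry" on every call.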

The problem only occurs when the size of `Bucket::mLeaves` is >= 11, which
switches the zero-initialization of the array from multiple SSE instructions to
the `rep stosq` instruction.

The problem occurs on my own machine with gcc 7.3.0:

    $ gcc --version
    gcc (GCC) 7.3.0
    Copyright (C) 2017 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions. There is 
    NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR 
    PURPOSE.

On godbolt (see below), it also occurs with gcc 7.1, 7.2, 7.3 and 8.1. It does
not seem to occur with gcc 6.3, which uses a less optimal `cmpxchg` instead of
`bts`.
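
For comparison, an atomic fetch-or can be written as an explicit compare-exchange retry loop. The sketch below uses the GCC `__atomic_load_n` and `__atomic_compare_exchange_n` builtins; the helper name `fetch_or_via_cas` is hypothetical. It is not claimed to match what gcc 6.3 emits, and whether it sidesteps the miscompilation on the affected versions has not been verified here.

```cpp
using u64 = unsigned long long;

// Hypothetical helper: returns the same result as
// __atomic_fetch_or(ptr, mask, __ATOMIC_SEQ_CST), expressed as a
// compare-exchange retry loop.
u64 fetch_or_via_cas(volatile u64* ptr, u64 mask) noexcept {
    u64 expected = __atomic_load_n(ptr, __ATOMIC_SEQ_CST);
    while (!__atomic_compare_exchange_n(ptr, &expected, expected | mask,
                                        /*weak=*/false,
                                        __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST)) {
        // On failure the builtin stores the current value of *ptr into
        // `expected`, so the next attempt recomputes `expected | mask`.
    }
    return expected;  // value observed immediately before the OR took effect
}
```

In `acquire()`, the call `__atomic_fetch_or(&mData, mask, __ATOMIC_SEQ_CST)` could be swapped for `fetch_or_via_cas(&mData, mask)` to experiment with this form.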

Specifying `-O3` alone triggers the issue; no further options are necessary.

Godbolt: https://godbolt.org/g/bTBXv2

Stackoverflow: https://stackoverflow.com/q/51020541/147192
