https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94994

            Bug ID: 94994
           Summary: [10 Regression] possible miscompilation of
                    word-at-a-time copy via packed structs
           Product: gcc
           Version: 10.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ebiggers3 at gmail dot com
  Target Milestone: ---

Starting with gcc 10, the following code (based on
https://github.com/ebiggers/libdeflate/blob/v1.5/lib/decompress_template.h#L353)
isn't compiled as expected at -O3 on x86_64:


#include <stddef.h>
#include <stdio.h>

#define WORDSIZE        sizeof(size_t)

struct word_unaligned {
        size_t x;
} __attribute__((packed, may_alias));

static inline size_t load_word_unaligned(const char *p)
{
        return ((struct word_unaligned *)p)->x;
}

static inline void store_word_unaligned(size_t x, char *p)
{
        ((struct word_unaligned *)p)->x = x;
}

void __attribute__((noinline))
copy(char *dst, const char *src, size_t word_count)
{
        do {
                store_word_unaligned(load_word_unaligned(src), dst);
                src += WORDSIZE;
                dst += WORDSIZE;
        } while (--word_count);
}

int main()
{
        char buf[9 + 6 * WORDSIZE + 1] = "012345678";

        copy(&buf[9], &buf[0], 6);

        puts(buf);
}


The code is supposed to copy 6 eight-byte words from &buf[0] to &buf[9], one
word at a time, resulting in the output
"012345678012345678012345678012345678012345678012345678012".  But the actual
output is "012345678012345678".  Based on the diassembly, this seems to be
caused by gcc assuming that the src and dst pointers are offset from each other
by a multiple of 8 bytes.  It uses this assumption to generate
16-byte-at-a-time SSE copy code.  But that's invalid when the pointers are
actually offset by 9 bytes as in the example.

Is this working as intended?  I don't think so, since the use of packed and
may_alias should make gcc assume that the pointers can have any alignment, and
each store can alias the next load.  But perhaps there's some reason I'm
missing why the code could nevertheless be considered incorrect, or perhaps
there's some ambiguity in what 'packed' is supposed to do (it's a gcc extension
after all).

I'm going to change my code to use memcpy() anyway, since the bugs where gcc
generated bad code for memcpy() in some cases have supposedly been fixed in
recent gcc's.  But I thought I'd point this out since I'm not sure it's working
as intended, and probably other people will run into it too.  Decompression
code is most likely to be affected by this.

gcc 9.3.0 works fine.  I didn't test anything in between that and 10.1.

Reply via email to