https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94994
Bug ID: 94994 Summary: [10 Regression] possible miscompilation of word-at-a-time copy via packed structs Product: gcc Version: 10.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: ebiggers3 at gmail dot com Target Milestone: --- Starting with gcc 10, the following code (based on https://github.com/ebiggers/libdeflate/blob/v1.5/lib/decompress_template.h#L353) isn't compiled as expected at -O3 on x86_64: #include <stddef.h> #include <stdio.h> #define WORDSIZE sizeof(size_t) struct word_unaligned { size_t x; } __attribute__((packed, may_alias)); static inline size_t load_word_unaligned(const char *p) { return ((struct word_unaligned *)p)->x; } static inline void store_word_unaligned(size_t x, char *p) { ((struct word_unaligned *)p)->x = x; } void __attribute__((noinline)) copy(char *dst, const char *src, size_t word_count) { do { store_word_unaligned(load_word_unaligned(src), dst); src += WORDSIZE; dst += WORDSIZE; } while (--word_count); } int main() { char buf[9 + 6 * WORDSIZE + 1] = "012345678"; copy(&buf[9], &buf[0], 6); puts(buf); } The code is supposed to copy 6 eight-byte words from &buf[0] to &buf[9], one word at a time, resulting in the output "012345678012345678012345678012345678012345678012345678012". But the actual output is "012345678012345678". Based on the diassembly, this seems to be caused by gcc assuming that the src and dst pointers are offset from each other by a multiple of 8 bytes. It uses this assumption to generate 16-byte-at-a-time SSE copy code. But that's invalid when the pointers are actually offset by 9 bytes as in the example. Is this working as intended? I don't think so, since the use of packed and may_alias should make gcc assume that the pointers can have any alignment, and each store can alias the next load. But perhaps there's some reason I'm missing why the code could nevertheless be considered incorrect, or perhaps there's some ambiguity in what 'packed' is supposed to do (it's a gcc extension after all). I'm going to change my code to use memcpy() anyway, since the bugs where gcc generated bad code for memcpy() in some cases have supposedly been fixed in recent gcc's. But I thought I'd point this out since I'm not sure it's working as intended, and probably other people will run into it too. Decompression code is most likely to be affected by this. gcc 9.3.0 works fine. I didn't test anything in between that and 10.1.