https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108498
Jakub Jelinek <jakub at gcc dot gnu.org> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |ebotcazou at gcc dot gnu.org --- Comment #19 from Jakub Jelinek <jakub at gcc dot gnu.org> --- So, the problem is the separate string_concatenate mode introduced in that r11-1810-ge362a897655e3b92949b65a2b53e00fb3ab8ded0 commit. This mode is turned on whenever at least one of the stores is using a STRING_CST and is successfully merged with the other stores. Now, this mode assumes that the start of the store is on a byte boundary and that the store ends on a byte boundary too, it simply creates a character array with try_size / BITS_PER_UNIT and ignores any masking. It is true that merged_store_group::can_be_merged_into does various checks, if string_concatenation mode is on, it will reject merging with MEM_REF, etc., plus /* STRING_CST is compatible with INTEGER_CST if no BIT_INSERT_EXPR. */ if (info->rhs_code == STRING_CST && stores[0]->rhs_code == INTEGER_CST && stores[0]->bitsize == CHAR_BIT) return !bit_insertion; if (stores[0]->rhs_code == STRING_CST && info->rhs_code == INTEGER_CST && info->bitsize == CHAR_BIT) return !bit_insertion; and also merged_store_group::do_merge turns this mode off if: /* But we cannot use it if we don't have consecutive stores. */ if (!consecutive) string_concatenation = false; and on if: /* We want to use concatenation if there is any string. */ if (info->rhs_code == STRING_CST) { string_concatenation = true; gcc_assert (!bit_insertion); } Now, if the goal is to only do the string_concatenation mode if it is some set single character stores and at least one or more STRING_CST stores, the above doesn't achieve that by far. The first cited hunk above from can_be_merged_into allows it if the first store is char sized integral store and some later is STRING_CST. It doesn't check if that first store is byte aligned, and doesn't check all the other stores in between. All do_merge then does is if the stores aren't consecutive the string_concatenation mode is turned off. So e.g. we reject merging if there is 2 byte INTEGER_CST store followed by consecutive STRING_CST, but allow 1, 2 and 4 byte INTEGER_CST stores followed by STRING_CST (all consecutive), or say 1 byte, then 3 bits, then STRING_CST etc. The second one is more strict, when the first store is STRING_CST, we punt on any following INTEGER_CST store even when it is consecutive, if it isn't 1 byte store. Consider e.g. on x86_64-linux with -O2: struct U { char c[16]; }; struct V { char c[16]; }; struct S { unsigned int a : 3, b : 8, c : 21; struct U d; unsigned int e; struct V f; unsigned int g : 5, h : 27; }; struct T { unsigned int a : 16, b : 8, c : 8; struct U d; unsigned int e; struct V f; unsigned int g : 5, h : 27; }; __attribute__((noipa)) void foo (struct S *p) { p->b = 231; p->c = 42; p->d = (struct U) { "abcdefghijklmno" }; p->e = 0xdeadbeef; p->f = (struct V) { "ABCDEFGHIJKLMNO" }; } __attribute__((noipa)) void bar (struct S *p) { p->b = 231; p->c = 42; p->d = (struct U) { "abcdefghijklmno" }; p->e = 0xdeadbeef; p->f = (struct V) { "ABCDEFGHIJKLMNO" }; p->g = 12; } __attribute__((noipa)) void baz (struct T *p) { p->c = 42; p->d = (struct U) { "abcdefghijklmno" }; p->e = 0xdeadbeef; p->f = (struct V) { "ABCDEFGHIJKLMNO" }; p->g = 12; } int main () { struct S s = {}; struct T t = {}; foo (&s); if (s.a != 0 || s.b != 231 || s.c != 42 || __builtin_memcmp (&s.d.c, "abcdefghijklmno", 16) || s.e != 0xdeadbeef || __builtin_memcmp (&s.f.c, "ABCDEFGHIJKLMNO", 16) || s.g != 0 || s.h != 0) __builtin_abort (); __builtin_memset (&s, 0, sizeof (s)); s.a = 7; s.g = 31; s.h = (1U << 27) - 1; foo (&s); if (s.a != 7 || s.b != 231 || s.c != 42 || __builtin_memcmp (&s.d.c, "abcdefghijklmno", 16) || s.e != 0xdeadbeef || __builtin_memcmp (&s.f.c, "ABCDEFGHIJKLMNO", 16) || s.g != 31 || s.h != (1U << 27) - 1) __builtin_abort (); __builtin_memset (&s, 0, sizeof (s)); bar (&s); if (s.a != 0 || s.b != 231 || s.c != 42 || __builtin_memcmp (&s.d.c, "abcdefghijklmno", 16) || s.e != 0xdeadbeef || __builtin_memcmp (&s.f.c, "ABCDEFGHIJKLMNO", 16) || s.g != 12 || s.h != 0) __builtin_abort (); __builtin_memset (&s, 0, sizeof (s)); s.a = 7; s.g = 31; s.h = (1U << 27) - 1; bar (&s); if (s.a != 7 || s.b != 231 || s.c != 42 || __builtin_memcmp (&s.d.c, "abcdefghijklmno", 16) || s.e != 0xdeadbeef || __builtin_memcmp (&s.f.c, "ABCDEFGHIJKLMNO", 16) || s.g != 12 || s.h != (1U << 27) - 1) __builtin_abort (); baz (&t); if (t.a != 0 || t.b != 0 || t.c != 42 || __builtin_memcmp (&t.d.c, "abcdefghijklmno", 16) || t.e != 0xdeadbeef || __builtin_memcmp (&t.f.c, "ABCDEFGHIJKLMNO", 16) || t.g != 12 || t.h != 0) __builtin_abort (); __builtin_memset (&s, 0, sizeof (s)); t.a = 7; t.b = 255; t.g = 31; t.h = (1U << 27) - 1; baz (&t); if (t.a != 7 || t.b != 255 || t.c != 42 || __builtin_memcmp (&t.d.c, "abcdefghijklmno", 16) || t.e != 0xdeadbeef || __builtin_memcmp (&t.f.c, "ABCDEFGHIJKLMNO", 16) || t.g != 12 || t.h != (1U << 27) - 1) __builtin_abort (); return 0; } The 3 functions show the various issues, non-byte boundary start, non-byte boundary start & end and non-byte boundary end. We have lots of options to fix this, the question is what results in best code generation. One is to only do what has been probably originally meant, i.e. only allow merging of STRING_CSTs with byte aligned byte sized INTEGER_CST stores (of course consecutive). Unless we'd want to risk O(n^2) complexity, we might need a flag that is set in a group with non-byte sized or non-byte aligned INTEGER_CST stores and if that flag is set, don't allow merging it with a later STRING_CST. Another option is keep the current (admittedly weird) behavior but say in merged_store_group::apply_stores clear string_merging flag if start or width aren't divisible by BITS_PER_UNIT. After all, we already clear it also for buf_size <= MOVE_MAX. Or some combination of the above, I mean, e.g. in the original testcase as well as the https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108498#c18 one the only STRING_CST in there is small and even power of two sized (and aligned the same); treating such 4 byte store differently from INTEGER_CST store in how store merging behaves in that case is weird. Of course, another example could be fairly long STRING_CST, especially with length not multiple of word size. We don't really try to estimate that the STRING_CST store is actually more expensive than a simple scalar store, so if we decide to merge such STRING_CST store with other stores in a group, it could be that we just punt on it because the multiple scalar stores look more expensive than the original stores which have one or more expensive STRING_CST stores. And if we punt, it might be the case that if we didn't merge the STRING_CST store in at first, say the group before the STRING_CST or after it or both could each store merge successfully and beneficially. That said, I think especially for backports to 11/12 and perhaps even for 13 the easiest will be the second option, clear string_merging on non-aligned start/width. Eric, your thoughts on this?