13 Regression] ppc64 big endian generates uninitialized reads with -fstore-merging

jakub at gcc dot gnu.org via Gcc-bugs Tue, 24 Jan 2023 07:35:19 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108498


Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ebotcazou at gcc dot gnu.org

--- Comment #19 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
So, the problem is the separate string_concatenate mode introduced in that
r11-1810-ge362a897655e3b92949b65a2b53e00fb3ab8ded0 commit.
This mode is turned on whenever at least one of the stores is using a
STRING_CST and is successfully merged with the other stores.
Now, this mode assumes that the start of the store is on a byte boundary and
that the store ends on a byte boundary too, it simply creates a character array
with try_size / BITS_PER_UNIT and ignores any masking.
It is true that merged_store_group::can_be_merged_into does various checks, if
string_concatenation mode is on, it will reject merging with MEM_REF, etc.,
plus
  /* STRING_CST is compatible with INTEGER_CST if no BIT_INSERT_EXPR.  */
  if (info->rhs_code == STRING_CST
      && stores[0]->rhs_code == INTEGER_CST
      && stores[0]->bitsize == CHAR_BIT)
    return !bit_insertion;

  if (stores[0]->rhs_code == STRING_CST
      && info->rhs_code == INTEGER_CST
      && info->bitsize == CHAR_BIT)
    return !bit_insertion;
and also merged_store_group::do_merge turns this mode off if:
  /* But we cannot use it if we don't have consecutive stores.  */
  if (!consecutive)
    string_concatenation = false;
and on if:
  /* We want to use concatenation if there is any string.  */
  if (info->rhs_code == STRING_CST)
    {
      string_concatenation = true;
      gcc_assert (!bit_insertion);
    }
Now, if the goal is to only do the string_concatenation mode if it is some set
single character stores and at least one or more STRING_CST stores, the above
doesn't achieve that by far.
The first cited hunk above from can_be_merged_into allows it if the first store
is char sized integral store and some later is STRING_CST.  It doesn't check if
that
first store is byte aligned, and doesn't check all the other stores in between.
All do_merge then does is if the stores aren't consecutive the
string_concatenation mode is turned off.  So e.g. we reject merging if there is
2 byte INTEGER_CST store followed by consecutive STRING_CST, but allow 1, 2 and
4 byte INTEGER_CST stores followed by STRING_CST (all consecutive), or say 1
byte, then 3 bits, then STRING_CST etc.
The second one is more strict, when the first store is STRING_CST, we punt on
any following INTEGER_CST store even when it is consecutive, if it isn't 1 byte
store.

Consider e.g. on x86_64-linux with -O2:
struct U { char c[16]; };
struct V { char c[16]; };
struct S { unsigned int a : 3, b : 8, c : 21; struct U d; unsigned int e;
struct V f; unsigned int g : 5, h : 27; };
struct T { unsigned int a : 16, b : 8, c : 8; struct U d; unsigned int e;
struct V f; unsigned int g : 5, h : 27; };

__attribute__((noipa)) void
foo (struct S *p)
{
  p->b = 231;
  p->c = 42;
  p->d = (struct U) { "abcdefghijklmno" };
  p->e = 0xdeadbeef;
  p->f = (struct V) { "ABCDEFGHIJKLMNO" };
}

__attribute__((noipa)) void
bar (struct S *p)
{
  p->b = 231;
  p->c = 42;
  p->d = (struct U) { "abcdefghijklmno" };
  p->e = 0xdeadbeef;
  p->f = (struct V) { "ABCDEFGHIJKLMNO" };
  p->g = 12;
}

__attribute__((noipa)) void
baz (struct T *p)
{
  p->c = 42;
  p->d = (struct U) { "abcdefghijklmno" };
  p->e = 0xdeadbeef;
  p->f = (struct V) { "ABCDEFGHIJKLMNO" };
  p->g = 12;
}

int
main ()
{
  struct S s = {};
  struct T t = {};
  foo (&s);
  if (s.a != 0 || s.b != 231 || s.c != 42
      || __builtin_memcmp (&s.d.c, "abcdefghijklmno", 16) || s.e != 0xdeadbeef
      || __builtin_memcmp (&s.f.c, "ABCDEFGHIJKLMNO", 16) || s.g != 0 || s.h !=
0)
    __builtin_abort ();
  __builtin_memset (&s, 0, sizeof (s));
  s.a = 7;
  s.g = 31;
  s.h = (1U << 27) - 1;
  foo (&s);
  if (s.a != 7 || s.b != 231 || s.c != 42
      || __builtin_memcmp (&s.d.c, "abcdefghijklmno", 16) || s.e != 0xdeadbeef
      || __builtin_memcmp (&s.f.c, "ABCDEFGHIJKLMNO", 16) || s.g != 31 || s.h
!= (1U << 27) - 1)
    __builtin_abort ();
  __builtin_memset (&s, 0, sizeof (s));
  bar (&s);
  if (s.a != 0 || s.b != 231 || s.c != 42
      || __builtin_memcmp (&s.d.c, "abcdefghijklmno", 16) || s.e != 0xdeadbeef
      || __builtin_memcmp (&s.f.c, "ABCDEFGHIJKLMNO", 16) || s.g != 12 || s.h
!= 0)
    __builtin_abort ();
  __builtin_memset (&s, 0, sizeof (s));
  s.a = 7;
  s.g = 31;
  s.h = (1U << 27) - 1;
  bar (&s);
  if (s.a != 7 || s.b != 231 || s.c != 42
      || __builtin_memcmp (&s.d.c, "abcdefghijklmno", 16) || s.e != 0xdeadbeef
      || __builtin_memcmp (&s.f.c, "ABCDEFGHIJKLMNO", 16) || s.g != 12 || s.h
!= (1U << 27) - 1)
    __builtin_abort ();
  baz (&t);
  if (t.a != 0 || t.b != 0 || t.c != 42
      || __builtin_memcmp (&t.d.c, "abcdefghijklmno", 16) || t.e != 0xdeadbeef
      || __builtin_memcmp (&t.f.c, "ABCDEFGHIJKLMNO", 16) || t.g != 12 || t.h
!= 0)
    __builtin_abort ();
  __builtin_memset (&s, 0, sizeof (s));
  t.a = 7;
  t.b = 255;
  t.g = 31;
  t.h = (1U << 27) - 1;
  baz (&t);
  if (t.a != 7 || t.b != 255 || t.c != 42
      || __builtin_memcmp (&t.d.c, "abcdefghijklmno", 16) || t.e != 0xdeadbeef
      || __builtin_memcmp (&t.f.c, "ABCDEFGHIJKLMNO", 16) || t.g != 12 || t.h
!= (1U << 27) - 1)
    __builtin_abort ();
  return 0;
}

The 3 functions show the various issues, non-byte boundary start, non-byte
boundary start & end and non-byte boundary end.

We have lots of options to fix this, the question is what results in best code
generation.  One is to only do what has been probably originally meant, i.e.
only
allow merging of STRING_CSTs with byte aligned byte sized INTEGER_CST stores
(of course consecutive).  Unless we'd want to risk O(n^2) complexity, we might
need a flag that is set in a group with non-byte sized or non-byte aligned
INTEGER_CST stores and if that flag is set, don't allow merging it with a later
STRING_CST.

Another option is keep the current (admittedly weird) behavior but say in
merged_store_group::apply_stores clear string_merging flag if start or width
aren't divisible by BITS_PER_UNIT.  After all, we already clear it also for
buf_size <= MOVE_MAX.

Or some combination of the above, I mean, e.g. in the original testcase as well
as
the https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108498#c18 one the only
STRING_CST in there is small and even power of two sized (and aligned the
same); treating such 4 byte
store differently from INTEGER_CST store in how store merging behaves in that
case is weird.  Of course, another example could be fairly long STRING_CST,
especially with
length not multiple of word size.  We don't really try to estimate that the
STRING_CST
store is actually more expensive than a simple scalar store, so if we decide to
merge
such STRING_CST store with other stores in a group, it could be that we just
punt on it
because the multiple scalar stores look more expensive than the original stores
which have one or more expensive STRING_CST stores.  And if we punt, it might
be the case that
if we didn't merge the STRING_CST store in at first, say the group before the
STRING_CST or after it or both could each store merge successfully and
beneficially.

That said, I think especially for backports to 11/12 and perhaps even for 13
the easiest will be the second option, clear string_merging on non-aligned
start/width.

Eric, your thoughts on this?

[Bug tree-optimization/108498] [11/12/13 Regression] ppc64 big endian generates uninitialized reads with -fstore-merging

Reply via email to