https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118852

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
Oh, and -O3 instead of -Ofast also makes it work.  (semantic-interposition?)

There's only a single loop vectorized in fold-const.c,
which is in native_interpret_int:

fold-const.c:8109:23: optimized: loop vectorized using 32 byte vectors
fold-const.c:8109:23: optimized: loop vectorized using 16 byte vectors

and with ASLR disabled it seems to reliably pass :/

I've bisected affected files and it's enough to have tree-ssa-sccvn.c built
with -fprofile-generate to reproduce a failure.

tree-ssa-sccvn.c:1587:17: optimized: loop vectorized using 16 byte vectors
tree-ssa-sccvn.c:1587:17: optimized:  loop versioned for vectorization because of possible aliasing
tree-flow-inline.h:92:24: optimized: loop vectorized using 32 byte vectors
tree-flow-inline.h:68:28: optimized: loop vectorized using 32 byte vectors
tree-flow-inline.h:92:24: optimized: loop vectorized using 32 byte vectors
tree-flow-inline.h:68:28: optimized: loop vectorized using 32 byte vectors
tree-flow-inline.h:92:24: optimized: loop vectorized using 32 byte vectors
tree-flow-inline.h:68:28: optimized: loop vectorized using 32 byte vectors

It's also enough to not use -fprofile-generate on tree-ssa-sccvn.c to avoid
the failure.

The loop at 1587 looks innocent enough.  The inlined one walks a
hashtable, which explains the ASLR sensitivity; it is also an early-exit
loop.  The loops are vectorized in other contexts as well, but apparently
without bad effect.  The following are the two loops; they are essentially
the same.

static inline void *
next_htab_element (htab_iterator *hti)
{
  while (++(hti->slot) < hti->limit) 
    {
      PTR x = *(hti->slot);
      if (x != HTAB_EMPTY_ENTRY && x != HTAB_DELETED_ENTRY)
        return x; 
    };
  return NULL;
}

static inline void *
first_htab_element (htab_iterator *hti, htab_t table)
{
  hti->htab = table;
  hti->slot = table->entries;
  hti->limit = hti->slot + htab_size (table);
  do
    {
      PTR x = *(hti->slot);
      if (x != HTAB_EMPTY_ENTRY && x != HTAB_DELETED_ENTRY)
        break;
    } while (++(hti->slot) < hti->limit);

  if (hti->slot < hti->limit)
    return *(hti->slot);
  return NULL;
}
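
For illustration only, here is a minimal C sketch of the invariant any
block-wise version of such an early-exit loop has to keep: it must return
the same element, leave the iterator at the same slot, and never read a
slot at or past the limit.  This is not what the vectorizer emits.  The
names sketch_iterator, next_blocked, BLOCK and check_agreement are
hypothetical, and the PTR/HTAB_* definitions are assumed stand-ins for the
libiberty ones.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed stand-ins for the libiberty definitions; illustration only.  */
typedef void *PTR;
#define HTAB_EMPTY_ENTRY   ((PTR) 0)
#define HTAB_DELETED_ENTRY ((PTR) 1)

/* Hypothetical cut-down iterator: just the two fields the loops use.  */
typedef struct
{
  PTR *slot;
  PTR *limit;
} sketch_iterator;

/* Scalar reference, same shape as next_htab_element.  */
static PTR
next_scalar (sketch_iterator *it)
{
  while (++(it->slot) < it->limit)
    {
      PTR x = *(it->slot);
      if (x != HTAB_EMPTY_ENTRY && x != HTAB_DELETED_ENTRY)
        return x;
    }
  return NULL;
}

#define BLOCK 4  /* hypothetical block size, standing in for vector lanes */

/* Hand-written blocked variant: skip whole blocks that contain no live
   entry, then finish with a scalar scan.  */
static PTR
next_blocked (sketch_iterator *it)
{
  PTR *p = it->slot + 1;

  /* Full blocks only, so no slot at or past the limit is ever read.  */
  while (it->limit - p >= BLOCK)
    {
      int any_live = 0;
      for (int i = 0; i < BLOCK; i++)
        any_live |= (p[i] != HTAB_EMPTY_ENTRY && p[i] != HTAB_DELETED_ENTRY);
      if (any_live)
        break;
      p += BLOCK;
    }

  /* Scalar tail: find the first live slot, if any, and record it.  */
  for (; p < it->limit; p++)
    if (*p != HTAB_EMPTY_ENTRY && *p != HTAB_DELETED_ENTRY)
      {
        it->slot = p;
        return *p;
      }
  it->slot = p;
  return NULL;
}

/* Walk the same table with both variants and assert that they agree on
   every returned element and slot.  Returns the number of live entries
   found after slot 0 (slot 0 is the job of the "first" function).  */
static size_t
check_agreement (PTR *table, size_t n)
{
  sketch_iterator a = { table, table + n };
  sketch_iterator b = { table, table + n };
  size_t live = 0;
  for (;;)
    {
      PTR x = next_scalar (&a);
      PTR y = next_blocked (&b);
      assert (x == y && a.slot == b.slot);
      if (x == NULL)
        return live;
      live++;
    }
}
```

Running check_agreement over tables with live entries at varying positions
is a cheap way to see which property a bad transformation breaks: a wrong
element, a wrong final slot, or a read outside the table.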

Compiling with -fdbg-cnt=vect_loop:6-6 is enough to trigger this.  That
vectorizes the copy of next_htab_element contained in the run_scc_vn
function (inlined into it, of course); we're around set_hashtable_value_ids.

It's odd that this causes us to miscompile rather than fault.  Without
-fprofile-generate we only vectorize:

tree-ssa-sccvn.c:1587:17: optimized: loop vectorized using 16 byte vectors
tree-ssa-sccvn.c:1587:17: optimized:  loop versioned for vectorization because of possible aliasing
tree-flow-inline.h:68:28: optimized: loop vectorized using 32 byte vectors
tree-flow-inline.h:68:28: optimized: loop vectorized using 32 byte vectors
tree-flow-inline.h:68:28: optimized: loop vectorized using 32 byte vectors
