https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80430

            Bug ID: 80430
           Summary: Vectorizer undervalues cost of alias checking for
                    versioning
           Product: gcc
           Version: 7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wschmidt at gcc dot gnu.org
  Target Milestone: ---

While investigating a performance loss due to vectorization, I noticed that the
cost model doesn't properly account for the outside cost due to a versioning
check for aliasing.  Note the FIXME here in tree-vect-loop.c:

  /* Requires loop versioning with alias checks.  */
  if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo))
    {
      /*  FIXME: Make cost depend on complexity of individual check.  */
      unsigned len = LOOP_VINFO_COMP_ALIAS_DDRS (loop_vinfo).length ();
      (void) add_stmt_cost (target_cost_data, len, vector_stmt, NULL, 0,
                            vect_prologue);
      dump_printf (MSG_NOTE,
                   "cost model: Adding cost of checks for loop "
                   "versioning aliasing.\n");
    }

Thus the outside cost is typically charged only 1 vector_stmt for this checking.
In reality, about 20 scalar gimple statements are added, including several
conditional moves.  See vect_create_cond_for_alias_checks and friends in
tree-vect-loop-manip.c for the gory details.
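
For readers unfamiliar with versioning for aliasing, the following is a
hand-written C analogue of what such a check guards.  It is only a sketch, not
the code GCC generates: ranges_disjoint and add_one_versioned are hypothetical
names invented for illustration, and the real check is built in gimple from the
data references involved.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a GCC function: nonzero when the byte ranges
   [a, a + len) and [b, b + len) do not overlap.  */
static int
ranges_disjoint (const void *a, const void *b, size_t len)
{
  uintptr_t pa = (uintptr_t) a;
  uintptr_t pb = (uintptr_t) b;
  return pa + len <= pb || pb + len <= pa;
}

/* Hand-written analogue of a loop versioned for aliasing.  The check is
   paid on every call, even when n is tiny.  */
static void
add_one_versioned (unsigned char *dst, const unsigned char *src, size_t n)
{
  if (ranges_disjoint (dst, src, n))
    {
      /* Stand-in for the vectorized version: with no overlap, several
         iterations could safely be processed at once.  */
      for (size_t i = 0; i < n; i++)
        dst[i] = (unsigned char) (src[i] + 1);
    }
  else
    {
      /* Scalar fallback: keeps the original iteration order.  */
      for (size_t i = 0; i < n; i++)
        dst[i] = (unsigned char) (src[i] + 1);
    }
}
```

Even this reduced form needs two additions, two comparisons and a branch per
pair of references before any loop iteration runs, which is the overhead the
cost model has to account for.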

This can cause real problems when an inner loop that isn't frequently executed
gets vectorized, and its outer loop executes many times, suffering the penalty
of the aliasing check.  (Even with a saner estimate of 20 * len scalar_stmts,
I've seen this happen due to static estimation still leading us to think this
is profitable.  But we should still clean this up.)
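
To see how much the outside cost matters, here is a deliberately simplified
break-even model.  It is an assumption-laden sketch, much simpler than the
vectorizer's real computation: it ignores peeling, epilogues and target cost
hooks, and min_profitable_iters and all the unit costs passed to it are
hypothetical.

```c
/* Simplified model (assumption): vectorizing n scalar iterations is
   profitable when
       outside_cost + (n / vf) * vec_iter_cost < n * scalar_iter_cost.
   Returns the smallest such n, or 0 if no n satisfies the inequality.  */
static unsigned long
min_profitable_iters (unsigned long outside_cost,
                      unsigned long scalar_iter_cost,
                      unsigned long vec_iter_cost,
                      unsigned long vf)
{
  /* Saving per scalar iteration, scaled by vf to stay in integers.  */
  if (scalar_iter_cost * vf <= vec_iter_cost)
    return 0;
  unsigned long gain = scalar_iter_cost * vf - vec_iter_cost;

  /* Need n * gain > outside_cost * vf.  */
  return outside_cost * vf / gain + 1;
}
```

With unit costs of 1 and a hypothetical vectorization factor of 16,
min_profitable_iters (1, 1, 1, 16) gives 2, while
min_profitable_iters (20, 1, 1, 16) gives 22.  Under this model, a loop that
usually runs only a handful of iterations looks profitable with the current
estimate but not with one closer to the real size of the check.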

An example inner loop where this kind of pessimization occurs:

static void LZ4_wildCopy(void* dstPtr, const void* srcPtr, void* dstEnd)
{
    BYTE* d = (BYTE*)dstPtr;
    const BYTE* s = (const BYTE*)srcPtr;
    BYTE* const e = (BYTE*)dstEnd;

    do { memcpy(d,s,8); d+=8; s+=8; } while (d<e);
}

(taken from the LZ4 open source project).
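
A self-contained way to exercise this loop is sketched below.  The BYTE
typedef is an assumed definition, and copy_runs is a hypothetical caller
invented here to mimic an outer loop that invokes the copy many times with
short lengths; it is not taken from LZ4.

```c
#include <stddef.h>
#include <string.h>

typedef unsigned char BYTE;   /* assumed definition */

static void LZ4_wildCopy(void* dstPtr, const void* srcPtr, void* dstEnd)
{
    BYTE* d = (BYTE*)dstPtr;
    const BYTE* s = (const BYTE*)srcPtr;
    BYTE* const e = (BYTE*)dstEnd;

    /* Copies whole 8-byte chunks, so it may write past dstEnd. */
    do { memcpy(d,s,8); d+=8; s+=8; } while (d<e);
}

/* Hypothetical caller: many short copies.  For runLen == 12 each call
   executes the do/while body only twice, so a per-call alias check is
   paid once for every two iterations. */
static void copy_runs(BYTE* dst, const BYTE* src,
                      size_t nruns, size_t runLen, size_t stride)
{
    for (size_t i = 0; i < nruns; i++)
        LZ4_wildCopy(dst + i * stride, src + i * stride,
                     dst + i * stride + runLen);
}
```

Because the copy proceeds in 8-byte chunks, both buffers need slack after the
last run; with runLen 12 and stride 16, each call touches exactly 16 bytes.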
