https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80430
Bug ID: 80430
Summary: Vectorizer undervalues cost of alias checking for versioning
Product: gcc
Version: 7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: wschmidt at gcc dot gnu.org
Target Milestone: ---

While investigating a performance loss due to vectorization, I noticed that the
cost model doesn't properly account for the outside cost due to a versioning
check for aliasing. Note the FIXME here in tree-vect-loop.c:

  /* Requires loop versioning with alias checks. */
  if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo))
    {
      /* FIXME: Make cost depend on complexity of individual check. */
      unsigned len = LOOP_VINFO_COMP_ALIAS_DDRS (loop_vinfo).length ();
      (void) add_stmt_cost (target_cost_data, len, vector_stmt, NULL, 0,
                            vect_prologue);
      dump_printf (MSG_NOTE,
                   "cost model: Adding cost of checks for loop "
                   "versioning aliasing.\n");
    }

Thus the outside cost typically gets a cost of 1 vector_stmt for this checking.
In reality, about 20 gimple scalar statements are added, including several
conditional moves. See vect_create_cond_for_alias_checks and friends in
tree-vect-loop-manip.c for the gory details.

This can cause real problems when an inner loop that isn't frequently executed
gets vectorized, and its outer loop executes many times, suffering the penalty
of the aliasing check. (Even with a saner estimate of 20 * len scalar_stmts,
I've seen this happen due to static estimation still leading us to think this
is profitable. But we should still clean this up.)

An example inner loop where this kind of pessimization occurs:

  static void LZ4_wildCopy(void* dstPtr, const void* srcPtr, void* dstEnd)
  {
      BYTE* d = (BYTE*)dstPtr;
      const BYTE* s = (const BYTE*)srcPtr;
      BYTE* const e = (BYTE*)dstEnd;
      do { memcpy(d,s,8); d+=8; s+=8; } while (d<e);
  }

(taken from the LZ4 open source project).
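
To make the overhead concrete, here is a hand-written sketch of what versioning for alias amounts to on the wild-copy loop above. This is not the code GCC generates: the names ranges_disjoint and wild_copy_versioned are hypothetical, the bulk memcpy merely stands in for the vectorized loop body, and the overlap test is a simplified version of what vect_create_cond_for_alias_checks builds. The sketch also assumes the address arithmetic does not wrap around.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef unsigned char BYTE;

/* Hypothetical helper: returns 1 when [d, d+n) and [s, s+n) do not overlap.
   Addresses are compared as integers so that unrelated objects can be tested.
   Assumes da + n and sa + n do not wrap around. */
static int ranges_disjoint(const void *d, const void *s, size_t n)
{
    uintptr_t da = (uintptr_t)d, sa = (uintptr_t)s;
    return da + n <= sa || sa + n <= da;
}

/* Hand-versioned form of the wild-copy loop (illustrative only). */
static void wild_copy_versioned(void *dstPtr, const void *srcPtr, void *dstEnd)
{
    BYTE *d = dstPtr;
    const BYTE *s = srcPtr;
    BYTE *const e = dstEnd;

    assert(d != NULL && s != NULL);

    /* The do-while runs at least once and copies 8 bytes per trip, so the
       byte count is the distance to dstEnd rounded up to a multiple of 8. */
    size_t span = e > d ? (size_t)(e - d) : 1;
    size_t n = (span + 7) / 8 * 8;

    /* This test is paid on every call, even when the loop runs only once. */
    if (ranges_disjoint(d, s, n)) {
        /* No aliasing: stand-in for the vectorized version of the loop. */
        memcpy(d, s, n);
    } else {
        /* Possible aliasing: keep the original scalar loop and its
           chunk-by-chunk ordering. */
        do { memcpy(d, s, 8); d += 8; s += 8; } while (d < e);
    }
}
```

When the loop typically runs for only one or two iterations, the address arithmetic and comparison in front of it are comparable in cost to the copy itself, which is the situation the cost model currently underestimates.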