https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64909

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|NEW                           |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
We call

Breakpoint 6, ix86_add_stmt_cost (data=0x21f34b0, count=56, kind=scalar_stmt,
    stmt_info=0x0, misalign=0, where=vect_epilogue)

which is because we estimate epilogue cost to # of peeled iterations times
scalar iteration cost (4 * 14).  The way we do that,

  if (*peel_iters_epilogue)
    retval += record_stmt_cost (epilogue_cost_vec,
                                *peel_iters_epilogue * scalar_single_iter_cost,
                                scalar_stmt, NULL, 0, vect_epilogue);

is slightly off (we record 4 * 14 scalar stmts but for that we'd need to
use scalar_single_iter_num_stmts, not their cost, but well - scalar stmt cost
is 1 even for bdver1).

So the issue is that scalar iteration cost is somehow very high for bdver1
(14) compared to generic (3).

It looks like bdver1 uses scaled costs (not based on 1):

  6,                                    /* scalar_stmt_cost.  */
  4,                                    /* scalar load_cost.  */
  4,                                    /* scalar_store_cost.  */
  6,                                    /* vec_stmt_cost.  */
  0,                                    /* vec_to_scalar_cost.  */
  2,                                    /* scalar_to_vec_cost.  */
  4,                                    /* vec_align_load_cost.  */
  4,                                    /* vec_unalign_load_cost.  */
  4,                                    /* vec_store_cost.  */
  2,                                    /* cond_taken_branch_cost.  */
  1,                                    /* cond_not_taken_branch_cost.  */

and thus runs into the aforementioned issue.
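
To make the arithmetic concrete, here is a rough standalone model of the double scaling described above. It is only a sketch and not GCC code: struct cost_table, add_stmt_cost, epilogue_cost_as_recorded and epilogue_cost_intended are hypothetical names, the model assumes the target hook prices a record as count times the per-statement cost, and the generic scalar_stmt_cost of 1 is an assumption. The iteration costs (14 and 3), the bdver1 scalar_stmt_cost (6) and the 4 peeled iterations are the values quoted in the comment.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for a tuning cost table.  */
struct cost_table
{
  int scalar_stmt_cost;         /* Cost of one scalar statement.  */
  int scalar_single_iter_cost;  /* Cost of one scalar loop iteration.  */
};

/* Values quoted in the comment; generic's scalar_stmt_cost of 1 is an
   assumption made for this sketch.  */
static const struct cost_table generic_model = { 1, 3 };
static const struct cost_table bdver1_model = { 6, 14 };

/* Assumed model of the target hook: COUNT statements, each priced at the
   table's per-statement cost.  */
static int
add_stmt_cost (const struct cost_table *t, int count)
{
  assert (count >= 0);
  return count * t->scalar_stmt_cost;
}

/* Mirrors the quoted snippet: an iteration *cost* is passed where a
   statement *count* is expected, so the result is scaled twice whenever
   scalar_stmt_cost is not 1.  */
static int
epilogue_cost_as_recorded (const struct cost_table *t, int peel_iters)
{
  return add_stmt_cost (t, peel_iters * t->scalar_single_iter_cost);
}

/* The estimate the text describes: peeled iterations times the scalar
   iteration cost, with no further scaling.  */
static int
epilogue_cost_intended (const struct cost_table *t, int peel_iters)
{
  return peel_iters * t->scalar_single_iter_cost;
}
```

Under these assumptions, 4 peeled iterations give 12 both ways for the generic model, but 336 as recorded versus 56 as intended for the bdver1 model, which is the kind of inflation a cost table not based on 1 would produce.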