https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116032
--- Comment #8 from Christophe Lyon <clyon at gcc dot gnu.org> ---
We currently have:

struct cpu_vec_costs arm_default_vec_cost = {
  1,  /* scalar_stmt_cost.  */
  1,  /* scalar_load_cost.  */
  1,  /* scalar_store_cost.  */
  1,  /* vec_stmt_cost.  */
  1,  /* vec_to_scalar_cost.  */
  1,  /* scalar_to_vec_cost.  */
  1,  /* vec_align_load_cost.  */
  1,  /* vec_unalign_load_cost.  */
  1,  /* vec_unalign_store_cost.  */
  1,  /* vec_store_cost.  */
  3,  /* cond_taken_branch_cost.  */
  1,  /* cond_not_taken_branch_cost.  */
};

and, as expected, replacing the value of vec_align_load_cost with 2 "fixed" the problem: we again generate

  movs    r2, #1
  movs    r3, #0
  strd    r2, r3, [r0]

But that seems too strong a change (and would probably introduce regressions elsewhere). Maybe we could instead pessimize such a vector load only when it implies the creation of a literal-pool entry?