https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99510
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---

Ah, OK. We're having a lot of vector CTORs that we "vectorize" with load permutations like { 484 506 }, and that runs into the pre-existing issue (there's a PR about this...) that we emit dead vector loads for all of the elements in the group, including gaps. Costing says scalar and vector are even, which possibly makes sense. We do a build_aligned_type for each emitted stmt, and for some reason it's quite costly here (well, there's the awkward linear type variant list to walk ...). Caching should be possible, but the load vectorization loop is already quite awkward. Meh.

The rev. likely triggered this because before it we didn't cost the scalar root stmt (the CTOR itself, which we replace). Doing that made the costing come out profitable. Having equal scalar and vector load costs makes fixing this on the costing side difficult; the vector load should be an epsilon more expensive to avoid these issues.

Note that for some reason we have a gazillion type variants here. Huh. ~36070 variants per type. Ah. And _that's_ because build_aligned_type does

  for (t = TYPE_MAIN_VARIANT (type); t; t = TYPE_NEXT_VARIANT (t))
    if (check_aligned_type (t, type, align))
      return t;

  t = build_variant_type_copy (type);
  SET_TYPE_ALIGN (t, align);
  TYPE_USER_ALIGN (t) = 1;
                      ^^^^

and check_aligned_type requires an exact match of TYPE_USER_ALIGN. But of course, if 'type' wasn't user-aligned originally, the lookup never finds the aligned variant created here, so every call appends yet another variant to the list ...

Fixing that fixes the compile-time issue.