https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113622
--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #8)
> Guess for an rvalue (if even that crashes) we want to expand it to some
> permutation or whole vector shift which moves the indexed elements first
> and then extract it, for lvalue we need to insert it similarly.

If we can, we should match this up with .VEC_SET / .VEC_EXTRACT; otherwise
we should go "simple" and spill.

diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
index 7e2392ecd38..e94f292dd38 100644
--- a/gcc/gimple-isel.cc
+++ b/gcc/gimple-isel.cc
@@ -104,7 +104,8 @@ gimple_expand_vec_set_extract_expr (struct function *fun,
   machine_mode outermode = TYPE_MODE (TREE_TYPE (view_op0));
   machine_mode extract_mode = TYPE_MODE (TREE_TYPE (ref));
 
-  if (auto_var_in_fn_p (view_op0, fun->decl)
+  if ((auto_var_in_fn_p (view_op0, fun->decl)
+       || DECL_HARD_REGISTER (view_op0))
       && !TREE_ADDRESSABLE (view_op0)
       && ((!is_extract && can_vec_set_var_idx_p (outermode))
          || (is_extract

ensures the former and fixes the ICE on x86_64 on trunk.  The comment#5
testcase then results in the following loop:

.L3:
        movslq  %eax, %rdx
        vmovaps %zmm2, -56(%rsp)
        vmovaps %zmm0, -120(%rsp)
        vmovss  -120(%rsp,%rdx,4), %xmm4
        vmovss  -56(%rsp,%rdx,4), %xmm3
        vcmpltss        %xmm4, %xmm3, %xmm3
        vpbroadcastd    %eax, %zmm4
        addl    $1, %eax
        vpcmpd  $0, %zmm7, %zmm4, %k1
        vblendvps       %xmm3, %xmm5, %xmm6, %xmm3
        vbroadcastss    %xmm3, %zmm1{%k1}
        cmpl    $8, %eax
        jne     .L3

This isn't optimal of course; for optimality we need vectorization.  But
we still need to avoid the ICEs since vectorization can be disabled.
That said, I'm quite sure that in code using hard registers people are not
doing such stupid things, so I wonder how important it is to avoid
"regressing" the vectorization here.