https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110062
--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
We now also apply SLP when vectorizing the loop, but as said, the high VF is
probably prohibitive and causes significant spilling:

.L7:
        vmovdqu (%r14), %ymm2
        vmovdqu 32(%r14), %ymm1
        subq    $-128, %r14
        subq    $-128, %rdx
        vmovups -128(%rdx), %ymm10
        vmovdqu -64(%r14), %ymm0
        vpshufb .LC7(%rip), %ymm2, %ymm4
        vmovups -96(%rdx), %ymm9
        vmovups -64(%rdx), %ymm8
        vpshufb .LC8(%rip), %ymm1, %ymm3
        vpermq  $78, %ymm4, %ymm4
        vpermq  $78, %ymm3, %ymm3
...
        vmulps  %ymm7, %ymm0, %ymm0
        vaddps  136(%rsp), %ymm0, %ymm7
        vaddps  %ymm3, %ymm15, %ymm15
        vmovaps %ymm4, 168(%rsp)
        vmovaps %ymm7, 136(%rsp)
        cmpq    %r13, %r14
        jne     .L7

Maybe we should more aggressively reject vectorization when the VF equals the
number of lanes of the vector type with the smallest element. If we then also
detect SLP, this usually means BB-level SLP can do something useful instead.

Note that we now fail to support V2SF -> V2QI; I'm not sure what changed here.
vectorizable_conversion does not support float->int->short->char; it only
supports float->char, float->int->char or float->short->char, and at least for
2-element vectors none of these is supported (the vectorizer could support
extra intermediate steps as well).
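To make the arithmetic behind the suggested rejection rule concrete: with 256-bit AVX2 vectors (the %ymm registers in the listing), a char vector has 32 lanes but a float vector only 8, so a VF dictated by char forces every float statement to expand into 32 / 8 = 4 vector registers. The sketch below is a hypothetical illustration, not GCC's actual cost model; the helper names lanes, copies_per_stmt and reject_loop_vectorization are invented for this example.

```c
#include <stdbool.h>

/* Hypothetical helper: lanes in a vector of vector_bytes bytes
   whose elements are elem_bytes bytes each. */
static unsigned lanes(unsigned vector_bytes, unsigned elem_bytes)
{
    return vector_bytes / elem_bytes;
}

/* Hypothetical helper: number of vectors one scalar statement on
   elem_bytes-sized elements expands into at vectorization factor vf. */
static unsigned copies_per_stmt(unsigned vf, unsigned vector_bytes,
                                unsigned elem_bytes)
{
    return vf / lanes(vector_bytes, elem_bytes);
}

/* Hypothetical predicate for the proposed rule: reject loop
   vectorization when the VF equals the lane count of the narrowest
   element type and SLP was also detected, leaving the code to
   BB-level SLP. */
static bool reject_loop_vectorization(unsigned vf, unsigned vector_bytes,
                                      unsigned smallest_elem_bytes,
                                      bool slp_detected)
{
    return slp_detected
           && vf == lanes(vector_bytes, smallest_elem_bytes);
}
```

For example, copies_per_stmt(32, 32, 4) gives 4 ymm registers per float statement when VF = 32; with only 16 ymm registers available on AVX2, a few such statements live at once are enough to cause spills like the 136(%rsp) and 168(%rsp) stores above.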
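As a source-level illustration only (it does not exercise vectorizable_conversion itself), the float->int->short->char chain for 2-element vectors can be spelled out with GCC's vector extensions and __builtin_convertvector, assuming GCC 9 or later (or Clang). The type names v2sf, v2si, v2hi, v2qi and the function v2sf_to_v2qi are hypothetical names chosen for this sketch.

```c
/* Two-lane vector types via the GCC/Clang vector_size extension. */
typedef float       v2sf __attribute__((vector_size(8)));
typedef int         v2si __attribute__((vector_size(8)));
typedef short       v2hi __attribute__((vector_size(4)));
typedef signed char v2qi __attribute__((vector_size(2)));

/* V2SF -> V2QI written as the multi-step chain
   float -> int -> short -> char.  The float -> int step truncates
   toward zero; the result is only meaningful for inputs whose
   truncated value fits in signed char. */
static v2qi v2sf_to_v2qi(v2sf x)
{
    v2si i = __builtin_convertvector(x, v2si);
    v2hi h = __builtin_convertvector(i, v2hi);
    return __builtin_convertvector(h, v2qi);
}
```

A direct __builtin_convertvector(x, v2qi) would also be valid; the explicit intermediate steps just mirror the float->int->short->char path that the vectorizer would need to handle as extra intermediate conversions.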