https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86541
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2018-07-17
                 CC|                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  So you are asking for GCC to properly "lower" explicit source
level vector_size(8) operations to SSE (tree-vect-generic.c), not the
autovectorizer supporting this?

tree-vect-generic.c currently supports the reverse - if you'd use
vector_size(16) then targets with only smaller vectors get those split up
appropriately.

That sounds easier in case vectorization with the larger vector size is
possible for the code in question.

Given that we have a target pass that makes use of SSE regs for scalar
operations I wonder if it would make more sense to attack this at the target
level by claiming native support for vector_size(8) and using a target pass
to make that work.

As you said the most simple way is to movlhps %xmmN, %xmmN at strategic
places.  That very thing could be also done by tree-vect-generic.c of course.
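
For readers without the original report at hand, here is a minimal sketch of what "explicit source level vector_size(8) operations" look like with GCC's vector extensions. The type and function names (v2sf, v4sf, add_v2sf, add_v2sf_widened) are hypothetical and do not come from the report. The last function is only a hand-written illustration of the widening idea: it copies the low half of each operand into the high half, which is the effect movlhps %xmmN, %xmmN has on a register. It is not what tree-vect-generic.c or any target pass actually emits.

```c
/* Sketch only: all type and function names here are hypothetical. */
typedef float v2sf __attribute__((vector_size(8)));   /* two floats, 8 bytes   */
typedef float v4sf __attribute__((vector_size(16)));  /* four floats, 16 bytes */

/* Explicit source-level 8-byte vector operation, the case discussed in
   the comment above. */
v2sf add_v2sf(v2sf a, v2sf b)
{
    return a + b;
}

/* The direction generic lowering is said to handle already: a 16-byte
   operation, which can be split up on targets with only smaller vectors. */
v4sf add_v4sf(v4sf a, v4sf b)
{
    return a + b;
}

/* Hand-written illustration of the widening idea: duplicate the low
   8 bytes into the high 8 bytes (the effect of movlhps with the same
   source and destination register), operate at full width, and keep
   only the low half of the result. */
v2sf add_v2sf_widened(v2sf a, v2sf b)
{
    v4sf wa = { a[0], a[1], a[0], a[1] };
    v4sf wb = { b[0], b[1], b[0], b[1] };
    v4sf r = wa + wb;
    v2sf out = { r[0], r[1] };
    return out;
}
```

Both 8-byte variants should produce the same two lanes; the question in the comment is only which part of the compiler (generic vector lowering or a target pass) should arrange for the full-width form.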