https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111023
--- Comment #1 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Richard Biener from comment #0)
> We could vectorize gcc.dg/vect/pr65947-7.c if we implement the
> extendv4siv4hi pattern (sign-extend V4HI to V4SI). We can already do
> vec_unpacks_lo via
>
> pcmpgtw %xmm0, %xmm1
> movdqa %xmm0, %xmm2
> punpcklwd %xmm1, %xmm2
>
> and that would trivially extend to the required pattern - just the
> input is v4hi instead of v8hi.
>
> Other related patterns are probably missing as well, where we can do
> vec_unpack[s]_lo we should be able to implement [zero_]extend.

We have:

(define_expand "<insn>v4hiv4si2"
  [(set (match_operand:V4SI 0 "register_operand")
        (any_extend:V4SI
          (match_operand:V4HI 1 "nonimmediate_operand")))]
  "TARGET_SSE4_1"

in sse.md, so the testcase should be vectorized when compiled with -msse4.1.
Is there any other pattern missing for efficient vectorization?
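
For reference, the SSE2 sequence quoted from comment #0 (pcmpgtw followed by punpcklwd) can be sketched with intrinsics as below. This is only an illustration of the technique, not the code GCC emits; the function name sign_extend_4x16_to_4x32 is hypothetical, and the sketch assumes an x86 target with SSE2 available.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: sign-extend four int16_t values to four int32_t
   values using only SSE2, mirroring the pcmpgtw + punpcklwd sequence.  */
static void sign_extend_4x16_to_4x32(const int16_t in[4], int32_t out[4])
{
    /* Load the four 16-bit inputs into the low 64 bits of a register.  */
    __m128i x = _mm_loadl_epi64((const __m128i *)in);

    /* pcmpgtw against zero: each lane becomes 0xFFFF if the input lane
       is negative and 0x0000 otherwise, i.e. the sign bits replicated.  */
    __m128i sign = _mm_cmpgt_epi16(_mm_setzero_si128(), x);

    /* punpcklwd: interleave the low four value lanes with their sign
       lanes, giving four sign-extended 32-bit results.  */
    __m128i wide = _mm_unpacklo_epi16(x, sign);

    _mm_storeu_si128((__m128i *)out, wide);
}
```

With -msse4.1 the same widening is available as a single instruction (pmovsxwd, exposed as _mm_cvtepi16_epi32), which is presumably why the existing expander is gated on TARGET_SSE4_1.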