On 10/13/2011 06:44 AM, Jakub Jelinek wrote:
>       * config/i386/sse.md (reduc_umin_v8hi): New pattern.
>       * config/i386/i386.c (ix86_build_const_vector): Handle
>       also V32QI, V16QI, V16HI and V8HI modes.
>       (emit_reduc_half): New function.
>       (ix86_expand_reduc): Use phminposuw insn for V8HImode UMIN.
>       Use emit_reduc_half helper function.
> 
>       * gcc.target/i386/sse4_1-phminposuw-2.c: New test.
>       * gcc.target/i386/sse4_1-phminposuw-3.c: New test.
>       * gcc.target/i386/avx-vphminposuw-2.c: New test.
>       * gcc.target/i386/avx-vphminposuw-3.c: New test.

Ok.

>      case V8SFmode:
> +      if (i == 256)
> +     tem = gen_avx_vperm2f128v8sf3 (dest, src, src, const1_rtx);
> +      else
> +     tem = gen_avx_shufps256 (dest, src, src,
> +                              GEN_INT (i == 128 ? 2 + (3 << 2) : 1));

It occurs to me to wonder whether we'd get better performance by
dropping to a 128-bit vector during the first fold, and letting the
AVX 128-bit operations zero the high bits of the ymm register.

Definitely something for a future patch though.
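
To make the idea concrete, here is a rough scalar model in plain C of the two fold orders for an 8-lane float minimum. It is only a sketch: the function names are hypothetical, nothing here is the RTL from the patch, and the "full width" variant merely mimics the vperm2f128 / shufps immediates quoted above on ordinary arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Lane-wise minimum of the first n lanes: dst[i] = min(a[i], b[i]). */
static void lanes_min(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] < b[i] ? a[i] : b[i];
}

/* Permute within each 128-bit half (4 floats), like shufps with both
   sources equal.  dst and src must not alias. */
static void shuffle_halves(float *dst, const float *src, const int sel[4])
{
    for (int half = 0; half < 8; half += 4)
        for (int i = 0; i < 4; i++)
            dst[half + i] = src[half + sel[i]];
}

/* Hypothetical model of the current scheme: every fold stays 8 lanes wide. */
float reduce_min_full_width(const float v[8])
{
    static const int sel_fold2[4] = {2, 3, 0, 0};  /* imm 2 + (3 << 2) */
    static const int sel_fold3[4] = {1, 0, 0, 0};  /* imm 1 */
    float a[8], s[8];

    assert(v != NULL);

    /* Fold 1: swap the 128-bit halves (vperm2f128 with imm 1), then min. */
    for (int i = 0; i < 8; i++)
        s[i] = v[(i + 4) % 8];
    lanes_min(a, v, s, 8);

    /* Fold 2: bring lanes 2..3 of each half down to lanes 0..1. */
    shuffle_halves(s, a, sel_fold2);
    lanes_min(a, a, s, 8);

    /* Fold 3: bring lane 1 down to lane 0. */
    shuffle_halves(s, a, sel_fold3);
    lanes_min(a, a, s, 8);

    return a[0];  /* only lane 0 holds the full reduction */
}

/* Hypothetical model of the suggestion: the first fold combines the high
   128-bit half with the low half, and everything after is 4 lanes wide. */
float reduce_min_narrow_first(const float v[8])
{
    float x[4], y[2];

    assert(v != NULL);

    lanes_min(x, v, v + 4, 4);         /* fold 1: 256 -> 128 bits */
    lanes_min(y, x, x + 2, 2);         /* fold 2: lanes 2..3 vs 0..1 */
    return y[0] < y[1] ? y[0] : y[1];  /* fold 3: lane 1 vs lane 0 */
}
```

Both functions return the same value for any input, since only lane 0 of the final vector is consumed. The model says nothing about actual instruction timings, which would have to be measured.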


r~
