On 10/13/2011 06:44 AM, Jakub Jelinek wrote:
> 	* config/i386/sse.md (reduc_umin_v8hi): New pattern.
> 	* config/i386/i386.c (ix86_build_const_vector): Handle
> 	also V32QI, V16QI, V16HI and V8HI modes.
> 	(emit_reduc_half): New function.
> 	(ix86_expand_reduc): Use phminposuw insn for V8HImode UMIN.
> 	Use emit_reduc_half helper function.
>
> 	* gcc.target/i386/sse4_1-phminposuw-2.c: New test.
> 	* gcc.target/i386/sse4_1-phminposuw-3.c: New test.
> 	* gcc.target/i386/avx-vphminposuw-2.c: New test.
> 	* gcc.target/i386/avx-vphminposuw-3.c: New test.
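For the archives, the interesting bit in the V8HImode UMIN path is that
SSE4.1's phminposuw computes the horizontal unsigned minimum of all eight
16-bit elements in a single instruction, so the reduction collapses to one
insn plus an extract.  A rough intrinsics sketch of the equivalent
operation (illustration only, not the expander code from the patch):

  #include <smmintrin.h>   /* SSE4.1; compile with -msse4.1 */

  /* Sketch: _mm_minpos_epu16 (phminposuw) returns the minimum of the
     eight unsigned 16-bit elements in the low word of the result,
     with its index in bits 16..18 (unused here).  */
  static unsigned short
  umin_v8hi (__m128i v)
  {
    __m128i r = _mm_minpos_epu16 (v);
    return (unsigned short) _mm_extract_epi16 (r, 0);
  }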
Ok.

>     case V8SFmode:
> +      if (i == 256)
> +        tem = gen_avx_vperm2f128v8sf3 (dest, src, src, const1_rtx);
> +      else
> +        tem = gen_avx_shufps256 (dest, src, src,
> +                                 GEN_INT (i == 128 ? 2 + (3 << 2) : 1));

It occurs to me to wonder if we wouldn't get better performance dropping
to a 128-bit vector during the first fold.  Let the AVX 128-bit operations
zero the high bits of the ymm register.

Definitely something for a future patch though.
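Concretely, the idea looks roughly like the following intrinsics sketch
(illustration only; the function name and the choice of a float min
reduction are mine, not the patch's expander code).  Only the first fold
touches the full ymm register; everything after that stays in xmm, and the
VEX-encoded 128-bit operations zero the upper half of the ymm register
implicitly:

  #include <immintrin.h>   /* AVX; compile with -mavx */

  /* Sketch: fold the high 128-bit lane into the low one first, then
     finish the reduction entirely with 128-bit operations so no
     further cross-lane shuffle (vperm2f128) is needed.  */
  static float
  min_v8sf (__m256 v)
  {
    __m128 lo = _mm256_castps256_ps128 (v);
    __m128 hi = _mm256_extractf128_ps (v, 1);
    __m128 m = _mm_min_ps (lo, hi);                  /* 4 candidates */
    m = _mm_min_ps (m, _mm_movehl_ps (m, m));        /* 2 candidates */
    m = _mm_min_ss (m, _mm_shuffle_ps (m, m, 1));    /* 1 candidate  */
    return _mm_cvtss_f32 (m);
  }


r~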