On 10/12/2011 09:09 AM, Jakub Jelinek wrote:
>         /* Multiply the shuffle indicies by two.  */
> -       emit_insn (gen_avx2_lshlv8si3 (t1, t1, const1_rtx));
> +       if (maskmode == V8SImode)
> +         emit_insn (gen_avx2_lshlv8si3 (t1, t1, const1_rtx));
> +       else
> +         emit_insn (gen_addv32qi3 (t1, t1, t1));

I think it would be cleaner to use plus unconditionally here, and thus
expand_simple_binop, instead of (a couple of) these mode tests.
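
To illustrate why plus can serve for both element widths: doubling by x + x
gives the same result as a left shift by one in wrapping unsigned arithmetic,
for 8-bit and 32-bit elements alike. The standalone C below is only a scalar
sketch of that identity, not GCC code; all helper names in it are hypothetical.
In the expander itself, the replacement would presumably be something along
the lines of expand_simple_binop (maskmode, PLUS, t1, t1, t1, 1, OPTAB_DIRECT),
but that exact call is an assumption, not taken from the patch.

```c
#include <stdint.h>

/* Double one 8-bit shuffle index with a shift, wrapping modulo 256.  */
static uint8_t
double_shift_u8 (uint8_t x)
{
  return (uint8_t) (x << 1);
}

/* Double the same index with an add, wrapping modulo 256.  */
static uint8_t
double_add_u8 (uint8_t x)
{
  return (uint8_t) (x + x);
}

/* The 32-bit counterparts; unsigned arithmetic wraps modulo 2^32.  */
static uint32_t
double_shift_u32 (uint32_t x)
{
  return x << 1;
}

static uint32_t
double_add_u32 (uint32_t x)
{
  return x + x;
}

/* Return 1 if shift and add agree for every 8-bit value and for a few
   32-bit edge cases, 0 otherwise.  */
static int
doubling_agrees (void)
{
  static const uint32_t samples[] = {
    0u, 1u, 7u, 0x7fffffffu, 0x80000000u, 0xffffffffu
  };
  unsigned int i;

  for (i = 0; i < 256; i++)
    if (double_shift_u8 ((uint8_t) i) != double_add_u8 ((uint8_t) i))
      return 0;

  for (i = 0; i < sizeof samples / sizeof samples[0]; i++)
    if (double_shift_u32 (samples[i]) != double_add_u32 (samples[i]))
      return 0;

  return 1;
}
```

Since the two forms agree for every element width, a single PLUS covers both
the V8SImode and the V32QImode case, and the mode test goes away.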

>  
> +     case V32QImode:
> +       t1 = gen_reg_rtx (V32QImode);
> +       t2 = gen_reg_rtx (V32QImode);
> +       t3 = gen_reg_rtx (V32QImode);
> +       vt2 = GEN_INT (128);
> +       for (i = 0; i < 32; i++)
> +         vec[i] = vt2;
> +       vt = gen_rtx_CONST_VECTOR (V32QImode, gen_rtvec_v (32, vec));
> +       vt = force_reg (V32QImode, vt);
> +       for (i = 0; i < 32; i++)
> +         vec[i] = i < 16 ? vt2 : const0_rtx;
> +       vt2 = gen_rtx_CONST_VECTOR (V32QImode, gen_rtvec_v (32, vec));
> +       vt2 = force_reg (V32QImode, vt2);
> +       emit_insn (gen_avx2_lshlv4di3 (gen_lowpart (V4DImode, t1),
> +                                      gen_lowpart (V4DImode, mask),
> +                                      GEN_INT (3)));
> +       emit_insn (gen_avx2_andnotv32qi3 (t2, vt, mask));
> +       emit_insn (gen_xorv32qi3 (t1, t1, vt2));
> +       emit_insn (gen_andv32qi3 (t1, t1, vt));
> +       emit_insn (gen_iorv32qi3 (t3, t1, t2));
> +       emit_insn (gen_xorv32qi3 (t1, t1, vt));
> +       emit_insn (gen_avx2_permv4di_1 (gen_lowpart (V4DImode, t3),
> +                                       gen_lowpart (V4DImode, t3),
> +                                       const2_rtx, GEN_INT (3),
> +                                       const0_rtx, const1_rtx));
> +       emit_insn (gen_iorv32qi3 (t1, t1, t2));

Some commentary here is required.  I might have expected to see a compare,
or something, but the logical operations here are less than obvious.

I believe I've commented on everything else in the previous messages.


r~
