On 10/12/2011 09:09 AM, Jakub Jelinek wrote:
>    /* Multiply the shuffle indicies by two.  */
> -  emit_insn (gen_avx2_lshlv8si3 (t1, t1, const1_rtx));
> +  if (maskmode == V8SImode)
> +    emit_insn (gen_avx2_lshlv8si3 (t1, t1, const1_rtx));
> +  else
> +    emit_insn (gen_addv32qi3 (t1, t1, t1));

I guess it would be cleaner to use plus always, and thus
expand_simple_binop, instead of (a couple of) these mode tests.

> +    case V32QImode:
> +      t1 = gen_reg_rtx (V32QImode);
> +      t2 = gen_reg_rtx (V32QImode);
> +      t3 = gen_reg_rtx (V32QImode);
> +      vt2 = GEN_INT (128);
> +      for (i = 0; i < 32; i++)
> +        vec[i] = vt2;
> +      vt = gen_rtx_CONST_VECTOR (V32QImode, gen_rtvec_v (32, vec));
> +      vt = force_reg (V32QImode, vt);
> +      for (i = 0; i < 32; i++)
> +        vec[i] = i < 16 ? vt2 : const0_rtx;
> +      vt2 = gen_rtx_CONST_VECTOR (V32QImode, gen_rtvec_v (32, vec));
> +      vt2 = force_reg (V32QImode, vt2);
> +      emit_insn (gen_avx2_lshlv4di3 (gen_lowpart (V4DImode, t1),
> +                                     gen_lowpart (V4DImode, mask),
> +                                     GEN_INT (3)));
> +      emit_insn (gen_avx2_andnotv32qi3 (t2, vt, mask));
> +      emit_insn (gen_xorv32qi3 (t1, t1, vt2));
> +      emit_insn (gen_andv32qi3 (t1, t1, vt));
> +      emit_insn (gen_iorv32qi3 (t3, t1, t2));
> +      emit_insn (gen_xorv32qi3 (t1, t1, vt));
> +      emit_insn (gen_avx2_permv4di_1 (gen_lowpart (V4DImode, t3),
> +                                      gen_lowpart (V4DImode, t3),
> +                                      const2_rtx, GEN_INT (3),
> +                                      const0_rtx, const1_rtx));
> +      emit_insn (gen_iorv32qi3 (t1, t1, t2));

Some commentary here is required.  I might have expected to see a
compare, or something, but the logical operations here are less than
obvious.

I believe I've commented on everything else in the previous messages.


r~
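
For reference, the reason "plus always" can stand in for the shift in the
first hunk is that doubling an unsigned lane by x + x and by x << 1 gives
the same result modulo the lane width, whatever that width is. The following
is only a minimal stand-alone C sketch of that arithmetic on scalar lanes;
it is not GCC code, and the helper names shift_matches_add_u8 and
shift_matches_add_u32 are hypothetical, chosen for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: check every 8-bit lane value (the V32QI element
   width).  Returns 1 if x << 1 and x + x agree after truncation to 8 bits.  */
static int
shift_matches_add_u8 (void)
{
  unsigned v;

  for (v = 0; v < 256; v++)
    {
      uint8_t x = (uint8_t) v;
      uint8_t by_shift = (uint8_t) (x << 1);
      uint8_t by_add = (uint8_t) (x + x);

      if (by_shift != by_add)
        return 0;
    }
  return 1;
}

/* Hypothetical helper: spot-check 32-bit lanes (the V8SI element width),
   including values whose top bit is lost when doubled.  */
static int
shift_matches_add_u32 (void)
{
  static const uint32_t samples[] = {
    0u, 1u, 7u, 0x7fffffffu, 0x80000000u, 0xdeadbeefu, 0xffffffffu
  };
  size_t i;

  for (i = 0; i < sizeof samples / sizeof samples[0]; i++)
    {
      uint32_t x = samples[i];

      /* Unsigned arithmetic wraps, so both forms are well defined.  */
      if ((uint32_t) (x << 1) != (uint32_t) (x + x))
        return 0;
    }
  return 1;
}
```

Both helpers return 1, which is why a single mode-independent add can
replace the per-mode choice between a shift and an add.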