On Mon, 2018-01-15 at 09:08 -0600, Segher Boessenkool wrote:
> Hi Will,
>
> On Fri, Jan 12, 2018 at 03:22:06PM -0600, Will Schmidt wrote:
> > Add support for gimple folding of the mergeh, mergel intrinsics.
> > Since the merge low and merge high variants are almost identical, a
> > new helper function has been added so that code can be shared.
> >
> > This also adds define_insn for xxmrghw, xxmrglw instructions, allowing us
> > to generate xxmrglw instead of vmrglw after folding. A few whitespace
> > fixes have been made to the existing vmrg?w defines.
> >
> > The changes introduced here affect the existing target testcases
> > gcc.target/powerpc/builtins-1-be.c and builtins-1-le.c, such that
> > a number of the scan-assembler tests would fail due to instruction counts
> > changing. Since the purpose of that test is to primarily ensure those
> > intrinsics are accepted by the compiler, I have disabled gimple-folding for
> > the existing tests that count instructions, and created new variants of
> > those
> > tests with folding enabled and a higher optimization level, that do not
> > count
> > instructions.
>
> > 2018-01-12 Will Schmidt <[email protected]>
> >
> > * config/rs6000/rs6000.c: (rs6000_gimple_builtin) Add gimple folding
> > support for merge[hl].
>
> New line here.
>
> > (fold_mergehl_helper): New helper function.
> > * config/rs6000/altivec.md (altivec_xxmrghw_direct): New.
> > (altivec_xxmrglw_direct): New.
>
> > +(define_insn "altivec_xxmrghw_direct"
> > + [(set (match_operand:V4SI 0 "register_operand" "=v")
> > + (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
> > + (match_operand:V4SI 2 "register_operand" "v")]
> > + UNSPEC_VMRGH_DIRECT))]
> > + "TARGET_P8_VECTOR"
> > + "xxmrghw %x0,%x1,%x2"
> > + [(set_attr "type" "vecperm")])
> > +
> > (define_insn "altivec_vmrghw_direct"
> > [(set (match_operand:V4SI 0 "register_operand" "=v")
> > (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
> > (match_operand:V4SI 2 "register_operand" "v")]
> > UNSPEC_VMRGH_DIRECT))]
>
> How do these two differ? The xx variant can write all 64 VSR registers,
> it needs different constraints (wa?). Can the two patterns be merged?
> It doesn't need the TARGET_P8_VECTOR condition then: the constraints
> will handle that. And actually it is a v2.06 insn (p7)?
They differ in..
TARGET_P8_VECTOR, versus TARGET_ALTIVEC
xxmrghw %x0,%x1,%x2, versus vmrghw %0,%1,%2
Not clear to me if they can be merged. I'm weak in my grasp of the
constraints. I can dig into that, (and would accept additional hints
too :-) )
xxmrghw does show up in my book as V2.06.
In full, for reference:
(define_insn "altivec_xxmrghw_direct"
[(set (match_operand:V4SI 0 "register_operand" "=v")
(unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
(match_operand:V4SI 2 "register_operand" "v")]
UNSPEC_VMRGH_DIRECT))]
"TARGET_P8_VECTOR"
"xxmrghw %x0,%x1,%x2"
[(set_attr "type" "vecperm")])
(define_insn "altivec_vmrghw_direct"
[(set (match_operand:V4SI 0 "register_operand" "=v")
(unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
(match_operand:V4SI 2 "register_operand" "v")]
UNSPEC_VMRGH_DIRECT))]
"TARGET_ALTIVEC"
"vmrghw %0,%1,%2"
[(set_attr "type" "vecperm")])
>
> > (define_insn "altivec_vmrglb_direct"
> > [(set (match_operand:V16QI 0 "register_operand" "=v")
> > (unspec:V16QI [(match_operand:V16QI 1 "register_operand" "v")
>
> This line should start with a tab as well?
>
> > - (match_operand:V16QI 2 "register_operand" "v")]
> > - UNSPEC_VMRGL_DIRECT))]
> > + (match_operand:V16QI 2 "register_operand" "v")]
> > + UNSPEC_VMRGL_DIRECT))]
> > "TARGET_ALTIVEC"
> > "vmrglb %0,%1,%2"
> > [(set_attr "type" "vecperm")])
>
>
> > (define_insn "altivec_vmrglh_direct"
> > [(set (match_operand:V8HI 0 "register_operand" "=v")
> > (unspec:V8HI [(match_operand:V8HI 1 "register_operand" "v")
>
> Same here.
>
> > (match_operand:V8HI 2 "register_operand" "v")]
> > - UNSPEC_VMRGL_DIRECT))]
> > + UNSPEC_VMRGL_DIRECT))]
> > "TARGET_ALTIVEC"
> > "vmrglh %0,%1,%2"
> > [(set_attr "type" "vecperm")])
>
>
> > +(define_insn "altivec_xxmrglw_direct"
> > + [(set (match_operand:V4SI 0 "register_operand" "=v")
> > + (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
> > + (match_operand:V4SI 2 "register_operand" "v")]
> > + UNSPEC_VMRGL_DIRECT))]
> > + "TARGET_P8_VECTOR"
> > + "xxmrglw %x0,%x1,%x2"
> > + [(set_attr "type" "vecperm")])
>
> Exactly analogous to mrghw comments.
>
> > +/* Helper function to handle the vector merge[hl] built-ins. The
> > + implementation difference between h and l versions for this code are in
> > + the values used when building of the permute vector for high word versus
> > + low word merge. The variance is keyed off the use_high parameter. */
>
> The continuation lines should be indented by three spaces, so that the
> text lines up.
>
> > +static void
> > +fold_mergehl_helper (gimple_stmt_iterator *gsi, gimple *stmt, int use_high)
> > +{
> > + tree arg0 = gimple_call_arg (stmt, 0);
> > + tree arg1 = gimple_call_arg (stmt, 1);
> > + tree lhs = gimple_call_lhs (stmt);
> > + tree lhs_type = TREE_TYPE (lhs);
> > + tree lhs_type_type = TREE_TYPE (lhs_type);
> > + gimple *g;
> > + int n_elts = TYPE_VECTOR_SUBPARTS (lhs_type);
> > + vec<constructor_elt, va_gc> *ctor_elts = NULL;
> > + int midpoint = n_elts / 2;
> > + int offset = 0;
> > + if (use_high == 1)
> > + offset = midpoint;
> > + for (int i = 0; i < midpoint; i++)
> > + {
> > + tree tmp1 = build_int_cst (lhs_type_type, offset + i);
> > + tree tmp2 = build_int_cst (lhs_type_type, offset + n_elts + i);
> > + CONSTRUCTOR_APPEND_ELT (ctor_elts, NULL_TREE, tmp1);
> > + CONSTRUCTOR_APPEND_ELT (ctor_elts, NULL_TREE, tmp2);
> > + }
> > + tree permute = create_tmp_reg_or_ssa_name (lhs_type);
> > + g = gimple_build_assign (permute, build_constructor (lhs_type,
> > ctor_elts));
> > + gimple_set_location (g, gimple_location (stmt));
> > + gsi_insert_before (gsi, g, GSI_SAME_STMT);
> > + g = gimple_build_assign (lhs, VEC_PERM_EXPR, arg0, arg1, permute);
> > + gimple_set_location (g, gimple_location (stmt));
> > + gsi_replace (gsi, g, true);
> > +}
>
> Maybe add a few empty lines, to aid reading?
>
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1-be-folded.c
> > @@ -0,0 +1,11 @@
> > +/* { dg-do compile { target { powerpc64-*-* } } } */
>
> powerpc*-*-*, and if you really want to run on 64-bit only test lp64 (but
> is there any reason to?) (Same in the other tests).
I think this was a direct copy/paste from the other tests. I'll give it
another look and see.
>
> We really could use some big_endian / little_endian selectors, too; I'll
> see what I can do. No need to hold up this patch though.
>
>
> Segher
>