Richard Biener <richard.guent...@gmail.com> writes:
> On Fri, Jan 12, 2018 at 10:22 PM, Will Schmidt
> <will_schm...@vnet.ibm.com> wrote:
>> Hi,
>> Add support for gimple folding of the mergeh, mergel intrinsics.
>> Since the merge low and merge high variants are almost identical, a
>> new helper function has been added so that code can be shared.
>>
>> This also adds define_insns for the xxmrghw and xxmrglw instructions,
>> allowing us to generate xxmrglw instead of vmrglw after folding.  A few
>> whitespace fixes have been made to the existing vmrg?w defines.
>>
>> The changes introduced here affect the existing target testcases
>> gcc.target/powerpc/builtins-1-be.c and builtins-1-le.c, such that a
>> number of the scan-assembler tests would fail due to instruction counts
>> changing.  Since the purpose of those tests is primarily to ensure the
>> intrinsics are accepted by the compiler, I have disabled gimple-folding
>> for the existing tests that count instructions, and created new variants
>> of those tests with folding enabled and a higher optimization level,
>> which do not count instructions.
>>
>> Regtests are currently running across assorted power systems.
>> OK for trunk, pending successful results?
>>
>> Thanks,
>> -Will
>>
>> [gcc]
>>
>> 2018-01-12  Will Schmidt  <will_schm...@vnet.ibm.com>
>>
>>         * config/rs6000/rs6000.c (rs6000_gimple_builtin): Add gimple
>>         folding support for merge[hl].
>>         (fold_mergehl_helper): New helper function.
>>         * config/rs6000/altivec.md (altivec_xxmrghw_direct): New.
>>         (altivec_xxmrglw_direct): New.
>>
>> [testsuite]
>>
>> 2018-01-12  Will Schmidt  <will_schm...@vnet.ibm.com>
>>
>>         * gcc.target/powerpc/fold-vec-mergehl-char.c: New.
>>         * gcc.target/powerpc/fold-vec-mergehl-double.c: New.
>>         * gcc.target/powerpc/fold-vec-mergehl-float.c: New.
>>         * gcc.target/powerpc/fold-vec-mergehl-int.c: New.
>>         * gcc.target/powerpc/fold-vec-mergehl-longlong.c: New.
>>         * gcc.target/powerpc/fold-vec-mergehl-pixel.c: New.
>>         * gcc.target/powerpc/fold-vec-mergehl-short.c: New.
>>         * gcc.target/powerpc/builtins-1-be.c: Disable gimple-folding.
>>         * gcc.target/powerpc/builtins-1-le.c: Disable gimple-folding.
>>         * gcc.target/powerpc/builtins-1-be-folded.c: New.
>>         * gcc.target/powerpc/builtins-1-le-folded.c: New.
>>         * gcc.target/powerpc/builtins-1.fold.h: New.
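For reference, a minimal sketch of the kind of code these intrinsics
appear in (illustrative only, not the contents of the new testcases;
element order shown using the usual big-endian numbering):

#include <altivec.h>

vector signed int
merge_high (vector signed int a, vector signed int b)
{
  /* Interleaves the high halves: { a[0], b[0], a[1], b[1] }.  */
  return vec_mergeh (a, b);
}

vector signed int
merge_low (vector signed int a, vector signed int b)
{
  /* Interleaves the low halves: { a[2], b[2], a[3], b[3] }.  */
  return vec_mergel (a, b);
}

With the folding in place, each of these presumably becomes a
VEC_PERM_EXPR with a constant selector at the gimple level, after which
the backend is free to pick xxmrghw/xxmrglw rather than vmrghw/vmrglw.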
>>
>> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
>> index 733d920..65d4548 100644
>> --- a/gcc/config/rs6000/altivec.md
>> +++ b/gcc/config/rs6000/altivec.md
>> @@ -1101,10 +1101,20 @@
>>      else
>>        return "vmrglw %0,%2,%1";
>>  }
>>    [(set_attr "type" "vecperm")])
>>
>> +
>> +(define_insn "altivec_xxmrghw_direct"
>> +  [(set (match_operand:V4SI 0 "register_operand" "=v")
>> +        (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
>> +                      (match_operand:V4SI 2 "register_operand" "v")]
>> +                     UNSPEC_VMRGH_DIRECT))]
>> +  "TARGET_P8_VECTOR"
>> +  "xxmrghw %x0,%x1,%x2"
>> +  [(set_attr "type" "vecperm")])
>> +
>>  (define_insn "altivec_vmrghw_direct"
>>    [(set (match_operand:V4SI 0 "register_operand" "=v")
>>          (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
>>                        (match_operand:V4SI 2 "register_operand" "v")]
>>                       UNSPEC_VMRGH_DIRECT))]
>> @@ -1185,12 +1195,12 @@
>>    [(set_attr "type" "vecperm")])
>>
>>  (define_insn "altivec_vmrglb_direct"
>>    [(set (match_operand:V16QI 0 "register_operand" "=v")
>>          (unspec:V16QI [(match_operand:V16QI 1 "register_operand" "v")
>> -                      (match_operand:V16QI 2 "register_operand" "v")]
>> -                      UNSPEC_VMRGL_DIRECT))]
>> +                       (match_operand:V16QI 2 "register_operand" "v")]
>> +                      UNSPEC_VMRGL_DIRECT))]
>>    "TARGET_ALTIVEC"
>>    "vmrglb %0,%1,%2"
>>    [(set_attr "type" "vecperm")])
>>
>>  (define_expand "altivec_vmrglh"
>> @@ -1242,11 +1252,11 @@
>>
>>  (define_insn "altivec_vmrglh_direct"
>>    [(set (match_operand:V8HI 0 "register_operand" "=v")
>>          (unspec:V8HI [(match_operand:V8HI 1 "register_operand" "v")
>>                        (match_operand:V8HI 2 "register_operand" "v")]
>> -                     UNSPEC_VMRGL_DIRECT))]
>> +                      UNSPEC_VMRGL_DIRECT))]
>>    "TARGET_ALTIVEC"
>>    "vmrglh %0,%1,%2"
>>    [(set_attr "type" "vecperm")])
>>
>>  (define_expand "altivec_vmrglw"
>> @@ -1290,10 +1300,19 @@
>>      else
>>        return "vmrghw %0,%2,%1";
>>  }
>>    [(set_attr "type" "vecperm")])
>>
>> +(define_insn "altivec_xxmrglw_direct"
>> +  [(set (match_operand:V4SI 0 "register_operand" "=v")
>> +        (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
>> +                      (match_operand:V4SI 2 "register_operand" "v")]
>> +                     UNSPEC_VMRGL_DIRECT))]
>> +  "TARGET_P8_VECTOR"
>> +  "xxmrglw %x0,%x1,%x2"
>> +  [(set_attr "type" "vecperm")])
>> +
>>  (define_insn "altivec_vmrglw_direct"
>>    [(set (match_operand:V4SI 0 "register_operand" "=v")
>>          (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
>>                        (match_operand:V4SI 2 "register_operand" "v")]
>>                       UNSPEC_VMRGL_DIRECT))]
>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>> index 840b83c..18a7424 100644
>> --- a/gcc/config/rs6000/rs6000.c
>> +++ b/gcc/config/rs6000/rs6000.c
>> @@ -16121,10 +16121,45 @@ fold_compare_helper (gimple_stmt_iterator *gsi, tree_code code, gimple *stmt)
>>    gimple *g = gimple_build_assign (lhs, cmp);
>>    gimple_set_location (g, gimple_location (stmt));
>>    gsi_replace (gsi, g, true);
>>  }
>>
>> +/* Helper function to handle the vector merge[hl] built-ins.  The
>> +   implementation difference between h and l versions for this code are in
>> +   the values used when building of the permute vector for high word versus
>> +   low word merge.  The variance is keyed off the use_high parameter.  */
>> +static void
>> +fold_mergehl_helper (gimple_stmt_iterator *gsi, gimple *stmt, int use_high)
>> +{
>> +  tree arg0 = gimple_call_arg (stmt, 0);
>> +  tree arg1 = gimple_call_arg (stmt, 1);
>> +  tree lhs = gimple_call_lhs (stmt);
>> +  tree lhs_type = TREE_TYPE (lhs);
>> +  tree lhs_type_type = TREE_TYPE (lhs_type);
>> +  gimple *g;
>> +  int n_elts = TYPE_VECTOR_SUBPARTS (lhs_type);
>> +  vec<constructor_elt, va_gc> *ctor_elts = NULL;
>> +  int midpoint = n_elts / 2;
>> +  int offset = 0;
>> +  if (use_high == 1)
>> +    offset = midpoint;
>> +  for (int i = 0; i < midpoint; i++)
>> +    {
>> +      tree tmp1 = build_int_cst (lhs_type_type, offset + i);
>> +      tree tmp2 = build_int_cst (lhs_type_type, offset + n_elts + i);
>> +      CONSTRUCTOR_APPEND_ELT (ctor_elts, NULL_TREE, tmp1);
>> +      CONSTRUCTOR_APPEND_ELT (ctor_elts, NULL_TREE, tmp2);
>> +    }
>> +  tree permute = create_tmp_reg_or_ssa_name (lhs_type);
>> +  g = gimple_build_assign (permute, build_constructor (lhs_type, ctor_elts));
>
> I think this is no longer canonical GIMPLE (Richard?)
FWIW, although the recent patches added the option of using wider
permute vectors if the permute vector is constant, it's still OK to use
the original style of permute vectors if that's known to be valid.
In this case it is because we know the indices won't wrap, given the
size of the input vectors.

> and given it is also a constant you shouldn't emit a CONSTRUCTOR here
> but directly construct the appropriate VECTOR_CST.  So it looks like
> the mergel/h intrinsics interleave the low or high part of two
> vectors?
>
> Thanks,
> Richard
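In case a concrete shape helps, here is a rough sketch (mine, not from
the patch) of what building the selector directly as a VECTOR_CST could
look like, assuming the tree_vector_builder interface from the recent
VECTOR_CST rework and an integral element type for the selector:

/* Build the constant selector for a merge high/low of two
   n_elts-element vectors.  Hypothetical helper; names are not from the
   patch.  Requires tree-vector-builder.h for tree_vector_builder.  */
static tree
build_merge_permute_mask (tree sel_type, int n_elts, int use_high)
{
  int midpoint = n_elts / 2;
  int offset = use_high ? midpoint : 0;
  tree elt_type = TREE_TYPE (sel_type);

  /* n_elts patterns of one element each; build () returns a VECTOR_CST.  */
  tree_vector_builder builder (sel_type, n_elts, 1);
  for (int i = 0; i < midpoint; i++)
    {
      builder.quick_push (build_int_cst (elt_type, offset + i));
      builder.quick_push (build_int_cst (elt_type, offset + n_elts + i));
    }
  return builder.build ();
}

The result could then feed the VEC_PERM_EXPR operand directly, with no
separate SSA temporary or CONSTRUCTOR assignment needed.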