On Thu, 2017-09-14 at 09:38 -0500, Bill Schmidt wrote:
> On Sep 14, 2017, at 5:15 AM, Richard Biener <richard.guent...@gmail.com>
> wrote:
> > 
> > On Wed, Sep 13, 2017 at 10:14 PM, Bill Schmidt
> > <wschm...@linux.vnet.ibm.com> wrote:
> >> On Sep 13, 2017, at 10:40 AM, Bill Schmidt <wschm...@linux.vnet.ibm.com>
> >> wrote:
> >>> 
> >>> On Sep 13, 2017, at 7:23 AM, Richard Biener <richard.guent...@gmail.com>
> >>> wrote:
> >>>> 
> >>>> On Tue, Sep 12, 2017 at 11:08 PM, Will Schmidt
> >>>> <will_schm...@vnet.ibm.com> wrote:
> >>>>> Hi,
> >>>>> 
> >>>>> [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE
> >>>>> 
> >>>>> Add code to handle gimple folding for the vec_ld builtins.  Remove
> >>>>> the now-obsolete folding code for vec_ld from rs6000-c.c.
> >>>>> Surrounding comments have been adjusted slightly so they continue
> >>>>> to read OK for the existing vec_st code.
> >>>>> 
> >>>>> The resulting code is specifically verified by the
> >>>>> powerpc/fold-vec-ld-*.c tests, which have been posted separately.
> >>>>> 
> >>>>> For v2 of this patch, I've removed the chunk of code that
> >>>>> prohibited the gimple fold from occurring in BE environments.  This
> >>>>> had fixed an issue for me earlier during my development of the
> >>>>> code, and it turns out it was not necessary.  I've sniff-tested
> >>>>> after removing that check and it looks OK.
> >>>>> 
> >>>>>> +      /* Limit folding of loads to LE targets.  */
> >>>>>> +      if (BYTES_BIG_ENDIAN || VECTOR_ELT_ORDER_BIG)
> >>>>>> +       return false;
> >>>>> 
> >>>>> I've restarted a regression test on this updated version.
> >>>>> 
> >>>>> OK for trunk (assuming successful regression test completion)?
> >>>>> 
> >>>>> Thanks,
> >>>>> -Will
> >>>>> 
> >>>>> [gcc]
> >>>>> 
> >>>>> 2017-09-12  Will Schmidt  <will_schm...@vnet.ibm.com>
> >>>>> 
> >>>>>         * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add handling
> >>>>>         for early folding of vector loads (ALTIVEC_BUILTIN_LVX_*).
> >>>>>         * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
> >>>>>         Remove obsoleted code for handling ALTIVEC_BUILTIN_VEC_LD.
> >>>>> 
> >>>>> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
> >>>>> index fbab0a2..bb8a77d 100644
> >>>>> --- a/gcc/config/rs6000/rs6000-c.c
> >>>>> +++ b/gcc/config/rs6000/rs6000-c.c
> >>>>> @@ -6470,92 +6470,19 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
> >>>>>                           convert (TREE_TYPE (stmt), arg0));
> >>>>>       stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
> >>>>>       return stmt;
> >>>>>     }
> >>>>> 
> >>>>> -  /* Expand vec_ld into an expression that masks the address and
> >>>>> -     performs the load.  We need to expand this early to allow
> >>>>> +  /* Expand vec_st into an expression that masks the address and
> >>>>> +     performs the store.  We need to expand this early to allow
> >>>>>      the best aliasing, as by the time we get into RTL we no longer
> >>>>>      are able to honor __restrict__, for example.  We may want to
> >>>>>      consider this for all memory access built-ins.
> >>>>> 
> >>>>>      When -maltivec=be is specified, or the wrong number of arguments
> >>>>>      is provided, simply punt to existing built-in processing.  */
> >>>>> -  if (fcode == ALTIVEC_BUILTIN_VEC_LD
> >>>>> -      && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)
> >>>>> -      && nargs == 2)
> >>>>> -    {
> >>>>> -      tree arg0 = (*arglist)[0];
> >>>>> -      tree arg1 = (*arglist)[1];
> >>>>> -
> >>>>> -      /* Strip qualifiers like "const" from the pointer arg.  */
> >>>>> -      tree arg1_type = TREE_TYPE (arg1);
> >>>>> -      if (!POINTER_TYPE_P (arg1_type) && TREE_CODE (arg1_type) != ARRAY_TYPE)
> >>>>> -       goto bad;
> >>>>> -
> >>>>> -      tree inner_type = TREE_TYPE (arg1_type);
> >>>>> -      if (TYPE_QUALS (TREE_TYPE (arg1_type)) != 0)
> >>>>> -       {
> >>>>> -         arg1_type = build_pointer_type (build_qualified_type (inner_type,
> >>>>> -                                                               0));
> >>>>> -         arg1 = fold_convert (arg1_type, arg1);
> >>>>> -       }
> >>>>> -
> >>>>> -      /* Construct the masked address.  Let existing error handling take
> >>>>> -        over if we don't have a constant offset.  */
> >>>>> -      arg0 = fold (arg0);
> >>>>> -
> >>>>> -      if (TREE_CODE (arg0) == INTEGER_CST)
> >>>>> -       {
> >>>>> -         if (!ptrofftype_p (TREE_TYPE (arg0)))
> >>>>> -           arg0 = build1 (NOP_EXPR, sizetype, arg0);
> >>>>> -
> >>>>> -         tree arg1_type = TREE_TYPE (arg1);
> >>>>> -         if (TREE_CODE (arg1_type) == ARRAY_TYPE)
> >>>>> -           {
> >>>>> -             arg1_type = TYPE_POINTER_TO (TREE_TYPE (arg1_type));
> >>>>> -             tree const0 = build_int_cstu (sizetype, 0);
> >>>>> -             tree arg1_elt0 = build_array_ref (loc, arg1, const0);
> >>>>> -             arg1 = build1 (ADDR_EXPR, arg1_type, arg1_elt0);
> >>>>> -           }
> >>>>> -
> >>>>> -         tree addr = fold_build2_loc (loc, POINTER_PLUS_EXPR, arg1_type,
> >>>>> -                                      arg1, arg0);
> >>>>> -         tree aligned = fold_build2_loc (loc, BIT_AND_EXPR, arg1_type, addr,
> >>>>> -                                         build_int_cst (arg1_type, -16));
> >>>>> -
> >>>>> -         /* Find the built-in to get the return type so we can convert
> >>>>> -            the result properly (or fall back to default handling if the
> >>>>> -            arguments aren't compatible).  */
> >>>>> -         for (desc = altivec_overloaded_builtins;
> >>>>> -              desc->code && desc->code != fcode; desc++)
> >>>>> -           continue;
> >>>>> -
> >>>>> -         for (; desc->code == fcode; desc++)
> >>>>> -           if (rs6000_builtin_type_compatible (TREE_TYPE (arg0), desc->op1)
> >>>>> -               && (rs6000_builtin_type_compatible (TREE_TYPE (arg1),
> >>>>> -                                                   desc->op2)))
> >>>>> -             {
> >>>>> -               tree ret_type = rs6000_builtin_type (desc->ret_type);
> >>>>> -               if (TYPE_MODE (ret_type) == V2DImode)
> >>>>> -                 /* Type-based aliasing analysis thinks vector long
> >>>>> -                    and vector long long are different and will put them
> >>>>> -                    in distinct alias classes.  Force our return type
> >>>>> -                    to be a may-alias type to avoid this.  */
> >>>>> -                 ret_type
> >>>>> -                   = build_pointer_type_for_mode (ret_type, Pmode,
> >>>>> -                                                  true/*can_alias_all*/);
> >>>>> -               else
> >>>>> -                 ret_type = build_pointer_type (ret_type);
> >>>>> -               aligned = build1 (NOP_EXPR, ret_type, aligned);
> >>>>> -               tree ret_val = build_indirect_ref (loc, aligned, RO_NULL);
> >>>>> -               return ret_val;
> >>>>> -             }
> >>>>> -       }
> >>>>> -    }
> >>>>> 
> >>>>> -  /* Similarly for stvx.  */
> >>>>>   if (fcode == ALTIVEC_BUILTIN_VEC_ST
> >>>>>       && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)
> >>>>>       && nargs == 3)
> >>>>>     {
> >>>>>       tree arg0 = (*arglist)[0];
> >>>>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> >>>>> index 1338371..1fb5f44 100644
> >>>>> --- a/gcc/config/rs6000/rs6000.c
> >>>>> +++ b/gcc/config/rs6000/rs6000.c
> >>>>> @@ -16547,10 +16547,61 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
> >>>>>         res = gimple_build (&stmts, VIEW_CONVERT_EXPR, TREE_TYPE (lhs), res);
> >>>>>         gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
> >>>>>         update_call_from_tree (gsi, res);
> >>>>>         return true;
> >>>>>       }
> >>>>> +    /* Vector loads.  */
> >>>>> +    case ALTIVEC_BUILTIN_LVX_V16QI:
> >>>>> +    case ALTIVEC_BUILTIN_LVX_V8HI:
> >>>>> +    case ALTIVEC_BUILTIN_LVX_V4SI:
> >>>>> +    case ALTIVEC_BUILTIN_LVX_V4SF:
> >>>>> +    case ALTIVEC_BUILTIN_LVX_V2DI:
> >>>>> +    case ALTIVEC_BUILTIN_LVX_V2DF:
> >>>>> +      {
> >>>>> +        gimple *g;
> >>>>> +        arg0 = gimple_call_arg (stmt, 0);  // offset
> >>>>> +        arg1 = gimple_call_arg (stmt, 1);  // address
> >>>>> +
> >>>>> +        lhs = gimple_call_lhs (stmt);
> >>>>> +        location_t loc = gimple_location (stmt);
> >>>>> +
> >>>>> +        tree arg1_type = TREE_TYPE (arg1);
> >>>>> +        tree lhs_type = TREE_TYPE (lhs);
> >>>>> +
> >>>>> +        /* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.
> >>>>> +           Create the tree using the value from arg0.  The resulting
> >>>>> +           type will match the type of arg1.  */
> >>>>> +        tree temp_offset = create_tmp_reg_or_ssa_name (sizetype);
> >>>>> +        g = gimple_build_assign (temp_offset, NOP_EXPR, arg0);
> >>>>> +        gimple_set_location (g, loc);
> >>>>> +        gsi_insert_before (gsi, g, GSI_SAME_STMT);
> >>>>> +        tree temp_addr = create_tmp_reg_or_ssa_name (arg1_type);
> >>>>> +        g = gimple_build_assign (temp_addr, POINTER_PLUS_EXPR, arg1,
> >>>>> +                                 temp_offset);
> >>>>> +        gimple_set_location (g, loc);
> >>>>> +        gsi_insert_before (gsi, g, GSI_SAME_STMT);
> >>>>> +
> >>>>> +        /* Mask off any lower bits from the address.  */
> >>>>> +        tree alignment_mask = build_int_cst (arg1_type, -16);
> >>>>> +        tree aligned_addr = create_tmp_reg_or_ssa_name (arg1_type);
> >>>>> +        g = gimple_build_assign (aligned_addr, BIT_AND_EXPR,
> >>>>> +                                 temp_addr, alignment_mask);
> >>>>> +        gimple_set_location (g, loc);
> >>>>> +        gsi_insert_before (gsi, g, GSI_SAME_STMT);
> >>>> 
> >>>> You could use
> >>>> 
> >>>>   gimple_seq stmts = NULL;
> >>>>   tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg0);
> >>>>   tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
> >>>>                                  arg1_type, arg1, temp_offset);
> >>>>   tree aligned_addr = gimple_build (&stmts, loc, BIT_AND_EXPR,
> >>>>                                     arg1_type, temp_addr,
> >>>>                                     build_int_cst (arg1_type, -16));
> >>>>   gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
> >>>> 
> >>>>> +        /* Use the build2 helper to set up the mem_ref.  The MEM_REF
> >>>>> +           could also take an offset, but since we've already
> >>>>> +           incorporated the offset above, here we just pass in a
> >>>>> +           zero.  */
> >>>>> +        g = gimple_build_assign (lhs, build2 (MEM_REF, lhs_type, aligned_addr,
> >>>>> +                                              build_int_cst (arg1_type, 0)));
> >>>> 
> >>>> are you sure about arg1_type here?  I'm sure not.  For
> >>>> 
> >>>>   ... foo (struct S *p)
> >>>>   {
> >>>>     return __builtin_lvx_v2df (4, (double *)p);
> >>>>   }
> >>>> 
> >>>> you'd end up with p as arg1 and thus struct S * as arg1_type and thus
> >>>> TBAA using 'struct S' to access the memory.
> >>> 
> >>> Hm, is that so?  Wouldn't arg1_type be double* since arg1 is
> >>> (double *)p?  Will, you should probably test this example and see,
> >>> but I'm pretty confident about this (see below).
> >> 
> >> But, as I should have suspected, you're right.  For some reason
> >> gimple_call_arg is returning p, stripped of the cast information where
> >> the user asserted that p points to a double*.
> >> 
> >> Can you explain to me why this should be so?  I assume that somebody
> >> has decided to strip_nops the argument and lose the cast.
> > 
> > pointer types have no meaning in GIMPLE so we aggressively prune them.
> > 
> >> Using ptr_type_node loses all type information, so that would be a
> >> regression from what we do today.
> >> In some cases we could reconstruct that this was necessarily, say, a
> >> double*, but I don't know how we would recover the signedness for an
> >> integer type.
> > 
> > How did we handle the expansion previously - ah - it was done earlier
> > in the C FE.  So why are you moving it to GIMPLE?  The function is
> > called resolve_overloaded_builtin - what kind of overloading do you
> > resolve here?  As said, argument types might not be preserved.
> 
> The AltiVec builtins allow overloaded names based on the argument
> types, using a special callout during parsing to convert the overloaded
> names to type-specific names.  Historically these have then remained
> builtin calls until RTL expansion, which loses a lot of useful
> optimization.  Will has been gradually implementing gimple folding for
> these builtins so that we can optimize simple vector arithmetic and so
> on.  The overloading is still dealt with during parsing.
> 
> As an example:
> 
>   double a[64];
>   vector double x = vec_ld (0, a);
> 
> will get translated into
> 
>   vector double x = __builtin_altivec_lvx_v2df (0, a);
> 
> and
> 
>   unsigned char b[64];
>   vector unsigned char y = vec_ld (0, b);
> 
> will get translated into
> 
>   vector unsigned char y = __builtin_altivec_lvx_v16qi (0, b);
> 
> So in resolving the overloading we still maintain the type info for
> arg1.
> 
> Earlier I had dealt with the performance issue in a different way for
> the vec_ld and vec_st overloaded builtins, which created the rather
> grotty code in rs6000-c.c to modify the parse trees instead.  My hope
> was that we could simplify the code by having Will deal with them as
> gimple folds instead.  But if in so doing we lose type information,
> that may not be the right call.
> 
> However, since you say that gimple aggressively removes the casts from
> pointer types, perhaps the code that we see in early gimple from the
> existing method might also be missing the type information?  Will, it
> would be worth looking at that code to see.  If it's no different, then
> perhaps we still go ahead with the folding.
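(For reference, the dumps below come from a small test along these
lines.  This is an approximate reconstruction rather than my exact
file -- the function names, the struct S body, and the compile flags
(-maltivec -mvsx, LE target) are assumptions here:)

  #include <altivec.h>

  struct S { double d; };

  vector double
  testld_var (long long ll1, struct S *p)
  {
    /* Variable offset: arg0 is not constant at fold time.  */
    return vec_ld (ll1, (vector double *) p);
  }

  vector double
  testld_const (struct S *p)
  {
    /* Constant offset: arg0 folds to an INTEGER_CST.  */
    return vec_ld (16, (vector double *) p);
  }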
The rs6000-c.c version of the code did not fold unless arg0 was
constant; and when it was constant, it appears the operation got turned
directly into a dereference.  So there isn't a good before/after
comparison there.

What I see:

  return vec_ld (ll1, (vector double *)p);

at gimple time after the rs6000-c.c folding was a mostly un-folded

  D.3207 = __builtin_altivec_lvx_v2dfD.1443 (ll1D.3192, pD.3193);

while this, with a constant value for arg0:

  return vec_ld (16, (vector double *)p);

at gimple time after rs6000-c.c folding became a reference:

  _1 = p + 16;
  _2 = _1 & -16B;
  D.3196 = *_2;

With the rs6000.c gimple folding code (the changes I've got locally),
the before/after when arg0 is constant reads the same.  When arg0 is a
variable:

  return vec_ld (ll1, (vector double *)p);

in the gimple dump then becomes:

  D.3208 = (sizetype) ll1D.3192;
  D.3209 = pD.3193 + D.3208;
  D.3210 = D.3209 & -16B;
  D.3207 = MEM[(struct S *)D.3210];

And if I change the code such that arg1_type is instead ptr_type_node:

  D.3207 = MEM[(voidD.44 *)D.3210];

So...

> Another note for Will:  The existing code gives up when -maltivec=be
> has been specified, and you probably want to do that as well.  That
> may be why you initially turned off big endian -- it is easy to
> misread that code.  -maltivec=be is
> VECTOR_ELT_ORDER_BIG && !BYTES_BIG_ENDIAN.

Yeah, I apparently inverted and confused the logic when I made that
change.  My current snippet reads as:

  /* Do not fold for -maltivec=be on LE targets.  */
  if (VECTOR_ELT_ORDER_BIG && !BYTES_BIG_ENDIAN)
    return false;

> Thanks,
> Bill
> 
> > 
> > Richard.
> > 
> >> Bill
> >>> 
> >>>> 
> >>>> I think if the builtins have any TBAA constraints you need to build
> >>>> those explicitly, if not, you should use ptr_type_node aka no TBAA.
> >>> 
> >>> The type signatures are constrained during parsing, so we should
> >>> only see allowed pointer types on arg1 by the time we get to gimple
> >>> folding.  I think that using arg1_type should work, but I am
> >>> probably missing something subtle, so please feel free to whack me
> >>> on the temple until I get it. :-)
> >>> 
> >>> Bill
> >>>> 
> >>>> Richard.
> >>>> 
> >>>>> +        gimple_set_location (g, loc);
> >>>>> +        gsi_replace (gsi, g, true);
> >>>>> +
> >>>>> +        return true;
> >>>>> +
> >>>>> +      }
> >>>>> +
> >>>>>     default:
> >>>>>       if (TARGET_DEBUG_BUILTIN)
> >>>>>         fprintf (stderr, "gimple builtin intrinsic not matched:%d %s %s\n",
> >>>>>                  fn_code, fn_name1, fn_name2);
> >>>>>       break;
>
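For a v3, with Richard's gimple_seq helpers folded in, I'd expect the
body of the folding to shrink to something like this (untested sketch;
it keeps arg1_type for the MEM_REF, which is exactly the still-open
TBAA question above):

  /* Build the offset-and-align address computation as one gimple_seq,
     then replace the builtin call itself with the MEM_REF load.  */
  gimple_seq stmts = NULL;
  tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg0);
  tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
                                 arg1_type, arg1, temp_offset);
  tree aligned_addr = gimple_build (&stmts, loc, BIT_AND_EXPR,
                                    arg1_type, temp_addr,
                                    build_int_cst (arg1_type, -16));
  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
  gimple *g = gimple_build_assign (lhs, build2 (MEM_REF, lhs_type,
                                                aligned_addr,
                                                build_int_cst (arg1_type, 0)));
  gimple_set_location (g, loc);
  gsi_replace (gsi, g, true);
  return true;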