On Wed, Sep 13, 2017 at 10:14 PM, Bill Schmidt
<wschm...@linux.vnet.ibm.com> wrote:
> On Sep 13, 2017, at 10:40 AM, Bill Schmidt <wschm...@linux.vnet.ibm.com>
> wrote:
>>
>> On Sep 13, 2017, at 7:23 AM, Richard Biener <richard.guent...@gmail.com>
>> wrote:
>>>
>>> On Tue, Sep 12, 2017 at 11:08 PM, Will Schmidt
>>> <will_schm...@vnet.ibm.com> wrote:
>>>> Hi,
>>>>
>>>> [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE
>>>>
>>>> Folding of vector loads in GIMPLE.
>>>>
>>>> Add code to handle gimple folding for the vec_ld builtins.
>>>> Remove the now-obsolete folding code for vec_ld from rs6000-c.c.
>>>> Surrounding comments have been adjusted slightly so they continue to
>>>> read OK for the existing vec_st code.
>>>>
>>>> The resulting code is specifically verified by the
>>>> powerpc/fold-vec-ld-*.c tests, which have been posted separately.
>>>>
>>>> For V2 of this patch, I've removed the chunk of code that prohibited
>>>> the gimple fold from occurring in BE environments.  This had fixed an
>>>> issue for me earlier during my development of the code, and it turns
>>>> out it was not necessary.  I've sniff-tested after removing that check
>>>> and it looks OK.
>>>>
>>>>> +  /* Limit folding of loads to LE targets.  */
>>>>> +  if (BYTES_BIG_ENDIAN || VECTOR_ELT_ORDER_BIG)
>>>>> +    return false;
>>>>
>>>> I've restarted a regression test on this updated version.
>>>>
>>>> OK for trunk (assuming successful regression test completion)?
>>>>
>>>> Thanks,
>>>> -Will
>>>>
>>>> [gcc]
>>>>
>>>> 2017-09-12  Will Schmidt  <will_schm...@vnet.ibm.com>
>>>>
>>>>         * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add handling
>>>>         for early folding of vector loads (ALTIVEC_BUILTIN_LVX_*).
>>>>         * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
>>>>         Remove obsoleted code for handling ALTIVEC_BUILTIN_VEC_LD.
>>>>
>>>> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
>>>> index fbab0a2..bb8a77d 100644
>>>> --- a/gcc/config/rs6000/rs6000-c.c
>>>> +++ b/gcc/config/rs6000/rs6000-c.c
>>>> @@ -6470,92 +6470,19 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
>>>>                              convert (TREE_TYPE (stmt), arg0));
>>>>        stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
>>>>        return stmt;
>>>>      }
>>>>
>>>> -  /* Expand vec_ld into an expression that masks the address and
>>>> -     performs the load.  We need to expand this early to allow
>>>> +  /* Expand vec_st into an expression that masks the address and
>>>> +     performs the store.  We need to expand this early to allow
>>>>       the best aliasing, as by the time we get into RTL we no longer
>>>>       are able to honor __restrict__, for example.  We may want to
>>>>       consider this for all memory access built-ins.
>>>>
>>>>       When -maltivec=be is specified, or the wrong number of arguments
>>>>       is provided, simply punt to existing built-in processing.  */
>>>> -  if (fcode == ALTIVEC_BUILTIN_VEC_LD
>>>> -      && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)
>>>> -      && nargs == 2)
>>>> -    {
>>>> -      tree arg0 = (*arglist)[0];
>>>> -      tree arg1 = (*arglist)[1];
>>>> -
>>>> -      /* Strip qualifiers like "const" from the pointer arg.  */
>>>> -      tree arg1_type = TREE_TYPE (arg1);
>>>> -      if (!POINTER_TYPE_P (arg1_type) && TREE_CODE (arg1_type) != ARRAY_TYPE)
>>>> -        goto bad;
>>>> -
>>>> -      tree inner_type = TREE_TYPE (arg1_type);
>>>> -      if (TYPE_QUALS (TREE_TYPE (arg1_type)) != 0)
>>>> -        {
>>>> -          arg1_type = build_pointer_type (build_qualified_type (inner_type,
>>>> -                                                                0));
>>>> -          arg1 = fold_convert (arg1_type, arg1);
>>>> -        }
>>>> -
>>>> -      /* Construct the masked address.  Let existing error handling take
>>>> -         over if we don't have a constant offset.  */
>>>> -      arg0 = fold (arg0);
>>>> -
>>>> -      if (TREE_CODE (arg0) == INTEGER_CST)
>>>> -        {
>>>> -          if (!ptrofftype_p (TREE_TYPE (arg0)))
>>>> -            arg0 = build1 (NOP_EXPR, sizetype, arg0);
>>>> -
>>>> -          tree arg1_type = TREE_TYPE (arg1);
>>>> -          if (TREE_CODE (arg1_type) == ARRAY_TYPE)
>>>> -            {
>>>> -              arg1_type = TYPE_POINTER_TO (TREE_TYPE (arg1_type));
>>>> -              tree const0 = build_int_cstu (sizetype, 0);
>>>> -              tree arg1_elt0 = build_array_ref (loc, arg1, const0);
>>>> -              arg1 = build1 (ADDR_EXPR, arg1_type, arg1_elt0);
>>>> -            }
>>>> -
>>>> -          tree addr = fold_build2_loc (loc, POINTER_PLUS_EXPR, arg1_type,
>>>> -                                       arg1, arg0);
>>>> -          tree aligned = fold_build2_loc (loc, BIT_AND_EXPR, arg1_type, addr,
>>>> -                                          build_int_cst (arg1_type, -16));
>>>> -
>>>> -          /* Find the built-in to get the return type so we can convert
>>>> -             the result properly (or fall back to default handling if the
>>>> -             arguments aren't compatible).  */
>>>> -          for (desc = altivec_overloaded_builtins;
>>>> -               desc->code && desc->code != fcode; desc++)
>>>> -            continue;
>>>> -
>>>> -          for (; desc->code == fcode; desc++)
>>>> -            if (rs6000_builtin_type_compatible (TREE_TYPE (arg0), desc->op1)
>>>> -                && (rs6000_builtin_type_compatible (TREE_TYPE (arg1),
>>>> -                                                    desc->op2)))
>>>> -              {
>>>> -                tree ret_type = rs6000_builtin_type (desc->ret_type);
>>>> -                if (TYPE_MODE (ret_type) == V2DImode)
>>>> -                  /* Type-based aliasing analysis thinks vector long
>>>> -                     and vector long long are different and will put them
>>>> -                     in distinct alias classes.  Force our return type
>>>> -                     to be a may-alias type to avoid this.  */
>>>> -                  ret_type
>>>> -                    = build_pointer_type_for_mode (ret_type, Pmode,
>>>> -                                                   true/*can_alias_all*/);
>>>> -                else
>>>> -                  ret_type = build_pointer_type (ret_type);
>>>> -                aligned = build1 (NOP_EXPR, ret_type, aligned);
>>>> -                tree ret_val = build_indirect_ref (loc, aligned, RO_NULL);
>>>> -                return ret_val;
>>>> -              }
>>>> -        }
>>>> -    }
>>>>
>>>> -  /* Similarly for stvx.  */
>>>>    if (fcode == ALTIVEC_BUILTIN_VEC_ST
>>>>        && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)
>>>>        && nargs == 3)
>>>>      {
>>>>        tree arg0 = (*arglist)[0];
>>>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>>>> index 1338371..1fb5f44 100644
>>>> --- a/gcc/config/rs6000/rs6000.c
>>>> +++ b/gcc/config/rs6000/rs6000.c
>>>> @@ -16547,10 +16547,61 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>>>>        res = gimple_build (&stmts, VIEW_CONVERT_EXPR, TREE_TYPE (lhs), res);
>>>>        gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>>>>        update_call_from_tree (gsi, res);
>>>>        return true;
>>>>      }
>>>> +    /* Vector loads.  */
>>>> +    case ALTIVEC_BUILTIN_LVX_V16QI:
>>>> +    case ALTIVEC_BUILTIN_LVX_V8HI:
>>>> +    case ALTIVEC_BUILTIN_LVX_V4SI:
>>>> +    case ALTIVEC_BUILTIN_LVX_V4SF:
>>>> +    case ALTIVEC_BUILTIN_LVX_V2DI:
>>>> +    case ALTIVEC_BUILTIN_LVX_V2DF:
>>>> +      {
>>>> +        gimple *g;
>>>> +        arg0 = gimple_call_arg (stmt, 0);  // offset
>>>> +        arg1 = gimple_call_arg (stmt, 1);  // address
>>>> +
>>>> +        lhs = gimple_call_lhs (stmt);
>>>> +        location_t loc = gimple_location (stmt);
>>>> +
>>>> +        tree arg1_type = TREE_TYPE (arg1);
>>>> +        tree lhs_type = TREE_TYPE (lhs);
>>>> +
>>>> +        /* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.
>>>> +           Create the tree using the value from arg0.  The resulting
>>>> +           type will match the type of arg1.  */
>>>> +        tree temp_offset = create_tmp_reg_or_ssa_name (sizetype);
>>>> +        g = gimple_build_assign (temp_offset, NOP_EXPR, arg0);
>>>> +        gimple_set_location (g, loc);
>>>> +        gsi_insert_before (gsi, g, GSI_SAME_STMT);
>>>> +        tree temp_addr = create_tmp_reg_or_ssa_name (arg1_type);
>>>> +        g = gimple_build_assign (temp_addr, POINTER_PLUS_EXPR, arg1,
>>>> +                                 temp_offset);
>>>> +        gimple_set_location (g, loc);
>>>> +        gsi_insert_before (gsi, g, GSI_SAME_STMT);
>>>> +
>>>> +        /* Mask off any lower bits from the address.  */
>>>> +        tree alignment_mask = build_int_cst (arg1_type, -16);
>>>> +        tree aligned_addr = create_tmp_reg_or_ssa_name (arg1_type);
>>>> +        g = gimple_build_assign (aligned_addr, BIT_AND_EXPR,
>>>> +                                 temp_addr, alignment_mask);
>>>> +        gimple_set_location (g, loc);
>>>> +        gsi_insert_before (gsi, g, GSI_SAME_STMT);
>>>
>>> You could use
>>>
>>>   gimple_seq stmts = NULL;
>>>   tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg0);
>>>   tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
>>>                                  arg1_type, arg1, temp_offset);
>>>   tree aligned_addr = gimple_build (&stmts, loc, BIT_AND_EXPR,
>>>                                     arg1_type, temp_addr,
>>>                                     build_int_cst (arg1_type, -16));
>>>   gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>>>
>>>> +        /* Use the build2 helper to set up the mem_ref.  The MEM_REF
>>>> +           could also take an offset, but since we've already
>>>> +           incorporated the offset above, here we just pass in a zero.  */
>>>> +        g = gimple_build_assign (lhs, build2 (MEM_REF, lhs_type,
>>>> +                                              aligned_addr,
>>>> +                                              build_int_cst (arg1_type, 0)));
>>>
>>> are you sure about arg1_type here?  I'm sure not.  For
>>>
>>>   ... foo (struct S *p)
>>>   {
>>>     return __builtin_lvx_v2df (4, (double *)p);
>>>   }
>>>
>>> you'd end up with p as arg1 and thus struct S * as arg1_type and thus
>>> TBAA using 'struct S' to access the memory.
>>
>> Hm, is that so?  Wouldn't arg1_type be double*, since arg1 is (double *)p?
>> Will, you should probably test this example and see, but I'm pretty
>> confident about this (see below).
>
> But, as I should have suspected, you're right.  For some reason
> gimple_call_arg is returning p, stripped of the cast information where the
> user asserted that p points to a double*.
>
> Can you explain to me why this should be so?  I assume that somebody
> has decided to strip_nops the argument and lose the cast.
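A reduced illustration of the behavior in question - hypothetical, not
quoted from the thread: for something like Richard's foo() above, by the
time rs6000_gimple_fold_builtin sees the call, the pointer conversion has
been pruned and the argument is p itself.

   struct S { double d[2]; };

   vector double
   foo (struct S *p)
   {
     /* Builtin name as in Richard's example above.  */
     return __builtin_lvx_v2df (4, (double *) p);
   }

   /* What the gimple folder sees, roughly:
        _1 = __builtin_lvx_v2df (4, p);
      so TREE_TYPE (gimple_call_arg (stmt, 1)) is 'struct S *',
      not 'double *'.  */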
Pointer types have no meaning in GIMPLE, so we aggressively prune them.

> Using ptr_type_node loses all type information, so that would be a
> regression from what we do today.  In some cases we could reconstruct
> that this was necessarily, say, a double*, but I don't know how we
> would recover the signedness for an integer type.

How did we handle the expansion previously - ah - it was done earlier
in the C FE.  So why are you moving it to GIMPLE?  The function is
called resolve_overloaded_builtin - what kind of overloading do you
resolve here?

As said, argument types might not be preserved.

Richard.

> Bill
>>
>>>
>>> I think if the builtins have any TBAA constraints you need to build
>>> those explicitly; if not, you should use ptr_type_node, aka no TBAA.
>>
>> The type signatures are constrained during parsing, so we should only
>> see allowed pointer types on arg1 by the time we get to gimple folding.
>> I think that using arg1_type should work, but I am probably missing
>> something subtle, so please feel free to whack me on the temple until
>> I get it. :-)
>>
>> Bill
>>>
>>> Richard.
>>>
>>>> +        gimple_set_location (g, loc);
>>>> +        gsi_replace (gsi, g, true);
>>>> +
>>>> +        return true;
>>>> +
>>>> +      }
>>>> +
>>>>      default:
>>>>        if (TARGET_DEBUG_BUILTIN)
>>>>          fprintf (stderr, "gimple builtin intrinsic not matched:%d %s %s\n",
>>>>                   fn_code, fn_name1, fn_name2);
>>>>        break;
>
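For reference, a minimal sketch of the two directions Richard describes,
reusing the g/lhs/lhs_type/aligned_addr names from Will's patch.  It relies
on the fact that the type of a MEM_REF's second (constant offset) operand
supplies the pointer type used for TBAA; this is illustrative only, not
necessarily what was committed.

   /* Option 1: no TBAA - reference the memory through ptr_type_node,
      putting the access in alias set 0.  */
   g = gimple_build_assign (lhs,
                            build2 (MEM_REF, lhs_type, aligned_addr,
                                    build_int_cst (ptr_type_node, 0)));

   /* Option 2: explicit TBAA - build the reference type from the vector
      type the builtin is known to load, instead of trusting the pruned
      type of arg1.  (The removed rs6000-c.c code above additionally
      forced a may-alias pointer type for V2DImode, so vector long and
      vector long long stay in one alias class.)  */
   tree ref_type = build_pointer_type (lhs_type);
   g = gimple_build_assign (lhs,
                            build2 (MEM_REF, lhs_type, aligned_addr,
                                    build_int_cst (ref_type, 0)));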