On Jun 1, 2018, at 10:35 AM, Richard Biener wrote:
>
> On June 1, 2018 5:15:58 PM GMT+02:00, Bill Schmidt wrote:
>> On Jun 1, 2018, at 10:11 AM, Will Schmidt wrote:
>>>
>>> On Fri, 2018-06-01 at 08:53 +0200, Richard Biener wrote:
>>>> On Thu, May 31, 2018 at 9:59 PM Will Schmidt wrote:
>>>>>
>>>>> Hi,
>>>>> Add support for gimple folding of unaligned vector loads and stores.
>>>>> Testcases are posted separately in this thread.
>>>>>
>>>>> Regtest completed across a variety of systems: P6, P7, P8, P9.
>>>>>
>>>>> OK for trunk?
>>>>> Thanks,
>>>>> -Will
>>>>>
>>>>> [gcc]
>>>>>
>>>>> 2018-05-31 Will Schmidt <[email protected]>
>>>>>
>>>>> * config/rs6000/rs6000.c (rs6000_builtin_valid_without_lhs): Add vec_xst
>>>>> variants to the list.
>>>>> (rs6000_gimple_fold_builtin): Add support for folding unaligned vector
>>>>> loads and stores.
>>>>>
>>>>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>>>>> index d62abdf..54b7de2 100644
>>>>> --- a/gcc/config/rs6000/rs6000.c
>>>>> +++ b/gcc/config/rs6000/rs6000.c
>>>>> @@ -15360,10 +15360,16 @@ rs6000_builtin_valid_without_lhs (enum rs6000_builtins fn_code)
>>>>> case ALTIVEC_BUILTIN_STVX_V8HI:
>>>>> case ALTIVEC_BUILTIN_STVX_V4SI:
>>>>> case ALTIVEC_BUILTIN_STVX_V4SF:
>>>>> case ALTIVEC_BUILTIN_STVX_V2DI:
>>>>> case ALTIVEC_BUILTIN_STVX_V2DF:
>>>>> + case VSX_BUILTIN_STXVW4X_V16QI:
>>>>> + case VSX_BUILTIN_STXVW4X_V8HI:
>>>>> + case VSX_BUILTIN_STXVW4X_V4SF:
>>>>> + case VSX_BUILTIN_STXVW4X_V4SI:
>>>>> + case VSX_BUILTIN_STXVD2X_V2DF:
>>>>> + case VSX_BUILTIN_STXVD2X_V2DI:
>>>>> return true;
>>>>> default:
>>>>> return false;
>>>>> }
>>>>> }
>>>>> @@ -15869,10 +15875,77 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>>>>> gimple_set_location (g, loc);
>>>>> gsi_replace (gsi, g, true);
>>>>> return true;
>>>>> }
>>>>>
>>>>> + /* Unaligned vector loads. */
>>>>> + case VSX_BUILTIN_LXVW4X_V16QI:
>>>>> + case VSX_BUILTIN_LXVW4X_V8HI:
>>>>> + case VSX_BUILTIN_LXVW4X_V4SF:
>>>>> + case VSX_BUILTIN_LXVW4X_V4SI:
>>>>> + case VSX_BUILTIN_LXVD2X_V2DF:
>>>>> + case VSX_BUILTIN_LXVD2X_V2DI:
>>>>> + {
>>>>> + arg0 = gimple_call_arg (stmt, 0); // offset
>>>>> + arg1 = gimple_call_arg (stmt, 1); // address
>>>>> + lhs = gimple_call_lhs (stmt);
>>>>> + location_t loc = gimple_location (stmt);
>>>>> + /* Since arg1 may be cast to a different type, just use ptr_type_node
>>>>> + here instead of trying to enforce TBAA on pointer types. */
>>>>> + tree arg1_type = ptr_type_node;
>>>>> + tree lhs_type = TREE_TYPE (lhs);
>>>>> + /* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'. Create
>>>>> + the tree using the value from arg0. The resulting type will match
>>>>> + the type of arg1. */
>>>>> + gimple_seq stmts = NULL;
>>>>> + tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg0);
>>>>> + tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
>>>>> + arg1_type, arg1, temp_offset);
>>>>> + gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>>>>> + /* Use the build2 helper to set up the mem_ref. The MEM_REF could also
>>>>> + take an offset, but since we've already incorporated the offset
>>>>> + above, here we just pass in a zero. */
>>>>> + gimple *g;
>>>>> + g = gimple_build_assign (lhs, build2 (MEM_REF, lhs_type, temp_addr,
>>>>> + build_int_cst (arg1_type, 0)));
>>>>
>>>> So in GIMPLE the type of the MEM_REF specifies the alignment, so my question
>>>> is what type does the lhs usually have here? I'd simply guess V4SF, etc.? In
>>>
>>> Yes (double-checking). My reference for the intrinsic signatures
>>> shows the lhs is a vector of type. The rhs can be either *type or
>>> *vector of type.
>>>
>>> vector double vec_vsx_ld (int, const vector double *);
>>> vector double vec_vsx_ld (int, const double *);
>>> The other element types have the same or similar signatures.
>>>
>>> These are also on my list as 'unaligned' vector loads. I'm not certain
>>> if that adds a twist to how I should answer the question below.
>>>
>>> Bill?
>>
>> 'unaligned' means not necessarily aligned on a vector boundary.
>> They are guaranteed to be aligned on an element boundary.
>>>
>>>> this case you are missing a
>>>> tree ltype = build_aligned_type (lhs_type, desired-alignment);
>>>>
>>>> and use that ltype for building the MEM_REF. I suppose in this case the known
>>>> alignment is either BITS_PER_UNIT or element alignment (thus
>>>> TYPE_ALIGN (TREE_TYPE (lhs_type)))?
>>>
>>> I'd think element alignment, but I'm no longer certain. :-)
>>
>> Yep, element alignment.
>
> Note the x86 unaligned intrinsics support arbitrary unaligned loads. So that's
> not available for Power? Does the HW implementation require element
> alignment?
I had to go look this up again...
Actually, the required alignment is 4 bytes regardless of the data type. I thought
it was 8 bytes for V2DF/V2DI accesses, but that's not correct. But we don't support
arbitrary alignment at the byte level.
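The following is illustrative only and is not part of the patch. It is a minimal GNU C sketch of what the folded load is meant to express, assuming the 4-byte requirement stated above. The byte offset is added to the address, and a whole vector is loaded from the result at offset zero. The access type carries only 4-byte alignment, roughly what build_aligned_type would record on the MEM_REF type. The names v4sf, v4sf_a4 and model_vsx_ld are hypothetical. The GCC vector extension stands in for the Power intrinsics so that the sketch compiles on any target.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for V4SF (GCC vector extension); its natural
   alignment is 16 bytes.  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* The same vector type with its alignment lowered to 4 bytes.  GCC allows
   the aligned attribute to decrease alignment on a typedef.  This plays the
   role of build_aligned_type (lhs_type, 32) in the folder, since TYPE_ALIGN
   is measured in bits.  may_alias keeps the access valid when the object
   underneath is a plain float array.  */
typedef float v4sf_a4
  __attribute__ ((vector_size (16), aligned (4), may_alias));

/* Model of the folded vec_vsx_ld (OFFSET, ADDR): add the byte offset to the
   address (the POINTER_PLUS_EXPR), then load one vector at offset zero from
   the result (the MEM_REF) through the 4-byte-aligned type.  */
static v4sf
model_vsx_ld (long offset, const void *addr)
{
  const char *p = (const char *) addr + offset;

  /* Assumed hardware rule from this thread: 4-byte alignment is required,
     unlike the byte alignment the x86 unaligned intrinsics accept.  */
  assert ((uintptr_t) p % 4 == 0);
  return *(const v4sf_a4 *) p;
}
```

With float buf[8], model_vsx_ld (4, buf) yields { buf[1], buf[2], buf[3], buf[4] }. The offset counts bytes, as the POINTER_PLUS_EXPR on a ptr_type_node address does in the patch.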
Thanks!
Bill
>
> Richard.
>
>> Thanks,
>> Bill
>>>
>>>> Or is the type of the load the element type?
>>>
>>>
>>> So, in any case, I'll build up or modify some tests to look at the data
>>> being loaded, and see whether any alignment issues show up.
>>>
>>> Thanks,
>>> -Will
>>>
>>>
>>>
>>>> Richard.
>>>>
>>>>> + gimple_set_location (g, loc);
>>>>> + gsi_replace (gsi, g, true);
>>>>> + return true;
>>>>> + }
>>>>> +
>>>>> + /* Unaligned vector stores. */
>>>>> + case VSX_BUILTIN_STXVW4X_V16QI:
>>>>> + case VSX_BUILTIN_STXVW4X_V8HI:
>>>>> + case VSX_BUILTIN_STXVW4X_V4SF:
>>>>> + case VSX_BUILTIN_STXVW4X_V4SI:
>>>>> + case VSX_BUILTIN_STXVD2X_V2DF:
>>>>> + case VSX_BUILTIN_STXVD2X_V2DI:
>>>>> + {
>>>>> + arg0 = gimple_call_arg (stmt, 0); /* Value to be stored. */
>>>>> + arg1 = gimple_call_arg (stmt, 1); /* Offset. */
>>>>> + tree arg2 = gimple_call_arg (stmt, 2); /* Store-to address. */
>>>>> + location_t loc = gimple_location (stmt);
>>>>> + tree arg0_type = TREE_TYPE (arg0);
>>>>> + /* Use ptr_type_node (no TBAA) for the arg2_type. */
>>>>> + tree arg2_type = ptr_type_node;
>>>>> + /* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'. Create
>>>>> + the tree using the value from arg1. The resulting type will match
>>>>> + the type of arg2. */
>>>>> + gimple_seq stmts = NULL;
>>>>> + tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg1);
>>>>> + tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
>>>>> + arg2_type, arg2, temp_offset);
>>>>> + /* Insert the address computation before the call. */
>>>>> + gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>>>>> + gimple *g;
>>>>> + g = gimple_build_assign (build2 (MEM_REF, arg0_type, temp_addr,
>>>>> + build_int_cst (arg2_type, 0)), arg0);
>>>>> + gimple_set_location (g, loc);
>>>>> + gsi_replace (gsi, g, true);
>>>>> + return true;
>>>>> + }
>>>>> +
>>>>> /* Vector Fused multiply-add (fma). */
>>>>> case ALTIVEC_BUILTIN_VMADDFP:
>>>>> case VSX_BUILTIN_XVMADDDP:
>>>>> case ALTIVEC_BUILTIN_VMLADDUHM:
>>>>> {