On Mon, Dec 8, 2014 at 11:38 AM, Andrew Pinski <pins...@gmail.com> wrote:
> On Sun, Dec 7, 2014 at 7:35 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>> On Mon, Dec 8, 2014 at 11:26 AM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>>> On Mon, Dec 8, 2014 at 11:16 AM, Andrew Pinski <pins...@gmail.com> wrote:
>>>> On Sun, Dec 7, 2014 at 5:47 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>>>>> On Sun, Dec 7, 2014 at 6:24 PM, Andrew Pinski <pins...@gmail.com> wrote:
>>>>>> On Sat, Dec 6, 2014 at 6:35 PM, Andrew Pinski <pins...@gmail.com> wrote:
>>>>>>> On Sat, Dec 6, 2014 at 5:54 PM, Andrew Pinski <pins...@gmail.com> wrote:
>>>>>>>> On Fri, Dec 5, 2014 at 9:08 AM, Marcus Shawcroft
>>>>>>>> <marcus.shawcr...@gmail.com> wrote:
>>>>>>>>> On 18 November 2014 at 08:34, Bin Cheng <bin.ch...@arm.com> wrote:
>>>>>>>>>
>>>>>>>>>> 2014-11-18  Bin Cheng  <bin.ch...@arm.com>
>>>>>>>>>>
>>>>>>>>>>         * config/aarch64/aarch64.md (load_pair<mode>): Split to
>>>>>>>>>>         load_pairsi, load_pairdi, load_pairsf and load_pairdf.
>>>>>>>>>>         (load_pairsi, load_pairdi, load_pairsf, load_pairdf): Split
>>>>>>>>>>         from load_pair<mode>.  New alternative to support int/fp
>>>>>>>>>>         registers in fp/int mode patterns.
>>>>>>>>>>         (store_pair<mode>): Split to store_pairsi, store_pairdi,
>>>>>>>>>>         store_pairsf and store_pairdf.
>>>>>>>>>>         (store_pairsi, store_pairdi, store_pairsf, store_pairdf): Split
>>>>>>>>>>         from store_pair<mode>.  New alternative to support int/fp
>>>>>>>>>>         registers in fp/int mode patterns.
>>>>>>>>>>         (*load_pair_extendsidi2_aarch64): New pattern.
>>>>>>>>>>         (*load_pair_zero_extendsidi2_aarch64): New pattern.
>>>>>>>>>>         (aarch64-ldpstp.md): Include.
>>>>>>>>>>         * config/aarch64/aarch64-ldpstp.md: New file.
>>>>>>>>>>         * config/aarch64/aarch64-protos.h (aarch64_gen_adjusted_ldpstp):
>>>>>>>>>>         New.
>>>>>>>>>>         (extract_base_offset_in_addr): New.
>>>>>>>>>>         (aarch64_operands_ok_for_ldpstp): New.
>>>>>>>>>>         (aarch64_operands_adjust_ok_for_ldpstp): New.
>>>>>>>>>>         * config/aarch64/aarch64.c (enum sched_fusion_type): New enum.
>>>>>>>>>>         (TARGET_SCHED_FUSION_PRIORITY): New hook.
>>>>>>>>>>         (fusion_load_store): New function.
>>>>>>>>>>         (extract_base_offset_in_addr): New function.
>>>>>>>>>>         (aarch64_gen_adjusted_ldpstp): New function.
>>>>>>>>>>         (aarch64_sched_fusion_priority): New function.
>>>>>>>>>>         (aarch64_operands_ok_for_ldpstp): New function.
>>>>>>>>>>         (aarch64_operands_adjust_ok_for_ldpstp): New function.
>>>>>>>>>>
>>>>>>>>>> 2014-11-18  Bin Cheng  <bin.ch...@arm.com>
>>>>>>>>>>
>>>>>>>>>>         * gcc.target/aarch64/ldp-stp-1.c: New test.
>>>>>>>>>>         * gcc.target/aarch64/ldp-stp-2.c: New test.
>>>>>>>>>>         * gcc.target/aarch64/ldp-stp-3.c: New test.
>>>>>>>>>>         * gcc.target/aarch64/ldp-stp-4.c: New test.
>>>>>>>>>>         * gcc.target/aarch64/ldp-stp-5.c: New test.
>>>>>>>>>>         * gcc.target/aarch64/lr_free_1.c: Disable scheduling fusion
>>>>>>>>>>         and peephole2 pass.
>>>>>>>>>
>>>>>>>>> Committed, thanks. /Marcus
>>>>>>>>
>>>>>>>>
>>>>>>>> This patch has a bug in it dealing with volatile mems.  It will do a
>>>>>>>> peephole of two volatile mems into a pair, which is not correct: the
>>>>>>>> ordering of the two accesses within a pair is not defined by the
>>>>>>>> architecture, so the peephole might even swap the order of the
>>>>>>>> load/store incorrectly.
>>>>>>>> I noticed this when I was merging in my changes for an improved
>>>>>>>> peephole which already has a check for volatile.
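[Editor's illustration, not part of the original thread: a minimal C sketch of the access pattern at stake.  The device layout and names below are hypothetical, not taken from the failing testcases.  Two adjacent volatile loads look like an ldp candidate, yet the two halves of an ldp have no architecturally defined ordering relative to each other.]

---- CUT ----
/* Sketch only: a hypothetical device with two adjacent 32-bit
   registers that must be read in program order.  */
struct device_regs
{
  volatile unsigned int status;  /* must be read before data */
  volatile unsigned int data;
};

unsigned int
read_device (struct device_regs *regs)
{
  unsigned int s = regs->status;  /* volatile load #1 */
  unsigned int d = regs->data;    /* volatile load #2 */
  /* Merging these two loads into a single ldp would leave the
     relative order of the two reads undefined.  */
  return s + d;
}
---- CUT ----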
>>>>>>>
>>>>>>> Something like:
>>>>>>> @@ -10555,6 +10863,10 @@ aarch64_operands_ok_for_ldpstp (rtx *operands, bool load,
>>>>>>>       reg_2 = operands[3];
>>>>>>>     }
>>>>>>>
>>>>>>> +  /* The mems cannot be volatile.  */
>>>>>>> +  if (MEM_VOLATILE_P (mem_1) || MEM_VOLATILE_P (mem_2))
>>>>>>> +    return false;
>>>>>>> +
>>>>>>>    /* Check if the addresses are in the form of [base+offset].  */
>>>>>>>    extract_base_offset_in_addr (mem_1, &base_1, &offset_1);
>>>>>>>    if (base_1 == NULL_RTX || offset_1 == NULL_RTX)
>>>>>>>
>>>>>>>
>>>>>>> ---- CUT ----
>>>>>>> gcc.target/aarch64/rev16_1.c is the testcase where two volatile loads
>>>>>>> are merged into one load pair.
>>>>>>
>>>>>>
>>>>>> And this one:
>>>>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>>>>> index 40881e7..da36dc8 100644
>>>>>> --- a/gcc/config/aarch64/aarch64.c
>>>>>> +++ b/gcc/config/aarch64/aarch64.c
>>>>>> @@ -10974,6 +10974,10 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, bool load,
>>>>>>    if (!MEM_P (mem_1) || aarch64_mem_pair_operand (mem_1, mode))
>>>>>>      return false;
>>>>>>
>>>>>> +  if (MEM_VOLATILE_P (mem_1) || MEM_VOLATILE_P (mem_2)
>>>>>> +      || MEM_VOLATILE_P (mem_3) || MEM_VOLATILE_P (mem_4))
>>>>>> +    return false;
>>>>>> +
>>>>>>    /* Check if the addresses are in the form of [base+offset].  */
>>>>>>    extract_base_offset_in_addr (mem_1, &base_1, &offset_1);
>>>>>>    if (base_1 == NULL_RTX || offset_1 == NULL_RTX)
>>>>>> --- CUT ---
>>>>>> This one happens with c-c++-common/torture/complex-sign-add.c.
>>>>>> I will post a full patch tomorrow.
>>>>>>
>>>>>> Thanks,
>>>>>> Andrew
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Andrew
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Andrew Pinski
>>>>> Hi Andrew,
>>>>> Thanks for reporting this.  When I was facing this volatile problem, I
>>>>> thought it was fine as long as the memory accesses weren't removed
>>>>> accidentally.  Could you please explain more about why the order of
>>>>> volatile accesses also matters?
>>>>
>>>> Because volatile accesses across a sequence point cannot be reordered.
>>>> See https://gcc.gnu.org/onlinedocs/gcc/Volatiles.html .
>>>> Mainly:
>>>> "The minimum requirement is that at a sequence point all previous
>>>> accesses to volatile objects have stabilized and no subsequent
>>>> accesses have occurred.  Thus an implementation is free to reorder and
>>>> combine volatile accesses that occur between sequence points, but
>>>> cannot do so for accesses across a sequence point."
>>>>
>>>>> Is it related to some barriers between the two
>>>>> accesses?
>>>>
>>>> Volatile accesses are not allowed to be merged, as I pointed out above,
>>>> or reordered, which gives an implicit barrier between the two accesses.
>>> Ah, yes, it is the implicit barrier that can't be crossed.
>>> Thanks very much for the explanation.
>>>
>>> Thanks,
>>> bin
>>>>
>>>>> If that's the case, does it imply that the dependences created
>>>>> by GCC aren't correct?
>>>>
>>>> The dependences are correct, just not the peepholes you created.
>>>>
>>>>> I am still not sure why the case passed in my
>>>>> regression test, or how I missed it.
>>>>
>>>> Well, that is because the ICE only shows up with my improved
>>>> peephole2/load_pair/store_pair patterns.  The load_pair/store_pair
>>>> patterns that I had created were rejecting volatile memory locations
>>>> to make sure nobody used these patterns with volatile memory
>>>> locations.  Once I get some more time, I will post a clean-up patch of
>>>> my peephole2/load_pair/store_pair patterns, which I think are slightly
>>>> easier to understand.
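[Editor's illustration, not part of the original thread: a hedged sketch making the quoted sequence-point rule concrete; the volatile objects are hypothetical.  Each full statement ends at a sequence point, so volatile reads in separate statements may not be reordered or combined, while the standard would in principle allow reordering of volatile accesses within a single expression.]

---- CUT ----
volatile int a, b;  /* hypothetical volatile objects */

int
separate_statements (void)
{
  int x = a;  /* sequence point at the end of this statement, */
  int y = b;  /* so this read may not move before the read of a */
  return x + y;
}

int
one_expression (void)
{
  /* Both reads occur between sequence points, so the standard
     permits the implementation to reorder them.  */
  return a + b;
}
---- CUT ----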
>>>>
>>>> Thanks,
>>>> Andrew Pinski
>>>>
>>>>>
>>>>> Anyway, volatile cases should be handled conservatively here.  Thanks
>>>>> again.
>>>>>
>>>>> Thanks,
>>>>> bin
>>
>> As for the fix, I think it is TARGET_SCHED_FUSION_PRIORITY that needs
>> to be fixed.  The scheduler reorders loads/stores according to the
>> priorities set by that hook function.  After fixing the hook, even
>> volatile accesses can be paired if they are next to each other (or a
>> volatile access and a normal one) in the first place (which means no
>> sequence point between the volatile accesses).  Makes sense?
>
> I think the scheduler should already have code to handle volatile
> memory locations and have the implicit barrier.
> And "sequence point" in the documentation means a source-code
> sequence point (that is, the end of a statement is considered a
> sequence point).
>
> Thanks,
> Andrew Pinski
>
>>
>> Thanks,
>> bin
Right, sched-deps adds dependences between volatile accesses.

Thanks,
bin
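[Editor's illustration, not part of the original thread: a hedged sketch of the conservative check being discussed, i.e. rejecting volatile mems up front when considering candidates for fusion.  The helper's name and structure are illustrative only; see aarch64_sched_fusion_priority and aarch64_operands_ok_for_ldpstp in config/aarch64/aarch64.c for the real code.]

---- CUT ----
/* Illustrative GCC-internal helper, not the actual implementation:
   a target hook such as TARGET_SCHED_FUSION_PRIORITY could filter
   candidates with a check like this before assigning priorities.  */
static bool
mem_ok_for_fusion_p (rtx mem)
{
  /* Conservatively refuse volatile accesses: sched-deps already
     records a dependence between volatile accesses, and the
     ldp/stp peepholes must reject them as well.  */
  if (!MEM_P (mem) || MEM_VOLATILE_P (mem))
    return false;
  return true;
}
---- CUT ----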