On Mon, Dec 8, 2014 at 11:38 AM, Andrew Pinski <pins...@gmail.com> wrote:
> On Sun, Dec 7, 2014 at 7:35 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>> On Mon, Dec 8, 2014 at 11:26 AM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>>> On Mon, Dec 8, 2014 at 11:16 AM, Andrew Pinski <pins...@gmail.com> wrote:
>>>> On Sun, Dec 7, 2014 at 5:47 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>>>>> On Sun, Dec 7, 2014 at 6:24 PM, Andrew Pinski <pins...@gmail.com> wrote:
>>>>>> On Sat, Dec 6, 2014 at 6:35 PM, Andrew Pinski <pins...@gmail.com> wrote:
>>>>>>> On Sat, Dec 6, 2014 at 5:54 PM, Andrew Pinski <pins...@gmail.com> wrote:
>>>>>>>> On Fri, Dec 5, 2014 at 9:08 AM, Marcus Shawcroft
>>>>>>>> <marcus.shawcr...@gmail.com> wrote:
>>>>>>>>> On 18 November 2014 at 08:34, Bin Cheng <bin.ch...@arm.com> wrote:
>>>>>>>>>
>>>>>>>>>> 2014-11-18  Bin Cheng  <bin.ch...@arm.com>
>>>>>>>>>>
>>>>>>>>>>         * config/aarch64/aarch64.md (load_pair<mode>): Split to
>>>>>>>>>>         load_pairsi, load_pairdi, load_pairsf and load_pairdf.
>>>>>>>>>>         (load_pairsi, load_pairdi, load_pairsf, load_pairdf): Split
>>>>>>>>>>         from load_pair<mode>.  New alternative to support int/fp
>>>>>>>>>>         registers in fp/int mode patterns.
>>>>>>>>>>         (store_pair<mode>): Split to store_pairsi, store_pairdi,
>>>>>>>>>>         store_pairsf and store_pairdf.
>>>>>>>>>>         (store_pairsi, store_pairdi, store_pairsf, store_pairdf): Split
>>>>>>>>>>         from store_pair<mode>.  New alternative to support int/fp
>>>>>>>>>>         registers in fp/int mode patterns.
>>>>>>>>>>         (*load_pair_extendsidi2_aarch64): New pattern.
>>>>>>>>>>         (*load_pair_zero_extendsidi2_aarch64): New pattern.
>>>>>>>>>>         (aarch64-ldpstp.md): Include.
>>>>>>>>>>         * config/aarch64/aarch64-ldpstp.md: New file.
>>>>>>>>>>         * config/aarch64/aarch64-protos.h (aarch64_gen_adjusted_ldpstp):
>>>>>>>>>>         New.
>>>>>>>>>>         (extract_base_offset_in_addr): New.
>>>>>>>>>>         (aarch64_operands_ok_for_ldpstp): New.
>>>>>>>>>>         (aarch64_operands_adjust_ok_for_ldpstp): New.
>>>>>>>>>>         * config/aarch64/aarch64.c (enum sched_fusion_type): New enum.
>>>>>>>>>>         (TARGET_SCHED_FUSION_PRIORITY): New hook.
>>>>>>>>>>         (fusion_load_store): New function.
>>>>>>>>>>         (extract_base_offset_in_addr): New function.
>>>>>>>>>>         (aarch64_gen_adjusted_ldpstp): New function.
>>>>>>>>>>         (aarch64_sched_fusion_priority): New function.
>>>>>>>>>>         (aarch64_operands_ok_for_ldpstp): New function.
>>>>>>>>>>         (aarch64_operands_adjust_ok_for_ldpstp): New function.
>>>>>>>>>>
>>>>>>>>>> 2014-11-18  Bin Cheng  <bin.ch...@arm.com>
>>>>>>>>>>
>>>>>>>>>>         * gcc.target/aarch64/ldp-stp-1.c: New test.
>>>>>>>>>>         * gcc.target/aarch64/ldp-stp-2.c: New test.
>>>>>>>>>>         * gcc.target/aarch64/ldp-stp-3.c: New test.
>>>>>>>>>>         * gcc.target/aarch64/ldp-stp-4.c: New test.
>>>>>>>>>>         * gcc.target/aarch64/ldp-stp-5.c: New test.
>>>>>>>>>>         * gcc.target/aarch64/lr_free_1.c: Disable scheduling fusion
>>>>>>>>>>         and peephole2 pass.
>>>>>>>>>
>>>>>>>>> Committed, thanks. /Marcus
>>>>>>>>
>>>>>>>>
>>>>>>>> This patch has a bug in it dealing with volatile mems.  It will do a
>>>>>>>> peephole of two volatile mems into a pair, which is not correct: the
>>>>>>>> ordering of the two accesses within a pair is not defined by the
>>>>>>>> architecture, so the peephole might even swap the order of the
>>>>>>>> load/store incorrectly.
>>>>>>>> I noticed this when I was merging in my changes for an improved
>>>>>>>> peephole which already has a check for volatile.
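[Editor's illustration, not part of the original thread: a minimal C sketch of the access pattern at stake.  The device layout and names below are hypothetical, not taken from the failing testcases.  Two adjacent volatile loads look like an ldp candidate, yet the two halves of an ldp have no architecturally defined ordering relative to each other.]

---- CUT ----
/* Sketch only: a hypothetical device with two adjacent 32-bit
   registers that must be read in program order.  */
struct device_regs
{
  volatile unsigned int status;  /* must be read before data */
  volatile unsigned int data;
};

unsigned int
read_device (struct device_regs *regs)
{
  unsigned int s = regs->status;  /* volatile load #1 */
  unsigned int d = regs->data;    /* volatile load #2 */
  /* Merging these two loads into a single ldp would leave the
     relative order of the two reads undefined.  */
  return s + d;
}
---- CUT ----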
>>>>>>>
>>>>>>> Something like:
>>>>>>> @@ -10555,6 +10863,10 @@ aarch64_operands_ok_for_ldpstp (rtx *operands, bool load,
>>>>>>>       reg_2 = operands[3];
>>>>>>>     }
>>>>>>>
>>>>>>> +  /* The mems cannot be volatile.  */
>>>>>>> +  if (MEM_VOLATILE_P (mem_1) || MEM_VOLATILE_P (mem_2))
>>>>>>> +    return false;
>>>>>>> +
>>>>>>>    /* Check if the addresses are in the form of [base+offset].  */
>>>>>>>    extract_base_offset_in_addr (mem_1, &base_1, &offset_1);
>>>>>>>    if (base_1 == NULL_RTX || offset_1 == NULL_RTX)
>>>>>>>
>>>>>>>
>>>>>>> ---- CUT ----
>>>>>>> gcc.target/aarch64/rev16_1.c is the testcase where two volatile loads
>>>>>>> are merged into one load pair.
>>>>>>
>>>>>>
>>>>>> And this one:
>>>>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>>>>> index 40881e7..da36dc8 100644
>>>>>> --- a/gcc/config/aarch64/aarch64.c
>>>>>> +++ b/gcc/config/aarch64/aarch64.c
>>>>>> @@ -10974,6 +10974,10 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, bool load,
>>>>>>    if (!MEM_P (mem_1) || aarch64_mem_pair_operand (mem_1, mode))
>>>>>>      return false;
>>>>>>
>>>>>> +  if (MEM_VOLATILE_P (mem_1) || MEM_VOLATILE_P (mem_2)
>>>>>> +      || MEM_VOLATILE_P (mem_3) || MEM_VOLATILE_P (mem_4))
>>>>>> +    return false;
>>>>>> +
>>>>>>    /* Check if the addresses are in the form of [base+offset].  */
>>>>>>    extract_base_offset_in_addr (mem_1, &base_1, &offset_1);
>>>>>>    if (base_1 == NULL_RTX || offset_1 == NULL_RTX)
>>>>>> --- CUT ---
>>>>>> This one happens with c-c++-common/torture/complex-sign-add.c.
>>>>>> I will post a full patch tomorrow.
>>>>>>
>>>>>> Thanks,
>>>>>> Andrew
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Andrew
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Andrew Pinski
>>>>> Hi Andrew,
>>>>> Thanks for reporting this.  When I was facing this volatile problem, I
>>>>> thought it was fine as long as the memory accesses weren't removed
>>>>> accidentally.  Could you please explain more about why the order of
>>>>> volatile accesses also matters?
>>>>
>>>> Because volatile accesses across a sequence point cannot be reordered.
>>>> See https://gcc.gnu.org/onlinedocs/gcc/Volatiles.html .
>>>> Mainly:
>>>> "The minimum requirement is that at a sequence point all previous
>>>> accesses to volatile objects have stabilized and no subsequent
>>>> accesses have occurred.  Thus an implementation is free to reorder and
>>>> combine volatile accesses that occur between sequence points, but
>>>> cannot do so for accesses across a sequence point."
>>>>
>>>>> Is it related to some barriers between the two
>>>>> accesses?
>>>>
>>>> Volatile accesses are not allowed to be merged, as I pointed out above,
>>>> or reordered, which gives an implicit barrier between the two accesses.
>>> Ah, yes, it is the implicit barrier that can't be crossed.
>>> Thanks very much for the explanation.
>>>
>>> Thanks,
>>> bin
>>>>
>>>>> If that's the case, does it imply that the dependences created
>>>>> by GCC aren't correct?
>>>>
>>>> The dependences are correct, just not the peepholes you created.
>>>>
>>>>> I am still not sure why the case passed in my
>>>>> regression test, or how I missed it.
>>>>
>>>> Well, that is because the ICE only shows up with my improved
>>>> peephole2/load_pair/store_pair patterns.  The load_pair/store_pair
>>>> patterns that I had created were rejecting volatile memory locations
>>>> to make sure nobody used these patterns with volatile memory
>>>> locations.  Once I get some more time, I will post a clean-up patch of
>>>> my peephole2/load_pair/store_pair patterns, which I think are slightly
>>>> easier to understand.
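[Editor's illustration, not part of the original thread: a hedged sketch making the quoted sequence-point rule concrete; the volatile objects are hypothetical.  Each full statement ends at a sequence point, so volatile reads in separate statements may not be reordered or combined, while the standard would in principle allow reordering of volatile accesses within a single expression.]

---- CUT ----
volatile int a, b;  /* hypothetical volatile objects */

int
separate_statements (void)
{
  int x = a;  /* sequence point at the end of this statement, */
  int y = b;  /* so this read may not move before the read of a */
  return x + y;
}

int
one_expression (void)
{
  /* Both reads occur between sequence points, so the standard
     permits the implementation to reorder them.  */
  return a + b;
}
---- CUT ----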
>>>>
>>>> Thanks,
>>>> Andrew Pinski
>>>>
>>>>>
>>>>> Anyway, volatile cases should be handled conservatively here.  Thanks
>>>>> again.
>>>>>
>>>>> Thanks,
>>>>> bin
>>
>> As for the fix, I think it is TARGET_SCHED_FUSION_PRIORITY that needs
>> to be fixed.  The scheduler reorders loads/stores according to the
>> priorities set by that hook function.  After fixing the hook, even
>> volatile accesses can be paired if they are next to each other (or a
>> volatile access and a normal one) in the first place (which means no
>> sequence point between the volatile accesses).  Makes sense?
>
> I think the scheduler should already have code to handle volatile
> memory locations and have the implicit barrier.
> And "sequence point" in the documentation means a source-code
> sequence point (that is, the end of a statement is considered a
> sequence point).
>
> Thanks,
> Andrew Pinski
>
>>
>> Thanks,
>> bin
Right, sched-deps adds dependences between volatile accesses.

Thanks,
bin
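[Editor's illustration, not part of the original thread: a hedged sketch of the conservative check being discussed, i.e. rejecting volatile mems up front when considering candidates for fusion.  The helper's name and structure are illustrative only; see aarch64_sched_fusion_priority and aarch64_operands_ok_for_ldpstp in config/aarch64/aarch64.c for the real code.]

---- CUT ----
/* Illustrative GCC-internal helper, not the actual implementation:
   a target hook such as TARGET_SCHED_FUSION_PRIORITY could filter
   candidates with a check like this before assigning priorities.  */
static bool
mem_ok_for_fusion_p (rtx mem)
{
  /* Conservatively refuse volatile accesses: sched-deps already
     records a dependence between volatile accesses, and the
     ldp/stp peepholes must reject them as well.  */
  if (!MEM_P (mem) || MEM_VOLATILE_P (mem))
    return false;
  return true;
}
---- CUT ----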