On Mon, Dec 8, 2014 at 11:16 AM, Andrew Pinski <pins...@gmail.com> wrote: > On Sun, Dec 7, 2014 at 5:47 PM, Bin.Cheng <amker.ch...@gmail.com> wrote: >> On Sun, Dec 7, 2014 at 6:24 PM, Andrew Pinski <pins...@gmail.com> wrote: >>> On Sat, Dec 6, 2014 at 6:35 PM, Andrew Pinski <pins...@gmail.com> wrote: >>>> On Sat, Dec 6, 2014 at 5:54 PM, Andrew Pinski <pins...@gmail.com> wrote: >>>>> On Fri, Dec 5, 2014 at 9:08 AM, Marcus Shawcroft >>>>> <marcus.shawcr...@gmail.com> wrote: >>>>>> On 18 November 2014 at 08:34, Bin Cheng <bin.ch...@arm.com> wrote: >>>>>> >>>>>>> 2014-11-18 Bin Cheng <bin.ch...@arm.com> >>>>>>> >>>>>>> * config/aarch64/aarch64.md (load_pair<mode>): Split to >>>>>>> load_pairsi, load_pairdi, load_pairsf and load_pairdf. >>>>>>> (load_pairsi, load_pairdi, load_pairsf, load_pairdf): Split >>>>>>> from load_pair<mode>. New alternative to support int/fp >>>>>>> registers in fp/int mode patterns. >>>>>>> (store_pair<mode>:): Split to store_pairsi, store_pairdi, >>>>>>> store_pairsf and store_pairdi. >>>>>>> (store_pairsi, store_pairdi, store_pairsf, store_pairdf): Split >>>>>>> from store_pair<mode>. New alternative to support int/fp >>>>>>> registers in fp/int mode patterns. >>>>>>> (*load_pair_extendsidi2_aarch64): New pattern. >>>>>>> (*load_pair_zero_extendsidi2_aarch64): New pattern. >>>>>>> (aarch64-ldpstp.md): Include. >>>>>>> * config/aarch64/aarch64-ldpstp.md: New file. >>>>>>> * config/aarch64/aarch64-protos.h (aarch64_gen_adjusted_ldpstp): >>>>>>> New. >>>>>>> (extract_base_offset_in_addr): New. >>>>>>> (aarch64_operands_ok_for_ldpstp): New. >>>>>>> (aarch64_operands_adjust_ok_for_ldpstp): New. >>>>>>> * config/aarch64/aarch64.c (enum sched_fusion_type): New enum. >>>>>>> (TARGET_SCHED_FUSION_PRIORITY): New hook. >>>>>>> (fusion_load_store): New functon. >>>>>>> (extract_base_offset_in_addr): New function. >>>>>>> (aarch64_gen_adjusted_ldpstp): New function. >>>>>>> (aarch64_sched_fusion_priority): New function. >>>>>>> (aarch64_operands_ok_for_ldpstp): New function. >>>>>>> (aarch64_operands_adjust_ok_for_ldpstp): New function. >>>>>>> >>>>>>> 2014-11-18 Bin Cheng <bin.ch...@arm.com> >>>>>>> >>>>>>> * gcc.target/aarch64/ldp-stp-1.c: New test. >>>>>>> * gcc.target/aarch64/ldp-stp-2.c: New test. >>>>>>> * gcc.target/aarch64/ldp-stp-3.c: New test. >>>>>>> * gcc.target/aarch64/ldp-stp-4.c: New test. >>>>>>> * gcc.target/aarch64/ldp-stp-5.c: New test. >>>>>>> * gcc.target/aarch64/lr_free_1.c: Disable scheduling fusion >>>>>>> and peephole2 pass. >>>>>> >>>>>> Committed, thanks. /Marcus >>>>> >>>>> >>>>> This patch has a bug in it dealing with volatile mems. It will do a >>>>> peephole of two volatile mems into a pair which is not correct as the >>>>> order in the architecture is not defined and might even swap the order >>>>> of the load/store incorrectly. >>>>> I noticed this when I was merging in my changes for an improved >>>>> peephone which already has a check for volatile. >>>> >>>> Something like: >>>> @@ -10555,6 +10863,10 @@ aarch64_operands_ok_for_ldpstp (rtx >>>> *operands, bool load, >>>> reg_2 = operands[3]; >>>> } >>>> >>>> + /* The mems cannot be volatile. */ >>>> + if (MEM_VOLATILE_P (mem_1) || MEM_VOLATILE_P (mem_2)) >>>> + return false; >>>> + >>>> /* Check if the addresses are in the form of [base+offset]. */ >>>> extract_base_offset_in_addr (mem_1, &base_1, &offset_1); >>>> if (base_1 == NULL_RTX || offset_1 == NULL_RTX) >>>> >>>> >>>> ---- CUT ---- >>>> gcc.target/aarch64/rev16_1.c is the testcase where two volatile loads >>>> being merged into one load pair. >>> >>> >>> And this one: >>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c >>> index 40881e7..da36dc8 100644 >>> --- a/gcc/config/aarch64/aarch64.c >>> +++ b/gcc/config/aarch64/aarch64.c >>> @@ -10974,6 +10974,10 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx >>> *operands, bool load, >>> if (!MEM_P (mem_1) || aarch64_mem_pair_operand (mem_1, mode)) >>> return false; >>> >>> + if (MEM_VOLATILE_P (mem_1) || MEM_VOLATILE_P (mem_2) >>> + || MEM_VOLATILE_P (mem_3) ||MEM_VOLATILE_P (mem_4)) >>> + return false; >>> + >>> /* Check if the addresses are in the form of [base+offset]. */ >>> extract_base_offset_in_addr (mem_1, &base_1, &offset_1); >>> if (base_1 == NULL_RTX || offset_1 == NULL_RTX) >>> --- CUT --- >>> This one happens with c-c++-common/torture/complex-sign-add.c. >>> I will post a full patch tomorrow. >>> >>> Thanks, >>> Andrew >>> >>>> >>>> Thanks, >>>> Andrew >>>> >>>>> >>>>> Thanks, >>>>> Andrew Pinski >> Hi Andrew, >> Thanks for reporting this. When I facing this volatile problem, I >> thought it was fine as long as the memory access wasn't removed >> accidentally. Could you please explain more why the order of volatile >> also matters? > > Because volatile access between sequence points cannot be reordered. > See https://gcc.gnu.org/onlinedocs/gcc/Volatiles.html . > Mainly: > " The minimum requirement is that at a sequence point all previous > accesses to volatile objects have stabilized and no subsequent > accesses have occurred. Thus an implementation is free to reorder and > combine volatile accesses that occur between sequence points, but > cannot do so for accesses across a sequence point". > >> Is it related to some barriers between the two >> accesses? > > Volatile accesses are not allowed to be merged as I pointed out above > or reordered which gives an implicit barrier between the two accesses. Ah, yes, it is implicit barrier that can't be crossed. Thanks very much for explanation.
Thanks, bin > >> If that's the case, does it imply that dependence created >> by GCC isn't correct? > > It is correct just not the peepholes you created. > >> I am still not sure why the case passed in my >> regression test, or how I missed it. > > Well that is because the ICE only shows up with my improved > peephole2/load_pair/store_pair patterns. My load_pair/strore_pair > patterns that I had created were rejecting volatile memory locations > to make sure nobody used these patterns with volatile memory > locations. Once I get some more time, I will post a clean up patch of > my peephole2/load_pair/store_pair which I think are slightly easier to > understand. > > Thanks, > Andrew Pinski > >> >> Any way, volatile cases should be handled conservatively here. Thanks again. >> >> Thanks, >> bin