On Sun, Dec 7, 2014 at 5:47 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
> On Sun, Dec 7, 2014 at 6:24 PM, Andrew Pinski <pins...@gmail.com> wrote:
>> On Sat, Dec 6, 2014 at 6:35 PM, Andrew Pinski <pins...@gmail.com> wrote:
>>> On Sat, Dec 6, 2014 at 5:54 PM, Andrew Pinski <pins...@gmail.com> wrote:
>>>> On Fri, Dec 5, 2014 at 9:08 AM, Marcus Shawcroft
>>>> <marcus.shawcr...@gmail.com> wrote:
>>>>> On 18 November 2014 at 08:34, Bin Cheng <bin.ch...@arm.com> wrote:
>>>>>
>>>>>> 2014-11-18  Bin Cheng  <bin.ch...@arm.com>
>>>>>>
>>>>>>         * config/aarch64/aarch64.md (load_pair<mode>): Split to
>>>>>>         load_pairsi, load_pairdi, load_pairsf and load_pairdf.
>>>>>>         (load_pairsi, load_pairdi, load_pairsf, load_pairdf): Split
>>>>>>         from load_pair<mode>.  New alternative to support int/fp
>>>>>>         registers in fp/int mode patterns.
>>>>>>         (store_pair<mode>): Split to store_pairsi, store_pairdi,
>>>>>>         store_pairsf and store_pairdf.
>>>>>>         (store_pairsi, store_pairdi, store_pairsf, store_pairdf): Split
>>>>>>         from store_pair<mode>.  New alternative to support int/fp
>>>>>>         registers in fp/int mode patterns.
>>>>>>         (*load_pair_extendsidi2_aarch64): New pattern.
>>>>>>         (*load_pair_zero_extendsidi2_aarch64): New pattern.
>>>>>>         (aarch64-ldpstp.md): Include.
>>>>>>         * config/aarch64/aarch64-ldpstp.md: New file.
>>>>>>         * config/aarch64/aarch64-protos.h (aarch64_gen_adjusted_ldpstp):
>>>>>>         New.
>>>>>>         (extract_base_offset_in_addr): New.
>>>>>>         (aarch64_operands_ok_for_ldpstp): New.
>>>>>>         (aarch64_operands_adjust_ok_for_ldpstp): New.
>>>>>>         * config/aarch64/aarch64.c (enum sched_fusion_type): New enum.
>>>>>>         (TARGET_SCHED_FUSION_PRIORITY): New hook.
>>>>>>         (fusion_load_store): New function.
>>>>>>         (extract_base_offset_in_addr): New function.
>>>>>>         (aarch64_gen_adjusted_ldpstp): New function.
>>>>>>         (aarch64_sched_fusion_priority): New function.
>>>>>>         (aarch64_operands_ok_for_ldpstp): New function.
>>>>>>         (aarch64_operands_adjust_ok_for_ldpstp): New function.
>>>>>>
>>>>>> 2014-11-18  Bin Cheng  <bin.ch...@arm.com>
>>>>>>
>>>>>>         * gcc.target/aarch64/ldp-stp-1.c: New test.
>>>>>>         * gcc.target/aarch64/ldp-stp-2.c: New test.
>>>>>>         * gcc.target/aarch64/ldp-stp-3.c: New test.
>>>>>>         * gcc.target/aarch64/ldp-stp-4.c: New test.
>>>>>>         * gcc.target/aarch64/ldp-stp-5.c: New test.
>>>>>>         * gcc.target/aarch64/lr_free_1.c: Disable scheduling fusion
>>>>>>         and peephole2 pass.
>>>>>
>>>>> Committed, thanks. /Marcus
>>>>
>>>> This patch has a bug in it dealing with volatile mems.  It will do a
>>>> peephole of two volatile mems into a pair, which is not correct: the
>>>> architecture does not define the order of the two accesses within a
>>>> pair, so the transformation might even swap the order of the
>>>> load/store incorrectly.
>>>> I noticed this when I was merging in my changes for an improved
>>>> peephole which already has a check for volatile.
>>>
>>> Something like:
>>> @@ -10555,6 +10863,10 @@ aarch64_operands_ok_for_ldpstp (rtx *operands, bool load,
>>>        reg_2 = operands[3];
>>>      }
>>>
>>> +  /* The mems cannot be volatile.  */
>>> +  if (MEM_VOLATILE_P (mem_1) || MEM_VOLATILE_P (mem_2))
>>> +    return false;
>>> +
>>>    /* Check if the addresses are in the form of [base+offset].  */
>>>    extract_base_offset_in_addr (mem_1, &base_1, &offset_1);
>>>    if (base_1 == NULL_RTX || offset_1 == NULL_RTX)
>>>
>>> ---- CUT ----
>>> gcc.target/aarch64/rev16_1.c is the testcase where two volatile loads
>>> are being merged into one load pair.
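>>>
>>> For concreteness, a minimal sketch of the hazard (hypothetical code,
>>> not the rev16_1.c testcase itself): two adjacent volatile reads that
>>> the peephole can fuse into a single load pair:
>>>
>>>   volatile unsigned int regs[2];  /* hypothetical device register bank */
>>>
>>>   unsigned int
>>>   read_regs (void)
>>>   {
>>>     unsigned int a = regs[0];  /* must be performed first */
>>>     unsigned int b = regs[1];  /* must be performed second */
>>>     return a + b;
>>>   }
>>>
>>> Once the two reads are fused into "ldp w0, w1, [x0]", the architecture
>>> no longer guarantees which of the two accesses happens first.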
>>
>> And this one:
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index 40881e7..da36dc8 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -10974,6 +10974,10 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, bool load,
>>    if (!MEM_P (mem_1) || aarch64_mem_pair_operand (mem_1, mode))
>>      return false;
>>
>> +  if (MEM_VOLATILE_P (mem_1) || MEM_VOLATILE_P (mem_2)
>> +      || MEM_VOLATILE_P (mem_3) || MEM_VOLATILE_P (mem_4))
>> +    return false;
>> +
>>    /* Check if the addresses are in the form of [base+offset].  */
>>    extract_base_offset_in_addr (mem_1, &base_1, &offset_1);
>>    if (base_1 == NULL_RTX || offset_1 == NULL_RTX)
>> --- CUT ---
>> This one happens with c-c++-common/torture/complex-sign-add.c.
>> I will post a full patch tomorrow.
>>
>> Thanks,
>> Andrew
>>
>>>
>>> Thanks,
>>> Andrew
>>>
>>>>
>>>> Thanks,
>>>> Andrew Pinski

> Hi Andrew,
> Thanks for reporting this.  When I was facing this volatile problem, I
> thought it was fine as long as the memory accesses weren't removed
> accidentally.  Could you please explain more why the order of volatile
> accesses also matters?
Because volatile accesses between sequence points cannot be reordered.
See https://gcc.gnu.org/onlinedocs/gcc/Volatiles.html .  Mainly:
"The minimum requirement is that at a sequence point all previous
accesses to volatile objects have stabilized and no subsequent
accesses have occurred.  Thus an implementation is free to reorder
and combine volatile accesses that occur between sequence points,
but cannot do so for accesses across a sequence point."

> Is it related to some barriers between the two
> accesses?

Volatile accesses are not allowed to be merged, as I pointed out
above, or reordered, which gives an implicit barrier between the two
accesses.

> If that's the case, does it imply that dependence created
> by GCC isn't correct?

It is correct; it is just the peepholes you created that are not.

> I am still not sure why the case passed in my
> regression test, or how I missed it.

That is because the ICE only shows up with my improved
peephole2/load_pair/store_pair patterns.  The load_pair/store_pair
patterns I had created were rejecting volatile memory locations, to
make sure nobody used these patterns with volatile memory locations.
Once I get some more time, I will post a cleaned-up patch of my
peephole2/load_pair/store_pair patterns, which I think are slightly
easier to understand.

Thanks,
Andrew Pinski

> Anyway, volatile cases should be handled conservatively here.  Thanks again.
>
> Thanks,
> bin
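P.S.  For anyone wiring the same check into another backend, a minimal
sketch of the conservative test as a standalone helper (hypothetical
name, assuming GCC's internal rtl.h context; not the exact code from
either hunk above):

  /* Hypothetical helper: return true only if none of the N operands
     in OPS is a volatile mem, i.e. the accesses are safe to consider
     for load/store pairing.  Uses rtx, MEM_P and MEM_VOLATILE_P from
     GCC's rtl.h; non-mem operands are ignored.  */
  static bool
  mems_ok_for_pairing_p (rtx *ops, int n)
  {
    for (int i = 0; i < n; i++)
      if (MEM_P (ops[i]) && MEM_VOLATILE_P (ops[i]))
        return false;
    return true;
  }

  /* Usage sketch: early in an operands-ok check, before any address
     analysis, something like
       if (!mems_ok_for_pairing_p (operands, 4))
         return false;  */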