On Tue, 5 Feb 2019 at 15:44, Matthew Malcomson <matthew.malcom...@arm.com> wrote:
>
> These peepholes match a pair of SImode loads or stores that can be
> implemented with a single LDRD or STRD instruction.
> When compiling for TARGET_ARM, these peepholes originally created a set
> pattern in DI mode to be caught by movdi patterns.
>
> This approach failed to take into account the possibility that the two
> matched insns operated on memory with different aliasing information.
> The peepholes lost the aliasing information on one of the insns, which
> could then cause the scheduler to make an invalid transformation.
>
> This patch changes the peepholes so they generate a PARALLEL expression
> of the two relevant loads or stores, which means the aliasing
> information of both is kept. Such a PARALLEL pattern is what the
> peepholes currently produce for TARGET_THUMB2.
>
> In order to match these new insn patterns, we add two new define_insn's.
> These define_insn's use the same checks as the peepholes to find valid
> insns.
>
> Note that the patterns now created by the peepholes for LDRD and STRD
> are very similar to those created by the peepholes for LDM and STM.
> Many patterns could be matched by the LDM and STM define_insns, which
> means we rely on the order the define_insn patterns are defined in the
> machine description, with those for LDRD/STRD defined before those for
> LDM/STM.
>
> The difference between the peepholes for LDRD/STRD and those for LDM/STM
> are mainly that those for LDRD/STRD have some logic to ensure that the
> two registers are consecutive and the first one is even.
>
> Bootstrapped and regtested on arm-none-linux-gnu.
> Demonstrated fix of bug 88714 by bootstrapping on armv7l.
>
>
> gcc/ChangeLog:
>
> 2019-02-05 Matthew Malcomson <matthew.malcom...@arm.com>
>
> PR bootstrap/88714
> * config/arm/arm-protos.h (valid_operands_ldrd_strd,
> arm_count_ldrdstrd_insns): New declarations.
> * config/arm/arm.c (mem_ok_for_ldrd_strd): Remove broken handling of
> MINUS.
> (valid_operands_ldrd_strd): New function.
> (arm_count_ldrdstrd_insns): New function.
> * config/arm/ldrdstrd.md: Change peepholes to generate PARALLEL SImode
> sets instead of single DImode set and define new insns to match this.
>
> gcc/testsuite/ChangeLog:
>
> 2019-02-05 Matthew Malcomson <matthew.malcom...@arm.com>
>
> * gcc.c-torture/execute/pr88714.c: New test.
> * gcc.dg/rtl/arm/ldrd-peepholes.c: New test.
>
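
For readers following the description above, here is a rough standalone
sketch of the two conditions it mentions: the register pair must be
(even, even + 1), and the two word-sized accesses must be consecutive with
the lower offset first. This is only an illustrative model, not the GCC
code. The name pair_ok_for_ldrd_strd and its parameters are hypothetical,
and it deliberately ignores everything else the real checks look at
(base register, alignment, writeback, special registers).

```c
#include <stdbool.h>

/* Hypothetical model of the pairing rules described in the patch message.
   This is NOT GCC's operands_ok_ldrd_strd or valid_operands_ldrd_strd;
   it only covers the two conditions named in the text.  */
bool
pair_ok_for_ldrd_strd (int reg_first, int reg_second,
                       long offset_first, long offset_second)
{
  /* The first register must be even and the second must follow it.  */
  if (reg_first % 2 != 0 || reg_second != reg_first + 1)
    return false;

  /* The lower address must come first.  */
  if (offset_first > offset_second)
    return false;

  /* The accesses must be exactly one 4-byte (SImode) word apart.  */
  return offset_second - offset_first == 4;
}
```

Under these assumptions, (r2, r3) at offsets 0 and 4 is accepted, while
(r3, r2) at the same offsets, or (r2, r3) at offsets 0 and 8, is rejected.
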
Hi!

I'm afraid this patch causes several regressions. Maybe they have
already been fixed post-commit (I have several validations for later
commits still running)?

For the whole picture, see:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/268644/report-build-info.html

Namely there are some ICEs:
--target arm-none-linux-gnueabi
--with-mode arm
--with-cpu cortex-a9
--with-fpu default

  gcc.c-torture/execute/builtins/memcpy-chk.c compilation, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (internal compiler error)
  gcc.c-torture/execute/builtins/memmove-chk.c compilation, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (internal compiler error)
  gcc.c-torture/execute/builtins/mempcpy-chk.c compilation, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (internal compiler error)
  gcc.c-torture/execute/builtins/memset-chk.c compilation, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (internal compiler error)
  gcc.c-torture/execute/builtins/sprintf-chk.c compilation, -O2 (internal compiler error)
  gcc.c-torture/execute/builtins/sprintf-chk.c compilation, -O2 -flto -fno-use-linker-plugin -flto-partition=none (internal compiler error)
  gcc.c-torture/execute/builtins/sprintf-chk.c compilation, -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (internal compiler error)
  gcc.c-torture/execute/builtins/sprintf-chk.c compilation, -O3 -g (internal compiler error)
  gcc.c-torture/execute/builtins/stpcpy-chk.c compilation, -O2 (internal compiler error)
  gcc.c-torture/execute/builtins/stpcpy-chk.c compilation, -O2 -flto -fno-use-linker-plugin -flto-partition=none (internal compiler error)
  gcc.c-torture/execute/builtins/stpcpy-chk.c compilation, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (internal compiler error)
  gcc.c-torture/execute/builtins/stpcpy-chk.c compilation, -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (internal compiler error)
  gcc.c-torture/execute/builtins/stpcpy-chk.c compilation, -O3 -g (internal compiler error)
  gcc.c-torture/execute/builtins/strcat-chk.c compilation, -O2 (internal compiler error)
  gcc.c-torture/execute/builtins/strcat-chk.c compilation, -O2 -flto -fno-use-linker-plugin -flto-partition=none (internal compiler error)
  gcc.c-torture/execute/builtins/strcat-chk.c compilation, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (internal compiler error)
  gcc.c-torture/execute/builtins/strcat-chk.c compilation, -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (internal compiler error)
  gcc.c-torture/execute/builtins/strcat-chk.c compilation, -O3 -g (internal compiler error)
  gcc.c-torture/execute/builtins/strncat-chk.c compilation, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (internal compiler error)

Failing assembler scans:
--target arm-none-linux-gnueabi
--with-mode arm
--with-cpu cortex-a9
--with-fpu default
Dejagnu flags: -march=armv5t
(ie. same config as above, but forcing -march=armv5t when running the
tests: this avoids the ICE, but the scan-assembler fails)

  gcc.dg/rtl/arm/ldrd-peepholes.c scan-assembler-not ldm
  gcc.dg/rtl/arm/ldrd-peepholes.c scan-assembler-not stm
  gcc.dg/rtl/arm/ldrd-peepholes.c scan-assembler-times ldrd\\tr[2468], \\[r0, #8\\] 1
  gcc.dg/rtl/arm/ldrd-peepholes.c scan-assembler-times ldrd\\tr[2468], \\[r0\\] 4
  gcc.dg/rtl/arm/ldrd-peepholes.c scan-assembler-times strd\\tr[2468], \\[r0, #8\\] 1
  gcc.dg/rtl/arm/ldrd-peepholes.c scan-assembler-times strd\\tr[2468], \\[r0\\] 6
  gcc.dg/rtl/arm/ldrd-peepholes.c scan-rtl-dump peephole2 "Function foo_x1.*\\(parallel \\[\\n[^\\n]*\\(set \\(mem[^\\n]*\\n[^\\n]*\\(reg:SI (?:[12])?[2468] r(?:[12])?[2468]\\).*Function foo_x2"
  gcc.dg/rtl/arm/ldrd-peepholes.c scan-rtl-dump peephole2 "Function foo_x2.*\\(parallel \\[\\n[^\\n]*\\(set \\(mem[^\\n]*\\n[^\\n]*\\(reg:SI (?:[12])?[2468] r(?:[12])?[2468]\\).*Function foo_x3"
  gcc.dg/rtl/arm/ldrd-peepholes.c scan-rtl-dump peephole2 "Function foo_x3.*\\(parallel \\[\\n[^\\n]*\\(set \\(mem[^\\n]*\\n[^\\n]*\\(reg:SI (?:[12])?[2468] r(?:[12])?[2468]\\).*Function foo_x4"
  gcc.dg/rtl/arm/ldrd-peepholes.c scan-rtl-dump peephole2 "Function foo_x4.*\\(parallel \\[\\n[^\\n]*\\(set \\(reg:SI[^\\n]*\\n *\\(mem/c:SI \\(plus:SI \\(reg:SI 0 r0\\)\\n *\\(const_int 8.*Function foo_x5"
  gcc.dg/rtl/arm/ldrd-peepholes.c scan-rtl-dump peephole2 "Function foo_x5.*\\(parallel \\[\\n[^\\n]*\\(set \\(mem/c:SI \\(plus:SI \\(reg:SI 0 r0\\)\\n *\\(const_int 8.*$"

A few more ICEs:
--target arm-none-linux-gnueabihf
--with-mode arm
--with-cpu cortex-a57
--with-fpu crypto-neon-fp-armv8

  gcc.dg/torture/stackalign/nested-6.c -O2 -flto -fno-use-linker-plugin -flto-partition=none -fpic (internal compiler error)
  gcc.dg/torture/stackalign/nested-6.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects -fpic (internal compiler error)
  gcc.dg/torture/stackalign/nested-6.c -O2 -fpic (internal compiler error)
  gcc.dg/torture/stackalign/nested-6.c -O3 -g -fpic (internal compiler error)

And GCC fails to build:
--target arm-none-linux-gnueabihf
--with-mode arm
--with-cpu cortex-a5
--with-fpu vfpv3-d16-fp16

when compiling libsanitizer/libbacktrace:
Makefile:564: recipe for target 'cp-demangle.lo' failed
make[4]: *** [cp-demangle.lo] Error 1
/home/christophe.lyon/src/GCC/sources/gcc-fsf/trunk/libsanitizer/libbacktrace/../../libiberty/cp-demangle.c: In function 'is_ctor_or_dtor':
/home/christophe.lyon/src/GCC/sources/gcc-fsf/trunk/libsanitizer/libbacktrace/../../libiberty/cp-demangle.c:6615:1: error: insn does not satisfy its constraints:
 6615 | }
      | ^
(insn 236 107 218 2 (parallel [
            (set (mem/c:SI (plus:SI (reg/f:SI 11 fp)
                        (const_int -56 [0xffffffffffffffc8])) [1 di.num_comps+0 S4 A32])
                (reg:SI 12 ip [131]))
            (set (mem/f/c:SI (plus:SI (reg/f:SI 11 fp)
                        (const_int -52 [0xffffffffffffffcc])) [26 di.subs+0 S4 A32])
                (reg/f:SI 13 sp))
        ]) "/home/christophe.lyon/src/GCC/sources/gcc-fsf/trunk/libsanitizer/libbacktrace/../../libiberty/cp-demangle.c":6567:13 347 {*arm_strd}
     (nil))
during RTL pass: cprop_hardreg
/home/christophe.lyon/src/GCC/sources/gcc-fsf/trunk/libsanitizer/libbacktrace/../../libiberty/cp-demangle.c:6615:1: internal compiler error: in extract_constrain_insn, at recog.c:2211
0x5a8795 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
        /home/christophe.lyon/src/GCC/sources/gcc-fsf/trunk/gcc/rtl-error.c:108
0x5a87bb _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
        /home/christophe.lyon/src/GCC/sources/gcc-fsf/trunk/gcc/rtl-error.c:119
0xb7c15d extract_constrain_insn(rtx_insn*)
        /home/christophe.lyon/src/GCC/sources/gcc-fsf/trunk/gcc/recog.c:2211
0xb7fd26 copyprop_hardreg_forward_1
        /home/christophe.lyon/src/GCC/sources/gcc-fsf/trunk/gcc/regcprop.c:801
0xb80b22 execute
        /home/christophe.lyon/src/GCC/sources/gcc-fsf/trunk/gcc/regcprop.c:1307

Thanks,

Christophe

>
>
> ############### Attachment also inlined for ease of reply ###############
>
>
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index 79ede0db174fcce87abe8b4d18893550d4c7e2f6..485bc68a618d6ae4a1640368ccb025fe2c9e1420 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -125,6 +125,7 @@ extern rtx arm_gen_store_multiple (int *, int, rtx, int, rtx, HOST_WIDE_INT *);
>  extern bool offset_ok_for_ldrd_strd (HOST_WIDE_INT);
>  extern bool operands_ok_ldrd_strd (rtx, rtx, rtx, HOST_WIDE_INT, bool, bool);
>  extern bool gen_operands_ldrd_strd (rtx *, bool, bool, bool);
> +extern bool valid_operands_ldrd_strd (rtx *, bool);
>  extern int arm_gen_movmemqi (rtx *);
>  extern bool gen_movmem_ldrd_strd (rtx *);
>  extern machine_mode arm_select_cc_mode (RTX_CODE, rtx, rtx);
> @@ -146,6 +147,7 @@ extern const char *output_mov_long_double_arm_from_arm (rtx *);
>  extern const char *output_move_double (rtx *, bool, int *count);
>  extern const char *output_move_quad (rtx *);
>  extern int arm_count_output_move_double_insns (rtx *);
> +extern int arm_count_ldrdstrd_insns (rtx *, bool);
>  extern const char
*output_move_vfp (rtx *operands); > extern const char *output_move_neon (rtx *operands); > extern int arm_attr_length_move_neon (rtx_insn *); > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > index > 73cb8df9af1ec9d680091bb8691bcd925a1be1d3..1c336da9e5b0948ef1058c46966364510dc1ca38 > 100644 > --- a/gcc/config/arm/arm.c > +++ b/gcc/config/arm/arm.c > @@ -15556,7 +15556,7 @@ mem_ok_for_ldrd_strd (rtx mem, rtx *base, rtx > *offset, HOST_WIDE_INT *align) > *base = addr; > return true; > } > - else if (GET_CODE (addr) == PLUS || GET_CODE (addr) == MINUS) > + else if (GET_CODE (addr) == PLUS) > { > *base = XEXP (addr, 0); > *offset = XEXP (addr, 1); > @@ -15721,7 +15721,7 @@ gen_operands_ldrd_strd (rtx *operands, bool load, > } > > /* Make sure accesses are to consecutive memory locations. */ > - if (gap != 4) > + if (gap != GET_MODE_SIZE (SImode)) > return false; > > if (!align_ok_ldrd_strd (align[0], offset)) > @@ -15802,6 +15802,55 @@ gen_operands_ldrd_strd (rtx *operands, bool load, > } > > > +/* Return true if parallel execution of the two word-size accesses provided > + could be satisfied with a single LDRD/STRD instruction. Two word-size > + accesses are represented by the OPERANDS array, where OPERANDS[0,1] are > + register operands and OPERANDS[2,3] are the corresponding memory operands. > + */ > +bool > +valid_operands_ldrd_strd (rtx *operands, bool load) > +{ > + int nops = 2; > + HOST_WIDE_INT offsets[2], offset, align[2]; > + rtx base = NULL_RTX; > + rtx cur_base, cur_offset; > + int i, gap; > + > + /* Check that the memory references are immediate offsets from the > + same base register. Extract the base register, the destination > + registers, and the corresponding memory offsets. 
*/ > + for (i = 0; i < nops; i++) > + { > + if (!mem_ok_for_ldrd_strd (operands[nops+i], &cur_base, &cur_offset, > + &align[i])) > + return false; > + > + if (i == 0) > + base = cur_base; > + else if (REGNO (base) != REGNO (cur_base)) > + return false; > + > + offsets[i] = INTVAL (cur_offset); > + if (GET_CODE (operands[i]) == SUBREG) > + return false; > + } > + > + if (offsets[0] > offsets[1]) > + return false; > + > + gap = offsets[1] - offsets[0]; > + offset = offsets[0]; > + > + /* Make sure accesses are to consecutive memory locations. */ > + if (gap != GET_MODE_SIZE (SImode)) > + return false; > + > + if (!align_ok_ldrd_strd (align[0], offset)) > + return false; > + > + return operands_ok_ldrd_strd (operands[0], operands[1], base, offset, > + false, load); > +} > > > > > /* Print a symbolic form of X to the debug file, F. */ > @@ -28474,6 +28523,26 @@ arm_count_output_move_double_insns (rtx *operands) > return count; > } > > +/* Same as above, but operands are a register/memory pair in SImode. > + Assumes operands has the base register in position 0 and memory in > position > + 2 (which is the order provided by the arm_{ldrd,strd} patterns). 
*/ > +int > +arm_count_ldrdstrd_insns (rtx *operands, bool load) > +{ > + int count; > + rtx ops[2]; > + int regnum, memnum; > + if (load) > + regnum = 0, memnum = 1; > + else > + regnum = 1, memnum = 0; > + ops[regnum] = gen_rtx_REG (DImode, REGNO (operands[0])); > + ops[memnum] = adjust_address (operands[2], DImode, 0); > + output_move_double (ops, false, &count); > + return count; > +} > + > + > int > vfp3_const_double_for_fract_bits (rtx operand) > { > diff --git a/gcc/config/arm/ldrdstrd.md b/gcc/config/arm/ldrdstrd.md > index > be53d010fa6dfcf5a6854ae2b17f7cfcea25db9e..cb7a6adebbc8084a2e642ff2dcbef8b3fb16f268 > 100644 > --- a/gcc/config/arm/ldrdstrd.md > +++ b/gcc/config/arm/ldrdstrd.md > @@ -23,37 +23,22 @@ > ;; The following peephole optimizations identify consecutive memory > ;; accesses, and try to rearrange the operands to enable generation of > ;; ldrd/strd. > +;; > +;; In many cases they behave in the same way that patterns in ldmstm.md > behave, > +;; but there is extra logic in gen_operands_ldrd_strd to try and ensure the > +;; registers used are an (r<N>, r<N + 1>) pair where N is even. > > (define_peephole2 ; ldrd > [(set (match_operand:SI 0 "arm_general_register_operand" "") > - (match_operand:SI 2 "memory_operand" "")) > + (match_operand:SI 2 "memory_operand" "")) > (set (match_operand:SI 1 "arm_general_register_operand" "") > - (match_operand:SI 3 "memory_operand" ""))] > + (match_operand:SI 3 "memory_operand" ""))] > "TARGET_LDRD" > - [(const_int 0)] > + [(parallel [(set (match_dup 0) (match_dup 2)) > + (set (match_dup 1) (match_dup 3))])] > { > if (!gen_operands_ldrd_strd (operands, true, false, false)) > FAIL; > - else if (TARGET_ARM) > - { > - /* In ARM state, the destination registers of LDRD/STRD must be > - consecutive. We emit DImode access. 
*/ > - operands[0] = gen_rtx_REG (DImode, REGNO (operands[0])); > - operands[2] = adjust_address (operands[2], DImode, 0); > - /* Emit [(set (match_dup 0) (match_dup 2))] */ > - emit_insn (gen_rtx_SET (operands[0], operands[2])); > - DONE; > - } > - else if (TARGET_THUMB2) > - { > - /* Emit the pattern: > - [(parallel [(set (match_dup 0) (match_dup 2)) > - (set (match_dup 1) (match_dup 3))])] */ > - rtx t1 = gen_rtx_SET (operands[0], operands[2]); > - rtx t2 = gen_rtx_SET (operands[1], operands[3]); > - emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, t1, t2))); > - DONE; > - } > }) > > (define_peephole2 ; strd > @@ -62,117 +47,50 @@ (define_peephole2 ; strd > (set (match_operand:SI 3 "memory_operand" "") > (match_operand:SI 1 "arm_general_register_operand" ""))] > "TARGET_LDRD" > - [(const_int 0)] > + [(parallel [(set (match_dup 2) (match_dup 0)) > + (set (match_dup 3) (match_dup 1))])] > { > if (!gen_operands_ldrd_strd (operands, false, false, false)) > FAIL; > - else if (TARGET_ARM) > - { > - /* In ARM state, the destination registers of LDRD/STRD must be > - consecutive. We emit DImode access. */ > - operands[0] = gen_rtx_REG (DImode, REGNO (operands[0])); > - operands[2] = adjust_address (operands[2], DImode, 0); > - /* Emit [(set (match_dup 2) (match_dup 0))] */ > - emit_insn (gen_rtx_SET (operands[2], operands[0])); > - DONE; > - } > - else if (TARGET_THUMB2) > - { > - /* Emit the pattern: > - [(parallel [(set (match_dup 2) (match_dup 0)) > - (set (match_dup 3) (match_dup 1))])] */ > - rtx t1 = gen_rtx_SET (operands[2], operands[0]); > - rtx t2 = gen_rtx_SET (operands[3], operands[1]); > - emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, t1, t2))); > - DONE; > - } > }) > > ;; The following peepholes reorder registers to enable LDRD/STRD. 
> (define_peephole2 ; strd of constants > [(set (match_operand:SI 0 "arm_general_register_operand" "") > - (match_operand:SI 4 "const_int_operand" "")) > + (match_operand:SI 4 "const_int_operand" "")) > (set (match_operand:SI 2 "memory_operand" "") > - (match_dup 0)) > + (match_dup 0)) > (set (match_operand:SI 1 "arm_general_register_operand" "") > - (match_operand:SI 5 "const_int_operand" "")) > + (match_operand:SI 5 "const_int_operand" "")) > (set (match_operand:SI 3 "memory_operand" "") > - (match_dup 1))] > + (match_dup 1))] > "TARGET_LDRD" > - [(const_int 0)] > + [(set (match_dup 0) (match_dup 4)) > + (set (match_dup 1) (match_dup 5)) > + (parallel [(set (match_dup 2) (match_dup 0)) > + (set (match_dup 3) (match_dup 1))])] > { > if (!gen_operands_ldrd_strd (operands, false, true, false)) > FAIL; > - else if (TARGET_ARM) > - { > - rtx tmp = gen_rtx_REG (DImode, REGNO (operands[0])); > - operands[2] = adjust_address (operands[2], DImode, 0); > - /* Emit the pattern: > - [(set (match_dup 0) (match_dup 4)) > - (set (match_dup 1) (match_dup 5)) > - (set (match_dup 2) tmp)] */ > - emit_insn (gen_rtx_SET (operands[0], operands[4])); > - emit_insn (gen_rtx_SET (operands[1], operands[5])); > - emit_insn (gen_rtx_SET (operands[2], tmp)); > - DONE; > - } > - else if (TARGET_THUMB2) > - { > - /* Emit the pattern: > - [(set (match_dup 0) (match_dup 4)) > - (set (match_dup 1) (match_dup 5)) > - (parallel [(set (match_dup 2) (match_dup 0)) > - (set (match_dup 3) (match_dup 1))])] */ > - emit_insn (gen_rtx_SET (operands[0], operands[4])); > - emit_insn (gen_rtx_SET (operands[1], operands[5])); > - rtx t1 = gen_rtx_SET (operands[2], operands[0]); > - rtx t2 = gen_rtx_SET (operands[3], operands[1]); > - emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, t1, t2))); > - DONE; > - } > }) > > (define_peephole2 ; strd of constants > [(set (match_operand:SI 0 "arm_general_register_operand" "") > - (match_operand:SI 4 "const_int_operand" "")) > + (match_operand:SI 4 
"const_int_operand" "")) > (set (match_operand:SI 1 "arm_general_register_operand" "") > - (match_operand:SI 5 "const_int_operand" "")) > + (match_operand:SI 5 "const_int_operand" "")) > (set (match_operand:SI 2 "memory_operand" "") > - (match_dup 0)) > + (match_dup 0)) > (set (match_operand:SI 3 "memory_operand" "") > - (match_dup 1))] > + (match_dup 1))] > "TARGET_LDRD" > - [(const_int 0)] > + [(set (match_dup 0) (match_dup 4)) > + (set (match_dup 1) (match_dup 5)) > + (parallel [(set (match_dup 2) (match_dup 0)) > + (set (match_dup 3) (match_dup 1))])] > { > if (!gen_operands_ldrd_strd (operands, false, true, false)) > FAIL; > - else if (TARGET_ARM) > - { > - rtx tmp = gen_rtx_REG (DImode, REGNO (operands[0])); > - operands[2] = adjust_address (operands[2], DImode, 0); > - /* Emit the pattern > - [(set (match_dup 0) (match_dup 4)) > - (set (match_dup 1) (match_dup 5)) > - (set (match_dup 2) tmp)] */ > - emit_insn (gen_rtx_SET (operands[0], operands[4])); > - emit_insn (gen_rtx_SET (operands[1], operands[5])); > - emit_insn (gen_rtx_SET (operands[2], tmp)); > - DONE; > - } > - else if (TARGET_THUMB2) > - { > - /* Emit the pattern: > - [(set (match_dup 0) (match_dup 4)) > - (set (match_dup 1) (match_dup 5)) > - (parallel [(set (match_dup 2) (match_dup 0)) > - (set (match_dup 3) (match_dup 1))])] */ > - emit_insn (gen_rtx_SET (operands[0], operands[4])); > - emit_insn (gen_rtx_SET (operands[1], operands[5])); > - rtx t1 = gen_rtx_SET (operands[2], operands[0]); > - rtx t2 = gen_rtx_SET (operands[3], operands[1]); > - emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, t1, t2))); > - DONE; > - } > }) > > ;; The following two peephole optimizations are only relevant for ARM > @@ -181,39 +99,32 @@ (define_peephole2 ; strd of constants > (define_peephole2 ; swap the destination registers of two loads > ; before a commutative operation. 
> [(set (match_operand:SI 0 "arm_general_register_operand" "") > - (match_operand:SI 2 "memory_operand" "")) > + (match_operand:SI 2 "memory_operand" "")) > (set (match_operand:SI 1 "arm_general_register_operand" "") > - (match_operand:SI 3 "memory_operand" "")) > + (match_operand:SI 3 "memory_operand" "")) > (set (match_operand:SI 4 "arm_general_register_operand" "") > - (match_operator:SI 5 "commutative_binary_operator" > + (match_operator:SI 5 "commutative_binary_operator" > [(match_operand 6 "arm_general_register_operand" > "") > (match_operand 7 "arm_general_register_operand" > "") ]))] > "TARGET_LDRD && TARGET_ARM > && ( ((rtx_equal_p(operands[0], operands[6])) && > (rtx_equal_p(operands[1], operands[7]))) > - ||((rtx_equal_p(operands[0], operands[7])) && > (rtx_equal_p(operands[1], operands[6])))) > + ||((rtx_equal_p(operands[0], operands[7])) && > (rtx_equal_p(operands[1], operands[6])))) > && (peep2_reg_dead_p (3, operands[0]) || rtx_equal_p (operands[0], > operands[4])) > && (peep2_reg_dead_p (3, operands[1]) || rtx_equal_p (operands[1], > operands[4]))" > - [(set (match_dup 0) (match_dup 2)) > + [(parallel [(set (match_dup 0) (match_dup 2)) > + (set (match_dup 1) (match_dup 3))]) > (set (match_dup 4) (match_op_dup 5 [(match_dup 6) (match_dup 7)]))] > - { > - if (!gen_operands_ldrd_strd (operands, true, false, true)) > - { > - FAIL; > - } > - else > - { > - operands[0] = gen_rtx_REG (DImode, REGNO (operands[0])); > - operands[2] = adjust_address (operands[2], DImode, 0); > - } > - } > -) > +{ > + if (!gen_operands_ldrd_strd (operands, true, false, true)) > + FAIL; > +}) > > (define_peephole2 ; swap the destination registers of two loads > ; before a commutative operation that sets the flags. 
> [(set (match_operand:SI 0 "arm_general_register_operand" "") > - (match_operand:SI 2 "memory_operand" "")) > + (match_operand:SI 2 "memory_operand" "")) > (set (match_operand:SI 1 "arm_general_register_operand" "") > - (match_operand:SI 3 "memory_operand" "")) > + (match_operand:SI 3 "memory_operand" "")) > (parallel > [(set (match_operand:SI 4 "arm_general_register_operand" "") > (match_operator:SI 5 "commutative_binary_operator" > @@ -225,24 +136,62 @@ (define_peephole2 ; swap the destination registers of > two loads > ||((rtx_equal_p(operands[0], operands[7])) && > (rtx_equal_p(operands[1], operands[6])))) > && (peep2_reg_dead_p (3, operands[0]) || rtx_equal_p (operands[0], > operands[4])) > && (peep2_reg_dead_p (3, operands[1]) || rtx_equal_p (operands[1], > operands[4]))" > - [(set (match_dup 0) (match_dup 2)) > + [(parallel [(set (match_dup 0) (match_dup 2)) > + (set (match_dup 1) (match_dup 3))]) > (parallel > [(set (match_dup 4) > (match_op_dup 5 [(match_dup 6) (match_dup 7)])) > (clobber (reg:CC CC_REGNUM))])] > - { > - if (!gen_operands_ldrd_strd (operands, true, false, true)) > - { > - FAIL; > - } > - else > - { > - operands[0] = gen_rtx_REG (DImode, REGNO (operands[0])); > - operands[2] = adjust_address (operands[2], DImode, 0); > - } > - } > -) > +{ > + if (!gen_operands_ldrd_strd (operands, true, false, true)) > + FAIL; > +}) > > ;; TODO: Handle LDRD/STRD with writeback: > ;; (a) memory operands can be POST_INC, POST_DEC, PRE_MODIFY, POST_MODIFY > ;; (b) Patterns may be followed by an update of the base address. > + > + > +;; insns matching the LDRD/STRD patterns that will get created by the above > +;; peepholes. > +;; We use gen_operands_ldrd_strd() with a modify argument as false so that > the > +;; operands are not changed. 
> +(define_insn "*arm_ldrd" > + [(parallel [(set (match_operand:SI 0 "s_register_operand" "=r") > + (match_operand:SI 2 "memory_operand" "m")) > + (set (match_operand:SI 1 "s_register_operand" "=r") > + (match_operand:SI 3 "memory_operand" "m"))])] > + "TARGET_LDRD && TARGET_ARM && reload_completed > + && valid_operands_ldrd_strd (operands, true)" > + { > + rtx op[2]; > + op[0] = gen_rtx_REG (DImode, REGNO (operands[0])); > + op[1] = adjust_address (operands[2], DImode, 0); > + return output_move_double (op, true, NULL); > + } > + [(set (attr "length") > + (symbol_ref "arm_count_ldrdstrd_insns (operands, true) * 4")) > + (set (attr "ce_count") (symbol_ref "get_attr_length (insn) / 4")) > + (set_attr "type" "load_8") > + (set_attr "predicable" "yes")] > +) > + > +(define_insn "*arm_strd" > + [(parallel [(set (match_operand:SI 2 "memory_operand" "=m") > + (match_operand:SI 0 "s_register_operand" "r")) > + (set (match_operand:SI 3 "memory_operand" "=m") > + (match_operand:SI 1 "s_register_operand" "r"))])] > + "TARGET_LDRD && TARGET_ARM && reload_completed > + && valid_operands_ldrd_strd (operands, false)" > + { > + rtx op[2]; > + op[0] = adjust_address (operands[2], DImode, 0); > + op[1] = gen_rtx_REG (DImode, REGNO (operands[0])); > + return output_move_double (op, true, NULL); > + } > + [(set (attr "length") > + (symbol_ref "arm_count_ldrdstrd_insns (operands, false) * 4")) > + (set (attr "ce_count") (symbol_ref "get_attr_length (insn) / 4")) > + (set_attr "type" "store_8") > + (set_attr "predicable" "yes")] > +) > diff --git a/gcc/testsuite/gcc.c-torture/execute/pr88714.c > b/gcc/testsuite/gcc.c-torture/execute/pr88714.c > new file mode 100644 > index > 0000000000000000000000000000000000000000..614ad9ac4a0662ba752532270e2d687505504d48 > --- /dev/null > +++ b/gcc/testsuite/gcc.c-torture/execute/pr88714.c > @@ -0,0 +1,43 @@ > +/* PR bootstrap/88714 */ > + > +struct S { int a, b, c; int *d; }; > +struct T { int *e, *f, *g; } *t = 0; > +int *o = 0; > + > 
+__attribute__((noipa)) > +void bar (int *x, int y, int z, int w) > +{ > + if (w == -1) > + { > + if (x != 0 || y != 0 || z != 0) > + __builtin_abort (); > + } > + else if (w != 0 || x != t->g || y != 0 || z != 12) > + __builtin_abort (); > +} > + > +__attribute__((noipa)) void > +foo (struct S *x, struct S *y, int *z, int w) > +{ > + *o = w; > + if (w) > + bar (0, 0, 0, -1); > + x->d = z; > + if (y->d) > + y->c = y->c + y->d[0]; > + bar (t->g, 0, y->c, 0); > +} > + > +int > +main () > +{ > + int a[4] = { 8, 9, 10, 11 }; > + struct S s = { 1, 2, 3, &a[0] }; > + struct T u = { 0, 0, &a[3] }; > + o = &a[2]; > + t = &u; > + foo (&s, &s, &a[1], 5); > + if (s.c != 12 || s.d != &a[1]) > + __builtin_abort (); > + return 0; > +} > diff --git a/gcc/testsuite/gcc.dg/rtl/arm/ldrd-peepholes.c > b/gcc/testsuite/gcc.dg/rtl/arm/ldrd-peepholes.c > new file mode 100644 > index > 0000000000000000000000000000000000000000..ff209c5df29765441bbe9481ac8caf7bbc6af8f7 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/rtl/arm/ldrd-peepholes.c > @@ -0,0 +1,441 @@ > +/* { dg-do compile } */ > +/* { dg-skip-if "Ensure only targetting arm with TARGET_LDRD" { *-*-* } { > "-mthumb" } { "" } } */ > +/* { dg-options "-O3 -marm -fdump-rtl-peephole2" } */ > + > +/* > + Test file contains testcases that are there to check. > + 1) Each peephole generates the expected patterns. > + 2) These patterns match the expected define_insns and generate > ldrd/strd. > + 2) Memory alias information is not lost in the peephole transformation. > + > + I don't check the peephole pass on most of the functions here but just > check > + the correct assembly is output. The ldrd/strd peepholes only generate a > + different pattern to the ldm/stm peepholes in some specific cases, and > those > + are checked. > + > + The exceptions are tested by the crafted testcases at the end of this file > + that are named in the pattern foo_x[[:digit:]]. 
> + > + The first testcase (foo_mem_11) demonstrates bug 88714 is fixed by > checking > + that both alias sets in the RTL are preserved. > + > + All other testcases are only checked to see that they generate a LDRD or > + STRD instruction accordingly. > + */ > + > + > +/* Example of bugzilla 88714 -- memory aliasing info needs to be retained. > */ > +int __RTL (startwith ("peephole2")) foo_mem_11 (int *a, int *b) > +{ > +(function "foo_mem_11" > + (insn-chain > + (cnote 1 NOTE_INSN_DELETED) > + (block 2 > + (edge-from entry (flags "FALLTHRU")) > + (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK) > + (cinsn 101 (set (reg:SI r2) > + (mem/c:SI (reg:SI r0) [1 S4 A64])) > "/home/matmal01/test.c":18) > + (cinsn 102 (set (reg:SI r3) > + (mem/c:SI (plus:SI (reg:SI r0) (const_int 4)) [2 S4 > A32])) "/home/matmal01/test.c":18) > + (cinsn 103 (set (reg:SI r0) > + (plus:SI (reg:SI r2) (reg:SI r3))) > "/home/matmal01/test.c":18) > + (edge-to exit (flags "FALLTHRU")) > + ) ;; block 2 > + ) ;; insn-chain > + (crtl > + (return_rtx > + (reg/i:SI r0) > + ) ;; return_rtx > + ) ;; crtl > +) ;; function "main" > +} > +/* { dg-final { scan-rtl-dump {Function > foo_mem_11.*\(mem/c:SI[^\n]*\[1.*\(mem/c:SI[^\n]*\n[^\n]*\[2.*Function foo11} > "peephole2" } } */ > + > +/* ldrd plain peephole2. 
*/ > +int __RTL (startwith ("peephole2")) foo11 (int *a) > +{ > +(function "foo11" > + (insn-chain > + (cnote 1 NOTE_INSN_DELETED) > + (block 2 > + (edge-from entry (flags "FALLTHRU")) > + (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK) > + (cinsn 101 (set (reg:SI r2) > + (mem/c:SI (reg:SI r0) [0 S4 A64])) > "/home/matmal01/test.c":18) > + (cinsn 102 (set (reg:SI r3) > + (mem/c:SI (plus:SI (reg:SI r0) (const_int 4)) [0 S4 > A32])) "/home/matmal01/test.c":18) > + (cinsn 103 (set (reg:SI r0) > + (plus:SI (reg:SI r2) (reg:SI r3))) > "/home/matmal01/test.c":18) > + (edge-to exit (flags "FALLTHRU")) > + ) ;; block 2 > + ) ;; insn-chain > + (crtl > + (return_rtx > + (reg/i:SI r0) > + ) ;; return_rtx > + ) ;; crtl > +) ;; function "main" > +} > + > +/* ldrd plain peephole2, which accepts insns initially out of order. */ > +int __RTL (startwith ("peephole2")) foo11_alt (int *a) > +{ > +(function "foo11_alt" > + (insn-chain > + (cnote 1 NOTE_INSN_DELETED) > + (block 2 > + (edge-from entry (flags "FALLTHRU")) > + (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK) > + (cinsn 102 (set (reg:SI r3) > + (mem/c:SI (plus:SI (reg:SI r0) (const_int 4)) [0 S4 > A32])) "/home/matmal01/test.c":18) > + (cinsn 101 (set (reg:SI r2) > + (mem/c:SI (reg:SI r0) [0 S4 A64])) > "/home/matmal01/test.c":18) > + (cinsn 103 (set (reg:SI r0) > + (plus:SI (reg:SI r2) (reg:SI r3))) > "/home/matmal01/test.c":18) > + (edge-to exit (flags "FALLTHRU")) > + ) ;; block 2 > + ) ;; insn-chain > + (crtl > + (return_rtx > + (reg/i:SI r0) > + ) ;; return_rtx > + ) ;; crtl > +) ;; function "main" > +} > + > +/* strd plain peephole2. 
*/ > +int __RTL (startwith ("peephole2")) foo12 (int *a) > +{ > +(function "foo12" > + (insn-chain > + (cnote 1 NOTE_INSN_DELETED) > + (block 2 > + (edge-from entry (flags "FALLTHRU")) > + (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK) > + (cinsn 101 (set (mem/c:SI (reg:SI r0) [0 S4 A64]) > + (reg:SI r2)) "/home/matmal01/test.c":18) > + (cinsn 102 (set (mem/c:SI (plus:SI (reg:SI r0) (const_int 4)) [0 S4 > A32]) > + (reg:SI r3)) "/home/matmal01/test.c":18) > + (edge-to exit (flags "FALLTHRU")) > + ) ;; block 2 > + ) ;; insn-chain > + (crtl > + (return_rtx > + (reg/i:SI r0) > + ) ;; return_rtx > + ) ;; crtl > +) ;; function "main" > +} > + > +/* strd of constants -- store interleaved with constant move into register. > + Use same register twice to ensure we use the relevant pattern. */ > +int __RTL (startwith ("peephole2")) foo13 (int *a) > +{ > +(function "foo13" > + (insn-chain > + (cnote 1 NOTE_INSN_DELETED) > + (block 2 > + (edge-from entry (flags "FALLTHRU")) > + (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK) > + (cinsn 99 (set (reg:SI r2) > + (const_int 1)) "/home/matmal01/test.c":18) > + (cinsn 101 (set (mem/c:SI (reg:SI r0) [0 S4 A64]) > + (reg:SI r2)) "/home/matmal01/test.c":18) > + (cinsn 100 (set (reg:SI r2) > + (const_int 0)) "/home/matmal01/test.c":18) > + (cinsn 102 (set (mem/c:SI (plus:SI (reg:SI r0) (const_int 4)) [0 S4 > A32]) > + (reg:SI r2)) "/home/matmal01/test.c":18) > + (edge-to exit (flags "FALLTHRU")) > + ) ;; block 2 > + ) ;; insn-chain > + (crtl > + (return_rtx > + (reg/i:SI r0) > + ) ;; return_rtx > + ) ;; crtl > +) ;; function "main" > +} > + > +/* strd of constants -- stores after constant moves into registers. > + Use registers out of order, is only way to avoid plain strd while hitting > + this pattern. 
*/
> +int __RTL (startwith ("peephole2")) foo14 (int *a)
> +{
> +(function "foo14"
> +  (insn-chain
> +    (cnote 1 NOTE_INSN_DELETED)
> +    (block 2
> +      (edge-from entry (flags "FALLTHRU"))
> +      (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK)
> +      (cinsn 99 (set (reg:SI r3)
> +                     (const_int 1)) "/home/matmal01/test.c":18)
> +      (cinsn 100 (set (reg:SI r2)
> +                      (const_int 0)) "/home/matmal01/test.c":18)
> +      (cinsn 101 (set (mem/c:SI (reg:SI r0) [0 S4 A64])
> +                      (reg:SI r3)) "/home/matmal01/test.c":18)
> +      (cinsn 102 (set (mem/c:SI (plus:SI (reg:SI r0) (const_int 4)) [0 S4 A32])
> +                      (reg:SI r2)) "/home/matmal01/test.c":18)
> +      (edge-to exit (flags "FALLTHRU"))
> +    ) ;; block 2
> +  ) ;; insn-chain
> +  (crtl
> +    (return_rtx
> +      (reg/i:SI r0)
> +    ) ;; return_rtx
> +  ) ;; crtl
> +) ;; function "main"
> +}
> +
> +/* swap the destination registers of two loads before a commutative operation.
> +   Here the commutative operation is what the peephole uses to know it can
> +   swap the register loads around.  */
> +int __RTL (startwith ("peephole2")) foo15 (int *a)
> +{
> +(function "foo15"
> +  (insn-chain
> +    (cnote 1 NOTE_INSN_DELETED)
> +    (block 2
> +      (edge-from entry (flags "FALLTHRU"))
> +      (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK)
> +      (cinsn 100 (set (reg:SI r3)
> +                      (mem/c:SI (reg:SI r0) [0 S4 A64])) "/home/matmal01/test.c":18)
> +      (cinsn 101 (set (reg:SI r2)
> +                      (mem/c:SI (plus:SI (reg:SI r0) (const_int 4)) [0 S4 A32])) "/home/matmal01/test.c":18)
> +      (cinsn 102 (set (reg:SI r0)
> +                      (plus:SI (reg:SI r2) (reg:SI r3))) "/home/matmal01/test.c":18
> +                 (expr_list:REG_DEAD (reg:SI r2)
> +                   (expr_list:REG_DEAD (reg:SI r3)
> +                     (nil))))
> +      (edge-to exit (flags "FALLTHRU"))
> +    ) ;; block 2
> +  ) ;; insn-chain
> +  (crtl
> +    (return_rtx
> +      (reg/i:SI r0)
> +    ) ;; return_rtx
> +  ) ;; crtl
> +) ;; function "main"
> +}
> +
> +
> +/* swap the destination registers of two loads before a commutative operation
> +   that sets the flags.
*/
> +/*
> +   NOTE Can't make a testcase for this pattern since there are no insn patterns
> +   matching the parallel insn in the peephole.
> +
> +   i.e. until some define_insn is defined matching that insn that peephole can
> +   never match in real code, and in artificial RTL code any pattern that can
> +   match it will cause an ICE.
> +
> +int __RTL (startwith ("peephole2")) foo16 (int *a)
> +{
> +(function "foo16"
> +  (insn-chain
> +    (cnote 1 NOTE_INSN_DELETED)
> +    (block 2
> +      (edge-from entry (flags "FALLTHRU"))
> +      (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK)
> +      (cinsn 100 (set (reg:SI r3)
> +                      (mem/c:SI (reg:SI r0) [0 S4 A64])) "/home/matmal01/test.c":18)
> +      (cinsn 101 (set (reg:SI r2)
> +                      (mem/c:SI (plus:SI (reg:SI r0) (const_int 4)) [0 S4 A32])) "/home/matmal01/test.c":18)
> +      (cinsn 103 (parallel
> +                   [(set (reg:SI r0)
> +                         (and:SI (reg:SI r3) (reg:SI r2)))
> +                    (clobber (reg:CC cc))]) "/home/matmal01/test.c":18
> +                 (expr_list:REG_DEAD (reg:SI r2)
> +                   (expr_list:REG_DEAD (reg:SI r3)
> +                     (nil))))
> +      (edge-to exit (flags "FALLTHRU"))
> +    ) ;; block 2
> +  ) ;; insn-chain
> +  (crtl
> +    (return_rtx
> +      (reg/i:SI r0)
> +    ) ;; return_rtx
> +  ) ;; crtl
> +) ;; function "main"
> +}
> +*/
> +
> +
> +/* Making patterns that will behave differently between the LDM/STM peepholes
> +   and LDRD/STRD peepholes.
> +   gen_operands_ldrd_strd() uses peep2_find_free_register() to find spare
> +   registers to use.
> +   peep2_find_free_register() only ever returns registers marked in
> +   call_used_regs, hence we make sure to leave register 2 and 3 available (as
> +   they are always on in the defaults marked by CALL_USED_REGISTERS).  */
> +
> +/* gen_operands_ldrd_strd() purposefully finds an even register to look at
> +   which would treat the following pattern differently to the stm peepholes.
> + */
> +int __RTL (startwith ("peephole2")) foo_x1 (int *a)
> +{
> +(function "foo_x1"
> +  (insn-chain
> +    (cnote 1 NOTE_INSN_DELETED)
> +    (block 2
> +      (edge-from entry (flags "FALLTHRU"))
> +      (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK)
> +      (cinsn 99 (set (reg:SI r5)
> +                     (const_int 1)) "/home/matmal01/test.c":18)
> +      (cinsn 101 (set (mem/c:SI (reg:SI r0) [0 S4 A64])
> +                      (reg:SI r5)) "/home/matmal01/test.c":18)
> +      (cinsn 100 (set (reg:SI r5)
> +                      (const_int 0)) "/home/matmal01/test.c":18)
> +      (cinsn 102 (set (mem/c:SI (plus:SI (reg:SI r0) (const_int 4)) [0 S4 A32])
> +                      (reg:SI r5)) "/home/matmal01/test.c":18
> +                 (expr_list:REG_DEAD (reg:SI r5)
> +                   (nil)))
> +      (edge-to exit (flags "FALLTHRU"))
> +    ) ;; block 2
> +  ) ;; insn-chain
> +  (crtl
> +    (return_rtx
> +      (reg/i:SI r0)
> +    ) ;; return_rtx
> +  ) ;; crtl
> +) ;; function "main"
> +}
> +/* Ensure we generated a parallel that started with a set from an even register.
> +   i.e.
> +   (parallel [
> +     (set (mem
> +       (reg:SI <even>
> + */
> +/* { dg-final { scan-rtl-dump {Function foo_x1.*\(parallel \[\n[^\n]*\(set \(mem[^\n]*\n[^\n]*\(reg:SI (?:[12])?[2468] r(?:[12])?[2468]\).*Function foo_x2} "peephole2" } } */
> +
> +/* Like above gen_operands_ldrd_strd() would look to start with an even
> +   register while gen_const_stm_seq() doesn't care.
*/
> +int __RTL (startwith ("peephole2")) foo_x2 (int *a)
> +{
> +(function "foo_x2"
> +  (insn-chain
> +    (cnote 1 NOTE_INSN_DELETED)
> +    (block 2
> +      (edge-from entry (flags "FALLTHRU"))
> +      (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK)
> +      (cinsn 99 (set (reg:SI r5)
> +                     (const_int 1)) "/home/matmal01/test.c":18)
> +      (cinsn 100 (set (reg:SI r6)
> +                      (const_int 0)) "/home/matmal01/test.c":18)
> +      (cinsn 101 (set (mem/c:SI (reg:SI r0) [0 S4 A64])
> +                      (reg:SI r5)) "/home/matmal01/test.c":18)
> +      (cinsn 102 (set (mem/c:SI (plus:SI (reg:SI r0) (const_int 4)) [0 S4 A32])
> +                      (reg:SI r6)) "/home/matmal01/test.c":18)
> +      (edge-to exit (flags "FALLTHRU"))
> +    ) ;; block 2
> +  ) ;; insn-chain
> +  (crtl
> +    (return_rtx
> +      (reg/i:SI r0)
> +    ) ;; return_rtx
> +  ) ;; crtl
> +) ;; function "main"
> +}
> +/* Ensure generated parallel starts with a set from an even register (as foo_x1).  */
> +/* { dg-final { scan-rtl-dump {Function foo_x2.*\(parallel \[\n[^\n]*\(set \(mem[^\n]*\n[^\n]*\(reg:SI (?:[12])?[2468] r(?:[12])?[2468]\).*Function foo_x3} "peephole2" } } */
> +
> +/* When storing multiple values into a register that will be used later, ldrd
> +   searches for another register to use instead of just giving up.
*/
> +int __RTL (startwith ("peephole2")) foo_x3 (int *a)
> +{
> +(function "foo_x3"
> +  (insn-chain
> +    (cnote 1 NOTE_INSN_DELETED)
> +    (block 2
> +      (edge-from entry (flags "FALLTHRU"))
> +      (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK)
> +      (cinsn 99 (set (reg:SI r3)
> +                     (const_int 1)) "/home/matmal01/test.c":18)
> +      (cinsn 101 (set (mem/c:SI (reg:SI r0) [0 S4 A64])
> +                      (reg:SI r3)) "/home/matmal01/test.c":18)
> +      (cinsn 100 (set (reg:SI r3)
> +                      (const_int 0)) "/home/matmal01/test.c":18)
> +      (cinsn 102 (set (mem/c:SI (plus:SI (reg:SI r0) (const_int 4)) [0 S4 A32])
> +                      (reg:SI r3)) "/home/matmal01/test.c":18)
> +      (cinsn 103 (set (reg:SI r0)
> +                      (plus:SI (reg:SI r0) (reg:SI r3))) "/home/matmal01/test.c":18)
> +      (edge-to exit (flags "FALLTHRU"))
> +    ) ;; block 2
> +  ) ;; insn-chain
> +  (crtl
> +    (return_rtx
> +      (reg/i:SI r0)
> +    ) ;; return_rtx
> +  ) ;; crtl
> +) ;; function "main"
> +}
> +/* Ensure generated parallel starts with a set from an even register (as foo_x1).  */
> +/* { dg-final { scan-rtl-dump {Function foo_x3.*\(parallel \[\n[^\n]*\(set \(mem[^\n]*\n[^\n]*\(reg:SI (?:[12])?[2468] r(?:[12])?[2468]\).*Function foo_x4} "peephole2" } } */
> +
> +/* ldrd gen_peephole2_11 but using plus 8 and plus 12 in the offsets.
*/
> +int __RTL (startwith ("peephole2")) foo_x4 (int *a)
> +{
> +(function "foo_x4"
> +  (insn-chain
> +    (cnote 1 NOTE_INSN_DELETED)
> +    (block 2
> +      (edge-from entry (flags "FALLTHRU"))
> +      (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK)
> +      (cinsn 101 (set (reg:SI r2)
> +                      (mem/c:SI (plus:SI (reg:SI r0) (const_int 8)) [0 S4 A64])) "/home/matmal01/test.c":18)
> +      (cinsn 102 (set (reg:SI r3)
> +                      (mem/c:SI (plus:SI (reg:SI r0) (const_int 12)) [0 S4 A32])) "/home/matmal01/test.c":18)
> +      (cinsn 103 (set (reg:SI r0)
> +                      (plus:SI (reg:SI r2) (reg:SI r3))) "/home/matmal01/test.c":18)
> +      (edge-to exit (flags "FALLTHRU"))
> +    ) ;; block 2
> +  ) ;; insn-chain
> +  (crtl
> +    (return_rtx
> +      (reg/i:SI r0)
> +    ) ;; return_rtx
> +  ) ;; crtl
> +) ;; function "main"
> +}
> +/* Ensure generated parallel starts with a set from the appropriate offset from
> +   register 0.
> +(parallel [
> +  (set (reg:SI ...
> +    (mem/c:SI (plus:SI (reg:SI 0 r0)
> +      (const_int 8 .*
> +*/
> +/* { dg-final { scan-rtl-dump {Function foo_x4.*\(parallel \[\n[^\n]*\(set \(reg:SI[^\n]*\n *\(mem/c:SI \(plus:SI \(reg:SI 0 r0\)\n *\(const_int 8.*Function foo_x5} "peephole2" } } */
> +
> +/* strd gen_peephole2_12 but using plus 8 and plus 12 in the offsets.  */
> +int __RTL (startwith ("peephole2")) foo_x5 (int *a)
> +{
> +(function "foo12"
> +  (insn-chain
> +    (cnote 1 NOTE_INSN_DELETED)
> +    (block 2
> +      (edge-from entry (flags "FALLTHRU"))
> +      (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK)
> +      (cinsn 101 (set (mem/c:SI (plus:SI (reg:SI r0) (const_int 8)) [0 S4 A64])
> +                      (reg:SI r2)) "/home/matmal01/test.c":18)
> +      (cinsn 102 (set (mem/c:SI (plus:SI (reg:SI r0) (const_int 12)) [0 S4 A32])
> +                      (reg:SI r3)) "/home/matmal01/test.c":18)
> +      (edge-to exit (flags "FALLTHRU"))
> +    ) ;; block 2
> +  ) ;; insn-chain
> +  (crtl
> +    (return_rtx
> +      (reg/i:SI r0)
> +    ) ;; return_rtx
> +  ) ;; crtl
> +) ;; function "main"
> +}
> +/* Ensure generated parallel starts with a set to the appropriate offset from
> +   register 0.
*/
> +/* { dg-final { scan-rtl-dump {Function foo_x5.*\(parallel \[\n[^\n]*\(set \(mem/c:SI \(plus:SI \(reg:SI 0 r0\)\n *\(const_int 8.*$} "peephole2" } } */
> +
> +
> +/* { dg-final { scan-assembler-not "ldm" } } */
> +/* { dg-final { scan-assembler-not "stm" } } */
> +/* { dg-final { scan-assembler-times {ldrd\tr[2468], \[r0\]} 4 } } */
> +/* { dg-final { scan-assembler-times {ldrd\tr[2468], \[r0, #8\]} 1 } } */
> +/* { dg-final { scan-assembler-times {strd\tr[2468], \[r0\]} 6 } } */
> +/* { dg-final { scan-assembler-times {strd\tr[2468], \[r0, #8\]} 1 } } */
>
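As a reading aid for the tests above: the cover letter says the LDRD/STRD peepholes ensure the two registers are consecutive with the first one even, and the tests pair a word at offset N with one at offset N + 4. Below is a minimal C sketch of just those two conditions. The function names are hypothetical, this is not the code in arm.c, and it leaves out everything else the real checks may look at (offset range, alignment, Thumb-2 rules).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not from the patch: true when two hard register
   numbers satisfy the constraint the cover letter describes for the
   LDRD/STRD peepholes (consecutive registers, first one even).  */
static bool
sketch_regs_pair_ok (int first_regno, int second_regno)
{
  /* Register numbers are never negative; this is only a sanity check.  */
  assert (first_regno >= 0 && second_regno >= 0);
  return (first_regno % 2 == 0) && (second_regno == first_regno + 1);
}

/* Hypothetical helper: true when two SImode accesses off the same base
   register are adjacent words, e.g. offsets 0 and 4, or 8 and 12.  */
static bool
sketch_offsets_adjacent (long first_offset, long second_offset)
{
  return second_offset == first_offset + 4;
}
```

Under this sketch, (r2, r3) at offsets 0 and 4 passes, while (r3, r2) as in foo14 and (r5, r6) as in foo_x2 do not, which appears to be why those tests expect the peephole to end up with a different register pair.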