[PATCH, AArch64] Improve handling of constants destined for FP_REGS
(This patch supersedes an earlier one: http://gcc.gnu.org/ml/gcc-patches/2013-07/msg01462.html)

The movdi_aarch64 pattern allows moving a constant into an FP_REG, but has the constraint Dd, which is stricter than the constraint N for moving a constant into a CORE_REG. This is due to the restricted set of values allowed for the MOVI instruction.

Because the predicate allows any constant that is valid for the CORE_REGs, we can run into situations where IRA/reload has decided to use FP_REGS but the value is not actually valid for MOVI.

This patch makes use of TARGET_PREFERRED_RELOAD_CLASS to ensure that NO_REGS (which leads to a literal-pool load) is returned when the immediate can't be put directly into FP_REGS.

A testcase is included. Linux regressions all came back good.

OK for trunk?

Cheers,
Ian

2013-09-04  Ian Bolton

gcc/
	* config/aarch64/aarch64.c (aarch64_preferred_reload_class):
	Return NO_REGS for immediate that can't be moved directly
	into FP_REGS.

testsuite/
	* gcc.target/aarch64/movdi_1.c: New test.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index aed035a..2c07ccf 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4236,10 +4236,18 @@ aarch64_class_max_nregs (reg_class_t regclass, enum machine_mode mode)
 }
 
 static reg_class_t
-aarch64_preferred_reload_class (rtx x ATTRIBUTE_UNUSED, reg_class_t regclass)
+aarch64_preferred_reload_class (rtx x, reg_class_t regclass)
 {
-  return ((regclass == POINTER_REGS || regclass == STACK_REG)
-	  ? GENERAL_REGS : regclass);
+  if (regclass == POINTER_REGS || regclass == STACK_REG)
+    return GENERAL_REGS;
+
+  /* If it's an integer immediate that MOVI can't handle, then
+     FP_REGS is not an option, so we return NO_REGS instead.  */
+  if (CONST_INT_P (x) && reg_class_subset_p (regclass, FP_REGS)
+      && !aarch64_simd_imm_scalar_p (x, GET_MODE (x)))
+    return NO_REGS;
+
+  return regclass;
 }
 
 void

diff --git a/gcc/testsuite/gcc.target/aarch64/movdi_1.c b/gcc/testsuite/gcc.target/aarch64/movdi_1.c
new file mode 100644
index 000..a22378d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movdi_1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-inline" } */
+
+#include <arm_neon.h>
+
+void
+foo1 (uint64_t *a)
+{
+  uint64x1_t val18;
+  uint32x2_t val19;
+  uint64x1_t val20;
+  val19 = vcreate_u32 (0x80004cf3dffbUL);
+  val20 = vrsra_n_u64 (val18, vreinterpret_u64_u32 (val19), 34);
+  vst1_u64 (a, val20);
+}
+
+void
+foo2 (uint64_t *a)
+{
+  uint64x1_t val18;
+  uint32x2_t val19;
+  uint64x1_t val20;
+  val19 = vcreate_u32 (0xdffbUL);
+  val20 = vrsra_n_u64 (val18, vreinterpret_u64_u32 (val19), 34);
+  vst1_u64 (a, val20);
+}
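For reference, a rough model of the encodability test being applied — a hedged sketch of the 64-bit MOVI byte-mask immediate form (each byte of the immediate must be all-zeros or all-ones), not the actual aarch64_simd_imm_scalar_p implementation:

  /* Hedged sketch: nonzero if X fits the 64-bit MOVI byte-mask
     immediate form (every byte 0x00 or 0xff).  The real check in
     GCC is aarch64_simd_imm_scalar_p.  */
  static int
  movi_byte_mask_p (unsigned long long x)
  {
    int i;
    for (i = 0; i < 8; i++)
      {
        unsigned int b = (x >> (i * 8)) & 0xff;
        if (b != 0x00 && b != 0xff)
          return 0;
      }
    return 1;
  }

Under that rule, a constant like the testcase's 0x80004cf3dffbUL fails the check, which is exactly the situation where the new hook steers reload towards NO_REGS.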
[PATCH][AArch64] Restrict usage of SBFIZ to valid range only
This fixes an issue where we were generating an SBFIZ with operand 3 outside of the valid range (as determined by the size of the destination register and the amount of shift).

My patch checks that the range is valid before allowing the pattern to be used.

This has now had full regression testing and all is OK.

OK for aarch64-trunk and aarch64-4_7-branch?

Cheers,
Ian

2012-10-15  Ian Bolton

	* gcc/config/aarch64/aarch64.md (_shft_): Restrict
	based on op2.

-

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index e6086a9..3bfe6e6 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2311,7 +2311,7 @@
 	(ashift:GPI (ANY_EXTEND:GPI
 		     (match_operand:ALLX 1 "register_operand" "r"))
 		    (match_operand 2 "const_int_operand" "n")))]
-  ""
+  " <= ( - UINTVAL (operands[2]))"
   "bfiz\\t%0, %1, %2, #"
   [(set_attr "v8type" "bfm")
    (set_attr "mode" "")]
RE: [PATCH][AArch64] Restrict usage of SBFIZ to valid range only
> Subject: [PATCH][AArch64] Restrict usage of SBFIZ to valid range only
>
> This fixes an issue where we were generating an SBFIZ with
> operand 3 outside of the valid range (as determined by the
> size of the destination register and the amount of shift).
>
> My patch checks that the range is valid before allowing
> the pattern to be used.
>
> This has now had full regression testing and all is OK.
>
> OK for aarch64-trunk and aarch64-4_7-branch?
>
> Cheers,
> Ian
>
> 2012-10-15  Ian Bolton
>
> 	* gcc/config/aarch64/aarch64.md
> 	(_shft_): Restrict based on op2.
>
> -
>
> diff --git a/gcc/config/aarch64/aarch64.md
> b/gcc/config/aarch64/aarch64.md
> index e6086a9..3bfe6e6 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -2311,7 +2311,7 @@
>  	(ashift:GPI (ANY_EXTEND:GPI
>  		     (match_operand:ALLX 1 "register_operand" "r"))
>  		    (match_operand 2 "const_int_operand" "n")))]
> -  ""
> +  " <= ( - UINTVAL (operands[2]))"
>    "bfiz\\t%0, %1, %2, #"
>    [(set_attr "v8type" "bfm")
>     (set_attr "mode" "")]

New and improved version is at the end of this email. This has had full regression testing and all is OK.

OK for aarch64-trunk and aarch64-4_7-branch?

Cheers,
Ian

2012-10-16  Ian Bolton

	* gcc/config/aarch64/aarch64.md (_shft_): Restrict operands.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index e6086a9..e77496f 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2311,8 +2311,13 @@
 	(ashift:GPI (ANY_EXTEND:GPI
 		     (match_operand:ALLX 1 "register_operand" "r"))
 		    (match_operand 2 "const_int_operand" "n")))]
-  ""
-  "bfiz\\t%0, %1, %2, #"
+  "UINTVAL (operands[2]) < "
+{
+  operands[3] = ( <= ( - UINTVAL (operands[2])))
+		? GEN_INT ()
+		: GEN_INT ( - UINTVAL (operands[2]));
+  return "bfiz\t%0, %1, %2, %3";
+}
  [(set_attr "v8type" "bfm")
   (set_attr "mode" "")]
)
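To make the new operand-3 computation concrete, here is a hedged C model of the clamp (assuming sbfiz requires lsb + width <= register size):

  /* Hedged sketch: the bitfield width must not run past the top of
     the destination register.  */
  static int
  sbfiz_width (int src_bits, int shift, int reg_bits)
  {
    return src_bits <= reg_bits - shift ? src_bits : reg_bits - shift;
  }

For example, sign-extending a 16-bit source shifted left by 56 into a 64-bit register gives a width of 8 rather than 16.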
[PATCH, ARM] Implement __builtin_trap
Hi,

Currently, on ARM, you have to either call abort() or raise(SIGTRAP) to achieve a handy crash.

This patch allows you to instead call __builtin_trap(), which is much more efficient at falling over because it becomes just a single instruction that will trap for you.

Two testcases have been added (for ARM and Thumb) and both pass.

Note: This is a modified version of a patch originally submitted by Mark Mitchell back in 2010; it is revived here in response to PR target/59091.

http://gcc.gnu.org/ml/gcc-patches/2010-09/msg00639.html
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59091

The main update, other than cosmetic differences, is that we've chosen the same ARM encoding as LLVM for practical purposes. (The Thumb encoding in Mark's patch already matched LLVM.)

OK for trunk?

Cheers,
Ian

2013-12-04  Ian Bolton
	    Mark Mitchell

gcc/
	* config/arm/arm.md (trap): New pattern.
	* config/arm/types.md: Add a type for trap.

testsuite/
	* gcc.target/arm/builtin-trap.c: New test.
	* gcc.target/arm/thumb-builtin-trap.c: Likewise.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index dd73366..3b7a827 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -9927,6 +9927,22 @@
    (set_attr "type" "mov_reg")]
 )
 
+(define_insn "trap"
+  [(trap_if (const_int 1) (const_int 0))]
+  ""
+  "*
+  if (TARGET_ARM)
+    return \".inst\\t0xe7f000f0\";
+  else
+    return \".inst\\t0xdeff\";
+  "
+  [(set (attr "length")
+	(if_then_else (eq_attr "is_thumb" "yes")
+		      (const_int 2)
+		      (const_int 4)))
+   (set_attr "type" "trap")]
+)
+
 
 ;; Patterns to allow combination of arithmetic, cond code and shifts
 
diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md
index 1c4b9e3..6351f08 100644
--- a/gcc/config/arm/types.md
+++ b/gcc/config/arm/types.md
@@ -152,6 +152,7 @@
 ; store2	store 2 words to memory from arm registers.
 ; store3	store 3 words to memory from arm registers.
 ; store4	store 4 (or more) words to memory from arm registers.
+; trap		cause a trap in the kernel.
 ; udiv		unsigned division.
 ; umaal		unsigned multiply accumulate accumulate long.
 ; umlal		unsigned multiply accumulate long.
@@ -645,6 +646,7 @@
   store2,\
   store3,\
   store4,\
+  trap,\
   udiv,\
   umaal,\
   umlal,\

diff --git a/gcc/testsuite/gcc.target/arm/builtin-trap.c b/gcc/testsuite/gcc.target/arm/builtin-trap.c
new file mode 100644
index 000..4ff8d25
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/builtin-trap.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm32 } */
+
+void
+trap ()
+{
+  __builtin_trap ();
+}
+
+/* { dg-final { scan-assembler "0xe7f000f0" { target { arm_nothumb } } } } */

diff --git a/gcc/testsuite/gcc.target/arm/thumb-builtin-trap.c b/gcc/testsuite/gcc.target/arm/thumb-builtin-trap.c
new file mode 100644
index 000..22e90e7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/thumb-builtin-trap.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-mthumb" } */
+/* { dg-require-effective-target arm_thumb1_ok } */
+
+void
+trap ()
+{
+  __builtin_trap ();
+}
+
+/* { dg-final { scan-assembler "0xdeff" } } */
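As a usage sketch (my example, not part of the patch), the builtin makes a cheap assert-style failure path:

  /* Hedged example: with this patch, the trap compiles to a single
     .inst on both ARM and Thumb.  */
  #define CHECK(cond) do { if (!(cond)) __builtin_trap (); } while (0)

  int
  safe_div (int a, int b)
  {
    CHECK (b != 0);
    return a / b;
  }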
RE: [PATCH, ARM] Implement __builtin_trap
> On Wed, 4 Dec 2013, Ian Bolton wrote:
>
> > The main update, other than cosmetic differences, is that we've chosen
> > the same ARM encoding as LLVM for practical purposes.  (The Thumb
> > encoding in Mark's patch already matched LLVM.)
>
> Do the encodings match what plain "udf" does in recent-enough gas (too
> recent for us to assume it in GCC or glibc for now), or is it something
> else?

Hi Joseph,

Yes, these encodings match the UDF instruction that is defined in the most recent edition of the ARM architecture reference manual.

Thumb: 0xde00 | imm8                     (we chose 0xff for the imm8)
ARM:   0xe7f000f0 | (imm12 << 8) | imm4  (we chose to use 0 for both imms)

So as not to break old versions of gas that don't recognise UDF, the encoding is output directly.

Apologies if I have over-explained there!

Cheers,
Ian
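Spelling out the arithmetic with the immediates chosen above:

  /* Worked check of the chosen encodings.  */
  unsigned int thumb_udf = 0xde00 | 0xff;            /* = 0xdeff     */
  unsigned int arm_udf = 0xe7f000f0 | (0 << 8) | 0;  /* = 0xe7f000f0 */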
RE: [PATCH, ARM] Implement __builtin_trap
> > Hi,
> >
> > Currently, on ARM, you have to either call abort() or raise(SIGTRAP)
> > to achieve a handy crash.
> >
> > This patch allows you to instead call __builtin_trap(), which is much
> > more efficient at falling over because it becomes just a single
> > instruction that will trap for you.
> >
> > Two testcases have been added (for ARM and Thumb) and both pass.
> >
> > Note: This is a modified version of a patch originally submitted by
> > Mark Mitchell back in 2010; it is revived here in response to
> > PR target/59091.
> >
> > http://gcc.gnu.org/ml/gcc-patches/2010-09/msg00639.html
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59091
> >
> > The main update, other than cosmetic differences, is that we've chosen
> > the same ARM encoding as LLVM for practical purposes.  (The Thumb
> > encoding in Mark's patch already matched LLVM.)
> >
> > OK for trunk?
> >
> > Cheers,
> > Ian
> >
> > 2013-12-04  Ian Bolton
> > 	    Mark Mitchell
> >
> > gcc/
> > 	* config/arm/arm.md (trap): New pattern.
> > 	* config/arm/types.md: Add a type for trap.
> >
> > testsuite/
> > 	* gcc.target/arm/builtin-trap.c: New test.
> > 	* gcc.target/arm/thumb-builtin-trap.c: Likewise.
>
> This needs to set the conds attribute to "unconditional".  Otherwise
> the ARM backend might try to turn this into a conditional instruction.
>
> R.

Thanks, Richard. I fixed it up, tested it, and committed it, since the difference from what was already approved is trivial.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index dd73366..934b859 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -9927,6 +9927,23 @@
    (set_attr "type" "mov_reg")]
 )
 
+(define_insn "trap"
+  [(trap_if (const_int 1) (const_int 0))]
+  ""
+  "*
+  if (TARGET_ARM)
+    return \".inst\\t0xe7f000f0\";
+  else
+    return \".inst\\t0xdeff\";
+  "
+  [(set (attr "length")
+	(if_then_else (eq_attr "is_thumb" "yes")
+		      (const_int 2)
+		      (const_int 4)))
+   (set_attr "type" "trap")
+   (set_attr "conds" "unconditional")]
+)
+
 
 ;; Patterns to allow combination of arithmetic, cond code and shifts
 
diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md
index 1c4b9e3..6351f08 100644
--- a/gcc/config/arm/types.md
+++ b/gcc/config/arm/types.md
@@ -152,6 +152,7 @@
 ; store2	store 2 words to memory from arm registers.
 ; store3	store 3 words to memory from arm registers.
 ; store4	store 4 (or more) words to memory from arm registers.
+; trap		cause a trap in the kernel.
 ; udiv		unsigned division.
 ; umaal		unsigned multiply accumulate accumulate long.
 ; umlal		unsigned multiply accumulate long.
@@ -645,6 +646,7 @@
   store2,\
   store3,\
   store4,\
+  trap,\
   udiv,\
   umaal,\
   umlal,\

diff --git a/gcc/testsuite/gcc.target/arm/builtin-trap.c b/gcc/testsuite/gcc.target/arm/builtin-trap.c
new file mode 100644
index 000..4ff8d25
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/builtin-trap.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm32 } */
+
+void
+trap ()
+{
+  __builtin_trap ();
+}
+
+/* { dg-final { scan-assembler "0xe7f000f0" { target { arm_nothumb } } } } */

diff --git a/gcc/testsuite/gcc.target/arm/thumb-builtin-trap.c b/gcc/testsuite/gcc.target/arm/thumb-builtin-trap.c
new file mode 100644
index 000..22e90e7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/thumb-builtin-trap.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-mthumb" } */
+/* { dg-require-effective-target arm_thumb1_ok } */
+
+void
+trap ()
+{
+  __builtin_trap ();
+}
+
+/* { dg-final { scan-assembler "0xdeff" } } */
[PATCH,AArch64] Optimise comparison where intermediate result not used
Hi all,

When we perform an addition but only use the result for a comparison, we can save an instruction.

Consider this function:

int foo (int a, int b)
{
  return ((a + b) == 0) ? 1 : 7;
}

Here is the original output:

foo:
	add	w0, w0, w1
	cmp	w0, wzr
	mov	w1, 7
	mov	w0, 1
	csel	w0, w1, w0, ne
	ret

Now we get this:

foo:
	cmn	w0, w1
	mov	w1, 7
	mov	w0, 1
	csel	w0, w1, w0, ne
	ret

:)

I added other testcases for this, and also some for adds and subs, which were investigated as part of this work.

OK for trunk?

Cheers,
Ian

2012-11-06  Ian Bolton

	* gcc/config/aarch64/aarch64.md (*compare_neg): New pattern.
	* gcc/testsuite/gcc.target/aarch64/cmn.c: New test.
	* gcc/testsuite/gcc.target/aarch64/adds.c: New test.
	* gcc/testsuite/gcc.target/aarch64/subs.c: New test.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index e6086a9..6935192 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1310,6 +1310,17 @@
    (set_attr "mode" "")]
 )
 
+(define_insn "*compare_neg"
+  [(set (reg:CC CC_REGNUM)
+	(compare:CC
+	 (match_operand:GPI 0 "register_operand" "r")
+	 (neg:GPI (match_operand:GPI 1 "register_operand" "r"))))]
+  ""
+  "cmn\\t%0, %1"
+  [(set_attr "v8type" "alus")
+   (set_attr "mode" "")]
+)
+
 (define_insn "*add__"
   [(set (match_operand:GPI 0 "register_operand" "=rk")
 	(plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")

diff --git a/gcc/testsuite/gcc.target/aarch64/adds.c b/gcc/testsuite/gcc.target/aarch64/adds.c
new file mode 100644
index 000..aa42321
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/adds.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int z;
+int
+foo (int x, int y)
+{
+  int l = x + y;
+  if (l == 0)
+    return 5;
+
+  /* { dg-final { scan-assembler "adds\tw\[0-9\]" } } */
+  z = l ;
+  return 25;
+}
+
+typedef long long s64;
+
+s64 zz;
+s64
+foo2 (s64 x, s64 y)
+{
+  s64 l = x + y;
+  if (l < 0)
+    return 5;
+
+  /* { dg-final { scan-assembler "adds\tx\[0-9\]" } } */
+  zz = l ;
+  return 25;
+}

diff --git a/gcc/testsuite/gcc.target/aarch64/cmn.c b/gcc/testsuite/gcc.target/aarch64/cmn.c
new file mode 100644
index 000..1f06f57
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cmn.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int
+foo (int a, int b)
+{
+  if (a + b)
+    return 5;
+  else
+    return 2;
+  /* { dg-final { scan-assembler "cmn\tw\[0-9\]" } } */
+}
+
+typedef long long s64;
+
+s64
+foo2 (s64 a, s64 b)
+{
+  if (a + b)
+    return 5;
+  else
+    return 2;
+  /* { dg-final { scan-assembler "cmn\tx\[0-9\]" } } */
+}

diff --git a/gcc/testsuite/gcc.target/aarch64/subs.c b/gcc/testsuite/gcc.target/aarch64/subs.c
new file mode 100644
index 000..2bf1975
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/subs.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int z;
+int
+foo (int x, int y)
+{
+  int l = x - y;
+  if (l == 0)
+    return 5;
+
+  /* { dg-final { scan-assembler "subs\tw\[0-9\]" } } */
+  z = l ;
+  return 25;
+}
+
+typedef long long s64;
+
+s64 zz;
+s64
+foo2 (s64 x, s64 y)
+{
+  s64 l = x - y;
+  if (l < 0)
+    return 5;
+
+  /* { dg-final { scan-assembler "subs\tx\[0-9\]" } } */
+  zz = l ;
+  return 25;
+}
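For reference, cmn Rn, Rm sets the flags for Rn + Rm (an adds with the result discarded), so it is equivalent to comparing Rn against -Rm. A hedged C-level sketch of the shape the new pattern matches:

  /* Hedged sketch: the addition is only used by the comparison, so a
     single cmn can replace the add + cmp pair.  */
  int
  sum_is_zero (int a, int b)
  {
    return (a + b) == 0;
  }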
[PATCH,AArch64] Use CSINC instead of CSEL to return 1
Where a CSEL can return the value 1 as one of the alternatives, it is usually more efficient to use a CSINC than a CSEL (and never less efficient), since the value of 1 can be derived from wzr rather than needing to be set up in a register first.

This patch enables this capability. It has been regression tested on trunk.

OK for commit?

Cheers,
Ian

2012-11-06  Ian Bolton

	* gcc/config/aarch64/aarch64.md (cmov_insn): Emit CSINC when
	one of the alternatives is constant 1.
	* gcc/config/aarch64/constraints.md (Ui1): New constraint.
	* gcc/config/aarch64/predicates.md: Rename predicate
	aarch64_reg_zero_or_m1 to aarch64_reg_zero_or_m1_or_1.
	* gcc/testsuite/gcc.target/aarch64/csinc-2.c: New test.

-

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 6935192..038465e 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1877,19 +1877,23 @@
 )
 
 (define_insn "*cmov_insn"
-  [(set (match_operand:ALLI 0 "register_operand" "=r,r,r,r")
+  [(set (match_operand:ALLI 0 "register_operand" "=r,r,r,r,r,r,r")
 	(if_then_else:ALLI
 	 (match_operator 1 "aarch64_comparison_operator"
 	  [(match_operand 2 "cc_register" "") (const_int 0)])
-	 (match_operand:ALLI 3 "aarch64_reg_zero_or_m1" "rZ,rZ,UsM,UsM")
-	 (match_operand:ALLI 4 "aarch64_reg_zero_or_m1" "rZ,UsM,rZ,UsM")))]
-  ""
-  ;; Final alternative should be unreachable, but included for completeness
+	 (match_operand:ALLI 3 "aarch64_reg_zero_or_m1_or_1" "rZ,rZ,UsM,rZ,Ui1,UsM,Ui1")
+	 (match_operand:ALLI 4 "aarch64_reg_zero_or_m1_or_1" "rZ,UsM,rZ,Ui1,rZ,UsM,Ui1")))]
+  "!((operands[3] == const1_rtx && operands[4] == constm1_rtx)
+     || (operands[3] == constm1_rtx && operands[4] == const1_rtx))"
+  ;; Final two alternatives should be unreachable, but included for completeness
   "@
    csel\\t%0, %3, %4, %m1
    csinv\\t%0, %3, zr, %m1
    csinv\\t%0, %4, zr, %M1
-   mov\\t%0, -1"
+   csinc\\t%0, %3, zr, %m1
+   csinc\\t%0, %4, zr, %M1
+   mov\\t%0, -1
+   mov\\t%0, 1"
  [(set_attr "v8type" "csel")
   (set_attr "mode" "")]
 )

diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index da50a47..780faaa 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -102,6 +102,11 @@
   A constraint that matches the immediate constant -1."
   (match_test "op == constm1_rtx"))
 
+(define_constraint "Ui1"
+  "@internal
+  A constraint that matches the immediate constant +1."
+  (match_test "op == const1_rtx"))
+
 (define_constraint "Ui3"
   "@internal
   A constraint that matches the integers 0...4."

diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 328e5cf..aae71c1 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -31,11 +31,12 @@
   (ior (match_operand 0 "register_operand")
        (match_test "op == const0_rtx"
 
-(define_predicate "aarch64_reg_zero_or_m1"
+(define_predicate "aarch64_reg_zero_or_m1_or_1"
   (and (match_code "reg,subreg,const_int")
        (ior (match_operand 0 "register_operand")
 	    (ior (match_test "op == const0_rtx")
-		 (match_test "op == constm1_rtx")
+		 (ior (match_test "op == constm1_rtx")
+		      (match_test "op == const1_rtx"))
 
 (define_predicate "aarch64_fp_compare_operand"
   (ior (match_operand 0 "register_operand")

diff --git a/gcc/testsuite/gcc.target/aarch64/csinc-2.c b/gcc/testsuite/gcc.target/aarch64/csinc-2.c
new file mode 100644
index 000..6ed9080
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/csinc-2.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int
+foo (int a, int b)
+{
+  return (a < b) ? 1 : 7;
+  /* { dg-final { scan-assembler "csinc\tw\[0-9\].*wzr" } } */
+}
+
+typedef long long s64;
+
+s64
+foo2 (s64 a, s64 b)
+{
+  return (a == b) ? 7 : 1;
+  /* { dg-final { scan-assembler "csinc\tx\[0-9\].*xzr" } } */
+}
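For reference, the assumed semantics being exploited: csinc Rd, Rn, Rm, cond computes Rd = cond ? Rn : Rm + 1, so with Rm = wzr the "false" arm yields exactly 1 with no set-up instruction. A hedged C model:

  /* Hedged sketch of csinc: rd = cond ? rn : rm + 1.  */
  int
  csinc_model (int cond, int rn, int rm)
  {
    return cond ? rn : rm + 1;
  }

So for (a < b) ? 1 : 7, only the 7 needs a register; the 1 comes from wzr + 1.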
[PATCH arm/aarch64-4.7] Fix up Changelogs
Some changes had been added to gcc/ChangeLog and gcc/testsuite/ChangeLog when they should have been recorded in the gcc/ChangeLog.aarch64 and gcc/testsuite/ChangeLog.aarch64 files instead.

Committed as obvious.

Cheers,
Ian
[PATCH AArch64] Fix faulty commit of testsuite/gcc.target/aarch64/csinc-2.c
A commit I did earlier in the week got truncated somehow, leading to a broken testcase for the AArch64 target. I've just committed this fix as obvious on trunk and the arm/aarch64-4.7-branch.

Cheers,
Ian

Index: gcc/testsuite/gcc.target/aarch64/csinc-2.c
===
--- gcc/testsuite/gcc.target/aarch64/csinc-2.c (revision 193571)
+++ gcc/testsuite/gcc.target/aarch64/csinc-2.c (revision 193572)
@@ -12,3 +12,7 @@
 typedef long long s64;
 
 s64
 foo2 (s64 a, s64 b)
+{
+  return (a == b) ? 7 : 1;
+  /* { dg-final { scan-assembler "csinc\tx\[0-9\].*xzr" } } */
+}
[PATCH AArch64] Implement bswaphi2 with rev16
This patch implements the standard pattern bswaphi2 for AArch64.

Regression tests all pass.

OK for trunk and backport to arm/aarch64-4.7-branch?

Cheers,
Ian

2012-11-16  Ian Bolton

	* gcc/config/aarch64/aarch64.md (bswaphi2): New pattern.
	* gcc/testsuite/gcc.target/aarch64/builtin-bswap-1.c: New test.
	* gcc/testsuite/gcc.target/aarch64/builtin-bswap-2.c: New test.

-

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index e6086a9..22c7103 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2340,6 +2340,15 @@
    (set_attr "mode" "")]
 )
 
+(define_insn "bswaphi2"
+  [(set (match_operand:HI 0 "register_operand" "=r")
+        (bswap:HI (match_operand:HI 1 "register_operand" "r")))]
+  ""
+  "rev16\\t%w0, %w1"
+  [(set_attr "v8type" "rev")
+   (set_attr "mode" "HI")]
+)
+
 ;; ---
 ;; Floating-point intrinsics
 ;; ---

diff --git a/gcc/testsuite/gcc.target/aarch64/builtin-bswap-1.c b/gcc/testsuite/gcc.target/aarch64/builtin-bswap-1.c
new file mode 100644
index 000..a6706e6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/builtin-bswap-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* { dg-final { scan-assembler-times "rev16\\t" 2 } } */
+
+/* rev16 */
+short
+swaps16 (short x)
+{
+  return __builtin_bswap16 (x);
+}
+
+/* rev16 */
+unsigned short
+swapu16 (unsigned short x)
+{
+  return __builtin_bswap16 (x);
+}

diff --git a/gcc/testsuite/gcc.target/aarch64/builtin-bswap-2.c b/gcc/testsuite/gcc.target/aarch64/builtin-bswap-2.c
new file mode 100644
index 000..6018b48
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/builtin-bswap-2.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* { dg-final { scan-assembler-times "rev16\\t" 2 } } */
+
+/* rev16 */
+unsigned short
+swapu16_1 (unsigned short x)
+{
+  return (x << 8) | (x >> 8);
+}
+
+/* rev16 */
+unsigned short
+swapu16_2 (unsigned short x)
+{
+  return (x >> 8) | (x << 8);
+}
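A worked example of the semantics (rev16 reverses the bytes within each 16-bit halfword, which for an HImode value is a full byte swap):

  /* Hedged illustration: 0x1234 becomes 0x3412 via rev16.  */
  unsigned short x = 0x1234;
  unsigned short y = __builtin_bswap16 (x);  /* 0x3412 */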
[PATCH, AArch64 4.7] Backport of __builtin_bswap16 optimisation
I had already committed my testcase for this for aarch64, but it depends on this patch that doesn't yet exist in 4.7, so I backported to our ARM/aarch64-4.7-branch. Cheers, Ian From: http://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=f811051bf87b1de7804c19c8192 d0d099d157145 diff --git a/gcc/ChangeLog b/gcc/ChangeLog index be34843..ce08fce 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,8 @@ +2012-09-26 Christophe Lyon + + * tree-ssa-math-opts.c (bswap_stats): Add found_16bit field. + (execute_optimize_bswap): Add support for builtin_bswap16. + 2012-09-26 Richard Guenther * tree.h (DECL_IS_BUILTIN): Compare LOCATION_LOCUS. diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 3aad841..7c96949 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,7 @@ +2012-09-26 Christophe Lyon + + * gcc.target/arm/builtin-bswap16-1.c: New testcase. + 2012-09-25 Segher Boessenkool PR target/51274 diff --git a/gcc/testsuite/gcc.target/arm/builtin-bswap16-1.c b/gcc/testsuite/gcc.target/arm/builtin-bswap16-1.c new file mode 100644 index 000..6920f00 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/builtin-bswap16-1.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +/* { dg-require-effective-target arm_arch_v6_ok } */ +/* { dg-add-options arm_arch_v6 } */ +/* { dg-final { scan-assembler-not "orr\[ \t\]" } } */ + +unsigned short swapu16_1 (unsigned short x) +{ + return (x << 8) | (x >> 8); +} + +unsigned short swapu16_2 (unsigned short x) +{ + return (x >> 8) | (x << 8); +} diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c index 16ff397..d9f4e9e 100644 --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -154,6 +154,9 @@ static struct static struct { + /* Number of hand-written 16-bit bswaps found. */ + int found_16bit; + /* Number of hand-written 32-bit bswaps found. */ int found_32bit; @@ -1803,9 +1806,9 @@ static unsigned int execute_optimize_bswap (void) { basic_block bb; - bool bswap32_p, bswap64_p; + bool bswap16_p, bswap32_p, bswap64_p; bool changed = false; - tree bswap32_type = NULL_TREE, bswap64_type = NULL_TREE; + tree bswap16_type = NULL_TREE, bswap32_type = NULL_TREE, bswap64_type = NULL_TREE; if (BITS_PER_UNIT != 8) return 0; @@ -1813,17 +1816,25 @@ execute_optimize_bswap (void) if (sizeof (HOST_WIDEST_INT) < 8) return 0; + bswap16_p = (builtin_decl_explicit_p (BUILT_IN_BSWAP16) + && optab_handler (bswap_optab, HImode) != CODE_FOR_nothing); bswap32_p = (builtin_decl_explicit_p (BUILT_IN_BSWAP32) && optab_handler (bswap_optab, SImode) != CODE_FOR_nothing); bswap64_p = (builtin_decl_explicit_p (BUILT_IN_BSWAP64) && (optab_handler (bswap_optab, DImode) != CODE_FOR_nothing || (bswap32_p && word_mode == SImode))); - if (!bswap32_p && !bswap64_p) + if (!bswap16_p && !bswap32_p && !bswap64_p) return 0; /* Determine the argument type of the builtins. The code later on assumes that the return and argument type are the same. 
*/ + if (bswap16_p) +{ + tree fndecl = builtin_decl_explicit (BUILT_IN_BSWAP16); + bswap16_type = TREE_VALUE (TYPE_ARG_TYPES (TREE_TYPE (fndecl))); +} + if (bswap32_p) { tree fndecl = builtin_decl_explicit (BUILT_IN_BSWAP32); @@ -1863,6 +1874,13 @@ execute_optimize_bswap (void) switch (type_size) { + case 16: + if (bswap16_p) + { + fndecl = builtin_decl_explicit (BUILT_IN_BSWAP16); + bswap_type = bswap16_type; + } + break; case 32: if (bswap32_p) { @@ -1890,7 +1908,9 @@ execute_optimize_bswap (void) continue; changed = true; - if (type_size == 32) + if (type_size == 16) + bswap_stats.found_16bit++; + else if (type_size == 32) bswap_stats.found_32bit++; else bswap_stats.found_64bit++; @@ -1935,6 +1955,8 @@ execute_optimize_bswap (void) } } + statistics_counter_event (cfun, "16-bit bswap implementations found", + bswap_stats.found_16bit); statistics_counter_event (cfun, "32-bit bswap implementations found", bswap_stats.found_32bit); statistics_counter_event (cfun, "64-bit bswap implementations found",
RE: [PATCH, AArch64 4.7] Backport of __builtin_bswap16 optimisation
It turned out that this patch depended on another one from earlier, so I have backported that to ARM/aarch64-4.7-branch too. http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00452.html Cheers, Ian > -Original Message- > From: Ian Bolton [mailto:ian.bol...@arm.com] > Sent: 23 November 2012 18:09 > To: gcc-patches@gcc.gnu.org > Subject: [PATCH, AArch64 4.7] Backport of __builtin_bswap16 > optimisation > > I had already committed my testcase for this for aarch64, but > it depends on this patch that doesn't yet exist in 4.7, so I > backported to our ARM/aarch64-4.7-branch. > > Cheers, > Ian > > > > From: > http://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=f811051bf87b1de7804c19 > c8192d0d099d157145 > > diff --git a/gcc/ChangeLog b/gcc/ChangeLog > index be34843..ce08fce 100644 > --- a/gcc/ChangeLog > +++ b/gcc/ChangeLog > @@ -1,3 +1,8 @@ > +2012-09-26 Christophe Lyon > + > + * tree-ssa-math-opts.c (bswap_stats): Add found_16bit field. > + (execute_optimize_bswap): Add support for builtin_bswap16. > + > 2012-09-26 Richard Guenther > > * tree.h (DECL_IS_BUILTIN): Compare LOCATION_LOCUS. > diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog > index 3aad841..7c96949 100644 > --- a/gcc/testsuite/ChangeLog > +++ b/gcc/testsuite/ChangeLog > @@ -1,3 +1,7 @@ > +2012-09-26 Christophe Lyon > + > + * gcc.target/arm/builtin-bswap16-1.c: New testcase. > + > 2012-09-25 Segher Boessenkool > > PR target/51274 > diff --git a/gcc/testsuite/gcc.target/arm/builtin-bswap16-1.c > b/gcc/testsuite/gcc.target/arm/builtin-bswap16-1.c > new file mode 100644 > index 000..6920f00 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/arm/builtin-bswap16-1.c > @@ -0,0 +1,15 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > +/* { dg-require-effective-target arm_arch_v6_ok } */ > +/* { dg-add-options arm_arch_v6 } */ > +/* { dg-final { scan-assembler-not "orr\[ \t\]" } } */ > + > +unsigned short swapu16_1 (unsigned short x) > +{ > + return (x << 8) | (x >> 8); > +} > + > +unsigned short swapu16_2 (unsigned short x) > +{ > + return (x >> 8) | (x << 8); > +} > diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c > index 16ff397..d9f4e9e 100644 > --- a/gcc/tree-ssa-math-opts.c > +++ b/gcc/tree-ssa-math-opts.c > @@ -154,6 +154,9 @@ static struct > > static struct > { > + /* Number of hand-written 16-bit bswaps found. */ > + int found_16bit; > + >/* Number of hand-written 32-bit bswaps found. */ >int found_32bit; > > @@ -1803,9 +1806,9 @@ static unsigned int > execute_optimize_bswap (void) > { >basic_block bb; > - bool bswap32_p, bswap64_p; > + bool bswap16_p, bswap32_p, bswap64_p; >bool changed = false; > - tree bswap32_type = NULL_TREE, bswap64_type = NULL_TREE; > + tree bswap16_type = NULL_TREE, bswap32_type = NULL_TREE, > bswap64_type = NULL_TREE; > >if (BITS_PER_UNIT != 8) > return 0; > @@ -1813,17 +1816,25 @@ execute_optimize_bswap (void) >if (sizeof (HOST_WIDEST_INT) < 8) > return 0; > > + bswap16_p = (builtin_decl_explicit_p (BUILT_IN_BSWAP16) > +&& optab_handler (bswap_optab, HImode) != > CODE_FOR_nothing); >bswap32_p = (builtin_decl_explicit_p (BUILT_IN_BSWAP32) > && optab_handler (bswap_optab, SImode) != > CODE_FOR_nothing); >bswap64_p = (builtin_decl_explicit_p (BUILT_IN_BSWAP64) > && (optab_handler (bswap_optab, DImode) != > CODE_FOR_nothing > || (bswap32_p && word_mode == SImode))); > > - if (!bswap32_p && !bswap64_p) > + if (!bswap16_p && !bswap32_p && !bswap64_p) > return 0; > >/* Determine the argument type of the builtins. 
The code later on > assumes that the return and argument type are the same. */ > + if (bswap16_p) > +{ > + tree fndecl = builtin_decl_explicit (BUILT_IN_BSWAP16); > + bswap16_type = TREE_VALUE (TYPE_ARG_TYPES (TREE_TYPE (fndecl))); > +} > + >if (bswap32_p) > { >tree fndecl = builtin_decl_explicit (BUILT_IN_BSWAP32); > @@ -1863,6 +1874,13 @@ execute_optimize_bswap (void) > > switch (type_size) > { > + case 16: > + if (bswap16_p) > + { > + fndecl = builtin_decl_explicit (BUILT_IN_BSWAP16); > + bswap_type = bswap16_type; > + } > + break; > case 32: > if (bswap32_p) > { > @@ -1890,7 +1908,9 @@ execute_optimize_bswap
RE: [PATCH, AArch64] Make MOVK output operand 2 in hex
Since this is a bug fix, I'll need to backport to 4.8. Is that OK?

Cheers,
Ian

> OK
> /Marcus
>
> On 20 March 2013 17:21, Ian Bolton wrote:
> > MOVK should not be generated with a negative immediate, which
> > the assembler rightfully rejects.
> >
> > This patch makes MOVK output its 2nd operand in hex instead.
> >
> > Tested on bare-metal and linux.
> >
> > OK for trunk?
> >
> > Cheers,
> > Ian
> >
> > 2013-03-20  Ian Bolton
> >
> > gcc/
> > 	* config/aarch64/aarch64.c (aarch64_print_operand): New
> > 	format specifier for printing a constant in hex.
> > 	* config/aarch64/aarch64.md (insv_imm): Use the X
> > 	format specifier for printing second operand.
> >
> > testsuite/
> > 	* gcc.target/aarch64/movk.c: New test.
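To illustrate the underlying problem (my example, not from the patch): a 16-bit chunk with bit 15 set prints as a negative number when treated as signed, and the assembler rejects that for MOVK; hex output sidesteps the issue.

  /* Hedged example: materialising this constant needs a MOVK with
     chunk 0xdead.  As signed decimal that would print as
     "movk x0, -8531, lsl 48" (rejected); in hex it is
     "movk x0, 0xdead, lsl 48".  */
  unsigned long long
  k (void)
  {
    return 0xdead00000000beefULL;
  }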
[PATCH, AArch64] Testcases for ANDS instruction
I made some testcases to go with my implementation of ANDS in the backend, but Naveen Hurugalawadi got the ANDS patterns in before me! I'm now just left with the testcases, but they are still worth adding, so here they are. Tests are working correctly as of current trunk. OK to commit? Cheers, Ian 2013-04-26 Ian Bolton * gcc.target/aarch64/ands.c: New test. * gcc.target/aarch64/ands2.c: LikewiseIndex: gcc/testsuite/gcc.target/aarch64/ands2.c === --- gcc/testsuite/gcc.target/aarch64/ands2.c(revision 0) +++ gcc/testsuite/gcc.target/aarch64/ands2.c(revision 0) @@ -0,0 +1,157 @@ +/* { dg-do run } */ +/* { dg-options "-O2 --save-temps -fno-inline" } */ + +extern void abort (void); + +int +ands_si_test1 (int a, int b, int c) +{ + int d = a & b; + + /* { dg-final { scan-assembler-not "ands\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" } } */ + /* { dg-final { scan-assembler "and\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" } } */ + if (d <= 0) +return a + c; + else +return b + d + c; +} + +int +ands_si_test2 (int a, int b, int c) +{ + int d = a & 0x; + + /* { dg-final { scan-assembler-not "ands\tw\[0-9\]+, w\[0-9\]+, -1717986919" } } */ + /* { dg-final { scan-assembler "and\tw\[0-9\]+, w\[0-9\]+, -1717986919" } } */ + if (d <= 0) +return a + c; + else +return b + d + c; +} + +int +ands_si_test3 (int a, int b, int c) +{ + int d = a & (b << 3); + + /* { dg-final { scan-assembler-not "ands\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3" } } */ + /* { dg-final { scan-assembler "and\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3" } } */ + if (d <= 0) +return a + c; + else +return b + d + c; +} + +typedef long long s64; + +s64 +ands_di_test1 (s64 a, s64 b, s64 c) +{ + s64 d = a & b; + + /* { dg-final { scan-assembler-not "ands\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+" } } */ + /* { dg-final { scan-assembler "and\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+" } } */ + if (d <= 0) +return a + c; + else +return b + d + c; +} + +s64 +ands_di_test2 (s64 a, s64 b, s64 c) +{ + s64 d = a & 0xll; + + /* { dg-final { scan-assembler-not "ands\tx\[0-9\]+, x\[0-9\]+, -6148914691236517206" } } */ + /* { dg-final { scan-assembler "and\tx\[0-9\]+, x\[0-9\]+, -6148914691236517206" } } */ + if (d <= 0) +return a + c; + else +return b + d + c; +} + +s64 +ands_di_test3 (s64 a, s64 b, s64 c) +{ + s64 d = a & (b << 3); + + /* { dg-final { scan-assembler-not "ands\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+, lsl 3" } } */ + /* { dg-final { scan-assembler "and\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+, lsl 3" } } */ + if (d <= 0) +return a + c; + else +return b + d + c; +} + +int +main () +{ + int x; + s64 y; + + x = ands_si_test1 (29, 4, 5); + if (x != 13) +abort (); + + x = ands_si_test1 (5, 2, 20); + if (x != 25) +abort (); + + x = ands_si_test2 (29, 4, 5); + if (x != 34) +abort (); + + x = ands_si_test2 (1024, 2, 20); + if (x != 1044) +abort (); + + x = ands_si_test3 (35, 4, 5); + if (x != 41) +abort (); + + x = ands_si_test3 (5, 2, 20); + if (x != 25) +abort (); + + y = ands_di_test1 (0x13029ll, + 0x32004ll, + 0x505050505ll); + + if (y != ((0x13029ll & 0x32004ll) + 0x32004ll + 0x505050505ll)) +abort (); + + y = ands_di_test1 (0x5000500050005ll, + 0x2111211121112ll, + 0x02020ll); + if (y != 0x5000500052025ll) +abort (); + + y = ands_di_test2 (0x13029ll, + 0x32004ll, + 0x505050505ll); + if (y != ((0x13029ll & 0xll) + 0x32004ll + 0x505050505ll)) +abort (); + + y = ands_di_test2 (0x540004100ll, + 0x32004ll, + 0x805050205ll); + if (y != (0x540004100ll + 0x805050205ll)) +abort (); + + y = ands_di_test3 (0x13029ll, + 0x06408ll, + 0x505050505ll); + if (y != ((0x13029ll & (0x06408ll << 3)) + + 0x06408ll + 
0x505050505ll)) +abort (); + + y = ands_di_test3 (0x130002900ll, + 0x08808ll, + 0x505050505ll); + if (y != (0x130002900ll + 0x505050505ll)) +abort (); + + return 0; +} + +/* { dg-final { cleanup-saved-temps } } */ Index: gcc/testsuite/gcc.target/aarch64/ands.c === --- gcc/testsuite/gcc.target/aarch64/ands.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/ands.c (revision 0) @@ -0,0 +1,151 @@ +/* { dg-do run } */ +/* { dg-options "-O2 --save-temps -fno-inline" } */ + +extern void abort (void); + +int +ands_si_test1 (int a, in
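For reference, the instruction these tests pin down: ands Rd, Rn, Rm performs the AND and sets the condition flags from the result, so the separate compare folds away when the result is both used and tested. A hedged C-level sketch:

  /* Hedged sketch: one ands covers both the AND and the zero test,
     since d is live afterwards (a dead d would give tst instead).  */
  int
  and_then_branch (int a, int b, int c)
  {
    int d = a & b;
    if (d == 0)
      return a + c;
    return d + c;
  }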
[PATCH, AArch64] Support BICS instruction in the backend
With these patterns, we can now generate BICS in the appropriate places. I've included test cases. This has been run on linux and bare-metal regression tests. OK to commit? Cheers, Ian 2013-04-26 Ian Bolton gcc/ * config/aarch64/aarch64.md (*and_one_cmpl3_compare0): New pattern. (*and_one_cmplsi3_compare0_uxtw): Likewise. (*and_one_cmpl_3_compare0): Likewise. (*and_one_cmpl_si3_compare0_uxtw): Likewise. testsuite/ * gcc.target/aarch64/bics.c: New test. * gcc.target/aarch64/bics2.c: Likewise.Index: gcc/testsuite/gcc.target/aarch64/bics.c === --- gcc/testsuite/gcc.target/aarch64/bics.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/bics.c (revision 0) @@ -0,0 +1,107 @@ +/* { dg-do run } */ +/* { dg-options "-O2 --save-temps -fno-inline" } */ + +extern void abort (void); + +int +bics_si_test1 (int a, int b, int c) +{ + int d = a & ~b; + + /* { dg-final { scan-assembler "bics\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +int +bics_si_test2 (int a, int b, int c) +{ + int d = a & ~(b << 3); + + /* { dg-final { scan-assembler "bics\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3" } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +typedef long long s64; + +s64 +bics_di_test1 (s64 a, s64 b, s64 c) +{ + s64 d = a & ~b; + + /* { dg-final { scan-assembler "bics\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+" } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +s64 +bics_di_test2 (s64 a, s64 b, s64 c) +{ + s64 d = a & ~(b << 3); + + /* { dg-final { scan-assembler "bics\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+, lsl 3" } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +int +main () +{ + int x; + s64 y; + + x = bics_si_test1 (29, ~4, 5); + if (x != ((29 & 4) + ~4 + 5)) +abort (); + + x = bics_si_test1 (5, ~2, 20); + if (x != 25) +abort (); + + x = bics_si_test2 (35, ~4, 5); + if (x != ((35 & ~(~4 << 3)) + ~4 + 5)) +abort (); + + x = bics_si_test2 (96, ~2, 20); + if (x != 116) +abort (); + + y = bics_di_test1 (0x13029ll, + ~0x32004ll, + 0x505050505ll); + + if (y != ((0x13029ll & 0x32004ll) + ~0x32004ll + 0x505050505ll)) +abort (); + + y = bics_di_test1 (0x5000500050005ll, + ~0x2111211121112ll, + 0x02020ll); + if (y != 0x5000500052025ll) +abort (); + + y = bics_di_test2 (0x13029ll, + ~0x06408ll, + 0x505050505ll); + if (y != ((0x13029ll & ~(~0x06408ll << 3)) + + ~0x06408ll + 0x505050505ll)) +abort (); + + y = bics_di_test2 (0x130002900ll, + ~0x08808ll, + 0x505050505ll); + if (y != (0x130002900ll + 0x505050505ll)) +abort (); + + return 0; +} + +/* { dg-final { cleanup-saved-temps } } */ Index: gcc/testsuite/gcc.target/aarch64/bics2.c === --- gcc/testsuite/gcc.target/aarch64/bics2.c(revision 0) +++ gcc/testsuite/gcc.target/aarch64/bics2.c(revision 0) @@ -0,0 +1,111 @@ +/* { dg-do run } */ +/* { dg-options "-O2 --save-temps -fno-inline" } */ + +extern void abort (void); + +int +bics_si_test1 (int a, int b, int c) +{ + int d = a & ~b; + + /* { dg-final { scan-assembler-not "bics\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" } } */ + /* { dg-final { scan-assembler "bic\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" } } */ + if (d <= 0) +return a + c; + else +return b + d + c; +} + +int +bics_si_test2 (int a, int b, int c) +{ + int d = a & ~(b << 3); + + /* { dg-final { scan-assembler-not "bics\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3" } } */ + /* { dg-final { scan-assembler "bic\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3" } } */ + if (d <= 0) +return a + c; + else +return b + d + c; +} + +typedef long long s64; + +s64 +bics_di_test1 (s64 a, s64 b, s64 c) +{ + s64 d 
= a & ~b; + + /* { dg-final { scan-assembler-not "bics\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+" } } */ + /* { dg-final { scan-assembler "bic\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+" } } */ + if (d <= 0) +return a + c; + else +return b + d + c; +} + +s64 +bics_di_test2 (s64 a, s64 b, s64 c) +{ + s64 d = a & ~(b << 3); + + /* { dg-final { scan-assembler-not "bics\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+, lsl 3" } } */ + /* { dg-final { scan-assembler "bic\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+, lsl 3" } } */ + if (d <= 0) +return a + c; + else +return b + d + c; +} + +int +main () +{ + int x; + s64 y; + + x = bics_si_test1 (2
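For reference: bics Rd, Rn, Rm computes Rd = Rn & ~Rm (bit clear) and sets the flags from the result. A hedged C-level sketch of the shape the new patterns match:

  /* Hedged sketch: AND-with-complement plus flag use in one insn.  */
  int
  clear_then_test (int a, int b, int c)
  {
    int d = a & ~b;
    if (d == 0)
      return a + c;
    return d + c;
  }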
[PATCH, AArch64] Support LDR/STR to/from S and D registers
This patch allows us to load to and store from the S and D registers, which helps with doing scalar operations in those registers.

This has been regression tested on bare-metal and linux.

OK for trunk?

Cheers,
Ian

2013-04-26  Ian Bolton

	* config/aarch64/aarch64.md (movsi_aarch64): Support LDR/STR
	from/to S register.
	(movdi_aarch64): Support LDR/STR from/to D register.

Index: gcc/config/aarch64/aarch64.md
===
--- gcc/config/aarch64/aarch64.md	(revision 198231)
+++ gcc/config/aarch64/aarch64.md	(working copy)
@@ -808,26 +808,28 @@ (define_expand "mov"
 )
 
 (define_insn "*movsi_aarch64"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,r,r,m, *w, r,*w")
-	(match_operand:SI 1 "aarch64_mov_operand" " r,M,m,rZ,rZ,*w,*w"))]
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,r,r,*w,m, m,*w, r,*w")
+	(match_operand:SI 1 "aarch64_mov_operand" " r,M,m, m,rZ,*w,rZ,*w,*w"))]
   "(register_operand (operands[0], SImode)
     || aarch64_reg_or_zero (operands[1], SImode))"
   "@
    mov\\t%w0, %w1
    mov\\t%w0, %1
    ldr\\t%w0, %1
+   ldr\\t%s0, %1
    str\\t%w1, %0
+   str\\t%s1, %0
    fmov\\t%s0, %w1
    fmov\\t%w0, %s1
    fmov\\t%s0, %s1"
-  [(set_attr "v8type" "move,alu,load1,store1,fmov,fmov,fmov")
+  [(set_attr "v8type" "move,alu,load1,load1,store1,store1,fmov,fmov,fmov")
    (set_attr "mode" "SI")
-   (set_attr "fp" "*,*,*,*,yes,yes,yes")]
+   (set_attr "fp" "*,*,*,*,*,*,yes,yes,yes")]
 )
 
 (define_insn "*movdi_aarch64"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,m, r, r, *w, r,*w,w")
-	(match_operand:DI 1 "aarch64_mov_operand" " r,r,k,N,m,rZ,Usa,Ush,rZ,*w,*w,Dd"))]
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,*w,m, m,r, r, *w, r,*w,w")
+	(match_operand:DI 1 "aarch64_mov_operand" " r,r,k,N,m, m,rZ,*w,Usa,Ush,rZ,*w,*w,Dd"))]
   "(register_operand (operands[0], DImode)
     || aarch64_reg_or_zero (operands[1], DImode))"
   "@
@@ -836,16 +838,18 @@ (define_insn "*movdi_aarch64"
    mov\\t%x0, %1
    mov\\t%x0, %1
    ldr\\t%x0, %1
+   ldr\\t%d0, %1
    str\\t%x1, %0
+   str\\t%d1, %0
    adr\\t%x0, %a1
    adrp\\t%x0, %A1
    fmov\\t%d0, %x1
    fmov\\t%x0, %d1
    fmov\\t%d0, %d1
    movi\\t%d0, %1"
-  [(set_attr "v8type" "move,move,move,alu,load1,store1,adr,adr,fmov,fmov,fmov,fmov")
+  [(set_attr "v8type" "move,move,move,alu,load1,load1,store1,store1,adr,adr,fmov,fmov,fmov,fmov")
    (set_attr "mode" "DI")
-   (set_attr "fp" "*,*,*,*,*,*,*,*,yes,yes,yes,yes")]
+   (set_attr "fp" "*,*,*,*,*,*,*,*,*,*,yes,yes,yes,yes")]
 )
 
 (define_insn "insv_imm"
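As a sketch of the kind of code that benefits (my example; whether the new alternatives are chosen depends on register allocation): a value that lives on the vector side no longer has to bounce through a general register via fmov just to be loaded or stored.

  #include <arm_neon.h>

  /* Hedged example: the shifted value stays in a D register, so the
     load and store can plausibly become ldr/str of d-registers
     rather than x-register moves plus fmov.  */
  void
  shift_in_place (uint64_t *p)
  {
    uint64x1_t v = vld1_u64 (p);
    vst1_u64 (p, vshr_n_u64 (v, 3));
  }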
RE: [PATCH, AArch64] Testcases for ANDS instruction
> From: Richard Earnshaw > This rule > > + /* { dg-final { scan-assembler "and\tw\[0-9\]+, w\[0-9\]+, w\[0- > 9\]+" } } */ > > Will match anything that this rule > > > + /* { dg-final { scan-assembler "and\tw\[0-9\]+, w\[0-9\]+, w\[0- > 9\]+, lsl 3" } } */ > > matches (though not vice versa). > > Similarly for the x register variants. > Thanks for the review. I've fixed this up in the attached patch, by counting the number of matches for the first rule and expecting it to match additional times to cover the overlap with the lsl based rule. I've also renamed the testcases in line with the suggested GCC testcase naming convention. OK for commit? Cheers, Ian 2013-05-01 Ian Bolton * gcc.target/aarch64/ands_1.c: New test. * gcc.target/aarch64/ands_2.c: LikewiseIndex: gcc/testsuite/gcc.target/aarch64/ands_1.c === --- gcc/testsuite/gcc.target/aarch64/ands_1.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/ands_1.c (revision 0) @@ -0,0 +1,151 @@ +/* { dg-do run } */ +/* { dg-options "-O2 --save-temps -fno-inline" } */ + +extern void abort (void); + +int +ands_si_test1 (int a, int b, int c) +{ + int d = a & b; + + /* { dg-final { scan-assembler-times "ands\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" 2 } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +int +ands_si_test2 (int a, int b, int c) +{ + int d = a & 0xff; + + /* { dg-final { scan-assembler "ands\tw\[0-9\]+, w\[0-9\]+, 255" } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +int +ands_si_test3 (int a, int b, int c) +{ + int d = a & (b << 3); + + /* { dg-final { scan-assembler "ands\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3" } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +typedef long long s64; + +s64 +ands_di_test1 (s64 a, s64 b, s64 c) +{ + s64 d = a & b; + + /* { dg-final { scan-assembler-times "ands\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+" 2 } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +s64 +ands_di_test2 (s64 a, s64 b, s64 c) +{ + s64 d = a & 0xff; + + /* { dg-final { scan-assembler "ands\tx\[0-9\]+, x\[0-9\]+, 255" } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +s64 +ands_di_test3 (s64 a, s64 b, s64 c) +{ + s64 d = a & (b << 3); + + /* { dg-final { scan-assembler "ands\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+, lsl 3" } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +int +main () +{ + int x; + s64 y; + + x = ands_si_test1 (29, 4, 5); + if (x != 13) +abort (); + + x = ands_si_test1 (5, 2, 20); + if (x != 25) +abort (); + + x = ands_si_test2 (29, 4, 5); + if (x != 38) +abort (); + + x = ands_si_test2 (1024, 2, 20); + if (x != 1044) +abort (); + + x = ands_si_test3 (35, 4, 5); + if (x != 41) +abort (); + + x = ands_si_test3 (5, 2, 20); + if (x != 25) +abort (); + + y = ands_di_test1 (0x13029ll, + 0x32004ll, + 0x505050505ll); + + if (y != ((0x13029ll & 0x32004ll) + 0x32004ll + 0x505050505ll)) +abort (); + + y = ands_di_test1 (0x5000500050005ll, + 0x2111211121112ll, + 0x02020ll); + if (y != 0x5000500052025ll) +abort (); + + y = ands_di_test2 (0x13029ll, + 0x32004ll, + 0x505050505ll); + if (y != ((0x13029ll & 0xff) + 0x32004ll + 0x505050505ll)) +abort (); + + y = ands_di_test2 (0x130002900ll, + 0x32004ll, + 0x505050505ll); + if (y != (0x130002900ll + 0x505050505ll)) +abort (); + + y = ands_di_test3 (0x13029ll, + 0x06408ll, + 0x505050505ll); + if (y != ((0x13029ll & (0x06408ll << 3)) + + 0x06408ll + 0x505050505ll)) +abort (); + + y = ands_di_test3 (0x130002900ll, + 0x08808ll, + 0x505050505ll); + if (y != (0x130002900ll + 0x505050505ll)) +abort (); + + 
return 0; +} + +/* { dg-final { cleanup-saved-temps } } */ Index: gcc/testsuite/gcc.target/aarch64/ands_2.c === --- gcc/testsuite/gcc.target/aarch64/ands_2.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/ands_2.c (revision 0) @@ -0,0 +1,157 @@ +/* { dg-do run } */ +/* { dg-options "-O2 --save-temps -fno-inline" } */ + +extern void abort (void); + +int +ands_si_test1 (int a, int b, int c) +{ + int d = a & b; + + /* { dg-final { scan-assembler-not "ands\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" } } */ + /* { dg-final { scan-assembler-times "and\tw\[0-9\]+, w\[0-9\
RE: [PATCH, AArch64] Support BICS instruction in the backend
> From: Marcus Shawcroft [mailto:marcus.shawcr...@gmail.com]
>
> + /* { dg-final { scan-assembler "bics\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+" } } */
>
> + /* { dg-final { scan-assembler "bics\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+, lsl 3" } } */
>
> Ian,
>
> These two patterns have the same issue Richard just highlighted
> on your other patch, ie the first pattern will also match anything
> matched by the second pattern.
>
> /Marcus

I've fixed the rules in the testcases and renamed the files to match naming conventions in the latest patch (attached).

OK to commit?

Cheers,
Ian

2013-05-01  Ian Bolton

gcc/
	* config/aarch64/aarch64.md (*and_one_cmpl3_compare0): New pattern.
	(*and_one_cmplsi3_compare0_uxtw): Likewise.
	(*and_one_cmpl_3_compare0): Likewise.
	(*and_one_cmpl_si3_compare0_uxtw): Likewise.

testsuite/
	* gcc.target/aarch64/bics_1.c: New test.
	* gcc.target/aarch64/bics_2.c: Likewise.
[PATCH, AArch64] Fix for LDR/STR to/from S and D registers
This is a fix for this patch: http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01621.html

If someone compiles with -mgeneral-regs-only then those instructions shouldn't be used. We can enforce that by adding the fp attribute to the relevant alternatives in the patterns.

Regression tests all good.

OK for trunk?

Cheers,
Ian

2013-05-01  Ian Bolton

	* config/aarch64/aarch64.md (movsi_aarch64): Only allow to/from
	S reg when fp attribute set.
	(movdi_aarch64): Only allow to/from D reg when fp attribute set.

Index: gcc/config/aarch64/aarch64.md
===
--- gcc/config/aarch64/aarch64.md	(revision 198456)
+++ gcc/config/aarch64/aarch64.md	(working copy)
@@ -825,7 +825,7 @@ (define_insn "*movsi_aarch64"
    fmov\\t%s0, %s1"
   [(set_attr "v8type" "move,alu,load1,load1,store1,store1,fmov,fmov,fmov")
    (set_attr "mode" "SI")
-   (set_attr "fp" "*,*,*,*,*,*,yes,yes,yes")]
+   (set_attr "fp" "*,*,*,yes,*,yes,yes,yes,yes")]
 )
 
 (define_insn "*movdi_aarch64"
@@ -850,7 +850,7 @@ (define_insn "*movdi_aarch64"
    movi\\t%d0, %1"
   [(set_attr "v8type" "move,move,move,alu,load1,load1,store1,store1,adr,adr,fmov,fmov,fmov,fmov")
    (set_attr "mode" "DI")
-   (set_attr "fp" "*,*,*,*,*,*,*,*,*,*,yes,yes,yes,yes")]
+   (set_attr "fp" "*,*,*,*,*,yes,*,yes,*,*,yes,yes,yes,yes")]
 )
 
 (define_insn "insv_imm"
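For the record, a hedged illustration of the failure mode being avoided: under -mgeneral-regs-only the compiler must never touch the V-registers, so an SImode or DImode reload must not pick the new s/d-register alternatives; the fp attribute is what lets the backend exclude them under that option.

  /* Hedged example: built with -mgeneral-regs-only, any spill or
     reload here must stay in w/x registers.  */
  int
  reload_heavy (int a, int b, int c, int d)
  {
    return (a * b) + (c * d) + (a * d) + (b * c);
  }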
RE: [PATCH, AArch64] Support BICS instruction in the backend
> Can we have the patch attached ? OK Index: gcc/testsuite/gcc.target/aarch64/bics_1.c === --- gcc/testsuite/gcc.target/aarch64/bics_1.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/bics_1.c (revision 0) @@ -0,0 +1,107 @@ +/* { dg-do run } */ +/* { dg-options "-O2 --save-temps -fno-inline" } */ + +extern void abort (void); + +int +bics_si_test1 (int a, int b, int c) +{ + int d = a & ~b; + + /* { dg-final { scan-assembler-times "bics\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" 2 } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +int +bics_si_test2 (int a, int b, int c) +{ + int d = a & ~(b << 3); + + /* { dg-final { scan-assembler "bics\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3" } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +typedef long long s64; + +s64 +bics_di_test1 (s64 a, s64 b, s64 c) +{ + s64 d = a & ~b; + + /* { dg-final { scan-assembler-times "bics\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+" 2 } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +s64 +bics_di_test2 (s64 a, s64 b, s64 c) +{ + s64 d = a & ~(b << 3); + + /* { dg-final { scan-assembler "bics\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+, lsl 3" } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +int +main () +{ + int x; + s64 y; + + x = bics_si_test1 (29, ~4, 5); + if (x != ((29 & 4) + ~4 + 5)) +abort (); + + x = bics_si_test1 (5, ~2, 20); + if (x != 25) +abort (); + + x = bics_si_test2 (35, ~4, 5); + if (x != ((35 & ~(~4 << 3)) + ~4 + 5)) +abort (); + + x = bics_si_test2 (96, ~2, 20); + if (x != 116) +abort (); + + y = bics_di_test1 (0x13029ll, + ~0x32004ll, + 0x505050505ll); + + if (y != ((0x13029ll & 0x32004ll) + ~0x32004ll + 0x505050505ll)) +abort (); + + y = bics_di_test1 (0x5000500050005ll, + ~0x2111211121112ll, + 0x02020ll); + if (y != 0x5000500052025ll) +abort (); + + y = bics_di_test2 (0x13029ll, + ~0x06408ll, + 0x505050505ll); + if (y != ((0x13029ll & ~(~0x06408ll << 3)) + + ~0x06408ll + 0x505050505ll)) +abort (); + + y = bics_di_test2 (0x130002900ll, + ~0x08808ll, + 0x505050505ll); + if (y != (0x130002900ll + 0x505050505ll)) +abort (); + + return 0; +} + +/* { dg-final { cleanup-saved-temps } } */ Index: gcc/testsuite/gcc.target/aarch64/bics_2.c === --- gcc/testsuite/gcc.target/aarch64/bics_2.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/bics_2.c (revision 0) @@ -0,0 +1,111 @@ +/* { dg-do run } */ +/* { dg-options "-O2 --save-temps -fno-inline" } */ + +extern void abort (void); + +int +bics_si_test1 (int a, int b, int c) +{ + int d = a & ~b; + + /* { dg-final { scan-assembler-not "bics\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" } } */ + /* { dg-final { scan-assembler-times "bic\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" 2 } } */ + if (d <= 0) +return a + c; + else +return b + d + c; +} + +int +bics_si_test2 (int a, int b, int c) +{ + int d = a & ~(b << 3); + + /* { dg-final { scan-assembler-not "bics\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3" } } */ + /* { dg-final { scan-assembler "bic\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3" } } */ + if (d <= 0) +return a + c; + else +return b + d + c; +} + +typedef long long s64; + +s64 +bics_di_test1 (s64 a, s64 b, s64 c) +{ + s64 d = a & ~b; + + /* { dg-final { scan-assembler-not "bics\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+" } } */ + /* { dg-final { scan-assembler-times "bic\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+" 2 } } */ + if (d <= 0) +return a + c; + else +return b + d + c; +} + +s64 +bics_di_test2 (s64 a, s64 b, s64 c) +{ + s64 d = a & ~(b << 3); + + /* { dg-final { scan-assembler-not "bics\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+, lsl 3" } } */ + /* { dg-final 
{ scan-assembler "bic\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+, lsl 3" } } */ + if (d <= 0) +return a + c; + else +return b + d + c; +} + +int +main () +{ + int x; + s64 y; + + x = bics_si_test1 (29, ~4, 5); + if (x != ((29 & 4) + ~4 + 5)) +abort (); + + x = bics_si_test1 (5, ~2, 20); + if (x != 25) +abort (); + + x = bics_si_test2 (35, ~4, 5); + if (x != ((35 & ~(~4 << 3)) + ~4 + 5)) +abort (); + + x = bics_si_test2 (96, ~2, 20); + if (x != 116) +abort (); + + y = bics_di_test1 (0x13029ll, + ~0x32004ll, + 0x505050505ll); + + if (y != ((0x13029ll & 0x32004ll) + ~0x32004ll + 0x505050505ll)) +abort (); + + y = bics_di_test1 (0x5000500050005ll, + ~0x2111211121112ll, + 0x02020ll); + if (y != 0x5000500052025ll) +abort (); + + y = bics_di_test2 (0x13029ll, +
[PATCH, AArch64] Testcases for TST instruction
I previously fixed a bug with the patterns that generate TST. I added these testcases to make our regression testing more solid. They've been running on our internal branch for about a month. OK to commit to trunk? Cheers, Ian 2013-05-02 Ian Bolton * gcc.target/aarch64/tst_1.c: New test. * gcc.target/aarch64/tst_2.c: LikewiseIndex: gcc/testsuite/gcc.target/aarch64/tst_1.c === --- gcc/testsuite/gcc.target/aarch64/tst_1.c(revision 0) +++ gcc/testsuite/gcc.target/aarch64/tst_1.c(revision 0) @@ -0,0 +1,150 @@ +/* { dg-do run } */ +/* { dg-options "-O2 --save-temps -fno-inline" } */ + +extern void abort (void); + +int +tst_si_test1 (int a, int b, int c) +{ + int d = a & b; + + /* { dg-final { scan-assembler-times "tst\tw\[0-9\]+, w\[0-9\]+" 2 } } */ + if (d == 0) +return 12; + else +return 18; +} + +int +tst_si_test2 (int a, int b, int c) +{ + int d = a & 0x; + + /* { dg-final { scan-assembler "tst\tw\[0-9\]+, -1717986919" } } */ + if (d == 0) +return 12; + else +return 18; +} + +int +tst_si_test3 (int a, int b, int c) +{ + int d = a & (b << 3); + + /* { dg-final { scan-assembler "tst\tw\[0-9\]+, w\[0-9\]+, lsl 3" } } */ + if (d == 0) +return 12; + else +return 18; +} + +typedef long long s64; + +s64 +tst_di_test1 (s64 a, s64 b, s64 c) +{ + s64 d = a & b; + + /* { dg-final { scan-assembler-times "tst\tx\[0-9\]+, x\[0-9\]+" 2 } } */ + if (d == 0) +return 12; + else +return 18; +} + +s64 +tst_di_test2 (s64 a, s64 b, s64 c) +{ + s64 d = a & 0xll; + + /* { dg-final { scan-assembler "tst\tx\[0-9\]+, -6148914691236517206" } } */ + if (d == 0) +return 12; + else +return 18; +} + +s64 +tst_di_test3 (s64 a, s64 b, s64 c) +{ + s64 d = a & (b << 3); + + /* { dg-final { scan-assembler "tst\tx\[0-9\]+, x\[0-9\]+, lsl 3" } } */ + if (d == 0) +return 12; + else +return 18; +} + +int +main () +{ + int x; + s64 y; + + x = tst_si_test1 (29, 4, 5); + if (x != 18) +abort (); + + x = tst_si_test1 (5, 2, 20); + if (x != 12) +abort (); + + x = tst_si_test2 (29, 4, 5); + if (x != 18) +abort (); + + x = tst_si_test2 (1024, 2, 20); + if (x != 12) +abort (); + + x = tst_si_test3 (35, 4, 5); + if (x != 18) +abort (); + + x = tst_si_test3 (5, 2, 20); + if (x != 12) +abort (); + + y = tst_di_test1 (0x13029ll, + 0x32004ll, + 0x505050505ll); + + if (y != 18) +abort (); + + y = tst_di_test1 (0x5000500050005ll, + 0x2111211121112ll, + 0x02020ll); + if (y != 12) +abort (); + + y = tst_di_test2 (0x13029ll, + 0x32004ll, + 0x505050505ll); + if (y != 18) +abort (); + + y = tst_di_test2 (0x540004100ll, + 0x32004ll, + 0x805050205ll); + if (y != 12) +abort (); + + y = tst_di_test3 (0x13029ll, + 0x06408ll, + 0x505050505ll); + if (y != 18) +abort (); + + y = tst_di_test3 (0x130002900ll, + 0x08808ll, + 0x505050505ll); + if (y != 12) +abort (); + + return 0; +} + +/* { dg-final { cleanup-saved-temps } } */ Index: gcc/testsuite/gcc.target/aarch64/tst_2.c === --- gcc/testsuite/gcc.target/aarch64/tst_2.c(revision 0) +++ gcc/testsuite/gcc.target/aarch64/tst_2.c(revision 0) @@ -0,0 +1,156 @@ +/* { dg-do run } */ +/* { dg-options "-O2 --save-temps -fno-inline" } */ + +extern void abort (void); + +int +tst_si_test1 (int a, int b, int c) +{ + int d = a & b; + + /* { dg-final { scan-assembler-not "tst\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" } } */ + /* { dg-final { scan-assembler-times "and\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" 2 } } */ + if (d <= 0) +return 12; + else +return 18; +} + +int +tst_si_test2 (int a, int b, int c) +{ + int d = a & 0x; + + /* { dg-final { scan-assembler-not "tst\tw\[0-9\]+, w\[0-9\]+, -1717986919" } } */ + /* { dg-final { 
scan-assembler "and\tw\[0-9\]+, w\[0-9\]+, -1717986919" } } */ + if (d <= 0) +return 12; + else +return 18; +} + +int +tst_si_test3 (int a, int b, int c) +{ + int d = a & (b << 3); + + /* { dg-final { scan-assembler-not "tst\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3" } } */ + /* { dg-final { scan-assembler "and\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3" } } */ + if (d <= 0) +return 12; + else +return 18; +} + +typedef long long s64; + +s64 +tst_di_test1 (s64 a, s64 b, s64 c) +{ + s64 d = a & b; + + /* { dg-final { scan-assembler-not "tst\tx\[0-9\]+, x
[PATCH, AArch64] Support BFI instruction and insv standard pattern
Hi, This patch implements the BFI variant of BFM. In doing so, it also implements the insv standard pattern. I've regression tested on bare-metal and linux. It comes complete with its own compilation and execution testcase. OK for trunk? Cheers, Ian 2013-05-08 Ian Bolton gcc/ * config/aarch64/aarch64.md (insv): New define_expand. (*insv_reg): New define_insn. testsuite/ * gcc.target/aarch64/bfm_1.c: New test.diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 330f78c..b730ed0 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -3118,6 +3118,53 @@ (set_attr "mode" "")] ) +;; Bitfield Insert (insv) +(define_expand "insv" + [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand") + (match_operand 1 "const_int_operand") + (match_operand 2 "const_int_operand")) + (match_operand:GPI 3 "general_operand"))] + "" +{ + HOST_WIDE_INT mask = ((HOST_WIDE_INT)1 << INTVAL (operands[1])) - 1; + + if (GET_MODE_BITSIZE (mode) > BITS_PER_WORD + || INTVAL (operands[1]) < 1 + || INTVAL (operands[1]) >= GET_MODE_BITSIZE (mode) + || INTVAL (operands[2]) < 0 + || (INTVAL (operands[2]) + INTVAL (operands[1])) + > GET_MODE_BITSIZE (mode)) +FAIL; + + /* Prefer AND/OR for inserting all zeros or all ones. */ + if (CONST_INT_P (operands[3]) + && ((INTVAL (operands[3]) & mask) == 0 + || (INTVAL (operands[3]) & mask) == mask)) +FAIL; + + if (!register_operand (operands[3], mode)) +operands[3] = force_reg (mode, operands[3]); + + /* Intentional fall-through, which will lead to below pattern + being matched. */ +}) + +(define_insn "*insv_reg" + [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r") + (match_operand 1 "const_int_operand" "n") + (match_operand 2 "const_int_operand" "n")) + (match_operand:GPI 3 "register_operand" "r"))] + "!(GET_MODE_BITSIZE (mode) > BITS_PER_WORD + || INTVAL (operands[1]) < 1 + || INTVAL (operands[1]) >= GET_MODE_BITSIZE (mode) + || INTVAL (operands[2]) < 0 + || (INTVAL (operands[2]) + INTVAL (operands[1])) + > GET_MODE_BITSIZE (mode))" + "bfi\\t%0, %3, %2, %1" + [(set_attr "v8type" "bfm") + (set_attr "mode" "")] +) + (define_insn "*_shft_" [(set (match_operand:GPI 0 "register_operand" "=r") (ashift:GPI (ANY_EXTEND:GPI diff --git a/gcc/testsuite/gcc.target/aarch64/bfm_1.c b/gcc/testsuite/gcc.target/aarch64/bfm_1.c new file mode 100644 index 000..d9a73a1 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/bfm_1.c @@ -0,0 +1,46 @@ +/* { dg-do run } */ +/* { dg-options "-O2 --save-temps -fno-inline" } */ + +extern void abort (void); + +typedef struct bitfield +{ + unsigned short eight: 8; + unsigned short four: 4; + unsigned short five: 5; + unsigned short seven: 7; +} bitfield; + +bitfield +bfi1 (bitfield a) +{ + /* { dg-final { scan-assembler "bfi\tw\[0-9\]+, w\[0-9\]+, 0, 8" } } */ + a.eight = 3; + return a; +} + +bitfield +bfi2 (bitfield a) +{ + /* { dg-final { scan-assembler "bfi\tw\[0-9\]+, w\[0-9\]+, 16, 5" } } */ + a.five = 7; + return a; +} + +int +main (int argc, char** argv) +{ + bitfield a; + bitfield b = bfi1 (a); + bitfield c = bfi2 (b); + + if (c.eight != 3) +abort (); + + if (c.five != 7) +abort (); + + return 0; +} + +/* { dg-final { cleanup-saved-temps } } */
[PATCH, AArch64] Allow insv_imm to handle bigger immediates via masking to 16-bits
The MOVK instruction is currently not used when operand 2 is more than 16 bits, which leads to sub-optimal code. This patch improves those situations by removing the check and instead masking down to 16 bits within the new "X" format specifier I added recently. OK for trunk? Cheers, Ian 2013-05-17 Ian Bolton * config/aarch64/aarch64.c (aarch64_print_operand): Change the X format specifier to only display bottom 16 bits. * config/aarch64/aarch64.md (insv_imm): Allow any-sized immediate to match for operand 2, since it will be masked.diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index b57416c..1bdfd85 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -3424,13 +3424,13 @@ aarch64_print_operand (FILE *f, rtx x, char code) break; case 'X': - /* Print integer constant in hex. */ + /* Print bottom 16 bits of integer constant in hex. */ if (GET_CODE (x) != CONST_INT) { output_operand_lossage ("invalid operand for '%%%c'", code); return; } - asm_fprintf (f, "0x%wx", UINTVAL (x)); + asm_fprintf (f, "0x%wx", UINTVAL (x) & 0xffff); break; case 'w': diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index b27bcda..403d717 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -858,9 +858,8 @@ (const_int 16) (match_operand:GPI 1 "const_int_operand" "n")) (match_operand:GPI 2 "const_int_operand" "n"))] - "INTVAL (operands[1]) < GET_MODE_BITSIZE (mode) - && INTVAL (operands[1]) % 16 == 0 - && UINTVAL (operands[2]) <= 0xffff" + "UINTVAL (operands[1]) < GET_MODE_BITSIZE (mode) + && UINTVAL (operands[1]) % 16 == 0" "movk\\t%0, %X2, lsl %1" [(set_attr "v8type" "movk") (set_attr "mode" "")]
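By way of background, AArch64 materializes a wide integer constant 16 bits at a time: one MOV for the first 16-bit chunk, then up to three MOVKs that each overwrite another 16-bit-aligned chunk. At expand time the chunk being inserted is still embedded in the full-width constant, which is why masking with 0xffff in the X specifier is safe. A hedged sketch of the correspondence (the constant is chosen purely for illustration):

/* Returning this constant should expand to roughly:
     mov  x0, 0xdef0
     movk x0, 0x9abc, lsl 16
     movk x0, 0x5678, lsl 32
     movk x0, 0x1234, lsl 48
   where each movk immediate is "UINTVAL (x) & 0xffff" of the
   corresponding 16-bit slice.  */
unsigned long long
big_constant (void)
{
  return 0x123456789abcdef0ULL;
}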
RE: [PATCH, AArch64] Support BFI instruction and insv standard pattern
> Hi, > > This patch implements the BFI variant of BFM. In doing so, it also > implements the insv standard pattern. > > I've regression tested on bare-metal and linux. > > It comes complete with its own compilation and execution testcase. > > OK for trunk? > > Cheers, > Ian > > > 2013-05-08 Ian Bolton > > gcc/ > * config/aarch64/aarch64.md (insv): New define_expand. > (*insv_reg): New define_insn. > > testsuite/ > * gcc.target/aarch64/bfm_1.c: New test. (This patch did not yet get commit approval.) I improved this patch during the work I did on the recent insv_imm patch (http://gcc.gnu.org/ml/gcc-patches/2013-05/msg01007.html). I also renamed the testcase. Regression testing completed successfully. OK for trunk? Cheers, Ian 2013-05-20 Ian Bolton gcc/ * config/aarch64/aarch64.md (insv): New define_expand. (*insv_reg): New define_insn. testsuite/ * gcc.target/aarch64/insv_1.c: New test.diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index b27bcda..e5d6950 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -3164,6 +3164,52 @@ (set_attr "mode" "")] ) +;; Bitfield Insert (insv) +(define_expand "insv" + [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand") + (match_operand 1 "const_int_operand") + (match_operand 2 "const_int_operand")) + (match_operand:GPI 3 "general_operand"))] + "" +{ + unsigned HOST_WIDE_INT width = UINTVAL (operands[1]); + unsigned HOST_WIDE_INT pos = UINTVAL (operands[2]); + rtx value = operands[3]; + + if (width == 0 || (pos + width) > GET_MODE_BITSIZE (mode)) +FAIL; + + if (CONST_INT_P (value)) +{ + unsigned HOST_WIDE_INT mask = ((unsigned HOST_WIDE_INT)1 << width) - 1; + + /* Prefer AND/OR for inserting all zeros or all ones. */ + if ((UINTVAL (value) & mask) == 0 + || (UINTVAL (value) & mask) == mask) + FAIL; + + /* Force the constant into a register, unless this is a 16-bit aligned +16-bit wide insert, which is handled by insv_imm. 
*/ + if (width != 16 || (pos % 16) != 0) + operands[3] = force_reg (mode, value); +} + else if (!register_operand (value, mode)) +operands[3] = force_reg (mode, value); +}) + +(define_insn "*insv_reg" + [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r") + (match_operand 1 "const_int_operand" "n") + (match_operand 2 "const_int_operand" "n")) + (match_operand:GPI 3 "register_operand" "r"))] + "!(UINTVAL (operands[1]) == 0 + || (UINTVAL (operands[2]) + UINTVAL (operands[1]) +> GET_MODE_BITSIZE (mode)))" + "bfi\\t%0, %3, %2, %1" + [(set_attr "v8type" "bfm") + (set_attr "mode" "")] +) + (define_insn "*_shft_" [(set (match_operand:GPI 0 "register_operand" "=r") (ashift:GPI (ANY_EXTEND:GPI diff --git a/gcc/testsuite/gcc.target/aarch64/insv_1.c b/gcc/testsuite/gcc.target/aarch64/insv_1.c new file mode 100644 index 000..0977e15 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/insv_1.c @@ -0,0 +1,84 @@ +/* { dg-do run } */ +/* { dg-options "-O2 --save-temps -fno-inline" } */ + +extern void abort (void); + +typedef struct bitfield +{ + unsigned short eight: 8; + unsigned short four: 4; + unsigned short five: 5; + unsigned short seven: 7; + unsigned int sixteen: 16; +} bitfield; + +bitfield +bfi1 (bitfield a) +{ + /* { dg-final { scan-assembler "bfi\tx\[0-9\]+, x\[0-9\]+, 0, 8" } } */ + a.eight = 3; + return a; +} + +bitfield +bfi2 (bitfield a) +{ + /* { dg-final { scan-assembler "bfi\tx\[0-9\]+, x\[0-9\]+, 16, 5" } } */ + a.five = 7; + return a; +} + +bitfield +movk (bitfield a) +{ + /* { dg-final { scan-assembler "movk\tx\[0-9\]+, 0x1d6b, lsl 32" } } */ + a.sixteen = 7531; + return a; +} + +bitfield +set1 (bitfield a) +{ + /* { dg-final { scan-assembler "orr\tx\[0-9\]+, x\[0-9\]+, 2031616" } } */ + a.five = 0x1f; + return a; +} + +bitfield +set0 (bitfield a) +{ + /* { dg-final { scan-assembler "and\tx\[0-9\]+, x\[0-9\]+, -2031617" } } */ + a.five = 0; + return a; +} + + +int +main (int argc, char** argv) +{ + static bitfield a; + bitfield b = bfi1 (a); + bitfield c = bfi2 (b); + bitfield d = movk (c); + + if (d.eight != 3) +abort (); + + if (d.five != 7) +abort (); + + if (d.sixteen != 7531) +abort (); + + d = set1 (d); + if (d.five != 0x1f) +abort (); + + d = set0 (d); + if (d.five != 0) +abort (); + + return 0; +} + +/* { dg-final { cleanup-saved-temps } } */
[PATCH, AArch64] Fix invalid assembler in scalar_intrinsics.c test
The test file scalar_intrinsics.c (in gcc.target/aarch64) is currently compile-only. If you attempt to make it run, as opposed to just generate assembler, you can't because it won't assemble. There are two issues causing trouble here: 1) Use of invalid instruction "mov d0, d1". It should be "mov d0, v1.d[0]". 2) The vdupd_lane_s64 and vdupd_lane_u64 calls are being given a lane that is out of range, which causes invalid assembler output. This patch fixes both, so that we can build on this to make executable test cases for scalar intrinsics. OK for trunk? Cheers, Ian 2013-05-22 Ian Bolton testsuite/ * gcc.target/aarch64/scalar_intrinsics.c (force_simd): Use a valid instruction. (test_vdupd_lane_s64): Pass a valid lane argument. (test_vdupd_lane_u64): Likewise.diff --git a/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c b/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c index 7427c62..16537ce 100644 --- a/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c +++ b/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c @@ -4,7 +4,7 @@ #include <arm_neon.h> /* Used to force a variable to a SIMD register. */ -#define force_simd(V1) asm volatile ("mov %d0, %d1" \ +#define force_simd(V1) asm volatile ("mov %d0, %1.d[0]" \ : "=w"(V1) \ : "w"(V1)\ : /* No clobbers */); @@ -228,13 +228,13 @@ test_vdups_lane_u32 (uint32x4_t a) int64x1_t test_vdupd_lane_s64 (int64x2_t a) { - return vdupd_lane_s64 (a, 2); + return vdupd_lane_s64 (a, 1); } uint64x1_t test_vdupd_lane_u64 (uint64x2_t a) { - return vdupd_lane_u64 (a, 2); + return vdupd_lane_u64 (a, 1); } /* { dg-final { scan-assembler-times "\\tcmtst\\td\[0-9\]+, d\[0-9\]+, d\[0-9\]+" 2 } } */
RE: [PATCH, AArch64] Support BFI instruction and insv standard pattern
> On 05/20/2013 11:55 AM, Ian Bolton wrote: > > I improved this patch during the work I did on the recent insv_imm > patch > > (http://gcc.gnu.org/ml/gcc-patches/2013-05/msg01007.html). > > Thanks, you cleaned up almost everything on which I would have > commented > with the previous patch revision. The only thing left is: > > > + else if (!register_operand (value, mode)) > > +operands[3] = force_reg (mode, value); > > Checking register_operand before force_reg is unnecessary; you're not > saving a > function call, and force_reg will itself perform the register check. Thanks for the review, Richard. Latest patch is attached, which fixes this. Linux and bare-metal regression runs successful. OK for trunk? Cheers, Ian 2013-05-30 Ian Bolton gcc/ * config/aarch64/aarch64.md (insv): New define_expand. (*insv_reg): New define_insn. testsuite/ * gcc.target/aarch64/insv_1.c: New test. diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 2bdbfa9..89db092 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -3163,6 +3163,50 @@ (set_attr "mode" "")] ) +;; Bitfield Insert (insv) +(define_expand "insv" + [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand") + (match_operand 1 "const_int_operand") + (match_operand 2 "const_int_operand")) + (match_operand:GPI 3 "general_operand"))] + "" +{ + unsigned HOST_WIDE_INT width = UINTVAL (operands[1]); + unsigned HOST_WIDE_INT pos = UINTVAL (operands[2]); + rtx value = operands[3]; + + if (width == 0 || (pos + width) > GET_MODE_BITSIZE (mode)) +FAIL; + + if (CONST_INT_P (value)) +{ + unsigned HOST_WIDE_INT mask = ((unsigned HOST_WIDE_INT)1 << width) - 1; + + /* Prefer AND/OR for inserting all zeros or all ones. */ + if ((UINTVAL (value) & mask) == 0 + || (UINTVAL (value) & mask) == mask) + FAIL; + + /* 16-bit aligned 16-bit wide insert is handled by insv_imm. 
*/ + if (width == 16 && (pos % 16) == 0) + DONE; +} + operands[3] = force_reg (mode, value); +}) + +(define_insn "*insv_reg" + [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r") + (match_operand 1 "const_int_operand" "n") + (match_operand 2 "const_int_operand" "n")) + (match_operand:GPI 3 "register_operand" "r"))] + "!(UINTVAL (operands[1]) == 0 + || (UINTVAL (operands[2]) + UINTVAL (operands[1]) +> GET_MODE_BITSIZE (mode)))" + "bfi\\t%0, %3, %2, %1" + [(set_attr "v8type" "bfm") + (set_attr "mode" "")] +) + (define_insn "*_shft_" [(set (match_operand:GPI 0 "register_operand" "=r") (ashift:GPI (ANY_EXTEND:GPI diff --git a/gcc/testsuite/gcc.target/aarch64/insv_1.c b/gcc/testsuite/gcc.target/aarch64/insv_1.c new file mode 100644 index 000..bc8928d --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/insv_1.c @@ -0,0 +1,84 @@ +/* { dg-do run } */ +/* { dg-options "-O2 --save-temps -fno-inline" } */ + +extern void abort (void); + +typedef struct bitfield +{ + unsigned short eight: 8; + unsigned short four: 4; + unsigned short five: 5; + unsigned short seven: 7; + unsigned int sixteen: 16; +} bitfield; + +bitfield +bfi1 (bitfield a) +{ + /* { dg-final { scan-assembler "bfi\tx\[0-9\]+, x\[0-9\]+, 0, 8" } } */ + a.eight = 3; + return a; +} + +bitfield +bfi2 (bitfield a) +{ + /* { dg-final { scan-assembler "bfi\tx\[0-9\]+, x\[0-9\]+, 16, 5" } } */ + a.five = 7; + return a; +} + +bitfield +movk (bitfield a) +{ + /* { dg-final { scan-assembler "movk\tx\[0-9\]+, 0x1d6b, lsl 32" } } */ + a.sixteen = 7531; + return a; +} + +bitfield +set1 (bitfield a) +{ + /* { dg-final { scan-assembler "orr\tx\[0-9\]+, x\[0-9\]+, 2031616" } } */ + a.five = 0x1f; + return a; +} + +bitfield +set0 (bitfield a) +{ + /* { dg-final { scan-assembler "and\tx\[0-9\]+, x\[0-9\]+, -2031617" } } */ + a.five = 0; + return a; +} + + +int +main (int argc, char** argv) +{ + static bitfield a; + bitfield b = bfi1 (a); + bitfield c = bfi2 (b); + bitfield d = movk (c); + + if (d.eight != 3) +abort (); + + if (d.five != 7) +abort (); + + if (d.sixteen != 7531) +abort (); + + d = set1 (d); + if (d.five != 0x1f) +abort (); + + d = set0 (d); + if (d.five != 0) +abort (); + + return 0; +} + +/* { dg-final { cleanup-saved-temps } } */
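For reference, the semantics the *insv_reg pattern implements: a bitfield insert of WIDTH bits of the source register at bit POS of the destination, replacing the shift/mask/or sequence a compiler must otherwise emit. A scalar sketch of what "bfi x0, x1, pos, width" computes (names are illustrative; 64-bit operands with pos + width <= 64 assumed):

/* Keep dst outside the field; take the low WIDTH bits of src
   inside it.  */
unsigned long long
bfi_equiv (unsigned long long dst, unsigned long long src,
           unsigned pos, unsigned width)
{
  unsigned long long mask
    = (width == 64) ? ~0ULL : ((1ULL << width) - 1);
  return (dst & ~(mask << pos)) | ((src & mask) << pos);
}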
[AArch64, PATCH 1/5] Improve MOVI handling (Change interface of aarch64_simd_valid_immediate)
(This patch is the first of five, where the first 4 do some clean-up and the last fixes a bug with scalar MOVI. The bug fix without the clean-up was particularly ugly!) This one is pretty simple - just altering an interface, so we can later remove an unnecessary wrapper function. OK for trunk? Cheers, Ian 13-06-03 Ian Bolton * config/aarch64/aarch64.c (aarch64_simd_valid_immediate): Change return type to bool for prototype. (aarch64_legitimate_constant_p): Check for true instead of not -1. (aarch64_simd_valid_immediate): Fix up each return to return a bool. (aarch64_simd_immediate_valid_for_move): Update retval for bool.diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 12a7055..05ff5fa 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -103,7 +103,7 @@ static bool aarch64_vfp_is_call_or_return_candidate (enum machine_mode, static void aarch64_elf_asm_constructor (rtx, int) ATTRIBUTE_UNUSED; static void aarch64_elf_asm_destructor (rtx, int) ATTRIBUTE_UNUSED; static void aarch64_override_options_after_change (void); -static int aarch64_simd_valid_immediate (rtx, enum machine_mode, int, rtx *, +static bool aarch64_simd_valid_immediate (rtx, enum machine_mode, int, rtx *, int *, unsigned char *, int *, int *); static bool aarch64_vector_mode_supported_p (enum machine_mode); static unsigned bit_count (unsigned HOST_WIDE_INT); @@ -5153,7 +5153,7 @@ aarch64_legitimate_constant_p (enum machine_mode mode, rtx x) we now decompose CONST_INTs according to expand_mov_immediate. */ if ((GET_CODE (x) == CONST_VECTOR && aarch64_simd_valid_immediate (x, mode, false, - NULL, NULL, NULL, NULL, NULL) != -1) + NULL, NULL, NULL, NULL, NULL)) || CONST_INT_P (x) || aarch64_valid_floating_const (mode, x)) return !targetm.cannot_force_const_mem (mode, x); @@ -6144,11 +6144,8 @@ aarch64_vect_float_const_representable_p (rtx x) return aarch64_float_const_representable_p (x0); } -/* TODO: This function returns values similar to those - returned by neon_valid_immediate in gcc/config/arm/arm.c - but the API here is different enough that these magic numbers - are not used. It should be sufficient to return true or false. */ -static int +/* Return true for valid and false for invalid. */ +static bool aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, int inverse, rtx *modconst, int *elementwidth, unsigned char *elementchar, @@ -6184,24 +6181,21 @@ aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, int inverse, if (!(simd_imm_zero || aarch64_vect_float_const_representable_p (op))) - return -1; + return false; - if (modconst) - *modconst = CONST_VECTOR_ELT (op, 0); + if (modconst) + *modconst = CONST_VECTOR_ELT (op, 0); - if (elementwidth) - *elementwidth = elem_width; + if (elementwidth) + *elementwidth = elem_width; - if (elementchar) - *elementchar = sizetochar (elem_width); + if (elementchar) + *elementchar = sizetochar (elem_width); - if (shift) - *shift = 0; + if (shift) + *shift = 0; - if (simd_imm_zero) - return 19; - else - return 18; + return true; } /* Splat vector constant out into a byte vector. 
*/ @@ -6299,7 +6293,7 @@ aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, int inverse, if (immtype == -1 || (immtype >= 12 && immtype <= 15) || immtype == 18) -return -1; +return false; if (elementwidth) @@ -6351,7 +6345,7 @@ aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, int inverse, } } - return immtype; + return (immtype >= 0); #undef CHECK } @@ -6369,11 +6363,11 @@ aarch64_simd_immediate_valid_for_move (rtx op, enum machine_mode mode, int tmpwidth; unsigned char tmpwidthc; int tmpmvn = 0, tmpshift = 0; - int retval = aarch64_simd_valid_immediate (op, mode, 0, &tmpconst, + bool retval = aarch64_simd_valid_immediate (op, mode, 0, &tmpconst, &tmpwidth, &tmpwidthc, &tmpmvn, &tmpshift); - if (retval == -1) + if (!retval) return 0; if (modconst)
[AArch64, PATCH 3/5] Improve MOVI handling (Don't update RTX operand in-place)
(This patch is the third of five, where the first 4 do some clean-up and the last fixes a bug with scalar MOVI. The bug fix without the clean-up was particularly ugly!) This one is focused on cleaning up aarch64_simd_valid_immediate, with better use of arguments and no in-place modification of RTX operands. Specifically, I've changed the set of pointers that are passed in (it's now a struct) and the caller prints out the immediate value directly instead of letting operand[1] get fudged. OK for trunk? Cheers, Ian 2013-06-03 Ian Bolton * config/aarch64/aarch64.c (simd_immediate_info): Struct to hold information completed by aarch64_simd_valid_immediate. (aarch64_legitimate_constant_p): Update arguments. (aarch64_simd_valid_immediate): Work with struct rather than many pointers. (aarch64_simd_scalar_immediate_valid_for_move): Update arguments. (aarch64_simd_make_constant): Update arguments. (aarch64_output_simd_mov_immediate): Work with struct rather than many pointers. Output immediate directly rather than as operand. * config/aarch64/aarch64-protos.h (aarch64_simd_valid_immediate): Update prototype. * config/aarch64/constraints.md (Dn): Update arguments.diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index d1de14e..083ce91 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -156,8 +156,8 @@ bool aarch64_simd_imm_scalar_p (rtx x, enum machine_mode mode); bool aarch64_simd_imm_zero_p (rtx, enum machine_mode); bool aarch64_simd_scalar_immediate_valid_for_move (rtx, enum machine_mode); bool aarch64_simd_shift_imm_p (rtx, enum machine_mode, bool); -bool aarch64_simd_valid_immediate (rtx, enum machine_mode, int, rtx *, - int *, unsigned char *, int *, int *); +bool aarch64_simd_valid_immediate (rtx, enum machine_mode, bool, + struct simd_immediate_info *); bool aarch64_symbolic_address_p (rtx); bool aarch64_symbolic_constant_p (rtx, enum aarch64_symbol_context, enum aarch64_symbol_type *); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 5f97efe..d83e645 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -87,6 +87,14 @@ struct aarch64_address_info { enum aarch64_symbol_type symbol_type; }; +struct simd_immediate_info { + rtx value; + int shift; + int element_width; + unsigned char element_char; + bool mvn; +}; + /* The current code model. */ enum aarch64_code_model aarch64_cmodel; @@ -5150,8 +5158,7 @@ aarch64_legitimate_constant_p (enum machine_mode mode, rtx x) /* This could probably go away because we now decompose CONST_INTs according to expand_mov_immediate. */ if ((GET_CODE (x) == CONST_VECTOR - && aarch64_simd_valid_immediate (x, mode, false, - NULL, NULL, NULL, NULL, NULL)) + && aarch64_simd_valid_immediate (x, mode, false, NULL)) || CONST_INT_P (x) || aarch64_valid_floating_const (mode, x)) return !targetm.cannot_force_const_mem (mode, x); @@ -6144,10 +6151,8 @@ aarch64_vect_float_const_representable_p (rtx x) /* Return true for valid and false for invalid. 
*/ bool -aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, int inverse, - rtx *modconst, int *elementwidth, - unsigned char *elementchar, - int *mvn, int *shift) +aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, bool inverse, + struct simd_immediate_info *info) { #define CHECK(STRIDE, ELSIZE, CLASS, TEST, SHIFT, NEG) \ matches = 1; \ @@ -6181,17 +6186,14 @@ aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, int inverse, || aarch64_vect_float_const_representable_p (op))) return false; - if (modconst) - *modconst = CONST_VECTOR_ELT (op, 0); - - if (elementwidth) - *elementwidth = elem_width; - - if (elementchar) - *elementchar = sizetochar (elem_width); - - if (shift) - *shift = 0; + if (info) + { + info->value = CONST_VECTOR_ELT (op, 0); + info->element_width = elem_width; + info->element_char = sizetochar (elem_width); + info->mvn = false; + info->shift = 0; + } return true; } @@ -6293,21 +6295,13 @@ aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, int inverse, || immtype == 18) return false; - - if (elementwidth) -*elementwidth = elsize; - - if (elementchar) -*elementchar = elchar; - - if (mvn) -*mvn = emvn; - - if (shift) -*shift = eshift; - - if (modconst) + if (info) { +
[AArch64, PATCH 2/5] Improve MOVI handling (Remove wrapper function)
(This patch is the second of five, where the first 4 do some clean-up and the last fixes a bug with scalar MOVI. The bug fix without the clean-up was particularly ugly!) This one is also very simple - removing a wrapper function that was an unnecessary level of indirection. OK for trunk? Cheers, Ian 13-06-03 Ian Bolton * config/aarch64/aarch64.c (aarch64_simd_valid_immediate): No longer static. (aarch64_simd_immediate_valid_for_move): Remove. (aarch64_simd_scalar_immediate_valid_for_move): Update call. (aarch64_simd_make_constant): Update call. (aarch64_output_simd_mov_immediate): Update call. * config/aarch64/aarch64-protos.h (aarch64_simd_valid_immediate): Add prototype. * config/aarch64/constraints.md (Dn): Update call.diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 91fcde8..d1de14e 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -156,6 +156,8 @@ bool aarch64_simd_imm_scalar_p (rtx x, enum machine_mode mode); bool aarch64_simd_imm_zero_p (rtx, enum machine_mode); bool aarch64_simd_scalar_immediate_valid_for_move (rtx, enum machine_mode); bool aarch64_simd_shift_imm_p (rtx, enum machine_mode, bool); +bool aarch64_simd_valid_immediate (rtx, enum machine_mode, int, rtx *, + int *, unsigned char *, int *, int *); bool aarch64_symbolic_address_p (rtx); bool aarch64_symbolic_constant_p (rtx, enum aarch64_symbol_context, enum aarch64_symbol_type *); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 05ff5fa..aec59b0 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -103,8 +103,6 @@ static bool aarch64_vfp_is_call_or_return_candidate (enum machine_mode, static void aarch64_elf_asm_constructor (rtx, int) ATTRIBUTE_UNUSED; static void aarch64_elf_asm_destructor (rtx, int) ATTRIBUTE_UNUSED; static void aarch64_override_options_after_change (void); -static bool aarch64_simd_valid_immediate (rtx, enum machine_mode, int, rtx *, -int *, unsigned char *, int *, int *); static bool aarch64_vector_mode_supported_p (enum machine_mode); static unsigned bit_count (unsigned HOST_WIDE_INT); static bool aarch64_const_vec_all_same_int_p (rtx, @@ -6145,7 +6143,7 @@ aarch64_vect_float_const_representable_p (rtx x) } /* Return true for valid and false for invalid. */ -static bool +bool aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, int inverse, rtx *modconst, int *elementwidth, unsigned char *elementchar, @@ -6349,45 +6347,6 @@ aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, int inverse, #undef CHECK } -/* Return TRUE if rtx X is legal for use as either a AdvSIMD MOVI instruction - (or, implicitly, MVNI) immediate. Write back width per element - to *ELEMENTWIDTH, and a modified constant (whatever should be output - for a MOVI instruction) in *MODCONST. 
*/ -int -aarch64_simd_immediate_valid_for_move (rtx op, enum machine_mode mode, - rtx *modconst, int *elementwidth, - unsigned char *elementchar, - int *mvn, int *shift) -{ - rtx tmpconst; - int tmpwidth; - unsigned char tmpwidthc; - int tmpmvn = 0, tmpshift = 0; - bool retval = aarch64_simd_valid_immediate (op, mode, 0, &tmpconst, -&tmpwidth, &tmpwidthc, -&tmpmvn, &tmpshift); - - if (!retval) -return 0; - - if (modconst) -*modconst = tmpconst; - - if (elementwidth) -*elementwidth = tmpwidth; - - if (elementchar) -*elementchar = tmpwidthc; - - if (mvn) -*mvn = tmpmvn; - - if (shift) -*shift = tmpshift; - - return 1; -} - static bool aarch64_const_vec_all_same_int_p (rtx x, HOST_WIDE_INT minval, @@ -6492,9 +6451,8 @@ aarch64_simd_scalar_immediate_valid_for_move (rtx op, enum machine_mode mode) gcc_assert (!VECTOR_MODE_P (mode)); vmode = aarch64_preferred_simd_mode (mode); rtx op_v = aarch64_simd_gen_const_vector_dup (vmode, INTVAL (op)); - int retval = aarch64_simd_immediate_valid_for_move (op_v, vmode, 0, - NULL, NULL, NULL, NULL); - return retval; + return aarch64_simd_valid_immediate (op_v, vmode, 0, NULL, + NULL, NULL, NULL, NULL); } /* Construct and return a PARALLEL RTX vector. */ @@ -6722,8 +6680,8 @@ aarch64_simd_make_constant (rtx vals) gcc_unreachable (); if (const_vec != NULL_RTX - && aarch64_simd_immediate_valid_for_move (const_vec, mode, NULL, NULL, -
[AArch64, PATCH 4/5] Improve MOVI handling (Other minor clean-up)
(This patch is the fourth of five, where the first 4 do some clean-up and the last fixes a bug with scalar MOVI. The bug fix without the clean-up was particularly ugly!) I think the changelog says it all here. Nothing major, just tidying up. OK for trunk? Cheers, Ian 2013-06-03 Ian Bolton * config/aarch64/aarch64.c (simd_immediate_info): Remove element_char member. (sizetochar): Return signed char. (aarch64_simd_valid_immediate): Remove elchar and other unnecessary variables. (aarch64_output_simd_mov_immediate): Take rtx instead of &rtx. Calculate element_char as required. * config/aarch64/aarch64-protos.h: Update and move prototype for aarch64_output_simd_mov_immediate. * config/aarch64/aarch64-simd.md (*aarch64_simd_mov): Update arguments.diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 083ce91..d21a2f5 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -148,6 +148,7 @@ bool aarch64_legitimate_pic_operand_p (rtx); bool aarch64_move_imm (HOST_WIDE_INT, enum machine_mode); bool aarch64_mov_operand_p (rtx, enum aarch64_symbol_context, enum machine_mode); +char *aarch64_output_simd_mov_immediate (rtx, enum machine_mode, unsigned); bool aarch64_pad_arg_upward (enum machine_mode, const_tree); bool aarch64_pad_reg_upward (enum machine_mode, const_tree, bool); bool aarch64_regno_ok_for_base_p (int, bool); @@ -258,6 +259,4 @@ extern void aarch64_split_combinev16qi (rtx operands[3]); extern void aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel); extern bool aarch64_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel); - -char* aarch64_output_simd_mov_immediate (rtx *, enum machine_mode, unsigned); #endif /* GCC_AARCH64_PROTOS_H */ diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 04fbdbd..e5990d4 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -409,7 +409,7 @@ case 4: return "ins\t%0.d[0], %1"; case 5: return "mov\t%0, %1"; case 6: - return aarch64_output_simd_mov_immediate (&operands[1], + return aarch64_output_simd_mov_immediate (operands[1], mode, 64); default: gcc_unreachable (); } @@ -440,7 +440,7 @@ case 5: return "#"; case 6: - return aarch64_output_simd_mov_immediate (&operands[1], mode, 128); + return aarch64_output_simd_mov_immediate (operands[1], mode, 128); default: gcc_unreachable (); } diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index d83e645..001f9c5 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -91,7 +91,6 @@ struct simd_immediate_info { rtx value; int shift; int element_width; - unsigned char element_char; bool mvn; }; @@ -6102,7 +6101,7 @@ aarch64_mangle_type (const_tree type) } /* Return the equivalent letter for size. */ -static unsigned char +static char sizetochar (int size) { switch (size) @@ -6163,7 +6162,6 @@ aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, bool inverse, { \ immtype = (CLASS); \ elsize = (ELSIZE); \ - elchar = sizetochar (elsize);\ eshift = (SHIFT);\ emvn = (NEG);\ break; \ @@ -6172,25 +6170,20 @@ aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, bool inverse, unsigned int i, elsize = 0, idx = 0, n_elts = CONST_VECTOR_NUNITS (op); unsigned int innersize = GET_MODE_SIZE (GET_MODE_INNER (mode)); unsigned char bytes[16]; - unsigned char elchar = 0; int immtype = -1, matches; unsigned int invmask = inverse ? 
0xff : 0; int eshift, emvn; if (GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT) { - bool simd_imm_zero = aarch64_simd_imm_zero_p (op, mode); - int elem_width = GET_MODE_BITSIZE (GET_MODE (CONST_VECTOR_ELT (op, 0))); - - if (!(simd_imm_zero - || aarch64_vect_float_const_representable_p (op))) + if (! (aarch64_simd_imm_zero_p (op, mode) +|| aarch64_vect_float_const_representable_p (op))) return false; if (info) { info->value = CONST_VECTOR_ELT (op, 0); - info->element_width = elem_width; - info->element_char = sizetochar (elem_width); + info->element_width = GET_MODE_BITSIZE (GET_MODE (info->value)); info->mvn = false; info->shift = 0; } @@ -6298,7 +6291,6 @@ aarch64_simd_valid_immedi
[AArch64, PATCH 5/5] Improve MOVI handling (Fix invalid assembler bug)
(This patch is the last of five, where the first 4 did some clean-up and this one fixes a bug with scalar MOVI. The bug fix without the clean-up was particularly ugly!) GCC will currently generator invalid assembler for MOVI, if the value in question needs to be shifted. For example: prog.s:270: Error: immediate value out of range -128 to 255 at operand 2 -- `movi v16.4h,1024' The correct assembler for the example should be: movi v16.4h, 0x4, lsl 8 The fix involves calling into a function to output the instruction, rather than just leaving for aarch64_print_operand, as is done for vector immediates. Regression runs have passed for Linux and bare-metal. OK for trunk? Cheers, Ian 2013-06-03 Ian Bolton gcc/ * config/aarch64/aarch64.md (*mov_aarch64): Call into function to generate MOVI instruction. * config/aarch64/aarch64.c (aarch64_simd_container_mode): New function. (aarch64_preferred_simd_mode): Turn into wrapper. (aarch64_output_scalar_simd_mov_immediate): New function. * config/aarch64/aarch64-protos.h: Add prototype for above. testsuite/ * gcc.target/aarch64/movi_1.c: New test. diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index d21a2f5..0dface1 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -148,6 +148,7 @@ bool aarch64_legitimate_pic_operand_p (rtx); bool aarch64_move_imm (HOST_WIDE_INT, enum machine_mode); bool aarch64_mov_operand_p (rtx, enum aarch64_symbol_context, enum machine_mode); +char *aarch64_output_scalar_simd_mov_immediate (rtx, enum machine_mode); char *aarch64_output_simd_mov_immediate (rtx, enum machine_mode, unsigned); bool aarch64_pad_arg_upward (enum machine_mode, const_tree); bool aarch64_pad_reg_upward (enum machine_mode, const_tree, bool); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 001f9c5..0ea05d8 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -5988,32 +5988,57 @@ aarch64_vector_mode_supported_p (enum machine_mode mode) return false; } -/* Return quad mode as the preferred SIMD mode. */ +/* Return appropriate SIMD container + for MODE within a vector of WIDTH bits. */ static enum machine_mode -aarch64_preferred_simd_mode (enum machine_mode mode) +aarch64_simd_container_mode (enum machine_mode mode, unsigned width) { + gcc_assert (width == 64 || width == 128); if (TARGET_SIMD) -switch (mode) - { - case DFmode: -return V2DFmode; - case SFmode: -return V4SFmode; - case SImode: -return V4SImode; - case HImode: -return V8HImode; - case QImode: -return V16QImode; - case DImode: - return V2DImode; -break; - - default:; - } +{ + if (width == 128) + switch (mode) + { + case DFmode: + return V2DFmode; + case SFmode: + return V4SFmode; + case SImode: + return V4SImode; + case HImode: + return V8HImode; + case QImode: + return V16QImode; + case DImode: + return V2DImode; + default: + break; + } + else + switch (mode) + { + case SFmode: + return V2SFmode; + case SImode: + return V2SImode; + case HImode: + return V4HImode; + case QImode: + return V8QImode; + default: + break; + } +} return word_mode; } +/* Return 128-bit container as the preferred SIMD mode for MODE. */ +static enum machine_mode +aarch64_preferred_simd_mode (enum machine_mode mode) +{ + return aarch64_simd_container_mode (mode, 128); +} + /* Return the bitmask of possible vector sizes for the vectorizer to iterate over. 
*/ static unsigned int @@ -7280,6 +7305,18 @@ aarch64_output_simd_mov_immediate (rtx const_vector, return templ; } +char* +aarch64_output_scalar_simd_mov_immediate (rtx immediate, + enum machine_mode mode) +{ + enum machine_mode vmode; + + gcc_assert (!VECTOR_MODE_P (mode)); + vmode = aarch64_simd_container_mode (mode, 64); + rtx v_op = aarch64_simd_gen_const_vector_dup (vmode, INTVAL (immediate)); + return aarch64_output_simd_mov_immediate (v_op, vmode, 64); +} + /* Split operands into moves from op[1] + op[2] into op[0]. */ void diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index e1ec48f..458239e 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -774,17 +774,34 @@ (match_operand:SHORT 1 "general_operand" " r,M,D,m, m,rZ,*w,*w, r,*w"))] "(register_operand (operands[0], mode) || aarch64_reg_or_zero (operands[1], mode))" - &quo
[PATCH, AArch64] Update insv_1.c test for Big Endian
Hi, The insv_1.c test case I added recently was not compatible with big endian. I attempted to fix with #ifdefs but dejagnu thinks all dg directives in a file, regardless of #ifdefs, are applicable, so I had to instead make a new test and add a new effective target to show when each test is supported. I've tested these two tests on little and big. All was OK. OK for trunk? Cheers, Ian 2013-06-24 Ian Bolton * gcc.target/config/aarch64/insv_1.c: Update to show it doesn't work on big endian. * gcc.target/config/aarch64/insv_2.c: New test for big endian. * lib/target-supports.exp: Define aarch64_little_endian.diff --git a/gcc/testsuite/gcc.target/aarch64/insv_1.c b/gcc/testsuite/gcc.target/aarch64/insv_1.c index bc8928d..6e3c7f0 100644 --- a/gcc/testsuite/gcc.target/aarch64/insv_1.c +++ b/gcc/testsuite/gcc.target/aarch64/insv_1.c @@ -1,5 +1,6 @@ -/* { dg-do run } */ +/* { dg-do run { target aarch64*-*-* } } */ /* { dg-options "-O2 --save-temps -fno-inline" } */ +/* { dg-require-effective-target aarch64_little_endian } */ extern void abort (void); diff --git a/gcc/testsuite/gcc.target/aarch64/insv_2.c b/gcc/testsuite/gcc.target/aarch64/insv_2.c new file mode 100644 index 000..a7691a3 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/insv_2.c @@ -0,0 +1,85 @@ +/* { dg-do run { target aarch64*-*-* } } */ +/* { dg-options "-O2 --save-temps -fno-inline" } */ +/* { dg-require-effective-target aarch64_big_endian } */ + +extern void abort (void); + +typedef struct bitfield +{ + unsigned short eight: 8; + unsigned short four: 4; + unsigned short five: 5; + unsigned short seven: 7; + unsigned int sixteen: 16; +} bitfield; + +bitfield +bfi1 (bitfield a) +{ + /* { dg-final { scan-assembler "bfi\tx\[0-9\]+, x\[0-9\]+, 56, 8" } } */ + a.eight = 3; + return a; +} + +bitfield +bfi2 (bitfield a) +{ + /* { dg-final { scan-assembler "bfi\tx\[0-9\]+, x\[0-9\]+, 43, 5" } } */ + a.five = 7; + return a; +} + +bitfield +movk (bitfield a) +{ + /* { dg-final { scan-assembler "movk\tx\[0-9\]+, 0x1d6b, lsl 16" } } */ + a.sixteen = 7531; + return a; +} + +bitfield +set1 (bitfield a) +{ + /* { dg-final { scan-assembler "orr\tx\[0-9\]+, x\[0-9\]+, 272678883688448" } } */ + a.five = 0x1f; + return a; +} + +bitfield +set0 (bitfield a) +{ + /* { dg-final { scan-assembler "and\tx\[0-9\]+, x\[0-9\]+, -272678883688449" } } */ + a.five = 0; + return a; +} + + +int +main (int argc, char** argv) +{ + static bitfield a; + bitfield b = bfi1 (a); + bitfield c = bfi2 (b); + bitfield d = movk (c); + + if (d.eight != 3) +abort (); + + if (d.five != 7) +abort (); + + if (d.sixteen != 7531) +abort (); + + d = set1 (d); + if (d.five != 0x1f) +abort (); + + d = set0 (d); + if (d.five != 0) +abort (); + + return 0; +} + +/* { dg-final { cleanup-saved-temps } } */ diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index a80078a..aca4215 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -2105,6 +2105,15 @@ proc check_effective_target_aarch64_big_endian { } { }] } +# Return 1 if this is a AArch64 target supporting little endian +proc check_effective_target_aarch64_little_endian { } { +return [check_no_compiler_messages aarch64_little_endian assembly { +#if !defined(__aarch64__) || defined(__AARCH64EB__) +#error FOO +#endif +}] +} + # Return 1 is this is an arm target using 32-bit instructions proc check_effective_target_arm32 { } { return [check_no_compiler_messages arm32 assembly {
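The changed field positions follow directly from the flipped bit numbering within the 64-bit container: a field that sits at bit POS with WIDTH bits on little endian sits at 64 - POS - WIDTH on big endian. Checking that against the fields common to the two tests:

  eight   (LE pos 0,  width 8):  64 - 0  - 8  = 56
  five    (LE pos 16, width 5):  64 - 16 - 5  = 43
  sixteen (LE pos 32, width 16): 64 - 32 - 16 = 16,

hence "lsl 16" rather than "lsl 32" in the big-endian movk scan.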
[PATCH, AArch64] Support abs standard pattern for DI mode
Hi, I'm adding support for the abs standard pattern for DI mode, via the ABS instruction in FP registers and the EOR/SUB combo in GP registers. Regression tests for Linux and bare-metal all passed. OK for trunk? Cheers, Ian 2013-06-25 Ian Bolton gcc/ * config/aarch64/aarch64.md (absdi2): Support abs for DI mode. testsuite/ * gcc.target/aarch64/abs_1.c: New test.diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index e88e5be..3700977 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -2003,6 +2003,38 @@ (set_attr "mode" "SI")] ) +(define_insn_and_split "absdi2" + [(set (match_operand:DI 0 "register_operand" "=r,w") + (abs:DI (match_operand:DI 1 "register_operand" "r,w"))) + (clobber (match_scratch:DI 2 "=&r,X"))] + "" + "@ + # + abs\\t%d0, %d1" + "reload_completed + && GP_REGNUM_P (REGNO (operands[0])) + && GP_REGNUM_P (REGNO (operands[1]))" + [(const_int 0)] + { +emit_insn (gen_rtx_SET (VOIDmode, operands[2], + gen_rtx_XOR (DImode, +gen_rtx_ASHIFTRT (DImode, + operands[1], + GEN_INT (63)), +operands[1]))); +emit_insn (gen_rtx_SET (VOIDmode, + operands[0], + gen_rtx_MINUS (DImode, + operands[2], + gen_rtx_ASHIFTRT (DImode, +operands[1], +GEN_INT (63))))); +DONE; + } + [(set_attr "v8type" "alu") + (set_attr "mode" "DI")] +) + (define_insn "neg2" [(set (match_operand:GPI 0 "register_operand" "=r") (neg:GPI (match_operand:GPI 1 "register_operand" "r")))] diff --git a/gcc/testsuite/gcc.target/aarch64/abs_1.c b/gcc/testsuite/gcc.target/aarch64/abs_1.c new file mode 100644 index 000..938bc84 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/abs_1.c @@ -0,0 +1,53 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -fno-inline --save-temps" } */ + +extern long long llabs (long long); +extern void abort (void); + +long long +abs64 (long long a) +{ + /* { dg-final { scan-assembler "eor\t" } } */ + /* { dg-final { scan-assembler "sub\t" } } */ + return llabs (a); +} + +long long +abs64_in_dreg (long long a) +{ + /* { dg-final { scan-assembler "abs\td\[0-9\]+, d\[0-9\]+" } } */ + register long long x asm ("d8") = a; + register long long y asm ("d9"); + asm volatile ("" : : "w" (x)); + y = llabs (x); + asm volatile ("" : : "w" (y)); + return y; +} + +int +main (void) +{ + volatile long long ll0 = 0LL, ll1 = 1LL, llm1 = -1LL; + + if (abs64 (ll0) != 0LL) +abort (); + + if (abs64 (ll1) != 1LL) +abort (); + + if (abs64 (llm1) != 1LL) +abort (); + + if (abs64_in_dreg (ll0) != 0LL) +abort (); + + if (abs64_in_dreg (ll1) != 1LL) +abort (); + + if (abs64_in_dreg (llm1) != 1LL) +abort (); + + return 0; +} + +/* { dg-final { cleanup-saved-temps } } */
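The GP-register split relies on the standard branchless absolute-value identity: with s = a >> 63 (an arithmetic shift, so s is 0 or -1), abs(a) = (a ^ s) - s, since XOR with -1 is bitwise NOT and subtracting -1 adds one, completing two's-complement negation for negative inputs. A C sketch of what the splitter emits (assuming GCC's arithmetic right shift of signed values; the function name is illustrative):

long long
abs_eor_sub (long long a)
{
  long long s = a >> 63;   /* 0 if a >= 0, -1 if a < 0.  */
  return (a ^ s) - s;      /* eor into the scratch, then sub.  */
}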
[PATCH, AArch64] Support BFXIL in the backend
Hi, We don't currently generate BFXIL on AArch64. This patch addresses that, by adding a pattern in the backend. It comes with test cases for little and big endian. Tested on little-endian linux and bare-metal, and big-endian linux. OK for trunk? Cheers, Ian 2013-06-27 Ian Bolton gcc/ * config/aarch64/aarch64.md (*extr_insv_reg): New pattern. testsuite/ * gcc.target/aarch64/bfxil_1.c: New test. * gcc.target/aarch64/bfxil_2.c: Likewise. Index: gcc/testsuite/gcc.target/aarch64/bfxil_1.c === --- gcc/testsuite/gcc.target/aarch64/bfxil_1.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/bfxil_1.c (revision 0) @@ -0,0 +1,40 @@ +/* { dg-do run { target aarch64*-*-* } } */ +/* { dg-options "-O2 --save-temps -fno-inline" } */ +/* { dg-require-effective-target aarch64_little_endian } */ + +extern void abort (void); + +typedef struct bitfield +{ + unsigned short eight1: 8; + unsigned short four: 4; + unsigned short eight2: 8; + unsigned short seven: 7; + unsigned int sixteen: 16; +} bitfield; + +bitfield +bfxil (bitfield a) +{ + /* { dg-final { scan-assembler "bfxil\tx\[0-9\]+, x\[0-9\]+, 16, 8" } } */ + a.eight1 = a.eight2; + return a; +} + +int +main (void) +{ + static bitfield a; + bitfield b; + + a.eight1 = 9; + a.eight2 = 57; + b = bfxil (a); + + if (b.eight1 != a.eight2) +abort (); + + return 0; +} + +/* { dg-final { cleanup-saved-temps } } */ Index: gcc/testsuite/gcc.target/aarch64/bfxil_2.c === --- gcc/testsuite/gcc.target/aarch64/bfxil_2.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/bfxil_2.c (revision 0) @@ -0,0 +1,42 @@ +/* { dg-do run { target aarch64*-*-* } } */ +/* { dg-options "-O2 --save-temps -fno-inline" } */ +/* { dg-require-effective-target aarch64_big_endian } */ + +extern void abort (void); + +typedef struct bitfield +{ + unsigned short eight1: 8; + unsigned short four: 4; + unsigned short eight2: 8; + unsigned short seven: 7; + unsigned int sixteen: 16; + unsigned short eight3: 8; + unsigned short eight4: 8; +} bitfield; + +bitfield +bfxil (bitfield a) +{ + /* { dg-final { scan-assembler "bfxil\tx\[0-9\]+, x\[0-9\]+, 40, 8" } } */ + a.eight4 = a.eight2; + return a; +} + +int +main (void) +{ + static bitfield a; + bitfield b; + + a.eight4 = 9; + a.eight2 = 57; + b = bfxil (a); + + if (b.eight4 != a.eight2) +abort (); + + return 0; +} + +/* { dg-final { cleanup-saved-temps } } */ Index: gcc/config/aarch64/aarch64.md === --- gcc/config/aarch64/aarch64.md (revision 200470) +++ gcc/config/aarch64/aarch64.md (working copy) @@ -3225,6 +3225,21 @@ (define_insn "*insv_reg" (set_attr "mode" "")] ) +(define_insn "*extr_insv_lower_reg" + [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r") + (match_operand 1 "const_int_operand" "n") + (const_int 0)) + (zero_extract:GPI (match_operand:GPI 2 "register_operand" "+r") + (match_dup 1) + (match_operand 3 "const_int_operand" "n")))] + "!(UINTVAL (operands[1]) == 0 + || (UINTVAL (operands[3]) + UINTVAL (operands[1]) +> GET_MODE_BITSIZE (mode)))" + "bfxil\\t%0, %2, %3, %1" + [(set_attr "v8type" "bfm") + (set_attr "mode" "")] +) + (define_insn "*_shft_" [(set (match_operand:GPI 0 "register_operand" "=r") (ashift:GPI (ANY_EXTEND:GPI
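BFXIL (bitfield extract and insert low) complements BFI: it extracts WIDTH bits starting at bit POS of the source and deposits them at bit 0 of the destination, leaving the destination's remaining bits intact; that is exactly the zero_extract-at-position-zero shape of the new pattern. A scalar sketch of "bfxil x0, x2, pos, width" (names are illustrative; 64-bit operands with pos + width <= 64 assumed):

unsigned long long
bfxil_equiv (unsigned long long dst, unsigned long long src,
             unsigned pos, unsigned width)
{
  unsigned long long mask
    = (width == 64) ? ~0ULL : ((1ULL << width) - 1);
  return (dst & ~mask) | ((src >> pos) & mask);
}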
[PATCH, AArch64] Add vabs_s64 intrinsic
This patch implements the following intrinsic: int64x1_t vabs_s64 (int64x1_t a) It uses __builtin_llabs(), which will lead to "abs Dn, Dm" being generated for this now that my other patch has been committed. Test case added to scalar_intrinsics.c. OK for trunk? Cheers, Ian 2013-07-12 Ian Bolton gcc/ * config/aarch64/arm_neon.h (vabs_s64): New function. testsuite/ * gcc.target/aarch64/scalar_intrinsics.c (test_vabs_s64): Added new test. Index: gcc/config/aarch64/arm_neon.h === --- gcc/config/aarch64/arm_neon.h (revision 200594) +++ gcc/config/aarch64/arm_neon.h (working copy) @@ -17886,6 +17886,12 @@ vabsq_f64 (float64x2_t __a) return __builtin_aarch64_absv2df (__a); } +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vabs_s64 (int64x1_t a) +{ + return __builtin_llabs (a); +} + /* vadd */ __extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) Index: gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c === --- gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c (revision 200594) +++ gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c (working copy) @@ -32,6 +32,18 @@ test_vaddd_s64_2 (int64x1_t a, int64x1_t vqaddd_s64 (a, d)); } +/* { dg-final { scan-assembler-times "\\tabs\\td\[0-9\]+, d\[0-9\]+" 1 } } */ + +int64x1_t +test_vabs_s64 (int64x1_t a) +{ + uint64x1_t res; + force_simd (a); + res = vabs_s64 (a); + force_simd (res); + return res; +} + /* { dg-final { scan-assembler-times "\\tcmeq\\td\[0-9\]+, d\[0-9\]+, d\[0-9\]+" 1 } } */ uint64x1_t
RE: [PATCH, AArch64] Add vabs_s64 intrinsic
> On 12 Jul 2013, at 19:49, Ian Bolton wrote: > > > > > 2013-07-12 Ian Bolton > > > > gcc/ > > * config/aarch64/arm_neon.h (vabs_s64): New function. > > > > testsuite/ > > * gcc.target/aarch64/scalar_intrinsics.c (test_vabs_s64): Added > new > > test. > > OK > /Marcus I needed to update the patch to match argument naming conventions and function ordering conventions. Here is the new one. OK for commit? Cheers, Ian Index: gcc/config/aarch64/arm_neon.h === --- gcc/config/aarch64/arm_neon.h (revision 200594) +++ gcc/config/aarch64/arm_neon.h (working copy) @@ -17874,6 +17874,12 @@ vabs_f32 (float32x2_t __a) return __builtin_aarch64_absv2sf (__a); } +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vabs_s64 (int64x1_t __a) +{ + return __builtin_llabs (__a); +} + __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) vabsq_f32 (float32x4_t __a) { Index: gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c === --- gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c(revision 200594) +++ gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c(working copy) @@ -32,6 +32,18 @@ test_vaddd_s64_2 (int64x1_t a, int64x1_t vqaddd_s64 (a, d)); } +/* { dg-final { scan-assembler-times "\\tabs\\td\[0-9\]+, d\[0-9\]+" 1 } } */ + +int64x1_t +test_vabs_s64 (int64x1_t a) +{ + uint64x1_t res; + force_simd (a); + res = vabs_s64 (a); + force_simd (res); + return res; +} + /* { dg-final { scan-assembler-times "\\tcmeq\\td\[0-9\]+, d\[0-9\]+, d\[0-9\]+" 1 } } */ uint64x1_t
[PATCH, AArch64] Support NEG in vector registers for DI and SI mode
Support added for the scalar NEG instruction in vector registers. Execution testcase included. Tested with the usual GCC Linux regression runs. OK for trunk? Cheers, Ian 2013-07-23 Ian Bolton gcc/ * config/aarch64/aarch64.md (neg2): Offer alternative that uses vector registers. * config/aarch64/iterators.md (rtn, vas): New mode attributes. testsuite/ * gcc.target/aarch64/neg_1.c: New test.diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index e88e5be..d76056c 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -2004,12 +2004,17 @@ ) (define_insn "neg2" - [(set (match_operand:GPI 0 "register_operand" "=r") - (neg:GPI (match_operand:GPI 1 "register_operand" "r")))] + [(set (match_operand:GPI 0 "register_operand" "=r,w") + (neg:GPI (match_operand:GPI 1 "register_operand" "r,w")))] "" - "neg\\t%0, %1" + "@ + neg\\t%0, %1 + neg\\t%0, %1" [(set_attr "v8type" "alu") - (set_attr "mode" "")] + (set_attr "simd_type" "*,simd_negabs") + (set_attr "simd" "*,yes") + (set_attr "mode" "") + (set_attr "simd_mode" "")] ) ;; zero_extend version of above diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index 8e40c5d..7acbcfd 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -277,6 +277,12 @@ (V2DI "") (V2SF "") (V4SF "") (V2DF "")]) +;; Register Type Name and Vector Arrangement Specifier for when +;; we are doing scalar for DI and SIMD for SI (ignoring all but +;; lane 0). +(define_mode_attr rtn [(DI "d") (SI "")]) +(define_mode_attr vas [(DI "") (SI ".2s")]) + ;; Map a floating point mode to the appropriate register name prefix (define_mode_attr s [(SF "s") (DF "d")]) diff --git a/gcc/testsuite/gcc.target/aarch64/neg_1.c b/gcc/testsuite/gcc.target/aarch64/neg_1.c new file mode 100644 index 000..04b0fdd --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/neg_1.c @@ -0,0 +1,67 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -fno-inline --save-temps" } */ + +extern void abort (void); + +long long +neg64 (long long a) +{ + /* { dg-final { scan-assembler "neg\tx\[0-9\]+" } } */ + return 0 - a; +} + +long long +neg64_in_dreg (long long a) +{ + /* { dg-final { scan-assembler "neg\td\[0-9\]+, d\[0-9\]+" } } */ + register long long x asm ("d8") = a; + register long long y asm ("d9"); + asm volatile ("" : : "w" (x)); + y = 0 - x; + asm volatile ("" : : "w" (y)); + return y; +} + +int +neg32 (int a) +{ + /* { dg-final { scan-assembler "neg\tw\[0-9\]+" } } */ + return 0 - a; +} + +int +neg32_in_sreg (int a) +{ + /* { dg-final { scan-assembler "neg\tv\[0-9\]+\.2s, v\[0-9\]+\.2s" } } */ + register int x asm ("s8") = a; + register int y asm ("s9"); + asm volatile ("" : : "w" (x)); + y = 0 - x; + asm volatile ("" : : "w" (y)); + return y; +} + +int +main (void) +{ + long long a; + int b; + a = 61; + b = 313; + + if (neg64 (a) != -61) +abort (); + + if (neg64_in_dreg (a) != -61) +abort (); + + if (neg32 (b) != -313) +abort (); + + if (neg32_in_sreg (b) != -313) +abort (); + + return 0; +} + +/* { dg-final { cleanup-saved-temps } } */
[PATCH, AArch64] Add secondary reload for immediates into FP_REGS
Our movdi_aarch64 pattern allows moving a constant into an FP_REG, but has the constraint Dd, which is stricter than the one for moving a constant into a CORE_REG. This is due to restricted values allowed for MOVI instructions. Due to the predicate for the pattern allowing any constant that is valid for the CORE_REGs, we can run into situations where IRA/reload has decided to use FP_REGs but the value is not actually valid for MOVI. This patch introduces a secondary reload to handle this case. Supplied with the testcase that highlighted the original problem. Tested with GNU/Linux regression runs. OK for trunk? Cheers, Ian 2013-07-30 Ian Bolton gcc/ * config/aarch64/aarch64.c (aarch64_secondary_reload): Handle constant into FP_REGs that is not valid for MOVI. testsuite/ * gcc.target/aarch64/movdi_1.c: New test.diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 9941d7c..f16988e 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -4070,6 +4070,15 @@ aarch64_secondary_reload (bool in_p ATTRIBUTE_UNUSED, rtx x, if (rclass == FP_REGS && (mode == TImode || mode == TFmode) && CONSTANT_P(x)) return CORE_REGS; + /* Only a subset of the DImode immediate values valid for CORE_REGS are + valid for FP_REGS. Where we have an immediate value that isn't valid + for FP_REGS, and RCLASS is FP_REGS, we return CORE_REGS to cause the + value to be generated into there first and later copied to FP_REGS to be + used. */ + if (rclass == FP_REGS && mode == DImode && CONST_INT_P (x) + && !aarch64_simd_imm_scalar_p (x, GET_MODE (x))) +return CORE_REGS; + return NO_REGS; } diff --git a/gcc/testsuite/gcc.target/aarch64/movdi_1.c b/gcc/testsuite/gcc.target/aarch64/movdi_1.c new file mode 100644 index 000..1decd99 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/movdi_1.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fno-inline" } */ + +#include <arm_neon.h> + +void +foo (uint64_t *a) +{ + uint64x1_t val18; + uint32x2_t val19; + uint64x1_t val20; + val19 = vcreate_u32 (0x80004cf3dffbUL); + val20 = vrsra_n_u64 (val18, vreinterpret_u64_u32 (val19), 34); + vst1_u64 (a, val20); +}
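The underlying asymmetry: the 64-bit scalar MOVI immediate is a byte mask, i.e. each of the eight bytes of the value must be all-zeros or all-ones, while the MOV/MOVN/bitmask immediates accepted for CORE_REGS are far more general. A standalone sketch of the byte-mask check (an illustrative restatement of what aarch64_simd_imm_scalar_p accepts, not the GCC source itself):

#include <stdbool.h>
#include <stdint.h>

static bool
movi_d_immediate_p (uint64_t val)
{
  for (int i = 0; i < 8; i++)
    {
      uint8_t byte = (val >> (8 * i)) & 0xff;
      if (byte != 0x00 && byte != 0xff)
        return false;   /* e.g. the 0x4c byte in the testcase's
                           0x80004cf3dffb constant.  */
    }
  return true;
}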
[PATCH, AArch64] Support EXTR in backend
We couldn't generate EXTR for AArch64 ... until now! This patch includes the pattern and a test. Full regression testing for Linux and bare-metal passed. OK for trunk stage-1? Thanks, Ian 2013-03-14 Ian Bolton gcc/ * config/aarch64/aarch64.md (*extr5_insn): New pattern. (*extrsi5_insn_uxtw): Likewise. testsuite/ * gcc.target/aarch64/extr.c: New test. diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 73d86a7..ef1c0f3 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -2703,6 +2703,34 @@ (set_attr "mode" "")] ) +(define_insn "*extr5_insn" + [(set (match_operand:GPI 0 "register_operand" "=r") + (ior:GPI (ashift:GPI (match_operand:GPI 1 "register_operand" "r") +(match_operand 3 "const_int_operand" "n")) +(lshiftrt:GPI (match_operand:GPI 2 "register_operand" "r") + (match_operand 4 "const_int_operand" "n"] + "UINTVAL (operands[3]) < GET_MODE_BITSIZE (mode) && + (UINTVAL (operands[3]) + UINTVAL (operands[4]) == GET_MODE_BITSIZE (mode))" + "extr\\t%0, %1, %2, %4" + [(set_attr "v8type" "shift") + (set_attr "mode" "")] +) + +;; zero_extend version of the above +(define_insn "*extrsi5_insn_uxtw" + [(set (match_operand:DI 0 "register_operand" "=r") + (zero_extend:DI +(ior:SI (ashift:SI (match_operand:SI 1 "register_operand" "r") + (match_operand 3 "const_int_operand" "n")) +(lshiftrt:SI (match_operand:SI 2 "register_operand" "r") + (match_operand 4 "const_int_operand" "n")] + "UINTVAL (operands[3]) < 32 && + (UINTVAL (operands[3]) + UINTVAL (operands[4]) == 32)" + "extr\\t%w0, %w1, %w2, %4" + [(set_attr "v8type" "shift") + (set_attr "mode" "SI")] +) + (define_insn "*_ashl" [(set (match_operand:GPI 0 "register_operand" "=r") (ANY_EXTEND:GPI diff --git a/gcc/testsuite/gcc.target/aarch64/extr.c b/gcc/testsuite/gcc.target/aarch64/extr.c new file mode 100644 index 000..a78dd8d --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/extr.c @@ -0,0 +1,34 @@ +/* { dg-options "-O2 --save-temps" } */ +/* { dg-do run } */ + +extern void abort (void); + +int +test_si (int a, int b) +{ + /* { dg-final { scan-assembler "extr\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, 27\n" } } */ + return (a << 5) | ((unsigned int) b >> 27); +} + +long long +test_di (long long a, long long b) +{ + /* { dg-final { scan-assembler "extr\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+, 45\n" } } */ + return (a << 19) | ((unsigned long long) b >> 45); +} + +int +main () +{ + int v; + long long w; + v = test_si (0x0004, 0x3000); + if (v != 0x0086) +abort(); + w = test_di (0x0001040040040004ll, 0x00700500ll); + if (w != 0x2002002000200380ll) +abort(); + return 0; +} + +/* { dg-final { cleanup-saved-temps } } */
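EXTR concatenates its two source registers into a double-width value and extracts a register-width field: extr Xd, Xn, Xm, lsb yields bits [lsb + 63 : lsb] of Xn:Xm. The pattern condition encodes the requirement that the two shift amounts be complementary. A sketch of the source idiom (the function name is illustrative):

/* 19 + 45 == 64, so the two shifted halves tile the result exactly
   and this is a single "extr x0, x1, x2, 45"; if the amounts did
   not sum to the register width, bits would overlap or be lost and
   the pattern would not apply.  */
unsigned long long
funnel (unsigned long long a, unsigned long long b)
{
  return (a << 19) | (b >> 45);
}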
[PATCH, AArch64] Support ROR in backend
We couldn't generate ROR (preferred alias of EXTR when both source registers
are the same) for AArch64, when rotating by an immediate, ... until now!
This patch includes the pattern and a test.  Full regression testing for
Linux and bare-metal passed.

OK for trunk stage-1?

Thanks,
Ian

2013-03-14  Ian Bolton

gcc/
	* config/aarch64/aarch64.md (*ror<mode>3_insn): New pattern.
	(*rorsi3_insn_uxtw): Likewise.
testsuite/
	* gcc.target/aarch64/ror.c: New test.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index ef1c0f3..367c0e3 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2731,6 +2731,34 @@
    (set_attr "mode" "SI")]
 )
 
+(define_insn "*ror<mode>3_insn"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(rotate:GPI (match_operand:GPI 1 "register_operand" "r")
+		    (match_operand 2 "const_int_operand" "n")))]
+  "UINTVAL (operands[2]) < GET_MODE_BITSIZE (<MODE>mode)"
+{
+  operands[3] = GEN_INT (<sizen> - UINTVAL (operands[2]));
+  return "ror\\t%<w>0, %<w>1, %3";
+}
+  [(set_attr "v8type" "shift")
+   (set_attr "mode" "<MODE>")]
+)
+
+;; zero_extend version of the above
+(define_insn "*rorsi3_insn_uxtw"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(zero_extend:DI
+	 (rotate:SI (match_operand:SI 1 "register_operand" "r")
+		    (match_operand 2 "const_int_operand" "n"))))]
+  "UINTVAL (operands[2]) < 32"
+{
+  operands[3] = GEN_INT (32 - UINTVAL (operands[2]));
+  return "ror\\t%w0, %w1, %3";
+}
+  [(set_attr "v8type" "shift")
+   (set_attr "mode" "SI")]
+)
+
 (define_insn "*<optab><ALLX:mode>_ashl<GPI:mode>"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 	(ANY_EXTEND:GPI
diff --git a/gcc/testsuite/gcc.target/aarch64/ror.c b/gcc/testsuite/gcc.target/aarch64/ror.c
new file mode 100644
index 0000000..4d266f0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ror.c
@@ -0,0 +1,34 @@
+/* { dg-options "-O2 --save-temps" } */
+/* { dg-do run } */
+
+extern void abort (void);
+
+int
+test_si (int a)
+{
+  /* { dg-final { scan-assembler "ror\tw\[0-9\]+, w\[0-9\]+, 27\n" } } */
+  return (a << 5) | ((unsigned int) a >> 27);
+}
+
+long long
+test_di (long long a)
+{
+  /* { dg-final { scan-assembler "ror\tx\[0-9\]+, x\[0-9\]+, 45\n" } } */
+  return (a << 19) | ((unsigned long long) a >> 45);
+}
+
+int
+main ()
+{
+  int v;
+  long long w;
+  v = test_si (0x0203050);
+  if (v != 0x4060a00)
+    abort ();
+  w = test_di (0x020506010304ll);
+  if (w != 0x102830081820ll)
+    abort ();
+  return 0;
+}
+
+/* { dg-final { cleanup-saved-temps } } */
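Similarly, a rotate-by-immediate written in the usual shift-and-or idiom
should now become a single ROR (my example, mirroring the test):

  /* Rotate left by 5 == rotate right by 27; should compile to
     "ror w0, w0, 27" at -O2.  */
  unsigned int
  rotl5 (unsigned int x)
  {
    return (x << 5) | (x >> 27);
  }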
[PATCH, AArch64] Support SBC in the backend
We couldn't generate SBC for AArch64 ... until now!  This patch includes
the main pattern, a zero_extend form of it and a test.  Full regression
testing for Linux and bare-metal passed.

OK for trunk stage-1?

Thanks,
Ian

2013-03-14  Ian Bolton

gcc/
	* config/aarch64/aarch64.md (*sub<mode>3_carryin): New pattern.
	(*subsi3_carryin_uxtw): Likewise.
testsuite/
	* gcc.target/aarch64/sbc.c: New test.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 4358b44..c99e188 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1790,6 +1790,34 @@
    (set_attr "mode" "SI")]
 )
 
+(define_insn "*sub<mode>3_carryin"
+  [(set
+    (match_operand:GPI 0 "register_operand" "=r")
+    (minus:GPI (minus:GPI
+		(match_operand:GPI 1 "register_operand" "r")
+		(ltu:GPI (reg:CC CC_REGNUM) (const_int 0)))
+	       (match_operand:GPI 2 "register_operand" "r")))]
+   ""
+   "sbc\\t%<w>0, %<w>1, %<w>2"
+  [(set_attr "v8type" "adc")
+   (set_attr "mode" "<MODE>")]
+)
+
+;; zero_extend version of the above
+(define_insn "*subsi3_carryin_uxtw"
+  [(set
+    (match_operand:DI 0 "register_operand" "=r")
+    (zero_extend:DI
+     (minus:SI (minus:SI
+		(match_operand:SI 1 "register_operand" "r")
+		(ltu:SI (reg:CC CC_REGNUM) (const_int 0)))
+	       (match_operand:SI 2 "register_operand" "r"))))]
+   ""
+   "sbc\\t%w0, %w1, %w2"
+  [(set_attr "v8type" "adc")
+   (set_attr "mode" "SI")]
+)
+
 (define_insn "*sub_uxt<mode>_multp2"
   [(set (match_operand:GPI 0 "register_operand" "=rk")
 	(minus:GPI (match_operand:GPI 4 "register_operand" "r")
diff --git a/gcc/testsuite/gcc.target/aarch64/sbc.c b/gcc/testsuite/gcc.target/aarch64/sbc.c
new file mode 100644
index 0000000..e479910
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sbc.c
@@ -0,0 +1,41 @@
+/* { dg-do run } */
+/* { dg-options "-O2 --save-temps" } */
+
+extern void abort (void);
+
+typedef unsigned int u32int;
+typedef unsigned long long u64int;
+
+u32int
+test_si (u32int w1, u32int w2, u32int w3, u32int w4)
+{
+  u32int w0;
+  /* { dg-final { scan-assembler "sbc\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+\n" } } */
+  w0 = w1 - w2 - (w3 < w4);
+  return w0;
+}
+
+u64int
+test_di (u64int x1, u64int x2, u64int x3, u64int x4)
+{
+  u64int x0;
+  /* { dg-final { scan-assembler "sbc\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+\n" } } */
+  x0 = x1 - x2 - (x3 < x4);
+  return x0;
+}
+
+int
+main ()
+{
+  u32int x;
+  u64int y;
+  x = test_si (7, 8, 12, 15);
+  if (x != -2)
+    abort ();
+  y = test_di (0x987654321ll, 0x123456789ll, 0x345345345ll, 0x123123123ll);
+  if (y != 0x8641fdb98ll)
+    abort ();
+  return 0;
+}
+
+/* { dg-final { cleanup-saved-temps } } */
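The classic consumer of SBC is multi-word arithmetic; with this pattern a
128-bit subtraction should become SUBS for the low halves followed by SBC
for the high halves (my example, assuming a target with __int128 support):

  /* Expected at -O2: "subs x0, x0, x2" then "sbc x1, x1, x3".  */
  unsigned __int128
  sub128 (unsigned __int128 a, unsigned __int128 b)
  {
    return a - b;
  }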
RE: [PING^1] [AArch64] Implement Bitwise AND and Set Flags
> Please consider this as a reminder to review the patch posted at > following link:- > http://gcc.gnu.org/ml/gcc-patches/2013-01/msg01374.html > > The patch is slightly modified to use CC_NZ mode instead of CC. > > Please review the patch and let me know if its okay? > Hi Naveen, With the CC_NZ fix, the patch looks good apart from one thing: the second "set" in each pattern should have the "=r,rk" constraint rather than just "=r,r". That said, I've attached a patch that provides more thorough test cases, including execute ones. When you get commit approval (which will be after GCC goes into stage 1 again) then I can add in the test cases. You might as well run them now though, for more confidence in your work. BTW, I have an implementation of BICS that's been waiting for GCC to hit stage 1. I'll send that out for review soon. NOTE: I do not have maintainer powers here, so you need someone else to give the OK to your patch. Cheers, Ian diff --git a/gcc/testsuite/gcc.target/aarch64/ands1.c b/gcc/testsuite/gcc.target/aarch64/ands1.c new file mode 100644 index 000..e2bf956 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/ands1.c @@ -0,0 +1,150 @@ +/* { dg-do run } */ +/* { dg-options "-O2 --save-temps" } */ + +extern void abort (void); + +int +ands_si_test1 (int a, int b, int c) +{ + int d = a & b; + + /* { dg-final { scan-assembler "ands\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +int +ands_si_test2 (int a, int b, int c) +{ + int d = a & 0xff; + + /* { dg-final { scan-assembler "ands\tw\[0-9\]+, w\[0-9\]+, 255" } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +int +ands_si_test3 (int a, int b, int c) +{ + int d = a & (b << 3); + + /* { dg-final { scan-assembler "ands\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3" } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +typedef long long s64; + +s64 +ands_di_test1 (s64 a, s64 b, s64 c) +{ + s64 d = a & b; + + /* { dg-final { scan-assembler "ands\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+" } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +s64 +ands_di_test2 (s64 a, s64 b, s64 c) +{ + s64 d = a & 0xff; + + /* { dg-final { scan-assembler "ands\tx\[0-9\]+, x\[0-9\]+, 255" } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +s64 +ands_di_test3 (s64 a, s64 b, s64 c) +{ + s64 d = a & (b << 3); + + /* { dg-final { scan-assembler "ands\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+, lsl 3" } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +int main () +{ + int x; + s64 y; + + x = ands_si_test1 (29, 4, 5); + if (x != 13) +abort(); + + x = ands_si_test1 (5, 2, 20); + if (x != 25) +abort(); + + x = ands_si_test2 (29, 4, 5); + if (x != 38) +abort(); + + x = ands_si_test2 (1024, 2, 20); + if (x != 1044) +abort(); + + x = ands_si_test3 (35, 4, 5); + if (x != 41) +abort(); + + x = ands_si_test3 (5, 2, 20); + if (x != 25) +abort(); + + y = ands_di_test1 (0x13029ll, + 0x32004ll, + 0x505050505ll); + + if (y != ((0x13029ll & 0x32004ll) + 0x32004ll + 0x505050505ll)) +abort(); + + y = ands_di_test1 (0x5000500050005ll, + 0x2111211121112ll, + 0x02020ll); + if (y != 0x5000500052025ll) +abort(); + + y = ands_di_test2 (0x13029ll, + 0x32004ll, + 0x505050505ll); + if (y != ((0x13029ll & 0xff) + 0x32004ll + 0x505050505ll)) +abort(); + + y = ands_di_test2 (0x130002900ll, + 0x32004ll, + 0x505050505ll); + if (y != (0x130002900ll + 0x505050505ll)) +abort(); + + y = ands_di_test3 (0x13029ll, + 0x06408ll, + 0x505050505ll); + if (y != ((0x13029ll & (0x06408ll << 3)) + + 
0x06408ll + 0x505050505ll)) +abort(); + + y = ands_di_test3 (0x130002900ll, + 0x08808ll, + 0x505050505ll); + if (y != (0x130002900ll + 0x505050505ll)) +abort(); + + return 0; +} + +/* { dg-final { cleanup-saved-temps } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/ands2.c b/gcc/testsuite/gcc.target/aarch64/ands2.c new file mode 100644 index 000..c778a54 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/ands2.c @@ -0,0 +1,156 @@ +/* { dg-do run } */ +/* { dg-options "-O2 --save-temps" } */ + +extern void abort (void); + +int +ands_si_test1 (int a, int b, int c) +{ + int d = a & b; + + /* { dg-final { scan-assembler-not "ands\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" } } */ + /* { dg-final { scan-assembler "and\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" } } */ + if (d <= 0) +return a + c; + else +return b + d + c; +} + +int +ands_si_test2 (int a, int b, int c) +{ + int d = a & 0x; + + /* { dg-final { scan-assembler-not "ands
[PATCH, AArch64] Make MOVK output operand 2 in hex
MOVK should not be generated with a negative immediate, which the assembler
rightfully rejects.  This patch makes MOVK output its second operand in hex
instead.  Tested on bare-metal and Linux.

OK for trunk?

Cheers,
Ian

2013-03-20  Ian Bolton

gcc/
	* config/aarch64/aarch64.c (aarch64_print_operand): New
	format specifier for printing a constant in hex.
	* config/aarch64/aarch64.md (insv_imm<mode>): Use the X format
	specifier for printing second operand.
testsuite/
	* gcc.target/aarch64/movk.c: New test.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1404a70..5e51630 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3365,6 +3365,16 @@ aarch64_print_operand (FILE *f, rtx x, char code)
 	       REGNO (x) - V0_REGNUM + (code - 'S'));
       break;
 
+    case 'X':
+      /* Print integer constant in hex.  */
+      if (GET_CODE (x) != CONST_INT)
+	{
+	  output_operand_lossage ("invalid operand for '%%%c'", code);
+	  return;
+	}
+      asm_fprintf (f, "0x%x", UINTVAL (x));
+      break;
+
     case 'w':
     case 'x':
       /* Print a general register name or the zero register (32-bit or
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 40e66db..9c89413 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -850,8 +850,8 @@
 	 (match_operand:GPI 2 "const_int_operand" "n"))]
   "INTVAL (operands[1]) < GET_MODE_BITSIZE (<MODE>mode)
    && INTVAL (operands[1]) % 16 == 0
-   && INTVAL (operands[2]) <= 0xffff"
-  "movk\\t%<w>0, %2, lsl %1"
+   && UINTVAL (operands[2]) <= 0xffff"
+  "movk\\t%<w>0, %X2, lsl %1"
   [(set_attr "v8type" "movk")
    (set_attr "mode" "<MODE>")]
 )
diff --git a/gcc/testsuite/gcc.target/aarch64/movk.c b/gcc/testsuite/gcc.target/aarch64/movk.c
new file mode 100644
index 0000000..e4b2209
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movk.c
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-options "-O2 --save-temps -fno-inline" } */
+
+extern void abort (void);
+
+long long int
+dummy_number_generator ()
+{
+  /* { dg-final { scan-assembler "movk\tx\[0-9\]+, 0xefff, lsl 16" } } */
+  /* { dg-final { scan-assembler "movk\tx\[0-9\]+, 0xc4cc, lsl 32" } } */
+  /* { dg-final { scan-assembler "movk\tx\[0-9\]+, 0xfffe, lsl 48" } } */
+  return -346565474575675;
+}
+
+int
+main (void)
+{
+
+  long long int num = dummy_number_generator ();
+  if (num > 0)
+    abort ();
+
+  /* { dg-final { scan-assembler "movk\tx\[0-9\]+, 0x4667, lsl 16" } } */
+  /* { dg-final { scan-assembler "movk\tx\[0-9\]+, 0x7a3d, lsl 32" } } */
+  if (num / 69313094915135 != -5)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { cleanup-saved-temps } } */
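To see why hex output matters here, a worked breakdown (my arithmetic) of the
test constant:

  /* -346565474575675 == 0xfffec4ccefff1ac5
     half-words: 0xfffe (lsl 48), 0xc4cc (lsl 32),
                 0xefff (lsl 16), 0x1ac5 (lsl 0)
     0xefff read as a signed 16-bit value is -4097, so decimal output
     would produce "movk x0, -4097, lsl 16", which the assembler
     rejects; "movk x0, 0xefff, lsl 16" always assembles.  */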
[PATCH AArch64] Make omit-frame-pointer work correctly
Currently, if you compile with -fomit-frame-pointer, the frame record and
frame pointer are still maintained (i.e. there is no way to get the behaviour
you are asking for!).  This patch fixes that.

It also makes sure that if you ask for no frame pointers in leaf functions
then they are not generated there, unless LR gets clobbered in the leaf for
some reason.  (I have testcases here to check for that.)

OK to commit to trunk?

Cheers,
Ian

2013-03-28  Ian Bolton

gcc/
	* config/aarch64/aarch64.c (aarch64_can_eliminate): Only keep
	frame record when required.
testsuite/
	* gcc.target/aarch64/inc/asm-adder-clobber-lr.c: New test.
	* gcc.target/aarch64/inc/asm-adder-no-clobber-lr.c: Likewise.
	* gcc.target/aarch64/test-framepointer-1.c: Likewise.
	* gcc.target/aarch64/test-framepointer-2.c: Likewise.
	* gcc.target/aarch64/test-framepointer-3.c: Likewise.
	* gcc.target/aarch64/test-framepointer-4.c: Likewise.
	* gcc.target/aarch64/test-framepointer-5.c: Likewise.
	* gcc.target/aarch64/test-framepointer-6.c: Likewise.
	* gcc.target/aarch64/test-framepointer-7.c: Likewise.
	* gcc.target/aarch64/test-framepointer-8.c: Likewise.

Index: gcc/testsuite/gcc.target/aarch64/test-framepointer-2.c
===
--- gcc/testsuite/gcc.target/aarch64/test-framepointer-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/aarch64/test-framepointer-2.c	(revision 0)
@@ -0,0 +1,15 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fomit-frame-pointer -mno-omit-leaf-frame-pointer -fno-inline --save-temps" } */
+
+#include "asm-adder-no-clobber-lr.c"
+
+/* omit-frame-pointer is TRUE.
+   omit-leaf-frame-pointer is false, but irrelevant due to omit-frame-pointer.
+   LR is not being clobbered in the leaf.
+
+   Since we asked to have no frame pointers anywhere, we expect no frame
+   record in main or the leaf.  */
+
+/* { dg-final { scan-assembler-not "stp\tx29, x30, \\\[sp, -\[0-9\]+\\\]!" } } */
+
+/* { dg-final { cleanup-saved-temps } } */
Index: gcc/testsuite/gcc.target/aarch64/test-framepointer-6.c
===
--- gcc/testsuite/gcc.target/aarch64/test-framepointer-6.c	(revision 0)
+++ gcc/testsuite/gcc.target/aarch64/test-framepointer-6.c	(revision 0)
@@ -0,0 +1,15 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fomit-frame-pointer -mno-omit-leaf-frame-pointer -fno-inline --save-temps" } */
+
+#include "asm-adder-clobber-lr.c"
+
+/* omit-frame-pointer is TRUE.
+   omit-leaf-frame-pointer is false, but irrelevant due to omit-frame-pointer.
+   LR is being clobbered in the leaf.
+
+   Since we asked to have no frame pointers anywhere, we expect no frame
+   record in main or the leaf.  */
+
+/* { dg-final { scan-assembler-not "stp\tx29, x30, \\\[sp, -\[0-9\]+\\\]!" } } */
+
+/* { dg-final { cleanup-saved-temps } } */
Index: gcc/testsuite/gcc.target/aarch64/test-framepointer-3.c
===
--- gcc/testsuite/gcc.target/aarch64/test-framepointer-3.c	(revision 0)
+++ gcc/testsuite/gcc.target/aarch64/test-framepointer-3.c	(revision 0)
@@ -0,0 +1,15 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fomit-frame-pointer -momit-leaf-frame-pointer -fno-inline --save-temps" } */
+
+#include "asm-adder-no-clobber-lr.c"
+
+/* omit-frame-pointer is TRUE.
+   omit-leaf-frame-pointer is true, but irrelevant due to omit-frame-pointer.
+   LR is not being clobbered in the leaf.
+
+   Since we asked to have no frame pointers anywhere, we expect no frame
+   record in main or the leaf.  */
+
+/* { dg-final { scan-assembler-not "stp\tx29, x30, \\\[sp, -\[0-9\]+\\\]!" } } */
+
+/* { dg-final { cleanup-saved-temps } } */
Index: gcc/testsuite/gcc.target/aarch64/test-framepointer-7.c
===
--- gcc/testsuite/gcc.target/aarch64/test-framepointer-7.c	(revision 0)
+++ gcc/testsuite/gcc.target/aarch64/test-framepointer-7.c	(revision 0)
@@ -0,0 +1,15 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fomit-frame-pointer -momit-leaf-frame-pointer -fno-inline --save-temps" } */
+
+#include "asm-adder-clobber-lr.c"
+
+/* omit-frame-pointer is TRUE.
+   omit-leaf-frame-pointer is true, but irrelevant due to omit-frame-pointer.
+   LR is being clobbered in the leaf.
+
+   Since we asked to have no frame pointers anywhere, we expect no frame
+   record in main or the leaf.  */
+
+/* { dg-final { scan-assembler-not "stp\tx29, x30, \\\[sp, -\[0-9\]+\\\]!" } } */
+
+/* { dg-final { cleanup-saved-temps } } */
Index: gcc/testsuite/gcc.target/aarch64/asm-adder-clobber-lr.c
===
--- gcc/testsuite/gcc.target/aarch64/asm-adder-clobber-lr.c	(revi
[PATCH, AArch64] Enable Redundant Extension Elimination by default at O2 or higher
This patch enables the Redundant Extension Elimination pass for AArch64.
Testing shows no regressions on Linux and bare-metal.  In terms of
performance impact, it reduces code-size for some benchmarks and makes no
difference on others.

OK to commit to trunk?

Cheers,
Ian

2013-04-24  Ian Bolton

	* common/config/aarch64/aarch64-common.c: Enable REE pass at O2
	or higher by default.

Index: gcc/common/config/aarch64/aarch64-common.c
===
--- gcc/common/config/aarch64/aarch64-common.c	(revision 198231)
+++ gcc/common/config/aarch64/aarch64-common.c	(working copy)
@@ -44,6 +44,8 @@ static const struct default_options aarch64_option_optimization_table[] =
   {
     /* Enable section anchors by default at -O1 or higher.  */
     { OPT_LEVELS_1_PLUS, OPT_fsection_anchors, NULL, 1 },
+    /* Enable redundant extension instructions removal at -O2 and higher.  */
+    { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
     { OPT_LEVELS_NONE, 0, NULL, 0 }
   };
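A minimal illustration of what REE catches on AArch64 (my example): a byte
load already zero-extends into the full register, so a later explicit
extension of the same value is redundant and can be deleted:

  /* "ldrb w0, [x0]" already zero-extends; REE should remove any
     following redundant uxtb/and of the loaded value.  */
  unsigned int
  load_byte (unsigned char *p)
  {
    unsigned char c = *p;
    return c;
  }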
[PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible
Hi,

It currently takes 4 instructions to generate certain immediates on AArch64
(unless we put them in the constant pool).  For example ...

long long
ffffbeefcafebabe ()
{
  return 0xFFFFBEEFCAFEBABEll;
}

leads to ...

	mov	x0, 47806
	movk	x0, 0xcafe, lsl 16
	movk	x0, 0xbeef, lsl 32
	orr	x0, x0, -281474976710656

The above case is tackled in this patch by employing MOVN to generate the
top 32-bits in a single instruction ...

	mov	x0, -71536975282177
	movk	x0, 0xcafe, lsl 16
	movk	x0, 0xbabe, lsl 0

Note that where at least two half-words are 0xffff, existing code that does
the immediate in two instructions is still used.

Tested on standard gcc regressions and the attached test case.

OK for commit?

Cheers,
Ian

2014-05-08  Ian Bolton

gcc/
	* config/aarch64/aarch64.c (aarch64_expand_mov_immediate):
	Use MOVN when top-most half-word (and only that half-word)
	is 0xffff.
gcc/testsuite/
	* gcc.target/aarch64/movn_1.c: New test.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 43a83566..a8e504e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1177,6 +1177,18 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 	}
     }
 
+  /* Look for case where upper 16 bits are set, so we can use MOVN.  */
+  if ((val & 0xffff000000000000ll) == 0xffff000000000000ll)
+    {
+      emit_insn (gen_rtx_SET (VOIDmode, dest,
+			      GEN_INT (~ (~val & (0xffffll << 32)))));
+      emit_insn (gen_insv_immdi (dest, GEN_INT (16),
+				 GEN_INT ((val >> 16) & 0xffff)));
+      emit_insn (gen_insv_immdi (dest, GEN_INT (0),
+				 GEN_INT (val & 0xffff)));
+      return;
+    }
+
  simple_sequence:
   first = true;
   mask = 0xffff;
diff --git a/gcc/testsuite/gcc.target/aarch64/movn_1.c b/gcc/testsuite/gcc.target/aarch64/movn_1.c
new file mode 100644
index 0000000..cc11ade
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movn_1.c
@@ -0,0 +1,27 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-inline --save-temps" } */
+
+extern void abort (void);
+
+long long
+foo ()
+{
+  /* { dg-final { scan-assembler "mov\tx\[0-9\]+, -71536975282177" } } */
+  return 0xffffbeefcafebabell;
+}
+
+long long
+merge4 (int a, int b, int c, int d)
+{
+  return ((long long) a << 48 | (long long) b << 32
+	  | (long long) c << 16 | (long long) d);
+}
+
+int main ()
+{
+  if (foo () != merge4 (0xffff, 0xbeef, 0xcafe, 0xbabe))
+    abort ();
+  return 0;
+}
+
+/* { dg-final { cleanup-saved-temps } } */
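The arithmetic behind the 3-instruction sequence, worked through (my
derivation from the patch):

  /* val                        = 0xffffbeefcafebabe
     ~val                       = 0x0000411035014541
     ~val & (0xffffll << 32)    = 0x0000411000000000
     ~(~val & (0xffffll << 32)) = 0xffffbeefffffffff  (== -71536975282177)
     That value is all-ones except a single half-word, which is exactly
     what MOVN can encode; the two MOVKs then patch in 0xcafe and 0xbabe.  */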
[PATCH, AArch64] Fix macro in vdup_lane_2 test case
This patch fixes a defective macro definition, based on the correct
definition in similar testcases.  The test currently passes through luck
rather than correctness.

OK for commit?

Cheers,
Ian

2014-05-08  Ian Bolton

gcc/testsuite
	* gcc.target/aarch64/vdup_lane_2.c (force_simd): Emit an
	actual instruction to move into the allocated register.

diff --git a/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c b/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c
index 7c04e75..2072c79 100644
--- a/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c
@@ -4,10 +4,11 @@
 
 #include <arm_neon.h>
 
-#define force_simd(V1) asm volatile ("" \
-	  : "=w"(V1) \
-	  : "w"(V1) \
-	  : /* No clobbers */)
+/* Used to force a variable to a SIMD register.  */
+#define force_simd(V1) asm volatile ("orr %0.16b, %1.16b, %1.16b" \
+	  : "=w"(V1) \
+	  : "w"(V1) \
+	  : /* No clobbers */);
 
 extern void abort (void);
[PATCH, AArch64] Implement HARD_REGNO_CALLER_SAVE_MODE
Currently, on AArch64, when a caller-save register is saved/restored, GCC is
accessing the maximum size of the hard register.  So an SImode integer
(4 bytes) value is being stored as DImode (8 bytes) because the int registers
are 8 bytes wide, and an SFmode float (4 bytes) and DFmode double (8 bytes)
are being stored as TImode (16 bytes) to capture the full 128 bits of the
vector register.

This patch corrects this, by implementing the HARD_REGNO_CALLER_SAVE_MODE
hook, which is called by LRA to determine the minimum size it might need to
save/restore.

Tested on GCC regression suite and verified impact on a number of examples.

OK for trunk?

Cheers,
Ian

2014-05-12  Ian Bolton

	* config/aarch64/aarch64-protos.h
	(aarch64_hard_regno_caller_save_mode): New prototype.
	* config/aarch64/aarch64.c (aarch64_hard_regno_caller_save_mode):
	New function.
	* config/aarch64/aarch64.h (HARD_REGNO_CALLER_SAVE_MODE): New macro.

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 04cbc78..7cf7d9f 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -202,6 +202,8 @@ enum aarch64_symbol_type aarch64_classify_symbol (rtx,
 enum aarch64_symbol_type aarch64_classify_tls_symbol (rtx);
 enum reg_class aarch64_regno_regclass (unsigned);
 int aarch64_asm_preferred_eh_data_format (int, int);
+enum machine_mode aarch64_hard_regno_caller_save_mode (unsigned, unsigned,
+						       enum machine_mode);
 int aarch64_hard_regno_mode_ok (unsigned, enum machine_mode);
 int aarch64_hard_regno_nregs (unsigned, enum machine_mode);
 int aarch64_simd_attr_length_move (rtx);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 8655f04..c2cc81b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -424,6 +424,24 @@ aarch64_hard_regno_mode_ok (unsigned regno, enum machine_mode mode)
   return 0;
 }
 
+/* Implement HARD_REGNO_CALLER_SAVE_MODE.  */
+enum machine_mode
+aarch64_hard_regno_caller_save_mode (unsigned regno, unsigned nregs,
+				     enum machine_mode mode)
+{
+  /* Handle modes that fit within single registers.  */
+  if (nregs == 1 && GET_MODE_SIZE (mode) <= 16)
+    {
+      if (GET_MODE_SIZE (mode) >= 4)
+        return mode;
+      else
+        return SImode;
+    }
+  /* Fall back to generic for multi-reg and very large modes.  */
+  else
+    return choose_hard_reg_mode (regno, nregs, false);
+}
+
 /* Return true if calls to DECL should be treated as
    long-calls (ie called via a register).  */
 static bool
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index c9b30d0..0574593 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -824,6 +824,11 @@ do {					     \
 
 #define SHIFT_COUNT_TRUNCATED !TARGET_SIMD
 
+/* Choose appropriate mode for caller saves, so we do the minimum
+   required size of load/store.  */
+#define HARD_REGNO_CALLER_SAVE_MODE(REGNO, NREGS, MODE) \
+  aarch64_hard_regno_caller_save_mode ((REGNO), (NREGS), (MODE))
+
 /* Callee only saves lower 64-bits of a 128-bit register.  Tell the
    compiler the callee clobbers the top 64-bits when restoring the
    bottom 64-bits.  */
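A case where the new hook may change the spill code (my example; whether a
caller-save register is chosen depends on register pressure and
-fcaller-saves):

  extern void clobber (void);

  float
  across_call (float x)
  {
    clobber ();      /* x is live across the call ...  */
    return x + 1.0f; /* ... so it must be saved/restored around it;
                        a 4-byte SF access now suffices instead of a
                        16-byte TI access of the vector register.  */
  }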
RE: [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible
Ping. This should be relatively simple to review. Many thanks. > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Ian Bolton > Sent: 08 May 2014 18:36 > To: gcc-patches > Subject: [PATCH, AArch64] Use MOVN to generate 64-bit negative > immediates where sensible > > Hi, > > It currently takes 4 instructions to generate certain immediates on > AArch64 (unless we put them in the constant pool). > > For example ... > > long long > beefcafebabe () > { > return 0xBEEFCAFEBABEll; > } > > leads to ... > > mov x0, 0x47806 > mov x0, 0xcafe, lsl 16 > mov x0, 0xbeef, lsl 32 > orr x0, x0, -281474976710656 > > The above case is tackled in this patch by employing MOVN > to generate the top 32-bits in a single instruction ... > > mov x0, -71536975282177 > movk x0, 0xcafe, lsl 16 > movk x0, 0xbabe, lsl 0 > > Note that where at least two half-words are 0x, existing > code that does the immediate in two instructions is still used.) > > Tested on standard gcc regressions and the attached test case. > > OK for commit? > > Cheers, > Ian > > > 2014-05-08 Ian Bolton > > gcc/ > * config/aarch64/aarch64.c (aarch64_expand_mov_immediate): > Use MOVN when top-most half-word (and only that half-word) > is 0x. > gcc/testsuite/ > * gcc.target/aarch64/movn_1.c: New test.
RE: [PATCH, AArch64] Fix macro in vdup_lane_2 test case
Ping. This may well be classed as "obvious", but that's not obvious to me, so I request a review. Many thanks. > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Ian Bolton > Sent: 08 May 2014 18:42 > To: gcc-patches > Subject: [PATCH, AArch64] Fix macro in vdup_lane_2 test case > > This patch fixes a defective macro definition, based on correct > definition in similar testcases. The test currently passes > through luck rather than correctness. > > OK for commit? > > Cheers, > Ian > > > 2014-05-08 Ian Bolton > > gcc/testsuite > * gcc.target/aarch64/vdup_lane_2.c (force_simd): Emit an > actual instruction to move into the allocated register.
RE: [PATCH, AArch64] Fix macro in vdup_lane_2 test case
> From: Marcus Shawcroft [mailto:marcus.shawcr...@gmail.com] > Sent: 19 May 2014 11:45 > To: Ian Bolton > Cc: gcc-patches > Subject: Re: [PATCH, AArch64] Fix macro in vdup_lane_2 test case > > On 8 May 2014 18:41, Ian Bolton wrote: > > > gcc/testsuite > > * gcc.target/aarch64/vdup_lane_2.c (force_simd): Emit an > > actual instruction to move into the allocated register. > > This macro is attempting to force a value to a particular class of > register, we don't need or want the mov instruction at all. Isn't > something like this sufficient: > > #define force_simd(V1) asm volatile ("" \ > : "+w"(V1)\ > : \ > : /* No clobbers */) > > ? > > /Marcus Thanks for the review, Marcus. I did not think of that and it looks sane, but your suggested approach leads to some of the dup instructions being optimised away. Ordinarily, that would be great but these test cases are trying to force the dups to occur. Cheers, Ian
[PATCH, ARM] Suppress Redundant Flag Setting for Cortex-A15
Hi there! An existing optimisation for Thumb-2 converts t32 encodings to t16 encodings to reduce codesize, at the expense of causing redundant flag setting for ADD, AND, etc. This redundant flag setting can have negative performance impact on cortex-a15. This patch introduces two new tuning options so that the conversion from t32 to t16, which takes place in thumb2_reorg, can be suppressed for cortex-a15. To maintain some of the original benefit (reduced codesize), the suppression is only done where the enclosing basic block is deemed worthy of optimising for speed. This tested with no regressions and performance has improved for the workloads tested on cortex-a15. (It might be beneficial to other processors too, but that has not been investigated yet.) OK for stage 1? Cheers, Ian 2014-01-24 Ian Bolton gcc/ * config/arm/arm-protos.h (tune_params): New struct members. * config/arm/arm.c: Initialise tune_params per processor. (thumb2_reorg): Suppress conversion from t32 to t16 when optimizing for speed, based on new tune_params. diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 13874ee..74645ee 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -272,6 +272,11 @@ struct tune_params const struct cpu_vec_costs* vec_costs; /* Prefer Neon for 64-bit bitops. */ bool prefer_neon_for_64bits; + /* Prefer 32-bit encoding instead of flag-setting 16-bit encoding. */ + bool disparage_flag_setting_t16_encodings; + /* Prefer 32-bit encoding instead of 16-bit encoding where subset of flags + would be set. */ + bool disparage_partial_flag_setting_t16_encodings; }; extern const struct tune_params *current_tune; diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index fc81bf6..1ebaf84 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -1481,7 +1481,8 @@ const struct tune_params arm_slowmul_tune = false, /* Prefer LDRD/STRD. */ {true, true},/* Prefer non short circuit. */ &arm_default_vec_cost,/* Vectorizer costs. */ - false /* Prefer Neon for 64-bits bitops. */ + false,/* Prefer Neon for 64-bits bitops. */ + false, false /* Prefer 32-bit encodings. */ }; const struct tune_params arm_fastmul_tune = @@ -1497,7 +1498,8 @@ const struct tune_params arm_fastmul_tune = false, /* Prefer LDRD/STRD. */ {true, true},/* Prefer non short circuit. */ &arm_default_vec_cost,/* Vectorizer costs. */ - false /* Prefer Neon for 64-bits bitops. */ + false,/* Prefer Neon for 64-bits bitops. */ + false, false /* Prefer 32-bit encodings. */ }; /* StrongARM has early execution of branches, so a sequence that is worth @@ -1516,7 +1518,8 @@ const struct tune_params arm_strongarm_tune = false, /* Prefer LDRD/STRD. */ {true, true},/* Prefer non short circuit. */ &arm_default_vec_cost,/* Vectorizer costs. */ - false /* Prefer Neon for 64-bits bitops. */ + false,/* Prefer Neon for 64-bits bitops. */ + false, false /* Prefer 32-bit encodings. */ }; const struct tune_params arm_xscale_tune = @@ -1532,7 +1535,8 @@ const struct tune_params arm_xscale_tune = false, /* Prefer LDRD/STRD. */ {true, true},/* Prefer non short circuit. */ &arm_default_vec_cost,/* Vectorizer costs. */ - false /* Prefer Neon for 64-bits bitops. */ + false,/* Prefer Neon for 64-bits bitops. */ + false, false /* Prefer 32-bit encodings. */ }; const struct tune_params arm_9e_tune = @@ -1548,7 +1552,8 @@ const struct tune_params arm_9e_tune = false, /* Prefer LDRD/STRD. */ {true, true},/* Prefer non short circuit. */ &arm_default_vec_cost,/* Vectorizer costs. 
*/ - false /* Prefer Neon for 64-bits bitops. */ + false,/* Prefer Neon for 64-bits bitops. */ + false, false /* Prefer 32-bit encodings. */ }; const struct tune_params arm_v6t2_tune = @@ -156
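The underlying trade-off, for reference (my example): in Thumb-2,
"adds r0, r0, r1" is a 16-bit encoding while the non-flag-setting
"add r0, r0, r1" needs a 32-bit one, so when the flags are dead the choice
between them is purely size versus spurious flag dependence, which is what
the new tune_params control:

  /* Flag result of the addition is never used, so the t16 "adds" and
     t32 "add" encodings are interchangeable here; on Cortex-A15 the
     t32 form is now preferred in blocks optimised for speed.  */
  int
  plain_add (int a, int b)
  {
    return a + b;
  }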
[PATCH] Make pr59597 test PIC-friendly
PR59597 reinstated some code to cancel unnecessary jump threading, and brought with it a testcase to check that the cancelling happened. http://gcc.gnu.org/ml/gcc-patches/2014-01/msg01448.html With PIC enabled for arm and aarch64, the unnecessary jump threading already never took place, so there is nothing to cancel, leading the test case to fail. My suspicion is that similar issues will happen for other architectures too. This patch changes the called function to be static, so that jump threading and the resulting cancellation happen for PIC variants too. OK for stage 4 or wait for stage 1? Cheers, Ian 2014-02-05 Ian Bolton testsuite/ * gcc.dg/tree-ssa/pr59597.c: Make called function static so that expected outcome works for PIC variants too.diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c b/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c index 814d299..bc9d730 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c @@ -8,7 +8,8 @@ typedef unsigned int u32; u32 f[NNN], t[NNN]; -u16 Calc_crc8(u8 data, u16 crc ) +static u16 +Calc_crc8 (u8 data, u16 crc) { u8 i=0,x16=0,carry=0; for (i = 0; i < 8; i++) @@ -31,7 +32,9 @@ u16 Calc_crc8(u8 data, u16 crc ) } return crc; } -int main (int argc, char argv[]) + +int +main (int argc, char argv[]) { int i, j; u16 crc; for (j = 0; j < 1000; j++)
[PATCH, ARM] Skip pr59858.c test for -mfloat-abi=hard
Hi, The pr59858.c testcase explicitly sets -msoft-float which is incompatible with our -mfloat-abi=hard variant. This patch therefore should not be run if you have -mfloat-abi=hard. Tested with both variations for arm-none-eabi build. OK for commit? Cheers, Ian 2014-02-13 Ian Bolton testsuite/ * gcc.target/arm/pr59858.c: Skip test if -mfloat-abi=hard.diff --git a/gcc/testsuite/gcc.target/arm/pr59858.c b/gcc/testsuite/gcc.target/arm/pr59858.c index 463bd38..1e03203 100644 --- a/gcc/testsuite/gcc.target/arm/pr59858.c +++ b/gcc/testsuite/gcc.target/arm/pr59858.c @@ -1,5 +1,6 @@ /* { dg-do compile } */ /* { dg-options "-march=armv5te -marm -mthumb-interwork -Wall -Wstrict-prototypes -Wstrict-aliasing -funsigned-char -fno-builtin -fno-asm -msoft-float -std=gnu99 -mlittle-endian -mthumb -fno-stack-protector -Os -g -feliminate-unused-debug-types -funit-at-a-time -fmerge-all-constants -fstrict-aliasing -fno-tree-loop-optimize -fno-tree-dominator-opts -fno-strength-reduce -fPIC -w" } */ +/* { dg-skip-if "Test is not compatible with hard-float" { *-*-* } { "-mfloat-abi=hard" } { "" } } */ typedef enum { REG_ENOSYS = -1,
RE: [PATCH, ARM] Skip pr59858.c test for -mfloat-abi=hard
> > The pr59858.c testcase explicitly sets -msoft-float which is > incompatible > > with our -mfloat-abi=hard variant. > > > > This patch therefore should not be run if you have -mfloat-abi=hard. > > > > Tested with both variations for arm-none-eabi build. > > > > OK for commit? > > > > Cheers, > > Ian > > > > > > 2014-02-13 Ian Bolton > > > > testsuite/ > > * gcc.target/arm/pr59858.c: Skip test if -mfloat-abi=hard. > > > > > > pr59858-skip-if-hard-float-patch-v2.txt > > > > > > diff --git a/gcc/testsuite/gcc.target/arm/pr59858.c > b/gcc/testsuite/gcc.target/arm/pr59858.c > > index 463bd38..1e03203 100644 > > --- a/gcc/testsuite/gcc.target/arm/pr59858.c > > +++ b/gcc/testsuite/gcc.target/arm/pr59858.c > > @@ -1,5 +1,6 @@ > > /* { dg-do compile } */ > > /* { dg-options "-march=armv5te -marm -mthumb-interwork -Wall - > Wstrict-prototypes -Wstrict-aliasing -funsigned-char -fno-builtin -fno- > asm -msoft-float -std=gnu99 -mlittle-endian -mthumb -fno-stack- > protector -Os -g -feliminate-unused-debug-types -funit-at-a-time - > fmerge-all-constants -fstrict-aliasing -fno-tree-loop-optimize -fno- > tree-dominator-opts -fno-strength-reduce -fPIC -w" } */ > > +/* { dg-skip-if "Test is not compatible with hard-float" { *-*-* } { > "-mfloat-abi=hard" } { "" } } */ > > > > typedef enum { > > REG_ENOSYS = -1, > > > > This won't work if hard-float is the default. Take a look at the way > other tests check for this. Hi Richard, The test does actually pass if it is hard float by default. My comment on the skip line was misleading, because the precise issue is when someone specifies -mfloat-abi=hard on the command line. I've fixed up that comment in the attached patch now. I've also reduced the number of command-line options passed (without affecting the code generated) in the patch and changed -msoft-float into -mfloat-abi=soft, since the former is deprecated and maps to the latter anyway. OK for commit? Cheers, Iandiff --git a/gcc/testsuite/gcc.target/arm/pr59858.c b/gcc/testsuite/gcc.target/arm/pr59858.c index 463bd38..a944b9a 100644 --- a/gcc/testsuite/gcc.target/arm/pr59858.c +++ b/gcc/testsuite/gcc.target/arm/pr59858.c @@ -1,5 +1,6 @@ /* { dg-do compile } */ -/* { dg-options "-march=armv5te -marm -mthumb-interwork -Wall -Wstrict-prototypes -Wstrict-aliasing -funsigned-char -fno-builtin -fno-asm -msoft-float -std=gnu99 -mlittle-endian -mthumb -fno-stack-protector -Os -g -feliminate-unused-debug-types -funit-at-a-time -fmerge-all-constants -fstrict-aliasing -fno-tree-loop-optimize -fno-tree-dominator-opts -fno-strength-reduce -fPIC -w" } */ +/* { dg-options "-march=armv5te -fno-builtin -mfloat-abi=soft -mthumb -fno-stack-protector -Os -fno-tree-loop-optimize -fno-tree-dominator-opts -fPIC -w" } */ +/* { dg-skip-if "Incompatible command line options: -mfloat-abi=soft -mfloat-abi=hard" { *-*-* } { "-mfloat-abi=hard" } { "" } } */ typedef enum { REG_ENOSYS = -1,
[PATCH, ARM] Support ORN for DImode
Hi, Patterns had previously been added to thumb2.md to support ORN, but only for SImode. This patch adds DImode support, to cover the full 64|64->64 operation and the various 32|64->64 operations (see AND:DI variants that use NOT). The patch comes with its own execution test and looks for correct number of ORN instructions in the assembly. Regressions passed. OK for stage 1? 2014-02-19 Ian Bolton gcc/ * config/arm/thumb2.md (*iordi_notdi_di): New pattern. (*iordi_notzesidi): New pattern. (*iordi_notsesidi_di): New pattern. testsuite/ * gcc.target/arm/iordi_notdi-1.c: New test.diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md index 4f247f8..6a71fec 100644 --- a/gcc/config/arm/thumb2.md +++ b/gcc/config/arm/thumb2.md @@ -1366,6 +1366,79 @@ (set_attr "type" "alu_reg")] ) +; Constants for op 2 will never be given to these patterns. +(define_insn_and_split "*iordi_notdi_di" + [(set (match_operand:DI 0 "s_register_operand" "=&r,&r") + (ior:DI (not:DI (match_operand:DI 1 "s_register_operand" "0,r")) + (match_operand:DI 2 "s_register_operand" "r,0")))] + "TARGET_THUMB2" + "#" + "TARGET_THUMB2 && reload_completed" + [(set (match_dup 0) (ior:SI (not:SI (match_dup 1)) (match_dup 2))) + (set (match_dup 3) (ior:SI (not:SI (match_dup 4)) (match_dup 5)))] + " + { +operands[3] = gen_highpart (SImode, operands[0]); +operands[0] = gen_lowpart (SImode, operands[0]); +operands[4] = gen_highpart (SImode, operands[1]); +operands[1] = gen_lowpart (SImode, operands[1]); +operands[5] = gen_highpart (SImode, operands[2]); +operands[2] = gen_lowpart (SImode, operands[2]); + }" + [(set_attr "length" "8") + (set_attr "predicable" "yes") + (set_attr "predicable_short_it" "no") + (set_attr "type" "multiple")] +) + +(define_insn_and_split "*iordi_notzesidi_di" + [(set (match_operand:DI 0 "s_register_operand" "=&r,&r") + (ior:DI (not:DI (zero_extend:DI +(match_operand:SI 2 "s_register_operand" "r,r"))) + (match_operand:DI 1 "s_register_operand" "0,?r")))] + "TARGET_THUMB2" + "#" + ; (not (zero_extend...)) means operand0 will always be 0x + "TARGET_THUMB2 && reload_completed" + [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1))) + (set (match_dup 3) (const_int -1))] + " + { +operands[3] = gen_highpart (SImode, operands[0]); +operands[0] = gen_lowpart (SImode, operands[0]); +operands[1] = gen_lowpart (SImode, operands[1]); + }" + [(set_attr "length" "4,8") + (set_attr "predicable" "yes") + (set_attr "predicable_short_it" "no") + (set_attr "type" "multiple")] +) + +(define_insn_and_split "*iordi_notsesidi_di" + [(set (match_operand:DI 0 "s_register_operand" "=&r,&r") + (ior:DI (not:DI (sign_extend:DI +(match_operand:SI 2 "s_register_operand" "r,r"))) + (match_operand:DI 1 "s_register_operand" "0,r")))] + "TARGET_THUMB2" + "#" + "TARGET_THUMB2 && reload_completed" + [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1))) + (set (match_dup 3) (ior:SI (not:SI + (ashiftrt:SI (match_dup 2) (const_int 31))) + (match_dup 4)))] + " + { +operands[3] = gen_highpart (SImode, operands[0]); +operands[0] = gen_lowpart (SImode, operands[0]); +operands[4] = gen_highpart (SImode, operands[1]); +operands[1] = gen_lowpart (SImode, operands[1]); + }" + [(set_attr "length" "8") + (set_attr "predicable" "yes") + (set_attr "predicable_short_it" "no") + (set_attr "type" "multiple")] +) + (define_insn "*orsi_notsi_si" [(set (match_operand:SI 0 "s_register_operand" "=r") (ior:SI (not:SI (match_operand:SI 2 "s_register_operand" "r")) diff --git a/gcc/testsuite/gcc.target/arm/iordi_notdi-1.c 
b/gcc/testsuite/gcc.target/arm/iordi_notdi-1.c new file mode 100644 index 000..cda9c0e --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/iordi_notdi-1.c @@ -0,0 +1,54 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -fno-inline --save-temps" } */ + +extern void abort (void); + +typedef long long s64int; +typedef int s32int; +typedef unsigned long long u64int; +typedef unsigned int u32int; + +s64int +iordi_notd
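The simplest C input that should now exercise the new 64|64->64 path (my
example, mirroring the test's iordi_di_notdi case):

  /* Both halves can now use ORN directly, rather than first
     computing ~b into a register pair.  */
  long long
  orn64 (long long a, long long b)
  {
    return a | ~b;
  }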
[PATCH, AArch64] Define __ARM_NEON by default
Hi, This is needed for when people are porting their aarch32 code to aarch64. They will have #ifdef __ARM_NEON (as specified in ACLE) and their intrinsics currently won't get used on aarch64 because it's not defined there by default. This patch defines __ARM_NEON so long as we are not using general regs only. Tested on simple testcase to ensure __ARM_NEON was defined. OK for trunk? Cheers, Ian 2014-02-24 Ian Bolton * config/aarch64/aarch64.h: Define __ARM_NEON by default if we are not using general regs only.diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 13c424c..fc21981 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -32,6 +32,9 @@ else \ builtin_define ("__AARCH64EL__"); \ \ + if (!TARGET_GENERAL_REGS_ONLY) \ + builtin_define ("__ARM_NEON"); \ + \ switch (aarch64_cmodel) \ { \ case AARCH64_CMODEL_TINY: \
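The guard this enables, for illustration (standard ACLE usage, not taken
from the patch):

  #ifdef __ARM_NEON
  #include <arm_neon.h>

  uint32x4_t
  add4 (uint32x4_t a, uint32x4_t b)
  {
    return vaddq_u32 (a, b);  /* intrinsics path, now taken on AArch64 too */
  }
  #endif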
[PATCH] Keep -ffp-contract=fast by default if we have -funsafe-math-optimizations
Hi,

In common.opt, -ffp-contract=fast is set as the default for GCC.  But then
it gets disabled in c-family/c-opts.c if you are using ISO C (e.g. with
-std=c99).

The reason for this patch is that if you have also specified
-funsafe-math-optimizations (or -Ofast or -ffast-math) then it is likely
that you want -ffp-contract=fast left on, so you can generate fused
multiply-adds (the fma standard pattern).

This patch works by blocking the override if you have
-funsafe-math-optimizations (directly or indirectly), causing fused
multiply-add to be used in the places where we might hope to see it.

(I had considered forcing -ffp-contract=fast on in opts.c if you have
-funsafe-math-optimizations, but it is already on by default ... and it
didn't work either!  The problem is that it is forced off unless you have
explicitly asked for -ffp-contract=fast at the command-line.)

Standard regressions passed.

OK for trunk or stage 1?

Cheers,
Ian

2014-03-10  Ian Bolton

	* gcc/c-family/c-opts.c (c_common_post_options): Don't override
	-ffp-contract=fast if unsafe-math-optimizations is on.

diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index b7478f3..92ba481 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -834,7 +834,8 @@ c_common_post_options (const char **pfilename)
   if (flag_iso
       && !c_dialect_cxx ()
       && (global_options_set.x_flag_fp_contract_mode
-	  == (enum fp_contract_mode) 0))
+	  == (enum fp_contract_mode) 0)
+      && flag_unsafe_math_optimizations == 0)
     flag_fp_contract_mode = FP_CONTRACT_OFF;
 
   /* By default we use C99 inline semantics in GNU99 or C99 mode.  C99
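The motivating case, as I read the patch (my example): built with
-std=c99 -ffast-math, this should now contract to a single fmadd on AArch64
rather than separate fmul and fadd:

  double
  mac (double a, double b, double c)
  {
    return a * b + c;   /* candidate for fused multiply-add contraction */
  }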
[PATCH, ARM] Optimise NotDI AND/OR ZeroExtendSI for ARMv7A
This is a follow-on patch to one already committed: http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01128.html It implements patterns to simplify our RTL as follows: OR (Not:DI (A:DI), ZeroExtend:DI (B:SI)) --> the top half can be done with a MVN AND (Not:DI (A:DI), ZeroExtend:DI (B:SI)) --> the top half becomes zero. I've added test cases for both of these and also the existing anddi_notdi patterns. The tests all pass. Full regression runs passed. OK for stage 1? Cheers, Ian 2014-03-19 Ian Bolton gcc/ * config/arm/arm.md (*anddi_notdi_zesidi): New pattern * config/arm/thumb2.md (*iordi_notdi_zesidi): New pattern. testsuite/ * gcc.target/arm/anddi_notdi-1.c: New test. * gcc.target/arm/iordi_notdi-1.c: New test case. diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 2ddda02..d2d85ee 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -2962,6 +2962,28 @@ (set_attr "type" "multiple")] ) +(define_insn_and_split "*anddi_notdi_zesidi" + [(set (match_operand:DI 0 "s_register_operand" "=&r,&r") +(and:DI (not:DI (match_operand:DI 2 "s_register_operand" "0,?r")) +(zero_extend:DI + (match_operand:SI 1 "s_register_operand" "r,r"] + "TARGET_32BIT" + "#" + "TARGET_32BIT && reload_completed" + [(set (match_dup 0) (and:SI (not:SI (match_dup 2)) (match_dup 1))) + (set (match_dup 3) (const_int 0))] + " + { +operands[3] = gen_highpart (SImode, operands[0]); +operands[0] = gen_lowpart (SImode, operands[0]); +operands[2] = gen_lowpart (SImode, operands[2]); + }" + [(set_attr "length" "8") + (set_attr "predicable" "yes") + (set_attr "predicable_short_it" "no") + (set_attr "type" "multiple")] +) + (define_insn_and_split "*anddi_notsesidi_di" [(set (match_operand:DI 0 "s_register_operand" "=&r,&r") (and:DI (not:DI (sign_extend:DI diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md index 467c619..10bc8b1 100644 --- a/gcc/config/arm/thumb2.md +++ b/gcc/config/arm/thumb2.md @@ -1418,6 +1418,30 @@ (set_attr "type" "multiple")] ) +(define_insn_and_split "*iordi_notdi_zesidi" + [(set (match_operand:DI 0 "s_register_operand" "=&r,&r") + (ior:DI (not:DI (match_operand:DI 2 "s_register_operand" "0,?r")) + (zero_extend:DI +(match_operand:SI 1 "s_register_operand" "r,r"] + "TARGET_THUMB2" + "#" + "TARGET_THUMB2 && reload_completed" + [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1))) + (set (match_dup 3) (not:SI (match_dup 4)))] + " + { +operands[3] = gen_highpart (SImode, operands[0]); +operands[0] = gen_lowpart (SImode, operands[0]); +operands[1] = gen_lowpart (SImode, operands[1]); +operands[4] = gen_highpart (SImode, operands[2]); +operands[2] = gen_lowpart (SImode, operands[2]); + }" + [(set_attr "length" "8") + (set_attr "predicable" "yes") + (set_attr "predicable_short_it" "no") + (set_attr "type" "multiple")] +) + (define_insn_and_split "*iordi_notsesidi_di" [(set (match_operand:DI 0 "s_register_operand" "=&r,&r") (ior:DI (not:DI (sign_extend:DI diff --git a/gcc/testsuite/gcc.target/arm/anddi_notdi-1.c b/gcc/testsuite/gcc.target/arm/anddi_notdi-1.c new file mode 100644 index 000..cfb33fc --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/anddi_notdi-1.c @@ -0,0 +1,65 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -fno-inline --save-temps" } */ + +extern void abort (void); + +typedef long long s64int; +typedef int s32int; +typedef unsigned long long u64int; +typedef unsigned int u32int; + +s64int +anddi_di_notdi (s64int a, s64int b) +{ + return (a & ~b); +} + +s64int +anddi_di_notzesidi (s64int a, u32int b) +{ + return (a & ~(u64int) b); +} + +s64int 
+anddi_notdi_zesidi (s64int a, u32int b) +{ + return (~a & (u64int) b); +} + +s64int +anddi_di_notsesidi (s64int a, s32int b) +{ + return (a & ~(s64int) b); +} + +int main () +{ + s64int a64 = 0xdeadbeefll; + s64int b64 = 0x5f470112ll; + s64int c64 = 0xdeadbeef300fll; + + u32int c32 = 0x01124f4f; + s32int d32 = 0xabbaface; + + s64int z = anddi_di_notdi (c64, b64); + if (z != 0xdeadbeef2008ll) +abort (); + + z = anddi_di_notzesidi (a64, c32); + if (z != 0xdeadbeefb0b0ll) +abort (); + + z = anddi_notdi_zesidi (c64, c32); + if (z != 0x01104f4fll) +abort (); + + z =
RE: [PATCH, ARM] Optimise NotDI AND/OR ZeroExtendSI for ARMv7A
> -Original Message- > From: Richard Earnshaw > Sent: 21 March 2014 13:57 > To: Ian Bolton > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH, ARM] Optimise NotDI AND/OR ZeroExtendSI for ARMv7A > > On 19/03/14 16:53, Ian Bolton wrote: > > This is a follow-on patch to one already committed: > > http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01128.html > > > > It implements patterns to simplify our RTL as follows: > > > > OR (Not:DI (A:DI), ZeroExtend:DI (B:SI)) > > --> the top half can be done with a MVN > > > > AND (Not:DI (A:DI), ZeroExtend:DI (B:SI)) > > --> the top half becomes zero. > > > > I've added test cases for both of these and also the existing > > anddi_notdi patterns. The tests all pass. > > > > Full regression runs passed. > > > > OK for stage 1? > > > > Cheers, > > Ian > > > > > > 2014-03-19 Ian Bolton > > > > gcc/ > > * config/arm/arm.md (*anddi_notdi_zesidi): New pattern > > * config/arm/thumb2.md (*iordi_notdi_zesidi): New pattern. > > > > testsuite/ > > * gcc.target/arm/anddi_notdi-1.c: New test. > > * gcc.target/arm/iordi_notdi-1.c: New test case. > > > > > > arm-and-ior-notdi-zeroextend-patch-v1.txt > > > > > > diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md > > index 2ddda02..d2d85ee 100644 > > --- a/gcc/config/arm/arm.md > > +++ b/gcc/config/arm/arm.md > > @@ -2962,6 +2962,28 @@ > > (set_attr "type" "multiple")] > > ) > > > > +(define_insn_and_split "*anddi_notdi_zesidi" > > + [(set (match_operand:DI 0 "s_register_operand" "=&r,&r") > > +(and:DI (not:DI (match_operand:DI 2 "s_register_operand" > "0,?r")) > > +(zero_extend:DI > > + (match_operand:SI 1 "s_register_operand" "r,r"] > > The early clobber and register tying here is unnecessary. All of the > input operands are consumed in the first instruction, so you can > eliminate the ties and the restriction on the overlap. Something like > (untested): > > +(define_insn_and_split "*anddi_notdi_zesidi" > + [(set (match_operand:DI 0 "s_register_operand" "=r") > +(and:DI (not:DI (match_operand:DI 2 "s_register_operand" "r")) > +(zero_extend:DI > + (match_operand:SI 1 "s_register_operand" "r"] > > Ok for stage-1 with that change (though I'd recommend a another test > run > to validate the above). > > R. Thanks, Richard. Regression runs came back OK with that change, so I will consider this ready for stage 1. The patch is attached for reference. 
Cheers, Ian diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 2ddda02..4176b7ff 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -2962,6 +2962,28 @@ (set_attr "type" "multiple")] ) +(define_insn_and_split "*anddi_notdi_zesidi" + [(set (match_operand:DI 0 "s_register_operand" "=r") +(and:DI (not:DI (match_operand:DI 2 "s_register_operand" "r")) +(zero_extend:DI + (match_operand:SI 1 "s_register_operand" "r"] + "TARGET_32BIT" + "#" + "TARGET_32BIT && reload_completed" + [(set (match_dup 0) (and:SI (not:SI (match_dup 2)) (match_dup 1))) + (set (match_dup 3) (const_int 0))] + " + { +operands[3] = gen_highpart (SImode, operands[0]); +operands[0] = gen_lowpart (SImode, operands[0]); +operands[2] = gen_lowpart (SImode, operands[2]); + }" + [(set_attr "length" "8") + (set_attr "predicable" "yes") + (set_attr "predicable_short_it" "no") + (set_attr "type" "multiple")] +) + (define_insn_and_split "*anddi_notsesidi_di" [(set (match_operand:DI 0 "s_register_operand" "=&r,&r") (and:DI (not:DI (sign_extend:DI diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md index 467c619..10bc8b1 100644 --- a/gcc/config/arm/thumb2.md +++ b/gcc/config/arm/thumb2.md @@ -1418,6 +1418,30 @@ (set_attr "type" "multiple")] ) +(define_insn_and_split "*iordi_notdi_zesidi" + [(set (match_operand:DI 0 "s_register_operand" "=&r,&r") + (ior:DI (not:DI (match_operand:DI 2 "s_register_operand" "0,?r")) + (zero_extend:DI +(match_oper
RE: [PATCH, ARM] Suppress Redundant Flag Setting for Cortex-A15
> Hi, > > On 28 January 2014 13:10, Ramana Radhakrishnan > wrote: > > On Fri, Jan 24, 2014 at 5:16 PM, Ian Bolton > wrote: > >> Hi there! > >> > >> An existing optimisation for Thumb-2 converts t32 encodings to > >> t16 encodings to reduce codesize, at the expense of causing > >> redundant flag setting for ADD, AND, etc. This redundant flag > >> setting can have negative performance impact on cortex-a15. > >> > >> This patch introduces two new tuning options so that the conversion > >> from t32 to t16, which takes place in thumb2_reorg, can be > suppressed > >> for cortex-a15. > >> > >> To maintain some of the original benefit (reduced codesize), the > >> suppression is only done where the enclosing basic block is deemed > >> worthy of optimising for speed. > >> > >> This tested with no regressions and performance has improved for > >> the workloads tested on cortex-a15. (It might be beneficial to > >> other processors too, but that has not been investigated yet.) > >> > >> OK for stage 1? > > > > This is OK for stage1. > > > > Ramana > > > >> > >> Cheers, > >> Ian > >> > >> > >> 2014-01-24 Ian Bolton > >> > >> gcc/ > >> * config/arm/arm-protos.h (tune_params): New struct members. > >> * config/arm/arm.c: Initialise tune_params per processor. > >> (thumb2_reorg): Suppress conversion from t32 to t16 when > >> optimizing for speed, based on new tune_params. > > This causes > gcc.target/arm/negdi-1.c > gcc.target/arm/negdi-2.c > to FAIL when GCC is configured as: > --with-mode=ar > --with-cpu=cortex-a15 > --with-fpu=neon-vfpv4 > > both tests used to PASS. > (see http://cbuild.validation.linaro.org/build/cross- > validation/gcc/209561/report-build-info.html) Hi Christophe, I don't recall the failure when I did the work, but I see now that the test is looking for negs when my patch is specifically trying to avoid flag-setting operations. So we are now getting an rsb instead of a negs, as intended, and the test needs fixing! Open question: Should I look for either rsb or negs in a single scan-assembler or look for different ones dependent on the cpu in question or just not run the test for cortex-a15? Cheers, Ian
[PATCH, AArch64] Make zero_extends explicit for common SImode patterns
Season's greetings to you! :) I've made zero_extend versions of SI mode patterns that write to W registers in order to make the implicit zero_extend that they do explicit, so GCC can be smarter about when it actually needs to plant a zero_extend (uxtw). This patch significantly reduces the number of redundant uxtw instructions seen in a variety of programs. (There are further patterns that can be done, but I have them in a separate patch that's still in development.) OK for trunk and backport to ARM/aarch64-4.7-branch? Cheers, Ian 2012-12-13 Ian Bolton * gcc/config/aarch64/aarch64.md (*addsi3_aarch64_uxtw): New pattern. (*addsi3_compare0_uxtw): New pattern. (*add__si_uxtw): New pattern. (*add__si_uxtw): New pattern. (*add__shft_si_uxtw): New pattern. (*add__mult_si_uxtw): New pattern. (*add_si_multp2_uxtw): New pattern. (*addsi3_carryin_uxtw): New pattern. (*addsi3_carryin_alt1_uxtw): New pattern. (*addsi3_carryin_alt2_uxtw): New pattern. (*addsi3_carryin_alt3_uxtw): New pattern. (*add_uxtsi_multp2_uxtw): New pattern. (*subsi3_uxtw): New pattern. (*subsi3_compare0_uxtw): New pattern. (*sub__si_uxtw): New pattern. (*sub_mul_imm_si_uxtw): New pattern. (*sub__si_uxtw): New pattern. (*sub__shft_si_uxtw): New pattern. (*sub_si_multp2_uxtw): New pattern. (*sub_uxtsi_multp2_uxtw): New pattern. (*negsi2_uxtw): New pattern. (*negsi2_compare0_uxtw): New pattern. (*neg__si2_uxtw): New pattern. (*neg_mul_imm_si2_uxtw): New pattern. (*mulsi3_uxtw): New pattern. (*maddsi_uxtw): New pattern. (*msubsi_uxtw): New pattern. (*mulsi_neg_uxtw): New pattern. (*divsi3_uxtw): New pattern.diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index a9a8b5f..d5c0206 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -1271,6 +1273,22 @@ (set_attr "mode" "SI")] ) +;; zero_extend version of above +(define_insn "*addsi3_aarch64_uxtw" + [(set +(match_operand:DI 0 "register_operand" "=rk,rk,rk") +(zero_extend:DI (plus:SI + (match_operand:SI 1 "register_operand" "%rk,rk,rk") + (match_operand:SI 2 "aarch64_plus_operand" "I,r,J"] + "" + "@ + add\\t%w0, %w1, %2 + add\\t%w0, %w1, %w2 + sub\\t%w0, %w1, #%n2" + [(set_attr "v8type" "alu") + (set_attr "mode" "SI")] +) + (define_insn "*adddi3_aarch64" [(set (match_operand:DI 0 "register_operand" "=rk,rk,rk,!w") @@ -1304,6 +1322,23 @@ (set_attr "mode" "")] ) +;; zero_extend version of above +(define_insn "*addsi3_compare0_uxtw" + [(set (reg:CC_NZ CC_REGNUM) + (compare:CC_NZ +(plus:SI (match_operand:SI 1 "register_operand" "%r,r") + (match_operand:SI 2 "aarch64_plus_operand" "rI,J")) +(const_int 0))) + (set (match_operand:DI 0 "register_operand" "=r,r") + (zero_extend:DI (plus:SI (match_dup 1) (match_dup 2] + "" + "@ + adds\\t%w0, %w1, %w2 + subs\\t%w0, %w1, #%n2" + [(set_attr "v8type" "alus") + (set_attr "mode" "SI")] +) + (define_insn "*add3nr_compare0" [(set (reg:CC_NZ CC_REGNUM) (compare:CC_NZ @@ -1340,6 +1375,19 @@ (set_attr "mode" "")] ) +;; zero_extend version of above +(define_insn "*add__si_uxtw" + [(set (match_operand:DI 0 "register_operand" "=rk") + (zero_extend:DI (plus:SI + (ASHIFT:SI (match_operand:SI 1 "register_operand" "r") + (match_operand:QI 2 "aarch64_shift_imm_si" "n")) + (match_operand:SI 3 "register_operand" "r"] + "" + "add\\t%w0, %w3, %w1, %2" + [(set_attr "v8type" "alu_shift") + (set_attr "mode" "SI")] +) + (define_insn "*add_mul_imm_" [(set (match_operand:GPI 0 "register_operand" "=rk") (plus:GPI (mult:GPI (match_operand:GPI 1 "register_operand" "r") @@ -1361,6 +1409,17 @@ (set_attr "mode" "")] ) +;; 
zero_extend version of above +(define_insn "*add__si_uxtw" + [(set (match_operand:DI 0 "register_operand" "=rk") + (zero_extend:DI (plus:SI (ANY_EXTEND:SI (match_operand:SHORT 1 "register_operand" "r")) + (match_operand:GPI 2 "register_operand" "r"] + "" + "add\\t%w0, %w2, %w, xt" + [(set_attr "v8type" "alu_ext") + (set_attr "mode" "SI")] +)
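To make the motivation concrete, here is a minimal C example (illustrative only, not part of the patch) of the kind of redundant extension these patterns let GCC eliminate:

unsigned long
add_u32 (unsigned int a, unsigned int b)
{
  /* The 32-bit "add w0, w0, w1" writes a W register, and AArch64
     W-register writes zero the upper 32 bits of the X register.
     Without an explicit zero_extend in the RTL pattern, GCC cannot
     see that and emits a separate "uxtw x0, w0" before returning.  */
  return (unsigned long) (a + b);
}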
RE: [PATCH, AArch64] Make zero_extends explicit for common SImode patterns
Hi Richard, > + "add\\t%w0, %w2, %w, xt" > > ^^^ %w1 Good spot. I guess that pattern hasn't fired yet then! I'll fix it. > > This patch significantly reduces the number of redundant > > uxtw instructions seen in a variety of programs. > > > > (There are further patterns that can be done, but I have them > > in a separate patch that's still in development.) > > What do you get if you enable flag_ree, as we do for x86_64? > In theory this should avoid even more extensions... > > > C.f. common/config/i386/i386-common.c: > > static const struct default_options ix86_option_optimization_table[] = > { > /* Enable redundant extension instructions removal at -O2 and > higher. */ > { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 }, > I should have said that I am indeed running with REE enabled. It has some impact (about 70 further UXTW removed from the set of binaries I've been building) and seems mostly to be good across basic blocks within the same function. As far as I can tell, there is no downside to REE, so I think it should be enabled by default at -O2 or higher on AArch64 too. I'll prepare a new patch ...
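For reference, the AArch64 analogue of the quoted i386 table would look roughly like this (a sketch only; the file and table name below are assumptions about the follow-up patch):

/* Hypothetical: common/config/aarch64/aarch64-common.c.  */
static const struct default_options aarch_option_optimization_table[] =
  {
    /* Enable redundant extension instructions removal at -O2 and higher.  */
    { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
    /* Terminating entry.  */
    { OPT_LEVELS_NONE, 0, NULL, 0 }
  };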
RE: [PATCH, AArch64] Make zero_extends explicit for common SImode patterns
> Hi Richard, > > > + "add\\t%w0, %w2, %w, xt" > > > > ^^^ %w1 > > Got spot. I guess that pattern hasn't fired yet then! I'll fix it. Now fixed in v3. > I should have said that I am indeed running with REE enabled. It has > some impact (about 70 further UXTW removed from the set of binaries > I've been building) and seems to mostly be good across basic blocks > within the same function. As far as I can tell, there is no downside > to REE, so I think it should be enabled by default for O2 or higher > on AArch64 too. > I'm going to enable REE in a separate patch. Is this one OK to commit here and backport to ARM/aarch64-4.7-branch? Thanks, Ian diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index a9a8b5f..d5c0206 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -1271,6 +1273,22 @@ (set_attr "mode" "SI")] ) +;; zero_extend version of above +(define_insn "*addsi3_aarch64_uxtw" + [(set +(match_operand:DI 0 "register_operand" "=rk,rk,rk") +(zero_extend:DI (plus:SI + (match_operand:SI 1 "register_operand" "%rk,rk,rk") + (match_operand:SI 2 "aarch64_plus_operand" "I,r,J"] + "" + "@ + add\\t%w0, %w1, %2 + add\\t%w0, %w1, %w2 + sub\\t%w0, %w1, #%n2" + [(set_attr "v8type" "alu") + (set_attr "mode" "SI")] +) + (define_insn "*adddi3_aarch64" [(set (match_operand:DI 0 "register_operand" "=rk,rk,rk,!w") @@ -1304,6 +1322,23 @@ (set_attr "mode" "")] ) +;; zero_extend version of above +(define_insn "*addsi3_compare0_uxtw" + [(set (reg:CC_NZ CC_REGNUM) + (compare:CC_NZ +(plus:SI (match_operand:SI 1 "register_operand" "%r,r") + (match_operand:SI 2 "aarch64_plus_operand" "rI,J")) +(const_int 0))) + (set (match_operand:DI 0 "register_operand" "=r,r") + (zero_extend:DI (plus:SI (match_dup 1) (match_dup 2] + "" + "@ + adds\\t%w0, %w1, %w2 + subs\\t%w0, %w1, #%n2" + [(set_attr "v8type" "alus") + (set_attr "mode" "SI")] +) + (define_insn "*add3nr_compare0" [(set (reg:CC_NZ CC_REGNUM) (compare:CC_NZ @@ -1340,6 +1375,19 @@ (set_attr "mode" "")] ) +;; zero_extend version of above +(define_insn "*add__si_uxtw" + [(set (match_operand:DI 0 "register_operand" "=rk") + (zero_extend:DI (plus:SI + (ASHIFT:SI (match_operand:SI 1 "register_operand" "r") + (match_operand:QI 2 "aarch64_shift_imm_si" "n")) + (match_operand:SI 3 "register_operand" "r"] + "" + "add\\t%w0, %w3, %w1, %2" + [(set_attr "v8type" "alu_shift") + (set_attr "mode" "SI")] +) + (define_insn "*add_mul_imm_" [(set (match_operand:GPI 0 "register_operand" "=rk") (plus:GPI (mult:GPI (match_operand:GPI 1 "register_operand" "r") @@ -1361,6 +1409,17 @@ (set_attr "mode" "")] ) +;; zero_extend version of above +(define_insn "*add__si_uxtw" + [(set (match_operand:DI 0 "register_operand" "=rk") + (zero_extend:DI (plus:SI (ANY_EXTEND:SI (match_operand:SHORT 1 "register_operand" "r")) + (match_operand:GPI 2 "register_operand" "r"] + "" + "add\\t%w0, %w2, %w1, xt" + [(set_attr "v8type" "alu_ext") + (set_attr "mode" "SI")] +) + (define_insn "*add__shft_" [(set (match_operand:GPI 0 "register_operand" "=rk") (plus:GPI (ashift:GPI (ANY_EXTEND:GPI @@ -1373,6 +1432,19 @@ (set_attr "mode" "")] ) +;; zero_extend version of above +(define_insn "*add__shft_si_uxtw" + [(set (match_operand:DI 0 "register_operand" "=rk") + (zero_extend:DI (plus:SI (ashift:SI (ANY_EXTEND:SI + (match_operand:SHORT 1 "register_operand" "r")) + (match_operand 2 "aarch64_imm3" "Ui3")) + (match_operand:SI 3 "register_operand" "r"] + "" + "add\\t%w0, %w3, %w1, xt %2" + [(set_attr "v8type" "alu_ext") + (set_attr "mode" "SI")] +) + (define_insn 
"*add__mult_" [(set (match_operand:GPI 0 "register_operand" "=rk") (plus:GPI (mult:GPI (ANY_EXTEND:GPI @@ -1385,6 +1457,19 @@ (set_attr "mode" "")] ) +;; zero_extend version of above +(define_insn "*add__mult_si_uxtw" + [(set (match_operand:DI 0 "register_operand" "=rk") + (zero_extend:DI (plus:SI (mult:SI (ANY_EXTEND:SI +(match_operand:SHORT 1 "register_operand" "r")) + (match_operand 2 "aarch64_pwr_imm3" "Up3")) + (match_operand:SI 3 "register_operand" "r"] + "" + "add\\t%w0, %w3, %w1, xt %p2" + [(set_attr "v8type" "alu_ext") + (set_attr "mode" "SI")] +) + (define_insn "*add__multp2" [(set (match_operand:GPI 0 "register_operand" "=rk") (plus:GPI (ANY_EXTRACT:GPI @@ -1399,6 +1484,21 @@ (set_attr "mode" "")] ) +;; zero_extend version of above +(define_insn "*add_si_multp2_uxtw" + [(set (match_operand:DI 0 "register_operand" "=rk") + (zero_extend:DI (plus:SI (ANY_EXTRACT:SI + (mult:SI (match_operand:SI 1 "registe
[PATCH, AArch64] Make zero_extends explicit for some SImode patterns
Greetings! I've made zero_extend versions of SI mode patterns that write to W registers in order to make the implicit zero_extend that they do explicit, so GCC can be smarter about when it actually needs to plant a zero_extend (uxtw). If that sounds familiar, it's because this patch continues the work of one already committed. :) This has been regression-tested for linux and bare-metal. OK for trunk and backport to ARM/aarch64-4.7-branch? Cheers, Ian 2013-01-15 Ian Bolton * gcc/config/aarch64/aarch64.md (*cstoresi_neg_uxtw): New pattern. (*cmovsi_insn_uxtw): New pattern. (*si3_uxtw): New pattern. (*_si3_uxtw): New pattern. (*si3_insn_uxtw): New pattern. (*bswapsi2_uxtw): New pattern.diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index ec65b3c..8dd6c22 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -1,5 +1,5 @@ ;; Machine description for AArch64 architecture. -;; Copyright (C) 2009, 2010, 2011, 2012 Free Software Foundation, Inc. +;; Copyright (C) 2009, 2010, 2011, 2012, 2013 Free Software Foundation, Inc. ;; Contributed by ARM Ltd. ;; ;; This file is part of GCC. @@ -2193,6 +2193,18 @@ (set_attr "mode" "")] ) +;; zero_extend version of the above +(define_insn "*cstoresi_insn_uxtw" + [(set (match_operand:DI 0 "register_operand" "=r") + (zero_extend:DI +(match_operator:SI 1 "aarch64_comparison_operator" + [(match_operand 2 "cc_register" "") (const_int 0)])))] + "" + "cset\\t%w0, %m1" + [(set_attr "v8type" "csel") + (set_attr "mode" "SI")] +) + (define_insn "*cstore_neg" [(set (match_operand:ALLI 0 "register_operand" "=r") (neg:ALLI (match_operator:ALLI 1 "aarch64_comparison_operator" @@ -2203,6 +2215,18 @@ (set_attr "mode" "")] ) +;; zero_extend version of the above +(define_insn "*cstoresi_neg_uxtw" + [(set (match_operand:DI 0 "register_operand" "=r") + (zero_extend:DI +(neg:SI (match_operator:SI 1 "aarch64_comparison_operator" + [(match_operand 2 "cc_register" "") (const_int 0)]] + "" + "csetm\\t%w0, %m1" + [(set_attr "v8type" "csel") + (set_attr "mode" "SI")] +) + (define_expand "cmov6" [(set (match_operand:GPI 0 "register_operand" "") (if_then_else:GPI @@ -2257,6 +2281,30 @@ (set_attr "mode" "")] ) +;; zero_extend version of above +(define_insn "*cmovsi_insn_uxtw" + [(set (match_operand:DI 0 "register_operand" "=r,r,r,r,r,r,r") + (zero_extend:DI +(if_then_else:SI + (match_operator 1 "aarch64_comparison_operator" + [(match_operand 2 "cc_register" "") (const_int 0)]) + (match_operand:SI 3 "aarch64_reg_zero_or_m1_or_1" "rZ,rZ,UsM,rZ,Ui1,UsM,Ui1") + (match_operand:SI 4 "aarch64_reg_zero_or_m1_or_1" "rZ,UsM,rZ,Ui1,rZ,UsM,Ui1"] + "!((operands[3] == const1_rtx && operands[4] == constm1_rtx) + || (operands[3] == constm1_rtx && operands[4] == const1_rtx))" + ;; Final two alternatives should be unreachable, but included for completeness + "@ + csel\\t%w0, %w3, %w4, %m1 + csinv\\t%w0, %w3, wzr, %m1 + csinv\\t%w0, %w4, wzr, %M1 + csinc\\t%w0, %w3, wzr, %m1 + csinc\\t%w0, %w4, wzr, %M1 + mov\\t%w0, -1 + mov\\t%w0, 1" + [(set_attr "v8type" "csel") + (set_attr "mode" "SI")] +) + (define_insn "*cmov_insn" [(set (match_operand:GPF 0 "register_operand" "=w") (if_then_else:GPF @@ -2369,6 +2417,17 @@ [(set_attr "v8type" "logic,logic_imm") (set_attr "mode" "")]) +;; zero_extend version of above +(define_insn "*si3_uxtw" + [(set (match_operand:DI 0 "register_operand" "=r,rk") + (zero_extend:DI + (LOGICAL:SI (match_operand:SI 1 "register_operand" "%r,r") +(match_operand:SI 2 "aarch64_logical_operand" "r,K"] + "" + "\\t%w0, %w1, %w2" + [(set_attr 
"v8type" "logic,logic_imm") + (set_attr "mode" "SI")]) + (define_insn "*_3" [(set (match_operand:GPI 0 "register_operand" "=r") (LOGICAL:GPI (SHIFT:GPI @@ -2380,6 +2439,19 @@ [(set_attr "v8type" "logic_shift") (set_attr "mode" "")]) +;; zero_extend version of abov
[PATCH, AArch64] AND operation should use CC_NZ mode
The CC mode used when comparing an AND against zero should really be CC_NZ, so I fixed that up in aarch64_select_cc_mode and in the TST patterns that (erroneously) expected CC mode. It has been tested on linux and bare-metal. OK to commit to trunk (as bug fix)? Thanks. Ian 2013-02-01 Ian Bolton * config/aarch64/aarch64.c (aarch64_select_cc_mode): Return correct CC mode for AND. * config/aarch64/aarch64.md (*and3nr_compare0): Fixed to use CC_NZ. (*and_3nr_compare0): Likewise. - diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 03b1361..2b09669 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -3076,7 +3076,7 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y) if ((GET_MODE (x) == SImode || GET_MODE (x) == DImode) && y == const0_rtx && (code == EQ || code == NE || code == LT || code == GE) - && (GET_CODE (x) == PLUS || GET_CODE (x) == MINUS)) + && (GET_CODE (x) == PLUS || GET_CODE (x) == MINUS || GET_CODE (x) == AND)) return CC_NZmode; /* A compare with a shifted operand. Because of canonicalization, diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 36267c9..c4c152f 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -2470,8 +2470,8 @@ ) (define_insn "*and3nr_compare0" - [(set (reg:CC CC_REGNUM) - (compare:CC + [(set (reg:CC_NZ CC_REGNUM) + (compare:CC_NZ (and:GPI (match_operand:GPI 0 "register_operand" "%r,r") (match_operand:GPI 1 "aarch64_logical_operand" "r,")) (const_int 0)))] @@ -2481,8 +2481,8 @@ (set_attr "mode" "")]) (define_insn "*and_3nr_compare0" - [(set (reg:CC CC_REGNUM) - (compare:CC + [(set (reg:CC_NZ CC_REGNUM) + (compare:CC_NZ (and:GPI (SHIFT:GPI (match_operand:GPI 0 "register_operand" "r") (match_operand:QI 1 "aarch64_shift_imm_" "n"))
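A small example (illustrative) of the kind of code that exercises these TST patterns:

int
any_low_bits (unsigned long x)
{
  /* On AArch64 this becomes something like "tst x0, 255; cset w0, ne".
     The compare of the AND result against zero only consumes the N and
     Z flags, which is exactly what CC_NZmode describes; claiming full
     CCmode here was the bug.  */
  return (x & 0xff) != 0;
}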
[PATCH, AArch64] Allow symbol+offset even if not being used for memory access
Hi, This patch builds on a previous one that allowed symbol+offset as symbol references for memory accesses. It allows us to have symbol+offset even when no memory access is apparent. It reduces codesize for cases such as this one: int arr[100]; uint64_t foo (uint64_t a) { uint64_t const z = 1234567ll<<32+7; uint64_t const y = (uint64_t) &arr[3]; return y + a + z; } Before the patch, the code looked like this:
  adrp x2, arr
  mov x1, 74217034874880
  add x2, x2, :lo12:arr
  add x2, x2, 12
  movk x1, 2411, lsl 48
  add x1, x2, x1
  add x0, x1, x0
  ret
Now, it looks like this:
  adrp x1, arr+12
  mov x2, 74217034874880
  movk x2, 2411, lsl 48
  add x1, x1, :lo12:arr+12
  add x1, x1, x2
  add x0, x1, x0
  ret
Testing shows no regressions. OK to commit? 2012-08-31 Ian Bolton * gcc/config/aarch64/aarch64.md: New pattern.diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index a00d3f0..de9c927 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -2795,7 +2795,7 @@ (lo_sum:DI (match_operand:DI 1 "register_operand" "r") (match_operand 2 "aarch64_valid_symref" "S")))] "" - "add\\t%0, %1, :lo12:%2" + "add\\t%0, %1, :lo12:%a2" [(set_attr "v8type" "alu") (set_attr "mode" "DI")] @@ -2890,6 +2890,20 @@ [(set_attr "length" "0")] ) +(define_split + [(set (match_operand:DI 0 "register_operand" "=r") + (const:DI (plus:DI (match_operand:DI 1 "aarch64_valid_symref" "S") + (match_operand:DI 2 "const_int_operand" "i"] + "" + [(set (match_dup 0) (high:DI (const:DI (plus:DI (match_dup 1) + (match_dup 2) + (set (match_dup 0) (lo_sum:DI (match_dup 0) +(const:DI (plus:DI (match_dup 1) + (match_dup 2)] + "" +) + + ;; AdvSIMD Stuff (include "aarch64-simd.md")
RE: [PATCH, AArch64] Allow symbol+offset even if not being used for memory access
> On 2012-08-31 07:49, Ian Bolton wrote: > > +(define_split > > + [(set (match_operand:DI 0 "register_operand" "=r") > > + (const:DI (plus:DI (match_operand:DI 1 "aarch64_valid_symref" > "S") > > + (match_operand:DI 2 "const_int_operand" > "i"] > > + "" > > + [(set (match_dup 0) (high:DI (const:DI (plus:DI (match_dup 1) > > + (match_dup 2) > > + (set (match_dup 0) (lo_sum:DI (match_dup 0) > > +(const:DI (plus:DI (match_dup 1) > > + (match_dup 2)] > > + "" > > +) > > You ought not need this as a separate split, since (CONST ...) should > be handled exactly like (SYMBOL_REF). I see in combine.c that it does get done for a MEM (which is how my earlier patch worked), but not when it's being used for other reasons (hence the title of this email). See below for current code from find_split_point:

    case MEM:
#ifdef HAVE_lo_sum
      /* If we have (mem (const ..)) or (mem (symbol_ref ...)),
         split it using LO_SUM and HIGH.  */
      if (GET_CODE (XEXP (x, 0)) == CONST
          || GET_CODE (XEXP (x, 0)) == SYMBOL_REF)
        {
          enum machine_mode address_mode
            = targetm.addr_space.address_mode (MEM_ADDR_SPACE (x));

          SUBST (XEXP (x, 0),
                 gen_rtx_LO_SUM (address_mode,
                                 gen_rtx_HIGH (address_mode, XEXP (x, 0)),
                                 XEXP (x, 0)));
          return &XEXP (XEXP (x, 0), 0);
        }
#endif

If I don't use my split pattern, I could alter combine to remove the requirement that parent is a MEM. What do you think? > > Also note that constraints ("=r" etc) aren't used for splits. > If I keep the pattern, I will remove the constraints. Thanks for the pointers in this regard. Cheers, Ian
RE: [PATCH, AArch64] Allow symbol+offset even if not being used for memory access
> From: Richard Henderson [mailto:r...@redhat.com] > On 09/06/2012 08:06 AM, Ian Bolton wrote: > > If I don't use my split pattern, I could alter combine to remove the > > requirement that parent is a MEM. > > > > What do you think? > > I merely question the calling out of CONST as special. > > Either you've got some pattern that handles SYMBOL_REF > the same way, or you're missing something. Oh, I understand now. Thanks for clarifying. Some digging has shown me that the transformation keys off the equivalence, as highlighted below. It's always phrased in terms of a const and never a symbol_ref.

after ud_dce:
  6 r82:DI=high(`arr')
  7 r81:DI=r82:DI+low(`arr')
      REG_DEAD: r82:DI
      REG_EQUAL: `arr'
  8 r80:DI=r81:DI+0xc
      REG_DEAD: r81:DI
      REG_EQUAL: const(`arr'+0xc)   <- this equivalence

after combine:
  7 r80:DI=high(const(`arr'+0xc))
  8 r80:DI=r80:DI+low(const(`arr'+0xc))
      REG_EQUAL: const(`arr'+0xc)   <- this equivalence

Based on that, and assuming I remove the constraints on the pattern, would you say the patch is worthy of commit? Thanks, Ian
RE: [PATCH, AArch64] Allow symbol+offset even if not being used for memory access
> Can you send me the test case you were looking at for this? See attached. (Most of it is superfluous, but the point is that we are not using the address to do a memory access.) Cheers, Ian [Attachment: constant-test1.c]
[PATCH, AArch64] Implement ffs standard pattern
I've implemented the standard pattern ffs, which leads to __builtin_ffs being generated with 4 instructions instead of 5 instructions. Regression tests and my new test pass. OK to commit? Cheers, Ian 2012-09-14 Ian Bolton gcc/ * config/aarch64/aarch64.md (csinc3): Make it into a named pattern. * config/aarch64/aarch64.md (ffs2): New pattern. testsuite/ * gcc.target/aarch64/ffs.c: New test.diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 5278957..dfdba42 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -2021,7 +2021,7 @@ [(set_attr "v8type" "csel") (set_attr "mode" "")]) -(define_insn "*csinc3_insn" +(define_insn "csinc3_insn" [(set (match_operand:GPI 0 "register_operand" "=r") (if_then_else:GPI (match_operator:GPI 1 "aarch64_comparison_operator" @@ -2157,6 +2157,21 @@ } ) +(define_expand "ffs2" + [(match_operand:GPI 0 "register_operand") + (match_operand:GPI 1 "register_operand")] + "" + { +rtx ccreg = aarch64_gen_compare_reg (EQ, operands[1], const0_rtx); +rtx x = gen_rtx_NE (VOIDmode, ccreg, const0_rtx); + +emit_insn (gen_rbit2 (operands[0], operands[1])); +emit_insn (gen_clz2 (operands[0], operands[0])); +emit_insn (gen_csinc3_insn (operands[0], x, ccreg, operands[0], const0_rtx)); +DONE; + } +) + (define_insn "*and3nr_compare0" [(set (reg:CC CC_REGNUM) (compare:CC diff --git a/gcc/testsuite/gcc.target/aarch64/ffs.c b/gcc/testsuite/gcc.target/aarch64/ffs.c new file mode 100644 index 000..a344761 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/ffs.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +unsigned int functest(unsigned int x) +{ + return __builtin_ffs(x); +} + +/* { dg-final { scan-assembler "cmp\tw" } } */ +/* { dg-final { scan-assembler "rbit\tw" } } */ +/* { dg-final { scan-assembler "clz\tw" } } */ +/* { dg-final { scan-assembler "csinc\tw" } } */
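For reference, the identity the expansion relies on, written as C (illustrative; this is the semantics, not the generated RTL):

unsigned int
ffs_ref (unsigned int x)
{
  /* rbit followed by clz computes ctz; the final csinc then adds 1,
     or selects 0 when the initial compare found x == 0.  */
  return x ? __builtin_ctz (x) + 1 : 0;
}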
[PATCH, AArch64] Implement fnma, fms and fnms standard patterns
The following standard pattern names were implemented by simply renaming some existing patterns: * fnma * fms * fnms I have added an extra pattern for when we don't care about signed zero, so we can do "-fma (a,b,c)" more efficiently. Regression testing all passed. OK to commit? Cheers, Ian 2012-09-14 Ian Bolton gcc/ * config/aarch64/aarch64.md (fmsub4): Renamed to fnma4. * config/aarch64/aarch64.md (fnmsub4): Renamed to fms4. * config/aarch64/aarch64.md (fnmadd4): Renamed to fnms4. * config/aarch64/aarch64.md (*fnmadd4): New pattern. testsuite/ * gcc.target/aarch64/fmadd.c: Added extra tests. * gcc.target/aarch64/fnmadd-fastmath.c: New test. diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 3fbebf7..33815ff 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -2506,7 +2506,7 @@ (set_attr "mode" "")] ) -(define_insn "*fmsub4" +(define_insn "fnma4" [(set (match_operand:GPF 0 "register_operand" "=w") (fma:GPF (neg:GPF (match_operand:GPF 1 "register_operand" "w")) (match_operand:GPF 2 "register_operand" "w") @@ -2517,7 +2517,7 @@ (set_attr "mode" "")] ) -(define_insn "*fnmsub4" +(define_insn "fms4" [(set (match_operand:GPF 0 "register_operand" "=w") (fma:GPF (match_operand:GPF 1 "register_operand" "w") (match_operand:GPF 2 "register_operand" "w") @@ -2528,7 +2528,7 @@ (set_attr "mode" "")] ) -(define_insn "*fnmadd4" +(define_insn "fnms4" [(set (match_operand:GPF 0 "register_operand" "=w") (fma:GPF (neg:GPF (match_operand:GPF 1 "register_operand" "w")) (match_operand:GPF 2 "register_operand" "w") @@ -2539,6 +2539,18 @@ (set_attr "mode" "")] ) +;; If signed zeros are ignored, -(a * b + c) = -a * b - c. +(define_insn "*fnmadd4" + [(set (match_operand:GPF 0 "register_operand") + (neg:GPF (fma:GPF (match_operand:GPF 1 "register_operand") + (match_operand:GPF 2 "register_operand") + (match_operand:GPF 3 "register_operand"] + "!HONOR_SIGNED_ZEROS (mode) && TARGET_FLOAT" + "fnmadd\\t%0, %1, %2, %3" + [(set_attr "v8type" "fmadd") + (set_attr "mode" "")] +) + ;; --- ;; Floating-point conversions ;; --- diff --git a/gcc/testsuite/gcc.target/aarch64/fmadd.c b/gcc/testsuite/gcc.target/aarch64/fmadd.c index 3b4..39975db 100644 --- a/gcc/testsuite/gcc.target/aarch64/fmadd.c +++ b/gcc/testsuite/gcc.target/aarch64/fmadd.c @@ -4,15 +4,52 @@ extern double fma (double, double, double); extern float fmaf (float, float, float); -double test1 (double x, double y, double z) +double test_fma1 (double x, double y, double z) { return fma (x, y, z); } -float test2 (float x, float y, float z) +float test_fma2 (float x, float y, float z) { return fmaf (x, y, z); } +double test_fnma1 (double x, double y, double z) +{ + return fma (-x, y, z); +} + +float test_fnma2 (float x, float y, float z) +{ + return fmaf (-x, y, z); +} + +double test_fms1 (double x, double y, double z) +{ + return fma (x, y, -z); +} + +float test_fms2 (float x, float y, float z) +{ + return fmaf (x, y, -z); +} + +double test_fnms1 (double x, double y, double z) +{ + return fma (-x, y, -z); +} + +float test_fnms2 (float x, float y, float z) +{ + return fmaf (-x, y, -z); +} + /* { dg-final { scan-assembler-times "fmadd\td\[0-9\]" 1 } } */ /* { dg-final { scan-assembler-times "fmadd\ts\[0-9\]" 1 } } */ +/* { dg-final { scan-assembler-times "fmsub\td\[0-9\]" 1 } } */ +/* { dg-final { scan-assembler-times "fmsub\ts\[0-9\]" 1 } } */ +/* { dg-final { scan-assembler-times "fnmsub\td\[0-9\]" 1 } } */ +/* { dg-final { scan-assembler-times "fnmsub\ts\[0-9\]" 1 } } */ +/* { dg-final { scan-assembler-times 
"fnmadd\td\[0-9\]" 1 } } */ +/* { dg-final { scan-assembler-times "fnmadd\ts\[0-9\]" 1 } } */ + diff --git a/gcc/testsuite/gcc.target/aarch64/fnmadd-fastmath.c b/gcc/testsuite/gcc.target/aarch64/fnmadd-fastmath.c new file mode 100644 index 000..9c115df --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/fnmadd-fastmath.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math" } */ + +extern double fma (double, double, double); +extern float fmaf (float, float, float); + +double test_fma1 (double x, double y, double z) +{ + return - fma (x, y, z); +} + +float test_fma2 (float x, float y, float z) +{ + return - fmaf (x, y, z); +} + +/* { dg-final { scan-assembler-times "fnmadd\td\[0-9\]" 1 } } */ +/* { dg-final { scan-assembler-times "fnmadd\ts\[0-9\]" 1 } } */ +
RE: [PATCH, AArch64] Implement fnma, fms and fnms standard patterns
OK for 4.7 as well? > -Original Message- > From: Richard Earnshaw > Sent: 14 September 2012 18:18 > To: Ian Bolton > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH, AArch64] Implement fnma, fms and fnms standard > patterns > > On 14/09/12 18:05, Ian Bolton wrote: > > The following standard pattern names were implemented by simply > > renaming some existing patterns: > > > > * fnma > > * fms > > * fnms > > > > I have added an extra pattern for when we don't care about > > signed zero, so we can do "-fma (a,b,c)" more efficiently. > > > > Regression testing all passed. > > > > OK to commit? > > > > Cheers, > > Ian > > > > > > 2012-09-14 Ian Bolton > > > > gcc/ > > * config/aarch64/aarch64.md (fmsub4): Renamed > > to fnma4. > > * config/aarch64/aarch64.md (fnmsub4): Renamed > > to fms4. > > * config/aarch64/aarch64.md (fnmadd4): Renamed > > to fnms4. > > * config/aarch64/aarch64.md (*fnmadd4): New pattern. > > > > testsuite/ > > * gcc.target/aarch64/fmadd.c: Added extra tests. > > * gcc.target/aarch64/fnmadd-fastmath.c: New test. > > > > OK. > > R.
RE: [PATCH, AArch64] Implement ffs standard pattern
OK for aarch64-4.7-branch as well? > -Original Message- > From: Richard Earnshaw > Sent: 14 September 2012 18:31 > To: Ian Bolton > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH, AArch64] Implement ffs standard pattern > > On 14/09/12 16:26, Ian Bolton wrote: > > I've implemented the standard pattern ffs, which leads to > > __builtin_ffs being generated with 4 instructions instead > > of 5 instructions. > > > > Regression tests and my new test pass. > > > > OK to commit? > > > > > > Cheers, > > Ian > > > > > > > > 2012-09-14 Ian Bolton > > > > gcc/ > > * config/aarch64/aarch64.md (csinc3): Make it into a > > named pattern. > > * config/aarch64/aarch64.md (ffs2): New pattern. > > > > testsuite/ > > > > * gcc.target/aarch64/ffs.c: New test. > > > > OK. > > R.
[PATCH, AArch64] Implement ctz and clrsb standard patterns
I've implemented the following standard patterns: * clrsb * ctz Regression runs passed and I have added compilation tests for them, and clz as well. (Execution tests are covered by gcc/testsuite/gcc.c-torture/execute/builtin-bitops-1.c.) OK for aarch64-branch and aarch64-4.7-branch? Cheers, Ian 2012-09-18 Ian Bolton gcc/ * config/aarch64/aarch64.h: Define CTZ_DEFINED_VALUE_AT_ZERO. * config/aarch64/aarch64.md (clrsb2): New pattern. * config/aarch64/aarch64.md (rbit2): New pattern. * config/aarch64/aarch64.md (ctz2): New pattern. gcc/testsuite/ * gcc.target/aarch64/clrsb.c: New test. * gcc.target/aarch64/clz.c: New test. * gcc.target/aarch64/ctz.c: New test.diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 5d121fa..abf96c5 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -703,6 +703,8 @@ do { \ #define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \ ((VALUE) = ((MODE) == SImode ? 32 : 64), 2) +#define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \ + ((VALUE) = ((MODE) == SImode ? 32 : 64), 2) #define INCOMING_RETURN_ADDR_RTX gen_rtx_REG (Pmode, LR_REGNUM) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 33815ff..5278957 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -153,6 +153,8 @@ (UNSPEC_CMTST 83) ; Used in aarch64-simd.md. (UNSPEC_FMAX83) ; Used in aarch64-simd.md. (UNSPEC_FMIN84) ; Used in aarch64-simd.md. +(UNSPEC_CLS 85) ; Used in aarch64-simd.md. +(UNSPEC_RBIT86) ; Used in aarch64-simd.md. ] ) @@ -2128,6 +2130,33 @@ [(set_attr "v8type" "clz") (set_attr "mode" "")]) +(define_insn "clrsb2" + [(set (match_operand:GPI 0 "register_operand" "=r") + (unspec:GPI [(match_operand:GPI 1 "register_operand" "r")] UNSPEC_CLS))] + "" + "cls\\t%0, %1" + [(set_attr "v8type" "clz") + (set_attr "mode" "")]) + +(define_insn "rbit2" + [(set (match_operand:GPI 0 "register_operand" "=r") + (unspec:GPI [(match_operand:GPI 1 "register_operand" "r")] UNSPEC_RBIT))] + "" + "rbit\\t%0, %1" + [(set_attr "v8type" "rbit") + (set_attr "mode" "")]) + +(define_expand "ctz2" + [(match_operand:GPI 0 "register_operand") + (match_operand:GPI 1 "register_operand")] + "" + { +emit_insn (gen_rbit2 (operands[0], operands[1])); +emit_insn (gen_clz2 (operands[0], operands[0])); +DONE; + } +) + (define_insn "*and3nr_compare0" [(set (reg:CC CC_REGNUM) (compare:CC diff --git a/gcc/testsuite/gcc.target/aarch64/clrsb.c b/gcc/testsuite/gcc.target/aarch64/clrsb.c new file mode 100644 index 000..a75dfa0 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/clrsb.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +unsigned int functest(unsigned int x) +{ + return __builtin_clrsb(x); +} + +/* { dg-final { scan-assembler "cls\tw" } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/clz.c b/gcc/testsuite/gcc.target/aarch64/clz.c new file mode 100644 index 000..66e2d29 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/clz.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +unsigned int functest(unsigned int x) +{ + return __builtin_clz(x); +} + +/* { dg-final { scan-assembler "clz\tw" } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/ctz.c b/gcc/testsuite/gcc.target/aarch64/ctz.c new file mode 100644 index 000..15a2473 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/ctz.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +unsigned int functest(unsigned int x) +{ + return __builtin_ctz(x); +} + +/* { dg-final { scan-assembler "rbit\tw" } } */ +/* { 
dg-final { scan-assembler "clz\tw" } } */ +
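For reference, the identity behind the ctz expansion (illustrative C; not part of the patch):

#include <stdint.h>

/* Bit-reversal turns trailing zeros into leading zeros, so for any
   nonzero x: __builtin_clz (rbit32 (x)) == __builtin_ctz (x).  At zero,
   AArch64's CLZ returns 32 (or 64 for DImode), which is what the new
   CTZ_DEFINED_VALUE_AT_ZERO macro advertises.  */
static uint32_t
rbit32 (uint32_t x)
{
  uint32_t r = 0;
  for (int i = 0; i < 32; i++)
    r |= ((x >> i) & 1u) << (31 - i);
  return r;
}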
RE: [PATCH, AArch64] Implement ctz and clrsb standard patterns
> > diff --git a/gcc/config/aarch64/aarch64.md > b/gcc/config/aarch64/aarch64.md > > index 33815ff..5278957 100644 > > --- a/gcc/config/aarch64/aarch64.md > > +++ b/gcc/config/aarch64/aarch64.md > > @@ -153,6 +153,8 @@ > > (UNSPEC_CMTST 83) ; Used in aarch64-simd.md. > > (UNSPEC_FMAX83) ; Used in aarch64-simd.md. > > (UNSPEC_FMIN84) ; Used in aarch64-simd.md. > > +(UNSPEC_CLS 85) ; Used in aarch64-simd.md. > > +(UNSPEC_RBIT86) ; Used in aarch64-simd.md. > > The comment doesn't appear to be true. > Fair point! I will fix that.
RE: [PATCH, AArch64] Implement ctz and clrsb standard patterns
> > > diff --git a/gcc/config/aarch64/aarch64.md > > b/gcc/config/aarch64/aarch64.md > > > index 33815ff..5278957 100644 > > > --- a/gcc/config/aarch64/aarch64.md > > > +++ b/gcc/config/aarch64/aarch64.md > > > @@ -153,6 +153,8 @@ > > > (UNSPEC_CMTST 83) ; Used in aarch64-simd.md. > > > (UNSPEC_FMAX 83) ; Used in aarch64-simd.md. > > > (UNSPEC_FMIN 84) ; Used in aarch64-simd.md. > > > +(UNSPEC_CLS 85) ; Used in aarch64-simd.md. > > > +(UNSPEC_RBIT 86) ; Used in aarch64-simd.md. > > > > The comment doesn't appear to be true. > > > > Fair point! I will fix that. > New patch with comment fixed is attached. Now good to commit to aarch64-branch and aarch64-4.7-branch? Cheers, Iandiff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 5d121fa..abf96c5 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -703,6 +703,8 @@ do { \ #define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \ ((VALUE) = ((MODE) == SImode ? 32 : 64), 2) +#define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \ + ((VALUE) = ((MODE) == SImode ? 32 : 64), 2) #define INCOMING_RETURN_ADDR_RTX gen_rtx_REG (Pmode, LR_REGNUM) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 33815ff..5278957 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -153,6 +153,8 @@ (UNSPEC_CMTST 83) ; Used in aarch64-simd.md. (UNSPEC_FMAX83) ; Used in aarch64-simd.md. (UNSPEC_FMIN84) ; Used in aarch64-simd.md. +(UNSPEC_CLS 85) ; Used in aarch64.md. +(UNSPEC_RBIT86) ; Used in aarch64.md. ] ) @@ -2128,6 +2130,33 @@ [(set_attr "v8type" "clz") (set_attr "mode" "")]) +(define_insn "clrsb2" + [(set (match_operand:GPI 0 "register_operand" "=r") + (unspec:GPI [(match_operand:GPI 1 "register_operand" "r")] UNSPEC_CLS))] + "" + "cls\\t%0, %1" + [(set_attr "v8type" "clz") + (set_attr "mode" "")]) + +(define_insn "rbit2" + [(set (match_operand:GPI 0 "register_operand" "=r") + (unspec:GPI [(match_operand:GPI 1 "register_operand" "r")] UNSPEC_RBIT))] + "" + "rbit\\t%0, %1" + [(set_attr "v8type" "rbit") + (set_attr "mode" "")]) + +(define_expand "ctz2" + [(match_operand:GPI 0 "register_operand") + (match_operand:GPI 1 "register_operand")] + "" + { +emit_insn (gen_rbit2 (operands[0], operands[1])); +emit_insn (gen_clz2 (operands[0], operands[0])); +DONE; + } +) + (define_insn "*and3nr_compare0" [(set (reg:CC CC_REGNUM) (compare:CC diff --git a/gcc/testsuite/gcc.target/aarch64/clrsb.c b/gcc/testsuite/gcc.target/aarch64/clrsb.c new file mode 100644 index 000..a75dfa0 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/clrsb.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +unsigned int functest(unsigned int x) +{ + return __builtin_clrsb(x); +} + +/* { dg-final { scan-assembler "cls\tw" } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/clz.c b/gcc/testsuite/gcc.target/aarch64/clz.c new file mode 100644 index 000..66e2d29 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/clz.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +unsigned int functest(unsigned int x) +{ + return __builtin_clz(x); +} + +/* { dg-final { scan-assembler "clz\tw" } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/ctz.c b/gcc/testsuite/gcc.target/aarch64/ctz.c new file mode 100644 index 000..15a2473 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/ctz.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +unsigned int functest(unsigned int x) +{ + return __builtin_ctz(x); +} + +/* { dg-final { scan-assembler "rbit\tw" } } */ 
+/* { dg-final { scan-assembler "clz\tw" } } */ +
RE: [PATCH, AArch64] Implement ctz and clrsb standard patterns
New version attached with better formatted test cases. OK for aarch64-branch and aarch64-4.7-branch? Cheers, Ian - 2012-09-18 Ian Bolton gcc/ * config/aarch64/aarch64.h: Define CTZ_DEFINED_VALUE_AT_ZERO. * config/aarch64/aarch64.md (clrsb2): New pattern. * config/aarch64/aarch64.md (rbit2): New pattern. * config/aarch64/aarch64.md (ctz2): New pattern. gcc/testsuite/ * gcc.target/aarch64/clrsb.c: New test. * gcc.target/aarch64/clz.c: New test. * gcc.target/aarch64/ctz.c: New test.diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 5d121fa..abf96c5 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -703,6 +703,8 @@ do { \ #define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \ ((VALUE) = ((MODE) == SImode ? 32 : 64), 2) +#define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \ + ((VALUE) = ((MODE) == SImode ? 32 : 64), 2) #define INCOMING_RETURN_ADDR_RTX gen_rtx_REG (Pmode, LR_REGNUM) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 33815ff..5278957 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -153,6 +153,8 @@ (UNSPEC_CMTST 83) ; Used in aarch64-simd.md. (UNSPEC_FMAX83) ; Used in aarch64-simd.md. (UNSPEC_FMIN84) ; Used in aarch64-simd.md. +(UNSPEC_CLS 85) ; Used in aarch64.md. +(UNSPEC_RBIT86) ; Used in aarch64.md. ] ) @@ -2128,6 +2130,33 @@ [(set_attr "v8type" "clz") (set_attr "mode" "")]) +(define_insn "clrsb2" + [(set (match_operand:GPI 0 "register_operand" "=r") + (unspec:GPI [(match_operand:GPI 1 "register_operand" "r")] UNSPEC_CLS))] + "" + "cls\\t%0, %1" + [(set_attr "v8type" "clz") + (set_attr "mode" "")]) + +(define_insn "rbit2" + [(set (match_operand:GPI 0 "register_operand" "=r") + (unspec:GPI [(match_operand:GPI 1 "register_operand" "r")] UNSPEC_RBIT))] + "" + "rbit\\t%0, %1" + [(set_attr "v8type" "rbit") + (set_attr "mode" "")]) + +(define_expand "ctz2" + [(match_operand:GPI 0 "register_operand") + (match_operand:GPI 1 "register_operand")] + "" + { +emit_insn (gen_rbit2 (operands[0], operands[1])); +emit_insn (gen_clz2 (operands[0], operands[0])); +DONE; + } +) + (define_insn "*and3nr_compare0" [(set (reg:CC CC_REGNUM) (compare:CC diff --git a/gcc/testsuite/gcc.target/aarch64/clrsb.c b/gcc/testsuite/gcc.target/aarch64/clrsb.c new file mode 100644 index 000..a75dfa0 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/clrsb.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +unsigned int functest (unsigned int x) +{ + return __builtin_clrsb (x); +} + +/* { dg-final { scan-assembler "cls\tw" } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/clz.c b/gcc/testsuite/gcc.target/aarch64/clz.c new file mode 100644 index 000..66e2d29 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/clz.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +unsigned int functest (unsigned int x) +{ + return __builtin_clz (x); +} + +/* { dg-final { scan-assembler "clz\tw" } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/ctz.c b/gcc/testsuite/gcc.target/aarch64/ctz.c new file mode 100644 index 000..15a2473 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/ctz.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +unsigned int functest (unsigned int x) +{ + return __builtin_ctz (x); +} + +/* { dg-final { scan-assembler "rbit\tw" } } */ +/* { dg-final { scan-assembler "clz\tw" } } */ +
RE: [PATCH, AArch64] Allow symbol+offset even if not being used for memory access
> Ok. Having dug a bit deeper I think the main problem is that you're > working against yourself by not handling this pattern right from the > beginning. You have split the address incorrectly to begin with and are > now trying to recover after the fact. > > The following patch seems to do the trick for me, producing > > > (insn 6 5 7 (set (reg:DI 81) > > (high:DI (const:DI (plus:DI (symbol_ref:DI ("arr") [flags 0x80] ) > > (const_int 12 [0xc]) z.c:8 -1 > > (nil)) > > > > (insn 7 6 8 (set (reg:DI 80) > > (lo_sum:DI (reg:DI 81) > > (const:DI (plus:DI (symbol_ref:DI ("arr") [flags 0x80] ) > > (const_int 12 [0xc]) z.c:8 -1 > > (expr_list:REG_EQUAL (const:DI (plus:DI (symbol_ref:DI ("arr") [flags 0x80] ) > > (const_int 12 [0xc]))) > > (nil))) > > right from the .150.expand dump. > > I'll leave it to you to fully regression test and commit the patch > as appropriate. ;-) > Thanks so much for this, Richard. I have prepared a new patch heavily based off yours, which really demands its own new email trail, so I shall make a fresh post. Cheers, Ian
[PATCH, AArch64] Handle symbol + offset more effectively
Hi all, This patch corrects what seemed to be a typo in expand_mov_immediate in aarch64.c, where we had || instead of && in our original code:

  if (offset != const0_rtx
      && (targetm.cannot_force_const_mem (mode, imm)
          || (can_create_pseudo_p ())))   /* <- should have been && */

Since can_create_pseudo_p () is always true at expand time, this code treated all inputs the same: every non-zero offset was forced to a temporary and we returned early, never running the code in the remainder of the function. In terms of measurable impact, this patch provides a better fix to the problem I was trying to solve with this patch: http://gcc.gnu.org/ml/gcc-patches/2012-08/msg02072.html Almost all credit should go to Richard Henderson for this patch. It is all his, but for a minor change I made to some predicates which now become relevant when we execute more of the expand_mov_immediate function. My testing showed no regressions for bare-metal or linux. OK for aarch64-branch and aarch64-4.7-branch? Cheers, Ian 2012-09-25 Richard Henderson Ian Bolton * config/aarch64/aarch64.c (aarch64_expand_mov_immediate): Fix a functional typo and refactor code in switch statement. * config/aarch64/aarch64.md (add_losym): Handle symbol + offset. * config/aarch64/predicates.md (aarch64_tls_ie_symref): Match const. (aarch64_tls_le_symref): Likewise.diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 2d7eba7..edeee30 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -652,43 +652,57 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) unsigned HOST_WIDE_INT val; bool subtargets; rtx subtarget; - rtx base, offset; int one_match, zero_match; gcc_assert (mode == SImode || mode == DImode); - /* If we have (const (plus symbol offset)), and that expression cannot - be forced into memory, load the symbol first and add in the offset. */ - split_const (imm, &base, &offset); - if (offset != const0_rtx - && (targetm.cannot_force_const_mem (mode, imm) - || (can_create_pseudo_p ( -{ - base = aarch64_force_temporary (dest, base); - aarch64_emit_move (dest, aarch64_add_offset (mode, NULL, base, INTVAL (offset))); - return; -} - /* Check on what type of symbol it is. */ - if (GET_CODE (base) == SYMBOL_REF || GET_CODE (base) == LABEL_REF) + if (GET_CODE (imm) == SYMBOL_REF + || GET_CODE (imm) == LABEL_REF + || GET_CODE (imm) == CONST) { - rtx mem; - switch (aarch64_classify_symbol (base, SYMBOL_CONTEXT_ADR)) + rtx mem, base, offset; + enum aarch64_symbol_type sty; + + /* If we have (const (plus symbol offset)), separate out the offset +before we start classifying the symbol. 
*/ + split_const (imm, &base, &offset); + + sty = aarch64_classify_symbol (base, SYMBOL_CONTEXT_ADR); + switch (sty) { case SYMBOL_FORCE_TO_MEM: - mem = force_const_mem (mode, imm); + if (offset != const0_rtx + && targetm.cannot_force_const_mem (mode, imm)) + { + gcc_assert(can_create_pseudo_p ()); + base = aarch64_force_temporary (dest, base); + base = aarch64_add_offset (mode, NULL, base, INTVAL (offset)); + aarch64_emit_move (dest, base); + return; + } + mem = force_const_mem (mode, imm); gcc_assert (mem); emit_insn (gen_rtx_SET (VOIDmode, dest, mem)); return; -case SYMBOL_SMALL_TLSGD: -case SYMBOL_SMALL_TLSDESC: -case SYMBOL_SMALL_GOTTPREL: -case SYMBOL_SMALL_TPREL: +case SYMBOL_SMALL_TLSGD: +case SYMBOL_SMALL_TLSDESC: +case SYMBOL_SMALL_GOTTPREL: case SYMBOL_SMALL_GOT: + if (offset != const0_rtx) + { + gcc_assert(can_create_pseudo_p ()); + base = aarch64_force_temporary (dest, base); + base = aarch64_add_offset (mode, NULL, base, INTVAL (offset)); + aarch64_emit_move (dest, base); + return; + } + /* FALLTHRU */ + +case SYMBOL_SMALL_TPREL: case SYMBOL_SMALL_ABSOLUTE: - aarch64_load_symref_appropriately - (dest, imm, aarch64_classify_symbol (base, SYMBOL_CONTEXT_ADR)); + aarch64_load_symref_appropriately (dest, imm, sty); return; default: @@ -696,7 +710,7 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) } } - if ((CONST_INT_P (imm) && aarch64_move_imm (INTVAL (imm), mode))) + if (CONST_INT_P (imm) && aarch64_move_imm (INTVAL (imm), mode)) { emit_insn (gen_rtx_SET (VOIDmode, dest, imm)); return; diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index b399ab4..3834558 100644 --- a/gcc/config/aarch64/aarch64.md +
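To illustrate what the corrected path enables (a hypothetical example, matching the dumps discussed earlier in this thread):

extern int arr[100];

unsigned long
addr_of_elem (void)
{
  /* With the fix, the symbol+offset is kept together, giving
       adrp x0, arr+12
       add  x0, x0, :lo12:arr+12
     instead of materialising arr in a temporary and adding 12.  */
  return (unsigned long) &arr[3];
}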
[PATCH, AArch64] Allow symbol+offset as symbolic constant expression
Hi, This patch reduces codesize for cases such as this one: int arr[100]; int foo () { return arr[10]; } Before the patch, the code looked like this: adrp x0, arr add x0, x0, :lo12:arr ldr w0, [x0,40] Now, it looks like this: adrp x0, arr+40 ldr w0, [x0,#:lo12:arr+40] Some workloads have seen up to 1K reduction in code size. OK to commit? Cheers, Ian 2012-07-06 Ian Bolton * gcc/config/aarch64/aarch64.c (aarch64_print_operand): Use aarch64_classify_symbolic_expression for classifying operands. * gcc/config/aarch64/aarch64.c (aarch64_classify_symbolic_expression): New function. * gcc/config/aarch64/aarch64.c (aarch64_symbolic_constant_p): New function. * gcc/config/aarch64/predicates.md (aarch64_valid_symref): Symbol with constant offset is a valid symbol reference.diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 542c1e0..53c238a 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -2820,6 +2820,17 @@ aarch64_symbolic_address_p (rtx x) return GET_CODE (x) == SYMBOL_REF || GET_CODE (x) == LABEL_REF; } +/* Classify the base of symbolic expression X, given that X appears in + context CONTEXT. */ +static enum aarch64_symbol_type +aarch64_classify_symbolic_expression (rtx x, enum aarch64_symbol_context context) +{ + rtx offset; + split_const (x, &x, &offset); + return aarch64_classify_symbol (x, context); +} + + /* Return TRUE if X is a legitimate address for accessing memory in mode MODE. */ static bool @@ -3227,7 +3238,7 @@ aarch64_print_operand (FILE *f, rtx x, char code) if (GET_CODE (x) == HIGH) x = XEXP (x, 0); - switch (aarch64_classify_symbol (x, SYMBOL_CONTEXT_ADR)) + switch (aarch64_classify_symbolic_expression (x, SYMBOL_CONTEXT_ADR)) { case SYMBOL_SMALL_GOT: asm_fprintf (asm_out_file, ":got:"); @@ -3256,7 +3267,7 @@ aarch64_print_operand (FILE *f, rtx x, char code) break; case 'L': - switch (aarch64_classify_symbol (x, SYMBOL_CONTEXT_ADR)) + switch (aarch64_classify_symbolic_expression (x, SYMBOL_CONTEXT_ADR)) { case SYMBOL_SMALL_GOT: asm_fprintf (asm_out_file, ":lo12:"); @@ -3285,7 +3296,8 @@ aarch64_print_operand (FILE *f, rtx x, char code) break; case 'G': - switch (aarch64_classify_symbol (x, SYMBOL_CONTEXT_ADR)) + + switch (aarch64_classify_symbolic_expression (x, SYMBOL_CONTEXT_ADR)) { case SYMBOL_SMALL_TPREL: asm_fprintf (asm_out_file, ":tprel_hi12:"); @@ -4746,6 +4758,8 @@ aarch64_classify_tls_symbol (rtx x) } } +/* Return the method that should be used to access SYMBOL_REF or + LABEL_REF X in context CONTEXT. */ enum aarch64_symbol_type aarch64_classify_symbol (rtx x, enum aarch64_symbol_context context ATTRIBUTE_UNUSED) @@ -4817,7 +4831,23 @@ aarch64_classify_symbol (rtx x, return SYMBOL_FORCE_TO_MEM; } +/* Return true if X is a symbolic constant that can be used in context + CONTEXT. If it is, store the type of the symbol in *SYMBOL_TYPE. */ + +bool +aarch64_symbolic_constant_p (rtx x, enum aarch64_symbol_context context, +enum aarch64_symbol_type *symbol_type) +{ + rtx offset; + split_const (x, &x, &offset); + if (GET_CODE (x) == SYMBOL_REF || GET_CODE (x) == LABEL_REF) +*symbol_type = aarch64_classify_symbol (x, context); + else +return false; + /* No checking of offset at this point. 
*/ + return true; +} bool aarch64_constant_address_p (rtx x) diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md index 7089e8b..328e5cf 100644 --- a/gcc/config/aarch64/predicates.md +++ b/gcc/config/aarch64/predicates.md @@ -114,8 +114,12 @@ (match_test "mode == DImode && CONSTANT_ADDRESS_P (op)"))) (define_predicate "aarch64_valid_symref" - (and (match_code "symbol_ref, label_ref") - (match_test "aarch64_classify_symbol (op, SYMBOL_CONTEXT_ADR) != SYMBOL_FORCE_TO_MEM"))) + (match_code "const, symbol_ref, label_ref") +{ + enum aarch64_symbol_type symbol_type; + return (aarch64_symbolic_constant_p (op, SYMBOL_CONTEXT_ADR, &symbol_type) +&& symbol_type != SYMBOL_FORCE_TO_MEM); +}) (define_predicate "aarch64_tls_ie_symref" (match_code "symbol_ref, label_ref")
[PATCH, ARM] Fix broken testcase, vfp-1.c, for Thumb
This patch makes the vfp-1.c testcase work for Thumb. It broke when we restricted the negative offsets allowed for Thumb to fix up a Spec2K failure some months back. (It was previously possible to generate illegal offsets.) OK for trunk? Cheers, Ian 2011-07-28 Ian Bolton testsuite/ * gcc.target/arm/vfp-1.c: Large negative offsets not possible on Thumb2. Index: gcc/testsuite/gcc.target/arm/vfp-1.c === --- gcc/testsuite/gcc.target/arm/vfp-1.c(revision 176838) +++ gcc/testsuite/gcc.target/arm/vfp-1.c(working copy) @@ -127,13 +127,13 @@ void test_convert () { void test_ldst (float f[], double d[]) { /* { dg-final { scan-assembler "flds.+ \\\[r0, #1020\\\]" } } */ - /* { dg-final { scan-assembler "flds.+ \\\[r0, #-1020\\\]" } } */ + /* { dg-final { scan-assembler "flds.+ \\\[r\[0-9\], #-1020\\\]" { target { arm32 && { ! arm_thumb2_ok } } } } } */ /* { dg-final { scan-assembler "add.+ r0, #1024" } } */ - /* { dg-final { scan-assembler "fsts.+ \\\[r0, #0\\\]\n" } } */ + /* { dg-final { scan-assembler "fsts.+ \\\[r\[0-9\], #0\\\]\n" } } */ f[256] = f[255] + f[-255]; /* { dg-final { scan-assembler "fldd.+ \\\[r1, #1016\\\]" } } */ - /* { dg-final { scan-assembler "fldd.+ \\\[r1, #-1016\\\]" } } */ + /* { dg-final { scan-assembler "fldd.+ \\\[r\[1-9\], #-1016\\\]" { target { arm32 && { ! arm_thumb2_ok } } } } } */ /* { dg-final { scan-assembler "fstd.+ \\\[r1, #256\\\]" } } */ d[32] = d[127] + d[-127]; }
RE: Move cgraph_node_set and varpool_node_set out of GGC and make them use pointer_map
> Hi, > I always considered the cgraph_node_set/varpool_node_set to be > overengineered > but they also turned out to be quite ineffective since we do quite a > lot of > queries into them during streaming out. > > This patch moves them to pointer_map, like I did for the streamer cache. > While > doing so I needed to get the structure out of GGC memory since > pointer_map is > not ggc friendly. This is not a big deal, because the sets can only > point to > live cgraph/varpool entries anyway. Pointing to removed ones would lead > to > spectacular failures in any case. > > Bootstrapped/regtested x86_64-linux, OK? > > Honza > > * cgraph.h (cgraph_node_set_def, varpool_node_set_def): Move out > of GTY; > replace hash by pointer map. > (cgraph_node_set_element_def, cgraph_node_set_element, > const_cgraph_node_set_element, varpool_node_set_element_def, > varpool_node_set_element, const_varpool_node_set_element): > Remove. > (free_cgraph_node_set, free_varpool_node_set): New function. > (cgraph_node_set_size, varpool_node_set_size): Use vector size. > * tree-emutls.c: Free varpool node set. > * ipa-utils.c (cgraph_node_set_new, cgraph_node_set_add, > cgraph_node_set_remove, cgraph_node_set_find, > dump_cgraph_node_set, > debug_cgraph_node_set, free_cgraph_node_set, > varpool_node_set_new, > varpool_node_set_add, varpool_node_set_remove, > varpool_node_set_find, > dump_varpool_node_set, free_varpool_node_set, > debug_varpool_node_set): > Move here from ipa.c; implement using pointer_map. > * ipa.c (cgraph_node_set_new, cgraph_node_set_add, > cgraph_node_set_remove, cgraph_node_set_find, > dump_cgraph_node_set, > debug_cgraph_node_set, varpool_node_set_new, > varpool_node_set_add, varpool_node_set_remove, > varpool_node_set_find, > dump_varpool_node_set, debug_varpool_node_set): > Move to ipa-utils.c. > * lto/lto.c (ltrans_partition_def): Remove GTY annotations. > (ltrans_partitions): Move to heap. > (new_partition): Update. > (free_ltrans_partitions): New function. > (lto_wpa_write_files): Use it. > * passes.c (ipa_write_summaries): Update. This causes cross and native builds of the ARM Linux toolchain to fail: gcc -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -W -Wall -Wwrite-strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes -Wmissing-format-attribute -Wold-style-definition -Wc++-compat -fno-common -DHAVE_CONFIG_H -I. -Ilto -I/work/source/gcc -I/work/source/gcc/lto -I/work/source/gcc/../include -I/work/source/gcc/../libcpp/include -I/work/source/gcc/../libdecnumber -I/work/source/gcc/../libdecnumber/dpd -I../libdecnumber /work/source/gcc/lto/lto.c -o lto/lto.o
/work/source/gcc/lto/lto.c:1163: warning: function declaration isn't a prototype
/work/source/gcc/lto/lto.c: In function 'free_ltrans_partitions':
/work/source/gcc/lto/lto.c:1163: warning: old-style function definition
/work/source/gcc/lto/lto.c:1168: error: 'struct ltrans_partition_def' has no member named 'cgraph'
/work/source/gcc/lto/lto.c:1168: error: 'set' undeclared (first use in this function)
/work/source/gcc/lto/lto.c:1168: error: (Each undeclared identifier is reported only once
/work/source/gcc/lto/lto.c:1168: error: for each function it appears in.)
/work/source/gcc/lto/lto.c:1171: warning: implicit declaration of function 'VEC_latrans_partition_heap_free' make[2]: *** [lto/lto.o] Error 1 make[2]: *** Waiting for unfinished jobs rm gcov.pod gfdl.pod cpp.pod fsf-funding.pod gcc.pod make[2]: Leaving directory `/work/cross-build/trunk-r173334- thumb/arm-none-linux-gnueabi/obj/gcc1/gcc' make[1]: *** [all-gcc] Error 2 make[1]: Leaving directory `/work/cross-build/trunk-r173334- thumb/arm-none-linux-gnueabi/obj/gcc1' make: *** [all] Error 2 + exit But I see you fixed it up soon after (r173336), so no action is required now, but I figured it was worth letting people know anyway. Cheers, Ian