Re: [AArch64] Generate load-pairs when the last load clobbers the address register [2/2]
Hi Jackson On Tuesday 10 July 2018 09:37 AM, Jackson Woodruff wrote: Hi all, This patch resolves PR86014. It does so by noticing that the last load may clobber the address register without issue (regardless of where it exists in the final ldp/stp sequence). That check has been changed so that the last register may be clobbered and the testcase (gcc.target/aarch64/ldp_stp_10.c) now passes. Bootstrap and regtest OK. OK for trunk? Jackson Changelog: gcc/ 2018-06-25 Jackson Woodruff PR target/86014 * config/aarch64/aarch64.c (aarch64_operands_adjust_ok_for_ldpstp): Remove address clobber check on last register. This looks good to me but you will need a maintainer to approve it. The only thing I would add is that if you could move the comment on top of the for loop to this patch. That is, keep the original /* Check if the addresses are clobbered by load. */ in your [1/2] and make the comment change in [2/2]. diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index d0e9b2d464183eecc8cc7639ca3e981d2ff243ba..feffe8ebdbd4efd0ffc09834547767ceec46f4e4 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -17074,7 +17074,7 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, bool load, /* Only the last register in the order in which they occur may be clobbered by the load. */ if (load) -for (int i = 0; i < num_instructions; i++) +for (int i = 0; i < num_instructions - 1; i++) if (reg_mentioned_p (reg[i], mem[i])) return false; Thanks Sudi
Re: [AArch64] Use arrays and loops rather than numbered variables in aarch64_operands_adjust_ok_for_ldpstp [1/2]
Hi Jackson On 11/07/18 17:48, Jackson Woodruff wrote: Hi Sudi, Thanks for the review. On 07/10/2018 10:56 AM, Sudakshina wrote: Hi Jackson - if (!MEM_P (mem_1) || aarch64_mem_pair_operand (mem_1, mode)) + if (!MEM_P (mem[1]) || aarch64_mem_pair_operand (mem[1], mode)) mem_1 == mem[1]? Oops, yes... That should be mem[0]. return false; - /* The mems cannot be volatile. */ ... /* If we have SImode and slow unaligned ldp, check the alignment to be at least 8 byte. */ if (mode == SImode && (aarch64_tune_params.extra_tuning_flags - & AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW) + & AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW) && !optimize_size - && MEM_ALIGN (mem_1) < 8 * BITS_PER_UNIT) + && MEM_ALIGN (mem[1]) < 8 * BITS_PER_UNIT) Likewise Done ... /* Check if the registers are of same class. */ - if (rclass_1 != rclass_2 || rclass_2 != rclass_3 || rclass_3 != rclass_4) - return false; + for (int i = 0; i < 3; i++) num_instructions -1 instead of 3 would be more consistent. Done + if (rclass[i] != rclass[i + 1]) + return false; It looks good otherwise. Thanks Sudi Re-regtested and boostrapped. OK for trunk? Looks good to me but you will need approval from a maintainer to commit it! Thanks Sudi Thanks, Jackson
Re: [AArch64] Generate load-pairs when the last load clobbers the address register [2/2]
Hi Jackson On 11/07/18 17:48, Jackson Woodruff wrote: Hi Sudi, On 07/10/2018 02:29 PM, Sudakshina Das wrote: Hi Jackson On Tuesday 10 July 2018 09:37 AM, Jackson Woodruff wrote: Hi all, This patch resolves PR86014. It does so by noticing that the last load may clobber the address register without issue (regardless of where it exists in the final ldp/stp sequence). That check has been changed so that the last register may be clobbered and the testcase (gcc.target/aarch64/ldp_stp_10.c) now passes. Bootstrap and regtest OK. OK for trunk? Jackson Changelog: gcc/ 2018-06-25 Jackson Woodruff PR target/86014 * config/aarch64/aarch64.c (aarch64_operands_adjust_ok_for_ldpstp): Remove address clobber check on last register. This looks good to me but you will need a maintainer to approve it. The only thing I would add is that if you could move the comment on top of the for loop to this patch. That is, keep the original /* Check if the addresses are clobbered by load. */ in your [1/2] and make the comment change in [2/2]. Thanks, change made. OK for trunk? Looks good to me but you will need approval from a maintainer to commit it! Thanks Sudi Thanks, Jackson
Re: [PATCH][GCC][AARCH64] Canonicalize aarch64 widening simd plus insns
Hi Matthew On 12/07/18 11:18, Richard Sandiford wrote: Looks good to me FWIW (not a maintainer), just a minor formatting thing: Matthew Malcomson writes: diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index aac5fa146ed8dde4507a0eb4ad6a07ce78d2f0cd..67b29cbe2cad91e031ee23be656ec61a403f2cf9 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -3302,38 +3302,78 @@ DONE; }) -(define_insn "aarch64_w" +(define_insn "aarch64_subw" [(set (match_operand: 0 "register_operand" "=w") -(ADDSUB: (match_operand: 1 "register_operand" "w") - (ANY_EXTEND: - (match_operand:VD_BHSI 2 "register_operand" "w"] + (minus: +(match_operand: 1 "register_operand" "w") +(ANY_EXTEND: + (match_operand:VD_BHSI 2 "register_operand" "w"] The (minus should be under the "(match_operand": (define_insn "aarch64_subw" [(set (match_operand: 0 "register_operand" "=w") (minus: (match_operand: 1 "register_operand" "w") (ANY_EXTEND: (match_operand:VD_BHSI 2 "register_operand" "w"] Same for the other patterns. Thanks, Richard You will need a maintainer's approval but this looks good to me. Thanks for doing this. I would only point out one other nit which you can choose to ignore: +/* Ensure + saddw2 and one saddw for the function add() + ssubw2 and one ssubw for the function subtract() + uaddw2 and one uaddw for the function uadd() + usubw2 and one usubw for the function usubtract() */ + +/* { dg-final { scan-assembler-times "\[ \t\]ssubw2\[ \t\]+" 1 } } */ +/* { dg-final { scan-assembler-times "\[ \t\]ssubw\[ \t\]+" 1 } } */ +/* { dg-final { scan-assembler-times "\[ \t\]saddw2\[ \t\]+" 1 } } */ +/* { dg-final { scan-assembler-times "\[ \t\]saddw\[ \t\]+" 1 } } */ +/* { dg-final { scan-assembler-times "\[ \t\]usubw2\[ \t\]+" 1 } } */ +/* { dg-final { scan-assembler-times "\[ \t\]usubw\[ \t\]+" 1 } } */ +/* { dg-final { scan-assembler-times "\[ \t\]uaddw2\[ \t\]+" 1 } } */ +/* { dg-final { scan-assembler-times "\[ \t\]uaddw\[ \t\]+" 1 } } */ The scan-assembly directives for the different functions can be placed right below each of them and that would make it easier to read the expected results in the test and you can get rid of the comments saying the same. Thanks Sudi
Re: [PATCH][AARCH64] PR target/84521 Fix frame pointer corruption with -fomit-frame-pointer with __builtin_setjmp
Hi Eric On 27/06/18 12:22, Wilco Dijkstra wrote: Eric Botcazou wrote: This test can easily be changed not to use optimize since it doesn't look like it needs it. We really need to tests these builtins properly, otherwise they will continue to fail on most targets. As far as I can see PR target/84521 has been reported only for Aarch64 so I'd just leave the other targets alone (and avoid propagating FUD if possible). It's quite obvious from PR84521 that this is an issue affecting all targets. Adding better generic tests for __builtin_setjmp can only be a good thing. Wilco This conversation seems to have died down and I would like to start it again. I would agree with Wilco's suggestion about keeping the test in the generic folder. I have removed the optimize attribute and the effect is still the same. It passes on AArch64 with this patch and it currently fails on x86 trunk (gcc version 9.0.0 20180712 (experimental) (GCC)) on -O1 and above. Thanks Sudi diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index f284e74..9792d28 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -473,7 +473,9 @@ extern unsigned aarch64_architecture_version; #define EH_RETURN_STACKADJ_RTX gen_rtx_REG (Pmode, R4_REGNUM) #define EH_RETURN_HANDLER_RTX aarch64_eh_return_handler_rtx () -/* Don't use __builtin_setjmp until we've defined it. */ +/* Don't use __builtin_setjmp until we've defined it. + CAUTION: This macro is only used during exception unwinding. + Don't fall for its name. */ #undef DONT_USE_BUILTIN_SETJMP #define DONT_USE_BUILTIN_SETJMP 1 diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 01f35f8..4266a3d 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -3998,7 +3998,7 @@ static bool aarch64_needs_frame_chain (void) { /* Force a frame chain for EH returns so the return address is at FP+8. */ - if (frame_pointer_needed || crtl->calls_eh_return) + if (frame_pointer_needed || crtl->calls_eh_return || cfun->has_nonlocal_label) return true; /* A leaf function cannot have calls or write LR. */ @@ -12218,6 +12218,13 @@ aarch64_expand_builtin_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED) expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL); } +/* Implement TARGET_BUILTIN_SETJMP_FRAME_VALUE. */ +static rtx +aarch64_builtin_setjmp_frame_value (void) +{ + return hard_frame_pointer_rtx; +} + /* Implement TARGET_GIMPLIFY_VA_ARG_EXPR. */ static tree @@ -17744,6 +17751,9 @@ aarch64_run_selftests (void) #undef TARGET_FOLD_BUILTIN #define TARGET_FOLD_BUILTIN aarch64_fold_builtin +#undef TARGET_BUILTIN_SETJMP_FRAME_VALUE +#define TARGET_BUILTIN_SETJMP_FRAME_VALUE aarch64_builtin_setjmp_frame_value + #undef TARGET_FUNCTION_ARG #define TARGET_FUNCTION_ARG aarch64_function_arg diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index a014a01..d5f33d8 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -6087,6 +6087,30 @@ DONE; }) +;; This is broadly similar to the builtins.c except that it uses +;; temporaries to load the incoming SP and FP. +(define_expand "nonlocal_goto" + [(use (match_operand 0 "general_operand")) + (use (match_operand 1 "general_operand")) + (use (match_operand 2 "general_operand")) + (use (match_operand 3 "general_operand"))] + "" +{ +rtx label_in = copy_to_reg (operands[1]); +rtx fp_in = copy_to_reg (operands[3]); +rtx sp_in = copy_to_reg (operands[2]); + +emit_move_insn (hard_frame_pointer_rtx, fp_in); +emit_stack_restore (SAVE_NONLOCAL, sp_in); + +emit_use (hard_frame_pointer_rtx); +emit_use (stack_pointer_rtx); + +emit_indirect_jump (label_in); + +DONE; +}) + ;; Helper for aarch64.c code. (define_expand "set_clobber_cc" [(parallel [(set (match_operand 0) diff --git a/gcc/testsuite/gcc.c-torture/execute/pr84521.c b/gcc/testsuite/gcc.c-torture/execute/pr84521.c new file mode 100644 index 000..564ef14 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr84521.c @@ -0,0 +1,53 @@ +/* { dg-require-effective-target indirect_jumps } */ + +#include +#include +#include + +jmp_buf buf; + +int uses_longjmp (void) +{ + jmp_buf buf2; + memcpy (buf2, buf, sizeof (buf)); + __builtin_longjmp (buf2, 1); +} + +int gl; +void after_longjmp (void) +{ + gl = 5; +} + +int +test_1 (int n) +{ + volatile int *p = alloca (n); + if (__builtin_setjmp (buf)) +{ + after_longjmp (); +} + else +{ + uses_longjmp (); +} + + return 0; +} + +int +test_2 (int n) +{ + int i; + int *ptr = (int *)__builtin_alloca (sizeof (int) * n); + for (i = 0; i < n; i++) +ptr[i] = i; + test_1 (n); + return 0; +} + +int main (int argc, const char **argv) +{ + __builtin_memset (&buf, 0xaf, sizeof (buf)); + test_2 (100); +}
Re: [GCC][PATCH][Aarch64] Exploiting BFXIL when OR-ing two AND-operations with appropriate bitmasks
Hi Sam On 13/07/18 17:09, Sam Tebbs wrote: Hi all, This patch adds an optimisation that exploits the AArch64 BFXIL instruction when or-ing the result of two bitwise and operations with non-overlapping bitmasks (e.g. (a & 0x) | (b & 0x)). Example: unsigned long long combine(unsigned long long a, unsigned long long b) { return (a & 0xll) | (b & 0xll); } void read2(unsigned long long a, unsigned long long b, unsigned long long *c, unsigned long long *d) { *c = combine(a, b); *d = combine(b, a); } When compiled with -O2, read2 would result in: read2: and x5, x1, #0x and x4, x0, #0x orr x4, x4, x5 and x1, x1, #0x and x0, x0, #0x str x4, [x2] orr x0, x0, x1 str x0, [x3] ret But with this patch results in: read2: mov x4, x1 bfxil x4, x0, 0, 32 str x4, [x2] bfxil x0, x1, 0, 32 str x0, [x3] ret Bootstrapped and regtested on aarch64-none-linux-gnu and aarch64-none-elf with no regressions. I am not a maintainer but I have a question about this patch. I may be missing something or reading it wrong. So feel free to point it out: +(define_insn "*aarch64_bfxil" + [(set (match_operand:DI 0 "register_operand" "=r") + (ior:DI (and:DI (match_operand:DI 1 "register_operand" "r") + (match_operand 3 "const_int_operand")) + (and:DI (match_operand:DI 2 "register_operand" "0") + (match_operand 4 "const_int_operand"] + "INTVAL (operands[3]) == ~INTVAL (operands[4]) + && aarch64_is_left_consecutive (INTVAL (operands[3]))" + { + HOST_WIDE_INT op4 = INTVAL (operands[4]); + operands[3] = GEN_INT (64 - ceil_log2 (op4)); + output_asm_insn ("bfxil\\t%0, %1, 0, %3", operands); In the BFXIL you are reading %3 LSB bits from operand 1 and putting it in the LSBs of %0. This means that the pattern should be masking the 32-%3 MSB of %0 and %3 LSB of %1. So shouldn't operand 4 is LEFT_CONSECUTIVE> Can you please compare a simpler version of the above example you gave to make sure the generated assembly is equivalent before and after the patch: void read2(unsigned long long a, unsigned long long b, unsigned long long *c) { *c = combine(a, b); } From the above text read2: and x5, x1, #0x and x4, x0, #0x orr x4, x4, x5 read2: mov x4, x1 bfxil x4, x0, 0, 32 This does not seem equivalent to me. Thanks Sudi + return ""; + } + [(set_attr "type" "bfx")] +) gcc/ 2018-07-11 Sam Tebbs * config/aarch64/aarch64.md (*aarch64_bfxil, *aarch64_bfxil_alt): Define. * config/aarch64/aarch64-protos.h (aarch64_is_left_consecutive): Define. * config/aarch64/aarch64.c (aarch64_is_left_consecutive): New function. gcc/testsuite 2018-07-11 Sam Tebbs * gcc.target/aarch64/combine_bfxil.c: New file. * gcc.target/aarch64/combine_bfxil_2.c: New file.
Re: [GCC][PATCH][Aarch64] Stop redundant zero-extension after UMOV when in DI mode
Hi Sam On Monday 23 July 2018 11:39 AM, Sam Tebbs wrote: Hi all, This patch extends the aarch64_get_lane_zero_extendsi instruction definition to also cover DI mode. This prevents a redundant AND instruction from being generated due to the pattern failing to be matched. Example: typedef char v16qi __attribute__ ((vector_size (16))); unsigned long long foo (v16qi a) { return a[0]; } Previously generated: foo: umov w0, v0.b[0] and x0, x0, 255 ret And now generates: foo: umov w0, v0.b[0] ret Bootstrapped on aarch64-none-linux-gnu and tested on aarch64-none-elf with no regressions. gcc/ 2018-07-23 Sam Tebbs * config/aarch64/aarch64-simd.md (*aarch64_get_lane_zero_extendsi): Rename to... (*aarch64_get_lane_zero_extend): ... This. Use GPI iterator instead of SI mode. gcc/testsuite 2018-07-23 Sam Tebbs * gcc.target/aarch64/extract_zero_extend.c: New file You will need an approval from a maintainer, but I would only add one request to this: diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 89e38e6..15fb661 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -3032,15 +3032,16 @@ [(set_attr "type" "neon_to_gp")] ) -(define_insn "*aarch64_get_lane_zero_extendsi" - [(set (match_operand:SI 0 "register_operand" "=r") - (zero_extend:SI +(define_insn "*aarch64_get_lane_zero_extend" + [(set (match_operand:GPI 0 "register_operand" "=r") + (zero_extend:GPI Since you are adding 4 new patterns with this change, could you add more cases in your test as well to make sure you have coverage for each of them. Thanks Sudi (vec_select: (match_operand:VDQQH 1 "register_operand" "w") (parallel [(match_operand:SI 2 "immediate_operand" "i")]] "TARGET_SIMD" { - operands[2] = aarch64_endian_lane_rtx (mode, INTVAL (operands[2])); + operands[2] = aarch64_endian_lane_rtx (mode, + INTVAL (operands[2])); return "umov\\t%w0, %1.[%2]"; } [(set_attr "type" "neon_to_gp")]
Re: [PATCH][AArch64] Implement new intrinsics vabsd_s64 and vnegd_s64
Hi Vlad On Friday 20 July 2018 10:37 AM, Vlad Lazar wrote: Hi, The patch adds implementations for the NEON intrinsics vabsd_s64 and vnegd_s64. (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/docs/ihi0073/latest/arm-neon-intrinsics-reference-architecture-specification) Bootstrapped and regtested on aarch64-none-linux-gnu and there are no regressions. OK for trunk? Thanks for doing this. This looks good to me but you will a maintainer's approval. Thanks Sudi Thanks, Vlad gcc/ 2018-07-02 Vlad Lazar * config/aarch64/arm_neon.h (vabsd_s64, vnegd_s64): New. gcc/testsuite/ 2018-07-02 Vlad Lazar * gcc.target/aarch64/scalar_intrinsics.c (test_vabsd_s64, test_vabsd_s64): New. --- diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 2d18400040f031dfcdaf60269ad484647804e1be..19e22431a85bcd09d0ea759b42b0a52420b6c43c 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -11822,6 +11822,13 @@ vabsq_s64 (int64x2_t __a) return __builtin_aarch64_absv2di (__a); } +__extension__ extern __inline int64_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vabsd_s64 (int64_t __a) +{ + return __builtin_aarch64_absdi (__a); +} + /* vadd */ __extension__ extern __inline int64_t @@ -22907,6 +22914,12 @@ vneg_s64 (int64x1_t __a) return -__a; } +__extension__ extern __inline int64_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vnegd_s64 (int64_t __a) +{ + return -__a; +} __extension__ extern __inline float32x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vnegq_f32 (float32x4_t __a) diff --git a/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c b/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c index ea29066e369b967d0781d31c8a5208bda9e4f685..45afeec373971838e0cd107038b4aa51a2d4998f 100644 --- a/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c +++ b/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c @@ -603,6 +603,14 @@ test_vsqaddd_u64 (uint64_t a, int64_t b) return vsqaddd_u64 (a, b); } +/* { dg-final { scan-assembler-times "\\tabs\\td\[0-9\]+" 1 } } */ + +int64_t +test_vabsd_s64 (int64_t a) +{ + return vabsd_s64 (a); +} + /* { dg-final { scan-assembler-times "\\tsqabs\\tb\[0-9\]+" 1 } } */ int8_t @@ -627,6 +635,14 @@ test_vqabss_s32 (int32_t a) return vqabss_s32 (a); } +/* { dg-final { scan-assembler-times "\\tneg\\tx\[0-9\]+" 1 } } */ + +int64_t +test_vnegd_s64 (int64_t a) +{ + return vnegd_s64 (a); +} + /* { dg-final { scan-assembler-times "\\tsqneg\\tb\[0-9\]+" 1 } } */ int8_t
Re: [GCC][PATCH][Aarch64] Stop redundant zero-extension after UMOV when in DI mode
Hi Sam On 25/07/18 14:08, Sam Tebbs wrote: On 07/23/2018 05:01 PM, Sudakshina Das wrote: Hi Sam On Monday 23 July 2018 11:39 AM, Sam Tebbs wrote: Hi all, This patch extends the aarch64_get_lane_zero_extendsi instruction definition to also cover DI mode. This prevents a redundant AND instruction from being generated due to the pattern failing to be matched. Example: typedef char v16qi __attribute__ ((vector_size (16))); unsigned long long foo (v16qi a) { return a[0]; } Previously generated: foo: umov w0, v0.b[0] and x0, x0, 255 ret And now generates: foo: umov w0, v0.b[0] ret Bootstrapped on aarch64-none-linux-gnu and tested on aarch64-none-elf with no regressions. gcc/ 2018-07-23 Sam Tebbs * config/aarch64/aarch64-simd.md (*aarch64_get_lane_zero_extendsi): Rename to... (*aarch64_get_lane_zero_extend): ... This. Use GPI iterator instead of SI mode. gcc/testsuite 2018-07-23 Sam Tebbs * gcc.target/aarch64/extract_zero_extend.c: New file You will need an approval from a maintainer, but I would only add one request to this: diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 89e38e6..15fb661 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -3032,15 +3032,16 @@ [(set_attr "type" "neon_to_gp")] ) -(define_insn "*aarch64_get_lane_zero_extendsi" - [(set (match_operand:SI 0 "register_operand" "=r") - (zero_extend:SI +(define_insn "*aarch64_get_lane_zero_extend" + [(set (match_operand:GPI 0 "register_operand" "=r") + (zero_extend:GPI Since you are adding 4 new patterns with this change, could you add more cases in your test as well to make sure you have coverage for each of them. Thanks Sudi Hi Sudi, Thanks for the feedback. Here is an updated patch that adds more testcases to cover the patterns generated by the different mode combinations. The changelog and description from my original email still apply. Thanks for making the changes and adding more test cases. I do however see that you are only covering 2 out of 4 new *aarch64_get_lane_zero_extenddi<> patterns. The *aarch64_get_lane_zero_extendsi<> were already existing. I don't mind those tests. I would just ask you to add the other two new patterns as well. Also since the different versions of the instruction generate same instructions (like foo_16qi and foo_8qi both give out the same instruction), I would suggest using a -fdump-rtl-final (or any relevant rtl dump) with the dg-options and using a scan-rtl-dump to scan the pattern name. Something like: /* { dg-do compile } */ /* { dg-options "-O3 -fdump-rtl-final" } */ ... ... /* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv16qi" "final" } } */ Thanks Sudi (vec_select: (match_operand:VDQQH 1 "register_operand" "w") (parallel [(match_operand:SI 2 "immediate_operand" "i")]] "TARGET_SIMD" { - operands[2] = aarch64_endian_lane_rtx (mode, INTVAL (operands[2])); + operands[2] = aarch64_endian_lane_rtx (mode, + INTVAL (operands[2])); return "umov\\t%w0, %1.[%2]"; } [(set_attr "type" "neon_to_gp")]
Re: [GCC][PATCH][Aarch64] Stop redundant zero-extension after UMOV when in DI mode
Hi Sam On 25/07/18 14:08, Sam Tebbs wrote: On 07/23/2018 05:01 PM, Sudakshina Das wrote: Hi Sam On Monday 23 July 2018 11:39 AM, Sam Tebbs wrote: Hi all, This patch extends the aarch64_get_lane_zero_extendsi instruction definition to also cover DI mode. This prevents a redundant AND instruction from being generated due to the pattern failing to be matched. Example: typedef char v16qi __attribute__ ((vector_size (16))); unsigned long long foo (v16qi a) { return a[0]; } Previously generated: foo: umov w0, v0.b[0] and x0, x0, 255 ret And now generates: foo: umov w0, v0.b[0] ret Bootstrapped on aarch64-none-linux-gnu and tested on aarch64-none-elf with no regressions. gcc/ 2018-07-23 Sam Tebbs * config/aarch64/aarch64-simd.md (*aarch64_get_lane_zero_extendsi): Rename to... (*aarch64_get_lane_zero_extend): ... This. Use GPI iterator instead of SI mode. gcc/testsuite 2018-07-23 Sam Tebbs * gcc.target/aarch64/extract_zero_extend.c: New file You will need an approval from a maintainer, but I would only add one request to this: diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 89e38e6..15fb661 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -3032,15 +3032,16 @@ [(set_attr "type" "neon_to_gp")] ) -(define_insn "*aarch64_get_lane_zero_extendsi" - [(set (match_operand:SI 0 "register_operand" "=r") - (zero_extend:SI +(define_insn "*aarch64_get_lane_zero_extend" + [(set (match_operand:GPI 0 "register_operand" "=r") + (zero_extend:GPI Since you are adding 4 new patterns with this change, could you add more cases in your test as well to make sure you have coverage for each of them. Thanks Sudi Hi Sudi, Thanks for the feedback. Here is an updated patch that adds more testcases to cover the patterns generated by the different mode combinations. The changelog and description from my original email still apply. Thanks it looks good to me! You will still need a maintainer to approve. Sudi (vec_select: (match_operand:VDQQH 1 "register_operand" "w") (parallel [(match_operand:SI 2 "immediate_operand" "i")]] "TARGET_SIMD" { - operands[2] = aarch64_endian_lane_rtx (mode, INTVAL (operands[2])); + operands[2] = aarch64_endian_lane_rtx (mode, + INTVAL (operands[2])); return "umov\\t%w0, %1.[%2]"; } [(set_attr "type" "neon_to_gp")]
Re: [GCC][PATCH][Aarch64] Stop redundant zero-extension after UMOV when in DI mode
Hi Sam On 01/08/18 10:12, Sam Tebbs wrote: On 07/31/2018 11:16 PM, James Greenhalgh wrote: On Thu, Jul 26, 2018 at 11:52:15AM -0500, Sam Tebbs wrote: Thanks for making the changes and adding more test cases. I do however see that you are only covering 2 out of 4 new *aarch64_get_lane_zero_extenddi<> patterns. The *aarch64_get_lane_zero_extendsi<> were already existing. I don't mind those tests. I would just ask you to add the other two new patterns as well. Also since the different versions of the instruction generate same instructions (like foo_16qi and foo_8qi both give out the same instruction), I would suggest using a -fdump-rtl-final (or any relevant rtl dump) with the dg-options and using a scan-rtl-dump to scan the pattern name. Something like: /* { dg-do compile } */ /* { dg-options "-O3 -fdump-rtl-final" } */ ... ... /* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv16qi" "final" } } */ Thanks Sudi Hi Sudi, Thanks again. Here's an update that adds 4 more tests, so all 8 patterns generated are now tested for! This is OK for trunk, thanks for the patch (and thanks Sudi for the review!) Thanks, James Thank you James! I'd appreciate it if someone could commit it as I don't have commit rights yet. I have committed this on your behalf as r263200. Thanks Sudi Sam Below is the updated changelog gcc/ 2018-07-26 Sam Tebbs * config/aarch64/aarch64-simd.md (*aarch64_get_lane_zero_extendsi): Rename to... (*aarch64_get_lane_zero_extend): ... This. Use GPI iterator instead of SI mode. gcc/testsuite 2018-07-26 Sam Tebbs * gcc.target/aarch64/extract_zero_extend.c: New file
Re: [PATCH][AARCH64] PR target/84521 Fix frame pointer corruption with -fomit-frame-pointer with __builtin_setjmp
Hi On 31/07/18 22:48, Andrew Pinski wrote: On Tue, Jul 31, 2018 at 2:43 PM James Greenhalgh wrote: On Thu, Jul 12, 2018 at 12:01:09PM -0500, Sudakshina Das wrote: Hi Eric On 27/06/18 12:22, Wilco Dijkstra wrote: Eric Botcazou wrote: This test can easily be changed not to use optimize since it doesn't look like it needs it. We really need to tests these builtins properly, otherwise they will continue to fail on most targets. As far as I can see PR target/84521 has been reported only for Aarch64 so I'd just leave the other targets alone (and avoid propagating FUD if possible). It's quite obvious from PR84521 that this is an issue affecting all targets. Adding better generic tests for __builtin_setjmp can only be a good thing. Wilco This conversation seems to have died down and I would like to start it again. I would agree with Wilco's suggestion about keeping the test in the generic folder. I have removed the optimize attribute and the effect is still the same. It passes on AArch64 with this patch and it currently fails on x86 trunk (gcc version 9.0.0 20180712 (experimental) (GCC)) on -O1 and above. I don't see where the FUD comes in here; either this builtin has a defined semantics across targets and they are adhered to, or the builtin doesn't have well defined semantics, or the targets fail to implement those semantics. The problem comes from the fact the builtins are not documented at all. See PR59039 for the issue on them not being documented. Thanks @James for bringing this up again. I tried to revive the conversation on PR59039 while working on this as well but that conversation mainly focused on documenting if we are allowed to use __builtin_setjmp and __builtin_longjmp on the same function and with the same jmp buffer or not. This patch and this test case however does not involve that issue. There are other holes in the documentation/implementation of these builtins. For now as advised by James, I have posted the test case on the PR. I personally don't see why this test case should go on the AArch64 tests when it clearly fails on other targets as well. But if we can not come to an agreement on that, I am willing to move it to AArch64 tests and maybe open a new bug report which is not marked as "target" with the same test case. Thanks Sudi Thanks, Andrew I think this should go in as is. If other targets are unhappy with the failing test they should fix their target or skip the test if it is not appropriate. You may want to CC some of the maintainers of platforms you know to fail as a courtesy on the PR (add your testcase, and add failing targets and their maintainers to that PR) before committing so it doesn't come as a complete surprise. This is OK with some attempt to get target maintainers involved in the conversation before commit. Thanks, James diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index f284e74..9792d28 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -473,7 +473,9 @@ extern unsigned aarch64_architecture_version; #define EH_RETURN_STACKADJ_RTX gen_rtx_REG (Pmode, R4_REGNUM) #define EH_RETURN_HANDLER_RTX aarch64_eh_return_handler_rtx () -/* Don't use __builtin_setjmp until we've defined it. */ +/* Don't use __builtin_setjmp until we've defined it. + CAUTION: This macro is only used during exception unwinding. + Don't fall for its name. */ #undef DONT_USE_BUILTIN_SETJMP #define DONT_USE_BUILTIN_SETJMP 1 diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 01f35f8..4266a3d 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -3998,7 +3998,7 @@ static bool aarch64_needs_frame_chain (void) { /* Force a frame chain for EH returns so the return address is at FP+8. */ - if (frame_pointer_needed || crtl->calls_eh_return) + if (frame_pointer_needed || crtl->calls_eh_return || cfun->has_nonlocal_label) return true; /* A leaf function cannot have calls or write LR. */ @@ -12218,6 +12218,13 @@ aarch64_expand_builtin_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED) expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL); } +/* Implement TARGET_BUILTIN_SETJMP_FRAME_VALUE. */ +static rtx +aarch64_builtin_setjmp_frame_value (void) +{ + return hard_frame_pointer_rtx; +} + /* Implement TARGET_GIMPLIFY_VA_ARG_EXPR. */ static tree @@ -17744,6 +17751,9 @@ aarch64_run_selftests (void) #undef TARGET_FOLD_BUILTIN #define TARGET_FOLD_BUILTIN aarch64_fold_builtin +#undef TARGET_BUILTIN_SETJMP_FRAME_VALUE +#define TARGET_BUILTIN_SETJMP_FRAME_VALUE aarch64_builtin_setjmp_frame_value + #undef TARGET_FUNCTION_ARG #define TARGET_FUNCTION_ARG aarch64_function_arg diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index a014a01..d5f33d8 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/
[PATCH, GCC] Fix unrolling check.
Hi I was fiddling around with the loop unrolling pass and noticed a check in decide_unroll_* functions (in the patch). The comment on top of this check says "/* If we were not asked to unroll this loop, just return back silently. */" However the check returns when loop->unroll == 0 rather than 1. The check was added in r255106 where the ChangeLog suggests that the actual intention was probably to check the value 1 and not 0. Tested on aarch64-none-elf with one new regression: FAIL: gcc.dg/pr40209.c (test for excess errors) This fails because the changes cause the loop to unroll 3 times using unroll_stupid and that shows up as excess error due -fopt-info. This option was added in r202077 but I am not sure why this particular test was chosen for it. Does this change look ok? Can I just remove the -fopt-info from the test or unrolling the loop in the test is not desirable? Thanks Sudi gcc/ChangeLog: 2019-11-07 Sudakshina Das * loop-unroll.c (decide_unroll_constant_iterations): Update condition to check loop->unroll. (decide_unroll_runtime_iterations): Likewise. (decide_unroll_stupid): Likewise. diff --git a/gcc/loop-unroll.c b/gcc/loop-unroll.c index 63fccd23fae38f8918a7d94411aaa43c72830dd3..9f7ab4b5c1c9b2333148e452b84afbf040707456 100644 --- a/gcc/loop-unroll.c +++ b/gcc/loop-unroll.c @@ -354,7 +354,7 @@ decide_unroll_constant_iterations (class loop *loop, int flags) widest_int iterations; /* If we were not asked to unroll this loop, just return back silently. */ - if (!(flags & UAP_UNROLL) && !loop->unroll) + if (!(flags & UAP_UNROLL) && loop->unroll == 1) return; if (dump_enabled_p ()) @@ -674,7 +674,7 @@ decide_unroll_runtime_iterations (class loop *loop, int flags) widest_int iterations; /* If we were not asked to unroll this loop, just return back silently. */ - if (!(flags & UAP_UNROLL) && !loop->unroll) + if (!(flags & UAP_UNROLL) && loop->unroll == 1) return; if (dump_enabled_p ()) @@ -1159,7 +1159,7 @@ decide_unroll_stupid (class loop *loop, int flags) widest_int iterations; /* If we were not asked to unroll this loop, just return back silently. */ - if (!(flags & UAP_UNROLL_ALL) && !loop->unroll) + if (!(flags & UAP_UNROLL_ALL) && loop->unroll == 1) return; if (dump_enabled_p ())
Re: [PATCH, GCC] Fix unrolling check.
Hi Eric On 08/11/2019 19:16, Eric Botcazou wrote: >> I was fiddling around with the loop unrolling pass and noticed a check >> in decide_unroll_* functions (in the patch). The comment on top of this >> check says >> "/* If we were not asked to unroll this loop, just return back silently. >>*/" >> However the check returns when loop->unroll == 0 rather than 1. >> >> The check was added in r255106 where the ChangeLog suggests that the >> actual intention was probably to check the value 1 and not 0. > > No, this is intended, 0 is the default value of the field, not 1. And note > that decide_unroll_constant_iterations, decide_unroll_runtime_iterations and > decide_unroll_stupid *cannot* be called with loop->unroll == 1 because of this > check in decide_unrolling: Thanks for the explanation. However, I do not understand why are we returning with the default value. The comment for "unroll" is a bit ambiguous for value 0. /* The number of times to unroll the loop. 0 means no information given, just do what we always do. A value of 1 means do not unroll the loop. A value of USHRT_MAX means unroll with no specific unrolling factor. Other values means unroll with the given unrolling factor. */ unsigned short unroll; What "do we always do"? Thanks Sudi > >if (loop->unroll == 1) > { > if (dump_file) > fprintf (dump_file, >";; Not unrolling loop, user didn't want it unrolled\n"); > continue; > } > >> Tested on aarch64-none-elf with one new regression: >> FAIL: gcc.dg/pr40209.c (test for excess errors) >> This fails because the changes cause the loop to unroll 3 times using >> unroll_stupid and that shows up as excess error due -fopt-info. This >> option was added in r202077 but I am not sure why this particular test >> was chosen for it. > > That's a regression, there should be no unrolling. >
Re: [PATCH, GCC] Fix unrolling check.
On 11/11/2019 14:50, Eric Botcazou wrote: >> Thanks for the explanation. However, I do not understand why are we >> returning with the default value. > > The regression you reported should be clear enough though: if we don't do > that, we will unroll in cases where we would not have before. Try with a > compiler that predates the pragma and compare, there should be no changes. > >> What "do we always do"? > > What we do in the absence of specific unrolling directives for the loop. Yeah fair enough! Sorry for the trouble. Sudi >
[Patch, GCC] Fix a condition post r278611
Hi While looking at vect_model_reduction_cost function, it seems Richard's change in a recent commit r278611 missed an update to the following if condition. Since the check for EXTRACT_LAST_REDUCTION is now split above, the same check in the if condition will never be true. gcc/ChangeLog 2019-xx-xx Sudakshina Das * tree-vect-loop.c (vect_model_reduction_cost): Remove reduction_type check from if condition. Is this ok for trunk? Thanks Sudi diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index ca8c818..7469204 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -3933,7 +3933,7 @@ vect_model_reduction_cost (stmt_vec_info stmt_info, internal_fn reduc_fn, /* No extra instructions needed in the prologue. */ prologue_cost = 0; - if (reduction_type == EXTRACT_LAST_REDUCTION || reduc_fn != IFN_LAST) + if (reduc_fn != IFN_LAST) /* Count one reduction-like operation per vector. */ inside_cost = record_stmt_cost (cost_vec, ncopies, vec_to_scalar, stmt_info, 0, vect_body);
Re: [Patch, GCC] Fix a condition post r278611
Hi Richard On 05/12/2019 17:04, Richard Sandiford wrote: > Sudakshina Das writes: >> Hi >> >> While looking at vect_model_reduction_cost function, it seems Richard's >> change in a recent commit r278611 missed an update to the following if >> condition. Since the check for EXTRACT_LAST_REDUCTION is now split >> above, the same check in the if condition will never be true. >> >> gcc/ChangeLog >> >> 2019-xx-xx Sudakshina Das >> >> * tree-vect-loop.c (vect_model_reduction_cost): Remove >> reduction_type check from if condition. >> >> Is this ok for trunk? > > OK, thanks. Thanks. Committed as r279012. Sudi > > Richard > >> >> Thanks >> Sudi >> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c >> index ca8c818..7469204 100644 >> --- a/gcc/tree-vect-loop.c >> +++ b/gcc/tree-vect-loop.c >> @@ -3933,7 +3933,7 @@ vect_model_reduction_cost (stmt_vec_info stmt_info, >> internal_fn reduc_fn, >> /* No extra instructions needed in the prologue. */ >> prologue_cost = 0; >> >> - if (reduction_type == EXTRACT_LAST_REDUCTION || reduc_fn != IFN_LAST) >> + if (reduc_fn != IFN_LAST) >> /* Count one reduction-like operation per vector. */ >> inside_cost = record_stmt_cost (cost_vec, ncopies, vec_to_scalar, >> stmt_info, 0, vect_body);
Fwd: [PATCH, GCC, Vect] Fix costing for vector shifts
Hi While looking at the vectorization for following example, we realized that even though vectorizable_shift function was distinguishing vector shifted by vector from vector shifted by scalar, while modeling the cost it would always add the cost of building a vector constant despite not needing it for vector shifted by scalar. This patch fixes this by using scalar_shift_arg to determine whether we need to build a vector for the second operand or not. This reduces prologue cost as shown in the test. Build and regression tests pass on aarch64-none-elf and x86_64-pc-linux-gnu-gcc. This gives a 3.42% boost to 525.x264_r in Spec2017 for AArch64. gcc/ChangeLog: 2019-xx-xx Sudakshina Das Richard Sandiford * tree-vect-stmt.c (vectorizable_shift): Condition ndts for vect_model_simple_cost call on scalar_shift_arg. gcc/testsuite/ChangeLog: 2019-xx-xx Sudakshina Das * gcc.dg/vect/vect-shift-5.c: New test. Is this ok for trunk? Thanks Sudi diff --git a/gcc/testsuite/gcc.dg/vect/vect-shift-5.c b/gcc/testsuite/gcc.dg/vect/vect-shift-5.c new file mode 100644 index 000..c1fd4f2 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-shift-5.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_shift } */ +/* { dg-require-effective-target vect_int } */ + +typedef unsigned int uint32_t; +typedef short unsigned int uint16_t; + +int foo (uint32_t arr[4][4]) +{ + int sum = 0; + for(int i = 0; i < 4; i++) +{ + sum += ((arr[0][i] >> 10) * 20) + ((arr[1][i] >> 11) & 53) + + ((arr[2][i] >> 12) * 7) + ((arr[3][i] >> 13) ^ 43); +} +return (((uint16_t)sum) + ((uint32_t)sum >> 16)) >> 1; +} + +/* { dg-final { scan-tree-dump {vectorizable_shift ===[\n\r][^\n]*prologue_cost = 0} "vect" } } */ diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c index 2cb6b15..396ff15 100644 --- a/gcc/tree-vect-stmts.c +++ b/gcc/tree-vect-stmts.c @@ -5764,7 +5764,8 @@ vectorizable_shift (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi, { STMT_VINFO_TYPE (stmt_info) = shift_vec_info_type; DUMP_VECT_SCOPE ("vectorizable_shift"); - vect_model_simple_cost (stmt_info, ncopies, dt, ndts, slp_node, cost_vec); + vect_model_simple_cost (stmt_info, ncopies, dt, + scalar_shift_arg ? 1 : ndts, slp_node, cost_vec); return true; }
Re: Fwd: [PATCH, GCC, Vect] Fix costing for vector shifts
Hi Jeff On 07/12/2019 17:44, Jeff Law wrote: > On Fri, 2019-12-06 at 14:05 +0000, Sudakshina Das wrote: >> Hi >> >> While looking at the vectorization for following example, we >> realized >> that even though vectorizable_shift function was distinguishing >> vector >> shifted by vector from vector shifted by scalar, while modeling the >> cost >> it would always add the cost of building a vector constant despite >> not >> needing it for vector shifted by scalar. >> >> This patch fixes this by using scalar_shift_arg to determine whether >> we >> need to build a vector for the second operand or not. This reduces >> prologue cost as shown in the test. >> >> Build and regression tests pass on aarch64-none-elf and >> x86_64-pc-linux-gnu-gcc. This gives a 3.42% boost to 525.x264_r in >> Spec2017 for AArch64. >> >> gcc/ChangeLog: >> >> 2019-xx-xx Sudakshina Das >> Richard Sandiford >> >> * tree-vect-stmt.c (vectorizable_shift): Condition ndts for >> vect_model_simple_cost call on scalar_shift_arg. >> >> gcc/testsuite/ChangeLog: >> >> 2019-xx-xx Sudakshina Das >> >> * gcc.dg/vect/vect-shift-5.c: New test. > It's a bit borderline, but it's really just twiddling a cost, so OK. Thanks :) Committed as r279114. Sudi > > jeff >
Re: Fwd: [PATCH, GCC, Vect] Fix costing for vector shifts
Hi Christophe On 10/12/2019 09:01, Christophe Lyon wrote: > Hi, > > On Mon, 9 Dec 2019 at 11:23, Sudakshina Das wrote: >> >> Hi Jeff >> >> On 07/12/2019 17:44, Jeff Law wrote: >>> On Fri, 2019-12-06 at 14:05 +, Sudakshina Das wrote: >>>> Hi >>>> >>>> While looking at the vectorization for following example, we >>>> realized >>>> that even though vectorizable_shift function was distinguishing >>>> vector >>>> shifted by vector from vector shifted by scalar, while modeling the >>>> cost >>>> it would always add the cost of building a vector constant despite >>>> not >>>> needing it for vector shifted by scalar. >>>> >>>> This patch fixes this by using scalar_shift_arg to determine whether >>>> we >>>> need to build a vector for the second operand or not. This reduces >>>> prologue cost as shown in the test. >>>> >>>> Build and regression tests pass on aarch64-none-elf and >>>> x86_64-pc-linux-gnu-gcc. This gives a 3.42% boost to 525.x264_r in >>>> Spec2017 for AArch64. >>>> > > Looks like you didn't check on arm, where I can see that the new testcase > fails: > FAIL: gcc.dg/vect/vect-shift-5.c -flto -ffat-lto-objects > scan-tree-dump vect "vectorizable_shift > ===[\\n\\r][^\\n]*prologue_cost = 0" > FAIL: gcc.dg/vect/vect-shift-5.c scan-tree-dump vect > "vectorizable_shift ===[\\n\\r][^\\n]*prologue_cost = 0" > > Seen on arm-none-linux-gnueabihf > --with-mode arm > --with-cpu cortex-a9 > --with-fpu neon-fp16 > > Christophe Thanks for reporting this. There is already a bugzilla report PR92870 for powerpc that I am looking at. Apologies I couldn't find your email address there to add you to the cc list. Thanks Sudi > >>>> gcc/ChangeLog: >>>> >>>> 2019-xx-xx Sudakshina Das >>>> Richard Sandiford >>>> >>>> * tree-vect-stmt.c (vectorizable_shift): Condition ndts for >>>> vect_model_simple_cost call on scalar_shift_arg. >>>> >>>> gcc/testsuite/ChangeLog: >>>> >>>> 2019-xx-xx Sudakshina Das >>>> >>>> * gcc.dg/vect/vect-shift-5.c: New test. >>> It's a bit borderline, but it's really just twiddling a cost, so OK. >> >> Thanks :) Committed as r279114. >> >> Sudi >> >>> >>> jeff >>> >>
[Committed, testsuite] Fix PR92870
Hi With my recent commit, I added a test that is not passing on all targets. My change was valid for targets that have a vector/scalar shift/rotate optabs (optab that supports vector shifted by scalar). Since it does not seem to be easy to find out which targets would support it, I am limiting the test to the target that I know pass. Committed as obvious r279310. gcc/testsuite/ChangeLog 2019-12-12 Sudakshina Das PR testsuite/92870 * gcc.dg/vect/vect-shift-5.c: Add target to scan-tree-dump. diff --git a/gcc/testsuite/gcc.dg/vect/vect-shift-5.c b/gcc/testsuite/gcc.dg/vect/vect-shift-5.c index c1fd4f2..68e517e 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-shift-5.c +++ b/gcc/testsuite/gcc.dg/vect/vect-shift-5.c @@ -16,4 +16,7 @@ int foo (uint32_t arr[4][4]) return (((uint16_t)sum) + ((uint32_t)sum >> 16)) >> 1; } -/* { dg-final { scan-tree-dump {vectorizable_shift ===[\n\r][^\n]*prologue_cost = 0} "vect" } } */ +/* For a target that has a vector/scalar shift/rotate optab, check + that we are not adding the cost of creating a vector from the scalar + in the prologue. */ +/* { dg-final { scan-tree-dump {vectorizable_shift ===[\n\r][^\n]*prologue_cost = 0} "vect" { target { aarch64*-*-* x86_64-*-* } } } } */
[PATCH V2] aarch64: Use Q-reg loads/stores in movmem expansion
Hi This is my attempt at reviving the old patch https://gcc.gnu.org/pipermail/gcc-patches/2019-January/514632.html I have followed on Kyrill's comment upstream on the link above and I am using the recommended option iii that he mentioned. "1) Adjust the copy_limit to 256 bits after checking AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS in the tuning. 2) Adjust aarch64_copy_one_block_and_progress_pointers to handle 256-bit moves. by iii: iii) Emit explicit V4SI (or any other 128-bit vector mode) pairs ldp/stps. This wouldn't need any adjustments to MD patterns, but would make aarch64_copy_one_block_and_progress_pointers more complex as it would now have two paths, where one handles two adjacent memory addresses in one calls." With this patch the following test #define N 8 extern int src[N], dst[N]; void foo (void) { __builtin_memcpy (dst, src, N * sizeof (int)); } which was originally giving foo: adrpx1, src add x1, x1, :lo12:src ldp x4, x5, [x1] adrpx0, dst add x0, x0, :lo12:dst ldp x2, x3, [x1, 16] stp x4, x5, [x0] stp x2, x3, [x0, 16] ret changes to the following foo: adrpx1, src add x1, x1, :lo12:src adrpx0, dst add x0, x0, :lo12:dst ldp q1, q0, [x1] stp q1, q0, [x0] ret This gives about 1.3% improvement on 523.xalancbmk_r in SPEC2017 and an overall code size reduction on most SPEC2017 Int benchmarks on Neoverse N1 due to more LDP/STP Q pair registers. Bootstrapped and regression tested on aarch64-none-linux-gnu. Is this ok for trunk? Thanks Sudi gcc/ChangeLog: 2020-07-23 Sudakshina Das Kyrylo Tkachov * config/aarch64/aarch64.c (aarch64_gen_store_pair): Add case for E_V4SImode. (aarch64_gen_load_pair): Likewise. (aarch64_copy_one_block_and_progress_pointers): Handle 256 bit copy. (aarch64_expand_cpymem): Expand copy_limit to 256bits where appropriate. gcc/testsuite/ChangeLog: 2020-07-23 Sudakshina Das Kyrylo Tkachov * gcc.target/aarch64/cpymem-q-reg_1.c: New test. * gcc.target/aarch64/large_struct_copy_2.c: Update for ldp q regs. ** Attachment inlined ** diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 3fe1feaa80ccb0a287ee1c7ea1056e8f0a830532..a38ff39c4d5d53f056bbba3114ebaf8f0414c037 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -6920,6 +6920,9 @@ aarch64_gen_store_pair (machine_mode mode, rtx mem1, rtx reg1, rtx mem2, case E_TFmode: return gen_store_pair_dw_tftf (mem1, reg1, mem2, reg2); +case E_V4SImode: + return gen_vec_store_pairv4siv4si (mem1, reg1, mem2, reg2); + default: gcc_unreachable (); } @@ -6943,6 +6946,9 @@ aarch64_gen_load_pair (machine_mode mode, rtx reg1, rtx mem1, rtx reg2, case E_TFmode: return gen_load_pair_dw_tftf (reg1, mem1, reg2, mem2); +case E_V4SImode: + return gen_load_pairv4siv4si (reg1, mem1, reg2, mem2); + default: gcc_unreachable (); } @@ -21097,6 +21103,27 @@ static void aarch64_copy_one_block_and_progress_pointers (rtx *src, rtx *dst, machine_mode mode) { + /* Handle 256-bit memcpy separately. We do this by making 2 adjacent memory + address copies using V4SImode so that we can use Q registers. */ + if (known_eq (GET_MODE_BITSIZE (mode), 256)) +{ + mode = V4SImode; + rtx reg1 = gen_reg_rtx (mode); + rtx reg2 = gen_reg_rtx (mode); + /* "Cast" the pointers to the correct mode. */ + *src = adjust_address (*src, mode, 0); + *dst = adjust_address (*dst, mode, 0); + /* Emit the memcpy. */ + emit_insn (aarch64_gen_load_pair (mode, reg1, *src, reg2, + aarch64_progress_pointer (*src))); + emit_insn (aarch64_gen_store_pair (mode, *dst, reg1, +aarch64_progress_pointer (*dst), reg2)); + /* Move the pointers forward. */ + *src = aarch64_move_pointer (*src, 32); + *dst = aarch64_move_pointer (*dst, 32); + return; +} + rtx reg = gen_reg_rtx (mode); /* "Cast" the pointers to the correct mode. */ @@ -21150,9 +21177,12 @@ aarch64_expand_cpymem (rtx *operands) /* Convert n to bits to make the rest of the code simpler. */ n = n * BITS_PER_UNIT; - /* Maximum amount to copy in one go. The AArch64 back-end has integer modes - larger than TImode, but we should not use them for loads/stores here. */ - const int copy_limit = GET_MODE_BITSIZE (TImode); + /* Maximum amount to copy in one go. We allow 256-bit chunks based on the + AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS tuning parameter and
RE: [PATCH V2] aarch64: Use Q-reg loads/stores in movmem expansion
Hi Richard > -Original Message- > From: Richard Sandiford > Sent: 31 July 2020 16:14 > To: Sudakshina Das > Cc: gcc-patches@gcc.gnu.org; Kyrylo Tkachov > Subject: Re: [PATCH V2] aarch64: Use Q-reg loads/stores in movmem > expansion > > Sudakshina Das writes: > > Hi > > > > This is my attempt at reviving the old patch > > https://gcc.gnu.org/pipermail/gcc-patches/2019-January/514632.html > > > > I have followed on Kyrill's comment upstream on the link above and I am > using the recommended option iii that he mentioned. > > "1) Adjust the copy_limit to 256 bits after checking > AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS in the tuning. > > 2) Adjust aarch64_copy_one_block_and_progress_pointers to handle 256- > bit moves. by iii: > >iii) Emit explicit V4SI (or any other 128-bit vector mode) pairs > > ldp/stps. This > wouldn't need any adjustments to > > MD patterns, but would make > aarch64_copy_one_block_and_progress_pointers more complex as it would > now have > > two paths, where one handles two adjacent memory addresses in one > calls." > > > > With this patch the following test > > > > #define N 8 > > extern int src[N], dst[N]; > > > > void > > foo (void) > > { > > __builtin_memcpy (dst, src, N * sizeof (int)); } > > > > which was originally giving > > foo: > > adrpx1, src > > add x1, x1, :lo12:src > > ldp x4, x5, [x1] > > adrpx0, dst > > add x0, x0, :lo12:dst > > ldp x2, x3, [x1, 16] > > stp x4, x5, [x0] > > stp x2, x3, [x0, 16] > > ret > > > > > > changes to the following > > foo: > > adrpx1, src > > add x1, x1, :lo12:src > > adrpx0, dst > > add x0, x0, :lo12:dst > > ldp q1, q0, [x1] > > stp q1, q0, [x0] > > ret > > > > This gives about 1.3% improvement on 523.xalancbmk_r in SPEC2017 and > > an overall code size reduction on most > > SPEC2017 Int benchmarks on Neoverse N1 due to more LDP/STP Q pair > registers. > > Sorry for the slow review. LGTM with a very minor nit (sorry)… Thanks. Committed with the change. > > > @@ -21150,9 +21177,12 @@ aarch64_expand_cpymem (rtx *operands) > >/* Convert n to bits to make the rest of the code simpler. */ > >n = n * BITS_PER_UNIT; > > > > - /* Maximum amount to copy in one go. The AArch64 back-end has > integer modes > > - larger than TImode, but we should not use them for loads/stores here. > */ > > - const int copy_limit = GET_MODE_BITSIZE (TImode); > > + /* Maximum amount to copy in one go. We allow 256-bit chunks based > on the > > + AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS tuning parameter and > > +TARGET_SIMD. */ > > + const int copy_limit = ((aarch64_tune_params.extra_tuning_flags > > + & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS) > > + || !TARGET_SIMD) > > +? GET_MODE_BITSIZE (TImode) : 256; > > Should only be one space before “256”. > > I guess at some point we should consider handling fixed-length SVE too, but > that's only worth it for -msve-vector-bits=512 and higher. Yes sure I will add this for future backlog. > > Thanks, > Richard
RE: [PATCH V2] aarch64: Use Q-reg loads/stores in movmem expansion
Hi Richard Thank you for fixing this. I apologise for the trouble. I ran bootstrap only on an earlier version of the patch where I should have ran it again on the final one! ☹ I will be more careful in the future, Thanks Sudi > -Original Message- > From: Richard Sandiford > Sent: 05 August 2020 14:52 > To: Andreas Schwab > Cc: Sudakshina Das ; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH V2] aarch64: Use Q-reg loads/stores in movmem > expansion > > Andreas Schwab writes: > > This breaks bootstrap. > > I've pushed the below to fix this after bootstrapping & regression testing on > aarch64-linux-gnu. > > Richard
RE: [PATCH] Fix handling of OPT_mgeneral_regs_only in attribute.
Hi Martin > -Original Message- > From: Martin Liška > Sent: 21 May 2020 16:01 > To: gcc-patches@gcc.gnu.org > Cc: Sudakshina Das > Subject: [PATCH] Fix handling of OPT_mgeneral_regs_only in attribute. > > Hi. > > Similarly to: > > case OPT_mstrict_align: >if (val) > opts->x_target_flags |= MASK_STRICT_ALIGN; >else > opts->x_target_flags &= ~MASK_STRICT_ALIGN; >return true; > > the MASK_GENERAL_REGS_ONLY mask should be handled the same way. My old patch added the -mno-* version of the option and hence needed the change. Without the _no_ version for mgeneral-regs-only, I would imagine "val" to only ever have 1 as a value. Am I missing something here? Sudi > > @Sudakshina: The 'opts->x_target_flags |= MASK_STRICT_ALIGN' change is > not backported to all active branches. Can you please do it? > > Ready to be installed? > > gcc/ChangeLog: > > 2020-05-21 Martin Liska > > * common/config/aarch64/aarch64-common.c > (aarch64_handle_option): > Properly maask MASK_GENERAL_REGS_ONLY based on val. > --- > gcc/common/config/aarch64/aarch64-common.c | 5 - > 1 file changed, 4 insertions(+), 1 deletion(-) >
[PATCH, GCC, AArch64] Fix PR88398 for AArch64
Hi This patch is trying to fix PR88398 for AArch64. As discussed in the PR, loop unrolling is probably what we can do here. As an easy fix, the existing unroll_stupid is unrolling the given example better than the unroll_runtime_iterations since the the loop contains a break inside it. So all I have done here is: 1) Add a target hook so that this is AArch64 specific. 2) We are not unrolling the loops that decide_unroll_runtime_iterations would reject. 3) Out of the ones that decide_unroll_runtime_iterations would accept, check if the loop has more than 1 exit (this is done in the new target hook) and if it does, try to unroll using unroll_stupid. Regression tested on AArch64 and added the test from the PR. This gives an overall code size reduction of 2.35% and performance gain of 0.498% on Spec2017 Intrate. Is this ok for trunk? Thanks Sudi gcc/ChangeLog: 2019-xx-xx Sudakshina Das PR88398 * cfgloop.h: Include target.h. (lpt_dec): Move to... * target.h (lpt_dec): ... Here. * target.def: Define TARGET_LOOP_DECISION_ADJUST. * loop-unroll.c (decide_unroll_runtime_iterations): Use new target hook. (decide_unroll_stupid): Likewise. * config/aarch64/aarch64.c (aarch64_loop_decision_adjust): New function. (TARGET_LOOP_DECISION_ADJUST): Define for AArch64. * doc/tm.texi: Regenerated. * doc/tm.texi.in: Document TARGET_LOOP_DECISION_ADJUST. gcc/testsuite/ChangeLog: 2019-xx-xx Sudakshina Das PR88398 * gcc.target/aarch64/pr88398.c: New test. diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h index 0b0154ffd7bf031a005de993b101d9db6dd98c43..985c74e3b60728fc8c9d34b69634488cae3451cb 100644 --- a/gcc/cfgloop.h +++ b/gcc/cfgloop.h @@ -21,15 +21,7 @@ along with GCC; see the file COPYING3. If not see #define GCC_CFGLOOP_H #include "cfgloopmanip.h" - -/* Structure to hold decision about unrolling/peeling. */ -enum lpt_dec -{ - LPT_NONE, - LPT_UNROLL_CONSTANT, - LPT_UNROLL_RUNTIME, - LPT_UNROLL_STUPID -}; +#include "target.h" struct GTY (()) lpt_decision { enum lpt_dec decision; diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 599d07a729e7438080f8b5240ee95037a49fb983..f31ac41d66257c01ead8d5f5b9b22379ecb5d276 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -21093,6 +21093,39 @@ aarch64_sched_can_speculate_insn (rtx_insn *insn) } } +/* Implement TARGET_LOOP_DECISION_ADJUST. CONSIDER is the loop decision + currently being checked for loop LOOP. This returns a decision which could + either be LPT_UNROLL_STUPID or the current value in LOOP. */ +static enum lpt_dec +aarch64_loop_decision_adjust (enum lpt_dec consider, class loop *loop) +{ + switch (consider) +{ +case LPT_UNROLL_CONSTANT: + return loop->lpt_decision.decision; + +case LPT_UNROLL_RUNTIME: +/* Fall through. */ +case LPT_UNROLL_STUPID: + { + vec edges = get_loop_exit_edges (loop); + if (edges.length () > 1) + { + if (dump_file) + fprintf (dump_file, ";; Need change in loop decision\n"); + consider = LPT_UNROLL_STUPID; + return consider; + } + return loop->lpt_decision.decision; + } + +case LPT_NONE: +/* Fall through. */ +default: + gcc_unreachable (); +} +} + /* Implement TARGET_COMPUTE_PRESSURE_CLASSES. */ static int @@ -21839,6 +21872,9 @@ aarch64_libgcc_floating_mode_supported_p #undef TARGET_CAN_USE_DOLOOP_P #define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost +#undef TARGET_LOOP_DECISION_ADJUST +#define TARGET_LOOP_DECISION_ADJUST aarch64_loop_decision_adjust + #undef TARGET_SCHED_ADJUST_PRIORITY #define TARGET_SCHED_ADJUST_PRIORITY aarch64_sched_adjust_priority diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index cd9aed9874f4e6b2b0e2f8956ed6155975e643a8..61bd00e84c8a2a8865e95ba579c3b94790ab1331 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -11857,6 +11857,15 @@ is required only when the target has special constraints like maximum number of memory accesses. @end deftypefn +@deftypefn {Target Hook} {enum lpt_dec} TARGET_LOOP_DECISION_ADJUST (enum lpt_dec @var{consider}, class loop *@var{loop}) +This target hook returns either a new value for the loop unrolling +decision or the existing value in @var{loop}. The parameter @var{consider} +is the loop decision currently being tested. The parameter @var{loop} is a +pointer to the loop, which is going to be checked for unrolling. This target +hook is required only when the target wants to override the unrolling +decisions. +@end deftypefn + @defmac POWI_MAX_MULTS If defined, this macro is interpreted as a signed integer C expression that specifies the maximum number of floating point multiplications diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 2739e9ceec5ad7253ff9135da8dbe3bf6010e8d7..7a7f917fb45a6cc22f373ff16f8b78aa3e35f210 100644 --- a/gcc/
Re: [PATCH, GCC, AArch64] Fix PR88398 for AArch64
Hi Richard I apologise I should have given more explanation on my cover letter. Although the bug was filed for vectorization, the conversation on it talked about loops with two exits not being supported in the vectorizer and being not being possible without lto and peeling causing more harm than benefit. There was also no clear consensus among the discussion about the best way to do unrolling. So I looked at Wilco's suggestion of unrolling here https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398#c8 Although unroll_stupid does not exactly unroll it as he shows but it gets closer than unroll_runtime_iterations. So I ran an experiment to see if unrolliong the loop with unroll_stupid gets any benefit. The code size benefit was easy to see with the small example but it also gave performance benefit on Spec2017. The benefit comes because unroll_runtime_iteration adds a switch case at the beginning for iteration check. This is less efficient because it creates too many branch close together specially for a loop which has more than 1 exit. beq .L70 cmp x12, 1 beq .L55 cmp x12, 2 beq .L57 cmp x12, 3 beq .L59 cmp x12, 4 beq .L61 cmp x12, 5 beq .L63 cmp x12, 6 bne .L72 Finally I agree that unroll_stupid by default did not touch loops with multiple exists but that was marked as a "TODO" to change later so I assumed that check was not a hard requirement for the unrolling alghorithm. /* Do not unroll loops with branches inside -- it increases number of mispredicts. TODO: this heuristic needs tunning; call inside the loop body is also relatively good reason to not unroll. */ unroll_stupid is also not touched unless there is -funroll-all-loops or a loop pragma incidcating that maybe this could be potentially harmful on certain targets. Since my experiments on AArch64 showed otherwise, I thought the easiest starting point would be to do this in a target hook and only for a specific case (multiple exits). Thanks Sudi From: Richard Biener Sent: Friday, November 15, 2019 9:32 AM To: Sudakshina Das Cc: gcc-patches@gcc.gnu.org ; Kyrill Tkachov ; James Greenhalgh ; Richard Earnshaw ; bin.ch...@linux.alibaba.com ; o...@ucw.cz Subject: Re: [PATCH, GCC, AArch64] Fix PR88398 for AArch64 On Thu, Nov 14, 2019 at 4:41 PM Sudakshina Das wrote: > > Hi > > This patch is trying to fix PR88398 for AArch64. As discussed in the PR, > loop unrolling is probably what we can do here. As an easy fix, the > existing unroll_stupid is unrolling the given example better than the > unroll_runtime_iterations since the the loop contains a break inside it. Hm, the bug reference doesn't help me at all in reviewing this - the bug is about vectorization. So why is unroll_stupid better than unroll_runtime_iterations for a loop with a break (or as your implementation, with multiple exists)? I don't like this target hook, it seems like general heuristics can be improved here, but it seems unroll-stupid doesn't consider loops with multiple exits at all? Richard. > So all I have done here is: > 1) Add a target hook so that this is AArch64 specific. > 2) We are not unrolling the loops that decide_unroll_runtime_iterations > would reject. > 3) Out of the ones that decide_unroll_runtime_iterations would accept, > check if the loop has more than 1 exit (this is done in the new target > hook) and if it does, try to unroll using unroll_stupid. > > Regression tested on AArch64 and added the test from the PR. This gives > an overall code size reduction of 2.35% and performance gain of 0.498% > on Spec2017 Intrate. > > Is this ok for trunk? > > Thanks > Sudi > > gcc/ChangeLog: > > 2019-xx-xx Sudakshina Das > > PR88398 > * cfgloop.h: Include target.h. > (lpt_dec): Move to... > * target.h (lpt_dec): ... Here. > * target.def: Define TARGET_LOOP_DECISION_ADJUST. > * loop-unroll.c (decide_unroll_runtime_iterations): Use new target > hook. > (decide_unroll_stupid): Likewise. > * config/aarch64/aarch64.c (aarch64_loop_decision_adjust): New > function. > (TARGET_LOOP_DECISION_ADJUST): Define for AArch64. > * doc/tm.texi: Regenerated. > * doc/tm.texi.in: Document TARGET_LOOP_DECISION_ADJUST. > > gcc/testsuite/ChangeLog: > > 2019-xx-xx Sudakshina Das > > PR88398 > * gcc.target/aarch64/pr88398.c: New test.
[Committed][Arm][testsuite] Fix failure for arm-fp16-ops-*.C
Hi Since r275022 which deprecates some uses of volatile, we have seen the following failures on arm-none-eabi and arm-none-linux-gnueabihf FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-1.C -std=gnu++2a (test for excess errors) FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-2.C -std=gnu++2a (test for excess errors) FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-3.C -std=gnu++2a (test for excess errors) FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-4.C -std=gnu++2a (test for excess errors) FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-5.C -std=gnu++2a (test for excess errors) FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-6.C -std=gnu++2a (test for excess errors) FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-7.C -std=gnu++2a (test for excess errors) FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-8.C -std=gnu++2a (test for excess errors) Which catches the deprecated uses of volatile variables declared in arm-fp16-ops.h. This patch removes the volatile declarations from the header. Since none of the tests are run with any high optimization levels, this should change should not prevent the real function of the tests. Tests with RUNTESTFLAGS="dg.exp=arm-fp16-ops-*.C" now pass with the patch on arm-none-eabi. Committed as obvious r278905 gcc/testsuite/ChangeLog: 2019-xx-xx Sudakshina Das * g++.dg/ext/arm-fp16/arm-fp16-ops.h: Remove volatile keyword. Thanks Sudi diff --git a/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops.h b/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops.h index 320494e..a92e081 100644 --- a/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops.h +++ b/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops.h @@ -7,16 +7,16 @@ #define TEST(e) assert (e) #define TESTNOT(e) assert (!(e)) -volatile __fp16 h0 = 0.0; -volatile __fp16 h1 = 1.0; -volatile __fp16 h42 = 42.0; -volatile __fp16 hm2 = -2.0; -volatile __fp16 temp; - -volatile float f0 = 0.0; -volatile float f1 = 1.0; -volatile float f42 = 42.0; -volatile float fm2 = -2.0; +__fp16 h0 = 0.0; +__fp16 h1 = 1.0; +__fp16 h42 = 42.0; +__fp16 hm2 = -2.0; +__fp16 temp; + +float f0 = 0.0; +float f1 = 1.0; +float f42 = 42.0; +float fm2 = -2.0; int main (void) {
Re: [PATCH][GCC] Correct name of file in ChangeLog
Hi Matthew On 01/08/18 10:25, matthew.malcom...@arm.com wrote: My first patch included an incorrect ChangeLog entry -- the filename was misspelt. This corrects it. I think this counts as an obvious change. I have committed this on your behalf. Thanks Sudi
Re: [PATCH][GCC][AARCH64] Use STLUR for atomic_store
Hi Matthew On 02/08/18 17:26, matthew.malcom...@arm.com wrote: Use the STLUR instruction introduced in Armv8.4-a. This insruction has the store-release semantic like STLR but can take a 9-bit unscaled signed immediate offset. Example test case: ``` void foo () { int32_t *atomic_vals = calloc (4, sizeof (int32_t)); atomic_store_explicit (atomic_vals + 1, 2, memory_order_release); } ``` Before patch generates ``` foo: stp x29, x30, [sp, -16]! mov x1, 4 mov x0, x1 mov x29, sp bl calloc mov w1, 2 add x0, x0, 4 stlrw1, [x0] ldp x29, x30, [sp], 16 ret ``` After patch generates ``` foo: stp x29, x30, [sp, -16]! mov x1, 4 mov x0, x1 mov x29, sp bl calloc mov w1, 2 stlur w1, [x0, 4] ldp x29, x30, [sp], 16 ret ``` Full bootstrap and regression test done on aarch64. Ok for trunk? gcc/ 2018-07-26 Matthew Malcomson * config/aarch64/aarch64-protos.h (aarch64_offset_9bit_signed_unscaled_p): New declaration. * config/aarch64/aarch64.c (aarch64_offset_9bit_signed_unscaled_p): Rename from offset_9bit_signed_unscaled_p. * config/aarch64/aarch64.h (TARGET_ARMV8_4): Add feature macro. * config/aarch64/atomics.md (atomic_store): Allow offset and use stlur. * config/aarch64/constraints.md (Ust): New constraint. * config/aarch64/predicates.md. (aarch64_sync_or_stlur_memory_operand): New predicate. gcc/testsuite/ 2018-07-26 Matthew Malcomson * gcc.target/aarch64/atomic-store.c: New. Thank you for doing this. I am not a maintainer but I have a few nits on this patch: diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index af5db9c595385f7586692258f750b6aceb3ed9c8..630a75bf776fcdc374aa9ffa4bb020fea3719320 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -393,6 +393,7 @@ void aarch64_split_add_offset (scalar_int_mode, rtx, rtx, rtx, rtx, rtx); bool aarch64_mov_operand_p (rtx, machine_mode); ... -static inline bool -offset_9bit_signed_unscaled_p (machine_mode mode ATTRIBUTE_UNUSED, +bool +aarch64_offset_9bit_signed_unscaled_p (machine_mode mode ATTRIBUTE_UNUSED, poly_int64 offset) This needs to be aligned with the first argument ... @@ -5837,7 +5837,7 @@ aarch64_classify_address (struct aarch64_address_info *info, ldr/str instructions (only big endian will get here). */ if (mode == CImode) return (aarch64_offset_7bit_signed_scaled_p (TImode, offset) - && (offset_9bit_signed_unscaled_p (V16QImode, offset + 32) + && (aarch64_offset_9bit_signed_unscaled_p (V16QImode, offset + 32) This is not less that 80 characters ... +;; STLUR instruction constraint requires Armv8.4 +(define_special_memory_constraint "Ust" + "@internal + A memory address suitable for use with an stlur instruction." + (and (match_operand 0 "aarch64_sync_or_stlur_memory_operand") + (match_test "TARGET_ARMV8_4"))) + You are already checking for TARGET_ARMV8_4 inside aarch64_sync_or_stlur_memory_operand. Also see my comment below for this function. ... +;; True if the operand is memory reference valid for one of a str or stlur +;; operation. +(define_predicate "aarch64_sync_or_stlur_memory_operand" + (ior (match_operand 0 "aarch64_sync_memory_operand") + (and (match_operand 0 "memory_operand") + (match_code "plus" "0") + (match_code "reg" "00") + (match_code "const_int" "01"))) +{ + if (aarch64_sync_memory_operand (op, mode)) +return true; + + if (!TARGET_ARMV8_4) +return false; + + rtx mem_op = XEXP (op, 0); + rtx plus_op0 = XEXP (mem_op, 0); + rtx plus_op1 = XEXP (mem_op, 1); + + if (GET_MODE (plus_op0) != DImode) +return false; + + poly_int64 offset; + poly_int_rtx_p (plus_op1, &offset); + return aarch64_offset_9bit_signed_unscaled_p (mode, offset); +}) + This predicate body makes it a bit mixed up with the two type of operands that you want to test especially looking at it from the constraint check perspective. I am assuming you would not want to use the non-immediate form of stlur and instead only use it in the form: STLUR , [, #] and use stlr for no immediate alternative. Thus the constraint does not need to check for aarch64_sync_memory_operand. My suggestion would be to make this operand check separate. Something like: +(define_predicate "aarch64_sync_or_stlur_memory_operand" + (ior (match_operand 0 "aarch64_sync_memory_operand") + (match_operand 0 "aarch64_stlur_memory_operand"))) Where you define aarch64_stlur_memory_operand as +bool aarch64_stlur_memory_operand (rtx op) +{ + if (!TARGET_ARMV8_4) +return false; + + rtx mem_op = XE
Re: [PATCH][GCC][AArch64] Limit movmem copies to TImode copies.
Hi Tamar On 13/08/18 17:27, Tamar Christina wrote: Hi Thomas, Thanks for the review. I’ll correct the typo before committing if I have no other changes required by a maintainer. Regards, Tamar. I am not a maintainer but I would like to point out something in your patch. I think you test case will fail with -mabi=ilp32 FAIL: gcc.target/aarch64/large_struct_copy_2.c (test for excess errors) Excess errors: /work/trunk/src/gcc/gcc/testsuite/gcc.target/aarch64/large_struct_copy_2.c:18:27: warning: overflow in conversion from 'long long int' to 'long int' changes value from '4073709551611' to '2080555003' [-Woverflow] We have had more such recent failures and James gave a very neat way to make sure the mode comes out what you intend it to here: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00233.html I would just ask you to change the data types accordingly and test it with -mabi=ilp32. Thanks Sudi From: Thomas Preudhomme Sent: Monday, August 13, 2018 14:37 To: Tamar Christina Cc: gcc-patches@gcc.gnu.org; nd ; James Greenhalgh ; Richard Earnshaw ; Marcus Shawcroft Subject: Re: [PATCH][GCC][AArch64] Limit movmem copies to TImode copies. Hi Tamar, Thanks for your patch. Just one comment about your ChangeLog entry for the testsuiet change: shouldn't it mention that it is a new testcase? The patch you attached seems to create the file. Best regards, Thomas On Mon, 13 Aug 2018 at 10:33, Tamar Christina mailto:tamar.christ...@arm.com>> wrote: Hi All, On AArch64 we have integer modes larger than TImode, and while we can generate moves for these they're not as efficient. So instead make sure we limit the maximum we can copy to TImode. This means copying a 16 byte struct will issue 1 TImode copy, which will be done using a single STP as we expect but an CImode sized copy won't issue CImode operations. Bootstrapped and regtested on aarch4-none-linux-gnu and no issues. Crosstested aarch4_be-none-elf and no issues. Ok for trunk? Thanks, Tamar gcc/ 2018-08-13 Tamar Christina mailto:tamar.christ...@arm.com>> * config/aarch64/aarch64.c (aarch64_expand_movmem): Set TImode max. gcc/testsuite/ 2018-08-13 Tamar Christina mailto:tamar.christ...@arm.com>> * gcc.target/aarch64/large_struct_copy_2.c: Add assembler scan. --
Re: [PATCH][ARM] Fix test armv8_2-fp16-move-1.c
On 22/11/17 15:21, Kyrill Tkachov wrote: On 22/11/17 11:25, Sudi Das wrote: Hi Kyrill and Christophe In case of soft fp testing, there are other assembly directives apart from the vmov one which are also failing. The directives probably make more sense in the hard fp context so instead of removing the vmov, I have added the -mfloat-abi=hard option. Is this ok for trunk? If yes could someone post it on my behalf? Sudi Thanks Sudi, You're right, this whole test isn't written with softfp in mind. We might as well restrict it to -mfloat-abi=hard. I've committed the patch with r255061. Hi Kyriil Thanks for the commit! Would you like to get commit access to the SVN repo? If you complete the form at https://sourceware.org/cgi-bin/pdw/ps_form.cgi using my email address as the approver we can get that sorted out :) Thanks again for the invite. I have filled out the form! :) Sudi Kyrill *** gcc/testsuite/ChangeLog *** 2017-11-22 Sudakshina Das * gcc.target/arm/armv8_2-fp16-move-1.c: Add -mfloat-abi=hard option. From: Kyrill Tkachov Sent: Monday, November 20, 2017 2:20 PM To: Christophe Lyon Cc: Sudi Das; gcc-patches@gcc.gnu.org; nd; Ramana Radhakrishnan; Richard Earnshaw Subject: Re: [PATCH][ARM] Fix test armv8_2-fp16-move-1.c On 20/11/17 14:14, Christophe Lyon wrote: Hi, On 17 November 2017 at 12:12, Kyrill Tkachov wrote: On 17/11/17 10:45, Sudi Das wrote: Hi Kyrill Thanks I have made the change. Thanks Sudi, I've committed this on your behalf with r254863. Kyrill Sudi From: Kyrill Tkachov Sent: Thursday, November 16, 2017 5:03 PM To: Sudi Das; gcc-patches@gcc.gnu.org Cc: nd; Ramana Radhakrishnan; Richard Earnshaw Subject: Re: [PATCH][ARM] Fix test armv8_2-fp16-move-1.c Hi Sudi, On 16/11/17 16:37, Sudi Das wrote: Hi This patch fixes the test case armv8_2-fp16-move-1.c for arm-none-linux-gnueabihf where 2 of the scan-assembler directives were failing. We now generate less vmov between core and VFP registers. Thus changing those directives to reflect that. Is this ok for trunk? If yes could someone commit it on my behalf? Sudi *** gcc/testsuite/ChangeLog *** 2017-11-16 Sudakshina Das * gcc.target/arm/armv8_2-fp16-move-1.c: Edit vmov scan-assembler directives. diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c index bb4e68f..0ed8560 100644 --- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c +++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c @@ -101,8 +101,8 @@ test_select_8 (__fp16 a, __fp16 b, __fp16 c) /* { dg-final { scan-assembler-times {vselgt\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */ /* { dg-final { scan-assembler-times {vselge\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */ -/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, r[0-9]+} 4 } } */ -/* { dg-final { scan-assembler-times {vmov\.f16\tr[0-9]+, s[0-9]+} 4 } } */ +/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, r[0-9]+} 2 } } */ +/* { dg-final { scan-assembler-times {vmov\ts[0-9]+, s[0-9]+} 4 } } */ Some of the moves between core and fp registers were the result of inefficient codegen and in hindsight scanning for them was not very useful. Now that we emit only the required ones I think scanning for the plain vmovs between two S-registers doesn't test anything useful. So can you please just remove the second scan-assembler directive here? You are probably already aware of that: the tests fail on arm-none-linux-gnueabi/arm-none-eabi FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times vmov\\.f16\\ts[0-9]+, r[0-9]+ 2 (found 38 times) but this is not a regression, the previous version of the test had the same problem. Grrr, that's because the softfp ABI necessitates moves between core and FP registers, so scanning for a particular number of vmovs between them is just not gonna be stable across soft-float ABIs. At this point I'd just remove the scan for vmovs completely as it doesn't seem to check anything useful. A patch to remove that scan for VMOV is pre-approved. Thanks for reminding me of this Christophe, softfp tends to slip my mind :( Kyrill Christophe Thanks, Kyrill
[PATCH] Add myself as GCC maintainer
Added myself as GCC maintainer with r255071 *** ChangeLog *** 2017-11-22 Sudakshina Das * MAINTAINERS (Write After Approval): Add myself. Thanks Sudi diff --git a/ChangeLog b/ChangeLog index 13b0321..adaec62 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,7 @@ +2017-11-22 Sudakshina Das + + * MAINTAINERS (Write After Approval): Add myself. + 2017-11-06 Palmer Dabbelt * MAINTAINERS (RISC-V): Add Jim Wilson as a maintainer. diff --git a/MAINTAINERS b/MAINTAINERS index d207b58..fb45a3c 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -359,6 +359,7 @@ Ian Dall David Daney Robin Dapp Simon Dardis +Sudakshina Das Bud Davis Chris Demetriou Sameera Deshpande
[PATCH][ARM] Fix wrong code by arm_final_prescan with fp16 move instructions
Hi For the following test case: __fp16 test_select (__fp16 a, __fp16 b, __fp16 c) { return (a < b) ? b : c; } when compiled with -mfpu=fp-armv8 -march=armv8.2-a+fp16 -marm -mfloat-abi=hard trunk generates wrong code: test_select: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. vcvtb.f32.f16 s0, s0 vcvtb.f32.f16 s15, s1 vcmpe.f32 s0, s15 vmrsAPSR_nzcv, FPSCR // <-- No conditional branch! vmovs1, s2 @ __fp16 .L2: vmovs0, s1 @ __fp16 bx lr There should have been a conditional branch there to skip one of the VMOVs. This patch fixes this problem by making *movhf_vfp_fp16 unconditional wherever needed. Testing done: Add a new test case and checked for regressions arm-none-linux-gnueabihf. Is this ok for trunk? Sudi ChangeLog entry are as follow: *** gcc/ChangeLog *** 2017-11-24 Sudakshina Das * config/arm/vfp.md (*movhf_vfp_fp16): Add conds attribute. *** gcc/testsuite/ChangeLog *** 2017-11-24 Sudakshina Das * gcc.target/arm/armv8_2-fp16-move-2.c: New test. diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md index 075a938..61b6477 100644 --- a/gcc/config/arm/vfp.md +++ b/gcc/config/arm/vfp.md @@ -410,7 +410,10 @@ gcc_unreachable (); } } - [(set_attr "predicable" "yes, yes, no, yes, no, no, no, no, no, no") + [(set_attr "conds" "*, *, unconditional, *, unconditional, unconditional,\ + unconditional, unconditional, unconditional,\ + unconditional") + (set_attr "predicable" "yes, yes, no, yes, no, no, no, no, no, no") (set_attr "predicable_short_it" "no, no, no, yes,\ no, no, no, no,\ no, no") diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-2.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-2.c new file mode 100644 index 000..fcb857f --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-2.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok } */ +/* { dg-options "-O2 -marm" } */ +/* { dg-add-options arm_v8_2a_fp16_scalar } */ + +__fp16 +test_select (__fp16 a, __fp16 b, __fp16 c) +{ + return (a < b) ? b : c; +} +/* { dg-final { scan-assembler "bmi" } } */
Replying to an older patch ([PATCH] Fix failing test-case)
https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01157.html This patch fixed a test case switch-case-2.c. I am seeing switch-case-1.c failing on arm-none-linux-gnueabihf: FAIL: gcc.dg/tree-prof/switch-case-1.c scan-rtl-dump-times expand ";; basic block[^\\n]*count 2000" 1 (found 0 times) aarch64-none-linux-gnu: FAIL: gcc.dg/tree-prof/switch-case-1.c scan-rtl-dump-times expand ";; basic block[^\\n]*count 2000" 1 (found 2 times) which looks pretty similar (also the same changes make it pass). Can you confirm? Sudi
Re: Replying to an older patch ([PATCH] Fix failing test-case)
On 30/11/17 11:03, Sudakshina Das wrote: https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01157.html This patch fixed a test case switch-case-2.c. I am seeing switch-case-1.c failing on arm-none-linux-gnueabihf: FAIL: gcc.dg/tree-prof/switch-case-1.c scan-rtl-dump-times expand ";; basic block[^\\n]*count 2000" 1 (found 0 times) aarch64-none-linux-gnu: FAIL: gcc.dg/tree-prof/switch-case-1.c scan-rtl-dump-times expand ";; basic block[^\\n]*count 2000" 1 (found 2 times) which looks pretty similar (also the same changes make it pass). Can you confirm? Sudi Put wrong email address earlier. And adding more people on cc
[PATCH][ARM][gcc-7] Fix wrong code by arm_final_prescan with fp16 move instructions
Hi This patch is the fix for gcc-7 for the same issue as mentioned in: https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02209.html For the following test case: __fp16 test_select (__fp16 a, __fp16 b, __fp16 c) { return (a < b) ? b : c; } when compiled with -O2 -mfpu=fp-armv8 -march=armv8.2-a+fp16 -marm -mfloat-abi=hard generates wrong code: test_select: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. vcvtb.f32.f16 s0, s0 vcvtb.f32.f16 s15, s1 vmov.f16r3, s2 @ __fp16 vcmpe.f32 s0, s15 vmrsAPSR_nzcv, FPSCR // <-- No conditional branch vmov.f16r3, s1 @ __fp16 .L1: vmov.f16s0, r3 @ __fp16 bx lr There should have been a conditional branch there to skip one of the VMOVs. This patch fixes this problem by making *movhf_vfp_fp16 unconditional. Testing done: Add a new test case and checked for regressions on bootstrapped arm-none-linux-gnueabihf. Is this ok for gcc-7? Sudi ChangeLog entry are as follow: *** gcc/ChangeLog *** 2017-11-30 Sudakshina Das * config/arm/vfp.md (*movhf_vfp_fp16): Add conds attribute. *** gcc/testsuite/ChangeLog *** 2017-11-30 Sudakshina Das * gcc.target/arm/armv8_2-fp16-move-2.c: New test. diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md index d8f77e2ffe4fdb7c952d6a5ac947d91f89ce259d..9f06c3da9526d09e43836a60f14da9a49671a393 100644 --- a/gcc/config/arm/vfp.md +++ b/gcc/config/arm/vfp.md @@ -456,7 +456,10 @@ gcc_unreachable (); } } - [(set_attr "predicable" "yes, yes, no, yes, no, no, no, no, no, no") + [(set_attr "conds" "*, *, unconditional, *, unconditional, unconditional,\ + unconditional, unconditional, unconditional,\ + unconditional") + (set_attr "predicable" "yes, yes, no, yes, no, no, no, no, no, no") (set_attr "predicable_short_it" "no, no, no, yes,\ no, no, no, no,\ no, no") diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-2.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-2.c new file mode 100644 index ..ac7d4e3f2a9fb1d70a9ce95062dc6db4a69272ff --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-2.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok } */ +/* { dg-options "-O2 -mfpu=fp-armv8 -march=armv8.2-a+fp16 -marm -mfloat-abi=hard" } */ +/* { dg-add-options arm_v8_2a_fp16_scalar } */ + +__fp16 +test_select (__fp16 a, __fp16 b, __fp16 c) +{ + return (a < b) ? b : c; +} +/* { dg-final { scan-assembler "bpl" } } */
Re: [PATCH][ARM] Fix wrong code by arm_final_prescan with fp16 move instructions
Hi Kyrill On 27/11/17 12:25, Kyrill Tkachov wrote: Hi Sudi, On 24/11/17 14:57, Sudi Das wrote: Hi For the following test case: __fp16 test_select (__fp16 a, __fp16 b, __fp16 c) { return (a < b) ? b : c; } when compiled with -mfpu=fp-armv8 -march=armv8.2-a+fp16 -marm -mfloat-abi=hard trunk generates wrong code: test_select: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. vcvtb.f32.f16 s0, s0 vcvtb.f32.f16 s15, s1 vcmpe.f32 s0, s15 vmrsAPSR_nzcv, FPSCR // <-- No conditional branch! vmovs1, s2 @ __fp16 .L2: vmovs0, s1 @ __fp16 bx lr There should have been a conditional branch there to skip one of the VMOVs. This patch fixes this problem by making *movhf_vfp_fp16 unconditional wherever needed. Testing done: Add a new test case and checked for regressions arm-none-linux-gnueabihf. Is this ok for trunk? This is ok after assuming a bootstrap on arm-none-linux-gnueabihf passes as well. Does this bug appear on the GCC 7 branch? If so, could you please test this patch on that branch as well if so? I have tested the patch and also sent a new patch request for gcc-7 https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02577.html Thanks Sudi Thanks, Kyrill Sudi ChangeLog entry are as follow: *** gcc/ChangeLog *** 2017-11-24 Sudakshina Das * config/arm/vfp.md (*movhf_vfp_fp16): Add conds attribute. *** gcc/testsuite/ChangeLog *** 2017-11-24 Sudakshina Das * gcc.target/arm/armv8_2-fp16-move-2.c: New test.
Re: [PATCH][ARM] Fix wrong code by arm_final_prescan with fp16 move instructions
On 30/11/17 16:07, Kyrill Tkachov wrote: On 30/11/17 16:06, Sudakshina Das wrote: Hi Kyrill On 27/11/17 12:25, Kyrill Tkachov wrote: > Hi Sudi, > > On 24/11/17 14:57, Sudi Das wrote: >> Hi >> >> For the following test case: >> __fp16 >> test_select (__fp16 a, __fp16 b, __fp16 c) >> { >>return (a < b) ? b : c; >> } >> >> when compiled with -mfpu=fp-armv8 -march=armv8.2-a+fp16 -marm >> -mfloat-abi=hard trunk generates wrong code: >> >> test_select: >> @ args = 0, pretend = 0, frame = 0 >> @ frame_needed = 0, uses_anonymous_args = 0 >> @ link register save eliminated. >> vcvtb.f32.f16 s0, s0 >> vcvtb.f32.f16 s15, s1 >> vcmpe.f32 s0, s15 >> vmrsAPSR_nzcv, FPSCR >> // <-- No conditional branch! >> vmovs1, s2 @ __fp16 >> .L2: >> vmovs0, s1 @ __fp16 >> bx lr >> >> There should have been a conditional branch there to skip one of the >> VMOVs. >> This patch fixes this problem by making *movhf_vfp_fp16 unconditional >> wherever needed. >> >> Testing done: Add a new test case and checked for regressions >> arm-none-linux-gnueabihf. >> >> Is this ok for trunk? >> > > This is ok after assuming a bootstrap on arm-none-linux-gnueabihf passes > as well. > Does this bug appear on the GCC 7 branch? > If so, could you please test this patch on that branch as well if so? > I have tested the patch and also sent a new patch request for gcc-7 https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02577.html Thanks Sudi, this is ok to commit to the branch after we let this patch bake on trunk for a week without problems. Committed as r255301 on trunk. Will wait for a week before committing to gcc-7. Thanks Sudi Kyrill Thanks Sudi > Thanks, > Kyrill > >> Sudi >> >> ChangeLog entry are as follow: >> >> *** gcc/ChangeLog *** >> >> 2017-11-24 Sudakshina Das >> >> * config/arm/vfp.md (*movhf_vfp_fp16): Add conds attribute. >> >> *** gcc/testsuite/ChangeLog *** >> >> 2017-11-24 Sudakshina Das >> >> * gcc.target/arm/armv8_2-fp16-move-2.c: New test. >
Re: [PATCH][ARM][gcc-7] Fix wrong code by arm_final_prescan with fp16 move instructions
On 30/11/17 16:01, Sudakshina Das wrote: Hi This patch is the fix for gcc-7 for the same issue as mentioned in: https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02209.html For the following test case: __fp16 test_select (__fp16 a, __fp16 b, __fp16 c) { return (a < b) ? b : c; } when compiled with -O2 -mfpu=fp-armv8 -march=armv8.2-a+fp16 -marm -mfloat-abi=hard generates wrong code: test_select: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. vcvtb.f32.f16 s0, s0 vcvtb.f32.f16 s15, s1 vmov.f16 r3, s2 @ __fp16 vcmpe.f32 s0, s15 vmrs APSR_nzcv, FPSCR // <-- No conditional branch vmov.f16 r3, s1 @ __fp16 .L1: vmov.f16 s0, r3 @ __fp16 bx lr There should have been a conditional branch there to skip one of the VMOVs. This patch fixes this problem by making *movhf_vfp_fp16 unconditional. Testing done: Add a new test case and checked for regressions on bootstrapped arm-none-linux-gnueabihf. Is this ok for gcc-7? Sudi ChangeLog entry are as follow: *** gcc/ChangeLog *** 2017-11-30 Sudakshina Das * config/arm/vfp.md (*movhf_vfp_fp16): Add conds attribute. *** gcc/testsuite/ChangeLog *** 2017-11-30 Sudakshina Das * gcc.target/arm/armv8_2-fp16-move-2.c: New test. As per the trunk thread for this (https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02209.html) committed as r255536 on gcc-7-branch for the backport. Thanks Sudi
Re: [PATCH][ARM][gcc-7] Fix wrong code by arm_final_prescan with fp16 move instructions
Hi Christophe On 12/12/17 09:59, Christophe Lyon wrote: Hi, On 11 December 2017 at 18:12, Sudakshina Das wrote: On 30/11/17 16:01, Sudakshina Das wrote: Hi This patch is the fix for gcc-7 for the same issue as mentioned in: https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02209.html For the following test case: __fp16 test_select (__fp16 a, __fp16 b, __fp16 c) { return (a < b) ? b : c; } when compiled with -O2 -mfpu=fp-armv8 -march=armv8.2-a+fp16 -marm -mfloat-abi=hard generates wrong code: test_select: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. vcvtb.f32.f16s0, s0 vcvtb.f32.f16s15, s1 vmov.f16r3, s2@ __fp16 vcmpe.f32s0, s15 vmrsAPSR_nzcv, FPSCR // <-- No conditional branch vmov.f16r3, s1@ __fp16 .L1: vmov.f16s0, r3@ __fp16 bxlr There should have been a conditional branch there to skip one of the VMOVs. This patch fixes this problem by making *movhf_vfp_fp16 unconditional. Testing done: Add a new test case and checked for regressions on bootstrapped arm-none-linux-gnueabihf. Is this ok for gcc-7? Sudi ChangeLog entry are as follow: *** gcc/ChangeLog *** 2017-11-30 Sudakshina Das * config/arm/vfp.md (*movhf_vfp_fp16): Add conds attribute. *** gcc/testsuite/ChangeLog *** 2017-11-30 Sudakshina Das * gcc.target/arm/armv8_2-fp16-move-2.c: New test. As per the trunk thread for this (https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02209.html) committed as r255536 on gcc-7-branch for the backport. I've noticed that this backport fails on arm-none-linux-gnueabi and arm-none-eabi. I suspect this is partly due to the fact that I use a "recent" dejagnu, and has to do with whether dg-add-options are appended or pre-pended. I'm seeing a compilation line with: -mfpu=fp-armv8 -march=armv8.2-a+fp16 -marm -mfloat-abi=hard -mfpu=fp-armv8 -mfloat-abi=softfp -march=armv8.2-a+fp16 leading to: FAIL: gcc.target/arm/armv8_2-fp16-move-2.c scan-assembler bpl I'm not sure why this works on trunk, but there I have only: -marm -mfloat-abi=softfp -march=armv8.2-a+fp16 Maybe this has to do with the new way cpu/fpu options are parsed on trunk. Sorry for this. I will try to investigate. Thanks Sudi Christophe Thanks Sudi
[PATCH PR81228][AARCH64] Fix ICE by adding LTGT in vec_cmp
Hi This patch is a follow up to the existing discussions on https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01904.html Bin had earlier submitted a patch to fix the ICE that occurs because of the missing LTGT in aarch64-simd.md. That discussion opened up a new bug report PR81647 for an inconsistent behavior. As discussed earlier on the gcc-patches discussion and on the bug report, PR81647 was occurring because of how UNEQ was handled in aarch64-simd.md rather than LTGT. Since __builtin_islessgreater is guaranteed to not give an FP exception but LTGT might, __builtin_islessgreater gets converted to ~UNEQ very early on in fold_builtin_unordered_cmp. Thus I will post a separate patch for correcting how UNEQ and other unordered comparisons are handled in aarch64-simd.md. This patch is only adding the missing LTGT to plug the ICE. Testing done: Checked for regressions on bootstrapped aarch64-none-linux-gnu and added a new compile time test case that gives out LTGT to make sure it doesn't ICE. Is this ok for trunk? Thanks Sudi ChangeLog Entries: *** gcc/ChangeLog *** 2017-12-13 Sudakshina Das Bin Cheng PR target/81228 * config/aarch64/aarch64.c (aarch64_select_cc_mode): Move LTGT to CCFPEmode. * config/aarch64/aarch64-simd.md (vec_cmp): Add LTGT. *** gcc/testsuite/ChangeLog *** 2017-12-13 Sudakshina Das PR target/81228 * gcc.dg/pr81228.c: New.
Re: [PATCH PR81228][AARCH64] Fix ICE by adding LTGT in vec_cmp
On 13/12/17 16:42, Sudakshina Das wrote: Hi This patch is a follow up to the existing discussions on https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01904.html Bin had earlier submitted a patch to fix the ICE that occurs because of the missing LTGT in aarch64-simd.md. That discussion opened up a new bug report PR81647 for an inconsistent behavior. As discussed earlier on the gcc-patches discussion and on the bug report, PR81647 was occurring because of how UNEQ was handled in aarch64-simd.md rather than LTGT. Since __builtin_islessgreater is guaranteed to not give an FP exception but LTGT might, __builtin_islessgreater gets converted to ~UNEQ very early on in fold_builtin_unordered_cmp. Thus I will post a separate patch for correcting how UNEQ and other unordered comparisons are handled in aarch64-simd.md. This patch is only adding the missing LTGT to plug the ICE. Testing done: Checked for regressions on bootstrapped aarch64-none-linux-gnu and added a new compile time test case that gives out LTGT to make sure it doesn't ICE. Is this ok for trunk? Thanks Sudi ChangeLog Entries: *** gcc/ChangeLog *** 2017-12-13 Sudakshina Das Bin Cheng PR target/81228 * config/aarch64/aarch64.c (aarch64_select_cc_mode): Move LTGT to CCFPEmode. * config/aarch64/aarch64-simd.md (vec_cmp): Add LTGT. *** gcc/testsuite/ChangeLog *** 2017-12-13 Sudakshina Das PR target/81228 * gcc.dg/pr81228.c: New. Sorry Forgot to attach the patch! Sudi diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index ae71af8334343a749f11db1801554eac01a33cac..f90f74fe7fd5990a97b9f4eb68f5735b7d4fb9aa 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -2759,6 +2759,7 @@ case UNEQ: case ORDERED: case UNORDERED: +case LTGT: break; default: gcc_unreachable (); @@ -2813,6 +2814,15 @@ emit_insn (gen_one_cmpl2 (operands[0], operands[0])); break; +case LTGT: + /* LTGT is not guranteed to not generate a FP exception. So let's + go the faster way : ((a > b) || (b > a)). */ + emit_insn (gen_aarch64_cmgt (operands[0], + operands[2], operands[3])); + emit_insn (gen_aarch64_cmgt (tmp, operands[3], operands[2])); + emit_insn (gen_ior3 (operands[0], operands[0], tmp)); + break; + case UNORDERED: /* Operands are ORDERED iff (a > b || b >= a), so we can compute UNORDERED as !ORDERED. */ diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 75a6c0d0421354d7c0759292947eb5d407f5b703..3efb1b7548ea9b0ea5644d99a0677dbe5baba2ef 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -4962,13 +4962,13 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y) case UNGT: case UNGE: case UNEQ: - case LTGT: return CCFPmode; case LT: case LE: case GT: case GE: + case LTGT: return CCFPEmode; default: diff --git a/gcc/testsuite/gcc.dg/pr81228.c b/gcc/testsuite/gcc.dg/pr81228.c new file mode 100644 index ..f7eecc510ad2acaa656a1ce5df0aafffa56b3bd9 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr81228.c @@ -0,0 +1,21 @@ +/* PR target/81228. */ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-ssa" } */ + +void *a; + +void b () +{ + char c; + long d; + char *e = a; + for (; d; d++) + { +double f, g; +c = g < f || g > f; +e[d] = c; + } +} + +/* Let's make sure we do have a LTGT. */ +/* { dg-final { scan-tree-dump "<>" "ssa" } } */
Re: [PATCH PR81228][AARCH64] Fix ICE by adding LTGT in vec_cmp
Hi On 13/12/17 16:56, James Greenhalgh wrote: On Wed, Dec 13, 2017 at 04:45:33PM +, Sudi Das wrote: On 13/12/17 16:42, Sudakshina Das wrote: Hi This patch is a follow up to the existing discussions on https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01904.html Bin had earlier submitted a patch to fix the ICE that occurs because of the missing LTGT in aarch64-simd.md. That discussion opened up a new bug report PR81647 for an inconsistent behavior. As discussed earlier on the gcc-patches discussion and on the bug report, PR81647 was occurring because of how UNEQ was handled in aarch64-simd.md rather than LTGT. Since __builtin_islessgreater is guaranteed to not give an FP exception but LTGT might, __builtin_islessgreater gets converted to ~UNEQ very early on in fold_builtin_unordered_cmp. Thus I will post a separate patch for correcting how UNEQ and other unordered comparisons are handled in aarch64-simd.md. This patch is only adding the missing LTGT to plug the ICE. Testing done: Checked for regressions on bootstrapped aarch64-none-linux-gnu and added a new compile time test case that gives out LTGT to make sure it doesn't ICE. Is this ok for trunk? OK. Thanks, James Thanks for the review. Committed as r255625. I think this needs a back-port as well to gcc-7-branch. Is that ok? Sudi
[PATCH][ARM][gcc-7] Fix regression on soft float targets for armv8_2-fp16-move-2.c
Hi This patch is a follow up on my previous patch with r255536 that was a back-port for fixing a wrong code generation (https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02209.html). As pointed out by Christophe Lyon (https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00718.html) the test case started to fail on the new dejagnu for arm-none-linux-gnueabi and arm-none-eabi. This patch just removes the dg-add-options from the test case because I think dg-options has all that is needed anyway. Testing: Since I could not reproduce the failure on my machine, Christophe would it be possible for you to check if this patch fixes the regression for you? Thanks Sudi diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-2.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-2.c index ac7d4e3f2a9fb1d70a9ce95062dc6db4a69272ff..09adddfd57ca13e831c276aef25621e7340bcfff 100644 --- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-2.c +++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-2.c @@ -1,7 +1,6 @@ /* { dg-do compile } */ /* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok } */ /* { dg-options "-O2 -mfpu=fp-armv8 -march=armv8.2-a+fp16 -marm -mfloat-abi=hard" } */ -/* { dg-add-options arm_v8_2a_fp16_scalar } */ __fp16 test_select (__fp16 a, __fp16 b, __fp16 c)
Re: [PATCH][ARM][gcc-7] Fix regression on soft float targets for armv8_2-fp16-move-2.c
Hi On 14/12/17 17:37, Christophe Lyon wrote: On 14 December 2017 at 17:05, Sudakshina Das wrote: Hi This patch is a follow up on my previous patch with r255536 that was a back-port for fixing a wrong code generation (https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02209.html). As pointed out by Christophe Lyon (https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00718.html) the test case started to fail on the new dejagnu for arm-none-linux-gnueabi and arm-none-eabi. This patch just removes the dg-add-options from the test case because I think dg-options has all that is needed anyway. Testing: Since I could not reproduce the failure on my machine, Christophe would it be possible for you to check if this patch fixes the regression for you? Manually tested on one of the offending configs, it did the trick. Thanks Thank you so much. I will wait for an OK and commit it! Sudi Christophe Thanks Sudi
Re: [PATCH][ARM][gcc-7] Fix regression on soft float targets for armv8_2-fp16-move-2.c
Hi On 14/12/17 18:26, Kyrill Tkachov wrote: On 14/12/17 18:17, Sudi Das wrote: Hi On 14/12/17 17:37, Christophe Lyon wrote: > On 14 December 2017 at 17:05, Sudakshina Das wrote: >> Hi >> >> This patch is a follow up on my previous patch with r255536 that was a >> back-port for fixing a wrong code generation >> (https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02209.html). >> As pointed out by Christophe Lyon >> (https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00718.html) the test case >> started to fail on the new dejagnu for arm-none-linux-gnueabi and >> arm-none-eabi. >> This patch just removes the dg-add-options from the test case because I >> think dg-options has all that is needed anyway. >> >> Testing: Since I could not reproduce the failure on my machine, Christophe >> would it be possible for you to check if this patch fixes the >> regression for you? >> > > Manually tested on one of the offending configs, it did the trick. > Thanks > Thank you so much. I will wait for an OK and commit it! Thanks Sudi and Christophe. The patch is ok with an appropriate ChangeLog entry. Thanks Kyrill. I added the ChanngeLog entry. Committed with r255681. Sudi Kyrill Sudi > Christophe > >> Thanks >> Sudi
[PATCH PR81647][AARCH64] Fix handling of Unordered Comparisons in aarch64-simd.md
Hi This patch fixes the inconsistent behavior observed at -O3 for the unordered comparisons. According to the online docs (https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gccint/Unary-and-Binary-Expressions.html), all of the following should not raise an FP exception: - UNGE_EXPR - UNGT_EXPR - UNLE_EXPR - UNLT_EXPR - UNEQ_EXPR Also ORDERED_EXPR and UNORDERED_EXPR should only return zero or one. The aarch64-simd.md handling of these were generating exception raising instructions such as fcmgt. This patch changes the instructions that are emitted to in order to not give out the exceptions. We first check each operand for NaNs and force any elements containing NaN to zero before using them in the compare. Example: UN (a, b) -> UNORDERED (a, b) | (cm (isnan (a) ? 0.0 : a, isnan (b) ? 0.0 : b)) The ORDERED_EXPR is now handled as (cmeq (a, a) & cmeq (b, b)) and UNORDERED_EXPR as ~ORDERED_EXPR and UNEQ as (~ORDERED_EXPR | cmeq (a,b)). Testing done: Checked for regressions on bootstrapped aarch64-none-linux-gnu and added a new test case. Is this ok for trunk? This will probably need a back-port to gcc-7-branch as well. Thanks Sudi ChangeLog Entries: *** gcc/ChangeLog *** 2017-12-15 Sudakshina Das PR target/81647 * config/aarch64/aarch64-simd.md (vec_cmp): Modify instructions for UNLT, UNLE, UNGT, UNGE, UNEQ, UNORDERED and ORDERED. *** gcc/testsuite/ChangeLog *** 2017-12-15 Sudakshina Das PR target/81647 * gcc.target/aarch64/pr81647.c: New. diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index f90f74fe7fd5990a97b9f4eb68f5735b7d4fb9aa..acff06c753b3e3aaa5775632929909afa4d3294b 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -2731,10 +2731,10 @@ break; } /* Fall through. */ -case UNGE: +case UNLT: std::swap (operands[2], operands[3]); /* Fall through. */ -case UNLE: +case UNGT: case GT: comparison = gen_aarch64_cmgt; break; @@ -2745,10 +2745,10 @@ break; } /* Fall through. */ -case UNGT: +case UNLE: std::swap (operands[2], operands[3]); /* Fall through. */ -case UNLT: +case UNGE: case GE: comparison = gen_aarch64_cmge; break; @@ -2771,21 +2771,35 @@ case UNGT: case UNLE: case UNLT: -case NE: - /* FCM returns false for lanes which are unordered, so if we use - the inverse of the comparison we actually want to emit, then - invert the result, we will end up with the correct result. - Note that a NE NaN and NaN NE b are true for all a, b. - - Our transformations are: - a UNGE b -> !(b GT a) - a UNGT b -> !(b GE a) - a UNLE b -> !(a GT b) - a UNLT b -> !(a GE b) - a NE b -> !(a EQ b) */ - gcc_assert (comparison != NULL); - emit_insn (comparison (operands[0], operands[2], operands[3])); - emit_insn (gen_one_cmpl2 (operands[0], operands[0])); + { + /* All of the above must not raise any FP exceptions. Thus we first + check each operand for NaNs and force any elements containing NaN to + zero before using them in the compare. + Example: UN (a, b) -> UNORDERED (a, b) | + (cm (isnan (a) ? 0.0 : a, + isnan (b) ? 0.0 : b)) + We use the following transformations for doing the comparisions: + a UNGE b -> a GE b + a UNGT b -> a GT b + a UNLE b -> b GE a + a UNLT b -> b GT a. */ + + rtx tmp0 = gen_reg_rtx (mode); + rtx tmp1 = gen_reg_rtx (mode); + rtx tmp2 = gen_reg_rtx (mode); + emit_insn (gen_aarch64_cmeq (tmp0, operands[2], operands[2])); + emit_insn (gen_aarch64_cmeq (tmp1, operands[3], operands[3])); + emit_insn (gen_and3 (tmp2, tmp0, tmp1)); + emit_insn (gen_and3 (tmp0, tmp0, + lowpart_subreg (mode, operands[2], mode))); + emit_insn (gen_and3 (tmp1, tmp1, + lowpart_subreg (mode, operands[3], mode))); + gcc_assert (comparison != NULL); + emit_insn (comparison (operands[0], + lowpart_subreg (mode, tmp0, mode), + lowpart_subreg (mode, tmp1, mode))); + emit_insn (gen_orn3 (operands[0], tmp2, operands[0])); + } break; case LT: @@ -2793,25 +2807,19 @@ case GT: case GE: case EQ: +case NE: /* The easy case. Here we emit one of FCMGE, FCMGT or FCMEQ. As a LT b <=> b GE a && a LE b <=> b GT a. Our transformations are: a GE b -> a GE b a GT b -> a GT b a LE b -> b GE a a LT b -> b GT a - a EQ b -> a EQ b */ + a EQ b -> a EQ b + a NE b -> ~(a EQ b) */ gcc_assert (comparison != NULL); emit_insn (comparison (operands[0], operands[2], operands[3])); - break; - -case UNEQ: - /* We first check (a > b || b > a) which is !UNEQ, inverting - this result will then give us (a == b || a UNORDERED b). */ - emit_insn (gen_aarch64_cmgt (operands[0], - operands
Re: [PATCH PR81228][AARCH64] Fix ICE by adding LTGT in vec_cmp
On 14/12/17 10:38, Sudakshina Das wrote: Hi On 13/12/17 16:56, James Greenhalgh wrote: On Wed, Dec 13, 2017 at 04:45:33PM +, Sudi Das wrote: On 13/12/17 16:42, Sudakshina Das wrote: Hi This patch is a follow up to the existing discussions on https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01904.html Bin had earlier submitted a patch to fix the ICE that occurs because of the missing LTGT in aarch64-simd.md. That discussion opened up a new bug report PR81647 for an inconsistent behavior. As discussed earlier on the gcc-patches discussion and on the bug report, PR81647 was occurring because of how UNEQ was handled in aarch64-simd.md rather than LTGT. Since __builtin_islessgreater is guaranteed to not give an FP exception but LTGT might, __builtin_islessgreater gets converted to ~UNEQ very early on in fold_builtin_unordered_cmp. Thus I will post a separate patch for correcting how UNEQ and other unordered comparisons are handled in aarch64-simd.md. This patch is only adding the missing LTGT to plug the ICE. Testing done: Checked for regressions on bootstrapped aarch64-none-linux-gnu and added a new compile time test case that gives out LTGT to make sure it doesn't ICE. Is this ok for trunk? OK. Thanks, James Thanks for the review. Committed as r255625. I think this needs a back-port as well to gcc-7-branch. Is that ok? Sudi Backport Ping! Sudi
Re: Replying to an older patch ([PATCH] Fix failing test-case)
Hi Martin On 19/12/17 10:49, Martin Liška wrote: On 11/30/2017 12:03 PM, Sudakshina Das wrote: https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01157.html This patch fixed a test case switch-case-2.c. I am seeing switch-case-1.c failing on arm-none-linux-gnueabihf: FAIL: gcc.dg/tree-prof/switch-case-1.c scan-rtl-dump-times expand ";; basic block[^\\n]*count 2000" 1 (found 0 times) aarch64-none-linux-gnu: FAIL: gcc.dg/tree-prof/switch-case-1.c scan-rtl-dump-times expand ";; basic block[^\\n]*count 2000" 1 (found 2 times) which looks pretty similar (also the same changes make it pass). Can you confirm? Sudi Hello. There's patch for that. Can you please test it? I have tested these changes and the test case passes for both arm-none-linux-gnueabihf and aarch64-none-linux-gnu. Thanks Sudi Martin
[PATCH PR82439][simplify-rtx] Simplify (x | y) == x -> (y & ~x) == 0
Hi This patch add support for the missing transformation of (x | y) == x -> (y & ~x) == 0. The transformation for (x & y) == x case already exists in simplify-rtx.c since 2014 as of r218503 and this patch only adds a couple of extra patterns for the IOR case. Citing the example given in PR82439 : Simple testcase (f1 should generate the same as f2): int f1(int x, int y) { return (x | y) == x; } int f2(int x, int y) { return (y & ~x) == 0; } f1: orr w1, w0, w1 cmp w1, w0 csetw0, eq ret f2: bicswzr, w1, w0 csetw0, eq ret This benefits targets that have the BICS instruction to generate better code. Wilco helped me in showing that even in targets that do not have the BICS instruction, this is no worse and gives out 2 instructions. For example in x86: : 0: 09 fe or %edi,%esi 2: 31 c0 xor%eax,%eax 4: 39 fe cmp%edi,%esi 6: 0f 94 c0sete %al 9: c3 retq 0010 : 10: f7 d7 not%edi 12: 31 c0 xor%eax,%eax 14: 85 f7 test %esi,%edi 16: 0f 94 c0sete %al 19: c3 retq Testing done: Checked for regressions on bootstrapped aarch64-none-linux-gnu and arm-none-linux-gnueabihf and added new test cases. Is this ok for trunk? Sudi ChangeLog Entries: *** gcc/ChangeLog *** 2017-01-03 Sudakshina Das PR target/82439 * simplify-rtx.c (simplify_relational_operation_1): Add simplifications of (x|y) == x for BICS pattern. *** gcc/testsuite/ChangeLog *** 2017-01-03 Sudakshina Das PR target/82439 * gcc.target/aarch64/bics_5.c: New test. * gcc.target/arm/bics_5.c: Likewise.
Re: [PATCH PR82439][simplify-rtx] Simplify (x | y) == x -> (y & ~x) == 0
Hi On 03/01/18 14:38, Segher Boessenkool wrote: Hi! On Wed, Jan 03, 2018 at 01:57:38PM +, Sudakshina Das wrote: This patch add support for the missing transformation of (x | y) == x -> (y & ~x) == 0. Testing done: Checked for regressions on bootstrapped aarch64-none-linux-gnu and arm-none-linux-gnueabihf and added new test cases. Is this ok for trunk? You forgot to include the patch :-) (facepalm) This is the second time this has happened to me! Sorry about this. Attaching now. Sudi Segher diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c index e5cfd3d..17536d0 100644 --- a/gcc/simplify-rtx.c +++ b/gcc/simplify-rtx.c @@ -5032,34 +5032,38 @@ simplify_relational_operation_1 (enum rtx_code code, machine_mode mode, simplify_gen_binary (XOR, cmp_mode, XEXP (op0, 1), op1)); - /* (eq/ne (and x y) x) simplifies to (eq/ne (and (not y) x) 0), which - can be implemented with a BICS instruction on some targets, or - constant-folded if y is a constant. */ + /* Simplify eq/ne (and/ior x y) x/y) for targets with a BICS instruction or + constant folding if x/y is a constant. */ if ((code == EQ || code == NE) - && op0code == AND - && rtx_equal_p (XEXP (op0, 0), op1) + && (op0code == AND || op0code == IOR) && !side_effects_p (op1) && op1 != CONST0_RTX (cmp_mode)) { - rtx not_y = simplify_gen_unary (NOT, cmp_mode, XEXP (op0, 1), cmp_mode); - rtx lhs = simplify_gen_binary (AND, cmp_mode, not_y, XEXP (op0, 0)); + /* Both (eq/ne (and x y) x) and (eq/ne (ior x y) y) simplify to + (eq/ne (and (not y) x) 0). */ + if ((op0code == AND && rtx_equal_p (XEXP (op0, 0), op1)) + || (op0code == IOR && rtx_equal_p (XEXP (op0, 1), op1))) + { + rtx not_y = simplify_gen_unary (NOT, cmp_mode, XEXP (op0, 1), + cmp_mode); + rtx lhs = simplify_gen_binary (AND, cmp_mode, not_y, XEXP (op0, 0)); - return simplify_gen_relational (code, mode, cmp_mode, lhs, - CONST0_RTX (cmp_mode)); -} + return simplify_gen_relational (code, mode, cmp_mode, lhs, + CONST0_RTX (cmp_mode)); + } - /* Likewise for (eq/ne (and x y) y). */ - if ((code == EQ || code == NE) - && op0code == AND - && rtx_equal_p (XEXP (op0, 1), op1) - && !side_effects_p (op1) - && op1 != CONST0_RTX (cmp_mode)) -{ - rtx not_x = simplify_gen_unary (NOT, cmp_mode, XEXP (op0, 0), cmp_mode); - rtx lhs = simplify_gen_binary (AND, cmp_mode, not_x, XEXP (op0, 1)); + /* Both (eq/ne (and x y) y) and (eq/ne (ior x y) x) simplify to + (eq/ne (and (not x) y) 0). */ + if ((op0code == AND && rtx_equal_p (XEXP (op0, 1), op1)) + || (op0code == IOR && rtx_equal_p (XEXP (op0, 0), op1))) + { + rtx not_x = simplify_gen_unary (NOT, cmp_mode, XEXP (op0, 0), + cmp_mode); + rtx lhs = simplify_gen_binary (AND, cmp_mode, not_x, XEXP (op0, 1)); - return simplify_gen_relational (code, mode, cmp_mode, lhs, - CONST0_RTX (cmp_mode)); + return simplify_gen_relational (code, mode, cmp_mode, lhs, + CONST0_RTX (cmp_mode)); + } } /* (eq/ne (bswap x) C1) simplifies to (eq/ne x C2) with C2 swapped. */ diff --git a/gcc/testsuite/gcc.target/aarch64/bics_5.c b/gcc/testsuite/gcc.target/aarch64/bics_5.c new file mode 100644 index 000..b9c2c40 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/bics_5.c @@ -0,0 +1,86 @@ +/* { dg-do run } */ +/* { dg-options "-O2 --save-temps -fno-inline" } */ + +extern void abort (void); + +int +bics_si_test1 (int a, int b, int c) +{ + if ((a | b) == a) +return a; + else +return c; +} + +int +bics_si_test2 (int a, int b, int c) +{ + if ((a | b) == b) +return b; + else +return c; +} + +typedef long long s64; + +s64 +bics_di_test1 (s64 a, s64 b, s64 c) +{ + if ((a | b) == a) +return a; + else +return c; +} + +s64 +bics_di_test2 (s64 a, s64 b, s64 c) +{ + if ((a | b) == b) +return b; + else +return c; +} + +int +main () +{ + int x; + s64 y; + + x = bics_si_test1 (0xf00d, 0xf11f, 0); + if (x != 0) +abort (); + + x = bics_si_test1 (0xf11f, 0xf00d, 0); + if (x != 0xf11f) +abort (); + + x = bics_si_test2 (0xf00d, 0xf11f, 0); + if (x != 0xf11f) +abort (); + + x = bics_si_test2 (0xf11f, 0xf00d, 0); + if (x != 0) +abort (); + + y = bics_di_test1 (0x10001000f00dll, 0x12341000f00dll, 0ll); + if (y != 0) +abort (); + + y = bics_di_test1 (0x12341000f00dll, 0x10001000f00dll, 0ll); + if (y != 0x12341000f00dll) +abort (); + + y = bics_di_test2 (0x10001000f00dll, 0x12341000f00dll, 0ll); + if (y != 0x12341000f00dll) +abort (); + + y = bics_di_test2 (0x12341000f00dll, 0x10001000f00dll, 0ll); + if (y != 0) +abort (); + + return 0; +} + +/* { dg-final { scan-assembler-times "bics\twzr, w\[0-9\]+, w\[0-9\]+" 2 } } */ +/* { dg-final { scan-assembler
[PATCH PR82096] Fix ICE in int_mode_for_mode, at stor-layout.c:403 with arm-linux-gnueabi
Hi The bug reported a particular test di-longlong64-sync-1.c failing when run on arm-linux-gnueabi with options -mthumb -march=armv5t -O[g,1,2,3] and -mthumb -march=armv6 -O[g,1,2,3]. According to what I could see, the crash was caused because of the explicit VOIDmode argument that was sent to emit_store_flag_force (). Since the comparing argument was a long long, it was being forced into a VOID type register before the comparison (in prepare_cmp_insn()) is done. As pointed out by Kyrill, there is a comment on emit_store_flag() which says "MODE is the mode to use for OP0 and OP1 should they be CONST_INTs. If it is VOIDmode, they cannot both be CONST_INT". This condition is not true in this case and thus I think it is suitable to change the argument. Testing done: Checked for regressions on bootstrapped arm-none-linux-gnueabi and arm-none-linux-gnueabihf and added new test cases. Sudi ChangeLog entries: *** gcc/ChangeLog *** 2017-01-04 Sudakshina Das PR target/82096 * optabs.c (expand_atomic_compare_and_swap): Change argument to emit_store_flag_force. *** gcc/testsuite/ChangeLog *** 2017-01-04 Sudakshina Das PR target/82096 * gcc.c-torture/compile/pr82096-1.c: New test. * gcc.c-torture/compile/pr82096-2.c: Likwise. diff --git a/gcc/optabs.c b/gcc/optabs.c index 225e955..45b018e 100644 --- a/gcc/optabs.c +++ b/gcc/optabs.c @@ -6295,7 +6295,7 @@ expand_atomic_compare_and_swap (rtx *ptarget_bool, rtx *ptarget_oval, if (cc_reg) { target_bool = emit_store_flag_force (target_bool, EQ, cc_reg, - const0_rtx, VOIDmode, 0, 1); + const0_rtx, mode, 0, 1); goto success; } goto success_bool_from_val; @@ -6323,7 +6323,7 @@ expand_atomic_compare_and_swap (rtx *ptarget_bool, rtx *ptarget_oval, success_bool_from_val: target_bool = emit_store_flag_force (target_bool, EQ, target_oval, - expected, VOIDmode, 1, 1); + expected, mode, 1, 1); success: /* Make sure that the oval output winds up where the caller asked. */ if (ptarget_oval) diff --git a/gcc/testsuite/gcc.c-torture/compile/pr82096-1.c b/gcc/testsuite/gcc.c-torture/compile/pr82096-1.c new file mode 100644 index 000..07eb5f6 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/compile/pr82096-1.c @@ -0,0 +1,10 @@ +/* { dg-do compile { target { arm*-*-* } } } */ +/* { dg-options "-march=armv5t -mthumb -mfloat-abi=soft" } */ + +static long long AL[24]; + +int +check_ok (void) +{ + return (__sync_bool_compare_and_swap (AL+1, 0x20003ll, 0x1234567890ll)); +} diff --git a/gcc/testsuite/gcc.c-torture/compile/pr82096-2.c b/gcc/testsuite/gcc.c-torture/compile/pr82096-2.c new file mode 100644 index 000..2570b16 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/compile/pr82096-2.c @@ -0,0 +1,10 @@ +/* { dg-do compile { target { arm*-*-* } } } */ +/* { dg-options "-march=armv6 -mthumb -mfloat-abi=soft" } */ + +static long long AL[24]; + +int +check_ok (void) +{ + return (__sync_bool_compare_and_swap (AL+1, 0x20003ll, 0x1234567890ll)); +}
Re: [PATCH PR82439][simplify-rtx] Simplify (x | y) == x -> (y & ~x) == 0
Hi Jeff On 04/01/18 18:30, Jeff Law wrote: On 01/03/2018 06:57 AM, Sudakshina Das wrote: Hi This patch add support for the missing transformation of (x | y) == x -> (y & ~x) == 0. The transformation for (x & y) == x case already exists in simplify-rtx.c since 2014 as of r218503 and this patch only adds a couple of extra patterns for the IOR case. Citing the example given in PR82439 : Simple testcase (f1 should generate the same as f2): int f1(int x, int y) { return (x | y) == x; } int f2(int x, int y) { return (y & ~x) == 0; } f1: orr w1, w0, w1 cmp w1, w0 cset w0, eq ret f2: bics wzr, w1, w0 cset w0, eq ret This benefits targets that have the BICS instruction to generate better code. Wilco helped me in showing that even in targets that do not have the BICS instruction, this is no worse and gives out 2 instructions. For example in x86: : 0: 09 fe or %edi,%esi 2: 31 c0 xor %eax,%eax 4: 39 fe cmp %edi,%esi 6: 0f 94 c0 sete %al 9: c3 retq 0010 : 10: f7 d7 not %edi 12: 31 c0 xor %eax,%eax 14: 85 f7 test %esi,%edi 16: 0f 94 c0 sete %al 19: c3 retq Testing done: Checked for regressions on bootstrapped aarch64-none-linux-gnu and arm-none-linux-gnueabihf and added new test cases. Is this ok for trunk? Sudi ChangeLog Entries: *** gcc/ChangeLog *** 2017-01-03 Sudakshina Das PR target/82439 * simplify-rtx.c (simplify_relational_operation_1): Add simplifications of (x|y) == x for BICS pattern. *** gcc/testsuite/ChangeLog *** 2017-01-03 Sudakshina Das PR target/82439 * gcc.target/aarch64/bics_5.c: New test. * gcc.target/arm/bics_5.c: Likewise. OK. Thanks! Committed as r256275. Sudi jeff
Re: [PATCH PR82096] Fix ICE in int_mode_for_mode, at stor-layout.c:403 with arm-linux-gnueabi
Hi Kyrill On 04/01/18 16:36, Kyrill Tkachov wrote: Hi Sudi, On 04/01/18 15:35, Sudakshina Das wrote: Hi The bug reported a particular test di-longlong64-sync-1.c failing when run on arm-linux-gnueabi with options -mthumb -march=armv5t -O[g,1,2,3] and -mthumb -march=armv6 -O[g,1,2,3]. According to what I could see, the crash was caused because of the explicit VOIDmode argument that was sent to emit_store_flag_force (). Since the comparing argument was a long long, it was being forced into a VOID type register before the comparison (in prepare_cmp_insn()) is done. As pointed out by Kyrill, there is a comment on emit_store_flag() which says "MODE is the mode to use for OP0 and OP1 should they be CONST_INTs. If it is VOIDmode, they cannot both be CONST_INT". This condition is not true in this case and thus I think it is suitable to change the argument. Testing done: Checked for regressions on bootstrapped arm-none-linux-gnueabi and arm-none-linux-gnueabihf and added new test cases. Sudi ChangeLog entries: *** gcc/ChangeLog *** 2017-01-04 Sudakshina Das PR target/82096 * optabs.c (expand_atomic_compare_and_swap): Change argument to emit_store_flag_force. *** gcc/testsuite/ChangeLog *** 2017-01-04 Sudakshina Das PR target/82096 * gcc.c-torture/compile/pr82096-1.c: New test. * gcc.c-torture/compile/pr82096-2.c: Likwise. diff --git a/gcc/testsuite/gcc.c-torture/compile/pr82096-1.c b/gcc/testsuite/gcc.c-torture/compile/pr82096-1.c new file mode 100644 index 000..07eb5f6 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/compile/pr82096-1.c @@ -0,0 +1,10 @@ +/* { dg-do compile { target { arm*-*-* } } } */ +/* { dg-options "-march=armv5t -mthumb -mfloat-abi=soft" } */ The tests in gcc.c-torture/compile/ are supposed to be target-independent, so it's best to not gate it on target arm*-*-*. Best to add the arm-specific options that you want using a dg-additional-options directive gated on target arm*-*-*. + +static long long AL[24]; + +int +check_ok (void) +{ + return (__sync_bool_compare_and_swap (AL+1, 0x20003ll, 0x1234567890ll)); +} diff --git a/gcc/testsuite/gcc.c-torture/compile/pr82096-2.c b/gcc/testsuite/gcc.c-torture/compile/pr82096-2.c new file mode 100644 index 000..2570b16 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/compile/pr82096-2.c @@ -0,0 +1,10 @@ +/* { dg-do compile { target { arm*-*-* } } } */ +/* { dg-options "-march=armv6 -mthumb -mfloat-abi=soft" } */ + +static long long AL[24]; + +int +check_ok (void) +{ + return (__sync_bool_compare_and_swap (AL+1, 0x20003ll, 0x1234567890ll)); +} I don't think we need an armv6 test here as the root cause is the same as on armv5t AFAICT so this won't give us any extra code coverage. I have made the changes in the test files as requested. Thanks Sudi Thanks, Kyrill diff --git a/gcc/optabs.c b/gcc/optabs.c index 225e955..45b018e 100644 --- a/gcc/optabs.c +++ b/gcc/optabs.c @@ -6295,7 +6295,7 @@ expand_atomic_compare_and_swap (rtx *ptarget_bool, rtx *ptarget_oval, if (cc_reg) { target_bool = emit_store_flag_force (target_bool, EQ, cc_reg, - const0_rtx, VOIDmode, 0, 1); + const0_rtx, mode, 0, 1); goto success; } goto success_bool_from_val; @@ -6323,7 +6323,7 @@ expand_atomic_compare_and_swap (rtx *ptarget_bool, rtx *ptarget_oval, success_bool_from_val: target_bool = emit_store_flag_force (target_bool, EQ, target_oval, - expected, VOIDmode, 1, 1); + expected, mode, 1, 1); success: /* Make sure that the oval output winds up where the caller asked. */ if (ptarget_oval) diff --git a/gcc/testsuite/gcc.c-torture/compile/pr82096.c b/gcc/testsuite/gcc.c-torture/compile/pr82096.c new file mode 100644 index 000..9fed28c --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/compile/pr82096.c @@ -0,0 +1,9 @@ +/* { dg-additional-options "-march=armv5t -mthumb -mfloat-abi=soft" { target arm*-*-* } } */ + +static long long AL[24]; + +int +check_ok (void) +{ + return (__sync_bool_compare_and_swap (AL+1, 0x20003ll, 0x1234567890ll)); +}
Re: [PATCH PR81647][AARCH64] PING Fix handling of Unordered Comparisons in aarch64-simd.md
PING On 15/12/17 11:57, Sudakshina Das wrote: Hi This patch fixes the inconsistent behavior observed at -O3 for the unordered comparisons. According to the online docs (https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gccint/Unary-and-Binary-Expressions.html), all of the following should not raise an FP exception: - UNGE_EXPR - UNGT_EXPR - UNLE_EXPR - UNLT_EXPR - UNEQ_EXPR Also ORDERED_EXPR and UNORDERED_EXPR should only return zero or one. The aarch64-simd.md handling of these were generating exception raising instructions such as fcmgt. This patch changes the instructions that are emitted to in order to not give out the exceptions. We first check each operand for NaNs and force any elements containing NaN to zero before using them in the compare. Example: UN (a, b) -> UNORDERED (a, b) | (cm (isnan (a) ? 0.0 : a, isnan (b) ? 0.0 : b)) The ORDERED_EXPR is now handled as (cmeq (a, a) & cmeq (b, b)) and UNORDERED_EXPR as ~ORDERED_EXPR and UNEQ as (~ORDERED_EXPR | cmeq (a,b)). Testing done: Checked for regressions on bootstrapped aarch64-none-linux-gnu and added a new test case. Is this ok for trunk? This will probably need a back-port to gcc-7-branch as well. Thanks Sudi ChangeLog Entries: *** gcc/ChangeLog *** 2017-12-15 Sudakshina Das PR target/81647 * config/aarch64/aarch64-simd.md (vec_cmp): Modify instructions for UNLT, UNLE, UNGT, UNGE, UNEQ, UNORDERED and ORDERED. *** gcc/testsuite/ChangeLog *** 2017-12-15 Sudakshina Das PR target/81647 * gcc.target/aarch64/pr81647.c: New.
Re: [PATCH PR82096] Fix ICE in int_mode_for_mode, at stor-layout.c:403 with arm-linux-gnueabi
Hi Jeff On 05/01/18 18:44, Jeff Law wrote: On 01/04/2018 08:35 AM, Sudakshina Das wrote: Hi The bug reported a particular test di-longlong64-sync-1.c failing when run on arm-linux-gnueabi with options -mthumb -march=armv5t -O[g,1,2,3] and -mthumb -march=armv6 -O[g,1,2,3]. According to what I could see, the crash was caused because of the explicit VOIDmode argument that was sent to emit_store_flag_force (). Since the comparing argument was a long long, it was being forced into a VOID type register before the comparison (in prepare_cmp_insn()) is done. As pointed out by Kyrill, there is a comment on emit_store_flag() which says "MODE is the mode to use for OP0 and OP1 should they be CONST_INTs. If it is VOIDmode, they cannot both be CONST_INT". This condition is not true in this case and thus I think it is suitable to change the argument. Testing done: Checked for regressions on bootstrapped arm-none-linux-gnueabi and arm-none-linux-gnueabihf and added new test cases. Sudi ChangeLog entries: *** gcc/ChangeLog *** 2017-01-04 Sudakshina Das PR target/82096 * optabs.c (expand_atomic_compare_and_swap): Change argument to emit_store_flag_force. *** gcc/testsuite/ChangeLog *** 2017-01-04 Sudakshina Das PR target/82096 * gcc.c-torture/compile/pr82096-1.c: New test. * gcc.c-torture/compile/pr82096-2.c: Likwise. In the case where both (op0/op1) to emit_store_flag/emit_store_flag_force are constants, don't we know the result of the comparison and shouldn't we have optimized the store flag to something simpler? I feel like I must be missing something here. emit_store_flag_force () is comparing a register to op0. The 2 constant arguments are to the expand_atomic_compare_and_swap () function. emit_store_flag_force () is used in case when this function is called by the bool variant of the built-in function where the bool return value is computed by comparing the result register with the expected op0. Sudi Jeff
Re: C++ PATCHes to xvalue handling
On 23/05/18 18:21, Jason Merrill wrote: The first patch implements the adjustments from core issues 616 and 1213 to the value category of subobjects of class prvalues: they were considered prvalues themselves, but that was kind of nonsensical. Now they are considered xvalues. Along with this, I've removed the diagnostic distinction between xvalues and prvalues when trying to use one or the other as an lvalue; the important thing is that they are rvalues. The second patch corrects various issues with casts and xvalues/rvalue references: we were treating an xvalue operand to dynamic_cast as an lvalue, and we were objecting to casts from prvalue to rvalue reference type. With the second patch: commit f7d2790049fd1e59af4b69ee12f7c101cfe4cdab Author: jason Date: Wed May 23 17:21:39 2018 + Fix cast to rvalue reference from prvalue. * cvt.c (diagnose_ref_binding): Handle rvalue reference. * rtti.c (build_dynamic_cast_1): Don't try to build a reference to non-class type. Handle xvalue argument. * typeck.c (build_reinterpret_cast_1): Allow cast from prvalue to rvalue reference. * semantics.c (finish_compound_literal): Do direct-initialization, not cast, to initialize a reference. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@260622 138bc75d-0d04-0410-961f-82ee72b054a4 I have observed the following failure in Spec2017 while building 510.parest_r on aarch64-none-linux-gnu aarch64-none-linux-gnu-g++ -c -o source/numerics/matrices.all_dimensions.o -DSPEC -DNDEBUG -Iinclude -I. -DSPEC_AUTO_SUPPRESS_OPENMP -mcpu=cortex-a57+crypto -Ofast -fomit-frame-pointer -fpermissive-DSPEC_LP64 source/numerics/matrices.all_dimensions.cc source/numerics/matrices.all_dimensions.cc: In static member function 'static void dealii::MatrixTools::apply_boundary_values(const std::map&, dealii::BlockSparseMatrix&, dealii::BlockVector&, dealii::BlockVector&, bool)': source/numerics/matrices.all_dimensions.cc:469:50: error: lvalue required as unary '&' operand [this_sparsity.get_rowstart_indices()[row]]; ^ source/numerics/matrices.all_dimensions.cc:472:55: error: lvalue required as unary '&' operand [this_sparsity.get_rowstart_indices()[row]+1], ^ source/numerics/matrices.all_dimensions.cc:474:55: error: lvalue required as unary '&' operand [this_sparsity.get_rowstart_indices()[row+1]], ^ source/numerics/matrices.all_dimensions.cc:479:49: error: lvalue required as unary '&' operand [this_sparsity.get_rowstart_indices()[row]], ^ source/numerics/matrices.all_dimensions.cc:481:51: error: lvalue required as unary '&' operand [this_sparsity.get_rowstart_indices()[row+1]], ^ source/numerics/matrices.all_dimensions.cc:510:50: error: lvalue required as unary '&' operand [this_sparsity.get_rowstart_indices()[0]]); Sudi Tested x86_64-pc-linux-gnu, applying to trunk.
Re: [PATCH][AARCH64] PR target/84521 Fix frame pointer corruption with -fomit-frame-pointer with __builtin_setjmp
On 02/05/18 18:28, Jeff Law wrote: On 03/14/2018 11:40 AM, Sudakshina Das wrote: Hi This patch is another partial fix for PR 84521. This is adding a definition to one of the target hooks used in the SJLJ implemetation so that AArch64 defines the hard_frame_pointer_rtx as the TARGET_BUILTIN_SETJMP_FRAME_VALUE. As pointed out by Wilco there is still a lot more work to be done for these builtins in the future. Testing: Bootstrapped and regtested on aarch64-none-linux-gnu and added new test. Is this ok for trunk? Sudi *** gcc/ChangeLog *** 2018-03-14 Sudakshina Das * builtins.c (expand_builtin_setjmp_receiver): Update condition to restore frame pointer. * config/aarch64/aarch64.h (DONT_USE_BUILTIN_SETJMP): Update comment. * config/aarch64/aarch64.c (aarch64_builtin_setjmp_frame_value): New. (TARGET_BUILTIN_SETJMP_FRAME_VALUE): Define. *** gcc/testsuite/ChangeLog *** 2018-03-14 Sudakshina Das * gcc.c-torture/execute/pr84521.c: New test. So just to be clear, you do _not_ want the frame pointer restored here? Right? aarch64_builtin_setjmp_frame_value always returns hard_frame_pointer_rtx which will cause the generic code in builtins.c to not restore the frame pointer. Have you looked at other targets which define builtin_setjmp_frame_value to determine if they'll do the right thing. x86 and sparc are the most important. I see that arc, vax and avr also define that hook, but are obviously harder to test. Sorry this fell off my radar. I have reg-tested it on x86 and tried it on the sparc machine from the gcc farm but I think I couldn't finished the run and now its showing to he unreachable. Sudi jeff
Re: [PATCH][AARCH64] PR target/84521 Fix frame pointer corruption with -fomit-frame-pointer with __builtin_setjmp
Hi Eric On 07/06/18 16:33, Eric Botcazou wrote: Sorry this fell off my radar. I have reg-tested it on x86 and tried it on the sparc machine from the gcc farm but I think I couldn't finished the run and now its showing to he unreachable. The patch is a no-op for SPARC because it defines the nonlocal_goto pattern. But I would nevertheless strongly suggest _not_ fiddling with the generic code like that and just defining the nonlocal_goto pattern for Aarch64 instead. Thank you for the suggestion, I have edited the patch accordingly and defined the nonlocal_goto pattern for AArch64. This has also helped take care of the issue with __builtin_longjmp that Wilco had mentioned in his comment on the PR (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84521#c19). I have also modified the test case according to Wilco's comment to add an extra jump buffer. This test case passes with AArch64 but fails on x86 trunk as follows (It may fail on other targets as well): FAIL: gcc.c-torture/execute/pr84521.c -O1 execution test FAIL: gcc.c-torture/execute/pr84521.c -O2 execution test FAIL: gcc.c-torture/execute/pr84521.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test FAIL: gcc.c-torture/execute/pr84521.c -O3 -g execution test FAIL: gcc.c-torture/execute/pr84521.c -Os execution test FAIL: gcc.c-torture/execute/pr84521.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test FAIL: gcc.c-torture/execute/pr84521.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test Testing: Bootstrapped and regtested on aarch64-none-linux-gnu. Is this ok for trunk? Sudi *** gcc/ChangeLog *** 2018-06-14 Sudakshina Das PR target/84521 * config/aarch64/aarch64.h (DONT_USE_BUILTIN_SETJMP): Update comment. * config/aarch64/aarch64.c (aarch64_needs_frame_chain): Add cfun->has_nonlocal_label to force frame chain. (aarch64_builtin_setjmp_frame_value): New. (TARGET_BUILTIN_SETJMP_FRAME_VALUE): Define. * config/aarch64/aarch64.md (nonlocal_goto): New. *** gcc/testsuite/ChangeLog *** 2018-06-14 Sudakshina Das PR target/84521 * gcc.c-torture/execute/pr84521.c: New test. diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 976f9af..f042def 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -474,7 +474,9 @@ extern unsigned aarch64_architecture_version; #define EH_RETURN_STACKADJ_RTX gen_rtx_REG (Pmode, R4_REGNUM) #define EH_RETURN_HANDLER_RTX aarch64_eh_return_handler_rtx () -/* Don't use __builtin_setjmp until we've defined it. */ +/* Don't use __builtin_setjmp until we've defined it. + CAUTION: This macro is only used during exception unwinding. + Don't fall for its name. */ #undef DONT_USE_BUILTIN_SETJMP #define DONT_USE_BUILTIN_SETJMP 1 diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index bd0ac2f..95f7fe3 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -3998,7 +3998,7 @@ static bool aarch64_needs_frame_chain (void) { /* Force a frame chain for EH returns so the return address is at FP+8. */ - if (frame_pointer_needed || crtl->calls_eh_return) + if (frame_pointer_needed || crtl->calls_eh_return || cfun->has_nonlocal_label) return true; /* A leaf function cannot have calls or write LR. */ @@ -12213,6 +12213,13 @@ aarch64_expand_builtin_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED) expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL); } +/* Implement TARGET_BUILTIN_SETJMP_FRAME_VALUE. */ +static rtx +aarch64_builtin_setjmp_frame_value (void) +{ + return hard_frame_pointer_rtx; +} + /* Implement TARGET_GIMPLIFY_VA_ARG_EXPR. */ static tree @@ -17829,6 +17836,9 @@ aarch64_run_selftests (void) #undef TARGET_FOLD_BUILTIN #define TARGET_FOLD_BUILTIN aarch64_fold_builtin +#undef TARGET_BUILTIN_SETJMP_FRAME_VALUE +#define TARGET_BUILTIN_SETJMP_FRAME_VALUE aarch64_builtin_setjmp_frame_value + #undef TARGET_FUNCTION_ARG #define TARGET_FUNCTION_ARG aarch64_function_arg diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 830f976..381fd83 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -6081,6 +6081,30 @@ DONE; }) +;; This is broadly similar to the builtins.c except that it uses +;; temporaries to load the incoming SP and FP. +(define_expand "nonlocal_goto" + [(use (match_operand 0 "general_operand")) + (use (match_operand 1 "general_operand")) + (use (match_operand 2 "general_operand")) + (use (match_operand 3 "general_operand"))] + "" +{ +rtx label_in = copy_to_reg (operands[1]); +rtx fp_in = copy_to_reg (operands[3]); +rtx sp_in = copy_to_reg (operands[2]); + +emit_move_insn (hard_frame_pointer_rtx, fp_i
Re: [PATCH][AARCH64] PR target/84521 Fix frame pointer corruption with -fomit-frame-pointer with __builtin_setjmp
PING! On 14/06/18 12:10, Sudakshina Das wrote: Hi Eric On 07/06/18 16:33, Eric Botcazou wrote: Sorry this fell off my radar. I have reg-tested it on x86 and tried it on the sparc machine from the gcc farm but I think I couldn't finished the run and now its showing to he unreachable. The patch is a no-op for SPARC because it defines the nonlocal_goto pattern. But I would nevertheless strongly suggest _not_ fiddling with the generic code like that and just defining the nonlocal_goto pattern for Aarch64 instead. Thank you for the suggestion, I have edited the patch accordingly and defined the nonlocal_goto pattern for AArch64. This has also helped take care of the issue with __builtin_longjmp that Wilco had mentioned in his comment on the PR (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84521#c19). I have also modified the test case according to Wilco's comment to add an extra jump buffer. This test case passes with AArch64 but fails on x86 trunk as follows (It may fail on other targets as well): FAIL: gcc.c-torture/execute/pr84521.c -O1 execution test FAIL: gcc.c-torture/execute/pr84521.c -O2 execution test FAIL: gcc.c-torture/execute/pr84521.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test FAIL: gcc.c-torture/execute/pr84521.c -O3 -g execution test FAIL: gcc.c-torture/execute/pr84521.c -Os execution test FAIL: gcc.c-torture/execute/pr84521.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test FAIL: gcc.c-torture/execute/pr84521.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test Testing: Bootstrapped and regtested on aarch64-none-linux-gnu. Is this ok for trunk? Sudi *** gcc/ChangeLog *** 2018-06-14 Sudakshina Das PR target/84521 * config/aarch64/aarch64.h (DONT_USE_BUILTIN_SETJMP): Update comment. * config/aarch64/aarch64.c (aarch64_needs_frame_chain): Add cfun->has_nonlocal_label to force frame chain. (aarch64_builtin_setjmp_frame_value): New. (TARGET_BUILTIN_SETJMP_FRAME_VALUE): Define. * config/aarch64/aarch64.md (nonlocal_goto): New. *** gcc/testsuite/ChangeLog *** 2018-06-14 Sudakshina Das PR target/84521 * gcc.c-torture/execute/pr84521.c: New test.
[PATCH PR81228][AARCH64][gcc-7] Backport r255625 : Fix ICE by adding LTGT
Hi This patch is only adding the missing LTGT to plug the ICE. This is a backport to r255625 of trunk. Testing done: Checked for regressions on bootstrapped aarch64-none-linux-gnu and added a new compile time test case that gives out LTGT to make sure it doesn't ICE. Is this ok for trunk? Thanks Sudi ChangeLog Entries: *** gcc/ChangeLog *** 2018-01-09 Sudakshina Das Bin Cheng Backport from mainline: 2017-12-14 Sudakshina Das Bin Cheng PR target/81228 * config/aarch64/aarch64.c (aarch64_select_cc_mode): Move LTGT to CCFPEmode. * config/aarch64/aarch64-simd.md (vec_cmp): Add LTGT. *** gcc/testsuite/ChangeLog *** 2017-01-09 Sudakshina Das Backport from mainline: 2017-12-14 Sudakshina Das PR target/81228 * gcc.dg/pr81228.c: New. diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index c462164..1e0a346 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -2490,6 +2490,7 @@ case UNEQ: case ORDERED: case UNORDERED: +case LTGT: break; default: gcc_unreachable (); @@ -2544,6 +2545,15 @@ emit_insn (gen_one_cmpl2 (operands[0], operands[0])); break; +case LTGT: + /* LTGT is not guranteed to not generate a FP exception. So let's + go the faster way : ((a > b) || (b > a)). */ + emit_insn (gen_aarch64_cmgt (operands[0], + operands[2], operands[3])); + emit_insn (gen_aarch64_cmgt (tmp, operands[3], operands[2])); + emit_insn (gen_ior3 (operands[0], operands[0], tmp)); + break; + case UNORDERED: /* Operands are ORDERED iff (a > b || b >= a), so we can compute UNORDERED as !ORDERED. */ diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 436091a..db517ca 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -4664,13 +4664,13 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y) case UNGT: case UNGE: case UNEQ: - case LTGT: return CCFPmode; case LT: case LE: case GT: case GE: + case LTGT: return CCFPEmode; default: diff --git a/gcc/testsuite/gcc.dg/pr81228.c b/gcc/testsuite/gcc.dg/pr81228.c new file mode 100644 index 000..f7eecc5 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr81228.c @@ -0,0 +1,21 @@ +/* PR target/81228. */ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-ssa" } */ + +void *a; + +void b () +{ + char c; + long d; + char *e = a; + for (; d; d++) + { +double f, g; +c = g < f || g > f; +e[d] = c; + } +} + +/* Let's make sure we do have a LTGT. */ +/* { dg-final { scan-tree-dump "<>" "ssa" } } */
Re: [PATCH PR82096] Fix ICE in int_mode_for_mode, at stor-layout.c:403 with arm-linux-gnueabi
Hi Jeff On 09/01/18 23:43, Jeff Law wrote: On 01/05/2018 12:25 PM, Sudakshina Das wrote: Hi Jeff On 05/01/18 18:44, Jeff Law wrote: On 01/04/2018 08:35 AM, Sudakshina Das wrote: Hi The bug reported a particular test di-longlong64-sync-1.c failing when run on arm-linux-gnueabi with options -mthumb -march=armv5t -O[g,1,2,3] and -mthumb -march=armv6 -O[g,1,2,3]. According to what I could see, the crash was caused because of the explicit VOIDmode argument that was sent to emit_store_flag_force (). Since the comparing argument was a long long, it was being forced into a VOID type register before the comparison (in prepare_cmp_insn()) is done. As pointed out by Kyrill, there is a comment on emit_store_flag() which says "MODE is the mode to use for OP0 and OP1 should they be CONST_INTs. If it is VOIDmode, they cannot both be CONST_INT". This condition is not true in this case and thus I think it is suitable to change the argument. Testing done: Checked for regressions on bootstrapped arm-none-linux-gnueabi and arm-none-linux-gnueabihf and added new test cases. Sudi ChangeLog entries: *** gcc/ChangeLog *** 2017-01-04 Sudakshina Das PR target/82096 * optabs.c (expand_atomic_compare_and_swap): Change argument to emit_store_flag_force. *** gcc/testsuite/ChangeLog *** 2017-01-04 Sudakshina Das PR target/82096 * gcc.c-torture/compile/pr82096-1.c: New test. * gcc.c-torture/compile/pr82096-2.c: Likwise. In the case where both (op0/op1) to emit_store_flag/emit_store_flag_force are constants, don't we know the result of the comparison and shouldn't we have optimized the store flag to something simpler? I feel like I must be missing something here. emit_store_flag_force () is comparing a register to op0. ? /* Emit a store-flags instruction for comparison CODE on OP0 and OP1 and storing in TARGET. Normally return TARGET. Return 0 if that cannot be done. MODE is the mode to use for OP0 and OP1 should they be CONST_INTs. If it is VOIDmode, they cannot both be CONST_INT. So we're comparing op0 and op1 AFAICT. One, but not both can be a CONST_INT. If both are a CONST_INT, then you need to address the problem in the caller (by optimizing away the condition). If you've got a REG and a CONST_INT, then the mode should be taken from the REG operand. The 2 constant arguments are to the expand_atomic_compare_and_swap () function. emit_store_flag_force () is used in case when this function is called by the bool variant of the built-in function where the bool return value is computed by comparing the result register with the expected op0. So if only one of the two objects is a CONST_INT, then the mode should come from the other object. I think that's the fundamental problem here and that you're just papering over it by changing the caller. I think my earlier explanation was a bit misleading and I may have rushed into quoting the comment about both operands being const for emit_store_flag_force(). The problem is with the function and I do agree with your suggestion of changing the function to add the code below to be a better approach than the changing the caller. I will change the patch and test it. Thanks Sudi For example in emit_store_flag_1 we have this code: /* If one operand is constant, make it the second one. Only do this if the other operand is not constant as well. */ if (swap_commutative_operands_p (op0, op1)) { std::swap (op0, op1); code = swap_condition (code); } if (mode == VOIDmode) mode = GET_MODE (op0); I think if you do this in emit_store_flag_force as well everything will "just work". You can put it after this call/test pair: /* First see if emit_store_flag can do the job. */ tem = emit_store_flag (target, code, op0, op1, mode, unsignedp, normalizep); if (tem != 0) return tem; jeff
Re: [PATCH PR82096] Fix ICE in int_mode_for_mode, at stor-layout.c:403 with arm-linux-gnueabi
Hi Jeff On 10/01/18 10:44, Sudakshina Das wrote: Hi Jeff On 09/01/18 23:43, Jeff Law wrote: On 01/05/2018 12:25 PM, Sudakshina Das wrote: Hi Jeff On 05/01/18 18:44, Jeff Law wrote: On 01/04/2018 08:35 AM, Sudakshina Das wrote: Hi The bug reported a particular test di-longlong64-sync-1.c failing when run on arm-linux-gnueabi with options -mthumb -march=armv5t -O[g,1,2,3] and -mthumb -march=armv6 -O[g,1,2,3]. According to what I could see, the crash was caused because of the explicit VOIDmode argument that was sent to emit_store_flag_force (). Since the comparing argument was a long long, it was being forced into a VOID type register before the comparison (in prepare_cmp_insn()) is done. As pointed out by Kyrill, there is a comment on emit_store_flag() which says "MODE is the mode to use for OP0 and OP1 should they be CONST_INTs. If it is VOIDmode, they cannot both be CONST_INT". This condition is not true in this case and thus I think it is suitable to change the argument. Testing done: Checked for regressions on bootstrapped arm-none-linux-gnueabi and arm-none-linux-gnueabihf and added new test cases. Sudi ChangeLog entries: *** gcc/ChangeLog *** 2017-01-04 Sudakshina Das PR target/82096 * optabs.c (expand_atomic_compare_and_swap): Change argument to emit_store_flag_force. *** gcc/testsuite/ChangeLog *** 2017-01-04 Sudakshina Das PR target/82096 * gcc.c-torture/compile/pr82096-1.c: New test. * gcc.c-torture/compile/pr82096-2.c: Likwise. In the case where both (op0/op1) to emit_store_flag/emit_store_flag_force are constants, don't we know the result of the comparison and shouldn't we have optimized the store flag to something simpler? I feel like I must be missing something here. emit_store_flag_force () is comparing a register to op0. ? /* Emit a store-flags instruction for comparison CODE on OP0 and OP1 and storing in TARGET. Normally return TARGET. Return 0 if that cannot be done. MODE is the mode to use for OP0 and OP1 should they be CONST_INTs. If it is VOIDmode, they cannot both be CONST_INT. So we're comparing op0 and op1 AFAICT. One, but not both can be a CONST_INT. If both are a CONST_INT, then you need to address the problem in the caller (by optimizing away the condition). If you've got a REG and a CONST_INT, then the mode should be taken from the REG operand. The 2 constant arguments are to the expand_atomic_compare_and_swap () function. emit_store_flag_force () is used in case when this function is called by the bool variant of the built-in function where the bool return value is computed by comparing the result register with the expected op0. So if only one of the two objects is a CONST_INT, then the mode should come from the other object. I think that's the fundamental problem here and that you're just papering over it by changing the caller. I think my earlier explanation was a bit misleading and I may have rushed into quoting the comment about both operands being const for emit_store_flag_force(). The problem is with the function and I do agree with your suggestion of changing the function to add the code below to be a better approach than the changing the caller. I will change the patch and test it. This is the updated patch according to your suggestions. Testing: Checked for regressions on arm-none-linux-gnueabihf and added new test case. Thanks Sudi ChangeLog entries: *** gcc/ChangeLog *** 2017-01-10 Sudakshina Das PR target/82096 * expmed.c (emit_store_flag_force): Swap if const op0 and change VOIDmode to mode of op0. *** gcc/testsuite/ChangeLog *** 2017-01-10 Sudakshina Das PR target/82096 * gcc.c-torture/compile/pr82096.c: New test. Thanks Sudi For example in emit_store_flag_1 we have this code: /* If one operand is constant, make it the second one. Only do this if the other operand is not constant as well. */ if (swap_commutative_operands_p (op0, op1)) { std::swap (op0, op1); code = swap_condition (code); } if (mode == VOIDmode) mode = GET_MODE (op0); I think if you do this in emit_store_flag_force as well everything will "just work". You can put it after this call/test pair: /* First see if emit_store_flag can do the job. */ tem = emit_store_flag (target, code, op0, op1, mode, unsignedp, normalizep); if (tem != 0) return tem; jeff diff --git a/gcc/expmed.c b/gcc/expmed.c index 6b22946..142d542 100644 --- a/gcc/expmed.c +++ b/gcc/expmed.c @@ -6084,6 +6084,18 @@ emit_store_flag_force (rtx target, enum rtx_code code, rtx op0, rtx op1, if (tem != 0) return tem; + /* If one operand is constant, make it the second one. Only do this + if the other operand is not constant as well. */ + + if (swap_commutative_operands_p (op0, op1)) +{ + std::s
Re: [PATCH PR82096] Fix ICE in int_mode_for_mode, at stor-layout.c:403 with arm-linux-gnueabi
Hi Jeff On 10/01/18 21:08, Jeff Law wrote: On 01/10/2018 09:25 AM, Sudakshina Das wrote: Hi Jeff On 10/01/18 10:44, Sudakshina Das wrote: Hi Jeff On 09/01/18 23:43, Jeff Law wrote: On 01/05/2018 12:25 PM, Sudakshina Das wrote: Hi Jeff On 05/01/18 18:44, Jeff Law wrote: On 01/04/2018 08:35 AM, Sudakshina Das wrote: Hi The bug reported a particular test di-longlong64-sync-1.c failing when run on arm-linux-gnueabi with options -mthumb -march=armv5t -O[g,1,2,3] and -mthumb -march=armv6 -O[g,1,2,3]. According to what I could see, the crash was caused because of the explicit VOIDmode argument that was sent to emit_store_flag_force (). Since the comparing argument was a long long, it was being forced into a VOID type register before the comparison (in prepare_cmp_insn()) is done. As pointed out by Kyrill, there is a comment on emit_store_flag() which says "MODE is the mode to use for OP0 and OP1 should they be CONST_INTs. If it is VOIDmode, they cannot both be CONST_INT". This condition is not true in this case and thus I think it is suitable to change the argument. Testing done: Checked for regressions on bootstrapped arm-none-linux-gnueabi and arm-none-linux-gnueabihf and added new test cases. Sudi ChangeLog entries: *** gcc/ChangeLog *** 2017-01-04 Sudakshina Das PR target/82096 * optabs.c (expand_atomic_compare_and_swap): Change argument to emit_store_flag_force. *** gcc/testsuite/ChangeLog *** 2017-01-04 Sudakshina Das PR target/82096 * gcc.c-torture/compile/pr82096-1.c: New test. * gcc.c-torture/compile/pr82096-2.c: Likwise. In the case where both (op0/op1) to emit_store_flag/emit_store_flag_force are constants, don't we know the result of the comparison and shouldn't we have optimized the store flag to something simpler? I feel like I must be missing something here. emit_store_flag_force () is comparing a register to op0. ? /* Emit a store-flags instruction for comparison CODE on OP0 and OP1 and storing in TARGET. Normally return TARGET. Return 0 if that cannot be done. MODE is the mode to use for OP0 and OP1 should they be CONST_INTs. If it is VOIDmode, they cannot both be CONST_INT. So we're comparing op0 and op1 AFAICT. One, but not both can be a CONST_INT. If both are a CONST_INT, then you need to address the problem in the caller (by optimizing away the condition). If you've got a REG and a CONST_INT, then the mode should be taken from the REG operand. The 2 constant arguments are to the expand_atomic_compare_and_swap () function. emit_store_flag_force () is used in case when this function is called by the bool variant of the built-in function where the bool return value is computed by comparing the result register with the expected op0. So if only one of the two objects is a CONST_INT, then the mode should come from the other object. I think that's the fundamental problem here and that you're just papering over it by changing the caller. I think my earlier explanation was a bit misleading and I may have rushed into quoting the comment about both operands being const for emit_store_flag_force(). The problem is with the function and I do agree with your suggestion of changing the function to add the code below to be a better approach than the changing the caller. I will change the patch and test it. This is the updated patch according to your suggestions. Testing: Checked for regressions on arm-none-linux-gnueabihf and added new test case. Thanks Sudi ChangeLog entries: *** gcc/ChangeLog *** 2017-01-10 Sudakshina Das PR target/82096 * expmed.c (emit_store_flag_force): Swap if const op0 and change VOIDmode to mode of op0. *** gcc/testsuite/ChangeLog *** 2017-01-10 Sudakshina Das PR target/82096 * gcc.c-torture/compile/pr82096.c: New test. OK. Thanks. Committed as r256526. Sudi jeff
[PATCH][ARM] Fix test fail with conflicting -mfloat-abi
Hi This patch fixes my earlier test case that fails for arm-none-eabi with explicit user option for -mfloat-abi which conflict with the test case options. I have added a guard to skip the test on those cases. @Christophe: Sorry about this. I think this should fix the test case. Can you please confirm if this works for you? Thanks Sudi gcc/testsuite/ChangeLog 2018-01-12 Sudakshina Das * gcc.c-torture/compile/pr82096.c: Add dg-skip-if directive. diff --git a/gcc/testsuite/gcc.c-torture/compile/pr82096.c b/gcc/testsuite/gcc.c-torture/compile/pr82096.c index 9fed28c..759d390 100644 --- a/gcc/testsuite/gcc.c-torture/compile/pr82096.c +++ b/gcc/testsuite/gcc.c-torture/compile/pr82096.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "Do not combine float-abi values" { arm*-*-* } { "-mfloat-abi=*" } { "-mfloat-abi=soft" } } */ /* { dg-additional-options "-march=armv5t -mthumb -mfloat-abi=soft" { target arm*-*-* } } */ static long long AL[24];
Re: [PATCH][ARM] Fix test fail with conflicting -mfloat-abi
Hi Christophe On 12/01/18 18:32, Christophe Lyon wrote: Le 12 janv. 2018 15:26, "Sudakshina Das" a écrit : Hi This patch fixes my earlier test case that fails for arm-none-eabi with explicit user option for -mfloat-abi which conflict with the test case options. I have added a guard to skip the test on those cases. @Christophe: Sorry about this. I think this should fix the test case. Can you please confirm if this works for you? Yes it does thanks Thanks for checking that. I have added one more directive for armv5t as well to avoid any conflicts for mcpu options. Sudi Thanks Sudi gcc/testsuite/ChangeLog 2018-01-12 Sudakshina Das * gcc.c-torture/compile/pr82096.c: Add dg-skip-if directive. diff --git a/gcc/testsuite/gcc.c-torture/compile/pr82096.c b/gcc/testsuite/gcc.c-torture/compile/pr82096.c index 9fed28c..35551f5 100644 --- a/gcc/testsuite/gcc.c-torture/compile/pr82096.c +++ b/gcc/testsuite/gcc.c-torture/compile/pr82096.c @@ -1,3 +1,5 @@ +/* { dg-require-effective-target arm_arch_v5t_ok } */ +/* { dg-skip-if "Do not combine float-abi values" { arm*-*-* } { "-mfloat-abi=*" } { "-mfloat-abi=soft" } } */ /* { dg-additional-options "-march=armv5t -mthumb -mfloat-abi=soft" { target arm*-*-* } } */ static long long AL[24];
Re: [PATCH PR82096] Fix ICE in int_mode_for_mode, at stor-layout.c:403 with arm-linux-gnueabi
Hi Jeff On 12/01/18 23:00, Jeff Law wrote: On 01/12/2018 01:45 AM, Christophe Lyon wrote: Hi, On 11 January 2018 at 11:58, Sudakshina Das wrote: Hi Jeff On 10/01/18 21:08, Jeff Law wrote: On 01/10/2018 09:25 AM, Sudakshina Das wrote: Hi Jeff On 10/01/18 10:44, Sudakshina Das wrote: Hi Jeff On 09/01/18 23:43, Jeff Law wrote: On 01/05/2018 12:25 PM, Sudakshina Das wrote: Hi Jeff On 05/01/18 18:44, Jeff Law wrote: On 01/04/2018 08:35 AM, Sudakshina Das wrote: Hi The bug reported a particular test di-longlong64-sync-1.c failing when run on arm-linux-gnueabi with options -mthumb -march=armv5t -O[g,1,2,3] and -mthumb -march=armv6 -O[g,1,2,3]. According to what I could see, the crash was caused because of the explicit VOIDmode argument that was sent to emit_store_flag_force (). Since the comparing argument was a long long, it was being forced into a VOID type register before the comparison (in prepare_cmp_insn()) is done. As pointed out by Kyrill, there is a comment on emit_store_flag() which says "MODE is the mode to use for OP0 and OP1 should they be CONST_INTs. If it is VOIDmode, they cannot both be CONST_INT". This condition is not true in this case and thus I think it is suitable to change the argument. Testing done: Checked for regressions on bootstrapped arm-none-linux-gnueabi and arm-none-linux-gnueabihf and added new test cases. Sudi ChangeLog entries: *** gcc/ChangeLog *** 2017-01-04 Sudakshina Das PR target/82096 * optabs.c (expand_atomic_compare_and_swap): Change argument to emit_store_flag_force. *** gcc/testsuite/ChangeLog *** 2017-01-04 Sudakshina Das PR target/82096 * gcc.c-torture/compile/pr82096-1.c: New test. * gcc.c-torture/compile/pr82096-2.c: Likwise. In the case where both (op0/op1) to emit_store_flag/emit_store_flag_force are constants, don't we know the result of the comparison and shouldn't we have optimized the store flag to something simpler? I feel like I must be missing something here. emit_store_flag_force () is comparing a register to op0. ? /* Emit a store-flags instruction for comparison CODE on OP0 and OP1 and storing in TARGET. Normally return TARGET. Return 0 if that cannot be done. MODE is the mode to use for OP0 and OP1 should they be CONST_INTs. If it is VOIDmode, they cannot both be CONST_INT. So we're comparing op0 and op1 AFAICT. One, but not both can be a CONST_INT. If both are a CONST_INT, then you need to address the problem in the caller (by optimizing away the condition). If you've got a REG and a CONST_INT, then the mode should be taken from the REG operand. The 2 constant arguments are to the expand_atomic_compare_and_swap () function. emit_store_flag_force () is used in case when this function is called by the bool variant of the built-in function where the bool return value is computed by comparing the result register with the expected op0. So if only one of the two objects is a CONST_INT, then the mode should come from the other object. I think that's the fundamental problem here and that you're just papering over it by changing the caller. I think my earlier explanation was a bit misleading and I may have rushed into quoting the comment about both operands being const for emit_store_flag_force(). The problem is with the function and I do agree with your suggestion of changing the function to add the code below to be a better approach than the changing the caller. I will change the patch and test it. This is the updated patch according to your suggestions. Testing: Checked for regressions on arm-none-linux-gnueabihf and added new test case. Thanks Sudi ChangeLog entries: *** gcc/ChangeLog *** 2017-01-10 Sudakshina Das PR target/82096 * expmed.c (emit_store_flag_force): Swap if const op0 and change VOIDmode to mode of op0. *** gcc/testsuite/ChangeLog *** 2017-01-10 Sudakshina Das PR target/82096 * gcc.c-torture/compile/pr82096.c: New test. OK. Thanks. Committed as r256526. Sudi Could you add a guard like in other tests to skip it if the user added -mfloat-abi=XXX when running the tests? For instance, I have a configuration where I add -mthumb/-march=armv8-a/-mfpu=crypto-neon-fp-armv8/-mfloat-abi=hard and the new test fails because: xgcc: error: -mfloat-abi=soft and -mfloat-abi=hard may not be used together It's starting to feel like the test should move into gcc.target/arm :-) I nearly suggested that already. Consider moving it into gcc.target/arm pre-approved along with adding the -O to the options and whatever is needed to skip the test at the appropriate time. My initial thought was also to put the test in gcc.target/arm. But I wanted to put it in a torture suite as this was failing at different optimization levels. Creating several tests for different optimization levels or a new torture suite jus
Re: [PATCH][ARM] Fix test fail with conflicting -mfloat-abi
Hi Kyrill On 19/01/18 18:00, Kyrill Tkachov wrote: On 16/01/18 10:31, Sudakshina Das wrote: Hi Christophe On 12/01/18 18:32, Christophe Lyon wrote: Le 12 janv. 2018 15:26, "Sudakshina Das" a écrit : Hi This patch fixes my earlier test case that fails for arm-none-eabi with explicit user option for -mfloat-abi which conflict with the test case options. I have added a guard to skip the test on those cases. @Christophe: Sorry about this. I think this should fix the test case. Can you please confirm if this works for you? Yes it does thanks Thanks for checking that. I have added one more directive for armv5t as well to avoid any conflicts for mcpu options. I agree with what Sudi said in https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01422.html I'd rather keep the test in the generic torture suite as long as we get the directives right. So this is ok for trunk (as the changes are arm-specific directives) with one change below: Thanks, Kyrill Sudi Thanks Sudi gcc/testsuite/ChangeLog 2018-01-12 Sudakshina Das * gcc.c-torture/compile/pr82096.c: Add dg-skip-if directive. diff --git a/gcc/testsuite/gcc.c-torture/compile/pr82096.c b/gcc/testsuite/gcc.c-torture/compile/pr82096.c index 9fed28c..35551f5 100644 --- a/gcc/testsuite/gcc.c-torture/compile/pr82096.c +++ b/gcc/testsuite/gcc.c-torture/compile/pr82096.c @@ -1,3 +1,5 @@ +/* { dg-require-effective-target arm_arch_v5t_ok } */ Please also guard this on { target arm*-*-* } That way this test will be run on other targets as well so that they can benefit from it. +/* { dg-skip-if "Do not combine float-abi values" { arm*-*-* } { "-mfloat-abi=*" } { "-mfloat-abi=soft" } } */ /* { dg-additional-options "-march=armv5t -mthumb -mfloat-abi=soft" { target arm*-*-* } } */ Thanks committed with the change as r256941 Sudi
[PATCH PR82096][gcc-7] Backport: Fix ICE in int_mode_for_mode, at stor-layout.c:403 with arm-linux-gnueabi
Hi This is a patch to backport r256526 and r256941 (Fix case fix) of trunk to fix emit_store_flag_force () function to fix the ICE. The original discussion is at https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00219.html and https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01058.html Is this ok for gcc-7-branch? Testing : Ran regression testing with bootstrapped arm-none-linux-gnueabihf. Thanks Sudi ChangeLog entries: *** gcc/ChangeLog *** 2018-01-22 Sudakshina Das Backport from mainline: 2018-01-10 Sudakshina Das PR target/82096 * expmed.c (emit_store_flag_force): Swap if const op0 and change VOIDmode to mode of op0. *** gcc/testsuite/ChangeLog *** 2018-01-22 Sudakshina Das Backport from mainline: 2018-01-10 Sudakshina Das PR target/82096 * gcc.c-torture/compile/pr82096.c: New test. diff --git a/gcc/expmed.c b/gcc/expmed.c index e9f634a..30001ac 100644 --- a/gcc/expmed.c +++ b/gcc/expmed.c @@ -5886,6 +5886,18 @@ emit_store_flag_force (rtx target, enum rtx_code code, rtx op0, rtx op1, if (tem != 0) return tem; + /* If one operand is constant, make it the second one. Only do this + if the other operand is not constant as well. */ + + if (swap_commutative_operands_p (op0, op1)) +{ + std::swap (op0, op1); + code = swap_condition (code); +} + + if (mode == VOIDmode) +mode = GET_MODE (op0); + if (!target) target = gen_reg_rtx (word_mode); diff --git a/gcc/testsuite/gcc.c-torture/compile/pr82096.c b/gcc/testsuite/gcc.c-torture/compile/pr82096.c new file mode 100644 index 000..d144b70 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/compile/pr82096.c @@ -0,0 +1,11 @@ +/* { dg-require-effective-target arm_arch_v5t_ok { target arm*-*-* } } */ +/* { dg-skip-if "Do not combine float-abi values" { arm*-*-* } { "-mfloat-abi=*" } { "-mfloat-abi=soft" } } */ +/* { dg-additional-options "-march=armv5t -mthumb -mfloat-abi=soft" { target arm*-*-* } } */ + +static long long AL[24]; + +int +check_ok (void) +{ + return (__sync_bool_compare_and_swap (AL+1, 0x20003ll, 0x1234567890ll)); +}
Re: [PATCH PR81647][AARCH64][PING] Fix handling of Unordered Comparisons in aarch64-simd.md
PING On 15/12/17 11:57, Sudakshina Das wrote: Hi This patch fixes the inconsistent behavior observed at -O3 for the unordered comparisons. According to the online docs (https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gccint/Unary-and-Binary-Expressions.html), all of the following should not raise an FP exception: - UNGE_EXPR - UNGT_EXPR - UNLE_EXPR - UNLT_EXPR - UNEQ_EXPR Also ORDERED_EXPR and UNORDERED_EXPR should only return zero or one. The aarch64-simd.md handling of these were generating exception raising instructions such as fcmgt. This patch changes the instructions that are emitted to in order to not give out the exceptions. We first check each operand for NaNs and force any elements containing NaN to zero before using them in the compare. Example: UN (a, b) -> UNORDERED (a, b) | (cm (isnan (a) ? 0.0 : a, isnan (b) ? 0.0 : b)) The ORDERED_EXPR is now handled as (cmeq (a, a) & cmeq (b, b)) and UNORDERED_EXPR as ~ORDERED_EXPR and UNEQ as (~ORDERED_EXPR | cmeq (a,b)). Testing done: Checked for regressions on bootstrapped aarch64-none-linux-gnu and added a new test case. Is this ok for trunk? This will probably need a back-port to gcc-7-branch as well. Thanks Sudi ChangeLog Entries: *** gcc/ChangeLog *** 2017-12-15 Sudakshina Das PR target/81647 * config/aarch64/aarch64-simd.md (vec_cmp): Modify instructions for UNLT, UNLE, UNGT, UNGE, UNEQ, UNORDERED and ORDERED. *** gcc/testsuite/ChangeLog *** 2017-12-15 Sudakshina Das PR target/81647 * gcc.target/aarch64/pr81647.c: New.
Re: [PATCH PR82096][gcc-7] Backport: Fix ICE in int_mode_for_mode, at stor-layout.c:403 with arm-linux-gnueabi
On 22/01/18 15:23, Richard Biener wrote: On Mon, Jan 22, 2018 at 4:10 PM, Sudakshina Das wrote: Hi This is a patch to backport r256526 and r256941 (Fix case fix) of trunk to fix emit_store_flag_force () function to fix the ICE. The original discussion is at https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00219.html and https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01058.html Is this ok for gcc-7-branch? Testing : Ran regression testing with bootstrapped arm-none-linux-gnueabihf. The branch is currently frozen so please wait until after the GCC 7.3 release. Committed as r257741 Thanks Sudi Thanks, Richard. Thanks Sudi ChangeLog entries: *** gcc/ChangeLog *** 2018-01-22 Sudakshina Das Backport from mainline: 2018-01-10 Sudakshina Das PR target/82096 * expmed.c (emit_store_flag_force): Swap if const op0 and change VOIDmode to mode of op0. *** gcc/testsuite/ChangeLog *** 2018-01-22 Sudakshina Das Backport from mainline: 2018-01-10 Sudakshina Das PR target/82096 * gcc.c-torture/compile/pr82096.c: New test.
Re: [PATCH PR82096][gcc-7, gcc-6] Backport: Fix ICE in int_mode_for_mode, at stor-layout.c:403 with arm-linux-gnueabi
On 16/02/18 15:40, Sudakshina Das wrote: On 22/01/18 15:23, Richard Biener wrote: On Mon, Jan 22, 2018 at 4:10 PM, Sudakshina Das wrote: Hi This is a patch to backport r256526 and r256941 (Fix case fix) of trunk to fix emit_store_flag_force () function to fix the ICE. The original discussion is at https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00219.html and https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01058.html Is this ok for gcc-7-branch? Testing : Ran regression testing with bootstrapped arm-none-linux-gnueabihf. The branch is currently frozen so please wait until after the GCC 7.3 release. Committed as r257741 Backported to gcc-6 as r257871 Thanks Sudi Thanks Sudi Thanks, Richard. Thanks Sudi ChangeLog entries: *** gcc/ChangeLog *** 2018-01-22 Sudakshina Das Backport from mainline: 2018-01-10 Sudakshina Das PR target/82096 * expmed.c (emit_store_flag_force): Swap if const op0 and change VOIDmode to mode of op0. *** gcc/testsuite/ChangeLog *** 2018-01-22 Sudakshina Das Backport from mainline: 2018-01-10 Sudakshina Das PR target/82096 * gcc.c-torture/compile/pr82096.c: New test.
Re: [PATCH PR81228][AARCH64][gcc-7] Backport r255625 : Fix ICE by adding LTGT
On 09/01/18 15:37, Sudakshina Das wrote: Hi This patch is only adding the missing LTGT to plug the ICE. This is a backport to r255625 of trunk. Testing done: Checked for regressions on bootstrapped aarch64-none-linux-gnu and added a new compile time test case that gives out LTGT to make sure it doesn't ICE. Is this ok for trunk? Backported to gcc-7 as r257901. Sudi Thanks Sudi ChangeLog Entries: *** gcc/ChangeLog *** 2018-01-09 Sudakshina Das Bin Cheng Backport from mainline: 2017-12-14 Sudakshina Das Bin Cheng PR target/81228 * config/aarch64/aarch64.c (aarch64_select_cc_mode): Move LTGT to CCFPEmode. * config/aarch64/aarch64-simd.md (vec_cmp): Add LTGT. *** gcc/testsuite/ChangeLog *** 2017-01-09 Sudakshina Das Backport from mainline: 2017-12-14 Sudakshina Das PR target/81228 * gcc.dg/pr81228.c: New.
[PATCH][ARM][PR82989] Fix unexpected use of NEON instructions for shifts
Hi This patch fixes PR82989 so that we avoid NEON instructions when -mneon-for-64bits is not enabled. This is more of a short term fix for the real deeper problem of making and early decision of choosing or rejecting NEON instructions. There is now a new ticket PR84467 to deal with the longer term solution. (Please refer to the discussion in the bug report for more details). Testing: Bootstrapped and regtested on arm-none-linux-gnueabihf and added a new test case based on the test given on the bug report. Ok for trunk and backports for gcc-7 and gcc-6 branches? Sudi *** gcc/ChangeLog *** 2018-03-14 Sudakshina Das * config/arm/neon.md (ashldi3_neon): Update ?s for constraints to favor GPR over NEON registers. (di3_neon): Likewise. *** gcc/testsuite/ChangeLog *** 2018-03-14 Sudakshina Das * gcc.target/arm/pr82989.c: New test. diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index 6a6f5d7..1646b21 100644 --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -1180,12 +1180,12 @@ ) (define_insn_and_split "ashldi3_neon" - [(set (match_operand:DI 0 "s_register_operand" "= w, w,?&r,?r,?&r, ?w,w") - (ashift:DI (match_operand:DI 1 "s_register_operand" " 0w, w, 0r, 0, r, 0w,w") - (match_operand:SI 2 "general_operand""rUm, i, r, i, i,rUm,i"))) - (clobber (match_scratch:SI 3"= X, X,?&r, X, X, X,X")) - (clobber (match_scratch:SI 4"= X, X,?&r, X, X, X,X")) - (clobber (match_scratch:DI 5"=&w, X, X, X, X, &w,X")) + [(set (match_operand:DI 0 "s_register_operand" "= w, w, &r, r, &r, ?w,?w") + (ashift:DI (match_operand:DI 1 "s_register_operand" " 0w, w, 0r, 0, r, 0w, w") + (match_operand:SI 2 "general_operand""rUm, i, r, i, i,rUm, i"))) + (clobber (match_scratch:SI 3"= X, X, &r, X, X, X, X")) + (clobber (match_scratch:SI 4"= X, X, &r, X, X, X, X")) + (clobber (match_scratch:DI 5"=&w, X, X, X, X, &w, X")) (clobber (reg:CC_C CC_REGNUM))] "TARGET_NEON" "#" @@ -1276,7 +1276,7 @@ ;; ashrdi3_neon ;; lshrdi3_neon (define_insn_and_split "di3_neon" - [(set (match_operand:DI 0 "s_register_operand" "= w, w,?&r,?r,?&r,?w,?w") + [(set (match_operand:DI 0 "s_register_operand" "= w, w, &r, r, &r,?w,?w") (RSHIFTS:DI (match_operand:DI 1 "s_register_operand" " 0w, w, 0r, 0, r,0w, w") (match_operand:SI 2 "reg_or_int_operand" " r, i, r, i, i, r, i"))) (clobber (match_scratch:SI 3 "=2r, X, &r, X, X,2r, X")) diff --git a/gcc/testsuite/gcc.target/arm/pr82989.c b/gcc/testsuite/gcc.target/arm/pr82989.c new file mode 100644 index 000..1295ee6 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/pr82989.c @@ -0,0 +1,38 @@ +/* PR target/82989 */ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_neon_ok } */ +/* { dg-skip-if "avoid conflicts with multilib options" { *-*-* } { "-mcpu=*" } { "-mcpu=cortex-a8" } } */ +/* { dg-skip-if "avoid conflicts with multilib options" { *-*-* } { "-mfpu=*" } { "-mfpu=neon" } } */ +/* { dg-skip-if "avoid conflicts with multilib options" { *-*-* } { "-mfloat-abi=*" } { "-mfloat-abi=hard" } } */ +/* { dg-options "-O2 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=hard" } */ +/* { dg-add-options arm_neon } */ + +typedef unsigned long long uint64_t; + +void f_shr_imm (uint64_t *a ) +{ + *a += *a >> 32; +} +/* { dg-final { scan-assembler-not "vshr*" } } */ + +void f_shr_reg (uint64_t *a, uint64_t b) +{ + *a += *a >> b; +} +/* { dg-final { scan-assembler-not "vshl*" } } */ +/* Only 2 times for f_shr_reg. f_shr_imm should not have any. */ +/* { dg-final { scan-assembler-times {lsr\tr[0-9]+, r[0-9]+, r[0-9]} 2 } } */ + +void f_shl_imm (uint64_t *a) +{ + *a += *a << 32; +} +/* { dg-final { scan-assembler-not "vshl*" } } */ + +void f_shl_reg (uint64_t *a, uint64_t b) +{ + *a += *a << b; +} +/* { dg-final { scan-assembler-not "vshl*" } } */ +/* Only 2 times for f_shl_reg. f_shl_imm should not have any. */ +/* { dg-final { scan-assembler-times {lsl\tr[0-9]+, r[0-9]+, r[0-9]} 2 } } */
[PATCH][AARCH64] PR target/84521 Fix frame pointer corruption with -fomit-frame-pointer with __builtin_setjmp
Hi This patch is another partial fix for PR 84521. This is adding a definition to one of the target hooks used in the SJLJ implemetation so that AArch64 defines the hard_frame_pointer_rtx as the TARGET_BUILTIN_SETJMP_FRAME_VALUE. As pointed out by Wilco there is still a lot more work to be done for these builtins in the future. Testing: Bootstrapped and regtested on aarch64-none-linux-gnu and added new test. Is this ok for trunk? Sudi *** gcc/ChangeLog *** 2018-03-14 Sudakshina Das * builtins.c (expand_builtin_setjmp_receiver): Update condition to restore frame pointer. * config/aarch64/aarch64.h (DONT_USE_BUILTIN_SETJMP): Update comment. * config/aarch64/aarch64.c (aarch64_builtin_setjmp_frame_value): New. (TARGET_BUILTIN_SETJMP_FRAME_VALUE): Define. *** gcc/testsuite/ChangeLog *** 2018-03-14 Sudakshina Das * gcc.c-torture/execute/pr84521.c: New test. diff --git a/gcc/builtins.c b/gcc/builtins.c index 85affa7..640f1a9 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -898,7 +898,8 @@ expand_builtin_setjmp_receiver (rtx receiver_label) /* Now put in the code to restore the frame pointer, and argument pointer, if needed. */ - if (! targetm.have_nonlocal_goto ()) + if (! targetm.have_nonlocal_goto () + && targetm.builtin_setjmp_frame_value () != hard_frame_pointer_rtx) { /* First adjust our frame pointer to its actual value. It was previously set to the start of the virtual area corresponding to diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index e3c52f6..7a21c14 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -474,7 +474,9 @@ extern unsigned aarch64_architecture_version; #define EH_RETURN_STACKADJ_RTX gen_rtx_REG (Pmode, R4_REGNUM) #define EH_RETURN_HANDLER_RTX aarch64_eh_return_handler_rtx () -/* Don't use __builtin_setjmp until we've defined it. */ +/* Don't use __builtin_setjmp until we've defined it. + CAUTION: This macro is only used during exception unwinding. + Don't fall for its name. */ #undef DONT_USE_BUILTIN_SETJMP #define DONT_USE_BUILTIN_SETJMP 1 diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index e1fb87f..e7ac0fe 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -12128,6 +12128,13 @@ aarch64_expand_builtin_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED) expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL); } +/* Implement TARGET_BUILTIN_SETJMP_FRAME_VALUE. */ +static rtx +aarch64_builtin_setjmp_frame_value (void) +{ + return hard_frame_pointer_rtx; +} + /* Implement TARGET_GIMPLIFY_VA_ARG_EXPR. */ static tree @@ -17505,6 +17512,9 @@ aarch64_run_selftests (void) #undef TARGET_FOLD_BUILTIN #define TARGET_FOLD_BUILTIN aarch64_fold_builtin +#undef TARGET_BUILTIN_SETJMP_FRAME_VALUE +#define TARGET_BUILTIN_SETJMP_FRAME_VALUE aarch64_builtin_setjmp_frame_value + #undef TARGET_FUNCTION_ARG #define TARGET_FUNCTION_ARG aarch64_function_arg diff --git a/gcc/testsuite/gcc.c-torture/execute/pr84521.c b/gcc/testsuite/gcc.c-torture/execute/pr84521.c new file mode 100644 index 000..76b10d2 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr84521.c @@ -0,0 +1,49 @@ +/* { dg-require-effective-target indirect_jumps } */ + +#include + +jmp_buf buf; + +int uses_longjmp (void) +{ + __builtin_longjmp (buf, 1); +} + +int gl; +void after_longjmp (void) +{ + gl = 5; +} + +int +test_1 (int n) +{ + volatile int *p = alloca (n); + if (__builtin_setjmp (buf)) +{ + after_longjmp (); +} + else +{ + uses_longjmp (); +} + + return 0; +} + +int __attribute__ ((optimize ("no-omit-frame-pointer"))) +test_2 (int n) +{ + int i; + int *ptr = (int *)__builtin_alloca (sizeof (int) * n); + for (i = 0; i < n; i++) +ptr[i] = i; + test_1 (n); + return 0; +} + +int main (int argc, const char **argv) +{ + __builtin_memset (&buf, 0xaf, sizeof (buf)); + test_2 (100); +}
Re: [Aarch64] Fix conditional branches with target far away.
On 15/03/18 15:27, Sameera Deshpande wrote: Ping! On 28 February 2018 at 16:18, Sameera Deshpande wrote: On 27 February 2018 at 18:25, Ramana Radhakrishnan wrote: On Wed, Feb 14, 2018 at 8:30 AM, Sameera Deshpande wrote: Hi! Please find attached the patch to fix bug in branches with offsets over 1MiB. There has been an attempt to fix this issue in commit 050af05b9761f1979f11c151519e7244d5becd7c However, the far_branch attribute defined in above patch used insn_length - which computes incorrect offset. Hence, eliminated the attribute completely, and computed the offset from insn_addresses instead. Ok for trunk? gcc/Changelog 2018-02-13 Sameera Deshpande * config/aarch64/aarch64.md (far_branch): Remove attribute. Eliminate all the dependencies on the attribute from RTL patterns. I'm not a maintainer but this looks good to me modulo notes about how this was tested. What would be nice is a testcase for the testsuite as well as ensuring that the patch has been bootstrapped and regression tested. AFAIR, the original patch was put in because match.pd failed when bootstrap in another context. regards Ramana -- - Thanks and regards, Sameera D. The patch is tested with GCC testsuite and bootstrapping successfully. Also tested for spec benchmark. I am not a maintainer either. I noticed that the range check you do for the offset has a (<= || >=). The "far_branch" however did (< || >=) for a positive value. Was that also part of the incorrect offset calculation? @@ -692,7 +675,11 @@ { if (get_attr_length (insn) =3D=3D 8) { - if (get_attr_far_branch (insn) =3D=3D 1) + long long int offset; + offset =3D INSN_ADDRESSES (INSN_UID (XEXP (operands[2], 0))) + - INSN_ADDRESSES (INSN_UID (insn)); + + if (offset <=3D -1048576 || offset >=3D 1048572) return aarch64_gen_far_branch (operands, 2, "Ltb", "\\t%0, %1, "); else @@ -709,12 +696,7 @@ (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -32768)) (lt (minus (match_dup 2) (pc)) (const_int 32764))) (const_int 4) - (const_int 8))) - (set (attr "far_branch") - (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576)) - (lt (minus (match_dup 2) (pc)) (const_int 1048572))) - (const_int 0) - (const_int 1)))] + (const_int 8)))] ) Thanks Sudi -- - Thanks and regards, Sameera D.
Re: [PATCH PR81647][AARCH64] Fix handling of Unordered Comparisons in aarch64-simd.md
Hi On 19/03/18 14:29, James Greenhalgh wrote: On Fri, Dec 15, 2017 at 11:57:46AM +, Sudi Das wrote: Hi This patch fixes the inconsistent behavior observed at -O3 for the unordered comparisons. According to the online docs (https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gccint/Unary-and-Binary-Expressions.html), all of the following should not raise an FP exception: - UNGE_EXPR - UNGT_EXPR - UNLE_EXPR - UNLT_EXPR - UNEQ_EXPR Also ORDERED_EXPR and UNORDERED_EXPR should only return zero or one. The aarch64-simd.md handling of these were generating exception raising instructions such as fcmgt. This patch changes the instructions that are emitted to in order to not give out the exceptions. We first check each operand for NaNs and force any elements containing NaN to zero before using them in the compare. Example: UN (a, b) -> UNORDERED (a, b) | (cm (isnan (a) ? 0.0 : a, isnan (b) ? 0.0 : b)) The ORDERED_EXPR is now handled as (cmeq (a, a) & cmeq (b, b)) and UNORDERED_EXPR as ~ORDERED_EXPR and UNEQ as (~ORDERED_EXPR | cmeq (a,b)). Testing done: Checked for regressions on bootstrapped aarch64-none-linux-gnu and added a new test case. Is this ok for trunk? This will probably need a back-port to gcc-7-branch as well. OK. Let it soak on trunk for a while before the backport. Thanks. Committed to trunk as r258653. Will wait a week before backport. Sudi Thanks, James ChangeLog Entries: *** gcc/ChangeLog *** 2017-12-15 Sudakshina Das PR target/81647 * config/aarch64/aarch64-simd.md (vec_cmp): Modify instructions for UNLT, UNLE, UNGT, UNGE, UNEQ, UNORDERED and ORDERED. *** gcc/testsuite/ChangeLog *** 2017-12-15 Sudakshina Das PR target/81647 * gcc.target/aarch64/pr81647.c: New. diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index f90f74fe7fd5990a97b9f4eb68f5735b7d4fb9aa..acff06c753b3e3aaa5775632929909afa4d3294b 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -2731,10 +2731,10 @@ break; } /* Fall through. */ -case UNGE: +case UNLT: std::swap (operands[2], operands[3]); /* Fall through. */ -case UNLE: +case UNGT: case GT: comparison = gen_aarch64_cmgt; break; @@ -2745,10 +2745,10 @@ break; } /* Fall through. */ -case UNGT: +case UNLE: std::swap (operands[2], operands[3]); /* Fall through. */ -case UNLT: +case UNGE: case GE: comparison = gen_aarch64_cmge; break; @@ -2771,21 +2771,35 @@ case UNGT: case UNLE: case UNLT: -case NE: - /* FCM returns false for lanes which are unordered, so if we use -the inverse of the comparison we actually want to emit, then -invert the result, we will end up with the correct result. -Note that a NE NaN and NaN NE b are true for all a, b. - -Our transformations are: -a UNGE b -> !(b GT a) -a UNGT b -> !(b GE a) -a UNLE b -> !(a GT b) -a UNLT b -> !(a GE b) -a NE b -> !(a EQ b) */ - gcc_assert (comparison != NULL); - emit_insn (comparison (operands[0], operands[2], operands[3])); - emit_insn (gen_one_cmpl2 (operands[0], operands[0])); + { + /* All of the above must not raise any FP exceptions. Thus we first + check each operand for NaNs and force any elements containing NaN to + zero before using them in the compare. + Example: UN (a, b) -> UNORDERED (a, b) | +(cm (isnan (a) ? 0.0 : a, + isnan (b) ? 0.0 : b)) + We use the following transformations for doing the comparisions: + a UNGE b -> a GE b + a UNGT b -> a GT b + a UNLE b -> b GE a + a UNLT b -> b GT a. */ + + rtx tmp0 = gen_reg_rtx (mode); + rtx tmp1 = gen_reg_rtx (mode); + rtx tmp2 = gen_reg_rtx (mode); + emit_insn (gen_aarch64_cmeq (tmp0, operands[2], operands[2])); + emit_insn (gen_aarch64_cmeq (tmp1, operands[3], operands[3])); + emit_insn (gen_and3 (tmp2, tmp0, tmp1)); + emit_insn (gen_and3 (tmp0, tmp0, + lowpart_subreg (mode, operands[2], mode))); + emit_insn (gen_and3 (tmp1, tmp1, + lowpart_subreg (mode, operands[3], mode))); + gcc_assert (comparison != NULL); + emit_insn (comparison (operands[0], + lowpart_subreg (mode, tmp0, mode), + lowpart_subreg (mode, tmp1, mode))); + emit_insn (gen_orn3 (operands[0], tmp2, operands[0])); + } break; case LT: @@ -2793,25 +2807,19 @@ case GT: case GE: case EQ: +case NE: /* The easy case. Here we emit one of FCMGE, FCMGT or FCMEQ. As
Re: [PATCH PR81647][AARCH64] Fix handling of Unordered Comparisons in aarch64-simd.md
Hi On 20/03/18 08:13, Christophe Lyon wrote: On 19 March 2018 at 19:55, Sudakshina Das wrote: Hi On 19/03/18 14:29, James Greenhalgh wrote: On Fri, Dec 15, 2017 at 11:57:46AM +, Sudi Das wrote: Hi This patch fixes the inconsistent behavior observed at -O3 for the unordered comparisons. According to the online docs (https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gccint/Unary-and-Binary-Expressions.html), all of the following should not raise an FP exception: - UNGE_EXPR - UNGT_EXPR - UNLE_EXPR - UNLT_EXPR - UNEQ_EXPR Also ORDERED_EXPR and UNORDERED_EXPR should only return zero or one. The aarch64-simd.md handling of these were generating exception raising instructions such as fcmgt. This patch changes the instructions that are emitted to in order to not give out the exceptions. We first check each operand for NaNs and force any elements containing NaN to zero before using them in the compare. Example: UN (a, b) -> UNORDERED (a, b) | (cm (isnan (a) ? 0.0 : a, isnan (b) ? 0.0 : b)) The ORDERED_EXPR is now handled as (cmeq (a, a) & cmeq (b, b)) and UNORDERED_EXPR as ~ORDERED_EXPR and UNEQ as (~ORDERED_EXPR | cmeq (a,b)). Testing done: Checked for regressions on bootstrapped aarch64-none-linux-gnu and added a new test case. Is this ok for trunk? This will probably need a back-port to gcc-7-branch as well. OK. Let it soak on trunk for a while before the backport. Thanks. Committed to trunk as r258653. Will wait a week before backport. Hi, As the test failed to compile on aarch64 bare-metal targets, I added /* { dg-require-effective-target fenv_exceptions } */ as obvious (r258672). 2018-03-20 Christophe Lyon PR target/81647 * gcc.target/aarch64/pr81647.c: Require fenv_exceptions. Index: testsuite/gcc.target/aarch64/pr81647.c === --- testsuite/gcc.target/aarch64/pr81647.c (revision 258671) +++ testsuite/gcc.target/aarch64/pr81647.c (revision 258672) @@ -1,5 +1,6 @@ /* { dg-do run } */ /* { dg-options "-O3 -fdump-tree-ssa" } */ +/* { dg-require-effective-target fenv_exceptions } */ #include Christophe Thanks for fixing this and apologies for missing it on the first place! Sudi Sudi Thanks, James ChangeLog Entries: *** gcc/ChangeLog *** 2017-12-15 Sudakshina Das PR target/81647 * config/aarch64/aarch64-simd.md (vec_cmp): Modify instructions for UNLT, UNLE, UNGT, UNGE, UNEQ, UNORDERED and ORDERED. *** gcc/testsuite/ChangeLog *** 2017-12-15 Sudakshina Das PR target/81647 * gcc.target/aarch64/pr81647.c: New. diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index f90f74fe7fd5990a97b9f4eb68f5735b7d4fb9aa..acff06c753b3e3aaa5775632929909afa4d3294b 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -2731,10 +2731,10 @@ break; } /* Fall through. */ -case UNGE: +case UNLT: std::swap (operands[2], operands[3]); /* Fall through. */ -case UNLE: +case UNGT: case GT: comparison = gen_aarch64_cmgt; break; @@ -2745,10 +2745,10 @@ break; } /* Fall through. */ -case UNGT: +case UNLE: std::swap (operands[2], operands[3]); /* Fall through. */ -case UNLT: +case UNGE: case GE: comparison = gen_aarch64_cmge; break; @@ -2771,21 +2771,35 @@ case UNGT: case UNLE: case UNLT: -case NE: - /* FCM returns false for lanes which are unordered, so if we use -the inverse of the comparison we actually want to emit, then -invert the result, we will end up with the correct result. -Note that a NE NaN and NaN NE b are true for all a, b. - -Our transformations are: -a UNGE b -> !(b GT a) -a UNGT b -> !(b GE a) -a UNLE b -> !(a GT b) -a UNLT b -> !(a GE b) -a NE b -> !(a EQ b) */ - gcc_assert (comparison != NULL); - emit_insn (comparison (operands[0], operands[2], operands[3])); - emit_insn (gen_one_cmpl2 (operands[0], operands[0])); + { + /* All of the above must not raise any FP exceptions. Thus we first + check each operand for NaNs and force any elements containing NaN to + zero before using them in the compare. + Example: UN (a, b) -> UNORDERED (a, b) | +(cm (isnan (a) ? 0.0 : a, + isnan (b) ? 0.0 : b)) + We use the following transformations for doing the comparisions: + a UNGE b -> a GE b + a UNGT b -> a GT b + a UNLE b -> b GE a + a UNLT b -> b GT a. */ + + rtx tmp0 = gen_reg_rtx (mode); + rtx tmp1 = gen_reg_rtx (mode); + rtx tmp2 = gen_reg_rtx (m
Re: [PATCH][ARM][PR82989] Fix unexpected use of NEON instructions for shifts
Hi On 20/03/18 10:03, Richard Earnshaw (lists) wrote: On 14/03/18 10:11, Sudakshina Das wrote: Hi This patch fixes PR82989 so that we avoid NEON instructions when -mneon-for-64bits is not enabled. This is more of a short term fix for the real deeper problem of making and early decision of choosing or rejecting NEON instructions. There is now a new ticket PR84467 to deal with the longer term solution. (Please refer to the discussion in the bug report for more details). Testing: Bootstrapped and regtested on arm-none-linux-gnueabihf and added a new test case based on the test given on the bug report. Ok for trunk and backports for gcc-7 and gcc-6 branches? OK for trunk. Please leave it a couple of days before backporting to ensure that the testcase doesn't tickle any multilib issues. R. Thanks. Committed to trunk as r258677. Will wait a week for backporting. Sudi Sudi *** gcc/ChangeLog *** 2018-03-14 Sudakshina Das * config/arm/neon.md (ashldi3_neon): Update ?s for constraints to favor GPR over NEON registers. (di3_neon): Likewise. *** gcc/testsuite/ChangeLog *** 2018-03-14 Sudakshina Das * gcc.target/arm/pr82989.c: New test. pr82989.diff diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index 6a6f5d7..1646b21 100644 --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -1180,12 +1180,12 @@ ) (define_insn_and_split "ashldi3_neon" - [(set (match_operand:DI 0 "s_register_operand" "= w, w,?&r,?r,?&r, ?w,w") - (ashift:DI (match_operand:DI 1 "s_register_operand" " 0w, w, 0r, 0, r, 0w,w") - (match_operand:SI 2 "general_operand""rUm, i, r, i, i,rUm,i"))) - (clobber (match_scratch:SI 3"= X, X,?&r, X, X, X,X")) - (clobber (match_scratch:SI 4"= X, X,?&r, X, X, X,X")) - (clobber (match_scratch:DI 5"=&w, X, X, X, X, &w,X")) + [(set (match_operand:DI 0 "s_register_operand" "= w, w, &r, r, &r, ?w,?w") + (ashift:DI (match_operand:DI 1 "s_register_operand" " 0w, w, 0r, 0, r, 0w, w") + (match_operand:SI 2 "general_operand""rUm, i, r, i, i,rUm, i"))) + (clobber (match_scratch:SI 3"= X, X, &r, X, X, X, X")) + (clobber (match_scratch:SI 4"= X, X, &r, X, X, X, X")) + (clobber (match_scratch:DI 5"=&w, X, X, X, X, &w, X")) (clobber (reg:CC_C CC_REGNUM))] "TARGET_NEON" "#" @@ -1276,7 +1276,7 @@ ;; ashrdi3_neon ;; lshrdi3_neon (define_insn_and_split "di3_neon" - [(set (match_operand:DI 0 "s_register_operand" "= w, w,?&r,?r,?&r,?w,?w") + [(set (match_operand:DI 0 "s_register_operand" "= w, w, &r, r, &r,?w,?w") (RSHIFTS:DI (match_operand:DI 1 "s_register_operand" " 0w, w, 0r, 0, r,0w, w") (match_operand:SI 2 "reg_or_int_operand" " r, i, r, i, i, r, i"))) (clobber (match_scratch:SI 3"=2r, X, &r, X, X,2r, X")) diff --git a/gcc/testsuite/gcc.target/arm/pr82989.c b/gcc/testsuite/gcc.target/arm/pr82989.c new file mode 100644 index 000..1295ee6 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/pr82989.c @@ -0,0 +1,38 @@ +/* PR target/82989 */ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_neon_ok } */ +/* { dg-skip-if "avoid conflicts with multilib options" { *-*-* } { "-mcpu=*" } { "-mcpu=cortex-a8" } } */ +/* { dg-skip-if "avoid conflicts with multilib options" { *-*-* } { "-mfpu=*" } { "-mfpu=neon" } } */ +/* { dg-skip-if "avoid conflicts with multilib options" { *-*-* } { "-mfloat-abi=*" } { "-mfloat-abi=hard" } } */ +/* { dg-options "-O2 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=hard" } */ +/* { dg-add-options arm_neon } */ + +typedef unsigned long long uint64_t; + +void f_shr_imm (uint64_t *a ) +{ + *a += *a >> 32; +} +/* { dg-final { scan-assembler-not "vshr*" } } */ + +void f_shr_reg (uint64_t *a, uint64_t b) +{ + *a += *a >> b; +} +/* { dg-final { scan-assembler-not "vshl*" } } */ +/* Only 2 times for f_shr_reg. f_shr_imm should not have any. */ +/* { dg-final { scan-assembler-times {lsr\tr[0-9]+, r[0-9]+, r[0-9]} 2 } } */ + +void f_shl_imm (uint64_t *a) +{ + *a += *a << 32; +} +/* { dg-final { scan-assembler-not "vshl*" } } */ + +void f_shl_reg (uint64_t *a, uint64_t b) +{ + *a += *a << b; +} +/* { dg-final { scan-assembler-not "vshl*" } } */ +/* Only 2 times for f_shl_reg. f_shl_imm should not have any. */ +/* { dg-final { scan-assembler-times {lsl\tr[0-9]+, r[0-9]+, r[0-9]} 2 } } */
Re: [PATCH][AARCH64] PR target/84521 Fix frame pointer corruption with -fomit-frame-pointer with __builtin_setjmp
On 19/03/18 12:11, James Greenhalgh wrote: On Wed, Mar 14, 2018 at 05:40:49PM +, Sudi Das wrote: Hi This patch is another partial fix for PR 84521. This is adding a definition to one of the target hooks used in the SJLJ implemetation so that AArch64 defines the hard_frame_pointer_rtx as the TARGET_BUILTIN_SETJMP_FRAME_VALUE. As pointed out by Wilco there is still a lot more work to be done for these builtins in the future. Testing: Bootstrapped and regtested on aarch64-none-linux-gnu and added new test. Is this ok for trunk? OK. Thanks James but I realized I marked this wrong as only AArch64 patch. This also has a mid change so cc'ing more people for approval. Sudi Thanks, James *** gcc/ChangeLog *** 2018-03-14 Sudakshina Das * builtins.c (expand_builtin_setjmp_receiver): Update condition to restore frame pointer. * config/aarch64/aarch64.h (DONT_USE_BUILTIN_SETJMP): Update comment. * config/aarch64/aarch64.c (aarch64_builtin_setjmp_frame_value): New. (TARGET_BUILTIN_SETJMP_FRAME_VALUE): Define. *** gcc/testsuite/ChangeLog *** 2018-03-14 Sudakshina Das * gcc.c-torture/execute/pr84521.c: New test. diff --git a/gcc/builtins.c b/gcc/builtins.c index 85affa7..640f1a9 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -898,7 +898,8 @@ expand_builtin_setjmp_receiver (rtx receiver_label) /* Now put in the code to restore the frame pointer, and argument pointer, if needed. */ - if (! targetm.have_nonlocal_goto ()) + if (! targetm.have_nonlocal_goto () + && targetm.builtin_setjmp_frame_value () != hard_frame_pointer_rtx) { /* First adjust our frame pointer to its actual value. It was previously set to the start of the virtual area corresponding to diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index e3c52f6..7a21c14 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -474,7 +474,9 @@ extern unsigned aarch64_architecture_version; #define EH_RETURN_STACKADJ_RTXgen_rtx_REG (Pmode, R4_REGNUM) #define EH_RETURN_HANDLER_RTX aarch64_eh_return_handler_rtx () -/* Don't use __builtin_setjmp until we've defined it. */ +/* Don't use __builtin_setjmp until we've defined it. + CAUTION: This macro is only used during exception unwinding. + Don't fall for its name. */ #undef DONT_USE_BUILTIN_SETJMP #define DONT_USE_BUILTIN_SETJMP 1 diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index e1fb87f..e7ac0fe 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -12128,6 +12128,13 @@ aarch64_expand_builtin_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED) expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL); } +/* Implement TARGET_BUILTIN_SETJMP_FRAME_VALUE. */ +static rtx +aarch64_builtin_setjmp_frame_value (void) +{ + return hard_frame_pointer_rtx; +} + /* Implement TARGET_GIMPLIFY_VA_ARG_EXPR. */ static tree @@ -17505,6 +17512,9 @@ aarch64_run_selftests (void) #undef TARGET_FOLD_BUILTIN #define TARGET_FOLD_BUILTIN aarch64_fold_builtin +#undef TARGET_BUILTIN_SETJMP_FRAME_VALUE +#define TARGET_BUILTIN_SETJMP_FRAME_VALUE aarch64_builtin_setjmp_frame_value + #undef TARGET_FUNCTION_ARG #define TARGET_FUNCTION_ARG aarch64_function_arg diff --git a/gcc/testsuite/gcc.c-torture/execute/pr84521.c b/gcc/testsuite/gcc.c-torture/execute/pr84521.c new file mode 100644 index 000..76b10d2 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr84521.c @@ -0,0 +1,49 @@ +/* { dg-require-effective-target indirect_jumps } */ + +#include + +jmp_buf buf; + +int uses_longjmp (void) +{ + __builtin_longjmp (buf, 1); +} + +int gl; +void after_longjmp (void) +{ + gl = 5; +} + +int +test_1 (int n) +{ + volatile int *p = alloca (n); + if (__builtin_setjmp (buf)) +{ + after_longjmp (); +} + else +{ + uses_longjmp (); +} + + return 0; +} + +int __attribute__ ((optimize ("no-omit-frame-pointer"))) +test_2 (int n) +{ + int i; + int *ptr = (int *)__builtin_alloca (sizeof (int) * n); + for (i = 0; i < n; i++) +ptr[i] = i; + test_1 (n); + return 0; +} + +int main (int argc, const char **argv) +{ + __builtin_memset (&buf, 0xaf, sizeof (buf)); + test_2 (100); +}
Re: [PATCH][ARM][PR82989] Fix unexpected use of NEON instructions for shifts
Hi On 21/03/18 08:51, Christophe Lyon wrote: On 20 March 2018 at 11:58, Sudakshina Das wrote: Hi On 20/03/18 10:03, Richard Earnshaw (lists) wrote: On 14/03/18 10:11, Sudakshina Das wrote: Hi This patch fixes PR82989 so that we avoid NEON instructions when -mneon-for-64bits is not enabled. This is more of a short term fix for the real deeper problem of making and early decision of choosing or rejecting NEON instructions. There is now a new ticket PR84467 to deal with the longer term solution. (Please refer to the discussion in the bug report for more details). Testing: Bootstrapped and regtested on arm-none-linux-gnueabihf and added a new test case based on the test given on the bug report. Ok for trunk and backports for gcc-7 and gcc-6 branches? OK for trunk. Please leave it a couple of days before backporting to ensure that the testcase doesn't tickle any multilib issues. R. Thanks. Committed to trunk as r258677. Will wait a week for backporting. Sudi Hi Sudi, I've noticed that: FAIL:gcc.target/arm/pr82989.c scan-assembler-times lsl\\tr[0-9]+, r[0-9]+, r[0-9] 2 FAIL:gcc.target/arm/pr82989.c scan-assembler-times lsr\\tr[0-9]+, r[0-9]+, r[0-9] 2 on target armeb-none-linux-gnueabihf --with-mode thumb --with-cpu cortex-a9 --with-fpu neon-fp16 The tests pass when using --with-mode arm Can you check? Yes I see this as well. Sorry about this. I am testing a quick fix for this at the moment. Thanks Sudi Thanks Christophe Sudi *** gcc/ChangeLog *** 2018-03-14 Sudakshina Das * config/arm/neon.md (ashldi3_neon): Update ?s for constraints to favor GPR over NEON registers. (di3_neon): Likewise. *** gcc/testsuite/ChangeLog *** 2018-03-14 Sudakshina Das * gcc.target/arm/pr82989.c: New test. pr82989.diff diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index 6a6f5d7..1646b21 100644 --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -1180,12 +1180,12 @@ ) (define_insn_and_split "ashldi3_neon" - [(set (match_operand:DI 0 "s_register_operand" "= w, w,?&r,?r,?&r, ?w,w") - (ashift:DI (match_operand:DI 1 "s_register_operand" " 0w, w, 0r, 0, r, 0w,w") - (match_operand:SI 2 "general_operand""rUm, i, r, i, i,rUm,i"))) - (clobber (match_scratch:SI 3"= X, X,?&r, X, X, X,X")) - (clobber (match_scratch:SI 4"= X, X,?&r, X, X, X,X")) - (clobber (match_scratch:DI 5"=&w, X, X, X, X, &w,X")) + [(set (match_operand:DI 0 "s_register_operand" "= w, w, &r, r, &r, ?w,?w") + (ashift:DI (match_operand:DI 1 "s_register_operand" " 0w, w, 0r, 0, r, 0w, w") + (match_operand:SI 2 "general_operand""rUm, i, r, i, i,rUm, i"))) + (clobber (match_scratch:SI 3"= X, X, &r, X, X, X, X")) + (clobber (match_scratch:SI 4"= X, X, &r, X, X, X, X")) + (clobber (match_scratch:DI 5"=&w, X, X, X, X, &w, X")) (clobber (reg:CC_C CC_REGNUM))] "TARGET_NEON" "#" @@ -1276,7 +1276,7 @@ ;; ashrdi3_neon ;; lshrdi3_neon (define_insn_and_split "di3_neon" - [(set (match_operand:DI 0 "s_register_operand""= w, w,?&r,?r,?&r,?w,?w") + [(set (match_operand:DI 0 "s_register_operand""= w, w, &r, r, &r,?w,?w") (RSHIFTS:DI (match_operand:DI 1 "s_register_operand" " 0w, w, 0r, 0, r,0w, w") (match_operand:SI 2 "reg_or_int_operand" " r, i, r, i, i, r, i"))) (clobber (match_scratch:SI 3 "=2r, X, &r, X, X,2r, X")) diff --git a/gcc/testsuite/gcc.target/arm/pr82989.c b/gcc/testsuite/gcc.target/arm/pr82989.c new file mode 100644 index 000..1295ee6 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/pr82989.c @@ -0,0 +1,38 @@ +/* PR target/82989 */ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_neon_ok } */ +/* { dg-skip-if "avoid conflicts with multilib options" { *-*-* } { "-mcpu=*" } { "-mcpu=cortex-a8" } } */ +/* { dg-skip-if "avoid conflicts with multilib options" { *-*-* } { "-mfpu=*" } { "-mfpu=neon" } } */ +/* { dg-skip-if "avoid conflicts with multilib options" { *-*-* } { "-mfloat-abi=*" } { "-mfloat-abi=hard" } } */ +/* { dg-options "-O2 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=hard" } */ +/* { dg-add-options arm_neon } */ + +typedef unsigned long long uint64_t; +
[PATCH][ARM] Fix test pr82989.c for big endian and mthumb
Hi The test pr82989.c which was added in one of previous commits is failing for mthumb and big-endian configurations. The aim of this test was to check that NEON instructions are not being used for simple shift operations. The scanning of lsl and lsr instructions and checking its counts were just too restrictive for different configurations. So I have now simplified the test to only check for the absence of NEON instructions. Testing: Only test case change so only tested the said test on differently configured toolchain. @Christophe can you confirm this patch fixes the failure for you? Thanks Sudi *** gcc/testsuite/ChangeLog *** 2018-03-21 Sudakshina Das PR target/82989 * gcc.target/arm/pr82989.c: Change dg-scan-assembly directives. diff --git a/gcc/testsuite/gcc.target/arm/pr82989.c b/gcc/testsuite/gcc.target/arm/pr82989.c index 6f74dba..8519c3f 100644 --- a/gcc/testsuite/gcc.target/arm/pr82989.c +++ b/gcc/testsuite/gcc.target/arm/pr82989.c @@ -13,26 +13,21 @@ void f_shr_imm (uint64_t *a) { *a += *a >> 32; } -/* { dg-final { scan-assembler-not "vshr*" } } */ void f_shr_reg (uint64_t *a, uint64_t b) { *a += *a >> b; } -/* { dg-final { scan-assembler-not "vshl*" } } */ -/* Only 2 times for f_shr_reg. f_shr_imm should not have any. */ -/* { dg-final { scan-assembler-times {lsr\tr[0-9]+, r[0-9]+, r[0-9]} 2 } } */ void f_shl_imm (uint64_t *a) { *a += *a << 32; } -/* { dg-final { scan-assembler-not "vshl*" } } */ void f_shl_reg (uint64_t *a, uint64_t b) { *a += *a << b; } /* { dg-final { scan-assembler-not "vshl*" } } */ -/* Only 2 times for f_shl_reg. f_shl_imm should not have any. */ -/* { dg-final { scan-assembler-times {lsl\tr[0-9]+, r[0-9]+, r[0-9]} 2 } } */ +/* { dg-final { scan-assembler-not "vshr*" } } */ +/* { dg-final { scan-assembler-not "vmov*" } } */
Re: [PATCH][ARM] Fix test pr82989.c for big endian and mthumb
Hi On 21/03/18 17:03, Kyrill Tkachov wrote: On 21/03/18 16:33, Christophe Lyon wrote: On 21 March 2018 at 13:11, Sudakshina Das wrote: Hi The test pr82989.c which was added in one of previous commits is failing for mthumb and big-endian configurations. The aim of this test was to check that NEON instructions are not being used for simple shift operations. The scanning of lsl and lsr instructions and checking its counts were just too restrictive for different configurations. So I have now simplified the test to only check for the absence of NEON instructions. Testing: Only test case change so only tested the said test on differently configured toolchain. @Christophe can you confirm this patch fixes the failure for you? Yes, the validations are now OK my side. Thanks, the patch is ok for trunk. Thanks Christophe for validating and Kyrill for the Ok. Committed to trunk as r258723. Thanks Sudi Kyrill Thanks Christophe Thanks Sudi *** gcc/testsuite/ChangeLog *** 2018-03-21 Sudakshina Das PR target/82989 * gcc.target/arm/pr82989.c: Change dg-scan-assembly directives.
[PATCH][ARM][PR target/84826] Fix ICE in extract_insn, at recog.c:2304 on arm-linux-gnueabi
Hi The ICE in the bug report was happening because the macro USE_RETURN_INSN (FALSE) was returning different values at different points in the compilation. This was internally occurring because the function arm_compute_static_chain_stack_bytes () which was dependent on arm_r3_live_at_start_p () was giving a different value after the cond_exec instructions were created in ce3 causing the liveness of r3 to escape up to the start block. The function arm_compute_static_chain_stack_bytes () should really only compute the value once during epilogue/prologue stage. This pass introduces a new member 'static_chain_stack_bytes' to the target definition of the struct machine_function which gets calculated in expand_prologue and is the value that is returned by arm_compute_static_chain_stack_bytes () beyond that. Testing done: Bootstrapped and regtested on arm-none-linux-gnueabihf and added the reported test to the testsuite. Is this ok for trunk? Sudi ChangeLog entries: *** gcc/ChangeLog *** 2018-03-21 Sudakshina Das PR target/84826 * config/arm/arm.h (machine_function): Add static_chain_stack_bytes. * config/arm/arm.c (arm_compute_static_chain_stack_bytes): Avoid re-computing once computed. (arm_expand_prologue): Compute machine->static_chain_stack_bytes. (arm_init_machine_status): Initialize machine->static_chain_stack_bytes. *** gcc/testsuite/ChangeLog *** 2018-03-21 Sudakshina Das PR target/84826 * gcc.target/arm/pr84826.c: New test diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index bbf3937..2809112 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -1384,6 +1384,9 @@ typedef struct GTY(()) machine_function machine_mode thumb1_cc_mode; /* Set to 1 after arm_reorg has started. */ int after_arm_reorg; + /* The number of bytes used to store the static chain register on the + stack, above the stack frame. */ + int static_chain_stack_bytes; } machine_function; #endif diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index cb6ab81..bc31810 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -19392,6 +19392,11 @@ arm_r3_live_at_start_p (void) static int arm_compute_static_chain_stack_bytes (void) { + /* Once the value is updated from the init value of -1, do not + re-compute. */ + if (cfun->machine->static_chain_stack_bytes != -1) +return cfun->machine->static_chain_stack_bytes; + /* See the defining assertion in arm_expand_prologue. */ if (IS_NESTED (arm_current_func_type ()) && ((TARGET_APCS_FRAME && frame_pointer_needed && TARGET_ARM) @@ -21699,6 +21704,11 @@ arm_expand_prologue (void) emit_insn (gen_movsi (stack_pointer_rtx, r1)); } + /* Let's compute the static_chain_stack_bytes required and store it. Right + now the value must the -1 as stored by arm_init_machine_status (). */ + cfun->machine->static_chain_stack_bytes += arm_compute_static_chain_stack_bytes (); + /* The static chain register is the same as the IP register. If it is clobbered when creating the frame, we need to save and restore it. */ clobber_ip = IS_NESTED (func_type) @@ -24875,6 +24885,7 @@ arm_init_machine_status (void) #if ARM_FT_UNKNOWN != 0 machine->func_type = ARM_FT_UNKNOWN; #endif + machine->static_chain_stack_bytes = -1; return machine; } diff --git a/gcc/testsuite/gcc.target/arm/pr84826.c b/gcc/testsuite/gcc.target/arm/pr84826.c new file mode 100644 index 000..c61c548 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/pr84826.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast -fstack-clash-protection" } */ + +void d (void *); + +void a () +{ + int b; + void bar (int c) + { +if (__builtin_expect (c, 0)) + ++b; + } + d (bar); +}
Re: [Aarch64] Fix conditional branches with target far away.
Hi Sameera On 22/03/18 02:07, Sameera Deshpande wrote: Hi Sudakshina, As per the ARMv8 ARM, for the offset range (-1048576 ,1048572), the far branch instruction offset is inclusive of both the offsets. Hence, I am using <=||=> and not <||>= as it was in previous implementation. I have to admit earlier I was only looking at the patch mechanically and found a difference with the previous implementation in offset comparison. After you pointed out, I looked up the ARMv8 ARM and I have a couple of doubts: 1. My understanding is that any offset in [-1048576 ,1048572] both inclusive qualifies as an 'in range' offset. However, the code for both attribute length and far_branch has been using [-1048576 ,1048572), that is, ( >= && < ). If the far_branch was incorrectly calculated, then maybe the length calculations with similar magic numbers should also be corrected? Of course, I am not an expert in this and maybe this was a conscience decision so I would ask Ramana to maybe clarify if he remembers. 2. Now to come back to your patch, if my understanding is correct, I think a far_branch would be anything outside of this range, that is, (offset < -1048576 || offset > 1048572), anything that can not be represented in the 21-bit range. Thanks Sudi On 16 March 2018 at 00:51, Sudakshina Das wrote: On 15/03/18 15:27, Sameera Deshpande wrote: Ping! On 28 February 2018 at 16:18, Sameera Deshpande wrote: On 27 February 2018 at 18:25, Ramana Radhakrishnan wrote: On Wed, Feb 14, 2018 at 8:30 AM, Sameera Deshpande wrote: Hi! Please find attached the patch to fix bug in branches with offsets over 1MiB. There has been an attempt to fix this issue in commit 050af05b9761f1979f11c151519e7244d5becd7c However, the far_branch attribute defined in above patch used insn_length - which computes incorrect offset. Hence, eliminated the attribute completely, and computed the offset from insn_addresses instead. Ok for trunk? gcc/Changelog 2018-02-13 Sameera Deshpande * config/aarch64/aarch64.md (far_branch): Remove attribute. Eliminate all the dependencies on the attribute from RTL patterns. I'm not a maintainer but this looks good to me modulo notes about how this was tested. What would be nice is a testcase for the testsuite as well as ensuring that the patch has been bootstrapped and regression tested. AFAIR, the original patch was put in because match.pd failed when bootstrap in another context. regards Ramana -- - Thanks and regards, Sameera D. The patch is tested with GCC testsuite and bootstrapping successfully. Also tested for spec benchmark. I am not a maintainer either. I noticed that the range check you do for the offset has a (<= || >=). The "far_branch" however did (< || >=) for a positive value. Was that also part of the incorrect offset calculation? @@ -692,7 +675,11 @@ { if (get_attr_length (insn) =3D=3D 8) { - if (get_attr_far_branch (insn) =3D=3D 1) + long long int offset; + offset =3D INSN_ADDRESSES (INSN_UID (XEXP (operands[2], 0))) + - INSN_ADDRESSES (INSN_UID (insn)); + + if (offset <=3D -1048576 || offset >=3D 1048572) return aarch64_gen_far_branch (operands, 2, "Ltb", "\\t%0, %1, "); else @@ -709,12 +696,7 @@ (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -32768)) (lt (minus (match_dup 2) (pc)) (const_int 32764))) (const_int 4) - (const_int 8))) - (set (attr "far_branch") - (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576)) - (lt (minus (match_dup 2) (pc)) (const_int 1048572))) - (const_int 0) - (const_int 1)))] + (const_int 8)))] ) Thanks Sudi -- - Thanks and regards, Sameera D.
Re: [PATCH][ARM][PR target/84826] Fix ICE in extract_insn, at recog.c:2304 on arm-linux-gnueabi
Hi Kyrill On 22/03/18 16:08, Kyrill Tkachov wrote: Hi Sudi, On 21/03/18 17:44, Sudakshina Das wrote: Hi The ICE in the bug report was happening because the macro USE_RETURN_INSN (FALSE) was returning different values at different points in the compilation. This was internally occurring because the function arm_compute_static_chain_stack_bytes () which was dependent on arm_r3_live_at_start_p () was giving a different value after the cond_exec instructions were created in ce3 causing the liveness of r3 to escape up to the start block. The function arm_compute_static_chain_stack_bytes () should really only compute the value once during epilogue/prologue stage. This pass introduces a new member 'static_chain_stack_bytes' to the target definition of the struct machine_function which gets calculated in expand_prologue and is the value that is returned by arm_compute_static_chain_stack_bytes () beyond that. Testing done: Bootstrapped and regtested on arm-none-linux-gnueabihf and added the reported test to the testsuite. Is this ok for trunk? Thanks for working on this. I agree with the approach, I have a couple of comments inline. Sudi ChangeLog entries: *** gcc/ChangeLog *** 2018-03-21 Sudakshina Das PR target/84826 * config/arm/arm.h (machine_function): Add static_chain_stack_bytes. * config/arm/arm.c (arm_compute_static_chain_stack_bytes): Avoid re-computing once computed. (arm_expand_prologue): Compute machine->static_chain_stack_bytes. (arm_init_machine_status): Initialize machine->static_chain_stack_bytes. *** gcc/testsuite/ChangeLog *** 2018-03-21 Sudakshina Das PR target/84826 * gcc.target/arm/pr84826.c: New test diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index bbf3937..2809112 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -1384,6 +1384,9 @@ typedef struct GTY(()) machine_function machine_mode thumb1_cc_mode; /* Set to 1 after arm_reorg has started. */ int after_arm_reorg; + /* The number of bytes used to store the static chain register on the + stack, above the stack frame. */ + int static_chain_stack_bytes; } machine_function; #endif diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index cb6ab81..bc31810 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -19392,6 +19392,11 @@ arm_r3_live_at_start_p (void) static int arm_compute_static_chain_stack_bytes (void) { + /* Once the value is updated from the init value of -1, do not + re-compute. */ + if (cfun->machine->static_chain_stack_bytes != -1) + return cfun->machine->static_chain_stack_bytes; + My concern is that this approach caches the first value that is computed for static_chain_stack_bytes. I believe the layout frame code is called multiple times during register allocation as it goes through the motions and I think we want the last value it computes during reload How about we do something like: if (cfun->machine->static_chain_stack_bytes != -1 &&epilogue_completed) return cfun->machine->static_chain_stack_bytes; ?... /* See the defining assertion in arm_expand_prologue. */ if (IS_NESTED (arm_current_func_type ()) && ((TARGET_APCS_FRAME && frame_pointer_needed && TARGET_ARM) @@ -21699,6 +21704,11 @@ arm_expand_prologue (void) emit_insn (gen_movsi (stack_pointer_rtx, r1)); } + /* Let's compute the static_chain_stack_bytes required and store it. Right + now the value must the -1 as stored by arm_init_machine_status (). */ ... this comment would need to be tweaked as cfun->machine->static_chain_stack_bytes may hold an intermediate value computed in reload or some other point before epilogue_completed. + cfun->machine->static_chain_stack_bytes + = arm_compute_static_chain_stack_bytes (); + Maybe I did not understand this completely, but my idea was that I am initializing the value of cfun->machine->static_chain_stack_bytes to be -1 in arm_init_machine_status () and then it stays as -1 all throughout reload and hence the function arm_compute_static_chain_stack_bytes () will keep computing the value instead of returning the cached value. Only during expand_prologue (which I assumed occurs much after reload), I overwrite the initial -1 and after that any call to arm_compute_static_chain_stack_bytes () would return this cached value. I did start out writing the patch with a epilogue_completed check but realized that even during this stage arm_compute_static_chain_stack_bytes () was called several times and thus to avoid those re-computations, (again assuming by this stage we already should have a fixed value) I re-wrote it with the initialization to -1 approach. Thanks Sudi Thanks, Kyrill
Re: [PATCH][ARM][PR target/84826] Fix ICE in extract_insn, at recog.c:2304 on arm-linux-gnueabi
Hi On 22/03/18 16:52, Kyrill Tkachov wrote: On 22/03/18 16:20, Sudakshina Das wrote: Hi Kyrill On 22/03/18 16:08, Kyrill Tkachov wrote: Hi Sudi, On 21/03/18 17:44, Sudakshina Das wrote: Hi The ICE in the bug report was happening because the macro USE_RETURN_INSN (FALSE) was returning different values at different points in the compilation. This was internally occurring because the function arm_compute_static_chain_stack_bytes () which was dependent on arm_r3_live_at_start_p () was giving a different value after the cond_exec instructions were created in ce3 causing the liveness of r3 to escape up to the start block. The function arm_compute_static_chain_stack_bytes () should really only compute the value once during epilogue/prologue stage. This pass introduces a new member 'static_chain_stack_bytes' to the target definition of the struct machine_function which gets calculated in expand_prologue and is the value that is returned by arm_compute_static_chain_stack_bytes () beyond that. Testing done: Bootstrapped and regtested on arm-none-linux-gnueabihf and added the reported test to the testsuite. Is this ok for trunk? Thanks for working on this. I agree with the approach, I have a couple of comments inline. Sudi ChangeLog entries: *** gcc/ChangeLog *** 2018-03-21 Sudakshina Das PR target/84826 * config/arm/arm.h (machine_function): Add static_chain_stack_bytes. * config/arm/arm.c (arm_compute_static_chain_stack_bytes): Avoid re-computing once computed. (arm_expand_prologue): Compute machine->static_chain_stack_bytes. (arm_init_machine_status): Initialize machine->static_chain_stack_bytes. *** gcc/testsuite/ChangeLog *** 2018-03-21 Sudakshina Das PR target/84826 * gcc.target/arm/pr84826.c: New test diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index bbf3937..2809112 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -1384,6 +1384,9 @@ typedef struct GTY(()) machine_function machine_mode thumb1_cc_mode; /* Set to 1 after arm_reorg has started. */ int after_arm_reorg; + /* The number of bytes used to store the static chain register on the + stack, above the stack frame. */ + int static_chain_stack_bytes; } machine_function; #endif diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index cb6ab81..bc31810 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -19392,6 +19392,11 @@ arm_r3_live_at_start_p (void) static int arm_compute_static_chain_stack_bytes (void) { + /* Once the value is updated from the init value of -1, do not + re-compute. */ + if (cfun->machine->static_chain_stack_bytes != -1) + return cfun->machine->static_chain_stack_bytes; + My concern is that this approach caches the first value that is computed for static_chain_stack_bytes. I believe the layout frame code is called multiple times during register allocation as it goes through the motions and I think we want the last value it computes during reload How about we do something like: if (cfun->machine->static_chain_stack_bytes != -1 &&epilogue_completed) return cfun->machine->static_chain_stack_bytes; ?... /* See the defining assertion in arm_expand_prologue. */ if (IS_NESTED (arm_current_func_type ()) && ((TARGET_APCS_FRAME && frame_pointer_needed && TARGET_ARM) @@ -21699,6 +21704,11 @@ arm_expand_prologue (void) emit_insn (gen_movsi (stack_pointer_rtx, r1)); } + /* Let's compute the static_chain_stack_bytes required and store it. Right + now the value must the -1 as stored by arm_init_machine_status (). */ ... this comment would need to be tweaked as cfun->machine->static_chain_stack_bytes may hold an intermediate value computed in reload or some other point before epilogue_completed. + cfun->machine->static_chain_stack_bytes + = arm_compute_static_chain_stack_bytes (); + Maybe I did not understand this completely, but my idea was that I am initializing the value of cfun->machine->static_chain_stack_bytes to be -1 in arm_init_machine_status () and then it stays as -1 all throughout reload and hence the function arm_compute_static_chain_stack_bytes () will keep computing the value instead of returning the cached value. Only during expand_prologue (which I assumed occurs much after reload), I overwrite the initial -1 and after that any call to arm_compute_static_chain_stack_bytes () would return this cached value. I did start out writing the patch with a epilogue_completed check but realized that even during this stage arm_compute_static_chain_stack_bytes () was called several times and thus to avoid those re-computations, (again assuming by this stage we already should have a fixed value) I re-wrote it with the initialization to -1 approach. Right, I had read the
Re: [PATCH][ARM][PR target/84826] Fix ICE in extract_insn, at recog.c:2304 on arm-linux-gnueabi
On 23/03/18 09:12, Kyrill Tkachov wrote: On 23/03/18 08:47, Christophe Lyon wrote: Hi Sudi, On 22 March 2018 at 18:26, Sudakshina Das wrote: Hi On 22/03/18 16:52, Kyrill Tkachov wrote: On 22/03/18 16:20, Sudakshina Das wrote: Hi Kyrill On 22/03/18 16:08, Kyrill Tkachov wrote: Hi Sudi, On 21/03/18 17:44, Sudakshina Das wrote: Hi The ICE in the bug report was happening because the macro USE_RETURN_INSN (FALSE) was returning different values at different points in the compilation. This was internally occurring because the function arm_compute_static_chain_stack_bytes () which was dependent on arm_r3_live_at_start_p () was giving a different value after the cond_exec instructions were created in ce3 causing the liveness of r3 to escape up to the start block. The function arm_compute_static_chain_stack_bytes () should really only compute the value once during epilogue/prologue stage. This pass introduces a new member 'static_chain_stack_bytes' to the target definition of the struct machine_function which gets calculated in expand_prologue and is the value that is returned by arm_compute_static_chain_stack_bytes () beyond that. Testing done: Bootstrapped and regtested on arm-none-linux-gnueabihf and added the reported test to the testsuite. Is this ok for trunk? Thanks for working on this. I agree with the approach, I have a couple of comments inline. Sudi ChangeLog entries: *** gcc/ChangeLog *** 2018-03-21 Sudakshina Das PR target/84826 * config/arm/arm.h (machine_function): Add static_chain_stack_bytes. * config/arm/arm.c (arm_compute_static_chain_stack_bytes): Avoid re-computing once computed. (arm_expand_prologue): Compute machine->static_chain_stack_bytes. (arm_init_machine_status): Initialize machine->static_chain_stack_bytes. *** gcc/testsuite/ChangeLog *** 2018-03-21 Sudakshina Das PR target/84826 * gcc.target/arm/pr84826.c: New test The new test fails on arm-none-linux-gnueabi --with-mode thumb --with-cpu cortex-a9 --with-fpu default Dejagnu flags: -march=armv5t Because: /gcc/testsuite/gcc.target/arm/pr84826.c: In function 'a': /gcc/testsuite/gcc.target/arm/pr84826.c:15:1: sorry, unimplemented: -fstack-check=specific for Thumb-1 compiler exited with status 1 FAIL: gcc.target/arm/pr84826.c (test for excess errors) You probably have to add a require-effective-target to skip the test in such cases. Yeah, these tests need a { dg-require-effective-target supports_stack_clash_protection } A patch to add that is pre-approved. Sorry for missing it in review. Kyrill Hi Christophe and Kyrill How about the attached patch? { dg-require-effective-target supports_stack_clash_protection } is not enabled for any of ARM targets, so this is my work around for that. There is a comment in target-supports.exp which makes me a little hesitant in tinkering with the require effective target code. proc check_effective_target_supports_stack_clash_protection { } { # Temporary until the target bits are fully ACK'd. # if { [istarget aarch*-*-*] } { # return 1 # } if { [istarget x86_64-*-*] || [istarget i?86-*-*] || [istarget powerpc*-*-*] || [istarget rs6000*-*-*] || [istarget s390*-*-*] } { return 1 } return 0 } I have opened a new PR 85005 which mentions this that is meant for GCC 9 as part for a bigger cleanup and redesign of the stack clash protection code on ARM backend. *** gcc/testsuite/ChangeLog *** 2018-03-23 Sudakshina Das PR target/84826 * gcc.target/arm/pr84826.c: Add dg directive. Thanks Sudi Thanks, Christophe diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index bbf3937..2809112 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -1384,6 +1384,9 @@ typedef struct GTY(()) machine_function machine_mode thumb1_cc_mode; /* Set to 1 after arm_reorg has started. */ int after_arm_reorg; + /* The number of bytes used to store the static chain register on the + stack, above the stack frame. */ + int static_chain_stack_bytes; } machine_function; #endif diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index cb6ab81..bc31810 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -19392,6 +19392,11 @@ arm_r3_live_at_start_p (void) static int arm_compute_static_chain_stack_bytes (void) { + /* Once the value is updated from the init value of -1, do not + re-compute. */ + if (cfun->machine->static_chain_stack_bytes != -1) + return cfun->machine->static_chain_stack_bytes; + My concern is that this approach caches the first value that is computed for static_chain_stack_bytes. I believe the layout frame code is called multiple times during register allocation as it goes through the motions and I think we want the last value it computes during reload How about
Re: [PATCH][ARM][PR target/84826] Fix ICE in extract_insn, at recog.c:2304 on arm-linux-gnueabi
On 23/03/18 13:50, Kyrill Tkachov wrote: On 23/03/18 13:31, Sudakshina Das wrote: On 23/03/18 09:12, Kyrill Tkachov wrote: On 23/03/18 08:47, Christophe Lyon wrote: Hi Sudi, On 22 March 2018 at 18:26, Sudakshina Das wrote: Hi On 22/03/18 16:52, Kyrill Tkachov wrote: On 22/03/18 16:20, Sudakshina Das wrote: Hi Kyrill On 22/03/18 16:08, Kyrill Tkachov wrote: Hi Sudi, On 21/03/18 17:44, Sudakshina Das wrote: Hi The ICE in the bug report was happening because the macro USE_RETURN_INSN (FALSE) was returning different values at different points in the compilation. This was internally occurring because the function arm_compute_static_chain_stack_bytes () which was dependent on arm_r3_live_at_start_p () was giving a different value after the cond_exec instructions were created in ce3 causing the liveness of r3 to escape up to the start block. The function arm_compute_static_chain_stack_bytes () should really only compute the value once during epilogue/prologue stage. This pass introduces a new member 'static_chain_stack_bytes' to the target definition of the struct machine_function which gets calculated in expand_prologue and is the value that is returned by arm_compute_static_chain_stack_bytes () beyond that. Testing done: Bootstrapped and regtested on arm-none-linux-gnueabihf and added the reported test to the testsuite. Is this ok for trunk? Thanks for working on this. I agree with the approach, I have a couple of comments inline. Sudi ChangeLog entries: *** gcc/ChangeLog *** 2018-03-21 Sudakshina Das PR target/84826 * config/arm/arm.h (machine_function): Add static_chain_stack_bytes. * config/arm/arm.c (arm_compute_static_chain_stack_bytes): Avoid re-computing once computed. (arm_expand_prologue): Compute machine->static_chain_stack_bytes. (arm_init_machine_status): Initialize machine->static_chain_stack_bytes. *** gcc/testsuite/ChangeLog *** 2018-03-21 Sudakshina Das PR target/84826 * gcc.target/arm/pr84826.c: New test The new test fails on arm-none-linux-gnueabi --with-mode thumb --with-cpu cortex-a9 --with-fpu default Dejagnu flags: -march=armv5t Because: /gcc/testsuite/gcc.target/arm/pr84826.c: In function 'a': /gcc/testsuite/gcc.target/arm/pr84826.c:15:1: sorry, unimplemented: -fstack-check=specific for Thumb-1 compiler exited with status 1 FAIL: gcc.target/arm/pr84826.c (test for excess errors) You probably have to add a require-effective-target to skip the test in such cases. Yeah, these tests need a { dg-require-effective-target supports_stack_clash_protection } A patch to add that is pre-approved. Sorry for missing it in review. Kyrill Hi Christophe and Kyrill How about the attached patch? { dg-require-effective-target supports_stack_clash_protection } is not enabled for any of ARM targets, so this is my work around for that. There is a comment in target-supports.exp which makes me a little hesitant in tinkering with the require effective target code. proc check_effective_target_supports_stack_clash_protection { } { # Temporary until the target bits are fully ACK'd. # if { [istarget aarch*-*-*] } { # return 1 # } if { [istarget x86_64-*-*] || [istarget i?86-*-*] || [istarget powerpc*-*-*] || [istarget rs6000*-*-*] || [istarget s390*-*-*] } { return 1 } return 0 } I have opened a new PR 85005 which mentions this that is meant for GCC 9 as part for a bigger cleanup and redesign of the stack clash protection code on ARM backend. Ok. Thanks for doing this. Thanks and sorry about this! Committed as r258805. Sudi Kyrill *** gcc/testsuite/ChangeLog *** 2018-03-23 Sudakshina Das PR target/84826 * gcc.target/arm/pr84826.c: Add dg directive. Thanks Sudi Thanks, Christophe diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index bbf3937..2809112 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -1384,6 +1384,9 @@ typedef struct GTY(()) machine_function machine_mode thumb1_cc_mode; /* Set to 1 after arm_reorg has started. */ int after_arm_reorg; + /* The number of bytes used to store the static chain register on the + stack, above the stack frame. */ + int static_chain_stack_bytes; } machine_function; #endif diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index cb6ab81..bc31810 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -19392,6 +19392,11 @@ arm_r3_live_at_start_p (void) static int arm_compute_static_chain_stack_bytes (void) { + /* Once the value is updated from the init value of -1, do not + re-compute. */ + if (cfun->machine->static_chain_stack_bytes != -1) + return cfun->machine->static_chain_stack_bytes; + My concern is that this approach caches the first value that is computed for static_chain_stack_bytes. I believe the
[PATCH][AARCH64][PR target/84882] Add mno-strict-align
Hi This patch adds the no variant to -mstrict-align and the corresponding function attribute. To enable the function attribute, I have modified aarch64_can_inline_p () to allow checks even when the callee function has no attribute. The need for this is shown by the new test target_attr_18.c. Testing: Bootstrapped, regtested and added new tests that are copies of earlier tests checking -mstrict-align with opposite scan directives. Is this ok for trunk? Sudi *** gcc/ChangeLog *** 2018-03-27 Sudakshina Das * common/config/aarch64/aarch64-common.c (aarch64_handle_option): Check val before adding MASK_STRICT_ALIGN to opts->x_target_flags. * config/aarch64/aarch64.opt (mstrict-align): Remove RejectNegative. * config/aarch64/aarch64.c (aarch64_attributes): Mark allow_neg as true for strict-align. (aarch64_can_inline_p): Perform checks even when callee has no attributes to check for strict alignment. * doc/extend.texi (AArch64 Function Attributes): Document no-strict-align. * doc/invoke.texi: (AArch64 Options): Likewise. *** gcc/testsuite/ChangeLog *** 2018-03-27 Sudakshina Das * gcc.target/aarch64/pr84882.c: New test. * gcc.target/aarch64/target_attr_18.c: Likewise. diff --git a/gcc/common/config/aarch64/aarch64-common.c b/gcc/common/config/aarch64/aarch64-common.c index 7fd9305..d5655a0 100644 --- a/gcc/common/config/aarch64/aarch64-common.c +++ b/gcc/common/config/aarch64/aarch64-common.c @@ -97,7 +97,10 @@ aarch64_handle_option (struct gcc_options *opts, return true; case OPT_mstrict_align: - opts->x_target_flags |= MASK_STRICT_ALIGN; + if (val) + opts->x_target_flags |= MASK_STRICT_ALIGN; + else + opts->x_target_flags &= ~MASK_STRICT_ALIGN; return true; case OPT_momit_leaf_frame_pointer: diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 4b5183b..4f35a6c 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -11277,7 +11277,7 @@ static const struct aarch64_attribute_info aarch64_attributes[] = { "fix-cortex-a53-843419", aarch64_attr_bool, true, NULL, OPT_mfix_cortex_a53_843419 }, { "cmodel", aarch64_attr_enum, false, NULL, OPT_mcmodel_ }, - { "strict-align", aarch64_attr_mask, false, NULL, OPT_mstrict_align }, + { "strict-align", aarch64_attr_mask, true, NULL, OPT_mstrict_align }, { "omit-leaf-frame-pointer", aarch64_attr_bool, true, NULL, OPT_momit_leaf_frame_pointer }, { "tls-dialect", aarch64_attr_enum, false, NULL, OPT_mtls_dialect_ }, @@ -11640,16 +11640,13 @@ aarch64_can_inline_p (tree caller, tree callee) tree caller_tree = DECL_FUNCTION_SPECIFIC_TARGET (caller); tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee); - /* If callee has no option attributes, then it is ok to inline. */ - if (!callee_tree) -return true; - struct cl_target_option *caller_opts = TREE_TARGET_OPTION (caller_tree ? caller_tree : target_option_default_node); - struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree); - + struct cl_target_option *callee_opts + = TREE_TARGET_OPTION (callee_tree ? callee_tree + : target_option_default_node); /* Callee's ISA flags should be a subset of the caller's. */ if ((caller_opts->x_aarch64_isa_flags & callee_opts->x_aarch64_isa_flags) diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt index 52eaf8c..1426b45 100644 --- a/gcc/config/aarch64/aarch64.opt +++ b/gcc/config/aarch64/aarch64.opt @@ -85,7 +85,7 @@ Target RejectNegative Joined Enum(cmodel) Var(aarch64_cmodel_var) Init(AARCH64_C Specify the code model. mstrict-align -Target Report RejectNegative Mask(STRICT_ALIGN) Save +Target Report Mask(STRICT_ALIGN) Save Don't assume that unaligned accesses are handled by the system. momit-leaf-frame-pointer diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 93a0ebc..dcda216 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -3605,8 +3605,10 @@ for the command line option @option{-mcmodel=}. @item strict-align @cindex @code{strict-align} function attribute, AArch64 Indicates that the compiler should not assume that unaligned memory references -are handled by the system. The behavior is the same as for the command-line -option @option{-mstrict-align}. +are handled by the system. To allow the compiler to assume that aligned memory +references are handled by the system, the inverse attribute +@code{no-strict-align} can be specified. The behavior is the same as for the +command-line option @option{-mstrict-align} and @option{-mno-strict-align}. @item omit-leaf-frame-pointer @cindex @code{omit-leaf-frame-pointer} function attribute, AArch64 diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index feacd56..0574d21 100644 --- a/gcc
Re: [PATCH][ARM][PR82989] Fix unexpected use of NEON instructions for shifts
On 21/03/18 11:40, Sudakshina Das wrote: Hi On 21/03/18 08:51, Christophe Lyon wrote: On 20 March 2018 at 11:58, Sudakshina Das wrote: Hi On 20/03/18 10:03, Richard Earnshaw (lists) wrote: On 14/03/18 10:11, Sudakshina Das wrote: Hi This patch fixes PR82989 so that we avoid NEON instructions when -mneon-for-64bits is not enabled. This is more of a short term fix for the real deeper problem of making and early decision of choosing or rejecting NEON instructions. There is now a new ticket PR84467 to deal with the longer term solution. (Please refer to the discussion in the bug report for more details). Testing: Bootstrapped and regtested on arm-none-linux-gnueabihf and added a new test case based on the test given on the bug report. Ok for trunk and backports for gcc-7 and gcc-6 branches? OK for trunk. Please leave it a couple of days before backporting to ensure that the testcase doesn't tickle any multilib issues. R. Thanks. Committed to trunk as r258677. Will wait a week for backporting. Backported both the commits of trunks to gcc-7 as r258883 and to gcc-6 as r258884 (Reg-tested for both) Thanks Sudi Sudi Hi Sudi, I've noticed that: FAIL: gcc.target/arm/pr82989.c scan-assembler-times lsl\\tr[0-9]+, r[0-9]+, r[0-9] 2 FAIL: gcc.target/arm/pr82989.c scan-assembler-times lsr\\tr[0-9]+, r[0-9]+, r[0-9] 2 on target armeb-none-linux-gnueabihf --with-mode thumb --with-cpu cortex-a9 --with-fpu neon-fp16 The tests pass when using --with-mode arm Can you check? Yes I see this as well. Sorry about this. I am testing a quick fix for this at the moment. Thanks Sudi Thanks Christophe Sudi *** gcc/ChangeLog *** 2018-03-14 Sudakshina Das * config/arm/neon.md (ashldi3_neon): Update ?s for constraints to favor GPR over NEON registers. (di3_neon): Likewise. *** gcc/testsuite/ChangeLog *** 2018-03-14 Sudakshina Das * gcc.target/arm/pr82989.c: New test. pr82989.diff diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index 6a6f5d7..1646b21 100644 --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -1180,12 +1180,12 @@ ) (define_insn_and_split "ashldi3_neon" - [(set (match_operand:DI 0 "s_register_operand" "= w, w,?&r,?r,?&r, ?w,w") - (ashift:DI (match_operand:DI 1 "s_register_operand" " 0w, w, 0r, 0, r, 0w,w") - (match_operand:SI 2 "general_operand" "rUm, i, r, i, i,rUm,i"))) - (clobber (match_scratch:SI 3 "= X, X,?&r, X, X, X,X")) - (clobber (match_scratch:SI 4 "= X, X,?&r, X, X, X,X")) - (clobber (match_scratch:DI 5 "=&w, X, X, X, X, &w,X")) + [(set (match_operand:DI 0 "s_register_operand" "= w, w, &r, r, &r, ?w,?w") + (ashift:DI (match_operand:DI 1 "s_register_operand" " 0w, w, 0r, 0, r, 0w, w") + (match_operand:SI 2 "general_operand" "rUm, i, r, i, i,rUm, i"))) + (clobber (match_scratch:SI 3 "= X, X, &r, X, X, X, X")) + (clobber (match_scratch:SI 4 "= X, X, &r, X, X, X, X")) + (clobber (match_scratch:DI 5 "=&w, X, X, X, X, &w, X")) (clobber (reg:CC_C CC_REGNUM))] "TARGET_NEON" "#" @@ -1276,7 +1276,7 @@ ;; ashrdi3_neon ;; lshrdi3_neon (define_insn_and_split "di3_neon" - [(set (match_operand:DI 0 "s_register_operand" "= w, w,?&r,?r,?&r,?w,?w") + [(set (match_operand:DI 0 "s_register_operand" "= w, w, &r, r, &r,?w,?w") (RSHIFTS:DI (match_operand:DI 1 "s_register_operand" " 0w, w, 0r, 0, r,0w, w") (match_operand:SI 2 "reg_or_int_operand" " r, i, r, i, i, r, i"))) (clobber (match_scratch:SI 3 "=2r, X, &r, X, X,2r, X")) diff --git a/gcc/testsuite/gcc.target/arm/pr82989.c b/gcc/testsuite/gcc.target/arm/pr82989.c new file mode 100644 index 000..1295ee6 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/pr82989.c @@ -0,0 +1,38 @@ +/* PR target/82989 */ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_neon_ok } */ +/* { dg-skip-if "avoid conflicts with multilib options" { *-*-* } { "-mcpu=*" } { "-mcpu=cortex-a8" } } */ +/* { dg-skip-if "avoid conflicts with multilib options" { *-*-* } { "-mfpu=*" } { "-mfpu=neon" } } */ +/* { dg-skip-if "avoid conflicts with multilib options" { *-*-* } { "-mfloat-abi=*"
Re: [PATCH PR81647][AARCH64] Fix handling of Unordered Comparisons in aarch64-simd.md
Hi On 20/03/18 10:57, Sudakshina Das wrote: Hi On 20/03/18 08:13, Christophe Lyon wrote: On 19 March 2018 at 19:55, Sudakshina Das wrote: Hi On 19/03/18 14:29, James Greenhalgh wrote: On Fri, Dec 15, 2017 at 11:57:46AM +, Sudi Das wrote: Hi This patch fixes the inconsistent behavior observed at -O3 for the unordered comparisons. According to the online docs (https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gccint/Unary-and-Binary-Expressions.html), all of the following should not raise an FP exception: - UNGE_EXPR - UNGT_EXPR - UNLE_EXPR - UNLT_EXPR - UNEQ_EXPR Also ORDERED_EXPR and UNORDERED_EXPR should only return zero or one. The aarch64-simd.md handling of these were generating exception raising instructions such as fcmgt. This patch changes the instructions that are emitted to in order to not give out the exceptions. We first check each operand for NaNs and force any elements containing NaN to zero before using them in the compare. Example: UN (a, b) -> UNORDERED (a, b) | (cm (isnan (a) ? 0.0 : a, isnan (b) ? 0.0 : b)) The ORDERED_EXPR is now handled as (cmeq (a, a) & cmeq (b, b)) and UNORDERED_EXPR as ~ORDERED_EXPR and UNEQ as (~ORDERED_EXPR | cmeq (a,b)). Testing done: Checked for regressions on bootstrapped aarch64-none-linux-gnu and added a new test case. Is this ok for trunk? This will probably need a back-port to gcc-7-branch as well. OK. Let it soak on trunk for a while before the backport. Backported both r258653 and Christophe's r258672 to gcc-7-branch as r258917 (reg-tested). Needed non-functional edits because of the name change from vec_cmp to vec_cmp between gcc-7 and trunk. Thanks Sudi Thanks. Committed to trunk as r258653. Will wait a week before backport. Hi, As the test failed to compile on aarch64 bare-metal targets, I added /* { dg-require-effective-target fenv_exceptions } */ as obvious (r258672). 2018-03-20 Christophe Lyon PR target/81647 * gcc.target/aarch64/pr81647.c: Require fenv_exceptions. Index: testsuite/gcc.target/aarch64/pr81647.c === --- testsuite/gcc.target/aarch64/pr81647.c (revision 258671) +++ testsuite/gcc.target/aarch64/pr81647.c (revision 258672) @@ -1,5 +1,6 @@ /* { dg-do run } */ /* { dg-options "-O3 -fdump-tree-ssa" } */ +/* { dg-require-effective-target fenv_exceptions } */ #include Christophe Thanks for fixing this and apologies for missing it on the first place! Sudi Sudi Thanks, James ChangeLog Entries: *** gcc/ChangeLog *** 2017-12-15 Sudakshina Das PR target/81647 * config/aarch64/aarch64-simd.md (vec_cmp): Modify instructions for UNLT, UNLE, UNGT, UNGE, UNEQ, UNORDERED and ORDERED. *** gcc/testsuite/ChangeLog *** 2017-12-15 Sudakshina Das PR target/81647 * gcc.target/aarch64/pr81647.c: New. diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index f90f74fe7fd5990a97b9f4eb68f5735b7d4fb9aa..acff06c753b3e3aaa5775632929909afa4d3294b 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -2731,10 +2731,10 @@ break; } /* Fall through. */ - case UNGE: + case UNLT: std::swap (operands[2], operands[3]); /* Fall through. */ - case UNLE: + case UNGT: case GT: comparison = gen_aarch64_cmgt; break; @@ -2745,10 +2745,10 @@ break; } /* Fall through. */ - case UNGT: + case UNLE: std::swap (operands[2], operands[3]); /* Fall through. */ - case UNLT: + case UNGE: case GE: comparison = gen_aarch64_cmge; break; @@ -2771,21 +2771,35 @@ case UNGT: case UNLE: case UNLT: - case NE: - /* FCM returns false for lanes which are unordered, so if we use - the inverse of the comparison we actually want to emit, then - invert the result, we will end up with the correct result. - Note that a NE NaN and NaN NE b are true for all a, b. - - Our transformations are: - a UNGE b -> !(b GT a) - a UNGT b -> !(b GE a) - a UNLE b -> !(a GT b) - a UNLT b -> !(a GE b) - a NE b -> !(a EQ b) */ - gcc_assert (comparison != NULL); - emit_insn (comparison (operands[0], operands[2], operands[3])); - emit_insn (gen_one_cmpl2 (operands[0], operands[0])); + { + /* All of the above must not raise any FP exceptions. Thus we first + check each operand for NaNs and force any elements containing NaN to + zero before using them in the compare. + Example: UN (a, b) -> UNORDERED (a, b) | + (cm (isnan (a) ? 0.0 : a, + isnan (b) ? 0.0 : b)) + We use the following tr
[PATCH, GCC-7, GCC-6][ARM][PR target/84826] Backport Fix ICE in extract_insn, at recog.c:2304 on arm-linux-gnueabihf
Hi This patch is a request to backport r258777 and r258805 to gcc-7-branch and gcc-6-branch. The same ICE occurs in both the branches with -fstack-check. Thus the test case directive has been changed. The discussion on the patch that went into trunk is: https://gcc.gnu.org/ml/gcc-patches/2018-03/msg01120.html Testing : Regtested on both the branches with arm-none-linux-gnueabihf Is this ok for gcc-7 and gcc-6? Sudi ChangeLog entries: *** gcc/ChangeLog *** 2018-03-28 Sudakshina Das Backport from mainline 2018-03-22 Sudakshina Das PR target/84826 * config/arm/arm.h (machine_function): Add static_chain_stack_bytes. * config/arm/arm.c (arm_compute_static_chain_stack_bytes): Avoid re-computing once computed. (arm_expand_prologue): Compute machine->static_chain_stack_bytes. (arm_init_machine_status): Initialize machine->static_chain_stack_bytes. *** gcc/testsuite/ChangeLog *** 2018-03-28 Sudakshina Das * gcc.target/arm/pr84826.c: Change dg-option to -fstack-check. Backport from mainline 2018-03-23 Sudakshina Das PR target/84826 * gcc.target/arm/pr84826.c: Add dg directive. Backport from mainline 2018-03-22 Sudakshina Das PR target/84826 * gcc.target/arm/pr84826.c: New test. diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index 25953f5..68a6fa5 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -1420,6 +1420,9 @@ typedef struct GTY(()) machine_function machine_mode thumb1_cc_mode; /* Set to 1 after arm_reorg has started. */ int after_arm_reorg; + /* The number of bytes used to store the static chain register on the + stack, above the stack frame. */ + int static_chain_stack_bytes; } machine_function; #endif diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 6f7ca43..886bcfa 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -19097,6 +19097,11 @@ arm_r3_live_at_start_p (void) static int arm_compute_static_chain_stack_bytes (void) { + /* Once the value is updated from the init value of -1, do not + re-compute. */ + if (cfun->machine->static_chain_stack_bytes != -1) +return cfun->machine->static_chain_stack_bytes; + /* See the defining assertion in arm_expand_prologue. */ if (IS_NESTED (arm_current_func_type ()) && ((TARGET_APCS_FRAME && frame_pointer_needed && TARGET_ARM) @@ -21395,6 +21400,11 @@ arm_expand_prologue (void) emit_insn (gen_movsi (stack_pointer_rtx, r1)); } + /* Let's compute the static_chain_stack_bytes required and store it. Right + now the value must the -1 as stored by arm_init_machine_status (). */ + cfun->machine->static_chain_stack_bytes += arm_compute_static_chain_stack_bytes (); + /* The static chain register is the same as the IP register. If it is clobbered when creating the frame, we need to save and restore it. */ clobber_ip = IS_NESTED (func_type) @@ -24542,6 +24552,7 @@ arm_init_machine_status (void) #if ARM_FT_UNKNOWN != 0 machine->func_type = ARM_FT_UNKNOWN; #endif + machine->static_chain_stack_bytes = -1; return machine; } diff --git a/gcc/testsuite/gcc.target/arm/pr84826.c b/gcc/testsuite/gcc.target/arm/pr84826.c new file mode 100644 index 000..563ce51 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/pr84826.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_thumb2_ok } */ +/* { dg-options "-Ofast -fstack-check" } */ + +void d (void *); + +void a () +{ + int b; + void bar (int c) + { +if (__builtin_expect (c, 0)) + ++b; + } + d (bar); +}
Re: [PATCH, GCC-7, GCC-6][ARM][PR target/84826] Backport Fix ICE in extract_insn, at recog.c:2304 on arm-linux-gnueabihf
Hi Kyrill On 29/03/18 09:41, Kyrill Tkachov wrote: Hi Sudi, On 28/03/18 15:04, Sudakshina Das wrote: Hi This patch is a request to backport r258777 and r258805 to gcc-7-branch and gcc-6-branch. The same ICE occurs in both the branches with -fstack-check. Thus the test case directive has been changed. The discussion on the patch that went into trunk is: https://gcc.gnu.org/ml/gcc-patches/2018-03/msg01120.html Testing : Regtested on both the branches with arm-none-linux-gnueabihf Is this ok for gcc-7 and gcc-6? Ok. Thanks, Kyrill Thanks! Committed to gcc-7-branch as r258948 and gcc-6-branch as r258949. Sudi Sudi ChangeLog entries: *** gcc/ChangeLog *** 2018-03-28 Sudakshina Das Backport from mainline 2018-03-22 Sudakshina Das PR target/84826 * config/arm/arm.h (machine_function): Add static_chain_stack_bytes. * config/arm/arm.c (arm_compute_static_chain_stack_bytes): Avoid re-computing once computed. (arm_expand_prologue): Compute machine->static_chain_stack_bytes. (arm_init_machine_status): Initialize machine->static_chain_stack_bytes. *** gcc/testsuite/ChangeLog *** 2018-03-28 Sudakshina Das * gcc.target/arm/pr84826.c: Change dg-option to -fstack-check. Backport from mainline 2018-03-23 Sudakshina Das PR target/84826 * gcc.target/arm/pr84826.c: Add dg directive. Backport from mainline 2018-03-22 Sudakshina Das PR target/84826 * gcc.target/arm/pr84826.c: New test.
Re: [Aarch64] Fix conditional branches with target far away.
Hi Sameera On 29/03/18 11:44, Sameera Deshpande wrote: Hi Sudakshina, Thanks for pointing that out. Updated the conditions for attribute length to take care of boundary conditions for offset range. Please find attached the updated patch. I have tested it for gcc testsuite and the failing testcase. Ok for trunk? Thank you so much for fixing the length as well along with you patch. You mention a failing testcase? Maybe it would be helpful to add that to the patch for the gcc testsuite. Sudi On 22 March 2018 at 19:06, Sudakshina Das wrote: Hi Sameera On 22/03/18 02:07, Sameera Deshpande wrote: Hi Sudakshina, As per the ARMv8 ARM, for the offset range (-1048576 ,1048572), the far branch instruction offset is inclusive of both the offsets. Hence, I am using <=||=> and not <||>= as it was in previous implementation. I have to admit earlier I was only looking at the patch mechanically and found a difference with the previous implementation in offset comparison. After you pointed out, I looked up the ARMv8 ARM and I have a couple of doubts: 1. My understanding is that any offset in [-1048576 ,1048572] both inclusive qualifies as an 'in range' offset. However, the code for both attribute length and far_branch has been using [-1048576 ,1048572), that is, ( >= && < ). If the far_branch was incorrectly calculated, then maybe the length calculations with similar magic numbers should also be corrected? Of course, I am not an expert in this and maybe this was a conscience decision so I would ask Ramana to maybe clarify if he remembers. 2. Now to come back to your patch, if my understanding is correct, I think a far_branch would be anything outside of this range, that is, (offset < -1048576 || offset > 1048572), anything that can not be represented in the 21-bit range. Thanks Sudi On 16 March 2018 at 00:51, Sudakshina Das wrote: On 15/03/18 15:27, Sameera Deshpande wrote: Ping! On 28 February 2018 at 16:18, Sameera Deshpande wrote: On 27 February 2018 at 18:25, Ramana Radhakrishnan wrote: On Wed, Feb 14, 2018 at 8:30 AM, Sameera Deshpande wrote: Hi! Please find attached the patch to fix bug in branches with offsets over 1MiB. There has been an attempt to fix this issue in commit 050af05b9761f1979f11c151519e7244d5becd7c However, the far_branch attribute defined in above patch used insn_length - which computes incorrect offset. Hence, eliminated the attribute completely, and computed the offset from insn_addresses instead. Ok for trunk? gcc/Changelog 2018-02-13 Sameera Deshpande * config/aarch64/aarch64.md (far_branch): Remove attribute. Eliminate all the dependencies on the attribute from RTL patterns. I'm not a maintainer but this looks good to me modulo notes about how this was tested. What would be nice is a testcase for the testsuite as well as ensuring that the patch has been bootstrapped and regression tested. AFAIR, the original patch was put in because match.pd failed when bootstrap in another context. regards Ramana -- - Thanks and regards, Sameera D. The patch is tested with GCC testsuite and bootstrapping successfully. Also tested for spec benchmark. I am not a maintainer either. I noticed that the range check you do for the offset has a (<= || >=). The "far_branch" however did (< || >=) for a positive value. Was that also part of the incorrect offset calculation? @@ -692,7 +675,11 @@ { if (get_attr_length (insn) =3D=3D 8) { - if (get_attr_far_branch (insn) =3D=3D 1) + long long int offset; + offset =3D INSN_ADDRESSES (INSN_UID (XEXP (operands[2], 0))) + - INSN_ADDRESSES (INSN_UID (insn)); + + if (offset <=3D -1048576 || offset >=3D 1048572) return aarch64_gen_far_branch (operands, 2, "Ltb", "\\t%0, %1, "); else @@ -709,12 +696,7 @@ (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -32768)) (lt (minus (match_dup 2) (pc)) (const_int 32764))) (const_int 4) - (const_int 8))) - (set (attr "far_branch") - (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576)) - (lt (minus (match_dup 2) (pc)) (const_int 1048572))) - (const_int 0) - (const_int 1)))] + (const_int 8)))] ) Thanks Sudi -- - Thanks and regards, Sameera D.
Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics
Hi Sameera On 10/04/18 11:20, Sameera Deshpande wrote: On 7 April 2018 at 01:25, Christophe Lyon wrote: Hi, 2018-04-06 12:15 GMT+02:00 Sameera Deshpande : Hi Christophe, Please find attached the updated patch with testcases. Ok for trunk? Thanks for the update. Since the new intrinsics are only available on aarch64, you want to prevent the tests from running on arm. Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two targets. There are several examples on how to do that in that directory. I have also noticed that the tests fail at execution on aarch64_be. I didn't look at the patch in details. Christophe - Thanks and regards, Sameera D. 2017-12-14 22:17 GMT+05:30 Christophe Lyon : 2017-12-14 9:29 GMT+01:00 Sameera Deshpande : Hi! Please find attached the patch implementing vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics as defined by Neon document. Ok for trunk? - Thanks and regards, Sameera D. gcc/Changelog: 2017-11-14 Sameera Deshpande * config/aarch64/aarch64-simd-builtins.def (ld1x3): New. (st1x2): Likewise. (st1x3): Likewise. * config/aarch64/aarch64-simd.md (aarch64_ld1x3): New pattern. (aarch64_ld1_x3_): Likewise (aarch64_st1x2): Likewise (aarch64_st1_x2_): Likewise (aarch64_st1x3): Likewise (aarch64_st1_x3_): Likewise * config/aarch64/arm_neon.h (vld1_u8_x3): New function. (vld1_s8_x3): Likewise. (vld1_u16_x3): Likewise. (vld1_s16_x3): Likewise. (vld1_u32_x3): Likewise. (vld1_s32_x3): Likewise. (vld1_u64_x3): Likewise. (vld1_s64_x3): Likewise. (vld1_fp16_x3): Likewise. (vld1_f32_x3): Likewise. (vld1_f64_x3): Likewise. (vld1_p8_x3): Likewise. (vld1_p16_x3): Likewise. (vld1_p64_x3): Likewise. (vld1q_u8_x3): Likewise. (vld1q_s8_x3): Likewise. (vld1q_u16_x3): Likewise. (vld1q_s16_x3): Likewise. (vld1q_u32_x3): Likewise. (vld1q_s32_x3): Likewise. (vld1q_u64_x3): Likewise. (vld1q_s64_x3): Likewise. (vld1q_f16_x3): Likewise. (vld1q_f32_x3): Likewise. (vld1q_f64_x3): Likewise. (vld1q_p8_x3): Likewise. (vld1q_p16_x3): Likewise. (vld1q_p64_x3): Likewise. (vst1_s64_x2): Likewise. (vst1_u64_x2): Likewise. (vst1_f64_x2): Likewise.patchurl=http://people.linaro.org/~christophe.lyon/armv8_2-fp16-scalar-2.patch3 patchname=armv8_2-fp16-scalar-2.patch3 refrev=259064 email_to=christophe.l...@linaro.org (vst1_s8_x2): Likewise. (vst1_p8_x2): Likewise. (vst1_s16_x2): Likewise. (vst1_p16_x2): Likewise. (vst1_s32_x2): Likewise. (vst1_u8_x2): Likewise. (vst1_u16_x2): Likewise. (vst1_u32_x2): Likewise. (vst1_f16_x2): Likewise. (vst1_f32_x2): Likewise. (vst1_p64_x2): Likewise. (vst1q_s8_x2): Likewise. (vst1q_p8_x2): Likewise. (vst1q_s16_x2): Likewise. (vst1q_p16_x2): Likewise. (vst1q_s32_x2): Likewise. (vst1q_s64_x2): Likewise. (vst1q_u8_x2): Likewise. (vst1q_u16_x2): Likewise. (vst1q_u32_x2): Likewise. (vst1q_u64_x2): Likewise. (vst1q_f16_x2): Likewise. (vst1q_f32_x2): Likewise. (vst1q_f64_x2): Likewise. (vst1q_p64_x2): Likewise. (vst1_s64_x3): Likewise. (vst1_u64_x3): Likewise. (vst1_f64_x3): Likewise. (vst1_s8_x3): Likewise. (vst1_p8_x3): Likewise. (vst1_s16_x3): Likewise. (vst1_p16_x3): Likewise. (vst1_s32_x3): Likewise. (vst1_u8_x3): Likewise. (vst1_u16_x3): Likewise. (vst1_u32_x3): Likewise. (vst1_f16_x3): Likewise. (vst1_f32_x3): Likewise. (vst1_p64_x3): Likewise. (vst1q_s8_x3): Likewise. (vst1q_p8_x3): Likewise. (vst1q_s16_x3): Likewise. (vst1q_p16_x3): Likewise. (vst1q_s32_x3): Likewise. (vst1q_s64_x3): Likewise. (vst1q_u8_x3): Likewise. (vst1q_u16_x3): Likewise. (vst1q_u32_x3): Likewise. (vst1q_u64_x3): Likewise. (vst1q_f16_x3): Likewise. (vst1q_f32_x3): Likewise. (vst1q_f64_x3): Likewise. (vst1q_p64_x3): Likewise. Hi, I'm not a maintainer, but I suspect you should add some tests. Christophe -- - Thanks and regards, Sameera D. Hi Christophe, Please find attached the updated patch. Similar to the testcase vld1x2.c, I have updated the testcases to mark them XFAIL for ARM, as the intrinsics are not implemented yet. I have also added required target to be little endian. I am not a maintainer either. Shouldn't these intrinsics be supported even for big endian? From your patch: diff --git a/gcc/testsuite/gcc.target/aarch64/a
Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics
Hi Sameera On 11/04/18 09:04, Sameera Deshpande wrote: On 10 April 2018 at 20:07, Sudakshina Das wrote: Hi Sameera On 10/04/18 11:20, Sameera Deshpande wrote: On 7 April 2018 at 01:25, Christophe Lyon wrote: Hi, 2018-04-06 12:15 GMT+02:00 Sameera Deshpande : Hi Christophe, Please find attached the updated patch with testcases. Ok for trunk? Thanks for the update. Since the new intrinsics are only available on aarch64, you want to prevent the tests from running on arm. Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two targets. There are several examples on how to do that in that directory. I have also noticed that the tests fail at execution on aarch64_be. I didn't look at the patch in details. Christophe - Thanks and regards, Sameera D. 2017-12-14 22:17 GMT+05:30 Christophe Lyon : 2017-12-14 9:29 GMT+01:00 Sameera Deshpande : Hi! Please find attached the patch implementing vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics as defined by Neon document. Ok for trunk? - Thanks and regards, Sameera D. gcc/Changelog: 2017-11-14 Sameera Deshpande * config/aarch64/aarch64-simd-builtins.def (ld1x3): New. (st1x2): Likewise. (st1x3): Likewise. * config/aarch64/aarch64-simd.md (aarch64_ld1x3): New pattern. (aarch64_ld1_x3_): Likewise (aarch64_st1x2): Likewise (aarch64_st1_x2_): Likewise (aarch64_st1x3): Likewise (aarch64_st1_x3_): Likewise * config/aarch64/arm_neon.h (vld1_u8_x3): New function. (vld1_s8_x3): Likewise. (vld1_u16_x3): Likewise. (vld1_s16_x3): Likewise. (vld1_u32_x3): Likewise. (vld1_s32_x3): Likewise. (vld1_u64_x3): Likewise. (vld1_s64_x3): Likewise. (vld1_fp16_x3): Likewise. (vld1_f32_x3): Likewise. (vld1_f64_x3): Likewise. (vld1_p8_x3): Likewise. (vld1_p16_x3): Likewise. (vld1_p64_x3): Likewise. (vld1q_u8_x3): Likewise. (vld1q_s8_x3): Likewise. (vld1q_u16_x3): Likewise. (vld1q_s16_x3): Likewise. (vld1q_u32_x3): Likewise. (vld1q_s32_x3): Likewise. (vld1q_u64_x3): Likewise. (vld1q_s64_x3): Likewise. (vld1q_f16_x3): Likewise. (vld1q_f32_x3): Likewise. (vld1q_f64_x3): Likewise. (vld1q_p8_x3): Likewise. (vld1q_p16_x3): Likewise. (vld1q_p64_x3): Likewise. (vst1_s64_x2): Likewise. (vst1_u64_x2): Likewise. (vst1_f64_x2): Likewise.patchurl=http://people.linaro.org/~christophe.lyon/armv8_2-fp16-scalar-2.patch3 patchname=armv8_2-fp16-scalar-2.patch3 refrev=259064 email_to=christophe.l...@linaro.org (vst1_s8_x2): Likewise. (vst1_p8_x2): Likewise. (vst1_s16_x2): Likewise. (vst1_p16_x2): Likewise. (vst1_s32_x2): Likewise. (vst1_u8_x2): Likewise. (vst1_u16_x2): Likewise. (vst1_u32_x2): Likewise. (vst1_f16_x2): Likewise. (vst1_f32_x2): Likewise. (vst1_p64_x2): Likewise. (vst1q_s8_x2): Likewise. (vst1q_p8_x2): Likewise. (vst1q_s16_x2): Likewise. (vst1q_p16_x2): Likewise. (vst1q_s32_x2): Likewise. (vst1q_s64_x2): Likewise. (vst1q_u8_x2): Likewise. (vst1q_u16_x2): Likewise. (vst1q_u32_x2): Likewise. (vst1q_u64_x2): Likewise. (vst1q_f16_x2): Likewise. (vst1q_f32_x2): Likewise. (vst1q_f64_x2): Likewise. (vst1q_p64_x2): Likewise. (vst1_s64_x3): Likewise. (vst1_u64_x3): Likewise. (vst1_f64_x3): Likewise. (vst1_s8_x3): Likewise. (vst1_p8_x3): Likewise. (vst1_s16_x3): Likewise. (vst1_p16_x3): Likewise. (vst1_s32_x3): Likewise. (vst1_u8_x3): Likewise. (vst1_u16_x3): Likewise. (vst1_u32_x3): Likewise. (vst1_f16_x3): Likewise. (vst1_f32_x3): Likewise. (vst1_p64_x3): Likewise. (vst1q_s8_x3): Likewise. (vst1q_p8_x3): Likewise. (vst1q_s16_x3): Likewise. (vst1q_p16_x3): Likewise. (vst1q_s32_x3): Likewise. (vst1q_s64_x3): Likewise. (vst1q_u8_x3): Likewise. (vst1q_u16_x3): Likewise. (vst1q_u32_x3): Likewise. (vst1q_u64_x3): Likewise. (vst1q_f16_x3): Likewise. (vst1q_f32_x3): Likewise. (vst1q_f64_x3): Likewise. (vst1q_p64_x3): Likewise. Hi, I'm not a maintainer, but I suspect you should add some tests. Christophe -- - Thanks and regards, Sameera D. Hi Christophe, Please find attached the updated patch. Similar to the testcase vld1x2.c, I have updated the testcases to mark them XFAIL for ARM, as the intrinsics are not imple
Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics
Hi Sameera On 11/04/18 13:05, Sameera Deshpande wrote: On 11 April 2018 at 15:53, Sudakshina Das wrote: Hi Sameera On 11/04/18 09:04, Sameera Deshpande wrote: On 10 April 2018 at 20:07, Sudakshina Das wrote: Hi Sameera On 10/04/18 11:20, Sameera Deshpande wrote: On 7 April 2018 at 01:25, Christophe Lyon wrote: Hi, 2018-04-06 12:15 GMT+02:00 Sameera Deshpande : Hi Christophe, Please find attached the updated patch with testcases. Ok for trunk? Thanks for the update. Since the new intrinsics are only available on aarch64, you want to prevent the tests from running on arm. Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two targets. There are several examples on how to do that in that directory. I have also noticed that the tests fail at execution on aarch64_be. I didn't look at the patch in details. Christophe - Thanks and regards, Sameera D. 2017-12-14 22:17 GMT+05:30 Christophe Lyon : 2017-12-14 9:29 GMT+01:00 Sameera Deshpande : Hi! Please find attached the patch implementing vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics as defined by Neon document. Ok for trunk? - Thanks and regards, Sameera D. gcc/Changelog: 2017-11-14 Sameera Deshpande * config/aarch64/aarch64-simd-builtins.def (ld1x3): New. (st1x2): Likewise. (st1x3): Likewise. * config/aarch64/aarch64-simd.md (aarch64_ld1x3): New pattern. (aarch64_ld1_x3_): Likewise (aarch64_st1x2): Likewise (aarch64_st1_x2_): Likewise (aarch64_st1x3): Likewise (aarch64_st1_x3_): Likewise * config/aarch64/arm_neon.h (vld1_u8_x3): New function. (vld1_s8_x3): Likewise. (vld1_u16_x3): Likewise. (vld1_s16_x3): Likewise. (vld1_u32_x3): Likewise. (vld1_s32_x3): Likewise. (vld1_u64_x3): Likewise. (vld1_s64_x3): Likewise. (vld1_fp16_x3): Likewise. (vld1_f32_x3): Likewise. (vld1_f64_x3): Likewise. (vld1_p8_x3): Likewise. (vld1_p16_x3): Likewise. (vld1_p64_x3): Likewise. (vld1q_u8_x3): Likewise. (vld1q_s8_x3): Likewise. (vld1q_u16_x3): Likewise. (vld1q_s16_x3): Likewise. (vld1q_u32_x3): Likewise. (vld1q_s32_x3): Likewise. (vld1q_u64_x3): Likewise. (vld1q_s64_x3): Likewise. (vld1q_f16_x3): Likewise. (vld1q_f32_x3): Likewise. (vld1q_f64_x3): Likewise. (vld1q_p8_x3): Likewise. (vld1q_p16_x3): Likewise. (vld1q_p64_x3): Likewise. (vst1_s64_x2): Likewise. (vst1_u64_x2): Likewise. (vst1_f64_x2): Likewise.patchurl=http://people.linaro.org/~christophe.lyon/armv8_2-fp16-scalar-2.patch3 patchname=armv8_2-fp16-scalar-2.patch3 refrev=259064 email_to=christophe.l...@linaro.org (vst1_s8_x2): Likewise. (vst1_p8_x2): Likewise. (vst1_s16_x2): Likewise. (vst1_p16_x2): Likewise. (vst1_s32_x2): Likewise. (vst1_u8_x2): Likewise. (vst1_u16_x2): Likewise. (vst1_u32_x2): Likewise. (vst1_f16_x2): Likewise. (vst1_f32_x2): Likewise. (vst1_p64_x2): Likewise. (vst1q_s8_x2): Likewise. (vst1q_p8_x2): Likewise. (vst1q_s16_x2): Likewise. (vst1q_p16_x2): Likewise. (vst1q_s32_x2): Likewise. (vst1q_s64_x2): Likewise. (vst1q_u8_x2): Likewise. (vst1q_u16_x2): Likewise. (vst1q_u32_x2): Likewise. (vst1q_u64_x2): Likewise. (vst1q_f16_x2): Likewise. (vst1q_f32_x2): Likewise. (vst1q_f64_x2): Likewise. (vst1q_p64_x2): Likewise. (vst1_s64_x3): Likewise. (vst1_u64_x3): Likewise. (vst1_f64_x3): Likewise. (vst1_s8_x3): Likewise. (vst1_p8_x3): Likewise. (vst1_s16_x3): Likewise. (vst1_p16_x3): Likewise. (vst1_s32_x3): Likewise. (vst1_u8_x3): Likewise. (vst1_u16_x3): Likewise. (vst1_u32_x3): Likewise. (vst1_f16_x3): Likewise. (vst1_f32_x3): Likewise. (vst1_p64_x3): Likewise. (vst1q_s8_x3): Likewise. (vst1q_p8_x3): Likewise. (vst1q_s16_x3): Likewise. (vst1q_p16_x3): Likewise. (vst1q_s32_x3): Likewise. (vst1q_s64_x3): Likewise. (vst1q_u8_x3): Likewise. (vst1q_u16_x3): Likewise. (vst1q_u32_x3): Likewise. (vst1q_u64_x3): Likewise. (vst1q_f16_x3): Likewise. (vst1q_f32_x3): Likewise. (vst1q_f64_x3): Likewise. (vst1q_p64_x3): Likewise. Hi, I'm not a maintainer, but I suspect you should add some tests. Christophe -- -
Re: [PATCH][AARCH64][PR target/84882] Add mno-strict-align
Ping! On 27/03/18 13:58, Sudakshina Das wrote: Hi This patch adds the no variant to -mstrict-align and the corresponding function attribute. To enable the function attribute, I have modified aarch64_can_inline_p () to allow checks even when the callee function has no attribute. The need for this is shown by the new test target_attr_18.c. Testing: Bootstrapped, regtested and added new tests that are copies of earlier tests checking -mstrict-align with opposite scan directives. Is this ok for trunk? Sudi *** gcc/ChangeLog *** 2018-03-27 Sudakshina Das * common/config/aarch64/aarch64-common.c (aarch64_handle_option): Check val before adding MASK_STRICT_ALIGN to opts->x_target_flags. * config/aarch64/aarch64.opt (mstrict-align): Remove RejectNegative. * config/aarch64/aarch64.c (aarch64_attributes): Mark allow_neg as true for strict-align. (aarch64_can_inline_p): Perform checks even when callee has no attributes to check for strict alignment. * doc/extend.texi (AArch64 Function Attributes): Document no-strict-align. * doc/invoke.texi: (AArch64 Options): Likewise. *** gcc/testsuite/ChangeLog *** 2018-03-27 Sudakshina Das * gcc.target/aarch64/pr84882.c: New test. * gcc.target/aarch64/target_attr_18.c: Likewise.