[PATCH, ARM] PR85434: Prevent spilling of stack protector guard's address on ARM
In case of high register pressure in PIC mode, the address of the stack
protector's guard can be spilled on ARM targets, as shown in PR85434,
thus allowing an attacker to control what the canary is compared
against. ARM does lack stack_protect_set and stack_protect_test insn
patterns, but defining them does not help, as the address is expanded
regularly and the patterns only deal with the copy and test of the
guard with the canary.

This problem does not occur for x86 targets because the PIC access and
the test can be done in the same instruction. Aarch64 is exempt too
because its PIC access insn patterns are moves of an UNSPEC, which
prevents the second access in the epilogue from being CSEd in the
cse_local pass with the first access in the prologue.

The approach followed here is to create new "combined" set and test
standard pattern names that take the unexpanded guard and do the set or
test. This allows the target to use an opaque pattern (e.g. using UNSPEC)
to hide the individual instructions being generated from the compiler and
to split the pattern into generic load, compare and branch instructions
after register allocation, therefore avoiding any spilling. This is
implemented here for the ARM targets. For targets not implementing these
new standard pattern names, the existing stack_protect_set and
stack_protect_test pattern names are used.

To be able to split the PIC access after register allocation, the PIC
functions (require_pic_register and legitimize_pic_address) had to be
augmented to force a new PIC register load and to control which register
it loads into. This is because sharing the PIC register between prologue
and epilogue could again lead to spilling due to CSE, which an attacker
could use to control what the canary gets compared against.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2018-07-05  Thomas Preud'homme

PR target/85434
* target-insns.def (stack_protect_combined_set): Define new standard
pattern name.
(stack_protect_combined_test): Likewise.
* cfgexpand.c (stack_protect_prologue): Try new
stack_protect_combined_set pattern first.
* function.c (stack_protect_epilogue): Try new
stack_protect_combined_test pattern first.
* config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
parameters to control which register to use as PIC register and force
reloading PIC register respectively.
(legitimize_pic_address): Expose above new parameters in prototype and
adapt recursive calls accordingly.
(arm_legitimize_address): Adapt to new legitimize_pic_address
prototype.
(thumb_legitimize_address): Likewise.
(arm_emit_call_insn): Adapt to new require_pic_register prototype.
* config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
change.
* config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
prototype change.
(stack_protect_combined_set): New insn_and_split pattern.
(stack_protect_set): New insn pattern.
(stack_protect_combined_test): New insn_and_split pattern.
(stack_protect_test): New insn pattern.
* config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
(UNSPEC_SP_TEST): Likewise.
* doc/md.texi (stack_protect_combined_set): Document new standard
pattern name.
(stack_protect_set): Clarify that the operand for guard's address is
legal.
(stack_protect_combined_test): Document new standard pattern name.
(stack_protect_test): Clarify that the operand for guard's address is
legal.

*** gcc/testsuite/ChangeLog ***

2018-07-05  Thomas Preud'homme

PR target/85434
* gcc.target/arm/pr85434.c: New test.

Testing: Bootstrapped on ARM in both Arm and Thumb-2 mode as well as on
Aarch64. Testsuite shows no regression on these 3 variants, both with
default flags and with -fstack-protector-all.

Is this ok for trunk? If yes, would this be acceptable as a backport to
GCC 6, 7 and 8 provided that no regression is found?

Best regards,

Thomas

From d917d48c2005e46154383589f203d06f3c6167e0 Mon Sep 17 00:00:00 2001
From: Thomas Preud'homme
Date: Tue, 8 May 2018 15:47:05 +0100
Subject: [PATCH] PR85434: Prevent spilling of stack protector guard's address on ARM
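
The fallback order described in this message (new combined pattern first, then the existing stack_protect_set pattern) can be modelled with a small self-contained C sketch. This is not the code from the patch: `target_patterns`, `choose_guard_set_path` and the demo expanders are hypothetical names, a NULL entry stands for a pattern the target does not define, and the final fallback to a generic move is an assumption about the existing middle-end code rather than something stated in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: an expander returns true when it managed to emit
   its pattern.  A NULL pointer means the target lacks that pattern.  */
typedef bool (*expander_fn) (void);

struct target_patterns
{
  expander_fn combined_set;   /* models stack_protect_combined_set */
  expander_fn plain_set;      /* models stack_protect_set */
};

enum guard_set_path
{
  PATH_COMBINED,      /* address computation and copy stay opaque */
  PATH_PLAIN,         /* guard address already expanded as regular RTL */
  PATH_GENERIC_MOVE   /* assumed last resort: ordinary move */
};

/* Demo expanders, used only to exercise the dispatch below.  */
bool expand_ok (void) { return true; }
bool expand_fail (void) { return false; }

/* Try the combined pattern first, then the existing pattern, then fall
   back to a generic move.  */
enum guard_set_path
choose_guard_set_path (const struct target_patterns *t)
{
  assert (t != NULL);
  if (t->combined_set != NULL && t->combined_set ())
    return PATH_COMBINED;
  if (t->plain_set != NULL && t->plain_set ())
    return PATH_PLAIN;
  return PATH_GENERIC_MOVE;
}
```

In this model, a target that defines the combined pattern never reaches the later paths, which is the property the patch relies on to keep the guard's address out of ordinary pseudos.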
Re: [PATCH, ARM] PR85434: Prevent spilling of stack protector guard's address on ARM
Adding Jeff and Eric since the patch adds an RTL target hook.

Best regards,

Thomas
Re: [PATCH, ARM] PR85434: Prevent spilling of stack protector guard's address on ARM
Fixed in attached patch. ChangeLog entries are unchanged:

*** gcc/ChangeLog ***

2018-07-05  Thomas Preud'homme

PR target/85434
* target-insns.def (stack_protect_combined_set): Define new standard
pattern name.
(stack_protect_combined_test): Likewise.
* cfgexpand.c (stack_protect_prologue): Try new
stack_protect_combined_set pattern first.
* function.c (stack_protect_epilogue): Try new
stack_protect_combined_test pattern first.
* config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
parameters to control which register to use as PIC register and force
reloading PIC register respectively.
(legitimize_pic_address): Expose above new parameters in prototype and
adapt recursive calls accordingly.
(arm_legitimize_address): Adapt to new legitimize_pic_address
prototype.
(thumb_legitimize_address): Likewise.
(arm_emit_call_insn): Adapt to new require_pic_register prototype.
* config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
change.
* config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
prototype change.
(stack_protect_combined_set): New insn_and_split pattern.
(stack_protect_set): New insn pattern.
(stack_protect_combined_test): New insn_and_split pattern.
(stack_protect_test): New insn pattern.
* config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
(UNSPEC_SP_TEST): Likewise.
* doc/md.texi (stack_protect_combined_set): Document new standard
pattern name.
(stack_protect_set): Clarify that the operand for guard's address is
legal.
(stack_protect_combined_test): Document new standard pattern name.
(stack_protect_test): Clarify that the operand for guard's address is
legal.

*** gcc/testsuite/ChangeLog ***

2018-07-05  Thomas Preud'homme

PR target/85434
* gcc.target/arm/pr85434.c: New test.

Best regards,

Thomas
Re: [PATCH] Show valid options for -march and -mtune in --help=target for arm32 (PR driver/83193).
Hi Martin,

Why is this needed when -mfpu does not seem to need it for instance?

Regarding the patch:

> -print "Name(processor_type) Type(enum processor_type)"
> -print "Known ARM CPUs (for use with the -mcpu= and -mtune= options):\n"
> +print "Name(processor_type) Type(enum processor_type) ForceHelp"
> +print "Known ARM CPUs (for use with the -mtune= options):\n"

Why change the text beyond adding ForceHelp?

> +@item ForceHelp
> +This property is optional. If present, enum values is printed
> +in @option{--help} output.
> +

are printed

Thanks,

Thomas

On Wed, 18 Jul 2018 at 16:50, Martin Liška wrote:
>
> Hi.
>
> This introduces new ForceHelp option flag that helps to
> print valid option enum values that are not directly
> used as a type of an option.
>
> May I please ask ARM folks to test the patch?
> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> 2018-07-18  Martin Liska
>
> PR driver/83193
> * config/arm/arm-tables.opt: Add ForceHelp flag for
> processor_type and arch_name enum types.
> * config/arm/parsecpu.awk: Likewise.
> * doc/options.texi: Document new flag ForceHelp.
> * opt-read.awk: Parse ForceHelp and set it in construction.
> * optc-gen.awk: Likewise.
> * opts.c (print_filtered_help): Handle force_help option.
> * opts.h (struct cl_enum): New field force_help.
> ---
>  gcc/config/arm/arm-tables.opt | 6 +++---
>  gcc/config/arm/parsecpu.awk   | 6 +++---
>  gcc/doc/options.texi          | 4 ++++
>  gcc/opt-read.awk              | 3 +++
>  gcc/optc-gen.awk              | 3 ++-
>  gcc/opts.c                    | 3 ++-
>  gcc/opts.h                    | 3 +++
>  7 files changed, 20 insertions(+), 8 deletions(-)
Re: [PATCH, ARM] PR85434: Prevent spilling of stack protector guard's address on ARM
[Dropping Jeff Law from the list since he already commented on the
middle end parts]

Hi Kyrill,

On Thu, 19 Jul 2018 at 12:02, Kyrill Tkachov wrote:
>
> Hi Thomas,
>
> On 17/07/18 12:02, Thomas Preudhomme wrote:
> > Fixed in attached patch. ChangeLog entries are unchanged:
> >
> > *** gcc/ChangeLog ***
> >
> > 2018-07-05  Thomas Preud'homme
> >
> > PR target/85434
> > * target-insns.def (stack_protect_combined_set): Define new standard
> > pattern name.
> > (stack_protect_combined_test): Likewise.
> > * cfgexpand.c (stack_protect_prologue): Try new
> > stack_protect_combined_set pattern first.
> > * function.c (stack_protect_epilogue): Try new
> > stack_protect_combined_test pattern first.
> > * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
> > parameters to control which register to use as PIC register and force
> > reloading PIC register respectively.
> > (legitimize_pic_address): Expose above new parameters in prototype and
> > adapt recursive calls accordingly.
> > (arm_legitimize_address): Adapt to new legitimize_pic_address
> > prototype.
> > (thumb_legitimize_address): Likewise.
> > (arm_emit_call_insn): Adapt to new require_pic_register prototype.
> > * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
> > change.
> > * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
> > prototype change.
> > (stack_protect_combined_set): New insn_and_split pattern.
> > (stack_protect_set): New insn pattern.
> > (stack_protect_combined_test): New insn_and_split pattern.
> > (stack_protect_test): New insn pattern.
> > * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
> > (UNSPEC_SP_TEST): Likewise.
> > * doc/md.texi (stack_protect_combined_set): Document new standard
> > pattern name.
> > (stack_protect_set): Clarify that the operand for guard's address is
> > legal.
> > (stack_protect_combined_test): Document new standard pattern name.
> > (stack_protect_test): Clarify that the operand for guard's address is
> > legal.
> >
> > *** gcc/testsuite/ChangeLog ***
> >
> > 2018-07-05  Thomas Preud'homme
> >
> > PR target/85434
> > * gcc.target/arm/pr85434.c: New test.
> >
>
> Sorry for the delay. Some comments inline.
>
> Kyrill
>
> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> index d6e3c382085..d1a893ac56e 100644
> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
> @@ -6105,8 +6105,18 @@ stack_protect_prologue (void)
>  {
>    tree guard_decl = targetm.stack_protect_guard ();
>    rtx x, y;
> +  struct expand_operand ops[2];
>
>    x = expand_normal (crtl->stack_protect_guard);
> +  create_fixed_operand (&ops[0], x);
> +  create_fixed_operand (&ops[1], DECL_RTL (guard_decl));
> +  /* Allow the target to compute address of Y and copy it to X without
> +     leaking Y into a register.  This combined address + copy pattern allows
> +     the target to prevent spilling of any intermediate results by splitting
> +     it after register allocator.  */
> +  if (maybe_expand_insn (targetm.code_for_stack_protect_combined_set, 2, ops))
> +    return;
> +
>    if (guard_decl)
>      y = expand_normal (guard_decl);
>    else
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index 8537262ce64..100844e659c 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -67,7 +67,7 @@ extern int const_ok_for_dimode_op (HOST_WIDE_INT, enum rtx_code);
>  extern int arm_split_constant (RTX_CODE, machine_mode, rtx,
>                                 HOST_WIDE_INT, rtx, rtx, int);
>  extern int legitimate_pic_operand_p (rtx);
> -extern rtx legitimize_pic_address (rtx, machine_mode, rtx);
> +extern rtx legitimize_pic_address (rtx, machine_mode, rtx, rtx, bool);
>  extern rtx legitimize_tls_address (rtx, rtx);
>  extern bool arm_legitimate_address_p (machine_mode, rtx, bool);
>  extern int arm_legitimate_address_outer_p (machine_mode, rtx, RTX_CODE, int);
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index ec3abbcba9f..f4a970580c2 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -7369,20 +7369,26 @@ legitimate_pic_operand_p (rtx x)
>  }
>
>  /* Record that the current function needs a PIC register.  Initialize
> -   cfun->machine->pic_reg if we have not already done so.  */
> +   cfun->machine->pic_reg if we have not already done
Re: [PATCH][GCC][Arm] Fix subreg crash in different way by enabling the FP16 pattern unconditionally.
Hi Tamar,

On Mon, 23 Jul 2018 at 17:56, Tamar Christina wrote:
>
> Hi All,
>
> My previous patch changed arm_can_change_mode_class to allow subregs of
> 64-bit registers on arm big-endian. However it seems that we can't do this
> because the data in 64-bit VFP registers is stored in little-endian order,
> even on big-endian.
>
> Allowing this change had a knock-on effect that caused GCC's no-op detection
> to think that loading from the first lane on arm big-endian is a no-op. This
> is because we can't describe the weird ordering we have on D registers on
> big-endian.
>
> The original issue comes from the fact that the code does
>
> ... foo (... bar)
> {
>   return bar;
> }
>
> The expansion of the return statement causes GCC to try to return the value in
> a register. GCC will try to emit the move then, from MEM to REG (due to the
> SSA temporary). It checks for a mov optab for this which isn't available and
> then tries to do the move in bits using emit_move_multi_word.
>
> emit_move_multi_word will split the move into sub parts, but then needs to get
> the sub parts and does this using subregs, but it's told it can't do subregs!
>
> The compiler is now stuck in an infinite loop.
>
> The way this is worked around in the back-end is that we have move patterns in
> neon.md that usually just force the register instead of checking with the
> back-end. This prevents emit_move_multi_word from being needed. However the
> patterns for V4HF and V8HF were guarded by TARGET_NEON && TARGET_FP16.
>
> I don't believe the TARGET_FP16 guard to be needed, because the pattern
> doesn't actually generate code and requires another pattern for that, and a
> reg to reg move should always be possible anyway. So allowing the force to
> register here is safe and it allows the compiler to generate a correct error
> instead of ICEing in an infinite loop.

How about subreg to subreg move? Doesn't that expand to more insns
(subreg to reg and reg to subreg)? Couldn't you improve the logic to
check that there is actually a mode change so that if there isn't
(like moving from one subreg to another) it just expands to a single
move?

Best regards,

Thomas

>
> This patch ensures gcc.target/arm/big-endian-subreg.c is fixed without
> introducing any regressions while fixing
>
> gcc.dg/vect/vect-nop-move.c execution test
> g++.dg/torture/vshuf-v2si.C -O3 -g execution test
> g++.dg/torture/vshuf-v4si.C -O3 -g execution test
> g++.dg/torture/vshuf-v8hi.C -O3 -g execution test
>
> Regtested on armeb-none-eabi and no regressions.
> Bootstrapped on arm-none-linux-gnueabihf and no issues.
>
>
> Ok for trunk?
>
> Thanks,
> Tamar
>
> gcc/
> 2018-07-23  Tamar Christina
>
> PR target/84711
> * config/arm/arm.c (arm_can_change_mode_class): Disallow subreg.
> * config/arm/neon.md (movv4hf, movv8hf): Refactored to..
> (mov): ..this and enable unconditionally.
>
> --
Re: [PATCH, ARM] PR85434: Prevent spilling of stack protector guard's address on ARM
Hi Kyrill,

Using memory_operand worked. The issues I encountered when using it in
earlier versions of the patch must have been due to the missing test on
address_operand in the preparation statements, which I added later.
Please find an updated patch in attachment. ChangeLog entry is as
follows:

*** gcc/ChangeLog ***

2018-07-05  Thomas Preud'homme

* target-insns.def (stack_protect_combined_set): Define new standard
pattern name.
(stack_protect_combined_test): Likewise.
* cfgexpand.c (stack_protect_prologue): Try new
stack_protect_combined_set pattern first.
* function.c (stack_protect_epilogue): Try new
stack_protect_combined_test pattern first.
* config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
parameters to control which register to use as PIC register and force
reloading PIC register respectively. Insert in the stream of insns if
possible.
(legitimize_pic_address): Expose above new parameters in prototype and
adapt recursive calls accordingly.
(arm_legitimize_address): Adapt to new legitimize_pic_address
prototype.
(thumb_legitimize_address): Likewise.
(arm_emit_call_insn): Adapt to new require_pic_register prototype.
* config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
change.
* config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
prototype change.
(stack_protect_combined_set): New insn_and_split pattern.
(stack_protect_set): New insn pattern.
(stack_protect_combined_test): New insn_and_split pattern.
(stack_protect_test): New insn pattern.
* config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
(UNSPEC_SP_TEST): Likewise.
* doc/md.texi (stack_protect_combined_set): Document new standard
pattern name.
(stack_protect_set): Clarify that the operand for guard's address is
legal.
(stack_protect_combined_test): Document new standard pattern name.
(stack_protect_test): Clarify that the operand for guard's address is
legal.

*** gcc/testsuite/ChangeLog ***

2018-07-05  Thomas Preud'homme

* gcc.target/arm/pr85434.c: New test.

Bootstrapped again for Arm and Thumb-2 and regtested with and without
-fstack-protector-all without any regression.

Best regards,

Thomas
Re: [PATCH][GCC][Arm] Fix subreg crash in different way by enabling the FP16 pattern unconditionally.
Hi Tamar,

On Wed, 25 Jul 2018 at 16:28, Tamar Christina wrote:
>
> Hi Thomas,
>
> Thanks for the review!
>
> > > I don't believe the TARGET_FP16 guard to be needed, because the
> > > pattern doesn't actually generate code and requires another pattern
> > > for that, and a reg to reg move should always be possible anyway. So
> > > allowing the force to register here is safe and it allows the compiler
> > > to generate a correct error instead of ICEing in an infinite loop.
> >
> > How about subreg to subreg move? Doesn't that expand to more insns
> > (subreg to reg and reg to subreg)? Couldn't you improve the logic to check
> > that there is actually a mode change so that if there isn't (like moving
> > from one subreg to another) just expand to a single move?
> >
>
> Yes, but that is not a new issue. My patch is simply removing the TARGET_FP16
> restrictions and merging two patterns that should be one using an iterator
> and nothing more.
>
> The redundant mov is already there and a different issue than the ICE I'm
> trying to fix.

It's there for movv4hf and movv8hf but your patch extends this problem
to movv2sf and movv4sf as well.

>
> None of the code inside the expander is needed at all, the code really only
> has an effect on subreg to subreg moves, as `force_reg` doesn't do anything
> when its argument is already a reg.
>
> The comment in the expander (which was already there) is wrong. The *reason*
> the ICE is fixed isn't because of the `force_reg`. It's because of the mere
> presence of the expander itself. The expander matches the standard mov$a
> optab and so this prevents emit_move_insn_1 from doing the move by subwords
> as it finds a pattern that's able to do the move.

Could you then fix the comment in your patch as well? I hadn't
understood the force_reg was not key here. You might want to update
the following sentence from your patch description if you are going to
include it in your commit message:

The way this is worked around in the back-end is that we have move
patterns in neon.md that usually just force the register instead of
checking with the back-end.

"The way this is worked around (..) that just force the register" is
what led me to believe the force_reg was important.

>
> The expander however always falls through and doesn't stop RTL generation.
> You could remove all the code in there and have it properly match the
> *neon_mov instructions which will do the right thing later at code generation
> time and avoid the redundant moves. My guess is the original `force_reg` was
> copied from the other patterns like `movti` and the existing `mov`. There it
> makes sense because the operands can be MEM or anything general_operand.
>
> However the redundant moves are a different problem than what I'm trying to
> solve here. So I think that's another patch which requires further testing.

I was just thinking of restricting when the force_reg happens, but if
it can be removed completely I agree it should probably be done in a
separate patch.

Oh by the way, is there something that prevents those expanders from
ever being used with a memory operand? Because the GCC internals manual
contains the following piece for the mov standard pattern (bold marks
added by me):

"Second, these patterns are not used solely in the RTL generation
pass. Even the reload pass can generate move insns to copy values from
stack slots into temporary registers. When it does so, one of the
operands is a hard register and the other is an operand that can need
to be reloaded into a register.

Therefore, when given such a pair of operands, the pattern must
generate RTL which needs no reloading and needs no temporary
registers -- no registers other than the operands. For example, if you
support the pattern with a define_expand, then in such a case the
define_expand *mustn't call force_reg* or any other such function
which might generate new pseudo registers."

Best regards,

Thomas
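
The constraint quoted from the internals manual can be illustrated with a small self-contained C model. It is only a sketch under stated assumptions: `mov_operands` and `should_force_source_to_reg` are hypothetical names, the booleans stand in for `REG_P` checks on the operands and for the result of `can_create_pseudo_p ()`, and the logic does not reproduce the actual neon.md expander discussed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the two operands of a mov expander.  */
struct mov_operands
{
  bool dest_is_reg;   /* models REG_P (operands[0]) */
  bool src_is_reg;    /* models REG_P (operands[1]) */
};

/* Decide whether the expander may copy the source into a fresh pseudo
   (the force_reg step).  When new pseudos can no longer be created,
   e.g. during or after reload, the answer must always be "no".  */
bool
should_force_source_to_reg (const struct mov_operands *ops,
                            bool can_create_pseudo)
{
  assert (ops != NULL);
  if (!can_create_pseudo)
    return false;   /* must not generate new pseudo registers */
  if (ops->dest_is_reg)
    return false;   /* destination is a register: no temporary needed */
  return !ops->src_is_reg;   /* forcing an existing register changes nothing */
}
```

Under this model a move whose source is not a plain register and whose destination is not a register gets a temporary during RTL generation, while the same move requested once pseudos can no longer be created is left for the insn patterns to handle.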
Re: [PATCH][GCC][Arm] Fix subreg crash in different way by enabling the FP16 pattern unconditionally.
On Thu, 26 Jul 2018 at 12:01, Tamar Christina wrote: > > Hi Thomas, > > > -Original Message- > > From: Thomas Preudhomme > > Sent: Thursday, July 26, 2018 09:29 > > To: Tamar Christina > > Cc: gcc-patches@gcc.gnu.org; nd ; Ramana Radhakrishnan > > ; Richard Earnshaw > > ; ni...@redhat.com; Kyrylo Tkachov > > > > Subject: Re: [PATCH][GCC][Arm] Fix subreg crash in different way by > > enabling the FP16 pattern unconditionally. > > > > Hi Tamar, > > > > On Wed, 25 Jul 2018 at 16:28, Tamar Christina > > wrote: > > > > > > Hi Thomas, > > > > > > Thanks for the review! > > > > > > > > > > > > > I don't believe the TARGET_FP16 guard to be needed, because the > > > > > pattern doesn't actually generate code and requires another > > > > > pattern for that, and a reg to reg move should always be possible > > > > > anyway. So allowing the force to register here is safe and it > > > > > allows the compiler to generate a correct error instead of ICEing in > > > > > an > > infinite loop. > > > > > > > > How about subreg to subreg move? Doesn't that expand to more insns > > > > (subreg to reg and reg to subreg)? Couldn't you improve the logic to > > > > check that there is actually a mode change so that if there isn't > > > > (like moving from one subreg to another) just expand to a single move? > > > > > > > > > > Yes, but that is not a new issue. My patch is simply removing the > > > TARGET_FP16 restrictions and merging two patterns that should be one > > using an iterator and nothing more. > > > > > > The redundant mov is already there and a different issue than the ICE I'm > > trying to fix. > > > > It's there for movv4hf and movv6hf but your patch extends this problem to > > movv2sf and movv4sf as well. > > I don't understand how it can. My patch just replaces one pattern for V4HF and > one for V8HF with one pattern operating on VH. > > ;; Vector modes for 16-bit floating-point support. 
> (define_mode_iterator VH [V8HF V4HF]) > > My pattern has absolutely no effect on V2SF and V4SF or any of the other > modes. My bad, I was looking at VF. > > > > > > > > > None of the code inside the expander is needed at all, the code really > > > only has an effect on subreg to subreg moves, as `force_reg` doesn't do > > anything when it's argument is already a reg. > > > > > > The comment in the expander (which was already there) is wrong. The > > > *reason* the ICE is fixed isn't because of the `force_reg`. It's > > > because of the mere presence of the expander itself. The expander > > > matches the standard mov$a optab and so this prevents > > emit_move_insn_1 from doing the move by subwords as it finds a pattern > > that's able to do the move. > > > > Could you then fix the comment in your patch as well? I hadn't understood > > the force_reg was not key here. You might want to update the following > > sentence from your patch description if you are going to include it in your > > commit message: > > I'll update the comment in the patch. The cover letter won't be included in > the commit, > But it does accurately reflect the current state of affairs. The patch will > do the force_reg, > It's just not the reason it works. Understood. > > > > > The way this is worked around in the back-end is that we have move > > patterns in neon.md that usually just force the register instead of checking > > with the back-end. > > > > "The way this is worked around (..) that just force the register" is what > > led > > me to believe the force_reg was important. > > > > > > > > The expander however always falls through and doesn’t stop RTL > > > generation. You could remove all the code in there and have it > > > properly match the *neon_mov instructions which will do the right > > > thing later at code generation time and avoid the redundant moves. My > > guess is the original `force_reg` was copied from the other patterns like > > `movti` and the existing `mov`. 
There It makes sense because the > > operands can be MEM or anything general_operand. > > > > > > However the redundant moves are a different problem than what I'm > > > trying to solve here. So I think that's another patch which requires > &
Re: [PATCH, ARM] Fix gcc.c-torture/execute/loop-2b.c execution failure on cortex-m0
On Friday 15 January 2016 12:45:04 Ramana Radhakrishnan wrote:
> On Wed, Dec 16, 2015 at 9:11 AM, Thomas Preud'homme wrote:
> > During reorg pass, thumb1_reorg () is tasked with rewriting mov rd, rn to
> > subs rd, rn, 0 to avoid a comparison against 0 instruction before doing a
> > conditional branch based on it. The actual avoiding of cmp is done in
> > cbranchsi4_insn instruction C output template. When the condition is met,
> > the source register (rn) is also propagated into the comparison in place
> > the destination register (rd).
> >
> > However, right now thumb1_reorg () only look for a mov followed by a
> > cbranchsi but does not check whether the comparison in cbranchsi is
> > against the constant 0. This is not safe because a non clobbering
> > instruction could exist between the mov and the comparison that modifies
> > the source register. This is what happens here with a post increment of
> > the source register after the mov, which skip the &a[i] == &a[1]
> > comparison for iteration i == 1.
> >
> > This patch fixes the issue by checking that the comparison is against
> > constant 0.
> >
> > ChangeLog entry is as follow:
> >
> > *** gcc/ChangeLog ***
> >
> > 2015-12-07  Thomas Preud'homme
> >
> > 	* config/arm/arm.c (thumb1_reorg): Check that the comparison is
> > 	against the constant 0.
>
> OK.
>
> Ramana
>
> > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> > index 42bf272..49c0a06 100644
> > --- a/gcc/config/arm/arm.c
> > +++ b/gcc/config/arm/arm.c
> > @@ -17195,7 +17195,7 @@ thumb1_reorg (void)
> >    FOR_EACH_BB_FN (bb, cfun)
> >      {
> >        rtx dest, src;
> > -      rtx pat, op0, set = NULL;
> > +      rtx cmp, op0, op1, set = NULL;
> >        rtx_insn *prev, *insn = BB_END (bb);
> >        bool insn_clobbered = false;
> >
> > @@ -17208,8 +17208,13 @@ thumb1_reorg (void)
> >          continue;
> >
> >        /* Get the register with which we are comparing.  */
> > -      pat = PATTERN (insn);
> > -      op0 = XEXP (XEXP (SET_SRC (pat), 0), 0);
> > +      cmp = XEXP (SET_SRC (PATTERN (insn)), 0);
> > +      op0 = XEXP (cmp, 0);
> > +      op1 = XEXP (cmp, 1);
> > +
> > +      /* Check that comparison is against ZERO.  */
> > +      if (!CONST_INT_P (op1) || INTVAL (op1) != 0)
> > +        continue;
> >
> >        /* Find the first flag setting insn before INSN in basic block BB.  */
> >        gcc_assert (insn != BB_HEAD (bb));
> > @@ -17249,7 +17254,7 @@ thumb1_reorg (void)
> >            PATTERN (prev) = gen_rtx_SET (dest, src);
> >            INSN_CODE (prev) = -1;
> >            /* Set test register in INSN to dest.  */
> > -          XEXP (XEXP (SET_SRC (pat), 0), 0) = copy_rtx (dest);
> > +          XEXP (cmp, 0) = copy_rtx (dest);
> >            INSN_CODE (insn) = -1;
> >          }
> >      }
> >
> > Testsuite shows no regression when run for arm-none-eabi with
> > -mcpu=cortex-m0 -mthumb

The patch applies cleanly on gcc-5-branch and also shows no regression when run for arm-none-eabi with -mcpu=cortex-m0 -mthumb.

Is it ok to backport?

Best regards,

Thomas
Re: [PATCH, ARM] Fix gcc.c-torture/execute/loop-2b.c execution failure on cortex-m0
On Thursday 03 March 2016 09:44:31 Ramana Radhakrishnan wrote: > On Thu, Mar 3, 2016 at 9:40 AM, Thomas Preudhomme > > wrote: > > On Friday 15 January 2016 12:45:04 Ramana Radhakrishnan wrote: > >> On Wed, Dec 16, 2015 at 9:11 AM, Thomas Preud'homme > >> > >> wrote: > >> > During reorg pass, thumb1_reorg () is tasked with rewriting mov rd, rn > >> > to > >> > subs rd, rn, 0 to avoid a comparison against 0 instruction before doing > >> > a > >> > conditional branch based on it. The actual avoiding of cmp is done in > >> > cbranchsi4_insn instruction C output template. When the condition is > >> > met, > >> > the source register (rn) is also propagated into the comparison in > >> > place > >> > the destination register (rd). > >> > > >> > However, right now thumb1_reorg () only look for a mov followed by a > >> > cbranchsi but does not check whether the comparison in cbranchsi is > >> > against the constant 0. This is not safe because a non clobbering > >> > instruction could exist between the mov and the comparison that > >> > modifies > >> > the source register. This is what happens here with a post increment of > >> > the source register after the mov, which skip the &a[i] == &a[1] > >> > comparison for iteration i == 1. > >> > > >> > This patch fixes the issue by checking that the comparison is against > >> > constant 0. > >> > > >> > ChangeLog entry is as follow: > >> > > >> > > >> > *** gcc/ChangeLog *** > >> > > >> > 2015-12-07 Thomas Preud'homme > >> > > >> > * config/arm/arm.c (thumb1_reorg): Check that the comparison is > >> > against the constant 0. > >> > >> OK. 
> >> > >> Ramana > >> > >> > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > >> > index 42bf272..49c0a06 100644 > >> > --- a/gcc/config/arm/arm.c > >> > +++ b/gcc/config/arm/arm.c > >> > @@ -17195,7 +17195,7 @@ thumb1_reorg (void) > >> > > >> >FOR_EACH_BB_FN (bb, cfun) > >> > > >> > { > >> > > >> >rtx dest, src; > >> > > >> > - rtx pat, op0, set = NULL; > >> > + rtx cmp, op0, op1, set = NULL; > >> > > >> >rtx_insn *prev, *insn = BB_END (bb); > >> >bool insn_clobbered = false; > >> > > >> > @@ -17208,8 +17208,13 @@ thumb1_reorg (void) > >> > > >> > continue; > >> > > >> >/* Get the register with which we are comparing. */ > >> > > >> > - pat = PATTERN (insn); > >> > - op0 = XEXP (XEXP (SET_SRC (pat), 0), 0); > >> > + cmp = XEXP (SET_SRC (PATTERN (insn)), 0); > >> > + op0 = XEXP (cmp, 0); > >> > + op1 = XEXP (cmp, 1); > >> > + > >> > + /* Check that comparison is against ZERO. */ > >> > + if (!CONST_INT_P (op1) || INTVAL (op1) != 0) > >> > + continue; > >> > > >> >/* Find the first flag setting insn before INSN in basic block > >> >BB. > >> >*/ > >> >gcc_assert (insn != BB_HEAD (bb)); > >> > > >> > @@ -17249,7 +17254,7 @@ thumb1_reorg (void) > >> > > >> > PATTERN (prev) = gen_rtx_SET (dest, src); > >> > INSN_CODE (prev) = -1; > >> > /* Set test register in INSN to dest. */ > >> > > >> > - XEXP (XEXP (SET_SRC (pat), 0), 0) = copy_rtx (dest); > >> > + XEXP (cmp, 0) = copy_rtx (dest); > >> > > >> > INSN_CODE (insn) = -1; > >> > > >> > } > >> > > >> > } > >> > > >> > Testsuite shows no regression when run for arm-none-eabi with > >> > -mcpu=cortex-m0 -mthumb > > > > The patch applies cleanly on gcc-5-branch and also show no regression when > > run for arm-none-eabi with -mcpu=cortex-m0 -mthumb. Is it ok to backport? > This deserves a testcase. The original patch don't have one initially because it fixes a fail of an existing testcase (loop-2b.c). However, the test pass on gcc 5 due to difference in code generation. 
I'm currently trying to come up with a testcase and will get back to you. Best regards, Thomas
Re: [PATCH, ARM] Fix gcc.c-torture/execute/loop-2b.c execution failure on cortex-m0
On Thursday 03 March 2016 15:32:27 Thomas Preudhomme wrote: > On Thursday 03 March 2016 09:44:31 Ramana Radhakrishnan wrote: > > On Thu, Mar 3, 2016 at 9:40 AM, Thomas Preudhomme > > > > wrote: > > > On Friday 15 January 2016 12:45:04 Ramana Radhakrishnan wrote: > > >> On Wed, Dec 16, 2015 at 9:11 AM, Thomas Preud'homme > > >> > > >> wrote: > > >> > During reorg pass, thumb1_reorg () is tasked with rewriting mov rd, > > >> > rn > > >> > to > > >> > subs rd, rn, 0 to avoid a comparison against 0 instruction before > > >> > doing > > >> > a > > >> > conditional branch based on it. The actual avoiding of cmp is done in > > >> > cbranchsi4_insn instruction C output template. When the condition is > > >> > met, > > >> > the source register (rn) is also propagated into the comparison in > > >> > place > > >> > the destination register (rd). > > >> > > > >> > However, right now thumb1_reorg () only look for a mov followed by a > > >> > cbranchsi but does not check whether the comparison in cbranchsi is > > >> > against the constant 0. This is not safe because a non clobbering > > >> > instruction could exist between the mov and the comparison that > > >> > modifies > > >> > the source register. This is what happens here with a post increment > > >> > of > > >> > the source register after the mov, which skip the &a[i] == &a[1] > > >> > comparison for iteration i == 1. > > >> > > > >> > This patch fixes the issue by checking that the comparison is against > > >> > constant 0. > > >> > > > >> > ChangeLog entry is as follow: > > >> > > > >> > > > >> > *** gcc/ChangeLog *** > > >> > > > >> > 2015-12-07 Thomas Preud'homme > > >> > > > >> > * config/arm/arm.c (thumb1_reorg): Check that the comparison > > >> > is > > >> > against the constant 0. > > >> > > >> OK. 
> > >> > > >> Ramana > > >> > > >> > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > > >> > index 42bf272..49c0a06 100644 > > >> > --- a/gcc/config/arm/arm.c > > >> > +++ b/gcc/config/arm/arm.c > > >> > @@ -17195,7 +17195,7 @@ thumb1_reorg (void) > > >> > > > >> >FOR_EACH_BB_FN (bb, cfun) > > >> > > > >> > { > > >> > > > >> >rtx dest, src; > > >> > > > >> > - rtx pat, op0, set = NULL; > > >> > + rtx cmp, op0, op1, set = NULL; > > >> > > > >> >rtx_insn *prev, *insn = BB_END (bb); > > >> >bool insn_clobbered = false; > > >> > > > >> > @@ -17208,8 +17208,13 @@ thumb1_reorg (void) > > >> > > > >> > continue; > > >> > > > >> >/* Get the register with which we are comparing. */ > > >> > > > >> > - pat = PATTERN (insn); > > >> > - op0 = XEXP (XEXP (SET_SRC (pat), 0), 0); > > >> > + cmp = XEXP (SET_SRC (PATTERN (insn)), 0); > > >> > + op0 = XEXP (cmp, 0); > > >> > + op1 = XEXP (cmp, 1); > > >> > + > > >> > + /* Check that comparison is against ZERO. */ > > >> > + if (!CONST_INT_P (op1) || INTVAL (op1) != 0) > > >> > + continue; > > >> > > > >> >/* Find the first flag setting insn before INSN in basic block > > >> >BB. > > >> >*/ > > >> >gcc_assert (insn != BB_HEAD (bb)); > > >> > > > >> > @@ -17249,7 +17254,7 @@ thumb1_reorg (void) > > >> > > > >> > PATTERN (prev) = gen_rtx_SET (dest, src); > > >> > INSN_CODE (prev) = -1; > > >> > /* Set test register in INSN to dest. */ > > >> > > > >> > - XEXP (XEXP (SET_SRC (pat), 0), 0) = copy_rtx (dest); > > >> > + XEXP (cmp, 0) = copy_rtx (dest); > > >> > > > >> > INSN_CODE (insn) = -1; > > >> > > > >> > } > > >> > > > >> > } > > >> > > > >> > Testsuite shows no regression when run for arm-none-eabi with > > >> > -mcpu=cortex-m0 -mthumb > > > > > > The patch applies cleanly on gcc-5-branch and also show no regression > > > when > > > run for arm-none-eabi with -mcpu=cortex-m0 -mthumb. Is it ok to > > > backport? > > > > This deserves a testcase. 
>
> The original patch don't have one initially because it fixes a fail of an
> existing testcase (loop-2b.c). However, the test pass on gcc 5 due to
> difference in code generation. I'm currently trying to come up with a
> testcase and will get back at you.

Sadly I did not manage to come up with a testcase that works on GCC 5. One needs to reproduce a sequence of the form:

(set B A)
(insn clobbering A that is not a set, i.e. store with post increment)
(conditional branch between A and something else)

In that case, thumb1_reorg changes the set into (set B (minus A 0)), which is safe, but also replaces A by B in the conditional insn, which is unsafe in the above situation. The problem I am having is making GCC generate a move instruction, because it is always optimized away. Using a local register variable is not an option because the move should be between regular registers.

Any idea how to construct a testcase?

Best regards,

Thomas
[PATCH] Clarify source of tm.texi to copy for GFDL grant
When tm.texi.in is updated in the source tree, the following message gets displayed:

Verify that you have permission to grant a GFDL license for all new text in tm.texi, then copy it to /gcc/doc/tm.texi.

Some colleagues and I have been confused several times by that message as to which tm.texi to copy, so I think it would be clearer to indicate the absolute path of the source as well. This patch achieves that.

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2018-08-09  Thomas Preud'homme

	* Makefile.in: Clarify which tm.texi to copy over to assert the right
	to grant a GFDL license for all.

Testing: Built GCC with a change in tm.texi.in and copied the file by copy/pasting the source and destination paths from the resulting message. The second build then succeeded.

Is this ok for trunk?

Best regards,

Thomas

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index e7d818d174c..d8d2b885f6d 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -2504,7 +2504,7 @@ s-tm-texi: build/genhooks$(build_exeext) $(srcdir)/doc/tm.texi.in
 	else \
 	  echo >&2 ; \
 	  echo Verify that you have permission to grant a GFDL license for all >&2 ; \
-	  echo new text in tm.texi, then copy it to $(srcdir)/doc/tm.texi. >&2 ; \
+	  echo new text in $(objdir)/tm.texi, then copy it to $(srcdir)/doc/tm.texi. >&2 ; \
 	  false; \
 	fi
-- 
2.18.0
Re: [PATCH][GCC][AArch64] Limit movmem copies to TImode copies.
Hi Tamar,

Thanks for your patch. Just one comment about your ChangeLog entry for the testsuite change: shouldn't it mention that it is a new testcase? The patch you attached seems to create the file.

Best regards,

Thomas

On Mon, 13 Aug 2018 at 10:33, Tamar Christina wrote:
> Hi All,
>
> On AArch64 we have integer modes larger than TImode, and while we can
> generate moves for these they're not as efficient.
>
> So instead make sure we limit the maximum we can copy to TImode. This
> means copying a 16 byte struct will issue 1 TImode copy, which will be done
> using a single STP as we expect but a CImode sized copy won't issue CImode
> operations.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu and no issues.
> Crosstested aarch64_be-none-elf and no issues.
>
> Ok for trunk?
>
> Thanks,
> Tamar
>
> gcc/
> 2018-08-13  Tamar Christina
>
> 	* config/aarch64/aarch64.c (aarch64_expand_movmem): Set TImode max.
>
> gcc/testsuite/
> 2018-08-13  Tamar Christina
>
> 	* gcc.target/aarch64/large_struct_copy_2.c: Add assembler scan.
>
> --
>
Re: [PATCH, ARM] PR85434: Prevent spilling of stack protector guard's address on ARM
Resend hopefully without HTML this time. On Wed, 29 Aug 2018 at 10:49, Thomas Preudhomme wrote: > > Hi, > > I've reworked the patch fixing PR85434 (spilling of stack protector guard's > address on ARM) to address the testsuite regression on powerpc and x86 as > well as glibc testsuite regression on ARM. Issues were due to unconditionally > attempting to generate the new patterns. The code now tests if there is a > pattern for them for the target before generating them. In the ARM side of > the patch, I've also added a more specific predicate for the new patterns. > The new patch is found below. > > > In case of high register pressure in PIC mode, address of the stack > protector's guard can be spilled on ARM targets as shown in PR85434, > thus allowing an attacker to control what the canary would be compared > against. ARM does lack stack_protect_set and stack_protect_test insn > patterns, defining them does not help as the address is expanded > regularly and the patterns only deal with the copy and test of the > guard with the canary. > > This problem does not occur for x86 targets because the PIC access and > the test can be done in the same instruction. Aarch64 is exempt too > because PIC access insn pattern are mov of UNSPEC which prevents it from > the second access in the epilogue being CSEd in cse_local pass with the > first access in the prologue. > > The approach followed here is to create new "combined" set and test > standard pattern names that take the unexpanded guard and do the set or > test. This allows the target to use an opaque pattern (eg. using UNSPEC) > to hide the individual instructions being generated to the compiler and > split the pattern into generic load, compare and branch instruction > after register allocator, therefore avoiding any spilling. This is here > implemented for the ARM targets. For targets not implementing these new > standard pattern names, the existing stack_protect_set and > stack_protect_test pattern names are used. 
> > To be able to split PIC access after register allocation, the functions > had to be augmented to force a new PIC register load and to control > which register it loads into. This is because sharing the PIC register > between prologue and epilogue could lead to spilling due to CSE again > which an attacker could use to control what the canary gets compared > against. > > ChangeLog entries are as follows: > > *** gcc/ChangeLog *** > > 2018-08-09 Thomas Preud'homme > > * target-insns.def (stack_protect_combined_set): Define new standard > pattern name. > (stack_protect_combined_test): Likewise. > * cfgexpand.c (stack_protect_prologue): Try new > stack_protect_combined_set pattern first. > * function.c (stack_protect_epilogue): Try new > stack_protect_combined_test pattern first. > * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now > parameters to control which register to use as PIC register and force > reloading PIC register respectively. Insert in the stream of insns if > possible. > (legitimize_pic_address): Expose above new parameters in prototype and > adapt recursive calls accordingly. > (arm_legitimize_address): Adapt to new legitimize_pic_address > prototype. > (thumb_legitimize_address): Likewise. > (arm_emit_call_insn): Adapt to new require_pic_register prototype. > * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype > change. > * config/arm/predicated.md (guard_operand): New predicate. > * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address > prototype change. > (stack_protect_combined_set): New insn_and_split pattern. > (stack_protect_set): New insn pattern. > (stack_protect_combined_test): New insn_and_split pattern. > (stack_protect_test): New insn pattern. > * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec. > (UNSPEC_SP_TEST): Likewise. > * doc/md.texi (stack_protect_combined_set): Document new standard > pattern name. 
> (stack_protect_set): Clarify that the operand for guard's address is > legal. > (stack_protect_combined_test): Document new standard pattern name. > (stack_protect_test): Clarify that the operand for guard's address is > legal. > > *** gcc/testsuite/ChangeLog *** > > 2018-07-05 Thomas Preud'homme > > * gcc.target/arm/pr85434.c: New test. > > > Testing: > > native x86_64: bootstrap + testsuite -> no regression, can see failures with > previous version of patch but not with new version > native powerpc64: bootstrap + testsuite -> no regression, can see failur
Re: [PATCH, ARM] PR85434: Prevent spilling of stack protector guard's address on ARM
Forgot another important change in the ARM backend: the expanders were causing one indirection too many, which was what caused the test failure in glibc. The new expander code skips the creation of a move from the memory reference of the guard's address to a register, since this is done in the insns themselves. I think during the initial implementation of the first version of the patch I had issues with loading the address and used that to load the address. As can be seen from the absence of regression on the runtime stack protector test in glibc, this is now working properly, which is also confirmed by manual inspection of the code.

I've attached the interdiff from the previous version for reference.

Best regards,

Thomas

On Wed, 29 Aug 2018 at 10:51, Thomas Preudhomme wrote:
>
> Resend hopefully without HTML this time.
>
> On Wed, 29 Aug 2018 at 10:49, Thomas Preudhomme wrote:
> >
> > Hi,
> >
> > I've reworked the patch fixing PR85434 (spilling of stack protector guard's
> > address on ARM) to address the testsuite regression on powerpc and x86 as
> > well as glibc testsuite regression on ARM. Issues were due to
> > unconditionally attempting to generate the new patterns. The code now tests
> > if there is a pattern for them for the target before generating them. In
> > the ARM side of the patch, I've also added a more specific predicate for
> > the new patterns. The new patch is found below.
> >
> >
> > In case of high register pressure in PIC mode, address of the stack
> > protector's guard can be spilled on ARM targets as shown in PR85434,
> > thus allowing an attacker to control what the canary would be compared
> > against. ARM does lack stack_protect_set and stack_protect_test insn
> > patterns, defining them does not help as the address is expanded
> > regularly and the patterns only deal with the copy and test of the
> > guard with the canary.
> >
> > This problem does not occur for x86 targets because the PIC access and
> > the test can be done in the same instruction.
Aarch64 is exempt too > > because PIC access insn pattern are mov of UNSPEC which prevents it from > > the second access in the epilogue being CSEd in cse_local pass with the > > first access in the prologue. > > > > The approach followed here is to create new "combined" set and test > > standard pattern names that take the unexpanded guard and do the set or > > test. This allows the target to use an opaque pattern (eg. using UNSPEC) > > to hide the individual instructions being generated to the compiler and > > split the pattern into generic load, compare and branch instruction > > after register allocator, therefore avoiding any spilling. This is here > > implemented for the ARM targets. For targets not implementing these new > > standard pattern names, the existing stack_protect_set and > > stack_protect_test pattern names are used. > > > > To be able to split PIC access after register allocation, the functions > > had to be augmented to force a new PIC register load and to control > > which register it loads into. This is because sharing the PIC register > > between prologue and epilogue could lead to spilling due to CSE again > > which an attacker could use to control what the canary gets compared > > against. > > > > ChangeLog entries are as follows: > > > > *** gcc/ChangeLog *** > > > > 2018-08-09 Thomas Preud'homme > > > > * target-insns.def (stack_protect_combined_set): Define new standard > > pattern name. > > (stack_protect_combined_test): Likewise. > > * cfgexpand.c (stack_protect_prologue): Try new > > stack_protect_combined_set pattern first. > > * function.c (stack_protect_epilogue): Try new > > stack_protect_combined_test pattern first. > > * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now > > parameters to control which register to use as PIC register and force > > reloading PIC register respectively. Insert in the stream of insns if > > possible. 
> > (legitimize_pic_address): Expose above new parameters in prototype and > > adapt recursive calls accordingly. > > (arm_legitimize_address): Adapt to new legitimize_pic_address > > prototype. > > (thumb_legitimize_address): Likewise. > > (arm_emit_call_insn): Adapt to new require_pic_register prototype. > > * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype > > change. > > * config/arm/predicated.md (guard_operand): New predicate. > > * config/arm/arm.md (movsi expander
Re: [PATCH, GCC/LTO, ping3] Fix PR69866: LTO with def for weak alias in regular object file
On 09/05/17 23:36, Jan Hubicka wrote:

Ping?

Sorry for late reply

My turn to apologize now.

Hi,

This patch fixes an assert failure when linking one LTOed object file having a weak alias with a regular object file containing a strong definition for that same symbol. The patch is twofold:

+ do not add an alias to a partition if it is external
+ do not declare (.globl) an alias if it is external

Adding external aliases to partitions is important to keep the information that two symbols are the same.

So how about simply relaxing the assert then? Right now it trips for any external symbol, even external aliases. How about the following:

ChangeLog entries are as follows:

*** gcc/lto/ChangeLog ***

2017-06-02  Thomas Preud'homme

	* lto/lto-partition.c (add_symbol_to_partition_1): Change assert to
	allow external aliases to be added.

*** gcc/ChangeLog ***

2017-03-01  Thomas Preud'homme

	* cgraphunit.c (cgraph_node::assemble_thunks_and_aliases): Do not
	declare external aliases.

*** gcc/testsuite/ChangeLog ***

2017-02-28  Thomas Preud'homme

	* gcc.dg/lto/pr69866_0.c: New test.
	* gcc.dg/lto/pr69866_1.c: Likewise.

Bootstrapped with LTO on Aarch64 and ARM, and the testsuite on both of these architectures does not show any regression.

Is this ok for trunk?

Best regards,

Thomas

The second part makes sense to me. What breaks when you drop the first change?

Honza

ChangeLog entries are as follows:

*** gcc/lto/ChangeLog ***

2017-03-01  Thomas Preud'homme

	PR lto/69866
	* lto/lto-partition.c (add_symbol_to_partition_1): Do not add external
	aliases to partition.

*** gcc/ChangeLog ***

2017-03-01  Thomas Preud'homme

	PR lto/69866
	* cgraphunit.c (cgraph_node::assemble_thunks_and_aliases): Do not
	declare external aliases.

*** gcc/testsuite/ChangeLog ***

2017-02-28  Thomas Preud'homme

	PR lto/69866
	* gcc.dg/lto/pr69866_0.c: New test.
	* gcc.dg/lto/pr69866_1.c: Likewise.

Testing: Testsuite shows no regression when targeting Cortex-M3 with an arm-none-eabi GCC cross-compiler, neither does it show any regression with native LTO-bootstrapped x86_64-linux-gnu and aarch64-linux-gnu compilers.

Is this ok for stage4?

Best regards,

Thomas

On 31/03/17 18:07, Richard Biener wrote:

On March 31, 2017 5:23:03 PM GMT+02:00, Jeff Law wrote:

On 03/16/2017 08:05 AM, Thomas Preudhomme wrote:

Ping? Is this ok for stage4?

Given the lack of response from Richi, I'd suggest deferring to stage1.

Honza needs to review this, I have too little knowledge here.

Richard.

jeff

diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index c82a88a599ca61b068dd9783d2a6158163809b37..580500ff922b8546d33119261a2455235edbf16d 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -1972,7 +1972,7 @@ cgraph_node::assemble_thunks_and_aliases (void)
   FOR_EACH_ALIAS (this, ref)
     {
       cgraph_node *alias = dyn_cast <cgraph_node *> (ref->referring);
-      if (!alias->transparent_alias)
+      if (!alias->transparent_alias && !DECL_EXTERNAL (alias->decl))
         {
           bool saved_written = TREE_ASM_WRITTEN (decl);
 
diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
index e27d0d1690c1fcfb39e2fac03ce0f4154031fc7c..f44fd435ed075a27e373bdfdf0464eb06e1731ef 100644
--- a/gcc/lto/lto-partition.c
+++ b/gcc/lto/lto-partition.c
@@ -178,7 +178,8 @@ add_symbol_to_partition_1 (ltrans_partition part, symtab_node *node)
 
   /* Add all aliases associated with the symbol.  */
   FOR_EACH_ALIAS (node, ref)
-    if (!ref->referring->transparent_alias)
+    if (!ref->referring->transparent_alias
+        && ref->referring->get_partitioning_class () != SYMBOL_EXTERNAL)
       add_symbol_to_partition_1 (part, ref->referring);
     else
       {
@@ -189,7 +190,8 @@ add_symbol_to_partition_1 (ltrans_partition part, symtab_node *node)
           {
             /* Nested transparent aliases are not permitted.  */
             gcc_checking_assert (!ref2->referring->transparent_alias);
-            add_symbol_to_partition_1 (part, ref2->referring);
+            if (ref2->referring->get_partitioning_class () != SYMBOL_EXTERNAL)
+              add_symbol_to_partition_1 (part, ref2->referring);
           }
       }
diff --git a/gcc/testsuite/gcc.dg/lto/pr69866_0.c b/gcc/testsuite/gcc.dg/lto/pr69866_0.c
new file mode 100644
index ..f49ef8d4c1da7a21d1bfb5409d647bd18141595b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/lto/pr69866_0.c
@@ -0,0 +1,13 @@
+/* { dg-lto-do link } */
+
+int _umh(int i)
+{
+  return i+1;
+}
+
+int weaks(int i) __attribute__((weak, alias("_umh")));
+
+int main()
+{
+  return weaks(10);
+}
diff --git a/gcc/testsuite/gcc.dg/lto/pr69866_1.c b/gcc/testsuite/gcc.dg/lto/pr69866_1.c
new file mode 100644
index ..3a14f850eefaffbf659ce4642adef7900330f4ed
--- /dev
[PATCH, GCC/testsuite/ARM] Allow arm_arch_*_ok to test several macros
Hi,

The general arm_arch_*_ok procedures check architecture availability by substituting macros inside a defined preprocessor operator. This limits them to checking the definition of a single macro and forces ARMv7VE to be special-cased. This patch takes advantage of the fact that architecture macros, when defined, are not null to allow expressing architecture availability as a boolean operation on possibly several macros. It then uses this to deal with ARMv7VE in the general case.

The patch also adds a comment to make it clear that check_effective_target_arm_arch_FUNC_ok does not work as intended for architecture extensions (eg. ARMv8.1-A) due to the lack of an extension-specific macro similar to __ARM_ARCH_*__.

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2017-06-06  Thomas Preud'homme

	* lib/target-supports.exp (check_effective_target_arm_arch_FUNC_ok):
	Test for null definitions instead of them being undefined.  Add entry
	for ARMv7VE.  Reindent entry for ARMv8-M Baseline.  Add comment warning
	about using the effective target for architecture extension.
	(check_effective_target_arm_arch_v7ve_ok): Remove.
	(add_options_for_arm_arch_v7ve): Likewise.

Testing:
- gcc.target/arm/atomic_loaddi_10.c passes with the patch for armv7ve but is marked unsupported for armv7-a
- verified in the logs that -march=armv7ve is correctly added when running gcc.target/arm/ftest-armv7ve-arm.c

Is this ok for trunk?

Best regards, Thomas diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index ded6383cc1f9a1489cd83e1dace0c2fc48e252c3..e83ec757ae3c0dd7c3cad19cfd5d9577547d18a5 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -3775,12 +3775,13 @@ proc check_effective_target_arm_fp16_hw { } { # can be selected and a routine to give the flags to select that architecture # Note: Extra flags may be added to disable options from newer compilers # (Thumb in particular - but others may be added in the future). -# -march=armv7ve is special and is handled explicitly after this loop because -# it needs more than one predefine check to identify. +# Warning: Do not use check_effective_target_arm_arch_*_ok for architecture +# extension (eg. ARMv8.1-A) since there is no macro defined for them. See +# how only __ARM_ARCH_8A__ is checked for ARMv8.1-A. # Usage: /* { dg-require-effective-target arm_arch_v5_ok } */ #/* { dg-add-options arm_arch_v5 } */ # /* { dg-require-effective-target arm_arch_v5_multilib } */ -foreach { armfunc armflag armdef } { +foreach { armfunc armflag armdefs } { v4 "-march=armv4 -marm" __ARM_ARCH_4__ v4t "-march=armv4t" __ARM_ARCH_4T__ v5 "-march=armv5 -marm" __ARM_ARCH_5__ @@ -3795,20 +3796,23 @@ foreach { armfunc armflag armdef } { v7r "-march=armv7-r" __ARM_ARCH_7R__ v7m "-march=armv7-m -mthumb" __ARM_ARCH_7M__ v7em "-march=armv7e-m -mthumb" __ARM_ARCH_7EM__ + v7ve "-march=armv7ve -marm" + "__ARM_ARCH_7A__ && __ARM_FEATURE_IDIV" v8a "-march=armv8-a" __ARM_ARCH_8A__ v8_1a "-march=armv8.1a" __ARM_ARCH_8A__ v8_2a "-march=armv8.2a" __ARM_ARCH_8A__ - v8m_base "-march=armv8-m.base -mthumb -mfloat-abi=soft" __ARM_ARCH_8M_BASE__ + v8m_base "-march=armv8-m.base -mthumb -mfloat-abi=soft" + __ARM_ARCH_8M_BASE__ v8m_main "-march=armv8-m.main -mthumb" __ARM_ARCH_8M_MAIN__ } { -eval [string map [list FUNC $armfunc FLAG $armflag DEF $armdef ] { +eval [string map [list FUNC $armfunc FLAG $armflag 
DEFS $armdefs ] { proc check_effective_target_arm_arch_FUNC_ok { } { if { [ string match "*-marm*" "FLAG" ] && ![check_effective_target_arm_arm_ok] } { return 0 } return [check_no_compiler_messages arm_arch_FUNC_ok assembly { - #if !defined (DEF) - #error !DEF + #if !(DEFS) + #error !(DEFS) #endif } "FLAG" ] } @@ -3829,26 +3833,6 @@ foreach { armfunc armflag armdef } { }] } -# Same functions as above but for -march=armv7ve. To uniquely identify -# -march=armv7ve we need to check for __ARM_ARCH_7A__ as well as -# __ARM_FEATURE_IDIV otherwise it aliases with armv7-a. - -proc check_effective_target_arm_arch_v7ve_ok { } { - if { [ string match "*-marm*" "-march=armv7ve" ] && - ![check_effective_target_arm_arm_ok] } { - return 0 -} - return [check_no_compiler_messages arm_arch_v7ve_ok assembly { - #if !defined (__ARM_ARCH_7A__) || !defined (__ARM_FEATURE_IDIV) - #error !armv7ve - #endif - } "-march=armv7ve" ] -} - -proc add_options_for_arm_arch_v7ve { flags } { -return "$flags -march=armv7ve" -} - # Return 1 if GCC was configured with --with-mode= proc check_effective_target_default_mode { } {
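The substitution described in this message can be sketched outside of DejaGnu. The following is a minimal Python illustration, not the Tcl code from the patch: probe_source mimics what `string map` does with the DEFS placeholder, and condition_holds is a hypothetical helper that evaluates the resulting condition the way #if would for the small subset of syntax used here (identifiers, &&, || and !). The macro values in ARMV7A and ARMV7VE are assumptions; only the fact that they are non-zero matters.

```python
import re

# Template mirroring the probe in the patch; DEFS is the placeholder that
# Tcl's `string map` replaces in target-supports.exp.
PROBE_TEMPLATE = "#if !(DEFS)\n#error !(DEFS)\n#endif\n"


def probe_source(defs):
    """Return the probe text for one table entry (plain text substitution)."""
    return PROBE_TEMPLATE.replace("DEFS", defs)


def condition_holds(defs, predefined):
    """Evaluate DEFS the way #if would, for identifiers, &&, || and ! only.

    `predefined` maps macro names to integer values; identifiers missing
    from it evaluate to 0, as undefined identifiers do in #if.
    """
    expr = re.sub(r"[A-Za-z_]\w*",
                  lambda m: str(predefined.get(m.group(0), 0)), defs)
    expr = expr.replace("&&", " and ").replace("||", " or ")
    expr = expr.replace("!", " not ")
    return bool(eval(expr.strip(), {"__builtins__": {}}, {}))


# Assumed predefines for two -march values (illustrative values only).
ARMV7A = {"__ARM_ARCH_7A__": 1}
ARMV7VE = {"__ARM_ARCH_7A__": 1, "__ARM_FEATURE_IDIV": 1}

V7VE_DEFS = "__ARM_ARCH_7A__ && __ARM_FEATURE_IDIV"

if __name__ == "__main__":
    print(probe_source(V7VE_DEFS))
    print("armv7-a satisfies v7ve entry:", condition_holds(V7VE_DEFS, ARMV7A))
    print("armv7ve satisfies v7ve entry:", condition_holds(V7VE_DEFS, ARMV7VE))
```

With a single `defined (DEF)` test the two-macro condition could not be written as one table entry, which is why ARMv7VE previously needed its own procedures.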
Re: [PATCH, GCC/LTO, ping] Fix PR69866: LTO with def for weak alias in regular object file
Ping?

Best regards,
Thomas

On 06/06/17 11:12, Thomas Preudhomme wrote:

On 09/05/17 23:36, Jan Hubicka wrote:

Ping?

Sorry for late reply

My turn to apologize now.

Hi,

This patch fixes an assert failure when linking one LTOed object file having a weak alias with a regular object file containing a strong definition for that same symbol. The patch is twofold:

+ do not add an alias to a partition if it is external
+ do not declare (.globl) an alias if it is external

Adding external aliases to partitions is important to keep the information that two symbols are the same.

So how about simply relaxing the assert then? Right now it trips for any external symbol, even external aliases. How about the following:

ChangeLog entries are as follows:

*** gcc/lto/ChangeLog ***

2017-06-02 Thomas Preud'homme

* lto/lto-partition.c (add_symbol_to_partition_1): Change assert to allow external aliases to be added.

*** gcc/ChangeLog ***

2017-03-01 Thomas Preud'homme

* cgraphunit.c (cgraph_node::assemble_thunks_and_aliases): Do not declare external aliases.

*** gcc/testsuite/ChangeLog ***

2017-02-28 Thomas Preud'homme

* gcc.dg/lto/pr69866_0.c: New test.
* gcc.dg/lto/pr69866_1.c: Likewise.

Bootstrapped with LTO on Aarch64 and ARM, and the testsuite on both of these architectures does not show any regression.

Is this ok for trunk?

Best regards,
Thomas

The second part makes sense to me. What breaks when you drop the first change?

Honza

ChangeLog entries are as follows:

*** gcc/lto/ChangeLog ***

2017-03-01 Thomas Preud'homme

PR lto/69866
* lto/lto-partition.c (add_symbol_to_partition_1): Do not add external aliases to partition.

*** gcc/ChangeLog ***

2017-03-01 Thomas Preud'homme

PR lto/69866
* cgraphunit.c (cgraph_node::assemble_thunks_and_aliases): Do not declare external aliases.

*** gcc/testsuite/ChangeLog ***

2017-02-28 Thomas Preud'homme

PR lto/69866
* gcc.dg/lto/pr69866_0.c: New test.
* gcc.dg/lto/pr69866_1.c: Likewise.
Testing: Testsuite shows no regression when targeting Cortex-M3 with an arm-none-eabi GCC cross-compiler, neither does it show any regression with native LTO-bootstrapped x86-64_linux-gnu and aarch64-linux-gnu compilers. Is this ok for stage4? Best regards, Thomas On 31/03/17 18:07, Richard Biener wrote: On March 31, 2017 5:23:03 PM GMT+02:00, Jeff Law wrote: On 03/16/2017 08:05 AM, Thomas Preudhomme wrote: Ping? Is this ok for stage4? Given the lack of response from Richi, I'd suggest deferring to stage1. Honza needs to review this, i habe too little knowledge here. Richard. jeff diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c index c82a88a599ca61b068dd9783d2a6158163809b37..580500ff922b8546d33119261a2455235edbf16d 100644 --- a/gcc/cgraphunit.c +++ b/gcc/cgraphunit.c @@ -1972,7 +1972,7 @@ cgraph_node::assemble_thunks_and_aliases (void) FOR_EACH_ALIAS (this, ref) { cgraph_node *alias = dyn_cast (ref->referring); - if (!alias->transparent_alias) + if (!alias->transparent_alias && !DECL_EXTERNAL (alias->decl)) { bool saved_written = TREE_ASM_WRITTEN (decl); diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c index e27d0d1690c1fcfb39e2fac03ce0f4154031fc7c..f44fd435ed075a27e373bdfdf0464eb06e1731ef 100644 --- a/gcc/lto/lto-partition.c +++ b/gcc/lto/lto-partition.c @@ -178,7 +178,8 @@ add_symbol_to_partition_1 (ltrans_partition part, symtab_node *node) /* Add all aliases associated with the symbol. */ FOR_EACH_ALIAS (node, ref) -if (!ref->referring->transparent_alias) +if (!ref->referring->transparent_alias +&& ref->referring->get_partitioning_class () != SYMBOL_EXTERNAL) add_symbol_to_partition_1 (part, ref->referring); else { @@ -189,7 +190,8 @@ add_symbol_to_partition_1 (ltrans_partition part, symtab_node *node) { /* Nested transparent aliases are not permitted. 
*/ gcc_checking_assert (!ref2->referring->transparent_alias); -add_symbol_to_partition_1 (part, ref2->referring); +if (ref2->referring->get_partitioning_class () != SYMBOL_EXTERNAL) + add_symbol_to_partition_1 (part, ref2->referring); } } diff --git a/gcc/testsuite/gcc.dg/lto/pr69866_0.c b/gcc/testsuite/gcc.dg/lto/pr69866_0.c new file mode 100644 index ..f49ef8d4c1da7a21d1bfb5409d647bd18141595b --- /dev/null +++ b/gcc/testsuite/gcc.dg/lto/pr69866_0.c @@ -0,0 +1,13 @@ +/* { dg-lto-do link } */ + +int _umh(int i) +{ + return i+1; +} + +int weaks(int i) __attribute__((weak, alias("_umh"))); + +int main() +{ + return weaks(10); +} diff --git a/gcc/testsuite/gcc.dg/lto/pr69866_1.c b/gcc/testsuite/gcc.dg/lto/pr69866_1.c new file mode 100644 index ..3a14f850eefaffbf659ce4642adef7900330f4ed --
[PATCH, GCC/testsuite/ARM] Consistently check for neon in vect effective targets
Hi,

Conditions checked for ARM targets in vector-related effective targets are inconsistent:

* sometimes arm*-*-* is checked
* sometimes Neon is checked
* sometimes arm_neon_ok and sometimes arm_neon is used for the Neon check
* sometimes check_effective_target_* is used, sometimes is-effective-target

This patch consolidates all of these checks into using is-effective-target arm_neon; where little endian was also checked, that check is kept.

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2017-06-06 Thomas Preud'homme

* lib/target-supports.exp (check_effective_target_vect_int): Replace current ARM check by ARM NEON's availability check.
(check_effective_target_vect_intfloat_cvt): Likewise.
(check_effective_target_vect_uintfloat_cvt): Likewise.
(check_effective_target_vect_floatint_cvt): Likewise.
(check_effective_target_vect_floatuint_cvt): Likewise.
(check_effective_target_vect_shift): Likewise.
(check_effective_target_whole_vector_shift): Likewise.
(check_effective_target_vect_bswap): Likewise.
(check_effective_target_vect_shift_char): Likewise.
(check_effective_target_vect_long): Likewise.
(check_effective_target_vect_float): Likewise.
(check_effective_target_vect_perm): Likewise.
(check_effective_target_vect_perm_byte): Likewise.
(check_effective_target_vect_perm_short): Likewise.
(check_effective_target_vect_widen_sum_hi_to_si_pattern): Likewise.
(check_effective_target_vect_widen_sum_qi_to_hi): Likewise.
(check_effective_target_vect_widen_mult_qi_to_hi): Likewise.
(check_effective_target_vect_widen_mult_hi_to_si): Likewise.
(check_effective_target_vect_widen_mult_qi_to_hi_pattern): Likewise.
(check_effective_target_vect_widen_mult_hi_to_si_pattern): Likewise.
(check_effective_target_vect_widen_shift): Likewise.
(check_effective_target_vect_extract_even_odd): Likewise.
(check_effective_target_vect_interleave): Likewise.
(check_effective_target_vect_multiple_sizes): Likewise.
(check_effective_target_vect64): Likewise.
(check_effective_target_vect_max_reduc): Likewise. Testing: Testsuite shows no regression when targeting ARMv7-A with -mfpu=neon-fpv4 and -mfloat-abi=hard or when targeting Cortex-M3 with default FPU and float ABI (soft). Is this ok for trunk? Best regards, Thomas diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index ded6383cc1f9a1489cd83e1dace0c2fc48e252c3..d7367999fc9df8cf7c654fbb03a059b309e062d6 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -2916,7 +2916,7 @@ proc check_effective_target_vect_int { } { || [istarget alpha*-*-*] || [istarget ia64-*-*] || [istarget aarch64*-*-*] - || [check_effective_target_arm32] + || [is-effective-target arm_neon] || ([istarget mips*-*-*] && ([et-is-effective-target mips_loongson] || [et-is-effective-target mips_msa])) } { @@ -2944,8 +2944,7 @@ proc check_effective_target_vect_intfloat_cvt { } { if { [istarget i?86-*-*] || [istarget x86_64-*-*] || ([istarget powerpc*-*-*] && ![istarget powerpc-*-linux*paired*]) - || ([istarget arm*-*-*] - && [check_effective_target_arm_neon_ok]) + || [is-effective-target arm_neon] || ([istarget mips*-*-*] && [et-is-effective-target mips_msa]) } { set et_vect_intfloat_cvt_saved($et_index) 1 @@ -2987,8 +2986,7 @@ proc check_effective_target_vect_uintfloat_cvt { } { || ([istarget powerpc*-*-*] && ![istarget powerpc-*-linux*paired*]) || [istarget aarch64*-*-*] - || ([istarget arm*-*-*] - && [check_effective_target_arm_neon_ok]) + || [is-effective-target arm_neon] || ([istarget mips*-*-*] && [et-is-effective-target mips_msa]) } { set et_vect_uintfloat_cvt_saved($et_index) 1 @@ -3016,8 +3014,7 @@ proc check_effective_target_vect_floatint_cvt { } { if { [istarget i?86-*-*] || [istarget x86_64-*-*] || ([istarget powerpc*-*-*] && ![istarget powerpc-*-linux*paired*]) - || ([istarget arm*-*-*] - && [check_effective_target_arm_neon_ok]) + || [is-effective-target arm_neon] || ([istarget mips*-*-*] && 
[et-is-effective-target mips_msa]) } { set et_vect_floatint_cvt_saved($et_index) 1 @@ -3043,8 +3040,7 @@ proc check_effective_target_vect_floatuint_cvt { } { set et_vect_floatuint_cvt_saved($et_index) 0 if { ([istarget powerpc*-*-*] && ![istarget powerpc-*-linux*paired*]) - || ([istarget arm*-*-*] - && [check_effective_target_arm_neon_ok]) + || [is-effective-target arm_neon] || ([istarget mips*-*-*] && [et-is-effective-target mips_msa]) } { set et_vect_floatuint_cvt_saved($et_index) 1 @@ -4903,7 +4899,7 @@ proc check_effective_target_vect_shift { } { || [istarget ia64-*
[PATCH, GCC/testsuite] Fix gen-vect-26.c requirements
Hi,

gen-vect-26.c tests the vectorizer but only requires the vect_cmdline_needed effective target. It should also depend on vect_int to make sure a vector unit is available on the target. This patch fixes that.

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2017-06-05 Thomas Preud'homme

* gcc.dg/tree-ssa/gen-vect-26.c: Also require vect_int effective target.

Testing: Testcase is now skipped when targeting Cortex-M3.

Is this ok for trunk?

Best regards,
Thomas

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-26.c b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-26.c
index 8e5f1410612b075914000dcdc643b2838ee3dcd9..8edeb0bbfd31b3926382da27bfafa4f331066ba9 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-26.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-26.c
@@ -1,4 +1,4 @@
-/* { dg-do run { target vect_cmdline_needed } } */
+/* { dg-do run { target { vect_cmdline_needed && vect_int } } } */
 /* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic" } */
 /* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic -mno-sse" { target { i?86-*-* x86_64-*-* } } } */
Re: [PATCH, GCC/testsuite/ARM] Consistently check for neon in vect effective targets
On 13/06/17 20:22, Christophe Lyon wrote: Hi Thomas, On 13 June 2017 at 11:08, Thomas Preudhomme wrote: Hi, Conditions checked for ARM targets in vector-related effective targets are inconsistent: * sometimes arm*-*-* is checked * sometimes Neon is checked * sometimes arm_neon_ok and sometimes arm_neon is used for neon check * sometimes check_effective_target_* is used, sometimes is-effective-target This patch consolidate all of these check into using is-effective-target arm_neon and when little endian was checked, the check is kept. ChangeLog entry is as follows: *** gcc/testsuite/ChangeLog *** 2017-06-06 Thomas Preud'homme * lib/target-supports.exp (check_effective_target_vect_int): Replace current ARM check by ARM NEON's availability check. (check_effective_target_vect_intfloat_cvt): Likewise. (check_effective_target_vect_uintfloat_cvt): Likewise. (check_effective_target_vect_floatint_cvt): Likewise. (check_effective_target_vect_floatuint_cvt): Likewise. (check_effective_target_vect_shift): Likewise. (check_effective_target_whole_vector_shift): Likewise. (check_effective_target_vect_bswap): Likewise. (check_effective_target_vect_shift_char): Likewise. (check_effective_target_vect_long): Likewise. (check_effective_target_vect_float): Likewise. (check_effective_target_vect_perm): Likewise. (check_effective_target_vect_perm_byte): Likewise. (check_effective_target_vect_perm_short): Likewise. (check_effective_target_vect_widen_sum_hi_to_si_pattern): Likewise. (check_effective_target_vect_widen_sum_qi_to_hi): Likewise. (check_effective_target_vect_widen_mult_qi_to_hi): Likewise. (check_effective_target_vect_widen_mult_hi_to_si): Likewise. (check_effective_target_vect_widen_mult_qi_to_hi_pattern): Likewise. (check_effective_target_vect_widen_mult_hi_to_si_pattern): Likewise. (check_effective_target_vect_widen_shift): Likewise. (check_effective_target_vect_extract_even_odd): Likewise. (check_effective_target_vect_interleave): Likewise. 
(check_effective_target_vect_multiple_sizes): Likewise.
(check_effective_target_vect64): Likewise.
(check_effective_target_vect_max_reduc): Likewise.

Testing: Testsuite shows no regression when targeting ARMv7-A with -mfpu=neon-fpv4 and -mfloat-abi=hard or when targeting Cortex-M3 with default FPU and float ABI (soft).

That's strange, my testing detects a syntax error:

Executed from: gcc.dg/vect/vect.exp
gcc.dg/vect/slp-9.c: error executing dg-final: unbalanced close paren

Indeed, I can see the missing parenthesis. I've checked again with the sum file and even with -v -v -v -v dg-cmp-results does not show any regression. compare_tests does, though it is often noisier (reporting some tests as having disappeared and appeared). This sounds like dg-cmp-results needs to be improved here. I'll do that first, then test a fixed version of the patch.

Many thanks for the testing!

See http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/249142-consistent_neon_check/report-build-info.html for a full picture. Note that the cells with "BETTER" seem to be mostly several PASSes becoming unsupported.

Thanks,

Best regards,
Thomas
Re: [PATCH, GCC/testsuite/ARM] Consistently check for neon in vect effective targets
On 14/06/17 09:29, Christophe Lyon wrote: On 14 June 2017 at 10:25, Thomas Preudhomme wrote: On 13/06/17 20:22, Christophe Lyon wrote: Hi Thomas, On 13 June 2017 at 11:08, Thomas Preudhomme wrote: Hi, Conditions checked for ARM targets in vector-related effective targets are inconsistent: * sometimes arm*-*-* is checked * sometimes Neon is checked * sometimes arm_neon_ok and sometimes arm_neon is used for neon check * sometimes check_effective_target_* is used, sometimes is-effective-target This patch consolidate all of these check into using is-effective-target arm_neon and when little endian was checked, the check is kept. ChangeLog entry is as follows: *** gcc/testsuite/ChangeLog *** 2017-06-06 Thomas Preud'homme * lib/target-supports.exp (check_effective_target_vect_int): Replace current ARM check by ARM NEON's availability check. (check_effective_target_vect_intfloat_cvt): Likewise. (check_effective_target_vect_uintfloat_cvt): Likewise. (check_effective_target_vect_floatint_cvt): Likewise. (check_effective_target_vect_floatuint_cvt): Likewise. (check_effective_target_vect_shift): Likewise. (check_effective_target_whole_vector_shift): Likewise. (check_effective_target_vect_bswap): Likewise. (check_effective_target_vect_shift_char): Likewise. (check_effective_target_vect_long): Likewise. (check_effective_target_vect_float): Likewise. (check_effective_target_vect_perm): Likewise. (check_effective_target_vect_perm_byte): Likewise. (check_effective_target_vect_perm_short): Likewise. (check_effective_target_vect_widen_sum_hi_to_si_pattern): Likewise. (check_effective_target_vect_widen_sum_qi_to_hi): Likewise. (check_effective_target_vect_widen_mult_qi_to_hi): Likewise. (check_effective_target_vect_widen_mult_hi_to_si): Likewise. (check_effective_target_vect_widen_mult_qi_to_hi_pattern): Likewise. (check_effective_target_vect_widen_mult_hi_to_si_pattern): Likewise. (check_effective_target_vect_widen_shift): Likewise. 
(check_effective_target_vect_extract_even_odd): Likewise.
(check_effective_target_vect_interleave): Likewise.
(check_effective_target_vect_multiple_sizes): Likewise.
(check_effective_target_vect64): Likewise.
(check_effective_target_vect_max_reduc): Likewise.

Testing: Testsuite shows no regression when targeting ARMv7-A with -mfpu=neon-fpv4 and -mfloat-abi=hard or when targeting Cortex-M3 with default FPU and float ABI (soft).

That's strange, my testing detects a syntax error:

Executed from: gcc.dg/vect/vect.exp
gcc.dg/vect/slp-9.c: error executing dg-final: unbalanced close paren

Indeed, I can see the missing parenthesis. I've checked again with the sum file and even with -v -v -v -v dg-cmp-results does not show any regression. compare_tests does, though it is often noisier (reporting some tests as having disappeared and appeared). This sounds like dg-cmp-results needs to be improved here. I'll do that first, then test a fixed version of the patch.

I did patch compare_tests a while ago such that it catches ERROR messages from dejagnu (r240288)

So dg-cmp-results assumes there is only one tool tested in the .sum file (it throws away everything before "^Running" and everything after "^[[:space:]]+===", which it assumes is the summary). Gosh, I've used it countless times in that way... I will provide a patch to make it work in that setup too.

Best regards,
Thomas
[PATCH, contrib] Support multi-tool sum files in dg-cmp-results.sh
Hi,

The dg-cmp-results.sh contrib script is written to work with sum files for a single tool only. It throws away the header, including the first === line, and everything starting from the following === line, assuming that what lies in between is the test results. This does not work well for sum files with results for multiple tools.

This patch changes the logic to instead keep everything between the "Running target" line and the beginning of the Summary line. The other existing filter mechanisms will ensure only FAIL, PASS, etc. lines are kept after that.

ChangeLog entry is as follows:

*** contrib/ChangeLog ***

2017-06-14 Thomas Preud'homme

* dg-cmp-results.sh: Keep test result lines rather than throwing header and summary to support sum files with multiple tools.

Tested successfully on a sum file with a single tool, with similar results, and on a sum file with multiple tools, which now shows a regression with the patch proposed in https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00875.html

Is this ok for trunk?

Best regards,
Thomas
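The range-based filtering described in this message can be illustrated with a small Python sketch. This is not the sed code from the script: keep_result_sections is a hypothetical function that emulates a sed address range with the two patterns named above, SAMPLE is a made-up two-tool sum file, and RESULT is only a stand-in for the script's later filtering, whose exact form is not shown in this message.

```python
import re

START = re.compile(r"^Running target ")
END = re.compile(r"^\s+===.*Summary ===")
RESULT = re.compile(r"^[A-Z]+:")  # stand-in for the script's later filtering


def keep_result_sections(lines):
    """Keep each block from a START line through the next END line.

    Like a sed address range, the END pattern is only tried on lines
    after the one that opened the range, and a new range may open later.
    """
    kept, in_range = [], False
    for line in lines:
        if in_range:
            kept.append(line)
            if END.search(line):
                in_range = False
        elif START.search(line):
            in_range = True
            kept.append(line)
    return kept


# Made-up sum file holding results for two tools.
SAMPLE = [
    "Test Run By someone on some date",
    "\t\t=== gcc tests ===",
    "",
    "Running target unix",
    "PASS: gcc.dg/a.c (test for excess errors)",
    "FAIL: gcc.dg/b.c execution test",
    "",
    "\t\t=== gcc Summary ===",
    "",
    "# of expected passes\t\t1",
    "\t\t=== g++ tests ===",
    "",
    "Running target unix",
    "PASS: g++.dg/c.C (test for excess errors)",
    "",
    "\t\t=== g++ Summary ===",
]

results = [l for l in keep_result_sections(SAMPLE) if RESULT.search(l)]

if __name__ == "__main__":
    for line in results:
        print(line)
```

Deleting everything from the first === line onwards would drop the second tool's results in such a file, whereas the range form keeps one block per tool.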
Re: [PATCH, GCC/testsuite/ARM] Allow arm_arch_*_ok to test several macros
I've heard adding the patch usually helps getting it reviewed, so here it is. :-)

Best regards,
Thomas

On 07/06/17 16:42, Thomas Preudhomme wrote:

Hi,

The general arm_arch_*_ok procedures check architecture availability by substituting macros inside a defined preprocessor operator. This limits them to checking the definition of a single macro and forces ARMv7VE to be special-cased.

This patch takes advantage of the fact that architecture macros, when defined, are not null to allow expressing architecture availability as a boolean expression over possibly several macros. It then uses this to deal with ARMv7VE in the general case.

The patch also adds a comment to make it clear that check_effective_target_arm_arch_FUNC_ok does not work as intended for architecture extensions (eg. ARMv8.1-A) due to the lack of an extension-specific macro similar to __ARM_ARCH_*__.

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2017-06-06 Thomas Preud'homme

* lib/target-supports.exp (check_effective_target_arm_arch_FUNC_ok): Test for null definitions instead of them being undefined. Add entry for ARMv7VE. Reindent entry for ARMv8-M Baseline. Add comment warning about using the effective target for architecture extension.
(check_effective_target_arm_arch_v7ve_ok): Remove.
(add_options_for_arm_arch_v7ve): Likewise.

Testing:
- gcc.target/arm/atomic_loaddi_10.c passes with the patch for armv7ve but is marked unsupported for armv7-a
- verified in the logs that -march=armv7ve is correctly added when running gcc.target/arm/ftest-armv7ve-arm.c

Is this ok for trunk?

Best regards,
Thomas

diff --git a/contrib/dg-cmp-results.sh b/contrib/dg-cmp-results.sh
index d291769547dcd2a02ecf6f80d60d6be7802af4fd..d875b4bd8bca16c1f381355612ef34f6879c5674 100755
--- a/contrib/dg-cmp-results.sh
+++ b/contrib/dg-cmp-results.sh
@@ -91,8 +91,7 @@ sed $E -e '/^[[:space:]]+===/,$d' $NFILE
 
 # Create a temporary file from the old file's interesting section.
 sed $E -e "1,/$header/d" \
-    -e '/^[[:space:]]+===/,$d' \
-    -e '/^[A-Z]+:/!d' \
+    -e '/^Running target /,/^[[:space:]]+===.*Summary ===/!d' \
     -e '/^(WARNING|ERROR):/d' \
     -e 's/\r$//' \
     -e 's/^/O:/' \
@@ -102,8 +101,7 @@ sed $E -e "1,/$header/d" \
 
 # Create a temporary file from the new file's interesting section.
 sed $E -e "1,/$header/d" \
-    -e '/^[[:space:]]+===/,$d' \
-    -e '/^[A-Z]+:/!d' \
+    -e '/^Running target /,/^[[:space:]]+===.*Summary ===/!d' \
     -e '/^(WARNING|ERROR):/d' \
     -e 's/\r$//' \
     -e 's/^/N:/' \
[PATCH, GCC/testsuite/ARM] Make gcc.target/arm/its.c more robust
Hi,

Testcase gcc.target/arm/its.c was added as part of a patch [1] to limit IT blocks to 2 instructions maximum. However, the patch was only tested indirectly: the test *aimed* to check that the assembly output does not contain a single IT block with all conditional code in it, but this was actually implemented by expecting exactly 2 IT blocks.

[1] https://gcc.gnu.org/ml/gcc-patches/2014-01/msg00764.html

This does not work, as proved by the regression following the code changes brought by r248863: some of the instructions are conditionally executed using a branch and thus there is only one IT block. This patch changes the logic to check that there is no IT block with more than 2 conditional instructions, ie. that IT is only ever followed by zero or one non-space letter.

This patch also restricts the testcase to Thumb-only devices since the patch the testcase was contributed with only concerned ARMv7-M targets. Since tuning for ARMv7E-M targets is even more restrictive (only one instruction per IT block), restricting the testcase to all Thumb-only devices is sufficient.

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2017-06-09 Thomas Preud'homme

* gcc.target/arm/its.c: Check that no IT block has more than 2 instructions in it rather than the number of IT blocks being 2. Transfer scan directive arm_thumb2 restriction to the whole testcase and restrict further to Thumb-only targets.

Testing: Test is correctly skipped when targeting Thumb mode of Cortex-A15 and Cortex-M0 and PASSes for Cortex-M7. Note that it FAILs for Cortex-M3 and Cortex-M4 and manual inspection does reveal that an IT block is generated with more than 2 instructions in it.

Is this ok for trunk?

Best regards,
Thomas

diff --git a/gcc/testsuite/gcc.target/arm/its.c b/gcc/testsuite/gcc.target/arm/its.c
index 5425f1e920592c911771d93a4620448b06d51394..4e07871b57886e210391db1a72d1bc5b465a49d0 100644
--- a/gcc/testsuite/gcc.target/arm/its.c
+++ b/gcc/testsuite/gcc.target/arm/its.c
@@ -1,4 +1,6 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target arm_cortex_m } */
+/* { dg-require-effective-target arm_thumb2 } */
 /* { dg-options "-O2" } */
 int test (int a, int b)
 {
@@ -17,4 +19,6 @@ int test (int a, int b)
     r -= 3;
   return r;
 }
-/* { dg-final { scan-assembler-times "\tit" 2 { target arm_thumb2 } } } */
+/* Ensure there is no IT block with more than 2 instructions, ie. we only allow
+   IT, ITT and ITE.  */
+/* { dg-final { scan-assembler-not "\\sit\[^\\s\]{2,}\\s" } } */
Re: [PATCH, GCC/testsuite/ARM] Make gcc.target/arm/its.c more robust
On 14/06/17 18:03, Richard Earnshaw (lists) wrote: On 14/06/17 17:49, Thomas Preudhomme wrote: Hi, Testcase gcc.target/arm/its.c was added as part of a patch [1] to limit IT blocks to 2 instructions maximum. However, the patch was only tested indirectly by *aiming* to check that the assembly output does not contain a single IT block with all conditional code in it. This was actually implemented by expecting exactly 2 IT blocks. [1] https://gcc.gnu.org/ml/gcc-patches/2014-01/msg00764.html This does not work as proved by the regression following code changes brought by r248863: some of the instructions are conditionally executed using a branch and thus there is only one IT block. This patch changes the logic to look for an IT block with more than 2 conditions, ie. IT followed by zero or one non space letter. This patch also restrict the testcase to Thumb-only devices since the patch the testcase was contributed with only concerned ARMv7-M targets. Since tuning for ARMv7E-M targets is even more restrictive (only one instruction per IT block), restricting the testcase to all Thumb-only devices is sufficient. ChangeLog entry is as follows: *** gcc/testsuite/ChangeLog *** 2017-06-09 Thomas Preud'homme *gcc.target/arm/its.c: Check that no IT blocks has more than 2 instructions in it rather than the number of IT blocks being 2. Transfer scan directive arm_thumb2 restriction to the whole testcase and restrict further to Thumb-only targets. Testing: Test is correctly skipped when targeting Thumb mode of Cortex-A15 and Cortex-M0 and PASS for Cortex-M7. Note that it FAILs for Cortex-M3 and Cortex-M4 and manual inspection does reveal that an IT block is generated with more than 2 instructions in it. Is this ok for trunk? 
Best regards, Thomas make_its_test_more_robust.patch diff --git a/gcc/testsuite/gcc.target/arm/its.c b/gcc/testsuite/gcc.target/arm/its.c index 5425f1e920592c911771d93a4620448b06d51394..4e07871b57886e210391db1a72d1bc5b465a49d0 100644 --- a/gcc/testsuite/gcc.target/arm/its.c +++ b/gcc/testsuite/gcc.target/arm/its.c @@ -1,4 +1,6 @@ /* { dg-do compile } */ +/* { dg-require-effective-target arm_cortex_m } */ +/* { dg-require-effective-target arm_thumb2 } */ /* { dg-options "-O2" } */ int test (int a, int b) { @@ -17,4 +19,6 @@ int test (int a, int b) r -= 3; return r; } -/* { dg-final { scan-assembler-times "\tit" 2 { target arm_thumb2 } } } */ +/* Ensure there is no IT block with more than 2 instructions, ie. we only allow + IT, ITT and ITE. */ +/* { dg-final { scan-assembler-not "\\sit\[^\\s\]{2,}\\s" } } */ Wouldn't {dg-final { scan-assembler-not "it[te][te]" } } be easier to understand? Indeed, or rather "\\sit\[te\]\[te\]" once properly escaped. "\\sit\[te\]{2}" also works and is even simpler so this is what this updated version uses. Best regards, Thomas diff --git a/gcc/testsuite/gcc.target/arm/its.c b/gcc/testsuite/gcc.target/arm/its.c index 5425f1e920592c911771d93a4620448b06d51394..f81a0df51cdb5fc26208c0a99e5c1cfb2ee4ed04 100644 --- a/gcc/testsuite/gcc.target/arm/its.c +++ b/gcc/testsuite/gcc.target/arm/its.c @@ -1,4 +1,6 @@ /* { dg-do compile } */ +/* { dg-require-effective-target arm_cortex_m } */ +/* { dg-require-effective-target arm_thumb2 } */ /* { dg-options "-O2" } */ int test (int a, int b) { @@ -17,4 +19,6 @@ int test (int a, int b) r -= 3; return r; } -/* { dg-final { scan-assembler-times "\tit" 2 { target arm_thumb2 } } } */ +/* Ensure there is no IT block with more than 2 instructions, ie. we only allow + IT, ITT and ITE. */ +/* { dg-final { scan-assembler-not "\\sit\[te\]{2}" } } */
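The two scan patterns discussed in this exchange can be compared on sample mnemonics. The following is a minimal sketch using Python's re module as a stand-in for the Tcl regular expression engine that DejaGnu uses; the patterns are written as they read once the Tcl escaping is removed, and the assembly lines in ASM are made up for illustration.

```python
import re

# Pattern from the first version of the patch: IT followed by two or more
# non-space characters, then whitespace.
FIRST = re.compile(r"\sit[^\s]{2,}\s")
# Pattern from the updated version: IT followed by two t/e letters.
UPDATED = re.compile(r"\sit[te]{2}")

# Made-up assembly lines, one per IT mnemonic of interest.
ASM = {
    "it": "\tit\tne",
    "itt": "\titt\tne",
    "ite": "\tite\tne",
    "ittt": "\tittt\tne",
    "itte": "\titte\tne",
    "itet": "\titet\tne",
}

# Mnemonics that each pattern would flag, i.e. that would make
# scan-assembler-not fail.
flagged_first = sorted(m for m, line in ASM.items() if FIRST.search(line))
flagged_updated = sorted(m for m, line in ASM.items() if UPDATED.search(line))

if __name__ == "__main__":
    print("first pattern flags:  ", flagged_first)
    print("updated pattern flags:", flagged_updated)
```

On these samples both patterns accept IT, ITT and ITE and reject the three-instruction forms; the updated one is simply easier to read.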
Re: [PATCH, contrib] Support multi-tool sum files in dg-cmp-results.sh
Forgetting the patch: check! Sending it later as a reply to the wrong message: check! Hopefully I won't check any of those a second time.

Best regards,
Thomas

On 14/06/17 13:30, Thomas Preudhomme wrote:

Hi,

The dg-cmp-results.sh contrib script is written to work with sum files for a single tool only. It throws away the header, including the first === line, and everything starting from the following === line, assuming that what lies in between is the test results. This does not work well for sum files with results for multiple tools.

This patch changes the logic to instead keep everything between the "Running target" line and the beginning of the Summary line. The other existing filter mechanisms will ensure only FAIL, PASS, etc. lines are kept after that.

ChangeLog entry is as follows:

*** contrib/ChangeLog ***

2017-06-14 Thomas Preud'homme

* dg-cmp-results.sh: Keep test result lines rather than throwing header and summary to support sum files with multiple tools.

Tested successfully on a sum file with a single tool, with similar results, and on a sum file with multiple tools, which now shows a regression with the patch proposed in https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00875.html

Is this ok for trunk?

Best regards,
Thomas

diff --git a/contrib/dg-cmp-results.sh b/contrib/dg-cmp-results.sh
index d291769547dcd2a02ecf6f80d60d6be7802af4fd..d875b4bd8bca16c1f381355612ef34f6879c5674 100755
--- a/contrib/dg-cmp-results.sh
+++ b/contrib/dg-cmp-results.sh
@@ -91,8 +91,7 @@ sed $E -e '/^[[:space:]]+===/,$d' $NFILE
 
 # Create a temporary file from the old file's interesting section.
 sed $E -e "1,/$header/d" \
-    -e '/^[[:space:]]+===/,$d' \
-    -e '/^[A-Z]+:/!d' \
+    -e '/^Running target /,/^[[:space:]]+===.*Summary ===/!d' \
     -e '/^(WARNING|ERROR):/d' \
     -e 's/\r$//' \
     -e 's/^/O:/' \
@@ -102,8 +101,7 @@ sed $E -e "1,/$header/d" \
 
 # Create a temporary file from the new file's interesting section.
 sed $E -e "1,/$header/d" \
-    -e '/^[[:space:]]+===/,$d' \
-    -e '/^[A-Z]+:/!d' \
+    -e '/^Running target /,/^[[:space:]]+===.*Summary ===/!d' \
     -e '/^(WARNING|ERROR):/d' \
     -e 's/\r$//' \
     -e 's/^/N:/' \
Re: [PATCH, GCC/testsuite/ARM] Consistently check for neon in vect effective targets
Hi,

Conditions checked for ARM targets in vector-related effective targets are
inconsistent:

* sometimes arm*-*-* is checked
* sometimes Neon is checked
* sometimes arm_neon_ok and sometimes arm_neon is used for the Neon check
* sometimes check_effective_target_* is used, sometimes is-effective-target

This patch consolidates all of these checks into using is-effective-target
arm_neon. Where little endian was checked, that check is kept.

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2017-06-06  Thomas Preud'homme

    * lib/target-supports.exp (check_effective_target_vect_int): Replace
    current ARM check by ARM NEON's availability check.
    (check_effective_target_vect_intfloat_cvt): Likewise.
    (check_effective_target_vect_uintfloat_cvt): Likewise.
    (check_effective_target_vect_floatint_cvt): Likewise.
    (check_effective_target_vect_floatuint_cvt): Likewise.
    (check_effective_target_vect_shift): Likewise.
    (check_effective_target_whole_vector_shift): Likewise.
    (check_effective_target_vect_bswap): Likewise.
    (check_effective_target_vect_shift_char): Likewise.
    (check_effective_target_vect_long): Likewise.
    (check_effective_target_vect_float): Likewise.
    (check_effective_target_vect_perm): Likewise.
    (check_effective_target_vect_perm_byte): Likewise.
    (check_effective_target_vect_perm_short): Likewise.
    (check_effective_target_vect_widen_sum_hi_to_si_pattern): Likewise.
    (check_effective_target_vect_widen_sum_qi_to_hi): Likewise.
    (check_effective_target_vect_widen_mult_qi_to_hi): Likewise.
    (check_effective_target_vect_widen_mult_hi_to_si): Likewise.
    (check_effective_target_vect_widen_mult_qi_to_hi_pattern): Likewise.
    (check_effective_target_vect_widen_mult_hi_to_si_pattern): Likewise.
    (check_effective_target_vect_widen_shift): Likewise.
    (check_effective_target_vect_extract_even_odd): Likewise.
    (check_effective_target_vect_interleave): Likewise.
    (check_effective_target_vect_multiple_sizes): Likewise.
    (check_effective_target_vect64): Likewise.
    (check_effective_target_vect_max_reduc): Likewise.

Testing: Testsuite shows no regression when targeting ARMv7-A with
-mfpu=neon-vfpv4 and -mfloat-abi=hard or when targeting Cortex-M3 with default
FPU and float ABI (soft). Testing was done with both compare_tests and the
updated dg-cmp-results proposed in
https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01030.html

Is this ok for trunk?

Best regards,

Thomas

On 13/06/17 20:22, Christophe Lyon wrote:
> Hi Thomas,
>
> On 13 June 2017 at 11:08, Thomas Preudhomme wrote:
>> Hi,
>>
>> Conditions checked for ARM targets in vector-related effective targets are
>> inconsistent:
>>
>> * sometimes arm*-*-* is checked
>> * sometimes Neon is checked
>> * sometimes arm_neon_ok and sometimes arm_neon is used for the Neon check
>> * sometimes check_effective_target_* is used, sometimes is-effective-target
>>
>> This patch consolidates all of these checks into using is-effective-target
>> arm_neon. Where little endian was checked, that check is kept.
>>
>> ChangeLog entry is as follows:
>>
>> *** gcc/testsuite/ChangeLog ***
>>
>> 2017-06-06  Thomas Preud'homme
>>
>>     * lib/target-supports.exp (check_effective_target_vect_int): Replace
>>     current ARM check by ARM NEON's availability check.
>>     (check_effective_target_vect_intfloat_cvt): Likewise.
>>     (check_effective_target_vect_uintfloat_cvt): Likewise.
>>     (check_effective_target_vect_floatint_cvt): Likewise.
>>     (check_effective_target_vect_floatuint_cvt): Likewise.
>>     (check_effective_target_vect_shift): Likewise.
>>     (check_effective_target_whole_vector_shift): Likewise.
>>     (check_effective_target_vect_bswap): Likewise.
>>     (check_effective_target_vect_shift_char): Likewise.
>>     (check_effective_target_vect_long): Likewise.
>>     (check_effective_target_vect_float): Likewise.
>>     (check_effective_target_vect_perm): Likewise.
>>     (check_effective_target_vect_perm_byte): Likewise.
>>     (check_effective_target_vect_perm_short): Likewise.
>>     (check_effective_target_vect_widen_sum_hi_to_si_pattern): Likewise.
>>     (check_effective_target_vect_widen_sum_qi_to_hi): Likewise.
>>     (check_effective_target_vect_widen_mult_qi_to_hi): Likewise.
>>     (check_effective_target_vect_widen_mult_hi_to_si): Likewise.
>>     (check_effective_target_vect_widen_mult_qi_to_hi_pattern): Likewise.
>>     (check_effective_target_vect_widen_mult_hi_to_si_pattern): Likewise.
>>     (check_effective_target_vect_widen_shift): Likewise.
>>     (check_effective_target_vect_extract_even_odd): Likewise.
>>     (check_effective_target_vect_interleave): Likewise.
>>     (check_effective_target_vect_multiple_sizes): Likewise.
>>     (check_effective_target_vect64): Likewise.
>>     (check_effective_target_vect_max_reduc): Likewise.
>>
>> Testing: Testsuite show
Re: [PATCH, GCC/testsuite/ARM] Consistently check for neon in vect effective targets
On 19/06/17 08:41, Christophe Lyon wrote:
> Hi Thomas,
>
> On 15 June 2017 at 18:18, Thomas Preudhomme wrote:
>> Hi,
>>
>> Conditions checked for ARM targets in vector-related effective targets are
>> inconsistent:
>>
>> * sometimes arm*-*-* is checked
>> * sometimes Neon is checked
>> * sometimes arm_neon_ok and sometimes arm_neon is used for the Neon check
>> * sometimes check_effective_target_* is used, sometimes is-effective-target
>>
>> This patch consolidates all of these checks into using is-effective-target
>> arm_neon. Where little endian was checked, that check is kept.
>>
>> ChangeLog entry is as follows:
>>
>> *** gcc/testsuite/ChangeLog ***
>>
>> 2017-06-06  Thomas Preud'homme
>>
>>     * lib/target-supports.exp (check_effective_target_vect_int): Replace
>>     current ARM check by ARM NEON's availability check.
>>     (check_effective_target_vect_intfloat_cvt): Likewise.
>>     (check_effective_target_vect_uintfloat_cvt): Likewise.
>>     (check_effective_target_vect_floatint_cvt): Likewise.
>>     (check_effective_target_vect_floatuint_cvt): Likewise.
>>     (check_effective_target_vect_shift): Likewise.
>>     (check_effective_target_whole_vector_shift): Likewise.
>>     (check_effective_target_vect_bswap): Likewise.
>>     (check_effective_target_vect_shift_char): Likewise.
>>     (check_effective_target_vect_long): Likewise.
>>     (check_effective_target_vect_float): Likewise.
>>     (check_effective_target_vect_perm): Likewise.
>>     (check_effective_target_vect_perm_byte): Likewise.
>>     (check_effective_target_vect_perm_short): Likewise.
>>     (check_effective_target_vect_widen_sum_hi_to_si_pattern): Likewise.
>>     (check_effective_target_vect_widen_sum_qi_to_hi): Likewise.
>>     (check_effective_target_vect_widen_mult_qi_to_hi): Likewise.
>>     (check_effective_target_vect_widen_mult_hi_to_si): Likewise.
>>     (check_effective_target_vect_widen_mult_qi_to_hi_pattern): Likewise.
>>     (check_effective_target_vect_widen_mult_hi_to_si_pattern): Likewise.
>>     (check_effective_target_vect_widen_shift): Likewise.
>>     (check_effective_target_vect_extract_even_odd): Likewise.
>>     (check_effective_target_vect_interleave): Likewise.
>>     (check_effective_target_vect_multiple_sizes): Likewise.
>>     (check_effective_target_vect64): Likewise.
>>     (check_effective_target_vect_max_reduc): Likewise.
>>
>> Testing: Testsuite shows no regression when targeting ARMv7-A with
>> -mfpu=neon-vfpv4 and -mfloat-abi=hard or when targeting Cortex-M3 with
>> default FPU and float ABI (soft). Testing was done with both compare_tests
>> and the updated dg-cmp-results proposed in
>> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01030.html
>>
>> Is this ok for trunk?
>
> I applied your patch on top of r249233, and noticed quite a few changes:
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/249233-consistent_neon_check.patch/report-build-info.html
>
> Note that "Big-Regression" cases are caused by the fact that there are
> PASS->XPASS changes and XFAILs disappear with your patch, and many
> (3000-4000) PASS disappear.
>
> Is that intended?

It certainly is not. I'd like to investigate this but the link to results for
rev 249233 is broken. Could you provide me with the results you have for that
so that I can compare manually?

Best regards,

Thomas
Re: [PATCH, contrib] Support multi-tool sum files in dg-cmp-results.sh
Wrong copy and paste between the patch I tested and the patch I sent. The
first and second commands of the sed invocation should be replaced, not the
second and third as in the patch I sent. To be safe, I'll rerun the tests.

Best regards,

Thomas

On 15/06/17 17:15, Thomas Preudhomme wrote:
> Forgetting the patch: check! Sending it later as a reply to the wrong
> message: check! Hopefully I won't tick either of those boxes a second time.
>
> Best regards,
>
> Thomas
>
> On 14/06/17 13:30, Thomas Preudhomme wrote:
>> Hi,
>>
>> The dg-cmp-results.sh contrib script is written to work with sum files for
>> a single tool only. It throws away the header, including the first ===
>> line, and everything starting from the following === line, assuming that
>> what remains is the test results. This does not work well for sum files
>> with results for multiple tools.
>>
>> This patch changes the logic to instead keep everything between the
>> "Running target" line and the beginning of the Summary line. The other
>> existing filters ensure that only FAIL, PASS, etc. lines are kept after
>> that.
>>
>> ChangeLog entry is as follows:
>>
>> *** contrib/ChangeLog ***
>>
>> 2017-06-14  Thomas Preud'homme
>>
>>     * dg-cmp-results.sh: Keep test result lines rather than throwing header
>>     and summary to support sum files with multiple tools.
>>
>> Tested successfully on a sum file with a single tool, giving similar
>> results, and on a sum file with multiple tools, which now shows a
>> regression with the patch proposed in
>> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00875.html
>>
>> Is this ok for trunk?
>>
>> Best regards,
>>
>> Thomas
Re: [PATCH, GCC/testsuite/ARM] Consistently check for neon in vect effective targets
On 19/06/17 10:16, Thomas Preudhomme wrote:
> On 19/06/17 08:41, Christophe Lyon wrote:
>> Hi Thomas,
>>
>> On 15 June 2017 at 18:18, Thomas Preudhomme wrote:
>>> Hi,
>>>
>>> Conditions checked for ARM targets in vector-related effective targets
>>> are inconsistent:
>>>
>>> * sometimes arm*-*-* is checked
>>> * sometimes Neon is checked
>>> * sometimes arm_neon_ok and sometimes arm_neon is used for the Neon check
>>> * sometimes check_effective_target_* is used, sometimes
>>>   is-effective-target
>>>
>>> This patch consolidates all of these checks into using
>>> is-effective-target arm_neon. Where little endian was checked, that check
>>> is kept.
>>>
>>> ChangeLog entry is as follows:
>>>
>>> *** gcc/testsuite/ChangeLog ***
>>>
>>> 2017-06-06  Thomas Preud'homme
>>>
>>>     * lib/target-supports.exp (check_effective_target_vect_int): Replace
>>>     current ARM check by ARM NEON's availability check.
>>>     (check_effective_target_vect_intfloat_cvt): Likewise.
>>>     (check_effective_target_vect_uintfloat_cvt): Likewise.
>>>     (check_effective_target_vect_floatint_cvt): Likewise.
>>>     (check_effective_target_vect_floatuint_cvt): Likewise.
>>>     (check_effective_target_vect_shift): Likewise.
>>>     (check_effective_target_whole_vector_shift): Likewise.
>>>     (check_effective_target_vect_bswap): Likewise.
>>>     (check_effective_target_vect_shift_char): Likewise.
>>>     (check_effective_target_vect_long): Likewise.
>>>     (check_effective_target_vect_float): Likewise.
>>>     (check_effective_target_vect_perm): Likewise.
>>>     (check_effective_target_vect_perm_byte): Likewise.
>>>     (check_effective_target_vect_perm_short): Likewise.
>>>     (check_effective_target_vect_widen_sum_hi_to_si_pattern): Likewise.
>>>     (check_effective_target_vect_widen_sum_qi_to_hi): Likewise.
>>>     (check_effective_target_vect_widen_mult_qi_to_hi): Likewise.
>>>     (check_effective_target_vect_widen_mult_hi_to_si): Likewise.
>>>     (check_effective_target_vect_widen_mult_qi_to_hi_pattern): Likewise.
>>>     (check_effective_target_vect_widen_mult_hi_to_si_pattern): Likewise.
>>>     (check_effective_target_vect_widen_shift): Likewise.
>>>     (check_effective_target_vect_extract_even_odd): Likewise.
>>>     (check_effective_target_vect_interleave): Likewise.
>>>     (check_effective_target_vect_multiple_sizes): Likewise.
>>>     (check_effective_target_vect64): Likewise.
>>>     (check_effective_target_vect_max_reduc): Likewise.
>>>
>>> Testing: Testsuite shows no regression when targeting ARMv7-A with
>>> -mfpu=neon-vfpv4 and -mfloat-abi=hard or when targeting Cortex-M3 with
>>> default FPU and float ABI (soft). Testing was done with both
>>> compare_tests and the updated dg-cmp-results proposed in
>>> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01030.html
>>>
>>> Is this ok for trunk?
>>
>> I applied your patch on top of r249233, and noticed quite a few changes:
>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/249233-consistent_neon_check.patch/report-build-info.html
>>
>> Note that "Big-Regression" cases are caused by the fact that there are
>> PASS->XPASS changes and XFAILs disappear with your patch, and many
>> (3000-4000) PASS disappear.
>>
>> Is that intended?
>
> It certainly is not. I'd like to investigate this but the link to results
> for rev 249233 is broken. Could you provide me with the results you have for
> that so that I can compare manually?

Actually yes it is, at least for the configurations with the default FPU
(which still uses -mfpu=vfp in r249233) or a VFP FPU (whatever version). I've
checked all the ->NA and ->UNSUPPORTED changes for the arm-none-linux-gnueabi
configuration and none of the affected tests has a dg directive to select the
neon unit (such as dg-additional-options). I've also looked at the
arm-none-linux-gnueabihf configuration with neon FPU and there is no
regression there.

I therefore think this is all normal and expected. Note that under current
trunk this should be different because neon-fp16 would be selected instead of
vfp for the default FPU with Cortex-A9.

Best regards,

Thomas
Re: [PATCH, GCC/testsuite/ARM] Consistently check for neon in vect effective targets
On 19/06/17 15:31, Christophe Lyon wrote:
> On 19 June 2017 at 16:11, Thomas Preudhomme wrote:
>> On 19/06/17 10:16, Thomas Preudhomme wrote:
>>> On 19/06/17 08:41, Christophe Lyon wrote:
>>>> Hi Thomas,
>>>>
>>>> On 15 June 2017 at 18:18, Thomas Preudhomme wrote:
>>>>> Hi,
>>>>>
>>>>> Conditions checked for ARM targets in vector-related effective targets
>>>>> are inconsistent:
>>>>>
>>>>> * sometimes arm*-*-* is checked
>>>>> * sometimes Neon is checked
>>>>> * sometimes arm_neon_ok and sometimes arm_neon is used for the Neon
>>>>>   check
>>>>> * sometimes check_effective_target_* is used, sometimes
>>>>>   is-effective-target
>>>>>
>>>>> This patch consolidates all of these checks into using
>>>>> is-effective-target arm_neon. Where little endian was checked, that
>>>>> check is kept.
>>>>>
>>>>> ChangeLog entry is as follows:
>>>>>
>>>>> *** gcc/testsuite/ChangeLog ***
>>>>>
>>>>> 2017-06-06  Thomas Preud'homme
>>>>>
>>>>>     * lib/target-supports.exp (check_effective_target_vect_int): Replace
>>>>>     current ARM check by ARM NEON's availability check.
>>>>>     (check_effective_target_vect_intfloat_cvt): Likewise.
>>>>>     (check_effective_target_vect_uintfloat_cvt): Likewise.
>>>>>     (check_effective_target_vect_floatint_cvt): Likewise.
>>>>>     (check_effective_target_vect_floatuint_cvt): Likewise.
>>>>>     (check_effective_target_vect_shift): Likewise.
>>>>>     (check_effective_target_whole_vector_shift): Likewise.
>>>>>     (check_effective_target_vect_bswap): Likewise.
>>>>>     (check_effective_target_vect_shift_char): Likewise.
>>>>>     (check_effective_target_vect_long): Likewise.
>>>>>     (check_effective_target_vect_float): Likewise.
>>>>>     (check_effective_target_vect_perm): Likewise.
>>>>>     (check_effective_target_vect_perm_byte): Likewise.
>>>>>     (check_effective_target_vect_perm_short): Likewise.
>>>>>     (check_effective_target_vect_widen_sum_hi_to_si_pattern): Likewise.
>>>>>     (check_effective_target_vect_widen_sum_qi_to_hi): Likewise.
>>>>>     (check_effective_target_vect_widen_mult_qi_to_hi): Likewise.
>>>>>     (check_effective_target_vect_widen_mult_hi_to_si): Likewise.
>>>>>     (check_effective_target_vect_widen_mult_qi_to_hi_pattern): Likewise.
>>>>>     (check_effective_target_vect_widen_mult_hi_to_si_pattern): Likewise.
>>>>>     (check_effective_target_vect_widen_shift): Likewise.
>>>>>     (check_effective_target_vect_extract_even_odd): Likewise.
>>>>>     (check_effective_target_vect_interleave): Likewise.
>>>>>     (check_effective_target_vect_multiple_sizes): Likewise.
>>>>>     (check_effective_target_vect64): Likewise.
>>>>>     (check_effective_target_vect_max_reduc): Likewise.
>>>>>
>>>>> Testing: Testsuite shows no regression when targeting ARMv7-A with
>>>>> -mfpu=neon-vfpv4 and -mfloat-abi=hard or when targeting Cortex-M3 with
>>>>> default FPU and float ABI (soft). Testing was done with both
>>>>> compare_tests and the updated dg-cmp-results proposed in
>>>>> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01030.html
>>>>>
>>>>> Is this ok for trunk?
>>>>
>>>> I applied your patch on top of r249233, and noticed quite a few changes:
>>>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/249233-consistent_neon_check.patch/report-build-info.html
>>>>
>>>> Note that "Big-Regression" cases are caused by the fact that there are
>>>> PASS->XPASS changes and XFAILs disappear with your patch, and many
>>>> (3000-4000) PASS disappear.
>>>>
>>>> Is that intended?
>>>
>>> It certainly is not. I'd like to investigate this but the link to results
>>> for rev 249233 is broken. Could you provide me with the results you have
>>> for that so that I can compare manually?
>>
>> Actually yes it is, at least for the configurations with the default FPU
>> (which still uses -mfpu=vfp in r249233) or a VFP FPU (whatever version).
>> I've checked all the ->NA and ->UNSUPPORTED changes for the
>> arm-none-linux-gnueabi configuration and none of the affected tests has a
>> dg directive to select the neon unit (such as dg-additional-options). I've
>> also looked at the arm-none-linux-gnueabihf configuration with neon FPU and
>> there is no regression there.
>>
>> I therefore think this is all normal and expected. Note that under current
>> trunk this should be different because neon-fp16 would be selected instead
>> of vfp for the default FPU with Cortex-A9.
>
> OK, thanks for checking. So the version you sent on June 15th is OK?

Yes.

> I can start a validation against current trunk, after Richard's series, it
> probably makes sense, doesn't it?

I think it'll give cleaner results, yes. Note that the configurations with an
explicit -mfpu=vfp* without neon will still have a lot of changes, but at
least the ones with the default FPU should be more readable.

> Thanks,
>
> Christophe

Best regards,

Thomas
[arm-embedded] [PATCH, ARM] Implement __ARM_FEATURE_COPROC coprocessor intrinsic feature macro
Hi,

We have decided to apply the following patch to the ARM/embedded-6-branch and
ARM/embedded-7-branch to implement the __ARM_FEATURE_COPROC coprocessor
intrinsic feature macro.

2017-06-20  Thomas Preud'homme

    Backport from mainline
    2017-06-20  Prakhar Bahuguna

    gcc/
    * config/arm/arm-c.c (arm_cpu_builtins): New block to define
    __ARM_FEATURE_COPROC according to support.

    gcc/testsuite/
    * gcc.target/arm/acle/cdp.c: Add feature macro bitmap test.
    * gcc.target/arm/acle/cdp2.c: Likewise.
    * gcc.target/arm/acle/ldc.c: Likewise.
    * gcc.target/arm/acle/ldc2.c: Likewise.
    * gcc.target/arm/acle/ldc2l.c: Likewise.
    * gcc.target/arm/acle/ldcl.c: Likewise.
    * gcc.target/arm/acle/mcr.c: Likewise.
    * gcc.target/arm/acle/mcr2.c: Likewise.
    * gcc.target/arm/acle/mcrr.c: Likewise.
    * gcc.target/arm/acle/mcrr2.c: Likewise.
    * gcc.target/arm/acle/mrc.c: Likewise.
    * gcc.target/arm/acle/mrc2.c: Likewise.
    * gcc.target/arm/acle/mrrc.c: Likewise.
    * gcc.target/arm/acle/mrrc2.c: Likewise.
    * gcc.target/arm/acle/stc.c: Likewise.
    * gcc.target/arm/acle/stc2.c: Likewise.
    * gcc.target/arm/acle/stc2l.c: Likewise.
    * gcc.target/arm/acle/stcl.c: Likewise.

Best regards,

Thomas

--- Begin Message ---
On 16/06/2017 15:37:18, Richard Earnshaw (lists) wrote:
> On 16/06/17 08:48, Prakhar Bahuguna wrote:
> > On 15/06/2017 17:23:43, Richard Earnshaw (lists) wrote:
> >> On 14/06/17 10:35, Prakhar Bahuguna wrote:
> >>> The ARM ACLE defines the __ARM_FEATURE_COPROC macro which indicates which
> >>> coprocessor intrinsics are available for the target. If
> >>> __ARM_FEATURE_COPROC is undefined, the target does not support
> >>> coprocessor intrinsics. The feature levels are defined as follows:
> >>>
> >>> +---------+-----------+--------------------------------------------------+
> >>> | **Bit** | **Value** | **Intrinsics Available**                         |
> >>> +---------+-----------+--------------------------------------------------+
> >>> | 0       | 0x1       | __arm_cdp, __arm_ldc, __arm_ldcl, __arm_stc,     |
> >>> |         |           | __arm_stcl, __arm_mcr and __arm_mrc              |
> >>> +---------+-----------+--------------------------------------------------+
> >>> | 1       | 0x2       | __arm_cdp2, __arm_ldc2, __arm_stc2, __arm_ldc2l, |
> >>> |         |           | __arm_stc2l, __arm_mcr2 and __arm_mrc2           |
> >>> +---------+-----------+--------------------------------------------------+
> >>> | 2       | 0x4       | __arm_mcrr and __arm_mrrc                        |
> >>> +---------+-----------+--------------------------------------------------+
> >>> | 3       | 0x8       | __arm_mcrr2 and __arm_mrrc2                      |
> >>> +---------+-----------+--------------------------------------------------+
> >>>
> >>> This patch implements full support for this feature macro as defined in
> >>> section 5.9 of the ACLE
> >>> (https://developer.arm.com/products/software-development-tools/compilers/arm-compiler-5/docs/101028/latest/5-feature-test-macros).
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>> 2017-06-14  Prakhar Bahuguna
> >>>
> >>>     * config/arm/arm-c.c (arm_cpu_builtins): New block to define
> >>>     __ARM_FEATURE_COPROC according to support.
> >>>
> >>> 2017-06-14  Prakhar Bahuguna
> >>>
> >>>     * gcc/testsuite/gcc.target/arm/acle/cdp.c: Add feature macro bitmap
> >>>     test.
> >>>     * gcc/testsuite/gcc.target/arm/acle/cdp2.c: Likewise.
> >>>     * gcc/testsuite/gcc.target/arm/acle/ldc.c: Likewise.
> >>>     * gcc/testsuite/gcc.target/arm/acle/ldc2.c: Likewise.
> >>>     * gcc/testsuite/gcc.target/arm/acle/ldc2l.c: Likewise.
> >>>     * gcc/testsuite/gcc.target/arm/acle/ldcl.c: Likewise.
> >>>     * gcc/testsuite/gcc.target/arm/acle/mcr.c: Likewise.
> >>>     * gcc/testsuite/gcc.target/arm/acle/mcr2.c: Likewise.
> >>>     * gcc/testsuite/gcc.target/arm/acle/mcrr.c: Likewise.
> >>>     * gcc/testsuite/gcc.target/arm/acle/mcrr2.c: Likewise.
> >>>     * gcc/testsuite/gcc.target/arm/acle/mrc.c: Likewise.
> >>>     * gcc/testsuite/gcc.target/arm/acle/mrc2.c: Likewise.
> >>>     * gcc/testsuite/gcc.target/arm/acle/mrrc.c: Likewise.
> >>>     * gcc/testsuite/gcc.target/arm/acle/mrrc2.c: Likewise.
> >>>     * gcc/testsuite/gcc.target/arm/acle/stc.c: Likewise.
> >>>     * gcc/testsuite/gcc.target/arm/acle/stc2.c: Likewise.
> >>>     * gcc/testsuite/gcc.target/arm/acle/stc2l.c: Likewise.
> >>>     * gcc/testsuite/gcc.target/arm/acle/stcl.c: Likewise.
> >>>
> >>> Testing done: ACLE regression tests updated with tests for feature macro
> >>> bits. All regression tests pass.
> >>>
> >>> Okay for trunk?
> >>>
> >>>
> >>> 0001-Implement-__ARM_FEATURE_COPROC-coprocessor-intrinsic.patch
> >>>
> >>>
> >>> From 79d71aec9d2bdee936b240ae49368ff5f8d8fc48 Mon Sep 17 00:00:00 2001
> >>> From: Prakhar Bahuguna
> >>> Date: Tue, 2 May 2017 13:43:40 +0100
> >>> Subject: [PAT
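As an illustration of the bitmap described in the quoted table, the following minimal sketch decodes a `__ARM_FEATURE_COPROC` value. The `COPROC_*` names and the two helper functions are hypothetical, introduced only for this example; only the macro name and the bit values come from the message above. Where the macro is undefined the sketch uses a mask of 0, in line with the statement that such a target supports no coprocessor intrinsics.

```c
#include <assert.h>

/* Bit values from the table in the message above; the COPROC_* names are
   hypothetical and exist only for this sketch.  */
#define COPROC_BASE  0x1u /* __arm_cdp, __arm_ldc, __arm_ldcl, __arm_stc,
                             __arm_stcl, __arm_mcr, __arm_mrc */
#define COPROC_BASE2 0x2u /* __arm_cdp2, __arm_ldc2, __arm_stc2, __arm_ldc2l,
                             __arm_stc2l, __arm_mcr2, __arm_mrc2 */
#define COPROC_MCRR  0x4u /* __arm_mcrr, __arm_mrrc */
#define COPROC_MCRR2 0x8u /* __arm_mcrr2, __arm_mrrc2 */

/* Return the coprocessor feature bitmap of the current target, or 0 when the
   compiler does not define __ARM_FEATURE_COPROC.  */
unsigned
target_coproc_mask (void)
{
#ifdef __ARM_FEATURE_COPROC
  return __ARM_FEATURE_COPROC;
#else
  return 0u;
#endif
}

/* Return nonzero when every bit of WANTED is set in MASK.  */
int
coproc_supports (unsigned mask, unsigned wanted)
{
  /* Asking for no feature at all is almost certainly a caller bug.  */
  assert (wanted != 0u);
  return (mask & wanted) == wanted;
}
```

In real code the check would more likely be done in the preprocessor, for instance by guarding a use of `__arm_mcrr` with a test of bit 0x4, but a run-time helper keeps the example easy to exercise on any host.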
[arm-embedded] [PATCH, GCC/LTO, ping] Fix PR69866: LTO with def for weak alias in regular object file
Hi,

We have decided to apply the referenced fix (r249352) to the
ARM/embedded-6-branch along with its initial commit (r249224) to fix an ICE
with LTO and aliases.

Fix PR69866

2017-06-20  Thomas Preud'homme

    Backport from mainline
    2017-06-15  Jan Hubicka
                Thomas Preud'homme

    gcc/
    PR lto/69866
    * lto-symtab.c (lto_symtab_merge_symbols): Drop useless definitions
    that resolved externally.

    2017-06-15  Thomas Preud'homme

    gcc/testsuite/
    PR lto/69866
    * gcc.dg/lto/pr69866_0.c: New test.
    * gcc.dg/lto/pr69866_1.c: Likewise.

    Backport from mainline
    2017-06-18  Jan Hubicka

    gcc/testsuite/
    * gcc.dg/lto/pr69866_0.c: This test needs alias.

Best regards,

Thomas

--- Begin Message ---
> The new test fails on darwin with the usual
>
> FAIL: gcc.dg/lto/pr69866 c_lto_pr69866_0.o-c_lto_pr69866_1.o link, -O0 -flto -flto-partition=none
>
> IMO it requires a
>
> /* { dg-require-alias "" } */

Yep, I will add it shortly.

Honza

>
> directive.
>
> TIA
>
> Dominique
--- End Message ---
[arm-embedded] [PATCH, GCC/ARM, Stage 1] Rename FPSCR builtins to correct names
Hi,

We have decided to apply the following patch to the embedded-6-branch to fix
the naming of an ARM intrinsic.

ChangeLog entry is as follows:

2017-06-20  Thomas Preud'homme

    Backport from mainline
    2017-05-04  Prakhar Bahuguna

    gcc/
    * gcc/config/arm/arm-builtins.c (arm_init_builtins): Rename
    __builtin_arm_ldfscr to __builtin_arm_get_fpscr, and rename
    __builtin_arm_stfscr to __builtin_arm_set_fpscr.

    gcc/testsuite/
    * gcc.target/arm/fpscr.c: New file.

Best regards,

Thomas

--- Begin Message ---
Hi Prakhar,

Sorry for the delay,

On 22/03/17 10:46, Prakhar Bahuguna wrote:
> The GCC documentation in section 6.60.8 ARM Floating Point Status and
> Control Intrinsics states that the FPSCR register can be read and written to
> using the intrinsics __builtin_arm_get_fpscr and __builtin_arm_set_fpscr.
> However, these are misnamed within GCC itself and these intrinsic names are
> not recognised. This patch corrects the intrinsic names to match the
> documentation, and adds tests to verify these intrinsics generate the
> correct instructions.
>
> Testing done: Ran regression tests on arm-none-eabi for Cortex-M4.
>
> 2017-03-09  Prakhar Bahuguna
>
> gcc/ChangeLog:
>
>     * gcc/config/arm/arm-builtins.c (arm_init_builtins): Rename
>     __builtin_arm_ldfscr to __builtin_arm_get_fpscr, and rename
>     __builtin_arm_stfscr to __builtin_arm_set_fpscr.
>     * gcc/testsuite/gcc.target/arm/fpscr.c: New file.
>
> Okay for stage 1?

I see that the mistake was in not addressing one of the review comments in:
https://gcc.gnu.org/ml/gcc-patches/2014-04/msg01832.html
properly in the patch that added these functions :(

This is ok for stage 1 if a bootstrap and test on arm-none-linux-gnueabihf
works fine.

I don't think we want to maintain the __builtin_arm_[ld,st]fscr names for
backwards compatibility as they were not documented and are __builtin_arm*
functions that we don't guarantee to maintain.

Thanks,

Kyrill

> --
> Prakhar Bahuguna
--- End Message ---
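To illustrate how the renamed intrinsics are meant to be used, here is a minimal sketch of a read-modify-write of FPSCR. It is an illustration under assumptions, not code from the patch: `read_fpscr`, `write_fpscr`, `update_fpscr` and `simulated_fpscr` are hypothetical names, the `__arm__`/`__ARM_FP` guard is an assumption about when the builtins are available, and a compiler without the rename would still need the old names. On any other host a plain variable stands in for the register so that the sketch still compiles.

```c
#include <assert.h>

#if defined(__arm__) && defined(__ARM_FP)
/* ARM target with hardware floating point (assumed guard): use the
   builtins under the names this patch gives them.  */
unsigned read_fpscr (void) { return __builtin_arm_get_fpscr (); }
void write_fpscr (unsigned value) { __builtin_arm_set_fpscr (value); }
#else
/* Any other host: a plain variable stands in for the register.  */
static unsigned simulated_fpscr;
unsigned read_fpscr (void) { return simulated_fpscr; }
void write_fpscr (unsigned value) { simulated_fpscr = value; }
#endif

/* Clear the bits in CLEAR_BITS, set the bits in SET_BITS, and return the
   value FPSCR held before the update.  Hypothetical helper.  */
unsigned
update_fpscr (unsigned clear_bits, unsigned set_bits)
{
  unsigned old;

  /* A bit cannot sensibly be both cleared and set in one update.  */
  assert ((clear_bits & set_bits) == 0u);

  old = read_fpscr ();
  write_fpscr ((old & ~clear_bits) | set_bits);
  return old;
}
```

Returning the previous value lets a caller restore FPSCR afterwards with a single `write_fpscr` call.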
[PATCH, GCC/ARM] Remove ARMv8-M code for D17-D31
Hi,

Function cmse_nonsecure_entry_clear_before_return has code to deal with high
VFP registers (D16-D31), while neither ARMv8-M Baseline nor ARMv8-M Mainline
supports more than 16 double VFP registers (D0-D15). This makes this
security-sensitive code harder to read for not much benefit, since the libcall
for cmse_nonsecure_call functions does not deal with those high VFP registers
anyway.

This commit gets rid of this code for simplicity and fixes 2 issues in the
same function:

- stop the first loop when reaching maxregno to avoid dealing with VFP
  registers if targeting Thumb-1 or using -mfloat-abi=soft
- include maxregno in that loop

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2017-06-13  Thomas Preud'homme

    * config/arm/arm.c (arm_option_override): Forbid ARMv8-M Security
    Extensions with more than 16 double VFP registers.
    (cmse_nonsecure_entry_clear_before_return): Remove second entry of
    to_clear_mask and all code related to it and make the remaining entry a
    64-bit scalar integer variable and adapt code accordingly.

Testing: Testsuite shows no regression when run for ARMv8-M Baseline and
ARMv8-M Mainline.

Is this ok for trunk?

Best regards,

Thomas

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 259597d8890ee84c5bd92b12b6f9f6521c8dcd2e..60a4d1f46765d285de469f51fbb5a0ad76d56d9b 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3620,6 +3620,11 @@ arm_option_override (void)
   if (use_cmse && !arm_arch_cmse)
     error ("target CPU does not support ARMv8-M Security Extensions");
 
+  /* We don't clear D16-D31 VFP registers for cmse_nonsecure_call functions
+     and ARMv8-M Baseline and Mainline do not allow such configuration.  */
+  if (use_cmse && LAST_VFP_REGNUM > LAST_LO_VFP_REGNUM)
+    error ("ARMv8-M Security Extensions incompatible with selected FPU");
+
   /* Disable scheduling fusion by default if it's not armv7 processor
      or doesn't prefer ldrd/strd.  */
   if (flag_schedule_fusion == 2
@@ -24996,15 +25001,15 @@ thumb1_expand_prologue (void)
 void
 cmse_nonsecure_entry_clear_before_return (void)
 {
-  uint64_t to_clear_mask[2];
+  uint64_t to_clear_mask;
   uint32_t padding_bits_to_clear = 0;
   uint32_t * padding_bits_to_clear_ptr = &padding_bits_to_clear;
   int regno, maxregno = IP_REGNUM;
   tree result_type;
   rtx result_rtl;
 
-  to_clear_mask[0] = (1ULL << (NUM_ARG_REGS)) - 1;
-  to_clear_mask[0] |= (1ULL << IP_REGNUM);
+  to_clear_mask = (1ULL << (NUM_ARG_REGS)) - 1;
+  to_clear_mask |= (1ULL << IP_REGNUM);
 
   /* If we are not dealing with -mfloat-abi=soft we will need to clear VFP
      registers.  We also check that TARGET_HARD_FLOAT and !TARGET_THUMB1 hold
@@ -25015,23 +25020,22 @@ cmse_nonsecure_entry_clear_before_return (void)
       maxregno = LAST_VFP_REGNUM;
 
       float_mask &= ~((1ULL << FIRST_VFP_REGNUM) - 1);
-      to_clear_mask[0] |= float_mask;
-
-      float_mask = (1ULL << (maxregno - 63)) - 1;
-      to_clear_mask[1] = float_mask;
+      to_clear_mask |= float_mask;
 
       /* Make sure we don't clear the two scratch registers used to clear the
          relevant FPSCR bits in output_return_instruction.  */
       emit_use (gen_rtx_REG (SImode, IP_REGNUM));
-      to_clear_mask[0] &= ~(1ULL << IP_REGNUM);
+      to_clear_mask &= ~(1ULL << IP_REGNUM);
       emit_use (gen_rtx_REG (SImode, 4));
-      to_clear_mask[0] &= ~(1ULL << 4);
+      to_clear_mask &= ~(1ULL << 4);
     }
 
+  gcc_assert ((unsigned) maxregno <= sizeof (to_clear_mask) * __CHAR_BIT__);
+
   /* If the user has defined registers to be caller saved, these are no longer
      restored by the function before returning and must thus be cleared for
      security purposes.  */
-  for (regno = NUM_ARG_REGS; regno < LAST_VFP_REGNUM; regno++)
+  for (regno = NUM_ARG_REGS; regno <= maxregno; regno++)
     {
       /* We do not touch registers that can be used to pass arguments as per
          the AAPCS, since these should never be made callee-saved by user
@@ -25041,7 +25045,7 @@ cmse_nonsecure_entry_clear_before_return (void)
       if (IN_RANGE (regno, IP_REGNUM, PC_REGNUM))
         continue;
       if (call_used_regs[regno])
-        to_clear_mask[regno / 64] |= (1ULL << (regno % 64));
+        to_clear_mask |= (1ULL << regno);
     }
 
   /* Make sure we do not clear the registers used to return the result in.  */
@@ -25052,7 +25056,7 @@ cmse_nonsecure_entry_clear_before_return (void)
 
       /* No need to check that we return in registers, because we don't
          support returning on stack yet.  */
-      to_clear_mask[0]
+      to_clear_mask
         &= ~compute_not_to_clear_mask (result_type, result_rtl, 0,
                                        padding_bits_to_clear_ptr);
     }
@@ -25063,7 +25067,7 @@ cmse_nonsecure_entry_clear_before_return (void)
       /* Padding bits to clear is not 0 so we know we are dealing with
          returning a composite type, which only uses r0.  Let's make sure that
          r1-r3 is cleared too, we will use r1 as a scratch register.  */
-      gcc_assert ((to_cle
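The two loop fixes described in this message (stop at maxregno, and include maxregno itself) can be sketched in isolation as follows. This is a simplified stand-alone illustration, not the GCC code: `build_clear_mask` and its `call_used` parameter are hypothetical, the register numbers in the enum are illustrative assumptions, and the hard-float adjustments from the patch are left out. The sketch also asserts a strict bound of fewer than 64 registers, a choice made here so that every shift stays well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative register numbers, assumed for this sketch only.  */
enum { NUM_ARG_REGS = 4, IP_REGNUM = 12, PC_REGNUM = 15 };

/* Build a single 64-bit mask of registers to clear.  CALL_USED must have at
   least MAXREGNO + 1 entries; a nonzero entry marks a caller-saved
   register.  Hypothetical helper.  */
uint64_t
build_clear_mask (int maxregno, const unsigned char *call_used)
{
  uint64_t mask;
  int regno;

  /* One bit per register: every register number must fit in the mask.  */
  assert (maxregno >= 0 && maxregno < 64);

  /* Argument registers and IP are always candidates for clearing.  */
  mask = (UINT64_C (1) << NUM_ARG_REGS) - 1;
  mask |= UINT64_C (1) << IP_REGNUM;

  /* Inclusive upper bound: MAXREGNO itself is examined, and nothing
     beyond it is.  */
  for (regno = NUM_ARG_REGS; regno <= maxregno; regno++)
    {
      /* Skip the IP..PC range, as the original loop does.  */
      if (regno >= IP_REGNUM && regno <= PC_REGNUM)
        continue;
      if (call_used[regno])
        mask |= UINT64_C (1) << regno;
    }
  return mask;
}
```

With a scalar mask there is no `regno / 64` indexing left, so an off-by-one in the loop bound changes exactly one bit, which is what the second fix addresses.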
Re: [PATCH, contrib] Support multi-tool sum files in dg-cmp-results.sh
Hi Mike,

Sorry, there was a mistake in the patch I sent. Please find an updated patch
below.

ChangeLog entry unchanged:

*** contrib/ChangeLog ***

2017-06-14  Thomas Preud'homme

    * dg-cmp-results.sh: Keep test result lines rather than throwing header
    and summary to support sum files with multiple tools.

Is this still ok?

Best regards,

Thomas

On 19/06/17 16:55, Mike Stump wrote:
> On Jun 14, 2017, at 5:30 AM, Thomas Preudhomme wrote:
>> 2017-06-14  Thomas Preud'homme
>>
>>     * dg-cmp-results.sh: Keep test result lines rather than throwing header
>>     and summary to support sum files with multiple tools.
>>
>> Tested successfully on sum file with single tool with similar results and
>> on sum file with multiple tools now showing a regression with patch
>> proposed in https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00875.html
>>
>> Is this ok for trunk?
>
> Ok.

diff --git a/contrib/dg-cmp-results.sh b/contrib/dg-cmp-results.sh
index d291769547dcd2a02ecf6f80d60d6be7802af4fd..921e9337d1f8ffea78ef566c351fb48a8f6ca064 100755
--- a/contrib/dg-cmp-results.sh
+++ b/contrib/dg-cmp-results.sh
@@ -90,8 +90,7 @@ echo "Newer log file: $NFILE"
 sed $E -e '/^[[:space:]]+===/,$d' $NFILE
 
 # Create a temporary file from the old file's interesting section.
-sed $E -e "1,/$header/d" \
-    -e '/^[[:space:]]+===/,$d' \
+sed $E -e '/^Running target /,/^[[:space:]]+===.*Summary ===/!d' \
     -e '/^[A-Z]+:/!d' \
     -e '/^(WARNING|ERROR):/d' \
     -e 's/\r$//' \
@@ -101,8 +100,7 @@ sed $E -e "1,/$header/d" \
     >/tmp/o$$-$OBASE
 
 # Create a temporary file from the new file's interesting section.
-sed $E -e "1,/$header/d" \
-    -e '/^[[:space:]]+===/,$d' \
+sed $E -e '/^Running target /,/^[[:space:]]+===.*Summary ===/!d' \
     -e '/^[A-Z]+:/!d' \
     -e '/^(WARNING|ERROR):/d' \
     -e 's/\r$//' \
[PATCH, GCC/contrib] Fix variant selection in dg-cmp-results.sh
Hi,

Commit r249422 to dg-cmp-results.sh broke the variant selection feature where
one can restrict the regression test to a specific target variant. This fix
restores the feature.

ChangeLog entry is as follows:

*** contrib/ChangeLog ***

2017-06-21  Thomas Preud'homme

    * dg-cmp-results.sh: Restore filtering on target variant.

Tested on a file with multiple variants which now gives sane results.

Is this ok for trunk?

Best regards,

Thomas

diff --git a/contrib/dg-cmp-results.sh b/contrib/dg-cmp-results.sh
index 921e9337d1f8ffea78ef566c351fb48a8f6ca064..5f2fed5ec3ff0c66d22bc07c84571568730fbcac 100755
--- a/contrib/dg-cmp-results.sh
+++ b/contrib/dg-cmp-results.sh
@@ -90,7 +90,7 @@ echo "Newer log file: $NFILE"
 sed $E -e '/^[[:space:]]+===/,$d' $NFILE
 
 # Create a temporary file from the old file's interesting section.
-sed $E -e '/^Running target /,/^[[:space:]]+===.*Summary ===/!d' \
+sed $E -e "/$header/,/^[[:space:]]+===.*Summary ===/!d" \
     -e '/^[A-Z]+:/!d' \
     -e '/^(WARNING|ERROR):/d' \
     -e 's/\r$//' \
@@ -100,7 +100,7 @@ sed $E -e '/^Running target /,/^[[:space:]]+===.*Summary ===/!d' \
     >/tmp/o$$-$OBASE
 
 # Create a temporary file from the new file's interesting section.
-sed $E -e '/^Running target /,/^[[:space:]]+===.*Summary ===/!d' \
+sed $E -e "/$header/,/^[[:space:]]+===.*Summary ===/!d" \
     -e '/^[A-Z]+:/!d' \
     -e '/^(WARNING|ERROR):/d' \
     -e 's/\r$//' \
Re: [PATCH, GCC/ARM, Stage 1] Rename FPSCR builtins to correct names
Hi Kyrill, On 10/04/17 15:01, Kyrill Tkachov wrote: Hi Prakhar, Sorry for the delay, On 22/03/17 10:46, Prakhar Bahuguna wrote: The GCC documentation in section 6.60.8 ARM Floating Point Status and Control Intrinsics states that the FPSCR register can be read and written to using the intrinsics __builtin_arm_get_fpscr and __builtin_arm_set_fpscr. However, these are misnamed within GCC itself and these intrinsic names are not recognised. This patch corrects the intrinsic names to match the documentation, and adds tests to verify these intrinsics generate the correct instructions. Testing done: Ran regression tests on arm-none-eabi for Cortex-M4. 2017-03-09 Prakhar Bahuguna gcc/ChangeLog: * gcc/config/arm/arm-builtins.c (arm_init_builtins): Rename __builtin_arm_ldfscr to __builtin_arm_get_fpscr, and rename __builtin_arm_stfscr to __builtin_arm_set_fpscr. * gcc/testsuite/gcc.target/arm/fpscr.c: New file. Okay for stage 1? I see that the mistake was in not addressing one of the review comments in: https://gcc.gnu.org/ml/gcc-patches/2014-04/msg01832.html properly in the patch that added these functions :( This is ok for stage 1 if a bootstrap and test on arm-none-linux-gnueabihf works fine I don't think we want to maintain the __builtin_arm_[ld,st]fscr names for backwards compatibility as they were not documented and are __builtin_arm* functions that we don't guarantee to maintain. How about a backport to GCC 5, 6 & 7? The patch applied cleanly on each of these versions and the testsuite didn't show any regression for any of the backport when run for Cortex-M7. Patches attached for reference. ChangeLog entries: *** gcc/ChangeLog *** 2017-06-20 Thomas Preud'homme Backport from mainline 2017-05-04 Prakhar Bahuguna * gcc/config/arm/arm-builtins.c (arm_init_builtins): Rename __builtin_arm_ldfscr to __builtin_arm_get_fpscr, and rename __builtin_arm_stfscr to __builtin_arm_set_fpscr. 
*** gcc/testsuite/ChangeLog *** 2017-06-20 Thomas Preud'homme Backport from mainline 2017-05-04 Prakhar Bahuguna gcc/testsuite/ * gcc.target/arm/fpscr.c: New file. Best regards, Thomas diff --git a/gcc/ChangeLog b/gcc/ChangeLog index da321440384628fb1770ff9e96377b341c61da6a..ab0e7c0167ac287b774378c3ecfb15a37d5362e7 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,12 @@ +2017-06-20 Thomas Preud'homme + + Backport from mainline + 2017-05-04 Prakhar Bahuguna + + * gcc/config/arm/arm-builtins.c (arm_init_builtins): Rename + __builtin_arm_ldfscr to __builtin_arm_get_fpscr, and rename + __builtin_arm_stfscr to __builtin_arm_set_fpscr. + 2017-06-22 Martin Liska Backport from mainline diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index 6f4fd9bdb9774b942f7f51145a406258a82ac1e7..edd6dac6ab73d24447e8c9f6e39c5ba22fbf9302 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -1747,10 +1747,10 @@ arm_init_builtins (void) = build_function_type_list (unsigned_type_node, NULL); arm_builtin_decls[ARM_BUILTIN_GET_FPSCR] - = add_builtin_function ("__builtin_arm_ldfscr", ftype_get_fpscr, + = add_builtin_function ("__builtin_arm_get_fpscr", ftype_get_fpscr, ARM_BUILTIN_GET_FPSCR, BUILT_IN_MD, NULL, NULL_TREE); arm_builtin_decls[ARM_BUILTIN_SET_FPSCR] - = add_builtin_function ("__builtin_arm_stfscr", ftype_set_fpscr, + = add_builtin_function ("__builtin_arm_set_fpscr", ftype_set_fpscr, ARM_BUILTIN_SET_FPSCR, BUILT_IN_MD, NULL, NULL_TREE); } } diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index b411b9dbc108f12bd1931f57d3f4c1f315161ca0..a865ed054597c12de76a953fcf751209c1e4b84c 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,10 @@ +2017-06-20 Thomas Preud'homme + + Backport from mainline + 2017-05-04 Prakhar Bahuguna + + * gcc.target/arm/fpscr.c: New file. 
+ 2017-06-22 Martin Liska Backport from mainline diff --git a/gcc/testsuite/gcc.target/arm/fpscr.c b/gcc/testsuite/gcc.target/arm/fpscr.c new file mode 100644 index ..7b4d71d72d8964f6da0d0604bf59aeb4a895df43 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/fpscr.c @@ -0,0 +1,16 @@ +/* Test the fpscr builtins. */ + +/* { dg-do compile } */ +/* { dg-require-effective-target arm_fp_ok } */ +/* { dg-skip-if "need fp instructions" { *-*-* } { "-mfloat-abi=soft" } { "" } } */ +/* { dg-add-options arm_fp } */ + +void +test_fpscr () +{ + volatile unsigned int status = __builtin_arm_get_fpscr (); + __builtin_arm_set_fpscr (status); +} + +/* { dg-final { scan-assembler "mrc\tp10, 7, r\[0-9\]+, cr1, cr0, 0" } } */ +/* { dg-final { scan-assembler "mcr\tp10, 7, r\[0-9\]+, cr1, cr0, 0" } } */ diff --git a/gcc/ChangeLog b/gcc/ChangeLog index b24b70c7ef819ea3b45b6019b0db4ad37c6dfce8..61578113c5e3dd8cadcaf5e234f0cd5bb7cced38 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,12 @@ +2017-06-20 Thomas Preud
Re: [PATCH, GCC/ARM, Stage 1] Rename FPSCR builtins to correct names
Hi Christophe,

On 23/06/17 20:10, Christophe Lyon wrote:

Hi Thomas,

On 23 June 2017 at 17:48, Thomas Preudhomme wrote:

Hi Kyrill,

On 10/04/17 15:01, Kyrill Tkachov wrote:

Hi Prakhar,

Sorry for the delay,

On 22/03/17 10:46, Prakhar Bahuguna wrote:

The GCC documentation in section 6.60.8 ARM Floating Point Status and Control Intrinsics states that the FPSCR register can be read and written to using the intrinsics __builtin_arm_get_fpscr and __builtin_arm_set_fpscr. However, these are misnamed within GCC itself and these intrinsic names are not recognised. This patch corrects the intrinsic names to match the documentation, and adds tests to verify these intrinsics generate the correct instructions.

Testing done: Ran regression tests on arm-none-eabi for Cortex-M4.

2017-03-09 Prakhar Bahuguna

gcc/ChangeLog:

* gcc/config/arm/arm-builtins.c (arm_init_builtins): Rename __builtin_arm_ldfscr to __builtin_arm_get_fpscr, and rename __builtin_arm_stfscr to __builtin_arm_set_fpscr.
* gcc/testsuite/gcc.target/arm/fpscr.c: New file.

Okay for stage 1?

I see that the mistake was in not addressing one of the review comments in: https://gcc.gnu.org/ml/gcc-patches/2014-04/msg01832.html properly in the patch that added these functions :(

This is ok for stage 1 if a bootstrap and test on arm-none-linux-gnueabihf works fine.

I don't think we want to maintain the __builtin_arm_[ld,st]fscr names for backwards compatibility as they were not documented and are __builtin_arm* functions that we don't guarantee to maintain.

How about a backport to GCC 5, 6 & 7? The patch applied cleanly on each of these versions and the testsuite didn't show any regression for any of the backports when run for Cortex-M7.

There's a problem with GCC-5:

gcc.target/arm/fpscr.c: unknown effective target keyword `arm_fp_ok' for " dg-require-effective-target 4 arm_fp_ok "

Indeed arm_fp_ok effective-target does not exist in the gcc-5 branch.

Oh no.
I remember not seeing anything but I can indeed see this with compare_tests from the sum file I save after each testing. Alright, what is done is done, working on a patch now. Best regards, Thomas
[PATCH, GCC/ARM, gcc-5-branch] Fix gcc.target/arm/fpscr.c
Hi,

As raised by Christophe Lyon, fpscr.c FAILs because arm_fp_ok and arm_fp are not defined in GCC 5. This commit changes the test to use the same recipe as gcc.target/arm/cmp-2.c

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2017-06-26 Thomas Preud'homme

* gcc.target/arm/fpscr.c: Require arm_vfp_ok instead of arm_fp_ok and add -mfpu=vfp -mfloat-abi=softfp instead of fp_ok options.

Ok for GCC 5?

Best regards,

Thomas

diff --git a/gcc/testsuite/gcc.target/arm/fpscr.c b/gcc/testsuite/gcc.target/arm/fpscr.c
index 7b4d71d72d8964f6da0d0604bf59aeb4a895df43..cafba4e8d67545bd210477230b9682fe86620e23 100644
--- a/gcc/testsuite/gcc.target/arm/fpscr.c
+++ b/gcc/testsuite/gcc.target/arm/fpscr.c
@@ -1,9 +1,9 @@
 /* Test the fpscr builtins. */

 /* { dg-do compile } */
-/* { dg-require-effective-target arm_fp_ok } */
+/* { dg-require-effective-target arm_vfp_ok } */
 /* { dg-skip-if "need fp instructions" { *-*-* } { "-mfloat-abi=soft" } { "" } } */
-/* { dg-add-options arm_fp } */
+/* { dg-options "-mfpu=vfp -mfloat-abi=softfp" } */

 void
 test_fpscr ()
Re: [PATCH, ARM] Implement __ARM_FEATURE_COPROC coprocessor intrinsic feature macro
Hi Christophe, On 21/06/17 17:57, Christophe Lyon wrote: Hi, On 19 June 2017 at 11:32, Richard Earnshaw (lists) wrote: On 16/06/17 15:56, Prakhar Bahuguna wrote: On 16/06/2017 15:37:18, Richard Earnshaw (lists) wrote: On 16/06/17 08:48, Prakhar Bahuguna wrote: On 15/06/2017 17:23:43, Richard Earnshaw (lists) wrote: On 14/06/17 10:35, Prakhar Bahuguna wrote: The ARM ACLE defines the __ARM_FEATURE_COPROC macro which indicates which coprocessor intrinsics are available for the target. If __ARM_FEATURE_COPROC is undefined, the target does not support coprocessor intrinsics. The feature levels are defined as follows: +-+---+--+ | **Bit** | **Value** | **Intrinsics Available** | +-+---+--+ | 0 | 0x1 | __arm_cdp __arm_ldc, __arm_ldcl, __arm_stc, | | | | __arm_stcl, __arm_mcr and __arm_mrc | +-+---+--+ | 1 | 0x2 | __arm_cdp2, __arm_ldc2, __arm_stc2, __arm_ldc2l, | | | | __arm_stc2l, __arm_mcr2 and __arm_mrc2 | +-+---+--+ | 2 | 0x4 | __arm_mcrr and __arm_mrrc| +-+---+--+ | 3 | 0x8 | __arm_mcrr2 and __arm_mrrc2 | +-+---+--+ This patch implements full support for this feature macro as defined in section 5.9 of the ACLE (https://developer.arm.com/products/software-development-tools/compilers/arm-compiler-5/docs/101028/latest/5-feature-test-macros). gcc/ChangeLog: 2017-06-14 Prakhar Bahuguna * config/arm/arm-c.c (arm_cpu_builtins): New block to define __ARM_FEATURE_COPROC according to support. 2017-06-14 Prakhar Bahuguna * gcc/testsuite/gcc.target/arm/acle/cdp.c: Add feature macro bitmap test. * gcc/testsuite/gcc.target/arm/acle/cdp2.c: Likewise. * gcc/testsuite/gcc.target/arm/acle/ldc.c: Likewise. * gcc/testsuite/gcc.target/arm/acle/ldc2.c: Likewise. * gcc/testsuite/gcc.target/arm/acle/ldc2l.c: Likewise. * gcc/testsuite/gcc.target/arm/acle/ldcl.c: Likewise. * gcc/testsuite/gcc.target/arm/acle/mcr.c: Likewise. * gcc/testsuite/gcc.target/arm/acle/mcr2.c: Likewise. * gcc/testsuite/gcc.target/arm/acle/mcrr.c: Likewise. * gcc/testsuite/gcc.target/arm/acle/mcrr2.c: Likewise. 
* gcc/testsuite/gcc.target/arm/acle/mrc.c: Likewise. * gcc/testsuite/gcc.target/arm/acle/mrc2.c: Likewise. * gcc/testsuite/gcc.target/arm/acle/mrrc.c: Likewise. * gcc/testsuite/gcc.target/arm/acle/mrrc2.c: Likewise. * gcc/testsuite/gcc.target/arm/acle/stc.c: Likewise. * gcc/testsuite/gcc.target/arm/acle/stc2.c: Likewise. * gcc/testsuite/gcc.target/arm/acle/stc2l.c: Likewise. * gcc/testsuite/gcc.target/arm/acle/stcl.c: Likewise. Testing done: ACLE regression tests updated with tests for feature macro bits. All regression tests pass. Okay for trunk? 0001-Implement-__ARM_FEATURE_COPROC-coprocessor-intrinsic.patch From 79d71aec9d2bdee936b240ae49368ff5f8d8fc48 Mon Sep 17 00:00:00 2001 From: Prakhar Bahuguna Date: Tue, 2 May 2017 13:43:40 +0100 Subject: [PATCH] Implement __ARM_FEATURE_COPROC coprocessor intrinsic feature macro --- gcc/config/arm/arm-c.c| 19 +++ gcc/testsuite/gcc.target/arm/acle/cdp.c | 3 +++ gcc/testsuite/gcc.target/arm/acle/cdp2.c | 3 +++ gcc/testsuite/gcc.target/arm/acle/ldc.c | 3 +++ gcc/testsuite/gcc.target/arm/acle/ldc2.c | 3 +++ gcc/testsuite/gcc.target/arm/acle/ldc2l.c | 3 +++ gcc/testsuite/gcc.target/arm/acle/ldcl.c | 3 +++ gcc/testsuite/gcc.target/arm/acle/mcr.c | 3 +++ gcc/testsuite/gcc.target/arm/acle/mcr2.c | 3 +++ gcc/testsuite/gcc.target/arm/acle/mcrr.c | 3 +++ gcc/testsuite/gcc.target/arm/acle/mcrr2.c | 3 +++ gcc/testsuite/gcc.target/arm/acle/mrc.c | 3 +++ gcc/testsuite/gcc.target/arm/acle/mrc2.c | 3 +++ gcc/testsuite/gcc.target/arm/acle/mrrc.c | 3 +++ gcc/testsuite/gcc.target/arm/acle/mrrc2.c | 3 +++ gcc/testsuite/gcc.target/arm/acle/stc.c | 3 +++ gcc/testsuite/gcc.target/arm/acle/stc2.c | 3 +++ gcc/testsuite/gcc.target/arm/acle/stc2l.c | 3 +++ gcc/testsuite/gcc.target/arm/acle/stcl.c | 3 +++ 19 files changed, 73 insertions(+) diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c index 3abe7d1f1f5..3daf4e5e1f3 100644 --- a/gcc/config/arm/arm-c.c +++ b/gcc/config/arm/arm-c.c @@ -200,6 +200,25 @@ arm_cpu_builtins (struct cpp_reader* 
pfile) def_or_undef_macro (pfile, "__ARM_FEATURE_IDIV", TARGET_IDIV); def_or_undef_macro (pfile, "__ARM_ASM_SYNTAX_UNIFIED__", inline_asm_unified); + + if ((!TARGET_THUMB || TARGET_THUMB2) && arm_arch4 && (!TARGET_THU
Re: [PATCH, ARM] Implement __ARM_FEATURE_COPROC coprocessor intrinsic feature macro
On 26/06/17 15:16, Christophe Lyon wrote: On 26 June 2017 at 16:09, Thomas Preudhomme wrote: Hi Christophe, On 21/06/17 17:57, Christophe Lyon wrote: Hi, On 19 June 2017 at 11:32, Richard Earnshaw (lists) wrote: On 16/06/17 15:56, Prakhar Bahuguna wrote: On 16/06/2017 15:37:18, Richard Earnshaw (lists) wrote: On 16/06/17 08:48, Prakhar Bahuguna wrote: On 15/06/2017 17:23:43, Richard Earnshaw (lists) wrote: On 14/06/17 10:35, Prakhar Bahuguna wrote: The ARM ACLE defines the __ARM_FEATURE_COPROC macro which indicates which coprocessor intrinsics are available for the target. If __ARM_FEATURE_COPROC is undefined, the target does not support coprocessor intrinsics. The feature levels are defined as follows: +-+---+--+ | **Bit** | **Value** | **Intrinsics Available** | +-+---+--+ | 0 | 0x1 | __arm_cdp __arm_ldc, __arm_ldcl, __arm_stc, | | | | __arm_stcl, __arm_mcr and __arm_mrc | +-+---+--+ | 1 | 0x2 | __arm_cdp2, __arm_ldc2, __arm_stc2, __arm_ldc2l, | | | | __arm_stc2l, __arm_mcr2 and __arm_mrc2 | +-+---+--+ | 2 | 0x4 | __arm_mcrr and __arm_mrrc | +-+---+--+ | 3 | 0x8 | __arm_mcrr2 and __arm_mrrc2 | +-+---+--+ This patch implements full support for this feature macro as defined in section 5.9 of the ACLE (https://developer.arm.com/products/software-development-tools/compilers/arm-compiler-5/docs/101028/latest/5-feature-test-macros). gcc/ChangeLog: 2017-06-14 Prakhar Bahuguna * config/arm/arm-c.c (arm_cpu_builtins): New block to define __ARM_FEATURE_COPROC according to support. 2017-06-14 Prakhar Bahuguna * gcc/testsuite/gcc.target/arm/acle/cdp.c: Add feature macro bitmap test. * gcc/testsuite/gcc.target/arm/acle/cdp2.c: Likewise. * gcc/testsuite/gcc.target/arm/acle/ldc.c: Likewise. * gcc/testsuite/gcc.target/arm/acle/ldc2.c: Likewise. * gcc/testsuite/gcc.target/arm/acle/ldc2l.c: Likewise. * gcc/testsuite/gcc.target/arm/acle/ldcl.c: Likewise. * gcc/testsuite/gcc.target/arm/acle/mcr.c: Likewise. * gcc/testsuite/gcc.target/arm/acle/mcr2.c: Likewise. 
* gcc/testsuite/gcc.target/arm/acle/mcrr.c: Likewise. * gcc/testsuite/gcc.target/arm/acle/mcrr2.c: Likewise. * gcc/testsuite/gcc.target/arm/acle/mrc.c: Likewise. * gcc/testsuite/gcc.target/arm/acle/mrc2.c: Likewise. * gcc/testsuite/gcc.target/arm/acle/mrrc.c: Likewise. * gcc/testsuite/gcc.target/arm/acle/mrrc2.c: Likewise. * gcc/testsuite/gcc.target/arm/acle/stc.c: Likewise. * gcc/testsuite/gcc.target/arm/acle/stc2.c: Likewise. * gcc/testsuite/gcc.target/arm/acle/stc2l.c: Likewise. * gcc/testsuite/gcc.target/arm/acle/stcl.c: Likewise. Testing done: ACLE regression tests updated with tests for feature macro bits. All regression tests pass. Okay for trunk? 0001-Implement-__ARM_FEATURE_COPROC-coprocessor-intrinsic.patch From 79d71aec9d2bdee936b240ae49368ff5f8d8fc48 Mon Sep 17 00:00:00 2001 From: Prakhar Bahuguna Date: Tue, 2 May 2017 13:43:40 +0100 Subject: [PATCH] Implement __ARM_FEATURE_COPROC coprocessor intrinsic feature macro --- gcc/config/arm/arm-c.c| 19 +++ gcc/testsuite/gcc.target/arm/acle/cdp.c | 3 +++ gcc/testsuite/gcc.target/arm/acle/cdp2.c | 3 +++ gcc/testsuite/gcc.target/arm/acle/ldc.c | 3 +++ gcc/testsuite/gcc.target/arm/acle/ldc2.c | 3 +++ gcc/testsuite/gcc.target/arm/acle/ldc2l.c | 3 +++ gcc/testsuite/gcc.target/arm/acle/ldcl.c | 3 +++ gcc/testsuite/gcc.target/arm/acle/mcr.c | 3 +++ gcc/testsuite/gcc.target/arm/acle/mcr2.c | 3 +++ gcc/testsuite/gcc.target/arm/acle/mcrr.c | 3 +++ gcc/testsuite/gcc.target/arm/acle/mcrr2.c | 3 +++ gcc/testsuite/gcc.target/arm/acle/mrc.c | 3 +++ gcc/testsuite/gcc.target/arm/acle/mrc2.c | 3 +++ gcc/testsuite/gcc.target/arm/acle/mrrc.c | 3 +++ gcc/testsuite/gcc.target/arm/acle/mrrc2.c | 3 +++ gcc/testsuite/gcc.target/arm/acle/stc.c | 3 +++ gcc/testsuite/gcc.target/arm/acle/stc2.c | 3 +++ gcc/testsuite/gcc.target/arm/acle/stc2l.c | 3 +++ gcc/testsuite/gcc.target/arm/acle/stcl.c | 3 +++ 19 files changed, 73 insertions(+) diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c index 3abe7d1f1f5..3daf4e5e1f3 100644 --- 
a/gcc/config/arm/arm-c.c +++ b/gcc/config/arm/arm-c.c @@ -200,6 +200,25 @@ arm_cpu_builtins (struct cpp_reader* pfile) def_or_undef_macro (pfile, "__ARM_FEATURE_IDIV", TARGET_IDIV); def_or_undef_macro (pfile, "__ARM_ASM_SYNTAX_UNIFIED__", inline_a
Re: [PATCH, ARM] Implement __ARM_FEATURE_COPROC coprocessor intrinsic feature macro
On 26/06/17 17:01, Thomas Preudhomme wrote:

On 26/06/17 15:16, Christophe Lyon wrote:

You mean the macro is expected not to be defined on ARMv8-A?

Correct. Most of the instructions its value represents are not available on ARMv8-A, and for those that are, the intrinsics are deprecated.

I've just noticed that many such instructions not available on ARMv8-A are accepted by GNU as. I would like to enable/disable the coprocessor intrinsics tests based on what GNU as reports regarding the availability of these instructions, so hold on a bit more.

Best regards,

Thomas
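As an illustration of how the __ARM_FEATURE_COPROC bitmap quoted earlier in this thread is meant to be consumed, here is a minimal C sketch. The bit values come from the table in the patch description; the enumerator names and the helpers coproc_feature_mask and coproc_group_available are hypothetical and exist only for this example. When the macro is undefined, the sketch treats every group as unavailable, matching the statement that an undefined macro means no coprocessor intrinsics.

```c
#include <assert.h>

/* Bit values from the ACLE table quoted in this thread.  The enumerator
   names are made up for this sketch.  */
enum coproc_group {
    COPROC_CDP_LDC_STC_MCR_MRC = 0x1, /* __arm_cdp, __arm_ldc, __arm_ldcl,
                                         __arm_stc, __arm_stcl, __arm_mcr,
                                         __arm_mrc */
    COPROC_V2_VARIANTS         = 0x2, /* __arm_cdp2, __arm_ldc2, __arm_stc2,
                                         __arm_ldc2l, __arm_stc2l,
                                         __arm_mcr2, __arm_mrc2 */
    COPROC_MCRR_MRRC           = 0x4, /* __arm_mcrr, __arm_mrrc */
    COPROC_MCRR2_MRRC2         = 0x8  /* __arm_mcrr2, __arm_mrrc2 */
};

/* Return the feature bitmap seen by this translation unit, or 0 when the
   compiler does not define __ARM_FEATURE_COPROC for the target.  */
unsigned coproc_feature_mask(void)
{
#ifdef __ARM_FEATURE_COPROC
    return (unsigned)__ARM_FEATURE_COPROC;
#else
    return 0u;
#endif
}

/* Return 1 if MASK advertises the intrinsics of GROUP.  */
int coproc_group_available(unsigned mask, enum coproc_group group)
{
    assert(group == COPROC_CDP_LDC_STC_MCR_MRC
           || group == COPROC_V2_VARIANTS
           || group == COPROC_MCRR_MRRC
           || group == COPROC_MCRR2_MRRC2);
    return (mask & (unsigned)group) != 0;
}
```

Code that wants, say, __arm_mcrr could guard its use with coproc_group_available (coproc_feature_mask (), COPROC_MCRR_MRRC) or, more commonly, with the equivalent preprocessor test on the macro itself.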
Re: [PATCH, GCC/ARM, gcc-5-branch, ping] Fix gcc.target/arm/fpscr.c
Ping?

Best regards,

Thomas

On 26/06/17 12:32, Thomas Preudhomme wrote:

Hi,

As raised by Christophe Lyon, fpscr.c FAILs because arm_fp_ok and arm_fp are not defined in GCC 5. This commit changes the test to use the same recipe as gcc.target/arm/cmp-2.c

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2017-06-26 Thomas Preud'homme

* gcc.target/arm/fpscr.c: Require arm_vfp_ok instead of arm_fp_ok and add -mfpu=vfp -mfloat-abi=softfp instead of fp_ok options.

Ok for GCC 5?

Best regards,

Thomas

diff --git a/gcc/testsuite/gcc.target/arm/fpscr.c b/gcc/testsuite/gcc.target/arm/fpscr.c
index 7b4d71d72d8964f6da0d0604bf59aeb4a895df43..cafba4e8d67545bd210477230b9682fe86620e23 100644
--- a/gcc/testsuite/gcc.target/arm/fpscr.c
+++ b/gcc/testsuite/gcc.target/arm/fpscr.c
@@ -1,9 +1,9 @@
 /* Test the fpscr builtins. */

 /* { dg-do compile } */
-/* { dg-require-effective-target arm_fp_ok } */
+/* { dg-require-effective-target arm_vfp_ok } */
 /* { dg-skip-if "need fp instructions" { *-*-* } { "-mfloat-abi=soft" } { "" } } */
-/* { dg-add-options arm_fp } */
+/* { dg-options "-mfpu=vfp -mfloat-abi=softfp" } */

 void
 test_fpscr ()
Re: [PATCH, GCC/testsuite/ARM] Consistently check for neon in vect effective targets
On 20/06/17 13:44, Christophe Lyon wrote: The results with a more recent trunk (r249356)) are here: http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/249356-consistent_neon_check.patch/report-build-info.html They are slightly different, but still tedious to check ;-) I've checked arm-none-linux-gnueabi and arm-none-linux-gnueabihf and found that: * there's no new FAIL * changes to UNSUPPORTED and NA are for the same files * changes are only for tests in a vect directory * changes for arm-none-linux-gnueabihf are only when targeting vfp without neon (tests are disabled because there is no vector unit) Changes to arm-none-linux-gnueabi makes sense since this defaults to soft floating point and none of the test disabled adds any option to select another variant. I believe this all makes sense. Therefore, is this ok to commit? Best regards, Thomas diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index ded6383cc1f9a1489cd83e1dace0c2fc48e252c3..aa8550c9d2cf0ae7e157d9c67fa06ad811651421 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -2916,7 +2916,7 @@ proc check_effective_target_vect_int { } { || [istarget alpha*-*-*] || [istarget ia64-*-*] || [istarget aarch64*-*-*] - || [check_effective_target_arm32] + || [is-effective-target arm_neon] || ([istarget mips*-*-*] && ([et-is-effective-target mips_loongson] || [et-is-effective-target mips_msa])) } { @@ -2944,8 +2944,7 @@ proc check_effective_target_vect_intfloat_cvt { } { if { [istarget i?86-*-*] || [istarget x86_64-*-*] || ([istarget powerpc*-*-*] && ![istarget powerpc-*-linux*paired*]) - || ([istarget arm*-*-*] - && [check_effective_target_arm_neon_ok]) + || [is-effective-target arm_neon] || ([istarget mips*-*-*] && [et-is-effective-target mips_msa]) } { set et_vect_intfloat_cvt_saved($et_index) 1 @@ -2987,8 +2986,7 @@ proc check_effective_target_vect_uintfloat_cvt { } { || ([istarget powerpc*-*-*] && 
![istarget powerpc-*-linux*paired*]) || [istarget aarch64*-*-*] - || ([istarget arm*-*-*] - && [check_effective_target_arm_neon_ok]) + || [is-effective-target arm_neon] || ([istarget mips*-*-*] && [et-is-effective-target mips_msa]) } { set et_vect_uintfloat_cvt_saved($et_index) 1 @@ -3016,8 +3014,7 @@ proc check_effective_target_vect_floatint_cvt { } { if { [istarget i?86-*-*] || [istarget x86_64-*-*] || ([istarget powerpc*-*-*] && ![istarget powerpc-*-linux*paired*]) - || ([istarget arm*-*-*] - && [check_effective_target_arm_neon_ok]) + || [is-effective-target arm_neon] || ([istarget mips*-*-*] && [et-is-effective-target mips_msa]) } { set et_vect_floatint_cvt_saved($et_index) 1 @@ -3043,8 +3040,7 @@ proc check_effective_target_vect_floatuint_cvt { } { set et_vect_floatuint_cvt_saved($et_index) 0 if { ([istarget powerpc*-*-*] && ![istarget powerpc-*-linux*paired*]) - || ([istarget arm*-*-*] - && [check_effective_target_arm_neon_ok]) + || [is-effective-target arm_neon] || ([istarget mips*-*-*] && [et-is-effective-target mips_msa]) } { set et_vect_floatuint_cvt_saved($et_index) 1 @@ -4903,7 +4899,7 @@ proc check_effective_target_vect_shift { } { || [istarget ia64-*-*] || [istarget i?86-*-*] || [istarget x86_64-*-*] || [istarget aarch64*-*-*] - || [check_effective_target_arm32] + || [is-effective-target arm_neon] || ([istarget mips*-*-*] && ([et-is-effective-target mips_msa] || [et-is-effective-target mips_loongson])) } { @@ -4921,7 +4917,7 @@ proc check_effective_target_whole_vector_shift { } { || [istarget ia64-*-*] || [istarget aarch64*-*-*] || [istarget powerpc64*-*-*] - || ([check_effective_target_arm32] + || ([is-effective-target arm_neon] && [check_effective_target_arm_little_endian]) || ([istarget mips*-*-*] && [et-is-effective-target mips_loongson]) } { @@ -4945,8 +4941,7 @@ proc check_effective_target_vect_bswap { } { } else { set et_vect_bswap_saved($et_index) 0 if { [istarget aarch64*-*-*] - || ([istarget arm*-*-*] -&& [check_effective_target_arm_neon]) + 
|| [is-effective-target arm_neon] } { set et_vect_bswap_saved($et_index) 1 } @@ -4969,7 +4964,7 @@ proc check_effective_target_vect_shift_char { } { set et_vect_shift_char_saved($et_index) 0 if { ([istarget powerpc*-*-*] && ![istarget powerpc-*-linux*paired*]) - || [check_effective_target_arm32] + || [is-effective-target arm_neon] || ([istarget mips*-*-*] && [et-is-effective-target mips_msa]) } { set et_vect_shift_char_saved($et_index) 1 @@ -4987,10 +4982,10 @@ proc check_effective_target_vect_shift_char { } { proc check_effective_target_vect_long { } {
Re: [PATCH, GCC/testsuite/ARM] Consistently check for neon in vect effective targets
On 28/06/17 15:59, Kyrill Tkachov wrote:

Hi Thomas,

On 28/06/17 15:49, Thomas Preudhomme wrote:

On 20/06/17 13:44, Christophe Lyon wrote:

The results with a more recent trunk (r249356) are here: http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/249356-consistent_neon_check.patch/report-build-info.html

They are slightly different, but still tedious to check ;-)

I've checked arm-none-linux-gnueabi and arm-none-linux-gnueabihf and found that:

* there's no new FAIL
* changes to UNSUPPORTED and NA are for the same files
* changes are only for tests in a vect directory
* changes for arm-none-linux-gnueabihf are only when targeting vfp without neon (tests are disabled because there is no vector unit)

Changes to arm-none-linux-gnueabi make sense since this defaults to soft floating point and none of the disabled tests adds any option to select another variant.

I believe this all makes sense. Therefore, is this ok to commit?

Best regards,

Thomas

@@ -4987,10 +4982,10 @@ proc check_effective_target_vect_shift_char { } {
 proc check_effective_target_vect_long { } {
     if { [istarget i?86-*-*] || [istarget x86_64-*-*]
-         || (([istarget powerpc*-*-*]
-              && ![istarget powerpc-*-linux*paired*])
+         || (([istarget powerpc*-*-*]
+              && ![istarget powerpc-*-linux*paired*])
              && [check_effective_target_ilp32])

Is this just a whitespace change? If it is intended then okay.

It is yes, trailing whitespace. I took the liberty to fix it because I was changing some other issues in the same procedure.

This is okay with a ChangeLog entry.

Sorry, I should have pasted it again from the initial message.

2017-06-06 Thomas Preud'homme

* lib/target-supports.exp (check_effective_target_vect_int): Replace current ARM check by ARM NEON's availability check.
(check_effective_target_vect_intfloat_cvt): Likewise.
(check_effective_target_vect_uintfloat_cvt): Likewise.
(check_effective_target_vect_floatint_cvt): Likewise.
(check_effective_target_vect_floatuint_cvt): Likewise.
(check_effective_target_vect_shift): Likewise.
(check_effective_target_whole_vector_shift): Likewise.
(check_effective_target_vect_bswap): Likewise.
(check_effective_target_vect_shift_char): Likewise.
(check_effective_target_vect_long): Likewise.
(check_effective_target_vect_float): Likewise.
(check_effective_target_vect_perm): Likewise.
(check_effective_target_vect_perm_byte): Likewise.
(check_effective_target_vect_perm_short): Likewise.
(check_effective_target_vect_widen_sum_hi_to_si_pattern): Likewise.
(check_effective_target_vect_widen_sum_qi_to_hi): Likewise.
(check_effective_target_vect_widen_mult_qi_to_hi): Likewise.
(check_effective_target_vect_widen_mult_hi_to_si): Likewise.
(check_effective_target_vect_widen_mult_qi_to_hi_pattern): Likewise.
(check_effective_target_vect_widen_mult_hi_to_si_pattern): Likewise.
(check_effective_target_vect_widen_shift): Likewise.
(check_effective_target_vect_extract_even_odd): Likewise.
(check_effective_target_vect_interleave): Likewise.
(check_effective_target_vect_multiple_sizes): Likewise.
(check_effective_target_vect64): Likewise.
(check_effective_target_vect_max_reduc): Likewise.

Thanks, this looks like a good change.

Kyrill

Thanks!

Best regards,

Thomas
Re: [PATCH, GCC/ARM, ping] Remove ARMv8-M code for D17-D31
Ping?

*** gcc/ChangeLog ***

2017-06-13 Thomas Preud'homme

* config/arm/arm.c (arm_option_override): Forbid ARMv8-M Security Extensions with more than 16 double VFP registers.
(cmse_nonsecure_entry_clear_before_return): Remove second entry of to_clear_mask and all code related to it and make the remaining entry a 64-bit scalar integer variable and adapt code accordingly.

Best regards,

Thomas

On 20/06/17 16:01, Thomas Preudhomme wrote:

Hi,

Function cmse_nonsecure_entry_clear_before_return has code to deal with the high VFP registers (D16-D31), while neither ARMv8-M Baseline nor Mainline supports more than 16 double VFP registers (D0-D15). This makes this security-sensitive code harder to read for not much benefit, since libcalls for cmse_nonsecure_call functions do not deal with those high VFP registers anyway.

This commit gets rid of this code for simplicity and fixes 2 issues in the same function:

- stop the first loop when reaching maxregno to avoid dealing with VFP registers if targeting Thumb-1 or using -mfloat-abi=soft
- include maxregno in that loop

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2017-06-13 Thomas Preud'homme

* config/arm/arm.c (arm_option_override): Forbid ARMv8-M Security Extensions with more than 16 double VFP registers.
(cmse_nonsecure_entry_clear_before_return): Remove second entry of to_clear_mask and all code related to it and make the remaining entry a 64-bit scalar integer variable and adapt code accordingly.

Testing: Testsuite shows no regression when run for ARMv8-M Baseline and ARMv8-M Mainline.

Is this ok for trunk?
Best regards, Thomas diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 259597d8890ee84c5bd92b12b6f9f6521c8dcd2e..60a4d1f46765d285de469f51fbb5a0ad76d56d9b 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -3620,6 +3620,11 @@ arm_option_override (void) if (use_cmse && !arm_arch_cmse) error ("target CPU does not support ARMv8-M Security Extensions"); + /* We don't clear D16-D31 VFP registers for cmse_nonsecure_call functions + and ARMv8-M Baseline and Mainline do not allow such configuration. */ + if (use_cmse && LAST_VFP_REGNUM > LAST_LO_VFP_REGNUM) +error ("ARMv8-M Security Extensions incompatible with selected FPU"); + /* Disable scheduling fusion by default if it's not armv7 processor or doesn't prefer ldrd/strd. */ if (flag_schedule_fusion == 2 @@ -24996,15 +25001,15 @@ thumb1_expand_prologue (void) void cmse_nonsecure_entry_clear_before_return (void) { - uint64_t to_clear_mask[2]; + uint64_t to_clear_mask; uint32_t padding_bits_to_clear = 0; uint32_t * padding_bits_to_clear_ptr = &padding_bits_to_clear; int regno, maxregno = IP_REGNUM; tree result_type; rtx result_rtl; - to_clear_mask[0] = (1ULL << (NUM_ARG_REGS)) - 1; - to_clear_mask[0] |= (1ULL << IP_REGNUM); + to_clear_mask = (1ULL << (NUM_ARG_REGS)) - 1; + to_clear_mask |= (1ULL << IP_REGNUM); /* If we are not dealing with -mfloat-abi=soft we will need to clear VFP registers. We also check that TARGET_HARD_FLOAT and !TARGET_THUMB1 hold @@ -25015,23 +25020,22 @@ cmse_nonsecure_entry_clear_before_return (void) maxregno = LAST_VFP_REGNUM; float_mask &= ~((1ULL << FIRST_VFP_REGNUM) - 1); - to_clear_mask[0] |= float_mask; - - float_mask = (1ULL << (maxregno - 63)) - 1; - to_clear_mask[1] = float_mask; + to_clear_mask |= float_mask; /* Make sure we don't clear the two scratch registers used to clear the relevant FPSCR bits in output_return_instruction. 
*/ emit_use (gen_rtx_REG (SImode, IP_REGNUM)); - to_clear_mask[0] &= ~(1ULL << IP_REGNUM); + to_clear_mask &= ~(1ULL << IP_REGNUM); emit_use (gen_rtx_REG (SImode, 4)); - to_clear_mask[0] &= ~(1ULL << 4); + to_clear_mask &= ~(1ULL << 4); } + gcc_assert ((unsigned) maxregno <= sizeof (to_clear_mask) * __CHAR_BIT__); + /* If the user has defined registers to be caller saved, these are no longer restored by the function before returning and must thus be cleared for security purposes. */ - for (regno = NUM_ARG_REGS; regno < LAST_VFP_REGNUM; regno++) + for (regno = NUM_ARG_REGS; regno <= maxregno; regno++) { /* We do not touch registers that can be used to pass arguments as per the AAPCS, since these should never be made callee-saved by user @@ -25041,7 +25045,7 @@ cmse_nonsecure_entry_clear_before_return (void) if (IN_RANGE (regno, IP_REGNUM, PC_REGNUM)) continue; if (call_used_regs[regno]) - to_clear_mask[regno / 64] |= (1ULL << (regno % 64)); + to_clear_mask |= (1ULL << regno); } /* Make sure we do not clear the registers used to return the result in.
Re: [PATCH, GCC/ARM] Remove ARMv8-M code for D17-D31
Hi Richard,

On 28/06/17 16:56, Richard Earnshaw (lists) wrote:

On 20/06/17 16:01, Thomas Preudhomme wrote:

Hi,

Function cmse_nonsecure_entry_clear_before_return has code to deal with the high VFP registers (D16-D31), while neither ARMv8-M Baseline nor Mainline supports more than 16 double VFP registers (D0-D15). This makes this security-sensitive code harder to read for not much benefit, since libcalls for cmse_nonsecure_call functions do not deal with those high VFP registers anyway.

This commit gets rid of this code for simplicity and fixes 2 issues in the same function:

- stop the first loop when reaching maxregno to avoid dealing with VFP registers if targeting Thumb-1 or using -mfloat-abi=soft
- include maxregno in that loop

This is silently baking in dangerous assumptions about GCC's internal numbering of the registers. That's not a good idea from a long-term portability perspective. At the very least you need to assert that all the interesting registers are numbered in the range 0..63; but ideally the code should just handle pretty much any assignment of internal register numbers.

Well there is already this:

gcc_assert ((unsigned) maxregno <= sizeof (to_clear_mask) * __CHAR_BIT__);

Did you consider using sbitmaps rather than doing all the multi-word stuff by steam?

No, but I am happy to. I'll respin the patch.

Best regards,

Thomas
[PATCH, GCC/ARM, 0/3] Add support for ARMv8-R
Hi, This patch series adds support for the ARMv8-R architecture[1] and ARM Cortex-R52[2] to GCC. The patch series consists of the following patches: [ 1/3] Add missing MIDR information for ARM Cortex-R7 and Cortex-R8 processors [ 2/3] Add support for ARMv8-R architecture [ 3/3] Add support for ARM Cortex-R52 [1] https://developer.arm.com/products/architecture/r-profile/docs/ddi0568/latest/arm-architecture-reference-manual-supplement-armv8-for-the-armv8-r-aarch32-architecture-profile [2] https://developer.arm.com/products/processors/cortex-r/cortex-r52
[PATCH 1/3, GCC/ARM] Add MIDR info for ARM Cortex-R7 and Cortex-R8
Hi, The driver is missing MIDR information for processors ARM Cortex-R7 and Cortex-R8 to support -march/-mcpu/-mtune=native on the command line. This patch adds the missing information. ChangeLog entry is as follows: *** gcc/ChangeLog *** 2017-01-31 Thomas Preud'homme * config/arm/driver-arm.c (arm_cpu_table): Add entry for ARM Cortex-R7 and Cortex-R8 processors. Is this ok for master? Best regards, Thomas diff --git a/gcc/config/arm/driver-arm.c b/gcc/config/arm/driver-arm.c index b034f13fda63f5892bbd9879d72f4b02e2632d69..29873d57a1e45fd989f6ff01dd4a2ae7320d93bb 100644 --- a/gcc/config/arm/driver-arm.c +++ b/gcc/config/arm/driver-arm.c @@ -54,6 +54,8 @@ static struct vendor_cpu arm_cpu_table[] = { {"0xd09", "armv8-a+crc", "cortex-a73"}, {"0xc14", "armv7-r", "cortex-r4"}, {"0xc15", "armv7-r", "cortex-r5"}, +{"0xc17", "armv7-r", "cortex-r7"}, +{"0xc18", "armv7-r", "cortex-r8"}, {"0xc20", "armv6-m", "cortex-m0"}, {"0xc21", "armv6-m", "cortex-m1"}, {"0xc23", "armv7-m", "cortex-m3"},
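As an illustration of how such a table can serve -mcpu=native style detection, the following sketch looks up a part number string in a table that has the same three-column layout as the entries in the diff. The struct name, field names and lookup_part function are assumptions made for this example, and the part number is assumed to have been obtained elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Three fields mirroring the entries in the patch; the struct and field
   names are hypothetical.  */
struct cpu_entry
{
  const char *part_no;
  const char *arch;
  const char *cpu;
};

/* A small excerpt, including the two entries added by the patch.  */
static const struct cpu_entry sketch_cpu_table[] = {
  {"0xc14", "armv7-r", "cortex-r4"},
  {"0xc15", "armv7-r", "cortex-r5"},
  {"0xc17", "armv7-r", "cortex-r7"},
  {"0xc18", "armv7-r", "cortex-r8"},
  {NULL, NULL, NULL}
};

/* Return the entry matching PART_NO, or NULL if the part is unknown.  */
static const struct cpu_entry *
lookup_part (const char *part_no)
{
  for (const struct cpu_entry *e = sketch_cpu_table; e->part_no != NULL; e++)
    if (strcmp (e->part_no, part_no) == 0)
      return e;
  return NULL;
}
```

With a table like this, an unknown part number simply yields no match, which is why a processor missing from the table cannot be selected through the native option.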
[PATCH 2/3, GCC/ARM] Add support for ARMv8-R architecture
Hi, This patch adds support for ARMv8-R architecture [1] which was recently announced. User-level instructions for ARMv8-R are the same as those in ARMv8-A AArch32 mode so this patch defines ARMv8-R to have the same features as ARMv8-A in ARM backend. [1] https://developer.arm.com/products/architecture/r-profile/docs/ddi0568/latest/arm-architecture-reference-manual-supplement-armv8-for-the-armv8-r-aarch32-architecture-profile ChangeLog entries are as follows: *** gcc/ChangeLog *** 2017-01-31 Thomas Preud'homme * config/arm/arm-cpus.in (armv8-r, armv8-r+crc): Add new entries. * config/arm/arm-cpu-cdata.h: Regenerate. * config/arm/arm-cpu-data.h: Regenerate. * config/arm/arm-isa.h (ISA_ARMv8r): Define macro. * config/arm/arm-tables.opt: Regenerate. * config/arm/arm.h (enum base_architecture): Add BASE_ARCH_8R enumerator. * config/arm/bpabi.h (BE8_LINK_SPEC): Add entry for ARMv8-R and ARMv8-R with CRC extensions. * doc/invoke.texi: Mention -march=armv8-r and -march=armv8-r+crc options. Document meaning of -march=armv8-r+crc. *** gcc/testsuite/ChangeLog *** 2017-01-31 Thomas Preud'homme * lib/target-supports.exp: Generate check_effective_target_arm_arch_v8r_ok, add_options_for_arm_arch_v8r and check_effective_target_arm_arch_v8r_multilib. *** libgcc/ChangeLog *** 2017-01-31 Thomas Preud'homme * config/arm/lib1funcs.S: Define __ARM_ARCH__ to 8 for ARMv8-R. Tested by building an arm-none-eabi GCC cross-compiler targeting ARMv8-R. Is this ok for stage1? 
Best regards, Thomas diff --git a/gcc/config/arm/arm-cpu-cdata.h b/gcc/config/arm/arm-cpu-cdata.h index b3888120daa8494eb41bde0368122ad2f06d81af..0a122f5febaaceeeb5a405cb5a64e1edd9b044f3 100644 --- a/gcc/config/arm/arm-cpu-cdata.h +++ b/gcc/config/arm/arm-cpu-cdata.h @@ -1041,6 +1041,20 @@ static const struct arm_arch_core_flag arm_arch_core_flags[] = }, }, { +"armv8-r", +{ + ISA_ARMv8r, + isa_nobit +}, + }, + { +"armv8-r+crc", +{ + ISA_ARMv8r,isa_bit_crc32, + isa_nobit +}, + }, + { "iwmmxt", { ISA_ARMv5te,isa_bit_xscale,isa_bit_iwmmxt, diff --git a/gcc/config/arm/arm-cpu-data.h b/gcc/config/arm/arm-cpu-data.h index d6200f9bdc09a9d0c973853b0152a2800eaf2fe5..48c1d88032c1c5dc7c6cba71511f79fe9f2533ea 100644 --- a/gcc/config/arm/arm-cpu-data.h +++ b/gcc/config/arm/arm-cpu-data.h @@ -1478,6 +1478,26 @@ static const struct processors all_architectures[] = NULL }, { +"armv8-r", TARGET_CPU_cortexr4, +(TF_CO_PROC), +"8R", BASE_ARCH_8R, +{ + ISA_ARMv8r, + isa_nobit +}, +NULL + }, + { +"armv8-r+crc", TARGET_CPU_cortexr4, +(TF_CO_PROC), +"8R", BASE_ARCH_8R, +{ + ISA_ARMv8r,isa_bit_crc32, + isa_nobit +}, +NULL + }, + { "iwmmxt", TARGET_CPU_iwmmxt, (TF_LDSCHED | TF_STRONG | TF_XSCALE), "5TE", BASE_ARCH_5TE, diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in index fc5d935182ba70de5ab2aefeec492318f42e95c5..be1f0ca4e38ae76683b77d8c3b79a066e62325d7 100644 --- a/gcc/config/arm/arm-cpus.in +++ b/gcc/config/arm/arm-cpus.in @@ -287,6 +287,20 @@ begin arch armv8-m.main+dsp isa ARMv8m_main bit_ARMv7em end arch armv8-m.main+dsp +begin arch armv8-r + tune for cortex-r4 + tune flags CO_PROC + base 8R + isa ARMv8r +end arch armv8-r + +begin arch armv8-r+crc + tune for cortex-r4 + tune flags CO_PROC + base 8R + isa ARMv8r bit_crc32 +end arch armv8-r+crc + begin arch iwmmxt tune for iwmmxt tune flags LDSCHED STRONG XSCALE diff --git a/gcc/config/arm/arm-isa.h b/gcc/config/arm/arm-isa.h index 6050bca95587f68a3671dd2144cf845b83da3692..24ec398b346f8effb346235d6f3ab20eb6f70e0f 
100644 --- a/gcc/config/arm/arm-isa.h +++ b/gcc/config/arm/arm-isa.h @@ -125,6 +125,7 @@ enum isa_feature #define ISA_ARMv8_2a ISA_ARMv8_1a, isa_bit_ARMv8_2 #define ISA_ARMv8m_base ISA_ARMv6m, isa_bit_ARMv8, isa_bit_cmse, isa_bit_tdiv #define ISA_ARMv8m_main ISA_ARMv7m, isa_bit_ARMv8, isa_bit_cmse +#define ISA_ARMv8r ISA_ARMv8a /* List of all FPU bits to strip out if -mfpu is used to override the default. isa_bit_fp16 is deliberately missing from this list. */ diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt index cbcd85d9906d1fc797ab33b3d61969f32b9cc566..7bab5de5a39e9192c97851929b83175648158cdf 100644 --- a/gcc/config/arm/arm-tables.opt +++ b/gcc/config/arm/arm-tables.opt @@ -461,10 +461,16 @@ EnumValue Enum(arm_arch) String(armv8-m.main+dsp) Value(33) EnumValue -Enum(arm_arch) String(iwmmxt) Value(34) +Enum(arm_arch) String(armv8-r) Value(34) EnumValue -Enum(arm_arch) String(iwmmxt2) Value(35) +Enum(arm_arch) String(armv8-r+crc) Value(35) + +EnumValue +Enum(arm_arch) String(iwmmxt) Value(36) + +EnumValue +Enum(arm_arch) String(iwmmxt2) Value(37) Enum Name(arm_fpu) Type(enum fpu_type) diff --git a/gcc/config/arm/arm.h b/gcc/config/ar
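The way ISA_ARMv8r is defined in terms of ISA_ARMv8a relies on macros that expand to comma-separated feature lists. A minimal sketch of that technique follows; the enum values and SK_ macros are hypothetical stand-ins with far fewer bits than the real feature enum, so this only shows the expansion mechanism, not the actual feature sets.

```c
#include <assert.h>

/* Hypothetical feature bits; sk_nobit terminates a list, in the same
   spirit as the isa_nobit terminator seen in the diff.  */
enum sketch_feature
{
  sk_nobit = 0,
  sk_bit_thumb2,
  sk_bit_armv8,
  sk_bit_crc32
};

/* Each macro expands to a comma-separated list, so one architecture can
   be defined as "everything another one has" by plain reuse.  */
#define SK_ISA_ARMv8a sk_bit_thumb2, sk_bit_armv8
#define SK_ISA_ARMv8r SK_ISA_ARMv8a

static const enum sketch_feature sk_armv8r[] = { SK_ISA_ARMv8r, sk_nobit };
static const enum sketch_feature sk_armv8r_crc[] =
  { SK_ISA_ARMv8r, sk_bit_crc32, sk_nobit };

/* Walk a terminated feature list and report whether F is present.  */
static int
has_feature (const enum sketch_feature *list, enum sketch_feature f)
{
  for (; *list != sk_nobit; list++)
    if (*list == f)
      return 1;
  return 0;
}
```

Because the lists are built by textual expansion, adding an optional extension such as CRC is just a matter of appending one more bit before the terminator, as the armv8-r+crc entries in the diff do.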
[PATCH 3/3, GCC/ARM] Add support for ARM Cortex-R52 processor
Hi, This patch adds support for the ARM Cortex-R52 processor recently announced [1]. [1] https://developer.arm.com/products/processors/cortex-r/cortex-r52 ChangeLog entry is as follows: *** gcc/ChangeLog *** 2017-01-31 Thomas Preud'homme * config/arm/arm-cpus.in (cortex-r52): Add new entry. * config/arm/arm-cpu.h: Regenerate. * config/arm/arm-cpu-cdata.h: Regenerate. * config/arm/arm-cpu-data.h: Regenerate. * config/arm/arm-tables.opt: Regenerate. * config/arm/bpabi.h (BE8_LINK_SPEC): Add entry for ARM Cortex-R52. * config/arm/driver-arm.c (arm_cpu_table): Add entry for ARM Cortex-R52. * doc/invoke.texi: Mention -mtune=cortex-r52. Tested by building an arm-none-eabi GCC cross-compiler targeting Cortex-R52. Is this ok for stage1? Best regards, Thomas diff --git a/gcc/config/arm/arm-cpu-cdata.h b/gcc/config/arm/arm-cpu-cdata.h index 0a122f5febaaceeeb5a405cb5a64e1edd9b044f3..043b5b2db09146b5686a5fe602f907164f9d84c5 100644 --- a/gcc/config/arm/arm-cpu-cdata.h +++ b/gcc/config/arm/arm-cpu-cdata.h @@ -803,6 +803,13 @@ static const struct arm_arch_core_flag arm_arch_core_flags[] = }, }, { +"cortex-r52", +{ + ISA_ARMv8r,isa_bit_crc32, + isa_nobit +}, + }, + { "armv2", { ISA_ARMv2,isa_bit_mode26, diff --git a/gcc/config/arm/arm-cpu-data.h b/gcc/config/arm/arm-cpu-data.h index 48c1d88032c1c5dc7c6cba71511f79fe9f2533ea..0677132382fad2f1baf1fbdf5c0b03fe32f752e2 100644 --- a/gcc/config/arm/arm-cpu-data.h +++ b/gcc/config/arm/arm-cpu-data.h @@ -1132,6 +1132,16 @@ static const struct processors all_cores[] = }, &arm_v7m_tune }, + { +"cortex-r52", TARGET_CPU_cortexr52, +(TF_LDSCHED), +"8R", BASE_ARCH_8R, +{ + ISA_ARMv8r,isa_bit_crc32, + isa_nobit +}, +&arm_cortex_tune + }, {NULL, TARGET_CPU_arm_none, 0, NULL, BASE_ARCH_0, {isa_nobit}, NULL} }; diff --git a/gcc/config/arm/arm-cpu.h b/gcc/config/arm/arm-cpu.h index cd282db02f56f4416ff82eb3d8d569cd99fb0d41..4d6ea61d07dc98540f0f75679d8ef6f7eafc10bb 100644 --- a/gcc/config/arm/arm-cpu.h +++ b/gcc/config/arm/arm-cpu.h @@ -132,6 +132,7 @@ 
enum processor_type TARGET_CPU_cortexa73cortexa53, TARGET_CPU_cortexm23, TARGET_CPU_cortexm33, + TARGET_CPU_cortexr52, TARGET_CPU_arm_none }; diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in index be1f0ca4e38ae76683b77d8c3b79a066e62325d7..139aa561d3f918655978e44b5bcb6c0b50747a08 100644 --- a/gcc/config/arm/arm-cpus.in +++ b/gcc/config/arm/arm-cpus.in @@ -1104,6 +1104,16 @@ begin cpu cortex-m33 costs v7m end cpu cortex-m33 + +# V8 R-profile implementations. +begin cpu cortex-r52 + cname cortexr52 + tune flags LDSCHED + architecture armv8-r+crc + costs cortex +end cpu cortex-r52 + + # FPU entries # format: # begin fpu diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt index 7bab5de5a39e9192c97851929b83175648158cdf..ccd1a7661fb97938ddea7670eebe1a0f48efb929 100644 --- a/gcc/config/arm/arm-tables.opt +++ b/gcc/config/arm/arm-tables.opt @@ -354,6 +354,9 @@ Enum(processor_type) String(cortex-m23) Value( TARGET_CPU_cortexm23) EnumValue Enum(processor_type) String(cortex-m33) Value( TARGET_CPU_cortexm33) +EnumValue +Enum(processor_type) String(cortex-r52) Value( TARGET_CPU_cortexr52) + Enum Name(arm_arch) Type(int) Known ARM architectures (for use with the -march= option): diff --git a/gcc/config/arm/bpabi.h b/gcc/config/arm/bpabi.h index c394ac805c7577113ed72b31a06ff93dc7f5f490..c3dca1cd4833afd67e56a276ef0e9c1e17f4fae4 100644 --- a/gcc/config/arm/bpabi.h +++ b/gcc/config/arm/bpabi.h @@ -100,7 +100,7 @@ |march=armv8-m.main \ |march=armv8-m.main+dsp|mcpu=cortex-m33 \ |march-armv8-r \ - |march-armv8-r+crc \ + |march-armv8-r+crc|mcpu=cortex-r52 \ :%{!r:--be8}}}" #else #define BE8_LINK_SPEC \ @@ -142,7 +142,7 @@ |march=armv8-m.main \ |march=armv8-m.main+dsp|mcpu=cortex-m33 \ |march=armv8-r \ - |march=armv8-r+crc \ + |march=armv8-r+crc|mcpu=cortex-r52 \ :%{!r:--be8}}}" #endif diff --git a/gcc/config/arm/driver-arm.c b/gcc/config/arm/driver-arm.c index 29873d57a1e45fd989f6ff01dd4a2ae7320d93bb..00f8128e6911a79f83da03bf731c1cc9127c7285 
100644 --- a/gcc/config/arm/driver-arm.c +++ b/gcc/config/arm/driver-arm.c @@ -56,6 +56,7 @@ static struct vendor_cpu arm_cpu_table[] = { {"0xc15", "armv7-r", "cortex-r5"}, {"0xc17", "armv7-r", "cortex-r7"}, {"0xc18", "armv7-r", "cortex-r8"}, +{"0xd13", "armv8-r+crc", "cortex-r52"}, {"0xc20", "armv6-m", "cortex-m0"}, {"0xc21", "armv6-m", "cortex-m1"}, {"0xc23", "armv7-m", "cortex-m3"}, diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 9ea580626749dc9d27bb72d56bbbef6a474a5055..a871837426485dd6a87c541386964bf85dfafde7 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -15212,6 +15212,7 @@ Permissible names are: @samp{arm2}, @samp{arm250}, @sa
Re: [PATCH, GCC/ARM, 0/3] Add support for ARMv8-R
On 29/06/17 15:34, Christophe Lyon wrote: On 29 June 2017 at 15:52, Thomas Preudhomme wrote: Hi, This patch series adds support for the ARMv8-R architecture[1] and ARM Cortex-R52[2] to GCC. The patch series consists of the following patches: Hi Thomas, I think you need to rebase your patch because Richard's recent series changed the contents of arm-cpu-data.h and arm-cpu-cdata.h. Err yes indeed. Thanks! Why do you link armv8-r architecture definition to cortex-r4? I don't understand, where did I do such a thing? Best regards, Thomas
Re: [PATCH 2/3, GCC/ARM] Add support for ARMv8-R architecture
Please ignore this patch. I'll respin the patch on a more recent GCC. Best regards, Thomas On 29/06/17 14:55, Thomas Preudhomme wrote: Hi, This patch adds support for ARMv8-R architecture [1] which was recently announced. User-level instructions for ARMv8-R are the same as those in ARMv8-A AArch32 mode so this patch defines ARMv8-R to have the same features as ARMv8-A in ARM backend. [1] https://developer.arm.com/products/architecture/r-profile/docs/ddi0568/latest/arm-architecture-reference-manual-supplement-armv8-for-the-armv8-r-aarch32-architecture-profile ChangeLog entries are as follows: *** gcc/ChangeLog *** 2017-01-31 Thomas Preud'homme * config/arm/arm-cpus.in (armv8-r, armv8-r+crc): Add new entries. * config/arm/arm-cpu-cdata.h: Regenerate. * config/arm/arm-cpu-data.h: Regenerate. * config/arm/arm-isa.h (ISA_ARMv8r): Define macro. * config/arm/arm-tables.opt: Regenerate. * config/arm/arm.h (enum base_architecture): Add BASE_ARCH_8R enumerator. * config/arm/bpabi.h (BE8_LINK_SPEC): Add entry for ARMv8-R and ARMv8-R with CRC extensions. * doc/invoke.texi: Mention -march=armv8-r and -march=armv8-r+crc options. Document meaning of -march=armv8-r+crc. *** gcc/testsuite/ChangeLog *** 2017-01-31 Thomas Preud'homme * lib/target-supports.exp: Generate check_effective_target_arm_arch_v8r_ok, add_options_for_arm_arch_v8r and check_effective_target_arm_arch_v8r_multilib. *** libgcc/ChangeLog *** 2017-01-31 Thomas Preud'homme * config/arm/lib1funcs.S: Define __ARM_ARCH__ to 8 for ARMv8-R. Tested by building an arm-none-eabi GCC cross-compiler targeting ARMv8-R. Is this ok for stage1? Best regards, Thomas
Re: [PATCH 3/3, GCC/ARM] Add support for ARM Cortex-R52 processor
Please ignore this patch. I'll respin the patch on a more recent GCC. Best regards, Thomas On 29/06/17 14:56, Thomas Preudhomme wrote: Hi, This patch adds support for the ARM Cortex-R52 processor recently announced [1]. [1] https://developer.arm.com/products/processors/cortex-r/cortex-r52 ChangeLog entry is as follows: *** gcc/ChangeLog *** 2017-01-31 Thomas Preud'homme * config/arm/arm-cpus.in (cortex-r52): Add new entry. * config/arm/arm-cpu.h: Regenerate. * config/arm/arm-cpu-cdata.h: Regenerate. * config/arm/arm-cpu-data.h: Regenerate. * config/arm/arm-tables.opt: Regenerate. * config/arm/bpabi.h (BE8_LINK_SPEC): Add entry for ARM Cortex-R52. * config/arm/driver-arm.c (arm_cpu_table): Add entry for ARM Cortex-R52. * doc/invoke.texi: Mention -mtune=cortex-r52. Tested by building an arm-none-eabi GCC cross-compiler targeting Cortex-R52. Is this ok for stage1? Best regards, Thomas
Re: [PATCH, GCC/ARM, 0/3] Add support for ARMv8-R
On 29/06/17 16:12, Christophe Lyon wrote: On 29 June 2017 at 16:37, Thomas Preudhomme wrote: Why do you link armv8-r architecture definition to cortex-r4? I don't understand, where did I do such a thing? In patch #2 you have: diff --git a/gcc/config/arm/arm-cpu-data.h b/gcc/config/arm/arm-cpu-data.h index d6200f9bdc09a9d0c973853b0152a2800eaf2fe5..48c1d88032c1c5dc7c6cba71511f79fe9f2533ea 100644 --- a/gcc/config/arm/arm-cpu-data.h +++ b/gcc/config/arm/arm-cpu-data.h @@ -1478,6 +1478,26 @@ static const struct processors all_architectures[] = NULL }, { +"armv8-r", TARGET_CPU_cortexr4, +(TF_CO_PROC), +"8R", BASE_ARCH_8R, +{ + ISA_ARMv8r, + isa_nobit +}, +NULL + }, + { +"armv8-r+crc", TARGET_CPU_cortexr4, +(TF_CO_PROC), +"8R", BASE_ARCH_8R, +{ + ISA_ARMv8r,isa_bit_crc32, + isa_nobit +}, +NULL + }, + { "iwmmxt", TARGET_CPU_iwmmxt, (TF_LDSCHED | TF_STRONG | TF_XSCALE), "5TE", BASE_ARCH_5TE, Both entries point to TARGET_CPU_cortexr4. I guess that's because r52 is only defined in patch #3, but then why not update this in patch #3 and replace r4 with r52? Not sure I'm very clear :-) You are. I must have forgotten about that setting when working on patch #3. I'll update this. Thanks for your vigilance :-) Best regards, Thomas
Re: [PATCH, GCC/ARM, gcc-5-branch, ping2] Fix gcc.target/arm/fpscr.c
Ping? Best regards, Thomas On 28/06/17 12:35, Thomas Preudhomme wrote: Ping? Best regards, Thomas On 26/06/17 12:32, Thomas Preudhomme wrote: Hi, As raised by Christophe Lyon, fpscr.c FAILs because arm_fp_ok and arm_fp are not defined in GCC 5. This commit changes the test to use the same recipe as gcc.target/arm/cmp-2.c ChangeLog entry is as follows: *** gcc/testsuite/ChangeLog *** 2017-06-26 Thomas Preud'homme * gcc.target/arm/fpscr.c: Require arm_vfp_ok instead of arm_fp_ok and add -mfpu=vfp -mfloat-abi=softfp instead of fp_ok options. Ok for GCC 5? Best regards, Thomas diff --git a/gcc/testsuite/gcc.target/arm/fpscr.c b/gcc/testsuite/gcc.target/arm/fpscr.c index 7b4d71d72d8964f6da0d0604bf59aeb4a895df43..cafba4e8d67545bd210477230b9682fe86620e23 100644 --- a/gcc/testsuite/gcc.target/arm/fpscr.c +++ b/gcc/testsuite/gcc.target/arm/fpscr.c @@ -1,9 +1,9 @@ /* Test the fpscr builtins. */ /* { dg-do compile } */ -/* { dg-require-effective-target arm_fp_ok } */ +/* { dg-require-effective-target arm_vfp_ok } */ /* { dg-skip-if "need fp instructions" { *-*-* } { "-mfloat-abi=soft" } { "" } } */ -/* { dg-add-options arm_fp } */ +/* { dg-options "-mfpu=vfp -mfloat-abi=softfp" } */ void test_fpscr ()
Fix ChangeLog format in r247584
Hi, This patch fixes relative pathnames in gcc/ChangeLog for r247584. Committed as obvious to trunk, GCC 5, 6 and 7. Best regards, Thomas diff --git a/gcc/ChangeLog b/gcc/ChangeLog index f9e00198bbfd352960685b5c72193570e232e68a..39bdcb12ebbad3cdbdce6b9d4dd87c28610e37fe 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -5826,7 +5826,7 @@ 2017-05-04 Prakhar Bahuguna - * gcc/config/arm/arm-builtins.c (arm_init_builtins): Rename + * config/arm/arm-builtins.c (arm_init_builtins): Rename __builtin_arm_ldfscr to __builtin_arm_get_fpscr, and rename __builtin_arm_stfscr to __builtin_arm_set_fpscr.
Re: [PATCH 1/3, GCC/ARM, ping] Add MIDR info for ARM Cortex-R7 and Cortex-R8
Ping? Best regards, Thomas On 29/06/17 14:55, Thomas Preudhomme wrote: Hi, The driver is missing MIDR information for processors ARM Cortex-R7 and Cortex-R8 to support -march/-mcpu/-mtune=native on the command line. This patch adds the missing information. ChangeLog entry is as follows: *** gcc/ChangeLog *** 2017-01-31 Thomas Preud'homme * config/arm/driver-arm.c (arm_cpu_table): Add entry for ARM Cortex-R7 and Cortex-R8 processors. Is this ok for master? Best regards, Thomas diff --git a/gcc/config/arm/driver-arm.c b/gcc/config/arm/driver-arm.c index b034f13fda63f5892bbd9879d72f4b02e2632d69..29873d57a1e45fd989f6ff01dd4a2ae7320d93bb 100644 --- a/gcc/config/arm/driver-arm.c +++ b/gcc/config/arm/driver-arm.c @@ -54,6 +54,8 @@ static struct vendor_cpu arm_cpu_table[] = { {"0xd09", "armv8-a+crc", "cortex-a73"}, {"0xc14", "armv7-r", "cortex-r4"}, {"0xc15", "armv7-r", "cortex-r5"}, +{"0xc17", "armv7-r", "cortex-r7"}, +{"0xc18", "armv7-r", "cortex-r8"}, {"0xc20", "armv6-m", "cortex-m0"}, {"0xc21", "armv6-m", "cortex-m1"}, {"0xc23", "armv7-m", "cortex-m3"},
[PATCH, GCC/ARM] Fix cmse_nonsecure_entry return insn size
Hi, A number of instructions are output in assembler form by output_return_instruction () when compiling a function with the cmse_nonsecure_entry attribute for Armv8-M Mainline with hardfloat float ABI. However, the corresponding thumb2_cmse_entry_return insn pattern does not account for all these instructions in its computation of the length of the instruction. This may lead GCC to use the wrong branching instruction due to incorrect computation of the offset between the branch instruction's address and the target address. This commit fixes the mismatch between what output_return_instruction () does and what the pattern thinks it does and adds a note warning about mismatch in the affected functions' heading comments to ensure code does not get out of sync again. Note: no test is provided because the C testcase is fragile (only works on GCC 6) and the extracted RTL test fails to compile due to bugs in the RTL frontend (PR82815 and PR82817). ChangeLog entries are as follows: *** gcc/ChangeLog *** 2017-10-30 Thomas Preud'homme * config/arm/arm.c (output_return_instruction): Add comments to indicate requirement for cmse_nonsecure_entry return to account for the size of clearing instruction output here. (thumb_exit): Likewise. * config/arm/thumb2.md (thumb2_cmse_entry_return): Fix length for return in hardfloat mode. Testing: Bootstrapped on arm-linux-gnueabihf and testsuite shows no regression. Is this ok for trunk? Best regards, Thomas diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 033ec255a577f782201527f57f45802bc0eb45e0..9919f54242d9317125a104f9777d76a85de80e9b 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -19417,7 +19417,12 @@ arm_get_vfp_saved_size (void) /* Generate a function exit sequence. If REALLY_RETURN is false, then do everything bar the final return instruction. If simple_return is true, - then do not output epilogue, because it has already been emitted in RTL. 
*/ + then do not output epilogue, because it has already been emitted in RTL. + + Note: do not forget to update length attribute of corresponding insn pattern + when changing assembly output (eg. length attribute of + thumb2_cmse_entry_return when updating Armv8-M Mainline Security Extensions + register clearing sequences). */ const char * output_return_instruction (rtx operand, bool really_return, bool reverse, bool simple_return) @@ -23950,7 +23955,12 @@ thumb_pop (FILE *f, unsigned long mask) /* Generate code to return from a thumb function. If 'reg_containing_return_addr' is -1, then the return address is - actually on the stack, at the stack pointer. */ + actually on the stack, at the stack pointer. + + Note: do not forget to update length attribute of corresponding insn pattern + when changing assembly output (eg. length attribute of epilogue_insns when + updating Armv8-M Baseline Security Extensions register clearing + sequences). */ static void thumb_exit (FILE *f, int reg_containing_return_addr) { diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md index b78c3d256aeafc2eeb3dcdc2b9b07b1af9df5294..776d611d2538e790a5f504995050ffdfc51d7193 100644 --- a/gcc/config/arm/thumb2.md +++ b/gcc/config/arm/thumb2.md @@ -1132,7 +1132,7 @@ ; we adapt the length accordingly. (set (attr "length") (if_then_else (match_test "TARGET_HARD_FLOAT") - (const_int 12) + (const_int 34) (const_int 8))) ; We do not support predicate execution of returns from cmse_nonsecure_entry ; functions because we need to clear the APSR. Since predicable has to be
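To illustrate why an understated length matters, here is a minimal sketch of a branch-range check driven by per-instruction lengths. The range limit SKETCH_SHORT_RANGE and the surrounding 230 bytes of code used in the usage note are hypothetical; only the 12 and 34 byte figures come from the diff above. This is not how GCC's own branch shortening is implemented, just the arithmetic it depends on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reach, in bytes, of a short branch encoding.  */
#define SKETCH_SHORT_RANGE 254

/* Sum the lengths (in bytes) of the instructions lying between a branch
   and its target.  */
static unsigned
distance_bytes (const unsigned *lengths, size_t n)
{
  unsigned total = 0;
  for (size_t i = 0; i < n; i++)
    total += lengths[i];
  return total;
}

/* Decide whether the short encoding can reach the target, based only on
   the lengths the instruction patterns claim.  */
static int
short_branch_fits (const unsigned *lengths, size_t n)
{
  return distance_bytes (lengths, n) <= SKETCH_SHORT_RANGE;
}
```

If the return sequence claims 12 bytes after 230 bytes of other code, the computed distance is 242 and the short branch appears to fit; with the 34 bytes actually emitted the distance is 264 and it does not, so a decision made from the claimed length would pick an encoding that cannot reach its target.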
[PATCH, GCC/testsuite] Fix retrieval of testname
When gcc-dg-runtest is used to run a test, the test is run several times with different options. For clarity of the log, the test infrastructure then appends the options to the testname. This means that all the code that must deal with the testcase itself (eg. removing the output files after the test has run) needs to remove the option name. There is already a pattern (see below) for this in several places of the testsuite framework but it is also missing in many places. This patch fixes all of these places. The pattern is as follows: set testcase [testname-for-summary] ; The name might include a list of options; extract the file name. set testcase [lindex $testcase 0] ChangeLog entry is as follows: *** gcc/testsuite/ChangeLog *** 2017-11-08 Thomas Preud'homme * lib/scanasm.exp (scan-assembler): Extract filename from testname used in summary. (scan-assembler-not): Likewise. (scan-hidden): Likewise. (scan-not-hidden): Likewise. (scan-stack-usage): Likewise. (scan-stack-usage-not): Likewise. (scan-assembler-times): Likewise. (scan-assembler-dem): Likewise. (scan-assembler-dem-not): Likewise. (object-size): Likewise. (scan-lto-assembler): Likewise. * lib/scandump.exp (scan-dump): Likewise. (scan-dump-times): Likewise. (scan-dump-not): Likewise. (scan-dump-dem): Likewise. (scan-dump-dem-not): Likewise. Testing: Ran testsuite on bootstrapped aarch64-linux-gnu and x86_64-linux-gnu compiled with C, fortran and ada support without any regression. Is this ok for trunk? 
Best regards, Thomas diff --git a/gcc/testsuite/lib/scanasm.exp b/gcc/testsuite/lib/scanasm.exp index a66bb28253196410554405facefa8641d1020c1d..33286152f30df959a4bffa81634d0bfe7b898e8f 100644 --- a/gcc/testsuite/lib/scanasm.exp +++ b/gcc/testsuite/lib/scanasm.exp @@ -78,7 +78,9 @@ proc dg-scan { name positive testcase output_file orig_args } { proc scan-assembler { args } { set testcase [testname-for-summary] -set output_file "[file rootname [file tail $testcase]].s" +# The name might include a list of options; extract the file name. +set filename [lindex $testcase 0] +set output_file "[file rootname [file tail $filename]].s" dg-scan "scan-assembler" 1 $testcase $output_file $args } @@ -89,7 +91,9 @@ force_conventional_output_for scan-assembler proc scan-assembler-not { args } { set testcase [testname-for-summary] -set output_file "[file rootname [file tail $testcase]].s" +# The name might include a list of options; extract the file name. +set filename [lindex $testcase 0] +set output_file "[file rootname [file tail $filename]].s" dg-scan "scan-assembler-not" 0 $testcase $output_file $args } @@ -117,7 +121,9 @@ proc hidden-scan-for { symbol } { proc scan-hidden { args } { set testcase [testname-for-summary] -set output_file "[file rootname [file tail $testcase]].s" +# The name might include a list of options; extract the file name. +set filename [lindex $testcase 0] +set output_file "[file rootname [file tail $filename]].s" set symbol [lindex $args 0] @@ -133,7 +139,9 @@ proc scan-hidden { args } { proc scan-not-hidden { args } { set testcase [testname-for-summary] -set output_file "[file rootname [file tail $testcase]].s" +# The name might include a list of options; extract the file name. 
+set filename [lindex $testcase 0] +set output_file "[file rootname [file tail $filename]].s" set symbol [lindex $args 0] set hidden_scan [hidden-scan-for $symbol] @@ -163,7 +171,9 @@ proc scan-file-not { output_file args } { proc scan-stack-usage { args } { set testcase [testname-for-summary] -set output_file "[file rootname [file tail $testcase]].su" +# The name might include a list of options; extract the file name. +set filename [lindex $testcase 0] +set output_file "[file rootname [file tail $filename]].su" dg-scan "scan-file" 1 $testcase $output_file $args } @@ -173,7 +183,9 @@ proc scan-stack-usage { args } { proc scan-stack-usage-not { args } { set testcase [testname-for-summary] -set output_file "[file rootname [file tail $testcase]].su" +# The name might include a list of options; extract the file name. +set filename [lindex $testcase 0] +set output_file "[file rootname [file tail $filename]].su" dg-scan "scan-file-not" 0 $testcase $output_file $args } @@ -230,12 +242,14 @@ proc scan-assembler-times { args } { } set testcase [testname-for-summary] +# The name might include a list of options; extract the file name. +set filename [lindex $testcase 0] set pattern [lindex $args 0] set times [lindex $args 1] set pp_pattern [make_pattern_printable $pattern] # This must match the rule in gcc-dg.exp. -set output_file "[file rootname [file tail $testcase]].s" +set output_file "[fi
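The filename extraction performed by the Tcl pattern in this patch can be paraphrased in C as follows. This is only a sketch of the same three steps (take the first word of the summary name, drop the directory part, swap the extension); the function name output_file_for is hypothetical, and unlike Tcl's lindex it splits on plain whitespace and does not understand Tcl list quoting.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the output file name for a summary test name such as
   "dir/foo.c -O2": keep the first whitespace-delimited word, strip the
   directory and the last extension, then append SUFFIX.  Returns 1 on
   success, 0 if OUT is too small.  */
static int
output_file_for (const char *testname, const char *suffix,
                 char *out, size_t out_size)
{
  /* Step 1: the first word is the file name; the rest are options.  */
  size_t len = strcspn (testname, " \t");

  /* Step 2: skip everything up to the last '/' inside that word.  */
  const char *start = testname;
  for (size_t i = 0; i < len; i++)
    if (testname[i] == '/')
      start = testname + i + 1;
  size_t tail_len = len - (size_t) (start - testname);

  /* Step 3: drop the last extension, if any.  */
  size_t root_len = tail_len;
  for (size_t i = tail_len; i > 0; i--)
    if (start[i - 1] == '.')
      {
        root_len = i - 1;
        break;
      }

  int n = snprintf (out, out_size, "%.*s%s", (int) root_len, start, suffix);
  return n >= 0 && (size_t) n < out_size;
}
```

Doing the first-word extraction before the directory and extension handling is the point of the patch: options appended to the name may themselves contain dots or slashes, so they have to be removed first.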
[PATCH, GCC/testsuite/ARM] Consolidate sources for cmse tests
For the most part, testcases under gcc.target/arm/cmse/baseline and gcc.target/arm/cmse/mainline are duplicate copies with only different dejagnu directives. Although there is no requirement for them to be similar, keeping them identical allows the generated code to be compared and makes it easier to update the testcases when code generation changes for both architectures (if one needs updating so does the other). Similarly all the tests in gcc.target/arm/cmse/mainline/ have the same source but are duplicate copies. This patch moves all the code in the tests to a parent directory: gcc.target/arm/cmse for tests shared by Armv8-M Baseline and Mainline and gcc.target/arm/cmse/mainline for tests *only* shared by the various float ABIs of Armv8-M Mainline. C includes are then used where the code used to sit. Note that the cmse-13.c test used to differ slightly between architectures and float ABIs tested in the first floating-point constant passed to bar: sometimes 1.0 and sometimes 3.0. This patch settles on 3.0 to avoid confusion with the 1.0 constant used to clear VFP registers in some of the configurations. ChangeLog entry is as follows: *** gcc/testsuite/ChangeLog *** 2017-11-03 Thomas Preud'homme * gcc.target/arm/cmse/bitfield-4.x: New file. * gcc.target/arm/cmse/baseline/bitfield-4.c: Remove code and include above file. * gcc.target/arm/cmse/mainline/bitfield-4.c: Likewise. * gcc.target/arm/cmse/bitfield-5.x: New file. * gcc.target/arm/cmse/baseline/bitfield-5.c: Remove code and include above file. * gcc.target/arm/cmse/mainline/bitfield-5.c: Likewise. * gcc.target/arm/cmse/bitfield-6.x: New file. * gcc.target/arm/cmse/baseline/bitfield-6.c: Remove code and include above file. * gcc.target/arm/cmse/mainline/bitfield-6.c: Likewise. * gcc.target/arm/cmse/bitfield-7.x: New file. * gcc.target/arm/cmse/baseline/bitfield-7.c: Remove code and include above file. * gcc.target/arm/cmse/mainline/bitfield-7.c: Likewise. * gcc.target/arm/cmse/bitfield-8.x: New file. 
* gcc.target/arm/cmse/baseline/bitfield-8.c: Remove code and include above file. * gcc.target/arm/cmse/mainline/bitfield-8.c: Likewise. * gcc.target/arm/cmse/bitfield-9.x: New file. * gcc.target/arm/cmse/baseline/bitfield-9.c: Remove code and include above file. * gcc.target/arm/cmse/mainline/bitfield-9.c: Likewise. * gcc.target/arm/cmse/bitfield-and-union.x: New file. * gcc.target/arm/cmse/baseline/bitfield-and-union-1.c: Rename into ... * gcc.target/arm/cmse/baseline/bitfield-and-union.c: This. Remove code and include above bitfield-and-union.x file. * gcc.target/arm/cmse/mainline/bitfield-and-union-1.c: Rename into ... * gcc.target/arm/cmse/mainline/bitfield-and-union.c: this. Remove code and include above bitfield-and-union.x file. * gcc.target/arm/cmse/cmse-13.x: New file. * gcc.target/arm/cmse/baseline/cmse-13.c: Remove code and include above file. * gcc.target/arm/cmse/mainline/hard-sp/cmse-13.c: Likewise. * gcc.target/arm/cmse/mainline/hard/cmse-13.c: Likewise. * gcc.target/arm/cmse/mainline/soft/cmse-13.c: Likewise. * gcc.target/arm/cmse/mainline/softfp/cmse-13.c: Likewise. * gcc.target/arm/cmse/cmse-5.x: New file. * gcc.target/arm/cmse/mainline/hard-sp/cmse-5.c: Remove code and include above file. * gcc.target/arm/cmse/mainline/har
Re: [PATCH, GCC/testsuite/ARM] Consolidate sources for cmse tests
ewise. * gcc.target/arm/cmse/union-2.x: New file. * gcc.target/arm/cmse/baseline/union-2.c: Remove code and include above file. * gcc.target/arm/cmse/mainline/union-2.c: Likewise. Testing: Running cmse.exp for both Armv8-M Baseline and Mainline shows no regression. Is this ok for trunk? Best regards, Thomas On 10/11/17 11:19, Thomas Preudhomme wrote: For the most part, testcases under gcc.target/arm/cmse/baseline and gcc.target/arm/cmse/mainline are duplicate copies with only different dejagnu directives. Although there is no requirement for them to be similar, keeping them identical allows the generated code to be compared and makes it easier to update the testcases when code generation changes for both architectures (if one needs updating so does the other). Similarly all the tests in gcc.target/arm/cmse/mainline/ have the same source but are duplicate copies. This patch moves all the code in the tests to a parent directory: gcc.target/arm/cmse for tests shared by Armv8-M Baseline and Mainline and gcc.target/arm/cmse/mainline for tests *only* shared by the various float ABIs of Armv8-M Mainline. C includes are then used where the code used to sit. Note that the cmse-13.c test used to differ slightly between architectures and float ABIs tested in the first floating-point constant passed to bar: sometimes 1.0 and sometimes 3.0. This patch settles on 3.0 to avoid confusion with the 1.0 constant used to clear VFP registers in some of the configurations. ChangeLog entry is as follows: *** gcc/testsuite/ChangeLog *** 2017-11-03 Thomas Preud'homme * gcc.target/arm/cmse/bitfield-4.x: New file. * gcc.target/arm/cmse/baseline/bitfield-4.c: Remove code and include above file. * gcc.target/arm/cmse/mainline/bitfield-4.c: Likewise. * gcc.target/arm/cmse/bitfield-5.x: New file. * gcc.target/arm/cmse/baseline/bitfield-5.c: Remove code and include above file. * gcc.target/arm/cmse/mainline/bitfield-5.c: Likewise. * gcc.target/arm/cmse/bitfield-6.x: New file. 
* gcc.target/arm/cmse/baseline/bitfield-6.c: Remove code and include above file. * gcc.target/arm/cmse/mainline/bitfield-6.c: Likewise. * gcc.target/arm/cmse/bitfield-7.x: New file. * gcc.target/arm/cmse/baseline/bitfield-7.c: Remove code and include above file. * gcc.target/arm/cmse/mainline/bitfield-7.c: Likewise. * gcc.target/arm/cmse/bitfield-8.x: New file. * gcc.target/arm/cmse/baseline/bitfield-8.c: Remove code and include above file. * gcc.target/arm/cmse/mainline/bitfield-8.c: Likewise. * gcc.target/arm/cmse/bitfield-9.x: New file. * gcc.target/arm/cmse/baseline/bitfield-9.c: Remove code and include above file. * gcc.target/arm/cmse/mainline/bitfield-9.c: Likewise. * gcc.target/arm/cmse/bitfield-and-union.x: New file. * gcc.target/arm/cmse/baseline/bitfield-and-union-1.c: Rename into ... * gcc.target/arm/cmse/baseline/bitfield-and-union.c: This. Remove code and include above bitfield-and-union.x file. * gcc.target/arm/cmse/mainline/bitfield-and-union-1.c: Rename into ... * gcc.target/arm/cmse/mainline/bitfield-and-union.c: this. Remove code and include above bitfield-and-union.x file. * gcc.target/arm/cmse/cmse-13.x: New file. * gcc.target/arm/cmse/baseline/cmse-13.c: Remove code and include above file. * gcc.target/arm/cmse/mainline/hard-sp/cmse-13.c: Likewise. * gcc.target/arm/cmse/mainline/hard/cmse-13.c: Likewise. * gcc.target/arm/cmse/mainline/soft/cmse-13.c: Likewise. * gcc.target/arm/cmse/mainline/softfp/cmse-13.c: Likewise. * gcc.target/arm/cmse/cmse-5.x: New file. * gcc.target/arm/cmse/mainline/hard-sp/cmse-5.c: Remove code and include above file. * gcc.target/arm/cmse/mainline/har
[PATCH, GCC/ARM] Fix ICE in Armv8-M Security Extensions code
Hi, Commit r253825, which introduced some sanity checks for sbitmap, revealed a bug in the conversion of cmse_nonsecure_entry_clear_before_return () to using the bitmap structure. bitmap_and expects the two bitmaps to have the same length, yet the code in cmse_nonsecure_entry_clear_before_return () gives to_clear_bitmap and to_clear_arg_regs_bitmap different sizes, on the assumption that bitmap_and would behave as if the bits not allocated were in fact zero. This commit makes sure both bitmaps are equally sized. ChangeLog entry is as follows: *** gcc/ChangeLog *** 2017-11-13 Thomas Preud'homme * config/arm/arm.c (cmse_nonsecure_entry_clear_before_return): Allocate to_clear_arg_regs_bitmap to the same size as to_clear_bitmap. Testing: Bootstrapped GCC on arm-none-linux-gnueabihf target and testsuite shows no regression. Running cmse.exp tests for Armv8-M Baseline and Mainline shows FAIL->PASS for bitfield-1, bitfield-2, bitfield-3 and struct-1 testcases. Is this ok for trunk? Best regards, Thomas diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index db99303f3fb7a2196f48358e74fa4d98f31f045e..106e3edce0d6f2518eb391c436c5213a78d1275b 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -25205,7 +25205,8 @@ cmse_nonsecure_entry_clear_before_return (void) if (padding_bits_to_clear != 0) { rtx reg_rtx; - auto_sbitmap to_clear_arg_regs_bitmap (R0_REGNUM + NUM_ARG_REGS); + int to_clear_bitmap_size = SBITMAP_SIZE ((sbitmap) to_clear_bitmap); + auto_sbitmap to_clear_arg_regs_bitmap (to_clear_bitmap_size); /* Padding bits to clear is not 0 so we know we are dealing with returning a composite type, which only uses r0. Let's make sure that
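The equal-size requirement behind this fix can be illustrated outside GCC with a toy bitmap. This is only a minimal sketch and not GCC's sbitmap implementation: toy_bitmap, toy_bitmap_init and toy_bitmap_and are hypothetical names invented for the illustration, and the sizes used are arbitrary. It shows an AND operation that refuses to combine bitmaps of different lengths instead of silently treating the missing bits as zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_BITS 64

/* Hypothetical fixed-capacity bitmap for illustration; NOT GCC's sbitmap.  */
struct toy_bitmap
{
  size_t n_bits;            /* Number of valid bits.  */
  bool bits[TOY_MAX_BITS];  /* One bool per bit, for clarity.  */
};

/* Initialize B to hold N_BITS bits, all cleared.  */
void
toy_bitmap_init (struct toy_bitmap *b, size_t n_bits)
{
  assert (n_bits <= TOY_MAX_BITS);
  b->n_bits = n_bits;
  memset (b->bits, 0, sizeof b->bits);
}

/* Compute DST = A & B.  Return false and leave DST untouched when the three
   bitmaps do not all have the same size.  */
bool
toy_bitmap_and (struct toy_bitmap *dst, const struct toy_bitmap *a,
                const struct toy_bitmap *b)
{
  if (dst->n_bits != a->n_bits || a->n_bits != b->n_bits)
    return false;
  for (size_t i = 0; i < a->n_bits; i++)
    dst->bits[i] = a->bits[i] && b->bits[i];
  return true;
}
```

Allocating the second operand with the size of the first, as the patch does with SBITMAP_SIZE, corresponds here to calling toy_bitmap_init with the same n_bits for both operands.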
[PATCH, GCC/testsuite/ARM] Fix selection of effective target for cmse tests
Hi, Some of the tests in the gcc.target/arm/cmse directory (eg. gcc.target/arm/cmse/mainline/bitfield-4.c) are failing when run without an architecture specified in RUNTESTFLAGS due to them not adding the option to select an Armv8-M architecture. This patch fixes the issue by adding the right option from the exp file so that no architecture fiddling is necessary in the individual tests. ChangeLog entry is as follows: *** gcc/testsuite/ChangeLog *** 2017-11-03 Thomas Preud'homme * gcc.target/arm/cmse/cmse.exp: Add option to select Armv8-M Baseline or Armv8-M Mainline when running the respective tests. * gcc.target/arm/cmse/baseline/cmse-11.c: Remove architecture check and selection. * gcc.target/arm/cmse/baseline/cmse-13.c: Likewise. * gcc.target/arm/cmse/baseline/cmse-2.c: Likewise. * gcc.target/arm/cmse/baseline/cmse-6.c: Likewise. * gcc.target/arm/cmse/baseline/softfp.c: Likewise. * gcc.target/arm/cmse/mainline/hard-sp/cmse-13.c: Likewise. * gcc.target/arm/cmse/mainline/hard-sp/cmse-5.c: Likewise. * gcc.target/arm/cmse/mainline/hard-sp/cmse-7.c: Likewise. * gcc.target/arm/cmse/mainline/hard-sp/cmse-8.c: Likewise. * gcc.target/arm/cmse/mainline/hard/cmse-13.c: Likewise. * gcc.target/arm/cmse/mainline/hard/cmse-5.c: Likewise. * gcc.target/arm/cmse/mainline/hard/cmse-7.c: Likewise. * gcc.target/arm/cmse/mainline/hard/cmse-8.c: Likewise. * gcc.target/arm/cmse/mainline/soft/cmse-13.c: Likewise. * gcc.target/arm/cmse/mainline/soft/cmse-5.c: Likewise. * gcc.target/arm/cmse/mainline/soft/cmse-7.c: Likewise. * gcc.target/arm/cmse/mainline/soft/cmse-8.c: Likewise. * gcc.target/arm/cmse/mainline/softfp-sp/cmse-5.c: Likewise. * gcc.target/arm/cmse/mainline/softfp-sp/cmse-7.c: Likewise. * gcc.target/arm/cmse/mainline/softfp-sp/cmse-8.c: Likewise. * gcc.target/arm/cmse/mainline/softfp/cmse-13.c: Likewise. * gcc.target/arm/cmse/mainline/softfp/cmse-5.c: Likewise. * gcc.target/arm/cmse/mainline/softfp/cmse-7.c: Likewise. * gcc.target/arm/cmse/mainline/softfp/cmse-8.c: Likewise. 
Testing: Running cmse.exp for both Armv8-M Baseline and Mainline shows no regression. Running it for a toolchain defaulting to Armv8-M Baseline but with RUNTESTFLAGS unset sees some FAIL->PASS. Is this ok for trunk? Best regards, Thomas diff --git a/gcc/testsuite/gcc.target/arm/cmse/baseline/cmse-11.c b/gcc/testsuite/gcc.target/arm/cmse/baseline/cmse-11.c index 795544fe11d9d7f24086be16916a5bfee89d7b44..230b255963f56a6c29b91d2501b43fed6eda2476 100644 --- a/gcc/testsuite/gcc.target/arm/cmse/baseline/cmse-11.c +++ b/gcc/testsuite/gcc.target/arm/cmse/baseline/cmse-11.c @@ -1,7 +1,5 @@ /* { dg-do compile } */ /* { dg-options "-mcmse" } */ -/* { dg-require-effective-target arm_arch_v8m_base_ok } */ -/* { dg-add-options arm_arch_v8m_base } */ int __attribute__ ((cmse_nonsecure_call)) (*bar) (int); diff --git a/gcc/testsuite/gcc.target/arm/cmse/baseline/cmse-13.c b/gcc/testsuite/gcc.target/arm/cmse/baseline/cmse-13.c index 7208a2cedd2f4f8296b2801d6f5e5d7838b26551..7ab3219e860e993e2eca3bbee2e885f59b7b3cb4 100644 --- a/gcc/testsuite/gcc.target/arm/cmse/baseline/cmse-13.c +++ b/gcc/testsuite/gcc.target/arm/cmse/baseline/cmse-13.c @@ -1,7 +1,5 @@ /* { dg-do compile } */ /* { dg-options "-mcmse" } */ -/* { dg-require-effective-target arm_arch_v8m_base_ok } */ -/* { dg-add-options arm_arch_v8m_base } */ #include "../cmse-13.x" diff --git a/gcc/testsuite/gcc.target/arm/cmse/baseline/cmse-2.c b/gcc/testsuite/gcc.target/arm/cmse/baseline/cmse-2.c index fec7dc10484b14db5796f5f431a9306c3b2e307c..d5115ecf2bdb3e87dc6a92244cb204e753f25b07 100644 --- a/gcc/testsuite/gcc.target/arm/cmse/baseline/cmse-2.c +++ b/gcc/testsuite/gcc.target/arm/cmse/baseline/cmse-2.c @@ -1,7 +1,5 @@ /* { dg-do compile } */ /* { dg-options "-mcmse" } */ -/* { dg-require-effective-target arm_arch_v8m_base_ok } */ -/* { dg-add-options arm_arch_v8m_base } */ extern float bar (void); diff --git a/gcc/testsuite/gcc.target/arm/cmse/baseline/cmse-6.c b/gcc/testsuite/gcc.target/arm/cmse/baseline/cmse-6.c index 
43d45e7a63e56edfebc203c8f0e516dc13fbbd65..cae4f343621d1a19a8893ea4950d33e5e1842fb5 100644 --- a/gcc/testsuite/gcc.target/arm/cmse/baseline/cmse-6.c +++ b/gcc/testsuite/gcc.target/arm/cmse/baseline/cmse-6.c @@ -1,7 +1,5 @@ /* { dg-do compile } */ /* { dg-options "-mcmse" } */ -/* { dg-require-effective-target arm_arch_v8m_base_ok } */ -/* { dg-add-options arm_arch_v8m_base } */ int __attribute__ ((cmse_nonsecure_call)) (*bar) (double); diff --git a/gcc/testsuite/gcc.target/arm/cmse/baseline/softfp.c b/gcc/testsuite/gcc.target/arm/cmse/baseline/softfp.c index ca76e12cd9287fd12b7eb7add638973f5d314939..3d383ff6ee17677120e3e1e81726785c30f3b25c 100644 --- a/gcc/testsuite/gcc.target/arm/cmse/baseline/softfp.c +++ b/gcc/testsuite/gcc.target/arm/cm
[PATCH, GCC/testsuite/ARM] Rework expectation for call to Armv8-M nonsecure function
Hi, Testcase gcc.target/arm/cmse/cmse-14.c checks whether bar is called via __gnu_cmse_nonsecure_call libcall and not via a direct call. However, the pattern is a bit surprising in that it needs to explicitly allow "by" due to allowing anything before the 'b'. This patch rewrites the logic to look for b as a first non-whitespace letter followed by anything (to match bl and conditional branches) followed by some spaces and then bar. ChangeLog entry is as follows: *** gcc/ChangeLog *** 2017-11-01 Thomas Preud'homme * gcc.target/arm/cmse/cmse-14.c: Change logic to match branch instruction to bar. Testing: Test still passes for both Armv8-M Baseline and Mainline. Is this ok for trunk? Best regards, Thomas diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse-14.c b/gcc/testsuite/gcc.target/arm/cmse/cmse-14.c index 701e9ee7e318a07278099548f9b7042a1fde1204..df1ea52bec533c36a738d7d3b2b2ff749b0f3713 100644 --- a/gcc/testsuite/gcc.target/arm/cmse/cmse-14.c +++ b/gcc/testsuite/gcc.target/arm/cmse/cmse-14.c @@ -10,4 +10,4 @@ int foo (void) } /* { dg-final { scan-assembler "bl\t__gnu_cmse_nonsecure_call" } } */ -/* { dg-final { scan-assembler-not "b\[^ y\n\]*\\s+bar" } } */ +/* { dg-final { scan-assembler-not "^(.*\\s)?bl?\[^\\s]*\\s+bar" } } */
[PATCH, GCC/ARM] Use bitmap to control cmse_nonsecure_call register clearing
Hi, As part of r253256, cmse_nonsecure_entry_clear_before_return has been rewritten to use auto_sbitmap instead of an integer bitfield to control which registers need to be cleared. This commit continues this work in cmse_nonsecure_call_clear_caller_saved. ChangeLog entry is as follows: *** gcc/ChangeLog *** 2017-10-16 Thomas Preud'homme * config/arm/arm.c (cmse_nonsecure_call_clear_caller_saved): Use auto_sbitmap instead of integer bitfield to control register needing clearing. Testing: bootstrapped on arm-linux-gnueabihf and no regression in the testsuite. Is this ok for trunk? Best regards, Thomas diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 9919f54242d9317125a104f9777d76a85de80e9b..7384b96fea0179334a6010b099df68c8e2a0fc32 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -16990,10 +16990,11 @@ cmse_nonsecure_call_clear_caller_saved (void) FOR_BB_INSNS (bb, insn) { - uint64_t to_clear_mask, float_mask; + unsigned address_regnum, regno, maxregno = + TARGET_HARD_FLOAT_ABI ? D7_VFP_REGNUM : NUM_ARG_REGS - 1; + auto_sbitmap to_clear_bitmap (maxregno + 1); rtx_insn *seq; rtx pat, call, unspec, reg, cleared_reg, tmp; - unsigned int regno, maxregno; rtx address; CUMULATIVE_ARGS args_so_far_v; cumulative_args_t args_so_far; @@ -17024,18 +17025,21 @@ cmse_nonsecure_call_clear_caller_saved (void) continue; /* Determine the caller-saved registers we need to clear. */ - to_clear_mask = (1LL << (NUM_ARG_REGS)) - 1; - maxregno = NUM_ARG_REGS - 1; + bitmap_clear (to_clear_bitmap); + bitmap_set_range (to_clear_bitmap, R0_REGNUM, NUM_ARG_REGS); + /* Only look at the caller-saved floating point registers in case of -mfloat-abi=hard. For -mfloat-abi=softfp we will be using the lazy store and loads which clear both caller- and callee-saved registers. 
*/ if (TARGET_HARD_FLOAT_ABI) { - float_mask = (1LL << (D7_VFP_REGNUM + 1)) - 1; - float_mask &= ~((1LL << FIRST_VFP_REGNUM) - 1); - to_clear_mask |= float_mask; - maxregno = D7_VFP_REGNUM; + auto_sbitmap float_bitmap (maxregno + 1); + + bitmap_clear (float_bitmap); + bitmap_set_range (float_bitmap, FIRST_VFP_REGNUM, +D7_VFP_REGNUM - FIRST_VFP_REGNUM + 1); + bitmap_ior (to_clear_bitmap, to_clear_bitmap, float_bitmap); } /* Make sure the register used to hold the function address is not @@ -17043,7 +17047,9 @@ cmse_nonsecure_call_clear_caller_saved (void) address = RTVEC_ELT (XVEC (unspec, 0), 0); gcc_assert (MEM_P (address)); gcc_assert (REG_P (XEXP (address, 0))); - to_clear_mask &= ~(1LL << REGNO (XEXP (address, 0))); + address_regnum = REGNO (XEXP (address, 0)); + if (address_regnum < R0_REGNUM + NUM_ARG_REGS) + bitmap_clear_bit (to_clear_bitmap, address_regnum); /* Set basic block of call insn so that df rescan is performed on insns inserted here. */ @@ -17064,6 +17070,7 @@ cmse_nonsecure_call_clear_caller_saved (void) FOREACH_FUNCTION_ARGS (fntype, arg_type, args_iter) { rtx arg_rtx; + uint64_t to_clear_args_mask; machine_mode arg_mode = TYPE_MODE (arg_type); if (VOID_TYPE_P (arg_type)) @@ -17076,10 +17083,18 @@ cmse_nonsecure_call_clear_caller_saved (void) arg_rtx = arm_function_arg (args_so_far, arg_mode, arg_type, true); gcc_assert (REG_P (arg_rtx)); - to_clear_mask - &= ~compute_not_to_clear_mask (arg_type, arg_rtx, - REGNO (arg_rtx), - padding_bits_to_clear_ptr); + to_clear_args_mask + = compute_not_to_clear_mask (arg_type, arg_rtx, + REGNO (arg_rtx), + padding_bits_to_clear_ptr); + if (to_clear_args_mask) + { + for (regno = R0_REGNUM; regno <= maxregno; regno++) + { + if (to_clear_args_mask & (1ULL << regno)) + bitmap_clear_bit (to_clear_bitmap, regno); + } + } first_param = false; } @@ -17138,7 +17153,7 @@ cmse_nonsecure_call_clear_caller_saved (void) call. 
*/ for (regno = R0_REGNUM; regno <= maxregno; regno++) { - if (!(to_clear_mask & (1LL << regno))) + if (!bitmap_bit_p (to_clear_bitmap, regno)) continue; /* If regno is an even vfp register and its successor is also to @@ -17147,7 +17162,7 @@ cmse_nonsecure_call_clear_caller_saved (void) { if (TARGET_VFP_DOUBLE && VFP_REGNO_OK_FOR_DOUBLE (regno) - && to_clear_mask & (1LL << (regno + 1))) + && bitmap_bit_p (to_clear_bitmap, (regno + 1))) emit_move_insn (gen_rtx_REG (DFmode, regno++), CONST0_RTX (DFmode)); else @@ -17161,7 +17176,6 @@ cmse_nonsecure_call_clear_caller_saved (void) seq = get_insns (); end_sequence (); emit_insn_before (seq, insn); - } } } @@ -25188,7 +25202,7 @@ cmse_nonsecure_entry_clear_before_return (void) if (padding_bits_to_clear != 0) { rtx
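To make the mask-to-bitmap conversion more concrete, here is a small standalone sketch (not GCC code) comparing the removed shift-and-mask way of selecting a register range with setting the same range one bit at a time, which is conceptually what bitmap_set_range does. The function names and the register numbers in TOY_FIRST_REGNUM and TOY_LAST_REGNUM are hypothetical placeholders; they are not the real values of FIRST_VFP_REGNUM or D7_VFP_REGNUM.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register numbers, for illustration only.  */
enum { TOY_FIRST_REGNUM = 16, TOY_LAST_REGNUM = 31 };

/* Old style: build the mask for registers FIRST..LAST with shifts, in the
   same way as the removed float_mask computation.  Only valid while
   LAST + 1 is smaller than the width of the integer type.  */
uint64_t
range_mask_by_shifts (unsigned first, unsigned last)
{
  assert (first <= last && last < 63);
  uint64_t mask = (UINT64_C (1) << (last + 1)) - 1;
  mask &= ~((UINT64_C (1) << first) - 1);
  return mask;
}

/* New style, conceptually: set COUNT consecutive bits starting at START,
   one bit at a time.  */
uint64_t
range_mask_by_set_range (unsigned start, unsigned count)
{
  assert (start + count <= 64);
  uint64_t mask = 0;
  for (unsigned i = 0; i < count; i++)
    mask |= UINT64_C (1) << (start + i);
  return mask;
}
```

Both functions describe the same set of registers; the second form takes a start and a count, matching the argument order used in the bitmap_set_range calls in the diff.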
[PATCH, GCC/ARM] Factor out CMSE register clearing code
Hi, Functions cmse_nonsecure_call_clear_caller_saved and cmse_nonsecure_entry_clear_before_return both contain very similar code to clear registers. What's worse, they differ slightly at times, so if a bug is found in one, careful thought is needed to decide whether the other function needs fixing too. This commit addresses the situation by factoring the two pieces of code into a new function. In doing so, the code generated to clear VFP registers in cmse_nonsecure_call now uses the same sequence as cmse_nonsecure_entry functions. Test expectations are thus updated accordingly. ChangeLog entries are as follows: *** gcc/ChangeLog *** 2017-10-24 Thomas Preud'homme * config/arm/arm.c (cmse_clear_registers): New function. (cmse_nonsecure_call_clear_caller_saved): Replace register clearing code by call to cmse_clear_registers. (cmse_nonsecure_entry_clear_before_return): Likewise. *** gcc/ChangeLog *** 2017-10-24 Thomas Preud'homme * gcc.target/arm/cmse/mainline/hard-sp/cmse-13.c: Adapt expectations to vmov instructions now generated. * gcc.target/arm/cmse/mainline/hard-sp/cmse-7.c: Likewise. * gcc.target/arm/cmse/mainline/hard-sp/cmse-8.c: Likewise. * gcc.target/arm/cmse/mainline/hard/cmse-13.c: Likewise. * gcc.target/arm/cmse/mainline/hard/cmse-7.c: Likewise. * gcc.target/arm/cmse/mainline/hard/cmse-8.c: Likewise. Testing: bootstrapped on arm-linux-gnueabihf and no regression in the testsuite. Is this ok for trunk? Best regards, Thomas diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 9b494e9529a4470c18192a4561e03d2f80e90797..22c9add0722974902b2a89b2b0a75759ff8ba37c 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -16991,6 +16991,128 @@ compute_not_to_clear_mask (tree arg_type, rtx arg_rtx, int regno, return not_to_clear_mask; } +/* Clear registers secret before doing a cmse_nonsecure_call or returning from + a cmse_nonsecure_entry function. 
TO_CLEAR_BITMAP indicates which registers + are to be fully cleared, using the value in register CLEARING_REG if more + efficient. The PADDING_BITS_LEN entries array PADDING_BITS_TO_CLEAR gives + the bits that needs to be cleared in caller-saved core registers, with + SCRATCH_REG used as a scratch register for that clearing. + + NOTE: one of three following assertions must hold: + - SCRATCH_REG is a low register + - CLEARING_REG is in the set of registers fully cleared (ie. its bit is set + in TO_CLEAR_BITMAP) + - CLEARING_REG is a low register. */ + +static void +cmse_clear_registers (sbitmap to_clear_bitmap, uint32_t *padding_bits_to_clear, + int padding_bits_len, rtx scratch_reg, rtx clearing_reg) +{ + bool saved_clearing = false; + rtx saved_clearing_reg = NULL_RTX; + int i, regno, clearing_regno, minregno = R0_REGNUM, maxregno = minregno - 1; + + gcc_assert (arm_arch_cmse); + + if (!bitmap_empty_p (to_clear_bitmap)) +{ + minregno = bitmap_first_set_bit (to_clear_bitmap); + maxregno = bitmap_last_set_bit (to_clear_bitmap); +} + clearing_regno = REGNO (clearing_reg); + + /* Clear padding bits. */ + gcc_assert (padding_bits_len <= NUM_ARG_REGS); + for (i = 0, regno = R0_REGNUM; i < padding_bits_len; i++, regno++) +{ + uint64_t mask; + rtx rtx16, dest, cleared_reg = gen_rtx_REG (SImode, regno); + + if (padding_bits_to_clear[i] == 0) + continue; + + /* If this is a Thumb-1 target and SCRATCH_REG is not a low register, use + CLEARING_REG as scratch. */ + if (TARGET_THUMB1 + && REGNO (scratch_reg) > LAST_LO_REGNUM) + { + /* clearing_reg is not to be cleared, copy its value into scratch_reg + such that we can use clearing_reg to clear the unused bits in the + arguments. 
*/ + if ((clearing_regno > maxregno + || !bitmap_bit_p (to_clear_bitmap, clearing_regno)) + && !saved_clearing) + { + gcc_assert (clearing_regno <= LAST_LO_REGNUM); + emit_move_insn (scratch_reg, clearing_reg); + saved_clearing = true; + saved_clearing_reg = scratch_reg; + } + scratch_reg = clearing_reg; + } + + /* Fill the lower half of the negated padding_bits_to_clear[i]. */ + mask = (~padding_bits_to_clear[i]) & 0xFFFF; + emit_move_insn (scratch_reg, gen_int_mode (mask, SImode)); + + /* Fill the top half of the negated padding_bits_to_clear[i]. */ + mask = (~padding_bits_to_clear[i]) >> 16; + rtx16 = gen_int_mode (16, SImode); + dest = gen_rtx_ZERO_EXTRACT (SImode, scratch_reg, rtx16, rtx16); + if (mask) + emit_insn (gen_rtx_SET (dest, gen_int_mode (mask, SImode))); + + emit_insn (gen_andsi3 (cleared_reg, cleared_reg, scratch_reg)); +} + if (saved_clearing) +emit_move_insn (clearing_reg, saved_clearing_reg); + + + /* Clear full registers. */ + + /* If not marked for clearing, clearing_reg already does not contain + any secret. */ + if (clearing_regno <= ma
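The padding-clearing step in this hunk can be modelled in plain C. The sketch below is not the RTL-emitting code: clear_padding_bits is a hypothetical helper that only mirrors the arithmetic, building the negated padding mask in a scratch value in two 16-bit halves (lower half first, then the top half written into bits 16 to 31) and ANDing it into the register value. It assumes the lower-half mask is 0xFFFF.

```c
#include <stdint.h>

/* Hypothetical model of clearing the padding bits of one core register.
   REG_VALUE is the register content, PADDING_BITS_TO_CLEAR has a 1 for
   every bit that must end up as zero.  */
uint32_t
clear_padding_bits (uint32_t reg_value, uint32_t padding_bits_to_clear)
{
  /* Lower half of the negated padding mask, as loaded by the first move.  */
  uint32_t scratch = (~padding_bits_to_clear) & 0xFFFF;

  /* Top half of the negated mask; only written when it is non-zero, which
     matches the "if (mask)" guard around the ZERO_EXTRACT set.  */
  uint32_t top = (~padding_bits_to_clear) >> 16;
  if (top)
    scratch = (scratch & 0xFFFF) | (top << 16);

  /* Keep only the bits that are not padding.  */
  return reg_value & scratch;
}
```

Skipping the second write when the top half is zero is safe in this model because the first assignment already leaves bits 16 to 31 of the scratch value cleared.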
[PATCH, GCC/ARM] Do not clobber r4 in Armv8-M nonsecure call
Hi, Expanders for Armv8-M nonsecure call unnecessarily clobber r4 despite the libcall they perform not writing to r4. Furthermore, the requirement for the branch target address to be in r4 as expected by the libcall is modeled in a convoluted way in the define_insn patterns: the address is a register match_operand constrained by the match_dup for the clobber which is guaranteed to be r4 due to the expander. This patch simplifies all this by simply requiring the address to be in r4 and removing the clobbers. Expanders are left alone because cmse_nonsecure_call_clear_caller_saved relies on branch target memory attributes which would be lost if expanding to reg:SI R4_REGNUM. ChangeLog entry is as follows: *** gcc/ChangeLog *** 2017-10-24 Thomas Preud'homme * config/arm/arm.md (R4_REGNUM): Define constant. (nonsecure_call_internal): Remove r4 clobber. (nonsecure_call_value_internal): Likewise. * config/arm/thumb1.md (nonsecure_call_reg_thumb1_v5): Remove second clobber and resequence match_operands. (nonsecure_call_value_reg_thumb1_v5): Likewise. * config/arm/thumb2.md (nonsecure_call_reg_thumb2): Likewise. (nonsecure_call_value_reg_thumb2): Likewise. Testing: Bootstrapped on arm-linux-gnueabihf and testsuite shows no regression. Is this ok for trunk? 
Best regards, Thomas diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index ddb9d8f359007c1d86d497aef0ff5fc0e4061813..6b0794ede9fbc5a4f41e1f4a92acb9b649a277bc 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -30,6 +30,7 @@ (define_constants [(R0_REGNUM 0) ; First CORE register (R1_REGNUM 1) ; Second CORE register + (R4_REGNUM 4) ; Fifth CORE register (IP_REGNUM 12) ; Scratch register (SP_REGNUM 13) ; Stack pointer (LR_REGNUM 14) ; Return address register @@ -8118,14 +8119,13 @@ UNSPEC_NONSECURE_MEM) (match_operand 1 "general_operand" "")) (use (match_operand 2 "" "")) - (clobber (reg:SI LR_REGNUM)) - (clobber (reg:SI 4))])] + (clobber (reg:SI LR_REGNUM))])] "use_cmse" " { rtx tmp; tmp = copy_to_suggested_reg (XEXP (operands[0], 0), - gen_rtx_REG (SImode, 4), + gen_rtx_REG (SImode, R4_REGNUM), SImode); operands[0] = replace_equiv_address (operands[0], tmp); @@ -8210,14 +8210,13 @@ UNSPEC_NONSECURE_MEM) (match_operand 2 "general_operand" ""))) (use (match_operand 3 "" "")) - (clobber (reg:SI LR_REGNUM)) - (clobber (reg:SI 4))])] + (clobber (reg:SI LR_REGNUM))])] "use_cmse" " { rtx tmp; tmp = copy_to_suggested_reg (XEXP (operands[1], 0), - gen_rtx_REG (SImode, 4), + gen_rtx_REG (SImode, R4_REGNUM), SImode); operands[1] = replace_equiv_address (operands[1], tmp); diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md index 5d196a673355a7acf7d0ed30f21b997b815913f5..f91659386bf240172bd9a3076722683c8a50dff4 100644 --- a/gcc/config/arm/thumb1.md +++ b/gcc/config/arm/thumb1.md @@ -1732,12 +1732,11 @@ ) (define_insn "*nonsecure_call_reg_thumb1_v5" - [(call (unspec:SI [(mem:SI (match_operand:SI 0 "register_operand" "l*r"))] + [(call (unspec:SI [(mem:SI (reg:SI R4_REGNUM))] UNSPEC_NONSECURE_MEM) - (match_operand 1 "" "")) - (use (match_operand 2 "" "")) - (clobber (reg:SI LR_REGNUM)) - (clobber (match_dup 0))] + (match_operand 0 "" "")) + (use (match_operand 1 "" "")) + (clobber (reg:SI LR_REGNUM))] "TARGET_THUMB1 && use_cmse && 
!SIBLING_CALL_P (insn)" "bl\\t__gnu_cmse_nonsecure_call" [(set_attr "length" "4") @@ -1779,12 +1778,11 @@ (define_insn "*nonsecure_call_value_reg_thumb1_v5" [(set (match_operand 0 "" "") (call (unspec:SI - [(mem:SI (match_operand:SI 1 "register_operand" "l*r"))] + [(mem:SI (reg:SI R4_REGNUM))] UNSPEC_NONSECURE_MEM) - (match_operand 2 "" ""))) - (use (match_operand 3 "" "")) - (clobber (reg:SI LR_REGNUM)) - (clobber (match_dup 1))] + (match_operand 1 "" ""))) + (use (match_operand 2 "" "")) + (clobber (reg:SI LR_REGNUM))] "TARGET_THUMB1 && use_cmse" "bl\\t__gnu_cmse_nonsecure_call" [(set_attr "length" "4") diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md index 776d611d2538e790a5f504995050ffdfc51d7193..d56a8bd167575263edc2a4b3f66bda34a4a7a72a 100644 --- a/gcc/config/arm/thumb2.md +++ b/gcc/config/arm/thumb2.md @@ -555,12 +555,11 @@ ) (define_insn "*nonsecure_call_reg_thumb2" - [(call (unspec:SI [(mem:SI (match_operand:SI 0 "s_register_operand" "r"))] + [(call (unspec:SI [(mem:SI (reg:SI R4_REGNUM))] UNSPEC_NONSECURE_MEM) - (match_operand 1 "" "")) - (use (match_operand 2 "" "")) - (clobber (reg:SI LR_REGNUM)) - (clobber (match_dup 0))] + (match_operand 0 "" "")) + (use (match_operand 1 "" "")) + (clobber (reg:SI LR_REGNUM))] "TARGET_THUMB2 && use_cmse" "bl\\t__gnu_cmse_nonsecure_ca
Re: [PATCH] Use bswap framework in store-merging (PR tree-optimization/78821)
Hi Jakub, On 16/11/17 17:06, Jakub Jelinek wrote: Hi! This patch uses the bswap pass framework inside of the store merging pass to handle adjacent stores which produce together a 16/32/64 bit store of bswapped value (loaded or from SSA_NAME) or identity (usually only from SSA_NAME, the code prefers to use the existing store merging code if coming from identity load, because it e.g. can handle arbitrary sizes, not just 16/32/64 bits). There are small tweaks to the bswap code to make it usable inside of the store merging pass. Then when processing the stores, we record what find_bswap_or_nop_1 returns and do a small sanity check on it, and when doing coalesce_immediate_stores (i.e. the splitting into groups), we try for 64-bit, 32-bit and 16-bit sizes if we can extend/shift (according to endianity) and perform_symbolic_merge them together. If it is possible, we turn those 2+ adjacent stores that make together {64,32,16} bits into a separate group and process it specially later (we need to treat it as a single store rather than multiple, so split_group is only very lightweight for that case). Nice, the two finally merged! I took a look at the bswap part and it all looked good to me code and comment wise. I only have one small nit regarding a space/tab change (see below). Bootstrapped/regtested on {x86_64,i686,powerpc64le,powerpc64}-linux, ok for trunk? The cases this patch can handle are less common than rhs_code INTEGER_CST (stores of constants to adjacent memory) or MEM_REF (adjacent memory copying), but are more common than the bitwise ops, during combined x86_64+i686 bootstraps/regtests it triggered: lrotate_expr 974 2528 nop_expr 720 1711 (lrotate_expr stands for bswap, nop_expr for identity, the first column is the actual count of such new stores, the second is the original number of stores that have been optimized this way). 
Are you saying that lrotate_expr is just the title and it also includes 32- and 64-bit bswap or is it only the count of lrotate_expr nodes? 2017-11-16 Jakub Jelinek PR tree-optimization/78821 * gimple-ssa-store-merging.c (find_bswap_or_nop_load): Give up if base is TARGET_MEM_REF. If base is not MEM_REF, set base_addr to the address of the base rather than the base itself. (find_bswap_or_nop_1): Just use pointer comparison for vuse check. (find_bswap_or_nop_finalize): New function. (find_bswap_or_nop): Use it. (bswap_replace): Return a tree rather than bool, change first argument from gimple * to gimple_stmt_iterator, allow inserting into an empty sequence, allow ins_stmt to be NULL - then emit all stmts into gsi. Fix up MEM_REF address gimplification. (pass_optimize_bswap::execute): Adjust bswap_replace caller. Formatting fix. (struct store_immediate_info): Add N and INS_STMT non-static data members. (store_immediate_info::store_immediate_info): Initialize them from newly added ctor args. (merged_store_group::apply_stores): Formatting fixes. Sort by bitpos at the end. (stmts_may_clobber_ref_p): For stores call also refs_anti_dependent_p. (gather_bswap_load_refs): New function. (imm_store_chain_info::try_coalesce_bswap): New method. (imm_store_chain_info::coalesce_immediate_stores): Use it. (split_group): Handle LROTATE_EXPR and NOP_EXPR rhs_code specially. (imm_store_chain_info::output_merged_store): Fail if number of new estimated stmts is bigger or equal than old. Handle LROTATE_EXPR and NOP_EXPR rhs_code. (pass_store_merging::process_store): Compute n and ins_stmt, if ins_stmt is non-NULL and the store rhs is otherwise invalid, use LROTATE_EXPR rhs_code. Pass n and ins_stmt to store_immediate_info ctor. (pass_store_merging::execute): Calculate dominators. * gcc.dg/store_merging_16.c: New test. 
--- gcc/gimple-ssa-store-merging.c.jj 2017-11-16 10:45:09.239185205 +0100 +++ gcc/gimple-ssa-store-merging.c 2017-11-16 15:34:08.560080214 +0100 @@ -369,7 +369,10 @@ find_bswap_or_nop_load (gimple *stmt, tr base_addr = get_inner_reference (ref, &bitsize, &bitpos, &offset, &mode, &unsignedp, &reversep, &volatilep); - if (TREE_CODE (base_addr) == MEM_REF) + if (TREE_CODE (base_addr) == TARGET_MEM_REF) +/* Do not rewrite TARGET_MEM_REF. */ +return false; + else if (TREE_CODE (base_addr) == MEM_REF) { offset_int bit_offset = 0; tree off = TREE_OPERAND (base_addr, 1); @@ -401,6 +404,8 @@ find_bswap_or_nop_load (gimple *stmt, tr bitpos += bit_offset.to_shwi (); } + else +base_addr = build_fold_addr_expr (base_addr); if (bitpos % BITS_PER_UNIT) return false; @@ -743,8 +748,7 @@ find_bswap_or_nop_1 (gimple *stmt, struc if (TYPE_PRECISION (n1.type) != TYPE_PRECIS
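As an illustration of the kind of source this optimization targets, the following sketch shows adjacent byte stores that together form a 32-bit store of a byte-swapped value or of the value itself. The function names store_be32 and store_le32 are hypothetical and do not come from the patch or its testcase. On a little-endian target the first function is the bswap case and the second the identity case; whether a given compiler actually merges these stores depends on its version, the target and the optimization flags, so treat this as an assumption rather than a guarantee.

```c
#include <stdint.h>

/* Four adjacent byte stores that together write X in big-endian order.
   On a little-endian target this is equivalent to one 32-bit store of the
   byte-swapped value.  */
void
store_be32 (unsigned char *p, uint32_t x)
{
  p[0] = (unsigned char) (x >> 24);
  p[1] = (unsigned char) (x >> 16);
  p[2] = (unsigned char) (x >> 8);
  p[3] = (unsigned char) x;
}

/* Four adjacent byte stores that together write X in little-endian order.
   On a little-endian target this is equivalent to one plain 32-bit store
   (the identity case).  */
void
store_le32 (unsigned char *p, uint32_t x)
{
  p[0] = (unsigned char) x;
  p[1] = (unsigned char) (x >> 8);
  p[2] = (unsigned char) (x >> 16);
  p[3] = (unsigned char) (x >> 24);
}
```

The observable result is the same whether or not the stores are merged; only the number of store instructions differs.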
Re: [PATCH, GCC/ARM] Fix cmse_nonsecure_entry return insn size
Hi Kyrill, On 09/11/17 14:26, Kyrill Tkachov wrote: Hi Thomas, On 08/11/17 09:50, Thomas Preudhomme wrote: Hi, A number of instructions are output in assembler form by output_return_instruction () when compiling a function with the cmse_nonsecure_entry attribute for Armv8-M Mainline with hardfloat float ABI. However, the corresponding thumb2_cmse_entry_return insn pattern does not account for all these instructions when computing the length of the instruction. This may lead GCC to use the wrong branching instruction due to incorrect computation of the offset between the branch instruction's address and the target address. This commit fixes the mismatch between what output_return_instruction () does and what the pattern thinks it does, and adds a note warning about mismatch in the affected functions' heading comments to ensure code does not get out of sync again. Note: no test is provided because the C testcase is fragile (only works on GCC 6) and the extracted RTL test fails to compile due to bugs in the RTL frontend (PR82815 and PR82817). ChangeLog entries are as follows: *** gcc/ChangeLog *** 2017-10-30 Thomas Preud'homme * config/arm/arm.c (output_return_instruction): Add comments to indicate requirement for cmse_nonsecure_entry return to account for the size of clearing instruction output here. (thumb_exit): Likewise. * config/arm/thumb2.md (thumb2_cmse_entry_return): Fix length for return in hardfloat mode. Testing: Bootstrapped on arm-linux-gnueabihf and testsuite shows no regression. Is this ok for trunk? Ok for trunk and for the branches after a few days. I've committed the patch to gcc-7-branch (see attached) after another round of testing since nobody reported a regression since. Thanks. 
Best regards, Thomas diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 989957f048e3c757ef4665d0387ecdc66d26a7dd..7b3f4c1011dc37cb01654f70cfbffadd57d382ec 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -19316,7 +19316,12 @@ arm_get_vfp_saved_size (void) /* Generate a function exit sequence. If REALLY_RETURN is false, then do everything bar the final return instruction. If simple_return is true, - then do not output epilogue, because it has already been emitted in RTL. */ + then do not output epilogue, because it has already been emitted in RTL. + + Note: do not forget to update length attribute of corresponding insn pattern + when changing assembly output (eg. length attribute of + thumb2_cmse_entry_return when updating Armv8-M Mainline Security Extensions + register clearing sequences). */ const char * output_return_instruction (rtx operand, bool really_return, bool reverse, bool simple_return) @@ -23809,7 +23814,12 @@ thumb_pop (FILE *f, unsigned long mask) /* Generate code to return from a thumb function. If 'reg_containing_return_addr' is -1, then the return address is - actually on the stack, at the stack pointer. */ + actually on the stack, at the stack pointer. + + Note: do not forget to update length attribute of corresponding insn pattern + when changing assembly output (eg. length attribute of epilogue_insns when + updating Armv8-M Baseline Security Extensions register clearing + sequences). */ static void thumb_exit (FILE *f, int reg_containing_return_addr) { diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md index 2e7580f220eae1524fef69719b1796f50f5cf27c..35f8e9bbf24058c129cbb117c74d1a4bebbf9f38 100644 --- a/gcc/config/arm/thumb2.md +++ b/gcc/config/arm/thumb2.md @@ -1132,7 +1132,7 @@ ; we adapt the length accordingly. 
(set (attr "length") (if_then_else (match_test "TARGET_HARD_FLOAT") - (const_int 12) + (const_int 34) (const_int 8))) ; We do not support predicate execution of returns from cmse_nonsecure_entry ; functions because we need to clear the APSR. Since predicable has to be
Re: [PATCH, GCC/ARM] Use bitmap to control cmse_nonsecure_call register clearing
Thanks Kyrill. Committed the attached rebased patch (same patch but without the last hunk because a better fix was done in an earlier commit).

Best regards, Thomas

On 22/11/17 11:57, Kyrill Tkachov wrote:

Hi Thomas,

On 15/11/17 17:08, Thomas Preudhomme wrote:

Hi,

As part of r253256, cmse_nonsecure_entry_clear_before_return has been rewritten to use auto_sbitmap instead of an integer bitfield to control which registers need to be cleared. This commit continues this work in cmse_nonsecure_call_clear_caller_saved.

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2017-10-16 Thomas Preud'homme

* config/arm/arm.c (cmse_nonsecure_call_clear_caller_saved): Use auto_sbitmap instead of integer bitfield to control register needing clearing.

Testing: bootstrapped on arm-linux-gnueabihf and no regression in the testsuite.

Is this ok for trunk?

Ok for trunk. Thanks for this conversion. It's much easier to understand the code without having to think about the bitmasks and shifts.

Kyrill

Best regards, Thomas

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 106e3edce0d6f2518eb391c436c5213a78d1275b..092cd61d49382101bce9b8c5f04de31965dcdc77 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -17007,10 +17007,11 @@ cmse_nonsecure_call_clear_caller_saved (void) FOR_BB_INSNS (bb, insn) { - uint64_t to_clear_mask, float_mask; + unsigned address_regnum, regno, maxregno = + TARGET_HARD_FLOAT_ABI ? D7_VFP_REGNUM : NUM_ARG_REGS - 1; + auto_sbitmap to_clear_bitmap (maxregno + 1); rtx_insn *seq; rtx pat, call, unspec, reg, cleared_reg, tmp; - unsigned int regno, maxregno; rtx address; CUMULATIVE_ARGS args_so_far_v; cumulative_args_t args_so_far; @@ -17041,18 +17042,21 @@ cmse_nonsecure_call_clear_caller_saved (void) continue; /* Determine the caller-saved registers we need to clear. 
*/ - to_clear_mask = (1LL << (NUM_ARG_REGS)) - 1; - maxregno = NUM_ARG_REGS - 1; + bitmap_clear (to_clear_bitmap); + bitmap_set_range (to_clear_bitmap, R0_REGNUM, NUM_ARG_REGS); + /* Only look at the caller-saved floating point registers in case of -mfloat-abi=hard. For -mfloat-abi=softfp we will be using the lazy store and loads which clear both caller- and callee-saved registers. */ if (TARGET_HARD_FLOAT_ABI) { - float_mask = (1LL << (D7_VFP_REGNUM + 1)) - 1; - float_mask &= ~((1LL << FIRST_VFP_REGNUM) - 1); - to_clear_mask |= float_mask; - maxregno = D7_VFP_REGNUM; + auto_sbitmap float_bitmap (maxregno + 1); + + bitmap_clear (float_bitmap); + bitmap_set_range (float_bitmap, FIRST_VFP_REGNUM, +D7_VFP_REGNUM - FIRST_VFP_REGNUM + 1); + bitmap_ior (to_clear_bitmap, to_clear_bitmap, float_bitmap); } /* Make sure the register used to hold the function address is not @@ -17060,7 +17064,9 @@ cmse_nonsecure_call_clear_caller_saved (void) address = RTVEC_ELT (XVEC (unspec, 0), 0); gcc_assert (MEM_P (address)); gcc_assert (REG_P (XEXP (address, 0))); - to_clear_mask &= ~(1LL << REGNO (XEXP (address, 0))); + address_regnum = REGNO (XEXP (address, 0)); + if (address_regnum < R0_REGNUM + NUM_ARG_REGS) + bitmap_clear_bit (to_clear_bitmap, address_regnum); /* Set basic block of call insn so that df rescan is performed on insns inserted here. 
*/ @@ -17081,6 +17087,7 @@ cmse_nonsecure_call_clear_caller_saved (void) FOREACH_FUNCTION_ARGS (fntype, arg_type, args_iter) { rtx arg_rtx; + uint64_t to_clear_args_mask; machine_mode arg_mode = TYPE_MODE (arg_type); if (VOID_TYPE_P (arg_type)) @@ -17093,10 +17100,18 @@ cmse_nonsecure_call_clear_caller_saved (void) arg_rtx = arm_function_arg (args_so_far, arg_mode, arg_type, true); gcc_assert (REG_P (arg_rtx)); - to_clear_mask - &= ~compute_not_to_clear_mask (arg_type, arg_rtx, - REGNO (arg_rtx), - padding_bits_to_clear_ptr); + to_clear_args_mask + = compute_not_to_clear_mask (arg_type, arg_rtx, + REGNO (arg_rtx), + padding_bits_to_clear_ptr); + if (to_clear_args_mask) + { + for (regno = R0_REGNUM; regno <= maxregno; regno++) + { + if (to_clear_args_mask & (1ULL << regno)) + bitmap_clear_bit (to_clear_bitmap, regno); + } + } first_param = false; } @@ -17155,7 +17170,7 @@ cmse_nonsecure_call_clear_caller_saved (void) call. */ for (regno = R0_REGNUM; regno <= maxregno; regno++) { - if (!(to_clear_mask & (1LL << regno))) + if (!bitmap_bit_p (to_clear_bitmap, regno)) continue; /* If regno is an even vfp register and its successor is also to @@ -17164,7 +17179,7 @@ cmse_nonsecure_call_clear_caller_saved (void) { if (TARGET_VFP_DOUBLE &&
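
To illustrate the kind of conversion described in this message, here is a minimal sketch in plain C that builds the same register set twice: once with hand-written mask arithmetic and once with range-based bitmap helpers. The `regset` type, its `regset_*` helpers, `regs_to_clear_mask`, `regs_to_clear_set`, `same_registers` and the register numbers are all hypothetical stand-ins chosen for the example; they are not GCC's sbitmap API and the logic is simplified compared with the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register numbering, for illustration only.  */
enum { FIRST_CORE_REG = 0, NUM_ARG_REGS = 4,
       FIRST_FP_REG = 16, LAST_FP_REG = 31, MAX_REGS = 128 };

/* Toy fixed-size bitmap; a stand-in for a real bitmap type.  */
typedef struct { uint64_t words[MAX_REGS / 64]; } regset;

static void
regset_clear (regset *s)
{
  memset (s, 0, sizeof *s);
}

static void
regset_set_range (regset *s, unsigned first, unsigned count)
{
  assert (first + count <= MAX_REGS);
  for (unsigned r = first; r < first + count; r++)
    s->words[r / 64] |= UINT64_C (1) << (r % 64);
}

static void
regset_clear_bit (regset *s, unsigned r)
{
  assert (r < MAX_REGS);
  s->words[r / 64] &= ~(UINT64_C (1) << (r % 64));
}

static bool
regset_bit_p (const regset *s, unsigned r)
{
  assert (r < MAX_REGS);
  return (s->words[r / 64] >> (r % 64)) & 1;
}

/* Bitfield version: shifts and masks are worked out by hand and the set
   cannot grow past 64 registers.  */
static uint64_t
regs_to_clear_mask (unsigned address_reg)
{
  assert (address_reg < 64);
  uint64_t mask = (UINT64_C (1) << NUM_ARG_REGS) - 1;
  uint64_t fp = (UINT64_C (1) << (LAST_FP_REG + 1)) - 1;
  fp &= ~((UINT64_C (1) << FIRST_FP_REG) - 1);
  mask |= fp;
  mask &= ~(UINT64_C (1) << address_reg);
  return mask;
}

/* Bitmap version: the same set expressed as ranges and single bits.  */
static void
regs_to_clear_set (regset *s, unsigned address_reg)
{
  regset_clear (s);
  regset_set_range (s, FIRST_CORE_REG, NUM_ARG_REGS);
  regset_set_range (s, FIRST_FP_REG, LAST_FP_REG - FIRST_FP_REG + 1);
  regset_clear_bit (s, address_reg);
}

/* Return true if both versions select exactly the same registers.  */
bool
same_registers (unsigned address_reg)
{
  regset s;
  regs_to_clear_set (&s, address_reg);
  uint64_t mask = regs_to_clear_mask (address_reg);
  for (unsigned r = 0; r < MAX_REGS; r++)
    {
      bool in_mask = r < 64 && ((mask >> r) & 1);
      if (regset_bit_p (&s, r) != in_mask)
        return false;
    }
  return true;
}
```

Calling `same_registers (2)` compares the two encodings for a call whose address lives in register 2. The bitmap variant reads closer to the intent ("these ranges, minus that register"), which is the readability gain mentioned in the review.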
[PATCH, GCC/ARM] Remove useless variable in CMSE code
Hi, Functions cmse_nonsecure_call_clear_caller_saved () and cmse_nonsecure_entry_clear_before_return () use a separate variable holding a pointer to padding_bits_to_clear array's first entry which is used when calling function compute_not_to_clear_mask (). This does not save space over using &padding_bits_to_clear[0] directly so this commit gets rid of it. ChangeLog entry is as follows: *** gcc/ChangeLog *** 2017-11-08 Thomas Preud'homme * config/arm/arm.c (cmse_nonsecure_call_clear_caller_saved): Get rid of padding_bits_to_clear_ptr. (cmse_nonsecure_entry_clear_before_return): Likewise. Testing: Bootstrapped an arm-none-linux-gnueabihf compiler and regression test does not show any regression. Committed as obvious. Best regards, Thomas diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 7384b96fea0179334a6010b099df68c8e2a0fc32..bcb708c1b316ea08969e118fb0949b941ff19c27 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -17002,7 +17002,6 @@ cmse_nonsecure_call_clear_caller_saved (void) bool using_r4, first_param = true; function_args_iterator args_iter; uint32_t padding_bits_to_clear[4] = {0U, 0U, 0U, 0U}; - uint32_t * padding_bits_to_clear_ptr = &padding_bits_to_clear[0]; if (!NONDEBUG_INSN_P (insn)) continue; @@ -17086,7 +17085,7 @@ cmse_nonsecure_call_clear_caller_saved (void) to_clear_args_mask = compute_not_to_clear_mask (arg_type, arg_rtx, REGNO (arg_rtx), - padding_bits_to_clear_ptr); + &padding_bits_to_clear[0]); if (to_clear_args_mask) { for (regno = R0_REGNUM; regno <= maxregno; regno++) @@ -25134,7 +25133,6 @@ cmse_nonsecure_entry_clear_before_return (void) { int regno, maxregno = TARGET_HARD_FLOAT ? 
LAST_VFP_REGNUM : IP_REGNUM; uint32_t padding_bits_to_clear = 0; - uint32_t * padding_bits_to_clear_ptr = &padding_bits_to_clear; auto_sbitmap to_clear_bitmap (maxregno + 1); tree result_type; rtx result_rtl; @@ -25187,7 +25185,7 @@ cmse_nonsecure_entry_clear_before_return (void) gcc_assert (REG_P (result_rtl)); to_clear_return_mask = compute_not_to_clear_mask (result_type, result_rtl, 0, - padding_bits_to_clear_ptr); + &padding_bits_to_clear); if (to_clear_return_mask) { gcc_assert ((unsigned) maxregno < sizeof (long long) * __CHAR_BIT__);
Re: [PATCH, GCC/ARM] Factor out CMSE register clearing code
On 22/11/17 14:45, Kyrill Tkachov wrote:

Hi Thomas,

On 15/11/17 17:12, Thomas Preudhomme wrote:

Hi,

Functions cmse_nonsecure_call_clear_caller_saved and cmse_nonsecure_entry_clear_before_return both contain very similar code to clear registers. What's worse, they differ slightly at times, so if a bug is found in one, careful thought is needed to decide whether the other function needs fixing too. This commit addresses the situation by factoring the two pieces of code into a new function.

In doing so, the code generated to clear VFP registers in cmse_nonsecure_call now uses the same sequence as cmse_nonsecure_entry functions. Test expectations are thus updated accordingly.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2017-10-24 Thomas Preud'homme

* config/arm/arm.c (cmse_clear_registers): New function.
(cmse_nonsecure_call_clear_caller_saved): Replace register clearing code by call to cmse_clear_registers.
(cmse_nonsecure_entry_clear_before_return): Likewise.

*** gcc/testsuite/ChangeLog ***

2017-10-24 Thomas Preud'homme

* gcc.target/arm/cmse/mainline/hard-sp/cmse-13.c: Adapt expectations to vmov instructions now generated.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-8.c: Likewise.
* gcc.target/arm/cmse/mainline/hard/cmse-13.c: Likewise.
* gcc.target/arm/cmse/mainline/hard/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/hard/cmse-8.c: Likewise.

Testing: bootstrapped on arm-linux-gnueabihf and no regression in the testsuite.

Is this ok for trunk?

This looks mostly ok, but I have a concern from reading the code that I'd like some help with...

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 9b494e9529a4470c18192a4561e03d2f80e90797..22c9add0722974902b2a89b2b0a75759ff8ba37c 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -16991,6 +16991,128 @@ compute_not_to_clear_mask (tree arg_type, rtx arg_rtx, int regno,
   return not_to_clear_mask;
 }
+/* Clear registers secret before doing a cmse_nonsecure_call or returning from
+   a cmse_nonsecure_entry function. TO_CLEAR_BITMAP indicates which registers
+   are to be fully cleared, using the value in register CLEARING_REG if more
+   efficient. The PADDING_BITS_LEN entries array PADDING_BITS_TO_CLEAR gives
+   the bits that needs to be cleared in caller-saved core registers, with
+   SCRATCH_REG used as a scratch register for that clearing.
+
+   NOTE: one of three following assertions must hold:
+   - SCRATCH_REG is a low register
+   - CLEARING_REG is in the set of registers fully cleared (ie. its bit is set
+     in TO_CLEAR_BITMAP)
+   - CLEARING_REG is a low register. */
+
+static void
+cmse_clear_registers (sbitmap to_clear_bitmap, uint32_t *padding_bits_to_clear,
+                      int padding_bits_len, rtx scratch_reg, rtx clearing_reg)
+{
+  bool saved_clearing = false;
+  rtx saved_clearing_reg = NULL_RTX;
+  int i, regno, clearing_regno, minregno = R0_REGNUM, maxregno = minregno - 1;
+

Here minregno becomes 0 and maxregno becomes -1...

+  gcc_assert (arm_arch_cmse);
+
+  if (!bitmap_empty_p (to_clear_bitmap))
+    {
+      minregno = bitmap_first_set_bit (to_clear_bitmap);
+      maxregno = bitmap_last_set_bit (to_clear_bitmap);
+    }

...and here is a path on which maxregno may not be set to a proper register number...

If the bitmap is empty, yes, i.e. if no bit is set and no register should be cleared.

+
+  for (regno = minregno; regno <= maxregno; regno++)
+    {
+      if (!bitmap_bit_p (to_clear_bitmap, regno))
+        continue;
+

...and here we iterate from minregno (potentially 0) to maxregno (potentially -1) which will lead to trouble. Are there any guarantees that this case will not occur?

It absolutely does occur and that's on purpose. If maxregno is -1 it means there is no bit to clear, so the loop body never runs and it is fine to do nothing.

Best regards, Thomas
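
For readers who want to convince themselves of the point made in the reply, here is a small standalone sketch of the same loop shape. It is not the GCC code: the function name `count_visited` and the use of a plain 32-bit mask in place of a bitmap are assumptions made for the example. The defaults `minregno = 0` and `maxregno = -1` describe an empty inclusive range, so the loop runs zero times when nothing is set.

```c
#include <assert.h>
#include <stdint.h>

/* Return how many registers the clearing loop would visit for the set of
   registers encoded in TO_CLEAR (bit r set means register r is cleared).  */
int
count_visited (uint32_t to_clear)
{
  /* Empty inclusive range by default: maxregno < minregno.  */
  int minregno = 0, maxregno = minregno - 1;
  int visited = 0;

  /* Stand-in for the first-set-bit / last-set-bit queries.  */
  for (int r = 0; r < 32; r++)
    if (to_clear & (UINT32_C (1) << r))
      {
        if (maxregno < minregno)
          minregno = r;		/* First set bit.  */
        maxregno = r;		/* Last set bit seen so far.  */
      }

  /* With no bit set the range is still [0, -1].  */
  assert (to_clear != 0 || maxregno == -1);

  for (int regno = minregno; regno <= maxregno; regno++)
    {
      if (!(to_clear & (UINT32_C (1) << regno)))
        continue;
      visited++;
    }
  return visited;
}
```

The sketch relies on the bounds being signed `int`, as they are in the quoted patch. With an unsigned type, `minregno - 1` would wrap around to a very large value and the empty-range trick would no longer hold.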
[arm-embedded] [PATCH, GCC/LTO, ping] Fix PR69866: LTO with def for weak alias in regular object file
Hi,

We have decided to apply the forwarded patch to the embedded-7-branch to fix an ICE when doing partial LTO with weak symbols.

ChangeLog entry is as follows:

2017-11-28 Thomas Preud'homme

Backport from mainline
2017-06-15 Jan Hubicka
Thomas Preud'homme

PR lto/69866
* lto-symtab.c (lto_symtab_merge_symbols): Drop useless definitions that resolved externally.

Backport from mainline
2017-06-15 Thomas Preud'homme

PR lto/69866
* gcc.dg/lto/pr69866_0.c: New test.
* gcc.dg/lto/pr69866_1.c: Likewise.

Best regards, Thomas

--- Begin Message ---

Hi,

I am testing the following. Let me know if it works for you.

Honza

Index: lto/lto-symtab.c
===
--- lto/lto-symtab.c (revision 249213)
+++ lto/lto-symtab.c (working copy)
@@ -952,6 +952,42 @@
           if (tgt)
             node->resolve_alias (tgt, true);
         }
+      /* If the symbol was preempted outside IR, see if we want to get rid
+         of the definition. */
+      if (node->analyzed
+          && !DECL_EXTERNAL (node->decl)
+          && (node->resolution == LDPR_PREEMPTED_REG
+              || node->resolution == LDPR_RESOLVED_IR
+              || node->resolution == LDPR_RESOLVED_EXEC
+              || node->resolution == LDPR_RESOLVED_DYN))
+        {
+          DECL_EXTERNAL (node->decl) = 1;
+          /* If alias to local symbol was preempted by external definition,
+             we know it is not pointing to the local symbol. Remove it. */
+          if (node->alias
+              && !node->weakref
+              && !node->transparent_alias
+              && node->get_alias_target ()->binds_to_current_def_p ())
+            {
+              node->alias = false;
+              node->remove_all_references ();
+              node->definition = false;
+              node->analyzed = false;
+              node->cpp_implicit_alias = false;
+            }
+          else if (!node->alias
+                   && node->definition
+                   && node->get_availability () <= AVAIL_INTERPOSABLE)
+            {
+              if ((cnode = dyn_cast <cgraph_node *> (node)) != NULL)
+                cnode->reset ();
+              else
+                {
+                  node->analyzed = node->definition = false;
+                  node->remove_all_references ();
+                }
+            }
+        }
       if (!(cnode = dyn_cast <cgraph_node *> (node))
           || !cnode->clone_of

--- End Message ---
[PATCH, GCC/testsuite] Improve fstack_protector effective target
Hi,

Effective target fstack_protector fails to return an error for newlib-based targets (such as arm-none-eabi targets), which do not support stack protector. This is because the test is too simplistic for GCC to generate any stack protection code: it does not contain a local buffer and does not read unknown input. This commit adds a small local buffer holding a copy of the filename so that stack protector code is generated. The filename is used instead of the full path so as to ensure the size will fit in the local buffer.

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2017-11-28 Thomas Preud'homme

* lib/target-supports.exp (check_effective_target_fstack_protector): Copy filename in local buffer to trigger stack protection.

Testing: Ran gcc.dg/pr38616 on arm-none-eabi and arm-linux-gnueabihf; the former is now UNSUPPORTED while the latter continues to PASS.

Is this ok for stage3?

Best regards, Thomas

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index d30fd368922713d3695f22710197ce7094c977cd..8aff16a25823ec48e76ad6ad8fdc8db998a45877 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -1064,7 +1064,11 @@ proc check_effective_target_static {} {
 # Return 1 if the target supports -fstack-protector
 proc check_effective_target_fstack_protector {} {
     return [check_runtime fstack_protector {
-	int main (void) { return 0; }
+	#include <string.h>
+	int main (int argc, char *argv[]) {
+	  char buf[64];
+	  return !strcpy (buf, strrchr (argv[0], '/'));
+	}
     } "-fstack-protector"]
 }
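
As a rough illustration of the idea behind the new probe, the sketch below shows a function with a local character buffer filled from caller-supplied data, which is the kind of code that typically makes -fstack-protector insert a canary. It is a variant written for this note, not the committed test: the function name `basename_length` is hypothetical, and unlike the probe it tolerates a path without any '/' and checks the length before copying.

```c
#include <assert.h>
#include <string.h>

/* Copy the last component of PATH into a local buffer and return its
   length.  The local char array filled from external input is what is
   expected to trigger stack protector instrumentation.  */
size_t
basename_length (const char *path)
{
  char buf[64];
  const char *slash = strrchr (path, '/');
  /* strrchr returns NULL when PATH contains no '/'.  */
  const char *name = slash ? slash + 1 : path;

  /* Using only the file name keeps the copy within the buffer.  */
  assert (strlen (name) < sizeof buf);
  strcpy (buf, name);
  return strlen (buf);
}
```

Compiling a file containing this function with and without -fstack-protector and comparing the assembly is one way to check whether a given toolchain emits the guard setup and check.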
[PATCH, GCC/testsuite] Fix dump-noaddr dumpbase
Hi,

The dump-noaddr test FAILs when $tmpdir is not the same as the directory where runtest is called from. Note that this does not happen when running make check because tmpdir is set to srcdir. When the two directories differ, file mkdir creates the directory in the current directory while GCC is invoked from tmpdir, and hence -dumpbase looks for dump1 and dump2 relative to tmpdir. This patch forces dumpbase to be relative to tmpdir, which will work in all cases.

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2017-12-05 Thomas Preud'homme

* gcc.c-torture/unsorted/dump-noaddr.x (dump_compare): Set dump base relative to tmpdir.

Testing: Successfully ran unsorted.exp via make check and out of tree testing using runtest from /test with tmpdir set in /test/site.exp to .

Is this ok for stage3?

Best regards, Thomas

diff --git a/gcc/testsuite/gcc.c-torture/unsorted/dump-noaddr.x b/gcc/testsuite/gcc.c-torture/unsorted/dump-noaddr.x
index d14d494570944b2be82c2575204cdbf4b15721ca..68d6c3e38325cabbdd280ecf05e663dbcda99900 100644
--- a/gcc/testsuite/gcc.c-torture/unsorted/dump-noaddr.x
+++ b/gcc/testsuite/gcc.c-torture/unsorted/dump-noaddr.x
@@ -11,10 +11,10 @@ proc dump_compare { src options } {
     foreach option $option_list {
 	file delete -force dump1
 	file mkdir dump1
-	c-torture-compile $src "$option $options -dumpbase dump1/$dumpbase -DMASK=1 -x c --param ggc-min-heapsize=1 -fdump-ipa-all -fdump-rtl-all -fdump-tree-all -fdump-noaddr"
+	c-torture-compile $src "$option $options -dumpbase [pwd]/dump1/$dumpbase -DMASK=1 -x c --param ggc-min-heapsize=1 -fdump-ipa-all -fdump-rtl-all -fdump-tree-all -fdump-noaddr"
 	file delete -force dump2
 	file mkdir dump2
-	c-torture-compile $src "$option $options -dumpbase dump2/$dumpbase -DMASK=2 -x c -fdump-ipa-all -fdump-rtl-all -fdump-tree-all -fdump-noaddr"
+	c-torture-compile $src "$option $options -dumpbase [pwd]/dump2/$dumpbase -DMASK=2 -x c -fdump-ipa-all -fdump-rtl-all -fdump-tree-all -fdump-noaddr"
 	foreach dump1 [lsort [glob -nocomplain dump1/*]] {
 	    regsub dump1/ $dump1 dump2/ dump2
 	    set dumptail "gcc.c-torture/unsorted/[file tail $dump1]"
Re: [PATCH, GCC/testsuite] Fix dump-noaddr dumpbase
On 05/12/17 17:54, Andrew Pinski wrote:

On Tue, Dec 5, 2017 at 9:50 AM, Thomas Preudhomme wrote:

Hi,

The dump-noaddr test FAILs when $tmpdir is not the same as the directory where runtest is called from. Note that this does not happen when running make check because tmpdir is set to srcdir. When the two directories differ, file mkdir creates the directory in the current directory while GCC is invoked from tmpdir, and hence -dumpbase looks for dump1 and dump2 relative to tmpdir. This patch forces dumpbase to be relative to tmpdir, which will work in all cases.

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2017-12-05 Thomas Preud'homme

* gcc.c-torture/unsorted/dump-noaddr.x (dump_compare): Set dump base relative to tmpdir.

Testing: Successfully ran unsorted.exp via make check and out of tree testing using runtest from /test with tmpdir set in /test/site.exp to .

Is this ok for stage3?

https://gcc.gnu.org/ml/gcc-patches/2012-06/msg01752.html

I don't remember where this discussion went last time. Maybe this time there will be a resolution :).

FWIW, I agree with Matt: creating the dump in tmpdir makes more sense. I think his patch can be simplified though, because the compiler seems to be invoked from tmpdir so it can at least be omitted from the -dumpbase.

Best regards, Thomas
Re: [PATCH, GCC/testsuite] Fix dump-noaddr dumpbase
Hi Mike,

Thanks, I've tested after the two commits and it works both in tree and out of tree. It'll simplify comparing in-tree results vs out-of-tree results for us, thanks a lot!

Would you consider a backport to stable branches if nobody complains after a week?

Best regards, Thomas

On 05/12/17 19:27, Mike Stump wrote:

On Dec 5, 2017, at 11:11 AM, Thomas Preudhomme wrote:

On 05/12/17 17:54, Andrew Pinski wrote:

On Tue, Dec 5, 2017 at 9:50 AM, Thomas Preudhomme wrote:

Hi,

The dump-noaddr test FAILs when $tmpdir is not the same as the directory where runtest is called from. Note that this does not happen when running make check because tmpdir is set to srcdir. When the two directories differ, file mkdir creates the directory in the current directory while GCC is invoked from tmpdir, and hence -dumpbase looks for dump1 and dump2 relative to tmpdir. This patch forces dumpbase to be relative to tmpdir, which will work in all cases.

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2017-12-05 Thomas Preud'homme

* gcc.c-torture/unsorted/dump-noaddr.x (dump_compare): Set dump base relative to tmpdir.

Testing: Successfully ran unsorted.exp via make check and out of tree testing using runtest from /test with tmpdir set in /test/site.exp to .

Is this ok for stage3?

https://gcc.gnu.org/ml/gcc-patches/2012-06/msg01752.html

I don't remember where this discussion went last time. Maybe this time there will be a resolution :).

FWIW, I agree with Matt: creating the dump in tmpdir makes more sense. I think his patch can be simplified though, because the compiler seems to be invoked from tmpdir so it can at least be omitted from the -dumpbase.

Sounds reasonable. I've added that on top of his patch and checked that in. Let us know if it works or not.
[PATCH, GCC/ARM] Multilib mapping for Armv8-R
Hi, Due to there being no multilib mapping for Armv8-R, default multilib targeting -march=armv4t with softfloat floating-point arithmetic is being used. This patch maps it instead to the existing Armv7 multilibs. Note that since there is no single-precision multilib compatible with R profile, -march=armv8-r+fp.sp is mapped to -march=armv7 ie. Armv7 with softfloat floating-point. Changelog entry is as follows: *** gcc/ChangeLog *** 2018-02-12 Thomas Preud'homme * config/arm/t-multilib: Map Armv8-R to Armv7 multilibs. Testing: Ran -print-multi-directory for all combinations of extensions one can pass to -march=armv8-r (including no extension but only considering a single ordering of extension). All gave the expected result. Details in appendix. Is this ok for stage4? Best regards, Thomas Appendix: output of -print-multi-directory for all extensions available to -march=armv8-r % for ext in "" +crc +fp.sp +simd +crypto +crc+fp.sp +crc+simd +crc+crypto +fp.sp+simd +fp.sp+crypto +simd+crypto +crc+fp.sp+simd +crc+fp.sp+crypto +crc+simd+crypto +fp.sp+simd+crypto +crc+fp.sp+simd+crypto ; do cmd="arm-none-eabi-gcc -march=armv8-r${ext} -mfloat-abi=soft -print-multi-directory" ; echo -n "$cmd: " ; eval $cmd ; done arm-none-eabi-gcc -march=armv8-r -mfloat-abi=soft -print-multi-directory: thumb/v7/nofp arm-none-eabi-gcc -march=armv8-r+crc -mfloat-abi=soft -print-multi-directory: thumb/v7/nofp arm-none-eabi-gcc -march=armv8-r+fp.sp -mfloat-abi=soft -print-multi-directory: thumb/v7/nofp arm-none-eabi-gcc -march=armv8-r+simd -mfloat-abi=soft -print-multi-directory: thumb/v7/nofp arm-none-eabi-gcc -march=armv8-r+crypto -mfloat-abi=soft -print-multi-directory: thumb/v7/nofp arm-none-eabi-gcc -march=armv8-r+crc+fp.sp -mfloat-abi=soft -print-multi-directory: thumb/v7/nofp arm-none-eabi-gcc -march=armv8-r+crc+simd -mfloat-abi=soft -print-multi-directory: thumb/v7/nofp arm-none-eabi-gcc -march=armv8-r+crc+crypto -mfloat-abi=soft -print-multi-directory: thumb/v7/nofp arm-none-eabi-gcc 
-march=armv8-r+fp.sp+simd -mfloat-abi=soft -print-multi-directory: thumb/v7/nofp arm-none-eabi-gcc -march=armv8-r+fp.sp+crypto -mfloat-abi=soft -print-multi-directory: thumb/v7/nofp arm-none-eabi-gcc -march=armv8-r+simd+crypto -mfloat-abi=soft -print-multi-directory: thumb/v7/nofp arm-none-eabi-gcc -march=armv8-r+crc+fp.sp+simd -mfloat-abi=soft -print-multi-directory: thumb/v7/nofp arm-none-eabi-gcc -march=armv8-r+crc+fp.sp+crypto -mfloat-abi=soft -print-multi-directory: thumb/v7/nofp arm-none-eabi-gcc -march=armv8-r+crc+simd+crypto -mfloat-abi=soft -print-multi-directory: thumb/v7/nofp arm-none-eabi-gcc -march=armv8-r+fp.sp+simd+crypto -mfloat-abi=soft -print-multi-directory: thumb/v7/nofp arm-none-eabi-gcc -march=armv8-r+crc+fp.sp+simd+crypto -mfloat-abi=soft -print-multi-directory: thumb/v7/nofp % for ext in "" +crc +fp.sp +simd +crypto +crc+fp.sp +crc+simd +crc+crypto +fp.sp+simd +fp.sp+crypto +simd+crypto +crc+fp.sp+simd +crc+fp.sp+crypto +crc+simd+crypto +fp.sp+simd+crypto +crc+fp.sp+simd+crypto ; do cmd="arm-none-eabi-gcc -march=armv8-r${ext} -mfloat-abi=softfp -print-multi-directory" ; echo -n "$cmd: " ; eval $cmd ; done arm-none-eabi-gcc -march=armv8-r -mfloat-abi=softfp -print-multi-directory: thumb/v7/nofp arm-none-eabi-gcc -march=armv8-r+crc -mfloat-abi=softfp -print-multi-directory: thumb/v7/nofp arm-none-eabi-gcc -march=armv8-r+fp.sp -mfloat-abi=softfp -print-multi-directory: thumb/v7/nofp arm-none-eabi-gcc -march=armv8-r+simd -mfloat-abi=softfp -print-multi-directory: thumb/v7+fp/softfp arm-none-eabi-gcc -march=armv8-r+crypto -mfloat-abi=softfp -print-multi-directory: thumb/v7+fp/softfp arm-none-eabi-gcc -march=armv8-r+crc+fp.sp -mfloat-abi=softfp -print-multi-directory: thumb/v7/nofp arm-none-eabi-gcc -march=armv8-r+crc+simd -mfloat-abi=softfp -print-multi-directory: thumb/v7+fp/softfp arm-none-eabi-gcc -march=armv8-r+crc+crypto -mfloat-abi=softfp -print-multi-directory: thumb/v7+fp/softfp arm-none-eabi-gcc -march=armv8-r+fp.sp+simd 
-mfloat-abi=softfp -print-multi-directory: thumb/v7+fp/softfp arm-none-eabi-gcc -march=armv8-r+fp.sp+crypto -mfloat-abi=softfp -print-multi-directory: thumb/v7+fp/softfp arm-none-eabi-gcc -march=armv8-r+simd+crypto -mfloat-abi=softfp -print-multi-directory: thumb/v7+fp/softfp arm-none-eabi-gcc -march=armv8-r+crc+fp.sp+simd -mfloat-abi=softfp -print-multi-directory: thumb/v7+fp/softfp arm-none-eabi-gcc -march=armv8-r+crc+fp.sp+crypto -mfloat-abi=softfp -print-multi-directory: thumb/v7+fp/softfp arm-none-eabi-gcc -march=armv8-r+crc+simd+crypto -mfloat-abi=softfp -print-multi-directory: thumb/v7+fp/softfp arm-none-eabi-gcc -march=armv8-r+fp.sp+simd+crypto -mfloat-abi=softfp -print-multi-directory: thumb/v7+fp/softfp arm-none-eabi-gcc -march=armv8-r+crc+fp.sp+simd+crypto -mfloat-abi=softfp -print-multi-directory: thumb/v7+fp/softfp % for ext in "" +crc +fp.sp +simd +crypto +crc+fp.sp +crc+simd +crc+crypto +fp.sp+simd +fp.sp+crypto +simd+cryp
Re: [PATCH, GCC/ARM] Multilib mapping for Armv8-R
/v7+fp/softfp arm-none-eabi-gcc -march=armv8-r+crc+simd+crypto -mfloat-abi=softfp -print-multi-directory: thumb/v7+fp/softfp arm-none-eabi-gcc -march=armv8-r+fp.sp+simd+crypto -mfloat-abi=softfp -print-multi-directory: thumb/v7+fp/softfp arm-none-eabi-gcc -march=armv8-r+crc+fp.sp+simd+crypto -mfloat-abi=softfp -print-multi-directory: thumb/v7+fp/softfp % for ext in "" +crc +fp.sp +simd +crypto +crc+fp.sp +crc+simd +crc+crypto +fp.sp+simd +fp.sp+crypto +simd+crypto +crc+fp.sp+simd +crc+fp.sp+crypto +crc+simd+crypto +fp.sp+simd+crypto +crc+fp.sp+simd+crypto ; do cmd="arm-none-eabi-gcc -march=armv8-r${ext} -mfloat-abi=hard -print-multi-directory" ; echo -n "$cmd: " ; eval $cmd ; done arm-none-eabi-gcc -march=armv8-r -mfloat-abi=hard -print-multi-directory: . arm-none-eabi-gcc -march=armv8-r+crc -mfloat-abi=hard -print-multi-directory: . arm-none-eabi-gcc -march=armv8-r+fp.sp -mfloat-abi=hard -print-multi-directory: . arm-none-eabi-gcc -march=armv8-r+simd -mfloat-abi=hard -print-multi-directory: thumb/v7+fp/hard arm-none-eabi-gcc -march=armv8-r+crypto -mfloat-abi=hard -print-multi-directory: thumb/v7+fp/hard arm-none-eabi-gcc -march=armv8-r+crc+fp.sp -mfloat-abi=hard -print-multi-directory: . 
arm-none-eabi-gcc -march=armv8-r+crc+simd -mfloat-abi=hard -print-multi-directory: thumb/v7+fp/hard arm-none-eabi-gcc -march=armv8-r+crc+crypto -mfloat-abi=hard -print-multi-directory: thumb/v7+fp/hard arm-none-eabi-gcc -march=armv8-r+fp.sp+simd -mfloat-abi=hard -print-multi-directory: thumb/v7+fp/hard arm-none-eabi-gcc -march=armv8-r+fp.sp+crypto -mfloat-abi=hard -print-multi-directory: thumb/v7+fp/hard arm-none-eabi-gcc -march=armv8-r+simd+crypto -mfloat-abi=hard -print-multi-directory: thumb/v7+fp/hard arm-none-eabi-gcc -march=armv8-r+crc+fp.sp+simd -mfloat-abi=hard -print-multi-directory: thumb/v7+fp/hard arm-none-eabi-gcc -march=armv8-r+crc+fp.sp+crypto -mfloat-abi=hard -print-multi-directory: thumb/v7+fp/hard arm-none-eabi-gcc -march=armv8-r+crc+simd+crypto -mfloat-abi=hard -print-multi-directory: thumb/v7+fp/hard arm-none-eabi-gcc -march=armv8-r+fp.sp+simd+crypto -mfloat-abi=hard -print-multi-directory: thumb/v7+fp/hard arm-none-eabi-gcc -march=armv8-r+crc+fp.sp+simd+crypto -mfloat-abi=hard -print-multi-directory: thumb/v7+fp/hard On 13/02/18 10:27, Kyrill Tkachov wrote: Hi Thomas, On 13/02/18 10:24, Thomas Preudhomme wrote: Hi, Due to there being no multilib mapping for Armv8-R, default multilib targeting -march=armv4t with softfloat floating-point arithmetic is being used. This patch maps it instead to the existing Armv7 multilibs. Note that since there is no single-precision multilib compatible with R profile, -march=armv8-r+fp.sp is mapped to -march=armv7 ie. Armv7 with softfloat floating-point. Thanks for doing this. Changelog entry is as follows: *** gcc/ChangeLog *** 2018-02-12 Thomas Preud'homme * config/arm/t-multilib: Map Armv8-R to Armv7 multilibs. Testing: Ran -print-multi-directory for all combinations of extensions one can pass to -march=armv8-r (including no extension but only considering a single ordering of extension). All gave the expected result. Details in appendix. Is this ok for stage4? 
Best regards, Thomas Appendix: output of -print-multi-directory for all extensions available to -march=armv8-r Can you please add a representative subset of these as tests to gcc.target/arm/multilib.exp. That way we can have the peace of mind that they have sane mappings as we go forward. diff --git a/gcc/config/arm/t-multilib b/gcc/config/arm/t-multilib index 2f790097670e1bf81b56b069a6b1582763aab6e9..cd5927a7c9ec053b4d5b9725f7b30daeca3b1aa3 100644 --- a/gcc/config/arm/t-multilib +++ b/gcc/config/arm/t-multilib @@ -70,6 +70,7 @@ v8_a_simd_variants := $(call all_feat_combs, simd crypto) v8_1_a_simd_variants := $(call all_feat_combs, simd crypto) v8_2_a_simd_variants := $(call all_feat_combs, simd fp16 fp16fml crypto dotprod) v8_4_a_simd_variants := $(call all_feat_combs, simd fp16 crypto) +v8_r_nosimd_variants := $(call all_feat_combs, crc fp.sp) ifneq (,$(HAS_APROFILE)) include $(srcdir)/config/arm/t-aprofile @@ -105,6 +106,20 @@ MULTILIB_MATCHES += march?armv7+fp=march?armv7-r+fp+idiv MULTILIB_MATCHES += $(foreach ARCH, $(all_early_arch), \ march?armv5te+fp=march?$(ARCH)+fp) +# +# Armv8-r: map down onto common v7 code. Please use Armv8-R. +# Note 1: there is no single-precision armv7 multilib so +fp.sp is mapped +# down to softfloat armv7 (second MULTILIB_MATCHES). +# Note 2: +fp.sp being a subset of +simd and +crypto, there is no need to +# consider the combination of +fp.sp with a simd extension since matching +# is run after canonicalization +MULTILIB_MATCHES += march?armv7=march?armv8-r +MULTILIB_MATCHES += $(foreach ARCH, $(v8_r_nosimd_variants), \ + march?armv7=march?armv8-r$(ARCH)) +MULTILIB_MATCHES += $(foreach ARCH
[PATCH, arm-embedded] Multilib mapping for Armv8-R
Hi, We have decided to apply the following patch to the ARM/embedded-7-branch to provide better multilib for Armv8-R targets. Due to there being no multilib mapping for Armv8-R, default multilib built for -march=armv4t with softfloat floating-point arithmetic is being used. This patch maps it instead to the existing Armv7 multilibs. Note that mapping for single-precision Armv8-R has been left out due to there being no Arm implementation of that architecture variant. Changelog entry is as follows: *** gcc/ChangeLog *** 2018-02-26 Thomas Preud'homme * config/arm/t-rmprofile: Map Armv8-R and Armv8-R with CRC extension to Armv7 multilibs. Testing: Ran -print-multi-directory for all combinations of -march=armv8-r/-march=armv8-r+crc with -mfpu=neon-fp-armv8/crypto-neon-fp-armv8. All gave the expected result. Details in appendix. Is this ok for stage4? Best regards, Thomas Appendix: output of -print-multi-directory for all supported Armv8-R configuration single precision FPU excepted. % for ext in "" +crc; do cmd="arm-none-eabi-gcc -march=armv8-r${ext} -mfloat-abi=soft -print-multi-directory" ; echo -n "$cmd: " ; eval $cmd ; done arm-none-eabi-gcc -march=armv8-r -mfloat-abi=soft -print-multi-directory: thumb/v7-ar arm-none-eabi-gcc -march=armv8-r+crc -mfloat-abi=soft -print-multi-directory: thumb/v7-ar % for ext in "" +crc; do for fpu in neon-fp-armv8 crypto-neon-fp-armv8; do cmd="arm-none-eabi-gcc -march=armv8-r${ext} -mfpu=${fpu} -mfloat-abi=softfp -print-multi-directory" ; echo -n "$cmd: " ; eval $cmd ; done ; done arm-none-eabi-gcc -march=armv8-r -mfpu=neon-fp-armv8 -mfloat-abi=softfp -print-multi-directory: thumb/v7-ar/fpv3/softfp arm-none-eabi-gcc -march=armv8-r -mfpu=crypto-neon-fp-armv8 -mfloat-abi=softfp -print-multi-directory: thumb/v7-ar/fpv3/softfp arm-none-eabi-gcc -march=armv8-r+crc -mfpu=neon-fp-armv8 -mfloat-abi=softfp -print-multi-directory: thumb/v7-ar/fpv3/softfp arm-none-eabi-gcc -march=armv8-r+crc -mfpu=crypto-neon-fp-armv8 -mfloat-abi=softfp 
-print-multi-directory: thumb/v7-ar/fpv3/softfp % for ext in "" +crc; do for fpu in neon-fp-armv8 crypto-neon-fp-armv8; do cmd="arm-none-eabi-gcc -march=armv8-r${ext} -mfpu=${fpu} -mfloat-abi=hard -print-multi-directory" ; echo -n "$cmd: " ; eval $cmd ; done ; done arm-none-eabi-gcc -march=armv8-r -mfpu=neon-fp-armv8 -mfloat-abi=hard -print-multi-directory: thumb/v7-ar/fpv3/hard arm-none-eabi-gcc -march=armv8-r -mfpu=crypto-neon-fp-armv8 -mfloat-abi=hard -print-multi-directory: thumb/v7-ar/fpv3/hard arm-none-eabi-gcc -march=armv8-r+crc -mfpu=neon-fp-armv8 -mfloat-abi=hard -print-multi-directory: thumb/v7-ar/fpv3/hard arm-none-eabi-gcc -march=armv8-r+crc -mfpu=crypto-neon-fp-armv8 -mfloat-abi=hard -print-multi-directory: thumb/v7-ar/fpv3/hard % for ext in "" +crc; do cmd="arm-none-eabi-gcc -mthumb -march=armv8-r${ext} -mfpu=${fpu} -mfloat-abi=soft -print-multi-directory" ; echo -n "$cmd: " ; eval $cmd ; done arm-none-eabi-gcc -mthumb -march=armv8-r -mfpu=crypto-neon-fp-armv8 -mfloat-abi=soft -print-multi-directory: . arm-none-eabi-gcc -mthumb -march=armv8-r+crc -mfpu=crypto-neon-fp-armv8 -mfloat-abi=soft -print-multi-directory: . 
% for ext in "" +crc; do for fpu in neon-fp-armv8 crypto-neon-fp-armv8; do cmd="arm-none-eabi-gcc -mthumb -march=armv8-r${ext} -mfpu=${fpu} -mfloat-abi=softfp -print-multi-directory" ; echo -n "$cmd: " ; eval $cmd ; done ; done arm-none-eabi-gcc -mthumb -march=armv8-r -mfpu=neon-fp-armv8 -mfloat-abi=softfp -print-multi-directory: thumb/v7-ar/fpv3/softfp arm-none-eabi-gcc -mthumb -march=armv8-r -mfpu=crypto-neon-fp-armv8 -mfloat-abi=softfp -print-multi-directory: thumb/v7-ar/fpv3/softfp arm-none-eabi-gcc -mthumb -march=armv8-r+crc -mfpu=neon-fp-armv8 -mfloat-abi=softfp -print-multi-directory: thumb/v7-ar/fpv3/softfp arm-none-eabi-gcc -mthumb -march=armv8-r+crc -mfpu=crypto-neon-fp-armv8 -mfloat-abi=softfp -print-multi-directory: thumb/v7-ar/fpv3/softfp % for ext in "" +crc; do for fpu in neon-fp-armv8 crypto-neon-fp-armv8; do cmd="arm-none-eabi-gcc -mthumb -march=armv8-r${ext} -mfpu=${fpu} -mfloat-abi=hard -print-multi-directory" ; echo -n "$cmd: " ; eval $cmd ; done ; done arm-none-eabi-gcc -mthumb -march=armv8-r -mfpu=neon-fp-armv8 -mfloat-abi=hard -print-multi-directory: thumb/v7-ar/fpv3/hard arm-none-eabi-gcc -mthumb -march=armv8-r -mfpu=crypto-neon-fp-armv8 -mfloat-abi=hard -print-multi-directory: thumb/v7-ar/fpv3/hard arm-none-eabi-gcc -mthumb -march=armv8-r+crc -mfpu=neon-fp-armv8 -mfloat-abi=hard -print-multi-directory: thumb/v7-ar/fpv3/hard arm-none-eabi-gcc -mthumb -march=armv8-r+crc -mfpu=crypto-neon-fp-armv8 -mfloat-abi=hard -print-multi-directory: thumb/v7-ar/fpv3/hard diff --git a/gcc/config/arm/t-rmprofile b/gcc/config/arm/t-rmprofile index d4bc9fde4c5544812bde4743ccc18d68c1c25132..a3a24d59fb29b42a36177bd2d2ebfae4e50e5a10 100644 --- a/gcc/config/arm/t-rmprofile +++ b/gcc/config/arm/t-rmprofile @@ -135,6 +135,8 @@ MULTILIB_MATCHES += ma
[arm-embedded] Allow -mcpu=cortex-m33+nodsp
Hi,

We decided to apply the following patch to ARM/embedded-7-branch to support -mcpu=cortex-m33+nodsp.

DSP instructions are optional for Arm Cortex-M33, yet its -mcpu option does not allow +nodsp. Users are thus left with using -march=armv8-m.main -mtune=cortex-m33. This patch creates a new CPU, cortex-m33+nodsp, since GCC 7 has no mechanism for CPU extensions. Since GCC passes the -mcpu parameter down to GAS verbatim and GAS does not support +nodsp for cortex-m33, this patch also special cases -mcpu=cortex-m33+nodsp in arm_file_start to output a .arch directive instead of .cpu.

2018-02-26 Thomas Preud'homme

* config/arm/arm-cpus.in (cortex-m33+nodsp): New CPU.
* config/arm/arm-cpu-cdata.h: Regenerate.
* config/arm/arm-cpu-data.h: Likewise.
* config/arm/arm-cpu.h: Likewise.
* config/arm/arm-tables.opt: Likewise.
* config/arm/arm-tune.md: Likewise.
* config/arm/arm.c (arm_file_start): Special case -mcpu=cortex-m33+nodsp to emit .arch armv8-m.main instead.
* doc/invoke.texi: Document cortex-m33+nodsp as a valid value for -mcpu and -mtune.

Testing: Compiled a hello world with -S -mcpu=cortex-m33 and with -S -mcpu=cortex-m33+nodsp and compared both assembly files. The latter correctly emits .arch armv8-m.main instead of .cpu cortex-m33.

Best regards,
Thomas

diff --git a/gcc/ChangeLog.arm b/gcc/ChangeLog.arm
index a98ecb028f6800a516f6cd252390ceac1e08911b..e09bd132d224aee511591143d86efff8bb156d60 100644
--- a/gcc/ChangeLog.arm
+++ b/gcc/ChangeLog.arm
@@ -1,3 +1,9 @@
+2018-02-26 Thomas Preud'homme
+
+ * config/arm/arm-cpus.in (cortex-m33+nodsp): Define.
+ * doc/invoke.texi: Document +nodsp as a valid extension for
+ -mcpu=cortex-m33.
+ 2017-11-23 Thomas Preud'homme Cherry-pick from GCC 7 diff --git a/gcc/config/arm/arm-cpu-cdata.h b/gcc/config/arm/arm-cpu-cdata.h index 27571c841d928fe9c331006bfc9608c4e75b60d8..f5e34c830ca28196ded0912c230f719a6ff5681e 100644 --- a/gcc/config/arm/arm-cpu-cdata.h +++ b/gcc/config/arm/arm-cpu-cdata.h @@ -789,6 +789,13 @@ static const struct arm_arch_core_flag arm_arch_core_flags[] = }, }, { +"cortex-m33+nodsp", +{ + ISA_ARMv8m_main, + isa_nobit +}, + }, + { "cortex-r52", { ISA_ARMv8r,isa_bit_crc32, diff --git a/gcc/config/arm/arm-cpu-data.h b/gcc/config/arm/arm-cpu-data.h index e474efa02ed93a93ae00ac2057a9bc841c48b87f..30902ecabc6c72e46e6f6aa1d92b9980fd639dcd 100644 --- a/gcc/config/arm/arm-cpu-data.h +++ b/gcc/config/arm/arm-cpu-data.h @@ -1221,6 +1221,17 @@ static const struct processors all_cores[] = &arm_v7m_tune }, { +"cortex-m33+nodsp", +TARGET_CPU_cortexm33nodsp, +(TF_LDSCHED), +"8M_MAIN", BASE_ARCH_8M_MAIN, +{ + ISA_ARMv8m_main, + isa_nobit +}, +&arm_v7m_tune + }, + { "cortex-r52", TARGET_CPU_cortexr52, (TF_LDSCHED), diff --git a/gcc/config/arm/arm-cpu.h b/gcc/config/arm/arm-cpu.h index 502965081faa625abc93d97559517baf50972e1b..22566495fdf0da0ad75b81a5956eecb898c38684 100644 --- a/gcc/config/arm/arm-cpu.h +++ b/gcc/config/arm/arm-cpu.h @@ -130,6 +130,7 @@ enum processor_type TARGET_CPU_cortexa73cortexa53, TARGET_CPU_cortexm23, TARGET_CPU_cortexm33, + TARGET_CPU_cortexm33nodsp, TARGET_CPU_cortexr52, TARGET_CPU_arm_none }; diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in index 5f18dfb35687888bc7f642785693f75658a96733..7368a067db92b384f83fdb4a0af6cb77cff4e6f4 100644 --- a/gcc/config/arm/arm-cpus.in +++ b/gcc/config/arm/arm-cpus.in @@ -1090,6 +1090,13 @@ begin cpu cortex-m33 costs v7m end cpu cortex-m33 +begin cpu cortex-m33+nodsp + cname cortexm33nodsp + tune flags LDSCHED + architecture armv8-m.main + costs v7m +end cpu cortex-m33+nodsp + # V8 R-profile implementations. 
begin cpu cortex-r52 cname cortexr52 diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt index ede44f497edd69390bbbe6de5a913430b546c547..a46bc3c7f8ba6048969bae4d37a7be3c5242ce6a 100644 --- a/gcc/config/arm/arm-tables.opt +++ b/gcc/config/arm/arm-tables.opt @@ -349,6 +349,9 @@ EnumValue Enum(processor_type) String(cortex-m33) Value( TARGET_CPU_cortexm33) EnumValue +Enum(processor_type) String(cortex-m33+nodsp) Value( TARGET_CPU_cortexm33nodsp) + +EnumValue Enum(processor_type) String(cortex-r52) Value( TARGET_CPU_cortexr52) Enum diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md index 519c0556fe76a5a391cd268bb50541c77a4596d4..542b7972d21cd3c9986229e91ce0841522e3b52f 100644 --- a/gcc/config/arm/arm-tune.md +++ b/gcc/config/arm/arm-tune.md @@ -57,5 +57,5 @@ cortexa73,exynosm1,xgene1, cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35, cortexa73cortexa53,cortexm23,cortexm33, - cortexr52" + cortexm33nodsp,cortexr52" (const (symbol_ref "((enum attr_tune) arm_tune)"))) diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 8f2639f722b1c6a7a3541aa030221811f565fe5e..b37a8ae475489f2f12f8d0
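The arm.c hunk is cut off above, so the following is only a minimal sketch of the special case the message describes, not the actual arm_file_start code. The helper name format_cpu_directive and its buffer-based interface are assumptions made for illustration; only the two directives it emits come from the message.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not the real arm_file_start): writes the assembler
   directive for a given -mcpu value into BUF and returns the length that
   snprintf reports.  */
static int
format_cpu_directive (const char *mcpu, char *buf, size_t len)
{
  /* GAS does not accept "cortex-m33+nodsp", so name the architecture
     instead of the CPU.  */
  if (strcmp (mcpu, "cortex-m33+nodsp") == 0)
    return snprintf (buf, len, "\t.arch armv8-m.main\n");

  /* Any other CPU name is passed down verbatim.  */
  return snprintf (buf, len, "\t.cpu %s\n", mcpu);
}
```

Calling the helper with "cortex-m33" yields a .cpu line, while "cortex-m33+nodsp" yields the .arch line, which mirrors the comparison described under Testing.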
Re: [PATCH, GCC/testsuite] Fix dump-noaddr dumpbase
Finally committed to gcc-7-branch, sorry for doing this so late. I've merged the two commits into one. Patch attached for reference. Best regards, Thomas On 05/12/17 21:26, Mike Stump wrote: On Dec 5, 2017, at 12:56 PM, Thomas Preudhomme wrote: Thanks, I've tested after the two commits and it works both in tree and out of tree. It'll simplify comparing in tree results Vs out of tree for us, thanks a lot! Would you consider a backport to stable branches if nobody complains after a week? Yeah, back port is Ok. diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index b211dec4ffb20359f50bbc695481977282eb0525..b78c5f59bfc1121cf61071e41bd11551a9ab7122 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,12 @@ +2017-02-27 Thomas Preud'homme + + Backport from mainline + 2017-12-05 Matthew Gretton-Dann + with follow-up r255433 commit. + + * gcc.c-torture/unsorted/dump-noaddr.x: Generate dump files in + tmpdir. + 2018-02-26 Carl Love Backport from mainline: commit 257747 on 2018-02-16. 
diff --git a/gcc/testsuite/gcc.c-torture/unsorted/dump-noaddr.x b/gcc/testsuite/gcc.c-torture/unsorted/dump-noaddr.x index d14d494570944b2be82c2575204cdbf4b15721ca..e86f36a1861fc4dc46bd449d78403f510ec4b920 100644 --- a/gcc/testsuite/gcc.c-torture/unsorted/dump-noaddr.x +++ b/gcc/testsuite/gcc.c-torture/unsorted/dump-noaddr.x @@ -9,14 +9,14 @@ proc dump_compare { src options } { # loop through all the options foreach option $option_list { - file delete -force dump1 - file mkdir dump1 + file delete -force $tmpdir/dump1 + file mkdir $tmpdir/dump1 c-torture-compile $src "$option $options -dumpbase dump1/$dumpbase -DMASK=1 -x c --param ggc-min-heapsize=1 -fdump-ipa-all -fdump-rtl-all -fdump-tree-all -fdump-noaddr" - file delete -force dump2 - file mkdir dump2 + file delete -force $tmpdir/dump2 + file mkdir $tmpdir/dump2 c-torture-compile $src "$option $options -dumpbase dump2/$dumpbase -DMASK=2 -x c -fdump-ipa-all -fdump-rtl-all -fdump-tree-all -fdump-noaddr" - foreach dump1 [lsort [glob -nocomplain dump1/*]] { - regsub dump1/ $dump1 dump2/ dump2 + foreach dump1 [lsort [glob -nocomplain $tmpdir/dump1/*]] { + set dump2 "$tmpdir/dump2/[file tail $dump1]" set dumptail "gcc.c-torture/unsorted/[file tail $dump1]" regsub {\.\d+((t|r|i)\.[^.]+)$} $dumptail {.*\1} dumptail set tmp [ diff "$dump1" "$dump2" ] @@ -29,8 +29,8 @@ proc dump_compare { src options } { } } } -file delete -force dump1 -file delete -force dump2 +file delete -force $tmpdir/dump1 +file delete -force $tmpdir/dump2 } dump_compare $src $options
[PATCH, GCC/testsuite/ARM] Fix copysign_softfloat_1.c option directives
gcc.target/arm/copysign_softfloat_1.c's use of arm_arch_v6t2 in dg-add-options changes the architecture to -march=armv6t2. Since the test only requires a Thumb-2 capable architecture, we just need to add -mthumb on the command line, since arm_thumb2_ok guarantees by definition that doing so is enough to select Thumb-2. This fixes a warning on the command line when having -mcpu=cortex-m3 in RUNTESTFLAGS, for instance.

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2018-03-01 Thomas Preud'homme

diff --git a/gcc/testsuite/gcc.target/arm/copysign_softfloat_1.c b/gcc/testsuite/gcc.target/arm/copysign_softfloat_1.c
index fdbeeadc01e1c9b9a7810a8ff8b23c58f6c429a5..a14922f1c12aeb4a22ee38fde188691d5a89de81 100644
--- a/gcc/testsuite/gcc.target/arm/copysign_softfloat_1.c
+++ b/gcc/testsuite/gcc.target/arm/copysign_softfloat_1.c
@@ -1,7 +1,6 @@
 /* { dg-do run } */
 /* { dg-require-effective-target arm_thumb2_ok } */
-/* { dg-add-options arm_arch_v6t2 } */
-/* { dg-additional-options "-O2 --save-temps" } */
+/* { dg-additional-options "-mthumb -O2 --save-temps" } */
 extern void abort (void);
[PATCH, GCC/testsuite] Fix FAIL display for some scan-*-times directives
Hi,

The scan-assembler-times and scan-tree-dump-times DejaGnu directives show different output in the summary files depending on whether they PASS or FAIL. This means that dg-cmp-results would not show a regression because it would not see a connection between the two outputs. The difference comes from the FAIL line showing the actual number of times the pattern was matched, presumably to help debugging.

This patch moves the information about the actual number of matches into a separate verbose message. This keeps the test name unchanged between PASS and FAIL while still letting developers get the debug message with -v.

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2018-03-13 Thomas Preud'homme

* lib/scanasm.exp (scan-assembler-times): Move FAIL debug info into a separate verbose message.
* lib/scandump.exp (scan-dump-times): Likewise.

Testing: Made modified versions of gcc.dg/nand.c and gcc.dg/torture/pr61772.c so that their respective scan-assembler-times and scan-tree-dump-times directives FAIL. Without the patch dg-cmp-results does not flag any regression, but with the patch it does.

Is this ok for stage 4?
Best regards, Thomas diff --git a/gcc/testsuite/lib/scanasm.exp b/gcc/testsuite/lib/scanasm.exp index 3a775b0a812775193cf1181337a5b890cde74133..61e0f3f48aeea5785689c5df7a15dc2ccbc71029 100644 --- a/gcc/testsuite/lib/scanasm.exp +++ b/gcc/testsuite/lib/scanasm.exp @@ -266,7 +266,8 @@ proc scan-assembler-times { args } { if {$result_count == $times} { pass "$testcase scan-assembler-times $pp_pattern $times" } else { - fail "$testcase scan-assembler-times $pp_pattern $times (found $result_count times)" + verbose -log "$testcase: $pp_pattern found $result_count times" + fail "$testcase scan-assembler-times $pp_pattern $times" } } diff --git a/gcc/testsuite/lib/scandump.exp b/gcc/testsuite/lib/scandump.exp index 4e3da972ae4ed09c9874eb384daf825e6e2dcde3..be8fbe8b461dc81d5683fe323c0913f678daa1e0 100644 --- a/gcc/testsuite/lib/scandump.exp +++ b/gcc/testsuite/lib/scandump.exp @@ -110,7 +110,8 @@ proc scan-dump-times { args } { if {$result_count == $times} { pass "$testname" } else { -fail "$testname (found $result_count times)" + verbose -log "$testcase: pattern found $result_count times" +fail "$testname" } }
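To illustrate why the old FAIL line hid regressions, here is a small sketch that models the summary name before and after the patch. It assumes, for illustration only, that a comparison tool pairs results by exact test name; the function names and the same_test helper are hypothetical and do not reproduce dg-cmp-results itself.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Summary name before the patch: a FAIL also carries the match count.  */
static void
old_summary_name (char *buf, size_t len, const char *testcase,
                  const char *pattern, int times, int found)
{
  if (found == times)
    snprintf (buf, len, "%s scan-assembler-times %s %d",
              testcase, pattern, times);
  else
    snprintf (buf, len, "%s scan-assembler-times %s %d (found %d times)",
              testcase, pattern, times, found);
}

/* Summary name after the patch: identical for PASS and FAIL, the match
   count goes to a separate verbose message instead.  */
static void
new_summary_name (char *buf, size_t len, const char *testcase,
                  const char *pattern, int times, int found)
{
  (void) found;
  snprintf (buf, len, "%s scan-assembler-times %s %d",
            testcase, pattern, times);
}

/* Toy stand-in for a results comparison keyed on the exact test name.  */
static bool
same_test (const char *before, const char *after)
{
  return strcmp (before, after) == 0;
}
```

With the old naming, a PASS name and a FAIL name for the same directive compare as different tests, so no PASS to FAIL transition is seen; with the new naming they compare equal.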
[arm-embedded][PATCH] Add multilib mapping for -mcpu=cortex-m33+nodsp
Hi,

Currently -mcpu=cortex-m33+nodsp gets assigned the thumb multilib due to lack of mapping from -mcpu=cortex-m33+nodsp to an -march option. This leads to link failures, because Armv4T Thumb code from the multilib gets linked with the Armv8-M Mainline code being compiled. This patch adds the appropriate mapping.

ChangeLog entry is as follows:

*** gcc/ChangeLog.arm ***

2018-03-14 Thomas Preud'homme

* config/arm/t-rmprofile: Add mapping from -mcpu=cortex-m33+nodsp to -march=armv8-m.main.

Testing: With a multilib build, a hello world fails to link without the patch but links successfully with it.

We've decided to apply this patch to the ARM/embedded-7-branch branch.

Best regards,
Thomas

diff --git a/gcc/config/arm/t-rmprofile b/gcc/config/arm/t-rmprofile
index a3a24d59fb29b42a36177bd2d2ebfae4e50e5a10..54411795215b8aff90ba9cfb806ec7b33db4caea 100644
--- a/gcc/config/arm/t-rmprofile
+++ b/gcc/config/arm/t-rmprofile
@@ -102,6 +102,7 @@ MULTILIB_MATCHES += march?armv7e-m=mcpu?cortex-m4
 MULTILIB_MATCHES += march?armv7e-m=mcpu?cortex-m7
 MULTILIB_MATCHES += march?armv8-m.base=mcpu?cortex-m23
 MULTILIB_MATCHES += march?armv8-m.main=mcpu?cortex-m33
+MULTILIB_MATCHES += march?armv8-m.main=mcpu?cortex-m33+nodsp
 MULTILIB_MATCHES += march?armv7=mcpu?cortex-r4
 MULTILIB_MATCHES += march?armv7=mcpu?cortex-r4f
 MULTILIB_MATCHES += march?armv7=mcpu?cortex-r5
[arm-embedded][PATCH] Add multilib mapping for -mcpu=cortex-r52
Hi,

Currently -mcpu=cortex-r52 gets assigned the default multilib due to lack of mapping from -mcpu=cortex-r52 to an -march option. This is inconsistent with -march=armv8-r, which gets the thumb/v7-ar multilib. This patch adds the appropriate mapping.

ChangeLog entry is as follows:

*** gcc/ChangeLog.arm ***

2018-03-15 Thomas Preud'homme

* config/arm/t-rmprofile: Add mapping from -mcpu=cortex-r52 to -march=armv7.

Testing: With a multilib build, -mcpu=cortex-r52 -print-multi-directory prints . (i.e. the default multilib) without the patch but prints the expected thumb/v7-ar with the patch.

We've decided to apply this patch to the ARM/embedded-7-branch.

Best regards,
Thomas
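As a rough mental model of what such a mapping does, the sketch below treats the MULTILIB_MATCHES entries as a lookup table from a -mcpu option to the -march option it should be treated as for multilib selection. This is a toy model under that assumption, not how the GCC driver is implemented; the struct, table and function names are hypothetical, and only the option pairs are taken from the t-rmprofile entries quoted in this thread.

```c
#include <stddef.h>
#include <string.h>

/* One "treat OPTION like EQUIVALENT" entry.  */
struct multilib_match
{
  const char *option;
  const char *equivalent;
};

/* Toy table holding a few pairs mentioned in this thread.  */
static const struct multilib_match match_table[] = {
  { "mcpu=cortex-m33", "march=armv8-m.main" },
  { "mcpu=cortex-r4", "march=armv7" },
  { "mcpu=cortex-r52", "march=armv7" }, /* the mapping this patch adds */
};

/* Returns the -march option used for multilib selection, or NULL when
   there is no mapping and the default multilib would be picked.  */
static const char *
multilib_equivalent (const char *option)
{
  for (size_t i = 0; i < sizeof match_table / sizeof match_table[0]; i++)
    if (strcmp (match_table[i].option, option) == 0)
      return match_table[i].equivalent;
  return NULL;
}
```

In this model, removing the cortex-r52 row makes the lookup return NULL, which corresponds to the "." result reported without the patch.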
[PATCH, GCC/ARM] Fix PR85203: cmse_nonsecure_caller returns wrong result
Hi,

The __builtin_cmse_nonsecure_caller implementation returns true in almost all cases due to 2 separate bugs:

* gen_addsi3 is used instead of gen_andsi3 to retrieve the lsb
* the lsb boolean value is not negated, but the specification [1] says the intrinsic should return true for a nonsecure caller, and a nonsecure caller is characterized by LR's lsb being 0

This was not caught due to (1) the lack of a runtime test and (2) the existing RTL scan not taking into account that '.' matches newline in Tcl regular expressions.

This patch fixes the implementation issues and improves testing of cmse_nonsecure_caller by (1) adding a runtime test for the secure caller case and (2) looking for a SET insn of an AND expression in the right function. This leaves the nonsecure caller case only partly tested, since the exact value being ANDed and the negation are not covered by the scan, and the existing test infrastructure does not allow 2 separate compilations and links to be performed. It is enough though to catch the current incorrect behavior.

The patch also reorganizes the scan directives in cmse-1.c to make it easier to identify which function they are intended to test.

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2018-04-04 Thomas Preud'homme

PR target/85203
* config/arm/arm-builtins.c (arm_expand_builtin): Change expansion to perform a bitwise AND of the argument followed by a boolean negation of the result.

*** gcc/testsuite/ChangeLog ***

2018-04-04 Thomas Preud'homme

PR target/85203
* gcc.target/arm/cmse/cmse-1.c: Tighten cmse_nonsecure_caller RTL scan to match a single insn of the baz function. Move scan directives at the end of the file below the functions they are trying to test for better readability.
* gcc.target/arm/cmse/cmse-16.c: New testcase.

Testing: No bootstrap since only M profile builtin code has been changed, but regression testing for arm-none-eabi targeting Arm Cortex-M23 and Cortex-M33 shows no regression.

Is this ok for stage4?
Best regards, Thomas diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index 8940d1f6311bccf86664ab2eaa938735eec595f6..184eb2a934308717b6e1054e376487a297f8d5de 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -2600,7 +2600,9 @@ arm_expand_builtin (tree exp, case ARM_BUILTIN_CMSE_NONSECURE_CALLER: target = gen_reg_rtx (SImode); op0 = arm_return_addr (0, NULL_RTX); - emit_insn (gen_addsi3 (target, op0, const1_rtx)); + emit_insn (gen_andsi3 (target, op0, const1_rtx)); + op1 = gen_rtx_EQ (SImode, target, const0_rtx); + emit_insn (gen_cstoresi4 (target, op1, target, const0_rtx)); return target; case ARM_BUILTIN_TEXTRMSB: diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse-1.c b/gcc/testsuite/gcc.target/arm/cmse/cmse-1.c index c13272eed683aa06db027cd4646e5fe67817212b..f764153cb17b796ccd0d20abb78d5cf56be52911 100644 --- a/gcc/testsuite/gcc.target/arm/cmse/cmse-1.c +++ b/gcc/testsuite/gcc.target/arm/cmse/cmse-1.c @@ -71,6 +71,20 @@ baz (void) { return cmse_nonsecure_caller (); } +/* { dg-final { scan-assembler "baz:" } } */ +/* { dg-final { scan-assembler "__acle_se_baz:" } } */ +/* { dg-final { scan-assembler-not "\tcmse_nonsecure_caller" } } */ +/* Look for an andsi of 1 with a register in function baz, ie. 
+ +;; Function baz + +(insn (set (reg:SI ) + (and:SI (reg:SI ) + (const_int 1 ) + > +(insn +*/ +/* { dg-final { scan-rtl-dump "\n;; Function baz\[^\n\]*\[^(\]+\[^;\]*\n\\(insn \[^(\]+ \\(set \\(reg\[^:\]*:SI \[^)\]+\\)\[^(\]*\\(and:SI \\(reg\[^:\]*:SI \[^)\]+\\)\[^(\]*\\((const_int 1|reg\[^:\]*:SI) \[^)\]+\\)\[^(\]+(\\(nil\\)\[^(\]+)?\\(insn" expand } } */ typedef int __attribute__ ((cmse_nonsecure_call)) (int_nsfunc_t) (void); @@ -86,6 +100,11 @@ qux (int_nsfunc_t * callback) { fp = cmse_nsfptr_create (callback); } +/* { dg-final { scan-assembler "qux:" } } */ +/* { dg-final { scan-assembler "__acle_se_qux:" } } */ +/* { dg-final { scan-assembler "bic" } } */ +/* { dg-final { scan-assembler "push\t\{r4, r5, r6" } } */ +/* { dg-final { scan-assembler "msr\tAPSR_nzcvq" } } */ int call_callback (void) { @@ -94,13 +113,4 @@ int call_callback (void) else return default_callback (); } -/* { dg-final { scan-assembler "baz:" } } */ -/* { dg-final { scan-assembler "__acle_se_baz:" } } */ -/* { dg-final { scan-assembler "qux:" } } */ -/* { dg-final { scan-assembler "__acle_se_qux:" } } */ -/* { dg-final { scan-assembler-not "\tcmse_nonsecure_caller" } } */ -/* { dg-final { scan-rtl-dump "and.*reg.*const_int 1" expand } } */ -/* { dg-final { scan-assembler "bic" } } */ -/* { dg-final { scan-assembler "push\t\{r4, r5, r6" } } */ -/* { dg-final { scan-assembler "msr\tAPSR_nzcvq" } } */ /* { dg-final { scan-assembler-times "bl\\s+__gnu_cmse_nonsecure_call" 1 } } */ diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse-16.c b/gcc/testsui
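The two bugs described in this patch can be modelled in plain C, outside of GCC. The sketch below is only an illustration of the arithmetic: the function names are hypothetical, and the LR values used to exercise them are made-up addresses, not values from any real system.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the buggy expansion: LR + 1 is nonzero for almost every LR,
   so the result reads as "true" nearly all the time.  */
static bool
nonsecure_caller_buggy (uint32_t lr)
{
  return (lr + 1u) != 0;
}

/* Model of the fixed expansion: isolate the lsb with AND, then negate,
   since a nonsecure caller is characterized by LR's lsb being 0.  */
static bool
nonsecure_caller_fixed (uint32_t lr)
{
  return (lr & 1u) == 0;
}
```

For a return address with the lsb set (a secure caller), the buggy model still answers true while the fixed model answers false; for an address with the lsb clear, both answer true, which is why only the secure caller case exposes the bug at run time.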
Re: [PATCH, GCC/ARM] Fix PR85203: cmse_nonsecure_caller returns wrong result
Oops, forgot the link. On 04/04/18 18:03, Thomas Preudhomme wrote: Hi, __builtin_cmse_nonsecure_caller implementation returns true in almost all cases due to 2 separate bugs: * gen_addsi is used instead of gen_andsi to retrieve the lsb * the lsb boolean value is not negated but the specification [1] says the intrinsic should return true for a nonsecure caller and a nonsecure caller is characterized with LR's lsb being 0 [1] https://static.docs.arm.com/ecm0359818/10/ECM0359818_armv8m_security_extensions_reqs_on_dev_tools_1_0.pdf Best regards, Thomas This was not caught due to (1) lack of runtime test and (2) the existing RTL scan not taking into account that '.' matches newline in Tcl regular expressions. This patch fixes the implementation issues and improves testing of cmse_nonsecure_caller by (1) adding a runtime test for the secure caller case and (2) looking for an SET insn of an AND expression in the right function. This leaves the nonsecure caller case only partly tested since the exact value being AND and the negation are not covered by the scan and the existing test infrastructure does not allow 2 separate compilation and link to be performed. It is enough though to catch the current incorrect behavior. The patch also reorganize the scan directives in cmse-1.c to more easily identify what function they are intended to test in the file. ChangeLog entry is as follows: *** gcc/ChangeLog *** 2018-04-04 Thomas Preud'homme PR target/85203 * config/arm/arm-builtins.c (arm_expand_builtin): Change expansion to perform a bitwise AND of the argument followed by a boolean negation of the result. *** gcc/testsuite/ChangeLog *** 2018-04-04 Thomas Preud'homme PR target/85203 * gcc.target/arm/cmse/cmse-1.c: Tighten cmse_nonsecure_caller RTL scan to match a single insn of the baz function. Move scan directives at the end of the file below the functions they are trying to test for better readability. * gcc.target/arm/cmse/cmse-16.c: New testcase. 
Testing: No bootstrap since only M profile builtin code has been changed but regression testing for arm-none-eabi targeting Arm Cortex-M23 and Cortex-M33 shows no regression. Is this ok for stage4? Best regards, Thomas
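The point about '.' matching newline can be reproduced without Tcl. The sketch below uses POSIX regcomp, which, when REG_NEWLINE is not given, also lets '.' match a newline; that makes it a convenient stand-in here, not a statement about how DejaGnu runs its scans. The RTL text is a hand-written, simplified dump in which function baz contains a plus rather than an and, and the helper name pattern_found is hypothetical.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, made-up RTL dump: the AND lives in foo, while baz only
   has a PLUS with (const_int 1).  */
static const char rtl_dump[] =
  ";; Function foo\n"
  "(insn 5 (set (reg:SI 110) (and:SI (reg:SI 111) (reg:SI 112))))\n"
  ";; Function baz\n"
  "(insn 9 (set (reg:SI 113) (plus:SI (reg:SI 14) (const_int 1))))\n";

/* The loose pattern used by the old scan directive.  */
static const char loose_pattern[] = "and.*reg.*const_int 1";

/* Returns true when PATTERN matches somewhere in TEXT.  EXTRA_FLAGS is
   0 to let '.' cross newlines, or REG_NEWLINE to keep it on one line.  */
static bool
pattern_found (const char *pattern, const char *text, int extra_flags)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB | extra_flags) != 0)
    return false;
  bool found = regexec (&re, text, 0, NULL, 0) == 0;
  regfree (&re);
  return found;
}
```

When '.' may cross newlines, the loose pattern matches by stitching the and from foo to the const_int 1 in baz, so the scan passes even though baz contains no AND; restricting '.' to a single line makes the same pattern fail on this dump.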
Re: [PATCH, GCC/ARM] Fix PR85203: cmse_nonsecure_caller returns wrong result
Hi Kyrill, On 04/04/18 18:19, Kyrill Tkachov wrote: Hi Thomas, On 04/04/18 18:03, Thomas Preudhomme wrote: Hi, __builtin_cmse_nonsecure_caller implementation returns true in almost all cases due to 2 separate bugs: * gen_addsi is used instead of gen_andsi to retrieve the lsb * the lsb boolean value is not negated but the specification [1] says the intrinsic should return true for a nonsecure caller and a nonsecure caller is characterized with LR's lsb being 0 This was not caught due to (1) lack of runtime test and (2) the existing RTL scan not taking into account that '.' matches newline in Tcl regular expressions. This patch fixes the implementation issues and improves testing of cmse_nonsecure_caller by (1) adding a runtime test for the secure caller case and (2) looking for an SET insn of an AND expression in the right function. This leaves the nonsecure caller case only partly tested since the exact value being AND and the negation are not covered by the scan and the existing test infrastructure does not allow 2 separate compilation and link to be performed. It is enough though to catch the current incorrect behavior. The patch also reorganize the scan directives in cmse-1.c to more easily identify what function they are intended to test in the file. ChangeLog entry is as follows: *** gcc/ChangeLog *** 2018-04-04 Thomas Preud'homme PR target/85203 * config/arm/arm-builtins.c (arm_expand_builtin): Change expansion to perform a bitwise AND of the argument followed by a boolean negation of the result. *** gcc/testsuite/ChangeLog *** 2018-04-04 Thomas Preud'homme PR target/85203 * gcc.target/arm/cmse/cmse-1.c: Tighten cmse_nonsecure_caller RTL scan to match a single insn of the baz function. Move scan directives at the end of the file below the functions they are trying to test for better readability. * gcc.target/arm/cmse/cmse-16.c: New testcase. 
Testing: No bootstrap since only M profile builtin code has been changed but regression testing for arm-none-eabi targeting Arm Cortex-M23 and Cortex-M33 shows no regression. Is this ok for stage4? Ok, thanks for fixing this. Does this need backporting to the branches? Yes to gcc-7-branch only. Best regards, Thomas
Re: [PATCH, GCC/ARM] Fix PR85203: cmse_nonsecure_caller returns wrong result
Hi Kyrill, On 04/04/18 18:20, Thomas Preudhomme wrote: Hi Kyrill, On 04/04/18 18:19, Kyrill Tkachov wrote: Hi Thomas, On 04/04/18 18:03, Thomas Preudhomme wrote: Hi, __builtin_cmse_nonsecure_caller implementation returns true in almost all cases due to 2 separate bugs: * gen_addsi is used instead of gen_andsi to retrieve the lsb * the lsb boolean value is not negated but the specification [1] says the intrinsic should return true for a nonsecure caller and a nonsecure caller is characterized with LR's lsb being 0 This was not caught due to (1) lack of runtime test and (2) the existing RTL scan not taking into account that '.' matches newline in Tcl regular expressions. This patch fixes the implementation issues and improves testing of cmse_nonsecure_caller by (1) adding a runtime test for the secure caller case and (2) looking for an SET insn of an AND expression in the right function. This leaves the nonsecure caller case only partly tested since the exact value being AND and the negation are not covered by the scan and the existing test infrastructure does not allow 2 separate compilation and link to be performed. It is enough though to catch the current incorrect behavior. The patch also reorganize the scan directives in cmse-1.c to more easily identify what function they are intended to test in the file. ChangeLog entry is as follows: *** gcc/ChangeLog *** 2018-04-04 Thomas Preud'homme PR target/85203 * config/arm/arm-builtins.c (arm_expand_builtin): Change expansion to perform a bitwise AND of the argument followed by a boolean negation of the result. *** gcc/testsuite/ChangeLog *** 2018-04-04 Thomas Preud'homme PR target/85203 * gcc.target/arm/cmse/cmse-1.c: Tighten cmse_nonsecure_caller RTL scan to match a single insn of the baz function. Move scan directives at the end of the file below the functions they are trying to test for better readability. * gcc.target/arm/cmse/cmse-16.c: New testcase. 
Testing: No bootstrap since only M profile builtin code has been changed but regression testing for arm-none-eabi targeting Arm Cortex-M23 and Cortex-M33 shows no regression. Is this ok for stage4? Ok, thanks for fixing this. Does this need backporting to the branches? Yes to gcc-7-branch only. The patch applies cleanly on gcc-7-branch and the same testing shows no regression. Ok to apply to gcc-7-branch once the patch has baked for 7 days in trunk? Best regards, Thomas
[PATCH, GCC/ARM] Fix PR85261: ICE with FPSCR setter builtin
The instruction pattern for setting the FPSCR expects the input value to be in a register. However, the __builtin_arm_set_fpscr expander does not ensure that this is the case, and as a result GCC ICEs when the builtin is called with a constant literal. This commit fixes the builtin to force the input value into a register. It also removes the unneeded volatile in the existing fpscr test and fixes the function prototype.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2018-04-06 Thomas Preud'homme

PR target/85261
* config/arm/arm-builtins.c (arm_expand_builtin): Force input operand into register.

*** gcc/testsuite/ChangeLog ***

2018-04-06 Thomas Preud'homme

PR target/85261
* gcc.target/arm/fpscr.c: Add call to __builtin_arm_set_fpscr with literal value. Expect 2 MCR instructions. Fix function prototype. Remove volatile keyword.

Testing: Built an arm-none-eabi GCC cross-compiler and testsuite shows no regression.

Is this ok for stage4?

Best regards,
Thomas

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 8940d1f6311bccf86664ab2eaa938735eec595f6..e100d933a77c5de4a13cb961d1bff40f57f2ea80 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -2592,7 +2592,7 @@ arm_expand_builtin (tree exp,
 icode = CODE_FOR_set_fpscr;
 arg0 = CALL_EXPR_ARG (exp, 0);
 op0 = expand_normal (arg0);
- pat = GEN_FCN (icode) (op0);
+ pat = GEN_FCN (icode) (force_reg (SImode, op0));
 }
 emit_insn (pat);
 return target;
diff --git a/gcc/testsuite/gcc.target/arm/fpscr.c b/gcc/testsuite/gcc.target/arm/fpscr.c
index 7b4d71d72d8964f6da0d0604bf59aeb4a895df43..4c3eaf7fcf75ad8582071ecb110fd1e4976a3b24 100644
--- a/gcc/testsuite/gcc.target/arm/fpscr.c
+++ b/gcc/testsuite/gcc.target/arm/fpscr.c
@@ -6,11 +6,14 @@
 /* { dg-add-options arm_fp } */
 void
-test_fpscr ()
+test_fpscr (void)
 {
- volatile unsigned int status = __builtin_arm_get_fpscr ();
+ unsigned status;
+
+ __builtin_arm_set_fpscr (0);
+ status = __builtin_arm_get_fpscr ();
 __builtin_arm_set_fpscr (status);
 }
 /* { dg-final { scan-assembler "mrc\tp10, 7, r\[0-9\]+, cr1, cr0, 0" } } */
-/* { dg-final { scan-assembler "mcr\tp10, 7, r\[0-9\]+, cr1, cr0, 0" } } */
+/* { dg-final { scan-assembler-times "mcr\tp10, 7, r\[0-9\]+, cr1, cr0, 0" 2 } } */
Re: [PATCH, GCC/ARM] Fix PR85261: ICE with FPSCR setter builtin
On 06/04/18 17:08, Ramana Radhakrishnan wrote:
> On 06/04/2018 16:54, Thomas Preudhomme wrote:
>> The instruction pattern for setting the FPSCR expects the input value to be in a register. However, the __builtin_arm_set_fpscr expander does not ensure that this is the case, and as a result GCC ICEs when the builtin is called with a constant literal. This commit fixes the builtin to force the input value into a register. It also removes the unneeded volatile in the existing fpscr test and fixes the function prototype.
>>
>> ChangeLog entries are as follows:
>>
>> *** gcc/ChangeLog ***
>>
>> 2018-04-06 Thomas Preud'homme
>>
>> PR target/85261
>> * config/arm/arm-builtins.c (arm_expand_builtin): Force input operand into register.
>>
>> *** gcc/testsuite/ChangeLog ***
>>
>> 2018-04-06 Thomas Preud'homme
>>
>> PR target/85261
>> * gcc.target/arm/fpscr.c: Add call to __builtin_arm_set_fpscr with literal value. Expect 2 MCR instructions. Fix function prototype. Remove volatile keyword.
>>
>> Testing: Built an arm-none-eabi GCC cross-compiler and testsuite shows no regression.
>>
>> Is this ok for stage4?
>>
>> Best regards,
>> Thomas
>
> (sorry about the duplicate for those who get it)
>
> LGTM, though in this case I would prefer a bootstrap and regression run, as this is automatically exercised most with gcc.dg/atomic_*.c and you really need this tested on Linux rather than just bare-metal, as I'm not sure how this gets tested on arm-none-eabi.

Oh it is indeed. Didn't realize it was used anywhere. Will start bootstrap right away.

> What about earlier branches, have you looked? This is a silly target bug and fixes should go back to older branches in this particular case after baking this on trunk for some time.

GCC 6 and 7 are affected as well and a backport will be done once it has baked long enough of course.

Best regards,
Thomas
Re: [PATCH, GCC/ARM] Fix PR85261: ICE with FPSCR setter builtin
Hi Ramana,

On 06/04/18 17:17, Thomas Preudhomme wrote:
On 06/04/18 17:08, Ramana Radhakrishnan wrote:
On 06/04/2018 16:54, Thomas Preudhomme wrote:

The instruction pattern for setting the FPSCR expects the input value to be in a register. However, the __builtin_arm_set_fpscr expander does not ensure that this is the case and as a result GCC ICEs when the builtin is called with a constant literal. This commit fixes the builtin to force the input value into a register. It also removes the unneeded volatile in the existing fpscr test and fixes the function prototype.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2018-04-06  Thomas Preud'homme

        PR target/85261
        * config/arm/arm-builtins.c (arm_expand_builtin): Force input operand
        into register.

*** gcc/testsuite/ChangeLog ***

2018-04-06  Thomas Preud'homme

        PR target/85261
        * gcc.target/arm/fpscr.c: Add call to __builtin_arm_set_fpscr with
        literal value. Expect 2 MCR instruction. Fix function prototype.
        Remove volatile keyword.

Testing: Built an arm-none-eabi GCC cross-compiler and testsuite shows no regression.

Is this ok for stage4?

Best regards,

Thomas

(sorry about the duplicate for those who get it)

LGTM, though in this case I would prefer a bootstrap and regression run as this is automatically exercised most with gcc.dg/atomic_*.c and you really need this tested on linux rather than just bare-metal as I'm not sure how this gets tested on arm-none-eabi.

Oh it is indeed. Didn't realize it was used anywhere. Will start bootstrap right away.

Done with --with-arch=armv8-a --with-mode=thumb --with-fpu=neon-vfpv4 --with-float=hard --enable-languages=c,c++,fortran --with-system-zlib --enable-plugins --enable-bootstrap. Testsuite for that GCC does not show any regression either.

Ok to commit?

What about earlier branches, have you looked? This is a silly target bug and fixes should go back to older branches in this particular case after baking this on trunk for some time.

GCC 6 and 7 are affected as well and a backport will be done once it has baked long enough of course. Will now bootstrap and regtest against GCC 6 and 7. Will let you know once that is finished.

Best regards,

Thomas
Re: [PATCH, GCC/ARM] Fix PR85203: cmse_nonsecure_caller returns wrong result
Hi Kyrill,

One week went by so I've committed the change to GCC 7 as announced.

Best regards,

Thomas

On 05/04/18 16:36, Kyrill Tkachov wrote:
On 05/04/18 16:13, Thomas Preudhomme wrote:

Hi Kyrill,

On 04/04/18 18:20, Thomas Preudhomme wrote:

Hi Kyrill,

On 04/04/18 18:19, Kyrill Tkachov wrote:

Hi Thomas,

On 04/04/18 18:03, Thomas Preudhomme wrote:

Hi,

The __builtin_cmse_nonsecure_caller implementation returns true in almost all cases due to 2 separate bugs:

* gen_addsi is used instead of gen_andsi to retrieve the lsb
* the lsb boolean value is not negated but the specification [1] says the intrinsic should return true for a nonsecure caller and a nonsecure caller is characterized by LR's lsb being 0

This was not caught due to (1) lack of a runtime test and (2) the existing RTL scan not taking into account that '.' matches newline in Tcl regular expressions.

This patch fixes the implementation issues and improves testing of cmse_nonsecure_caller by (1) adding a runtime test for the secure caller case and (2) looking for a SET insn of an AND expression in the right function. This leaves the nonsecure caller case only partly tested since the exact value being ANDed and the negation are not covered by the scan and the existing test infrastructure does not allow 2 separate compilations and links to be performed. It is enough though to catch the current incorrect behavior. The patch also reorganizes the scan directives in cmse-1.c to more easily identify what function they are intended to test in the file.

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2018-04-04  Thomas Preud'homme

        PR target/85203
        * config/arm/arm-builtins.c (arm_expand_builtin): Change
        expansion to perform a bitwise AND of the argument followed by a
        boolean negation of the result.

*** gcc/testsuite/ChangeLog ***

2018-04-04  Thomas Preud'homme

        PR target/85203
        * gcc.target/arm/cmse/cmse-1.c: Tighten cmse_nonsecure_caller RTL scan
        to match a single insn of the baz function. Move scan directives at
        the end of the file below the functions they are trying to test for
        better readability.
        * gcc.target/arm/cmse/cmse-16.c: New testcase.

Testing: No bootstrap since only M profile builtin code has been changed but regression testing for arm-none-eabi targeting Arm Cortex-M23 and Cortex-M33 shows no regression.

Is this ok for stage4?

Ok, thanks for fixing this. Does this need backporting to the branches?

Yes to gcc-7-branch only. The patch applies cleanly on gcc-7-branch and the same testing shows no regression. Ok to apply to gcc-7-branch once the patch has baked for 7 days in trunk?

Yes, thanks.

Kyrill

Best regards,

Thomas
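The two bugs described in the PR85203 patch can be illustrated with a small host-side model in plain C. This is only a sketch of the arithmetic, not the code in arm-builtins.c: the function names and the sample LR values are hypothetical, and the real builtin works on RTL rather than on C integers. It assumes, as stated above, that a nonsecure caller is identified by LR's lsb being 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the buggy expansion: an ADD of 1 instead of an
   AND with 1, and no negation.  lr + 1 is nonzero for every lr except
   0xFFFFFFFF, so the result is true in almost all cases.  */
static bool
nonsecure_caller_buggy (uint32_t lr)
{
  return (lr + 1u) != 0;
}

/* Hypothetical model of the fixed expansion: isolate the lsb with an
   AND, then negate it, since lsb == 0 means the caller is nonsecure.  */
static bool
nonsecure_caller_fixed (uint32_t lr)
{
  return !(lr & 1u);
}

/* Small self-check over two made-up return addresses.  */
static void
cmse_model_selfcheck (void)
{
  uint32_t secure_lr = 0x08000101u;    /* lsb set: secure caller */
  uint32_t nonsecure_lr = 0x00200100u; /* lsb clear: nonsecure caller */

  assert (nonsecure_caller_buggy (secure_lr));     /* wrong answer */
  assert (!nonsecure_caller_fixed (secure_lr));    /* right answer */
  assert (nonsecure_caller_fixed (nonsecure_lr));
}
```

The model also shows why a runtime test for the secure caller case is enough to catch the old behavior: that is the input on which the buggy and fixed versions disagree.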
Re: [PATCH, ARM] PR85434: Prevent spilling of stack protector guard's address on ARM
Hi all,

Ping? This new version changes both the middle-end and back-end parts so will need a review for both of those.

Best regards,

Thomas

On Wed, 29 Aug 2018 at 11:07, Thomas Preudhomme wrote:
>
> Forgot another important change in ARM backend:
>
> The expanders were causing one too many indirections, which was what
> caused the test failure in glibc. The new expander code skips the
> creation of a move from the memory reference of the guard's address to
> a register since this is done in the insns themselves. I think during
> the initial implementation of the first version of the patch I had
> issues with loading the address and used that to load the address. As
> can be seen from the absence of regression on the runtime stack
> protector test in glibc, this is now working properly, also confirmed
> by manual inspection of the code.
>
> I've attached the interdiff from the previous version for reference.
>
> Best regards,
>
> Thomas
> On Wed, 29 Aug 2018 at 10:51, Thomas Preudhomme
> wrote:
> >
> > Resend hopefully without HTML this time.
> >
> > On Wed, 29 Aug 2018 at 10:49, Thomas Preudhomme
> > wrote:
> > >
> > > Hi,
> > >
> > > I've reworked the patch fixing PR85434 (spilling of stack protector
> > > guard's address on ARM) to address the testsuite regression on powerpc
> > > and x86 as well as glibc testsuite regression on ARM. Issues were due to
> > > unconditionally attempting to generate the new patterns. The code now
> > > tests if there is a pattern for them for the target before generating
> > > them. In the ARM side of the patch, I've also added a more specific
> > > predicate for the new patterns. The new patch is found below.
> > >
> > >
> > > In case of high register pressure in PIC mode, the address of the stack
> > > protector's guard can be spilled on ARM targets as shown in PR85434,
> > > thus allowing an attacker to control what the canary would be compared
> > > against. ARM does lack stack_protect_set and stack_protect_test insn
> > > patterns, but defining them does not help as the address is expanded
> > > regularly and the patterns only deal with the copy and test of the
> > > guard with the canary.
> > >
> > > This problem does not occur for x86 targets because the PIC access and
> > > the test can be done in the same instruction. Aarch64 is exempt too
> > > because its PIC access insn patterns are movs of UNSPEC, which prevents
> > > the second access in the epilogue from being CSEd in the cse_local pass
> > > with the first access in the prologue.
> > >
> > > The approach followed here is to create new "combined" set and test
> > > standard pattern names that take the unexpanded guard and do the set or
> > > test. This allows the target to use an opaque pattern (eg. using UNSPEC)
> > > to hide the individual instructions being generated from the compiler
> > > and split the pattern into generic load, compare and branch instructions
> > > after register allocation, therefore avoiding any spilling. This is here
> > > implemented for the ARM targets. For targets not implementing these new
> > > standard pattern names, the existing stack_protect_set and
> > > stack_protect_test pattern names are used.
> > >
> > > To be able to split PIC access after register allocation, the functions
> > > had to be augmented to force a new PIC register load and to control
> > > which register it loads into. This is because sharing the PIC register
> > > between prologue and epilogue could lead to spilling due to CSE again
> > > which an attacker could use to control what the canary gets compared
> > > against.
> > >
> > > ChangeLog entries are as follows:
> > >
> > > *** gcc/ChangeLog ***
> > >
> > > 2018-08-09 Thomas Preud'homme
> > >
> > > * target-insns.def (stack_protect_combined_set): Define new standard
> > > pattern name.
> > > (stack_protect_combined_test): Likewise.
> > > * cfgexpand.c (stack_protect_prologue): Try new
> > > stack_protect_combined_set pattern first.
> > > * function.c (stack_protect_epilogue): Try new
> > > stack_protect_combined_test pattern first.
> > > * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
> > > parameters to control which register to use as PIC register and force
> > > reloading PIC register respectively. Insert in the stream of insns if
> > &
[PATCH, GCC/ARM] Fix PR87374: ICE with -mslow-flash-data and -mword-relocations
Hi,

GCC ICEs under -mslow-flash-data and -mword-relocations because there is no way to load an address, both literal pools and MOVW/MOVT being forbidden. This patch gives an error message when both options are specified by the user and adds the corresponding dg-skip-if directives for tests that use either of these options.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2018-09-25  Thomas Preud'homme

        PR target/87374
        * config/arm/arm.c (arm_option_check_internal): Disable the combined
        use of -mslow-flash-data and -mword-relocations.

*** gcc/testsuite/ChangeLog ***

2018-09-25  Thomas Preud'homme

        PR target/87374
        * gcc.target/arm/movdi_movt.c: Skip if both -mslow-flash-data and
        -mword-relocations would be passed when compiling the test.
        * gcc.target/arm/movsi_movt.c: Likewise.
        * gcc.target/arm/pr81863.c: Likewise.
        * gcc.target/arm/thumb2-slow-flash-data-1.c: Likewise.
        * gcc.target/arm/thumb2-slow-flash-data-2.c: Likewise.
        * gcc.target/arm/thumb2-slow-flash-data-3.c: Likewise.
        * gcc.target/arm/thumb2-slow-flash-data-4.c: Likewise.
        * gcc.target/arm/thumb2-slow-flash-data-5.c: Likewise.
        * gcc.target/arm/tls-disable-literal-pool.c: Likewise.

Testing: Bootstrapped in Thumb-2 mode. No testsuite regression when targeting arm-none-eabi. Modified tests get skipped as expected when running the testsuite with -mslow-flash-data (pr81863.c) or -mword-relocations (all the others).

Is this ok for trunk? I'd also appreciate guidance on whether this is worth a backport. It's a simple patch but on the other hand it only prevents an option combination, it does not fix anything, so I have mixed feelings.

Best regards,

Thomas

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6332e68df05..5beffc875c1 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2893,17 +2893,22 @@ arm_option_check_internal (struct gcc_options *opts)
       flag_pic = 0;
     }
 
-  /* We only support -mpure-code and -mslow-flash-data on M-profile targets
-     with MOVT.  */
-  if ((target_pure_code || target_slow_flash_data)
-      && (!TARGET_HAVE_MOVT || arm_arch_notm || flag_pic || TARGET_NEON))
+  if (target_pure_code || target_slow_flash_data)
     {
       const char *flag = (target_pure_code ? "-mpure-code" :
                                              "-mslow-flash-data");
-      error ("%s only supports non-pic code on M-profile targets with the "
-             "MOVT instruction", flag);
-    }
+      /* We only support -mpure-code and -mslow-flash-data on M-profile targets
+         with MOVT.  */
+      if (!TARGET_HAVE_MOVT || arm_arch_notm || flag_pic || TARGET_NEON)
+        error ("%s only supports non-pic code on M-profile targets with the "
+               "MOVT instruction", flag);
+
+      /* Cannot load addresses: -mslow-flash-data forbids literal pool and
+         -mword-relocations forbids relocation of MOVT/MOVW.  */
+      if (target_word_relocations)
+        error ("%s incompatible with -mword-relocations", flag);
+    }
 }
 
 /* Recompute the global settings depending on target attribute options.  */
diff --git a/gcc/testsuite/gcc.target/arm/movdi_movt.c b/gcc/testsuite/gcc.target/arm/movdi_movt.c
index e2a28ccbd99..a01ffa0dc93 100644
--- a/gcc/testsuite/gcc.target/arm/movdi_movt.c
+++ b/gcc/testsuite/gcc.target/arm/movdi_movt.c
@@ -1,4 +1,5 @@
 /* { dg-do compile { target { arm_cortex_m && { arm_thumb2_ok || arm_thumb1_movt_ok } } } } */
+/* { dg-skip-if "-mslow-flash-data and -mword-relocations incompatible" { *-*-* } { "-mword-relocations" } } */
 /* { dg-options "-O2 -mslow-flash-data" } */
 
 unsigned long long
diff --git a/gcc/testsuite/gcc.target/arm/movsi_movt.c b/gcc/testsuite/gcc.target/arm/movsi_movt.c
index 3cf46e2fd17..19d202ecd33 100644
--- a/gcc/testsuite/gcc.target/arm/movsi_movt.c
+++ b/gcc/testsuite/gcc.target/arm/movsi_movt.c
@@ -1,4 +1,5 @@
 /* { dg-do compile { target { arm_cortex_m && { arm_thumb2_ok || arm_thumb1_movt_ok } } } } */
+/* { dg-skip-if "-mslow-flash-data and -mword-relocations incompatible" { *-*-* } { "-mword-relocations" } } */
 /* { dg-options "-O2 -mslow-flash-data" } */
 
 unsigned
diff --git a/gcc/testsuite/gcc.target/arm/pr81863.c b/gcc/testsuite/gcc.target/arm/pr81863.c
index 63b1ed66b2c..225a0c5cc2b 100644
--- a/gcc/testsuite/gcc.target/arm/pr81863.c
+++ b/gcc/testsuite/gcc.target/arm/pr81863.c
@@ -1,5 +1,6 @@
 /* testsuite/gcc.target/arm/pr48183.c */
 /* { dg-do compile } */
+/* { dg-skip-if "-mslow-flash-data and -mword-relocations incompatible" { *-*-* } { "-mslow-flash-data" } } */
 /* { dg-options "-O2 -mword-relocations -march=armv7-a -marm" } */
 /* { dg-final { scan-assembler-not "\[\\t \]+movw" } } */
 
diff --git a/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-1.c b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-1.c
index 089a72b67f3..d10391a69ac 100644
--- a/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-1.c
+++ b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-1.c
@@ -6,6 +6,7 @@
 /* { dg-do compile } */
 /* { dg-require-e