Re: [PATCH][aarch64] Avoid tag collisions for loads on falkor

2018-07-02 Thread Kyrill Tkachov

Hi Siddhesh,

On 02/07/18 10:15, Siddhesh Poyarekar wrote:

Hi,

This is a rewrite of the tag collision avoidance patch that Kugan had
written as a machine reorg pass back in February[1].

The falkor hardware prefetching system uses a combination of the
source, destination and offset to decide which prefetcher unit to
train with the load.  This is great when loads in a loop are
sequential but sub-optimal if there are unrelated loads in a loop that
tag to the same prefetcher unit.

This pass attempts to rename the destination register of such colliding
loads using routines available in regrename.c so that their tags do
not collide.  This shows some performance gains with mcf and xalancbmk
(~5% each) and will be tweaked further.  The pass is placed near the
tail end of the pass list so that subsequent passes don't inadvertently
end up undoing the renames.
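
To illustrate the idea, here is a hypothetical sketch of such a tag
computation (the bit selections and the function name are made up for the
example; falkor's actual hash is not spelled out here):

#include <stdint.h>

/* Hypothetical sketch only: two loads collide when their tags match, so
   renaming the destination register of one of them changes its tag and
   trains a different prefetcher unit.  */
static unsigned
prefetcher_tag (unsigned dest_regno, unsigned base_regno, int64_t offset)
{
  return (dest_regno & 0x1f)
	 ^ ((base_regno & 0x1f) << 5)
	 ^ (((unsigned) (offset >> 2) & 0x3f) << 10);
}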

A full gcc bootstrap and testsuite ran successfully on aarch64, i.e. it
did not introduce any new regressions.  I also did a make-check with
-mcpu=falkor to ensure that there were no regressions.  The couple of
regressions I found were target-specific and were related to scheduling
and cost differences and are not correctness issues.



Nice! What were the regressions though? Would be nice to adjust the tests
to make them more robust so that we have as clean a testsuite as possible.


[1] https://patchwork.ozlabs.org/patch/872532/

2018-07-02  Siddhesh Poyarekar 
Kugan Vivekanandarajah 

* config/aarch64/falkor-tag-collision-avoidance.c: New file.
* config.gcc (extra_objs): Build it.
* config/aarch64/t-aarch64 (falkor-tag-collision-avoidance.o):
Likewise.
* config/aarch64/aarch64-passes.def
(pass_tag_collision_avoidance): New pass.
* config/aarch64/aarch64.c (qdf24xx_tunings): Add
AARCH64_EXTRA_TUNE_RENAME_LOAD_REGS to tuning_flags.
(aarch64_classify_address): Remove static qualifier.
(aarch64_address_info, aarch64_address_type): Move to...
* config/aarch64/aarch64-protos.h: ... here.
(make_pass_tag_collision_avoidance): New function.
* config/aarch64/aarch64-tuning-flags.def (rename_load_regs):
New tuning flag.



More comments inline, but a general observation:
in the function comment for the new functions can you please include a
description of the function arguments and the meaning of the return value
(for example, some functions return -1; what does that mean?).
It really does make it much easier to maintain the code after some time has
passed.

Thanks,
Kyrill


---
 gcc/config.gcc|   2 +-
 gcc/config/aarch64/aarch64-passes.def |   1 +
 gcc/config/aarch64/aarch64-protos.h   |  49 ++
 gcc/config/aarch64/aarch64-tuning-flags.def   |   2 +
 gcc/config/aarch64/aarch64.c  |  48 +-
 .../aarch64/falkor-tag-collision-avoidance.c  | 821 ++
 gcc/config/aarch64/t-aarch64  |   9 +
 7 files changed, 891 insertions(+), 46 deletions(-)
 create mode 100644 gcc/config/aarch64/falkor-tag-collision-avoidance.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 4d9f9c6ea29..b78a30f5d69 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -304,7 +304,7 @@ aarch64*-*-*)
 extra_headers="arm_fp16.h arm_neon.h arm_acle.h"
 c_target_objs="aarch64-c.o"
 cxx_target_objs="aarch64-c.o"
-   extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o"
+   extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o 
falkor-tag-collision-avoidance.o"
target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.c"
 target_has_targetm_common=yes
 ;;
diff --git a/gcc/config/aarch64/aarch64-passes.def b/gcc/config/aarch64/aarch64-passes.def
index 87747b420b0..f61a8870aa1 100644
--- a/gcc/config/aarch64/aarch64-passes.def
+++ b/gcc/config/aarch64/aarch64-passes.def
@@ -19,3 +19,4 @@
. */

 INSERT_PASS_AFTER (pass_regrename, 1, pass_fma_steering);
+INSERT_PASS_AFTER (pass_machine_reorg, 1, pass_tag_collision_avoidance);
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 4ea50acaa59..175a3faf057 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -283,6 +283,49 @@ struct tune_params
   const struct cpu_prefetch_tune *prefetch;
 };

+/* Classifies an address.
+
+   ADDRESS_REG_IMM
+   A simple base register plus immediate offset.
+
+   ADDRESS_REG_WB
+   A base register indexed by immediate offset with writeback.
+
+   ADDRESS_REG_REG
+   A base register indexed by (optionally scaled) register.
+
+   ADDRESS_REG_UXTW
+   A base register indexed by (optionally scaled) zero-extended register.
+
+   ADDRESS_REG_SXTW
+   A base register indexed by (optionally scaled) sign-extended register.
+
+   ADDRESS_LO_SUM
+   A LO_SUM rtx with a base register and "LO12" symbol relocation.

Re: [PATCH][arm] Avoid STRD with odd register for TARGET_ARM in output_move_double

2018-07-02 Thread Kyrill Tkachov

Hi Christophe,

On 02/07/18 13:17, Christophe Lyon wrote:

On Fri, 29 Jun 2018 at 15:32, Kyrill Tkachov
 wrote:

Hi all,

In this testcase the user forces an odd register as the starting reg for a 
DFmode value.
The output_move_double function tries to store that using an STRD instruction.
But for TARGET_ARM the starting register of an STRD must be an even one.
This is always the case with compiler-allocated registers for DFmode values, 
but the
inline assembly forced our hand here.
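
As an illustration, a minimal reproducer of this shape (an assumed test
form, not necessarily the committed arm-soft-strd-even.c):

/* With a soft-float ABI a double can be forced into an odd register pair
   (r1:r2 here); output_move_double must then not use STRD, whose first
   register must be even on TARGET_ARM.  */
double g;

void
foo (void)
{
  register double d asm ("r1");  /* force an odd starting register */
  asm volatile ("" : "=r" (d));  /* make d live in r1:r2 */
  g = d;                         /* this store must not use STRD r1, r2 */
}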

This patch restricts the STRD-emitting logic in output_move_double to avoid
odd-numbered source registers for STRD.
I'm not a fan of the whole function; we should be exposing a lot of the logic
in there to RTL rather than handling it at the final output stage, but that
would need to be fixed separately.

This patch is much safer for backporting purposes.

Bootstrapped and tested on arm-none-linux-gnueabihf.


Hi Kyrill,

I think you want to skip this test if one overrides -mfloat-abi, like
the small attached patch does.
OK?


Ok.
Thanks,
Kyrill



Thanks,

Christophe


Committing to trunk.
Thanks,
Kyrill

2018-06-29  Kyrylo Tkachov  

  * config/arm/arm.c (output_move_double): Don't allow STRD instructions
  if starting source register is not even.

2018-06-29  Kyrylo Tkachov  

  * gcc.target/arm/arm-soft-strd-even.c: New test.




Re: [PATCH, GCC, AARCH64] Add support for +profile extension

2018-07-09 Thread Kyrill Tkachov

Hi Andre,

On 09/07/18 14:20, Andre Vieira (lists) wrote:

Hi,

This patch adds support for the Statistical Profiling Extension (SPE) on
AArch64. Even though the compiler will not generate code any differently
given this extension, it will need to pass it on to the assembler in
order to let it correctly assemble inline asm containing accesses to the
extension's system registers.  The same applies when using the
preprocessor on an assembly file as this first must pass through cc1.
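
To illustrate the use case (the system register name below is an assumption
for the example; consult the architecture reference for the SPE registers
you need):

/* Reading an SPE system register from inline asm; without +profile the
   assembler would reject the mrs.  Build with e.g.:
     gcc -march=armv8.2-a+profile -c spe.c  */
unsigned long
read_pmbptr (void)
{
  unsigned long x;
  asm volatile ("mrs %0, pmbptr_el1" : "=r" (x));
  return x;
}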

I left the hwcaps string for SPE empty as the kernel does not define a
feature string for this extension.  The current effect of this is that the
driver will disable the profile feature bit in GCC.  This is OK though
because we don't, nor do we ever, enable this feature bit, as codegen is not
affected by the SPE support and, more importantly, the driver will still
pass the extension down to the assembler regardless.

Bootstrapped on aarch64-none-linux-gnu and ran regression tests.

Is it OK for trunk?



This looks sensible to me (though you'll need approval from a maintainer) with 
a small ChangeLog nit...


gcc/ChangeLog:
2018-07-09  Andre Vieira 

* config/aarch64/aarch64-option-extensions.def: New entry for profile
extension.
* config/aarch64/aarch64.h (AARCH64_FL_PROFILE): New.
* doc/invoke.texi (aarch64-feature-modifiers): New entry for profile
extension.

gcc/testsuite/ChangeLog:
2018-07-09 Andre Vieira 



... Make sure there are two spaces between the date, name and email.

Thanks,
Kyrill


* gcc.target/aarch64/profile.c: New test.




Re: [AArch64] Use arrays and loops rather than numbered variables in aarch64_operands_adjust_ok_for_ldpstp [1/2]

2018-07-10 Thread Kyrill Tkachov

Hi Jackson,

On 10/07/18 09:37, Jackson Woodruff wrote:

Hi all,

This patch removes some duplicated code.  Since this method deals with
four loads or stores, there is a lot of duplicated code that can easily
be replaced with smaller loops.
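
For readers skimming the thread, the shape of the cleanup is roughly this
(a generic sketch, not the actual aarch64.c code):

#include <stdbool.h>

/* Instead of off_1 ... off_4 with four copies of every check, use an
   array and a loop so each check is written once.  */
static bool
offsets_are_consecutive (const long off[4], long step)
{
  for (int i = 1; i < 4; i++)
    if (off[i] != off[i - 1] + step)
      return false;
  return true;
}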

Regtest and bootstrap OK.

OK for trunk?



This looks like a good cleanup. There are no functional changes, right?
Looks good to me, but you'll need approval from a maintainer.

Thanks,
Kyrill


Thanks,

Jackson

Changelog:

gcc/

2018-06-28  Jackson Woodruff 

 * config/aarch64/aarch64.c (aarch64_operands_adjust_ok_for_ldpstp):
 Use arrays instead of numbered variables.





Re: [testsuite] Minor tweak to 4 Aarch64 testcases

2018-07-13 Thread Kyrill Tkachov

Hi Eric,

On 13/07/18 09:23, Eric Botcazou wrote:

These 4 AArch64 testcases use dg-xfail-if to disable themselves on ARM, while
all the other equivalent testcases use dg-skip-if.  The latter form is better
because it doesn't unnecessarily pollute the testsuite report.

Tested on arm-eabi, OK for the mainline?




I used xfail for these testcases in particular because the intrinsics that they 
test
should be available for both arm and aarch64.
They are currently not implemented on arm, even though they should be.
The other tests that are skipped instead of xfailed test intrinsics that should
only be available on aarch64 and not arm.
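
For reference, the two directive forms under discussion look like this
(illustrative placements and messages, not the exact lines from the
committed patch):

/* xfail: the test still runs, and a failure is reported as XFAIL rather
   than FAIL -- appropriate when arm should eventually implement these.  */
/* { dg-xfail-if "intrinsics not yet implemented on arm" { arm*-*-* } } */

/* skip: the test is not run at all on the listed targets.  */
/* { dg-skip-if "intrinsics only available on aarch64" { arm*-*-* } } */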

Cheers,
Kyrill


2018-07-13  Eric Botcazou  

* gcc.target/aarch64/advsimd-intrinsics/vld1x2.c: Replace dg-xfail-if
with dg-skip-if for ARM targets.
* gcc.target/aarch64/advsimd-intrinsics/vld1x3.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vst1x2.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vst1x3.c: Likewise.

--
Eric Botcazou




Re: [PATCH] [v2][aarch64] Avoid tag collisions for loads falkor

2018-07-13 Thread Kyrill Tkachov

Hi Siddhesh,

On 13/07/18 12:26, Siddhesh Poyarekar wrote:

Hi,

This is a rewrite of the tag collision avoidance patch that Kugan had
written as a machine reorg pass back in February.

The falkor hardware prefetching system uses a combination of the
source, destination and offset to decide which prefetcher unit to
train with the load.  This is great when loads in a loop are
sequential but sub-optimal if there are unrelated loads in a loop that
tag to the same prefetcher unit.

This pass attempts to rename the destination register of such colliding
loads using routines available in regrename.c so that their tags do
not collide.  This shows some performance gains with mcf and xalancbmk
(~5% each) and will be tweaked further.  The pass is placed near the
tail end of the pass list so that subsequent passes don't inadvertently
end up undoing the renames.

A full gcc bootstrap and testsuite ran successfully on aarch64, i.e. it
did not introduce any new regressions.  I also did a make-check with
-mcpu=falkor to ensure that there were no regressions.  The couple of
regressions I found were target-specific and were related to scheduling
and cost differences and are not correctness issues.

Changes from v1:

- Fixed up issues pointed out by Kyrill
- Avoid renaming R0/V0 since they could be return values
- Fixed minor formatting issues.

2018-07-02  Siddhesh Poyarekar 
Kugan Vivekanandarajah 

* config/aarch64/falkor-tag-collision-avoidance.c: New file.
* config.gcc (extra_objs): Build it.
* config/aarch64/t-aarch64 (falkor-tag-collision-avoidance.o):
Likewise.
* config/aarch64/aarch64-passes.def
(pass_tag_collision_avoidance): New pass.
* config/aarch64/aarch64.c (qdf24xx_tunings): Add
AARCH64_EXTRA_TUNE_RENAME_LOAD_REGS to tuning_flags.
(aarch64_classify_address): Remove static qualifier.
(aarch64_address_info, aarch64_address_type): Move to...
* config/aarch64/aarch64-protos.h: ... here.
(make_pass_tag_collision_avoidance): New function.
* config/aarch64/aarch64-tuning-flags.def (rename_load_regs):
New tuning flag.



This looks good to me modulo a couple of minor comments inline.
You'll still need an approval from a maintainer.

Thanks,
Kyrill


CC: james.greenha...@arm.com
CC: kyrylo.tkac...@foss.arm.com
---
 gcc/config.gcc|   2 +-
 gcc/config/aarch64/aarch64-passes.def |   1 +
 gcc/config/aarch64/aarch64-protos.h   |  49 +
 gcc/config/aarch64/aarch64-tuning-flags.def   |   2 +
 gcc/config/aarch64/aarch64.c  |  48 +-
 .../aarch64/falkor-tag-collision-avoidance.c  | 856 ++
 gcc/config/aarch64/t-aarch64  |   9 +
 7 files changed, 921 insertions(+), 46 deletions(-)





+/* Find the use def chain in which INSN exists and then see if there is a
+   definition inside the loop and outside it.  We use this as a simple
+   approximation to determine whether the base register is an IV.  The basic
+   idea is to find INSN in the use-def chains for its base register and find
+   all definitions that reach it.  Of all these definitions, there should be at
+   least one definition that is a simple addition of a constant value, either
+   as a binary operation or a pre or post update.
+
+   The function returns true if the base register is estimated to be an IV.  */
+static bool
+iv_p (rtx_insn *insn, rtx reg, struct loop *loop)
+{
+  df_ref ause;
+  unsigned regno = REGNO (reg);
+
+  /* Ignore loads from the stack.  */
+  if (regno == SP_REGNUM)
+return false;
+
+  for (ause= DF_REG_USE_CHAIN (regno); ause; ause = DF_REF_NEXT_REG (ause))
+{


Space after ause


+  if (!DF_REF_INSN_INFO (ause)
+ || !NONDEBUG_INSN_P (DF_REF_INSN (ause)))
+   continue;
+
+  if (insn != DF_REF_INSN (ause))
+   continue;
+
+  struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
+  df_ref def_rec;
+
+  FOR_EACH_INSN_INFO_DEF (def_rec, insn_info)
+   {
+ rtx_insn *insn = DF_REF_INSN (def_rec);
+ basic_block bb = BLOCK_FOR_INSN (insn);
+
+ if (dominated_by_p (CDI_DOMINATORS, bb, loop->header)
+ && bb->loop_father == loop)
+   {
+ if (recog_memoized (insn) < 0)
+   continue;
+
+ rtx pat = PATTERN (insn);
+
+ /* Prefetch or clobber; unlikely to be a constant stride.  The
+falkor software prefetcher tuning is pretty conservative, so
+its presence indicates that the access pattern is probably
+strided but most likely with an unknown stride size or a
+stride size that is quite large.  */
+ if (GET_CODE (pat) != SET)
+   continue;
+
+ rtx x = SET_SRC (pat);
+ if (GET_CODE (x) == ZERO_EXTRACT
+ || GET_CODE (x) == ZERO_EXTEND
+ || GET_CODE (x) == SIGN_EXTEND

Re: [PATCH] [v3][aarch64] Avoid tag collisions for loads falkor

2018-07-16 Thread Kyrill Tkachov

Hi Siddhesh,

On 16/07/18 11:00, Siddhesh Poyarekar wrote:

Hi,

This is a rewrite of the tag collision avoidance patch that Kugan had
written as a machine reorg pass back in February.

The falkor hardware prefetching system uses a combination of the
source, destination and offset to decide which prefetcher unit to
train with the load.  This is great when loads in a loop are
sequential but sub-optimal if there are unrelated loads in a loop that
tag to the same prefetcher unit.

This pass attempts to rename the destination register of such colliding
loads using routines available in regrename.c so that their tags do
not collide.  This shows some performance gains with mcf and xalancbmk
(~5% each) and will be tweaked further.  The pass is placed near the
tail end of the pass list so that subsequent passes don't inadvertently
end up undoing the renames.

A full gcc bootstrap and testsuite ran successfully on aarch64, i.e. it
did not introduce any new regressions.  I also did a make-check with
-mcpu=falkor to ensure that there were no regressions.  The couple of
regressions I found were target-specific and were related to scheduling
and cost differences and are not correctness issues.

Changes from v2:
- Ignore SVE instead of asserting that falkor does not support sve

Changes from v1:

- Fixed up issues pointed out by Kyrill
- Avoid renaming R0/V0 since they could be return values
- Fixed minor formatting issues.

2018-07-02  Siddhesh Poyarekar  
Kugan Vivekanandarajah  

* config/aarch64/falkor-tag-collision-avoidance.c: New file.
* config.gcc (extra_objs): Build it.
* config/aarch64/t-aarch64 (falkor-tag-collision-avoidance.o):
Likewise.
* config/aarch64/aarch64-passes.def
(pass_tag_collision_avoidance): New pass.
* config/aarch64/aarch64.c (qdf24xx_tunings): Add
AARCH64_EXTRA_TUNE_RENAME_LOAD_REGS to tuning_flags.
(aarch64_classify_address): Remove static qualifier.
(aarch64_address_info, aarch64_address_type): Move to...
* config/aarch64/aarch64-protos.h: ... here.
(make_pass_tag_collision_avoidance): New function.
* config/aarch64/aarch64-tuning-flags.def (rename_load_regs):
New tuning flag.


I think this looks ok now. You'll still need a maintainer to approve it though.

Thanks for iterating on this,
Kyrill


CC: james.greenha...@arm.com
CC: kyrylo.tkac...@foss.arm.com
---
  gcc/config.gcc|   2 +-
  gcc/config/aarch64/aarch64-passes.def |   1 +
  gcc/config/aarch64/aarch64-protos.h   |  49 +
  gcc/config/aarch64/aarch64-tuning-flags.def   |   2 +
  gcc/config/aarch64/aarch64.c  |  48 +-
  .../aarch64/falkor-tag-collision-avoidance.c  | 857 ++
  gcc/config/aarch64/t-aarch64  |   9 +
  7 files changed, 922 insertions(+), 46 deletions(-)
  create mode 100644 gcc/config/aarch64/falkor-tag-collision-avoidance.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 63162aab676..c66dda0770e 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -304,7 +304,7 @@ aarch64*-*-*)
extra_headers="arm_fp16.h arm_neon.h arm_acle.h"
c_target_objs="aarch64-c.o"
cxx_target_objs="aarch64-c.o"
-   extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o"
+   extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o 
falkor-tag-collision-avoidance.o"
target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.c"
target_has_targetm_common=yes
;;
diff --git a/gcc/config/aarch64/aarch64-passes.def b/gcc/config/aarch64/aarch64-passes.def
index 87747b420b0..f61a8870aa1 100644
--- a/gcc/config/aarch64/aarch64-passes.def
+++ b/gcc/config/aarch64/aarch64-passes.def
@@ -19,3 +19,4 @@
 .  */
  
  INSERT_PASS_AFTER (pass_regrename, 1, pass_fma_steering);

+INSERT_PASS_AFTER (pass_machine_reorg, 1, pass_tag_collision_avoidance);
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 87c6ae20278..0a4558c2023 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -283,6 +283,49 @@ struct tune_params
const struct cpu_prefetch_tune *prefetch;
  };
  
+/* Classifies an address.
+
+   ADDRESS_REG_IMM
+   A simple base register plus immediate offset.
+
+   ADDRESS_REG_WB
+   A base register indexed by immediate offset with writeback.
+
+   ADDRESS_REG_REG
+   A base register indexed by (optionally scaled) register.
+
+   ADDRESS_REG_UXTW
+   A base register indexed by (optionally scaled) zero-extended register.
+
+   ADDRESS_REG_SXTW
+   A base register indexed by (optionally scaled) sign-extended register.
+
+   ADDRESS_LO_SUM
+   A LO_SUM rtx with a base register and "LO12" symbol relocation.
+
+   ADDRESS_SYMBOLIC:
+   A constant symbolic address, in pc-relative literal pool.  */
+
+enum aarch64_address_type

Re: Avoid assembler warnings from AArch64 constructor/destructor priorities

2018-07-17 Thread Kyrill Tkachov



On 02/02/18 15:14, Kyrill Tkachov wrote:


On 01/02/18 17:26, Joseph Myers wrote:

On Thu, 1 Feb 2018, Kyrill  Tkachov wrote:


Hi Joseph, aarch64 maintainers,

On 28/09/17 13:31, Joseph Myers wrote:

Many GCC tests fail for AArch64 with current binutils because of
assembler warnings of the form "Warning: ignoring incorrect section
type for .init_array.00100".  The same issue was fixed for ARM in
r247015 by using SECTION_NOTYPE when creating those sections; this
patch applies the same fix to AArch64.
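
For context, the warning can be reproduced with a prioritized constructor,
which lands in a numbered .init_array section (a minimal sketch):

/* Before the fix, GCC emitted the .init_array.NNNNN section with an
   explicit type, which gas overrides with the "ignoring incorrect section
   type" warning; with SECTION_NOTYPE the .section directive carries no
   type and gas picks the correct one itself.  */
__attribute__ ((constructor (150)))
static void
init_with_priority (void)
{
}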

Tested with no regressions with cross to aarch64-linux-gnu. OK to
commit?

There is a user request to backport this patch to the GCC 7 and 6
branches: PR 84168.

Would that be acceptable?

I don't object, but I'm not aware of this being a regression.


I think it's been present since the beginning of the port so
it's not a regression, just a user request for a bugfix backport.

The patch applies, bootstraps and tests cleanly on both branches FWIW.



Gentle ping on this (I had almost forgotten about it).
Do the aarch64 maintainers mind this patch being backported to the GCC 7 and
6 branches?

https://gcc.gnu.org/ml/gcc-patches/2018-02/msg00054.html

Thanks,
Kyrill




[PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate

2018-07-17 Thread Kyrill Tkachov

Hi all,

This is my first Fortran patch, so apologies if I'm missing something.
The current expansion of the min and max intrinsics explicitly expands
the comparisons between each argument to calculate the global min/max.
Some targets, like aarch64, have instructions that can calculate the min/max
of two real (floating-point) numbers with the proper NaN-handling semantics
(if both inputs are NaN, return NaN; if one is NaN, return the other), and
those are the semantics provided by the __builtin_fmin/max family of
functions that expand to these instructions.
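
As a quick demonstration of those semantics (standard C99 fmin; link with
-lm):

#include <math.h>
#include <stdio.h>

int
main (void)
{
  double n = NAN;
  printf ("%f\n", fmin (n, 2.0));  /* prints 2.000000 */
  printf ("%f\n", fmin (2.0, n));  /* prints 2.000000 */
  printf ("%f\n", fmin (n, n));    /* prints nan */
  return 0;
}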

This patch makes the frontend emit __builtin_fmin/max directly to compare each
pair of numbers when the numbers are floating-point, and use MIN_EXPR/MAX_EXPR
otherwise (integral types and -ffast-math), which should hopefully be easier
to recognise in the midend and optimise.  The previous approach of generating
the open-coded version of that is used when we don't have an appropriate
__builtin_fmin/max available.  For example, for a configuration of
x86_64-unknown-linux-gnu that I tested there was no 128-bit __builtin_fminl
available.

With this patch I'm seeing more than 7000 FMINNM/FMAXNM instructions being
generated at -O3 on aarch64 for 521.wrf from fprate SPEC2017 where none were
generated before (we were generating explicit comparisons and NaN checks).
This gave a 2.4% improvement in performance on a Cortex-A72.

Bootstrapped and tested on aarch64-none-linux-gnu and x86_64-unknown-linux-gnu.

Ok for trunk?
Thanks,
Kyrill

2018-07-17  Kyrylo Tkachov  

* f95-lang.c (gfc_init_builtin_functions): Define __builtin_fmin,
__builtin_fminf, __builtin_fminl, __builtin_fmax, __builtin_fmaxf,
__builtin_fmaxl.
* trans-intrinsic.c: Include builtins.h.
(gfc_conv_intrinsic_minmax): Emit __builtin_fmin/max or MIN/MAX_EXPR
functions to calculate the min/max.

2018-07-17  Kyrylo Tkachov  

* gfortran.dg/max_fmaxf.f90: New test.
* gfortran.dg/min_fminf.f90: Likewise.
* gfortran.dg/minmax_integer.f90: Likewise.
* gfortran.dg/max_fmaxl_aarch64.f90: Likewise.
* gfortran.dg/min_fminl_aarch64.f90: Likewise.
diff --git a/gcc/fortran/f95-lang.c b/gcc/fortran/f95-lang.c
index 0f39f0ca788ea9e5868d4718c5f90c102958968f..5dd58f3d3d0242d77a5838ffa0395e7b941c8f85 100644
--- a/gcc/fortran/f95-lang.c
+++ b/gcc/fortran/f95-lang.c
@@ -724,6 +724,20 @@ gfc_init_builtin_functions (void)
   gfc_define_builtin ("__builtin_roundf", mfunc_float[0], 
 		  BUILT_IN_ROUNDF, "roundf", ATTR_CONST_NOTHROW_LEAF_LIST);
 
+  gfc_define_builtin  ("__builtin_fmin", mfunc_double[1],
+		  BUILT_IN_FMIN, "fmin", ATTR_CONST_NOTHROW_LEAF_LIST);
+  gfc_define_builtin  ("__builtin_fminf", mfunc_float[1],
+		  BUILT_IN_FMINF, "fminf", ATTR_CONST_NOTHROW_LEAF_LIST);
+  gfc_define_builtin  ("__builtin_fminl", mfunc_longdouble[1],
+		  BUILT_IN_FMINL, "fminl", ATTR_CONST_NOTHROW_LEAF_LIST);
+
+  gfc_define_builtin  ("__builtin_fmax", mfunc_double[1],
+		  BUILT_IN_FMAX, "fmax", ATTR_CONST_NOTHROW_LEAF_LIST);
+  gfc_define_builtin  ("__builtin_fmaxf", mfunc_float[1],
+		  BUILT_IN_FMAXF, "fmaxf", ATTR_CONST_NOTHROW_LEAF_LIST);
+  gfc_define_builtin  ("__builtin_fmaxl", mfunc_longdouble[1],
+		  BUILT_IN_FMAXL, "fmaxl", ATTR_CONST_NOTHROW_LEAF_LIST);
+
   gfc_define_builtin ("__builtin_truncl", mfunc_longdouble[0],
 		  BUILT_IN_TRUNCL, "truncl", ATTR_CONST_NOTHROW_LEAF_LIST);
   gfc_define_builtin ("__builtin_trunc", mfunc_double[0],
diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index d306e3a5a6209c1621d91f99ffc366acecd9c3d0..5dde54a3f3c2606f987b42480b1921e6304ccda5 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "trans.h"
 #include "stringpool.h"
 #include "fold-const.h"
+#include "builtins.h"
 #include "tree-nested.h"
 #include "stor-layout.h"
 #include "toplev.h"	/* For rest_of_decl_compilation.  */
@@ -3874,14 +3875,13 @@ gfc_conv_intrinsic_ttynam (gfc_se * se, gfc_expr * expr)
 minmax (a1, a2, a3, ...)
 {
   mvar = a1;
-  if (a2 .op. mvar || isnan (mvar))
-mvar = a2;
-  if (a3 .op. mvar || isnan (mvar))
-mvar = a3;
+  __builtin_fmin/max (mvar, a2);
+  __builtin_fmin/max (mvar, a3);
   ...
-  return mvar
+  return mvar;
 }
- */
+   For integral types or when we don't care about NaNs use
+   MIN/MAX_EXPRs.  */
 
 /* TODO: Mismatching types can occur when specific names are used.
These should be handled during resolution.  */
@@ -3891,7 +3891,6 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
   tree tmp;
   tree mvar;
   tree val;
-  tree thencase;
   tree *args;
   tree type;
   gfc_actual_arglist *argexpr;
@@ -3912,55 +3911,79 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
 
   mvar = gfc_create_var (type, "M");
   gfc_add_modify (&se->pre, mvar, args[0]);
-  for (i = 1,

Re: [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate

2018-07-17 Thread Kyrill Tkachov

Hi Richard,

On 17/07/18 14:27, Richard Biener wrote:

On Tue, Jul 17, 2018 at 2:35 PM Kyrill Tkachov
 wrote:

Hi all,

This is my first Fortran patch, so apologies if I'm missing something.
The current expansion of the min and max intrinsics explicitly expands
the comparisons between each argument to calculate the global min/max.
Some targets, like aarch64, have instructions that can calculate the min/max
of two real (floating-point) numbers with the proper NaN-handling semantics
(if both inputs are NaN, return NaN; if one is NaN, return the other) and those
are the semantics provided by the __builtin_fmin/max family of functions that 
expand
to these instructions.

This patch makes the frontend emit __builtin_fmin/max directly to compare each
pair of numbers when the numbers are floating-point, and use MIN_EXPR/MAX_EXPR 
otherwise
(integral types and -ffast-math) which should hopefully be easier to recognise 
in the

What is Fortran's requirement on min/max intrinsics?  Doesn't it only
require things that are guaranteed by MIN/MAX_EXPR anyway?  The only
restriction here is


The current implementation expands to:
mvar = a1;
if (a2 .op. mvar || isnan (mvar))
  mvar = a2;
if (a3 .op. mvar || isnan (mvar))
  mvar = a3;
...
return mvar;

That is, if one of the operands is a NaN it will return the other argument.
If both (all) are NaNs, it will return NaN. This is the same as the semantics 
of fmin/max
as far as I can tell.


/* Minimum and maximum values.  When used with floating point, if both
operands are zeros, or if either operand is NaN, then it is unspecified
which of the two operands is returned as the result.  */

which means MIN/MAX_EXPR are not strictly IEEE compliant with signed
zeros or NaNs.
Thus the correct test would be !HONOR_SIGNED_ZEROS && !HONOR_NANS if signed
zeros are significant.


True, MIN/MAX_EXPR would not be appropriate in that condition. I guarded their 
use
on !HONOR_NANS (type) only. I'll update it to !HONOR_SIGNED_ZEROS (type) && 
!HONOR_NANS (type).




I'm not sure if using fmin/max calls when we cannot use MIN/MAX_EXPR is a
good idea; this may both generate bigger code and be slower.


The patch will generate fmin/fmax calls (or the fminf,fminl variants) when 
mathfn_built_in advertises
them as available (does that mean they'll have a fast inline implementation?)

If the above doesn't hold and we can't use either MIN/MAX_EXPR or fmin/fmax
then the patch falls back to the existing expansion.

FWIW, this patch does improve performance on 521.wrf from SPEC2017 on aarch64.

Thanks,
Kyrill



Richard.


midend and optimise. The previous approach of generating the open-coded version 
of that
is used when we don't have an appropriate __builtin_fmin/max available.
For example, for a configuration of x86_64-unknown-linux-gnu that I tested 
there was no
128-bit __builtin_fminl available.

With this patch I'm seeing more than 7000 FMINNM/FMAXNM instructions being 
generated at -O3
on aarch64 for 521.wrf from fprate SPEC2017 where none before were generated
(we were generating explicit comparisons and NaN checks). This gave a 2.4% 
improvement
in performance on a Cortex-A72.

Bootstrapped and tested on aarch64-none-linux-gnu and x86_64-unknown-linux-gnu.

Ok for trunk?
Thanks,
Kyrill

2018-07-17  Kyrylo Tkachov  

  * f95-lang.c (gfc_init_builtin_functions): Define __builtin_fmin,
  __builtin_fminf, __builtin_fminl, __builtin_fmax, __builtin_fmaxf,
  __builtin_fmaxl.
  * trans-intrinsic.c: Include builtins.h.
  (gfc_conv_intrinsic_minmax): Emit __builtin_fmin/max or MIN/MAX_EXPR
  functions to calculate the min/max.

2018-07-17  Kyrylo Tkachov  

  * gfortran.dg/max_fmaxf.f90: New test.
  * gfortran.dg/min_fminf.f90: Likewise.
  * gfortran.dg/minmax_integer.f90: Likewise.
  * gfortran.dg/max_fmaxl_aarch64.f90: Likewise.
  * gfortran.dg/min_fminl_aarch64.f90: Likewise.




Re: [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate

2018-07-17 Thread Kyrill Tkachov

Hi Thomas,

On 17/07/18 16:36, Thomas Koenig wrote:

Hi Kyrill,


The current implementation expands to:
 mvar = a1;
 if (a2 .op. mvar || isnan (mvar))
   mvar = a2;
 if (a3 .op. mvar || isnan (mvar))
   mvar = a3;
 ...
 return mvar;

That is, if one of the operands is a NaN it will return the other argument.
If both (all) are NaNs, it will return NaN. This is the same as the semantics 
of fmin/max
as far as I can tell.


I've looked at the F2008 standard, and, interestingly enough, the
requirements on MIN and MAX do not mention NaNs at all.  13.7.106
has, for MAX,

Result Value. The value of the result is that of the largest argument.

plus some stuff about character variables (not relevant here). Similar
for MIN.

Also, the section on IEEE_ARITHMETIC (14.9) does not mention
comparisons; also, "Complete conformance with IEC 60559:1989 is not
required"; what is required is correct support for +, -, and *,
plus support for / if IEEE_SUPPORT_DIVIDE is covered.



Thanks for checking this.


So, the Fortran standard does not impose many requirements. I do think
that a patch such as yours should not change the current behavior unless
we know what it does and do think it is a good idea.  Hmm...

Having said that, I think we pretty much cover all the corner cases
in nan_1.f90, so if that test passes without regression, then that
aspect should be fine.



Looking at the test, it seems there is a de facto expected behaviour.
For example it contains:
if (max(2.d0, nan) /= 2.d0) STOP 9

So it definitely expects comparison with NaN to return the non-NaN result,
which is the behaviour that my patch preserves.

On integral arguments or when we don't care about NaNs (-Ofast and such) we'll 
be using
the MIN/MAX_EXPR, which doesn't specify what's returned on a NaN argument, thus 
allowing
for more aggressive optimisations.


Question: You have found an advantage on AArch64.  Do you have access to
other architectures to see if there is also a speed advantage, or maybe a
disadvantage?



Because the expansion now emits straight-line code rather than conditionals
and branches it should be easier to optimise in general, so I'd expect this
to be an improvement overall.  That said, I have only benchmarked it on
SPEC2017 on aarch64.

If you have any benchmarks of interest that you (or somebody else) can run
on a target you care about, I would be very grateful for any results.

Thanks,
Kyrill


Regards

Thomas




Re: [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate

2018-07-18 Thread Kyrill Tkachov



On 18/07/18 10:44, Richard Biener wrote:

On Tue, Jul 17, 2018 at 3:46 PM Kyrill Tkachov
 wrote:

Hi Richard,

On 17/07/18 14:27, Richard Biener wrote:

On Tue, Jul 17, 2018 at 2:35 PM Kyrill Tkachov
 wrote:

Hi all,

This is my first Fortran patch, so apologies if I'm missing something.
The current expansion of the min and max intrinsics explicitly expands
the comparisons between each argument to calculate the global min/max.
Some targets, like aarch64, have instructions that can calculate the min/max
of two real (floating-point) numbers with the proper NaN-handling semantics
(if both inputs are NaN, return NaN; if one is NaN, return the other) and those
are the semantics provided by the __builtin_fmin/max family of functions that 
expand
to these instructions.

This patch makes the frontend emit __builtin_fmin/max directly to compare each
pair of numbers when the numbers are floating-point, and use MIN_EXPR/MAX_EXPR 
otherwise
(integral types and -ffast-math) which should hopefully be easier to recognise 
in the

What is Fortran's requirement on min/max intrinsics?  Doesn't it only
require things that are guaranteed by MIN/MAX_EXPR anyway?  The only
restriction here is

The current implementation expands to:
  mvar = a1;
  if (a2 .op. mvar || isnan (mvar))
mvar = a2;
  if (a3 .op. mvar || isnan (mvar))
mvar = a3;
  ...
  return mvar;

That is, if one of the operands is a NaN it will return the other argument.
If both (all) are NaNs, it will return NaN. This is the same as the semantics 
of fmin/max
as far as I can tell.


/* Minimum and maximum values.  When used with floating point, if both
 operands are zeros, or if either operand is NaN, then it is unspecified
 which of the two operands is returned as the result.  */

which means MIN/MAX_EXPR are not strictly IEEE compliant with signed
zeros or NaNs.
Thus the correct test would be !HONOR_SIGNED_ZEROS && !HONOR_NANS if signed
zeros are significant.

True, MIN/MAX_EXPR would not be appropriate in that condition. I guarded their 
use
on !HONOR_NANS (type) only. I'll update it to !HONOR_SIGNED_ZEROS (type) && 
!HONOR_NANS (type).



I'm not sure if using fmin/max calls when we cannot use MIN/MAX_EXPR is a
good idea; this may both generate bigger code and be slower.

The patch will generate fmin/fmax calls (or the fminf,fminl variants) when 
mathfn_built_in advertises
them as available (does that mean they'll have a fast inline implementation?)

This doesn't mean anything given you make them available with your
patch ;)  So I expect it may
cause issues for !c99_runtime targets (and long double at least).


Urgh, that can cause headaches...


If the above doesn't hold and we can't use either MIN/MAX_EXPR or fmin/fmax
then the patch falls back to the existing expansion.

As said I would not use fmin/fmax calls here at all.


... Given the comments from Thomas and Janne, maybe we should just emit 
MIN/MAX_EXPRs here
since there is no language requirement on NaN/signed zero handling on these 
intrinsics?
That should make it simpler and more portable.


FWIW, this patch does improve performance on 521.wrf from SPEC2017 on aarch64.

You said that, yes.  Even without -ffast-math?


It improves at -O3 without -ffast-math in particular.  With -ffast-math the
phiopt optimisation is more aggressive and merges the conditionals into
MIN/MAX_EXPRs (minmax_replacement in tree-ssa-phiopt.c).
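
A sketch of the shape minmax_replacement acts on (illustrative function, not
from the patch):

/* Without -ffinite-math-only GCC must keep the branch, since MIN_EXPR
   leaves the NaN case unspecified; with -ffast-math phiopt rewrites the
   conditional into a single MIN_EXPR.  */
double
my_min (double a, double b)
{
  double m = a;
  if (b < m)
    m = b;
  return m;
}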

Thanks,
Kyrill


Richard.


Thanks,
Kyrill


Richard.


midend and optimise. The previous approach of generating the open-coded version 
of that
is used when we don't have an appropriate __builtin_fmin/max available.
For example, for a configuration of x86_64-unknown-linux-gnu that I tested 
there was no
128-bit __builtin_fminl available.

With this patch I'm seeing more than 7000 FMINNM/FMAXNM instructions being 
generated at -O3
on aarch64 for 521.wrf from fprate SPEC2017 where none before were generated
(we were generating explicit comparisons and NaN checks). This gave a 2.4% 
improvement
in performance on a Cortex-A72.

Bootstrapped and tested on aarch64-none-linux-gnu and x86_64-unknown-linux-gnu.

Ok for trunk?
Thanks,
Kyrill

2018-07-17  Kyrylo Tkachov  

   * f95-lang.c (gfc_init_builtin_functions): Define __builtin_fmin,
   __builtin_fminf, __builtin_fminl, __builtin_fmax, __builtin_fmaxf,
   __builtin_fmaxl.
   * trans-intrinsic.c: Include builtins.h.
   (gfc_conv_intrinsic_minmax): Emit __builtin_fmin/max or MIN/MAX_EXPR
   functions to calculate the min/max.

2018-07-17  Kyrylo Tkachov  

   * gfortran.dg/max_fmaxf.f90: New test.
   * gfortran.dg/min_fminf.f90: Likewise.
   * gfortran.dg/minmax_integer.f90: Likewise.
   * gfortran.dg/max_fmaxl_aarch64.f90: Likewise.
   * gfortran.dg/min_fminl_aarch64.f90: Likewise.




[PATCH][Fortran][v2] Use MIN/MAX_EXPR for min/max intrinsics

2018-07-18 Thread Kyrill Tkachov

Hi all,

Thank you for the feedback so far.
This version of the patch doesn't try to emit fmin/fmax function calls but 
instead
emits MIN/MAX_EXPR sequences unconditionally.
I think a source of confusion in the original proposal (for me at least) was
that on aarch64 (that I primarily work on) we implement the fmin/fmax optabs
and therefore these calls are expanded to a single instruction.
But on x86_64 these optabs are not implemented and therefore expand to actual 
library calls.
Therefore at -O3 (no -ffast-math) I saw a gain on aarch64. But I measured today
on x86_64 and saw a regression.
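
To make the optab difference concrete (illustrative codegen, not copied
from a build):

double
f (double a, double b)
{
  /* aarch64 at -O2: a single fminnm instruction.
     x86_64 at -O2, with no fmin optab: a call to the fmin library
     function.  */
  return __builtin_fmin (a, b);
}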

Thomas and Janne suggested that the Fortran standard does not impose a 
requirement
on NaN handling for the min/max intrinsics, which would make emitting 
MIN/MAX_EXPR
sequences unconditionally a valid approach.

However, the gfortran.dg/nan_1.f90 test checks that handling of NaN values in
these intrinsics follows the IEEE semantics (min (nan, 2.0) == 2.0, for 
example).
This is not required by the standard, but is the existing gfortran behaviour.

If we end up always emitting MIN/MAX_EXPR sequences, like this version of the 
patch does,
then that test fails on some configurations of x86_64 and not others (for me it 
FAILs
by default, but passes with -march=native on my machine) and passes on AArch64.
This is expected since MIN/MAX_EXPR doesn't enforce IEEE behaviour on its 
arguments.

However, by always emitting MIN/MAX_EXPR the gfc_conv_intrinsic_minmax function 
is
simplified and, perhaps more importantly, generates faster code in the -O3 case.
With this patch I see performance improvement on 521.wrf on both AArch64 (3.7%)
and x86_64 (5.4%).

Thomas, Janne, would this relaxation of NaN handling be acceptable given the 
benefits
mentioned above? If so, what would be the recommended adjustment to the 
nan_1.f90 test?

Thanks,
Kyrill

2018-07-18  Kyrylo Tkachov  

* trans-intrinsic.c (gfc_conv_intrinsic_minmax): Emit MIN/MAX_EXPR
sequence to calculate the min/max.

2018-07-18  Kyrylo Tkachov  

* gfortran.dg/max_float.f90: New test.
* gfortran.dg/min_float.f90: Likewise.
* gfortran.dg/minmax_integer.f90: Likewise.
diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index d306e3a5a6209c1621d91f99ffc366acecd9c3d0..e5a1f1ddabeedc7b9f473db11e70f29548fc69ac 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -3874,14 +3874,11 @@ gfc_conv_intrinsic_ttynam (gfc_se * se, gfc_expr * expr)
 minmax (a1, a2, a3, ...)
 {
   mvar = a1;
-  if (a2 .op. mvar || isnan (mvar))
-mvar = a2;
-  if (a3 .op. mvar || isnan (mvar))
-mvar = a3;
+  mvar = MIN/MAX_EXPR (mvar, a2);
+  mvar = MIN/MAX_EXPR (mvar, a3);
   ...
-  return mvar
-}
- */
+  return mvar;
+}  */
 
 /* TODO: Mismatching types can occur when specific names are used.
These should be handled during resolution.  */
@@ -3891,7 +3888,6 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
   tree tmp;
   tree mvar;
   tree val;
-  tree thencase;
   tree *args;
   tree type;
   gfc_actual_arglist *argexpr;
@@ -3912,55 +3908,37 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
 
   mvar = gfc_create_var (type, "M");
   gfc_add_modify (&se->pre, mvar, args[0]);
-  for (i = 1, argexpr = argexpr->next; i < nargs; i++)
-{
-  tree cond, isnan;
 
+  for (i = 1, argexpr = argexpr->next; i < nargs; i++, argexpr = argexpr->next)
+{
+  tree cond = NULL_TREE;
   val = args[i];
 
   /* Handle absent optional arguments by ignoring the comparison.  */
   if (argexpr->expr->expr_type == EXPR_VARIABLE
 	  && argexpr->expr->symtree->n.sym->attr.optional
 	  && TREE_CODE (val) == INDIRECT_REF)
-	cond = fold_build2_loc (input_location,
+	{
+	  cond = fold_build2_loc (input_location,
 NE_EXPR, logical_type_node,
 TREE_OPERAND (val, 0),
 			build_int_cst (TREE_TYPE (TREE_OPERAND (val, 0)), 0));
-  else
-  {
-	cond = NULL_TREE;
-
+	}
+  else if (!VAR_P (val) && !TREE_CONSTANT (val))
 	/* Only evaluate the argument once.  */
-	if (!VAR_P (val) && !TREE_CONSTANT (val))
-	  val = gfc_evaluate_now (val, &se->pre);
-  }
-
-  thencase = build2_v (MODIFY_EXPR, mvar, convert (type, val));
+	val = gfc_evaluate_now (val, &se->pre);
 
-  tmp = fold_build2_loc (input_location, op, logical_type_node,
-			 convert (type, val), mvar);
+  tree calc;
 
-  /* FIXME: When the IEEE_ARITHMETIC module is implemented, the call to
-	 __builtin_isnan might be made dependent on that module being loaded,
-	 to help performance of programs that don't rely on IEEE semantics.  */
-  if (FLOAT_TYPE_P (TREE_TYPE (mvar)))
-	{
-	  isnan = build_call_expr_loc (input_location,
-   builtin_decl_explicit (BUILT_IN_ISNAN),
-   1, mvar);
-	  tmp = fold_build2_loc (input_location, TRUTH_OR_EXPR,
- logical_type_node, tmp,
- fold_convert (logical_type_node

Re: [PATCH][Fortran][v2] Use MIN/MAX_EXPR for min/max intrinsics

2018-07-18 Thread Kyrill Tkachov


On 18/07/18 14:26, Thomas König wrote:

Hi Kyrill,


On 18.07.2018 at 13:17, Kyrill Tkachov wrote:

Thomas, Janne, would this relaxation of NaN handling be acceptable given the 
benefits
mentioned above? If so, what would be the recommended adjustment to the 
nan_1.f90 test?

I would be a bit careful about changing behavior in such a major way. What 
would the results with NaN and infinity then be, with or without optimization? 
Would the results be consistent with min(nan,num) vs min(num,nan)? Would they 
be consistent with the new IEEE standard?

In general, I think that min(nan,num) should be nan and that our current 
behavior is not the best.

Does anybody have data points on how this is handled by other compilers?

Oh, and if anything is changed, then compile and runtime behavior should always 
be the same.


Thanks, that makes it clearer what behaviour is acceptable.

So this v3 patch follows Richard Sandiford's suggested approach of emitting
IFN_FMIN/FMAX when dealing with floating-point values where NaN handling is
important and the target supports IFN_FMIN/FMAX.  Otherwise the current
explicit comparison sequence is emitted.
For integer types and -ffast-math floating-point it will emit MIN/MAX_EXPR.
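
In C terms the three strategies provide roughly these semantics (a sketch,
not the trans-intrinsic.c code):

#include <math.h>

/* 1. MIN/MAX_EXPR: no guarantee for NaN (or signed-zero) inputs; used for
   integers and under -ffast-math.  */
static double min_fast (double a, double b) { return a < b ? a : b; }

/* 2. IFN_FMIN/FMAX: NaN-honouring, a single FMINNM on aarch64; used when
   the target supports the internal function.  */
static double min_ieee (double a, double b) { return fmin (a, b); }

/* 3. Open-coded fallback, preserving gfortran's existing behaviour.  */
static double
min_fallback (double a, double b)
{
  double m = a;
  if (b < m || isnan (m))
    m = b;
  return m;
}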

With this patch the nan_1.f90 behaviour is preserved on all targets: we get
the optimal sequence on aarch64, and on x86_64 we avoid the function call,
with no changes in code generation.

This gives the performance improvement on 521.wrf on aarch64 and leaves it 
unchanged on x86_64.

I'm hoping this addresses all the concerns raised in this thread:
* The NaN-handling behaviour is unchanged on all platforms.
* The fast inline sequence is emitted where it is available.
* No calls to library fmin*/fmax* are emitted where there were none.
* MIN/MAX_EXPR sequence are emitted where possible.

Is this acceptable?

Thanks,
Kyrill

2018-07-18  Kyrylo Tkachov  

* trans-intrinsic.c (gfc_conv_intrinsic_minmax): Emit MIN/MAX_EXPR
or IFN_FMIN/FMAX sequence to calculate the min/max when possible.

2018-07-18  Kyrylo Tkachov  

* gfortran.dg/max_fmaxl_aarch64.f90: New test.
* gfortran.dg/min_fminl_aarch64.f90: Likewise.
* gfortran.dg/minmax_integer.f90: Likewise.
diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index d306e3a5a6209c1621d91f99ffc366acecd9c3d0..6f5700f2a421d2a735d77c4c4ec0c4c9c058e727 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "trans.h"
 #include "stringpool.h"
 #include "fold-const.h"
+#include "internal-fn.h"
 #include "tree-nested.h"
 #include "stor-layout.h"
 #include "toplev.h"	/* For rest_of_decl_compilation.  */
@@ -3874,14 +3875,15 @@ gfc_conv_intrinsic_ttynam (gfc_se * se, gfc_expr * expr)
 minmax (a1, a2, a3, ...)
 {
   mvar = a1;
-  if (a2 .op. mvar || isnan (mvar))
-mvar = a2;
-  if (a3 .op. mvar || isnan (mvar))
-mvar = a3;
+  mvar = COMP (mvar, a2)
+  mvar = COMP (mvar, a3)
   ...
-  return mvar
+  return mvar;
 }
- */
+Where COMP is MIN/MAX_EXPR for integral types or when we don't
+care about NaNs, or IFN_FMIN/MAX when the target has support for
+fast NaN-honouring min/max.  When neither holds expand a sequence
+of explicit comparisons.  */
 
 /* TODO: Mismatching types can occur when specific names are used.
These should be handled during resolution.  */
@@ -3891,7 +3893,6 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
   tree tmp;
   tree mvar;
   tree val;
-  tree thencase;
   tree *args;
   tree type;
   gfc_actual_arglist *argexpr;
@@ -3912,55 +3913,77 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
 
   mvar = gfc_create_var (type, "M");
   gfc_add_modify (&se->pre, mvar, args[0]);
-  for (i = 1, argexpr = argexpr->next; i < nargs; i++)
-{
-  tree cond, isnan;
 
+  internal_fn ifn = op == GT_EXPR ? IFN_FMAX : IFN_FMIN;
+
+  for (i = 1, argexpr = argexpr->next; i < nargs; i++, argexpr = argexpr->next)
+{
+  tree cond = NULL_TREE;
   val = args[i];
 
   /* Handle absent optional arguments by ignoring the comparison.  */
   if (argexpr->expr->expr_type == EXPR_VARIABLE
 	  && argexpr->expr->symtree->n.sym->attr.optional
 	  && TREE_CODE (val) == INDIRECT_REF)
-	cond = fold_build2_loc (input_location,
+	{
+	  cond = fold_build2_loc (input_location,
 NE_EXPR, logical_type_node,
 TREE_OPERAND (val, 0),
 			build_int_cst (TREE_TYPE (TREE_OPERAND (val, 0)), 0));
-  else
-  {
-	cond = NULL_TREE;
-
+	}
+  else if (!VAR_P (val) && !TREE_CONSTANT (val))
 	/* Only evaluate the argument once.  */
-	if (!VAR_P (val) && !TREE_CONSTANT (val))
-	  val = gfc_evaluate_now (val,

Re: [PATCH][Fortran][v2] Use MIN/MAX_EXPR for min/max intrinsics

2018-07-18 Thread Kyrill Tkachov

Hi Richard,

On 18/07/18 16:27, Richard Sandiford wrote:

Thanks for doing this.

Kyrill  Tkachov  writes:

+ calc = build_call_expr_internal_loc (input_location, ifn, type,
+ 2, mvar, convert (type, val));

(indentation looks off)


diff --git a/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90 b/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90
new file mode 100644
index 
..8c8ea063e5d0718dc829c1f5574c5b46040e6786
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90
@@ -0,0 +1,9 @@
+! { dg-do compile { target aarch64*-*-* } }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine fool (a, b, c, d, e, f, g, h)
+  real (kind=16) :: a, b, c, d, e, f, g, h
+  a = max (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "__builtin_fmaxl " 7 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90 b/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90
new file mode 100644
index 
..92368917fb48e0c468a16d080ab3a9ac842e01a7
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90
@@ -0,0 +1,9 @@
+! { dg-do compile { target aarch64*-*-* } }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine fool (a, b, c, d, e, f, g, h)
+  real (kind=16) :: a, b, c, d, e, f, g, h
+  a = min (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "__builtin_fminl " 7 "optimized" } }

Do these still pass?  I wouldn't have expected us to use __builtin_fmin*
and __builtin_fmax* now.

It would be good to have tests that we use ".FMIN" and ".FMAX" for kind=4
and kind=8 on AArch64, since that's really the end goal here.


Doh, yes. I had spotted that myself after I had sent out the patch.
I've fixed that and the indentation issue in this small revision.

Given Janne's comments I will commit this tomorrow if there are no objections.
This patch should be a conservative improvement. If the Fortran folks decide
to sacrifice the more predictable NaN handling in favour of more optimisation
leeway by using MIN/MAX_EXPR unconditionally we can do that as a follow-up.

Thanks for the help,
Kyrill

2018-07-18  Kyrylo Tkachov  

* trans-intrinsic.c (gfc_conv_intrinsic_minmax): Emit MIN/MAX_EXPR
or IFN_FMIN/FMAX sequence to calculate the min/max when possible.

2018-07-18  Kyrylo Tkachov  

* gfortran.dg/max_fmax_aarch64.f90: New test.
* gfortran.dg/min_fmin_aarch64.f90: Likewise.
* gfortran.dg/minmax_integer.f90: Likewise.

diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index d306e3a5a6209c1621d91f99ffc366acecd9c3d0..c9b5479740c3f98f906132fda5c252274c4b6edd 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "trans.h"
 #include "stringpool.h"
 #include "fold-const.h"
+#include "internal-fn.h"
 #include "tree-nested.h"
 #include "stor-layout.h"
 #include "toplev.h"	/* For rest_of_decl_compilation.  */
@@ -3874,14 +3875,15 @@ gfc_conv_intrinsic_ttynam (gfc_se * se, gfc_expr * expr)
 minmax (a1, a2, a3, ...)
 {
   mvar = a1;
-  if (a2 .op. mvar || isnan (mvar))
-mvar = a2;
-  if (a3 .op. mvar || isnan (mvar))
-mvar = a3;
+  mvar = COMP (mvar, a2)
+  mvar = COMP (mvar, a3)
   ...
-  return mvar
+  return mvar;
 }
- */
+Where COMP is MIN/MAX_EXPR for integral types or when we don't
+care about NaNs, or IFN_FMIN/MAX when the target has support for
+fast NaN-honouring min/max.  When neither holds expand a sequence
+of explicit comparisons.  */
 
 /* TODO: Mismatching types can occur when specific names are used.
These should be handled during resolution.  */
@@ -3891,7 +3893,6 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
   tree tmp;
   tree mvar;
   tree val;
-  tree thencase;
   tree *args;
   tree type;
   gfc_actual_arglist *argexpr;
@@ -3912,55 +3913,77 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
 
   mvar = gfc_create_var (type, "M");
   gfc_add_modify (&se->pre, mvar, args[0]);
-  for (i = 1, argexpr = argexpr->next; i < nargs; i++)
-{
-  tree cond, isnan;
 
+  internal_fn ifn = op == GT_EXPR ? IFN_FMAX : IFN_FMIN;
+
+  for (i = 1, argexpr = argexpr->next; i < nargs; i++, argexpr = argexpr->next)
+{
+  tree cond = NULL_TREE;
   val = args[i];
 
   /* Handle absent optional arguments by ignoring the comparison.  */
   if (argexpr->expr->expr_type == EXPR_VARIABLE
 	  && argexpr->expr->symtree->n.sym->attr.optional
 	  && TREE_CODE (val) == INDIRECT_REF)
-	cond =

Re: [PATCH, ARM] PR85434: Prevent spilling of stack protector guard's address on ARM

2018-07-19 Thread Kyrill Tkachov

Hi Thomas,

On 17/07/18 12:02, Thomas Preudhomme wrote:

Fixed in attached patch. ChangeLog entries are unchanged:

*** gcc/ChangeLog ***

2018-07-05  Thomas Preud'homme 

PR target/85434
* target-insns.def (stack_protect_combined_set): Define new standard
pattern name.
(stack_protect_combined_test): Likewise.
* cfgexpand.c (stack_protect_prologue): Try new
stack_protect_combined_set pattern first.
* function.c (stack_protect_epilogue): Try new
stack_protect_combined_test pattern first.
* config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
parameters to control which register to use as PIC register and force
reloading PIC register respectively.
(legitimize_pic_address): Expose above new parameters in prototype and
adapt recursive calls accordingly.
(arm_legitimize_address): Adapt to new legitimize_pic_address
prototype.
(thumb_legitimize_address): Likewise.
(arm_emit_call_insn): Adapt to new require_pic_register prototype.
* config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
change.
* config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
prototype change.
(stack_protect_combined_set): New insn_and_split pattern.
(stack_protect_set): New insn pattern.
(stack_protect_combined_test): New insn_and_split pattern.
(stack_protect_test): New insn pattern.
* config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
(UNSPEC_SP_TEST): Likewise.
* doc/md.texi (stack_protect_combined_set): Document new standard
pattern name.
(stack_protect_set): Clarify that the operand for guard's address is
legal.
(stack_protect_combined_test): Document new standard pattern name.
(stack_protect_test): Clarify that the operand for guard's address is
legal.

*** gcc/testsuite/ChangeLog ***

2018-07-05  Thomas Preud'homme 

PR target/85434
* gcc.target/arm/pr85434.c: New test.
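
For readers new to this area, the set/test pair these patterns implement
looks roughly like this at the C level (an illustrative sketch; the real
sequence is emitted by the compiler itself, and the point of PR target/85434
is that the guard's address must not be spilled to and reloaded from
attacker-writable stack between the two uses):

extern unsigned long __stack_chk_guard;
extern void __stack_chk_fail (void);

void
f (void)
{
  unsigned long canary = __stack_chk_guard;  /* stack_protect_set */
  char buf[64];
  /* ... code that might overflow buf ... */
  (void) buf;
  if (canary != __stack_chk_guard)           /* stack_protect_test */
    __stack_chk_fail ();
}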



Sorry for the delay. Some comments inline.

Kyrill

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index d6e3c382085..d1a893ac56e 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -6105,8 +6105,18 @@ stack_protect_prologue (void)
 {
   tree guard_decl = targetm.stack_protect_guard ();
   rtx x, y;
+  struct expand_operand ops[2];
 
   x = expand_normal (crtl->stack_protect_guard);

+  create_fixed_operand (&ops[0], x);
+  create_fixed_operand (&ops[1], DECL_RTL (guard_decl));
+  /* Allow the target to compute address of Y and copy it to X without
+ leaking Y into a register.  This combined address + copy pattern allows
+ the target to prevent spilling of any intermediate results by splitting
+ it after register allocator.  */
+  if (maybe_expand_insn (targetm.code_for_stack_protect_combined_set, 2, ops))
+return;
+
   if (guard_decl)
 y = expand_normal (guard_decl);
   else
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 8537262ce64..100844e659c 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -67,7 +67,7 @@ extern int const_ok_for_dimode_op (HOST_WIDE_INT, enum 
rtx_code);
 extern int arm_split_constant (RTX_CODE, machine_mode, rtx,
   HOST_WIDE_INT, rtx, rtx, int);
 extern int legitimate_pic_operand_p (rtx);
-extern rtx legitimize_pic_address (rtx, machine_mode, rtx);
+extern rtx legitimize_pic_address (rtx, machine_mode, rtx, rtx, bool);
 extern rtx legitimize_tls_address (rtx, rtx);
 extern bool arm_legitimate_address_p (machine_mode, rtx, bool);
 extern int arm_legitimate_address_outer_p (machine_mode, rtx, RTX_CODE, int);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index ec3abbcba9f..f4a970580c2 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -7369,20 +7369,26 @@ legitimate_pic_operand_p (rtx x)
 }
 
 /* Record that the current function needs a PIC register.  Initialize

-   cfun->machine->pic_reg if we have not already done so.  */
+   cfun->machine->pic_reg if we have not already done so.
+
+   If not NULL, PIC_REG indicates which register to use as PIC register,
+   otherwise it is decided by register allocator.  COMPUTE_NOW forces the PIC
+   register to be loaded, regardless of whether it was loaded previously.  */
 
 static void

-require_pic_register (void)
+require_pic_register (rtx pic_reg, bool compute_now)
 {
   /* A lot of the logic here is made obscure by the fact that this
  routine gets called as part of the rtx cost estimation process.
  We don't want those calls to affect any assumptions about the real
  function; and further, we can't call entry_of_function() until we
  start the real expansion process.  */
-  if (!crtl->uses_pic_offset_table)
+  if (!crtl->uses_pic_offset_table || compute_now)
 {
-  gcc_assert (can_create_pseudo_p ());
+  gcc_assert (can_create_pseudo_p ()
+ || (pic_reg != NULL_RTX && GET_MODE (pic_reg) == Pmode));
   if (arm_pic_register != INVALID_R

Re: [PATCH] Fix a missing case of PR 21458 similar to fc6141f097056f830a412afebed8d81a9d72b696.

2018-07-24 Thread Kyrill Tkachov

Hi Robert,

On 24/07/18 09:48, Robert Schiele wrote:

The original fix for PR 21458 was causing some issues, which were addressed
by a follow-up fix, fc6141f097056f830a412afebed8d81a9d72b696.  Unfortunately
that follow-up fix missed one case, which is handled by this fix.
Change-Id: Ie32e3f2514b3e4b6b35c0a693de6b65ef010bb9d
---
 gas/config/tc-arm.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)



Patches to gas should be sent to the binutils list: binut...@sourceware.org
rather than gcc-patches.

Cheers,
Kyrill


diff --git a/gas/config/tc-arm.c b/gas/config/tc-arm.c
index feb725d..c92b6ef 100644
--- a/gas/config/tc-arm.c
+++ b/gas/config/tc-arm.c
@@ -10836,11 +10836,12 @@ do_t_adr (void)
   inst.instruction |= Rd << 4;
 }

-  if (inst.reloc.exp.X_op == O_symbol
+  if (support_interwork
+  && inst.reloc.exp.X_op == O_symbol
   && inst.reloc.exp.X_add_symbol != NULL
   && S_IS_DEFINED (inst.reloc.exp.X_add_symbol)
   && THUMB_IS_FUNC (inst.reloc.exp.X_add_symbol))
-inst.reloc.exp.X_add_number += 1;
+inst.reloc.exp.X_add_number |= 1;
 }

 /* Arithmetic instructions for which there is just one 16-bit
--
2.4.6




Re: [PATCH][GCC][AARCH64] Canonicalize aarch64 widening simd plus insns

2018-07-24 Thread Kyrill Tkachov



On 24/07/18 16:12, James Greenhalgh wrote:

On Thu, Jul 19, 2018 at 07:35:22AM -0500, Matthew Malcomson wrote:
> Hi again.
>
> Providing an updated patch to include the formatting suggestions.

Please try not to top-post replies; it makes the conversation thread
harder to follow (reply continues below!).

> On 12/07/18 11:39, Sudakshina Das wrote:
> > Hi Matthew
> >
> > On 12/07/18 11:18, Richard Sandiford wrote:
> >> Looks good to me FWIW (not a maintainer), just a minor formatting thing:
> >>
> >> Matthew Malcomson  writes:
> >>> diff --git a/gcc/config/aarch64/aarch64-simd.md
> >>> b/gcc/config/aarch64/aarch64-simd.md
> >>> index
> >>> 
aac5fa146ed8dde4507a0eb4ad6a07ce78d2f0cd..67b29cbe2cad91e031ee23be656ec61a403f2cf9
> >>> 100644
> >>> --- a/gcc/config/aarch64/aarch64-simd.md
> >>> +++ b/gcc/config/aarch64/aarch64-simd.md
> >>> @@ -3302,38 +3302,78 @@
> >>> DONE;
> >>>   })
> >>>   -(define_insn "aarch64_w"
> >>> +(define_insn "aarch64_subw"
> >>> [(set (match_operand: 0 "register_operand" "=w")
> >>> -(ADDSUB: (match_operand: 1 "register_operand"
> >>> "w")
> >>> -(ANY_EXTEND:
> >>> -  (match_operand:VD_BHSI 2 "register_operand" "w"]
> >>> +(minus:
> >>> + (match_operand: 1 "register_operand" "w")
> >>> + (ANY_EXTEND:
> >>> +   (match_operand:VD_BHSI 2 "register_operand" "w"]
> >>
> >> The (minus should be under the "(match_operand":
> >>
> >> (define_insn "aarch64_subw"
> >>[(set (match_operand: 0 "register_operand" "=w")
> >> (minus: (match_operand: 1 "register_operand" "w")
> >>(ANY_EXTEND:
> >>  (match_operand:VD_BHSI 2 "register_operand" "w"]
> >>
> >> Same for the other patterns.
> >>
> >> Thanks,
> >> Richard
> >>
> >
> > You will need a maintainer's approval but this looks good to me.
> > Thanks for doing this. I would only point out one other nit which you
> > can choose to ignore:
> >
> > +/* Ensure
> > +   saddw2 and one saddw for the function add()
> > +   ssubw2 and one ssubw for the function subtract()
> > +   uaddw2 and one uaddw for the function uadd()
> > +   usubw2 and one usubw for the function usubtract() */
> > +
> > +/* { dg-final { scan-assembler-times "\[ \t\]ssubw2\[ \t\]+" 1 } } */
> > +/* { dg-final { scan-assembler-times "\[ \t\]ssubw\[ \t\]+" 1 } } */
> > +/* { dg-final { scan-assembler-times "\[ \t\]saddw2\[ \t\]+" 1 } } */
> > +/* { dg-final { scan-assembler-times "\[ \t\]saddw\[ \t\]+" 1 } } */
> > +/* { dg-final { scan-assembler-times "\[ \t\]usubw2\[ \t\]+" 1 } } */
> > +/* { dg-final { scan-assembler-times "\[ \t\]usubw\[ \t\]+" 1 } } */
> > +/* { dg-final { scan-assembler-times "\[ \t\]uaddw2\[ \t\]+" 1 } } */
> > +/* { dg-final { scan-assembler-times "\[ \t\]uaddw\[ \t\]+" 1 } } */
> >
> > The scan-assembler directives for the different
> > functions can be placed right below each of them and that would
> > make it easier to read the expected results in the test and you
> > can get rid of the comments saying the same.

Thanks for the first-line review Sudi.

OK for trunk.



I've committed this on behalf of Matthew as: https://gcc.gnu.org/r262949

Thanks,
Kyrill


Thanks,
James





Re: [PATCH] [AArch64, Falkor] Switch to using Falkor-specific vector costs

2018-07-26 Thread Kyrill Tkachov

Hi Luis,

On 25/07/18 19:10, Luis Machado wrote:

The adjusted vector costs give Falkor a reasonable boost in performance for FP
benchmarks (both CPU2017 and CPU2006) and don't change INT benchmarks much:
about 0.7% for CPU2017 FP and 1.54% for CPU2006 FP.

OK for trunk?



The patch looks ok and safe to me (though you'll need approval from the 
maintainers).

I'd be interested to see what workloads in CPU2017 were affected by this.
Any chance you could post the breakdown in numbers from CPU2017?

Thanks,
Kyrill


gcc/ChangeLog:

2018-07-25  Luis Machado  

* config/aarch64/aarch64.c (qdf24xx_vector_cost): New.
(qdf24xx_tunings) : Set to qdf24xx_vector_cost.
---
 gcc/config/aarch64/aarch64.c | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index fa01475..d443aee 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -430,6 +430,26 @@ static const struct cpu_vector_cost generic_vector_cost =
   1 /* cond_not_taken_branch_cost  */
 };

+/* Qualcomm QDF24xx costs for vector insn classes.  */
+static const struct cpu_vector_cost qdf24xx_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  1, /* scalar_fp_stmt_cost  */
+  1, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  1, /* vec_int_stmt_cost  */
+  3, /* vec_fp_stmt_cost  */
+  2, /* vec_permute_cost  */
+  1, /* vec_to_scalar_cost  */
+  1, /* scalar_to_vec_cost  */
+  1, /* vec_align_load_cost  */
+  1, /* vec_unalign_load_cost  */
+  1, /* vec_unalign_store_cost  */
+  1, /* vec_store_cost  */
+  3, /* cond_taken_branch_cost  */
+  1  /* cond_not_taken_branch_cost  */
+};
+
 /* ThunderX costs for vector insn classes.  */
 static const struct cpu_vector_cost thunderx_vector_cost =
 {
@@ -890,7 +910,7 @@ static const struct tune_params qdf24xx_tunings =
   &qdf24xx_extra_costs,
   &qdf24xx_addrcost_table,
   &qdf24xx_regmove_cost,
-  &generic_vector_cost,
+  &qdf24xx_vector_cost,
   &generic_branch_cost,
   &generic_approx_modes,
   4, /* memmov_cost  */
--
2.7.4





Re: [PATCH][GCC][AARCH64] Canonicalize aarch64 widening simd plus insns

2018-07-30 Thread Kyrill Tkachov



On 30/07/18 14:30, Christophe Lyon wrote:

Hi,

On Tue, 24 Jul 2018 at 17:39, Kyrill Tkachov
 wrote:


On 24/07/18 16:12, James Greenhalgh wrote:

On Thu, Jul 19, 2018 at 07:35:22AM -0500, Matthew Malcomson wrote:

Hi again.

Providing an updated patch to include the formatting suggestions.

Please try not to top-post replies; it makes the conversation thread
harder to follow (reply continues below!).


On 12/07/18 11:39, Sudakshina Das wrote:

Hi Matthew

On 12/07/18 11:18, Richard Sandiford wrote:

Looks good to me FWIW (not a maintainer), just a minor formatting thing:

Matthew Malcomson  writes:

diff --git a/gcc/config/aarch64/aarch64-simd.md
b/gcc/config/aarch64/aarch64-simd.md
index
aac5fa146ed8dde4507a0eb4ad6a07ce78d2f0cd..67b29cbe2cad91e031ee23be656ec61a403f2cf9
100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3302,38 +3302,78 @@
 DONE;
   })
   -(define_insn "aarch64_w"
+(define_insn "aarch64_subw"
 [(set (match_operand: 0 "register_operand" "=w")
-(ADDSUB: (match_operand: 1 "register_operand"
"w")
-(ANY_EXTEND:
-  (match_operand:VD_BHSI 2 "register_operand" "w"]
+(minus:
+ (match_operand: 1 "register_operand" "w")
+ (ANY_EXTEND:
+   (match_operand:VD_BHSI 2 "register_operand" "w"]

The (minus should be under the "(match_operand":

(define_insn "aarch64_subw"
[(set (match_operand: 0 "register_operand" "=w")
 (minus: (match_operand: 1 "register_operand" "w")
(ANY_EXTEND:
  (match_operand:VD_BHSI 2 "register_operand" "w"]

Same for the other patterns.

Thanks,
Richard


You will need a maintainer's approval but this looks good to me.
Thanks for doing this. I would only point out one other nit which you
can choose to ignore:

+/* Ensure
+   saddw2 and one saddw for the function add()
+   ssubw2 and one ssubw for the function subtract()
+   uaddw2 and one uaddw for the function uadd()
+   usubw2 and one usubw for the function usubtract() */
+
+/* { dg-final { scan-assembler-times "\[ \t\]ssubw2\[ \t\]+" 1 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]ssubw\[ \t\]+" 1 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]saddw2\[ \t\]+" 1 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]saddw\[ \t\]+" 1 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]usubw2\[ \t\]+" 1 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]usubw\[ \t\]+" 1 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]uaddw2\[ \t\]+" 1 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]uaddw\[ \t\]+" 1 } } */

The scan-assembler directives for the different functions can be placed
right below each of them; that would make it easier to read the expected
results in the test, and you could get rid of the comments saying the same.
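
For example, the suggested layout would look like this (a sketch, with the
function body elided):

  /* ... body of subtract () ... */
  /* { dg-final { scan-assembler-times "\[ \t\]ssubw2\[ \t\]+" 1 } } */
  /* { dg-final { scan-assembler-times "\[ \t\]ssubw\[ \t\]+" 1 } } */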

Thanks for the first-line review Sudi.

OK for trunk.


I've committed this on behalf of Matthew as: https://gcc.gnu.org/r262949


The new test fails with -mabi=ilp32:
 gcc.target/aarch64/simd/vect_su_add_sub.c scan-assembler-times [
\t]saddw2[ \t]+ 1
 gcc.target/aarch64/simd/vect_su_add_sub.c scan-assembler-times [
\t]saddw[ \t]+ 1
 gcc.target/aarch64/simd/vect_su_add_sub.c scan-assembler-times [
\t]ssubw2[ \t]+ 1
 gcc.target/aarch64/simd/vect_su_add_sub.c scan-assembler-times [
\t]ssubw[ \t]+ 1
 gcc.target/aarch64/simd/vect_su_add_sub.c scan-assembler-times [
\t]uaddw2[ \t]+ 1
 gcc.target/aarch64/simd/vect_su_add_sub.c scan-assembler-times [
\t]uaddw[ \t]+ 1
 gcc.target/aarch64/simd/vect_su_add_sub.c scan-assembler-times [
\t]usubw2[ \t]+ 1
 gcc.target/aarch64/simd/vect_su_add_sub.c scan-assembler-times [
\t]usubw[ \t]+ 1

I'm not sure how much we care about ilp32?


Looks like the usual pitfall of unsigned longs being 32-bit in ILP32.
The test should be made to use "long long"s, which are guaranteed to be
64 bits, instead of plain longs.
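
For example (a sketch of the widths involved on aarch64):

  long      l;   /* 32 bits under -mabi=ilp32, 64 bits under LP64 */
  long long ll;  /* 64 bits on both ABIs                          */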

Thanks,
Kyrill


Christophe


Thanks,
Kyrill


Thanks,
James





Re: [PATCH] arm: Generate correct const_ints (PR86640)

2018-07-30 Thread Kyrill Tkachov

Hi Segher,

On 30/07/18 14:14, Segher Boessenkool wrote:

In arm_block_set_aligned_vect 8-bit constants are generated as zero-
extended const_ints, not sign-extended as required.  Fix that.
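
For reference, a minimal sketch of the const_int invariant involved
(illustrative only, not part of the patch): a const_int must be
sign-extended from the precision of its mode, so in QImode the byte value
0xff has to be represented as -1.

  rtx wrong = GEN_INT (0xff);              /* (const_int 255): not canonical for QImode */
  rtx right = gen_int_mode (0xff, QImode); /* (const_int -1): sign-extended, canonical  */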

Tamar tested the patch (see PR); no problems were found.  Is this okay
for trunk?



The patch is okay but please add the testcase from the PR to gcc.dg/
or somewhere else generic (it's not arm-specific).
Thanks Tamar for testing.

Kyrill



Segher


2018-07-30  Segher Boessenkool 

PR target/86640
* config/arm/arm.c (arm_block_set_aligned_vect): Use gen_int_mode
instead of GEN_INT.

---
 gcc/config/arm/arm.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index cf12ace..f5eece4 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -30046,7 +30046,6 @@ arm_block_set_aligned_vect (rtx dstbase,
   rtx dst, addr, mem;
   rtx val_vec, reg;
   machine_mode mode;
-  unsigned HOST_WIDE_INT v = value;
   unsigned int offset = 0;

   gcc_assert ((align & 0x3) == 0);
@@ -30065,10 +30064,8 @@ arm_block_set_aligned_vect (rtx dstbase,

   dst = copy_addr_to_reg (XEXP (dstbase, 0));

-  v = sext_hwi (v, BITS_PER_WORD);
-
   reg = gen_reg_rtx (mode);
-  val_vec = gen_const_vec_duplicate (mode, GEN_INT (v));
+  val_vec = gen_const_vec_duplicate (mode, gen_int_mode (value, QImode));
   /* Emit instruction loading the constant value.  */
   emit_move_insn (reg, val_vec);

--
1.8.3.1





Re: [PATCH] arm: Generate correct const_ints (PR86640)

2018-07-31 Thread Kyrill Tkachov

Hi Segher,

On 30/07/18 18:37, Segher Boessenkool wrote:

On Mon, Jul 30, 2018 at 03:55:30PM +0100, Kyrill Tkachov wrote:

Hi Segher,

On 30/07/18 14:14, Segher Boessenkool wrote:

In arm_block_set_aligned_vect 8-bit constants are generated as zero-
extended const_ints, not sign-extended as required.  Fix that.

Tamar tested the patch (see PR); no problems were found.  Is this okay
for trunk?


The patch is okay but please add the testcase from the PR to gcc.dg/
or somewhere else generic (it's not arm-specific).

It only failed with very specific options, which aren't valid on every
ARM config either I think?

-O3 -mfpu=neon -mfloat-abi=hard -march=armv7-a


This is a fairly common Arm configuration that is tested by many people by
default, so you wouldn't need to add any arm-specific directives.  Just
putting it in one of the generic C tests folders would be enough.

Thanks for fixing this,
Kyrill


I don't know the magic incantations for ARM tests, sorry.


Segher




Re: [PATCH][GCC][Arm] Cleanup up reg to reg move in neon_mov.

2018-07-31 Thread Kyrill Tkachov

Hi Tamar,

On 23/07/18 17:52, Tamar Christina wrote:

Hi All,

About 13 years ago the reg-to-reg patterns were split up; before that time
output_move_double could actually handle this case.

After the split was done most patterns were updated except for *neon_mov
which incorrectly retained reg,reg as a valid alternative.

However, output_move_double cannot handle this and simply returns ""
and asserts.

This pattern is essentially dead and I'm removing it for clarity.

Regtested on armeb-none-eabi and no regressions.
Bootstrapped on arm-none-linux-gnueabihf and no issues.

Ok for trunk?



Ok.
Thanks,
Kyrill


Thanks,
Tamar

gcc/
2018-07-23  Tamar Christina  

* config/arm/neon.md (*neon_mov): Remove reg-to-reg alternative.

--




Re: [PATCH] arm: Generate correct const_ints (PR86640)

2018-07-31 Thread Kyrill Tkachov



On 31/07/18 12:38, Segher Boessenkool wrote:

On Tue, Jul 31, 2018 at 09:02:56AM +0100, Kyrill Tkachov wrote:

Hi Segher,

On 30/07/18 18:37, Segher Boessenkool wrote:

On Mon, Jul 30, 2018 at 03:55:30PM +0100, Kyrill Tkachov wrote:

Hi Segher,

On 30/07/18 14:14, Segher Boessenkool wrote:

In arm_block_set_aligned_vect 8-bit constants are generated as zero-
extended const_ints, not sign-extended as required.  Fix that.

Tamar tested the patch (see PR); no problems were found.  Is this okay
for trunk?


The patch is okay but please add the testcase from the PR to gcc.dg/
or somewhere else generic (it's not arm-specific).

It only failed with very specific options, which aren't valid on every
ARM config either I think?

-O3 -mfpu=neon -mfloat-abi=hard -march=armv7-a

This is a fairly common Arm configuration that is tested by many people by
default, so you wouldn't need to add any arm-specific directives.  Just
putting it in one of the generic C tests folders would be enough.

Ah ok, so just dg-options -O3 will do?


yes, that should be enough.

Thanks,
Kyrill


(As before, I have no reasonable way to test this).


Segher




Re: [PATCH] arm: Testcase for PR86640

2018-07-31 Thread Kyrill Tkachov

Hi Segher,

On 31/07/18 13:14, Segher Boessenkool wrote:

Hi Kyrill,

As before, untested.  Is this okay for trunk, or will you handle it
yourself (or will someone else do it?)


This is ok.

Thanks,
Kyrill



Segher


2018-07-31  Segher Boessenkool  

gcc/testsuite/
PR target/86640
* gcc.target/arm/pr86640.c: New testcase.

---
  gcc/testsuite/gcc.target/arm/pr86640.c | 10 ++
  1 file changed, 10 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/arm/pr86640.c

diff --git a/gcc/testsuite/gcc.target/arm/pr86640.c 
b/gcc/testsuite/gcc.target/arm/pr86640.c
new file mode 100644
index 000..e104602
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr86640.c
@@ -0,0 +1,10 @@
+/* { dg-options "-O3" } */
+
+/* This ICEd with  -O3 -mfpu=neon -mfloat-abi=hard -march=armv7-a  .  */
+
+char fn1() {
+  long long b[5];
+  for (int a = 0; a < 5; a++)
+b[a] = ~0ULL;
+  return b[3];
+}




Re: [PATCH, ARM] PR85434: Prevent spilling of stack protector guard's address on ARM

2018-07-31 Thread Kyrill Tkachov

Hi Thomas,

On 25/07/18 14:28, Thomas Preudhomme wrote:

Hi Kyrill,

Using memory_operand worked; the issues I encountered when using it in
earlier versions of the patch must have been due to the missing test
on address_operand in the preparation statements, which I added later.
Please find an updated patch in attachment. ChangeLog entry is as
follows:

*** gcc/ChangeLog ***

2018-07-05  Thomas Preud'homme  

 * target-insns.def (stack_protect_combined_set): Define new standard
 pattern name.
 (stack_protect_combined_test): Likewise.
 * cfgexpand.c (stack_protect_prologue): Try new
 stack_protect_combined_set pattern first.
 * function.c (stack_protect_epilogue): Try new
 stack_protect_combined_test pattern first.
 * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
 parameters to control which register to use as PIC register and force
 reloading PIC register respectively.  Insert in the stream of insns if
 possible.
 (legitimize_pic_address): Expose above new parameters in prototype and
 adapt recursive calls accordingly.
 (arm_legitimize_address): Adapt to new legitimize_pic_address
 prototype.
 (thumb_legitimize_address): Likewise.
 (arm_emit_call_insn): Adapt to new require_pic_register prototype.
 * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
 change.
 * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
 prototype change.
 (stack_protect_combined_set): New insn_and_split pattern.
 (stack_protect_set): New insn pattern.
 (stack_protect_combined_test): New insn_and_split pattern.
 (stack_protect_test): New insn pattern.
 * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
 (UNSPEC_SP_TEST): Likewise.
 * doc/md.texi (stack_protect_combined_set): Document new standard
 pattern name.
 (stack_protect_set): Clarify that the operand for guard's address is
 legal.
 (stack_protect_combined_test): Document new standard pattern name.
 (stack_protect_test): Clarify that the operand for guard's address is
 legal.

*** gcc/testsuite/ChangeLog ***

2018-07-05  Thomas Preud'homme  

 * gcc.target/arm/pr85434.c: New test.

Bootstrapped again for Arm and Thumb-2 and regtested with and without
-fstack-protector-all without any regression.


This looks ok to me now.
Thank you for your patience and addressing my comments from before.

Kyrill


Best regards,

Thomas
On Thu, 19 Jul 2018 at 17:34, Thomas Preudhomme
 wrote:

[Dropping Jeff Law from the list since he already commented on the
middle end parts]

Hi Kyrill,

On Thu, 19 Jul 2018 at 12:02, Kyrill Tkachov
 wrote:

Hi Thomas,

On 17/07/18 12:02, Thomas Preudhomme wrote:

Fixed in attached patch. ChangeLog entries are unchanged:

*** gcc/ChangeLog ***

2018-07-05  Thomas Preud'homme 

 PR target/85434
 * target-insns.def (stack_protect_combined_set): Define new standard
 pattern name.
 (stack_protect_combined_test): Likewise.
 * cfgexpand.c (stack_protect_prologue): Try new
 stack_protect_combined_set pattern first.
 * function.c (stack_protect_epilogue): Try new
 stack_protect_combined_test pattern first.
 * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
 parameters to control which register to use as PIC register and force
 reloading PIC register respectively.
 (legitimize_pic_address): Expose above new parameters in prototype and
 adapt recursive calls accordingly.
 (arm_legitimize_address): Adapt to new legitimize_pic_address
 prototype.
 (thumb_legitimize_address): Likewise.
 (arm_emit_call_insn): Adapt to new require_pic_register prototype.
 * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
 change.
 * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
 prototype change.
 (stack_protect_combined_set): New insn_and_split pattern.
 (stack_protect_set): New insn pattern.
 (stack_protect_combined_test): New insn_and_split pattern.
 (stack_protect_test): New insn pattern.
 * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
 (UNSPEC_SP_TEST): Likewise.
 * doc/md.texi (stack_protect_combined_set): Document new standard
 pattern name.
 (stack_protect_set): Clarify that the operand for guard's address is
 legal.
 (stack_protect_combined_test): Document new standard pattern name.
 (stack_protect_test): Clarify that the operand for guard's address is
 legal.

*** gcc/testsuite/ChangeLog ***

2018-07-05  Thomas Preud'homme 

 PR target/85434
 * gcc.target/arm/pr85434.c: New test.


Sorry for the delay. Some comments inline.

Kyrill

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index d6e3c382085..d1a893ac56e 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -6105,8 +6105,18 @@ stack_protect_prologue (void)
   {
 tree 

Re: [PATCH][AArch64] Implement Armv8.5-A memory tagging (MTE) intrinsics

2019-11-04 Thread Kyrill Tkachov

Hi Dennis,

On 10/17/19 11:03 AM, Dennis Zhang wrote:

Hi,

Arm Memory Tagging Extension (MTE) is published with Armv8.5-A.
It can be used for spatial and temporal memory safety detection and
a lightweight lock and key system.

This patch enables new intrinsics leveraging MTE instructions to
implement functionalities of creating tags, setting tags, reading tags,
and manipulating tags.
The intrinsics are part of Arm ACLE extension:
https://developer.arm.com/docs/101028/latest/memory-tagging-intrinsics
The MTE ISA specification can be found at
https://developer.arm.com/docs/ddi0487/latest chapter D6.
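
For context, a rough usage sketch of the new intrinsics (signatures assumed
from the ACLE document above rather than taken from this patch; alloc_tagged
is a hypothetical helper and assumes a 16-byte-aligned allocation):

  #include <stdlib.h>
  #include <arm_acle.h>

  int *
  alloc_tagged (void)
  {
    int *p = malloc (16);
    /* Derive a pointer carrying a random logical tag.  */
    int *tp = __arm_mte_create_random_tag (p, 0);
    /* Colour the 16-byte granule so that accesses through TP match.  */
    __arm_mte_set_tag (tp);
    return tp;
  }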

Bootstrapped and regtested for aarch64-none-linux-gnu.

Please help to check if it's OK for trunk.



This looks mostly ok to me, but for further review it needs to be rebased
on top of current trunk as there are some conflicts with the SVE ACLE
changes that recently went in.  Most conflicts look trivial to resolve, but
one that needs more attention is the definition of the
TARGET_RESOLVE_OVERLOADED_BUILTIN hook.


Thanks,

Kyrill


Many Thanks
Dennis

gcc/ChangeLog:

2019-10-16  Dennis Zhang  

    * config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add
    AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
    AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
    AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG,
    AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END.
    (aarch64_init_memtag_builtins): New.
    (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
    (aarch64_general_init_builtins): Call aarch64_init_memtag_builtins.
    (aarch64_expand_builtin_memtag): New.
    (aarch64_general_expand_builtin): Call aarch64_expand_builtin_memtag.
    (AARCH64_BUILTIN_SUBCODE): New macro.
    (aarch64_resolve_overloaded_memtag): New.
    (aarch64_resolve_overloaded_builtin): New hook.  Call
    aarch64_resolve_overloaded_memtag to handle overloaded MTE builtins.
    * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
    __ARM_FEATURE_MEMORY_TAGGING when enabled.
    * config/aarch64/aarch64-protos.h (aarch64_resolve_overloaded_builtin):
    Add declaration.
    * config/aarch64/aarch64.c (TARGET_RESOLVE_OVERLOADED_BUILTIN):
    New hook.
    * config/aarch64/aarch64.h (AARCH64_ISA_MEMTAG): New macro.
    (TARGET_MEMTAG): Likewise.
    * config/aarch64/aarch64.md (define_c_enum "unspec"): Add
    UNSPEC_GEN_TAG, UNSPEC_GEN_TAG_RND, and UNSPEC_TAG_SPACE.
    (irg, gmi, subp, addg, ldg, stg): New instructions.
    * config/aarch64/arm_acle.h (__arm_mte_create_random_tag): New macro.
    (__arm_mte_exclude_tag, __arm_mte_increment_tag): Likewise.
    (__arm_mte_ptrdiff, __arm_mte_set_tag, __arm_mte_get_tag): Likewise.
    * config/aarch64/predicates.md (aarch64_memtag_tag_offset): New.
    (aarch64_granule16_uimm6, aarch64_granule16_simm9): New.
    * config/arm/types.md (memtag): New.
    * doc/invoke.texi (-memtag): Update description.

gcc/testsuite/ChangeLog:

2019-10-16  Dennis Zhang  

    * gcc.target/aarch64/acle/memtag_1.c: New test.
    * gcc.target/aarch64/acle/memtag_2.c: New test.
    * gcc.target/aarch64/acle/memtag_3.c: New test.


Re: [PATCH, GCC/ARM, 1/10] Fix -mcmse check in libgcc

2019-11-04 Thread Kyrill Tkachov

Hi Mihail,

On 10/23/19 10:26 AM, Mihail Ionescu wrote:

[PATCH, GCC/ARM, 1/10] Fix -mcmse check in libgcc

Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to fix the
check to determine whether -mcmse is supported by the host compiler.

=== Patch description ===

Code to detect whether cmse.c can be built with -mcmse checks the output
of the host GCC when invoked with -mcmse.  However, an error from the
compiler does not prevent some minimal output, so this check always holds true.

This does not affect currently supported architectures since the test is
guarded by __ARM_FEATURE_CMSE which is only defined for Armv8-M Baseline
and Mainline and these two architectures accept -mcmse.

However, in the intermediate patches adding support for Armv8.1-M
Mainline, support for Security Extensions is disabled until fully
implemented. This leads to libgcc/config/arm/cmse.c being built with
-mcmse due to the broken test which fails in the intermediate commits.

This patch instead changes the test to look at the return value of the
host gcc when invoked with -mcmse.


ChangeLog entry is as follows:

*** libgcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * config/arm/t-arm: Check return value of gcc rather than lack of
    output.

Testing: Bootstrapped and tested on arm-none-eabi.
Without this patch, GCC stops building after the second patch
of this series.

Is this ok for trunk?


This looks ok to me and safe enough to go in on its own.

Thanks,

Kyrill



Best regards,

Mihail


### Attachment also inlined for ease of reply ###



diff --git a/libgcc/config/arm/t-arm b/libgcc/config/arm/t-arm
index 
274bf2a8ef33c5e8a8ee2b246aba92d30297abe1..f2b927f3686a8c0a8e37abfe2d7768f2050d4fb3 
100644

--- a/libgcc/config/arm/t-arm
+++ b/libgcc/config/arm/t-arm
@@ -3,7 +3,7 @@ LIB1ASMFUNCS = _thumb1_case_sqi _thumb1_case_uqi 
_thumb1_case_shi \

 _thumb1_case_uhi _thumb1_case_si _speculation_barrier

 HAVE_CMSE:=$(findstring __ARM_FEATURE_CMSE,$(shell $(gcc_compile_bare) -dM -E - </dev/null))
-ifneq ($(shell $(gcc_compile_bare) -E -mcmse - </dev/null 2>/dev/null),)
+ifeq ($(shell $(gcc_compile_bare) -E -mcmse - </dev/null >/dev/null 2>/dev/null; echo $?),0)
 CMSE_OPTS:=-mcmse
 endif




Re: [PATCH, GCC/ARM, 2/10] Add command line support for Armv8.1-M Mainline

2019-11-04 Thread Kyrill Tkachov

Hi Mihail,

On 10/23/19 10:26 AM, Mihail Ionescu wrote:

[PATCH, GCC/ARM, 2/10] Add command line support

Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to add
command-line support for that new architecture.

=== Patch description ===

Besides the expected enabling of the new value for the -march
command-line option (-march=armv8.1-m.main) and its extensions (see
below), this patch disables support of the Security Extensions for this
newly added architecture. This is done both by not including the cmse
bit in the architecture description and by throwing an error message
when the user requests Armv8.1-M Mainline Security Extensions. Note that
Armv8-M Baseline and Mainline Security Extensions are still enabled.

Only extensions for already supported instructions are implemented in
this patch. Other extensions (MVE integer and float) will be added in
separate patches. The following configurations are allowed for Armv8.1-M
Mainline with regards to FPU and implemented in this patch:
+ no FPU (+nofp)
+ single precision VFPv5 with FP16 (+fp)
+ double precision VFPv5 with FP16 (+fp.dp)
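
As an illustration, these map onto command lines like the following
(hypothetical invocations):

  arm-none-eabi-gcc -march=armv8.1-m.main+nofp  ...  # no FPU
  arm-none-eabi-gcc -march=armv8.1-m.main+fp    ...  # single-precision VFPv5 with FP16
  arm-none-eabi-gcc -march=armv8.1-m.main+fp.dp ...  # double-precision VFPv5 with FP16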

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * config/arm/arm-cpus.in (armv8_1m_main): New feature.
    (ARMv4, ARMv4t, ARMv5t, ARMv5te, ARMv5tej, ARMv6, ARMv6j, ARMv6k,
    ARMv6z, ARMv6kz, ARMv6zk, ARMv6t2, ARMv6m, ARMv7, ARMv7a, ARMv7ve,
    ARMv7r, ARMv7m, ARMv7em, ARMv8a, ARMv8_1a, ARMv8_2a, ARMv8_3a,
    ARMv8_4a, ARMv8_5a, ARMv8m_base, ARMv8m_main, ARMv8r): Reindent.
    (ARMv8_1m_main): New feature group.
    (armv8.1-m.main): New architecture.
    * config/arm/arm-tables.opt: Regenerate.
    * config/arm/arm.c (arm_arch8_1m_main): Define and default initialize.
    (arm_option_reconfigure_globals): Initialize arm_arch8_1m_main.
    (arm_options_perform_arch_sanity_checks): Error out when targeting
    Armv8.1-M Mainline Security Extensions.
    * config/arm/arm.h (arm_arch8_1m_main): Declare.

*** gcc/testsuite/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * lib/target-supports.exp
    (check_effective_target_arm_arch_v8_1m_main_ok): Define.
    (add_options_for_arm_arch_v8_1m_main): Likewise.
    (check_effective_target_arm_arch_v8_1m_main_multilib): Likewise.

Testing: bootstrapped on arm-linux-gnueabihf and arm-none-eabi; testsuite
shows no regression.

Is this ok for trunk?


Ok.

Thanks,

Kyrill



Best regards,

Mihail


### Attachment also inlined for ease of reply ###



diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 
f8a3b3db67a537163bfe787d78c8f2edc4253ab3..652f2a4be9388fd7a74f0ec4615a292fd1cfcd36 
100644

--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -126,6 +126,9 @@ define feature armv8_5
 # M-Profile security extensions.
 define feature cmse

+# Architecture rel 8.1-M.
+define feature armv8_1m_main
+
 # Floating point and Neon extensions.
 # VFPv1 is not supported in GCC.

@@ -223,21 +226,21 @@ define fgroup ALL_FPU_INTERNAL vfpv2 vfpv3 vfpv4 
fpv5 fp16conv fp_dbl ALL_SIMD_I

 # -mfpu support.
 define fgroup ALL_FP    fp16 ALL_FPU_INTERNAL

-define fgroup ARMv4   armv4 notm
-define fgroup ARMv4t  ARMv4 thumb
-define fgroup ARMv5t  ARMv4t armv5t
-define fgroup ARMv5te ARMv5t armv5te
-define fgroup ARMv5tej    ARMv5te
-define fgroup ARMv6   ARMv5te armv6 be8
-define fgroup ARMv6j  ARMv6
-define fgroup ARMv6k  ARMv6 armv6k
-define fgroup ARMv6z  ARMv6
-define fgroup ARMv6kz ARMv6k quirk_armv6kz
-define fgroup ARMv6zk ARMv6k
-define fgroup ARMv6t2 ARMv6 thumb2
+define fgroup ARMv4 armv4 notm
+define fgroup ARMv4t    ARMv4 thumb
+define fgroup ARMv5t    ARMv4t armv5t
+define fgroup ARMv5te   ARMv5t armv5te
+define fgroup ARMv5tej  ARMv5te
+define fgroup ARMv6 ARMv5te armv6 be8
+define fgroup ARMv6j    ARMv6
+define fgroup ARMv6k    ARMv6 armv6k
+define fgroup ARMv6z    ARMv6
+define fgroup ARMv6kz   ARMv6k quirk_armv6kz
+define fgroup ARMv6zk   ARMv6k
+define fgroup ARMv6t2   ARMv6 thumb2
 # This is suspect.  ARMv6-m doesn't really pull in any useful features
 # from ARMv5* or ARMv6.
-define fgroup ARMv6m  armv4 thumb armv5t armv5te armv6 be8
+define fgroup ARMv6m    armv4 thumb armv5t armv5te armv6 be8
 # This is suspect, the 'common' ARMv7 subset excludes the thumb2 'DSP' and
 # integer SIMD instructions that are in ARMv6T2.  */
 define fgroup ARMv7   ARMv6m thumb2 armv7
@@ -256,6 +259,10 @@ define fgroup ARMv8_5a    ARMv8_4a armv8_5 sb predres
 define fgroup ARMv8m_base ARMv6m armv8 cmse tdiv
 define fgroup ARMv8m_main ARMv7m armv8 cmse
 define fgroup ARMv8r  ARMv8a
+# Feature cmse is omitted to disable Security Extensions support 
while secu

Re: [PATCH 1/2] [ARM,testsuite] Skip tests incompatible with -mpure-code

2019-11-04 Thread Kyrill Tkachov

Hi Christophe,

On 10/18/19 2:18 PM, Christophe Lyon wrote:

Hi,

All these tests fail when using -mpure-code:
* some force A or R profile
* some use Neon
* some use -fpic/-fPIC
all of which are not supported by this option.

OK?



Hmm... I'm tempted to ask if it would be simpler to add a check for 
-mpure-code in the effective target checks for the above features but a 
lot of those tests don't use effective target checks consistently anyway...


So this is ok, though I'm not a big fan of adding so many dg-skip-if 
directives.


Thanks,

Kyrill




Thanks,

Christophe


Re: [PATCH, GCC/ARM, 2/10] Add command line support for Armv8.1-M Mainline

2019-11-06 Thread Kyrill Tkachov

Hi Mihail,

On 11/4/19 4:49 PM, Kyrill Tkachov wrote:

Hi Mihail,

On 10/23/19 10:26 AM, Mihail Ionescu wrote:
> [PATCH, GCC/ARM, 2/10] Add command line support
>
> Hi,
>
> === Context ===
>
> This patch is part of a patch series to add support for Armv8.1-M
> Mainline Security Extensions architecture. Its purpose is to add
> command-line support for that new architecture.
>
> === Patch description ===
>
> Besides the expected enabling of the new value for the -march
> command-line option (-march=armv8.1-m.main) and its extensions (see
> below), this patch disables support of the Security Extensions for this
> newly added architecture. This is done both by not including the cmse
> bit in the architecture description and by throwing an error message
> when user request Armv8.1-M Mainline Security Extensions. Note that
> Armv8-M Baseline and Mainline Security Extensions are still enabled.
>
> Only extensions for already supported instructions are implemented in
> this patch. Other extensions (MVE integer and float) will be added in
> separate patches. The following configurations are allowed for Armv8.1-M
> Mainline with regards to FPU and implemented in this patch:
> + no FPU (+nofp)
> + single precision VFPv5 with FP16 (+fp)
> + double precision VFPv5 with FP16 (+fp.dp)
>
> ChangeLog entry are as follow:
>
> *** gcc/ChangeLog ***
>
> 2019-10-23  Mihail-Calin Ionescu 
> 2019-10-23  Thomas Preud'homme 
>
>     * config/arm/arm-cpus.in (armv8_1m_main): New feature.
>     (ARMv4, ARMv4t, ARMv5t, ARMv5te, ARMv5tej, ARMv6, ARMv6j, 
ARMv6k,
>     ARMv6z, ARMv6kz, ARMv6zk, ARMv6t2, ARMv6m, ARMv7, ARMv7a, 
ARMv7ve,

>     ARMv7r, ARMv7m, ARMv7em, ARMv8a, ARMv8_1a, ARMv8_2a, ARMv8_3a,
>     ARMv8_4a, ARMv8_5a, ARMv8m_base, ARMv8m_main, ARMv8r): Reindent.
>     (ARMv8_1m_main): New feature group.
>     (armv8.1-m.main): New architecture.
>     * config/arm/arm-tables.opt: Regenerate.
>     * config/arm/arm.c (arm_arch8_1m_main): Define and default initialize.
>     (arm_option_reconfigure_globals): Initialize arm_arch8_1m_main.
>     (arm_options_perform_arch_sanity_checks): Error out when targeting
>     Armv8.1-M Mainline Security Extensions.
>     * config/arm/arm.h (arm_arch8_1m_main): Declare.
>
> *** gcc/testsuite/ChangeLog ***
>
> 2019-10-23  Mihail-Calin Ionescu 
> 2019-10-23  Thomas Preud'homme 
>
>     * lib/target-supports.exp
> (check_effective_target_arm_arch_v8_1m_main_ok): Define.
>     (add_options_for_arm_arch_v8_1m_main): Likewise.
> (check_effective_target_arm_arch_v8_1m_main_multilib): Likewise.
>
> Testing: bootstrapped on arm-linux-gnueabihf and arm-none-eabi; testsuite
> shows no regression.
>
> Is this ok for trunk?
>
Ok.



Something that I remembered last night upon reflection...

New command-line options (or arguments to them) need documentation in 
invoke.texi.


Please add some either as part of this patch or as a separate patch if 
you prefer.


Thanks,

Kyrill



Thanks,

Kyrill


> Best regards,
>
> Mihail
>
>
> ### Attachment also inlined for ease of reply ###
>
>
> diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
> index
> 
f8a3b3db67a537163bfe787d78c8f2edc4253ab3..652f2a4be9388fd7a74f0ec4615a292fd1cfcd36 


> 100644
> --- a/gcc/config/arm/arm-cpus.in
> +++ b/gcc/config/arm/arm-cpus.in
> @@ -126,6 +126,9 @@ define feature armv8_5
>  # M-Profile security extensions.
>  define feature cmse
>
> +# Architecture rel 8.1-M.
> +define feature armv8_1m_main
> +
>  # Floating point and Neon extensions.
>  # VFPv1 is not supported in GCC.
>
> @@ -223,21 +226,21 @@ define fgroup ALL_FPU_INTERNAL vfpv2 vfpv3 vfpv4
> fpv5 fp16conv fp_dbl ALL_SIMD_I
>  # -mfpu support.
>  define fgroup ALL_FP    fp16 ALL_FPU_INTERNAL
>
> -define fgroup ARMv4   armv4 notm
> -define fgroup ARMv4t  ARMv4 thumb
> -define fgroup ARMv5t  ARMv4t armv5t
> -define fgroup ARMv5te ARMv5t armv5te
> -define fgroup ARMv5tej    ARMv5te
> -define fgroup ARMv6   ARMv5te armv6 be8
> -define fgroup ARMv6j  ARMv6
> -define fgroup ARMv6k  ARMv6 armv6k
> -define fgroup ARMv6z  ARMv6
> -define fgroup ARMv6kz ARMv6k quirk_armv6kz
> -define fgroup ARMv6zk ARMv6k
> -define fgroup ARMv6t2 ARMv6 thumb2
> +define fgroup ARMv4 armv4 notm
> +define fgroup ARMv4t    ARMv4 thumb
> +define fgroup ARMv5t    ARMv4t armv5t
> +define fgroup ARMv5te   ARMv5t armv5te
> +define fgroup ARMv5tej  ARMv5te
> +define fgroup ARMv6 ARMv5te armv6 be8
> +define fgroup ARMv6j    ARMv6
> +define fgroup ARMv6k    ARMv6

Re: [PATCH, GCC/ARM, 3/10] Save/restore FPCXTNS in nsentry functions

2019-11-06 Thread Kyrill Tkachov

Hi Mihail,

On 10/23/19 10:26 AM, Mihail Ionescu wrote:

[PATCH, GCC/ARM, 3/10] Save/restore FPCXTNS in nsentry functions

Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to enable
saving/restoring of the nonsecure FP context in functions with the
cmse_nonsecure_entry attribute.

=== Motivation ===

In Armv8-M Baseline and Mainline, the FP context is cleared on return from
nonsecure entry functions. This means the FP context might change when
calling a nonsecure entry function. This patch uses the new VLDR and
VSTR instructions available in Armv8.1-M Mainline to save/restore the FP
context when calling a nonsecure entry function from nonsecure code.

=== Patch description ===

This patch consists mainly of creating 2 new instruction patterns to
push and pop special FP registers via vldr and vstr and using them in
prologue and epilogue. The patterns are defined as push/pop with an
unspecified operation on the memory accessed, with an unspecified
constant indicating what special FP register is being saved/restored.

Other aspects of the patch include:
  * defining the set of special registers that can be saved/restored and
    their name
  * reserving space in the stack frames for these push/pop
  * preventing return via pop
  * guarding the clearing of FPSCR to target architecture not having
    Armv8.1-M Mainline instructions.
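
As an example of the intended effect (a sketch only; the exact code
generation may differ), an entry function such as:

  int __attribute__ ((cmse_nonsecure_entry))
  ns_entry (int x)
  {
    return x + 1;
  }

should now save FPCXTNS in its prologue and restore it in its epilogue via
the new push/pop patterns when built for Armv8.1-M Mainline with Security
Extensions and an FPU.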

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * config/arm/arm.c (fp_sysreg_names): Declare and define.
    (use_return_insn): Also return false for Armv8.1-M Mainline.
    (output_return_instruction): Skip FPSCR clearing if Armv8.1-M
    Mainline instructions are available.
    (arm_compute_frame_layout): Allocate space in frame for FPCXTNS
    when targeting Armv8.1-M Mainline Security Extensions.
    (arm_expand_prologue): Save FPCXTNS if this is an Armv8.1-M
    Mainline entry function.
    (cmse_nonsecure_entry_clear_before_return): Clear IP and r4 if
    targeting Armv8.1-M Mainline or successor.
    (arm_expand_epilogue): Fix indentation of caller-saved register
    clearing.  Restore FPCXTNS if this is an Armv8.1-M Mainline
    entry function.
    * config/arm/arm.h (TARGET_HAVE_FP_CMSE): New macro.
    (FP_SYSREGS): Likewise.
    (enum vfp_sysregs_encoding): Define enum.
    (fp_sysreg_names): Declare.
    * config/arm/unspecs.md (VUNSPEC_VSTR_VLDR): New volatile unspec.
    * config/arm/vfp.md (push_fpsysreg_insn): New define_insn.
    (pop_fpsysreg_insn): Likewise.

*** gcc/testsuite/Changelog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * gcc.target/arm/cmse/bitfield-1.c: Add checks for VSTR and VLDR.
    * gcc.target/arm/cmse/bitfield-2.c: Likewise.
    * gcc.target/arm/cmse/bitfield-3.c: Likewise.
    * gcc.target/arm/cmse/cmse-1.c: Likewise.
    * gcc.target/arm/cmse/struct-1.c: Likewise.
    * gcc.target/arm/cmse/cmse.exp: Run existing Armv8-M Mainline tests
    from mainline/8m subdirectory and new Armv8.1-M Mainline tests from
    mainline/8_1m subdirectory.
    * gcc.target/arm/cmse/mainline/bitfield-4.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/bitfield-4.c: This.
    * gcc.target/arm/cmse/mainline/bitfield-5.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/bitfield-5.c: This.
    * gcc.target/arm/cmse/mainline/bitfield-6.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/bitfield-6.c: This.
    * gcc.target/arm/cmse/mainline/bitfield-7.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/bitfield-7.c: This.
    * gcc.target/arm/cmse/mainline/bitfield-8.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/bitfield-8.c: This.
    * gcc.target/arm/cmse/mainline/bitfield-9.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/bitfield-9.c: This.
    * gcc.target/arm/cmse/mainline/bitfield-and-union-1.c: Move and rename
    into ...
    * gcc.target/arm/cmse/mainline/8m/bitfield-and-union.c: This.
    * gcc.target/arm/cmse/mainline/hard-sp/cmse-13.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/hard-sp/cmse-13.c: This.  Clean up
    dg-skip-if directive for float ABI.
    * gcc.target/arm/cmse/mainline/hard-sp/cmse-5.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/hard-sp/cmse-5.c: This.  Clean up
    dg-skip-if directive for float ABI.
    * gcc.target/arm/cmse/mainline/hard-sp/cmse-7.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/hard-sp/cmse-7.c: This.  Clean up
    dg-skip-if directive for float ABI.
    * gcc.target/arm/cmse/mainline/hard-sp/cmse-8.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/hard-sp/cmse-8.c: This.  Clean up
    dg-skip-if directive for float ABI.
    * g

[PATCH][arm][1/X] Add initial support for saturation intrinsics

2019-11-07 Thread Kyrill Tkachov

Hi all,

This patch adds the plumbing for and an implementation of the saturation
intrinsics from ACLE [1], in particular the __ssat, __usat intrinsics.
These intrinsics set the Q sticky bit in APSR if an overflow occurred.
ACLE allows the user to read that bit (within the same function, it's not
defined across function boundaries) using the __saturation_occurred intrinsic
and reset it using __set_saturation_occurred.
Thus, if the user cares about the Q bit they would be using a flow such as:

__set_saturation_occurred (0); // reset the Q bit
...
__ssat (...) // Do some calculations involving __ssat
...
if (__saturation_occurred ()) // if Q bit set handle overflow
  ...
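
A concrete version of that flow (a sketch; clamp16 is a made-up example):

  #include <arm_acle.h>

  int32_t
  clamp16 (int32_t x, int *overflowed)
  {
    __set_saturation_occurred (0);    /* reset the Q bit */
    int32_t r = __ssat (x, 16);       /* saturate to 16 bits; may set Q */
    *overflowed = __saturation_occurred ();
    return r;
  }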

For the implementation this has a few implications:
* We must track the Q-setting side-effects of these instructions to make sure
saturation reading/writing intrinsics are ordered properly.
This is done by introducing a new "apsrq" register (and associated
APSRQ_REGNUM) in a similar way to the "fake" cc register.

* The RTL patterns coming out of these intrinsics can have two forms:
one where they set the APSRQ_REGNUM and one where they don't.
Which one is used depends on whether the function cares about reading the Q
flag. This is detected using the TARGET_CHECK_BUILTIN_CALL hook on the
__saturation_occurred, __set_saturation_occurred occurrences.
If no Q-flag read is present in the function we'll use the simpler
non-Q-setting form to allow for more aggressive scheduling and such.
If a Q-bit read is present then the Q-setting form is emitted.
To avoid adding two patterns for each intrinsic to the MD file we make
use of define_subst to auto-generate the Q-setting forms

* Some existing patterns already produce instructions that may clobber the
Q bit, but they don't model it (as we didn't care about that bit up till now).
Since these patterns can be generated from straight-line C code they can
affect the Q-bit reads from intrinsics.  Therefore they have to be disabled
when a Q-bit read is present.  These are mostly patterns in arm-fixed.md that
are not very common anyway, but there are also a couple of widening
multiply-accumulate patterns in arm.md that can set the Q-bit during
accumulation.

There are more Q-setting intrinsics in ACLE, but these will be implemented in
a more mechanical fashion once the infrastructure in this patch goes in.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Committing to trunk.
Thanks,
Kyrill


2019-11-07  Kyrylo Tkachov  

    * config/arm/aout.h (REGISTER_NAMES): Add apsrq.
    * config/arm/arm.md (APSRQ_REGNUM): Define.
    (add_setq): New define_subst.
    (add_clobber_q_name): New define_subst_attr.
    (add_clobber_q_pred): Likewise.
    (maddhisi4): Change to define_expand.  Split into mult and add if
    ARM_Q_BIT_READ.
    (arm_maddhisi4): New define_insn.
    (*maddhisi4tb): Disable for ARM_Q_BIT_READ.
    (*maddhisi4tt): Likewise.
    (arm_ssat): New define_expand.
    (arm_usat): Likewise.
    (arm_get_apsr): New define_insn.
    (arm_set_apsr): Likewise.
    (arm_saturation_occurred): New define_expand.
    (arm_set_saturation): Likewise.
    (*satsi_): Rename to...
    (satsi_): ... This.
    (*satsi__shift): Disable for ARM_Q_BIT_READ.
    * config/arm/arm.h (FIXED_REGISTERS): Mark apsrq as fixed.
    (CALL_USED_REGISTERS): Mark apsrq.
    (FIRST_PSEUDO_REGISTER): Update value.
    (REG_ALLOC_ORDER): Add APSRQ_REGNUM.
    (machine_function): Add q_bit_access.
    (ARM_Q_BIT_READ): Define.
    * config/arm/arm.c (TARGET_CHECK_BUILTIN_CALL): Define.
    (arm_conditional_register_usage): Clear APSRQ_REGNUM from
    operand_reg_set.
    (arm_q_bit_access): Define.
    * config/arm/arm-builtins.c: Include stringpool.h.
    (arm_sat_binop_imm_qualifiers,
    arm_unsigned_sat_binop_unsigned_imm_qualifiers,
    arm_sat_occurred_qualifiers, arm_set_sat_qualifiers): Define.
    (SAT_BINOP_UNSIGNED_IMM_QUALIFIERS,
    UNSIGNED_SAT_BINOP_UNSIGNED_IMM_QUALIFIERS, SAT_OCCURRED_QUALIFIERS,
    SET_SAT_QUALIFIERS): Likewise.
    (arm_builtins): Define ARM_BUILTIN_SAT_IMM_CHECK.
    (arm_init_acle_builtins): Initialize __builtin_sat_imm_check.
    Handle 0 argument expander.
    (arm_expand_acle_builtin): Handle ARM_BUILTIN_SAT_IMM_CHECK.
    (arm_check_builtin_call): Define.
    * config/arm/arm.md (ssmulsa3, usmulusa3, usmuluha3,
    arm_ssatsihi_shift, arm_usatsihi): Disable when ARM_Q_BIT_READ.
    * config/arm/arm-protos.h (arm_check_builtin_call): Declare prototype.
    (arm_q_bit_access): Likewise.
    * config/arm/arm_acle.h (__ssat, __usat, __ignore_saturation,
    __saturation_occurred, __set_saturation_occurred): Define.
    * config/arm/arm_acle_builtins.def: Define builtins for ssat, usat,
    saturation_occurred, set_saturation_occurred.
    * config/arm/unspecs.md (UNSPEC_Q_SET): Define.
    (UNSPEC_APSR_READ): Likewise.
    (VUNSPEC_APSR_WRITE): Likewise.
    * config/arm/arm-fixed.md (ssadd3): Convert to define_expand.
    (*arm_ssadd3): New define_insn.
    (sssub3): Convert t

[PATCH][arm][3/X] Implement __smla* intrinsics (Q-setting)

2019-11-07 Thread Kyrill Tkachov

Hi all,

This patch implements some more Q-setting intrinsics from the SMLA* group.
These can set the saturation bit on overflow in the accumulation step.
Like earlier, these have non-Q-setting RTL forms as well for when the Q-bit
read is not needed.
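
For instance (a usage sketch assuming the ACLE signatures; mac_bb is a
made-up example):

  #include <arm_acle.h>

  /* Multiply the bottom 16-bit halves of A and B and accumulate into ACC;
     overflow in the accumulation sets the Q bit.  */
  int32_t
  mac_bb (int32_t a, int32_t b, int32_t acc)
  {
    return __smlabb (a, b, acc);
  }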

Bootstrapped and tested on arm-none-linux-gnueabihf.
Committing to trunk.
Thanks,
Kyrill

2019-11-07  Kyrylo Tkachov 

    * config/arm/arm.md (arm_smlabb_setq): New define_insn.
    (arm_smlabb): New define_expand.
    (*maddhisi4tb): Rename to...
    (maddhisi4tb): ... This.
    (*maddhisi4tt): Rename to...
    (maddhisi4tt): ... This.
    (arm_smlatb_setq): New define_insn.
    (arm_smlatb): New define_expand.
    (arm_smlatt_setq): New define_insn.
    (arm_smlatt): New define_expand.
    (arm__insn): New define_insn.
    (arm_): New define_expand.
    * config/arm/arm_acle.h (__smlabb, __smlatb, __smlabt, __smlatt,
    __smlawb, __smlawt): Define.
    * config/arm_acle_builtins.def: Define builtins for the above.
    * config/arm/iterators.md (SMLAWBT): New int_iterator.
    (slaw_op): New int_attribute.
    * config/arm/unspecs.md (UNSPEC_SMLAWB, UNSPEC_SMLAWT): Define.

2019-11-07  Kyrylo Tkachov 

    * gcc.target/arm/acle/dsp_arith.c: Update test.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index db7a4006eb4f354e08f22c666fea8f1e87726085..05c8ca2772d4475a25b037e3e745c9558e1c5742 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -2565,8 +2565,40 @@
(set_attr "predicable" "yes")]
 )
 
+(define_insn "arm_smlabb_setq"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(plus:SI (mult:SI (sign_extend:SI
+			   (match_operand:HI 1 "s_register_operand" "r"))
+			  (sign_extend:SI
+			   (match_operand:HI 2 "s_register_operand" "r")))
+		 (match_operand:SI 3 "s_register_operand" "r")))
+   (set (reg:CC APSRQ_REGNUM)
+	(unspec:CC [(reg:CC APSRQ_REGNUM)] UNSPEC_Q_SET))]
+  "TARGET_DSP_MULTIPLY"
+  "smlabb%?\\t%0, %1, %2, %3"
+  [(set_attr "type" "smlaxy")
+   (set_attr "predicable" "yes")]
+)
+
+(define_expand "arm_smlabb"
+ [(match_operand:SI 0 "s_register_operand")
+  (match_operand:SI 1 "s_register_operand")
+  (match_operand:SI 2 "s_register_operand")
+  (match_operand:SI 3 "s_register_operand")]
+  "TARGET_DSP_MULTIPLY"
+  {
+rtx mult1 = gen_lowpart (HImode, operands[1]);
+rtx mult2 = gen_lowpart (HImode, operands[2]);
+if (ARM_Q_BIT_READ)
+  emit_insn (gen_arm_smlabb_setq (operands[0], mult1, mult2, operands[3]));
+else
+  emit_insn (gen_maddhisi4 (operands[0], mult1, mult2, operands[3]));
+DONE;
+  }
+)
+
 ;; Note: there is no maddhisi4ibt because this one is canonical form
-(define_insn "*maddhisi4tb"
+(define_insn "maddhisi4tb"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
 	(plus:SI (mult:SI (ashiftrt:SI
 			   (match_operand:SI 1 "s_register_operand" "r")
@@ -2580,7 +2612,41 @@
(set_attr "predicable" "yes")]
 )
 
-(define_insn "*maddhisi4tt"
+(define_insn "arm_smlatb_setq"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(plus:SI (mult:SI (ashiftrt:SI
+			   (match_operand:SI 1 "s_register_operand" "r")
+			   (const_int 16))
+			  (sign_extend:SI
+			   (match_operand:HI 2 "s_register_operand" "r")))
+		 (match_operand:SI 3 "s_register_operand" "r")))
+   (set (reg:CC APSRQ_REGNUM)
+	(unspec:CC [(reg:CC APSRQ_REGNUM)] UNSPEC_Q_SET))]
+  "TARGET_DSP_MULTIPLY"
+  "smlatb%?\\t%0, %1, %2, %3"
+  [(set_attr "type" "smlaxy")
+   (set_attr "predicable" "yes")]
+)
+
+(define_expand "arm_smlatb"
+ [(match_operand:SI 0 "s_register_operand")
+  (match_operand:SI 1 "s_register_operand")
+  (match_operand:SI 2 "s_register_operand")
+  (match_operand:SI 3 "s_register_operand")]
+  "TARGET_DSP_MULTIPLY"
+  {
+rtx mult2 = gen_lowpart (HImode, operands[2]);
+if (ARM_Q_BIT_READ)
+  emit_insn (gen_arm_smlatb_setq (operands[0], operands[1],
+  mult2, operands[3]));
+else
+  emit_insn (gen_maddhisi4tb (operands[0], operands[1],
+  mult2, operands[3]));
+DONE;
+  }
+)
+
+(define_insn "maddhisi4tt"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
 	(plus:SI (mult:SI (ashiftrt:SI
 			   (match_operand:SI 1 "s_register_operand" "r")
@@ -2595,6 +2661,40 @@
(set_attr "predicable" "yes")]
 )
 
+(define_insn "arm_smlatt_setq"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(plus:SI (mult:SI (ashiftrt:SI
+			   (match_operand:SI 1 "s_register_operand" "r")
+			   (const_int 16))
+			  (ashiftrt:SI
+			   (match_operand:SI 2 "s_register_operand" "r")
+			   (const_int 16)))
+		 (match_operand:SI 3 "s_register_operand" "r")))
+   (set (reg:CC APSRQ_REGNUM)
+	(unspec:CC [(reg:CC APSRQ_REGNUM)] UNSPEC_Q_SET))]
+  "TARGET_DSP_MULTIPLY"
+  "smlatt%?\\t%0, %1, %2, %3"
+  [(set_attr "type" "smlaxy")
+   (set_attr "predicable" "yes")]
+)
+
+(define_expand "arm_smlatt"
+ [(match_operand:SI 0 "s_register_operand")
+  (match_operand:SI 1 "s_register_operand")
+  (match_operand:SI 2 "s_register_operand")
+  (match_operand:SI 3 "s_register_op

[PATCH][arm][2/X] Implement __qadd, __qsub, __qdbl intrinsics

2019-11-07 Thread Kyrill Tkachov

Hi all,

This patch implements some more Q-bit-setting intrinsics from ACLE.
With the plumbing from patch 1 in place they are a simple builtin->RTL affair.


Bootstrapped and tested on arm-none-linux-gnueabihf.

Committing to trunk.
Thanks,
Kyrill

2019-11-07  Kyrylo Tkachov  

    * config/arm/arm.md (arm_): New define_expand.
    (arm__insn): New define_insn.
    * config/arm/arm_acle.h (__qadd, __qsub, __qdbl): Define.
    * config/arm/arm_acle_builtins.def: Add builtins for qadd, qsub.
    * config/arm/iterators.md (SSPLUSMINUS): New code iterator.
    (ss_op): New code_attr.

2019-11-07  Kyrylo Tkachov  

    * gcc.target/arm/acle/dsp_arith.c: New test.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 09b632b5dbc8b38dcca22494468366c97a514bb6..db7a4006eb4f354e08f22c666fea8f1e87726085 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4076,6 +4076,32 @@
(set_attr "type" "multiple")]
 )
 
+
+(define_expand "arm_"
+  [(set (match_operand:SI 0 "s_register_operand")
+	(SSPLUSMINUS:SI (match_operand:SI 1 "s_register_operand")
+			(match_operand:SI 2 "s_register_operand")))]
+  "TARGET_DSP_MULTIPLY"
+  {
+if (ARM_Q_BIT_READ)
+  emit_insn (gen_arm__setq_insn (operands[0],
+	operands[1], operands[2]));
+else
+  emit_insn (gen_arm__insn (operands[0], operands[1], operands[2]));
+DONE;
+  }
+)
+
+(define_insn "arm__insn"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(SSPLUSMINUS:SI (match_operand:SI 1 "s_register_operand" "r")
+			(match_operand:SI 2 "s_register_operand" "r")))]
+  "TARGET_DSP_MULTIPLY && "
+  "%?\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "alu_dsp_reg")]
+)
+
 (define_code_iterator SAT [smin smax])
 (define_code_attr SATrev [(smin "smax") (smax "smin")])
 (define_code_attr SATlo [(smin "1") (smax "2")])
diff --git a/gcc/config/arm/arm_acle.h b/gcc/config/arm/arm_acle.h
index 2564ad849856610f9415586e386f85eea6947bf7..397653d3e8bf43cbcb82d98dd704bcd3a66cf782 100644
--- a/gcc/config/arm/arm_acle.h
+++ b/gcc/config/arm/arm_acle.h
@@ -478,6 +478,29 @@ __ignore_saturation (void)
   })
 #endif
 
+#ifdef __ARM_FEATURE_DSP
+__extension__ extern __inline int32_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+__qadd (int32_t __a, int32_t __b)
+{
+  return __builtin_arm_qadd (__a, __b);
+}
+
+__extension__ extern __inline int32_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+__qsub (int32_t __a, int32_t __b)
+{
+  return __builtin_arm_qsub (__a, __b);
+}
+
+__extension__ extern __inline int32_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+__qdbl (int32_t __x)
+{
+  return __qadd (__x, __x);
+}
+#endif
+
 #pragma GCC push_options
 #ifdef __ARM_FEATURE_CRC32
 #ifdef __ARM_FP
diff --git a/gcc/config/arm/arm_acle_builtins.def b/gcc/config/arm/arm_acle_builtins.def
index c72480321faa952ac307418f9e4f7d5f5f9e3745..def1a569311e67194a323decc309ed92747c4c86 100644
--- a/gcc/config/arm/arm_acle_builtins.def
+++ b/gcc/config/arm/arm_acle_builtins.def
@@ -84,3 +84,5 @@ VAR1 (SAT_BINOP_UNSIGNED_IMM, ssat, si)
 VAR1 (UNSIGNED_SAT_BINOP_UNSIGNED_IMM, usat, si)
 VAR1 (SAT_OCCURRED, saturation_occurred, si)
 VAR1 (SET_SAT, set_saturation, void)
+VAR1 (BINOP, qadd, si)
+VAR1 (BINOP, qsub, si)
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index e5cef6852a2dfcef4cd3597c163a53a6c247afab..ebb8218f265023786730881ef0bc9f818e7235b0 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -264,6 +264,9 @@
 ;; Conversions.
 (define_code_iterator FCVT [unsigned_float float])
 
+;; Saturating addition, subtraction
+(define_code_iterator SSPLUSMINUS [ss_plus ss_minus])
+
 ;; plus and minus are the only SHIFTABLE_OPS for which Thumb2 allows
 ;; a stack pointer operand.  The minus operation is a candidate for an rsub
 ;; and hence only plus is supported.
@@ -282,6 +285,8 @@
 
 (define_code_attr vfml_op [(plus "a") (minus "s")])
 
+(define_code_attr ss_op [(ss_plus "qadd") (ss_minus "qsub")])
+
 ;;
 ;; Int iterators
 ;;
diff --git a/gcc/testsuite/gcc.target/arm/acle/dsp_arith.c b/gcc/testsuite/gcc.target/arm/acle/dsp_arith.c
new file mode 100644
index ..f0bf80993beb0007b0eb360878f0fd1811098d9e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/dsp_arith.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_qbit_ok } */
+/* { dg-add-options arm_qbit  } */
+
+#include <arm_acle.h>
+
+int32_t
+test_qadd (int32_t a, int32_t b)
+{
+  return __qadd (a, b);
+}
+
+int32_t
+test_qdbl (int32_t a)
+{
+  return __qdbl(a);
+}
+
+/* { dg-final { scan-assembler-times "qadd\t...?, ...?, ...?" 2 } } */
+
+int32_t
+test_qsub (int32_t a, int32_t b)
+{
+  return __qsub (a, b);
+}
+
+/* { dg-final { scan-assembler-times "qsub\t...?, 

[PATCH][arm][6/X] Add support for __[us]sat16 intrinsics

2019-11-07 Thread Kyrill Tkachov

Hi all,

This last patch adds the __ssat16 and __usat16 intrinsics that perform
"clipping" to a particular bitwidth on packed SIMD values, setting the Q bit
as appropriate.
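
For instance (a usage sketch assuming the ACLE signature; clip9 is a
made-up example):

  #include <arm_acle.h>

  /* Clip both 16-bit lanes of A to the signed 9-bit range; the Q bit is
     set if either lane actually saturated.  */
  int16x2_t
  clip9 (int16x2_t a)
  {
    return __ssat16 (a, 9);
  }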

Bootstrapped and tested on arm-none-linux-gnueabihf.
Committing to trunk.
Thanks,
Kyrill

2019-11-07  Kyrylo Tkachov  

    * config/arm/arm.md (arm_): New define_expand.
    (arm__insn): New define_insn.
    * config/arm/arm_acle.h (__ssat16, __usat16): Define.
    * config/arm/arm_acle_builtins.def: Define builtins for the above.
    * config/arm/iterators.md (USSAT16): New int_iterator.
    (simd32_op): Handle UNSPEC_SSAT16, UNSPEC_USAT16.
    (sup): Likewise.
    * config/arm/predicates.md (ssat16_imm): New predicate.
    (usat16_imm): Likewise.
    * config/arm/unspecs.md (UNSPEC_SSAT16, UNSPEC_USAT16): Define.

2019-11-07  Kyrylo Tkachov  

    * gcc.target/arm/acle/simd32.c: Update test.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 7717f547ab4706183d2727013496c249edbe7abf..f2f5094f9e2a802557e5c19db1edbc028a91cbd8 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -5921,6 +5921,33 @@
   }
 )
 
+(define_insn "arm__insn"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(unspec:SI
+	  [(match_operand:SI 1 "s_register_operand" "r")
+	   (match_operand:SI 2 "sat16_imm" "i")] USSAT16))]
+  "TARGET_INT_SIMD && "
+  "%?\\t%0, %2, %1"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "alu_sreg")])
+
+(define_expand "arm_"
+  [(set (match_operand:SI 0 "s_register_operand")
+	(unspec:SI
+	  [(match_operand:SI 1 "s_register_operand")
+	   (match_operand:SI 2 "sat16_imm")] USSAT16))]
+  "TARGET_INT_SIMD"
+  {
+if (ARM_Q_BIT_READ)
+  emit_insn (gen_arm__setq_insn (operands[0], operands[1],
+		operands[2]));
+else
+  emit_insn (gen_arm__insn (operands[0], operands[1],
+	   operands[2]));
+DONE;
+  }
+)
+
 (define_insn "arm_sel"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
 	(unspec:SI
diff --git a/gcc/config/arm/arm_acle.h b/gcc/config/arm/arm_acle.h
index c30645e3949f84321fb1dfe3afd06167ef859d62..9ea922f2d096870d2c2d34ac43f03e3bc9dc4741 100644
--- a/gcc/config/arm/arm_acle.h
+++ b/gcc/config/arm/arm_acle.h
@@ -564,6 +564,24 @@ __smuadx (int16x2_t __a, int16x2_t __b)
   return __builtin_arm_smuadx (__a, __b);
 }
 
+#define __ssat16(__a, __sat)	\
+  __extension__			\
+  ({\
+int16x2_t __arg = (__a);	\
+__builtin_sat_imm_check (__sat, 1, 16);			\
+int16x2_t __res = __builtin_arm_ssat16 (__arg, __sat);	\
+__res;			\
+  })
+
+#define __usat16(__a, __sat)	\
+  __extension__			\
+  ({\
+int16x2_t __arg = (__a);	\
+__builtin_sat_imm_check (__sat, 0, 15);			\
+int16x2_t __res = __builtin_arm_usat16 (__arg, __sat);	\
+__res;			\
+  })
+
 #endif
 
 #ifdef __ARM_FEATURE_SAT
diff --git a/gcc/config/arm/arm_acle_builtins.def b/gcc/config/arm/arm_acle_builtins.def
index 018d89682c61a963961515823420f1b986cd40db..8a21ff74f41840dd793221e079627055d379c474 100644
--- a/gcc/config/arm/arm_acle_builtins.def
+++ b/gcc/config/arm/arm_acle_builtins.def
@@ -114,3 +114,6 @@ VAR1 (TERNOP, smlsd, si)
 VAR1 (TERNOP, smlsdx, si)
 VAR1 (BINOP, smuad, si)
 VAR1 (BINOP, smuadx, si)
+
+VAR1 (SAT_BINOP_UNSIGNED_IMM, ssat16, si)
+VAR1 (SAT_BINOP_UNSIGNED_IMM, usat16, si)
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 72aba5e86fc20216bcba74f5cfa5b9f744497a6e..c412851843f4468c2c18bce264288705e076ac50 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -458,6 +458,8 @@
 
 (define_int_iterator SIMD32_BINOP_Q [UNSPEC_SMUAD UNSPEC_SMUADX])
 
+(define_int_iterator USSAT16 [UNSPEC_SSAT16 UNSPEC_USAT16])
+
 (define_int_iterator VQRDMLH_AS [UNSPEC_VQRDMLAH UNSPEC_VQRDMLSH])
 
 (define_int_iterator VFM_LANE_AS [UNSPEC_VFMA_LANE UNSPEC_VFMS_LANE])
@@ -918,6 +920,7 @@
   (UNSPEC_VRSRA_S_N "s") (UNSPEC_VRSRA_U_N "u")
   (UNSPEC_VCVTH_S "s") (UNSPEC_VCVTH_U "u")
   (UNSPEC_DOT_S "s") (UNSPEC_DOT_U "u")
+  (UNSPEC_SSAT16 "s") (UNSPEC_USAT16 "u")
 ])
 
 (define_int_attr vfml_half
@@ -1083,7 +1086,8 @@
 			(UNSPEC_USUB16 "usub16") (UNSPEC_SMLAD "smlad")
 			(UNSPEC_SMLADX "smladx") (UNSPEC_SMLSD "smlsd")
 			(UNSPEC_SMLSDX "smlsdx") (UNSPEC_SMUAD "smuad")
-			(UNSPEC_SMUADX "smuadx")])
+			(UNSPEC_SMUADX "smuadx") (UNSPEC_SSAT16 "ssat16")
+			(UNSPEC_USAT16 "usat16")])
 
 ;; Both kinds of return insn.
 (define_code_iterator RETURNS [return simple_return])
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 267c446c03e8903c21a0d74e43ae589ffcf689f4..c1f655c704011bbe8bac82c24a3234a23bf6b242 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -193,6 +193,14 @@
   (and (match_code "const_int")
(match_test "IN_RANGE (UINTVAL (op), 1, GET_MODE_BITSIZE (mode))")))
 
+(define_predicate "ssat16_imm"
+  (and (match_code "const_int")
+   (match_test "IN_RANGE (INTVAL (op), 1, 16)")))
+
(define_predicate "usat16_imm"
+  (and (match_code "const_int")
+   (match_test "IN_RANGE (INTVAL (op), 0, 15)")))

[PATCH][arm][5/X] Implement Q-bit-setting SIMD32 intrinsics

2019-11-07 Thread Kyrill Tkachov

Hi all,

This patch implements some more Q-setting intrinsics of the
multiply-accumulate variety.  These are in the SIMD32 family in that they
treat their operands as packed SIMD values, though that distinction is not
important at the RTL level.
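
As a usage illustration (hand-written, not from the patch; __smlad is as defined in the arm_acle.h hunk below):

#include <arm_acle.h>

/* One step of a packed dot product: multiply the two halfword pairs of
   A and B and add both products to ACC.  An overflow during the
   accumulation sets the Q bit.  */
int32_t
dot_step (int16x2_t a, int16x2_t b, int32_t acc)
{
  return __smlad (a, b, acc);
}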

Bootstrapped and tested on arm-none-linux-gnueabihf.

Committing to trunk.
Thanks,
Kyrill

2019-11-07  Kyrylo Tkachov  

    * config/arm/arm.md (arm_<simd32_op>_insn):
    New define_insns.
    (arm_<simd32_op>): New define_expands.
    * config/arm/arm_acle.h (__smlad, __smladx, __smlsd, __smlsdx,
    __smuad, __smuadx): Define.
    * config/arm/arm_acle_builtins.def: Define builtins for the above.
    * config/arm/iterators.md (SIMD32_TERNOP_Q): New int_iterator.
    (SIMD32_BINOP_Q): Likewise.
    (simd32_op): Handle the above.
    * config/arm/unspecs.md: Define unspecs for the above.

2019-11-07  Kyrylo Tkachov  

    * gcc.target/arm/acle/simd32.c: Update test.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 884a224a991102955787600317581e6468463bea..7717f547ab4706183d2727013496c249edbe7abf 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -5865,6 +5865,62 @@
   [(set_attr "predicable" "yes")
(set_attr "type" "alu_sreg")])
 
+(define_insn "arm__insn"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(unspec:SI
+	  [(match_operand:SI 1 "s_register_operand" "r")
+	   (match_operand:SI 2 "s_register_operand" "r")
+	   (match_operand:SI 3 "s_register_operand" "r")] SIMD32_TERNOP_Q))]
+  "TARGET_INT_SIMD && "
+  "%?\\t%0, %1, %2, %3"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "alu_sreg")])
+
+(define_expand "arm_"
+  [(set (match_operand:SI 0 "s_register_operand")
+	(unspec:SI
+	  [(match_operand:SI 1 "s_register_operand")
+	   (match_operand:SI 2 "s_register_operand")
+	   (match_operand:SI 3 "s_register_operand")] SIMD32_TERNOP_Q))]
+  "TARGET_INT_SIMD"
+  {
+if (ARM_Q_BIT_READ)
+  emit_insn (gen_arm__setq_insn (operands[0], operands[1],
+		operands[2], operands[3]));
+else
+  emit_insn (gen_arm__insn (operands[0], operands[1],
+	   operands[2], operands[3]));
+DONE;
+  }
+)
+
+(define_insn "arm__insn"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(unspec:SI
+	  [(match_operand:SI 1 "s_register_operand" "r")
+	   (match_operand:SI 2 "s_register_operand" "r")] SIMD32_BINOP_Q))]
+  "TARGET_INT_SIMD && "
+  "%?\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "alu_sreg")])
+
+(define_expand "arm_"
+  [(set (match_operand:SI 0 "s_register_operand")
+	(unspec:SI
+	  [(match_operand:SI 1 "s_register_operand")
+	   (match_operand:SI 2 "s_register_operand")] SIMD32_BINOP_Q))]
+  "TARGET_INT_SIMD"
+  {
+if (ARM_Q_BIT_READ)
+  emit_insn (gen_arm__setq_insn (operands[0], operands[1],
+		operands[2]));
+else
+  emit_insn (gen_arm__insn (operands[0], operands[1],
+	   operands[2]));
+DONE;
+  }
+)
+
 (define_insn "arm_sel"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
 	(unspec:SI
diff --git a/gcc/config/arm/arm_acle.h b/gcc/config/arm/arm_acle.h
index b8d02a5502f273fcba492bbeba2542b13334a8ea..c30645e3949f84321fb1dfe3afd06167ef859d62 100644
--- a/gcc/config/arm/arm_acle.h
+++ b/gcc/config/arm/arm_acle.h
@@ -522,6 +522,48 @@ __usub16 (uint16x2_t __a, uint16x2_t __b)
   return __builtin_arm_usub16 (__a, __b);
 }
 
+__extension__ extern __inline int32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__smlad (int16x2_t __a, int16x2_t __b, int32_t __c)
+{
+  return __builtin_arm_smlad (__a, __b, __c);
+}
+
+__extension__ extern __inline int32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__smladx (int16x2_t __a, int16x2_t __b, int32_t __c)
+{
+  return __builtin_arm_smladx (__a, __b, __c);
+}
+
+__extension__ extern __inline int32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__smlsd (int16x2_t __a, int16x2_t __b, int32_t __c)
+{
+  return __builtin_arm_smlsd (__a, __b, __c);
+}
+
+__extension__ extern __inline int32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__smlsdx (int16x2_t __a, int16x2_t __b, int32_t __c)
+{
+  return __builtin_arm_smlsdx (__a, __b, __c);
+}
+
+__extension__ extern __inline int32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__smuad (int16x2_t __a, int16x2_t __b)
+{
+  return __builtin_arm_smuad (__a, __b);
+}
+
+__extension__ extern __inline int32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__smuadx (int16x2_t __a, int16x2_t __b)
+{
+  return __builtin_arm_smuadx (__a, __b);
+}
+
 #endif
 
 #ifdef __ARM_FEATURE_SAT
diff --git a/gcc/config/arm/arm_acle_builtins.def b/gcc/config/arm/arm_acle_builtins.def
index 715c3c94e8c8f6355e880a36eb275be80d1a3912..018d89682c61a963961515823420f1b986cd40db 100644
--- a/gcc/config/arm/arm_acle_builtins.def
+++ b/gcc/config/arm/arm_acle_builtins.def
@@ -107,3 +107,10 @@ VAR1 (UBINOP, usax, si)
 VAR1 (UBINOP, usub16, si)
 
 VAR1 (UBINOP, sel, si)

[PATCH][arm][4/X] Add initial support for GE-setting SIMD32 intrinsics

2019-11-07 Thread Kyrill Tkachov

Hi all,

This patch adds in plumbing for the ACLE intrinsics that set the GE bits in
APSR.  These are special SIMD instructions in Armv6 that pack bytes or
halfwords into the 32-bit general-purpose registers and set the GE bits in
APSR to indicate if some of the "lanes" of the result have overflowed or
have some other instruction-specific property.
These bits can then be used by the SEL instruction (accessed through the
__sel intrinsic) to select lanes for further processing.

This situation is similar to the Q-setting intrinsics: we have to track
the GE fake register, detect when a function reads it through __sel and
restrict existing patterns that may generate GE-clobbering instructions
from straight-line C code when reading the GE bits matters.
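
To make the GE/SEL interplay concrete, here is a hand-written sketch (not from the patch; __uadd8 and __sel are among the intrinsics defined below):

#include <arm_acle.h>

/* __uadd8 adds the four byte lanes of A and B and records a per-lane,
   instruction-specific property in the GE bits (for unsigned addition,
   the carry out of each byte).  __sel then picks each result byte from
   its first or second operand depending on the corresponding GE bit.  */
uint8x4_t
add_and_select (uint8x4_t a, uint8x4_t b)
{
  uint8x4_t sum = __uadd8 (a, b);
  return __sel (sum, a);
}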

Bootstrapped and tested on arm-none-linux-gnueabihf.

Committed to trunk.
Thanks,
Kyrill


2019-11-07  Kyrylo Tkachov  

    * config/arm/aout.h (REGISTER_NAMES): Add apsrge.
    * config/arm/arm.md (APSRGE_REGNUM): Define.
    (arm_<simd32_op>): New define_insn.
    (arm_sel): Likewise.
    * config/arm/arm.h (FIXED_REGISTERS): Add entry for apsrge.
    (CALL_USED_REGISTERS): Likewise.
    (REG_ALLOC_ORDER): Likewise.
    (FIRST_PSEUDO_REGISTER): Update value.
    (ARM_GE_BITS_READ): Define.
    * config/arm/arm.c (arm_conditional_register_usage): Clear
    APSRGE_REGNUM from operand_reg_set.
    (arm_ge_bits_access): Define.
    * config/arm/arm-builtins.c (arm_check_builtin_call): Handle
    ARM_BUILTIN_sel.
    * config/arm/arm-protos.h (arm_ge_bits_access): Declare prototype.
    * config/arm/arm-fixed.md (add<mode>3): Convert to define_expand.
    FAIL if ARM_GE_BITS_READ.
    (*arm_add<mode>3): New define_insn.
    (sub<mode>3): Convert to define_expand.  FAIL if ARM_GE_BITS_READ.
    (*arm_sub<mode>3): New define_insn.
    * config/arm/arm_acle.h (__sel, __sadd8, __ssub8, __uadd8, __usub8,
    __sadd16, __sasx, __ssax, __ssub16, __uadd16, __uasx, __usax,
    __usub16): Define.
    * config/arm/arm_acle_builtins.def: Define builtins for the above.
    * config/arm/iterators.md (SIMD32_GE): New int_iterator.
    (simd32_op): Handle the above.
    * config/arm/unspecs.md (UNSPEC_GE_SET): Define.
    (UNSPEC_SEL, UNSPEC_SADD8, UNSPEC_SSUB8, UNSPEC_UADD8, UNSPEC_USUB8,
    UNSPEC_SADD16, UNSPEC_SASX, UNSPEC_SSAX, UNSPEC_SSUB16, UNSPEC_UADD16,
    UNSPEC_UASX, UNSPEC_USAX, UNSPEC_USUB16): Define.

2019-11-07  Kyrylo Tkachov  

    * gcc.target/arm/acle/simd32.c: Update test.
    * gcc.target/arm/acle/simd32_sel.c: New test.

diff --git a/gcc/config/arm/aout.h b/gcc/config/arm/aout.h
index a5f83cb503f61cc1cab0e61795edde33250610e7..72782758853a869bcb9a9d69f3fa0da979cd711f 100644
--- a/gcc/config/arm/aout.h
+++ b/gcc/config/arm/aout.h
@@ -72,7 +72,7 @@
   "wr8",   "wr9",   "wr10",  "wr11",\
   "wr12",  "wr13",  "wr14",  "wr15",\
   "wcgr0", "wcgr1", "wcgr2", "wcgr3",\
-  "cc", "vfpcc", "sfp", "afp", "apsrq"\
+  "cc", "vfpcc", "sfp", "afp", "apsrq", "apsrge"		\
 }
 #endif
 
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 995f50785f6ebff7b3cd47185516f7bcb4fd5b81..2d902d0b325bc1fe5e22831ef8a59a2bb37c1225 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -3370,6 +3370,13 @@ arm_check_builtin_call (location_t, vec<location_t>, tree fndecl,
 	  = tree_cons (get_identifier ("acle qbit"), NULL_TREE,
 		   DECL_ATTRIBUTES (cfun->decl));
 }
+  if (fcode == ARM_BUILTIN_sel)
+{
+  if (cfun && cfun->decl)
+	DECL_ATTRIBUTES (cfun->decl)
+	  = tree_cons (get_identifier ("acle gebits"), NULL_TREE,
+		   DECL_ATTRIBUTES (cfun->decl));
+}
   return true;
 }
 
diff --git a/gcc/config/arm/arm-fixed.md b/gcc/config/arm/arm-fixed.md
index 85dbc5d05c35921bc5115df68d30292a712729cf..6d949ba7064c0587d4c5d7b855f2c04c6d0e08e7 100644
--- a/gcc/config/arm/arm-fixed.md
+++ b/gcc/config/arm/arm-fixed.md
@@ -28,11 +28,22 @@
(set_attr "predicable_short_it" "yes,no")
(set_attr "type" "alu_sreg")])
 
-(define_insn "add3"
+(define_expand "add3"
+  [(set (match_operand:ADDSUB 0 "s_register_operand")
+	(plus:ADDSUB (match_operand:ADDSUB 1 "s_register_operand")
+		 (match_operand:ADDSUB 2 "s_register_operand")))]
+  "TARGET_INT_SIMD"
+  {
+if (ARM_GE_BITS_READ)
+  FAIL;
+  }
+)
+
+(define_insn "*arm_add3"
   [(set (match_operand:ADDSUB 0 "s_register_operand" "=r")
 	(plus:ADDSUB (match_operand:ADDSUB 1 "s_register_operand" "r")
 		 (match_operand:ADDSUB 2 "s_register_operand" "r")))]
-  "TARGET_INT_SIMD"
+  "TARGET_INT_SIMD && !ARM_GE_BITS_READ"
   "sadd%?\\t%0, %1, %2"
   [(set_attr "predicable" "yes")
(set_attr "type" "alu_dsp_reg")])
@@ -76,11 +87,22 @@
(set_attr "predicable_short_it" "yes,no")
(set_attr "type" "alu_sreg")])
 
-(define_insn "sub3"
+(define_expand "sub3"
+  [(set (match_operand:ADDSUB 0 "s_register_operand")
+	(minus:ADDSUB (match_operand:ADDSUB 1 "s_register_operand")
+		 (match_operand:ADDSUB 2 "s_register_operand")))]
+  "TARGET_INT_SIMD"
+  {
+if (ARM_GE_BITS_READ)
+

Re: [PATCH][arm][1/X] Add initial support for saturation intrinsics

2019-11-11 Thread Kyrill Tkachov

Hi Richard,

On 11/9/19 12:44 PM, Richard Henderson wrote:

On 11/7/19 11:26 AM, Kyrill Tkachov wrote:

-;; The code sequence emitted by this insn pattern uses the Q flag, which GCC
-;; doesn't generally know about, so we don't bother expanding to individual
-;; instructions.  It may be better to just use an out-of-line asm libcall for
-;; this.
+;; The code sequence emitted by this insn pattern uses the Q flag, so we need
+;; to bail out when ARM_Q_BIT_READ and resort to a library sequence instead.
+
+(define_expand "ssmulsa3"
+  [(parallel [(set (match_operand:SA 0 "s_register_operand")
+   (ss_mult:SA (match_operand:SA 1 "s_register_operand")
+   (match_operand:SA 2 "s_register_operand")))
+   (clobber (match_scratch:DI 3))
+   (clobber (match_scratch:SI 4))
+   (clobber (reg:CC CC_REGNUM))])]
+  "TARGET_32BIT && arm_arch6"
+  {
+if (ARM_Q_BIT_READ)
+  FAIL;
+  }
+)

Coming back to this, why would you not just represent the update of the Q bit?
  This is not generated by generic pattern matching, but by the __ssmulsa3
builtin function.  It seems easy to me to simply describe how this older
builtin operates in conjunction with the new acle builtins.

I recognize that ssadd<mode>3 etc are more difficult, because they can be
generated by arithmetic operations on TYPE_SATURATING.  Although again it seems
weird to generate expensive out-of-line code for TYPE_SATURATING when used in
conjunction with acle builtins.

I think it would be better to merely expand the documentation.  Even if only so
far as to say "unsupported to mix these".


I'm tempted to agree, as this part of the patch is quite ugly.

Thank you for the comments on these patches, I wasn't aware of some of 
the mechanisms.


I guess I should have posted the series as an RFC first...

I'll send patches to fix up the issues.

Thanks,

Kyrill


+(define_expand "maddhisi4"
+  [(set (match_operand:SI 0 "s_register_operand")
+   (plus:SI (mult:SI (sign_extend:SI
+  (match_operand:HI 1 "s_register_operand"))
+ (sign_extend:SI
+  (match_operand:HI 2 "s_register_operand")))
+(match_operand:SI 3 "s_register_operand")))]
+  "TARGET_DSP_MULTIPLY"
+  {
+/* If this function reads the Q bit from ACLE intrinsics break up the
+   multiplication and accumulation as an overflow during accumulation will
+   clobber the Q flag.  */
+if (ARM_Q_BIT_READ)
+  {
+   rtx tmp = gen_reg_rtx (SImode);
+   emit_insn (gen_mulhisi3 (tmp, operands[1], operands[2]));
+   emit_insn (gen_addsi3 (operands[0], tmp, operands[3]));
+   DONE;
+  }
+  }
+)
+
+(define_insn "*arm_maddhisi4"
[(set (match_operand:SI 0 "s_register_operand" "=r")
(plus:SI (mult:SI (sign_extend:SI
   (match_operand:HI 1 "s_register_operand" "r"))
  (sign_extend:SI
   (match_operand:HI 2 "s_register_operand" "r")))
 (match_operand:SI 3 "s_register_operand" "r")))]
-  "TARGET_DSP_MULTIPLY"
+  "TARGET_DSP_MULTIPLY && !ARM_Q_BIT_READ"
"smlabb%?\\t%0, %1, %2, %3"
[(set_attr "type" "smlaxy")
 (set_attr "predicable" "yes")]

I think this case would be better represented with a single
define_insn_and_split and a peephole2.  It is easy to notice during peep2
whether or not the Q bit is actually live at the exact place we want to expand
this operation.  If it is live, then use two insns; if it isn't, use one.


r~


Re: [PATCH][AArch64] Implement Armv8.5-A memory tagging (MTE) intrinsics

2019-11-12 Thread Kyrill Tkachov

Hi Dennis,

On 11/7/19 1:48 PM, Dennis Zhang wrote:

Hi Kyrill,

I have rebased the patch on top of current trunk.
For resolve_overloaded, I redefined my memtag overloading function to
fit the latest resolve_overloaded_builtin interface.

Regression tested again and it passed on aarch64-none-linux-gnu.


Please reply inline rather than top-posting on gcc-patches.



Cheers
Dennis

Changelog is updated as following:

gcc/ChangeLog:

2019-11-07  Dennis Zhang  

* config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add
AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG,
AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END.
(aarch64_init_memtag_builtins): New.
(AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
(aarch64_general_init_builtins): Call aarch64_init_memtag_builtins.
(aarch64_expand_builtin_memtag): New.
(aarch64_general_expand_builtin): Call aarch64_expand_builtin_memtag.
(AARCH64_BUILTIN_SUBCODE): New macro.
(aarch64_resolve_overloaded_memtag): New.
(aarch64_resolve_overloaded_builtin_general): New hook. Call
aarch64_resolve_overloaded_memtag to handle overloaded MTE builtins.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
__ARM_FEATURE_MEMORY_TAGGING when enabled.
(aarch64_resolve_overloaded_builtin): Call
aarch64_resolve_overloaded_builtin_general.
* config/aarch64/aarch64-protos.h
(aarch64_resolve_overloaded_builtin_general): New declaration.
* config/aarch64/aarch64.h (AARCH64_ISA_MEMTAG): New macro.
(TARGET_MEMTAG): Likewise.
* config/aarch64/aarch64.md (define_c_enum "unspec"): Add
UNSPEC_GEN_TAG, UNSPEC_GEN_TAG_RND, and UNSPEC_TAG_SPACE.
(irg, gmi, subp, addg, ldg, stg): New instructions.
* config/aarch64/arm_acle.h (__arm_mte_create_random_tag): New macro.
(__arm_mte_exclude_tag, __arm_mte_increment_tag): Likewise.
(__arm_mte_ptrdiff, __arm_mte_set_tag, __arm_mte_get_tag): Likewise.
* config/aarch64/predicates.md (aarch64_memtag_tag_offset): New.
(aarch64_granule16_uimm6, aarch64_granule16_simm9): New.
* config/arm/types.md (memtag): New.
* doc/invoke.texi (-memtag): Update description.

gcc/testsuite/ChangeLog:

2019-11-07  Dennis Zhang  

* gcc.target/aarch64/acle/memtag_1.c: New test.
* gcc.target/aarch64/acle/memtag_2.c: New test.
* gcc.target/aarch64/acle/memtag_3.c: New test.


On 04/11/2019 16:40, Kyrill Tkachov wrote:

Hi Dennis,

On 10/17/19 11:03 AM, Dennis Zhang wrote:

Hi,

Arm Memory Tagging Extension (MTE) is published with Armv8.5-A.
It can be used for spatial and temporal memory safety detection and
lightweight lock and key system.

This patch enables new intrinsics leveraging MTE instructions to
implement functionalities of creating tags, setting tags, reading tags,
and manipulating tags.
The intrinsics are part of Arm ACLE extension:
https://developer.arm.com/docs/101028/latest/memory-tagging-intrinsics
The MTE ISA specification can be found at
https://developer.arm.com/docs/ddi0487/latest chapter D6.
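
As a usage illustration (hand-written here, not taken from the patch or its tests; intrinsic names are from the ACLE document above):

#include <arm_acle.h>

/* Tag a 16-byte granule: derive a randomly tagged copy of P (IRG),
   store that tag to memory (STG) and return the tagged pointer for
   subsequent accesses.  Requires an MTE-enabled target, e.g.
   -march=armv8.5-a+memtag.  */
void *
tag_granule (void *p)
{
  void *tagged = __arm_mte_create_random_tag (p, 0);
  __arm_mte_set_tag (tagged);
  return tagged;
}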

Bootstrapped and regtested for aarch64-none-linux-gnu.

Please help to check if it's OK for trunk.


This looks mostly ok to me, but for further review this needs to be
rebased on top of current trunk as there are some conflicts with the SVE
ACLE changes that recently went in.  Most conflicts look trivial to
resolve but one that needs more attention is the definition of the
TARGET_RESOLVE_OVERLOADED_BUILTIN hook.

Thanks,

Kyrill


Many Thanks
Dennis

gcc/ChangeLog:

2019-10-16  Dennis Zhang  

     * config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add
     AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
     AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
     AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG,
     AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END.
     (aarch64_init_memtag_builtins): New.
     (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
     (aarch64_general_init_builtins): Call
aarch64_init_memtag_builtins.
     (aarch64_expand_builtin_memtag): New.
     (aarch64_general_expand_builtin): Call
aarch64_expand_builtin_memtag.
     (AARCH64_BUILTIN_SUBCODE): New macro.
     (aarch64_resolve_overloaded_memtag): New.
     (aarch64_resolve_overloaded_builtin): New hook. Call
     aarch64_resolve_overloaded_memtag to handle overloaded MTE
builtins.
     * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins):
Define
     __ARM_FEATURE_MEMORY_TAGGING when enabled.
     * config/aarch64/aarch64-protos.h
(aarch64_resolve_overloaded_builtin):
     Add declaration.
     * config/aa

Re: [PATCH, GCC/ARM, 4/10] Clear GPR with CLRM

2019-11-12 Thread Kyrill Tkachov

Hi Mihail,

On 10/23/19 10:26 AM, Mihail Ionescu wrote:

[PATCH, GCC/ARM, 4/10] Clear GPR with CLRM

Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to improve
code density of functions with the cmse_nonsecure_entry attribute and
of calls to functions with the cmse_nonsecure_call attribute by using
CLRM to do all the general-purpose register clearing as well as
clearing the APSR register.

=== Patch description ===

This patch adds a new pattern for the CLRM instruction and guards the
current clearing code in output_return_instruction() and thumb_exit()
on Armv8.1-M Mainline instructions not being present.
cmse_clear_registers () is then modified to use the new CLRM instruction
when targeting Armv8.1-M Mainline while keeping Armv8-M register
clearing code for VFP registers.

For the CLRM instruction, which does not mandate APSR in the register
list, checking whether it is the right volatile unspec or a clearing
register is done in clear_operation_p.

Note that load/store multiple were deemed sufficiently different in
terms of RTX structure compared to the CLRM pattern for a different
function to be used to validate the match_parallel.
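
For a rough picture of the intended effect (an illustrative, hand-written sequence, not taken from the patch or its tests), an Armv8-M entry-function return that used to clear registers one at a time:

	mov	r1, lr
	mov	r2, lr
	mov	r3, lr
	msr	APSR_nzcvq, lr
	bxns	lr

can now clear them, APSR included, with a single list-based instruction:

	clrm	{r1, r2, r3, APSR}
	bxns	lr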

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * config/arm/arm-protos.h (clear_operation_p): Declare.
    * config/arm/arm.c (clear_operation_p): New function.
    (cmse_clear_registers): Generate clear_multiple instruction pattern if
    targeting Armv8.1-M Mainline or successor.
    (output_return_instruction): Only output APSR register clearing if
    Armv8.1-M Mainline instructions not available.
    (thumb_exit): Likewise.
    * config/arm/predicates.md (clear_multiple_operation): New predicate.
    * config/arm/thumb2.md (clear_apsr): New define_insn.
    (clear_multiple): Likewise.
    * config/arm/unspecs.md (VUNSPEC_CLRM_APSR): New volatile unspec.

*** gcc/testsuite/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * gcc.target/arm/cmse/bitfield-1.c: Add check for CLRM.
    * gcc.target/arm/cmse/bitfield-2.c: Likewise.
    * gcc.target/arm/cmse/bitfield-3.c: Likewise.
    * gcc.target/arm/cmse/struct-1.c: Likewise.
    * gcc.target/arm/cmse/cmse-14.c: Likewise.
    * gcc.target/arm/cmse/cmse-1.c: Likewise.  Restrict checks for Armv8-M
    GPR clearing when CLRM is not available.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-9.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-5.c: likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-7.c: likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-8.c: likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/union-1.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/union-2.c: Likewise.

Testing: bootstrapped on arm-linux-gnueabihf and testsuite shows no
regression.

Is this ok for trunk?

Best regards,

Mihail


### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index f995974f9bb89ab3c7ff0888c394b0dfaf7da60c..1a948d2c97526ad7e67e8d4a610ac74cfdb13882 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -77,6 +77,7 @@ extern int thumb_legitimate_offset_p (machine_mode, HOST_WIDE_INT);
 
 extern int thumb1_legitimate_address_p (machine_mode, rtx, int);

Re: [PATCH, GCC/ARM, 5/10] Clear VFP registers with VSCCLRM

2019-11-12 Thread Kyrill Tkachov

Hi Mihail,

On 10/23/19 10:26 AM, Mihail Ionescu wrote:

[PATCH, GCC/ARM, 5/10] Clear VFP registers with VSCCLRM

Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to improve
code density of functions with the cmse_nonsecure_entry attribute and
of calls to functions with the cmse_nonsecure_call attribute by using
VSCCLRM to do all the VFP register clearing as well as clearing the VPR
register.

=== Patch description ===

This patch adds a new pattern for the VSCCLRM instruction.
cmse_clear_registers () is then modified to use the new VSCCLRM
instruction when targeting Armv8.1-M Mainline, thus making the Armv8-M
register clearing code specific to Armv8-M.

Since the VSCCLRM instruction mandates VPR in the register list, the
pattern is encoded with a parallel which only requires an unspecified
VUNSPEC_VSCCLRM_VPR constant modelling the VPR clearing.  Other
expressions in the parallel are expected to be set expressions for
clearing the VFP registers.
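
Illustratively (a hand-written sketch of the intended output, not quoted from the patch), the VFP clearing then collapses from a per-register sequence of vmov/vmsr instructions into a single

	vscclrm	{s0-s31, VPR}

which clears the VPR along with the VFP registers, as the instruction mandates.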

I see we don't represent the VPR here as a register and use an UNSPEC to
represent its clearing.

That's okay for now, but when we do add support for it for MVE we'll need
to adjust the RTL representation here to show its clobbering.



ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * config/arm/arm-protos.h (clear_operation_p): Adapt prototype.
    * config/arm/arm.c (clear_operation_p): Extend to be able to check a
    clear_vfp_multiple pattern based on a new vfp parameter.
    (cmse_clear_registers): Generate VSCCLRM to clear VFP registers when
    targeting Armv8.1-M Mainline.
    (cmse_nonsecure_entry_clear_before_return): Clear VFP registers
    unconditionally when targeting Armv8.1-M Mainline architecture.  Check
    whether VFP registers are available before looking call_used_regs for a
    VFP register.
    * config/arm/predicates.md (clear_multiple_operation): Adapt to change
    of prototype of clear_operation_p.
    (clear_vfp_multiple_operation): New predicate.
    * config/arm/unspecs.md (VUNSPEC_VSCCLRM_VPR): New volatile unspec.
    * config/arm/vfp.md (clear_vfp_multiple): New define_insn.


Ok.

Thanks,

kyrill




*** gcc/testsuite/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * gcc.target/arm/cmse/bitfield-1.c: Add check for VSCCLRM.
    * gcc.target/arm/cmse/bitfield-2.c: Likewise.
    * gcc.target/arm/cmse/bitfield-3.c: Likewise.
    * gcc.target/arm/cmse/cmse-1.c: Likewise.
    * gcc.target/arm/cmse/struct-1.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-5.c: Likewise.

Testing: Bootstrapped on arm-linux-gnueabihf and testsuite shows no
regression.

Is this ok for trunk?

Best regards,

Mihail


### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 1a948d2c97526ad7e67e8d4a610ac74cfdb13882..37a46982bbc1a8f17abe2fc76ba3cb7d65257c0d 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -77,7 +77,7 @@ extern int thumb_legitimate_offset_p (machine_mode, HOST_WIDE_INT);

 extern int thumb1_legitimate_address_p (machine_mode, rtx, int);
 extern bool ldm_stm_operation_p (rtx, bool, machine_mode mode,
  bool, bool);
-extern bool clear_operation_p (rtx);
+extern bool clear_operation_p (rtx, bool);
 extern int arm_const_double_rtx (rtx);
 extern int vfp3_const_double_rtx (rtx);
 extern int neon_immediate_valid_for_move (rtx, machine_mode, rtx *, int *);

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index f1f730cecff0fb3da7115ea1147dc8b9ab7076b7..5f3ce5c4605f609d1a0e31c0f697871266bdf835 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -13499,8 +13499,9 @@ ldm_stm_operation_p (rtx op, bool load, machine_mode mode,

   return true;
 }

-/* Checks whether OP is a valid parallel pattern for a CLRM insn.  To be a
-   valid CLRM pattern, OP must have the following form:
+/* Checks whether OP is a valid parallel pattern for a CLRM (if VFP is false)

Re: [PATCH, GCC/ARM, 6/10] Clear GPRs inline when calling nscall function

2019-11-12 Thread Kyrill Tkachov

Hi Mihail,

On 10/23/19 10:26 AM, Mihail Ionescu wrote:

[PATCH, GCC/ARM, 6/10] Clear GPRs inline when calling nscall function

Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to generate
inline callee-saved register clearing when calling a function with the
cmse_nonsecure_call attribute with the ultimate goal of having the whole
call sequence inline.

=== Patch description ===

Besides changing the set of registers that needs to be cleared inline,
this patch also generates the push and pop needed to save and restore
callee-saved registers inline, without trusting the callee.  To make the
code more future-proof, this (currently) Armv8.1-M-specific behavior is
expressed in terms of clearing of callee-saved registers rather than
directly based on the targets.

The patch contains 1 subtlety:

Debug information is disabled for push and pop because the
REG_CFA_RESTORE notes used to describe popping of registers do not stack.
Instead, they just reset the debug state for the register to the one at
the beginning of the function, which is incorrect for a register that is
pushed twice (in prologue and before a nonsecure call) and then popped
for the first time.  In particular, this occasionally trips CFI note
creation code when there are two codepaths to the epilogue, one of which
does not go through the nonsecure call.  Obviously this means that
debugging between the push and pop is not reliable.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * config/arm/arm.c (arm_emit_multi_reg_pop): Declare early.
    (cmse_nonsecure_call_clear_caller_saved): Rename into ...
    (cmse_nonsecure_call_inline_register_clear): This.  Save and clear
    callee-saved GPRs as well as clear ip register before doing a nonsecure
    call then restore callee-saved GPRs after it when targeting
    Armv8.1-M Mainline.
    (arm_reorg): Adapt to function rename.

*** gcc/testsuite/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * gcc.target/arm/cmse/cmse-1.c: Add check for PUSH and POP and update
    CLRM check.
    * gcc.target/arm/cmse/cmse-14.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-9.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-and-union.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft-sp/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft-sp/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/union-1.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/union-2.c: Likewise.

Testing: bootstrapped on arm-linux-gnueabihf and testsuite shows no
regression.

Is this ok for trunk?



This is ok.

I think you should get commit access to GCC by now.

Please fill in the form at https://sourceware.org/cgi-bin/pdw/ps_form.cgi

listing me as the approver (using my details from the MAINTAINERS file).

Of course, only commit this once the whole series is approved ;)

Thanks,

Kyrill



Best regards,

Mihail


### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index fca10801c87c5e635d573c0fbdc47a1ae229d0ef..12b4b42a66b0c5589690d9a2d8cf8e42712ca2c0 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -187,6 +187,7 @@ static int arm_memory_move_cost (machine_mode, reg_class_t, bool);

 static void emit_constant_insn (rtx cond, rtx pattern);
 static rtx_insn *emit_set_insn (rtx, rtx);
 static rtx emit_multi_reg_push (unsigned long, unsigned long);
+static void arm_emit_multi_reg_pop (unsigned long);
 static int arm_arg_partial_bytes (cum

Re: [PATCH, GCC/ARM, 7/10] Clear all VFP regs inline in hardfloat nscall functions

2019-11-12 Thread Kyrill Tkachov



On 10/23/19 10:26 AM, Mihail Ionescu wrote:
[PATCH, GCC/ARM, 7/10] Clear all VFP regs inline in hardfloat nscall functions


Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to generate
inline instructions to save, clear and restore callee-saved VFP
registers before doing a call to a function with the cmse_nonsecure_call
attribute.

=== Patch description ===

The patch is fairly straightforward in its approach and consists of the
following 3 logical changes:
- abstract the number of floating-point register to clear in
  max_fp_regno
- use max_fp_regno to decide how many registers to clear so that the
  same code works for Armv8-M and Armv8.1-M Mainline
- emit vpush and vpop instruction respectively before and after a
  nonsecure call

Note that, as in the patch to clear GPRs inline, debug information has to
be disabled for VPUSH and VPOP due to VPOP adding a CFA adjustment note
for SP when R7 is sometimes used as CFA.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * config/arm/arm.c (vfp_emit_fstmd): Declare early.
    (arm_emit_vfp_multi_reg_pop): Likewise.
    (cmse_nonsecure_call_inline_register_clear): Abstract number of VFP
    registers to clear in max_fp_regno.  Emit VPUSH and VPOP to save and
    restore callee-saved VFP registers.

*** gcc/testsuite/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-13.c: Add check for
    VPUSH and VPOP and update expectation for VSCCLRM.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-8.c: Likewise.

Testing: bootstrapped on arm-linux-gnueabihf and testsuite shows no
regression.

Is this ok for trunk?


Ok.

Thanks,

Kyrill



Best regards,

Mihail


### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index c24996897eb21c641914326f7064a26bbb363411..bcc86d50a10f11d9672258442089a0aa5c450b2f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -188,6 +188,8 @@ static void emit_constant_insn (rtx cond, rtx pattern);

 static rtx_insn *emit_set_insn (rtx, rtx);
 static rtx emit_multi_reg_push (unsigned long, unsigned long);
 static void arm_emit_multi_reg_pop (unsigned long);
+static int vfp_emit_fstmd (int, int);
+static void arm_emit_vfp_multi_reg_pop (int, int, rtx);
 static int arm_arg_partial_bytes (cumulative_args_t,
   const function_arg_info &);
 static rtx arm_function_arg (cumulative_args_t, const 
function_arg_info &);

@@ -17834,8 +17836,10 @@ cmse_nonsecure_call_inline_register_clear (void)
   unsigned address_regnum, regno;
   unsigned max_int_regno =
 clear_callee_saved ? IP_REGNUM : LAST_ARG_REGNUM;
+ unsigned max_fp_regno =
+   TARGET_HAVE_FPCTX_CMSE ? LAST_VFP_REGNUM : D7_VFP_REGNUM;
   unsigned maxregno =
-   TARGET_HARD_FLOAT_ABI ? D7_VFP_REGNUM : max_int_regno;
+   TARGET_HARD_FLOAT_ABI ? max_fp_regno : max_int_regno;
   auto_sbitmap to_clear_bitmap (maxregno + 1);
   rtx_insn *seq;
   rtx pat, call, unspec, clearing_reg, ip_reg, shift;
@@ -17883,7 +17887,7 @@ cmse_nonsecure_call_inline_register_clear (void)

   bitmap_clear (float_bitmap);
   bitmap_set_range (float_bitmap, FIRST_VFP_REGNUM,
-   D7_VFP_REGNUM - FIRST_VFP_REGNUM + 1);
+   max_fp_regno - FIRST_VFP_REGNUM + 1);
   bitmap_ior (to_clear_bitmap, to_clear_bitmap, 
float_bitmap);

 }

@@ -17960,6 +17964,16 @@ cmse_nonsecure_call_inline_register_clear (void)
   /* Disable frame debug info in push because it needs to be
  disabled for pop (see below).  */
   RTX_FRAME_RELATED_P (push_insn) = 0;
+
+ /* Save VFP callee-saved registers.  */
+ if (TARGET_HARD_FLOAT_ABI)
+   {
+ vfp_emit_fstmd (D7_VFP_REGNUM + 1,
+ (max_fp_regno - D7_VFP_REGNUM) / 2);
+	  /* Disable frame debug info in push because it needs to be
+	     disabled for vpop (see below).  */
+	  RTX_FRAME_RELATED_P (get_last_insn ()) = 0;
+   }
 }

   /* Clear caller-saved registers that leak before doing a 
non-secure

@@ -17974,9 +17988,25 @@ cmse_nonsecure_call_inline_register_clear (void)

   if (TARGET_HAVE_FPCTX_CMSE)
 

Re: [PATCH, GCC/ARM, 8/10] Do lazy store & load inline when calling nscall function

2019-11-12 Thread Kyrill Tkachov

Hi Mihail,

On 10/23/19 3:24 PM, Mihail Ionescu wrote:
[PATCH, GCC/ARM, 8/10] Do lazy store & load inline when calling nscall function


Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to generate
lazy store and load instruction inline when calling a function with the
cmse_nonsecure_call attribute with the soft or softfp floating-point
ABI.

=== Patch description ===

This patch adds two new patterns for the VLSTM and VLLDM instructions.
cmse_nonsecure_call_inline_register_clear is then modified to
generate VLSTM and VLLDM respectively before and after calls to
functions with the cmse_nonsecure_call attribute in order to have lazy
saving, clearing and restoring of VFP registers.  Since these
instructions do not do writeback of the base register, the stack is
adjusted prior to the lazy store and after the lazy load, with
appropriate frame debug notes to describe the effect on the CFA
register.

As with CLRM, VSCCLRM and VSTR/VLDR, the instruction is modeled as an
unspecified operation to the memory pointed to by the base register.
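
The resulting call sequence looks roughly as follows (a hand-written illustration, not compiler output; 136 is the VFP regs + FPSCR + VPR frame size computed in the patch):

	sub	sp, sp, #136	@ reserve the lazy save area
	vlstm	sp		@ lazy state preservation, no writeback
	bl	__gnu_cmse_nonsecure_call
	vlldm	sp		@ lazy state restore
	add	sp, sp, #136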

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * config/arm/arm.c (arm_add_cfa_adjust_cfa_note): Declare early.
    (cmse_nonsecure_call_inline_register_clear): Define new lazy_fpclear
    variable as true when floating-point ABI is not hard.  Replace
    check against TARGET_HARD_FLOAT_ABI by checks against lazy_fpclear.
    Generate VLSTM and VLLDM instruction respectively before and
    after a function call to cmse_nonsecure_call function.
    * config/arm/unspecs.md (VUNSPEC_VLSTM): Define unspec.
    (VUNSPEC_VLLDM): Likewise.
    * config/arm/vfp.md (lazy_store_multiple_insn): New define_insn.
    (lazy_load_multiple_insn): Likewise.

*** gcc/testsuite/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-13.c: Add check for
    VLSTM and VLLDM.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-8.c: Likewise.

Testing: bootstrapped on arm-linux-gnueabihf and testsuite shows no
regression.

Is this ok for trunk?

Best regards,

Mihail


### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index bcc86d50a10f11d9672258442089a0aa5c450b2f..b10f996c023e830ca24ff83fcbab335caf85d4cb 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -186,6 +186,7 @@ static int arm_register_move_cost (machine_mode, reg_class_t, reg_class_t);

 static int arm_memory_move_cost (machine_mode, reg_class_t, bool);
 static void emit_constant_insn (rtx cond, rtx pattern);
 static rtx_insn *emit_set_insn (rtx, rtx);
+static void arm_add_cfa_adjust_cfa_note (rtx, int, rtx, rtx);
 static rtx emit_multi_reg_push (unsigned long, unsigned long);
 static void arm_emit_multi_reg_pop (unsigned long);
 static int vfp_emit_fstmd (int, int);
@@ -17830,6 +17831,9 @@ cmse_nonsecure_call_inline_register_clear (void)
   FOR_BB_INSNS (bb, insn)
 {
   bool clear_callee_saved = TARGET_HAVE_FPCTX_CMSE;
+ /* frame = VFP regs + FPSCR + VPR.  */
+ unsigned lazy_store_stack_frame_size =
+   (LAST_VFP_REGNUM - FIRST_VFP_REGNUM + 1 + 2) * UNITS_PER_WORD;
   unsigned long callee_saved_mask =
 ((1 << (LAST_HI_REGNUM + 1)) - 1)
 & ~((1 << (LAST_ARG_REGNUM + 1)) - 1);
@@ -17847,7 +17851,7 @@ cmse_nonsecure_call_inline_register_clear (void)
   CUMULATIVE_ARGS args_so_far_v;
   cumulative_args_t args_so_far;
   tree arg_type, fntype;
- bool first_param = true;
+ bool first_param = true, lazy_fpclear = !TARGET_HARD_FLOAT_ABI;
   function_args_iterator args_iter;
   uint32_t padding_bits_to_clear[4] = {0U, 0U, 0U, 0U};

@@ -17881,7 +17885,7 @@ cmse_nonsecure_call_inline_register_clear (void)
 	 -mfloat-abi=hard.  For -mfloat-abi=softfp we will be using the
 	 lazy store and loads which clear both caller- and callee-saved
 	 registers.  */
+ if (!lazy_fpclear)
 {
   auto_sbitmap float_bitmap (maxregno + 1);

@@ -17965,8 +17969,23 @@ cmse_nonsecure_call_inline_register_clear (void)
  disabled for pop (see below).  */
   RTX_FRAME_RELATED_P (push

Re: [PATCH, GCC/ARM, 9/10] Call nscall function with blxns

2019-11-12 Thread Kyrill Tkachov



On 10/23/19 10:26 AM, Mihail Ionescu wrote:

[PATCH, GCC/ARM, 9/10] Call nscall function with blxns

Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to call
functions with the cmse_nonsecure_call attribute directly using blxns
with no undue restriction on the register used for that.

=== Patch description ===

This change to use BLXNS to call a nonsecure function from secure
directly (not using a libcall) is made in 2 steps:
- change nonsecure_call patterns to use blxns instead of calling
  __gnu_cmse_nonsecure_call
- loosen requirement for function address to allow any register when
  doing BLXNS.

The former is a straightforward check over whether instructions added in
Armv8.1-M Mainline are available, while the latter consists in making the
nonsecure call pattern accept any register by using match_operand and
changing the nonsecure_call_internal expander to not force r4 when
targeting Armv8.1-M Mainline.

The tricky bit is actually in the test update, specifically how to check
that register lists for CLRM have all registers except for the one
holding parameters (already done) and the one holding the address used
by BLXNS. This is achieved with 3 scan-assembler directives.

1) The first one lists all registers that can appear in CLRM but makes
   each of them optional.
   Property guaranteed: no wrong register is cleared and none appears
   twice in the register list.
2) The second directive checks that the CLRM is made of a fixed number
   of the right registers to be cleared.  The number used is the number
   of registers that could contain a secret minus one (used to hold the
   address of the function to call).
   Property guaranteed: register list has the right number of registers.
   Cumulated property guaranteed: only registers with a potential secret
   are cleared and they are all listed but one.
3) The last directive checks that we cannot find a CLRM with a register
   in it that also appears in BLXNS.  This is checked via the use of a
   back-reference on any of the allowed registers in CLRM, the
   back-reference enforcing that whatever register matches in CLRM must
   be the same in the BLXNS.
   Property guaranteed: register used for BLXNS is different from
   registers cleared in CLRM.

Some more care needs to happen for the gcc.target/arm/cmse/cmse-1.c
testcase due to there being two CLRM generated.  To ensure the third
directive matches the right CLRM to the BLXNS, a negative lookahead is
used between the CLRM register list and the BLXNS.  The way a negative
lookahead works is by matching the *position* where a given regular
expression does not match.  In this case, since it comes after the CLRM
register list it is requesting that what comes after the register list
does not have a CLRM again followed by BLXNS.  This guarantees that the
.*blxns after it only matches a blxns without another CLRM before.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * config/arm/arm.md (nonsecure_call_internal): Do not force memory
    address in r4 when targeting Armv8.1-M Mainline.
    (nonsecure_call_value_internal): Likewise.
    * config/arm/thumb2.md (nonsecure_call_reg_thumb2): Make memory address
    a register match_operand again.  Emit BLXNS when targeting
    Armv8.1-M Mainline.
    (nonsecure_call_value_reg_thumb2): Likewise.

*** gcc/testsuite/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * gcc.target/arm/cmse/cmse-1.c: Add check for BLXNS when instructions
    introduced in Armv8.1-M Mainline Security Extensions are available and
    restrict checks for libcall to __gnu_cmse_nonsecure_call to Armv8-M
    targets only.  Adapt CLRM check to verify register used for BLXNS is
    not in the CLRM register list.
    * gcc.target/arm/cmse/cmse-14.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c: Likewise and adapt
    check for LSB clearing bit to be using the same register as BLXNS when
    targeting Armv8.1-M Mainline.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-9.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-and-union.c: 
Likewise.

    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainlin

Re: [PATCH, GCC/ARM, 10/10] Enable -mcmse

2019-11-12 Thread Kyrill Tkachov



On 10/23/19 10:26 AM, Mihail Ionescu wrote:

[PATCH, GCC/ARM, 10/10] Enable -mcmse

Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to enable the
-mcmse option now that support for Armv8.1-M Security Extension is
complete.

=== Patch description ===

The patch is straightforward: it redefines ARMv8_1m_main as having the
same features as ARMv8m_main (and thus as having the cmse feature) with
the extra features represented by armv8_1m_main.  It also removes the
error for using -mcmse on Armv8.1-M Mainline.

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * config/arm/arm-cpus.in (ARMv8_1m_main): Redefine as an extension to
    Armv8-M Mainline.
    * config/arm/arm.c (arm_options_perform_arch_sanity_checks): Remove
    error for using -mcmse when targeting Armv8.1-M Mainline.

Testing: bootstrapped on arm-linux-gnueabihf and testsuite shows no
regression.

Is this ok for trunk?


Ok once the rest of the series is in.

Does this need some documentation though?

Thanks,

Kyrill



Best regards,

Mihail


### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 652f2a4be9388fd7a74f0ec4615a292fd1cfcd36..a845dd2f83a38519a1387515a2d4646761fb405f 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -259,10 +259,7 @@ define fgroup ARMv8_5a    ARMv8_4a armv8_5 sb predres
 define fgroup ARMv8m_base ARMv6m armv8 cmse tdiv
 define fgroup ARMv8m_main ARMv7m armv8 cmse
 define fgroup ARMv8r  ARMv8a
-# Feature cmse is omitted to disable Security Extensions support while secure
-# code compiled by GCC does not preserve FP context as allowed by Armv8.1-M
-# Mainline.
-define fgroup ARMv8_1m_main ARMv7m armv8 armv8_1m_main
+define fgroup ARMv8_1m_main ARMv8m_main armv8_1m_main

 # Useful combinations.
 define fgroup VFPv2 vfpv2
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index cabcce8c8bd11c5ff3516c3102c0305b865b00cb..0f19b4eb4ec4fcca2df10e1b8e0b79d1a1e0a93d 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3742,9 +3742,6 @@ arm_options_perform_arch_sanity_checks (void)
   if (!arm_arch4 && arm_fp16_format != ARM_FP16_FORMAT_NONE)
 sorry ("__fp16 and no ldrh");

-  if (use_cmse && arm_arch8_1m_main)
-    error ("ARMv8.1-M Mainline Security Extensions is unsupported");
-
   if (use_cmse && !arm_arch_cmse)
 error ("target CPU does not support ARMv8-M Security Extensions");




Re: [PATCH 2/2] [ARM] Add support for -mpure-code in thumb-1 (v6m)

2019-11-12 Thread Kyrill Tkachov

Hi Christophe,

On 10/18/19 2:18 PM, Christophe Lyon wrote:

Hi,

This patch extends support for -mpure-code to all thumb-1 processors,
by removing the need for MOVT.

Symbol addresses are built using upper8_15, upper0_7, lower8_15 and
lower0_7 relocations, and constants are built using sequences of
movs/adds and lsls instructions.
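
For instance (a hand-worked illustration, not output quoted from this message), materializing the constant 0x12345600 in r3 with that scheme gives:

	movs	r3, #0x12
	lsls	r3, #8
	adds	r3, #0x34
	lsls	r3, #8
	adds	r3, #0x56
	lsls	r3, #8

with the final zero byte handled by the last shift alone, since the new thumb1_gen_const_int skips zero bytes.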

The extension of the *thumb1_movhf pattern always uses the same size
(6), although it can emit a shorter sequence when possible.  This is
similar to what *arm32_movhf already does.

CASE_VECTOR_PC_RELATIVE is now false with -mpure-code, to avoid
generating invalid assembly code with differences between symbols from
two different sections (the difference cannot be computed by the
assembler).

Tests pr45701-[12].c needed a small adjustment to avoid matching
upper8_15 when looking for the r8 register.

Test no-literal-pool.c is augmented with __fp16, so it now uses
-mfp16-format=ieee.

Test thumb1-Os-mult.c generates an inline code sequence with
-mpure-code and computes the multiplication by using a sequence of
add/shift rather than using the multiply instruction, so we skip it in
presence of -mpure-code.

With -mcpu=cortex-m0, the pure-code/no-literal-pool.c fails because
code like:
static char *p = "Hello World";
char *
testchar ()
{
  return p + 4;
}
generates 2 indirections (I removed non-essential directives/code)
    .section    .rodata
.LC0:
.ascii  "Hello World\000"
.data
p:
.word   .LC0
.section    .rodata
.LC2:
.word   p
.section .text,"0x2006",%progbits
testchar:
push    {r7, lr}
add r7, sp, #0
movs    r3, #:upper8_15:#.LC2
lsls    r3, #8
adds    r3, #:upper0_7:#.LC2
lsls    r3, #8
adds    r3, #:lower8_15:#.LC2
lsls    r3, #8
adds    r3, #:lower0_7:#.LC2
ldr r3, [r3]
ldr r3, [r3]
adds    r3, r3, #4
movs    r0, r3
mov sp, r7
@ sp needed
pop {r7, pc}

By contrast, when using -mcpu=cortex-m4, the code looks like:
    .section    .rodata
.LC0:
.ascii  "Hello World\000"
.data
p:
.word   .LC0
testchar:
push    {r7}
add r7, sp, #0
movw    r3, #:lower16:p
movt    r3, #:upper16:p
ldr r3, [r3]
adds    r3, r3, #4
mov r0, r3
mov sp, r7
pop {r7}
bx  lr

I haven't yet found how to make code for cortex-m0 apply upper/lower
relocations to "p" instead of .LC2.  The current code looks functional,
but could be improved.

OK as-is?

Thanks,

Christophe



diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index f995974..beb8411 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -66,6 +66,7 @@ extern bool arm_small_register_classes_for_mode_p 
(machine_mode);
 extern int const_ok_for_arm (HOST_WIDE_INT);
 extern int const_ok_for_op (HOST_WIDE_INT, enum rtx_code);
 extern int const_ok_for_dimode_op (HOST_WIDE_INT, enum rtx_code);
+extern void thumb1_gen_const_int (rtx, HOST_WIDE_INT);
 extern int arm_split_constant (RTX_CODE, machine_mode, rtx,
   HOST_WIDE_INT, rtx, rtx, int);
 extern int legitimate_pic_operand_p (rtx);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 9f0975d..836f147 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2882,13 +2882,19 @@ arm_option_check_internal (struct gcc_options *opts)
 {
   const char *flag = (target_pure_code ? "-mpure-code" :
 "-mslow-flash-data");
+  bool not_supported = arm_arch_notm || flag_pic || TARGET_NEON;
 
-  /* We only support -mpure-code and -mslow-flash-data on M-profile targets
-with MOVT.  */
-  if (!TARGET_HAVE_MOVT || arm_arch_notm || flag_pic || TARGET_NEON)
+  /* We only support -mslow-flash-data on M-profile targets with
+MOVT.  */
+  if (target_slow_flash_data && (!TARGET_HAVE_MOVT || not_supported))
error ("%s only supports non-pic code on M-profile targets with the "
   "MOVT instruction", flag);
 
+  /* We only support -mpure-code-flash-data on M-profile
+targets.  */


Typo in the option name.

+  if (target_pure_code && not_supported)
+   error ("%s only supports non-pic code on M-profile targets", flag);
+
   /* Cannot load addresses: -mslow-flash-data forbids literal pool and
 -mword-relocations forbids relocation of MOVT/MOVW.  */
   if (target_word_relocations)
@@ -4400,6 +4406,38 @@ const_ok_for_dimode_op (HOST_WIDE_INT i, enum rtx_code 
code)
 }
 }
 
+/* Emit a sequence of movs/adds/shift to produce a 32-bit constant.
+   Avoid generating useless code when one of the bytes is zero.  */
+void
+thumb1_gen_const_int (rtx op0, HOST_WIDE_INT op1)
+{
+  bool mov_done_p = false;
+  int i;
+
+  /* Emit upper 3 bytes if needed.  */
+  for (i = 0; i < 3; i++)
+{
+  int byte = (op1 >> (8 * (3 - i))) & 0xff;
+
+  if (byte)
+   {
+ emit_set_insn (op0, mov_done_p
+? gen_rtx_PLUS (SImode,op0, GEN_INT (byte))
+: GEN_INT (byte));
+ mov_done_p = true;
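
As a rough standalone sketch (a hypothetical model for illustration, not
the GCC internals), the byte-splitting scheme above corresponds to:

#include <stdio.h>
#include <stdint.h>

/* Hypothetical model: print the movs/adds/lsls sequence that builds
   VAL in a low register, skipping zero bytes as the patch does.  */
static void
print_const_sequence (uint32_t val)
{
  int mov_done_p = 0;

  /* Upper three bytes, most significant first.  */
  for (int i = 0; i < 3; i++)
    {
      int byte = (val >> (8 * (3 - i))) & 0xff;
      if (byte)
        {
          printf (mov_done_p ? "adds r3, #%d\n" : "movs r3, #%d\n", byte);
          mov_done_p = 1;
        }
      if (mov_done_p)
        printf ("lsls r3, #8\n");   /* shift what we have into place */
    }

  /* Lowest byte: always emit something if nothing was emitted yet.  */
  int byte = val & 0xff;
  if (byte || !mov_done_p)
    printf (mov_done_p ? "adds r3, #%d\n" : "movs r3, #%d\n", byte);
}

int
main (void)
{
  print_const_sequence (0x12345678);   /* movs/lsls/adds chain */
  return 0;
}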

Re: [PATCH][arm][1/X] Add initial support for saturation intrinsics

2019-11-12 Thread Kyrill Tkachov

Hi Christophe,

On 11/12/19 10:29 AM, Christophe Lyon wrote:

On Thu, 7 Nov 2019 at 11:26, Kyrill Tkachov  wrote:

Hi all,

This patch adds the plumbing for and an implementation of the saturation
intrinsics from ACLE [1], in particular the __ssat, __usat intrinsics.
These intrinsics set the Q sticky bit in APSR if an overflow occurred.
ACLE allows the user to read that bit (within the same function, it's not
defined across function boundaries) using the __saturation_occurred
intrinsic
and reset it using __set_saturation_occurred.
Thus, if the user cares about the Q bit they would be using a flow such as:

__set_saturation_occurred (0); // reset the Q bit
...
__ssat (...) // Do some calculations involving __ssat
...
if (__saturation_occurred ()) // if Q bit set handle overflow
...
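
As a concrete sketch (the function name and saturation width here are
illustrative), such a flow could look like:

#include <arm_acle.h>
#include <stdint.h>

/* Saturate each element to 8 bits and report whether any input
   overflowed.  Minimal sketch; assumes a target where __ssat is
   available (e.g. a DSP-enabled M-profile core).  */
int
saturate_to_s8 (int32_t *dst, const int32_t *src, int n)
{
  __set_saturation_occurred (0);     /* reset the Q bit */
  for (int i = 0; i < n; i++)
    dst[i] = __ssat (src[i], 8);     /* saturate to [-128, 127] */
  return __saturation_occurred ();   /* Q bit set => overflow */
}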

For the implementation this has a few implications:
* We must track the Q-setting side-effects of these instructions to make
sure
saturation reading/writing intrinsics are ordered properly.
This is done by introducing a new "apsrq" register (and associated
APSRQ_REGNUM) in a similar way to the "fake" cc register.

* The RTL patterns coming out of these intrinsics can have two forms:
one where they set the APSRQ_REGNUM and one where they don't.
Which one is used depends on whether the function cares about reading the Q
flag. This is detected using the TARGET_CHECK_BUILTIN_CALL hook on the
__saturation_occurred, __set_saturation_occurred occurrences.
If no Q-flag read is present in the function we'll use the simpler
non-Q-setting form to allow for more aggressive scheduling and such.
If a Q-bit read is present then the Q-setting form is emitted.
To avoid adding two patterns for each intrinsic to the MD file we make
use of define_subst to auto-generate the Q-setting forms.

* Some existing patterns already produce instructions that may clobber the
Q bit, but they don't model it (as we didn't care about that bit up till
now).
Since these patterns can be generated from straight-line C code they can
affect
the Q-bit reads from intrinsics. Therefore they have to be disabled when
a Q-bit read is present.  These are mostly patterns in arm-fixed.md that are
not very common anyway, but there are also a couple of widening
multiply-accumulate patterns in arm.md that can set the Q-bit during
accumulation.

There are more Q-setting intrinsics in ACLE, but these will be
implemented in
a more mechanical fashion once the infrastructure in this patch goes in.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Committing to trunk.
Thanks,
Kyrill


2019-11-07  Kyrylo Tkachov  

  * config/arm/aout.h (REGISTER_NAMES): Add apsrq.
  * config/arm/arm.md (APSRQ_REGNUM): Define.
  (add_setq): New define_subst.
  (add_clobber_q_name): New define_subst_attr.
  (add_clobber_q_pred): Likewise.
  (maddhisi4): Change to define_expand.  Split into mult and add if
  ARM_Q_BIT_READ.
  (arm_maddhisi4): New define_insn.
  (*maddhisi4tb): Disable for ARM_Q_BIT_READ.
  (*maddhisi4tt): Likewise.
  (arm_ssat): New define_expand.
  (arm_usat): Likewise.
  (arm_get_apsr): New define_insn.
  (arm_set_apsr): Likewise.
  (arm_saturation_occurred): New define_expand.
  (arm_set_saturation): Likewise.
  (*satsi_): Rename to...
  (satsi_): ... This.
  (*satsi__shift): Disable for ARM_Q_BIT_READ.
  * config/arm/arm.h (FIXED_REGISTERS): Mark apsrq as fixed.
  (CALL_USED_REGISTERS): Mark apsrq.
  (FIRST_PSEUDO_REGISTER): Update value.
  (REG_ALLOC_ORDER): Add APSRQ_REGNUM.
  (machine_function): Add q_bit_access.
  (ARM_Q_BIT_READ): Define.
  * config/arm/arm.c (TARGET_CHECK_BUILTIN_CALL): Define.
  (arm_conditional_register_usage): Clear APSRQ_REGNUM from
  operand_reg_set.
  (arm_q_bit_access): Define.
  * config/arm/arm-builtins.c: Include stringpool.h.
  (arm_sat_binop_imm_qualifiers,
  arm_unsigned_sat_binop_unsigned_imm_qualifiers,
  arm_sat_occurred_qualifiers, arm_set_sat_qualifiers): Define.
  (SAT_BINOP_UNSIGNED_IMM_QUALIFIERS,
  UNSIGNED_SAT_BINOP_UNSIGNED_IMM_QUALIFIERS, SAT_OCCURRED_QUALIFIERS,
  SET_SAT_QUALIFIERS): Likewise.
  (arm_builtins): Define ARM_BUILTIN_SAT_IMM_CHECK.
  (arm_init_acle_builtins): Initialize __builtin_sat_imm_check.
  Handle 0 argument expander.
  (arm_expand_acle_builtin): Handle ARM_BUILTIN_SAT_IMM_CHECK.
  (arm_check_builtin_call): Define.
  * config/arm/arm.md (ssmulsa3, usmulusa3, usmuluha3,
  arm_ssatsihi_shift, arm_usatsihi): Disable when ARM_Q_BIT_READ.
  * config/arm/arm-protos.h (arm_check_builtin_call): Declare prototype.
  (arm_q_bit_access): Likewise.
  * config/arm/arm_acle.h (__ssat, __usat, __ignore_saturation,
  __saturation_occurred, __set_saturation_occurred): Define.
  * config/arm/arm_acle_builtins.def: Define builtins for ssat, usat,
  saturation_occurred, set_

Re: [PATCH][AArch64] Implement Armv8.5-A memory tagging (MTE) intrinsics

2019-11-12 Thread Kyrill Tkachov



On 11/12/19 3:50 PM, Dennis Zhang wrote:

Hi Kyrill,

On 12/11/2019 09:40, Kyrill Tkachov wrote:

Hi Dennis,

On 11/7/19 1:48 PM, Dennis Zhang wrote:

Hi Kyrill,

I have rebased the patch on top of current trunk.
For resolve_overloaded, I redefined my memtag overloading function to
fit the latest resolve_overloaded_builtin interface.

Regression tested again and survived for aarch64-none-linux-gnu.

Please reply inline rather than top-posting on gcc-patches.



Cheers
Dennis

Changelog is updated as following:

gcc/ChangeLog:

2019-11-07  Dennis Zhang  

 * config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add
 AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
 AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
 AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG,
 AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END.
 (aarch64_init_memtag_builtins): New.
 (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
 (aarch64_general_init_builtins): Call aarch64_init_memtag_builtins.
 (aarch64_expand_builtin_memtag): New.
 (aarch64_general_expand_builtin): Call aarch64_expand_builtin_memtag.
 (AARCH64_BUILTIN_SUBCODE): New macro.
 (aarch64_resolve_overloaded_memtag): New.
 (aarch64_resolve_overloaded_builtin_general): New hook. Call
 aarch64_resolve_overloaded_memtag to handle overloaded MTE builtins.
 * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
 __ARM_FEATURE_MEMORY_TAGGING when enabled.
 (aarch64_resolve_overloaded_builtin): Call
 aarch64_resolve_overloaded_builtin_general.
 * config/aarch64/aarch64-protos.h
 (aarch64_resolve_overloaded_builtin_general): New declaration.
 * config/aarch64/aarch64.h (AARCH64_ISA_MEMTAG): New macro.
 (TARGET_MEMTAG): Likewise.
 * config/aarch64/aarch64.md (define_c_enum "unspec"): Add
 UNSPEC_GEN_TAG, UNSPEC_GEN_TAG_RND, and UNSPEC_TAG_SPACE.
 (irg, gmi, subp, addg, ldg, stg): New instructions.
 * config/aarch64/arm_acle.h (__arm_mte_create_random_tag): New macro.
 (__arm_mte_exclude_tag, __arm_mte_increment_tag): Likewise.
 (__arm_mte_ptrdiff, __arm_mte_set_tag, __arm_mte_get_tag): Likewise.
 * config/aarch64/predicates.md (aarch64_memtag_tag_offset): New.
 (aarch64_granule16_uimm6, aarch64_granule16_simm9): New.
 * config/arm/types.md (memtag): New.
 * doc/invoke.texi (-memtag): Update description.

gcc/testsuite/ChangeLog:

2019-11-07  Dennis Zhang  

 * gcc.target/aarch64/acle/memtag_1.c: New test.
 * gcc.target/aarch64/acle/memtag_2.c: New test.
 * gcc.target/aarch64/acle/memtag_3.c: New test.


On 04/11/2019 16:40, Kyrill Tkachov wrote:

Hi Dennis,

On 10/17/19 11:03 AM, Dennis Zhang wrote:

Hi,

Arm Memory Tagging Extension (MTE) is published with Armv8.5-A.
It can be used for spatial and temporal memory safety detection and
as a lightweight lock-and-key system.

This patch enables new intrinsics leveraging MTE instructions to
implement the functionality of creating, setting, reading and
manipulating tags.
The intrinsics are part of Arm ACLE extension:
https://developer.arm.com/docs/101028/latest/memory-tagging-intrinsics
The MTE ISA specification can be found at
https://developer.arm.com/docs/ddi0487/latest chapter D6.
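
For illustration, a minimal sketch of using the tagging intrinsics (the
function name is hypothetical; assumes -march=armv8.5-a+memtag and an
execution environment with MTE enabled):

#include <arm_acle.h>

/* Tag one 16-byte granule: pick a random logical tag for P, write it
   as the granule's allocation tag, and return the tagged pointer.  */
void *
tag_granule (void *p)
{
  void *tagged = __arm_mte_create_random_tag (p, 0); /* exclude no tags */
  __arm_mte_set_tag (tagged);   /* store the allocation tag to memory */
  return tagged;                /* subsequent accesses must use this */
}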

Bootstrapped and regtested for aarch64-none-linux-gnu.

Please help to check if it's OK for trunk.


This looks mostly ok to me but for further review this needs to be
rebased on top of current trunk as there are some conflicts with the SVE
ACLE changes that recently went in. Most conflicts look trivial to
resolve, but one that needs more attention is the definition of the
TARGET_RESOLVE_OVERLOADED_BUILTIN hook.

Thanks,

Kyrill


Many Thanks
Dennis

gcc/ChangeLog:

2019-10-16  Dennis Zhang  

  * config/aarch64/aarch64-builtins.c (enum
aarch64_builtins): Add
  AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
  AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
  AARCH64_MEMTAG_BUILTIN_INC_TAG,
AARCH64_MEMTAG_BUILTIN_SET_TAG,
  AARCH64_MEMTAG_BUILTIN_GET_TAG, and
AARCH64_MEMTAG_BUILTIN_END.
  (aarch64_init_memtag_builtins): New.
  (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
  (aarch64_general_init_builtins): Call
aarch64_init_memtag_builtins.
  (aarch64_expand_builtin_memtag): New.
  (aarch64_general_expand_builtin): Call
aarch64_expand_builtin_memtag.
  (AARCH64_BUILTIN_SUBCODE): New macro.
  (aarch64_resolve_overloaded_memtag): New.
  (aarch64_resolve_overloaded_builtin): New hook. Call
  aarch64_resolve_overloaded_memtag to handle overloaded MTE
builtins.
  * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins):
Define
  __ARM_FEATURE_MEMORY_TAGGING when enabled.
  * config/aarch64/aarch64-protos.h
(aarch64_resolve_overloaded_builtin):
  Add declaration.
  *

[PATCH][wwwdocs] GCC 10 changes.html for arm and aarch64

2020-01-10 Thread Kyrill Tkachov

Hi all,

This patch adds initial entries for notable features that went into GCC 
10 on the arm and aarch64 front.
The list is by no means complete, so if you'd like your contribution
called out, please shout or post a follow-up patch.

It is, nevertheless, a decent start at the relevant sections in changes.html

Thanks,
Kyrill

commit b539d38b322883ed5aa6563ac879af6a5ebabd96
Author: Kyrylo Tkachov 
Date:   Thu Nov 7 17:58:45 2019 +

[arm/aarch64] GCC 10 changes.html

diff --git a/htdocs/gcc-10/changes.html b/htdocs/gcc-10/changes.html
index d6108269..8f498017 100644
--- a/htdocs/gcc-10/changes.html
+++ b/htdocs/gcc-10/changes.html
@@ -322,17 +322,102 @@ a work-in-progress.
 
 New Targets and Target Specific Improvements
 
-
+AArch64 & arm
+
+  The AArch64 and arm ports now support condition flag output constraints
+  in inline assembly, as indicated by the __GCC_ASM_FLAG_OUTPUTS__ macro.
+  On arm this feature is only available for A32 and T32 targets.
+  Please refer to the documentation for more details. 
+
+
+AArch64
+
+   The -mbranch-protection=pac-ret option now accepts the
+  optional argument +b-key extension to perform return address
+  signing with the B-key instead of the A-key.
+  
+  The Transactional Memory Extension is now supported through ACLE
+  intrinsics.  It can be enabled through the +tme option
+  extension (for example, -march=armv8.5-a+tme).
+  
+  Initial autovectorization support for SVE2 has been added and can be
+  enabled through the +sve2 option extension (for example,
+  -march=armv8.5-a+sve2).  Additional extensions can be enabled
+  through +sve2-sm4, +sve2-aes,
+  +sve2-sha3, +sve2-bitperm.
+  
+   A number of features from Armv8.5-A are now supported through ACLE
+  intrinsics.  These include:
+
+	The random number instructions that can be enabled
+	through the (already present in GCC 9.1) +rng option
+	extension.
+	Floating-point intrinsics to round to integer instructions from
+	Armv8.5-a when targeting -march=armv8.5-a or later.
+	Memory Tagging Extension intrinsics enabled through the
+	+memtag option extension.
+
+  
+   The option -moutline-atomics has been added to aid
+  deployment of the Large System Extensions (LSE) on GNU/Linux systems built
+  with a baseline architecture targeting Armv8-A.  When the option is
+  specified code is emitted to detect the presence of LSE instructions at
+  runtime and use them for standard atomic operations.
+  For more information please refer to the documentation.
+  
+  
+   Support has been added for the following processors
+   (GCC identifiers in parentheses):
+   
+ Arm Cortex-A77 (cortex-a77).
+	 Arm Cortex-A76AE (cortex-a76ae).
+	 Arm Cortex-A65 (cortex-a65).
+	 Arm Cortex-A65AE (cortex-a65ae).
+	 Arm Cortex-A34 (cortex-a34).
+   
+   The GCC identifiers can be used
+   as arguments to the -mcpu or -mtune options,
+   for example: -mcpu=cortex-a77 or
+   -mtune=cortex-a65ae or as arguments to the equivalent target
+   attributes and pragmas.
+  
+
 
 
 
-ARM
+arm
 
   Support for the FDPIC ABI has been added. It uses 64-bit
   function descriptors to represent pointers to functions, and enables
   code sharing on MMU-less systems. The corresponding target triple is
   arm-uclinuxfdpiceabi, and the C library is uclibc-ng.
   
+  Support has been added for the Arm EABI on NetBSD through the
+  arm*-*-netbsdelf-*eabi* triplet.
+  
+  The handling of 64-bit integer operations has been significantly reworked
+  and improved, leading to better performance and reduced stack usage when using
+  64-bit integral data types.  The option -mneon-for-64bits is now
+  deprecated and will be removed in a future release.
+  
+   Support has been added for the following processors
+   (GCC identifiers in parentheses):
+   
+ Arm Cortex-A77 (cortex-a77).
+	 Arm Cortex-A76AE (cortex-a76ae).
+	 Arm Cortex-M35P (cortex-m35p).
+   
+   The GCC identifiers can be used
+   as arguments to the -mcpu or -mtune options,
+   for example: -mcpu=cortex-a77 or
+   -mtune=cortex-m35p.
+  
+  Support has been extended for the ACLE
+  https://developer.arm.com/docs/101028/0009/data-processing-intrinsics";>
+  data-processing intrinsics to include 32-bit SIMD, saturating arithmetic,
+  16-bit multiplication and other related intrinsics aimed at DSP algorithm
+  optimization.
+   
 
 
 


Re: [Patch 0/X] HWASAN v3

2020-01-10 Thread Kyrill Tkachov



On 1/8/20 11:26 AM, Matthew Malcomson wrote:

Hi everyone,

I'm writing this email to summarise & publicise the state of this patch
series, especially the difficulties around approval for GCC 10 mentioned
on IRC.


The main obstacle seems to be that no maintainer feels they have enough
knowledge about hwasan and justification that it's worthwhile to approve
the patch series.

Similarly, Martin has given a review of the parts of the code he can
(thanks!), but doesn't feel he can do a deep review of the code related
to the RTL hooks and stack expansion -- hence that part is as yet not
reviewed in-depth.



The questions around justification raised on IRC are mainly that it
seems like a proof-of-concept for MTE rather than a stand-alone useable
sanitizer.  Especially since in the GNU world hwasan instrumented code
is not really ready for production since we can only use the
less-"interceptor ABI" rather than the "platform ABI".  This restriction
is because there is no version of glibc with the required modifications
to provide the "platform ABI".

(n.b. that since https://reviews.llvm.org/D69574 the code-generation for
these ABI's is the same).


 From my perspective the reasons that make HWASAN useful in itself are:

1) Much less memory usage.

 From a back-of-the-envelope calculation based on the hwasan paper's
table of memory overhead from over-alignment
(https://arxiv.org/pdf/1802.09517.pdf), I guess hwasan-instrumented code
has an overhead of about 1.1x (~4% from over-alignment and ~6.25% from
shadow memory), while asan seems to have an overhead somewhere in the
range 1.5x - 3x.
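
(For what it's worth, the 1.1x figure is consistent with compounding
those two components; the ~6.25% matches one shadow byte per 16-byte
granule. A quick back-of-the-envelope check:)

#include <stdio.h>

int
main (void)
{
  double overalign = 1.04;           /* ~4% from over-alignment */
  double shadow = 1.0 + 1.0 / 16;    /* 1 shadow byte per 16-byte granule */
  printf ("%.3f\n", overalign * shadow);   /* prints 1.105, i.e. ~1.1x */
  return 0;
}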

Maybe there's some data out there comparing total overheads that I
haven't found? (I'd appreciate a reference if anyone has that info).



2) Available on more architectures than MTE.

HWASAN only requires TBI, which is a feature of all AArch64 machines,
while MTE will be an optional extension and only available on certain
architectures.
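
(As a sketch of what TBI gives you: data accesses ignore pointer bits
56-63, so a tag can live in the top byte. Illustration only, assuming a
64-bit uintptr_t; dereferencing such pointers is valid only where TBI
applies to the address space:)

#include <stdint.h>

/* Place TAG in the top byte of P without changing the address bits.  */
static inline void *
set_pointer_tag (void *p, uint8_t tag)
{
  uintptr_t bits = (uintptr_t) p & ~((uintptr_t) 0xff << 56);
  return (void *) (bits | ((uintptr_t) tag << 56));
}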


3) This enables using hwasan in the kernel.

While instrumented user-space applications will be using the
"interceptor ABI" and hence are likely not production-quality, the
biggest aim of implementing hwasan in GCC is to allow building the Linux
kernel with tag-based sanitization using GCC.

Instrumented kernel code uses hooks in the kernel itself, so this ABI
distinction is no longer relevant, and this sanitizer should produce a
production-quality kernel binary.




I'm hoping I can find a maintainer willing to review and ACK this patch
series -- especially with stage3 coming to a close soon.  If there's
anything else I could do to help get someone willing up-to-speed then
please just ask.



FWIW I've reviewed the aarch64 parts over the lifetime of the patch 
series and I am okay with them.


Given the reviews of the sanitiser, library and aarch64 backend 
components, and the data at


https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00387.html

how can we move forward with commit approval? Is this something a
global reviewer can help with, Jeff? :)


Thanks,

Kyrill





Cheers,
Matthew



On 07/01/2020 15:14, Martin Liška wrote:
> On 12/12/19 4:18 PM, Matthew Malcomson wrote:
>
> Hello.
>
> I've just sent a few comments that are related to the v3 of the patch set.
> Based on my (limited) HWASAN knowledge, the patch seems reasonable
> to me.

> I haven't looked much at the newly introduced RTL-hooks.
> But these seems to me isolated to the aarch64 port.
>
> I can also verify that the patchset works on my aarch64 linux machine
> and hwasan.exp and asan.exp tests succeed.
>
>> I haven't gotten ASAN_MARK to print as HWASAN_MARK when using memory
>> tagging,
>> since I'm not sure the way I found to implement this would be
>> acceptable.  The
>> inlined patch below works but it requires a special declaration
>> instead of just
>> an ~#include~.
>
> Knowing that, I would not bother with the printing of HWASAN_MARK.
>
> Thanks for the series,
> Martin



Re: [GCC][PATCH][ARM] Add Bfloat16_t scalar type, vector types and machine modes to ARM back-end [1/2]

2020-01-13 Thread Kyrill Tkachov

Hi Stam,

On 1/10/20 6:45 PM, Stam Markianos-Wright wrote:

Hi all,

This is a respin of patch:

https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html

which has now been split into two (similar to the Aarch64 version).

This is patch 1 of 2 and adds Bfloat type support to the ARM back-end.
It also adds a new machine_mode (BFmode) for this type and accompanying
vector modes V4BFmode and V8BFmode.

The second patch in this series uses existing target hooks to restrict 
type use.


Regression testing on arm-none-eabi passed successfully.

This patch depends on:

https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html

for test suite effective_target update.

Ok for trunk?


This is ok, thanks.

You can commit it once the git conversion goes through :)

Kyrill




Cheers,
Stam


ACLE documents are at https://developer.arm.com/docs/101028/latest
ISA documents are at https://developer.arm.com/docs/ddi0596/latest

Details on ARM Bfloat can be found here:
https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a 





gcc/ChangeLog:

2020-01-10  Stam Markianos-Wright 

    * config.gcc: Add arm_bf16.h.
    * config/arm/arm-builtins.c (arm_mangle_builtin_type): Fix comment.
    (arm_simd_builtin_std_type): Add BFmode.
    (arm_init_simd_builtin_types): Define element types for vector types.
    (arm_init_bf16_types): New function.
    (arm_init_builtins): Add arm_init_bf16_types function call.
    * config/arm/arm-modes.def: Add BFmode and V4BF, V8BF vector modes.
    * config/arm/arm-simd-builtin-types.def: Add V4BF, V8BF.
    * config/arm/arm.c (aapcs_vfp_sub_candidate): Add BFmode.
    (arm_hard_regno_mode_ok): Add BFmode and tidy up statements.
    (arm_vector_mode_supported_p): Add V4BF, V8BF.
    (arm_mangle_type):
    * config/arm/arm.h: Add V4BF, V8BF to VALID_NEON_DREG_MODE,
    VALID_NEON_QREG_MODE respectively. Add export arm_bf16_type_node,
    arm_bf16_ptr_type_node.
    * config/arm/arm.md: New enabled_for_bfmode_scalar,
    enabled_for_bfmode_vector attributes. Add BFmode to movhf expand
    pattern and define_split between ARM registers.
    * config/arm/arm_bf16.h: New file.
    * config/arm/arm_neon.h: Add arm_bf16.h and Bfloat vector types.
    * config/arm/iterators.md (ANY64_BF, VDXMOV, VHFBF, HFBF, fporbf): New.
    (VQXMOV): Add V8BF.
    * config/arm/neon.md: Add BF vector types to NEON move patterns.
    * config/arm/vfp.md: Add BFmode to movhf patterns.

gcc/testsuite/ChangeLog:

2020-01-10  Stam Markianos-Wright 

    * g++.dg/abi/mangle-neon.C: Add Bfloat vector types.
    * g++.dg/ext/arm-bf16/bf16-mangle-1.C: New test.
    * gcc.target/arm/bfloat16_scalar_1_1.c: New test.
    * gcc.target/arm/bfloat16_scalar_1_2.c: New test.
    * gcc.target/arm/bfloat16_scalar_2_1.c: New test.
    * gcc.target/arm/bfloat16_scalar_2_2.c: New test.
    * gcc.target/arm/bfloat16_scalar_3_1.c: New test.
    * gcc.target/arm/bfloat16_scalar_3_2.c: New test.
    * gcc.target/arm/bfloat16_scalar_4.c: New test.
    * gcc.target/arm/bfloat16_simd_1_1.c: New test.
    * gcc.target/arm/bfloat16_simd_1_2.c: New test.
    * gcc.target/arm/bfloat16_simd_2_1.c: New test.
    * gcc.target/arm/bfloat16_simd_2_2.c: New test.
    * gcc.target/arm/bfloat16_simd_3_1.c: New test.
    * gcc.target/arm/bfloat16_simd_3_2.c: New test.





Re: [GCC][PATCH][ARM] Add Bfloat16_t scalar type, vector types and machine modes to ARM back-end [2/2]

2020-01-13 Thread Kyrill Tkachov

Hi Stam,

On 1/10/20 6:47 PM, Stam Markianos-Wright wrote:

Hi all,

This patch is part 2 of Bfloat16_t enablement in the ARM back-end.

This new type is constrained using the target hooks TARGET_INVALID_CONVERSION,
TARGET_INVALID_UNARY_OP and TARGET_INVALID_BINARY_OP so that it may only be
used through ACLE intrinsics (which will be provided in later patches).
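
For example (a sketch assuming a toolchain with both patches applied;
plain moves remain valid, while the commented-out lines show what the
new hooks reject):

#include <arm_neon.h>

/* Plain loads, stores and moves of bfloat16_t values stay valid; only
   conversions and arithmetic are rejected by the new hooks.  */
bfloat16_t
copy_bf16 (const bfloat16_t *p)
{
  bfloat16_t tmp = *p;   /* allowed: load/store/move */
  return tmp;            /* allowed: passed and returned opaquely */
}

/* Rejected (would now be compile errors):
     float f = tmp;              invalid conversion from bfloat16_t
     bfloat16_t s = tmp + tmp;   invalid operands to binary +        */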

Regression testing on arm-none-eabi passed successfully.

Ok for trunk?



Ok.

Thanks,

Kyrill




Cheers,
Stam


ACLE documents are at https://developer.arm.com/docs/101028/latest
ISA documents are at https://developer.arm.com/docs/ddi0596/latest

Details on ARM Bfloat can be found here:
https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a 





gcc/ChangeLog:

2020-01-10  Stam Markianos-Wright 

    * config/arm/arm.c
    (arm_invalid_conversion): New function for target hook.
    (arm_invalid_unary_op): New function for target hook.
    (arm_invalid_binary_op): New function for target hook.

2020-01-10  Stam Markianos-Wright 

    * gcc.target/arm/bfloat16_scalar_typecheck.c: New test.
    * gcc.target/arm/bfloat16_vector_typecheck_1.c: New test.
    * gcc.target/arm/bfloat16_vector_typecheck_2.c: New test.




Re: [PATCH 2/2] [ARM] Add support for -mpure-code in thumb-1 (v6m)

2020-01-13 Thread Kyrill Tkachov

Hi Christophe,

On 12/17/19 3:31 PM, Kyrill Tkachov wrote:


On 12/17/19 2:33 PM, Christophe Lyon wrote:

On Tue, 17 Dec 2019 at 11:34, Kyrill Tkachov
 wrote:

Hi Christophe,

On 11/18/19 9:00 AM, Christophe Lyon wrote:

On Wed, 13 Nov 2019 at 15:46, Christophe Lyon
 wrote:

On Tue, 12 Nov 2019 at 12:13, Richard Earnshaw (lists)
 wrote:

On 18/10/2019 14:18, Christophe Lyon wrote:

+  bool not_supported = arm_arch_notm || flag_pic ||

TARGET_NEON;
This is a poor name in the context of the function as a whole.  
What's
not supported.  Please think of a better name so that I have some 
idea

what the intention is.

That's to keep most of the code common when checking if -mpure-code
and -mslow-flash-data are supported.
These 3 cases are common to the two compilation flags, and
-mslow-flash-data still needs to check TARGET_HAVE_MOVT in addition.

Would "common_unsupported_modes" work better for you?
Or I can duplicate the "arm_arch_notm || flag_pic || TARGET_NEON" in
the two tests.


Hi,

Here is an updated version, using "common_unsupported_modes" instead
of "not_supported", and fixing the typo reported by Kyrill.
The ChangeLog is still the same.

OK?


The name looks ok to me. Richard had a concern about Armv8-M Baseline,
but I do see it being supported as you pointed out.

So I believe all the concerns are addressed.

OK, thanks!


Thus the code is ok. However, please also update the documentation for
-mpure-code in invoke.texi (it currently states that a MOVT instruction
is needed).


I didn't think about this :(
It currently says: "This option is only available when generating
non-pic code for M-profile targets with the MOVT instruction."

I suggest to remove the "with the MOVT instruction" part. Is that OK
if I commit my patch and this doc change?


Yes, I think that is the simplest correct change to make.



Can you also send a patch to the changes.html page for GCC 10 making 
users aware that this restriction is now lifted?


Thanks,

Kyrill



Thanks,

Kyrill



Christophe


Thanks,

Kyrill




Thanks,

Christophe


Thanks,

Christophe


R.


Re: [PATCH, GCC/ARM, 4/10] Clear GPR with CLRM

2020-01-13 Thread Kyrill Tkachov



On 12/18/19 1:26 PM, Mihail Ionescu wrote:

Hi Kyrill,

On 12/17/2019 10:26 AM, Kyrill Tkachov wrote:

Hi Mihail,

On 12/16/19 6:29 PM, Mihail Ionescu wrote:

Hi Kyrill,

On 11/12/2019 09:55 AM, Kyrill Tkachov wrote:

Hi Mihail,

On 10/23/19 10:26 AM, Mihail Ionescu wrote:

[PATCH, GCC/ARM, 4/10] Clear GPR with CLRM

Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to improve
code density of functions with the cmse_nonsecure_entry attribute and
when calling function with the cmse_nonsecure_call attribute by using
CLRM to do all the general purpose registers clearing as well as
clearing the APSR register.

=== Patch description ===

This patch adds a new pattern for the CLRM instruction and guards the
current clearing code in output_return_instruction() and thumb_exit()
on Armv8.1-M Mainline instructions not being present.
cmse_clear_registers () is then modified to use the new CLRM 
instruction

when targeting Armv8.1-M Mainline while keeping Armv8-M register
clearing code for VFP registers.

For the CLRM instruction, which does not mandated APSR in the 
register

list, checking whether it is the right volatile unspec or a clearing
register is done in clear_operation_p.

Note that load/store multiple were deemed sufficiently different in
terms of RTX structure compared to the CLRM pattern for a different
function to be used to validate the match_parallel.
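
For reference, a minimal sketch of the kind of function this affects
(illustrative; built with -mcmse on an Armv8.1-M Mainline target, the
register clearing on return is expected to use CLRM):

/* An entry function callable from non-secure state.  With -mcmse, GCC
   clears caller-visible register state on return so secure data cannot
   leak; on Armv8.1-M Mainline that clearing can use a single CLRM.  */
int __attribute__ ((cmse_nonsecure_entry))
entry_fn (int x)
{
  return x + 1;
}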

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * config/arm/arm-protos.h (clear_operation_p): Declare.
    * config/arm/arm.c (clear_operation_p): New function.
    (cmse_clear_registers): Generate clear_multiple 
instruction pattern if

    targeting Armv8.1-M Mainline or successor.
    (output_return_instruction): Only output APSR register 
clearing if

    Armv8.1-M Mainline instructions not available.
    (thumb_exit): Likewise.
    * config/arm/predicates.md (clear_multiple_operation): New 
predicate.

    * config/arm/thumb2.md (clear_apsr): New define_insn.
    (clear_multiple): Likewise.
    * config/arm/unspecs.md (VUNSPEC_CLRM_APSR): New volatile 
unspec.


*** gcc/testsuite/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * gcc.target/arm/cmse/bitfield-1.c: Add check for CLRM.
    * gcc.target/arm/cmse/bitfield-2.c: Likewise.
    * gcc.target/arm/cmse/bitfield-3.c: Likewise.
    * gcc.target/arm/cmse/struct-1.c: Likewise.
    * gcc.target/arm/cmse/cmse-14.c: Likewise.
    * gcc.target/arm/cmse/cmse-1.c: Likewise. Restrict checks 
for Armv8-M

    GPR clearing when CLRM is not available.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-9.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-13.c: 
Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-5.c: 
Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-7.c: 
Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-8.c: 
Likewise.

    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-5.c: likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-7.c: likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-8.c: likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-5.c: 
Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-7.c: 
Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-8.c: 
Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-13.c: 
Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-5.c: 
Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-7.c: 
Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-8.c: 
Likewise.

    * gcc.target/arm/cmse/mainline/8_1m/union-1.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/union-2.c: Likewise.

Testing: bootstrapped on arm-linux-gnueabihf and testsuite shows no
regression.

Is this ok for trunk?

Best regards,

Mihail


### Attachment also inlined for ease of reply 
###



diff --git a/gcc/config/arm/arm-protos.h 
b/gcc/config/arm/arm-prot

Re: [PATCH 2/2] [ARM] Add support for -mpure-code in thumb-1 (v6m)

2020-01-14 Thread Kyrill Tkachov



On 1/14/20 1:50 PM, Christophe Lyon wrote:

On Mon, 13 Jan 2020 at 14:49, Kyrill Tkachov
 wrote:

Hi Christophe,

On 12/17/19 3:31 PM, Kyrill Tkachov wrote:

On 12/17/19 2:33 PM, Christophe Lyon wrote:

On Tue, 17 Dec 2019 at 11:34, Kyrill Tkachov
 wrote:

Hi Christophe,

On 11/18/19 9:00 AM, Christophe Lyon wrote:

On Wed, 13 Nov 2019 at 15:46, Christophe Lyon
 wrote:

On Tue, 12 Nov 2019 at 12:13, Richard Earnshaw (lists)
 wrote:

On 18/10/2019 14:18, Christophe Lyon wrote:

+  bool not_supported = arm_arch_notm || flag_pic ||

TARGET_NEON;

This is a poor name in the context of the function as a whole.
What's
not supported.  Please think of a better name so that I have some
idea
what the intention is.

That's to keep most of the code common when checking if -mpure-code
and -mslow-flash-data are supported.
These 3 cases are common to the two compilation flags, and
-mslow-flash-data still needs to check TARGET_HAVE_MOVT in addition.

Would "common_unsupported_modes" work better for you?
Or I can duplicate the "arm_arch_notm || flag_pic || TARGET_NEON" in
the two tests.


Hi,

Here is an updated version, using "common_unsupported_modes" instead
of "not_supported", and fixing the typo reported by Kyrill.
The ChangeLog is still the same.

OK?

The name looks ok to me. Richard had a concern about Armv8-M Baseline,
but I do see it being supported as you pointed out.

So I believe all the concerns are addressed.

OK, thanks!


Thus the code is ok. However, please also update the documentation for
-mpure-code in invoke.texi (it currently states that a MOVT instruction
is needed).


I didn't think about this :(
It currently says: "This option is only available when generating
non-pic code for M-profile targets with the MOVT instruction."

I suggest to remove the "with the MOVT instruction" part. Is that OK
if I commit my patch and this doc change?

Yes, I think that is the simplest correct change to make.


Can you also send a patch to the changes.html page for GCC 10 making
users aware that this restriction is now lifted?


Sure. I should have thought of it when I submitted the GCC patch...

How about the attached? I'm not sure about the right upper/lower case
and markup.

Thanks,

Christophe


commit ba2a354c9ed6c75ec00bf21dd6938b89a113a96e
Author: Christophe Lyon
Date:   Tue Jan 14 13:48:19 2020 +

[arm] Document -mpure-code support for v6m in gcc-10

diff --git a/htdocs/gcc-10/changes.html b/htdocs/gcc-10/changes.html
index caa9df7..26cdf66 100644
--- a/htdocs/gcc-10/changes.html
+++ b/htdocs/gcc-10/changes.html
@@ -417,7 +417,11 @@ a work-in-progress.
   data-processing intrinsics to include 32-bit SIMD, saturating arithmetic,
   16-bit multiplication and other related intrinsics aimed at DSP algorithm
   optimization.
-   
+  
+  Support for -mpure-code in Thumb-1 (v6m) has been
+  added: this M-profile feature is no longer restricted to targets
+  with MOTV. For instance, Cortex-M0 is now
+  supported

Typo in MOVT.
Let's make the last sentence: "For example, -mcpu=cortex-m0 now
supports this option."

Ok with those changes.
Thanks,
Kyrill

 
 
 AVR




Re: [PATCH][AARCH64] Set jump-align=4 for neoversen1

2020-01-17 Thread Kyrill Tkachov

Hi Richard, Wilco,

On 1/17/20 8:43 AM, Richard Sandiford wrote:

Wilco Dijkstra  writes:
> Testing shows the setting of 32:16 for jump alignment has a 
significant codesize
> cost, however it doesn't make a difference in performance. So set 
jump-align

> to 4 to get 1.6% codesize improvement.

I was leaving this to others in case it was obvious to them.  On the
basis that silence suggests it wasn't, :-) could you go into more details?
Is it expected on first principles that jump alignment doesn't matter
for Neoverse N1, or is this purely based on experimentation?  If it's
expected, are we sure that the other "32:16" entries are still worthwhile?
When you say it doesn't make a difference in performance, does that mean
that no individual test's performance changed significantly, or just that
the aggregate score didn't?  Did you experiment with anything inbetween
the current 32:16 and 4, such as 32:8 or even 32:4?



Sorry for dragging my feet on this one, as I put in those numbers last 
year and I've been trying to recall my experiments from then.


The Neoverse N1 Software Optimization guide recommends aligning branch 
targets to 32 bytes within the bounds of code density requirements.


From my benchmarking last year I do seem to remember function and loop 
alignment mattering.


I probably added the jump alignment for completeness as it's a good idea 
from first principles. But if the code size hit is too large we could 
look to decrease it.


I'd also be interested in seeing the impact of 32:8 and 32:4.

Thanks,

Kyrill




The problem with applying the patch only with the explanation above is
that if someone in future has evidence that jump alignment can make a
difference for their testcase, it would be very hard for them to
reproduce the reasoning that led to this change.

Thanks,
Richard

> OK for commit?
>
> ChangeLog
> 2019-12-24  Wilco Dijkstra  
>
> * config/aarch64/aarch64.c (neoversen1_tunings): Set jump_align to 4.
>
> --
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
1646ed1d9a3de8ee2f0abff385a1ea145e234475..209ed8ebbe81104d9d8cff0df31946ab7704fb33 
100644

> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -1132,7 +1132,7 @@ static const struct tune_params 
neoversen1_tunings =

>    3, /* issue_rate  */
>    (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* 
fusible_ops  */

>    "32:16",/* function_align.  */
> -  "32:16",/* jump_align.  */
> +  "4",/* jump_align.  */
>    "32:16",/* loop_align.  */
>    2,/* int_reassoc_width.  */
>    4,/* fp_reassoc_width.  */
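
As a reminder of how these alignment strings are read (roughly: "N:M"
asks for N-byte alignment, but only when at most M-1 bytes of padding
would be skipped, while a bare "4" keeps natural alignment; see the
-falign-* documentation for the exact rule). A purely illustrative
sketch, not the real tune_params layout:

/* Illustrative container for the three strings under discussion.  */
struct align_params
{
  const char *function_align;   /* "32:16" */
  const char *jump_align;       /* "4" after this patch */
  const char *loop_align;       /* "32:16" */
};

static const struct align_params neoversen1_align
  = { "32:16", "4", "32:16" };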


Re: [PATCH, GCC/ARM, 1/2] Add support for ASRL(reg) and LSLL(reg) instructions for Armv8.1-M Mainline

2020-01-17 Thread Kyrill Tkachov



On 12/18/19 1:23 PM, Mihail Ionescu wrote:



Hi Kyrill,

On 12/11/2019 05:50 PM, Kyrill Tkachov wrote:
> Hi Mihail,
>
> On 11/14/19 1:54 PM, Mihail Ionescu wrote:
>> Hi,
>>
>> This patch adds the new scalar shift instructions for Armv8.1-M
>> Mainline to the arm backend.
>> This patch is adding the following instructions:
>>
>> ASRL (reg)
>> LSLL (reg)
>>
>
> Sorry for the delay, very busy time for GCC development :(
>
>
>>
>> ChangeLog entry are as follow:
>>
>> *** gcc/ChangeLog ***
>>
>>
>> 2019-11-14  Mihail-Calin Ionescu 
>> 2019-11-14  Sudakshina Das 
>>
>>     * config/arm/arm.h (TARGET_MVE): New macro for MVE support.
>
>
> I don't see this hunk in the patch... There's a lot of v8.1-M-related
> patches in flight. Is it defined elsewhere?

Thanks for having a look at this.
Yes, I forgot to remove that bit from the ChangeLog and mention that the
patch depends on the Armv8.1-M MVE CLI --
https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00641.html -- which introduces
the required TARGET_* macros. I've updated the ChangeLog to
reflect that:

*** gcc/ChangeLog ***


2019-12-18  Mihail-Calin Ionescu 
2019-12-18  Sudakshina Das  

    * config/arm/arm.md (ashldi3): Generate thumb2_lsll for 
TARGET_HAVE_MVE.

    (ashrdi3): Generate thumb2_asrl for TARGET_HAVE_MVE.
    * config/arm/arm.c (arm_hard_regno_mode_ok): Allocate even odd
    register pairs for doubleword quantities for ARMv8.1M-Mainline.
    * config/arm/thumb2.md (thumb2_asrl): New.
    (thumb2_lsll): Likewise.

>
>
>>     * config/arm/arm.md (ashldi3): Generate thumb2_lsll for
>> TARGET_MVE.
>>     (ashrdi3): Generate thumb2_asrl for TARGET_MVE.
>>     * config/arm/arm.c (arm_hard_regno_mode_ok): Allocate even odd
>>     register pairs for doubleword quantities for ARMv8.1M-Mainline.
>>     * config/arm/thumb2.md (thumb2_asrl): New.
>>     (thumb2_lsll): Likewise.
>>
>> *** gcc/testsuite/ChangeLog ***
>>
>> 2019-11-14  Mihail-Calin Ionescu 
>> 2019-11-14  Sudakshina Das 
>>
>>     * gcc.target/arm/armv8_1m-shift-reg_1.c: New test.
>>
>> Testsuite shows no regression when run for arm-none-eabi targets.
>>
>> Is this ok for trunk?
>>
>> Thanks
>> Mihail
>>
>>
>> ### Attachment also inlined for ease of reply
>> ###
>>
>>
>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>> index
>> 
be51df7d14738bc1addeab8ac5a3806778106bce..bf788087a30343269b30cf7054ec29212ad9c572 


>> 100644
>> --- a/gcc/config/arm/arm.c
>> +++ b/gcc/config/arm/arm.c
>> @@ -24454,14 +24454,15 @@ arm_hard_regno_mode_ok (unsigned int regno,
>> machine_mode mode)
>>
>>    /* We allow almost any value to be stored in the general registers.
>>   Restrict doubleword quantities to even register pairs in ARM 
state

>> - so that we can use ldrd.  Do not allow very large Neon structure
>> - opaque modes in general registers; they would use too many.  */
>> + so that we can use ldrd and Armv8.1-M Mainline instructions.
>> + Do not allow very large Neon structure opaque modes in general
>> + registers; they would use too many.  */
>
>
> This comment now reads:
>
> "Restrict doubleword quantities to even register pairs in ARM state
>   so that we can use ldrd and Armv8.1-M Mainline instructions."
>
> Armv8.1-M Mainline is not ARM mode though, so please clarify this
> comment further.
>
> Looks ok to me otherwise (I may even have merged this with the second
> patch, but I'm not complaining about keeping it simple :) )
>
> Thanks,
>
> Kyrill
>

I've now updated the comment to read:
"Restrict doubleword quantities to even register pairs in ARM state
so that we can use ldrd. The same restriction applies for MVE."



Ok.

Thanks,

Kyrill




Regards,
Mihail

>
>>    if (regno <= LAST_ARM_REGNUM)
>>  {
>>    if (ARM_NUM_REGS (mode) > 4)
>>  return false;
>>
>> -  if (TARGET_THUMB2)
>> +  if (TARGET_THUMB2 && !TARGET_HAVE_MVE)
>>  return true;
>>
>>    return !(TARGET_LDRD && GET_MODE_SIZE (mode) > 4 && (regno & 1)
>> != 0);
>> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
>> index
>> 
a91a4b941c3f9d2c3d443f9f4639069ae953fb3b..b735f858a6a5c94d02a6765c1b349cdcb5e77ee3 


>> 100644
>> --- a/gcc/config/arm/arm.md
>> +++ b/gcc/config/arm/arm.md
>> @@ -3503,6 +3503,22 @@
>>  

Re: [PING][PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816)

2020-01-28 Thread Kyrill Tkachov

Hi Stam,

On 1/8/20 3:18 PM, Stam Markianos-Wright wrote:


On 12/10/19 5:03 PM, Kyrill Tkachov wrote:

Hi Stam,

On 11/15/19 5:26 PM, Stam Markianos-Wright wrote:

Pinging with more correct maintainers this time :)

Also would need to backport to gcc7,8,9, but need to get this approved
first!


Sorry for the delay.

Same here now! Sorry, I totally forgot about this in the lead-up to Xmas!

Done the changes marked below and also removed the unnecessary extra #defines
from the test.



This is ok with a nit on the testcase...


diff --git a/gcc/testsuite/gcc.target/arm/pr91816.c 
b/gcc/testsuite/gcc.target/arm/pr91816.c
new file mode 100644
index 
..757c897e9c0db32709227b3fdf1b4a8033428232
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr91816.c
@@ -0,0 +1,61 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv7-a -mthumb -mfpu=vfpv3-d16" }  */
+int printf(const char *, ...);
+

I think this needs a couple of effective target checks like arm_hard_vfp_ok and 
arm_thumb2_ok. See other tests in gcc.target/arm that add -mthumb to the 
options.

Thanks,
Kyrill



Re: [PING][PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816)

2020-01-30 Thread Kyrill Tkachov



On 1/30/20 2:42 PM, Stam Markianos-Wright wrote:



On 1/28/20 10:35 AM, Kyrill Tkachov wrote:

Hi Stam,

On 1/8/20 3:18 PM, Stam Markianos-Wright wrote:


On 12/10/19 5:03 PM, Kyrill Tkachov wrote:

Hi Stam,

On 11/15/19 5:26 PM, Stam Markianos-Wright wrote:

Pinging with more correct maintainers this time :)

Also would need to backport to gcc7,8,9, but need to get this 
approved

first!


Sorry for the delay.

Same here now! Sorry, I totally forgot about this in the lead-up to Xmas!

Done the changes marked below and also removed the unnecessary extra 
#defines

from the test.



This is ok with a nit on the testcase...


diff --git a/gcc/testsuite/gcc.target/arm/pr91816.c 
b/gcc/testsuite/gcc.target/arm/pr91816.c

new file mode 100644
index 
..757c897e9c0db32709227b3fdf1b4a8033428232

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr91816.c
@@ -0,0 +1,61 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv7-a -mthumb -mfpu=vfpv3-d16" }  */
+int printf(const char *, ...);
+

I think this needs a couple of effective target checks like 
arm_hard_vfp_ok and arm_thumb2_ok. See other tests in gcc.target/arm 
that add -mthumb to the options.


Hmm, looking back at this now, is there any reason why it can't just be:

/* { dg-do compile } */
/* { dg-require-effective-target arm_thumb2_ok } */
/* { dg-additional-options "-mthumb" }  */

where we don't override the march or fpu options at all, but just use 
`require-effective-target arm_thumb2_ok` to make sure that thumb2 is 
supported?


The attached new diff does just that.



Works for me; there are plenty of configurations run with an FPU, so it
should get the right coverage.


Ok (make sure to commit the updated ChangeLog as well, if needed).

Thanks!

Kyrill



Cheers :)

Stam.



Thanks,
Kyrill





Re: [GCC][PATCH][AArch32] ACLE intrinsics bfloat16 vmmla and vfma for AArch32 AdvSIMD

2020-01-30 Thread Kyrill Tkachov

Hi Delia,


On 1/28/20 4:44 PM, Delia Burduv wrote:

Ping.

*From:* Delia Burduv 
*Sent:* 22 January 2020 17:26
*To:* gcc-patches@gcc.gnu.org 
*Cc:* ni...@redhat.com ; Richard Earnshaw 
; Ramana Radhakrishnan 
; Kyrylo Tkachov 
*Subject:* Re: [GCC][PATCH][AArch32] ACLE intrinsics bfloat16 vmmla 
and vfma for AArch32 AdvSIMD

Ping.

I have read Richard Sandiford's comments on the AArch64 patches and I
will apply what is relevant to this patch as well. Particularly, I will
change the tests to use the exact input and output registers and I will
change the types of the rtl patterns.



Please send the updated patches so that someone can commit them for you 
once they're reviewed.


Thanks,

Kyrill




On 12/20/19 6:44 PM, Delia Burduv wrote:
> This patch adds the ARMv8.6 ACLE intrinsics for vmmla, vfmab and vfmat
> as part of the BFloat16 extension.
> (https://developer.arm.com/docs/101028/latest.)
> The intrinsics are declared in arm_neon.h and the RTL patterns are
> defined in neon.md.
> Two new tests are added to check assembler output and lane indices.
>
> This patch depends on the Arm back-end patch.
> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>
> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't have
> commit rights, so if this is ok can someone please commit it for me?
>
> gcc/ChangeLog:
>
> 2019-11-12  Delia Burduv 
>
>  * config/arm/arm_neon.h (vbfmmlaq_f32): New.
>    (vbfmlalbq_f32): New.
>    (vbfmlaltq_f32): New.
>    (vbfmlalbq_lane_f32): New.
>    (vbfmlaltq_lane_f32): New.
>      (vbfmlalbq_laneq_f32): New.
>    (vbfmlaltq_laneq_f32): New.
>  * config/arm/arm_neon_builtins.def (vbfmmla): New.
>    (vbfmab): New.
>    (vbfmat): New.
>    (vbfmab_lane): New.
>    (vbfmat_lane): New.
>    (vbfmab_laneq): New.
>    (vbfmat_laneq): New.
>   * config/arm/iterators.md (BF_MA): New int iterator.
>    (bt): New int attribute.
>    (VQXBF): Copy of VQX with V8BF.
>    (V_HALF): Added V8BF.
>    * config/arm/neon.md (neon_vbfmmlav8hi): New insn.
>    (neon_vbfmav8hi): New insn.
>    (neon_vbfma_lanev8hi): New insn.
>    (neon_vbfma_laneqv8hi): New expand.
>    (neon_vget_high): Changed iterator to VQXBF.
>  * config/arm/unspecs.md (UNSPEC_BFMMLA): New UNSPEC.
>    (UNSPEC_BFMAB): New UNSPEC.
>    (UNSPEC_BFMAT): New UNSPEC.
>
> 2019-11-12  Delia Burduv 
>
>      * gcc.target/arm/simd/bf16_ma_1.c: New test.
>      * gcc.target/arm/simd/bf16_ma_2.c: New test.
>      * gcc.target/arm/simd/bf16_mmla_1.c: New test.


Re: [GCC][PATCH][ARM] Regenerate arm-tables.opt for Armv8.1-M patch

2020-02-06 Thread Kyrill Tkachov



On 2/3/20 5:18 PM, Mihail Ionescu wrote:

Hi all,

I've regenerated arm-tables.opt in config/arm to replace the improperly
generated arm-tables.opt file from "[PATCH, GCC/ARM, 2/10] Add command
line support for Armv8.1-M Mainline" 
(9722215a027b68651c3c7a8af9204d033197e9c0).



2020-02-03  Mihail Ionescu  

    * config/arm/arm-tables.opt: Regenerate.

Ok for trunk?



Ok. I would consider it obvious too.

Thanks,

Kyrill




Regards,
Mihail


### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index 
f295a4cffa2bbb3f8163fb9cef784b5af59aee12..a51a131505d184f120a3cfc51273b419bb0cb103 
100644

--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -353,13 +353,16 @@ EnumValue
 Enum(arm_arch) String(armv8-m.main) Value(28)

 EnumValue
-Enum(arm_arch) String(armv8.1-m.main) Value(29)
+Enum(arm_arch) String(armv8-r) Value(29)

 EnumValue
-Enum(arm_arch) String(iwmmxt) Value(30)
+Enum(arm_arch) String(armv8.1-m.main) Value(30)

 EnumValue
-Enum(arm_arch) String(iwmmxt2) Value(31)
+Enum(arm_arch) String(iwmmxt) Value(31)
+
+EnumValue
+Enum(arm_arch) String(iwmmxt2) Value(32)

 Enum
 Name(arm_fpu) Type(enum fpu_type)



Re: [GCC][PATCH][ARM] Set profile to M for Armv8.1-M

2020-02-06 Thread Kyrill Tkachov



On 2/4/20 1:49 PM, Christophe Lyon wrote:
On Mon, 3 Feb 2020 at 18:20, Mihail Ionescu 
 wrote:

>
> Hi,
>
> We noticed that the profile for armv8.1-m.main was not set in
> arm-cpus.in, which led to TARGET_ARM_ARCH_PROFILE and __ARM_ARCH_PROFILE
> not being defined properly.
>
>
>
> gcc/ChangeLog:
>
> 2020-02-03  Mihail Ionescu 
>
> * config/arm/arm-cpus.in: Set profile M
> for armv8.1-m.main.
>
>
> Ok for trunk?
>
> Regards,
> Mihail
>
>
> ### Attachment also inlined for ease of reply    
###

>
>
> diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
> index 
1805b2b1cd8d6f65a967b4e3945257854a7e0fc1..96f584da325172bd1460251e2de0ad679589d312 
100644

> --- a/gcc/config/arm/arm-cpus.in
> +++ b/gcc/config/arm/arm-cpus.in
> @@ -692,6 +692,7 @@ begin arch armv8.1-m.main
>   tune for cortex-m7
>   tune flags CO_PROC
>   base 8M_MAIN
> + profile M
>   isa ARMv8_1m_main
>  # fp => FPv5-sp-d16; fp.dp => FPv5-d16
>   option dsp add armv7em
>

I'm wondering whether this is obvious?
OTOH, what's the impact of missing this (or why didn't we notice the
problem via a failing testcase?)


It's only used to set the __ARM_ARCH_PROFILE macro in arm-c.c

I do agree that the patch is obvious, so go ahead and commit this 
please, Mihail.


Thanks,

Kyrill





Christophe


Re: [Pingx3][GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, vdot - by element) for AArch32 AdvSIMD ARMv8.6 Extension

2020-02-11 Thread Kyrill Tkachov

Hi Stam,

On 2/10/20 1:35 PM, Stam Markianos-Wright wrote:



On 2/3/20 11:20 AM, Stam Markianos-Wright wrote:
>
>
> On 1/27/20 3:54 PM, Stam Markianos-Wright wrote:
>>
>> On 1/16/20 4:05 PM, Stam Markianos-Wright wrote:
>>>
>>>
>>> On 1/10/20 6:48 PM, Stam Markianos-Wright wrote:


 On 12/18/19 1:25 PM, Stam Markianos-Wright wrote:
>
>
> On 12/13/19 10:22 AM, Stam Markianos-Wright wrote:
>> Hi all,
>>
>> This patch adds the ARMv8.6 Extension ACLE intrinsics for dot 
product

>> operations (vector/by element) to the ARM back-end.
>>
>> These are:
>> usdot (vector), dot (by element).
>>
>> The functions are optional from ARMv8.2-a as 
-march=armv8.2-a+i8mm and

>> for ARM they remain optional as of ARMv8.6-a.
>>
>> The functions are declared in arm_neon.h, RTL patterns are 
defined to
>> generate assembler and tests are added to verify and perform 
adequate checks.

>>
>> Regression testing on arm-none-eabi passed successfully.
>>
>> This patch depends on:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html
>>
>> for ARM CLI updates, and on:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html
>>
>> for testsuite effective_target update.
>>
>> Ok for trunk?
>

 New diff addressing review comments from Aarch64 version of the 
patch.


 _Change of order of operands in RTL patterns.
 _Change tests to use check-function-bodies, compile with 
optimisation and

 check for exact registers.
 _Rename tests to remove "-compile-" in filename.

>>>
> .Ping!

Ping :)

The diff re-attached in this ping email is the same as the one posted on 10/01.

Thank you!



Sorry for the delay.

This is ok.

Thanks,

Kyrill
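
For reference, a minimal usage sketch of the new mixed-sign dot product
(the function name is illustrative; assumes a target with the +i8mm
extension):

#include <arm_neon.h>

/* Each 32-bit lane of ACC accumulates the sum of four
   unsigned-by-signed 8-bit products from U and S.  */
int32x2_t
do_usdot (int32x2_t acc, uint8x8_t u, int8x8_t s)
{
  return vusdot_s32 (acc, u, s);
}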



> .
>>>
>>> Cheers,
>>> Stam
>>>
>>
>>
>> ACLE documents are at https://developer.arm.com/docs/101028/latest
>> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
>>
>> PS. I don't have commit rights, so if someone could commit on 
my behalf,

>> that would be great :)
>>
>>
>> gcc/ChangeLog:
>>
>> 2019-11-28  Stam Markianos-Wright 
>>
>>  * config/arm/arm-builtins.c (enum arm_type_qualifiers):
>>  (USTERNOP_QUALIFIERS): New define.
>>  (USMAC_LANE_QUADTUP_QUALIFIERS): New define.
>>  (SUMAC_LANE_QUADTUP_QUALIFIERS): New define.
>>  (arm_expand_builtin_args):
>>  Add case ARG_BUILTIN_LANE_QUADTUP_INDEX.
>>  (arm_expand_builtin_1): Add qualifier_lane_quadtup_index.
>>  * config/arm/arm_neon.h (vusdot_s32): New.
>>  (vusdot_lane_s32): New.
>>  (vusdotq_lane_s32): New.
>>  (vsudot_lane_s32): New.
>>  (vsudotq_lane_s32): New.
>>  * config/arm/arm_neon_builtins.def
>> (usdot,usdot_lane,sudot_lane): New.
>>  * config/arm/iterators.md (DOTPROD_I8MM): New.
>>  (sup, opsuffix): Add .
>>     * config/arm/neon.md (neon_usdot, dot_lane: New.
>>  * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New.
>>
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-12-12  Stam Markianos-Wright 
>>
>>  * gcc.target/arm/simd/vdot-2-1.c: New test.
>>  * gcc.target/arm/simd/vdot-2-2.c: New test.
>>  * gcc.target/arm/simd/vdot-2-3.c: New test.
>>  * gcc.target/arm/simd/vdot-2-4.c: New test.
>>
>>



[PATCH][gas] Implement .cfi_negate_ra_state directive

2019-12-05 Thread Kyrill Tkachov

Hi all,

This patch implements the .cfi_negate_ra_state directive to be consistent
with LLVM (https://reviews.llvm.org/D50136). The relevant DWARF code 
DW_CFA_AARCH64_negate_ra_state

is multiplexed on top of DW_CFA_GNU_window_save, as per
https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00753.html

I believe this is the simplest patch implementing this and is needed to
allow users to build, for example, the Linux kernel with Armv8.3-A
pointer authentication support with Clang while using gas as the
assembler, which is a common use case.

Tested gas aarch64-none-elf.
Ok for master and the release branches?

Thanks,
Kyrill

gas/
2019-12-05  Kyrylo Tkachov  

    * dw2gencfi.c (cfi_pseudo_table): Add cfi_negate_ra_state.
    * testsuite/gas/aarch64/pac_negate_ra_state.s: New file.
    * testsuite/gas/aarch64/pac_negate_ra_state.d: Likewise.

diff --git a/gas/dw2gencfi.c b/gas/dw2gencfi.c
index 6c0478a72063801f1f91441a11350daa94605843..707830cbe82f860d21c3b9b8f7cbe1999568398b 100644
--- a/gas/dw2gencfi.c
+++ b/gas/dw2gencfi.c
@@ -726,6 +726,7 @@ const pseudo_typeS cfi_pseudo_table[] =
 { "cfi_remember_state", dot_cfi, DW_CFA_remember_state },
 { "cfi_restore_state", dot_cfi, DW_CFA_restore_state },
 { "cfi_window_save", dot_cfi, DW_CFA_GNU_window_save },
+{ "cfi_negate_ra_state", dot_cfi, DW_CFA_AARCH64_negate_ra_state },
 { "cfi_escape", dot_cfi_escape, 0 },
 { "cfi_signal_frame", dot_cfi, CFI_signal_frame },
 { "cfi_personality", dot_cfi_personality, 0 },
diff --git a/gas/testsuite/gas/aarch64/pac_negate_ra_state.d b/gas/testsuite/gas/aarch64/pac_negate_ra_state.d
new file mode 100644
index ..7ab0f2369dece1a71fc064ae38f6e273128bf074
--- /dev/null
+++ b/gas/testsuite/gas/aarch64/pac_negate_ra_state.d
@@ -0,0 +1,26 @@
+#objdump: --dwarf=frames
+
+.+: file .+
+
+Contents of the .eh_frame section:
+
+ 0010  CIE
+  Version:   1
+  Augmentation:  "zR"
+  Code alignment factor: 4
+  Data alignment factor: -8
+  Return address column: 30
+  Augmentation data: 1b
+  DW_CFA_def_cfa: r31 \(sp\) ofs 0
+
+0014 0018 0018 FDE cie= pc=..0008
+  DW_CFA_advance_loc: 4 to 0004
+  DW_CFA_GNU_window_save
+  DW_CFA_advance_loc: 4 to 0008
+  DW_CFA_def_cfa_offset: 16
+  DW_CFA_offset: r29 \(x29\) at cfa-16
+  DW_CFA_offset: r30 \(x30\) at cfa-8
+  DW_CFA_nop
+  DW_CFA_nop
+
+
diff --git a/gas/testsuite/gas/aarch64/pac_negate_ra_state.s b/gas/testsuite/gas/aarch64/pac_negate_ra_state.s
new file mode 100644
index ..36ddbeb43b7002a68eb6787a21599eb20d2b965e
--- /dev/null
+++ b/gas/testsuite/gas/aarch64/pac_negate_ra_state.s
@@ -0,0 +1,20 @@
+	.arch armv8-a
+	.text
+	.align	2
+	.global	_Z5foo_av
+	.type	_Z5foo_av, %function
+_Z5foo_av:
+.LFB0:
+	.cfi_startproc
+	hint	25 // paciasp
+	.cfi_negate_ra_state
+	stp	x29, x30, [sp, -16]!
+	.cfi_def_cfa_offset 16
+	.cfi_offset 29, -16
+	.cfi_offset 30, -8
+	.cfi_endproc
+.LFE0:
+	.size	_Z5foo_av, .-_Z5foo_av
+	.align	2
+	.global	_Z5foo_bv
+	.type	_Z5foo_bv, %function


Re: [PATCH][gas] Implement .cfi_negate_ra_state directive

2019-12-05 Thread Kyrill Tkachov

Sorry, wrong list address from my side, please ignore.

Kyrill

On 12/5/19 10:59 AM, Kyrill Tkachov wrote:

Hi all,

This patch implements the .cfi_negate_ra_state directive to be consistent
with LLVM (https://reviews.llvm.org/D50136). The relevant DWARF code
DW_CFA_AARCH64_negate_ra_state
is multiplexed on top of DW_CFA_GNU_window_save, as per
https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00753.html

I believe this is the simplest patch implementing this and is needed to
allow users to build, for example, the Linux kernel with Armv8.3-A
pointer authentication support with Clang while using gas as the
assembler, which is a common usecase.

Tested gas aarch64-none-elf.
Ok for master and the release branches?

Thanks,
Kyrill

gas/
2019-12-05  Kyrylo Tkachov  

 * dw2gencfi.c (cfi_pseudo_table): Add cfi_negate_ra_state.
 * testsuite/gas/aarch64/pac_negate_ra_state.s: New file.
 * testsuite/gas/aarch64/pac_negate_ra_state.d: Likewise.



Re: [PING][PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816)

2019-12-10 Thread Kyrill Tkachov

Hi Stam,

On 11/15/19 5:26 PM, Stam Markianos-Wright wrote:

Pinging with more correct maintainers this time :)

Also would need to backport to gcc7,8,9, but need to get this approved
first!



Sorry for the delay.



Thank you,
Stam


 Forwarded Message 
Subject: Re: [PATCH][GCC][ARM] Arm generates out of range conditional
branches in Thumb2 (PR91816)
Date: Mon, 21 Oct 2019 10:37:09 +0100
From: Stam Markianos-Wright 
To: Ramana Radhakrishnan 
CC: gcc-patches@gcc.gnu.org , nd ,
James Greenhalgh , Richard Earnshaw




On 10/13/19 4:23 PM, Ramana Radhakrishnan wrote:
>>
>> Patch bootstrapped and regression tested on arm-none-linux-gnueabihf,
>> however, on my native AArch32 setup the test times out when run as part
>> of a big "make check-gcc" regression, but not when run individually.
>>
>> 2019-10-11  Stamatis Markianos-Wright 
>>
>>   * config/arm/arm.md: Update b for Thumb2 range checks.
>>   * config/arm/arm.c: New function arm_gen_far_branch.
>>   * config/arm/arm-protos.h: New function arm_gen_far_branch
>>   prototype.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-10-11  Stamatis Markianos-Wright 
>>
>>   * testsuite/gcc.target/arm/pr91816.c: New test.
>
>> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
>> index f995974f9bb..1dce333d1c3 100644
>> --- a/gcc/config/arm/arm-protos.h
>> +++ b/gcc/config/arm/arm-protos.h
>> @@ -570,4 +570,7 @@ void arm_parse_option_features (sbitmap, const 
cpu_arch_option *,

>>
>>   void arm_initialize_isa (sbitmap, const enum isa_feature *);
>>
>> +const char * arm_gen_far_branch (rtx *, int,const char * , const 
char *);

>> +
>> +
>
> Lets get the nits out of the way.
>
> Unnecessary extra new line, need a space between int and const above.
>
>

.Fixed!

>>   #endif /* ! GCC_ARM_PROTOS_H */
>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>> index 39e1a1ef9a2..1a693d2ddca 100644
>> --- a/gcc/config/arm/arm.c
>> +++ b/gcc/config/arm/arm.c
>> @@ -32139,6 +32139,31 @@ arm_run_selftests (void)
>>   }
>>   } /* Namespace selftest.  */
>>
>> +
>> +/* Generate code to enable conditional branches in functions over 
1 MiB.  */

>> +const char *
>> +arm_gen_far_branch (rtx * operands, int pos_label, const char * dest,
>> +    const char * branch_format)
>
> Not sure if this is some munging from the attachment but check
> vertical alignment of parameters.
>

.Fixed!

>> +{
>> +  rtx_code_label * tmp_label = gen_label_rtx ();
>> +  char label_buf[256];
>> +  char buffer[128];
>> +  ASM_GENERATE_INTERNAL_LABEL (label_buf, dest , \
>> +    CODE_LABEL_NUMBER (tmp_label));
>> +  const char *label_ptr = arm_strip_name_encoding (label_buf);
>> +  rtx dest_label = operands[pos_label];
>> +  operands[pos_label] = tmp_label;
>> +
>> +  snprintf (buffer, sizeof (buffer), "%s%s", branch_format , 
label_ptr);

>> +  output_asm_insn (buffer, operands);
>> +
>> +  snprintf (buffer, sizeof (buffer), "b\t%%l0%d\n%s:", pos_label, 
label_ptr);

>> +  operands[pos_label] = dest_label;
>> +  output_asm_insn (buffer, operands);
>> +  return "";
>> +}
>> +
>> +
>
> Unnecessary extra newline.
>

.Fixed!
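
To make the transformation concrete, here is a rough sketch of what
arm_gen_far_branch emits for an out-of-range conditional branch (label
names are illustrative, not from the patch):

    @ before: b<cond> in 32-bit Thumb2 only reaches +/- 1 MiB
        beq     .Lfar_target
    @ after: invert the condition and hop over a plain b,
    @ which reaches +/- 16 MiB
        bne     .Lbcond42
        b       .Lfar_target
    .Lbcond42: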

>>   #undef TARGET_RUN_TARGET_SELFTESTS
>>   #define TARGET_RUN_TARGET_SELFTESTS selftest::arm_run_selftests
>>   #endif /* CHECKING_P */
>> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
>> index f861c72ccfc..634fd0a59da 100644
>> --- a/gcc/config/arm/arm.md
>> +++ b/gcc/config/arm/arm.md
>> @@ -6686,9 +6686,16 @@
>>   ;; And for backward branches we have
>>   ;;   (neg_range - neg_base_offs + pc_offs) = (neg_range - (-2 or 
-4) + 4).

>>   ;;
>> +;; In 16-bit Thumb these ranges are:
>>   ;; For a 'b'   pos_range = 2046, neg_range = -2048 giving 
(-2040->2048).
>>   ;; For a 'b' pos_range = 254, neg_range = -256  giving 
(-250 ->256).

>>
>> +;; In 32-bit Thumb these ranges are:
>> +;; For a 'b'   +/- 16MB is not checked for.
>> +;; For a 'b' pos_range = 1048574, neg_range = -1048576  giving
>> +;; (-1048568 -> 1048576).
>> +
>> +
>
> Unnecessary extra newline.
>

.Fixed!

>>   (define_expand "cbranchsi4"
>> [(set (pc) (if_then_else
>> (match_operator 0 "expandable_comparison_operator"
>> @@ -6947,22 +6954,42 @@
>> (pc)))]
>> "TARGET_32BIT"
>> "*
>> -  if (arm_ccfsm_state == 1 || arm_ccfsm_state == 2)
>> -    {
>> -  arm_ccfsm_state += 2;
>> -  return \"\";
>> -    }
>> -  return \"b%d1\\t%l0\";
>> + if (arm_ccfsm_state == 1 || arm_ccfsm_state == 2)
>> +  {
>> +    arm_ccfsm_state += 2;
>> +    return \"\";
>> +  }
>> + switch (get_attr_length (insn))
>> +  {
>> +    // Thumb2 16-bit b{cond}
>> +    case 2:
>> +
>> +    // Thumb2 32-bit b{cond}
>> +    case 4: return \"b%d1\\t%l0\";break;
>> +
>> +    // Thumb2 b{cond} out of range.  Use unconditional branch.
>> +    case 8: return arm_gen_far_branch \
>> +    (operands, 0, \"Lbcond\", \"b%D1\t\");
>> +    break;
>> +
>> +    // A32 b{cond}
>> +    defau

Re: Ping: [GCC][PATCH] Add ARM-specific Bfloat format support to middle-end

2019-12-11 Thread Kyrill Tkachov

Hi all,

On 12/11/19 9:41 AM, Stam Markianos-Wright wrote:



On 12/11/19 3:48 AM, Jeff Law wrote:
> On Mon, 2019-12-09 at 13:40 +, Stam Markianos-Wright wrote:
>>
>> On 12/3/19 10:31 AM, Stam Markianos-Wright wrote:
>>>
>>> On 12/2/19 9:27 PM, Joseph Myers wrote:
 On Mon, 2 Dec 2019, Jeff Law wrote:

>> 2019-11-13  Stam Markianos-Wright <
>> stam.markianos-wri...@arm.com>
>>
>>  * real.c (struct arm_bfloat_half_format,
>>  encode_arm_bfloat_half, decode_arm_bfloat_half): New.
>>  * real.h (arm_bfloat_half_format): New.
>>
>>
> Generally OK.  Please consider using "arm_bfloat_half" instead
> of
> "bfloat_half" for the name field in the arm_bfloat_half_format
> structure.  I'm not sure if that's really visible externally,
> but it
>>> Hi both! Agreed that we want to be conservative. See latest diff
>>> attached with the name field change (also pasted below).
>>
>> .Ping :)
> Sorry if I wasn't clear.  WIth the name change I considered this OK for
> the trunk.  Please install on the trunk.
>
> If you don't have commit privs let me know.

Ahh ok gotcha! Sorry I'm new here, and yes, I don't have commit
privileges, yet!



I've committed this on Stams' behalf with r279216.

Thanks,

Kyrill



Cheers,
Stam
>
>
> Jeff
>


Re: [PATCH, GCC/ARM, 1/2] Add support for ASRL(reg) and LSLL(reg) instructions for Armv8.1-M Mainline

2019-12-11 Thread Kyrill Tkachov

Hi Mihail,

On 11/14/19 1:54 PM, Mihail Ionescu wrote:

Hi,

This patch adds the new scalar shift instructions for Armv8.1-M
Mainline to the arm backend.
This patch is adding the following instructions:

ASRL (reg)
LSLL (reg)
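
As a rough illustration of the benefit (a sketch, not from the patch):
a 64-bit shift by a register amount becomes a single instruction on a
core register pair instead of the generic multi-instruction sequence
from arm_emit_coreregs_64bit_shift.  Register allocation is illustrative:

    /* C source.  */
    long long shift_left (long long x, int n) { return x << n; }

    /* Expected Armv8.1-M Mainline code, roughly:
         lsll    r0, r1, r2    @ {r1:r0} <<= r2, an even/odd GPR pair  */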



Sorry for the delay, very busy time for GCC development :(




ChangeLog entries are as follows:

*** gcc/ChangeLog ***


2019-11-14  Mihail-Calin Ionescu 
2019-11-14  Sudakshina Das  

    * config/arm/arm.h (TARGET_MVE): New macro for MVE support.



I don't see this hunk in the patch... There's a lot of v8.1-M-related 
patches in flight. Is it defined elsewhere?



    * config/arm/arm.md (ashldi3): Generate thumb2_lsll for TARGET_MVE.
    (ashrdi3): Generate thumb2_asrl for TARGET_MVE.
    * config/arm/arm.c (arm_hard_regno_mode_ok): Allocate even odd
    register pairs for doubleword quantities for ARMv8.1M-Mainline.
    * config/arm/thumb2.md (thumb2_asrl): New.
    (thumb2_lsll): Likewise.

*** gcc/testsuite/ChangeLog ***

2019-11-14  Mihail-Calin Ionescu 
2019-11-14  Sudakshina Das  

    * gcc.target/arm/armv8_1m-shift-reg_1.c: New test.

Testsuite shows no regression when run for arm-none-eabi targets.

Is this ok for trunk?

Thanks
Mihail


### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
be51df7d14738bc1addeab8ac5a3806778106bce..bf788087a30343269b30cf7054ec29212ad9c572 
100644

--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -24454,14 +24454,15 @@ arm_hard_regno_mode_ok (unsigned int regno, 
machine_mode mode)


   /* We allow almost any value to be stored in the general registers.
  Restrict doubleword quantities to even register pairs in ARM state
- so that we can use ldrd.  Do not allow very large Neon structure
- opaque modes in general registers; they would use too many.  */
+ so that we can use ldrd and Armv8.1-M Mainline instructions.
+ Do not allow very large Neon structure  opaque modes in general
+ registers; they would use too many.  */



This comment now reads:

"Restrict doubleword quantities to even register pairs in ARM state
 so that we can use ldrd and Armv8.1-M Mainline instructions."

Armv8.1-M Mainline is not ARM mode though, so please clarify this 
comment further.


Looks ok to me otherwise (I may even have merged this with the second 
patch, but I'm not complaining about keeping it simple :) )


Thanks,

Kyrill



   if (regno <= LAST_ARM_REGNUM)
 {
   if (ARM_NUM_REGS (mode) > 4)
 return false;

-  if (TARGET_THUMB2)
+  if (TARGET_THUMB2 && !TARGET_HAVE_MVE)
 return true;

   return !(TARGET_LDRD && GET_MODE_SIZE (mode) > 4 && (regno & 1) 
!= 0);

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
a91a4b941c3f9d2c3d443f9f4639069ae953fb3b..b735f858a6a5c94d02a6765c1b349cdcb5e77ee3 
100644

--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -3503,6 +3503,22 @@
    (match_operand:SI 2 "reg_or_int_operand")))]
   "TARGET_32BIT"
   "
+  if (TARGET_HAVE_MVE)
+    {
+  if (!reg_or_int_operand (operands[2], SImode))
+    operands[2] = force_reg (SImode, operands[2]);
+
+  /* Armv8.1-M Mainline double shifts are not expanded.  */
+  if (REG_P (operands[2]))
+   {
+ if (!reg_overlap_mentioned_p(operands[0], operands[1]))
+   emit_insn (gen_movdi (operands[0], operands[1]));
+
+ emit_insn (gen_thumb2_lsll (operands[0], operands[2]));
+ DONE;
+   }
+    }
+
   arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1],
  operands[2], gen_reg_rtx (SImode),
  gen_reg_rtx (SImode));
@@ -3530,6 +3546,16 @@
  (match_operand:SI 2 "reg_or_int_operand")))]
   "TARGET_32BIT"
   "
+  /* Armv8.1-M Mainline double shifts are not expanded.  */
+  if (TARGET_HAVE_MVE && REG_P (operands[2]))
+    {
+  if (!reg_overlap_mentioned_p(operands[0], operands[1]))
+   emit_insn (gen_movdi (operands[0], operands[1]));
+
+  emit_insn (gen_thumb2_asrl (operands[0], operands[2]));
+  DONE;
+    }
+
   arm_emit_coreregs_64bit_shift (ASHIFTRT, operands[0], operands[1],
  operands[2], gen_reg_rtx (SImode),
  gen_reg_rtx (SImode));
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 
c08dab233784bd1cbaae147ece795058d2ef234f..3a716ea954ac55b2081121248b930b7f11520ffa 
100644

--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -1645,3 +1645,19 @@
   }
   [(set_attr "predicable" "yes")]
 )
+
+(define_insn "thumb2_asrl"
+  [(set (match_operand:DI 0 "arm_general_register_operand" "+r")
+   (ashiftrt:DI (match_dup 0)
+    (match_operand:SI 1 
"arm_general_register_operand" "r")))]

+  "TARGET_HAVE_MVE"
+  "asrl%?\\t%Q0, %R0, %1"
+  [(set_attr "predicable" "yes")])
+
+(define_insn "thumb2

Re: [PATCH, GCC/ARM, 2/2] Add support for ASRL(imm), LSLL(imm) and LSRL(imm) instructions for Armv8.1-M Mainline

2019-12-11 Thread Kyrill Tkachov

Hi Mihail,

On 11/14/19 1:54 PM, Mihail Ionescu wrote:

Hi,

This is part of a series of patches where I am trying to add new
instructions for Armv8.1-M Mainline to the arm backend.
This patch is adding the following instructions:

ASRL (imm)
LSLL (imm)
LSRL (imm)
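
For illustration (a sketch, not from the patch): with the new Pg
constraint an immediate shift count in the range 1 to 32 maps directly
onto the long-shift instructions:

    /* C source.  */
    unsigned long long shr3 (unsigned long long x) { return x >> 3; }

    /* Expected code, roughly:
         lsrl    r0, r1, #3    @ immediate must satisfy constraint Pg  */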


ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2019-11-14  Mihail-Calin Ionescu 
2019-11-14  Sudakshina Das  

    * config/arm/arm.md (ashldi3): Generate thumb2_lsll for both reg
    and valid immediate.
    (ashrdi3): Generate thumb2_asrl for both reg and valid immediate.
    (lshrdi3): Generate thumb2_lsrl for valid immediates.
    * config/arm/constraints.md (Pg): New.
    * config/arm/predicates.md (long_shift_imm): New.
    (arm_reg_or_long_shift_imm): Likewise.
    * config/arm/thumb2.md (thumb2_asrl): New immediate alternative.
    (thumb2_lsll): Likewise.
    (thumb2_lsrl): New.

*** gcc/testsuite/ChangeLog ***

2019-11-14  Mihail-Calin Ionescu 
2019-11-14  Sudakshina Das  

    * gcc.target/arm/armv8_1m-shift-imm_1.c: New test.

Testsuite shows no regression when run for arm-none-eabi targets.

Is this ok for trunk?



This is ok once the prerequisites are in.

Thanks,

Kyrill



Thanks
Mihail


### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
b735f858a6a5c94d02a6765c1b349cdcb5e77ee3..82f4a5573d43925fb7638b9078a06699df38f88c 
100644

--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -3509,8 +3509,8 @@
 operands[2] = force_reg (SImode, operands[2]);

   /* Armv8.1-M Mainline double shifts are not expanded.  */
-  if (REG_P (operands[2]))
-   {
+  if (arm_reg_or_long_shift_imm (operands[2], GET_MODE 
(operands[2])))

+    {
   if (!reg_overlap_mentioned_p(operands[0], operands[1]))
 emit_insn (gen_movdi (operands[0], operands[1]));

@@ -3547,7 +3547,8 @@
   "TARGET_32BIT"
   "
   /* Armv8.1-M Mainline double shifts are not expanded.  */
-  if (TARGET_HAVE_MVE && REG_P (operands[2]))
+  if (TARGET_HAVE_MVE
+  && arm_reg_or_long_shift_imm (operands[2], GET_MODE (operands[2])))
 {
   if (!reg_overlap_mentioned_p(operands[0], operands[1]))
 emit_insn (gen_movdi (operands[0], operands[1]));
@@ -3580,6 +3581,17 @@
  (match_operand:SI 2 "reg_or_int_operand")))]
   "TARGET_32BIT"
   "
+  /* Armv8.1-M Mainline double shifts are not expanded.  */
+  if (TARGET_HAVE_MVE
+    && long_shift_imm (operands[2], GET_MODE (operands[2])))
+    {
+  if (!reg_overlap_mentioned_p(operands[0], operands[1]))
+    emit_insn (gen_movdi (operands[0], operands[1]));
+
+  emit_insn (gen_thumb2_lsrl (operands[0], operands[2]));
+  DONE;
+    }
+
   arm_emit_coreregs_64bit_shift (LSHIFTRT, operands[0], operands[1],
  operands[2], gen_reg_rtx (SImode),
  gen_reg_rtx (SImode));
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 
b76de81b85c8ce7a2ca484a750b908b7ca64600a..d807818c8499a6a65837f1ed0487e45947f68199 
100644

--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -35,7 +35,7 @@
 ;;   Dt, Dp, Dz, Tu
 ;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
 ;; in Thumb-2 state: Ha, Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py, Pz
-;; in all states: Pf
+;; in all states: Pf, Pg

 ;; The following memory constraints have been used:
 ;; in ARM/Thumb-2 state: Uh, Ut, Uv, Uy, Un, Um, Us
@@ -187,6 +187,11 @@
 && !is_mm_consume (memmodel_from_int (ival))
 && !is_mm_release (memmodel_from_int (ival))")))

+(define_constraint "Pg"
+  "@internal In Thumb-2 state a constant in range 1 to 32"
+  (and (match_code "const_int")
+   (match_test "TARGET_THUMB2 && ival >= 1 && ival <= 32")))
+
 (define_constraint "Ps"
   "@internal In Thumb-2 state a constant in the range -255 to +255"
   (and (match_code "const_int")
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 
69c10c06ff405e19efa172217a08a512c66cb902..ef5b0303d4424981347287865efb3cca85e56f36 
100644

--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -322,6 +322,15 @@
   && (UINTVAL (XEXP (op, 1)) < 32)")))
    (match_test "mode == GET_MODE (op)")))

+;; True for Armv8.1-M Mainline long shift instructions.
+(define_predicate "long_shift_imm"
+  (match_test "satisfies_constraint_Pg (op)"))
+
+(define_predicate "arm_reg_or_long_shift_imm"
+  (ior (match_test "TARGET_THUMB2
+   && arm_general_register_operand (op, GET_MODE (op))")
+   (match_test "satisfies_constraint_Pg (op)")))
+
 ;; True for MULT, to identify which variant of shift_operator is in use.
 (define_special_predicate "mult_operator"
   (match_code "mult"))
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 
3a716ea954ac55b20811

Re: [PATCH][ARM][GCC][0/x]: Support for MVE ACLE intrinsics.

2019-12-12 Thread Kyrill Tkachov

Hi Srinath,

On 11/14/19 7:12 PM, Srinath Parvathaneni wrote:

Hello,

This patches series is to support Arm MVE ACLE intrinsics.

Please refer to Arm reference manual [1] and MVE intrinsics [2] for 
more details.

Please refer to Chapter 13 MVE ACLE [3] for MVE intrinsics concepts.

This patch series depends on upstream patches "Armv8.1-M Mainline 
Security Extension" [4],
"CLI and multilib support for Armv8.1-M Mainline MVE extensions" [5] 
and "support for Armv8.1-M

Mainline scalar shifts" [6].

[1] 
https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf?_ga=2.102521798.659307368.1572453718-1501600630.1548848914
[2] 
https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics
[3] 
https://static.docs.arm.com/101028/0009/Q3-ACLE_2019Q3_release-0009.pdf?_ga=2.239684871.588348166.1573726994-1501600630.1548848914

[4] https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01654.html
[5] https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00641.html
[6] https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01194.html

Srinath Parvathaneni(38):
[PATCH][ARM][GCC][1/x]: MVE ACLE intrinsics framework patch.
[PATCH][ARM][GCC][2/x]: MVE ACLE intrinsics framework patch.
[PATCH][ARM][GCC][3/x]: MVE ACLE intrinsics framework patch.
[PATCH][ARM][GCC][4/x]: MVE ACLE vector interleaving store intrinsics.
[PATCH][ARM][GCC][1/1x]: Patch to support MVE ACLE intrinsics with 
unary operand.

[PATCH][ARM][GCC][2/1x]: MVE intrinsics with unary operand.
[PATCH][ARM][GCC][3/1x]: MVE intrinsics with unary operand.
[PATCH][ARM][GCC][4/1x]: MVE intrinsics with unary operand.
[PATCH][ARM][GCC][1/2x]: MVE intrinsics with binary operands.
[PATCH][ARM][GCC][2/2x]: MVE intrinsics with binary operands.
[PATCH][ARM][GCC][3/2x]: MVE intrinsics with binary operands.
[PATCH][ARM][GCC][4/2x]: MVE intrinsics with binary operands.
[PATCH][ARM][GCC][5/2x]: MVE intrinsics with binary operands.
[PATCH][ARM][GCC][1/3x]: MVE intrinsics with ternary operands.
[PATCH][ARM][GCC][2/3x]: MVE intrinsics with ternary operands.
[PATCH][ARM][GCC][3/3x]: MVE intrinsics with ternary operands.
[PATCH][ARM][GCC][1/4x]: MVE intrinsics with quaternary operands.
[PATCH][ARM][GCC][2/4x]: MVE intrinsics with quaternary operands.
[PATCH][ARM][GCC][3/4x]: MVE intrinsics with quaternary operands.
[PATCH][ARM][GCC][4/4x]: MVE intrinsics with quaternary operands.
[PATCH][ARM][GCC][1/5x]: MVE store intrinsics.
[PATCH][ARM][GCC][2/5x]: MVE load intrinsics.
[PATCH][ARM][GCC][3/5x]: MVE store intrinsics with predicated suffix.
[PATCH][ARM][GCC][4/5x]: MVE load intrinsics with zero(_z) suffix.
[PATCH][ARM][GCC][5/5x]: MVE ACLE load intrinsics which load a byte, 
halfword, or word from memory.
[PATCH][ARM][GCC][6/5x]: Remaining MVE load intrinsics which loads 
half word and word or double word from memory.
[PATCH][ARM][GCC][7/5x]: MVE store intrinsics which stores byte,half 
word or word to memory.
[PATCH][ARM][GCC][8/5x]: Remaining MVE store intrinsics which stores 
an half word, word and double word to memory.
[PATCH][ARM][GCC][6x]:MVE ACLE vaddq intrinsics using arithmetic plus 
operator.

[PATCH][ARM][GCC][7x]: MVE vreinterpretq and vuninitializedq intrinsics.
[PATCH][ARM][GCC][1/8x]: MVE ACLE vidup, vddup, viwdup and vdwdup 
intrinsics with writeback.
[PATCH][ARM][GCC][2/8x]: MVE ACLE gather load and scatter store 
intrinsics with writeback.
[PATCH][ARM][GCC][9x]: MVE ACLE predicated intrinsics with (dont-care) 
variant.
[PATCH][ARM][GCC][10x]: MVE ACLE intrinsics "add with carry across 
beats" and "beat-wise substract".
[PATCH][ARM][GCC][11x]: MVE ACLE vector interleaving store and 
deinterleaving load intrinsics and also aliases to vstr and vldr 
intrinsics.

[PATCH][ARM][GCC][12x]: MVE ACLE intrinsics to set and get vector lane.
[PATCH][ARM][GCC][13x]: MVE ACLE scalar shift intrinsics.
[PATCH][ARM][GCC][14x]: MVE ACLE whole vector left shift with carry 
intrinsics.



Thank you for working on these.

I will reply to individual patches with more targeted comments.

As this is a fairly large amount of code, here's my high-level view:

The MVE intrinsics spec has more complexities than the Neon intrinsics one:

* It needs support for both the user-namespace versions, and the __arm_* 
ones.


* There are also overloaded forms that in C are implemented using _Generic.

The above two facts make for a rather bulky and messy arm_mve.h 
implementation.


In the case of the _Generic usage we hit the performance problems 
reported in PR c/91937.
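
For readers who have not looked inside arm_mve.h, the overloaded forms
are resolved with C11 _Generic along these lines (a minimal sketch, not
the actual header contents; the type and function names are
illustrative):

  #define vaddq(a, b) _Generic ((a),       \
      int8x16_t:  vaddq_s8,                \
      int16x8_t:  vaddq_s16,               \
      uint32x4_t: vaddq_u32) ((a), (b))

Each polymorphic intrinsic needs such a selection, often over dozens of
types, which is what makes the header large and slow to parse.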


Ideally, I'd like to see the frontend parts of these intrinsics 
implemented in a similar way to the SVE ACLE 
(https://gcc.gnu.org/ml/gcc-patches/2019-10/msg00413.html)


i.e. have the compiler inject the right functions into the language and 
do overload resolution through the appropriate hooks, thus keeping the 
(unavoidable) complexity in the backend rather than arm_mve.h


That being said, this is a major feature that I would very much like to 
see in GCC 10 and the current implementation, outside of the new .md 
f

Re: [PATCH 3/X] [libsanitizer] Add option to bootstrap using HWASAN

2019-12-12 Thread Kyrill Tkachov

Hi Matthew,

Martin is the authority on this but I have a small comment inline...

On 12/12/19 3:19 PM, Matthew Malcomson wrote:

This is an analogous option to --bootstrap-asan to configure.  It allows
bootstrapping GCC using HWASAN.

For the same reasons as for ASAN we have to avoid using the HWASAN
sanitizer when compiling libiberty and the lto-plugin.

Also add a function to query whether -fsanitize=hwaddress has been
passed.
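
For context, a sketch of the intended use, assuming an AArch64 host with
a kernel that supports the tagged-address ABI (the build-config name is
from this patch; the rest is the standard bootstrap procedure):

    $ mkdir build && cd build
    $ ../configure --with-build-config=bootstrap-hwasan
    $ make && make -k check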

ChangeLog:

2019-08-29  Matthew Malcomson 

    * configure: Regenerate.
    * configure.ac: Add --bootstrap-hwasan option.

config/ChangeLog:

2019-12-12  Matthew Malcomson 

    * bootstrap-hwasan.mk: New file.

libiberty/ChangeLog:

2019-12-12  Matthew Malcomson 

    * configure: Regenerate.
    * configure.ac: Avoid using sanitizer.

lto-plugin/ChangeLog:

2019-12-12  Matthew Malcomson 

    * Makefile.am: Avoid using sanitizer.
    * Makefile.in: Regenerate.



### Attachment also inlined for ease of reply    
###



diff --git a/config/bootstrap-hwasan.mk b/config/bootstrap-hwasan.mk
new file mode 100644
index 
..4f60bed3fd6e98b47a3a38aea6eba2a7c320da25

--- /dev/null
+++ b/config/bootstrap-hwasan.mk
@@ -0,0 +1,8 @@
+# This option enables -fsanitize=hwaddress for stage2 and stage3.
+
+STAGE2_CFLAGS += -fsanitize=hwaddress
+STAGE3_CFLAGS += -fsanitize=hwaddress
+POSTSTAGE1_LDFLAGS += -fsanitize=hwaddress -static-libhwasan \
+ -B$$r/prev-$(TARGET_SUBDIR)/libsanitizer/ \
+ -B$$r/prev-$(TARGET_SUBDIR)/libsanitizer/hwasan/ \
+ -B$$r/prev-$(TARGET_SUBDIR)/libsanitizer/hwasan/.libs
diff --git a/configure b/configure
index 
aec9186b2b0123d3088b69eb1ee541567654953e..6f71b111bd18ec053180beecf83dd4549e83c2b9 
100755

--- a/configure
+++ b/configure
@@ -7270,7 +7270,7 @@ fi
 # or bootstrap-ubsan, bootstrap it.
 if echo " ${target_configdirs} " | grep " libsanitizer " > /dev/null 
2>&1; then

   case "$BUILD_CONFIG" in
-    *bootstrap-asan* | *bootstrap-ubsan* )
+    *bootstrap-hwasan* | *bootstrap-asan* | *bootstrap-ubsan* )
bootstrap_target_libs=${bootstrap_target_libs}target-libsanitizer,
   bootstrap_fixincludes=yes
   ;;
diff --git a/configure.ac b/configure.ac
index 
b8ce2ad20b9d03e42731252a9ec2a8417c13e566..16bfdf164555dad94c789f17b6a63ba1a2e3e9f4 
100644

--- a/configure.ac
+++ b/configure.ac
@@ -2775,7 +2775,7 @@ fi
 # or bootstrap-ubsan, bootstrap it.
 if echo " ${target_configdirs} " | grep " libsanitizer " > /dev/null 
2>&1; then

   case "$BUILD_CONFIG" in
-    *bootstrap-asan* | *bootstrap-ubsan* )
+    *bootstrap-hwasan* | *bootstrap-asan* | *bootstrap-ubsan* )
bootstrap_target_libs=${bootstrap_target_libs}target-libsanitizer,
   bootstrap_fixincludes=yes
   ;;
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 
6c9579bfaff955eb43875b404fb7db1a667bf522..da9a8809c3440827ac22ef6936e080820197f4e7 
100644

--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -2645,6 +2645,13 @@ Some examples of build configurations designed 
for developers of GCC are:
 Compiles GCC itself using Address Sanitization in order to catch 
invalid memory

 accesses within the GCC code.

+@item @samp{bootstrap-hwasan}
+Compiles GCC itself using HWAddress Sanitization in order to catch 
invalid
+memory accesses within the GCC code.  This option is only available 
on AArch64

+targets with a very recent linux kernel (5.4 or later).




Using terms like "very recent" in documentation is discouraged. It won't
be very recent in a couple of years' time and I doubt any of us will
remember to come back and update this snippet :)


I suggest something like "this option requires a Linux kernel that
supports the right ABI (5.4 or later)".


Thanks,

Kyrill




+
+@end table
+
 @section Building a cross compiler

 When building a cross compiler, it is not generally possible to do a
diff --git a/libiberty/configure b/libiberty/configure
index 
7a34dabec32b0b383bd33f07811757335f4dd39c..cb2dd4ff5295598343cc18b3a79a86a778f2261d 
100755

--- a/libiberty/configure
+++ b/libiberty/configure
@@ -5261,6 +5261,7 @@ fi
 NOASANFLAG=
 case " ${CFLAGS} " in
   *\ -fsanitize=address\ *) NOASANFLAG=-fno-sanitize=address ;;
+  *\ -fsanitize=hwaddress\ *) NOASANFLAG=-fno-sanitize=hwaddress ;;
 esac


diff --git a/libiberty/configure.ac b/libiberty/configure.ac
index 
f1ce76010c9acde79c5dc46686a78b2e2f19244e..043237628b79cbf37d07359b59c5ffe17a7a22ef 
100644

--- a/libiberty/configure.ac
+++ b/libiberty/configure.ac
@@ -240,6 +240,7 @@ AC_SUBST(PICFLAG)
 NOASANFLAG=
 case " ${CFLAGS} " in
   *\ -fsanitize=address\ *) NOASANFLAG=-fno-sanitize=address ;;
+  *\ -fsanitize=hwaddress\ *) NOASANFLAG=-fno-sanitize=hwaddress ;;
 esac
 AC_SUBST(NOASANFLAG)

diff --git a/lto-plugin/Makefile.am b/lto-plugin/Makefile.am
index 
28dc21014b2e86988fa88adabd63ce6092e18e02..34aa397d785e3cc9b6975de460d065900364c3ff 
100644

--- a/lto-plugin/Makefile.am
+++ b/lto-plugin/Makefile.am
@@ -11,8 +11,8 @@ AM_CPPFLAGS = -I$(

Re: [PATCH] [AARCH64] Improve vector generation cost model

2019-12-13 Thread Kyrill Tkachov

Hi Andrew,

On 3/15/19 1:18 AM, apin...@marvell.com wrote:

From: Andrew Pinski 

Hi,
  On OcteonTX2, ld1r and ld1 (with a single lane) are split
into two different micro-ops unlike most other targets.
This adds three extra costs to the cost table:
ld1_dup: used for "ld1r {v0.4s}, [x0]"
merge_dup: used for "dup v0.4s, v0.4s[0]" and "ins v0.4s[0], v0.4s[0]"
ld1_merge: used for "ld1 {v0.4s}[0], [x0]"
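
To make the mapping to RTL concrete (a schematic sketch, not from the
patch):

    ;; ld1r {v0.4s}, [x0]    <-> (vec_duplicate:V4SI (mem:SI (reg:DI x0)))
    ;;                           -> ld1_dup cost
    ;; ld1  {v0.s}[0], [x0]  <-> (vec_merge:V4SI
    ;;                             (vec_duplicate:V4SI (mem:SI ...)) ...)
    ;;                           -> ld1_merge cost
    ;; dup  v0.4s, v1.s[0]   <-> vec_duplicate of a register
    ;;                           -> merge_dup cost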

OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.



Sorry for the slow reply, missed it on gcc-patches :(



Thanks,
Andrew Pinski

ChangeLog:
* config/arm/aarch-common-protos.h (vector_cost_table):
Add merge_dup, ld1_merge, and ld1_dup.
* config/aarch64/aarch64-cost-tables.h (qdf24xx_extra_costs):
Update for the new fields.
(thunderx_extra_costs): Likewise.
(thunderx2t99_extra_costs): Likewise.
(tsv110_extra_costs): Likewise.
* config/arm/aarch-cost-tables.h (generic_extra_costs): Likewise.
(cortexa53_extra_costs): Likewise.
(cortexa57_extra_costs): Likewise.
(exynosm1_extra_costs): Likewise.
(xgene1_extra_costs): Likewise.
* config/aarch64/aarch64.c (aarch64_rtx_costs): Handle vec_dup of a
memory.  Handle vec_merge of a memory.

Signed-off-by: Andrew Pinski 
---
 gcc/config/aarch64/aarch64-cost-tables.h | 20 +++
 gcc/config/aarch64/aarch64.c | 22 +
 gcc/config/arm/aarch-common-protos.h |  3 +++
 gcc/config/arm/aarch-cost-tables.h   | 25 +++-
 4 files changed, 61 insertions(+), 9 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-cost-tables.h 
b/gcc/config/aarch64/aarch64-cost-tables.h

index 5c9442e1b89..9a7c70ba595 100644
--- a/gcc/config/aarch64/aarch64-cost-tables.h
+++ b/gcc/config/aarch64/aarch64-cost-tables.h
@@ -123,7 +123,10 @@ const struct cpu_cost_table qdf24xx_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)  /* alu.  */
+    COSTS_N_INSNS (1),  /* Alu.  */
+    COSTS_N_INSNS (1), /* dup_merge.  */
+    COSTS_N_INSNS (1), /* ld1_merge.  */
+    COSTS_N_INSNS (1)  /* ld1_dup.  */
   }
 };

@@ -227,7 +230,10 @@ const struct cpu_cost_table thunderx_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)  /* Alu.  */
+    COSTS_N_INSNS (1), /* Alu.  */
+    COSTS_N_INSNS (1), /* dup_merge.  */
+    COSTS_N_INSNS (1), /* ld1_merge.  */
+    COSTS_N_INSNS (1)  /* ld1_dup.  */
   }
 };

@@ -330,7 +336,10 @@ const struct cpu_cost_table 
thunderx2t99_extra_costs =

   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)  /* Alu.  */
+    COSTS_N_INSNS (1), /* Alu.  */
+    COSTS_N_INSNS (1), /* dup_merge.  */
+    COSTS_N_INSNS (1), /* ld1_merge.  */
+    COSTS_N_INSNS (1)  /* ld1_dup.  */
   }
 };

@@ -434,7 +443,10 @@ const struct cpu_cost_table tsv110_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)  /* alu.  */
+    COSTS_N_INSNS (1), /* Alu.  */
+    COSTS_N_INSNS (1), /* dup_merge.  */
+    COSTS_N_INSNS (1), /* ld1_merge.  */
+    COSTS_N_INSNS (1)  /* ld1_dup.  */
   }
 };

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b38505b0872..dc4d3d39af8 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10568,6 +10568,28 @@ cost_plus:
 }
   break;

+    case VEC_DUPLICATE:
+  if (!speed)
+   return false;



If I read the code right, before this patch we would be returning true 
for !speed i.e. not recursing.


Do we want to trigger a recursion now?



+
+  if (GET_CODE (XEXP (x, 0)) == MEM)
+   *cost += extra_cost->vect.ld1_dup;



Please use MEM_P here.



+  else
+   *cost += extra_cost->vect.merge_dup;
+  return true;
+
+    case VEC_MERGE:
+  if (speed && GET_CODE (XEXP (x, 0)) == VEC_DUPLICATE)
+   {
+ if (GET_CODE (XEXP (XEXP (x, 0), 0)) == MEM)



And here.

Thanks,

Kyrill



+   *cost += extra_cost->vect.ld1_merge;
+ else
+   *cost += extra_cost->vect.merge_dup;
+ return true;
+   }
+  break;
+
+
 case TRUNCATE:

   /* Decompose muldi3_highpart.  */
diff --git a/gcc/config/arm/aarch-common-protos.h 
b/gcc/config/arm/aarch-common-protos.h

index 11cd5145bbc..dbc1282402a 100644
--- a/gcc/config/arm/aarch-common-protos.h
+++ b/gcc/config/arm/aarch-common-protos.h
@@ -131,6 +131,9 @@ struct fp_cost_table
 struct vector_cost_table
 {
   const int alu;
+  const int merge_dup;
+  const int ld1_merge;
+  const int ld1_dup;
 };

 struct cpu_cost_table
diff --git a/gcc/config/arm/aarch-cost-tables.h 
b/gcc/config/arm/aarch-cost-tables.h

index bc33efadc6c..a51bc668f56 100644
--- a/gcc/config/arm/aarch-cost-tables.h
+++ b/gcc/config/arm/aarch-cost-tables.h
@@ -121,7 +121,10 @@ const struct cpu_cost_table generic_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)  /* alu.  */
+    COSTS_N_INSNS (1),  /* alu.  */
+    COSTS_N_INSNS (1), /* dup_merge.  */
+    COSTS_N_INSNS (1), /* ld1_merge.  */
+    COSTS_N_INSNS (1)  /* ld1_dup.  */
   }
 };

@@ -224,7 +227,10 @@ const struct cpu_cost_table cortexa53_

Re: [PATCH, GCC/ARM, 2/10] Add command line support for Armv8.1-M Mainline

2019-12-17 Thread Kyrill Tkachov

Hi Mihail,

On 12/16/19 6:28 PM, Mihail Ionescu wrote:

Hi Kyrill

On 11/06/2019 03:59 PM, Kyrill Tkachov wrote:

Hi Mihail,

On 11/4/19 4:49 PM, Kyrill Tkachov wrote:

Hi Mihail,

On 10/23/19 10:26 AM, Mihail Ionescu wrote:
> [PATCH, GCC/ARM, 2/10] Add command line support
>
> Hi,
>
> === Context ===
>
> This patch is part of a patch series to add support for Armv8.1-M
> Mainline Security Extensions architecture. Its purpose is to add
> command-line support for that new architecture.
>
> === Patch description ===
>
> Besides the expected enabling of the new value for the -march
> command-line option (-march=armv8.1-m.main) and its extensions (see
> below), this patch disables support of the Security Extensions for 
this

> newly added architecture. This is done both by not including the cmse
> bit in the architecture description and by throwing an error message
> when user request Armv8.1-M Mainline Security Extensions. Note that
> Armv8-M Baseline and Mainline Security Extensions are still enabled.
>
> Only extensions for already supported instructions are implemented in
> this patch. Other extensions (MVE integer and float) will be added in
> separate patches. The following configurations are allowed for 
Armv8.1-M

> Mainline with regard to the FPU and implemented in this patch:
> + no FPU (+nofp)
> + single precision VFPv5 with FP16 (+fp)
> + double precision VFPv5 with FP16 (+fp.dp)
>
> ChangeLog entries are as follows:
>
> *** gcc/ChangeLog ***
>
> 2019-10-23  Mihail-Calin Ionescu 
> 2019-10-23  Thomas Preud'homme 
>
>     * config/arm/arm-cpus.in (armv8_1m_main): New feature.
>     (ARMv4, ARMv4t, ARMv5t, ARMv5te, ARMv5tej, ARMv6, ARMv6j, 
ARMv6k,
>     ARMv6z, ARMv6kz, ARMv6zk, ARMv6t2, ARMv6m, ARMv7, ARMv7a, 
ARMv7ve,
>     ARMv7r, ARMv7m, ARMv7em, ARMv8a, ARMv8_1a, ARMv8_2a, 
ARMv8_3a,
>     ARMv8_4a, ARMv8_5a, ARMv8m_base, ARMv8m_main, ARMv8r): 
Reindent.

>     (ARMv8_1m_main): New feature group.
>     (armv8.1-m.main): New architecture.
>     * config/arm/arm-tables.opt: Regenerate.
>     * config/arm/arm.c (arm_arch8_1m_main): Define and default
> initialize.
>     (arm_option_reconfigure_globals): Initialize 
arm_arch8_1m_main.
>     (arm_options_perform_arch_sanity_checks): Error out when 
targeting

>     Armv8.1-M Mainline Security Extensions.
>     * config/arm/arm.h (arm_arch8_1m_main): Declare.
>
> *** gcc/testsuite/ChangeLog ***
>
> 2019-10-23  Mihail-Calin Ionescu 
> 2019-10-23  Thomas Preud'homme 
>
>     * lib/target-supports.exp
> (check_effective_target_arm_arch_v8_1m_main_ok): Define.
>     (add_options_for_arm_arch_v8_1m_main): Likewise.
> (check_effective_target_arm_arch_v8_1m_main_multilib): Likewise.
>
> Testing: bootstrapped on arm-linux-gnueabihf and arm-none-eabi; 
testsuite

> shows no regression.
>
> Is this ok for trunk?
>
Ok.



Something that I remembered last night upon reflection...

New command-line options (or arguments to them) need documentation in 
invoke.texi.


Please add some either as part of this patch or as a separate patch 
if you prefer.



I've added the missing cli options in invoke.texi.

Here's the updated ChangeLog:

2019-12-06  Mihail-Calin Ionescu  
2019-12-16  Thomas Preud'homme  

* config/arm/arm-cpus.in (armv8_1m_main): New feature.
(ARMv4, ARMv4t, ARMv5t, ARMv5te, ARMv5tej, ARMv6, ARMv6j, ARMv6k,
ARMv6z, ARMv6kz, ARMv6zk, ARMv6t2, ARMv6m, ARMv7, ARMv7a, ARMv7ve,
ARMv7r, ARMv7m, ARMv7em, ARMv8a, ARMv8_1a, ARMv8_2a, ARMv8_3a,
ARMv8_4a, ARMv8_5a, ARMv8m_base, ARMv8m_main, ARMv8r): Reindent.
(ARMv8_1m_main): New feature group.
(armv8.1-m.main): New architecture.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm.c (arm_arch8_1m_main): Define and default initialize.
(arm_option_reconfigure_globals): Initialize arm_arch8_1m_main.
(arm_options_perform_arch_sanity_checks): Error out when targeting
Armv8.1-M Mainline Security Extensions.
* config/arm/arm.h (arm_arch8_1m_main): Declare.
* doc/invoke.texi: Document armv8.1-m.main.

*** gcc/testsuite/ChangeLog ***

2019-12-16  Mihail-Calin Ionescu  
2019-12-16  Thomas Preud'homme  

* lib/target-supports.exp
(check_effective_target_arm_arch_v8_1m_main_ok): Define.
(add_options_for_arm_arch_v8_1m_main): Likewise.
(check_effective_target_arm_arch_v8_1m_main_multilib): Likewise.



Thanks, this is ok.

Kyrill




Regards,
Mihail


Thanks,

Kyrill



Thanks,

Kyrill


> Best regards,
>
> Mihail
>
>
> ### Attachment also inlined for ease of reply
> ###
>
>
> diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
> index
> 
f8a3b3db67a537163bfe787d78c8f2edc4253ab3..652f2a

Re: [PATCH, GCC/ARM, 3/10] Save/restore FPCXTNS in nsentry functions

2019-12-17 Thread Kyrill Tkachov

Hi Mihail,

On 12/16/19 6:29 PM, Mihail Ionescu wrote:


Hi Kyrill,

On 11/06/2019 04:12 PM, Kyrill Tkachov wrote:

Hi Mihail,

On 10/23/19 10:26 AM, Mihail Ionescu wrote:

[PATCH, GCC/ARM, 3/10] Save/restore FPCXTNS in nsentry functions

Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to enable
saving/restoring of nonsecure FP context in functions with the
cmse_nonsecure_entry attribute.

=== Motivation ===

In Armv8-M Baseline and Mainline, the FP context is cleared on return from
nonsecure entry functions.  This means the FP context might change when
calling a nonsecure entry function.  This patch uses the new VLDR and
VSTR instructions available in Armv8.1-M Mainline to save/restore the FP
context when calling a nonsecure entry function from nonsecure code.

=== Patch description ===

This patch consists mainly of creating 2 new instruction patterns to
push and pop special FP registers via vldm and vstr and using them in
prologue and epilogue. The patterns are defined as push/pop with an
unspecified operation on the memory accessed, with an unspecified
constant indicating what special FP register is being saved/restored.

Other aspects of the patch include:
  * defining the set of special registers that can be saved/restored and
    their names
  * reserving space in the stack frames for these push/pop
  * preventing return via pop
  * guarding the clearing of FPSCR to target architecture not having
    Armv8.1-M Mainline instructions.
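
Schematically, the new push/pop patterns bracket a cmse_nonsecure_entry
function like this (a sketch; offsets and ordering are illustrative):

    foo:
        vstr    FPCXTNS, [sp, #-4]!   @ save nonsecure FP context
        ...                           @ function body
        vldr    FPCXTNS, [sp], #4     @ restore it before returning
        bxns    lr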

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * config/arm/arm.c (fp_sysreg_names): Declare and define.
    (use_return_insn): Also return false for Armv8.1-M Mainline.
    (output_return_instruction): Skip FPSCR clearing if Armv8.1-M
    Mainline instructions are available.
    (arm_compute_frame_layout): Allocate space in frame for FPCXTNS
    when targeting Armv8.1-M Mainline Security Extensions.
    (arm_expand_prologue): Save FPCXTNS if this is an Armv8.1-M
    Mainline entry function.
    (cmse_nonsecure_entry_clear_before_return): Clear IP and r4 if
    targeting Armv8.1-M Mainline or successor.
    (arm_expand_epilogue): Fix indentation of caller-saved register
    clearing.  Restore FPCXTNS if this is an Armv8.1-M Mainline
    entry function.
    * config/arm/arm.h (TARGET_HAVE_FP_CMSE): New macro.
    (FP_SYSREGS): Likewise.
    (enum vfp_sysregs_encoding): Define enum.
    (fp_sysreg_names): Declare.
    * config/arm/unspecs.md (VUNSPEC_VSTR_VLDR): New volatile unspec.
    * config/arm/vfp.md (push_fpsysreg_insn): New define_insn.
    (pop_fpsysreg_insn): Likewise.

*** gcc/testsuite/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * gcc.target/arm/cmse/bitfield-1.c: Add checks for VSTR and VLDR.
    * gcc.target/arm/cmse/bitfield-2.c: Likewise.
    * gcc.target/arm/cmse/bitfield-3.c: Likewise.
    * gcc.target/arm/cmse/cmse-1.c: Likewise.
    * gcc.target/arm/cmse/struct-1.c: Likewise.
    * gcc.target/arm/cmse/cmse.exp: Run existing Armv8-M Mainline tests
    from mainline/8m subdirectory and new Armv8.1-M Mainline tests from
    mainline/8_1m subdirectory.
    * gcc.target/arm/cmse/mainline/bitfield-4.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/bitfield-4.c: This.
    * gcc.target/arm/cmse/mainline/bitfield-5.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/bitfield-5.c: This.
    * gcc.target/arm/cmse/mainline/bitfield-6.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/bitfield-6.c: This.
    * gcc.target/arm/cmse/mainline/bitfield-7.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/bitfield-7.c: This.
    * gcc.target/arm/cmse/mainline/bitfield-8.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/bitfield-8.c: This.
    * gcc.target/arm/cmse/mainline/bitfield-9.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/bitfield-9.c: This.
    * gcc.target/arm/cmse/mainline/bitfield-and-union-1.c: Move and rename
    into ...
    * gcc.target/arm/cmse/mainline/8m/bitfield-and-union.c: This.
    * gcc.target/arm/cmse/mainline/hard-sp/cmse-13.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/hard-sp/cmse-13.c: This.  Clean up
    dg-skip-if directive for float ABI.
    * gcc.target/arm/cmse/mainline/hard-sp/cmse-5.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/hard-sp/cmse-5.c: This.  Clean up
    dg-skip-if directive for float ABI.
    * gcc.target/arm/cmse/mainline/hard-sp/cmse-7.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/hard-sp/cmse-7.c: This.  Clean up
    dg-skip-if directive for float ABI.
    * gcc.target/arm/cmse/mainline/hard-sp/cmse-8.c:

Re: [PATCH, GCC/ARM, 4/10] Clear GPR with CLRM

2019-12-17 Thread Kyrill Tkachov

Hi Mihail,

On 12/16/19 6:29 PM, Mihail Ionescu wrote:

Hi Kyrill,

On 11/12/2019 09:55 AM, Kyrill Tkachov wrote:

Hi Mihail,

On 10/23/19 10:26 AM, Mihail Ionescu wrote:

[PATCH, GCC/ARM, 4/10] Clear GPR with CLRM

Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to improve
code density of functions with the cmse_nonsecure_entry attribute and
when calling functions with the cmse_nonsecure_call attribute by using
CLRM to do all the general-purpose register clearing as well as
clearing the APSR register.
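
For illustration (a sketch; the register list shown is made up): the
per-register clearing sequence collapses into a single instruction such
as

    clrm    {r1, r2, r3, ip, APSR}    @ zeroes the listed GPRs and APSR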

=== Patch description ===

This patch adds a new pattern for the CLRM instruction and guards the
current clearing code in output_return_instruction() and thumb_exit()
on Armv8.1-M Mainline instructions not being present.
cmse_clear_registers () is then modified to use the new CLRM 
instruction

when targeting Armv8.1-M Mainline while keeping Armv8-M register
clearing code for VFP registers.

For the CLRM instruction, which does not mandate APSR in the register
list, checking whether it is the right volatile unspec or a clearing
register is done in clear_operation_p.

Note that load/store multiple were deemed sufficiently different in
terms of RTX structure compared to the CLRM pattern for a different
function to be used to validate the match_parallel.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * config/arm/arm-protos.h (clear_operation_p): Declare.
    * config/arm/arm.c (clear_operation_p): New function.
    (cmse_clear_registers): Generate clear_multiple instruction pattern if
    targeting Armv8.1-M Mainline or successor.
    (output_return_instruction): Only output APSR register clearing if
    Armv8.1-M Mainline instructions not available.
    (thumb_exit): Likewise.
    * config/arm/predicates.md (clear_multiple_operation): New predicate.
    * config/arm/thumb2.md (clear_apsr): New define_insn.
    (clear_multiple): Likewise.
    * config/arm/unspecs.md (VUNSPEC_CLRM_APSR): New volatile unspec.


*** gcc/testsuite/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * gcc.target/arm/cmse/bitfield-1.c: Add check for CLRM.
    * gcc.target/arm/cmse/bitfield-2.c: Likewise.
    * gcc.target/arm/cmse/bitfield-3.c: Likewise.
    * gcc.target/arm/cmse/struct-1.c: Likewise.
    * gcc.target/arm/cmse/cmse-14.c: Likewise.
    * gcc.target/arm/cmse/cmse-1.c: Likewise.  Restrict checks for Armv8-M
    GPR clearing when CLRM is not available.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-9.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-5.c: likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-7.c: likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-8.c: likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-8.c: Likewise.

    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/union-1.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/union-2.c: Likewise.

Testing: bootstrapped on arm-linux-gnueabihf and testsuite shows no
regression.

Is this ok for trunk?

Best regards,

Mihail


### Attachment also inlined for ease of reply 
###



diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 
f995974f9bb89ab3c7ff0888c394b0dfaf7da60c..1a948d2c97526ad7e67e8d4a610ac74cfdb13882 
100644

--- a/gcc/config/arm/arm-protos.h

Re: [PATCH 2/2] [ARM] Add support for -mpure-code in thumb-1 (v6m)

2019-12-17 Thread Kyrill Tkachov

Hi Christophe,

On 11/18/19 9:00 AM, Christophe Lyon wrote:

On Wed, 13 Nov 2019 at 15:46, Christophe Lyon
 wrote:
>
> On Tue, 12 Nov 2019 at 12:13, Richard Earnshaw (lists)
>  wrote:
> >
> > On 18/10/2019 14:18, Christophe Lyon wrote:
> > > +  bool not_supported = arm_arch_notm || flag_pic || TARGET_NEON;

> > >
> >
> > This is a poor name in the context of the function as a whole.  What's
> > not supported?  Please think of a better name so that I have some idea
> > what the intention is.
>
> That's to keep most of the code common when checking if -mpure-code
> and -mslow-flash-data are supported.
> These 3 cases are common to the two compilation flags, and
> -mslow-flash-data still needs to check TARGET_HAVE_MOVT in addition.
>
> Would "common_unsupported_modes" work better for you?
> Or I can duplicate the "arm_arch_notm || flag_pic || TARGET_NEON" in
> the two tests.
>

Hi,

Here is an updated version, using "common_unsupported_modes" instead
of "not_supported", and fixing the typo reported by Kyrill.
The ChangeLog is still the same.

OK?



The name looks ok to me. Richard had a concern about Armv8-M Baseline, 
but I do see it being supported as you pointed out.


So I believe all the concerns are addressed.

Thus the code is ok. However, please also update the documentation for 
-mpure-code in invoke.texi (it currently states that a MOVT instruction 
is needed).


Thanks,

Kyrill





Thanks,

Christophe

> Thanks,
>
> Christophe
>
> >
> > R.


Re: [PATCH 2/2] [ARM] Add support for -mpure-code in thumb-1 (v6m)

2019-12-17 Thread Kyrill Tkachov



On 12/17/19 2:33 PM, Christophe Lyon wrote:

On Tue, 17 Dec 2019 at 11:34, Kyrill Tkachov
 wrote:

Hi Christophe,

On 11/18/19 9:00 AM, Christophe Lyon wrote:

On Wed, 13 Nov 2019 at 15:46, Christophe Lyon
 wrote:

On Tue, 12 Nov 2019 at 12:13, Richard Earnshaw (lists)
 wrote:

On 18/10/2019 14:18, Christophe Lyon wrote:

+  bool not_supported = arm_arch_notm || flag_pic || TARGET_NEON;

This is a poor name in the context of the function as a whole.  What's
not supported?  Please think of a better name so that I have some idea
what the intention is.

That's to keep most of the code common when checking if -mpure-code
and -mslow-flash-data are supported.
These 3 cases are common to the two compilation flags, and
-mslow-flash-data still needs to check TARGET_HAVE_MOVT in addition.

Would "common_unsupported_modes" work better for you?
Or I can duplicate the "arm_arch_notm || flag_pic || TARGET_NEON" in
the two tests.


Hi,

Here is an updated version, using "common_unsupported_modes" instead
of "not_supported", and fixing the typo reported by Kyrill.
The ChangeLog is still the same.

OK?


The name looks ok to me. Richard had a concern about Armv8-M Baseline,
but I do see it being supported as you pointed out.

So I believe all the concerns are addressed.

OK, thanks!


Thus the code is ok. However, please also update the documentation for
-mpure-code in invoke.texi (it currently states that a MOVT instruction
is needed).


I didn't think about this :(
It currently says: "This option is only available when generating
non-pic code for M-profile targets with the MOVT instruction."

I suggest to remove the "with the MOVT instruction" part. Is that OK
if I commit my patch and this doc change?


Yes, I think that is simplest correct change to make.

Thanks,

Kyrill



Christophe


Thanks,

Kyrill




Thanks,

Christophe


Thanks,

Christophe


R.


Re: [PATCH][AArch64] Fixup core tunings

2019-12-18 Thread Kyrill Tkachov

Hi Wilco,

On 12/17/19 4:03 PM, Wilco Dijkstra wrote:

Hi Richard,

> This changelog entry is inadequate.  It's also not in the correct style.
>
> It should say what has changed, not just that it has changed.

Sure, but there is often no useful space for that. We should auto-generate
ChangeLogs if they are deemed useful. I find the commit message a lot more
useful in general. Here is the updated version:


Several tuning settings in cores.def are not consistent.
Set the tuning for Cortex-A76AE and Cortex-A77 to neoversen1 so
it is the same as for Cortex-A76 and Neoverse N1.
Set the tuning for Neoverse E1 to cortexa73 so it's the same as for
Cortex-A65. Set the scheduler for Cortex-A65 and Cortex-A65AE to
cortexa53.

Bootstrap OK, OK for commit?



Ok.

Thanks,

Kyrill




ChangeLog:
2019-12-17  Wilco Dijkstra  

* config/aarch64/aarch64-cores.def:
("cortex-a76ae"): Use neoversen1 tuning.
("cortex-a77"): Likewise.
("cortex-a65"): Use cortexa53 scheduler.
("cortex-a65ae"): Likewise.
("neoverse-e1"): Use cortexa73 tuning.
--

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 
053c6390e747cb9c818fe29a9b22990143b260ad..d170253c6eddca87f8b9f4f7fcc4692695ef83fb 
100644

--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -101,13 +101,13 @@ AARCH64_CORE("thunderx2t99", thunderx2t99,  
thunderx2t99, 8_1A,  AARCH64_FL_FOR
 AARCH64_CORE("cortex-a55",  cortexa55, cortexa53, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD, cortexa53, 0x41, 0xd05, -1)
 AARCH64_CORE("cortex-a75",  cortexa75, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD, cortexa73, 0x41, 0xd0a, -1)
 AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD, neoversen1, 0x41, 0xd0b, -1)
-AARCH64_CORE("cortex-a76ae",  cortexa76ae, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa72, 0x41, 0xd0e, -1)
-AARCH64_CORE("cortex-a77",  cortexa77, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa72, 0x41, 0xd0d, -1)
-AARCH64_CORE("cortex-a65",  cortexa65, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa73, 0x41, 0xd06, -1)
-AARCH64_CORE("cortex-a65ae",  cortexa65ae, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa73, 0x41, 0xd43, -1)
+AARCH64_CORE("cortex-a76ae",  cortexa76ae, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, neoversen1, 0x41, 0xd0e, -1)
+AARCH64_CORE("cortex-a77",  cortexa77, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, neoversen1, 0x41, 0xd0d, -1)
+AARCH64_CORE("cortex-a65",  cortexa65, cortexa53, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa73, 0x41, 0xd06, -1)
+AARCH64_CORE("cortex-a65ae",  cortexa65ae, cortexa53, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa73, 0x41, 0xd43, -1)
 AARCH64_CORE("ares",  ares, cortexa57, 8_2A, AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | 
AARCH64_FL_PROFILE, neoversen1, 0x41, 0xd0c, -1)
 AARCH64_CORE("neoverse-n1",  neoversen1, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD | AARCH64_FL_PROFILE, neoversen1, 0x41, 0xd0c, -1)
-AARCH64_CORE("neoverse-e1",  neoversee1, cortexa53, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa53, 0x41, 0xd4a, -1)
+AARCH64_CORE("neoverse-e1",  neoversee1, cortexa53, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa73, 0x41, 0xd4a, -1)


 /* HiSilicon ('H') cores. */
 AARCH64_CORE("tsv110",  tsv110, tsv110, 8_2A, AARCH64_FL_FOR_ARCH8_2 
| AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | 
AARCH64_FL_SHA2, tsv110,   0x48, 0xd01, -1)
@@ -127,6 +127,6 @@ AARCH64_CORE("cortex-a73.cortex-a53", 
cortexa73cortexa53, cortexa53, 8A,  AARCH

 /* ARM DynamIQ big.LITTLE configurations.  */

 AARCH64_CORE("cortex-a75.cortex-a55", cortexa75cortexa55, cortexa53, 
8_2A, AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD, cortexa73, 0x41, AARCH64_BIG_LITTLE (0xd0a, 
0xd05), -1)
-AARCH64_CORE("cortex-a76.cortex-a55", cortexa76cortexa55, cortexa53, 
8_2A, AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD, cortexa72, 0x41, AARCH64_BIG_LITTLE (0xd0b, 
0xd05), -1)
+AARCH64_CORE("cortex-a76.cortex-a55", c

Re: [PATCH][GCC][arm] Add CLI and multilib support for Armv8.1-M Mainline MVE extensions

2019-12-18 Thread Kyrill Tkachov

Hi Mihail,

On 11/8/19 4:52 PM, Mihail Ionescu wrote:

Hi,

This patch adds CLI and multilib support for Armv8.1-M MVE to the Arm 
backend.
Two new options are added for v8.1-m.main: "+mve" for integer MVE
instructions only, and "+mve.fp" for both integer and
single-precision/half-precision floating-point MVE.
The patch also maps the Armv8.1-M multilib variants to the 
corresponding v8-M ones.
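
A usage sketch (not from the patch):

    $ arm-none-eabi-gcc -march=armv8.1-m.main+mve -mfloat-abi=hard ...
    $ arm-none-eabi-gcc -march=armv8.1-m.main+mve.fp -mfloat-abi=hard ...

The first enables integer MVE only, the second integer and
floating-point MVE.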




gcc/ChangeLog:

2019-11-08  Mihail Ionescu  
2019-11-08  Andre Vieira 

    * config/arm/arm-cpus.in (mve, mve_float): New features.
    (dsp, mve, mve.fp): New options.
    * config/arm/arm.h (TARGET_HAVE_MVE, TARGET_HAVE_MVE_FLOAT): 
Define.

    * config/arm/t-rmprofile: Map v8.1-M multilibs to v8-M.


gcc/testsuite/ChangeLog:

2019-11-08  Mihail Ionescu  
2019-11-08  Andre Vieira 

    * testsuite/gcc.target/arm/multilib.exp: Add v8.1-M entries.


Is this ok for trunk?



This is ok, but please document the new options in invoke.texi.

Thanks,

Kyrill




Best regards,

Mihail


### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 
59aad8f62ee5186cc87d3cefaf40ba2ce049012d..c2f016c75e2d8dd06890295321232bef61cbd234 
100644

--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -194,6 +194,10 @@ define feature sb
 # v8-A architectures, added by default from v8.5-A
 define feature predres

+# M-profile Vector Extension feature bits
+define feature mve
+define feature mve_float
+
 # Feature groups.  Conventionally all (or mostly) upper case.
 # ALL_FPU lists all the feature bits associated with the floating-point
 # unit; these will all be removed if the floating-point unit is disabled
@@ -654,9 +658,12 @@ begin arch armv8.1-m.main
  base 8M_MAIN
  isa ARMv8_1m_main
 # fp => FPv5-sp-d16; fp.dp => FPv5-d16
+ option dsp add armv7em
  option fp add FPv5 fp16
  option fp.dp add FPv5 FP_DBL fp16
  option nofp remove ALL_FP
+ option mve add mve armv7em
+ option mve.fp add mve FPv5 fp16 mve_float armv7em
 end arch armv8.1-m.main

 begin arch iwmmxt
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 
64c292f2862514fb600a4faeaddfeacb2b69180b..9ec38c6af1b84fc92e20e30e8f07ce5360a966c1 
100644

--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -310,6 +310,12 @@ emission of floating point pcs attributes.  */
    instructions (most are floating-point related).  */
 #define TARGET_HAVE_FPCXT_CMSE  (arm_arch8_1m_main)

+#define TARGET_HAVE_MVE (bitmap_bit_p (arm_active_target.isa, \
+  isa_bit_mve))
+
+#define TARGET_HAVE_MVE_FLOAT (bitmap_bit_p (arm_active_target.isa, \
+ isa_bit_mve_float))
+
 /* Nonzero if integer division instructions supported.  */
 #define TARGET_IDIV ((TARGET_ARM && arm_arch_arm_hwdiv) \
  || (TARGET_THUMB && arm_arch_thumb_hwdiv))
diff --git a/gcc/config/arm/t-rmprofile b/gcc/config/arm/t-rmprofile
index 
807e69eaf78625f422e2d7ef5936c5c80c5b9073..62e27fd284b21524896430176d64ff5b08c6e0ef 
100644

--- a/gcc/config/arm/t-rmprofile
+++ b/gcc/config/arm/t-rmprofile
@@ -54,7 +54,7 @@ MULTILIB_REQUIRED += 
mthumb/march=armv8-m.main+fp.dp/mfloat-abi=softfp

 # Arch Matches
 MULTILIB_MATCHES    += march?armv6s-m=march?armv6-m

-# Map all v8-m.main+dsp FP variants down the the variant without DSP.
+# Map all v8-m.main+dsp FP variants down to the variant without DSP.
 MULTILIB_MATCHES    += march?armv8-m.main=march?armv8-m.main+dsp \
    $(foreach FP, +fp +fp.dp, \
march?armv8-m.main$(FP)=march?armv8-m.main+dsp$(FP))
@@ -66,3 +66,18 @@ MULTILIB_MATCHES += 
march?armv7e-m+fp=march?armv7e-m+fpv5
 MULTILIB_REUSE  += $(foreach ARCH, armv6s-m armv7-m armv7e-m 
armv8-m\.base armv8-m\.main, \

mthumb/march.$(ARCH)/mfloat-abi.soft=mthumb/march.$(ARCH)/mfloat-abi.softfp)

+# Map v8.1-M to v8-M.
+MULTILIB_MATCHES   += march?armv8-m.main=march?armv8.1-m.main
+MULTILIB_MATCHES   += march?armv8-m.main=march?armv8.1-m.main+dsp
+MULTILIB_MATCHES   += march?armv8-m.main=march?armv8.1-m.main+mve
+
+v8_1m_sp_variants = +fp +dsp+fp +mve.fp
+v8_1m_dp_variants = +fp.dp +dsp+fp.dp +fp.dp+mve +fp.dp+mve.fp
+
+# Map all v8.1-m.main FP sp variants down to v8-m.
+MULTILIB_MATCHES += $(foreach FP, $(v8_1m_sp_variants), \
+ march?armv8-m.main+fp=march?armv8.1-m.main$(FP))
+
+# Map all v8.1-m.main FP dp variants down to v8-m.
+MULTILIB_MATCHES += $(foreach FP, $(v8_1m_dp_variants), \
+ march?armv8-m.main+fp.dp=march?armv8.1-m.main$(FP))
diff --git a/gcc/testsuite/gcc.target/arm/multilib.exp 
b/gcc/testsuite/gcc.target/arm/multilib.exp
index 
dcea829965eb15e372401e6389df5a1403393ecb..63cca118da2578253740fcd95421eae9ddf219bd 
100644

--- a/gcc/testsuite/gcc.target/arm/multilib.exp
+++ b/gcc/testsuite/gcc.target/arm/multilib.exp
@@ -775,6 +775,27 @@ if {[multilib_config "rmprofile"] } {
 {-march=armv8-r+fp.sp -mfpu=auto -mfloat-abi=hard} 
"thumb/v7-r+fp.sp/ha

Re: [PATCH, GCC/ARM, 9/10] Call nscall function with blxns

2019-12-18 Thread Kyrill Tkachov



On 12/18/19 1:38 PM, Mihail Ionescu wrote:

Hi,

On 11/12/2019 10:23 AM, Kyrill Tkachov wrote:


On 10/23/19 10:26 AM, Mihail Ionescu wrote:

[PATCH, GCC/ARM, 9/10] Call nscall function with blxns

Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to call
functions with the cmse_nonsecure_call attribute directly using blxns
with no undue restriction on the register used for that.

=== Patch description ===

This change to use BLXNS to call a nonsecure function from secure
directly (not using a libcall) is made in 2 steps:
- change nonsecure_call patterns to use blxns instead of calling
  __gnu_cmse_nonsecure_call
- loosen requirement for function address to allow any register when
  doing BLXNS.

The former is a straightforward check over whether instructions added in
Armv8.1-M Mainline are available, while the latter consists in making the
nonsecure call pattern accept any register by using match_operand and
changing the nonsecure_call_internal expander to not force r4 when
targeting Armv8.1-M Mainline.
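
For illustration, the kind of call site this affects looks like the
following minimal sketch (assuming -mcmse on an Armv8.1-M Mainline
target; ns_fn_t and call_nonsecure are made-up names, not from the
patch):

    typedef void __attribute__((cmse_nonsecure_call)) ns_fn_t (void);

    void
    call_nonsecure (ns_fn_t *fn)
    {
      fn ();  /* Previously forced into r4 and routed through
                 __gnu_cmse_nonsecure_call; with this patch it is
                 expected to emit BLXNS on any suitable register.  */
    }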

The tricky bit is actually in the test update, specifically how to check
that register lists for CLRM have all registers except for the one
holding parameters (already done) and the one holding the address used
by BLXNS. This is achieved with 3 scan-assembler directives.

1) The first one lists all registers that can appear in CLRM but makes
   each of them optional.
   Property guaranteed: no wrong register is cleared and none appears
   twice in the register list.
2) The second directive checks that the CLRM is made of a fixed number
   of the right registers to be cleared. The number used is the number
   of registers that could contain a secret minus one (the one used to
   hold the address of the function to call).
   Property guaranteed: the register list has the right number of registers.
   Cumulated property guaranteed: only registers with a potential secret
   are cleared and they are all listed but one.
3) The last directive checks that we cannot find a CLRM with a register
   in it that also appears in BLXNS. This is checked via the use of a
   back-reference on any of the allowed registers in CLRM, the
   back-reference enforcing that whatever register matches in CLRM must
   be the same in the BLXNS.
   Property guaranteed: the register used for BLXNS is different from
   the registers cleared in CLRM.

Some more care needs to happen for the gcc.target/arm/cmse/cmse-1.c
testcase due to there being two CLRMs generated. To ensure the third
directive matches the right CLRM to the BLXNS, a negative lookahead is
used between the CLRM register list and the BLXNS. The way a negative
lookahead works is by matching the *position* where a given regular
expression does not match. In this case, since it comes after the CLRM
register list, it is requesting that what comes after the register list
does not have a CLRM again followed by BLXNS. This guarantees that the
.*blxns after only matches a blxns without another CLRM before it.
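
Schematically, the third directive is shaped roughly like this (a rough
sketch of the idea only, not the literal directive from the patch):

    /* { dg-final { scan-assembler-not {clrm[^\n]*(r[0-9]+)((?!clrm)(.|\n))*blxns\t\1} } } */

i.e. it fails if some CLRM register list contains a register that is
later used, with no intervening CLRM, as the BLXNS operand.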

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * config/arm/arm.md (nonsecure_call_internal): Do not force 
memory

    address in r4 when targeting Armv8.1-M Mainline.
    (nonsecure_call_value_internal): Likewise.
    * config/arm/thumb2.md (nonsecure_call_reg_thumb2): Make 
memory address

    a register match_operand again.  Emit BLXNS when targeting
    Armv8.1-M Mainline.
    (nonsecure_call_value_reg_thumb2): Likewise.

*** gcc/testsuite/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * gcc.target/arm/cmse/cmse-1.c: Add check for BLXNS when 
instructions
    introduced in Armv8.1-M Mainline Security Extensions are 
available and
    restrict checks for libcall to __gnu_cmse_nonsecure_call to 
Armv8-M
    targets only.  Adapt CLRM check to verify register used for 
BLXNS is

    not in the CLRM register list.
    * gcc.target/arm/cmse/cmse-14.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c: Likewise 
and adapt
    check for LSB clearing bit to be using the same register as 
BLXNS when

    targeting Armv8.1-M Mainline.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-9.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-and-union.c: 
Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-13.c: 
Likewise.

    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard

Re: [PATCH][GCC][arm] Add CLI and multilib support for Armv8.1-M Mainline MVE extensions

2019-12-18 Thread Kyrill Tkachov



On 12/18/19 5:00 PM, Mihail Ionescu wrote:

Hi Kyrill,

On 12/18/2019 02:13 PM, Kyrill Tkachov wrote:
> Hi Mihail,
>
> On 11/8/19 4:52 PM, Mihail Ionescu wrote:
>> Hi,
>>
>> This patch adds CLI and multilib support for Armv8.1-M MVE to the Arm
>> backend.
>> Two new options added for v8.1-m.main: "+mve" for integer MVE
>> instructions only
>> and "+mve.fp" for both integer and single-precision/half-precision
>> floating-point MVE.
>> The patch also maps the Armv8.1-M multilib variants to the
>> corresponding v8-M ones.
>>
>>
>>
>> gcc/ChangeLog:
>>
>> 2019-11-08  Mihail Ionescu 
>> 2019-11-08  Andre Vieira 
>>
>>     * config/arm/arm-cpus.in (mve, mve_float): New features.
>>     (dsp, mve, mve.fp): New options.
>>     * config/arm/arm.h (TARGET_HAVE_MVE, TARGET_HAVE_MVE_FLOAT):
>> Define.
>>     * config/arm/t-rmprofile: Map v8.1-M multilibs to v8-M.
>>
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-11-08  Mihail Ionescu 
>> 2019-11-08  Andre Vieira 
>>
>>     * testsuite/gcc.target/arm/multilib.exp: Add v8.1-M entries.
>>
>>
>> Is this ok for trunk?
>
>
> This is ok, but please document the new options in invoke.texi.
>

Here it is with the updated invoke.texi and ChangeLog.



Thanks, looks great to me.

Kyrill



gcc/ChangeLog:

2019-12-18  Mihail Ionescu  
2019-12-18  Andre Vieira 

    * config/arm/arm-cpus.in (mve, mve_float): New features.
    (dsp, mve, mve.fp): New options.
    * config/arm/arm.h (TARGET_HAVE_MVE, TARGET_HAVE_MVE_FLOAT): 
Define.

    * config/arm/t-rmprofile: Map v8.1-M multilibs to v8-M.
    * doc/invoke.texi: Document the armv8.1-m mve and dsp options.


gcc/testsuite/ChangeLog:

2019-12-18  Mihail Ionescu  
2019-12-18  Andre Vieira 

    * testsuite/gcc.target/arm/multilib.exp: Add v8.1-M entries.


Thanks,
Mihail

> Thanks,
>
> Kyrill
>
>
>>
>> Best regards,
>>
>> Mihail
>>
>>
>> ### Attachment also inlined for ease of reply
>> ###
>>
>>
>> diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
>> index
>> 
59aad8f62ee5186cc87d3cefaf40ba2ce049012d..c2f016c75e2d8dd06890295321232bef61cbd234 


>> 100644
>> --- a/gcc/config/arm/arm-cpus.in
>> +++ b/gcc/config/arm/arm-cpus.in
>> @@ -194,6 +194,10 @@ define feature sb
>>  # v8-A architectures, added by default from v8.5-A
>>  define feature predres
>>
>> +# M-profile Vector Extension feature bits
>> +define feature mve
>> +define feature mve_float
>> +
>>  # Feature groups.  Conventionally all (or mostly) upper case.
>>  # ALL_FPU lists all the feature bits associated with the 
floating-point
>>  # unit; these will all be removed if the floating-point unit is 
disabled

>> @@ -654,9 +658,12 @@ begin arch armv8.1-m.main
>>   base 8M_MAIN
>>   isa ARMv8_1m_main
>>  # fp => FPv5-sp-d16; fp.dp => FPv5-d16
>> + option dsp add armv7em
>>   option fp add FPv5 fp16
>>   option fp.dp add FPv5 FP_DBL fp16
>>   option nofp remove ALL_FP
>> + option mve add mve armv7em
>> + option mve.fp add mve FPv5 fp16 mve_float armv7em
>>  end arch armv8.1-m.main
>>
>>  begin arch iwmmxt
>> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
>> index
>> 
64c292f2862514fb600a4faeaddfeacb2b69180b..9ec38c6af1b84fc92e20e30e8f07ce5360a966c1 


>> 100644
>> --- a/gcc/config/arm/arm.h
>> +++ b/gcc/config/arm/arm.h
>> @@ -310,6 +310,12 @@ emission of floating point pcs attributes.  */
>>     instructions (most are floating-point related).  */
>>  #define TARGET_HAVE_FPCXT_CMSE (arm_arch8_1m_main)
>>
>> +#define TARGET_HAVE_MVE (bitmap_bit_p (arm_active_target.isa, \
>> + isa_bit_mve))
>> +
>> +#define TARGET_HAVE_MVE_FLOAT (bitmap_bit_p (arm_active_target.isa, \
>> + isa_bit_mve_float))
>> +
>>  /* Nonzero if integer division instructions supported.  */
>>  #define TARGET_IDIV ((TARGET_ARM && arm_arch_arm_hwdiv) \
>>   || (TARGET_THUMB && arm_arch_thumb_hwdiv))
>> diff --git a/gcc/config/arm/t-rmprofile b/gcc/config/arm/t-rmprofile
>> index
>> 
807e69eaf78625f422e2d7ef5936c5c80c5b9073..62e27fd284b21524896430176d64ff5b08c6e0ef 


>> 100644
>> --- a/gcc/config/arm/t-rmprofile
>> +++ b/gcc/config/arm/t-rmprofile
>> @@ -54,7 +54,7 @@ MULTILIB_REQUIRED +=
>> mthumb/march=armv8-m.main+fp.dp/mfloat-abi=softfp
>>  # Arch Matches
>>  MULTILIB_MATCHES    += mar

Re: [PATCH][ARM][GCC][1/x]: MVE ACLE intrinsics framework patch.

2019-12-18 Thread Kyrill Tkachov



On 11/14/19 7:12 PM, Srinath Parvathaneni wrote:

Hello,

This patch creates the required framework for MVE ACLE intrinsics.

The following changes are done in this patch to support MVE ACLE 
intrinsics.


Header file arm_mve.h is added to the source code, which contains the
definitions of MVE ACLE intrinsics and the different data types used in
MVE. Machine description file mve.md is also added, which contains the
RTL patterns defined for MVE.

A new register "p0" is added, which is used by MVE predicated patterns.
A new register class "VPR_REG" is added and its contents are defined in
REG_CLASS_CONTENTS.

The vec-common.md file is modified to support the standard move patterns.
The prefix of Neon functions which are also used by MVE is changed from
"neon_" to "simd_", e.g. neon_immediate_valid_for_move is changed to
simd_immediate_valid_for_move.

In the patch the standard patterns mve_move, mve_store and mve_load for
MVE are added, and the neon.md and vfp.md files are modified to support
these common patterns.

Please refer to Arm reference manual [1] for more details.

[1] 
https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf?_ga=2.102521798.659307368.1572453718-1501600630.1548848914
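
As background, a minimal user-level view of what the framework enables
(a hypothetical example, assuming something like
-march=armv8.1-m.main+mve -mfloat-abi=hard; not a test from the patch):

    #include <arm_mve.h>

    int32x4_t
    copy (int32x4_t a)
    {
      return a;  /* Exercises the MVE vector types from arm_mve.h and
                    the new MVE move patterns.  */
    }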


Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?


Ok.

Thanks,

Kyrill



Thanks,
Srinath

gcc/ChangeLog:

2019-11-11  Andre Vieira 
    Mihail Ionescu  
    Srinath Parvathaneni 

    * config.gcc (arm_mve.h): Add header file.
    * config/arm/aout.h (p0): Add new register name.
    * config/arm-builtins.c (ARM_BUILTIN_SIMD_LANE_CHECK): Define.
    (ARM_BUILTIN_NEON_LANE_CHECK): Remove.
    (arm_init_simd_builtin_types): Add TARGET_HAVE_MVE check.
    (arm_init_neon_builtins): Move a check to arm_init_builtins 
function.
    (arm_init_builtins): Move a check from arm_init_neon_builtins 
function.

    (mve_dereference_pointer): Add new function.
    (arm_expand_builtin_args): Add TARGET_HAVE_MVE check.
    (arm_expand_neon_builtin): Move a check to arm_expand_builtin 
function.
    (arm_expand_builtin): Move a check from 
arm_expand_neon_builtin function.

    * config/arm/arm-c.c (arm_cpu_builtins): Define macros for MVE.
    * config/arm/arm-modes.def (INT_MODE): Add three new integer 
modes.
    * config/arm/arm-protos.h (neon_immediate_valid_for_move): 
Rename function.
    (simd_immediate_valid_for_move): Rename 
neon_immediate_valid_for_move function.
    * config/arm/arm.c (arm_options_perform_arch_sanity_checks): Enable
    mve isa bit.

    (use_return_insn): Add TARGET_HAVE_MVE check.
    (aapcs_vfp_allocate): Add TARGET_HAVE_MVE check.
    (aapcs_vfp_allocate_return_reg): Add TARGET_HAVE_MVE check.
    (thumb2_legitimate_address_p): Add TARGET_HAVE_MVE check.
    (arm_rtx_costs_internal): Add TARGET_HAVE_MVE check.
    (neon_valid_immediate): Rename to simd_valid_immediate.
    (simd_valid_immediate): Rename from neon_valid_immediate.
    (neon_immediate_valid_for_move): Rename to 
simd_immediate_valid_for_move.
    (simd_immediate_valid_for_move): Rename from 
neon_immediate_valid_for_move.
    (neon_immediate_valid_for_logic): Modify call to 
neon_valid_immediate function.
    (neon_make_constant): Modify call to neon_valid_immediate 
function.

    (neon_vector_mem_operand): Add TARGET_HAVE_MVE check.
    (output_move_neon): Add TARGET_HAVE_MVE check.
    (arm_compute_frame_layout): Add TARGET_HAVE_MVE check.
    (arm_save_coproc_regs): Add TARGET_HAVE_MVE check.
    (arm_print_operand): Add case 'E' to print memory operands.
    (arm_print_operand_address): Add TARGET_HAVE_MVE check.
    (arm_hard_regno_mode_ok): Add TARGET_HAVE_MVE check.
    (arm_modes_tieable_p): Add TARGET_HAVE_MVE check.
    (arm_regno_class): Add VPR_REGNUM check.
    (arm_expand_epilogue_apcs_frame): Add TARGET_HAVE_MVE check.
    (arm_expand_epilogue): Add TARGET_HAVE_MVE check.
    (arm_vector_mode_supported_p): Add TARGET_HAVE_MVE check for 
MVE vector modes.

    (arm_array_mode_supported_p): Add TARGET_HAVE_MVE check.
    (arm_conditional_register_usage): For TARGET_HAVE_MVE enable 
VPR register.
    * config/arm/arm.h (IS_VPR_REGNUM): Macro to check for VPR 
register.

    (FIRST_PSEUDO_REGISTER): Modify.
    (VALID_MVE_MODE): Define.
    (VALID_MVE_SI_MODE): Define.
    (VALID_MVE_SF_MODE): Define.
    (VALID_MVE_STRUCT_MODE): Define.
    (REG_ALLOC_ORDER): Add VPR_REGNUM entry.
    (enum reg_class): Add VPR_REG entry.
    (REG_CLASS_NAMES): Add VPR_REG entry.
    * config/arm/arm.md (VPR_REGNUM): Define.
    (arm_movsf_soft_insn): Add TARGET_HAVE_MVE check to not allow MVE.
    (vfp_pop_multiple_with_writeback): Add TARGET_HAVE_MVE check 
to allow writeback.

    (include "mve.md"): Include mve.md file.
    * config/arm/arm_mve.h: New file.
    * config/arm/c

Re: [PATCH][ARM][GCC][2/x]: MVE ACLE intrinsics framework patch.

2019-12-19 Thread Kyrill Tkachov

Hi Srinath,

On 11/14/19 7:12 PM, Srinath Parvathaneni wrote:

Hello,

This patch is part of the MVE ACLE intrinsics framework.
This patch adds support to update (read/write) the APSR (Application
Program Status Register) register and the FPSCR (Floating-point Status
and Control Register) register for MVE.

This patch also enables thumb2 mov RTL patterns for MVE.

Please refer to Arm reference manual [1] for more details.
[1] 
https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf?_ga=2.102521798.659307368.1572453718-1501600630.1548848914
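
For context, the FPSCR access these patterns back is visible to users
through the existing ACLE builtins; a minimal sketch (assuming an MVE
target; save_fpscr/restore_fpscr are made-up names):

    unsigned int
    save_fpscr (void)
    {
      return __builtin_arm_get_fpscr ();  /* get_fpscr pattern.  */
    }

    void
    restore_fpscr (unsigned int x)
    {
      __builtin_arm_set_fpscr (x);        /* set_fpscr pattern.  */
    }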


Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath

gcc/ChangeLog:

2019-11-11  Andre Vieira 
    Mihail Ionescu  
    Srinath Parvathaneni 

    * config/arm/thumb2.md (thumb2_movsfcc_soft_insn): Add check 
to not allow

    TARGET_HAVE_MVE for this pattern.
    (thumb2_cmse_entry_return): Add TARGET_HAVE_MVE check to 
update APSR register.

    * config/arm/unspecs.md (UNSPEC_GET_FPSCR): Define.
    (VUNSPEC_GET_FPSCR): Remove.
    * config/arm/vfp.md (thumb2_movhi_vfp): Add TARGET_HAVE_MVE check.
    (thumb2_movhi_fp16): Add TARGET_HAVE_MVE check.
    (thumb2_movsi_vfp): Add TARGET_HAVE_MVE check.
    (movdi_vfp): Add TARGET_HAVE_MVE check.
    (thumb2_movdf_vfp): Add TARGET_HAVE_MVE check.
    (thumb2_movsfcc_vfp): Add TARGET_HAVE_MVE check.
    (thumb2_movdfcc_vfp): Add TARGET_HAVE_MVE check.
    (push_multi_vfp): Add TARGET_HAVE_MVE check.
    (set_fpscr): Add TARGET_HAVE_MVE check.
    (get_fpscr): Add TARGET_HAVE_MVE check.



These pattern changes do more than add a TARGET_HAVE_MVE check. Some add
new alternatives, some even change the RTL pattern.


I'd like to see them reflected in the ChangeLog so that I know they're 
deliberate.






### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 
809461a25da5a8058a8afce972dea0d3131effc0..81afd8fcdc1b0a82493dc0758bce16fa9e5fde20 
100644

--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -435,10 +435,10 @@
 (define_insn "*cmovsi_insn"
   [(set (match_operand:SI 0 "arm_general_register_operand" 
"=r,r,r,r,r,r,r")

 (if_then_else:SI
-    (match_operator 1 "arm_comparison_operator"
- [(match_operand 2 "cc_register" "") (const_int 0)])
-    (match_operand:SI 3 "arm_reg_or_m1_or_1" "r, r,UM, r,U1,UM,U1")
-    (match_operand:SI 4 "arm_reg_or_m1_or_1" "r,UM, r,U1, 
r,UM,U1")))]

+   (match_operator 1 "arm_comparison_operator"
+    [(match_operand 2 "cc_register" "") (const_int 0)])
+   (match_operand:SI 3 "arm_reg_or_m1_or_1" "r, r,UM, r,U1,UM,U1")
+   (match_operand:SI 4 "arm_reg_or_m1_or_1" "r,UM, r,U1, r,UM,U1")))]
   "TARGET_THUMB2 && TARGET_COND_ARITH
    && (!((operands[3] == const1_rtx && operands[4] == constm1_rtx)
    || (operands[3] == constm1_rtx && operands[4] == const1_rtx)))"
@@ -540,7 +540,7 @@
   [(match_operand 4 "cc_register" "") 
(const_int 0)])

  (match_operand:SF 1 "s_register_operand" "0,r")
  (match_operand:SF 2 "s_register_operand" 
"r,0")))]

-  "TARGET_THUMB2 && TARGET_SOFT_FLOAT"
+  "TARGET_THUMB2 && TARGET_SOFT_FLOAT && !TARGET_HAVE_MVE"
   "@
    it\\t%D3\;mov%D3\\t%0, %2
    it\\t%d3\;mov%d3\\t%0, %1"
@@ -1226,7 +1226,7 @@
    ; added to clear the APSR and potentially the FPSCR if VFP is 
available, so

    ; we adapt the length accordingly.
    (set (attr "length")
- (if_then_else (match_test "TARGET_HARD_FLOAT")
+ (if_then_else (match_test "TARGET_HARD_FLOAT || TARGET_HAVE_MVE")
   (const_int 34)
   (const_int 8)))
    ; We do not support predicate execution of returns from 
cmse_nonsecure_entry

diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 
b3b4f8ee3e2d1bdad968a9dd8ccbc72ded274f48..ac7fe7d0af19f1965356d47d8327e24d410b99bd 
100644

--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -170,6 +170,7 @@
   UNSPEC_TORC  ; Used by the intrinsic form of the iWMMXt 
TORC instruction.
   UNSPEC_TORVSC    ; Used by the intrinsic form of the 
iWMMXt TORVSC instruction.
   UNSPEC_TEXTRC    ; Used by the intrinsic form of the 
iWMMXt TEXTRC instruction.

+  UNSPEC_GET_FPSCR ; Represent fetch of FPSCR content.
 ])


@@ -216,7 +217,6 @@
   VUNSPEC_SLX  ; Represent a store-register-release-exclusive.
   VUNSPEC_LDA  ; Represent a store-register-acquire.
   VUNSPEC_STL  ; Represent a store-register-release.
-  VUNSPEC_GET_FPSCR    ; Represent fetch of FPSCR content.
   VUNSPEC_SET_FPSCR    ; Represent assign of FPSCR content.
   VUNSPEC_PROBE_STACK_RANGE ; Represent stack range probing.
   VUNSPEC_CDP  ; Represent the coprocessor cdp instruction.
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 
6349c0570540ec25a

Re: [PATCH][ARM][GCC][4/x]: MVE ACLE vector interleaving store intrinsics.

2019-12-19 Thread Kyrill Tkachov



On 11/14/19 7:12 PM, Srinath Parvathaneni wrote:

Hello,

This patch supports MVE ACLE intrinsics vst4q_s8, vst4q_s16, 
vst4q_s32, vst4q_u8,

vst4q_u16, vst4q_u32, vst4q_f16 and vst4q_f32.

In this patch the arm_mve_builtins.def file is added to the source code,
in which the builtins for MVE ACLE intrinsics are defined using builtin
qualifiers.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for 
more details.
[1] 
https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics
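
A representative use of one of the new intrinsics (a sketch in the style
of the new tests; store_interleaved is a made-up name):

    #include <arm_mve.h>

    void
    store_interleaved (int8_t *addr, int8x16x4_t value)
    {
      vst4q_s8 (addr, value);  /* Interleaving store of four vectors.  */
    }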


Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?



Ok.

Thanks,

Kyrill




Thanks,
Srinath.

gcc/ChangeLog:

2019-11-12  Andre Vieira 
    Mihail Ionescu  
    Srinath Parvathaneni 

    * config/arm/arm-builtins.c (CF): Define mve_builtin_data.
    (VAR1): Define.
    (ARM_BUILTIN_MVE_PATTERN_START): Define.
    (arm_init_mve_builtins): Define function.
    (arm_init_builtins): Add TARGET_HAVE_MVE check.
    (arm_expand_builtin_1): Check the range of fcode.
    (arm_expand_mve_builtin): Define function to expand MVE builtins.
    (arm_expand_builtin): Check the range of fcode.
    * config/arm/arm_mve.h (__ARM_FEATURE_MVE): Define MVE 
floating point

    types.
    (__ARM_MVE_PRESERVE_USER_NAMESPACE): Define to protect user 
namespace.

    (vst4q_s8): Define macro.
    (vst4q_s16): Likewise.
    (vst4q_s32): Likewise.
    (vst4q_u8): Likewise.
    (vst4q_u16): Likewise.
    (vst4q_u32): Likewise.
    (vst4q_f16): Likewise.
    (vst4q_f32): Likewise.
    (__arm_vst4q_s8): Define inline builtin.
    (__arm_vst4q_s16): Likewise.
    (__arm_vst4q_s32): Likewise.
    (__arm_vst4q_u8): Likewise.
    (__arm_vst4q_u16): Likewise.
    (__arm_vst4q_u32): Likewise.
    (__arm_vst4q_f16): Likewise.
    (__arm_vst4q_f32): Likewise.
    (__ARM_mve_typeid): Define macro with MVE types.
    (__ARM_mve_coerce): Define macro with _Generic feature.
    (vst4q): Define polymorphic variant for different vst4q builtins.
    * config/arm/arm_mve_builtins.def: New file.
    * config/arm/mve.md (MVE_VLD_ST): Define iterator.
    (unspec): Define unspec.
    (mve_vst4q): Define RTL pattern.
    * config/arm/t-arm (arm.o): Add entry for arm_mve_builtins.def.
    (arm-builtins.o): Likewise.

gcc/testsuite/ChangeLog:

2019-11-12  Andre Vieira 
    Mihail Ionescu  
    Srinath Parvathaneni 

    * gcc.target/arm/mve/intrinsics/vst4q_f16.c: New test.
    * gcc.target/arm/mve/intrinsics/vst4q_f32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vst4q_s16.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vst4q_s32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vst4q_s8.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vst4q_u16.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vst4q_u32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vst4q_u8.c: Likewise.


### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 
d4cb0ea3deb49b10266d1620c85e243ed34aee4d..a9f76971ef310118bf7edea6a8dd3de1da46b46b 
100644

--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -401,6 +401,13 @@ static arm_builtin_datum neon_builtin_data[] =
 };

 #undef CF
+#define CF(N,X) CODE_FOR_mve_##N##X
+static arm_builtin_datum mve_builtin_data[] =
+{
+#include "arm_mve_builtins.def"
+};
+
+#undef CF
 #undef VAR1
 #define VAR1(T, N, A) \
   {#N, UP (A), CODE_FOR_arm_##N, 0, T##_QUALIFIERS},
@@ -705,6 +712,13 @@ enum arm_builtins

 #include "arm_acle_builtins.def"

+  ARM_BUILTIN_MVE_BASE,
+
+#undef VAR1
+#define VAR1(T, N, X) \
+  ARM_BUILTIN_MVE_##N##X,
+#include "arm_mve_builtins.def"
+
   ARM_BUILTIN_MAX
 };

@@ -714,6 +728,9 @@ enum arm_builtins
 #define ARM_BUILTIN_NEON_PATTERN_START \
   (ARM_BUILTIN_NEON_BASE + 1)

+#define ARM_BUILTIN_MVE_PATTERN_START \
+  (ARM_BUILTIN_MVE_BASE + 1)
+
 #define ARM_BUILTIN_ACLE_PATTERN_START \
   (ARM_BUILTIN_ACLE_BASE + 1)

@@ -1219,6 +1236,22 @@ arm_init_acle_builtins (void)
 }
 }

+/* Set up all the MVE builtins mentioned in arm_mve_builtins.def 
file.  */

+static void
+arm_init_mve_builtins (void)
+{
+  volatile unsigned int i, fcode = ARM_BUILTIN_MVE_PATTERN_START;
+
+  arm_init_simd_builtin_scalar_types ();
+  arm_init_simd_builtin_types ();
+
+  for (i = 0; i < ARRAY_SIZE (mve_builtin_data); i++, fcode++)
+    {
+  arm_builtin_datum *d = &mve_builtin_data[i];
+  arm_init_builtin (fcode, d, "__builtin_mve");
+    }
+}
+
 /* Set up all the NEON builtins, even builtins for instructions that 
are not
    in the current target ISA to allow the user to compile particular 
modules
    with different target specific options that differ from the 
command line

@@ -1961,8 +1994,10 @@ arm_init_builtins (void)
   = add_builtin_functio

Re: [PATCH][ARM][GCC][3/x]: MVE ACLE intrinsics framework patch.

2019-12-19 Thread Kyrill Tkachov

Hi Srinath,

On 11/14/19 7:12 PM, Srinath Parvathaneni wrote:

Hello,

This patch is part of the MVE ACLE intrinsics framework.

The patch supports the use of emulation for the double-precision
arithmetic operations for MVE. These changes are to support the MVE ACLE
intrinsics which operate on vector floating-point arithmetic operations.

Please refer to Arm reference manual [1] for more details.
[1] 
https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf?_ga=2.102521798.659307368.1572453718-1501600630.1548848914


Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-11-11  Andre Vieira 
    Srinath Parvathaneni 

    * config/arm/arm.c (arm_libcall_uses_aapcs_base): Modify function to
    add emulator calls for double-precision arithmetic operations for MVE.



I'm a bit confused by the changelog and the comment in the patch...





### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
6faed76206b93c1a9dea048e2f693dc16ee58072..358b2638b65a2007d1c7e8062844b67682597f45 
100644

--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -5658,9 +5658,25 @@ arm_libcall_uses_aapcs_base (const_rtx libcall)
   /* Values from double-precision helper functions are returned 
in core

  registers if the selected core only supports single-precision
  arithmetic, even if we are using the hard-float ABI.  The 
same is

-    true for single-precision helpers, but we will never be using the
-    hard-float ABI on a CPU which doesn't support single-precision
-    operations in hardware.  */
+    true for single-precision helpers except in case of MVE, 
because in
+    MVE we will be using the hard-float ABI on a CPU which 
doesn't support
+    single-precision operations in hardware.  In MVE the 
following check

+    enables use of emulation for the double-precision arithmetic
+    operations.  */
+  if (TARGET_HAVE_MVE)
+   {
+ add_libcall (libcall_htab, optab_libfunc (add_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (sdiv_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (smul_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (neg_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (sub_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (eq_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (lt_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (le_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (ge_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (gt_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (unord_optab, SFmode));
+   }


... this adds emulation for SFmode but you say you want double-precision 
emulation?


Can you demonstrate what this patch wants to achieve with a testcase?

Thanks,

Kyrill





   add_libcall (libcall_htab, optab_libfunc (add_optab, DFmode));
   add_libcall (libcall_htab, optab_libfunc (sdiv_optab, DFmode));
   add_libcall (libcall_htab, optab_libfunc (smul_optab, DFmode));



Re: [PATCH][ARM][GCC][1/1x]: Patch to support MVE ACLE intrinsics with unary operand.

2019-12-19 Thread Kyrill Tkachov

Hi Srinath,

On 11/14/19 7:12 PM, Srinath Parvathaneni wrote:

Hello,

This patch supports MVE ACLE intrinsics vcvtq_f16_s16, vcvtq_f32_s32,
vcvtq_f16_u16, vcvtq_f32_u32, vrndxq_f16, vrndxq_f32, vrndq_f16,
vrndq_f32, vrndpq_f16, vrndpq_f32, vrndnq_f16, vrndnq_f32, vrndmq_f16,
vrndmq_f32, vrndaq_f16, vrndaq_f32, vrev64q_f16, vrev64q_f32, vnegq_f16,
vnegq_f32, vdupq_n_f16, vdupq_n_f32, vabsq_f16, vabsq_f32, vrev32q_f16,
vcvttq_f32_f16, vcvtbq_f32_f16.



Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for 
more details.
[1] 
https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics
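
A representative use (a sketch in the style of the new tests;
round_to_even is a made-up name):

    #include <arm_mve.h>

    float16x8_t
    round_to_even (float16x8_t a)
    {
      return vrndnq_f16 (a);  /* Round to nearest, ties to even.  */
    }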


Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-10-17  Andre Vieira 
    Mihail Ionescu  
    Srinath Parvathaneni 

    * config/arm/arm-builtins.c (UNOP_NONE_NONE_QUALIFIERS): 
Define macro.

    (UNOP_NONE_SNONE_QUALIFIERS): Likewise.
    (UNOP_NONE_UNONE_QUALIFIERS): Likewise.
    * config/arm/arm_mve.h (vrndxq_f16): Define macro.
    (vrndxq_f32): Likewise.
    (vrndq_f16): Likewise.
    (vrndq_f32): Likewise.
    (vrndpq_f16): Likewise.
    (vrndpq_f32): Likewise.
    (vrndnq_f16): Likewise.
    (vrndnq_f32): Likewise.
    (vrndmq_f16): Likewise.
    (vrndmq_f32): Likewise.
    (vrndaq_f16): Likewise.
    (vrndaq_f32): Likewise.
    (vrev64q_f16): Likewise.
    (vrev64q_f32): Likewise.
    (vnegq_f16): Likewise.
    (vnegq_f32): Likewise.
    (vdupq_n_f16): Likewise.
    (vdupq_n_f32): Likewise.
    (vabsq_f16): Likewise.
    (vabsq_f32): Likewise.
    (vrev32q_f16): Likewise.
    (vcvttq_f32_f16): Likewise.
    (vcvtbq_f32_f16): Likewise.
    (vcvtq_f16_s16): Likewise.
    (vcvtq_f32_s32): Likewise.
    (vcvtq_f16_u16): Likewise.
    (vcvtq_f32_u32): Likewise.
    (__arm_vrndxq_f16): Define intrinsic.
    (__arm_vrndxq_f32): Likewise.
    (__arm_vrndq_f16): Likewise.
    (__arm_vrndq_f32): Likewise.
    (__arm_vrndpq_f16): Likewise.
    (__arm_vrndpq_f32): Likewise.
    (__arm_vrndnq_f16): Likewise.
    (__arm_vrndnq_f32): Likewise.
    (__arm_vrndmq_f16): Likewise.
    (__arm_vrndmq_f32): Likewise.
    (__arm_vrndaq_f16): Likewise.
    (__arm_vrndaq_f32): Likewise.
    (__arm_vrev64q_f16): Likewise.
    (__arm_vrev64q_f32): Likewise.
    (__arm_vnegq_f16): Likewise.
    (__arm_vnegq_f32): Likewise.
    (__arm_vdupq_n_f16): Likewise.
    (__arm_vdupq_n_f32): Likewise.
    (__arm_vabsq_f16): Likewise.
    (__arm_vabsq_f32): Likewise.
    (__arm_vrev32q_f16): Likewise.
    (__arm_vcvttq_f32_f16): Likewise.
    (__arm_vcvtbq_f32_f16): Likewise.
    (__arm_vcvtq_f16_s16): Likewise.
    (__arm_vcvtq_f32_s32): Likewise.
    (__arm_vcvtq_f16_u16): Likewise.
    (__arm_vcvtq_f32_u32): Likewise.
    (vrndxq): Define polymorphic variants.
    (vrndq): Likewise.
    (vrndpq): Likewise.
    (vrndnq): Likewise.
    (vrndmq): Likewise.
    (vrndaq): Likewise.
    (vrev64q): Likewise.
    (vnegq): Likewise.
    (vabsq): Likewise.
    (vrev32q): Likewise.
    (vcvtbq_f32): Likewise.
    (vcvttq_f32): Likewise.
    (vcvtq): Likewise.
    * config/arm/arm_mve_builtins.def (VAR2): Define.
    (VAR1): Define.
    * config/arm/mve.md (mve_vrndxq_f): Add RTL pattern.
    (mve_vrndq_f): Likewise.
    (mve_vrndpq_f): Likewise.
    (mve_vrndnq_f): Likewise.
    (mve_vrndmq_f): Likewise.
    (mve_vrndaq_f): Likewise.
    (mve_vrev64q_f): Likewise.
    (mve_vnegq_f): Likewise.
    (mve_vdupq_n_f): Likewise.
    (mve_vabsq_f): Likewise.
    (mve_vrev32q_fv8hf): Likewise.
    (mve_vcvttq_f32_f16v4sf): Likewise.
    (mve_vcvtbq_f32_f16v4sf): Likewise.
    (mve_vcvtq_to_f_): Likewise.

gcc/testsuite/ChangeLog:

2019-10-17  Andre Vieira 
    Mihail Ionescu  
    Srinath Parvathaneni 

    * gcc.target/arm/mve/intrinsics/vabsq_f16.c: New test.
    * gcc.target/arm/mve/intrinsics/vabsq_f32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vcvtbq_f32_f16.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vcvtq_f16_s16.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vcvtq_f16_u16.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vcvtq_f32_s32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vcvtq_f32_u32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vcvttq_f32_f16.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vdupq_n_f16.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vdupq_n_f32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vnegq_f16.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vnegq_f32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vrev32q_f16.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vrev64q

Re: [PATCH][ARM][GCC][2/1x]: MVE intrinsics with unary operand.

2019-12-19 Thread Kyrill Tkachov

Hi Srinath,

On 11/14/19 7:13 PM, Srinath Parvathaneni wrote:

Hello,

This patch supports the following MVE ACLE intrinsics with unary operand.

vmvnq_n_s16, vmvnq_n_s32, vrev64q_s8, vrev64q_s16, vrev64q_s32, 
vcvtq_s16_f16, vcvtq_s32_f32,
vrev64q_u8, vrev64q_u16, vrev64q_u32, vmvnq_n_u16, vmvnq_n_u32, 
vcvtq_u16_f16, vcvtq_u32_f32,

vrev64q.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for 
more details.
[1] 
https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics
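
A representative use (a sketch in the style of the new tests;
reverse_in_doublewords is a made-up name):

    #include <arm_mve.h>

    uint8x16_t
    reverse_in_doublewords (uint8x16_t a)
    {
      return vrev64q_u8 (a);  /* Reverse the bytes within each 64-bit
                                 element.  */
    }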


Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-10-21  Andre Vieira 
    Mihail Ionescu  
    Srinath Parvathaneni 

    * config/arm/arm-builtins.c (UNOP_SNONE_SNONE_QUALIFIERS): Define.
    (UNOP_SNONE_NONE_QUALIFIERS): Likewise.
    (UNOP_SNONE_IMM_QUALIFIERS): Likewise.
    (UNOP_UNONE_NONE_QUALIFIERS): Likewise.
    (UNOP_UNONE_UNONE_QUALIFIERS): Likewise.
    (UNOP_UNONE_IMM_QUALIFIERS): Likewise.
    * config/arm/arm_mve.h (vmvnq_n_s16): Define macro.
    (vmvnq_n_s32): Likewise.
    (vrev64q_s8): Likewise.
    (vrev64q_s16): Likewise.
    (vrev64q_s32): Likewise.
    (vcvtq_s16_f16): Likewise.
    (vcvtq_s32_f32): Likewise.
    (vrev64q_u8): Likewise.
    (vrev64q_u16): Likewise.
    (vrev64q_u32): Likewise.
    (vmvnq_n_u16): Likewise.
    (vmvnq_n_u32): Likewise.
    (vcvtq_u16_f16): Likewise.
    (vcvtq_u32_f32): Likewise.
    (__arm_vmvnq_n_s16): Define intrinsic.
    (__arm_vmvnq_n_s32): Likewise.
    (__arm_vrev64q_s8): Likewise.
    (__arm_vrev64q_s16): Likewise.
    (__arm_vrev64q_s32): Likewise.
    (__arm_vrev64q_u8): Likewise.
    (__arm_vrev64q_u16): Likewise.
    (__arm_vrev64q_u32): Likewise.
    (__arm_vmvnq_n_u16): Likewise.
    (__arm_vmvnq_n_u32): Likewise.
    (__arm_vcvtq_s16_f16): Likewise.
    (__arm_vcvtq_s32_f32): Likewise.
    (__arm_vcvtq_u16_f16): Likewise.
    (__arm_vcvtq_u32_f32): Likewise.
    (vrev64q): Define polymorphic variant.
    * config/arm/arm_mve_builtins.def (UNOP_SNONE_SNONE): Use it.
    (UNOP_SNONE_NONE): Likewise.
    (UNOP_SNONE_IMM): Likewise.
    (UNOP_UNONE_UNONE): Likewise.
    (UNOP_UNONE_NONE): Likewise.
    (UNOP_UNONE_IMM): Likewise.
    * config/arm/mve.md (mve_vrev64q_): Define RTL 
pattern.

    (mve_vcvtq_from_f_): Likewise.
    (mve_vmvnq_n_): Likewise.

gcc/testsuite/ChangeLog:

2019-10-21  Andre Vieira 
    Mihail Ionescu  
    Srinath Parvathaneni 

    * gcc.target/arm/mve/intrinsics/vcvtq_s16_f16.c: New test.
    * gcc.target/arm/mve/intrinsics/vcvtq_s32_f32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vcvtq_u16_f16.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vcvtq_u32_f32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vmvnq_n_s16.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vmvnq_n_s32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vmvnq_n_u16.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vmvnq_n_u32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vrev64q_s16.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vrev64q_s32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vrev64q_s8.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vrev64q_u16.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vrev64q_u32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vrev64q_u8.c: Likewise.


### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 
2fee417fe6585f457edd4cf96655366b1d6bd1a0..21b213d8e1bc99a3946f15e97161e01d73832799 
100644

--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -313,6 +313,42 @@ arm_unop_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define UNOP_NONE_UNONE_QUALIFIERS \
   (arm_unop_none_unone_qualifiers)

+static enum arm_type_qualifiers
+arm_unop_snone_snone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none };
+#define UNOP_SNONE_SNONE_QUALIFIERS \
+  (arm_unop_snone_snone_qualifiers)
+
+static enum arm_type_qualifiers
+arm_unop_snone_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none };
+#define UNOP_SNONE_NONE_QUALIFIERS \
+  (arm_unop_snone_none_qualifiers)
+
+static enum arm_type_qualifiers
+arm_unop_snone_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_immediate };
+#define UNOP_SNONE_IMM_QUALIFIERS \
+  (arm_unop_snone_imm_qualifiers)
+
+static enum arm_type_qualifiers
+arm_unop_unone_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_none };
+#define UNOP_UNONE_NONE_QUALIFIERS \
+  (arm_unop_unone_none_qualifiers)
+
+static enum arm_type_qualifiers
+arm_unop_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qual

Re: [PATCH][ARM][GCC][4/1x]: MVE intrinsics with unary operand.

2019-12-19 Thread Kyrill Tkachov

Hi Srinath,

On 11/14/19 7:13 PM, Srinath Parvathaneni wrote:

Hello,

This patch supports the following MVE ACLE intrinsics with unary operand.

vctp16q, vctp32q, vctp64q, vctp8q, vpnot.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for 
more details.
[1] 
https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics


There are a few conflicts in defining the machine registers; these are
resolved by re-ordering VPR_REGNUM, APSRQ_REGNUM and APSRGE_REGNUM.
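
A representative use (a sketch in the style of the new tests;
first_n_lanes is a made-up name):

    #include <arm_mve.h>

    mve_pred16_t
    first_n_lanes (uint32_t n)
    {
      return vctp16q (n);  /* Predicate with the first n 16-bit lanes
                              active.  */
    }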

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-11-12  Andre Vieira 
    Mihail Ionescu  
    Srinath Parvathaneni 

    * config/arm/arm-builtins.c (hi_UP): Define mode.
    * config/arm/arm.h (IS_VPR_REGNUM): Move.
    * config/arm/arm.md (VPR_REGNUM): Define before APSRQ_REGNUM.
    (APSRQ_REGNUM): Modify.
    (APSRGE_REGNUM): Modify.
    * config/arm/arm_mve.h (vctp16q): Define macro.
    (vctp32q): Likewise.
    (vctp64q): Likewise.
    (vctp8q): Likewise.
    (vpnot): Likewise.
    (__arm_vctp16q): Define intrinsic.
    (__arm_vctp32q): Likewise.
    (__arm_vctp64q): Likewise.
    (__arm_vctp8q): Likewise.
    (__arm_vpnot): Likewise.
    * config/arm/arm_mve_builtins.def (UNOP_UNONE_UNONE): Use builtin
    qualifier.
    * config/arm/mve.md (mve_vctpqhi): Define RTL pattern.
    (mve_vpnothi): Likewise.

gcc/testsuite/ChangeLog:

2019-11-12  Andre Vieira 
    Mihail Ionescu  
    Srinath Parvathaneni 

    * gcc.target/arm/mve/intrinsics/vctp16q.c: New test.
    * gcc.target/arm/mve/intrinsics/vctp32q.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vctp64q.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vctp8q.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vpnot.c: Likewise.


### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 
21b213d8e1bc99a3946f15e97161e01d73832799..cd82aa159089c288607e240de02a85dcbb134a14 
100644

--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -387,6 +387,7 @@ arm_set_sat_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define oi_UP    E_OImode
 #define hf_UP    E_HFmode
 #define si_UP    E_SImode
+#define hi_UP    E_HImode
 #define void_UP  E_VOIDmode

 #define UP(X) X##_UP
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 
485db72f05f16ca389227289a35c232dc982bf9d..95ec7963a57a1a5652a0a9dc30391a0ce6348242 
100644

--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -955,6 +955,9 @@ extern int arm_arch_cmse;
 #define IS_IWMMXT_GR_REGNUM(REGNUM) \
   (((REGNUM) >= FIRST_IWMMXT_GR_REGNUM) && ((REGNUM) <= 
LAST_IWMMXT_GR_REGNUM))


+#define IS_VPR_REGNUM(REGNUM) \
+  ((REGNUM) == VPR_REGNUM)
+
 /* Base register for access to local variables of the function.  */
 #define FRAME_POINTER_REGNUM    102

@@ -999,7 +1002,7 @@ extern int arm_arch_cmse;
    && (LAST_VFP_REGNUM - (REGNUM) >= 2 * (N) - 1))

 /* The number of hard registers is 16 ARM + 1 CC + 1 SFP + 1 AFP
-   + 1 APSRQ + 1 APSRGE + 1 VPR.  */
+   +1 VPR + 1 APSRQ + 1 APSRGE.  */
 /* Intel Wireless MMX Technology registers add 16 + 4 more.  */
 /* VFP (VFP3) adds 32 (64) + 1 VFPCC.  */
 #define FIRST_PSEUDO_REGISTER   107
@@ -1101,13 +1104,10 @@ extern int arm_regs_in_sequence[];
   /* Registers not for general use.  */    \
   CC_REGNUM, VFPCC_REGNUM, \
   FRAME_POINTER_REGNUM, ARG_POINTER_REGNUM,    \
-  SP_REGNUM, PC_REGNUM, APSRQ_REGNUM, APSRGE_REGNUM,   \
-  VPR_REGNUM   \
+  SP_REGNUM, PC_REGNUM, VPR_REGNUM, APSRQ_REGNUM,\
+  APSRGE_REGNUM    \
 }

-#define IS_VPR_REGNUM(REGNUM) \
-  ((REGNUM) == VPR_REGNUM)
-
 /* Use different register alloc ordering for Thumb.  */
 #define ADJUST_REG_ALLOC_ORDER arm_order_regs_for_local_alloc ()

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
689baa0b0ff63ef90f47d2fd844cb98c9a1457a0..2a90482a873f8250a3b2b1dec141669f55e0c58b 
100644

--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -39,9 +39,9 @@
    (LAST_ARM_REGNUM  15)   ;
    (CC_REGNUM   100)   ; Condition code pseudo register
    (VFPCC_REGNUM    101)   ; VFP Condition code pseudo register
-   (APSRQ_REGNUM    104)   ; Q bit pseudo register
-   (APSRGE_REGNUM   105)   ; GE bits pseudo register
-   (VPR_REGNUM  106)   ; Vector Predication Register - MVE 
register.
+   (VPR_REGNUM  104)   ; Vector Predication Register - MVE 
register.

+   (APSRQ_REGNUM    105)   ; Q bit pseudo register
+   (APSRGE_REGNUM   106)   ; GE bits pseudo register
   ]
 )
 ;; 3rd operand to select_dominance_cc_mode
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 
1d357180ba9ddb26347b55cde625903bdb09eef6..c8d9b6471634725cea9bab3f9fa14

Re: [PATCH][ARM][GCC][1/2x]: MVE intrinsics with binary operands.

2019-12-19 Thread Kyrill Tkachov

Hi Srinath,

On 11/14/19 7:13 PM, Srinath Parvathaneni wrote:

Hello,

This patch supports the following MVE ACLE intrinsics with binary operands.

vsubq_n_f16, vsubq_n_f32, vbrsrq_n_f16, vbrsrq_n_f32, vcvtq_n_f16_s16,
vcvtq_n_f32_s32, vcvtq_n_f16_u16, vcvtq_n_f32_u32, vcreateq_f16, 
vcreateq_f32.


Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for 
more details.
[1] 
https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics


In this patch a new constraint "Rd" is added, which checks that the
constant is within the range of 1 to 16. A new predicate "mve_imm_16"
is also added, to check the matching constraint Rd.
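
For instance, a sketch of the immediate range the new constraint
enforces (fixed_to_float is a made-up name; the commented-out calls are
ones the Rd constraint would reject):

    #include <arm_mve.h>

    float16x8_t
    fixed_to_float (int16x8_t a)
    {
      return vcvtq_n_f16_s16 (a, 1);  /* OK: 1 is within [1, 16].  */
      /* vcvtq_n_f16_s16 (a, 0) or vcvtq_n_f16_s16 (a, 17) would be
         rejected.  */
    }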


Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-10-21  Andre Vieira 
    Mihail Ionescu  
    Srinath Parvathaneni 

    * config/arm/arm-builtins.c (BINOP_NONE_NONE_NONE_QUALIFIERS): 
Define

    qualifier for binary operands.
    (BINOP_NONE_NONE_IMM_QUALIFIERS): Likewise.
    (BINOP_NONE_UNONE_IMM_QUALIFIERS): Likewise.
    (BINOP_NONE_UNONE_UNONE_QUALIFIERS): Likewise.
    * config/arm/arm_mve.h (vsubq_n_f16): Define macro.
    (vsubq_n_f32): Likewise.
    (vbrsrq_n_f16): Likewise.
    (vbrsrq_n_f32): Likewise.
    (vcvtq_n_f16_s16): Likewise.
    (vcvtq_n_f32_s32): Likewise.
    (vcvtq_n_f16_u16): Likewise.
    (vcvtq_n_f32_u32): Likewise.
    (vcreateq_f16): Likewise.
    (vcreateq_f32): Likewise.
    (__arm_vsubq_n_f16): Define intrinsic.
    (__arm_vsubq_n_f32): Likewise.
    (__arm_vbrsrq_n_f16): Likewise.
    (__arm_vbrsrq_n_f32): Likewise.
    (__arm_vcvtq_n_f16_s16): Likewise.
    (__arm_vcvtq_n_f32_s32): Likewise.
    (__arm_vcvtq_n_f16_u16): Likewise.
    (__arm_vcvtq_n_f32_u32): Likewise.
    (__arm_vcreateq_f16): Likewise.
    (__arm_vcreateq_f32): Likewise.
    (vsubq): Define polymorphic variant.
    (vbrsrq): Likewise.
    (vcvtq_n): Likewise.
    * config/arm/arm_mve_builtins.def 
(BINOP_NONE_NONE_NONE_QUALIFIERS): Use

    it.
    (BINOP_NONE_NONE_IMM_QUALIFIERS): Likewise.
    (BINOP_NONE_UNONE_IMM_QUALIFIERS): Likewise.
    (BINOP_NONE_UNONE_UNONE_QUALIFIERS): Likewise.
    * config/arm/constraints.md (Rd): Define constraint to check 
constant is

    in the range of 1 to 16.
    * config/arm/mve.md (mve_vsubq_n_f): Define RTL pattern.
    mve_vbrsrq_n_f: Likewise.
    mve_vcvtq_n_to_f_: Likewise.
    mve_vcreateq_f: Likewise.
    * config/arm/predicates.md (mve_imm_16): Define predicate to check
    the matching constraint Rd.

gcc/testsuite/ChangeLog:

2019-10-21  Andre Vieira 
    Mihail Ionescu  
    Srinath Parvathaneni 

    * gcc.target/arm/mve/intrinsics/vbrsrq_n_f16.c: New test.
    * gcc.target/arm/mve/intrinsics/vbrsrq_n_f32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vcreateq_f16.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vcreateq_f32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vcvtq_n_f16_s16.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vcvtq_n_f16_u16.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vcvtq_n_f32_s32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vcvtq_n_f32_u32.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vsubq_n_f16.c: Likewise.
    * gcc.target/arm/mve/intrinsics/vsubq_n_f32.c: Likewise.


### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 
cd82aa159089c288607e240de02a85dcbb134a14..c2dad057d1365914477c64d559aa1fd1c32bbf19 
100644

--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -349,6 +349,30 @@ arm_unop_unone_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define UNOP_UNONE_IMM_QUALIFIERS \
   (arm_unop_unone_imm_qualifiers)

+static enum arm_type_qualifiers
+arm_binop_none_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_none };
+#define BINOP_NONE_NONE_NONE_QUALIFIERS \
+  (arm_binop_none_none_none_qualifiers)
+
+static enum arm_type_qualifiers
+arm_binop_none_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_immediate };
+#define BINOP_NONE_NONE_IMM_QUALIFIERS \
+  (arm_binop_none_none_imm_qualifiers)
+
+static enum arm_type_qualifiers
+arm_binop_none_unone_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_unsigned, qualifier_immediate };
+#define BINOP_NONE_UNONE_IMM_QUALIFIERS \
+  (arm_binop_none_unone_imm_qualifiers)
+
+static enum arm_type_qualifiers
+arm_binop_none_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_unsigned, qualifier_unsigned };
+#define BINOP_NONE_UNONE_UNONE_QUALIFIERS \
+  (arm_binop_none_unone_unone_qualifiers)
+
 /* End of Qualifier for MVE builtins.  */

    /* void ([T eleme

Re: [PATCH][Arm] Enable CLI for Armv8.6-a: armv8.6-a, i8mm and bf16

2019-12-20 Thread Kyrill Tkachov

Hi Dennis,

On 12/12/19 5:30 PM, Dennis Zhang wrote:

Hi all,

On 22/11/2019 14:33, Dennis Zhang wrote:
> Hi all,
>
> This patch is part of a series adding support for Armv8.6-A features.
> It enables options including -march=armv8.6-a, +i8mm and +bf16.
> The +i8mm and +bf16 features are optional for Armv8.2-a and onward.
> Documents are at https://developer.arm.com/docs/ddi0596/latest
>
> Regtested for arm-none-linux-gnueabi-armv8-a.
>

This is an update to rebase the patch to the top.
Some issues are fixed according to the recent CLI patch for AArch64.
ChangeLog is updated as following:

gcc/ChangeLog:

2019-12-12  Dennis Zhang  

    * config/arm/arm-c.c (arm_cpu_builtins): Define
    __ARM_FEATURE_MATMUL_INT8, __ARM_FEATURE_BF16_VECTOR_ARITHMETIC,
    __ARM_FEATURE_BF16_SCALAR_ARITHMETIC, and
    __ARM_BF16_FORMAT_ALTERNATIVE when enabled.
    * config/arm/arm-cpus.in (armv8_6, i8mm, bf16): New features.
    * config/arm/arm-tables.opt: Regenerated.
    * config/arm/arm.c (arm_option_reconfigure_globals): Initialize
    arm_arch_i8mm and arm_arch_bf16 when enabled.
    * config/arm/arm.h (TARGET_I8MM): New macro.
    (TARGET_BF16_FP, TARGET_BF16_SIMD): Likewise.
    * config/arm/t-aprofile: Add matching rules for -march=armv8.6-a.
    * config/arm/t-arm-elf (all_v8_archs): Add armv8.6-a.
    * config/arm/t-multilib: Add matching rules for -march=armv8.6-a.
    (v8_6_a_simd_variants): New.
    (v8_*_a_simd_variants): Add i8mm and bf16.
    * doc/invoke.texi (armv8.6-a, i8mm, bf16): Document new options.

gcc/testsuite/ChangeLog:

2019-12-12  Dennis Zhang  

    * gcc.target/arm/multilib.exp: Add combination tests for 
armv8.6-a.
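
For reference, the new flags compose on the command line in the usual
way; a hypothetical invocation (not from the patch or its tests):

    arm-none-linux-gnueabi-gcc -march=armv8.6-a+i8mm+bf16 -c test.c
    arm-none-linux-gnueabi-gcc -march=armv8.2-a+bf16 -c test.c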


Is it OK for trunk?



This is ok for trunk.

Please follow the steps at https://gcc.gnu.org/svnwrite.html to get 
write permission to the repo (listing me as approver).


You can then commit it yourself :)

Thanks,

Kyrill




Many thanks!
Dennis


Re: [PATCH] AARCH64: Add Qualcomm oryon-1 core

2024-05-14 Thread Kyrill Tkachov
Hi Andrew,

On Fri, May 3, 2024 at 8:50 PM Andrew Pinski 
wrote:

> This patch adds Qualcomm's new oryon-1 core; this is enough
> to recognize the core; the tuning structure will be added later on.
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-cores.def (oryon-1): New entry.
> * config/aarch64/aarch64-tune.md: Regenerate.
> * doc/invoke.texi  (AArch64 Options): Document oryon-1.
>
> Signed-off-by: Andrew Pinski 
> Co-authored-by: Joel Jones 
> Co-authored-by: Wei Zhao 
> ---
>  gcc/config/aarch64/aarch64-cores.def | 5 +
>  gcc/config/aarch64/aarch64-tune.md   | 2 +-
>  gcc/doc/invoke.texi  | 1 +
>  3 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/aarch64/aarch64-cores.def
> b/gcc/config/aarch64/aarch64-cores.def
> index f69fc212d56..be60929e400 100644
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -151,6 +151,11 @@ AARCH64_CORE("neoverse-512tvb", neoverse512tvb,
> cortexa57, V8_4A,  (SVE, I8MM, B
>  /* Qualcomm ('Q') cores. */
>  AARCH64_CORE("saphira", saphira,saphira,V8_4A,  (CRYPTO),
> saphira,   0x51, 0xC01, -1)
>
> +/* ARMv8.6-A Architecture Processors.  */
> +
> +/* Qualcomm ('Q') cores. */
> +AARCH64_CORE("oryon-1", oryon1, cortexa57, V8_6A, (CRYPTO, SM4, SHA3,
> F16), cortexa72,   0x51, 0x001, -1)
> +
>  /* ARMv8-A big.LITTLE implementations.  */
>
>  AARCH64_CORE("cortex-a57.cortex-a53",  cortexa57cortexa53, cortexa53,
> V8A,  (CRC), cortexa57, 0x41, AARCH64_BIG_LITTLE (0xd07, 0xd03), -1)
> diff --git a/gcc/config/aarch64/aarch64-tune.md
> b/gcc/config/aarch64/aarch64-tune.md
> index abd3c9e0822..ba940f1c890 100644
> --- a/gcc/config/aarch64/aarch64-tune.md
> +++ b/gcc/config/aarch64/aarch64-tune.md
> @@ -1,5 +1,5 @@
>  ;; -*- buffer-read-only: t -*-
>  ;; Generated automatically by gentune.sh from aarch64-cores.def
>  (define_attr "tune"
> -
>  
> "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,ampere1b,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,cobalt100,neoversev2,demeter,generic,generic_armv8_a,generic_armv9_a"
> +
>  
> "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,ampere1b,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,oryon1,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,cobalt100,neoversev2,demeter,generic,generic_armv8_a,generic_armv9_a"
> (const (symbol_ref "((enum attr_tune) aarch64_tune)")))
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 9456ced468a..eabe09dc28f 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -21323,6 +21323,7 @@ performance of the code.  Permissible values for
> this option are:
>  @samp{cortex-a65}, @samp{cortex-a65ae}, @samp{cortex-a34},
>  @samp{cortex-a78}, @samp{cortex-a78ae}, @samp{cortex-a78c},
>  @samp{ares}, @samp{exynos-m1}, @samp{emag}, @samp{falkor},
> +@samp{oyron-1},


Typo in the name.
LGTM with that fixed.
Thanks,
Kyrill


>
>  @samp{neoverse-512tvb}, @samp{neoverse-e1}, @samp{neoverse-n1},
>  @samp{neoverse-n2}, @samp{neoverse-v1}, @samp{neoverse-v2},
> @samp{qdf24xx},
>  @samp{saphira}, @samp{phecda}, @samp{xgene1}, @samp{vulcan},
> --
> 2.43.0
>
>


Re: [PATCH 1/4]AArch64: convert several predicate patterns to new compact syntax

2024-05-15 Thread Kyrill Tkachov
Hi Tamar,

On Wed, 15 May 2024 at 11:28, Tamar Christina 
wrote:

> Hi All,
>
> This converts the single alternative patterns to the new compact syntax
> such
> that when I add the new alternatives it's clearer what's being changed.
>
> Note that this will spew out a bunch of warnings from geninsn as it'll
> warn that
> @ is useless for a single alternative pattern.  These are not fatal so
> won't
> break the build and are only temporary.
>
> No change in functionality is expected with this patch.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?


Ok.
Thanks,
Kyrill


>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-sve.md (and3,
> @aarch64_pred__z, *3_cc,
> *3_ptest, aarch64_pred__z,
> *3_cc, *3_ptest,
> aarch64_pred__z, *3_cc,
> *3_ptest, *cmp_ptest,
> @aarch64_pred_cmp_wide,
> *aarch64_pred_cmp_wide_cc,
> *aarch64_pred_cmp_wide_ptest,
> *aarch64_brk_cc,
> *aarch64_brk_ptest, @aarch64_brk, *aarch64_brkn_cc,
> *aarch64_brkn_ptest, *aarch64_brk_cc,
> *aarch64_brk_ptest, aarch64_rdffr_z,
> *aarch64_rdffr_z_ptest,
> *aarch64_rdffr_ptest, *aarch64_rdffr_z_cc, *aarch64_rdffr_cc):
> Convert
> to compact syntax.
> * config/aarch64/aarch64-sve2.md
> (@aarch64_pred_): Likewise.
>
> ---
> diff --git a/gcc/config/aarch64/aarch64-sve.md
> b/gcc/config/aarch64/aarch64-sve.md
> index
> 0434358122d2fde71bd0e0f850338e739e9be02c..839ab0627747d7a49bef7b0192ee9e7a42587ca0
> 100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -1156,76 +1156,86 @@ (define_insn "aarch64_rdffr"
>
>  ;; Likewise with zero predication.
>  (define_insn "aarch64_rdffr_z"
> -  [(set (match_operand:VNx16BI 0 "register_operand" "=Upa")
> +  [(set (match_operand:VNx16BI 0 "register_operand")
> (and:VNx16BI
>   (reg:VNx16BI FFRT_REGNUM)
> - (match_operand:VNx16BI 1 "register_operand" "Upa")))]
> + (match_operand:VNx16BI 1 "register_operand")))]
>"TARGET_SVE && TARGET_NON_STREAMING"
> -  "rdffr\t%0.b, %1/z"
> +  {@ [ cons: =0, 1   ]
> +     [ Upa     , Upa ] rdffr\t%0.b, %1/z
> +  }
>  )
>
>  ;; Read the FFR to test for a fault, without using the predicate result.
>  (define_insn "*aarch64_rdffr_z_ptest"
>[(set (reg:CC_NZC CC_REGNUM)
> (unspec:CC_NZC
> - [(match_operand:VNx16BI 1 "register_operand" "Upa")
> + [(match_operand:VNx16BI 1 "register_operand")
>(match_dup 1)
>(match_operand:SI 2 "aarch64_sve_ptrue_flag")
>(and:VNx16BI
>  (reg:VNx16BI FFRT_REGNUM)
>  (match_dup 1))]
>   UNSPEC_PTEST))
> -   (clobber (match_scratch:VNx16BI 0 "=Upa"))]
> +   (clobber (match_scratch:VNx16BI 0))]
>"TARGET_SVE && TARGET_NON_STREAMING"
> -  "rdffrs\t%0.b, %1/z"
> +  {@ [ cons: =0, 1  , 2 ]
> +     [ Upa     , Upa,   ] rdffrs\t%0.b, %1/z
> +  }
>  )
>
>  ;; Same for unpredicated RDFFR when tested with a known PTRUE.
>  (define_insn "*aarch64_rdffr_ptest"
>[(set (reg:CC_NZC CC_REGNUM)
> (unspec:CC_NZC
> - [(match_operand:VNx16BI 1 "register_operand" "Upa")
> + [(match_operand:VNx16BI 1 "register_operand")
>(match_dup 1)
>(const_int SVE_KNOWN_PTRUE)
>(reg:VNx16BI FFRT_REGNUM)]
>   UNSPEC_PTEST))
> -   (clobber (match_scratch:VNx16BI 0 "=Upa"))]
> +   (clobber (match_scratch:VNx16BI 0))]
>"TARGET_SVE && TARGET_NON_STREAMING"
> -  "rdffrs\t%0.b, %1/z"
> +  {@ [ cons: =0, 1   ]
> +     [ Upa     , Upa ] rdffrs\t%0.b, %1/z
> +  }
>  )
>
>  ;; Read the FFR with zero predication and test the result.
>  (define_insn "*aarch64_rdffr_z_cc"
>[(set (reg:CC_NZC CC_REGNUM)
> (unspec:CC_NZC
> - [(match_operand:VNx16BI 1 "register_operand" "Upa")
> + [(match_operand:VNx16BI 1 "register_operand")
>(match_dup 1)
>(match_operand:SI 2 "aarch64_sve_ptrue_flag")
>(and:VNx16BI
>  (reg:VNx16BI FFRT_REGNUM)
>  (match_dup 1))]
>   UNSPEC_PTEST))
> -   (set (match_operand:VNx16BI 0 "register_operand" "=Upa")
> +   (set (match_operand:VNx16BI 0 "register_operand")
> (and:VNx16BI
>   (reg:VNx16BI FFRT_REGNUM)
>   (match_dup 1)))]
>"TARGET_SVE && TARGET_NON_STREAMING"
> -  "rdffrs\t%0.b, %1/z"
> +  {@ [ cons: =0, 1  , 2 ]
> +     [ Upa     , Upa,   ] rdffrs\t%0.b, %1/z
> +  }
>  )
>
>  ;; Same for unpredicated RDFFR when tested with a known PTRUE.
>  (define_insn "*aarch64_rdffr_cc"
>[(set (reg:CC_NZC CC_REGNUM)
> (unspec:CC_NZC
> - [(match_operand:VNx16BI 1 "register_operand" "Upa")
> + [(match_operand:VNx16BI 1 "register_operand")
>(match_dup 1)
>(const_int SVE_KNOWN_PTRUE)
>(reg:VNx16BI FFRT_REGNUM)]
>   UNSPEC_PTEST))
> -   (set (match_operand:VNx16BI 0 "register_operan

Re: [PATCH] AArch64: Add ACLE MOPS support

2024-05-31 Thread Kyrill Tkachov
Hi Wilco,

On Fri, May 31, 2024 at 6:38 PM Wilco Dijkstra 
wrote:

> Hi Richard,
>
> > I think this should be in a push_options/pop_options block, as for other
> > intrinsics that require certain features.
>
> But then the intrinsic would always be defined, which is contrary to
> what the ACLE spec demands - it would not give a compilation error at
> the callsite but give assembler errors (potentially in different
> functions after inlining).
>
> > What was the reason for using an inline asm rather than a builtin?
> > Feels a bit old school. :)  Using a builtin should mean that the
> > RTL optimisers see the extent of the write.
>
> Given this intrinsic will be used very rarely, if ever, it does not
> make sense to provide anything more than the basic functionality.
>

I agree that it's unlikely to get much use.
IMO we should be moving as much of arm_acle.h as possible to be
implemented through the #pragma GCC aarch64 "arm_acle.h" mechanism at
the top of the header.  So I'd expect handle_arm_acle_h to be extended
to inject these definitions when appropriate; during expansion it would
just generate the RTL pattern for the intrinsic, which needn't be
exposed as an implementation-defined builtin.
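
To make the conformance point concrete, here is a minimal sketch
(assuming the ACLE spelling __arm_mops_memset_tag and the +mops/+memtag
extensions; illustrative only, not the committed implementation):

#include <stddef.h>
#include <arm_acle.h>

void *
set_and_tag (void *p, size_t n)
{
  /* ACLE requires a compile-time error here unless MOPS and MTE are
     enabled for the translation unit (e.g. -march=armv8.8-a+memtag).
     If arm_acle.h instead wrapped the declaration in
     #pragma GCC push_options / #pragma GCC target ("+mops+memtag"),
     this call would compile everywhere and fail only at assembly time,
     potentially in a different function after inlining.  */
  return __arm_mops_memset_tag (p, 0xab, n);
}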
Thanks,
Kyrill

> Cheers,
> Wilco


RE: [PATCH][ARM] Improve cond_exec opportunities for immediate shifts for -mrestrict-it

2013-09-10 Thread Kyrill Tkachov
> On 09/09/13 17:32, Kyrylo Tkachov wrote:
> > Hi all,
> >
> > Shift operations with an immediate can generate a 16-bit encoding when
> > the immediate is 5-bit wide, i.e. in the range [0-31]. Therefore we
> > can use them in IT blocks even with the -mrestrict-it rules.
> >
> > I decided to reuse the "N" constraint with a slight modification by
> > removing the !TARGET_32BIT condition from it. The "N" constraint is
> > only used in a few patterns that are predicated by TARGET_THUMB1
> > anyway, so there's no reason for the constraint to check for that as well.
> >
> 
> Hmm, I see no reason why you can't use the M constraint as ARM does.

Ok, the shift_operator predicate takes care of the validation, so it
seems we can indeed reuse "M"; I notice it's also used in
thumb2_shiftsi3_short to generate 16-bit shifts.
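
For illustration, this is the kind of code the patch should now
if-convert under -mrestrict-it (a hedged sketch of my own, not the new
thumb-ifcvt-2.c testcase):

/* With "M" the immediate shift below can be conditionalised inside an
   IT block even under -mrestrict-it, because the 16-bit shift
   encodings accept immediates in the range [0, 31].  */
int
shift_if_greater (int a, int b)
{
  if (a > b)
    a <<= 3;	/* expected: cmp ; it gt ; lslgt a, a, #3  */
  return a + b;
}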

How about this then?

Tested arm-none-eabi on model with -march=armv8-a.

Kyrill

2013-09-10  Kyrylo Tkachov  

* config/arm/arm.md (arm_shiftsi3): New alternative l/l/M.

2013-09-10  Kyrylo Tkachov  

* gcc.target/arm/thumb-ifcvt-2.c: New test.

GCC32A-1043.patch
Description: Binary data


[PATCH][ARM] set "type" attribute properly in arm_cmpsi_insn, cleanup

2013-09-13 Thread Kyrill Tkachov

Hi all,

This patch splits the r/rI alternative in arm_cmpsi_insn so as to
better specify the "type" attribute. Also, the predicable_short_it
attribute is set properly to conform to the -mrestrict-it rules.

A nearby pattern has a redundant "%?" removed.

Tested arm-none-eabi on a model and bootstrapped on Chromebook.

Ok for trunk?

Thanks,
Kyrill

2013-09-13  Kyrylo Tkachov  

* config/arm/arm.md (arm_cmpsi_insn): Split rI alternative.
Set type attribute correctly. Set predicable_short_it attribute.
(cmpsi_shiftsi): Remove %? from output template.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 423d535bfc63d8db75f233a9a73aaf23d1e83108.. 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -8194,19 +8194,21 @@ (define_insn "*addsi3_cbranch_scratch"
 
 (define_insn "*arm_cmpsi_insn"
   [(set (reg:CC CC_REGNUM)
-	(compare:CC (match_operand:SI 0 "s_register_operand" "l,r,r,r")
-		(match_operand:SI 1 "arm_add_operand" "Py,r,rI,L")))]
+	(compare:CC (match_operand:SI 0 "s_register_operand" "l,r,r,r,r")
+		(match_operand:SI 1 "arm_add_operand" "Py,r,r,I,L")))]
   "TARGET_32BIT"
   "@
cmp%?\\t%0, %1
cmp%?\\t%0, %1
cmp%?\\t%0, %1
+   cmp%?\\t%0, %1
cmn%?\\t%0, #%n1"
   [(set_attr "conds" "set")
-   (set_attr "arch" "t2,t2,any,any")
-   (set_attr "length" "2,2,4,4")
+   (set_attr "arch" "t2,t2,any,any,any")
+   (set_attr "length" "2,2,4,4,4")
(set_attr "predicable" "yes")
-   (set_attr "type" "*,*,*,arlo_imm")]
+   (set_attr "predicable_short_it" "yes,yes,yes,no,no")
+   (set_attr "type" "arlo_imm,*,*,arlo_imm,arlo_imm")]
 )
 
 (define_insn "*cmpsi_shiftsi"
@@ -8216,7 +8218,7 @@ (define_insn "*cmpsi_shiftsi"
 		 [(match_operand:SI 1 "s_register_operand" "r,r")
 		  (match_operand:SI 2 "shift_amount_operand" "M,rM")])))]
   "TARGET_32BIT"
-  "cmp%?\\t%0, %1%S3"
+  "cmp\\t%0, %1%S3"
   [(set_attr "conds" "set")
(set_attr "shift" "1")
(set_attr "arch" "32,a")

[PATCH][ARM][testsuite] Add effective target check for arm conditional execution

2013-09-13 Thread Kyrill Tkachov

Hi all,

gcc.target/arm/minmax_minus.c is really only valid when we have
conditional execution available, that is, on targets that are not
Thumb1-only.  I've added an effective target check for that and used it
in the test so that it does not get run and produce a spurious FAIL when
testing Thumb1 targets.


Ok for trunk?

Thanks,
Kyrill

2013-09-13  Kyrylo Tkachov  

* lib/target-supports.exp (check_effective_target_arm_cond_exec):
New procedure.
* gcc.target/arm/minmax_minus.c: Check for cond_exec target.

diff --git a/gcc/testsuite/gcc.target/arm/minmax_minus.c b/gcc/testsuite/gcc.target/arm/minmax_minus.c
index 4c2dcdf..906342a 100644
--- a/gcc/testsuite/gcc.target/arm/minmax_minus.c
+++ b/gcc/testsuite/gcc.target/arm/minmax_minus.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target arm_cond_exec } */
 /* { dg-options "-O2" } */
 
 #define MAX(a, b) (a > b ? a : b)
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 0fb135c..fbe756e 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2577,6 +2577,17 @@ proc check_effective_target_arm_thumb2 { } {
 } ""]
 }
 
+# Return 1 if this is an ARM target where conditional execution is available.
+
+proc check_effective_target_arm_cond_exec { } {
+return [check_no_compiler_messages arm_cond_exec assembly {
+	#if defined(__arm__) && defined(__thumb__) && !defined(__thumb2__)
+	#error FOO
+	#endif
+	int i;
+} ""]
+}
+
 # Return 1 if this is an ARM cortex-M profile cpu
 
 proc check_effective_target_arm_cortex_m { } {

[PATCH] Fix FAIL: g++.dg/debug/ra1.C on arm

2013-09-16 Thread Kyrill Tkachov

Hi all,

The test g++.dg/debug/ra1.C now gives an extra warning on arm:
"warning: width of 'tree_base::code' exceeds its type [enabled by default]"
which causes the test to "fail".

This patch adds -fno-short-enums to the test to avoid the warning on
targets with short enums (including arm).
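
For reference, a reduced sketch of how the warning arises (the 16-bit
field width is my assumption; the declarations in ra1.C are similar in
shape):

/* With -fshort-enums, tree_code has only two enumerators and is given
   a one-byte type, so a 16-bit bit-field of that type triggers
   "width of 'tree_base::code' exceeds its type".  With
   -fno-short-enums the enum keeps full int width and the warning
   disappears.  */
enum tree_code { FOO, BAR };

struct tree_base
{
  enum tree_code code : 16;
};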


Tested to make sure it now passes on arm-none-eabi.

Ok for trunk?

(CC'ing Richard because he added the test)

[gcc/testsuite/]
2013-09-16  Kyrylo Tkachov  

* g++.dg/debug/ra1.C: Add -fno-short-enums for short_enum targets.

diff --git a/gcc/testsuite/g++.dg/debug/ra1.C b/gcc/testsuite/g++.dg/debug/ra1.C
index b6f7bfc..aae7311 100644
--- a/gcc/testsuite/g++.dg/debug/ra1.C
+++ b/gcc/testsuite/g++.dg/debug/ra1.C
@@ -1,4 +1,5 @@
 /* { dg-options "-fcompare-debug" } */
+/* { dg-additional-options "-fno-short-enums" { target { short_enums } } } */
 
 enum signop { SIGNED, UNSIGNED };
 enum tree_code { FOO, BAR };

Re: [PATCH][Resend][tree-optimization] Fix PR58088

2013-09-16 Thread Kyrill Tkachov

Ping.

On 09/09/13 10:56, Kyrylo Tkachov wrote:

[Resending, since I was away and had not been pinging it]


Hi all,

In PR58088 the constant folder goes into an infinite recursion and runs out of
stack space because of two conflicting optimisations: the fold of
(X * C1) & C2 misbehaves when nested inside an IOR expression like so:
((X * C1) & C2) | C4.  Each transformation can undo the other, leading to the
infinite recursion.

Thanks to Marek for finding the IOR case.

This patch fixes that by checking in the IOR case that the change to C2 will
not conflict with the AND case transformation.  Example testcases are in the
PR on bugzilla.
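
For illustration, the problematic shape looks like this (a hedged
sketch with constants chosen for readability, not the reproducer from
the PR):

/* Shape ((X * C1) & C2) | C4.  The BIT_AND fold adjusts the mask C2
   using the trailing zero bits that X * C1 is known to have, while the
   BIT_IOR fold clears mask bits already covered by C4; for the right
   constants each transformation undoes the other and fold_binary_loc
   recurses until the stack is exhausted.  */
int
shape (int x)
{
  return ((x * 4) & 14) | 3;
}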

This affects both trunk and 4.8; the patch bootstraps and regtests cleanly on
both.

Bootstrapped on x86_64-linux-gnu and tested arm-none-eabi on qemu.

Ok for trunk and 4.8?

Thanks,
Kyrill

2013-09-09  Kyrylo Tkachov  

PR tree-optimization/58088
* fold-const.c (mask_with_trailing_zeros): New function.
(fold_binary_loc): Make sure we don't recurse infinitely
when the X in (X & C1) | C2 is a tree of the form (Y * K1) & K2.
Use mask_with_trailing_zeros where appropriate.


2013-09-09  Kyrylo Tkachov  

PR tree-optimization/58088
* gcc.c-torture/compile/pr58088.c: New test.




