Re: [PATCH 1/6] aarch64: Sync system register information with Binutils

2023-10-05 Thread Richard Earnshaw




On 03/10/2023 16:18, Victor Do Nascimento wrote:

This patch adds the `aarch64-sys-regs.def' file to GCC, teaching
the compiler about system registers known to the assembler and how
these can be used.

The macros used to hold system register information reflect those in
use by binutils, a design choice made to facilitate the sharing of data
between different parts of the toolchain.

By aligning the representation of data common to different parts of
the toolchain we can greatly reduce the duplication of work,
facilitating the maintenance of the aarch64 back-end across different
parts of the toolchain; any `SYSREG (...)' that is added in one
project can just as easily be added to its counterpart.

GCC does not implement the full range of ISA flags present in
Binutils.  Where this is the case, aliases must be added to aarch64.h
with the unknown architectural extension being mapped to its
associated base architecture, such that any flag present in Binutils
and used in system register definitions is understood in GCC.  Again,
this is done such that flags can be used interchangeably between
projects making use of the aarch64-sys-regs.def file.  This is done
in the next patch in the series.

`.arch' directives missing from the emitted assembly files as a
consequence of this aliasing are accounted for by the compiler using
the S<op0>_<op1>_<Cn>_<Cm>_<op2> encoding of system registers when
issuing mrs/msr instructions.  This design choice ensures the
assembler will accept anything that was deemed acceptable by the
compiler.

gcc/ChangeLog:

* gcc/config/aarch64/aarch64-sys-regs.def: New.
---
  gcc/config/aarch64/aarch64-sys-regs.def | 1059 +++
  1 file changed, 1059 insertions(+)
  create mode 100644 gcc/config/aarch64/aarch64-sys-regs.def


This file is supposed to be /identical/ to the one in GNU Binutils, 
right?  If so, I think it needs to continue to say that it is part of 
GNU Binutils, not part of GCC.  Ramana, has this happened before?  If 
not, does the SC have a position here?


R.


diff --git a/gcc/config/aarch64/aarch64-sys-regs.def b/gcc/config/aarch64/aarch64-sys-regs.def
new file mode 100644
index 000..d77fee1d5e3
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-sys-regs.def
@@ -0,0 +1,1059 @@
+/* Copyright (C) 2023 Free Software Foundation, Inc.
+   Contributed by Arm Ltd
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Array of system registers and their associated arch features.
+
+   Before using #include to read this file, define a macro:
+
+ SYSREG (name, encoding, flags, features)
+
+  The NAME is the system register name, as recognized by the
+  assembler.  ENCODING provides the necessary information for the binary
+  encoding of the system register.  The FLAGS field is a bitmask of
+  relevant behavior information pertaining to the particular register.
+  For example: is it read/write-only? does it alias another register?
+  The FEATURES field maps onto ISA flags and specifies the architectural
+  feature requirements of the system register.  */
+
+  SYSREG ("accdata_el1", CPENC (3,0,13,0,5), 0,         AARCH64_NO_FEATURES)
+  SYSREG ("actlr_el1",   CPENC (3,0,1,0,1),  0,         AARCH64_NO_FEATURES)
+  SYSREG ("actlr_el2",   CPENC (3,4,1,0,1),  0,         AARCH64_NO_FEATURES)
+  SYSREG ("actlr_el3",   CPENC (3,6,1,0,1),  0,         AARCH64_NO_FEATURES)
+  SYSREG ("afsr0_el1",   CPENC (3,0,5,1,0),  0,         AARCH64_NO_FEATURES)
+  SYSREG ("afsr0_el12",  CPENC (3,5,5,1,0),  F_ARCHEXT, AARCH64_FEATURE (V8_1A))
+  SYSREG ("afsr0_el2",   CPENC (3,4,5,1,0),  0,         AARCH64_NO_FEATURES)
+  SYSREG ("afsr0_el3",   CPENC (3,6,5,1,0),  0,         AARCH64_NO_FEATURES)
+  SYSREG ("afsr1_el1",   CPENC (3,0,5,1,1),  0,         AARCH64_NO_FEATURES)
+  SYSREG ("afsr1_el12",  CPENC (3,5,5,1,1),  F_ARCHEXT, AARCH64_FEATURE (V8_1A))
+  SYSREG ("afsr1_el2",   CPENC (3,4,5,1,1),  0,         AARCH64_NO_FEATURES)
+  SYSREG ("afsr1_el3",   CPENC (3,6,5,1,1),  0,         AARCH64_NO_FEATURES)
+  SYSREG ("aidr_el1",   

Re: [PATCH 2/6] aarch64: Add support for aarch64-sys-regs.def

2023-10-05 Thread Richard Earnshaw




On 03/10/2023 16:18, Victor Do Nascimento wrote:

This patch defines the structure of a new .def file used for
representing the aarch64 system registers, what information it should
hold and the basic framework in GCC to process this file.

Entries in the aarch64-sys-regs.def file should be as follows:

   SYSREG (NAME, CPENC (sn,op1,cn,cm,op2), FLAG1 | ... | FLAGn, ARCH)

Where the arguments to SYSREG correspond to:
   - NAME:  The system register name, as used in the assembly language.
   - CPENC: The system register encoding, mapping to:

   s<sn>_<op1>_c<cn>_c<cm>_<op2>

   - FLAG: The entries in the FLAGS field are bitwise-OR'd together to
  encode extra information required to ensure proper use of
  the system register.  For example, a read-only system
  register will have the flag F_REG_READ, while write-only
  registers will be labeled F_REG_WRITE.  Such flags are
  tested against at compile-time.
   - ARCH: The architectural features the system register is associated
  with.  This is encoded via one of three possible macros:
  1. When a system register is universally implemented, we say
  it has no feature requirements, so we tag it with the
  AARCH64_NO_FEATURES macro.
  2. When a register is only implemented for a single
  architectural extension EXT, the AARCH64_FEATURE (EXT) macro is
  used.
  3. When a given system register is made available by any of N
  possible architectural extensions, the AARCH64_FEATURES(N, ...)
  macro is used to combine them accordingly.

In order to enable proper interpretation of the SYSREG entries by the
compiler, flags defining system register behavior such as `F_REG_READ'
and `F_REG_WRITE' are also defined here, so they can later be used for
the validation of system register properties.

Finally, any architectural feature flags from Binutils missing from GCC
have appropriate aliases defined here so as to ensure
cross-compatibility of SYSREG entries across the toolchain.
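
For readers unfamiliar with the pattern, the SYSREG layout described
above is an X-macro table: the same list of entries is expanded several
times with different definitions of SYSREG.  What follows is only a
minimal sketch of that idea, not the code from the patch.  The flag and
feature bit values are assumptions, the list is held in a macro
(DEMO_SYSREGS) rather than in a .def file so that the example is
self-contained, and "demo_ro_el1" and generic_name_for are invented for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag and feature bits; the real values live in aarch64.h.  */
#define F_REG_READ  (1u << 0)
#define F_ARCHEXT   (1u << 1)
#define AARCH64_FL_V8_1A (1ull << 0)
#define AARCH64_FL_RAS   (1ull << 1)
#define AARCH64_NO_FEATURES 0ull

/* Feature helpers, following the shape described in the patch.  */
#define AARCH64_FEATURE(FEAT) AARCH64_FL_##FEAT
#define AARCH64_OR_FEATURES_1(X, F1) AARCH64_FEATURE (F1)
#define AARCH64_OR_FEATURES_2(X, F1, F2) \
  (AARCH64_FEATURE (F1) | AARCH64_OR_FEATURES_1 (X, F2))
#define AARCH64_FEATURES(N, ...) AARCH64_OR_FEATURES_##N (0, __VA_ARGS__)

/* Turn an encoding into its generic register name by stringification.  */
#define CPENC(SN, OP1, CN, CM, OP2) \
  "s" #SN "_" #OP1 "_c" #CN "_c" #CM "_" #OP2

/* Stand-in for the .def file.  The last entry is invented.  */
#define DEMO_SYSREGS \
  SYSREG ("actlr_el1",   CPENC (3,0,1,0,1), 0, AARCH64_NO_FEATURES) \
  SYSREG ("afsr0_el12",  CPENC (3,5,5,1,0), F_ARCHEXT, \
	  AARCH64_FEATURE (V8_1A)) \
  SYSREG ("demo_ro_el1", CPENC (3,0,5,3,0), F_REG_READ | F_ARCHEXT, \
	  AARCH64_FEATURES (2, RAS, V8_1A))

/* One table per field, all generated from the same list.  */
static const char *const sysreg_names[] = {
#define SYSREG(NAME, ENC, FLAGS, ARCH) NAME,
  DEMO_SYSREGS
#undef SYSREG
};

static const char *const sysreg_names_generic[] = {
#define SYSREG(NAME, ENC, FLAGS, ARCH) ENC,
  DEMO_SYSREGS
#undef SYSREG
};

static const unsigned long long sysreg_reqs[] = {
#define SYSREG(NAME, ENC, FLAGS, ARCH) ARCH,
  DEMO_SYSREGS
#undef SYSREG
};

static const unsigned sysreg_properties[] = {
#define SYSREG(NAME, ENC, FLAGS, ARCH) FLAGS,
  DEMO_SYSREGS
#undef SYSREG
};

static const size_t nsysreg = sizeof sysreg_names / sizeof sysreg_names[0];

/* Return the generic spelling for NAME, or NULL if NAME is unknown.  */
static const char *
generic_name_for (const char *name)
{
  for (size_t i = 0; i < nsysreg; i++)
    if (strcmp (sysreg_names[i], name) == 0)
      return sysreg_names_generic[i];
  return NULL;
}
```

Because every table is generated from one list, index i refers to the
same register in all of them; with these definitions
generic_name_for ("afsr0_el12") yields "s3_5_c5_c1_0".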

gcc/ChangeLog:

* gcc/config/aarch64/aarch64.cc (sysreg_names): New.
(sysreg_names_generic): Likewise.
(sysreg_reqs): Likewise.
(sysreg_properties): Likewise.
(nsysreg): Likewise.
* gcc/config/aarch64/aarch64.h (AARCH64_ISA_V8A): Add missing
ISA flag.
(AARCH64_ISA_V8_1A): Likewise.
(AARCH64_ISA_V8_7A): Likewise.
(AARCH64_ISA_V8_8A): Likewise.
(AARCH64_NO_FEATURES): Likewise.
(AARCH64_FL_RAS): New ISA flag alias.
(AARCH64_FL_LOR): Likewise.
(AARCH64_FL_PAN): Likewise.
(AARCH64_FL_AMU): Likewise.
(AARCH64_FL_SCXTNUM): Likewise.
(AARCH64_FL_ID_PFR2): Likewise.
(F_DEPRECATED): New.
(F_REG_READ): Likewise.
(F_REG_WRITE): Likewise.
(F_ARCHEXT): Likewise.
(F_REG_ALIAS): Likewise.
---
  gcc/config/aarch64/aarch64.cc | 55 +++
  gcc/config/aarch64/aarch64.h  | 36 +++
  2 files changed, 91 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 9fbfc548a89..030b39ded1a 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -89,6 +89,8 @@
  /* This file should be included last.  */
  #include "target-def.h"
  
+#include "aarch64.h"


This shouldn't be needed.  target.h (included further up) includes tm.h 
which includes this file.


Otherwise OK.

Reviewed-by: rearn...@arm.com


+
  /* Defined for convenience.  */
  #define POINTER_BYTES (POINTER_SIZE / BITS_PER_UNIT)
  
@@ -2807,6 +2809,59 @@ static const struct processor all_cores[] =

{NULL, aarch64_none, aarch64_none, aarch64_no_arch, 0, NULL}
  };
  
+/* Database of system register names.  */

+const char *sysreg_names[] =
+{
+#define SYSREG(NAME, ENC, FLAGS, ARCH) NAME,
+#include "aarch64-sys-regs.def"
+#undef SYSREG
+};
+
+const char *sysreg_names_generic[] =
+{
+#define CPENC(SN, OP1, CN, CM, OP2) "s"#SN"_"#OP1"_c"#CN"_c"#CM"_"#OP2
+#define SYSREG(NAME, ENC, FLAGS, ARCH) ENC,
+#include "aarch64-sys-regs.def"
+#undef SYSREG
+};
+
+/* An aarch64_feature_set initializer for a single feature,
+   AARCH64_FEATURE_.  */
+#define AARCH64_FEATURE(FEAT) AARCH64_FL_##FEAT
+
+/* Used by AARCH64_FEATURES.  */
+#define AARCH64_OR_FEATURES_1(X, F1) \
+  AARCH64_FEATURE (F1)
+#define AARCH64_OR_FEATURES_2(X, F1, F2) \
+  (AARCH64_FEATURE (F1) | AARCH64_OR_FEATURES_1 (X, F2))
+#define AARCH64_OR_FEATURES_3(X, F1, ...) \
+  (AARCH64_FEATURE (F1) | AARCH64_OR_FEATURES_2 (X, __VA_ARGS__))
+
+/* An aarch64_feature_set initializer for the N features listed in "...".  */
+#define AARCH64_FEATURES(N, ...) \
+  AARCH64_OR_FEATURES_##N (0, __VA_ARGS__)
+
+/* Database of system register architectural requirements.  */
+const unsigned long long sysreg_reqs[] =
+{
+#define SYSREG(NAME, ENC, FLAGS, ARCH) ARCH,
+#include "aarch64-sys-regs.def"
+#undef SYSREG
+};
+
+/* Database 

Re: [PATCH 3/6] aarch64: Implement system register validation tools

2023-10-05 Thread Richard Earnshaw




On 03/10/2023 16:18, Victor Do Nascimento wrote:

Given the implementation of a mechanism of encoding system registers
into GCC, this patch provides the mechanism of validating their use by
the compiler.  In particular, this involves:

   1. Ensuring a supplied string corresponds to a known system
  register name.  System registers can be accessed either via their
  name (e.g. `SPSR_EL1') or their encoding (e.g. `S3_0_C4_C0_0').
  Register names are validated using a binary search of the
  `sysreg_names' structure populated from the
  `aarch64-sys-regs.def' file via `match_reg'.
  The encoding naming convention is validated via a parser
  implemented in this patch - `is_implem_def_reg'.
   2. Once a given register name is deemed to be valid, it is checked
  against a further 2 criteria:
a. Is the referenced register implemented in the target
   architecture?  This is achieved by comparing the ARCH field
  in the relevant SYSREG entry from `aarch64-sys-regs.def'
  against `aarch64_feature_flags' flags set at compile-time.
b. Is the register being used correctly?  Check the requested
  operation against the FLAGS specified in SYSREG.
  This prevents operations like writing to a read-only system
  register.
NOTE: For registers specified via their encoding
(e.g. `S3_0_C4_C0_0'), once the encoding value is deemed valid
(as per step 1) no further checks such as read/write support or
architectural feature requirements are done and this second step
is skipped, as is done in gas.
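
The two-step check described above can be sketched roughly as follows.
This is an illustration under stated assumptions, not the code in the
patch: the register table, the flag values, the status enum, and the
helper names eat, eat_num, is_encoded_sysreg and check_sysreg are all
hypothetical, and the feature test assumes "any one of the listed
extensions is enough", in line with the AARCH64_FEATURES description in
patch 2.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define F_REG_READ  (1u << 0)	/* Register is read-only.  */
#define F_REG_WRITE (1u << 1)	/* Register is write-only.  */

struct sysreg { const char *name; unsigned flags; unsigned long long reqs; };

/* Illustrative entries, kept sorted by name for the binary search.  */
static const struct sysreg demo_regs[] = {
  { "actlr_el1",   0,           0 },
  { "afsr0_el12",  0,           1ull << 0 },	/* Hypothetical feature bit.  */
  { "demo_ro_el1", F_REG_READ,  0 },		/* Invented register.  */
  { "demo_wo_el1", F_REG_WRITE, 0 },		/* Invented register.  */
};

enum sysreg_status
{ SYSREG_OK, SYSREG_UNKNOWN, SYSREG_UNAVAILABLE, SYSREG_BAD_ACCESS };

/* Binary search by name; return the index of the hit, or -1.  */
static int
match_reg (const char *ref)
{
  int imin = 0, imax = (int) (sizeof demo_regs / sizeof demo_regs[0]) - 1;
  while (imin <= imax)
    {
      int mid = imin + (imax - imin) / 2;
      int cmp = strcmp (ref, demo_regs[mid].name);
      if (cmp == 0)
	return mid;
      if (cmp > 0)
	imin = mid + 1;
      else
	imax = mid - 1;
    }
  return -1;
}

/* Consume the lower-case literal LIT from *P, ignoring case.  */
static bool
eat (const char **p, const char *lit)
{
  const char *s = *p;
  for (; *lit; lit++, s++)
    if (tolower ((unsigned char) *s) != *lit)
      return false;
  *p = s;
  return true;
}

/* Consume a one- or two-digit decimal field whose value is <= MAX.  */
static bool
eat_num (const char **p, unsigned max)
{
  const char *s = *p;
  unsigned val = 0, ndigits = 0;
  for (; isdigit ((unsigned char) *s) && ndigits < 2; ndigits++)
    val = val * 10 + (unsigned) (*s++ - '0');
  if (ndigits == 0 || val > max || (ndigits == 2 && **p == '0'))
    return false;
  *p = s;
  return true;
}

/* Step 1, encoded form: S[0-3]_[0-7]_C[0-15]_C[0-15]_[0-7].  */
static bool
is_encoded_sysreg (const char *name)
{
  const char *p = name;
  return eat (&p, "s") && eat_num (&p, 3) && eat (&p, "_") && eat_num (&p, 7)
	 && eat (&p, "_c") && eat_num (&p, 15)
	 && eat (&p, "_c") && eat_num (&p, 15)
	 && eat (&p, "_") && eat_num (&p, 7) && *p == '\0';
}

/* Steps 1 and 2 combined.  ENABLED is the set of enabled feature bits.  */
static enum sysreg_status
check_sysreg (const char *name, bool is_write, unsigned long long enabled)
{
  if (is_encoded_sysreg (name))
    return SYSREG_OK;		/* Step 2 is skipped for encodings.  */
  int idx = match_reg (name);
  if (idx < 0)
    return SYSREG_UNKNOWN;
  const struct sysreg *reg = &demo_regs[idx];
  if (reg->reqs != 0 && (reg->reqs & enabled) == 0)
    return SYSREG_UNAVAILABLE;
  if (is_write ? (reg->flags & F_REG_READ) : (reg->flags & F_REG_WRITE))
    return SYSREG_BAD_ACCESS;
  return SYSREG_OK;
}
```

In this sketch a write to the read-only "demo_ro_el1" is rejected, while
a write through the encoded form `S3_0_C4_C0_0' is accepted without any
further checks, mirroring the NOTE above.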

gcc/ChangeLog:

* gcc/config/aarch64/aarch64-protos.h (aarch64_valid_sysreg_name_p): 
New.
(aarch64_retrieve_sysreg): Likewise.
* gcc/config/aarch64/aarch64.cc (match_reg): Likewise.
(is_implem_def_reg): Likewise.
(aarch64_valid_sysreg_name_p): Likewise.
(aarch64_retrieve_sysreg): Likewise.
(aarch64_sysreg_valid_for_rw_p): Likewise.
* gcc/config/aarch64/predicates.md (aarch64_sysreg_string): New.
---
  gcc/config/aarch64/aarch64-protos.h |   2 +
  gcc/config/aarch64/aarch64.cc   | 121 
  gcc/config/aarch64/predicates.md|   4 +
  3 files changed, 127 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 60a55f4bc19..a134e2fcf8e 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -830,6 +830,8 @@ bool aarch64_simd_shift_imm_p (rtx, machine_mode, bool);
  bool aarch64_sve_ptrue_svpattern_p (rtx, struct simd_immediate_info *);
  bool aarch64_simd_valid_immediate (rtx, struct simd_immediate_info *,
enum simd_immediate_check w = AARCH64_CHECK_MOV);
+bool aarch64_valid_sysreg_name_p (const char *);
+const char *aarch64_retrieve_sysreg (char *, bool);
  rtx aarch64_check_zero_based_sve_index_immediate (rtx);
  bool aarch64_sve_index_immediate_p (rtx);
  bool aarch64_sve_arith_immediate_p (machine_mode, rtx, bool);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 030b39ded1a..dd5ac1cbc8d 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -28070,6 +28070,127 @@ aarch64_pars_overlap_p (rtx par1, rtx par2)
return false;
  }
  
+/* Binary search of a user-supplied system register name against

+   a database of known register names.  Upon match the index of
+   hit in database is returned, else return -1.  */


Given that we expect the number of explicit sysregs in a single 
compilation unit to be small, this is probably OK.  An alternative would 
be to build a hashmap of the register names the first time this routine 
is called and then do a lookup in that.  That would also avoid the need 
for the list to be maintained alphabetically.
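
To make the alternative concrete, here is a rough sketch of a name
lookup through a table built on first use.  It is plain C with a small
open-addressing table and an FNV-1a hash; an actual patch would
presumably use GCC's own hash-table utilities instead.  The names
demo_names, build_table and lookup_sysreg are made up for the example,
and the list is deliberately left unsorted to show that ordering no
longer matters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Register names in no particular order.  */
static const char *const demo_names[] = {
  "spsr_el1", "actlr_el1", "afsr0_el1", "amcgcr_el0",
};
#define N_NAMES (sizeof demo_names / sizeof demo_names[0])
#define N_SLOTS 16		/* Power of two, comfortably above N_NAMES.  */

static int slots[N_SLOTS];	/* Holds index + 1; 0 marks an empty slot.  */
static int table_built;

/* 32-bit FNV-1a string hash.  */
static uint32_t
fnv1a (const char *s)
{
  uint32_t h = 2166136261u;
  for (; *s; s++)
    {
      h ^= (unsigned char) *s;
      h *= 16777619u;
    }
  return h;
}

/* Populate the table once, using linear probing on collisions.  */
static void
build_table (void)
{
  for (size_t i = 0; i < N_NAMES; i++)
    {
      uint32_t slot = fnv1a (demo_names[i]) & (N_SLOTS - 1);
      while (slots[slot] != 0)
	slot = (slot + 1) & (N_SLOTS - 1);
      slots[slot] = (int) i + 1;
    }
  table_built = 1;
}

/* Return the index of NAME in demo_names, or -1 if it is not present.  */
static int
lookup_sysreg (const char *name)
{
  if (!table_built)
    build_table ();
  uint32_t slot = fnv1a (name) & (N_SLOTS - 1);
  while (slots[slot] != 0)
    {
      int idx = slots[slot] - 1;
      if (strcmp (demo_names[idx], name) == 0)
	return idx;
      slot = (slot + 1) & (N_SLOTS - 1);
    }
  return -1;
}
```

The probe loops terminate because the table is never full; a real
implementation would have to size or grow the table accordingly.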



+int
+match_reg (const char *ref, const char *database[], int db_len)
+{
+  /* Check for named system registers.  */
+  int imin = 0, imax = db_len - 1, mid, cmp_res;
+  while (imin <= imax)
+{
+  mid = (imin + imax) / 2;
+
+  cmp_res = strcmp (ref, database[mid]);
+  if (cmp_res == 0)
+   return mid;
+  else if (cmp_res > 0)
+   imin = mid+1;
+  else
+   imax = mid-1;
+}
+  return -1;
+}
+
+/* Parse an implementation-defined system register name of
+   the form S[0-3]_[0-7]_C[0-15]_C[0-15]_[0-7].
+   Return true if name matched against above pattern, false
+   otherwise.  */


Another advantage of using a hash map above would be that we could then 
add registers matched by this routine to the map and therefore optimize 
rescanning for them (on the basis that if they are used once, there's a 
good chance of them being used again).



+bool
+is_implem_def_reg (const char *regname)
+{
+/* Check for implementation-defined system registers.  */
+  int name_len = strlen (regname);
+  if (name_len < 12 || name_len > 14)
+return false;
+
+  in

Re: [PATCH 4/6] aarch64: Add basic target_print_operand support for CONST_STRING

2023-10-05 Thread Richard Earnshaw




On 03/10/2023 16:18, Victor Do Nascimento wrote:

Motivated by the need to print system register names in output
assembly, this patch adds the required logic to
`aarch64_print_operand' to accept rtxs of type CONST_STRING and
process these accordingly.

Consequently, an rtx such as:

   (set (reg/i:DI 0 x0)
  (unspec:DI [(const_string ("amcgcr_el0"))])

can now be output correctly using the following output pattern when
composing `define_insn's:

   "mrs\t%x0, %1"

gcc/ChangeLog

* gcc/config/aarch64/aarch64.cc (aarch64_print_operand): Add
support for CONST_STRING.
---
  gcc/config/aarch64/aarch64.cc | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index dd5ac1cbc8d..d6dd0586ac1 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -12400,6 +12400,12 @@ aarch64_print_operand (FILE *f, rtx x, int code)
  
switch (GET_CODE (x))

{
+   case CONST_STRING:
+ {
+   const char *output_op = XSTR (x, 0);
+   asm_fprintf (f, "%s", output_op);
+   break;
+ }
case REG:
  if (aarch64_sve_data_mode_p (GET_MODE (x)))
{


Didn't we discuss (off list) always printing out the generic register 
names, so that there was less dependency on having a specific assembler 
version that knows about newer sysregs?


R.


Re: [PATCH 5/6] aarch64: Implement system register r/w arm ACLE intrinsic functions

2023-10-05 Thread Richard Earnshaw




On 03/10/2023 16:18, Victor Do Nascimento wrote:

Implement the aarch64 intrinsics for reading and writing system
registers with the following signatures:

uint32_t __arm_rsr(const char *special_register);
uint64_t __arm_rsr64(const char *special_register);
void* __arm_rsrp(const char *special_register);
float __arm_rsrf(const char *special_register);
double __arm_rsrf64(const char *special_register);
void __arm_wsr(const char *special_register, uint32_t value);
void __arm_wsr64(const char *special_register, uint64_t value);
void __arm_wsrp(const char *special_register, const void *value);
void __arm_wsrf(const char *special_register, float value);
void __arm_wsrf64(const char *special_register, double value);

gcc/ChangeLog:

* gcc/config/aarch64/aarch64-builtins.cc (enum aarch64_builtins):
Add enums for new builtins.
(aarch64_init_rwsr_builtins): New.
(aarch64_general_init_builtins): Call aarch64_init_rwsr_builtins.
(aarch64_expand_rwsr_builtin):  New.
(aarch64_general_expand_builtin): Call aarch64_expand_rwsr_builtin.
* gcc/config/aarch64/aarch64.md (read_sysregdi): New insn_and_split.
(write_sysregdi): Likewise.
* gcc/config/aarch64/arm_acle.h (__arm_rsr): New.
(__arm_rsrp): Likewise.
(__arm_rsr64): Likewise.
(__arm_rsrf): Likewise.
(__arm_rsrf64): Likewise.
(__arm_wsr): Likewise.
(__arm_wsrp): Likewise.
(__arm_wsr64): Likewise.
(__arm_wsrf): Likewise.
(__arm_wsrf64): Likewise.

gcc/testsuite/ChangeLog:

* gcc/testsuite/gcc.target/aarch64/acle/rwsr.c: New.
* gcc/testsuite/gcc.target/aarch64/acle/rwsr-1.c: Likewise.
---
  gcc/config/aarch64/aarch64-builtins.cc| 200 ++
  gcc/config/aarch64/aarch64.md |  17 ++
  gcc/config/aarch64/arm_acle.h |  30 +++
  .../gcc.target/aarch64/acle/rwsr-1.c  |  20 ++
  gcc/testsuite/gcc.target/aarch64/acle/rwsr.c  | 144 +
  5 files changed, 411 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-1.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
index 04f59fd9a54..d8bb2a989a5 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -808,6 +808,17 @@ enum aarch64_builtins
AARCH64_RBIT,
AARCH64_RBITL,
AARCH64_RBITLL,
+  /* System register builtins.  */
+  AARCH64_RSR,
+  AARCH64_RSRP,
+  AARCH64_RSR64,
+  AARCH64_RSRF,
+  AARCH64_RSRF64,
+  AARCH64_WSR,
+  AARCH64_WSRP,
+  AARCH64_WSR64,
+  AARCH64_WSRF,
+  AARCH64_WSRF64,
AARCH64_BUILTIN_MAX
  };
  
@@ -1798,6 +1809,65 @@ aarch64_init_rng_builtins (void)

   AARCH64_BUILTIN_RNG_RNDRRS);
  }
  
+/* Add builtins for reading system register.  */

+static void
+aarch64_init_rwsr_builtins (void)
+{
+  tree fntype = NULL;
+  tree const_char_ptr_type
+= build_pointer_type (build_type_variant (char_type_node, true, false));
+
+#define AARCH64_INIT_RWSR_BUILTINS_DECL(F, N, T) \
+  aarch64_builtin_decls[AARCH64_##F] \
+= aarch64_general_add_builtin ("__builtin_aarch64_"#N, T, AARCH64_##F);
+
+  fntype
+= build_function_type_list (uint32_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSR, rsr, fntype);
+
+  fntype
+= build_function_type_list (ptr_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRP, rsrp, fntype);
+
+  fntype
+= build_function_type_list (uint64_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSR64, rsr64, fntype);
+
+  fntype
+= build_function_type_list (float_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRF, rsrf, fntype);
+
+  fntype
+= build_function_type_list (double_type_node, const_char_ptr_type, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRF64, rsrf64, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   uint32_type_node, NULL);
+
+  AARCH64_INIT_RWSR_BUILTINS_DECL (WSR, wsr, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   const_ptr_type_node, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (WSRP, wsrp, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   uint64_type_node, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (WSR64, wsr64, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+   float_type_node, NULL);
+  AARCH64_INIT_RWSR_BUILTINS_DECL (WSRF, wsrf, fntype);
+
+  fntype
+= build_function_type_list (void_type_node, const_char_ptr_type,
+ 

Re: [PATCH 6/6] aarch64: Add front-end argument type checking for target builtins

2023-10-05 Thread Richard Earnshaw




On 03/10/2023 16:18, Victor Do Nascimento wrote:

In implementing the ACLE read/write system register builtins it was
observed that leaving argument type checking to be done at expand-time
meant that poorly-formed function calls were being "fixed" by certain
optimization passes, meaning bad code wasn't being properly picked up
in checking.

Example:

   const char *regname = "amcgcr_el0";
   long long a = __builtin_aarch64_rsr64 (regname);

is reduced by the ccp1 pass to

   long long a = __builtin_aarch64_rsr64 ("amcgcr_el0");

As these functions require an argument of STRING_CST type, there needs
to be a check carried out by the front-end capable of picking this up.

The introduced `check_general_builtin_call' function will be called by
the TARGET_CHECK_BUILTIN_CALL hook whenever a call to a builtin
belonging to the AARCH64_BUILTIN_GENERAL category is encountered,
carrying out any appropriate checks associated with a particular
builtin function code.


Doesn't this prevent reasonable wrapping of the __builtin... names with 
something more palatable?  Eg:


static inline __attribute__((always_inline)) long long get_sysreg_ll
(const char *regname)

{
  return __builtin_aarch64_rsr64 (regname);
}

...
  long long x = get_sysreg_ll("amcgcr_el0");
...

?

R.



gcc/ChangeLog:

* gcc/config/aarch64/aarch64-builtins.cc (check_general_builtin_call):
New.
* gcc/config/aarch64/aarch64-c.cc (aarch64_check_builtin_call):
Add check_general_builtin_call call.
* gcc/config/aarch64/aarch64-protos.h (check_general_builtin_call):
New.

gcc/testsuite/ChangeLog:

* gcc/testsuite/gcc.target/aarch64/acle/rwsr-2.c: New.
---
  gcc/config/aarch64/aarch64-builtins.cc| 33 +++
  gcc/config/aarch64/aarch64-c.cc   |  4 +--
  gcc/config/aarch64/aarch64-protos.h   |  3 ++
  .../gcc.target/aarch64/acle/rwsr-2.c  | 15 +
  4 files changed, 53 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-2.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
index d8bb2a989a5..6734361f4f4 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -2126,6 +2126,39 @@ aarch64_general_builtin_decl (unsigned code, bool)
return aarch64_builtin_decls[code];
  }
  
+bool

+check_general_builtin_call (location_t location, vec<location_t>,
+   unsigned int code, tree fndecl,
+   unsigned int nargs ATTRIBUTE_UNUSED, tree *args)
+{
+  switch (code)
+{
+case AARCH64_RSR:
+case AARCH64_RSRP:
+case AARCH64_RSR64:
+case AARCH64_RSRF:
+case AARCH64_RSRF64:
+case AARCH64_WSR:
+case AARCH64_WSRP:
+case AARCH64_WSR64:
+case AARCH64_WSRF:
+case AARCH64_WSRF64:
+  if (TREE_CODE (args[0]) == VAR_DECL
+ || TREE_CODE (TREE_TYPE (args[0])) != POINTER_TYPE
+ || TREE_CODE (TREE_OPERAND (TREE_OPERAND (args[0], 0) , 0))
+ != STRING_CST)
+   {
+ const char  *fn_name, *err_msg;
+ fn_name = IDENTIFIER_POINTER (DECL_NAME (fndecl));
+ err_msg = "first argument to %<%s%> must be a string literal";
+ error_at (location, err_msg, fn_name);
+ return false;
+   }
+}
+  /* Default behavior.  */
+  return true;
+}
+
  typedef enum
  {
SIMD_ARG_COPY_TO_REG,
diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index 578ec6f45b0..6e2b83b8308 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -338,8 +338,8 @@ aarch64_check_builtin_call (location_t loc, vec<location_t> arg_loc,
switch (code & AARCH64_BUILTIN_CLASS)
  {
  case AARCH64_BUILTIN_GENERAL:
-  return true;
-
+  return check_general_builtin_call (loc, arg_loc, subcode, orig_fndecl,
+nargs, args);
  case AARCH64_BUILTIN_SVE:
return aarch64_sve::check_builtin_call (loc, arg_loc, subcode,
  orig_fndecl, nargs, args);
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index a134e2fcf8e..9ef96ff511f 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -990,6 +990,9 @@ tree aarch64_general_builtin_rsqrt (unsigned int);
  void handle_arm_acle_h (void);
  void handle_arm_neon_h (void);
  
+bool check_general_builtin_call (location_t, vec<location_t>, unsigned int,

+ tree, unsigned int, tree *);
+
  namespace aarch64_sve {
void init_builtins ();
void handle_arm_sve_h ();
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/rwsr-2.c b/gcc/testsuite/gcc.target/aarch64/acle/rwsr-2.c
new file mode 100644
index 000..72e5fb75b21
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/acle/rwsr-2.c
@@ -0,0 +1,15 @@
+/* Test the __arm_[r,w]sr ACLE intrinsics fami

Re: [PATCH 4/6] aarch64: Add basic target_print_operand support for CONST_STRING

2023-10-05 Thread Richard Earnshaw




On 05/10/2023 13:26, Richard Earnshaw wrote:



On 03/10/2023 16:18, Victor Do Nascimento wrote:

Motivated by the need to print system register names in output
assembly, this patch adds the required logic to
`aarch64_print_operand' to accept rtxs of type CONST_STRING and
process these accordingly.

Consequently, an rtx such as:

   (set (reg/i:DI 0 x0)
  (unspec:DI [(const_string ("amcgcr_el0"))])

can now be output correctly using the following output pattern when
composing `define_insn's:

   "mrs\t%x0, %1"

gcc/ChangeLog

* gcc/config/aarch64/aarch64.cc (aarch64_print_operand): Add
support for CONST_STRING.
---
  gcc/config/aarch64/aarch64.cc | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index dd5ac1cbc8d..d6dd0586ac1 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -12400,6 +12400,12 @@ aarch64_print_operand (FILE *f, rtx x, int code)
    switch (GET_CODE (x))
  {
+    case CONST_STRING:
+  {
+    const char *output_op = XSTR (x, 0);
+    asm_fprintf (f, "%s", output_op);
+    break;
+  }
  case REG:
    if (aarch64_sve_data_mode_p (GET_MODE (x)))
  {


Didn't we discuss (off list) always printing out the generic register 
names, so that there was less dependency on having a specific assembler 
version that knows about newer sysregs?




You can ignore this.  I've just seen that the tests show that is happening.

Reviewed-by: rearn...@arm.com


R.


Re: [PATCH 1/2] arm: Use deltas for Arm switch tables

2023-10-19 Thread Richard Earnshaw




On 28/09/2023 14:26, Richard Ball wrote:

For normal optimization for the Arm state in gcc we get an uncompressed
table of jump targets.  This sits in the middle of the text segment and
is far larger than necessary, especially at -Os.
This patch compresses the table to use deltas in a similar manner to
Thumb code generation.
Similar code is also used for -fpic where we currently generate a jump
to a jump. In this format the jumps are too dense for the hardware branch
predictor to handle accurately, so execution is likely to be very expensive.

Changes to switch statements for arm include a new function to handle the
assembly generation for different machine modes. This allows for more
optimisation to be performed in aout.h where arm has switched from using
ASM_OUTPUT_ADDR_VEC_ELT to using ASM_OUTPUT_ADDR_DIFF_ELT.
In ASM_OUTPUT_ADDR_DIFF_ELT new assembly generation options have been
added to utilise the different machine modes. Additional changes
made to the casesi expand and insn, CASE_VECTOR_PC_RELATIVE,
CASE_VECTOR_SHORTEN_MODE and LABEL_ALIGN_AFTER_BARRIER are all
to accommodate this new approach to switch statement generation.
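
As a rough illustration of why a delta table is smaller than a table of
absolute addresses, the sketch below picks the narrowest entry width
that can hold every offset from a common base.  It is not the logic in
arm.h: the function names are invented, the 1/2/4-byte thresholds are
illustrative only, and it assumes offsets are stored in units of 4-byte
Arm-state instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of TARGET from BASE in instruction words.  Arm-state
   instructions are 4 bytes, so the low two bits carry no information.  */
static uint32_t
word_delta (uint32_t base, uint32_t target)
{
  assert (target >= base && (target - base) % 4 == 0);
  return (target - base) / 4;
}

/* Smallest entry width, in bytes, that can hold every delta.
   The thresholds are illustrative, not those used by the port.  */
static unsigned
delta_entry_bytes (uint32_t base, const uint32_t *targets, size_t n)
{
  uint32_t max_delta = 0;
  for (size_t i = 0; i < n; i++)
    {
      uint32_t d = word_delta (base, targets[i]);
      if (d > max_delta)
	max_delta = d;
    }
  if (max_delta <= UINT8_MAX)
    return 1;
  if (max_delta <= UINT16_MAX)
    return 2;
  return 4;
}

/* Recover an absolute target from a stored delta.  */
static uint32_t
resolve_target (uint32_t base, uint32_t delta)
{
  return base + delta * 4;
}
```

With three targets within 255 instructions of the base, each entry fits
in one byte, so the table takes 3 bytes of entries instead of the 12
needed for three absolute 32-bit addresses.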

New tests have been added and no regressions on arm-none-eabi.

gcc/ChangeLog:

* config/arm/aout.h (ASM_OUTPUT_ADDR_DIFF_ELT): Add table output
for different machine modes for arm.
* config/arm/arm-protos.h (arm_output_casesi): New prototype.
* config/arm/arm.h (CASE_VECTOR_PC_RELATIVE): Make arm use
ASM_OUTPUT_ADDR_DIFF_ELT.
(CASE_VECTOR_SHORTEN_MODE): Change table size calculation for
TARGET_ARM.
(LABEL_ALIGN_AFTER_BARRIER): Change to accommodate .p2align 2
for TARGET_ARM.
* config/arm/arm.cc (arm_output_casesi): New function.
* config/arm/arm.md (arm_casesi_internal): Change casesi expand
and insn for arm to use new function arm_output_casesi.

gcc/testsuite/ChangeLog:

* gcc.target/arm/arm-switchstatement.c: New test.


#define CASE_VECTOR_PC_RELATIVE ((TARGET_ARM || TARGET_THUMB2   \
  || (TARGET_THUMB1 \
  && (optimize_size || flag_pic)))  \
 && (!target_pure_code))

A minor nit for future reference: (TARGET_ARM || TARGET_THUMB2) is 
normally written as TARGET_32BIT.  No need to fix this as the next patch 
will rewrite this macro again anyway.


This is OK.

Reviewed-by: rearn...@arm.com

R.


Re: [PATCH 2/2] arm: move the switch tables for Arm to the RO data section.

2023-10-19 Thread Richard Earnshaw




On 28/09/2023 14:29, Richard Ball wrote:

Follow up patch to arm: Use deltas for Arm switch tables
This patch moves the switch tables for Arm from the .text section
into the .rodata section.

gcc/ChangeLog:

* config/arm/aout.h: Change to use the Lrtx label.
* config/arm/arm.h (CASE_VECTOR_PC_RELATIVE): Remove arm targets
 from (!target_pure_code) condition.
 (ADDR_VEC_ALIGN): Add align for tables in rodata section.
* config/arm/arm.cc (arm_output_casesi): Alter the function to include
 .Lrtx label and remove adr instructions.
* config/arm/arm.md
 (arm_casesi_internal): Use force_reg to generate ldr instructions that
 would otherwise be out of range, and change rtl to accommodate
 force reg.
 Additionally remove unnecessary register temp.
 (casesi): Remove pure code check for Arm.
* config/arm/elf.h (JUMP_TABLES_IN_TEXT_SECTION): Remove arm
 targets from JUMP_TABLES_IN_TEXT_SECTION definition.

gcc/testsuite/ChangeLog:

* gcc.target/arm/arm-switchstatement.c: Alter the tests to
 change adr instruction to ldr.


This all looks pretty good, but there are some minor niggles to sort out 
before it can go in...


arm.cc:

 arm_output_casesi (rtx *operands)
 {
+  char buf[100];

buf is unused, so this breaks a native bootstrap.

  output_asm_insn ("add\t%|pc, %|pc, %4, lsl #2", operands);;

Two semicolons at the end of the line.

+  else
+   {
+ output_asm_insn ("ldr\t%|pc, [%5, %0, lsl #2]", operands);
+   }

Our normal coding style is to omit the braces for a single statement in 
an 'if/else' clause, even if the other arm of the clause uses braces, so:


  else
output_asm_insn ("ldr\t%|pc, [%5, %0, lsl #2]", operands);

+output_asm_insn ("nop;", operands);

Stray semicolon after the "nop".

#define CASE_VECTOR_PC_RELATIVE (TARGET_ARM || ((TARGET_THUMB2  \
  || (TARGET_THUMB1 \
  && (optimize_size || flag_pic)))  \
 && (!target_pure_code)))

The indentation here is incorrect, which makes it very hard to 
understand the logic.  But I think a bit of reordering would help 
clarify things as well..


#define CASE_VECTOR_PC_RELATIVE
  (TARGET_ARM
   || (!target_pure_code
       && (TARGET_THUMB2
           || (TARGET_THUMB1 && (optimize_size || flag_pic)))))

(obviously with the line escapes added back in)
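
The reordering is meant to be purely cosmetic, which is easy to confirm
by brute force over all 64 input combinations.  The sketch below models
the two forms of the condition as plain C functions; the parameter
names stand in for the GCC macros and variables, and the function names
are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* The condition as posted in the patch.  */
static bool
as_posted (bool arm, bool thumb2, bool thumb1, bool osize, bool pic,
	   bool pure_code)
{
  return arm || ((thumb2 || (thumb1 && (osize || pic))) && !pure_code);
}

/* The condition with the pure-code test hoisted to the front.  */
static bool
reordered (bool arm, bool thumb2, bool thumb1, bool osize, bool pic,
	   bool pure_code)
{
  return arm || (!pure_code && (thumb2 || (thumb1 && (osize || pic))));
}

/* Count the input combinations on which the two forms disagree.  */
static int
count_mismatches (void)
{
  int mismatches = 0;
  for (unsigned bits = 0; bits < 64; bits++)
    {
      bool arm = bits & 1, thumb2 = bits & 2, thumb1 = bits & 4;
      bool osize = bits & 8, pic = bits & 16, pure_code = bits & 32;
      if (as_posted (arm, thumb2, thumb1, osize, pic, pure_code)
	  != reordered (arm, thumb2, thumb1, osize, pic, pure_code))
	mismatches++;
    }
  return mismatches;
}
```

Since && is commutative on side-effect-free operands, count_mismatches
returns 0; only the layout changes.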

arm.md (casesi):

  "TARGET_ARM || ((TARGET_THUMB2 || optimize_size || flag_pic) &&
   ^^
operators should be at the start of the following line, not the end of 
the previous one.

  (!target_pure_code))"

So:

  "TARGET_ARM || ((TARGET_THUMB2 || optimize_size || flag_pic)
  && (!target_pure_code))"

But I think this could be laid out better as well:

  "(TARGET_ARM
    || (!target_pure_code
        && (TARGET_THUMB2 || optimize_size || flag_pic)))"

arm_casesi_internal:

  rtx tmp = force_reg (SImode, gen_rtx_LABEL_REF (SImode, operands[2]));

Tmp is not generally a good choice of name, even for short fragments 
like this.  Use something more descriptive to the object it holds, like 
"lref"; or, better still, a name that describes what the label points to 
(vec_table_ref?).


elf.h:

/* We put Thumb-2 jump tables in the text section, because it makes
   the code more efficient, but for Thumb-1 and ARM it's better to put
   them out of band unless we are generating compressed tables.  */

This comment is misleading now, as it implies that compressed tables for 
arm are still sometimes placed in the text segment (the unless clause) 
and that's not true.


R.


[PATCH 0/7] Mitigation against unsafe data speculation (CVE-2017-5753)

2018-07-09 Thread Richard Earnshaw

The patches I posted earlier this year for mitigating against
CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from
which it became obvious that a rethink was needed.  This mail, and the
following patches attempt to address that feedback and present a new
approach to mitigating against this form of attack surface.

There were two major issues with the original approach:

- The speculation bounds were too tightly constrained - essentially
  they had to represent an upper and lower bound on a pointer, or a
  pointer offset.
- The speculation constraints could only cover the immediately preceding
  branch, which often did not fit well with the structure of the existing
  code.

An additional criticism was that the shape of the intrinsic did not
fit particularly well with systems that used a single speculation
barrier that essentially had to wait until all preceding speculation
had been resolved.

To address all of the above, these patches adopt a new approach, based
in part on a posting by Chandler Carruth to the LLVM developers list
(https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html),
but which we have extended to deal with inter-function speculation.
The patches divide the problem into two halves.

The first half is some target-specific code to track the speculation
condition through the generated code to provide an internal variable
which can tell us whether or not the CPU's control flow speculation
matches the data flow calculations.  The idea is that the internal
variable starts with the value TRUE and if the CPU's control flow
speculation ever causes a jump to the wrong block of code the variable
becomes false until such time as the incorrect control flow
speculation gets unwound.
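
To make the idea concrete, here is a minimal C model of such a tracking
variable.  It is only a sketch under stated assumptions: the type and function
names are invented for this illustration, and representing TRUE as all-ones
and false as zero is an assumption of the model, not something stated in this
mail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the tracking variable: all-ones ("TRUE") while
   the path being executed agrees with the data, zero ("false") once the
   CPU has been steered down a path the data does not justify.  */
typedef uint64_t spec_tracker_t;

#define SPEC_TRACKER_INIT UINT64_MAX

/* Update the tracker on the edge of a conditional branch.
   EDGE_CONDITION_HOLDS is the condition guarding that edge, recomputed
   from the data rather than taken from the branch predictor.  */
spec_tracker_t
spec_tracker_on_edge (spec_tracker_t tracker, int edge_condition_holds)
{
  return edge_condition_holds ? tracker : 0;
}

/* Usage: one correct edge, one mis-speculated edge, one more edge.  */
void
spec_tracker_demo (void)
{
  spec_tracker_t t = SPEC_TRACKER_INIT;

  t = spec_tracker_on_edge (t, 1);	/* Prediction matched the data.  */
  assert (t == SPEC_TRACKER_INIT);

  t = spec_tracker_on_edge (t, 0);	/* Jumped to the wrong block.  */
  assert (t == 0);

  t = spec_tracker_on_edge (t, 1);	/* Stays false on later edges.  */
  assert (t == 0);
}
```

Nothing in the model resets the value.  On real hardware the recovery comes
from the mis-speculation being unwound, which discards the speculative state
along with everything else computed on the wrong path.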

The second half is that a new intrinsic function is introduced that is
much simpler than we had before.  The basic version of the intrinsic
is now simply:

  T var = __builtin_speculation_safe_value (T unsafe_var);

Full details of the syntax can be found in the documentation patch, in
patch 1.  In summary, when not speculating the intrinsic returns
unsafe_var; when speculating then if it can be shown that the
speculative flow has diverged from the intended control flow then zero
is returned.  An optional second argument can be used to return an
alternative value to zero.  The builtin may cause execution to pause
until the speculation state is resolved.
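
A typical use is guarding an array access that follows a bounds check.  The
sketch below is illustrative only: the table, the function names and the
SPEC_SAFE wrapper are invented for the example, and the wrapper falls back to
a plain copy (no mitigation) on compilers that do not announce the builtin
through __HAVE_SPECULATION_SAFE_VALUE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper so the example still builds where the builtin is
   not available; in that case no mitigation is applied.  */
#ifdef __HAVE_SPECULATION_SAFE_VALUE
# define SPEC_SAFE(v) __builtin_speculation_safe_value (v)
#else
# define SPEC_SAFE(v) (v)
#endif

#define TABLE_SIZE 16

int table[TABLE_SIZE] = { 0, 10, 20, 30 };

/* Look up an entry using an index that comes from an untrusted source.
   Architecturally the bounds check is enough; the builtin covers the
   case where the CPU speculates past the check with a bad index.  */
int
table_lookup (size_t untrusted_index)
{
  if (untrusted_index < TABLE_SIZE)
    return table[SPEC_SAFE (untrusted_index)];
  return -1;
}

/* Usage: the architectural results are unchanged by the builtin.  */
void
table_lookup_demo (void)
{
  assert (table_lookup (3) == 30);
  assert (table_lookup (100) == -1);
}
```

When execution is not speculating, the builtin simply returns its argument, so
the visible behaviour of table_lookup is the same with or without it.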

There are seven patches in this set, as follows.

1) Introduces the new intrinsic __builtin_speculation_safe_value.
2) Adds a basic hard barrier implementation for AArch32 (arm) state.
3) Adds a basic hard barrier implementation for AArch64 state.
4) Adds a new command-line option -mtrack-speculation (currently a no-op).
5) Disables CB[N]Z and TB[N]Z when -mtrack-speculation is in use.
6) Adds the new speculation tracking pass for AArch64.
7) Uses the new speculation tracking pass to generate CSDB-based barrier
   sequences.

I haven't added a speculation-tracking pass for AArch32 at this time.
It is possible to do this, but would require quite a lot of rework for
the arm backend due to the limited number of registers that are
available.

Although patch 6 is AArch64 specific, I'd appreciate a review from
someone more familiar with the branch edge code than myself.  There
appear to be a number of tricky issues with more complex edges so I'd
like a second opinion on that code in case I've missed an important
case.

R.

  

Richard Earnshaw (7):
  Add __builtin_speculation_safe_value
  Arm - add speculation_barrier pattern
  AArch64 - add speculation barrier
  AArch64 - Add new option -mtrack-speculation
  AArch64 - disable CB[N]Z TB[N]Z when tracking speculation
  AArch64 - new pass to add conditional-branch speculation tracking
  AArch64 - use CSDB based sequences if speculation tracking is enabled

 gcc/builtin-types.def |   6 +
 gcc/builtins.c|  57 
 gcc/builtins.def  |  20 ++
 gcc/c-family/c-common.c   | 143 +
 gcc/c-family/c-cppbuiltin.c   |   5 +-
 gcc/config.gcc|   2 +-
 gcc/config/aarch64/aarch64-passes.def |   1 +
 gcc/config/aarch64/aarch64-protos.h   |   3 +-
 gcc/config/aarch64/aarch64-speculation.cc | 494 ++
 gcc/config/aarch64/aarch64.c  |  88 +-
 gcc/config/aarch64/aarch64.md | 140 -
 gcc/config/aarch64/aarch64.opt|   4 +
 gcc/config/aarch64/iterators.md   |   3 +
 gcc/config/aarch64/t-aarch64  |  10 +
 gcc/config/arm/arm.md |  21 ++
 gcc/config/arm/unspecs.md |   1 +
 gcc/doc/cpp.texi  |   4 +
 gcc/doc/extend.texi   |  29 ++
 gcc/doc/invoke.texi   |  10 +-
 gcc/doc/md.texi   |  15 +
 gcc/doc/tm.texi 

[PATCH 2/7] Arm - add speculation_barrier pattern

2018-07-09 Thread Richard Earnshaw

This patch defines a speculation barrier for AArch32.

* config/arm/unspecs.md (unspecv): Add VUNSPEC_SPECULATION_BARRIER.
* config/arm/arm.md (speculation_barrier): New expand.
(speculation_barrier_insn): New pattern.
---
 gcc/config/arm/arm.md | 21 +
 gcc/config/arm/unspecs.md |  1 +
 2 files changed, 22 insertions(+)

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 361a026..ca2a2f5 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -12012,6 +12012,27 @@ (define_insn ""
   [(set_attr "length" "4")
(set_attr "type" "coproc")])
 
+(define_expand "speculation_barrier"
+  [(unspec_volatile [(const_int 0)] VUNSPEC_SPECULATION_BARRIER)]
+  "TARGET_EITHER"
+  "
+/* Don't emit anything for Thumb1 and suppress the warning from the
+   generic expansion.  */
+if (!TARGET_32BIT)
+   DONE;
+  "
+)
+
+;; Generate a hard speculation barrier when we have not enabled speculation
+;; tracking.
+(define_insn "*speculation_barrier_insn"
+  [(unspec_volatile [(const_int 0)] VUNSPEC_SPECULATION_BARRIER)]
+  "TARGET_32BIT"
+  "isb\;dsb\\tsy"
+  [(set_attr "type" "block")
+   (set_attr "length" "8")]
+)
+
 ;; Vector bits common to IWMMXT and Neon
 (include "vec-common.md")
 ;; Load the Intel Wireless Multimedia Extension patterns
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index b05f85e..1941673 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -168,6 +168,7 @@ (define_c_enum "unspecv" [
   VUNSPEC_MCRR2		; Represent the coprocessor mcrr2 instruction.
   VUNSPEC_MRRC		; Represent the coprocessor mrrc instruction.
   VUNSPEC_MRRC2		; Represent the coprocessor mrrc2 instruction.
+  VUNSPEC_SPECULATION_BARRIER ; Represents an unconditional speculation barrier.
 ])
 
 ;; Enumerators for NEON unspecs.


[PATCH 3/7] AArch64 - add speculation barrier

2018-07-09 Thread Richard Earnshaw

Similar to Arm, this adds an unconditional speculation barrier for AArch64.

* config/aarch64/aarch64.md (unspecv): Add UNSPECV_SPECULATION_BARRIER.
(speculation_barrier): New insn.
---
 gcc/config/aarch64/aarch64.md | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a014a01..c135ada 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -205,6 +205,7 @@ (define_c_enum "unspecv" [
 UNSPECV_SET_FPSR		; Represent assign of FPSR content.
 UNSPECV_BLOCKAGE		; Represent a blockage
 UNSPECV_PROBE_STACK_RANGE	; Represent stack range probing.
+UNSPECV_SPECULATION_BARRIER ; Represent speculation barrier.
   ]
 )
 
@@ -6093,6 +6094,15 @@ (define_expand "set_clobber_cc"
 		   (match_operand 1))
 	  (clobber (reg:CC CC_REGNUM))])])
 
+;; Hard speculation barrier.
+(define_insn "speculation_barrier"
+  [(unspec_volatile [(const_int 0)] UNSPECV_SPECULATION_BARRIER)]
+  ""
+  "isb\;dsb\\tsy"
+  [(set_attr "length" "8")
+   (set_attr "type" "block")]
+)
+
 ;; AdvSIMD Stuff
 (include "aarch64-simd.md")
 


[PATCH 4/7] AArch64 - Add new option -mtrack-speculation

2018-07-09 Thread Richard Earnshaw

This patch doesn't do anything useful by itself; it simply adds a new
command-line option -mtrack-speculation to AArch64.  Subsequent patches
build on this.

* config/aarch64/aarch64.opt (mtrack-speculation): New target option.
---
 gcc/config/aarch64/aarch64.opt | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 1426b45..bc9b22a 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -214,3 +214,7 @@ Target RejectNegative Joined Enum(sve_vector_bits) Var(aarch64_sve_vector_bits)
 mverbose-cost-dump
 Common Undocumented Var(flag_aarch64_verbose_cost)
 Enables verbose cost model dumping in the debug dump files.
+
+mtrack-speculation
+Target Var(aarch64_track_speculation)
+Generate code to track when the CPU might be speculating incorrectly.


[PATCH 1/7] Add __builtin_speculation_safe_value

2018-07-09 Thread Richard Earnshaw

This patch defines a new intrinsic function
__builtin_speculation_safe_value.  A generic default implementation is
defined which will attempt to use the backend pattern
"speculation_barrier".  If this pattern is not defined, or if it
is not available, then the compiler will emit a warning, but
compilation will continue.

Note that the test spec-barrier-1.c will currently fail on all
targets.  This is deliberate; the failure will go away when
appropriate action is taken for each target backend.
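
For readers who want to see the two-argument form together with the
__HAVE_SPECULATION_SAFE_VALUE predefine listed in the ChangeLog below, here is
a small sketch.  The slot array, DECOY_SLOT and the SPEC_SAFE_OR wrapper are
all invented for this example; the fallback branch of the wrapper provides no
mitigation and exists only so the code builds with other compilers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper around the two-argument form.  The second
   argument is the value to return when the speculative flow is known
   to have diverged.  */
#ifdef __HAVE_SPECULATION_SAFE_VALUE
# define SPEC_SAFE_OR(v, failval) \
   __builtin_speculation_safe_value ((v), (failval))
#else
# define SPEC_SAFE_OR(v, failval) (v)
#endif

#define NUM_SLOTS 8

/* Illustrative assumption: the last slot never holds sensitive data, so
   it is a harmless place to point a mis-speculated access.  */
#define DECOY_SLOT ((size_t) (NUM_SLOTS - 1))

int slots[NUM_SLOTS] = { 11, 22, 33, 44, 55, 66, 77, 0 };

int
read_slot (size_t untrusted)
{
  if (untrusted >= NUM_SLOTS)
    return -1;

  size_t idx = SPEC_SAFE_OR (untrusted, DECOY_SLOT);
  return slots[idx];
}

/* Usage: in-range reads and the out-of-range error path.  */
void
read_slot_demo (void)
{
  assert (read_slot (2) == 33);
  assert (read_slot (9) == -1);
}
```

Both arguments are given the same type (size_t) here to keep the example free
of any reliance on how the builtin converts a mismatched second argument.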

gcc:
* builtin-types.def (BT_FN_PTR_PTR_VAR): New function type.
(BT_FN_I1_I1_VAR, BT_FN_I2_I2_VAR, BT_FN_I4_I4_VAR): Likewise.
(BT_FN_I8_I8_VAR, BT_FN_I16_I16_VAR): Likewise.
* builtins.def (BUILT_IN_SPECULATION_SAFE_VALUE_N): New builtin.
(BUILT_IN_SPECULATION_SAFE_VALUE_PTR): New internal builtin.
(BUILT_IN_SPECULATION_SAFE_VALUE_1): Likewise.
(BUILT_IN_SPECULATION_SAFE_VALUE_2): Likewise.
(BUILT_IN_SPECULATION_SAFE_VALUE_4): Likewise.
(BUILT_IN_SPECULATION_SAFE_VALUE_8): Likewise.
(BUILT_IN_SPECULATION_SAFE_VALUE_16): Likewise.
* builtins.c (expand_speculation_safe_value): New function.
(expand_builtin): Call it.
* doc/cpp.texi: Document predefine __HAVE_SPECULATION_SAFE_VALUE.
* doc/extend.texi: Document __builtin_speculation_safe_value.
* doc/md.texi: Document "speculation_barrier" pattern.
* doc/tm.texi.in: Pull in TARGET_SPECULATION_SAFE_VALUE.
* doc/tm.texi: Regenerated.
* target.def (speculation_safe_value): New hook.
* targhooks.c (default_speculation_safe_value): New function.
* targhooks.h (default_speculation_safe_value): Add prototype.

c-family:
* c-common.c (speculation_safe_resolve_size): New function.
(speculation_safe_resolve_params): New function.
(speculation_safe_resolve_return): New function.
(resolve_overloaded_builtin): Handle __builtin_speculation_safe_value.
* c-cppbuiltin.c (c_cpp_builtins): Add pre-define for
__HAVE_SPECULATION_SAFE_VALUE.

testsuite:
* gcc.dg/spec-barrier-1.c: New test.
* gcc.dg/spec-barrier-2.c: New test.
* gcc.dg/spec-barrier-3.c: New test.
---
 gcc/builtin-types.def |   6 ++
 gcc/builtins.c|  57 ++
 gcc/builtins.def  |  20 +
 gcc/c-family/c-common.c   | 143 ++
 gcc/c-family/c-cppbuiltin.c   |   5 +-
 gcc/doc/cpp.texi  |   4 +
 gcc/doc/extend.texi   |  29 +++
 gcc/doc/md.texi   |  15 
 gcc/doc/tm.texi   |  20 +
 gcc/doc/tm.texi.in|   2 +
 gcc/target.def|  23 ++
 gcc/targhooks.c   |  27 +++
 gcc/targhooks.h   |   2 +
 gcc/testsuite/gcc.dg/spec-barrier-1.c |  40 ++
 gcc/testsuite/gcc.dg/spec-barrier-2.c |  19 +
 gcc/testsuite/gcc.dg/spec-barrier-3.c |  13 
 16 files changed, 424 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-1.c
 create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-2.c
 create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-3.c

diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index b01095c..70fae35 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -763,6 +763,12 @@ DEF_FUNCTION_TYPE_VAR_1 (BT_FN_VOID_LONG_VAR,
 			 BT_VOID, BT_LONG)
 DEF_FUNCTION_TYPE_VAR_1 (BT_FN_VOID_ULL_VAR,
 			 BT_VOID, BT_ULONGLONG)
+DEF_FUNCTION_TYPE_VAR_1 (BT_FN_PTR_PTR_VAR, BT_PTR, BT_PTR)
+DEF_FUNCTION_TYPE_VAR_1 (BT_FN_I1_I1_VAR, BT_I1, BT_I1)
+DEF_FUNCTION_TYPE_VAR_1 (BT_FN_I2_I2_VAR, BT_I2, BT_I2)
+DEF_FUNCTION_TYPE_VAR_1 (BT_FN_I4_I4_VAR, BT_I4, BT_I4)
+DEF_FUNCTION_TYPE_VAR_1 (BT_FN_I8_I8_VAR, BT_I8, BT_I8)
+DEF_FUNCTION_TYPE_VAR_1 (BT_FN_I16_I16_VAR, BT_I16, BT_I16)
 
 DEF_FUNCTION_TYPE_VAR_2 (BT_FN_INT_FILEPTR_CONST_STRING_VAR,
 			 BT_INT, BT_FILEPTR, BT_CONST_STRING)
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 91658e8..9f97ecf 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -6716,6 +6716,52 @@ expand_builtin_goacc_parlevel_id_size (tree exp, rtx target, int ignore)
   return target;
 }
 
+/* Expand a call to __builtin_speculation_safe_value_<N>.  MODE
+   represents the size of the first argument to that call, or VOIDmode
+   if the argument is a pointer.  IGNORE will be true if the result
+   isn't used.  */
+static rtx
+expand_speculation_safe_value (machine_mode mode, tree exp, rtx target,
+			   bool ignore)
+{
+  rtx val, failsafe;
+  unsigned nargs = call_expr_nargs (exp);
+
+  tree arg0 = CALL_EXPR_ARG (exp, 0);
+
+  if (mode == VOIDmode)
+{
+  mode = TYPE_MODE (TREE_TYPE (arg0));
+  gcc_assert (GET_MODE_CLASS (mode) == MODE_INT);
+}
+
+  val = expand_expr (arg0, NULL_RTX, mode, EXPAND_NORMAL);
+
+  /* An optional second 

[PATCH 7/7] AArch64 - use CSDB based sequences if speculation tracking is enabled

2018-07-09 Thread Richard Earnshaw

In this final patch, now that we can track speculation through conditional
branches, we can use this information to use a less expensive CSDB based
speculation barrier.
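
The selection performed by the new copy pattern in this patch (compare the
tracker with zero, then conditionally select) can be modelled in C as below.
This is a sketch only: the function names are invented, the tracker is
modelled as a plain 64-bit integer, and the CSDB instruction that ends the
real sequence has no C equivalent, so it appears only as a comment.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the value selection: return VAL while the tracker says the
   control flow is still correct (non-zero), otherwise FAILVAL.  The
   real sequence is followed by CSDB, which has no C equivalent and is
   not modelled here.  */
uint64_t
despeculate_copy_model (uint64_t tracker, uint64_t val, uint64_t failval)
{
  return tracker != 0 ? val : failval;
}

/* Usage: the tracker is all-ones on a correct path, zero otherwise.  */
void
despeculate_copy_demo (void)
{
  assert (despeculate_copy_model (UINT64_MAX, 42, 0) == 42);
  assert (despeculate_copy_model (0, 42, 0) == 0);
  assert (despeculate_copy_model (0, 42, 7) == 7);
}
```

Because the choice is made with a compare and a conditional select rather
than a full barrier, execution does not have to stop and wait at every use of
the builtin, which is why this form is expected to be the cheaper one.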

* config/aarch64/iterators.md (ALLI_TI): New iterator.
* config/aarch64/aarch64.md (despeculate_copy): New
expand.
(despeculate_copy_insn): New insn.
(despeculate_copyti_insn): New insn.
(despeculate_simple): New insn
(despeculate_simpleti): New insn.
* config/aarch64/aarch64.c (aarch64_speculation_safe_value): New
function.
(TARGET_SPECULATION_SAFE_VALUE): Redefine to
aarch64_speculation_safe_value.
---
 gcc/config/aarch64/aarch64.c| 42 ++
 gcc/config/aarch64/aarch64.md   | 96 +
 gcc/config/aarch64/iterators.md |  3 ++
 3 files changed, 141 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b11d768..b30b857 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -17648,6 +17648,45 @@ aarch64_select_early_remat_modes (sbitmap modes)
 }
 }
 
+/* Override the default target speculation_safe_value.  */
+static rtx
+aarch64_speculation_safe_value (machine_mode mode,
+rtx result, rtx val, rtx failval)
+{
+  /* Maybe we should warn if falling back to hard barriers.  They are
+ likely to be noticeably more expensive than the alternative below.  */
+  if (!aarch64_track_speculation)
+return default_speculation_safe_value (mode, result, val, failval);
+
+  if (!REG_P (val))
+val = copy_to_mode_reg (mode, val);
+
+  if (!aarch64_reg_or_zero (failval, mode))
+failval = copy_to_mode_reg (mode, failval);
+
+  switch (mode)
+{
+case E_QImode:
+  emit_insn (gen_despeculate_copyqi (result, val, failval));
+  break;
+case E_HImode:
+  emit_insn (gen_despeculate_copyhi (result, val, failval));
+  break;
+case E_SImode:
+  emit_insn (gen_despeculate_copysi (result, val, failval));
+  break;
+case E_DImode:
+  emit_insn (gen_despeculate_copydi (result, val, failval));
+  break;
+case E_TImode:
+  emit_insn (gen_despeculate_copyti (result, val, failval));
+  break;
+default:
+  gcc_unreachable ();
+}
+  return result;
+}
+
 /* Target-specific selftests.  */
 
 #if CHECKING_P
@@ -18117,6 +18156,9 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_SELECT_EARLY_REMAT_MODES
 #define TARGET_SELECT_EARLY_REMAT_MODES aarch64_select_early_remat_modes
 
+#undef TARGET_SPECULATION_SAFE_VALUE
+#define TARGET_SPECULATION_SAFE_VALUE aarch64_speculation_safe_value
+
 #if CHECKING_P
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::aarch64_run_selftests
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 528d03d..cbcada2 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -6129,6 +6129,102 @@ (define_insn "speculation_barrier"
(set_attr "speculation_barrier" "true")]
 )
 
+;; Support for __builtin_speculation_safe_value when we have speculation
+;; tracking enabled.  Use the speculation tracker to decide whether to
+;; copy operand 1 to the target, or to copy the fail value (operand 2).
+(define_expand "despeculate_copy<ALLI_TI:mode>"
+  [(set (match_operand:ALLI_TI 0 "register_operand" "=r")
+	(unspec_volatile:ALLI_TI
+	 [(match_operand:ALLI_TI 1 "register_operand" "r")
+	  (match_operand:ALLI_TI 2 "aarch64_reg_or_zero" "rZ")
+	  (use (reg:DI SPECULATION_TRACKER_REGNUM))
+	  (clobber (reg:CC CC_REGNUM))] UNSPECV_SPECULATION_BARRIER))]
+  ""
+  "
+  {
+if (operands[2] == const0_rtx)
+  {
+	rtx tracker;
+	if (<MODE>mode == TImode)
+	  tracker = gen_rtx_REG (DImode, SPECULATION_TRACKER_REGNUM);
+	else
+	  tracker = gen_rtx_REG (<MODE>mode, SPECULATION_TRACKER_REGNUM);
+
+	emit_insn (gen_despeculate_simple<mode> (operands[0], operands[1],
+		 tracker));
+	DONE;
+  }
+  }
+  "
+)
+
+;; Pattern to match despeculate_copy
+(define_insn "*despeculate_copy_insn"
+  [(set (match_operand:ALLI 0 "register_operand" "=r")
+	(unspec_volatile:ALLI
+	 [(match_operand:ALLI 1 "register_operand" "r")
+	  (match_operand:ALLI 2 "aarch64_reg_or_zero" "rZ")
+	  (use (reg:DI SPECULATION_TRACKER_REGNUM))
+	  (clobber (reg:CC CC_REGNUM))] UNSPECV_SPECULATION_BARRIER))]
+  ""
+  {
+operands[3] = gen_rtx_REG (DImode, SPECULATION_TRACKER_REGNUM);
+output_asm_insn ("cmp\\t%3, #0\;csel\\t%0, %1, %2, ne\;csdb",
+		 operands);
+return "";
+  }
+  [(set_attr "length" "12")
+   (set_attr "type" "block")
+   (set_attr "speculation_barrier" "true")]
+)
+
+;; Pattern to match despeculate_copyti
+(define_insn "*despeculate_copyti_insn"
+  [(set (match_operand:TI 0 "register_operand" "=r")
+	(unspec_volatile:TI
+	 [(match_operand:TI 1 "register_operand" "r")
+	  (match_operand:TI 2 "aarch64_reg_or_zero" "rZ")
+	  (use (reg:DI SPECULATION_TRACKER_REGNUM))
+	  (clobber (reg:CC CC_REGNUM))] UNSPECV_SPECULAT

[PATCH 6/7] AArch64 - new pass to add conditional-branch speculation tracking

2018-07-09 Thread Richard Earnshaw

This patch is the main part of the speculation tracking code.  It adds
a new target-specific pass that is run just before the final branch
reorg pass (so that it can clean up any new edge insertions we make).
The pass is only run when -mtrack-speculation is passed on the command
line.

One thing that did come to light as part of this was that the stack pointer
register was not being permitted in comparison instructions.  We rely on
that for moving the tracking state between SP and the scratch register at
function call boundaries.

* config/aarch64/aarch64-speculation.cc: New file.
* config/aarch64/aarch64-passes.def (pass_track_speculation): Add before
pass_reorder_blocks.
* config/aarch64/aarch64-protos.h (make_pass_track_speculation): Add
prototype.
* config/aarch64/aarch64.c (aarch64_conditional_register_usage): Fix
X14 and X15 when tracking speculation.
* config/aarch64/aarch64.md (register name constants): Add
SPECULATION_TRACKER_REGNUM and SPECULATION_SCRATCH_REGNUM.
(unspec): Add UNSPEC_SPECULATION_TRACKER.
(speculation_barrier): New insn attribute.
(cmp): Allow SP in comparisons.
(speculation_tracker): New insn.
(speculation_barrier): Add speculation_barrier attribute.
* config/aarch64/t-aarch64: Add make rule for aarch64-speculation.o.
* config.gcc (aarch64*-*-*): Add aarch64-speculation.o to extra_objs.
* doc/invoke.texi (AArch64 Options): Document -mtrack-speculation.
---
 gcc/config.gcc|   2 +-
 gcc/config/aarch64/aarch64-passes.def |   1 +
 gcc/config/aarch64/aarch64-protos.h   |   3 +-
 gcc/config/aarch64/aarch64-speculation.cc | 494 ++
 gcc/config/aarch64/aarch64.c  |  13 +
 gcc/config/aarch64/aarch64.md |  30 +-
 gcc/config/aarch64/t-aarch64  |  10 +
 gcc/doc/invoke.texi   |  10 +-
 8 files changed, 558 insertions(+), 5 deletions(-)
 create mode 100644 gcc/config/aarch64/aarch64-speculation.cc

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 78e84c2..b17fdba 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -304,7 +304,7 @@ aarch64*-*-*)
 	extra_headers="arm_fp16.h arm_neon.h arm_acle.h"
 	c_target_objs="aarch64-c.o"
 	cxx_target_objs="aarch64-c.o"
-	extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o"
+	extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o aarch64-speculation.o"
 	target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.c"
 	target_has_targetm_common=yes
 	;;
diff --git a/gcc/config/aarch64/aarch64-passes.def b/gcc/config/aarch64/aarch64-passes.def
index 87747b4..3d6a254 100644
--- a/gcc/config/aarch64/aarch64-passes.def
+++ b/gcc/config/aarch64/aarch64-passes.def
@@ -19,3 +19,4 @@
    <http://www.gnu.org/licenses/>.  */
 
 INSERT_PASS_AFTER (pass_regrename, 1, pass_fma_steering);
+INSERT_PASS_BEFORE (pass_reorder_blocks, 1, pass_track_speculation);
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index bc11a78..e80ffcf 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -554,7 +554,8 @@ enum aarch64_parse_opt_result aarch64_parse_extension (const char *,
 std::string aarch64_get_extension_string_for_isa_flags (unsigned long,
 			unsigned long);
 
-rtl_opt_pass *make_pass_fma_steering (gcc::context *ctxt);
+rtl_opt_pass *make_pass_fma_steering (gcc::context *);
+rtl_opt_pass *make_pass_track_speculation (gcc::context *);
 
 poly_uint64 aarch64_regmode_natural_size (machine_mode);
 
diff --git a/gcc/config/aarch64/aarch64-speculation.cc b/gcc/config/aarch64/aarch64-speculation.cc
new file mode 100644
index 000..2dd06ae
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-speculation.cc
@@ -0,0 +1,494 @@
+/* Speculation tracking and mitigation (e.g. CVE 2017-5753) for AArch64.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   Contributed by ARM Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "target.h"
+#include "rtl.h"
+#include "tree-pass.h"
+#include "profile-count.h"
+#include "cfg.h"
+#include "cfgbuild.h"
+#include "print-rtl.h"
+#include "cfgrtl.h"
+#include "function.h"
+#include "bas

[PATCH 5/7] AArch64 - disable CB[N]Z TB[N]Z when tracking speculation

2018-07-09 Thread Richard Earnshaw

The CB[N]Z and TB[N]Z instructions do not expose the comparison through
the condition code flags.  This makes it impossible to track speculative
execution through such a branch.  We can handle this relatively easily
by simply disabling the patterns in this case.

A side effect of this is that the split patterns for the atomic operations
need to also avoid generating these instructions.  They mostly have simple
fall-backs for this already.

* config/aarch64/aarch64.md (cb1): Disable when
aarch64_track_speculation is true.
(tb1): Likewise.
* config/aarch64/aarch64.c (aarch64_split_compare_regs): Do not
generate CB[N]Z when tracking speculation.
(aarch64_split_compare_and_swap): Likewise.
(aarch64_split_atomic_op): Likewise.
---
 gcc/config/aarch64/aarch64.c  | 33 ++---
 gcc/config/aarch64/aarch64.md |  6 +++---
 2 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 01f35f8..da96afd 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -14465,7 +14465,16 @@ aarch64_split_compare_and_swap (rtx operands[])
 
   if (strong_zero_p)
 {
-  x = gen_rtx_NE (VOIDmode, rval, const0_rtx);
+  if (aarch64_track_speculation)
+	{
+	  /* Emit an explicit compare instruction, so that we can correctly
+	 track the condition codes.  */
+	  rtx cc_reg = aarch64_gen_compare_reg (NE, rval, const0_rtx);
+	  x = gen_rtx_NE (GET_MODE (cc_reg), cc_reg, const0_rtx);
+	}
+  else
+	x = gen_rtx_NE (VOIDmode, rval, const0_rtx);
+
   x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
 gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
@@ -14483,7 +14492,16 @@ aarch64_split_compare_and_swap (rtx operands[])
 
   if (!is_weak)
 {
-  x = gen_rtx_NE (VOIDmode, scratch, const0_rtx);
+  if (aarch64_track_speculation)
+	{
+	  /* Emit an explicit compare instruction, so that we can correctly
+	 track the condition codes.  */
+	  rtx cc_reg = aarch64_gen_compare_reg (NE, scratch, const0_rtx);
+	  x = gen_rtx_NE (GET_MODE (cc_reg), cc_reg, const0_rtx);
+	}
+  else
+	x = gen_rtx_NE (VOIDmode, scratch, const0_rtx);
+
   x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
 gen_rtx_LABEL_REF (Pmode, label1), pc_rtx);
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
@@ -14819,7 +14837,16 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
   aarch64_emit_store_exclusive (mode, cond, mem,
 gen_lowpart (mode, new_out), model_rtx);
 
-  x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
+  if (aarch64_track_speculation)
+{
+  /* Emit an explicit compare instruction, so that we can correctly
+	 track the condition codes.  */
+  rtx cc_reg = aarch64_gen_compare_reg (NE, cond, const0_rtx);
+  x = gen_rtx_NE (GET_MODE (cc_reg), cc_reg, const0_rtx);
+}
+  else
+x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
+
   x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
 			gen_rtx_LABEL_REF (Pmode, label), pc_rtx);
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c135ada..259a07d 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -690,7 +690,7 @@ (define_insn "*cb1"
 (const_int 0))
 			   (label_ref (match_operand 1 "" ""))
 			   (pc)))]
-  ""
+  "!aarch64_track_speculation"
   {
 if (get_attr_length (insn) == 8)
   return aarch64_gen_far_branch (operands, 1, "Lcb", "\\t%0, ");
@@ -720,7 +720,7 @@ (define_insn "*tb1"
 	 (label_ref (match_operand 2 "" ""))
 	 (pc)))
(clobber (reg:CC CC_REGNUM))]
-  ""
+  "!aarch64_track_speculation"
   {
 if (get_attr_length (insn) == 8)
   {
@@ -756,7 +756,7 @@ (define_insn "*cb1"
 			   (label_ref (match_operand 1 "" ""))
 			   (pc)))
(clobber (reg:CC CC_REGNUM))]
-  ""
+  "!aarch64_track_speculation"
   {
 if (get_attr_length (insn) == 8)
   {


[PATCH 03/11] AArch64 - add speculation barrier

2018-07-27 Thread Richard Earnshaw

Similar to Arm, this adds an unconditional speculation barrier for AArch64.

* config/aarch64/aarch64.md (unspecv): Add UNSPECV_SPECULATION_BARRIER.
(speculation_barrier): New insn.
---
 gcc/config/aarch64/aarch64.md | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a014a01..c135ada 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -205,6 +205,7 @@ (define_c_enum "unspecv" [
 UNSPECV_SET_FPSR		; Represent assign of FPSR content.
 UNSPECV_BLOCKAGE		; Represent a blockage
 UNSPECV_PROBE_STACK_RANGE	; Represent stack range probing.
+UNSPECV_SPECULATION_BARRIER ; Represent speculation barrier.
   ]
 )
 
@@ -6093,6 +6094,15 @@ (define_expand "set_clobber_cc"
 		   (match_operand 1))
 	  (clobber (reg:CC CC_REGNUM))])])
 
+;; Hard speculation barrier.
+(define_insn "speculation_barrier"
+  [(unspec_volatile [(const_int 0)] UNSPECV_SPECULATION_BARRIER)]
+  ""
+  "isb\;dsb\\tsy"
+  [(set_attr "length" "8")
+   (set_attr "type" "block")]
+)
+
 ;; AdvSIMD Stuff
 (include "aarch64-simd.md")
 


[PATCH 00/11] (v2) Mitigation against unsafe data speculation (CVE-2017-5753)

2018-07-27 Thread Richard Earnshaw

Port Maintainers: You need to decide what action is required for your
port to handle speculative execution, even if that action is to use
the trivial no-speculation on this architecture.  You must also
consider whether or not a future implementation of your architecture
might need to deal with this in making that decision.

The patches I posted earlier this year for mitigating against
CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from
which it became obvious that a rethink was needed.  This mail, and the
following patches attempt to address that feedback and present a new
approach to mitigating against this form of attack surface.

There were two major issues with the original approach:

- The speculation bounds were too tightly constrained - essentially
  they had to represent an upper and lower bound on a pointer, or a
  pointer offset.
- The speculation constraints could only cover the immediately preceding
  branch, which often did not fit well with the structure of the existing
  code.

An additional criticism was that the shape of the intrinsic did not
fit particularly well with systems that used a single speculation
barrier that essentially had to wait until all preceding speculation
had been resolved.

To address all of the above, these patches adopt a new approach, based
in part on a posting by Chandler Carruth to the LLVM developers list
(https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html),
but which we have extended to deal with inter-function speculation.
The patches divide the problem into two halves.

The first half is some target-specific code to track the speculation
condition through the generated code to provide an internal variable
which can tell us whether or not the CPU's control flow speculation
matches the data flow calculations.  The idea is that the internal
variable starts with the value TRUE and if the CPU's control flow
speculation ever causes a jump to the wrong block of code the variable
becomes false until such time as the incorrect control flow
speculation gets unwound.

The second half is that a new intrinsic function is introduced that is
much simpler than we had before.  The basic version of the intrinsic
is now simply:

  T var = __builtin_speculation_safe_value (T unsafe_var);

Full details of the syntax can be found in the documentation patch, in
patch 1.  In summary, when not speculating the intrinsic returns
unsafe_var; when speculating then if it can be shown that the
speculative flow has diverged from the intended control flow then zero
is returned.  An optional second argument can be used to return an
alternative value to zero.  The builtin may cause execution to pause
until the speculation state is resolved.

There are eleven patches in this set, as follows.

1) Introduces the new intrinsic __builtin_speculation_safe_value.
2) Adds a basic hard barrier implementation for AArch32 (arm) state.
3) Adds a basic hard barrier implementation for AArch64 state.
4) Adds a new command-line option -mtrack-speculation (currently a no-op).
5) Disables CB[N]Z and TB[N]Z when -mtrack-speculation is in use.
6) Adds the new speculation tracking pass for AArch64.
7) Uses the new speculation tracking pass to generate CSDB-based barrier
   sequences.
8) Provides an alternative hook implementation for use on targets that never
   speculatively execute.
9) Provides a trivial example of using that hook in the pdp11 backend.
10) Provides a possible implementation of the hard barrier for x86.
11) Updates the PowerPC backend, which already had a suitable barrier under
    a different name.

I haven't added a speculation-tracking pass for AArch32 at this time.
It is possible to do this, but would require quite a lot of rework for
the arm backend due to the limited number of registers that are
available.

Although patch 6 is AArch64 specific, I'd appreciate a review from
someone more familiar with the branch edge code than myself.  There
appear to be a number of tricky issues with more complex edges so I'd
like a second opinion on that code in case I've missed an important
case.

R.

Richard Earnshaw (11):
  Add __builtin_speculation_safe_value
  Arm - add speculation_barrier pattern
  AArch64 - add speculation barrier
  AArch64 - Add new option -mtrack-speculation
  AArch64 - disable CB[N]Z TB[N]Z when tracking speculation
  AArch64 - new pass to add conditional-branch speculation tracking
  AArch64 - use CSDB based sequences if speculation tracking is enabled
  targhooks - provide an alternative hook for targets that never execute
speculatively
  pdp11 - example of a port not needing a speculation barrier
  x86 - add speculation_barrier pattern
  rs6000 - add speculation_barrier pattern

 gcc/builtin-attrs.def   |   2 +
 gcc/builtin-types.def   |   6 +
 gcc/builtins.c  |  60 
 gcc/builtins.def|  22 ++
 gcc/c-family/c-comm

[PATCH 02/11] Arm - add speculation_barrier pattern

2018-07-27 Thread Richard Earnshaw

This patch defines a speculation barrier for AArch32.

* config/arm/unspecs.md (unspecv): Add VUNSPEC_SPECULATION_BARRIER.
* config/arm/arm.md (speculation_barrier): New expand.
(speculation_barrier_insn): New pattern.
---
 gcc/config/arm/arm.md | 21 +
 gcc/config/arm/unspecs.md |  1 +
 2 files changed, 22 insertions(+)

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 361a026..ca2a2f5 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -12012,6 +12012,27 @@ (define_insn ""
   [(set_attr "length" "4")
(set_attr "type" "coproc")])
 
+(define_expand "speculation_barrier"
+  [(unspec_volatile [(const_int 0)] VUNSPEC_SPECULATION_BARRIER)]
+  "TARGET_EITHER"
+  "
+/* Don't emit anything for Thumb1 and suppress the warning from the
+   generic expansion.  */
+if (!TARGET_32BIT)
+   DONE;
+  "
+)
+
+;; Generate a hard speculation barrier when we have not enabled speculation
+;; tracking.
+(define_insn "*speculation_barrier_insn"
+  [(unspec_volatile [(const_int 0)] VUNSPEC_SPECULATION_BARRIER)]
+  "TARGET_32BIT"
+  "isb\;dsb\\tsy"
+  [(set_attr "type" "block")
+   (set_attr "length" "8")]
+)
+
 ;; Vector bits common to IWMMXT and Neon
 (include "vec-common.md")
 ;; Load the Intel Wireless Multimedia Extension patterns
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index b05f85e..1941673 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -168,6 +168,7 @@ (define_c_enum "unspecv" [
   VUNSPEC_MCRR2		; Represent the coprocessor mcrr2 instruction.
   VUNSPEC_MRRC		; Represent the coprocessor mrrc instruction.
   VUNSPEC_MRRC2		; Represent the coprocessor mrrc2 instruction.
+  VUNSPEC_SPECULATION_BARRIER ; Represents an unconditional speculation barrier.
 ])
 
 ;; Enumerators for NEON unspecs.


[PATCH 06/11] AArch64 - new pass to add conditional-branch speculation tracking

2018-07-27 Thread Richard Earnshaw

This patch is the main part of the speculation tracking code.  It adds
a new target-specific pass that is run just before the final branch
reorg pass (so that it can clean up any new edge insertions we make).
The pass is only run when -mtrack-speculation is passed on the command
line.

One thing that did come to light as part of this was that the stack pointer
register was not being permitted in comparison instructions.  We rely on
that for moving the tracking state between SP and the scratch register at
function call boundaries.

* config/aarch64/aarch64-speculation.cc: New file.
* config/aarch64/aarch64-passes.def (pass_track_speculation): Add before
pass_reorder_blocks.
* config/aarch64/aarch64-protos.h (make_pass_track_speculation): Add
prototype.
* config/aarch64/aarch64.c (aarch64_conditional_register_usage): Fix
X14 and X15 when tracking speculation.
* config/aarch64/aarch64.md (register name constants): Add
SPECULATION_TRACKER_REGNUM and SPECULATION_SCRATCH_REGNUM.
(unspec): Add UNSPEC_SPECULATION_TRACKER.
(speculation_barrier): New insn attribute.
(cmp): Allow SP in comparisons.
(speculation_tracker): New insn.
(speculation_barrier): Add speculation_barrier attribute.
* config/aarch64/t-aarch64: Add make rule for aarch64-speculation.o.
* config.gcc (aarch64*-*-*): Add aarch64-speculation.o to extra_objs.
* doc/invoke.texi (AArch64 Options): Document -mtrack-speculation.
---
 gcc/config.gcc|   2 +-
 gcc/config/aarch64/aarch64-passes.def |   1 +
 gcc/config/aarch64/aarch64-protos.h   |   3 +-
 gcc/config/aarch64/aarch64-speculation.cc | 494 ++
 gcc/config/aarch64/aarch64.c  |  13 +
 gcc/config/aarch64/aarch64.md |  30 +-
 gcc/config/aarch64/t-aarch64  |  10 +
 gcc/doc/invoke.texi   |  10 +-
 8 files changed, 558 insertions(+), 5 deletions(-)
 create mode 100644 gcc/config/aarch64/aarch64-speculation.cc

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 78e84c2..b17fdba 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -304,7 +304,7 @@ aarch64*-*-*)
 	extra_headers="arm_fp16.h arm_neon.h arm_acle.h"
 	c_target_objs="aarch64-c.o"
 	cxx_target_objs="aarch64-c.o"
-	extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o"
+	extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o aarch64-speculation.o"
 	target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.c"
 	target_has_targetm_common=yes
 	;;
diff --git a/gcc/config/aarch64/aarch64-passes.def b/gcc/config/aarch64/aarch64-passes.def
index 87747b4..3d6a254 100644
--- a/gcc/config/aarch64/aarch64-passes.def
+++ b/gcc/config/aarch64/aarch64-passes.def
@@ -19,3 +19,4 @@
.  */
 
 INSERT_PASS_AFTER (pass_regrename, 1, pass_fma_steering);
+INSERT_PASS_BEFORE (pass_reorder_blocks, 1, pass_track_speculation);
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index bc11a78..e80ffcf 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -554,7 +554,8 @@ enum aarch64_parse_opt_result aarch64_parse_extension (const char *,
 std::string aarch64_get_extension_string_for_isa_flags (unsigned long,
 			unsigned long);
 
-rtl_opt_pass *make_pass_fma_steering (gcc::context *ctxt);
+rtl_opt_pass *make_pass_fma_steering (gcc::context *);
+rtl_opt_pass *make_pass_track_speculation (gcc::context *);
 
 poly_uint64 aarch64_regmode_natural_size (machine_mode);
 
diff --git a/gcc/config/aarch64/aarch64-speculation.cc b/gcc/config/aarch64/aarch64-speculation.cc
new file mode 100644
index 000..2dd06ae
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-speculation.cc
@@ -0,0 +1,494 @@
+/* Speculation tracking and mitigation (e.g. CVE 2017-5753) for AArch64.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   Contributed by ARM Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   .  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "target.h"
+#include "rtl.h"
+#include "tree-pass.h"
+#include "profile-count.h"
+#include "cfg.h"
+#include "cfgbuild.h"
+#include "print-rtl.h"
+#include "cfgrtl.h"
+#include "function.h"
+#include "bas

[PATCH 01/11] Add __builtin_speculation_safe_value

2018-07-27 Thread Richard Earnshaw

This patch defines a new intrinsic function
__builtin_speculation_safe_value.  A generic default implementation is
defined which will attempt to use the backend pattern
"speculation_barrier".  If this pattern is not defined, or if it
is not available, then the compiler will emit a warning, but
compilation will continue.

Note that the test spec-barrier-1.c will currently fail on all
targets.  This is deliberate; the failure will go away when
appropriate action is taken for each target backend.

gcc:
* builtin-types.def (BT_FN_PTR_PTR_VAR): New function type.
(BT_FN_I1_I1_VAR, BT_FN_I2_I2_VAR, BT_FN_I4_I4_VAR): Likewise.
(BT_FN_I8_I8_VAR, BT_FN_I16_I16_VAR): Likewise.
* builtin-attrs.def (ATTR_NOVOPS_NOTHROW_LEAF_LIST): New attribute
list.
* builtins.def (BUILT_IN_SPECULATION_SAFE_VALUE_N): New builtin.
(BUILT_IN_SPECULATION_SAFE_VALUE_PTR): New internal builtin.
(BUILT_IN_SPECULATION_SAFE_VALUE_1): Likewise.
(BUILT_IN_SPECULATION_SAFE_VALUE_2): Likewise.
(BUILT_IN_SPECULATION_SAFE_VALUE_4): Likewise.
(BUILT_IN_SPECULATION_SAFE_VALUE_8): Likewise.
(BUILT_IN_SPECULATION_SAFE_VALUE_16): Likewise.
* builtins.c (expand_speculation_safe_value): New function.
(expand_builtin): Call it.
* doc/cpp.texi: Document predefine __HAVE_SPECULATION_SAFE_VALUE.
* doc/extend.texi: Document __builtin_speculation_safe_value.
* doc/md.texi: Document "speculation_barrier" pattern.
* doc/tm.texi.in: Pull in TARGET_SPECULATION_SAFE_VALUE and
TARGET_HAVE_SPECULATION_SAFE_VALUE.
* doc/tm.texi: Regenerated.
* target.def (have_speculation_safe_value, speculation_safe_value): New
hooks.
* targhooks.c (default_have_speculation_safe_value): New function.
(default_speculation_safe_value): New function.
* targhooks.h (default_have_speculation_safe_value): Add prototype.
(default_speculation_safe_value): Add prototype.

c-family:
* c-common.c (speculation_safe_resolve_call): New function.
(speculation_safe_resolve_params): New function.
(speculation_safe_resolve_return): New function.
(resolve_overloaded_builtin): Handle __builtin_speculation_safe_value.
* c-cppbuiltin.c (c_cpp_builtins): Add pre-define for
__HAVE_SPECULATION_SAFE_VALUE.

testsuite:
* c-c++-common/spec-barrier-1.c: New test.
* c-c++-common/spec-barrier-2.c: New test.
* gcc.dg/spec-barrier-3.c: New test.
---
 gcc/builtin-attrs.def   |   2 +
 gcc/builtin-types.def   |   6 +
 gcc/builtins.c  |  60 ++
 gcc/builtins.def|  22 
 gcc/c-family/c-common.c | 164 
 gcc/c-family/c-cppbuiltin.c |   7 +-
 gcc/doc/cpp.texi|   4 +
 gcc/doc/extend.texi |  91 +++
 gcc/doc/md.texi |  15 +++
 gcc/doc/tm.texi |  31 ++
 gcc/doc/tm.texi.in  |   4 +
 gcc/target.def  |  35 ++
 gcc/targhooks.c |  32 ++
 gcc/targhooks.h |   3 +
 gcc/testsuite/c-c++-common/spec-barrier-1.c |  38 +++
 gcc/testsuite/c-c++-common/spec-barrier-2.c |  17 +++
 gcc/testsuite/gcc.dg/spec-barrier-3.c   |  13 +++
 17 files changed, 543 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/c-c++-common/spec-barrier-1.c
 create mode 100644 gcc/testsuite/c-c++-common/spec-barrier-2.c
 create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-3.c

diff --git a/gcc/builtin-attrs.def b/gcc/builtin-attrs.def
index 300ba65..e245e4d 100644
--- a/gcc/builtin-attrs.def
+++ b/gcc/builtin-attrs.def
@@ -129,6 +129,8 @@ DEF_ATTR_TREE_LIST (ATTR_NOTHROW_LIST, ATTR_NOTHROW, ATTR_NULL, ATTR_NULL)
 
 DEF_ATTR_TREE_LIST (ATTR_NOTHROW_LEAF_LIST, ATTR_LEAF, ATTR_NULL, ATTR_NOTHROW_LIST)
 
+DEF_ATTR_TREE_LIST (ATTR_NOVOPS_NOTHROW_LEAF_LIST, ATTR_NOVOPS, \
+		ATTR_NULL, ATTR_NOTHROW_LEAF_LIST)
 DEF_ATTR_TREE_LIST (ATTR_CONST_NOTHROW_LIST, ATTR_CONST,	\
 			ATTR_NULL, ATTR_NOTHROW_LIST)
 DEF_ATTR_TREE_LIST (ATTR_CONST_NOTHROW_LEAF_LIST, ATTR_CONST,	\
diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index b01095c..70fae35 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -763,6 +763,12 @@ DEF_FUNCTION_TYPE_VAR_1 (BT_FN_VOID_LONG_VAR,
 			 BT_VOID, BT_LONG)
 DEF_FUNCTION_TYPE_VAR_1 (BT_FN_VOID_ULL_VAR,
 			 BT_VOID, BT_ULONGLONG)
+DEF_FUNCTION_TYPE_VAR_1 (BT_FN_PTR_PTR_VAR, BT_PTR, BT_PTR)
+DEF_FUNCTION_TYPE_VAR_1 (BT_FN_I1_I1_VAR, BT_I1, BT_I1)
+DEF_FUNCTION_TYPE_VAR_1 (BT_FN_I2_I2_VAR, BT_I2, BT_I2)
+DEF_FUNCTION_TYPE_VAR_1 (BT_FN_I4_I4_VAR, BT_I4, BT_I4)
+DEF_FUNCTION_TYPE_VAR_1 (BT_FN_I8_I8_VAR, BT_I8, BT_I8)
+DEF_FUNCTIO

[PATCH 07/11] AArch64 - use CSDB based sequences if speculation tracking is enabled

2018-07-27 Thread Richard Earnshaw

In this final patch, now that we can track speculation through conditional
branches, we can use this information to use a less expensive CSDB based
speculation barrier.
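
As a rough C model of what the tracked sequence computes (a sketch only: the names speculation_tracker, track_edge and despeculate_copy_model are hypothetical, and the edge-update rule is an assumption about the tracking pass, whose body is not shown here; real hardware state cannot be modelled faithfully in portable C):

```c
#include <stdint.h>

/* Model of the tracker register: all ones while control flow is
   architecturally correct, zero once a mis-speculated edge has been
   followed.  In this model the "mis-speculation" is simulated by hand.  */
uint64_t speculation_tracker = ~UINT64_C (0);

/* Assumed update on a conditional branch edge: keep the tracker if the
   condition guarding the edge really holds, otherwise clear it.  */
void
track_edge (int condition_holds)
{
  speculation_tracker = condition_holds ? speculation_tracker : 0;
}

/* Model of the copy selected by the tracker: return VAL when the tracker
   is non-zero, FAILVAL otherwise.  This mirrors a compare of the tracker
   against zero followed by a conditional select; the CSDB that follows in
   the real sequence has no equivalent in this model.  */
uint64_t
despeculate_copy_model (uint64_t val, uint64_t failval)
{
  return speculation_tracker != 0 ? val : failval;
}
```

The point of the design is that the select is data-dependent on the tracker, so no full barrier is needed on the correctly predicted path.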

* config/aarch64/iterators.md (ALLI_TI): New iterator.
* config/aarch64/aarch64.md (despeculate_copy): New
expand.
(despeculate_copy_insn): New insn.
(despeculate_copyti_insn): New insn.
(despeculate_simple): New insn.
(despeculate_simpleti): New insn.
* config/aarch64/aarch64.c (aarch64_speculation_safe_value): New
function.
(TARGET_SPECULATION_SAFE_VALUE): Redefine to
aarch64_speculation_safe_value.
(aarch64_print_operand): Handle const0_rtx in modifier 'H'.
---
 gcc/config/aarch64/aarch64.c| 48 
 gcc/config/aarch64/aarch64.md   | 97 +
 gcc/config/aarch64/iterators.md |  3 ++
 3 files changed, 148 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index cca465e..fc6eb1c 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -6760,6 +6760,12 @@ aarch64_print_operand (FILE *f, rtx x, int code)
   break;
 
 case 'H':
+  if (x == const0_rtx)
+	{
+	  asm_fprintf (f, "xzr");
+	  break;
+	}
+
   if (!REG_P (x) || !GP_REGNUM_P (REGNO (x) + 1))
 	{
 	  output_operand_lossage ("invalid operand for '%%%c'", code);
@@ -17638,6 +17644,45 @@ aarch64_select_early_remat_modes (sbitmap modes)
 }
 }
 
+/* Override the default target speculation_safe_value.  */
+static rtx
+aarch64_speculation_safe_value (machine_mode mode,
+rtx result, rtx val, rtx failval)
+{
+  /* Maybe we should warn if falling back to hard barriers.  They are
+ likely to be noticably more expensive than the alternative below.  */
+  if (!aarch64_track_speculation)
+return default_speculation_safe_value (mode, result, val, failval);
+
+  if (!REG_P (val))
+val = copy_to_mode_reg (mode, val);
+
+  if (!aarch64_reg_or_zero (failval, mode))
+failval = copy_to_mode_reg (mode, failval);
+
+  switch (mode)
+{
+case E_QImode:
+  emit_insn (gen_despeculate_copyqi (result, val, failval));
+  break;
+case E_HImode:
+  emit_insn (gen_despeculate_copyhi (result, val, failval));
+  break;
+case E_SImode:
+  emit_insn (gen_despeculate_copysi (result, val, failval));
+  break;
+case E_DImode:
+  emit_insn (gen_despeculate_copydi (result, val, failval));
+  break;
+case E_TImode:
+  emit_insn (gen_despeculate_copyti (result, val, failval));
+  break;
+default:
+  gcc_unreachable ();
+}
+  return result;
+}
+
 /* Target-specific selftests.  */
 
 #if CHECKING_P
@@ -18110,6 +18155,9 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_SELECT_EARLY_REMAT_MODES
 #define TARGET_SELECT_EARLY_REMAT_MODES aarch64_select_early_remat_modes
 
+#undef TARGET_SPECULATION_SAFE_VALUE
+#define TARGET_SPECULATION_SAFE_VALUE aarch64_speculation_safe_value
+
 #if CHECKING_P
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::aarch64_run_selftests
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 528d03d..321a674 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -6129,6 +6129,103 @@ (define_insn "speculation_barrier"
(set_attr "speculation_barrier" "true")]
 )
 
+;; Support for __builtin_speculation_safe_value when we have speculation
+;; tracking enabled.  Use the speculation tracker to decide whether to
+;; copy operand 1 to the target, or to copy the fail value (operand 2).
+(define_expand "despeculate_copy"
+  [(set (match_operand:ALLI_TI 0 "register_operand" "=r")
+	(unspec_volatile:ALLI_TI
+	 [(match_operand:ALLI_TI 1 "register_operand" "r")
+	  (match_operand:ALLI_TI 2 "aarch64_reg_or_zero" "rZ")
+	  (use (reg:DI SPECULATION_TRACKER_REGNUM))
+	  (clobber (reg:CC CC_REGNUM))] UNSPECV_SPECULATION_BARRIER))]
+  ""
+  "
+  {
+if (operands[2] == const0_rtx)
+  {
+	rtx tracker;
+	if (mode == TImode)
+	  tracker = gen_rtx_REG (DImode, SPECULATION_TRACKER_REGNUM);
+	else
+	  tracker = gen_rtx_REG (mode, SPECULATION_TRACKER_REGNUM);
+
+	emit_insn (gen_despeculate_simple (operands[0], operands[1],
+		 tracker));
+	DONE;
+  }
+  }
+  "
+)
+
+;; Patterns to match despeculate_copy.  Note that "hint 0x14" is the
+;; encoding for CSDB, but will work in older versions of the assembler.
+(define_insn "*despeculate_copy_insn"
+  [(set (match_operand:ALLI 0 "register_operand" "=r")
+	(unspec_volatile:ALLI
+	 [(match_operand:ALLI 1 "register_operand" "r")
+	  (match_operand:ALLI 2 "aarch64_reg_or_zero" "rZ")
+	  (use (reg:DI SPECULATION_TRACKER_REGNUM))
+	  (clobber (reg:CC CC_REGNUM))] UNSPECV_SPECULATION_BARRIER))]
+  ""
+  {
+operands[3] = gen_rtx_REG (DImode, SPECULATION_TRACKER_REGNUM);
+output_asm_insn ("cmp\\t%3, #0\;csel\\t%0, %1, %2, ne\;hint\t0x14 // csdb",
+		 o

[PATCH 09/11] pdp11 - example of a port not needing a speculation barrier

2018-07-27 Thread Richard Earnshaw

This patch is intended as an example of all that is needed if the
target system doesn't support CPUs that have speculative execution.
I've chosen the pdp11 port on the basis that it's old enough that this
is likely to be true for all existing implementations and that there
is also little chance of that changing in future!

* config/pdp11/pdp11.c (TARGET_HAVE_SPECULATION_SAFE_VALUE): Redefine
to speculation_safe_value_not_needed.
---
 gcc/config/pdp11/pdp11.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/config/pdp11/pdp11.c b/gcc/config/pdp11/pdp11.c
index 1bcdaed..62c653f 100644
--- a/gcc/config/pdp11/pdp11.c
+++ b/gcc/config/pdp11/pdp11.c
@@ -291,6 +291,9 @@ static bool pdp11_scalar_mode_supported_p (scalar_mode);
 
 #undef TARGET_INVALID_WITHIN_DOLOOP
 #define TARGET_INVALID_WITHIN_DOLOOP hook_constcharptr_const_rtx_insn_null
+
+#undef TARGET_HAVE_SPECULATION_SAFE_VALUE
+#define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
 
 /* A helper function to determine if REGNO should be saved in the
current function's stack frame.  */


[PATCH 10/11] x86 - add speculation_barrier pattern

2018-07-27 Thread Richard Earnshaw

This patch adds a speculation barrier for x86, based on my
understanding of the required mitigation for that CPU, which is to use
an lfence instruction.

This patch needs some review by an x86 expert and if adjustments are
needed, I'd appreciate it if they could be picked up by the port
maintainer.  This is supposed to serve as an example of how to deploy
the new __builtin_speculation_safe_value() intrinsic on this
architecture.

* config/i386/i386.md (unspecv): Add UNSPECV_SPECULATION_BARRIER.
(speculation_barrier): New insn.
---
 gcc/config/i386/i386.md | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 559ad93..73948c1 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -301,6 +301,9 @@ (define_c_enum "unspecv" [
 
   ;; For CLDEMOTE support
   UNSPECV_CLDEMOTE
+
+  ;; For Speculation Barrier support
+  UNSPECV_SPECULATION_BARRIER
 ])
 
 ;; Constants to represent rounding modes in the ROUND instruction
@@ -20979,6 +20982,13 @@ (define_insn "cldemote"
   [(set_attr "type" "other")
(set_attr "memory" "unknown")])
 
+(define_insn "speculation_barrier"
+  [(unspec_volatile [(const_int 0)] UNSPECV_SPECULATION_BARRIER)]
+  ""
+  "lfence"
+  [(set_attr "type" "other")
+   (set_attr "length" "3")])
+
 (include "mmx.md")
 (include "sse.md")
 (include "sync.md")


[PATCH 08/11] targhooks - provide an alternative hook for targets that never execute speculatively

2018-07-27 Thread Richard Earnshaw

This patch adds an alternative implementation for the target hook
TARGET_HAVE_SPECULATION_SAFE_VALUE; it can be used by targets that have no
CPU implementations that execute code speculatively.  All that is needed for
such targets now is to add:

 #undef TARGET_HAVE_SPECULATION_SAFE_VALUE
 #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed

to where you have your other target hooks and you're done.
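
For illustration, a standalone model of the hook's behaviour outside of GCC (a sketch under assumptions: my reading is that an argument of false asks whether the builtin is supported, and an argument of true asks whether active mitigation code is generated; the helper builtin_is_usable_without_mitigation is hypothetical):

```c
#include <stdbool.h>

/* Same body as the function added by this patch: on a target that never
   speculates, the builtin is "supported" (active == false gives true)
   but no mitigation is ever emitted (active == true gives false).  */
bool
speculation_safe_value_not_needed (bool active)
{
  return !active;
}

/* Hypothetical helper: true when a hook reports that the builtin can be
   used and that no barrier code will be generated for it.  */
bool
builtin_is_usable_without_mitigation (bool (*hook) (bool))
{
  return hook (false) && !hook (true);
}
```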

gcc:
* targhooks.h (speculation_safe_value_not_needed): New prototype.
* targhooks.c (speculation_safe_value_not_needed): New function.
* target.def (have_speculation_safe_value): Update documentation.
* doc/tm.texi: Regenerated.
---
 gcc/doc/tm.texi | 5 +
 gcc/target.def  | 7 ++-
 gcc/targhooks.c | 7 +++
 gcc/targhooks.h | 1 +
 4 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 15b0ab8..f36e376 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11959,6 +11959,11 @@ This hook is used to determine the level of target support for
  a pattern named @code{speculation_barrier}.  Else it returns true
  for the first case and whether the pattern is enabled for the current
  compilation for the second case.
+ 
+ For targets that have no processors that can execute instructions
+ speculatively an alternative implemenation of this hook is available:
+ simply redefine this hook to @code{speculation_safe_value_not_needed}
+ along with your other target hooks.
 @end deftypefn
 
 @deftypefn {Target Hook} rtx TARGET_SPECULATION_SAFE_VALUE (machine_mode @var{mode}, rtx @var{result}, rtx @var{val}, rtx @var{failval})
diff --git a/gcc/target.def b/gcc/target.def
index d598067..5599eb4 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4206,7 +4206,12 @@ DEFHOOK
  The default implementation returns false if the target does not define\n\
  a pattern named @code{speculation_barrier}.  Else it returns true\n\
  for the first case and whether the pattern is enabled for the current\n\
- compilation for the second case.",
+ compilation for the second case.\n\
+ \n\
+ For targets that have no processors that can execute instructions\n\
+ speculatively an alternative implemenation of this hook is available:\n\
+ simply redefine this hook to @code{speculation_safe_value_not_needed}\n\
+ along with your other target hooks.",
 bool, (bool active), default_have_speculation_safe_value)
 
 DEFHOOK
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 06de1e3..62051a9 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -2324,6 +2324,13 @@ default_have_speculation_safe_value (bool active)
   return false;
 #endif
 }
+/* Alternative implementation of TARGET_HAVE_SPECULATION_SAFE_VALUE
+   that can be used on targets that never have speculative execution.  */
+bool
+speculation_safe_value_not_needed (bool active)
+{
+  return !active;
+}
 
 /* Default implementation of the speculation-safe-load builtin.  This
implementation simply copies val to result and generates a
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 74ffe5f..b716b97 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -286,6 +286,7 @@ extern void default_select_early_remat_modes (sbitmap);
 extern tree default_preferred_else_value (unsigned, tree, unsigned, tree *);
 
 extern bool default_have_speculation_safe_value (bool);
+extern bool speculation_safe_value_not_needed (bool);
 extern rtx default_speculation_safe_value (machine_mode, rtx, rtx, rtx);
 
 #endif /* GCC_TARGHOOKS_H */


[PATCH 05/11] AArch64 - disable CB[N]Z TB[N]Z when tracking speculation

2018-07-27 Thread Richard Earnshaw

The CB[N]Z and TB[N]Z instructions do not expose the comparison through
the condition code flags.  This makes it impossible to track speculative
execution through such a branch.  We can handle this relatively easily
by simply disabling the patterns in this case.

A side effect of this is that the split patterns for the atomic operations
need to also avoid generating these instructions.  They mostly have simple
fall-backs for this already.

* config/aarch64/aarch64.md (cb1): Disable when
aarch64_track_speculation is true.
(tb1): Likewise.
* config/aarch64/aarch64.c (aarch64_split_compare_regs): Do not
generate CB[N]Z when tracking speculation.
(aarch64_split_compare_and_swap): Likewise.
(aarch64_split_atomic_op): Likewise.
---
 gcc/config/aarch64/aarch64.c  | 33 ++---
 gcc/config/aarch64/aarch64.md |  6 +++---
 2 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1369704..90849b5 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -14479,7 +14479,16 @@ aarch64_split_compare_and_swap (rtx operands[])
 
   if (strong_zero_p)
 {
-  x = gen_rtx_NE (VOIDmode, rval, const0_rtx);
+  if (aarch64_track_speculation)
+	{
+	  /* Emit an explicit compare instruction, so that we can correctly
+	 track the condition codes.  */
+	  rtx cc_reg = aarch64_gen_compare_reg (NE, rval, const0_rtx);
+	  x = gen_rtx_NE (GET_MODE (cc_reg), cc_reg, const0_rtx);
+	}
+  else
+	x = gen_rtx_NE (VOIDmode, rval, const0_rtx);
+
   x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
 gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
@@ -14497,7 +14506,16 @@ aarch64_split_compare_and_swap (rtx operands[])
 
   if (!is_weak)
 {
-  x = gen_rtx_NE (VOIDmode, scratch, const0_rtx);
+  if (aarch64_track_speculation)
+	{
+	  /* Emit an explicit compare instruction, so that we can correctly
+	 track the condition codes.  */
+	  rtx cc_reg = aarch64_gen_compare_reg (NE, scratch, const0_rtx);
+	  x = gen_rtx_NE (GET_MODE (cc_reg), cc_reg, const0_rtx);
+	}
+  else
+	x = gen_rtx_NE (VOIDmode, scratch, const0_rtx);
+
   x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
 gen_rtx_LABEL_REF (Pmode, label1), pc_rtx);
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
@@ -14833,7 +14851,16 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
   aarch64_emit_store_exclusive (mode, cond, mem,
 gen_lowpart (mode, new_out), model_rtx);
 
-  x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
+  if (aarch64_track_speculation)
+{
+  /* Emit an explicit compare instruction, so that we can correctly
+	 track the condition codes.  */
+  rtx cc_reg = aarch64_gen_compare_reg (NE, cond, const0_rtx);
+  x = gen_rtx_NE (GET_MODE (cc_reg), cc_reg, const0_rtx);
+}
+  else
+x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
+
   x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
 			gen_rtx_LABEL_REF (Pmode, label), pc_rtx);
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c135ada..259a07d 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -690,7 +690,7 @@ (define_insn "*cb1"
 (const_int 0))
 			   (label_ref (match_operand 1 "" ""))
 			   (pc)))]
-  ""
+  "!aarch64_track_speculation"
   {
 if (get_attr_length (insn) == 8)
   return aarch64_gen_far_branch (operands, 1, "Lcb", "\\t%0, ");
@@ -720,7 +720,7 @@ (define_insn "*tb1"
 	 (label_ref (match_operand 2 "" ""))
 	 (pc)))
(clobber (reg:CC CC_REGNUM))]
-  ""
+  "!aarch64_track_speculation"
   {
 if (get_attr_length (insn) == 8)
   {
@@ -756,7 +756,7 @@ (define_insn "*cb1"
 			   (label_ref (match_operand 1 "" ""))
 			   (pc)))
(clobber (reg:CC CC_REGNUM))]
-  ""
+  "!aarch64_track_speculation"
   {
 if (get_attr_length (insn) == 8)
   {


[PATCH 04/11] AArch64 - Add new option -mtrack-speculation

2018-07-27 Thread Richard Earnshaw

This patch doesn't do anything useful on its own; it simply adds a new
command-line option -mtrack-speculation to AArch64.  Subsequent patches
build on this.

* config/aarch64/aarch64.opt (mtrack-speculation): New target option.
---
 gcc/config/aarch64/aarch64.opt | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 1426b45..bc9b22a 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -214,3 +214,7 @@ Target RejectNegative Joined Enum(sve_vector_bits) Var(aarch64_sve_vector_bits)
 mverbose-cost-dump
 Common Undocumented Var(flag_aarch64_verbose_cost)
 Enables verbose cost model dumping in the debug dump files.
+
+mtrack-speculation
+Target Var(aarch64_track_speculation)
+Generate code to track when the CPU might be speculating incorrectly.


[PATCH 11/11] rs6000 - add speculation_barrier pattern

2018-07-27 Thread Richard Earnshaw

This patch reworks the existing rs6000_speculation_barrier pattern to
work with the new __builtin_speculation_safe_value() intrinsic.  The
change is trivial as it simply requires renaming the existing speculation
barrier pattern.

So the total patch is to delete 14 characters!

* config/rs6000/rs6000.md (speculation_barrier): Renamed from
rs6000_speculation_barrier.
* config/rs6000/rs6000.c (rs6000_expand_builtin): Adjust for
new barrier pattern name.
---
 gcc/config/rs6000/rs6000.c  | 2 +-
 gcc/config/rs6000/rs6000.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 1976072..46c6838 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -16179,7 +16179,7 @@ rs6000_expand_builtin (tree exp, rtx target, rtx subtarget ATTRIBUTE_UNUSED,
 
 case MISC_BUILTIN_SPEC_BARRIER:
   {
-	emit_insn (gen_rs6000_speculation_barrier ());
+	emit_insn (gen_speculation_barrier ());
 	return NULL_RTX;
   }
 
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 44d32d9..03870e9 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -12614,7 +12614,7 @@ (define_insn "group_ending_nop"
   return "ori 2,2,0";
 })
 
-(define_insn "rs6000_speculation_barrier"
+(define_insn "speculation_barrier"
   [(unspec_volatile:BLK [(const_int 0)] UNSPECV_SPEC_BARRIER)]
   ""
   "ori 31,31,0")


Re: [PATCH] AArch64: Improve immediate generation

2023-10-20 Thread Richard Earnshaw




On 19/10/2023 13:43, Wilco Dijkstra wrote:

Further improve immediate generation by adding support for 2-instruction
MOV/EOR bitmask immediates.  This reduces the number of 3/4-instruction
immediates in SPECCPU2017 by ~2%.
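
The search can be sketched as standalone C (an illustration, not the GCC code: is_bitmask_imm is a simplified stand-in for aarch64_bitmask_imm, find_mov_eor_pair is a hypothetical name, and the 16-bit chunk mask 0xffff is assumed to match the `mask' used in the patch):

```c
#include <stdbool.h>
#include <stdint.h>

/* Count set bits without relying on compiler builtins.  */
static unsigned
popcount64 (uint64_t x)
{
  unsigned n = 0;
  for (; x != 0; x &= x - 1)
    n++;
  return n;
}

/* Simplified stand-in for aarch64_bitmask_imm: V is accepted if it is a
   repetition of a 2/4/8/16/32/64-bit element that is a rotated run of
   ones (neither all zeros nor all ones).  */
bool
is_bitmask_imm (uint64_t v)
{
  if (v == 0 || v == ~UINT64_C (0))
    return false;

  /* Find the smallest element size whose repetition gives V.  */
  unsigned size = 64;
  while (size > 2)
    {
      unsigned half = size / 2;
      uint64_t half_mask = (UINT64_C (1) << half) - 1;
      if ((v & half_mask) != ((v >> half) & half_mask))
	break;
      size = half;
    }

  uint64_t mask = size == 64 ? ~UINT64_C (0) : (UINT64_C (1) << size) - 1;
  uint64_t elem = v & mask;
  /* Rotate right by one within SIZE bits; a single circular run of ones
     has exactly two 0/1 transitions.  */
  uint64_t ror = (elem >> 1) | ((elem & 1) << (size - 1));
  return popcount64 (elem ^ ror) == 2;
}

/* Try each 16-bit chunk of VAL replicated across 64 bits as the MOV
   operand; succeed if both it and the remaining EOR operand are bitmask
   immediates.  On success *MOV_IMM ^ *EOR_IMM == VAL.  */
bool
find_mov_eor_pair (uint64_t val, uint64_t *mov_imm, uint64_t *eor_imm)
{
  for (int i = 0; i < 64; i += 16)
    {
      uint64_t val2 = (val >> i) & 0xffff;
      val2 |= val2 << 16;
      val2 |= val2 << 32;
      if (is_bitmask_imm (val2) && is_bitmask_imm (val ^ val2))
	{
	  *mov_imm = val2;
	  *eor_imm = val ^ val2;
	  return true;
	}
    }
  return false;
}
```

For 0x10f0f0f0f0f0f0f1 (the constant in f2 of the new test) this model picks 0xf0f0f0f0f0f0f0f0 for the MOV and 0xe000000000000001 for the EOR.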

Passes regress, OK for commit?

gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_internal_mov_immediate):
Add support for immediates using MOV/EOR bitmask.

gcc/testsuite:
* gcc.target/aarch64/imm_choice_comparison.c: Fix test.
 * gcc.target/aarch64/moveor_imm.c: Add new test.
 * gcc.target/aarch64/pr106583.c: Fix test.

---

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 578a253d6e0e133e19592553fc873b3e73f9f218..ed5be2b64c9a767d74e9d78415da964c669001aa 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -5748,6 +5748,26 @@ aarch64_internal_mov_immediate (rtx dest, rtx imm, bool generate,
}
  return 2;
}
+
+  /* Try 2 bitmask immediates which are xor'd together. */
+  for (i = 0; i < 64; i += 16)
+   {
+ val2 = (val >> i) & mask;
+ val2 |= val2 << 16;
+ val2 |= val2 << 32;
+ if (aarch64_bitmask_imm (val2) && aarch64_bitmask_imm (val ^ val2))
+   break;
+   }
+
+  if (i != 64)
+   {
+ if (generate)
+   {
+ emit_insn (gen_rtx_SET (dest, GEN_INT (val2)));
+ emit_insn (gen_xordi3 (dest, dest, GEN_INT (val ^ val2)));
+   }
+ return 2;
+   }
  }
  
/* Try a bitmask plus 2 movk to generate the immediate in 3 instructions.  */

diff --git a/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c b/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c
index ebc44d6dbc7287d907603d77d7b54496de177c4b..2434ca380ca2cad3e1e4181deeaad680f518b866 100644
--- a/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c
+++ b/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c
@@ -6,7 +6,7 @@
  int
  foo (long long x)
  {
-  return x <= 0x1998;
+  return x <= 0x9998;
  }
  
  int

diff --git a/gcc/testsuite/gcc.target/aarch64/moveor_imm.c b/gcc/testsuite/gcc.target/aarch64/moveor_imm.c
new file mode 100644
index ..5f4997b50398fdda5924610959e0c54967ad0735
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/moveor_imm.c
@@ -0,0 +1,31 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 --save-temps" } */
+
+long f1 (void)
+{
+  return 0x2aab;
+}
+
+long f2 (void)
+{
+  return 0x10f0f0f0f0f0f0f1;
+}
+
+long f3 (void)
+{
+  return 0xccd;
+}
+
+long f4 (void)
+{
+  return 0x1998;
+}
+
+long f5 (void)
+{
+  return 0x3f333f33;
+}
+
+/* { dg-final { scan-assembler-not {\tmovk\t} } } */
+/* { dg-final { scan-assembler-times {\tmov\t} 5 } } */
+/* { dg-final { scan-assembler-times {\teor\t} 5 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/pr106583.c 
b/gcc/testsuite/gcc.target/aarch64/pr106583.c
index 
0f931580817d78dc1cc58f03b251bd21bec71f59..79ada5160ce059d66eeaee407ca02488b2a1f114
 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr106583.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr106583.c
@@ -3,7 +3,7 @@
  
  long f1 (void)

  {
-  return 0x7efefefefefefeff;
+  return 0x75fefefefefefeff;
  }
  
  long f2 (void)




I think the tests should be converted to use check-function-bodies, 
rather than scanning for counts on the entire file.  It makes it far 
more obvious what's changed if a test starts to fail.  The functions are 
all trivial, so the test can be quite precise.


Otherwise, this LGTM.

R.
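
For readers following the search loop in the patch above, here is a minimal standalone sketch of the same idea in portable C. It is not the GCC implementation: `is_bitmask_imm` is a hypothetical stand-in for `aarch64_bitmask_imm`, `split_mov_eor` is an invented name, and the 16-bit chunk mask `0xffff` is an assumption about the value of `mask` in the surrounding function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Count set bits portably.  */
static unsigned popcount64 (uint64_t x)
{
  unsigned n = 0;
  for (; x != 0; x &= x - 1)
    n++;
  return n;
}

/* Hypothetical stand-in for aarch64_bitmask_imm: true if V can be encoded
   as an AArch64 logical immediate, i.e. a replicated 2/4/8/16/32/64-bit
   element that is a rotated contiguous run of ones.  */
static bool is_bitmask_imm (uint64_t v)
{
  if (v == 0 || v == UINT64_MAX)
    return false;

  /* Shrink to the smallest element size that still replicates to V.  */
  unsigned size = 64;
  while (size > 2)
    {
      unsigned half = size / 2;
      uint64_t low = (UINT64_C (1) << half) - 1;
      if ((v & low) != ((v >> half) & low))
        break;
      size = half;
    }

  uint64_t mask = size == 64 ? UINT64_MAX : (UINT64_C (1) << size) - 1;
  uint64_t elt = v & mask;
  /* Rotate left by one within the element.  A single circular run of ones
     has exactly two positions where neighbouring bits differ.  */
  uint64_t rot = ((elt << 1) | (elt >> (size - 1))) & mask;
  return popcount64 (elt ^ rot) == 2;
}

/* Mirror of the loop in the patch: replicate each 16-bit chunk of VAL and
   accept it if both the replicated value and the remainder are bitmask
   immediates.  On success store the MOV and EOR operands.  */
static bool split_mov_eor (uint64_t val, uint64_t *mov_imm, uint64_t *eor_imm)
{
  for (int i = 0; i < 64; i += 16)
    {
      uint64_t val2 = (val >> i) & 0xffff;   /* Assumed chunk mask.  */
      val2 |= val2 << 16;
      val2 |= val2 << 32;
      if (is_bitmask_imm (val2) && is_bitmask_imm (val ^ val2))
        {
          *mov_imm = val2;
          *eor_imm = val ^ val2;
          return true;
        }
    }
  return false;
}
```

With the f2 constant from the new test, 0x10f0f0f0f0f0f0f1, this sketch picks the chunk at bit 16 and splits the value into 0xf0f0f0f0f0f0f0f0 for the MOV and 0xe000000000000001 for the EOR.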


Re: [PATCH] aarch64: [PR110986] Emit csinv again for `a ? ~b : b`

2023-10-20 Thread Richard Earnshaw




On 20/10/2023 13:13, Richard Sandiford wrote:

+(define_insn_and_split "*cmov_insn_insv"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+(xor:GPI
+(neg:GPI
+ (match_operator:GPI 1 "aarch64_comparison_operator"
+  [(match_operand 2 "cc_register" "") (const_int 0)]))
+(match_operand:GPI 3 "general_operand" "r")))]
+  "can_create_pseudo_p ()"
+  "#"
+  "&& true"

>

IMO this is an ICE trap, since it hard-codes the assumption that there
will be a split pass after the last pre-LRA call to recog.  I think we
should just provide the asm directly instead.


So why not add

(clobber (match_operand:GPI 4 "register_operand" "=&r"))

to the insn, then you'll always get the scratch needed and the need to 
check can_create_pseudo_p goes away.


R.
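
As background for the pattern quoted above, the RTL matches the XOR of an operand with a negated comparison result. A small sketch in C, using invented function names, shows why that is the same as the `a ? ~b : b` expression in the subject line: negating a 0/1 comparison gives either zero or all-ones, and XOR with all-ones is a bitwise NOT.

```c
#include <stdint.h>

/* Branch-free form: -(cond != 0) is 0 or all-ones, so the XOR either
   leaves B unchanged or inverts every bit.  */
static uint64_t xor_neg_form (int cond, uint64_t b)
{
  uint64_t all_or_none = -(uint64_t) (cond != 0);
  return all_or_none ^ b;
}

/* The source-level expression from the subject line.  */
static uint64_t ternary_form (int cond, uint64_t b)
{
  return cond ? ~b : b;
}
```

The two functions agree for every input, which is what lets a conditional-invert instruction implement the XOR/NEG form.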


Re: [PATCH v2] ARM: Block predication on atomics [PR111235]

2023-10-20 Thread Richard Earnshaw




On 02/10/2023 18:12, Wilco Dijkstra wrote:

Hi Ramana,


I used --target=arm-none-linux-gnueabihf --host=arm-none-linux-gnueabihf
--build=arm-none-linux-gnueabihf --with-float=hard. However it seems that the
default armhf settings are incorrect. I shouldn't need the --with-float=hard
since that is obviously implied by armhf, and they should also imply armv7-a
with vfpv3 according to documentation. It seems to get confused and skip some
tests. I tried using --with-fpu=auto, but that doesn't work at all, so in the
end I forced it like:
--with-arch=armv8-a --with-fpu=neon-fp-armv8. With this it runs a few more
tests.


Yeah that's a wart that I don't like.

armhf just implies the hard float ABI and came into being to help
distinguish from the Base PCS for some of the distros at the time
(2010s). However we didn't want to set a baseline arch at that time
given the imminent arrival of v8-a and thus the specification of
--with-arch, --with-fpu and --with-float became second nature to many
of us working on it at that time.


Looking at it, the default is indeed incorrect, you get:
'-mcpu=arm10e' '-mfloat-abi=hard' '-marm' '-march=armv5te+fp'
That's not incorrect.  It's the first version of the architecture that 
can support the hard-float ABI.




That's like 25 years out of date!


It's not a matter of being out of date (and it's only 22 years since 
arm1020e was announced ;) it's a matter of being as compatible as we can 
be with existing hardware out-of-the-box.  Distros are free, of course, 
to set a higher bar and do so.




However all the armhf distros have Armv7-a as the baseline and use Thumb-2:
'-mfloat-abi=hard' '-mthumb' '-march=armv7-a+fp'


Wrong.  Rawhide uses Arm state (or it did last I checked).  As I 
mentioned above, distros are free to set a higher bar.




So the issue is that dg-require-effective-target arm_arch_v7a_ok doesn't work on
armhf. It seems that if you specify an architecture even with hard-float
configured, it turns off FP and then complains because hard-float implies you
must have FP...


OK, I think I see the problem there, it's in the data for
proc add_options_for_arm_arch_FUNC

in lib/target-supports.exp.  In order to work correctly with -mfpu=auto, 
the -march flags in the table need "+fp" adding in most cases (pretty 
much everything from armv5e onwards) - that's harmless whenever the 
float-abi is soft, but should do the right thing when softfp or hard are 
used.




So in most configurations (including the one used by distro compilers) we
basically skip lots of tests for no apparent reason...


Ok, thanks for promising to do so - I trust you to get it done. Please
try out various combinations of -march v7ve, v7-a, v8-a with the tool
as each of them has slightly different rules. For instance v7ve
allows LDREXD and STREXD to be single copy atomic for 64 bit loads
whereas v7-a did not.


You mean LDRD may be generated on CPUs with LPAE. We use LDREXD by
default since that is always atomic on v7-a.


Ok if no regressions but as you might get nagged by the post commit CI ...


Thanks, I've committed it. Those links don't show anything concrete, however I
do note the CI didn't pick up v2.

Btw you're happy with backports if there are no issues reported for a few days?

Cheers,
Wilco


R.


Re: [PATCH] testsuite: Fix gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c

2023-10-24 Thread Richard Earnshaw




On 08/09/2023 09:43, Christophe Lyon via Gcc-patches wrote:

The test was declaring 'int *carry;' and wrote to '*carry' without
initializing 'carry' first, leading to an attempt to write at address
zero, and a crash.

Fix by declaring 'int carry;' and passing '&carry' instead of 'carry'
as parameter.

2023-09-08  Christophe Lyon  

gcc/testsuite/
* gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c: Fix.


OK.

R.


---
  .../arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c | 34 +--
  1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c 
b/gcc/testsuite/gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c
index a8c6cce67c8..931c9d2f30b 100644
--- a/gcc/testsuite/gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c
+++ b/gcc/testsuite/gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c
@@ -7,7 +7,7 @@
  
  volatile int32x4_t c1;

  volatile uint32x4_t c2;
-int *carry;
+int carry;
  
  int

  main ()
@@ -21,45 +21,45 @@ main ()
uint32x4_t inactive2 = vcreateq_u32 (0, 0);
  
mve_pred16_t p = 0x;

-  (*carry) = 0x;
+  carry = 0x;
  
__builtin_arm_set_fpscr_nzcvqc (0);

-  c1 = vadcq (a1, b1, carry);
+  c1 = vadcq (a1, b1, &carry);
if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
  __builtin_abort ();
-  (*carry) = 0x;
+  carry = 0x;
__builtin_arm_set_fpscr_nzcvqc (0);
-  c2 = vadcq (a2, b2, carry);
+  c2 = vadcq (a2, b2, &carry);
if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
  __builtin_abort ();
-  (*carry) = 0x;
+  carry = 0x;
__builtin_arm_set_fpscr_nzcvqc (0);
-  c1 = vsbcq (a1, b1, carry);
+  c1 = vsbcq (a1, b1, &carry);
if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
  __builtin_abort ();
-  (*carry) = 0x;
+  carry = 0x;
__builtin_arm_set_fpscr_nzcvqc (0);
-  c2 = vsbcq (a2, b2, carry);
+  c2 = vsbcq (a2, b2, &carry);
if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
  __builtin_abort ();
-  (*carry) = 0x;
+  carry = 0x;
__builtin_arm_set_fpscr_nzcvqc (0);
-  c1 = vadcq_m (inactive1, a1, b1, carry, p);
+  c1 = vadcq_m (inactive1, a1, b1, &carry, p);
if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
  __builtin_abort ();
-  (*carry) = 0x;
+  carry = 0x;
__builtin_arm_set_fpscr_nzcvqc (0);
-  c2 = vadcq_m (inactive2, a2, b2, carry, p);
+  c2 = vadcq_m (inactive2, a2, b2, &carry, p);
if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
  __builtin_abort ();
-  (*carry) = 0x;
+  carry = 0x;
__builtin_arm_set_fpscr_nzcvqc (0);
-  c1 = vsbcq_m (inactive1, a1, b1, carry, p);
+  c1 = vsbcq_m (inactive1, a1, b1, &carry, p);
if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
  __builtin_abort ();
-  (*carry) = 0x;
+  carry = 0x;
__builtin_arm_set_fpscr_nzcvqc (0);
-  c2 = vsbcq_m (inactive2, a2, b2, carry, p);
+  c2 = vsbcq_m (inactive2, a2, b2, &carry, p);
if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
  __builtin_abort ();
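
The bug class fixed by this patch can be shown without MVE hardware. The sketch below is only an illustration: `add_with_carry` is a hypothetical helper that, like the intrinsics in the test, reads and updates a carry flag through a pointer argument. The caller gives the flag storage of its own and passes its address, which is the shape the fixed test uses.

```c
#include <stdint.h>

/* Hypothetical stand-in for an intrinsic with an in/out carry argument.  */
static uint32_t add_with_carry (uint32_t a, uint32_t b, unsigned *carry)
{
  uint64_t wide = (uint64_t) a + b + (*carry & 1u);
  *carry = (unsigned) (wide >> 32);
  return (uint32_t) wide;
}

/* Add two 64-bit values 32 bits at a time.  The carry is an object with
   real storage; declaring `unsigned *carry;' at file scope instead would
   leave a null pointer, and writing through it is the crash described in
   the patch.  */
static uint64_t add64_by_halves (uint64_t x, uint64_t y)
{
  unsigned carry = 0;
  uint32_t lo = add_with_carry ((uint32_t) x, (uint32_t) y, &carry);
  uint32_t hi = add_with_carry ((uint32_t) (x >> 32), (uint32_t) (y >> 32),
                                &carry);
  return ((uint64_t) hi << 32) | lo;
}
```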
  


Re: [PATCH] config, aarch64: Use a more compatible sed invocation.

2023-10-25 Thread Richard Earnshaw




On 24/10/2023 16:53, Iain Sandoe wrote:

Although this came up initially when working on the Darwin Arm64
port, it also breaks cross-compilers on platforms with non-GNU sed.

Tested on x86_64-darwin X aarch64-linux-gnu, aarch64-darwin,
aarch64-linux-gnu and x86_64-linux-gnu.  OK for master?
thanks,
Iain

--- 8< ---

Currently, the sed commands used to parse --with-{cpu,tune,arch} use a
GNU-specific extension to -e (recognising extended regex).

This is failing on Darwin, which defaults to Posix behaviour for -e.
However '-E' is accepted to indicate an extended RE.  Strictly, this
is also not really sufficient, since we should only require a Posix
sed (but it seems supported for BSD-derivatives).



The man pages I have for linux, freebsd and macos all show something 
pretty similar:


  -e script
  add the script to the commands to be executed

Wording varies slightly, but I think the meaning is clearly the same. 
So this really has nothing to do with extended regexps.
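
To make the basic versus extended distinction concrete, here is a small sketch using the POSIX `regcomp` interface rather than sed itself. It assumes that sed's `-E` switch selects the same extended syntax that `REG_EXTENDED` does; `regex_matches` is an invented helper name, and the pattern is made up for illustration rather than taken from the configure script.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if TEXT matches PATTERN.  CFLAGS is 0 for a POSIX basic
   regular expression or REG_EXTENDED for an extended one.  */
static bool regex_matches (const char *pattern, int cflags, const char *text)
{
  regex_t re;
  if (regcomp (&re, pattern, cflags | REG_NOSUB) != 0)
    return false;
  bool matched = regexec (&re, text, 0, NULL, 0) == 0;
  regfree (&re);
  return matched;
}
```

With `REG_EXTENDED`, a pattern such as `^(cpu|arch)$` treats the parentheses and bar as grouping and alternation, so it matches "arch". Compiled as a basic expression, the same characters are literals and "arch" does not match.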


That means, I think, that we really want '-E -e 

Re: [PATCH v2] AArch64: Improve immediate generation

2023-10-25 Thread Richard Earnshaw




On 24/10/2023 18:27, Wilco Dijkstra wrote:

v2: Use check-function-bodies in tests

Further improve immediate generation by adding support for 2-instruction
MOV/EOR bitmask immediates.  This reduces the number of 3/4-instruction
immediates in SPECCPU2017 by ~2%.

Passes regress, OK for commit?

gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_internal_mov_immediate)
Add support for immediates using MOV/EOR bitmask.

gcc/testsuite:
* gcc.target/aarch64/imm_choice_comparison.c: Change tests.
 * gcc.target/aarch64/moveor_imm.c: Add new test.
 * gcc.target/aarch64/pr106583.c: Change tests.

---

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
578a253d6e0e133e19592553fc873b3e73f9f218..ed5be2b64c9a767d74e9d78415da964c669001aa
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -5748,6 +5748,26 @@ aarch64_internal_mov_immediate (rtx dest, rtx imm, bool 
generate,
}
  return 2;
}
+
+  /* Try 2 bitmask immediates which are xor'd together. */
+  for (i = 0; i < 64; i += 16)
+   {
+ val2 = (val >> i) & mask;
+ val2 |= val2 << 16;
+ val2 |= val2 << 32;
+ if (aarch64_bitmask_imm (val2) && aarch64_bitmask_imm (val ^ val2))
+   break;
+   }
+
+  if (i != 64)
+   {
+ if (generate)
+   {
+ emit_insn (gen_rtx_SET (dest, GEN_INT (val2)));
+ emit_insn (gen_xordi3 (dest, dest, GEN_INT (val ^ val2)));
+   }
+ return 2;
+   }
  }
  
/* Try a bitmask plus 2 movk to generate the immediate in 3 instructions.  */

diff --git a/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c 
b/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c
index 
ebc44d6dbc7287d907603d77d7b54496de177c4b..a1fc90ad73411ae8ed848fa321586afcb8d710aa
 100644
--- a/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c
+++ b/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c
@@ -1,32 +1,64 @@
  /* { dg-do compile } */
  /* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
  
  /* Go from four moves to two.  */
  
+/*

+** foo:
+** mov w[0-9]+, 2576980377
+** movkx[0-9]+, 0x, lsl 32
+** ...
+*/
+
  int
  foo (long long x)
  {
-  return x <= 0x1998;
+  return x <= 0x9998;
  }
  
+/*

+** GT:
+** mov w[0-9]+, -16777217
+** ...
+*/
+
  int
  GT (unsigned int x)
  {
return x > 0xfefe;
  }
  
+/*

+** LE:
+** mov w[0-9]+, -16777217
+** ...
+*/
+
  int
  LE (unsigned int x)
  {
return x <= 0xfefe;
  }
  
+/*

+** GE:
+** mov w[0-9]+, 4278190079
+** ...
+*/
+
  int
  GE (long long x)
  {
return x >= 0xff00;
  }
  
+/*

+** LT:
+** mov w[0-9]+, -16777217
+** ...
+*/
+
  int
  LT (int x)
  {
@@ -35,6 +67,13 @@ LT (int x)
  
  /* Optimize the immediate in conditionals.  */
  
+/*

+** check:
+** ...
+** mov w[0-9]+, -16777217
+** ...
+*/
+
  int
  check (int x, int y)
  {
@@ -44,11 +83,15 @@ check (int x, int y)
return x;
  }
  
+/*

+** tern:
+** ...
+** mov w[0-9]+, -16777217
+** ...
+*/
+
  int
  tern (int x)
  {
return x >= 0xff00 ? 5 : -3;
  }
-
-/* baz produces one movk instruction.  */
-/* { dg-final { scan-assembler-times "movk" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/moveor_imm.c 
b/gcc/testsuite/gcc.target/aarch64/moveor_imm.c
new file mode 100644
index 
..1c0c3f3bf8c588f9661112a8b3f9a72c5ddff95c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/moveor_imm.c
@@ -0,0 +1,63 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/*
+** f1:
+**  movx0, -6148914691236517206
+** eor x0, x0, -9223372036854775807
+** ret
+*/


Some odd white space above.

Also, I think it would be better to write the tests as

** f1:
**   ...
**   
**   ...

Then different prologue and epilogue options (such as BTI or pac-ret) 
won't affect the tests.
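
A sketch of what that suggestion could look like for f2, using the constants already present in this patch. This is an illustration only, not the committed test: the return type is written as `long long` here so that the snippet also behaves on hosts where `long` is 32 bits, and the leading and trailing `...` lines are what allow extra prologue and epilogue instructions.

```c
/* { dg-do compile } */
/* { dg-options "-O2" } */
/* { dg-final { check-function-bodies "**" "" } } */

/*
** f2:
**	...
**	mov	x0, -1085102592571150096
**	eor	x0, x0, -2305843009213693951
**	...
*/

long long
f2 (void)
{
  return 0x10f0f0f0f0f0f0f1;
}
```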



+
+long f1 (void)
+{
+  return 0x2aab;
+}
+
+/*
+** f2:
+** mov x0, -1085102592571150096
+** eor x0, x0, -2305843009213693951
+** ret
+*/
+
+long f2 (void)
+{
+  return 0x10f0f0f0f0f0f0f1;
+}
+
+/*
+** f3:
+** mov x0, -3689348814741910324
+** eor x0, x0, -4611686018427387903
+** ret
+*/
+
+long f3 (void)
+{
+  return 0xccd;
+}
+
+/*
+** f4:
+** mov x0, -7378697629483820647
+** eor x0, x0, -9223372036854775807
+** ret
+*/
+
+long f4 (void)
+{
+  return 0x1998;
+}
+
+/*
+** f5:
+** mov x0, 3689348814741910323
+** eor x0, x0, 864691128656461824
+** ret
+*/
+
+long f5 (void)
+{
+  return 0x3f333f33;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/pr106583.c 
b/gcc/testsuite/gcc.target/aarch64/pr106583.c
index 
0f93

Re: [PATCH 2/2 v2] arm: move the switch tables for Arm to the RO data section

2023-10-30 Thread Richard Earnshaw




On 27/10/2023 15:55, Richard Ball wrote:

v2: Formatting and nits fixed.

Follow up patch to arm: Use deltas for Arm switch tables
This patch moves the switch tables for Arm from the .text section
into the .rodata section.

gcc/ChangeLog:

* config/arm/aout.h: Change to use the Lrtx label.
* config/arm/arm.h (CASE_VECTOR_PC_RELATIVE): Remove arm targets
 from (!target_pure_code) condition.
 (ADDR_VEC_ALIGN): Add align for tables in rodata section.
* config/arm/arm.cc (arm_output_casesi): Alter the function to include
 .Lrtx label and remove adr instructions.
* config/arm/arm.md
 (arm_casesi_internal): Use force_reg to generate ldr instructions that
 would otherwise be out of range, and change rtl to accommodate force 
reg.
 Additionally remove unnecessary register temp.
 (casesi): Remove pure code check for Arm.
* config/arm/elf.h (JUMP_TABLES_IN_TEXT_SECTION): Remove arm
 targets from JUMP_TABLES_IN_TEXT_SECTION definition.

gcc/testsuite/ChangeLog:

* gcc.target/arm/arm-switchstatement.c: Alter the tests to
 change adr instruction to ldr.


OK.

Reviewed-by: rearn...@arm.com

R.


Re: [PR47785] COLLECT_AS_OPTIONS

2019-10-28 Thread Richard Earnshaw
On 28/10/2019 21:52, Bernhard Reutner-Fischer wrote:
> On Mon, 28 Oct 2019 11:53:06 +1100
> Kugan Vivekanandarajah  wrote:
> 
>> On Wed, 23 Oct 2019 at 23:07, Richard Biener  
>> wrote:
> 
>>> Did you try this with multiple assembler options?  I see you stream
>>> them as -Wa,-mfpu=xyz,-mthumb but then compare the whole
>>> option strings so a mismatch with -Wa,-mthumb,-mfpu=xyz would be
> 
> indeed, i'd have expected some kind of sorting, but i don't see it in
> the proposed patch?

Why?  If the options interact with each other, then sorting could change
the meaning.  We could only sort the options if we knew that couldn't
happen.  For a trivial example,
-mcpu=zzz -mcpu=xxx

would override the zzz with xxx, but sorting would switch them around.

And this is just a trivial case, if the options interact but have
different names then you've no idea what must happen unless you are GAS;
and we don't want to build such knowledge into GCC.

So preserve the options, in the order they were given based on the
standard expectations: namely that options on the command line will
override anything built in to the compiler itself.

R.
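
The ordering point can be demonstrated with a few lines of C. This is a toy model, not GCC or GAS code: `effective_cpu` and `sort_options` are invented names, and the only rule modelled is that the last -mcpu= option wins.

```c
#include <stdlib.h>
#include <string.h>

/* Return the value of the last "-mcpu=" option in ARGV, or NULL.  Later
   options override earlier ones.  */
static const char *effective_cpu (const char *const *argv, size_t argc)
{
  const char *cpu = NULL;
  for (size_t i = 0; i < argc; i++)
    if (strncmp (argv[i], "-mcpu=", 6) == 0)
      cpu = argv[i] + 6;
  return cpu;
}

static int compare_options (const void *a, const void *b)
{
  return strcmp (*(const char *const *) a, *(const char *const *) b);
}

/* Sort OPTS in place, which is the normalisation being argued against.  */
static void sort_options (const char **opts, size_t count)
{
  qsort (opts, count, sizeof *opts, compare_options);
}
```

Given the options -mcpu=zzz -mcpu=xxx, `effective_cpu` reports xxx before sorting and zzz afterwards, so comparing sorted option lists would treat two command lines with different meanings as equal.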

> 
>>> diagnosed.  If there's a spec induced -Wa option do we get to see
>>> that as well?  I can imagine -march=xyz enabling a -Wa option
>>> for example.
>>>
>>> + *collect_as = XNEWVEC (char, strlen (args_text) + 1);
>>> + strcpy (*collect_as, args_text);
>>>
>>> there's strdup.  Btw, I'm not sure why you don't simply leave
>>> the -Wa option in the merged options [individually] and match
>>> them up but go the route of comparing strings and carrying that
>>> along separately.  I think that would be much better.
>>
>> Is the attached patch, which does this, OK?
> 
>> +  obstack_init (&collect_obstack);
>> +  obstack_grow (&collect_obstack, "COLLECT_AS_OPTIONS=",
>> +sizeof ("COLLECT_AS_OPTIONS=") - 1);
>> +  obstack_grow (&collect_obstack, "-Wa,", strlen ("-Wa,"));
> 
> Why don't you grow once, including the "-Wa," ?
> 
>> +/* Append options OPTS from -Wa, options to ARGV_OBSTACK.  */
>> +
>> +static void
>> +append_compiler_wa_options (obstack *argv_obstack,
>> +struct cl_decoded_option *opts,
>> +unsigned int count)
>> +{
>> +  static const char *collect_as;
>> +  for (unsigned int j = 1; j < count; ++j)
>> +{
>> +  struct cl_decoded_option *option = &opts[j];
> 
> Instead of the switch below, why not just
> 
> if (option->opt_index != OPT_Wa_)
>   continue;
> 
> here?
> 
>> +  if (j == 1)
>> +collect_as = NULL;
> 
> or at least here?
> 
> (why's collect_as static in the first place? wouldn't that live in the parent 
> function?)
> 
>> +  const char *args_text = option->orig_option_with_args_text;
>> +  switch (option->opt_index)
>> +{
>> +case OPT_Wa_:
>> +  break;
>> +default:
>> +  continue;
>> +}



Re: [wwwdocs] Recommend reviewing local changes before pushing them

2020-01-14 Thread Richard Earnshaw
On 14/01/2020 13:37, Jonathan Wakely wrote:
> I really think people should be reviewing what they're about to push
> before doing it.
> 
> OK for wwwdocs?
> 

I'd recommend

git push origin HEAD:

rather than just 'git push'

Otherwise, OK.

R.


Re: [wwwdocs] Fix indentation of .ssh/config snippet

2020-01-14 Thread Richard Earnshaw
On 14/01/2020 13:18, Jonathan Wakely wrote:
> OK for wwwdocs?
> 
> 
OK.


[PATCH] arm: correct constraints on movsi_compare0 [PR91913]

2020-02-10 Thread Richard Earnshaw

The peephole that detects a mov of one register to another followed by
a comparison of the original register against zero is only used in Arm
state; but the instruction that matches this is generic to all 32-bit
compilation states.  That instruction lacks support for SP which is
permitted in Arm state, but has restrictions in Thumb2 code.

This patch fixes the problem by allowing SP when in ARM state for all
registers; in Thumb state it allows SP only as a source when the
register really is copied to another target.

* config/arm/arm.md (movsi_compare0): Allow SP as a source register
in Thumb state and also as a destination in Arm state.  Add T16
variants.
---
 gcc/config/arm/arm.md | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 5baf82d2ad6..ab277996462 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -6627,16 +6627,21 @@ (define_expand "builtin_setjmp_receiver"
 
 (define_insn "*movsi_compare0"
   [(set (reg:CC CC_REGNUM)
-	(compare:CC (match_operand:SI 1 "s_register_operand" "0,r")
+	(compare:CC (match_operand:SI 1 "s_register_operand" "0,0,l,rk,rk")
 		(const_int 0)))
-   (set (match_operand:SI 0 "s_register_operand" "=r,r")
+   (set (match_operand:SI 0 "s_register_operand" "=l,rk,l,r,rk")
 	(match_dup 1))]
   "TARGET_32BIT"
   "@
cmp%?\\t%0, #0
+   cmp%?\\t%0, #0
+   subs%?\\t%0, %1, #0
+   subs%?\\t%0, %1, #0
subs%?\\t%0, %1, #0"
   [(set_attr "conds" "set")
-   (set_attr "type" "alus_imm,alus_imm")]
+   (set_attr "arch" "t2,*,t2,t2,a")
+   (set_attr "type" "alus_imm")
+   (set_attr "length" "2,4,2,4,4")]
 )
 
 ;; Subroutine to store a half word from a register into memory.


[PATCH] arm: check for low register before applying peephole [PR113510]

2024-03-05 Thread Richard Earnshaw

For thumb1, when using a peephole to fuse

mov reg, #const
add reg, reg, SP

into

add reg, SP, #const

we must first check that reg is a low register, otherwise we will ICE
when trying to recognize the resulting insn.

gcc/ChangeLog:

PR target/113510
* config/arm/thumb1.md (peephole2 to fuse mov imm/add SP): Use
low_register_operand.
---

This appears to have gone latent again, but checked against the known
failing version.

 gcc/config/arm/thumb1.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
index 14d6df580af..d7074b43f60 100644
--- a/gcc/config/arm/thumb1.md
+++ b/gcc/config/arm/thumb1.md
@@ -113,7 +113,7 @@ (define_insn_and_split "*thumb1_addsi3"
 ;; Reloading and elimination of the frame pointer can
 ;; sometimes cause this optimization to be missed.
 (define_peephole2
-  [(set (match_operand:SI 0 "arm_general_register_operand" "")
+  [(set (match_operand:SI 0 "low_register_operand" "")
 	(match_operand:SI 1 "const_int_operand" ""))
(set (match_dup 0)
 	(plus:SI (match_dup 0) (reg:SI SP_REGNUM)))]


[PATCH] gomp: testsuite: improve compatibility of bad-array-section-3.c [PR113428]

2024-03-06 Thread Richard Earnshaw

This test generates different warnings on ilp32 targets because the size
of an integer matches the size of a pointer.  Avoid this by using
signed char.

gcc/testsuite:

PR testsuite/113428
* gcc.dg/gomp/bad-array-section-c-3.c: Use signed char instead
of int.
---

I think this fixes the issues seen on ilp32 machines, without substantially
changing what the test does, but a second set of eyes wouldn't hurt.

 gcc/testsuite/gcc.dg/gomp/bad-array-section-c-3.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/gomp/bad-array-section-c-3.c b/gcc/testsuite/gcc.dg/gomp/bad-array-section-c-3.c
index 8be15ced8c0..431af71c422 100644
--- a/gcc/testsuite/gcc.dg/gomp/bad-array-section-c-3.c
+++ b/gcc/testsuite/gcc.dg/gomp/bad-array-section-c-3.c
@@ -1,15 +1,15 @@
 /* { dg-do compile } */
 
 struct S {
-  int *ptr;
+  signed char *ptr;
 };
 
 int main()
 {
-  int arr[20];
+  signed char arr[20];
 
   /* Reject array section in compound initialiser.  */
-#pragma omp target map( (struct S) { .ptr = (int *) arr[5:5] } )
+#pragma omp target map( (struct S) { .ptr = (signed char *) arr[5:5] } )
 /* { dg-error {expected '\]' before ':' token} "" { target *-*-* } .-1 } */
 /* { dg-warning {cast to pointer from integer of different size} "" { target *-*-* } .-2 } */
 /* { dg-message {sorry, unimplemented: unsupported map expression} "" { target *-*-* } .-3 } */


[COMMITTED] arm: testsuite: tweak bics_3.c [PR113542]

2024-03-08 Thread Richard Earnshaw

This test was too simple, which meant that the compiler was sometimes
able to find a better optimization of the code than using a BICS
instruction.  Fix this by changing the test slightly to produce a
sequence where BICS should always be the preferred solution.

gcc/testsuite:
PR target/113542
* gcc.target/arm/bics_3.c: Adjust code to something which should
always result in BICS.
---
 gcc/testsuite/gcc.target/arm/bics_3.c | 19 ---
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/bics_3.c b/gcc/testsuite/gcc.target/arm/bics_3.c
index e056b264e15..4d6938948a1 100644
--- a/gcc/testsuite/gcc.target/arm/bics_3.c
+++ b/gcc/testsuite/gcc.target/arm/bics_3.c
@@ -2,13 +2,11 @@
 /* { dg-options "-O2 --save-temps -fno-inline" } */
 /* { dg-require-effective-target arm32 } */
 
-extern void abort (void);
-
 int
 bics_si_test (int a, int b)
 {
-  if (a & ~b)
-return 1;
+  if ((a & ~b) >= 0)
+return 3;
   else
 return 0;
 }
@@ -16,8 +14,8 @@ bics_si_test (int a, int b)
 int
 bics_si_test2 (int a, int b)
 {
-  if (a & ~ (b << 2))
-return 1;
+  if ((a & ~ (b << 2)) >= 0)
+return 3;
   else
 return 0;
 }
@@ -28,13 +26,12 @@ main (void)
   int a = 5;
   int b = 5;
   int c = 20;
-  if (bics_si_test (a, b))
-abort ();
-  if (bics_si_test2 (c, b))
-abort ();
+  if (bics_si_test (a, b) != 3)
+__builtin_abort ();
+  if (bics_si_test2 (c, b) != 3)
+__builtin_abort ();
   return 0;
 }
 
 /* { dg-final { scan-assembler-times "bics\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+" 2 } } */
 /* { dg-final { scan-assembler-times "bics\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+, .sl #2" 1 } } */
-


Re: [PATCH] [testsuite] Fixup dg-options in {gcc, g++, gfortran}.dg/vect.exp tests

2024-03-13 Thread Richard Earnshaw




On 13/03/2024 10:58, Maxim Kuvyrkov wrote:

This patch has been tested on
- aarch64-linux-gnu
- arm-linux-gnueabihf (VFP, NEON disabled by default),
- arm-none-eabi (Soft-FP)
with the following [expected] differences in the test results:

   - FAIL now PASS [FAIL => PASS]:
   Executed from: gcc:gcc.dg/vect/vect.exp
 gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c (test for excess errors)
 gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c -flto -ffat-lto-objects 
(test for excess errors)

   - UNSUPPORTED disappears[UNSUP=> ]:
   Executed from: g++:g++.dg/vect/vect.exp
 g++:g++.dg/vect/vect.exp=g++.dg/vect/pr84556.cc  -std=gnu++98

   - UNSUPPORTED appears   [ =>UNSUP]:
   Executed from: g++:g++.dg/vect/vect.exp
 g++:g++.dg/vect/vect.exp=g++.dg/vect/pr84556.cc  -std=c++98

   - UNRESOLVED disappears [UNRES=> ]:
   Executed from: gcc:gcc.dg/vect/vect.exp
 gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c -flto -ffat-lto-objects 
compilation failed to produce executable
 gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c compilation failed to 
produce executable

This patch was motivated by gcc.dg/vect/pr113576.c, which currently
fails to compile for ARM targets without NEON.

=== CUT ===

Testsuites driven by vect.exp rely on check_vect_support_and_set_flags
to set appropriate DEFAULT_VECTFLAGS for a given target (e.g., add
-mfpu=neon for arm-linux-gnueabi).  Unfortunately, these flags are
overwritten by dg-options directive, which can cause tests to fail.

Behavior of dg-options is documented in vect.exp files, but not
all developers look at the .exp file when adding a new testcase.
This caused a few dg-options directives to be used instead of
the more appropriate dg-additional-options.

This patch changes target-independent dg-options into
dg-additional-options.  This patch does not touch target-specific
dg-options and target-specific tests to avoid disturbing the gentle
balance of target-specific vectorization.

This patch also removes a couple of unneeded "dg-do run" directives
to avoid failures on compile-only targets.  Default action is, again,
set by check_vect_support_and_set_flags.

Lastly, I avoided renaming tests that use -O options to O-*
filename format because this support is not consistent between
gcc.dg/vect/, g++.dg/vect/, and gfortran.dg/vect/ testsuites.
It seems dg-additional-options is cleaner.

This patch does the following,
1. do not change target-specific tests, e.g., gcc.dg/vect/costmodel/riscv/*;
2. do not change { dg-options FOO { target { target-*-pattern } } };
3. do not remove { dg-do run { target { target-*-pattern } } };
4. change { dg-options FOO } to { dg-additional-options FOO };
5. remove { dg-do run } in several tests, where it is clearly not needed.

gcc/testsuite/ChangeLog:

PR testsuite/114307
* g++.dg/vect/pr84556.cc: Fixup.
* gcc.dg/vect/complex/complex-operations-run.c Fixup.
* gcc.dg/vect/gimplefe-40.c Fixup.
* gcc.dg/vect/gimplefe-41.c Fixup.
* gcc.dg/vect/pr101145inf.c Fixup.
* gcc.dg/vect/pr101145inf_1.c Fixup.
* gcc.dg/vect/pr108316.c Fixup.
* gcc.dg/vect/pr109011-1.c Fixup.
* gcc.dg/vect/pr109011-2.c Fixup.
* gcc.dg/vect/pr109011-3.c Fixup.
* gcc.dg/vect/pr109011-4.c Fixup.
* gcc.dg/vect/pr109011-5.c Fixup.
* gcc.dg/vect/pr111846.c Fixup.
* gcc.dg/vect/pr111860-2.c Fixup.
* gcc.dg/vect/pr111860-3.c Fixup.
* gcc.dg/vect/pr113002.c Fixup.
* gcc.dg/vect/pr113576.c Fixup.
* gcc.dg/vect/pr84711.c Fixup.
* gcc.dg/vect/pr85597.c Fixup.
* gcc.dg/vect/pr88497-1.c Fixup.
* gcc.dg/vect/pr88497-2.c Fixup.
* gcc.dg/vect/pr88497-3.c Fixup.
* gcc.dg/vect/pr88497-4.c Fixup.
* gcc.dg/vect/pr88497-5.c Fixup.
* gcc.dg/vect/pr88497-7.c Fixup.
* gcc.dg/vect/pr92347.c Fixup.
* gcc.dg/vect/pr93069.c Fixup.
* gcc.dg/vect/pr97241.c Fixup.
* gcc.dg/vect/pr99102.c Fixup.
* gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c Fixup.
* gcc.dg/vect/vect-early-break_65.c Fixup.
* gcc.dg/vect/vect-fold-1.c Fixup.
* gcc.dg/vect/vect-ifcvt-19.c Fixup.
* gcc.dg/vect/vect-ifcvt-20.c Fixup.
* gcc.dg/vect/vect-reduc-epilogue-gaps.c Fixup.
* gcc.dg/vect/vect-singleton_1.c Fixup.
* gfortran.dg/vect/fast-math-mgrid-resid.f Fixup.
* gfortran.dg/vect/pr77848.f Fixup.
* gfortran.dg/vect/pr90913.f90 Fixup.


Thanks for looking into this, I agree that changing to 
dg-additional-options looks the right choice.


The only thing to be wary of is that later 'dg-options' directives may 
override dg-additional-options directives; you might want to test at 
least one target where there are target-specific dg-options that you've 
not modified.
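
For illustration, a vect.exp testcase written in the style the patch moves towards might look like the sketch below. The function and the `-O3` flag are placeholders and do not come from any of the listed tests. The file carries no `dg-do` line and no plain `dg-options`, so the flags and default action chosen by check_vect_support_and_set_flags stay in effect and the test-specific flag is appended to them.

```c
/* No dg-do line: the default action is left to vect.exp.  */
/* { dg-additional-options "-O3" } */

#define N 64

int
sum_array (const int *a)
{
  int sum = 0;
  for (int i = 0; i < N; i++)
    sum += a[i];
  return sum;
}
```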


The patch is OK, but the ChangeLog is not!  Fixup doesn't t

Re: [PATCH][GCC] aarch64: Fix SCHEDULER_IDENT for Cortex-A520

2024-03-13 Thread Richard Earnshaw




On 12/03/2024 14:08, Richard Ball wrote:

The SCHEDULER_IDENT for this CPU was incorrectly
set to cortexa55. This can cause
sub-optimal asm to be generated.

Ok for trunk?

gcc/ChangeLog:
PR target/114272
* config/aarch64/aarch64-cores.def (AARCH64_CORE):
Change SCHEDULER_IDENT from cortexa55 to cortexa53
for Cortex-A520.


I don't see having this as a separate patch to the one for Cortex-A510 
as having any value.


Please merge the two together.  A merged patch is pre-approved.

R.


Re: [PATCH v2] [testsuite] Fixup dg-options in {gcc, g++, gfortran}.dg/vect.exp tests

2024-03-13 Thread Richard Earnshaw




On 13/03/2024 12:12, Maxim Kuvyrkov wrote:

Changes in v2:
- Better changelog entry.
- NFC.


This patch has been tested on
- aarch64-linux-gnu
- arm-linux-gnueabihf (VFP, NEON disabled by default),
- arm-none-eabi (Soft-FP)
with the following [expected] differences in the test results:

   - FAIL now PASS [FAIL => PASS]:
   Executed from: gcc:gcc.dg/vect/vect.exp
 gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c (test for excess errors)
 gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c -flto -ffat-lto-objects 
(test for excess errors)

   - UNSUPPORTED disappears[UNSUP=> ]:
   Executed from: g++:g++.dg/vect/vect.exp
 g++:g++.dg/vect/vect.exp=g++.dg/vect/pr84556.cc  -std=gnu++98

   - UNSUPPORTED appears   [ =>UNSUP]:
   Executed from: g++:g++.dg/vect/vect.exp
 g++:g++.dg/vect/vect.exp=g++.dg/vect/pr84556.cc  -std=c++98

   - UNRESOLVED disappears [UNRES=> ]:
   Executed from: gcc:gcc.dg/vect/vect.exp
 gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c -flto -ffat-lto-objects 
compilation failed to produce executable
 gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c compilation failed to 
produce executable

This patch was motivated by gcc.dg/vect/pr113576.c, which currently
fails to compile for ARM targets without NEON.

=== CUT ===

Testsuites driven by vect.exp rely on check_vect_support_and_set_flags
to set appropriate DEFAULT_VECTFLAGS for a given target (e.g., add
-mfpu=neon for arm-linux-gnueabi).  Unfortunately, these flags are
overwritten by dg-options directive, which can cause tests to fail.

Behavior of dg-options is documented in vect.exp files, but not
all developers look at the .exp file when adding a new testcase.
This caused a few dg-options directives to be used instead of
the more appropriate dg-additional-options.

This patch changes target-independent dg-options into
dg-additional-options.  This patch does not touch target-specific
dg-options and target-specific tests to avoid disturbing the gentle
balance of target-specific vectorization.

This patch also removes a couple of unneeded "dg-do run" directives
to avoid failures on compile-only targets.  Default action is, again,
set by check_vect_support_and_set_flags.

Lastly, I avoided renaming tests that use -O options to O-*
filename format because this support is not consistent between
gcc.dg/vect/, g++.dg/vect/, and gfortran.dg/vect/ testsuites.
It seems dg-additional-options is cleaner.

This patch does the following,
1. do not change target-specific tests, e.g., gcc.dg/vect/costmodel/riscv/*;
2. do not change { dg-options FOO { target { target-*-pattern } } };
3. do not remove { dg-do run { target { target-*-pattern } } };
4. change { dg-options FOO } to { dg-additional-options FOO };
5. remove { dg-do run } in several tests, where it is clearly not needed.
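
To make items 4 and 5 concrete, here is a minimal sketch of what a converted test might look like.  The function name is hypothetical and -ffast-math is only a placeholder for whatever test-specific flag a real test needs.  The point is that dg-additional-options appends to the DEFAULT_VECTFLAGS chosen by check_vect_support_and_set_flags instead of replacing them.

```c
/* Hypothetical gcc.dg/vect/ style test, for illustration only.  There is
   no dg-do directive: the default action is left to
   check_vect_support_and_set_flags.  */
/* { dg-additional-options "-ffast-math" } */

#define N 16

/* Sum N floats.  A float reduction like this usually needs
   reassociation to be allowed before it can be vectorized, which is
   why the sketch passes a test-specific flag.  */
float
sum_floats (const float *in)
{
  float sum = 0.0f;
  for (int i = 0; i < N; i++)
    sum += in[i];
  return sum;
}
```

Had the directive been dg-options, a target flag such as -mfpu=neon added by the harness would have been dropped for this file.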

gcc/testsuite/ChangeLog:

PR testsuite/114307
* gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: Remove dg-run.
* gcc.dg/vect/complex/complex-operations-run.c: Likewise.
* gcc.dg/vect/pr113576.c: Remove dg-run.  Use dg-additional-options for
test-specific flags.
* gcc.dg/vect/gimplefe-40.c: Use dg-additional-options for
test-specific flags.
* gcc.dg/vect/gimplefe-41.c: Likewise.
* gcc.dg/vect/pr101145inf.c: Likewise.
* gcc.dg/vect/pr101145inf_1.c: Likewise.
* gcc.dg/vect/pr108316.c: Likewise.
* gcc.dg/vect/pr109011-1.c: Likewise.
* gcc.dg/vect/pr109011-2.c: Likewise.
* gcc.dg/vect/pr109011-3.c: Likewise.
* gcc.dg/vect/pr109011-4.c: Likewise.
* gcc.dg/vect/pr109011-5.c: Likewise.
* gcc.dg/vect/pr111846.c: Likewise.
* gcc.dg/vect/pr111860-2.c: Likewise.
* gcc.dg/vect/pr111860-3.c: Likewise.
* gcc.dg/vect/pr113002.c: Likewise.
* gcc.dg/vect/pr84711.c: Likewise.
* gcc.dg/vect/pr85597.c: Likewise.
* gcc.dg/vect/pr88497-1.c: Likewise.
* gcc.dg/vect/pr88497-2.c: Likewise.
* gcc.dg/vect/pr88497-3.c: Likewise.
* gcc.dg/vect/pr88497-4.c: Likewise.
* gcc.dg/vect/pr88497-5.c: Likewise.
* gcc.dg/vect/pr88497-7.c: Likewise.
* gcc.dg/vect/pr92347.c: Likewise.
* gcc.dg/vect/pr93069.c: Likewise.
* gcc.dg/vect/pr97241.c: Likewise.
* gcc.dg/vect/pr99102.c: Likewise.
* gcc.dg/vect/vect-early-break_65.c: Likewise.
* gcc.dg/vect/vect-fold-1.c: Likewise.
* gcc.dg/vect/vect-ifcvt-19.c: Likewise.
* gcc.dg/vect/vect-ifcvt-20.c: Likewise.
* gcc.dg/vect/vect-reduc-epilogue-gaps.c: Likewise.
* gcc.dg/vect/vect-singleton_1.c: Likewise.
* g++.dg/vect/pr84556.cc: Likewise.
* gfortran.dg/vect/fast-math-mgrid-resid.f: Likewise.
* gfortran.dg/vect/pr77848.f: Likewise.
* gfortran.dg/vect/pr90913.f90: Likewise.


OK.

(I wonder how many of the target-specific additional options are 

[PATCH] arm: testsuite: fix issues relating to fp16 alternative testing

2024-02-08 Thread Richard Earnshaw

The v*_fp16_xN_1.c tests on Arm have been unstable since they were
added.  This is not a problem with the tests themselves, or even the
patches that were added, but with the testsuite infrastructure.  It
turned out that another set of dg- tests for fp16 were corrupting the
cached set of options used by the new tests, leading to running the
tests with incorrect flags.

So the primary goal of this patch is to fix the incorrect internal
caching of the options needed to enable fp16 alternative format on
Arm: the code was storing the result in the same variable that was
being used for neon_fp16 and this was leading to testsuite instability
for tests that were checking for neon with fp16.
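
The failure mode is easier to see in miniature.  The real code is Tcl in lib/target-supports.exp; the C below is only an analogy with made-up function names and placeholder flag strings.  It shows two cached queries that share one storage slot, so whichever query runs first decides what both return.

```c
#include <stddef.h>

/* Buggy arrangement: both queries cache into the same slot.  */
static const char *shared_flags = NULL;

const char *
buggy_neon_fp16_flags (void)
{
  if (shared_flags == NULL)
    shared_flags = "-mfpu=neon-fp16";		/* placeholder value */
  return shared_flags;
}

const char *
buggy_fp16_alternative_flags (void)
{
  if (shared_flags == NULL)
    shared_flags = "-mfp16-format=alternative";	/* placeholder value */
  return shared_flags;
}

/* Fixed arrangement: one slot per query, so the call order no longer
   matters.  */
static const char *neon_fp16_flags = NULL;
static const char *fp16_alternative_flags = NULL;

const char *
fixed_neon_fp16_flags (void)
{
  if (neon_fp16_flags == NULL)
    neon_fp16_flags = "-mfpu=neon-fp16";
  return neon_fp16_flags;
}

const char *
fixed_fp16_alternative_flags (void)
{
  if (fp16_alternative_flags == NULL)
    fp16_alternative_flags = "-mfp16-format=alternative";
  return fp16_alternative_flags;
}
```

With the buggy pair, calling the alternative-format query first makes the neon_fp16 query hand back the alternative-format flags, which matches the kind of order-dependent instability described above.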

But in cleaning this up I also noted that we weren't then applying the
flags correctly having detected what they were, so we also address
that.

I suspect there are still some further issues to address here, since
the framework does not correctly test that the multilibs and startup
code enable alternative format; but this is still an improvement over
what we had before.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp
(check_effective_target_arm_fp16_alternative_ok_nocache): Use
et_arm_fp16_alternative_flags to cache the result.  Improve test
for FP16 availability.
(add_options_for_arm_fp16_alternative): Use
et_arm_fp16_alternative_flags.
* g++.dg/ext/arm-fp16/arm-fp16-ops-3.C: Update dg-* flags.
* g++.dg/ext/arm-fp16/arm-fp16-ops-4.C: Likewise.
* gcc.dg/torture/arm-fp16-int-convert-alt.c: Likewise.
* gcc.dg/torture/arm-fp16-ops-3.c: Likewise.
* gcc.dg/torture/arm-fp16-ops-4.c: Likewise.
* gcc.target/arm/fp16-aapcs-3.c: Likewise.
* gcc.target/arm/fp16-aapcs-4.c: Likewise.
* gcc.target/arm/fp16-compile-alt-1.c: Likewise.
* gcc.target/arm/fp16-compile-alt-10.c: Likewise.
* gcc.target/arm/fp16-compile-alt-11.c: Likewise.
* gcc.target/arm/fp16-compile-alt-12.c: Likewise.
* gcc.target/arm/fp16-compile-alt-2.c: Likewise.
* gcc.target/arm/fp16-compile-alt-3.c: Likewise.
* gcc.target/arm/fp16-compile-alt-4.c: Likewise.
* gcc.target/arm/fp16-compile-alt-5.c: Likewise.
* gcc.target/arm/fp16-compile-alt-6.c: Likewise.
* gcc.target/arm/fp16-compile-alt-7.c: Likewise.
* gcc.target/arm/fp16-compile-alt-8.c: Likewise.
* gcc.target/arm/fp16-compile-alt-9.c: Likewise.
* gcc.target/arm/fp16-rounding-alt-1.c: Likewise.
---
 .../g++.dg/ext/arm-fp16/arm-fp16-ops-3.C |  2 +-
 .../g++.dg/ext/arm-fp16/arm-fp16-ops-4.C |  3 ++-
 .../gcc.dg/torture/arm-fp16-int-convert-alt.c|  2 +-
 gcc/testsuite/gcc.dg/torture/arm-fp16-ops-3.c|  2 +-
 gcc/testsuite/gcc.dg/torture/arm-fp16-ops-4.c|  3 ++-
 gcc/testsuite/gcc.target/arm/fp16-aapcs-3.c  |  3 ++-
 gcc/testsuite/gcc.target/arm/fp16-aapcs-4.c  |  3 ++-
 .../gcc.target/arm/fp16-compile-alt-1.c  |  2 +-
 .../gcc.target/arm/fp16-compile-alt-10.c |  3 ++-
 .../gcc.target/arm/fp16-compile-alt-11.c |  3 ++-
 .../gcc.target/arm/fp16-compile-alt-12.c |  2 +-
 .../gcc.target/arm/fp16-compile-alt-2.c  |  2 +-
 .../gcc.target/arm/fp16-compile-alt-3.c  |  2 +-
 .../gcc.target/arm/fp16-compile-alt-4.c  |  2 +-
 .../gcc.target/arm/fp16-compile-alt-5.c  |  2 +-
 .../gcc.target/arm/fp16-compile-alt-6.c  |  2 +-
 .../gcc.target/arm/fp16-compile-alt-7.c  |  3 ++-
 .../gcc.target/arm/fp16-compile-alt-8.c  |  2 +-
 .../gcc.target/arm/fp16-compile-alt-9.c  |  2 +-
 .../gcc.target/arm/fp16-rounding-alt-1.c |  4 +++-
 gcc/testsuite/lib/target-supports.exp| 16 
 21 files changed, 37 insertions(+), 28 deletions(-)

diff --git a/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops-3.C b/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops-3.C
index 29080c7514f..5eceb3074df 100644
--- a/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops-3.C
+++ b/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops-3.C
@@ -1,6 +1,6 @@
 /* Test various operators on __fp16 and mixed __fp16/float operands.  */
 /* { dg-do run { target arm*-*-* } } */
 /* { dg-require-effective-target arm_fp16_alternative_ok } */
-/* { dg-options "-mfp16-format=alternative" } */
+/* { dg-add-options arm_fp16_alternative } */
 
 #include "arm-fp16-ops.h"
diff --git a/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops-4.C b/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops-4.C
index 4be8883faad..d86019f1469 100644
--- a/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops-4.C
+++ b/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops-4.C
@@ -1,6 +1,7 @@
 /* Test various operators on __fp16 and mixed __fp16/float operands.  */
 /* { dg-do run { target arm*-*-* } } */
 /* { dg-require-effective-target arm_fp16_alternative_ok } */
-/* { dg-options "-mfp16-format=alternative -ffast-math" } */
+/* { dg-options "-ffast-math" } */
+/* { dg-add-opt

Re: [PATCH] testsuite: Disable test for incompatible Arm targets

2024-02-13 Thread Richard Earnshaw




On 13/02/2024 10:44, Torbjörn SVENSSON wrote:

Ok for trunk and releases/gcc-13?

The alternative approach (which changes the result a bit) is to drop
the special treatment for arm*-*-*.  I'm not sure whether that is preferred,
or whether to just disable the test for incompatible flags for arm*-*-*.

--

The test assumes it's okay to supply -march=armv7-a+simd, but it depends
on what target you are running the tests for.  For example, running the
GCC testsuite for Cortex-M0 produces the following entry in the logs:


Running the testsuite with -mcpu= in runtest/site.exp flags will uncover 
a whole host of problems with tests that try to specify an architecture. 
 It's essentially broken/unsupported at present.


I have some ideas for how to fix this properly, but they will have to 
wait for gcc-15 now.  In the mean time, I'd rather we didn't try to 
paper over the problem by putting random changes into the tests right now.


R.



Testing gcc.dg/pr41574.c
doing compile
Executing on host: arm-none-eabi-gcc .../pr41574.c  -mthumb -march=armv6s-m 
-mcpu=cortex-m0 -mfloat-abi=soft   -fdiagnostics-plain-output  -O2 
-march=armv7-a -mfloat-abi=softfp -mfpu=neon -fno-unsafe-math-optimizations 
-fdump-rtl-combine -ffat-lto-objects -S -o pr41574.s(timeout = 800)
spawn -ignore SIGHUP arm-none-eabi-gcc .../pr41574.c -mthumb -march=armv6s-m 
-mcpu=cortex-m0 -mfloat-abi=soft -fdiagnostics-plain-output -O2 -march=armv7-a 
-mfloat-abi=softfp -mfpu=neon -fno-unsafe-math-optimizations -fdump-rtl-combine 
-ffat-lto-objects -S -o pr41574.s
pid is 9799 -9799
cc1: warning: switch '-mcpu=cortex-m0' conflicts with switch 
'-march=armv7-a+simd'
pid is -1
output is cc1: warning: switch '-mcpu=cortex-m0' conflicts with switch 
'-march=armv7-a+simd'
  status 0
FAIL: gcc.dg/pr41574.c (test for excess errors)
Excess errors:
cc1: warning: switch '-mcpu=cortex-m0' conflicts with switch 
'-march=armv7-a+simd'
PASS: gcc.dg/pr41574.c scan-rtl-dump-not combine "\\(plus:DF \\(mult:DF"

Patch has been verified on Linux.

gcc/testsuite/ChangeLog:

* gcc.dg/pr41574.c: Disable test for Arm targets incompatible
with -march=armv7-a+simd.

Signed-off-by: Torbjörn SVENSSON 
---
  gcc/testsuite/gcc.dg/pr41574.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/pr41574.c b/gcc/testsuite/gcc.dg/pr41574.c
index 062c0044532..f6af0c34273 100644
--- a/gcc/testsuite/gcc.dg/pr41574.c
+++ b/gcc/testsuite/gcc.dg/pr41574.c
@@ -1,6 +1,7 @@
  /* { dg-do compile } */
-/* { dg-options "-O2 -march=armv7-a -mfloat-abi=softfp -mfpu=neon 
-fno-unsafe-math-optimizations -fdump-rtl-combine" { target { arm*-*-* } } } */
-/* { dg-options "-O2 -fno-unsafe-math-optimizations -fdump-rtl-combine" { 
target { ! arm*-*-* } } } */
+/* { dg-options "-O2 -fno-unsafe-math-optimizations -fdump-rtl-combine" } */
+/* { dg-require-effective-target arm_arch_v7a_neon_multilib { target { 
arm*-*-* } } } */
+/* { dg-additional-options "-march=armv7-a -mfloat-abi=softfp -mfpu=neon" { 
target { arm*-*-* } } } */
  
  
  static const double one=1.0;


[committed] arm: fix ICE with vectorized reciprocal division [PR108120]

2024-02-23 Thread Richard Earnshaw

The expand pattern for reciprocal division was enabled for all math
optimization modes, but the patterns it was generating were not
enabled unless -funsafe-math-optimizations was enabled; this leads to
an ICE when the pattern we generate cannot be recognized.

Fixed by only enabling vector division when doing unsafe math.
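
For readers wondering why this expansion belongs under unsafe math: a reciprocal-based division replaces a / b with a * (1 / b), where 1 / b starts from an estimate and is refined by Newton-Raphson steps, so the result need not match a correctly rounded division.  The scalar sketch below shows the general technique only.  The function names, the caller-supplied estimate and the step count are illustrative assumptions; it is not the RTL that neon.md generates.

```c
/* One Newton-Raphson refinement of x ~= 1/d: x' = x * (2 - d * x).  */
static float
recip_step (float d, float x)
{
  return x * (2.0f - d * x);
}

/* Approximate a / d by refining an initial estimate of 1/d STEPS times
   and multiplying.  The result is close to, but not necessarily equal
   to, the correctly rounded quotient.  */
float
approx_div (float a, float d, float estimate, int steps)
{
  float x = estimate;
  for (int i = 0; i < steps; i++)
    x = recip_step (d, x);
  return a * x;
}
```

For example, approx_div (1.0f, 4.0f, 0.2f, 4) converges towards 0.25 from below.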

gcc:

PR target/108120
* config/arm/neon.md (div<VCVTF:mode>3): Rename from div<mode>3.
Gate with ARM_HAVE_NEON_<MODE>_ARITH.

gcc/testsuite:
PR target/108120
* gcc.target/arm/neon-recip-div-1.c: New file.
---
 gcc/config/arm/neon.md  |  4 ++--
 gcc/testsuite/gcc.target/arm/neon-recip-div-1.c | 16 
 2 files changed, 18 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/neon-recip-div-1.c

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 17c90f436c6..fa4a7aeda35 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -553,11 +553,11 @@ (define_insn "*mul3_neon"
Enabled with -funsafe-math-optimizations -freciprocal-math
and disabled for -Os since it increases code size .  */
 
-(define_expand "div<mode>3"
+(define_expand "div<VCVTF:mode>3"
   [(set (match_operand:VCVTF 0 "s_register_operand")
 (div:VCVTF (match_operand:VCVTF 1 "s_register_operand")
 		  (match_operand:VCVTF 2 "s_register_operand")))]
-  "TARGET_NEON && !optimize_size
+  "ARM_HAVE_NEON_<MODE>_ARITH && !optimize_size
&& flag_reciprocal_math"
   {
 rtx rec = gen_reg_rtx (<MODE>mode);
diff --git a/gcc/testsuite/gcc.target/arm/neon-recip-div-1.c b/gcc/testsuite/gcc.target/arm/neon-recip-div-1.c
new file mode 100644
index 000..e15c3ca5fe9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-recip-div-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O3 -freciprocal-math -fno-unsafe-math-optimizations -save-temps" } */
+/* { dg-add-options arm_neon } */
+
+int *a;
+int n;
+void b() {
+  int c;
+  for (c = 0; c < 10; c++)
+a[c] = (float)c / n;
+}
+/* We should not ICE, or get a vectorized reciprocal instruction when unsafe
+   math optimizations are disabled.  */
+/* { dg-final { scan-assembler-not "vrecpe\\.f32\\t\[qd\].*" } } */
+/* { dg-final { scan-assembler-not "vrecps\\.f32\\t\[qd\].*" } } */


[PATCH] arm: warn about deprecation of iwmmx in mmintrin.h

2024-02-27 Thread Richard Earnshaw

GCC 13's changes file documents that iwmmx is deprecated.  Raise the bar
by warning when the mmintrin.h header is included by users, but provide
a way to suppress the warning.
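
As a usage sketch, a source file that still needs the header would opt in before the include.  The helper function name below is made up, and the __IWMMXT__ guard is only there so the fragment also compiles on targets without iwMMXt; passing -D__ENABLE_DEPRECATED_IWMMXT on the command line should have the same effect as the #define.

```c
/* Opt in before the include so that the deprecation #warning is
   suppressed.  */
#ifndef __ENABLE_DEPRECATED_IWMMXT
#define __ENABLE_DEPRECATED_IWMMXT 1
#endif

/* Only pull in the header when the compiler has iwMMXt enabled;
   otherwise mmintrin.h would hit its existing #error.  */
#ifdef __IWMMXT__
#include <mmintrin.h>
#endif

/* Hypothetical helper: report whether this translation unit was built
   with iwMMXt enabled.  */
int
built_with_iwmmxt (void)
{
#ifdef __IWMMXT__
  return 1;
#else
  return 0;
#endif
}
```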

gcc:
* config/arm/mmintrin.h: Warn if this header is included without
defining __ENABLE_DEPRECATED_IWMMXT.
---
 gcc/config/arm/mmintrin.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/config/arm/mmintrin.h b/gcc/config/arm/mmintrin.h
index 07659502bf2..e9cc3ddd7ab 100644
--- a/gcc/config/arm/mmintrin.h
+++ b/gcc/config/arm/mmintrin.h
@@ -28,6 +28,9 @@
 #error mmintrin.h included without enabling WMMX/WMMX2 instructions (e.g. -march=iwmmxt or -march=iwmmxt2)
 #endif
 
+#ifndef __ENABLE_DEPRECATED_IWMMXT
+#warning support for WMMX/WMMX2 is deprecated and will be removed in GCC 15.  Define __ENABLE_DEPRECATED_IWMMXT to suppress this warning
+#endif
 
 #if defined __cplusplus
 extern "C" {


Re: [PATCH] calls: Fix up TYPE_NO_NAMED_ARGS_STDARG_P handling [PR107453]

2024-02-27 Thread Richard Earnshaw



On 09/01/2023 10:32, Jakub Jelinek via Gcc-patches wrote:
> Hi!
> 
> On powerpc64le-linux, the following patch fixes
> -FAIL: gcc.dg/c2x-stdarg-4.c execution test
> -FAIL: gcc.dg/torture/c2x-stdarg-split-1a.c   -O0  execution test
> -FAIL: gcc.dg/torture/c2x-stdarg-split-1a.c   -O1  execution test
> -FAIL: gcc.dg/torture/c2x-stdarg-split-1a.c   -O2  execution test
> -FAIL: gcc.dg/torture/c2x-stdarg-split-1a.c   -O2 -flto 
> -fno-use-linker-plugin -flto-partition=none  execution test
> -FAIL: gcc.dg/torture/c2x-stdarg-split-1a.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  execution test
> -FAIL: gcc.dg/torture/c2x-stdarg-split-1a.c   -O3 -g  execution test
> -FAIL: gcc.dg/torture/c2x-stdarg-split-1a.c   -Os  execution test
> The problem is mismatch between the caller and callee side.
> On the callee side, we do:
>   /* NAMED_ARG is a misnomer.  We really mean 'non-variadic'. */
>   if (!cfun->stdarg)
> data->arg.named = 1;  /* No variadic parms.  */
>   else if (DECL_CHAIN (parm))
> data->arg.named = 1;  /* Not the last non-variadic parm. */
>   else if (targetm.calls.strict_argument_naming (all->args_so_far))
> data->arg.named = 1;  /* Only variadic ones are unnamed.  */
>   else
> data->arg.named = 0;  /* Treat as variadic.  */
> which is later passed to the target hooks to determine if a particular
> argument is named or not.  Now, cfun->stdarg is determined from the stdarg_p
> call, which for the new C2X TYPE_NO_NAMED_ARGS_STDARG_P function types
> (rettype fn (...)) returns true.  Such functions have no named arguments,
> so data->arg.named will be 0 in function.cc.  But on the caller side,
> as TYPE_NO_NAMED_ARGS_STDARG_P function types have TYPE_ARG_TYPES NULL,
> we instead treat those calls as unprototyped even when they are prototyped
> - /* If we know nothing, treat all args as named.  */ n_named_args = 
> num_actuals;
> in 2 spots.  We need to treat the TYPE_NO_NAMED_ARGS_STDARG_P cases as
> prototyped with no named arguments.
> 
> Bootstrapped/regtested on x86_64-linux, i686-linux, powerpc64le-linux (where
> it fixes the above failures), aarch64-linux and s390x-linux, ok for trunk?
> 
> 2023-01-09  Jakub Jelinek  
> 
>   PR target/107453
>   * calls.cc (expand_call): For calls with
>   TYPE_NO_NAMED_ARGS_STDARG_P (funtype) use zero for n_named_args.
>   Formatting fix.

This one has been festering for a while; both Alexandre and Torbjorn have 
attempted to fix it recently, but I'm not sure either is really right...

On Arm this is causing all anonymous arguments to be passed on the stack, which 
is incorrect per the ABI.  On a target that uses 
'pretend_outgoing_vararg_named', why is it correct to set n_named_args to zero? 
 Is it enough to guard both the statements you've added with 
!targetm.calls.pretend_outgoing_args_named?

R.

> 
> --- gcc/calls.cc.jj   2023-01-02 09:32:28.834192105 +0100
> +++ gcc/calls.cc  2023-01-06 14:52:14.740594896 +0100
> @@ -2908,8 +2908,8 @@ expand_call (tree exp, rtx target, int i
>  }
>  
>/* Count the arguments and set NUM_ACTUALS.  */
> -  num_actuals =
> -call_expr_nargs (exp) + num_complex_actuals + structure_value_addr_parm;
> +  num_actuals
> += call_expr_nargs (exp) + num_complex_actuals + 
> structure_value_addr_parm;
>  
>/* Compute number of named args.
>   First, do a raw count of the args for INIT_CUMULATIVE_ARGS.  */
> @@ -2919,6 +2919,8 @@ expand_call (tree exp, rtx target, int i
>= (list_length (type_arg_types)
>/* Count the struct value address, if it is passed as a parm.  */
>+ structure_value_addr_parm);
> +  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
> +n_named_args = 0;
>else
>  /* If we know nothing, treat all args as named.  */
>  n_named_args = num_actuals;
> @@ -2957,6 +2959,8 @@ expand_call (tree exp, rtx target, int i
>  && ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
>  /* Don't include the last named arg.  */
>  --n_named_args;
> +  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
> +n_named_args = 0;
>else
>  /* Treat all args as named.  */
>  n_named_args = num_actuals;
> 
>   Jakub
> 


Re: [PATCH] calls: Fix up TYPE_NO_NAMED_ARGS_STDARG_P handling [PR107453]

2024-02-27 Thread Richard Earnshaw
[resending, apologies, I accidentally CC'd the wrong person last time]



Re: [PATCH] calls: Fix up TYPE_NO_NAMED_ARGS_STDARG_P handling [PR107453]

2024-02-29 Thread Richard Earnshaw



On 29/02/2024 14:10, Richard Earnshaw (lists) wrote:
> On 27/02/2024 17:25, Jakub Jelinek wrote:
>> On Tue, Feb 27, 2024 at 04:41:32PM +, Richard Earnshaw wrote:
>>>> 2023-01-09  Jakub Jelinek  
>>>>
>>>>PR target/107453
>>>>* calls.cc (expand_call): For calls with
>>>>TYPE_NO_NAMED_ARGS_STDARG_P (funtype) use zero for n_named_args.
>>>>Formatting fix.
>>>
>>> This one has been festering for a while; both Alexandre and Torbjorn have 
>>> attempted to fix it recently, but I'm not sure either is really right...
>>>
>>> On Arm this is causing all anonymous arguments to be passed on the stack,
>>> which is incorrect per the ABI.  On a target that uses
>>> 'pretend_outgoing_vararg_named', why is it correct to set n_named_args to
>>> zero?  Is it enough to guard both the statements you've added with
>>> !targetm.calls.pretend_outgoing_args_named?
>>
>> I'm afraid I haven't heard of that target hook before.
>> All I was doing with that change was fixing a regression reported in the PR
>> for ppc64le/sparc/nvptx/loongarch at least.
>>
>> The TYPE_NO_NAMED_ARGS_STDARG_P functions (C23 fns like void foo (...) {})
>> have NULL type_arg_types, so the list_length (type_arg_types) isn't done for
>> it, but it should be handled as if it was non-NULL but list length was 0.
>>
>> So, for the
>>   if (type_arg_types != 0)
>> n_named_args
>>   = (list_length (type_arg_types)
>>  /* Count the struct value address, if it is passed as a parm.  */
>>  + structure_value_addr_parm);
>>   else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
>> n_named_args = 0;
>>   else
>> /* If we know nothing, treat all args as named.  */
>> n_named_args = num_actuals;
>> case, I think guarding it by any target hooks is wrong, although
>> I guess it should have been
>> n_named_args = structure_value_addr_parm;
>> instead of
>> n_named_args = 0;
>>
>> For the second
>>   if (type_arg_types != 0
>>   && targetm.calls.strict_argument_naming (args_so_far))
>> ;
>>   else if (type_arg_types != 0
>>&& ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
>> /* Don't include the last named arg.  */
>> --n_named_args;
>>   else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
>> n_named_args = 0;
>>   else
>> /* Treat all args as named.  */
>> n_named_args = num_actuals;
>> bet (but no testing done, don't even know which targets return what for
>> those hooks) we should treat those as if type_arg_types was non-NULL
>> with 0 elements in the list, except the --n_named_args doesn't make sense
>> because that would decrease it to -1.
>> So perhaps
>>   if ((type_arg_types != 0 || TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
>>   && targetm.calls.strict_argument_naming (args_so_far))
>> ;
>>   else if (type_arg_types != 0
>>&& ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
>> /* Don't include the last named arg.  */
>> --n_named_args;
>>   else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype)
>> && ! targetm.calls.pretend_outgoing_varargs_named (args_so_far)))
>> ;
>>   else
>> /* Treat all args as named.  */
>> n_named_args = num_actuals;
> 
> I tried the above on arm, aarch64 and x86_64 and that seems fine, including 
> the new testcase you added.
> 

I should mention though, that INIT_CUMULATIVE_ARGS on arm ignores n_named_args 
entirely, it doesn't need it (I don't think it even existed when the AAPCS code 
was added).
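
For anyone following along, the second proposal quoted above (with n_named_args = structure_value_addr_parm in the first step, as Jakub suggested) can be modelled outside the compiler.  This is only a sketch: the struct and function names are invented, and the two hook results are passed in as plain booleans rather than queried from targetm.

```c
#include <stdbool.h>

/* Hypothetical inputs standing in for state inside expand_call.  */
struct call_model
{
  bool has_arg_types;	/* type_arg_types != 0 */
  bool no_named_stdarg;	/* TYPE_NO_NAMED_ARGS_STDARG_P (funtype) */
  bool strict_naming;	/* strict_argument_naming hook result */
  bool pretend_named;	/* pretend_outgoing_varargs_named hook result */
  int listed_args;	/* list_length (type_arg_types) */
  int struct_value_parm;	/* structure_value_addr_parm, 0 or 1 */
  int num_actuals;
};

int
model_n_named_args (const struct call_model *c)
{
  int n;

  /* First step: raw count of named arguments.  */
  if (c->has_arg_types)
    n = c->listed_args + c->struct_value_parm;
  else if (c->no_named_stdarg)
    n = c->struct_value_parm;
  else
    /* If we know nothing, treat all args as named.  */
    n = c->num_actuals;

  /* Second step: adjust according to the target hooks.  */
  if ((c->has_arg_types || c->no_named_stdarg) && c->strict_naming)
    ;
  else if (c->has_arg_types && !c->pretend_named)
    /* Don't include the last named arg.  */
    --n;
  else if (c->no_named_stdarg && !c->pretend_named)
    ;
  else
    /* Treat all args as named.  */
    n = c->num_actuals;

  return n;
}
```

In this model a call to a C23 fn (...) with three actuals yields 0 when both hooks return false, and 3 on a target where pretend_outgoing_varargs_named returns true.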

R.

> R.
> 
>>
>> (or n_named_args = 0; instead of ; before the final else?  Dunno).
>> I guess we need some testsuite coverage for caller/callee ABI match of
>> struct S { char p[64]; };
>> struct S foo (...);
>>
>>  Jakub
>>
> 


Re: [PATCH v2] libgfortran: Bugfix if not define HAVE_ATOMIC_FETCH_ADD

2024-01-10 Thread Richard Earnshaw

On 05/01/2024 01:43, Lipeng Zhu wrote:

This patch tries to fix the bug in the dec_waiting_unlocked function
when HAVE_ATOMIC_FETCH_ADD is not defined.  As io.h does
not include async.h, the WRLOCK and RWUNLOCK macros are
undefined.
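
The shape of the fallback is the usual one: an atomic decrement when the builtin is available, otherwise a lock around a plain decrement.  The sketch below is a stand-alone analogy using pthreads directly instead of the gthread wrappers; DEMO_HAVE_ATOMIC_FETCH_ADD, struct demo_unit and demo_lock are invented names, not libgfortran's.

```c
#include <pthread.h>

/* Hypothetical stand-in for gfc_unit; only the counter matters here.  */
struct demo_unit
{
  int waiting;
};

/* Hypothetical stand-in for unit_rwlock.  */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

void
demo_dec_waiting (struct demo_unit *u)
{
#ifdef DEMO_HAVE_ATOMIC_FETCH_ADD
  /* Lock-free path: relaxed atomic decrement.  */
  (void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED);
#else
  /* Fallback path: call the locking functions directly, so no macros
     from another header are needed.  */
  pthread_mutex_lock (&demo_lock);
  u->waiting--;
  pthread_mutex_unlock (&demo_lock);
#endif
}
```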

libgfortran/ChangeLog:

* io/io.h (dec_waiting_unlocked): Use
__gthread_rwlock_wrlock/__gthread_rwlock_unlock or
__gthread_mutex_lock/__gthread_mutex_unlock functions
to replace WRLOCK and RWUNLOCK macros.

Signed-off-by: Lipeng Zhu 


Has this been committed yet?

R.

---
  libgfortran/io/io.h | 10 --
  1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/libgfortran/io/io.h b/libgfortran/io/io.h
index 15daa0995b1..c7f0f7d7d9e 100644
--- a/libgfortran/io/io.h
+++ b/libgfortran/io/io.h
@@ -1020,9 +1020,15 @@ dec_waiting_unlocked (gfc_unit *u)
  #ifdef HAVE_ATOMIC_FETCH_ADD
(void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED);
  #else
-  WRLOCK (&unit_rwlock);
+#ifdef __GTHREAD_RWLOCK_INIT
+  __gthread_rwlock_wrlock (&unit_rwlock);
+  u->waiting--;
+  __gthread_rwlock_unlock (&unit_rwlock);
+#else
+  __gthread_mutex_lock (&unit_rwlock);
u->waiting--;
-  RWUNLOCK (&unit_rwlock);
+  __gthread_mutex_unlock (&unit_rwlock);
+#endif
  #endif
  }
  


Re: [PATCH][GCC][Arm] Define __ARM_FEATURE_BF16 when +bf16 feature is enabled

2024-01-10 Thread Richard Earnshaw




On 08/01/2024 17:21, Matthieu Longo wrote:

Hi,

Arm GCC backend does not define __ARM_FEATURE_BF16 when +bf16 is 
specified (via -march option, or target pragma) whereas it is supposed 
to be tested before including arm_bf16.h (as specified in ACLE document: 
https://arm-software.github.io/acle/main/acle.html#arm_bf16h).
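
In user code the intended pattern is a feature test ahead of the include, roughly as follows.  HAVE_BF16_DEMO and bf16_available are made-up names for the sketch; on a compiler that does not define __ARM_FEATURE_BF16 the header is simply never included.

```c
/* Test the ACLE feature macro before including arm_bf16.h.  */
#if defined (__ARM_FEATURE_BF16)
#include <arm_bf16.h>
#define HAVE_BF16_DEMO 1
#else
#define HAVE_BF16_DEMO 0
#endif

/* Hypothetical helper: report whether BF16 support was detected at
   compile time.  */
int
bf16_available (void)
{
  return HAVE_BF16_DEMO;
}
```

Without the macro being defined for +bf16, code written this way would take the fallback branch even though the feature is enabled.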


gcc/ChangeLog:

     * config/arm/arm-c.cc (arm_cpu_builtins): define 
__ARM_FEATURE_BF16

     * config/arm/arm.h: define TARGET_BF16

Ok for master ?

Matthieu
index 2e181bf7f36bab1209d5358e65d9513541683632..21ca22ac71119eda4ff01709aa95002ca13b1813 100644

--- a/gcc/config/arm/arm-c.cc
+++ b/gcc/config/arm/arm-c.cc
@@ -425,12 +425,14 @@ arm_cpu_builtins (struct cpp_reader* pfile)
   arm_arch_cde_coproc);

   def_or_undef_macro (pfile, "__ARM_FEATURE_MATMUL_INT8", TARGET_I8MM);
+
+  def_or_undef_macro (pfile, "__ARM_FEATURE_BF16", TARGET_BF16);
+  def_or_undef_macro (pfile, "__ARM_BF16_FORMAT_ALTERNATIVE",
+ TARGET_BF16_FP);
   def_or_undef_macro (pfile, "__ARM_FEATURE_BF16_SCALAR_ARITHMETIC",
  TARGET_BF16_FP);
   def_or_undef_macro (pfile, "__ARM_FEATURE_BF16_VECTOR_ARITHMETIC",
  TARGET_BF16_SIMD);
-  def_or_undef_macro (pfile, "__ARM_BF16_FORMAT_ALTERNATIVE",
- TARGET_BF16_FP || TARGET_BF16_SIMD);

Why is the definition of __ARM_BF16_FORMAT_ALTERNATIVE changed?  And why 
is there no explanation of that change?  It doesn't seem directly related 
to $subject.


R.

 }

 void


Re: [libatomic PATCH] Fix testsuite regressions on ARM [raspberry pi].

2024-01-10 Thread Richard Earnshaw




On 08/01/2024 16:07, Roger Sayle wrote:


Bootstrapping GCC on arm-linux-gnueabihf with --with-arch=armv6 currently
has a large number of FAILs in libatomic (regressions since last time I
attempted this).  The failure mode is related to IFUNC handling with the
file tas_8_2_.o containing an unresolved reference to the function
libat_test_and_set_1_i2.

Bearing in mind I've no idea what's going on, the following one line
change, to build tas_1_2_.o when building tas_8_2_.o, resolves the problem
for me and restores the libatomic testsuite to 44 expected passes and 5
unsupported tests [from 22 unexpected failures and 22 unresolved testcases].

If this looks like the correct fix, I'm not confident with rebuilding
Makefile.in with correct version of automake, so I'd very much appreciate
it if someone/the reviewer/maintainer could please check this in for me.
Thanks in advance.


2024-01-08  Roger Sayle  

libatomic/ChangeLog
 * Makefile.am: Build tas_1_2_.o on ARCH_ARM_LINUX
 * Makefile.in: Regenerate.


Roger
--



Hi Roger,

I don't really understand all this make foo :( so I'm not sure if this 
is the right fix either.  If this is, as you say, a regression, have you 
been able to track down when it first started to occur?  That might also 
help me to understand what changed to cause this.


Perhaps we should have a PR for this, to make tracking the fixes easier.

R.


Re: [PATCH][GCC][Arm] Add pattern for bswap + rotate -> rev16 [Bug 108933]

2024-01-29 Thread Richard Earnshaw

On 29/01/2024 14:14, Matthieu Longo wrote:

Hi Richard,

Please find below the new patch where I addressed your comments and 
updated the changelog.


rev16 pattern was not recognised anymore as a change in the bswap tree
pass was introducing a new GIMPLE form, not recognized by the assembly
final transformation pass.

More details in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108933
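
As a reminder of why the two forms are interchangeable: rev16 swaps the two bytes inside each 16-bit half of a word, and a full 32-bit byte swap followed by a rotate by 16 gives the same result.  The C below is an illustrative check with invented function names; __builtin_bswap32 is the GCC/Clang builtin.

```c
#include <stdint.h>

/* Swap the bytes within each halfword using masks and shifts.  */
uint32_t
rev16_shifts (uint32_t x)
{
  return ((x & 0x00ff00ffu) << 8) | ((x & 0xff00ff00u) >> 8);
}

/* The same operation expressed as a byte swap followed by a rotate
   by 16 bits.  */
uint32_t
rev16_bswap_rotate (uint32_t x)
{
  uint32_t s = __builtin_bswap32 (x);
  return (s << 16) | (s >> 16);
}
```

Both functions map 0x11223344 to 0x22114433.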

gcc/ChangeLog:

 PR target/108933
 * config/arm/arm.md (arm_rev16si2): Convert to define_insn.
 Correct generated RTL.
 (arm_rev16si2_alt1): Correctly handle conditional execution.
 (arm_rev16si2_alt2): Likewise.

gcc/testsuite/ChangeLog:

 PR target/108933
 * gcc.target/arm/rev16.c: Moved to...
 * gcc.target/arm/rev16_1.c: ...here.
 * gcc.target/arm/rev16_2.c: New test to check that rev16 is
 emitted.


Thanks.  I've tweaked the commit message very slightly and pushed this.

Could you please prepare backports for gcc-11 thru 13?  It should just 
be a matter of cherry-picking the commit.


R.



On 2024-01-22 16:25, Richard Earnshaw (lists) wrote:

On 22/01/2024 12:18, Matthieu Longo wrote:

rev16 pattern was not recognised anymore as a change in the bswap tree
pass was introducing a new GIMPLE form, not recognized by the assembly
final transformation pass.

More details in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108933

gcc/ChangeLog:

 PR target/108933
 * config/arm/arm.md (*arm_rev16si2_alt3): new pattern to convert
   a bswap + rotate by 16 bits into rev16


ChangeLog entries need to be written as sentences, so start with a 
capital letter and end with a full stop; continuation lines should 
start in column 8 (one hard tab, don't use spaces).  But in this case, 
"New pattern." is sufficient.




gcc/testsuite/ChangeLog:

 PR target/108933
 * gcc.target/arm/rev16.c: Moved to...
 * gcc.target/arm/rev16_1.c: ...here.
 * gcc.target/arm/rev16_2.c: New test to check that rev16 is
   emitted.



+;; Similar pattern to match (rotate (bswap) 16)
+(define_insn "*arm_rev16si2_alt3"
+  [(set (match_operand:SI 0 "register_operand" "=l,r")
+    (rotate:SI (bswap:SI (match_operand:SI 1 "register_operand" "l,r"))
+               (const_int 16)))]
+  "arm_arch6"
+  "rev16\\t%0, %1"
+  [(set_attr "arch" "t,32")
+   (set_attr "length" "2,4")
+   (set_attr "type" "rev")]
+)
+

Unfortunately, this is insufficient.  When generating Arm or Thumb2 
code (but not thumb1) we also have to handle conditional execution: we 
need to have '%?' in the output template at the point where a 
condition code might be needed.  That means we need separate output 
templates for all three alternatives (as we need a 16-bit variant for 
thumb2 that's conditional and a 16-bit for thumb1 that isn't).  See 
the output of arm_rev16 for a guide of what is really needed.


I note that the arm_rev16si2_alt1, and arm_rev16si2_alt2 patterns are 
incorrect in this regard as well; that will need fixing.


I also see that arm_rev16si2 currently expands to the alt1 variant 
above; given that the preferred canonical form would now appear to use 
bswap + rotate, we should change that as well.  In fact, we can merge 
your new pattern with the expand entirely and eliminate the need to 
call gen_arm_rev16si2_alt1.  Something like:


(define_insn "arm_rev16si2"
   [(set (match_operand:SI 0 "s_register_operand")
         (rotate:SI (bswap:SI (match_operand:SI 1 "s_register_operand"))
                    (const_int 16)))]
   "arm_arch6"
   "@
   rev16...
   ...


R.



Re: [PATCH v7] libgfortran: Replace mutex with rwlock

2023-12-15 Thread Richard Earnshaw




On 15/12/2023 11:31, Lipeng Zhu wrote:



On 2023/12/14 23:50, Richard Earnshaw (lists) wrote:

On 09/12/2023 15:39, Lipeng Zhu wrote:

This patch introduces an rwlock and splits the read/write accesses to
the unit_root tree and unit_cache, using the rwlock instead of the mutex
to increase CPU efficiency. In the get_gfc_unit function, only around
30% of calls step into the insert_unit function; in most instances, we
can get the unit while reading the unit_cache or the unit_root tree.
Splitting the read and write phases with an rwlock therefore makes the
code more parallel.

BTW, the IPC metric improves by around 9x on our test
server with 220 cores. The benchmark we used is
https://github.com/rwesson/NEAT

libgcc/ChangeLog:

* gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro.
(__gthrw): New function.
(__gthread_rwlock_rdlock): New function.
(__gthread_rwlock_tryrdlock): New function.
(__gthread_rwlock_wrlock): New function.
(__gthread_rwlock_trywrlock): New function.
(__gthread_rwlock_unlock): New function.

libgfortran/ChangeLog:

* io/async.c (DEBUG_LINE): New macro.
* io/async.h (RWLOCK_DEBUG_ADD): New macro.
(CHECK_RDLOCK): New macro.
(CHECK_WRLOCK): New macro.
(TAIL_RWLOCK_DEBUG_QUEUE): New macro.
(IN_RWLOCK_DEBUG_QUEUE): New macro.
(RDLOCK): New macro.
(WRLOCK): New macro.
(RWUNLOCK): New macro.
(RD_TO_WRLOCK): New macro.
(INTERN_RDLOCK): New macro.
(INTERN_WRLOCK): New macro.
(INTERN_RWUNLOCK): New macro.
* io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in
a comment.
(unit_lock): Remove including associated internal_proto.
(unit_rwlock): New declarations including associated internal_proto.
(dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock
instead of __gthread_mutex_lock and __gthread_mutex_unlock on
unit_lock.
* io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK on
unit_rwlock instead of LOCK and UNLOCK on unit_lock.
(st_write_done_worker): Likewise.
* io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules'
comment. Use unit_rwlock variable instead of unit_lock variable.
(get_gfc_unit_from_unit_root): New function.
(get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock
instead of LOCK and UNLOCK on unit_lock.
(close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead of
LOCK and UNLOCK on unit_lock.
(close_units): Likewise.
(newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on
unit_lock.
* io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock
instead of LOCK and UNLOCK on unit_lock.
(flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead
of LOCK and UNLOCK on unit_lock.



It looks like this has broken builds on arm-none-eabi when using newlib:

In file included from /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/runtime/error.c:27:
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h: In function ‘dec_waiting_unlocked’:
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1023:3: error: implicit declaration of function ‘WRLOCK’ [-Wimplicit-function-declaration]
 1023 |   WRLOCK (&unit_rwlock);
      |   ^~
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1025:3: error: implicit declaration of function ‘RWUNLOCK’ [-Wimplicit-function-declaration]
 1025 |   RWUNLOCK (&unit_rwlock);
      |   ^~~~


R.


Hi Richard,

The root cause is that the macros WRLOCK and RWUNLOCK are not defined in 
io.h. The x86 build did not fail because HAVE_ATOMIC_FETCH_ADD is 
defined there, so these macros were never used. The code logic is 
shown below:

#ifdef HAVE_ATOMIC_FETCH_ADD
   (void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED);
#else
   WRLOCK (&unit_rwlock);
   u->waiting--;
   RWUNLOCK (&unit_rwlock);
#endif
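
The same structure can be reproduced in a small standalone sketch, which shows why the problem stays hidden wherever the atomic path is compiled: only the #else branch refers to the locking primitives. Here struct unit, unit_lock and dec_waiting are hypothetical stand-ins, and a plain pthread mutex replaces the libgfortran locking macros.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for gfc_unit; only the counter matters.  */
struct unit
{
  int waiting;
};

/* Hypothetical global lock, used only when atomics are unavailable.  */
static pthread_mutex_t unit_lock = PTHREAD_MUTEX_INITIALIZER;

void
dec_waiting (struct unit *u)
{
  assert (u != NULL);
#ifdef HAVE_ATOMIC_FETCH_ADD
  /* Atomic path: nothing below this branch is ever compiled, so an
     undefined locking macro there would go unnoticed.  */
  (void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED);
#else
  /* Fallback path: every name used here must be declared.  */
  pthread_mutex_lock (&unit_lock);
  u->waiting--;
  pthread_mutex_unlock (&unit_lock);
#endif
}
```

Building once with -DHAVE_ATOMIC_FETCH_ADD and once without it exercises both branches.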

I have drafted a patch to try to fix this bug. As I don't have an Arm 
platform, could you help validate whether it fixes the problem on Arm?


diff --git a/libgfortran/io/io.h b/libgfortran/io/io.h
index 15daa0995b1..c7f0f7d7d9e 100644
--- a/libgfortran/io/io.h
+++ b/libgfortran/io/io.h
@@ -1020,9 +1020,15 @@ dec_waiting_unlocked (gfc_unit *u)
  #ifdef HAVE_ATOMIC_FETCH_ADD
    (void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED);
  #else
-  WRLOCK (&unit_rwlock);
+#ifdef __GTHREAD_RWLOCK_INIT
+  __gthread_rwlock_wrlock (&unit_rwlock);
+  u->waiting--;
+  __gthread_rwlock_unlock (&unit_rwlock);
+#else
+  __gthread_mutex_lock (&unit_rwlock);
    u->waiting--;
-  RWUNLOCK (&unit_rwlock);
+  __gthread_mutex_unlock (&unit_rwlock);
+#endif
  #endif
  }


Lipeng Zhu


Hi Lipeng,

Thanks for the quick reply.  I can confirm that with the above change 
the bootstrap failure is fixed.  However, this shouldn't be considered a 
formal review; libgfortran is not really my area.


I'll be away now until January 2nd.

Richard.


Re: [PATCH] testsuite/arm: Fix bfloat16_vector_typecheck_[12].c tests

2023-11-30 Thread Richard Earnshaw




On 30/11/2023 10:15, Christophe Lyon wrote:

After commit r14-5617-gb8592186611, int32x[24]_t types now use
elements of 'long int' type instead of 'int' on arm-eabi (it's still
'int' on arm-linux-gnueabihf).  Both are 32-bit types anyway.

This patch adjusts the two tests so that they optionally accept 'long '
before 'int' in the expected error message.

2023-11-30  Christophe Lyon  

gcc/testsuite/
* gcc.target/arm/bfloat16_vector_typecheck_1.c: Update expected
error message.
* gcc.target/arm/bfloat16_vector_typecheck_2.c: Likewise.


OK.

R.


---
  gcc/testsuite/gcc.target/arm/bfloat16_vector_typecheck_1.c | 4 ++--
  gcc/testsuite/gcc.target/arm/bfloat16_vector_typecheck_2.c | 2 +-
  2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/bfloat16_vector_typecheck_1.c b/gcc/testsuite/gcc.target/arm/bfloat16_vector_typecheck_1.c
index f3c350b4cfc..470c13125fb 100644
--- a/gcc/testsuite/gcc.target/arm/bfloat16_vector_typecheck_1.c
+++ b/gcc/testsuite/gcc.target/arm/bfloat16_vector_typecheck_1.c
@@ -119,9 +119,9 @@ bfloat16x4_t footest (bfloat16x4_t vector0)
(bfloat16x4_t) { is_a_short_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type 'int16x4_t'} } */
  
(bfloat16x4_t) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type 'bfloat16x4_t'} } */

-  (int32x4_t) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type 'int' using type 'bfloat16x4_t'} } */
+  (int32x4_t) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type '(?:long )?int' using type 'bfloat16x4_t'} } */
(float32x4_t) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type 'float' using type 'bfloat16x4_t'} } */
-  (int32x2_t) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type 'int' using type 'bfloat16x4_t'} } */
+  (int32x2_t) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type '(?:long )?int' using type 'bfloat16x4_t'} } */
(float16x4_t) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type '__fp16' using type 'bfloat16x4_t'} } */
(int16x4_t) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type 'short int' using type 'bfloat16x4_t'} } */
  
diff --git a/gcc/testsuite/gcc.target/arm/bfloat16_vector_typecheck_2.c b/gcc/testsuite/gcc.target/arm/bfloat16_vector_typecheck_2.c
index de0ade52c10..4e0d37907ce 100644
--- a/gcc/testsuite/gcc.target/arm/bfloat16_vector_typecheck_2.c
+++ b/gcc/testsuite/gcc.target/arm/bfloat16_vector_typecheck_2.c
@@ -111,7 +111,7 @@ bfloat16x8_t footest (bfloat16x8_t vector0)
(bfloat16x8_t) { is_a_short_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type 'int16x8_t'} } */
  
(bfloat16x8_t) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type '__bf16' using type 'bfloat16x8_t'} } */

-  (int32x4_t) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type 'int' using type 'bfloat16x8_t'} } */
+  (int32x4_t) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type '(?:long )?int' using type 'bfloat16x8_t'} } */
(float32x4_t) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type 'float' using type 'bfloat16x8_t'} } */
(int64x2_t) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type 'long long int' using type 'bfloat16x8_t'} } */
(float16x8_t) { glob_bfloat_vec }; /* { dg-error {incompatible types when initializing type '__fp16' using type 'bfloat16x8_t'} } */


Re: [PATCH v2] AArch64: Fix strict-align cpymem/setmem [PR103100]

2023-11-30 Thread Richard Earnshaw




On 29/11/2023 18:09, Richard Sandiford wrote:

Wilco Dijkstra  writes:

v2: Use UINTVAL, rename max_mops_size.

The cpymemdi/setmemdi implementation doesn't fully support strict alignment.
Block the expansion if the alignment is less than 16 with STRICT_ALIGNMENT.
Clean up the condition when to use MOPS.
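
The decision order that the patch gives aarch64_expand_cpymem can be summarised in a small standalone model. This is a sketch under stated assumptions, not GCC code: enum copy_strategy, struct copy_request, struct target_config and choose_copy_strategy are hypothetical names, the 256-byte limit and the alignment check of 16 are taken from the diff, and "MOPS when available, otherwise a library call" follows the comment in the patch. The real inline path may still decline for other reasons.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum copy_strategy { COPY_INLINE, COPY_MOPS, COPY_LIBCALL };

/* Hypothetical description of one memcpy expansion request.  */
struct copy_request
{
  bool size_is_constant;
  unsigned long size;
  unsigned align;
};

/* Hypothetical description of the relevant target settings.  */
struct target_config
{
  bool strict_alignment;
  bool have_mops;
  unsigned long mops_threshold;
};

/* MOPS when the target has it, otherwise fall back to a library call.  */
static enum copy_strategy
mops_or_libcall (const struct target_config *t)
{
  return t->have_mops ? COPY_MOPS : COPY_LIBCALL;
}

enum copy_strategy
choose_copy_strategy (const struct target_config *t,
                      const struct copy_request *r)
{
  const unsigned long max_copy_size = 256;

  assert (t != NULL && r != NULL);

  /* Variable-sized or under-aligned strict-align copies are never
     expanded inline.  */
  if (!r->size_is_constant || (t->strict_alignment && r->align < 16))
    return mops_or_libcall (t);

  /* Large copies use MOPS when available or a library call.  */
  if (r->size > max_copy_size
      || (t->have_mops && r->size > t->mops_threshold))
    return mops_or_libcall (t);

  return COPY_INLINE;
}
```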

Passes regress/bootstrap, OK for commit?

gcc/ChangeLog/
 PR target/103100
 * config/aarch64/aarch64.md (cpymemdi): Remove pattern condition.
 (setmemdi): Likewise.
 * config/aarch64/aarch64.cc (aarch64_expand_cpymem): Support
 strict-align.  Cleanup condition for using MOPS.
 (aarch64_expand_setmem): Likewise.

---

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index dd6874d13a75f20d10a244578afc355b25c73da2..8a12894d6b80de1031d6e7d02dca680c57bce136 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -25261,27 +25261,23 @@ aarch64_expand_cpymem (rtx *operands)
int mode_bits;
rtx dst = operands[0];
rtx src = operands[1];
+  unsigned align = UINTVAL (operands[3]);
rtx base;
machine_mode cur_mode = BLKmode;
+  bool size_p = optimize_function_for_size_p (cfun);

-  /* Variable-sized memcpy can go through the MOPS expansion if available.  */
-  if (!CONST_INT_P (operands[2]))
+  /* Variable-sized or strict-align copies may use the MOPS expansion.  */
+  if (!CONST_INT_P (operands[2]) || (STRICT_ALIGNMENT && align < 16))
  return aarch64_expand_cpymem_mops (operands);

-  unsigned HOST_WIDE_INT size = INTVAL (operands[2]);
-
-  /* Try to inline up to 256 bytes or use the MOPS threshold if available.  */
-  unsigned HOST_WIDE_INT max_copy_size
-= TARGET_MOPS ? aarch64_mops_memcpy_size_threshold : 256;
+  unsigned HOST_WIDE_INT size = UINTVAL (operands[2]);

-  bool size_p = optimize_function_for_size_p (cfun);
+  /* Try to inline up to 256 bytes.  */
+  unsigned max_copy_size = 256;
+  unsigned mops_threshold = aarch64_mops_memcpy_size_threshold;

-  /* Large constant-sized cpymem should go through MOPS when possible.
- It should be a win even for size optimization in the general case.
- For speed optimization the choice between MOPS and the SIMD sequence
- depends on the size of the copy, rather than number of instructions,
- alignment etc.  */
-  if (size > max_copy_size)
+  /* Large copies use MOPS when available or a library call.  */
+  if (size > max_copy_size || (TARGET_MOPS && size > mops_threshold))
  return aarch64_expand_cpymem_mops (operands);


It feels a little unintuitive to be calling aarch64_expand_cpymem_mops
for !TARGET_MOPS, but that's pre-existing, and I can see there are
arguments both ways.

Although !TARGET_SIMD is a niche interest on current trunk, it becomes
important for streaming-compatible mode.  So we might want to look
again at the different handling of !TARGET_SIMD in this function (where
we lower the copy size but not the threshold) and aarch64_expand_setmem
(where we bail out early).  That's not something for this patch though,
just mentioning it.

The patch is OK with me, but please give Richard E a day to object.


This is fine by me.

R.



Thanks,
Richard



int copy_bits = 256;
@@ -25445,12 +25441,13 @@ aarch64_expand_setmem (rtx *operands)
unsigned HOST_WIDE_INT len;
rtx dst = operands[0];
rtx val = operands[2], src;
+  unsigned align = UINTVAL (operands[3]);
rtx base;
machine_mode cur_mode = BLKmode, next_mode;

-  /* If we don't have SIMD registers or the size is variable use the MOPS
- inlined sequence if possible.  */
-  if (!CONST_INT_P (operands[1]) || !TARGET_SIMD)
+  /* Variable-sized or strict-align memset may use the MOPS expansion.  */
+  if (!CONST_INT_P (operands[1]) || !TARGET_SIMD
+  || (STRICT_ALIGNMENT && align < 16))
  return aarch64_expand_setmem_mops (operands);

bool size_p = optimize_function_for_size_p (cfun);
@@ -25458,10 +25455,13 @@ aarch64_expand_setmem (rtx *operands)
/* Default the maximum to 256-bytes when considering only libcall vs
   SIMD broadcast sequence.  */
unsigned max_set_size = 256;
+  unsigned mops_threshold = aarch64_mops_memset_size_threshold;

-  len = INTVAL (operands[1]);
-  if (len > max_set_size && !TARGET_MOPS)
-return false;
+  len = UINTVAL (operands[1]);
+
+  /* Large memset uses MOPS when available or a library call.  */
+  if (len > max_set_size || (TARGET_MOPS && len > mops_threshold))
+return aarch64_expand_setmem_mops (operands);

int cst_val = !!(CONST_INT_P (val) && (INTVAL (val) != 0));
/* The MOPS sequence takes:
@@ -25474,12 +25474,6 @@ aarch64_expand_setmem (rtx *operands)
   the arguments + 1 for the call.  */
unsigned libcall_cost = 4;

-  /* Upper bound check.  For large constant-sized setmem use the MOPS sequence
- when available.  */
-  if (TARGET_MOPS
-  && len >= (unsigned HOST_WIDE_INT) aarch64_mops_memset_size_threshold)
-return aarch64_expand_setmem_mops (

Re: [PATCH] aarch64: modify Ampere CPU tunings on reassociation/FMA

2023-11-30 Thread Richard Earnshaw




On 30/11/2023 08:27, Di Zhao OS wrote:

This patch modifies tunings for ampere1/ampere1a/ampere1b, to:

1. Allow reassociation on FP additions.
2. Avoid generating loop-dependant FMA chains. Added a tuning
option for this.

Bootstrapped and tested. Is this ok for trunk?

Thanks,
Di Zhao

gcc/ChangeLog:

 * config/aarch64/aarch64-tuning-flags.def 
(AARCH64_EXTRA_TUNING_OPTION):
 New tuing option to avoid cross-loop FMA.


typo: tuning


 * config/aarch64/aarch64.cc (aarch64_override_options_internal): Set
 param_avoid_fma_max_bits according to tuning option.
 * config/aarch64/tuning_models/ampere1.h: Modify tunings related with
 FMA.
 * config/aarch64/tuning_models/ampere1a.h: Modify tunings related with
 FMA.
 * config/aarch64/tuning_models/ampere1b.h: Modify tunings related with
 FMA.



You need to mention the name of the structure you're modifying.  Also we 
usually just write 'Likewise.' if the change is identical in effect to 
the change immediately above, so


* config/aarch64/tuning_models/ampere1.h (ampere1_tunings):
Modify tunings related with FMA.
* config/aarch64/tuning_models/ampere1a.h (ampere1a_tunings):
Likewise.
* config/aarch64/tuning_models/ampere1b.h (ampere1b_tunings):
Likewise.

Finally, watch your line length.  The total length of the line should 
not go beyond column 72 in commit log entries, unless that involves 
breaking a single word on a line.


Otherwise, this is OK.

R.


---
  gcc/config/aarch64/aarch64-tuning-flags.def | 2 ++
  gcc/config/aarch64/aarch64.cc   | 6 ++
  gcc/config/aarch64/tuning_models/ampere1.h  | 2 +-
  gcc/config/aarch64/tuning_models/ampere1a.h | 4 ++--
  gcc/config/aarch64/tuning_models/ampere1b.h | 5 +++--
  5 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
index 774568e9106..f28a73839a6 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -47,4 +47,6 @@ AARCH64_EXTRA_TUNING_OPTION ("use_new_vector_costs", USE_NEW_VECTOR_COSTS)
  
  AARCH64_EXTRA_TUNING_OPTION ("matched_vector_throughput", MATCHED_VECTOR_THROUGHPUT)
  
+AARCH64_EXTRA_TUNING_OPTION ("avoid_cross_loop_fma", AVOID_CROSS_LOOP_FMA)
+
  #undef AARCH64_EXTRA_TUNING_OPTION
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 64684258b7b..28bc70a787f 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -16083,6 +16083,12 @@ aarch64_override_options_internal (struct gcc_options *opts)
&& opts->x_optimize >= aarch64_tune_params.prefetch->default_opt_level)
  opts->x_flag_prefetch_loop_arrays = 1;
  
+  /* Avoid loop-dependant FMA chains.  */
+  if (aarch64_tune_params.extra_tuning_flags
+  & AARCH64_EXTRA_TUNE_AVOID_CROSS_LOOP_FMA)
+SET_OPTION_IF_UNSET (opts, &global_options_set, param_avoid_fma_max_bits,
+512);
+
aarch64_override_options_after_change_1 (opts);
  }
  
diff --git a/gcc/config/aarch64/tuning_models/ampere1.h b/gcc/config/aarch64/tuning_models/ampere1.h
index 8d2a1c69610..a144e8f94b3 100644
--- a/gcc/config/aarch64/tuning_models/ampere1.h
+++ b/gcc/config/aarch64/tuning_models/ampere1.h
@@ -104,7 +104,7 @@ static const struct tune_params ampere1_tunings =
2,  /* min_div_recip_mul_df.  */
0,  /* max_case_values.  */
tune_params::AUTOPREFETCHER_WEAK,   /* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),   /* tune_flags.  */
+  (AARCH64_EXTRA_TUNE_AVOID_CROSS_LOOP_FMA),   /* tune_flags.  */
&ampere1_prefetch_tune,
AARCH64_LDP_STP_POLICY_ALIGNED,   /* ldp_policy_model.  */
AARCH64_LDP_STP_POLICY_ALIGNED/* stp_policy_model.  */
diff --git a/gcc/config/aarch64/tuning_models/ampere1a.h b/gcc/config/aarch64/tuning_models/ampere1a.h
index c419ffb3c1a..f688ed08a79 100644
--- a/gcc/config/aarch64/tuning_models/ampere1a.h
+++ b/gcc/config/aarch64/tuning_models/ampere1a.h
@@ -50,13 +50,13 @@ static const struct tune_params ampere1a_tunings =
"32:16",  /* loop_align.  */
2,  /* int_reassoc_width.  */
4,  /* fp_reassoc_width.  */
-  1,   /* fma_reassoc_width.  */
+  4,   /* fma_reassoc_width.  */
2,  /* vec_reassoc_width.  */
2,  /* min_div_recip_mul_sf.  */
2,  /* min_div_recip_mul_df.  */
0,  /* max_case_values.  */
tune_params::AUTOPREFETCHER_WEAK,   /* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),   /* tune_flags.  */
+  (AARCH64_EXTRA_TUNE_AVOID_CROSS_LOOP_FMA),   /* tune_flags.  */
&ampere1_prefetch_tune,
AARCH64_LDP_STP_POLICY_ALIGNED,   /* ldp_policy_model.  */
AARCH64_LDP_STP_POLICY_ALIGNED/* stp_policy_model.  */
diff --git a/gcc/config/aarch64/tuning_models/ampere1b.h b/gcc/config/aarch64/tuning_models/ampere1b.h
index c4928f50d29..a98b6a980f7 100644
--- a/gcc/config/aarch64/tuning_models/a

Re: [PATCH v3 10/11] c: Turn -Wincompatible-pointer-types into a permerror

2023-12-05 Thread Richard Earnshaw
igning to 
type '\[^\n\]*' from type '\[^\n\]*'" } */
+/* { dg-message "note: expected '\[^'\n\]*' but argument is of type '\[^'\n\]*'" 
"note: expected" { target *-*-* } .-1 } */
+
+
+DECIMAL_COMPOSITE_DECL(64);  /* { dg-error "incompatible types when assigning to 
type '\[^\n\]*' from type '\[^\n\]*'" } */
+/* { dg-message "note: expected '\[^'\n\]*' but argument is of type '\[^'\n\]*'" 
"note: expected" { target *-*-* } .-1 } */
+
+
+DECIMAL_COMPOSITE_DECL(128); /* { dg-error "incompatible types when assigning to 
type '\[^\n\]*' from type '\[^\n\]*'" } */
+/* { dg-message "note: expected '\[^'\n\]*' but argument is of type '\[^'\n\]*'" 
"note: expected" { target *-*-* } .-1 } */
+
+
+int main()
+{
+  DECIMAL_COMPOSITE_TEST(32);  /* { dg-error "incompatible pointer type" } */
+  DECIMAL_COMPOSITE_TEST(64);  /* { dg-error "incompatible pointer type" } */
+  DECIMAL_COMPOSITE_TEST(128); /* { dg-error "incompatible pointer type" } */
+
+  return 0;
+}
+
+/* The invalid function redeclarations might also trigger:
+   { dg-prune-output "-Warray-parameter" } */
diff --git a/gcc/testsuite/gcc.dg/dfp/composite-type.c 
b/gcc/testsuite/gcc.dg/dfp/composite-type.c
index ce7d5c1a0a0..2eb601400b5 100644
--- a/gcc/testsuite/gcc.dg/dfp/composite-type.c
+++ b/gcc/testsuite/gcc.dg/dfp/composite-type.c
@@ -1,5 +1,5 @@
  /* { dg-do compile } */
-/* { dg-options "-O -Wall -ftrack-macro-expansion=0" } */
+/* { dg-options "-fpermissive -O -Wall -ftrack-macro-expansion=0" } */
  
  /* C99 6.2.7: Compatible type and composite type.  */
  
diff --git a/gcc/testsuite/gcc.dg/diag-aka-1.c b/gcc/testsuite/gcc.dg/diag-aka-1.c

index 3383c1c263b..485a8a5f85d 100644
--- a/gcc/testsuite/gcc.dg/diag-aka-1.c
+++ b/gcc/testsuite/gcc.dg/diag-aka-1.c
@@ -1,5 +1,5 @@
  /* { dg-do compile } */
-/* { dg-options "-Wc++-compat" } */
+/* { dg-options "-fpermissive -Wc++-compat" } */
  
  typedef struct A { int i; } B;

  typedef struct T { int i; } *T; /* { dg-warning "using 'T' as both a typedef and a 
tag is invalid" } */
diff --git a/gcc/testsuite/gcc.dg/diag-aka-1a.c 
b/gcc/testsuite/gcc.dg/diag-aka-1a.c
new file mode 100644
index 000..d161b785e7d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/diag-aka-1a.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-Wc++-compat" } */
+
+typedef struct A { int i; } B;
+typedef struct T { int i; } *T; /* { dg-warning "using 'T' as both a typedef and a 
tag is invalid" } */
+typedef const float TFA;
+typedef TFA TFB;
+typedef TFB TFC;
+typedef int IA[];
+typedef IA *IAP;
+extern IAP arr[];
+
+void fn1 (B *); /* { dg-message "expected 'B \\*' {aka 'struct A \\*'} but argument 
is of type 'struct B \\*'" } */
+void fn2 (TFC *);
+
+void
+bar (B *b, int *i)
+{
+  fn1 ((struct B *) b); /* { dg-error "passing argument" } */
+  fn2 (i); /* { dg-error "passing argument" } */
+  sizeof (arr); /* { dg-error "invalid application of .sizeof. to incomplete type 
.int \\(\\*\\\[\\\]\\)\\\[\\\]." } */
+}
+
+int
+foo (void *a)
+{
+  T t = a; /* { dg-warning "request for implicit conversion from 'void \\*' to 'T' 
{aka 'struct T \\*'} not" } */
+  return t->i;
+}
diff --git a/gcc/testsuite/gcc.dg/enum-compat-1.c 
b/gcc/testsuite/gcc.dg/enum-compat-1.c
index 5fb150cee79..b7352f6ddc3 100644
--- a/gcc/testsuite/gcc.dg/enum-compat-1.c
+++ b/gcc/testsuite/gcc.dg/enum-compat-1.c
@@ -3,7 +3,7 @@
  /* Origin: Joseph Myers , based on
 PR c/6024 from Richard Earnshaw  */
  /* { dg-do compile } */
-/* { dg-options "" } */
+/* { dg-options "-fpermissive" } */
  
  /* Original test from PR c/6024.  */

  enum e1 {a, b};
diff --git a/gcc/testsuite/gcc.dg/enum-compat-2.c 
b/gcc/testsuite/gcc.dg/enum-compat-2.c
new file mode 100644
index 000..69509012480
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/enum-compat-2.c
@@ -0,0 +1,32 @@
+/* Test that enumerated types are only considered compatible when they
+   are the same type.  PR c/6024.  */
+/* Origin: Joseph Myers , based on
+   PR c/6024 from Richard Earnshaw  */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+/* Original test from PR c/6024.  */
+enum e1 {a, b};
+enum e2 {c, d};
+
+void f(enum e1); /* { dg-error "prototype" "error at decl" } */
+
+void f(x)
+ enum e2 x; /* { dg-error "doesn't match prototype" } */
+{
+  return;
+}
+
+/* Other compatibility tests.  */
+enum e3 { A };
+enum e4 { B };
+
+enum e3 v3;
+enum e4 *p = &v3; /* { dg-error "incompatible" "incompatible pointer" } */
+en

Re: [PATCH v3 10/11] c: Turn -Wincompatible-pointer-types into a permerror

2023-12-05 Thread Richard Earnshaw




On 05/12/2023 09:46, Florian Weimer wrote:

* Richard Earnshaw:


(I think it's this patch, not one of the others in the series).

This breaks building libgfortran with newlib on arm and aarch64:


/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/list_read.c:2208:46:
error: pointer type mismatch in conditional expression
[-Wincompatible-pointer-types]
  2208 |   dtp->common.iostat : &noiostat;
   |  ^
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/list_read.c:2208:27:
note: first expression has type ‘GFC_INTEGER_4 *’ {aka ‘long int *’}
  2208 |   dtp->common.iostat : &noiostat;
   |   ^~
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/list_read.c:2208:48:
note: second expression has type ‘int *’
  2208 |   dtp->common.iostat : &noiostat;
   |^
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/list_read.c:2224:34:
error: passing argument 2 of ‘dtp->u.p.fdtio_ptr’ from incompatible
pointer type [-Wincompatible-pointer-types]
  2224 |   dtp->u.p.fdtio_ptr (p, &unit, iotype, &vlist,
   |  ^
   |  |
   |  int *
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/list_read.c:2224:34:
note: expected ‘GFC_INTEGER_4 *’ {aka ‘long int *’} but argument is of
type ‘int *’
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/list_read.c:2225:31:
error: passing argument 5 of ‘dtp->u.p.fdtio_ptr’ from incompatible
pointer type [-Wincompatible-pointer-types]
  2225 |   child_iostat, child_iomsg,
   |   ^~~~
   |   |
   |   int *
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/list_read.c:2225:31:
note: expected ‘GFC_INTEGER_4 *’ {aka ‘long int *’} but argument is of
type ‘int *’


Presumably the fixes will look like this?

diff --git a/libgfortran/io/list_read.c b/libgfortran/io/list_read.c
index db3330060ce..4fcc77dbf83 100644
--- a/libgfortran/io/list_read.c
+++ b/libgfortran/io/list_read.c
@@ -2987,13 +2987,13 @@ nml_read_obj (st_parameter_dt *dtp, namelist_info *nl, index_type offset,
/* If this object has a User Defined procedure, call it.  */
if (nl->dtio_sub != NULL)
  {
-   int unit = dtp->u.p.current_unit->unit_number;
+   GFC_INTEGER_4 unit = dtp->u.p.current_unit->unit_number;
char iotype[] = "NAMELIST";
gfc_charlen_type iotype_len = 8;
char tmp_iomsg[IOMSG_LEN] = "";
char *child_iomsg;
gfc_charlen_type child_iomsg_len;
-   int noiostat;
+   GFC_INTEGER_4 noiostat;
int *child_iostat = NULL;
gfc_full_array_i4 vlist;
formatted_dtio dtio_ptr = (formatted_dtio)nl->dtio_sub;


Apparently the targets I built define GFC_INTEGER_4 as int, so this
didn't show up.


It looks reasonable to me, but I'm not a real user of libgfortran, so 
there's possibly something more subtle that I've missed (I can't even 
guarantee I copied all of the errors from the build log).


I've copied Tobias as a fortran maintainer, but I don't know if this is 
his forte either, though perhaps he might know whose it is.


R.



Thanks,
Florian



Re: [PATCH] libgfortran: Fix -Wincompatible-pointer-types errors

2023-12-05 Thread Richard Earnshaw




On 05/12/2023 10:33, Jakub Jelinek wrote:

Hi!

On Tue, Dec 05, 2023 at 10:46:02AM +0100, Florian Weimer wrote:

Presumably the fixes will look like this?

diff --git a/libgfortran/io/list_read.c b/libgfortran/io/list_read.c
index db3330060ce..4fcc77dbf83 100644
--- a/libgfortran/io/list_read.c
+++ b/libgfortran/io/list_read.c
@@ -2987,13 +2987,13 @@ nml_read_obj (st_parameter_dt *dtp, namelist_info *nl, index_type offset,
/* If this object has a User Defined procedure, call it.  */
if (nl->dtio_sub != NULL)
  {
-   int unit = dtp->u.p.current_unit->unit_number;
+   GFC_INTEGER_4 unit = dtp->u.p.current_unit->unit_number;
char iotype[] = "NAMELIST";
gfc_charlen_type iotype_len = 8;
char tmp_iomsg[IOMSG_LEN] = "";
char *child_iomsg;
gfc_charlen_type child_iomsg_len;
-   int noiostat;
+   GFC_INTEGER_4 noiostat;
int *child_iostat = NULL;
gfc_full_array_i4 vlist;
formatted_dtio dtio_ptr = (formatted_dtio)nl->dtio_sub;


That seems insufficient.

The following patch makes libgfortran build on i686-linux after hacking up
--- kinds.h.xx  2023-12-05 00:23:00.133365064 +0100
+++ kinds.h 2023-12-05 11:19:24.409679808 +0100
@@ -10,8 +10,8 @@ typedef GFC_INTEGER_2 GFC_LOGICAL_2;
  #define HAVE_GFC_LOGICAL_2
  #define HAVE_GFC_INTEGER_2
  
-typedef int32_t GFC_INTEGER_4;

-typedef uint32_t GFC_UINTEGER_4;
+typedef long GFC_INTEGER_4;
+typedef unsigned long GFC_UINTEGER_4;


That doesn't look right for a 64-bit processor.  Presumably 4 means 4 
bytes, but long will generally be 8 on such targets.


R.


  typedef GFC_INTEGER_4 GFC_LOGICAL_4;
  #define HAVE_GFC_LOGICAL_4
  #define HAVE_GFC_INTEGER_4
in the build dir to emulate what newlib aarch64 is doing:

2023-12-05  Florian Weimer  
Jakub Jelinek  

* io/list_read.c (list_formatted_read_scalar) :
Change types of unit and noiostat to GFC_INTEGER_4 from int, change
type of child_iostat to GFC_INTEGER_4 * from int *, formatting
fixes.
(nml_read_obj): Likewise.
* io/write.c (list_formatted_write_scalar) : Likewise.
(nml_write_obj): Likewise.
* io/transfer.c (unformatted_read, unformatted_write): Likewise.

--- libgfortran/io/list_read.c.jj   2023-05-09 00:07:26.161168737 +0200
+++ libgfortran/io/list_read.c  2023-12-05 11:25:31.837426653 +0100
@@ -2189,14 +2189,14 @@ list_formatted_read_scalar (st_parameter
break;
  case BT_CLASS:
{
- int unit = dtp->u.p.current_unit->unit_number;
+ GFC_INTEGER_4 unit = dtp->u.p.current_unit->unit_number;
  char iotype[] = "LISTDIRECTED";
gfc_charlen_type iotype_len = 12;
  char tmp_iomsg[IOMSG_LEN] = "";
  char *child_iomsg;
  gfc_charlen_type child_iomsg_len;
- int noiostat;
- int *child_iostat = NULL;
+ GFC_INTEGER_4 noiostat;
+ GFC_INTEGER_4 *child_iostat = NULL;
  gfc_full_array_i4 vlist;
  
  	  GFC_DESCRIPTOR_DATA(&vlist) = NULL;

@@ -2204,8 +2204,8 @@ list_formatted_read_scalar (st_parameter
  
  	  /* Set iostat, intent(out).  */

  noiostat = 0;
- child_iostat = (dtp->common.flags & IOPARM_HAS_IOSTAT) ?
- dtp->common.iostat : &noiostat;
+ child_iostat = ((dtp->common.flags & IOPARM_HAS_IOSTAT)
+ ? dtp->common.iostat : &noiostat);
  
  	  /* Set iomsge, intent(inout).  */

  if (dtp->common.flags & IOPARM_HAS_IOMSG)
@@ -2987,14 +2987,14 @@ nml_read_obj (st_parameter_dt *dtp, name
/* If this object has a User Defined procedure, call it.  */
if (nl->dtio_sub != NULL)
  {
-   int unit = dtp->u.p.current_unit->unit_number;
+   GFC_INTEGER_4 unit = dtp->u.p.current_unit->unit_number;
char iotype[] = "NAMELIST";
gfc_charlen_type iotype_len = 8;
char tmp_iomsg[IOMSG_LEN] = "";
char *child_iomsg;
gfc_charlen_type child_iomsg_len;
-   int noiostat;
-   int *child_iostat = NULL;
+   GFC_INTEGER_4 noiostat;
+   GFC_INTEGER_4 *child_iostat = NULL;
gfc_full_array_i4 vlist;
formatted_dtio dtio_ptr = (formatted_dtio)nl->dtio_sub;
  
@@ -3006,8 +3006,8 @@ nml_read_obj (st_parameter_dt *dtp, name
  
  		/* Set iostat, intent(out).  */

noiostat = 0;
-   child_iostat = (dtp->common.flags & IOPARM_HAS_IOSTAT) ?
-   dtp->common.iostat : &noiostat;
+   child_iostat = ((dtp->common.flags & IOPARM_HAS_IOSTAT)
+   ? dtp->common.iostat : &noiostat);
  
  		/* Set iomsg, intent(inout).  */

if (dtp->common.flags & IOPARM_HAS_IOMSG)

Re: [PATCH] libgfortran: Fix -Wincompatible-pointer-types errors

2023-12-05 Thread Richard Earnshaw




On 05/12/2023 10:51, Jakub Jelinek wrote:

On Tue, Dec 05, 2023 at 10:47:34AM +, Richard Earnshaw wrote:

The following patch makes libgfortran build on i686-linux after hacking up
--- kinds.h.xx  2023-12-05 00:23:00.133365064 +0100
+++ kinds.h 2023-12-05 11:19:24.409679808 +0100
@@ -10,8 +10,8 @@ typedef GFC_INTEGER_2 GFC_LOGICAL_2;
   #define HAVE_GFC_LOGICAL_2
   #define HAVE_GFC_INTEGER_2
-typedef int32_t GFC_INTEGER_4;
-typedef uint32_t GFC_UINTEGER_4;
+typedef long GFC_INTEGER_4;
+typedef unsigned long GFC_UINTEGER_4;


That doesn't look right for a 64-bit processor.  Presumably 4 means 4 bytes,


i686-linux is an ILP32 target, which I chose exactly because I regularly build
it, had a tree with it around and because unlike 64-bit targets there are 2
standard 32-bit signed integer types.  Though, normally int32_t there is
int rather than long int and so the errors only appeared after this hack.



My point is that on aarch64/x86_64 etc, this will make GFC_INTEGER_4 a 
64-bit type, whereas previously it was 32-bit.


R.


Jakub



Re: [PATCH] libgfortran: Fix -Wincompatible-pointer-types errors

2023-12-05 Thread Richard Earnshaw




On 05/12/2023 10:59, Jakub Jelinek wrote:

On Tue, Dec 05, 2023 at 10:57:50AM +, Richard Earnshaw wrote:

On 05/12/2023 10:51, Jakub Jelinek wrote:

On Tue, Dec 05, 2023 at 10:47:34AM +, Richard Earnshaw wrote:

The following patch makes libgfortran build on i686-linux after hacking up
--- kinds.h.xx  2023-12-05 00:23:00.133365064 +0100
+++ kinds.h 2023-12-05 11:19:24.409679808 +0100
@@ -10,8 +10,8 @@ typedef GFC_INTEGER_2 GFC_LOGICAL_2;
#define HAVE_GFC_LOGICAL_2
#define HAVE_GFC_INTEGER_2
-typedef int32_t GFC_INTEGER_4;
-typedef uint32_t GFC_UINTEGER_4;
+typedef long GFC_INTEGER_4;
+typedef unsigned long GFC_UINTEGER_4;


That doesn't look right for a 64-bit processor.  Presumably 4 means 4 bytes,


i686-linux is an ILP32 target, which I chose exactly because I regularly build
it, had a tree with it around and because unlike 64-bit targets there are 2
standard 32-bit signed integer types.  Though, normally int32_t there is
int rather than long int and so the errors only appeared after this hack.



My point is that on aarch64/x86_64 etc, this will make GFC_INTEGER_4 a
64-bit type, whereas previously it was 32-bit.


Sure.  The above patch is a hack for a generated header.  I'm not proposing
that as a change, just explaining how I've verified the actual patch on
i686-linux with such a hack.

Jakub



Ah, I understand now.

I've successfully built arm and aarch64 cross toolchains with this patch 
(newlib).  So LGTM, thanks.


R.
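
For illustration, the reason the kinds.h hack exposes these errors is that
on an ILP32 target 'int' and 'long' have the same size but remain distinct
types, so an 'int *' is not compatible with a 'long *'.  A minimal sketch in
plain C11, not taken from libgfortran; fake_integer_4 and set_iostat are
hypothetical names standing in for GFC_INTEGER_4 and a child-I/O callee:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for GFC_INTEGER_4 after the kinds.h hack.  */
typedef long fake_integer_4;

/* Hypothetical callee that writes a status through a fake_integer_4 *.  */
static void
set_iostat (fake_integer_4 *iostat, fake_integer_4 value)
{
  assert (iostat != NULL);
  *iostat = value;
}

/* Returns 1 if 'int *' and 'long *' are the same type.  They never are,
   even when sizeof (int) == sizeof (long).  */
static int
int_ptr_is_long_ptr (void)
{
  return _Generic ((int *) 0, long *: 1, default: 0);
}

/* Declaring the local with the typedef, as the patch does, keeps the
   pointer type correct whatever the typedef expands to.  */
static long
call_with_matching_type (void)
{
  fake_integer_4 noiostat = 0;
  set_iostat (&noiostat, 42);
  return (long) noiostat;
}
```

Had noiostat been declared as plain 'int', the call to set_iostat would
pass an 'int *' where a 'long *' is expected, which is the kind of
mismatch -Wincompatible-pointer-types reports.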


Re: [PATCH] [arm] testsuite: make mve_intrinsic_type_overloads-int.c libc-agnostic

2023-12-06 Thread Richard Earnshaw

Sorry, I only just spotted this while looking at something else.


On 23/05/2023 15:41, Christophe Lyon via Gcc-patches wrote:

Glibc defines int32_t as 'int' while newlib defines it as 'long int'.

Although these correspond to the same size, g++ complains when using the




   'wrong' version:
   invalid conversion from 'long int*' to 'int32_t*' {aka 'int*'} [-fpermissive]
or
   invalid conversion from 'int*' to 'int32_t*' {aka 'long int*'} [-fpermissive]

when calling vst1q(int32*, int32x4_t) with a first parameter of type
'long int *' (resp. 'int *')

To make this test pass with any type of toolchain, this patch defines
'word_type' according to which libc is in use.

2023-05-23  Christophe Lyon  

gcc/testsuite/
* gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-int.c:
Support both definitions of int32_t.
---
  .../mve_intrinsic_type_overloads-int.c| 28 ++-
  1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-int.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-int.c
index 7947dc024bc..ab51cc8b323 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-int.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-int.c
@@ -47,14 +47,22 @@ foo2 (short * addr, int16x8_t value)
vst1q (addr, value);
  }
  
-void

-foo3 (int * addr, int32x4_t value)
-{
-  vst1q (addr, value); /* { dg-warning "invalid conversion" "" { target c++ } } */
-}
+/* Glibc defines int32_t as 'int' while newlib defines it as 'long int'.
+
+   Although these correspond to the same size, g++ complains when using the
+   'wrong' version:
+  invalid conversion from 'long int*' to 'int32_t*' {aka 'int*'} [-fpermissive]
+
+  The trick below is to make this test pass whether using glibc-based or
+  newlib-based toolchains.  */
  
+#if defined(__GLIBC__)

+#define word_type int
+#else
+#define word_type long int
+#endif


GCC #defines __INT32_TYPE__ for this and should be more reliable than 
trying to detect one specific library implementation.  Did you try that?



  void
-foo4 (long * addr, int32x4_t value)
+foo3 (word_type * addr, int32x4_t value)
  {
vst1q (addr, value);
  }
@@ -78,13 +86,7 @@ foo7 (unsigned short * addr, uint16x8_t value)
  }
  
  void

-foo8 (unsigned int * addr, uint32x4_t value)
-{
-  vst1q (addr, value); /* { dg-warning "invalid conversion" "" { target c++ } } */
-}
-
-void
-foo9 (unsigned long * addr, uint32x4_t value)
+foo8 (unsigned word_type * addr, uint32x4_t value)
  {
vst1q (addr, value);
  }


R.
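
A minimal sketch of the __INT32_TYPE__ approach, assuming GCC or a
compatible compiler where __INT32_TYPE__ and __UINT32_TYPE__ are predefined
and int32_t in <stdint.h> resolves to the same underlying type.  The names
uword_type, store_first and demo_store are hypothetical; store_first merely
stands in for an intrinsic that takes an int32_t *:

```c
#include <assert.h>
#include <stdint.h>

#if defined (__INT32_TYPE__) && defined (__UINT32_TYPE__)
/* Ask the compiler which type it uses for 32-bit integers.  */
typedef __INT32_TYPE__ word_type;
typedef __UINT32_TYPE__ uword_type;
#else
/* Fallback for compilers without the GCC predefines.  */
typedef int32_t word_type;
typedef uint32_t uword_type;
#endif

/* Hypothetical stand-in for an intrinsic taking an int32_t pointer.  */
static void
store_first (int32_t *addr, int32_t value)
{
  assert (addr != 0);
  addr[0] = value;
}

/* Returns 1 if 'word_type *' is the same type as 'int32_t *'.  */
static int
word_ptr_matches_int32_ptr (void)
{
  return _Generic ((word_type *) 0, int32_t *: 1, default: 0);
}

/* Passing a word_type buffer needs no cast and draws no diagnostic.  */
static int32_t
demo_store (void)
{
  word_type buf[4] = { 0, 0, 0, 0 };
  store_first (buf, 7);
  return buf[0];
}
```

One consequence of switching from a macro to a typedef is that
'unsigned word_type' no longer works, hence the separate unsigned typedef
in the sketch.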


Re: [PATCH v2 6/7] aarch64,arm: Fix branch-protection= parsing

2023-12-07 Thread Richard Earnshaw




On 03/11/2023 15:36, Szabolcs Nagy wrote:

Refactor the parsing to have a single API and fix a few parsing issues:

- Different handling of "bti+none" and "none+bti": these should be
   rejected because "none" can only appear alone.

- Accepted empty strings such as "bti++pac-ret" or "bti+", this bug
   was caused by using strtok_r.



These now print
  error: invalid argument ‘’ for ‘-mbranch-protection=’

which is OK, but might be a bit confusing.  Perhaps we could change this 
specific case to "missing feature or flag for '-mbranch-protection'".


The ideal solution (IMO) would be if we could print something like

  in option
  -mbranch-protection=+bti
  ^
  |
  missing feature or flag

much like we do for source code diagnostics now.

However, I don't know if our framework could handle that for things from 
the command line, and it's not important enough to do now.



- Memory got leaked (str_root was never freed). And two buffers got
   allocated when one is enough.

The callbacks now have no failure mode, only parsing can fail and
all failures are handled locally.  The "-mbranch-protection=" vs
"target("branch-protection=")" difference in the error message is
handled by a separate argument to aarch_validate_mbranch_protection.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_override_options): Update.
(aarch64_handle_attr_branch_protection): Update.
* config/arm/aarch-common-protos.h (aarch_parse_branch_protection):
Remove.
(aarch_validate_mbranch_protection): Add new argument.
* config/arm/aarch-common.cc (aarch_handle_no_branch_protection):
Update.
(aarch_handle_standard_branch_protection): Update.
(aarch_handle_pac_ret_protection): Update.
(aarch_handle_pac_ret_leaf): Update.
(aarch_handle_pac_ret_b_key): Update.
(aarch_handle_bti_protection): Update.
(aarch_parse_branch_protection): Remove.
(next_tok): New.
(aarch_validate_mbranch_protection): Rewrite.
* config/arm/aarch-common.h (struct aarch_branch_protect_type):
Add field "alone".
* config/arm/arm.cc (arm_configure_build_target): Update.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/branch-protection-attr.c: Update.
* gcc.target/aarch64/branch-protection-option.c: Update.


This is OK.  If you want to do the simple tweak for the error message 
for the case I mention above, consider that pre-approved.


R.
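
As a rough illustration of the parsing rules described above (empty
features rejected, "none" and "standard" only valid on their own), here is
a minimal standalone sketch.  It is not the code from the patch: bp_type,
bp_lookup and bp_validate are hypothetical names, and the pac-ret subtypes
(leaf, b-key) are deliberately left out.  It scans with strcspn rather than
strtok_r, because strtok_r skips runs of delimiters and so never reports an
empty token:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified table entry: no handlers, no subtypes.  */
struct bp_type
{
  const char *name;
  bool alone;   /* True if the type may only appear on its own.  */
};

static const struct bp_type bp_types[] = {
  { "none", true },
  { "standard", true },
  { "pac-ret", false },
  { "bti", false },
};

/* Find the entry whose name is exactly the LEN characters at TOK.  */
static const struct bp_type *
bp_lookup (const char *tok, size_t len)
{
  for (size_t i = 0; i < sizeof bp_types / sizeof bp_types[0]; i++)
    if (strlen (bp_types[i].name) == len
        && strncmp (bp_types[i].name, tok, len) == 0)
      return &bp_types[i];
  return NULL;
}

/* Return true if STR is a valid '+'-separated list of types.  */
static bool
bp_validate (const char *str)
{
  assert (str != NULL);
  size_t count = 0;
  bool saw_alone = false;

  for (const char *p = str;; p++)
    {
      size_t len = strcspn (p, "+");
      if (len == 0)
        return false;           /* Empty token: "", "bti+", "bti++x".  */

      const struct bp_type *type = bp_lookup (p, len);
      if (type == NULL)
        return false;           /* Unknown type.  */

      if (type->alone)
        saw_alone = true;
      count++;
      if (saw_alone && count > 1)
        return false;           /* "none+bti" and "bti+none" alike.  */

      p += len;
      if (*p == '\0')
        return true;
      /* The loop increment steps over the '+'.  */
    }
}
```

With this shape no buffer is allocated, so there is nothing to leak, and
both orderings of "none" with another type fail in the same way.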


---
v2: merge tests updates into the patch
error message is not changed, see previous discussion:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633945.html
---
  gcc/config/aarch64/aarch64.cc |  37 +--
  gcc/config/arm/aarch-common-protos.h  |   5 +-
  gcc/config/arm/aarch-common.cc| 214 --
  gcc/config/arm/aarch-common.h |  14 +-
  gcc/config/arm/arm.cc |   3 +-
  .../aarch64/branch-protection-attr.c  |   6 +-
  .../aarch64/branch-protection-option.c|   2 +-
  7 files changed, 113 insertions(+), 168 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index f8e8fefc8d8..4f7f707b675 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -18642,7 +18642,8 @@ aarch64_override_options (void)
  aarch64_validate_sls_mitigation (aarch64_harden_sls_string);
  
if (aarch64_branch_protection_string)

-aarch_validate_mbranch_protection (aarch64_branch_protection_string);
+aarch_validate_mbranch_protection (aarch64_branch_protection_string,
+  "-mbranch-protection=");
  
/* -mcpu=CPU is shorthand for -march=ARCH_FOR_CPU, -mtune=CPU.

   If either of -march or -mtune is given, they override their
@@ -19016,34 +19017,12 @@ aarch64_handle_attr_cpu (const char *str)
  
  /* Handle the argument STR to the branch-protection= attribute.  */
  
- static bool

- aarch64_handle_attr_branch_protection (const char* str)
- {
-  char *err_str = (char *) xmalloc (strlen (str) + 1);
-  enum aarch_parse_opt_result res = aarch_parse_branch_protection (str,
-  &err_str);
-  bool success = false;
-  switch (res)
-{
- case AARCH_PARSE_MISSING_ARG:
-   error ("missing argument to % pragma 
or"
- " attribute");
-   break;
- case AARCH_PARSE_INVALID_ARG:
-   error ("invalid protection type %qs in % pragma or attribute", err_str);
-   break;
- case AARCH_PARSE_OK:
-   success = true;
-  /* Fall through.  */
- case AARCH_PARSE_INVALID_FEATURE:
-   break;
- default:
-   gcc_unreachable ();
-}
-  free (err_str);
-  return success;
- }
+static bool
+aarch64_handle_attr_branch_protection (const char* str)
+{
+  return aarch_validate_mbranch_protection (str,
+   

Re: [PATCH v2 7/7] aarch64,arm: Move branch-protection data to targets

2023-12-07 Thread Richard Earnshaw




On 03/11/2023 15:36, Szabolcs Nagy wrote:

The branch-protection types are target specific, not the same on arm
and aarch64.  This currently affects pac-ret+b-key, but there will be
a new type on aarch64 that is not relevant for arm.

gcc/ChangeLog:

* config/aarch64/aarch64-opts.h (enum aarch64_key_type): Rename to ...
(enum aarch_key_type): ... this.
* config/aarch64/aarch64.cc (aarch_handle_no_branch_protection): Copy.
(aarch_handle_standard_branch_protection): Copy.
(aarch_handle_pac_ret_protection): Copy.
(aarch_handle_pac_ret_leaf): Copy.
(aarch_handle_pac_ret_b_key): Copy.
(aarch_handle_bti_protection): Copy.


I think all of the above functions that have been moved back from 
aarch-common should be renamed back to aarch64_..., unless they are 
directly referenced statically by code in aarch-common.cc.

* config/arm/aarch-common.cc (aarch_handle_no_branch_protection):
Remove.
(aarch_handle_standard_branch_protection): Remove.
(aarch_handle_pac_ret_protection): Remove.
(aarch_handle_pac_ret_leaf): Remove.
(aarch_handle_pac_ret_b_key): Remove.
(aarch_handle_bti_protection): Remove.
* config/arm/aarch-common.h (enum aarch_key_type): Remove.
(struct aarch_branch_protect_type): Declare.
* config/arm/arm-c.cc (arm_cpu_builtins): Remove aarch_ra_sign_key.
* config/arm/arm.cc (aarch_handle_no_branch_protection): Copy.
(aarch_handle_standard_branch_protection): Copy.
(aarch_handle_pac_ret_protection): Copy.
(aarch_handle_pac_ret_leaf): Copy.
(aarch_handle_bti_protection): Copy.
(arm_configure_build_target): Copy.


And the same here.


* config/arm/arm.opt: Remove aarch_ra_sign_key.
---
unchanged compared to v1.
---
  gcc/config/aarch64/aarch64-opts.h |  6 ++--
  gcc/config/aarch64/aarch64.cc | 55 +++
  gcc/config/arm/aarch-common.cc| 55 ---
  gcc/config/arm/aarch-common.h | 11 +++
  gcc/config/arm/arm-c.cc   |  2 --
  gcc/config/arm/arm.cc | 52 +
  gcc/config/arm/arm.opt|  3 --
  7 files changed, 109 insertions(+), 75 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-opts.h b/gcc/config/aarch64/aarch64-opts.h
index 831e28ab52a..1abae1442b5 100644
--- a/gcc/config/aarch64/aarch64-opts.h
+++ b/gcc/config/aarch64/aarch64-opts.h
@@ -103,9 +103,9 @@ enum stack_protector_guard {
  };
  
  /* The key type that -msign-return-address should use.  */

-enum aarch64_key_type {
-  AARCH64_KEY_A,
-  AARCH64_KEY_B
+enum aarch_key_type {
+  AARCH_KEY_A,
+  AARCH_KEY_B
  };
  
  /* An enum specifying how to handle load and store pairs using

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 4f7f707b675..9739223831f 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -18620,6 +18620,61 @@ aarch64_set_asm_isa_flags (aarch64_feature_flags flags)
aarch64_set_asm_isa_flags (&global_options, flags);
  }
  
+static void

+aarch_handle_no_branch_protection (void)
+{
+  aarch_ra_sign_scope = AARCH_FUNCTION_NONE;
+  aarch_enable_bti = 0;
+}
+
+static void
+aarch_handle_standard_branch_protection (void)
+{
+  aarch_ra_sign_scope = AARCH_FUNCTION_NON_LEAF;
+  aarch_ra_sign_key = AARCH_KEY_A;
+  aarch_enable_bti = 1;
+}
+
+static void
+aarch_handle_pac_ret_protection (void)
+{
+  aarch_ra_sign_scope = AARCH_FUNCTION_NON_LEAF;
+  aarch_ra_sign_key = AARCH_KEY_A;
+}
+
+static void
+aarch_handle_pac_ret_leaf (void)
+{
+  aarch_ra_sign_scope = AARCH_FUNCTION_ALL;
+}
+
+static void
+aarch_handle_pac_ret_b_key (void)
+{
+  aarch_ra_sign_key = AARCH_KEY_B;
+}
+
+static void
+aarch_handle_bti_protection (void)
+{
+  aarch_enable_bti = 1;
+}
+
+static const struct aarch_branch_protect_type aarch_pac_ret_subtypes[] = {
+  { "leaf", false, aarch_handle_pac_ret_leaf, NULL, 0 },
+  { "b-key", false, aarch_handle_pac_ret_b_key, NULL, 0 },
+  { NULL, false, NULL, NULL, 0 }
+};
+
+const struct aarch_branch_protect_type aarch_branch_protect_types[] = {


Can this be made static now?  And maybe pass the structure as a 
parameter if that's not done already.




+  { "none", true, aarch_handle_no_branch_protection, NULL, 0 },
+  { "standard", true, aarch_handle_standard_branch_protection, NULL, 0 },
+  { "pac-ret", false, aarch_handle_pac_ret_protection, aarch_pac_ret_subtypes,
+ARRAY_SIZE (aarch_pac_ret_subtypes) },
+  { "bti", false, aarch_handle_bti_protection, NULL, 0 },
+  { NULL, false, NULL, NULL, 0 }
+};
+
  /* Implement TARGET_OPTION_OVERRIDE.  This is called once in the beginning
 and is used to parse the -m{cpu,tune,arch} strings and setup the initial
 tuning structs.  In particular it must set selected_tune and
diff --git a/gcc/config/arm/aarch-common.cc b/gcc/config/arm/aarch-common.cc
index 159c61b786c..92e1248f83f 100644
--- a/gcc/

Re: [PATCH v2 0/3] [GCC] arm: vst1_types_xN ACLE intrinsics

2023-12-07 Thread Richard Earnshaw

Pushed, thanks.

R.


On 07/12/2023 15:28, ezra.sito...@arm.com wrote:

Add xN variants of vst1_types intrinsic.




Re: [PATCH v2 0/3] [GCC] arm: vst1q_types_xN ACLE intrinsics

2023-12-07 Thread Richard Earnshaw

Pushed, thanks.

R.


On 07/12/2023 15:36, ezra.sito...@arm.com wrote:

Add xN variants of vst1q_types intrinsic.




Re: [PATCH v2 0/3] [GCC] arm: vld1_types_xN ACLE intrinsics

2023-12-07 Thread Richard Earnshaw

Pushed, thanks.

R.


On 07/12/2023 15:41, ezra.sito...@arm.com wrote:

Add xN variants of vld1_types intrinsic.




Re: [PATCH v2 0/3] [GCC] arm: vst1_types_xN ACLE intrinsics

2023-12-08 Thread Richard Earnshaw
Sorry, Ezra, but I've taken the decision to back out all 4 of the patch 
series related to this.  I think the problems that the CI has shown up 
need to be addressed first, and the fixes don't seem to be entirely trivial.


R.

On 07/12/2023 16:44, Richard Earnshaw wrote:

Pushed, thanks.

R.


On 07/12/2023 15:28, ezra.sito...@arm.com wrote:

Add xN variants of vst1_types intrinsic.




Re: [PATCH 1/2] arm: Add define_attr to create a mapping between MVE predicated and unpredicated insns

2023-12-12 Thread Richard Earnshaw




On 06/11/2023 11:20, Stamatis Markianos-Wright wrote:

Patch has already been approved at:

https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630326.html


... But I'm sending this again for archiving on the list after rebasing


A couple of minor nits:

1)

+#define MVE_VPT_PREDICABLE_INSN_P(INSN) \
+  (recog_memoized (INSN) >= 0   \
+  && get_attr_mve_unpredicated_insn (INSN) != 0)   \

I think it's better to write "!= CODE_FOR_nothing".

+(define_attr "mve_unpredicated_insn" "" (const_int 0))
+

And the default value here should similarly be 'symbol_ref 
"CODE_FOR_nothing"'.


So that the style matches the symbol refs elsewhere.


2)
+(define_insn "*predicated_doloop_end_internal"
+  [(set (pc)
+   (if_then_else
+  (ge (plus:SI (reg:SI LR_REGNUM)
+   (match_operand:SI 0 "const_int_operand" ""))
+   (const_int 0))
+(label_ref (match_operand 1 "" ""))
+(pc)))
+   (set (reg:SI LR_REGNUM)
+   (plus:SI (reg:SI LR_REGNUM) (match_dup 0)))
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_32BIT && TARGET_HAVE_LOB && TARGET_HAVE_MVE && TARGET_THUMB2"

TARGET_THUMB2 => TARGET_32BIT, so the first test is redundant.  In fact, 
given that TARGET_HAVE_LOB => armv8.1-m.main => thumb2, why do we need 
either?


So
TARGET_HAVE_LOB && TARGET_HAVE_MVE
should be sufficient.


+(define_insn "dlstp_insn"
+  [
+(set (reg:SI LR_REGNUM)
+(unspec:SI [(match_operand:SI 0 "s_register_operand" "r")]
+ DLSTP))
+  ]
+  "TARGET_32BIT && TARGET_HAVE_LOB && TARGET_HAVE_MVE && TARGET_THUMB2"

Same here.

Otherwise, OK.

R.


Re: [PATCH v2 0/3] [GCC] arm: vld1q_types_xN ACLE intrinsics

2023-12-12 Thread Richard Earnshaw

Pushed, thanks.

R.

On 07/12/2023 15:21, ezra.sito...@arm.com wrote:

Add xN variants of vld1q_types intrinsic.




Re: [PING][PATCH 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops

2023-12-12 Thread Richard Earnshaw




On 30/11/2023 12:55, Stamatis Markianos-Wright wrote:

Hi Andre,

Thanks for the comments, see latest revision attached.

On 27/11/2023 12:47, Andre Vieira (lists) wrote:

Hi Stam,

Just some comments.

+/* Recursively scan through the DF chain backwards within the basic block and
+   determine if any of the USEs of the original insn (or the USEs of the insns

s/Recursively scan/Scan/ as you no longer recurse, thanks for that by the way :)

+   where thy were DEF-ed, etc., recursively) were affected by implicit VPT

remove recursively for the same reasons.

+  if (!CONST_INT_P (cond_counter_iv.step) || !CONST_INT_P (cond_temp_iv.step))
+    return NULL;
+  /* Look at the steps and swap around the rtx's if needed. Error out if
+ one of them cannot be identified as constant.  */
+  if (INTVAL (cond_counter_iv.step) != 0 && INTVAL (cond_temp_iv.step) != 0)
+    return NULL;

Move the comment above the preceding if, as the erroring out it talks 
about is there.

Done


+  emit_note_after ((enum insn_note)NOTE_KIND (insn), BB_END (body));
 space after 'insn_note)'

@@ -173,14 +176,14 @@ doloop_condition_get (rtx_insn *doloop_pat)
   if (! REG_P (reg))
 return 0;
 -  /* Check if something = (plus (reg) (const_int -1)).
+  /* Check if something = (plus (reg) (const_int -n)).
  On IA-64, this decrement is wrapped in an if_then_else.  */
   inc_src = SET_SRC (inc);
   if (GET_CODE (inc_src) == IF_THEN_ELSE)
 inc_src = XEXP (inc_src, 1);
   if (GET_CODE (inc_src) != PLUS
   || XEXP (inc_src, 0) != reg
-  || XEXP (inc_src, 1) != constm1_rtx)
+  || !CONST_INT_P (XEXP (inc_src, 1)))

Do we ever check that inc_src is negative? We used to check if it was 
-1, now we only check it's a constant, but not a negative one, so I 
suspect this needs a:

|| INTVAL (XEXP (inc_src, 1)) >= 0

Good point. Done


@@ -492,7 +519,8 @@ doloop_modify (class loop *loop, class niter_desc *desc,

 case GE:
   /* Currently only GE tests against zero are supported.  */
   gcc_assert (XEXP (condition, 1) == const0_rtx);
-
+  /* FALLTHRU */
+    case GTU:
   noloop = constm1_rtx;

I spent a very long time staring at this trying to understand why 
noloop = constm1_rtx for GTU, where I thought it should've been (count 
& (n-1)). For the current use of doloop it doesn't matter because ARM 
is the only target using it and you set desc->noloop_assumptions to 
null_rtx in 'arm_attempt_dlstp_transform' so noloop is never used. 
However, if a different target accepts this GTU pattern then this 
target agnostic code will do the wrong thing.  I suggest we either:
 - set noloop to what we think might be the correct value, which if 
you ask me should be 'count & (XEXP (condition, 1))',
 - or add a gcc_assert (GET_CODE (condition) != GTU); under the if 
(desc->noloop_assumption); part and document why.  I have a slight 
preference for the assert given otherwise we are adding code that we 
can't test.


Yea, that's true tbh. I've done the latter, but also separated out the 
"case GTU:" and added a comment, so that it's more clear that the noloop 
things aren't used in the only implemented GTU case (Arm)


Thank you :)



LGTM otherwise (but I don't have the power to approve this ;)).

Kind regards,
Andre

From: Stamatis Markianos-Wright 
Sent: Thursday, November 16, 2023 11:36 AM
To: Stamatis Markianos-Wright via Gcc-patches; Richard Earnshaw; 
Richard Sandiford; Kyrylo Tkachov
Subject: [PING][PATCH 2/2] arm: Add support for MVE Tail-Predicated 
Low Overhead Loops


Pinging back to the top of reviewers' inboxes due to worry about Stage 1
End in a few days :)


See the last email for the latest version of the 2/2 patch. The 1/2
patch is A-Ok from Kyrill's earlier target-backend review.


On 10/11/2023 12:41, Stamatis Markianos-Wright wrote:


On 06/11/2023 17:29, Stamatis Markianos-Wright wrote:


On 06/11/2023 11:24, Richard Sandiford wrote:

Stamatis Markianos-Wright  writes:
One of the main reasons for reading the arm bits was to try to answer
the question: if we switch to a downcounting loop with a GE condition,
how do we make sure that the start value is not a large unsigned
number that is interpreted as negative by GE?  E.g. if the loop
originally counted up in steps of N and used an LTU condition,
it could stop at a value in the range [INT_MAX + 1, UINT_MAX].
But the loop might never iterate if we start counting down from
most values in that range.

Does the patch handle that?
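
The concern can be made concrete with a small model.  This is only a
sketch under simplifying assumptions, not the RTL the patch generates: both
function names are hypothetical, the up-counting loop uses a 64-bit
induction variable so the model itself cannot wrap, and the down-counting
loop tests "remaining > 0" before each iteration rather than GE after the
decrement.

```c
#include <assert.h>
#include <stdint.h>

/* Iterations of: for (i = 0; i <u LIMIT; i += STEP).  */
static uint64_t
iterations_up_ltu (uint32_t limit, uint32_t step)
{
  assert (step != 0);
  uint64_t count = 0;
  for (uint64_t i = 0; i < limit; i += step)
    count++;
  return count;
}

/* Iterations of a down-counting loop whose counter starts at LIMIT
   reinterpreted as a signed 32-bit value and which runs while the
   counter is positive.  */
static uint64_t
iterations_down_signed (uint32_t limit, uint32_t step)
{
  assert (step != 0);
  /* Two's-complement reinterpretation done in 64 bits, so there is no
     implementation-defined conversion.  */
  int64_t remaining = limit;
  if (limit > (uint32_t) INT32_MAX)
    remaining -= (int64_t) UINT32_MAX + 1;

  uint64_t count = 0;
  while (remaining > 0)
    {
      count++;
      remaining -= step;
    }
  return count;
}
```

For small limits the two agree, but for a limit of 0x80000000 with a step
of 0x10000000 the unsigned loop runs 8 times while the signed
down-counting loop does not run at all.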

So AFAICT this is actually handled in the generic code in
`doloop_valid_p`:

Loops of this kind fail because they are "desc->infinite", so no
loop-doloop conversion is attempted at all (even for standard
dls/le loops).

Thanks to that check I haven't been able to trigger anything like the
behaviour you describe, do you 

Re: [PATCH] AArch64: Cleanup memset expansion

2023-11-10 Thread Richard Earnshaw




On 10/11/2023 10:17, Wilco Dijkstra wrote:

Hi Kyrill,


+  /* Reduce the maximum size with -Os.  */
+  if (optimize_function_for_size_p (cfun))
+    max_set_size = 96;
+



 This is a new "magic" number in this code. It looks sensible, but how did 
you arrive at it?


We need 1 instruction to create the value to store (DUP or MOVI) and 1 STP
for every 32 bytes, so the 96 means 4 instructions for typical sizes
(sizes not a multiple of 16 can add one extra instruction).

I checked codesize on SPECINT2017, and 96 had practically identical size.
Using 128 would also be a reasonable Os value with a very slight size
increase, and 384 looks good for O2 - however I didn't want to tune these
values as this is a cleanup patch.

Cheers,
Wilco


Shouldn't this be a param then?  Also, manifest constants in the middle 
of code are a potential nightmare, please move it to a #define (even if 
that's then used as the default value for the param).


Re: [PATCH] AArch64: Cleanup memset expansion

2023-11-10 Thread Richard Earnshaw




On 10/11/2023 14:46, Kyrylo Tkachov wrote:




-Original Message-
From: Richard Earnshaw 
Sent: Friday, November 10, 2023 11:31 AM
To: Wilco Dijkstra ; Kyrylo Tkachov
; GCC Patches 
Cc: Richard Sandiford ; Richard Earnshaw

Subject: Re: [PATCH] AArch64: Cleanup memset expansion



On 10/11/2023 10:17, Wilco Dijkstra wrote:

Hi Kyrill,


+  /* Reduce the maximum size with -Os.  */
+  if (optimize_function_for_size_p (cfun))
+    max_set_size = 96;
+



 This is a new "magic" number in this code. It looks sensible, but how

did you arrive at it?


We need 1 instruction to create the value to store (DUP or MOVI) and 1 STP
for every 32 bytes, so the 96 means 4 instructions for typical sizes
(sizes not a multiple of 16 can add one extra instruction).


It would be useful to have that reasoning in the comment.



I checked codesize on SPECINT2017, and 96 had practically identical size.
Using 128 would also be a reasonable Os value with a very slight size
increase, and 384 looks good for O2 - however I didn't want to tune these
values as this is a cleanup patch.

Cheers,
Wilco


Shouldn't this be a param then?  Also, manifest constants in the middle
of code are a potential nightmare, please move it to a #define (even if
that's then used as the default value for the param).


I agree on making this a #define but I wouldn't insist on a param.
Code size IMO has a much more consistent right or wrong answer as it's 
statically determinable.
If this was a speed-related param then I'd expect the flexibility for the power 
user to override such heuristics would be more widely useful.
But for code size the compiler should always be able to get it right.

If Richard would still like the param then I'm fine with having the param, but 
I'd be okay with the comment above and making this a #define.


I don't immediately have a feel for how sensitive code would be to the 
precise value here.  Is this value something that might affect 
individual benchmarks in different ways?  Or something where a future 
architecture might want a different value?  For either of those reasons 
a param might be useful, but if this is primarily a code size trade off 
and the variation in performance is small, then it's probably not 
worthwhile having an additional hook.


R.


[committed 00/22] arm: testsuite: clean up some architecture-specific tests

2023-11-13 Thread Richard Earnshaw
A lot of the arm-specific compiler tests require a specific CPU or
architecture to be specified.  This causes problems if the test suite
run is set up to test a specific architecture or CPU that differs from
the test's requirements.  An example I use commonly is

set target_list { "arm-qemu{,-mthumb}" }

but it is possible to also test other architectures or CPUs this way, for
example,

set target_list { "arm-qemu{,-mthumb,
  -march=armv6t2+fp/-mfloat-abi=hard,
  -march=armv8-a+simd/-mthumb/-mfloat-abi=hard,
  -mcpu=cortex-m33/-mfloat-abi=softfp,
  -mcpu=cortex-m55/-mfloat-abi=hard,
  -mcpu=cortex-m23}" }

[line breaks inserted for readability]

tests 7 permutations of
 - base configuration
 - base configuration with -mthumb
 - armv6t2 with FP and a hard-float ABI
 - armv8-a with Neon and thumb and the hard-float ABI
 - cortex-m33 with the softfp ABI
 - cortex-m55 with the hard-float ABI
 - cortex-m23

Over time we have developed a series of checks that can be used to
ensure that we test what we want to test and don't test if the options
conflict, but these have been applied somewhat haphazardly and as the
framework has been improved tests haven't been updated to make full
use of the tests.

This patch series deploys the framework dg- directives more widely
across the arm-specific tests to make testing more consistent.  On
that long list of permutations above this results in the following
changes:

16 tests move from FAIL to PASS.
21 new FAILS.  
562 new tests that PASS
74 tests that passed have been removed

The new FAILs are real issues on targets that only support
single-precision FP and should be investigated at some point, but
probably aren't urgent given the use cases for cores with this issue.

The tests that have been removed come from the fact that we now more
accurately test that option combinations won't cause problems; they
are related to the fact that if the testrun config specifies -mcpu,
but the test sets -march, then we can get an architecture conflict.
I have some ideas about how to address this, but that's for a later
test series.

committed to master branch.

R.

Richard Earnshaw (22):
  arm: testsuite: correctly detect armv6t2 hardware for acle execution
tests
  arm: testsuite: correctly detect hard_float
  arm: testsuite: avoid hard-float ABI incompatibility with -march
  arm: testsuite: avoid problems with -mfpu=auto in pacbti-m-predef-11.c
  arm: testsuite: avoid problems with -mfpu=auto in attr-crypto.c
  arm: testsuite: avoid problems with -mfpu=auto in attr_thumb-static2.c
  arm: testsuite: tidy up pre-run check for g2.c
  arm: testsuite: improve compatibility of arm/lto/pr96939_1.c
  arm: testsuite: tidy up pr65647-2.c pre-checks.
  arm: testsuite: improve compatibility of arm/pr78353-*.c
  arm: testsuite: improve compatibility of pr88648-asm-syntax-unified.c
  arm: testsuite: improve compatibility of pragma_arch_attribute*.c
  arm: testsuite: improve compatibility of pragma_arch_switch_2.c
  arm: testsuite: modernize framework usage for arm/scd42-2.c
  arm: testsuite: improve compatibility of ftest-armv7m-thumb.c
  arm: testsuite: improve compatibility of gcc.target/arm/macro_defs*.c
  arm: testsuite: improve compatibility of
gcc.target/arm/optional_thumb-*.c
  arm: testsuite: improve compatibility of gcc.target/arm/pr19599.c
  arm: testsuite: improve compatibility of gcc.target/arm/pr59575.c
  testsuite: arm: tighten up mode-specific ISA tests
  arm: testsuite: fix some more architecture tests
  arm: testsuite: improve compatibility of gcc.dg/debug/pr57351.c

 gcc/testsuite/gcc.dg/debug/pr57351.c  |  7 +-
 .../arm/acle/data-intrinsics-armv6.c  |  2 +-
 .../arm/acle/data-intrinsics-rbit.c   |  2 +-
 .../gcc.target/arm/acle/pacbti-m-predef-11.c  |  2 +-
 gcc/testsuite/gcc.target/arm/attr-crypto.c|  2 +-
 .../gcc.target/arm/attr_thumb-static2.c   |  2 +-
 .../gcc.target/arm/ftest-armv7m-thumb.c   |  3 +-
 gcc/testsuite/gcc.target/arm/g2.c | 10 +-
 gcc/testsuite/gcc.target/arm/lto/pr96939_1.c  |  2 +-
 gcc/testsuite/gcc.target/arm/macro_defs0.c|  7 +-
 gcc/testsuite/gcc.target/arm/macro_defs1.c|  6 +-
 gcc/testsuite/gcc.target/arm/macro_defs2.c|  6 +-
 .../gcc.target/arm/optional_thumb-1.c |  2 +-
 .../gcc.target/arm/optional_thumb-3.c |  4 +-
 gcc/testsuite/gcc.target/arm/pr19599.c|  2 +-
 gcc/testsuite/gcc.target/arm/pr59575.c|  4 +-
 gcc/testsuite/gcc.target/arm/pr60650-2.c  |  4 +-
 gcc/testsuite/gcc.target/arm/pr60657.c|  4 +-
 gcc/testsuite/gcc.target/arm/pr60663.c|  4 +-
 gcc/testsuite/gcc.target/arm/pr65647-2.c  |  3 +-
 gcc/testsuite/gcc.target/arm/pr78353-1.c  |  3 +-
 gcc/testsuite/gcc.target/arm/pr78353-2.c  |  3 +-
 gcc/testsuite/gcc.target/arm/pr81863.c|  4 +-
 .../arm/pr88648-asm-syntax-unified.c  |  2 +-
 gcc/test

[committed 01/22] arm: testsuite: correctly detect armv6t2 hardware for acle execution tests

2023-11-13 Thread Richard Earnshaw

Some of the ACLE tests for Arm are executable, but we were only testing
that the compiler could generate code for them, not that the hardware
was capable of executing them.  Fix this by adding an execution test for
suitable hardware.

gcc/testsuite:

* lib/target-supports.exp (check_effective_target_arm_arch_v6t2_hw_ok):
New function.
* gcc.target/arm/acle/data-intrinsics-armv6.c: Use it.
* gcc.target/arm/acle/data-intrinsics-rbit.c: Likewise.
---
 .../arm/acle/data-intrinsics-armv6.c   |  2 +-
 .../gcc.target/arm/acle/data-intrinsics-rbit.c |  2 +-
 gcc/testsuite/lib/target-supports.exp  | 18 ++
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/acle/data-intrinsics-armv6.c b/gcc/testsuite/gcc.target/arm/acle/data-intrinsics-armv6.c
index 988ecac3787..6dc8c55e2f9 100644
--- a/gcc/testsuite/gcc.target/arm/acle/data-intrinsics-armv6.c
+++ b/gcc/testsuite/gcc.target/arm/acle/data-intrinsics-armv6.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-require-effective-target arm_arch_v6t2_ok } */
+/* { dg-require-effective-target arm_arch_v6t2_hw_ok } */
 /* { dg-add-options arm_arch_v6t2 } */
 
 #include "arm_acle.h"
diff --git a/gcc/testsuite/gcc.target/arm/acle/data-intrinsics-rbit.c b/gcc/testsuite/gcc.target/arm/acle/data-intrinsics-rbit.c
index d1fe274b5ce..b01c4219a7e 100644
--- a/gcc/testsuite/gcc.target/arm/acle/data-intrinsics-rbit.c
+++ b/gcc/testsuite/gcc.target/arm/acle/data-intrinsics-rbit.c
@@ -1,6 +1,6 @@
 /* Test the ACLE data intrinsics existence for specific instruction.  */
 /* { dg-do run } */
-/* { dg-require-effective-target arm_arch_v6t2_ok } */
+/* { dg-require-effective-target arm_arch_v6t2_hw_ok } */
 /* { dg-additional-options "--save-temps -O1" } */
 /* { dg-add-options arm_arch_v6t2 } */
 /* { dg-final { check-function-bodies "**" "" "" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 1a7bea96c1e..d414cddf4dc 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5590,6 +5590,24 @@ proc check_effective_target_arm_thumb1_cbz_ok {} {
 }
 }
 
+# Return 1 if this is an Arm target which supports the Armv6t2 extensions.
+# This can be either in Arm state or in Thumb state.
+
+proc check_effective_target_arm_arch_v6t2_hw_ok {} {
+if [check_effective_target_arm_thumb1_ok] {
+	return [check_no_compiler_messages arm_movt object {
+	int
+	main (void)
+	{
+	  asm ("bfc r0, #1, #2");
+	  return 0;
+	}
+	} [add_options_for_arm_arch_v6t2 ""]]
+} else {
+	return 0
+}
+}
+
 # Return 1 if this is an ARM target where ARMv8-M Security Extensions is
 # available.
 


[committed 02/22] arm: testsuite: correctly detect hard_float

2023-11-13 Thread Richard Earnshaw

Add an Arm-specific test to check_effective_target_hard_float to handle
cases where we only have single-precision FP in hardware.
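
For context, here is a minimal sketch of why comparing __ARM_FP against
zero also covers single-precision-only cores.  It assumes the ACLE
definition of __ARM_FP as a bitmask (bit 1 half, bit 2 single, bit 3
double precision).  The helper names are hypothetical and not part of
the patch.

```c
#include <stdbool.h>

/* __ARM_FP bit assignments, assumed from the ACLE specification.  */
#define FP_HALF   0x2u
#define FP_SINGLE 0x4u
#define FP_DOUBLE 0x8u

/* Hypothetical helper: mirrors the "__ARM_FP == 0" test in the patch.
   Any non-zero mask means some FP hardware is available.  */
bool
has_hard_float (unsigned arm_fp)
{
  return arm_fp != 0;
}

/* Hypothetical helper: true when single-precision hardware is present
   but double-precision hardware is not.  */
bool
single_precision_only (unsigned arm_fp)
{
  return (arm_fp & FP_SINGLE) != 0 && (arm_fp & FP_DOUBLE) == 0;
}
```

An undefined __ARM_FP evaluates to 0 inside #if, so a single comparison
against zero handles both the "macro absent" and "no FP bits set" cases.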

gcc/testsuite:

* lib/target-supports.exp (check_effective_target_hard_float): Add
arm-specific test.
---
 gcc/testsuite/lib/target-supports.exp | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index d414cddf4dc..ee173b9fb6b 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -1420,6 +1420,17 @@ proc check_effective_target_mpaired_single { args } {
 # Return true if the target has access to FPU instructions.
 
 proc check_effective_target_hard_float { } {
+# This should work on cores that only have single-precision,
+# and should also correctly handle legacy cores that had thumb1 and
+# lacked FP support for that, but had it in Arm state.
+if { [istarget arm*-*-*] } {
+	return [check_no_compiler_messages hard_float assembly {
+		#if __ARM_FP == 0
+		#error __arm_soft_float
+		#endif
+	}]
+}
+
 if { [istarget loongarch*-*-*] } {
 	return [check_no_compiler_messages hard_float assembly {
 		#if (defined __loongarch_soft_float)


[committed 07/22] arm: testsuite: tidy up pre-run check for g2.c

2023-11-13 Thread Richard Earnshaw

gcc.target/arm/g2.c is an xscale-only test, but the test is quite old
and we have improved the infrastructure for setting up such tests now.
So make use of that to reduce the number of cases where this test fails
to run.

gcc/testsuite:

* lib/target-supports.exp (check_effective_target_arm_arch_FUNC_ok):
Add entry to check for xscale.
* gcc.target/arm/g2.c: Use it.
---
 gcc/testsuite/gcc.target/arm/g2.c | 10 --
 gcc/testsuite/lib/target-supports.exp |  1 +
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/g2.c b/gcc/testsuite/gcc.target/arm/g2.c
index ca5e3ccff66..04334c97713 100644
--- a/gcc/testsuite/gcc.target/arm/g2.c
+++ b/gcc/testsuite/gcc.target/arm/g2.c
@@ -1,11 +1,9 @@
 /* Verify that hardware multiply is preferred on XScale. */
 /* { dg-do compile } */
-/* { dg-options "-mcpu=xscale -O2 -marm" } */
-/* { dg-skip-if "Test is specific to the Xscale" { arm*-*-* } { "-march=*" } { "-march=xscale" } } */
-/* { dg-skip-if "Test is specific to the Xscale" { arm*-*-* } { "-mcpu=*" } { "-mcpu=xscale" } } */
-/* { dg-skip-if "Test is specific to ARM mode" { arm*-*-* } { "-mthumb" } { "" } } */
-/* { dg-require-effective-target arm_arch_v5te_arm_ok } */
-/* { dg-require-effective-target arm32 } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target arm_arch_xscale_arm_ok } */
+/* { dg-add-options arm_arch_xscale_arm } */
+
 
 /* Brett Gaines' test case. */
 unsigned BCPL(unsigned) __attribute__ ((naked));
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 7d83bd8740f..9d2958626ad 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5411,6 +5411,7 @@ foreach { armfunc armflag armdefs } {
 	v5te "-march=armv5te+fp -mfloat-abi=softfp" __ARM_ARCH_5TE__
 	v5te_arm "-march=armv5te+fp -marm" __ARM_ARCH_5TE__
 	v5te_thumb "-march=armv5te+fp -mthumb -mfloat-abi=softfp" __ARM_ARCH_5TE__
+	xscale_arm "-mcpu=xscale -mfloat-abi=soft -marm" __XSCALE__
 	v6 "-march=armv6+fp -mfloat-abi=softfp" __ARM_ARCH_6__
 	v6_arm "-march=armv6+fp -marm" __ARM_ARCH_6__
 	v6_thumb "-march=armv6+fp -mthumb -mfloat-abi=softfp" __ARM_ARCH_6__


[committed 10/22] arm: testsuite: improve compatibility of arm/pr78353-*.c

2023-11-13 Thread Richard Earnshaw

Again, use the infrastructure available to improve the compatibility
of these tests.

gcc/testsuite:

* gcc.target/arm/pr78353-1.c: Use dg-add-options to manage target
flags.
* gcc.target/arm/pr78353-2.c: Likewise.
---
 gcc/testsuite/gcc.target/arm/pr78353-1.c | 3 ++-
 gcc/testsuite/gcc.target/arm/pr78353-2.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/pr78353-1.c b/gcc/testsuite/gcc.target/arm/pr78353-1.c
index a107e300269..56480774ce4 100644
--- a/gcc/testsuite/gcc.target/arm/pr78353-1.c
+++ b/gcc/testsuite/gcc.target/arm/pr78353-1.c
@@ -1,6 +1,7 @@
 /* { dg-do link }  */
 /* { dg-require-effective-target arm_arch_v7a_multilib } */
-/* { dg-options "-march=armv7-a -mthumb -O2 -flto -Wa,-mimplicit-it=always" }  */
+/* { dg-options "-mthumb -O2 -flto -Wa,-mimplicit-it=always" }  */
+/* { dg-add-options arm_arch_v7a } */
 
 int main(int x)
 {
diff --git a/gcc/testsuite/gcc.target/arm/pr78353-2.c b/gcc/testsuite/gcc.target/arm/pr78353-2.c
index 2589e6135aa..c070d7275bc 100644
--- a/gcc/testsuite/gcc.target/arm/pr78353-2.c
+++ b/gcc/testsuite/gcc.target/arm/pr78353-2.c
@@ -1,6 +1,7 @@
 /* { dg-do link }  */
 /* { dg-require-effective-target arm_arch_v7a_multilib } */
-/* { dg-options "-march=armv7-a -mthumb -O2 -flto -Wa,-mimplicit-it=always,-mthumb" }  */
+/* { dg-options "-mthumb -O2 -flto -Wa,-mimplicit-it=always,-mthumb" }  */
+/* { dg-add-options arm_arch_v7a } */
 
 int main(int x)
 {


[committed 04/22] arm: testsuite: avoid problems with -mfpu=auto in pacbti-m-predef-11.c

2023-11-13 Thread Richard Earnshaw

This test overrides the architecture, but fails to describe which
floating-point features are needed.  This causes problems if the ABI
requires FP for parameter passing and -mfpu=auto is selected, so ensure
that one is specified.

gcc/testsuite:

* gcc.target/arm/acle/pacbti-m-predef-11.c: Add +fp to the -march
specification.
---
 gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-11.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-11.c b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-11.c
index 9f2711097ac..6a5ae92c567 100644
--- a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-11.c
+++ b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-11.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" "-mcpu=*" "-mfloat-abi=*" } } */
-/* { dg-options "-march=armv8.1-m.main+pacbti" } */
+/* { dg-options "-march=armv8.1-m.main+fp+pacbti" } */
 
 #if (__ARM_FEATURE_BTI != 1)
 #error "Feature test macro __ARM_FEATURE_BTI_DEFAULT should be defined to 1."


[committed 03/22] arm: testsuite: avoid hard-float ABI incompatibility with -march

2023-11-13 Thread Richard Earnshaw

A number of tests in the gcc testsuite, especially for arm-specific
targets, add various flags to control the architecture.  These run
into problems when the compiler is configured with -mfpu=auto if the
new architecture lacks an architectural feature that implies we have
floating-point instructions.

The testsuite makes this worse as it falls foul of this requirement in
the base architecture strings provided by target-supports.exp.

To fix this we add "+fp", or something equivalent to this, to all the
base architecture specifications.  The feature will be ignored if the
float ABI is set to soft.

gcc/testsuite:

* lib/target-supports.exp (check_effective_target_arm_arch_FUNC_ok):
Add base FPU specifications to all architectures that can support
one.
---
 gcc/testsuite/lib/target-supports.exp | 50 +--
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index ee173b9fb6b..7d83bd8740f 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5408,36 +5408,36 @@ foreach { armfunc armflag armdefs } {
 	v5t "-march=armv5t -mfloat-abi=softfp" __ARM_ARCH_5T__
 	v5t_arm "-march=armv5t -marm" __ARM_ARCH_5T__
 	v5t_thumb "-march=armv5t -mthumb -mfloat-abi=softfp" __ARM_ARCH_5T__
-	v5te "-march=armv5te -mfloat-abi=softfp" __ARM_ARCH_5TE__
-	v5te_arm "-march=armv5te -marm" __ARM_ARCH_5TE__
-	v5te_thumb "-march=armv5te -mthumb -mfloat-abi=softfp" __ARM_ARCH_5TE__
-	v6 "-march=armv6 -mfloat-abi=softfp" __ARM_ARCH_6__
-	v6_arm "-march=armv6 -marm" __ARM_ARCH_6__
-	v6_thumb "-march=armv6 -mthumb -mfloat-abi=softfp" __ARM_ARCH_6__
-	v6k "-march=armv6k -mfloat-abi=softfp" __ARM_ARCH_6K__
-	v6k_arm "-march=armv6k -marm" __ARM_ARCH_6K__
-	v6k_thumb "-march=armv6k -mthumb -mfloat-abi=softfp" __ARM_ARCH_6K__
-	v6t2 "-march=armv6t2" __ARM_ARCH_6T2__
-	v6z "-march=armv6z -mfloat-abi=softfp" __ARM_ARCH_6Z__
-	v6z_arm "-march=armv6z -marm" __ARM_ARCH_6Z__
-	v6z_thumb "-march=armv6z -mthumb -mfloat-abi=softfp" __ARM_ARCH_6Z__
+	v5te "-march=armv5te+fp -mfloat-abi=softfp" __ARM_ARCH_5TE__
+	v5te_arm "-march=armv5te+fp -marm" __ARM_ARCH_5TE__
+	v5te_thumb "-march=armv5te+fp -mthumb -mfloat-abi=softfp" __ARM_ARCH_5TE__
+	v6 "-march=armv6+fp -mfloat-abi=softfp" __ARM_ARCH_6__
+	v6_arm "-march=armv6+fp -marm" __ARM_ARCH_6__
+	v6_thumb "-march=armv6+fp -mthumb -mfloat-abi=softfp" __ARM_ARCH_6__
+	v6k "-march=armv6k+fp -mfloat-abi=softfp" __ARM_ARCH_6K__
+	v6k_arm "-march=armv6k+fp -marm" __ARM_ARCH_6K__
+	v6k_thumb "-march=armv6k+fp -mthumb -mfloat-abi=softfp" __ARM_ARCH_6K__
+	v6t2 "-march=armv6t2+fp" __ARM_ARCH_6T2__
+	v6z "-march=armv6z+fp -mfloat-abi=softfp" __ARM_ARCH_6Z__
+	v6z_arm "-march=armv6z+fp -marm" __ARM_ARCH_6Z__
+	v6z_thumb "-march=armv6z+fp -mthumb -mfloat-abi=softfp" __ARM_ARCH_6Z__
 	v6m "-march=armv6-m -mthumb -mfloat-abi=soft" __ARM_ARCH_6M__
-	v7a "-march=armv7-a" __ARM_ARCH_7A__
-	v7r "-march=armv7-r" __ARM_ARCH_7R__
+	v7a "-march=armv7-a+fp" __ARM_ARCH_7A__
+	v7r "-march=armv7-r+fp" __ARM_ARCH_7R__
 	v7m "-march=armv7-m -mthumb" __ARM_ARCH_7M__
-	v7em "-march=armv7e-m -mthumb" __ARM_ARCH_7EM__
-	v7ve "-march=armv7ve -marm"
+	v7em "-march=armv7e-m+fp -mthumb" __ARM_ARCH_7EM__
+	v7ve "-march=armv7ve+fp -marm"
 		"__ARM_ARCH_7A__ && __ARM_FEATURE_IDIV"
-	v8a "-march=armv8-a" __ARM_ARCH_8A__
-	v8a_hard "-march=armv8-a -mfpu=neon-fp-armv8 -mfloat-abi=hard" __ARM_ARCH_8A__
-	v8_1a "-march=armv8.1-a" __ARM_ARCH_8A__
-	v8_2a "-march=armv8.2-a" __ARM_ARCH_8A__
-	v8r "-march=armv8-r" __ARM_ARCH_8R__
+	v8a "-march=armv8-a+simd" __ARM_ARCH_8A__
+	v8a_hard "-march=armv8-a+simd -mfpu=auto -mfloat-abi=hard" __ARM_ARCH_8A__
+	v8_1a "-march=armv8.1-a+simd" __ARM_ARCH_8A__
+	v8_2a "-march=armv8.2-a+simd" __ARM_ARCH_8A__
+	v8r "-march=armv8-r+fp.sp" __ARM_ARCH_8R__
 	v8m_base "-march=armv8-m.base -mthumb -mfloat-abi=soft"
 		__ARM_ARCH_8M_BASE__
-	v8m_main "-march=armv8-m.main -mthumb" __ARM_ARCH_8M_MAIN__
-	v8_1m_main "-march=armv8.1-m.main -mthumb" __ARM_ARCH_8M_MAIN__
-	v9a "-march=armv9-a" __ARM_ARCH_9A__ } {
+	v8m_main "-march=armv8-m.main+fp -mthumb" __ARM_ARCH_8M_MAIN__
+	v8_1m_main "-march=armv8.1-m.main+fp -mthumb" __ARM_ARCH_8M_MAIN__
+	v9a "-march=armv9-a+simd" __ARM_ARCH_9A__ } {
 eval [string map [list FUNC $armfunc FLAG $armflag DEFS $armdefs ] {
 	proc check_effective_target_arm_arch_FUNC_ok { } {
 	return [check_no_compiler_messages arm_arch_FUNC_ok assembly {


[committed 06/22] arm: testsuite: avoid problems with -mfpu=auto in attr_thumb-static2.c

2023-11-13 Thread Richard Earnshaw

This test overrides the architecture, but fails to describe which
floating-point features are needed.  This causes problems if the ABI
requires FP for parameter passing and -mfpu=auto is selected, so ensure
that one is specified.

gcc/testsuite:

* gcc.target/arm/attr_thumb-static2.c: Add +fp to the -march
specification.
---
 gcc/testsuite/gcc.target/arm/attr_thumb-static2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/attr_thumb-static2.c b/gcc/testsuite/gcc.target/arm/attr_thumb-static2.c
index 77454343b23..a38f9a95607 100644
--- a/gcc/testsuite/gcc.target/arm/attr_thumb-static2.c
+++ b/gcc/testsuite/gcc.target/arm/attr_thumb-static2.c
@@ -2,7 +2,7 @@
 
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_arch_v7a_ok } */
-/* { dg-options "-O0 -march=armv7-a" } */
+/* { dg-options "-O0 -march=armv7-a+fp" } */
 
 struct _NSPoint
 {


[committed 11/22] arm: testsuite: improve compatibility of pr88648-asm-syntax-unified.c

2023-11-13 Thread Richard Earnshaw

Fix another test that was trying to set the architecture directly
rather than using the infrastructure as intended.

gcc/testsuite:

* gcc.target/arm/pr88648-asm-syntax-unified.c: It isn't necessary
to try to override the architecture flags specified by arm_arch_v7a.
---
 gcc/testsuite/gcc.target/arm/pr88648-asm-syntax-unified.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/pr88648-asm-syntax-unified.c b/gcc/testsuite/gcc.target/arm/pr88648-asm-syntax-unified.c
index 251b4d5bc9d..53d0bb053fc 100644
--- a/gcc/testsuite/gcc.target/arm/pr88648-asm-syntax-unified.c
+++ b/gcc/testsuite/gcc.target/arm/pr88648-asm-syntax-unified.c
@@ -1,8 +1,8 @@
 /* Test for unified syntax assembly generation.  */
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_arch_v7a_ok } */
+/* { dg-options "-marm -masm-syntax-unified" } */
 /* { dg-add-options arm_arch_v7a } */
-/* { dg-options "-marm -march=armv7-a -masm-syntax-unified" } */
 
 void test ()
 {


[committed 17/22] arm: testsuite: improve compatibility of gcc.target/arm/optional_thumb-*.c

2023-11-13 Thread Richard Earnshaw

These tests deliberately pass invalid option combinations to check
that the compiler is generating the correct diagnostic.  Nevertheless,
we can improve their compatibility with other testsuite options.  For
optional_thumb-1.c we use a soft-float ABI, while for
optional_thumb-3.c we use arm_arch_v7em as the target architecture,
then set the architecture manually.

gcc/testsuite:

* gcc.target/arm/optional_thumb-1.c: Force a soft-float ABI.
* gcc.target/arm/optional_thumb-3.c: Check for armv7e-m compatibility,
then set the architecture explicitly.
---
 gcc/testsuite/gcc.target/arm/optional_thumb-1.c | 2 +-
 gcc/testsuite/gcc.target/arm/optional_thumb-3.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/optional_thumb-1.c b/gcc/testsuite/gcc.target/arm/optional_thumb-1.c
index 99cb0c3f33b..90d9ada6ade 100644
--- a/gcc/testsuite/gcc.target/arm/optional_thumb-1.c
+++ b/gcc/testsuite/gcc.target/arm/optional_thumb-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile { target { ! default_mode } } } */
 /* { dg-skip-if "-marm/-mthumb/-march/-mcpu given" { *-*-* } { "-marm" "-mthumb" "-march=*" "-mcpu=*" } } */
-/* { dg-options "-march=armv6-m" } */
+/* { dg-options "-march=armv6-m -mfloat-abi=soft" } */
 
 /* Check that -mthumb is not needed when compiling for a Thumb-only target.  */
 
diff --git a/gcc/testsuite/gcc.target/arm/optional_thumb-3.c b/gcc/testsuite/gcc.target/arm/optional_thumb-3.c
index d9150e09e47..a6c661ac031 100644
--- a/gcc/testsuite/gcc.target/arm/optional_thumb-3.c
+++ b/gcc/testsuite/gcc.target/arm/optional_thumb-3.c
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
-/* { dg-require-effective-target arm_cortex_m } */
+/* { dg-require-effective-target arm_arch_v7em_ok } */
 /* { dg-skip-if "-mthumb given" { *-*-* } { "-mthumb" } } */
-/* { dg-options "-marm" } */
+/* { dg-options "-march=armv7e-m+fp -marm" } */
 /* { dg-error "target CPU does not support ARM mode" "missing error with -marm on Thumb-only targets" { target *-*-* } 0 } */
 
 /* Check that -marm gives an error when compiling for a Thumb-only target.  */


[committed 05/22] arm: testsuite: avoid problems with -mfpu=auto in attr-crypto.c

2023-11-13 Thread Richard Earnshaw

This test overrides the architecture, but fails to describe which
floating-point features are needed.  This causes problems if the ABI
requires FP for parameter passing and -mfpu=auto is selected, so ensure
that one is specified.

gcc/testsuite:

* gcc.target/arm/attr-crypto.c: Add +simd to the -march
specification.
---
 gcc/testsuite/gcc.target/arm/attr-crypto.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/attr-crypto.c b/gcc/testsuite/gcc.target/arm/attr-crypto.c
index 05e458f36b6..3959d0b67e7 100644
--- a/gcc/testsuite/gcc.target/arm/attr-crypto.c
+++ b/gcc/testsuite/gcc.target/arm/attr-crypto.c
@@ -3,7 +3,7 @@
pragma.  */
 /* { dg-skip-if "-mpure-code supports M-profile only" { *-*-* } { "-mpure-code" } } */
 /* { dg-require-effective-target arm_fp_ok } */
-/* { dg-options "-O2 -march=armv8-a" } */
+/* { dg-options "-O2 -march=armv8-a+simd" } */
 /* { dg-add-options arm_fp } */
 
 /* Reset fpu to a value compatible with the next pragmas.  */


[committed 08/22] arm: testsuite: improve compatibility of arm/lto/pr96939_1.c

2023-11-13 Thread Richard Earnshaw

This test overrides the architecture, but fails to specify the
floating point architecture.  This causes problems if -mfpu=auto is
used.

gcc/testsuite:

* gcc.target/arm/lto/pr96939_1.c: Add +simd to the architecture
specification.
---
 gcc/testsuite/gcc.target/arm/lto/pr96939_1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/lto/pr96939_1.c b/gcc/testsuite/gcc.target/arm/lto/pr96939_1.c
index 53c6093e803..4afdbdaf5ad 100644
--- a/gcc/testsuite/gcc.target/arm/lto/pr96939_1.c
+++ b/gcc/testsuite/gcc.target/arm/lto/pr96939_1.c
@@ -1,5 +1,5 @@
 /* PR target/96939 */
-/* { dg-options "-march=armv8-a+crc" } */
+/* { dg-options "-march=armv8-a+simd+crc" } */
 
 #include 
 


[committed 12/22] arm: testsuite: improve compatibility of pragma_arch_attribute*.c

2023-11-13 Thread Richard Earnshaw

These tests use pragmas and attributes to change the architecture.
Sometimes they simply add a feature using "+crc", but other times they
try to completely reset the architecture using "arch=armv8-a+crc".
The latter fails on a hard-float ABI with -mfpu=auto because it also
clears the FP capability.  Fix by adding +simd when the full
architecture is specified.

gcc/testsuite:

* gcc.target/arm/pragma_arch_attribute.c: Add +simd to pragmas that
set an explicit architecture.
* gcc.target/arm/pragma_arch_attribute_2.c: Likewise.
* gcc.target/arm/pragma_arch_attribute_3.c: Likewise.
---
 gcc/testsuite/gcc.target/arm/pragma_arch_attribute.c   | 6 +++---
 gcc/testsuite/gcc.target/arm/pragma_arch_attribute_2.c | 2 +-
 gcc/testsuite/gcc.target/arm/pragma_arch_attribute_3.c | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/pragma_arch_attribute.c b/gcc/testsuite/gcc.target/arm/pragma_arch_attribute.c
index a06dbf04037..a5e1edad3a4 100644
--- a/gcc/testsuite/gcc.target/arm/pragma_arch_attribute.c
+++ b/gcc/testsuite/gcc.target/arm/pragma_arch_attribute.c
@@ -10,7 +10,7 @@
 #endif
 
 #pragma GCC push_options
-#pragma GCC target ("arch=armv8-a+crc")
+#pragma GCC target ("arch=armv8-a+simd+crc")
 #ifndef __ARM_FEATURE_CRC32
 # error "__ARM_FEATURE_CRC32 is not defined in push 1."
 #endif
@@ -41,7 +41,7 @@ void test_crc_unknown_ok_attr_1 ()
 # error "__ARM_FEATURE_CRC32 is defined after attribute set 1."
 #endif
 
-__attribute__((target("arch=armv8-a+crc")))
+__attribute__((target("arch=armv8-a+simd+crc")))
 void test_crc_unknown_ok_attr_2 ()
 {
 	__crc32b (0, 0);
@@ -51,4 +51,4 @@ void test_crc_unknown_ok_attr_2 ()
 # error "__ARM_FEATURE_CRC32 is defined after attribute set 2."
 #endif
 
-#pragma GCC reset_options
\ No newline at end of file
+#pragma GCC reset_options
diff --git a/gcc/testsuite/gcc.target/arm/pragma_arch_attribute_2.c b/gcc/testsuite/gcc.target/arm/pragma_arch_attribute_2.c
index 2e8e385774b..189af170096 100644
--- a/gcc/testsuite/gcc.target/arm/pragma_arch_attribute_2.c
+++ b/gcc/testsuite/gcc.target/arm/pragma_arch_attribute_2.c
@@ -8,7 +8,7 @@
 
 extern uint32_t bar();
 
-__attribute__((target("arch=armv8-a+crc"))) uint32_t crc32cw(uint32_t crc, uint32_t val)
+__attribute__((target("arch=armv8-a+simd+crc"))) uint32_t crc32cw(uint32_t crc, uint32_t val)
 {
 uint32_t res;
 asm("crc32cw %0, %1, %2" : "=r"(res) : "r"(crc), "r"(val));
diff --git a/gcc/testsuite/gcc.target/arm/pragma_arch_attribute_3.c b/gcc/testsuite/gcc.target/arm/pragma_arch_attribute_3.c
index 3714812cf26..eb7f990477b 100644
--- a/gcc/testsuite/gcc.target/arm/pragma_arch_attribute_3.c
+++ b/gcc/testsuite/gcc.target/arm/pragma_arch_attribute_3.c
@@ -9,7 +9,7 @@
 extern uint32_t bar();
 
 #pragma GCC push_options
-#pragma GCC target("arch=armv8-a+crc")
+#pragma GCC target("arch=armv8-a+simd+crc")
 uint32_t crc32cw(uint32_t crc, uint32_t val)
 {
 uint32_t res;


[committed 14/22] arm: testsuite: modernize framework usage for arm/scd42-2.c

2023-11-13 Thread Richard Earnshaw

Make this test more useful by using dg-require-effective-target/
dg-add-options.

gcc/testsuite:

* gcc.target/arm/scd42-2.c: Use modern dg- flags.
---
 gcc/testsuite/gcc.target/arm/scd42-2.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/scd42-2.c b/gcc/testsuite/gcc.target/arm/scd42-2.c
index 3c9768d22d9..cd416885a80 100644
--- a/gcc/testsuite/gcc.target/arm/scd42-2.c
+++ b/gcc/testsuite/gcc.target/arm/scd42-2.c
@@ -1,11 +1,8 @@
 /* Verify that mov is preferred on XScale for loading a 2 byte constant. */
 /* { dg-do compile } */
-/* { dg-skip-if "Test is specific to the Xscale" { arm*-*-* } { "-march=*" } { "-march=xscale" } } */
-/* { dg-skip-if "Test is specific to the Xscale" { arm*-*-* } { "-mcpu=*" } { "-mcpu=xscale" } } */
-/* { dg-skip-if "Test is specific to ARM mode" { arm*-*-* } { "-mthumb" } { "" } } */
-/* { dg-require-effective-target arm32 } */
-/* { dg-require-effective-target arm_arch_v5te_arm_ok } */
-/* { dg-options "-mcpu=xscale -O -marm" } */
+/* { dg-require-effective-target arm_arch_xscale_arm_ok } */
+/* { dg-options "-O" } */
+/* { dg-add-options arm_arch_xscale_arm } */
 
 unsigned load2(void) __attribute__ ((naked));
 unsigned load2(void)


[committed 13/22] arm: testsuite: improve compatibility of pragma_arch_switch_2.c

2023-11-13 Thread Richard Earnshaw

This test was explicitly setting the architecture on the command-line and
in the body of the test.  In both cases this causes problems with the auto
FPU setting.  Fix by using the testsuite infrastructure correctly and by
adding +fp to the pragma.

gcc/testsuite:

* gcc.target/arm/pragma_arch_switch_2.c: Use testsuite infrastructure
to set the architecture flags.  Add +fp to the pragma that changes the
architecture.
---
 gcc/testsuite/gcc.target/arm/pragma_arch_switch_2.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/pragma_arch_switch_2.c b/gcc/testsuite/gcc.target/arm/pragma_arch_switch_2.c
index 5080d2c7a91..567943bd8ed 100644
--- a/gcc/testsuite/gcc.target/arm/pragma_arch_switch_2.c
+++ b/gcc/testsuite/gcc.target/arm/pragma_arch_switch_2.c
@@ -3,9 +3,10 @@
 /* { dg-do assemble } */
 /* { dg-require-effective-target arm_arm_ok } */
 /* { dg-require-effective-target arm_arch_v5te_arm_ok } */
-/* { dg-additional-options "-Wall -O2 -march=armv5te -std=gnu99 -marm" } */
+/* { dg-additional-options "-Wall -O2 -std=gnu99" } */
+/* { dg-add-options arm_arch_v5te_arm } */
 
-#pragma GCC target ("arch=armv6")
+#pragma GCC target ("arch=armv6+fp")
 int test_assembly (int hi, int lo)
 {
int res;


[committed 09/22] arm: testsuite: tidy up pr65647-2.c pre-checks.

2023-11-13 Thread Richard Earnshaw

Another case where we can make better use of the infrastructure to
improve the compatibility of this test.

gcc/testsuite:

* gcc.target/arm/pr65647-2.c: Use dg-add-options to manage target
flags.
---
 gcc/testsuite/gcc.target/arm/pr65647-2.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/pr65647-2.c b/gcc/testsuite/gcc.target/arm/pr65647-2.c
index e3978e512ea..79637bfd9d7 100644
--- a/gcc/testsuite/gcc.target/arm/pr65647-2.c
+++ b/gcc/testsuite/gcc.target/arm/pr65647-2.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_arch_v6_arm_ok } */
-/* { dg-options "-O3 -marm -march=armv6 -std=c99" } */
+/* { dg-options "-O3 -std=c99" } */
+/* { dg-add-options arm_arch_v6_arm } */
 
 typedef struct {
   int i;


[committed 19/22] arm: testsuite: improve compatibility of gcc.target/arm/pr59575.c

2023-11-13 Thread Richard Earnshaw

Use dg-require-effective-target/dg-add-options to improve
compatibility of this test with various compiler configurations.
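
As an illustration of the pattern, a hypothetical test header might look
like the sketch below.  The file and the function in it are invented for
this example; only the directive names and the arm_arch_v7a selector come
from the patches in this series.

```c
/* Hypothetical test file, not part of this series.  */
/* { dg-do compile } */
/* { dg-require-effective-target arm_arch_v7a_ok } */
/* { dg-options "-O2" } */
/* { dg-add-options arm_arch_v7a } */

/* Arbitrary body; any code valid for the selected architecture would do.  */
int
add_ints (int a, int b)
{
  return a + b;
}
```

Note that dg-options comes before dg-add-options, so that the
architecture flags are appended to the test's options rather than
replaced by them.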

gcc/testsuite:

* gcc.target/arm/pr59575.c: Use dg-require-effective-target and
dg-add-options.
---
 gcc/testsuite/gcc.target/arm/pr59575.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/pr59575.c b/gcc/testsuite/gcc.target/arm/pr59575.c
index cc49be3d61f..27d7d40526e 100644
--- a/gcc/testsuite/gcc.target/arm/pr59575.c
+++ b/gcc/testsuite/gcc.target/arm/pr59575.c
@@ -1,7 +1,9 @@
 /* PR target/59575 */
 /* { dg-do compile } */
+/* { dg-require-effective-target arm_arch_v7a_ok } */
 /* { dg-skip-if "-mpure-code supports M-profile only" { *-*-* } { "-mpure-code" } } */
-/* { dg-options "-Os -g -march=armv7-a" } */
+/* { dg-options "-Os -g" } */
+/* { dg-add-options arm_arch_v7a } */
 
 void foo (int *);
 int *bar (int, long long, int);


[committed 20/22] testsuite: arm: tighten up mode-specific ISA tests

2023-11-13 Thread Richard Earnshaw

Some of the standard Arm architecture tests require the test to use a
specific instruction set (arm or thumb).  But although the framework
was checking that the flag was accepted, it wasn't checking that the
flag wasn't somehow being overridden (e.g. by run-specific options).  We
can improve these tests easily by checking whether or not __thumb__ is
defined.
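
A rough sketch of what the tightened condition amounts to, written as
plain C rather than as the Tcl in target-supports.exp.  Both function
names are hypothetical; the only thing taken from the patch is the shape
of the test, for example "__ARM_ARCH_6__ && !__thumb__".

```c
#include <stdbool.h>

/* Hypothetical helper: reports whether this translation unit was
   compiled for Thumb state, based on the __thumb__ predefine.  */
bool
compiled_for_thumb (void)
{
#ifdef __thumb__
  return true;
#else
  return false;
#endif
}

/* Hypothetical mirror of an "_arm" entry: the architecture macro must be
   defined and the compiler must not have been switched to Thumb.  */
bool
arm_state_check_passes (bool arch_macro_defined)
{
  return arch_macro_defined && !compiled_for_thumb ();
}
```

With this form, an "_arm" check fails if some other option has forced
-mthumb, even though -marm itself was accepted on the command line.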

gcc/testsuite:

* lib/target-supports.exp (check_effective_target_arm_arch_FUNC_ok):
For instruction-set specific tests, check that __thumb__ is, or
isn't defined as appropriate.
---
 gcc/testsuite/lib/target-supports.exp | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 316e34a34be..3d504d26164 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5403,25 +5403,25 @@ proc check_effective_target_arm_fp16_hw { } {
 foreach { armfunc armflag armdefs } {
 	v4 "-march=armv4 -marm" __ARM_ARCH_4__
 	v4t "-march=armv4t -mfloat-abi=softfp" __ARM_ARCH_4T__
-	v4t_arm "-march=armv4t -marm" __ARM_ARCH_4T__
-	v4t_thumb "-march=armv4t -mthumb -mfloat-abi=softfp" __ARM_ARCH_4T__
+	v4t_arm "-march=armv4t -marm" "__ARM_ARCH_4T__ && !__thumb__"
+	v4t_thumb "-march=armv4t -mthumb -mfloat-abi=softfp" "__ARM_ARCH_4T__ && __thumb__"
 	v5t "-march=armv5t -mfloat-abi=softfp" __ARM_ARCH_5T__
-	v5t_arm "-march=armv5t -marm" __ARM_ARCH_5T__
-	v5t_thumb "-march=armv5t -mthumb -mfloat-abi=softfp" __ARM_ARCH_5T__
+	v5t_arm "-march=armv5t -marm" "__ARM_ARCH_5T__ && !__thumb__"
+	v5t_thumb "-march=armv5t -mthumb -mfloat-abi=softfp" "__ARM_ARCH_5T__ && __thumb__"
 	v5te "-march=armv5te+fp -mfloat-abi=softfp" __ARM_ARCH_5TE__
-	v5te_arm "-march=armv5te+fp -marm" __ARM_ARCH_5TE__
-	v5te_thumb "-march=armv5te+fp -mthumb -mfloat-abi=softfp" __ARM_ARCH_5TE__
-	xscale_arm "-mcpu=xscale -mfloat-abi=soft -marm" __XSCALE__
+	v5te_arm "-march=armv5te+fp -marm" "__ARM_ARCH_5TE__ && !__thumb__"
+	v5te_thumb "-march=armv5te+fp -mthumb -mfloat-abi=softfp" "__ARM_ARCH_5TE__ && __thumb__"
+	xscale_arm "-mcpu=xscale -mfloat-abi=soft -marm" "__XSCALE__ && !__thumb__"
 	v6 "-march=armv6+fp -mfloat-abi=softfp" __ARM_ARCH_6__
-	v6_arm "-march=armv6+fp -marm" __ARM_ARCH_6__
-	v6_thumb "-march=armv6+fp -mthumb -mfloat-abi=softfp" __ARM_ARCH_6__
+	v6_arm "-march=armv6+fp -marm" "__ARM_ARCH_6__ && !__thumb__"
+	v6_thumb "-march=armv6+fp -mthumb -mfloat-abi=softfp" "__ARM_ARCH_6__ && __thumb__"
 	v6k "-march=armv6k+fp -mfloat-abi=softfp" __ARM_ARCH_6K__
-	v6k_arm "-march=armv6k+fp -marm" __ARM_ARCH_6K__
-	v6k_thumb "-march=armv6k+fp -mthumb -mfloat-abi=softfp" __ARM_ARCH_6K__
+	v6k_arm "-march=armv6k+fp -marm" "__ARM_ARCH_6K__ && !__thumb__"
+	v6k_thumb "-march=armv6k+fp -mthumb -mfloat-abi=softfp" "__ARM_ARCH_6K__ && __thumb__"
 	v6t2 "-march=armv6t2+fp" __ARM_ARCH_6T2__
 	v6z "-march=armv6z+fp -mfloat-abi=softfp" __ARM_ARCH_6Z__
-	v6z_arm "-march=armv6z+fp -marm" __ARM_ARCH_6Z__
-	v6z_thumb "-march=armv6z+fp -mthumb -mfloat-abi=softfp" __ARM_ARCH_6Z__
+	v6z_arm "-march=armv6z+fp -marm" "__ARM_ARCH_6Z__ && !__thumb__"
+	v6z_thumb "-march=armv6z+fp -mthumb -mfloat-abi=softfp" "__ARM_ARCH_6Z__ && __thumb__"
 	v6m "-march=armv6-m -mthumb -mfloat-abi=soft" __ARM_ARCH_6M__
 	v7a "-march=armv7-a+fp" __ARM_ARCH_7A__
 	v7r "-march=armv7-r+fp" __ARM_ARCH_7R__


[committed 16/22] arm: testsuite: improve compatibility of gcc.target/arm/macro_defs*.c

2023-11-13 Thread Richard Earnshaw

Convert these tests to use dg-add-options for increased compatibility.
Since they also result in an empty translation unit, override the
default testsuite options.

gcc/testsuite:

* gcc.target/arm/macro_defs0.c: Use dg-require-effective-target and
dg-add-options.
* gcc.target/arm/macro_defs1.c: Likewise.
* gcc.target/arm/macro_defs2.c: Likewise.
---
 gcc/testsuite/gcc.target/arm/macro_defs0.c | 7 +++
 gcc/testsuite/gcc.target/arm/macro_defs1.c | 6 ++
 gcc/testsuite/gcc.target/arm/macro_defs2.c | 6 ++
 3 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/macro_defs0.c b/gcc/testsuite/gcc.target/arm/macro_defs0.c
index 684d49ffafa..17fd157452e 100644
--- a/gcc/testsuite/gcc.target/arm/macro_defs0.c
+++ b/gcc/testsuite/gcc.target/arm/macro_defs0.c
@@ -1,8 +1,7 @@
 /* { dg-do compile } */
-/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-march=*" } { "-march=armv7-m" } } */
-/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-mfloat-abi=*" } { "-mfloat-abi=soft" } } */
-/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" } { "" } } */
-/* { dg-options "-march=armv7-m -mcpu=cortex-m3 -mfloat-abi=soft -mthumb" } */
+/* { dg-require-effective-target arm_arch_v7m_ok } */
+/* { dg-options "" } */
+/* { dg-add-options arm_arch_v7m } */
 
 #ifdef __ARM_FP
 #error __ARM_FP should not be defined
diff --git a/gcc/testsuite/gcc.target/arm/macro_defs1.c b/gcc/testsuite/gcc.target/arm/macro_defs1.c
index 655ba9334f3..bd22154321e 100644
--- a/gcc/testsuite/gcc.target/arm/macro_defs1.c
+++ b/gcc/testsuite/gcc.target/arm/macro_defs1.c
@@ -1,10 +1,8 @@
 /* { dg-do compile } */
-/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-march=*" } { "-march=armv6-m" } } */
-/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" } { "" } } */
 /* { dg-require-effective-target arm_arch_v6m_ok } */
-/* { dg-options "-march=armv6-m -mthumb" } */
+/* { dg-options "" } */
+/* { dg-add-options arm_arch_v6m } */
 
 #ifdef __ARM_NEON_FP
 #error __ARM_NEON_FP should not be defined
 #endif
-
diff --git a/gcc/testsuite/gcc.target/arm/macro_defs2.c b/gcc/testsuite/gcc.target/arm/macro_defs2.c
index 9a960423562..a26fc237611 100644
--- a/gcc/testsuite/gcc.target/arm/macro_defs2.c
+++ b/gcc/testsuite/gcc.target/arm/macro_defs2.c
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
-/* { dg-options "-march=armv7ve -mcpu=cortex-a15 -mfpu=neon-vfpv4" } */
-/* { dg-add-options arm_neon } */
 /* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "" } */
+/* { dg-add-options arm_neon } */
 
 #ifndef __ARM_NEON_FP
 #error  __ARM_NEON_FP is not defined but should be
@@ -10,5 +10,3 @@
 #ifndef __ARM_FP
 #error  __ARM_FP is not defined but should be
 #endif
-
-


[committed 15/22] arm: testsuite: improve compatibility of ftest-armv7m-thumb.c

2023-11-13 Thread Richard Earnshaw

This test is specific to armv7m cores which do not support hardware
floating-point.  We can improve its compatibility by having the default
options for this core specify -mfloat-abi=soft.
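
As an illustration only (not part of the patch), the effect of the
soft-float default can be observed through the ACLE macro __ARM_FP,
which gcc.target/arm/macro_defs0.c expects to be undefined for armv7-m.
The helper name has_hw_fp below is hypothetical, and the expectation
that it returns 0 under "-march=armv7-m -mthumb -mfloat-abi=soft" is an
assumption drawn from that test rather than something verified here.

```c
/* Hypothetical helper: reports whether the compiler advertises
   hardware floating-point support through the ACLE macro __ARM_FP.
   Returns 1 when the macro is defined and 0 otherwise, so the file
   compiles on any target.  */
int
has_hw_fp (void)
{
#ifdef __ARM_FP
  return 1;
#else
  return 0;
#endif
}
```

Unlike the testsuite file, this sketch reports the macro state at run
time instead of failing the build with #error.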

gcc/testsuite:

* lib/target-supports.exp (check_effective_target_arm_arch_FUNC_ok):
Use soft-float ABI for armv7m.
* gcc.target/arm/ftest-armv7m-thumb.c: Use dg-require-effective-target
to check flag compatibility.
---
 gcc/testsuite/gcc.target/arm/ftest-armv7m-thumb.c | 3 +--
 gcc/testsuite/lib/target-supports.exp | 2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/ftest-armv7m-thumb.c b/gcc/testsuite/gcc.target/arm/ftest-armv7m-thumb.c
index 363b48b7516..ba1985f5b0d 100644
--- a/gcc/testsuite/gcc.target/arm/ftest-armv7m-thumb.c
+++ b/gcc/testsuite/gcc.target/arm/ftest-armv7m-thumb.c
@@ -1,6 +1,5 @@
 /* { dg-do compile } */
-/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-march=*" } { "-march=arm7-m" } } */
-/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" } { "" } } */
+/* { dg-require-effective-target arm_arch_v7m_ok } */
 /* { dg-options "-mthumb" } */
 /* { dg-add-options arm_arch_v7m } */
 
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 9d2958626ad..316e34a34be 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5425,7 +5425,7 @@ foreach { armfunc armflag armdefs } {
 	v6m "-march=armv6-m -mthumb -mfloat-abi=soft" __ARM_ARCH_6M__
 	v7a "-march=armv7-a+fp" __ARM_ARCH_7A__
 	v7r "-march=armv7-r+fp" __ARM_ARCH_7R__
-	v7m "-march=armv7-m -mthumb" __ARM_ARCH_7M__
+	v7m "-march=armv7-m -mthumb -mfloat-abi=soft" __ARM_ARCH_7M__
 	v7em "-march=armv7e-m+fp -mthumb" __ARM_ARCH_7EM__
 	v7ve "-march=armv7ve+fp -marm"
 		"__ARM_ARCH_7A__ && __ARM_FEATURE_IDIV"


[committed 18/22] arm: testsuite: improve compatibility of gcc.target/arm/pr19599.c

2023-11-13 Thread Richard Earnshaw

Add +fp to the architecture specification, so that -mfpu=auto works
with the hard-float ABI.

gcc/testsuite:

* gcc.target/arm/pr19599.c: Add +fp to the architecture.
---
 gcc/testsuite/gcc.target/arm/pr19599.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/pr19599.c b/gcc/testsuite/gcc.target/arm/pr19599.c
index a536548442f..d2f15ae4499 100644
--- a/gcc/testsuite/gcc.target/arm/pr19599.c
+++ b/gcc/testsuite/gcc.target/arm/pr19599.c
@@ -1,6 +1,6 @@
 /* { dg-skip-if "need at least armv5te" { *-*-* } { "-march=armv[234]*" "-mthumb" } { "" } } */
 /* { dg-skip-if "FDPIC does not support armv5te" { arm*-*-uclinuxfdpiceabi } "*" "" } */
-/* { dg-options "-O2 -march=armv5te -marm" }  */
+/* { dg-options "-O2 -march=armv5te+fp -marm" }  */
 /* { dg-final { scan-assembler "bx" } } */
 
 int (*indirect_func)();


[committed 22/22] arm: testsuite: improve compatibility of gcc.dg/debug/pr57351.c

2023-11-13 Thread Richard Earnshaw

This test is arm specific and requires neon.  To improve compatibility
add a new test for armv7-a with neon and use that.
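
For illustration, a feature-test entry such as v7a_neon pairs a set of
flags with a preprocessor condition that must hold once those flags are
applied.  The sketch below evaluates the same condition in plain C.  It
is not copied from target-supports.exp: the names PROBE_V7A_NEON and
probe_v7a_neon are hypothetical, and it returns a value rather than
rejecting the build, so it compiles on any host.

```c
/* Evaluate the v7a_neon condition from the new table entry.  Inside
   #if, identifiers that are not defined as macros evaluate to 0, so
   the expression is simply false on targets without these macros.  */
#if (__ARM_ARCH_7A__ && __ARM_NEON__)
#define PROBE_V7A_NEON 1
#else
#define PROBE_V7A_NEON 0
#endif

/* Hypothetical helper: 1 if the translation unit was compiled for
   armv7-a with Neon enabled, 0 otherwise.  */
int
probe_v7a_neon (void)
{
  return PROBE_V7A_NEON;
}
```

Compiled with the flags listed for v7a_neon on a toolchain that supports
them, the helper would be expected to return 1; elsewhere it returns 0.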

gcc/testsuite:

* lib/target-supports.exp (v7a_neon): New feature-test target.
* gcc.dg/debug/pr57351.c: Use it.
---
 gcc/testsuite/gcc.dg/debug/pr57351.c  | 7 +++
 gcc/testsuite/lib/target-supports.exp | 1 +
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/debug/pr57351.c b/gcc/testsuite/gcc.dg/debug/pr57351.c
index 236d74ddedb..50861a4bf88 100644
--- a/gcc/testsuite/gcc.dg/debug/pr57351.c
+++ b/gcc/testsuite/gcc.dg/debug/pr57351.c
@@ -1,8 +1,7 @@
 /* { dg-do compile } */
-/* { dg-require-effective-target arm_neon }  */
-/* { dg-require-effective-target arm_arch_v7a_ok }  */
-/* { dg-options "-std=c99 -Os -g -march=armv7-a" } */
-/* { dg-add-options arm_neon } */
+/* { dg-require-effective-target arm_arch_v7a_neon_ok }  */
+/* { dg-options "-std=c99 -Os -g" } */
+/* { dg-add-options arm_arch_v7a_neon } */
 
 typedef unsigned int size_t;
 typedef int ptrdiff_t;
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index ae43dc97872..43a040e135c 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5425,6 +5425,7 @@ foreach { armfunc armflag armdefs } {
 	v6m "-march=armv6-m -mthumb -mfloat-abi=soft" __ARM_ARCH_6M__
 	v7a "-march=armv7-a+fp" __ARM_ARCH_7A__
 	v7a_arm "-march=armv7-a+fp -marm" "__ARM_ARCH_7A__ && !__thumb__"
+	v7a_neon "-march=armv7-a+simd -mfpu=auto -mfloat-abi=softfp" "__ARM_ARCH_7A__ && __ARM_NEON__"
 	v7r "-march=armv7-r+fp" __ARM_ARCH_7R__
 	v7m "-march=armv7-m -mthumb -mfloat-abi=soft" __ARM_ARCH_7M__
 	v7em "-march=armv7e-m+fp -mthumb" __ARM_ARCH_7EM__

