Re: [PATCH 1/5] vec: Add quick_emplace_push/safe_emplace_push

2024-10-23 Thread Andrew Pinski
On Tue, Oct 22, 2024 at 11:49 PM Richard Biener
 wrote:
>
> On Tue, Oct 22, 2024 at 5:31 PM Andrew Pinski  
> wrote:
> >
> > This adds quick_emplace_push and safe_emplace_push to vec.
> > These are like std::vector's emplace_back so you don't need an extra
> > copy of the struct around.
> >
> > Since we require C++11 and vec also supports partially non-POD types,
> > these functions can be added.
> >
> > I will be using them to clean up some code and even improve how some
> > places use vec.
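> >
> > For example, a push of a pair that currently needs a temporary, such as
> >
> >   v.safe_push (std::make_pair (5, 5));
> >
> > can construct the element in place instead:
> >
> >   v.safe_emplace_push (5, 5);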
> >
> > Bootstrapped and tested on x86_64-linux-gnu.
> >
> > gcc/ChangeLog:
> >
> > * vec.h (vec::quick_emplace_push): New function.
> > (vec::quick_emplace_push): New function.
> > (vec::safe_emplace_push): New function.
> > * vec.cc (test_init): Add test for safe_emplace_push.
> > (test_quick_emplace_push): New function.
> > (test_safe_emplace_push): New function.
> > (vec_cc_tests): Call test_quick_emplace_push and
> > test_safe_emplace_push.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  gcc/vec.cc | 41 +
> >  gcc/vec.h  | 46 ++
> >  2 files changed, 87 insertions(+)
> >
> > diff --git a/gcc/vec.cc b/gcc/vec.cc
> > index ba963b96c6a..cd7cd3c156b 100644
> > --- a/gcc/vec.cc
> > +++ b/gcc/vec.cc
> > @@ -304,6 +304,10 @@ test_init ()
> >
> >  ASSERT_EQ (2, v1.length ());
> >  ASSERT_EQ (2, v2.length ());
> > +v2.safe_emplace_push (1);
> > +
> > +ASSERT_EQ (3, v1.length ());
> > +ASSERT_EQ (3, v2.length ());
> >  v1.release ();
> >}
> >  }
> > @@ -327,6 +331,25 @@ test_quick_push ()
> >ASSERT_EQ (7, v[2]);
> >  }
> >
> > +/* Verify that vec::quick_emplace_push works correctly.  */
> > +
> > +static void
> > +test_quick_emplace_push ()
> > +{
> > +  auto_vec<std::pair<int, int>> v;
> > +  ASSERT_EQ (0, v.length ());
> > +  v.reserve (3);
> > +  ASSERT_EQ (0, v.length ());
> > +  ASSERT_TRUE (v.space (3));
> > +  v.quick_emplace_push (5, 5);
> > +  v.quick_emplace_push (6, 6);
> > +  v.quick_emplace_push (7, 7);
> > +  ASSERT_EQ (3, v.length ());
> > +  ASSERT_EQ (std::make_pair(5,5), v[0]);
> > +  ASSERT_EQ (std::make_pair(6,6), v[1]);
> > +  ASSERT_EQ (std::make_pair(7,7), v[2]);
> > +}
> > +
> >  /* Verify that vec::safe_push works correctly.  */
> >
> >  static void
> > @@ -343,6 +366,22 @@ test_safe_push ()
> >ASSERT_EQ (7, v[2]);
> >  }
> >
> > +/* Verify that vec::safe_emplace_push works correctly.  */
> > +
> > +static void
> > +test_safe_emplace_push ()
> > +{
> > +  auto_vec<std::pair<int, int>> v;
> > +  ASSERT_EQ (0, v.length ());
> > +  v.safe_emplace_push (5, 5);
> > +  v.safe_emplace_push (6, 6);
> > +  v.safe_emplace_push (7, 7);
> > +  ASSERT_EQ (3, v.length ());
> > +  ASSERT_EQ (std::make_pair(5,5), v[0]);
> > +  ASSERT_EQ (std::make_pair(6,6), v[1]);
> > +  ASSERT_EQ (std::make_pair(7,7), v[2]);
> > +}
> > +
> >  /* Verify that vec::truncate works correctly.  */
> >
> >  static void
> > @@ -591,7 +630,9 @@ vec_cc_tests ()
> >  {
> >test_init ();
> >test_quick_push ();
> > +  test_quick_emplace_push ();
> >test_safe_push ();
> > +  test_safe_emplace_push ();
> >test_truncate ();
> >test_safe_grow_cleared ();
> >test_pop ();
> > diff --git a/gcc/vec.h b/gcc/vec.h
> > index b13c4716428..8277d156f05 100644
> > --- a/gcc/vec.h
> > +++ b/gcc/vec.h
> > @@ -619,6 +619,8 @@ public:
> >void splice (const vec &);
> >void splice (const vec *src);
> >T *quick_push (const T &);
> > +  template <typename... Args>
> > +  T *quick_emplace_push (Args&&... args);
> >using pop_ret_type
> >  = typename std::conditional <std::is_trivially_destructible <T>::value,
> >  T &, void>::type;
> > @@ -1044,6 +1046,21 @@ vec<T, A, vl_embed>::quick_push (const T &obj)
> >return slot;
> >  }
> >
> > +/* Push T(ARGS) (a new element) onto the end of the vector.  There must be
> > +   sufficient space in the vector.  Return a pointer to the slot
> > +   where T(ARGS) was inserted.  */
> > +
> > +template<typename T, typename A>
> > +template<typename... Args>
> > +inline T *
> > +vec<T, A, vl_embed>::quick_emplace_push (Args&&... args)
> > +{
> > +  gcc_checking_assert (space (1));
> > +  T *slot = &address ()[m_vecpfx.m_num++];
> > +  ::new (static_cast<void *> (slot)) T (std::forward<Args> (args)...);
> > +  return slot;
> > +}
> > +
> >
> >  /* Pop and return a reference to the last element off the end of the
> > vector.  If T has non-trivial destructor, this method just pops
> > @@ -1612,7 +1629,11 @@ public:
> >void splice (const vec &);
> >void safe_splice (const vec & CXX_MEM_STAT_INFO);
> >T *quick_push (const T &);
> > +  template <typename... Args>
> > +  T *quick_emplace_push (Args&&... args);
> >T *safe_push (const T &CXX_MEM_STAT_INFO);
> > +  template <typename... Args>
> > +  T *safe_emplace_push (Args&&... args CXX_MEM_STAT_INFO);
> >using pop_ret_type
> >  = typename std::conditional <std::is_trivially_destructible <T>::value,
> >  T &, void>::type;
> > @@ -2070,6 +2091,18 @@ vec<T, A, vl_ptr>::quick_push (const T &obj)
> >return m_vec->quick_push (obj)

Re: [Patch, fortran] PR116733: Generic processing of assumed rank objects (f202y)

2024-10-23 Thread Tobias Burnus

Regarding F202y:

I think it is in general useful to have an implementation of features
before the standard is released, also to find issues before the standard
is released.

The downside I currently see is that none of the features is really
ready (in the sense that there are explicit edits). In terms of
features, I would prefer that we mainly focus on those that passed not
only at the J3 level but were also accepted at the WG5 level.

Namely, that's the list at https://wg5-fortran.org/N2201-N2250/N2234.txt

That's the case for 'US23/DIN4 Add generic processing of assumed-rank
object' / 'Ref: 24-136r1 "DIN-4: Generic processing of assumed-rank
objects"' (→ https://j3-fortran.org/doc/year/24/24-136r1.txt )

However, there are other features (like 'unsigned') which were not
accepted; although, given that gfortran pushes for unsigned, let's see
whether that will change …

High on the list are surely templates, but those are currently
completely underspecified and too much in flux…

BTW: This week is the next J3 meeting; let's see what will result from it …

Tobias



Re: [PATCH 4/6] aarch64: Optimize vector rotates into REV* instructions where possible

2024-10-23 Thread Richard Sandiford
Kyrylo Tkachov  writes:
> Hi all,
>
> Some vector rotate operations can be implemented in a single instruction
> rather than using the fallback SHL+USRA sequence.
> In particular, when the rotate amount is half the bitwidth of the element
> we can use a REV64, REV32 or REV16 instruction.
> This patch adds this transformation to the recently added splitter for vector
> rotates.
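>
> For instance, rotating each 32-bit element by 16 just swaps its two 16-bit
> halves, which is exactly what REV32 on .8h lanes does (a sketch, not a
> testcase from this patch):
>
>   v4si f (v4si r) { return (r >> 16) | (r << 16); }  /* -> rev32 v0.8h, v0.8h */
>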
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> Signed-off-by: Kyrylo Tkachov 
>
> gcc/
>
>   * config/aarch64/aarch64-protos.h (aarch64_emit_opt_vec_rotate):
>   Declare prototype.
>   * config/aarch64/aarch64.cc (aarch64_emit_opt_vec_rotate): Implement.
>   * config/aarch64/aarch64-simd.md (*aarch64_simd_rotate_imm<mode>):
>   Call the above.
>
> gcc/testsuite/
>
>   * gcc.target/aarch64/simd/pr117048_2.c: New test.

Sorry to be awkward, but I still think at least part of this should be
target-independent.  Any rotate by a byte amount can be expressed as a
vector permutation in a target-independent way.  Target-independent code
can then use the usual optab routines to query whether the permutation
is possible and/or try to generate it.

I can see that it probably makes sense to leave target code to make
the decision about when to use the permutation strategy vs. other
approaches.  But the code to implement that strategy shouldn't need
to be target-specific.

E.g. we could have a routine:

  expand_rotate_as_vec_perm

which checks whether the rotation amount is suitable and tries to
generate the permutation if so.
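
For reference, computing the selector is straightforward (a minimal sketch,
assuming little-endian byte ordering and a left rotate by ROT_BYTES within
each ELT_BYTES-wide element; the names are illustrative, not a proposed
interface):

  /* Destination byte I of each element takes source byte
     (I + ELT_BYTES - ROT_BYTES) % ELT_BYTES of the same element.  */
  for (unsigned e = 0; e < nelts; ++e)
    for (unsigned i = 0; i < elt_bytes; ++i)
      sel.quick_push (e * elt_bytes
		      + (i + elt_bytes - rot_bytes) % elt_bytes);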

Thanks,
Richard

> From e97509382b6bb755336ec4aa220fabd968e69502 Mon Sep 17 00:00:00 2001
> From: Kyrylo Tkachov 
> Date: Wed, 16 Oct 2024 04:10:08 -0700
> Subject: [PATCH 4/6] aarch64: Optimize vector rotates into REV* instructions
>  where possible
>
> Some vector rotate operations can be implemented in a single instruction
> rather than using the fallback SHL+USRA sequence.
> In particular, when the rotate amount is half the bitwidth of the element
> we can use a REV64, REV32 or REV16 instruction.
> This patch adds this transformation to the recently added splitter for vector
> rotates.
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> Signed-off-by: Kyrylo Tkachov 
>
> gcc/
>
>   * config/aarch64/aarch64-protos.h (aarch64_emit_opt_vec_rotate):
>   Declare prototype.
>   * config/aarch64/aarch64.cc (aarch64_emit_opt_vec_rotate): Implement.
>   * config/aarch64/aarch64-simd.md (*aarch64_simd_rotate_imm<mode>):
>   Call the above.
>
> gcc/testsuite/
>
>   * gcc.target/aarch64/simd/pr117048_2.c: New test.
> ---
>  gcc/config/aarch64/aarch64-protos.h   |  1 +
>  gcc/config/aarch64/aarch64-simd.md|  3 +
>  gcc/config/aarch64/aarch64.cc | 49 ++
>  .../gcc.target/aarch64/simd/pr117048_2.c  | 66 +++
>  4 files changed, 119 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/pr117048_2.c
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index d03c1fe798b..da0e657a513 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -776,6 +776,7 @@ bool aarch64_rnd_imm_p (rtx);
>  bool aarch64_constant_address_p (rtx);
>  bool aarch64_emit_approx_div (rtx, rtx, rtx);
>  bool aarch64_emit_approx_sqrt (rtx, rtx, bool);
> +bool aarch64_emit_opt_vec_rotate (rtx, rtx, rtx);
>  tree aarch64_vector_load_decl (tree);
>  rtx aarch64_gen_callee_cookie (aarch64_isa_mode, arm_pcs);
>  void aarch64_expand_call (rtx, rtx, rtx, bool);
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 543179d9fce..44c40512f30 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -1313,6 +1313,9 @@
>   (match_dup 4))
> (match_dup 3)))]
>{
> +if (aarch64_emit_opt_vec_rotate (operands[0], operands[1], operands[2]))
> +  DONE;
> +
> >  operands[3] = reload_completed ? operands[0] : gen_reg_rtx (<MODE>mode);
>  rtx shft_amnt = unwrap_const_vec_duplicate (operands[2]);
> >  int bitwidth = GET_MODE_UNIT_SIZE (<MODE>mode) * BITS_PER_UNIT;
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 21d9a6b5a20..47859c4e31b 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -15998,6 +15998,55 @@ aarch64_emit_approx_div (rtx quo, rtx num, rtx den)
>return true;
>  }
>  
> +/* Emit an optimized sequence to perform a vector rotate
> +   of REG by the vector constant amount AMNT and place the result
> +   in DST.  Return true iff successful.  */
> +
> +bool
> +aarch64_emit_opt_vec_rotate (rtx dst, rtx reg, rtx amnt)
> +{
> +  amnt = unwrap_const_vec_duplicate (amnt);
> +  gcc_assert (CONST_INT_P (amnt));
> +  HOST_WIDE_INT rotamnt = UINTVAL (amnt);
> +  machine_mode mode = GET_MODE (reg);
> +  /* Rotates by half the element width map down to REV* instructions.  */
> +  if (rotamnt == G

Re: [PATCH] match.pd: Add std::pow folding optimizations.

2024-10-23 Thread Jennifer Schmitz


> On 22 Oct 2024, at 13:14, Richard Biener  wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
> On Tue, 22 Oct 2024, Jennifer Schmitz wrote:
> 
>> 
>> 
>>> On 22 Oct 2024, at 11:05, Richard Biener  wrote:
>>> 
>>> External email: Use caution opening links or attachments
>>> 
>>> 
>>> On Tue, 22 Oct 2024, Jennifer Schmitz wrote:
>>> 
 
 
> On 21 Oct 2024, at 10:51, Richard Biener  wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
> On Fri, 18 Oct 2024, Jennifer Schmitz wrote:
> 
>> This patch adds the following two simplifications in match.pd:
>> - pow (1.0/x, y) to pow (x, -y), avoiding the division
>> - pow (0.0, x) to 0.0, avoiding the call to pow.
>> The patterns are guarded by flag_unsafe_math_optimizations,
>> !flag_trapping_math, !flag_errno_math, !HONOR_SIGNED_ZEROS,
>> and !HONOR_INFINITIES.
>> 
>> Tests were added to confirm the application of the transform for float,
>> double, and long double.
>> 
>> The patch was bootstrapped and regtested on aarch64-linux-gnu and
>> x86_64-linux-gnu, no regression.
>> OK for mainline?
>> 
>> Signed-off-by: Jennifer Schmitz 
>> 
>> gcc/
>>* match.pd: Fold pow (1.0/x, y) -> pow (x, -y) and
>>pow (0.0, x) -> 0.0.
>> 
>> gcc/testsuite/
>>* gcc.dg/tree-ssa/pow_fold_1.c: New test.
>> ---
>> gcc/match.pd   | 14 +
>> gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c | 34 ++
>> 2 files changed, 48 insertions(+)
>> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c
>> 
>> diff --git a/gcc/match.pd b/gcc/match.pd
>> index 12d81fcac0d..ba100b117e7 100644
>> --- a/gcc/match.pd
>> +++ b/gcc/match.pd
>> @@ -8203,6 +8203,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>>  (rdiv @0 (exps:s @1))
>>   (mult @0 (exps (negate @1)
>> 
>> + /* Simplify pow(1.0/x, y) into pow(x, -y).  */
>> + (if (! HONOR_INFINITIES (type)
>> +  && ! HONOR_SIGNED_ZEROS (type)
>> +  && ! flag_trapping_math
>> +  && ! flag_errno_math)
>> +  (simplify
>> +   (POW (rdiv:s real_onep@0 @1) @2)
>> +(POW @1 (negate @2)))
> 
> This one shouldn't need HONOR_SIGNED_ZEROS?
> 
>> +
>> +  /* Simplify pow(0.0, x) into 0.0.  */
>> +  (simplify
>> +   (POW real_zerop@0 @1)
> 
> I think this needs !HONOR_NANS (type)?
> 
> Otherwise OK.
 Thanks for the feedback, Richard and Andrew. I made the following changes
 to the patch (current version of the patch below):
 - also applied the pattern to POWI and added tests for pow, powif, powil
 - did not gate the first pattern under !HONOR_SIGNED_ZEROS, but gated the
 second one additionally under !HONOR_NANS (type)
 - added tests for powf16
>>> 
>>> Note powi is GCC internal, it doesn't set errno and it should be subject
>>> to different rules - I'd rather have patterns working on powi separate.
>> How about moving the patterns for POWI into the section 
>> flag_unsafe_math_optimizations && canonicalize_math_p () and not use 
>> (!flag_errno_math)?
> 
> Sounds good.
> 
>>> 
 Now, I am encountering two problems:
 
 First, the transform is not applied for float16 (even if 
 -fexcess-precision=16). Do you know what the problem could be?
>>> 
>>> I think you want to use POW_ALL instead of POW.  The generated
>>> cfn-operators.pd shows
>>> 
>>> (define_operator_list POW
>>>   BUILT_IN_POWF
>>>   BUILT_IN_POW
>>>   BUILT_IN_POWL
>>>   IFN_POW)
>>> (define_operator_list POW_FN
>>>   BUILT_IN_POWF16
>>>   BUILT_IN_POWF32
>>>   BUILT_IN_POWF64
>>>   BUILT_IN_POWF128
>>>   BUILT_IN_POWF32X
>>>   BUILT_IN_POWF64X
>>>   BUILT_IN_POWF128X
>>>   null)
>>> (define_operator_list POW_ALL
>>>   BUILT_IN_POWF
>>>   BUILT_IN_POW
>>>   BUILT_IN_POWL
>>>   BUILT_IN_POWF16
>>> ...
>>> 
>>> note this comes at expense of more generated code (in
>>> gimple/generic-match.pd).
>> Thanks, that solved the Float16 issue.
>>> 
 Second, validation on aarch64 shows a regression in tests
 - gcc.dg/recip_sqrt_mult_1.c and
 - gcc.dg/recip_sqrt_mult_5.c,
 because the pattern (POWI(1/x, y) -> POWI(x, -y)) is applied before the
 recip pass and prevents application of the recip patterns. The reason for
 this might be that the single-use restriction only works if the integer
 argument is non-constant, but in the failing test cases, the integer
 argument is 2 and the pattern is applied despite the :s flag.
 For example, my pattern is **not** applied (single-use restriction works) 
 for:
 double res, res2;
 void foo (double a, int b)
 {
 double f (double);
 double t1 = 1.0 / a;
 res = __builtin_powi (t1, b);
 res2 = f (t1);
 }
 
 But the pattern **is** applied and single-use restriction does **not** 
 work for:
 double r

[PATCH 1/2 v2] Match: Simplify unsigned scalar sat_sub(x, 1) to (x - x != 0)

2024-10-23 Thread Li Xu
From: xuli 

When the imm operand op1=1 in the unsigned scalar sat_sub form2 below,
we can simplify (x != 0 ? x + max : 0) to (x - (x != 0)), thereby
eliminating a branch instruction.

Form2:
#define DEF_SAT_U_SUB_IMM_FMT_2(T, IMM) \
T __attribute__((noinline)) \
sat_u_sub_imm##IMM##_##T##_fmt_2 (T x)  \
{   \
  return x >= (T)IMM ? x - (T)IMM : 0;  \
}

Take below form 2 as example:
DEF_SAT_U_SUB_IMM_FMT_2(uint8_t, 1)

Before this patch:
__attribute__((noinline))
uint8_t sat_u_sub_imm1_uint8_t_fmt_2 (uint8_t x)
{
  uint8_t _1;
  uint8_t _3;

  <bb 2> [local count: 1073741824]:
  if (x_2(D) != 0)
    goto <bb 3>; [50.00%]
  else
    goto <bb 4>; [50.00%]

  <bb 3> [local count: 536870912]:
  _3 = x_2(D) + 255;

  <bb 4> [local count: 1073741824]:
  # _1 = PHI <_3(3), 0(2)>
  return _1;

}

Assembly code:
sat_u_sub_imm1_uint8_t_fmt_2:
        beq     a0,zero,.L2
        addiw   a0,a0,-1
        andi    a0,a0,0xff
.L2:
        ret

After this patch:
__attribute__((noinline))
uint8_t sat_u_sub_imm1_uint8_t_fmt_2 (uint8_t x)
{
  _Bool _1;
  unsigned char _2;
  uint8_t _4;

  <bb 2> [local count: 1073741824]:
  _1 = x_3(D) != 0;
  _2 = (unsigned char) _1;
  _4 = x_3(D) - _2;
  return _4;

}

Assembly code:
sat_u_sub_imm1_uint8_t_fmt_2:
        snez    a5,a0
        subw    a0,a0,a5
        andi    a0,a0,0xff
        ret

The below test suites are passed for this patch:
1. The rv64gcv full regression tests.
2. The x86 bootstrap tests.
3. The x86 full regression tests.

Signed-off-by: Li Xu 
gcc/ChangeLog:

* match.pd: Simplify (x != 0 ? x + max : 0) to (x - (x != 0)).

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-44.c: New test.

---
 gcc/match.pd   |  9 
 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-44.c | 26 ++
 2 files changed, 35 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-44.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 0455dfa6993..6a245f8e0d3 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3383,6 +3383,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   }
   (if (wi::eq_p (sum, wi::uhwi (0, precision)))
 
+/* The boundary condition for case 10: IMM = 1:
+   SAT_U_SUB = X >= IMM ? (X - IMM) : 0.
+   Simplify (X != 0 ? X + max : 0) to (X - (X != 0)).  */
+(simplify
+ (cond (ne @0 integer_zerop) (plus @0 integer_all_onesp@1) integer_zerop)
+ (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
+ && types_match (type, @0))
+   (minus @0 (convert (ne @0 { build_zero_cst (type); })
+
 /* Signed saturation sub, case 1:
T minus = (T)((UT)X - (UT)Y);
SAT_S_SUB = (X ^ Y) & (X ^ minus) < 0 ? (-(T)(X < 0) ^ MAX) : minus;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-44.c 
b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-44.c
new file mode 100644
index 000..756ba065d84
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-44.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-phiopt1" } */
+
+#include 
+
+uint8_t sat_u_imm1_uint8_t (uint8_t x)
+{
+  return x >= (uint8_t)1 ? x - (uint8_t)1 : 0;
+}
+
+uint16_t sat_u_imm1_uint16_t (uint16_t x)
+{
+  return x >= (uint16_t)1 ? x - (uint16_t)1 : 0;
+}
+
+uint32_t sat_u_imm1_uint32_t (uint32_t x)
+{
+  return x >= (uint32_t)1 ? x - (uint32_t)1 : 0;
+}
+
+uint64_t sat_u_imm1_uint64_t (uint64_t x)
+{
+  return x >= (uint64_t)1 ? x - (uint64_t)1 : 0;
+}
+
+/* { dg-final { scan-tree-dump-not "goto" "phiopt1" } } */
-- 
2.17.1



Re: [PATCH v4 3/7] OpenMP: C front-end support for dispatch + adjust_args

2024-10-23 Thread Paul-Antoine Arras

Here is an updated patch following these comments.

On 09/10/2024 19:15, Tobias Burnus wrote:
First comments; I need to have a deeper look, but right now I need to
fetch some victuals.


Paul-Antoine Arras wrote:
This patch adds support to the C front-end to parse the `dispatch` construct
and the `adjust_args` clause. It also includes some common C/C++ bits for
pragmas and attributes.

Additional common C/C++ testcases are in a later patch in the series.


. . .


--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -571,6 +571,8 @@ const struct attribute_spec 
c_common_gnu_attributes[] =

    handle_omp_declare_variant_attribute, NULL },
    { "omp declare variant variant", 0, -1, true,  false, false, false,
    handle_omp_declare_variant_attribute, NULL },
+  { "omp declare variant adjust_args need_device_ptr", 0, -1, true,  
false, false, false,

+  handle_omp_declare_variant_attribute, NULL },


the first line is 9 characters too long ...


Fixed wrapping.


--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -1747,6 +1747,8 @@ static void c_parser_omp_assumption_clauses 
(c_parser *, bool);

  static void c_parser_omp_allocate (c_parser *);
  static void c_parser_omp_assumes (c_parser *);
  static bool c_parser_omp_ordered (c_parser *, enum pragma_context, 
bool *);

+static tree
+c_parser_omp_dispatch (location_t, c_parser *);


Spurious line break after 'tree' in the declaration.


Fixed line break.


+// Adapted from c_parser_expr_no_commas


While parts of GCC have started to use // comments of C++ and C99,
this file seemingly hasn't and I am not sure that you want to be the
first one to add them ...

I think this needs some words on the purpose of this function,
i.e. why it exists - alias what syntax it supports and does not
support.


Updated comment.


+static tree
+c_parser_omp_dispatch_body (c_parser *parser)
+{


...


+  lhs = c_parser_conditional_expression (parser, NULL, NULL);
+  if (TREE_CODE (lhs.value) == CALL_EXPR)
+    return lhs.value;
+  else
+    {


You can save on indentation and curly braces by removing the 'else {'
as after the 'return' you never need to handle the CALL_EXPR case.


Right! Removed the else.


+  location_t op_location = c_parser_peek_token (parser)->location;
+  if (!c_parser_require (parser, CPP_EQ, "expected %<=%>"))
+    return error_mark_node;
+
+  /* Parse function name*/


(Possibly a '.' and then) two spaces before '*/'.


Fixed this and similar comments.


+    for (int i = 0; i < 3; i++)
+  {
+    sizeof_arg[i] = NULL_TREE;
+    sizeof_arg_loc[i] = UNKNOWN_LOCATION;


Wrong size: c_parser_expr_list expects 6, not 3, values to exist.


Changed loop upper bound.


Looks as if your code predates Jakub's change of Dec 2023:
r14-6741-ge7dd72aefed851  Split -Wcalloc-transposed-args warning from
-Walloc-size, -Walloc-size fixes


Tobias


Thanks,
--
PA

commit 45357fe512876099f1c35c5c5b3b278ceecfda58
Author: Paul-Antoine Arras 
Date:   Fri May 24 17:28:55 2024 +0200

OpenMP: C front-end support for dispatch + adjust_args

This patch adds support to the C front-end to parse the `dispatch` construct and
the `adjust_args` clause. It also includes some common C/C++ bits for pragmas
and attributes.

Additional common C/C++ testcases are in a later patch in the series.

gcc/c-family/ChangeLog:

* c-attribs.cc (c_common_gnu_attributes): Add attribute for adjust_args
need_device_ptr.
* c-omp.cc (c_omp_directives): Uncomment dispatch.
* c-pragma.cc (omp_pragmas): Add dispatch.
* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_DISPATCH.
(enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_NOCONTEXT and
PRAGMA_OMP_CLAUSE_NOVARIANTS.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_omp_dispatch): New function.
(c_parser_omp_clause_name): Handle nocontext and novariants clauses.
(c_parser_omp_clause_novariants): New function.
(c_parser_omp_clause_nocontext): Likewise.
(c_parser_omp_all_clauses): Handle nocontext and novariants clauses.
(c_parser_omp_dispatch_body): New function adapted from
c_parser_expr_no_commas.
(OMP_DISPATCH_CLAUSE_MASK): Define.
(c_parser_omp_dispatch): New function.
(c_finish_omp_declare_variant): Parse adjust_args.
(c_parser_omp_construct): Handle PRAGMA_OMP_DISPATCH.
* c-typeck.cc (c_finish_omp_clauses): Handle OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.

gcc/testsuite/ChangeLog:

* gcc.dg/gomp/adjust-args-1.c: New test.
* gcc.dg/gomp/dispatch-1.c: New test.

diff --git gcc/c-family/c-attribs.cc gcc/c-family/c-attribs.cc
index 4dd2eecbea5..c4c71ea7514 100644
--- gcc/c-family/c-attribs.cc
+++ gcc/c-family/c-attribs.cc
@@ -571,6 +571,9 @@ const struct attribute_

Re: [PATCH 5/6] aarch64: Emit XAR for vector rotates where possible

2024-10-23 Thread Richard Sandiford
Kyrylo Tkachov  writes:
> Hi all,
>
> We can make use of the integrated rotate step of the XAR instruction
> to implement most vector integer rotates, as long as we zero out one
> of the input registers for it.  (XAR computes a rotate of the XOR of
> its two inputs, so XORing with zero leaves a pure rotate.)  This allows
> for a lower-latency sequence than the fallback SHL+USRA, especially when
> we can hoist the zeroing operation away from loops and hot parts.
> We can also use it for 64-bit vectors as long
> as we zero the top half of the vector to be rotated.  That should still be
> preferable to the default sequence.

Is the zeroing necessary?  We don't expect/require that 64-bit vector
modes are maintained in zero-extended form, or that 64-bit ops act as
strict_lowparts, so it should be OK to take a paradoxical subreg.
Or we could just extend the patterns to 64-bit modes, to avoid the
punning.

> With this patch we can generate for the input:
> v4si
> G1 (v4si r)
> {
> return (r >> 23) | (r << 9);
> }
>
> v8qi
> G2 (v8qi r)
> {
>   return (r << 3) | (r >> 5);
> }
> the assembly for +sve2:
> G1:
> movi    v31.4s, 0
> xar z0.s, z0.s, z31.s, #23
> ret
>
> G2:
> movi    v31.4s, 0
> fmov    d0, d0
> xar z0.b, z0.b, z31.b, #5
> ret
>
> instead of the current:
> G1:
> shl v31.4s, v0.4s, 9
> usra    v31.4s, v0.4s, 23
> mov v0.16b, v31.16b
> ret
> G2:
> shl v31.8b, v0.8b, 3
> usra    v31.8b, v0.8b, 5
> mov v0.8b, v31.8b
> ret
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> Signed-off-by: Kyrylo Tkachov 
>
> gcc/
>
>   * config/aarch64/aarch64.cc (aarch64_emit_opt_vec_rotate): Add
>   generation of XAR sequences when possible.
>
> gcc/testsuite/
>
>   * gcc.target/aarch64/rotate_xar_1.c: New test.
> [...]
> +/*
> +** G1:
> +**   movi?   [vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0
> +**   xar v0\.2d, v([0-9]+)\.2d, v([0-9]+)\.2d, 39

FWIW, the (...) captures aren't necessary, since we never use backslash
references to them later.

Thanks,
Richard

> +**  ret
> +*/
> +v2di
> +G1 (v2di r) {
> +return (r >> 39) | (r << 25);
> +}
> +
> +/*
> +** G2:
> +**   movi?   [vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0
> +**   xar z0\.s, z([0-9]+)\.s, z([0-9]+)\.s, #23
> +**  ret
> +*/
> +v4si
> +G2 (v4si r) {
> +return (r >> 23) | (r << 9);
> +}
> +
> +/*
> +** G3:
> +**   movi?   [vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0
> +**   xar z0\.h, z([0-9]+)\.h, z([0-9]+)\.h, #5
> +**  ret
> +*/
> +v8hi
> +G3 (v8hi r) {
> +return (r >> 5) | (r << 11);
> +}
> +
> +/*
> +** G4:
> +**   movi?   [vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0
> +**   xar z0\.b, z([0-9]+)\.b, z([0-9]+)\.b, #6
> +**  ret
> +*/
> +v16qi
> +G4 (v16qi r)
> +{
> +  return (r << 2) | (r >> 6);
> +}
> +
> +/*
> +** G5:
> +**   movi?   [vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0
> +**   fmov    d[0-9]+, d[0-9]+
> +**   xar z0\.s, z([0-9]+)\.s, z([0-9]+)\.s, #22
> +**  ret
> +*/
> +v2si
> +G5 (v2si r) {
> +return (r >> 22) | (r << 10);
> +}
> +
> +/*
> +** G6:
> +**   movi?   [vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0
> +**   fmov    d[0-9]+, d[0-9]+
> +**   xar z0\.h, z([0-9]+)\.h, z([0-9]+)\.h, #7
> +**  ret
> +*/
> +v4hi
> +G6 (v4hi r) {
> +return (r >> 7) | (r << 9);
> +}
> +
> +/*
> +** G7:
> +**   movi?   [vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0
> +**   fmov    d[0-9]+, d[0-9]+
> +**   xar z0\.b, z([0-9]+)\.b, z([0-9]+)\.b, #5
> +**  ret
> +*/
> +v8qi
> +G7 (v8qi r)
> +{
> +  return (r << 3) | (r >> 5);
> +}
> +


[committed] Fortran: Minor follow-up cleanup to error.cc

2024-10-23 Thread Tobias Burnus
Committed attached patch as r15-4565-g0ecc45a88d7722. It removes
'terminal_width', an unused leftover from before the switch to the common
diagnostics code, which I missed when doing the last cleanup.

Best regards,
Tobias
commit 0ecc45a88d772268a3bd83af02759857da0826d4
Author: Tobias Burnus 
Date:   Wed Oct 23 12:25:00 2024 +0200

Fortran: Minor follow-up cleanup to error.cc

Follow-up to r15-4268-g459c6018d2308d, which removed dead code but
missed that terminal_width was only ever set, never used.

gcc/fortran/ChangeLog:

* error.cc (terminal_width, gfc_get_terminal_width): Remove.
(gfc_error_init_1): Do not call one to set the other.

diff --git a/gcc/fortran/error.cc b/gcc/fortran/error.cc
index 4e60b148a34..b27cbede164 100644
--- a/gcc/fortran/error.cc
+++ b/gcc/fortran/error.cc
@@ -39,8 +39,6 @@ static int suppress_errors = 0;
 
 static bool warnings_not_errors = false;
 
-static int terminal_width;
-
 /* True if the error/warnings should be buffered.  */
 static bool buffered_p;
 
@@ -141,21 +139,11 @@ gfc_query_suppress_errors (void)
 }
 
 
-/* Determine terminal width (for trimming source lines in output).  */
-
-static int
-gfc_get_terminal_width (void)
-{
-  return isatty (STDERR_FILENO) ? get_terminal_width () : INT_MAX;
-}
-
-
 /* Per-file error initialization.  */
 
 void
 gfc_error_init_1 (void)
 {
-  terminal_width = gfc_get_terminal_width ();
   gfc_buffer_error (false);
 }
 


[PATCH v2 0/4] aarch64: add minimal support of AEABI build attributes for GCS

2024-10-23 Thread Matthieu Longo
The primary focus of this patch series is to add support for build attributes 
in the context of GCS (Guarded Control Stack, an Armv9.4-a extension) to the 
AArch64 backend.
It addresses comments from revision 1 [2] and 2 [3], and proposes a different 
approach compared to the previous implementation of the build attributes.

The series is composed of the following 4 patches:
1. Patch adding assembly debug comments (-dA) to the existing GNU properties, 
to improve testing and check the correctness of values.
2. The minimal patch adding support for build attributes in the context of GCS.
3. A refactoring of (2) to make things less error-prone and more modular, add 
support for asciz attributes and more debug information.
4. A refactoring of (1) relying partly on (3).
The targeted final state of this series would consist of squashing (2) + (3),
and (1) + (4).

**Special note regarding (2):** If Gas has support for build attributes, both 
build attributes and GNU properties will be emitted. This behavior is still 
open for discussion. Please, let me know your thoughts regarding this behavior.

This patch series needs to be applied on top of the patch series for GCS [1].

Bootstrapped on aarch64-none-linux-gnu, and no regression found.

[1]: https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/vendors/ARM/heads/gcs
[2]: https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662825.html
[3]: https://gcc.gnu.org/pipermail/gcc-patches/2024-September/664004.html

Regards,
Matthieu

Diff with revision 1 [2]:
- update the description of (2)
- address the comments related to the tests in (2)
- add new commits (1), (3) and (4)

Diff with revision 2 [3]:
- address comments of Richard Sandiford in revision 2.
- fix several formatting mistakes.
- remove RFC tag.


Matthieu Longo (3):
  aarch64: add debug comments to feature properties in .note.gnu.property
  aarch64: improve assembly debug comments for AEABI build attributes
  aarch64: encapsulate note.gnu.property emission into a class

Srinath Parvathaneni (1):
  aarch64: add minimal support of AEABI build attributes for GCS.

 gcc/config.gcc|   2 +-
 gcc/config.in |   6 +
 gcc/config/aarch64/aarch64-dwarf-metadata.cc  | 145 +++
 gcc/config/aarch64/aarch64-dwarf-metadata.h   | 245 ++
 gcc/config/aarch64/aarch64.cc |  69 ++---
 gcc/config/aarch64/t-aarch64  |  10 +
 gcc/configure |  38 +++
 gcc/configure.ac  |  10 +
 gcc/testsuite/gcc.target/aarch64/bti-1.c  |  13 +-
 .../aarch64-build-attributes.exp  |  35 +++
 .../build-attributes/build-attribute-gcs.c|  12 +
 .../build-attribute-standard.c|  12 +
 .../build-attributes/no-build-attribute-bti.c |  12 +
 .../build-attributes/no-build-attribute-gcs.c |  12 +
 .../build-attributes/no-build-attribute-pac.c |  12 +
 .../no-build-attribute-standard.c |  12 +
 gcc/testsuite/lib/target-supports.exp |  16 ++
 17 files changed, 611 insertions(+), 50 deletions(-)
 create mode 100644 gcc/config/aarch64/aarch64-dwarf-metadata.cc
 create mode 100644 gcc/config/aarch64/aarch64-dwarf-metadata.h
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/build-attributes/aarch64-build-attributes.exp
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/build-attributes/build-attribute-gcs.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/build-attributes/build-attribute-standard.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/build-attributes/no-build-attribute-bti.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/build-attributes/no-build-attribute-gcs.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/build-attributes/no-build-attribute-pac.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/build-attributes/no-build-attribute-standard.c

-- 
2.47.0



[PATCH v2 2/4] aarch64: add minimal support of AEABI build attributes for GCS.

2024-10-23 Thread Matthieu Longo
From: Srinath Parvathaneni 

GCS (Guarded Control Stack, an Armv9.4-a extension) requires some
caution at runtime. The runtime linker needs to reason about the
compatibility of a set of relocatable object files that might not have
been compiled with the same compiler.
Up until now, GNU properties are stored in an ELF section (.note.gnu.property)
and have been used for the previously mentioned runtime checks
performed by the linker. However, GNU properties are limited in
their expressiveness, and a long-term commitment was made in the
ABI for the Arm architecture [1] to provide build attributes.

This patch adds a first support for AArch64 GCS build attributes.
This support includes generating two new assembler directives:
.aeabi_subsection and .aeabi_attribute. These directives are
generated as per the syntax mentioned in spec "Build Attributes for
the Arm® 64-bit Architecture (AArch64)" available at [1].

gcc/configure.ac now includes a new check to test whether the
assembler being used to build the toolchain supports these new
directives.
Two behaviors can be observed when -mbranch-protection=[gcs|standard]
is passed:
- If the assembler supports them, the assembly directives are emitted
alongside the .note.gnu.property section for backward compatibility
(see the example below).
- If the assembler does not support them, only the .note.gnu.property
section will contain the relevant information.
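
For instance, with -mbranch-protection=gcs the GCS feature is recorded
roughly as follows (illustrative; tag 3 is Tag_Feature_GCS and value 1
means enabled):

	.aeabi_attribute 3, 1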

This patch needs to be applied on top of GCC's GCS patch series [2].

Bootstrapped on aarch64-none-linux-gnu, and no regression found.

[1]: https://github.com/ARM-software/abi-aa/pull/230
[2]: https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/vendors/ARM/heads/gcs

gcc/ChangeLog:

* config.in: Regenerated
* config/aarch64/aarch64.cc
(HAVE_AS_AEABI_BUILD_ATTRIBUTES): New definition.
(aarch64_emit_aeabi_attribute): New function declaration.
(aarch64_emit_aeabi_subsection): Likewise.
(aarch64_start_file): Emit GCS build attributes.
(aarch64_file_end_indicate_exec_stack): Update GCS bit in
note.gnu.property section.
* configure: Regenerated.
* configure.ac: Add a configure check for gcc.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp:
(check_effective_target_aarch64_gas_has_build_attributes): New.
* gcc.target/aarch64/build-attributes/build-attribute-gcs.c: New test.
* gcc.target/aarch64/build-attributes/build-attribute-standard.c: New 
test.
* gcc.target/aarch64/build-attributes/aarch64-build-attributes.exp: New 
DejaGNU file.
* gcc.target/aarch64/build-attributes/no-build-attribute-bti.c: New 
test.
* gcc.target/aarch64/build-attributes/no-build-attribute-gcs.c: New 
test.
* gcc.target/aarch64/build-attributes/no-build-attribute-pac.c: New 
test.
* gcc.target/aarch64/build-attributes/no-build-attribute-standard.c: 
New test.

Co-Authored-By: Matthieu Longo  
---
 gcc/config.in |  6 +++
 gcc/config/aarch64/aarch64.cc | 41 +++
 gcc/configure | 38 +
 gcc/configure.ac  | 10 +
 .../aarch64-build-attributes.exp  | 35 
 .../build-attributes/build-attribute-gcs.c| 12 ++
 .../build-attribute-standard.c| 12 ++
 .../build-attributes/no-build-attribute-bti.c | 12 ++
 .../build-attributes/no-build-attribute-gcs.c | 12 ++
 .../build-attributes/no-build-attribute-pac.c | 12 ++
 .../no-build-attribute-standard.c | 12 ++
 gcc/testsuite/lib/target-supports.exp | 16 
 12 files changed, 218 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/build-attributes/aarch64-build-attributes.exp
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/build-attributes/build-attribute-gcs.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/build-attributes/build-attribute-standard.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/build-attributes/no-build-attribute-bti.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/build-attributes/no-build-attribute-gcs.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/build-attributes/no-build-attribute-pac.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/build-attributes/no-build-attribute-standard.c

diff --git a/gcc/config.in b/gcc/config.in
index 7fcabbe5061..0d54141ce30 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -355,6 +355,12 @@
 #endif
 
 
+/* Define if your assembler supports AEABI build attributes. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_AS_AEABI_BUILD_ATTRIBUTES
+#endif
+
+
 /* Define if your assembler supports architecture modifiers. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_AS_ARCHITECTURE_MODIFIERS
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 0466d6d11eb..90aba0fff88 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@

[PATCH v2 4/4] aarch64: encapsulate note.gnu.property emission into a class

2024-10-23 Thread Matthieu Longo
gcc/ChangeLog:

* config.gcc: Add aarch64-dwarf-metadata.o to extra_objs.
* config/aarch64/aarch64-dwarf-metadata.h
(class section_note_gnu_property): Encapsulate GNU properties code
into a class.
* config/aarch64/aarch64.cc
(GNU_PROPERTY_AARCH64_FEATURE_1_AND): Removed.
(GNU_PROPERTY_AARCH64_FEATURE_1_BTI): Likewise.
(GNU_PROPERTY_AARCH64_FEATURE_1_PAC): Likewise.
(GNU_PROPERTY_AARCH64_FEATURE_1_GCS): Likewise.
(aarch64_file_end_indicate_exec_stack): Move GNU properties code to
aarch64-dwarf-metadata.cc
* config/aarch64/t-aarch64: Declare target aarch64-dwarf-metadata.o
* config/aarch64/aarch64-dwarf-metadata.cc: New file.
---
 gcc/config.gcc   |   2 +-
 gcc/config/aarch64/aarch64-dwarf-metadata.cc | 145 +++
 gcc/config/aarch64/aarch64-dwarf-metadata.h  |  19 +++
 gcc/config/aarch64/aarch64.cc|  79 +-
 gcc/config/aarch64/t-aarch64 |  10 ++
 5 files changed, 181 insertions(+), 74 deletions(-)
 create mode 100644 gcc/config/aarch64/aarch64-dwarf-metadata.cc
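
With this change, client code drives the note emission through the new
class; usage is along these lines (a sketch based on the hunks below -
the method names beyond bti_enabled are not shown in this excerpt):

  aarch64::section_note_gnu_property note;
  if (aarch64_bti_enabled ())
    note.bti_enabled ();
  /* ...likewise for PAC and GCS, then write the note out.  */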

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 71ac3badafd..8d26b8776e3 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -351,7 +351,7 @@ aarch64*-*-*)
c_target_objs="aarch64-c.o"
cxx_target_objs="aarch64-c.o"
d_target_objs="aarch64-d.o"
-   extra_objs="aarch64-builtins.o aarch-common.o aarch64-sve-builtins.o 
aarch64-sve-builtins-shapes.o aarch64-sve-builtins-base.o 
aarch64-sve-builtins-sve2.o aarch64-sve-builtins-sme.o 
cortex-a57-fma-steering.o aarch64-speculation.o 
falkor-tag-collision-avoidance.o aarch-bti-insert.o aarch64-cc-fusion.o 
aarch64-early-ra.o aarch64-ldp-fusion.o"
+   extra_objs="aarch64-builtins.o aarch-common.o aarch64-dwarf-metadata.o 
aarch64-sve-builtins.o aarch64-sve-builtins-shapes.o 
aarch64-sve-builtins-base.o aarch64-sve-builtins-sve2.o 
aarch64-sve-builtins-sme.o cortex-a57-fma-steering.o aarch64-speculation.o 
falkor-tag-collision-avoidance.o aarch-bti-insert.o aarch64-cc-fusion.o 
aarch64-early-ra.o aarch64-ldp-fusion.o"
target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.h 
\$(srcdir)/config/aarch64/aarch64-builtins.cc 
\$(srcdir)/config/aarch64/aarch64-sve-builtins.h 
\$(srcdir)/config/aarch64/aarch64-sve-builtins.cc"
target_has_targetm_common=yes
;;
diff --git a/gcc/config/aarch64/aarch64-dwarf-metadata.cc 
b/gcc/config/aarch64/aarch64-dwarf-metadata.cc
new file mode 100644
index 000..f272d290fed
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-dwarf-metadata.cc
@@ -0,0 +1,145 @@
+/* DWARF metadata for AArch64 architecture.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   Contributed by ARM Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#define INCLUDE_STRING
+#define INCLUDE_ALGORITHM
+#define INCLUDE_MEMORY
+#define INCLUDE_VECTOR
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "target.h"
+#include "rtl.h"
+#include "output.h"
+
+#include "aarch64-dwarf-metadata.h"
+
+/* Defined for convenience.  */
+#define POINTER_BYTES (POINTER_SIZE / BITS_PER_UNIT)
+
+namespace aarch64 {
+
+constexpr unsigned GNU_PROPERTY_AARCH64_FEATURE_1_AND = 0xc0000000;
+constexpr unsigned GNU_PROPERTY_AARCH64_FEATURE_1_BTI = (1U << 0);
+constexpr unsigned GNU_PROPERTY_AARCH64_FEATURE_1_PAC = (1U << 1);
+constexpr unsigned GNU_PROPERTY_AARCH64_FEATURE_1_GCS = (1U << 2);
+
+namespace {
+
+std::string
+gnu_property_features_to_string (unsigned feature_1_and)
+{
+  struct flag_name
+  {
+unsigned int mask;
+const char *name;
+  };
+
+  static const flag_name flags[] = {
+{GNU_PROPERTY_AARCH64_FEATURE_1_BTI, "BTI"},
+{GNU_PROPERTY_AARCH64_FEATURE_1_PAC, "PAC"},
+{GNU_PROPERTY_AARCH64_FEATURE_1_GCS, "GCS"},
+  };
+
+  const char *separator = "";
+  std::string s_features;
+  for (auto &flag : flags)
+if (feature_1_and & flag.mask)
+  {
+   s_features.append (separator).append (flag.name);
+   separator = ", ";
+  }
+  return s_features;
+};
+
+} // namespace anonymous
+
+section_note_gnu_property::section_note_gnu_property ()
+  : m_feature_1_and (0) {}
+
+void
+section_note_gnu_property::bti_enabled ()
+{
+  m_feature_1_and |= GNU_PROPERTY_AARCH64_FEATURE_1_BTI;
+}

[PATCH v2 1/4] aarch64: add debug comments to feature properties in .note.gnu.property

2024-10-23 Thread Matthieu Longo
GNU properties are emitted to provide some information about the features
used in the generated code, like BTI, GCS, or PAC. However, no debug
comments are emitted in the generated assembly even if -dA is provided.
This makes understanding the information stored in the .note.gnu.property
section more difficult than needed.

This patch adds assembly comments (if -dA is provided) next to the GNU
properties. For instance, if BTI and PAC are enabled, it will emit:
  .word  0x3  // GNU_PROPERTY_AARCH64_FEATURE_1_AND (BTI, PAC)

gcc/ChangeLog:

* config/aarch64/aarch64.cc
(aarch64_file_end_indicate_exec_stack): Emit assembly comments.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/bti-1.c: Emit assembly comments, and update
test assertion.
---
 gcc/config/aarch64/aarch64.cc| 35 ++--
 gcc/testsuite/gcc.target/aarch64/bti-1.c | 13 +
 2 files changed, 40 insertions(+), 8 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 914f2902d25..0466d6d11eb 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -29259,10 +29259,41 @@ aarch64_file_end_indicate_exec_stack ()
 type   = GNU_PROPERTY_AARCH64_FEATURE_1_AND
 datasz = 4
 data   = feature_1_and.  */
-  assemble_integer (GEN_INT (GNU_PROPERTY_AARCH64_FEATURE_1_AND), 4, 32, 1);
+  fputs (integer_asm_op (4, true), asm_out_file);
+  fprint_whex (asm_out_file, GNU_PROPERTY_AARCH64_FEATURE_1_AND);
+  putc ('\n', asm_out_file);
   assemble_integer (GEN_INT (4), 4, 32, 1);
-  assemble_integer (GEN_INT (feature_1_and), 4, 32, 1);
 
+  fputs (integer_asm_op (4, true), asm_out_file);
+  fprint_whex (asm_out_file, feature_1_and);
+  if (flag_debug_asm)
+   {
+ struct flag_name
+ {
+   unsigned int mask;
+   const char *name;
+ };
+ static const flag_name flags[] = {
+   { GNU_PROPERTY_AARCH64_FEATURE_1_BTI, "BTI" },
+   { GNU_PROPERTY_AARCH64_FEATURE_1_PAC, "PAC" },
+   { GNU_PROPERTY_AARCH64_FEATURE_1_GCS, "GCS" },
+ };
+
+ const char *separator = "";
+ std::string s_features;
+ for (auto &flag : flags)
+   if (feature_1_and & flag.mask)
+ {
+   s_features.append (separator).append (flag.name);
+   separator = ", ";
+ }
+
+ asm_fprintf (asm_out_file,
+  "\t%s GNU_PROPERTY_AARCH64_FEATURE_1_AND (%s)\n",
+  ASM_COMMENT_START, s_features.c_str ());
+   }
+  else
+   putc ('\n', asm_out_file);
   /* Pad the size of the note to the required alignment.  */
   assemble_align (POINTER_SIZE);
 }
diff --git a/gcc/testsuite/gcc.target/aarch64/bti-1.c 
b/gcc/testsuite/gcc.target/aarch64/bti-1.c
index 5a556b08ed1..53dc2d3cd8b 100644
--- a/gcc/testsuite/gcc.target/aarch64/bti-1.c
+++ b/gcc/testsuite/gcc.target/aarch64/bti-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* -Os to create jump table.  */
-/* { dg-options "-Os" } */
+/* { dg-options "-Os -dA" } */
 /* { dg-require-effective-target lp64 } */
 /* If configured with --enable-standard-branch-protection, don't use
command line option.  */
@@ -44,8 +44,8 @@ f_jump_table (int y, int n)
   return (y == 0)? y+1:4;
 }
 /* f_jump_table should have PACIASP and AUTIASP.  */
-/* { dg-final { scan-assembler-times "hint\t25" 1 } } */
-/* { dg-final { scan-assembler-times "hint\t29" 1 } } */
+/* { dg-final { scan-assembler-times "hint\t25 // paciasp" 1 } } */
+/* { dg-final { scan-assembler-times "hint\t29 // autiasp" 1 } } */
 
 int
 f_label_address ()
@@ -59,6 +59,7 @@ lab2:
   addr = &&lab1;
   return 2;
 }
-/* { dg-final { scan-assembler-times "hint\t34" 1 } } */
-/* { dg-final { scan-assembler-times "hint\t36" 12 } } */
-/* { dg-final { scan-assembler ".note.gnu.property" { target *-*-linux* } } } */
+/* { dg-final { scan-assembler-times "hint\t34 // bti c" 1 } } */
+/* { dg-final { scan-assembler-times "hint\t36 // bti j" 12 } } */
+/* { dg-final { scan-assembler "\.section\t\.note\.gnu\.property" { target *-*-linux* } } } */
+/* { dg-final { scan-assembler "\.word\t0x7\t\/\/ GNU_PROPERTY_AARCH64_FEATURE_1_AND \\(BTI, PAC, GCS\\)" { target *-*-linux* } } } */
\ No newline at end of file
-- 
2.47.0



[PATCH v2 3/4] aarch64: improve assembly debug comments for AEABI build attributes

2024-10-23 Thread Matthieu Longo
The previous implementation for emitting AEABI build attributes did not
support string values (asciz) in aeabi_subsection, and did not emit the
values associated with tags in the assembly comments.

This new approach provides a more user-friendly interface relying on
typing, and improves the emitted assembly comments:
  * aeabi_attribute:
** Adds the interpreted value next to the tag in the assembly
comment.
** Supports asciz values.
  * aeabi_subsection:
** Adds debug information for its parameters.
** Auto-detects the attribute types when declaring the subsection.

Additionally, it is also worth noting that the code was moved to a
separate file to improve modularity and "relieve" the very long
aarch64.cc file of a few lines. Finally, it introduces a new
namespace "aarch64::" for the AArch64 backend, which reduces the length
of function names by not prepending 'aarch64_' to each of them.
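
As an illustration (based on the helpers added in this patch; namespace
qualifiers omitted), emitting the GCS feature attribute reduces to:

  write (asm_out_file, make_aeabi_attribute (Tag_Feature_GCS, true));

which prints ".aeabi_attribute 3, 1" and, under -dA, appends the comment
"// Tag_Feature_GCS: true".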

gcc/ChangeLog:

* config/aarch64/aarch64.cc
(aarch64_emit_aeabi_attribute): Delete.
(aarch64_emit_aeabi_subsection): Delete.
(aarch64_start_file): Use aeabi_subsection.
* config/aarch64/aarch64-dwarf-metadata.h: New file.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/build-attributes/build-attribute-gcs.c:
Improve test to match debugging comments in assembly.
* gcc.target/aarch64/build-attributes/build-attribute-standard.c:
Likewise.
---
 gcc/config/aarch64/aarch64-dwarf-metadata.h   | 226 ++
 gcc/config/aarch64/aarch64.cc |  48 +---
 .../build-attributes/build-attribute-gcs.c|   4 +-
 .../build-attribute-standard.c|   4 +-
 4 files changed, 243 insertions(+), 39 deletions(-)
 create mode 100644 gcc/config/aarch64/aarch64-dwarf-metadata.h

diff --git a/gcc/config/aarch64/aarch64-dwarf-metadata.h 
b/gcc/config/aarch64/aarch64-dwarf-metadata.h
new file mode 100644
index 000..01f08ad073e
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-dwarf-metadata.h
@@ -0,0 +1,226 @@
+/* DWARF metadata for AArch64 architecture.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   Contributed by ARM Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_AARCH64_DWARF_METADATA_H
+#define GCC_AARCH64_DWARF_METADATA_H
+
+#include "system.h"
+#include "vec.h"
+
+namespace aarch64 {
+
+enum attr_val_type : uint8_t
+{
+  uleb128 = 0x0,
+  asciz = 0x1,
+};
+
+enum BA_TagFeature_t : uint8_t
+{
+  Tag_Feature_BTI = 1,
+  Tag_Feature_PAC = 2,
+  Tag_Feature_GCS = 3,
+};
+
+template <typename T_tag, typename T_val>
+struct aeabi_attribute
+{
+  T_tag tag;
+  T_val value;
+};
+
+template <typename T_tag, typename T_val>
+aeabi_attribute<T_tag, T_val>
+make_aeabi_attribute (T_tag tag, T_val val)
+{
+  return aeabi_attribute<T_tag, T_val>{tag, val};
+}
+
+namespace details {
+
+constexpr const char *
+to_c_str (bool b)
+{
+  return b ? "true" : "false";
+}
+
+constexpr const char *
+to_c_str (const char *s)
+{
+  return s;
+}
+
+constexpr const char *
+to_c_str (attr_val_type t)
+{
+  return (t == uleb128 ? "ULEB128"
+ : t == asciz ? "asciz"
+ : nullptr);
+}
+
+constexpr const char *
+to_c_str (BA_TagFeature_t feature)
+{
+  return (feature == Tag_Feature_BTI ? "Tag_Feature_BTI"
+ : feature == Tag_Feature_PAC ? "Tag_Feature_PAC"
+ : feature == Tag_Feature_GCS ? "Tag_Feature_GCS"
+ : nullptr);
+}
+
+template <
+  typename T,
+  typename = typename std::enable_if<std::is_integral<T>::value, T>::type
+>
+constexpr const char *
+aeabi_attr_str_fmt (T phantom __attribute__((unused)))
+{
+  return "\t.aeabi_attribute %u, %u";
+}
+
+constexpr const char *
+aeabi_attr_str_fmt (const char *phantom __attribute__((unused)))
+{
+  return "\t.aeabi_attribute %u, \"%s\"";
+}
+
+template <
+  typename T,
+  typename = typename std::enable_if<std::is_integral<T>::value, T>::type
+>
+constexpr uint8_t
+aeabi_attr_val_for_fmt (T value)
+{
+  return static_cast<uint8_t> (value);
+}
+
+constexpr const char *
+aeabi_attr_val_for_fmt (const char *s)
+{
+  return s;
+}
+
+template <typename T_tag, typename T_val>
+void
+write (FILE *out_file, aeabi_attribute<T_tag, T_val> const &attr)
+{
+  asm_fprintf (out_file, aeabi_attr_str_fmt (T_val{}),
+  attr.tag, aeabi_attr_val_for_fmt (attr.value));
+  if (flag_debug_asm)
+asm_fprintf (out_file, "\t%s %s: %s", ASM_COMMENT_START,
+to_c_str (attr.tag), to_c_str (attr.value));
+  asm_fprintf (out_file, "\n");
+}
+
+template <
+  typename T,
+  typename = typename 

Re: [PATCH 2/6] aarch64: Use canonical RTL representation for SVE2 XAR and extend it to fixed-width modes

2024-10-23 Thread Richard Sandiford
Kyrylo Tkachov  writes:
> Hi all,
>
> The MD pattern for the XAR instruction in SVE2 is currently expressed with
> non-canonical RTL by using a ROTATERT code with a constant rotate amount.
> Fix it by using the left ROTATE code.  This necessitates splitting out the
> expander separately to translate the immediate coming from the intrinsic
> from a right-rotate to a left-rotate immediate.

Could we instead do the translation in aarch64-sve-builtins-sve2.cc?
It should be simpler to adjust there, by modifying the function_expander's
args array.
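
(The translation itself is just, for ELT_BITS-wide elements:

  rot_left = elt_bits - rot_right;

with rot_right == elt_bits giving a rotate of zero, which is why those
cases now fold to a plain EOR.)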

> Additionally, as the SVE2 XAR instruction is unpredicated and can handle all
> element sizes from .b to .d, it is a good fit for implementing the XOR+ROTATE
> operation for Advanced SIMD modes where the TARGET_SHA3 XAR cannot be used
> (that one can only handle V2DImode operands).  Therefore let's extend the
> accepted modes of the SVE2 pattern to include the 128-bit Advanced SIMD
> integer modes.

As mentioned in other reply that I sent out-of-order, I think we could
also include the 64-bit modes.

LGTM otherwise FWIW.

Thanks,
Richard

>
> This causes some tests for the svxar* intrinsics to fail because they now
> simplify to a plain EOR when the rotate amount is the width of the element.
> This simplification is desirable (EOR instructions have better or equal
> throughput than XAR, and they are non-destructive of their input) so the
> tests are adjusted.
>
> For V2DImode XAR operations we should prefer the Advanced SIMD version when
> it is available (TARGET_SHA3) because it is non-destructive, so restrict the
> SVE2 pattern accordingly.  Tests are added to confirm this.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Ok for mainline?
>
> Signed-off-by: Kyrylo Tkachov 
>
> gcc/
>
>   * config/aarch64/iterators.md (SVE_ASIMD_FULL_I): New mode iterator.
>   * config/aarch64/aarch64-sve2.md (@aarch64_sve2_xar): Rename
>   to...
>   (*aarch64_sve2_xar_insn): ... This.  Use SVE_ASIMD_FULL_I
>   iterator and adjust output logic.
>   (@aarch64_sve2_xar): New define_expand.
>
> gcc/testsuite/
>
>   * gcc.target/aarch64/xar_neon_modes.c: New test.
>   * gcc.target/aarch64/xar_v2di_nonsve.c: Likewise.
>   * gcc.target/aarch64/sve2/acle/asm/xar_s16.c: Scan for EOR rather than
>   XAR.
>   * gcc.target/aarch64/sve2/acle/asm/xar_s32.c: Likewise.
>   * gcc.target/aarch64/sve2/acle/asm/xar_s64.c: Likewise.
>   * gcc.target/aarch64/sve2/acle/asm/xar_s8.c: Likewise.
>   * gcc.target/aarch64/sve2/acle/asm/xar_u16.c: Likewise.
>   * gcc.target/aarch64/sve2/acle/asm/xar_u32.c: Likewise.
>   * gcc.target/aarch64/sve2/acle/asm/xar_u64.c: Likewise.
>   * gcc.target/aarch64/sve2/acle/asm/xar_u8.c: Likewise.
>
> From 41a7b2bfe69d7fc716b5da969d19185885c6b2bf Mon Sep 17 00:00:00 2001
> From: Kyrylo Tkachov 
> Date: Tue, 22 Oct 2024 03:27:47 -0700
> Subject: [PATCH 2/6] aarch64: Use canonical RTL representation for SVE2 XAR
>  and extend it to fixed-width modes
>
> The MD pattern for the XAR instruction in SVE2 is currently expressed with
> non-canonical RTL by using a ROTATERT code with a constant rotate amount.
> Fix it by using the left ROTATE code.  This necessitates splitting out the
> expander separately to translate the immediate coming from the intrinsic
> from a right-rotate to a left-rotate immediate.
>
> Additionally, as the SVE2 XAR instruction is unpredicated and can handle all
> element sizes from .b to .d, it is a good fit for implementing the XOR+ROTATE
> operation for Advanced SIMD modes where the TARGET_SHA3 XAR cannot be used
> (that one can only handle V2DImode operands).  Therefore let's extend the
> accepted modes of the SVE2 pattern to include the 128-bit Advanced SIMD
> integer modes.
>
> This causes some tests for the svxar* intrinsics to fail because they now
> simplify to a plain EOR when the rotate amount is the width of the element.
> This simplification is desirable (EOR instructions have better or equal
> throughput than XAR, and they are non-destructive of their input) so the
> tests are adjusted.
>
> For V2DImode XAR operations we should prefer the Advanced SIMD version when
> it is available (TARGET_SHA3) because it is non-destructive, so restrict the
> SVE2 pattern accordingly.  Tests are added to confirm this.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Ok for mainline?
>
> Signed-off-by: Kyrylo Tkachov 
>
> gcc/
>
>   * config/aarch64/iterators.md (SVE_ASIMD_FULL_I): New mode iterator.
>   * config/aarch64/aarch64-sve2.md (@aarch64_sve2_xar<mode>): Rename
>   to...
>   (*aarch64_sve2_xar_insn<mode>): ... This.  Use SVE_ASIMD_FULL_I
>   iterator and adjust output logic.
>   (@aarch64_sve2_xar<mode>): New define_expand.
>
> gcc/testsuite/
>
>   * gcc.target/aarch64/xar_neon_modes.c: New test.
>   * gcc.target/aarch64/xar_v2di_nonsve.c: Likewise.
>   * gcc.target/aarch64/sve2/acle/asm/xar_s16.c: Scan for EOR rather than
>   XAR.
>   * gcc.target/aarch64/sv

[PATCH v2 1/2] aarch64: Add support for mfloat8x{8|16}_t types

2024-10-23 Thread Andrew Carlotti
Compared to v1, I've split changes that aren't used for the type definitions
into a separate patch.  I've also added some tests, mostly along the lines
suggested by Richard S.

Bootstrapped and regression tested on aarch64; ok for master?

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc
(aarch64_init_simd_builtin_types): Initialise FP8 simd types.
* config/aarch64/aarch64-builtins.h
(enum aarch64_type_qualifiers): Add qualifier_modal_float bit.
* config/aarch64/aarch64-simd-builtin-types.def:
Add Mfloat8x{8|16}_t types.
* config/aarch64/arm_neon.h: Add mfloat8x{8|16}_t typedefs.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/movv16qi_2.c: Test mfloat as well.
* gcc.target/aarch64/movv16qi_3.c: Ditto.
* gcc.target/aarch64/movv2x16qi_1.c: Ditto.
* gcc.target/aarch64/movv3x16qi_1.c: Ditto.
* gcc.target/aarch64/movv4x16qi_1.c: Ditto.
* gcc.target/aarch64/movv8qi_2.c: Ditto.
* gcc.target/aarch64/movv8qi_3.c: Ditto.
* gcc.target/aarch64/mfloat-init-1.c: New test.


diff --git a/gcc/config/aarch64/aarch64-builtins.h 
b/gcc/config/aarch64/aarch64-builtins.h
index 
e326fe666769cedd6c06d0752ed30b9359745ac9..00db7a74885db4d97ed365e8e3e2d7cf7d8410a4
 100644
--- a/gcc/config/aarch64/aarch64-builtins.h
+++ b/gcc/config/aarch64/aarch64-builtins.h
@@ -54,6 +54,8 @@ enum aarch64_type_qualifiers
   /* Lane indices selected in quadtuplets. - must be in range, and flipped for
  bigendian.  */
   qualifier_lane_quadtup_index = 0x1000,
+  /* Modal FP types.  */
+  qualifier_modal_float = 0x2000,
 };
 
 #define ENTRY(E, M, Q, G) E,
diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index 
7d737877e0bf6c1f9eb53351a6085b0db16a04d6..432131c3b2d7cf4f788b79ce3d84c9e7554dc750
 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -1220,6 +1220,10 @@ aarch64_init_simd_builtin_types (void)
   aarch64_simd_types[Bfloat16x4_t].eltype = bfloat16_type_node;
   aarch64_simd_types[Bfloat16x8_t].eltype = bfloat16_type_node;
 
+  /* Init FP8 element types.  */
+  aarch64_simd_types[Mfloat8x8_t].eltype = aarch64_mfp8_type_node;
+  aarch64_simd_types[Mfloat8x16_t].eltype = aarch64_mfp8_type_node;
+
   for (i = 0; i < nelts; i++)
 {
   tree eltype = aarch64_simd_types[i].eltype;
diff --git a/gcc/config/aarch64/aarch64-simd-builtin-types.def 
b/gcc/config/aarch64/aarch64-simd-builtin-types.def
index 
6111cd0d4fe1136feabb36a4077cf86d13b835e2..83b2da2e7dc0962c1e5957e25c8f6232c2148fe5
 100644
--- a/gcc/config/aarch64/aarch64-simd-builtin-types.def
+++ b/gcc/config/aarch64/aarch64-simd-builtin-types.def
@@ -52,3 +52,5 @@
   ENTRY (Float64x2_t, V2DF, none, 13)
   ENTRY (Bfloat16x4_t, V4BF, none, 14)
   ENTRY (Bfloat16x8_t, V8BF, none, 14)
+  ENTRY (Mfloat8x8_t, V8QI, modal_float, 13)
+  ENTRY (Mfloat8x16_t, V16QI, modal_float, 14)
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 
e376685489da055029def6b661132b5154886b57..730d9d3fa8158ef2d1d13c0f629e306e774145a0
 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -72,6 +72,9 @@ typedef __Poly16_t poly16_t;
 typedef __Poly64_t poly64_t;
 typedef __Poly128_t poly128_t;
 
+typedef __Mfloat8x8_t mfloat8x8_t;
+typedef __Mfloat8x16_t mfloat8x16_t;
+
 typedef __fp16 float16_t;
 typedef float float32_t;
 typedef double float64_t;
diff --git a/gcc/testsuite/gcc.target/aarch64/mfloat-init-1.c 
b/gcc/testsuite/gcc.target/aarch64/mfloat-init-1.c
new file mode 100644
index 
..15a6b331fd3986476950e799d11bdef710193f1d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/mfloat-init-1.c
@@ -0,0 +1,5 @@
+/* { dg-do assemble } */
+/* { dg-options "-O --save-temps" } */
+
+/* { dg-error "invalid conversion to type 'mfloat8_t" "" {target *-*-*} 0 } */
+__Mfloat8x8_t const_mf8x8 () { return (__Mfloat8x8_t) { 1, 1, 1, 1, 1, 1, 1, 1 
}; }
diff --git a/gcc/testsuite/gcc.target/aarch64/movv16qi_2.c 
b/gcc/testsuite/gcc.target/aarch64/movv16qi_2.c
index 
08a0a19b515134742fcb121e8cf6a19600f86075..39a06db0707538996fb5a3990ef53589d0210b17
 100644
--- a/gcc/testsuite/gcc.target/aarch64/movv16qi_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/movv16qi_2.c
@@ -17,6 +17,7 @@ TEST_GENERAL (__Bfloat16x8_t)
 TEST_GENERAL (__Float16x8_t)
 TEST_GENERAL (__Float32x4_t)
 TEST_GENERAL (__Float64x2_t)
+TEST_GENERAL (__Mfloat8x16_t)
 
 __Int8x16_t const_s8x8 () { return (__Int8x16_t) { 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1 }; }
 __Int16x8_t const_s16x4 () { return (__Int16x8_t) { 1, 0, 1, 0, 1, 0, 1, 0 }; }
diff --git a/gcc/testsuite/gcc.target/aarch64/movv16qi_3.c 
b/gcc/testsuite/gcc.target/aarch64/movv16qi_3.c
index 
d43b994c1387bd7d9fb9517944d807e7f70b3c2a..082e95c017381597357cdd2a40fd732b449d369f
 100644
--- a/gcc/testsuite/gcc.target/aarch64/movv16qi_3.c
+++ b/gcc/testsuite/gcc.target/aarch64/movv16qi_3.c
@@ -22

[PATCH 1/5] Internal-fn: Introduce new IFN MASK_LEN_STRIDED_LOAD{STORE}

2024-10-23 Thread pan2 . li
From: Pan Li 

This patch introduces new IFNs for strided load and store.

LOAD:  v = MASK_LEN_STRIDED_LOAD (ptr, stride, mask, len, bias)
STORE: MASK_LEN_STRIDED_STORE (ptr, stride, v, mask, len, bias)

The IFNs target code similar to the example below:

void foo (int * a, int * b, int stride, int n)
{
  for (int i = 0; i < n; i++)
a[i * stride] = b[i * stride];
}
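
In scalar terms the semantics correspond roughly to the following sketch
(my own illustration, with the mask/len/bias handling simplified and the
stride assumed to be in bytes):

#include <stddef.h>
#include <stdint.h>

/* v[i] = *(int *)((char *)ptr + i * stride) for each active element
   i < len; masked-off elements take the else value (zero here).  */
static inline void
strided_load_int (int *v, const char *ptr, ptrdiff_t stride,
                  const uint8_t *mask, size_t len)
{
  for (size_t i = 0; i < len; i++)
    v[i] = mask[i] ? *(const int *) (ptr + i * stride) : 0;
}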

The following test suites passed for this patch:
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.

gcc/ChangeLog:

* internal-fn.cc (strided_load_direct): Add new define direct
for strided load.
(strided_store_direct): Ditto but for store.
(expand_strided_load_optab_fn): Add new func to expand the IFN
MASK_LEN_STRIDED_LOAD in middle-end.
(expand_strided_store_optab_fn): Ditto but for store.
(direct_strided_load_optab_supported_p): Add define for stride
load optab supported.
(direct_strided_store_optab_supported_p): Ditto but for store.
(internal_fn_len_index): Add strided load/store len index.
(internal_fn_mask_index): Ditto but for mask.
(internal_fn_stored_value_index): Add strided store value index.
* internal-fn.def (MASK_LEN_STRIDED_LOAD): Add new IFN for
strided load.
(MASK_LEN_STRIDED_STORE): Ditto but for store.
* optabs.def (OPTAB_D): Add strided load/store optab.

Signed-off-by: Pan Li 
Co-Authored-By: Juzhe-Zhong 
---
 gcc/internal-fn.cc  | 71 +
 gcc/internal-fn.def |  6 
 gcc/optabs.def  |  2 ++
 3 files changed, 79 insertions(+)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index d89a04fe412..bfbbba8e2dd 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -159,6 +159,7 @@ init_internal_fns ()
 #define load_lanes_direct { -1, -1, false }
 #define mask_load_lanes_direct { -1, -1, false }
 #define gather_load_direct { 3, 1, false }
+#define strided_load_direct { -1, -1, false }
 #define len_load_direct { -1, -1, false }
 #define mask_len_load_direct { -1, 4, false }
 #define mask_store_direct { 3, 2, false }
@@ -168,6 +169,7 @@ init_internal_fns ()
 #define vec_cond_mask_len_direct { 1, 1, false }
 #define vec_cond_direct { 2, 0, false }
 #define scatter_store_direct { 3, 1, false }
+#define strided_store_direct { 1, 1, false }
 #define len_store_direct { 3, 3, false }
 #define mask_len_store_direct { 4, 5, false }
 #define vec_set_direct { 3, 3, false }
@@ -3712,6 +3714,64 @@ expand_gather_load_optab_fn (internal_fn, gcall *stmt, 
direct_optab optab)
   assign_call_lhs (lhs, lhs_rtx, &ops[0]);
 }
 
+/* Expand MASK_LEN_STRIDED_LOAD call CALL by optab OPTAB.  */
+
+static void
+expand_strided_load_optab_fn (ATTRIBUTE_UNUSED internal_fn, gcall *stmt,
+ direct_optab optab)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  tree base = gimple_call_arg (stmt, 0);
+  tree stride = gimple_call_arg (stmt, 1);
+
+  rtx lhs_rtx = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  rtx base_rtx = expand_normal (base);
+  rtx stride_rtx = expand_normal (stride);
+
+  unsigned i = 0;
+  class expand_operand ops[6];
+  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
+
+  create_output_operand (&ops[i++], lhs_rtx, mode);
+  create_address_operand (&ops[i++], base_rtx);
+  create_address_operand (&ops[i++], stride_rtx);
+
+  i = add_mask_and_len_args (ops, i, stmt);
+  expand_insn (direct_optab_handler (optab, mode), i, ops);
+
+  if (!rtx_equal_p (lhs_rtx, ops[0].value))
+emit_move_insn (lhs_rtx, ops[0].value);
+}
+
+/* Expand MASK_LEN_STRIDED_STORE call CALL by optab OPTAB.  */
+
+static void
+expand_strided_store_optab_fn (ATTRIBUTE_UNUSED internal_fn, gcall *stmt,
+  direct_optab optab)
+{
+  internal_fn fn = gimple_call_internal_fn (stmt);
+  int rhs_index = internal_fn_stored_value_index (fn);
+
+  tree base = gimple_call_arg (stmt, 0);
+  tree stride = gimple_call_arg (stmt, 1);
+  tree rhs = gimple_call_arg (stmt, rhs_index);
+
+  rtx base_rtx = expand_normal (base);
+  rtx stride_rtx = expand_normal (stride);
+  rtx rhs_rtx = expand_normal (rhs);
+
+  unsigned i = 0;
+  class expand_operand ops[6];
+  machine_mode mode = TYPE_MODE (TREE_TYPE (rhs));
+
+  create_address_operand (&ops[i++], base_rtx);
+  create_address_operand (&ops[i++], stride_rtx);
+  create_input_operand (&ops[i++], rhs_rtx, mode);
+
+  i = add_mask_and_len_args (ops, i, stmt);
+  expand_insn (direct_optab_handler (optab, mode), i, ops);
+}
+
 /* Helper for expand_DIVMOD.  Return true if the sequence starting with
INSN contains any call insns or insns with {,U}{DIV,MOD} rtxes.  */
 
@@ -4101,6 +4161,7 @@ multi_vector_optab_supported_p (convert_optab optab, 
tree_pair types,
 #define direct_load_lanes_optab_supported_p multi_vector_optab_supported_p
 #define direct_mask_load_lanes_optab_supported_p multi_vector_optab_supported_p
 #define direct_gather_load_opt

[PATCH 2/5] Vect: Introduce MASK_LEN_STRIDED_LOAD{STORE} to loop vectorizer

2024-10-23 Thread pan2 . li
From: Pan Li 

This patch allows generation of MASK_LEN_STRIDED_LOAD{STORE} IR
for invariant-stride memory accesses.  For example:

void foo (int * __restrict a, int * __restrict b, int stride, int n)
{
for (int i = 0; i < n; i++)
  a[i*stride] = b[i*stride] + 100;
}

Before this patch:
  66   │   _73 = .SELECT_VL (ivtmp_71, POLY_INT_CST [4, 4]);
  67   │   _52 = _54 * _73;
  68   │   vect__5.16_61 = .MASK_LEN_GATHER_LOAD (vectp_b.14_59, _58, 4, { 0, 
... }, { -1, ... }, _73, 0);
  69   │   vect__7.17_63 = vect__5.16_61 + { 100, ... };
  70   │   .MASK_LEN_SCATTER_STORE (vectp_a.18_67, _58, 4, vect__7.17_63, { -1, 
... }, _73, 0);
  71   │   vectp_b.14_60 = vectp_b.14_59 + _52;
  72   │   vectp_a.18_68 = vectp_a.18_67 + _52;
  73   │   ivtmp_72 = ivtmp_71 - _73;

After this patch:
  60   │   _70 = .SELECT_VL (ivtmp_68, POLY_INT_CST [4, 4]);
  61   │   _52 = _54 * _70;
  62   │   vect__5.16_58 = .MASK_LEN_STRIDED_LOAD (vectp_b.14_56, _55, { 0, ... 
}, { -1, ... }, _70, 0);
  63   │   vect__7.17_60 = vect__5.16_58 + { 100, ... };
  64   │   .MASK_LEN_STRIDED_STORE (vectp_a.18_64, _55, vect__7.17_60, { -1, 
... }, _70, 0);
  65   │   vectp_b.14_57 = vectp_b.14_56 + _52;
  66   │   vectp_a.18_65 = vectp_a.18_64 + _52;
  67   │   ivtmp_69 = ivtmp_68 - _70;

The following test suites passed for this patch:
* The x86 bootstrap test.
* The x86 full regression test.
* The riscv full regression test.

gcc/ChangeLog:

* tree-vect-stmts.cc (vect_get_strided_load_store_ops): Handle
MASK_LEN_STRIDED_LOAD{STORE} after the supported check.
(vectorizable_store): Generate MASK_LEN_STRIDED_STORE when the offset
of the scatter is not a vector type.
(vectorizable_load): Ditto but for load.

Signed-off-by: Pan Li 
Co-Authored-By: Juzhe-Zhong 
---
 gcc/tree-vect-stmts.cc | 45 +-
 1 file changed, 36 insertions(+), 9 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index e7f14c3144c..78d66a4ef9d 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2950,6 +2950,15 @@ vect_get_strided_load_store_ops (stmt_vec_info stmt_info,
   *dataref_bump = cse_and_gimplify_to_preheader (loop_vinfo, bump);
 }
 
+  internal_fn ifn
+= DR_IS_READ (dr) ? IFN_MASK_LEN_STRIDED_LOAD : IFN_MASK_LEN_STRIDED_STORE;
+  if (direct_internal_fn_supported_p (ifn, vectype, OPTIMIZE_FOR_SPEED))
+{
+  *vec_offset = cse_and_gimplify_to_preheader (loop_vinfo,
+  unshare_expr (DR_STEP (dr)));
+  return;
+}
+
   /* The offset given in GS_INFO can have pointer type, so use the element
  type of the vector instead.  */
   tree offset_type = TREE_TYPE (gs_info->offset_vectype);
@@ -9194,10 +9203,20 @@ vectorizable_store (vec_info *vinfo,
 
  gcall *call;
  if (final_len && final_mask)
-   call = gimple_build_call_internal
-(IFN_MASK_LEN_SCATTER_STORE, 7, dataref_ptr,
- vec_offset, scale, vec_oprnd, final_mask,
- final_len, bias);
+   {
+ if (VECTOR_TYPE_P (TREE_TYPE (vec_offset)))
+   call = gimple_build_call_internal (
+ IFN_MASK_LEN_SCATTER_STORE, 7, dataref_ptr,
+ vec_offset, scale, vec_oprnd, final_mask, final_len,
+ bias);
+ else
+   /* A non-vector offset indicates that we prefer to take
+      MASK_LEN_STRIDED_STORE instead of
+      IFN_MASK_LEN_SCATTER_STORE with a direct stride arg.  */
+   call = gimple_build_call_internal (
+ IFN_MASK_LEN_STRIDED_STORE, 6, dataref_ptr,
+ vec_offset, vec_oprnd, final_mask, final_len, bias);
+   }
  else if (final_mask)
call = gimple_build_call_internal
 (IFN_MASK_SCATTER_STORE, 5, dataref_ptr,
@@ -11194,11 +11213,19 @@ vectorizable_load (vec_info *vinfo,
 
  gcall *call;
  if (final_len && final_mask)
-   call
- = gimple_build_call_internal (IFN_MASK_LEN_GATHER_LOAD, 7,
-   dataref_ptr, vec_offset,
-   scale, zero, final_mask,
-   final_len, bias);
+   {
+ if (VECTOR_TYPE_P (TREE_TYPE (vec_offset)))
+   call = gimple_build_call_internal (
+ IFN_MASK_LEN_GATHER_LOAD, 7, dataref_ptr, vec_offset,
+ scale, zero, final_mask, final_len, bias);
+ else
+   /* Non-vector offset indicates that prefer 

[PATCH 3/5] RISC-V: Adjust the gather-scatter testcases due to middle-end change

2024-10-23 Thread pan2 . li
From: Pan Li 

After we have MASK_LEN_STRIDED_LOAD{STORE} in the middle-end, the
strided cases need to be adjusted for the IR check.

The following test suites passed for this patch:
* The riscv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c:
Adjust the IR check for MASK_LEN_STRIDED_LOAD.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c:
Ditto but for store.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c:
Ditto.

Signed-off-by: Pan Li 
Co-Authored-By: Juzhe-Zhong 
---
 .../riscv/rvv/autovec/gather-scatter/strided_load-1.c   | 2 +-
 .../riscv/rvv/autovec/gather-scatter/strided_load-2.c   | 2 +-
 .../riscv/rvv/autovec/gather-scatter/strided_store-1.c  | 2 +-
 .../riscv/rvv/autovec/gather-scatter/strided_store-2.c  | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c
index 53263d16ae2..79b39f102bf 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c
@@ -40,6 +40,6 @@
 
 TEST_ALL (TEST_LOOP)
 
-/* { dg-final { scan-tree-dump-times " \.MASK_LEN_GATHER_LOAD" 66 "optimized" 
} } */
+/* { dg-final { scan-tree-dump-times " \.MASK_LEN_STRIDED_LOAD " 66 
"optimized" } } */
 /* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "optimized" } } */
 /* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "optimized" } } */
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c
index 6fef474cf8e..8a452e547a3 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c
@@ -40,6 +40,6 @@
 
 TEST_ALL (TEST_LOOP)
 
-/* { dg-final { scan-tree-dump-times " \.MASK_LEN_GATHER_LOAD" 33 "optimized" 
} } */
+/* { dg-final { scan-tree-dump-times " \.MASK_LEN_STRIDED_LOAD " 33 
"optimized" } } */
 /* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "optimized" } } */
 /* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "optimized" } } */
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c
index ad23ed42129..ec8c3a5c63a 100644
--- 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c
+++ 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c
@@ -40,6 +40,6 @@
 
 TEST_ALL (TEST_LOOP)
 
-/* { dg-final { scan-tree-dump-times " \.MASK_LEN_SCATTER_STORE" 66 
"optimized" } } */
+/* { dg-final { scan-tree-dump-times " \.MASK_LEN_STRIDED_STORE" 66 
"optimized" } } */
 /* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "optimized" } } */
 /* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "optimized" } } */
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c
index 65f3f00b8c2..b433b5b5210 100644
--- 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c
+++ 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c
@@ -40,6 +40,6 @@
 
 TEST_ALL (TEST_LOOP)
 
-/* { dg-final { scan-tree-dump-times " \.MASK_LEN_SCATTER_STORE" 44 
"optimized" } } */
+/* { dg-final { scan-tree-dump-times " \.MASK_LEN_STRIDED_STORE " 44 
"optimized" } } */
 /* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "optimized" } } */
 /* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "optimized" } } */
-- 
2.43.0



[PATCH 5/5] RISC-V: Add testcases for form 1 of MASK_LEN_STRIDED_LOAD{STORE}

2024-10-23 Thread pan2 . li
From: Pan Li 

Form 1:
  void __attribute__((noinline))\
  vec_strided_load_store_##T##_form_1 (T *restrict out, T *restrict in, \
   long stride, size_t size)\
  { \
for (size_t i = 0; i < size; i++)   \
  out[i * stride] = in[i * stride]; \
  }

The following test suites passed for this patch:
* The riscv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Add strided folder.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-f32.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-f64.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-i8.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-u16.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-u32.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-u64.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-u8.c: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-run-1-f16.c: New 
test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-run-1-f32.c: New 
test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-run-1-f64.c: New 
test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-run-1-i16.c: New 
test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-run-1-i32.c: New 
test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-run-1-i64.c: New 
test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-run-1-i8.c: New 
test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-run-1-u16.c: New 
test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-run-1-u32.c: New 
test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-run-1-u64.c: New 
test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-run-1-u8.c: New 
test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st.h: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st_data.h: New test.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st_run.h: New test.

Signed-off-by: Pan Li 
Co-Authored-By: Juzhe-Zhong 
---
 .../rvv/autovec/strided/strided_ld_st-1-f16.c |   11 +
 .../rvv/autovec/strided/strided_ld_st-1-f32.c |   11 +
 .../rvv/autovec/strided/strided_ld_st-1-f64.c |   11 +
 .../rvv/autovec/strided/strided_ld_st-1-i16.c |   11 +
 .../rvv/autovec/strided/strided_ld_st-1-i32.c |   11 +
 .../rvv/autovec/strided/strided_ld_st-1-i64.c |   11 +
 .../rvv/autovec/strided/strided_ld_st-1-i8.c  |   11 +
 .../rvv/autovec/strided/strided_ld_st-1-u16.c |   11 +
 .../rvv/autovec/strided/strided_ld_st-1-u32.c |   11 +
 .../rvv/autovec/strided/strided_ld_st-1-u64.c |   11 +
 .../rvv/autovec/strided/strided_ld_st-1-u8.c  |   11 +
 .../autovec/strided/strided_ld_st-run-1-f16.c |   15 +
 .../autovec/strided/strided_ld_st-run-1-f32.c |   15 +
 .../autovec/strided/strided_ld_st-run-1-f64.c |   15 +
 .../autovec/strided/strided_ld_st-run-1-i16.c |   15 +
 .../autovec/strided/strided_ld_st-run-1-i32.c |   15 +
 .../autovec/strided/strided_ld_st-run-1-i64.c |   15 +
 .../autovec/strided/strided_ld_st-run-1-i8.c  |   15 +
 .../autovec/strided/strided_ld_st-run-1-u16.c |   15 +
 .../autovec/strided/strided_ld_st-run-1-u32.c |   15 +
 .../autovec/strided/strided_ld_st-run-1-u64.c |   15 +
 .../autovec/strided/strided_ld_st-run-1-u8.c  |   15 +
 .../riscv/rvv/autovec/strided/strided_ld_st.h |   22 +
 .../rvv/autovec/strided/strided_ld_st_data.h  | 1145 +
 .../rvv/autovec/strided/strided_ld_st_run.h   |   27 +
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|2 +
 26 files changed, 1482 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-f16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-f32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-f64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-i16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-i32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-i64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-i8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-u1

Re: [PATCH v4 2/7] OpenMP: middle-end support for dispatch + adjust_args

2024-10-23 Thread Tobias Burnus

Hi PA,

thanks for the update.

Paul-Antoine Arras wrote:

[…] Please find attached a revised patch.


LGTM, except:

* The update to builtins.cc's builtin_fnspec is missing from the
ChangeLog.


* And the new testcase,
gcc/testsuite/c-c++-common/gomp/dispatch-10.c, has to be moved into 3/7
or later of the series, as it requires a parser and, given its file
location, requires both C and C++ unless a skip for C is added.  I
wonder whether a scan-tree-dump check should be added as well.


Next comes the C parser …

* On your side: There is one ICE (testcase in the last patch review for
2/7) when the variant function has not been declared but is used in
'declare variant' + adjust_args, which needs to be fixed.  As does the
[3] vs [6] array size issue I mentioned in my 3/7 review.  And probably
more …


* And on my side: Still have to finish reviewing that patch.

Tobias




[PATCH 1/2 v3] Match: Simplify unsigned scalar sat_sub(x, 1) to (x - x != 0)

2024-10-23 Thread Li Xu
From: xuli 

When the immediate operand op1 = 1 in the unsigned scalar sat_sub form 2
below, we can simplify (x != 0 ? x + max : 0) to (x - (x != 0)), thereby
eliminating a branch instruction.

Form2:
T __attribute__((noinline)) \
sat_u_sub_imm##IMM##_##T##_fmt_2 (T x)  \
{   \
  return x >= (T)IMM ? x - (T)IMM : 0;  \
}

Take below form 2 as example:
DEF_SAT_U_SUB_IMM_FMT_2(uint8_t, 1)

Before this patch:
__attribute__((noinline))
uint8_t sat_u_sub_imm1_uint8_t_fmt_2 (uint8_t x)
{
  uint8_t _1;
  uint8_t _3;

   <bb 2> [local count: 1073741824]:
  if (x_2(D) != 0)
    goto <bb 3>; [50.00%]
  else
    goto <bb 4>; [50.00%]

   <bb 3> [local count: 536870912]:
  _3 = x_2(D) + 255;

   <bb 4> [local count: 1073741824]:
  # _1 = PHI <_3(3), 0(2)>
  return _1;

}

Assembly code:
sat_u_sub_imm1_uint8_t_fmt_2:
beq a0,zero,.L2
addiw   a0,a0,-1
andi    a0,a0,0xff
.L2:
ret

After this patch:
__attribute__((noinline))
uint8_t sat_u_sub_imm1_uint8_t_fmt_2 (uint8_t x)
{
  _Bool _1;
  unsigned char _2;
  uint8_t _4;

   <bb 2> [local count: 1073741824]:
  _1 = x_3(D) != 0;
  _2 = (unsigned char) _1;
  _4 = x_3(D) - _2;
  return _4;

}

Assembly code:
sat_u_sub_imm1_uint8_t_fmt_2:
snez    a5,a0
subw    a0,a0,a5
andi    a0,a0,0xff
ret
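
The transformation relies on the unsigned identity
x >= 1 ? x - 1 : 0  ==  x - (x != 0); a quick exhaustive check for
uint8_t (my own illustration, not part of the patch):

#include <stdint.h>
#include <assert.h>

int
main (void)
{
  for (unsigned v = 0; v <= 255; v++)
    {
      uint8_t x = (uint8_t) v;
      uint8_t branchy = x >= (uint8_t) 1 ? (uint8_t) (x - 1) : (uint8_t) 0;
      uint8_t branchless = (uint8_t) (x - (x != 0));
      assert (branchy == branchless);
    }
  return 0;
}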

The following test suites passed for this patch:
1. The rv64gcv full regression tests.
2. The x86 bootstrap tests.
3. The x86 full regression tests.

Signed-off-by: Li Xu 
gcc/ChangeLog:

* match.pd: Simplify (x != 0 ? x + max : 0) to (x - x != 0).

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-44.c: New test.

---
 gcc/match.pd   |  9 
 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-44.c | 26 ++
 2 files changed, 35 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-44.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 0455dfa6993..6a245f8e0d3 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3383,6 +3383,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   }
   (if (wi::eq_p (sum, wi::uhwi (0, precision)))
 
+/* The boundary condition for case 10: IMM = 1:
+   SAT_U_SUB = X >= IMM ? (X - IMM) : 0.
+   Simplify (X != 0 ? X + max : 0) to (X - (X != 0)).  */
+(simplify
+ (cond (ne @0 integer_zerop) (plus @0 integer_all_onesp) integer_zerop)
+ (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
+ && types_match (type, @0))
+   (minus @0 (convert (ne @0 { build_zero_cst (type); })
+
 /* Signed saturation sub, case 1:
T minus = (T)((UT)X - (UT)Y);
SAT_S_SUB = (X ^ Y) & (X ^ minus) < 0 ? (-(T)(X < 0) ^ MAX) : minus;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-44.c 
b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-44.c
new file mode 100644
index 000..756ba065d84
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-44.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-phiopt1" } */
+
+#include <stdint.h>
+
+uint8_t sat_u_imm1_uint8_t (uint8_t x)
+{
+  return x >= (uint8_t)1 ? x - (uint8_t)1 : 0;
+}
+
+uint16_t sat_u_imm1_uint16_t (uint16_t x)
+{
+  return x >= (uint16_t)1 ? x - (uint16_t)1 : 0;
+}
+
+uint32_t sat_u_imm1_uint32_t (uint32_t x)
+{
+  return x >= (uint32_t)1 ? x - (uint32_t)1 : 0;
+}
+
+uint64_t sat_u_imm1_uint64_t (uint64_t x)
+{
+  return x >= (uint64_t)1 ? x - (uint64_t)1 : 0;
+}
+
+/* { dg-final { scan-tree-dump-not "goto" "phiopt1" } } */
-- 
2.17.1



Re: [PATCH] Implement Fortran diagnostic buffering for non-textual formats [PR105916]

2024-10-23 Thread Tobias Burnus

David Malcolm wrote:

In order to handle various awkward parsing issues, the Fortran frontend
implements buffering of diagnostics, so that diagnostics reported to
global_dc can be either:
(a) immediately issued, or
(b) speculatively reported to global_dc, and stored in a buffer, to
either be issued later or discarded.

...

This patch moves responsibility for such buffering of diagnostics from
fortran's error.cc to the diagnostic subsystem.

...

Does this look OK from the Fortran side?  The Fortran changes are
essentially all specific to error.cc, converting from manipulations of
output_buffer to those of diagnostic_buffer.


Yes, LGTM. (I only looked at the Fortran changes.)

Thanks,

Tobias

PS: I guess we eventually want some Fortran SARIF tests in the
testsuite, ones that cover actual Fortran errors/warnings and not "just"
#error.



I'm hoping to get this in as I have followup work to support having e.g.
both text *and* SARIF at once (PR other/116613), and fixing this is a
prerequisite for that work.

Thanks
Dave

gcc/ChangeLog:
PR fortran/105916
* diagnostic-buffer.h: New file.

...

gcc/fortran/ChangeLog:
PR fortran/105916
* error.cc (pp_error_buffer, pp_warning_buffer): Convert from
output_buffer * to diagnostic_buffer *.
(warningcount_buffered, werrorcount_buffered): Eliminate.
(gfc_error_buffer::gfc_error_buffer): Move constructor definition
here, and initialize "buffer" using *global_dc.
(gfc_output_buffer_empty_p): Delete in favor of
diagnostic_buffer::empty_p.
(gfc_clear_pp_buffer): Replace with...
(gfc_clear_diagnostic_buffer): ...this, moving implementation
details to diagnostic_context::clear_diagnostic_buffer.
(gfc_warning): Replace buffering implementation with calls
to global_dc->get_diagnostic_buffer and
global_dc->set_diagnostic_buffer.
(gfc_clear_warning): Update for renaming of gfc_clear_pp_buffer
and elimination of warningcount_buffered and werrorcount_buffered.
(gfc_warning_check): Replace buffering implementation with calls
to pp_warning_buffer->empty_p and
global_dc->flush_diagnostic_buffer.
(gfc_error_opt): Replace buffering implementation with calls to
global_dc->get_diagnostic_buffer and set_diagnostic_buffer.
(gfc_clear_error): Update for renaming of gfc_clear_pp_buffer.
(gfc_error_flag_test): Replace call to gfc_output_buffer_empty_p
with call to diagnostic_buffer::empty_p.
(gfc_error_check): Replace buffering implementation with calls
to pp_error_buffer->empty_p and global_dc->flush_diagnostic_buffer.
(gfc_move_error_buffer_from_to): Replace buffering implementation
with usage of diagnostic_buffer.
(gfc_free_error): Update for renaming of gfc_clear_pp_buffer.
(gfc_diagnostics_init): Use "new" directly when creating
pp_warning_buffer.  Remove setting of m_flush_p on the two
buffers, as this is handled by diagnostic_buffer and by
diagnostic_text_format_buffer's constructor.
* gfortran.h: Replace #include "pretty-print.h" for output_buffer
with #include "diagnostic-buffer.h" for diagnostic_buffer.
(struct gfc_error_buffer): Change type of field "buffer" from
output_buffer to diagnostic_buffer.  Move definition of constructor
into error.cc so that it can use global_dc.

gcc/testsuite/ChangeLog:
PR fortran/105916
* gcc.dg/plugin/diagnostic_plugin_xhtml_format.c: Include
"diagnostic-buffer.h".

...

* gfortran.dg/diagnostic-format-json-pr105916.F90: New test.
* gfortran.dg/diagnostic-format-sarif-1.F90: New test.
* gfortran.dg/diagnostic-format-sarif-1.py: New support script.
* gfortran.dg/diagnostic-format-sarif-pr105916.f90: New test.


[PATCH 4/5] RISC-V: Implement the MASK_LEN_STRIDED_LOAD{STORE}

2024-10-23 Thread pan2 . li
From: Pan Li 

This patch implements the MASK_LEN_STRIDED_LOAD{STORE} in
the RISC-V backend by leveraging the vector strided load/store insns.

For example:
void foo (int * __restrict a, int * __restrict b, int stride, int n)
{
for (int i = 0; i < n; i++)
  a[i*stride] = b[i*stride] + 100;
}

Before this patch:
  38   │ vsetvli a5,a3,e32,m1,ta,ma
  39   │ vluxei64.v  v1,(a1),v4
  40   │ mul a4,a2,a5
  41   │ sub a3,a3,a5
  42   │ vadd.vv v1,v1,v2
  43   │ vsuxei64.v  v1,(a0),v4
  44   │ add a1,a1,a4
  45   │ add a0,a0,a4

After this patch:
  33   │ vsetvli a5,a3,e32,m1,ta,ma
  34   │ vlse32.v    v1,0(a1),a2
  35   │ mul a4,a2,a5
  36   │ sub a3,a3,a5
  37   │ vadd.vv v1,v1,v2
  38   │ vsse32.v    v1,0(a0),a2
  39   │ add a1,a1,a4
  40   │ add a0,a0,a4

The following test suites passed for this patch:
* The riscv full regression test.

gcc/ChangeLog:

* config/riscv/autovec.md (mask_len_strided_load_<mode>): Add
new pattern for MASK_LEN_STRIDED_LOAD.
(mask_len_strided_store_<mode>): Ditto but for store.
* config/riscv/riscv-protos.h (expand_strided_load): Add new
func decl to expand strided load.
(expand_strided_store): Ditto but for store.
* config/riscv/riscv-v.cc (expand_strided_load): Add new
func impl to expand strided load.
(expand_strided_store): Ditto but for store.

Signed-off-by: Pan Li 
Co-Authored-By: Juzhe-Zhong 
---
 gcc/config/riscv/autovec.md | 29 ++
 gcc/config/riscv/riscv-protos.h |  2 ++
 gcc/config/riscv/riscv-v.cc | 52 +
 3 files changed, 83 insertions(+)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index a34f63c9651..85a915bd65f 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2855,3 +2855,32 @@ (define_expand "v3"
 DONE;
   }
 )
+
+;; =
+;; == Strided Load/Store
+;; =
+(define_expand "mask_len_strided_load_"
+  [(match_operand:V 0 "register_operand")
+   (match_operand   1 "pmode_reg_or_0_operand")
+   (match_operand   2 "pmode_reg_or_0_operand")
+   (match_operand:  3 "vector_mask_operand")
+   (match_operand   4 "autovec_length_operand")
+   (match_operand   5 "const_0_operand")]
+  "TARGET_VECTOR"
+  {
+riscv_vector::expand_strided_load (mode, operands);
+DONE;
+  })
+
+(define_expand "mask_len_strided_store_"
+  [(match_operand   0 "pmode_reg_or_0_operand")
+   (match_operand   1 "pmode_reg_or_0_operand")
+   (match_operand:V 2 "register_operand")
+   (match_operand:  3 "vector_mask_operand")
+   (match_operand   4 "autovec_length_operand")
+   (match_operand   5 "const_0_operand")]
+  "TARGET_VECTOR"
+  {
+riscv_vector::expand_strided_store(mode, operands);
+DONE;
+  })
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index d690162bb0c..47c9494ff2b 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -696,6 +696,8 @@ bool expand_strcmp (rtx, rtx, rtx, rtx, unsigned 
HOST_WIDE_INT, bool);
 void emit_vec_extract (rtx, rtx, rtx);
 bool expand_vec_setmem (rtx, rtx, rtx);
 bool expand_vec_cmpmem (rtx, rtx, rtx, rtx);
+void expand_strided_load (machine_mode, rtx *);
+void expand_strided_store (machine_mode, rtx *);
 
 /* Rounding mode bitfield for fixed point VXRM.  */
 enum fixed_point_rounding_mode
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 630fbd80e94..ae028e8928a 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -3833,6 +3833,58 @@ expand_load_store (rtx *ops, bool is_load)
 }
 }
 
+/* Expand MASK_LEN_STRIDED_LOAD.  */
+void
+expand_strided_load (machine_mode mode, rtx *ops)
+{
+  rtx v_reg = ops[0];
+  rtx base = ops[1];
+  rtx stride = ops[2];
+  rtx mask = ops[3];
+  rtx len = ops[4];
+  poly_int64 len_val;
+
+  insn_code icode = code_for_pred_strided_load (mode);
+  rtx emit_ops[] = {v_reg, mask, gen_rtx_MEM (mode, base), stride};
+
+  if (poly_int_rtx_p (len, &len_val)
+  && known_eq (len_val, GET_MODE_NUNITS (mode)))
+emit_vlmax_insn (icode, BINARY_OP_TAMA, emit_ops);
+  else
+{
+  len = satisfies_constraint_K (len) ? len : force_reg (Pmode, len);
+  emit_nonvlmax_insn (icode, BINARY_OP_TAMA, emit_ops, len);
+}
+}
+
+/* Expand MASK_LEN_STRIDED_STORE.  */
+void
+expand_strided_store (machine_mode mode, rtx *ops)
+{
+  rtx v_reg = ops[2];
+  rtx base = ops[0];
+  rtx stride = ops[1];
+  rtx mask = ops[3];
+  rtx len = ops[4];
+  poly_int64 len_val;
+  rtx vl_type;
+
+  if (poly_int_rtx_p (len, &len_val)
+  && known_eq (len_val, GET_MODE_NUNITS (mode)))
+{
+  len = gen_reg_rtx (Pmode);
+  emit_vlmax_vsetvl (mode, len);
+  vl_type = get_avl_type_rtx (VLMAX);

[PATCH v2 2/2] aarch64: Add mfloat vreinterpret intrinsics

2024-10-23 Thread Andrew Carlotti
This patch splits out some of the qualifier handling from the v1 patch, and
adjusts the VREINTERPRET* macros to include support for mf8 intrinsics.
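
As a usage sketch (my own example, assuming the generated intrinsic
names follow the usual vreinterpret[q]_<dst>_<src> pattern):

#include <arm_neon.h>

/* Reinterpret the bits of an 8-bit modal float vector as unsigned
   bytes, and back again for the 128-bit variant.  */
uint8x8_t
mf8_bits (mfloat8x8_t a)
{
  return vreinterpret_u8_mf8 (a);
}

mfloat8x16_t
bits_to_mf8q (uint8x16_t a)
{
  return vreinterpretq_mf8_u8 (a);
}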

Bootstrapped and regression tested on aarch64; ok for master?

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc (MODE_d_mf8): New.
(MODE_q_mf8): New.
(QUAL_mf8): New.
(VREINTERPRET_BUILTINS1): Add mf8 entry.
(VREINTERPRET_BUILTINS): Ditto.
(VREINTERPRETQ_BUILTINS1): Ditto.
(VREINTERPRETQ_BUILTINS): Ditto.
(aarch64_lookup_simd_type_in_table): Match the modal_float bit.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/advsimd-intrinsics/mf8-reinterpret.c: New test.


diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index 
432131c3b2d7cf4f788b79ce3d84c9e7554dc750..31231c9e66ee8307cb86e181fc51ea2622c5f82c
 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -133,6 +133,7 @@
 #define MODE_d_f16 E_V4HFmode
 #define MODE_d_f32 E_V2SFmode
 #define MODE_d_f64 E_V1DFmode
+#define MODE_d_mf8 E_V8QImode
 #define MODE_d_s8 E_V8QImode
 #define MODE_d_s16 E_V4HImode
 #define MODE_d_s32 E_V2SImode
@@ -148,6 +149,7 @@
 #define MODE_q_f16 E_V8HFmode
 #define MODE_q_f32 E_V4SFmode
 #define MODE_q_f64 E_V2DFmode
+#define MODE_q_mf8 E_V16QImode
 #define MODE_q_s8 E_V16QImode
 #define MODE_q_s16 E_V8HImode
 #define MODE_q_s32 E_V4SImode
@@ -177,6 +179,7 @@
 #define QUAL_p16 qualifier_poly
 #define QUAL_p64 qualifier_poly
 #define QUAL_p128 qualifier_poly
+#define QUAL_mf8 qualifier_modal_float
 
 #define LENGTH_d ""
 #define LENGTH_q "q"
@@ -598,6 +601,7 @@ static aarch64_simd_builtin_datum 
aarch64_simd_builtin_data[] = {
 /* vreinterpret intrinsics are defined for any pair of element types.
{ _bf16   }   { _bf16   }
{  _f16 _f32 _f64 }   {  _f16 _f32 _f64 }
+   { _mf8}   { _mf8}
{ _s8  _s16 _s32 _s64 } x { _s8  _s16 _s32 _s64 }
{ _u8  _u16 _u32 _u64 }   { _u8  _u16 _u32 _u64 }
{ _p8  _p16  _p64 }   { _p8  _p16  _p64 }.  */
@@ -609,6 +613,7 @@ static aarch64_simd_builtin_datum 
aarch64_simd_builtin_data[] = {
   VREINTERPRET_BUILTIN2 (A, f16) \
   VREINTERPRET_BUILTIN2 (A, f32) \
   VREINTERPRET_BUILTIN2 (A, f64) \
+  VREINTERPRET_BUILTIN2 (A, mf8) \
   VREINTERPRET_BUILTIN2 (A, s8) \
   VREINTERPRET_BUILTIN2 (A, s16) \
   VREINTERPRET_BUILTIN2 (A, s32) \
@@ -626,6 +631,7 @@ static aarch64_simd_builtin_datum 
aarch64_simd_builtin_data[] = {
   VREINTERPRET_BUILTINS1 (f16) \
   VREINTERPRET_BUILTINS1 (f32) \
   VREINTERPRET_BUILTINS1 (f64) \
+  VREINTERPRET_BUILTINS1 (mf8) \
   VREINTERPRET_BUILTINS1 (s8) \
   VREINTERPRET_BUILTINS1 (s16) \
   VREINTERPRET_BUILTINS1 (s32) \
@@ -641,6 +647,7 @@ static aarch64_simd_builtin_datum 
aarch64_simd_builtin_data[] = {
 /* vreinterpretq intrinsics are additionally defined for p128.
{ _bf16 }   { _bf16 }
{  _f16 _f32 _f64   }   {  _f16 _f32 _f64   }
+   { _mf8  }   { _mf8  }
{ _s8  _s16 _s32 _s64   } x { _s8  _s16 _s32 _s64   }
{ _u8  _u16 _u32 _u64   }   { _u8  _u16 _u32 _u64   }
{ _p8  _p16  _p64 _p128 }   { _p8  _p16  _p64 _p128 }.  */
@@ -652,6 +659,7 @@ static aarch64_simd_builtin_datum 
aarch64_simd_builtin_data[] = {
   VREINTERPRETQ_BUILTIN2 (A, f16) \
   VREINTERPRETQ_BUILTIN2 (A, f32) \
   VREINTERPRETQ_BUILTIN2 (A, f64) \
+  VREINTERPRETQ_BUILTIN2 (A, mf8) \
   VREINTERPRETQ_BUILTIN2 (A, s8) \
   VREINTERPRETQ_BUILTIN2 (A, s16) \
   VREINTERPRETQ_BUILTIN2 (A, s32) \
@@ -670,6 +678,7 @@ static aarch64_simd_builtin_datum 
aarch64_simd_builtin_data[] = {
   VREINTERPRETQ_BUILTINS1 (f16) \
   VREINTERPRETQ_BUILTINS1 (f32) \
   VREINTERPRETQ_BUILTINS1 (f64) \
+  VREINTERPRETQ_BUILTINS1 (mf8) \
   VREINTERPRETQ_BUILTINS1 (s8) \
   VREINTERPRETQ_BUILTINS1 (s16) \
   VREINTERPRETQ_BUILTINS1 (s32) \
@@ -1117,7 +1126,8 @@ aarch64_lookup_simd_type_in_table (machine_mode mode,
 {
   int i;
   int nelts = ARRAY_SIZE (aarch64_simd_types);
-  int q = qualifiers & (qualifier_poly | qualifier_unsigned);
+  int q = qualifiers
+& (qualifier_poly | qualifier_unsigned | qualifier_modal_float);
 
   for (i = 0; i < nelts; i++)
 {
diff --git 
a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/mf8-reinterpret.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/mf8-reinterpret.c
new file mode 100644
index 
..5e5921746036bbfbf20d2a77697760efd1f71cc2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/mf8-reinterpret.c
@@ -0,0 +1,46 @@
+/* { dg-do compile { target { aarch64*-*-* } } } */
+
+#include <arm_neon.h>
+
+#define TEST_128(T, S)   \
+T test_vreinterpretq_##S##_mf8 (mfloat8x16_t a)\
+{\
+  return vreinterpretq_##S##_mf8 (a);\
+} 

Re: [PATCH] match: Reject non-const internal functions [PR117260]

2024-10-23 Thread Richard Biener
On Wed, Oct 23, 2024 at 8:50 AM Richard Biener
 wrote:
>
> On Tue, Oct 22, 2024 at 7:21 PM Andrew Pinski  
> wrote:
> >
> > When internal functions support was added to match 
> > (r6-4979-gc9e926ce2bdc8b),
> > the check for ECF_CONST was only on the builtin function side. Though before 
> > r15-4503-g8d6d6d537fdc,
> > there was no use of maybe_push_res_to_seq with non-const internal functions 
> > so the check
> > would not make a difference.
> >
> > This adds the check for internal functions just as there is a check for 
> > builtins.
> >
> > Note I didn't add a testcase because there was no non-const internal 
> > function
> > which could be used on x86_64 in a decent manner.
> >
> > Bootstrapped and tested on x86_64-linux-gnu.
>
> OK.

Note there are similar correctness restrictions on the matching side, say

  _1 = strlen (_2);
  *_2 = '0';
  _3 = strlen (_2);
  _4 = _3 == _1;

if you had

(simplify
 (eq (BUILT_IN_STRLEN @0) (BUILT_IN_STRLEN @0))
 { boolean_true_node; })

it would currently miscompile the above - you have to be careful when
writing patterns here and it's currently not possible to capture/compare
virtual operands in the patterns (no such in GENERIC) and genmatch
doesn't try to be clever here, but it also doesn't reject matching of
non-ECF_CONST calls since that would go too far.  On the genmatch
side it might be possible to make sure everything we match uses
the same VUSE (or NULL) and has no VDEF (or when there's a VDEF
it matches the other VUSEs - which means there's at most one VDEF).

'errno' is a nuisance there - we might want to make whether a stmt
affects 'errno' a per-stmt thing so we can use range info to prune this
(IIRC there's a PR also requesting some function attribute to annotate
library functions)

Due to the 'errno' nuisance and effective disabling of some patterns
I initially refrained from doing anything about virtual operands during
matching time.

Richard.

> Thanks,
> Richard.
>
> > gcc/ChangeLog:
> >
> > PR tree-optimization/117260
> > * gimple-match-exports.cc (maybe_push_res_to_seq): Reject non-const
> > internal functions.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  gcc/gimple-match-exports.cc | 5 +
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
> > index d3e626a1a24..77d225825cf 100644
> > --- a/gcc/gimple-match-exports.cc
> > +++ b/gcc/gimple-match-exports.cc
> > @@ -522,6 +522,11 @@ maybe_push_res_to_seq (gimple_match_op *res_op, 
> > gimple_seq *seq, tree res)
> > {
> >   /* Generate the given function if we can.  */
> >   internal_fn ifn = as_internal_fn (fn);
> > +
> > + /* We can't and should not emit calls to non-const functions.  */
> > + if (!(internal_fn_flags (ifn) & ECF_CONST))
> > +   return NULL_TREE;
> > +
> >   new_stmt = build_call_internal (ifn, res_op);
> >   if (!new_stmt)
> > return NULL_TREE;
> > --
> > 2.43.0
> >


[PATCH 01/22] aarch64: Add -mbranch-protection=gcs option

2024-10-23 Thread Yury Khrustalev
From: Szabolcs Nagy 

This enables Guarded Control Stack (GCS) compatible code generation.

The "standard" branch-protection type enables it, and the default
depends on the compiler default.

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch_gcs_enabled): Declare.
* config/aarch64/aarch64.cc (aarch_gcs_enabled): Define.
(aarch_handle_no_branch_protection): Handle gcs.
(aarch_handle_standard_branch_protection): Handle gcs.
(aarch_handle_gcs_protection): New.
* config/aarch64/aarch64.opt: Add aarch_enable_gcs.
* configure: Regenerate.
* configure.ac: Handle gcs in --enable-standard-branch-protection.
* doc/invoke.texi: Document -mbranch-protection=gcs.
---
 gcc/config/aarch64/aarch64-protos.h |  2 ++
 gcc/config/aarch64/aarch64.cc   | 24 
 gcc/config/aarch64/aarch64.opt  |  3 +++
 gcc/configure   |  2 +-
 gcc/configure.ac|  6 +++---
 gcc/doc/invoke.texi |  5 +++--
 6 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index d03c1fe798b..b8ec8a58c4e 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1125,4 +1125,6 @@ extern void aarch64_adjust_reg_alloc_order ();
 bool aarch64_optimize_mode_switching (aarch64_mode_entity);
 void aarch64_restore_za (rtx);
 
+extern bool aarch64_gcs_enabled ();
+
 #endif /* GCC_AARCH64_PROTOS_H */
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 3ab550acc7c..33f97d42d55 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -8502,6 +8502,13 @@ aarch_bti_j_insn_p (rtx_insn *insn)
   return GET_CODE (pat) == UNSPEC_VOLATILE && XINT (pat, 1) == UNSPECV_BTI_J;
 }
 
+/* Return TRUE if Guarded Control Stack is enabled.  */
+bool
+aarch64_gcs_enabled (void)
+{
+  return (aarch64_enable_gcs == 1);
+}
+
 /* Check if X (or any sub-rtx of X) is a PACIASP/PACIBSP instruction.  */
 bool
 aarch_pac_insn_p (rtx x)
@@ -18885,6 +18892,7 @@ aarch64_handle_no_branch_protection (void)
 {
   aarch_ra_sign_scope = AARCH_FUNCTION_NONE;
   aarch_enable_bti = 0;
+  aarch64_enable_gcs = 0;
 }
 
 static void
@@ -18893,6 +18901,7 @@ aarch64_handle_standard_branch_protection (void)
   aarch_ra_sign_scope = AARCH_FUNCTION_NON_LEAF;
   aarch64_ra_sign_key = AARCH64_KEY_A;
   aarch_enable_bti = 1;
+  aarch64_enable_gcs = 1;
 }
 
 static void
@@ -18919,6 +18928,11 @@ aarch64_handle_bti_protection (void)
 {
   aarch_enable_bti = 1;
 }
+static void
+aarch64_handle_gcs_protection (void)
+{
+  aarch64_enable_gcs = 1;
+}
 
 static const struct aarch_branch_protect_type aarch64_pac_ret_subtypes[] = {
   { "leaf", false, aarch64_handle_pac_ret_leaf, NULL, 0 },
@@ -18933,6 +18947,7 @@ static const struct aarch_branch_protect_type 
aarch64_branch_protect_types[] =
   { "pac-ret", false, aarch64_handle_pac_ret_protection,
 aarch64_pac_ret_subtypes, ARRAY_SIZE (aarch64_pac_ret_subtypes) },
   { "bti", false, aarch64_handle_bti_protection, NULL, 0 },
+  { "gcs", false, aarch64_handle_gcs_protection, NULL, 0 },
   { NULL, false, NULL, NULL, 0 }
 };
 
@@ -19032,6 +19047,15 @@ aarch64_override_options (void)
 #endif
 }
 
+  if (aarch64_enable_gcs == 2)
+{
+#ifdef TARGET_ENABLE_GCS
+  aarch64_enable_gcs = 1;
+#else
+  aarch64_enable_gcs = 0;
+#endif
+}
+
   /* Return address signing is currently not supported for ILP32 targets.  For
  LP64 targets use the configured option in the absence of a command-line
  option for -mbranch-protection.  */
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index c2c9965b062..36bc719b822 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -45,6 +45,9 @@ uint64_t aarch64_isa_flags_1 = 0
 TargetVariable
 unsigned aarch_enable_bti = 2
 
+TargetVariable
+unsigned aarch64_enable_gcs = 2
+
 TargetVariable
 enum aarch64_key_type aarch64_ra_sign_key = AARCH64_KEY_A
 
diff --git a/gcc/configure b/gcc/configure
index 5acc42c1e4d..8ed47b4dadb 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -28044,7 +28044,7 @@ if test "${enable_standard_branch_protection+set}" = 
set; then :
   enableval=$enable_standard_branch_protection;
 case $enableval in
   yes)
-tm_defines="${tm_defines} TARGET_ENABLE_BTI=1 
TARGET_ENABLE_PAC_RET=1"
+tm_defines="${tm_defines} TARGET_ENABLE_BTI=1 
TARGET_ENABLE_PAC_RET=1 TARGET_ENABLE_GCS=1"
 ;;
   no)
 ;;
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 23f4884eff9..8048c4550aa 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -4392,14 +4392,14 @@ case "$target" in
 AC_ARG_ENABLE(standard-branch-protection,
 [
 AS_HELP_STRING([--enable-standard-branch-protection],
-[enable Branch Target Identification Mechanism and Return Address 
Signing by 

[PATCH 08/22] aarch64: Add __builtin_aarch64_gcs* tests

2024-10-23 Thread Yury Khrustalev
From: Szabolcs Nagy 

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/gcspopm-1.c: New test.
* gcc.target/aarch64/gcspr-1.c: New test.
* gcc.target/aarch64/gcsss-1.c: New test.
---
 gcc/testsuite/gcc.target/aarch64/gcspopm-1.c | 69 
 gcc/testsuite/gcc.target/aarch64/gcspr-1.c   | 31 +
 gcc/testsuite/gcc.target/aarch64/gcsss-1.c   | 49 ++
 3 files changed, 149 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/gcspopm-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/gcspr-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/gcsss-1.c

diff --git a/gcc/testsuite/gcc.target/aarch64/gcspopm-1.c 
b/gcc/testsuite/gcc.target/aarch64/gcspopm-1.c
new file mode 100644
index 000..6e6add39cf7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/gcspopm-1.c
@@ -0,0 +1,69 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mbranch-protection=none" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+/*
+**foo1:
+** sysl xzr, #3, c7, c7, #1 // gcspopm
+** ret
+*/
+void
+foo1 (void)
+{
+  __builtin_aarch64_gcspopm ();
+}
+
+/*
+**foo2:
+** mov x0, 0
+** sysl x0, #3, c7, c7, #1 // gcspopm
+** ret
+*/
+unsigned long long
+foo2 (void)
+{
+  return __builtin_aarch64_gcspopm ();
+}
+
+/*
+**foo3:
+** mov x16, 1
+** (
+** mov x0, 0
+** hint 40 // chkfeat x16
+** |
+** hint 40 // chkfeat x16
+** mov x0, 0
+** )
+** cbz x16, .*
+** ret
+** mov x0, 0
+** sysl x0, #3, c7, c7, #1 // gcspopm
+** ret
+*/
+unsigned long long
+foo3 (void)
+{
+  if (__builtin_aarch64_chkfeat (1) == 0)
+return __builtin_aarch64_gcspopm ();
+  return 0;
+}
+
+/*
+**foo4:
+** sysl xzr, #3, c7, c7, #1 // gcspopm
+** mov x0, 0
+** sysl x0, #3, c7, c7, #1 // gcspopm
+** sysl xzr, #3, c7, c7, #1 // gcspopm
+** ret
+*/
+unsigned long long
+foo4 (void)
+{
+  unsigned long long a = __builtin_aarch64_gcspopm ();
+  unsigned long long b = __builtin_aarch64_gcspopm ();
+  unsigned long long c = __builtin_aarch64_gcspopm ();
+  (void) a;
+  (void) c;
+  return b;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/gcspr-1.c 
b/gcc/testsuite/gcc.target/aarch64/gcspr-1.c
new file mode 100644
index 000..0e651979551
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/gcspr-1.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mbranch-protection=none" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+/*
+**foo1:
+** mrs x0, s3_3_c2_c5_1 // gcspr_el0
+** ret
+*/
+void *
+foo1 (void)
+{
+  return __builtin_aarch64_gcspr ();
+}
+
+/*
+**foo2:
+** mrs x[0-9]*, s3_3_c2_c5_1 // gcspr_el0
+** sysl xzr, #3, c7, c7, #1 // gcspopm
+** mrs x[0-9]*, s3_3_c2_c5_1 // gcspr_el0
+** sub x0, x[0-9]*, x[0-9]*
+** ret
+*/
+long
+foo2 (void)
+{
+  const char *p = __builtin_aarch64_gcspr ();
+  __builtin_aarch64_gcspopm ();
+  const char *q = __builtin_aarch64_gcspr ();
+  return p - q;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/gcsss-1.c 
b/gcc/testsuite/gcc.target/aarch64/gcsss-1.c
new file mode 100644
index 000..025c7fee647
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/gcsss-1.c
@@ -0,0 +1,49 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mbranch-protection=none" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+/*
+**foo1:
+** sys #3, c7, c7, #2, x0 // gcsss1
+** mov x[0-9]*, 0
+** sysl x[0-9]*, #3, c7, c7, #3 // gcsss2
+** ret
+*/
+void
+foo1 (void *p)
+{
+  __builtin_aarch64_gcsss (p);
+}
+
+/*
+**foo2:
+** sys #3, c7, c7, #2, x0 // gcsss1
+** mov x0, 0
+** sysl x0, #3, c7, c7, #3 // gcsss2
+** ret
+*/
+void *
+foo2 (void *p)
+{
+  return __builtin_aarch64_gcsss (p);
+}
+
+/*
+**foo3:
+** mov x16, 1
+** hint 40 // chkfeat x16
+** cbnz x16, .*
+** sys #3, c7, c7, #2, x0 // gcsss1
+** mov x0, 0
+** sysl x0, #3, c7, c7, #3 // gcsss2
+** ret
+** mov x0, 0
+** ret
+*/
+void *
+foo3 (void *p)
+{
+  if (__builtin_aarch64_chkfeat (1) == 0)
+return __builtin_aarch64_gcsss (p);
+  return 0;
+}
-- 
2.39.5



[PATCH 06/22] aarch64: Add GCS instructions

2024-10-23 Thread Yury Khrustalev
From: Szabolcs Nagy 

Add instructions for the Guarded Control Stack extension.

GCSSS1 and GCSSS2 are modelled as a single GCSSS unspec, because they
are always used together in the compiler.

Before GCSPOPM and GCSSS2 an extra "mov xn, 0" is added to clear the
output register; this is needed to get a reasonable result when GCS is
disabled and the instructions are NOPs. Since the instructions are
expected to be used behind runtime feature checks, this is mainly
relevant if GCS can be disabled asynchronously.

The output of GCSPOPM is usually not needed, so a separate gcspopm_xzr
pattern was added to model that.  We did not do the same for GCSSS as it
is a less common operation.

The mnemonics used do not depend on an updated assembler, since these
instructions can be used without a new -march setting behind a runtime
check.

Reading the GCSPR is modelled as unspec_volatile so it does not get
reordered wrt the other instructions changing the GCSPR.

gcc/ChangeLog:

* config/aarch64/aarch64.md (aarch64_load_gcspr): New.
(aarch64_gcspopm): New.
(aarch64_gcspopm_xzr): New.
(aarch64_gcsss): New.
---
 gcc/config/aarch64/aarch64.md | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 43bed0ce10f..e4e11e35b5b 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -382,6 +382,9 @@ (define_c_enum "unspecv" [
 UNSPECV_BTI_J  ; Represent BTI j.
 UNSPECV_BTI_JC ; Represent BTI jc.
 UNSPECV_CHKFEAT; Represent CHKFEAT X16.
+UNSPECV_GCSPR  ; Represent MRS Xn, GCSPR_EL0
+UNSPECV_GCSPOPM; Represent GCSPOPM.
+UNSPECV_GCSSS  ; Represent GCSSS1 and GCSSS2.
 UNSPECV_TSTART ; Represent transaction start.
 UNSPECV_TCOMMIT; Represent transaction commit.
 UNSPECV_TCANCEL; Represent transaction cancel.
@@ -8321,6 +8324,38 @@ (define_insn "aarch64_chkfeat"
   "hint\\t40 // chkfeat x16"
 )
 
+;; Guarded Control Stack (GCS) instructions
+(define_insn "aarch64_load_gcspr"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (unspec_volatile:DI [(const_int 0)] UNSPECV_GCSPR))]
+  ""
+  "mrs\\t%0, s3_3_c2_c5_1 // gcspr_el0"
+  [(set_attr "type" "mrs")]
+)
+
+(define_insn "aarch64_gcspopm"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (unspec_volatile:DI [(const_int 0)] UNSPECV_GCSPOPM))]
+  ""
+  "mov\\t%0, 0\;sysl\\t%0, #3, c7, c7, #1 // gcspopm"
+  [(set_attr "length" "8")]
+)
+
+(define_insn "aarch64_gcspopm_xzr"
+  [(unspec_volatile [(const_int 0)] UNSPECV_GCSPOPM)]
+  ""
+  "sysl\\txzr, #3, c7, c7, #1 // gcspopm"
+)
+
+(define_insn "aarch64_gcsss"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (unspec_volatile:DI [(match_operand:DI 1 "register_operand" "r")]
+ UNSPECV_GCSSS))]
+  ""
+  "sys\\t#3, c7, c7, #2, %1 // gcsss1\;mov\\t%0, 0\;sysl\\t%0, #3, c7, c7, #3 
// gcsss2"
+  [(set_attr "length" "12")]
+)
+
 ;; AdvSIMD Stuff
 (include "aarch64-simd.md")
 
-- 
2.39.5



[PATCH 11/22] aarch64: Add ACLE feature macros for GCS

2024-10-23 Thread Yury Khrustalev
From: Szabolcs Nagy 

gcc/ChangeLog:

* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Define
macros for GCS.
---
 gcc/config/aarch64/aarch64-c.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index f9b9e379375..bdc1c0da584 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -247,6 +247,9 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
 
   aarch64_def_or_undef (TARGET_PAUTH, "__ARM_FEATURE_PAUTH", pfile);
   aarch64_def_or_undef (TARGET_BTI, "__ARM_FEATURE_BTI", pfile);
+  aarch64_def_or_undef (aarch64_gcs_enabled (),
+   "__ARM_FEATURE_GCS_DEFAULT", pfile);
+  aarch64_def_or_undef (TARGET_GCS, "__ARM_FEATURE_GCS", pfile);
   aarch64_def_or_undef (TARGET_I8MM, "__ARM_FEATURE_MATMUL_INT8", pfile);
   aarch64_def_or_undef (TARGET_BF16_SIMD,
"__ARM_FEATURE_BF16_VECTOR_ARITHMETIC", pfile);
-- 
2.39.5



[PATCH 04/22] aarch64: Add __builtin_aarch64_chkfeat

2024-10-23 Thread Yury Khrustalev
From: Szabolcs Nagy 

Add a builtin for chkfeat: the input argument is used to initialize x16,
then chkfeat is executed and the updated x16 is returned.

Note: ACLE __chkfeat(x) plans to flip the bits to be more intuitive
(xor the input to the output), but for the builtin that seems an
unnecessary complication.
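
A small usage sketch (the helper name is mine; the tests later in this
series use the same idiom): with an input of 1, bit 0 queries the GCS
feature and remains set on return if GCS is not enabled at runtime.

static inline int
gcs_enabled_p (void)
{
  /* Bit 0 still set after CHKFEAT means GCS is disabled.  */
  return __builtin_aarch64_chkfeat (1) == 0;
}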

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc (enum aarch64_builtins):
Define AARCH64_BUILTIN_CHKFEAT.
(aarch64_general_init_builtins): Handle chkfeat.
(aarch64_general_expand_builtin): Handle chkfeat.
---
 gcc/config/aarch64/aarch64-builtins.cc | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index 7d737877e0b..765f2091504 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -875,6 +875,8 @@ enum aarch64_builtins
   AARCH64_PLDX,
   AARCH64_PLI,
   AARCH64_PLIX,
+  /* Armv8.9-A / Armv9.4-A builtins.  */
+  AARCH64_BUILTIN_CHKFEAT,
   AARCH64_BUILTIN_MAX
 };
 
@@ -2280,6 +2282,12 @@ aarch64_general_init_builtins (void)
   if (!TARGET_ILP32)
 aarch64_init_pauth_hint_builtins ();
 
+  tree ftype_chkfeat
+= build_function_type_list (uint64_type_node, uint64_type_node, NULL);
+  aarch64_builtin_decls[AARCH64_BUILTIN_CHKFEAT]
+= aarch64_general_add_builtin ("__builtin_aarch64_chkfeat", ftype_chkfeat,
+  AARCH64_BUILTIN_CHKFEAT);
+
   if (in_lto_p)
 handle_arm_acle_h ();
 }
@@ -3484,6 +3492,16 @@ aarch64_general_expand_builtin (unsigned int fcode, tree 
exp, rtx target,
 case AARCH64_PLIX:
   aarch64_expand_prefetch_builtin (exp, fcode);
   return target;
+
+case AARCH64_BUILTIN_CHKFEAT:
+  {
+   rtx x16_reg = gen_rtx_REG (DImode, R16_REGNUM);
+   op0 = expand_normal (CALL_EXPR_ARG (exp, 0));
+   emit_move_insn (x16_reg, op0);
+   expand_insn (CODE_FOR_aarch64_chkfeat, 0, 0);
+   emit_move_insn (target, x16_reg);
+   return target;
+  }
 }
 
   if (fcode >= AARCH64_SIMD_BUILTIN_BASE && fcode <= AARCH64_SIMD_BUILTIN_MAX)
-- 
2.39.5



[PATCH 09/22] aarch64: Add GCS support for nonlocal stack save

2024-10-23 Thread Yury Khrustalev
From: Szabolcs Nagy 

Nonlocal stack save and restore also has to save and restore the GCS
pointer. This is used in __builtin_setjmp/longjmp and nonlocal goto.

The GCS specific code is only emitted if GCS branch-protection is
enabled and the code always checks at runtime if GCS is enabled.

The new -mbranch-protection=gcs and old -mbranch-protection=none code
are ABI compatible: jmpbuf for __builtin_setjmp has space for 5
pointers, the layout is

  old layout: fp, pc, sp, unused, unused
  new layout: fp, pc, sp, gcsp, unused

Note: the ILP32 code generation is wrong, as it saves the pointers with
Pmode (i.e. 8 bytes per pointer) while the user-supplied buffer is sized
for 5 pointers of 4 bytes each; this is not fixed here.

The nonlocal goto has no ABI compatibility issues as the goto and its
destination are in the same translation unit.
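
As a minimal sketch of what this enables (illustrative only), a
__builtin_setjmp/longjmp pair now also saves GCSPR into slot 3 of the
user-supplied buffer and restores it on the longjmp path:

  void *buf[5];  /* fp, pc, sp, gcsp, unused  */

  void g (void) { __builtin_longjmp (buf, 1); }

  int f (void)
  {
    if (__builtin_setjmp (buf))
      return 1;  /* GCS pointer restored from buf[3].  */
    g ();
    return 0;
  }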

gcc/ChangeLog:

* config/aarch64/aarch64.h (STACK_SAVEAREA_MODE): Make space for gcs.
* config/aarch64/aarch64.md (save_stack_nonlocal): New.
(restore_stack_nonlocal): New.
---
 gcc/config/aarch64/aarch64.h  |  7 +++
 gcc/config/aarch64/aarch64.md | 82 +++
 2 files changed, 89 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 593319fd472..43a92e85780 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -1297,6 +1297,13 @@ typedef struct
 #define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
   ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE), 2)
 
+/* Have space for both SP and GCSPR in the NONLOCAL case in
+   emit_stack_save as well as in __builtin_setjmp, __builtin_longjmp
+   and __builtin_nonlocal_goto.
+   Note: On ILP32 the documented buf size is not enough PR84150.  */
+#define STACK_SAVEAREA_MODE(LEVEL) \
+  ((LEVEL) == SAVE_NONLOCAL ? TImode : Pmode)
+
 #define INCOMING_RETURN_ADDR_RTX gen_rtx_REG (Pmode, LR_REGNUM)
 
 #define RETURN_ADDR_RTX aarch64_return_addr
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index e4e11e35b5b..6e1646387d8 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1200,6 +1200,88 @@ (define_insn "*cb1"
  (const_int 1)))]
 )
 
+(define_expand "save_stack_nonlocal"
+  [(set (match_operand 0 "memory_operand")
+(match_operand 1 "register_operand"))]
+  ""
+{
+  rtx stack_slot = adjust_address (operands[0], Pmode, 0);
+  emit_move_insn (stack_slot, operands[1]);
+
+  if (aarch64_gcs_enabled ())
+{
+  /* Save GCS with code like
+   mov x16, 1
+   chkfeat x16
+   tbnz x16, 0, .L_done
+   mrs tmp, gcspr_el0
+   str tmp, [%0, 8]
+   .L_done:  */
+
+  rtx done_label = gen_label_rtx ();
+  rtx r16 = gen_rtx_REG (DImode, R16_REGNUM);
+  emit_move_insn (r16, const1_rtx);
+  emit_insn (gen_aarch64_chkfeat ());
+  emit_insn (gen_tbranch_neqi3 (r16, const0_rtx, done_label));
+  rtx gcs_slot = adjust_address (operands[0], Pmode, GET_MODE_SIZE (Pmode));
+  rtx gcs = force_reg (Pmode, const0_rtx);
+  emit_insn (gen_aarch64_load_gcspr (gcs));
+  emit_move_insn (gcs_slot, gcs);
+  emit_label (done_label);
+}
+  DONE;
+})
+
+(define_expand "restore_stack_nonlocal"
+  [(set (match_operand 0 "register_operand" "")
+   (match_operand 1 "memory_operand" ""))]
+  ""
+{
+  rtx stack_slot = adjust_address (operands[1], Pmode, 0);
+  emit_move_insn (operands[0], stack_slot);
+
+  if (aarch64_gcs_enabled ())
+{
+  /* Restore GCS with code like
+   mov x16, 1
+   chkfeat x16
+   tbnz x16, 0, .L_done
+   ldr tmp1, [%1, 8]
+   mrs tmp2, gcspr_el0
+   subs tmp2, tmp1, tmp2
+   b.eq .L_done
+   .L_loop:
+   gcspopm
+   subs tmp2, tmp2, 8
+   b.ne .L_loop
+   .L_done:  */
+
+  rtx loop_label = gen_label_rtx ();
+  rtx done_label = gen_label_rtx ();
+  rtx r16 = gen_rtx_REG (DImode, R16_REGNUM);
+  emit_move_insn (r16, const1_rtx);
+  emit_insn (gen_aarch64_chkfeat ());
+  emit_insn (gen_tbranch_neqi3 (r16, const0_rtx, done_label));
+  rtx gcs_slot = adjust_address (operands[1], Pmode, GET_MODE_SIZE (Pmode));
+  rtx gcs_old = force_reg (Pmode, const0_rtx);
+  emit_move_insn (gcs_old, gcs_slot);
+  rtx gcs_now = force_reg (Pmode, const0_rtx);
+  emit_insn (gen_aarch64_load_gcspr (gcs_now));
+  emit_insn (gen_subdi3_compare1 (gcs_now, gcs_old, gcs_now));
+  rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
+  rtx cmp_rtx = gen_rtx_fmt_ee (EQ, DImode, cc_reg, const0_rtx);
+  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, done_label));
+  emit_label (loop_label);
+  emit_insn (gen_aarch64_gcspopm_xzr ());
+  emit_insn (gen_adddi3_compare0 (gcs_now, gcs_now, GEN_INT (-8)));
+  cc_reg = gen_rtx_RE

[PATCH 13/22] aarch64: Add target pragma tests for gcs

2024-10-23 Thread Yury Khrustalev
From: Szabolcs Nagy 

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pragma_cpp_predefs_4.c: Add gcs specific
tests.
---
 .../gcc.target/aarch64/pragma_cpp_predefs_4.c | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/gcc/testsuite/gcc.target/aarch64/pragma_cpp_predefs_4.c b/gcc/testsuite/gcc.target/aarch64/pragma_cpp_predefs_4.c
index 8e707630774..417293d4d5a 100644
--- a/gcc/testsuite/gcc.target/aarch64/pragma_cpp_predefs_4.c
+++ b/gcc/testsuite/gcc.target/aarch64/pragma_cpp_predefs_4.c
@@ -91,6 +91,9 @@
 #if __ARM_FEATURE_PAC_DEFAULT != 1
 #error Foo
 #endif
+#ifndef __ARM_FEATURE_GCS_DEFAULT
+#error Foo
+#endif
 
 #pragma GCC target ("branch-protection=none")
 #ifdef __ARM_FEATURE_BTI_DEFAULT
@@ -99,6 +102,9 @@
 #ifdef __ARM_FEATURE_PAC_DEFAULT
 #error Foo
 #endif
+#ifdef __ARM_FEATURE_GCS_DEFAULT
+#error Foo
+#endif
 
 #pragma GCC push_options
 #pragma GCC target "branch-protection=bti+pac-ret"
@@ -117,6 +123,9 @@
 #ifdef __ARM_FEATURE_PAC_DEFAULT
 #error Foo
 #endif
+#ifdef __ARM_FEATURE_GCS_DEFAULT
+#error Foo
+#endif
 
 #pragma GCC target "branch-protection=pac-ret"
 #ifdef __ARM_FEATURE_BTI_DEFAULT
@@ -133,3 +142,29 @@
 #if __ARM_FEATURE_PAC_DEFAULT != 6
 #error Foo
 #endif
+
+#pragma GCC target "branch-protection=gcs"
+#ifdef __ARM_FEATURE_BTI_DEFAULT
+#error Foo
+#endif
+#ifdef __ARM_FEATURE_PAC_DEFAULT
+#error Foo
+#endif
+#ifndef __ARM_FEATURE_GCS_DEFAULT
+#error Foo
+#endif
+
+#pragma GCC target "arch=armv8.8-a+gcs"
+#ifndef __ARM_FEATURE_GCS
+#error Foo
+#endif
+
+#pragma GCC target "arch=armv8.8-a+nogcs"
+#ifdef __ARM_FEATURE_GCS
+#error Foo
+#endif
+
+#pragma GCC target "arch=armv8.8-a"
+#ifdef __ARM_FEATURE_GCS
+#error Foo
+#endif
-- 
2.39.5



[PATCH 15/22] aarch64: Emit GNU property NOTE for GCS

2024-10-23 Thread Yury Khrustalev
From: Szabolcs Nagy 

gcc/ChangeLog:

* config/aarch64/aarch64.cc (GNU_PROPERTY_AARCH64_FEATURE_1_GCS):
Define.
(aarch64_file_end_indicate_exec_stack): Set GCS property bit.
---
 gcc/config/aarch64/aarch64.cc | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 33f97d42d55..a89a30113b9 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -29242,6 +29242,7 @@ aarch64_can_tag_addresses ()
 #define GNU_PROPERTY_AARCH64_FEATURE_1_AND 0xc0000000
 #define GNU_PROPERTY_AARCH64_FEATURE_1_BTI (1U << 0)
 #define GNU_PROPERTY_AARCH64_FEATURE_1_PAC (1U << 1)
+#define GNU_PROPERTY_AARCH64_FEATURE_1_GCS (1U << 2)
 void
 aarch64_file_end_indicate_exec_stack ()
 {
@@ -29254,6 +29255,9 @@ aarch64_file_end_indicate_exec_stack ()
   if (aarch_ra_sign_scope != AARCH_FUNCTION_NONE)
 feature_1_and |= GNU_PROPERTY_AARCH64_FEATURE_1_PAC;
 
+  if (aarch64_gcs_enabled ())
+feature_1_and |= GNU_PROPERTY_AARCH64_FEATURE_1_GCS;
+
   if (feature_1_and)
 {
   /* Generate .note.gnu.property section.  */
@@ -29285,6 +29289,7 @@ aarch64_file_end_indicate_exec_stack ()
   assemble_align (POINTER_SIZE);
 }
 }
+#undef GNU_PROPERTY_AARCH64_FEATURE_1_GCS
 #undef GNU_PROPERTY_AARCH64_FEATURE_1_PAC
 #undef GNU_PROPERTY_AARCH64_FEATURE_1_BTI
 #undef GNU_PROPERTY_AARCH64_FEATURE_1_AND
-- 
2.39.5



[PATCH 21/22] aarch64: Fix tests incompatible with GCS

2024-10-23 Thread Yury Khrustalev
From: Matthieu Longo 

gcc/testsuite/ChangeLog:

* g++.target/aarch64/return_address_sign_ab_exception.C: Update.
* gcc.target/aarch64/eh_return.c: Update.
---
 .../return_address_sign_ab_exception.C| 19 +--
 gcc/testsuite/gcc.target/aarch64/eh_return.c  | 13 +
 2 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/g++.target/aarch64/return_address_sign_ab_exception.C b/gcc/testsuite/g++.target/aarch64/return_address_sign_ab_exception.C
index ead11de7b15..6c79ebf03eb 100644
--- a/gcc/testsuite/g++.target/aarch64/return_address_sign_ab_exception.C
+++ b/gcc/testsuite/g++.target/aarch64/return_address_sign_ab_exception.C
@@ -1,16 +1,28 @@
 /* { dg-do run } */
 /* { dg-options "--save-temps" } */
 /* { dg-require-effective-target arm_v8_3a_bkey_directive } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
+/*
+** _Z5foo_av:
+** hint25 // paciasp
+** ...
+*/
 __attribute__((target("branch-protection=pac-ret+leaf")))
 int foo_a () {
   throw 22;
 }
 
+/*
+** _Z5foo_bv:
+** hint27 // pacibsp
+** ...
+*/
 __attribute__((target("branch-protection=pac-ret+leaf+b-key")))
 int foo_b () {
   throw 22;
 }
+/* { dg-final { scan-assembler-times ".cfi_b_key_frame" 1 } } */
 
 int main (int argc, char** argv) {
   try {
@@ -23,9 +35,4 @@ int main (int argc, char** argv) {
 }
   }
   return 1;
-}
-
-/* { dg-final { scan-assembler-times "paciasp" 1 } } */
-/* { dg-final { scan-assembler-times "pacibsp" 1 } } */
-/* { dg-final { scan-assembler-times ".cfi_b_key_frame" 1 } } */
-
+}
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/aarch64/eh_return.c b/gcc/testsuite/gcc.target/aarch64/eh_return.c
index 32179488085..51b20f784b3 100644
--- a/gcc/testsuite/gcc.target/aarch64/eh_return.c
+++ b/gcc/testsuite/gcc.target/aarch64/eh_return.c
@@ -1,6 +1,19 @@
 /* { dg-do run } */
 /* { dg-options "-O2 -fno-inline" } */
 
+/* With BTI enabled, this test would crash with SIGILL, Illegal instruction.
+   The 2nd argument of __builtin_eh_return is expected to be an EH handler
+   within a function, rather than a separate function.
+   The current implementation of __builtin_eh_return in the AArch64 backend
+   emits a jump instead of branching with LR.
+   The prologue of the handler (i.e. continuation) starts with "bti c" (vs.
+   "bti jc") which is a landing pad type prohibiting jumps, hence the exception
+   at runtime.
+   The current behavior of __builtin_eh_return is considered correct.
+   Consequently, the default option -mbranch-protection=standard needs to be
+   overridden to remove BTI.  */
+/* { dg-additional-options "-mbranch-protection=pac-ret+leaf+gcs" { target { default_branch_protection } } } */
+
 #include 
 #include 
 
-- 
2.39.5



[PATCH 18/22] aarch64: libitm: Add GCS support

2024-10-23 Thread Yury Khrustalev
From: Szabolcs Nagy 

Transaction begin and abort use setjmp/longjmp-like operations that
need to be updated for GCS compatibility. We use logic similar to the
libc setjmp/longjmp that supports switching stacks and thus switching
GCS (e.g. due to a longjmp out of a makecontext stack); this is kept
even though it is likely not required for transaction aborts.

The gtm_jmpbuf is internal to libitm so we can change its layout
without breaking ABI.

libitm/ChangeLog:

* config/aarch64/sjlj.S: Add GCS support and mark GCS compatible.
* config/aarch64/target.h: Add gcs field to gtm_jmpbuf.
---
 libitm/config/aarch64/sjlj.S   | 60 --
 libitm/config/aarch64/target.h |  1 +
 2 files changed, 58 insertions(+), 3 deletions(-)

diff --git a/libitm/config/aarch64/sjlj.S b/libitm/config/aarch64/sjlj.S
index aeffd4d1070..cf1d8af2c96 100644
--- a/libitm/config/aarch64/sjlj.S
+++ b/libitm/config/aarch64/sjlj.S
@@ -29,6 +29,13 @@
 #define AUTIASP        hint 29
 #define PACIBSP        hint 27
 #define AUTIBSP        hint 31
+#define CHKFEAT_X16    hint 40
+#define MRS_GCSPR(x)   mrs x, s3_3_c2_c5_1
+#define GCSPOPM(x)     sysl x, #3, c7, c7, #1
+#define GCSSS1(x)      sys #3, c7, c7, #2, x
+#define GCSSS2(x)      sysl x, #3, c7, c7, #3
+
+#define L(name) .L##name
 
 #if defined(HAVE_AS_CFI_PSEUDO_OP) && defined(__GCC_HAVE_DWARF2_CFI_ASM)
 # define cfi_negate_ra_state .cfi_negate_ra_state
@@ -80,7 +87,16 @@ _ITM_beginTransaction:
stp d10, d11, [sp, 7*16]
stp d12, d13, [sp, 8*16]
stp d14, d15, [sp, 9*16]
-   str x1, [sp, 10*16]
+
+   /* GCS support.  */
+   mov x2, 0
+   mov x16, 1
+   CHKFEAT_X16
+   tbnz x16, 0, L(gcs_done_sj)
+   MRS_GCSPR (x2)
+   add x2, x2, 8 /* GCS after _ITM_beginTransaction returns.  */
+L(gcs_done_sj):
+   stp x2, x1, [sp, 10*16]
 
/* Invoke GTM_begin_transaction with the struct we just built.  */
mov x1, sp
@@ -117,7 +133,38 @@ GTM_longjmp:
ldp d10, d11, [x1, 7*16]
ldp d12, d13, [x1, 8*16]
ldp d14, d15, [x1, 9*16]
+
+   /* GCS support.  */
+   mov x16, 1
+   CHKFEAT_X16
+   tbnz x16, 0, L(gcs_done_lj)
+   MRS_GCSPR (x7)
	ldr x3, [x1, 10*16]
+   mov x4, x3
+   /* x7: GCSPR now.  x3, x4: target GCSPR.  x5, x6: tmp regs.  */
+L(gcs_scan):
+   cmp x7, x4
+   b.eq L(gcs_pop)
+   sub x4, x4, 8
+   /* Check for a cap token.  */
+   ldr x5, [x4]
+   and x6, x4, 0xf000
+   orr x6, x6, 1
+   cmp x5, x6
+   b.ne L(gcs_scan)
+L(gcs_switch):
+   add x7, x4, 8
+   GCSSS1 (x4)
+   GCSSS2 (xzr)
+L(gcs_pop):
+   cmp x7, x3
+   b.eq L(gcs_done_lj)
+   GCSPOPM (xzr)
+   add x7, x7, 8
+   b   L(gcs_pop)
+L(gcs_done_lj):
+
+   ldr x3, [x1, 10*16 + 8]
ldp x29, x30, [x1]
cfi_def_cfa(x1, 0)
CFI_PAC_TOGGLE
@@ -132,6 +179,7 @@ GTM_longjmp:
 #define FEATURE_1_AND 0xc0000000
 #define FEATURE_1_BTI 1
 #define FEATURE_1_PAC 2
+#define FEATURE_1_GCS 4
 
 /* Supported features based on the code generation options.  */
 #if defined(__ARM_FEATURE_BTI_DEFAULT)
@@ -146,6 +194,12 @@ GTM_longjmp:
 # define PAC_FLAG 0
 #endif
 
+#if __ARM_FEATURE_GCS_DEFAULT
+# define GCS_FLAG FEATURE_1_GCS
+#else
+# define GCS_FLAG 0
+#endif
+
 /* Add a NT_GNU_PROPERTY_TYPE_0 note.  */
 #define GNU_PROPERTY(type, value)  \
   .section .note.gnu.property, "a";\
@@ -163,7 +217,7 @@ GTM_longjmp:
 .section .note.GNU-stack, "", %progbits
 
 /* Add GNU property note if built with branch protection.  */
-# if (BTI_FLAG|PAC_FLAG) != 0
-GNU_PROPERTY (FEATURE_1_AND, BTI_FLAG|PAC_FLAG)
+# if (BTI_FLAG|PAC_FLAG|GCS_FLAG) != 0
+GNU_PROPERTY (FEATURE_1_AND, BTI_FLAG|PAC_FLAG|GCS_FLAG)
 # endif
 #endif
diff --git a/libitm/config/aarch64/target.h b/libitm/config/aarch64/target.h
index 3d99197bfab..a1f39b4bf7a 100644
--- a/libitm/config/aarch64/target.h
+++ b/libitm/config/aarch64/target.h
@@ -30,6 +30,7 @@ typedef struct gtm_jmpbuf
   unsigned long long pc;   /* x30 */
   unsigned long long gr[10];   /* x19-x28 */
   unsigned long long vr[8];/* d8-d15 */
+  void *gcs;   /* GCSPR_EL0 */
   void *cfa;
 } gtm_jmpbuf;
 
-- 
2.39.5



[PATCH 12/22] aarch64: Add test for GCS ACLE defs

2024-10-23 Thread Yury Khrustalev
From: Szabolcs Nagy 

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pragma_cpp_predefs_1.c: GCS test.
---
 .../gcc.target/aarch64/pragma_cpp_predefs_1.c | 30 +++
 1 file changed, 30 insertions(+)

diff --git a/gcc/testsuite/gcc.target/aarch64/pragma_cpp_predefs_1.c b/gcc/testsuite/gcc.target/aarch64/pragma_cpp_predefs_1.c
index 307fa3d67da..6122cd55d66 100644
--- a/gcc/testsuite/gcc.target/aarch64/pragma_cpp_predefs_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/pragma_cpp_predefs_1.c
@@ -268,6 +268,36 @@
 #error "__ARM_FEATURE_RCPC is not defined but should be!"
 #endif
 
+#pragma GCC target ("arch=armv8.8-a+gcs")
+#ifndef __ARM_FEATURE_GCS
+#error "__ARM_FEATURE_GCS is not defined but should be!"
+#endif
+
+#pragma GCC target ("arch=armv8.8-a+nogcs")
+#ifdef __ARM_FEATURE_GCS
+#error "__ARM_FEATURE_GCS is defined but should not be!"
+#endif
+
+#pragma GCC target ("arch=armv8.8-a")
+#ifdef __ARM_FEATURE_GCS
+#error "__ARM_FEATURE_GCS is defined but should not be!"
+#endif
+
+#pragma GCC target ("branch-protection=gcs")
+#ifndef __ARM_FEATURE_GCS_DEFAULT
+#error "__ARM_FEATURE_GCS_DEFAULT is not defined but should be!"
+#endif
+
+#pragma GCC target ("branch-protection=none")
+#ifdef __ARM_FEATURE_GCS_DEFAULT
+#error "__ARM_FEATURE_GCS_DEFAULT is defined but should not be!"
+#endif
+
+#pragma GCC target ("branch-protection=standard")
+#ifndef __ARM_FEATURE_GCS_DEFAULT
+#error "__ARM_FEATURE_GCS_DEFAULT is not defined but should be!"
+#endif
+
 int
 foo (int a)
 {
-- 
2.39.5



[PATCH 00/22] aarch64: Add support for Guarded Control Stack extension

2024-10-23 Thread Yury Khrustalev
This patch series adds support for the Guarded Control Stack extension [1].

GCS marking for binaries is specified in [2].

Regression tested on AArch64 and no regressions have been found.

Is this OK for trunk?

Sources and branches:
 - binutils-gdb: sourceware.org/git/binutils-gdb.git users/ARM/gcs
 - gcc: this patch series, or
   gcc.gnu.org/git/gcc.git vendors/ARM/gcs-v3
   see https://gcc.gnu.org/gitwrite.html#vendor for setup details
 - glibc: sourceware.org/git/glibc.git arm/gcs-v2
 - kernel: git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/gcs

Cross-building the toolchain for target aarch64-none-linux-gnu:
 - build and install binutils-gdb
 - build and install GCC stage 1
 - install kernel headers
 - install glibc headers
 - build and install GCC stage 2 configuring with 
--enable-standard-branch-protection
 - build and install glibc
 - build and install GCC stage 3 along with target libraries configuring with 
--enable-standard-branch-protection

FVP model provided by the Shrinkwrap tool [3] can be used for testing.

Run tests with environment var

  GLIBC_TUNABLES=glibc.cpu.aarch64_gcs=1:glibc.cpu.aarch64_gcs_policy=2

See details about Glibc tunables in corresponding Glibc patch [4].

Corresponding binutils patch [5].

[1] https://developer.arm.com/documentation/ddi0487/ka/ (chapter D11)
[2] https://github.com/ARM-software/abi-aa/blob/main/sysvabi64/sysvabi64.rst
[3] https://git.gitlab.arm.com/tooling/shrinkwrap.git
[4] 
https://inbox.sourceware.org/libc-alpha/20241023083920.466015-1-yury.khrusta...@arm.com/
[5] 
https://inbox.sourceware.org/binutils/20241014101743.346-1-yury.khrusta...@arm.com/

---

Matthieu Longo (1):
  aarch64: Fix tests incompatible with GCS

Richard Ball (1):
  aarch64: Add tests and docs for indirect_return attribute

Szabolcs Nagy (19):
  aarch64: Add -mbranch-protection=gcs option
  aarch64: Add branch-protection target pragma tests
  aarch64: Add support for chkfeat insn
  aarch64: Add __builtin_aarch64_chkfeat
  aarch64: Add __builtin_aarch64_chkfeat tests
  aarch64: Add GCS instructions
  aarch64: Add GCS builtins
  aarch64: Add __builtin_aarch64_gcs* tests
  aarch64: Add GCS support for nonlocal stack save
  aarch64: Add non-local goto and jump tests for GCS
  aarch64: Add ACLE feature macros for GCS
  aarch64: Add test for GCS ACLE defs
  aarch64: Add target pragma tests for gcs
  aarch64: Add GCS support to the unwinder
  aarch64: Emit GNU property NOTE for GCS
  aarch64: libgcc: add GCS marking to asm
  aarch64: libatomic: add GCS marking to asm
  aarch64: libitm: Add GCS support
  aarch64: Introduce indirect_return attribute

Yury Khrustalev (1):
  aarch64: Fix nonlocal goto tests incompatible with GCS

 gcc/config/aarch64/aarch64-builtins.cc|  88 
 gcc/config/aarch64/aarch64-c.cc   |   3 +
 gcc/config/aarch64/aarch64-protos.h   |   2 +
 gcc/config/aarch64/aarch64.cc |  40 ++
 gcc/config/aarch64/aarch64.h  |   7 +
 gcc/config/aarch64/aarch64.md | 126 ++
 gcc/config/aarch64/aarch64.opt|   3 +
 gcc/config/arm/aarch-bti-insert.cc|  36 -
 gcc/configure |   2 +-
 gcc/configure.ac  |   6 +-
 gcc/doc/extend.texi   |   5 +
 gcc/doc/invoke.texi   |   5 +-
 gcc/testsuite/g++.target/aarch64/pr94515-1.C  |   6 +-
 .../return_address_sign_ab_exception.C|  19 ++-
 gcc/testsuite/gcc.target/aarch64/chkfeat-1.c  |  75 +++
 gcc/testsuite/gcc.target/aarch64/chkfeat-2.c  |  15 +++
 gcc/testsuite/gcc.target/aarch64/eh_return.c  |  13 ++
 .../gcc.target/aarch64/gcs-nonlocal-1.c   |  25 
 .../gcc.target/aarch64/gcs-nonlocal-2.c   |  21 +++
 .../gcc.target/aarch64/gcs-nonlocal-3.c   |  33 +
 gcc/testsuite/gcc.target/aarch64/gcspopm-1.c  |  69 ++
 gcc/testsuite/gcc.target/aarch64/gcspr-1.c|  31 +
 gcc/testsuite/gcc.target/aarch64/gcsss-1.c|  49 +++
 .../gcc.target/aarch64/indirect_return.c  |  25 
 gcc/testsuite/gcc.target/aarch64/pr104689.c   |   3 +-
 .../gcc.target/aarch64/pragma_cpp_predefs_1.c |  30 +
 .../gcc.target/aarch64/pragma_cpp_predefs_4.c |  85 
 .../gcc.target/aarch64/sme/nonlocal_goto_4.c  |   2 +-
 .../gcc.target/aarch64/sme/nonlocal_goto_5.c  |   2 +-
 .../gcc.target/aarch64/sme/nonlocal_goto_6.c  |   2 +-
 libatomic/config/linux/aarch64/atomic_16.S|  11 +-
 libgcc/config/aarch64/aarch64-asm.h   |  16 ++-
 libgcc/config/aarch64/aarch64-unwind.h|  59 +++-
 libitm/config/aarch64/sjlj.S  |  60 -
 libitm/config/aarch64/target.h|   1 +
 35 files changed, 944 insertions(+), 31 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/chkfeat-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/chkfeat-2.c
 create mode 100644 gcc/testsuite/gcc.target/aarc

[PATCH 19/22] aarch64: Introduce indirect_return attribute

2024-10-23 Thread Yury Khrustalev
From: Szabolcs Nagy 

Tail calls of indirect_return functions from non-indirect_return
functions are disallowed even if BTI is disabled, since the call
site may have BTI enabled.

Following x86, a mismatching attribute on function pointers is not
a type error, even though this can lead to bugs.

Needed for swapcontext within the same function when GCS is enabled.
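
As an illustration (a sketch only; my_swapcontext is a hypothetical
declaration, not part of this patch), the attribute is used like this:

  #include <ucontext.h>

  /* May return via an indirect branch, so the call site needs BTI J.  */
  __attribute__ ((indirect_return))
  int my_swapcontext (ucontext_t *oucp, const ucontext_t *ucp);

  void f (ucontext_t *a, ucontext_t *b)
  {
    my_swapcontext (a, b);  /* BTI J is emitted after this call when
                               BTI is enabled; a tail call here would
                               be disallowed.  */
  }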

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_gnu_attributes): Add
indirect_return.
(aarch64_function_ok_for_sibcall): Disallow tail calls if caller
is non-indirect_return but callee is indirect_return.
(aarch64_comp_type_attributes): Check indirect_return attribute.
* config/arm/aarch-bti-insert.cc (call_needs_bti_j): New.
(rest_of_insert_bti): Use call_needs_bti_j.
---
 gcc/config/aarch64/aarch64.cc  | 11 +
 gcc/config/arm/aarch-bti-insert.cc | 36 ++
 2 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index a89a30113b9..9bfc9a1dbba 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -853,6 +853,7 @@ static const attribute_spec aarch64_gnu_attributes[] =
affects_type_identity, handler, exclude } */
   { "aarch64_vector_pcs", 0, 0, false, true,  true,  true,
  handle_aarch64_vector_pcs_attribute, NULL },
+  { "indirect_return",0, 0, false, true, true, false, NULL, NULL },
   { "arm_sve_vector_bits", 1, 1, false, true,  false, true,
  aarch64_sve::handle_arm_sve_vector_bits_attribute,
  NULL },
@@ -6429,6 +6430,14 @@ aarch64_function_ok_for_sibcall (tree, tree exp)
 if (bool (aarch64_cfun_shared_flags (state))
!= bool (aarch64_fntype_shared_flags (fntype, state)))
   return false;
+
+  /* BTI J is needed where indirect_return functions may return
+ if bti is enabled there.  */
+  if (lookup_attribute ("indirect_return", TYPE_ATTRIBUTES (fntype))
+  && !lookup_attribute ("indirect_return",
+			TYPE_ATTRIBUTES (TREE_TYPE (cfun->decl))))
+return false;
+
   return true;
 }
 
@@ -29118,6 +29127,8 @@ aarch64_comp_type_attributes (const_tree type1, const_tree type2)
 
   if (!check_attr ("gnu", "aarch64_vector_pcs"))
 return 0;
+  if (!check_attr ("gnu", "indirect_return"))
+return 0;
   if (!check_attr ("gnu", "Advanced SIMD type"))
 return 0;
   if (!check_attr ("gnu", "SVE type"))
diff --git a/gcc/config/arm/aarch-bti-insert.cc b/gcc/config/arm/aarch-bti-insert.cc
index 14d36971cd4..403afff9120 100644
--- a/gcc/config/arm/aarch-bti-insert.cc
+++ b/gcc/config/arm/aarch-bti-insert.cc
@@ -92,6 +92,35 @@ const pass_data pass_data_insert_bti =
   0, /* todo_flags_finish.  */
 };
 
+/* Decide if BTI J is needed after a call instruction.  */
+static bool
+call_needs_bti_j (rtx_insn *insn)
+{
+  /* Call returns twice, one of which may be indirect.  */
+  if (find_reg_note (insn, REG_SETJMP, NULL))
+return true;
+
+  /* Tail call does not return.  */
+  if (SIBLING_CALL_P (insn))
+return false;
+
+  /* Check if the function is marked to return indirectly.  */
+  rtx call = get_call_rtx_from (insn);
+  rtx fnaddr = XEXP (call, 0);
+  tree fndecl = NULL_TREE;
+  if (GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF)
+fndecl = SYMBOL_REF_DECL (XEXP (fnaddr, 0));
+  if (fndecl == NULL_TREE)
+fndecl = MEM_EXPR (fnaddr);
+  if (!fndecl)
+return false;
+  if (TREE_CODE (TREE_TYPE (fndecl)) != FUNCTION_TYPE
+  && TREE_CODE (TREE_TYPE (fndecl)) != METHOD_TYPE)
+return false;
+  tree fntype = TREE_TYPE (fndecl);
+  return lookup_attribute ("indirect_return", TYPE_ATTRIBUTES (fntype));
+}
+
 /* Insert the BTI instruction.  */
 /* This is implemented as a late RTL pass that runs before branch
shortening and does the following.  */
@@ -147,10 +176,9 @@ rest_of_insert_bti (void)
}
}
 
- /* Also look for calls to setjmp () which would be marked with
-REG_SETJMP note and put a BTI J after.  This is where longjump ()
-will return.  */
- if (CALL_P (insn) && (find_reg_note (insn, REG_SETJMP, NULL)))
+ /* Also look for calls that may return indirectly, such as setjmp,
+and put a BTI J after them.  */
+ if (CALL_P (insn) && call_needs_bti_j (insn))
{
  bti_insn = aarch_gen_bti_j ();
  emit_insn_after (bti_insn, insn);
-- 
2.39.5



[PATCH 14/22] aarch64: Add GCS support to the unwinder

2024-10-23 Thread Yury Khrustalev
From: Szabolcs Nagy 

Follow the current Linux ABI, which uses a single signal entry token
and a shadow stack shared between a thread and its alternate signal
stack. This could be kept behind an __ARM_FEATURE_GCS_DEFAULT ifdef
(only doing anything special with GCS-compatible codegen), but there
is a runtime check anyway.

Change affected tests to be compatible with -mbranch-protection=standard.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/pr94515-1.C (f1_no_pac_ret): Update.
(main): Update.
Co-authored-by: Matthieu Longo 

* gcc.target/aarch64/pr104689.c (unwind): Update.
Co-authored-by: Matthieu Longo 

libgcc/ChangeLog:

* config/aarch64/aarch64-unwind.h (_Unwind_Frames_Extra): Update.
(_Unwind_Frames_Increment): Define.

Co-authored-by: Matthieu Longo 
---
 gcc/testsuite/g++.target/aarch64/pr94515-1.C |  6 +-
 gcc/testsuite/gcc.target/aarch64/pr104689.c  |  3 +-
 libgcc/config/aarch64/aarch64-unwind.h   | 59 +++-
 3 files changed, 63 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/g++.target/aarch64/pr94515-1.C b/gcc/testsuite/g++.target/aarch64/pr94515-1.C
index 359039e1753..8175ea50c32 100644
--- a/gcc/testsuite/g++.target/aarch64/pr94515-1.C
+++ b/gcc/testsuite/g++.target/aarch64/pr94515-1.C
@@ -5,7 +5,7 @@
 
 volatile int zero = 0;
 
-__attribute__((noinline, target("branch-protection=none")))
+__attribute__((noinline, target("branch-protection=bti")))
 void unwind (void)
 {
   if (zero == 0)
@@ -22,7 +22,7 @@ int test (int z)
 // autiasp -> cfi_negate_ra_state: RA_signing_SP -> RA_no_signing
 return 1;
   } else {
-// 2nd cfi_negate_ra_state because the CFI directives are processed linearily.
+// 2nd cfi_negate_ra_state because the CFI directives are processed linearly.
 // At this point, the unwinder would believe that the address is not signed
 // due to the previous return. That's why the compiler has to emit second
 // cfi_negate_ra_state to mean that the return address is still signed.
@@ -33,7 +33,7 @@ int test (int z)
   }
 }
 
-__attribute__((target("branch-protection=none")))
+__attribute__((target("branch-protection=bti")))
 int main ()
 {
   try {
diff --git a/gcc/testsuite/gcc.target/aarch64/pr104689.c b/gcc/testsuite/gcc.target/aarch64/pr104689.c
index 3b7adbdfe7d..9688ecc85f9 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr104689.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr104689.c
@@ -98,6 +98,7 @@ asm(""
 "unusual_no_pac_ret:\n"
 "  .cfi_startproc\n"
 "  " SET_RA_STATE_0 "\n"
+"  bti c\n"
 "  stp x29, x30, [sp, -16]!\n"
 "  .cfi_def_cfa_offset 16\n"
 "  .cfi_offset 29, -16\n"
@@ -121,7 +122,7 @@ static void f2_pac_ret (void)
   die ();
 }
 
-__attribute__((target("branch-protection=none")))
+__attribute__((target("branch-protection=bti")))
 static void f1_no_pac_ret (void)
 {
   unusual_pac_ret (f2_pac_ret);
diff --git a/libgcc/config/aarch64/aarch64-unwind.h b/libgcc/config/aarch64/aarch64-unwind.h
index 4d36f0b26f7..cf4ec749c05 100644
--- a/libgcc/config/aarch64/aarch64-unwind.h
+++ b/libgcc/config/aarch64/aarch64-unwind.h
@@ -178,6 +178,9 @@ aarch64_demangle_return_addr (struct _Unwind_Context *context,
   return addr;
 }
 
+/* GCS enable flag for chkfeat instruction.  */
+#define CHKFEAT_GCS 1
+
 /* SME runtime function local to libgcc, streaming compatible
and preserves more registers than the base PCS requires, but
we don't rely on that here.  */
@@ -185,12 +188,66 @@ __attribute__ ((visibility ("hidden")))
 void __libgcc_arm_za_disable (void);
 
 /* Disable the SME ZA state in case an unwound frame used the ZA
-   lazy saving scheme.  */
+   lazy saving scheme. And unwind the GCS for EH.  */
 #undef _Unwind_Frames_Extra
 #define _Unwind_Frames_Extra(x)\
   do   \
 {  \
   __libgcc_arm_za_disable ();  \
+  if (__builtin_aarch64_chkfeat (CHKFEAT_GCS) == 0)\
+   {   \
+ for (_Unwind_Word n = (x); n != 0; n--)   \
+   __builtin_aarch64_gcspopm ();   \
+   }   \
+}  \
+  while (0)
+
+/* On signal entry the OS places a token on the GCS that can be used to
+   verify the integrity of the GCS pointer on signal return.  It also
+   places the signal handler return address (the restorer that calls the
+   signal return syscall) on the GCS so the handler can return.
+   Because of this token, each stack frame visited during unwinding has
+   exactly one corresponding entry on the GCS, so the frame count is
+   the number of entries that will have to be popped at EH return time.
+
+   Note: This depends on the GCS signal ABI of the OS.
+
+   When unwinding across a stack frame for each frame the corresponding
+   ent

[PATCH 22/22] aarch64: Fix nonlocal goto tests incompatible with GCS

2024-10-23 Thread Yury Khrustalev
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/gcs-nonlocal-3.c: New test.
* gcc.target/aarch64/sme/nonlocal_goto_4.c: Update.
* gcc.target/aarch64/sme/nonlocal_goto_5.c: Update.
* gcc.target/aarch64/sme/nonlocal_goto_6.c: Update.
---
 .../gcc.target/aarch64/gcs-nonlocal-3.c   | 33 +++
 .../gcc.target/aarch64/sme/nonlocal_goto_4.c  |  2 +-
 .../gcc.target/aarch64/sme/nonlocal_goto_5.c  |  2 +-
 .../gcc.target/aarch64/sme/nonlocal_goto_6.c  |  2 +-
 4 files changed, 36 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/gcs-nonlocal-3.c

diff --git a/gcc/testsuite/gcc.target/aarch64/gcs-nonlocal-3.c b/gcc/testsuite/gcc.target/aarch64/gcs-nonlocal-3.c
new file mode 100644
index 000..8511f66f66e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/gcs-nonlocal-3.c
@@ -0,0 +1,33 @@
+/* { dg-options "-O2 -fno-schedule-insns -fno-schedule-insns2 -mbranch-protection=gcs" } */
+/* { dg-final { check-function-bodies "**" "" "" { target "*-*-*" } {\.L[0-9]+\:} } } */
+
+void run(void (*)());
+
+/*
+** bar.0:
+** ...
+** hint 40 // chkfeat x16
+** tbnz w16, 0, (\.L[0-9]+)
+** ...
+** mrs x1, s3_3_c2_c5_1 // gcspr_el0
+** subs x1, x3, x1
+** bne (\.L[0-9]+)\n\1\:
+** ...
+** br  x[0-9]+\n\2\:
+** ...
+** sysl xzr, #3, c7, c7, #1 // gcspopm
+** ...
+** b   \1
+*/
+int
+foo (int *ptr)
+{
+  __label__ failure;
+
+  void bar () { *ptr += 1; goto failure; }
+  run (bar);
+  return 1;
+
+failure:
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_4.c b/gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_4.c
index 0446076286b..aed04bb495c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_4.c
+++ b/gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_4.c
@@ -1,4 +1,4 @@
-/* { dg-options "-O2 -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-options "-O2 -fno-schedule-insns -fno-schedule-insns2 -mbranch-protection=none" } */
 /* { dg-final { check-function-bodies "**" "" } } */
 
 void run(void (*)());
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_5.c b/gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_5.c
index 4246aec8b2f..e4a31c5c600 100644
--- a/gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_5.c
+++ b/gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_5.c
@@ -1,4 +1,4 @@
-/* { dg-options "-O2 -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-options "-O2 -fno-schedule-insns -fno-schedule-insns2 -mbranch-protection=none" } */
 /* { dg-final { check-function-bodies "**" "" } } */
 
 void run(void (*)() __arm_streaming);
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_6.c b/gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_6.c
index 151e2f22dc7..38f6c139f6d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_6.c
+++ b/gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_6.c
@@ -1,4 +1,4 @@
-/* { dg-options "-O2 -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-options "-O2 -fno-schedule-insns -fno-schedule-insns2 -mbranch-protection=none" } */
 /* { dg-final { check-function-bodies "**" "" } } */
 
 void run(void (*)() __arm_streaming_compatible);
-- 
2.39.5



[PATCH 02/22] aarch64: Add branch-protection target pragma tests

2024-10-23 Thread Yury Khrustalev
From: Szabolcs Nagy 

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pragma_cpp_predefs_4.c: Add branch-protection
tests.
---
 .../gcc.target/aarch64/pragma_cpp_predefs_4.c | 50 +++
 1 file changed, 50 insertions(+)

diff --git a/gcc/testsuite/gcc.target/aarch64/pragma_cpp_predefs_4.c b/gcc/testsuite/gcc.target/aarch64/pragma_cpp_predefs_4.c
index 23ebe5e4f50..8e707630774 100644
--- a/gcc/testsuite/gcc.target/aarch64/pragma_cpp_predefs_4.c
+++ b/gcc/testsuite/gcc.target/aarch64/pragma_cpp_predefs_4.c
@@ -83,3 +83,53 @@
 #ifndef __ARM_FEATURE_SME_F64F64
 #error Foo
 #endif
+
+#pragma GCC target "branch-protection=standard"
+#ifndef __ARM_FEATURE_BTI_DEFAULT
+#error Foo
+#endif
+#if __ARM_FEATURE_PAC_DEFAULT != 1
+#error Foo
+#endif
+
+#pragma GCC target ("branch-protection=none")
+#ifdef __ARM_FEATURE_BTI_DEFAULT
+#error Foo
+#endif
+#ifdef __ARM_FEATURE_PAC_DEFAULT
+#error Foo
+#endif
+
+#pragma GCC push_options
+#pragma GCC target "branch-protection=bti+pac-ret"
+#ifndef __ARM_FEATURE_BTI_DEFAULT
+#error Foo
+#endif
+#pragma GCC pop_options
+#ifdef __ARM_FEATURE_BTI_DEFAULT
+#error Foo
+#endif
+
+#pragma GCC target "branch-protection=bti"
+#ifndef __ARM_FEATURE_BTI_DEFAULT
+#error Foo
+#endif
+#ifdef __ARM_FEATURE_PAC_DEFAULT
+#error Foo
+#endif
+
+#pragma GCC target "branch-protection=pac-ret"
+#ifdef __ARM_FEATURE_BTI_DEFAULT
+#error Foo
+#endif
+#if __ARM_FEATURE_PAC_DEFAULT != 1
+#error Foo
+#endif
+
+#pragma GCC target "branch-protection=pac-ret+leaf+b-key"
+#ifdef __ARM_FEATURE_BTI_DEFAULT
+#error Foo
+#endif
+#if __ARM_FEATURE_PAC_DEFAULT != 6
+#error Foo
+#endif
-- 
2.39.5



[PATCH 03/22] aarch64: Add support for chkfeat insn

2024-10-23 Thread Yury Khrustalev
From: Szabolcs Nagy 

This is a hint space instruction to check for enabled HW features and
update the x16 register accordingly.

Use unspec_volatile to prevent reordering it around calls since calls
can enable or disable HW features.

gcc/ChangeLog:

* config/aarch64/aarch64.md (aarch64_chkfeat): New.
---
 gcc/config/aarch64/aarch64.md | 9 +
 1 file changed, 9 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c54b29cd64b..43bed0ce10f 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -381,6 +381,7 @@ (define_c_enum "unspecv" [
 UNSPECV_BTI_C  ; Represent BTI c.
 UNSPECV_BTI_J  ; Represent BTI j.
 UNSPECV_BTI_JC ; Represent BTI jc.
+UNSPECV_CHKFEAT    ; Represent CHKFEAT X16.
 UNSPECV_TSTART ; Represent transaction start.
 UNSPECV_TCOMMIT; Represent transaction commit.
 UNSPECV_TCANCEL; Represent transaction cancel.
@@ -8312,6 +8313,14 @@ (define_insn "aarch64_restore_nzcv"
   "msr\tnzcv, %0"
 )
 
+;; CHKFEAT instruction
+(define_insn "aarch64_chkfeat"
+  [(set (reg:DI R16_REGNUM)
+(unspec_volatile:DI [(reg:DI R16_REGNUM)] UNSPECV_CHKFEAT))]
+  ""
+  "hint\\t40 // chkfeat x16"
+)
+
 ;; AdvSIMD Stuff
 (include "aarch64-simd.md")
 
-- 
2.39.5



[PATCH 05/22] aarch64: Add __builtin_aarch64_chkfeat tests

2024-10-23 Thread Yury Khrustalev
From: Szabolcs Nagy 

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/chkfeat-1.c: New test.
* gcc.target/aarch64/chkfeat-2.c: New test.
---
 gcc/testsuite/gcc.target/aarch64/chkfeat-1.c | 75 
 gcc/testsuite/gcc.target/aarch64/chkfeat-2.c | 15 
 2 files changed, 90 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/chkfeat-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/chkfeat-2.c

diff --git a/gcc/testsuite/gcc.target/aarch64/chkfeat-1.c b/gcc/testsuite/gcc.target/aarch64/chkfeat-1.c
new file mode 100644
index 000..2fae81e740f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/chkfeat-1.c
@@ -0,0 +1,75 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mbranch-protection=none" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+/*
+**foo1:
+** mov x16, 1
+** hint40 // chkfeat x16
+** mov x0, x16
+** ret
+*/
+unsigned long long
+foo1 (void)
+{
+  return __builtin_aarch64_chkfeat (1);
+}
+
+/*
+**foo2:
+** mov x16, 1
+** movkx16, 0x5678, lsl 32
+** movkx16, 0x1234, lsl 48
+** hint40 // chkfeat x16
+** mov x0, x16
+** ret
+*/
+unsigned long long
+foo2 (void)
+{
+  return __builtin_aarch64_chkfeat (0x1234567800000001);
+}
+
+/*
+**foo3:
+** mov x16, x0
+** hint40 // chkfeat x16
+** mov x0, x16
+** ret
+*/
+unsigned long long
+foo3 (unsigned long long x)
+{
+  return __builtin_aarch64_chkfeat (x);
+}
+
+/*
+**foo4:
+** ldr x16, \[x0\]
+** hint40 // chkfeat x16
+** str x16, \[x0\]
+** ret
+*/
+void
+foo4 (unsigned long long *p)
+{
+  *p = __builtin_aarch64_chkfeat (*p);
+}
+
+/*
+**foo5:
+** mov x16, 1
+** hint40 // chkfeat x16
+** cmp x16, 0
+**(
+** cselw0, w1, w0, eq
+**|
+** cselw0, w0, w1, ne
+**)
+** ret
+*/
+int
+foo5 (int x, int y)
+{
+  return __builtin_aarch64_chkfeat (1) ? x : y;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/chkfeat-2.c b/gcc/testsuite/gcc.target/aarch64/chkfeat-2.c
new file mode 100644
index 000..682524e244f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/chkfeat-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-times {hint\t40 // chkfeat x16} 2 } } */
+
+void bar (void);
+
+/* Extern call may change enabled HW features.  */
+unsigned long long
+foo (void)
+{
+  unsigned long long a = __builtin_aarch64_chkfeat (1);
+  bar ();
+  unsigned long long b = __builtin_aarch64_chkfeat (1);
+  return a + b;
+}
-- 
2.39.5



[PATCH 16/22] aarch64: libgcc: add GCS marking to asm

2024-10-23 Thread Yury Khrustalev
From: Szabolcs Nagy 

libgcc/ChangeLog:

* config/aarch64/aarch64-asm.h (FEATURE_1_GCS): Define.
(GCS_FLAG): Define if GCS is enabled.
(GNU_PROPERTY): Add GCS_FLAG.
---
 libgcc/config/aarch64/aarch64-asm.h | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/libgcc/config/aarch64/aarch64-asm.h b/libgcc/config/aarch64/aarch64-asm.h
index d8ab91d52f1..f7bd225f7a4 100644
--- a/libgcc/config/aarch64/aarch64-asm.h
+++ b/libgcc/config/aarch64/aarch64-asm.h
@@ -22,6 +22,9 @@
see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 <http://www.gnu.org/licenses/>.  */
 
+#ifndef AARCH64_ASM_H
+#define AARCH64_ASM_H
+
 #include "auto-target.h"
 
 #define L(label) .L ## label
@@ -38,6 +41,7 @@
 #define FEATURE_1_AND 0xc0000000
 #define FEATURE_1_BTI 1
 #define FEATURE_1_PAC 2
+#define FEATURE_1_GCS 4
 
 /* Supported features based on the code generation options.  */
 #if defined(__ARM_FEATURE_BTI_DEFAULT)
@@ -58,6 +62,12 @@
 # define AUTIASP
 #endif
 
+#if __ARM_FEATURE_GCS_DEFAULT
+# define GCS_FLAG FEATURE_1_GCS
+#else
+# define GCS_FLAG 0
+#endif
+
 #ifdef __ELF__
 #define HIDDEN(name) .hidden name
 #define SYMBOL_SIZE(name) .size name, .-name
@@ -88,8 +98,8 @@
 .previous
 
 /* Add GNU property note if built with branch protection.  */
-# if (BTI_FLAG|PAC_FLAG) != 0
-GNU_PROPERTY (FEATURE_1_AND, BTI_FLAG|PAC_FLAG)
+# if (BTI_FLAG|PAC_FLAG|GCS_FLAG) != 0
+GNU_PROPERTY (FEATURE_1_AND, BTI_FLAG|PAC_FLAG|GCS_FLAG)
 # endif
 #endif
 
@@ -106,3 +116,5 @@ GNU_PROPERTY (FEATURE_1_AND, BTI_FLAG|PAC_FLAG)
 #define END(name) \
   .cfi_endproc;\
   SYMBOL_SIZE(name)
+
+#endif
-- 
2.39.5



[PATCH 20/22] aarch64: Add tests and docs for indirect_return attribute

2024-10-23 Thread Yury Khrustalev
From: Richard Ball 

This patch adds a new testcase and docs
for the indirect_return attribute.

gcc/ChangeLog:

* doc/extend.texi: Add AArch64 docs for indirect_return
attribute.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/indirect_return.c: New test.
Co-authored-by: Yury Khrustalev 
---
 gcc/doc/extend.texi   |  5 
 .../gcc.target/aarch64/indirect_return.c  | 25 +++
 2 files changed, 30 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/indirect_return.c

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 42bd567119d..45e2b3ec569 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -4760,6 +4760,11 @@ Enable or disable calls to out-of-line helpers to implement atomic operations.
 This corresponds to the behavior of the command-line options
 @option{-moutline-atomics} and @option{-mno-outline-atomics}.
 
+@cindex @code{indirect_return} function attribute, AArch64
+@item indirect_return
+Used to inform the compiler that a function may return via
+an indirect return.  Adds a BTI J instruction under
+@option{-mbranch-protection=bti}.
+
 @end table
 
 The above target attributes can be specified as follows:
diff --git a/gcc/testsuite/gcc.target/aarch64/indirect_return.c b/gcc/testsuite/gcc.target/aarch64/indirect_return.c
new file mode 100644
index 000..f1ef56d5557
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/indirect_return.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-mbranch-protection=bti" } */
+
+int __attribute((indirect_return))
+foo (int a)
+{
+  return a;
+}
+
+/*
+**func1:
+** hint 34 // bti c
+** ...
+** bl  foo
+** hint 36 // bti j
+** ...
+** ret
+*/
+int
+func1 (int a, int b)
+{
+  return foo (a + b);
+}
+
+/* { dg-final { check-function-bodies "**" "" "" } } */
-- 
2.39.5



[PATCH 07/22] aarch64: Add GCS builtins

2024-10-23 Thread Yury Khrustalev
From: Szabolcs Nagy 

Add new builtins for GCS:

  void *__builtin_aarch64_gcspr (void)
  uint64_t __builtin_aarch64_gcspopm (void)
  void *__builtin_aarch64_gcsss (void *)

The builtins are always enabled, but should be used behind runtime
checks in case the target does not support GCS. They are thin
wrappers around the corresponding instructions.

The GCS pointer is modelled with the void * type (normal stores do not
work on GCS memory, but it is writable via the gcsss operation, or via
GCSSTR if that is enabled, so it is not const), and an entry on the GCS
is modelled with uint64_t (since it has a fixed size and can be a token
that is not a pointer).
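
A minimal usage sketch (illustrative only; runtime-checked as described
above, with chkfeat bit 0 denoting GCS):

  #include <stdint.h>

  void
  pop_one_gcs_entry (void)
  {
    if (__builtin_aarch64_chkfeat (1) != 0)
      return;  /* GCS not enabled at runtime.  */
    void *gcsp = __builtin_aarch64_gcspr ();        /* current GCS pointer  */
    uint64_t entry = __builtin_aarch64_gcspopm ();  /* pop one entry  */
    (void) gcsp;
    (void) entry;
  }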

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc (enum aarch64_builtins): Add
AARCH64_BUILTIN_GCSPR, AARCH64_BUILTIN_GCSPOPM, AARCH64_BUILTIN_GCSSS.
(aarch64_init_gcs_builtins): New.
(aarch64_general_init_builtins): Call aarch64_init_gcs_builtins.
(aarch64_expand_gcs_builtin): New.
(aarch64_general_expand_builtin): Call aarch64_expand_gcs_builtin.
---
 gcc/config/aarch64/aarch64-builtins.cc | 70 ++
 1 file changed, 70 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
index 765f2091504..a42a2b9e67f 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -877,6 +877,9 @@ enum aarch64_builtins
   AARCH64_PLIX,
   /* Armv8.9-A / Armv9.4-A builtins.  */
   AARCH64_BUILTIN_CHKFEAT,
+  AARCH64_BUILTIN_GCSPR,
+  AARCH64_BUILTIN_GCSPOPM,
+  AARCH64_BUILTIN_GCSSS,
   AARCH64_BUILTIN_MAX
 };
 
@@ -2241,6 +2244,29 @@ aarch64_init_fpsr_fpcr_builtins (void)
   AARCH64_BUILTIN_SET_FPSR64);
 }
 
+/* Add builtins for Guarded Control Stack instructions.  */
+
+static void
+aarch64_init_gcs_builtins (void)
+{
+  tree ftype;
+
+  ftype = build_function_type_list (ptr_type_node, NULL);
+  aarch64_builtin_decls[AARCH64_BUILTIN_GCSPR]
+    = aarch64_general_add_builtin ("__builtin_aarch64_gcspr", ftype,
+                                   AARCH64_BUILTIN_GCSPR);
+
+  ftype = build_function_type_list (uint64_type_node, NULL);
+  aarch64_builtin_decls[AARCH64_BUILTIN_GCSPOPM]
+    = aarch64_general_add_builtin ("__builtin_aarch64_gcspopm", ftype,
+                                   AARCH64_BUILTIN_GCSPOPM);
+
+  ftype = build_function_type_list (ptr_type_node, ptr_type_node, NULL);
+  aarch64_builtin_decls[AARCH64_BUILTIN_GCSSS]
+    = aarch64_general_add_builtin ("__builtin_aarch64_gcsss", ftype,
+                                   AARCH64_BUILTIN_GCSSS);
+}
+
 /* Initialize all builtins in the AARCH64_BUILTIN_GENERAL group.  */
 
 void
@@ -2288,6 +2314,8 @@ aarch64_general_init_builtins (void)
 = aarch64_general_add_builtin ("__builtin_aarch64_chkfeat", ftype_chkfeat,
   AARCH64_BUILTIN_CHKFEAT);
 
+  aarch64_init_gcs_builtins ();
+
   if (in_lto_p)
 handle_arm_acle_h ();
 }
@@ -3367,6 +3395,43 @@ aarch64_expand_fpsr_fpcr_getter (enum insn_code icode, machine_mode mode,
   return op.value;
 }
 
+/* Expand GCS builtin EXP with code FCODE, putting the result
+   in TARGET.  If IGNORE is true the return value is ignored.  */
+
+rtx
+aarch64_expand_gcs_builtin (tree exp, rtx target, int fcode, int ignore)
+{
+  if (fcode == AARCH64_BUILTIN_GCSPR)
+{
+  expand_operand op;
+  create_output_operand (&op, target, DImode);
+  expand_insn (CODE_FOR_aarch64_load_gcspr, 1, &op);
+  return op.value;
+}
+  if (fcode == AARCH64_BUILTIN_GCSPOPM && ignore)
+{
+  expand_insn (CODE_FOR_aarch64_gcspopm_xzr, 0, 0);
+  return target;
+}
+  if (fcode == AARCH64_BUILTIN_GCSPOPM)
+{
+  expand_operand op;
+  create_output_operand (&op, target, Pmode);
+  expand_insn (CODE_FOR_aarch64_gcspopm, 1, &op);
+  return op.value;
+}
+  if (fcode == AARCH64_BUILTIN_GCSSS)
+{
+  expand_operand ops[2];
+  rtx op1 = expand_normal (CALL_EXPR_ARG (exp, 0));
+  create_output_operand (&ops[0], target, Pmode);
+  create_input_operand (&ops[1], op1, Pmode);
+  expand_insn (CODE_FOR_aarch64_gcsss, 2, ops);
+  return ops[0].value;
+}
+  gcc_unreachable ();
+}
+
 /* Expand an expression EXP that calls built-in function FCODE,
with result going to TARGET if that's convenient.  IGNORE is true
if the result of the builtin is ignored.  */
@@ -3502,6 +3567,11 @@ aarch64_general_expand_builtin (unsigned int fcode, tree exp, rtx target,
emit_move_insn (target, x16_reg);
return target;
   }
+
+case AARCH64_BUILTIN_GCSPR:
+case AARCH64_BUILTIN_GCSPOPM:
+case AARCH64_BUILTIN_GCSSS:
+  return aarch64_expand_gcs_builtin (exp, target, fcode, ignore);
 }
 
   if (fcode >= AARCH64_SIMD_BUILTIN_BASE && fcode <= AARCH64_SIMD_BUILTIN_MAX)
-- 
2.39.5



[PATCH 17/22] aarch64: libatomic: add GCS marking to asm

2024-10-23 Thread Yury Khrustalev
From: Szabolcs Nagy 

libatomic/ChangeLog:

* config/linux/aarch64/atomic_16.S (FEATURE_1_GCS): Define.
(GCS_FLAG): Define if GCS is enabled.
(GNU_PROPERTY): Add GCS_FLAG.
---
 libatomic/config/linux/aarch64/atomic_16.S | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/libatomic/config/linux/aarch64/atomic_16.S b/libatomic/config/linux/aarch64/atomic_16.S
index 5767fba5c03..685db776382 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -775,6 +775,7 @@ END_FEAT (compare_exchange_16, LSE)
 #define FEATURE_1_AND 0xc0000000
 #define FEATURE_1_BTI 1
 #define FEATURE_1_PAC 2
+#define FEATURE_1_GCS 4
 
 /* Supported features based on the code generation options.  */
 #if defined(__ARM_FEATURE_BTI_DEFAULT)
@@ -789,6 +790,12 @@ END_FEAT (compare_exchange_16, LSE)
 # define PAC_FLAG 0
 #endif
 
+#if __ARM_FEATURE_GCS_DEFAULT
+# define GCS_FLAG FEATURE_1_GCS
+#else
+# define GCS_FLAG 0
+#endif
+
 /* Add a NT_GNU_PROPERTY_TYPE_0 note.  */
 #define GNU_PROPERTY(type, value)  \
   .section .note.gnu.property, "a"; \
@@ -806,7 +813,7 @@ END_FEAT (compare_exchange_16, LSE)
 .section .note.GNU-stack, "", %progbits
 
 /* Add GNU property note if built with branch protection.  */
-# if (BTI_FLAG|PAC_FLAG) != 0
-GNU_PROPERTY (FEATURE_1_AND, BTI_FLAG|PAC_FLAG)
+# if (BTI_FLAG|PAC_FLAG|GCS_FLAG) != 0
+GNU_PROPERTY (FEATURE_1_AND, BTI_FLAG|PAC_FLAG|GCS_FLAG)
 # endif
 #endif
-- 
2.39.5



[PATCH 10/22] aarch64: Add non-local goto and jump tests for GCS

2024-10-23 Thread Yury Khrustalev
From: Szabolcs Nagy 

These are scan asm tests only, relying on existing execution tests
for runtime coverage.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/gcs-nonlocal-1.c: New test.
* gcc.target/aarch64/gcs-nonlocal-2.c: New test.
---
 .../gcc.target/aarch64/gcs-nonlocal-1.c   | 25 +++
 .../gcc.target/aarch64/gcs-nonlocal-2.c   | 21 
 2 files changed, 46 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/gcs-nonlocal-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/gcs-nonlocal-2.c

diff --git a/gcc/testsuite/gcc.target/aarch64/gcs-nonlocal-1.c b/gcc/testsuite/gcc.target/aarch64/gcs-nonlocal-1.c
new file mode 100644
index 000..821fab816f9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/gcs-nonlocal-1.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mbranch-protection=gcs" } */
+/* { dg-final { scan-assembler-times "hint\\t40 // chkfeat x16" 2 } } */
+/* { dg-final { scan-assembler-times "mrs\\tx\[0-9\]+, s3_3_c2_c5_1 // gcspr_el0" 2 } } */
+/* { dg-final { scan-assembler-times "sysl\\txzr, #3, c7, c7, #1 // gcspopm" 1 } } */
+
+int bar1 (int);
+int bar2 (int);
+
+void foo (int cmd)
+{
+  __label__ start;
+  int x = 0;
+
+  void nonlocal_goto (void)
+  {
+x++;
+goto start;
+  }
+
+start:
+  while (bar1 (x))
+if (bar2 (x))
+  nonlocal_goto ();
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/gcs-nonlocal-2.c b/gcc/testsuite/gcc.target/aarch64/gcs-nonlocal-2.c
new file mode 100644
index 000..63dbce36e1e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/gcs-nonlocal-2.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mbranch-protection=gcs" } */
+/* { dg-final { scan-assembler-times "hint\\t40 // chkfeat x16" 2 } } */
+/* { dg-final { scan-assembler-times "mrs\\tx\[0-9\]+, s3_3_c2_c5_1 // gcspr_el0" 2 } } */
+/* { dg-final { scan-assembler-times "sysl\\txzr, #3, c7, c7, #1 // gcspopm" 1 } } */
+
+void longj (void *buf)
+{
+  __builtin_longjmp (buf, 1);
+}
+
+void foo (void);
+void bar (void);
+
+void setj (void *buf)
+{
+  if (__builtin_setjmp (buf))
+foo ();
+  else
+bar ();
+}
-- 
2.39.5



Re: [PATCH 1/2] aarch64: Use standard names for saturating arithmetic

2024-10-23 Thread Richard Sandiford
Akram Ahmad  writes:
> This renames the existing {s,u}q{add,sub} instructions to use the
> standard names {s,u}s{add,sub}3 which are used by IFN_SAT_ADD and
> IFN_SAT_SUB.
>
> The NEON intrinsics for saturating arithmetic and their corresponding
> builtins are changed to use these standard names too.
>
> Using the standard names for the instructions causes 32 and 64-bit
> unsigned scalar saturating arithmetic to use the NEON instructions,
> resulting in an additional (and inefficient) FMOV to be generated when
> the original operands are in GP registers. This patch therefore also
> restores the original behaviour of using the adds/subs instructions
> in this circumstance.
>
> Additional tests are written for the scalar and Adv. SIMD cases to
> ensure that the correct instructions are used. The NEON intrinsics are
> already tested elsewhere.

Thanks for doing this.  The approach looks good.  My main question is:
are we sure that we want to use the Advanced SIMD instructions for
signed saturating SI and DI arithmetic on GPRs?  E.g. for addition,
we only saturate at the negative limit if both operands are negative,
and only saturate at the positive limit if both operands are positive.
So for 32-bit values we can use:

asr tmp, x or y, #31
eor tmp, tmp, #0x8000

to calculate the saturation value and:

adds    res, x, y
csel    res, tmp, res, vs

to calculate the full result.  That's the same number of instructions
as two fmovs for the inputs, the sqadd, and the fmov for the result,
but it should be more efficient.
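
In C terms, that sequence computes something like the sketch below
(on overflow both inputs necessarily have the same sign, so saturating
toward the sign of either input is correct):

  #include <stdint.h>

  int32_t
  ssadd32 (int32_t x, int32_t y)
  {
    int32_t res;
    if (__builtin_add_overflow (x, y, &res))
      res = x < 0 ? INT32_MIN : INT32_MAX;
    return res;
  }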

The reason for asking now, rather than treating it as a potential
future improvement, is that it would also avoid splitting the patterns
for signed and unsigned ops.  (The length of the split alternative can be
conservatively set to 16 even for the unsigned version, since nothing
should care in practice.  The split will have happened before
shorten_branches.)

> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-builtins.cc: Expand iterators.
>   * config/aarch64/aarch64-simd-builtins.def: Use standard names
>   * config/aarch64/aarch64-simd.md: Use standard names, split insn
>   definitions on signedness of operator and type of operands.
>   * config/aarch64/arm_neon.h: Use standard builtin names.
>   * config/aarch64/iterators.md: Add VSDQ_I_QI_HI iterator to
>   simplify splitting of insn for unsigned scalar arithmetic.
>
> gcc/testsuite/ChangeLog:
>
>   * 
> gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc:
>   Template file for unsigned vector saturating arithmetic tests.
>   * 
> gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c:
>   8-bit vector type tests.
>   * 
> gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c:
>   16-bit vector type tests.
>   * 
> gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_3.c:
>   32-bit vector type tests.
>   * 
> gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_4.c:
>   64-bit vector type tests.
>   * gcc.target/aarch64/saturating_arithmetic.inc: Template file
>   for scalar saturating arithmetic tests.
>   * gcc.target/aarch64/saturating_arithmetic_1.c: 8-bit tests.
>   * gcc.target/aarch64/saturating_arithmetic_2.c: 16-bit tests.
>   * gcc.target/aarch64/saturating_arithmetic_3.c: 32-bit tests.
>   * gcc.target/aarch64/saturating_arithmetic_4.c: 64-bit tests.
> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
> new file mode 100644
> index 000..63eb21e438b
> --- /dev/null
> +++ 
> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
> @@ -0,0 +1,79 @@
> +/* { dg-do assemble { target { aarch64*-*-* } } } */
> +/* { dg-options "-O2 --save-temps -ftree-vectorize" } */
> +/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
> +
> +/*
> +** uadd_lane: { xfail *-*-* }

Just curious: why does this fail?  Is it a vector costing issue?

> +**   dup\tv([0-9]+).8b, w0
> +**   uqadd\tb([0-9]+), b\1, b0
> +**   umov\tw0, v\2.b\[0]
> +**   ret
> +*/
> +/*
> +** uaddq:
> +** ...
> +**   ldr\tq([0-9]+), .*
> +**   ldr\tq([0-9]+), .*
> +**   uqadd\tv\2.16b, v\1.16b, v\2.16b

Since the operands are commutative, and since there's no restriction
on the choice of destination register, it's probably safer to use:

> +**   uqadd\tv[0-9].16b, (?:v\1.16b, v\2.16b|v\2.16b, v\1.16b)

Similarly for the other qadds.  The qsubs do of course have a fixed
order, but the destination is similarly not restricted, so should use
[0-9]+ rather than \n.

Thanks,
Richard


Re: [PATCH v3] AArch64: Fix copysign patterns

2024-10-23 Thread Richard Sandiford
Wilco Dijkstra  writes:
> The current copysign pattern has a mismatch in the predicates and
> constraints - operand[2] is a register_operand but also has an
> alternative X which allows any operand.  Since it is a floating point
> operation, having an integer alternative makes no sense.  Change the
> expander to always use vector immediates which results in better code
> and sharing of immediates between copysign and xorsign.
>
> Passes bootstrap and regress, OK for commit?
>
> gcc/Changelog:
> * config/aarch64/aarch64.md (copysign3): Widen immediate to 
> vector.
> (copysign3_insn): Use VQ_INT_EQUIV in operand 3.
> * config/aarch64/iterators.md (VQ_INT_EQUIV): New iterator.
> (vq_int_equiv): Likewise.
>
> testsuite/Changelog:
> * gcc.target/aarch64/copysign_3.c: New test.
> * gcc.target/aarch64/copysign_4.c: New test.
> * gcc.target/aarch64/fneg-abs_2.c: Fixup test.
> * gcc.target/aarch64/sve/fneg-abs_2.c: Likewise.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 
> c54b29cd64b9e0dc6c6d12735049386ccedc5408..71f9743df671b70e6a2d189f49de58995398abee
>  100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -7218,20 +7218,11 @@ (define_expand "lrint2"
>  }
>  )
>  
> -;; For copysign (x, y), we want to generate:
> +;; For copysignf (x, y), we want to generate:
>  ;;
> -;;   LDR d2, #(1 << 63)
> -;;   BSL v2.8b, [y], [x]
> +;;   moviv31.4s, 0x80, lsl 24
> +;;   bit v0.16b, v1.16b, v31.16b
>  ;;
> -;; or another, equivalent, sequence using one of BSL/BIT/BIF.  Because
> -;; we expect these operations to nearly always operate on
> -;; floating-point values, we do not want the operation to be
> -;; simplified into a bit-field insert operation that operates on the
> -;; integer side, since typically that would involve three inter-bank
> -;; register copies.  As we do not expect copysign to be followed by
> -;; other logical operations on the result, it seems preferable to keep
> -;; this as an unspec operation, rather than exposing the underlying
> -;; logic to the compiler.

I think the comment starting "Because we expect..." is worth keeping.
It explains why we use an unspec for something that could be expressed
in generic rtl.

OK with that change, thanks.

Richard

>  (define_expand "copysign<mode>3"
>    [(match_operand:GPF 0 "register_operand")
> @@ -7239,32 +7230,25 @@ (define_expand "copysign<mode>3"
> (match_operand:GPF 2 "nonmemory_operand")]
>"TARGET_SIMD"
>  {
> -  rtx signbit_const = GEN_INT (HOST_WIDE_INT_M1U
> -			       << (GET_MODE_BITSIZE (<MODE>mode) - 1));
> -  /* copysign (x, -1) should instead be expanded as orr with the sign
> -     bit.  */
> +  rtx sign = GEN_INT (HOST_WIDE_INT_M1U << (GET_MODE_BITSIZE (<MODE>mode) - 1));
> +  rtx v_bitmask = gen_const_vec_duplicate (<VQ_INT_EQUIV>mode, sign);
> +  v_bitmask = force_reg (<VQ_INT_EQUIV>mode, v_bitmask);
> +
> +  /* copysign (x, -1) should instead be expanded as orr with the signbit.  */
>rtx op2_elt = unwrap_const_vec_duplicate (operands[2]);
> +
>if (GET_CODE (op2_elt) == CONST_DOUBLE
>&& real_isneg (CONST_DOUBLE_REAL_VALUE (op2_elt)))
>  {
> -  rtx v_bitmask
> -	= force_reg (V2<V_INT_EQUIV>mode,
> -		     gen_const_vec_duplicate (V2<V_INT_EQUIV>mode,
> -					      signbit_const));
> -
> -  emit_insn (gen_iorv2<v_int_equiv>3 (
> -	lowpart_subreg (V2<V_INT_EQUIV>mode, operands[0], <MODE>mode),
> -	lowpart_subreg (V2<V_INT_EQUIV>mode, operands[1], <MODE>mode),
> +  emit_insn (gen_ior<vq_int_equiv>3 (
> +	lowpart_subreg (<VQ_INT_EQUIV>mode, operands[0], <MODE>mode),
> +	lowpart_subreg (<VQ_INT_EQUIV>mode, operands[1], <MODE>mode),
>   v_bitmask));
>DONE;
>  }
> -
> -  machine_mode int_mode = <V_INT_EQUIV>mode;
> -  rtx bitmask = gen_reg_rtx (int_mode);
> -  emit_move_insn (bitmask, signbit_const);
>    operands[2] = force_reg (<MODE>mode, operands[2]);
>    emit_insn (gen_copysign<mode>3_insn (operands[0], operands[1], operands[2],
> -				       bitmask));
> +				       v_bitmask));
>DONE;
>  }
>  )
> @@ -7273,23 +7257,21 @@ (define_insn "copysign<mode>3_insn"
>    [(set (match_operand:GPF 0 "register_operand")
> 	 (unspec:GPF [(match_operand:GPF 1 "register_operand")
> 		      (match_operand:GPF 2 "register_operand")
> -		      (match_operand:<V_INT_EQUIV> 3 "register_operand")]
> +		      (match_operand:<VQ_INT_EQUIV> 3 "register_operand")]
>UNSPEC_COPYSIGN))]
>"TARGET_SIMD"
>    {@ [ cons: =0 , 1 , 2 , 3 ; attrs: type ]
>       [ w        , w , w , 0 ; neon_bsl<q> ] bsl\t%0.<Vbtype>, %2.<Vbtype>, %1.<Vbtype>
>       [ w        , 0 , w , w ; neon_bsl<q> ] bit\t%0.<Vbtype>, %2.<Vbtype>, %3.<Vbtype>
>       [ w        , w , 0 , w ; neon_bsl<q> ] bif\t%0.<Vbtype>, %1.<Vbtype>, %3.<Vbtype>
> -     [ r        , r , 0 , X ; bfm         ] bfxil\t%0, %1, #0, <sizem1>
>}
>  )
>  
> -
> -;; For xorsign (x, y), we want to generate:
> +;; For xorsignf (x, y), we want to generate:
>  ;;
> -;; LDR   d2, #1<<63
> -;; AND   v3.8B, v1.8B, v2.8B
> -;; EOR   v0.8B, 

[PATCH] libstdc++: Replace std::__to_address in C++20 branch in

2024-10-23 Thread Jonathan Wakely
As noted by Patrick, r15-4546-g85e5b80ee2de80 should have changed the
usage of std::__to_address to std::to_address in the C++20-specific
branch that works on types satisfying std::contiguous_iterator.

libstdc++-v3/ChangeLog:

* include/bits/basic_string.h (assign(Iter, Iter)): Call
std::to_address instead of __to_address.
---
Tested x86_64-linux.

This patch is also available as a pull request in the forge:
https://forge.sourceware.org/gcc/gcc-TEST/pulls/2

 libstdc++-v3/include/bits/basic_string.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/basic_string.h 
b/libstdc++-v3/include/bits/basic_string.h
index 16e356e0678..28b3e536185 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -1748,7 +1748,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
{
  __glibcxx_requires_valid_range(__first, __last);
  return _M_replace(size_type(0), size(),
-   std::__to_address(__first), __last - __first);
+   std::to_address(__first), __last - __first);
}
 #endif
  else
-- 
2.46.2



[PATCH] libstdc++: Add GLIBCXX_TESTSUITE_STDS example to docs

2024-10-23 Thread Jonathan Wakely
libstdc++-v3/ChangeLog:

* doc/xml/manual/test.xml: Add GLIBCXX_TESTSUITE_STDS example.
* doc/html/manual/test.html: Regenerate.
---

This patch is also available as a pull request in the forge:
https://forge.sourceware.org/gcc/gcc-TEST/pulls/1

 libstdc++-v3/doc/html/manual/test.html | 5 +++--
 libstdc++-v3/doc/xml/manual/test.xml   | 5 +++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/doc/html/manual/test.html 
b/libstdc++-v3/doc/html/manual/test.html
index 3657997fad4..1c7af1193da 100644
--- a/libstdc++-v3/doc/html/manual/test.html
+++ b/libstdc++-v3/doc/html/manual/test.html
@@ -352,12 +352,13 @@ cat 27_io/objects/char/3_xin.in | 
a.out-std, similar to 
the G++ tests.
   Adding set v3_std_list { 11 17 23 } to
-  ~/.dejagnurc or a file named by the
+  ~/.dejagnurc or to a file named by the
   DEJAGNU environment variable will cause every 
test to
   be run three times, using a different -std 
each time.
   Alternatively, a list of standard versions to test with can be specified
   as a comma-separated list in the GLIBCXX_TESTSUITE_STDS
-  environment variable.
+  environment variable, e.g. GLIBCXX_TESTSUITE_STDS=11,17,23
+  is equivalent to the v3_std_list value above.
 
   To run the libstdc++ test suite under the
   debug mode, use
diff --git a/libstdc++-v3/doc/xml/manual/test.xml 
b/libstdc++-v3/doc/xml/manual/test.xml
index 40926946fe7..6b7f1b04a2a 100644
--- a/libstdc++-v3/doc/xml/manual/test.xml
+++ b/libstdc++-v3/doc/xml/manual/test.xml
@@ -600,12 +600,13 @@ cat 27_io/objects/char/3_xin.in | a.out
   Since GCC 14, the libstdc++ testsuite has built-in support for running
   tests with more than one -std, similar to the G++ tests.
   Adding set v3_std_list { 11 17 23 } to
-  ~/.dejagnurc or a file named by the
+  ~/.dejagnurc or to a file named by the
   DEJAGNU environment variable will cause every test to
   be run three times, using a different -std each time.
   Alternatively, a list of standard versions to test with can be specified
   as a comma-separated list in the GLIBCXX_TESTSUITE_STDS
-  environment variable.
+  environment variable, e.g. GLIBCXX_TESTSUITE_STDS=11,17,23
+  is equivalent to the v3_std_list value above.
 
 
 
-- 
2.46.2



Re: SVE intrinsics: Fold constant operands for svlsl.

2024-10-23 Thread Richard Sandiford
Soumya AR  writes:
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc 
> b/gcc/config/aarch64/aarch64-sve-builtins.cc
> index 41673745cfe..aa556859d2e 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
> @@ -1143,11 +1143,14 @@ aarch64_const_binop (enum tree_code code, tree arg1, 
> tree arg2)
>tree type = TREE_TYPE (arg1);
>signop sign = TYPE_SIGN (type);
>wi::overflow_type overflow = wi::OVF_NONE;
> -
> +  unsigned int element_bytes = tree_to_uhwi (TYPE_SIZE_UNIT (type));
>/* Return 0 for division by 0, like SDIV and UDIV do.  */
>if (code == TRUNC_DIV_EXPR && integer_zerop (arg2))
>   return arg2;
> -
> +  /* Return 0 if shift amount is out of range. */
> +  if (code == LSHIFT_EXPR
> +   && tree_to_uhwi (arg2) >= (element_bytes * BITS_PER_UNIT))

tree_to_uhwi is dangerous because a general shift might be negative
(even if these particular shift amounts are unsigned).  We should
probably also key off TYPE_PRECISION rather than TYPE_SIZE_UNIT.  So:

if (code == LSHIFT_EXPR
    && wi::geu_p (wi::to_wide (arg2), TYPE_PRECISION (type)))

without the element_bytes variable.  Also: the indentation looks a bit off;
it should be tabs only followed by spaces only.
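
That is, the whole check would become something like (sketch):

  /* Return 0 if the shift amount is out of range.  */
  if (code == LSHIFT_EXPR
      && wi::geu_p (wi::to_wide (arg2), TYPE_PRECISION (type)))
    return build_int_cst (type, 0);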

OK with those changes, thanks.

Richard


> + return build_int_cst (type, 0);
>if (!poly_int_binop (poly_res, code, arg1, arg2, sign, &overflow))
>   return NULL_TREE;
>return force_fit_type (type, poly_res, false,
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/const_fold_lsl_1.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/const_fold_lsl_1.c
> new file mode 100644
> index 000..6109558001a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/const_fold_lsl_1.c
> @@ -0,0 +1,142 @@
> +/* { dg-final { check-function-bodies "**" "" } } */
> +/* { dg-options "-O2" } */
> +
> +#include "arm_sve.h"
> +
> +/*
> +** s64_x:
> +**   mov z[0-9]+\.d, #20
> +**   ret
> +*/
> +svint64_t s64_x (svbool_t pg) {
> +return svlsl_n_s64_x (pg, svdup_s64 (5), 2);  
> +}
> +
> +/*
> +** s64_x_vect:
> +**   mov z[0-9]+\.d, #20
> +**   ret
> +*/
> +svint64_t s64_x_vect (svbool_t pg) {
> +return svlsl_s64_x (pg, svdup_s64 (5), svdup_u64 (2));  
> +}
> +
> +/*
> +** s64_z:
> +**   mov z[0-9]+\.d, p[0-7]/z, #20
> +**   ret
> +*/
> +svint64_t s64_z (svbool_t pg) {
> +return svlsl_n_s64_z (pg, svdup_s64 (5), 2);  
> +}
> +
> +/*
> +** s64_z_vect:
> +**   mov z[0-9]+\.d, p[0-7]/z, #20
> +**   ret
> +*/
> +svint64_t s64_z_vect (svbool_t pg) {
> +return svlsl_s64_z (pg, svdup_s64 (5), svdup_u64 (2));  
> +}
> +
> +/*
> +** s64_m_ptrue:
> +**   mov z[0-9]+\.d, #20
> +**   ret
> +*/
> +svint64_t s64_m_ptrue () {
> +return svlsl_n_s64_m (svptrue_b64 (), svdup_s64 (5), 2);  
> +}
> +
> +/*
> +** s64_m_ptrue_vect:
> +**   mov z[0-9]+\.d, #20
> +**   ret
> +*/
> +svint64_t s64_m_ptrue_vect () {
> +return svlsl_s64_m (svptrue_b64 (), svdup_s64 (5), svdup_u64 (2));  
> +}
> +
> +/*
> +** s64_m_pg:
> +**   mov z[0-9]+\.d, #5
> +**   lsl z[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #2
> +**   ret
> +*/
> +svint64_t s64_m_pg (svbool_t pg) {
> +return svlsl_n_s64_m (pg, svdup_s64 (5), 2);
> +} 
> +
> +/*
> +** s64_m_pg_vect:
> +**   mov z[0-9]+\.d, #5
> +**   lsl z[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, #2
> +**   ret
> +*/
> +svint64_t s64_m_pg_vect (svbool_t pg) {
> +return svlsl_s64_m (pg, svdup_s64 (5), svdup_u64 (2));
> +}
> +
> +/*
> +** s64_x_0:
> +**   mov z[0-9]+\.d, #5
> +**   ret
> +*/
> +svint64_t s64_x_0 (svbool_t pg) {
> +return svlsl_n_s64_x (pg, svdup_s64 (5), 0);  
> +}
> +
> +/*
> +** s64_x_bit_width:
> +**   movi?   [vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0
> +**   ret
> +*/
> +svint64_t s64_x_bit_width (svbool_t pg) {
> +return svlsl_n_s64_x (pg, svdup_s64 (5), 64); 
> +}
> +
> +/*
> +** s64_x_out_of_range:
> +**   movi?   [vdz]([0-9]+)\.?(?:[0-9]*[bhsd])?, #?0
> +**   ret
> +*/
> +svint64_t s64_x_out_of_range (svbool_t pg) {
> +return svlsl_n_s64_x (pg, svdup_s64 (5), 68); 
> +}
> +
> +/*
> +** u8_x_unsigned_overflow:
> +**   mov z[0-9]+\.b, #-2
> +**   ret
> +*/
> +svuint8_t u8_x_unsigned_overflow (svbool_t pg) {
> +return svlsl_n_u8_x (pg, svdup_u8 (255), 1);  
> +}
> +
> +/*
> +** s8_x_signed_overflow:
> +**   mov z[0-9]+\.b, #-2
> +**   ret
> +*/
> +svint8_t s8_x_signed_overflow (svbool_t pg) {
> +return svlsl_n_s8_x (pg, svdup_s8 (255), 1);  
> +}
> +
> +/*
> +** s8_x_neg_shift:
> +**   mov z[0-9]+\.b, #-2
> +**   ret
> +*/
> +svint8_t s8_x_neg_shift (svbool_t pg) {
> +return svlsl_n_s8_x (pg, svdup_s8 (-1), 1);  
> +}
> +
> +/*
> +** s8_x_max_shift:
> +**   mov z[0-9]+\.b, #-128
> +**   ret
> +*/
> +svint8_t s8_x_max_shift (svbool_t pg) {
> +return svlsl_n_s8_x (pg, svdup_s8 (1), 7);  
> +}
> +


[PATCH 1/2] Relax vect_check_scalar_mask check

2024-10-23 Thread Richard Biener
When the mask is not a constant or external def, there's no need to
check the scalar type; in particular, with SLP and the mask being
a VEC_PERM_EXPR, there isn't a scalar operand ready to check
(not one that vect_is_simple_use will get you).  We later check the
vector type and reject non-mask types there.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

* tree-vect-stmts.cc (vect_check_scalar_mask): Only check
the scalar type for constant or extern defs.
---
 gcc/tree-vect-stmts.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index cca6fd6fa97..55f263620c5 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2520,7 +2520,8 @@ vect_check_scalar_mask (vec_info *vinfo, stmt_vec_info 
stmt_info,
   return false;
 }
 
-  if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (*mask)))
+  if ((mask_dt == vect_constant_def || mask_dt == vect_external_def)
+  && !VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (*mask)))
 {
   if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-- 
2.43.0



Re: [PATCH 1/2] aarch64: Use standard names for saturating arithmetic

2024-10-23 Thread Richard Sandiford
Richard Sandiford  writes:
> Akram Ahmad  writes:
>> This renames the existing {s,u}q{add,sub} instructions to use the
>> standard names {s,u}s{add,sub}3 which are used by IFN_SAT_ADD and
>> IFN_SAT_SUB.
>>
>> The NEON intrinsics for saturating arithmetic and their corresponding
>> builtins are changed to use these standard names too.
>>
>> Using the standard names for the instructions causes 32 and 64-bit
>> unsigned scalar saturating arithmetic to use the NEON instructions,
>> resulting in an additional (and inefficient) FMOV to be generated when
>> the original operands are in GP registers. This patch therefore also
>> restores the original behaviour of using the adds/subs instructions
>> in this circumstance.
>>
>> Additional tests are written for the scalar and Adv. SIMD cases to
>> ensure that the correct instructions are used. The NEON intrinsics are
>> already tested elsewhere.
>
> Thanks for doing this.  The approach looks good.  My main question is:
> are we sure that we want to use the Advanced SIMD instructions for
> signed saturating SI and DI arithmetic on GPRs?  E.g. for addition,
> we only saturate at the negative limit if both operands are negative,
> and only saturate at the positive limit if both operands are positive.
> So for 32-bit values we can use:
>
>   asr	tmp, x or y, #31
>   eor	tmp, tmp, #0x80000000
>
> to calculate the saturation value and:
>
>   adds	res, x, y
>   csel	res, tmp, res, vs

Bah, knew I should have sat on this before sending.  tmp is the
inverse of the saturation value, so we want:

csinv   res, res, tmp, vc

instead of the csel above.
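
So the full suggested sequence would be something like this (untested
sketch; either input works as the asr operand):

	asr	tmp, x, #31
	eor	tmp, tmp, #0x80000000
	adds	res, x, y
	csinv	res, res, tmp, vc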

> to calculate the full result.  That's the same number of instructions
> as two fmovs for the inputs, the sqadd, and the fmov for the result,
> but it should be more efficient.
>
> The reason for asking now, rather than treating it as a potential
> future improvement, is that it would also avoid splitting the patterns
> for signed and unsigned ops.  (The length of the split alternative can be
> conservatively set to 16 even for the unsigned version, since nothing
> should care in practice.  The split will have happened before
> shorten_branches.)
>
>> gcc/ChangeLog:
>>
>>  * config/aarch64/aarch64-builtins.cc: Expand iterators.
>>  * config/aarch64/aarch64-simd-builtins.def: Use standard names
>>  * config/aarch64/aarch64-simd.md: Use standard names, split insn
>>  definitions on signedness of operator and type of operands.
>>  * config/aarch64/arm_neon.h: Use standard builtin names.
>>  * config/aarch64/iterators.md: Add VSDQ_I_QI_HI iterator to
>>  simplify splitting of insn for unsigned scalar arithmetic.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * 
>> gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc:
>>  Template file for unsigned vector saturating arithmetic tests.
>>  * 
>> gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c:
>>  8-bit vector type tests.
>>  * 
>> gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c:
>>  16-bit vector type tests.
>>  * 
>> gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_3.c:
>>  32-bit vector type tests.
>>  * 
>> gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_4.c:
>>  64-bit vector type tests.
>>  * gcc.target/aarch64/saturating_arithmetic.inc: Template file
>>  for scalar saturating arithmetic tests.
>>  * gcc.target/aarch64/saturating_arithmetic_1.c: 8-bit tests.
>>  * gcc.target/aarch64/saturating_arithmetic_2.c: 16-bit tests.
>>  * gcc.target/aarch64/saturating_arithmetic_3.c: 32-bit tests.
>>  * gcc.target/aarch64/saturating_arithmetic_4.c: 64-bit tests.
>> diff --git 
>> a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
>>  
>> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
>> new file mode 100644
>> index 000..63eb21e438b
>> --- /dev/null
>> +++ 
>> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
>> @@ -0,0 +1,79 @@
>> +/* { dg-do assemble { target { aarch64*-*-* } } } */
>> +/* { dg-options "-O2 --save-temps -ftree-vectorize" } */
>> +/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
>> +
>> +/*
>> +** uadd_lane: { xfail *-*-* }
>
> Just curious: why does this fail?  Is it a vector costing issue?
>
>> +**  dup\tv([0-9]+).8b, w0
>> +**  uqadd\tb([0-9]+), b\1, b0
>> +**  umov\tw0, v\2.b\[0]
>> +**  ret
>> +*/
>> +/*
>> +** uaddq:
>> +** ...
>> +**  ldr\tq([0-9]+), .*
>> +**  ldr\tq([0-9]+), .*
>> +**  uqadd\tv\2.16b, v\1.16b, v\2.16b
>
> Since the operands are commutative, and since there's no restriction
> on the choice of destination register, it's probably safer to use:
>
>> +**  uqadd\tv[0-9].16b, (?:v\1.16b, v\2.16b|v\2.16b, v\1.16b)
>
> Similarly for the other qadds.  The qsubs do of course have a fixed

Re: [PATCH v3] aarch64: Improve scalar mode popcount expansion by using SVE [PR113860]

2024-10-23 Thread Richard Sandiford
Pengxuan Zheng  writes:
> This is similar to the recent improvements to the Advanced SIMD popcount
> expansion by using SVE. We can utilize SVE to generate more efficient code for
> scalar mode popcount too.
>
> Changes since v1:
> * v2: Add a new VNx1BI mode and a new test case for V1DI.
> * v3: Abandon VNx1BI changes and add a new variant of aarch64_ptrue_reg.

Sorry for the slow review.

The patch looks good though.  OK with the changes below:

> diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt12.c 
> b/gcc/testsuite/gcc.target/aarch64/popcnt12.c
> new file mode 100644
> index 000..f086cae55a2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/popcnt12.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fgimple" } */
> +/* { dg-final { check-function-bodies "**" "" "" } } */
> +

It's probably safer to add:

#pragma GCC target "+nosve"

here, so that we don't try to use the SVE instructions.

> +/*
> +** foo:
> +**	cnt	v0.8b, v0.8b
> +**	addv	b0, v0.8b

Nothing requires the temporary register to be v0, so this should be
something like:

	cnt	(v[0-9]+\.8b), v0\.8b
	addv	b0, \1

Thanks,
Richard

> +**   ret
> +*/
> +__Uint64x1_t __GIMPLE
> +foo (__Uint64x1_t x)
> +{
> +  __Uint64x1_t z;
> +
> +  z = .POPCOUNT (x);
> +  return z;
> +}


[PATCH 2/2] tree-optimization/116575 - SLP masked load-lanes discovery

2024-10-23 Thread Richard Biener
The following implements masked load-lane discovery for SLP.  The
challenge here is that a masked load has a full-width mask with
group-size number of elements when this becomes a masked load-lanes
instruction one mask element gates all group members.  We already
have some discovery hints in place, namely STMT_VINFO_SLP_VECT_ONLY
to guard non-uniform masks, but we need to choose a way for SLP
discovery to handle possible masked load-lanes SLP trees.

I have this time chosen to handle load-lanes discovery where we
have performed permute optimization already and conveniently got
the graph with predecessor edges built.  This is because unlike
non-masked loads masked loads with a load_permutation are never
produced by SLP discovery (because load permutation handling doesn't
handle un-permuting the mask) and thus the load-permutation lowering
which handles non-masked load-lanes discovery doesn't trigger.

With this SLP discovery for a possible masked load-lanes, thus
a masked load with uniform mask, produces a splat of a single-lane
sub-graph as the mask SLP operand.  This is a representation that
shouldn't pessimize the mask load case and allows the masked load-lanes
transform to simply elide this splat.
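
For illustration, the kind of loop this targets is a grouped load under
a single condition, roughly (illustrative example only, not taken from
the testsuite):

  void
  f (int *restrict dst, int *restrict src, int *restrict cond, int n)
  {
    for (int i = 0; i < n; ++i)
      if (cond[i])	/* one mask element gates both loaded lanes */
	dst[i] = src[2 * i] + src[2 * i + 1];
  }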

This fixes the aarch64-sve.exp mask_struct_load*.c testcases with
--param vect-force-slp=1

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

I realize we are still quite inconsistent in how we do SLP
discovery - mainly because of my idea to only apply minimal
changes at this point.  I would expect that permuted masked loads
miss the interleaving lowering performed by load permutation
lowering.  And if we fix that we again have to decide whether
to interleave or load-lane at the same time.  I'm also not sure
how much good the optimize_slp pass does to VEC_PERMs in the
SLP graph and what stops working when there are no longer any
load_permutations in there.

Richard.

PR tree-optimization/116575
* tree-vect-slp.cc (vect_get_and_check_slp_defs): Handle
gaps, aka NULL scalar stmt.
(vect_build_slp_tree_2): Allow gaps in the middle of a
grouped mask load.  When the mask of a grouped mask load
is uniform do single-lane discovery for the mask and
insert a splat VEC_PERM_EXPR node.
(vect_optimize_slp_pass::decide_masked_load_lanes): New
function.
(vect_optimize_slp_pass::run): Call it.
---
 gcc/tree-vect-slp.cc | 138 ++-
 1 file changed, 135 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index fca9ae86d2e..037098a96cb 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -641,6 +641,16 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned 
char swap,
   unsigned int commutative_op = -1U;
   bool first = stmt_num == 0;
 
+  if (!stmt_info)
+{
+  for (auto oi : *oprnds_info)
+   {
+ oi->def_stmts.quick_push (NULL);
+ oi->ops.quick_push (NULL_TREE);
+   }
+  return 0;
+}
+
   if (!is_a (stmt_info->stmt)
   && !is_a (stmt_info->stmt)
   && !is_a (stmt_info->stmt))
@@ -2029,9 +2039,11 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
has_gaps = true;
  /* We cannot handle permuted masked loads directly, see
 PR114375.  We cannot handle strided masked loads or masked
-loads with gaps.  */
+loads with gaps unless the mask is uniform.  */
  if ((STMT_VINFO_GROUPED_ACCESS (stmt_info)
-  && (DR_GROUP_GAP (first_stmt_info) != 0 || has_gaps))
+  && (DR_GROUP_GAP (first_stmt_info) != 0
+  || (has_gaps
+  && STMT_VINFO_SLP_VECT_ONLY (first_stmt_info
  || STMT_VINFO_STRIDED_P (stmt_info))
{
  load_permutation.release ();
@@ -2054,7 +2066,12 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
  unsigned i = 0;
  for (stmt_vec_info si = first_stmt_info;
   si; si = DR_GROUP_NEXT_ELEMENT (si))
-   stmts2[i++] = si;
+   {
+ if (si != first_stmt_info)
+   for (unsigned k = 1; k < DR_GROUP_GAP (si); ++k)
+ stmts2[i++] = NULL;
+ stmts2[i++] = si;
+   }
  bool *matches2 = XALLOCAVEC (bool, dr_group_size);
  slp_tree unperm_load
= vect_build_slp_tree (vinfo, stmts2, dr_group_size,
@@ -2719,6 +2736,43 @@ out:
  continue;
}
 
+  /* When we have a masked load with uniform mask discover this
+as a single-lane mask with a splat permute.  This way we can
+recognize this as a masked load-lane by stripping the splat.  */
> +  if (is_a <gcall *> (STMT_VINFO_STMT (stmt_info))
+ && gimple_call_internal_p (STMT_VINFO_STMT (stm

Re: [PATCH v4 2/7] OpenMP: middle-end support for dispatch + adjust_args

2024-10-23 Thread Paul-Antoine Arras

Here is the updated patch.

On 23/10/2024 11:41, Tobias Burnus wrote:
* The update to builtins.cc's builtin_fnspec  is lacking in the 
changelog list.


Added missing items to the ChangeLog.

* And the new testcase, new gcc/testsuite/c-c++-common/gomp/ 
dispatch-10.c, has to be put into 3/7 or later of the series as it 
requires a parser and, as written/file location, requires both C and C+ 
+, unless a skip for C is added.


Moved the testcase to 5/7. Will post the updated patch later.

Thanks,
--
PAcommit 6265c755697bba629fce4b27b8f1800ea1e313fb
Author: Paul-Antoine Arras 
Date:   Fri May 24 15:53:45 2024 +0200

OpenMP: middle-end support for dispatch + adjust_args

This patch adds middle-end support for the `dispatch` construct and the
`adjust_args` clause. The heavy lifting is done in `gimplify_omp_dispatch` and
`gimplify_call_expr` respectively. For `adjust_args`, this mostly consists in
emitting a call to `omp_get_mapped_ptr` for the adequate device.

For dispatch, the following steps are performed:

* Handle the device clause, if any: set the default-device ICV at the top of the
dispatch region and restore its previous value at the end.

* Handle novariants and nocontext clauses, if any. Evaluate compile-time
constants and select a variant, if possible. Otherwise, emit code to handle all
possible cases at run time.

gcc/ChangeLog:

* builtins.cc (builtin_fnspec): Handle BUILT_IN_OMP_GET_MAPPED_PTR.
* gimple-low.cc (lower_stmt): Handle GIMPLE_OMP_DISPATCH.
* gimple-pretty-print.cc (dump_gimple_omp_dispatch): New function.
(pp_gimple_stmt_1): Handle GIMPLE_OMP_DISPATCH.
* gimple-walk.cc (walk_gimple_stmt): Likewise.
* gimple.cc (gimple_build_omp_dispatch): New function.
(gimple_copy): Handle GIMPLE_OMP_DISPATCH.
* gimple.def (GIMPLE_OMP_DISPATCH): Define.
* gimple.h (gimple_build_omp_dispatch): Declare.
(gimple_has_substatements): Handle GIMPLE_OMP_DISPATCH.
(gimple_omp_dispatch_clauses): New function.
(gimple_omp_dispatch_clauses_ptr): Likewise.
(gimple_omp_dispatch_set_clauses): Likewise.
(gimple_return_set_retval): Handle GIMPLE_OMP_DISPATCH.
* gimplify.cc (enum omp_region_type): Add ORT_DISPATCH.
(struct gimplify_omp_ctx): Add in_call_args.
(gimplify_call_expr): Handle need_device_ptr arguments.
(is_gimple_stmt): Handle OMP_DISPATCH.
(gimplify_scan_omp_clauses): Handle OMP_CLAUSE_DEVICE in a dispatch
construct. Handle OMP_CLAUSE_NOVARIANTS and OMP_CLAUSE_NOCONTEXT.
(omp_has_novariants): New function.
(omp_has_nocontext): Likewise.
(omp_construct_selector_matches): Handle OMP_DISPATCH with nocontext
clause.
(find_ifn_gomp_dispatch): New function.
(gimplify_omp_dispatch): Likewise.
(gimplify_expr): Handle OMP_DISPATCH.
* gimplify.h (omp_has_novariants): Declare.
* internal-fn.cc (expand_GOMP_DISPATCH): New function.
* internal-fn.def (GOMP_DISPATCH): Define.
* omp-builtins.def (BUILT_IN_OMP_GET_MAPPED_PTR): Define.
(BUILT_IN_OMP_GET_DEFAULT_DEVICE): Define.
(BUILT_IN_OMP_SET_DEFAULT_DEVICE): Define.
* omp-general.cc (omp_construct_traits_to_codes): Add OMP_DISPATCH.
(struct omp_ts_info): Add dispatch.
(omp_resolve_declare_variant): Handle novariants. Adjust
DECL_ASSEMBLER_NAME.
* omp-low.cc (scan_omp_1_stmt): Handle GIMPLE_OMP_DISPATCH.
(lower_omp_dispatch): New function.
(lower_omp_1): Call it.
* tree-inline.cc (remap_gimple_stmt): Handle GIMPLE_OMP_DISPATCH.
(estimate_num_insns): Handle GIMPLE_OMP_DISPATCH.

diff --git gcc/builtins.cc gcc/builtins.cc
index 37c7c98e5c7..c7a8310bb3f 100644
--- gcc/builtins.cc
+++ gcc/builtins.cc
@@ -12580,6 +12580,8 @@ builtin_fnspec (tree callee)
 	 by its first argument.  */
   case BUILT_IN_POSIX_MEMALIGN:
 	return ".cOt";
+  case BUILT_IN_OMP_GET_MAPPED_PTR:
+	return ". R ";
 
   default:
 	return "";
diff --git gcc/gimple-low.cc gcc/gimple-low.cc
index e0371988705..712a1ebf776 100644
--- gcc/gimple-low.cc
+++ gcc/gimple-low.cc
@@ -746,6 +746,7 @@ lower_stmt (gimple_stmt_iterator *gsi, struct lower_data *data)
 case GIMPLE_EH_MUST_NOT_THROW:
 case GIMPLE_OMP_FOR:
 case GIMPLE_OMP_SCOPE:
+case GIMPLE_OMP_DISPATCH:
 case GIMPLE_OMP_SECTIONS:
 case GIMPLE_OMP_SECTIONS_SWITCH:
 case GIMPLE_OMP_SECTION:
diff --git gcc/gimple-pretty-print.cc gcc/gimple-pretty-print.cc
index 01d7c9f6eeb..7a45e8ec843 100644
--- gcc/gimple-pretty-print.cc
+++ gcc/gimple-pretty-print.cc
@@ -1726,6 +1726,35 @@ dump_gimple_omp_scope (pretty_printer *pp, const gimple *gs,
 }
 }
 
+/

Re: [PATCH] c++: Further fix for get_member_function_from_ptrfunc [PR117259]

2024-10-23 Thread Jason Merrill

On 10/22/24 2:17 PM, Jakub Jelinek wrote:

Hi!

The following testcase shows that the previous get_member_function_from_ptrfunc
changes weren't sufficient and we still have cases where
-fsanitize=undefined with pointers to member functions can cause wrong code
being generated and related false positive warnings.

The problem is that save_expr doesn't always create SAVE_EXPR, it can skip
some invariant arithmetics and in the end it could be really large
expressions which would be evaluated several times (and what is worse, with
-fsanitize=undefined those expressions then can have SAVE_EXPRs added to
their subparts for -fsanitize=bounds or -fsanitize=null or
-fsanitize=alignment instrumentation).  Tried to just build1 a SAVE_EXPR
+ add TREE_SIDE_EFFECTS instead of save_expr, but that doesn't work either,
because cp_fold happily optimizes those SAVE_EXPRs away when it sees
SAVE_EXPR operand is tree_invariant_p.
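
A minimal illustration of the risk, distilled from the testcase below:
in

  (x->*c[y].b) ();

the subexpression c[y].b is invariant arithmetic, so save_expr can
return it unchanged, and the -fsanitize=bounds instrumentation of c[y]
can end up being emitted more than once.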


Hmm, when would that be a problem?  I wouldn't expect instance_ptr to be 
tree_invariant_p.



So, the following patch instead of using save_expr or building SAVE_EXPR
manually builds a TARGET_EXPR.  Both types are pointers, so it doesn't need
to be destroyed in any way, but TARGET_EXPR is what doesn't get optimized
away immediately.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


2024-10-22  Jakub Jelinek  

PR c++/117259
* typeck.cc (get_member_function_from_ptrfunc): Use force_target_expr
rather than save_expr for instance_ptr and function.  Don't call it
for TREE_CONSTANT.

* g++.dg/ubsan/pr117259.C: New test.

--- gcc/cp/typeck.cc.jj 2024-10-16 14:42:58.835725318 +0200
+++ gcc/cp/typeck.cc2024-10-22 16:12:58.462731292 +0200
@@ -4193,24 +4193,27 @@ get_member_function_from_ptrfunc (tree *
if (!nonvirtual && is_dummy_object (instance_ptr))
nonvirtual = true;
  
-  /* Use save_expr even when instance_ptr doesn't have side-effects,
-     unless it is a simple decl (save_expr won't do anything on
-     constants), so that we don't ubsan instrument the expression
-     multiple times.  See PR116449.  */
+  /* Use force_target_expr even when instance_ptr doesn't have
+side-effects, unless it is a simple decl or constant, so
+that we don't ubsan instrument the expression multiple times.
+Don't use save_expr, as save_expr can avoid building a SAVE_EXPR
+and building a SAVE_EXPR manually can be optimized away during
+cp_fold.  See PR116449 and PR117259.  */
if (TREE_SIDE_EFFECTS (instance_ptr)
- || (!nonvirtual && !DECL_P (instance_ptr)))
-   {
- instance_save_expr = save_expr (instance_ptr);
- if (instance_save_expr == instance_ptr)
-   instance_save_expr = NULL_TREE;
- else
-   instance_ptr = instance_save_expr;
-   }
+ || (!nonvirtual
+ && !DECL_P (instance_ptr)
+ && !TREE_CONSTANT (instance_ptr)))
+   instance_ptr = instance_save_expr
+ = force_target_expr (TREE_TYPE (instance_ptr), instance_ptr,
+  complain);
  
/* See above comment.  */
if (TREE_SIDE_EFFECTS (function)
- || (!nonvirtual && !DECL_P (function)))
-   function = save_expr (function);
+ || (!nonvirtual
+ && !DECL_P (function)
+ && !TREE_CONSTANT (function)))
+   function
+ = force_target_expr (TREE_TYPE (function), function, complain);
  
/* Start by extracting all the information from the PMF itself.  */
e3 = pfn_from_ptrmemfunc (function);
--- gcc/testsuite/g++.dg/ubsan/pr117259.C.jj2024-10-22 17:00:52.156114344 
+0200
+++ gcc/testsuite/g++.dg/ubsan/pr117259.C   2024-10-22 17:05:20.470324367 
+0200
@@ -0,0 +1,13 @@
+// PR c++/117259
+// { dg-do compile }
+// { dg-options "-Wuninitialized -fsanitize=undefined" }
+
+struct A { void foo () {} };
+struct B { void (A::*b) (); B (void (A::*x) ()) : b(x) {}; };
+const B c[1] = { &A::foo };
+
+void
+foo (A *x, int y)
+{
+  (x->*c[y].b) ();
+}

Jakub





[COMMITTED] PR tree-optimization/117222 - Implement operator_pointer_diff::fold_range

2024-10-23 Thread Andrew MacLeod


pointer_diff depends on range_operator::fold_range to do the generic 
fold, which invokes wi_fold on subranges.  It also in turn invokes 
op1_op2_relation_effect for relation effects.


This worked fine when pointers were implemented with irange, but when 
the transition to prange was made, dispatch started invoking a new 
version of fold_range which uses prange operators. The default 
fold_range for prange does nothing, expecting the few pointer operators 
which need it to implement their own fold_range.  As a result, all 
calls to fold_range for pointer_diff were returning false, which 
translates to VARYING.


This PR demonstrates a dependency on knowing the relation between the 
operands is VREL_NE, in which case pointer_diff should return a non-zero 
result instead of varying.  This patch implements the pointer_diff 
version of fold_range.
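
A small illustration of the kind of code this affects (illustrative 
only, not the PR testcase):

  int f (char *p, char *q)
  {
    if (p == q)
      return 0;
    return (p - q) != 0;	/* with VREL_NE known, folds to 1 */
  }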


I am also auditing the other prange operators to see if similar dispatch 
related issues were missed.


Bootstrapped on x86_64-pc-linux-gnu with no regressions. Pushed.

Andrew
From 774ad67fba458dd1beaa0f2d3e389aac46ca18b5 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Mon, 21 Oct 2024 16:32:00 -0400
Subject: [PATCH] Implement operator_pointer_diff::fold_range

prange has no default fold_range processing like irange does, so each
pointer specific operator needs to implement its own fold routine.

	PR tree-optimization/117222
	gcc/
	* range-op-ptr.cc (operator_pointer_diff::fold_range): New.
	(operator_pointer_diff::op1_op2_relation_effect): Remove irange
	variant.
	(operator_pointer_diff::update_bitmask): Likewise.

	gcc/testsuite
	* g++.dg/pr117222.C: New.
---
 gcc/range-op-ptr.cc | 37 ++---
 gcc/testsuite/g++.dg/pr117222.C | 16 ++
 2 files changed, 36 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/pr117222.C

diff --git a/gcc/range-op-ptr.cc b/gcc/range-op-ptr.cc
index 24e206c00cd..07a551618f9 100644
--- a/gcc/range-op-ptr.cc
+++ b/gcc/range-op-ptr.cc
@@ -567,25 +567,38 @@ pointer_or_operator::wi_fold (irange &r, tree type,
 
 class operator_pointer_diff : public range_operator
 {
+  using range_operator::fold_range;
   using range_operator::update_bitmask;
   using range_operator::op1_op2_relation_effect;
-  virtual bool op1_op2_relation_effect (irange &lhs_range,
-	tree type,
-	const irange &op1_range,
-	const irange &op2_range,
-	relation_kind rel) const;
+  virtual bool fold_range (irange &r, tree type,
+			   const prange &op1,
+			   const prange &op2,
+			   relation_trio trio) const final override;
   virtual bool op1_op2_relation_effect (irange &lhs_range,
 	tree type,
 	const prange &op1_range,
 	const prange &op2_range,
 	relation_kind rel) const final override;
-  void update_bitmask (irange &r, const irange &lh, const irange &rh) const
-{ update_known_bitmask (r, POINTER_DIFF_EXPR, lh, rh); }
   void update_bitmask (irange &r,
 		   const prange &lh, const prange &rh) const final override
   { update_known_bitmask (r, POINTER_DIFF_EXPR, lh, rh); }
 } op_pointer_diff;
 
+bool
+operator_pointer_diff::fold_range (irange &r, tree type,
+   const prange &op1,
+   const prange &op2,
+   relation_trio trio) const
+{
+  gcc_checking_assert (r.supports_type_p (type));
+
+  r.set_varying (type);
+  relation_kind rel = trio.op1_op2 ();
+  op1_op2_relation_effect (r, type, op1, op2, rel);
+  update_bitmask (r, op1, op2);
+  return true;
+}
+
 bool
 operator_pointer_diff::op1_op2_relation_effect (irange &lhs_range, tree type,
 		const prange &op1_range,
@@ -602,16 +615,6 @@ operator_pointer_diff::op1_op2_relation_effect (irange &lhs_range, tree type,
   return minus_op1_op2_relation_effect (lhs_range, type, op1, op2, rel);
 }
 
-bool
-operator_pointer_diff::op1_op2_relation_effect (irange &lhs_range, tree type,
-		const irange &op1_range,
-		const irange &op2_range,
-		relation_kind rel) const
-{
-  return minus_op1_op2_relation_effect (lhs_range, type, op1_range, op2_range,
-	rel);
-}
-
 bool
 operator_identity::fold_range (prange &r, tree type ATTRIBUTE_UNUSED,
 			   const prange &lh ATTRIBUTE_UNUSED,
diff --git a/gcc/testsuite/g++.dg/pr117222.C b/gcc/testsuite/g++.dg/pr117222.C
new file mode 100644
index 000..60cf6e30ed5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr117222.C
@@ -0,0 +1,16 @@
+// { dg-do compile }
+// { dg-require-effective-target c++11 }
+// { dg-options "-O3 -fdump-tree-evrp" }
+
+#include <vector>
+int main()
+{
+std::vector<int> c {1,2,3,0};
+while(c.size() > 0 && c.back() == 0)
+{
+auto sz = c.size() -1;
+c.resize(sz);
+}
+return 0;
+}
+/* { dg-final { scan-tree-dump "Global Exported.*\[-INF, -1\]\[1, +INF\]" "evrp" } } */
-- 
2.45.0



Re: [PATCH v2 2/4] aarch64: add minimal support of AEABI build attributes for GCS.

2024-10-23 Thread Richard Sandiford
Matthieu Longo  writes:
> @@ -24803,6 +24834,16 @@ aarch64_start_file (void)
> asm_fprintf (asm_out_file, "\t.arch %s\n",
>   aarch64_last_printed_arch_string.c_str ());
>  
> +  /* Check whether the current assembly supports gcs build attributes, if not
> + fallback to .note.gnu.property section.  */
> +#if (HAVE_AS_AEABI_BUILD_ATTRIBUTES)
> +  if (aarch64_gcs_enabled ())

I was hoping we could instead use:

  if (HAVE_AS_AEABI_BUILD_ATTRIBUTES && aarch64_gcs_enabled ())

so that the code is parsed but compiled out when the new syntax is not
supported.  This avoids cases where a patch that works with a new
assembler breaks bootstrap when using an older assembler, or vice
versa.

> +{
> +  aarch64_emit_aeabi_subsection (".aeabi-feature-and-bits", 1, 0);
> +  aarch64_emit_aeabi_attribute ("Tag_Feature_GCS", 3, 1);
> +}
> +#endif
> +
> default_file_start ();
>  }
>  
> [...]
> diff --git a/gcc/configure.ac b/gcc/configure.ac
> index 8a5fed516b3..f4b1343ca70 100644
> --- a/gcc/configure.ac
> +++ b/gcc/configure.ac
> @@ -4387,6 +4387,16 @@ case "$target" in
>   ldr x0, [[x2, #:gotpage_lo15:globalsym]]
>  ],,[AC_DEFINE(HAVE_AS_SMALL_PIC_RELOCS, 1,
>   [Define if your assembler supports relocs needed by -fpic.])])
> +# Check if we have binutils support for AEABI build attributes.
> +gcc_GAS_CHECK_FEATURE([support of AEABI build attributes], 
> gcc_cv_as_aarch64_aeabi_build_attributes,,
> +[
> + .set ATTR_TYPE_uleb128,   0
> + .set ATTR_TYPE_asciz, 1

Very minor, but: we can drop this line, since it isn't used in the test.
Same for the corresponding Tcl test.

OK with those changes, thanks.

Richard

> + .set Tag_Feature_foo, 1
> + .aeabi_subsection .aeabi-feature-and-bits, 1, ATTR_TYPE_uleb128
> + .aeabi_attribute Tag_Feature_foo, 1
> +],,[AC_DEFINE(HAVE_AS_AEABI_BUILD_ATTRIBUTES, 1,
> + [Define if your assembler supports AEABI build attributes.])])
>  # Enable Branch Target Identification Mechanism and Return Address
>  # Signing by default.
>  AC_ARG_ENABLE(standard-branch-protection,


Re: [PATCH v2 0/4] aarch64: add minimal support of AEABI build attributes for GCS

2024-10-23 Thread Richard Sandiford
Matthieu Longo  writes:
> The primary focus of this patch series is to add support for build attributes 
> in the context of GCS (Guarded Control Stack, an Armv9.4-a extension) to the 
> AArch64 backend.
> It addresses comments from revision 1 [2] and 2 [3], and proposes a different 
> approach compared to the previous implementation of the build attributes.
>
> The series is composed of the following 4 patches:
> 1. Patch adding assembly debug comments (-dA) to the existing GNU properties, 
> to improve testing and check the correctness of values.
> 2. The minimal patch adding support for build attributes in the context of 
> GCS.
> 3. A refactoring of (2) to make things less error-prone and more modular, add 
> support for asciz attributes and more debug information.
> 4. A refactoring of (1) relying partly on (3).
> The targeted final state of this series would consist in squashing (2) + (3), 
> and (1) + (4).
>
> **Special note regarding (2):** If Gas has support for build attributes, both 
> build attributes and GNU properties will be emitted. This behavior is still 
> open for discussion. Please, let me know your thoughts regarding this 
> behavior.

I don't have a strong opinion.  But emitting both seems like the safe
and conservatively correct behaviour, so I think the onus would be
on anyone who wants to drop the old information to make the case
for doing that.

> This patch series needs to be applied on top of the patch series for GCS [1].
>
> Bootstrapped on aarch64-none-linux-gnu, and no regression found.
>
> [1]: 
> https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/vendors/ARM/heads/gcs
> [2]: https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662825.html
> [3]: https://gcc.gnu.org/pipermail/gcc-patches/2024-September/664004.html
>
> Regards,
> Matthieu
>
> Diff with revision 1 [2]:
> - update the description of (2)
> - address the comments related to the tests in (2)
> - add new commits (1), (3) and (4)
>
> Diff with revision 2 [3]:
> - address comments of Richard Sandiford in revision 2.
> - fix several formatting mistakes.
> - remove RFC tag.
>
>
> Matthieu Longo (3):
>   aarch64: add debug comments to feature properties in .note.gnu.property
>   aarch64: improve assembly debug comments for AEABI build attributes
>   aarch64: encapsulate note.gnu.property emission into a class
>
> Srinath Parvathaneni (1):
>   aarch64: add minimal support of AEABI build attributes for GCS.

Looks good, thanks.  OK for trunk with the suggested changes for
patches 2 and 3.

Richard

>  gcc/config.gcc|   2 +-
>  gcc/config.in |   6 +
>  gcc/config/aarch64/aarch64-dwarf-metadata.cc  | 145 +++
>  gcc/config/aarch64/aarch64-dwarf-metadata.h   | 245 ++
>  gcc/config/aarch64/aarch64.cc |  69 ++---
>  gcc/config/aarch64/t-aarch64  |  10 +
>  gcc/configure |  38 +++
>  gcc/configure.ac  |  10 +
>  gcc/testsuite/gcc.target/aarch64/bti-1.c  |  13 +-
>  .../aarch64-build-attributes.exp  |  35 +++
>  .../build-attributes/build-attribute-gcs.c|  12 +
>  .../build-attribute-standard.c|  12 +
>  .../build-attributes/no-build-attribute-bti.c |  12 +
>  .../build-attributes/no-build-attribute-gcs.c |  12 +
>  .../build-attributes/no-build-attribute-pac.c |  12 +
>  .../no-build-attribute-standard.c |  12 +
>  gcc/testsuite/lib/target-supports.exp |  16 ++
>  17 files changed, 611 insertions(+), 50 deletions(-)
>  create mode 100644 gcc/config/aarch64/aarch64-dwarf-metadata.cc
>  create mode 100644 gcc/config/aarch64/aarch64-dwarf-metadata.h
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/build-attributes/aarch64-build-attributes.exp
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/build-attributes/build-attribute-gcs.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/build-attributes/build-attribute-standard.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/build-attributes/no-build-attribute-bti.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/build-attributes/no-build-attribute-gcs.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/build-attributes/no-build-attribute-pac.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/build-attributes/no-build-attribute-standard.c


Re: [PATCH v2 9/9] aarch64: Handle alignment when it is bigger than BIGGEST_ALIGNMENT

2024-10-23 Thread Richard Sandiford
Evgeny Karpov  writes:
> Tuesday, October 22, 2024
> Richard Sandiford  wrote:
>
>>> If ASM_OUTPUT_ALIGNED_LOCAL uses an alignment less than BIGGEST_ALIGNMENT,
>>> it might trigger a relocation issue.
>>>
>>> relocation truncated to fit: IMAGE_REL_ARM64_PAGEOFFSET_12L
>>
>> Sorry to press the issue, but: why does that happen?
>
> #define IMAGE_REL_ARM64_PAGEOFFSET_12L  0x0007  /* The 12-bit page offset of 
> the target, for instruction LDR (indexed, unsigned immediate). */
>
> Based on the documentation for LDR
> https://developer.arm.com/documentation/ddi0596/2020-12/Base-Instructions/LDR--immediate---Load-Register--immediate--
> "For the 64-bit variant: is the optional positive immediate byte offset, a 
> multiple of 8 in the range 0 to 32760, defaulting to 0 and encoded in the 
> "imm12" field as /8"

This in itself is relatively standard though.  We can't assume
without checking that any given offset will be "nicely" aligned.
So...

> This means BIGGEST_ALIGNMENT (128) could be replaced with 64.
>
> auto rounded = ROUND_UP (MAX ((SIZE), 1),				\
> 			 MAX ((ALIGNMENT), 64) / BITS_PER_UNIT);
>
> It works for most cases, however, not for all of them.

...although this will work for, say, loading all of:

unsigned char x[8];

using a single LDR, it doesn't look like it would cope with:

  struct __attribute__((packed)) {
char x;
void *y;
  } foo;

  void *f() { return foo.y; }

Or, even if that does work, it isn't clear to me why patching
ASM_OUTPUT_ALIGNED_LOCAL is a complete solution to the problem.

ISTM that we should be checking the known alignment during code generation,
and only using relocations if their alignment requirements are known to
be met.

Once that's done, it would make sense to increase the default alignment
if that improves code quality.  But it would be good to fix the correctness
issue first, while the problem is still easily visible.

If we do want to increase the default alignment to improve code quality,
the normal way would be via macros like DATA_ALIGNMENT or LOCAL_ALIGNMENT.
The advantage of those macros is that the increased alignment is visible
during code generation, rather than something that is only applied at
output time.
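
For example, something along these lines (values purely illustrative,
not a proposed definition):

/* Raise the alignment of array objects in static storage so that
   code generation can rely on it.  */
#define DATA_ALIGNMENT(TYPE, ALIGN)				\
  (TREE_CODE (TYPE) == ARRAY_TYPE && (ALIGN) < 64		\
   ? 64 : (ALIGN))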

Thanks,
Richard


Re: [PATCH v7] Target-independent store forwarding avoidance.

2024-10-23 Thread Jakub Jelinek
On Wed, Oct 23, 2024 at 04:27:29PM +0200, Konstantinos Eleftheriou wrote:

Just random ChangeLog formatting nits, not actual patch review:

> gcc/ChangeLog:
> 
>   * Makefile.in: Add avoid-store-forwarding.o

Missing . at the end.  Though, you should really also mention
what you're changing, so
* Makefile.in (OBJS): Add avoid-store-forwarding.o.

>   * common.opt: New option -favoid-store-forwarding.

* common.opt (favoid-store-forwarding): New option.

>   * common.opt.urls: Regenerate.
>   * doc/invoke.texi: New param store-forwarding-max-distance.
>   * doc/passes.texi: Document new pass.
>   * doc/tm.texi: Regenerate.
>   * doc/tm.texi.in: Document new pass.
>   * params.opt: New param store-forwarding-max-distance.

* params.opt (store-forwarding-max-distance): New param.

>   * passes.def: Schedule a new pass.

This doesn't say what you schedule and where.
* passes.def: Add pass_rtl_avoid_store_forwarding before
pass_early_remat.
or so.

>   * target.def (HOOK_PREFIX): New target hook avoid_store_forwarding_p.

Better
* target.def (avoid_store_forwarding_p): New DEFHOOK.

>   * target.h (struct store_fwd_info): Declare.
>   * targhooks.cc (default_avoid_store_forwarding_p):
> Add default_avoid_store_forwarding_p.

We don't indent the subsequent lines further than by a single tab.  Plus
it is just a new function, so just say
* targhooks.cc (default_avoid_store_forwarding_p): New function.

>   * targhooks.h (default_avoid_store_forwarding_p):
> Add default_avoid_store_forwarding_p.

See above, but just: Declare.

>   * tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare.
>   * avoid-store-forwarding.cc: New file.
>   * avoid-store-forwarding.h: New file.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/avoid-store-forwarding-1.c: New test.
>   * gcc.target/aarch64/avoid-store-forwarding-2.c: New test.
>   * gcc.target/aarch64/avoid-store-forwarding-3.c: New test.
>   * gcc.target/aarch64/avoid-store-forwarding-4.c: New test (XFAIL).
>   * gcc.target/aarch64/avoid-store-forwarding-5.c: New test (XFAIL).

The " (XFAIL)" parts IMHO don't belong here.
And, it would be nice to get some test coverage on other arches too.

Jakub



[PATCH] ginclude: stdalign.h should define __xxx_is_defined macros for C++

2024-10-23 Thread Jonathan Wakely
The __alignas_is_defined macro has been required by C++ since C++11, and
C++ Library DR 4036 clarified that __alignof_is_defined should be
defined too.

The macros alignas and alignof should not be defined, as they're
keywords in C++.

Technically it's implementation-defined whether __STDC_VERSION__ is
defined by a C++ compiler, but G++ does not define it. Adjusting the
first #if this way works as intended: A C23 compiler will not enter the
outer if-group and so will not define any of the macros, a C17 compiler
will enter both if-groups and so define all the macros, and a C++
compiler will enter the outer if-group but not the inner if-group.

gcc/ChangeLog:

* ginclude/stdalign.h (__alignas_is_defined): Define for C++.
(__alignof_is_defined): Likewise.

libstdc++-v3/ChangeLog:

* testsuite/18_support/headers/cstdalign/macros.cc: New test.
---

The libc++ devs noticed recently that GCC's <stdalign.h> doesn't conform
to the C++ requirements.

Tested x86_64-linux.

OK for trunk?

 gcc/ginclude/stdalign.h   |  5 ++--
 .../18_support/headers/cstdalign/macros.cc| 24 +++
 2 files changed, 27 insertions(+), 2 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/18_support/headers/cstdalign/macros.cc

diff --git a/gcc/ginclude/stdalign.h b/gcc/ginclude/stdalign.h
index 5f82f2d68f2..af73c322624 100644
--- a/gcc/ginclude/stdalign.h
+++ b/gcc/ginclude/stdalign.h
@@ -26,11 +26,12 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
If not, see
 #ifndef _STDALIGN_H
 #define _STDALIGN_H
 
-#if (!defined __cplusplus  \
- && !(defined __STDC_VERSION__ && __STDC_VERSION__ > 201710L))
+#if !(defined __STDC_VERSION__ && __STDC_VERSION__ > 201710L)
 
+#ifndef __cplusplus
 #define alignas _Alignas
 #define alignof _Alignof
+#endif
 
 #define __alignas_is_defined 1
 #define __alignof_is_defined 1
diff --git a/libstdc++-v3/testsuite/18_support/headers/cstdalign/macros.cc 
b/libstdc++-v3/testsuite/18_support/headers/cstdalign/macros.cc
new file mode 100644
index 000..c50c921cd59
--- /dev/null
+++ b/libstdc++-v3/testsuite/18_support/headers/cstdalign/macros.cc
@@ -0,0 +1,24 @@
+// { dg-options "-D_GLIBCXX_USE_DEPRECATED=1 -Wno-deprecated" }
+// { dg-do preprocess { target c++11 } }
+
+#include <cstdalign>
+
+#ifndef __alignas_is_defined
+# error "The header <cstdalign> fails to define a macro named __alignas_is_defined"
+#elif __alignas_is_defined != 1
+# error "__alignas_is_defined is not defined to 1 in <cstdalign>"
+#endif
+
+#ifndef __alignof_is_defined
+# error "The header <cstdalign> fails to define a macro named __alignof_is_defined"
+#elif __alignof_is_defined != 1
+# error "__alignof_is_defined is not defined to 1 in <cstdalign>"
+#endif
+
+#ifdef alignas
+# error "The header <cstdalign> defines a macro named alignas"
+#endif
+
+#ifdef alignof
+# error "The header <cstdalign> defines a macro named alignof"
+#endif
-- 
2.46.2



Re: [PATCH] SVE intrinsics: Add constant folding for svindex.

2024-10-23 Thread Richard Sandiford
Jennifer Schmitz  writes:
> This patch folds svindex with constant arguments into a vector series.
> We implemented this in svindex_impl::fold using the function build_vec_series.
> For example,
> svuint64_t f1 ()
> {
>   return svindex_u642 (10, 3);
> }
> compiled with -O2 -march=armv8.2-a+sve, is folded to {10, 13, 16, ...}
> in the gimple pass lower.
> This optimization benefits cases where svindex is used in combination with
> other gimple-level optimizations.
> For example,
> svuint64_t f2 ()
> {
> return svmul_x (svptrue_b64 (), svindex_u64 (10, 3), 5);
> }
> has previously been compiled to
> f2:
> index   z0.d, #10, #3
> mul z0.d, z0.d, #5
> ret
> Now, it is compiled to
> f2:
> mov x0, 50
> index   z0.d, x0, #15
> ret

Nice!  Thanks for doing this.

> For non-constant arguments, build_vec_series produces a VEC_SERIES_EXPR,
> which is translated back at RTL level to an index instruction without codegen
> changes.
>
> We added test cases checking
> - the application of the transform during gimple for constant arguments,
> - the interaction with another gimple-level optimization.
>
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> OK for mainline?
>
> Signed-off-by: Jennifer Schmitz 
>
> gcc/
>   * config/aarch64/aarch64-sve-builtins-base.cc
>   (svindex_impl::fold): Add constant folding.
>
> gcc/testsuite/
>   * gcc.target/aarch64/sve/index_const_fold.c: New test.
> ---
>  .../aarch64/aarch64-sve-builtins-base.cc  | 12 +++
>  .../gcc.target/aarch64/sve/index_const_fold.c | 35 +++
>  2 files changed, 47 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/index_const_fold.c
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 1c17149e1f0..f6b1657ecbb 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -1304,6 +1304,18 @@ public:
>  
>  class svindex_impl : public function_base
>  {
> +public:
> +  gimple *
> +  fold (gimple_folder &f) const override
> +  {
> +tree vec_type = TREE_TYPE (f.lhs);
> +tree base = gimple_call_arg (f.call, 0);
> +tree step = gimple_call_arg (f.call, 1);

Could we restrict this to:

  if (TREE_CODE (base) != INTEGER_CST || TREE_CODE (step) != INTEGER_CST)
return nullptr;

for now?  This goes back to the previous discussion about how "creative"
the compiler is allowed to be in replacing the user's original instruction
selection.  IIRC, it'd also be somewhat novel to use VEC_SERIES_EXPR for
constant-length vectors.

We can always relax this later if we find a compelling use case.
But it looks like the tests would still pass with the guard above.
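
That is, with the guard added, the whole fold would read roughly
(sketch):

  gimple *
  fold (gimple_folder &f) const override
  {
    tree vec_type = TREE_TYPE (f.lhs);
    tree base = gimple_call_arg (f.call, 0);
    tree step = gimple_call_arg (f.call, 1);
    if (TREE_CODE (base) != INTEGER_CST || TREE_CODE (step) != INTEGER_CST)
      return NULL;
    return gimple_build_assign (f.lhs,
				build_vec_series (vec_type, base, step));
  }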

OK with that change, thanks.

Richard

> +
> +return gimple_build_assign (f.lhs,
> + build_vec_series (vec_type, base, step));
> +  }
> +
>  public:
>rtx
>expand (function_expander &e) const override
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/index_const_fold.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/index_const_fold.c
> new file mode 100644
> index 000..7abb803f58b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/index_const_fold.c
> @@ -0,0 +1,35 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +#include <stdint.h>
> +#include <arm_sve.h>
> +
> +#define INDEX_CONST(TYPE, TY)\
> +  sv##TYPE f_##TY##_index_const ()   \
> +  {  \
> +return svindex_##TY (10, 3); \
> +  }
> +
> +#define MULT_INDEX(TYPE, TY) \
> +  sv##TYPE f_##TY##_mult_index ()\
> +  {  \
> +return svmul_x (svptrue_b8 (),   \
> + svindex_##TY (10, 3),   \
> + 5); \
> +  }
> +
> +#define ALL_TESTS(TYPE, TY)  \
> +  INDEX_CONST (TYPE, TY) \
> +  MULT_INDEX (TYPE, TY)
> +
> +ALL_TESTS (uint8_t, u8)
> +ALL_TESTS (uint16_t, u16)
> +ALL_TESTS (uint32_t, u32)
> +ALL_TESTS (uint64_t, u64)
> +ALL_TESTS (int8_t, s8)
> +ALL_TESTS (int16_t, s16)
> +ALL_TESTS (int32_t, s32)
> +ALL_TESTS (int64_t, s64)
> +
> +/* { dg-final { scan-tree-dump-times "return \\{ 10, 13, 16, ... \\}" 8 
> "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "return \\{ 50, 65, 80, ... \\}" 8 
> "optimized" } } */


Re: [PATCH 3/9] Simplify X /[ex] Y cmp Z -> X cmp (Y * Z)

2024-10-23 Thread Andrew MacLeod



On 10/18/24 12:48, Richard Sandiford wrote:

[+ranger folks, who I forgot to CC originally, sorry!]

This patch applies X /[ex] Y cmp Z -> X cmp (Y * Z) when Y * Z is
representable.  The closest check for "is representable" on range
operations seemed to be overflow_free_p.  However, that is designed
for testing existing operations and so takes the definedness of
signed overflow into account.  Here, the question is whether we
can create an entirely new value.

The patch adds a new optional argument to overflow_free_p to
distinguish these cases.  It also adds a wrapper, so that it isn't
necessary to specify TRIO_VARYING.


Rather than complicating it with an additional parameter, and dealing 
with a wrapper, perhaps we can change the return value from a bool to an 
enum,  something like


enum overflow_free {
  OF_MAY_OVERFLOW = 0,	// overflow_free_p would return false or 0;
			// this is the same as today.
  OF_OVERFLOW_FREE,	// same as overflow_free_p currently returning true.
  OF_FITS_TYPE		// Your requirement for fits_type.
};

And change the current implementations to return FITS_TYPE if possible, 
and, failing that, see if they can return OVERFLOW_FREE.


That shouldn't impact any existing uses of overflow_free_p since both 
values amount to "true", and it allows you to easily ask either question.  
You just get the most restrictive answer.
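
A wrapper for the stronger query could then be as simple as (a sketch; 
fits_type_p is just the name used earlier in the thread):

bool
range_op_handler::fits_type_p (const vrange &lh, const vrange &rh) const
{
  return overflow_free_p (lh, rh) == OF_FITS_TYPE;
}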


Andrew


I couldn't find a good way of removing the duplication between
the two operand orders.  The rules are (in a loose sense) symmetric,
but they're not based on commutativity.

gcc/
* range-op.h (range_query_type): New enum.
(range_op_handler::fits_type_p): New function.
(range_operator::overflow_free_p): Add an argument to specify the
type of query.
(range_op_handler::overflow_free_p): Likewise.
* range-op-mixed.h (operator_plus::overflow_free_p): Likewise.
(operator_minus::overflow_free_p): Likewise.
(operator_mult::overflow_free_p): Likewise.
* range-op.cc (range_op_handler::overflow_free_p): Likewise.
(range_operator::overflow_free_p): Likewise.
(operator_plus::overflow_free_p): Likewise.
(operator_minus::overflow_free_p): Likewise.
(operator_mult::overflow_free_p): Likewise.
* match.pd: Simplify X /[ex] Y cmp Z -> X cmp (Y * Z) when
Y * Z is representable.

gcc/testsuite/
* gcc.dg/tree-ssa/cmpexactdiv-7.c: New test.
* gcc.dg/tree-ssa/cmpexactdiv-8.c: Likewise.
---
  gcc/match.pd  | 21 +
  gcc/range-op-mixed.h  |  9 --
  gcc/range-op.cc   | 19 ++--
  gcc/range-op.h| 31 +--
  gcc/testsuite/gcc.dg/tree-ssa/cmpexactdiv-7.c | 21 +
  gcc/testsuite/gcc.dg/tree-ssa/cmpexactdiv-8.c | 20 
  6 files changed, 107 insertions(+), 14 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/cmpexactdiv-7.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/cmpexactdiv-8.c

diff --git a/gcc/match.pd b/gcc/match.pd
index b952225b08c..1b1d38cf105 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2679,6 +2679,27 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(le (minus (convert:etype @0) { lo; }) { hi; })
(gt (minus (convert:etype @0) { lo; }) { hi; })
  
+#if GIMPLE

+/* X /[ex] Y cmp Z -> X cmp (Y * Z), if Y * Z is representable.  */
+(for cmp (simple_comparison)
+ (simplify
+  (cmp (exact_div:s @0 @1) @2)
+  (with { int_range_max r1, r2; }
+   (if (INTEGRAL_TYPE_P (type)
+   && get_range_query (cfun)->range_of_expr (r1, @1)
+   && get_range_query (cfun)->range_of_expr (r2, @2)
+   && range_op_handler (MULT_EXPR).fits_type_p (r1, r2))
+(cmp @0 (mult @1 @2)
+ (simplify
+  (cmp @2 (exact_div:s @0 @1))
+  (with { int_range_max r1, r2; }
+   (if (INTEGRAL_TYPE_P (type)
+   && get_range_query (cfun)->range_of_expr (r1, @1)
+   && get_range_query (cfun)->range_of_expr (r2, @2)
+   && range_op_handler (MULT_EXPR).fits_type_p (r1, r2))
+(cmp (mult @1 @2) @0)
+#endif
+
  /* X + Z < Y + Z is the same as X < Y when there is no overflow.  */
  (for op (lt le ge gt)
   (simplify
diff --git a/gcc/range-op-mixed.h b/gcc/range-op-mixed.h
index cc1db2f6775..402cb87c2b2 100644
--- a/gcc/range-op-mixed.h
+++ b/gcc/range-op-mixed.h
@@ -539,7 +539,8 @@ public:
   const irange &rh) const final override;
  
virtual bool overflow_free_p (const irange &lh, const irange &rh,

-   relation_trio = TRIO_VARYING) const;
+   relation_trio = TRIO_VARYING,
+   range_query_type = QUERY_WITH_GIMPLE_UB) const;
// Check compatibility of all operands.
bool operand_check_p (tree t1, tree t2, tree t3) const final override
  { return range_compatible_p (t1, t2) && range_compatible_p (t1, t3); }
@@

Re: [PATCH] libstdc++: Replace std::__to_address in C++20 branch in

2024-10-23 Thread Jonathan Wakely
On Wed, 23 Oct 2024 at 13:18, Jonathan Wakely  wrote:
>
> As noted by Patrick, r15-4546-g85e5b80ee2de80 should have changed the
> usage of std::__to_address to std::to_address in the C++20-specific
> branch that works on types satisfying std::contiguous_iterator.
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/basic_string.h (assign(Iter, Iter)): Call
> std::to_address instead of __to_address.
> ---
> Tested x86_64-linux.
>
> This patch is also available as a pull request in the forge:
> https://forge.sourceware.org/gcc/gcc-TEST/pulls/2

Patrick approved this on the forge (thanks!), so I've added his
Reviewed-by: tag and pushed it now.

>
>  libstdc++-v3/include/bits/basic_string.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/include/bits/basic_string.h 
> b/libstdc++-v3/include/bits/basic_string.h
> index 16e356e0678..28b3e536185 100644
> --- a/libstdc++-v3/include/bits/basic_string.h
> +++ b/libstdc++-v3/include/bits/basic_string.h
> @@ -1748,7 +1748,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
> {
>   __glibcxx_requires_valid_range(__first, __last);
>   return _M_replace(size_type(0), size(),
> -   std::__to_address(__first), __last - __first);
> +   std::to_address(__first), __last - __first);
> }
>  #endif
>   else
> --
> 2.46.2
>



Re: [PATCH 2/2] tree-optimization/116575 - SLP masked load-lanes discovery

2024-10-23 Thread Richard Sandiford
Richard Biener  writes:
> The following implements masked load-lane discovery for SLP.  The
> challenge here is that a masked load has a full-width mask with
> group-size number of elements; when this becomes a masked load-lanes
> instruction, one mask element gates all group members.  We already
> have some discovery hints in place, namely STMT_VINFO_SLP_VECT_ONLY
> to guard non-uniform masks, but we need to choose a way for SLP
> discovery to handle possible masked load-lanes SLP trees.
>
> I have this time chosen to handle load-lanes discovery where we
> have performed permute optimization already and conveniently got
> the graph with predecessor edges built.  This is because unlike
> non-masked loads masked loads with a load_permutation are never
> produced by SLP discovery (because load permutation handling doesn't
> handle un-permuting the mask) and thus the load-permutation lowering
> which handles non-masked load-lanes discovery doesn't trigger.
>
> With this SLP discovery for a possible masked load-lanes, thus
> a masked load with uniform mask, produces a splat of a single-lane
> sub-graph as the mask SLP operand.  This is a representation that
> shouldn't pessimize the mask load case and allows the masked load-lanes
> transform to simply elide this splat.

It's been too long since I did significant work on the vectoriser for
me to make a sensible comment on this, but FWIW, I agree the representation
of a splatted mask sounds good.

> This fixes the aarch64-sve.exp mask_struct_load*.c testcases with
> --param vect-force-slp=1
>
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>
> I realize we are still quite inconsistent in how we do SLP
> discovery - mainly because of my idea to only apply minimal
> changes at this point.  I would expect that permuted masked loads
> miss the interleaving lowering performed by load permutation
> lowering.  And if we fix that we again have to decide whether
> to interleave or load-lane at the same time.  I'm also not sure
> how much good the optimize_slp passes to do VEC_PERMs in the
> SLP graph and what stops working when there are no longer any
> load_permutations in there.

Yeah, I'm also not sure about that.  The code only considers candidate
layouts that would undo a load permutation or a bijective single-input
VEC_PERM_EXPR.  It won't do anything for 2-to-1 permutes or single-input
packs.  The current layout selection is probably quite outdated at
this point.

Thanks,
Richard

> Richard.
>
>   PR tree-optimization/116575
>   * tree-vect-slp.cc (vect_get_and_check_slp_defs): Handle
>   gaps, aka NULL scalar stmt.
>   (vect_build_slp_tree_2): Allow gaps in the middle of a
>   grouped mask load.  When the mask of a grouped mask load
>   is uniform do single-lane discovery for the mask and
>   insert a splat VEC_PERM_EXPR node.
>   (vect_optimize_slp_pass::decide_masked_load_lanes): New
>   function.
>   (vect_optimize_slp_pass::run): Call it.
> ---
>  gcc/tree-vect-slp.cc | 138 ++-
>  1 file changed, 135 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index fca9ae86d2e..037098a96cb 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -641,6 +641,16 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned 
> char swap,
>unsigned int commutative_op = -1U;
>bool first = stmt_num == 0;
>  
> +  if (!stmt_info)
> +{
> +  for (auto oi : *oprnds_info)
> + {
> +   oi->def_stmts.quick_push (NULL);
> +   oi->ops.quick_push (NULL_TREE);
> + }
> +  return 0;
> +}
> +
>if (!is_a (stmt_info->stmt)
>&& !is_a (stmt_info->stmt)
>&& !is_a (stmt_info->stmt))
> @@ -2029,9 +2039,11 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
>   has_gaps = true;
> /* We cannot handle permuted masked loads directly, see
>PR114375.  We cannot handle strided masked loads or masked
> -  loads with gaps.  */
> +  loads with gaps unless the mask is uniform.  */
> if ((STMT_VINFO_GROUPED_ACCESS (stmt_info)
> -&& (DR_GROUP_GAP (first_stmt_info) != 0 || has_gaps))
> +&& (DR_GROUP_GAP (first_stmt_info) != 0
> +|| (has_gaps
> +&& STMT_VINFO_SLP_VECT_ONLY (first_stmt_info
> || STMT_VINFO_STRIDED_P (stmt_info))
>   {
> load_permutation.release ();
> @@ -2054,7 +2066,12 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
> unsigned i = 0;
> for (stmt_vec_info si = first_stmt_info;
>  si; si = DR_GROUP_NEXT_ELEMENT (si))
> - stmts2[i++] = si;
> + {
> +   if (si != first_stmt_info)
> + for (unsigned k = 1; k < DR_GROUP_GAP (si); ++k)
> +   stmts2[i+

Re: [PATCH] c++: Implement P2662R3, Pack Indexing [PR113798]

2024-10-23 Thread Patrick Palka
On Tue, 22 Oct 2024, Marek Polacek wrote:

> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> 
> -- >8 --
> This patch implements C++26 Pack Indexing, as described in
> .
> 
> The issue discussing how to mangle pack indexes has not been resolved
> yet  and I've
> made no attempt to address it so far.
> 
> Rather than introducing a new template code for a pack indexing, I'm
> adding a new operand to EXPR_PACK_EXPANSION to store the index; for
> TYPE_PACK_EXPANSION, I'm stashing the index into TYPE_VALUES_RAW.  This

What are the pros and cons of reusing TYPE/EXPR_PACK_EXPANSION instead
of creating two new tree codes for these operators (one of whose
operands would itself be a bare TYPE/EXPR_PACK_EXPANSION)?

I feel a little iffy at first glance about reusing these tree codes
since it muddles what "kind" of tree they are: currently they represent
a _vector_ or types/exprs (which is reflected by their tcc_exceptional
class), and with this approach they can now also represent a single
type/expr (despite their tcc_exceptional class), depending on whether
PACK_EXPANSION_INDEX is set.

At the same time, the pattern of a generic *_PACK_EXPANSION can be
anything whereas for these index operators we know it's always a single
bare pack, so we also don't need the full expressivity of
*_PACK_EXPANSION to represent these operators either.

Maybe it's the case that reusing these tree codes significantly
simplifies parts of the implementation?

> feature is akin to __type_pack_element, so they can share the element
> extraction part.
> 
> A pack indexing in a decltype proved to be a bit tricky; eventually,
> I've added PACK_EXPANSION_PARENTHESIZED_P -- while parsing, we can't
> really tell what it's going to expand to.
> 
> With this feature, it's valid to write something like
> 
>   using U = tmpl;
> 
> where we first expand the template argument into
> 
>   Ts...[Is#0], Ts...[Is#1], ...
> 
> and then substitute each individual pack index.
> 
> I have no test for the module.cc code; that is just guesswork.
> 
>   PR c++/113798
> 
> gcc/cp/ChangeLog:
> 
>   * cp-tree.def (EXPR_PACK_EXPANSION): Add another operand.
>   * cp-tree.h (PACK_EXPANSION_INDEX): Define.
>   (PACK_EXPANSION_PARENTHESIZED_P): Define.
>   (pack_index_element): Declare.
>   * cxx-pretty-print.cc (cxx_pretty_printer::expression)
>   : Print PACK_EXPANSION_INDEX.
>   (cxx_pretty_printer::type_id) : Print
>   PACK_EXPANSION_INDEX.
>   * decl.cc (xref_basetypes): Set PACK_EXPANSION_INDEX.
>   * error.cc (dump_type): Print PACK_EXPANSION_INDEX.
>   * mangle.cc (write_type) : New comment.
>   * module.cc (trees_out::type_node): Stream PACK_EXPANSION_INDEX.
>   (trees_in::tree_node): Read in PACK_EXPANSION_INDEX.
>   * parser.cc (cp_parser_pack_index): New.
>   (cp_parser_primary_expression): Handle a pack-index-expression.
>   (cp_parser_unqualified_id): Handle a pack-index-specifier.
>   (cp_parser_nested_name_specifier_opt): Also check for
>   a pack-index-specifier.  Handle a pack-index-specifier.
>   (cp_parser_mem_initializer_id): Handle a pack-index-specifier.
>   (cp_parser_simple_type_specifier): Likewise.
>   (cp_parser_base_specifier): Likewise.
>   * pt.cc (iterative_hash_template_arg) : Also
>   hash PACK_EXPANSION_INDEX.
>   (find_parameter_packs_r) : A type with
>   PACK_EXPANSION_INDEX is not a bare parameter pack.
>   : Walk into PACK_EXPANSION_INDEXes.
>   (instantiate_class_template): Handle a pack-index-specifier.
>   (tsubst_pack_expansion): tsubst_expr the PACK_EXPANSION_INDEX.
>   If there was a PACK_EXPANSION_INDEX, pull out the element via
>   pack_index_element.
>   (tsubst) : Handle a pack-index-specifier.
>   : For a PACK_EXPANSION_P, figure out if it should
>   be treated as an id-expression.
>   : Handle it if there is a
>   PACK_EXPANSION_INDEX.
>   (tsubst_stmt) : Likewise.
>   (tsubst_expr) : Likewise.
>   (tsubst_initializer_list): Handle a pack-index-specifier.
>   * ptree.cc (cxx_print_type) : Print
>   the PACK_EXPANSION_INDEX.
>   * semantics.cc (finish_parenthesized_expr): Maybe set
>   PACK_EXPANSION_PARENTHESIZED_P.
>   (finish_base_specifier): Check for a PACK_EXPANSION_P with
>   a PACK_EXPANSION_INDEX.
>   (get_vec_elt_checking): New, broken out of finish_type_pack_element.
>   (finish_type_pack_element): Call get_vec_elt_checking.
>   (pack_index_element): New.
>   * tree.cc (cp_build_qualified_type): Set PACK_EXPANSION_INDEX.
>   (cp_tree_equal) : Also compare the
>   PACK_EXPANSION_INDEXes.
>   (cp_walk_subtrees)case EXPR_PACK_EXPANSION>: Walk the PACK_EXPANSION_INDEX.
>   * typeck.cc (structural_comptypes) : Also
>   compare the PACK_EXPANSION_INDEXes.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g+

[pushed] doc: remove obsolete deprecated info

2024-10-23 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

These formerly deprecated features eventually made it into the C++ standard.

gcc/ChangeLog:

* doc/extend.texi (Deprecated Features): Remove text about some
no-longer-deprecated features.
---
 gcc/doc/extend.texi | 10 --
 1 file changed, 10 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 42bd567119d..6c2d6a610cd 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -30169,16 +30169,6 @@ The use of default arguments in function pointers, 
function typedefs
 and other places where they are not permitted by the standard is
 deprecated and will be removed from a future version of G++.
 
-G++ allows floating-point literals to appear in integral constant expressions,
-e.g.@: @samp{ enum E @{ e = int(2.2 * 3.7) @} }
-This extension is deprecated and will be removed from a future version.
-
-G++ allows static data members of const floating-point type to be declared
-with an initializer in a class definition. The standard only allows
-initializers for static members of const integral types and const
-enumeration types so this extension has been deprecated and will be removed
-from a future version.
-
 G++ allows attributes to follow a parenthesized direct initializer,
 e.g.@: @samp{ int f (0) __attribute__ ((something)); } This extension
 has been ignored since G++ 3.3 and is deprecated.

base-commit: 22a37534c640ca5ff2f0e947dfe60df59fb6bba1
-- 
2.47.0



Re: counted_by attribute and type compatibility

2024-10-23 Thread Qing Zhao


> On Oct 22, 2024, at 15:16, Martin Uecker  wrote:
> 
 
>>> 
>>> It doesn't really make sense when they are inconsistent.
>>> Still, we could just warn and pick one of the attributes
>>> when forming the composite type.
>> 
>> If both are defined locally, such inconsistencies should be very easily
>> fixed at the source code level.
>> And I think that such inconsistencies should be fixed in the source code.
>> Therefore, reporting an error
>> makes more sense to me.
> 
> I agree, and error makes sense.  What worries me a little bit
> is tying this to a semantic change in type compatibility.
> 
> typedef struct foo { int n; int m; 
> [[gnu::counted_by(n)]] char buf[]; } aaa_t;
> 
> void foo()
> {
>  struct foo { int n; int m;
> [[gnu::counted_by(m)]] char buf[]; } *b;
> 
>  ... = _Generic(b, aaa_t*: 1, default: 0); 
> }
> 
> would go into the default branch for compilers supporting 
> the attribute but go into the first branch for others.  Also
> it affects aliasing rules.

So, they are in separate compilations? Then the compiler is not able to
catch such an inconsistency at compile time.
> 
> But maybe this is not a problem.
This does look like an issue to me…
Not sure how to resolve such an issue at this moment.

Or can such an issue be resolved only when the “counted_by” information is
included in the TYPE?

> 
>>> 
>>> But I was thinking about the case where you have a type with
>>> a counted_by attribute and one without. Using them together
>>> seems useful, e.g. to add a counted_by in your local version
>>> of a type which needs to be compatible to some API.
>> 
>> For API compatibility purposes, yes, I agree here.
>> A stupid question here: if one is defined locally and the other one
>> is NOT defined locally, can such an inconsistency be caught by the
>> same compilation (is this the LTO compilation?)
> 
> If there is separate compilation this is not caught. LTO
> has a much coarser notion of types and would not notice
> either (I believe).

Okay. Then such inconsistency will not be caught during compilation time.

> 
>> Suppose we can catch such inconsistency in the same compilation,
>> which version we should keep? I guess that we should keep the
>> version without the counted_by attribute? 
>> 
> I would keep the one with the attribute, because this is the
> one which has more information. 
Make sense to me.

Thanks.
Qing
> 
> 
> Martin




Re: [PATCH v6] Target-independent store forwarding avoidance.

2024-10-23 Thread Konstantinos Eleftheriou
Hi Jeff, thanks for the feedback.
Indeed, there was an issue with copying back the load register when
the load is eliminated.
I just sent a new version
(https://gcc.gnu.org/pipermail/gcc-patches/2024-October/666230.html).

On Fri, Oct 18, 2024 at 9:55 PM Jeff Law  wrote:
>
>
>
> On 10/18/24 3:57 AM, Konstantinos Eleftheriou wrote:
> > From: kelefth 
> >
> > This pass detects cases of expensive store forwarding and tries to avoid 
> > them
> > by reordering the stores and using suitable bit insertion sequences.
> > For example it can transform this:
> >
> >   strbw2, [x1, 1]
> >   ldr x0, [x1]  # Expensive store forwarding to larger load.
> >
> > To:
> >
> >   ldr x0, [x1]
> >   strbw2, [x1]
> >   bfi x0, x2, 0, 8
> >
> > Assembly like this can appear with bitfields or type punning / unions.
> > On stress-ng when running the cpu-union microbenchmark the following 
> > speedups
> > have been observed.
> >
> >Neoverse-N1:  +29.4%
> >Intel Coffeelake: +13.1%
> >AMD 5950X:        +17.5%
> So just fired this up on the crosses after enabling it by default.  It's
> still got several hours to go, but there's a pretty clear goof in here
> that's showing up on multiple targets.
>
> Just taking mcore-elf as an example, we're mis-compiling muldi3 from libgcc.
>
> We have this in the .asmcons dump:
>
> > (insn 37 36 40 4 (set (mem/j/c:SI (reg/f:SI 8 r8) [1 MEM[(union  
> > *)_61].s.low+0 S4 A64])
> > (reg:SI 77 [ _10 ])) "/home/jlaw/test/gcc/libgcc/libgcc2.c":532:649 
> > discrim 3 65 {*mcore.md:1196}
> >  (expr_list:REG_DEAD (reg:SI 77 [ _10 ])
> > (nil)))
> [ ... ]
>
> > (insn 44 43 45 4 (set (mem/j/c:SI (plus:SI (reg/f:SI 8 r8)
> > (const_int 4 [0x4])) [1 MEM[(union  *)_61].s.high+0 S4 A32])
> > (reg:SI 81 [ _18 ])) "/home/jlaw/test/gcc/libgcc/libgcc2.c":534:12 
> > 65 {*mcore.md:1196}
> >  (expr_list:REG_DEAD (reg:SI 81 [ _18 ])
> > (nil)))
> > (note 45 44 49 4 NOTE_INSN_DELETED)
> > (insn 49 45 50 4 (set (reg/i:DI 2 r2)
> > (mem/j/c:DI (reg/f:SI 8 r8) [1 MEM[(union  *)_61].ll+0 S8 A64])) 
> > "/home/jlaw/test/gcc/libgcc/libgcc2.c":538:1 68 {movdi_i}
> >  (nil))
>
> So we've got two SImode stores which are then loaded in DImode a bit
> later to set the return value for the function.  A very clear
> opportunity to do store forwarding.
>
>
> In the store-forwarding dump we have:
> > (insn 70 36 40 4 (set (reg:SI 95)
> > (reg:SI 77 [ _10 ])) "/home/jlaw/test/gcc/libgcc/libgcc2.c":532:649 
> > discrim 3 65 {*mcore.md:1196}
> >  (nil))
> [ ... ]
> > (insn 67 43 45 4 (set (reg:SI 94)
> > (reg:SI 81 [ _18 ])) "/home/jlaw/test/gcc/libgcc/libgcc2.c":534:12 
> > 65 {*mcore.md:1196}
> >  (nil))
> > (note 45 67 71 4 NOTE_INSN_DELETED)
> > (insn 71 45 69 4 (set (mem/j/c:SI (reg/f:SI 8 r8) [1 MEM[(union  
> > *)_61].s.low+0 S4 A64])
> > (reg:SI 95)) "/home/jlaw/test/gcc/libgcc/libgcc2.c":538:1 65 
> > {*mcore.md:1196}
> >  (nil))
> > (insn 69 71 68 4 (set (reg:DI 93)
> > (subreg:DI (reg:SI 95) 0)) 
> > "/home/jlaw/test/gcc/libgcc/libgcc2.c":538:1 68 {movdi_i}
> >  (nil))
> > (insn 68 69 66 4 (set (mem/j/c:SI (plus:SI (reg/f:SI 8 r8)
> > (const_int 4 [0x4])) [1 MEM[(union  *)_61].s.high+0 S4 A32])
> > (reg:SI 94)) "/home/jlaw/test/gcc/libgcc/libgcc2.c":538:1 65 
> > {*mcore.md:1196}
> >  (nil))
> > (insn 66 68 50 4 (set (subreg:SI (reg:DI 93) 4)
> > (reg:SI 94)) "/home/jlaw/test/gcc/libgcc/libgcc2.c":538:1 65 
> > {*mcore.md:1196}
> >  (nil))
>
> Note that we never put a value into (reg:DI 2), so the return value from
> this routine is garbage, naturally leading to testsuite failures.
>
> It looks like we're missing a copy from (reg:DI 93) to (reg:DI 2) to me.
>
> You should be able to see this with a cross compiler and don't need
> binutils/gas, newlib, etc.
>
> Compile the attached testcase with an mcore-elf configured compiler with
> -O3 -favoid-store-forwarding
>
>
> Related, but obviously not a requirement to go forward.   After the SFB
> elimination, the two stores at insns 71, 68 are dead and could be
> removed.  In theory DSE should have eliminated them, but doesn't, for
> reasons I haven't investigated.
>
> Jeff
>


Re: [PATCH 3/3] aarch64: Add SVE support for simd clones [PR 96342]

2024-10-23 Thread Victor Do Nascimento

On 2/1/24 21:59, Richard Sandiford wrote:

Andre Vieira  writes:

This patch finalizes adding support for the generation of SVE simd clones when
no simdlen is provided, following the ABI rules where the widest data type
determines the minimum number of elements in a length-agnostic vector.

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (add_sve_type_attribute): Declare.
* config/aarch64/aarch64-sve-builtins.cc (add_sve_type_attribute): Make
visibility global and support use for non_acle types.
* config/aarch64/aarch64.cc
(aarch64_simd_clone_compute_vecsize_and_simdlen): Create VLA simd clone
when no simdlen is provided, according to ABI rules.
(simd_clone_adjust_sve_vector_type): New helper function.
(aarch64_simd_clone_adjust): Add '+sve' attribute to SVE simd clones
and modify types to use SVE types.
* omp-simd-clone.cc (simd_clone_mangle): Print 'x' for VLA simdlen.
(simd_clone_adjust): Adapt safelen check to be compatible with VLA
simdlen.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/declare-variant-14.c: Make i?86 and x86_64 target
only test.
* gfortran.dg/gomp/declare-variant-14.f90: Likewise.
* gcc.target/aarch64/declare-simd-2.c: Add SVE clone scan.
* gcc.target/aarch64/vect-simd-clone-1.c: New test.

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index a0b142e0b94..207396de0ff 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1031,6 +1031,8 @@ namespace aarch64_sve {
  #ifdef GCC_TARGET_H
bool verify_type_context (location_t, type_context_kind, const_tree, bool);
  #endif
+ void add_sve_type_attribute (tree, unsigned int, unsigned int,
+ const char *, const char *);
  }
  
  extern void aarch64_split_combinev16qi (rtx operands[3]);

diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc 
b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 11f5c5c500c..747131e684e 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -953,14 +953,16 @@ static bool reported_missing_registers_p;
  /* Record that TYPE is an ABI-defined SVE type that contains NUM_ZR SVE 
vectors
 and NUM_PR SVE predicates.  MANGLED_NAME, if nonnull, is the ABI-defined
 mangling of the type.  ACLE_NAME is the  name of the type.  */
-static void
+void
  add_sve_type_attribute (tree type, unsigned int num_zr, unsigned int num_pr,
const char *mangled_name, const char *acle_name)
  {
tree mangled_name_tree
  = (mangled_name ? get_identifier (mangled_name) : NULL_TREE);
+  tree acle_name_tree
+= (acle_name ? get_identifier (acle_name) : NULL_TREE);
  
-  tree value = tree_cons (NULL_TREE, get_identifier (acle_name), NULL_TREE);

+  tree value = tree_cons (NULL_TREE, acle_name_tree, NULL_TREE);
value = tree_cons (NULL_TREE, mangled_name_tree, value);
value = tree_cons (NULL_TREE, size_int (num_pr), value);
value = tree_cons (NULL_TREE, size_int (num_zr), value);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 31617510160..cba8879ab33 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -28527,7 +28527,7 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct 
cgraph_node *node,
int num, bool explicit_p)
  {
tree t, ret_type;
-  unsigned int nds_elt_bits;
+  unsigned int nds_elt_bits, wds_elt_bits;
unsigned HOST_WIDE_INT const_simdlen;
  
if (!TARGET_SIMD)

@@ -28572,10 +28572,14 @@ aarch64_simd_clone_compute_vecsize_and_simdlen 
(struct cgraph_node *node,
if (TREE_CODE (ret_type) != VOID_TYPE)
  {
nds_elt_bits = lane_size (SIMD_CLONE_ARG_TYPE_VECTOR, ret_type);
+  wds_elt_bits = nds_elt_bits;
vec_elts.safe_push (std::make_pair (ret_type, nds_elt_bits));
  }
else
-nds_elt_bits = POINTER_SIZE;
+{
+  nds_elt_bits = POINTER_SIZE;
+  wds_elt_bits = 0;
+}
  
int i;

tree type_arg_types = TYPE_ARG_TYPES (TREE_TYPE (node->decl));
@@ -28583,44 +28587,72 @@ aarch64_simd_clone_compute_vecsize_and_simdlen 
(struct cgraph_node *node,
for (t = (decl_arg_p ? DECL_ARGUMENTS (node->decl) : type_arg_types), i = 0;
 t && t != void_list_node; t = TREE_CHAIN (t), i++)
  {
-  tree arg_type = decl_arg_p ? TREE_TYPE (t) : TREE_VALUE (t);
+  tree type = decl_arg_p ? TREE_TYPE (t) : TREE_VALUE (t);
if (clonei->args[i].arg_type != SIMD_CLONE_ARG_TYPE_UNIFORM
- && !supported_simd_type (arg_type))
+ && !supported_simd_type (type))
{
  if (!explicit_p)
;
- else if (COMPLEX_FLOAT_TYPE_P (ret_type))
+ else if (COMPLEX_FLOAT_TYPE_P (type))
warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
"GCC does not curren

Re: [Bug libstdc++/115285] [12/13/14/15 Regression] std::unordered_set can have duplicate value

2024-10-23 Thread François Dumont

Sorry, but I'm not sure: is it also OK for the 3 backports?

On 22/10/2024 22:43, Jonathan Wakely wrote:

On Tue, 22 Oct 2024 at 18:28, François Dumont  wrote:

Hi

  libstdc++: Always instantiate key_type to compute hash code [PR115285]

  Even if it is possible to compute a hash code from the inserted arguments
  we need to instantiate the key_type to guarantee hash code consistency.

  Preserve the lazy instantiation of the mapped_type in the context of
  associative containers.
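
As a minimal sketch of the hazard (illustrative only; the real fix is in
_S_forward_key, and the names below are not from the patch):

  #include <string>
  #include <unordered_set>

  int
  main ()
  {
    std::unordered_set<std::string> s;
    /* Both inserts must hash the same key_type value: if the hash were
       computed from the raw argument rather than from a constructed
       std::string, the two hashes could disagree and the set could end
       up holding duplicates.  */
    s.insert ("dup");
    s.insert (std::string ("dup"));
    return s.size () == 1 ? 0 : 1;  /* expect a single element */
  }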

  libstdc++-v3/ChangeLog:

  PR libstdc++/115285
  * include/bits/hashtable.h (_S_forward_key<_Kt>): Always
return a temporary
  key_type instance.
  * testsuite/23_containers/unordered_map/96088.cc: Adapt to
additional instantiation.
  Also check that mapped_type is not instantiated when there
is no insertion.
  * testsuite/23_containers/unordered_multimap/96088.cc:
Adapt to additional
  instantiation.
  * testsuite/23_containers/unordered_multiset/96088.cc:
Likewise.
  * testsuite/23_containers/unordered_set/96088.cc: Likewise.
  * testsuite/23_containers/unordered_set/pr115285.cc: New
test case.


Tested under Linux x64,

ok to commit ?

OK, thanks



RE: [PATCH v3] aarch64: Improve scalar mode popcount expansion by using SVE [PR113860]

2024-10-23 Thread Pengxuan Zheng (QUIC)
> Pengxuan Zheng  writes:
> > This is similar to the recent improvements to the Advanced SIMD
> > popcount expansion by using SVE. We can utilize SVE to generate more
> > efficient code for scalar mode popcount too.
> >
> > Changes since v1:
> > * v2: Add a new VNx1BI mode and a new test case for V1DI.
> > * v3: Abandon VNx1BI changes and add a new variant of
> aarch64_ptrue_reg.
> 
> Sorry for the slow review.
> 
> The patch looks good though.  OK with the changes below:
> 
> > diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt12.c
> > b/gcc/testsuite/gcc.target/aarch64/popcnt12.c
> > new file mode 100644
> > index 000..f086cae55a2
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/popcnt12.c
> > @@ -0,0 +1,18 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fgimple" } */
> > +/* { dg-final { check-function-bodies "**" "" "" } } */
> > +
> 
> It's probably safer to add:
> 
> #pragma GCC target "+nosve"
> 
> here, so that we don't try to use the SVE instructions.
> 
> > +/*
> > +** foo:
> > +** cnt v0.8b, v0.8b
> > +** addv b0, v0.8b
> 
> Nothing requires the temporary register to be v0, so this should be something
> like:
> 
>   cnt (v[0-9]+\.8b), v0\.8b
>   addv b0, \1

Good point! I've updated the testcase and pushed the patch as 
r15-4579-g9ffcf1f193b47.

Thanks,
Pengxuan
> 
> Thanks,
> Richard
> 
> > +** ret
> > +*/
> > +__Uint64x1_t __GIMPLE
> > +foo (__Uint64x1_t x)
> > +{
> > +  __Uint64x1_t z;
> > +
> > +  z = .POPCOUNT (x);
> > +  return z;
> > +}


Re: [Bug libstdc++/115285] [12/13/14/15 Regression] std::unordered_set can have duplicate value

2024-10-23 Thread Jonathan Wakely
On Wed, 23 Oct 2024 at 18:37, François Dumont  wrote:
>
> Sorry, but I'm not sure: is it also OK for the 3 backports?

Yeah, I should have said - OK for the branches too, thanks.

>
> On 22/10/2024 22:43, Jonathan Wakely wrote:
> > On Tue, 22 Oct 2024 at 18:28, François Dumont  wrote:
> >> Hi
> >>
> >>   libstdc++: Always instantiate key_type to compute hash code 
> >> [PR115285]
> >>
> >>   Even if it is possible to compute a hash code from the inserted
> >> arguments
> >>   we need to instantiate the key_type to guarantee hash code 
> >> consistency.
> >>
> >>   Preserve the lazy instantiation of the mapped_type in the context of
> >>   associative containers.
> >>
> >>   libstdc++-v3/ChangeLog:
> >>
> >>   PR libstdc++/115285
> >>   * include/bits/hashtable.h (_S_forward_key<_Kt>): Always
> >> return a temporary
> >>   key_type instance.
> >>   * testsuite/23_containers/unordered_map/96088.cc: Adapt to
> >> additional instantiation.
> >>   Also check that mapped_type is not instantiated when there
> >> is no insertion.
> >>   * testsuite/23_containers/unordered_multimap/96088.cc:
> >> Adapt to additional
> >>   instantiation.
> >>   * testsuite/23_containers/unordered_multiset/96088.cc:
> >> Likewise.
> >>   * testsuite/23_containers/unordered_set/96088.cc: Likewise.
> >>   * testsuite/23_containers/unordered_set/pr115285.cc: New
> >> test case.
> >>
> >>
> >> Tested under Linux x64,
> >>
> >> ok to commit ?
> > OK, thanks
> >
>



testsuite: Use -std=gnu17 in gcc.dg/pr114115.c

2024-10-23 Thread Joseph Myers
One test failing with a -std=gnu23 default that I wanted to
investigate further is gcc.dg/pr114115.c.  Building with -std=gnu23
produces a warning:

pr114115.c:18:8: warning: 'ifunc' resolver for 'foo_ifunc2' should return 'void * (*)(void)' [-Wattribute-alias=]

It turns out that this warning (from cgraphunit.cc) is disabled for
unprototyped functions.  It's not immediately obvious that being
unprototyped has much to do with such incompatibilities of return type
(void versus void *), but it still seems reasonable to address this
warning by adding -std=gnu17 to the options for this testcase, so
minimizing the perturbation to what it tests.

Tested for x86_64.

* gcc.dg/pr114115.c: Use -std=gnu17.

diff --git a/gcc/testsuite/gcc.dg/pr114115.c b/gcc/testsuite/gcc.dg/pr114115.c
index 2629f591877..c8ed4913dbf 100644
--- a/gcc/testsuite/gcc.dg/pr114115.c
+++ b/gcc/testsuite/gcc.dg/pr114115.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O0 -fprofile-generate -fdump-tree-optimized" } */
+/* { dg-options "-std=gnu17 -O0 -fprofile-generate -fdump-tree-optimized" } */
 /* { dg-require-profiling "-fprofile-generate" } */
 /* { dg-require-ifunc "" } */
 

-- 
Joseph S. Myers
josmy...@redhat.com



[PATCH v2 9/9] aarch64: Handle alignment when it is bigger than BIGGEST_ALIGNMENT

2024-10-23 Thread Evgeny Karpov
Tuesday, October 22, 2024
Richard Sandiford  wrote:

>> If ASM_OUTPUT_ALIGNED_LOCAL uses an alignment less than BIGGEST_ALIGNMENT,
>> it might trigger a relocation issue.
>>
>> relocation truncated to fit: IMAGE_REL_ARM64_PAGEOFFSET_12L
>
> Sorry to press the issue, but: why does that happen?

#define IMAGE_REL_ARM64_PAGEOFFSET_12L  0x0007  /* The 12-bit page offset of 
the target, for instruction LDR (indexed, unsigned immediate). */

Based on the documentation for LDR
https://developer.arm.com/documentation/ddi0596/2020-12/Base-Instructions/LDR--immediate---Load-Register--immediate--
"For the 64-bit variant: is the optional positive immediate byte offset, a 
multiple of 8 in the range 0 to 32760, defaulting to 0 and encoded in the 
"imm12" field as /8"

This means BIGGEST_ALIGNMENT (128) could be replaced with 64.

auto rounded = ROUND_UP (MAX ((SIZE), 1),                       \
                         MAX ((ALIGNMENT), 64) / BITS_PER_UNIT);

It works for most cases, however, not for all of them.

cross-aarch64-w64-mingw32-msvcrt/lib/gcc/aarch64-w64-mingw32/15.0.0/libgomp.a(oacc-profiling.o):
 in function `goacc_profiling_initialize':
mingw-woarm64-build/code/gcc/libgomp/oacc-profiling.c:105:(.text+0x2c): 
relocation truncated to fit: IMAGE_REL_ARM64_PAGEOFFSET_12L against `no symbol'

This case should be investigated separately and fixed in a follow-up
patch series; BIGGEST_ALIGNMENT should be used for now.

>>> Better to use "auto" rather than "unsigned".
>> It looks like "auto" cannot be used there.
>
>What goes wrong if you use it?
>
>The reason for asking for "auto" was to avoid silent truncation.

After the second try and recompiling, it looks like it works.

Regards,
Evgeny


diff --git a/gcc/config/aarch64/aarch64-coff.h 
b/gcc/config/aarch64/aarch64-coff.h
index 8fc6ca0440d..52c8c8d99c2 100644
--- a/gcc/config/aarch64/aarch64-coff.h
+++ b/gcc/config/aarch64/aarch64-coff.h
@@ -58,6 +58,13 @@
   assemble_name ((FILE), (NAME)),              \
   fprintf ((FILE), "," HOST_WIDE_INT_PRINT_UNSIGNED "\n", (ROUNDED)))

+#define ASM_OUTPUT_ALIGNED_LOCAL(FILE, NAME, SIZE, ALIGNMENT)  \
+  {                                                            \
+    auto rounded = ROUND_UP (MAX ((SIZE), 1),          \
+      MAX ((ALIGNMENT), BIGGEST_ALIGNMENT) / BITS_PER_UNIT);   \
+    ASM_OUTPUT_LOCAL (FILE, NAME, SIZE, rounded);              \
+  }
+
 #define ASM_OUTPUT_SKIP(STREAM, NBYTES)        \
   fprintf (STREAM, "\t.space\t%d  // skip\n", (int) (NBYTES))


Re: [PATCH] c++: Implement P2662R3, Pack Indexing [PR113798]

2024-10-23 Thread Jason Merrill

On 10/23/24 10:20 AM, Patrick Palka wrote:

On Tue, 22 Oct 2024, Marek Polacek wrote:


Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This patch implements C++26 Pack Indexing, as described in
.

The issue discussing how to mangle pack indexes has not been resolved
yet  and I've
made no attempt to address it so far.

Rather than introducing a new template code for a pack indexing, I'm
adding a new operand to EXPR_PACK_EXPANSION to store the index; for
TYPE_PACK_EXPANSION, I'm stashing the index into TYPE_VALUES_RAW.  This


What are the pros and cons of reusing TYPE/EXPR_PACK_EXPANSION instead
of creating two new tree codes for these operators (one of whose
operands would itself be a bare TYPE/EXPR_PACK_EXPANSION)?

I feel a little iffy at first glance about reusing these tree codes
since it muddles what "kind" of tree they are: currently they represent
a _vector_ of types/exprs (which is reflected by their tcc_exceptional
class), and with this approach they can now also represent a single
type/expr (despite their tcc_exceptional class), depending on whether
PACK_EXPANSION_INDEX is set.


Yeah, I made a similar comment.


At the same time, the pattern of a generic *_PACK_EXPANSION can be
anything whereas for these index operators we know it's always a single
bare pack, so we also don't need the full expressivity of
*_PACK_EXPANSION to represent these operators either.


I imagine that someone will want to extend it to indexing into an 
arbitrary pack expansion before long, so I wouldn't try too hard to 
simplify based on that assumption.
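
For reference, a short sketch of the surface syntax under discussion
(assuming a compiler with P2662R3 support; names are illustrative):

  template <typename... Ts>
  struct pick
  {
    using first = Ts...[0];                      // pack-index-specifier
    using last  = Ts...[sizeof... (Ts) - 1];
  };

  template <auto... Vs>
  constexpr auto second () { return Vs...[1]; }  // pack-index-expression

  static_assert (second<1, 2, 3> () == 2);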


Jason



[committed] libstdc++: Add -D_GLIBCXX_ASSERTIONS default for -O0 to API history

2024-10-23 Thread Jonathan Wakely
Excuse the huge diff, it's because it adds a new section heading so all
the TOC pages and section listings change.

Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* doc/xml/manual/evolution.xml: Document that assertions are
enabled for unoptimized builds.
* doc/html/*: Regenerate.
---
 libstdc++-v3/doc/html/index.html   | 2 +-
 libstdc++-v3/doc/html/manual/api.html  | 2 ++
 libstdc++-v3/doc/html/manual/appendix.html | 2 +-
 libstdc++-v3/doc/html/manual/appendix_porting.html | 2 +-
 libstdc++-v3/doc/html/manual/index.html| 2 +-
 libstdc++-v3/doc/xml/manual/evolution.xml  | 8 
 6 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/doc/html/index.html b/libstdc++-v3/doc/html/index.html
index 395908f17a1..e96f2cb2704 100644
--- a/libstdc++-v3/doc/html/index.html
+++ b/libstdc++-v3/doc/html/index.html
@@ -142,7 +142,7 @@
 Existing tests
 
 C++11 Requirements Test Sequence Descriptions
-ABI Policy and 
GuidelinesThe C++ 
InterfaceVersioningGoalsHistoryPrerequisitesConfiguringChecking 
ActiveAllowed 
ChangesProhibited 
ChangesImplementationTestingSingle ABI 
TestingMultiple ABI 
TestingOutstanding 
IssuesAPI Evolution and Deprecation 
History3.03.13.23.33.44.04.14.24.34.44.54.64.74.84.955.3677.27.38910111212.31313.314Backwards 
CompatibilityFirstSecondThirdPre-ISO headers 
removedExtension headers hash_map, 
hash_set moved to ext or backwardsNo ios::nocreate/ios::noreplace.
+ABI Policy and 
GuidelinesThe C++ 
InterfaceVersioningGoalsHistoryPrerequisitesConfiguringChecking 
ActiveAllowed 
ChangesProhibited 
ChangesImplementationTestingSingle ABI 
TestingMultiple ABI 
TestingOutstanding 
IssuesAPI Evolution and Deprecation 
History3.03.13.23.33.44.04.14.24.34.44.54.64.74.84.955.3677.27.38910111212.31313.31415Backwards 
CompatibilityFirstSecondThirdPre-ISO headers 
removedExtension headers hash_map, 
hash_set moved to ext or backwardsNo ios::nocreate/ios::noreplace.
 
 No stream::attach(int fd)
 
diff --git a/libstdc++-v3/doc/html/manual/api.html 
b/libstdc++-v3/doc/html/manual/api.html
index 59ed4c88862..799f6eae2a2 100644
--- a/libstdc++-v3/doc/html/manual/api.html
+++ b/libstdc++-v3/doc/html/manual/api.html
@@ -496,4 +496,6 @@ to be used with std::basic_istream.
   The extension allowing std::basic_string to be 
instantiated
   with an allocator that doesn't match the string's character type is no
   longer allowed in C++20 mode.
+15
+Enabled debug assertions by default for unoptimized builds.
 Prev Up NextABI Policy and Guidelines Home Backwards 
Compatibility
\ No newline at end of file
diff --git a/libstdc++-v3/doc/html/manual/appendix.html 
b/libstdc++-v3/doc/html/manual/appendix.html
index affd5839f43..69a0e0018f3 100644
--- a/libstdc++-v3/doc/html/manual/appendix.html
+++ b/libstdc++-v3/doc/html/manual/appendix.html
@@ -16,7 +16,7 @@
 Existing tests
 
 C++11 Requirements Test Sequence Descriptions
-ABI Policy and GuidelinesThe C++ 
InterfaceVersioningGoalsHistoryPrerequisitesConfiguringChecking 
ActiveAllowed ChangesProhibited 
ChangesImplementationTestingSingle ABI 
TestingMultiple ABI 
TestingOutstanding 
IssuesAPI Evolution and Deprecation 
History3.03.13.23.33.44.04.14.24.34.44.54.64.74.84.955.3677.27.38910111212.31313.314Backwards 
CompatibilityFirstSecondThirdPre-ISO 
headers removedExtension headers hash_map, hash_set 
moved to ext or backwardsNo ios::nocreate/ios::noreplace.
+ABI Policy and GuidelinesThe C++ 
InterfaceVersioningGoalsHistoryPrerequisitesConfiguringChecking 
ActiveAllowed ChangesProhibited 
ChangesImplementationTestingSingle ABI 
TestingMultiple ABI 
TestingOutstanding 
IssuesAPI Evolution and Deprecation 
History3.03.13.23.33.44.04.14.24.34.44.54.64.74.84.955.3677.27.38910111212.31313.31415Backwards 
CompatibilityFirstSecondThirdPre-ISO 
headers removedExtension headers hash_map, hash_set 
moved to ext or backwardsNo ios::nocreate/ios::noreplace.
 
 No stream::attach(int fd)
 
diff --git a/libstdc++-v3/doc/html/manual/appendix_porting.html 
b/libstdc++-v3/doc/html/manual/appendix_porting.html
index 5d8d5da0bf9..c76ef295e78 100644
--- a/libstdc++-v3/doc/html/manual/appendix_porting.html
+++ b/libstdc++-v3/doc/html/manual/appendix_porting.html
@@ -14,7 +14,7 @@
 Existing tests
 
 C++11 Requirements Test Sequence Descriptions
-ABI Policy and GuidelinesThe C++ 
InterfaceVersioningGoalsHistoryPrerequisitesConfiguringChecking 
ActiveAllowed ChangesProhibited 
ChangesImplementationTestingSingle ABI 
TestingMultiple ABI 
TestingOutstanding 
IssuesAPI Evolution and Deprecation 
History3.03.13.23.33.44.04.14.24.34.44.54.64.74.84.955.3677.27.38910111212.31313.314Backwards 
CompatibilityFirstSecondThirdPre-ISO 
headers removedExtension headers hash_map, hash_set 
moved to ext or backwardsNo ios::nocreate/ios::noreplace.
+ABI Policy and GuidelinesThe C++ 
InterfaceVersioningGoalsHistoryPrerequisitesConfiguringChecking 
ActiveAllowed ChangesProhibited 
ChangesImpleme

Re: [PATCH] Implement Fortran diagnostic buffering for non-textual formats [PR105916]

2024-10-23 Thread David Malcolm
On Wed, 2024-10-23 at 11:03 +0200, Tobias Burnus wrote:
> David Malcolm wrote:
> > In order to handle various awkward parsing issues, the Fortran
> > frontend
> > implements buffering of diagnostics, so that diagnostics reported
> > to
> > global_dc can be either:
> > (a) immediately issued, or
> > (b) speculatively reported to global_dc, and stored in a buffer, to
> > either be issued later or discarded.
> ...
> > This patch moves responsibility for such buffering of diagnostics
> > from
> > fortran's error.cc to the diagnostic subsystem.
> ...
> > Does this look OK from the Fortran side?  The Fortran changes are
> > essentially all specific to error.cc, converting from manipulations
> > of
> > output_buffer to those of diagnostic_buffer.
> 
> Yes, LGTM. (I only looked at the Fortran changes.)

Thanks; I've gone ahead and pushed it to trunk (as r15-4575-
gf565063a0602ad).

> 
> Thanks,
> 
> Tobias
> 
PS: I guess we eventually want to have some Fortran SARIF tests in the
testsuite, i.e. for actual Fortran errors/warnings and not "just" for
#error.

The fortran testsuite currently has test coverage for the following
case:

* json #error and #warning (unbuffered diagnostics)
* sarif #error (unbuffered diagnostic)
* json and sarif coverage for PR105916 (buffered diagnostics that are
discarded)

but I realize we don't have coverage there for the case where buffered
diagnostics are flushed/emitted.  I can add this, but my Fortran skills
are non-existent, so is there a trivial case that would hit this path?
Ideas for test coverage would be most welcome.  There is some test
coverage for this case in the selftests in diagnostic.cc, but not yet a
DejaGnu end-to-end test of it in terms of invoking gfortran and
checking the .sarif output.

Note that I've deprecated the "json" output format, so new testcases
for machine-readable diagnostics should be for sarif.

> 
> > I'm hoping to get this in as I have followup work to support having
> > e.g.
> > both text *and* SARIF at once (PR other/116613), and fixing this is
> > a
> > prerequisite for that work.

Thanks again for the review; I'll get back to working on PR
other/116613.  I'll try to add a Fortran test case for that, e.g. for
outputting both sarif 2.1 *and* sarif 2.2 from the same invocation.

Dave


> > 
> > Thanks
> > Dave
> > 
> > gcc/ChangeLog:
> > PR fortran/105916
> > * diagnostic-buffer.h: New file.
> ...
> > gcc/fortran/ChangeLog:
> > PR fortran/105916
> > * error.cc (pp_error_buffer, pp_warning_buffer): Convert
> > from
> > output_buffer * to diagnostic_buffer *.
> > (warningcount_buffered, werrorcount_buffered): Eliminate.
> > (gfc_error_buffer::gfc_error_buffer): Move constructor
> > definition
> > here, and initialize "buffer" using *global_dc.
> > (gfc_output_buffer_empty_p): Delete in favor of
> > diagnostic_buffer::empty_p.
> > (gfc_clear_pp_buffer): Replace with...
> > (gfc_clear_diagnostic_buffer): ...this, moving
> > implementation
> > details to diagnostic_context::clear_diagnostic_buffer.
> > (gfc_warning): Replace buffering implementation with calls
> > to global_dc->get_diagnostic_buffer and
> > global_dc->set_diagnostic_buffer.
> > (gfc_clear_warning): Update for renaming of
> > gfc_clear_pp_buffer
> > and elimination of warningcount_buffered and
> > werrorcount_buffered.
> > (gfc_warning_check): Replace buffering implementation with
> > calls
> > to pp_warning_buffer->empty_p and
> > global_dc->flush_diagnostic_buffer.
> > (gfc_error_opt): Replace buffering implementation with
> > calls to
> > global_dc->get_diagnostic_buffer and
> > set_diagnostic_buffer.
> > (gfc_clear_error): Update for renaming of
> > gfc_clear_pp_buffer.
> > (gfc_error_flag_test): Replace call to
> > gfc_output_buffer_empty_p
> > with call to diagnostic_buffer::empty_p.
> > (gfc_error_check): Replace buffering implementation with
> > calls
> > to pp_error_buffer->empty_p and global_dc-
> > >flush_diagnostic_buffer.
> > (gfc_move_error_buffer_from_to): Replace buffering
> > implementation
> > with usage of diagnostic_buffer.
> > (gfc_free_error): Update for renaming of
> > gfc_clear_pp_buffer.
> > (gfc_diagnostics_init): Use "new" directly when creating
> > pp_warning_buffer.  Remove setting of m_flush_p on the two
> > buffers, as this is handled by diagnostic_buffer and by
> > diagnostic_text_format_buffer's constructor.
> > * gfortran.h: Replace #include "pretty-print.h" for
> > output_buffer
> > with #include "diagnostic-buffer.h" for diagnostic_buffer.
> > (struct gfc_error_buffer): Change type of field "buffer"
> > from
> > output_buffer to diagnostic_buffer.  Move definition of
> > constructor
> > into error.cc so that it can use global_dc.
> > 
> > gcc/testsuite/ChangeLog:
> > PR fortran/105916
> > * gcc.dg/plugin/diagnostic_plugin_xhtml_format.c: Include
> > "diagnostic-buffer.h".

[PATCH v7] Target-independent store forwarding avoidance.

2024-10-23 Thread Konstantinos Eleftheriou
From: kelefth 

This pass detects cases of expensive store forwarding and tries to avoid them
by reordering the stores and using suitable bit insertion sequences.
For example it can transform this:

 strbw2, [x1, 1]
 ldr x0, [x1]  # Expensive store forwarding to larger load.

To:

 ldr x0, [x1]
 strbw2, [x1]
 bfi x0, x2, 0, 8

Assembly like this can appear with bitfields or type punning / unions.
On stress-ng when running the cpu-union microbenchmark the following speedups
have been observed.

  Neoverse-N1:  +29.4%
  Intel Coffeelake: +13.1%
  AMD 5950X:        +17.5%
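
As a hedged illustration (names invented, not part of the patch), union
type punning like the following is the kind of source that produces the
narrow-store/wide-load sequence shown above:

  #include <stdint.h>

  union pun
  {
    uint64_t whole;
    uint8_t bytes[8];
  };

  uint64_t
  set_byte_then_read (union pun *p, uint8_t v)
  {
    p->bytes[1] = v;   /* narrow store (strb)                   */
    return p->whole;   /* wider load of the same location: the
                          expensive store forwarding case above */
  }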

gcc/ChangeLog:

* Makefile.in: Add avoid-store-forwarding.o
* common.opt: New option -favoid-store-forwarding.
* common.opt.urls: Regenerate.
* doc/invoke.texi: New param store-forwarding-max-distance.
* doc/passes.texi: Document new pass.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Document new pass.
* params.opt: New param store-forwarding-max-distance.
* passes.def: Schedule a new pass.
* target.def (HOOK_PREFIX): New target hook avoid_store_forwarding_p.
* target.h (struct store_fwd_info): Declare.
* targhooks.cc (default_avoid_store_forwarding_p):
  Add default_avoid_store_forwarding_p.
* targhooks.h (default_avoid_store_forwarding_p):
  Add default_avoid_store_forwarding_p.
* tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare.
* avoid-store-forwarding.cc: New file.
* avoid-store-forwarding.h: New file.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/avoid-store-forwarding-1.c: New test.
* gcc.target/aarch64/avoid-store-forwarding-2.c: New test.
* gcc.target/aarch64/avoid-store-forwarding-3.c: New test.
* gcc.target/aarch64/avoid-store-forwarding-4.c: New test (XFAIL).
* gcc.target/aarch64/avoid-store-forwarding-5.c: New test (XFAIL).

Signed-off-by: Philipp Tomsich 
Signed-off-by: Konstantinos Eleftheriou 

Series-version: 7

Series-changes: 7
- Fix bug when copying back the load register, in the case that the
  load is eliminated.

Series-changes: 6
- Reject the transformation on cases that would cause store_bit_field
  to generate subreg expressions on different register classes.
  Files avoid-store-forwarding-4.c and avoid-store-forwarding-5.c
  contain such cases and have been marked as XFAIL.
- Use optimize_bb_for_speed_p instead of optimize_insn_for_speed_p.
- Inline and remove get_load_mem.
- New implementation for is_store_forwarding.
- Refactor the main loop in avoid_store_forwarding.
- Avoid using the word 'forwardings'.
- Use lowpart_subreg instead of validate_subreg + gen_rtx_subreg.
- Don't use df_insn_rescan where not needed.
- Change order of emitting stores and bit insert instructions.
- Check and reject loads for which the dest register overlaps with src.
- Remove unused variable.
- Change some gen_mov_insn function calls to gen_rtx_SET.
- Subtract the cost of eliminated load, instead of 1, for the total 
cost.
- Use delete_insn instead of set_insn_deleted.
- Regenerate common.opt.urls.
- Add some more comments.

Series-changes: 5
- Fix bug with BIG_ENDIAN targets.
- Fix bug with unrecognized instructions.
- Fix / simplify pass init/fini.

Series-changes: 4
- Change pass scheduling to run after sched1.
- Add target hook to decide whether a store forwarding instance
  should be avoided or not.
- Fix bugs.

Series-changes: 3
- Only emit SUBREG after calling validate_subreg.
- Fix memory corruption due to vec self-reference.
- Fix bitmap_bit_in_range_p ICE due to BLKMode.
- Reject MEM to MEM sets.
- Add get_load_mem comment.
- Add new testcase.

Series-changes: 2
- Allow modes that are not scalar_int_mode.
- Introduce simple costing to avoid unprofitable transformations.
- Reject bit insert sequences that spill to memory.
- Document new pass.
- Fix and add testcases.
---
 gcc/Makefile.in   |   1 +
 gcc/avoid-store-forwarding.cc | 634 ++
 gcc/avoid-store-forwarding.h  |  56 ++
 gcc/common.opt|   4 +
 gcc/common.opt.urls   |   3 +
 gcc/doc/invoke.texi   |   9 +
 gcc/doc/passes.texi   |   8 +
 gcc/doc/tm.texi   |   8 +
 gcc/doc/tm.texi.in|   2 +
 gcc/params.opt|   4 +
 gcc/passes.def|   1 +
 gcc/target.def|  10 +
 gcc/target.h 

[PATCH] top-level: Add pull request template for Forgejo

2024-10-23 Thread Jonathan Wakely
This complements the existing .github/PULL_REQUEST_TEMPLATE.md file,
which is used when somebody opens a pull request for an unofficial
mirror/fork of GCC on Github. The text in the existing file is very
specific to GitHub and doesn't make much sense to include on every PR
created on forge.sourceware.org.

I tested it by pushing this file to my own fork and opening a pull
request against that fork. This template in .forgejo was used instead of
the one in .github, see https://forge.sourceware.org/redi/gcc/pulls/1

OK for trunk?

-- >8 --

ChangeLog:

* .forgejo/PULL_REQUEST_TEMPLATE.md: New file.
---
 .forgejo/PULL_REQUEST_TEMPLATE.md | 9 +
 1 file changed, 9 insertions(+)
 create mode 100644 .forgejo/PULL_REQUEST_TEMPLATE.md

diff --git a/.forgejo/PULL_REQUEST_TEMPLATE.md 
b/.forgejo/PULL_REQUEST_TEMPLATE.md
new file mode 100644
index 000..fc645f11305
--- /dev/null
+++ b/.forgejo/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,9 @@
+Thanks for taking the time to contribute to GCC!
+
+Please be advised that https://forge.sourceware.org/ is currently a trial
+that is being used by the GCC community to experiment with a new workflow
+based on pull requests.
+
+Pull requests sent here may be forgotten or ignored. Patches that you want to
+propose for inclusion in GCC should use the existing email-based workflow,
+see https://gcc.gnu.org/contribute.html
-- 
2.46.2



Re: [PATCH 1/2] aarch64: Use standard names for saturating arithmetic

2024-10-23 Thread Akram Ahmad

On 23/10/2024 12:20, Richard Sandiford wrote:

Thanks for doing this.  The approach looks good.  My main question is:
are we sure that we want to use the Advanced SIMD instructions for
signed saturating SI and DI arithmetic on GPRs?  E.g. for addition,
we only saturate at the negative limit if both operands are negative,
and only saturate at the positive limit if both operands are positive.
So for 32-bit values we can use:

asr tmp, x or y, #31
eor tmp, tmp, #0x80000000

to calculate the saturation value and:

adds    res, x, y
csel    res, tmp, res, vs

to calculate the full result.  That's the same number of instructions
as two fmovs for the inputs, the sqadd, and the fmov for the result,
but it should be more efficient.

The reason for asking now, rather than treating it as a potential
future improvement, is that it would also avoid splitting the patterns
for signed and unsigned ops.  (The length of the split alternative can be
conservatively set to 16 even for the unsigned version, since nothing
should care in practice.  The split will have happened before
shorten_branches.)


Hi Richard, thanks for looking over this.

I might be misunderstanding your suggestion, but is there a way to
efficiently check the signedness of the second operand (let's say 'y')
if it is stored in a register? This is a problem we considered and
couldn't solve post-reload, as we only have three registers (including
two operands) to work with. (I might be wrong in terms of how many
registers we have available). AFAIK that's why we only use adds, csinv /
subs, csel in the unsigned case.

To illustrate the point better: consider signed X + Y where both operands
are in GPRs. Without knowing the signedness of Y, for branchless code, we
would need to saturate at both the positive and negative limits and then
perform a comparison on Y to check the sign, selecting either saturating
limit accordingly. This of course doesn't apply if signed saturating
'addition' with a negative op2 is only required to saturate to the
positive limit; nor does it apply if Y or op2 is an immediate.

op2 is an immediate.

Otherwise, I agree that this should be fixed now rather than as a future
improvement.




gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc: Expand iterators.
* config/aarch64/aarch64-simd-builtins.def: Use standard names
* config/aarch64/aarch64-simd.md: Use standard names, split insn
definitions on signedness of operator and type of operands.
* config/aarch64/arm_neon.h: Use standard builtin names.
* config/aarch64/iterators.md: Add VSDQ_I_QI_HI iterator to
simplify splitting of insn for unsigned scalar arithmetic.

gcc/testsuite/ChangeLog:

* 
gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc:
Template file for unsigned vector saturating arithmetic tests.
* 
gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c:
8-bit vector type tests.
* 
gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c:
16-bit vector type tests.
* 
gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_3.c:
32-bit vector type tests.
* 
gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_4.c:
64-bit vector type tests.
* gcc.target/aarch64/saturating_arithmetic.inc: Template file
for scalar saturating arithmetic tests.
* gcc.target/aarch64/saturating_arithmetic_1.c: 8-bit tests.
* gcc.target/aarch64/saturating_arithmetic_2.c: 16-bit tests.
* gcc.target/aarch64/saturating_arithmetic_3.c: 32-bit tests.
* gcc.target/aarch64/saturating_arithmetic_4.c: 64-bit tests.
diff --git 
a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
new file mode 100644
index 000..63eb21e438b
--- /dev/null
+++ 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
@@ -0,0 +1,79 @@
+/* { dg-do assemble { target { aarch64*-*-* } } } */
+/* { dg-options "-O2 --save-temps -ftree-vectorize" } */
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+/*
+** uadd_lane: { xfail *-*-* }

Just curious: why does this fail?  Is it a vector costing issue?

This is due to a missing pattern from match.pd; I've sent another patch
upstream to rectify this. In essence, this function exposes a commutative
form of an existing addition pattern, but that form isn't currently
commutative when it should be. It's a similar reason for why the uqsubs
are also marked as xfail, so that same patch series contains a fix for
the uqsub case too.
xfail, so that same patch series contains a fix for the uqsub case too.

Since the operands are commutative, and since there's no restriction
on the choice of destination register, it's probably safer to use:


+** uqadd\tv[0-9].16b, (?:v\1.16b, v\2.16b|v\2.16b

Re: [PATCH 1/2] aarch64: Use standard names for saturating arithmetic

2024-10-23 Thread Richard Sandiford
Akram Ahmad  writes:
> On 23/10/2024 12:20, Richard Sandiford wrote:
>> Thanks for doing this.  The approach looks good.  My main question is:
>> are we sure that we want to use the Advanced SIMD instructions for
>> signed saturating SI and DI arithmetic on GPRs?  E.g. for addition,
>> we only saturate at the negative limit if both operands are negative,
>> and only saturate at the positive limit if both operands are positive.
>> So for 32-bit values we can use:
>>
>>  asr tmp, x or y, #31
>>  eor tmp, tmp, #0x80000000
>>
>> to calculate the saturation value and:
>>
>>  adds    res, x, y
>>  csel    res, tmp, res, vs
>>
>> to calculate the full result.  That's the same number of instructions
>> as two fmovs for the inputs, the sqadd, and the fmov for the result,
>> but it should be more efficient.
>>
>> The reason for asking now, rather than treating it as a potential
>> future improvement, is that it would also avoid splitting the patterns
>> for signed and unsigned ops.  (The length of the split alternative can be
>> conservatively set to 16 even for the unsigned version, since nothing
>> should care in practice.  The split will have happened before
>> shorten_branches.)
>
> Hi Richard, thanks for looking over this.
>
> I might be misunderstanding your suggestion, but is there a way to
> efficiently check the signedness of the second operand (let's say 'y')
> if it is stored in a register? This is a problem we considered and
> couldn't solve post-reload, as we only have three registers (including
> two operands) to work with. (I might be wrong in terms of how many
> registers we have available). AFAIK that's why we only use adds, csinv
> / subs, csel in the unsigned case.

Ah, ok.  For post-reload splits, we would need to add:

  (clobber (match_operand:GPI 3 "scratch_operand"))

then use "X" as the constraint for the Advanced SIMD alternative and
"&r" as the constraint in the GPR alternative.  But I suppose that
also sinks my dream of a unified pattern, since the unsigned case
wouldn't need the extra operand.

In both cases (signed and unsigned), the pattern should clobber CC_REGNUM,
since the split changes the flags.

> [...]
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
>>> new file mode 100644
>>> index 000..63eb21e438b
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
>>> @@ -0,0 +1,79 @@
>>> +/* { dg-do assemble { target { aarch64*-*-* } } } */
>>> +/* { dg-options "-O2 --save-temps -ftree-vectorize" } */
>>> +/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
>>> +
>>> +/*
>>> +** uadd_lane: { xfail *-*-* }
>> Just curious: why does this fail?  Is it a vector costing issue?
> This is due to a missing pattern from match.pd - I've sent another patch
> upstream to rectify this. In essence, this function exposes a commutative
> form of an existing addition pattern, but that form isn't currently
> commutative when it should be. It's a similar reason for why the uqsubs
> are also marked as xfail, so that same patch series contains a fix for
> the uqsub case too.

Ah, ok, thanks.

Richard


Re: [PATCH] ginclude: stdalign.h should define __xxx_is_defined macros for C++

2024-10-23 Thread Jason Merrill

On 10/23/24 10:39 AM, Jonathan Wakely wrote:

The __alignas_is_defined macro has been required by C++ since C++11, and
C++ Library DR 4036 clarified that __alignof_is_defined should be
defined too.

The macros alignas and alignof should not be defined, as they're
keywords in C++.

Technically it's implementation-defined whether __STDC_VERSION__ is
defined by a C++ compiler, but G++ does not define it. Adjusting the
first #if this way works as intended: A C23 compiler will not enter the
outer if-group and so will not define any of the macros, a C17 compiler
will enter both if-groups and so define all the macros, and a C++
compiler will enter the outer if-group but not the inner if-group.

gcc/ChangeLog:

* ginclude/stdalign.h (__alignas_is_defined): Define for C++.
(__alignof_is_defined): Likewise.


Do we want to note somehow that these macros are deprecated since C++17?


libstdc++-v3/ChangeLog:

* testsuite/18_support/headers/cstdalign/macros.cc: New test.
---

The libc++ devs noticed recently that GCC's  doesn't conform
to the C++ requirements.

Tested x86_64-linux.

OK for trunk?

  gcc/ginclude/stdalign.h   |  5 ++--
  .../18_support/headers/cstdalign/macros.cc| 24 +++
  2 files changed, 27 insertions(+), 2 deletions(-)
  create mode 100644 libstdc++-v3/testsuite/18_support/headers/cstdalign/macros.cc

diff --git a/gcc/ginclude/stdalign.h b/gcc/ginclude/stdalign.h
index 5f82f2d68f2..af73c322624 100644
--- a/gcc/ginclude/stdalign.h
+++ b/gcc/ginclude/stdalign.h
@@ -26,11 +26,12 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
If not, see
  #ifndef _STDALIGN_H
  #define _STDALIGN_H
  
-#if (!defined __cplusplus		\
-     && !(defined __STDC_VERSION__ && __STDC_VERSION__ > 201710L))
+#if !(defined __STDC_VERSION__ && __STDC_VERSION__ > 201710L)
 
+#ifndef __cplusplus
 #define alignas _Alignas
 #define alignof _Alignof
+#endif
 
 #define __alignas_is_defined 1
 #define __alignof_is_defined 1
diff --git a/libstdc++-v3/testsuite/18_support/headers/cstdalign/macros.cc b/libstdc++-v3/testsuite/18_support/headers/cstdalign/macros.cc
new file mode 100644
index 000..c50c921cd59
--- /dev/null
+++ b/libstdc++-v3/testsuite/18_support/headers/cstdalign/macros.cc
@@ -0,0 +1,24 @@
+// { dg-options "-D_GLIBCXX_USE_DEPRECATED=1 -Wno-deprecated" }
+// { dg-do preprocess { target c++11 } }
+
+#include <cstdalign>


Should there also/instead be a test with <stdalign.h>?


+
+#ifndef __alignas_is_defined
+# error "The header <cstdalign> fails to define a macro named __alignas_is_defined"
+#elif __alignas_is_defined != 1
+# error "__alignas_is_defined is not defined to 1 in <cstdalign>"
+#endif
+
+#ifndef __alignof_is_defined
+# error "The header <cstdalign> fails to define a macro named __alignof_is_defined"
+#elif __alignof_is_defined != 1
+# error "__alignof_is_defined is not defined to 1 in <cstdalign>"
+#endif
+
+#ifdef alignas
+# error "The header <cstdalign> defines a macro named alignas"
+#endif
+
+#ifdef alignof
+# error "The header <cstdalign> defines a macro named alignof"
+#endif




Re: [PATCH] SVE intrinsics: Fold division and multiplication by -1 to neg.

2024-10-23 Thread Richard Sandiford
Jennifer Schmitz  writes:
> Because a neg instruction has lower latency and higher throughput than
> sdiv and mul, svdiv and svmul by -1 can be folded to svneg. For svdiv,
> this is already implemented on the RTL level; for svmul, the
> optimization was still missing.
> This patch implements folding to svneg for both operations using the
> gimple_folder. For svdiv, the transform is applied if the divisor is -1.
> Svmul is folded if either of the operands is -1. A case distinction of
> the predication is made to account for the fact that svneg_m has 3 arguments
> (argument 0 holds the values for the inactive lanes), while svneg_x and
> svneg_z have only 2 arguments.
> Tests were added or adjusted to check the produced assembly and runtime
> tests were added to check correctness.
>
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> OK for mainline?
>
> Signed-off-by: Jennifer Schmitz 

Sorry for the slow review.

> [...]
> @@ -2033,12 +2054,37 @@ public:
>  if (integer_zerop (op1) || integer_zerop (op2))
>return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs)));
>  
> +/* If one of the operands is all integer -1, fold to svneg.  */
> +tree pg = gimple_call_arg (f.call, 0);
> +tree negated_op = NULL;
> +if (integer_minus_onep (op2))
> +  negated_op = op1;
> +else if (integer_minus_onep (op1))
> +  negated_op = op2;
> +if (!f.type_suffix (0).unsigned_p && negated_op)

This is definitely ok, but it would be nice to handle the unsigned_p
case too at some point.  This would mean bit-casting to the equivalent
signed type, doing the negation, and casting back.  It would be good
to have a helper for doing that (maybe with a lambda callback that
emits the actual call) since I can imagine it will be needed elsewhere
too.
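
FWIW, at the source level the fold means e.g.:

  /* Sketch: with this patch the multiply by -1 below is folded to
     svneg_s32_x, so a single NEG is emitted instead of a MOV of -1
     plus a MUL.  Compile with SVE enabled.  */
  #include <arm_sve.h>

  svint32_t
  mul_m1 (svbool_t pg, svint32_t x)
  {
    return svmul_n_s32_x (pg, x, -1);
  }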

> [...]
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s32.c
> index 13009d88619..1d605dbdd8d 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s32.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s32.c
> @@ -183,14 +183,25 @@ TEST_UNIFORM_Z (mul_3_s32_m_untied, svint32_t,
>  
>  /*
>  ** mul_m1_s32_m:
> -**   mov (z[0-9]+)\.b, #-1
> -**   mul z0\.s, p0/m, z0\.s, \1\.s
> +**   neg z0\.s, p0/m, z0\.s
>  **   ret
>  */
>  TEST_UNIFORM_Z (mul_m1_s32_m, svint32_t,
>   z0 = svmul_n_s32_m (p0, z0, -1),
>   z0 = svmul_m (p0, z0, -1))
>  
> +/*
> +** mul_m1r_s32_m:
> +**   mov (z[0-9]+)\.b, #-1
> +**   mov (z[0-9]+)\.d, z0\.d
> +**   movprfx z0, \1
> +**   neg z0\.s, p0/m, \2\.s
> +**   ret
> +*/
> +TEST_UNIFORM_Z (mul_m1r_s32_m, svint32_t,
> + z0 = svmul_s32_m (p0, svdup_s32 (-1), z0),
> + z0 = svmul_m (p0, svdup_s32 (-1), z0))

Maybe it would be better to test the untied case instead, by passing
z1 rather than z0 as the final argument.  Hopefully that would leave
us with just the first and last instructions.  (I think the existing
tests already cover the awkward tied2 case well enough.)

Same for the later similar tests.
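
Something like this, perhaps (an untested sketch of the untied variant;
the exact regexps would need checking against the emitted code):

  /*
  ** mul_m1r_s32_m_untied:
  **	mov	z0\.b, #-1
  **	neg	z0\.s, p0/m, z1\.s
  **	ret
  */
  TEST_UNIFORM_Z (mul_m1r_s32_m_untied, svint32_t,
		  z0 = svmul_s32_m (p0, svdup_s32 (-1), z1),
		  z0 = svmul_m (p0, svdup_s32 (-1), z1))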

OK with that change, thanks.

Richard

> +
>  /*
>  ** mul_s32_z_tied1:
>  **   movprfx z0\.s, p0/z, z0\.s
> @@ -597,13 +608,44 @@ TEST_UNIFORM_Z (mul_255_s32_x, svint32_t,
>  
>  /*
>  ** mul_m1_s32_x:
> -**   mul z0\.s, z0\.s, #-1
> +**   neg z0\.s, p0/m, z0\.s
>  **   ret
>  */
>  TEST_UNIFORM_Z (mul_m1_s32_x, svint32_t,
>   z0 = svmul_n_s32_x (p0, z0, -1),
>   z0 = svmul_x (p0, z0, -1))
>  
> +/*
> +** mul_m1r_s32_x:
> +**   neg z0\.s, p0/m, z0\.s
> +**   ret
> +*/
> +TEST_UNIFORM_Z (mul_m1r_s32_x, svint32_t,
> + z0 = svmul_s32_x (p0, svdup_s32 (-1), z0),
> + z0 = svmul_x (p0, svdup_s32 (-1), z0))
> +
> +/*
> +** mul_m1_s32_z:
> +**   mov (z[0-9]+)\.d, z0\.d
> +**   movprfx z0\.s, p0/z, \1\.s
> +**   neg z0\.s, p0/m, \1\.s
> +**   ret
> +*/
> +TEST_UNIFORM_Z (mul_m1_s32_z, svint32_t,
> + z0 = svmul_n_s32_z (p0, z0, -1),
> + z0 = svmul_z (p0, z0, -1))
> +
> +/*
> +** mul_m1r_s32_z:
> +**   mov (z[0-9]+)\.d, z0\.d
> +**   movprfx z0\.s, p0/z, \1\.s
> +**   neg z0\.s, p0/m, \1\.s
> +**   ret
> +*/
> +TEST_UNIFORM_Z (mul_m1r_s32_z, svint32_t,
> + z0 = svmul_s32_z (p0, svdup_s32 (-1),  z0),
> + z0 = svmul_z (p0, svdup_s32 (-1), z0))
> +
>  /*
>  ** mul_m127_s32_x:
>  **   mul z0\.s, z0\.s, #-127
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s64.c
> index 530d9fc84a5..c05d184f2fe 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s64.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s64.c
> @@ -192,8 +192,7 @@ TEST_UNIFORM_Z (mul_3_s64_m_untied, svint64_t,
>  
>  /*
>  ** mul_m1_s64_m:
> -**   mov (z[0-9]+)\.b, #-1
> -**   mul z0\.d, p0/m, z0\.d, \1\.d
> +**   neg z0\.d, p0/m, z0\.d
>  **   ret

Re: [PATCH v2 2/2] aarch64: Add mfloat vreinterpret intrinsics

2024-10-23 Thread Richard Sandiford
Andrew Carlotti  writes:
> This patch splits out some of the qualifier handling from the v1 patch, and
> adjusts the VREINTERPRET* macros to include support for mf8 intrinsics.
>
> Bootstrapped and regression tested on aarch64; ok for master?
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-builtins.cc (MODE_d_mf8): New.
>   (MODE_q_mf8): New.
>   (QUAL_mf8): New.
>   (VREINTERPRET_BUILTINS1): Add mf8 entry.
>   (VREINTERPRET_BUILTINS): Ditto.
>   (VREINTERPRETQ_BUILTINS1): Ditto.
>   (VREINTERPRETQ_BUILTINS): Ditto.
>   (aarch64_lookup_simd_type_in_table): Match modal_float bit
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/advsimd-intrinsics/mf8-reinterpret.c: New test.

OK, thanks.

Richard
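
For illustration, the new entries expose the usual reinterpret pattern
for the modal type, along the lines of this sketch (the intrinsic names
follow the standard vreinterpret_<dst>_<src> scheme and assume an
FP8-capable arm_neon.h):

  #include <arm_neon.h>

  mfloat8x8_t to_mf8 (uint8x8_t v) { return vreinterpret_mf8_u8 (v); }
  uint8x8_t from_mf8 (mfloat8x8_t v) { return vreinterpret_u8_mf8 (v); }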

> diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
> index 432131c3b2d7cf4f788b79ce3d84c9e7554dc750..31231c9e66ee8307cb86e181fc51ea2622c5f82c 100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -133,6 +133,7 @@
>  #define MODE_d_f16 E_V4HFmode
>  #define MODE_d_f32 E_V2SFmode
>  #define MODE_d_f64 E_V1DFmode
> +#define MODE_d_mf8 E_V8QImode
>  #define MODE_d_s8 E_V8QImode
>  #define MODE_d_s16 E_V4HImode
>  #define MODE_d_s32 E_V2SImode
> @@ -148,6 +149,7 @@
>  #define MODE_q_f16 E_V8HFmode
>  #define MODE_q_f32 E_V4SFmode
>  #define MODE_q_f64 E_V2DFmode
> +#define MODE_q_mf8 E_V16QImode
>  #define MODE_q_s8 E_V16QImode
>  #define MODE_q_s16 E_V8HImode
>  #define MODE_q_s32 E_V4SImode
> @@ -177,6 +179,7 @@
>  #define QUAL_p16 qualifier_poly
>  #define QUAL_p64 qualifier_poly
>  #define QUAL_p128 qualifier_poly
> +#define QUAL_mf8 qualifier_modal_float
>  
>  #define LENGTH_d ""
>  #define LENGTH_q "q"
> @@ -598,6 +601,7 @@ static aarch64_simd_builtin_datum aarch64_simd_builtin_data[] = {
>  /* vreinterpret intrinsics are defined for any pair of element types.
> { _bf16   }   { _bf16   }
> {  _f16 _f32 _f64 }   {  _f16 _f32 _f64 }
> +   { _mf8}   { _mf8}
> { _s8  _s16 _s32 _s64 } x { _s8  _s16 _s32 _s64 }
> { _u8  _u16 _u32 _u64 }   { _u8  _u16 _u32 _u64 }
> { _p8  _p16  _p64 }   { _p8  _p16  _p64 }.  */
> @@ -609,6 +613,7 @@ static aarch64_simd_builtin_datum aarch64_simd_builtin_data[] = {
>VREINTERPRET_BUILTIN2 (A, f16) \
>VREINTERPRET_BUILTIN2 (A, f32) \
>VREINTERPRET_BUILTIN2 (A, f64) \
> +  VREINTERPRET_BUILTIN2 (A, mf8) \
>VREINTERPRET_BUILTIN2 (A, s8) \
>VREINTERPRET_BUILTIN2 (A, s16) \
>VREINTERPRET_BUILTIN2 (A, s32) \
> @@ -626,6 +631,7 @@ static aarch64_simd_builtin_datum aarch64_simd_builtin_data[] = {
>VREINTERPRET_BUILTINS1 (f16) \
>VREINTERPRET_BUILTINS1 (f32) \
>VREINTERPRET_BUILTINS1 (f64) \
> +  VREINTERPRET_BUILTINS1 (mf8) \
>VREINTERPRET_BUILTINS1 (s8) \
>VREINTERPRET_BUILTINS1 (s16) \
>VREINTERPRET_BUILTINS1 (s32) \
> @@ -641,6 +647,7 @@ static aarch64_simd_builtin_datum aarch64_simd_builtin_data[] = {
>  /* vreinterpretq intrinsics are additionally defined for p128.
> { _bf16 }   { _bf16 }
> {  _f16 _f32 _f64   }   {  _f16 _f32 _f64   }
> +   { _mf8  }   { _mf8  }
> { _s8  _s16 _s32 _s64   } x { _s8  _s16 _s32 _s64   }
> { _u8  _u16 _u32 _u64   }   { _u8  _u16 _u32 _u64   }
> { _p8  _p16  _p64 _p128 }   { _p8  _p16  _p64 _p128 }.  */
> @@ -652,6 +659,7 @@ static aarch64_simd_builtin_datum aarch64_simd_builtin_data[] = {
>VREINTERPRETQ_BUILTIN2 (A, f16) \
>VREINTERPRETQ_BUILTIN2 (A, f32) \
>VREINTERPRETQ_BUILTIN2 (A, f64) \
> +  VREINTERPRETQ_BUILTIN2 (A, mf8) \
>VREINTERPRETQ_BUILTIN2 (A, s8) \
>VREINTERPRETQ_BUILTIN2 (A, s16) \
>VREINTERPRETQ_BUILTIN2 (A, s32) \
> @@ -670,6 +678,7 @@ static aarch64_simd_builtin_datum aarch64_simd_builtin_data[] = {
>VREINTERPRETQ_BUILTINS1 (f16) \
>VREINTERPRETQ_BUILTINS1 (f32) \
>VREINTERPRETQ_BUILTINS1 (f64) \
> +  VREINTERPRETQ_BUILTINS1 (mf8) \
>VREINTERPRETQ_BUILTINS1 (s8) \
>VREINTERPRETQ_BUILTINS1 (s16) \
>VREINTERPRETQ_BUILTINS1 (s32) \
> @@ -1117,7 +1126,8 @@ aarch64_lookup_simd_type_in_table (machine_mode mode,
>  {
>int i;
>int nelts = ARRAY_SIZE (aarch64_simd_types);
> -  int q = qualifiers & (qualifier_poly | qualifier_unsigned);
> +  int q = qualifiers
> +& (qualifier_poly | qualifier_unsigned | qualifier_modal_float);
>  
>for (i = 0; i < nelts; i++)
>  {
> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/mf8-reinterpret.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/mf8-reinterpret.c
> new file mode 100644
> index ..5e5921746036bbfbf20d2a77697760efd1f71cc2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/mf8-reinterpret.c
> @@ -0,0 +1,46 @@

Re: [PATCH v3] Remove sys/user time in -ftime-report

2024-10-23 Thread Richard Biener
On Wed, Oct 9, 2024 at 6:18 PM Andi Kleen  wrote:
>
> From: Andi Kleen 
>
> Retrieving sys/user time in timevars is quite expensive because it
> always needs a system call. Only getting the wall time is much
> cheaper because operating systems have optimized paths for this.
>
> The sys time isn't that interesting for a compiler, and wall time
> is usually close to user time except when the system is overloaded.
> On the other hand, when it is not overloaded, wall time is more
> accurate because it has less overhead.
>
> For building tramp3d with -O0 the -ftime-report overhead drops from
> 18% to 3%. For -O2 it drops from 8% to not measurable.
>
> I changed the code to use gettimeofday as a fallback for clock_gettime
> CLOCK_MONOTONIC.  If a host has neither of those the time will not
> be measured. Previously clock was the fallback.
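
The resulting logic is roughly as follows (a sketch, not the exact
patch; the HAVE_* names are the usual autoconf macros and are an
assumption here):

  #include <stdint.h>
  #include <time.h>
  #include <sys/time.h>

  static uint64_t
  get_wall_time_ns (void)
  {
  #if defined (HAVE_CLOCK_GETTIME) && defined (CLOCK_MONOTONIC)
    struct timespec ts;
    clock_gettime (CLOCK_MONOTONIC, &ts);
    return (uint64_t) ts.tv_sec * 1000000000 + ts.tv_nsec;
  #elif defined (HAVE_GETTIMEOFDAY)
    struct timeval tv;
    gettimeofday (&tv, NULL);
    return (uint64_t) tv.tv_sec * 1000000000 + (uint64_t) tv.tv_usec * 1000;
  #else
    return 0;  /* Neither API available: time is simply not measured.  */
  #endif
  }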

OK for trunk if there's no serious objection until mid next week.

Richard.

> This removes a lot of code in timevar.cc:
>
>  gcc/timevar.cc | 167 ++---
>  gcc/timevar.h  |  10 +---
>
>  2 files changed, 17 insertions(+), 160 deletions(-)
>
> Bootstrapped on x86_64-linux with full test suite run.
>
> gcc/ChangeLog:
>
> * timevar.cc (struct tms): Remove.
> (RUSAGE_SELF): Remove.
> (TICKS_PER_SECOND): Remove.
> (USE_TIMES): Remove.
> (HAVE_USER_TIME): Remove.
> (HAVE_SYS_TIME): Remove.
> (HAVE_WALL_TIME): Remove.
> (USE_GETRUSAGE): Remove.
> (USE_CLOCK): Remove.
> (NANOSEC_PER_SEC): Remove.
> (TICKS_TO_NANOSEC): Remove.
> (CLOCKS_TO_NANOSEC): Remove.
> (timer::named_items::push): Remove sys/user.
> (get_time): Remove clock and times and getruage code.
> (timevar_accumulate): Remove sys/user.
> (timevar_diff): Dito.
> (timer::validate_phases): Dito.
> (timer::print_row): Dito.
> (timer::all_zero): Dito.
> (timer::print): Dito.
> (make_json_for_timevar_time_def): Dito.
> * timevar.h (struct timevar_time_def): Dito.
>
> ---
>
> v2: Adjust JSON/Sarif output too.
> v3: Make unconditional.
> ---
>  gcc/timevar.cc | 189 ++---
>  gcc/timevar.h  |  10 +--
>  2 files changed, 22 insertions(+), 177 deletions(-)
>
> diff --git a/gcc/timevar.cc b/gcc/timevar.cc
> index 68bcf44864f9..4a57e74230d3 100644
> --- a/gcc/timevar.cc
> +++ b/gcc/timevar.cc
> @@ -26,84 +26,6 @@ along with GCC; see the file COPYING3.  If not see
>  #include "options.h"
>  #include "json.h"
>
> -#ifndef HAVE_CLOCK_T
> -typedef int clock_t;
> -#endif
> -
> -#ifndef HAVE_STRUCT_TMS
> -struct tms
> -{
> -  clock_t tms_utime;
> -  clock_t tms_stime;
> -  clock_t tms_cutime;
> -  clock_t tms_cstime;
> -};
> -#endif
> -
> -#ifndef RUSAGE_SELF
> -# define RUSAGE_SELF 0
> -#endif
> -
> -/* Calculation of scale factor to convert ticks to seconds.
> -   We mustn't use CLOCKS_PER_SEC except with clock().  */
> -#if HAVE_SYSCONF && defined _SC_CLK_TCK
> -# define TICKS_PER_SECOND sysconf (_SC_CLK_TCK) /* POSIX 1003.1-1996 */
> -#else
> -# ifdef CLK_TCK
> -#  define TICKS_PER_SECOND CLK_TCK /* POSIX 1003.1-1988; obsolescent */
> -# else
> -#  ifdef HZ
> -#   define TICKS_PER_SECOND HZ  /* traditional UNIX */
> -#  else
> -#   define TICKS_PER_SECOND 100 /* often the correct value */
> -#  endif
> -# endif
> -#endif
> -
> -/* Prefer times to getrusage to clock (each gives successively less
> -   information).  */
> -#ifdef HAVE_TIMES
> -# if defined HAVE_DECL_TIMES && !HAVE_DECL_TIMES
> -  extern clock_t times (struct tms *);
> -# endif
> -# define USE_TIMES
> -# define HAVE_USER_TIME
> -# define HAVE_SYS_TIME
> -# define HAVE_WALL_TIME
> -#else
> -#ifdef HAVE_GETRUSAGE
> -# if defined HAVE_DECL_GETRUSAGE && !HAVE_DECL_GETRUSAGE
> -  extern int getrusage (int, struct rusage *);
> -# endif
> -# define USE_GETRUSAGE
> -# define HAVE_USER_TIME
> -# define HAVE_SYS_TIME
> -#else
> -#ifdef HAVE_CLOCK
> -# if defined HAVE_DECL_CLOCK && !HAVE_DECL_CLOCK
> -  extern clock_t clock (void);
> -# endif
> -# define USE_CLOCK
> -# define HAVE_USER_TIME
> -#endif
> -#endif
> -#endif
> -
> -/* libc is very likely to have snuck a call to sysconf() into one of
> -   the underlying constants, and that can be very slow, so we have to
> -   precompute them.  Whose wonderful idea was it to make all those
> -   _constants_ variable at run time, anyway?  */
> -#define NANOSEC_PER_SEC 1000000000
> -#ifdef USE_TIMES
> -static uint64_t ticks_to_nanosec;
> -#define TICKS_TO_NANOSEC (NANOSEC_PER_SEC / TICKS_PER_SECOND)
> -#endif
> -
> -#ifdef USE_CLOCK
> -static uint64_t clocks_to_nanosec;
> -#define CLOCKS_TO_NANOSEC (NANOSEC_PER_SEC / CLOCKS_PER_SEC)
> -#endif
> -
>  /* Non-NULL if timevars should be used.  In GCC, this happens with
> the -ftime-report flag.  */
>
> @@ -181,8 +103,6 @@ timer::named_items::push (const char *item_name)
>timer::timevar_def *def = &m_hash_map.get_or_insert (item_name

Re: [PATCH] Add 'cobol' to Makefile.def, take 2

2024-10-23 Thread Richard Biener
On Tue, Oct 15, 2024 at 1:10 AM James K. Lowden
 wrote:
>
> Consequent to advice, I'm preparing the Cobol front-end patches as a
> small number of hopefully meaningful patches covering many files.
>
> 1.  meta files used by autotools etc.
> 2.  gcc/cobol/*.h
> 3.  gcc/cobol/*.{y,l,cc}
> 4.  libgcobol
> 5.  documentation
> 6.  tests
>
> The patch below is step #1.  It comprises all the "meta files" needed
> for the Cobol front end, including every existing file that we modified.
>
> 1.  It does not interfere with --languages=c,c++, etc
> 2.  It does not work with --languages=cobol because the source files
> are missing.
>
> If this looks OK, I'll continue on the same path. I can have the next
> set ready tomorrow afternoon.
>
> The next message would be a set of 3 patches, steps 2-4 above.  That
> will build with --languages=cobol, but not install or test.  Test &
> documentation files would comprise the remaining patches.
>
> In testing the patch with "git am", I got a warning about a blank line
> at EOF, but I couldn't figure out where it was, or if it mattered.
>
> --jkl
>
> From 06a93d00f4433fb61ff9611c6e945a3a11c89479bld.patch 4 Oct 2024 12:01:22 -0400
> From: "James K. Lowden" 
> Date: Mon 14 Oct 2024 03:25:23 PM EDT
> Subject: [PATCH]  Add 'cobol' to 9 files
>
> * Makefile.def: Add libgcobol module and cobol language.
> * configure: Add libgcobol module and cobol language.

This would say

   * configure: Regenerated.

> * configure.ac: Add libgcobol module and cobol language.
> * gcc/cobol/LICENSE: Add gcc/cobol/LICENSE
> * gcc/cobol/Make-lang.in: Add libgcobol module and cobol language.
> * gcc/cobol/config-lang.in: Add libgcobol module and cobol language.
> * gcc/cobol/lang.opt: Add libgcobol module and cobol language.
> * gcc/common.opt: Add libgcobol module and cobol language.
>
> ---
> Makefile.def | ++-
> configure | +-
> configure.ac | +-
> gcc/cobol/LICENSE | +-
> gcc/cobol/Make-lang.in | -
> gcc/cobol/config-lang.in | ++-
> gcc/cobol/lang.opt | -
> gcc/common.opt | 
> 8 files changed, 439 insertions(+), 8 deletions(-)
> diff --git a/Makefile.def b/Makefile.def
> index 19954e7d731..1192e852c7a 100644
> --- a/Makefile.def
> +++ b/Makefile.def
> @@ -209,6 +209,7 @@ target_modules = { module= libgomp; bootstrap= true; 
> lib_path=.libs; };
>  target_modules = { module= libitm; lib_path=.libs; };
>  target_modules = { module= libatomic; bootstrap=true; lib_path=.libs; };
>  target_modules = { module= libgrust; };
> +target_modules = { module= libgcobol; };
>
>  // These are (some of) the make targets to be done in each subdirectory.
>  // Not all; these are the ones which don't have special options.
> @@ -324,6 +325,7 @@ flags_to_pass = { flag= CXXFLAGS_FOR_TARGET ; };
>  flags_to_pass = { flag= DLLTOOL_FOR_TARGET ; };
>  flags_to_pass = { flag= DSYMUTIL_FOR_TARGET ; };
>  flags_to_pass = { flag= FLAGS_FOR_TARGET ; };
> +flags_to_pass = { flag= GCOBOL_FOR_TARGET ; };
>  flags_to_pass = { flag= GFORTRAN_FOR_TARGET ; };
>  flags_to_pass = { flag= GOC_FOR_TARGET ; };
>  flags_to_pass = { flag= GOCFLAGS_FOR_TARGET ; };
> @@ -655,6 +657,7 @@ lang_env_dependencies = { module=libgcc; no_gcc=true; 
> no_c=true; };
>  // built newlib on some targets (e.g. Cygwin).  It still needs
>  // a dependency on libgcc for native targets to configure.
>  lang_env_dependencies = { module=libiberty; no_c=true; };
> +lang_env_dependencies = { module=libgcobol; cxx=true; };
>
>  dependencies = { module=configure-target-fastjar; on=configure-target-zlib; 
> };
>  dependencies = { module=all-target-fastjar; on=all-target-zlib; };
> @@ -690,6 +693,7 @@ dependencies = { module=install-target-libvtv; 
> on=install-target-libgcc; };
>  dependencies = { module=install-target-libitm; on=install-target-libgcc; };
>  dependencies = { module=install-target-libobjc; on=install-target-libgcc; };
>  dependencies = { module=install-target-libstdc++-v3; 
> on=install-target-libgcc; };
> +dependencies = { module=install-target-libgcobol; 
> on=install-target-libstdc++-v3; };
>
>  // Target modules in the 'src' repository.
>  lang_env_dependencies = { module=libtermcap; };
> @@ -727,6 +731,8 @@ languages = { language=d;   gcc-check-target=check-d;
> lib-check-target=check-target-libphobos; };
>  languages = { language=jit;gcc-check-target=check-jit; };
>  languages = { language=rust;   gcc-check-target=check-rust; };
> +languages = { language=cobol;  gcc-check-target=check-cobol;
> +   

Re: [PATCH v2 1/2] aarch64: Add support for mfloat8x{8|16}_t types

2024-10-23 Thread Richard Sandiford
Andrew Carlotti  writes:
> Compared to v1, I've split changes that aren't used for the type definitions
> into a separate patch.  I've also added some tests, mostly along the lines
> suggested by Richard S.
>
> Bootstrapped and regression tested on aarch64; ok for master?
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-builtins.cc
>   (aarch64_init_simd_builtin_types): Initialise FP8 simd types.
>   * config/aarch64/aarch64-builtins.h
>   (enum aarch64_type_qualifiers): Add qualifier_modal_float bit.
>   * config/aarch64/aarch64-simd-builtin-types.def:
>   Add Mfloat8x{8|16}_t types.
>   * config/aarch64/arm_neon.h: Add mfloat8x{8|16}_t typedefs.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/movv16qi_2.c: Test mfloat as well.
>   * gcc.target/aarch64/movv16qi_3.c: Ditto.
>   * gcc.target/aarch64/movv2x16qi_1.c: Ditto.
>   * gcc.target/aarch64/movv3x16qi_1.c: Ditto.
>   * gcc.target/aarch64/movv4x16qi_1.c: Ditto.
>   * gcc.target/aarch64/movv8qi_2.c: Ditto.
>   * gcc.target/aarch64/movv8qi_3.c: Ditto.
>   * gcc.target/aarch64/mfloat-init-1.c: New test.

OK, thanks.

Richard

> diff --git a/gcc/config/aarch64/aarch64-builtins.h b/gcc/config/aarch64/aarch64-builtins.h
> index e326fe666769cedd6c06d0752ed30b9359745ac9..00db7a74885db4d97ed365e8e3e2d7cf7d8410a4 100644
> --- a/gcc/config/aarch64/aarch64-builtins.h
> +++ b/gcc/config/aarch64/aarch64-builtins.h
> @@ -54,6 +54,8 @@ enum aarch64_type_qualifiers
>/* Lane indices selected in quadtuplets. - must be in range, and flipped 
> for
>   bigendian.  */
>qualifier_lane_quadtup_index = 0x1000,
> +  /* Modal FP types.  */
> +  qualifier_modal_float = 0x2000,
>  };
>  
>  #define ENTRY(E, M, Q, G) E,
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
> index 7d737877e0bf6c1f9eb53351a6085b0db16a04d6..432131c3b2d7cf4f788b79ce3d84c9e7554dc750 100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -1220,6 +1220,10 @@ aarch64_init_simd_builtin_types (void)
>aarch64_simd_types[Bfloat16x4_t].eltype = bfloat16_type_node;
>aarch64_simd_types[Bfloat16x8_t].eltype = bfloat16_type_node;
>  
> +  /* Init FP8 element types.  */
> +  aarch64_simd_types[Mfloat8x8_t].eltype = aarch64_mfp8_type_node;
> +  aarch64_simd_types[Mfloat8x16_t].eltype = aarch64_mfp8_type_node;
> +
>for (i = 0; i < nelts; i++)
>  {
>tree eltype = aarch64_simd_types[i].eltype;
> diff --git a/gcc/config/aarch64/aarch64-simd-builtin-types.def b/gcc/config/aarch64/aarch64-simd-builtin-types.def
> index 6111cd0d4fe1136feabb36a4077cf86d13b835e2..83b2da2e7dc0962c1e5957e25c8f6232c2148fe5 100644
> --- a/gcc/config/aarch64/aarch64-simd-builtin-types.def
> +++ b/gcc/config/aarch64/aarch64-simd-builtin-types.def
> @@ -52,3 +52,5 @@
>ENTRY (Float64x2_t, V2DF, none, 13)
>ENTRY (Bfloat16x4_t, V4BF, none, 14)
>ENTRY (Bfloat16x8_t, V8BF, none, 14)
> +  ENTRY (Mfloat8x8_t, V8QI, modal_float, 13)
> +  ENTRY (Mfloat8x16_t, V16QI, modal_float, 14)
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index e376685489da055029def6b661132b5154886b57..730d9d3fa8158ef2d1d13c0f629e306e774145a0 100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -72,6 +72,9 @@ typedef __Poly16_t poly16_t;
>  typedef __Poly64_t poly64_t;
>  typedef __Poly128_t poly128_t;
>  
> +typedef __Mfloat8x8_t mfloat8x8_t;
> +typedef __Mfloat8x16_t mfloat8x16_t;
> +
>  typedef __fp16 float16_t;
>  typedef float float32_t;
>  typedef double float64_t;
> diff --git a/gcc/testsuite/gcc.target/aarch64/mfloat-init-1.c b/gcc/testsuite/gcc.target/aarch64/mfloat-init-1.c
> new file mode 100644
> index ..15a6b331fd3986476950e799d11bdef710193f1d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/mfloat-init-1.c
> @@ -0,0 +1,5 @@
> +/* { dg-do assemble } */
> +/* { dg-options "-O --save-temps" } */
> +
> +/* { dg-error "invalid conversion to type 'mfloat8_t" "" {target *-*-*} 0 } */
> +__Mfloat8x8_t const_mf8x8 () { return (__Mfloat8x8_t) { 1, 1, 1, 1, 1, 1, 1, 1 }; }
> diff --git a/gcc/testsuite/gcc.target/aarch64/movv16qi_2.c b/gcc/testsuite/gcc.target/aarch64/movv16qi_2.c
> index 08a0a19b515134742fcb121e8cf6a19600f86075..39a06db0707538996fb5a3990ef53589d0210b17 100644
> --- a/gcc/testsuite/gcc.target/aarch64/movv16qi_2.c
> +++ b/gcc/testsuite/gcc.target/aarch64/movv16qi_2.c
> @@ -17,6 +17,7 @@ TEST_GENERAL (__Bfloat16x8_t)
>  TEST_GENERAL (__Float16x8_t)
>  TEST_GENERAL (__Float32x4_t)
>  TEST_GENERAL (__Float64x2_t)
> +TEST_GENERAL (__Mfloat8x16_t)
>  
>  __Int8x16_t const_s8x8 () { return (__Int8x16_t) { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 }; }
>  __Int16x8_t const_s16x4 () { return (__Int16x8_t) { 1, 0, 1, 0, 1, 0, 1, 0 }; }
> diff --git a/gcc/testsuite/gcc.target/aarch64/movv16qi_3

Re: [PATCH] c++: Further fix for get_member_function_from_ptrfunc [PR117259]

2024-10-23 Thread Jason Merrill

On 10/23/24 12:33 PM, Jakub Jelinek wrote:

On Wed, Oct 23, 2024 at 12:27:32PM -0400, Jason Merrill wrote:

On 10/22/24 2:17 PM, Jakub Jelinek wrote:

The following testcase shows that the previous get_member_function_from_ptrfunc
changes weren't sufficient and we still have cases where
-fsanitize=undefined with pointers to member functions can cause wrong code
being generated and related false positive warnings.

The problem is that save_expr doesn't always create SAVE_EXPR, it can skip
some invariant arithmetics and in the end it could be really large
expressions which would be evaluated several times (and what is worse, with
-fsanitize=undefined those expressions then can have SAVE_EXPRs added to
their subparts for -fsanitize=bounds or -fsanitize=null or
-fsanitize=alignment instrumentation).  Tried to just build1 a SAVE_EXPR
+ add TREE_SIDE_EFFECTS instead of save_expr, but that doesn't work either,
because cp_fold happily optimizes those SAVE_EXPRs away when it sees
SAVE_EXPR operand is tree_invariant_p.


Hmm, when would that be a problem?  I wouldn't expect instance_ptr to be
tree_invariant_p.


E.g. TREE_READONLY !TREE_SIDE_EFFECTS ARRAY_REF (with some const array first
operand and some VAR_DECL etc. second operand).


That seems like a bug in tree_invariant_p.


That is tree_invariant_p, but when -fsanitize=bounds attempts to instrument
that, it sees the index is a VAR_DECL and so creates SAVE_EXPR for it.

Jakub





[Pushed] aarch64: Fix warning in aarch64_ptrue_reg

2024-10-23 Thread Andrew Pinski
After r15-4579-g9ffcf1f193b477, we get the following warning/error while 
bootstrapping on aarch64:
```
../../gcc/gcc/config/aarch64/aarch64.cc: In function ‘rtx_def* 
aarch64_ptrue_reg(machine_mode, unsigned int)’:
../../gcc/gcc/config/aarch64/aarch64.cc:3643:21: error: comparison of integer 
expressions of different signedness: ‘int’ and ‘unsigned int’ 
[-Werror=sign-compare]
 3643 |   for (int i = 0; i < vl; i++)
  |   ~~^~~~
```

This changes the type of i to unsigned to match the type of vl.

Pushed as obvious after a bootstrap/test on aarch64-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_ptrue_reg):

Signed-off-by: Andrew Pinski 
---
 gcc/config/aarch64/aarch64.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index e6d957d275d..7fbe3a7380c 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -3640,10 +3640,10 @@ aarch64_ptrue_reg (machine_mode mode, unsigned int vl)
 
   rtx_vector_builder builder (VNx16BImode, vl, 2);
 
-  for (int i = 0; i < vl; i++)
+  for (unsigned i = 0; i < vl; i++)
 builder.quick_push (CONST1_RTX (BImode));
 
-  for (int i = 0; i < vl; i++)
+  for (unsigned i = 0; i < vl; i++)
 builder.quick_push (CONST0_RTX (BImode));
 
   rtx const_vec = builder.build ();
-- 
2.43.0



Re: [PATCH 2/4] RISC-V: Implement TARGET_SCHED_PRESSURE_PREFER_NARROW [PR/114729]

2024-10-23 Thread Vineet Gupta
On 10/22/24 12:02, rep.dot@gmail.com wrote:
>> +/* { dg-final { scan-assembler-times "%sfp" 0 } } */
> scan-assembler-not, please

Fixed and also in the other patch.

Thx,
-Vineet


Re: [PATCH v2 3/4] aarch64: improve assembly debug comments for AEABI build attributes

2024-10-23 Thread Richard Sandiford
Matthieu Longo  writes:
> The previous implementation to emit AEABI build attributes did not
> support string values (asciz) in aeabi_subsection, and was not
> emitting values associated to tags in the assembly comments.
>
> This new approach provides a more user-friendly interface relying on
> typing, and improves the emitted assembly comments:
>   * aeabi_attribute:
> ** Adds the interpreted value next to the tag in the assembly comment.
> ** Supports asciz values.
>   * aeabi_subsection:
> ** Adds debug information for its parameters.
> ** Auto-detects the attribute types when declaring the subsection.
>
> Additionally, it is also interesting to note that the code was moved
> to a separate file to improve modularity and "releave" the 1000-lines

I think you dropped a 0.  I wish it was only 1000 :-)

> long aarch64.cc file from a few lines. Finally, it introduces a new
> namespace "aarch64::" for AArch64 backend which reduces the length of
> function names by not prepending 'aarch64_' to each of them.
> [...]
> diff --git a/gcc/config/aarch64/aarch64-dwarf-metadata.h b/gcc/config/aarch64/aarch64-dwarf-metadata.h
> new file mode 100644
> index 000..01f08ad073e
> --- /dev/null
> +++ b/gcc/config/aarch64/aarch64-dwarf-metadata.h
> @@ -0,0 +1,226 @@
> +/* DWARF metadata for AArch64 architecture.
> +   Copyright (C) 2024 Free Software Foundation, Inc.
> +   Contributed by ARM Ltd.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   .  */
> +
> +#ifndef GCC_AARCH64_DWARF_METADATA_H
> +#define GCC_AARCH64_DWARF_METADATA_H
> +
> +#include "system.h"

We should drop this line.  It's the .cc file's responsibility to
include system.h.

> +#include "vec.h"
> +
> +namespace aarch64 {
> +
> +enum attr_val_type : uint8_t
> +{
> +  uleb128 = 0x0,
> +  asciz = 0x1,
> +};
> +
> +enum BA_TagFeature_t : uint8_t
> +{
> +  Tag_Feature_BTI = 1,
> +  Tag_Feature_PAC = 2,
> +  Tag_Feature_GCS = 3,
> +};
> +
> +template <typename T_tag, typename T_val>
> +struct aeabi_attribute
> +{
> +  T_tag tag;
> +  T_val value;
> +};
> +
> +template <typename T_tag, typename T_val>
> +aeabi_attribute<T_tag, T_val>
> +make_aeabi_attribute (T_tag tag, T_val val)
> +{
> +  return aeabi_attribute{tag, val};
> +}
> +
> +namespace details {
> +
> +constexpr const char *
> +to_c_str (bool b)
> +{
> +  return b ? "true" : "false";
> +}
> +
> +constexpr const char *
> +to_c_str (const char *s)
> +{
> +  return s;
> +}
> +
> +constexpr const char *
> +to_c_str (attr_val_type t)
> +{
> +  return (t == uleb128 ? "ULEB128"
> +   : t == asciz ? "asciz"
> +   : nullptr);
> +}
> +
> +constexpr const char *
> +to_c_str (BA_TagFeature_t feature)
> +{
> +  return (feature == Tag_Feature_BTI ? "Tag_Feature_BTI"
> +   : feature == Tag_Feature_PAC ? "Tag_Feature_PAC"
> +   : feature == Tag_Feature_GCS ? "Tag_Feature_GCS"
> +   : nullptr);
> +}
> +
> +template <
> +  typename T,
> +  typename = typename std::enable_if<std::is_integral<T>::value, T>::type
> +>
> +constexpr const char *
> +aeabi_attr_str_fmt (T phantom __attribute__((unused)))

FWIW, it would be ok to drop the parameter name and the attribute.
But it's ok as-is too, if you think it makes the intention clearer.

> +{
> +  return "\t.aeabi_attribute %u, %u";
> +}
> +
> +constexpr const char *
> +aeabi_attr_str_fmt (const char *phantom __attribute__((unused)))
> +{
> +  return "\t.aeabi_attribute %u, \"%s\"";
> +}
> [...]
> @@ -24834,17 +24808,21 @@ aarch64_start_file (void)
> asm_fprintf (asm_out_file, "\t.arch %s\n",
>   aarch64_last_printed_arch_string.c_str ());
>  
> -  /* Check whether the current assembly supports gcs build attributes, if not
> - fallback to .note.gnu.property section.  */
> +  /* Check whether the current assembler supports AEABI build attributes, if
> + not fallback to .note.gnu.property section.  */
>  #if (HAVE_AS_AEABI_BUILD_ATTRIBUTES)

Just to note that, as with patch 2, I hope this could be:

  if (HAVE_AS_AEABI_BUILD_ATTRIBUTES)
{
  ...
}

instead.

OK with those changes, thanks.

Richard


Re: [PATCH] c++: Further fix for get_member_function_from_ptrfunc [PR117259]

2024-10-23 Thread Jakub Jelinek
On Wed, Oct 23, 2024 at 12:27:32PM -0400, Jason Merrill wrote:
> On 10/22/24 2:17 PM, Jakub Jelinek wrote:
> > The following testcase shows that the previous 
> > get_member_function_from_ptrfunc
> > changes weren't sufficient and we still have cases where
> > -fsanitize=undefined with pointers to member functions can cause wrong code
> > being generated and related false positive warnings.
> > 
> > The problem is that save_expr doesn't always create SAVE_EXPR, it can skip
> > some invariant arithmetics and in the end it could be really large
> > expressions which would be evaluated several times (and what is worse, with
> > -fsanitize=undefined those expressions then can have SAVE_EXPRs added to
> > their subparts for -fsanitize=bounds or -fsanitize=null or
> > -fsanitize=alignment instrumentation).  Tried to just build1 a SAVE_EXPR
> > + add TREE_SIDE_EFFECTS instead of save_expr, but that doesn't work either,
> > because cp_fold happily optimizes those SAVE_EXPRs away when it sees
> > SAVE_EXPR operand is tree_invariant_p.
> 
> Hmm, when would that be a problem?  I wouldn't expect instance_ptr to be
> tree_invariant_p.

E.g. TREE_READONLY !TREE_SIDE_EFFECTS ARRAY_REF (with some const array first
operand and some VAR_DECL etc. second operand).
That is tree_invariant_p, but when -fsanitize=bounds attempts to instrument
that, it sees the index is a VAR_DECL and so creates SAVE_EXPR for it.
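
For a concrete (made-up) example of such an expression:

  // 'arr[i]' is a TREE_READONLY, !TREE_SIDE_EFFECTS ARRAY_REF with a
  // const array operand and a VAR_DECL index: tree_invariant_p is true,
  // so save_expr/cp_fold won't wrap it, yet -fsanitize=bounds wants a
  // SAVE_EXPR around 'i' to instrument the index.
  static const int arr[4] = { 1, 2, 3, 4 };

  int f (int i)
  {
    return arr[i];
  }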

Jakub



RE: [Pushed] aarch64: Fix warning in aarch64_ptrue_reg

2024-10-23 Thread Pengxuan Zheng (QUIC)
My bad. Thanks for fixing this quickly, Andrew!

Thanks,
Pengxuan
> 
> After r15-4579-g9ffcf1f193b477, we get the following warning/error while
> bootstrapping on aarch64:
> ```
> ../../gcc/gcc/config/aarch64/aarch64.cc: In function ‘rtx_def*
> aarch64_ptrue_reg(machine_mode, unsigned int)’:
> ../../gcc/gcc/config/aarch64/aarch64.cc:3643:21: error: comparison of
> integer expressions of different signedness: ‘int’ and ‘unsigned int’ [-
> Werror=sign-compare]
>  3643 |   for (int i = 0; i < vl; i++)
>   |   ~~^~~~
> ```
> 
> This changes the type of i to unsigned to match the type of vl.
> 
> Pushed as obvious after a bootstrap/test on aarch64-linux-gnu.
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64.cc (aarch64_ptrue_reg):
> 
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/config/aarch64/aarch64.cc | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index e6d957d275d..7fbe3a7380c 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -3640,10 +3640,10 @@ aarch64_ptrue_reg (machine_mode mode,
> unsigned int vl)
> 
>rtx_vector_builder builder (VNx16BImode, vl, 2);
> 
> -  for (int i = 0; i < vl; i++)
> +  for (unsigned i = 0; i < vl; i++)
>  builder.quick_push (CONST1_RTX (BImode));
> 
> -  for (int i = 0; i < vl; i++)
> +  for (unsigned i = 0; i < vl; i++)
>  builder.quick_push (CONST0_RTX (BImode));
> 
>rtx const_vec = builder.build ();
> --
> 2.43.0



Re: [PATCH] c++: Further fix for get_member_function_from_ptrfunc [PR117259]

2024-10-23 Thread Jason Merrill

On 10/23/24 3:07 PM, Jakub Jelinek wrote:

On Wed, Oct 23, 2024 at 08:53:36PM +0200, Jakub Jelinek wrote:

save_expr has been doing that at least since 1992, likely before that.
Though, that
4073  /* Array ref is const/volatile if the array elements are
4074 or if the array is..  */
4075  TREE_READONLY (rval)
4076|= (CP_TYPE_CONST_P (type) | TREE_READONLY (array));
is done in the C++ FE also since 1994-ish.


Setting TREE_READONLY is correct; according to tree.h that just means
it isn't modifiable.



I'm afraid what will break especially in Ada if we change it.
Though, unsure even to what.

Are the TREE_READONLY flags needed on ARRAY_REFs/COMPONENT_REFs with
ARRAY_REF bases etc.?
If yes, are ARRAY_REFs/ARRAY_RANGE_REFs with non-invariant index (or
possibly also non-invariant 3rd/4th argument or ARRAY_RANGE_REFs with
non-invariant type size) the only problematic cases?
Say TREE_READONLY COMPONENT_REF with VAR_DECL base should be invariant
I'd hope.
So, should we for the (TREE_READONLY (t) && !TREE_SIDE_EFFECTS (t))
case walk the tree, looking for the ARRAY_REFs etc. and checking if that
is really invariant?


Perhaps INDIRECT_REF/MEM_REF are similarly a problem, one could have
TREE_READONLY INDIRECT_REF or say COMPONENT_REF with INDIRECT_REF first
operand etc. and if the pointer which is dereferenced isn't invariant,
then the INDIRECT_REF/MEM_REF isn't invariant either.


Exactly.

Jason



[PATCH 3/2] c++: remove WILDCARD_DECL

2024-10-23 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?

-- >8 --

This tree code was added as part of the initial Concepts TS
implementation to support type-constraints introducing any kind
of template-parameter, not just type template-parameters, e.g.

  template<int N> concept C = ...;
  template<template<typename> class TT> concept D = ...;

  template<C T, D U> void f(); // T is an NTTP of type int, U is a TTP

When resolving the type-constraint we would use WILDCARD_DECL as the
dummy first argument during template argument coercion that is a valid
argument for any kind of template parameter.

But Concepts TS support has been removed, and C++20 type-constraints are
restricted to only introduce type template-parameters, and so we don't
need this catch-all WILDCARD_DECL anymore; we can instead use an auto
as the dummy first argument.
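
For example (a sketch of the C++20 rule):

  // In C++20 the type-constraint 'C' below can only introduce a type
  // template-parameter, so a plain 'auto' placeholder is a valid dummy
  // first argument during coercion.
  template<typename T> concept C = true;

  template<C T>  // equivalent to: template<typename T> requires C<T>
  void f (T) {}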

In passing introduce a helper for returning the prototype parameter
(i.e. first template parameter) of a concept and use it.  Also remove a
redundant concept_definition_p overload.

gcc/cp/ChangeLog:

* constraint.cc (build_type_constraint): Use an auto as the
first template argument.
(finish_shorthand_constraint): Use concept_prototype_parameter.
* cp-objcp-common.cc (cp_common_init_ts): Remove WILDCARD_DECL
handling.
* cp-tree.def (WILDCARD_DECL): Remove.
* cp-tree.h (WILDCARD_PACK_P): Remove.
(concept_definition_p): Remove redundant overload.
(concept_prototype_parameter): Define.
* error.cc (dump_decl) <WILDCARD_DECL>: Remove.
(dump_expr) <WILDCARD_DECL>: Likewise.
* parser.cc (cp_parser_placeholder_type_specifier): Check
the prototype parameter earlier, before build_type_constraint.
Use concept_prototype_parameter.
* pt.cc (convert_wildcard_argument): Remove.
(convert_template_argument): Remove WILDCARD_DECL handling.
(coerce_template_parameter_pack): Likewise.
(tsubst) <WILDCARD_DECL>: Likewise.
(type_dependent_expression_p): Likewise.
(placeholder_type_constraint_dependent_p): Likewise.
---
 gcc/cp/constraint.cc  |  6 ++
 gcc/cp/cp-objcp-common.cc |  1 -
 gcc/cp/cp-tree.def|  6 --
 gcc/cp/cp-tree.h  | 27 ++-
 gcc/cp/error.cc   |  5 -
 gcc/cp/parser.cc  | 31 +++
 gcc/cp/pt.cc  | 37 ++---
 7 files changed, 33 insertions(+), 80 deletions(-)

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 9394bea8835..d6a6ac03393 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -1154,9 +1154,8 @@ build_concept_id (tree expr)
 tree
 build_type_constraint (tree decl, tree args, tsubst_flags_t complain)
 {
-  tree wildcard = build_nt (WILDCARD_DECL);
   ++processing_template_decl;
-  tree check = build_concept_check (decl, wildcard, args, complain);
+  tree check = build_concept_check (decl, make_auto (), args, complain);
   --processing_template_decl;
   return check;
 }
@@ -1203,8 +1202,7 @@ finish_shorthand_constraint (tree decl, tree constr, bool is_non_type)
 {
   tree id = PLACEHOLDER_TYPE_CONSTRAINTS (constr);
   tree tmpl = TREE_OPERAND (id, 0);
-  tree parms = DECL_INNERMOST_TEMPLATE_PARMS (tmpl);
-  proto = TREE_VALUE (TREE_VEC_ELT (parms, 0));
+  proto = concept_prototype_parameter (tmpl);
   con = DECL_TEMPLATE_RESULT (tmpl);
   args = TREE_OPERAND (id, 1);
 }
diff --git a/gcc/cp/cp-objcp-common.cc b/gcc/cp/cp-objcp-common.cc
index cd379514991..69eed72a5a2 100644
--- a/gcc/cp/cp-objcp-common.cc
+++ b/gcc/cp/cp-objcp-common.cc
@@ -624,7 +624,6 @@ cp_common_init_ts (void)
 
   /* New decls.  */
   MARK_TS_DECL_COMMON (TEMPLATE_DECL);
-  MARK_TS_DECL_COMMON (WILDCARD_DECL);
 
   MARK_TS_DECL_NON_COMMON (USING_DECL);
 
diff --git a/gcc/cp/cp-tree.def b/gcc/cp/cp-tree.def
index 18f75108c7b..53511a6d8cc 100644
--- a/gcc/cp/cp-tree.def
+++ b/gcc/cp/cp-tree.def
@@ -487,12 +487,6 @@ DEFTREECODE (OMP_DEPOBJ, "omp_depobj", tcc_statement, 2)
 /* Used to represent information associated with constrained declarations. */
 DEFTREECODE (CONSTRAINT_INFO, "constraint_info", tcc_exceptional, 0)
 
-/* A wildcard declaration is a placeholder for a template parameter
-   used to resolve constrained-type-names in concepts.  During
-   resolution, the matching argument is saved as the TREE_TYPE
-   of the wildcard.  */
-DEFTREECODE (WILDCARD_DECL, "wildcard_decl", tcc_declaration, 0)
-
 /* A requires-expr has three operands. The first operand is
its parameter list (possibly NULL). The second is a list of
requirements, which are denoted by the _REQ* tree codes
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 6dcf32b178e..c25dafd5981 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -438,7 +438,6 @@ extern GTY(()) tree cp_global_trees[CPTI_MAX];
   TINFO_HAS_ACCESS_ERRORS (in TEMPLATE_INFO)
   SIZEOF_EXPR_TYPE_P (in SIZEOF_EXPR)
   COMPOUND_REQ_NOEXCEPT_P (in COM
