date:20240827

Re: [PATCH 1/2] ipa: Treat static constructors and destructors as non-local (PR 115815)

2024-08-27 Thread Martin Jambor

Hello,

and ping please.

Martin


On Fri, Aug 09 2024, Martin Jambor wrote:
> Hello,
>
> and ping please.
>
> Martin
>
> On Fri, Jul 26 2024, Martin Jambor wrote:
>> Hi,
>>
>> in PR 115815, IPA-SRA thought it had control over all invocations of a
>> (recursive) static destructor but it did not see the implied
>> invocation which led to the original being left behind and the
>> clean-up code encountering uses of SSAs that definitely should have
>> been dead.
>>
>> Fixed by teaching cgraph_node::can_be_local_p about static
>> constructors and destructors.  A similar test is missing in
>> cgraph_node::local_p, so I added the check there as well.
>>
>> Bootstrapped and tested on x86_64-linux.  OK for master and after a
>> while to gcc14 and gcc13 release branches?
>>
>> Thanks,
>>
>> Martin
>>
>>
>> gcc/ChangeLog:
>>
>> 2024-07-25  Martin Jambor  
>>
>>  PR ipa/115815
>>  * cgraph.cc (cgraph_node_cannot_be_local_p_1): Also check
>>  DECL_STATIC_CONSTRUCTOR and DECL_STATIC_DESTRUCTOR.
>>  * ipa-visibility.cc (non_local_p): Likewise.
>>  (cgraph_node::local_p): Delete extraneous line of tabs.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2024-07-25  Martin Jambor  
>>
>>  PR ipa/115815
>>  * gcc.dg/lto/pr115815_0.c: New test.
>> ---
>>  gcc/cgraph.cc |  4 +++-
>>  gcc/ipa-visibility.cc |  5 +++--
>>  gcc/testsuite/gcc.dg/lto/pr115815_0.c | 18 ++
>>  3 files changed, 24 insertions(+), 3 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/lto/pr115815_0.c
>>
>> diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
>> index 473d8410bc9..39a3adbc7c3 100644
>> --- a/gcc/cgraph.cc
>> +++ b/gcc/cgraph.cc
>> @@ -2434,7 +2434,9 @@ cgraph_node_cannot_be_local_p_1 (cgraph_node *node, void *)
>>  && !node->forced_by_abi
>>  && !node->used_from_object_file_p ()
>>  && !node->same_comdat_group)
>> -   || !node->externally_visible));
>> +   || !node->externally_visible)
>> +   && !DECL_STATIC_CONSTRUCTOR (node->decl)
>> +   && !DECL_STATIC_DESTRUCTOR (node->decl));
>>  }
>>  
>>  /* Return true if cgraph_node can be made local for API change.
>> diff --git a/gcc/ipa-visibility.cc b/gcc/ipa-visibility.cc
>> index 501d3c304aa..21f0c47f388 100644
>> --- a/gcc/ipa-visibility.cc
>> +++ b/gcc/ipa-visibility.cc
>> @@ -102,7 +102,9 @@ non_local_p (struct cgraph_node *node, void *data ATTRIBUTE_UNUSED)
>> && !node->externally_visible
>> && !node->used_from_other_partition
>> && !node->in_other_partition
>> -   && node->get_availability () >= AVAIL_AVAILABLE);
>> +   && node->get_availability () >= AVAIL_AVAILABLE
>> +   && !DECL_STATIC_CONSTRUCTOR (node->decl)
>> +   && !DECL_STATIC_DESTRUCTOR (node->decl));
>>  }
>>  
>>  /* Return true when function can be marked local.  */
>> @@ -116,7 +118,6 @@ cgraph_node::local_p (void)
>>   return n->callees->callee->local_p ();
>> return !n->call_for_symbol_thunks_and_aliases (non_local_p,
>>NULL, true);
>> -
>>  }
>>  
>>  /* A helper for comdat_can_be_unshared_p.  */
>> diff --git a/gcc/testsuite/gcc.dg/lto/pr115815_0.c b/gcc/testsuite/gcc.dg/lto/pr115815_0.c
>> new file mode 100644
>> index 000..d938ae4c802
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/lto/pr115815_0.c
>> @@ -0,0 +1,18 @@
>> +int a;
>> +volatile int v;
>> +volatile int w;
>> +
>> +int __attribute__((destructor))
>> +b() {
>> +  if (v)
>> +    return a + b();
>> +  v = 5;
>> +  return 0;
>> +}
>> +
>> +int
>> +main (int argc, char **argv)
>> +{
>> +  w = 1;
>> +  return 0;
>> +}
>> -- 
>> 2.45.2
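The implied invocation that IPA-SRA missed in PR 115815 can be illustrated with a minimal sketch (the names below are hypothetical and not taken from the patch). A function marked with GCC's constructor attribute has no call site anywhere in the translation unit, yet the startup code still calls it before main, so its visible callers are not the complete set. Destructors behave the same way, except that the runtime calls them after main returns.

```c
#include <assert.h>  /* for the checks in the usage example */

/* Hypothetical counter recording how often the constructor has run.  */
static int ctor_calls;

/* No call to this function appears in the source.  The startup code
   invokes it before main (for example via .init_array on typical ELF
   targets), so an IPA pass that only looks at visible call sites would
   wrongly conclude it controls every invocation.  */
static void __attribute__((constructor))
count_ctor_call (void)
{
  ctor_calls++;
}
```

Checking `ctor_calls == 1` at the start of main succeeds even though nothing in the program calls `count_ctor_call` directly.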

Re: [PATCH 2/2] ipa: Move pass_ipa_cdtor_merge before pass_ipa_cp and pass_ipa_sra

2024-08-27 Thread Martin Jambor

Hello,

and ping please.

Martin

On Fri, Aug 09 2024, Martin Jambor wrote:
> Hello,
>
> and ping please.
>
> Martin
>
> On Fri, Jul 26 2024, Martin Jambor wrote:
>> Hi,
>>
>> when looking at PR 115815, we realized that it would make sense to make
>> the calls that pass_ipa_cdtor_merge creates to functions originally
>> declared as static constructors and destructors visible to IPA-SRA.
>> This patch does that by running pass_ipa_cdtor_merge before pass_ipa_cp
>> and pass_ipa_sra.
>>
>> Bootstrapped and tested on x86_64-linux.  OK for master?
>>
>> Thanks,
>>
>> Martin
>>
>>
>> gcc/ChangeLog:
>>
>> 2024-07-25  Martin Jambor  
>>
>>  * passes.def: Move pass_ipa_cdtor_merge before pass_ipa_cp and
>>  pass_ipa_sra.
>> ---
>>  gcc/passes.def | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/gcc/passes.def b/gcc/passes.def
>> index b06d6d45f63..33b2c10c9c9 100644
>> --- a/gcc/passes.def
>> +++ b/gcc/passes.def
>> @@ -157,9 +157,9 @@ along with GCC; see the file COPYING3.  If not see
>>    NEXT_PASS (pass_ipa_profile);
>>    NEXT_PASS (pass_ipa_icf);
>>    NEXT_PASS (pass_ipa_devirt);
>> +  NEXT_PASS (pass_ipa_cdtor_merge);
>>    NEXT_PASS (pass_ipa_cp);
>>    NEXT_PASS (pass_ipa_sra);
>> -  NEXT_PASS (pass_ipa_cdtor_merge);
>>    NEXT_PASS (pass_ipa_fn_summary);
>>    NEXT_PASS (pass_ipa_inline);
>>    NEXT_PASS (pass_ipa_pure_const);
>> -- 
>> 2.45.2

Re: PING^5 [PATCH] rs6000: Adjust -fpatchable-function-entry* support for dual entry [PR112980]

2024-08-27 Thread Martin Jambor

Hi,

On Fri, Aug 09 2024, Kewen.Lin wrote:
> Hi,
>
> Gentle ping this patch:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651025.html

I'd like to second this ping, please.

Thank you,

Martin


>
> BR,
> Kewen
>
>>> on 2024/7/12 00:15, Martin Jambor wrote:
 Hi,

 can I add myself to the bunch of people who are pinging this?  Having
 this in will make our life easier.

 Thanks a lot,

 Martin


 On Wed, May 08 2024, Kewen.Lin wrote:
> Hi,
>
> As discussed in PR112980, although the current
> implementation of -fpatchable-function-entry* conforms
> with the documentation (keeping the N NOPs consecutive),
> it is inefficient for both kernel and userspace livepatching
> (see the comments in the PR for details).
>
> So this patch changes the current implementation to emit
> the "before" NOPs before the global entry point and the
> "after" NOPs after the local entry point.  Since the new
> behavior no longer keeps the NOPs consecutive, the
> documentation is updated to emphasize this.
>
> Bootstrapped and regress-tested on powerpc64-linux-gnu
> P8/P9 and powerpc64le-linux-gnu P9 and P10.
>
> Is it ok for trunk?  And backporting to active branches
> after burn-in time?  I guess we should also mention this
> change in changes.html?
>
> BR,
> Kewen
> -
>   PR target/112980
>
> gcc/ChangeLog:
>
>   * config/rs6000/rs6000-logue.cc (rs6000_output_function_prologue):
>   Adjust the handling on patch area emitting with dual entry, remove
>   the restriction on "before" NOPs count, not emit "before" NOPs any
>   more but only emit "after" NOPs.
>   * config/rs6000/rs6000.cc (rs6000_print_patchable_function_entry):
>   Adjust by respecting cfun->machine->stop_patch_area_print.
>   (rs6000_elf_declare_function_name): For ELFv2 with dual entry, set
>   cfun->machine->stop_patch_area_print as true.
>   * config/rs6000/rs6000.h (struct machine_function): Remove member
>   global_entry_emitted, add new member stop_patch_area_print.
>   * doc/invoke.texi (option -fpatchable-function-entry): Adjust the
>   documentation for PowerPC ELFv2 dual entry.
>
> gcc/testsuite/ChangeLog:
>
>   * c-c++-common/patchable_function_entry-default.c: Adjust.
>   * gcc.target/powerpc/pr99888-4.c: Likewise.
>   * gcc.target/powerpc/pr99888-5.c: Likewise.
>   * gcc.target/powerpc/pr99888-6.c: Likewise.
> ---
>  gcc/config/rs6000/rs6000-logue.cc | 40 +--
>  gcc/config/rs6000/rs6000.cc   | 15 +--
>  gcc/config/rs6000/rs6000.h    | 10 +++--
>  gcc/doc/invoke.texi   |  8 ++--
>  .../patchable_function_entry-default.c    |  3 --
>  gcc/testsuite/gcc.target/powerpc/pr99888-4.c  |  4 +-
>  gcc/testsuite/gcc.target/powerpc/pr99888-5.c  |  4 +-
>  gcc/testsuite/gcc.target/powerpc/pr99888-6.c  |  4 +-
>  8 files changed, 33 insertions(+), 55 deletions(-)
>
> diff --git a/gcc/config/rs6000/rs6000-logue.cc b/gcc/config/rs6000/rs6000-logue.cc
> index 60ba15a8bc3..0eb019b44b3 100644
> --- a/gcc/config/rs6000/rs6000-logue.cc
> +++ b/gcc/config/rs6000/rs6000-logue.cc
> @@ -4006,43 +4006,21 @@ rs6000_output_function_prologue (FILE *file)
>     fprintf (file, "\tadd 2,2,12\n");
>   }
>
> -  unsigned short patch_area_size = crtl->patch_area_size;
> -  unsigned short patch_area_entry = crtl->patch_area_entry;
> -  /* Need to emit the patching area.  */
> -  if (patch_area_size > 0)
> - {
> -   cfun->machine->global_entry_emitted = true;
> -   /* As ELFv2 ABI shows, the allowable bytes between the global
> -  and local entry points are 0, 4, 8, 16, 32 and 64 when
> -  there is a local entry point.  Considering there are two
> -  non-prefixed instructions for global entry point prologue
> -  (8 bytes), the count for patchable nops before local entry
> -  point would be 2, 6 and 14.  It's possible to support those
> -  other counts of nops by not making a local entry point, but
> -  we don't have clear use cases for them, so leave them
> -  unsupported for now.  */
> -   if (patch_area_entry > 0)
> -     {
> -   if (patch_area_entry != 2
> -   && patch_area_entry != 6
> -   && patch_area_entry != 14)
> - error ("unsupported number of nops before function entry (%u)",
> -    patch_area_entry);
> -   rs6000_print_patchable_function_entry (file, patch_area_entry,
> -  true);
> -   patch_area_
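For readers less familiar with the option, GCC also offers a per-function form, the `patchable_function_entry (N, M)` attribute, which requests N NOPs with the function entry point placed before the Mth one (so M NOPs precede the entry). Under the change proposed in this thread, an ELFv2 function with a dual entry would get its M "before" NOPs ahead of the global entry point and the remaining N - M after the local entry point. The sketch below (function name hypothetical) only shows the attribute's syntax; the NOPs themselves are visible only in the generated assembly.

```c
#include <assert.h>  /* for the checks in the usage example */

/* Request 4 patchable NOPs in total, 2 of them before the entry label.
   Per the proposal above, on ELFv2 with a dual entry the 2 "before"
   NOPs would precede the global entry point and the other 2 would
   follow the local entry point.  */
__attribute__((patchable_function_entry (4, 2)))
int
add_one (int x)
{
  return x + 1;
}
```

The function behaves exactly like an unannotated one; only the padding around its entry changes.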

[PATCH v2] Vect: Reconcile the const_int operand type of unsigned .SAT_ADD

2024-08-27 Thread pan2 . li

From: Pan Li 

The .SAT_ADD has 2 operands, one of which may be an INTEGER_CST.
For example, _1 = .SAT_ADD (_2, 9) comes from the sample code below.

Form 3:
  #define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM) \
  T __attribute__((noinline)) \
  vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \
  { \
    unsigned i; \
    T ret; \
    for (i = 0; i < limit; i++) \
      { \
        out[i] = __builtin_add_overflow (in[i], IMM, &ret) ? -1 : ret; \
      } \
  }

DEF_VEC_SAT_U_ADD_IMM_FMT_3(uint64_t, 9)
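To make the semantics of form 3 concrete, here is a hand-expanded, non-macro version of DEF_VEC_SAT_U_ADD_IMM_FMT_3 (uint64_t, 9), written as a void function for this sketch (the function name is hypothetical). Each element saturates to UINT64_MAX when adding 9 would overflow. Note that the literal 9 has type int, which is the source of the SImode/DImode mismatch discussed next.

```c
#include <assert.h>  /* for the checks in the usage example */
#include <stdint.h>

/* Hand-expanded equivalent of DEF_VEC_SAT_U_ADD_IMM_FMT_3 (uint64_t, 9);
   the name is hypothetical.  */
static void
sat_u_add_imm9_u64 (uint64_t *out, const uint64_t *in, unsigned limit)
{
  for (unsigned i = 0; i < limit; i++)
    {
      uint64_t ret;
      /* 9 is an int constant while in[i] is uint64_t.  If the exact sum
         does not fit in uint64_t, saturate to the maximum, which is what
         the -1 in the macro converts to.  */
      out[i] = __builtin_add_overflow (in[i], 9, &ret) ? UINT64_MAX : ret;
    }
}
```

A loop of this shape is what the vectorizer is expected to turn into a vector .SAT_ADD once the constant's type matches the other operand.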

It fails to vectorize because vectorizable_call checks that the operands
are type-compatible, but the imm is (const_int 9) in SImode, which
differs from _2 (DImode), i.e.:

uint64_t _1;
uint64_t _2;
_1 = .SAT_ADD (_2, 9);

This patch reconciles the imm operand to the type (mode) of _2 if and
only if no precision or data is lost; in the example above, it converts
the imm 9 to DImode.
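The "no precision/data loss" condition can be sketched in plain C. The real change operates on GCC trees in tree-vect-patterns.cc (vect_recog_reconcile_cst_to_unsigned); the helper below is a hypothetical stand-in that only illustrates the check, assuming the constant is held in an int64_t and the target type is unsigned with 1 to 64 bits of precision.

```c
#include <assert.h>  /* for the checks in the usage example */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not GCC's API: return true if the integer constant
   VALUE can be converted to an unsigned type of PRECISION bits (1..64)
   without changing its value.  */
static bool
cst_fits_unsigned_precision (int64_t value, unsigned precision)
{
  if (value < 0)
    return false;  /* a negative constant would change value */
  if (precision >= 64)
    return true;   /* every non-negative int64_t fits in 64 bits */
  return ((uint64_t) value >> precision) == 0;
}
```

For the example above, `cst_fits_unsigned_precision (9, 64)` is true, so the constant 9 can safely become a 64-bit unsigned operand, while 256 would be rejected for an 8-bit unsigned operand.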

The following test suites pass with this patch:
1. The rv64gcv full regression tests.
2. The rv64gcv build with glibc.
3. The x86 bootstrap tests.
4. The x86 full regression tests.

gcc/ChangeLog:

* tree-vect-patterns.cc (vect_recog_reconcile_cst_to_unsigned):
Add new func impl to reconcile the cst int type to given TREE type.
(vect_recog_sat_add_pattern): Reconcile the ops of .SAT_ADD
before building the gimple call.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-10.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-11.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-12.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-13.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-14.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-15.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-3.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-4.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-5.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-6.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-7.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-8.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-9.c: New test.

Signed-off-by: Pan Li 
---
 .../binop/vec_sat_u_add_imm_reconcile-1.c |  9 +
 .../binop/vec_sat_u_add_imm_reconcile-10.c|  9 +
 .../binop/vec_sat_u_add_imm_reconcile-11.c|  9 +
 .../binop/vec_sat_u_add_imm_reconcile-12.c|  9 +
 .../binop/vec_sat_u_add_imm_reconcile-13.c|  9 +
 .../binop/vec_sat_u_add_imm_reconcile-14.c|  9 +
 .../binop/vec_sat_u_add_imm_reconcile-15.c|  9 +
 .../binop/vec_sat_u_add_imm_reconcile-2.c |  9 +
 .../binop/vec_sat_u_add_imm_reconcile-3.c |  9 +
 .../binop/vec_sat_u_add_imm_reconcile-4.c |  9 +
 .../binop/vec_sat_u_add_imm_reconcile-5.c |  9 +
 .../binop/vec_sat_u_add_imm_reconcile-6.c |  9 +
 .../binop/vec_sat_u_add_imm_reconcile-7.c |  9 +
 .../binop/vec_sat_u_add_imm_reconcile-8.c |  9 +
 .../binop/vec_sat_u_add_imm_reconcile-9.c |  9 +
 .../riscv/rvv/autovec/vec_sat_arith.h | 20 ++
 gcc/tree-vect-patterns.cc | 38 +++
 17 files changed, 193 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-13.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-14.c
 create mode 100644 gcc/testsuite/gcc.tar

[PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar .SAT_SUB IMM form 3

2024-08-27 Thread pan2 . li

From: Pan Li 

This patch adds test cases for the unsigned scalar
.SAT_SUB IMM form 3, i.e.:

Form 3:
  #define DEF_SAT_U_SUB_IMM_FMT_3(T, IMM) \
  T __attribute__((noinline)) \
  sat_u_sub_imm##IMM##_##T##_fmt_3 (T y)  \
  {   \
return (T)IMM > y ? (T)IMM - y : 0;   \
  }

DEF_SAT_U_SUB_IMM_FMT_3(uint64_t, 23)
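Form 3 computes IMM - y saturated at zero. A quick way to sanity-check the macro is to compare it against a reference built on `__builtin_sub_overflow`, which reports whether the unsigned subtraction wrapped. The macro below is copied from the message; the reference function name is hypothetical.

```c
#include <assert.h>  /* for the checks in the usage example */
#include <stdint.h>

/* Macro copied from the message above.  */
#define DEF_SAT_U_SUB_IMM_FMT_3(T, IMM) \
T __attribute__((noinline)) \
sat_u_sub_imm##IMM##_##T##_fmt_3 (T y) \
{ \
  return (T)IMM > y ? (T)IMM - y : 0; \
}

/* Defines sat_u_sub_imm23_uint64_t_fmt_3.  */
DEF_SAT_U_SUB_IMM_FMT_3 (uint64_t, 23)

/* Hypothetical reference: 23 - y, or 0 if the subtraction would wrap.  */
static uint64_t
ref_sat_u_sub_imm23 (uint64_t y)
{
  uint64_t r;
  return __builtin_sub_overflow ((uint64_t) 23, y, &r) ? 0 : r;
}
```

Both functions return 23 for y == 0 and 0 for every y >= 23.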

The following test passes with this patch:
* The rv64gcv regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_u_sub_imm-10.c: New test.
* gcc.target/riscv/sat_u_sub_imm-10_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-10_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-11.c: New test.
* gcc.target/riscv/sat_u_sub_imm-11_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-11_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-12.c: New test.
* gcc.target/riscv/sat_u_sub_imm-9.c: New test.
* gcc.target/riscv/sat_u_sub_imm-9_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-9_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-10.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-11.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-12.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-9.c: New test.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/sat_arith.h|  9 +++
 .../gcc.target/riscv/sat_u_sub_imm-10.c   | 21 +++
 .../gcc.target/riscv/sat_u_sub_imm-10_1.c | 22 
 .../gcc.target/riscv/sat_u_sub_imm-10_2.c | 22 
 .../gcc.target/riscv/sat_u_sub_imm-11.c   | 20 +++
 .../gcc.target/riscv/sat_u_sub_imm-11_1.c | 22 
 .../gcc.target/riscv/sat_u_sub_imm-11_2.c | 22 
 .../gcc.target/riscv/sat_u_sub_imm-12.c   | 19 +++
 .../gcc.target/riscv/sat_u_sub_imm-9.c| 20 +++
 .../gcc.target/riscv/sat_u_sub_imm-9_1.c  | 20 +++
 .../gcc.target/riscv/sat_u_sub_imm-9_2.c  | 20 +++
 .../gcc.target/riscv/sat_u_sub_imm-run-10.c   | 56 +++
 .../gcc.target/riscv/sat_u_sub_imm-run-11.c   | 55 ++
 .../gcc.target/riscv/sat_u_sub_imm-run-12.c   | 48 
 .../gcc.target/riscv/sat_u_sub_imm-run-9.c| 56 +++
 15 files changed, 432 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-10_1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-10_2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-11_1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-11_2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-9_1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-9_2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-run-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-run-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-run-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-run-9.c

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index c8ff8320d82..b4339eb0dff 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -231,6 +231,13 @@ sat_u_sub_imm##IMM##_##T##_fmt_2 (T x)  \
   return x >= (T)IMM ? x - (T)IMM : 0;  \
 }
 
+#define DEF_SAT_U_SUB_IMM_FMT_3(T, IMM) \
+T __attribute__((noinline)) \
+sat_u_sub_imm##IMM##_##T##_fmt_3 (T y)  \
+{   \
+  return (T)IMM > y ? (T)IMM - y : 0;   \
+}
+
 #define RUN_SAT_U_SUB_FMT_1(T, x, y) sat_u_sub_##T##_fmt_1(x, y)
 #define RUN_SAT_U_SUB_FMT_2(T, x, y) sat_u_sub_##T##_fmt_2(x, y)
 #define RUN_SAT_U_SUB_FMT_3(T, x, y) sat_u_sub_##T##_fmt_3(x, y)
@@ -248,6 +255,8 @@ sat_u_sub_imm##IMM##_##T##_fmt_2 (T x)  \
   if (sat_u_sub_imm##IMM##_##T##_fmt_1(y) != expect) __builtin_abort ()
 #define RUN_SAT_U_SUB_IMM_FMT_2(T, x, IMM, expect) \
   if (sat_u_sub_imm##IMM##_##T##_fmt_2(x) != expect) __builtin_abort ()
+#define RUN_SAT_U_SUB_IMM_FMT_3(T, IMM, y, expect) \
+  if (sat_u_sub_imm##IMM##_##T##_fmt_3(y) != expect) __builtin_abort ()
 
 
/**/
 /* Saturation Truncate (unsigned and signed)  */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-10.c b/gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-10.c
new file mode 100644
index 000..db450d7cfbf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-10.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options

[PATCH v1 2/2] RISC-V: Add testcases for unsigned scalar .SAT_SUB IMM form 4

2024-08-27 Thread pan2 . li

From: Pan Li 

This patch adds test cases for the unsigned scalar
.SAT_SUB IMM form 4, i.e.:

Form 4:
  #define DEF_SAT_U_SUB_IMM_FMT_4(T, IMM) \
  T __attribute__((noinline)) \
  sat_u_sub_imm##IMM##_##T##_fmt_4 (T x)  \
  {   \
return x > (T)IMM ? x - (T)IMM : 0;   \
  }

DEF_SAT_U_SUB_IMM_FMT_4(uint64_t, 23)
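Form 4 differs from form 2 (whose body, `x >= (T)IMM ? x - (T)IMM : 0`, appears as diff context in the form 3 patch above) only in using `>` instead of `>=`. The two agree for every input, because at x == IMM both yield 0, so it is reasonable to expect both to be recognized as the same saturating subtraction. The hypothetical functions below hand-expand both bodies for uint64_t and IMM = 23 so the equivalence can be checked directly.

```c
#include <assert.h>  /* for the checks in the usage example */
#include <stdint.h>

/* Same body as the fmt_2 helper in sat_arith.h, for uint64_t and IMM = 23.  */
static uint64_t
sat_u_sub_23_fmt_2 (uint64_t x)
{
  return x >= (uint64_t) 23 ? x - (uint64_t) 23 : 0;
}

/* Same body as DEF_SAT_U_SUB_IMM_FMT_4 (uint64_t, 23).  */
static uint64_t
sat_u_sub_23_fmt_4 (uint64_t x)
{
  return x > (uint64_t) 23 ? x - (uint64_t) 23 : 0;
}
```

The only input where the two conditions differ is x == 23, and there both branches produce 0.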

The following test passes with this patch:
* The rv64gcv regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_u_sub_imm-13.c: New test.
* gcc.target/riscv/sat_u_sub_imm-13_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-13_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-14.c: New test.
* gcc.target/riscv/sat_u_sub_imm-14_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-14_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-15.c: New test.
* gcc.target/riscv/sat_u_sub_imm-15_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-15_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-16.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-13.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-14.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-15.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-16.c: New test.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/sat_arith.h|  9 +++
 .../gcc.target/riscv/sat_u_sub_imm-13.c   | 19 +++
 .../gcc.target/riscv/sat_u_sub_imm-13_1.c | 19 +++
 .../gcc.target/riscv/sat_u_sub_imm-13_2.c | 19 +++
 .../gcc.target/riscv/sat_u_sub_imm-14.c   | 20 +++
 .../gcc.target/riscv/sat_u_sub_imm-14_1.c | 21 +++
 .../gcc.target/riscv/sat_u_sub_imm-14_2.c | 22 
 .../gcc.target/riscv/sat_u_sub_imm-15.c   | 19 +++
 .../gcc.target/riscv/sat_u_sub_imm-15_1.c | 21 +++
 .../gcc.target/riscv/sat_u_sub_imm-15_2.c | 22 
 .../gcc.target/riscv/sat_u_sub_imm-16.c   | 18 ++
 .../gcc.target/riscv/sat_u_sub_imm-run-13.c   | 55 +++
 .../gcc.target/riscv/sat_u_sub_imm-run-14.c   | 55 +++
 .../gcc.target/riscv/sat_u_sub_imm-run-15.c   | 54 ++
 .../gcc.target/riscv/sat_u_sub_imm-run-16.c   | 48 
 15 files changed, 421 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-13.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-13_1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-13_2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-14.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-14_1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-14_2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-15.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-15_1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-15_2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-run-13.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-run-14.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-run-15.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-run-16.c

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index b4339eb0dff..a899979904b 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -238,6 +238,13 @@ sat_u_sub_imm##IMM##_##T##_fmt_3 (T y)  \
   return (T)IMM > y ? (T)IMM - y : 0;   \
 }
 
+#define DEF_SAT_U_SUB_IMM_FMT_4(T, IMM) \
+T __attribute__((noinline)) \
+sat_u_sub_imm##IMM##_##T##_fmt_4 (T x)  \
+{   \
+  return x > (T)IMM ? x - (T)IMM : 0;   \
+}
+
 #define RUN_SAT_U_SUB_FMT_1(T, x, y) sat_u_sub_##T##_fmt_1(x, y)
 #define RUN_SAT_U_SUB_FMT_2(T, x, y) sat_u_sub_##T##_fmt_2(x, y)
 #define RUN_SAT_U_SUB_FMT_3(T, x, y) sat_u_sub_##T##_fmt_3(x, y)
@@ -257,6 +264,8 @@ sat_u_sub_imm##IMM##_##T##_fmt_3 (T y)  \
   if (sat_u_sub_imm##IMM##_##T##_fmt_2(x) != expect) __builtin_abort ()
 #define RUN_SAT_U_SUB_IMM_FMT_3(T, IMM, y, expect) \
   if (sat_u_sub_imm##IMM##_##T##_fmt_3(y) != expect) __builtin_abort ()
+#define RUN_SAT_U_SUB_IMM_FMT_4(T, x, IMM, expect) \
+  if (sat_u_sub_imm##IMM##_##T##_fmt_4(x) != expect) __builtin_abort ()
 
 
/**/
 /* Saturation Truncate (unsigned and signed)  */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-13.c b/gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-13.c
new file mode 100644
index 000..7dcbc3b1a12
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-13.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-opt

Re: [RFC/RFA] [PATCH v2 09/12] Add symbolic execution support.

2024-08-27 Thread Richard Biener

On Mon, Aug 26, 2024 at 5:26 PM Matevos Mehrabyan
 wrote:
>
>
>
> On Mon, Aug 26, 2024 at 2:44 AM Jeff Law  wrote:
>>
>>
>>
>> On 8/20/24 5:41 AM, Richard Biener wrote:
>>
>> >
>> > So the store-merging variant IIRC tracks a single overall source
>> > only (unless it was extended and I missed that) and operates at
>> > a byte granularity.  I did want to extend it to support vector shuffles
>> > at one point (with two sources then), but didn't get to it.  The
>> > current implementation manages to be quite efficient - mainly due
>> > to the restriction to a single source I guess.
>> >
>> > How does that compare to the symbolic execution engine?
>> >
>> > What can the symbolic execution engine handle?  The store-merging
>> > machinery can only handle plain copies, can the symbolic
>> > execution engine tell that for example bits 3-7 are bits 1-5 from A
>> > plus constant 9 with appropriately truncated result?
>> Conceptually this is the kind of thing it's supposed to handle, but
>> there may be implementation details that are missing for the case you want.
>>
>> More importantly, the execution engine has a limited set of expressions
>> it knows how to evaluate, so there's a reasonable chance if you feed it
>> something more general than what is typically seen in a CRC loop that
>> it's going to give up because it doesn't know how to handle more than
>> just a few operators.
>>
>>
>
> By using this symbolic execution engine, you can determine that bits 3-7 are 
> bits 1-5 from A.
> I think the documentation will help others to understand how it works and 
> what it does.
> Since the documentation is not ready, here is a simple demo example:
> For the following code:
>
> foo(byte A) {
> byte tmp = A ^ 5;
> byte result = tmp << 2;
> result = result | 4;
> return result;
> }
>
> the symbolic executor would:
>
> define(A);  // A = <A7, A6, A5, A4, A3, A2, A1, A0>
> // Here, each bit of A is mapped to its origin A.  So A[3]->get_origin()
> // will return A.
> // Besides that, each bit has an index field that denotes its initial
> // position.  So A[3]->get_index() will return 3 even if it is moved or
> // assigned to another variable.
> xor(tmp, A, 5);  // tmp = <A7 ^ 0, A6 ^ 0, A5 ^ 0, A4 ^ 0, A3 ^ 0, A2 ^ 1, A1 ^ 0, A0 ^ 1>
> shift_left(result, tmp, 2);  // result = <A5 ^ 0, A4 ^ 0, A3 ^ 0, A2 ^ 1, A1 ^ 0, A0 ^ 1, 0, 0>
> or(result, result, 4);  // result = <A5 ^ 0, A4 ^ 0, A3 ^ 0, A2 ^ 1, A1 ^ 0, 1, 0, 0>, set result[2] = 1
>
> After these operations, we can examine the result and see that bits 3-7 of 
> the result are 1-5 bits of the A argument.
> For example, result[4] is the (A2 ^ 1) xor expression (can be checked by 
> is_a),
> so it has left and right operands: one of them is the A2 symbolic bit, and 
> the other is the constant 1.
> So result[4]->get_left()->get_origin() will return A and 
> result[4]->get_left()->get_index() will return 2
> as its initial bit position was that.
>
> The symbolic executor supports only a few operations, so it may need to be
> extended before it can be used elsewhere.
> Supported operations: AND, OR, XOR, SHIFT_RIGHT, SHIFT_LEFT, ADD, SUB, MUL, 
> and COMPLEMENT.
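A much-simplified sketch of the bit tracking in the walkthrough above, assuming every bit is either a known constant or a single input bit A[i] XORed with a constant. This is not the engine's API: the type and function names are hypothetical, and the real engine builds general expression trees (with get_left(), get_origin(), get_index() and so on) rather than this flat form. Bit 0 is the least significant bit.

```c
#include <assert.h>  /* for the checks in the usage example */

#define WIDTH 8  /* the example operates on a byte */

/* A bit is either a known constant or "A[index] ^ flip".  */
typedef struct
{
  int is_const;  /* nonzero: the bit is the constant VALUE */
  int value;     /* constant value (0 or 1) when is_const */
  int index;     /* original position in A when !is_const */
  int flip;      /* constant XORed into A[index] when !is_const */
} sym_bit;

/* A = <A7, ..., A0>: bit i comes from position i of A.  */
static void
sym_define (sym_bit r[WIDTH])
{
  for (int i = 0; i < WIDTH; i++)
    r[i] = (sym_bit) { 0, 0, i, 0 };
}

/* r = a ^ c for a constant C.  R may alias A.  */
static void
sym_xor_const (sym_bit r[WIDTH], const sym_bit a[WIDTH], unsigned c)
{
  for (int i = 0; i < WIDTH; i++)
    {
      int cb = (c >> i) & 1;
      r[i] = a[i];
      if (r[i].is_const)
        r[i].value ^= cb;
      else
        r[i].flip ^= cb;
    }
}

/* r = a << n; vacated low bits become constant 0.  R may alias A.  */
static void
sym_shift_left (sym_bit r[WIDTH], const sym_bit a[WIDTH], int n)
{
  sym_bit tmp[WIDTH];
  for (int i = 0; i < WIDTH; i++)
    tmp[i] = i >= n ? a[i - n] : (sym_bit) { 1, 0, 0, 0 };
  for (int i = 0; i < WIDTH; i++)
    r[i] = tmp[i];
}

/* r = a | c for a constant C: bits set in C become constant 1.  */
static void
sym_or_const (sym_bit r[WIDTH], const sym_bit a[WIDTH], unsigned c)
{
  for (int i = 0; i < WIDTH; i++)
    r[i] = ((c >> i) & 1) ? (sym_bit) { 1, 1, 0, 0 } : a[i];
}
```

Running define, xor with 5, shift left by 2 and or with 4 leaves result[4] as A2 ^ 1, bits 3 to 7 as A1 to A5, and bits 2 to 0 as the constants 1, 0, 0, matching the walkthrough.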

OK, so it seems it should be able to handle what the bswap pass
requires as well (just with unnecessary bit precision and possibly
some memory/compile-time overhead).  The bswap pass also handles
{L,R}ROTATE_EXPR, but that should be trivial to add if you can handle
shifts.  It can also handle conversions (zero-/sign-extend and
truncate); those should be easy as well.

Can it handle

  tmp = A & 0x00ff00ff00ff;
  tmp2 = B & 0xff00ff00ff00;
  result = tmp | tmp2;

?  That is, make recognizing a blend (or as extension a shuffle) of
two sources into one?
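The blend question can be illustrated at byte granularity, in the spirit of the byte tracking that bswap does. In this hypothetical sketch (not the data structure GCC's bswap/store-merging code actually uses), each byte of a 64-bit value records which source it came from and from which byte position. AND with a mask made of 0x00/0xff bytes either keeps or clears a byte, and IOR is only a plain byte selection when at most one operand byte is non-zero.

```c
#include <assert.h>  /* for the checks in the usage example */
#include <stdint.h>

#define NBYTES 8  /* bytes in a 64-bit value */

/* Source SRC (0 means "known zero") and byte position IDX within it.  */
typedef struct
{
  int src;
  int idx;
} sym_byte;

/* Every byte of source SRC initially comes from its own position.  */
static void
sym_define (sym_byte r[NBYTES], int src)
{
  for (int i = 0; i < NBYTES; i++)
    r[i] = (sym_byte) { src, i };
}

/* r = a & mask.  Returns 0 when a mask byte is neither 0x00 nor 0xff,
   i.e. when byte granularity is not enough and bits would need tracking.  */
static int
sym_and_mask (sym_byte r[NBYTES], const sym_byte a[NBYTES], uint64_t mask)
{
  for (int i = 0; i < NBYTES; i++)
    {
      unsigned m = (mask >> (8 * i)) & 0xff;
      if (m == 0xff)
        r[i] = a[i];
      else if (m == 0)
        r[i] = (sym_byte) { 0, 0 };
      else
        return 0;
    }
  return 1;
}

/* r = a | b.  Returns 0 unless, for each byte, at most one operand is
   non-zero (otherwise the result is not a plain byte selection).  */
static int
sym_ior (sym_byte r[NBYTES], const sym_byte a[NBYTES], const sym_byte b[NBYTES])
{
  for (int i = 0; i < NBYTES; i++)
    {
      if (a[i].src == 0)
        r[i] = b[i];
      else if (b[i].src == 0)
        r[i] = a[i];
      else
        return 0;
    }
  return 1;
}

/* True if every non-zero byte sits at its original position, i.e. the
   value is a blend of its sources rather than a permutation.  */
static int
is_in_place_blend (const sym_byte r[NBYTES])
{
  for (int i = 0; i < NBYTES; i++)
    if (r[i].src != 0 && r[i].idx != i)
      return 0;
  return 1;
}
```

With the masks from the question, bytes 0, 2 and 4 of the result come from A, bytes 1, 3 and 5 from B, and bytes 6 and 7 are zero, so the value is an in-place blend. A mask such as 0x0f makes sym_and_mask give up, which is where a variable granularity would have to fall back to bits.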

I would guess that parameterizing the engine on the granularity (byte
vs. bit) would be possible to implement, as well as possibly making the
granularity variable so as to "split" bits only when necessary?  I'm
thinking of the cost of simulating a whole function "forward", keeping a
lattice of SSA name -> symbolic execution result here.  Cost in terms of
both memory and compile-time.

Note it shouldn't be a requirement for you to merge the bswap
byte-tracking code, but it would be good to have the symbolic execution
engine extensible enough to eventually cover what bswap does today and
make the long-wanted extension of recognizing two-source vector permutes
possible.

Richard.

>>
>>
>> >
>> > Note we should always have an eye on compile-time complexity,
>> > GCC does get used on multi-megabyte machine-generated sources
>> > that tend to look very uniform - variants with loops and bit operations
>> > supported by symbolic execution would not take me by surprise.
>> Which is why it's a two phase recognition.  It uses simple tests to
>> filter out the vast majority of loops, leaving just a few that have a
>> good chance of being a CRC for the more expensive verification step
>> using the symbolic execution engine.
>>
>> jeff
>>
>
> Best Regards,
> Matevos.

New Chinese (simplified) PO file for 'gcc' (version 14.2.0)

2024-08-27 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Chinese (simplified) team of translators.  The file is available at:

https://translationproject.org/latest/gcc/zh_CN.po

(This file, 'gcc-14.2.0.zh_CN.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

Re: [PATCH v3] Match: Support form 1 for scalar signed integer .SAT_ADD

2024-08-27 Thread Richard Biener

On Tue, Aug 27, 2024 at 3:06 AM Li, Pan2  wrote:
>
> Thanks Richard for comments.
>
> > I think you want to use nop_convert here, for sure a truncation or
> > extension wouldn't be valid?
>
> Oh, yes, should be nop_convert.
>
> > I think you don't need :c on both the inner plus and the bit_xor here?
>
> Sure, could you please explain in more detail when I need to add :c?
> Like the inner plus/and/or, etc.; I sometimes get confused in similar
> scenarios.

:c is required when you want to match up @0s and they appear in a commutative
operation and there's no canonicalization rule putting it into one or the other
position.  In your case you have two commutative operations you want to match
up, so it should only be necessary to try swapping one of them to get the match;
it's not required to swap both.  This reduces the number of generated patterns.

> > +   integer_zerop)
> > +   (bit_xor:c (negate (convert (lt @0 integer_zerop))) max_value)
>
> > The comment above quotes 'MIN' but that's not present here - that is,
> > the comment quotes a source form while we match what we see on
> > GIMPLE?  I do expect the matching will be quite fragile when not
> > being isolated.
>
> Got it, will update the comments to gimple.
>
> Pan
>
> -Original Message-
> From: Richard Biener 
> Sent: Monday, August 26, 2024 9:40 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
> kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
> Subject: Re: [PATCH v3] Match: Support form 1 for scalar signed integer 
> .SAT_ADD
>
> On Mon, Aug 26, 2024 at 4:20 AM  wrote:
> >
> > From: Pan Li 
> >
> > This patch adds support for form 1 of the scalar signed integer
> > .SAT_ADD, as in the example below:
> >
> > Form 1:
> >   #define DEF_SAT_S_ADD_FMT_1(T, UT, MIN, MAX) \
> >   T __attribute__((noinline)) \
> >   sat_s_add_##T##_fmt_1 (T x, T y) \
> >   { \
> >     T sum = (UT)x + (UT)y; \
> >     return (x ^ y) < 0 \
> >       ? sum \
> >       : (sum ^ x) >= 0 \
> >         ? sum \
> >         : x < 0 ? MIN : MAX; \
> >   }
> >
> > DEF_SAT_S_ADD_FMT_1(int64_t, uint64_t, INT64_MIN, INT64_MAX)
> >
> > The difference before and after this patch is visible below, provided
> > the backend implements the ssadd3 pattern.
> >
> > Before this patch:
> > __attribute__((noinline))
> > int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y)
> > {
> >   int64_t sum;
> >   long unsigned int x.0_1;
> >   long unsigned int y.1_2;
> >   long unsigned int _3;
> >   long int _4;
> >   long int _5;
> >   int64_t _6;
> >   _Bool _11;
> >   long int _12;
> >   long int _13;
> >   long int _14;
> >   long int _16;
> >   long int _17;
> >
> > ;;   basic block 2, loop depth 0
> > ;;pred:   ENTRY
> >   x.0_1 = (long unsigned int) x_7(D);
> >   y.1_2 = (long unsigned int) y_8(D);
> >   _3 = x.0_1 + y.1_2;
> >   sum_9 = (int64_t) _3;
> >   _4 = x_7(D) ^ y_8(D);
> >   _5 = x_7(D) ^ sum_9;
> >   _17 = ~_4;
> >   _16 = _5 & _17;
> >   if (_16 < 0)
> >     goto ; [41.00%]
> >   else
> >     goto ; [59.00%]
> > ;;succ:   3
> > ;;4
> >
> > ;;   basic block 3, loop depth 0
> > ;;pred:   2
> >   _11 = x_7(D) < 0;
> >   _12 = (long int) _11;
> >   _13 = -_12;
> >   _14 = _13 ^ 9223372036854775807;
> > ;;succ:   4
> >
> > ;;   basic block 4, loop depth 0
> > ;;pred:   2
> > ;;3
> >   # _6 = PHI 
> >   return _6;
> > ;;succ:   EXIT
> >
> > }
> >
> > After this patch:
> > __attribute__((noinline))
> > int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y)
> > {
> >   int64_t _4;
> >
> > ;;   basic block 2, loop depth 0
> > ;;pred:   ENTRY
> >   _4 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
> >   return _4;
> > ;;succ:   EXIT
> >
> > }
> >
> > The following test suites pass with this patch:
> > * The rv64gcv full regression test.
> > * The x86 bootstrap test.
> > * The x86 full regression test.
> >
> > gcc/ChangeLog:
> >
> > * match.pd: Add the matching for signed .SAT_ADD.
> > * tree-ssa-math-opts.cc (gimple_signed_integer_sat_add): Add new
> > matching func decl.
> > (match_unsigned_saturation_add

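A minimal sketch of Form 1 for int64_t, written as plain functions instead of through the DEF_SAT_S_ADD_FMT_1 macro (the function names are illustrative only). The second helper mirrors statements _11 to _14 of the "before" dump and shows why `-(x < 0) ^ MAX` yields MIN for negative x and MAX otherwise. It assumes GCC's modulo behaviour when the unsigned sum is converted back to int64_t.

```c
#include <stdint.h>

/* Form 1 instantiated by hand for int64_t.  The addition is done in
   uint64_t so that it wraps instead of overflowing; converting back to
   int64_t is implementation-defined in C and wraps with GCC.  */
static int64_t
sat_s_add_int64_fmt_1 (int64_t x, int64_t y)
{
  int64_t sum = (int64_t) ((uint64_t) x + (uint64_t) y);

  return (x ^ y) < 0
	 ? sum                     /* operands differ in sign: no overflow */
	 : (sum ^ x) >= 0
	   ? sum                   /* sum kept the sign of x: no overflow */
	   : x < 0 ? INT64_MIN : INT64_MAX;
}

/* Branchless saturation value, as in _11.._14 of the "before" dump:
   -(int64_t) (x < 0) is 0 or -1 (all ones), and XOR with INT64_MAX
   turns those into INT64_MAX or INT64_MIN respectively.  */
static int64_t
sat_value (int64_t x)
{
  return -(int64_t) (x < 0) ^ INT64_MAX;
}
```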
PING ^1 [PATCH] GCC Driver : Enable very long gcc command-line option

2024-08-27 Thread Dora, Sunil Kumar

Dear GCC Team,

Please consider this as a gentle reminder to review the patch I posted at the 
following link: [ 
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660223.html ].

BUG Link : [ https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111527 ]

Your feedback or approval would be greatly appreciated.

Best Regards,
Sunil Dora

On 2024-08-13 2:30 a.m., 
deepthi.hem...@windriver.com wrote:

From: sunil dora 

For excessively long environment variables, i.e. >128 KiB,
store the arguments in a temporary file and collect them back together in
collect2.

This commit fixes the COLLECT_GCC_OPTIONS issue:
GCC should not limit the length of command line passed to collect2.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111527

The Linux kernel imposes the following limits on the arguments and
environment passed to an executable:
I.  The total size of the arguments and environment is limited (to a
    quarter of the stack size limit, typically 2 MiB).
II. Each individual argument or environment string must be under 128 KiB.

In order to circumvent these limitations, many build tools support
response-files, i.e. files that contain the arguments for the executed
command. These are typically passed using @ syntax.

GCC uses the COLLECT_GCC_OPTIONS environment variable to transfer the
expanded command line to collect2. With many options, this exceeds limit II.

GCC: Added testcases for PR111527

TC1: If the command line is shorter than 128 KiB, GCC should use
  COLLECT_GCC_OPTIONS to communicate and compile successfully.
TC2: If the command line is between 128 KiB and 2 MiB,
  GCC should copy the arguments to a file and use FILE_GCC_OPTIONS
  to communicate and compile successfully.
TC3: If the command line is longer than 2 MiB, GCC should
  fail to compile and report an error. (Expected FAIL)

Signed-off-by: Topi Kuutela 

Signed-off-by: sunil dora 


---
 gcc/collect2.cc                           | 40 +++--
 gcc/gcc.cc                                | 37 +--
 gcc/testsuite/gcc.dg/longcmd/longcmd.exp  | 16 +
 gcc/testsuite/gcc.dg/longcmd/pr111527-1.c | 44 +++
 gcc/testsuite/gcc.dg/longcmd/pr111527-2.c |  9 +
 gcc/testsuite/gcc.dg/longcmd/pr111527-3.c | 10 ++
 gcc/testsuite/gcc.dg/longcmd/pr111527-4.c | 10 ++
 7 files changed, 160 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/longcmd/longcmd.exp
 create mode 100644 gcc/testsuite/gcc.dg/longcmd/pr111527-1.c
 create mode 100644 gcc/testsuite/gcc.dg/longcmd/pr111527-2.c
 create mode 100644 gcc/testsuite/gcc.dg/longcmd/pr111527-3.c
 create mode 100644 gcc/testsuite/gcc.dg/longcmd/pr111527-4.c

diff --git a/gcc/collect2.cc b/gcc/collect2.cc
index 902014a9cc1..564d8968648 100644
--- a/gcc/collect2.cc
+++ b/gcc/collect2.cc
@@ -376,6 +376,40 @@ typedef int scanfilter;

 static void scan_prog_file (const char *, scanpass, scanfilter);

+char* getenv_extended (const char* var_name)
+{
+  int file_size;
+  char* buf = NULL;
+
+  char* string = getenv (var_name);
+  if (!string)
+{
+  char* string = getenv ("FILE_GCC_OPTIONS");
+  FILE *fptr;
+  fptr = fopen (string, "r");
+  if (fptr == NULL)
+   return (0);
+  /* Copy contents from temporary file to buffer */
+  if (fseek (fptr, 0, SEEK_END) == -1)
+   return (0);
+  file_size = ftell (fptr);
+  rewind (fptr);
+  buf = (char *) xmalloc (file_size + 1);
+  if (buf == NULL)
+   return (0);
+  if (fread ((void *) buf, file_size, 1, fptr) <= 0)
+   {
+ free (buf);
+ fatal_error (input_location, "fread failed");
+ return (0);
+   }
+  buf[file_size] = '\0';
+  return buf;
+}
+  return string;
+}
+
+

 /* Delete tempfiles and exit function.  */

@

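As posted, the fallback in getenv_extended shadows `string`, can pass a NULL name to fopen when FILE_GCC_OPTIONS is unset, stores ftell's `long` result in an `int`, and checks xmalloc's result although xmalloc never returns NULL. A more defensive sketch of the same idea follows; read_options_file and getenv_or_file are hypothetical names, plain malloc is used so the sketch builds outside GCC, and both paths return a heap copy so the caller always owns (and frees) the result.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read the whole file NAME into a NUL-terminated heap buffer.
   Returns NULL on any failure; the caller frees the result.  */
static char *
read_options_file (const char *name)
{
  if (name == NULL)
    return NULL;

  FILE *f = fopen (name, "rb");
  if (f == NULL)
    return NULL;

  char *buf = NULL;
  long size;
  if (fseek (f, 0, SEEK_END) != 0
      || (size = ftell (f)) < 0
      || fseek (f, 0, SEEK_SET) != 0)
    goto out;

  buf = malloc ((size_t) size + 1);
  if (buf == NULL)
    goto out;

  /* With an element size of 1, fread returns the byte count, so a short
     read is detected by comparing against SIZE.  */
  if (fread (buf, 1, (size_t) size, f) != (size_t) size)
    {
      free (buf);
      buf = NULL;
      goto out;
    }
  buf[size] = '\0';

out:
  fclose (f);
  return buf;
}

/* Prefer the environment variable VAR_NAME; otherwise read the file named
   by the variable FILE_VAR (FILE_GCC_OPTIONS in the patch).  */
static char *
getenv_or_file (const char *var_name, const char *file_var)
{
  const char *value = getenv (var_name);
  if (value != NULL)
    {
      size_t len = strlen (value) + 1;
      char *copy = malloc (len);
      if (copy != NULL)
	memcpy (copy, value, len);
      return copy;
    }
  return read_options_file (getenv (file_var));
}
```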
Re: [PATCH v2 1/2] Enhance cse_insn to handle all-zeros and all-ones for vector mode.

2024-08-27 Thread Richard Biener

On Tue, Aug 27, 2024 at 5:20 AM liuhongt  wrote:
>
> > You are possibly overwriting src_related_elt - I'd suggest to either break
> > here or do the loop below for each found elt?
> Changed.
>
> > Do we know that will always succeed?
> 1) validate_subreg allows subreg for 2 vector modes with same component modes.
> 2) gen_lowpart in cse.cc is defined as gen_lowpart_if_possible;
> if it fails, it returns 0 and we just fall back to src_related = 0.
>
> > So on the GIMPLE side we are trying to handle such cases by maintaining
> > only a single element in the hashtables, thus hash and compare them
> > the same - them in this case (vec_dup:M (reg:c)) and (vec_dup:N (reg:c)),
> > leaving it up to the consumer to reject or pun mismatches.
> rtx_cost will be used to decide whether it is profitable
> ((subreg:M (reg: N) 0) vs (vec_dup:M (reg:c))); if M and N are
> not tieable, rtx_cost will be expensive and the replacement will fail.

I see.

> >
> > For constants that would hold even more - note CSEing vs. duplicating
> > constants might not be universally good.
> Assume you mean (reg:c) in (vec_dup:M (reg:c) is from a constant, the later

I was referring to the const0_rtx and constm1_rtx hunk at the start of the
patch (maybe split the patch for bisection purposes?).  And mostly from
the context of RA and register pressure - though I would guess RA should
be good enough to handle re-materialization here but still there's a single
vs. a split live-range.

> RTL optimizers (i.e. forwprop/combine) will try to do further simplification
> of the constants if rtx_cost says it is profitable.)
> For const_vector, it is handled by other code:
>
>   /* Try to re-materialize a vec_dup with an existing constant.   */
>   rtx src_elt;
>   if ((!src_eqv_here || CONSTANT_P (src_eqv_here))
>       && const_vec_duplicate_p (src, &src_elt))
>     {
>

Another general remark would be - did you try to handle these
transforms in postreload CSE?  That's supposed to detect
"noop moves" and in fact all this does for x86 is to re-use
a hardreg?

I'll leave actual review to CSE/RTL experts.

Thanks,
Richard.

> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> Also try to handle redundant broadcasts when there's already a
> broadcast to a bigger mode with exactly the same component value.
> For broadcast, the component mode needs to be the same.
> For all-zeros/ones, only the bigger mode needs to be checked.
>
> gcc/ChangeLog:
>
> PR rtl-optimization/92080
> * cse.cc (cse_insn): Handle all-ones/all-zeros, and vec_dup
> with variables.
> ---
>  gcc/cse.cc | 82 ++
>  1 file changed, 82 insertions(+)
>
> diff --git a/gcc/cse.cc b/gcc/cse.cc
> index 65794ac5f2c..fab2f515f8c 100644
> --- a/gcc/cse.cc
> +++ b/gcc/cse.cc
> @@ -4870,6 +4870,50 @@ cse_insn (rtx_insn *insn)
> }
> }
>
> +  /* Try to handle special const_vector with elt 0 or -1.
> +They can be represented with different modes, and can be cse.  */
> +  if (src_const && src_related == 0 && CONST_VECTOR_P (src_const)
> + && (src_const == CONST0_RTX (mode)
> + || src_const == CONSTM1_RTX (mode))
> + && GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
> +   {
> + machine_mode mode_iter;
> +
> + for (int l = 0; l != 2; l++)
> +   {
> + FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_VECTOR_INT)
> +   {
> + if (maybe_lt (GET_MODE_SIZE (mode_iter),
> +   GET_MODE_SIZE (mode)))
> +   continue;
> +
> + rtx src_const_iter = (src_const == CONST0_RTX (mode)
> +   ? CONST0_RTX (mode_iter)
> +   : CONSTM1_RTX (mode_iter));
> +
> + struct table_elt *const_elt
> +   = lookup (src_const_iter, HASH (src_const_iter, mode_iter),
> + mode_iter);
> +
> + if (const_elt == 0)
> +   continue;
> +
> + for (const_elt = const_elt->first_same_value;
> +  const_elt; const_elt = const_elt->next_same_value)
> +   if (REG_P (const_elt->exp))
> + {
> +   src_related = gen_lowpart (mode, const_elt->exp);
> +   break;
> + }
> +
> + if (src_related != 0)
> +   break;
> +   }
> + if (src_related != 0)
> +   break;
> +   }
> +   }
> +
>/* See if we have a CONST_INT that is already in a register in a
>  wider mode.  */
>
> @@ -5041,6 +5085,44 @@ cse_insn (rtx_insn *insn)
> }
> }
>
> +  /* Try to find something like (vec_dup:v16si (reg:c))
> +for (vec_dup:v8si (reg:c)).  */
> +  if (src_related == 0
> + && V

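The reuse in this patch rests on two layout facts, which a byte-level model can check: the low part of a wider all-zeros or all-ones vector is the all-zeros or all-ones vector of the narrower mode whatever the element mode, whereas the low part of a wider broadcast is a narrower broadcast only when the element mode is the same. In this sketch, arrays stand in for vector registers and "low part" means the leading bytes, as for a lowpart subreg on a little-endian target (an assumption); it uses no CSE internals.

```c
#include <stdint.h>
#include <string.h>

/* Model of (vec_duplicate:VnSI c): fill N 32-bit lanes with C.  */
static void
broadcast_si (int32_t *v, size_t n, int32_t c)
{
  for (size_t i = 0; i < n; i++)
    v[i] = c;
}

/* Model of (vec_duplicate:VnQI c): fill N 8-bit lanes with C.  */
static void
broadcast_qi (int8_t *v, size_t n, int8_t c)
{
  for (size_t i = 0; i < n; i++)
    v[i] = c;
}

/* Nonzero if the low NARROW_BYTES of WIDE equal NARROW, i.e. if
   (subreg:narrow (wide) 0) would be the same value on little-endian.  */
static int
lowpart_equals (const void *wide, const void *narrow, size_t narrow_bytes)
{
  return memcmp (wide, narrow, narrow_bytes) == 0;
}
```

With the same element mode (v16si vs. v8si) the low half of a broadcast matches; all-ones matches across element modes, but a broadcast of 1 in SImode lanes does not match a broadcast of 1 in QImode lanes.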
Re: [PATCH] MATCH: add abs support for half float

2024-08-27 Thread Richard Biener

On Tue, Aug 27, 2024 at 8:23 AM Kugan Vivekanandarajah
 wrote:
>
> Hi Richard,
>
> > On 22 Aug 2024, at 10:34 pm, Richard Biener  
> > wrote:
> >
> > External email: Use caution opening links or attachments
> >
> >
> > On Wed, Aug 21, 2024 at 12:08 PM Kugan Vivekanandarajah
> >  wrote:
> >>
> >> Hi Richard,
> >>
> >>> On 20 Aug 2024, at 6:09 pm, Richard Biener  
> >>> wrote:
> >>>
> >>> External email: Use caution opening links or attachments
> >>>
> >>>
> >>> On Fri, Aug 9, 2024 at 2:39 AM Kugan Vivekanandarajah
> >>>  wrote:
> 
>  Thanks for the comments.
> 
> > On 2 Aug 2024, at 8:36 pm, Richard Biener  
> > wrote:
> >
> > External email: Use caution opening links or attachments
> >
> >
> > On Fri, Aug 2, 2024 at 11:20 AM Kugan Vivekanandarajah
> >  wrote:
> >>
> >>
> >>
> >>> On 1 Aug 2024, at 10:46 pm, Richard Biener 
> >>>  wrote:
> >>>
> >>> External email: Use caution opening links or attachments
> >>>
> >>>
> >>> On Thu, Aug 1, 2024 at 5:31 AM Kugan Vivekanandarajah
> >>>  wrote:
> 
> 
>  On Mon, Jul 29, 2024 at 10:11 AM Andrew Pinski  
>  wrote:
> >
> > On Mon, Jul 29, 2024 at 12:57 AM Kugan Vivekanandarajah
> >  wrote:
> >>
> >> On Thu, Jul 25, 2024 at 10:19 PM Richard Biener
> >>  wrote:
> >>>
> >>> On Thu, Jul 25, 2024 at 4:42 AM Kugan Vivekanandarajah
> >>>  wrote:
> 
>  On Tue, Jul 23, 2024 at 11:56 PM Richard Biener
>   wrote:
> >
> > On Tue, Jul 23, 2024 at 10:27 AM Kugan Vivekanandarajah
> >  wrote:
> >>
> >> On Tue, Jul 23, 2024 at 10:35 AM Andrew Pinski 
> >>  wrote:
> >>>
> >>> On Mon, Jul 22, 2024 at 5:26 PM Kugan Vivekanandarajah
> >>>  wrote:
> 
>  Revised based on the comment and moved it into existing 
>  patterns as.
> 
>  gcc/ChangeLog:
> 
>  * match.pd: Extend A CMP 0 ? A : -A into (type)A CMP 0 ? A : 
>  -A.
>  Extend A CMP 0 ? A : -A into (type) A CMP 0 ? A : -A.
> 
>  gcc/testsuite/ChangeLog:
> 
>  * gcc.dg/tree-ssa/absfloat16.c: New test.
> >>>
> >>> The testcase needs to make sure it runs only for targets that 
> >>> support
> >>> float16 so like:
> >>>
> >>> /* { dg-require-effective-target float16 } */
> >>> /* { dg-add-options float16 } */
> >> Added in the attached version.
> >
> > + /* (type)A >=/> 0 ? A : -A    same as abs (A) */
> > (for cmp (ge gt)
> > (simplify
> > -   (cnd (cmp @0 zerop) @1 (negate @1))
> > -(if (!HONOR_SIGNED_ZEROS (TREE_TYPE(@0))
> > -&& !TYPE_UNSIGNED (TREE_TYPE(@0))
> > -&& bitwise_equal_p (@0, @1))
> > +   (cnd (cmp (convert?@0 @1) zerop) @2 (negate @2))
> > +(if (!HONOR_SIGNED_ZEROS (TREE_TYPE (@1))
> > +&& !TYPE_UNSIGNED (TREE_TYPE (@1))
> > +&& ((VECTOR_TYPE_P (type)
> > + && tree_nop_conversion_p (TREE_TYPE (@0), 
> > TREE_TYPE (@1)))
> > +   || (!VECTOR_TYPE_P (type)
> > +   && (TYPE_PRECISION (TREE_TYPE (@1))
> > +   <= TYPE_PRECISION (TREE_TYPE (@0)
> > +&& bitwise_equal_p (@1, @2))
> >
> > I wonder about the bitwise_equal_p which tests @1 against @2 now
> > with the convert still applied to @1 - that looks odd.  You are 
> > allowing
> > sign-changing conversions but doesn't that change ge/gt 
> > behavior?
> > Also why are sign/zero-extensions not OK for vector types?
>  Thanks for the review.
>  My main motivation here is for _Float16  as below.
> 
>  _Float16 absfloat16 (_Float16 x)
>  {
>  float _1;
>  _Float16 _2;
>  _Float16 _4;
>   [local count: 1073741824]:
>  _1 = (float) x_3(D);
>  if (_1 < 0.0)
>  goto ; [41.00%]
>  else
>  goto ; [59.00%]
>   [local count: 440234144]:
>  _4 = -x_3(D);
>   [local count: 1073741824]:
>  # _2 = PHI <_4(3), x_3(D)(2)>
>  return _2;
>  }
> 
>  This is why I added  bitwise_equal_p test of @1 against @2 with
>  TYPE_PRECISION checks.
>  I agree that I will have to check for sign-changing conv

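The quoted _Float16 GIMPLE compares a widened copy of x against zero. Because every value of the narrow type is exactly representable in the wider type, the widened comparison gives the same outcome as a narrow one, so the select computes abs (x) except for the sign of a zero result, which is what the !HONOR_SIGNED_ZEROS condition guards against. The sketch uses float widened to double in place of _Float16 widened to float, an assumption made so it does not depend on _Float16 support.

```c
#include <math.h>

/* Mirrors the quoted GIMPLE: _1 = (wider) x; _1 < 0.0 ? -x : x.  */
static float
abs_via_widened_compare (float x)
{
  double wide = x;              /* exact: every float is a double */
  return wide < 0.0 ? -x : x;
}
```

For -0.0f the comparison is false, so the result keeps its sign bit, while fabsf (-0.0f) is +0.0f; the two are equal as values but not bitwise.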
Re: [PATCH v2] Vect: Reconcile the const_int operand type of unsigned .SAT_ADD

2024-08-27 Thread Richard Biener

On Tue, Aug 27, 2024 at 9:09 AM  wrote:
>
> From: Pan Li 
>
> The .SAT_ADD has 2 operands, and one of the operands may be an INTEGER_CST.
> For example, _1 = .SAT_ADD (_2, 9) comes from the sample code below.
>
> Form 3:
>   #define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM)  \
>   T __attribute__((noinline))  \
>   vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \
>   {\
> unsigned i;\
> T ret; \
> for (i = 0; i < limit; i++)\
>   {\
> out[i] = __builtin_add_overflow (in[i], IMM, &ret) ? -1 : ret; \
>   }\
>   }
>
> DEF_VEC_SAT_U_ADD_IMM_FMT_3(uint64_t, 9)
>
> It will fail to vectorize because vectorizable_call checks that the
> operands are type-compatible, but the imm will be (const_int 9) in
> SImode, which is different from _2 (DImode), i.e.:
>
> uint64_t _1;
> uint64_t _2;
> _1 = .SAT_ADD (_2, 9);
>
> This patch reconciles the imm operand to the operand type of _2 if
> and only if there is no precision/data loss, i.e. it converts the imm 9
> to DImode in the example above.
>
> The following test suites pass with this patch:
> 1. The rv64gcv full regression tests.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap tests.
> 4. The x86 full regression tests.
>
> gcc/ChangeLog:
>
> * tree-vect-patterns.cc (vect_recog_reconcile_cst_to_unsigned):
> Add new func impl to reconcile the cst int type to given TREE type.
> (vect_recog_sat_add_pattern): Reconcile the ops of .SAT_ADD
> before building the gimple call.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-1.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-10.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-11.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-12.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-13.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-14.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-15.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-2.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-3.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-4.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-5.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-6.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-7.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-8.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-9.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  .../binop/vec_sat_u_add_imm_reconcile-1.c |  9 +
>  .../binop/vec_sat_u_add_imm_reconcile-10.c|  9 +
>  .../binop/vec_sat_u_add_imm_reconcile-11.c|  9 +
>  .../binop/vec_sat_u_add_imm_reconcile-12.c|  9 +
>  .../binop/vec_sat_u_add_imm_reconcile-13.c|  9 +
>  .../binop/vec_sat_u_add_imm_reconcile-14.c|  9 +
>  .../binop/vec_sat_u_add_imm_reconcile-15.c|  9 +
>  .../binop/vec_sat_u_add_imm_reconcile-2.c |  9 +
>  .../binop/vec_sat_u_add_imm_reconcile-3.c |  9 +
>  .../binop/vec_sat_u_add_imm_reconcile-4.c |  9 +
>  .../binop/vec_sat_u_add_imm_reconcile-5.c |  9 +
>  .../binop/vec_sat_u_add_imm_reconcile-6.c |  9 +
>  .../binop/vec_sat_u_add_imm_reconcile-7.c |  9 +
>  .../binop/vec_sat_u_add_imm_reconcile-8.c |  9 +
>  .../binop/vec_sat_u_add_imm_reconcile-9.c |  9 +
>  .../riscv/rvv/autovec/vec_sat_arith.h | 20 ++
>  gcc/tree-vect-patterns.cc | 38 +++
>  17 files changed, 193 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-10.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-11.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-12.c
>

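A scalar sketch of Form 3 for uint64_t with IMM = 9, plus a hypothetical helper illustrating the "no precision/data loss" condition for converting the constant to the operand type. The helper is not the patch's vect_recog_reconcile_cst_to_unsigned; it only checks whether an unsigned value fits in PREC bits. __builtin_add_overflow is the GCC builtin used by the original macro.

```c
#include <stdbool.h>
#include <stdint.h>

/* Form 3 for a single uint64_t element: saturate to UINT64_MAX when
   x + 9 overflows.  Converting -1 to an unsigned type gives its maximum.  */
static uint64_t
sat_u_add_imm9_u64 (uint64_t x)
{
  uint64_t ret;
  return __builtin_add_overflow (x, 9, &ret) ? (uint64_t) -1 : ret;
}

/* Hypothetical check: can VALUE be represented in an unsigned type of
   PREC bits without losing any set bits?  */
static bool
cst_fits_unsigned_precision (uint64_t value, unsigned prec)
{
  if (prec >= 64)
    return true;
  return (value >> prec) == 0;
}
```

For the example above, the SImode constant 9 trivially fits the 64-bit precision of _2, so converting it to DImode loses nothing.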
Re: [PATCH v2] Extend check-function-bodies to allow label and directives

2024-08-27 Thread Richard Sandiford

"H.J. Lu"  writes:
> As PR target/116174 shown, we may need to verify labels and the directive
> order.  Extend check-function-bodies to support matched output lines to
> allow label and directives.
>
> gcc/
>
>   * doc/sourcebuild.texi (check-function-bodies): Add an optional
>   argument for matched output lines.
>
> gcc/testsuite/
>
>   * gcc.target/i386/pr116174.c: Use check-function-bodies.
>   * lib/scanasm.exp (parse_function_bodies): Append the line if
>   $up_config(matched) matches the line.
>   (check-function-bodies): Add an argument for matched.  Set
>   up_config(matched) to $matched.  Append the expected line without
>   $config(line_prefix) to function_regexp if it starts with ".L".
>
> Signed-off-by: H.J. Lu 
> ---
>  gcc/doc/sourcebuild.texi |  6 +-
>  gcc/testsuite/gcc.target/i386/pr116174.c | 18 +++---
>  gcc/testsuite/lib/scanasm.exp| 14 --
>  3 files changed, 32 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> index 1a31f00fb65..f7128f445cf 100644
> --- a/gcc/doc/sourcebuild.texi
> +++ b/gcc/doc/sourcebuild.texi
> @@ -3530,7 +3530,7 @@ assembly output.
>  Passes if @var{symbol} is not defined as a hidden symbol in the test's
>  assembly output.
>  
> -@item check-function-bodies @var{prefix} @var{terminator} [@var{options} [@{ target/xfail @var{selector} @}]]
> +@item check-function-bodies @var{prefix} @var{terminator} [@var{options} [@{ target/xfail @var{selector} @} [@var{matched}]]]
>  Looks through the source file for comments that give the expected assembly
>  output for selected functions.  Each line of expected output starts with the
>  prefix string @var{prefix} and the expected output for a function as a whole
> @@ -3544,6 +3544,10 @@ command line.  This can help if a source file is compiled both with
>  and without optimization, since it is rarely useful to check the full
>  function body for unoptimized code.
>  
> +@var{matched}, if specified, is a regular expression which matches a
> +line of the function body.  If @var{matched} isn't specified, lines
> +beginning with labels, directives and comments are ignored.
> +

How about instead splitting:

  Depending on the configuration (see
  @code{configure_check-function-bodies} in
  @file{gcc/testsuite/lib/scanasm.exp}), the test may discard from the
  compiler's assembly output directives such as @code{.cfi_startproc},
  local label definitions such as @code{.LFB0}, and more.
  It then matches the result against the expected
  output for a function as a single regular expression.  This means that
  later lines can use backslashes to refer back to @samp{(@dots{})}
  captures on earlier lines.  For example:

into two paragraphs at "It then", and describing the new behaviour at
the end of the first paragraph:


Depending on the configuration (see
@code{configure_check-function-bodies} in
@file{gcc/testsuite/lib/scanasm.exp}), the test may discard from the
compiler's assembly output directives such as @code{.cfi_startproc},
local label definitions such as @code{.LFB0}, and more.  This behavior
can be overridden using the optional @var{matched} argument, which
specifies a regexp for lines that should not be discarded in this way.

The test then matches the result against the expected
output for a function as a single regular expression.  This means that
later lines can use backslashes to refer back to @samp{(@dots{})}
captures on earlier lines.  For example:


>  The first line of the expected output for a function @var{fn} has the form:
>  
>  @smallexample
> diff --git a/gcc/testsuite/gcc.target/i386/pr116174.c b/gcc/testsuite/gcc.target/i386/pr116174.c
> index 8877d0b51af..91ec3288786 100644
> --- a/gcc/testsuite/gcc.target/i386/pr116174.c
> +++ b/gcc/testsuite/gcc.target/i386/pr116174.c
> @@ -1,6 +1,20 @@
>  /* { dg-do compile { target *-*-linux* } } */
> -/* { dg-options "-O2 -fcf-protection=branch" } */
> +/* { dg-options "-O2 -g0 -fcf-protection=branch" } */
> +/* Keep labels and directives ('.p2align', '.cfi_startproc').
> +/* { dg-final { check-function-bodies "**" "" "" { target "*-*-*" } {.*} } } */

This works, but matches everything.  Maybe {^\t?\.} would be more precise.
The current version is fine too though, if you think it will work for all
assembly dialects.

>  
> +/*
> +**foo:
> +**.LFB0:
> +**   .cfi_startproc
> +** (
> +**   endbr64
> +**   .p2align 5
> +** |
> +**   endbr32
> +** )
> +**...
> +*/
>  char *
>  foo (char *dest, const char *src)
>  {
> @@ -8,5 +22,3 @@ foo (char *dest, const char *src)
>  /* nothing */;
>return --dest;
>  }
> -
> -/* { dg-final { scan-assembler "\t\.cfi_startproc\n\tendbr(32|64)\n" } } */
> diff --git a/gcc/testsuite/lib/scanasm.exp b/gcc/testsuite/lib/scanasm.exp
> index 42

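Richard's suggested {^\t?\.} selects lines that begin with a dot, optionally after one tab, so it keeps labels such as .LFB0 and directives such as .cfi_startproc but does not match instruction lines such as endbr64, whereas {.*} matches every line. The check below uses POSIX extended regexps as a stand-in (an assumption) for the Tcl regexp engine used by DejaGnu; in the C string, a literal tab replaces Tcl's \t escape.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* True if LINE matches the POSIX extended regular expression PATTERN.  */
static bool
line_matches (const char *pattern, const char *line)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  bool ok = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return ok;
}

/* "^<TAB>?\." : start of line, optional tab, then a literal dot.  */
static const char precise[] = "^\t?\\.";
/* The pattern used in the patch: matches any line.  */
static const char everything[] = ".*";
```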
[PATCH] RISC-V: Optimize the cost of the DFmode register move for RV32.

2024-08-27 Thread Xianmiao Qu

Currently, in RV32, even with the D extension enabled, the cost of DFmode
register moves is still set to 'COSTS_N_INSNS (2)'. This results in the
'lower-subreg' pass splitting DFmode register moves into two SImode SUBREG
register moves, leading to the generation of many redundant instructions.

As an example, consider the following test case:
  double foo (int t, double a, double b)
  {
if (t > 0)
  return a;
else
  return b;
  }

When compiling with -march=rv32imafdc -mabi=ilp32d, the following code is generated:
  .cfi_startproc
  addi    sp,sp,-32
  .cfi_def_cfa_offset 32
  fsd fa0,8(sp)
  fsd fa1,16(sp)
  lw  a4,8(sp)
  lw  a5,12(sp)
  lw  a2,16(sp)
  lw  a3,20(sp)
  bgt a0,zero,.L1
  mv  a4,a2
  mv  a5,a3
  .L1:
  sw  a4,24(sp)
  sw  a5,28(sp)
  fld fa0,24(sp)
  addi    sp,sp,32
  .cfi_def_cfa_offset 0
  jr  ra
  .cfi_endproc

After adjusting the DFmode register move's cost to 'COSTS_N_INSNS (1)', the
generated code is as follows, with a significant reduction in the number
of instructions.
  .cfi_startproc
  ble a0,zero,.L5
  ret
  .L5:
  fmv.d   fa0,fa1
  ret
  .cfi_endproc

gcc/
* config/riscv/riscv.cc (riscv_rtx_costs): Optimize the cost of the
DFmode register move for RV32.

gcc/testsuite/
* gcc.target/riscv/rv32-movdf-cost.c: New test.
---
 gcc/config/riscv/riscv.cc|  5 +
 gcc/testsuite/gcc.target/riscv/rv32-movdf-cost.c | 13 +
 2 files changed, 18 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rv32-movdf-cost.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 1f544c1287ec..a47dedf73c10 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3560,6 +3560,11 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
   if (outer_code == INSN
  && register_operand (SET_DEST (x), GET_MODE (SET_DEST (x
{
+ if (REG_P (SET_SRC (x)) && TARGET_DOUBLE_FLOAT && mode == DFmode)
+   {
+ *total = COSTS_N_INSNS (1);
+ return true;
+   }
  riscv_rtx_costs (SET_SRC (x), mode, outer_code, opno, total, speed);
  return true;
}
diff --git a/gcc/testsuite/gcc.target/riscv/rv32-movdf-cost.c b/gcc/testsuite/gcc.target/riscv/rv32-movdf-cost.c
new file mode 100644
index ..cb679e7b95fb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rv32-movdf-cost.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32imafdc -mabi=ilp32d" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+double foo (int t, double a, double b)
+{
+  if (t > 0)
+return a;
+  else
+return b;
+}
+
+/* { dg-final { scan-assembler-not "fsd\t" } } */
-- 
2.43.0

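To see why the cost value matters, here is a simplified model of the lower-subreg splitting decision. It is written from memory and should be read as an assumption: a move in a multi-word mode is split into word_mode moves when its cost is at least FACTOR times the cost of one word_mode move, where FACTOR is the number of words (2 for DFmode on RV32). COSTS_N_INSNS mirrors GCC's definition of N * 4.

```c
#include <stdbool.h>

/* Same scaling as GCC's COSTS_N_INSNS in rtl.h.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Simplified model (assumption): split a FACTOR-word move into word moves
   when it costs at least as much as FACTOR word-mode moves.  */
static bool
would_split_move (unsigned mode_move_cost, unsigned word_move_cost,
		  unsigned factor)
{
  return factor > 1 && mode_move_cost >= word_move_cost * factor;
}
```

Under this model, a DFmode move costed at COSTS_N_INSNS (2) ties with two SImode moves and is split, while COSTS_N_INSNS (1) keeps it whole, matching the before/after code shown in the patch.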
Re: [patch][rfc] libgomp: Add OpenMP interop support to nvptx + gcn plugin

2024-08-27 Thread Andrew Stubbs


On 22/08/2024 19:26, Tobias Burnus wrote:
This patch adds OpenMP's interop support to the libgomp plugins (nvptx: 
cuda, cuda_driver, hip; gcn: hip, hsa).*


[The idea is that the user can ask OpenMP to return a foreign-runtime
handle (CUdevice, hipCtx_t, …) for a specified OpenMP device number –
and to create a stream (CUstream, hipStream_t, cudaStream_t,
hsa_queue_t), where OpenMP can take care of dependencies, e.g., via the
'depend' clause.]


The attached patch comes on top of the interop routine patch, 
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661118.html (and 
the associated .texi patch, 
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661072.html ).


The patch is more a WIP/RFC patch than a final patch as it is currently 
not wired up: while 'GOMP_interop' can be called manually, the proper 
way will be OpenMP's 'interop' directive, currently unimplemented. 
Hence, this patch is not extensively tested, does not include testcases, 
and target.c's GOMP_interop will surely change to handle all clauses.


Apart from target.c's GOMP_interop, which will change, the rest of the
patch should be rather solid – and could in principle be applied.


Therefore:

(A) Any comments, suggestions regarding the patch in general and in 
particular the plugin/ related parts?


The code all looks pretty reasonable to me.

The header file conditional includes worry me though: it is adding 
complexity in a way that hurts maintainability, and looks like it might 
break somebody's hypothetical out-of-tree plugin. Is it not better for a 
plugin that supports interop to include omp.h itself?


(B) RFC: The *stream* *creation* (hsa_queue_t, cudaStream_t/hipStream_t) 
functions have tons of options. Thus:


(i) Does the chosen size/flags argument for the stream/queue generation
for GCN/HIP/CUDA make sense? Or are there other values that are more sensible?


I think we want to follow the principle of least surprise, so max size 
queues with the same type we normally use, and certainly no non-blocking.



(ii) Should the user be able to tweak the values?

I mean, the user could say:** 'prefer_type({fr("cuda"), 
attr("ompx_priority:-2,ompx_non_blocking")},{fr("hsa"),attr("ompx_queue_size:64"})'.


Do we want to permit this? If yes, which of the values should be 
changeable?


Is there any prior art for this? It looks like it could be added in 
future, without breaking backward compatibility, so I say "no" (at least 
for now).




Tobias

(*) For Nvidia, HIP is just a thin wrapper of defines, typedefs and
inline functions around CUDA. Thus, hip, cuda and cuda_driver are
effectively all the same. / HSA is a new proposal that is currently being
added to the additional-definitions document. (OpenMP spec Issue #4023.)


(**) The used syntax and in particular 'attr' are new in OpenMP 6.0 (new 
in TR13). Note that attr only takes string literals [while 'fr' takes 
strings and (6.0) identifiers ["omp_ifr_cuda"] or constant integer 
expressions (5.1)].

Re: [PATCH] RISC-V: Optimize the cost of the DFmode register move for RV32.

2024-08-27 Thread Kito Cheng

LGTM, good catch. I am a little surprised that we don't handle
"case REG" in riscv_rtx_costs, but adding that might disturb too much
at once, so this fix is fine for now, and I guess we should
improve that in the future.


On Tue, Aug 27, 2024 at 5:19 PM Xianmiao Qu  wrote:
>
> Currently, in RV32, even with the D extension enabled, the cost of DFmode
> register moves is still set to 'COSTS_N_INSNS (2)'. This results in the
> 'lower-subreg' pass splitting DFmode register moves into two SImode SUBREG
> register moves, leading to the generation of many redundant instructions.
>
> As an example, consider the following test case:
>   double foo (int t, double a, double b)
>   {
> if (t > 0)
>   return a;
> else
>   return b;
>   }
>
> When compiling with -march=rv32imafdc -mabi=ilp32d, the following code is generated:
>   .cfi_startproc
>   addi    sp,sp,-32
>   .cfi_def_cfa_offset 32
>   fsd fa0,8(sp)
>   fsd fa1,16(sp)
>   lw  a4,8(sp)
>   lw  a5,12(sp)
>   lw  a2,16(sp)
>   lw  a3,20(sp)
>   bgt a0,zero,.L1
>   mv  a4,a2
>   mv  a5,a3
>   .L1:
>   sw  a4,24(sp)
>   sw  a5,28(sp)
>   fld fa0,24(sp)
>   addi    sp,sp,32
>   .cfi_def_cfa_offset 0
>   jr  ra
>   .cfi_endproc
>
> After adjusting the DFmode register move's cost to 'COSTS_N_INSNS (1)', the
> generated code is as follows, with a significant reduction in the number
> of instructions.
>   .cfi_startproc
>   ble a0,zero,.L5
>   ret
>   .L5:
>   fmv.d   fa0,fa1
>   ret
>   .cfi_endproc
>
> gcc/
> * config/riscv/riscv.cc (riscv_rtx_costs): Optimize the cost of the
> DFmode register move for RV32.
>
> gcc/testsuite/
> * gcc.target/riscv/rv32-movdf-cost.c: New test.
> ---
>  gcc/config/riscv/riscv.cc|  5 +
>  gcc/testsuite/gcc.target/riscv/rv32-movdf-cost.c | 13 +
>  2 files changed, 18 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rv32-movdf-cost.c
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 1f544c1287ec..a47dedf73c10 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -3560,6 +3560,11 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
>if (outer_code == INSN
>   && register_operand (SET_DEST (x), GET_MODE (SET_DEST (x
> {
> + if (REG_P (SET_SRC (x)) && TARGET_DOUBLE_FLOAT && mode == DFmode)
> +   {
> + *total = COSTS_N_INSNS (1);
> + return true;
> +   }
>   riscv_rtx_costs (SET_SRC (x), mode, outer_code, opno, total, speed);
>   return true;
> }
> diff --git a/gcc/testsuite/gcc.target/riscv/rv32-movdf-cost.c b/gcc/testsuite/gcc.target/riscv/rv32-movdf-cost.c
> new file mode 100644
> index ..cb679e7b95fb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rv32-movdf-cost.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32imafdc -mabi=ilp32d" } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" } } */
> +
> +double foo (int t, double a, double b)
> +{
> +  if (t > 0)
> +return a;
> +  else
> +return b;
> +}
> +
> +/* { dg-final { scan-assembler-not "fsd\t" } } */
> --
> 2.43.0
>

[PATCH v3] c++: Ensure ANNOTATE_EXPRs remain outermost expressions in conditions [PR116140]

2024-08-27 Thread Alex Coplan

Hi,

This is a v3 that hopefully addresses the feedback from both Jason and
Jakub.  v2 was posted here:
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660191.html

(Sorry for the delay in posting the re-spin, I was away last week.)

In this version we refactor to introduce a helper class (annotate_saver)
which is much less invasive to the caller (maybe_convert_cond) and
should (at least in theory) be reusable elsewhere.

This version also relies on the assumption that operands 1 and 2 of
ANNOTATE_EXPRs are INTEGER_CSTs, which simplifies the flag updates
without having to rely on assumptions about the specific changes made
in maybe_convert_cond.

Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?

Thanks,
Alex

-- >8 --

For the testcase added with this patch, we would end up losing the:

  #pragma GCC unroll 4

and emitting "warning: ignoring loop annotation".  That warning comes
from tree-cfg.cc:replace_loop_annotate, and means that we failed to
process the ANNOTATE_EXPR in tree-cfg.cc:replace_loop_annotate_in_block.
That function walks backwards over the GIMPLE in an exiting BB for a
loop, skipping over the final gcond, and looks for any ANNOTATE_EXPRs
immediately preceding the gcond.

The function documents the following pre-condition:

   /* [...] We assume that the annotations come immediately before the
  condition in BB, if any.  */

Now, looking at the exiting BB of the loop, we have:

   :
  D.4524 = .ANNOTATE (iftmp.1, 1, 4);
  retval.0 = D.4524;
  if (retval.0 != 0)
goto ; [INV]
  else
goto ; [INV]

and crucially there is an intervening assignment between the gcond and
the preceding .ANNOTATE ifn call.  To see where this comes from, we can
look to the IR given by -fdump-tree-original:

  if (<::operator() (&pred, *first), unroll 4>>>)
goto ;
  else
goto ;

Here the problem is that we've wrapped a CLEANUP_POINT_EXPR around the
ANNOTATE_EXPR, meaning the ANNOTATE_EXPR is no longer the outermost
expression in the condition.

The CLEANUP_POINT_EXPR gets added by the following call chain:

finish_while_stmt_cond
 -> maybe_convert_cond
 -> condition_conversion
 -> fold_build_cleanup_point_expr

This patch fixes the issue by first introducing a new helper
class (annotate_saver) to save and restore outer chains of
ANNOTATE_EXPRs and then using it in maybe_convert_cond.

With this patch, we don't get any such warning and the loop gets unrolled as
expected at -O2.
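The save/strip/restore idea behind annotate_saver can be illustrated outside GCC. The sketch below is a hypothetical toy model, not GCC code: `Node`, `make` and `ToyAnnotateSaver` are invented names, strings stand in for tree codes, and the innermost saved node is kept directly rather than the `tree *` into `TREE_OPERAND (A, 0)` that the patch stores. It strips an outer chain of ANNOTATE nodes, lets the caller wrap the bare condition (here in a CLEANUP_POINT node, mimicking what condition_conversion adds), and then re-wraps the annotations so they stay outermost.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical toy expression node standing in for GCC's tree.
struct Node {
  std::string code;               // e.g. "ANNOTATE", "CLEANUP_POINT", "VAR"
  std::shared_ptr<Node> operand;  // operand 0, or nullptr for leaves
};
using NodePtr = std::shared_ptr<Node>;

NodePtr make (std::string code, NodePtr op = nullptr)
{
  return std::make_shared<Node> (Node{std::move (code), std::move (op)});
}

// Strip an outer chain of "ANNOTATE" nodes on construction and re-wrap it
// around a (possibly transformed) inner expression in restore ().
class ToyAnnotateSaver {
  NodePtr m_annotations;  // outermost saved annotation, or nullptr
  NodePtr m_innermost;    // innermost saved annotation

public:
  explicit ToyAnnotateSaver (NodePtr *cond)
  {
    NodePtr cur = *cond;
    NodePtr last = nullptr;
    while (cur && cur->code == "ANNOTATE")
      {
        last = cur;
        cur = cur->operand;
      }
    if (last)
      {
        m_annotations = *cond;
        m_innermost = last;
        *cond = cur;  // the caller now sees the unannotated condition
      }
  }

  NodePtr restore (NodePtr new_inner)
  {
    assert (new_inner != nullptr);
    if (!m_annotations)
      return new_inner;
    m_innermost->operand = new_inner;  // re-attach below the saved chain
    return m_annotations;
  }
};
```

Because restore () re-attaches the original annotation nodes, they end up outside whatever wrapper the caller added in between.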

gcc/cp/ChangeLog:

PR libstdc++/116140
* semantics.cc (annotate_saver): New. Use it ...
(maybe_convert_cond): ... here, to ensure any ANNOTATE_EXPRs
remain the outermost expression(s) of the condition.

gcc/testsuite/ChangeLog:

PR libstdc++/116140
* g++.dg/ext/pragma-unroll-lambda.C: New test.
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 5ab2076b673..b1a49b14238 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -951,6 +951,86 @@ maybe_warn_unparenthesized_assignment (tree t, bool nested_p,
 }
 }
 
+/* Helper class for saving/restoring ANNOTATE_EXPRs.  For a tree node t, users
+   can construct one of these like so:
+
+ annotate_saver s (&t);
+
+   and t will be updated to have any annotations removed.  The user can then
+   transform t, and later restore the ANNOTATE_EXPRs with:
+
+ t = s.restore (t).
+
+   The intent is to ensure that any ANNOTATE_EXPRs remain the outermost
+   expressions following any operations on t.  */
+
+class annotate_saver {
+  /* The chain of saved annotations, if there were any.  Otherwise null.  */
+  tree m_annotations;
+
+  /* If M_ANNOTATIONS is non-null, then M_INNER points to TREE_OPERAND (A, 0)
+     for the innermost annotation A.  */
+  tree *m_inner;
+
+public:
+  annotate_saver (tree *);
+  tree restore (tree);
+};
+
+/* If *COND is an ANNOTATE_EXPR, walk through the chain of annotations, and set
+   *COND equal to the first non-ANNOTATE_EXPR (saving a pointer to the
+   original chain of annotations for later use in restore).  */
+
+annotate_saver::annotate_saver (tree *cond) : m_annotations (nullptr)
+{
+  tree *t = cond;
+  while (TREE_CODE (*t) == ANNOTATE_EXPR)
+    t = &TREE_OPERAND (*t, 0);
+
+  if (t != cond)
+    {
+      m_annotations = *cond;
+      *cond = *t;
+      m_inner = t;
+    }
+}
+
+/* If we didn't strip any annotations on construction, return NEW_INNER
+   unmodified.  Otherwise, wrap the saved annotations around NEW_INNER (updating
+   the types and flags of the annotations if needed) and return the resulting
+   expression.  */
+
+tree
+annotate_saver::restore (tree new_inner)
+{
+  if (!m_annotations)
+    return new_inner;
+
+  /* If the type of the inner expression changed, we need to update the types
+     of all the ANNOTATE_EXPRs.  We may need to update the flags too, but we
+     assume they only change if the type of the inner expression changes.
+     The flag update logic assumes that the other operands to the
+     ANNOTATE_EXPRs are alway

Re: [PATCH] arm: Always use vmov.f64 instead of vmov.f32 with MVE

2024-08-27 Thread Richard Earnshaw (lists)

On 21/08/2024 17:03, Christophe Lyon wrote:
> With MVE, vmov.f64 is always supported (no need for +fp.dp extension).
> 
> This patch updates two patterns:
> - in movdi_vfp, we incorrectly checked
>   TARGET_VFP_SINGLE || TARGET_HAVE_MVE instead of
>   TARGET_VFP_SINGLE && !TARGET_HAVE_MVE, and didn't take into account
>   these two possibilities when computing the length attribute.
> 
> - in thumb2_movdf_vfp, we checked only TARGET_VFP_SINGLE.
> 
> No need to update movdf_vfp, since it is enabled only for TARGET_ARM
> (which is not the case when MVE is enabled).
> 
> The patch also updates gcc.target/arm/armv8_1m-fp64-move-1.c, to
> accept only vmov.f64 instead of vmov.f32.
> 
> Tested on arm-none-eabi with:
> qemu/-mthumb/-mtune=cortex-m55/-mfloat-abi=hard/-mfpu=auto
> qemu/-mthumb/-mtune=cortex-m55/-mfloat-abi=hard/-mfpu=auto/-march=armv8.1-m.main+mve
> qemu/-mthumb/-mtune=cortex-m55/-mfloat-abi=hard/-mfpu=auto/-march=armv8.1-m.main+mve.fp
> qemu/-mthumb/-mtune=cortex-m55/-mfloat-abi=hard/-mfpu=auto/-march=armv8.1-m.main+mve.fp+fp.dp
> 
> 2024-08-21  Christophe Lyon  
> 
>   gcc/
>   * config/arm/vfp.md (movdi_vfp, thumb2_movdf_vfp): Handle MVE
>   case.
> 
>   gcc/testsuite/
>   * gcc.target/arm/armv8_1m-fp64-move-1.c: Update expected code.

OK.

R.
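
The corrected predicate can be read as a small decision table: only a single-precision-only FPU without MVE needs the pair of vmov.f32 instructions (length 8); every other case, including any MVE configuration, uses one vmov.f64 (length 4). The hypothetical C++ model below mirrors that logic with plain booleans standing in for TARGET_VFP_SINGLE and TARGET_HAVE_MVE, and keeps the old movdi_vfp condition for comparison; it is an illustration, not part of the arm backend.

```cpp
#include <string>

// Hypothetical summary of what the patched movdi_vfp / thumb2_movdf_vfp
// alternatives emit for a D-register to D-register move.
struct MoveChoice {
  std::string mnemonic;  // instruction used for the 64-bit move
  int count;             // how many of them are emitted
  int length;            // bytes, as in the pattern's "length" attribute
};

// New condition: TARGET_VFP_SINGLE && !TARGET_HAVE_MVE.
MoveChoice choose_dreg_move (bool vfp_single, bool have_mve)
{
  if (vfp_single && !have_mve)
    return {"vmov.f32", 2, 8};  // two single-precision moves
  return {"vmov.f64", 1, 4};    // MVE always provides vmov.f64
}

// Old movdi_vfp alternative 9 condition (TARGET_VFP_SINGLE || TARGET_HAVE_MVE):
// it picked the vmov.f32 pair whenever MVE was present.
bool old_movdi_uses_f32_pair (bool vfp_single, bool have_mve)
{
  return vfp_single || have_mve;
}
```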

> ---
>  gcc/config/arm/vfp.md   | 8 
>  gcc/testsuite/gcc.target/arm/armv8_1m-fp64-move-1.c | 8 +---
>  2 files changed, 5 insertions(+), 11 deletions(-)
> 
> diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
> index 773f55664a9..3212d9c7aa1 100644
> --- a/gcc/config/arm/vfp.md
> +++ b/gcc/config/arm/vfp.md
> @@ -367,7 +367,7 @@
>  case 8:
>return \"vmov%?\\t%Q0, %R0, %P1\\t%@ int\";
>  case 9:
> -  if (TARGET_VFP_SINGLE || TARGET_HAVE_MVE)
> +  if (TARGET_VFP_SINGLE && !TARGET_HAVE_MVE)
>   return \"vmov%?.f32\\t%0, %1\\t%@ int\;vmov%?.f32\\t%p0, %p1\\t%@ int\";
>else
>   return \"vmov%?.f64\\t%P0, %P1\\t%@ int\";
> @@ -385,7 +385,7 @@
>  (symbol_ref "arm_count_output_move_double_insns (operands) * 4")
>(eq_attr "alternative" "9")
> (if_then_else
> - (match_test "TARGET_VFP_SINGLE")
> + (match_test "TARGET_VFP_SINGLE && !TARGET_HAVE_MVE")
>   (const_int 8)
>   (const_int 4))]
>(const_int 4)))
> @@ -744,7 +744,7 @@
>case 6: case 7: case 9:
>   return output_move_double (operands, true, NULL);
>case 8:
> - if (TARGET_VFP_SINGLE)
> + if (TARGET_VFP_SINGLE && !TARGET_HAVE_MVE)
> return \"vmov%?.f32\\t%0, %1\;vmov%?.f32\\t%p0, %p1\";
>   else
> return \"vmov%?.f64\\t%P0, %P1\";
> @@ -758,7 +758,7 @@
> (set (attr "length") (cond [(eq_attr "alternative" "6,7,9") (const_int 8)
>  (eq_attr "alternative" "8")
>   (if_then_else
> -  (match_test "TARGET_VFP_SINGLE")
> +  (match_test "TARGET_VFP_SINGLE && !TARGET_HAVE_MVE")
>(const_int 8)
>(const_int 4))]
> (const_int 4)))
> diff --git a/gcc/testsuite/gcc.target/arm/armv8_1m-fp64-move-1.c b/gcc/testsuite/gcc.target/arm/armv8_1m-fp64-move-1.c
> index d236f0826c3..4a3cf0a5afb 100644
> --- a/gcc/testsuite/gcc.target/arm/armv8_1m-fp64-move-1.c
> +++ b/gcc/testsuite/gcc.target/arm/armv8_1m-fp64-move-1.c
> @@ -33,13 +33,7 @@ w_r ()
>  
>  /*
>  ** w_w:
> -** (
> -**   vmov.f32s2, s0
> -**   vmov.f32s3, s1
> -** |
> -**   vmov.f32s3, s1
> -**   vmov.f32s2, s0
> -** )
> +**   vmov.f64d1, d0
>  **   bx  lr
>  */
>  void

Re: [PATCH] testuite: Accept vmov.f64

2024-08-27 Thread Richard Earnshaw (lists)

On 21/08/2024 17:06, Christophe Lyon wrote:
> On Wed, 14 Aug 2024 at 22:04, Torbjörn SVENSSON
>  wrote:
>>
>> Ok for trunk and releases/gcc-14?
>>
>> --
>>
>> On Cortex-M55 with fpv5-d16, the vmov.f64 instruction is used.
> 
> Hi Torbjorn,
> 
> Thanks for the patch: after looking further I realized that we can
> always generate vmov.f64 with MVE, so I propose this patch instead:
> https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661064.html
> 

Agreed, except ...

> Thanks,
> 
> Christophe
> 
>>
>> gcc/testsuite/ChangeLog:
>>
>> * armv8_1m-fp64-move-1.c: Accept vmov.f64 instruction.
>>
>> Signed-off-by: Torbjörn SVENSSON 
>> ---
>>  gcc/testsuite/gcc.target/arm/armv8_1m-fp64-move-1.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/gcc/testsuite/gcc.target/arm/armv8_1m-fp64-move-1.c b/gcc/testsuite/gcc.target/arm/armv8_1m-fp64-move-1.c
>> index d236f0826c3..44abfcf1518 100644
>> --- a/gcc/testsuite/gcc.target/arm/armv8_1m-fp64-move-1.c
>> +++ b/gcc/testsuite/gcc.target/arm/armv8_1m-fp64-move-1.c
>> @@ -2,7 +2,7 @@
>>  /* { dg-options "-O" } */
>>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>>  /* { dg-add-options arm_v8_1m_mve } */
>> -/* { dg-additional-options "-mfloat-abi=hard" } *
>> +/* { dg-additional-options "-mfloat-abi=hard" } */

... this typo isn't fixed by Christophe's patch.  Could you commit that as obvious, please.

R.

>>  /* { dg-final { check-function-bodies "**" "" } } */
>>
>>  /*
>> @@ -39,6 +39,8 @@ w_r ()
>>  ** |
>>  ** vmov.f32s3, s1
>>  ** vmov.f32s2, s0
>> +** |
>> +** vmov.f64d1, d0
>>  ** )
>>  ** bx  lr
>>  */
>> --
>> 2.25.1
>>

Re: [RFC][PATCH] AArch64: Remove AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS

2024-08-27 Thread Richard Sandiford

Tamar Christina  writes:
> Hi Jennifer,
>
>> -Original Message-
>> From: Jennifer Schmitz 
>> Sent: Friday, August 23, 2024 1:07 PM
>> To: gcc-patches@gcc.gnu.org
>> Cc: Richard Sandiford ; Kyrylo Tkachov
>> 
>> Subject: [RFC][PATCH] AArch64: Remove
>> AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
>>
>> This patch removes the AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
>> tunable and
>> use_new_vector_costs entry in aarch64-tuning-flags.def and makes the
>> AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS paths in the backend the
>> default.

Thanks for doing this.  This has been on my TODO list ever since the
tunable was added.

The history is that these "new" costs were originally added in stage 4
of GCC 11 for Neoverse V1.  Since the costs were added so late, it wasn't
appropriate to change the behaviour for any other core.  All the new code
was therefore gated on this option.

The new costs do two main things:

(1) use throughput-based calculations where available, including to choose
between Advanced SIMD and SVE

(2) try to make the latency-based costs more precise, by looking more closely
at the provided stmt_info

Old cost models won't be affected by (1) either way, since they don't
provide any throughput information.  But they should in principle benefit
from (2).  So...

>> To that end, the function aarch64_use_new_vector_costs_p and its uses were
>> removed. Additionally, guards were added to prevent null pointer dereferences of
>> fields in cpu_vector_cost.
>>
>
> I'm not against this change, but it does mean that we now switch old Adv. SIMD
> cost models as well to the new throughput based cost models.  That means that
> -mcpu=generic now behaves differently, and -mcpu=neoverse-n1 and I think
> some distros explicitly use this (I believe yocto for instance does).

...it shouldn't mean that we start using throughput-based models for
cortexa53 etc., since there's no associated issue info.

> Have we validated that the old generic cost model still behaves sensibly with 
> this change?
>
>> The patch was bootstrapped and regtested on aarch64-linux-gnu:
>> No problems bootstrapping, but several test files (in aarch64-sve.exp:
>> gather_load_extend_X.c where X is 1 to 4, strided_load_2.c, strided_store_2.c)
>> fail because of small differences
>>
>> Kyrill suggested to add a -fvect-cost-model=unlimited flag to these tests
>> and add some
>
> I don't personally like unlimited here as unlimited means just vectorize at
> any cost.  This means that costing between modes is also disabled.  A lot of
> these testcases are intended to test not just that we vectorize but that we
> vectorize with efficient code.
>
> I'd prefer to use -fvect-cost-model=dynamic if that fixes the testcases.

Yeah, I don't think -fvect-cost-model=unlimited would work for the
gather_load_extend_X.c tests, since we require the cost model to decide
when to use extending loads vs loading a full vector and unpacking.

[...tries patch...]

It seems like three main things are contributing to the difference:

1. we now cost 0 for a scalar prologue extension from a loaded value
2. we now cost gathers & scatters based on gather_load_x32/x64_cost
3. we now apply a large one-off cost for gathers (Tamar's recent change)

(1) is expected.

(2) partly looks like a latent bug.  We're using the x32 cost for
VNx2QI->VNx2SI, even though those are really .B->.D loads.

@@ -16819,7 +16811,7 @@ aarch64_detect_vector_stmt_subtype (vec_info *vinfo, vect_cost_for_stmt kind,
   && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_GATHER_SCATTER)
 {
   unsigned int nunits = vect_nunits_for_cost (vectype);
-  if (GET_MODE_UNIT_BITSIZE (TYPE_MODE (vectype)) == 64)
+  if (known_eq (GET_MODE_NUNITS (TYPE_MODE (vectype)), aarch64_sve_vg))
return { sve_costs->gather_load_x64_cost, nunits };
   return { sve_costs->gather_load_x32_cost, nunits };
 }

fixes that.
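The fix changes the choice of gather cost from "64-bit elements" to "one element per 64-bit container". The hypothetical model below assumes a fixed vector length of VQ 128-bit quadwords, so VG (the number of 64-bit granules) is 2 * VQ, and shows why VNx2SI gets the x32 cost under the old test but the x64 cost under the new one. Real GCC compares poly_int quantities with known_eq; plain unsigned integers are used here only for illustration.

```cpp
// Hypothetical model of an SVE mode "VNx<N><elem>": N elements per 128-bit
// quadword, each ELEM_BITS wide.
struct SveMode {
  unsigned n_per_quadword;  // the N in VNx2QI, VNx4SI, ...
  unsigned elem_bits;       // what GET_MODE_UNIT_BITSIZE would return
};

// Old test: use the x64 gather cost only for 64-bit elements.
bool old_uses_x64_cost (SveMode m)
{
  return m.elem_bits == 64;
}

// New test: use the x64 cost when the element count equals VG, i.e. each
// element occupies a 64-bit container (.D), whatever its own width.
bool new_uses_x64_cost (SveMode m, unsigned vq)
{
  unsigned nunits = m.n_per_quadword * vq;
  unsigned vg = 2 * vq;  // two 64-bit granules per 128-bit quadword
  return nunits == vg;
}
```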

(3) is interesting.  generic_vector_cost isn't used by default for any
SVE CPU, or any -march=...+sve.  So the question is: should we treat it
as "architectural intent"/somewhat idealised?  Or should we try to make
it produce good code for existing SVE cores, in which case it would
overlap quite a lot with generic_armv8_a and generic_armv9_a.

If the former, we could set gather_load_x32_init_cost and
gather_load_x64_init_cost to 0 for generic_sve_vector_cost
(and nothing else, so that generic_armv*_a are unaffected).

On the patch:

> @@ -16733,7 +16723,8 @@ aarch64_in_loop_reduction_latency (vec_info *vinfo, stmt_vec_info stmt_info,
>  {
>const cpu_vector_cost *vec_costs = aarch64_tune_params.vec_costs;
>const sve_vec_cost *sve_costs = nullptr;
> -  if (vec_flags & VEC_ANY_SVE)
> +  if (vec_costs->sve
> +  && vec_flags & VEC_ANY_SVE)
>  sve_costs = aarch64_tune_params.vec_costs->sve;

This doesn't seem necessary.  I agree we might as well reuse the
vec_costs local variable in

Un-XFAIL 'gcc.dg/signbit-5.c' for GCN (was: [PATCH] RISC-V: Remove testcase XFAIL)

2024-08-27 Thread Thomas Schwinge

Hi!

On 2024-08-19T13:14:02-0700, Edwin Lu  wrote:
> The testcase has been modified to include the -fwrapv flag which now
> causes the test to pass. Remove the xfail exception

> --- a/gcc/testsuite/gcc.dg/signbit-5.c
> +++ b/gcc/testsuite/gcc.dg/signbit-5.c
> @@ -4,7 +4,6 @@
>  /* This test does not work when the truth type does not match vector type.  */
>  /* { dg-additional-options "-march=armv8-a" { target aarch64_sve } } */
>  /* { dg-xfail-run-if "truth type does not match vector type" { amdgcn-*-* } } */
> -/* { dg-xfail-run-if "truth type does not match vector type" { riscv_v } } */

Same thing for GCN; I've pushed to trunk branch
commit 2daf6187c7289d012365419e10995042139cf8f5
"Un-XFAIL 'gcc.dg/signbit-5.c' for GCN", see attached.


Grüße
 Thomas


From 2daf6187c7289d012365419e10995042139cf8f5 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 27 Aug 2024 12:37:29 +0200
Subject: [PATCH] Un-XFAIL 'gcc.dg/signbit-5.c' for GCN

It XPASSes after recent commit 5a3387938d4d95717cac29eecd0ba53e0ef9094d
"testsuite: Add -fwrapv to signbit-5.c".

	gcc/testsuite/
	* gcc.dg/signbit-5.c: Un-XFAIL for GCN.
---
 gcc/testsuite/gcc.dg/signbit-5.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/signbit-5.c b/gcc/testsuite/gcc.dg/signbit-5.c
index e65c8910c82..57e29e3ca63 100644
--- a/gcc/testsuite/gcc.dg/signbit-5.c
+++ b/gcc/testsuite/gcc.dg/signbit-5.c
@@ -3,7 +3,6 @@
 
 /* This test does not work when the truth type does not match vector type.  */
 /* { dg-additional-options "-march=armv8-a" { target aarch64_sve } } */
-/* { dg-xfail-run-if "truth type does not match vector type" { amdgcn-*-* } } */
 
 
 #include 
-- 
2.34.1

[pushed] [PATCH] testsuite: Fix ending of comment in test cases

2024-08-27 Thread Torbjörn SVENSSON

Found a few more places that had a similar issue with the termination of the
comment, so I fixed them all.

Pushed below patch as obvious (r15-3215).
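
All of the fixed lines share one shape: a `/* { dg-... } *` directive whose terminator has lost its slash. A quick heuristic check for that pattern might look like the hypothetical helper below. It only inspects single lines that contain `/* { dg-` and end in a bare `*`, so it is a sketch for spotting candidates, not a substitute for reviewing each test.

```cpp
#include <string>

// Hypothetical helper: true if LINE is a DejaGnu directive comment whose
// closing "*/" has lost its slash, e.g.  /* { dg-do compile } *
bool has_broken_dg_terminator (const std::string &line)
{
  // Ignore trailing whitespace when looking at the last character.
  std::string::size_type end = line.find_last_not_of (" \t\r\n");
  if (end == std::string::npos)
    return false;
  // Only consider DejaGnu directive comments.
  if (line.find ("/* { dg-") == std::string::npos)
    return false;
  // A well-formed line ends in "*/"; a bare trailing '*' means the slash is missing.
  return line[end] == '*';
}
```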

--


gcc/testsuite/ChangeLog:

* gcc.dg/pr108757-1.c: Fixed dg-comment.
* gcc.dg/pr71071.c: Likewise.
* gcc.dg/tree-ssa/noreturn-1.c: Likewise.
* gcc.dg/tree-ssa/pr56727.c: Likewise.
* gcc.target/arc/loop-2.cpp: Likewise.
* gcc.target/arc/loop-3.c: Likewise.
* gcc.target/arc/pr9001107555.c: Likewise.
* gcc.target/arm/armv8_1m-fp16-move-1.c: Likewise.
* gcc.target/arm/armv8_1m-fp32-move-1.c: Likewise.
* gcc.target/arm/armv8_1m-fp64-move-1.c: Likewise.
* gcc.target/i386/amxint8-asmatt-1.c: Likewise.
* gcc.target/i386/amxint8-asmintel-1.c: Likewise.
* gcc.target/i386/avx512bw-vpermt2w-1.c: Likewise.
* gcc.target/i386/avx512vbmi-vpermt2b-1.c: Likewise.
* gcc.target/i386/endbr_immediate.c: Likewise.
* gcc.target/i386/pr96539.c: Likewise.
* gcc.target/i386/sse2-pr98461-2.c: Likewise.
* gcc.target/m68k/pr39726.c: Likewise.
* gcc.target/m68k/pr52076-1.c: Likewise.
* gcc.target/m68k/pr52076-2.c: Likewise.
* gcc.target/nvptx/v2si-vec-set-extract.c: Likewise.

Signed-off-by: Torbjörn SVENSSON 
---
 gcc/testsuite/gcc.dg/pr108757-1.c | 2 +-
 gcc/testsuite/gcc.dg/pr71071.c| 2 +-
 gcc/testsuite/gcc.dg/tree-ssa/noreturn-1.c| 2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr56727.c   | 2 +-
 gcc/testsuite/gcc.target/arc/loop-2.cpp   | 2 +-
 gcc/testsuite/gcc.target/arc/loop-3.c | 2 +-
 gcc/testsuite/gcc.target/arc/pr9001107555.c   | 2 +-
 gcc/testsuite/gcc.target/arm/armv8_1m-fp16-move-1.c   | 2 +-
 gcc/testsuite/gcc.target/arm/armv8_1m-fp32-move-1.c   | 2 +-
 gcc/testsuite/gcc.target/arm/armv8_1m-fp64-move-1.c   | 2 +-
 gcc/testsuite/gcc.target/i386/amxint8-asmatt-1.c  | 2 +-
 gcc/testsuite/gcc.target/i386/amxint8-asmintel-1.c| 2 +-
 gcc/testsuite/gcc.target/i386/avx512bw-vpermt2w-1.c   | 2 +-
 gcc/testsuite/gcc.target/i386/avx512vbmi-vpermt2b-1.c | 2 +-
 gcc/testsuite/gcc.target/i386/endbr_immediate.c   | 2 +-
 gcc/testsuite/gcc.target/i386/pr96539.c   | 2 +-
 gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c| 2 +-
 gcc/testsuite/gcc.target/m68k/pr39726.c   | 2 +-
 gcc/testsuite/gcc.target/m68k/pr52076-1.c | 2 +-
 gcc/testsuite/gcc.target/m68k/pr52076-2.c | 2 +-
 gcc/testsuite/gcc.target/nvptx/v2si-vec-set-extract.c | 2 +-
 21 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/pr108757-1.c b/gcc/testsuite/gcc.dg/pr108757-1.c
index 7908f4bdcb8..712dc4c30e9 100644
--- a/gcc/testsuite/gcc.dg/pr108757-1.c
+++ b/gcc/testsuite/gcc.dg/pr108757-1.c
@@ -13,6 +13,6 @@ typedef int INT;
 #define IMIN INT_MIN
 #include "pr108757.h"
 
-/* { dg-final { scan-tree-dump-not " = x_\[0-9\]+\\(D\\) \\+ " "optimized" } } *
+/* { dg-final { scan-tree-dump-not " = x_\[0-9\]+\\(D\\) \\+ " "optimized" } } */
 /* { dg-final { scan-tree-dump-not " = x_\[0-9\]+\\(D\\) \\- " "optimized" } } */
 /* { dg-final { scan-tree-dump-not " = b_\[0-9\]+ \\+ " "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/pr71071.c b/gcc/testsuite/gcc.dg/pr71071.c
index 582f1f15a43..3e83dc9f1b7 100644
--- a/gcc/testsuite/gcc.dg/pr71071.c
+++ b/gcc/testsuite/gcc.dg/pr71071.c
@@ -1,5 +1,5 @@
 /* PR bootstrap/71071 */
-/* { dg-do compile } *
+/* { dg-do compile } */
 /* { dg-options "-O2" } */
 
 struct S { unsigned b : 1; } a;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/noreturn-1.c b/gcc/testsuite/gcc.dg/tree-ssa/noreturn-1.c
index ae7ee42fabc..35f3d980217 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/noreturn-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/noreturn-1.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } *
+/* { dg-do compile } */
 /* { dg-options "-O2 -fdump-tree-ssa -std=gnu11" } */
 /* { dg-final { scan-tree-dump-times "__builtin_unreachable" 4 "ssa" } } */
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr56727.c b/gcc/testsuite/gcc.dg/tree-ssa/pr56727.c
index 3080ce183b8..da2c9ab31f2 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr56727.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr56727.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target fpic } } *
+/* { dg-do compile { target fpic } } */
 /* { dg-require-alias "" } */
 /* { dg-options "-O2 -fPIC -fdump-tree-optimized" } */
 void do_not_optimize(int b)
diff --git a/gcc/testsuite/gcc.target/arc/loop-2.cpp b/gcc/testsuite/gcc.target/arc/loop-2.cpp
index d1dc917ba47..9cfb3274e21 100644
--- a/gcc/testsuite/gcc.target/arc/loop-2.cpp
+++ b/gcc/testsuite/gcc.target/arc/loop-2.cpp
@@ -1,4 +1,4 @@
-/* { dg-options "-O2" } *
+/* { dg-options "-O2" } */
 /* { dg-do assemble } */
 
 /* This file fails to assemble if we forgot to increase the number of
diff --git a/gcc/testsuite/gcc.target/arc/loop-3.c b/gcc/testsuite/gcc.target/arc/loop-3.c
index a

Re: [patch][rfc] libgomp: Add OpenMP interop support to nvptx + gcn plugin

2024-08-27 Thread Tobias Burnus


Hi Andrew,

Andrew Stubbs:

On 22/08/2024 19:26, Tobias Burnus wrote:
(A) Any comments, suggestions regarding the patch in general and in 
particular the plugin/ related parts?


The code all looks pretty reasonable to me.

The header file conditional includes worry me though: it is adding 
complexity in a way that hurts maintainability, and looks like it 
might break somebody's hypothetical out-of-tree plugin. Is it not 
better for a plugin that supports interop to include omp.h itself?
I do note that libgomp.h explicitly includes 'omp.h.in' – and later 
includes 'libgomp-plugin.h' and not omp.h.


But I don't know why. It could be a build-related issue, or because it
already replaces the locking definitions with its own? (Although it could
still use 'omp.h' together with the current '#ifdef' protection.)


Assuming that omp.h.in is only included because of the locking-type dance,
and not because of an actual build issue, I will try whether just including
'omp.h' in plugin/plugin-*.c and libgomp-plugin.c before
libgomp-plugin.h works. For libgomp.h, it is already included (and then
used by target.c).


* * *

(B) RFC: The *stream* *creation* (hsa_queue_t, 
cudaStream_t/hipStream_t) functions have tons of options. Thus:

...

(ii) Should the user be able to tweak the values?

I mean, the user could say: 'prefer_type({fr("cuda"), attr("ompx_priority:-2,ompx_non_blocking")},{fr("hsa"),attr("ompx_queue_size:64")})'.


Do we want to permit this? If yes, which of the values should be 
changeable?


Is there any prior art for this? It looks like it could be added in 
future, without breaking backward compatibility, so I say "no" (at 
least for now).


There is no real prior art as 'attr' is a very new feature (voted in
about two months ago); I think it was mainly proposed for 'sycl'
to specify an 'in-order' queue, which is what is commonly needed, but the
default in sycl is an 'out-of-order' queue. In any case, it seems as if
they intend to provide either type of queue.


Still, if there is a sensible attribute to set, I think it makes sense 
to actually add it – and 'ompx_gnu_' should avoid interoperability issues.


But as the feature is supported code-wise, adding an attribute only
requires changing two files: the plugin-.c and libgomp.texi, i.e.
that's simple and quick.


Tobias

RE: [RFC][PATCH] AArch64: Remove AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS

2024-08-27 Thread Tamar Christina

> -Original Message-
> From: Richard Sandiford 
> Sent: Tuesday, August 27, 2024 11:46 AM
> To: Tamar Christina 
> Cc: Jennifer Schmitz ; gcc-patches@gcc.gnu.org; Kyrylo
> Tkachov 
> Subject: Re: [RFC][PATCH] AArch64: Remove
> AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> 
> Tamar Christina  writes:
> > Hi Jennifer,
> >
> >> -Original Message-
> >> From: Jennifer Schmitz 
> >> Sent: Friday, August 23, 2024 1:07 PM
> >> To: gcc-patches@gcc.gnu.org
> >> Cc: Richard Sandiford ; Kyrylo Tkachov
> >> 
> >> Subject: [RFC][PATCH] AArch64: Remove
> >> AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> >>
> >> This patch removes the AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> >> tunable and
> >> use_new_vector_costs entry in aarch64-tuning-flags.def and makes the
> >> AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS paths in the backend the
> >> default.
> 
> Thanks for doing this.  This has been on my TODO list ever since the
> tunable was added.
> 
> The history is that these "new" costs were originally added in stage 4
> of GCC 11 for Neoverse V1.  Since the costs were added so late, it wasn't
> appropriate to change the behaviour for any other core.  All the new code
> was therefore gated on this option.
> 
> The new costs do two main things:
> 
> (1) use throughput-based calculations where available, including to choose
> between Advanced SIMD and SVE
> 
> (2) try to make the latency-based costs more precise, by looking more closely
> at the provided stmt_info
> 
> Old cost models won't be affected by (1) either way, since they don't
> provide any throughput information.  But they should in principle benefit
> from (2).  So...
> 
> >> To that end, the function aarch64_use_new_vector_costs_p and its uses were
> >> removed. Additionally, guards were added to prevent null pointer dereferences of
> >> fields in cpu_vector_cost.
> >>
> >
> > I'm not against this change, but it does mean that we now switch old Adv. SIMD
> > cost models as well to the new throughput based cost models.  That means that
> > -mcpu=generic now behaves differently, and -mcpu=neoverse-n1 and I think
> > some distros explicitly use this (I believe yocto for instance does).
> 
> ...it shouldn't mean that we start using throughput-based models for
> cortexa53 etc., since there's no associated issue info.

Yes, I was using "throughput-based model" as a name.  But as you indicated in (2),
it does change the latency calculation.

My question was because of things in e.g. aarch64_adjust_stmt_cost and friends,
e.g. aarch64_multiply_add_p changes the cost between FMA SIMD vs scalar.

So my question..

> 
> > Have we validated that the old generic cost model still behaves sensibly
> > with this change?

is still valid, I think: we *are* changing the cost for all models,
and while they should indeed be more accurate, there could be knock-on effects.

Thanks,
Tamar

> >
> >> The patch was bootstrapped and regtested on aarch64-linux-gnu:
> >> No problems bootstrapping, but several test files (in aarch64-sve.exp:
> >> gather_load_extend_X.c where X is 1 to 4, strided_load_2.c, strided_store_2.c)
> >> fail because of small differences in codegen that make some of the
> >> scan-assembler-times tests fail.
> >>
> >> Kyrill suggested to add a -fvect-cost-model=unlimited flag to these tests
> >> and add some
> >
> > I don't personally like unlimited here as unlimited means just vectorize at
> > any cost.  This means that costing between modes is also disabled.  A lot of
> > these testcases are intended to test not just that we vectorize but that we
> > vectorize with efficient code.
> >
> > I'd prefer to use -fvect-cost-model=dynamic if that fixes the testcases.
> 
> Yeah, I don't think -fvect-cost-model=unlimited would work for the
> gather_load_extend_X.c tests, since we require the cost model to decide
> when to use extending loads vs loading a full vector and unpacking.
> 
> [...tries patch...]
> 
> It seems like three main things are contributing to the difference:
> 
> 1. we now cost 0 for a scalar prologue extension from a loaded value
> 2. we now cost gathers & scatters based on gather_load_x32/x64_cost
> 3. we now apply a large one-off cost for gathers (Tamar's recent change)
> 
> (1) is expected.
> 
> (2) partly looks like a latent bug.  We're using the x32 cost for
> VNx2QI->VNx2SI, even though those are really .B->.D loads.
> 
> @@ -16819,7 +16811,7 @@ aarch64_detect_vector_stmt_subtype (vec_info *vinfo, vect_cost_for_stmt kind,
>    && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_GATHER_SCATTER)
>  {
>unsigned int nunits = vect_nunits_for_cost (vectype);
> -  if (GET_MODE_UNIT_BITSIZE (TYPE_MODE (vectype)) == 64)
> +  if (known_eq (GET_MODE_NUNITS (TYPE_MODE (vectype)), aarch64_sve_vg))
>   return { sve_costs->gather_load_x64_cost, nunits };
>return { sve_costs->gather_load_x32_cost, nunits };
>  }
> 
> fixes that.
> 
> (3) is interesting.  generic_ve

RE: [PATCH v2] Vect: Reconcile the const_int operand type of unsigned .SAT_ADD

2024-08-27 Thread Li, Pan2

Thanks, Richard, for the comments.


> Err, can you please simply do
>   if (TREE_CODE (ops[1]) == INTEGER_CST)
>     ops[1] = fold_convert (TREE_TYPE (ops[0]), ops[1]);
> ?  you are always matching the constant to @1 IIRC.

That would be much simpler; I will try it in v3.

Pan
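
The v2 approach converts the INTEGER_CST operand to the type of the other operand only when no value is lost, while Richard suggests a plain fold_convert because the constant is always matched to @1. The hypothetical C++ helper below models just the losslessness check for an unsigned target type of a given precision, using a 64-bit signed integer for the constant. GCC has its own tree helpers for this kind of question (for example int_fits_type_p), so this is illustration only.

```cpp
#include <cstdint>

// Hypothetical check: can VALUE be represented in an unsigned type of
// PRECISION bits (1..64) without changing its value?
bool lossless_to_unsigned (std::int64_t value, unsigned precision)
{
  if (value < 0)
    return false;  // a negative constant would change value in an unsigned type
  if (precision >= 64)
    return true;   // any non-negative int64_t fits in 64 unsigned bits
  return (static_cast<std::uint64_t> (value) >> precision) == 0;
}
```

For the example in the patch, lossless_to_unsigned (9, 64) is true, so converting (const_int 9) to the 64-bit operand type keeps its value.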

-Original Message-
From: Richard Biener  
Sent: Tuesday, August 27, 2024 5:09 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v2] Vect: Reconcile the const_int operand type of unsigned 
.SAT_ADD

On Tue, Aug 27, 2024 at 9:09 AM  wrote:
>
> From: Pan Li 
>
> The .SAT_ADD has 2 operands, and one of them may be an INTEGER_CST.
> For example, _1 = .SAT_ADD (_2, 9) comes from the sample code below.
>
> Form 3:
>   #define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM)  \
>   T __attribute__((noinline))  \
>   vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \
>   {\
> unsigned i;\
> T ret; \
> for (i = 0; i < limit; i++)\
>   {\
> out[i] = __builtin_add_overflow (in[i], IMM, &ret) ? -1 : ret; \
>   }\
>   }
>
> DEF_VEC_SAT_U_ADD_IMM_FMT_3(uint64_t, 9)
>
> It will fail to vectorize because vectorizable_call checks that the
> operands are type-compatible, but the imm will be (const_int 9) in
> SImode, which is different from _2 (DImode).  That is:
>
> uint64_t _1;
> uint64_t _2;
> _1 = .SAT_ADD (_2, 9);
>
> This patch reconciles the imm operand to the type of _2 if and only if
> there is no precision/data loss, i.e. it converts the imm 9 to DImode
> in the example above.
>
> The following test suites pass with this patch:
> 1. The rv64gcv fully regression tests.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap tests.
> 4. The x86 fully regression tests.
>
> gcc/ChangeLog:
>
> * tree-vect-patterns.cc (vect_recog_reconcile_cst_to_unsigned):
> Add new func impl to reconcile the cst int type to given TREE type.
> (vect_recog_sat_add_pattern): Reconcile the ops of .SAT_ADD
> before building the gimple call.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-1.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-10.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-11.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-12.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-13.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-14.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-15.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-2.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-3.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-4.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-5.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-6.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-7.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-8.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-9.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  .../binop/vec_sat_u_add_imm_reconcile-1.c |  9 +
>  .../binop/vec_sat_u_add_imm_reconcile-10.c|  9 +
>  .../binop/vec_sat_u_add_imm_reconcile-11.c|  9 +
>  .../binop/vec_sat_u_add_imm_reconcile-12.c|  9 +
>  .../binop/vec_sat_u_add_imm_reconcile-13.c|  9 +
>  .../binop/vec_sat_u_add_imm_reconcile-14.c|  9 +
>  .../binop/vec_sat_u_add_imm_reconcile-15.c|  9 +
>  .../binop/vec_sat_u_add_imm_reconcile-2.c |  9 +
>  .../binop/vec_sat_u_add_imm_reconcile-3.c |  9 +
>  .../binop/vec_sat_u_add_imm_reconcile-4.c |  9 +
>  .../binop/vec_sat_u_add_imm_reconcile-5.c |  9 +
>  .../binop/vec_sat_u_add_imm_reconcile-6.c |  9 +
>  .../binop/vec_sat_u_add_imm_reconcile-7.c |  9 +
>  .../binop/vec_sat_u_add_imm_reconcile-8.c |  9 +
>  .../binop/vec_sat_u_add_imm_rec

RE: [PATCH v3] Match: Support form 1 for scalar signed integer .SAT_ADD

2024-08-27 Thread Li, Pan2

> :c is required when you want to match up @0s and they appear in a commutative
> operation and there's no canonicalization rule putting it into one or the 
> other
> position.  In your case you have two commutative operations you want to match
> up, so it should be only necessary to try swapping one of it to get the match,
> it's not required to swap both.  This reduces the number of generated 
> patterns.

Thanks Richard for the explanation. Got the point that swapping the captures
in one op also affects the other op(s); will update in v4.

Pan

-----Original Message-----
From: Richard Biener  
Sent: Tuesday, August 27, 2024 4:41 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v3] Match: Support form 1 for scalar signed integer .SAT_ADD

On Tue, Aug 27, 2024 at 3:06 AM Li, Pan2  wrote:
>
> Thanks Richard for comments.
>
> > I think you want to use nop_convert here, for sure a truncation or
> > extension wouldn't be valid?
>
> Oh, yes, should be nop_convert.
>
> > I think you don't need :c on both the inner plus and the bit_xor here?
>
> Sure, could you please explain more about when I need to add :c?
> Like inner plus/and/or ... etc.; I sometimes get confused in similar scenarios.

:c is required when you want to match up @0s and they appear in a commutative
operation and there's no canonicalization rule putting it into one or the other
position.  In your case you have two commutative operations you want to match
up, so it should only be necessary to try swapping one of them to get the match;
it's not required to swap both.  This reduces the number of generated patterns.

> > +   integer_zerop)
> > +   (bit_xor:c (negate (convert (lt @0 integer_zerop))) max_value)
>
> > The comment above quotes 'MIN' but that's not present here - that is,
> > the comment quotes a source form while we match what we see on
> > GIMPLE?  I do expect the matching will be quite fragile when not
> > being isolated.
>
> Got it, will update the comments to gimple.
>
> Pan
>
> -----Original Message-----
> From: Richard Biener 
> Sent: Monday, August 26, 2024 9:40 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
> kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
> Subject: Re: [PATCH v3] Match: Support form 1 for scalar signed integer 
> .SAT_ADD
>
> On Mon, Aug 26, 2024 at 4:20 AM  wrote:
> >
> > From: Pan Li 
> >
> > This patch would like to support the form 1 of the scalar signed
> > integer .SAT_ADD.  Aka below example:
> >
> > Form 1:
> >   #define DEF_SAT_S_ADD_FMT_1(T, UT, MIN, MAX) \
> >   T __attribute__((noinline))  \
> >   sat_s_add_##T##_fmt_1 (T x, T y) \
> >   {\
> > T sum = (UT)x + (UT)y; \
> > return (x ^ y) < 0 \
> >   ? sum\
> >   : (sum ^ x) >= 0 \
> > ? sum  \
> > : x < 0 ? MIN : MAX;   \
> >   }
> >
> > DEF_SAT_S_ADD_FMT_1(int64_t, uint64_t, INT64_MIN, INT64_MAX)
> >
> > We can tell the difference before and after this patch if backend
> > implemented the ssadd3 pattern similar as below.
> >
> > Before this patch:
> >4   │ __attribute__((noinline))
> >5   │ int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y)
> >6   │ {
> >7   │   int64_t sum;
> >8   │   long unsigned int x.0_1;
> >9   │   long unsigned int y.1_2;
> >   10   │   long unsigned int _3;
> >   11   │   long int _4;
> >   12   │   long int _5;
> >   13   │   int64_t _6;
> >   14   │   _Bool _11;
> >   15   │   long int _12;
> >   16   │   long int _13;
> >   17   │   long int _14;
> >   18   │   long int _16;
> >   19   │   long int _17;
> >   20   │
> >   21   │ ;;   basic block 2, loop depth 0
> >   22   │ ;;pred:   ENTRY
> >   23   │   x.0_1 = (long unsigned int) x_7(D);
> >   24   │   y.1_2 = (long unsigned int) y_8(D);
> >   25   │   _3 = x.0_1 + y.1_2;
> >   26   │   sum_9 = (int64_t) _3;
> >   27   │   _4 = x_7(D) ^ y_8(D);
> >   28   │   _5 = x_7(D) ^ sum_9;
> >   29   │   _17 = ~_4;
> >   30   │   _16 = _5 & _17;
> >   31   │   if (_16 < 0)
> >   32   │ goto ; [41.00%]
> >   33   │   else
> >   34   │ goto ; [59.00%]
> >   35   │ ;;succ:   3
> >   36   │ ;;4
> >   37   │
> >   38   │ ;;   basic block 3, loop depth 0
> >   39   │ ;;pred:   2
> >   40   │   _11 = x_7(D) < 0;
> >   41   │   _12 = (long int) _11;
> >   42   │   _13 = -_12;
> >   43   │   _14 = _13 ^ 9223372036854775807;
> >   44   │ ;;succ:   4
> >   45   │
> >   46   │ ;;   basic block 4, loop depth 0
> >   47   │ ;;pred:   2
> >   48   │ ;;3
> >   49   │   # _6 = PHI 
> >   50   │   return _6;
> >

Re: [PATCH] testsuite: Avoid running neon test on Cortex-M55

2024-08-27 Thread Richard Earnshaw (lists)

On 13/08/2024 17:18, Andre Vieira (lists) wrote:
> I'm not a maintainer but I'd argue the entire test is bogus.
> 
> The error reporting in this area seems to be somewhat fragile, if you compile 
> it with '-march=armv7-a -mfloat-abi=soft', you also don't get the error this 
> is testing for.  I'd argue this kind of user friendly error message should 
> just go through the #include  and if a user is using __builtin's 
> directly like this then they better know what they are doing and so 'useful' 
> errors are probably less of a priority.
> 
> In case you are wondering: no we don't offer nice errors when '#include 
> ' is compiled with a MVE enabled combination of march/mcpu, but 
> the errors are somewhat friendlier if you compile a '#include ' 
> with a NEON enabled command line.
> 
> Anyway, let's see what Richard says.

I'm inclined to agree.  Unfortunately, we have quite a few suspicious tests: 
it's the downside of a convention that all patches should be accompanied by 
tests - if the tests aren't up to much, they often cause more problems than they 
are worth.  I'd be happy for this one to go.

If you really think it ought to be kept for some reason, then the 
dg-add-options directive should be updated to match the required effective 
target (i.e. 'arm_neon').

R.

> 
> On 13/08/2024 12:15, Torbjörn SVENSSON wrote:
>> Ok for trunk and releases/gcc-14?
>>
>> -- 
>>
>> Cortex-M55 supports VFP, but does not contain neon, so the test is
>> invalid in this context.
>>
>> Without this patch, the following error can be seen in the logs:
>>
>> .../attr-neon-builtin-fail2.c: In function 'foo':
>> .../attr-neon-builtin-fail2.c:13:27: error: implicit declaration of function 
>> '__builtin_neon_vaddlsv8qi'; did you mean '__builtin_neon_vabshf'? 
>> [-Wimplicit-function-declaration]
>> .../attr-neon-builtin-fail2.c:13:3: error: cannot convert a value of type 
>> 'int' to vector type '__simd128_int16_t' which has different size
>>
>> gcc/testsuite/ChangeLog:
>>
>> * attr-neon-builtin-fail2.c: Check ET neon.
>>
>> Signed-off-by: Torbjörn SVENSSON 
>> Co-authored-by: Yvan ROUX 
>> ---
>>   gcc/testsuite/gcc.target/arm/attr-neon-builtin-fail2.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/gcc/testsuite/gcc.target/arm/attr-neon-builtin-fail2.c 
>> b/gcc/testsuite/gcc.target/arm/attr-neon-builtin-fail2.c
>> index 9cb5a2ebb90..8942b0d68f1 100644
>> --- a/gcc/testsuite/gcc.target/arm/attr-neon-builtin-fail2.c
>> +++ b/gcc/testsuite/gcc.target/arm/attr-neon-builtin-fail2.c
>> @@ -1,6 +1,7 @@
>>   /* Check that calling a neon builtin from a function compiled with vfp 
>> fails.  */
>>   /* { dg-do compile } */
>>   /* { dg-require-effective-target arm_vfp_ok } */
>> +/* { dg-require-effective-target arm_neon_ok } */
>>   /* { dg-options "-O2" } */
>>   /* { dg-add-options arm_vfp } */
>>

[PATCH v4] Match: Support form 1 for scalar signed integer .SAT_ADD

2024-08-27 Thread pan2 . li

From: Pan Li 

This patch adds support for form 1 of the scalar signed
integer .SAT_ADD, i.e. the example below:

Form 1:
  #define DEF_SAT_S_ADD_FMT_1(T, UT, MIN, MAX) \
  T __attribute__((noinline))                  \
  sat_s_add_##T##_fmt_1 (T x, T y)             \
  {                                            \
    T sum = (UT)x + (UT)y;                     \
    return (x ^ y) < 0                         \
      ? sum                                    \
      : (sum ^ x) >= 0                         \
        ? sum                                  \
        : x < 0 ? MIN : MAX;                   \
  }

DEF_SAT_S_ADD_FMT_1(int64_t, uint64_t, INT64_MIN, INT64_MAX)
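To see what Form 1 computes, the macro can be instantiated and compared with a reference built on GCC's `__builtin_add_overflow`. This is an illustrative sketch, not part of the patch: the `int8_t` instantiation and `ref_sat_s_add_int64` are additions for the example, and the code relies on GCC's documented modulo behaviour when an out-of-range unsigned value is converted back to the signed type.

```c
#include <stdint.h>

/* Form 1 from the commit message.  */
#define DEF_SAT_S_ADD_FMT_1(T, UT, MIN, MAX) \
  T __attribute__((noinline))                \
  sat_s_add_##T##_fmt_1 (T x, T y)           \
  {                                          \
    T sum = (UT)x + (UT)y;                   \
    return (x ^ y) < 0                       \
      ? sum                                  \
      : (sum ^ x) >= 0                       \
        ? sum                                \
        : x < 0 ? MIN : MAX;                 \
  }

DEF_SAT_S_ADD_FMT_1(int64_t, uint64_t, INT64_MIN, INT64_MAX)
DEF_SAT_S_ADD_FMT_1(int8_t, uint8_t, INT8_MIN, INT8_MAX)

/* Reference: __builtin_add_overflow reports whether the exact sum
   overflows int64_t; on overflow, the sign of X picks the bound.  */
static int64_t
ref_sat_s_add_int64 (int64_t x, int64_t y)
{
  int64_t r;
  if (__builtin_add_overflow (x, y, &r))
    return x < 0 ? INT64_MIN : INT64_MAX;
  return r;
}
```

Operands with different signs can never overflow, which is why the first test, `(x ^ y) < 0`, returns the wrapped sum directly.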

The difference before and after this patch is visible when the backend
implements the ssadd3 pattern, as shown below.

Before this patch:
  __attribute__((noinline))
  int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y)
  {
    int64_t sum;
    long unsigned int x.0_1;
    long unsigned int y.1_2;
    long unsigned int _3;
    long int _4;
    long int _5;
    int64_t _6;
    _Bool _11;
    long int _12;
    long int _13;
    long int _14;
    long int _16;
    long int _17;

  ;;   basic block 2, loop depth 0
  ;;    pred:       ENTRY
    x.0_1 = (long unsigned int) x_7(D);
    y.1_2 = (long unsigned int) y_8(D);
    _3 = x.0_1 + y.1_2;
    sum_9 = (int64_t) _3;
    _4 = x_7(D) ^ y_8(D);
    _5 = x_7(D) ^ sum_9;
    _17 = ~_4;
    _16 = _5 & _17;
    if (_16 < 0)
      goto ; [41.00%]
    else
      goto ; [59.00%]
  ;;    succ:       3
  ;;                4

  ;;   basic block 3, loop depth 0
  ;;    pred:       2
    _11 = x_7(D) < 0;
    _12 = (long int) _11;
    _13 = -_12;
    _14 = _13 ^ 9223372036854775807;
  ;;    succ:       4

  ;;   basic block 4, loop depth 0
  ;;    pred:       2
  ;;                3
    # _6 = PHI
    return _6;
  ;;    succ:       EXIT

  }

After this patch:
  __attribute__((noinline))
  int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y)
  {
    int64_t _4;

  ;;   basic block 2, loop depth 0
  ;;    pred:       ENTRY
    _4 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
    return _4;
  ;;    succ:       EXIT

  }
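The "Before this patch" dump shows the shape that the new match.pd rule recognizes. Overflow is detected by `((x ^ sum) & ~(x ^ y)) < 0`, and the saturated value is computed without a branch as `-(T)(x < 0) ^ MAX` (`_13 = -_12; _14 = _13 ^ 9223372036854775807`). The following is a C transcription of that shape for `int64_t`, offered as a sketch. The function name is hypothetical.

```c
#include <stdint.h>

/* Mirrors the GIMPLE above: overflow iff ((x ^ sum) & ~(x ^ y)) < 0.
   -(int64_t)(x < 0) is 0 or all-ones, so XOR with INT64_MAX yields
   INT64_MAX for non-negative x and INT64_MIN for negative x.  */
static int64_t
sat_s_add_gimple_shape (int64_t x, int64_t y)
{
  int64_t sum = (int64_t) ((uint64_t) x + (uint64_t) y);
  if (((x ^ sum) & ~(x ^ y)) < 0)
    return -(int64_t) (x < 0) ^ INT64_MAX;
  return sum;
}
```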

The following test suites pass with this patch:
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.

gcc/ChangeLog:

* match.pd: Add the matching for signed .SAT_ADD.
* tree-ssa-math-opts.cc (gimple_signed_integer_sat_add): Add new
matching func decl.
(match_unsigned_saturation_add): Try signed .SAT_ADD and rename
to ...
(match_saturation_add): ... here.
(math_opts_dom_walker::after_dom_children): Update the above renamed
func from caller.

Signed-off-by: Pan Li 
---
 gcc/match.pd  | 15 +++
 gcc/tree-ssa-math-opts.cc | 35 ++-
 2 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 78f1957e8c7..09a36159163 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3192,6 +3192,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
   && types_match (type, @0
 
+/* Signed saturation add, case 1:
+   T sum = (T)((UT)X + (UT)Y)
+   SAT_S_ADD = (X ^ sum) & !(X ^ Y) < 0 ? (-(T)(X < 0) ^ MAX) : sum;
+
+   The T and UT are type pair like T=int8_t, UT=uint8_t.  */
+(match (signed_integer_sat_add @0 @1)
+ (cond^ (lt (bit_and:c (bit_xor:c @0 (nop_convert@2 (plus (nop_convert @0)
+ (nop_convert @1
+  (bit_not (bit_xor:c @0 @1)))
+   integer_zerop)
+   (bit_xor:c (negate (convert (lt @0 integer_zerop))) max_value)
+   @2)
+ (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type)
+  && types_match (type, @0, @1
+
 /* Unsigned saturation sub, case 1 (branch with gt):
SAT_U_SUB = X > Y ? X - Y : 0  */
 (match (unsigned_integer_sat_sub @0 @1)
diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
index 8d96a4c964b..3c93fca5b53 100644
--- a/gcc/tree-ssa-math-opts.cc
+++ b/gcc/tree-ssa-math-opts.cc
@@ -4023,6 +4023,8 @@ extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
 extern bool gimple_unsigned_integer_sat_sub (tree, tree*, tree (*)(tree));
 extern bool gimple_unsigned_integer_sat_trunc (tree, tree*, tree (*)(tree));
 
+extern bool gimple_signed_integer_sat_add (tree, tree*, tree (*)(tree));
+
 static void
 build_saturation_binary_arith_call (gimple_stmt_iterator *gsi, internal_fn fn,

Re: [PATCH v4] Match: Support form 1 for scalar signed integer .SAT_ADD

2024-08-27 Thread Richard Biener

On Tue, Aug 27, 2024 at 1:53 PM  wrote:
>
> From: Pan Li 
>
> This patch would like to support the form 1 of the scalar signed
> integer .SAT_ADD.  Aka below example:
>
> Form 1:
>   #define DEF_SAT_S_ADD_FMT_1(T, UT, MIN, MAX) \
>   T __attribute__((noinline))  \
>   sat_s_add_##T##_fmt_1 (T x, T y) \
>   {\
> T sum = (UT)x + (UT)y; \
> return (x ^ y) < 0 \
>   ? sum\
>   : (sum ^ x) >= 0 \
> ? sum  \
> : x < 0 ? MIN : MAX;   \
>   }
>
> DEF_SAT_S_ADD_FMT_1(int64_t, uint64_t, INT64_MIN, INT64_MAX)
>
> We can tell the difference before and after this patch if backend
> implemented the ssadd3 pattern similar as below.
>
> Before this patch:
>4   │ __attribute__((noinline))
>5   │ int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y)
>6   │ {
>7   │   int64_t sum;
>8   │   long unsigned int x.0_1;
>9   │   long unsigned int y.1_2;
>   10   │   long unsigned int _3;
>   11   │   long int _4;
>   12   │   long int _5;
>   13   │   int64_t _6;
>   14   │   _Bool _11;
>   15   │   long int _12;
>   16   │   long int _13;
>   17   │   long int _14;
>   18   │   long int _16;
>   19   │   long int _17;
>   20   │
>   21   │ ;;   basic block 2, loop depth 0
>   22   │ ;;pred:   ENTRY
>   23   │   x.0_1 = (long unsigned int) x_7(D);
>   24   │   y.1_2 = (long unsigned int) y_8(D);
>   25   │   _3 = x.0_1 + y.1_2;
>   26   │   sum_9 = (int64_t) _3;
>   27   │   _4 = x_7(D) ^ y_8(D);
>   28   │   _5 = x_7(D) ^ sum_9;
>   29   │   _17 = ~_4;
>   30   │   _16 = _5 & _17;
>   31   │   if (_16 < 0)
>   32   │ goto ; [41.00%]
>   33   │   else
>   34   │ goto ; [59.00%]
>   35   │ ;;succ:   3
>   36   │ ;;4
>   37   │
>   38   │ ;;   basic block 3, loop depth 0
>   39   │ ;;pred:   2
>   40   │   _11 = x_7(D) < 0;
>   41   │   _12 = (long int) _11;
>   42   │   _13 = -_12;
>   43   │   _14 = _13 ^ 9223372036854775807;
>   44   │ ;;succ:   4
>   45   │
>   46   │ ;;   basic block 4, loop depth 0
>   47   │ ;;pred:   2
>   48   │ ;;3
>   49   │   # _6 = PHI 
>   50   │   return _6;
>   51   │ ;;succ:   EXIT
>   52   │
>   53   │ }
>
> After this patch:
>4   │ __attribute__((noinline))
>5   │ int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y)
>6   │ {
>7   │   int64_t _4;
>8   │
>9   │ ;;   basic block 2, loop depth 0
>   10   │ ;;pred:   ENTRY
>   11   │   _4 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
>   12   │   return _4;
>   13   │ ;;succ:   EXIT
>   14   │
>   15   │ }
>
> The below test suites are passed for this patch.
> * The rv64gcv fully regression test.
> * The x86 bootstrap test.
> * The x86 fully regression test.

OK.

Richard.

> gcc/ChangeLog:
>
> * match.pd: Add the matching for signed .SAT_ADD.
> * tree-ssa-math-opts.cc (gimple_signed_integer_sat_add): Add new
> matching func decl.
> (match_unsigned_saturation_add): Try signed .SAT_ADD and rename
> to ...
> (match_saturation_add): ... here.
> (math_opts_dom_walker::after_dom_children): Update the above renamed
> func from caller.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd  | 15 +++
>  gcc/tree-ssa-math-opts.cc | 35 ++-
>  2 files changed, 45 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 78f1957e8c7..09a36159163 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3192,6 +3192,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>&& types_match (type, @0
>
> +/* Signed saturation add, case 1:
> +   T sum = (T)((UT)X + (UT)Y)
> +   SAT_S_ADD = (X ^ sum) & !(X ^ Y) < 0 ? (-(T)(X < 0) ^ MAX) : sum;
> +
> +   The T and UT are type pair like T=int8_t, UT=uint8_t.  */
> +(match (signed_integer_sat_add @0 @1)
> + (cond^ (lt (bit_and:c (bit_xor:c @0 (nop_convert@2 (plus (nop_convert @0)
> + (nop_convert @1
> +  (bit_not (bit_xor:c @0 @1)))
> +   integer_zerop)
> +   (bit_xor:c (negate (convert (lt @0 integer_zerop))) max_value)
> +   @2)
> + (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type)
> +  && types_match (type, @0, @1
> +
>  /* Unsigned saturation sub, case 1 (branch with gt):
> SAT_U_SUB = X > Y ? X - Y : 0  */
>  (match (unsigned_integer_sat_sub @0 @1)
> diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> index 8d96a4c964b..3c93fca5b53 100644
> --- a/gcc/tree-ssa-math-opts.cc
> +++ b/gcc/tree-ssa-math-opts.cc
> @@ -4023,6 +4023,8 @@ extern bool gimple_unsigned_integer_sat_add (tree, 
> tree*, tree (*)(tree));
>

RE: [PATCH] RISC-V: Fix double mode under RV32 not utilize vf

2024-08-27 Thread Demin Han

Hi Jeff,

Yes, some tests fail after the late-combine pass was introduced.
As I recall, these tests still use the .vv form, which does not become .vf
after late-combine.

I’ll update the testcase based on my local branch after your push.

Regards,
Demin

From: Jeff Law 
Sent: August 26, 2024 5:59
To: Demin Han ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; pan2...@intel.com; 
rdapp@gmail.com
Subject: Re: [PATCH] RISC-V: Fix double mode under RV32 not utilize vf



On Fri, Jul 19, 2024 at 12:07 PM Jeff Law <jeffreya...@gmail.com> wrote:


On 7/19/24 2:55 AM, demin.han wrote:
> Currently, some binops of a vector and a double scalar under RV32 can't
> be translated to the .vf form and are emitted as vfmv + vxx.vv instead.
>
> The cause is that vec_duplicate is also expanded to a broadcast for double
> mode under RV32, and late-combine can't process the expanded broadcast.
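For context, the affected code is a loop that combines a vector with a `double` scalar. The sketch below is an illustrative example and is not one of the patch's test files. The function name is hypothetical. With rv32gcv and vectorization enabled, the fix aims to let the scalar feed a .vf instruction directly instead of first being broadcast with vfmv and then used by a .vv operation. The exact instructions emitted depend on the compiler flags.

```c
#include <stddef.h>

/* Hypothetical example of a vector-by-double-scalar binop loop.  */
void
vadd_double_scalar (double *restrict out, const double *restrict in,
                    double b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = in[i] + b;  /* b is loop-invariant: a candidate for a .vf operand.  */
}
```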
>
> gcc/ChangeLog:
>
>   * config/riscv/vector.md: Add !FLOAT_MODE_P constraint.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c: Fix test
>   * gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv-nofm.c: Ditto
>   * gcc.target/riscv/rvv/autovec/binop/vmul-rv32gcv-nofm.c: Ditto
>   * gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv-nofm.c: Ditto
>   * gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv32gcv.c: Ditto
>   * gcc.target/riscv/rvv/autovec/cond/cond_fadd-1.c: Ditto
>   * gcc.target/riscv/rvv/autovec/cond/cond_fadd-2.c: Ditto
>   * gcc.target/riscv/rvv/autovec/cond/cond_fadd-3.c: Ditto
>   * gcc.target/riscv/rvv/autovec/cond/cond_fadd-4.c: Ditto
>   * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-1.c: Ditto
>   * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-3.c: Ditto
>   * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-4.c: Ditto
>   * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-5.c: Ditto
>   * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-6.c: Ditto
>   * gcc.target/riscv/rvv/autovec/cond/cond_fmax-1.c: Ditto
>   * gcc.target/riscv/rvv/autovec/cond/cond_fmax-2.c: Ditto
>   * gcc.target/riscv/rvv/autovec/cond/cond_fmax-3.c: Ditto
>   * gcc.target/riscv/rvv/autovec/cond/cond_fmax-4.c: Ditto
>   * gcc.target/riscv/rvv/autovec/cond/cond_fmin-1.c: Ditto
>   * gcc.target/riscv/rvv/autovec/cond/cond_fmin-2.c: Ditto
>   * gcc.target/riscv/rvv/autovec/cond/cond_fmin-3.c: Ditto
>   * gcc.target/riscv/rvv/autovec/cond/cond_fmin-4.c: Ditto
>   * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-1.c: Ditto
>   * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-3.c: Ditto
>   * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-4.c: Ditto
>   * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-5.c: Ditto
>   * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-6.c: Ditto
>   * gcc.target/riscv/rvv/autovec/cond/cond_fmul-1.c: Ditto
>   * gcc.target/riscv/rvv/autovec/cond/cond_fmul-2.c: Ditto
>   * gcc.target/riscv/rvv/autovec/cond/cond_fmul-3.c: Ditto
>   * gcc.target/riscv/rvv/autovec/cond/cond_fmul-4.c: Ditto
>   * gcc.target/riscv/rvv/autovec/cond/cond_fmul-5.c: Ditto
It looks like vadd-rv32gcv-nofm still isn't quite right according to the
pre-commit testing:

https://github.com/ewlu/gcc-precommit-ci/issues/1931#issuecomment-2238752679


OK once that's fixed.  No need to wait for another review cycle.
There's a reasonable chance late-combine was catching more cases that could be 
turned into .vf forms.  That was pretty common when I first looked at the 
late-combine changes.

Regardless,  I adjusted the vadd/vsub tests and pushed this to the trunk.

Thanks,
jeff

Re: PING ^1 [PATCH] GCC Driver : Enable very long gcc command-line option

2024-08-27 Thread Richard Biener

On Tue, 27 Aug 2024, Dora, Sunil Kumar wrote:

> Dear GCC Team,
> 
> Please consider this as a gentle reminder to review the patch I posted at the 
> following link: [ 
> https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660223.html ].
> 
> BUG Link : [ https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111527 ]
> 
> Your feedback or approval would be greatly appreciated.
> 
> Best Regards,
> Sunil Dora
> 
> On 2024-08-13 2:30 a.m., deepthi.hem...@windriver.com wrote:
> 
> From: sunil dora 
> 
> For excessively long environment variables (i.e. >128KB),
> store the arguments in a temporary file and collect them back together in
> collect2.
> 
> This commit fixes the COLLECT_GCC_OPTIONS issue:
> GCC should not limit the length of command line passed to collect2.
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111527
> 
> The Linux kernel has the following limits on shell commands:
> I.  Total number of bytes used to specify arguments must be under 128KB.
> II. Each environment variable passed to an executable must be under 128 KiB
> 
> In order to circumvent these limitations, many build tools support
> response-files, i.e. files that contain the arguments for the executed
> command. These are typically passed using @ syntax.
> 
> GCC uses the COLLECT_GCC_OPTIONS environment variable to transfer the
> expanded command line to collect2. With many options, this exceeds
> limit II.
> 
> GCC : Added Testcase for PR111527
> 
> TC1 : If the command-line arguments are less than 128KB, gcc should use
>   COLLECT_GCC_OPTIONS to communicate and compile fine.
> TC2 : If the command-line arguments are in the range of 128KB to 2MB,
>   gcc should copy the arguments into a file and use FILE_GCC_OPTIONS
>   to communicate and compile fine.
> TC3 : If the command-line arguments are greater than 2MB, gcc should
>   fail the compile and report an error. (Expected FAIL)

Just as a random comment - I'd prefer if COLLECT_GCC_OPTIONS would
allow response files instead of indirecting via a new fixed other
environment like the proposed FILE_GCC_OPTIONS.

That looks like a cleaner interface to me.

Richard.
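Richard's suggestion is that COLLECT_GCC_OPTIONS could carry a response-file reference itself, rather than relying on a second variable. The sketch below shows the reading side and assumes a hypothetical convention in which a value beginning with '@' names a file that contains the real options. GCC already accepts @file arguments on its command line, and libiberty's `expandargv` expands those. The helper name is made up, and this is not the posted patch's `getenv_extended`.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: VALUE is the raw environment value.  If it starts
   with '@', the rest names a response file whose contents are returned;
   otherwise a copy of VALUE is returned.  Result is malloc'ed; NULL on
   error.  */
static char *
expand_response_value (const char *value)
{
  if (value == NULL)
    return NULL;

  if (value[0] != '@')
    {
      size_t len = strlen (value);
      char *copy = malloc (len + 1);
      if (copy != NULL)
        memcpy (copy, value, len + 1);
      return copy;
    }

  FILE *f = fopen (value + 1, "rb");
  if (f == NULL)
    return NULL;

  /* Determine the file size, then read the whole file.  */
  long size = -1;
  if (fseek (f, 0, SEEK_END) == 0)
    size = ftell (f);
  if (size < 0 || fseek (f, 0, SEEK_SET) != 0)
    {
      fclose (f);
      return NULL;
    }

  char *buf = malloc ((size_t) size + 1);
  if (buf == NULL)
    {
      fclose (f);
      return NULL;
    }
  size_t got = fread (buf, 1, (size_t) size, f);
  fclose (f);
  buf[got] = '\0';
  return buf;
}
```

A caller would pass `getenv ("COLLECT_GCC_OPTIONS")` straight in. Short option lists would then keep working unchanged, and only oversized ones would need a file.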

> Signed-off-by: Topi Kuutela 
> 
> Signed-off-by: sunil dora 
> 
> 
> ---
>  gcc/collect2.cc                           | 40 +++--
>  gcc/gcc.cc                                | 37 +--
>  gcc/testsuite/gcc.dg/longcmd/longcmd.exp  | 16 +
>  gcc/testsuite/gcc.dg/longcmd/pr111527-1.c | 44 +++
>  gcc/testsuite/gcc.dg/longcmd/pr111527-2.c |  9 +
>  gcc/testsuite/gcc.dg/longcmd/pr111527-3.c | 10 ++
>  gcc/testsuite/gcc.dg/longcmd/pr111527-4.c | 10 ++
>  7 files changed, 160 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/longcmd/longcmd.exp
>  create mode 100644 gcc/testsuite/gcc.dg/longcmd/pr111527-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/longcmd/pr111527-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/longcmd/pr111527-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/longcmd/pr111527-4.c
> 
> diff --git a/gcc/collect2.cc b/gcc/collect2.cc
> index 902014a9cc1..564d8968648 100644
> --- a/gcc/collect2.cc
> +++ b/gcc/collect2.cc
> @@ -376,6 +376,40 @@ typedef int scanfilter;
> 
>  static void scan_prog_file (const char *, scanpass, scanfilter);
> 
> +char* getenv_extended (const char* var_name)
> +{
> +  int file_size;
> +  char* buf = NULL;
> +
> +  char* string = getenv (var_name);
> +  if (!string)
> +{
> +  char* string = getenv ("FILE_GCC_OPTIONS");
> +  FILE *fptr;
> +  fptr = fopen (string, "r");
> +  if (fptr == NULL)
> +   return (0);
> +  /* Copy contents from temporary file to buffer */
> +  if (fseek (fptr, 0, SEEK_EN

Re: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar .SAT_SUB IMM form 3

2024-08-27 Thread Jeff Law





On 8/27/24 1:17 AM, pan2...@intel.com wrote:

From: Pan Li 

This patch adds test cases for the unsigned scalar
.SAT_SUB IMM form 3, i.e.:

Form 3:
   #define DEF_SAT_U_SUB_IMM_FMT_3(T, IMM)        \
   T __attribute__((noinline))                   \
   sat_u_sub_imm##IMM##_##T##_fmt_3 (T y)        \
   {                                             \
     return (T)IMM > y ? (T)IMM - y : 0;         \
   }

DEF_SAT_U_SUB_IMM_FMT_3(uint64_t, 23)
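The Form 3 macro above can be exercised directly. This minimal sketch instantiates it for the `uint64_t`/23 case from the commit message and for an added `uint8_t`/200 case, then checks that the result saturates at 0 once `y` reaches the immediate. It is not one of the listed test files.

```c
#include <stdint.h>

#define DEF_SAT_U_SUB_IMM_FMT_3(T, IMM)   \
  T __attribute__((noinline))             \
  sat_u_sub_imm##IMM##_##T##_fmt_3 (T y)  \
  {                                       \
    return (T)IMM > y ? (T)IMM - y : 0;   \
  }

/* Defines sat_u_sub_imm23_uint64_t_fmt_3.  */
DEF_SAT_U_SUB_IMM_FMT_3(uint64_t, 23)
/* Added for illustration: defines sat_u_sub_imm200_uint8_t_fmt_3.  */
DEF_SAT_U_SUB_IMM_FMT_3(uint8_t, 200)
```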

The following test passes with this patch:
* The rv64gcv regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_u_sub_imm-10.c: New test.
* gcc.target/riscv/sat_u_sub_imm-10_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-10_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-11.c: New test.
* gcc.target/riscv/sat_u_sub_imm-11_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-11_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-12.c: New test.
* gcc.target/riscv/sat_u_sub_imm-9.c: New test.
* gcc.target/riscv/sat_u_sub_imm-9_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-9_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-10.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-11.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-12.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-9.c: New test.

Both patches in this series are OK.
jeff

Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]

2024-08-27 Thread Qing Zhao



> On Aug 26, 2024, at 15:46, Bill Wendling  wrote:
> 
> On Wed, Aug 21, 2024 at 8:43 AM Martin Uecker  wrote:
>> 
>> On Wednesday, 21.08.2024 at 15:24 +, Qing Zhao wrote:
 
 But if we changed it to return a void pointer,  we could make this
 a compile-time check:
 
 auto ret = __builtin_get_counted_by(__p->FAM);
 
 _Generic(ret, void*: (void)0, default: *ret = COUNT);
>>> 
>>> Is there any benefit to return a void pointer than a SIZE_T pointer for
>>> the NULL pointer?
>> 
>> Yes! You can test with _Generic (or __builtin_types_compatible_p)
>> at compile-time based on the type whether you can set *ret to COUNT
>> or not as in the example above.
>> 
>> So it is not a weird run-time test which needs to be optimized
>> away.
>> 
> Using a '_Generic' moves so much of the work onto the programmer that
> it would be far easier, and cleaner, for them simply to specify the
> 'counter' field in the macro and be done with it. Something like:
> 
>  #define alloc(PTR, COUNT, FAM, COUNTER)
> 
> If the FAM doesn't have a 'counted_by' field:
> 
>  #define alloc(PTR, COUNT, FAM)
> 
> (It would use VAR_ARGS of course). Why not simply have the compiler
> automatically adjust the return type?

What do you mean by “have the compiler automatically adjust the return type”?
In my current GCC implementation, if there is a counted_by object associated 
with the flexible array member, the returned pointer points to the 
counted_by object (which keeps its own original type), so the compiler does not 
need to _adjust_ the returned type. 

So, what do you mean by _adjust_?

Qing


> It's perfectly capable of Doing
> the Right Thing(tm). Otherwise, this builtin becomes even less
> desirable to use than it currently is.
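This thread's `_Generic` idea rests on one observation: a `void *` result can be told apart from a real counter pointer at compile time. The sketch below does not call `__builtin_get_counted_by`, whose interface is still under discussion here. It only shows the dispatch on plain pointers. GCC still type-checks the associations that are not selected, so this variant routes the `void *` case to a dummy object rather than writing through `*ret` in that branch. `SET_COUNT`, `counted_by_sink` and `struct with_counter` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical dummy target for structs without a counted_by field;
   stores routed here are effectively discarded.  */
static size_t counted_by_sink;

/* If RET has type void *, store into the sink; otherwise store through
   RET.  The void * association never dereferences RET itself.  */
#define SET_COUNT(ret, count) \
  (*_Generic ((ret), void *: &counted_by_sink, default: (ret)) = (count))

struct with_counter
{
  int count;
  char data[];  /* Flexible array member counted by COUNT.  */
};
```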

Re: [committed] libstdc++: Make std::vector::reference constructor private [PR115098]

2024-08-27 Thread Jonathan Wakely

On Mon, 26 Aug 2024 at 00:08, Andrew Pinski  wrote:
>
> On Fri, Aug 23, 2024 at 5:20 AM Jonathan Wakely  wrote:
> >
> > Tested x86_64-linux. Pushed to trunk.
> >
> > -- >8 --
> >
> > The standard says this constructor should be private.  LWG 4141 proposes
> > to remove it entirely. We still need it, but it doesn't need to be
> > public.
> >
> > For std::bitset the default constructor is already private (and never
> > even defined) but there's a non-standard constructor that's public, but
> > doesn't need to be.
>
> This looks like broke the pretty-printers testcase:
> ```
> /home/apinski/src/upstream-gcc-isel/gcc/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc:
> In function 'int main()':
> /home/apinski/src/upstream-gcc-isel/gcc/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc:156:
> error: 'std::_Bit_reference::_Bit_reference()' is private within this
> context
> In file included from
> /home/apinski/src/upstream-gcc-isel/gcc/objdir/x86_64-pc-linux-gnu/libstdc++-v3/include/vector:67,
>  from
> /home/apinski/src/upstream-gcc-isel/gcc/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc:31:
> /home/apinski/src/upstream-gcc-isel/gcc/objdir/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/stl_bvector.h:90:
> note: declared private here
> compiler exited with status 1
>
> ...
> spawn -ignore SIGHUP
> /home/apinski/src/upstream-gcc-isel/gcc/objdir/./gcc/xg++
> -shared-libgcc -B/home/apinski/src/upstream-gcc-isel/gcc/objdir/./gcc
> -nostdinc++ 
> -L/home/apinski/src/upstream-gcc-isel/gcc/objdir/x86_64-pc-linux-gnu/libstdc++-v3/src
> -L/home/apinski/src/upstream-gcc-isel/gcc/objdir/x86_64-pc-linux-gnu/libstdc++-v3/src/.libs
> -L/home/apinski/src/upstream-gcc-isel/gcc/objdir/x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs
> -B/home/apinski/upstream-gcc-isel/x86_64-pc-linux-gnu/bin/
> -B/home/apinski/upstream-gcc-isel/x86_64-pc-linux-gnu/lib/ -isystem
> /home/apinski/upstream-gcc-isel/x86_64-pc-linux-gnu/include -isystem
> /home/apinski/upstream-gcc-isel/x86_64-pc-linux-gnu/sys-include
> -fchecking=1 
> -B/home/apinski/src/upstream-gcc-isel/gcc/objdir/x86_64-pc-linux-gnu/./libstdc++-v3/src/.libs
> -fmessage-length=0 -fno-show-column -ffunction-sections
> -fdata-sections -fcf-protection -mshstk -g -O2 -D_GNU_SOURCE
> -DLOCALEDIR="." -nostdinc++
> -I/home/apinski/src/upstream-gcc-isel/gcc/objdir/x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu
> -I/home/apinski/src/upstream-gcc-isel/gcc/objdir/x86_64-pc-linux-gnu/libstdc++-v3/include
> -I/home/apinski/src/upstream-gcc-isel/gcc/libstdc++-v3/libsupc++
> -I/home/apinski/src/upstream-gcc-isel/gcc/libstdc++-v3/include/backward
> -I/home/apinski/src/upstream-gcc-isel/gcc/libstdc++-v3/testsuite/util
> /home/apinski/src/upstream-gcc-isel/gcc/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple11.cc
> -g -O0 -fdiagnostics-plain-output ./libtestc++.a -Wl,--gc-sections
> -L/home/apinski/src/upstream-gcc-isel/gcc/objdir/x86_64-pc-linux-gnu/libstdc++-v3/src/filesystem/.libs
> -L/home/apinski/src/upstream-gcc-isel/gcc/objdir/x86_64-pc-linux-gnu/libstdc++-v3/src/experimental/.libs
> -lm -o ./simple11.exe
> /home/apinski/src/upstream-gcc-isel/gcc/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple11.cc:
> In function 'int main()':
> /home/apinski/src/upstream-gcc-isel/gcc/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple11.cc:149:
> error: 'std::_Bit_reference::_Bit_reference()' is private within this
> context
> In file included from
> /home/apinski/src/upstream-gcc-isel/gcc/objdir/x86_64-pc-linux-gnu/libstdc++-v3/include/vector:67,
>  from
> /home/apinski/src/upstream-gcc-isel/gcc/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple11.cc:31:
> /home/apinski/src/upstream-gcc-isel/gcc/objdir/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/stl_bvector.h:90:
> note: declared private here
> compiler exited with status 1
> ```
>
> Noticed because of the new UNRESOLVED .

Oops, thanks for noticing. I didn't see it because it didn't add a FAIL.

I'll push a fix to the tests.

[committed] libstdc++: Do not use std::vector::reference default ctor [PR115098]

2024-08-27 Thread Jonathan Wakely

This default constructor was made private by r15-3124-gb25b101bc38000 so
the pretty printer tests need a fix to stop using it. There's no
conforming way to get a default-constructed 'reference' now, e.g. trying
to access an element of a default-constructed std::vector will
trigger an assertion. Remove the tests, but leave a comment in the
printer code about handling it.

libstdc++-v3/ChangeLog:

PR libstdc++/115098
* python/libstdcxx/v6/printers.py (StdBitReferencePrinter): Add
comment.
* testsuite/libstdc++-prettyprinters/simple.cc: Do not default
construct std::vector::reference.
* testsuite/libstdc++-prettyprinters/simple11.cc: Likewise.
---
 libstdc++-v3/python/libstdcxx/v6/printers.py| 3 +++
 libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc   | 3 ---
 libstdc++-v3/testsuite/libstdc++-prettyprinters/simple11.cc | 3 ---
 3 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py 
b/libstdc++-v3/python/libstdcxx/v6/printers.py
index a6c2ed4599f..92104937862 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -613,6 +613,9 @@ class StdBitReferencePrinter(printer_base):
 
 def to_string(self):
 if not self._val['_M_p']:
+# PR libstdc++/115098 removed the reference default constructor
+# that this case relates to. New code should never need this,
+# but we still handle it for compatibility with old binaries.
 return 'invalid std::vector::reference'
 return bool(self._val['_M_p'].dereference() & (self._val['_M_mask']))
 
diff --git a/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc 
b/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc
index 7bdc6548f72..c6d18d3fe03 100644
--- a/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc
+++ b/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc
@@ -153,9 +153,6 @@ main()
   std::vector::reference br5 = *vbIt5;
 // { dg-final { note-test br5 {true} } }
 
- std::vector::reference br0;
-// { dg-final { note-test br0 {invalid std::vector::reference} } }
-
   __gnu_cxx::slist sll;
   sll.push_front(23);
   sll.push_front(47);
diff --git a/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple11.cc 
b/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple11.cc
index 3658e3ef4eb..7fd0c4d76b2 100644
--- a/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple11.cc
+++ b/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple11.cc
@@ -146,9 +146,6 @@ main()
   std::vector::reference br5 = *vbIt5;
 // { dg-final { note-test br5 {true} } }
 
- std::vector::reference br0;
-// { dg-final { note-test br0 {invalid std::vector::reference} } }
-
   __gnu_cxx::slist sll;
   sll.push_front(23);
   sll.push_front(47);
-- 
2.46.0

Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]

2024-08-27 Thread Qing Zhao



> On Aug 26, 2024, at 16:30, Kees Cook  wrote:
> 
> On Mon, Aug 26, 2024 at 07:30:15PM +, Qing Zhao wrote:
>> Hi, Martin,
>> 
>> Looks like that there is some issue when I tried to use the _Generic for the 
>> testing cases, and then I narrowed down to a
>> small testing case that shows the problem without any change to GCC.
>> 
>> [opc@qinzhao-ol8u3-x86 gcc]$ cat t1.c
>> struct annotated {
>>  char b;
>>  int c[];
>> } *array_annotated;  
>> extern void * counted_by_ref (int *);
>> 
>> int main(int argc, char *argv[])
>> {
>>  typeof(counted_by_ref (array_annotated->c)) ret
>>= counted_by_ref (array_annotated->c); 
>>   _Generic (ret, void* : (void)0, default: *ret = 10);
>> 
>>  return 0;
>> }
>> [opc@qinzhao-ol8u3-x86 gcc]$ /home/opc/Install/latest/bin/gcc t1.c
>> t1.c: In function ‘main’:
>> t1.c:12:44: warning: dereferencing ‘void *’ pointer
>>   12 |   _Generic (ret, void* : (void)0, default: *ret = 10);
>>  |^~~~
>> t1.c:12:49: error: invalid use of void expression
>>   12 |   _Generic (ret, void* : (void)0, default: *ret = 10);
>>  | ^
> 
> I implemented it like this[1] in the Linux kernel. So yours could be:
> 
> struct annotated {
>  char b;
>  int c[] __attribute__((counted_by(b));
> };
> extern struct annotated *array_annotated;
> 
> int main(int argc, char *argv[])
> {
>  typeof(_Generic(__builtin_get_counted_by(array_annotated->c),
>void *: (size_t *)NULL,
>default: __builtin_get_counted_by(array_annotated->c)))
> ret = __builtin_get_counted_by(array_annotated->c);
>  if (ret)
> *ret = 10;
> 
>  return 0;
> }
> 
> It's a bit cumbersome, but it does what's needed.
> 
> This is, however, just doing exactly what Bill has suggested: it is
> converting the (void *)NULL into (size_t *)NULL when there is no
> counted_by annotation...

That’s the reason why, in the first version of the patch, I returned a
(size_t *) instead of a (void *) NULL pointer when there is no counted_by
annotation: then the conversion from (void *)NULL to (size_t *)NULL at the
source-code level is not needed.

Then the above can be simplified as:

typeof (__builtin_get_counted_by(array_annotated->c)) ret = 
__builtin_get_counted_by(array_annotated->c);

if (ret) *ret = 10;

Should I still return a (size_t *) rather than a (void *) for the NULL
pointer?
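A minimal, self-contained C sketch of the _Generic pattern Kees describes. Because __builtin_get_counted_by is only a proposed builtin in this thread, the macros HYPO_COUNTER_ANNOTATED and HYPO_COUNTER_NONE below are hypothetical stand-ins. One yields a pointer to the counter. The other yields (void *)NULL, which mimics the no-annotation case. The _Generic association maps a void * result to size_t *, so the store through ret type-checks in both cases.

```c
#include <stddef.h>

struct annotated {
  size_t count;   /* counter member; a real test would add counted_by */
  int c[];
};

/* HYPOTHETICAL stand-ins for __builtin_get_counted_by(p->c): the first
   returns a pointer to the counter, the second mimics the
   "no counted_by annotation" case by returning (void *)NULL.  */
#define HYPO_COUNTER_ANNOTATED(p) (&(p)->count)
#define HYPO_COUNTER_NONE(p)      ((void *)NULL)

/* Declare RET with the counter's pointer type, mapping void * to
   size_t * so that "*ret = v" is valid even when EXPR is void *.  */
#define COUNTER_PTR_DECL(ret, expr)                      \
  __typeof__(_Generic((expr), void *: (size_t *)NULL,    \
                      default: (expr))) ret = (expr)

/* Store V through the counter pointer if there is one; return 1 if a
   store happened, 0 otherwise.  */
static int set_counter_annotated(struct annotated *p, size_t v)
{
  COUNTER_PTR_DECL(ret, HYPO_COUNTER_ANNOTATED(p));
  if (ret)
    {
      *ret = v;
      return 1;
    }
  return 0;
}

static int set_counter_none(struct annotated *p, size_t v)
{
  COUNTER_PTR_DECL(ret, HYPO_COUNTER_NONE(p));
  (void)p;        /* the stand-in macro ignores its argument */
  if (ret)
    {
      *ret = v;   /* type-checks: ret is size_t *, not void * */
      return 1;
    }
  return 0;
}
```

Every association of a _Generic must be a valid expression even when it is not selected. That is why the original t1.c, with *ret = 10 in the default branch, fails to compile when ret is void *.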

Qing


> 
> -Kees
> 
> [1] 
> https://lore.kernel.org/linux-hardening/20240822231324.make.666-k...@kernel.org/
> 
> -- 
> Kees Cook

Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]

2024-08-27 Thread Qing Zhao



> On Aug 26, 2024, at 17:01, Martin Uecker  wrote:
> 
> Am Montag, dem 26.08.2024 um 13:30 -0700 schrieb Kees Cook:
>> On Mon, Aug 26, 2024 at 07:30:15PM +, Qing Zhao wrote:
>>> Hi, Martin,
>>> 
>>> Looks like that there is some issue when I tried to use the _Generic for 
>>> the testing cases, and then I narrowed down to a
>>> small testing case that shows the problem without any change to GCC.
>>> 
>>> [opc@qinzhao-ol8u3-x86 gcc]$ cat t1.c
>>> struct annotated {
>>>  char b;
>>>  int c[];
>>> } *array_annotated;  
>>> extern void * counted_by_ref (int *);
>>> 
>>> int main(int argc, char *argv[])
>>> {
>>>  typeof(counted_by_ref (array_annotated->c)) ret
>>>= counted_by_ref (array_annotated->c); 
>>>   _Generic (ret, void* : (void)0, default: *ret = 10);
>>> 
>>>  return 0;
>>> }
>>> [opc@qinzhao-ol8u3-x86 gcc]$ /home/opc/Install/latest/bin/gcc t1.c
>>> t1.c: In function ‘main’:
>>> t1.c:12:44: warning: dereferencing ‘void *’ pointer
>>>   12 |   _Generic (ret, void* : (void)0, default: *ret = 10);
>>>  |^~~~
>>> t1.c:12:49: error: invalid use of void expression
>>>   12 |   _Generic (ret, void* : (void)0, default: *ret = 10);
>>>  | ^
>> 
>> I implemented it like this[1] in the Linux kernel. So yours could be:
>> 
>> struct annotated {
>>  char b;
>>  int c[] __attribute__((counted_by(b));
>> };
>> extern struct annotated *array_annotated;
>> 
>> int main(int argc, char *argv[])
>> {
>>  typeof(_Generic(__builtin_get_counted_by(array_annotated->c),
>>void *: (size_t *)NULL,
>>default: __builtin_get_counted_by(array_annotated->c)))
>> ret = __builtin_get_counted_by(array_annotated->c);
>>  if (ret)
>> *ret = 10;
>> 
>>  return 0;
>> }
>> 
>> It's a bit cumbersome, but it does what's needed.
>> 
>> This is, however, just doing exactly what Bill has suggested: it is
>> converting the (void *)NULL into (size_t *)NULL when there is no
>> counted_by annotation...
>> 
>> -Kees
>> 
>> [1] 
>> https://lore.kernel.org/linux-hardening/20240822231324.make.666-k...@kernel.org/
> 
> Interesting. Will __builtin_get_counted_by(array_annotated->c) give
> a null pointer (or an invalid pointer) of the correct type if 
> array_annotated is a null pointer of an annotated struct type?

I'm a little confused here:
1. Can array_annotated->c be passed to __builtin_get_counted_by as
__builtin_get_counted_by(array_annotated->c) when array_annotated is NULL?
2. If yes, shouldn't this cause a run-time error?

Qing
> 
> I also wonder a bit about the multiple macro evaluations of the arguments
> for P and SIZE.
> 
> Martin
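Qing's question depends on whether the argument is evaluated. The thread is still discussing the builtin's own semantics, so this is only a comparison point. Standard C already guarantees that the operand of sizeof is not evaluated for non-VLA types. GCC documents the same behaviour for typeof on types that are not variably modified. Kees's macro relies on this by starting from a NULL __obj_ptr. A minimal sketch, using a hypothetical struct flex:

```c
#include <stddef.h>

struct flex {
  unsigned short n;
  int fam[];
};

/* The operand of sizeof is not evaluated (no VLA involved), so no
   null dereference happens at run time even though obj is NULL.  */
static size_t elem_size_via_null(void)
{
  struct flex *obj = NULL;
  return sizeof obj->fam[0];
}

/* Likewise, only the type of obj->n is taken by __typeof__.  */
static size_t counter_size_via_null(void)
{
  struct flex *obj = NULL;
  __typeof__(obj->n) tmp = 0;
  return sizeof tmp;
}
```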

[PATCH v3] Extend check-function-bodies to allow label and directives

2024-08-27 Thread H.J. Lu

As PR target/116174 shows, we may need to verify labels and the order of
directives.  Extend check-function-bodies to support matched output lines
so that labels and directives can be checked.
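The filtering order that the parse_function_bodies change introduces can be modelled outside Tcl. This is a hypothetical C re-implementation using POSIX <regex.h>, not the DejaGnu code. A line is kept if it matches the optional matched regexp. Otherwise it is dropped if it matches the target's fluff regexp. HYPO_FLUFF_RE is an assumption chosen for illustration, because the real pattern comes from configure_check-function-bodies in scanasm.exp. Note that POSIX EREs have no \t escape, so the C string below embeds a literal tab.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* The "matched" pattern from the pr116174.c test: optional tab, then '.'.  */
#define DIRECTIVE_RE  "^\t?\\."
/* HYPOTHETICAL fluff pattern: any line whose first non-blank char is '.'.  */
#define HYPO_FLUFF_RE "^[[:space:]]*\\."

/* Return true if LINE should be kept in the function body.  MATCHED
   may be NULL or empty (argument not given).  */
static bool keep_line(const char *line, const char *matched, const char *fluff)
{
  regex_t re;
  bool hit;

  if (matched != NULL && matched[0] != '\0')
    {
      if (regcomp(&re, matched, REG_EXTENDED | REG_NOSUB) != 0)
        return false;
      hit = regexec(&re, line, 0, NULL, 0) == 0;
      regfree(&re);
      if (hit)
        return true;          /* matched lines take priority over fluff */
    }

  if (regcomp(&re, fluff, REG_EXTENDED | REG_NOSUB) != 0)
    return true;
  hit = regexec(&re, line, 0, NULL, 0) == 0;
  regfree(&re);
  return !hit;                /* drop fluff, keep everything else */
}
```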

gcc/

* doc/sourcebuild.texi (check-function-bodies): Add an optional
argument for matched output lines.

gcc/testsuite/

* gcc.target/i386/pr116174.c: Use check-function-bodies.
* lib/scanasm.exp (parse_function_bodies): Append the line if
$up_config(matched) matches the line.
(check-function-bodies): Add an argument for matched.  Set
up_config(matched) to $matched.  Append the expected line without
$config(line_prefix) to function_regexp if it starts with ".L".

Signed-off-by: H.J. Lu 
---
 gcc/doc/sourcebuild.texi |  9 ++---
 gcc/testsuite/gcc.target/i386/pr116174.c | 18 +++---
 gcc/testsuite/lib/scanasm.exp| 15 +--
 3 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 1a31f00fb65..3c55f103795 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -3530,7 +3530,7 @@ assembly output.
 Passes if @var{symbol} is not defined as a hidden symbol in the test's
 assembly output.
 
-@item check-function-bodies @var{prefix} @var{terminator} [@var{options} [@{ 
target/xfail @var{selector} @}]]
+@item check-function-bodies @var{prefix} @var{terminator} [@var{options} [@{ 
target/xfail @var{selector} @} [@var{matched}]]]
 Looks through the source file for comments that give the expected assembly
 output for selected functions.  Each line of expected output starts with the
 prefix string @var{prefix} and the expected output for a function as a whole
@@ -3557,8 +3557,11 @@ Depending on the configuration (see
 @code{configure_check-function-bodies} in
 @file{gcc/testsuite/lib/scanasm.exp}), the test may discard from the
 compiler's assembly output directives such as @code{.cfi_startproc},
-local label definitions such as @code{.LFB0}, and more.
-It then matches the result against the expected
+local label definitions such as @code{.LFB0}, and more.  This behavior
+can be overridden using the optional @var{matched} argument, which
+specifies a regexp for lines that should not be discarded in this way.
+
+The test then matches the result against the expected
 output for a function as a single regular expression.  This means that
 later lines can use backslashes to refer back to @samp{(@dots{})}
 captures on earlier lines.  For example:
diff --git a/gcc/testsuite/gcc.target/i386/pr116174.c 
b/gcc/testsuite/gcc.target/i386/pr116174.c
index 8877d0b51af..686aeb9ff31 100644
--- a/gcc/testsuite/gcc.target/i386/pr116174.c
+++ b/gcc/testsuite/gcc.target/i386/pr116174.c
@@ -1,6 +1,20 @@
 /* { dg-do compile { target *-*-linux* } } */
-/* { dg-options "-O2 -fcf-protection=branch" } */
+/* { dg-options "-O2 -g0 -fcf-protection=branch" } */
+/* Keep labels and directives ('.p2align', '.cfi_startproc').
+/* { dg-final { check-function-bodies "**" "" "" { target "*-*-*" } {^\t?\.}  
} } */
 
+/*
+**foo:
+**.LFB0:
+** .cfi_startproc
+** (
+** endbr64
+** .p2align 5
+** |
+** endbr32
+** )
+**...
+*/
 char *
 foo (char *dest, const char *src)
 {
@@ -8,5 +22,3 @@ foo (char *dest, const char *src)
 /* nothing */;
   return --dest;
 }
-
-/* { dg-final { scan-assembler "\t\.cfi_startproc\n\tendbr(32|64)\n" } } */
diff --git a/gcc/testsuite/lib/scanasm.exp b/gcc/testsuite/lib/scanasm.exp
index 42c719c512c..dd5ebb0b18c 100644
--- a/gcc/testsuite/lib/scanasm.exp
+++ b/gcc/testsuite/lib/scanasm.exp
@@ -952,6 +952,9 @@ proc parse_function_bodies { config filename result } {
verbose "parse_function_bodies: $function_name:\n$function_body"
set up_result($function_name) $function_body
set in_function 0
+   } elseif { $up_config(matched) ne "" \
+  && [regexp $up_config(matched) $line] } {
+   append function_body $line "\n"
} elseif { [regexp $up_config(fluff) $line] } {
verbose "parse_function_bodies: $function_name: ignoring fluff 
line: $line"
} else {
@@ -982,7 +985,7 @@ proc check_function_body { functions name body_regexp } {
 
 # Check the implementations of functions against expected output.  Used as:
 #
-# { dg-do { check-function-bodies PREFIX TERMINATOR[ OPTION[ SELECTOR]] } }
+# { dg-do { check-function-bodies PREFIX TERMINATOR[ OPTION[ SELECTOR 
[MATCHED]]] } }
 #
 # See sourcebuild.texi for details.
 
@@ -990,7 +993,7 @@ proc check-function-bodies { args } {
 if { [llength $args] < 2 } {
error "too few arguments to check-function-bodies"
 }
-if { [llength $args] > 4 } {
+if { [llength $args] > 5 } {
error "too many arguments to check-function-bodies"
 }
 
@@ -1029,6 +1032,11 @@ proc check-function-bodies { args } {
}
 }
 
+set matched ""
+if { [llength $args] >=

Re: LRA: Fix setup_sp_offset

2024-08-27 Thread Michael Matz

Hello,

On Mon, 26 Aug 2024, Paul Koning wrote:

> >>> Yeah, I wondered as well.  For things to go wrong some instructions that 
> >>> contain pre/post-inc/dec of the stack pointer need to have reloads in 
> >>> such 
> >>> a way that the actual SP-change sideeffect moves to a different 
> >>> instruction.  
> >> 
> >> I think I've seen that in the past on PDP11, and reported it, but I 
> >> thought that particular issue was fixed not too long after.
> > 
> > Do you have a reference handy?  I'd like to take a look, if for nothing 
> > else than curiosity ;-)
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87944 
>  which says it was fixed 
> in GCC 14 on 5/30/2023.

Ah, yes, thanks.  The referenced patch there changed stuff at the caller 
of setup_sp_offset for the before sequence only and left the after 
sequence alone.  I think it worked for your case only because it was a 
single reload and it was in front of the insn.

Ciao,
Michael.

Re: [PATCH v2] Extend check-function-bodies to allow label and directives

2024-08-27 Thread H.J. Lu

On Tue, Aug 27, 2024 at 2:18 AM Richard Sandiford
 wrote:
>
> "H.J. Lu"  writes:
> > As PR target/116174 shown, we may need to verify labels and the directive
> > order.  Extend check-function-bodies to support matched output lines to
> > allow label and directives.
> >
> > gcc/
> >
> >   * doc/sourcebuild.texi (check-function-bodies): Add an optional
> >   argument for matched output lines.
> >
> > gcc/testsuite/
> >
> >   * gcc.target/i386/pr116174.c: Use check-function-bodies.
> >   * lib/scanasm.exp (parse_function_bodies): Append the line if
> >   $up_config(matched) matches the line.
> >   (check-function-bodies): Add an argument for matched.  Set
> >   up_config(matched) to $matched.  Append the expected line without
> >   $config(line_prefix) to function_regexp if it starts with ".L".
> >
> > Signed-off-by: H.J. Lu 
> > ---
> >  gcc/doc/sourcebuild.texi |  6 +-
> >  gcc/testsuite/gcc.target/i386/pr116174.c | 18 +++---
> >  gcc/testsuite/lib/scanasm.exp| 14 --
> >  3 files changed, 32 insertions(+), 6 deletions(-)
> >
> > diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> > index 1a31f00fb65..f7128f445cf 100644
> > --- a/gcc/doc/sourcebuild.texi
> > +++ b/gcc/doc/sourcebuild.texi
> > @@ -3530,7 +3530,7 @@ assembly output.
> >  Passes if @var{symbol} is not defined as a hidden symbol in the test's
> >  assembly output.
> >
> > -@item check-function-bodies @var{prefix} @var{terminator} [@var{options} 
> > [@{ target/xfail @var{selector} @}]]
> > +@item check-function-bodies @var{prefix} @var{terminator} [@var{options} 
> > [@{ target/xfail @var{selector} @} [@var{matched}]]]
> >  Looks through the source file for comments that give the expected assembly
> >  output for selected functions.  Each line of expected output starts with 
> > the
> >  prefix string @var{prefix} and the expected output for a function as a 
> > whole
> > @@ -3544,6 +3544,10 @@ command line.  This can help if a source file is 
> > compiled both with
> >  and without optimization, since it is rarely useful to check the full
> >  function body for unoptimized code.
> >
> > +@var{matched}, if specified, is a regular expression which matches a
> > +line of the function body.  If @var{matched} isn't specified, lines
> > +beginning with labels, directives and comments are ignored.
> > +
>
> How about instead splitting:
>
>   Depending on the configuration (see
>   @code{configure_check-function-bodies} in
>   @file{gcc/testsuite/lib/scanasm.exp}), the test may discard from the
>   compiler's assembly output directives such as @code{.cfi_startproc},
>   local label definitions such as @code{.LFB0}, and more.
>   It then matches the result against the expected
>   output for a function as a single regular expression.  This means that
>   later lines can use backslashes to refer back to @samp{(@dots{})}
>   captures on earlier lines.  For example:
>
> into two paragraphs at "If then", and describing the new behaviour at
> the end of the first paragraph:
>
> 
> Depending on the configuration (see
> @code{configure_check-function-bodies} in
> @file{gcc/testsuite/lib/scanasm.exp}), the test may discard from the
> compiler's assembly output directives such as @code{.cfi_startproc},
> local label definitions such as @code{.LFB0}, and more.  This behavior
> can be overridden using the optional @var{matched} argument, which
> specifies a regexp for lines that should not be discarded in this way.
>
> The test then matches the result against the expected
> output for a function as a single regular expression.  This means that
> later lines can use backslashes to refer back to @samp{(@dots{})}
> captures on earlier lines.  For example:
> 

Fixed in v3.

> >  The first line of the expected output for a function @var{fn} has the form:
> >
> >  @smallexample
> > diff --git a/gcc/testsuite/gcc.target/i386/pr116174.c 
> > b/gcc/testsuite/gcc.target/i386/pr116174.c
> > index 8877d0b51af..91ec3288786 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr116174.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr116174.c
> > @@ -1,6 +1,20 @@
> >  /* { dg-do compile { target *-*-linux* } } */
> > -/* { dg-options "-O2 -fcf-protection=branch" } */
> > +/* { dg-options "-O2 -g0 -fcf-protection=branch" } */
> > +/* Keep labels and directives ('.p2align', '.cfi_startproc').
> > +/* { dg-final { check-function-bodies "**" "" "" { target "*-*-*" } {.*} } 
> > } */
>
> This works, but matches everything.  Maybe {^\t?\.} would be more precise.

Fixed in v3.

> The current version is fine too though, if you think it will work for all
> assembly dialects.
>
> >
> > +/*
> > +**foo:
> > +**.LFB0:
> > +**   .cfi_startproc
> > +** (
> > +**   endbr64
> > +**   .p2align 5
> > +** |
> > +**   endbr32
> > +** )
> > +**...
> > +*/
> >  char *

Re: [PATCH v2] Extend check-function-bodies to allow label and directives

2024-08-27 Thread Richard Sandiford

"H.J. Lu"  writes:
>> >   append function_regexp ")"
>> >   } elseif { [string equal $line "..."] } {
>> >   append function_regexp ".*"
>> > + } elseif { [regexp "^.L.*" $line] } {
>>
>> {^\.L} would be more precise than "^.L.*".
>
> I tried  {^\.L}.  It didn't work.  I used "^.L" in v3.

Why didn't it work though?  "^.L.*" matches "ALL" as well as ".L".

Richard
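Richard's point can be checked directly with POSIX extended regexps. In "^.L.*" the unescaped dot matches any character, whereas "^\.L" requires a literal dot. A minimal sketch; regex_matches is a helper defined here purely for illustration:

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if the POSIX extended regexp PATTERN matches TEXT.  */
static bool regex_matches(const char *pattern, const char *text)
{
  regex_t re;
  bool hit;

  if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  hit = regexec(&re, text, 0, NULL, 0) == 0;
  regfree(&re);
  return hit;
}
```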

Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]

2024-08-27 Thread Qing Zhao



> On Aug 27, 2024, at 02:17, Martin Uecker  wrote:
> 
> Am Montag, dem 26.08.2024 um 17:21 -0700 schrieb Kees Cook:
>> On Mon, Aug 26, 2024 at 11:01:08PM +0200, Martin Uecker wrote:
>>> Am Montag, dem 26.08.2024 um 13:30 -0700 schrieb Kees Cook:
 On Mon, Aug 26, 2024 at 07:30:15PM +, Qing Zhao wrote:
> Hi, Martin,
> 
> Looks like that there is some issue when I tried to use the _Generic for 
> the testing cases, and then I narrowed down to a
> small testing case that shows the problem without any change to GCC.
> 
> [opc@qinzhao-ol8u3-x86 gcc]$ cat t1.c
> struct annotated {
>  char b;
>  int c[];
> } *array_annotated;  
> extern void * counted_by_ref (int *);
> 
> int main(int argc, char *argv[])
> {
>  typeof(counted_by_ref (array_annotated->c)) ret
>= counted_by_ref (array_annotated->c); 
>   _Generic (ret, void* : (void)0, default: *ret = 10);
> 
>  return 0;
> }
> [opc@qinzhao-ol8u3-x86 gcc]$ /home/opc/Install/latest/bin/gcc t1.c
> t1.c: In function ‘main’:
> t1.c:12:44: warning: dereferencing ‘void *’ pointer
>   12 |   _Generic (ret, void* : (void)0, default: *ret = 10);
>  |^~~~
> t1.c:12:49: error: invalid use of void expression
>   12 |   _Generic (ret, void* : (void)0, default: *ret = 10);
>  | ^
 
 I implemented it like this[1] in the Linux kernel. So yours could be:
 
 struct annotated {
  char b;
  int c[] __attribute__((counted_by(b));
 };
 extern struct annotated *array_annotated;
 
 int main(int argc, char *argv[])
 {
  typeof(_Generic(__builtin_get_counted_by(array_annotated->c),
void *: (size_t *)NULL,
default: __builtin_get_counted_by(array_annotated->c)))
 ret = __builtin_get_counted_by(array_annotated->c);
  if (ret)
 *ret = 10;
 
  return 0;
 }
 
 It's a bit cumbersome, but it does what's needed.
 
 This is, however, just doing exactly what Bill has suggested: it is
 converting the (void *)NULL into (size_t *)NULL when there is no
 counted_by annotation...
 
 -Kees
 
 [1] 
 https://lore.kernel.org/linux-hardening/20240822231324.make.666-k...@kernel.org/
>>> 
>>> Interesting. Will __builtin_get_counted_by(array_annotated->c) give
>>> a null pointer (or an invalid pointer) of the correct type if 
>>> array_annotated is a null pointer of an annotated struct type?
>> 
>> If you mean this part:
>> 
>> typeof(P) __obj_ptr = NULL; \
>> /* Just query the counter type for type_max checking. */ \
>> typeof(_Generic(__flex_counter(__obj_ptr->FAM), \
>> void *: (size_t *)NULL, \
>> default: __flex_counter(__obj_ptr->FAM))) \
>> __counter_type_ptr = NULL; \
>> 
>> Where __obj_ptr starts as NULL, then yes. (Or at least, yes it does
>> currently with Qing's GCC patch and Bill's Clang patch.)
> 
> Does __builtin_get_counted_by not evaluate its argument?

__builtin_get_counted_by is currently implemented as a C reserved word,
handled purely in the C parser as a C operator.

So it doesn’t apply any complicated evaluation to its argument.
I think this provides simple functionality that is sufficient for what is
needed.  If not, please let me know.


> In any
> case, I think this should be documented whether this is 
> supposed to work (or not).
Okay. 
> 
>> 
>>> I also wonder a bit about the multiple macro evaluations of the arguments
>>> for P and SIZE.
>> 
>> I tried to design it so they aren't used with anything that should
>> have side-effects.
> 
> I was more concerned about the cost of macro expansions on
> compile times. I would do:
> 
> __auto_type __FOO = (FOO);
> 
> for all macro parameters that are evaluated multiple times
> and are expressions which might contain macros themselves.
> 
> There is also the issue of evaluation of typeof for variably modified 
> types, which might not currently affect the kernel, but this would
> also become safer for such types.
> 
> 
>> Anyway, if __builtin_get_counted_by returns (size_t *)NULL then I think
>> the _Generic wrapping isn't needed. That would make it easier to use?
> 
> It would make it easier for your use case.  I wonder though
> whether other people might want to have the compile time error
> when there is no attribute.

Then this goes back to the previous discussion: if we take the approach
of a unary operator __counted_by(PTR), then the separate builtin
__builtin_has_attribute(PTR, counted_by) should also be provided to the user.

For GCC there is no issue; __builtin_has_attribute (PTR, counted_by) is
already available.
But Clang doesn’t currently have this builtin, so it would need to be
implemented before the unary operator __counted_by(PTR).

Let me know your opinion.

Thanks

Qing
> 
> 
> Martin
> 
>> 
>> -Kees
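Martin's suggestion to bind each multiply-used macro argument with __auto_type can be illustrated with a minimal sketch. MAX_NAIVE, MAX_ONCE and next_value are hypothetical and are not kernel code. A side effect makes the difference easy to observe. The same binding also means the argument text appears in the expansion only once, which addresses Martin's compile-time concern about nested macros.

```c
/* Naive: each argument is expanded (and may be evaluated) twice.  */
#define MAX_NAIVE(a, b) ((a) > (b) ? (a) : (b))

/* Each argument is expanded once and bound to a local with
   __auto_type inside a GNU statement expression.  */
#define MAX_ONCE(a, b)                   \
  ({                                     \
    __auto_type _max_a = (a);            \
    __auto_type _max_b = (b);            \
    _max_a > _max_b ? _max_a : _max_b;   \
  })

static int calls;

/* Returns 1, 2, 3, ... and counts how often it has been called.  */
static int next_value(void)
{
  return ++calls;
}
```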

[PATCH] RISC-V: Fix subreg of VLS modes larger than a vector [PR116086].

2024-08-27 Thread Robin Dapp

Hi,

this is a hopefully better way to solve the "subreg problem": first, in
the generic case, have the RA go via memory, and second, provide a
vector-vector extract that handles it in an optimized way.

When the source mode is potentially larger than one vector (e.g. an
LMUL2 mode for VLEN=128) we don't know which vector the subreg actually
refers to.  For zvl128b and LMUL=2 the subreg in (subreg:V2DI (reg:V4DI))
could actually be a full (high) vector register of a two-register
group (at VLEN=128) or the higher part of a single register (at VLEN>128).

As the subreg is statically ambiguous we prevent such situations in
can_change_mode_class.

The culprit in PR116086 is

 _12 = BIT_FIELD_REF ;

which can be expanded with a vector-vector extract (from V4DI to V2DI).
This patch adds a VLS-mode vector-vector extract that handles "halving"
cases like this one by sliding down the source vector, thus making sure
the correct part is used.
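The expansion strategy works in two steps. It slides the source down by part * sz elements, then takes the low part. This can be modelled on plain arrays. The following is a scalar C sketch of the semantics only, assuming the V4DI to V2DI halving case from the PR. slide_down and extract_part are hypothetical helpers, not RVV intrinsics or GCC internals. As with vslidedown, source elements past the end read as zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a slide-down: dst[i] = src[i + offset], with source
   elements past the end read as zero.  */
static void slide_down(int64_t *dst, const int64_t *src, size_t n,
                       size_t offset)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = (i + offset < n) ? src[i + offset] : 0;
}

/* Extract part PART (SZ elements) of SRC (N elements): slide down by
   PART * SZ when PART != 0, then keep the low SZ elements.  */
static void extract_part(int64_t *out, const int64_t *src, size_t n,
                         size_t sz, size_t part)
{
  int64_t tmp[16];              /* sketch assumes n <= 16 */
  const int64_t *from = src;

  if (part != 0)
    {
      slide_down(tmp, src, n, part * sz);
      from = tmp;
    }
  for (size_t i = 0; i < sz; i++)
    out[i] = from[i];
}
```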

Regtested on rv64gcv_zvfh_zvbb.

Regards
 Robin

PR target/116086

gcc/ChangeLog:

* config/riscv/autovec.md (vec_extract): Add
vector-vector extract for VLS modes.
* config/riscv/riscv.cc (riscv_can_change_mode_class): Forbid
VLS modes larger than one vector.
* config/riscv/vector-iterators.md: Add vector-vector extract
iterators.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add effective target checks for
zvl256b and zvl512b.
* gcc.target/riscv/rvv/autovec/pr116086-2-run.c: New test.
* gcc.target/riscv/rvv/autovec/pr116086-2.c: New test.
* gcc.target/riscv/rvv/autovec/pr116086.c: New test.
---
 gcc/config/riscv/autovec.md   |  35 +
 gcc/config/riscv/riscv.cc |  11 ++
 gcc/config/riscv/vector-iterators.md  | 147 ++
 .../riscv/rvv/autovec/pr116086-2-run.c|   6 +
 .../gcc.target/riscv/rvv/autovec/pr116086-2.c |  18 +++
 .../gcc.target/riscv/rvv/autovec/pr116086.c   |  76 +
 gcc/testsuite/lib/target-supports.exp |  37 +
 7 files changed, 330 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116086-2-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116086-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116086.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index f422ec0dc1e..5a7cf3523a7 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1462,6 +1462,41 @@ (define_expand "vec_extractbi"
   DONE;
 })
 
+;; -
+;;  [INT,FP] Extract a vector from a vector.
+;; -
+;; TODO: This can be extended to allow basically any extract mode.
+;; For now this helps optimize VLS subregs like (subreg:V2DI (reg:V4DI) 16)
+;; that would otherwise need to go via memory.
+
+(define_expand "vec_extract"
+  [(set (match_operand: 0 "nonimmediate_operand")
+ (vec_select:
+   (match_operand:V_HAS_HALF1 "register_operand")
+   (parallel
+[(match_operand 2 "immediate_operand")])))]
+  "TARGET_VECTOR"
+{
+  int sz = GET_MODE_NUNITS (mode).to_constant ();
+  int part = INTVAL (operands[2]);
+
+  rtx start = GEN_INT (part * sz);
+  rtx tmp = operands[1];
+
+  if (part != 0)
+{
+  tmp = gen_reg_rtx (mode);
+
+  rtx ops[] = {tmp, operands[1], start};
+  riscv_vector::emit_vlmax_insn
+   (code_for_pred_slide (UNSPEC_VSLIDEDOWN, mode),
+riscv_vector::BINARY_OP, ops);
+}
+
+  emit_move_insn (operands[0], gen_lowpart (mode, tmp));
+  DONE;
+})
+
 ;; -
 ;;  [FP] Binary operations
 ;; -
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 8538d405f50..4b9f3081ac5 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -10630,6 +10630,17 @@ riscv_can_change_mode_class (machine_mode from, 
machine_mode to,
   if (reg_classes_intersect_p (V_REGS, rclass)
   && !ordered_p (GET_MODE_PRECISION (from), GET_MODE_PRECISION (to)))
 return false;
+
+  /* Subregs of modes larger than one vector are ambiguous.
+ A V4DImode with rv64gcv_zvl128b could, for example, span two registers/one
+ register group of two at VLEN = 128 or one register at VLEN >= 256 and
+ we cannot, statically, determine which part of it to extract.
+ Therefore prevent that.  */
+  if (reg_classes_intersect_p (V_REGS, rclass)
+  && riscv_v_ext_vls_mode_p (from)
+  && !ordered_p (BITS_PER_RISCV_VECTOR, GET_MODE_PRECISION (from)))
+  return false;
+
   return !reg_classes_intersect_p (FP_REGS, rclass);
 }
 
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iter

[PATCH v2] gimple ssa: switchconv: Use __builtin_popcount and support more types in exp transform [PR116355]

2024-08-27 Thread Filip Kastl

Hi,

this is the second version of this patch.  See the mail with the first version
here:

https://inbox.sourceware.org/gcc-patches/ZsnRLdYErnIWQlCe@localhost.localdomain/

In this version I've made these adjustments:
- Added calls to direct_internal_fn_supported_p in can_pow2p.  Before, I just
  assumed that if the target supports FFS at all it will support it for
  unsigned, long unsigned and long long unsigned and didn't check this.
  - Also added a direct_internal_fn_supported_p check for the type passed to
    can_pow2p as a small compile time optimization.  This is the first check
    that runs and if it is positive, the function exits early and doesn't
    bother with any conversions.
- can_pow2p and can_log2 now return the type that gen_pow2p and gen_log2 should
  convert to before generating the respective operation.  gen_pow2p and
  gen_log2 now have this type as one of their parameters.
- Using gimple_convert instead of manually building CONVERT_EXPR/NOP_EXPR
  assignments.
- Using gimple_build for building __builtin_popcount.
- Adjusted ChangeLog entries.

Bootstrapped and regtested on x86_64 linux.  Ok to push?

Cheers,
Filip Kastl


-- 8< --


The gen_pow2p function generates (a & -a) == a as a fallback for
POPCOUNT (a) == 1.  Not only is the bitmagic not equivalent to
POPCOUNT (a) == 1 (it also holds for a == 0), but it also introduces UB
(consider signed a = INT_MIN).

This patch rewrites gen_pow2p to always use __builtin_popcount instead.
This means that the final GIMPLE code is decided by already existing
machinery in a later pass, which I think is a cleaner solution.  This
existing machinery also uses a ^ (a - 1) > a - 1, which is the correct
bitmagic.
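The three predicates can be compared at the C level. This is a minimal sketch on unsigned operands with hypothetical function names, and it does not describe the GIMPLE that the later pass actually emits. For unsigned a, (a & -a) == a disagrees with the popcount test only at a == 0. For signed a, -a additionally overflows when a is INT_MIN. The XOR form rejects 0 because a - 1 wraps to UINT_MAX.

```c
#include <stdbool.h>

/* Reference predicate: exactly one bit set.  */
static bool pow2p_popcount(unsigned a)
{
  return __builtin_popcount(a) == 1;
}

/* The old fallback; unsigned avoids the -INT_MIN overflow, but it
   still accepts a == 0.  */
static bool pow2p_old_fallback(unsigned a)
{
  return (a & -a) == a;
}

/* a ^ (a - 1) > a - 1: for a == 0, a - 1 wraps to UINT_MAX and
   UINT_MAX > UINT_MAX is false.  */
static bool pow2p_xor(unsigned a)
{
  return (a ^ (a - 1)) > a - 1;
}
```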

While rewriting gen_pow2p I had to add logic for converting the
operand's type to a type that __builtin_popcount accepts.  I naturally
also added this logic to gen_log2.  Thanks to this, exponential index
transform gains the capability to handle all operand types with
precision at most that of long long int.

PR tree-optimization/116355

gcc/ChangeLog:

* tree-switch-conversion.cc (can_log2): Add capability to
suggest converting the operand to a different type.
(gen_log2): Add capability to generate a conversion in case the
operand is of a type incompatible with the logarithm operation.
(can_pow2p): New function.
(gen_pow2p): Rewrite to use __builtin_popcount instead of
manually inserting an internal fn call or bitmagic.  Also add
capability to generate a conversion.
(switch_conversion::is_exp_index_transform_viable): Call
can_pow2p.  Store types suggested by can_log2 and gen_log2.
(switch_conversion::exp_index_transform): Params of gen_pow2p
and gen_log2 changed so update their calls.
* tree-switch-conversion.h: Add m_exp_index_transform_log2_type
and m_exp_index_transform_pow2p_type to switch_conversion class
to track type conversions needed to generate the "is power of 2"
and logarithm operations.

gcc/testsuite/ChangeLog:

* gcc.target/i386/switch-exp-transform-1.c: Don't test for
presence of POPCOUNT internal fn after switch conversion.  Test
for it after __builtin_popcount has had a chance to get
expanded.
* gcc.target/i386/switch-exp-transform-3.c: Also test char and
short.

Signed-off-by: Filip Kastl 
---
 .../gcc.target/i386/switch-exp-transform-1.c  |   7 +-
 .../gcc.target/i386/switch-exp-transform-3.c  |  98 ++-
 gcc/tree-switch-conversion.cc | 152 ++
 gcc/tree-switch-conversion.h  |   7 +
 4 files changed, 227 insertions(+), 37 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c b/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
index 53d31460ba3..a8c9e03e515 100644
--- a/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
+++ b/gcc/testsuite/gcc.target/i386/switch-exp-transform-1.c
@@ -1,9 +1,10 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-switchconv -mpopcnt -mbmi" } */
+/* { dg-options "-O2 -fdump-tree-switchconv -fdump-tree-widening_mul -mpopcnt -mbmi" } */
 
 /* Checks that exponential index transform enables switch conversion to convert
this switch into an array lookup.  Also checks that the "index variable is a
-   power of two" check has been generated.  */
+   power of two" check has been generated and that it has been later expanded
+   into an internal function.  */
 
 int foo(unsigned bar)
 {
@@ -29,4 +30,4 @@ int foo(unsigned bar)
 }
 
 /* { dg-final { scan-tree-dump "CSWTCH" "switchconv" } } */
-/* { dg-final { scan-tree-dump "POPCOUNT" "switchconv" } } */
+/* { dg-final { scan-tree-dump "POPCOUNT" "widening_mul" } } */
diff --git a/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c b/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c
index 64a7b146172..5011d1ebb0e 100644
--- a/gcc/testsuite/gcc.target/i386/switch-exp-

[PATCH v4] Extend check-function-bodies to allow label and directives

2024-08-27 Thread H.J. Lu

As PR target/116174 has shown, we may need to verify labels and the
directive order.  Extend check-function-bodies with an optional regexp
for output lines that must be kept, so that labels and directives can be
checked.

gcc/

* doc/sourcebuild.texi (check-function-bodies): Add an optional
argument for matched output lines.

gcc/testsuite/

* gcc.target/i386/pr116174.c: Use check-function-bodies.
* lib/scanasm.exp (parse_function_bodies): Append the line if
$up_config(matched) matches the line.
(check-function-bodies): Add an argument for matched.  Set
up_config(matched) to $matched.  Append the expected line without
$config(line_prefix) to function_regexp if it starts with ".L".

Signed-off-by: H.J. Lu 
---
 gcc/doc/sourcebuild.texi |  9 ++---
 gcc/testsuite/gcc.target/i386/pr116174.c | 18 +++---
 gcc/testsuite/lib/scanasm.exp| 15 +--
 3 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 1a31f00fb65..3c55f103795 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -3530,7 +3530,7 @@ assembly output.
 Passes if @var{symbol} is not defined as a hidden symbol in the test's
 assembly output.
 
-@item check-function-bodies @var{prefix} @var{terminator} [@var{options} [@{ target/xfail @var{selector} @}]]
+@item check-function-bodies @var{prefix} @var{terminator} [@var{options} [@{ target/xfail @var{selector} @} [@var{matched}]]]
 Looks through the source file for comments that give the expected assembly
 output for selected functions.  Each line of expected output starts with the
 prefix string @var{prefix} and the expected output for a function as a whole
@@ -3557,8 +3557,11 @@ Depending on the configuration (see
 @code{configure_check-function-bodies} in
 @file{gcc/testsuite/lib/scanasm.exp}), the test may discard from the
 compiler's assembly output directives such as @code{.cfi_startproc},
-local label definitions such as @code{.LFB0}, and more.
-It then matches the result against the expected
+local label definitions such as @code{.LFB0}, and more.  This behavior
+can be overridden using the optional @var{matched} argument, which
+specifies a regexp for lines that should not be discarded in this way.
+
+The test then matches the result against the expected
 output for a function as a single regular expression.  This means that
 later lines can use backslashes to refer back to @samp{(@dots{})}
 captures on earlier lines.  For example:
diff --git a/gcc/testsuite/gcc.target/i386/pr116174.c b/gcc/testsuite/gcc.target/i386/pr116174.c
index 8877d0b51af..686aeb9ff31 100644
--- a/gcc/testsuite/gcc.target/i386/pr116174.c
+++ b/gcc/testsuite/gcc.target/i386/pr116174.c
@@ -1,6 +1,20 @@
 /* { dg-do compile { target *-*-linux* } } */
-/* { dg-options "-O2 -fcf-protection=branch" } */
+/* { dg-options "-O2 -g0 -fcf-protection=branch" } */
+/* Keep labels and directives ('.p2align', '.cfi_startproc').
+/* { dg-final { check-function-bodies "**" "" "" { target "*-*-*" } {^\t?\.}  } } */
 
+/*
+**foo:
+**.LFB0:
+** .cfi_startproc
+** (
+** endbr64
+** .p2align 5
+** |
+** endbr32
+** )
+**...
+*/
 char *
 foo (char *dest, const char *src)
 {
@@ -8,5 +22,3 @@ foo (char *dest, const char *src)
 /* nothing */;
   return --dest;
 }
-
-/* { dg-final { scan-assembler "\t\.cfi_startproc\n\tendbr(32|64)\n" } } */
diff --git a/gcc/testsuite/lib/scanasm.exp b/gcc/testsuite/lib/scanasm.exp
index 42c719c512c..737eefc655e 100644
--- a/gcc/testsuite/lib/scanasm.exp
+++ b/gcc/testsuite/lib/scanasm.exp
@@ -952,6 +952,9 @@ proc parse_function_bodies { config filename result } {
verbose "parse_function_bodies: $function_name:\n$function_body"
set up_result($function_name) $function_body
set in_function 0
+   } elseif { $up_config(matched) ne "" \
+  && [regexp $up_config(matched) $line] } {
+   append function_body $line "\n"
} elseif { [regexp $up_config(fluff) $line] } {
verbose "parse_function_bodies: $function_name: ignoring fluff line: $line"
} else {
@@ -982,7 +985,7 @@ proc check_function_body { functions name body_regexp } {
 
 # Check the implementations of functions against expected output.  Used as:
 #
-# { dg-do { check-function-bodies PREFIX TERMINATOR[ OPTION[ SELECTOR]] } }
+# { dg-do { check-function-bodies PREFIX TERMINATOR[ OPTION[ SELECTOR [MATCHED]]] } }
 #
 # See sourcebuild.texi for details.
 
@@ -990,7 +993,7 @@ proc check-function-bodies { args } {
 if { [llength $args] < 2 } {
error "too few arguments to check-function-bodies"
 }
-if { [llength $args] > 4 } {
+if { [llength $args] > 5 } {
error "too many arguments to check-function-bodies"
 }
 
@@ -1029,6 +1032,11 @@ proc check-function-bodies { args } {
}
 }
 
+set matched ""
+if { [llength $args] >=

Re: [PATCH v2] Extend check-function-bodies to allow label and directives

2024-08-27 Thread H.J. Lu

On Tue, Aug 27, 2024 at 6:54 AM Richard Sandiford
 wrote:
>
> "H.J. Lu"  writes:
> >> >   append function_regexp ")"
> >> >   } elseif { [string equal $line "..."] } {
> >> >   append function_regexp ".*"
> >> > + } elseif { [regexp "^.L.*" $line] } {
> >>
> >> {^\.L} would be more precise than "^.L.*".
> >
> > I tried  {^\.L}.  It didn't work.  I used "^.L" in v3.
>
> Why didn't it work though?  "^.L.*" matches "ALL" as well as ".L".
>
> Richard

I tried it again by typing it by hand.  It works now.  Fixed in v4:

https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661586.html

Thanks.


--
H.J.

[PATCH] RISC-V: Add missing mode_idx for vrol and vror

2024-08-27 Thread Kito Cheng

We added patterns for vector rotate, but it seems we forgot to add them
to mode_idx, which is used in AVL propagation (riscv-avlprop.cc).

gcc/ChangeLog:

* config/riscv/vector.md (mode_idx): Add vrol and vror.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/rotr.c: New.
---
 gcc/config/riscv/vector.md|  2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/rotr.c | 13 +
 2 files changed, 14 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/rotr.c

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 666719330c6..d0677325ba1 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -816,7 +816,7 @@
 			  vfcmp,vfminmax,vfsgnj,vfclass,vfmerge,vfmov,\
 			  vfcvtitof,vfncvtitof,vfncvtftoi,vfncvtftof,vmalu,vmiota,vmidx,\
 			  vimovxv,vfmovfv,vslideup,vslidedown,vislide1up,vislide1down,vfslide1up,vfslide1down,\
-			  vgather,vcompress,vmov,vnclip,vnshift,vandn,vcpop,vclz,vctz")
+			  vgather,vcompress,vmov,vnclip,vnshift,vandn,vcpop,vclz,vctz,vrol,vror")
   (const_int 0)
 
   (eq_attr "type" "vimovvx,vfmovvf")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/rotr.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/rotr.c
new file mode 100644
index 000..055b28d1e78
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/rotr.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvbb -mabi=lp64d -fno-vect-cost-model -mrvv-vector-bits=zvl" } */
+
+typedef int a;
+void *b;
+a c;
+void d() {
+  a e = c, f =0;
+  short *g = b;
+  for (; f < e; f++)
+    *(g + f) = (255 & (*(g + f) >> 8)) | *(g + f) << 8;
+}
+
-- 
2.34.1

Re: [PATCH v4] Extend check-function-bodies to allow label and directives

2024-08-27 Thread Andreas Schwab

On Aug 27 2024, H.J. Lu wrote:

> diff --git a/gcc/testsuite/gcc.target/i386/pr116174.c b/gcc/testsuite/gcc.target/i386/pr116174.c
> index 8877d0b51af..686aeb9ff31 100644
> --- a/gcc/testsuite/gcc.target/i386/pr116174.c
> +++ b/gcc/testsuite/gcc.target/i386/pr116174.c
> @@ -1,6 +1,20 @@
>  /* { dg-do compile { target *-*-linux* } } */
> -/* { dg-options "-O2 -fcf-protection=branch" } */
> +/* { dg-options "-O2 -g0 -fcf-protection=branch" } */
> +/* Keep labels and directives ('.p2align', '.cfi_startproc').
> +/* { dg-final { check-function-bodies "**" "" "" { target "*-*-*" } {^\t?\.}  } } */

This has a "nested" comment (line 3 is missing the comment end).

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

Re: [PATCH] RISC-V: Add missing mode_idx for vrol and vror

2024-08-27 Thread Robin Dapp

You don't need an OK, of course, but LGTM.

When I found another instance of this, I thought about having
exhaustive self-tests for those attributes.  Maybe a good learning
exercise?

-- 
Regards
 Robin

[PATCH] pr116174.c: Add the missing */

2024-08-27 Thread H.J. Lu

* gcc.target/i386/pr116174.c: Add the missing */.

Signed-off-by: H.J. Lu 
---
 gcc/testsuite/gcc.target/i386/pr116174.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr116174.c b/gcc/testsuite/gcc.target/i386/pr116174.c
index 686aeb9ff31..3c8000f2aad 100644
--- a/gcc/testsuite/gcc.target/i386/pr116174.c
+++ b/gcc/testsuite/gcc.target/i386/pr116174.c
@@ -1,6 +1,6 @@
 /* { dg-do compile { target *-*-linux* } } */
 /* { dg-options "-O2 -g0 -fcf-protection=branch" } */
-/* Keep labels and directives ('.p2align', '.cfi_startproc').
+/* Keep labels and directives ('.p2align', '.cfi_startproc').  */
 /* { dg-final { check-function-bodies "**" "" "" { target "*-*-*" } {^\t?\.}  } } */
 
 /*
-- 
2.46.0

Re: [PATCH v4] Extend check-function-bodies to allow label and directives

2024-08-27 Thread Richard Sandiford

Andreas Schwab  writes:
> On Aug 27 2024, H.J. Lu wrote:
>
>> diff --git a/gcc/testsuite/gcc.target/i386/pr116174.c b/gcc/testsuite/gcc.target/i386/pr116174.c
>> index 8877d0b51af..686aeb9ff31 100644
>> --- a/gcc/testsuite/gcc.target/i386/pr116174.c
>> +++ b/gcc/testsuite/gcc.target/i386/pr116174.c
>> @@ -1,6 +1,20 @@
>>  /* { dg-do compile { target *-*-linux* } } */
>> -/* { dg-options "-O2 -fcf-protection=branch" } */
>> +/* { dg-options "-O2 -g0 -fcf-protection=branch" } */
>> +/* Keep labels and directives ('.p2align', '.cfi_startproc').
>> +/* { dg-final { check-function-bodies "**" "" "" { target "*-*-*" } {^\t?\.}  } } */
>
> This has a "nested" comment (line 3 is missing the comment end).

The patch is ok with that fixed.

Thanks,
Richard

Re: [PATCH] RISC-V: Fix subreg of VLS modes larger than a vector [PR116086].

2024-08-27 Thread juzhe.zh...@rivai.ai

+(define_mode_iterator V_HAS_HALF [
+  V2QI V4QI V8QI V16QI V32QI V64QI V128QI V256QI V512QI V1024QI V2048QI V4096QI
+  V2HI V4HI V8HI V16HI V32HI V64HI V128HI V256HI V512HI V1024HI V2048HI
+  V2SI V4SI V8SI V16SI V32SI V64SI V128SI V256SI V512SI V1024SI
+  V2DI V4DI V8DI V16DI V32DI V64DI V128DI V256DI V512DI
+  V2SF V4SF V8SF V16SF V32SF V64SF V128SF V256SF V512SF V1024SF
+  V2DF V4DF V8DF V16DF V32DF V64DF V128DF V256DF V512DF
+])

It seems you missed the predicates here?
For example:
(V4096QI "riscv_vector::vls_mode_valid_p (V4096QImode) && TARGET_MIN_VLEN >= 4096")
(V32DF "riscv_vector::vls_mode_valid_p (V32DFmode) && TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN >= 256")



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-08-27 22:02
To: gcc-patches
CC: pal...@dabbelt.com; kito.ch...@gmail.com; juzhe.zh...@rivai.ai; jeffreya...@gmail.com; pan2...@intel.com; rdapp@gmail.com
Subject: [PATCH] RISC-V: Fix subreg of VLS modes larger than a vector [PR116086].
Hi,
 
this is a hopefully better way to solve the "subreg problem": first,
in the generic case, have the RA go via memory, and second, provide a
vector-vector extract that deals with it in an optimized way.
 
When the source mode is potentially larger than one vector (e.g. an
LMUL2 mode for VLEN=128) we don't know which vector the subreg actually
refers to.  For zvl128b and LMUL=2 the subreg in (subreg:V2DI (reg:V4DI))
could actually be the a full (high) vector register of a two-register
group (at VLEN=128) or the higher part of a single register (at VLEN>128).
 
As the subreg is statically ambiguous we prevent such situations in
can_change_mode_class.
 
The culprit in PR116086 is
 
_12 = BIT_FIELD_REF ;
 
which can be expanded with a vector-vector extract (from V4DI to V2DI).
This patch adds a VLS-mode vector-vector extract that handles "halving"
cases like this one by sliding down the source vector, thus making sure
the correct part is used.
 
Regtested on rv64gcv_zvfh_zvbb.
 
Regards
Robin
 
PR target/116086
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (vec_extract): Add
vector-vector extract for VLS modes.
* config/riscv/riscv.cc (riscv_can_change_mode_class): Forbid
VLS modes larger than one vector.
* config/riscv/vector-iterators.md: Add vector-vector extract
iterators.
 
gcc/testsuite/ChangeLog:
 
* lib/target-supports.exp: Add effective target checks for
zvl256b and zvl512b.
* gcc.target/riscv/rvv/autovec/pr116086-2-run.c: New test.
* gcc.target/riscv/rvv/autovec/pr116086-2.c: New test.
* gcc.target/riscv/rvv/autovec/pr116086.c: New test.
---
gcc/config/riscv/autovec.md   |  35 +
gcc/config/riscv/riscv.cc |  11 ++
gcc/config/riscv/vector-iterators.md  | 147 ++
.../riscv/rvv/autovec/pr116086-2-run.c|   6 +
.../gcc.target/riscv/rvv/autovec/pr116086-2.c |  18 +++
.../gcc.target/riscv/rvv/autovec/pr116086.c   |  76 +
gcc/testsuite/lib/target-supports.exp |  37 +
7 files changed, 330 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116086-2-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116086-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116086.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index f422ec0dc1e..5a7cf3523a7 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1462,6 +1462,41 @@ (define_expand "vec_extractbi"
   DONE;
})
+;; -
+;;  [INT,FP] Extract a vector from a vector.
+;; -
+;; TODO: This can be extended to allow basically any extract mode.
+;; For now this helps optimize VLS subregs like (subreg:V2DI (reg:V4DI) 16)
+;; that would otherwise need to go via memory.
+
+(define_expand "vec_extract"
+  [(set (match_operand: 0 "nonimmediate_operand")
+ (vec_select:
+   (match_operand:V_HAS_HALF 1 "register_operand")
+   (parallel
+ [(match_operand 2 "immediate_operand")])))]
+  "TARGET_VECTOR"
+{
+  int sz = GET_MODE_NUNITS (mode).to_constant ();
+  int part = INTVAL (operands[2]);
+
+  rtx start = GEN_INT (part * sz);
+  rtx tmp = operands[1];
+
+  if (part != 0)
+{
+  tmp = gen_reg_rtx (mode);
+
+  rtx ops[] = {tmp, operands[1], start};
+  riscv_vector::emit_vlmax_insn
+ (code_for_pred_slide (UNSPEC_VSLIDEDOWN, mode),
+ riscv_vector::BINARY_OP, ops);
+}
+
+  emit_move_insn (operands[0], gen_lowpart (mode, tmp));
+  DONE;
+})
+
;; -
;;  [FP] Binary operations
;; -
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 8538d405f50..4b9f3081ac5 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -10630,6 +10630,17 @@ riscv_can_change_mode_class (machine_mode from,

Re: [PATCH] RISC-V: Fix subreg of VLS modes larger than a vector [PR116086].

2024-08-27 Thread Jeff Law





On 8/27/24 8:02 AM, Robin Dapp wrote:

Hi,

this is a hopefully better way to solve the "subreg problem": first,
in the generic case, have the RA go via memory, and second, provide a
vector-vector extract that deals with it in an optimized way.

When the source mode is potentially larger than one vector (e.g. an
LMUL2 mode for VLEN=128) we don't know which vector the subreg actually
refers to.  For zvl128b and LMUL=2 the subreg in (subreg:V2DI (reg:V4DI))
could actually be a full (high) vector register of a two-register
group (at VLEN=128) or the higher part of a single register (at VLEN>128).

As the subreg is statically ambiguous we prevent such situations in
can_change_mode_class.

The culprit in PR116086 is

  _12 = BIT_FIELD_REF ;

which can be expanded with a vector-vector extract (from V4DI to V2DI).
This patch adds a VLS-mode vector-vector extract that handles "halving"
cases like this one by sliding down the source vector, thus making sure
the correct part is used.

Regtested on rv64gcv_zvfh_zvbb.

Regards
  Robin

PR target/116086

gcc/ChangeLog:

* config/riscv/autovec.md (vec_extract): Add
vector-vector extract for VLS modes.
* config/riscv/riscv.cc (riscv_can_change_mode_class): Forbid
VLS modes larger than one vector.
* config/riscv/vector-iterators.md: Add vector-vector extract
iterators.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add effective target checks for
zvl256b and zvl512b.
* gcc.target/riscv/rvv/autovec/pr116086-2-run.c: New test.
* gcc.target/riscv/rvv/autovec/pr116086-2.c: New test.
* gcc.target/riscv/rvv/autovec/pr116086.c: New test.
---



diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 8538d405f50..4b9f3081ac5 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -10630,6 +10630,17 @@ riscv_can_change_mode_class (machine_mode from, machine_mode to,
if (reg_classes_intersect_p (V_REGS, rclass)
&& !ordered_p (GET_MODE_PRECISION (from), GET_MODE_PRECISION (to)))
  return false;
+
+  /* Subregs of modes larger than one vector are ambiguous.
+ A V4DImode with rv64gcv_zvl128b could, for example, span two registers/one
+ register group of two at VLEN = 128 or one register at VLEN >= 256 and
+ we cannot, statically, determine which part of it to extract.
+ Therefore prevent that.  */
+  if (reg_classes_intersect_p (V_REGS, rclass)
+  && riscv_v_ext_vls_mode_p (from)
+  && !ordered_p (BITS_PER_RISCV_VECTOR, GET_MODE_PRECISION (from)))
+  return false;
+
return !reg_classes_intersect_p (FP_REGS, rclass);
  }
Yea, this looks much more likely to avoid problems in the middle-end by 
indicating it's not safe to use a SUBREG to change the view of certain 
vector modes.


I think this is good to go once Juzhe's comments are addressed.

jeff

Re: [PATCH v2 1/9] RISC-V: Fix vid const vector expander for non-npatterns size steps

2024-08-27 Thread Jeff Law





On 8/26/24 6:36 PM, Patrick O'Neill wrote:

Prior to this patch the expander would emit vectors like:
{ 0, 0, 5, 5, 10, 10, ...}
as:
{ 0, 0, 2, 2,  4,  4, ...}

This patch sets the step size to the requested value.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Fix STEP size in
expander.

OK
jeff

Re: [PATCH v2 2/9] RISC-V: Reorder insn cost match order to match corresponding expander match order

2024-08-27 Thread Jeff Law





On 8/26/24 6:36 PM, Patrick O'Neill wrote:

The corresponding expander (riscv-v.cc:expand_const_vector) matches
const_vec_duplicate_p before const_vec_series_p. Reorder to match this
behavior when calculating costs.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_const_insns): Relocate.

Already ACK'd.

jeff

Re: [PATCH v2 3/9] RISC-V: Handle case when constant vector construction target rtx is not a register

2024-08-27 Thread Jeff Law





On 8/26/24 6:36 PM, Patrick O'Neill wrote:

This manifests in RTL that is optimized away which causes runtime failures
in the testsuite. Update all patterns to use a temp result register if required.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Use tmp register if
needed.
OK.  Just a note below; I don't think you necessarily need to change
anything.





Signed-off-by: Patrick O'Neill 
---
  gcc/config/riscv/riscv-v.cc | 73 +
  1 file changed, 41 insertions(+), 32 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index a3039a2cb19..aea4b9b872b 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1150,26 +1150,29 @@ static void
  expand_const_vector (rtx target, rtx src)
  {
machine_mode mode = GET_MODE (target);
+  rtx result = register_operand (target, mode) ? target : gen_reg_rtx (mode);


So a cheaper test would be REG_OR_SUBREG_P rather than register_operand.

While testing register_operand does check the mode, if we have a 
mismatch on the modes between src/target, then the copy from result to 
target is going to fail.


But again, I don't think you really need to change anything here.  Just 
pointing out the marginally more efficient test.


jeff

Re: [PATCH v2 4/9] RISC-V: Emit costs for bool and stepped const vectors

2024-08-27 Thread Jeff Law





On 8/26/24 6:36 PM, Patrick O'Neill wrote:

These cases are handled in the expander
(riscv-v.cc:expand_const_vector). We need the vector builder to detect
these cases so extract that out into a new riscv-v.h header file.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (class rvv_builder): Move to riscv-v.h.
* config/riscv/riscv.cc (riscv_const_insns): Emit placeholder costs for
bool/stepped const vectors.
* config/riscv/riscv-v.h: New file.

Already ACK'd.

jeff

Re: [PATCH v2 5/9] RISC-V: Handle 0.0 floating point pattern costing to match const_vector expander

2024-08-27 Thread Jeff Law





On 8/26/24 6:36 PM, Patrick O'Neill wrote:

The comment previously here stated that the Wc0/Wc1 cases are handled by
the vi constraint but that is not true for the 0.0 Wc0 case.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_const_insns): Handle 0.0 floating-point
case.

OK
Jeff

Re: [PATCH v2 6/9] RISC-V: Allow non-duplicate bool patterns in expand_const_vector

2024-08-27 Thread Jeff Law





On 8/26/24 6:37 PM, Patrick O'Neill wrote:

Currently we assert when encountering a non-duplicate boolean vector.
This patch allows non-duplicate vectors to fall through to the
gcc_unreachable and assert there.

This will be useful when adding a catch-all pattern to emit costs and
handle arbitrary vectors.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Allow non-duplicate
to fall through other patterns before asserting.

OK
jeff

Re: [PATCH v2 7/9] RISC-V: Move helper functions above expand_const_vector

2024-08-27 Thread Jeff Law





On 8/26/24 6:37 PM, Patrick O'Neill wrote:

These subroutines will be used in expand_const_vector in a future patch.
Relocate so expand_const_vector can use them.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vector_init_insert_elems): Relocate.
(expand_vector_init_trailing_same_elem): Ditto.

Already ACK'd.
jeff

Re: [PATCH v2 8/9] RISC-V: Add vslide1up/down pattern to expand_const_vector

2024-08-27 Thread Jeff Law





On 8/26/24 6:37 PM, Patrick O'Neill wrote:

Also explicitly disallow CONST_VECTOR_DUPLICATE_P for now.
CONST_VECTOR_DUPLICATE_P was previously disallowed implicitly.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vec_series): Update comment.
(expand_vector_init_insert_elems): Ditto.
(expand_const_vector): Add catch-all pattern.
* config/riscv/riscv.cc (riscv_const_insns): Add costing for catch-all
pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/materialize-1.c: New test.
* gcc.target/riscv/rvv/autovec/materialize-2.c: New test.
* gcc.target/riscv/rvv/autovec/materialize-3.c: New test.
* gcc.target/riscv/rvv/autovec/materialize-4.c: New test.
* gcc.target/riscv/rvv/autovec/materialize-5.c: New test.
* gcc.target/riscv/rvv/autovec/materialize-6.c: New test.

Signed-off-by: Patrick O'Neill 
---
This causes 4 new regressions on glibc rv64gcv:
Appears to be spilling due to the increased register pressure from 
materializing constants for vslide1down:
FAIL: gcc.dg/vect/costmodel/riscv/rvv/pr113112-4.c scan-assembler-not jr
FAIL: gcc.dg/vect/costmodel/riscv/rvv/pr113112-4.c scan-assembler-not sp

Caused by vle32/64 being replaced with splat & vslide1down:
FAIL: gcc.target/riscv/rvv/autovec/vls/init-5.c -O3 -ftree-vectorize -mrvv-vector-bits=scalable  scan-assembler-times vle32\\.v 7
FAIL: gcc.target/riscv/rvv/autovec/vls/init-7.c -O3 -ftree-vectorize -mrvv-vector-bits=scalable  scan-assembler-times vle64\\.v 7
Going to assume you'll do something with those scan-asm tests as a 
follow-up.




I'm not sure if it's profitable to replace a lmul8 load with 127 vslide1down.vx
ops but we're being honest with the middle end when returning the # of insns
we'll be emitting when costing...
Yea.  I think in general we don't really know how LMUL is going to 
perform on designs.  I'd rather be honest with the middle end.


OK.

jeff

Re: [PATCH v2 9/9] RISC-V: Add cost model asserts

2024-08-27 Thread Jeff Law





On 8/26/24 6:37 PM, Patrick O'Neill wrote:

This patch adds some advanced checking to assert that the emitted costs match
emitted patterns for const_vecs.

Flow:
Costing: Insert into hashmap>
Expand: Check for membership in hashmap
  -> Not in hashmap: ignore, this wasn't costed
  -> In hashmap: Iterate over vec
 -> if RTX not in hashmap: Ignore, this wasn't costed (hash collision)
 -> if RTX in hashmap: Assert enum is expected

There are no false positive asserts with this flow.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Add RTL_CHECKING gated
asserts.
* config/riscv/riscv.cc (riscv_const_insns): Ditto.
* config/riscv/riscv-v.h (insert_expected_pattern): Add helper function
to insert hash collisions into hash map vec key.
(get_expected_costed_type): Add helper function to get the expected
cost type for a given rtx pattern.
I suspect this is going to be problematical at some point, particularly 
since we can get hash conflicts for cases that aren't problematical.


In general we also want to avoid #ifdefs -- we're not clean in that
regard by any means, but much of that cruft has been converted to a
runtime check.  The basic idea is that conditionally compiled code like
that tends to be problematical for various checks like unused
variables/parameters, use-without-definition objects, etc.



I'd tend to prefer to drop this, but I'm not steadfastly against including.

jeff

Re: [PATCH] testsuite: Reduce cut-&-paste in scanltranstree.exp

2024-08-27 Thread Alex Coplan

On 15/08/2024 13:55, Richard Sandiford wrote:
> scanltranstree.exp defines some LTO wrappers around standard
> non-LTO scanners.  Four of them are cut-&-paste variants of
> one another, so this patch generates them from a single template.
> It also does the same for scan-ltrans-tree-dump-times, so that
> other *-times scanners can be added easily in future.
> 
> The scanners seem to be lightly used.  gcc.dg/ipa/ipa-icf-38.c uses
> scan-ltrans-tree-dump{,-not} and libgomp.c/declare-variant-1.c
> uses scan-ltrans-tree-dump-{not,times}.  Nothing currently seems
> to use scan-ltrans-tree-dump-dem*.
> 
> Tested on the files above so far.  Surprisingly, it worked first time,
> but I tested that deliberately introduced mistakes were flagged.
> (That's my story anyway.)  OK if it passes full testing on
> aarch64-linux-gnu & x86_64-linux-gnu?

Thanks for doing this.  I had the feeling when trying to add the RTL
variants of the scanners that the code needed refactoring, but my Tcl
skills really weren't up to the job.  I learned a lot about Tcl by
trying to make sense of this patch.

I'll try adding the RTL variants on top of this.

Thanks,
Alex

> 
> Richard
> 
> 
> gcc/testsuite/
>   * lib/scanltranstree.exp: Redefine the routines using two
>   templates.
> ---
>  gcc/testsuite/lib/scanltranstree.exp | 186 +--
>  1 file changed, 62 insertions(+), 124 deletions(-)
> 
> diff --git a/gcc/testsuite/lib/scanltranstree.exp b/gcc/testsuite/lib/scanltranstree.exp
> index 79f05f0ffed..bc6e02dc369 100644
> --- a/gcc/testsuite/lib/scanltranstree.exp
> +++ b/gcc/testsuite/lib/scanltranstree.exp
> @@ -19,130 +19,68 @@
>  
>  load_lib scandump.exp
>  
> -# Utility for scanning compiler result, invoked via dg-final.
> -# Call pass if pattern is present, otherwise fail.
> -#
> -# Argument 0 is the regexp to match
> -# Argument 1 is the name of the dumped tree pass
> -# Argument 2 handles expected failures and the like
> -proc scan-ltrans-tree-dump { args } {
> -
> -if { [llength $args] < 2 } {
> - error "scan-ltrans-tree-dump: too few arguments"
> - return
> -}
> -if { [llength $args] > 3 } {
> - error "scan-ltrans-tree-dump: too many arguments"
> - return
> -}
> -if { [llength $args] >= 3 } {
> - scan-dump "ltrans-tree" [lindex $args 0] \
> -   "\[0-9\]\[0-9\]\[0-9\]t.[lindex $args 1]" ".ltrans0.ltrans" \
> -   [lindex $args 2]
> -} else {
> - scan-dump "ltrans-tree" [lindex $args 0] \
> -   "\[0-9\]\[0-9\]\[0-9\]t.[lindex $args 1]" ".ltrans0.ltrans"
> -}
> -}
> -
> -# Call pass if pattern is present given number of times, otherwise fail.
> -# Argument 0 is the regexp to match
> -# Argument 1 is number of times the regexp must be found
> -# Argument 2 is the name of the dumped tree pass
> -# Argument 3 handles expected failures and the like
> -proc scan-ltrans-tree-dump-times { args } {
> -
> -if { [llength $args] < 3 } {
> - error "scan-ltrans-tree-dump-times: too few arguments"
> - return
> -}
> -if { [llength $args] > 4 } {
> - error "scan-ltrans-tree-dump-times: too many arguments"
> - return
> -}
> -if { [llength $args] >= 4 } {
> - scan-dump-times "ltrans-tree" [lindex $args 0] [lindex $args 1] \
> - "\[0-9\]\[0-9\]\[0-9\]t.[lindex $args 2]" \
> - ".ltrans0.ltrans" [lindex $args 3]
> -} else {
> - scan-dump-times "ltrans-tree" [lindex $args 0] [lindex $args 1] \
> - "\[0-9\]\[0-9\]\[0-9\]t.[lindex $args 2]" ".ltrans0.ltrans"
> -}
> +# The first item in the list is an LTO equivalent of the second item
> +# in the list; see the documentation of the second item for details.
> +foreach { name scan type suffix } {
> +scan-ltrans-tree-dump scan-dump ltrans-tree t
> +scan-ltrans-tree-dump-not scan-dump-not ltrans-tree t
> +scan-ltrans-tree-dump-dem scan-dump-dem ltrans-tree t
> +scan-ltrans-tree-dump-dem-not scan-dump-dem-not ltrans-tree t
> +} {
> +eval [string map [list @NAME@ $name \
> +@SCAN@ $scan \
> +@TYPE@ $type \
> +@SUFFIX@ $suffix] {
> +proc @NAME@ { args } {
> + if { [llength $args] < 2 } {
> + error "@NAME@: too few arguments"
> + return
> + }
> + if { [llength $args] > 3 } {
> + error "@NAME@: too many arguments"
> + return
> + }
> + if { [llength $args] >= 3 } {
> + @SCAN@ @TYPE@ [lindex $args 0] \
> + "\[0-9\]\[0-9\]\[0-9\]@SUFFIX@.[lindex $args 1]" \
> + ".ltrans0.ltrans" \
> + [lindex $args 2]
> + } else {
> + @SCAN@ @TYPE@ [lindex $args 0] \
> + "\[0-9\]\[0-9\]\[0-9\]@SUFFIX@.[lindex $args 1]" \
> + ".ltrans0.ltrans"
> + }
> +}
> +}]
>  }
>  
> -# Call pass if patt

Re: [PATCH 1/5] Handle namespaced names for CodeView

2024-08-27 Thread Jeff Law





On 8/26/24 4:48 PM, Mark Harmstone wrote:

Run all CodeView names through a new function get_name, which chains
together a DIE's DW_AT_name with that of its parent to create a
C++-style name.

gcc/
* dwarf2codeview.cc (get_name): New function.
(add_enum_forward_def): Call get_name.
(get_type_num_enumeration_type): Call get_name.
(add_struct_forward_def): Call get_name.
(get_type_num_struct): Call get_name.
(add_variable): Call get_name.
(add_function): Call get_name.
* dwarf2out.cc (get_die_parent): Rename to dw_get_die_parent and make
non-static.
(generate_type_signature): Handle renamed get_die_parent.
* dwarf2out.h (dw_get_die_parent): Add declaration.

This series is fine.

I don't think I'm really adding much with the review step.  You're the 
expert here, so with your permission I'd like to ask the steering 
committee to ACK you as maintainer of the codeview bits.


Jeff

Re: [patch,avr] Overhaul avr-ifelse RTL optimization pass

2024-08-27 Thread Jeff Law





On 8/26/24 1:15 PM, Georg-Johann Lay wrote:



What the avr-ifelse pass does is try to replace 2 cbranch insns with
one compare insn and two branches.  It runs after reload and just prior
to .split2 (split_after_reload).  It must run after reload because
REG_CC comes into existence in .split2.  For example, the last case
belongs to transforming

    if (x == (unsigned) -1) goto A;
    if (x == (unsigned) -2) goto B;

to

    REG_CC = x compare (unsigned) -2;
    if (REG_CC >  0) goto A;
    if (REG_CC == 0) goto B;
Hmm.  I'd envisioned doing this in gimple, but as your example shows, 
it's not suitable for gimple (it's an extra expression evaluation).




(As an aside, this will be transformed further down the line to

    REG_CC = x compare (unsigned) -2;
    if (REG_CC == 0) goto B;
    if (REG_CC >= 0) goto A;

in order to avoid GTU.)
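The equivalence behind the transformation can be checked in plain C. This sketch is an illustration, not the avr backend code. Return values stand in for the jump targets, and `cmp3` models REG_CC as the sign of an unsigned three-way comparison, so testing it with `> 0` corresponds to the unsigned GTU condition:

```c
/* Results stand in for the jump targets: 'A', 'B', or 0 for fall-through.  */

/* Before: two compare-and-branch insns.  */
static int before (unsigned x)
{
  if (x == (unsigned) -1) return 'A';
  if (x == (unsigned) -2) return 'B';
  return 0;
}

/* Models REG_CC as the sign of an unsigned three-way comparison.  */
static int cmp3 (unsigned a, unsigned b)
{
  return (a > b) - (a < b);
}

/* After: one compare against (unsigned) -2 and two branches on its result.
   x > (unsigned) -2 can only hold for x == (unsigned) -1.  */
static int after (unsigned x)
{
  int cc = cmp3 (x, (unsigned) -2);
  if (cc > 0) return 'A';
  if (cc == 0) return 'B';
  return 0;
}

/* Reordered to avoid GTU: once equality is excluded, ">= 0" acts as "> 0".  */
static int after_no_gtu (unsigned x)
{
  int cc = cmp3 (x, (unsigned) -2);
  if (cc == 0) return 'B';
  if (cc >= 0) return 'A';
  return 0;
}
```

The reordered form works because the equality case has already branched away, so the remaining non-negative outcome can only be "greater".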

None of the code really looks AVR specific, so is there a good reason 
why we're not doing this in one of the target independent passes?


That target-independent pass would be compare-elim, which avr does
not use.  Some of the reasons why not using compare-elim I tried to
get across in the review for PR115830:
It feels like it'd fit in RTL jump optimizations, well outside 
compare-elim.  Though that may still be too high level.  So yea, let's 
keep it AVR specific.



Jeff

Re: [RFC/RFA] [PATCH v2 09/12] Add symbolic execution support.

2024-08-27 Thread Mariam Arutunian

On Tue, Aug 27, 2024 at 12:25 PM Richard Biener 
wrote:

> On Mon, Aug 26, 2024 at 5:26 PM Matevos Mehrabyan
>  wrote:
> >
> >
> >
> > On Mon, Aug 26, 2024 at 2:44 AM Jeff Law  wrote:
> >>
> >>
> >>
> >> On 8/20/24 5:41 AM, Richard Biener wrote:
> >>
> >> >
> >> > So the store-merging variant IIRC tracks a single overall source
> >> > only (unless it was extended and I missed that) and operates at
> >> > a byte granularity.  I did want to extend it to support vector
> shuffles
> >> > at one point (with two sources then), but didn't get to it.  The
> >> > current implementation manages to be quite efficient - mainly due
> >> > to the restriction to a single source I guess.
> >> >
> >> > How does that compare to the symbolic execution engine?
> >> >
> >> > What can the symbolic execution engine handle?  The store-merging
> >> > machinery can only handle plain copies, can the symbolic
> >> > execution engine tell that for example bits 3-7 are bits 1-5 from A
> >> > plus constant 9 with appropriately truncated result?
> >> Conceptually this is the kind of thing it's supposed to handle, but
> >> there may be implementation details that are missing for the case you
> want.
> >>
> >> More importantly, the execution engine has a limited set of expressions
> >> it knows how to evaluate, so there's a reasonable chance if you feed it
> >> something more general than what is typically seen in a CRC loop that
> >> it's going to give up because it doesn't know how to handle more than
> >> just a few operators.
> >>
> >>
> >
> > By using this symbolic execution engine, you can determine that bits 3-7
> > are bits 1-5 from A.
> > I think the documentation will help others to understand how it works
> > and what it does.
> > Since the documentation is not ready, here is a simple demo example:
> > For the following code:
> >
> > foo(byte A) {
> > byte tmp = A ^ 5;
> > byte result = tmp << 2;
> > result = result | 4;
> > return result;
> > }
> >
> > the symbolic executor would:
> >
> > define(A);  // A = <A7, A6, A5, A4, A3, A2, A1, A0>
> > // Here, each bit of A is mapped to its origin A. So A[3]->get_origin()
> > // will return A.
> > // Besides that, each bit has an index field that denotes its initial
> > // position.
> > // So A[3]->get_index() will return 3 even if it is moved or assigned to
> > // another variable.
> > xor(tmp, A, 5);  // tmp = <A7 ^ 0, A6 ^ 0, A5 ^ 0, A4 ^ 0, A3 ^ 0, A2 ^ 1, A1 ^ 0, A0 ^ 1>
> > shift_left(result, tmp, 2);  // result = <A5 ^ 0, A4 ^ 0, A3 ^ 0, A2 ^ 1, A1 ^ 0, A0 ^ 1, 0, 0>
> > or(result, result, 4);  // result = <A5 ^ 0, A4 ^ 0, A3 ^ 0, A2 ^ 1, A1 ^ 0, 1, 0, 0>, set result[2] = 1
> >
> > After these operations, we can examine the result and see that bits 3-7
> > of the result are bits 1-5 of the A argument.
> > For example, result[4] is the (A2 ^ 1) xor expression (can be checked by
> > is_a), so it has left and right operands: one of them is the A2 symbolic
> > bit, and the other is the constant 1.
> > So result[4]->get_left()->get_origin() will return A and
> > result[4]->get_left()->get_index() will return 2, since that was its
> > initial bit position.
> >
> > The symbolic executor supports only a few operations; it may need to be
> > extended to be used elsewhere.
> > Supported operations: AND, OR, XOR, SHIFT_RIGHT, SHIFT_LEFT, ADD, SUB,
> > MUL, and COMPLEMENT.
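To make the demo concrete, here is a much-simplified, hypothetical C model of the bit tracking described above. Each bit is either a known constant or `A[index] ^ c`, which is just enough to replay define/xor/shift_left/or for `foo` with an assumed 8-bit byte. It is not the patch's engine (which builds general expression trees), and all names are invented:

```c
#define NBITS 8  /* assume an 8-bit byte */

/* One symbolic bit: a known constant, or bit INDEX of the input A
   XORed with the constant XOR_CONST.  */
struct sym_bit
{
  int is_const;   /* nonzero: the bit is the constant VALUE */
  int value;
  int index;      /* otherwise: the bit is A[INDEX] ^ XOR_CONST */
  int xor_const;
};

typedef struct sym_bit sym_byte[NBITS];  /* element I is bit I */

static const struct sym_bit zero_bit = { 1, 0, 0, 0 };
static const struct sym_bit one_bit = { 1, 1, 0, 0 };

/* define(A): bit I of A is A[I].  */
static void sym_define (sym_byte a)
{
  for (int i = 0; i < NBITS; i++)
    a[i] = (struct sym_bit) { 0, 0, i, 0 };
}

/* xor(dst, src, c) for a constant C: flip the constant part of each bit.  */
static void sym_xor_const (sym_byte dst, const sym_byte src, unsigned c)
{
  for (int i = 0; i < NBITS; i++)
    {
      int cb = (c >> i) & 1;
      dst[i] = src[i];
      if (dst[i].is_const)
        dst[i].value ^= cb;
      else
        dst[i].xor_const ^= cb;
    }
}

/* shift_left(dst, src, n): bits move up, zeros enter at the bottom.  */
static void sym_shift_left (sym_byte dst, const sym_byte src, int n)
{
  sym_byte tmp;  /* copy first so that DST may alias SRC */
  for (int i = 0; i < NBITS; i++)
    tmp[i] = i >= n ? src[i - n] : zero_bit;
  for (int i = 0; i < NBITS; i++)
    dst[i] = tmp[i];
}

/* or(dst, src, c) for a constant C: a 1 in C forces the bit to 1.  */
static void sym_or_const (sym_byte dst, const sym_byte src, unsigned c)
{
  for (int i = 0; i < NBITS; i++)
    dst[i] = ((c >> i) & 1) ? one_bit : src[i];
}

/* Evaluate the symbolic byte for a concrete input A.  */
static unsigned sym_eval (const sym_byte s, unsigned a)
{
  unsigned r = 0;
  for (int i = 0; i < NBITS; i++)
    {
      unsigned bit = s[i].is_const
                     ? (unsigned) s[i].value
                     : ((a >> s[i].index) & 1) ^ (unsigned) s[i].xor_const;
      r |= bit << i;
    }
  return r;
}

/* The demo function: result = ((A ^ 5) << 2) | 4.  */
static void sym_foo (sym_byte result)
{
  sym_byte a, tmp;
  sym_define (a);
  sym_xor_const (tmp, a, 5);
  sym_shift_left (result, tmp, 2);
  sym_or_const (result, result, 4);
}
```

After `sym_foo`, element 4 of the result records index 2 with xor constant 1 (the `A2 ^ 1` bit), and `sym_eval` lets the model be cross-checked against the concrete expression `((A ^ 5) << 2) | 4` for all 256 inputs.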
>
> OK, so it seems it should be able to handle what the bswap pass
> requires as well (just with unnecessary bit precision and possibly some
> memory/compile-time overhead).  The bswap pass also handles
> {L,R}ROTATE_EXPR but that should be trivial to add if you can handle
> shifts.  It can also handle conversions (zero-/sign-extend and
> truncate); those should be easy as well.


Yes, the support can be added.



>

> Can it handle
>
>   tmp = A & 0x00ff00ff00ff;
>   tmp2 = B & 0xff00ff00ff00;
>   result = tmp | tmp2;
>
> ?  That is, make recognizing a blend (or as extension a shuffle) of
> two sources into one?


Yes, it can handle these kinds of cases too.
In this case we will have
tmp = <0, 0, A13, A12, 0, 0, A10, A9, 0, 0, A6, A5, 0,0,0,0> (Simple
optimizations, such as A15 & 0 -> 0 are performed in place.)
tmp2 = 
result = 
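A byte-granular version of that question (does `(A & M1) | (B & M2)` take each result byte whole from A, whole from B, or make it zero?) can be sketched in C as below. The helper is hypothetical and only inspects the two masks. It is neither the store-merging/bswap code nor the symbolic engine:

```c
#include <stdint.h>

/* Hypothetical helper: classify (a & m1) | (b & m2) byte by byte.
   On success returns 1 and sets src[i] (byte 0 = least significant) to
   'A' or 'B' when that result byte is taken whole from a or b, or '0'
   when it is always zero.  Returns 0 if any byte is a partial byte or
   mixes both sources, i.e. the expression is not a byte blend.  */
static int classify_blend (uint64_t m1, uint64_t m2, char src[8])
{
  for (int i = 0; i < 8; i++)
    {
      unsigned b1 = (m1 >> (8 * i)) & 0xff;
      unsigned b2 = (m2 >> (8 * i)) & 0xff;
      if (b1 == 0xff && b2 == 0)
        src[i] = 'A';
      else if (b1 == 0 && b2 == 0xff)
        src[i] = 'B';
      else if (b1 == 0 && b2 == 0)
        src[i] = '0';
      else
        return 0;
    }
  return 1;
}
```

With the masks 0x00ff00ff00ff and 0xff00ff00ff00, bytes 0-5 alternate A, B, A, B, A, B and the top two bytes are zero. A two-source shuffle would additionally need to track which source byte lands in each position, which is where a symbolic engine goes beyond this mask check.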


>
> I would guess that parameterizing the engine on the granularity (byte
> vs. bit) would be possible to implement, as well as possibly making the
> granularity variable so as to "split" bits only when necessary?  I'm
> thinking of the cost of simulating a whole function "forward", keeping a
> lattice of SSA name -> symbolic execution result here.  Cost in terms of
> both memory and compile-time.
>

Yes, parameterizing the engine based on granularity (byte vs. bit) is
possible, but it would require significant changes to the code.



> Note it shouldn't be a requirement for you to merge the bswap
> byte-tracking code but it would be good to have the symbolic execution
> engine extensible enough to eventually cover what bswap does today and
> make the long-wanted extension of recognizing two-source vector permutes
> possible.
>

We are ready to add that in the future if needed.

Best regards,
Mariam


Richard.
>
> >>
> >>
> >> >
> >

Re: New version of unsiged patch

2024-08-27 Thread Thomas Koenig


Steve,


On Sun, Aug 18, 2024 at 12:10:18PM +0200, Thomas Koenig wrote:


this version of the patch includes DOT_PRODUCT, MATMUL and quite
a few improvements for simplification.


Thomas,

Your updated patch applied cleanly on top-of-tree gcc.
Bootstrap and regression testing on amd64-*-freebsd
completed without any oddities.

I'll read through the patch over the next few days.


Any comments so far?  I know the patch is very big :-) but
I can also incorporate comments you make before you have read
the whole thing.

Best regards

Thomas

Re: [PATCH] Libquadmath: update doc for some constants

2024-08-27 Thread FX Coudert

kind ping
Given it’s a doc patch, I think it might fall under the obvious rule, and will 
commit in a week if there is no objection.

FX

> As reported by Peter Randall, the description of three constants in 
> libquadmath is wrong. Attached patch fixes them.
> 
> OK to push?
> 
> FX
> 
> 
> libquadmath/ChangeLog:
> 
> * libquadmath.texi (M_LOG2Eq, M_LOG10Eq, M_2_PIq): Fix
> description of these constants.
> 
> 


0001-Libquadmath-update-doc-for-some-constants.patch
Description: Binary data

Re: [PATCH v2 9/9] RISC-V: Add cost model asserts

2024-08-27 Thread Patrick O'Neill




On 8/27/24 08:19, Jeff Law wrote:



On 8/26/24 6:37 PM, Patrick O'Neill wrote:
This patch adds some advanced checking to assert that the emitted costs
match emitted patterns for const_vecs.

Flow:
Costing: Insert into hashmap>
Expand: Check for membership in hashmap
  -> Not in hashmap: ignore, this wasn't costed
  -> In hashmap: Iterate over vec
     -> if RTX not in hashmap: Ignore, this wasn't costed (hash collision)
     -> if RTX in hashmap: Assert enum is expected

There are no false positive asserts with this flow.
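The flow above can be modelled in a few lines of C. This is a hypothetical sketch: integer ids stand in for RTXes, a tiny fixed-size table with a deliberately weak hash stands in for the hash map, and a -1 return marks the point where the patch would assert:

```c
enum cost_kind { COST_DUPLICATE, COST_SERIES, COST_OTHER };

#define NBUCKETS 4
#define MAX_PER_BUCKET 8

struct entry { int rtx_id; enum cost_kind kind; };
struct bucket { struct entry e[MAX_PER_BUCKET]; int n; };
static struct bucket table[NBUCKETS];

/* Deliberately weak hash so that collisions are easy to provoke.  */
static unsigned hash_rtx (int rtx_id)
{
  return (unsigned) rtx_id % NBUCKETS;
}

/* Costing side: remember which pattern this constant was costed as.  */
static void record_costed (int rtx_id, enum cost_kind kind)
{
  struct bucket *b = &table[hash_rtx (rtx_id)];
  if (b->n < MAX_PER_BUCKET)
    b->e[b->n++] = (struct entry) { rtx_id, kind };
}

/* Expand side: 0 = not costed (ignore), 1 = matches the costed kind,
   -1 = mismatch (where the patch would assert).  */
static int check_expanded (int rtx_id, enum cost_kind kind)
{
  struct bucket *b = &table[hash_rtx (rtx_id)];
  for (int i = 0; i < b->n; i++)
    if (b->e[i].rtx_id == rtx_id)
      return b->e[i].kind == kind ? 1 : -1;
  return 0;  /* empty bucket, or only colliding entries */
}
```

Only an exact key match can trigger the mismatch path; a hash collision with some other costed constant falls through to "not costed", which is why the flow produces no false-positive asserts.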

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Add RTL_CHECKING gated
asserts.
* config/riscv/riscv.cc (riscv_const_insns): Ditto.
* config/riscv/riscv-v.h (insert_expected_pattern): Add helper function
to insert hash collisions into hash map vec key.
(get_expected_costed_type): Add helper function to get the expected
cost type for a given rtx pattern.
I suspect this is going to be problematical at some point, 
particularly since we can get hash conflicts for cases that aren't 
problematical.


The concern here is that the hash -> vec mapping will end up containing 
too many entries in the vec due to hash collisions?




In general we also want to avoid #ifdefs -- we're not clean in that 
regard by any means, but much of that cruft has been converted to a 
runtime check.  The basic idea is that conditionally compiled code 
like that tends to be problematical for various checks like unused 
variables/parameters, use-without-definition objects, etc.



I'd tend to prefer to drop this, but I'm not steadfastly against 
including.


Makes sense - the more I've thought about it the less happy I am with 
its implementation. I'll put it on the backburner for now and see if 
there's a more elegant solution I can poke at.


Thanks for the review!

Patrick

Re: [PATCH] Libquadmath: update doc for some constants

2024-08-27 Thread Tobias Burnus


Hi FX,

FX Coudert wrote:

Given it’s a doc patch, I think it might fall under the obvious rule, and will 
commit in a week if there is no objection.


The patch clearly fixes a bug in the current specification and is fine, 
I just wonder …



* libquadmath.texi (M_LOG2Eq, M_LOG10Eq, M_2_PIq): Fix
description of these constants.



diff --git a/libquadmath/libquadmath.texi b/libquadmath/libquadmath.texi
index dc2a9ff374b..ce4accf6421 100644
--- a/libquadmath/libquadmath.texi
+++ b/libquadmath/libquadmath.texi

…

  @item @code{M_PI_2q}: pi divided by two
  @item @code{M_PI_4q}: pi divided by four
  @item @code{M_1_PIq}: one over pi
-@item @code{M_2_PIq}: one over two pi
+@item @code{M_2_PIq}: two over pi
  @item @code{M_2_SQRTPIq}: two over square root of pi
  @item @code{M_SQRT2q}: square root of 2
  @item @code{M_SQRT1_2q}: one over square root of 2


... whether we should change the "over" which somehow sounds odd. "two 
divided by pi" sounds better to me than "two over pi".


I do note, however, that the following documentation uses a slightly 
different wording:


"M_2_PI -Two times the reciprocal of pi."

https://www.gnu.org/software/libc/manual/html_node/Mathematical-Constants.html

Hence, while I am fine with the change, I think we should replace the 
"over" wording (multiple times) and move either to "divided by" or 
"(…times) the reciprocal of".
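For reference, here is the corrected M_2_PIq entry in both of the wordings mentioned, next to the value that the old "one over two pi" text described:

```latex
\texttt{M\_2\_PIq} = \frac{2}{\pi} = 2 \cdot \frac{1}{\pi} \approx 0.63662,
\qquad \text{whereas} \qquad \frac{1}{2\pi} \approx 0.15915
```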


Tobias

[Committed v2 1/9] RISC-V: Fix vid const vector expander for non-npatterns size steps

2024-08-27 Thread Patrick O'Neill




On 8/27/24 07:55, Jeff Law wrote:



On 8/26/24 6:36 PM, Patrick O'Neill wrote:

Prior to this patch the expander would emit vectors like:
{ 0, 0, 5, 5, 10, 10, ...}
as:
{ 0, 0, 2, 2,  4,  4, ...}

This patch sets the step size to the requested value.
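Reading the two vectors above as npatterns = 2 with a requested step of 5, the intended element values can be sketched in scalar C. The encoding here is an assumption (element i is (i / npatterns) * step), and this is not the riscv-v.cc expander:

```c
/* Hypothetical model of the intended constant: NPATTERNS interleaved
   series, all starting at 0 and advancing by STEP every NPATTERNS
   elements.  */
static long stepped_elt (unsigned i, unsigned npatterns, long step)
{
  return (long) (i / npatterns) * step;
}

static void fill_stepped (long *out, unsigned n, unsigned npatterns, long step)
{
  for (unsigned i = 0; i < n; i++)
    out[i] = stepped_elt (i, npatterns, step);
}
```

With npatterns = 2 and step = 5 this gives { 0, 0, 5, 5, 10, 10 }. Passing a step of 2 instead reproduces the wrong { 0, 0, 2, 2, 4, 4 } shape described above.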

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Fix STEP size in
expander.

OK
jeff


Committed.

[Committed v2 2/9] RISC-V: Reorder insn cost match order to match corresponding expander match order

2024-08-27 Thread Patrick O'Neill




On 8/27/24 07:56, Jeff Law wrote:



On 8/26/24 6:36 PM, Patrick O'Neill wrote:

The corresponding expander (riscv-v.cc:expand_const_vector) matches
const_vec_duplicate_p before const_vec_series_p. Reorder to match this
behavior when calculating costs.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_const_insns): Relocate.

Already ACK'd.

jeff


Committed.

Re: [PATCH v2 3/9] RISC-V: Handle case when constant vector construction target rtx is not a register

2024-08-27 Thread Patrick O'Neill




On 8/27/24 08:00, Jeff Law wrote:



On 8/26/24 6:36 PM, Patrick O'Neill wrote:
This manifests in RTL that is optimized away, which causes runtime failures
in the testsuite. Update all patterns to use a temp result register if
required.


gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Use tmp register if
needed.
OK.  Just a note below, I don't think you necessarily need to change 
anything.





Signed-off-by: Patrick O'Neill 
---
  gcc/config/riscv/riscv-v.cc | 73 +
  1 file changed, 41 insertions(+), 32 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index a3039a2cb19..aea4b9b872b 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1150,26 +1150,29 @@ static void
  expand_const_vector (rtx target, rtx src)
  {
    machine_mode mode = GET_MODE (target);
+  rtx result = register_operand (target, mode) ? target : gen_reg_rtx (mode);


So a cheaper test would be REG_OR_SUBREG_P rather than register_operand.

While testing register_operand does check the mode, if we have a 
mismatch on the modes between src/target, then the copy from result to 
target is going to fail.


But again, I don't think you really need to change anything here. Just 
pointing out the marginally more efficient test.


jeff


Committed.

Thanks for the note! If I have a future series touching this I'll swap 
out this test and other cases of it as a patch.

[Committed v2 4/9] RISC-V: Emit costs for bool and stepped const vectors

2024-08-27 Thread Patrick O'Neill




On 8/27/24 08:01, Jeff Law wrote:



On 8/26/24 6:36 PM, Patrick O'Neill wrote:

These cases are handled in the expander
(riscv-v.cc:expand_const_vector). We need the vector builder to detect
these cases so extract that out into a new riscv-v.h header file.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (class rvv_builder): Move to riscv-v.h.
* config/riscv/riscv.cc (riscv_const_insns): Emit placeholder costs for
bool/stepped const vectors.
* config/riscv/riscv-v.h: New file.

Already ACK'd.

jeff


Committed.

[Committed v2 5/9] RISC-V: Handle 0.0 floating point pattern costing to match const_vector expander

2024-08-27 Thread Patrick O'Neill



On 8/27/24 08:02, Jeff Law wrote:



On 8/26/24 6:36 PM, Patrick O'Neill wrote:

The comment previously here stated that the Wc0/Wc1 cases are handled by
the vi constraint but that is not true for the 0.0 Wc0 case.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_const_insns): Handle 0.0 floating-point
case.

OK
Jeff


Committed with revised changelog:

RISC-V: Handle 0.0 floating point pattern costing to match const_vector expander

The comment previously here stated that the Wc0/Wc1 cases are handled by
the vi constraint but that is not true for the 0.0 Wc0 case.

gcc/ChangeLog:

* config/riscv/riscv-v.h (valid_vec_immediate_p): Add new helper.
* config/riscv/riscv-v.cc (valid_vec_immediate_p): Ditto.
(expand_const_vector): Use new helper.
* config/riscv/riscv.cc (riscv_const_insns): Handle 0.0 floating-point
case.

[Committed v2 6/9] RISC-V: Allow non-duplicate bool patterns in expand_const_vector

2024-08-27 Thread Patrick O'Neill




On 8/27/24 08:04, Jeff Law wrote:



On 8/26/24 6:37 PM, Patrick O'Neill wrote:

Currently we assert when encountering a non-duplicate boolean vector.
This patch allows non-duplicate vectors to fall through to the
gcc_unreachable and assert there.

This will be useful when adding a catch-all pattern to emit costs and
handle arbitrary vectors.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Allow non-duplicate
to fall through other patterns before asserting.

OK
jeff


Committed.

[Committed v2 7/9] RISC-V: Move helper functions above expand_const_vector

2024-08-27 Thread Patrick O'Neill




On 8/27/24 08:04, Jeff Law wrote:



On 8/26/24 6:37 PM, Patrick O'Neill wrote:

These subroutines will be used in expand_const_vector in a future patch.
Relocate so expand_const_vector can use them.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vector_init_insert_elems): Relocate.
(expand_vector_init_trailing_same_elem): Ditto.

Already ACK'd.
jeff


Committed.

Re: [PATCH] c++: Don't show constructor internal name in error message [PR105483]

2024-08-27 Thread Simon Martin

Hi Jason,

On 26 Aug 2024, at 19:30, Jason Merrill wrote:

> On 8/26/24 12:49 PM, Simon Martin wrote:
>> We mention 'X::__ct' instead of 'X::X' in the "names the constructor,
>> not the type" error for this invalid code:
>>
>> === cut here ===
>> struct X {};
>> void g () {
>>X::X x;
>> }
>> === cut here ===
>>
>> The problem is that we use %<%T::%D%> to build the error message, while
>> %qE does exactly what we need since we have DECL_CONSTRUCTOR_P. This is
>> what this patch does, along with skipping until the end of the statement
>> to avoid emitting extra (useless) errors.
>>
>> Successfully tested on x86_64-pc-linux-gnu.
>>
>>  PR c++/105483
>>
>> gcc/cp/ChangeLog:
>>
>>  * parser.cc (cp_parser_expression_statement): Use %qE instead of
>>  incorrect %<%T::%D%>, and skip to end of statement.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * g++.dg/tc1/dr147.C: Adjust test expectation.
>>  * g++.dg/diagnostic/pr105483.C: New test.
>>
>> ---
>>   gcc/cp/parser.cc   | 7 ---
>>   gcc/testsuite/g++.dg/diagnostic/pr105483.C | 7 +++
>>   gcc/testsuite/g++.dg/tc1/dr147.C   | 2 +-
>>   3 files changed, 12 insertions(+), 4 deletions(-)
>>   create mode 100644 gcc/testsuite/g++.dg/diagnostic/pr105483.C
>>
>> diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
>> index 28ebf2beb60..ef4e3838a86 100644
>> --- a/gcc/cp/parser.cc
>> +++ b/gcc/cp/parser.cc
>> @@ -13240,10 +13240,11 @@ cp_parser_expression_statement (cp_parser* parser, tree in_statement_expr)
>> && DECL_CONSTRUCTOR_P (get_first_fn (statement)))
>>  {
>>/* A::A a; */
>> -  tree fn = get_first_fn (statement);
>>error_at (token->location,
>> -"%<%T::%D%> names the constructor, not the type",
>> -DECL_CONTEXT (fn), DECL_NAME (fn));
>> +"%qE names the constructor, not the type",
>> +get_first_fn (statement));
>> +  cp_parser_skip_to_end_of_block_or_statement (parser);
>
> Why block_or_statement rather than just _statement?
It was a mistake, thanks for catching it!

> Maybe move the skip+return out of this block to share it with the 
> preceding typename error?
Good idea. It’s a tiny bit more involved than just moving, because 
we’d miss genuine errors emitted by 
cp_parser_consume_semicolon_at_end_of_statement (e.g. breaking the 
c-c++-common/pr44515.c test, among others); however, the updated patch 
does what you’re suggesting. I have successfully tested on 
x86_64-pc-linux-gnu. OK for trunk?

Thanks!
   Simon
>
> Jason
From 00b9f316f7d20f75b150c23fa4d4c9bdc02191b8 Mon Sep 17 00:00:00 2001
From: Simon Martin 
Date: Mon, 26 Aug 2024 14:09:46 +0200
Subject: [PATCH] c++: Don't show constructor internal name in error message [PR105483]

We mention 'X::__ct' instead of 'X::X' in the "names the constructor,
not the type" error for this invalid code:

=== cut here ===
struct X {};
void g () {
  X::X x;
}
=== cut here ===

The problem is that we use %<%T::%D%> to build the error message, while
%qE does exactly what we need since we have DECL_CONSTRUCTOR_P. This is
what this patch does.

It also skips until the end of the statement and returns error_mark_node
for this and the preceding if block, to avoid emitting extra (useless)
errors.

Successfully tested on x86_64-pc-linux-gnu.

PR c++/105483

gcc/cp/ChangeLog:

* parser.cc (cp_parser_expression_statement): Use %qE instead of
incorrect %<%T::%D%>. Skip to end of statement and return
error_mark_node in case of error.

gcc/testsuite/ChangeLog:

* g++.dg/parse/error36.C: Adjust test expectation.
* g++.dg/tc1/dr147.C: Likewise.
* g++.old-deja/g++.other/typename1.C: Likewise.
* g++.dg/diagnostic/pr105483.C: New test.

---
 gcc/cp/parser.cc | 14 +-
 gcc/testsuite/g++.dg/diagnostic/pr105483.C   |  7 +++
 gcc/testsuite/g++.dg/parse/error36.C |  4 ++--
 gcc/testsuite/g++.dg/tc1/dr147.C |  2 +-
 gcc/testsuite/g++.old-deja/g++.other/typename1.C |  2 +-
 5 files changed, 20 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/pr105483.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 28ebf2beb60..a722641be34 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -13232,18 +13232,22 @@ cp_parser_expression_statement (cp_parser* parser, tree in_statement_expr)
   if (cp_lexer_next_token_is_not (parser->lexer, CPP_SEMICOLON)
   && !cp_parser_uncommitted_to_tentative_parse_p (parser))
 {
+  bool has_errored = true;
   if (TREE_CODE (statement) == SCOPE_REF)
error_at (token->location, "need % before %qE because "
  "%qT is a dependent scope",
  statement, TREE_OPERAND (statement, 0));
   else if (is_overloaded_fn (statement)
   && DECL_CONSTRUCTOR_P (get_first_fn (statement)))
+   /* A::A a; */
+   error_

[PATCH] Tweak documentation of ASM_INPUT_P

2024-08-27 Thread Richard Sandiford

The documentation of ASM_INPUT_P implied that the flag has no
effect on ASM_EXPRs that have operands (and which therefore must be
extended asms).  In fact we require ASM_INPUT_P to be false for all
extended asms.

Tested on aarch64-linux-gnu.  OK to install?

Richard


gcc/
* tree.h (ASM_INPUT_P): Fix documentation.
---
 gcc/tree.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/tree.h b/gcc/tree.h
index 5dcbb2fb5dd..c501019717f 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1428,8 +1428,9 @@ class auto_suppress_location_wrappers
 #define ASM_INPUTS(NODE)TREE_OPERAND (ASM_EXPR_CHECK (NODE), 2)
 #define ASM_CLOBBERS(NODE)  TREE_OPERAND (ASM_EXPR_CHECK (NODE), 3)
 #define ASM_LABELS(NODE)   TREE_OPERAND (ASM_EXPR_CHECK (NODE), 4)
-/* Nonzero if we want to create an ASM_INPUT instead of an
-   ASM_OPERAND with no operands.  */
+/* Nonzero if the asm is a basic asm, zero if it is an extended asm.
+   Basic asms use a plain ASM_INPUT insn pattern whereas extended asms
+   use an ASM_OPERANDS insn pattern.  */
 #define ASM_INPUT_P(NODE) (ASM_EXPR_CHECK (NODE)->base.static_flag)
 #define ASM_VOLATILE_P(NODE) (ASM_EXPR_CHECK (NODE)->base.public_flag)
 /* Nonzero if we want to consider this asm as minimum length and cost
-- 
2.25.1

Re: [patch,avr] Overhaul avr-ifelse RTL optimization pass

2024-08-27 Thread Georg-Johann Lay


Am 27.08.24 um 17:28 schrieb Jeff Law:


On 8/26/24 1:15 PM, Georg-Johann Lay wrote:


What the avr-ifelse pass does is try to replace 2 cbranch insns with
one compare insn and two branches.  It runs after reload and just prior
to .split2 (split_after_reload).  It must run after reload because
REG_CC comes into existence in .split2.  For example, the last case
belongs to transforming

    if (x == (unsigned) -1) goto A;
    if (x == (unsigned) -2) goto B;

to

    REG_CC = x compare (unsigned) -2;
    if (REG_CC >  0) goto A;
    if (REG_CC == 0) goto B;
Hmm.  I'd envisioned doing this in gimple, but as your example shows, 
it's not suitable for gimple (it's an extra expression evaluation).


Maybe something like a spaceship operator for integer modes would help?

All of these situations are effectively sship (spaceship) situations, but
sship detection in gimple could do even more:

char fun (char x)
{
if (x > '0') return 10;
return x == '0';
}

The .optimized tree dump reads something like:

char fun (char x)
{
  _Bool _1;
  char _2;
  char _4;

   [local count: 1073741824]:
  if (x_3(D) > 48)
goto ; [21.72%]
  else
goto ; [78.28%]

   [local count: 840525096]:
  _1 = x_3(D) == 48;
  _4 = (char) _1;

   [local count: 1073741824]:
  # _2 = PHI <10(2), _4(3)>

  return _2;
}

The problem with this is that the code gets expanded like:

   cbranch
   REG = const ;; clobbers CC
   cbranch

With a spaceship insn, the backend could avoid that, though it
is unclear to me to which rtl this should be expanded.
JUMP_INSNs can only have one JUMP_LABEL (though I recently learned
that JUMP_INSN can have more than one label when they are registered
as insn notes, but I never tried that.)

For example, a spaceship insn could provide operands like

$0, $1: The ops for the <=> comparison
$2: label_ref for <
$3: label_ref for =
$4: label_ref for >

Or, more generally, provide set rtxes as $2, $3, $4, like pc=pc for the
fallthrough case, (set reg:QI 1) etc. so that jumps can be avoided.
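At the source level, what a three-way comparison buys for `fun` can be sketched in C: one comparison of x against '0' yields -1, 0 or +1, and the three outcomes select the three results without an intervening compare. `fun_sship` is a made-up name, and this sketch says nothing about how such an insn would be represented in RTL:

```c
/* Johann's example as written.  */
static char fun (char x)
{
  if (x > '0') return 10;
  return x == '0';
}

/* The same function driven by a single three-way comparison.  */
static char fun_sship (char x)
{
  int cc = (x > '0') - (x < '0');  /* -1, 0 or +1 */
  if (cc > 0) return 10;
  if (cc == 0) return 1;
  return 0;
}
```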

Johann



(As an aside, this will be transformed further down the line to

    REG_CC = x compare (unsigned) -2;
    if (REG_CC == 0) goto B;
    if (REG_CC >= 0) goto A;

in order to avoid GTU.)

None of the code really looks AVR specific, so is there a good reason 
why we're not doing this in one of the target independent passes?


That target-independent pass would be compare-elim, which avr does
not use.  Some of the reasons why not using compare-elim I tried to
get across in the review for PR115830:
It feels like it'd fit in RTL jump optimizations, well outside 
compare-elim.  Though that may still be too high level.  So yea, let's 
keep it AVR specific.



Jeff

Re: [PATCH] c++: Don't show constructor internal name in error message [PR105483]

2024-08-27 Thread Jason Merrill


On 8/27/24 1:15 PM, Simon Martin wrote:

Hi Jason,

On 26 Aug 2024, at 19:30, Jason Merrill wrote:


On 8/26/24 12:49 PM, Simon Martin wrote:

We mention 'X::__ct' instead of 'X::X' in the "names the constructor,
not the type" error for this invalid code:

=== cut here ===
struct X {};
void g () {
X::X x;
}
=== cut here ===

The problem is that we use %<%T::%D%> to build the error message, while
%qE does exactly what we need since we have DECL_CONSTRUCTOR_P. This is
what this patch does, along with skipping until the end of the statement
to avoid emitting extra (useless) errors.

Successfully tested on x86_64-pc-linux-gnu.

PR c++/105483

gcc/cp/ChangeLog:

* parser.cc (cp_parser_expression_statement): Use %qE instead of
incorrect %<%T::%D%>, and skip to end of statement.

gcc/testsuite/ChangeLog:

* g++.dg/tc1/dr147.C: Adjust test expectation.
* g++.dg/diagnostic/pr105483.C: New test.

---
   gcc/cp/parser.cc   | 7 ---
   gcc/testsuite/g++.dg/diagnostic/pr105483.C | 7 +++
   gcc/testsuite/g++.dg/tc1/dr147.C   | 2 +-
   3 files changed, 12 insertions(+), 4 deletions(-)
   create mode 100644 gcc/testsuite/g++.dg/diagnostic/pr105483.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 28ebf2beb60..ef4e3838a86 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -13240,10 +13240,11 @@ cp_parser_expression_statement (cp_parser* parser, tree in_statement_expr)
   && DECL_CONSTRUCTOR_P (get_first_fn (statement)))
{
  /* A::A a; */
- tree fn = get_first_fn (statement);
  error_at (token->location,
-   "%<%T::%D%> names the constructor, not the type",
-   DECL_CONTEXT (fn), DECL_NAME (fn));
+   "%qE names the constructor, not the type",
+   get_first_fn (statement));
+ cp_parser_skip_to_end_of_block_or_statement (parser);


Why block_or_statement rather than just _statement?

It was a mistake, thanks for catching it!


Maybe move the skip+return out of this block to share it with the
preceding typename error?

Good idea. It’s a tiny bit more involved than just moving, because
we’d miss genuine errors emitted by
cp_parser_consume_semicolon_at_end_of_statement (e.g. breaking the
c-c++-common/pr44515.c test, among others); however, the updated patch
does what you’re suggesting. I have successfully tested on
x86_64-pc-linux-gnu. OK for trunk?


OK.

Jason

Re: New version of unsiged patch

2024-08-27 Thread Steve Kargl

On Tue, Aug 27, 2024 at 06:46:08PM +0200, Thomas Koenig wrote:
> Steve,
> 
> > On Sun, Aug 18, 2024 at 12:10:18PM +0200, Thomas Koenig wrote:
> > > 
> > > this version of the patch includes DOT_PRODUCT, MATMUL and quite
> > > a few improvements for simplification.
> > 
> > Thomas,
> > 
> > Your updated patch applied cleanly on top-of-tree gcc.
> > Bootstrap and regression testing on amd64-*-freebsd
> > completed without any oddities.
> > 
> > I'll read through the patch over the next few days.
> 
> Any comments so far?  I know the patch is very big :-) but
> I can also incorporate comments you make before you have read
> the whole thing.
> 

Unfortunately, I got sidetracked on my half-cycle trig function patch.

I'll move your patch to the top of my queue as you've done quite
a bit of work and I would like to see it move forward.  I did read
the J3 paper and J3 github repo archive for the issue.  It looks 
like a well-thought out approach to the unsigned problem.

-- 
Steve

Re: [PATCH v1] Provide new GCC builtin __builtin_get_counted_by [PR116016]

2024-08-27 Thread Bill Wendling

On Tue, Aug 27, 2024 at 6:58 AM Qing Zhao  wrote:
> > On Aug 27, 2024, at 02:17, Martin Uecker  wrote:
> > Am Montag, dem 26.08.2024 um 17:21 -0700 schrieb Kees Cook:
> >> On Mon, Aug 26, 2024 at 11:01:08PM +0200, Martin Uecker wrote:
> >>> Am Montag, dem 26.08.2024 um 13:30 -0700 schrieb Kees Cook:
>  On Mon, Aug 26, 2024 at 07:30:15PM +, Qing Zhao wrote:
> > Hi, Martin,
> >
> > Looks like that there is some issue when I tried to use the _Generic 
> > for the testing cases, and then I narrowed down to a
> > small testing case that shows the problem without any change to GCC.
> >
> > [opc@qinzhao-ol8u3-x86 gcc]$ cat t1.c
> > struct annotated {
> >  char b;
> >  int c[];
> > } *array_annotated;
> > extern void * counted_by_ref (int *);
> >
> > int main(int argc, char *argv[])
> > {
> >  typeof(counted_by_ref (array_annotated->c)) ret
> >= counted_by_ref (array_annotated->c);
> >   _Generic (ret, void* : (void)0, default: *ret = 10);
> >
> >  return 0;
> > }
> > [opc@qinzhao-ol8u3-x86 gcc]$ /home/opc/Install/latest/bin/gcc t1.c
> > t1.c: In function ‘main’:
> > t1.c:12:44: warning: dereferencing ‘void *’ pointer
> >   12 |   _Generic (ret, void* : (void)0, default: *ret = 10);
> >  |^~~~
> > t1.c:12:49: error: invalid use of void expression
> >   12 |   _Generic (ret, void* : (void)0, default: *ret = 10);
> >  | ^
> 
>  I implemented it like this[1] in the Linux kernel. So yours could be:
> 
>  struct annotated {
>   char b;
>   int c[] __attribute__((counted_by(b)));
>  };
>  extern struct annotated *array_annotated;
> 
>  int main(int argc, char *argv[])
>  {
>   typeof(_Generic(__builtin_get_counted_by(array_annotated->c),
> void *: (size_t *)NULL,
> default: __builtin_get_counted_by(array_annotated->c)))
>  ret = __builtin_get_counted_by(array_annotated->c);
>   if (ret)
>  *ret = 10;
> 
>   return 0;
>  }
> 
>  It's a bit cumbersome, but it does what's needed.
> 
>  This is, however, just doing exactly what Bill has suggested: it is
>  converting the (void *)NULL into (size_t *)NULL when there is no
>  counted_by annotation...
> 
>  -Kees
> 
>  [1] 
>  https://lore.kernel.org/linux-hardening/20240822231324.make.666-k...@kernel.org/
> >>>
> >>> Interesting. Will __builtin_get_counted_by(array_annotated->c) give
> >>> a null pointer (or an invalid pointer) of the correct type if
> >>> array_annotated is a null pointer of an annotated struct type?
> >>
> >> If you mean this part:
> >>
> >> typeof(P) __obj_ptr = NULL; \
> >> /* Just query the counter type for type_max checking. */ \
> >> typeof(_Generic(__flex_counter(__obj_ptr->FAM), \
> >> void *: (size_t *)NULL, \
> >> default: __flex_counter(__obj_ptr->FAM))) \
> >> __counter_type_ptr = NULL; \
> >>
> >> Where __obj_ptr starts as NULL, then yes. (Or at least, yes it does
> >> currently with Qing's GCC patch and Bill's Clang patch.)
> >
> > Does __builtin_get_counted_by not evaluate its argument?
>
> __builtin_get_counted_by is currently implemented as a C reserved word,
> purely in the C parser, as a C operator.
>
> So, it doesn’t apply complicated evaluations to its argument.
> I think this provides simple functionality that is sufficient for what is needed.
> If not, please let me know.
>
>
> > In any
> > case, I think this should be documented whether this is
> > supposed to work (or not).
> Okay.
> >
> >>
> >>> I also wonder a bit about the multiple macro evaluations of the arguments
> >>> for P and SIZE.
> >>
> >> I tried to design it so they aren't used with anything that should
> >> have side-effects.
> >
> > I was more concerned about the cost of macro expansions on
> > compile times. I would do:
> >
> > __auto_type __FOO = (FOO);
> >
> > for all macro parameters that are evaluated multiple times
> > and are expressions which might contain macros themselves.
> >
> > There is also the issue of evaluation of typeof for variably modified
> > types, which might not currently affect the kernel, but this would
> > also become safer for such types.
> >
> >
> >> Anyway, if __builtin_get_counted_by returns (size_t *)NULL then I think
> >> the _Generic wrapping isn't needed. That would make it easier to use?
> >
> > It would make it easier for your use case.  I wonder though
> > whether other people might want to have the compile time error
> > when there is no attribute.
>
> Then this goes back to the previous discussion: if we take the approach of
> the unary operator __counted_by(PTR), then the other builtin,
> __builtin_has_attribute(PTR, counted_by), should be provided to the user.
>
> For GCC, there is no issue, we already have the __builtin_has_attribute (PTR, 
> counted_by)
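A minimal sketch of the two macro techniques discussed in this thread, assuming GNU C (`__typeof__`, `__auto_type` and statement expressions): the `_Generic` idiom that turns a `void *` result into `size_t *`, and evaluating a macro argument only once via `__auto_type`. Since `__builtin_get_counted_by` exists only with the patches under review, the hypothetical macros `COUNTER_OF` and `NO_COUNTER_OF` stand in for it; `COUNTER_PTR_TYPE` and `SET_COUNT` are likewise illustrative names, not kernel or GCC interfaces.

```c
#include <assert.h>
#include <stddef.h>

struct annotated {
  unsigned char b; /* number of elements in c[] */
  int c[];
};

/* Hypothetical stand-ins for __builtin_get_counted_by (NOT the real
   builtin): COUNTER_OF models a member with a counted_by annotation,
   NO_COUNTER_OF models one without it, for which the builtin as
   discussed yields (void *)NULL.  */
#define COUNTER_OF(p) (&(p)->b)
#define NO_COUNTER_OF(p) ((void *)0)

/* Type of the counter pointer, with void * mapped to size_t * so the
   declared pointer can always be assigned through.  */
#define COUNTER_PTR_TYPE(expr) \
  __typeof__(_Generic((expr), void *: (size_t *)NULL, default: (expr)))

/* Store N through the counter of P, if there is one.  P is evaluated
   exactly once.  Yields 1 if a counter was written, 0 otherwise.  */
#define SET_COUNT(GET, P, N)                                \
  ({                                                        \
    __auto_type sc_obj_ = (P);                              \
    COUNTER_PTR_TYPE(GET(sc_obj_)) sc_cnt_ = GET(sc_obj_);  \
    int sc_set_ = 0;                                        \
    (void)sc_obj_; /* GET may ignore it; silences NDEBUG */ \
    assert(sc_obj_ != NULL);                                \
    if (sc_cnt_) {                                          \
      *sc_cnt_ = (N);                                       \
      sc_set_ = 1;                                          \
    }                                                       \
    sc_set_;                                                \
  })
```

With `NO_COUNTER_OF`, the selected type is `size_t *`, so the store in the never-taken branch still type-checks instead of failing with "invalid use of void expression"; with `COUNTER_OF`, the pointer keeps its real `unsigned char *` type.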

Re: [PATCH] Libquadmath: update doc for some constants

2024-08-27 Thread Sandra Loosemore


On 8/27/24 11:06, Tobias Burnus wrote:

Hi FX,

FX Coudert wrote:
Given it’s a doc patch, I think it might fall under the obvious rule,
and I will commit it in a week if there is no objection.


The patch clearly fixes a bug in the current specification and is fine, 
I just wonder …



* libquadmath.texi (M_LOG2Eq, M_LOG10Eq, M_2_PIq): Fix
description of these constants.



diff --git a/libquadmath/libquadmath.texi b/libquadmath/libquadmath.texi
index dc2a9ff374b..ce4accf6421 100644
--- a/libquadmath/libquadmath.texi
+++ b/libquadmath/libquadmath.texi

…

  @item @code{M_PI_2q}: pi divided by two
  @item @code{M_PI_4q}: pi divided by four
  @item @code{M_1_PIq}: one over pi
-@item @code{M_2_PIq}: one over two pi
+@item @code{M_2_PIq}: two over pi
  @item @code{M_2_SQRTPIq}: two over square root of pi
  @item @code{M_SQRT2q}: square root of 2
  @item @code{M_SQRT1_2q}: one over square root of 2


... whether we should change the "over" which somehow sounds odd. "two 
divided by pi" sounds better to me than "two over pi".


I do note, however, that the following documentation uses a slightly 
different wording:


"M_2_PI -Two times the reciprocal of pi."

https://www.gnu.org/software/libc/manual/html_node/Mathematical-Constants.html

Hence, while I am fine with the change, I think we should replace the
"over" wording (used multiple times) and move to either "divided by" or
"(…times) the reciprocal of".


I agree.  I don't have a preference for which is better, but being 
consistent with other documentation might be a winning argument.


Alas, Texinfo has lousy support for formatting mathematical equations, 
so that's not an option.


-Sandra
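As a quick numerical check of the corrected wording, the sketch below compares glibc's `double` constants from `<math.h>`, which follow the same naming scheme as the `q`-suffixed libquadmath ones; it does not use libquadmath itself. It assumes `_DEFAULT_SOURCE` exposes `M_PI`, `M_1_PI` and `M_2_PI`, and the helper names are illustrative.

```c
#define _DEFAULT_SOURCE /* assumed to expose M_PI, M_1_PI, M_2_PI in glibc */
#include <assert.h>
#include <math.h>

/* Nonzero when a and b agree to within tol.  */
static int close_enough(double a, double b, double tol) {
  double d = a - b;
  return (d < 0 ? -d : d) <= tol;
}

/* M_2_PI is "two divided by pi", i.e. two times the reciprocal of pi.  */
static int m_2_pi_is_two_over_pi(void) {
  return close_enough(M_2_PI, 2.0 / M_PI, 1e-15)
         && close_enough(M_2_PI, 2.0 * M_1_PI, 1e-15);
}

/* It is not "one over two pi", the wording the patch removes.  */
static int m_2_pi_is_not_one_over_two_pi(void) {
  return !close_enough(M_2_PI, 1.0 / (2.0 * M_PI), 1e-3);
}
```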

Re: [PATCH] RISC-V: Fix subreg of VLS modes larger than a vector [PR116086].

2024-08-27 Thread Robin Dapp

> +(define_mode_iterator V_HAS_HALF [
> +  V2QI V4QI V8QI V16QI V32QI V64QI V128QI V256QI V512QI V1024QI V2048QI V4096QI
> +  V2HI V4HI V8HI V16HI V32HI V64HI V128HI V256HI V512HI V1024HI V2048HI
> +  V2SI V4SI V8SI V16SI V32SI V64SI V128SI V256SI V512SI V1024SI
> +  V2DI V4DI V8DI V16DI V32DI V64DI V128DI V256DI V512DI
> +  V2SF V4SF V8SF V16SF V32SF V64SF V128SF V256SF V512SF V1024SF
> +  V2DF V4DF V8DF V16DF V32DF V64DF V128DF V256DF V512DF
> +])
>
> Seems you missed predicate here ?
> Like:
> (V4096QI "riscv_vector::vls_mode_valid_p (V4096QImode) && TARGET_MIN_VLEN >= 4096")
> (V32DF "riscv_vector::vls_mode_valid_p (V32DFmode) && TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN >= 256")

Yes I did while copying things over, thanks.

Attached is V2 with that changed.

[PATCH v2] RISC-V: Fix subreg of VLS modes larger than a vector [PR116086].

When the source mode is potentially larger than one vector (e.g. an
LMUL2 mode for VLEN=128) we don't know which vector the subreg actually
refers to.  For zvl128b and LMUL=2 the subreg in (subreg:V2DI (reg:V4DI))
could actually be a full (high) vector register of a two-register
group (at VLEN=128) or the higher part of a single register (at VLEN>128).

As the subreg is statically ambiguous we prevent such situations in
can_change_mode_class.
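The ambiguity can be illustrated with a small model, assuming (for illustration only) that a VLS value is laid out contiguously across the registers of a group, lowest register first. The struct and function names are hypothetical; the point is that the same byte offset (16, the high V2DI half of a V4DI) lands in a different register, at a different bit offset, depending on VLEN.

```c
#include <assert.h>

/* Where a subreg byte offset lands within a vector register group.  */
struct subreg_loc {
  unsigned reg;        /* register index within the group */
  unsigned bit_offset; /* bit offset within that register */
};

/* Hypothetical model: vlen_bits is the runtime vector register width.  */
static struct subreg_loc locate_subreg(unsigned byte_offset,
                                       unsigned vlen_bits) {
  unsigned bit = byte_offset * 8;
  assert(vlen_bits != 0);
  struct subreg_loc loc = { bit / vlen_bits, bit % vlen_bits };
  return loc;
}
```

At VLEN=128 the high half is the whole second register of the group; at VLEN=256 it is the upper half of register 0, so code compiled for zvl128b cannot resolve the subreg statically.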

The culprit in PR116086 is

 _12 = BIT_FIELD_REF ;

which can be expanded with a vector-vector extract (from V4DI to V2DI).
This patch adds a VLS-mode vector-vector extract that handles "halving"
cases like this one by sliding down the source vector, thus making sure
the correct part is used.
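The halving extract can be modelled on plain arrays. This is a sketch, not the RVV code: `long` elements stand in for DImode lanes, and `slide_down`/`extract_half` are hypothetical helpers mirroring the slide-down-then-lowpart sequence of the expander in the patch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ELEMS 64

/* Model of a slide-down by `offset` elements: dst[i] = src[i + offset].
   Elements that would come from past the end are zeroed here; the
   extract below never reads them.  */
static void slide_down(const long *src, long *dst, size_t n, size_t offset) {
  for (size_t i = 0; i < n; i++)
    dst[i] = (i + offset < n) ? src[i + offset] : 0;
}

/* Extract half number `part` (0 = low, 1 = high) of the n-element vector
   src into out[0 .. n/2 - 1]: slide down by part * (n/2) elements, then
   take the low half.  */
static void extract_half(const long *src, size_t n, int part, long *out) {
  long tmp[MAX_ELEMS];
  const long *v = src;
  size_t sz = n / 2;

  assert(n <= MAX_ELEMS && n % 2 == 0 && (part == 0 || part == 1));
  if (part != 0) {
    slide_down(src, tmp, n, (size_t)part * sz);
    v = tmp;
  }
  for (size_t i = 0; i < sz; i++)
    out[i] = v[i];
}
```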

PR target/116086

gcc/ChangeLog:

* config/riscv/autovec.md (vec_extract): Add
vector-vector extract for VLS modes.
* config/riscv/riscv.cc (riscv_can_change_mode_class): Forbid
VLS modes larger than one vector.
* config/riscv/vector-iterators.md: Add vector-vector extract
iterators.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add effective target checks for
zvl256b and zvl512b.
* gcc.target/riscv/rvv/autovec/pr116086-2-run.c: New test.
* gcc.target/riscv/rvv/autovec/pr116086-2.c: New test.
* gcc.target/riscv/rvv/autovec/pr116086.c: New test.
---
 gcc/config/riscv/autovec.md   |  35 +++
 gcc/config/riscv/riscv.cc |  11 +
 gcc/config/riscv/vector-iterators.md  | 202 ++
 .../riscv/rvv/autovec/pr116086-2-run.c|   6 +
 .../gcc.target/riscv/rvv/autovec/pr116086-2.c |  18 ++
 .../gcc.target/riscv/rvv/autovec/pr116086.c   |  76 +++
 gcc/testsuite/lib/target-supports.exp |  37 
 7 files changed, 385 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116086-2-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116086-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116086.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index f422ec0dc1e..4703b079fcb 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1462,6 +1462,41 @@ (define_expand "vec_extractbi"
   DONE;
 })
 
+;; -
+;;  [INT,FP] Extract a vector from a vector.
+;; -
+;; TODO: This can be extended to allow basically any extract mode.
+;; For now this helps optimize VLS subregs like (subreg:V2DI (reg:V4DI) 16)
+;; that would otherwise need to go via memory.
+
+(define_expand "vec_extract"
+  [(set (match_operand:   0 "nonimmediate_operand")
+ (vec_select:
+   (match_operand:VLS_HAS_HALF  1 "register_operand")
+   (parallel
+[(match_operand 2 "immediate_operand")])))]
+  "TARGET_VECTOR"
+{
+  int sz = GET_MODE_NUNITS (mode).to_constant ();
+  int part = INTVAL (operands[2]);
+
+  rtx start = GEN_INT (part * sz);
+  rtx tmp = operands[1];
+
+  if (part != 0)
+{
+  tmp = gen_reg_rtx (mode);
+
+  rtx ops[] = {tmp, operands[1], start};
+  riscv_vector::emit_vlmax_insn
+   (code_for_pred_slide (UNSPEC_VSLIDEDOWN, mode),
+riscv_vector::BINARY_OP, ops);
+}
+
+  emit_move_insn (operands[0], gen_lowpart (mode, tmp));
+  DONE;
+})
+
 ;; -
 ;;  [FP] Binary operations
 ;; -
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 8538d405f50..4b9f3081ac5 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -10630,6 +10630,17 @@ riscv_can_change_mode_class (machine_mode from, machine_mode to,
   if (reg_classes_intersect_p (V_REGS, rclass)
   && !ordered_p (GET_MODE_PRECISION (from), GET_MODE_PRECISION (to)))
 return false;
+
+  /* Subregs of modes larger than one vector are ambiguous.

Re: [PATCH v3 10/10] fortran: Add -finline-intrinsics flag for MINLOC/MAXLOC [PR90608]

2024-08-27 Thread Harald Anlauf


Mikael,

Am 23.08.24 um 10:31 schrieb Mikael Morin:

From: Mikael Morin 

The documentation in this patch was partly reworded, compared
to the previous version posted at:
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660607.html
The rest of the patch is unchanged, just rebased to a more recent
master.

Joseph is in CC as I need an ack for the new option.

Regression-tested on x86_64-pc-linux-gnu.
OK for master?


the Fortran side is OK now.

Thanks for the patch!

Harald


-- >8 --

Introduce the -finline-intrinsics flag to control from the command line
whether to generate either inline code or calls to the functions from the
library, for the MINLOC and MAXLOC intrinsics.

The flag allows specifying inlining either independently for each intrinsic
(MINLOC or MAXLOC) or for both together.  For each intrinsic, a default
value is set if none was given explicitly.  The default depends on the
optimization setting: inlining is avoided if not optimizing or if optimizing
for size; otherwise inlining is preferred.

There is no direct support for this behaviour provided by the .opt options
framework.  It is obtained by defining three different variants of the flag
(finline-intrinsics, fno-inline-intrinsics, finline-intrinsics=) all using
the same underlying option variable.  Each enum value (corresponding to an
intrinsic function) uses two identical bits, and the variable is initialized
with alternated bits, so that we can tell whether the value was set or not
by checking whether the two bits have different values.
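The two-bit encoding can be sketched in isolation. The enum, macros and mask below are hypothetical (the patch's own macros are GFC_INL_INTR_VAL and GFC_INL_INTR_UNSET_VAL); the sketch only shows why identical bits mean "explicitly set" and alternating bits mean "unset", and how a default is applied only to unset entries.

```c
#include <assert.h>

/* Hypothetical model: intrinsic idx owns bits 2*idx and 2*idx + 1.
   Both set = inline, both clear = call the library, "01" = unset.  */
enum { INTR_MAXLOC = 0, INTR_MINLOC = 1, INTR_COUNT = 2 };

#define INTR_BITS(idx) (3u << (2 * (idx)))
/* Alternating bits in every pair: the initial "nothing set" value.  */
#define ALL_UNSET 0x5u

static unsigned pair_of(unsigned flags, int idx) {
  assert(idx >= 0 && idx < INTR_COUNT);
  return (flags >> (2 * idx)) & 3u;
}

/* Explicitly set iff the two bits are identical.  */
static int is_set(unsigned flags, int idx) {
  unsigned pair = pair_of(flags, idx);
  return pair == 0u || pair == 3u;
}

static int is_inlined(unsigned flags, int idx) {
  return pair_of(flags, idx) == 3u;
}

static unsigned set_inline(unsigned flags, int idx, int on) {
  return on ? (flags | INTR_BITS(idx)) : (flags & ~INTR_BITS(idx));
}

/* Apply the optimization-dependent default only where nothing was set.  */
static unsigned apply_default(unsigned flags, int idx, int inline_by_default) {
  return is_set(flags, idx) ? flags
                            : set_inline(flags, idx, inline_by_default);
}
```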

PR fortran/90608

gcc/ChangeLog:

* flag-types.h (enum gfc_inlineable_intrinsics): New type.

gcc/fortran/ChangeLog:

* invoke.texi (finline-intrinsics): Document new flag.
* lang.opt (finline-intrinsics, finline-intrinsics=,
fno-inline-intrinsics): New flags.
* options.cc (gfc_post_options): If the option variable controlling
the inlining of MAXLOC (respectively MINLOC) has not been set, set
it or clear it depending on the optimization option variables.
* trans-intrinsic.cc (gfc_inline_intrinsic_function_p): Return false
if inlining for the intrinsic is disabled according to the option
variable.

gcc/testsuite/ChangeLog:

* gfortran.dg/minmaxloc_18.f90: New test.
* gfortran.dg/minmaxloc_18a.f90: New test.
* gfortran.dg/minmaxloc_18b.f90: New test.
* gfortran.dg/minmaxloc_18c.f90: New test.
* gfortran.dg/minmaxloc_18d.f90: New test.
---
  gcc/flag-types.h|  30 +
  gcc/fortran/invoke.texi |  31 +
  gcc/fortran/lang.opt|  27 +
  gcc/fortran/options.cc  |  21 +-
  gcc/fortran/trans-intrinsic.cc  |  13 +-
  gcc/testsuite/gfortran.dg/minmaxloc_18.f90  | 772 
  gcc/testsuite/gfortran.dg/minmaxloc_18a.f90 |  10 +
  gcc/testsuite/gfortran.dg/minmaxloc_18b.f90 |  10 +
  gcc/testsuite/gfortran.dg/minmaxloc_18c.f90 |  10 +
  gcc/testsuite/gfortran.dg/minmaxloc_18d.f90 |  10 +
  10 files changed, 929 insertions(+), 5 deletions(-)
  create mode 100644 gcc/testsuite/gfortran.dg/minmaxloc_18.f90
  create mode 100644 gcc/testsuite/gfortran.dg/minmaxloc_18a.f90
  create mode 100644 gcc/testsuite/gfortran.dg/minmaxloc_18b.f90
  create mode 100644 gcc/testsuite/gfortran.dg/minmaxloc_18c.f90
  create mode 100644 gcc/testsuite/gfortran.dg/minmaxloc_18d.f90

diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index 1e497f0bb91..df56337f7e8 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -451,6 +451,36 @@ enum gfc_convert
  };


+/* gfortran -finline-intrinsics= values;
+   We use two identical bits for each value, and initialize with alternated
+   bits, so that we can check whether a value has been set by checking whether
+   the two bits have identical value.  */
+
+#define GFC_INL_INTR_VAL(idx) (3 << (2 * idx))
+#define GFC_INL_INTR_UNSET_VAL(val) (0x & (val))
+
+enum gfc_inlineable_intrinsics
+{
+  GFC_FLAG_INLINE_INTRINSIC_NONE = 0,
+  GFC_FLAG_INLINE_INTRINSIC_MAXLOC = GFC_INL_INTR_VAL (0),
+  GFC_FLAG_INLINE_INTRINSIC_MINLOC = GFC_INL_INTR_VAL (1),
+  GFC_FLAG_INLINE_INTRINSIC_ALL = GFC_FLAG_INLINE_INTRINSIC_MAXLOC
+ | GFC_FLAG_INLINE_INTRINSIC_MINLOC,
+
+  GFC_FLAG_INLINE_INTRINSIC_NONE_UNSET
+ = GFC_INL_INTR_UNSET_VAL (GFC_FLAG_INLINE_INTRINSIC_NONE),
+  GFC_FLAG_INLINE_INTRINSIC_MAXLOC_UNSET
+ = GFC_INL_INTR_UNSET_VAL (GFC_FLAG_INLINE_INTRINSIC_MAXLOC),
+  GFC_FLAG_INLINE_INTRINSIC_MINLOC_UNSET
+ = GFC_INL_INTR_UNSET_VAL (GFC_FLAG_INLINE_INTRINSIC_MINLOC),
+  GFC_FLAG_INLINE_INTRINSIC_ALL_UNSET
+ = GFC_INL_INTR_UNSET_VAL (GFC_FLAG_INLINE_INTRINSIC_ALL)
+};
+
+#undef GFC_INL_INTR_UNSET_VAL
+#undef GFC_INL_INTR_VAL
+
+
  /* Inline String Operations functions.  */
  enum ilsop_fn
  {
diff --git a/gcc/fortran/invoke.texi b/gcc/fortran/invoke.texi
index 6bc42afe2c4

[PATCH] ipa: Don't disable function parameter analysis for fat LTO streaming

2024-08-27 Thread H.J. Lu

Update analyze_parms not to disable function parameter analysis for
-ffat-lto-objects.  Tested on x86-64: there are no differences in zstd
with "-O2 -flto=auto -g" vs "-O2 -flto=auto -g -ffat-lto-objects".

PR ipa/116410
* ipa-modref.cc (analyze_parms): Always analyze function parameters
for LTO streaming.

Signed-off-by: H.J. Lu 
---
 gcc/ipa-modref.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/ipa-modref.cc b/gcc/ipa-modref.cc
index 59cfe91f987..9275030c254 100644
--- a/gcc/ipa-modref.cc
+++ b/gcc/ipa-modref.cc
@@ -2975,7 +2975,7 @@ analyze_parms (modref_summary *summary, modref_summary_lto *summary_lto,
summary->arg_flags.safe_grow_cleared (count, true);
  summary->arg_flags[parm_index] = EAF_UNUSED;
}
- else if (summary_lto)
+ if (summary_lto)
{
  if (parm_index >= summary_lto->arg_flags.length ())
summary_lto->arg_flags.safe_grow_cleared (count, true);
@@ -3034,7 +3034,7 @@ analyze_parms (modref_summary *summary, modref_summary_lto *summary_lto,
summary->arg_flags.safe_grow_cleared (count, true);
  summary->arg_flags[parm_index] = flags;
}
- else if (summary_lto)
+ if (summary_lto)
{
  if (parm_index >= summary_lto->arg_flags.length ())
summary_lto->arg_flags.safe_grow_cleared (count, true);
-- 
2.46.0
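The control-flow change can be seen in a reduced model. When both the local and the LTO summary are being computed (as the patch implies happens with -ffat-lto-objects), the old `else if` silently skips the LTO one. The struct and functions are hypothetical simplifications, not the ipa-modref.cc types.

```c
#include <assert.h>

#define MAX_PARMS 8

/* Hypothetical stand-in for modref_summary / modref_summary_lto.  */
struct summary {
  int arg_flags[MAX_PARMS];
};

/* Shape before the patch: the LTO summary is only filled in when there
   is no non-LTO summary.  */
static void record_before(struct summary *s, struct summary *s_lto,
                          int parm, int flags) {
  assert(parm >= 0 && parm < MAX_PARMS);
  if (s)
    s->arg_flags[parm] = flags;
  else if (s_lto)
    s_lto->arg_flags[parm] = flags;
}

/* Shape after the patch: every summary that exists is filled in.  */
static void record_after(struct summary *s, struct summary *s_lto,
                         int parm, int flags) {
  assert(parm >= 0 && parm < MAX_PARMS);
  if (s)
    s->arg_flags[parm] = flags;
  if (s_lto)
    s_lto->arg_flags[parm] = flags;
}
```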

m68k: Accept ASHIFT like MULT in address operand

2024-08-27 Thread Andreas Schwab

When LRA pulls an address operand out of a MEM, it canonicalizes a
containing MULT into ASHIFT.  Adjust the address decomposer to recognize
this form.
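The MULT/ASHIFT equivalence the decomposer now relies on can be sketched as a pure function: a left shift by k is a multiplication by 1 << k, so shifts 1, 2 and 3 correspond to scales 2, 4 and 8. This mirrors the conditions in the patch but is not GCC code; the enum and function are hypothetical and the 68020/ColdFire target check is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two RTX codes involved.  */
enum index_code { INDEX_MULT, INDEX_ASHIFT };

/* Scale encoded by (code reg k), or 0 if the indexed addressing mode
   cannot express it.  allow_scale8 models the
   (TARGET_COLDFIRE_FPU || !TARGET_COLDFIRE) condition.  */
static int index_scale(enum index_code code, long k, bool allow_scale8) {
  if (code == INDEX_MULT)
    return (k == 2 || k == 4 || (k == 8 && allow_scale8)) ? (int)k : 0;
  /* (ashift reg k) is (mult reg 1<<k): shifts 1..3 give 2, 4, 8.  */
  if (k == 1 || k == 2 || (k == 3 && allow_scale8))
    return 1 << k;
  return 0;
}
```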

PR target/116413
* config/m68k/m68k.cc (m68k_decompose_index): Accept ASHIFT like
MULT.
(m68k_rtx_costs) [PLUS]: Likewise.
(m68k_legitimize_address): Likewise.
---
 gcc/config/m68k/m68k.cc | 58 -
 1 file changed, 40 insertions(+), 18 deletions(-)

diff --git a/gcc/config/m68k/m68k.cc b/gcc/config/m68k/m68k.cc
index 21c94981d22..7986e92c511 100644
--- a/gcc/config/m68k/m68k.cc
+++ b/gcc/config/m68k/m68k.cc
@@ -1503,12 +1503,14 @@ m68k_legitimize_address (rtx x, rtx oldx, machine_mode mode)
 
 #define COPY_ONCE(Y) if (!copied) { Y = copy_rtx (Y); copied = ch = 1; }
 
-  if (GET_CODE (XEXP (x, 0)) == MULT)
+  if (GET_CODE (XEXP (x, 0)) == MULT
+ || GET_CODE (XEXP (x, 0)) == ASHIFT)
{
  COPY_ONCE (x);
  XEXP (x, 0) = force_operand (XEXP (x, 0), 0);
}
-  if (GET_CODE (XEXP (x, 1)) == MULT)
+  if (GET_CODE (XEXP (x, 1)) == MULT
+ || GET_CODE (XEXP (x, 1)) == ASHIFT)
{
  COPY_ONCE (x);
  XEXP (x, 1) = force_operand (XEXP (x, 1), 0);
@@ -2069,16 +2071,29 @@ m68k_decompose_index (rtx x, bool strict_p, struct m68k_address *address)
 
   /* Check for a scale factor.  */
   scale = 1;
-  if ((TARGET_68020 || TARGET_COLDFIRE)
-  && GET_CODE (x) == MULT
-  && GET_CODE (XEXP (x, 1)) == CONST_INT
-  && (INTVAL (XEXP (x, 1)) == 2
- || INTVAL (XEXP (x, 1)) == 4
- || (INTVAL (XEXP (x, 1)) == 8
- && (TARGET_COLDFIRE_FPU || !TARGET_COLDFIRE
+  if (TARGET_68020 || TARGET_COLDFIRE)
 {
-  scale = INTVAL (XEXP (x, 1));
-  x = XEXP (x, 0);
+  if (GET_CODE (x) == MULT
+ && GET_CODE (XEXP (x, 1)) == CONST_INT
+ && (INTVAL (XEXP (x, 1)) == 2
+ || INTVAL (XEXP (x, 1)) == 4
+ || (INTVAL (XEXP (x, 1)) == 8
+ && (TARGET_COLDFIRE_FPU || !TARGET_COLDFIRE
+   {
+ scale = INTVAL (XEXP (x, 1));
+ x = XEXP (x, 0);
+   }
+  /* LRA uses ASHIFT instead of MULT outside of MEM.  */
+  else if (GET_CODE (x) == ASHIFT
+  && GET_CODE (XEXP (x, 1)) == CONST_INT
+  && (INTVAL (XEXP (x, 1)) == 1
+  || INTVAL (XEXP (x, 1)) == 2
+  || (INTVAL (XEXP (x, 1)) == 3
+  && (TARGET_COLDFIRE_FPU || !TARGET_COLDFIRE
+   {
+ scale = 1 << INTVAL (XEXP (x, 1));
+ x = XEXP (x, 0);
+   }
 }
 
   /* Check for a word extension.  */
@@ -2246,8 +2261,10 @@ m68k_decompose_address (machine_mode mode, rtx x,
  ??? do_tablejump creates these addresses before placing the target
  label, so we have to assume that unplaced labels are jump table
  references.  It seems unlikely that we would ever generate indexed
- accesses to unplaced labels in other cases.  */
+ accesses to unplaced labels in other cases.  Do not accept it in
+ PIC mode, since the label address will need to be loaded from memory.  */
   if (GET_CODE (x) == PLUS
+  && !flag_pic
   && m68k_jump_table_ref_p (XEXP (x, 1))
   && m68k_decompose_index (XEXP (x, 0), strict_p, address))
 {
@@ -3068,12 +3085,17 @@ m68k_rtx_costs (rtx x, machine_mode mode, int outer_code,
   /* An lea costs about three times as much as a simple add.  */
   if (mode == SImode
  && GET_CODE (XEXP (x, 1)) == REG
- && GET_CODE (XEXP (x, 0)) == MULT
- && GET_CODE (XEXP (XEXP (x, 0), 0)) == REG
- && GET_CODE (XEXP (XEXP (x, 0), 1)) == CONST_INT
- && (INTVAL (XEXP (XEXP (x, 0), 1)) == 2
- || INTVAL (XEXP (XEXP (x, 0), 1)) == 4
- || INTVAL (XEXP (XEXP (x, 0), 1)) == 8))
+ && ((GET_CODE (XEXP (x, 0)) == MULT
+  && GET_CODE (XEXP (XEXP (x, 0), 0)) == REG
+  && GET_CODE (XEXP (XEXP (x, 0), 1)) == CONST_INT
+  && (INTVAL (XEXP (XEXP (x, 0), 1)) == 2
+  || INTVAL (XEXP (XEXP (x, 0), 1)) == 4
+  || INTVAL (XEXP (XEXP (x, 0), 1)) == 8))
+ || (GET_CODE (XEXP (x, 0)) == ASHIFT
+ && GET_CODE (XEXP (XEXP (x, 0), 0)) == REG
+ && GET_CODE (XEXP (XEXP (x, 0), 1)) == CONST_INT
+ && ((unsigned HOST_WIDE_INT) INTVAL (XEXP (XEXP (x, 0), 1))
+ <= 3
{
/* lea an@(dx:l:i),am */
*total = COSTS_N_INSNS (TARGET_COLDFIRE ? 2 : 3);
-- 
2.46.0


-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

Re: [PATCH] c++/coroutines: fix actor cases not being added to the current switch [PR109867]

2024-08-27 Thread Arsen Arsenović

Jason Merrill  writes:

> On 8/1/24 12:48 PM, Arsen Arsenović wrote:
>> Tested on x86_64-pc-linux-gnu, no regression.
>> OK for trunk?
>> TIA, have a lovely day.
>> -- >8 --
>> Previously, we were building and inserting case_labels manually, which
>> lead to them not being added into the currently running switch via
>
> "led"
>
>> c_add_case_label.  This lead to false diagnostics that the user could
>> not act on.
>
> The case changes are OK.

Applying the following now that we don't need the hack anymore.
Regstrapped on x86_64-pc-linux-gnu.

Thanks, have a lovely evening.
-- >8 --
Previously, we were building and inserting case_labels manually, which
led to them not being added into the currently running switch via
c_add_case_label.  This led to false diagnostics that the user could not
act on.

PR c++/109867

gcc/cp/ChangeLog:

* coroutines.cc (expand_one_await_expression): Replace uses of
build_case_label with finish_case_label.
(build_actor_fn): Ditto.
(create_anon_label_with_ctx): Remove now-unused function.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/torture/pr109867.C: New test.

Reviewed-by: Iain Sandoe 
---
 gcc/cp/coroutines.cc  | 52 ---
 .../g++.dg/coroutines/torture/pr109867.C  | 23 
 2 files changed, 34 insertions(+), 41 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/torture/pr109867.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 31dc39afeee2..f243fe9adae2 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -1708,20 +1708,6 @@ coro_build_artificial_var (location_t loc, const char *name, tree type,
type, ctx, init);
 }
 
-/* Helpers for label creation:
-   1. Create a named label in the specified context.  */
-
-static tree
-create_anon_label_with_ctx (location_t loc, tree ctx)
-{
-  tree lab = build_decl (loc, LABEL_DECL, NULL_TREE, void_type_node);
-
-  DECL_CONTEXT (lab) = ctx;
-  DECL_ARTIFICIAL (lab) = true;
-  DECL_IGNORED_P (lab) = true;
-  TREE_USED (lab) = true;
-  return lab;
-}
 
 /*  2. Create a named label in the specified context.  */
 
@@ -1935,22 +1921,16 @@ expand_one_await_expression (tree *stmt, tree *await_expr, void *d)
data->coro_fp);
   r = cp_build_init_expr (cond, r);
   finish_switch_cond (r, sw);
-  r = build_case_label (integer_zero_node, NULL_TREE,
-   create_anon_label_with_ctx (loc, actor));
-  add_stmt (r); /* case 0: */
+  finish_case_label (loc, integer_zero_node, NULL_TREE); /*  case 0: */
   /* Implement the suspend, a scope exit without clean ups.  */
   r = build_call_expr_internal_loc (loc, IFN_CO_SUSPN, void_type_node, 1,
is_cont ? cont : susp);
   r = coro_build_cvt_void_expr_stmt (r, loc);
   add_stmt (r); /*   goto ret;  */
-  r = build_case_label (integer_one_node, NULL_TREE,
-   create_anon_label_with_ctx (loc, actor));
-  add_stmt (r); /* case 1:  */
+  finish_case_label (loc, integer_one_node, NULL_TREE); /*  case 1:  */
   r = build1_loc (loc, GOTO_EXPR, void_type_node, resume_label);
   add_stmt (r); /*  goto resume;  */
-  r = build_case_label (NULL_TREE, NULL_TREE,
-   create_anon_label_with_ctx (loc, actor));
-  add_stmt (r); /* default:;  */
+  finish_case_label (loc, NULL_TREE, NULL_TREE); /* default:;  */
   r = build1_loc (loc, GOTO_EXPR, void_type_node, destroy_label);
   add_stmt (r); /* goto destroy;  */
 
@@ -2291,9 +2271,7 @@ build_actor_fn (location_t loc, tree coro_frame_type, tree actor, tree fnbody,
 
   tree destroy_dispatcher = begin_switch_stmt ();
   finish_switch_cond (rat, destroy_dispatcher);
-  tree ddeflab = build_case_label (NULL_TREE, NULL_TREE,
-  create_anon_label_with_ctx (loc, actor));
-  add_stmt (ddeflab);
+  tree ddeflab = finish_case_label (loc, NULL_TREE, NULL_TREE);
   tree b = build_call_expr_loc (loc, builtin_decl_explicit (BUILT_IN_TRAP), 0);
   b = coro_build_cvt_void_expr_stmt (b, loc);
   add_stmt (b);
@@ -2304,18 +2282,15 @@ build_actor_fn (location_t loc, tree coro_frame_type, tree actor, tree fnbody,
  frame itself.  */
   tree del_promise_label
 = create_named_label_with_ctx (loc, "coro.delete.promise", actor);
-  b = build_case_label (build_int_cst (short_unsigned_type_node, 1), NULL_TREE,
-   create_anon_label_with_ctx (loc, actor));
-  add_stmt (b);
+  finish_case_label (loc, build_int_cst (short_unsigned_type_node, 1),
+NULL_TREE);
   add_stmt (build_stmt (loc, GOTO_EXPR, del_promise_label));
 
   short unsigned lab_num = 3;
   for (unsigned destr_pt = 0; destr_pt < body_count; destr_pt++)
 {
   tree l_num = build_int_cst (short_unsigned_type_node, lab_num);
-  b = build_case_label (l_num, NULL_TREE,
-   create_anon_label_with_ctx (l

Re: Re: [PATCH] RISC-V: Fix subreg of VLS modes larger than a vector [PR116086].

2024-08-27 Thread 钟居哲

LGTM



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-08-28 03:48
To: juzhe.zh...@rivai.ai; gcc-patches
CC: pal...@dabbelt.com; kito.ch...@gmail.com; jeffreya...@gmail.com; 
pan2...@intel.com; Robin Dapp
Subject: Re: [PATCH] RISC-V: Fix subreg of VLS modes larger than a vector 
[PR116086].

Re: [PATCH v3 2/2] Prevent divide-by-zero

2024-08-27 Thread Edwin Lu


On 8/22/2024 5:35 AM, Richard Biener wrote:

On Thu, Aug 22, 2024 at 1:03 AM Edwin Lu  wrote:


Hi,

Just wanted to ping this for more guidance.


It's difficult for me as long as I cannot investigate this with a testcase.  Can
we go ahead with the other parts so the testcase can be added and the
issue reproduced?

Richard.


The testcase can be found in patch 1/2 of the series 
https://inbox.sourceware.org/gcc-patches/2024053512.2625173-2-patr...@rivosinc.com/ 
with the newly added gcc.target/riscv/rvv/autovec/no-segment.c file.


+cc Jeff, Palmer

From what I understand reading over the threads on the previous patch 
series versions, review for patch 1/2 has been deferred until this bug 
fix has been acked so we can't move forward with the other patch yet.


Edwin


On 7/24/2024 12:03 PM, Edwin Lu wrote:


On 7/24/2024 3:52 AM, Richard Biener wrote:

On Wed, Jul 24, 2024 at 1:31 AM Edwin Lu  wrote:


On 7/23/2024 11:20 AM, Richard Sandiford wrote:

Edwin Lu  writes:

On 7/23/2024 4:56 AM, Richard Biener wrote:

On Tue, Jul 23, 2024 at 1:03 AM Edwin Lu  wrote:

Hi Richard,

On 5/31/2024 1:48 AM, Richard Biener wrote:

On Thu, May 30, 2024 at 2:11 AM Patrick O'Neill
 wrote:

From: Greg McGary 

Still a NACK.  If remain ends up zero then

  /* Try to use a single smaller load when we are about
     to load excess elements compared to the unrolled
     scalar loop.  */
  if (known_gt ((vec_num * j + i + 1) * nunits,
                (group_size * vf - gap)))
    {
      poly_uint64 remain = ((group_size * vf - gap)
                            - (vec_num * j + i) * nunits);
      if (known_ge ((vec_num * j + i + 1) * nunits
                    - (group_size * vf - gap), nunits))
        /* DR will be unused.  */
        ltype = NULL_TREE;

needs to be re-formulated so that the combined conditions make sure
this doesn't happen.  The outer known_gt should already ensure that
remain > 0.  For correctness that should possibly be maybe_gt
though.

Yeah.  FWIW, I mentioned the maybe_gt thing in
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653013.html:

 Pre-existing, but shouldn't this be maybe_gt rather than known_gt?
 We can only skip doing it if we know for sure that the load won't cross
 the gap.  (Not sure whether the difference can trigger in practice.)

But AFAICT, the known_gt doesn't inherently prove that remain is known
to be nonzero.  It just proves that the gap between the end of the scalar
accesses and the end of this vector is known to be nonzero.

[switching round for easier reply]

[removed some clarification questions about poly_int and vector
representations]

What is j and i when the divisor is zero?

The values I see in gdb are: vec_num = 4, j = 0, i = 3,
vf = {coeffs = {2, 2}}, nunits = {coeffs = {2, 2}}, group_size = 4,
gap = 2, vect_align = 2, remain = {coeffs = {0, 2}}.

OK, so let's use D to mean "data" and G to mean "gap".  Then, for the
minimum vector length of 2 elements, we have:

 DD GG DD GG

The last load will read beyond the scalar loop if the vector loop happens
to handle all elements of the scalar loop.

For a vector length of 4 elements, we have:

 DDGG DDGG DDGG DDGG

where every load contains both data and gaps.  The same will be true
for larger vectors.

That's where remain={0,2} is coming from.  The last load is fully redundant
for the minimum VL but not for larger VL.
Based on that, the patch below looks correct to me, but I might have
misunderstood the intent.
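The arithmetic in this exchange can be reproduced with a toy model of a two-coefficient poly_int, value = c0 + c1 * x for an unknown runtime x >= 0. Plugging in the gdb values quoted above shows that known_gt holds while remain = {0, 2} can still be zero (at the minimum vector length), which is where the division by zero comes from. The type and helper names are hypothetical; GCC's real poly_int supports more coefficients and operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-coefficient model of poly_uint64:
   value = c0 + c1 * x for an unknown runtime x >= 0.  */
struct poly2 {
  uint64_t c0, c1;
};

static struct poly2 p_mul(struct poly2 a, uint64_t k) {
  struct poly2 r = { a.c0 * k, a.c1 * k };
  return r;
}

/* Subtraction, restricted to cases where no coefficient underflows.  */
static struct poly2 p_sub(struct poly2 a, struct poly2 b) {
  assert(a.c0 >= b.c0 && a.c1 >= b.c1);
  struct poly2 r = { a.c0 - b.c0, a.c1 - b.c1 };
  return r;
}

/* a > b for every x >= 0.  */
static bool p_known_gt(struct poly2 a, struct poly2 b) {
  return a.c0 > b.c0 && a.c1 >= b.c1;
}

/* a > b for at least one x >= 0.  */
static bool p_maybe_gt(struct poly2 a, struct poly2 b) {
  return a.c0 > b.c0 || a.c1 > b.c1;
}

/* a == k for at least one x >= 0.  */
static bool p_maybe_eq(struct poly2 a, uint64_t k) {
  if (a.c1 == 0)
    return a.c0 == k;
  return k >= a.c0 && (k - a.c0) % a.c1 == 0;
}
```

With nunits = vf = {2, 2}, group_size = 4, gap = 2 and vec_num * j + i = 3, the end of the last load is {8, 8}, the end of the scalar accesses is {6, 8}, and remain is {0, 2}.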

As an alternative to the original patch, would this also make sense?

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index aab3aa59962..cd657ac63af 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -11479,7 +11479,7 @@ vectorizable_load (vec_info *vinfo,
   /* Try to use a single smaller load when we are about
      to load excess elements compared to the unrolled
      scalar loop.  */
-   if (known_gt ((vec_num * j + i + 1) * nunits,
+   if (maybe_gt ((vec_num * j + i + 1) * nunits,
  (group_size * vf - gap)))
 {
   poly_uint64 remain = ((group_size * vf - gap)

That's a good point - this should indeed be maybe_gt


@@ -11502,6 +11502,10 @@ vectorizable_load (vec_info *vinfo,
    /* Aligned access to the gap area when there's
       at least one element in it is OK.  */
 ;
+   else if (maybe_eq (remain, 0))
+ /* Handle remain.coeffs[0] == 0 case.
Number of
+elemen

[PATCH] Fix test failing on sparc

2024-08-27 Thread Andi Kleen

From: Andi Kleen 

SPARC does not support vectorizing conditions, which this test relies
on. Use vect_condition as effective target.

Committed as obvious.

PR testsuite/116500

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-switch-ifcvt-1.c: Use vect_condition to
check if vectorizing conditions is supported for target.
---
 gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c 
b/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c
index f5352ef8ed7a..2e3a9ae3c249 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c
@@ -1,4 +1,4 @@
-/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_condition } */
 #include "tree-vect.h"
 
 extern void abort (void);
-- 
2.45.2

How do I know if my patch was merged?

2024-08-27 Thread Weslley da Silva Pereira

Hi all,

Thanks for reading my email.

I submitted a patch for libstdc++/complex, but I have no idea whether it was
merged, or how to check that. Could someone help me?

Patch name: "[PATCH] libstdc++/complex: Remove implicit type casts in
complex"

Many thanks,
  Weslley

-- 
Weslley S. Pereira

Re: sched1 pathology on RISC-V : PR/114729

2024-08-27 Thread Vineet Gupta

Hi Richard,

On 8/7/24 10:47, Richard Sandiford wrote:
> I should probably start by saying that the "model" heuristic is now
> pretty old and was originally tuned for an in-order AArch32 core.
> The aim wasn't to *minimise* spilling, but to strike a better balance
> between parallelising with spills vs. sequentialising.  At the time,
> scheduling without taking register pressure into account would overly
> parallelise things, whereas the original -fsched-pressure would overly
> serialise (i.e. was too conservative).
>
> There were specific workloads in, er, a formerly popular embedded
> benchmark that benefitted significantly from *some* spilling.
>
> This comment probably sums up the trade-off best:
>
>This pressure cost is deliberately timid.  The intention has been
>to choose a heuristic that rarely interferes with the normal list
>scheduler in cases where that scheduler would produce good code.
>We simply want to curb some of its worst excesses.
>
> Because it was tuned for an in-order core, it was operating in an
> environment where instruction latencies were meaningful and realistic.
> So it still deferred to those to quite a big extent.  This is almost
> certainly too conservative for out-of-order cores.
>
> So...
>
> Vineet Gupta  writes:
>> On 8/5/24 21:31, Jeff Law wrote:
>>> On 8/5/24 5:35 PM, Vineet Gupta wrote:
 Hi Richard,

 I'm reaching out for some insight on PR/114729. Apologies in advance for
 the long email.

 On RISC-V we are hitting sched1 pathology on SPEC2017 Cactu where
 codegen spills are overwhelming the execution: disabling sched1 shaves
 off 1.3 trillion dynamic icounts which is about half of total execution
 (in user mode qemu run).

 I have a reduced test down to a single spill (with sched1, none without).
 The full sched1 verbose log for test is attached for reference (so is
 the test case itself).

 The gist of the issue (from the scheduler's point of view) is in the
 annotated insns below: 46, 54, 55

      ld    a5,%lo(u)(s0)
      fld    fa5,%lo(f)(t6)
      fcvt.l.d a4,fa4,rtz
      srli    a0,a5,16  # insn 46  (ideal order 1)
      srli    a1,a5,32  # insn 55  (ideal order 3)
      sh    a5,%lo(_Z1sv)(a3)   # insn 44
      slli    a4,a4,3
      srli    a5,a5,48
      fsd    fa5,%lo(e)(t0)
      add    a4,s9,a4
      sh    a0,%lo(_Z1sv+2)(a3)    # insn 54  (ideal order 2)
      sh    a1,%lo(_Z1sv+4)(a3)    # insn 64
      sh    a5,%lo(_Z1sv+6)(a3)

 If insn 54 were scheduled ahead of insn 55, the corresponding reg need
 not be allocated (and consequently not spilled) in the outer loop.
 There are no uses of a0 after insn 54.
>>> So what does the ready queue look like at the start of whatever cycle 
>>> insn 46 fires on?  I would expect insn 46, 55, 44, the slli & fsd just 
>>> based on data dependencies.
>> Dump just before insn 46 is scheduled (most of them are from tail end of
>> BB due to it processing insns from end)
>>
>> ;;        Ready list after ready_sort:
>> 81:50(cost=0:prio=7:delay=2:idx=12) 
>> 80:49(cost=0:prio=6:delay=1:idx=23)  65:44(cost=1:prio=7:idx=20) 
>> 55:42(cost=1:prio=7:idx=18)  94:58(cost=0:prio=2:idx=0) 
>> 92:56(cost=0:prio=5:idx=28)  88:54(cost=0:prio=5:idx=26) 
>> 44:39(cost=0:prio=6:idx=15)  46:40(cost=0:prio=7:idx=16)
>> ;;     13--> b  0: i  46 r180=zxt(r170,0x10,0x10)   
>> :alu:@GR_REGS+1(1)@FP_REGS+0(0)
>>
>> 46 is at the head hence it is scheduled.
>>
 Looking into sched1 logs (#8916):  schedule_insn () is called for insn
 46 and insn 54 added to ready q.
>>> Presumably those happen on different cycles?   I would not expect 45 to 
>>> enter the ready queue on some cycle N.  Then on N+M insn 54 should enter 
>>> the ready queue, presumably with M == 1.
>> Nope they are in same cycle since 54 is in SD_LIST_FORW of insn 46.
>>
>>   advance = schedule_insn(insn = 46)
>>    for (sd_it = sd_iterator_start (insn, SD_LIST_FORW);
>>     sd_iterator_cond (&sd_it, &dep);)
>>     try_ready
>>   fix_tick_ready
>>  change_queue_index(insn = 54, delay=QUEUE_READY) 
>>
 Then we have ready_sort -> ready_sort_real () -> model_set_excess_costs
 () called for insns in ready q.

 For insn 54, model_excess_cost () there is a reg dead, hence the -1 in
 print below. However, its model_excess_group_cost () is still 0,
 disregarding the delta -1.

 ;;        |  17   54 |    6  +1 | GR_REGS:[-1 base cost 0] FP_REGS:[0
 base cost 0]

 Per comments around model_set_excess_costs () this seems intentional /
 as designed "... negative costs are converted to zero ones ...".

 The subsequent qsort () w/ numerous gyrations of rank_for_schedule()
 ends up moving 55 to top.
>>> Presumably due to the number of uses and the types of uses.
>> It's the following 3 things, in order:
>>

RE: [PATCH v2] Test: Move pr116278 run test to dg/torture [NFC]

2024-08-27 Thread Li, Pan2

Kindly ping.

Pan

-Original Message-
From: Li, Pan2  
Sent: Monday, August 19, 2024 10:05 AM
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; 
rdapp@gmail.com; Li, Pan2 
Subject: [PATCH v2] Test: Move pr116278 run test to dg/torture [NFC]

From: Pan Li 

Move the run test of pr116278 to dg/torture and keep only the
asm check in the RISC-V part.

PR target/116278

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr116278-run-1.c: Change from run to compile.
* gcc.target/riscv/pr116278-run-2.c: Ditto.
* gcc.dg/torture/pr116278-run-1.c: New test.
* gcc.dg/torture/pr116278-run-2.c: New test.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.dg/torture/pr116278-run-1.c | 19 +++
 gcc/testsuite/gcc.dg/torture/pr116278-run-2.c | 19 +++
 .../gcc.target/riscv/pr116278-run-1.c |  2 +-
 .../gcc.target/riscv/pr116278-run-2.c |  2 +-
 4 files changed, 40 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr116278-run-1.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr116278-run-2.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr116278-run-1.c 
b/gcc/testsuite/gcc.dg/torture/pr116278-run-1.c
new file mode 100644
index 000..8e07fb6af29
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr116278-run-1.c
@@ -0,0 +1,19 @@
+/* { dg-do run } */
+/* { dg-require-effective-target int32 } */
+/* { dg-options "-O2" } */
+
+#include 
+
+int8_t b[1];
+int8_t *d = b;
+int32_t c;
+
+int main() {
+  b[0] = -40;
+  uint16_t t = (uint16_t)d[0];
+
+  c = (t < 0xFFF6 ? t : 0xFFF6) + 9;
+
+  if (c != 65505)
+    __builtin_abort ();
+}
diff --git a/gcc/testsuite/gcc.dg/torture/pr116278-run-2.c 
b/gcc/testsuite/gcc.dg/torture/pr116278-run-2.c
new file mode 100644
index 000..d85e21531e1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr116278-run-2.c
@@ -0,0 +1,19 @@
+/* { dg-do run } */
+/* { dg-require-effective-target int32 } */
+/* { dg-options "-O2" } */
+
+#include 
+
+int16_t b[1];
+int16_t *d = b;
+int64_t c;
+
+int main() {
+  b[0] = -40;
+  uint32_t t = (uint32_t)d[0];
+
+  c = (t < 0xFFFFFFF6u ? t : 0xFFFFFFF6u) + 9;
+
+  if (c != 4294967265)
+    __builtin_abort ();
+}
diff --git a/gcc/testsuite/gcc.target/riscv/pr116278-run-1.c 
b/gcc/testsuite/gcc.target/riscv/pr116278-run-1.c
index d3812bdcdfb..c758fca7975 100644
--- a/gcc/testsuite/gcc.target/riscv/pr116278-run-1.c
+++ b/gcc/testsuite/gcc.target/riscv/pr116278-run-1.c
@@ -1,4 +1,4 @@
-/* { dg-do run { target { riscv_v } } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -fdump-rtl-expand-details" } */
 
 #include 
diff --git a/gcc/testsuite/gcc.target/riscv/pr116278-run-2.c 
b/gcc/testsuite/gcc.target/riscv/pr116278-run-2.c
index 669cd4f003f..a4da8a323f0 100644
--- a/gcc/testsuite/gcc.target/riscv/pr116278-run-2.c
+++ b/gcc/testsuite/gcc.target/riscv/pr116278-run-2.c
@@ -1,4 +1,4 @@
-/* { dg-do run { target { riscv_v } } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -fdump-rtl-expand-details" } */
 
 #include 
-- 
2.43.0

RE: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad and oct .SAT_TRUNC form 2

2024-08-27 Thread Li, Pan2

Hi Patrick,

Could you please help to re-trigger the pre-commit?
Thanks in advance!

Pan

-Original Message-
From: Patrick O'Neill  
Sent: Tuesday, August 20, 2024 12:14 AM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com; Jeff Law 

Subject: Re: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad and 
oct .SAT_TRUNC form 2

Hi Pan,

Once the postcommit baseline moves forward (trunk is currently failing 
to build linux targets [1] [2]) I'll re-trigger precommit for you.

Thanks,
Patrick

[1]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116409
[2]: https://github.com/patrick-rivos/gcc-postcommit-ci/issues/1564

On 8/18/24 19:49, Li, Pan2 wrote:
> It turns out that the pre-commit CI didn't pick up the newest upstream when
> testing this patch.
>
> Pan
>
> -Original Message-
> From: Li, Pan2 
> Sent: Monday, August 19, 2024 9:25 AM
> To: Jeff Law ; gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
> Subject: RE: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad 
> and oct .SAT_TRUNC form 2
>
> Oops, let me double check what happened to my local tester.
>
> Pan
>
> -Original Message-
> From: Jeff Law 
> Sent: Sunday, August 18, 2024 11:21 PM
> To: Li, Pan2 ; gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
> Subject: Re: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad 
> and oct .SAT_TRUNC form 2
>
>
>
> On 8/18/24 12:10 AM, pan2...@intel.com wrote:
>> From: Pan Li 
>>
>> This patch adds test cases for the unsigned scalar quad and
>> oct .SAT_TRUNC form 2, i.e.:
>>
>> Form 2:
>> #define DEF_SAT_U_TRUC_FMT_2(NT, WT) \
>> NT __attribute__((noinline)) \
>> sat_u_truc_##WT##_to_##NT##_fmt_2 (WT x) \
>> {\
>>   WT max = (WT)(NT)-1;   \
>>   return x > max ? (NT) max : (NT)x; \
>> }
>>
>> QUAD:
>> DEF_SAT_U_TRUC_FMT_2 (uint16_t, uint64_t)
>> DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint32_t)
>>
>> OCT:
>> DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint64_t)
>>
>> The following test passes with this patch.
>> * The rv64gcv regression test.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * gcc.target/riscv/sat_u_trunc-10.c: New test.
>>  * gcc.target/riscv/sat_u_trunc-11.c: New test.
>>  * gcc.target/riscv/sat_u_trunc-12.c: New test.
>>  * gcc.target/riscv/sat_u_trunc-run-10.c: New test.
>>  * gcc.target/riscv/sat_u_trunc-run-11.c: New test.
>>  * gcc.target/riscv/sat_u_trunc-run-12.c: New test.
> Looks like they're failing in the upstream pre-commit tester:
>
>> https://github.com/ewlu/gcc-precommit-ci/issues/2066#issuecomment-2295137578
>
> jeff

[gcc-wwwdocs PATCH] gcc-15: Mention recent update for x86_64 backend

2024-08-27 Thread Haochen Jiang

Hi all,

Sorry for the noise: I mistyped gcc-patches as gcc-patchs, so I am resending
the patch.

This patch adds documentation for recent updates to the x86-64 backend.

Ok for wwwdocs trunk?

Thx,
Haochen

---

Mention AVX10.2 support and Xeon Phi removal in GCC 15.

---
 htdocs/gcc-15/changes.html | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index d0d6d147..4cb0fa90 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -132,7 +132,23 @@ a work-in-progress.
 code like 1 << offset is not fast enough.
 
 
-
+IA-32/x86-64
+
+
+  New ISA extension support for Intel AVX10.2 was added.
+  AVX10.2 intrinsics are available via the -mavx10.2 or
+  -mavx10.2-256 compiler switch with 256-bit vector size
+  support. 512-bit vector size support for AVX10.2 intrinsics is
+  available via the -mavx10.2-512 compiler switch.
+  
+  Support for Xeon Phi CPUs (a.k.a. Knights Landing and Knights Mill) was removed
+  in GCC 15. GCC will no longer accept -mavx5124fmaps,
+  -mavx5124vnniw, -mavx512er,
+  -mavx512pf, -mprefetchwt1,
+  -march=knl, -march=knm, -mtune=knl
+  or -mtune=knm compiler switches.
+  
+
 
 
 
-- 
2.31.1

Re: [PATCH] MATCH: add abs support for half float

2024-08-27 Thread Kugan Vivekanandarajah

Hi Richard,

Thanks for the reply.

> On 27 Aug 2024, at 7:05 pm, Richard Biener  wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
> On Tue, Aug 27, 2024 at 8:23 AM Kugan Vivekanandarajah
>  wrote:
>> 
>> Hi Richard,
>> 
>>> On 22 Aug 2024, at 10:34 pm, Richard Biener  
>>> wrote:
>>> 
>>> 
>>> On Wed, Aug 21, 2024 at 12:08 PM Kugan Vivekanandarajah
>>>  wrote:
 
 Hi Richard,
 
> On 20 Aug 2024, at 6:09 pm, Richard Biener  
> wrote:
> 
> 
> On Fri, Aug 9, 2024 at 2:39 AM Kugan Vivekanandarajah
>  wrote:
>> 
>> Thanks for the comments.
>> 
>>> On 2 Aug 2024, at 8:36 pm, Richard Biener  
>>> wrote:
>>> 
>>> 
>>> On Fri, Aug 2, 2024 at 11:20 AM Kugan Vivekanandarajah
>>>  wrote:
 
 
 
> On 1 Aug 2024, at 10:46 pm, Richard Biener 
>  wrote:
> 
> 
> On Thu, Aug 1, 2024 at 5:31 AM Kugan Vivekanandarajah
>  wrote:
>> 
>> 
>> On Mon, Jul 29, 2024 at 10:11 AM Andrew Pinski  
>> wrote:
>>> 
>>> On Mon, Jul 29, 2024 at 12:57 AM Kugan Vivekanandarajah
>>>  wrote:
 
 On Thu, Jul 25, 2024 at 10:19 PM Richard Biener
  wrote:
> 
> On Thu, Jul 25, 2024 at 4:42 AM Kugan Vivekanandarajah
>  wrote:
>> 
>> On Tue, Jul 23, 2024 at 11:56 PM Richard Biener
>>  wrote:
>>> 
>>> On Tue, Jul 23, 2024 at 10:27 AM Kugan Vivekanandarajah
>>>  wrote:
 
 On Tue, Jul 23, 2024 at 10:35 AM Andrew Pinski 
  wrote:
> 
> On Mon, Jul 22, 2024 at 5:26 PM Kugan Vivekanandarajah
>  wrote:
>> 
>> Revised based on the comment and moved it into existing 
>> patterns as.
>> 
>> gcc/ChangeLog:
>> 
>> * match.pd: Extend A CMP 0 ? A : -A into (type)A CMP 0 ? A : 
>> -A.
>> Extend A CMP 0 ? A : -A into (type) A CMP 0 ? A : -A.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> * gcc.dg/tree-ssa/absfloat16.c: New test.
> 
> The testcase needs to make sure it runs only for targets that 
> support
> float16 so like:
> 
> /* { dg-require-effective-target float16 } */
> /* { dg-add-options float16 } */
 Added in the attached version.
>>> 
>>> + /* (type)A >=/> 0 ? A : -A    same as abs (A) */
>>> (for cmp (ge gt)
>>> (simplify
>>> -   (cnd (cmp @0 zerop) @1 (negate @1))
>>> -(if (!HONOR_SIGNED_ZEROS (TREE_TYPE(@0))
>>> -&& !TYPE_UNSIGNED (TREE_TYPE(@0))
>>> -&& bitwise_equal_p (@0, @1))
>>> +   (cnd (cmp (convert?@0 @1) zerop) @2 (negate @2))
>>> +(if (!HONOR_SIGNED_ZEROS (TREE_TYPE (@1))
>>> +&& !TYPE_UNSIGNED (TREE_TYPE (@1))
>>> +&& ((VECTOR_TYPE_P (type)
>>> + && tree_nop_conversion_p (TREE_TYPE (@0), 
>>> TREE_TYPE (@1)))
>>> +   || (!VECTOR_TYPE_P (type)
>>> +   && (TYPE_PRECISION (TREE_TYPE (@1))
>>> +   <= TYPE_PRECISION (TREE_TYPE (@0)
>>> +&& bitwise_equal_p (@1, @2))
>>> 
>>> I wonder about the bitwise_equal_p which tests @1 against @2 now
>>> with the convert still applied to @1 - that looks odd.  You are 
>>> allowing
>>> sign-changing conversions but doesn't that change ge/gt 
>>> behavior?
>>> Also why are sign/zero-extensions not OK for vector types?
>> Thanks for the review.
>> My main motivation here is for _Float16  as below.
>> 
>> _Float16 absfloat16 (_Float16 x)
>> {
>> float _1;
>> _Float16 _2;
>> _Float16 _4;
>>  [local count: 1073741824]:
>> _1 = (float) x_3(D);
>> if (_1 < 0.0)
>> goto ; [41.00%]
>> else
>> goto ; [59.00%]
>>  [local count: 440234144]:
>> _4 = -x_3(D);
>>  [local count: 1073741824]:
>> # _2 = PHI <_4(3), x_3(D)(2)>
>> return _2;
>> }

New Georgian PO file for 'gcc' (version 14.2.0)

2024-08-27 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Georgian team of translators.  The file is available at:

https://translationproject.org/latest/gcc/ka.po

(This file, 'gcc-14.2.0.ka.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.
