Re: [PATCH 39/52] i386: New hook implementation ix86_c_mode_for_floating_type

2024-06-03 Thread Uros Bizjak
On Mon, Jun 3, 2024 at 5:02 AM Kewen Lin  wrote:
>
> This is to remove macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
> defines in i386 port, and add new port specific hook
> implementation ix86_c_mode_for_floating_type.
>
> gcc/ChangeLog:
>
> * config/i386/i386.cc (ix86_c_mode_for_floating_type): New
> function.
> (TARGET_C_MODE_FOR_FLOATING_TYPE): New macro.
> * config/i386/i386.h (FLOAT_TYPE_SIZE): Remove.
> (DOUBLE_TYPE_SIZE): Likewise.
> (LONG_DOUBLE_TYPE_SIZE): Likewise.

OK.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386.cc | 15 +++
>  gcc/config/i386/i386.h  |  4 
>  2 files changed, 15 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 3e2a3a194f1..6abb6d7a1ca 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -25794,6 +25794,19 @@ ix86_bitint_type_info (int n, struct bitint_info 
> *info)
>return true;
>  }
>
> +/* Implement TARGET_C_MODE_FOR_FLOATING_TYPE.  Return DFmode, TFmode
> +   or XFmode for TI_LONG_DOUBLE_TYPE which is for long double type,
> +   based on long double bits, go with the default one for the others.  */
> +
> +static machine_mode
> +ix86_c_mode_for_floating_type (enum tree_index ti)
> +{
> +  if (ti == TI_LONG_DOUBLE_TYPE)
> +return (TARGET_LONG_DOUBLE_64 ? DFmode
> + : (TARGET_LONG_DOUBLE_128 ? TFmode : 
> XFmode));
> +  return default_mode_for_floating_type (ti);
> +}
> +
>  /* Returns modified FUNCTION_TYPE for cdtor callabi.  */
>  tree
>  ix86_cxx_adjust_cdtor_callabi_fntype (tree fntype)
> @@ -26419,6 +26432,8 @@ static const scoped_attribute_specs *const 
> ix86_attribute_table[] =
>  #define TARGET_C_EXCESS_PRECISION ix86_get_excess_precision
>  #undef TARGET_C_BITINT_TYPE_INFO
>  #define TARGET_C_BITINT_TYPE_INFO ix86_bitint_type_info
> +#undef TARGET_C_MODE_FOR_FLOATING_TYPE
> +#define TARGET_C_MODE_FOR_FLOATING_TYPE ix86_c_mode_for_floating_type
>  #undef TARGET_CXX_ADJUST_CDTOR_CALLABI_FNTYPE
>  #define TARGET_CXX_ADJUST_CDTOR_CALLABI_FNTYPE 
> ix86_cxx_adjust_cdtor_callabi_fntype
>  #undef TARGET_PROMOTE_PROTOTYPES
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 359a8408263..fad434c10d6 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -675,10 +675,6 @@ extern const char *host_detect_local_cpu (int argc, 
> const char **argv);
>  #define LONG_TYPE_SIZE (TARGET_X32 ? 32 : BITS_PER_WORD)
>  #define POINTER_SIZE (TARGET_X32 ? 32 : BITS_PER_WORD)
>  #define LONG_LONG_TYPE_SIZE 64
> -#define FLOAT_TYPE_SIZE 32
> -#define DOUBLE_TYPE_SIZE 64
> -#define LONG_DOUBLE_TYPE_SIZE \
> -  (TARGET_LONG_DOUBLE_64 ? 64 : (TARGET_LONG_DOUBLE_128 ? 128 : 80))
>
>  #define WIDEST_HARDWARE_FP_SIZE 80
>
> --
> 2.43.0
>


Re: [PATCH] [x86] Add some preference for floating point rtl ifcvt when sse4.1 is not available

2024-06-03 Thread Uros Bizjak
On Mon, Jun 3, 2024 at 5:11 AM liuhongt  wrote:
>
> W/o TARGET_SSE4_1, it takes 3 instructions (pand, pandn and por) for
> movdfcc/movsfcc, and could possibly fail cost comparison. Increase
> branch cost could hurt performance for other modes, so specially add
> some preference for floating point ifcvt.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> * config/i386/i386.cc (ix86_noce_conversion_profitable_p): Add
> some preference for floating point ifcvt when SSE4.1 is not
> available.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr115299.c: New test.
> * gcc.target/i386/pr86722.c: Adjust testcase.

LGTM.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386.cc  | 17 +
>  gcc/testsuite/gcc.target/i386/pr115299.c | 10 ++
>  gcc/testsuite/gcc.target/i386/pr86722.c  |  2 +-
>  3 files changed, 28 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr115299.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 1a0206ab573..271da127a89 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -24879,6 +24879,23 @@ ix86_noce_conversion_profitable_p (rtx_insn *seq, 
> struct noce_if_info *if_info)
> return false;
> }
>  }
> +
> +  /* W/o TARGET_SSE4_1, it takes 3 instructions (pand, pandn and por)
> + for movdfcc/movsfcc, and could possibly fail cost comparison.
> + Increase branch cost will hurt performance for other modes, so
> + specially add some preference for floating point ifcvt.  */
> +  if (!TARGET_SSE4_1 && if_info->x
> +  && GET_MODE_CLASS (GET_MODE (if_info->x)) == MODE_FLOAT
> +  && if_info->speed_p)
> +{
> +  unsigned cost = seq_cost (seq, true);
> +
> +  if (cost <= if_info->original_cost)
> +   return true;
> +
> +  return cost <= (if_info->max_seq_cost + COSTS_N_INSNS (2));
> +}
> +
>return default_noce_conversion_profitable_p (seq, if_info);
>  }
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr115299.c 
> b/gcc/testsuite/gcc.target/i386/pr115299.c
> new file mode 100644
> index 000..53c5899136a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr115299.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -mno-sse4.1 -msse2" } */
> +
> +void f(double*d,double*e){
> +  for(;d +*d=(*d<.5)?.7:0;
> +}
> +
> +/* { dg-final { scan-assembler {(?n)(?:cmpnltsd|cmpltsd)} } } */
> +/* { dg-final { scan-assembler {(?n)(?:andnpd|andpd)} } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr86722.c 
> b/gcc/testsuite/gcc.target/i386/pr86722.c
> index 4de2ca1a6c0..e266a1e56c2 100644
> --- a/gcc/testsuite/gcc.target/i386/pr86722.c
> +++ b/gcc/testsuite/gcc.target/i386/pr86722.c
> @@ -6,5 +6,5 @@ void f(double*d,double*e){
>  *d=(*d<.5)?.7:0;
>  }
>
> -/* { dg-final { scan-assembler-not "andnpd" } } */
> +/* { dg-final { scan-assembler-times {(?n)(?:andnpd|andpd)} 1 } } */
>  /* { dg-final { scan-assembler-not "orpd" } } */
> --
> 2.31.1
>


[patch] install.texi (nvptx): Recommend nvptx-tools 2024-05-30 (was: Re: nvptx target: Global constructor, destructor support, via nvptx-tools 'ld')

2024-06-03 Thread Tobias Burnus

Thomas Schwinge wrote:

In the following, I have then reconsidered that stance; we may actually
"Implement global constructor, destructor support in a conceptually
simpler way than using 'collect2' (the program): implement the respective
functionality in the nvptx-tools 'ld'".  The latter is

"ld: Global constructor/destructor support".


The attached patch makes clearer which version should be
installed by recommending this patch (= latest nvptx-tools)
in install.texi.

OK? Comments, remarks?

Tobias

PS: If the https://github.com/SourceryTools/nvptx-tools/pull/47
(nvptx-ld.cc: Improve C++11 compatibility with older compilers)
proofs worthwhile and gets merged, we should point to that commit
instead.install.texi (nvptx): Recommend nvptx-tools 2024-05-30

gcc/
	* doc/install.texi (nvptx): Recommend nvptx-tools 2024-05-30 or newer.

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 42b462a2ce2..4859f6743ab 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -4698,7 +4698,8 @@ Andes NDS32 target in big endian mode.
 Nvidia PTX target.
 
 Instead of GNU binutils, you will need to install
-@uref{https://github.com/SourceryTools/nvptx-tools,,nvptx-tools}.
+@uref{https://github.com/SourceryTools/nvptx-tools,,nvptx-tools}
+(recommended: 96f8fc5 of 2024-05-30 -- or newer).
 Tell GCC where to find it:
 @option{--with-build-time-tools=[install-nvptx-tools]/nvptx-none/bin}.
 


[nvptx] *ping* - [patch] [gcn][nvptx] Add warning to mkoffload for 32bit host code

2024-06-03 Thread Tobias Burnus

Hi Thomas, hi Tom,

any comment regarding this patch?
 https://gcc.gnu.org/pipermail/gcc-patches/2024-April/650007.html

Tobias

Am 25.04.24 um 12:51 schrieb Tobias Burnus:

Motivated by a surprise of a colleague that with -m32,
no offload dumps were created; that's because mkoffload
does not process host binaries when the are 32bit (i.e. ilp32).

Internally, that done as follows: The host compiler passes to
'mkoffload' the used host ABI, i.e. -foffload-abi=ilp32 or -foffload-abi=lp64

That's done via TARGET_OFFLOAD_OPTIONS, which is supported by aarch64, i386, 
and rs6000.

While it is sensible (albeit not strictly required) that GCC requires that
the host and device side agree and that only 64bit is implemented for the
device side, it can be confusing that silently no offloading code is generated.


Hence, I propose to print a warning in that case - as implemented in the 
attached patch:

$ gcc -fopenmp -m32 test.c
nvptx mkoffload: warning: offload code generation skipped: offloading with 
32-bit host code is currently not supported
gcn mkoffload: warning: offload code generation skipped: offloading with 32-bit 
host code is currently not supported

* * *

This shouldn't have any effect on offload builds using -m64
and non-offload builds – while several testcases already have
issues with '-m32' when offloading is enabled or an offloading
device is available.

To make it not worse, this patch adds some pruning and for
a subset of the failing testcases, I added code to avoids FAILS.
There are some more fails, but those aren't new.

Comments, remarks, suggestions?
Is the mkoffload.cc part is okay?

Tobias



[PATCH] pair-fusion: fix for older GCC

2024-06-03 Thread Marc Poulhiès
Older GCCs fail with:

  .../gcc/pair-fusion.cc: In member function ‘bool 
pair_fusion_bb_info::fuse_pair(bool, unsigned int, int, rtl_ssa::insn_info*, 
rtl_ssa::in
  sn_info*, base_cand&, const rtl_ssa::insn_range_info&)’:
  .../gcc/pair-fusion.cc:1790:40: error: ‘writeback’ is not a class, namespace, 
or enumeration
 if (m_pass->should_handle_writeback (writeback::ALL)

Renaming the enum type works around the name conflict with the local
variable and also prevents future similar conflicts.

gcc/ChangeLog:

* pair-fusion.h (enum class writeback): Rename to...
(enum class writeback_type): ...this.
(struct pair_fusion): Adjust type name after renaming.
* pair-fusion.cc (pair_fusion_bb_info::track_access): Likewise.
(pair_fusion_bb_info::fuse_pair): Likewise.
(pair_fusion::process_block): Likewise.
---
Patch discussed in 
https://inbox.sourceware.org/gcc-patches/mptwmn93njq@arm.com/

Tested on x86_64-linux-gnu. OK for master?

 gcc/pair-fusion.cc | 6 +++---
 gcc/pair-fusion.h  | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/pair-fusion.cc b/gcc/pair-fusion.cc
index 9f897ac04e2..26b2284ed37 100644
--- a/gcc/pair-fusion.cc
+++ b/gcc/pair-fusion.cc
@@ -426,7 +426,7 @@ pair_fusion_bb_info::track_access (insn_info *insn, bool 
load_p, rtx mem)
 return;
 
   // Ignore writeback accesses if the hook says to do so.
-  if (!m_pass->should_handle_writeback (writeback::EXISTING)
+  if (!m_pass->should_handle_writeback (writeback_type::EXISTING)
   && GET_RTX_CLASS (GET_CODE (XEXP (mem, 0))) == RTX_AUTOINC)
 return;
 
@@ -1787,7 +1787,7 @@ pair_fusion_bb_info::fuse_pair (bool load_p,
   // update of the base register and try and fold it in to make this into a
   // writeback pair.
   insn_info *trailing_add = nullptr;
-  if (m_pass->should_handle_writeback (writeback::ALL)
+  if (m_pass->should_handle_writeback (writeback_type::ALL)
   && !writeback_effect
   && (!load_p || (!refers_to_regno_p (base_regno, base_regno + 1,
 XEXP (pats[0], 0), nullptr)
@@ -2996,7 +2996,7 @@ void pair_fusion::process_block (bb_info *bb)
   rtx pat = PATTERN (rti);
   bool load_p;
   if (reload_completed
- && should_handle_writeback (writeback::ALL)
+ && should_handle_writeback (writeback_type::ALL)
  && pair_mem_insn_p (rti, load_p))
try_promote_writeback (insn, load_p);
 
diff --git a/gcc/pair-fusion.h b/gcc/pair-fusion.h
index 2a38dc8f743..45e4edceecb 100644
--- a/gcc/pair-fusion.h
+++ b/gcc/pair-fusion.h
@@ -75,7 +75,7 @@ struct alias_walker;
 
 // When querying should_handle_writeback, this enum is used to
 // qualify which opportunities we are asking about.
-enum class writeback {
+enum class writeback_type {
   // Only those writeback opportunities that arise from existing
   // auto-increment accesses.
   EXISTING,
@@ -123,7 +123,7 @@ struct pair_fusion {
   // Return true if we should try to handle writeback opportunities.
   // WHICH determines the kinds of writeback opportunities the caller
   // is asking about.
-  virtual bool should_handle_writeback (enum writeback which) = 0;
+  virtual bool should_handle_writeback (writeback_type which) = 0;
 
   // Given BASE_MEM, the mem from the lower candidate access for a pair,
   // and LOAD_P (true if the access is a load), check if we should proceed
-- 
2.45.1



Re: [Patch, aarch64, middle-end\ v4: Move pair_fusion pass from aarch64 to middle-end

2024-06-03 Thread Marc Poulhiès
Richard Sandiford  writes:

> Marc Poulhiès  writes:
>> Hello,
>>
>> I can't bootstrap using gcc 5.5 since this change. It fails with:
>>
>> .../gcc/pair-fusion.cc: In member function ‘bool 
>> pair_fusion_bb_info::fuse_pair(bool, unsigned int, int, rtl_ssa::insn_info*, 
>> rtl_ssa::in
>> sn_info*, base_cand&, const rtl_ssa::insn_range_info&)’:
>> .../gcc/pair-fusion.cc:1790:40: error: ‘writeback’ is not a class, 
>> namespace, or enumeration
>>if (m_pass->should_handle_writeback (writeback::ALL)
>> ^
>> Is it possible that C++11 enum classes are not correctly supported in
>> older GCC?
>
> Looks to be due to an overloading of "writeback", which is also a local
> variable in that function.
>
> One fix would be to rename the type to "writeback_type".
> FWIW, the "enum"s in "enum writeback" can also be removed,
> so it'd be s/enum writeback/writeback_type/.

Thanks Richard!

I've submitted a patch based on your suggestion and checked the compiler
was correctly built using gcc 5.5.

Marc


Re: [PATCH] pair-fusion: fix for older GCC

2024-06-03 Thread Richard Sandiford
Marc Poulhiès  writes:
> Older GCCs fail with:
>
>   .../gcc/pair-fusion.cc: In member function ‘bool 
> pair_fusion_bb_info::fuse_pair(bool, unsigned int, int, rtl_ssa::insn_info*, 
> rtl_ssa::in
>   sn_info*, base_cand&, const rtl_ssa::insn_range_info&)’:
>   .../gcc/pair-fusion.cc:1790:40: error: ‘writeback’ is not a class, 
> namespace, or enumeration
>  if (m_pass->should_handle_writeback (writeback::ALL)
>
> Renaming the enum type works around the name conflict with the local
> variable and also prevents future similar conflicts.
>
> gcc/ChangeLog:
>
>   * pair-fusion.h (enum class writeback): Rename to...
>   (enum class writeback_type): ...this.
>   (struct pair_fusion): Adjust type name after renaming.
>   * pair-fusion.cc (pair_fusion_bb_info::track_access): Likewise.
>   (pair_fusion_bb_info::fuse_pair): Likewise.
>   (pair_fusion::process_block): Likewise.

OK, thanks, and sorry for missing this during the review.

Richard

> ---
> Patch discussed in 
> https://inbox.sourceware.org/gcc-patches/mptwmn93njq@arm.com/
>
> Tested on x86_64-linux-gnu. OK for master?
>
>  gcc/pair-fusion.cc | 6 +++---
>  gcc/pair-fusion.h  | 4 ++--
>  2 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/pair-fusion.cc b/gcc/pair-fusion.cc
> index 9f897ac04e2..26b2284ed37 100644
> --- a/gcc/pair-fusion.cc
> +++ b/gcc/pair-fusion.cc
> @@ -426,7 +426,7 @@ pair_fusion_bb_info::track_access (insn_info *insn, bool 
> load_p, rtx mem)
>  return;
>  
>// Ignore writeback accesses if the hook says to do so.
> -  if (!m_pass->should_handle_writeback (writeback::EXISTING)
> +  if (!m_pass->should_handle_writeback (writeback_type::EXISTING)
>&& GET_RTX_CLASS (GET_CODE (XEXP (mem, 0))) == RTX_AUTOINC)
>  return;
>  
> @@ -1787,7 +1787,7 @@ pair_fusion_bb_info::fuse_pair (bool load_p,
>// update of the base register and try and fold it in to make this into a
>// writeback pair.
>insn_info *trailing_add = nullptr;
> -  if (m_pass->should_handle_writeback (writeback::ALL)
> +  if (m_pass->should_handle_writeback (writeback_type::ALL)
>&& !writeback_effect
>&& (!load_p || (!refers_to_regno_p (base_regno, base_regno + 1,
>XEXP (pats[0], 0), nullptr)
> @@ -2996,7 +2996,7 @@ void pair_fusion::process_block (bb_info *bb)
>rtx pat = PATTERN (rti);
>bool load_p;
>if (reload_completed
> -   && should_handle_writeback (writeback::ALL)
> +   && should_handle_writeback (writeback_type::ALL)
> && pair_mem_insn_p (rti, load_p))
>   try_promote_writeback (insn, load_p);
>  
> diff --git a/gcc/pair-fusion.h b/gcc/pair-fusion.h
> index 2a38dc8f743..45e4edceecb 100644
> --- a/gcc/pair-fusion.h
> +++ b/gcc/pair-fusion.h
> @@ -75,7 +75,7 @@ struct alias_walker;
>  
>  // When querying should_handle_writeback, this enum is used to
>  // qualify which opportunities we are asking about.
> -enum class writeback {
> +enum class writeback_type {
>// Only those writeback opportunities that arise from existing
>// auto-increment accesses.
>EXISTING,
> @@ -123,7 +123,7 @@ struct pair_fusion {
>// Return true if we should try to handle writeback opportunities.
>// WHICH determines the kinds of writeback opportunities the caller
>// is asking about.
> -  virtual bool should_handle_writeback (enum writeback which) = 0;
> +  virtual bool should_handle_writeback (writeback_type which) = 0;
>  
>// Given BASE_MEM, the mem from the lower candidate access for a pair,
>// and LOAD_P (true if the access is a load), check if we should proceed


[wwwdocs] gcc-15/changes.html (nvptx): Constructors are now supported

2024-06-03 Thread Tobias Burnus

Comments or fine as is?

Tobias
gcc-15/changes.html (nvptx): Constructors are now supported

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index b59fd3be..b3305079 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -85,7 +103,14 @@ a work-in-progress.
 
 
 
-
+NVPTX
+
+
+  GCC's nvptx target now supports constructors and destructors;
+  for this, a recent version of nvptx-tools is https://gcc.gnu.org/install/specific.html#nvptx-x-none";
+  >required.
+
 
 
 



Re: [patch] install.texi (nvptx): Recommend nvptx-tools 2024-05-30 (was: Re: nvptx target: Global constructor, destructor support, via nvptx-tools 'ld')

2024-06-03 Thread Richard Biener
On Mon, 3 Jun 2024, Tobias Burnus wrote:

> Thomas Schwinge wrote:
> > In the following, I have then reconsidered that stance; we may actually
> > "Implement global constructor, destructor support in a conceptually
> > simpler way than using 'collect2' (the program): implement the respective
> > functionality in the nvptx-tools 'ld'".  The latter is
> > 
> > "ld: Global constructor/destructor support".
> 
> The attached patch makes clearer which version should be
> installed by recommending this patch (= latest nvptx-tools)
> in install.texi.
> 
> OK? Comments, remarks?

Can we simply say "newerst" where I guess refering to a github repo
already implies this?

> Tobias
> 
> PS: If the https://github.com/SourceryTools/nvptx-tools/pull/47
> (nvptx-ld.cc: Improve C++11 compatibility with older compilers)
> proofs worthwhile and gets merged, we should point to that commit
> instead.


RE: [PATCH 1/3] vect: generate suitable convert insn for int -> int, float -> float and int <-> float.

2024-06-03 Thread Hu, Lin1
> -Original Message-
> From: Richard Biener 
> Sent: Friday, May 31, 2024 8:41 PM
> To: Hu, Lin1 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> ubiz...@gmail.com
> Subject: RE: [PATCH 1/3] vect: generate suitable convert insn for int -> int, 
> float
> -> float and int <-> float.
> 
> On Fri, 31 May 2024, Hu, Lin1 wrote:
> 
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Wednesday, May 29, 2024 5:41 PM
> > > To: Hu, Lin1 
> > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> > > ubiz...@gmail.com
> > > Subject: Re: [PATCH 1/3] vect: generate suitable convert insn for
> > > int -> int, float
> > > -> float and int <-> float.
> > >
> > > On Thu, 23 May 2024, Hu, Lin1 wrote:
> > >
> > > > gcc/ChangeLog:
> > > >
> > > > PR target/107432
> > > > * tree-vect-generic.cc
> > > > (supportable_indirect_narrowing_operation): New function for
> > > > support indirect narrowing convert.
> > > > (supportable_indirect_widening_operation): New function for
> > > > support indirect widening convert.
> > > > (expand_vector_conversion): Support convert for int -> int,
> > > > float -> float and int <-> float.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > PR target/107432
> > > > * gcc.target/i386/pr107432-1.c: New test.
> > > > * gcc.target/i386/pr107432-2.c: Ditto.
> > > > * gcc.target/i386/pr107432-3.c: Ditto.
> > > > * gcc.target/i386/pr107432-4.c: Ditto.
> > > > * gcc.target/i386/pr107432-5.c: Ditto.
> > > > * gcc.target/i386/pr107432-6.c: Ditto.
> > > > * gcc.target/i386/pr107432-7.c: Ditto.
> > > > ---
> > > > diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc
> > > > index
> > > > ab640096ca2..0bedb53d9f9 100644
> > > > --- a/gcc/tree-vect-generic.cc
> > > > +++ b/gcc/tree-vect-generic.cc
> > > > @@ -45,6 +45,8 @@ along with GCC; see the file COPYING3.  If not
> > > > see #include "gimple-match.h"
> > > >  #include "recog.h" /* FIXME: for insn_data */
> > > >  #include "optabs-libfuncs.h"
> > > > +#include "cfgloop.h"
> > > > +#include "tree-vectorizer.h"
> > > >
> > > >
> > > >  /* Build a ternary operation and gimplify it.  Emit code before GSI.
> > > > @@ -1834,6 +1836,142 @@ do_vec_narrow_conversion
> > > (gimple_stmt_iterator *gsi, tree inner_type, tree a,
> > > >return gimplify_build2 (gsi, code, outer_type, b, c);  }
> > > >
> > > > +/* A subroutine of expand_vector_conversion, support indirect
> > > > +conversion
> > > for
> > > > +   float <-> int, like double -> char.  */ bool
> > > > +supportable_indirect_narrowing_operation (gimple_stmt_iterator *gsi,
> > > > +enum tree_code code,
> > > > +tree lhs,
> > > > +tree arg)
> > > > +{
> > > > +  gimple *g;
> > > > +  tree ret_type = TREE_TYPE (lhs);
> > > > +  tree arg_type = TREE_TYPE (arg);
> > > > +  tree new_rhs;
> > > > +
> > > > +  unsigned int ret_elt_bits = vector_element_bits (ret_type);
> > > > + unsigned int arg_elt_bits = vector_element_bits (arg_type);  if
> > > > + (code != FIX_TRUNC_EXPR || flag_trapping_math || ret_elt_bits >=
> > > arg_elt_bits)
> > > > +return false;
> > > > +
> > > > +  unsigned short target_size;
> > > > +  scalar_mode tmp_cvt_mode;
> > > > +  scalar_mode lhs_mode = GET_MODE_INNER (TYPE_MODE (ret_type));
> > > > + scalar_mode rhs_mode = GET_MODE_INNER (TYPE_MODE (arg_type));
> > > > + tree cvt_type = NULL_TREE;  tmp_cvt_mode = lhs_mode;
> > > > + target_size = GET_MODE_SIZE (rhs_mode);
> > > > +
> > > > +  opt_scalar_mode mode_iter;
> > > > +  enum tree_code tc1, tc2;
> > > > +  unsigned HOST_WIDE_INT nelts
> > > > += constant_lower_bound (TYPE_VECTOR_SUBPARTS (arg_type));
> > > > +
> > > > +  FOR_EACH_2XWIDER_MODE (mode_iter, tmp_cvt_mode)
> > > > +{
> > > > +  tmp_cvt_mode = mode_iter.require ();
> > > > +
> > > > +  if (GET_MODE_SIZE (tmp_cvt_mode) > target_size)
> > > > +   break;
> > > > +
> > > > +  scalar_mode cvt_mode;
> > > > +  int tmp_cvt_size = GET_MODE_BITSIZE (tmp_cvt_mode);
> > > > +  if (!int_mode_for_size (tmp_cvt_size, 0).exists (&cvt_mode))
> > > > +   break;
> > > > +
> > > > +  int cvt_size = GET_MODE_BITSIZE (cvt_mode);
> > > > +  bool isUnsigned = TYPE_UNSIGNED (ret_type) || TYPE_UNSIGNED
> > > (arg_type);
> > > > +  cvt_type = build_nonstandard_integer_type (cvt_size,
> > > > + isUnsigned);
> > > > +
> > > > +  cvt_type = build_vector_type (cvt_type, nelts);
> > > > +  if (cvt_type == NULL_TREE
> > > > + || !supportable_convert_operation ((tree_code) NOP_EXPR,
> > > > +ret_type,
> > > > +cvt_type, &tc1)
> > > > + || !supportable_convert_operation ((tree_code) code,
> > > > +cvt_type

Re: [patch] install.texi (nvptx): Recommend nvptx-tools 2024-05-30

2024-06-03 Thread Tobias Burnus

Richard Biener wrote:

On Mon, 3 Jun 2024, Tobias Burnus wrote:

Thomas Schwinge wrote:

In the following, I have then reconsidered that stance; we may actually
"Implement global constructor, destructor support in a conceptually
simpler way than using 'collect2' (the program): implement the respective
functionality in the nvptx-tools 'ld'".  The latter is

"ld: Global constructor/destructor support".

The attached patch makes clearer which version should be
installed by recommending this patch (= latest nvptx-tools)
in install.texi.

Can we simply say "newerst" where I guess refering to a github repo
already implies this?


Good question. The problem I see with just referring to a repository 
(even with newest) often means: yes, that software I have (whatever 
version). While if some reference goes to a 2024 version, I might not 
know what version I have but likely an older version → I will update.


Admittedly, as people tend to *not* read the documentation, this 
approach might fail as well. But, maybe, it is sufficient to update GCC 
15's release notes?*


It won't help those not reading with the release notes before building 
and the wording* had to be changed a bit as install.texi no longer 
states what version should be used, but it would be an alternative


(*) https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653417.html

Tobias



Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion

2024-06-03 Thread Richard Sandiford
Ajit Agarwal  writes:
> Hello Richard:
> On 31/05/24 8:08 pm, Richard Sandiford wrote:
>> Ajit Agarwal  writes:
>>> On 31/05/24 3:23 pm, Richard Sandiford wrote:
 Ajit Agarwal  writes:
> Hello All:
>
> Common infrastructure using generic code for pair mem fusion of different
> targets.
>
> rs6000 target specific specific code implements virtual functions defined
> by generic code.
>
> Code is implemented with pure virtual functions to interface with target
> code.
>
> Target specific code are added in rs6000-mem-fusion.cc and additional 
> virtual
> function implementation required for rs6000 are added in 
> aarch64-ldp-fusion.cc.
>
> Bootstrapped and regtested for aarch64-linux-gnu and powerpc64-linux-gnu.
>
> Thanks & Regards
> Ajit
>
>
> aarch64, rs6000, middle-end: Add implementation for different targets for 
> pair mem fusion
>
> Common infrastructure using generic code for pair mem fusion of different
> targets.
>
> rs6000 target specific specific code implements virtual functions defined
> by generic code.
>
> Code is implemented with pure virtual functions to interface with target
> code.
>
> Target specific code are added in rs6000-mem-fusion.cc and additional 
> virtual
> function implementation required for rs6000 are added in 
> aarch64-ldp-fusion.cc.
>
> 2024-05-31  Ajit Kumar Agarwal  
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-ldp-fusion.cc: Add target specific
>   implementation of additional virtual functions added in pair_fusion
>   struct.
>   * config/rs6000/rs6000-passes.def: New mem fusion pass
>   before pass_early_remat.
>   * config/rs6000/rs6000-mem-fusion.cc: Add new pass.
>   Add target specific implementation using pure virtual
>   functions.
>   * config.gcc: Add new object file.
>   * config/rs6000/rs6000-protos.h: Add new prototype for mem
>   fusion pass.
>   * config/rs6000/t-rs6000: Add new rule.
>   * rtl-ssa/accesses.h: Moved set_is_live_out_use as public
>   from private.
>
> gcc/testsuite/ChangeLog:
>
>   * g++.target/powerpc/me-fusion.C: New test.
>   * g++.target/powerpc/mem-fusion-1.C: New test.
>   * gcc.target/powerpc/mma-builtin-1.c: Modify test.
> ---

 This isn't a complete review, just some initial questions & comments
 about selected parts.

> [...]
> +/* Check whether load can be fusable or not.
> +   Return true if dependent use is UNSPEC otherwise false.  */
> +bool
> +rs6000_pair_fusion::fuseable_load_p (insn_info *info)
> +{
> +  rtx_insn *insn = info->rtl ();
> +
> +  for (rtx note = REG_NOTES (insn); note; note = XEXP (note, 1))
> +if (REG_NOTE_KIND (note) == REG_EQUAL
> + || REG_NOTE_KIND (note) == REG_EQUIV)
> +  return false;

 It's unusual to punt on an optimisation because of a REG_EQUAL/EQUIV
 note.  What's the reason for doing this?  Are you trying to avoid
 fusing pairs before reload that are equivalent to a MEM (i.e. have
 a natural spill slot)?  I think Alex hit a similar situation.

>>>
>>> We have used the above check because of some SPEC benchmarks failing with
>>> with MEM pairs having REG_EQUAL/EQUIV notes.
>>>
>>> By adding the checks the benchmarks passes and also it improves the
>>> performance.
>>>
>>> This checks were added during initial implementation of pair fusion
>>> pass.
>>>
>>> I will investigate further if this check is still required or not.
>> 
>> Thanks.  If it does affect SPEC results, it would be good to look
>> at the underlying reason, as a justification for the check.
>> 
>> AIUI, the case Alex was due to the way that the RA recognises:
>> 
>>   (set (reg R) (mem address-of-a-stack-variable))
>> REG_EQUIV: (mem address-of-a-stack-variable)
>> 
>> where the REG_EQUIV is either explicit or detected by the RA.
>> If R needs to be spilled, it can then be spilled to its existing
>> location on the stack.  And if R needs to be spilled in the
>> instruction above (because of register pressure before the first
>> use of R), the RA is able to delete the instruction.
>> 
>> But if that is the reason, the condition should be restricted
>> to cases in which the note is a memory.
>> 
>> I think Alex had tried something similar and found that it wasn't
>> always effective.
>> 
>
> Sure I will check.
>>> [...]
> +&& info->is_real ())
> +   {
> + rtx_insn *rtl_insn = info->rtl ();
> + rtx set = single_set (rtl_insn);
> +
> + if (set == NULL_RTX)
> +   return false;
> +
> + rtx op0 = SET_SRC (set);
> + if (GET_CODE (op0) != UNSPEC)
> +   return false;

 What's the motivation for rejecting unspecs?  It's unsual to trea

[COMMITTED] testsuite: Require vect_shift in gcc.dg/vect/pr112325.c [PR115303]

2024-06-03 Thread Rainer Orth
The new gcc.dg/vect/pr112325.c test FAILs on Solaris/SPARC:

FAIL: gcc.dg/vect/pr112325.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorized 1 loops" 1
FAIL: gcc.dg/vect/pr112325.c scan-tree-dump-times vect "vectorized 1 loops" 1

As analyzed in the PR, the test requires vect_shift, so this patch adds
that requirement.

Tested on i386-pc-solaris2.11 and sparc-sun-solaris2.11.

Pre-approved by Richard in the PR, committed to trunk.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2024-06-03  Rainer Orth  

gcc/testsuite:
PR tree-optimization/115303
* gcc.dg/vect/pr112325.c: Require vect_shift.

diff --git a/gcc/testsuite/gcc.dg/vect/pr112325.c b/gcc/testsuite/gcc.dg/vect/pr112325.c
--- a/gcc/testsuite/gcc.dg/vect/pr112325.c
+++ b/gcc/testsuite/gcc.dg/vect/pr112325.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O3 -funroll-loops -fdump-tree-vect-details" } */
 /* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
 /* { dg-additional-options "-mavx2" { target x86_64-*-* i?86-*-* } } */
 
 typedef unsigned short ggml_fp16_t;


Re: [PATCH 02/52] d: Replace use of LONG_DOUBLE_TYPE_SIZE

2024-06-03 Thread Iain Buclaw
Excerpts from Kewen Lin's message of Juni 3, 2024 5:00 am:
> Joseph pointed out "floating types should have their mode,
> not a poorly defined precision value" in the discussion[1],
> as he and Richi suggested, the existing macros
> {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
> hook mode_for_floating_type.  To be prepared for that, this
> patch is to replace use of LONG_DOUBLE_TYPE_SIZE in d with
> TYPE_PRECISION of long_double_type_node.
> 
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html
> 

Thanks, one question though: Is TYPE_PRECISION really equivalent to
LONG_DOUBLE_TYPE_SIZE?

Unless LONG_DOUBLE_TYPE_SIZE was poorly named to begin with, I'd assume
the answer to be "no".

i.e: TYPE_PRECISION = 80, but LONG_DOUBLE_TYPE_SIZE = 96 or 128.

Iain.


Re: [PATCH 01/52] ada: Replace use of LONG_DOUBLE_TYPE_SIZE

2024-06-03 Thread Eric Botcazou
> Joseph pointed out "floating types should have their mode,
> not a poorly defined precision value" in the discussion[1],
> as he and Richi suggested, the existing macros
> {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
> hook mode_for_floating_type.  To be prepared for that, this
> patch is to replace use of LONG_DOUBLE_TYPE_SIZE in ada
> with TYPE_PRECISION of long_double_type_node.
> 
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html
> 
> gcc/ada/ChangeLog:
> 
>   * gcc-interface/decl.cc (gnat_to_gnu_entity): Use TYPE_PRECISION of
>   long_double_type_node to replace LONG_DOUBLE_TYPE_SIZE.

OK, thanks.

-- 
Eric Botcazou




Re: [PATCH 02/52] d: Replace use of LONG_DOUBLE_TYPE_SIZE

2024-06-03 Thread Kewen.Lin
Hi Iain,

on 2024/6/3 16:40, Iain Buclaw wrote:
> Excerpts from Kewen Lin's message of Juni 3, 2024 5:00 am:
>> Joseph pointed out "floating types should have their mode,
>> not a poorly defined precision value" in the discussion[1],
>> as he and Richi suggested, the existing macros
>> {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
>> hook mode_for_floating_type.  To be prepared for that, this
>> patch is to replace use of LONG_DOUBLE_TYPE_SIZE in d with
>> TYPE_PRECISION of long_double_type_node.
>>
>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html
>>
> 
> Thanks, one question though: Is TYPE_PRECISION really equivalent to
> LONG_DOUBLE_TYPE_SIZE?

Yes, it's guaranteed by the code in build_common_tree_nodes:

  long_double_type_node = make_node (REAL_TYPE);
  TYPE_PRECISION (long_double_type_node) = LONG_DOUBLE_TYPE_SIZE;
  layout_type (long_double_type_node);

, the macro LONG_DOUBLE_TYPE_SIZE is assigned to TYPE_PRECISION of
long_double_type_node, layout_type will only pick up one mode as
the given precision and won't change it.

> 
> Unless LONG_DOUBLE_TYPE_SIZE was poorly named to begin with, I'd assume
> the answer to be "no".

I'm afraid it's poorly named before.

> 
> i.e: TYPE_PRECISION = 80, but LONG_DOUBLE_TYPE_SIZE = 96 or 128.

>From what I interpreted from the code, it should never happen.

BR,
Kewen



Re: [PATCH 51/52] sparc: New hook implementation sparc_c_mode_for_floating_type

2024-06-03 Thread Eric Botcazou
>   * config/sparc/sparc.cc (sparc_c_mode_for_floating_type): New
>   (TARGET_C_MODE_FOR_FLOATING_TYPE): New macro.
>   (FLOAT_TYPE_SIZE): Remove.
>   (DOUBLE_TYPE_SIZE): Likewise.
>   (LONG_DOUBLE_TYPE_SIZE): Rename to ...
>   (SPARC_LONG_DOUBLE_TYPE_SIZE): ... this.
>   (sparc_type_code): Replace FLOAT_TYPE_SIZE with TYPE_PRECISION of
>   float_type_node.
>   * config/sparc/sparc.h (FLOAT_TYPE_SIZE): Remove.
>   (DOUBLE_TYPE_SIZE): Remove.
>   * config/sparc/freebsd.h (LONG_DOUBLE_TYPE_SIZE): Rename to ...
>   (SPARC_LONG_DOUBLE_TYPE_SIZE): ... this.
>   * config/sparc/linux.h (LONG_DOUBLE_TYPE_SIZE): Rename to ...
>   (SPARC_LONG_DOUBLE_TYPE_SIZE): ... this.
>   * config/sparc/linux64.h (LONG_DOUBLE_TYPE_SIZE): Rename to ...
>   (SPARC_LONG_DOUBLE_TYPE_SIZE): ... this.
>   * config/sparc/netbsd-elf.h (LONG_DOUBLE_TYPE_SIZE): Rename to ...
>   (SPARC_LONG_DOUBLE_TYPE_SIZE): ... this.
>   * config/sparc/openbsd64.h (LONG_DOUBLE_TYPE_SIZE): Rename to ...
>   (SPARC_LONG_DOUBLE_TYPE_SIZE): ... this.
>   * config/sparc/sol2.h (LONG_DOUBLE_TYPE_SIZE): Rename to ...
>   (SPARC_LONG_DOUBLE_TYPE_SIZE): ... this.
>   * config/sparc/sp-elf.h (LONG_DOUBLE_TYPE_SIZE): Rename to ...
>   (SPARC_LONG_DOUBLE_TYPE_SIZE): ... this.
>   * config/sparc/sp64-elf.h (LONG_DOUBLE_TYPE_SIZE): Rename to ...
>   (SPARC_LONG_DOUBLE_TYPE_SIZE): ... this.

OK, modulo the following tweaks:

> --- a/gcc/config/sparc/sparc.cc
> +++ b/gcc/config/sparc/sparc.cc
> @@ -718,6 +718,7 @@ static bool sparc_vectorize_vec_perm_const
> (machine_mode, machine_mode, const vec_perm_indices &);
>  static bool sparc_can_follow_jump (const rtx_insn *, const rtx_insn *);
>  static HARD_REG_SET sparc_zero_call_used_regs (HARD_REG_SET);
> +static machine_mode sparc_c_mode_for_floating_type (enum tree_index);
>  
>  #ifdef SUBTARGET_ATTRIBUTE_TABLE
>  /* Table of valid machine attributes.  */
> @@ -971,6 +972,9 @@ char sparc_hard_reg_printed[8];
>  #undef TARGET_ZERO_CALL_USED_REGS
>  #define TARGET_ZERO_CALL_USED_REGS sparc_zero_call_used_regs
> 
> +#undef TARGET_C_MODE_FOR_FLOATING_TYPE
> +#define TARGET_C_MODE_FOR_FLOATING_TYPE sparc_c_mode_for_floating_type
> +
>  struct gcc_target targetm = TARGET_INITIALIZER;
> 
>  /* Return the memory reference contained in X if any, zero otherwise.  */
> @@ -9824,16 +9828,9 @@ sparc_assemble_integer (rtx x, unsigned int size, int
> aligned_p) #define LONG_LONG_TYPE_SIZE (BITS_PER_WORD * 2)
>  #endif
> 
> -#ifndef FLOAT_TYPE_SIZE
> -#define FLOAT_TYPE_SIZE BITS_PER_WORD
> -#endif
> -
> -#ifndef DOUBLE_TYPE_SIZE
> -#define DOUBLE_TYPE_SIZE (BITS_PER_WORD * 2)
> -#endif
> -
> -#ifndef LONG_DOUBLE_TYPE_SIZE
> -#define LONG_DOUBLE_TYPE_SIZE (BITS_PER_WORD * 2)
> +/* LONG_DOUBLE_TYPE_SIZE get poisoned, so add SPARC_ prefix.  */
> +#ifndef SPARC_LONG_LONG_TYPE_SIZE
> +#define SPARC_LONG_DOUBLE_TYPE_SIZE (BITS_PER_WORD * 2)
>  #endif
> 
>  unsigned long

You can delete {SPARC_}LONG_DOUBLE_TYPE_SIZE too.

> @@ -9920,7 +9917,7 @@ sparc_type_code (tree type)
> /* Carefully distinguish all the standard types of C,
>without messing up if the language is not C.  */
> 
> -   if (TYPE_PRECISION (type) == FLOAT_TYPE_SIZE)
> +   if (TYPE_PRECISION (type) == TYPE_PRECISION (float_type_node))
>   return (qualifiers | 6);
> 
> else
> @@ -13984,4 +13981,16 @@ sparc_zero_call_used_regs (HARD_REG_SET
> need_zeroed_hardregs) return need_zeroed_hardregs;
>  }
> 
> +/* Implement TARGET_C_MODE_FOR_FLOATING_TYPE.  Return TFmode or DFmode
> +   for TI_LONG_DOUBLE_TYPE which is for long double type, go with the
> +   default one for the others.  */
> +
> +static machine_mode
> +sparc_c_mode_for_floating_type (enum tree_index ti)
> +{
> +  if (ti == TI_LONG_DOUBLE_TYPE)
> +return SPARC_LONG_DOUBLE_TYPE_SIZE == 128 ? TFmode : DFmode;
> +  return default_mode_for_floating_type (ti);
> +}
> +
>  #include "gt-sparc.h"

I think that TI_LONG_DOUBLE_TYPE is self-explanatory so just:

/* Implement TARGET_C_MODE_FOR_FLOATING_TYPE.  Return TFmode or DFmode
   for TI_LONG_DOUBLE_TYPE and the default for others.

-- 
Eric Botcazou




RE: [PATCH 1/3] vect: generate suitable convert insn for int -> int, float -> float and int <-> float.

2024-06-03 Thread Richard Biener
On Mon, 3 Jun 2024, Hu, Lin1 wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Friday, May 31, 2024 8:41 PM
> > To: Hu, Lin1 
> > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> > ubiz...@gmail.com
> > Subject: RE: [PATCH 1/3] vect: generate suitable convert insn for int -> 
> > int, float
> > -> float and int <-> float.
> > 
> > On Fri, 31 May 2024, Hu, Lin1 wrote:
> > 
> > > > -Original Message-
> > > > From: Richard Biener 
> > > > Sent: Wednesday, May 29, 2024 5:41 PM
> > > > To: Hu, Lin1 
> > > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> > > > ubiz...@gmail.com
> > > > Subject: Re: [PATCH 1/3] vect: generate suitable convert insn for
> > > > int -> int, float
> > > > -> float and int <-> float.
> > > >
> > > > On Thu, 23 May 2024, Hu, Lin1 wrote:
> > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > >   PR target/107432
> > > > >   * tree-vect-generic.cc
> > > > >   (supportable_indirect_narrowing_operation): New function for
> > > > >   support indirect narrowing convert.
> > > > >   (supportable_indirect_widening_operation): New function for
> > > > >   support indirect widening convert.
> > > > >   (expand_vector_conversion): Support convert for int -> int,
> > > > >   float -> float and int <-> float.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > >   PR target/107432
> > > > >   * gcc.target/i386/pr107432-1.c: New test.
> > > > >   * gcc.target/i386/pr107432-2.c: Ditto.
> > > > >   * gcc.target/i386/pr107432-3.c: Ditto.
> > > > >   * gcc.target/i386/pr107432-4.c: Ditto.
> > > > >   * gcc.target/i386/pr107432-5.c: Ditto.
> > > > >   * gcc.target/i386/pr107432-6.c: Ditto.
> > > > >   * gcc.target/i386/pr107432-7.c: Ditto.
> > > > > ---
> > > > > diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc
> > > > > index
> > > > > ab640096ca2..0bedb53d9f9 100644
> > > > > --- a/gcc/tree-vect-generic.cc
> > > > > +++ b/gcc/tree-vect-generic.cc
> > > > > @@ -45,6 +45,8 @@ along with GCC; see the file COPYING3.  If not
> > > > > see #include "gimple-match.h"
> > > > >  #include "recog.h"   /* FIXME: for insn_data */
> > > > >  #include "optabs-libfuncs.h"
> > > > > +#include "cfgloop.h"
> > > > > +#include "tree-vectorizer.h"
> > > > >
> > > > >
> > > > >  /* Build a ternary operation and gimplify it.  Emit code before GSI.
> > > > > @@ -1834,6 +1836,142 @@ do_vec_narrow_conversion
> > > > (gimple_stmt_iterator *gsi, tree inner_type, tree a,
> > > > >return gimplify_build2 (gsi, code, outer_type, b, c);  }
> > > > >
> > > > > +/* A subroutine of expand_vector_conversion, support indirect
> > > > > +conversion
> > > > for
> > > > > +   float <-> int, like double -> char.  */ bool
> > > > > +supportable_indirect_narrowing_operation (gimple_stmt_iterator *gsi,
> > > > > +  enum tree_code code,
> > > > > +  tree lhs,
> > > > > +  tree arg)
> > > > > +{
> > > > > +  gimple *g;
> > > > > +  tree ret_type = TREE_TYPE (lhs);
> > > > > +  tree arg_type = TREE_TYPE (arg);
> > > > > +  tree new_rhs;
> > > > > +
> > > > > +  unsigned int ret_elt_bits = vector_element_bits (ret_type);
> > > > > + unsigned int arg_elt_bits = vector_element_bits (arg_type);  if
> > > > > + (code != FIX_TRUNC_EXPR || flag_trapping_math || ret_elt_bits >=
> > > > arg_elt_bits)
> > > > > +return false;
> > > > > +
> > > > > +  unsigned short target_size;
> > > > > +  scalar_mode tmp_cvt_mode;
> > > > > +  scalar_mode lhs_mode = GET_MODE_INNER (TYPE_MODE (ret_type));
> > > > > + scalar_mode rhs_mode = GET_MODE_INNER (TYPE_MODE (arg_type));
> > > > > + tree cvt_type = NULL_TREE;  tmp_cvt_mode = lhs_mode;
> > > > > + target_size = GET_MODE_SIZE (rhs_mode);
> > > > > +
> > > > > +  opt_scalar_mode mode_iter;
> > > > > +  enum tree_code tc1, tc2;
> > > > > +  unsigned HOST_WIDE_INT nelts
> > > > > += constant_lower_bound (TYPE_VECTOR_SUBPARTS (arg_type));
> > > > > +
> > > > > +  FOR_EACH_2XWIDER_MODE (mode_iter, tmp_cvt_mode)
> > > > > +{
> > > > > +  tmp_cvt_mode = mode_iter.require ();
> > > > > +
> > > > > +  if (GET_MODE_SIZE (tmp_cvt_mode) > target_size)
> > > > > + break;
> > > > > +
> > > > > +  scalar_mode cvt_mode;
> > > > > +  int tmp_cvt_size = GET_MODE_BITSIZE (tmp_cvt_mode);
> > > > > +  if (!int_mode_for_size (tmp_cvt_size, 0).exists (&cvt_mode))
> > > > > + break;
> > > > > +
> > > > > +  int cvt_size = GET_MODE_BITSIZE (cvt_mode);
> > > > > +  bool isUnsigned = TYPE_UNSIGNED (ret_type) || TYPE_UNSIGNED
> > > > (arg_type);
> > > > > +  cvt_type = build_nonstandard_integer_type (cvt_size,
> > > > > + isUnsigned);
> > > > > +
> > > > > +  cvt_type = build_vector_type (cvt_type, nelts);
> > > > > +  if (cvt_type == NULL_TREE
> > > > > +   || !supportable_convert_operation ((tree_code) NOP_EXPR,
> > > > > +

Re: [patch] install.texi (nvptx): Recommend nvptx-tools 2024-05-30

2024-06-03 Thread Richard Biener
On Mon, 3 Jun 2024, Tobias Burnus wrote:

> Richard Biener wrote:
> > On Mon, 3 Jun 2024, Tobias Burnus wrote:
> >> Thomas Schwinge wrote:
> >>> In the following, I have then reconsidered that stance; we may actually
> >>> "Implement global constructor, destructor support in a conceptually
> >>> simpler way than using 'collect2' (the program): implement the respective
> >>> functionality in the nvptx-tools 'ld'".  The latter is
> >>> 
> >>> "ld: Global constructor/destructor support".
> >> The attached patch makes clearer which version should be
> >> installed by recommending this patch (= latest nvptx-tools)
> >> in install.texi.
> > Can we simply say "newerst" where I guess refering to a github repo
> > already implies this?
> 
> Good question. The problem I see with just referring to a repository (even
> with newest) often means: yes, that software I have (whatever version). While
> if some reference goes to a 2024 version, I might not know what version I have
> but likely an older version → I will update.
> 
> Admittedly, as people tend to *not* read the documentation, this approach
> might fail as well. But, maybe, it is sufficient to update GCC 15's release
> notes?*
> 
> It won't help those not reading with the release notes before building and the
> wording* had to be changed a bit as install.texi no longer states what version
> should be used, but it would be an alternative

install.texi also has the issue that it's not pre-packaged in a
easy to discover and readable file in the release tarballs and that
the online version is only for trunk.

> (*) https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653417.html
> 
> Tobias
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: [PATCH 17/52] gcn: Remove macros {FLOAT, DOUBLE, LONG_DOUBLE}_TYPE_SIZE

2024-06-03 Thread Andrew Stubbs

On 03/06/2024 04:01, Kewen Lin wrote:

This is to remove macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
defines in gcn port.

gcc/ChangeLog:

* config/gcn/gcn.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.



Assuming that this does not enable some long-double mode support that 
wasn't present before then this LGTM.


GCN does have some partially implemented support for HFmode ... do I 
need to do something new for that to work?


Andrew


---
  gcc/config/gcn/gcn.h | 3 ---
  1 file changed, 3 deletions(-)

diff --git a/gcc/config/gcn/gcn.h b/gcc/config/gcn/gcn.h
index afa615320ca..e3bfd29c17d 100644
--- a/gcc/config/gcn/gcn.h
+++ b/gcc/config/gcn/gcn.h
@@ -111,9 +111,6 @@
  #define INT_TYPE_SIZE   32
  #define LONG_TYPE_SIZE  64
  #define LONG_LONG_TYPE_SIZE 64
-#define FLOAT_TYPE_SIZE  32
-#define DOUBLE_TYPE_SIZE 64
-#define LONG_DOUBLE_TYPE_SIZE64
  #define DEFAULT_SIGNED_CHAR 1
  #define PCC_BITFIELD_TYPE_MATTERS 1
  




[PATCH] Implement wrap-around arithmetics in DWARF expressions

2024-06-03 Thread Eric Botcazou
Hi,

for the attached Ada package declaring a simple variable-sized record type, 
the compiler builds a "size function" in GENERIC which is at -Og:

sizetype _GLOBAL.SZ5_p (p__enum p0)
{
  return (UNSIGNED_8) p0 + 252 <= 3 ? 32 : 0;
}

The UNSIGNED_8-based trick eliminates one branch but relies on the wrap-around 
arithmetics of UNSIGNED_8.  This size function is then translated into a DWARF 
procedure, but the wrap-around arithmetics is dropped, leading to a wrong size 
calculation when the DWARF procedure is executed.

The fix also contains an optimization of unsigned comparisons in DWARF for the 
case where the type is smaller than the "generic type", as is the case here.

Tested on x86-64/Linux, OK for the mainline?


2024-06-03  Eric Botcazou  

* dwarf2out.cc (loc_list_from_tree_1) ; Add const.
: Use a signed comparison for small unsigned types.
Implement wrap-around arithmetics for small integer types.

-- 
Eric Botcazoupackage P is

  type Enum is (Zero, One, Two, Three, Four, Five, Six, Seven, Eight, Nine);
 
  type Rec (Kind : Enum := Zero) is record
  case Kind is
when Four .. Seven =>
  S : String (1 .. 32);
when others =>
  null;
  end case;
end record;

end P;
diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index 5b064ffd78a..89efa5474d3 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -19383,7 +19383,7 @@ loc_list_from_tree_1 (tree loc, int want_address,
 case ROUND_DIV_EXPR:
   if (TYPE_UNSIGNED (TREE_TYPE (loc)))
 	{
-	  enum machine_mode mode = TYPE_MODE (TREE_TYPE (loc));
+	  const enum machine_mode mode = TYPE_MODE (TREE_TYPE (loc));
 	  scalar_int_mode int_mode;
 
 	  if ((dwarf_strict && dwarf_version < 5)
@@ -19518,6 +19518,15 @@ loc_list_from_tree_1 (tree loc, int want_address,
 do_comp_binop:
   if (TYPE_UNSIGNED (TREE_TYPE (TREE_OPERAND (loc, 0
 	{
+	  const enum machine_mode mode
+	= TYPE_MODE (TREE_TYPE (TREE_OPERAND (loc, 0)));
+	  scalar_int_mode int_mode;
+
+	  /* We can use a signed comparison if the sign bit is not set.  */
+	  if (is_a  (mode, &int_mode)
+	  && GET_MODE_SIZE (int_mode) < DWARF2_ADDR_SIZE)
+	goto do_binop;
+
 	  list_ret = loc_list_from_tree (TREE_OPERAND (loc, 0), 0, context);
 	  list_ret1 = loc_list_from_tree (TREE_OPERAND (loc, 1), 0, context);
 	  list_ret = loc_list_from_uint_comparison (list_ret, list_ret1,
@@ -19544,6 +19553,7 @@ loc_list_from_tree_1 (tree loc, int want_address,
   add_loc_list (&list_ret, list_ret1);
   if (list_ret == 0)
 	return 0;
+
   add_loc_descr_to_each (list_ret, new_loc_descr (op, 0, 0));
   break;
 
@@ -19668,6 +19678,28 @@ loc_list_from_tree_1 (tree loc, int want_address,
   if (!ret && !list_ret)
 return 0;
 
+  /* Implement wrap-around arithmetics for small integer types.  */
+  if ((TREE_CODE (loc) == PLUS_EXPR
+   || TREE_CODE (loc) == MINUS_EXPR
+   || TREE_CODE (loc) == MULT_EXPR
+   || TREE_CODE (loc) == NEGATE_EXPR
+   || TREE_CODE (loc) == LSHIFT_EXPR)
+  && INTEGRAL_TYPE_P (TREE_TYPE (loc))
+  && TYPE_OVERFLOW_WRAPS (TREE_TYPE (loc)))
+{
+  const enum machine_mode mode = TYPE_MODE (TREE_TYPE (loc));
+  scalar_int_mode int_mode;
+
+  if (is_a  (mode, &int_mode)
+	  && GET_MODE_SIZE (int_mode) < DWARF2_ADDR_SIZE)
+	{
+	   const unsigned HOST_WIDE_INT mask
+	= (HOST_WIDE_INT_1U << GET_MODE_BITSIZE (int_mode)) - 1;
+	   add_loc_descr_to_each (list_ret, uint_loc_descriptor (mask));
+	   add_loc_descr_to_each (list_ret, new_loc_descr (DW_OP_and, 0, 0));
+	}
+}
+
   if (want_address == 2 && !have_address
   && (dwarf_version >= 4 || !dwarf_strict))
 {


Re: [PATCH] pair-fusion: fix for older GCC

2024-06-03 Thread Marc Poulhiès
Richard Sandiford  writes:

> Marc Poulhiès  writes:
>> Older GCCs fail with:
>>
>>   .../gcc/pair-fusion.cc: In member function ‘bool 
>> pair_fusion_bb_info::fuse_pair(bool, unsigned int, int, rtl_ssa::insn_info*, 
>> rtl_ssa::in
>>   sn_info*, base_cand&, const rtl_ssa::insn_range_info&)’:
>>   .../gcc/pair-fusion.cc:1790:40: error: ‘writeback’ is not a class, 
>> namespace, or enumeration
>>  if (m_pass->should_handle_writeback (writeback::ALL)
>>
>> Renaming the enum type works around the name conflict with the local
>> variable and also prevents future similar conflicts.
>>
>> gcc/ChangeLog:
>>
>>  * pair-fusion.h (enum class writeback): Rename to...
>>  (enum class writeback_type): ...this.
>>  (struct pair_fusion): Adjust type name after renaming.
>>  * pair-fusion.cc (pair_fusion_bb_info::track_access): Likewise.
>>  (pair_fusion_bb_info::fuse_pair): Likewise.
>>  (pair_fusion::process_block): Likewise.
>
> OK, thanks, and sorry for missing this during the review.

This breaks aarch64:

https://builder.sourceware.org/buildbot/#/builders/266/builds/3487/steps/4/logs/stdio

I'll try to send a followup fix quickly...

Marc


RE: [PATCH] aarch64: Support multiple variants including up to 3

2024-06-03 Thread Andrew Pinski (QUIC)
> -Original Message-
> From: Andrew Pinski (QUIC) 
> Sent: Saturday, May 4, 2024 2:03 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Andrew Pinski (QUIC) 
> Subject: [PATCH] aarch64: Support multiple variants including
> up to 3
> 
> On some of the Qualcomm's SoC that includes oryon-1 core,
> the variant will be different on the cores due to big.little
> config. Though the difference between big and little is not
> significant enough to have seperate cost/scheduling models
> for them and the feature set is the same across all variants.
> 
> Also on some SoCs, there are 3 variants of the core,
> big.middle.little so this increases the support there for up to 3
> cores and 3 variants in the original parsing loop but it does not
> change the support for max of 2 different cores.
> 
> After this patch and the patch that adds oryon-1, -
> mcpu=native works on the SoCs I am working with.
> 
> Bootstrapped and tested on aarch64-linux-gnu with no
> regressions.

Ping?

> 
> gcc/ChangeLog:
> 
>   * config/aarch64/driver-aarch64.cc
> (host_detect_local_cpu): Support
>   3 cores and 3 variants. If there is one core but multiple
> variant,
>   then treat the variant as being all.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/cpunative/info_25: New file.
>   * gcc.target/aarch64/cpunative/info_26: New file.
>   * gcc.target/aarch64/cpunative/native_cpu_25.c: New
> test.
>   * gcc.target/aarch64/cpunative/native_cpu_26.c: New
> test.
> 
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/config/aarch64/driver-aarch64.cc  | 14 ++
>  .../gcc.target/aarch64/cpunative/info_25  | 17
> 
>  .../gcc.target/aarch64/cpunative/info_26  | 26
> +++
>  .../aarch64/cpunative/native_cpu_25.c | 11 
>  .../aarch64/cpunative/native_cpu_26.c | 11 
>  5 files changed, 74 insertions(+), 5 deletions(-)  create mode
> 100644 gcc/testsuite/gcc.target/aarch64/cpunative/info_25
>  create mode 100644
> gcc/testsuite/gcc.target/aarch64/cpunative/info_26
>  create mode 100644
> gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_25.c
>  create mode 100644
> gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_26.c
> 
> diff --git a/gcc/config/aarch64/driver-aarch64.cc
> b/gcc/config/aarch64/driver-aarch64.cc
> index b620351e572..abe6e7df7dc 100644
> --- a/gcc/config/aarch64/driver-aarch64.cc
> +++ b/gcc/config/aarch64/driver-aarch64.cc
> @@ -256,9 +256,9 @@ host_detect_local_cpu (int argc, const
> char **argv)
>bool cpu = false;
>unsigned int i = 0;
>unsigned char imp = INVALID_IMP;
> -  unsigned int cores[2] = { INVALID_CORE, INVALID_CORE };
> +  unsigned int cores[3] = { INVALID_CORE, INVALID_CORE,
> INVALID_CORE };
>unsigned int n_cores = 0;
> -  unsigned int variants[2] = { ALL_VARIANTS, ALL_VARIANTS };
> +  unsigned int variants[3] = { ALL_VARIANTS, ALL_VARIANTS,
> ALL_VARIANTS
> + };
>unsigned int n_variants = 0;
>bool processed_exts = false;
>aarch64_feature_flags extension_flags = 0; @@ -314,7
> +314,7 @@ host_detect_local_cpu (int argc, const char
> **argv)
> unsigned cvariant = parse_field (buf);
> if (!contains_core_p (variants, cvariant))
>   {
> -  if (n_variants == 2)
> +   if (n_variants == 3)
>  goto not_found;
> 
>variants[n_variants++] = cvariant; @@ -326,7 +326,7
> @@ host_detect_local_cpu (int argc, const char **argv)
> unsigned ccore = parse_field (buf);
> if (!contains_core_p (cores, ccore))
>   {
> -   if (n_cores == 2)
> +   if (n_cores == 3)
>   goto not_found;
> 
> cores[n_cores++] = ccore;
> @@ -383,11 +383,15 @@ host_detect_local_cpu (int argc,
> const char **argv)
>/* Weird cpuinfo format that we don't know how to handle.
> */
>if (n_cores == 0
>|| n_cores > 2
> -  || (n_cores == 1 && n_variants != 1)
>|| imp == INVALID_IMP
>|| !processed_exts)
>  goto not_found;
> 
> +  /* If we have one core type but multiple variants, consider
> + that as one variant with ALL_VARIANTS instead.  */  if
> (n_cores ==
> + 1 && n_variants != 1)
> +variants[0] = ALL_VARIANTS;
> +
>/* Simple case, one core type or just looking for the arch. */
>if (n_cores == 1 || arch)
>  {
> diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/info_25
> b/gcc/testsuite/gcc.target/aarch64/cpunative/info_25
> new file mode 100644
> index 000..d6e83ccab09
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/info_25
> @@ -0,0 +1,17 @@
> +processor: 0
> +BogoMIPS : 38.40
> +Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
> atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop
> sha3 sm3 sm4 asimddp sha512 asimdfhm dit uscat ilrcpc flagm
> ssbs sb paca pacg dcpodp flagm2 frint i8mm bf16 rng bti ecv
> afp rpres
> +CPU implementer  : 0x51
> +CPU architecture: 8
> +CPU va

RE: [PATCH 1/3] vect: generate suitable convert insn for int -> int, float -> float and int <-> float.

2024-06-03 Thread Hu, Lin1
> -Original Message-
> From: Richard Biener 
> Sent: Monday, June 3, 2024 5:03 PM
> To: Hu, Lin1 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> ubiz...@gmail.com
> Subject: RE: [PATCH 1/3] vect: generate suitable convert insn for int -> int, 
> float
> -> float and int <-> float.
> 
> On Mon, 3 Jun 2024, Hu, Lin1 wrote:
> 
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Friday, May 31, 2024 8:41 PM
> > > To: Hu, Lin1 
> > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> > > ubiz...@gmail.com
> > > Subject: RE: [PATCH 1/3] vect: generate suitable convert insn for
> > > int -> int, float
> > > -> float and int <-> float.
> > >
> > > On Fri, 31 May 2024, Hu, Lin1 wrote:
> > >
> > > > > -Original Message-
> > > > > From: Richard Biener 
> > > > > Sent: Wednesday, May 29, 2024 5:41 PM
> > > > > To: Hu, Lin1 
> > > > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao
> > > > > ; ubiz...@gmail.com
> > > > > Subject: Re: [PATCH 1/3] vect: generate suitable convert insn
> > > > > for int -> int, float
> > > > > -> float and int <-> float.
> > > > >
> > > > > On Thu, 23 May 2024, Hu, Lin1 wrote:
> > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > > PR target/107432
> > > > > > * tree-vect-generic.cc
> > > > > > (supportable_indirect_narrowing_operation): New function for
> > > > > > support indirect narrowing convert.
> > > > > > (supportable_indirect_widening_operation): New function for
> > > > > > support indirect widening convert.
> > > > > > (expand_vector_conversion): Support convert for int -> int,
> > > > > > float -> float and int <-> float.
> > > > > >
> > > > > > gcc/testsuite/ChangeLog:
> > > > > >
> > > > > > PR target/107432
> > > > > > * gcc.target/i386/pr107432-1.c: New test.
> > > > > > * gcc.target/i386/pr107432-2.c: Ditto.
> > > > > > * gcc.target/i386/pr107432-3.c: Ditto.
> > > > > > * gcc.target/i386/pr107432-4.c: Ditto.
> > > > > > * gcc.target/i386/pr107432-5.c: Ditto.
> > > > > > * gcc.target/i386/pr107432-6.c: Ditto.
> > > > > > * gcc.target/i386/pr107432-7.c: Ditto.
> > > > > > ---
> > > > > > diff --git a/gcc/tree-vect-generic.cc
> > > > > > b/gcc/tree-vect-generic.cc index
> > > > > > ab640096ca2..0bedb53d9f9 100644
> > > > > > --- a/gcc/tree-vect-generic.cc
> > > > > > +++ b/gcc/tree-vect-generic.cc
> > > > > > @@ -45,6 +45,8 @@ along with GCC; see the file COPYING3.  If
> > > > > > not see #include "gimple-match.h"
> > > > > >  #include "recog.h" /* FIXME: for insn_data */
> > > > > >  #include "optabs-libfuncs.h"
> > > > > > +#include "cfgloop.h"
> > > > > > +#include "tree-vectorizer.h"
> > > > > >
> > > > > >
> > > > > >  /* Build a ternary operation and gimplify it.  Emit code before 
> > > > > > GSI.
> > > > > > @@ -1834,6 +1836,142 @@ do_vec_narrow_conversion
> > > > > (gimple_stmt_iterator *gsi, tree inner_type, tree a,
> > > > > >return gimplify_build2 (gsi, code, outer_type, b, c);  }
> > > > > >
> > > > > > +/* A subroutine of expand_vector_conversion, support indirect
> > > > > > +conversion
> > > > > for
> > > > > > +   float <-> int, like double -> char.  */ bool
> > > > > > +supportable_indirect_narrowing_operation (gimple_stmt_iterator 
> > > > > > *gsi,
> > > > > > +enum tree_code code,
> > > > > > +tree lhs,
> > > > > > +tree arg)
> > > > > > +{
> > > > > > +  gimple *g;
> > > > > > +  tree ret_type = TREE_TYPE (lhs);
> > > > > > +  tree arg_type = TREE_TYPE (arg);
> > > > > > +  tree new_rhs;
> > > > > > +
> > > > > > +  unsigned int ret_elt_bits = vector_element_bits (ret_type);
> > > > > > + unsigned int arg_elt_bits = vector_element_bits (arg_type);
> > > > > > + if (code != FIX_TRUNC_EXPR || flag_trapping_math ||
> > > > > > + ret_elt_bits >=
> > > > > arg_elt_bits)
> > > > > > +return false;
> > > > > > +
> > > > > > +  unsigned short target_size;  scalar_mode tmp_cvt_mode;
> > > > > > + scalar_mode lhs_mode = GET_MODE_INNER (TYPE_MODE
> > > > > > + (ret_type)); scalar_mode rhs_mode = GET_MODE_INNER
> > > > > > + (TYPE_MODE (arg_type)); tree cvt_type = NULL_TREE;
> > > > > > + tmp_cvt_mode = lhs_mode; target_size = GET_MODE_SIZE
> > > > > > + (rhs_mode);
> > > > > > +
> > > > > > +  opt_scalar_mode mode_iter;
> > > > > > +  enum tree_code tc1, tc2;
> > > > > > +  unsigned HOST_WIDE_INT nelts
> > > > > > += constant_lower_bound (TYPE_VECTOR_SUBPARTS (arg_type));
> > > > > > +
> > > > > > +  FOR_EACH_2XWIDER_MODE (mode_iter, tmp_cvt_mode)
> > > > > > +{
> > > > > > +  tmp_cvt_mode = mode_iter.require ();
> > > > > > +
> > > > > > +  if (GET_MODE_SIZE (tmp_cvt_mode) > target_size)
> > > > > > +   break;
> > > > > > +
> > > > > > +  scalar_mode cvt_mode;
> > > > > > +  int tmp_cvt_size = GET_MODE_BITSIZE (tmp_cvt_mode);
> > > > > > +  if (!int_mode_for_size (tmp_cvt_size, 0).exists (&cvt_mode))
> > > > > > +   brea

RE: [PATCH 1/3] vect: generate suitable convert insn for int -> int, float -> float and int <-> float.

2024-06-03 Thread Richard Biener
On Mon, 3 Jun 2024, Hu, Lin1 wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Monday, June 3, 2024 5:03 PM
> > To: Hu, Lin1 
> > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> > ubiz...@gmail.com
> > Subject: RE: [PATCH 1/3] vect: generate suitable convert insn for int -> 
> > int, float
> > -> float and int <-> float.
> > 
> > On Mon, 3 Jun 2024, Hu, Lin1 wrote:
> > 
> > > > -Original Message-
> > > > From: Richard Biener 
> > > > Sent: Friday, May 31, 2024 8:41 PM
> > > > To: Hu, Lin1 
> > > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ;
> > > > ubiz...@gmail.com
> > > > Subject: RE: [PATCH 1/3] vect: generate suitable convert insn for
> > > > int -> int, float
> > > > -> float and int <-> float.
> > > >
> > > > On Fri, 31 May 2024, Hu, Lin1 wrote:
> > > >
> > > > > > -Original Message-
> > > > > > From: Richard Biener 
> > > > > > Sent: Wednesday, May 29, 2024 5:41 PM
> > > > > > To: Hu, Lin1 
> > > > > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao
> > > > > > ; ubiz...@gmail.com
> > > > > > Subject: Re: [PATCH 1/3] vect: generate suitable convert insn
> > > > > > for int -> int, float
> > > > > > -> float and int <-> float.
> > > > > >
> > > > > > On Thu, 23 May 2024, Hu, Lin1 wrote:
> > > > > >
> > > > > > > gcc/ChangeLog:
> > > > > > >
> > > > > > >   PR target/107432
> > > > > > >   * tree-vect-generic.cc
> > > > > > >   (supportable_indirect_narrowing_operation): New function for
> > > > > > >   support indirect narrowing convert.
> > > > > > >   (supportable_indirect_widening_operation): New function for
> > > > > > >   support indirect widening convert.
> > > > > > >   (expand_vector_conversion): Support convert for int -> int,
> > > > > > >   float -> float and int <-> float.
> > > > > > >
> > > > > > > gcc/testsuite/ChangeLog:
> > > > > > >
> > > > > > >   PR target/107432
> > > > > > >   * gcc.target/i386/pr107432-1.c: New test.
> > > > > > >   * gcc.target/i386/pr107432-2.c: Ditto.
> > > > > > >   * gcc.target/i386/pr107432-3.c: Ditto.
> > > > > > >   * gcc.target/i386/pr107432-4.c: Ditto.
> > > > > > >   * gcc.target/i386/pr107432-5.c: Ditto.
> > > > > > >   * gcc.target/i386/pr107432-6.c: Ditto.
> > > > > > >   * gcc.target/i386/pr107432-7.c: Ditto.
> > > > > > > ---
> > > > > > > diff --git a/gcc/tree-vect-generic.cc
> > > > > > > b/gcc/tree-vect-generic.cc index
> > > > > > > ab640096ca2..0bedb53d9f9 100644
> > > > > > > --- a/gcc/tree-vect-generic.cc
> > > > > > > +++ b/gcc/tree-vect-generic.cc
> > > > > > > @@ -45,6 +45,8 @@ along with GCC; see the file COPYING3.  If
> > > > > > > not see #include "gimple-match.h"
> > > > > > >  #include "recog.h"   /* FIXME: for insn_data */
> > > > > > >  #include "optabs-libfuncs.h"
> > > > > > > +#include "cfgloop.h"
> > > > > > > +#include "tree-vectorizer.h"
> > > > > > >
> > > > > > >
> > > > > > >  /* Build a ternary operation and gimplify it.  Emit code before 
> > > > > > > GSI.
> > > > > > > @@ -1834,6 +1836,142 @@ do_vec_narrow_conversion
> > > > > > (gimple_stmt_iterator *gsi, tree inner_type, tree a,
> > > > > > >return gimplify_build2 (gsi, code, outer_type, b, c);  }
> > > > > > >
> > > > > > > +/* A subroutine of expand_vector_conversion, support indirect
> > > > > > > +conversion
> > > > > > for
> > > > > > > +   float <-> int, like double -> char.  */ bool
> > > > > > > +supportable_indirect_narrowing_operation (gimple_stmt_iterator 
> > > > > > > *gsi,
> > > > > > > +  enum tree_code code,
> > > > > > > +  tree lhs,
> > > > > > > +  tree arg)
> > > > > > > +{
> > > > > > > +  gimple *g;
> > > > > > > +  tree ret_type = TREE_TYPE (lhs);
> > > > > > > +  tree arg_type = TREE_TYPE (arg);
> > > > > > > +  tree new_rhs;
> > > > > > > +
> > > > > > > +  unsigned int ret_elt_bits = vector_element_bits (ret_type);
> > > > > > > + unsigned int arg_elt_bits = vector_element_bits (arg_type);
> > > > > > > + if (code != FIX_TRUNC_EXPR || flag_trapping_math ||
> > > > > > > + ret_elt_bits >=
> > > > > > arg_elt_bits)
> > > > > > > +return false;
> > > > > > > +
> > > > > > > +  unsigned short target_size;  scalar_mode tmp_cvt_mode;
> > > > > > > + scalar_mode lhs_mode = GET_MODE_INNER (TYPE_MODE
> > > > > > > + (ret_type)); scalar_mode rhs_mode = GET_MODE_INNER
> > > > > > > + (TYPE_MODE (arg_type)); tree cvt_type = NULL_TREE;
> > > > > > > + tmp_cvt_mode = lhs_mode; target_size = GET_MODE_SIZE
> > > > > > > + (rhs_mode);
> > > > > > > +
> > > > > > > +  opt_scalar_mode mode_iter;
> > > > > > > +  enum tree_code tc1, tc2;
> > > > > > > +  unsigned HOST_WIDE_INT nelts
> > > > > > > += constant_lower_bound (TYPE_VECTOR_SUBPARTS (arg_type));
> > > > > > > +
> > > > > > > +  FOR_EACH_2XWIDER_MODE (mode_iter, tmp_cvt_mode)
> > > > > > > +{
> > > > > > > +  tmp_cvt_mode = mode_iter.require ();
> > > > > > > +
> > > > > > > +  if (GET_MODE_SIZE (tmp_cvt_mode) > target_size)
> > > > > > 

[PATCH] aarch64: adjust enum writeback after rename

2024-06-03 Thread Marc Poulhiès
gcc/ChangeLog:

* config/aarch64/aarch64-ldp-fusion.cc (struct aarch64_pair_fusion):
Use new type name.
---
My previous change fixed the generic code, but I forgot to adjust the overload 
in aarch64.

I don't have an aarch64 setup to check it fixes the build, but will set it up 
later. Unless it's
OK to apply it as it's easy enough.

Marc

 gcc/config/aarch64/aarch64-ldp-fusion.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
b/gcc/config/aarch64/aarch64-ldp-fusion.cc
index 0af927231d3..b255dcbe73c 100644
--- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
+++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
@@ -75,9 +75,9 @@ struct aarch64_pair_fusion : public pair_fusion
 return aarch64_ldp_alias_check_limit;
   }
 
-  bool should_handle_writeback (enum writeback which) override final
+  bool should_handle_writeback (writeback_type which) override final
   {
-if (which == writeback::ALL)
+if (which == writeback_type::ALL)
   return aarch64_ldp_writeback > 1;
 else
   return aarch64_ldp_writeback;
-- 
2.45.1



Re: [PATCH 09/52] Replace {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE with new hook mode_for_floating_type

2024-06-03 Thread Kewen.Lin
Hi Richi,

on 2024/6/3 14:49, Richard Biener wrote:
> On Mon, Jun 3, 2024 at 5:02 AM Kewen Lin  wrote:
>>
>> Currently how we determine which mode will be used for a
>> floating point type is that for a given type precision
>> (size) call mode_for_size to get the first mode which has
>> this size in the specified class.  On Powerpc, we have
>> three modes (TF/KF/IF) having the same mode precision 128
>> (see[1]), so the processing forces us to have to place TF
>> at the first place, it would require us to make more
>> adjustment in some generic code to avoid some unexpected
>> mode conversions and it would be even worse if we get rid
>> of TF eventually one day.  And as Joseph pointed out in [2],
>> "floating types should have their mode, not a poorly
>> defined precision value", as Joseph and Richi suggested,
>> this patch is to introduce one hook mode_for_floating_type
>> which returns the corresponding mode for type float, double
>> or long double.  The default implementation returns SFmode
>> for float and DFmode for double or long double.  For ports
>> which need special treatment, there are some other patches
>> for their own port specific implementation (referring to
>> how {,LONG_}DOUBLE_TYPE_SIZE get used there).  For all
>> generic uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE, depending
>> on the context, some of them are replaced with TYPE_PRECISION
>> of the according type node, some other are replaced with
>> GET_MODE_PRECISION on the mode from mode_for_floating_type.
>> This patch also poisons {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE,
>> so most defines of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE in port
>> specific are removed, but there are still some which are
>> good to be kept for readability then they get renamed with
>> port specific prefix.
>>
>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651017.html
>> [2] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html
>>
>> gcc/ChangeLog:
>>
>> * coretypes.h (enum tree_index): Forward declaration.
>> * defaults.h (FLOAT_TYPE_SIZE): Remove.
>> (DOUBLE_TYPE_SIZE): Likewise.
>> (LONG_DOUBLE_TYPE_SIZE): Likewise.
>> * doc/rtl.texi: Update document by replacing {FLOAT,DOUBLE}_TYPE_SIZE
>> with C type {float,double}.
>> * doc/tm.texi.in: Document new hook mode_for_floating_type, remove
>> document entries for {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE and
>> update document for WIDEST_HARDWARE_FP_SIZE.
>> * doc/tm.texi: Regenerate.
>> * emit-rtl.cc (init_emit_once): Replace DOUBLE_TYPE_SIZE by
>> calling targetm.c.mode_for_floating_type with TI_DOUBLE_TYPE.
>> * real.h (REAL_VALUE_TO_TARGET_LONG_DOUBLE): Use TYPE_PRECISION of
>> long_double_type_node to replace LONG_DOUBLE_TYPE_SIZE.
>> * system.h (FLOAT_TYPE_SIZE): Poison.
>> (DOUBLE_TYPE_SIZE): Likewise.
>> (LONG_DOUBLE_TYPE_SIZE): Likewise.
>> * target.def (mode_for_floating_type): New hook.
>> * targhooks.cc (default_mode_for_floating_type): New function.
>> (default_scalar_mode_supported_p): Update macros
>> {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE by calling
>> targetm.c.mode_for_floating_type with
>> TI_{FLOAT,DOUBLE,LONG_DOUBLE}_TYPE.
>> * targhooks.h (default_mode_for_floating_type): New declaration.
>> * tree-core.h (enum tree_index): Specify underlying type unsigned
>> to sync with forward declaration in coretypes.h.
>> (NUM_FLOATN_TYPES): Explicitly convert to int.
>> (NUM_FLOATNX_TYPES): Likewise.
>> (NUM_FLOATN_NX_TYPES): Likewise.
>> * tree.cc (build_common_tree_nodes): Update macros
>> {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE by calling
>> targetm.c.mode_for_floating_type with
>> TI_{FLOAT,DOUBLE,LONG_DOUBLE}_TYPE and set type mode accordingly.
>> ---
>>  gcc/coretypes.h|  1 +
>>  gcc/defaults.h | 12 
>>  gcc/doc/rtl.texi   |  2 +-
>>  gcc/doc/tm.texi| 33 +
>>  gcc/doc/tm.texi.in | 27 +++
>>  gcc/emit-rtl.cc|  3 ++-
>>  gcc/real.h |  7 ---
>>  gcc/system.h   |  3 ++-
>>  gcc/target.def |  9 +
>>  gcc/targhooks.cc   | 18 +++---
>>  gcc/targhooks.h|  1 +
>>  gcc/tree-core.h| 13 +++--
>>  gcc/tree.cc| 18 +++---
>>  13 files changed, 77 insertions(+), 70 deletions(-)
>>
>> diff --git a/gcc/coretypes.h b/gcc/coretypes.h
>> index 1ac6f0abea3..00c1c58bd8c 100644
>> --- a/gcc/coretypes.h
>> +++ b/gcc/coretypes.h
>> @@ -100,6 +100,7 @@ struct gimple;
>>  typedef gimple *gimple_seq;
>>  struct gimple_stmt_iterator;
>>  class code_helper;
>> +enum tree_index : unsigned;
>>
>>  /* Forward declare rtx_code, so that we can use it in target hooks without
>> needing to pull in rtl.h.  */
>> diff --git a/gcc/defaults.h b/gcc/defaults.h
>> index 92f3e07f742..ac2d25852ab 100644
>> --- a/gcc/def

Re: [PATCH 51/52] sparc: New hook implementation sparc_c_mode_for_floating_type

2024-06-03 Thread Kewen.Lin
Hi Eric,

on 2024/6/3 17:02, Eric Botcazou wrote:
>>  * config/sparc/sparc.cc (sparc_c_mode_for_floating_type): New
>>  (TARGET_C_MODE_FOR_FLOATING_TYPE): New macro.
>>  (FLOAT_TYPE_SIZE): Remove.
>>  (DOUBLE_TYPE_SIZE): Likewise.
>>  (LONG_DOUBLE_TYPE_SIZE): Rename to ...
>>  (SPARC_LONG_DOUBLE_TYPE_SIZE): ... this.
>>  (sparc_type_code): Replace FLOAT_TYPE_SIZE with TYPE_PRECISION of
>>  float_type_node.
>>  * config/sparc/sparc.h (FLOAT_TYPE_SIZE): Remove.
>>  (DOUBLE_TYPE_SIZE): Remove.
>>  * config/sparc/freebsd.h (LONG_DOUBLE_TYPE_SIZE): Rename to ...
>>  (SPARC_LONG_DOUBLE_TYPE_SIZE): ... this.
>>  * config/sparc/linux.h (LONG_DOUBLE_TYPE_SIZE): Rename to ...
>>  (SPARC_LONG_DOUBLE_TYPE_SIZE): ... this.
>>  * config/sparc/linux64.h (LONG_DOUBLE_TYPE_SIZE): Rename to ...
>>  (SPARC_LONG_DOUBLE_TYPE_SIZE): ... this.
>>  * config/sparc/netbsd-elf.h (LONG_DOUBLE_TYPE_SIZE): Rename to ...
>>  (SPARC_LONG_DOUBLE_TYPE_SIZE): ... this.
>>  * config/sparc/openbsd64.h (LONG_DOUBLE_TYPE_SIZE): Rename to ...
>>  (SPARC_LONG_DOUBLE_TYPE_SIZE): ... this.
>>  * config/sparc/sol2.h (LONG_DOUBLE_TYPE_SIZE): Rename to ...
>>  (SPARC_LONG_DOUBLE_TYPE_SIZE): ... this.
>>  * config/sparc/sp-elf.h (LONG_DOUBLE_TYPE_SIZE): Rename to ...
>>  (SPARC_LONG_DOUBLE_TYPE_SIZE): ... this.
>>  * config/sparc/sp64-elf.h (LONG_DOUBLE_TYPE_SIZE): Rename to ...
>>  (SPARC_LONG_DOUBLE_TYPE_SIZE): ... this.
> 
> OK, modulo the following tweaks:

Thanks for the review!

> 
>> --- a/gcc/config/sparc/sparc.cc
>> +++ b/gcc/config/sparc/sparc.cc
>> @@ -718,6 +718,7 @@ static bool sparc_vectorize_vec_perm_const
>> (machine_mode, machine_mode, const vec_perm_indices &);
>>  static bool sparc_can_follow_jump (const rtx_insn *, const rtx_insn *);
>>  static HARD_REG_SET sparc_zero_call_used_regs (HARD_REG_SET);
>> +static machine_mode sparc_c_mode_for_floating_type (enum tree_index);
>>  
>>  #ifdef SUBTARGET_ATTRIBUTE_TABLE
>>  /* Table of valid machine attributes.  */
>> @@ -971,6 +972,9 @@ char sparc_hard_reg_printed[8];
>>  #undef TARGET_ZERO_CALL_USED_REGS
>>  #define TARGET_ZERO_CALL_USED_REGS sparc_zero_call_used_regs
>>
>> +#undef TARGET_C_MODE_FOR_FLOATING_TYPE
>> +#define TARGET_C_MODE_FOR_FLOATING_TYPE sparc_c_mode_for_floating_type
>> +
>>  struct gcc_target targetm = TARGET_INITIALIZER;
>>
>>  /* Return the memory reference contained in X if any, zero otherwise.  */
>> @@ -9824,16 +9828,9 @@ sparc_assemble_integer (rtx x, unsigned int size, int
>> aligned_p) #define LONG_LONG_TYPE_SIZE (BITS_PER_WORD * 2)
>>  #endif
>>
>> -#ifndef FLOAT_TYPE_SIZE
>> -#define FLOAT_TYPE_SIZE BITS_PER_WORD
>> -#endif
>> -
>> -#ifndef DOUBLE_TYPE_SIZE
>> -#define DOUBLE_TYPE_SIZE (BITS_PER_WORD * 2)
>> -#endif
>> -
>> -#ifndef LONG_DOUBLE_TYPE_SIZE
>> -#define LONG_DOUBLE_TYPE_SIZE (BITS_PER_WORD * 2)
>> +/* LONG_DOUBLE_TYPE_SIZE get poisoned, so add SPARC_ prefix.  */
>> +#ifndef SPARC_LONG_LONG_TYPE_SIZE
>> +#define SPARC_LONG_DOUBLE_TYPE_SIZE (BITS_PER_WORD * 2)
>>  #endif
>>
>>  unsigned long
> 
> You can delete {SPARC_}LONG_DOUBLE_TYPE_SIZE too.

Good point, sparc.h already defines the default.

> 
>> @@ -9920,7 +9917,7 @@ sparc_type_code (tree type)
>>/* Carefully distinguish all the standard types of C,
>>   without messing up if the language is not C.  */
>>
>> -  if (TYPE_PRECISION (type) == FLOAT_TYPE_SIZE)
>> +  if (TYPE_PRECISION (type) == TYPE_PRECISION (float_type_node))
>>  return (qualifiers | 6);
>>
>>else
>> @@ -13984,4 +13981,16 @@ sparc_zero_call_used_regs (HARD_REG_SET
>> need_zeroed_hardregs) return need_zeroed_hardregs;
>>  }
>>
>> +/* Implement TARGET_C_MODE_FOR_FLOATING_TYPE.  Return TFmode or DFmode
>> +   for TI_LONG_DOUBLE_TYPE which is for long double type, go with the
>> +   default one for the others.  */
>> +
>> +static machine_mode
>> +sparc_c_mode_for_floating_type (enum tree_index ti)
>> +{
>> +  if (ti == TI_LONG_DOUBLE_TYPE)
>> +return SPARC_LONG_DOUBLE_TYPE_SIZE == 128 ? TFmode : DFmode;
>> +  return default_mode_for_floating_type (ti);
>> +}
>> +
>>  #include "gt-sparc.h"
> 
> I think that TI_LONG_DOUBLE_TYPE is self-explanatory so just:
> 
> /* Implement TARGET_C_MODE_FOR_FLOATING_TYPE.  Return TFmode or DFmode
>for TI_LONG_DOUBLE_TYPE and the default for others.
> 

Will adjust it, thanks!

BR,
Kewen



Re: [PATCH] aarch64: adjust enum writeback after rename

2024-06-03 Thread Richard Sandiford
Marc Poulhiès  writes:
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-ldp-fusion.cc (struct aarch64_pair_fusion):
>   Use new type name.
> ---
> My previous change fixed the generic code, but I forgot to adjust the 
> overload in aarch64.
>
> I don't have an aarch64 setup to check it fixes the build, but will set it up 
> later. Unless it's
> OK to apply it as it's easy enough.

I've tested locally and it builds correctly, so ok now.

Thanks,
Richard

> Marc
>
>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
> b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> index 0af927231d3..b255dcbe73c 100644
> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> @@ -75,9 +75,9 @@ struct aarch64_pair_fusion : public pair_fusion
>  return aarch64_ldp_alias_check_limit;
>}
>  
> -  bool should_handle_writeback (enum writeback which) override final
> +  bool should_handle_writeback (writeback_type which) override final
>{
> -if (which == writeback::ALL)
> +if (which == writeback_type::ALL)
>return aarch64_ldp_writeback > 1;
>  else
>return aarch64_ldp_writeback;


Re: [PATCH] aarch64: adjust enum writeback after rename

2024-06-03 Thread Marc Poulhiès
Richard Sandiford  writes:

> Marc Poulhiès  writes:
>> gcc/ChangeLog:
>>
>>  * config/aarch64/aarch64-ldp-fusion.cc (struct aarch64_pair_fusion):
>>  Use new type name.
>> ---
>> My previous change fixed the generic code, but I forgot to adjust the 
>> overload in aarch64.
>>
>> I don't have an aarch64 setup to check it fixes the build, but will set it 
>> up later. Unless it's
>> OK to apply it as it's easy enough.
>
> I've tested locally and it builds correctly, so ok now.

Thanks for testing Richard, now applied. And sorry for missing this in the 
first fix.

Marc


Re: [PATCH 17/52] gcn: Remove macros {FLOAT, DOUBLE, LONG_DOUBLE}_TYPE_SIZE

2024-06-03 Thread Kewen.Lin
Hi Andrew,

on 2024/6/3 17:14, Andrew Stubbs wrote:
> On 03/06/2024 04:01, Kewen Lin wrote:
>> This is to remove macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
>> defines in gcn port.
>>
>> gcc/ChangeLog:
>>
>> * config/gcn/gcn.h (FLOAT_TYPE_SIZE): Remove.
>> (DOUBLE_TYPE_SIZE): Likewise.
>> (LONG_DOUBLE_TYPE_SIZE): Likewise.
> 
> 
> Assuming that this does not enable some long-double mode support that wasn't 
> present before then this LGTM.

Thanks!  Yes, it doesn't, as the default hook implementation returns DFmode for 
long double type.

> 
> GCN does have some partially implemented support for HFmode ... do I need to 
> do something new for that to work?

For this hook, no, as it's mainly for float, double and long double types (C 
language supported non decimal floating
point types).  If you are referring to _Float16, I guess you may be interested 
in another hook TARGET_FLOATN_MODE
which is for FloatN types.

BR,
Kewen

> 
> Andrew
> 
>> ---
>>   gcc/config/gcn/gcn.h | 3 ---
>>   1 file changed, 3 deletions(-)
>>
>> diff --git a/gcc/config/gcn/gcn.h b/gcc/config/gcn/gcn.h
>> index afa615320ca..e3bfd29c17d 100644
>> --- a/gcc/config/gcn/gcn.h
>> +++ b/gcc/config/gcn/gcn.h
>> @@ -111,9 +111,6 @@
>>   #define INT_TYPE_SIZE  32
>>   #define LONG_TYPE_SIZE  64
>>   #define LONG_LONG_TYPE_SIZE  64
>> -#define FLOAT_TYPE_SIZE  32
>> -#define DOUBLE_TYPE_SIZE  64
>> -#define LONG_DOUBLE_TYPE_SIZE  64
>>   #define DEFAULT_SIGNED_CHAR  1
>>   #define PCC_BITFIELD_TYPE_MATTERS 1
>>   
> 



[PATCH 1/2][final] Avoid inserting after a GIMPLE_COND with SLP and early break

2024-06-03 Thread Richard Biener
When vectorizing an early break loop with LENs (do we miss some
check here to disallow this?) we can end up deciding to insert
stmts after a GIMPLE_COND when doing SLP scheduling and trying
to be conservative with placing of stmts only dependent on
the implicit loop mask/len.  The following avoids this, I guess
it's not perfect but it does the job fixing some observed
RISC-V regression.

* tree-vect-slp.cc (vect_schedule_slp_node): For mask/len
loops make sure to not advance the insertion iterator
beyond a GIMPLE_COND.
---
 gcc/tree-vect-slp.cc | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index bf1f467f53f..11ec82086fc 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -9650,7 +9650,12 @@ vect_schedule_slp_node (vec_info *vinfo,
   else
{
  si = gsi_for_stmt (last_stmt);
- gsi_next (&si);
+ /* When we're getting gsi_after_labels from the starting
+condition of a fully masked/len loop avoid insertion
+after a GIMPLE_COND that can appear as the only header
+stmt with early break vectorization.  */
+ if (gimple_code (last_stmt) != GIMPLE_COND)
+   gsi_next (&si);
}
 }
 
-- 
2.35.3



[PATCH 2/2][final] RISC-V: Do single-lane SLP discovery for reductions

2024-06-03 Thread Richard Biener
The following performs single-lane SLP discovery for reductions.
It requires a fixup for outer loop vectorization where a check
for multiple types needs adjustments as otherwise bogus pointer
IV increments happen when there are multiple copies of vector stmts
in the inner loop.

For the reduction epilog handling this extends the optimized path
to cover the trivial single-lane SLP reduction case.

The fix for PR65518 implemented in vect_grouped_load_supported for
non-SLP needs a SLP counterpart that I put in get_group_load_store_type.

I've decided to adjust three testcases for appearing single-lane
SLP instances instead of not dumping "vectorizing stmts using SLP"
for single-lane instances as that also requires testsuite adjustments.

This is the final version of the series where the set of FAILs
caused is minimized to arm/risc-v architecture specific ones and
a few generic ones that will get resolved with merging the load
permute part.  From there it should be possible to start filling
missing pieces like generating load-lane/store-lane via SLP
patterns (or permute optimization?) and implement missing SLP
support in a few places.  After the load part is in I plan
to add a default-off --param that makes vectorization FAIL if
there's non-SLP vectorization surviving.

I plan to push this version if the CI goes through w/o surprises.

Thanks,
Richard.

* tree-vect-slp.cc (vect_build_slp_tree_2): Only multi-lane
discoveries are reduction chains and need special backedge
treatment.
(vect_analyze_slp): Fall back to single-lane SLP discovery
for reductions.  Make sure to try single-lane SLP reduction
for all reductions as fallback.
(vectorizable_load): Avoid outer loop SLP vectorization with
multi-copy vector stmts in the inner loop.
(vectorizable_store): Likewise.
* tree-vect-loop.cc (vect_create_epilog_for_reduction): Allow
direct opcode and shift reduction also for SLP reductions
with a single lane.
* tree-vect-stmts.cc (get_group_load_store_type): For SLP also
check for the PR65518 single-element interleaving case as done in
vect_grouped_load_supported.

* gcc.dg/vect/slp-24.c: Expect another SLP instance for the
reduction.
* gcc.dg/vect/slp-24-big-array.c: Likewise.
* gcc.dg/vect/slp-reduc-6.c: Remove scan for zero SLP instances.
---
 gcc/testsuite/gcc.dg/vect/slp-24-big-array.c |  2 +-
 gcc/testsuite/gcc.dg/vect/slp-24.c   |  2 +-
 gcc/testsuite/gcc.dg/vect/slp-reduc-6.c  |  1 -
 gcc/tree-vect-loop.cc|  4 +-
 gcc/tree-vect-slp.cc | 71 +++-
 gcc/tree-vect-stmts.cc   | 24 ++-
 6 files changed, 80 insertions(+), 24 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-24-big-array.c 
b/gcc/testsuite/gcc.dg/vect/slp-24-big-array.c
index 5eaea9600ac..63f744338a1 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-24-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-24-big-array.c
@@ -92,4 +92,4 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { 
vect_no_align && ilp32 } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { 
xfail { vect_no_align && ilp32 } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { 
xfail { vect_no_align && ilp32 } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-24.c 
b/gcc/testsuite/gcc.dg/vect/slp-24.c
index 59178f2c0f2..7814d7c324e 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-24.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-24.c
@@ -78,4 +78,4 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { 
vect_no_align && ilp32 } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { 
xfail { vect_no_align && ilp32 } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { 
xfail { vect_no_align && ilp32 } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-reduc-6.c 
b/gcc/testsuite/gcc.dg/vect/slp-reduc-6.c
index 1fd15aa3c87..5566705a704 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-reduc-6.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-reduc-6.c
@@ -45,6 +45,5 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { xfail { 
vect_no_int_add || { ! { vect_unpack || vect_strided2 } } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" } 
} */
 /* { dg-final { scan-tree-dump-times "different interleaving chains in one 
node" 1 "vect" { target { ! vect_no_int_add } } } } */
 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index a08357acc11..06292ed8bbe 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6504,7 +6504,7 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
   /* 2.3 Create the reduction code, using one of the three sche

Re: [PATCH 17/52] gcn: Remove macros {FLOAT, DOUBLE, LONG_DOUBLE}_TYPE_SIZE

2024-06-03 Thread Jakub Jelinek
On Mon, Jun 03, 2024 at 05:41:11PM +0800, Kewen.Lin wrote:
> > GCN does have some partially implemented support for HFmode ... do I need 
> > to do something new for that to work?
> 
> For this hook, no, as it's mainly for float, double and long double types (C 
> language supported non decimal floating
> point types).  If you are referring to _Float16, I guess you may be 
> interested in another hook TARGET_FLOATN_MODE
> which is for FloatN types.

You don't need a new hook for that, the current _FloatNN discovery code is all
that is needed.  There should be just one mode for the IEEE compliant
implementations for each size (there is the _Float16 vs. __bf16 but the
latter isn't IEEE compliant, or just IEEE like), so tree.cc should figure
everything out together with the current langhooks.

Jakub



Re: [PATCH] libstdc++: Implement C++26 features (P2546R5)

2024-06-03 Thread Florian Weimer
* Jonathan Wakely:

> +/** Stop the program with a breakpoint or debug trap.
> + *
> + * The details of how a breakpoint is implemented are platform-specific.
> + * Some systems provide a special instruction, such as `int3` in x86.
> + * When no more appropriate mechanism is available, this will stop the
> + * program using `__builtin_trap()`. It might not be possible for the
> + * program to continue after such a breakpoint.
> + *
> + * @since C++26
> + */
> +void
> +breakpoint() noexcept;

Would it make sense to have a special function symbol for this, on which
the debugger sets the breakpoint?  We already have this for a slightly
different purpose in glibc, see _dl_debug_state.

Otherwise you'll get crashes if the debugger detection delivers a false
positive (e.g. when running under an strace-equivalent if libstdc++
cannot check the process name).

Or is_debugger_present should perhaps read a volatile bool that is
updated by the debugger as needed, and not bother with any ptrace
checks.

Thanks,
Florian



Re: [patch] install.texi (nvptx): Recommend nvptx-tools 2024-05-30

2024-06-03 Thread Tobias Burnus

Richard Biener wrote:

install.texi also has the issue that it's not pre-packaged in a
easy to discover and readable file in the release tarballs and that
the online version is only for trunk.


I always wondered why it is not included at 
https://gcc.gnu.org/onlinedocs/ — it would then also be linked from, 
e.g., https://gcc.gnu.org/gcc-14/index.html


Tobias



Re: [PATCH] regenerate-opt-urls.py: fix transposed values for "vax" and "v850"

2024-06-03 Thread Maciej W. Rozycki
On Tue, 28 May 2024, David Malcolm wrote:

> I've pushed this to gcc trunk as r15-872-g7cc529fe514cc6 (having
> bootstrapped and lightly tested it on x86_64-pc-linux-gnu)

 Thank you for fixing this up.  Is this a new requirement now for .opt 
file changes?  Why does it have to be called by hand then rather than 
being a make dependency?

  Maciej


Re: [PATCH 36/52] aarch64: New hook implementation aarch64_c_mode_for_floating_type

2024-06-03 Thread Richard Sandiford
Kewen Lin  writes:
> This is to remove macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
> defines in aarch64 port, and add new port specific hook
> implementation aarch64_c_mode_for_floating_type.
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.cc (aarch64_c_mode_for_floating_type):
>   New function.
>   (TARGET_C_MODE_FOR_FLOATING_TYPE): New macro.
>   * config/aarch64/aarch64.h (FLOAT_TYPE_SIZE): Remove.
>   (DOUBLE_TYPE_SIZE): Likewise.
>   (LONG_DOUBLE_TYPE_SIZE): Likewise.

OK, thanks.

Richard

> ---
>  gcc/config/aarch64/aarch64.cc | 15 +++
>  gcc/config/aarch64/aarch64.h  |  6 --
>  2 files changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index ee12d8897a8..1c1c278fd69 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -28427,6 +28427,18 @@ aarch64_bitint_type_info (int n, struct bitint_info 
> *info)
>return true;
>  }
>  
> +/* Implement TARGET_C_MODE_FOR_FLOATING_TYPE.  Return TFmode for
> +   TI_LONG_DOUBLE_TYPE which is for long double type, go with the default
> +   one for the others.  */
> +
> +static machine_mode
> +aarch64_c_mode_for_floating_type (enum tree_index ti)
> +{
> +  if (ti == TI_LONG_DOUBLE_TYPE)
> +return TFmode;
> +  return default_mode_for_floating_type (ti);
> +}
> +
>  /* Implement TARGET_SCHED_CAN_SPECULATE_INSN.  Return true if INSN can be
> scheduled for speculative execution.  Reject the long-running division
> and square-root instructions.  */
> @@ -30554,6 +30566,9 @@ aarch64_run_selftests (void)
>  #undef TARGET_C_BITINT_TYPE_INFO
>  #define TARGET_C_BITINT_TYPE_INFO aarch64_bitint_type_info
>  
> +#undef TARGET_C_MODE_FOR_FLOATING_TYPE
> +#define TARGET_C_MODE_FOR_FLOATING_TYPE aarch64_c_mode_for_floating_type
> +
>  #undef  TARGET_EXPAND_BUILTIN
>  #define TARGET_EXPAND_BUILTIN aarch64_expand_builtin
>  
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index bbf11faaf4b..2064c23f961 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -96,12 +96,6 @@
>  
>  #define LONG_LONG_TYPE_SIZE  64
>  
> -#define FLOAT_TYPE_SIZE  32
> -
> -#define DOUBLE_TYPE_SIZE 64
> -
> -#define LONG_DOUBLE_TYPE_SIZE128
> -
>  /* This value is the amount of bytes a caller is allowed to drop the stack
> before probing has to be done for stack clash protection.  */
>  #define STACK_CLASH_CALLER_GUARD 1024


[committed] install.texi (gcn): Fix date of recommended newlib version

2024-06-03 Thread Tobias Burnus

Somehow, I was one year ahead. The commit wasn't 2025-03-25 but in 2024.

Committed as obvious, also to avoid future confusions.

Tobias
commit 16fb3abf0fb4b88ee0e27732db217909fa429a81
Author: Tobias Burnus 
Date:   Mon Jun 3 12:56:39 2024 +0200

install.texi (gcn): Fix date of recommended newlib version

gcc/ChangeLog:

* doc/install.texi (gcn): Fix date of recommended newlib version.

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 42b462a2ce2..c781646ac1f 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -3950,7 +3950,7 @@ by specifying a @code{--with-multilib-list=} that does not list @code{gfx1100}
 and @code{gfx1103}.
 
 Use Newlib (4.3.0 or newer; 4.4.0 contains some improvements and git commit
-7dd4eb1db (2025-03-25, post-4.4.0) fixes device console output for GFX10 and
+7dd4eb1db (2024-03-25, post-4.4.0) fixes device console output for GFX10 and
 GFX11 devices).
 
 To run the binaries, install the HSA Runtime from the


Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion

2024-06-03 Thread Ajit Agarwal
Hello Richard:

On 03/06/24 2:07 pm, Richard Sandiford wrote:
> Ajit Agarwal  writes:
>> Hello Richard:
>> On 31/05/24 8:08 pm, Richard Sandiford wrote:
>>> Ajit Agarwal  writes:
 On 31/05/24 3:23 pm, Richard Sandiford wrote:
> Ajit Agarwal  writes:
>> Hello All:
>>
>> Common infrastructure using generic code for pair mem fusion of different
>> targets.
>>
>> rs6000 target specific specific code implements virtual functions defined
>> by generic code.
>>
>> Code is implemented with pure virtual functions to interface with target
>> code.
>>
>> Target specific code are added in rs6000-mem-fusion.cc and additional 
>> virtual
>> function implementation required for rs6000 are added in 
>> aarch64-ldp-fusion.cc.
>>
>> Bootstrapped and regtested for aarch64-linux-gnu and powerpc64-linux-gnu.
>>
>> Thanks & Regards
>> Ajit
>>
>>
>> aarch64, rs6000, middle-end: Add implementation for different targets 
>> for pair mem fusion
>>
>> Common infrastructure using generic code for pair mem fusion of different
>> targets.
>>
>> rs6000 target specific specific code implements virtual functions defined
>> by generic code.
>>
>> Code is implemented with pure virtual functions to interface with target
>> code.
>>
>> Target specific code are added in rs6000-mem-fusion.cc and additional 
>> virtual
>> function implementation required for rs6000 are added in 
>> aarch64-ldp-fusion.cc.
>>
>> 2024-05-31  Ajit Kumar Agarwal  
>>
>> gcc/ChangeLog:
>>
>>  * config/aarch64/aarch64-ldp-fusion.cc: Add target specific
>>  implementation of additional virtual functions added in pair_fusion
>>  struct.
>>  * config/rs6000/rs6000-passes.def: New mem fusion pass
>>  before pass_early_remat.
>>  * config/rs6000/rs6000-mem-fusion.cc: Add new pass.
>>  Add target specific implementation using pure virtual
>>  functions.
>>  * config.gcc: Add new object file.
>>  * config/rs6000/rs6000-protos.h: Add new prototype for mem
>>  fusion pass.
>>  * config/rs6000/t-rs6000: Add new rule.
>>  * rtl-ssa/accesses.h: Moved set_is_live_out_use as public
>>  from private.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * g++.target/powerpc/me-fusion.C: New test.
>>  * g++.target/powerpc/mem-fusion-1.C: New test.
>>  * gcc.target/powerpc/mma-builtin-1.c: Modify test.
>> ---
>
> This isn't a complete review, just some initial questions & comments
> about selected parts.
>
>> [...]
>> +/* Check whether load can be fusable or not.
>> +   Return true if dependent use is UNSPEC otherwise false.  */
>> +bool
>> +rs6000_pair_fusion::fuseable_load_p (insn_info *info)
>> +{
>> +  rtx_insn *insn = info->rtl ();
>> +
>> +  for (rtx note = REG_NOTES (insn); note; note = XEXP (note, 1))
>> +if (REG_NOTE_KIND (note) == REG_EQUAL
>> +|| REG_NOTE_KIND (note) == REG_EQUIV)
>> +  return false;
>
> It's unusual to punt on an optimisation because of a REG_EQUAL/EQUIV
> note.  What's the reason for doing this?  Are you trying to avoid
> fusing pairs before reload that are equivalent to a MEM (i.e. have
> a natural spill slot)?  I think Alex hit a similar situation.
>

 We have used the above check because of some SPEC benchmarks failing with
 with MEM pairs having REG_EQUAL/EQUIV notes.

 By adding the checks the benchmarks passes and also it improves the
 performance.

 This checks were added during initial implementation of pair fusion
 pass.

 I will investigate further if this check is still required or not.
>>>
>>> Thanks.  If it does affect SPEC results, it would be good to look
>>> at the underlying reason, as a justification for the check.
>>>
>>> AIUI, the case Alex was due to the way that the RA recognises:
>>>
>>>   (set (reg R) (mem address-of-a-stack-variable))
>>> REG_EQUIV: (mem address-of-a-stack-variable)
>>>
>>> where the REG_EQUIV is either explicit or detected by the RA.
>>> If R needs to be spilled, it can then be spilled to its existing
>>> location on the stack.  And if R needs to be spilled in the
>>> instruction above (because of register pressure before the first
>>> use of R), the RA is able to delete the instruction.
>>>
>>> But if that is the reason, the condition should be restricted
>>> to cases in which the note is a memory.
>>>
>>> I think Alex had tried something similar and found that it wasn't
>>> always effective.
>>>
>>
>> Sure I will check.
 [...]
>> +   && info->is_real ())
>> +  {
>> +rtx_insn *rtl_insn = info->rtl ();
>> +rtx set = single_set (rtl_insn);
>> +
>> +if (set == NULL_RTX)
>> +  return

Re: [PATCH] Implement wrap-around arithmetics in DWARF expressions

2024-06-03 Thread Richard Biener
On Mon, Jun 3, 2024 at 11:18 AM Eric Botcazou  wrote:
>
> Hi,
>
> for the attached Ada package declaring a simple variable-sized record type,
> the compiler builds a "size function" in GENERIC which is at -Og:
>
> sizetype _GLOBAL.SZ5_p (p__enum p0)
> {
>   return (UNSIGNED_8) p0 + 252 <= 3 ? 32 : 0;
> }
>
> The UNSIGNED_8-based trick eliminates one branch but relies on the wrap-around
> arithmetics of UNSIGNED_8.  This size function is then translated into a DWARF
> procedure, but the wrap-around arithmetics is dropped, leading to a wrong size
> calculation when the DWARF procedure is executed.
>
> The fix also contains an optimization of unsigned comparisons in DWARF for the
> case where the type is smaller than the "generic type", as is the case here.
>
> Tested on x86-64/Linux, OK for the mainline?

OK.

Thanks,
Richard.

>
> 2024-06-03  Eric Botcazou  
>
> * dwarf2out.cc (loc_list_from_tree_1) ; Add const.
> : Use a signed comparison for small unsigned types.
> Implement wrap-around arithmetics for small integer types.
>
> --
> Eric Botcazou


Re: [patch] install.texi (nvptx): Recommend nvptx-tools 2024-05-30

2024-06-03 Thread Richard Biener
On Mon, 3 Jun 2024, Tobias Burnus wrote:

> Richard Biener wrote:
> > install.texi also has the issue that it's not pre-packaged in a
> > easy to discover and readable file in the release tarballs and that
> > the online version is only for trunk.
> 
> I always wondered why it is not included at https://gcc.gnu.org/onlinedocs/ —
> it would then also be linked from, e.g., https://gcc.gnu.org/gcc-14/index.html

I'm quite sure it's because nobody bothered to update
maintainer-scripts/update_web_docs_git.  The install docs are
generated into INSTALL/ in the release tarballs it seems but
it's html rather than a plain text file there.

Richard.

> Tobias
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: [PATCH] libstdc++: Implement C++26 features (P2546R5)

2024-06-03 Thread Ulrich Drepper
On Mon, Jun 3, 2024 at 12:20 PM Florian Weimer  wrote:

> Would it make sense to have a special function symbol for this, on which
> the debugger sets the breakpoint?  […]


Jon and I discussed more details off-list.  Hopefully a more complete
version is coming soon-ish.


Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion

2024-06-03 Thread Richard Sandiford
Ajit Agarwal  writes:
>> [...]
>> If it is intentional, what distinguishes things like vperm and xxinsertw
>> (and all other unspecs) from plain addition?
>> 
>>   [(set (match_operand:VSX_F 0 "vsx_register_operand" "=wa")
>> (plus:VSX_F (match_operand:VSX_F 1 "vsx_register_operand" "wa")
>>  (match_operand:VSX_F 2 "vsx_register_operand" "wa")))]
>>
>
> Plain addition are not supported currently.
> We have not seen many cases with plain addition and this patch
> will not accept plain addition.
>
>  
>> This is why the intention behind the patch is important.  As it stands,
>> it isn't clear what criteria the patch is using to distinguish "valid"
>> fuse candidates from "invalid" ones.
>>
>
> Intention behind this patch all variants of UNSPEC instructions are
> supported and uses without UNSPEC are not supported in this patch.

But why make the distinction this way though?  UNSPEC is a very
GCC-specific concept.  Whether something is an UNSPEC or some other
RTL code depends largely on historical accident.  E.g. we have specific
codes for VEC_SELECT, VEC_MERGE, and VEC_DUPLICATE, but don't have one
for VEC_PERM (even for VEC_PERM_EXPR exists in gimple).

It seems unlikely that GCC's choice about whether to represent something
as an UNSPEC or as another RTL code lines up neatly with the kind of
codegen decisions that a good assembly programmer would make.

I suppose another way of asking is to turn this around and say: what
kind of uses are you trying to exclude?  Presumably things are worse
if you remove this function override.  But what makes them worse?
What kind of uses cause the regression?

>>> [...]
>>> +  // Given insn_info pair I1 and I2, return true if offsets are in 
>>> order.
>>> +  virtual bool should_handle_unordered_insns (rtl_ssa::insn_info *i1,
>>> + rtl_ssa::insn_info *i2) = 
>>> 0;
>>> +
>>
>> This name seems a bit misleading.  The function is used in:
>>
>> @@ -2401,6 +2405,9 @@ pair_fusion_bb_info::try_fuse_pair (bool load_p, 
>> unsigned access_size,
>>reversed = true;
>>  }
>>  
>> +  if (!m_pass->should_handle_unordered_insns (i1, i2))
>> +return false;
>> +
>>rtx cand_mems[2];
>>rtx reg_ops[2];
>>rtx pats[2];
>>
>> and so it acts as a general opt-out.  The insns aren't known to be 
>> unordered.
>>
>> It looks like the rs6000 override requires the original insns to be
>> in offset order.  Could you say why that's necessary?  (Both in email
>> and as a comment in the code.)
>>
>
> Yes rs6000 requires the original load insns to be in offset order.
> Some regression tests like vect-outer-4f fails if we do load pair
> fusion with load offsets are not in offset order as this breaks lxvp 
> semantics.

 How does it break the semantics though?  In principle, the generic code
 only fuses if it has "proved" that the loads can happen in either order.
 So it shouldn't matter which order the hardware does things in.

 Could you give an example of the kind of situation that you want
 to avoid, and why it generates the wrong result?

>>>
>>> (insn 31 62 32 2 (set (reg:V16QI 177 [ MEM  
>>> [(short unsigned int *)vectp.62_36 + 64B] ])
>>> (mem:V16QI (plus:DI (reg/f:DI 121 [ vectp.62 ])
>>> (const_int 64 [0x40])) [1 MEM >> int> [(short unsigned int *)vectp.62_36 + 64B]+0 S16 A16]))  
>>> {vsx_movv16qi_64bit}
>>>  (nil))
>>> (insn 32 31 16 2 (set (reg:V16QI 178 [ MEM  
>>> [(short unsigned int *)vectp.62_36 + 80B] ])
>>> (mem:V16QI (plus:DI (reg/f:DI 121 [ vectp.62 ])
>>> (const_int 80 [0x50])) [1 MEM >> int> [(short unsigned int *)vectp.62_36 + 80B]+0 S16 A16]))  
>>> {vsx_movv16qi_64bit}
>>>  (nil))
>>> (insn 16 32 21 2 (set (reg:V16QI 159 [ MEM  
>>> [(short unsigned int *)vectp.62_36 + 16B] ])
>>> (mem:V16QI (plus:DI (reg/f:DI 121 [ vectp.62 ])
>>> (const_int 16 [0x10])) [1 MEM >> int> [(short unsigned int *)vectp.62_36 + 16B]+0 S16 A16]))  
>>> {vsx_movv16qi_64bit}
>>>  (nil))
>>> (insn 21 16 22 2 (set (reg:V16QI 165 [ MEM  
>>> [(short unsigned int *)vectp.62_36 + 32B] ])
>>> (mem:V16QI (plus:DI (reg/f:DI 121 [ vectp.62 ])
>>> (const_int 32 [0x20])) [1 MEM >> int> [(short unsigned int *)vectp.62_36 + 32B]+0 S16 A16])) 
>>> {vsx_movv16qi_64bit}
>>>  (nil))
>>> (insn 22 21 37 2 (set (reg:V16QI 166 [ MEM  
>>> [(short unsigned int *)vectp.62_36 + 48B] ])
>>> (mem:V16QI (plus:DI (reg/f:DI 121 [ vectp.62 ])
>>> (const_int 48 [0x30])) [1 MEM >> int> [(short unsigned int *)vectp.62_36 + 48B]+0 S16 A16])) 
>>> {vsx_movv16qi_64bit}
>>>  (nil))
>>>
>>> insn 22 and insn 31 is merged in the failure case and breaks the code.
>> 
>> What specifically goes wrong though?  This is just a sequence of loads
>> from t

[PING] [PATCH v4 2/3] [RFC] ifcvt: Allow more operations in multiple set if conversion

2024-06-03 Thread Manolis Tsamis
Currently the operations allowed for if conversion of a basic block with
multiple sets are few, namely REG, SUBREG and CONST_INT (as controlled by
bb_ok_for_noce_convert_multiple_sets).

This commit allows more operations (arithmetic, compare, etc) to participate
in if conversion. The target's profitability hook and ifcvt's costing is
expected to reject sequences that are unprofitable.

This is especially useful for targets which provide a rich selection of
conditional instructions (like aarch64 which has cinc, csneg, csinv, ccmp, ...)
which are currently not used in basic blocks with more than a single set.

gcc/ChangeLog:

* ifcvt.cc (try_emit_cmove_seq): Modify comments.
(noce_convert_multiple_sets_1): Modify comments.
(bb_ok_for_noce_convert_multiple_sets): Allow more operations.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/ifcvt_multiple_sets_arithm.c: New test.

Signed-off-by: Manolis Tsamis 
---

Changes in v4:
- Remove unnecessary hardcoded list of allowed ops in
bb_ok_for_noce_convert_multiple_sets.
- Set need_cmov based on BB live_out instead of REG_DEAD notes.
- Fix preexisting issues and improve the code that sets read_comparison.

 gcc/ifcvt.cc  | 34 +++-
 .../aarch64/ifcvt_multiple_sets_arithm.c  | 79 +++
 2 files changed, 92 insertions(+), 21 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 763a25f816e..dc00042be81 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -3432,13 +3432,13 @@ try_emit_cmove_seq (struct noce_if_info *if_info, rtx 
temp,
 /* We have something like:
 
  if (x > y)
-   { i = a; j = b; k = c; }
+   { i = EXPR_A; j = EXPR_B; k = EXPR_C; }
 
Make it:
 
- tmp_i = (x > y) ? a : i;
- tmp_j = (x > y) ? b : j;
- tmp_k = (x > y) ? c : k;
+ tmp_i = (x > y) ? EXPR_A : i;
+ tmp_j = (x > y) ? EXPR_B : j;
+ tmp_k = (x > y) ? EXPR_C : k;
  i = tmp_i;
  j = tmp_j;
  k = tmp_k;
@@ -3839,11 +3839,10 @@ noce_convert_multiple_sets_1 (struct noce_if_info 
*if_info,
 
 
 
-/* Return true iff basic block TEST_BB is comprised of only
-   (SET (REG) (REG)) insns suitable for conversion to a series
-   of conditional moves.  Also check that we have more than one set
-   (other routines can handle a single set better than we would), and
-   fewer than PARAM_MAX_RTL_IF_CONVERSION_INSNS sets.  While going
+/* Return true iff basic block TEST_BB is suitable for conversion to a
+   series of conditional moves.  Also check that we have more than one
+   set (other routines can handle a single set better than we would),
+   and fewer than PARAM_MAX_RTL_IF_CONVERSION_INSNS sets.  While going
through the insns store the sum of their potential costs in COST.  */
 
 static bool
@@ -3869,20 +3868,13 @@ bb_ok_for_noce_convert_multiple_sets (basic_block 
test_bb, unsigned *cost)
   rtx dest = SET_DEST (set);
   rtx src = SET_SRC (set);
 
-  /* We can possibly relax this, but for now only handle REG to REG
-(including subreg) moves.  This avoids any issues that might come
-from introducing loads/stores that might violate data-race-freedom
-guarantees.  */
-  if (!REG_P (dest))
-   return false;
-
-  if (!((REG_P (src) || CONSTANT_P (src))
-   || (GET_CODE (src) == SUBREG && REG_P (SUBREG_REG (src))
- && subreg_lowpart_p (src
+  /* Do not handle anything involving memory loads/stores since it might
+violate data-race-freedom guarantees.  */
+  if (!REG_P (dest) || contains_mem_rtx_p (src))
return false;
 
-  /* Destination must be appropriate for a conditional write.  */
-  if (!noce_operand_ok (dest))
+  /* Destination and source must be appropriate.  */
+  if (!noce_operand_ok (dest) || !noce_operand_ok (src))
return false;
 
   /* We must be able to conditionally move in this mode.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c 
b/gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c
new file mode 100644
index 000..d977f4d62ec
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c
@@ -0,0 +1,79 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-ce1" } */
+
+void sink2(int, int);
+void sink3(int, int, int);
+
+void cond1(int cond, int x, int y)
+{
+  if (cond)
+{
+  x = x << 4;
+  y = 1;
+}
+
+  sink2(x, y);
+}
+
+void cond2(int cond, int x, int y)
+{
+  if (cond)
+{
+  x++;
+  y++;
+}
+
+  sink2(x, y);
+}
+
+void cond3(int cond, int x1, int x2, int x3)
+{
+  if (cond)
+{
+  x1++;
+  x2++;
+  x3++;
+}
+
+  sink3(x1, x2, x3);
+}
+
+void cond4(int cond, int x, int y)
+{
+  if (cond)
+{
+  x += 2;
+  y += 3;
+}
+
+  sink2(x, y);
+}
+
+void cond5(int cond, int x, int y, int 

[PING] [PATCH v4 0/3] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets

2024-06-03 Thread Manolis Tsamis


noce_convert_multiple_sets has been introduced and extended over time to handle
if conversion for blocks with multiple sets. Currently this is focused on
register moves and rejects any sort of arithmetic operations.

This series is an extension to allow more sequences to take part in if
conversion. The first patch is a required change to emit correct code and the
second patch whitelists a larger number of operations through
bb_ok_for_noce_convert_multiple_sets. The third patch adds support to rewire
multiple registers in noce_convert_multiple_sets_1 and refactors the code with
a new helper info struct. The fourth patch removes some old code that should
not be needed anymore.

For targets that have a rich selection of conditional instructions,
like aarch64, I have seen an ~5x increase of profitable if conversions for
multiple set blocks in SPEC benchmarks. Also tested with a wide variety of
benchmarks and I have not seen performance regressions on either x64 / aarch64.

Some samples that previously resulted in a branch but now better use these
instructions can be seen in the provided test cases.

Bootstrapped and tested on AArch64 and x86-64.


Changes in v4:
- Remove unnecessary hardcoded list of allowed ops in
bb_ok_for_noce_convert_multiple_sets.
- Set need_cmov based on BB live_out instead of REG_DEAD notes.
- Fix preexisting issues and improve the code that sets read_comparison.

Manolis Tsamis (3):
  [RFC] ifcvt: handle sequences that clobber flags in
noce_convert_multiple_sets
  [RFC] ifcvt: Allow more operations in multiple set if conversion
  [RFC] ifcvt: Handle multiple rewired regs and refactor
noce_convert_multiple_sets

 gcc/ifcvt.cc  | 383 --
 gcc/ifcvt.h   |  16 +
 .../aarch64/ifcvt_multiple_sets_arithm.c  |  79 
 .../aarch64/ifcvt_multiple_sets_rewire.c  |  20 +
 4 files changed, 292 insertions(+), 206 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_rewire.c

-- 
2.44.0



[PING] [PATCH v4 1/3] [RFC] ifcvt: handle sequences that clobber flags in noce_convert_multiple_sets

2024-06-03 Thread Manolis Tsamis
This is an extension of what was done in PR106590.

Currently if a sequence generated in noce_convert_multiple_sets clobbers the
condition rtx (cc_cmp or rev_cc_cmp) then only seq1 is used afterwards
(sequences that emit the comparison itself). Since this applies only from the
next iteration it assumes that the sequences generated (in particular seq2)
doesn't clobber the condition rtx itself before using it in the if_then_else,
which is only true in specific cases (currently only register/subregister moves
are allowed).

This patch changes this so it also tests if seq2 clobbers cc_cmp/rev_cc_cmp in
the current iteration. This makes it possible to include arithmetic operations
in noce_convert_multiple_sets.

It also makes the code that checks whether the condition is used outside of the
if_then_else emitted more robust.

gcc/ChangeLog:

* ifcvt.cc (check_for_cc_cmp_clobbers): Use modified_in_p instead.
(noce_convert_multiple_sets_1): Don't use seq2 if it clobbers cc_cmp.
Refactor the code that sets read_comparison.

Signed-off-by: Manolis Tsamis 
---

(no changes since v1)

 gcc/ifcvt.cc | 106 ---
 1 file changed, 59 insertions(+), 47 deletions(-)

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 58ed42673e5..763a25f816e 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -3592,20 +3592,6 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
   return true;
 }
 
-/* Helper function for noce_convert_multiple_sets_1.  If store to
-   DEST can affect P[0] or P[1], clear P[0].  Called via note_stores.  */
-
-static void
-check_for_cc_cmp_clobbers (rtx dest, const_rtx, void *p0)
-{
-  rtx *p = (rtx *) p0;
-  if (p[0] == NULL_RTX)
-return;
-  if (reg_overlap_mentioned_p (dest, p[0])
-  || (p[1] && reg_overlap_mentioned_p (dest, p[1])))
-p[0] = NULL_RTX;
-}
-
 /* This goes through all relevant insns of IF_INFO->then_bb and tries to
create conditional moves.  In case a simple move sufficis the insn
should be listed in NEED_NO_CMOV.  The rewired-src cases should be
@@ -3731,36 +3717,67 @@ noce_convert_multiple_sets_1 (struct noce_if_info 
*if_info,
 creating an additional compare for each.  If successful, costing
 is easier and this sequence is usually preferred.  */
   if (cc_cmp)
-   seq2 = try_emit_cmove_seq (if_info, temp, cond,
-  new_val, old_val, need_cmov,
-  &cost2, &temp_dest2, cc_cmp, rev_cc_cmp);
+   {
+ seq2 = try_emit_cmove_seq (if_info, temp, cond,
+new_val, old_val, need_cmov,
+&cost2, &temp_dest2, cc_cmp, rev_cc_cmp);
+
+ /* The if_then_else in SEQ2 may be affected when cc_cmp/rev_cc_cmp is
+clobbered.  We can't safely use the sequence in this case.  */
+ if (seq2 && (modified_in_p (cc_cmp, seq2)
+ || (rev_cc_cmp && modified_in_p (rev_cc_cmp, seq2
+   seq2 = NULL;
+   }
 
   /* The backend might have created a sequence that uses the
-condition.  Check this.  */
+condition as a value.  Check this.  */
+
+  /* We cannot handle anything more complex than a reg or constant.  */
+  if (!REG_P (XEXP (cond, 0)) && !CONSTANT_P (XEXP (cond, 0)))
+   read_comparison = true;
+
+  if (!REG_P (XEXP (cond, 1)) && !CONSTANT_P (XEXP (cond, 1)))
+   read_comparison = true;
+
   rtx_insn *walk = seq2;
-  while (walk)
+  int if_then_else_count = 0;
+  while (walk && !read_comparison)
{
- rtx set = single_set (walk);
+ rtx exprs_to_check[2];
+ unsigned int exprs_count = 0;
 
- if (!set || !SET_SRC (set))
+ rtx set = single_set (walk);
+ if (set && XEXP (set, 1)
+ && GET_CODE (XEXP (set, 1)) == IF_THEN_ELSE)
{
- walk = NEXT_INSN (walk);
- continue;
+ /* We assume that this is the cmove created by the backend that
+naturally uses the condition.  */
+ exprs_to_check[exprs_count++] = XEXP (XEXP (set, 1), 1);
+ exprs_to_check[exprs_count++] = XEXP (XEXP (set, 1), 2);
+ if_then_else_count++;
}
+ else if (NONDEBUG_INSN_P (walk))
+   exprs_to_check[exprs_count++] = PATTERN (walk);
 
- rtx src = SET_SRC (set);
+ /* Bail if we get more than one if_then_else because the assumption
+above may be incorrect.  */
+ if (if_then_else_count > 1)
+   {
+ read_comparison = true;
+ break;
+   }
 
- if (XEXP (set, 1) && GET_CODE (XEXP (set, 1)) == IF_THEN_ELSE)
-   ; /* We assume that this is the cmove created by the backend that
-naturally uses the condition.  Therefore we ignore it.  */
- else
+ for (unsigned int i = 0; i < exprs_count; i++)
{

[PING] [PATCH v4 3/3] [RFC] ifcvt: Handle multiple rewired regs and refactor noce_convert_multiple_sets

2024-06-03 Thread Manolis Tsamis
The existing implementation of need_cmov_or_rewire and
noce_convert_multiple_sets_1 assumes that sets are either REG or SUBREG.
This commit enchances them so they can handle/rewire arbitrary set statements.

To do that a new helper struct noce_multiple_sets_info is introduced which is
used by noce_convert_multiple_sets and its helper functions. This results in
cleaner function signatures, improved efficientcy (a number of vecs and hash
set/map are replaced with a single vec of struct) and simplicity.

gcc/ChangeLog:

* ifcvt.cc (need_cmov_or_rewire): Renamed init_noce_multiple_sets_info.
(init_noce_multiple_sets_info): Initialize noce_multiple_sets_info.
(noce_convert_multiple_sets_1): Use noce_multiple_sets_info and handle
rewiring of multiple registers.
(noce_convert_multiple_sets): Updated to use noce_multiple_sets_info.
* ifcvt.h (struct noce_multiple_sets_info): Introduce new struct
noce_multiple_sets_info to store info for noce_convert_multiple_sets.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/ifcvt_multiple_sets_rewire.c: New test.

Signed-off-by: Manolis Tsamis 
---

(no changes since v1)

 gcc/ifcvt.cc  | 243 --
 gcc/ifcvt.h   |  16 ++
 .../aarch64/ifcvt_multiple_sets_rewire.c  |  20 ++
 3 files changed, 141 insertions(+), 138 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_rewire.c

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index dc00042be81..8e36c16ee57 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -98,14 +98,10 @@ static bool dead_or_predicable (basic_block, basic_block, 
basic_block,
edge, bool);
 static void noce_emit_move_insn (rtx, rtx);
 static rtx_insn *block_has_only_trap (basic_block);
-static void need_cmov_or_rewire (basic_block, hash_set *,
-hash_map *);
+static void init_noce_multiple_sets_info (basic_block,
+  auto_delete_vec &);
 static bool noce_convert_multiple_sets_1 (struct noce_if_info *,
- hash_set *,
- hash_map *,
- auto_vec *,
- auto_vec *,
- auto_vec *, int *);
+  auto_delete_vec &, int *);
 
 /* Count the number of non-jump active insns in BB.  */
 
@@ -3487,24 +3483,13 @@ noce_convert_multiple_sets (struct noce_if_info 
*if_info)
   rtx x = XEXP (cond, 0);
   rtx y = XEXP (cond, 1);
 
-  /* The true targets for a conditional move.  */
-  auto_vec targets;
-  /* The temporaries introduced to allow us to not consider register
- overlap.  */
-  auto_vec temporaries;
-  /* The insns we've emitted.  */
-  auto_vec unmodified_insns;
-
-  hash_set need_no_cmov;
-  hash_map rewired_src;
-
-  need_cmov_or_rewire (then_bb, &need_no_cmov, &rewired_src);
+  auto_delete_vec insn_info;
+  init_noce_multiple_sets_info (then_bb, insn_info);
 
   int last_needs_comparison = -1;
 
   bool ok = noce_convert_multiple_sets_1
-(if_info, &need_no_cmov, &rewired_src, &targets, &temporaries,
- &unmodified_insns, &last_needs_comparison);
+(if_info, insn_info, &last_needs_comparison);
   if (!ok)
   return false;
 
@@ -3519,8 +3504,7 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
   end_sequence ();
   start_sequence ();
   ok = noce_convert_multiple_sets_1
-   (if_info, &need_no_cmov, &rewired_src, &targets, &temporaries,
-&unmodified_insns, &last_needs_comparison);
+   (if_info, insn_info, &last_needs_comparison);
   /* Actually we should not fail anymore if we reached here,
 but better still check.  */
   if (!ok)
@@ -3529,12 +3513,12 @@ noce_convert_multiple_sets (struct noce_if_info 
*if_info)
 
   /* We must have seen some sort of insn to insert, otherwise we were
  given an empty BB to convert, and we can't handle that.  */
-  gcc_assert (!unmodified_insns.is_empty ());
+  gcc_assert (!insn_info.is_empty ());
 
   /* Now fixup the assignments.  */
-  for (unsigned i = 0; i < targets.length (); i++)
-if (targets[i] != temporaries[i])
-  noce_emit_move_insn (targets[i], temporaries[i]);
+  for (unsigned i = 0; i < insn_info.length (); i++)
+if (insn_info[i]->target != insn_info[i]->temporary)
+  noce_emit_move_insn (insn_info[i]->target, insn_info[i]->temporary);
 
   /* Actually emit the sequence if it isn't too expensive.  */
   rtx_insn *seq = get_insns ();
@@ -3549,10 +3533,10 @@ noce_convert_multiple_sets (struct noce_if_info 
*if_info)
 set_used_flags (insn);
 
   /* Mark all our temporaries and targets as used.  */
-  for (unsigned i = 0; i < targets.length (); i++)
+  for (unsigned i = 0; i < insn_info.length (); i++)
 {
-  set_used_flags (temporaries[i]);
-  set_used_flags (targets[i]);
+  set_used_flags (insn_info[i]-

Re: [PATCH 14/52] fr30: Remove macros {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE

2024-06-03 Thread Nick Clifton

Hi Kewen,

This is to remove macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
defines in fr30 port.

gcc/ChangeLog:

* config/fr30/fr30.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.


Approved - please apply.

Cheers
  Nick




Re: [PATCH] ifcvt: Clarify if_info.original_cost.

2024-06-03 Thread Richard Sandiford
Robin Dapp  writes:
> Hi,
>
> before noce_find_if_block processes a block it sets up an if_info
> structure that holds the original costs.  At that point the costs of
> the then/else blocks have not been added so we only care about the
> "if" cost.
>
> The code originally used BRANCH_COST for that but was then changed
> to COST_N_INSNS (2) - a compare and a jump.
> This patch computes the jump costs via
>   insn_cost (if_info.jump, ...)
> which is supposed to incorporate the branch costs and, in case of a CC
> comparison,
>   pattern_cost (if_info.cond, ...)
> which is supposed to account for the CC creation.
>
> For compare_and_jump patterns insn_cost should have already computed
> the right cost.
>
> Does this "split" make sense, generally?
>
> Bootstrapped and regtested on x86, aarch64 and power10.  Regtested
> on riscv.
>
> Regards
>  Robin
>
> gcc/ChangeLog:
>
>   * ifcvt.cc (noce_process_if_block): Subtract condition pattern
>   cost if applicable.
>   (noce_find_if_block): Use insn_cost and pattern_cost for
>   original cost.
> ---
>  gcc/ifcvt.cc | 16 ++--
>  1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
> index 58ed42673e5..305b9faed38 100644
> --- a/gcc/ifcvt.cc
> +++ b/gcc/ifcvt.cc
> @@ -3940,7 +3940,9 @@ noce_process_if_block (struct noce_if_info *if_info)
>   ??? Actually, instead of the branch instruction costs we might want
>   to use COSTS_N_INSNS (BRANCH_COST ()) as in other places.  */
>  
> -  unsigned potential_cost = if_info->original_cost - COSTS_N_INSNS (1);
> +  unsigned potential_cost = if_info->original_cost;
> +  if (cc_in_cond (if_info->cond))
> +potential_cost -= pattern_cost (if_info->cond, if_info->speed_p);
>unsigned old_cost = if_info->original_cost;
>if (!else_bb
>&& HAVE_conditional_move
> @@ -4703,11 +4705,13 @@ noce_find_if_block (basic_block test_bb, edge 
> then_edge, edge else_edge,
>  = targetm.max_noce_ifcvt_seq_cost (then_edge);
>/* We'll add in the cost of THEN_BB and ELSE_BB later, when we check
>   that they are valid to transform.  We can't easily get back to the insn
> - for COND (and it may not exist if we had to canonicalize to get COND),
> - and jump_insns are always given a cost of 1 by seq_cost, so treat
> - both instructions as having cost COSTS_N_INSNS (1).  */
> -  if_info.original_cost = COSTS_N_INSNS (2);
> -
> + for COND (and it may not exist if we had to canonicalize to get COND).
> + Here we assume one CC compare insn (if the target uses CC) and one
> + jump insn that is costed via insn_cost.  It is assumed that the
> + costs of a jump insn are dependent on the branch costs.  */
> +  if (cc_in_cond (if_info.cond))
> +if_info.original_cost = pattern_cost (if_info.cond, if_info.speed_p);
> +  if_info.original_cost += insn_cost (if_info.jump, if_info.speed_p);
>  
>/* Do the real work.  */

Is there any way we can avoid using pattern_cost here?  Using it means
that we can make use of targetm.insn_cost for the jump but circumvent
it for the condition, giving a bit of a mixed metric.

(I realise there are existing calls to pattern_cost in ifcvt.cc,
but if possible I think we should try to avoid adding more.)

Thanks,
Richard


[r14-10271 Regression] FAIL: gcc.target/i386/avx10_1-25.c (test for excess errors) on Linux/x86_64

2024-06-03 Thread haochen.jiang
On Linux/x86_64,

97474ba2075dc3c397bbc2861646561dcfd13386 is the first bad commit
commit 97474ba2075dc3c397bbc2861646561dcfd13386
Author: Haochen Jiang 
Date:   Mon May 20 15:52:32 2024 +0800

Add AVX10.1 target_clones support

caused

FAIL: gcc.target/i386/avx10_1-25.c (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/gcc-14/releases/gcc-14/r14-10271/usr
 --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx10_1-25.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx10_1-25.c --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com.)
(If you met problems with cascadelake related, disabling AVX512F in command 
line might save that.)
(However, please make sure that there is no potential problems with AVX512.)


Re: [PATCH 06/52] m2: Replace uses of {FLOAT, {, LONG_}DOUBLE}_TYPE_SIZE

2024-06-03 Thread Gaius Mulley
Kewen Lin  writes:

> Joseph pointed out "floating types should have their mode,
> not a poorly defined precision value" in the discussion[1],
> as he and Richi suggested, the existing macros
> {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
> hook mode_for_floating_type.  To be prepared for that, this
> patch is to replace use of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
> in m2 with TYPE_PRECISION of {float,{,long_}double}_type_node.
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html
>
> gcc/m2/ChangeLog:
>
>   * gm2-gcc/m2type.cc (build_m2_short_real_node): Use TYPE_PRECISION of
>   float_type_node to replace FLOAT_TYPE_SIZE.
>   (build_m2_real_node): Use TYPE_PRECISION of double_type_node to
>   replace DOUBLE_TYPE_SIZE.
>   (build_m2_long_real_node): Use TYPE_PRECISION of
>   long_double_type_node to replace LONG_DOUBLE_TYPE_SIZE.

> ---
>  gcc/m2/gm2-gcc/m2type.cc | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/m2/gm2-gcc/m2type.cc b/gcc/m2/gm2-gcc/m2type.cc
> index 571923c08ef..d52cbdf0b99 100644
> --- a/gcc/m2/gm2-gcc/m2type.cc
> +++ b/gcc/m2/gm2-gcc/m2type.cc
> @@ -1420,7 +1420,7 @@ build_m2_short_real_node (void)
>/* Define `REAL'.  */
>  
>c = make_node (REAL_TYPE);
> -  TYPE_PRECISION (c) = FLOAT_TYPE_SIZE;
> +  TYPE_PRECISION (c) = TYPE_PRECISION (float_type_node);
>layout_type (c);
>return c;
>  }
> @@ -1433,7 +1433,7 @@ build_m2_real_node (void)
>/* Define `REAL'.  */
>  
>c = make_node (REAL_TYPE);
> -  TYPE_PRECISION (c) = DOUBLE_TYPE_SIZE;
> +  TYPE_PRECISION (c) = TYPE_PRECISION (double_type_node);
>layout_type (c);
>return c;
>  }
> @@ -1447,7 +1447,7 @@ build_m2_long_real_node (void)
>if (M2Options_GetIBMLongDouble ())
>  {
>longreal = make_node (REAL_TYPE);
> -  TYPE_PRECISION (longreal) = LONG_DOUBLE_TYPE_SIZE;
> +  TYPE_PRECISION (longreal) = TYPE_PRECISION (long_double_type_node);
>  }
>else if (M2Options_GetIEEELongDouble ())
>  longreal = float128_type_node;

all look good to me thanks,

regards,
Gaius


[PATCH] testsuite/115304 - properly guard gcc.dg/vect/slp-gap-1.c

2024-06-03 Thread Richard Biener
Testing on sparc shows we need vect_unpack and vect_perm.  This
isn't enough to resolve the GCN fail which ends up using interleaving.

Tested on x86_64-linux and sparc-solaris.

PR testsuite/115304
* gcc.dg/vect/slp-gap-1.c: Require vect_unpack and vect_perm.
---
 gcc/testsuite/gcc.dg/vect/slp-gap-1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-gap-1.c 
b/gcc/testsuite/gcc.dg/vect/slp-gap-1.c
index 36463ca22c5..9856da7a7f4 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-gap-1.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-gap-1.c
@@ -15,4 +15,4 @@ void pixel_sub_wxh(int16_t * __restrict diff, uint8_t *pix1, 
uint8_t *pix2) {
 /* We can vectorize this without peeling for gaps and thus without epilogue,
but the only thing we can reliably scan is the zero-padding trick for the
partial loads.  */
-/* { dg-final { scan-tree-dump-times "\{_\[0-9\]\+, 0" 6 "vect" { target 
vect64 } } } */
+/* { dg-final { scan-tree-dump-times "\{_\[0-9\]\+, 0" 6 "vect" { target { 
vect64 && { vect_unpack && vect_perm } } } } } */
-- 
2.35.3


[COMMITTED] Remove value_range typedef.

2024-06-03 Thread Aldy Hernandez
Now that pointers and integers have been disambiguated from irange,
and all the pointer range temporaries use prange, we can reclaim
value_range as a general purpose range container.

This patch removes the typedef, in favor of int_range_max, thus
providing slightly better ranges in places.  I have also used
int_range<1> or <2> when it's known ahead of time how big the range will
be, thus saving a few words.

In a follow-up patch I will rename the Value_Range temporary to
value_range.

No change in performance.

gcc/ChangeLog:

* builtins.cc (expand_builtin_strnlen): Replace value_range use
with int_range_max or irange when appropriate.
(determine_block_size): Same.
* fold-const.cc (minmax_from_comparison): Same.
* gimple-array-bounds.cc (check_out_of_bounds_and_warn): Same.
(array_bounds_checker::check_array_ref): Same.
* gimple-fold.cc (size_must_be_zero_p): Same.
* gimple-predicate-analysis.cc (find_var_cmp_const): Same.
* gimple-ssa-sprintf.cc (get_int_range): Same.
(format_integer): Same.
(try_substitute_return_value): Same.
(handle_printf_call): Same.
* gimple-ssa-warn-restrict.cc
(builtin_memref::extend_offset_range): Same.
* graphite-sese-to-poly.cc (add_param_constraints): Same.
* internal-fn.cc (get_min_precision): Same.
* match.pd: Same.
* pointer-query.cc (get_size_range): Same.
* range-op.cc (get_shift_range): Same.
(operator_trunc_mod::op1_range): Same.
(operator_trunc_mod::op2_range): Same.
* range.cc (range_negatives): Same.
* range.h (range_positives): Same.
(range_negatives): Same.
* tree-affine.cc (expr_to_aff_combination): Same.
* tree-data-ref.cc (compute_distributive_range): Same.
(nop_conversion_for_offset_p): Same.
(split_constant_offset): Same.
(split_constant_offset_1): Same.
(dr_step_indicator): Same.
* tree-dfa.cc (get_ref_base_and_extent): Same.
* tree-scalar-evolution.cc (iv_can_overflow_p): Same.
* tree-ssa-math-opts.cc (optimize_spaceship): Same.
* tree-ssa-pre.cc (insert_into_preds_of_block): Same.
* tree-ssa-reassoc.cc (optimize_range_tests_to_bit_test): Same.
* tree-ssa-strlen.cc (compare_nonzero_chars): Same.
(dump_strlen_info): Same.
(get_range_strlen_dynamic): Same.
(set_strlen_range): Same.
(maybe_diag_stxncpy_trunc): Same.
(strlen_pass::get_len_or_size): Same.
(strlen_pass::handle_builtin_string_cmp): Same.
(strlen_pass::count_nonzero_bytes_addr): Same.
(strlen_pass::handle_integral_assign): Same.
* tree-switch-conversion.cc (bit_test_cluster::emit): Same.
* tree-vect-loop-manip.cc (vect_gen_vector_loop_niters): Same.
(vect_do_peeling): Same.
* tree-vect-patterns.cc (vect_get_range_info): Same.
(vect_recog_divmod_pattern): Same.
* tree.cc (get_range_pos_neg): Same.
* value-range.cc (debug): Remove value_range variants.
* value-range.h (value_range): Remove typedef.
* vr-values.cc
(simplify_using_ranges::op_with_boolean_value_range_p): Replace
value_range use with int_range_max or irange when appropriate.
(check_for_binary_op_overflow): Same.
(simplify_using_ranges::legacy_fold_cond_overflow): Same.
(find_case_label_ranges): Same.
(simplify_using_ranges::simplify_abs_using_ranges): Same.
(test_for_singularity): Same.
(simplify_using_ranges::simplify_compare_using_ranges_1): Same.
(simplify_using_ranges::simplify_casted_compare): Same.
(simplify_using_ranges::simplify_switch_using_ranges): Same.
(simplify_conversion_using_ranges): Same.
(simplify_using_ranges::two_valued_val_range_p): Same.
---
 gcc/builtins.cc  |  4 ++--
 gcc/fold-const.cc|  4 ++--
 gcc/gimple-array-bounds.cc   |  4 ++--
 gcc/gimple-fold.cc   |  4 ++--
 gcc/gimple-predicate-analysis.cc |  2 +-
 gcc/gimple-ssa-sprintf.cc|  8 +++
 gcc/gimple-ssa-warn-restrict.cc  |  2 +-
 gcc/graphite-sese-to-poly.cc |  2 +-
 gcc/internal-fn.cc   |  2 +-
 gcc/match.pd | 22 +-
 gcc/pointer-query.cc |  2 +-
 gcc/range-op.cc  | 21 +-
 gcc/range.cc | 10 -
 gcc/range.h  |  4 ++--
 gcc/tree-affine.cc   |  2 +-
 gcc/tree-data-ref.cc | 28 +++
 gcc/tree-dfa.cc  |  2 +-
 gcc/tree-scalar-evolution.cc |  2 +-
 gcc/tree-ssa-math-opts.cc|  2 +-
 gcc/tree-ssa-pre.cc  |  2 +-
 gcc/tree-ssa-reassoc.cc  |  2 +-
 gcc/tree-ssa-strlen.cc   | 22 +-
 gcc/tree-switch-conversion.cc|  2 +-
 gcc/tree-vect-loop-manip.cc

[r15-984 Regression] FAIL: gcc.target/i386/pr86722.c scan-assembler-times (?n)(?:andnpd|andpd) 1 on Linux/x86_64

2024-06-03 Thread haochen.jiang
On Linux/x86_64,

ac306de7d5100d3682eae2270995a9abbe19db38 is the first bad commit
commit ac306de7d5100d3682eae2270995a9abbe19db38
Author: liuhongt 
Date:   Fri May 31 14:38:07 2024 +0800

Add some preference for floating point rtl ifcvt when sse4.1 is not 
available

caused

FAIL: gcc.target/i386/pr86722.c scan-assembler-times (?n)(?:andnpd|andpd) 1

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-984/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr86722.c --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com.)
(If you met problems with cascadelake related, disabling AVX512F in command 
line might save that.)
(However, please make sure that there is no potential problems with AVX512.)


Re: [PATCH 15/52] frv: Remove macros {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE

2024-06-03 Thread Nick Clifton

Hi Kewen,


This is to remove macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
defines in frv port.

gcc/ChangeLog:

* config/frv/frv.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.


Approved - please apply.

Cheers
  Nick




Re: [PATCH 18/52] iq2000: Remove macros {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE

2024-06-03 Thread Nick Clifton

Hi Kewen,


This is to remove macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
defines in iq2000 port.

gcc/ChangeLog:

* config/iq2000/iq2000.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.


Approved - please apply.

Cheers
  Nick




Re: [PATCH 20/52] m32c: Remove macros {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE

2024-06-03 Thread Nick Clifton

Hi Kewen,


This is to remove macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
defines in m32c port.

gcc/ChangeLog:

* config/m32c/m32c.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.


Approved - please apply.

Cheers
  Nick




Re: [PATCH 21/52] m32r: Remove macros {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE

2024-06-03 Thread Nick Clifton

Hi Kewen,


This is to remove macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
defines in m32r port.

gcc/ChangeLog:

* config/m32r/m32r.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.


Approved - please apply.

Cheers
  Nick




Re: [PATCH] libstdc++: Implement C++26 features (P2546R5)

2024-06-03 Thread Peter0x44

On 2024-06-01 03:22, Jonathan Wakely wrote:

Here's an implementation of the C++26  header.

We should really have some tests that invoke GDB and check that the
breakpoint works when a debugger is attached. That seems tricky to do
via the main conformance.exp testsuite. It could be done via the
prettyprinters.exp testsuite, which already uses GDB, but would require
some changes to the gdb-test procedure, which assumes it needs to 
insert

its own breakpoint at marked spots in the code.

I think we could add this without those tests, and improve it later.

-- >8 --

It would be good to provide a macOS definition of is_debugger_present,
but that isn't included in this change.

libstdc++-v3/ChangeLog:

* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac: Check for facilities needed by .
* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/bits/version.def (debugging): Add.
* include/bits/version.h: Regenerate.
* src/c++26/Makefile.am: Add new file.
* src/c++26/Makefile.in: Regenerate.
* include/std/debugging: New file.
* src/c++26/debugging.cc: New file.
* testsuite/19_diagnostics/debugging/breakpoint.cc: New test.
* testsuite/19_diagnostics/debugging/breakpoint_if_debugging.cc:
New test.
---
 libstdc++-v3/config.h.in  |   9 ++
 libstdc++-v3/configure|  22 +++
 libstdc++-v3/configure.ac |   9 ++
 libstdc++-v3/include/Makefile.am  |   1 +
 libstdc++-v3/include/Makefile.in  |   1 +
 libstdc++-v3/include/bits/version.def |   9 ++
 libstdc++-v3/include/bits/version.h   |  10 ++
 libstdc++-v3/include/std/debugging|  82 
 libstdc++-v3/src/c++26/Makefile.am|   4 +-
 libstdc++-v3/src/c++26/Makefile.in|   7 +-
 libstdc++-v3/src/c++26/debugging.cc   | 126 ++
 .../19_diagnostics/debugging/breakpoint.cc|   9 ++
 .../debugging/breakpoint_if_debugging.cc  |   9 ++
 13 files changed, 295 insertions(+), 3 deletions(-)
 create mode 100644 libstdc++-v3/include/std/debugging
 create mode 100644 libstdc++-v3/src/c++26/debugging.cc
 create mode 100644 
libstdc++-v3/testsuite/19_diagnostics/debugging/breakpoint.cc
 create mode 100644 
libstdc++-v3/testsuite/19_diagnostics/debugging/breakpoint_if_debugging.cc


diff --git a/libstdc++-v3/config.h.in b/libstdc++-v3/config.h.in
index 486ba450749..07203815459 100644
--- a/libstdc++-v3/config.h.in
+++ b/libstdc++-v3/config.h.in
@@ -70,6 +70,9 @@
 /* Define to 1 if you have the `cosl' function. */
 #undef HAVE_COSL

+/* Define to 1 if you have the  header file. */
+#undef HAVE_DEBUGAPI_H
+
 /* Define to 1 if you have the declaration of `strnlen', and to 0 if 
you

don't. */
 #undef HAVE_DECL_STRNLEN
@@ -436,6 +439,9 @@
 /* Define to 1 if you have the  header file. */
 #undef HAVE_SYS_PARAM_H

+/* Define to 1 if you have the  header file. */
+#undef HAVE_SYS_PTRACE_H
+
 /* Define to 1 if you have the  header file. */
 #undef HAVE_SYS_RESOURCE_H

@@ -847,6 +853,9 @@
 /* Define if nl_langinfo_l should be used for std::text_encoding. */
 #undef _GLIBCXX_USE_NL_LANGINFO_L

+/* Define if /proc/self/status should be used for . */
+#undef _GLIBCXX_USE_PROC_SELF_STATUS
+
 /* Define if pthreads_num_processors_np is available in . 
*/

 #undef _GLIBCXX_USE_PTHREADS_NUM_PROCESSORS_NP

diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
index 5645e991af7..55ddf36a7f1 100755
--- a/libstdc++-v3/configure
+++ b/libstdc++-v3/configure
@@ -54612,6 +54612,28 @@ ac_compiler_gnu=$ac_cv_c_compiler_gnu



+# For std::is_debugger_present
+case "$target_os" in
+  linux*)
+
+$as_echo "#define _GLIBCXX_USE_PROC_SELF_STATUS 1" >>confdefs.h
+
+;;
+esac
+for ac_header in sys/ptrace.h debugapi.h
+do :
+  as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh`
+ac_fn_c_check_header_mongrel "$LINENO" "$ac_header" "$as_ac_Header" 
"$ac_includes_default"

+if eval test \"x\$"$as_ac_Header"\" = x"yes"; then :
+  cat >>confdefs.h <<_ACEOF
+#define `$as_echo "HAVE_$ac_header" | $as_tr_cpp` 1
+_ACEOF
+
+fi
+
+done
+
+
 # Define documentation rules conditionally.

 # See if makeinfo has been installed and is modern enough
diff --git a/libstdc++-v3/configure.ac b/libstdc++-v3/configure.ac
index ccb24a82be7..96b412fb7ae 100644
--- a/libstdc++-v3/configure.ac
+++ b/libstdc++-v3/configure.ac
@@ -573,6 +573,15 @@ GLIBCXX_CHECK_FILEBUF_NATIVE_HANDLES
 # For std::text_encoding
 GLIBCXX_CHECK_TEXT_ENCODING

+# For std::is_debugger_present
+case "$target_os" in
+  linux*)
+AC_DEFINE([_GLIBCXX_USE_PROC_SELF_STATUS],1,
+ [Define if /proc/self/status should be used for .])
+;;
+esac
+AC_CHECK_HEADERS([sys/ptrace.h debugapi.h])
+
 # Define documentation rules conditionally.

 # See if makeinfo has been installed and is modern enough
diff --git a/libstdc++-v3/in

Re: [PATCH 25/52] msp430: Remove macros {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE

2024-06-03 Thread Nick Clifton

Hi Kewen,


This is to remove macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
defines in msp430 port.

gcc/ChangeLog:

* config/msp430/msp430.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.


Approved - please apply.

Cheers
  Nick




Re: [PATCH 32/52] stormy16: Remove macros {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE

2024-06-03 Thread Nick Clifton

Hi Kewen,


This is to remove macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
defines in stormy16 port.

gcc/ChangeLog:

* config/stormy16/stormy16.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.


Approved - please apply.

Cheers
  Nick




Re: [PATCH 43/52] rx: New hook implementation rx_c_mode_for_floating_type

2024-06-03 Thread Nick Clifton

Hi Kewen,


This is to remove macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
defines in rx port, and add new port specific hook
implementation rx_c_mode_for_floating_type.

gcc/ChangeLog:

* config/rx/rx.cc (rx_c_mode_for_floating_type): New function.
(TARGET_C_MODE_FOR_FLOATING_TYPE): New macro.
* config/rx/rx.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.


Approved - please apply.

Cheers
  Nick




[PATCH] Arm: Fix ldrd offset range [PR115153]

2024-06-03 Thread Wilco Dijkstra

The valid offset range of LDRD in arm_legitimate_index_p is increased to
-1024..1020 if NEON is enabled since VALID_NEON_DREG_MODE includes DImode.
Fix this by moving the LDRD check earlier.

Passes bootstrap & regress, OK for commit?

gcc:
PR target/115153
* config/arm/arm.cc (arm_legitimate_index_p): Move LDRD case before 
NEON.
(thumb2_legitimate_index_p): Update comments.

gcc/testsuite:
PR target/115153
* gcc.target/arm/pr115153.c: Add new test.

---

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 
ea0c963a4d67ecd70e1571624e84dfe46d757df9..d260ebe0734d424942a773386986a02fe6d1803c
 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -8852,6 +8852,28 @@ arm_legitimate_index_p (machine_mode mode, rtx index, 
RTX_CODE outer,
&& INTVAL (index) > -1024
&& (INTVAL (index) & 3) == 0);
 
+  if (arm_address_register_rtx_p (index, strict_p)
+  && (GET_MODE_SIZE (mode) <= 4))
+return 1;
+
+  /* This handles DFmode only if !TARGET_HARD_FLOAT.  */
+  if (mode == DImode || mode == DFmode)
+{
+  if (code == CONST_INT)
+   {
+ HOST_WIDE_INT val = INTVAL (index);
+
+ /* Assume we emit ldrd or 2x ldr if !TARGET_LDRD.
+If vldr is selected it uses arm_coproc_mem_operand.  */
+ if (TARGET_LDRD)
+   return val > -256 && val < 256;
+ else
+   return val > -4096 && val < 4092;
+   }
+
+  return TARGET_LDRD && arm_address_register_rtx_p (index, strict_p);
+}
+
   /* For quad modes, we restrict the constant offset to be slightly less
  than what the instruction format permits.  We do this because for
  quad mode moves, we will actually decompose them into two separate
@@ -8864,7 +8886,7 @@ arm_legitimate_index_p (machine_mode mode, rtx index, 
RTX_CODE outer,
&& (INTVAL (index) & 3) == 0);
 
   /* We have no such constraint on double mode offsets, so we permit the
- full range of the instruction format.  */
+ full range of the instruction format.  Note DImode is included here.  */
   if (TARGET_NEON && VALID_NEON_DREG_MODE (mode))
 return (code == CONST_INT
&& INTVAL (index) < 1024
@@ -8877,27 +8899,6 @@ arm_legitimate_index_p (machine_mode mode, rtx index, 
RTX_CODE outer,
&& INTVAL (index) > -1024
&& (INTVAL (index) & 3) == 0);
 
-  if (arm_address_register_rtx_p (index, strict_p)
-  && (GET_MODE_SIZE (mode) <= 4))
-return 1;
-
-  if (mode == DImode || mode == DFmode)
-{
-  if (code == CONST_INT)
-   {
- HOST_WIDE_INT val = INTVAL (index);
-
- /* Assume we emit ldrd or 2x ldr if !TARGET_LDRD.
-If vldr is selected it uses arm_coproc_mem_operand.  */
- if (TARGET_LDRD)
-   return val > -256 && val < 256;
- else
-   return val > -4096 && val < 4092;
-   }
-
-  return TARGET_LDRD && arm_address_register_rtx_p (index, strict_p);
-}
-
   if (GET_MODE_SIZE (mode) <= 4
   && ! (arm_arch4
&& (mode == HImode
@@ -9000,7 +9001,7 @@ thumb2_legitimate_index_p (machine_mode mode, rtx index, 
int strict_p)
&& (INTVAL (index) & 3) == 0);
 
   /* We have no such constraint on double mode offsets, so we permit the
- full range of the instruction format.  */
+ full range of the instruction format.  Note DImode is included here.  */
   if (TARGET_NEON && VALID_NEON_DREG_MODE (mode))
 return (code == CONST_INT
&& INTVAL (index) < 1024
@@ -9011,6 +9012,7 @@ thumb2_legitimate_index_p (machine_mode mode, rtx index, 
int strict_p)
   && (GET_MODE_SIZE (mode) <= 4))
 return 1;
 
+  /* This handles DImode if !TARGET_NEON, and DFmode if !TARGET_VFP_BASE.  */
   if (mode == DImode || mode == DFmode)
 {
   if (code == CONST_INT)
diff --git a/gcc/testsuite/gcc.target/arm/pr115153.c 
b/gcc/testsuite/gcc.target/arm/pr115153.c
new file mode 100644
index 
..db1cd93b3d31b33a9800dac0d8dbe73e058e4073
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr115153.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -marm" } */
+/* { dg-require-effective-target arm_v8_neon_ok } */
+/* { dg-add-options arm_v8_neon } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+/*
+** f1:
+** add r0, r0, #256
+** ldrdr0, r1, \[r0\]
+** bx  lr
+*/
+long long f1 (long long *p)
+{
+  return __atomic_load_n (p + 32, __ATOMIC_RELAXED);
+}



[PATCH] Arm: Fix disassembly error in Thumb-1 relaxed load/store [PR115188]

2024-06-03 Thread Wilco Dijkstra
A Thumb-1 memory operand allows single-register LDMIA/STMIA. This doesn't get
printed as LDR/STR with writeback in unified syntax, resulting in strange
assembler errors if writeback is selected.  To work around this, use the 'Uw'
constraint that blocks writeback.

Passes bootstrap & regress, OK for commit?

gcc:
PR target/115153
* config/arm/sync.md (arm_atomic_load): Use 'Uw' constraint.
(arm_atomic_store): Likewise.

gcc/testsuite:
PR target/115188
* gcc.target/arm/pr115188.c: Add new test.

---

diff --git a/gcc/config/arm/sync.md b/gcc/config/arm/sync.md
index 
df8dbe170cacb6b60d56a6f19aadd5a6c9c51f7a..e856ee51d9ae7b945c4d1e9d1f08afeedc95707a
 100644
--- a/gcc/config/arm/sync.md
+++ b/gcc/config/arm/sync.md
@@ -65,7 +65,7 @@
 (define_insn "arm_atomic_load"
   [(set (match_operand:QHSI 0 "register_operand" "=r,l")
 (unspec_volatile:QHSI
-  [(match_operand:QHSI 1 "memory_operand" "m,m")]
+  [(match_operand:QHSI 1 "memory_operand" "m,Uw")]
   VUNSPEC_LDR))]
   ""
   "ldr\t%0, %1"
@@ -81,7 +81,7 @@
 )
 
 (define_insn "arm_atomic_store"
-  [(set (match_operand:QHSI 0 "memory_operand" "=m,m")
+  [(set (match_operand:QHSI 0 "memory_operand" "=m,Uw")
 (unspec_volatile:QHSI
   [(match_operand:QHSI 1 "register_operand" "r,l")]
   VUNSPEC_STR))]
diff --git a/gcc/testsuite/gcc.target/arm/pr115188.c 
b/gcc/testsuite/gcc.target/arm/pr115188.c
new file mode 100644
index 
..ef40d7732b77936c845707989465a01ecca5adb0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr115188.c
@@ -0,0 +1,10 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target arm_arch_v6m_ok }
+/* { dg-options "-O2 -mthumb" } */
+/* { dg-add-options arm_arch_v6m } */
+
+void init (int *p, int n)
+{
+  for (int i = 0; i < n; i++)
+__atomic_store_4 (p + i, 0, __ATOMIC_RELAXED);
+}



[PATCH] s390: testsuite: Fix nobp-table-jump-*.c

2024-06-03 Thread Stefan Schulze Frielinghaus
Starting with r14-5628-g53ba8d669550d3 interprocedural VRP became strong
enough in order to render these tests useless.  Fixed by disabling IPA.

gcc/testsuite/ChangeLog:

* gcc.target/s390/nobp-table-jump-inline-z10.c: Do not perform
IPA.
* gcc.target/s390/nobp-table-jump-inline-z900.c: Dito.
* gcc.target/s390/nobp-table-jump-z10.c: Dito.
* gcc.target/s390/nobp-table-jump-z900.c: Dito.
---
 Ok for mainline?

 .../s390/nobp-table-jump-inline-z10.c | 42 +--
 .../s390/nobp-table-jump-inline-z900.c| 42 +--
 .../gcc.target/s390/nobp-table-jump-z10.c | 42 +--
 .../gcc.target/s390/nobp-table-jump-z900.c| 42 +--
 4 files changed, 84 insertions(+), 84 deletions(-)

diff --git a/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z10.c 
b/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z10.c
index 8dfd7e4c786..121751166d0 100644
--- a/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z10.c
+++ b/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z10.c
@@ -4,29 +4,29 @@
 /* case-values-threshold will be set to 20 by the back-end when jump
thunk are requested.  */
 
-int __attribute__((noinline,noclone)) foo1 (void) { return 1; }
-int __attribute__((noinline,noclone)) foo2 (void) { return 2; }
-int __attribute__((noinline,noclone)) foo3 (void) { return 3; }
-int __attribute__((noinline,noclone)) foo4 (void) { return 4; }
-int __attribute__((noinline,noclone)) foo5 (void) { return 5; }
-int __attribute__((noinline,noclone)) foo6 (void) { return 6; }
-int __attribute__((noinline,noclone)) foo7 (void) { return 7; }
-int __attribute__((noinline,noclone)) foo8 (void) { return 8; }
-int __attribute__((noinline,noclone)) foo9 (void) { return 9; }
-int __attribute__((noinline,noclone)) foo10 (void) { return 10; }
-int __attribute__((noinline,noclone)) foo11 (void) { return 11; }
-int __attribute__((noinline,noclone)) foo12 (void) { return 12; }
-int __attribute__((noinline,noclone)) foo13 (void) { return 13; }
-int __attribute__((noinline,noclone)) foo14 (void) { return 14; }
-int __attribute__((noinline,noclone)) foo15 (void) { return 15; }
-int __attribute__((noinline,noclone)) foo16 (void) { return 16; }
-int __attribute__((noinline,noclone)) foo17 (void) { return 17; }
-int __attribute__((noinline,noclone)) foo18 (void) { return 18; }
-int __attribute__((noinline,noclone)) foo19 (void) { return 19; }
-int __attribute__((noinline,noclone)) foo20 (void) { return 20; }
+int __attribute__((noipa)) foo1 (void) { return 1; }
+int __attribute__((noipa)) foo2 (void) { return 2; }
+int __attribute__((noipa)) foo3 (void) { return 3; }
+int __attribute__((noipa)) foo4 (void) { return 4; }
+int __attribute__((noipa)) foo5 (void) { return 5; }
+int __attribute__((noipa)) foo6 (void) { return 6; }
+int __attribute__((noipa)) foo7 (void) { return 7; }
+int __attribute__((noipa)) foo8 (void) { return 8; }
+int __attribute__((noipa)) foo9 (void) { return 9; }
+int __attribute__((noipa)) foo10 (void) { return 10; }
+int __attribute__((noipa)) foo11 (void) { return 11; }
+int __attribute__((noipa)) foo12 (void) { return 12; }
+int __attribute__((noipa)) foo13 (void) { return 13; }
+int __attribute__((noipa)) foo14 (void) { return 14; }
+int __attribute__((noipa)) foo15 (void) { return 15; }
+int __attribute__((noipa)) foo16 (void) { return 16; }
+int __attribute__((noipa)) foo17 (void) { return 17; }
+int __attribute__((noipa)) foo18 (void) { return 18; }
+int __attribute__((noipa)) foo19 (void) { return 19; }
+int __attribute__((noipa)) foo20 (void) { return 20; }
 
 
-int __attribute__((noinline,noclone))
+int __attribute__((noipa))
 bar (int a)
 {
   int ret = 0;
diff --git a/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z900.c 
b/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z900.c
index 46d2c54bcff..5ad0c72afc3 100644
--- a/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z900.c
+++ b/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z900.c
@@ -4,29 +4,29 @@
 /* case-values-threshold will be set to 20 by the back-end when jump
thunk are requested.  */
 
-int __attribute__((noinline,noclone)) foo1 (void) { return 1; }
-int __attribute__((noinline,noclone)) foo2 (void) { return 2; }
-int __attribute__((noinline,noclone)) foo3 (void) { return 3; }
-int __attribute__((noinline,noclone)) foo4 (void) { return 4; }
-int __attribute__((noinline,noclone)) foo5 (void) { return 5; }
-int __attribute__((noinline,noclone)) foo6 (void) { return 6; }
-int __attribute__((noinline,noclone)) foo7 (void) { return 7; }
-int __attribute__((noinline,noclone)) foo8 (void) { return 8; }
-int __attribute__((noinline,noclone)) foo9 (void) { return 9; }
-int __attribute__((noinline,noclone)) foo10 (void) { return 10; }
-int __attribute__((noinline,noclone)) foo11 (void) { return 11; }
-int __attribute__((noinline,noclone)) foo12 (void) { return 12; }
-int __attribute__((noinline,noclone)) foo13 (

Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion

2024-06-03 Thread Ajit Agarwal
Hello Richard:

On 03/06/24 5:03 pm, Richard Sandiford wrote:
> Ajit Agarwal  writes:
>>> [...]
>>> If it is intentional, what distinguishes things like vperm and xxinsertw
>>> (and all other unspecs) from plain addition?
>>>
>>>   [(set (match_operand:VSX_F 0 "vsx_register_operand" "=wa")
>>> (plus:VSX_F (match_operand:VSX_F 1 "vsx_register_operand" "wa")
>>> (match_operand:VSX_F 2 "vsx_register_operand" "wa")))]
>>>
>>
>> Plain addition are not supported currently.
>> We have not seen many cases with plain addition and this patch
>> will not accept plain addition.
>>
>>  
>>> This is why the intention behind the patch is important.  As it stands,
>>> it isn't clear what criteria the patch is using to distinguish "valid"
>>> fuse candidates from "invalid" ones.
>>>
>>
>> Intention behind this patch all variants of UNSPEC instructions are
>> supported and uses without UNSPEC are not supported in this patch.
> 
> But why make the distinction this way though?  UNSPEC is a very
> GCC-specific concept.  Whether something is an UNSPEC or some other
> RTL code depends largely on historical accident.  E.g. we have specific
> codes for VEC_SELECT, VEC_MERGE, and VEC_DUPLICATE, but don't have one
> for VEC_PERM (even for VEC_PERM_EXPR exists in gimple).
> 
> It seems unlikely that GCC's choice about whether to represent something
> as an UNSPEC or as another RTL code lines up neatly with the kind of
> codegen decisions that a good assembly programmer would make.
> 
> I suppose another way of asking is to turn this around and say: what
> kind of uses are you trying to exclude?  Presumably things are worse
> if you remove this function override.  But what makes them worse?
> What kind of uses cause the regression?
> 

Uses of fused load where load with low address uses are modified with load with 
high address uses.

Similarly load with high address uses are modified with load low address
uses.

This is the semantics of lxvp instructions which can occur through
UNSPEC uses otherwise it breaks the functionality and seen failure
in almost all vect regressions and SPEC benchmarks.


 [...]
 +  // Given insn_info pair I1 and I2, return true if offsets are in 
 order.
 +  virtual bool should_handle_unordered_insns (rtl_ssa::insn_info *i1,
 +rtl_ssa::insn_info *i2) = 
 0;
 +
>>>
>>> This name seems a bit misleading.  The function is used in:
>>>
>>> @@ -2401,6 +2405,9 @@ pair_fusion_bb_info::try_fuse_pair (bool load_p, 
>>> unsigned access_size,
>>>reversed = true;
>>>  }
>>>  
>>> +  if (!m_pass->should_handle_unordered_insns (i1, i2))
>>> +return false;
>>> +
>>>rtx cand_mems[2];
>>>rtx reg_ops[2];
>>>rtx pats[2];
>>>
>>> and so it acts as a general opt-out.  The insns aren't known to be 
>>> unordered.
>>>
>>> It looks like the rs6000 override requires the original insns to be
>>> in offset order.  Could you say why that's necessary?  (Both in email
>>> and as a comment in the code.)
>>>
>>
>> Yes rs6000 requires the original load insns to be in offset order.
>> Some regression tests like vect-outer-4f fails if we do load pair
>> fusion with load offsets are not in offset order as this breaks lxvp 
>> semantics.
>
> How does it break the semantics though?  In principle, the generic code
> only fuses if it has "proved" that the loads can happen in either order.
> So it shouldn't matter which order the hardware does things in.
>
> Could you give an example of the kind of situation that you want
> to avoid, and why it generates the wrong result?
>

 (insn 31 62 32 2 (set (reg:V16QI 177 [ MEM  
 [(short unsigned int *)vectp.62_36 + 64B] ])
 (mem:V16QI (plus:DI (reg/f:DI 121 [ vectp.62 ])
 (const_int 64 [0x40])) [1 MEM >>> int> [(short unsigned int *)vectp.62_36 + 64B]+0 S16 A16]))  
 {vsx_movv16qi_64bit}
  (nil))
 (insn 32 31 16 2 (set (reg:V16QI 178 [ MEM  
 [(short unsigned int *)vectp.62_36 + 80B] ])
 (mem:V16QI (plus:DI (reg/f:DI 121 [ vectp.62 ])
 (const_int 80 [0x50])) [1 MEM >>> int> [(short unsigned int *)vectp.62_36 + 80B]+0 S16 A16]))  
 {vsx_movv16qi_64bit}
  (nil))
 (insn 16 32 21 2 (set (reg:V16QI 159 [ MEM  
 [(short unsigned int *)vectp.62_36 + 16B] ])
 (mem:V16QI (plus:DI (reg/f:DI 121 [ vectp.62 ])
 (const_int 16 [0x10])) [1 MEM >>> int> [(short unsigned int *)vectp.62_36 + 16B]+0 S16 A16]))  
 {vsx_movv16qi_64bit}
  (nil))
 (insn 21 16 22 2 (set (reg:V16QI 165 [ MEM  
 [(short unsigned int *)vectp.62_36 + 32B] ])
 (mem:V16QI (plus:DI (reg/f:DI 121 [ vectp.62 ])
 (const_int 32 [0x20])) [1 MEM >>> int> [(short unsigned int *)

[committed] i386: Force operand 1 of bswapsi2 to a register for !TARGET_BSWAP [PR115321]

2024-06-03 Thread Uros Bizjak
PR target/115321

gcc/ChangeLog:

* config/i386/i386.md (bswapsi2): Force operand 1
to a register also for !TARGET_BSWAP.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr115321.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,m32}.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 2c95395b7be..ef83984d00e 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -21193,18 +21193,19 @@ (define_expand "bswapsi2"
(bswap:SI (match_operand:SI 1 "nonimmediate_operand")))]
   ""
 {
-  if (TARGET_MOVBE)
-;
-  else if (TARGET_BSWAP)
-operands[1] = force_reg (SImode, operands[1]);
-  else
+  if (!TARGET_MOVBE)
 {
-  rtx x = gen_reg_rtx (SImode);
+  operands[1] = force_reg (SImode, operands[1]);
 
-  emit_insn (gen_bswaphisi2_lowpart (x, operands[1]));
-  emit_insn (gen_rotlsi3 (x, x, GEN_INT (16)));
-  emit_insn (gen_bswaphisi2_lowpart (operands[0], x));
-  DONE;
+  if (!TARGET_BSWAP)
+   {
+ rtx x = gen_reg_rtx (SImode);
+
+ emit_insn (gen_bswaphisi2_lowpart (x, operands[1]));
+ emit_insn (gen_rotlsi3 (x, x, GEN_INT (16)));
+ emit_insn (gen_bswaphisi2_lowpart (operands[0], x));
+ DONE;
+   }
 }
 })
 
diff --git a/gcc/testsuite/gcc.target/i386/pr115321.c 
b/gcc/testsuite/gcc.target/i386/pr115321.c
new file mode 100644
index 000..0ddab9bd7a5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr115321.c
@@ -0,0 +1,4 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-march=i386" } */
+
+unsigned foo (unsigned x) { return __builtin_bswap32 (x); }


Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion

2024-06-03 Thread Richard Sandiford
Ajit Agarwal  writes:
> On 03/06/24 5:03 pm, Richard Sandiford wrote:
>> Ajit Agarwal  writes:
 [...]
 If it is intentional, what distinguishes things like vperm and xxinsertw
 (and all other unspecs) from plain addition?

   [(set (match_operand:VSX_F 0 "vsx_register_operand" "=wa")
 (plus:VSX_F (match_operand:VSX_F 1 "vsx_register_operand" "wa")
(match_operand:VSX_F 2 "vsx_register_operand" "wa")))]

>>>
>>> Plain addition are not supported currently.
>>> We have not seen many cases with plain addition and this patch
>>> will not accept plain addition.
>>>
>>>  
 This is why the intention behind the patch is important.  As it stands,
 it isn't clear what criteria the patch is using to distinguish "valid"
 fuse candidates from "invalid" ones.

>>>
>>> Intention behind this patch all variants of UNSPEC instructions are
>>> supported and uses without UNSPEC are not supported in this patch.
>> 
>> But why make the distinction this way though?  UNSPEC is a very
>> GCC-specific concept.  Whether something is an UNSPEC or some other
>> RTL code depends largely on historical accident.  E.g. we have specific
>> codes for VEC_SELECT, VEC_MERGE, and VEC_DUPLICATE, but don't have one
>> for VEC_PERM (even for VEC_PERM_EXPR exists in gimple).
>> 
>> It seems unlikely that GCC's choice about whether to represent something
>> as an UNSPEC or as another RTL code lines up neatly with the kind of
>> codegen decisions that a good assembly programmer would make.
>> 
>> I suppose another way of asking is to turn this around and say: what
>> kind of uses are you trying to exclude?  Presumably things are worse
>> if you remove this function override.  But what makes them worse?
>> What kind of uses cause the regression?
>> 
>
> Uses of fused load where load with low address uses are modified with load 
> with high address uses.
>
> Similarly load with high address uses are modified with load low address
> uses.

It sounds like something is going wrong the subreg updates.
Can you give an example of where this occurs?  For instance...

> This is the semantics of lxvp instructions which can occur through
> UNSPEC uses otherwise it breaks the functionality and seen failure
> in almost all vect regressions and SPEC benchmarks.

...could you take one of the simpler vect regressions, show the before
and after RTL, and why the transformation is wrong?

Thanks,
Richard


Re: [PATCH] Arm: Fix disassembly error in Thumb-1 relaxed load/store [PR115188]

2024-06-03 Thread Christophe Lyon

Hi Wilco,


On 6/3/24 15:42, Wilco Dijkstra wrote:

A Thumb-1 memory operand allows single-register LDMIA/STMIA. This doesn't get
printed as LDR/STR with writeback in unified syntax, resulting in strange
assembler errors if writeback is selected.  To work around this, use the 'Uw'
constraint that blocks writeback.

Passes bootstrap & regress, OK for commit?

gcc:
 PR target/115153

I guess this is typo (should be 115188) ?


 * config/arm/sync.md (arm_atomic_load): Use 'Uw' constraint.
 (arm_atomic_store): Likewise.

gcc/testsuite:
 PR target/115188
 * gcc.target/arm/pr115188.c: Add new test.

---

diff --git a/gcc/config/arm/sync.md b/gcc/config/arm/sync.md
index 
df8dbe170cacb6b60d56a6f19aadd5a6c9c51f7a..e856ee51d9ae7b945c4d1e9d1f08afeedc95707a
 100644
--- a/gcc/config/arm/sync.md
+++ b/gcc/config/arm/sync.md
@@ -65,7 +65,7 @@
  (define_insn "arm_atomic_load"
[(set (match_operand:QHSI 0 "register_operand" "=r,l")
  (unspec_volatile:QHSI
-  [(match_operand:QHSI 1 "memory_operand" "m,m")]
+  [(match_operand:QHSI 1 "memory_operand" "m,Uw")]
VUNSPEC_LDR))]
""
"ldr\t%0, %1"
@@ -81,7 +81,7 @@
  )
  
  (define_insn "arm_atomic_store"

-  [(set (match_operand:QHSI 0 "memory_operand" "=m,m")
+  [(set (match_operand:QHSI 0 "memory_operand" "=m,Uw")
  (unspec_volatile:QHSI
[(match_operand:QHSI 1 "register_operand" "r,l")]
VUNSPEC_STR))]
diff --git a/gcc/testsuite/gcc.target/arm/pr115188.c 
b/gcc/testsuite/gcc.target/arm/pr115188.c
new file mode 100644
index 
..ef40d7732b77936c845707989465a01ecca5adb0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr115188.c
@@ -0,0 +1,10 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target arm_arch_v6m_ok }
+/* { dg-options "-O2 -mthumb" } */-mthumb is included in arm_arch_v6m, so I think you don't need to add it 

here?

Thanks,

Christophe


+/* { dg-add-options arm_arch_v6m } */
+
+void init (int *p, int n)
+{
+  for (int i = 0; i < n; i++)
+__atomic_store_4 (p + i, 0, __ATOMIC_RELAXED);
+}



[r15-983 Regression] FAIL: gcc.target/i386/avx10_1-25.c (test for excess errors) on Linux/x86_64

2024-06-03 Thread haochen.jiang
On Linux/x86_64,

1f2ca510065a2033bac408eb5a960ef0126f25cc is the first bad commit
commit 1f2ca510065a2033bac408eb5a960ef0126f25cc
Author: Haochen Jiang 
Date:   Mon May 20 15:52:32 2024 +0800

Add AVX10.1 target_clones support

caused

FAIL: gcc.target/i386/avx10_1-25.c (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-983/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx10_1-25.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx10_1-25.c --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com.)
(If you met problems with cascadelake related, disabling AVX512F in command 
line might save that.)
(However, please make sure that there is no potential problems with AVX512.)


Re: [RFC][PATCH] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading

2024-06-03 Thread Qing Zhao


> On Jun 3, 2024, at 02:29, Richard Biener  wrote:
> 
> On Fri, May 31, 2024 at 11:23 PM Qing Zhao  wrote:
>> 
>> 
>> 
>>> On May 23, 2024, at 07:46, Richard Biener  
>>> wrote:
>>> 
>>> On Wed, May 22, 2024 at 8:53 PM Qing Zhao  wrote:
 
 
 
> On May 22, 2024, at 03:38, Richard Biener  
> wrote:
> 
> On Tue, May 21, 2024 at 11:36 PM David Malcolm  
> wrote:
>> 
>> On Tue, 2024-05-21 at 15:13 +, Qing Zhao wrote:
>>> Thanks for the comments and suggestions.
>>> 
 On May 15, 2024, at 10:00, David Malcolm 
 wrote:
 
 On Tue, 2024-05-14 at 15:08 +0200, Richard Biener wrote:
> On Mon, 13 May 2024, Qing Zhao wrote:
> 
>> -Warray-bounds is an important option to enable linux kernal to
>> keep
>> the array out-of-bound errors out of the source tree.
>> 
>> However, due to the false positive warnings reported in
>> PR109071
>> (-Warray-bounds false positive warnings due to code duplication
>> from
>> jump threading), -Warray-bounds=1 cannot be added on by
>> default.
>> 
>> Although it's impossible to elinimate all the false positive
>> warnings
>> from -Warray-bounds=1 (See PR104355 Misleading -Warray-bounds
>> documentation says "always out of bounds"), we should minimize
>> the
>> false positive warnings in -Warray-bounds=1.
>> 
>> The root reason for the false positive warnings reported in
>> PR109071 is:
>> 
>> When the thread jump optimization tries to reduce the # of
>> branches
>> inside the routine, sometimes it needs to duplicate the code
>> and
>> split into two conditional pathes. for example:
>> 
>> The original code:
>> 
>> void sparx5_set (int * ptr, struct nums * sg, int index)
>> {
>> if (index >= 4)
>>  warn ();
>> *ptr = 0;
>> *val = sg->vals[index];
>> if (index >= 4)
>>  warn ();
>> *ptr = *val;
>> 
>> return;
>> }
>> 
>> With the thread jump, the above becomes:
>> 
>> void sparx5_set (int * ptr, struct nums * sg, int index)
>> {
>> if (index >= 4)
>>  {
>>warn ();
>>*ptr = 0; // Code duplications since "warn" does
>> return;
>>*val = sg->vals[index];   // same this line.
>>  // In this path, since it's
>> under
>> the condition
>>  // "index >= 4", the compiler
>> knows
>> the value
>>  // of "index" is larger then 4,
>> therefore the
>>  // out-of-bound warning.
>>warn ();
>>  }
>> else
>>  {
>>*ptr = 0;
>>*val = sg->vals[index];
>>  }
>> *ptr = *val;
>> return;
>> }
>> 
>> We can see, after the thread jump optimization, the # of
>> branches
>> inside
>> the routine "sparx5_set" is reduced from 2 to 1, however,  due
>> to
>> the
>> code duplication (which is needed for the correctness of the
>> code),
>> we
>> got a false positive out-of-bound warning.
>> 
>> In order to eliminate such false positive out-of-bound warning,
>> 
>> A. Add one more flag for GIMPLE: is_splitted.
>> B. During the thread jump optimization, when the basic blocks
>> are
>> duplicated, mark all the STMTs inside the original and
>> duplicated
>> basic blocks as "is_splitted";
>> C. Inside the array bound checker, add the following new
>> heuristic:
>> 
>> If
>> 1. the stmt is duplicated and splitted into two conditional
>> paths;
>> +  2. the warning level < 2;
>> +  3. the current block is not dominating the exit block
>> Then not report the warning.
>> 
>> The false positive warnings are moved from -Warray-bounds=1 to
>> -Warray-bounds=2 now.
>> 
>> Bootstrapped and regression tested on both x86 and aarch64.
>> adjusted
>> -Warray-bounds-61.c due to the false positive warnings.
>> 
>> Let me know if you have any comments and suggestions.
> 
> At the last Cauldron I talked with David Malcolm about these kind
> of
> issues and thought of instead of suppressing diagnostics to
> record
> how a block was duplicated.  For jump threading my idea was to
> record
> the condition that was proved true when entering the path and do
> this
> by recording t

Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion

2024-06-03 Thread Ajit Agarwal
Hello Richard:

On 03/06/24 7:47 pm, Richard Sandiford wrote:
> Ajit Agarwal  writes:
>> On 03/06/24 5:03 pm, Richard Sandiford wrote:
>>> Ajit Agarwal  writes:
> [...]
> If it is intentional, what distinguishes things like vperm and xxinsertw
> (and all other unspecs) from plain addition?
>
>   [(set (match_operand:VSX_F 0 "vsx_register_operand" "=wa")
> (plus:VSX_F (match_operand:VSX_F 1 "vsx_register_operand" "wa")
>   (match_operand:VSX_F 2 "vsx_register_operand" "wa")))]
>

 Plain addition are not supported currently.
 We have not seen many cases with plain addition and this patch
 will not accept plain addition.

  
> This is why the intention behind the patch is important.  As it stands,
> it isn't clear what criteria the patch is using to distinguish "valid"
> fuse candidates from "invalid" ones.
>

 Intention behind this patch all variants of UNSPEC instructions are
 supported and uses without UNSPEC are not supported in this patch.
>>>
>>> But why make the distinction this way though?  UNSPEC is a very
>>> GCC-specific concept.  Whether something is an UNSPEC or some other
>>> RTL code depends largely on historical accident.  E.g. we have specific
>>> codes for VEC_SELECT, VEC_MERGE, and VEC_DUPLICATE, but don't have one
>>> for VEC_PERM (even for VEC_PERM_EXPR exists in gimple).
>>>
>>> It seems unlikely that GCC's choice about whether to represent something
>>> as an UNSPEC or as another RTL code lines up neatly with the kind of
>>> codegen decisions that a good assembly programmer would make.
>>>
>>> I suppose another way of asking is to turn this around and say: what
>>> kind of uses are you trying to exclude?  Presumably things are worse
>>> if you remove this function override.  But what makes them worse?
>>> What kind of uses cause the regression?
>>>
>>
>> Uses of fused load where load with low address uses are modified with load 
>> with high address uses.
>>
>> Similarly load with high address uses are modified with load low address
>> uses.
> 
> It sounds like something is going wrong the subreg updates.
> Can you give an example of where this occurs?  For instance...
> 
>> This is the semantics of lxvp instructions which can occur through
>> UNSPEC uses otherwise it breaks the functionality and seen failure
>> in almost all vect regressions and SPEC benchmarks.
> 
> ...could you take one of the simpler vect regressions, show the before
> and after RTL, and why the transformation is wrong?
>

Before the change:

(insn 32 30 103 5 (set (reg:V16QI 127 [ _32 ])
(mem:V16QI (reg:DI 130 [ ivtmp.37 ]) [1 MEM  [(short unsigned int *)_55]+0 S16 A128])) {vsx_movv16qi_64bit}
 (nil))
(insn 103 32 135 5 (set (reg:V16QI 173 [ _32 ])
(mem:V16QI (plus:DI (reg:DI 130 [ ivtmp.37 ])
(const_int 16 [0x10])) [1 MEM  
[(short unsigned int *)_55]+0 S16 A128])) {vsx_movv16qi_64bit}
 (nil))
(insn 135 103 34 5 (set (reg:DI 155)
(plus:DI (reg:DI 130 [ ivtmp.37 ])
(const_int 16 [0x10]))) 66 {*adddi3}
 (nil))
(insn 34 135 104 5 (set (reg:V16QI 143 [ _27 ])
(unspec:V16QI [
(reg:V16QI 127 [ _32 ]) repeated x2
(reg:V16QI 152)
] UNSPEC_VPERM))  {altivec_vperm_v16qi_direct}
 (expr_list:REG_DEAD (reg:V16QI 127 [ _32 ])
(nil)))
(insn 104 34 35 5 (set (reg:V16QI 174 [ _27 ])
(unspec:V16QI [
(reg:V16QI 173 [ _32 ]) repeated x2
(reg:V16QI 152)
] UNSPEC_VPERM)) 
 {altivec_vperm_v16qi_direct}


After the change:

(insn 103 30 135 5 (set (reg:OO 127 [ _32 ])
(mem:OO (reg:DI 130 [ ivtmp.37 ]) [1 MEM  
[(short unsigned int *)_55]+0 S16 A128])) {*movoo}
 (nil))
(insn 135 103 34 5 (set (reg:DI 155)
(plus:DI (reg:DI 130 [ ivtmp.37 ])
(const_int 16 [0x10]))) 66 {*adddi3}
 (nil))
(insn 34 135 104 5 (set (reg:V16QI 143 [ _27 ])
(unspec:V16QI [
(subreg:V16QI (reg:OO 127 [ _32 ]) 16)
(subreg:V16QI (reg:OO 127 [ _32 ]) 16)
(reg:V16QI 152)
] UNSPEC_VPERM)) {altivec_vperm_v16qi_direct}
 (expr_list:REG_DEAD (reg:OO 127 [ _32 ])
(nil)))
(insn 104 34 35 5 (set (reg:V16QI 174 [ _27 ])
(unspec:V16QI [
(subreg:V16QI (reg:OO 127 [ _32 ]) 0)
(subreg:V16QI (reg:OO 127 [ _32 ]) 0)
(reg:V16QI 152)
] UNSPEC_VPERM))  {altivec_vperm_v16qi_direct}

After the change the tests passes.
 
> Thanks,
> Richard

Thanks & Regards
Ajit

Thanks & Regards
Ajit


Re: [PATCH v2 1/2] Factor out static_assert constexpr string extraction for reuse

2024-06-03 Thread Jason Merrill

On 6/2/24 23:45, Andi Kleen wrote:

No intentional semantics change.

gcc/cp/ChangeLog:

* cp-tree.h (struct cstr): Add structure.
(get_cstr): Declare.
(extract_cstr): Declare.
(free_cstr): Declare.
* semantics.cc (finish_static_assert): Factor out constant
string expression extraction code and move to...
(get_cstr): Here.
(extract_cstr): Dito.
(free_cstr): Dito.
---
  gcc/cp/cp-tree.h|  17 +++
  gcc/cp/semantics.cc | 292 +---
  2 files changed, 184 insertions(+), 125 deletions(-)

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 565e4a9290e2..25b8033db788 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -9015,6 +9015,23 @@ struct push_access_scope_guard
}
  };
  
+/* Extracting strings from constexpr */

+
+/* Temporary data for extracting constant string.  */
+struct cstr
+{
+  tree message;
+  tree message_data;
+  tree message_sz;
+  char *buf;
+};
+
+bool get_cstr (tree message, location_t location, const char *what, const char 
*what2,
+  cstr &cstr);
+bool extract_cstr (const char *what, const char *what2, location_t location,
+  cstr &cstr, const char * & msg, int &len);
+void free_cstr (cstr &cstr);


get_ and extract_ are confusingly similar names.  Let's make cstr more 
of a class, initialized at least from 'message', and perhaps from the 
other context information.


Then get_ can be something like cstr::type_check.  And then free_cstr is 
the destructor, and the copy constructor is deleted.


And please don't use the same name for the class and parameter/variables 
of that type.



+ error_at (location, "%qs %s must be a string "


From ABOUT-GCC-NLS: "Avoid using %s to compose a diagnostic message 
from multiple translatable strings; instead, write out the full 
diagnostic message for each variant. Only use %s for message components 
that do not need translation, such as keywords."


Also, if you are passing an English string for a diagnostic to a 
function that isn't from diagnostic.h, you need to use the intl.h macros 
to mark it for translation.


Jason



Re: [PATCH 02/52] d: Replace use of LONG_DOUBLE_TYPE_SIZE

2024-06-03 Thread Iain Buclaw
Excerpts from Kewen.Lin's message of Juni 3, 2024 10:57 am:
> Hi Iain,
> 
> on 2024/6/3 16:40, Iain Buclaw wrote:
>> Excerpts from Kewen Lin's message of Juni 3, 2024 5:00 am:
>>> Joseph pointed out "floating types should have their mode,
>>> not a poorly defined precision value" in the discussion[1],
>>> as he and Richi suggested, the existing macros
>>> {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
>>> hook mode_for_floating_type.  To be prepared for that, this
>>> patch is to replace use of LONG_DOUBLE_TYPE_SIZE in d with
>>> TYPE_PRECISION of long_double_type_node.
>>>
>>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html
>>>
>> 
>> Thanks, one question though: Is TYPE_PRECISION really equivalent to
>> LONG_DOUBLE_TYPE_SIZE?
> 
> Yes, it's guaranteed by the code in build_common_tree_nodes:
> 
>   long_double_type_node = make_node (REAL_TYPE);
>   TYPE_PRECISION (long_double_type_node) = LONG_DOUBLE_TYPE_SIZE;
>   layout_type (long_double_type_node);
> 
> , the macro LONG_DOUBLE_TYPE_SIZE is assigned to TYPE_PRECISION of
> long_double_type_node, layout_type will only pick up one mode as
> the given precision and won't change it.
> 
>> 
>> Unless LONG_DOUBLE_TYPE_SIZE was poorly named to begin with, I'd assume
>> the answer to be "no".
> 
> I'm afraid it's poorly named before.
> 

Thanks for confirming Kewen.

I suspect then that this code is incorrectly using this macro, and it
should instead be using:

int_size_in_bytes(long_double_type_node)

as any padding should be considered as part of the overall type size for
the purpose that this field serves in the D part of the front-end.

Are you able to update the patch this way instead? Otherwise I'm happy
to push the change instead.

Thanks again,
Iain.


Re: [PATCH v7 4/9] C++: Support clang compatible [[musttail]] (PR83324)

2024-06-03 Thread Jason Merrill

On 6/2/24 13:16, Andi Kleen wrote:

This patch implements a clang compatible [[musttail]] attribute for
returns.

musttail is useful as an alternative to computed goto for interpreters.
With computed goto the interpreter function usually ends up very big
which causes problems with register allocation and other per function
optimizations not scaling. With musttail the interpreter can be instead
written as a sequence of smaller functions that call each other. To
avoid unbounded stack growth this requires forcing a sibling call, which
this attribute does. It guarantees an error if the call cannot be tail
called which allows the programmer to fix it instead of risking a stack
overflow. Unlike computed goto it is also type-safe.

It turns out that David Malcolm had already implemented middle/backend
support for a musttail attribute back in 2016, but it wasn't exposed
to any frontend other than a special plugin.

This patch adds a [[gnu::musttail]] attribute for C++ that can be added
to return statements. The return statement must be a direct call
(it does not follow dependencies), which is similar to what clang
implements. It then uses the existing must tail infrastructure.

For compatibility it also detects clang::musttail

One problem is that tree-tailcall usually fails when optimization
is disabled, which implies the attribute only really works with
optimization on. But that seems to be a reasonable limitation.

Passes bootstrap and full test

PR83324

gcc/cp/ChangeLog:

* parser.cc (cp_parser_statement): Handle musttail.
(cp_parser_jump_statement): Dito.
(cp_parser_std_attribute): Dito.
* pt.cc (tsubst_expr): Copy CALL_EXPR_MUST_TAIL_CALL.
---
  gcc/cp/parser.cc | 42 +-
  gcc/cp/pt.cc |  4 +++-
  2 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 779625144db4..c2cb304bac7d 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -2467,7 +2467,7 @@ static tree cp_parser_perform_range_for_lookup
  static tree cp_parser_range_for_member_function
(tree, tree);
  static tree cp_parser_jump_statement
-  (cp_parser *);
+  (cp_parser *, tree &);
  static void cp_parser_declaration_statement
(cp_parser *);
  
@@ -12747,13 +12747,17 @@ cp_parser_statement (cp_parser* parser, tree in_statement_expr,

 NULL_TREE, false);
  break;
  
+	case RID_RETURN:

+ std_attrs = process_stmt_hotness_attribute (std_attrs, attrs_loc);
+ statement = cp_parser_jump_statement (parser, std_attrs);
+ break;
+
case RID_BREAK:
case RID_CONTINUE:
-   case RID_RETURN:
case RID_CO_RETURN:
case RID_GOTO:
  std_attrs = process_stmt_hotness_attribute (std_attrs, attrs_loc);
- statement = cp_parser_jump_statement (parser);
+ statement = cp_parser_jump_statement (parser, std_attrs);
  break;
  
  	  /* Objective-C++ exception-handling constructs.  */

@@ -14813,10 +14817,11 @@ cp_parser_init_statement (cp_parser *parser, tree 
*decl)
 jump-statement:
   goto * expression ;
  
+   STD_ATTRS are the statement attributes. They can be modified.

 Returns the new BREAK_STMT, CONTINUE_STMT, RETURN_EXPR, or GOTO_EXPR.  */
  
  static tree

-cp_parser_jump_statement (cp_parser* parser)
+cp_parser_jump_statement (cp_parser* parser, tree &std_attrs)
  {
tree statement = error_mark_node;
cp_token *token;
@@ -14893,6 +14898,33 @@ cp_parser_jump_statement (cp_parser* parser)
  /* If the next token is a `;', then there is no
 expression.  */
  expr = NULL_TREE;
+
+   if (keyword == RID_RETURN && expr)
+ {
+   bool musttail_p = false;
+   if (lookup_attribute ("gnu", "musttail", std_attrs))
+ {
+   musttail_p = true;
+   std_attrs = remove_attribute ("gnu", "musttail", std_attrs);
+ }
+   // support this for compatibility


This should follow the usual comment conventions, i.e.

/* Support this for compatibility.  */


+   if (lookup_attribute ("clang", "musttail", std_attrs))
+ {
+   musttail_p = true;
+   std_attrs = remove_attribute ("clang", "musttail", std_attrs);
+ }
+   if (musttail_p)
+ {
+   tree t = expr;
+   if (t && TREE_CODE (t) == TARGET_EXPR)
+ t = TARGET_EXPR_INITIAL (t);
+   if (t && TREE_CODE (t) != CALL_EXPR)
+ error_at (token->location, "cannot tail-call: return value must be 
a call");
+   else
+ CALL_EXPR_MUST_TAIL_CALL (t) = 1;
+ }
+ }
+
/* Build the return-statement, check co-return first, since type
   deduction is not valid there.  */
if (keyword == RID_CO_RETURN)
@@ -30316,7 +30348,7 @@ cp_parser_std_a

Re: [PATCH v7 6/9] Add tests for C/C++ musttail attributes

2024-06-03 Thread Jason Merrill

On 6/2/24 13:16, Andi Kleen wrote:

Mostly adopted from the existing C musttail plugin tests.

gcc/testsuite/ChangeLog:

* c-c++-common/musttail1.c: New test.
* c-c++-common/musttail2.c: New test.
* c-c++-common/musttail3.c: New test.
* c-c++-common/musttail4.c: New test.
* c-c++-common/musttail7.c: New test.
* c-c++-common/musttail8.c: New test.
* g++.dg/musttail6.C: New test.
* g++.dg/musttail9.C: New test.
* g++.dg/musttail10.C: New test.
---
  gcc/testsuite/c-c++-common/musttail1.c | 14 +++
  gcc/testsuite/c-c++-common/musttail2.c | 33 +++
  gcc/testsuite/c-c++-common/musttail3.c | 29 +
  gcc/testsuite/c-c++-common/musttail4.c | 17 
  gcc/testsuite/c-c++-common/musttail5.c | 28 +
  gcc/testsuite/c-c++-common/musttail7.c | 14 +++
  gcc/testsuite/c-c++-common/musttail8.c | 17 
  gcc/testsuite/g++.dg/musttail10.C  | 20 +
  gcc/testsuite/g++.dg/musttail6.C   | 58 ++
  gcc/testsuite/g++.dg/musttail9.C   | 10 +
  10 files changed, 240 insertions(+)
  create mode 100644 gcc/testsuite/c-c++-common/musttail1.c
  create mode 100644 gcc/testsuite/c-c++-common/musttail2.c
  create mode 100644 gcc/testsuite/c-c++-common/musttail3.c
  create mode 100644 gcc/testsuite/c-c++-common/musttail4.c
  create mode 100644 gcc/testsuite/c-c++-common/musttail5.c
  create mode 100644 gcc/testsuite/c-c++-common/musttail7.c
  create mode 100644 gcc/testsuite/c-c++-common/musttail8.c
  create mode 100644 gcc/testsuite/g++.dg/musttail10.C
  create mode 100644 gcc/testsuite/g++.dg/musttail6.C
  create mode 100644 gcc/testsuite/g++.dg/musttail9.C

diff --git a/gcc/testsuite/c-c++-common/musttail1.c 
b/gcc/testsuite/c-c++-common/musttail1.c
new file mode 100644
index ..74efcc2a0bc6
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/musttail1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
+/* { dg-additional-options "-fdelayed-branch" { target sparc*-*-* } } */
+
+int __attribute__((noinline,noclone,noipa))
+callee (int i)
+{
+  return i * i;
+}
+
+int __attribute__((noinline,noclone,noipa))
+caller (int i)
+{
+  [[gnu::musttail]] return callee (i + 1);
+}
diff --git a/gcc/testsuite/c-c++-common/musttail2.c 
b/gcc/testsuite/c-c++-common/musttail2.c
new file mode 100644
index ..86f2c3d77404
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/musttail2.c
@@ -0,0 +1,33 @@
+/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
+
+struct box { char field[256]; int i; };
+
+int __attribute__((noinline,noclone,noipa))
+test_2_callee (int i, struct box b)
+{
+  if (b.field[0])
+return 5;
+  return i * i;
+}
+
+int __attribute__((noinline,noclone,noipa))
+test_2_caller (int i)
+{
+  struct box b;
+  [[gnu::musttail]] return test_2_callee (i + 1, b); /* { dg-error "cannot 
tail-call: " } */
+}
+
+extern void setjmp (void);
+void
+test_3 (void)
+{
+  [[gnu::musttail]] return setjmp (); /* { dg-error "cannot tail-call: " } */
+}
+
+extern float f7(void);
+
+int
+test_6 (void)
+{
+  [[gnu::musttail]] return f7(); /* { dg-error "cannot tail-call: " } */
+}
diff --git a/gcc/testsuite/c-c++-common/musttail3.c 
b/gcc/testsuite/c-c++-common/musttail3.c
new file mode 100644
index ..ea9589c59ef2
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/musttail3.c
@@ -0,0 +1,29 @@
+/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
+
+extern int foo2 (int x, ...);
+
+struct str
+{
+  int a, b;
+};
+
+struct str
+cstruct (int x)
+{
+  if (x < 10)
+[[clang::musttail]] return cstruct (x + 1);
+  return ((struct str){ x, 0 });
+}
+
+int
+foo (int x)
+{
+  if (x < 10)
+[[clang::musttail]] return foo2 (x, 29);
+  if (x < 100)
+{
+  int k = foo (x + 1);
+  [[clang::musttail]] return k;/* { dg-error "cannot tail-call: " } */
+}
+  return x;
+}
diff --git a/gcc/testsuite/c-c++-common/musttail4.c 
b/gcc/testsuite/c-c++-common/musttail4.c
new file mode 100644
index ..23f4b5e1cd68
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/musttail4.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
+
+struct box { char field[64]; int i; };
+
+struct box __attribute__((noinline,noclone,noipa))
+returns_struct (int i)
+{
+  struct box b;
+  b.i = i * i;
+  return b;
+}
+
+int __attribute__((noinline,noclone))
+test_1 (int i)
+{
+  [[gnu::musttail]] return returns_struct (i * 5).i; /* { dg-error "cannot 
tail-call: " } */
+}
diff --git a/gcc/testsuite/c-c++-common/musttail5.c 
b/gcc/testsuite/c-c++-common/musttail5.c
new file mode 100644
index ..234da0d3f2a9
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/musttail5.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-std=c23" { target c } } */
+/* { dg-options "-std=gnu++11" { target c++ } } */
+
+[[musttail]] int j; /* { dg-warning "attribute" } */
+__attribute__((musttai

Re: [RFC][PATCH] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading

2024-06-03 Thread David Malcolm
On Mon, 2024-06-03 at 08:29 +0200, Richard Biener wrote:
> On Fri, May 31, 2024 at 11:23 PM Qing Zhao 
> wrote:
> > 
> > 
> > 
> > > On May 23, 2024, at 07:46, Richard Biener
> > >  wrote:
> > > 
> > > On Wed, May 22, 2024 at 8:53 PM Qing Zhao 
> > > wrote:
> > > > 
> > > > 
> > > > 
> > > > > On May 22, 2024, at 03:38, Richard Biener
> > > > >  wrote:
> > > > > 
> > > > > On Tue, May 21, 2024 at 11:36 PM David Malcolm
> > > > >  wrote:
> > > > > > 
> > > > > > On Tue, 2024-05-21 at 15:13 +, Qing Zhao wrote:
> > > > > > > Thanks for the comments and suggestions.
> > > > > > > 
> > > > > > > > On May 15, 2024, at 10:00, David Malcolm
> > > > > > > > 
> > > > > > > > wrote:
> > > > > > > > 
> > > > > > > > On Tue, 2024-05-14 at 15:08 +0200, Richard Biener
> > > > > > > > wrote:
> > > > > > > > > On Mon, 13 May 2024, Qing Zhao wrote:
> > > > > > > > > 
> > > > > > > > > > -Warray-bounds is an important option to enable
> > > > > > > > > > linux kernal to
> > > > > > > > > > keep
> > > > > > > > > > the array out-of-bound errors out of the source
> > > > > > > > > > tree.
> > > > > > > > > > 
> > > > > > > > > > However, due to the false positive warnings
> > > > > > > > > > reported in
> > > > > > > > > > PR109071
> > > > > > > > > > (-Warray-bounds false positive warnings due to code
> > > > > > > > > > duplication
> > > > > > > > > > from
> > > > > > > > > > jump threading), -Warray-bounds=1 cannot be added
> > > > > > > > > > on by
> > > > > > > > > > default.
> > > > > > > > > > 
> > > > > > > > > > Although it's impossible to elinimate all the false
> > > > > > > > > > positive
> > > > > > > > > > warnings
> > > > > > > > > > from -Warray-bounds=1 (See PR104355 Misleading -
> > > > > > > > > > Warray-bounds
> > > > > > > > > > documentation says "always out of bounds"), we
> > > > > > > > > > should minimize
> > > > > > > > > > the
> > > > > > > > > > false positive warnings in -Warray-bounds=1.
> > > > > > > > > > 
> > > > > > > > > > The root reason for the false positive warnings
> > > > > > > > > > reported in
> > > > > > > > > > PR109071 is:
> > > > > > > > > > 
> > > > > > > > > > When the thread jump optimization tries to reduce
> > > > > > > > > > the # of
> > > > > > > > > > branches
> > > > > > > > > > inside the routine, sometimes it needs to duplicate
> > > > > > > > > > the code
> > > > > > > > > > and
> > > > > > > > > > split into two conditional pathes. for example:
> > > > > > > > > > 
> > > > > > > > > > The original code:
> > > > > > > > > > 
> > > > > > > > > > void sparx5_set (int * ptr, struct nums * sg, int
> > > > > > > > > > index)
> > > > > > > > > > {
> > > > > > > > > > if (index >= 4)
> > > > > > > > > >   warn ();
> > > > > > > > > > *ptr = 0;
> > > > > > > > > > *val = sg->vals[index];
> > > > > > > > > > if (index >= 4)
> > > > > > > > > >   warn ();
> > > > > > > > > > *ptr = *val;
> > > > > > > > > > 
> > > > > > > > > > return;
> > > > > > > > > > }
> > > > > > > > > > 
> > > > > > > > > > With the thread jump, the above becomes:
> > > > > > > > > > 
> > > > > > > > > > void sparx5_set (int * ptr, struct nums * sg, int
> > > > > > > > > > index)
> > > > > > > > > > {
> > > > > > > > > > if (index >= 4)
> > > > > > > > > >   {
> > > > > > > > > >     warn ();
> > > > > > > > > >     *ptr = 0; // Code duplications since
> > > > > > > > > > "warn" does
> > > > > > > > > > return;
> > > > > > > > > >     *val = sg->vals[index];   // same this line.
> > > > > > > > > >   // In this path,
> > > > > > > > > > since it's
> > > > > > > > > > under
> > > > > > > > > > the condition
> > > > > > > > > >   // "index >= 4", the
> > > > > > > > > > compiler
> > > > > > > > > > knows
> > > > > > > > > > the value
> > > > > > > > > >   // of "index" is
> > > > > > > > > > larger then 4,
> > > > > > > > > > therefore the
> > > > > > > > > >   // out-of-bound
> > > > > > > > > > warning.
> > > > > > > > > >     warn ();
> > > > > > > > > >   }
> > > > > > > > > > else
> > > > > > > > > >   {
> > > > > > > > > >     *ptr = 0;
> > > > > > > > > >     *val = sg->vals[index];
> > > > > > > > > >   }
> > > > > > > > > > *ptr = *val;
> > > > > > > > > > return;
> > > > > > > > > > }
> > > > > > > > > > 
> > > > > > > > > > We can see, after the thread jump optimization, the
> > > > > > > > > > # of
> > > > > > > > > > branches
> > > > > > > > > > inside
> > > > > > > > > > the routine "sparx5_set" is reduced from 2 to 1,
> > > > > > > > > > however,  due
> > > > > > > > > > to
> > > > > > > > > > the
> > > > > > > > > > code duplication (which is needed for the
> > > > > > > > > > correctness of the
> > > > > > > > > > code),
> > > > > > > > > > we
> > > > > > > > > > got a false positive out-of-bound warning.
> > > > > > > > > > 
> > > > > > > > > > In order to eliminate such false positive out-of-
> > > > > > > > > > bound warning,
> > > > > > > > > > 
> > > > > > > > > 

Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion

2024-06-03 Thread Richard Sandiford
Ajit Agarwal  writes:
> Hello Richard:
>
> On 03/06/24 7:47 pm, Richard Sandiford wrote:
>> Ajit Agarwal  writes:
>>> On 03/06/24 5:03 pm, Richard Sandiford wrote:
 Ajit Agarwal  writes:
>> [...]
>> If it is intentional, what distinguishes things like vperm and xxinsertw
>> (and all other unspecs) from plain addition?
>>
>>   [(set (match_operand:VSX_F 0 "vsx_register_operand" "=wa")
>> (plus:VSX_F (match_operand:VSX_F 1 "vsx_register_operand" "wa")
>>  (match_operand:VSX_F 2 "vsx_register_operand" "wa")))]
>>
>
> Plain addition are not supported currently.
> We have not seen many cases with plain addition and this patch
> will not accept plain addition.
>
>  
>> This is why the intention behind the patch is important.  As it stands,
>> it isn't clear what criteria the patch is using to distinguish "valid"
>> fuse candidates from "invalid" ones.
>>
>
> Intention behind this patch all variants of UNSPEC instructions are
> supported and uses without UNSPEC are not supported in this patch.

 But why make the distinction this way though?  UNSPEC is a very
 GCC-specific concept.  Whether something is an UNSPEC or some other
 RTL code depends largely on historical accident.  E.g. we have specific
 codes for VEC_SELECT, VEC_MERGE, and VEC_DUPLICATE, but don't have one
 for VEC_PERM (even for VEC_PERM_EXPR exists in gimple).

 It seems unlikely that GCC's choice about whether to represent something
 as an UNSPEC or as another RTL code lines up neatly with the kind of
 codegen decisions that a good assembly programmer would make.

 I suppose another way of asking is to turn this around and say: what
 kind of uses are you trying to exclude?  Presumably things are worse
 if you remove this function override.  But what makes them worse?
 What kind of uses cause the regression?

>>>
>>> Uses of fused load where load with low address uses are modified with load 
>>> with high address uses.
>>>
>>> Similarly load with high address uses are modified with load low address
>>> uses.
>> 
>> It sounds like something is going wrong the subreg updates.
>> Can you give an example of where this occurs?  For instance...
>> 
>>> This is the semantics of lxvp instructions which can occur through
>>> UNSPEC uses otherwise it breaks the functionality and seen failure
>>> in almost all vect regressions and SPEC benchmarks.
>> 
>> ...could you take one of the simpler vect regressions, show the before
>> and after RTL, and why the transformation is wrong?
>
> Before the change:
>
> (insn 32 30 103 5 (set (reg:V16QI 127 [ _32 ])
> (mem:V16QI (reg:DI 130 [ ivtmp.37 ]) [1 MEM  int> [(short unsigned int *)_55]+0 S16 A128])) {vsx_movv16qi_64bit}
>  (nil))
> (insn 103 32 135 5 (set (reg:V16QI 173 [ _32 ])
> (mem:V16QI (plus:DI (reg:DI 130 [ ivtmp.37 ])
> (const_int 16 [0x10])) [1 MEM  
> [(short unsigned int *)_55]+0 S16 A128])) {vsx_movv16qi_64bit}
>  (nil))
> (insn 135 103 34 5 (set (reg:DI 155)
> (plus:DI (reg:DI 130 [ ivtmp.37 ])
> (const_int 16 [0x10]))) 66 {*adddi3}
>  (nil))
> (insn 34 135 104 5 (set (reg:V16QI 143 [ _27 ])
> (unspec:V16QI [
> (reg:V16QI 127 [ _32 ]) repeated x2
> (reg:V16QI 152)
> ] UNSPEC_VPERM))  {altivec_vperm_v16qi_direct}
>  (expr_list:REG_DEAD (reg:V16QI 127 [ _32 ])
> (nil)))
> (insn 104 34 35 5 (set (reg:V16QI 174 [ _27 ])
> (unspec:V16QI [
> (reg:V16QI 173 [ _32 ]) repeated x2
> (reg:V16QI 152)
> ] UNSPEC_VPERM)) 
>  {altivec_vperm_v16qi_direct}
>
>
> After the change:
>
> (insn 103 30 135 5 (set (reg:OO 127 [ _32 ])
> (mem:OO (reg:DI 130 [ ivtmp.37 ]) [1 MEM  int> [(short unsigned int *)_55]+0 S16 A128])) {*movoo}
>  (nil))
> (insn 135 103 34 5 (set (reg:DI 155)
> (plus:DI (reg:DI 130 [ ivtmp.37 ])
> (const_int 16 [0x10]))) 66 {*adddi3}
>  (nil))
> (insn 34 135 104 5 (set (reg:V16QI 143 [ _27 ])
> (unspec:V16QI [
> (subreg:V16QI (reg:OO 127 [ _32 ]) 16)
> (subreg:V16QI (reg:OO 127 [ _32 ]) 16)
> (reg:V16QI 152)
> ] UNSPEC_VPERM)) {altivec_vperm_v16qi_direct}
>  (expr_list:REG_DEAD (reg:OO 127 [ _32 ])
> (nil)))
> (insn 104 34 35 5 (set (reg:V16QI 174 [ _27 ])
> (unspec:V16QI [
> (subreg:V16QI (reg:OO 127 [ _32 ]) 0)
> (subreg:V16QI (reg:OO 127 [ _32 ]) 0)
> (reg:V16QI 152)
> ] UNSPEC_VPERM))  {altivec_vperm_v16qi_direct}
>
> After the change the tests passes.

But isn't this an example of the optimisation working on unspecs,
and working correctly?

I meant instead: could you give an example of the vect regressions
that you saw with the unspec test removed?  You mentioned that many
ve

[PATCH] rs6000: Fix up PCH in --enable-host-pie builds [PR115324]

2024-06-03 Thread Jakub Jelinek
Hi!

PCH doesn't work properly in --enable-host-pie configurations on
powerpc*-linux*.
The problem is that the rs6000_builtin_info and rs6000_instance_info
arrays mix pointers to .rodata/.data (bifname and attr_string point
to string literals in .rodata section, and the next member is either NULL
or &rs6000_instance_info[XXX]) and GC member (tree fntype).
Now, for normal GC this works just fine, we emit
  {
&rs6000_instance_info[0].fntype,
1 * (RS6000_INST_MAX),
sizeof (rs6000_instance_info[0]),
>_ggc_mx_tree_node,
>_pch_nx_tree_node
  },
  {
&rs6000_builtin_info[0].fntype,
1 * (RS6000_BIF_MAX),
sizeof (rs6000_builtin_info[0]),
>_ggc_mx_tree_node,
>_pch_nx_tree_node
  },
GC roots which are strided and thus cover only the fntype members of all
the elements of the two arrays.
For PCH though it actually results in saving those huge arrays (one is
130832 bytes, another 81568 bytes) into the .gch files and loading them back
in full.  While the bifname and attr_string and next pointers are marked as
GTY((skip)), they are actually saved to point to the .rodata and .data
sections of the process which writes the PCH, but because cc1/cc1plus etc.
are position independent executables with --enable-host-pie, when it is
loaded from the PCH file, it can point in a completely different addresses
where nothing is mapped at all or some random different thing appears at.
While gengtype supports the callback option, that one is meant for
relocatable function pointers and doesn't work in the case of GTY arrays
inside of .data section anyway.

So, either we'd need to add some further GTY extensions, or the following
patch instead reworks it such that the fntype members which were the only
reason for PCH in those arrays are moved to separate arrays.

Size-wise in .data sections it is (in bytes):

 vanillapatched
rs6000_builtin_info  130832 110704
rs6000_instance_info  81568  40784
rs6000_overload_info   7392   7392
rs6000_builtin_info_fntype0  10064
rs6000_instance_info_fntype   0  20392
sum  219792 189336

where previously we saved/restored for PCH those 130832+81568 bytes, now we
save/restore just 10064+20392 bytes, so this change is beneficial for the
data section size.

Unfortunately, it grows the size of the rs6000_init_generated_builtins
function, vanilla had 218328 bytes, patched has 228668.

When I applied
 void
 rs6000_init_generated_builtins ()
 {
+  bifdata *rs6000_builtin_info_p;
+  tree *rs6000_builtin_info_fntype_p;
+  ovlddata *rs6000_instance_info_p;
+  tree *rs6000_instance_info_fntype_p;
+  ovldrecord *rs6000_overload_info_p;
+  __asm ("" : "=r" (rs6000_builtin_info_p) : "0" (rs6000_builtin_info));
+  __asm ("" : "=r" (rs6000_builtin_info_fntype_p) : "0" 
(rs6000_builtin_info_fntype));
+  __asm ("" : "=r" (rs6000_instance_info_p) : "0" (rs6000_instance_info));
+  __asm ("" : "=r" (rs6000_instance_info_fntype_p) : "0" 
(rs6000_instance_info_fntype));
+  __asm ("" : "=r" (rs6000_overload_info_p) : "0" (rs6000_overload_info));
+  #define rs6000_builtin_info rs6000_builtin_info_p
+  #define rs6000_builtin_info_fntype rs6000_builtin_info_fntype_p
+  #define rs6000_instance_info rs6000_instance_info_p
+  #define rs6000_instance_info_fntype rs6000_instance_info_fntype_p
+  #define rs6000_overload_info rs6000_overload_info_p
+
hack by hand, the size of the function is 209700 though, so if really
wanted, we could add __attribute__((__noipa__)) to the function when
building with recent enough GCC and pass pointers to the first elements
of the 5 arrays to the function as arguments.  If you want such a change,
could that be done incrementally?

Bootstrapped/regtested on powerpc64le-linux and powerpc64-linux (-m32/-m64
testing there), ok for trunk and after a while for release branches?

2024-06-03  Jakub Jelinek  

PR target/115324
* config/rs6000/rs6000-gen-builtins.cc (write_decls): Remove
GTY markup from struct bifdata and struct ovlddata and remove their
fntype members.  Change next member in struct ovlddata and
first_instance member of struct ovldrecord to have int type rather
than struct ovlddata *.  Remove GTY markup from rs6000_builtin_info
and rs6000_instance_info arrays, declare new
rs6000_builtin_info_fntype and rs6000_instance_info_fntype arrays,
which have GTY markup.
(write_bif_static_init): Adjust for the above changes.
(write_ovld_static_init): Likewise.
(write_init_bif_table): Likewise.
(write_init_ovld_table): Likewise.
* config/rs6000/rs6000-builtin.cc (rs6000_init_builtins): Likewise.
* config/rs6000/rs6000-c.cc (find_instance): Likewise.  Make static.
(altivec_resolve_overloaded_builtin): Adjust for the above changes.

--- gcc/config/rs6000/rs6000-gen-builtins.cc.jj 2024-05-31 22:09:39.300059155 
+0200
+++ gcc/config/rs600

Re: [PATCH v2 2/2] C++: Support constexpr strings for asm statements

2024-06-03 Thread Jason Merrill

On 6/2/24 23:45, Andi Kleen wrote:

Some programing styles use a lot of inline assembler, and it is common
to use very complex preprocessor macros to generate the assembler
strings for the asm statements. In C++ there would be a typesafe alternative
using templates and constexpr to generate the assembler strings, but
unfortunately the asm statement requires plain string literals, so this
doesn't work.

This patch modifies the C++ parser to accept strings generated by
constexpr instead of just plain strings. This requires new syntax
because e.g. asm("..." : "r" (expr)) would be ambigious with a function
call. I chose () to make it unique. For example now you can write

constexpr const char *genasm() { return "insn"; }
constexpr const char *genconstraint() { return "r"; }

asm(genasm() :: (genconstraint()) (input));


Looks plausible.  What happens when someone forgets the parens, as seems 
a likely mistake?



The constexpr strings are allowed for the asm template, the
constraints and the clobbers (every time current asm accepts a string)

This version allows the same constexprs as C++26 static_assert,
following Jakub's suggestion.

The drawback of this scheme is that the constexpr doesn't have
full control over the input/output/clobber lists, but that can be
usually handled with a switch statement.  One could imagine
more flexible ways to handle that, for example supporting constexpr
vectors for the clobber list, or similar. But even without
that it is already useful.

Bootstrapped and full test on x86_64-linux.

gcc/c-family/ChangeLog:

* c-cppbuiltin.cc (c_cpp_builtins): Define
  __GNU_CONSTEXPR_ASM__.


__GXX instead of __GNU would be more consistent with the other old 
predefined macros.


Jason



Re: [PATCH] regenerate-opt-urls.py: fix transposed values for "vax" and "v850"

2024-06-03 Thread David Malcolm
On Mon, 2024-06-03 at 11:30 +0100, Maciej W. Rozycki wrote:
> On Tue, 28 May 2024, David Malcolm wrote:
> 
> > I've pushed this to gcc trunk as r15-872-g7cc529fe514cc6 (having
> > bootstrapped and lightly tested it on x86_64-pc-linux-gnu)
> 
>  Thank you for fixing this up.  Is this a new requirement now for
> .opt 
> file changes?  

Yes, as of GCC 14.

I posted the objectives here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636060.html

> Why does it have to be called by hand then rather than
> being a make dependency?

IIRC, the idea is:
(a) to avoid requiring Python 3 as a build dependency, and 
(b) to avoid requiring the HTML docs to be built before gcc's code can
be built

since missing a few URLs is a relatively minor issue that we don't want
to complicate the build for.

Hence we decided to check for it in CI instead.

Hope the trade-off sounds reasonable
Dave



Re: [patch] libgomp: Enable USM for some nvptx devices

2024-06-03 Thread Andrew Stubbs

On 28/05/2024 23:33, Tobias Burnus wrote:
While most of the nvptx systems I have access to don't have the support 
for CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES, 
one has:


Tesla V100-SXM2-16GB (as installed, e.g., on ORNL's Summit) does support 
this feature. And with that feature, unified-shared memory support does 
work, presumably by handling automatic page migration when a page fault 
occurs.


Hence: Enable USM support for those. When doing so, all 'requires 
unified_shared_memory' tests of sollve_vv pass :-)


I am not quite sure whether there are unintended side effects, hence, I 
have not enabled support for it in general. In particular, 'declare 
target enter(global_var)' seems to be mishandled (I think it should be 
link + pointer updated to point to the host; cf. description for 
'self_maps'). Thus, it is not enabled by default but only when USM has 
been requested.


OK for mainline?
Comments? Remarks? Suggestions?

Tobias

PS: I guess some more USM tests should be added…





+   /* If USM has been requested and is supported by all devices
+  of this type, set the capability accordingly.  */
+   if (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY)
+ current_device.capabilities |= GOMP_OFFLOAD_CAP_SHARED_MEM;
+


This breaks my USM patches that add the omp_alloc support (because it 
now short-circuits all of those code-paths), and it's just not true for 
devices where all host memory isn't magically addressable on the device.


Is there another way to detect truly shared memory?

Andrew


Re: [PATCH] libstdc++: Implement C++26 features (P2546R5)

2024-06-03 Thread Jonathan Wakely
On Mon, 3 Jun 2024 at 14:36, Peter0x44 wrote:
> > +void
> > +std::breakpoint() noexcept
> > +{
> > +#if _GLIBCXX_HAVE_DEBUGAPI_H && defined(_WIN32) &&
> > !defined(__CYGWIN__)
> > +  DebugBreak();
> > +#elif __has_builtin(__builtin_debugtrap)
> > +  __builtin_debugtrap(); // Clang
> > +#elif defined(__i386__) || defined(__x86_64__)
> > +  __asm__ volatile ("int3");
> Worth noting, currently gcc has some bugs around its handling of int3,

s/gcc/gdb/ ?

> https://sourceware.org/bugzilla/show_bug.cgi?id=31194
> int3;nop seems to work around it okay. __builtin_debugtrap() of clang
> does run into the same issues.

It seemed to work OK for me, maybe because there's no code after it?

void breakpoint() {
__asm__ volatile ("int3);
}



[PATCH] libstdc++: Fix simd conversion for -fno-signed-char for Clang

2024-06-03 Thread Matthias Kretz
Tested on x86_64-linux-gnu (also -m32 and -mx32), aarch64-linux-gnu, and arm-
linux-gnueabi(hf).

OK for trunk and backports?

--- 8< ---

The special case for Clang in the trait producing a signed integer type
lead to the trait returning 'char' where it should have been 'signed
char'. This workaround was introduced because on Clang the return type
of vector compares was not convertible to '_SimdWrapper<
__int_for_sizeof_t<...' unless '__int_for_sizeof_t' was an alias
for 'char'. In order to not rewrite the complete mask type code (there
is code scattered around the implementation assuming signed integers),
this needs to be 'signed char'; so the special case for Clang needs to
be removed.
The conversion issue is now solved in _SimdWrapper, which now
additionally allows conversion from vector types with compatible
integral type.

Signed-off-by: Matthias Kretz 

libstdc++-v3/ChangeLog:

PR libstdc++/115308
* include/experimental/bits/simd.h (__int_for_sizeof): Remove
special cases for __clang__.
(_SimdWrapper): Change constructor overload set to allow
conversion from vector types with integral conversions via bit
reinterpretation.
---
 libstdc++-v3/include/experimental/bits/simd.h | 45 +++
 1 file changed, 27 insertions(+), 18 deletions(-)


--
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
 stdₓ::simd
──diff --git a/libstdc++-v3/include/experimental/bits/simd.h b/libstdc++-v3/include/experimental/bits/simd.h
index 7c524625719..cb1f13d8ba6 100644
--- a/libstdc++-v3/include/experimental/bits/simd.h
+++ b/libstdc++-v3/include/experimental/bits/simd.h
@@ -606,19 +606,12 @@ __int_for_sizeof()
 static_assert(_Bytes > 0);
 if constexpr (_Bytes == sizeof(int))
   return int();
-  #ifdef __clang__
-else if constexpr (_Bytes == sizeof(char))
-  return char();
-  #else
 else if constexpr (_Bytes == sizeof(_SChar))
   return _SChar();
-  #endif
 else if constexpr (_Bytes == sizeof(short))
   return short();
-  #ifndef __clang__
 else if constexpr (_Bytes == sizeof(long))
   return long();
-  #endif
 else if constexpr (_Bytes == sizeof(_LLong))
   return _LLong();
   #ifdef __SIZEOF_INT128__
@@ -2747,6 +2740,8 @@ struct _SimdWrapperBase
 
 // }}}
 // _SimdWrapper{{{
+struct _DisabledSimdWrapper;
+
 template 
   struct _SimdWrapper<
 _Tp, _Width,
@@ -2756,16 +2751,17 @@ struct _SimdWrapper
 			  == sizeof(__vector_type_t<_Tp, _Width>),
 		   __vector_type_t<_Tp, _Width>>
   {
-using _Base
-  = _SimdWrapperBase<__has_iec559_behavior<__signaling_NaN, _Tp>::value
-			   && sizeof(_Tp) * _Width
-== sizeof(__vector_type_t<_Tp, _Width>),
-			 __vector_type_t<_Tp, _Width>>;
+static constexpr bool _S_need_default_init
+  = __has_iec559_behavior<__signaling_NaN, _Tp>::value
+	  and sizeof(_Tp) * _Width == sizeof(__vector_type_t<_Tp, _Width>);
+
+using _BuiltinType = __vector_type_t<_Tp, _Width>;
+
+using _Base = _SimdWrapperBase<_S_need_default_init, _BuiltinType>;
 
 static_assert(__is_vectorizable_v<_Tp>);
 static_assert(_Width >= 2); // 1 doesn't make sense, use _Tp directly then
 
-using _BuiltinType = __vector_type_t<_Tp, _Width>;
 using value_type = _Tp;
 
 static inline constexpr size_t _S_full_size
@@ -2801,13 +2797,26 @@ _SimdWrapper(initializer_list<_Tp> __init)
 _GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapper&
 operator=(_SimdWrapper&&) = default;
 
-template >,
-			 is_same<_V, __intrinsic_type_t<_Tp, _Width>
+// Convert from exactly matching __vector_type_t
+using _SimdWrapperBase<_S_need_default_init, _BuiltinType>::_SimdWrapperBase;
+
+// Convert from __intrinsic_type_t if __intrinsic_type_t and __vector_type_t differ, otherwise
+// this ctor should not exist. Making the argument type unusable is our next best solution.
+_GLIBCXX_SIMD_INTRINSIC constexpr
+_SimdWrapper(conditional_t>,
+			   _DisabledSimdWrapper, __intrinsic_type_t<_Tp, _Width>> __x)
+: _Base(__vector_bitcast<_Tp, _Width>(__x)) {}
+
+// Convert from different __vector_type_t, but only if bit reinterpretation is a correct
+// conversion of the value_type
+template ,
+	  typename = enable_if_t
+   and is_integral_v>>
   _GLIBCXX_SIMD_INTRINSIC constexpr
   _SimdWrapper(_V __x)
-  // __vector_bitcast can convert e.g. __m128 to __vector(2) float
-  : _Base(__vector_bitcast<_Tp, _Width>(__x)) {}
+  : _Base(reinterpret_cast<_BuiltinType>(__x)) {}
 
 template  && ...)


Re: [PATCH v7 4/9] C++: Support clang compatible [[musttail]] (PR83324)

2024-06-03 Thread Andi Kleen
On Mon, Jun 03, 2024 at 10:42:20AM -0400, Jason Merrill wrote:
> > @@ -30316,7 +30348,7 @@ cp_parser_std_attribute (cp_parser *parser, tree 
> > attr_ns)
> >   /* Maybe we don't expect to see any arguments for this attribute.  */
> >   const attribute_spec *as
> > = lookup_attribute_spec (TREE_PURPOSE (attribute));
> > -if (as && as->max_length == 0)
> > +if ((as && as->max_length == 0) || is_attribute_p ("musttail", 
> > attr_id))
> 
> This shouldn't be necessary with the attribute in the c-attribs table,
> right?  This patch is OK without this hunk and with the comment tweak above.

Yes I will remove it. Also the hunk above can be simplified, we don't
need the extra case anymore.

But unfortunately there's another problem (sorry I missed that earlier
but the Linaro bot pointed it out again):

This hunk:

@@ -21085,12 +21085,14 @@ tsubst_expr (tree t, tree args, tsubst_flags_t 
complain, tree in_decl)
bool op = CALL_EXPR_OPERATOR_SYNTAX (t);
bool ord = CALL_EXPR_ORDERED_ARGS (t);
bool rev = CALL_EXPR_REVERSE_ARGS (t);
-   if (op || ord || rev)
+   bool mtc = CALL_EXPR_MUST_TAIL_CALL (t);
+   if (op || ord || rev || mtc)
  if (tree call = extract_call_expr (ret))
{
  CALL_EXPR_OPERATOR_SYNTAX (call) = op;
  CALL_EXPR_ORDERED_ARGS (call) = ord;
  CALL_EXPR_REVERSE_ARGS (call) = rev;
+ CALL_EXPR_MUST_TAIL_CALL (call) = mtc;
}
if (warning_suppressed_p (t, OPT_Wpessimizing_move))
  /* This also suppresses -Wredundant-move.  */


causes 

/home/ak/gcc/gcc/gcc/testsuite/g++.dg/ipa/devirt-52.C:49:8: internal
compiler error: tree check: expected call_expr, have aggr_init_expr in
tsubst_expr, at cp/pt.cc:21095
0x910b66 tree_check_failed(tree_node const*, char const*, int, char
const*, ...)
../../gcc/gcc/tree.cc:8995


I suspect that's because CALL_EXPR_MUST_TAIL_CALL uses
tree_core->static_flag, but that means something else during
template expansion?

Any suggestions how to fix that?

-Andi


Re: [PATCH v7 4/9] C++: Support clang compatible [[musttail]] (PR83324)

2024-06-03 Thread Jakub Jelinek
On Mon, Jun 03, 2024 at 08:33:52AM -0700, Andi Kleen wrote:
> On Mon, Jun 03, 2024 at 10:42:20AM -0400, Jason Merrill wrote:
> > > @@ -30316,7 +30348,7 @@ cp_parser_std_attribute (cp_parser *parser, tree 
> > > attr_ns)
> > >   /* Maybe we don't expect to see any arguments for this attribute.  
> > > */
> > >   const attribute_spec *as
> > > = lookup_attribute_spec (TREE_PURPOSE (attribute));
> > > -if (as && as->max_length == 0)
> > > +if ((as && as->max_length == 0) || is_attribute_p ("musttail", 
> > > attr_id))
> > 
> > This shouldn't be necessary with the attribute in the c-attribs table,
> > right?  This patch is OK without this hunk and with the comment tweak above.
> 
> Yes I will remove it. Also the hunk above can be simplified, we don't
> need the extra case anymore.
> 
> But unfortunately there's another problem (sorry I missed that earlier
> but the Linaro bot pointed it out again):
> 
> This hunk:
> 
> @@ -21085,12 +21085,14 @@ tsubst_expr (tree t, tree args, tsubst_flags_t 
> complain, tree in_decl)
>   bool op = CALL_EXPR_OPERATOR_SYNTAX (t);
>   bool ord = CALL_EXPR_ORDERED_ARGS (t);
>   bool rev = CALL_EXPR_REVERSE_ARGS (t);
> - if (op || ord || rev)
> + bool mtc = CALL_EXPR_MUST_TAIL_CALL (t);
> + if (op || ord || rev || mtc)
> if (tree call = extract_call_expr (ret))
>   {
> CALL_EXPR_OPERATOR_SYNTAX (call) = op;
> CALL_EXPR_ORDERED_ARGS (call) = ord;
> CALL_EXPR_REVERSE_ARGS (call) = rev;
> +   CALL_EXPR_MUST_TAIL_CALL (call) = mtc;
>   }

The difference is that CALL_EXPR_MUST_TAIL_CALL is defined as:
#define CALL_EXPR_MUST_TAIL_CALL(NODE) \
  (CALL_EXPR_CHECK (NODE)->base.static_flag)
while the others like:
#define CALL_EXPR_ORDERED_ARGS(NODE) \
  TREE_LANG_FLAG_3 (CALL_OR_AGGR_INIT_CHECK (NODE))
where
#define CALL_OR_AGGR_INIT_CHECK(NODE) \
  TREE_CHECK2 ((NODE), CALL_EXPR, AGGR_INIT_EXPR)
while
#define CALL_EXPR_CHECK(t)  TREE_CHECK (t, CALL_EXPR)
(this one is defined in generated tree-check.h).
So, while the CALL_EXPR_REVERSE_ARGS etc. can be used on either
CALL_EXPR or AGGR_INIT_EXPR (the latter is a C++ specific tree code),
CALL_EXPR_MUST_TAIL_CALL is allowed only on CALL_EXPR.
AGGR_INIT_EXPR is used for C++ constructor calls, so I think
they really don't need such a flag, so you could do:
bool mtc = (TREE_CODE (t) == CALL_EXPR
? CALL_EXPR_MUST_TAIL_CALL (t) : false);
if (op || ord || rev || mtc)
...
  if (mtc)
CALL_EXPR_MUST_TAIL_CALL (call) = 1;
or something similar.
Or you'd need to define a variant of the CALL_EXPR_MUST_TAIL_CALL
macro for the C++ FE (as CALL_OR_AGGR_INIT_CHECK is C++ FE too)
and use that in the FE and somehow assert it means the same thing
as the middle-end flag except that it can be also used on AGGR_INIT_EXPR.

Jakub



Re: [PATCH] libstdc++: Optimize std::basic_string_view::starts_with

2024-06-03 Thread Jonathan Wakely
Pushed to trunk.

On Sat, 1 Jun 2024 at 11:26, Jonathan Wakely  wrote:
>
> We get smaller code at all optimization levels by not creating a
> temporary object, just comparing lengths first and then using
> traits_type::compare. This does less work than calling substr then
> operator==.
>
> libstdc++-v3/ChangeLog:
>
> * include/std/string_view (starts_with(basic_string_view)):
> Compare lengths first and then call traits_type::compare
> directly.
> ---
>  libstdc++-v3/include/std/string_view | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/include/std/string_view 
> b/libstdc++-v3/include/std/string_view
> index a7c5a126461..740aa9344f0 100644
> --- a/libstdc++-v3/include/std/string_view
> +++ b/libstdc++-v3/include/std/string_view
> @@ -385,7 +385,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>[[nodiscard]]
>constexpr bool
>starts_with(basic_string_view __x) const noexcept
> -  { return this->substr(0, __x.size()) == __x; }
> +  {
> +   return _M_len >= __x._M_len
> + && traits_type::compare(_M_str, __x._M_str, __x._M_len) == 0;
> +  }
>
>[[nodiscard]]
>constexpr bool
> --
> 2.45.1
>



Re: [PATCH v2 2/2] C++: Support constexpr strings for asm statements

2024-06-03 Thread Andi Kleen
On Mon, Jun 03, 2024 at 11:01:02AM -0400, Jason Merrill wrote:
> On 6/2/24 23:45, Andi Kleen wrote:
> > Some programing styles use a lot of inline assembler, and it is common
> > to use very complex preprocessor macros to generate the assembler
> > strings for the asm statements. In C++ there would be a typesafe alternative
> > using templates and constexpr to generate the assembler strings, but
> > unfortunately the asm statement requires plain string literals, so this
> > doesn't work.
> > 
> > This patch modifies the C++ parser to accept strings generated by
> > constexpr instead of just plain strings. This requires new syntax
> > because e.g. asm("..." : "r" (expr)) would be ambigious with a function
> > call. I chose () to make it unique. For example now you can write
> > 
> > constexpr const char *genasm() { return "insn"; }
> > constexpr const char *genconstraint() { return "r"; }
> > 
> > asm(genasm() :: (genconstraint()) (input));
> 
> Looks plausible.  What happens when someone forgets the parens, as seems a
> likely mistake?

constexpr-asm-1.C:27:13: error: expected string-literal before ‘genfoo’
   27 | asm(genfoo() : genoutput() (a) : geninput() (1) :
   genclobber());
 | ^~


Admittedly not great, I will try to give a better message.


-Andi


Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion

2024-06-03 Thread Ajit Agarwal
Hello Richard:

On 03/06/24 8:24 pm, Richard Sandiford wrote:
> Ajit Agarwal  writes:
>> Hello Richard:
>>
>> On 03/06/24 7:47 pm, Richard Sandiford wrote:
>>> Ajit Agarwal  writes:
 On 03/06/24 5:03 pm, Richard Sandiford wrote:
> Ajit Agarwal  writes:
>>> [...]
>>> If it is intentional, what distinguishes things like vperm and xxinsertw
>>> (and all other unspecs) from plain addition?
>>>
>>>   [(set (match_operand:VSX_F 0 "vsx_register_operand" "=wa")
>>> (plus:VSX_F (match_operand:VSX_F 1 "vsx_register_operand" "wa")
>>> (match_operand:VSX_F 2 "vsx_register_operand" 
>>> "wa")))]
>>>
>>
>> Plain addition are not supported currently.
>> We have not seen many cases with plain addition and this patch
>> will not accept plain addition.
>>
>>  
>>> This is why the intention behind the patch is important.  As it stands,
>>> it isn't clear what criteria the patch is using to distinguish "valid"
>>> fuse candidates from "invalid" ones.
>>>
>>
>> Intention behind this patch all variants of UNSPEC instructions are
>> supported and uses without UNSPEC are not supported in this patch.
>
> But why make the distinction this way though?  UNSPEC is a very
> GCC-specific concept.  Whether something is an UNSPEC or some other
> RTL code depends largely on historical accident.  E.g. we have specific
> codes for VEC_SELECT, VEC_MERGE, and VEC_DUPLICATE, but don't have one
> for VEC_PERM (even for VEC_PERM_EXPR exists in gimple).
>
> It seems unlikely that GCC's choice about whether to represent something
> as an UNSPEC or as another RTL code lines up neatly with the kind of
> codegen decisions that a good assembly programmer would make.
>
> I suppose another way of asking is to turn this around and say: what
> kind of uses are you trying to exclude?  Presumably things are worse
> if you remove this function override.  But what makes them worse?
> What kind of uses cause the regression?
>

 Uses of fused load where load with low address uses are modified with load 
 with high address uses.

 Similarly load with high address uses are modified with load low address
 uses.
>>>
>>> It sounds like something is going wrong the subreg updates.
>>> Can you give an example of where this occurs?  For instance...
>>>
 This is the semantics of lxvp instructions which can occur through
 UNSPEC uses otherwise it breaks the functionality and seen failure
 in almost all vect regressions and SPEC benchmarks.
>>>
>>> ...could you take one of the simpler vect regressions, show the before
>>> and after RTL, and why the transformation is wrong?
>>
>> Before the change:
>>
>> (insn 32 30 103 5 (set (reg:V16QI 127 [ _32 ])
>> (mem:V16QI (reg:DI 130 [ ivtmp.37 ]) [1 MEM > unsigned int> [(short unsigned int *)_55]+0 S16 A128])) {vsx_movv16qi_64bit}
>>  (nil))
>> (insn 103 32 135 5 (set (reg:V16QI 173 [ _32 ])
>> (mem:V16QI (plus:DI (reg:DI 130 [ ivtmp.37 ])
>> (const_int 16 [0x10])) [1 MEM  
>> [(short unsigned int *)_55]+0 S16 A128])) {vsx_movv16qi_64bit}
>>  (nil))
>> (insn 135 103 34 5 (set (reg:DI 155)
>> (plus:DI (reg:DI 130 [ ivtmp.37 ])
>> (const_int 16 [0x10]))) 66 {*adddi3}
>>  (nil))
>> (insn 34 135 104 5 (set (reg:V16QI 143 [ _27 ])
>> (unspec:V16QI [
>> (reg:V16QI 127 [ _32 ]) repeated x2
>> (reg:V16QI 152)
>> ] UNSPEC_VPERM))  {altivec_vperm_v16qi_direct}
>>  (expr_list:REG_DEAD (reg:V16QI 127 [ _32 ])
>> (nil)))
>> (insn 104 34 35 5 (set (reg:V16QI 174 [ _27 ])
>> (unspec:V16QI [
>> (reg:V16QI 173 [ _32 ]) repeated x2
>> (reg:V16QI 152)
>> ] UNSPEC_VPERM)) 
>>  {altivec_vperm_v16qi_direct}
>>
>>
>> After the change:
>>
>> (insn 103 30 135 5 (set (reg:OO 127 [ _32 ])
>> (mem:OO (reg:DI 130 [ ivtmp.37 ]) [1 MEM > int> [(short unsigned int *)_55]+0 S16 A128])) {*movoo}
>>  (nil))
>> (insn 135 103 34 5 (set (reg:DI 155)
>> (plus:DI (reg:DI 130 [ ivtmp.37 ])
>> (const_int 16 [0x10]))) 66 {*adddi3}
>>  (nil))
>> (insn 34 135 104 5 (set (reg:V16QI 143 [ _27 ])
>> (unspec:V16QI [
>> (subreg:V16QI (reg:OO 127 [ _32 ]) 16)
>> (subreg:V16QI (reg:OO 127 [ _32 ]) 16)
>> (reg:V16QI 152)
>> ] UNSPEC_VPERM)) {altivec_vperm_v16qi_direct}
>>  (expr_list:REG_DEAD (reg:OO 127 [ _32 ])
>> (nil)))
>> (insn 104 34 35 5 (set (reg:V16QI 174 [ _27 ])
>> (unspec:V16QI [
>> (subreg:V16QI (reg:OO 127 [ _32 ]) 0)
>> (subreg:V16QI (reg:OO 127 [ _32 ]) 0)
>> (reg:V16QI 152)
>> ] UNSPEC_VPERM))  {altivec_vperm_v16qi_direct}
>>
>> After the change the tests passes.
> 
> But isn't this an exa

Re: [PATCH 31/52] pru: Remove macros {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE

2024-06-03 Thread Dimitar Dimitrov
On Sun, Jun 02, 2024 at 10:01:21PM -0500, Kewen Lin wrote:
> This is to remove macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
> defines in pru port.
> 
> gcc/ChangeLog:
> 
>   * config/pru/pru.h (FLOAT_TYPE_SIZE): Remove.
>   (DOUBLE_TYPE_SIZE): Likewise.
>   (LONG_DOUBLE_TYPE_SIZE): Likewise.
> ---
>  gcc/config/pru/pru.h | 3 ---
>  1 file changed, 3 deletions(-)
> 

OK, thanks.

Dimitar


Re: [PATCH] libstdc++: Implement C++26 features (P2546R5)

2024-06-03 Thread Peter0x44

3 Jun 2024 4:14:28 pm Jonathan Wakely :


On Mon, 3 Jun 2024 at 14:36, Peter0x44 wrote:

+void
+std::breakpoint() noexcept
+{
+#if _GLIBCXX_HAVE_DEBUGAPI_H && defined(_WIN32) &&
!defined(__CYGWIN__)
+  DebugBreak();
+#elif __has_builtin(__builtin_debugtrap)
+  __builtin_debugtrap(); // Clang
+#elif defined(__i386__) || defined(__x86_64__)
+  __asm__ volatile ("int3");

Worth noting, currently gcc has some bugs around its handling of int3,


s/gcc/gdb/ ?

Yes, my bad



https://sourceware.org/bugzilla/show_bug.cgi?id=31194
int3;nop seems to work around it okay. __builtin_debugtrap() of clang
does run into the same issues.


It seemed to work OK for me, maybe because there's no code after it?
I suspect it won't matter for the tests, the assertion failure is only if 
you step after hitting the int3. I just figured I'd mention it as a heads 
up. It would affect users writing code.


void breakpoint() {
__asm__ volatile ("int3);
}


Re: [PATCH] libstdc++: Implement C++26 features (P2546R5)

2024-06-03 Thread Andrew Pinski
On Mon, Jun 3, 2024 at 9:06 AM Peter0x44  wrote:
>
> 3 Jun 2024 4:14:28 pm Jonathan Wakely :
>
> > On Mon, 3 Jun 2024 at 14:36, Peter0x44 wrote:
> >>> +void
> >>> +std::breakpoint() noexcept
> >>> +{
> >>> +#if _GLIBCXX_HAVE_DEBUGAPI_H && defined(_WIN32) &&
> >>> !defined(__CYGWIN__)
> >>> +  DebugBreak();
> >>> +#elif __has_builtin(__builtin_debugtrap)
> >>> +  __builtin_debugtrap(); // Clang
> >>> +#elif defined(__i386__) || defined(__x86_64__)
> >>> +  __asm__ volatile ("int3");
> >> Worth noting, currently gcc has some bugs around its handling of int3,
> >
> > s/gcc/gdb/ ?
> Yes, my bad
> >
> >> https://sourceware.org/bugzilla/show_bug.cgi?id=31194
> >> int3;nop seems to work around it okay. __builtin_debugtrap() of clang
> >> does run into the same issues.
> >
> > It seemed to work OK for me, maybe because there's no code after it?
> I suspect it won't matter for the tests, the assertion failure is only if
> you step after hitting the int3. I just figured I'd mention it as a heads
> up. It would affect users writing code.

Note there is a request for adding __builtin_break to GCC;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84595 .
(and one for __builtin_debugtrap;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99299 ).
When __builtin_break is added, I think we should use that here if it
is available (as a higher priority than __builtin_debugtrap).

Thanks,
Andrew


> >
> > void breakpoint() {
> > __asm__ volatile ("int3);
> > }


Re: [PATCH] Fix -Wstringop-overflow warning in 23_containers/vector/types/1.cc PR109849

2024-06-03 Thread Jonathan Wakely
On Mon, 3 Jun 2024 at 05:56, François Dumont  wrote:
>
> I hadn't try to make my patch as limited as possible to fix the problem,
> indeed.
>
>  libstdc++: Fix -Wstringop-overflow warning coming from std::vector
> [PR109849]
>
>  libstdc++-v3/ChangeLog:
>
>  PR libstdc++/109849
>  * include/bits/vector.tcc
>  (std::vector<>::_M_range_insert(iterator, _FwdIt, _FwdIt,
>  forward_iterator_tag)): Add __builtin_unreachable
> expression to tell
>  the compiler that the allocated buffer is large enough to
> receive current
>  elements plus the range to insert.
>
> Tested under Linux x64, ok to commit ?

Does the !__builtin_constant_p(__len) in this version do anything?

If it's a constant, then the compiler can already provide it's in
range, so the __builtin_unreachable() is redundant, but doesn't do any
harm.



Re: [PATCH v7 4/9] C++: Support clang compatible [[musttail]] (PR83324)

2024-06-03 Thread Jason Merrill

On 6/3/24 11:44, Jakub Jelinek wrote:

On Mon, Jun 03, 2024 at 08:33:52AM -0700, Andi Kleen wrote:

On Mon, Jun 03, 2024 at 10:42:20AM -0400, Jason Merrill wrote:

@@ -30316,7 +30348,7 @@ cp_parser_std_attribute (cp_parser *parser, tree 
attr_ns)
   /* Maybe we don't expect to see any arguments for this attribute.  */
   const attribute_spec *as
 = lookup_attribute_spec (TREE_PURPOSE (attribute));
-if (as && as->max_length == 0)
+if ((as && as->max_length == 0) || is_attribute_p ("musttail", attr_id))


This shouldn't be necessary with the attribute in the c-attribs table,
right?  This patch is OK without this hunk and with the comment tweak above.


Yes I will remove it. Also the hunk above can be simplified, we don't
need the extra case anymore.

But unfortunately there's another problem (sorry I missed that earlier
but the Linaro bot pointed it out again):

This hunk:

@@ -21085,12 +21085,14 @@ tsubst_expr (tree t, tree args, tsubst_flags_t 
complain, tree in_decl)
bool op = CALL_EXPR_OPERATOR_SYNTAX (t);
bool ord = CALL_EXPR_ORDERED_ARGS (t);
bool rev = CALL_EXPR_REVERSE_ARGS (t);
-   if (op || ord || rev)
+   bool mtc = CALL_EXPR_MUST_TAIL_CALL (t);
+   if (op || ord || rev || mtc)
  if (tree call = extract_call_expr (ret))
{
  CALL_EXPR_OPERATOR_SYNTAX (call) = op;
  CALL_EXPR_ORDERED_ARGS (call) = ord;
  CALL_EXPR_REVERSE_ARGS (call) = rev;
+ CALL_EXPR_MUST_TAIL_CALL (call) = mtc;
}


The difference is that CALL_EXPR_MUST_TAIL_CALL is defined as:
#define CALL_EXPR_MUST_TAIL_CALL(NODE) \
   (CALL_EXPR_CHECK (NODE)->base.static_flag)
while the others like:
#define CALL_EXPR_ORDERED_ARGS(NODE) \
   TREE_LANG_FLAG_3 (CALL_OR_AGGR_INIT_CHECK (NODE))
where
#define CALL_OR_AGGR_INIT_CHECK(NODE) \
   TREE_CHECK2 ((NODE), CALL_EXPR, AGGR_INIT_EXPR)
while
#define CALL_EXPR_CHECK(t)  TREE_CHECK (t, CALL_EXPR)
(this one is defined in generated tree-check.h).
So, while the CALL_EXPR_REVERSE_ARGS etc. can be used on either
CALL_EXPR or AGGR_INIT_EXPR (the latter is a C++ specific tree code),
CALL_EXPR_MUST_TAIL_CALL is allowed only on CALL_EXPR.
AGGR_INIT_EXPR is used for C++ constructor calls, so I think
they really don't need such a flag


AGGR_INIT_EXPR is also used for functions returning a class, so I think 
it is needed.


Jason



Re: [PATCH] check_GNU_style: Use raw strings.

2024-06-03 Thread Jeff Law




On 5/31/24 1:38 PM, Robin Dapp wrote:

Hi,

this silences some warnings when using check_GNU_style.

I didn't expect this to have any bootstrap or regtest impact
but I still ran it on x86 - no change.

Regards
  Robin

contrib/ChangeLog:

* check_GNU_style_lib.py: Use raw strings for regexps.

OK
jeff



Re: [patch] libgomp: Enable USM for some nvptx devices

2024-06-03 Thread Tobias Burnus

Andrew Stubbs wrote:

+    /* If USM has been requested and is supported by all devices
+   of this type, set the capability accordingly.  */
+    if (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY)
+  current_device.capabilities |= GOMP_OFFLOAD_CAP_SHARED_MEM;
+


This breaks my USM patches that add the omp_alloc support (because it 
now short-circuits all of those code-paths),


which I believe is fine. Your USM patches are for pseudo-USM, i.e. a 
(useful) bandaid for systems where the memory is not truely 
unified-shared memory but only specially tagged host memory is device 
accessible. (e.g. only memory allocated via cuMemAllocManaged) — And, 
quite similar, for -foffload-memory=pinned.


I think if a user wants to have pseudo USM – and does so by passing 
-foffload-memory=unified – we can add another flag to the internal 
omp_requires_mask. - By passing this option, a user should then also be 
aware of all the unavoidable special-case issues of pseudo-USM and 
cannot complain if they run into those.


If not, well, then the user either gets true USM (if supported) - or 
host fallback. Either of it is perfectly fine.


With -foffload-memory=unified, the compiler can then add all the 
omp_alloc calls – and, e.g., set a new GOMP_REQUIRES_OFFLOAD_MANAGED 
flag. If that's set, we wouldn't do the line above quoted capability 
setting in libgomp/target.c.


For nvidia, GOMP_REQUIRES_OFFLOAD_MANAGED probably requires 
CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS, i.e. when 0 then we 
probably want to return -1 also for -foffload-memory=unified. - A quick 
check shows that Tesla K20 (Kepler, sm_35) has 0 while Volta, Ada, 
Ampere (sm_70, sm_82, sm_89) have 1. (I recall using managed memory on 
an old system; page migration to the device worked fine, but a on-host 
accesses while the kernel was still running, crashed the program.|)

|

For amdgcn, my impression is that we don't need to handle 
-foffload-memory=unified as only the MI200 series (+ APUs) supports this 
well, but MI200 also supports true USM (with page migration; for APU it 
makes even less sense). - But, of course, we still may. — Auto-setting 
HSA_XNACK could be still be done MI200, but I wonder how to distinguish 
MI300X vs. MI300A, but it probably doesn't harm (nor help) to set 
HSA_XNACK for APUs …



and it's just not true for devices where all host memory isn't 
magically addressable on the device.

Is there another way to detect truly shared memory?


Do you have any indication that the current checks become true when the 
memory is not accessible?


Tobias


Re: [PATCH v6 1/8] Improve must tail in RTL backend

2024-06-03 Thread Michael Matz
Hello,

On Fri, 31 May 2024, Andi Kleen wrote:

> > I think the ultimate knowledge if a call can or cannot be implemented as 
> > tail-call lies within calls.cc/expand_call: It is inherently 
> > target and ABI specific how arguments and returns are layed out, how the 
> > stack frame is generated, if arguments are or aren't removed by callers 
> > or callees and so on; all of that being knowledge that tree-tailcall 
> > doesn't have and doesn't want to have.  As such tree-tailcall should 
> > not be regarded as ultimate truth, and failures of tree-tailcall to 
> > recognize something as tail-callable shouldn't matter.
> 
> It's not the ultimate truth, but some of the checks it does are not 
> duplicated at expand time nor the backend. So it's one necessary pre 
> condition with the current code base.
> 
> Yes maybe the checks could be all moved, but that's a much larger 
> project.

Hmm.  I count six tests in about 25 lines of code in 
tree-tailcall.cc:suitable_for_tail_opt_p and suitable_for_tail_call_opt_p.

Are you perhaps worrying about the sibcall discovery itself (i.e. much of 
find_tail_calls)?  Why would that be needed for musttail?  Is that 
attribute sometimes applied to calls that aren't in fact sibcall-able?

One thing I'm worried about is the need for a new sibcall pass at O0 just 
for sibcall discovery.  find_tail_calls isn't cheap, because it computes 
live local variables for the whole function, potentially being quadratic.


Ciao,
Michael.


Re: [PATCH] RISC-V: Add min/max patterns for ifcvt.

2024-06-03 Thread Palmer Dabbelt

On Fri, 31 May 2024 08:07:11 PDT (-0700), Robin Dapp wrote:

Hi,

ifcvt likes to emit

(set
  (if_then_else)
(ge (reg 1) (reg2))
(reg 1)
(reg 2))

which can be recognized as min/max patterns in the backend.
This patch adds such patterns and the respective iterators as well as a
test.

This depends on the generic ifcvt change.


https://inbox.sourceware.org/gcc-patches/57bb6ce5-79c3-4b08-b524-e807b9ac4...@gmail.com/T/#u

in case anyone's looking for it.


Regtested on rv64gcv_zvfh_zicond_zbb_zvbb.

Regards
 Robin

gcc/ChangeLog:

* config/riscv/bitmanip.md (*_cmp_3):
New min/max ifcvt pattern.
* config/riscv/iterators.md (minu): New iterator.
* config/riscv/riscv.cc (riscv_noce_conversion_profitable_p):
Remove riscv-specific adjustment.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbb-min-max-04.c: New test.
---
 gcc/config/riscv/bitmanip.md  | 13 +
 gcc/config/riscv/iterators.md |  8 
 gcc/config/riscv/riscv.cc |  3 --
 .../gcc.target/riscv/zbb-min-max-04.c | 47 +++
 4 files changed, 68 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-min-max-04.c

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 8769a6b818b..11102985796 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -547,6 +547,19 @@ (define_insn "*3"
   "\t%0,%1,%z2"
   [(set_attr "type" "")])

+;; Provide a minmax pattern for ifcvt to match.
+(define_insn "*_cmp_3"
+  [(set (match_operand:X 0 "register_operand" "=r")
+   (if_then_else:X
+   (bitmanip_minmax_cmp_op
+   (match_operand:X 1 "register_operand" "r")
+   (match_operand:X 2 "register_operand" "r"))
+   (match_dup 1)
+   (match_dup 2)))]
+  "TARGET_ZBB"
+  "\t%0,%1,%z2"
+  [(set_attr "type" "")])


This is a bit different than how we're handling the other min/max type 
attributes


   (define_insn "*3"
 [(set (match_operand:X 0 "register_operand" "=r")
   (bitmanip_minmax:X (match_operand:X 1 "register_operand" "r")
  (match_operand:X 2 "reg_or_0_operand" "rJ")))]
 "TARGET_ZBB"
 "\t%0,%1,%z2"
 [(set_attr "type" "")])

but it looks like it ends up with the same types after all the iterators 
(there's some "max vs smax" and "smax vs maxs" juggling, but IIUC it 
ends up in the same place).  I think it'd be clunkier to try and use all 
the same iterators, though, so


Reviewed-by: Palmer Dabbelt 

[I was wondering if we need the reversed, Jeff on the call says we 
don't.  I couldn't figure out how to write a test for it.]



+
 ;; Optimize the common case of a SImode min/max against a constant
 ;; that is safe both for sign- and zero-extension.
 (define_insn_and_split "*minmax"
diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 8a9d1986b4a..2f7be6e83c1 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -202,6 +202,14 @@ (define_code_iterator bitmanip_bitwise [and ior])

 (define_code_iterator bitmanip_minmax [smin umin smax umax])

+(define_code_iterator bitmanip_minmax_cmp_op [lt ltu le leu ge geu gt gtu])
+
+; Map a comparison operator to a min or max.
+(define_code_attr bitmanip_minmax_cmp_insn [(lt "min") (ltu "minu")
+   (le "min") (leu "minu")
+   (ge "max") (geu "maxu")
+   (gt "max") (gtu "maxu")])
+
 (define_code_iterator clz_ctz_pcnt [clz ctz popcount])

 (define_code_iterator bitmanip_rotate [rotate rotatert])
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 13cd61a4a22..d17c0a260a2 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4009,9 +4009,6 @@ riscv_noce_conversion_profitable_p (rtx_insn *seq,
 {
   struct noce_if_info riscv_if_info = *if_info;

-  riscv_if_info.original_cost -= COSTS_N_INSNS (2);
-  riscv_if_info.original_cost += insn_cost (if_info->jump, if_info->speed_p);
-
   /* Hack alert!  When `noce_try_store_flag_mask' uses `cstore4'
  to emit a conditional set operation on DImode output it comes up
  with a sequence such as:
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-min-max-04.c 
b/gcc/testsuite/gcc.target/riscv/zbb-min-max-04.c
new file mode 100644
index 000..ebf1889075d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-min-max-04.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv64gc_zicond_zbb -mabi=lp64d" } */
+/* { dg-skip-if "" { *-*-* } { "-finline-functions" "-funroll-loops" 
"-ftracer" } } */
+
+typedef int move_s;


IIUC that typedef isn't used?


+
+int
+remove_one_fast (int *move_ordering, const int num_moves, int mark)
+{
+  int i, best = -100;
+  int tmp = 0;
+
+  for (i = mark; i < num_moves; i++)
+{
+  if (move_ordering[i] > best)
+   

Re: [patch] libgomp: Enable USM for some nvptx devices

2024-06-03 Thread Andrew Stubbs

On 03/06/2024 17:46, Tobias Burnus wrote:

Andrew Stubbs wrote:

+    /* If USM has been requested and is supported by all devices
+   of this type, set the capability accordingly.  */
+    if (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY)
+  current_device.capabilities |= GOMP_OFFLOAD_CAP_SHARED_MEM;
+


This breaks my USM patches that add the omp_alloc support (because it 
now short-circuits all of those code-paths),


which I believe is fine. Your USM patches are for pseudo-USM, i.e. a 
(useful) bandaid for systems where the memory is not truely 
unified-shared memory but only specially tagged host memory is device 
accessible. (e.g. only memory allocated via cuMemAllocManaged) — And, 
quite similar, for -foffload-memory=pinned.


Er, no.

The default do-nothing USM uses slow uncachable PCI memory accesses (on 
devices that don't have truly shared memory, like APUs).


The CUDA Managed Memory and AMD Coarse Grained memory implementation 
uses proper page migration and permits full-speed memory access on the 
device (just don't thrash the pages too fast).


These are very different things!

I think if a user wants to have pseudo USM – and does so by passing 
-foffload-memory=unified – we can add another flag to the internal 
omp_requires_mask. - By passing this option, a user should then also be 
aware of all the unavoidable special-case issues of pseudo-USM and 
cannot complain if they run into those.


If not, well, then the user either gets true USM (if supported) - or 
host fallback. Either of it is perfectly fine.


With -foffload-memory=unified, the compiler can then add all the 
omp_alloc calls – and, e.g., set a new GOMP_REQUIRES_OFFLOAD_MANAGED 
flag. If that's set, we wouldn't do the line above quoted capability 
setting in libgomp/target.c.


For nvidia, GOMP_REQUIRES_OFFLOAD_MANAGED probably requires 
CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS, i.e. when 0 then we 
probably want to return -1 also for -foffload-memory=unified. - A quick 
check shows that Tesla K20 (Kepler, sm_35) has 0 while Volta, Ada, 
Ampere (sm_70, sm_82, sm_89) have 1. (I recall using managed memory on 
an old system; page migration to the device worked fine, but a on-host 
accesses while the kernel was still running, crashed the program.|)

|

For amdgcn, my impression is that we don't need to handle 
-foffload-memory=unified as only the MI200 series (+ APUs) supports this 
well, but MI200 also supports true USM (with page migration; for APU it 
makes even less sense). - But, of course, we still may. — Auto-setting 
HSA_XNACK could be still be done MI200, but I wonder how to distinguish 
MI300X vs. MI300A, but it probably doesn't harm (nor help) to set 
HSA_XNACK for APUs …



and it's just not true for devices where all host memory isn't 
magically addressable on the device.

Is there another way to detect truly shared memory?


Do you have any indication that the current checks become true when the 
memory is not accessible?


On AMD MI200, your check broken my USM testcases (because the code they 
were testing isn't active).  This is a serious performance problem.


Andrew


  1   2   >