[PATCH 1/2] i386: Deprecate -m[no-]avx10.1 and make -mno-avx10.1-512 disable the whole AVX10.1

2025-02-13 Thread Haochen Jiang
Based on the feedback we received, we would like to re-alias avx10.x to
512 bits in the future. That would leave the current avx10.1 alias,
which maps to 256 bits, inconsistent. Since it has shipped in GCC 14.1
and GCC 14.2, we have decided to deprecate the avx10.1 alias. The current
proposal is not to add it back in the future, though that may change if
necessary.

For -mno- options, it is confusing what is being disabled when it comes
to AVX10. Since there is barely any usage that enables AVX10 with 512
bits and then disables it, we will only provide -mno-avx10.x options in
the future, disabling the whole AVX10.x. If someone really wants to
disable 512 bits after enabling them, -mavx10.x-512 -mno-avx10.x
-mavx10.x-256 is the only way to do that, since we also do not want to
break the usual expectation that -m options enable everything they
mention.

However, since avx10.1 itself is deprecated, there is no reason to have
-mno-avx10.1; thus, we need to keep -mno-avx10.1-[256,512]. To avoid
confusion, we make -mno-avx10.1-512 disable the whole AVX10.1 set,
matching the future -mno-avx10.x.

gcc/ChangeLog:

* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_AVX2_UNSET): Change AVX10.1 unset macro.
(OPTION_MASK_ISA2_AVX10_1_256_UNSET): Removed.
(OPTION_MASK_ISA2_AVX10_1_512_UNSET): Removed.
(OPTION_MASK_ISA2_AVX10_1_UNSET): New.
(ix86_handle_option): Adjust AVX10.1 unset macro.
* common/config/i386/i386-isas.h: Remove avx10.1.
* config/i386/i386-options.cc
(ix86_valid_target_attribute_inner_p): Ditto.
(ix86_option_override_internal): Adjust warning message.
* config/i386/i386.opt: Remove mavx10.1.
* doc/extend.texi: Remove avx10.1 and adjust doc.
* doc/sourcebuild.texi: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10-check.h: Change to avx10.1-256.
* gcc.target/i386/avx10_1-1.c: Ditto.
* gcc.target/i386/avx10_1-13.c: Ditto.
* gcc.target/i386/avx10_1-14.c: Ditto.
* gcc.target/i386/avx10_1-21.c: Ditto.
* gcc.target/i386/avx10_1-22.c: Ditto.
* gcc.target/i386/avx10_1-23.c: Ditto.
* gcc.target/i386/avx10_1-24.c: Ditto.
* gcc.target/i386/avx10_1-3.c: Ditto.
* gcc.target/i386/avx10_1-5.c: Ditto.
* gcc.target/i386/avx10_1-6.c: Ditto.
* gcc.target/i386/avx10_1-8.c: Ditto.
* gcc.target/i386/pr117946.c: Ditto.
* gcc.target/i386/avx10_1-12.c: Adjust warning message.
* gcc.target/i386/avx10_1-19.c: Ditto.
* gcc.target/i386/avx10_1-17.c: Adjust to no-avx10.1-512.
---
 gcc/common/config/i386/i386-common.cc   | 18 --
 gcc/common/config/i386/i386-isas.h  |  1 -
 gcc/config/i386/i386-options.cc |  3 +--
 gcc/config/i386/i386.opt|  5 -
 gcc/doc/extend.texi | 11 ---
 gcc/doc/sourcebuild.texi|  5 +
 gcc/testsuite/gcc.target/i386/avx10-check.h |  2 +-
 gcc/testsuite/gcc.target/i386/avx10_1-1.c   |  2 +-
 gcc/testsuite/gcc.target/i386/avx10_1-12.c  |  2 +-
 gcc/testsuite/gcc.target/i386/avx10_1-13.c  |  2 +-
 gcc/testsuite/gcc.target/i386/avx10_1-14.c  |  2 +-
 gcc/testsuite/gcc.target/i386/avx10_1-17.c  |  4 ++--
 gcc/testsuite/gcc.target/i386/avx10_1-19.c  |  2 +-
 gcc/testsuite/gcc.target/i386/avx10_1-21.c  |  2 +-
 gcc/testsuite/gcc.target/i386/avx10_1-22.c  |  2 +-
 gcc/testsuite/gcc.target/i386/avx10_1-23.c  |  2 +-
 gcc/testsuite/gcc.target/i386/avx10_1-24.c  |  2 +-
 gcc/testsuite/gcc.target/i386/avx10_1-3.c   |  2 +-
 gcc/testsuite/gcc.target/i386/avx10_1-5.c   |  2 +-
 gcc/testsuite/gcc.target/i386/avx10_1-6.c   |  2 +-
 gcc/testsuite/gcc.target/i386/avx10_1-8.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr117946.c|  2 +-
 22 files changed, 31 insertions(+), 46 deletions(-)

diff --git a/gcc/common/config/i386/i386-common.cc 
b/gcc/common/config/i386/i386-common.cc
index 52ad1c5acd1..793d6845684 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -249,7 +249,7 @@ along with GCC; see the file COPYING3.  If not see
   (OPTION_MASK_ISA2_AVXIFMA_UNSET | OPTION_MASK_ISA2_AVXVNNI_UNSET \
| OPTION_MASK_ISA2_AVXVNNIINT8_UNSET | OPTION_MASK_ISA2_AVXNECONVERT_UNSET \
| OPTION_MASK_ISA2_AVXVNNIINT16_UNSET | OPTION_MASK_ISA2_AVX512F_UNSET \
-   | OPTION_MASK_ISA2_AVX10_1_256_UNSET)
+   | OPTION_MASK_ISA2_AVX10_1_UNSET)
 #define OPTION_MASK_ISA_AVX512F_UNSET \
   (OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_AVX512CD_UNSET \
| OPTION_MASK_ISA_AVX512DQ_UNSET | OPTION_MASK_ISA_AVX512BW_UNSET \
@@ -325,11 +325,9 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA2_APX_F_UNSET OPTION_MASK_ISA2_APX_F
 #define OPTION_MASK_ISA2_EVEX512_UNSET OPTION_MASK_ISA2_EVEX512
 #define OPTION_MASK_ISA2_USER_MSR_UNSET OPTION_MASK_ISA2_USER_MSR
-#define OPTION_MASK_ISA2_AVX10_1_256_UNSET \
-  (OPTION_MASK_ISA2_AVX10_1_256 | OPTION_MASK_ISA2_AVX10_

[PATCH 0/2] i386: Adjust AVX10 related options

2025-02-13 Thread Haochen Jiang
Hi all,

According to the previous feedback on our RFC for the AVX10 option
adjustment and discussions with LLVM, we have finalized how we are going
to handle it.

The overall direction is to re-alias avx10.x to 512 bits and use only
-mno-avx10.x to disable everything, deprecating the current confusing
-mno-avx10.x-[256,512] options.

This is fine for AVX10.2 since it was only just introduced. However, it
becomes tricky for AVX10.1, which shipped in GCC 14. Thus, we will
deprecate the avx10.1 alias. For -mno- options, since we no longer have
avx10.1, having -mno-avx10.1 would be odd. We will keep both
-mno-avx10.1-256 and -mno-avx10.1-512, while changing -mno-avx10.1-512
to also disable the whole AVX10.1, to align with the future behavior.

As for redesigning the options so that the last-specified length
determines the AVX10 size, we chose not to do that, since it would break
the established expectation that -m options enable everything they
mention. It would also make combinations like -mavx10.2-512
-mavx10.4-256 lose their flexibility of enabling only 512 bits on
AVX10.1/2 while enabling 256 bits on AVX10.3/4.

The two patches follow; the first will be backported to GCC 14.
Ok for trunk?

Thx,
Haochen




Re: [PATCH] tree-optimization/86270 - improve SSA coalescing for loop exit test

2025-02-13 Thread Richard Biener
On Wed, 12 Feb 2025, Andrew Pinski wrote:

> On Wed, Feb 12, 2025 at 4:04 AM Richard Biener  wrote:
> >
> > The PR indicates a very specific issue with regard to SSA coalescing
> > failures because there's a pre IV increment loop exit test.  While
> > IVOPTs created the desired IL we later simplify the exit test into
> > the undesirable form again.  The following fixes this up during RTL
> > expansion where we try to improve coalescing of IVs.  That seems
> > easier than trying to avoid the simplification with some weird
> > heuristics (it could also have been written this way).
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > OK for trunk?
> >
> > Thanks,
> > Richard.
> >
> > PR tree-optimization/86270
> > * tree-outof-ssa.cc (insert_backedge_copies): Pattern
> > match a single conflict in a loop condition and adjust
> > that avoiding the conflict if possible.
> >
> > * gcc.target/i386/pr86270.c: Adjust to check for no reg-reg
> > copies as well.
> > ---
> >  gcc/testsuite/gcc.target/i386/pr86270.c |  3 ++
> >  gcc/tree-outof-ssa.cc   | 49 ++---
> >  2 files changed, 47 insertions(+), 5 deletions(-)
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/pr86270.c 
> > b/gcc/testsuite/gcc.target/i386/pr86270.c
> > index 68562446fa4..89b9aeb317a 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr86270.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr86270.c
> > @@ -13,3 +13,6 @@ test ()
> >
> >  /* Check we do not split the backedge but keep nice loop form.  */
> >  /* { dg-final { scan-assembler-times "L\[0-9\]+:" 2 } } */
> > +/* Check we do not end up with reg-reg moves from a pre-increment IV
> > +   exit test.  */
> > +/* { dg-final { scan-assembler-not "mov\[lq\]\?\t%\?\[er\].x, %\?\[er\].x" 
> > } } */
> > diff --git a/gcc/tree-outof-ssa.cc b/gcc/tree-outof-ssa.cc
> > index d340d4ba529..f285c81599e 100644
> > --- a/gcc/tree-outof-ssa.cc
> > +++ b/gcc/tree-outof-ssa.cc
> > @@ -1259,10 +1259,9 @@ insert_backedge_copies (void)
> >   if (gimple_nop_p (def)
> >   || gimple_code (def) == GIMPLE_PHI)
> > continue;
> > - tree name = copy_ssa_name (result);
> > - gimple *stmt = gimple_build_assign (name, result);
> >   imm_use_iterator imm_iter;
> >   gimple *use_stmt;
> > + auto_vec uses;
> >   /* The following matches trivially_conflicts_p.  */
> >   FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, result)
> > {
> > @@ -1273,11 +1272,51 @@ insert_backedge_copies (void)
> > {
> >   use_operand_p use;
> >   FOR_EACH_IMM_USE_ON_STMT (use, imm_iter)
> > -   SET_USE (use, name);
> > +   uses.safe_push (use);
> > }
> > }
> > - gimple_stmt_iterator gsi = gsi_for_stmt (def);
> > - gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
> > + /* When there is just a conflicting statement try to
> > +adjust that to refer to the new definition.
> > +In particular for now handle a conflict with the
> > +use in a (exit) condition with a NE compare,
> > +replacing a pre-IV-increment compare with a
> > +post-IV-increment one.  */
> > + if (uses.length () == 1
> > + && is_a  (USE_STMT (uses[0]))
> > + && gimple_cond_code (USE_STMT (uses[0])) == NE_EXPR
> > + && is_gimple_assign (def)
> > + && gimple_assign_rhs1 (def) == result
> > + && (gimple_assign_rhs_code (def) == PLUS_EXPR
> > + || gimple_assign_rhs_code (def) == MINUS_EXPR
> > + || gimple_assign_rhs_code (def) == 
> > POINTER_PLUS_EXPR)
> > + && TREE_CODE (gimple_assign_rhs2 (def)) == 
> > INTEGER_CST)
> > +   {
> > + gcond *cond = as_a  (USE_STMT (uses[0]));
> > + tree *adj;
> > + if (gimple_cond_lhs (cond) == result)
> > +   adj = gimple_cond_rhs_ptr (cond);
> > + else
> > +   adj = gimple_cond_lhs_ptr (cond);
> > + tree name = copy_ssa_name (result);
> 
> Should this be `copy_ssa_name (*adj)`? Since the new name is based on
> `*adj` rather than based on the result.

Good point, I've adjusted this in my local copy.

Richard.


> Thanks,
> Andrew Pinski
> 
> > + gimple *stmt
> > +   = gimple_build_assign (name,
> > +  gimple_assign_rhs_code (def),
> > +   

Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling

2025-02-13 Thread Robin Dapp
>>> Other thoughts?
>> 
>> The docs seem to hint TARGET_SCHED_CAN_SPECULATE_INSN is meant for stuff 
>> we can't/don't model in the pipeline, but I have no idea how to model 
>> the VL=0 case there.
> Maybe so, but what Edwin is doing looks sensible enough.  It wouldn't be 
> the first time a hook got (ab)used in ways that weren't part of the 
> original intent.

I don't fully understand what's happening.  So the hoisting is being done
speculatively here?  And it just happens to be "bad" because that might
cause a VL=0 case.  But are we sure a lack of speculation cannot cause
such cases?

Also, why doesn't the vsetvl pass fix the situation?  IMHO we need to
understand the problem more thoroughly before changing things.
In the end LCM minimizes the number of vsetvls and inserts them at the
"earliest" point.  If that is not sufficient I'd say we need to modify
the constraints (maybe on a per-uarch basis)?

On a separate note:  How about we move the vsetvl pass after sched2?
Then we could at least rely on LCM doing its work uninhibited and wouldn't
reorder vsetvls afterwards.  Or do we somehow rely on rtl_dce and BB
reorder to run afterwards?

That won't help with the problem here but might with others.

-- 
Regards
 Robin



Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling

2025-02-13 Thread Vineet Gupta
On 2/13/25 20:46, Jeff Law wrote:
>> BTW what exactly is speculative scheduling ? As in what is it actually 
>> trying to
>> schedule ahead ?
> In simplest terms assume we have this kind of graph
>
>  0
> / \
>1-->2
>
>
> The scheduler knows how to build scheduling regions, essentially 
> extended basic blocks.  In this case we have two regions one with the 
> blocks 0,1 the other being just block 2.
>
> In the multi-block region 0,1 we allow insns from block 1 to speculate 
> into block 0.
>
> Let's assume we're on a simple 2-wide in order machine and somewhere in 
> bb0 there's a slot available for an insn that we couldn't fill with 
> anything useful from bb0.  In that case we may speculate an insn from 
> bb1 into bb0 to execute "for free" in that unused slot.
>
> That's the basic idea.  It was particularly helpful for in-order cores 
> in the past. It's dramatically less important for an out of order core 
> since those are likely doing the speculation in hardware.

That is great info, super helpful.

Given this background, I'd argue that Edwin's patch to barricade vsetvls
in scheduling is the right thing to do anyway, this issue or otherwise.

> Naturally if you're using icounts for evaluation this kind of behavior 
> is highly undesirable since that kind of evaluation says the 
> transformation is bad, but in reality on certain designs is quite helpful.

Sure.

Thx,
-Vineet


Re: [PATCH] libstdc++: Implement P3138R5 views::cache_latest

2025-02-13 Thread Patrick Palka
On Thu, 13 Feb 2025, Jonathan Wakely wrote:

> On Tue, 11 Feb 2025 at 05:59, Patrick Palka  wrote:
> >
> > Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
> >
> > -- >8 --
> >
> > libstdc++-v3/ChangeLog:
> >
> > * include/bits/version.def (ranges_cache_latest): Define.
> > * include/bits/version.h: Regenerate.
> > * include/std/ranges (cache_latest_view): Define for C++26.
> > (cache_latest_view::_Iterator): Likewise.
> > (cache_latest_view::_Sentinel): Likewise.
> > (views::__detail::__can_cache_latest): Likewise.
> > (views::_CacheLatest, views::cache_latest): Likewise.
> > * testsuite/std/ranges/adaptors/cache_latest/1.cc: New test.
> 
> The test is missing from the patch.

Whoops, below is the complete patch.

> 
> > ---
> >  libstdc++-v3/include/bits/version.def |   8 ++
> >  libstdc++-v3/include/bits/version.h   |  10 ++
> >  libstdc++-v3/include/std/ranges   | 189 ++
> >  3 files changed, 207 insertions(+)
> >
> > diff --git a/libstdc++-v3/include/bits/version.def 
> > b/libstdc++-v3/include/bits/version.def
> > index 002e560dc0d..6fb5db2e1fc 100644
> > --- a/libstdc++-v3/include/bits/version.def
> > +++ b/libstdc++-v3/include/bits/version.def
> > @@ -1837,6 +1837,14 @@ ftms = {
> >};
> >  };
> >
> > +ftms = {
> > +  name = ranges_cache_latest;
> > +  values = {
> > +v = 202411;
> > +cxxmin = 26;
> > +  };
> > +};
> > +
> >  ftms = {
> >name = ranges_concat;
> >values = {
> > diff --git a/libstdc++-v3/include/bits/version.h 
> > b/libstdc++-v3/include/bits/version.h
> > index 70de189b1e0..db61a396c45 100644
> > --- a/libstdc++-v3/include/bits/version.h
> > +++ b/libstdc++-v3/include/bits/version.h
> > @@ -2035,6 +2035,16 @@
> >  #endif /* !defined(__cpp_lib_is_virtual_base_of) && 
> > defined(__glibcxx_want_is_virtual_base_of) */
> >  #undef __glibcxx_want_is_virtual_base_of
> >
> > +#if !defined(__cpp_lib_ranges_cache_latest)
> > +# if (__cplusplus >  202302L)
> > +#  define __glibcxx_ranges_cache_latest 202411L
> > +#  if defined(__glibcxx_want_all) || 
> > defined(__glibcxx_want_ranges_cache_latest)
> > +#   define __cpp_lib_ranges_cache_latest 202411L
> > +#  endif
> > +# endif
> > +#endif /* !defined(__cpp_lib_ranges_cache_latest) && 
> > defined(__glibcxx_want_ranges_cache_latest) */
> > +#undef __glibcxx_want_ranges_cache_latest
> > +
> >  #if !defined(__cpp_lib_ranges_concat)
> >  # if (__cplusplus >  202302L)
> >  #  define __glibcxx_ranges_concat 202403L
> > diff --git a/libstdc++-v3/include/std/ranges 
> > b/libstdc++-v3/include/std/ranges
> > index 5c795a90fbc..db9a00be264 100644
> > --- a/libstdc++-v3/include/std/ranges
> > +++ b/libstdc++-v3/include/std/ranges
> > @@ -58,6 +58,7 @@
> >  #define __glibcxx_want_ranges_as_const
> >  #define __glibcxx_want_ranges_as_rvalue
> >  #define __glibcxx_want_ranges_cartesian_product
> > +#define __glibcxx_want_ranges_cache_latest
> >  #define __glibcxx_want_ranges_concat
> >  #define __glibcxx_want_ranges_chunk
> >  #define __glibcxx_want_ranges_chunk_by
> > @@ -1534,6 +1535,8 @@ namespace views::__adaptor
> > this->_M_payload._M_apply(_Optional_func{__f}, __i);
> > return this->_M_get();
> >   }
> > +
> > +   using _Optional_base<_Tp>::_M_reset;

I also forgot to mention this change in the ChangeLog.

> >};
> >
> >  template
> > @@ -10203,6 +10206,192 @@ namespace ranges
> >  } // namespace ranges
> >  #endif // __cpp_lib_ranges_concat
> >
> > +#if __cpp_lib_ranges_cache_latest // C++ >= 26
> > +namespace ranges
> > +{
> > +  template
> > +requires view<_Vp>
> > +  class cache_latest_view : public view_interface>
> > +  {
> > +_Vp _M_base = _Vp();
> > +
> > +using __cache_t = conditional_t>,
> > +   add_pointer_t>,
> > +   range_reference_t<_Vp>>;
> 
> __conditional_t is cheaper to instantiate than conditional_t, so when
> it doesn't affect the mangled name of a public symbol we should prefer
> __conditional_t.

Ack, fixed below.

-- >8 --

libstdc++-v3/ChangeLog:

* include/bits/version.def (ranges_cache_latest): Define.
* include/bits/version.h: Regenerate.
* include/std/ranges (__detail::__non_propagating_cache::_M_reset):
Export from base class _Optional_base.
(cache_latest_view): Define for C++26.
(cache_latest_view::_Iterator): Likewise.
(cache_latest_view::_Sentinel): Likewise.
(views::__detail::__can_cache_latest): Likewise.
(views::_CacheLatest, views::cache_latest): Likewise.
* testsuite/std/ranges/adaptors/cache_latest/1.cc: New test.
---
 libstdc++-v3/include/bits/version.def |   8 +
 libstdc++-v3/include/bits/version.h   |  10 +
 libstdc++-v3/include/std/ranges   | 189 ++
 .../std/ranges/adaptors/cache_latest/1.cc |  72 +++
 4 files changed, 279 insertions(+)
 

Re: [PATCH v2 2/8] LoongArch: Allow moving TImode vectors

2025-02-13 Thread Lulu Cheng

Hi,

If only the first and second patches are applied, the code will not compile.

Otherwise LGTM.

Thanks!

On 2025/2/13 at 5:41 PM, Xi Ruoyao wrote:

We have some vector instructions for operations on 128-bit integer
(i.e. TImode) vectors.  Previously they were modeled with unspecs, but
it's more natural to model them with TImode vector RTL expressions.

For the preparation, allow moving V1TImode and V2TImode vectors in LSX
and LASX registers so we won't get a reload failure when we start to
save TImode vectors in these registers.

This implicitly depends on the vrepli optimization: without it we'd try
"vrepli.q" which does not really exist and trigger an ICE.

gcc/ChangeLog:

* config/loongarch/lsx.md (mov): Remove.
(movmisalign): Remove.
(mov_lsx): Remove.
* config/loongarch/lasx.md (mov): Remove.
(movmisalign): Remove.
(mov_lasx): Remove.
* config/loongarch/simd.md (ALLVEC_TI): New mode iterator.
(mov): Likewise.
(mov_simd): New define_insn_and_split.
---
  gcc/config/loongarch/lasx.md | 40 --
  gcc/config/loongarch/lsx.md  | 36 ---
  gcc/config/loongarch/simd.md | 42 
  3 files changed, 42 insertions(+), 76 deletions(-)

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index a37c85a25a4..d82ad61be60 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -699,46 +699,6 @@ (define_expand "lasx_xvrepli"
DONE;
  })
  
-(define_expand "mov"

-  [(set (match_operand:LASX 0)
-   (match_operand:LASX 1))]
-  "ISA_HAS_LASX"
-{
-  if (loongarch_legitimize_move (mode, operands[0], operands[1]))
-DONE;
-})
-
-
-(define_expand "movmisalign"
-  [(set (match_operand:LASX 0)
-   (match_operand:LASX 1))]
-  "ISA_HAS_LASX"
-{
-  if (loongarch_legitimize_move (mode, operands[0], operands[1]))
-DONE;
-})
-
-;; 256-bit LASX modes can only exist in LASX registers or memory.
-(define_insn "mov_lasx"
-  [(set (match_operand:LASX 0 "nonimmediate_operand" "=f,f,R,*r,*f")
-   (match_operand:LASX 1 "move_operand" "fYGYI,R,f,*f,*r"))]
-  "ISA_HAS_LASX"
-  { return loongarch_output_move (operands); }
-  [(set_attr "type" "simd_move,simd_load,simd_store,simd_copy,simd_insert")
-   (set_attr "mode" "")
-   (set_attr "length" "8,4,4,4,4")])
-
-
-(define_split
-  [(set (match_operand:LASX 0 "nonimmediate_operand")
-   (match_operand:LASX 1 "move_operand"))]
-  "reload_completed && ISA_HAS_LASX
-   && loongarch_split_move_p (operands[0], operands[1])"
-  [(const_int 0)]
-{
-  loongarch_split_move (operands[0], operands[1]);
-  DONE;
-})
  
  ;; LASX

  (define_insn "add3"
diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
index ca0066a21ed..bcc5ae85fb3 100644
--- a/gcc/config/loongarch/lsx.md
+++ b/gcc/config/loongarch/lsx.md
@@ -575,42 +575,6 @@ (define_insn "lsx_vshuf_"
[(set_attr "type" "simd_sld")
 (set_attr "mode" "")])
  
-(define_expand "mov"

-  [(set (match_operand:LSX 0)
-   (match_operand:LSX 1))]
-  "ISA_HAS_LSX"
-{
-  if (loongarch_legitimize_move (mode, operands[0], operands[1]))
-DONE;
-})
-
-(define_expand "movmisalign"
-  [(set (match_operand:LSX 0)
-   (match_operand:LSX 1))]
-  "ISA_HAS_LSX"
-{
-  if (loongarch_legitimize_move (mode, operands[0], operands[1]))
-DONE;
-})
-
-(define_insn "mov_lsx"
-  [(set (match_operand:LSX 0 "nonimmediate_operand" "=f,f,R,*r,*f,*r")
-   (match_operand:LSX 1 "move_operand" "fYGYI,R,f,*f,*r,*r"))]
-  "ISA_HAS_LSX"
-{ return loongarch_output_move (operands); }
-  [(set_attr "type" 
"simd_move,simd_load,simd_store,simd_copy,simd_insert,simd_copy")
-   (set_attr "mode" "")])
-
-(define_split
-  [(set (match_operand:LSX 0 "nonimmediate_operand")
-   (match_operand:LSX 1 "move_operand"))]
-  "reload_completed && ISA_HAS_LSX
-   && loongarch_split_move_p (operands[0], operands[1])"
-  [(const_int 0)]
-{
-  loongarch_split_move (operands[0], operands[1]);
-  DONE;
-})
  
  ;; Integer operations

  (define_insn "add3"
diff --git a/gcc/config/loongarch/simd.md b/gcc/config/loongarch/simd.md
index 7605b17d21e..61fc1ab20ad 100644
--- a/gcc/config/loongarch/simd.md
+++ b/gcc/config/loongarch/simd.md
@@ -130,6 +130,48 @@ (define_mode_attr bitimm [(V16QI "uimm3") (V32QI "uimm3")
  ;; instruction here so we can avoid duplicating logics.
  ;; ===
  
+

+;; Move
+
+;; Some immediate values in V1TI or V2TI may be stored in LSX or LASX
+;; registers, thus we need to allow moving them for reload.
+(define_mode_iterator ALLVEC_TI [ALLVEC
+(V1TI "ISA_HAS_LSX")
+(V2TI "ISA_HAS_LASX")])
+
+(define_expand "mov"
+  [(set (match_operand:ALLVEC_TI 0)
+   (match_operand:ALLVEC_TI 1))]
+  ""
+{
+  if (loongarch_legitimize_move (mode, operands[0], operands[1]))
+DONE;
+})
+
+(define_expand 

[PATCH] rx: allow cmpstrnsi len to be zero

2025-02-13 Thread Keith Packard
The SCMPU instruction doesn't change the C and Z flags when the
incoming length is zero, which means the insn will produce a
value based upon the existing flag values.

As a quick kludge, adjust these flags to ensure a zero result in this
case.

Signed-off-by: Keith Packard 
---
 gcc/config/rx/rx.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rx/rx.md b/gcc/config/rx/rx.md
index 89211585c9c..edb2c96603f 100644
--- a/gcc/config/rx/rx.md
+++ b/gcc/config/rx/rx.md
@@ -2590,7 +2590,9 @@ (define_insn "rx_cmpstrn"
(clobber (reg:SI 3))
(clobber (reg:CC CC_REG))]
   "rx_allow_string_insns"
-  "scmpu   ; Perform the string comparison
+  "setpsw  z   ; Set flags in case len is zero
+   setpsw  c
+   scmpu   ; Perform the string comparison
mov #-1, %0  ; Set up -1 result (which cannot be created
 ; by the SC insn)
bnc?+   ; If Carry is not set skip over
-- 
2.47.2



[PATCH 2/2] i386: Re-alias avx10.2 to 512 bit and deprecate -mno-avx10.2-[256, 512]

2025-02-13 Thread Haochen Jiang
As mentioned in the avx10.1 option deprecation patch, based on the
feedback we received, we would like to re-alias avx10.x to 512 bits.

For -mno- options, as also mentioned in the previous patch, it is
confusing what is being disabled when it comes to AVX10. So we will only
provide -mno-avx10.x options starting from AVX10.2, disabling the whole
AVX10.x.

gcc/ChangeLog:

* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_AVX10_1_UNSET): Adjust macro.
(OPTION_MASK_ISA2_AVX10_2_256_UNSET): Removed.
(OPTION_MASK_ISA2_AVX10_2_512_UNSET): Ditto.
(OPTION_MASK_ISA2_AVX10_2_UNSET): New.
(ix86_handle_option): Remove disable part for avx10.2-256.
Rename avx10.2-512 switch case to avx10.2 and adjust disable
part macro.
* common/config/i386/i386-isas.h: Adjust avx10.2 and
avx10.2-512.
* config/i386/driver-i386.cc
(host_detect_local_cpu): Do not append -mno-avx10.x-256
for -march=native.
* config/i386/i386-options.cc
(ix86_valid_target_attribute_inner_p): Adjust avx10.2 and
avx10.2-512.
* config/i386/i386.opt: Reject Negative for mavx10.2-256.
Alias mavx10.2-512 to mavx10.2. Reject Negative for
mavx10.2-512.
* doc/extend.texi: Adjust documentation.
* doc/sourcebuild.texi: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_2-512-vminmaxbf16-2.c:
Add missing avx10_2_512 check.
* gcc.target/i386/avx10_2-512-vminmaxpd-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vminmaxph-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vminmaxps-2.c: Ditto.
* gcc.target/i386/avx10-check.h: Change avx10.2 to avx10.2-256.
* gcc.target/i386/avx10_2-bf16-1.c: Ditto.
* gcc.target/i386/avx10_2-bf16-vector-cmp-1.c: Ditto.
* gcc.target/i386/avx10_2-bf16-vector-fma-1.c: Ditto.
* gcc.target/i386/avx10_2-bf16-vector-operations-1.c: Ditto.
* gcc.target/i386/avx10_2-bf16-vector-smaxmin-1.c: Ditto.
* gcc.target/i386/avx10_2-builtin-1.c: Ditto.
* gcc.target/i386/avx10_2-builtin-2.c: Ditto.
* gcc.target/i386/avx10_2-comibf-1.c: Ditto.
* gcc.target/i386/avx10_2-comibf-2.c: Ditto.
* gcc.target/i386/avx10_2-comibf-3.c: Ditto.
* gcc.target/i386/avx10_2-comibf-4.c: Ditto.
* gcc.target/i386/avx10_2-compare-1.c: Ditto.
* gcc.target/i386/avx10_2-compare-1b.c: Ditto.
* gcc.target/i386/avx10_2-convert-1.c: Ditto.
* gcc.target/i386/avx10_2-media-1.c: Ditto.
* gcc.target/i386/avx10_2-minmax-1.c: Ditto.
* gcc.target/i386/avx10_2-movrs-1.c: Ditto.
* gcc.target/i386/avx10_2-partial-bf16-vector-fast-math-1.c: Ditto.
* gcc.target/i386/avx10_2-partial-bf16-vector-fma-1.c: Ditto.
* gcc.target/i386/avx10_2-partial-bf16-vector-operations-1.c: Ditto.
* gcc.target/i386/avx10_2-partial-bf16-vector-smaxmin-1.c: Ditto.
* gcc.target/i386/avx10_2-rounding-1.c: Ditto.
* gcc.target/i386/avx10_2-rounding-2.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: Ditto.
* gcc.target/i386/avx10_2-satcvt-1.c: Ditto.
* gcc.target/i386/avx10_2-vaddbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vcmpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vcomisbf16-1.c: Ditto.
* gcc.target/i386/avx10_2-vcomisbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvt2ph2bf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvt2ph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvt2ph2hf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvt2ph2hf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvt2ps2phx-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtbf162ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtbf162iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtbiasph2bf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtbiasph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtbiasph2hf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtbiasph2hf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvthf82ph-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtph2bf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtph2hf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtph2hf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtph2ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtph2iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtps2ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttbf162ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttbf162iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttpd2dqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttpd2qqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttpd2udqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttpd2uqqs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttph2ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttph2iubs-2.c: Ditto.
* gcc.targe

[PING 2] [PATCH v2] rs6000: Inefficient vector splat of small V2DI constants [PR107757]

2025-02-13 Thread Surya Kumari Jangala
Ping.
I have incorporated review comments from Peter in this revised patch.
The comment was to remove the -mvsx option from dg-options, as it is
implied by -mcpu=power8.
Ok for trunk?

Regards,
Surya

On 09/01/25 8:53 pm, Surya Kumari Jangala wrote:
> Ping
> 
> On 02/12/24 2:20 pm, Surya Kumari Jangala wrote:
>> I have incorporated review comments in this patch.
>>
>> Regards,
>> Surya
>>
>>
>> rs6000: Inefficient vector splat of small V2DI constants [PR107757]
>>
>> On P8, for vector splat of double word constants, specifically -1 and 1,
>> gcc generates inefficient code. For -1, gcc generates two instructions
>> (vspltisw and vupkhsw) whereas only one instruction (vspltisw) is
>> sufficient. For constant 1, gcc generates a load of the constant from
>> .rodata instead of the instructions vspltisw and vupkhsw.
>>
>> The routine vspltisw_vupkhsw_constant_p() returns true if the constant
>> can be synthesized with instructions vspltisw and vupkhsw. However, for
>> constant 1, this routine returns false.
>>
>> For constant -1, this routine returns true. Vector splat of -1 can be
>> done with only one instruction, i.e., vspltisw. We do not need two
>> instructions. Hence this routine should return false for -1.
>>
>> With this patch, gcc generates only one instruction (vspltisw)
>> for -1. And for constant 1, this patch generates two instructions
>> (vspltisw and vupkhsw).
>>
>> 2024-11-20  Surya Kumari Jangala  
>>
>> gcc/
>>  PR target/107757
>>  * config/rs6000/rs6000.cc (vspltisw_vupkhsw_constant_p):
>>  Return false for -1 and return true for 1.
>>
>> gcc/testsuite/
>>  PR target/107757
>>  * gcc.target/powerpc/pr107757-1.c: New.
>>  * gcc.target/powerpc/pr107757-2.c: New.
>> ---
>>  gcc/config/rs6000/rs6000.cc   |  2 +-
>>  gcc/testsuite/gcc.target/powerpc/pr107757-1.c | 14 ++
>>  gcc/testsuite/gcc.target/powerpc/pr107757-2.c | 13 +
>>  3 files changed, 28 insertions(+), 1 deletion(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr107757-1.c
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr107757-2.c
>>
>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> index 02a2f1152db..d0c528f4d5f 100644
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -6652,7 +6652,7 @@ vspltisw_vupkhsw_constant_p (rtx op, machine_mode 
>> mode, int *constant_ptr)
>>  return false;
>>  
>>value = INTVAL (elt);
>> -  if (value == 0 || value == 1
>> +  if (value == 0 || value == -1
>>|| !EASY_VECTOR_15 (value))
>>  return false;
>>  
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr107757-1.c 
>> b/gcc/testsuite/gcc.target/powerpc/pr107757-1.c
>> new file mode 100644
>> index 000..49076fba255
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr107757-1.c
>> @@ -0,0 +1,14 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-mdejagnu-cpu=power8 -O2" } */
>> +/* { dg-require-effective-target powerpc_vsx } */
>> +/* { dg-final { scan-assembler {\mvspltisw\M} } } */
>> +/* { dg-final { scan-assembler {\mvupkhsw\M} } } */
>> +/* { dg-final { scan-assembler-not {\mlvx\M} } } */
>> +
>> +#include 
>> +
>> +vector long long
>> +foo ()
>> +{
>> +  return vec_splats (1LL);
>> +}
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr107757-2.c 
>> b/gcc/testsuite/gcc.target/powerpc/pr107757-2.c
>> new file mode 100644
>> index 000..4955696f11d
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr107757-2.c
>> @@ -0,0 +1,13 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-mdejagnu-cpu=power8 -O2" } */
>> +/* { dg-require-effective-target powerpc_vsx } */
>> +/* { dg-final { scan-assembler {\mvspltisw\M} } } */
>> +/* { dg-final { scan-assembler-not {\mvupkhsw\M} } } */
>> +
>> +#include 
>> +
>> +vector long long
>> +foo ()
>> +{
>> + return vec_splats (~0LL);
>> +}
> 



[PATCH v2] x86: Properly find the maximum stack slot alignment

2025-02-13 Thread H.J. Lu
Don't assume that stack slots can only be accessed via stack or frame
registers.  First find all registers whose values are derived from the
stack or frame registers, then check memory accesses through those
registers as well as through the stack and frame registers themselves.

gcc/

PR target/109780
PR target/109093
* config/i386/i386.cc (ix86_update_stack_alignment): New.
(ix86_find_all_reg_use_1): Likewise.
(ix86_find_all_reg_use): Likewise.
(ix86_find_max_used_stack_alignment): Also check memory accesses
from registers defined by stack or frame registers.

gcc/testsuite/

PR target/109780
PR target/109093
* g++.target/i386/pr109780-1.C: New test.
* gcc.target/i386/pr109093-1.c: Likewise.
* gcc.target/i386/pr109780-1.c: Likewise.
* gcc.target/i386/pr109780-2.c: Likewise.
* gcc.target/i386/pr109780-3.c: Likewise.

-- 
H.J.
From 820f939a024fc71e4e37b509a3aa0290e8c4e9df Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Tue, 14 Mar 2023 11:41:51 -0700
Subject: [PATCH v2] x86: Properly find the maximum stack slot alignment

Don't assume that stack slots can only be accessed via stack or frame
registers.  First find all registers whose values are derived from the
stack or frame registers, then check memory accesses through those
registers as well as through the stack and frame registers themselves.

gcc/

	PR target/109780
	PR target/109093
	* config/i386/i386.cc (ix86_update_stack_alignment): New.
	(ix86_find_all_reg_use_1): Likewise.
	(ix86_find_all_reg_use): Likewise.
	(ix86_find_max_used_stack_alignment): Also check memory accesses
	from registers defined by stack or frame registers.

gcc/testsuite/

	PR target/109780
	PR target/109093
	* g++.target/i386/pr109780-1.C: New test.
	* gcc.target/i386/pr109093-1.c: Likewise.
	* gcc.target/i386/pr109780-1.c: Likewise.
	* gcc.target/i386/pr109780-2.c: Likewise.
	* gcc.target/i386/pr109780-3.c: Likewise.

Signed-off-by: H.J. Lu 
---
 gcc/config/i386/i386.cc| 173 ++---
 gcc/testsuite/g++.target/i386/pr109780-1.C |  72 +
 gcc/testsuite/gcc.target/i386/pr109093-1.c |  39 +
 gcc/testsuite/gcc.target/i386/pr109780-1.c |  14 ++
 gcc/testsuite/gcc.target/i386/pr109780-2.c |  21 +++
 gcc/testsuite/gcc.target/i386/pr109780-3.c |  52 +++
 6 files changed, 350 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/i386/pr109780-1.C
 create mode 100644 gcc/testsuite/gcc.target/i386/pr109093-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr109780-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr109780-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr109780-3.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 3128973ba79..4d855d9541c 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -8466,6 +8466,110 @@ output_probe_stack_range (rtx reg, rtx end)
   return "";
 }
 
+/* Update the maximum stack slot alignment from memory alignment in
+   PAT.  */
+
+static void
+ix86_update_stack_alignment (rtx, const_rtx pat, void *data)
+{
+  /* This insn may reference a stack slot.  Update the maximum stack slot
+ alignment.  */
+  subrtx_iterator::array_type array;
+  FOR_EACH_SUBRTX (iter, array, pat, ALL)
+if (MEM_P (*iter))
+  {
+	unsigned int alignment = MEM_ALIGN (*iter);
+	unsigned int *stack_alignment
+	  = (unsigned int *) data;
+	if (alignment > *stack_alignment)
+	  *stack_alignment = alignment;
+	break;
+  }
+}
+
+/* Helper function for ix86_find_all_reg_use.  */
+
+static void
+ix86_find_all_reg_use_1 (rtx set, HARD_REG_SET &stack_slot_access,
+			 auto_bitmap &worklist)
+{
+  rtx src = SET_SRC (set);
+  if (MEM_P (src))
+return;
+
+  rtx dest = SET_DEST (set);
+  if (!REG_P (dest))
+return;
+
+  if (TEST_HARD_REG_BIT (stack_slot_access, REGNO (dest)))
+return;
+
+  /* Add this register to stack_slot_access.  */
+  add_to_hard_reg_set (&stack_slot_access, Pmode, REGNO (dest));
+  bitmap_set_bit (worklist, REGNO (dest));
+}
+
+/* Find all registers whose definitions use REG.  */
+
+static void
+ix86_find_all_reg_use (HARD_REG_SET &stack_slot_access,
+		   unsigned int reg, auto_bitmap &worklist)
+{
+  for (df_ref ref = DF_REG_USE_CHAIN (reg);
+   ref != NULL;
+   ref = DF_REF_NEXT_REG (ref))
+{
+  if (DF_REF_IS_ARTIFICIAL (ref))
+	continue;
+
+  rtx_insn *insn = DF_REF_INSN (ref);
+  if (!NONDEBUG_INSN_P (insn))
+	continue;
+
+  if (CALL_P (insn) || JUMP_P (insn))
+	continue;
+
+  rtx set = single_set (insn);
+  if (set)
+	ix86_find_all_reg_use_1 (set, stack_slot_access, worklist);
+
+  rtx pat = PATTERN (insn);
+  if (GET_CODE (pat) != PARALLEL)
+	continue;
+
+  for (int i = 0; i < XVECLEN (pat, 0); i++)
+	{
+	  rtx exp = XVECEXP (pat, 0, i);
+	  switch (GET_CODE (exp))
+	{
+	case ASM_OPERANDS:
+	case CLOBBER:
+	case PREFETCH:
+	case USE:
+	  break;
+	case UNSPEC:
+	case UNSPEC_VOLATILE:
+	  for (int j = XVECLEN (exp, 0) - 1; j >= 0; j--)
+		{
+		  rtx x = XVECEXP (exp, 0, j);
+		  if (GET_CODE (x) == SET)
+		ix86_find_all_reg_use_1 (x,

[PATCH v2] LoongArch: Adjust the cost of ADDRESS_REG_REG.

2025-02-13 Thread Lulu Cheng
After changing this cost from 1 to 3, the performance of the SPEC CPU 2006
benchmarks 401, 473, 416, 465 and 482 improves by about 2% on LA664.

Add option '-maddr-reg-reg-cost='.

gcc/ChangeLog:

* config/loongarch/genopts/loongarch.opt.in: Add
option '-maddr-reg-reg-cost='.
* config/loongarch/loongarch-def.cc
(loongarch_rtx_cost_data::loongarch_rtx_cost_data): Initialize
addr_reg_reg_cost to 3.
* config/loongarch/loongarch-opts.cc
(loongarch_target_option_override): If '-maddr-reg-reg-cost='
is not used, set it to the initial value.
* config/loongarch/loongarch-tune.h
(struct loongarch_rtx_cost_data): Add the member
addr_reg_reg_cost and its assignment function to the structure
loongarch_rtx_cost_data.
* config/loongarch/loongarch.cc (loongarch_address_insns):
Use la_addr_reg_reg_cost to set the cost of ADDRESS_REG_REG.
* config/loongarch/loongarch.opt: Regenerate.
* config/loongarch/loongarch.opt.urls: Regenerate.
* doc/invoke.texi: Add description of '-maddr-reg-reg-cost='.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/const-double-zero-stx.c: Add
'-maddr-reg-reg-cost=1'.
* gcc.target/loongarch/stack-check-alloca-1.c: Likewise.

Change-Id: I8fbf7a6d073b16c7829b1a9a8d239b131d53ab1b
---
 gcc/config/loongarch/genopts/loongarch.opt.in  | 4 
 gcc/config/loongarch/loongarch-def.cc  | 1 +
 gcc/config/loongarch/loongarch-opts.cc | 3 +++
 gcc/config/loongarch/loongarch-tune.h  | 7 +++
 gcc/config/loongarch/loongarch.cc  | 2 +-
 gcc/config/loongarch/loongarch.opt | 4 
 gcc/config/loongarch/loongarch.opt.urls| 3 +++
 gcc/doc/invoke.texi| 7 ++-
 gcc/testsuite/gcc.target/loongarch/const-double-zero-stx.c | 2 +-
 gcc/testsuite/gcc.target/loongarch/stack-check-alloca-1.c  | 2 +-
 10 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in 
b/gcc/config/loongarch/genopts/loongarch.opt.in
index 8c292c8600d..39c1545e540 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -177,6 +177,10 @@ mbranch-cost=
 Target RejectNegative Joined UInteger Var(la_branch_cost) Save
 -mbranch-cost=COST Set the cost of branches to roughly COST instructions.
 
+maddr-reg-reg-cost=
+Target RejectNegative Joined UInteger Var(la_addr_reg_reg_cost) Save
+-maddr-reg-reg-cost=COST  Set the cost of ADDRESS_REG_REG to the value 
calculated by COST.
+
 mcheck-zero-division
 Target Mask(CHECK_ZERO_DIV) Save
 Trap on integer divide by zero.
diff --git a/gcc/config/loongarch/loongarch-def.cc 
b/gcc/config/loongarch/loongarch-def.cc
index b0271eb3b9a..5f235a04ef2 100644
--- a/gcc/config/loongarch/loongarch-def.cc
+++ b/gcc/config/loongarch/loongarch-def.cc
@@ -136,6 +136,7 @@ loongarch_rtx_cost_data::loongarch_rtx_cost_data ()
 movcf2gr (COSTS_N_INSNS (7)),
 movgr2cf (COSTS_N_INSNS (15)),
 branch_cost (6),
+addr_reg_reg_cost (3),
 memory_latency (4) {}
 
 /* The following properties cannot be looked up directly using "cpucfg".
diff --git a/gcc/config/loongarch/loongarch-opts.cc 
b/gcc/config/loongarch/loongarch-opts.cc
index 36342cc9373..c2a63f75fc2 100644
--- a/gcc/config/loongarch/loongarch-opts.cc
+++ b/gcc/config/loongarch/loongarch-opts.cc
@@ -1010,6 +1010,9 @@ loongarch_target_option_override (struct loongarch_target 
*target,
   if (!opts_set->x_la_branch_cost)
 opts->x_la_branch_cost = loongarch_cost->branch_cost;
 
+  if (!opts_set->x_la_addr_reg_reg_cost)
+opts->x_la_addr_reg_reg_cost = loongarch_cost->addr_reg_reg_cost;
+
   /* other stuff */
   if (ABI_LP64_P (target->abi.base))
 opts->x_flag_pcc_struct_return = 0;
diff --git a/gcc/config/loongarch/loongarch-tune.h 
b/gcc/config/loongarch/loongarch-tune.h
index e69173ebf79..f7819fe7678 100644
--- a/gcc/config/loongarch/loongarch-tune.h
+++ b/gcc/config/loongarch/loongarch-tune.h
@@ -38,6 +38,7 @@ struct loongarch_rtx_cost_data
   unsigned short movcf2gr;
   unsigned short movgr2cf;
   unsigned short branch_cost;
+  unsigned short addr_reg_reg_cost;
   unsigned short memory_latency;
 
   /* Default RTX cost initializer, implemented in loongarch-def.cc.  */
@@ -115,6 +116,12 @@ struct loongarch_rtx_cost_data
 return *this;
   }
 
+  loongarch_rtx_cost_data addr_reg_reg_cost_ (unsigned short 
_addr_reg_reg_cost)
+  {
+addr_reg_reg_cost = _addr_reg_reg_cost;
+return *this;
+  }
+
   loongarch_rtx_cost_data memory_latency_ (unsigned short _memory_latency)
   {
 memory_latency = _memory_latency;
diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index e9978370e8c..495b62309d6 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -238

Re: [PATCH] tree, gengtype: Fix up GC issue with DECL_VALUE_EXPR [PR118790]

2025-02-13 Thread Jakub Jelinek
On Thu, Feb 13, 2025 at 12:48:44PM +0100, Richard Biener wrote:
> So what this basically does is ensure we mark DECL_VALUE_EXPR when
> VAR is marked which isn't done when marking a tree node.
> 
> That you special-case the hashtable walker is a workaround for
> us not being able to say
> 
> struct GTY((mark_extra_stuff)) tree_decl_with_vis {
> 
> on 'tree' (or specifically the structs for a VAR_DECL).  And that we
> rely on gengtype producing the 'tree' marker.  So we rely on the
> hashtable keeping referenced trees live.

Yes, we could just arrange for gt_ggc_mx_lang_tree_node to additionally
mark DECL_VALUE_EXPR for VAR_DECLs with DECL_HAS_VALUE_EXPR_P set (dunno how
exactly).
I think what the patch does should be slightly cheaper: we avoid those
DECL_VALUE_EXPR hash table lookups in the common case, where the
DECL_VALUE_EXPR of marked variables refers only to trees that reference
marked VAR_DECLs and no unmarked ones.

Jakub



Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling

2025-02-13 Thread Vineet Gupta
On 2/13/25 14:17, Robin Dapp wrote:
 Other thoughts?
>>> The docs seem to hint TARGET_SCHED_CAN_SPECULATE_INSN is meant for stuff 
>>> we can't/don't model in the pipeline, but I have no idea how to model 
>>> the VL=0 case there.
>> Maybe so, but what Edwin is doing looks sensible enough.  It wouldn't be 
>> the first time a hook got (ab)used in ways that weren't part of the 
>> original intent.
> I don't fully understand what's happening.  So the hoisting is being done
> speculatively here?  And it just happens to be "bad" because that might
> cause a VL=0 case.  But are we sure a lack of speculation cannot cause
> such cases?

Exactly. My gut feeling w/o deep dive was this seemed like papering over the 
issue.

BTW, what exactly is speculative scheduling?  As in, what is it actually
trying to schedule ahead?

> Also, why doesn't the vsetvl pass fix the situation?  IMHO we need to
> understand the problem more thoroughly before changing things.
> In the end LCM minimizes the number of vsetvls and inserts them at the
> "earliest" point.  If that is not sufficient I'd say we need modify
> the constraints (maybe on a per-uarch basis)?

As far as LCM is concerned it is hoisting the insn to the optimal spot. However
there's some additional logic such as in can_use_next_avl_p () which influences
if things can be moved around.

> On a separate note:  How about we move the vsetvl pass after sched2?
> Then we could at least rely on LCM doing its work uninhibited and wouldn't
> reorder vsetvls afterwards. 

Bingo! Excellent idea. This would ensure scheduling doesn't undo carefully
placed stuff, but

>  Or do we somehow rely on rtl_dce and BB
> reorder to run afterwards?

... I have no idea if any of this is in play.

> That won't help with the problem here but might with others.

Right, this needs to be evaluated independently with both icounts and BPI3 runs
to see if anything falls out.

-Vineet


[PATCH v3 2/4] LoongArch: Split the function loongarch_cpu_cpp_builtins into two functions.

2025-02-13 Thread Lulu Cheng
Split the implementation of the function loongarch_cpu_cpp_builtins into two 
parts:
  1. Macro definitions that do not change (only considering 64-bit architecture)
  2. Macro definitions that change with different compilation options.

gcc/ChangeLog:

* config/loongarch/loongarch-c.cc (builtin_undef): New macro.
(loongarch_cpu_cpp_builtins): Split to loongarch_update_cpp_builtins
and loongarch_define_unconditional_macros.
(loongarch_def_or_undef): New functions.
(loongarch_define_unconditional_macros): Likewise.
(loongarch_update_cpp_builtins): Likewise.

---
 gcc/config/loongarch/loongarch-c.cc | 122 ++--
 1 file changed, 77 insertions(+), 45 deletions(-)

diff --git a/gcc/config/loongarch/loongarch-c.cc 
b/gcc/config/loongarch/loongarch-c.cc
index 5d8c02e094b..9a8de1ec381 100644
--- a/gcc/config/loongarch/loongarch-c.cc
+++ b/gcc/config/loongarch/loongarch-c.cc
@@ -31,26 +31,22 @@ along with GCC; see the file COPYING3.  If not see
 
 #define preprocessing_asm_p() (cpp_get_options (pfile)->lang == CLK_ASM)
 #define builtin_define(TXT) cpp_define (pfile, TXT)
+#define builtin_undef(TXT) cpp_undef (pfile, TXT)
 #define builtin_assert(TXT) cpp_assert (pfile, TXT)
 
-void
-loongarch_cpu_cpp_builtins (cpp_reader *pfile)
+static void
+loongarch_def_or_undef (bool def_p, const char *macro, cpp_reader *pfile)
 {
-  builtin_assert ("machine=loongarch");
-  builtin_assert ("cpu=loongarch");
-  builtin_define ("__loongarch__");
-
-  builtin_define_with_value ("__loongarch_arch",
-loongarch_arch_strings[la_target.cpu_arch], 1);
-
-  builtin_define_with_value ("__loongarch_tune",
-loongarch_tune_strings[la_target.cpu_tune], 1);
-
-  builtin_define_with_value ("_LOONGARCH_ARCH",
-loongarch_arch_strings[la_target.cpu_arch], 1);
+  if (def_p)
+cpp_define (pfile, macro);
+  else
+cpp_undef (pfile, macro);
+}
 
-  builtin_define_with_value ("_LOONGARCH_TUNE",
-loongarch_tune_strings[la_target.cpu_tune], 1);
+static void
+loongarch_define_unconditional_macros (cpp_reader *pfile)
+{
+  builtin_define ("__loongarch__");
 
   /* Base architecture / ABI.  */
   if (TARGET_64BIT)
@@ -66,6 +62,48 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile)
   builtin_define ("__loongarch_lp64");
 }
 
+  /* Add support for FLOAT128_TYPE on the LoongArch architecture.  */
+  builtin_define ("__FLOAT128_TYPE__");
+
+  /* Map the old _Float128 'q' builtins into the new 'f128' builtins.  */
+  builtin_define ("__builtin_fabsq=__builtin_fabsf128");
+  builtin_define ("__builtin_copysignq=__builtin_copysignf128");
+  builtin_define ("__builtin_nanq=__builtin_nanf128");
+  builtin_define ("__builtin_nansq=__builtin_nansf128");
+  builtin_define ("__builtin_infq=__builtin_inff128");
+  builtin_define ("__builtin_huge_valq=__builtin_huge_valf128");
+
+  /* Native Data Sizes.  */
+  builtin_define_with_int_value ("_LOONGARCH_SZINT", INT_TYPE_SIZE);
+  builtin_define_with_int_value ("_LOONGARCH_SZLONG", LONG_TYPE_SIZE);
+  builtin_define_with_int_value ("_LOONGARCH_SZPTR", POINTER_SIZE);
+  builtin_define_with_int_value ("_LOONGARCH_FPSET", 32);
+  builtin_define_with_int_value ("_LOONGARCH_SPFPSET", 32);
+}
+
+static void
+loongarch_update_cpp_builtins (cpp_reader *pfile)
+{
+  /* Since the macros in this function might be redefined, it's necessary to
+ undef them first.  */
+  builtin_undef ("__loongarch_arch");
+  builtin_define_with_value ("__loongarch_arch",
+loongarch_arch_strings[la_target.cpu_arch], 1);
+
+  builtin_undef ("__loongarch_tune");
+  builtin_define_with_value ("__loongarch_tune",
+loongarch_tune_strings[la_target.cpu_tune], 1);
+
+  builtin_undef ("_LOONGARCH_ARCH");
+  builtin_define_with_value ("_LOONGARCH_ARCH",
+loongarch_arch_strings[la_target.cpu_arch], 1);
+
+  builtin_undef ("_LOONGARCH_TUNE");
+  builtin_define_with_value ("_LOONGARCH_TUNE",
+loongarch_tune_strings[la_target.cpu_tune], 1);
+
+  builtin_undef ("__loongarch_double_float");
+  builtin_undef ("__loongarch_single_float");
   /* These defines reflect the ABI in use, not whether the
  FPU is directly accessible.  */
   if (TARGET_DOUBLE_FLOAT_ABI)
@@ -73,6 +111,8 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile)
   else if (TARGET_SINGLE_FLOAT_ABI)
 builtin_define ("__loongarch_single_float=1");
 
+  builtin_undef ("__loongarch_soft_float");
+  builtin_undef ("__loongarch_hard_float");
   if (TARGET_DOUBLE_FLOAT_ABI || TARGET_SINGLE_FLOAT_ABI)
 builtin_define ("__loongarch_hard_float=1");
   else
@@ -80,6 +120,7 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile)
 
 
   /* ISA Extensions.  */
+  builtin_undef ("__loongarch_frlen");
   if (TARGET_DOUBLE_FLOAT)
 builtin_define ("__loongarch_frlen=64");
   else if (TARGET_SINGLE_FLOAT)
@@ -87

[PATCH v3 0/4] Organize the code and fix PR118828 and PR118843.

2025-02-13 Thread Lulu Cheng
v1 -> v2:
 1. Move __loongarch_{arch,tune} _LOONGARCH_{ARCH,TUNE}
__loongarch_{div32,am_bh,amcas,ld_seq_sa} and 
__loongarch_version_major/__loongarch_version_minor to update function.
 2. Fixed PR118843.
 3. Add testsuites.

v2 -> v3:
  1. Modify test cases (pr118828-3.c pr118828-4.c).

Lulu Cheng (4):
  LoongArch: Move the function loongarch_register_pragmas to
loongarch-c.cc.
  LoongArch: Split the function loongarch_cpu_cpp_builtins into two
functions.
  LoongArch: After setting the compilation options, update the
predefined macros.
  LoongArch: When -mfpu=none, '__loongarch_frecipe' shouldn't be defined
[PR118843].

 gcc/config/loongarch/loongarch-c.cc   | 204 +-
 gcc/config/loongarch/loongarch-protos.h   |   1 +
 gcc/config/loongarch/loongarch-target-attr.cc |  48 -
 .../gcc.target/loongarch/pr118828-2.c |  30 +++
 .../gcc.target/loongarch/pr118828-3.c |  32 +++
 .../gcc.target/loongarch/pr118828-4.c |  32 +++
 gcc/testsuite/gcc.target/loongarch/pr118828.c |  34 +++
 gcc/testsuite/gcc.target/loongarch/pr118843.c |   6 +
 8 files changed, 287 insertions(+), 100 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828-2.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828-3.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828-4.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118843.c

-- 
2.34.1



[PATCH v3 3/4] LoongArch: After setting the compilation options, update the predefined macros.

2025-02-13 Thread Lulu Cheng
PR target/118828

gcc/ChangeLog:

* config/loongarch/loongarch-c.cc (loongarch_pragma_target_parse):
Update the predefined macros.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/pr118828.c: New test.
* gcc.target/loongarch/pr118828-2.c: New test.
* gcc.target/loongarch/pr118828-3.c: New test.
* gcc.target/loongarch/pr118828-4.c: New test.

---
 gcc/config/loongarch/loongarch-c.cc   | 14 
 .../gcc.target/loongarch/pr118828-2.c | 30 
 .../gcc.target/loongarch/pr118828-3.c | 32 +
 .../gcc.target/loongarch/pr118828-4.c | 32 +
 gcc/testsuite/gcc.target/loongarch/pr118828.c | 34 +++
 5 files changed, 142 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828-2.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828-3.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828-4.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828.c

diff --git a/gcc/config/loongarch/loongarch-c.cc 
b/gcc/config/loongarch/loongarch-c.cc
index 9a8de1ec381..66ae77ad665 100644
--- a/gcc/config/loongarch/loongarch-c.cc
+++ b/gcc/config/loongarch/loongarch-c.cc
@@ -27,6 +27,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tm.h"
 #include "c-family/c-common.h"
 #include "cpplib.h"
+#include "c-family/c-pragma.h"
 #include "tm_p.h"
 
 #define preprocessing_asm_p() (cpp_get_options (pfile)->lang == CLK_ASM)
@@ -212,6 +213,19 @@ loongarch_pragma_target_parse (tree args, tree pop_target)
 
   loongarch_reset_previous_fndecl ();
 
+  /* For the definitions, ensure all newly defined macros are considered
+ as used for -Wunused-macros.  There is no point warning about the
+ compiler predefined macros.  */
+  cpp_options *cpp_opts = cpp_get_options (parse_in);
+  unsigned char saved_warn_unused_macros = cpp_opts->warn_unused_macros;
+  cpp_opts->warn_unused_macros = 0;
+
+  cpp_force_token_locations (parse_in, BUILTINS_LOCATION);
+  loongarch_update_cpp_builtins (parse_in);
+  cpp_stop_forcing_token_locations (parse_in);
+
+  cpp_opts->warn_unused_macros = saved_warn_unused_macros;
+
   /* If we're popping or reseting make sure to update the globals so that
  the optab availability predicates get recomputed.  */
   if (pop_target)
diff --git a/gcc/testsuite/gcc.target/loongarch/pr118828-2.c 
b/gcc/testsuite/gcc.target/loongarch/pr118828-2.c
new file mode 100644
index 000..3d32fcc15c9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/pr118828-2.c
@@ -0,0 +1,30 @@
+/* { dg-do preprocess } */
+/* { dg-options "-mno-lsx" } */
+
+#ifdef __loongarch_sx
+#error LSX should not be available here
+#endif
+
+#ifdef __loongarch_simd_width
+#error simd width should not be available here
+#endif
+
+#pragma GCC push_options
+#pragma GCC target("lsx")
+#ifndef __loongarch_sx
+#error LSX should be available here
+#endif
+#ifndef __loongarch_simd_width
+#error simd width should be available here
+#elif __loongarch_simd_width != 128
+#error simd width should be 128
+#endif
+#pragma GCC pop_options
+
+#ifdef __loongarch_sx
+#error LSX should become unavailable again
+#endif
+
+#ifdef __loongarch_simd_width
+#error simd width should become unavailable again
+#endif
diff --git a/gcc/testsuite/gcc.target/loongarch/pr118828-3.c 
b/gcc/testsuite/gcc.target/loongarch/pr118828-3.c
new file mode 100644
index 000..31ab8e59a3f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/pr118828-3.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=loongarch64" } */
+/* { dg-final { scan-assembler "t1: loongarch64" } } */
+/* { dg-final { scan-assembler "t2: la64v1.1" } } */
+/* { dg-final { scan-assembler "t3: loongarch64" } } */
+
+#ifndef __loongarch_arch
+#error __loongarch_arch should be available here
+#endif
+
+void
+t1 (void)
+{
+  asm volatile ("# t1: " __loongarch_arch);
+}
+
+#pragma GCC push_options
+#pragma GCC target("arch=la64v1.1")
+
+void
+t2 (void)
+{
+  asm volatile ("# t2: " __loongarch_arch);
+}
+
+#pragma GCC pop_options
+
+void
+t3 (void)
+{
+  asm volatile ("# t3: " __loongarch_arch);
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/pr118828-4.c 
b/gcc/testsuite/gcc.target/loongarch/pr118828-4.c
new file mode 100644
index 000..77587ee5614
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/pr118828-4.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=loongarch64 -mtune=la464" } */
+/* { dg-final { scan-assembler "t1: la464" } } */
+/* { dg-final { scan-assembler "t2: la664" } } */
+/* { dg-final { scan-assembler "t3: la464" } } */
+
+#ifndef __loongarch_tune
+#error __loongarch_tune should be available here
+#endif
+
+void
+t1 (void)
+{
+  asm volatile ("# t1: " __loongarch_tune);
+}
+
+#pragma GCC push_options
+#pragma GCC target("tune=la664")
+
+void
+t2 (void)
+{
+  asm volatile ("# t2: " __loongarch_tune);
+}
+
+#pragma GCC pop_options

[PATCH v3 4/4] LoongArch: When -mfpu=none, '__loongarch_frecipe' shouldn't be defined [PR118843].

2025-02-13 Thread Lulu Cheng
PR target/118843

gcc/ChangeLog:

* config/loongarch/loongarch-c.cc
(loongarch_update_cpp_builtins): Fix macro definition issues.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/pr118843.c: New test.

---
 gcc/config/loongarch/loongarch-c.cc   | 27 ++-
 gcc/testsuite/gcc.target/loongarch/pr118843.c |  6 +
 2 files changed, 21 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118843.c

diff --git a/gcc/config/loongarch/loongarch-c.cc 
b/gcc/config/loongarch/loongarch-c.cc
index 66ae77ad665..effdcf0e255 100644
--- a/gcc/config/loongarch/loongarch-c.cc
+++ b/gcc/config/loongarch/loongarch-c.cc
@@ -129,9 +129,6 @@ loongarch_update_cpp_builtins (cpp_reader *pfile)
   else
 builtin_define ("__loongarch_frlen=0");
 
-  loongarch_def_or_undef (TARGET_HARD_FLOAT && ISA_HAS_FRECIPE,
- "__loongarch_frecipe", pfile);
-
   loongarch_def_or_undef (ISA_HAS_LSX, "__loongarch_simd", pfile);
   loongarch_def_or_undef (ISA_HAS_LSX, "__loongarch_sx", pfile);
   loongarch_def_or_undef (ISA_HAS_LASX, "__loongarch_asx", pfile);
@@ -149,17 +146,23 @@ loongarch_update_cpp_builtins (cpp_reader *pfile)
   int max_v_major = 1, max_v_minor = 0;
 
   for (int i = 0; i < N_EVO_FEATURES; i++)
-if (la_target.isa.evolution & la_evo_feature_masks[i])
-  {
-   builtin_define (la_evo_macro_name[i]);
+{
+  builtin_undef (la_evo_macro_name[i]);
 
-   int major = la_evo_version_major[i],
-   minor = la_evo_version_minor[i];
+  if (la_target.isa.evolution & la_evo_feature_masks[i]
+ && (la_evo_feature_masks[i] != OPTION_MASK_ISA_FRECIPE
+ || TARGET_HARD_FLOAT))
+   {
+ builtin_define (la_evo_macro_name[i]);
 
-   max_v_major = major > max_v_major ? major : max_v_major;
-   max_v_minor = major == max_v_major
- ? (minor > max_v_minor ? minor : max_v_minor) : max_v_minor;
-  }
+ int major = la_evo_version_major[i],
+ minor = la_evo_version_minor[i];
+
+ max_v_major = major > max_v_major ? major : max_v_major;
+ max_v_minor = major == max_v_major
+   ? (minor > max_v_minor ? minor : max_v_minor) : max_v_minor;
+   }
+}
 
   /* Find the minimum ISA version required to run the target program.  */
   builtin_undef ("__loongarch_version_major");
diff --git a/gcc/testsuite/gcc.target/loongarch/pr118843.c 
b/gcc/testsuite/gcc.target/loongarch/pr118843.c
new file mode 100644
index 000..30372b8ffe6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/pr118843.c
@@ -0,0 +1,6 @@
+/* { dg-do preprocess } */
+/* { dg-options "-mfrecipe -mfpu=none" } */
+
+#ifdef __loongarch_frecipe
+#error __loongarch_frecipe should not be available here
+#endif
-- 
2.34.1



[PATCH v3 1/4] LoongArch: Move the function loongarch_register_pragmas to loongarch-c.cc.

2025-02-13 Thread Lulu Cheng
gcc/ChangeLog:

* config/loongarch/loongarch-target-attr.cc
(loongarch_pragma_target_parse): Move to ...
(loongarch_register_pragmas): Move to ...
* config/loongarch/loongarch-c.cc
(loongarch_pragma_target_parse): ... here.
(loongarch_register_pragmas): ... here.
* config/loongarch/loongarch-protos.h
(loongarch_process_target_attr): Declare.

---
 gcc/config/loongarch/loongarch-c.cc   | 51 +++
 gcc/config/loongarch/loongarch-protos.h   |  1 +
 gcc/config/loongarch/loongarch-target-attr.cc | 48 -
 3 files changed, 52 insertions(+), 48 deletions(-)

diff --git a/gcc/config/loongarch/loongarch-c.cc 
b/gcc/config/loongarch/loongarch-c.cc
index c95c0f373be..5d8c02e094b 100644
--- a/gcc/config/loongarch/loongarch-c.cc
+++ b/gcc/config/loongarch/loongarch-c.cc
@@ -23,9 +23,11 @@ along with GCC; see the file COPYING3.  If not see
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
+#include "target.h"
 #include "tm.h"
 #include "c-family/c-common.h"
 #include "cpplib.h"
+#include "tm_p.h"
 
 #define preprocessing_asm_p() (cpp_get_options (pfile)->lang == CLK_ASM)
 #define builtin_define(TXT) cpp_define (pfile, TXT)
@@ -145,3 +147,52 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile)
   builtin_define_with_int_value ("_LOONGARCH_SPFPSET", 32);
 
 }
+
+/* Hook to validate the current #pragma GCC target and set the state, and
+   update the macros based on what was changed.  If ARGS is NULL, then
+   POP_TARGET is used to reset the options.  */
+
+static bool
+loongarch_pragma_target_parse (tree args, tree pop_target)
+{
+  /* If args is not NULL then process it and setup the target-specific
+ information that it specifies.  */
+  if (args)
+{
+  if (!loongarch_process_target_attr (args, NULL))
+   return false;
+
+  loongarch_option_override_internal (&la_target,
+ &global_options,
+ &global_options_set);
+}
+
+  /* args is NULL, restore to the state described in pop_target.  */
+  else
+{
+  pop_target = pop_target ? pop_target : target_option_default_node;
+  cl_target_option_restore (&global_options, &global_options_set,
+   TREE_TARGET_OPTION (pop_target));
+}
+
+  target_option_current_node
+= build_target_option_node (&global_options, &global_options_set);
+
+  loongarch_reset_previous_fndecl ();
+
+  /* If we're popping or resetting, make sure to update the globals so that
+ the optab availability predicates get recomputed.  */
+  if (pop_target)
+loongarch_save_restore_target_globals (pop_target);
+
+  return true;
+}
+
+/* Implement REGISTER_TARGET_PRAGMAS.  */
+
+void
+loongarch_register_pragmas (void)
+{
+  /* Update pragma hook to allow parsing #pragma GCC target.  */
+  targetm.target_option.pragma_parse = loongarch_pragma_target_parse;
+}
diff --git a/gcc/config/loongarch/loongarch-protos.h 
b/gcc/config/loongarch/loongarch-protos.h
index b99f949a004..e7b318143bf 100644
--- a/gcc/config/loongarch/loongarch-protos.h
+++ b/gcc/config/loongarch/loongarch-protos.h
@@ -219,4 +219,5 @@ extern void loongarch_option_override_internal (struct 
loongarch_target *, struc
 extern void loongarch_reset_previous_fndecl (void);
 extern void loongarch_save_restore_target_globals (tree new_tree);
 extern void loongarch_register_pragmas (void);
+extern bool loongarch_process_target_attr (tree args, tree fndecl);
 #endif /* ! GCC_LOONGARCH_PROTOS_H */
diff --git a/gcc/config/loongarch/loongarch-target-attr.cc 
b/gcc/config/loongarch/loongarch-target-attr.cc
index cee7031ca1e..cb537446dff 100644
--- a/gcc/config/loongarch/loongarch-target-attr.cc
+++ b/gcc/config/loongarch/loongarch-target-attr.cc
@@ -422,51 +422,3 @@ loongarch_option_valid_attribute_p (tree fndecl, tree, 
tree args, int)
   return ret;
 }
 
-/* Hook to validate the current #pragma GCC target and set the state, and
-   update the macros based on what was changed.  If ARGS is NULL, then
-   POP_TARGET is used to reset the options.  */
-
-static bool
-loongarch_pragma_target_parse (tree args, tree pop_target)
-{
-  /* If args is not NULL then process it and setup the target-specific
- information that it specifies.  */
-  if (args)
-{
-  if (!loongarch_process_target_attr (args, NULL))
-   return false;
-
-  loongarch_option_override_internal (&la_target,
- &global_options,
- &global_options_set);
-}
-
-  /* args is NULL, restore to the state described in pop_target.  */
-  else
-{
-  pop_target = pop_target ? pop_target : target_option_default_node;
-  cl_target_option_restore (&global_options, &global_options_set,
-   TREE_TARGET_OPTION (pop_target));
-}
-
-  target_option_current_node
-= build_target_option_node (&globa

Re: [PATCH htdocs] bugs: mention ASAN too

2025-02-13 Thread Sam James
Gerald Pfeifer  writes:

> On Mon, 11 Nov 2024, Sam James wrote:
>> Request that reporters try `-fsanitize=address,undefined` rather than
>> just `-fsanitize=undefined` when reporting bugs. We get invalid bug
>> reports which ASAN would've caught sometimes, even if it's less often
>> than where UBSAN would help.
>
> I don't have a strong opinion on this and would prefer someone else to 
> chime in. That said, if we don't hear from someone else by early next 
> week, please go ahead and push.

Done now - sorry, this had slipped my mind.

>
>
> Just one (naive) question: Are there instances where -fsanitize=undefined 
> may be available/working where -fsanitize=address,undefined may be not?
>
> If so, perhaps provide both invocations as in
>-fsanitize=undefined or -fsanitize=address,un...
> ?
>
> Your call; just a thought.
>

It's a good question - AFAIK there aren't any such cases. It is
possible, but rather remote, that the instrumentation from one *but not*
the other inhibits a compiler bug in some cases (or just user UB). I can
include both if you think that's worth doing, but I tend to think it'll
make the text too verbose.

> Gerald
>
>
>>  htdocs/bugs/index.html | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/htdocs/bugs/index.html b/htdocs/bugs/index.html
>> index c7d2f310..d6556b26 100644
>> --- a/htdocs/bugs/index.html
>> +++ b/htdocs/bugs/index.html
>> @@ -52,7 +52,7 @@ try a current release or development snapshot.
>>  with gcc -Wall -Wextra and see whether this shows anything
>>  wrong with your code.  Similarly, if compiling with
>>  -fno-strict-aliasing -fwrapv -fno-aggressive-loop-optimizations
>> -makes a difference, or if compiling with -fsanitize=undefined
>> +makes a difference, or if compiling with -fsanitize=address,undefined
>>  produces any run-time errors, then your code is probably not correct.
>>  


Re: [PATCH 0/2] x86: Add a pass to fold tail call

2025-02-13 Thread Uros Bizjak
On Thu, Feb 13, 2025 at 1:58 AM H.J. Lu  wrote:
>
> x86 conditional branch (jcc) target can be either a label or a symbol.
> Add a pass to fold tail call with jcc by turning:
>
> jcc .L6
> ...
> .L6:
> jmp tailcall
>
> into:
>
> jcc tailcall
>
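[Editorial note: a minimal source shape, with hypothetical names not taken from the patch, that typically produces the conditional-branch-to-sibcall pattern described above when compiled at -O2:]

```cpp
#include <cassert>

// Kept out of line so the compiler emits a real call that can become a
// sibling (tail) call instead of being inlined away.
[[gnu::noinline]] static int bar(int x) { return x * 2; }

// At -O2 the call to bar in the x == 0 arm is typically emitted as
// "jcc .L6 ... .L6: jmp bar", the exact shape the proposed pass folds
// into "jcc bar".  Purely illustrative.
int foo(int x) {
  if (x == 0)
    return bar(x + 7);
  return x + 1;
}
```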
> After basic block reordering pass, conditional branches look like
>
> (jump_insn 7 6 14 2 (set (pc)
> (if_then_else (eq (reg:CCZ 17 flags)
> (const_int 0 [0]))
> (label_ref:DI 23)
> (pc))) "x.c":8:5 1458 {jcc}
>  (expr_list:REG_DEAD (reg:CCZ 17 flags)
> (int_list:REG_BR_PROB 217325348 (nil)))
> ...
> (code_label 23 20 8 4 4 (nil) [1 uses])
> (note 8 23 9 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
> (call_insn/j 9 8 10 4 (call (mem:QI (symbol_ref:DI ("bar") [flags 0x41] <function_decl 0x7f4cff3c0b00 bar>) [0 bar S1 A8])
> (const_int 0 [0])) "x.c":8:14 discrim 1 1469 {sibcall_di}
>  (expr_list:REG_CALL_DECL (symbol_ref:DI ("bar") [flags 0x41] <function_decl 0x7f4cff3c0b00 bar>)
> (nil))
> (nil))
>
> If the branch edge destination is a basic block with only a direct
> sibcall, change the jcc target to the sibcall target and decrement
> the destination basic block entry label use count.  Even though the
> destination basic block is unused, it must be kept since it is required
> by the RTL control flow checks, and JUMP_LABEL of the conditional jump can
> only point to a code label, not a code symbol.  Dummy sibcall patterns
> are added so that sibcalls in basic blocks whose entry label use count
> is 0 won't be generated.

This reads like you are trying to get around some checks in RTL
control flow. So, either changes you are performing to RTX stream are
not allowed (these checks are here for a reason), or the
infrastructure is not (yet) prepared to handle this functionality.
Either way, please discuss with infrastructure maintainers (CC'd)
first if the approach is correct and if these changes to RTX stream
are allowed by the infra.

Thanks,
Uros.

>
> Jump tables like
>
> foo:
> .cfi_startproc
> cmpl$4, %edi
> ja  .L1
> movl%edi, %edi
> jmp *.L4(,%rdi,8)
> .section.rodata
> .L4:
> .quad   .L8
> .quad   .L7
> .quad   .L6
> .quad   .L5
> .quad   .L3
> .text
> .L5:
> jmp bar3
> .L3:
> jmp bar4
> .L8:
> jmp bar0
> .L7:
> jmp bar1
> .L6:
> jmp bar2
> .L1:
> ret
> .cfi_endproc
>
> can also be changed to:
>
> foo:
> .cfi_startproc
> cmpl$4, %edi
> ja  .L1
> movl%edi, %edi
> jmp *.L4(,%rdi,8)
> .section.rodata
> .L4:
> .quad   bar0
> .quad   bar1
> .quad   bar2
> .quad   bar3
> .quad   bar4
> .text
> .L1:
> ret
> .cfi_endproc
>
> After basic block reordering pass, jump tables look like:
>
> (jump_table_data 16 15 17 (addr_vec:DI [
> (label_ref:DI 18)
> (label_ref:DI 22)
> (label_ref:DI 26)
> (label_ref:DI 30)
> (label_ref:DI 34)
> ]))
> ...
> (code_label 30 17 31 4 5 (nil) [1 uses])
> (note 31 30 32 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
> (call_insn/j 32 31 33 4 (call (mem:QI (symbol_ref:DI ("bar3") [flags 0x41]  
> ) [0 bar3 S1 A8])
> (const_int 0 [0])) "j.c":15:13 1469 {sibcall_di}
>  (expr_list:REG_CALL_DECL (symbol_ref:DI ("bar3") [flags 0x41]  
> )
> (nil))
> (nil))
>
> If the jump table entry points to a target basic block with only a direct
> sibcall, change the entry to point to the sibcall target and decrement
> the target basic block entry label use count.  If the target basic block
> isn't kept for JUMP_LABEL of the conditional tailcall, delete it if its
> entry label use count is 0.
>
> Update final_scan_insn_1 to skip a label if its use count is 0 and
> support symbol reference in jump table.  Update create_trace_edges to
> skip symbol reference in jump table.
>
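[Editorial note: a switch whose arms are all tail calls is the usual source of such a jump table of jmp stubs; a hypothetical sketch, not from the patch:]

```cpp
#include <cassert>

// noinline keeps each arm a genuine call that the compiler may emit as
// a "jmp barN" stub reached through the jump table, as in the assembly
// shown above.
[[gnu::noinline]] static int bar0(int x) { return x + 0; }
[[gnu::noinline]] static int bar1(int x) { return x + 1; }
[[gnu::noinline]] static int bar2(int x) { return x + 2; }
[[gnu::noinline]] static int bar3(int x) { return x + 3; }
[[gnu::noinline]] static int bar4(int x) { return x + 4; }

// At -O2 this typically compiles to a jump table; each case becomes a
// tail-call stub that the proposed pass can fold into the table entry.
int dispatch(unsigned i, int x) {
  switch (i) {
    case 0: return bar0(x);
    case 1: return bar1(x);
    case 2: return bar2(x);
    case 3: return bar3(x);
    case 4: return bar4(x);
  }
  return x;  // the ".L1: ret" path for i > 4
}
```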
> H.J. Lu (2):
>   x86: Add a pass to fold tail call
>   x86: Fold sibcall targets into jump table
>
>  gcc/config/i386/i386-features.cc   | 274 +
>  gcc/config/i386/i386-passes.def|   1 +
>  gcc/config/i386/i386-protos.h  |   3 +
>  gcc/config/i386/i386.cc|  12 +
>  gcc/config/i386/i386.md|  57 -
>  gcc/config/i386/predicates.md  |   4 +
>  gcc/dwarf2cfi.cc   |   7 +-
>  gcc/final.cc   |  26 +-
>  gcc/testsuite/gcc.target/i386/pr14721-1a.c |  54 
>  gcc/testsuite/gcc.target/i386/pr14721-1b.c |  37 +++
>  gcc/testsuite/gcc.target/i386/pr14721-1c.c |  37 +++
>  gcc/testsuite/gcc.target/i386/pr14721-2a.c |  58 +
>  gcc/testsuite/gcc.target/i386/pr14721-2b.c |  41 +++
>  gcc/testsuite/gcc.target/i386/pr14721-2c.c |  43 
>  gcc/testsuite/

[wwwdocs,applied] Mention -mno-call-main

2025-02-13 Thread Georg-Johann Lay

Applied the following avr news to gcc-15:

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index 7638d3d5..41425257 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -500,6 +500,10 @@ asm (".text; %cc0: mov %cc2, %%r0; .previous;"
        >-msplit-ldst</a> and
     <a href="https://gcc.gnu.org/onlinedocs/gcc/AVR-Options.html#index-msplit-bit-shift"
        >-msplit-bit-shift</a>.</li>
+  <li>Support has been added for the new option
+    <a href="https://gcc.gnu.org/onlinedocs/gcc/AVR-Options.html#index-mno-call-main"
+       >-mno-call-main</a>.  Instead of calling <code>main</code>,
+    it will be located in section <code>.init9</code>.</li>
 

 IA-32/x86-64

Johann


Re: [PATCH 0/2] x86: Add a pass to fold tail call

2025-02-13 Thread H.J. Lu
On Thu, Feb 13, 2025 at 5:31 PM Uros Bizjak  wrote:
>
> On Thu, Feb 13, 2025 at 1:58 AM H.J. Lu  wrote:
> >
> > x86 conditional branch (jcc) target can be either a label or a symbol.
> > Add a pass to fold tail call with jcc by turning:
> >
> > jcc .L6
> > ...
> > .L6:
> > jmp tailcall
> >
> > into:
> >
> > jcc tailcall
> >
> > After basic block reordering pass, conditional branches look like
> >
> > (jump_insn 7 6 14 2 (set (pc)
> > (if_then_else (eq (reg:CCZ 17 flags)
> > (const_int 0 [0]))
> > (label_ref:DI 23)
> > (pc))) "x.c":8:5 1458 {jcc}
> >  (expr_list:REG_DEAD (reg:CCZ 17 flags)
> > (int_list:REG_BR_PROB 217325348 (nil)))
> > ...
> > (code_label 23 20 8 4 4 (nil) [1 uses])
> > (note 8 23 9 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
> > (call_insn/j 9 8 10 4 (call (mem:QI (symbol_ref:DI ("bar") [flags 0x41] <function_decl 0x7f4cff3c0b00 bar>) [0 bar S1 A8])
> > (const_int 0 [0])) "x.c":8:14 discrim 1 1469 {sibcall_di}
> >  (expr_list:REG_CALL_DECL (symbol_ref:DI ("bar") [flags 0x41] <function_decl 0x7f4cff3c0b00 bar>)
> > (nil))
> > (nil))
> >
> > If the branch edge destination is a basic block with only a direct
> > sibcall, change the jcc target to the sibcall target and decrement
> > the destination basic block entry label use count.  Even though the
> > destination basic block is unused, it must be kept since it is required
> > by the RTL control flow checks, and JUMP_LABEL of the conditional jump can
> > only point to a code label, not a code symbol.  Dummy sibcall patterns
> > are added so that sibcalls in basic blocks whose entry label use count
> > is 0 won't be generated.
>
> This reads like you are trying to get around some checks in RTL
> control flow. So, either changes you are performing to RTX stream are
> not allowed (these checks are here for a reason), or the
> infrastructure is not (yet) prepared to handle this functionality.

The main issue is that because JUMP_LABEL of the conditional
jump can only point to a code label, not a code symbol, I have no choice
but to keep the label even if it is unused.  If the infrastructure allowed
a symbol reference in all places where a label reference is allowed, only
x86 backend changes would be needed.

BTW, some targets, like arm, don't set the use count on referenced
labels.  I will add a target hook to opt out of the zero-use-count label
handling.

> Either way, please discuss with infrastructure maintainers (CC'd)
> first if the approach is correct and if these changes to RTX stream
> are allowed by the infra.
>
> Thanks,
> Uros.
>
> >
> > Jump tables like
> >
> > foo:
> > .cfi_startproc
> > cmpl$4, %edi
> > ja  .L1
> > movl%edi, %edi
> > jmp *.L4(,%rdi,8)
> > .section.rodata
> > .L4:
> > .quad   .L8
> > .quad   .L7
> > .quad   .L6
> > .quad   .L5
> > .quad   .L3
> > .text
> > .L5:
> > jmp bar3
> > .L3:
> > jmp bar4
> > .L8:
> > jmp bar0
> > .L7:
> > jmp bar1
> > .L6:
> > jmp bar2
> > .L1:
> > ret
> > .cfi_endproc
> >
> > can also be changed to:
> >
> > foo:
> > .cfi_startproc
> > cmpl$4, %edi
> > ja  .L1
> > movl%edi, %edi
> > jmp *.L4(,%rdi,8)
> > .section.rodata
> > .L4:
> > .quad   bar0
> > .quad   bar1
> > .quad   bar2
> > .quad   bar3
> > .quad   bar4
> > .text
> > .L1:
> > ret
> > .cfi_endproc
> >
> > After basic block reordering pass, jump tables look like:
> >
> > (jump_table_data 16 15 17 (addr_vec:DI [
> > (label_ref:DI 18)
> > (label_ref:DI 22)
> > (label_ref:DI 26)
> > (label_ref:DI 30)
> > (label_ref:DI 34)
> > ]))
> > ...
> > (code_label 30 17 31 4 5 (nil) [1 uses])
> > (note 31 30 32 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
> > (call_insn/j 32 31 33 4 (call (mem:QI (symbol_ref:DI ("bar3") [flags 0x41]  
> > ) [0 bar3 S1 A8])
> > (const_int 0 [0])) "j.c":15:13 1469 {sibcall_di}
> >  (expr_list:REG_CALL_DECL (symbol_ref:DI ("bar3") [flags 0x41]  
> > )
> > (nil))
> > (nil))
> >
> > If the jump table entry points to a target basic block with only a direct
> > sibcall, change the entry to point to the sibcall target and decrement
> > the target basic block entry label use count.  If the target basic block
> > isn't kept for JUMP_LABEL of the conditional tailcall, delete it if its
> > entry label use count is 0.
> >
> > Update final_scan_insn_1 to skip a label if its use count is 0 and
> > support symbol reference in jump table.  Update create_trace_edges to
> > skip symbol reference in jump table.
> >
> > H.J. Lu (2):
> >   x86: Add a pass to fold tail call
> >   x86: Fold sibcall targets into jump table
> >
> >  gcc/config/i386/i386-features.cc 

[PATCH v2 8/8] LoongArch: Implement [su]dot_prod* for LSX and LASX modes

2025-02-13 Thread Xi Ruoyao
Despite being just a special case of "a widening product whose result
is used for reduction," having these standard names allows the compiler
to recognize the dot product pattern earlier, which may be beneficial
for optimization.  Also fix some test failures in these test cases:

- gcc.dg/vect/vect-reduc-chain-2.c
- gcc.dg/vect/vect-reduc-chain-3.c
- gcc.dg/vect/vect-reduc-chain-dot-slp-3.c
- gcc.dg/vect/vect-reduc-chain-dot-slp-4.c

gcc/ChangeLog:

* config/loongarch/simd.md (wvec_half): New define_mode_attr.
(dot_prod): New define_expand.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/wide-mul-reduc-2.c (dg-final): Scan
DOT_PROD_EXPR in optimized tree.
---
 gcc/config/loongarch/simd.md  | 29 +++
 .../gcc.target/loongarch/wide-mul-reduc-2.c   |  3 +-
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/simd.md b/gcc/config/loongarch/simd.md
index 661f5dc8dda..45d2bcaec2e 100644
--- a/gcc/config/loongarch/simd.md
+++ b/gcc/config/loongarch/simd.md
@@ -90,6 +90,12 @@ (define_mode_attr WVEC_HALF [(V2DI "V1TI") (V4DI "V2TI")
 (V8HI "V4SI") (V16HI "V8SI")
 (V16QI "V8HI") (V32QI "V16HI")])
 
+;; Lower-case version.
+(define_mode_attr wvec_half [(V2DI "v1ti") (V4DI "v2ti")
+(V4SI "v2di") (V8SI "v4di")
+(V8HI "v4si") (V16HI "v8si")
+(V16QI "v8hi") (V32QI "v16hi")])
+
 ;; Integer vector modes with the same length and unit size as a mode.
 (define_mode_attr VIMODE [(V2DI "V2DI") (V4SI "V4SI")
  (V8HI "V8HI") (V16QI "V16QI")
@@ -786,6 +792,29 @@ (define_expand 
"_vmaddw__"
   DONE;
 })
 
+(define_expand "dot_prod"
+  [(match_operand: 0 "register_operand" "=f,f")
+   (match_operand:IVEC   1 "register_operand" " f,f")
+   (match_operand:IVEC   2 "register_operand" " f,f")
+   (match_operand: 3 "reg_or_0_operand" " 0,YG")
+   (any_extend (const_int 0))]
+  ""
+{
+  auto [op0, op1, op2, op3] = operands;
+
+  if (op3 == CONST0_RTX (mode))
+emit_insn (
+  gen__vmulwev__ (op0, op1, op2));
+  else
+emit_insn (
+  gen__vmaddwev__ (op0, op3, op1,
+  op2));
+
+  emit_insn (
+gen__vmaddwod__ (op0, op0, op1, op2));
+  DONE;
+})
+
 (define_insn "simd_maddw_evod__hetero"
   [(set (match_operand: 0 "register_operand" "=f")
(plus:
diff --git a/gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-2.c 
b/gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-2.c
index 07a7601888a..61e92e58fc3 100644
--- a/gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-2.c
+++ b/gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-2.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mlasx" } */
+/* { dg-options "-O2 -mlasx -fdump-tree-optimized" } */
 /* { dg-final { scan-assembler "xvmaddw(ev|od)\\.d\\.w" } } */
+/* { dg-final { scan-tree-dump "DOT_PROD_EXPR" "optimized" } } */
 
 typedef __INT32_TYPE__ i32;
 typedef __INT64_TYPE__ i64;
-- 
2.48.1
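[Editorial note: a scalar model of the expansion above, my sketch rather than the patch's code: start from the initial accumulator (operand 3), accumulate even-indexed lanes with a vmaddwev-style step and odd-indexed lanes with a vmaddwod-style step, widening 32-bit products into a 64-bit accumulator:]

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of the dot_prod expansion for 32-bit elements.
int64_t dot_prod_i32(const int32_t *x, const int32_t *y, int n,
                     int64_t acc) {
  for (int i = 0; i < n; i += 2)   // even lanes: vmaddwev-style step
    acc += (int64_t)x[i] * y[i];
  for (int i = 1; i < n; i += 2)   // odd lanes: vmaddwod-style step
    acc += (int64_t)x[i] * y[i];
  return acc;
}
```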



[PATCH v2 7/8] LoongArch: Implement vec_widen_mult_{even, odd}_* for LSX and LASX modes

2025-02-13 Thread Xi Ruoyao
Since PR116142 has been fixed, we can now add the standard names so the
compiler will generate better code when the result of a widening
product is reduced.

gcc/ChangeLog:

* config/loongarch/simd.md (even_odd): New define_int_attr.
(vec_widen_mult__): New define_expand.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/wide-mul-reduc-1.c: New test.
* gcc.target/loongarch/wide-mul-reduc-2.c: New test.
---
 gcc/config/loongarch/simd.md   | 16 
 .../gcc.target/loongarch/wide-mul-reduc-1.c| 18 ++
 .../gcc.target/loongarch/wide-mul-reduc-2.c| 17 +
 3 files changed, 51 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-1.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-2.c

diff --git a/gcc/config/loongarch/simd.md b/gcc/config/loongarch/simd.md
index b7a28f7b3f2..661f5dc8dda 100644
--- a/gcc/config/loongarch/simd.md
+++ b/gcc/config/loongarch/simd.md
@@ -630,6 +630,7 @@ (define_expand "cbranch4"
 ;; Operations on elements at even/odd indices.
 (define_int_iterator zero_one [0 1])
 (define_int_attr ev_od [(0 "ev") (1 "od")])
+(define_int_attr even_odd [(0 "even") (1 "odd")])
 
 ;; Integer widening add/sub/mult.
 (define_insn "simd_w_evod__"
@@ -665,6 +666,21 @@ (define_expand 
"_vw__"
   DONE;
 })
 
+(define_expand "vec_widen_mult__"
+  [(match_operand: 0 "register_operand" "=f")
+   (match_operand:IVEC   1 "register_operand" " f")
+   (match_operand:IVEC   2 "register_operand" " f")
+   (any_extend (const_int 0))
+   (const_int zero_one)]
+  ""
+{
+  emit_insn (
+gen__vmulw__ (operands[0],
+operands[1],
+operands[2]));
+  DONE;
+})
+
 (define_insn "simd_w_evod__hetero"
   [(set (match_operand: 0 "register_operand" "=f")
(addsubmul:
diff --git a/gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-1.c 
b/gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-1.c
new file mode 100644
index 000..d6e0da59dc4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mlasx -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump "WIDEN_MULT_EVEN_EXPR" "optimized" } } */
+/* { dg-final { scan-tree-dump "WIDEN_MULT_ODD_EXPR" "optimized" } } */
+
+typedef __INT32_TYPE__ i32;
+typedef __INT64_TYPE__ i64;
+
+i32 x[8], y[8];
+
+i64
+test (void)
+{
+  i64 ret = 0;
+  for (int i = 0; i < 8; i++)
+ret ^= (i64) x[i] * y[i];
+  return ret;
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-2.c 
b/gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-2.c
new file mode 100644
index 000..07a7601888a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mlasx" } */
+/* { dg-final { scan-assembler "xvmaddw(ev|od)\\.d\\.w" } } */
+
+typedef __INT32_TYPE__ i32;
+typedef __INT64_TYPE__ i64;
+
+i32 x[8], y[8];
+
+i64
+test (void)
+{
+  i64 ret = 0;
+  for (int i = 0; i < 8; i++)
+ret += (i64) x[i] * y[i];
+  return ret;
+}
-- 
2.48.1



Re: [PATCH] libstdc++: Implement P3138R5 views::cache_latest

2025-02-13 Thread Jonathan Wakely
On Tue, 11 Feb 2025 at 05:59, Patrick Palka  wrote:
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
>
> -- >8 --
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/version.def (ranges_cache_latest): Define.
> * include/bits/version.h: Regenerate.
> * include/std/ranges (cache_latest_view): Define for C++26.
> (cache_latest_view::_Iterator): Likewise.
> (cache_latest_view::_Sentinel): Likewise.
> (views::__detail::__can_cache_latest): Likewise.
> (views::_CacheLatest, views::cache_latest): Likewise.
> * testsuite/std/ranges/adaptors/cache_latest/1.cc: New test.

The test is missing from the patch.

> ---
>  libstdc++-v3/include/bits/version.def |   8 ++
>  libstdc++-v3/include/bits/version.h   |  10 ++
>  libstdc++-v3/include/std/ranges   | 189 ++
>  3 files changed, 207 insertions(+)
>
> diff --git a/libstdc++-v3/include/bits/version.def 
> b/libstdc++-v3/include/bits/version.def
> index 002e560dc0d..6fb5db2e1fc 100644
> --- a/libstdc++-v3/include/bits/version.def
> +++ b/libstdc++-v3/include/bits/version.def
> @@ -1837,6 +1837,14 @@ ftms = {
>};
>  };
>
> +ftms = {
> +  name = ranges_cache_latest;
> +  values = {
> +v = 202411;
> +cxxmin = 26;
> +  };
> +};
> +
>  ftms = {
>name = ranges_concat;
>values = {
> diff --git a/libstdc++-v3/include/bits/version.h 
> b/libstdc++-v3/include/bits/version.h
> index 70de189b1e0..db61a396c45 100644
> --- a/libstdc++-v3/include/bits/version.h
> +++ b/libstdc++-v3/include/bits/version.h
> @@ -2035,6 +2035,16 @@
>  #endif /* !defined(__cpp_lib_is_virtual_base_of) && 
> defined(__glibcxx_want_is_virtual_base_of) */
>  #undef __glibcxx_want_is_virtual_base_of
>
> +#if !defined(__cpp_lib_ranges_cache_latest)
> +# if (__cplusplus >  202302L)
> +#  define __glibcxx_ranges_cache_latest 202411L
> +#  if defined(__glibcxx_want_all) || 
> defined(__glibcxx_want_ranges_cache_latest)
> +#   define __cpp_lib_ranges_cache_latest 202411L
> +#  endif
> +# endif
> +#endif /* !defined(__cpp_lib_ranges_cache_latest) && 
> defined(__glibcxx_want_ranges_cache_latest) */
> +#undef __glibcxx_want_ranges_cache_latest
> +
>  #if !defined(__cpp_lib_ranges_concat)
>  # if (__cplusplus >  202302L)
>  #  define __glibcxx_ranges_concat 202403L
> diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
> index 5c795a90fbc..db9a00be264 100644
> --- a/libstdc++-v3/include/std/ranges
> +++ b/libstdc++-v3/include/std/ranges
> @@ -58,6 +58,7 @@
>  #define __glibcxx_want_ranges_as_const
>  #define __glibcxx_want_ranges_as_rvalue
>  #define __glibcxx_want_ranges_cartesian_product
> +#define __glibcxx_want_ranges_cache_latest
>  #define __glibcxx_want_ranges_concat
>  #define __glibcxx_want_ranges_chunk
>  #define __glibcxx_want_ranges_chunk_by
> @@ -1534,6 +1535,8 @@ namespace views::__adaptor
> this->_M_payload._M_apply(_Optional_func{__f}, __i);
> return this->_M_get();
>   }
> +
> +   using _Optional_base<_Tp>::_M_reset;
>};
>
>  template
> @@ -10203,6 +10206,192 @@ namespace ranges
>  } // namespace ranges
>  #endif // __cpp_lib_ranges_concat
>
> +#if __cpp_lib_ranges_cache_latest // C++ >= 26
> +namespace ranges
> +{
> +  template
> +requires view<_Vp>
> +  class cache_latest_view : public view_interface>
> +  {
> +_Vp _M_base = _Vp();
> +
> +using __cache_t = conditional_t>,
> +   add_pointer_t>,
> +   range_reference_t<_Vp>>;

__conditional_t is cheaper to instantiate than conditional_t, so when
it doesn't affect the mangled name of a public symbol we should prefer
__conditional_t.

> +__detail::__non_propagating_cache<__cache_t> _M_cache;
> +
> +class _Iterator;
> +class _Sentinel;
> +
> +  public:
> +cache_latest_view() requires default_initializable<_Vp> = default;
> +
> +constexpr explicit
> +cache_latest_view(_Vp __base)
> +: _M_base(std::move(__base))
> +{ }
> +
> +constexpr _Vp
> +base() const & requires copy_constructible<_Vp>
> +{ return _M_base; }
> +
> +constexpr _Vp
> +base() &&
> +{ return std::move(_M_base); }
> +
> +constexpr auto
> +begin()
> +{ return _Iterator(*this); }
> +
> +constexpr auto
> +end()
> +{ return _Sentinel(*this); }
> +
> +constexpr auto
> +size() requires sized_range<_Vp>
> +{ return ranges::size(_M_base); }
> +
> +constexpr auto
> +size() const requires sized_range
> +{ return ranges::size(_M_base); }
> +  };
> +
> +  template
> +cache_latest_view(_Range&&) -> cache_latest_view>;
> +
> +  template
> +requires view<_Vp>
> +  class cache_latest_view<_Vp>::_Iterator
> +  {
> +cache_latest_view* _M_parent;
> +iterator_t<_Vp> _M_current;
> +
> +constexpr explicit
> +_Iterator(cache_latest_view& __parent)
> +: _M_parent(std::__addressof(__parent)),
> +  _M_current(ranges:
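[Editorial note: for context on what the adaptor buys you, a hand-rolled sketch of the caching behavior cache_latest provides — hypothetical types, not the libstdc++ implementation: repeated dereferences after the first reuse a cached value, and incrementing invalidates the cache:]

```cpp
#include <cassert>
#include <optional>

// An iterator-like object whose dereference is expensive and recomputed
// on every access (think transform_view with a costly callable).
struct ExpensiveIter {
  int pos = 0;
  int *calls = nullptr;           // counts how often the "work" runs
  int operator*() const { ++*calls; return pos * 2; }
  ExpensiveIter &operator++() { ++pos; return *this; }
};

// cache_latest-style wrapper: the first dereference after an increment
// computes and stores the value; later dereferences reuse it.
struct CachingIter {
  ExpensiveIter base;
  mutable std::optional<int> cache;
  int operator*() const {
    if (!cache)
      cache = *base;
    return *cache;
  }
  CachingIter &operator++() {
    ++base;
    cache.reset();                // invalidate on advance
    return *this;
  }
};
```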

Re: [PATCH] arm: gimple fold aes[ed] [PR114522]

2025-02-13 Thread Richard Earnshaw (lists)
On 12/02/2025 11:01, Christophe Lyon wrote:
> Almost a copy/paste from the recent aarch64 version of this patch,
> this one is a bit more intrusive because it also introduces
> arm_general_gimple_fold_builtin.
> 
> With this patch,
> gcc.target/arm/aes_xor_combine.c scan-assembler-not veor
> passes again.
> 
> gcc/ChangeLog:
> 
>   PR target/114522
>   * config/arm/arm-builtins.cc (arm_fold_aes_op): New function.
>   (arm_general_gimple_fold_builtin): New function.
>   * config/arm/arm-builtins.h (arm_general_gimple_fold_builtin): New
>   prototype.
>   * config/arm/arm.cc (arm_gimple_fold_builtin): Call
>   arm_general_gimple_fold_builtin as needed.

Thanks for picking this up.

OK.

R.

> ---
>  gcc/config/arm/arm-builtins.cc | 55 ++
>  gcc/config/arm/arm-builtins.h  |  1 +
>  gcc/config/arm/arm.cc  |  3 ++
>  3 files changed, 59 insertions(+)
> 
> diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
> index e860607686c..c56ab5db985 100644
> --- a/gcc/config/arm/arm-builtins.cc
> +++ b/gcc/config/arm/arm-builtins.cc
> @@ -45,6 +45,9 @@
>  #include "arm-builtins.h"
>  #include "stringpool.h"
>  #include "attribs.h"
> +#include "basic-block.h"
> +#include "gimple.h"
> +#include "ssa.h"
>  
>  #define SIMD_MAX_BUILTIN_ARGS 7
>  
> @@ -4053,4 +4056,56 @@ arm_cde_end_args (tree fndecl)
>  }
>  }
>  
> +/* Fold a call to vaeseq_u8 and vaesdq_u8.
> +   That is `vaeseq_u8 (x ^ y, 0)` gets folded
> +   into `vaeseq_u8 (x, y)`.*/
> +static gimple *
> +arm_fold_aes_op (gcall *stmt)
> +{
> +  tree arg0 = gimple_call_arg (stmt, 0);
> +  tree arg1 = gimple_call_arg (stmt, 1);
> +  if (integer_zerop (arg0))
> +arg0 = arg1;
> +  else if (!integer_zerop (arg1))
> +return nullptr;
> +  if (TREE_CODE (arg0) != SSA_NAME)
> +return nullptr;
> +  if (!has_single_use (arg0))
> +return nullptr;
> +  auto *s = dyn_cast (SSA_NAME_DEF_STMT (arg0));
> +  if (!s || gimple_assign_rhs_code (s) != BIT_XOR_EXPR)
> +return nullptr;
> +  gimple_call_set_arg (stmt, 0, gimple_assign_rhs1 (s));
> +  gimple_call_set_arg (stmt, 1, gimple_assign_rhs2 (s));
> +  return stmt;
> +}
> +
> +/* Try to fold STMT, given that it's a call to the built-in function with
> +   subcode FCODE.  Return the new statement on success and null on
> +   failure.  */
> +gimple *
> +arm_general_gimple_fold_builtin (unsigned int fcode, gcall *stmt)
> +{
> +  gimple *new_stmt = NULL;
> +
> +  switch (fcode)
> +{
> +case ARM_BUILTIN_CRYPTO_AESE:
> +case ARM_BUILTIN_CRYPTO_AESD:
> +  new_stmt = arm_fold_aes_op (stmt);
> +  break;
> +}
> +
> +  /* GIMPLE assign statements (unlike calls) require a non-null lhs.  If we
> + created an assign statement with a null lhs, then fix this by assigning
> + to a new (and subsequently unused) variable.  */
> +  if (new_stmt && is_gimple_assign (new_stmt) && !gimple_assign_lhs 
> (new_stmt))
> +{
> +  tree new_lhs = make_ssa_name (gimple_call_return_type (stmt));
> +  gimple_assign_set_lhs (new_stmt, new_lhs);
> +}
> +
> +  return new_stmt;
> +}
> +
>  #include "gt-arm-builtins.h"
> diff --git a/gcc/config/arm/arm-builtins.h b/gcc/config/arm/arm-builtins.h
> index 1fa85b602d9..3a646619f44 100644
> --- a/gcc/config/arm/arm-builtins.h
> +++ b/gcc/config/arm/arm-builtins.h
> @@ -32,6 +32,7 @@ enum resolver_ident {
>  };
>  enum resolver_ident arm_describe_resolver (tree);
>  unsigned arm_cde_end_args (tree);
> +gimple *arm_general_gimple_fold_builtin (unsigned int, gcall *);
>  
>  #define ENTRY(E, M, Q, S, T, G) E,
>  enum arm_simd_type
> diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
> index a95ddf8201f..00499a26bae 100644
> --- a/gcc/config/arm/arm.cc
> +++ b/gcc/config/arm/arm.cc
> @@ -76,6 +76,7 @@
>  #include "aarch-common.h"
>  #include "aarch-common-protos.h"
>  #include "machmode.h"
> +#include "arm-builtins.h"
>  
>  /* This file should be included last.  */
>  #include "target-def.h"
> @@ -2859,7 +2860,9 @@ arm_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>switch (code & ARM_BUILTIN_CLASS)
>  {
>  case ARM_BUILTIN_GENERAL:
> +  new_stmt = arm_general_gimple_fold_builtin (subcode, stmt);
>break;
> +
>  case ARM_BUILTIN_MVE:
>new_stmt = arm_mve::gimple_fold_builtin (subcode, stmt);
>  }
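[Editorial note: the fold is valid because the AES single-round instructions begin by XORing their two operands (the AddRoundKey step), so any operation of the form f(a ^ b) satisfies op(x ^ y, 0) == op(x, y).  A scalar model with a stand-in round body, not the real AESE, purely to show the algebra:]

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the rest of the AES round (SubBytes/ShiftRows); any
// function of the XORed input demonstrates the identity.
static uint8_t round_body(uint8_t v) {
  return (uint8_t)((v << 1) ^ (v >> 3));
}

// Model of vaeseq_u8's structure: XOR the operands, then apply the
// round body.  The fold rewrites model_aese(x ^ y, 0) as
// model_aese(x, y); both compute round_body(x ^ y).
static uint8_t model_aese(uint8_t a, uint8_t b) {
  return round_body(a ^ b);
}
```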



[PATCH] c, v2: do not warn about truncating NUL char when initializing nonstring arrays [PR117178]

2025-02-13 Thread Jakub Jelinek
On Wed, Feb 05, 2025 at 10:53:24AM -0800, Kees Cook wrote:
> On Wed, Feb 05, 2025 at 12:59:58PM +0100, Jakub Jelinek wrote:
> > Kees, any progress on this?
> 
> I need to take another run at it. I got stalled out when I discovered
> that array-of-char-arrays attributes got applied at the "wrong" depth,
> and stuff wasn't working.
> 
> e.g.:
> 
> char acpi_table[TABLE_SIZE][4] __attribute((nonstring)) = {
>   { "ohai" },
>   { "1234" },
> };
> 
> when nonstring was checked for on something like "acpi_table[2]" it
> wouldn't be found, since it was applied at the top level.

While I think we should address that, I think it should be handled
incrementally; it is basically a change in the nonstring attribute and
needs to be dealt with wherever nonstring is handled.

In order to speed things up, I took your patch and applied Marek's and my
review comments to it, furthermore removed unreachable code -
	if (warn_cxx_compat || len - unit > avail)
	  ...
	else if (warn_unterminated_string_initialization)
	  {
	    if (len - unit > avail)
	      ...
	    else
	      ...
	  }
makes no sense, as the second len - unit > avail check will always be false.
And tweaked the test coverage a little bit as well.

Kees, are you submitting this under assignment to FSF (maybe the Google one
if it has one) or DCO?  See https://gcc.gnu.org/contribute.html#legal
for details.  If DCO, can you add your Signed-off-by: tag for it?

So far lightly tested, ok for trunk if it passes bootstrap/regtest?

2025-02-13  Kees Cook  
Jakub Jelinek  

PR c/117178
gcc/
* doc/invoke.texi (Wunterminated-string-initialization): Document
the new interaction between this warning and -Wc++-compat and that
initialization of decls with nonstring attribute aren't warned about.
gcc/c-family/
* c.opt (Wunterminated-string-initialization): Don't depend on
-Wc++-compat.
gcc/c/
* c-typeck.cc (digest_init): Add DECL argument.  Adjust wording of
pedwarn_init for too long strings and provide details on the lengths,
for string literals where just the trailing NULL doesn't fit warn for
warn_cxx_compat with OPT_Wc___compat, wording which mentions "for C++"
and provides details on lengths, otherwise for
warn_unterminated_string_initialization adjust the warning, provide
details on lengths and don't warn if get_attr_nonstring_decl (decl).
(build_c_cast, store_init_value, output_init_element): Adjust
digest_init callers.
gcc/testsuite/
* gcc.dg/Wunterminated-string-initialization.c: Add additional test
coverage.
* gcc.dg/Wcxx-compat-14.c: Check in dg-warning for "for C++" part of
the diagnostics.
* gcc.dg/Wcxx-compat-23.c: New test.
* gcc.dg/Wcxx-compat-24.c: New test.

--- gcc/doc/invoke.texi.jj  2025-02-13 10:17:17.320789358 +0100
+++ gcc/doc/invoke.texi 2025-02-13 13:11:42.089042791 +0100
@@ -8661,17 +8661,20 @@ give a larger number of false positives
 @opindex Wunterminated-string-initialization
 @opindex Wno-unterminated-string-initialization
 @item -Wunterminated-string-initialization @r{(C and Objective-C only)}
-Warn about character arrays
-initialized as unterminated character sequences
-with a string literal.
+Warn about character arrays initialized as unterminated character sequences
+with a string literal, unless the declaration being initialized has
+the @code{nonstring} attribute.
 For example:
 
 @smallexample
-char arr[3] = "foo";
+char arr[3] = "foo"; /* Warning.  */
+char arr2[3] __attribute__((nonstring)) = "bar"; /* No warning.  */
 @end smallexample
 
-This warning is enabled by @option{-Wextra} and @option{-Wc++-compat}.
-In C++, such initializations are an error.
+This warning is enabled by @option{-Wextra}.  If @option{-Wc++-compat}
+is enabled, the warning has slightly different wording and warns even
+if the declaration being initialized has the @code{nonstring} warning,
+as in C++ such initializations are an error.
 
 @opindex Warray-compare
 @opindex Wno-array-compare
--- gcc/c-family/c.opt.jj   2025-01-02 11:47:29.681229781 +0100
+++ gcc/c-family/c.opt  2025-02-13 12:49:47.187320829 +0100
@@ -1550,7 +1550,7 @@ C ObjC Var(warn_unsuffixed_float_constan
 Warn about unsuffixed float constants.
 
 Wunterminated-string-initialization
-C ObjC Var(warn_unterminated_string_initialization) Warning LangEnabledBy(C 
ObjC,Wextra || Wc++-compat)
+C ObjC Var(warn_unterminated_string_initialization) Warning LangEnabledBy(C 
ObjC,Wextra)
 Warn about character arrays initialized as unterminated character sequences 
with a string literal.
 
 Wunused
--- gcc/c/c-typeck.cc.jj2025-01-14 09:36:43.751522483 +0100
+++ gcc/c/c-typeck.cc   2025-02-13 12:52:14.366275230 +0100
@@ -116,8 +116,8 @@ static void push_member_name (tree);
 static int spelling_length (void);
 static char *print_spelling (char *);
 static void warning_init (location_t, int, const char *);
-static tree digest_init (location_t, tree, tree, tr

Re: [PATCH] tree, gengtype: Fix up GC issue with DECL_VALUE_EXPR [PR118790]

2025-02-13 Thread Richard Biener
On Thu, 13 Feb 2025, Jakub Jelinek wrote:

> On Thu, Feb 13, 2025 at 12:48:44PM +0100, Richard Biener wrote:
> > So what this basically does is ensure we mark DECL_VALUE_EXPR when
> > VAR is marked which isn't done when marking a tree node.
> > 
> > That you special-case the hashtable walker is a workaround for
> > us not being able to say
> > 
> > struct GTY((mark_extra_stuff)) tree_decl_with_vis {
> > 
> > on 'tree' (or specifically the structs for a VAR_DECL).  And that we
> > rely on gengtype producing the 'tree' marker.  So we rely on the
> > hashtable keeping referenced trees live.
> 
> Yes, we could just arrange for gt_ggc_mx_lang_tree_node to additionally
> mark DECL_VALUE_EXPR for VAR_DECLs with DECL_HAS_VALUE_EXPR_P set (dunno how
> exactly).
> I think what the patch does should be slightly cheaper, we avoid those
> DECL_VALUE_EXPR hash table lookups in the common case where DECL_VALUE_EXPR
> of marked variables just refers to trees which reference only marked
> VAR_DECLs and no unmarked ones.

Agreed, I also don't know how to inject additional code into
gt_ggc_mx_lang_tree_node.

Richard.
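[Editorial note: the idea under discussion can be sketched abstractly with toy types (not GCC's GTY machinery): when marking a node, also mark the entry it maps to in a side table, since that reference is not a direct field of the node:]

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

struct Node {
  bool marked = false;
  std::vector<Node *> refs;       // direct fields the marker walks
};

// Side table playing the role of the DECL_VALUE_EXPR hash table: the
// mapping lives outside the node, so a plain field walk misses it.
static std::unordered_map<Node *, Node *> value_expr;

// Mark N, everything reachable through direct fields, and, the point
// of the fix, its side-table entry, if any.
static void mark(Node *n) {
  if (n->marked)
    return;
  n->marked = true;
  for (Node *r : n->refs)
    mark(r);
  auto it = value_expr.find(n);
  if (it != value_expr.end())
    mark(it->second);
}
```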


Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling

2025-02-13 Thread Jeff Law




On 2/13/25 1:47 AM, Robin Dapp wrote:

Other thoughts?


The docs seem to hint TARGET_SCHED_CAN_SPECULATE_INSN is meant for stuff
we can't/don't model in the pipeline, but I have no idea how to model
the VL=0 case there.

Maybe so, but what Edwin is doing looks sensible enough.  It wouldn't be
the first time a hook got (ab)used in ways that weren't part of the
original intent.


I don't fully understand what's happening.  So the hoisting is being done
speculatively here?  And it just happens to be "bad" because that might
cause a VL=0 case.  But are we sure a lack of speculation cannot cause
such cases?
Yes/No.  The scheduler certainly has code to avoid hoisting when doing 
so would change semantics.  That's not what's happening here.


I'd have to put it in a debugger or read the full dumps with some crazy 
scheduler dump verbosity setting to be sure, but what I suspect is 
happening is the scheduler is processing a multi-block region 
(effectively an extended basic block).   In this scenario the scheduler 
can pull insns from a later block into an earlier block, including past 
a conditional branch as long as it doesn't change program semantics.





Also, why doesn't the vsetvl pass fix the situation?  IMHO we need to
understand the problem more thoroughly before changing things.
In the end LCM minimizes the number of vsetvls and inserts them at the
"earliest" point.  If that is not sufficient I'd say we need modify
the constraints (maybe on a per-uarch basis)?
The vsevl pass is LCM based.  So it's not allowed to add a vsetvl on a 
path that didn't have a vsetvl before.  Consider this simple graph.


    0
   / \
  2-->3

If we have need for a vsetvl in bb2, but not bb0 or bb3, then the vsetvl 
will land in bb2 itself.  bb0 is not a valid insertion point for the vsetvl 
pass because the path 0->3 doesn't strictly need a vsetvl.  That's 
inherent in the LCM algorithm (anticipatable).


The scheduler has no such limitations.  The scheduler might create a 
scheduling region out of blocks 0 and 2.  In that scenario, insns from 
block 2 may speculate into block 0 as long as doing so doesn't change 
semantics.




On a separate note:  How about we move the vsetvl pass after sched2?
Then we could at least rely on LCM doing its work uninhibited and wouldn't
reorder vsetvls afterwards.  Or do we somehow rely on rtl_dce and BB
reorder to run afterwards?

That won't help with the problem here but might with others.
It's a double-edged sword.  If you defer placement until after 
scheduling, then the vsetvls can wreak havoc with whatever schedule 
sched2 came up with.  It won't matter much for out-of-order designs, but 
potentially does for others.


In theory at sched2 time the insn stream should be fixed.  There are 
practical/historical exceptions, but changes to the insn stream after 
that point are discouraged.


jeff


RE: [PATCH v2]middle-end: delay checking for alignment to load [PR118464]

2025-02-13 Thread Richard Biener
On Wed, 12 Feb 2025, Tamar Christina wrote:

> > -Original Message-
> > From: Tamar Christina 
> > Sent: Wednesday, February 12, 2025 3:20 PM
> > To: Richard Biener 
> > Cc: gcc-patches@gcc.gnu.org; nd 
> > Subject: RE: [PATCH v2]middle-end: delay checking for alignment to load
> > [PR118464]
> > 
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Wednesday, February 12, 2025 2:58 PM
> > > To: Tamar Christina 
> > > Cc: gcc-patches@gcc.gnu.org; nd 
> > > Subject: Re: [PATCH v2]middle-end: delay checking for alignment to load
> > > [PR118464]
> > >
> > > On Tue, 11 Feb 2025, Tamar Christina wrote:
> > >
> > > > Hi All,
> > > >
> > > > This fixes two PRs on Early break vectorization by delaying the safety 
> > > > checks to
> > > > vectorizable_load when the VF, VMAT and vectype are all known.
> > > >
> > > > This patch does add two new restrictions:
> > > >
> > > > 1. On LOAD_LANES targets, where the buffer size is known, we reject 
> > > > uneven
> > > >group sizes, as they are unaligned every n % 2 iterations and so may 
> > > > cross
> > > >a page unwittingly.
> > > >
> > > > 2. On LOAD_LANES targets when the buffer is unknown, we reject 
> > > > vectorization
> > > if
> > > >we cannot peel for alignment, as the alignment requirement is quite 
> > > > large at
> > > >GROUP_SIZE * vectype_size.  This is unlikely to ever be beneficial 
> > > > so we
> > > >don't support it for now.
> > > >
> > > > There are other steps documented inside the code itself so that the 
> > > > reasoning
> > > > is next to the code.
> > > >
> > > > Note that for VLA I have still left this fully disabled when not 
> > > > working on a
> > > > fixed buffer.
> > > >
> > > > For VLA targets like SVE return element alignment as the desired vector
> > > > alignment.  This means that the loads are never misaligned and so 
> > > > annoying it
> > > > won't ever need to peel.
> > > >
> > > > So what I think needs to happen in GCC 16 is that.
> > > >
> > > > 1. during vect_compute_data_ref_alignment we need to take the max of
> > > >POLY_VALUE_MIN and vector_alignment.
> > > >
> > > > 2. vect_do_peeling define skip_vector when PFA for VLA, and in the 
> > > > guard add
> > a
> > > >check that ncopies * vectype does not exceed POLY_VALUE_MAX which we
> > use
> > > as a
> > > >proxy for pagesize.
> > > >
> > > > 3. Force LOOP_VINFO_USING_PARTIAL_VECTORS_P to be true in
> > > >vect_determine_partial_vectors_and_peeling since the first iteration 
> > > > has to
> > > >be partial. If LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P we have to fail 
> > > > to
> > > >vectorize.
> > > >
> > > > 4. Create a default mask to be used, so that
> > > vect_use_loop_mask_for_alignment_p
> > > >becomes true and we generate the peeled check through loop control 
> > > > for
> > > >partial loops.  From what I can tell this won't work for
> > > >LOOP_VINFO_FULLY_WITH_LENGTH_P since they don't have any peeling
> > > support at
> > > >all in the compiler.  That would need to be done independently from 
> > > > the
> > > >above.
> > >
> > > We basically need to implement peeling/versioning for alignment based
> > > on the actual POLY value with the fallback being first-fault loads.
> > >
> > > > In any case, not GCC 15 material so I've kept the WIP patches I have
> > > downstream.
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > > > -m32, -m64 and no issues.
> > > >
> > > > Ok for master?
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > PR tree-optimization/118464
> > > > PR tree-optimization/116855
> > > > * doc/invoke.texi (min-pagesize): Update docs with vectorizer 
> > > > use.
> > > > * tree-vect-data-refs.cc 
> > > > (vect_analyze_early_break_dependences): Delay
> > > > checks.
> > > > (vect_compute_data_ref_alignment): Remove alignment checks and 
> > > > move
> > > to
> > > > get_load_store_type, increase group access alignment.
> > > > (vect_enhance_data_refs_alignment): Add note to comment needing
> > > > investigating.
> > > > (vect_analyze_data_refs_alignment): Likewise.
> > > > (vect_supportable_dr_alignment): For group loads look at first 
> > > > DR.
> > > > * tree-vect-stmts.cc (get_load_store_type):
> > > > Perform safety checks for early break pfa.
> > > > * tree-vectorizer.h (dr_peeling_alignment,
> > > > dr_set_peeling_alignment): New.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > PR tree-optimization/118464
> > > > PR tree-optimization/116855
> > > > * gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes 
> > > > because the
> > > > load type is relaxed later.
> > > > * gcc.dg/vect/vect-early-break_121-pr114081.c: Update.
> > > > * gcc.dg/vect/vect-early-break_22.c: Reje

Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling

2025-02-13 Thread Jeff Law




On 2/13/25 5:12 AM, Vineet Gupta wrote:

On 2/13/25 14:17, Robin Dapp wrote:

Other thoughts?

The docs seem to hint TARGET_SCHED_CAN_SPECULATE_INSN is meant for stuff
we can't/don't model in the pipeline, but I have no idea how to model
the VL=0 case there.

Maybe so, but what Edwin is doing looks sensible enough.  It wouldn't be
the first time a hook got (ab)used in ways that weren't part of the
original intent.

I don't fully understand what's happening.  So the hoisting is being done
speculatively here?  And it just happens to be "bad" because that might
cause a VL=0 case.  But are we sure a lack of speculation cannot cause
such cases?


Exactly. My gut feeling w/o deep dive was this seemed like papering over the 
issue.
Perhaps, but I'm pretty confident that even if this specific situation 
turns out to be slightly different that the scenario I see can/will 
happen elsewhere.




BTW what exactly is speculative scheduling?  As in, what is it actually trying
to schedule ahead?

In simplest terms assume we have this kind of graph

    0
   / \
  1-->2


The scheduler knows how to build scheduling regions, essentially 
extended basic blocks.  In this case we have two regions one with the 
blocks 0,1 the other being just block 2.


In the multi-block region 0,1 we allow insns from block 1 to speculate 
into block 0.


Let's assume we're on a simple 2-wide in-order machine and somewhere in 
bb0 there's a slot available for an insn that we couldn't fill with 
anything useful from bb0.  In that case we may speculate an insn from 
bb1 into bb0 to execute "for free" in that unused slot.


That's the basic idea.  It was particularly helpful for in-order cores 
in the past. It's dramatically less important for an out of order core 
since those are likely doing the speculation in hardware.


Naturally if you're using icounts for evaluation this kind of behavior 
is highly undesirable since that kind of evaluation says the 
transformation is bad, but in reality on certain designs is quite helpful.


Jeff


Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling

2025-02-13 Thread Robin Dapp
> The vsevl pass is LCM based.  So it's not allowed to add a vsetvl on a
> path that didn't have a vsetvl before.  Consider this simple graph.
>
>     0
>    / \
>   2-->3
>
> If we have need for a vsetvl in bb2, but not bb0 or bb3, then the vsetvl
> will land in bb2 itself.  bb0 is not a valid insertion point for the vsetvl
> pass because the path 0->3 doesn't strictly need a vsetvl.  That's
> inherent in the LCM algorithm (anticipatable).

Yeah, I remember the same issue with the rounding-mode setter placement.

Wouldn't that be fixable by requiring a dummy/wildcard/dontcare vsetvl in bb3
(or any other block that doesn't require one)?  Such a dummy vsetvl would be
fusible with every other vsetvl.  If there are dummy vsetvls remaining after
LCM just delete them?

Just thinking out loud, the devil will be in the details.

-- 
Regards
 Robin



[PATCH] [ifcombine] cope with signbit tests of extended values

2025-02-13 Thread Alexandre Oliva


A compare with zero may be taken as a sign bit test by
fold_truth_andor_for_ifcombine, but the operand may be extended from a
narrower field.  If the operand was narrower, the bitsize will reflect
the narrowing conversion, but if it was wider, we'll only know whether
the field is sign- or zero-extended from unsignedp, but we won't know
whether it needed to be extended, because arg will have changed to the
narrower variable when we get to the point in which we can compute the
arg width.  If it's sign-extended, we're testing the right bit, but if
it's zero-extended, there isn't any bit we can test.

Instead of punting and leaving the foldable compare to be figured out
by another pass, arrange for the sign bit resulting from the widening
zero-extension to be taken as zero, so that the modified compare will
yield the desired result.

While at that, avoid swapping the right-hand compare operands when
we've already determined that it was a signbit test: it's no use to
even try.

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

PR tree-optimization/118805
* gimple-fold.cc (fold_truth_andor_for_ifcombine): Detect and
cope with zero-extension in signbit tests.  Reject swapping
right-compare operands if rsignbit.

for  gcc/testsuite/ChangeLog

PR tree-optimization/118805
* gcc.dg/field-merge-26.c: New.
---
 gcc/gimple-fold.cc                    |   22 +++++++++++++++++-----
 gcc/testsuite/gcc.dg/field-merge-26.c |   20 ++++++++++++++++++++
 2 files changed, 37 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/field-merge-26.c

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 29191685a43c5..0380c7af4c213 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -8090,14 +8090,16 @@ fold_truth_andor_for_ifcombine (enum tree_code code, 
tree truth_type,
 
   /* Prepare to turn compares of signed quantities with zero into sign-bit
  tests.  We need not worry about *_reversep here for these compare
- rewrites: loads will have already been reversed before compares.  */
-  bool lsignbit = false, rsignbit = false;
+ rewrites: loads will have already been reversed before compares.  Save the
+ precision, because [lr]l_arg may change and we won't be able to tell how
+ wide it was originally.  */
+  unsigned lsignbit = 0, rsignbit = 0;
   if ((lcode == LT_EXPR || lcode == GE_EXPR)
   && integer_zerop (lr_arg)
   && INTEGRAL_TYPE_P (TREE_TYPE (ll_arg))
   && !TYPE_UNSIGNED (TREE_TYPE (ll_arg)))
 {
-  lsignbit = true;
+  lsignbit = TYPE_PRECISION (TREE_TYPE (ll_arg));
   lcode = (lcode == LT_EXPR ? NE_EXPR : EQ_EXPR);
 }
   /* Turn compares of unsigned quantities with powers of two into
@@ -8130,7 +8132,7 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree 
truth_type,
   && INTEGRAL_TYPE_P (TREE_TYPE (rl_arg))
   && !TYPE_UNSIGNED (TREE_TYPE (rl_arg)))
 {
-  rsignbit = true;
+  rsignbit = TYPE_PRECISION (TREE_TYPE (rl_arg));
   rcode = (rcode == LT_EXPR ? NE_EXPR : EQ_EXPR);
 }
   else if ((rcode == LT_EXPR || rcode == GE_EXPR)
@@ -8204,7 +8206,7 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree 
truth_type,
   || ! operand_equal_p (ll_inner, rl_inner, 0))
 {
   /* Try swapping the operands.  */
-  if (ll_reversep != rr_reversep
+  if (ll_reversep != rr_reversep || rsignbit
  || !operand_equal_p (ll_inner, rr_inner, 0))
return 0;
 
@@ -8284,6 +8286,14 @@ fold_truth_andor_for_ifcombine (enum tree_code code, 
tree truth_type,
   if (lsignbit)
 {
   wide_int sign = wi::mask (ll_bitsize - 1, true, ll_bitsize);
+  /* If ll_arg is zero-extended and we're testing the sign bit, we know
+what the result should be.  Shifting the sign bit out of sign will get
+us to mask the entire field out, yielding zero, i.e., the sign bit of
+the zero-extended value.  We know the masked value is being compared
+with zero, so the compare will get us the result we're looking
+for: TRUE if EQ_EXPR, FALSE if NE_EXPR.  */
+  if (lsignbit > ll_bitsize && ll_unsignedp)
+   sign <<= 1;
   if (!ll_and_mask.get_precision ())
ll_and_mask = sign;
   else
@@ -8303,6 +8313,8 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree 
truth_type,
   if (rsignbit)
 {
   wide_int sign = wi::mask (rl_bitsize - 1, true, rl_bitsize);
+  if (rsignbit > rl_bitsize && rl_unsignedp)
+   sign <<= 1;
   if (!rl_and_mask.get_precision ())
rl_and_mask = sign;
   else
diff --git a/gcc/testsuite/gcc.dg/field-merge-26.c 
b/gcc/testsuite/gcc.dg/field-merge-26.c
new file mode 100644
index 0..96d7e7205c5f2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/field-merge-26.c
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+/* { dg-options "-O1 -fno-tree-ccp -fno-tree-copy-prop -fno-tree-forwprop 
-fno-tree-fre" } */
+
+/* PR tree-optimization/118805 */
+

Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling

2025-02-13 Thread Robin Dapp
> Yeah, I remember the same issue with the rounding-mode setter placement.
>
> Wouldn't that be fixable by requiring a dummy/wildcard/dontcare vsetvl in bb3
> (or any other block that doesn't require one)?  Such a dummy vsetvl would be
> fusible with every other vsetvl.  If there are dummy vsetvls remaining after
> LCM just delete them?
>
> Just thinking out loud, the devil will be in the details.

Register liveness is of course relevant here.  Will surely depend on the
specific example whether that makes sense or not.

-- 
Regards
 Robin



[PATCH v2 3/8] LoongArch: Simplify {lsx_, lasx_x}v{add, sub, mul}l{ev, od} description

2025-02-13 Thread Xi Ruoyao
These pattern definitions are tediously long, invoking 32 UNSPECs and
many hard-coded long const vectors.  To simplify them, at first we use
the TImode vector operations instead of the UNSPECs, then we adopt an
approach in AArch64: using a special predicate to match the const
vectors for odd/even indices for define_insn's, and generate those
vectors in define_expand's.

For "backward compatibility" we need to provide a "punned" version for
the operations invoking TImode vectors, as the intrinsics still expect
DImode vectors.

The stat is "201 insertions, 905 deletions."

gcc/ChangeLog:

* config/loongarch/lasx.md (UNSPEC_LASX_XVADDWEV): Remove.
(UNSPEC_LASX_XVADDWEV2): Remove.
(UNSPEC_LASX_XVADDWEV3): Remove.
(UNSPEC_LASX_XVSUBWEV): Remove.
(UNSPEC_LASX_XVSUBWEV2): Remove.
(UNSPEC_LASX_XVMULWEV): Remove.
(UNSPEC_LASX_XVMULWEV2): Remove.
(UNSPEC_LASX_XVMULWEV3): Remove.
(UNSPEC_LASX_XVADDWOD): Remove.
(UNSPEC_LASX_XVADDWOD2): Remove.
(UNSPEC_LASX_XVADDWOD3): Remove.
(UNSPEC_LASX_XVSUBWOD): Remove.
(UNSPEC_LASX_XVSUBWOD2): Remove.
(UNSPEC_LASX_XVMULWOD): Remove.
(UNSPEC_LASX_XVMULWOD2): Remove.
(UNSPEC_LASX_XVMULWOD3): Remove.
(lasx_xvwev_h_b): Remove.
(lasx_xvwev_w_h): Remove.
(lasx_xvwev_d_w): Remove.
(lasx_xvaddwev_q_d): Remove.
(lasx_xvsubwev_q_d): Remove.
(lasx_xvmulwev_q_d): Remove.
(lasx_xvwod_h_b): Remove.
(lasx_xvwod_w_h): Remove.
(lasx_xvwod_d_w): Remove.
(lasx_xvaddwod_q_d): Remove.
(lasx_xvsubwod_q_d): Remove.
(lasx_xvmulwod_q_d): Remove.
(lasx_xvaddwev_q_du): Remove.
(lasx_xvsubwev_q_du): Remove.
(lasx_xvmulwev_q_du): Remove.
(lasx_xvaddwod_q_du): Remove.
(lasx_xvsubwod_q_du): Remove.
(lasx_xvmulwod_q_du): Remove.
(lasx_xvwev_h_bu_b): Remove.
(lasx_xvwev_w_hu_h): Remove.
(lasx_xvwev_d_wu_w): Remove.
(lasx_xvwod_h_bu_b): Remove.
(lasx_xvwod_w_hu_h): Remove.
(lasx_xvwod_d_wu_w): Remove.
(lasx_xvaddwev_q_du_d): Remove.
(lasx_xvsubwev_q_du_d): Remove.
(lasx_xvmulwev_q_du_d): Remove.
(lasx_xvaddwod_q_du_d): Remove.
(lasx_xvsubwod_q_du_d): Remove.
* config/loongarch/lsx.md (UNSPEC_LSX_XVADDWEV): Remove.
(UNSPEC_LSX_VADDWEV2): Remove.
(UNSPEC_LSX_VADDWEV3): Remove.
(UNSPEC_LSX_VSUBWEV): Remove.
(UNSPEC_LSX_VSUBWEV2): Remove.
(UNSPEC_LSX_VMULWEV): Remove.
(UNSPEC_LSX_VMULWEV2): Remove.
(UNSPEC_LSX_VMULWEV3): Remove.
(UNSPEC_LSX_VADDWOD): Remove.
(UNSPEC_LSX_VADDWOD2): Remove.
(UNSPEC_LSX_VADDWOD3): Remove.
(UNSPEC_LSX_VSUBWOD): Remove.
(UNSPEC_LSX_VSUBWOD2): Remove.
(UNSPEC_LSX_VMULWOD): Remove.
(UNSPEC_LSX_VMULWOD2): Remove.
(UNSPEC_LSX_VMULWOD3): Remove.
(lsx_vwev_h_b): Remove.
(lsx_vwev_w_h): Remove.
(lsx_vwev_d_w): Remove.
(lsx_vaddwev_q_d): Remove.
(lsx_vsubwev_q_d): Remove.
(lsx_vmulwev_q_d): Remove.
(lsx_vwod_h_b): Remove.
(lsx_vwod_w_h): Remove.
(lsx_vwod_d_w): Remove.
(lsx_vaddwod_q_d): Remove.
(lsx_vsubwod_q_d): Remove.
(lsx_vmulwod_q_d): Remove.
(lsx_vaddwev_q_du): Remove.
(lsx_vsubwev_q_du): Remove.
(lsx_vmulwev_q_du): Remove.
(lsx_vaddwod_q_du): Remove.
(lsx_vsubwod_q_du): Remove.
(lsx_vmulwod_q_du): Remove.
(lsx_vwev_h_bu_b): Remove.
(lsx_vwev_w_hu_h): Remove.
(lsx_vwev_d_wu_w): Remove.
(lsx_vwod_h_bu_b): Remove.
(lsx_vwod_w_hu_h): Remove.
(lsx_vwod_d_wu_w): Remove.
(lsx_vaddwev_q_du_d): Remove.
(lsx_vsubwev_q_du_d): Remove.
(lsx_vmulwev_q_du_d): Remove.
(lsx_vaddwod_q_du_d): Remove.
(lsx_vsubwod_q_du_d): Remove.
(lsx_vmulwod_q_du_d): Remove.
* config/loongarch/loongarch-modes.def: Add V1TI, V1DI, and
V4TI.
* config/loongarch/loongarch-protos.h
(loongarch_gen_stepped_int_parallel): New function prototype.
* config/loongarch/loongarch.cc (loongarch_print_operand):
Accept 'O' for printing "ev" or "od."
(loongarch_gen_stepped_int_parallel): Implement.
* config/loongarch/loongarch.md (mode): Add V1DI, V1TI, and
mention V2TI.
* config/loongarch/predicates.md
(vect_par_cnst_even_or_odd_half): New define_predicate.
* config/loongarch/simd.md (WVEC_HALF): New define_mode_attr.
(simdfmt_w): Likewise.
(zero_one): New define_int_iterator.
(ev_od): New define_int_attr.
(simd_w_evod__): New define_insn.
(_vw__): New
define_expand.
(simd_w_evod__hetero): New define_insn.
(_vw__u_):
New define_expand.
(DIVEC): New defin

[PATCH] [testsuite] adjust expectations of x86 vect-simd-clone tests

2025-02-13 Thread Alexandre Oliva


Some vect-simd-clone tests fail when targeting ancient x86 variants,
because the expected transformations only take place with -msse4 or
higher.

So arrange for these tests to take an -msse4 option on x86, so that
the expected vectorization takes place, but decay to a compile test if
vect.exp would enable execution but the target doesn't have an sse4
runtime.  This requires the new dg-do-if to override the action on a
target while retaining the default action on others, instead of
disabling the test.

We can count on avx512f compile-time support for these tests, because
vect_simd_clones requires that on x86, and that implies sse4 support,
so we need not complicate the scan conditionals with tests for sse4,
except on the last test.

Regstrapped on x86_64-linux-gnu, also tested with gcc-14 targeting
x86_64-elf, targeting a cpu without sse4 support.  Ok to install?


for  gcc/ChangeLog

* doc/sourcebuild.texi (dg-do-if): Document.

for  gcc/testsuite/ChangeLog

* lib/target-supports-dg.exp (dg-do-if): New.
* gcc.dg/vect/vect-simd-clone-16f.c: Use -msse4 on x86, and
skip in case execution is enabled but the runtime isn't.
* gcc.dg/vect/vect-simd-clone-17f.c: Likewise.
* gcc.dg/vect/vect-simd-clone-18f.c: Likewise.
* gcc.dg/vect/vect-simd-clone-20.c: Likewise, but only skip
the scan test.
---
 gcc/doc/sourcebuild.texi                        |    5 +++++
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c |    2 ++
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c |    2 ++
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c |    2 ++
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-20.c  |    6 +++--
 gcc/testsuite/lib/target-supports-dg.exp        |   29 +++++++++++++++++++
 6 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 98ede70f23c05..255d1a451e44d 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1128,6 +1128,11 @@ by the specified floating-point factor.
 @subsubsection Skip a test for some targets
 
 @table @code
+@item @{ dg-do-if @var{action} @{ @var{selector} @} @}
+Same as dg-do if the selector matches and the test hasn't already been
+marked as unsupported.  Use it to override an action on a target while
+leaving the default action alone for other targets.
+
 @item @{ dg-skip-if @var{comment} @{ @var{selector} @} [@{ @var{include-opts} 
@} [@{ @var{exclude-opts} @}]] @}
 Arguments @var{include-opts} and @var{exclude-opts} are lists in which
 each element is a string of zero or more GCC options.
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c 
b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c
index 7cd29e894d050..bb3b081b0e3d8 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c
@@ -1,5 +1,7 @@
+/* { dg-do-if compile { target { sse2_runtime && { ! sse4_runtime } } } } */
 /* { dg-require-effective-target vect_simd_clones } */
 /* { dg-additional-options "-fopenmp-simd --param vect-epilogues-nomask=0" } */
+/* { dg-additional-options "-msse4" { target sse4 } } */
 /* { dg-additional-options "-mavx" { target avx_runtime } } */
 /* { dg-additional-options "-mno-avx512f" { target { { i?86*-*-* x86_64-*-* } 
&& { ! lp64 } } } } */
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c 
b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c
index 177521dc44531..504465614c989 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c
@@ -1,5 +1,7 @@
+/* { dg-do-if compile { target { sse2_runtime && { ! sse4_runtime } } } } */
 /* { dg-require-effective-target vect_simd_clones } */
 /* { dg-additional-options "-fopenmp-simd --param vect-epilogues-nomask=0" } */
+/* { dg-additional-options "-msse4" { target sse4 } } */
 /* { dg-additional-options "-mavx" { target avx_runtime } } */
 /* { dg-additional-options "-mno-avx512f" { target { { i?86*-*-* x86_64-*-* } 
&& { ! lp64 } } } } */
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c 
b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c
index 4dd51381d73c0..0c418d4324821 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c
@@ -1,5 +1,7 @@
+/* { dg-do-if compile { target { sse2_runtime && { ! sse4_runtime } } } } */
 /* { dg-require-effective-target vect_simd_clones } */
 /* { dg-additional-options "-fopenmp-simd --param vect-epilogues-nomask=0" } */
+/* { dg-additional-options "-msse4" { target sse4 } } */
 /* { dg-additional-options "-mavx" { target avx_runtime } } */
 /* { dg-additional-options "-mno-avx512f" { target { { i?86*-*-* x86_64-*-* } 
&& { ! lp64 } } } } */
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-20.c 
b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-20.c
index 9f51a68f3a0c8..3e626fc4d4d56 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-20.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-s

[PATCH] tree, gengtype: Fix up GC issue with DECL_VALUE_EXPR [PR118790]

2025-02-13 Thread Jakub Jelinek
Hi!

The following testcase ICEs, because we have multiple levels of
DECL_VALUE_EXPR VAR_DECLs:
  character(kind=1) id_string[1:.id_string] [value-expr: *id_string.55];
  character(kind=1)[1:.id_string] * id_string.55 [value-expr: 
FRAME.107.id_string.55];
  integer(kind=8) .id_string [value-expr: FRAME.107..id_string];
id_string is the user variable mentioned in BLOCK_VARS, it has
DECL_VALUE_EXPR because it is a VLA, id_string.55 is a temporary created by
gimplify_vla_decl as the address that points to the start of the VLA, what
is normally used in the IL to access it.  But as this artificial var is then
used inside of a nested function, tree-nested.cc adds DECL_VALUE_EXPR to it
too and moves the actual value into the FRAME.107 object's member.
Now, remove_unused_locals removes id_string.55 (and various other VAR_DECLs)
from cfun->local_decls, simply because it is not mentioned in the IL at all
(neither is id_string itself, but that is kept in BLOCK_VARS as it has
DECL_VALUE_EXPR).  So, after this point, id_string.55 tree isn't referenced from
anywhere but id_string's DECL_VALUE_EXPR.  Next GC collection is triggered,
and we are unlucky enough that in the value_expr_for_decl hash table
(underlying hash map for DECL_VALUE_EXPR) the id_string.55 entry comes
before the id_string entry.  id_string is ggc_marked_p because it is
referenced from BLOCK_VARS, but id_string.55 is not, as we don't mark
DECL_VALUE_EXPR anywhere but by gt_cleare_cache on value_expr_for_decl.
But gt_cleare_cache does two things, it calls clear_slots on entries
where the key is not ggc_marked_p (so the id_string.55 mapping to
FRAME.107.id_string.55 is lost and DECL_VALUE_EXPR (id_string.55) becomes
NULL) but then later we see id_string entry, which is ggc_marked_p, so mark
the whole hash table entry, which sets ggc_set_mark on id_string.55.  But
at this point its DECL_VALUE_EXPR is lost.
Later during dwarf2out.cc we want to emit DW_AT_location for id_string, see
it has DECL_VALUE_EXPR, so emit it as indirection of id_string.55 for which
we again lookup DECL_VALUE_EXPR as it has DECL_HAS_VALUE_EXPR_P, but as it
is NULL, we ICE, instead of finding it is a subobject of FRAME.107 for which
we can find its stack location.

Now, as can be seen in the PR, I've tried to tweak tree-ssa-live.cc so that
it would keep id_string.55 in cfun->local_decls; that prohibits it from
the DECL_VALUE_EXPR of it being GC until expansion, but then we shrink and
free cfun->local_decls completely and so GC at that point still can throw
it away.

The following patch adds an extension to the GTY ((cache)) option, before
calling the gt_cleare_cache on some hash table by specifying
GTY ((cache ("somefn"))) it calls somefn on that hash table as well.
And this extra hook can do any additional ggc_set_mark needed so that
gt_cleare_cache preserves everything that is actually needed and throws
away the rest.

In order to make it just 2 pass rather than up to n passes - (if we had
say
id1 -> something, id2 -> x(id1), id3 -> x(id2), id4 -> x(id3), id5 -> x(id4)
in the value_expr_for_decl hash table in that order (where idN are VAR_DECLs
with DECL_HAS_VALUE_EXPR_P, id5 is the only one mentioned from outside and
idN -> X stands for idN having DECL_VALUE_EXPR X, something for some
arbitrary tree and x(idN) for some arbitrary tree which mentions idN
variable) and in each pass just marked the to part of entries with
ggc_marked_p base.from we'd need to repeat until we don't mark anything)
the patch calls walk_tree on DECL_VALUE_EXPR of the marked trees and if it
finds yet unmarked tree, it marks it and walks its DECL_VALUE_EXPR as well
the same way.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-02-13  Jakub Jelinek  

PR debug/118790
* gengtype.cc (write_roots): Remove cache variable, instead break from
the loop on match and test o for NULL.  If the cache option has
non-empty string argument, call the specified function with v->name
as argument before calling gt_cleare_cache on it.
* tree.cc (gt_value_expr_mark_2, gt_value_expr_mark_1,
gt_value_expr_mark): New functions.
(value_expr_for_decl): Use GTY ((cache ("gt_value_expr_mark"))) rather
than just GTY ((cache)).
* doc/gty.texi (cache): Document optional argument of cache option.

* gfortran.dg/gomp/pr118790.f90: New test.

--- gcc/gengtype.cc.jj  2025-01-02 11:23:02.613710956 +0100
+++ gcc/gengtype.cc 2025-02-12 17:15:08.560424329 +0100
@@ -4656,13 +4656,12 @@ write_roots (pair_p variables, bool emit
   outf_p f = get_output_file_with_visibility (CONST_CAST (input_file*,
  v->line.file));
   struct flist *fli;
-  bool cache = false;
   options_p o;
 
   for (o = v->opt; o; o = o->next)
if (strcmp (o->name, "cache") == 0)
- cache = true;
-   if (!cache)
+ break;
+   if (!o)
continue;
 

[PATCH] [testsuite] add x86 effective target

2025-02-13 Thread Alexandre Oliva


I got tired of repeating the conditional that recognizes ia32 or
x86_64, and introduced 'x86' as a shorthand for that, adjusting all
occurrences in target-supports.exp, to set an example.  I found some
patterns that recognized i?86* and x86_64*, but I took those as likely
cut&pastos instead of trying to preserve those weirdnesses.

Regstrapped on x86_64-linux-gnu, also tested with gcc-14 targeting
x86_64-elf.  Ok to install?


for  gcc/ChangeLog

* doc/sourcebuild.texi: Add x86 effective target.

for  gcc/testsuite/ChangeLog

* lib/target-supports.exp (check_effective_target_x86): New.
Replace all uses of i?86-*-* and x86_64-*-* in this file.
---
 gcc/doc/sourcebuild.texi              |    3 +
 gcc/testsuite/lib/target-supports.exp |  188 ++++++++++++++++-------------
 2 files changed, 99 insertions(+), 92 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 255d1a451e44d..d4e2a13dd77a4 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2801,6 +2801,9 @@ Target supports the execution of @code{user_msr} 
instructions.
 @item vect_cmdline_needed
 Target requires a command line argument to enable a SIMD instruction set.
 
+@item x86
+Target is ia32 or x86_64.
+
 @item xorsign
 Target supports the xorsign optab expansion.
 
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 60e24129bd585..035f82eb86c93 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -740,7 +740,7 @@ proc check_profiling_available { test_what } {
 }
 
 if { $test_what == "-fauto-profile" } {
-   if { !([istarget i?86-*-linux*] || [istarget x86_64-*-linux*]) } {
+   if { !([check_effective_target_x86] && [istarget *-*-linux*]) } {
verbose "autofdo only supported on linux"
return 0
}
@@ -2616,17 +2616,23 @@ proc remove_options_for_riscv_zvbb { flags } {
 return [add_options_for_riscv_z_ext zvbb $flags]
 }
 
+# Return 1 if the target is ia32 or x86_64.
+
+proc check_effective_target_x86 { } {
+if { ([istarget x86_64-*-*] || [istarget i?86-*-*]) } {
+   return 1
+} else {
+return 0
+}
+}
+
 # Return 1 if the target OS supports running SSE executables, 0
 # otherwise.  Cache the result.
 
 proc check_sse_os_support_available { } {
 return [check_cached_effective_target sse_os_support_available {
# If this is not the right target then we can skip the test.
-   if { !([istarget i?86-*-*] || [istarget x86_64-*-*]) } {
-   expr 0
-   } else {
-   expr 1
-   }
+   expr [check_effective_target_x86]
 }]
 }
 
@@ -2636,7 +2642,7 @@ proc check_sse_os_support_available { } {
 proc check_avx_os_support_available { } {
 return [check_cached_effective_target avx_os_support_available {
# If this is not the right target then we can skip the test.
-   if { !([istarget i?86-*-*] || [istarget x86_64-*-*]) } {
+   if { !([check_effective_target_x86]) } {
expr 0
} else {
# Check that OS has AVX and SSE saving enabled.
@@ -2659,7 +2665,7 @@ proc check_avx_os_support_available { } {
 proc check_avx512_os_support_available { } {
 return [check_cached_effective_target avx512_os_support_available {
# If this is not the right target then we can skip the test.
-   if { !([istarget i?86-*-*] || [istarget x86_64-*-*]) } {
+   if { !([check_effective_target_x86]) } {
expr 0
} else {
# Check that OS has AVX512, AVX and SSE saving enabled.
@@ -2682,7 +2688,7 @@ proc check_avx512_os_support_available { } {
 proc check_sse_hw_available { } {
 return [check_cached_effective_target sse_hw_available {
# If this is not the right target then we can skip the test.
-   if { !([istarget i?86-*-*] || [istarget x86_64-*-*]) } {
+   if { !([check_effective_target_x86]) } {
expr 0
} else {
check_runtime_nocache sse_hw_available {
@@ -2706,7 +2712,7 @@ proc check_sse_hw_available { } {
 proc check_sse2_hw_available { } {
 return [check_cached_effective_target sse2_hw_available {
# If this is not the right target then we can skip the test.
-   if { !([istarget i?86-*-*] || [istarget x86_64-*-*]) } {
+   if { !([check_effective_target_x86]) } {
expr 0
} else {
check_runtime_nocache sse2_hw_available {
@@ -2730,7 +2736,7 @@ proc check_sse2_hw_available { } {
 proc check_sse4_hw_available { } {
 return [check_cached_effective_target sse4_hw_available {
# If this is not the right target then we can skip the test.
-   if { !([istarget i?86-*-*] || [istarget x86_64-*-*]) } {
+   if { !([check_effective_target_x86]) } {
expr 0
} else {
check_runtime_nocache sse4_hw_available {
@@ -2754,7 +2760,7 @@ proc check_sse4_hw_available { } {
 proc check_avx_h

[committed] testsuite: Add another range for coroutines testcase [PR118574]

2025-02-13 Thread Jakub Jelinek
Hi!

On Tue, Feb 11, 2025 at 11:47:09PM +0100, Jason Merrill wrote:
> The implementation in r15-3840 used a novel technique of wrapping the entire
> range-for loop in a CLEANUP_POINT_EXPR, which confused the coroutines
> transformation.  Instead let's use the existing extend_ref_init_temps
> mechanism.
> 
> This does not revert all of r15-3840, only the parts that change how
> CLEANUP_POINT_EXPRs are applied to range-for declarations.

Thanks.

Here is a patch which adds another range-for coroutine testcase.  It
extends (across co_await) not just the __for_range variable and what it
binds to (so it passes even without -frange-for-ext-temps), but also some
other temporaries, and verifies they are destroyed in the right order.

Tested on x86_64-linux, committed to trunk as obvious.

2025-02-13  Jakub Jelinek  

PR c++/118574
* g++.dg/coroutines/range-for2.C: New test.

--- gcc/testsuite/g++.dg/coroutines/range-for2.C.jj 2025-02-13 
11:28:48.381043861 +0100
+++ gcc/testsuite/g++.dg/coroutines/range-for2.C2025-02-13 
11:49:03.872040995 +0100
@@ -0,0 +1,92 @@
+// PR c++/118574
+// { dg-do run }
+// { dg-additional-options "-std=c++23 -O2" }
+
+#include <coroutine>
+
+[[gnu::noipa]] void
+baz (int *)
+{
+}
+
+struct D {
+  D () : d (new int (42)) {}
+  ~D () { if (*d != 42) __builtin_abort (); *d = 0; baz (d); delete d; }
+  int *d;
+};
+
+struct E {
+  E (const D &x) : e (x) {}
+  void test () const { if (*e.d != 42) __builtin_abort (); }
+  ~E () { test (); }
+  const D &e;
+};
+
+struct A {
+  const char **a = nullptr;
+  int n = 0;
+  const E *e1 = nullptr;
+  const E *e2 = nullptr;
+  void test () const { if (e1) e1->test (); if (e2) e2->test (); }
+  void push_back (const char *x) { test (); if (!a) a = new const char *[2]; 
a[n++] = x; }
+  const char **begin () const { test (); return a; }
+  const char **end () const { test (); return a + n; }
+  ~A () { test (); delete[] a; }
+};
+
+struct B {
+  long ns;
+  bool await_ready () const noexcept { return false; }
+  void await_suspend (std::coroutine_handle<> h) const noexcept {
+volatile int v = 0;
+while (v < ns)
+  v = v + 1;
+h.resume ();
+  }
+  void await_resume () const noexcept {}
+};
+
+struct C {
+  struct promise_type {
+const char *value;
+std::suspend_never initial_suspend () { return {}; }
+std::suspend_always final_suspend () noexcept { return {}; }
+void return_value (const char *v) { value = v; }
+void unhandled_exception () { __builtin_abort (); }
+C get_return_object () { return C{this}; }
+  };
+  promise_type *p;
+  explicit C (promise_type *p) : p(p) {}
+  const char *get () { return p->value; }
+};
+
+A
+foo (const E &e1, const E &e2)
+{
+  A a;
+  a.e1 = &e1;
+  a.e2 = &e2;
+  a.push_back ("foo");
+  a.push_back ("bar");
+  return a;
+}
+
+C
+bar ()
+{
+  A ret;
+  for (const auto &item : foo (E{D {}}, E{D {}}))
+{
+  co_await B{20};
+  ret.push_back (item);
+}
+  co_return "foobar";
+}
+
+int
+main ()
+{
+  auto task = bar ();
+  if (__builtin_strcmp (task.get (), "foobar"))
+__builtin_abort ();
+}

Jakub



Re: [PATCH] LoongArch: Accept ADD, IOR or XOR when combining objects with no bits in common [PR115478]

2025-02-13 Thread Lulu Cheng

LGTM!

Thanks!

在 2025/2/11 下午2:34, Xi Ruoyao 写道:

Since r15-1120, multi-word shifts/rotates produce PLUS instead of IOR.
It's generally a good thing (allowing the use of our alsl instruction or
similar instructions on other architectures), but it's preventing us
from using bytepick.  For example, if we shift a __int128 by 16 bits,
the higher word can be produced via a single bytepick.d instruction with
immediate 2, but we get:

srli.d  $r12,$r4,48
slli.d  $r5,$r5,16
slli.d  $r4,$r4,16
add.d   $r5,$r12,$r5
jr  $r1

This didn't work with GCC 14 either, but after r15-6490 it is supposed
to work if IOR is used instead of PLUS.

To fix this, add a code iterator to match IOR, XOR, and PLUS and use it
instead of just IOR if we know the operands have no overlapping bits.

gcc/ChangeLog:

* config/loongarch/loongarch.md (any_or_plus): New
define_code_iterator.
(bstrins__for_ior_mask): Use any_or_plus instead of ior.
(bytepick_w_): Likewise.
(bytepick_d_): Likewise.
(bytepick_d__rev): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/bytepick_shift_128.c: New test.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

  gcc/config/loongarch/loongarch.md | 46 +++
  .../gcc.target/loongarch/bytepick_shift_128.c |  9 
  2 files changed, 36 insertions(+), 19 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/bytepick_shift_128.c

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 2baba13560a..6f507c3c7f6 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -488,6 +488,10 @@ (define_code_attr bitwise_operand [(and "and_operand")
   (xor "uns_arith_operand")])
  (define_code_attr is_and [(and "true") (ior "false") (xor "false")])
  
+;; If we know the operands do not have overlapping bits, use this
+;; instead of just ior to cover more cases.
+(define_code_iterator any_or_plus [any_or plus])
+
  ;; This code iterator allows unsigned and signed division to be generated
  ;; from the same template.
  (define_code_iterator any_div [div udiv mod umod])
@@ -1588,10 +1592,11 @@ (define_insn "*one_cmplsi2_internal"
  
  (define_insn_and_split "*bstrins__for_ior_mask"

[(set (match_operand:GPR 0 "register_operand" "=r")
-   (ior:GPR (and:GPR (match_operand:GPR 1 "register_operand" "r")
- (match_operand:GPR 2 "const_int_operand" "i"))
-(and:GPR (match_operand:GPR 3 "register_operand" "r")
- (match_operand:GPR 4 "const_int_operand" "i"]
+   (any_or_plus:GPR
+ (and:GPR (match_operand:GPR 1 "register_operand" "r")
+  (match_operand:GPR 2 "const_int_operand" "i"))
+ (and:GPR (match_operand:GPR 3 "register_operand" "r")
+  (match_operand:GPR 4 "const_int_operand" "i"]
"loongarch_pre_reload_split ()
 && loongarch_use_bstrins_for_ior_with_mask (mode, operands)"
"#"
@@ -4256,12 +4261,13 @@ (define_expand "2"
  DONE;
})
  
-(define_insn "bytepick_w_"

+(define_insn "*bytepick_w_"
[(set (match_operand:SI 0 "register_operand" "=r")
-   (ior:SI (lshiftrt:SI (match_operand:SI 1 "register_operand" "r")
-(const_int ))
-   (ashift:SI (match_operand:SI 2 "register_operand" "r")
-  (const_int bytepick_w_ashift_amount]
+   (any_or_plus:SI
+ (lshiftrt:SI (match_operand:SI 1 "register_operand" "r")
+  (const_int ))
+ (ashift:SI (match_operand:SI 2 "register_operand" "r")
+(const_int bytepick_w_ashift_amount]
""
"bytepick.w\t%0,%1,%2,"
[(set_attr "mode" "SI")])
@@ -4299,22 +4305,24 @@ (define_insn "bytepick_w_1_extend"
"bytepick.w\t%0,%2,%1,1"
[(set_attr "mode" "SI")])
  
-(define_insn "bytepick_d_"

+(define_insn "*bytepick_d_"
[(set (match_operand:DI 0 "register_operand" "=r")
-   (ior:DI (lshiftrt (match_operand:DI 1 "register_operand" "r")
- (const_int ))
-   (ashift (match_operand:DI 2 "register_operand" "r")
-   (const_int bytepick_d_ashift_amount]
+   (any_or_plus:DI
+ (lshiftrt (match_operand:DI 1 "register_operand" "r")
+   (const_int ))
+ (ashift (match_operand:DI 2 "register_operand" "r")
+ (const_int bytepick_d_ashift_amount]
"TARGET_64BIT"
"bytepick.d\t%0,%1,%2,"
[(set_attr "mode" "DI")])
  
-(define_insn "bytepick_d__rev"

+(define_insn "*bytepick_d__rev"
[(set (match_operand:DI 0 "register_operand" "=r")
-   (ior:DI (ashift (match_operand:DI 1 "register_operand" "r")
-   (const_int bytepick_d_ashift_amount))
-   (lshiftrt (match_operand:DI 2 "register_operand" "r")
-   

Re: [pushed] c++: don't default -frange-for-ext-temps in -std=gnu++20 [PR188574]

2025-02-13 Thread Jakub Jelinek
On Wed, Feb 12, 2025 at 12:07:53AM +0100, Jason Merrill wrote:
> Tested x86_64-pc-linux-gnu, applying to trunk.
> 
> -- 8< --
> 
> Since -frange-for-ext-temps has been causing trouble, let's not enable it
> by default in pre-C++23 GNU modes for GCC 15, and also allow disabling it in
> C++23 and up.

The reason for disallowing disabling it for C++23 and up has been feature
test macros, but admittedly that will only be a problem if/when C++26 or
C++29 etc. adopt another range-for paper and bump the
__cpp_range_based_for value again.  At that point, unless that change is
conditional on another flag, we'd need to require -frange-for-ext-temps
to be on.  This can certainly wait until that happens (if ever).

Jakub



Re: [PATCH 6/8] LoongArch: Simplify {lsx,lasx_x}vpick description

2025-02-13 Thread Lulu Cheng

Hi, Ruoyao:

When will it be convenient for you to submit the v2 version of the patch?

I am planning to merge the current patches and then test the optimal
values for -malign-{functions,labels,jumps,loops} on that basis.


在 2025/2/12 上午3:30, Xi Ruoyao 写道:

On Tue, 2025-02-11 at 16:52 +0800, Lulu Cheng wrote:

在 2025/2/7 下午8:09, Xi Ruoyao 写道:
/* snip */

-
-(define_insn "lasx_xvpickev_w"
-  [(set (match_operand:V8SI 0 "register_operand" "=f")
-   (vec_select:V8SI
-     (vec_concat:V16SI
-       (match_operand:V8SI 1 "register_operand" "f")
-       (match_operand:V8SI 2 "register_operand" "f"))
-     (parallel [(const_int 0) (const_int 2)
-    (const_int 8) (const_int 10)
-    (const_int 4) (const_int 6)
-    (const_int 12) (const_int 14)])))]
-  "ISA_HAS_LASX"
-  "xvpickev.w\t%u0,%u2,%u1"
-  [(set_attr "type" "simd_permute")
-   (set_attr "mode" "V8SI")])
-

/* snip */

+;; Picking even/odd elements.
+(define_insn "simd_pick_evod_"
+  [(set (match_operand:ALLVEC 0 "register_operand" "=f")
+   (vec_select:ALLVEC
+     (vec_concat:
+       (match_operand:ALLVEC 1 "register_operand" "f")
+       (match_operand:ALLVEC 2 "register_operand" "f"))
+     (match_operand: 3 "vect_par_cnst_even_or_odd_half")))]

For LASX, the generated select array is problematic, taking xvpickev.w
as an example:

xvpickev.w  vd,vj,vk

The behavior of the instruction is as follows:

vd.w[0] = vk.w[0]
vd.w[1] = vk.w[2]
vd.w[2] = vj.w[0]
vd.w[3] = vj.w[2]
vd.w[4] = vk.w[4]
vd.w[5] = vk.w[6]
vd.w[6] = vj.w[4]
vd.w[7] = vj.w[6]

Oops, stupid me.  Strangely, bootstrapping (even with BOOT_CFLAGS="-O2
-g -march=la664") and regtesting did not catch it.

I'll limit this to LSX in v2.





[COMMITTED] doc: Update install.texi for GCC 15 on Solaris

2025-02-13 Thread Rainer Orth
Apart from minor updates, this patch is primarily an important caveat
about binutils PR ld/32580, which has broken the binutils 2.44 ld on
Solaris/x86.

Tested on i386-pc-solaris2.11, committed to trunk.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2025-02-11  Rainer Orth  

gcc:
* doc/install.texi (Specific, *-*-solaris2*): Updates for newer
Solaris 11.4 SRUs and binutils 2.44.

# HG changeset patch
# Parent  e96fa536cfda3b63e25f7fa1bd6b17875d7ec056
doc: Update install.texi for GCC 15

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -4840,7 +4840,7 @@ Support for Solaris 10 has been removed 
 9 has been removed in GCC 5.  Support for Solaris 8 has been removed in
 GCC 4.8.  Support for Solaris 7 has been removed in GCC 4.6.
 
-Solaris 11.4 provides one or more of GCC 5, 7, 9, 10, 11, 12, and 13.
+Solaris 11.4 provides one or more of GCC 5, 7, 9, 10, 11, 12, 13, and 14.
 
 You need to install the @code{system/header}, @code{system/linker}, and
 @code{developer/assembler} packages.
@@ -4862,7 +4862,7 @@ conjunction with the Solaris linker.
 The GNU @command{as} versions included in Solaris 11.4, from GNU
 binutils 2.30.1 or newer (in @file{/usr/bin/gas} and
 @file{/usr/gnu/bin/as}), are known to work.  The version from GNU
-binutils 2.42 is known to work as well.  Recent versions of the Solaris
+binutils 2.44 is known to work as well.  Recent versions of the Solaris
 assembler in @file{/usr/bin/as} work almost as well, though.  To use GNU
 @command{as}, configure with the options @option{--with-gnu-as
 --with-as=@//usr/@/gnu/@/bin/@/as}.
@@ -4870,9 +4870,12 @@ assembler in @file{/usr/bin/as} work alm
 For linking, the Solaris linker is preferred.  If you want to use the
 GNU linker instead, the version in Solaris 11.4, from GNU binutils
 2.30.1 or newer (in @file{/usr/gnu/bin/ld} and @file{/usr/bin/gld}),
-works, as does the version from GNU binutils 2.42.  However, it
+works.  However, it
 generally lacks platform specific features, so better stay with Solaris
-@command{ld}.  To use the LTO linker plugin
+@command{ld}.  When using the version from GNU binutils 2.44, there's
+an important caveat: binutils @emph{must} be configured with
+@code{CONFIG_SHELL=/bin/bash}, otherwise the linker's built-in linker
+scripts get corrupted on x86.  To use the LTO linker plugin
 (@option{-fuse-linker-plugin}) with GNU @command{ld}, GNU binutils
 @emph{must} be configured with @option{--enable-largefile}.  To use
 Solaris @command{ld}, we recommend to configure with
@@ -4894,7 +4897,7 @@ will be disabled if no appropriate versi
 work.
 
 In order to build the GNU Ada compiler, GNAT, a working GNAT is needed.
-Since Solaris 11.4 SRU 39, GNAT 11, 12 or 13 is bundled in the
+Since Solaris 11.4 SRU 39, GNAT 11, 12, 13 or 14 is bundled in the
 @code{developer/gcc/gcc-gnat} package.
 
 In order to build the GNU D compiler, GDC, a working @samp{libphobos} is


Re: [PATCH v2] x86: Properly find the maximum stack slot alignment

2025-02-13 Thread Uros Bizjak
On Thu, Feb 13, 2025 at 9:31 AM H.J. Lu  wrote:
>
> Don't assume that stack slots can only be accessed by stack or frame
> registers.  We first find all registers defined by stack or frame
> registers.  Then check memory accesses by such registers, including
> stack and frame registers.
>
> gcc/
>
> PR target/109780
> PR target/109093
> * config/i386/i386.cc (ix86_update_stack_alignment): New.
> (ix86_find_all_reg_use_1): Likewise.
> (ix86_find_all_reg_use): Likewise.
> (ix86_find_max_used_stack_alignment): Also check memory accesses
> from registers defined by stack or frame registers.
>
> gcc/testsuite/
>
> PR target/109780
> PR target/109093
> * g++.target/i386/pr109780-1.C: New test.
> * gcc.target/i386/pr109093-1.c: Likewise.
> * gcc.target/i386/pr109780-1.c: Likewise.
> * gcc.target/i386/pr109780-2.c: Likewise.
> * gcc.target/i386/pr109780-3.c: Likewise.

Some non-algorithmic changes below, otherwise LGTM.  Please also get
someone to review the dataflow infrastructure usage; I am not well
versed in it.

+/* Helper function for ix86_find_all_reg_use.  */
+
+static void
+ix86_find_all_reg_use_1 (rtx set, HARD_REG_SET &stack_slot_access,
+ auto_bitmap &worklist)
+{
+  rtx src = SET_SRC (set);
+  if (MEM_P (src))

Also reject assignment from CONST_SCALAR_INT?

+return;
+
+  rtx dest = SET_DEST (set);
+  if (!REG_P (dest))
+return;

Can we switch these two so the test for REG_P (dest) will be first? We
are not interested in anything that doesn't assign to a register.

+/* Find all registers defined with REG.  */
+
+static void
+ix86_find_all_reg_use (HARD_REG_SET &stack_slot_access,
+   unsigned int reg, auto_bitmap &worklist)
+{
+  for (df_ref ref = DF_REG_USE_CHAIN (reg);
+   ref != NULL;
+   ref = DF_REF_NEXT_REG (ref))
+{
+  if (DF_REF_IS_ARTIFICIAL (ref))
+continue;
+
+  rtx_insn *insn = DF_REF_INSN (ref);
+  if (!NONDEBUG_INSN_P (insn))
+continue;

Here we pass only NONJUMP_INSN_P (X) || JUMP_P (X) || CALL_P (X)

+  if (CALL_P (insn) || JUMP_P (insn))
+continue;

And here remains only NONJUMP_INSN_P (X), so both above conditions
could be substituted with:

if (!NONJUMP_INSN_P (X))
  continue;
+
+  rtx set = single_set (insn);
+  if (set)
+ix86_find_all_reg_use_1 (set, stack_slot_access, worklist);
+
+  rtx pat = PATTERN (insn);
+  if (GET_CODE (pat) != PARALLEL)
+continue;
+
+  for (int i = 0; i < XVECLEN (pat, 0); i++)
+{
+  rtx exp = XVECEXP (pat, 0, i);
+  switch (GET_CODE (exp))
+{
+case ASM_OPERANDS:
+case CLOBBER:
+case PREFETCH:
+case USE:
+  break;
+case UNSPEC:
+case UNSPEC_VOLATILE:
+  for (int j = XVECLEN (exp, 0) - 1; j >= 0; j--)
+{
+  rtx x = XVECEXP (exp, 0, j);
+  if (GET_CODE (x) == SET)
+ix86_find_all_reg_use_1 (x, stack_slot_access,
+ worklist);
+}
+  break;
+case SET:
+  ix86_find_all_reg_use_1 (exp, stack_slot_access,
+   worklist);
+  break;
+default:
+  debug_rtx (exp);

Stray debug remaining?

+  HARD_REG_SET stack_slot_access;
+  CLEAR_HARD_REG_SET (stack_slot_access);
+
+  /* Stack slot can be accessed by stack pointer, frame pointer or
+ registers defined by stack pointer or frame pointer.  */
+  auto_bitmap worklist;

Please put a line of vertical space here ...

+  add_to_hard_reg_set (&stack_slot_access, Pmode,
+   STACK_POINTER_REGNUM);
+  bitmap_set_bit (worklist, STACK_POINTER_REGNUM);

... here ...

+  if (frame_pointer_needed)
+{
+  add_to_hard_reg_set (&stack_slot_access, Pmode,
+   HARD_FRAME_POINTER_REGNUM);
+  bitmap_set_bit (worklist, HARD_FRAME_POINTER_REGNUM);
+}

... here ...

+  unsigned int reg;

... here ...

+  do
+{
+  reg = bitmap_clear_first_set_bit (worklist);
+  ix86_find_all_reg_use (stack_slot_access, reg, worklist);
+}
+  while (!bitmap_empty_p (worklist));
+
+  hard_reg_set_iterator hrsi;

... here ...

+  EXECUTE_IF_SET_IN_HARD_REG_SET (stack_slot_access, 0, reg, hrsi)
+for (df_ref ref = DF_REG_USE_CHAIN (reg);
+ ref != NULL;
+ ref = DF_REF_NEXT_REG (ref))
+  {
+if (DF_REF_IS_ARTIFICIAL (ref))
+  continue;
+
+rtx_insn *insn = DF_REF_INSN (ref);

... and here.

+if (!NONDEBUG_INSN_P (insn))

!NONJUMP_INSN_P ?

+  continue;

Also some vertical space here.

+note_stores (insn, ix86_update_stack_alignment,
+ &stack_alignment);
+  }
 }

diff --git a/gcc/testsuite/gcc.target/i386/pr109093-1.c
b/gcc/testsuite/gcc.target/i386/pr109093-1.c
new file mode 100644
index 000..0459d1947f9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr109093-1.c
@@ -0,0 +1,39 @@
+/* { dg-do run }  */
+/* { dg-options "-O2 -mavx2 -mtune=znver1
-ftrivial-auto-var-init=zero -fno-stack-protector" } */
+

Please use

/* { dg-do run { target

[COMMITTED] build: Remove HAVE_LD_EH_FRAME_CIEV3

2025-02-13 Thread Rainer Orth
Old versions of Solaris ld and GNU ld didn't support CIEv3 in .eh_frame.
To avoid this breaking the build

[build] Default to DWARF 4 on Solaris if linker supports CIEv3
http://gcc.gnu.org/ml/gcc-patches/2013-03/msg00669.html

checked for the necessary linker support, defaulting to DWARF-2 if
necessary.  Solaris ld was fixed in Solaris 11.1, GNU ld in binutils
2.16, so this is long obsolete and only used in Solaris code anyway.

This patch thus removes both the configure check and
solaris_override_options.

Bootstrapped without regressions on i386-pc-solaris2.11 and
sparc-sun-solaris2.11.

Committed to trunk.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2025-02-12  Rainer Orth  

gcc:
* configure.ac (gcc_cv_ld_eh_frame_ciev3): Remove.
* configure, config.in: Regenerate.
* config/sol2.cc (solaris_override_options): Remove.
* config/sol2.h (SUBTARGET_OVERRIDE_OPTIONS): Remove.
* config/sol2-protos.h (solaris_override_options): Remove.

# HG changeset patch
# Parent  172c287f84e717c376d7214926fa3c33845335cb
build: Remove HAVE_LD_EH_FRAME_CIEV3

diff --git a/gcc/config.in b/gcc/config.in
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -1774,12 +1774,6 @@
 #endif
 
 
-/* Define 0/1 if your linker supports CIE v3 in .eh_frame. */
-#ifndef USED_FOR_TARGET
-#undef HAVE_LD_EH_FRAME_CIEV3
-#endif
-
-
 /* Define if your linker supports .eh_frame_hdr. */
 #undef HAVE_LD_EH_FRAME_HDR
 
diff --git a/gcc/config/sol2-protos.h b/gcc/config/sol2-protos.h
--- a/gcc/config/sol2-protos.h
+++ b/gcc/config/sol2-protos.h
@@ -24,7 +24,6 @@ extern void solaris_elf_asm_comdat_secti
 extern void solaris_file_end (void);
 extern void solaris_insert_attributes (tree, tree *);
 extern void solaris_output_init_fini (FILE *, tree);
-extern void solaris_override_options (void);
 
 /* In sol2-c.cc.  */
 extern void solaris_register_pragmas (void);
diff --git a/gcc/config/sol2.cc b/gcc/config/sol2.cc
--- a/gcc/config/sol2.cc
+++ b/gcc/config/sol2.cc
@@ -291,13 +291,4 @@ solaris_file_end (void)
 (NULL);
 }
 
-void
-solaris_override_options (void)
-{
-  /* Older versions of Solaris ld cannot handle CIE version 3 in .eh_frame.
- Don't emit DWARF3/4 unless specifically selected if so.  */
-  if (!HAVE_LD_EH_FRAME_CIEV3 && !OPTION_SET_P (dwarf_version))
-dwarf_version = 2;
-}
-
 #include "gt-sol2.h"
diff --git a/gcc/config/sol2.h b/gcc/config/sol2.h
--- a/gcc/config/sol2.h
+++ b/gcc/config/sol2.h
@@ -119,11 +119,6 @@ along with GCC; see the file COPYING3.  
 TARGET_SUB_OS_CPP_BUILTINS();			\
   } while (0)
 
-#define SUBTARGET_OVERRIDE_OPTIONS			\
-  do {			\
-solaris_override_options ();			\
-  } while (0)
-
 #if DEFAULT_ARCH32_P
 #define MULTILIB_DEFAULTS { "m32" }
 #else
diff --git a/gcc/configure b/gcc/configure
--- a/gcc/configure
+++ b/gcc/configure
@@ -32369,46 +32369,6 @@ fi
 { $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_ld_eh_frame_hdr" >&5
 $as_echo "$gcc_cv_ld_eh_frame_hdr" >&6; }
 
-{ $as_echo "$as_me:${as_lineno-$LINENO}: checking linker CIEv3 in .eh_frame support" >&5
-$as_echo_n "checking linker CIEv3 in .eh_frame support... " >&6; }
-gcc_cv_ld_eh_frame_ciev3=no
-if test $in_tree_ld = yes ; then
-  if test "$gcc_cv_gld_major_version" -eq 2 -a "$gcc_cv_gld_minor_version" -ge 16 -o "$gcc_cv_gld_major_version" -gt 2 \
- && test $in_tree_ld_is_elf = yes; then
-gcc_cv_ld_eh_frame_ciev3=yes
-  fi
-elif test x$gcc_cv_ld != x; then
-  if echo "$ld_ver" | grep GNU > /dev/null; then
-gcc_cv_ld_eh_frame_ciev3=yes
-if test 0"$ld_date" -lt 20040513; then
-  if test -n "$ld_date"; then
-	# If there was date string, but was earlier than 2004-05-13, fail
-	gcc_cv_ld_eh_frame_ciev3=no
-  elif test "$ld_vers_major" -lt 2; then
-	gcc_cv_ld_eh_frame_ciev3=no
-  elif test "$ld_vers_major" -eq 2 -a "$ld_vers_minor" -lt 16; then
-	gcc_cv_ld_eh_frame_ciev3=no
-  fi
-fi
-  else
-case "$target" in
-  *-*-solaris2*)
-# Sun ld added support for CIE v3 in .eh_frame in Solaris 11.1.
-if test "$ld_vers_major" -gt 1 || test "$ld_vers_minor" -ge 2324; then
-  gcc_cv_ld_eh_frame_ciev3=yes
-fi
-;;
-esac
-  fi
-fi
-
-cat >>confdefs.h <<_ACEOF
-#define HAVE_LD_EH_FRAME_CIEV3 `if test x"$gcc_cv_ld_eh_frame_ciev3" = xyes; then echo 1; else echo 0; fi`
-_ACEOF
-
-{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_ld_eh_frame_ciev3" >&5
-$as_echo "$gcc_cv_ld_eh_frame_ciev3" >&6; }
-
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking linker position independent executable support" >&5
 $as_echo_n "checking linker position independent executable support... " >&6; }
 gcc_cv_ld_pie=no
diff --git a/gcc/configure.ac b/gcc/configure.ac
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -6110,42 +6110,6 @@ if test x"$gcc_cv_ld_eh_frame_hdr" = xye
 fi
 AC_MSG_RESULT($gcc_cv_ld_eh_fram

[PATCH v2 4/8] LoongArch: Simplify {lsx_, lasx_x}vh{add, sub}w description

2025-02-13 Thread Xi Ruoyao
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use
special predicates and TImode RTL instead of hard-coded const vectors
and UNSPECs.

gcc/ChangeLog:

* config/loongarch/lasx.md (UNSPEC_LASX_XVHADDW_Q_D): Remove.
(UNSPEC_LASX_XVHSUBW_Q_D): Remove.
(UNSPEC_LASX_XVHADDW_QU_DU): Remove.
(UNSPEC_LASX_XVHSUBW_QU_DU): Remove.
(lasx_xvhw_h_b): Remove.
(lasx_xvhw_w_h): Remove.
(lasx_xvhw_d_w): Remove.
(lasx_xvhaddw_q_d): Remove.
(lasx_xvhsubw_q_d): Remove.
(lasx_xvhaddw_qu_du): Remove.
(lasx_xvhsubw_qu_du): Remove.
(reduc_plus_scal_v4di): Call gen_lasx_haddw_q_d_punned instead
of gen_lasx_xvhaddw_q_d.
(reduc_plus_scal_v8si): Likewise.
* config/loongarch/lsx.md (UNSPEC_LSX_VHADDW_Q_D): Remove.
(UNSPEC_ASX_VHSUBW_Q_D): Remove.
(UNSPEC_ASX_VHADDW_QU_DU): Remove.
(UNSPEC_ASX_VHSUBW_QU_DU): Remove.
(lsx_vhw_h_b): Remove.
(lsx_vhw_w_h): Remove.
(lsx_vhw_d_w): Remove.
(lsx_vhaddw_q_d): Remove.
(lsx_vhsubw_q_d): Remove.
(lsx_vhaddw_qu_du): Remove.
(lsx_vhsubw_qu_du): Remove.
(reduc_plus_scal_v2di): Change the temporary register mode to
V1TI, and pun the mode calling gen_vec_extractv2didi.
(reduc_plus_scal_v4si): Change the temporary register mode to
V1TI.
* config/loongarch/simd.md (simd_hw__): New
define_insn.
(_vhw__): New
define_expand.
(_hw_q_d_punned): New define_expand.
* config/loongarch/loongarch-builtins.cc
(CODE_FOR_lsx_vhaddw_q_d): Define as a macro to override with
punned expand.
(CODE_FOR_lsx_vhaddw_qu_du): Likewise.
(CODE_FOR_lsx_vhsubw_q_d): Likewise.
(CODE_FOR_lsx_vhsubw_qu_du): Likewise.
(CODE_FOR_lasx_xvhaddw_q_d): Likewise.
(CODE_FOR_lasx_xvhaddw_qu_du): Likewise.
(CODE_FOR_lasx_xvhsubw_q_d): Likewise.
(CODE_FOR_lasx_xvhsubw_qu_du): Likewise.
---
 gcc/config/loongarch/lasx.md   | 126 +
 gcc/config/loongarch/loongarch-builtins.cc |  10 ++
 gcc/config/loongarch/lsx.md| 108 +-
 gcc/config/loongarch/simd.md   |  52 +
 4 files changed, 69 insertions(+), 227 deletions(-)

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 640fa028f1e..1dc11840187 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -100,10 +100,6 @@ (define_c_enum "unspec" [
   UNSPEC_LASX_XVMADDWOD
   UNSPEC_LASX_XVMADDWOD2
   UNSPEC_LASX_XVMADDWOD3
-  UNSPEC_LASX_XVHADDW_Q_D
-  UNSPEC_LASX_XVHSUBW_Q_D
-  UNSPEC_LASX_XVHADDW_QU_DU
-  UNSPEC_LASX_XVHSUBW_QU_DU
   UNSPEC_LASX_XVADD_Q
   UNSPEC_LASX_XVSUB_Q
   UNSPEC_LASX_XVREPLVE
@@ -1407,76 +1403,6 @@ (define_insn "fixuns_trunc2"
(set_attr "cnv_mode" "")
(set_attr "mode" "")])
 
-(define_insn "lasx_xvhw_h_b"
-  [(set (match_operand:V16HI 0 "register_operand" "=f")
-   (addsub:V16HI
- (any_extend:V16HI
-   (vec_select:V16QI
- (match_operand:V32QI 1 "register_operand" "f")
- (parallel [(const_int 1) (const_int 3)
-(const_int 5) (const_int 7)
-(const_int 9) (const_int 11)
-(const_int 13) (const_int 15)
-(const_int 17) (const_int 19)
-(const_int 21) (const_int 23)
-(const_int 25) (const_int 27)
-(const_int 29) (const_int 31)])))
- (any_extend:V16HI
-   (vec_select:V16QI
- (match_operand:V32QI 2 "register_operand" "f")
- (parallel [(const_int 0) (const_int 2)
-(const_int 4) (const_int 6)
-(const_int 8) (const_int 10)
-(const_int 12) (const_int 14)
-(const_int 16) (const_int 18)
-(const_int 20) (const_int 22)
-(const_int 24) (const_int 26)
-(const_int 28) (const_int 30)])]
-  "ISA_HAS_LASX"
-  "xvhw.h.b\t%u0,%u1,%u2"
-  [(set_attr "type" "simd_int_arith")
-   (set_attr "mode" "V16HI")])
-
-(define_insn "lasx_xvhw_w_h"
-  [(set (match_operand:V8SI 0 "register_operand" "=f")
-   (addsub:V8SI
- (any_extend:V8SI
-   (vec_select:V8HI
- (match_operand:V16HI 1 "register_operand" "f")
- (parallel [(const_int 1) (const_int 3)
-(const_int 5) (const_int 7)
-(const_int 9) (const_int 11)
-(const_int 13) (const_int 15)])))
- (any_extend:V8SI
-   (vec_select:V8HI
- (match_operand:V16HI 2 "register_operand" "f")
- (parallel [(const_int 0) (const_int 2)
-(const_int 4) (const_int 6)
-(const_int 8

[PATCH v2 0/8] LoongArch: SIMD odd/even/horizontal widening arithmetic cleanup and optimization

2025-02-13 Thread Xi Ruoyao
This series is intended to fix some test failures in
vect-reduc-chain-*.c by adding the [su]dot_prod* expands for LSX and
LASX vector modes.  But the code for the related instructions was not
very readable, so the series cleans it up first (using the approach
learnt from AArch64) before adding the expands.

v1 => v2:

- Only simplify vpick{ev,od}, not xvpick{ev,od} (where
  vect_par_cnst_even_or_odd_half is not suitable).
- Keep {sign,zero}_extend out of vec_select.
- Remove vect_par_cnst_{even,odd}_half for simd_hw__,
  to simplify the code and allow it to match the RTL in case the even
  half is selected for the left operand of addsub.  Swap the operands if
  needed when outputting the asm.
- Fix typos in commit subjects.
- Mention V2TI in loongarch-modes.def.

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

Xi Ruoyao (8):
  LoongArch: Try harder using vrepli instructions to materialize const
vectors
  LoongArch: Allow moving TImode vectors
  LoongArch: Simplify {lsx_,lasx_x}v{add,sub,mul}l{ev,od} description
  LoongArch: Simplify {lsx_,lasx_x}vh{add,sub}w description
  LoongArch: Simplify {lsx_,lasx_x}vmaddw description
  LoongArch: Simplify lsx_vpick description
  LoongArch: Implement vec_widen_mult_{even,odd}_* for LSX and LASX
modes
  LoongArch: Implement [su]dot_prod* for LSX and LASX modes

 gcc/config/loongarch/constraints.md   |2 +-
 gcc/config/loongarch/lasx.md  | 1070 +
 gcc/config/loongarch/loongarch-builtins.cc|   60 +
 gcc/config/loongarch/loongarch-modes.def  |5 +-
 gcc/config/loongarch/loongarch-protos.h   |3 +
 gcc/config/loongarch/loongarch.cc |   50 +-
 gcc/config/loongarch/loongarch.md |2 +-
 gcc/config/loongarch/lsx.md   | 1006 +---
 gcc/config/loongarch/predicates.md|   27 +
 gcc/config/loongarch/simd.md  |  390 +-
 gcc/testsuite/gcc.target/loongarch/vrepli.c   |   15 +
 .../gcc.target/loongarch/wide-mul-reduc-1.c   |   18 +
 .../gcc.target/loongarch/wide-mul-reduc-2.c   |   18 +
 13 files changed, 612 insertions(+), 2054 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vrepli.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-1.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/wide-mul-reduc-2.c

-- 
2.48.1



[PATCH v2 1/8] LoongArch: Try harder using vrepli instructions to materialize const vectors

2025-02-13 Thread Xi Ruoyao
For

  a = (v4si){0xdddddddd, 0xdddddddd, 0xdddddddd, 0xdddddddd}

we just want

  vrepli.b $vr0, 0xdd

but the compiler actually produces a load:

  la.local $r14,.LC0
  vld  $vr0,$r14,0

It's because we only tried vrepli.d, which wouldn't work.  Try all the
vrepli instructions when materializing const int vectors to fix it.

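The selection logic is easy to model outside GCC: reinterpret the 128-bit constant at each element width, from bytes up to doublewords, and take the first width at which every element is the same value in the vrepli immediate range.  A standalone sketch (hypothetical helper name, not the GCC implementation; assumes a little-endian host, as LoongArch itself is):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>

// Standalone model of the selection logic: reinterpret the 128-bit
// constant at each element width and return the suffix of the first
// [x]vrepli form whose elements are all equal and fit the signed
// 10-bit immediate range [-512, 511].
std::optional<std::string> vrepli_suffix (const uint8_t bytes[16])
{
  const struct { int size; const char *suffix; } widths[] = {
    { 1, "b" }, { 2, "h" }, { 4, "w" }, { 8, "d" },
  };
  for (const auto &w : widths)
    {
      int64_t first = 0;
      bool ok = true;
      for (int i = 0; i < 16; i += w.size)
        {
          uint64_t raw = 0;
          std::memcpy (&raw, bytes + i, w.size);
          /* Sign-extend the element from w.size bytes to 64 bits.  */
          int shift = 64 - 8 * w.size;
          int64_t v = (int64_t) (raw << shift) >> shift;
          if (i == 0)
            first = v;
          else if (v != first)
            ok = false;
        }
      if (ok && first >= -512 && first <= 511)
        return std::string (w.suffix);
    }
  return std::nullopt;
}
```

For the all-0xdd vector above this selects the `b` form (0xdd reads as the signed byte -35), matching the desired vrepli.b; a constant that replicates at no width yields no suffix, i.e. the constant-pool load remains.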
gcc/ChangeLog:

* config/loongarch/loongarch-protos.h
(loongarch_const_vector_vrepli): New function prototype.
* config/loongarch/loongarch.cc (loongarch_const_vector_vrepli):
Implement.
(loongarch_const_insns): Call loongarch_const_vector_vrepli
instead of loongarch_const_vector_same_int_p.
(loongarch_split_vector_move_p): Likewise.
(loongarch_output_move): Use loongarch_const_vector_vrepli to
pun operand[1] into a better mode if it's a const int vector,
and decide the suffix of [x]vrepli with the new mode.
* config/loongarch/constraints.md (YI): Call
loongarch_const_vector_vrepli instead of
loongarch_const_vector_same_int_p.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vrepli.c: New test.
---
 gcc/config/loongarch/constraints.md |  2 +-
 gcc/config/loongarch/loongarch-protos.h |  1 +
 gcc/config/loongarch/loongarch.cc   | 34 ++---
 gcc/testsuite/gcc.target/loongarch/vrepli.c | 15 +
 4 files changed, 46 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vrepli.c

diff --git a/gcc/config/loongarch/constraints.md 
b/gcc/config/loongarch/constraints.md
index a7c31c2c4e0..97a4e4e35d3 100644
--- a/gcc/config/loongarch/constraints.md
+++ b/gcc/config/loongarch/constraints.md
@@ -301,7 +301,7 @@ (define_constraint "YI"
A replicated vector const in which the replicated value is in the range
[-512,511]."
   (and (match_code "const_vector")
-   (match_test "loongarch_const_vector_same_int_p (op, mode, -512, 511)")))
+   (match_test "loongarch_const_vector_vrepli (op, mode)")))
 
 (define_constraint "YC"
   "@internal
diff --git a/gcc/config/loongarch/loongarch-protos.h 
b/gcc/config/loongarch/loongarch-protos.h
index b99f949a004..20acca690c8 100644
--- a/gcc/config/loongarch/loongarch-protos.h
+++ b/gcc/config/loongarch/loongarch-protos.h
@@ -121,6 +121,7 @@ extern bool loongarch_const_vector_same_int_p (rtx, 
machine_mode,
 extern bool loongarch_const_vector_shuffle_set_p (rtx, machine_mode);
 extern bool loongarch_const_vector_bitimm_set_p (rtx, machine_mode);
 extern bool loongarch_const_vector_bitimm_clr_p (rtx, machine_mode);
+extern rtx loongarch_const_vector_vrepli (rtx, machine_mode);
 extern rtx loongarch_lsx_vec_parallel_const_half (machine_mode, bool);
 extern rtx loongarch_gen_const_int_vector (machine_mode, HOST_WIDE_INT);
 extern enum reg_class loongarch_secondary_reload_class (enum reg_class,
diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index e9978370e8c..e036f802fde 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -1846,6 +1846,28 @@ loongarch_const_vector_shuffle_set_p (rtx op, 
machine_mode mode)
   return true;
 }
 
+rtx
+loongarch_const_vector_vrepli (rtx x, machine_mode mode)
+{
+  int size = GET_MODE_SIZE (mode);
+
+  if (GET_CODE (x) != CONST_VECTOR
+  || GET_MODE_CLASS (mode) != MODE_VECTOR_INT)
+return NULL_RTX;
+
+  for (scalar_int_mode elem_mode: {QImode, HImode, SImode, DImode})
+{
+  machine_mode new_mode =
+   mode_for_vector (elem_mode, size / GET_MODE_SIZE (elem_mode))
+ .require ();
+  rtx op = lowpart_subreg (new_mode, x, mode);
+  if (loongarch_const_vector_same_int_p (op, new_mode, -512, 511))
+   return op;
+}
+
+  return NULL_RTX;
+}
+
 /* Return true if rtx constants of mode MODE should be put into a small
data section.  */
 
@@ -2501,7 +2523,7 @@ loongarch_const_insns (rtx x)
 case CONST_VECTOR:
   if ((LSX_SUPPORTED_MODE_P (GET_MODE (x))
   || LASX_SUPPORTED_MODE_P (GET_MODE (x)))
- && loongarch_const_vector_same_int_p (x, GET_MODE (x), -512, 511))
+ && loongarch_const_vector_vrepli (x, GET_MODE (x)))
return 1;
   /* Fall through.  */
 case CONST_DOUBLE:
@@ -4656,7 +4678,7 @@ loongarch_split_vector_move_p (rtx dest, rtx src)
   /* Check for vector set to an immediate const vector with valid replicated
  element.  */
   if (FP_REG_RTX_P (dest)
-  && loongarch_const_vector_same_int_p (src, GET_MODE (src), -512, 511))
+  && loongarch_const_vector_vrepli (src, GET_MODE (src)))
 return false;
 
   /* Check for vector load zero immediate.  */
@@ -4792,13 +4814,15 @@ loongarch_output_move (rtx *operands)
   && src_code == CONST_VECTOR
   && CONST_INT_P (CONST_VECTOR_ELT (src, 0)))
 {
-  gcc_assert (loongarch_const_vector_same_int_p (src, mode, -512, 511));
+  operands[1] = loongarch_const_vector_vrepli (src, mode);
+  gcc_assert (operands[1]);

[PATCH v2 2/8] LoongArch: Allow moving TImode vectors

2025-02-13 Thread Xi Ruoyao
We have some vector instructions for operations on 128-bit integer, i.e.
TImode, vectors.  Previously they had been modeled with unspecs, but
it's more natural to just model them with TImode vector RTL expressions.

For the preparation, allow moving V1TImode and V2TImode vectors in LSX
and LASX registers so we won't get a reload failure when we start to
save TImode vectors in these registers.

This implicitly depends on the vrepli optimization: without it we'd try
"vrepli.q" which does not really exist and trigger an ICE.

gcc/ChangeLog:

* config/loongarch/lsx.md (mov<mode>): Remove.
(movmisalign<mode>): Remove.
(mov<mode>_lsx): Remove.
* config/loongarch/lasx.md (mov<mode>): Remove.
(movmisalign<mode>): Remove.
(mov<mode>_lasx): Remove.
* config/loongarch/simd.md (ALLVEC_TI): New mode iterator.
(mov<mode>): Likewise.
(mov<mode>_simd): New define_insn_and_split.
---
 gcc/config/loongarch/lasx.md | 40 --
 gcc/config/loongarch/lsx.md  | 36 ---
 gcc/config/loongarch/simd.md | 42 
 3 files changed, 42 insertions(+), 76 deletions(-)

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index a37c85a25a4..d82ad61be60 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -699,46 +699,6 @@ (define_expand "lasx_xvrepli"
   DONE;
 })
 
-(define_expand "mov<mode>"
-  [(set (match_operand:LASX 0)
-   (match_operand:LASX 1))]
-  "ISA_HAS_LASX"
-{
-  if (loongarch_legitimize_move (<MODE>mode, operands[0], operands[1]))
-DONE;
-})
-
-
-(define_expand "movmisalign<mode>"
-  [(set (match_operand:LASX 0)
-   (match_operand:LASX 1))]
-  "ISA_HAS_LASX"
-{
-  if (loongarch_legitimize_move (<MODE>mode, operands[0], operands[1]))
-DONE;
-})
-
-;; 256-bit LASX modes can only exist in LASX registers or memory.
-(define_insn "mov<mode>_lasx"
-  [(set (match_operand:LASX 0 "nonimmediate_operand" "=f,f,R,*r,*f")
-   (match_operand:LASX 1 "move_operand" "fYGYI,R,f,*f,*r"))]
-  "ISA_HAS_LASX"
-  { return loongarch_output_move (operands); }
-  [(set_attr "type" "simd_move,simd_load,simd_store,simd_copy,simd_insert")
-   (set_attr "mode" "<MODE>")
-   (set_attr "length" "8,4,4,4,4")])
-
-
-(define_split
-  [(set (match_operand:LASX 0 "nonimmediate_operand")
-   (match_operand:LASX 1 "move_operand"))]
-  "reload_completed && ISA_HAS_LASX
-   && loongarch_split_move_p (operands[0], operands[1])"
-  [(const_int 0)]
-{
-  loongarch_split_move (operands[0], operands[1]);
-  DONE;
-})
 
 ;; LASX
 (define_insn "add3"
diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
index ca0066a21ed..bcc5ae85fb3 100644
--- a/gcc/config/loongarch/lsx.md
+++ b/gcc/config/loongarch/lsx.md
@@ -575,42 +575,6 @@ (define_insn "lsx_vshuf_"
   [(set_attr "type" "simd_sld")
(set_attr "mode" "")])
 
-(define_expand "mov<mode>"
-  [(set (match_operand:LSX 0)
-   (match_operand:LSX 1))]
-  "ISA_HAS_LSX"
-{
-  if (loongarch_legitimize_move (<MODE>mode, operands[0], operands[1]))
-DONE;
-})
-
-(define_expand "movmisalign<mode>"
-  [(set (match_operand:LSX 0)
-   (match_operand:LSX 1))]
-  "ISA_HAS_LSX"
-{
-  if (loongarch_legitimize_move (<MODE>mode, operands[0], operands[1]))
-DONE;
-})
-
-(define_insn "mov<mode>_lsx"
-  [(set (match_operand:LSX 0 "nonimmediate_operand" "=f,f,R,*r,*f,*r")
-   (match_operand:LSX 1 "move_operand" "fYGYI,R,f,*f,*r,*r"))]
-  "ISA_HAS_LSX"
-{ return loongarch_output_move (operands); }
-  [(set_attr "type" 
"simd_move,simd_load,simd_store,simd_copy,simd_insert,simd_copy")
-   (set_attr "mode" "<MODE>")])
-
-(define_split
-  [(set (match_operand:LSX 0 "nonimmediate_operand")
-   (match_operand:LSX 1 "move_operand"))]
-  "reload_completed && ISA_HAS_LSX
-   && loongarch_split_move_p (operands[0], operands[1])"
-  [(const_int 0)]
-{
-  loongarch_split_move (operands[0], operands[1]);
-  DONE;
-})
 
 ;; Integer operations
 (define_insn "add3"
diff --git a/gcc/config/loongarch/simd.md b/gcc/config/loongarch/simd.md
index 7605b17d21e..61fc1ab20ad 100644
--- a/gcc/config/loongarch/simd.md
+++ b/gcc/config/loongarch/simd.md
@@ -130,6 +130,48 @@ (define_mode_attr bitimm [(V16QI "uimm3") (V32QI "uimm3")
 ;; instruction here so we can avoid duplicating logics.
 ;; ===
 
+
+;; Move
+
+;; Some immediate values in V1TI or V2TI may be stored in LSX or LASX
+;; registers, thus we need to allow moving them for reload.
+(define_mode_iterator ALLVEC_TI [ALLVEC
+(V1TI "ISA_HAS_LSX")
+(V2TI "ISA_HAS_LASX")])
+
+(define_expand "mov<mode>"
+  [(set (match_operand:ALLVEC_TI 0)
+   (match_operand:ALLVEC_TI 1))]
+  ""
+{
+  if (loongarch_legitimize_move (<MODE>mode, operands[0], operands[1]))
+DONE;
+})
+
+(define_expand "movmisalign<mode>"
+  [(set (match_operand:ALLVEC_TI 0)
+   (match_operand:ALLVEC_TI 1))]
+  ""
+{
+  if (loongarch_legitimize_move (<MODE>mode, operands[0], operands[1])

Re: [PATCH v4] [ifcombine] avoid creating out-of-bounds BIT_FIELD_REFs [PR118514]

2025-02-13 Thread Sam James
Alexandre Oliva  writes:

> On Feb  6, 2025, Sam James  wrote:
>
>> Richard Biener  writes:
>>> On Thu, Feb 6, 2025 at 2:41 PM Alexandre Oliva  wrote:
 
 On Jan 27, 2025, Richard Biener  wrote:
 > (I see the assert is no longer in the patch).
 
 That's because the assert went in as part of an earlier patch.  I take
 it it should be backed out along with the to-be-split-out bits above,
 right?
>>> 
>>> Yes.
>>> 
>>> (IIRC there's also a PR tripping over this or a similar assert)
>
>> Right, PR118706.
>
> Thanks.  I've added its testcase to the patch below, reverted the
> assert, and dropped the other unwanted bits.  Regstrapped on
> x86_64-linux-gnu.  Ok to install?

Thanks. BTW, there's another for you at PR118805 (sorry). 

>
>
>
> If decode_field_reference finds a load that accesses past the inner
> object's size, bail out.
>
> Drop the too-strict assert.
>
>
> for  gcc/ChangeLog
>
>   PR tree-optimization/118514
>   PR tree-optimization/118706
>   * gimple-fold.cc (decode_field_reference): Refuse to consider
>   merging out-of-bounds BIT_FIELD_REFs.
>   (make_bit_field_load): Drop too-strict assert.
>   * tree-eh.cc (bit_field_ref_in_bounds_p): Rename to...
>   (access_in_bounds_of_type_p): ... this.  Change interface,
>   export.
>   (tree_could_trap_p): Adjust.
>   * tree-eh.h (access_in_bounds_of_type_p): Declare.
>
> for  gcc/testsuite/ChangeLog
>
>   PR tree-optimization/118514
>   PR tree-optimization/118706
>   * gcc.dg/field-merge-25.c: New.
> ---
>  gcc/gimple-fold.cc|   11 ++-
>  gcc/testsuite/gcc.dg/field-merge-25.c |   15 +++
>  gcc/tree-eh.cc|   25 +
>  gcc/tree-eh.h |1 +
>  4 files changed, 31 insertions(+), 21 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/field-merge-25.c
>
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index 45485782cdf91..29191685a43c5 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -7686,10 +7686,8 @@ decode_field_reference (tree *pexp, HOST_WIDE_INT 
> *pbitsize,
>|| bs <= shiftrt
>|| offset != 0
>|| TREE_CODE (inner) == PLACEHOLDER_EXPR
> -  /* Reject out-of-bound accesses (PR79731).  */
> -  || (! AGGREGATE_TYPE_P (TREE_TYPE (inner))
> -   && compare_tree_int (TYPE_SIZE (TREE_TYPE (inner)),
> -bp + bs) < 0)
> +  /* Reject out-of-bound accesses (PR79731, PR118514).  */
> +  || !access_in_bounds_of_type_p (TREE_TYPE (inner), bs, bp)
>|| (INTEGRAL_TYPE_P (TREE_TYPE (inner))
> && !type_has_mode_precision_p (TREE_TYPE (inner
>  return NULL_TREE;
> @@ -7859,11 +7857,6 @@ make_bit_field_load (location_t loc, tree inner, tree 
> orig_inner, tree type,
>gimple *new_stmt = gsi_stmt (i);
>if (gimple_has_mem_ops (new_stmt))
>   gimple_set_vuse (new_stmt, reaching_vuse);
> -  gcc_checking_assert (! (gimple_assign_load_p (point)
> -   && gimple_assign_load_p (new_stmt))
> -|| (tree_could_trap_p (gimple_assign_rhs1 (point))
> -== tree_could_trap_p (gimple_assign_rhs1
> -  (new_stmt;
>  }
>  
>gimple_stmt_iterator gsi = gsi_for_stmt (point);
> diff --git a/gcc/testsuite/gcc.dg/field-merge-25.c 
> b/gcc/testsuite/gcc.dg/field-merge-25.c
> new file mode 100644
> index 0..e769b0ae7b846
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/field-merge-25.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fno-tree-fre" } */
> +
> +/* PR tree-optimization/118706 */
> +
> +int a[1][1][3], b;
> +int main() {
> +  int c = -1;
> +  while (b) {
> +if (a[c][c][6])
> +  break;
> +if (a[0][0][0])
> +  break;
> +  }
> +}
> diff --git a/gcc/tree-eh.cc b/gcc/tree-eh.cc
> index 7015189a2de83..a4d59954c0597 100644
> --- a/gcc/tree-eh.cc
> +++ b/gcc/tree-eh.cc
> @@ -2646,24 +2646,22 @@ range_in_array_bounds_p (tree ref)
>return true;
>  }
>  
> -/* Return true iff EXPR, a BIT_FIELD_REF, accesses a bit range that is known 
> to
> -   be in bounds for the referred operand type.  */
> +/* Return true iff a BIT_FIELD_REF <(TYPE)???, SIZE, OFFSET> would access a 
> bit
> +   range that is known to be in bounds for TYPE.  */
>  
> -static bool
> -bit_field_ref_in_bounds_p (tree expr)
> +bool
> +access_in_bounds_of_type_p (tree type, poly_uint64 size, poly_uint64 offset)
>  {
> -  tree size_tree;
> -  poly_uint64 size_max, min, wid, max;
> +  tree type_size_tree;
> +  poly_uint64 type_size_max, min = offset, wid = size, max;
>  
> -  size_tree = TYPE_SIZE (TREE_TYPE (TREE_OPERAND (expr, 0)));
> -  if (!size_tree || !poly_int_tree_p (size_tree, &size_max))
> +  type_size_tree = TYPE_SIZE (type);
> +  if (!type_size_tree || !poly_int_tree_p (type_size_tree, &type_size_max))
>  retu

[PATCH, FYI] [testsuite] fix check-function-bodies usage

2025-02-13 Thread Alexandre Oliva


The existing usage comment for check-function-bodies is presumably a
typo, as it doesn't match existing uses.  Fix it.

Tested on x86_64-linux-gnu.  I'm going to install it as obvious if there
are no objections in the next 24 hours.


for  gcc/testsuite/ChangeLog

* lib/scanasm.exp (check-function-bodies): Fix usage comment.
---
 gcc/testsuite/lib/scanasm.exp |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/scanasm.exp b/gcc/testsuite/lib/scanasm.exp
index beffedd5bce46..97935cb23c3cf 100644
--- a/gcc/testsuite/lib/scanasm.exp
+++ b/gcc/testsuite/lib/scanasm.exp
@@ -985,7 +985,7 @@ proc check_function_body { functions name body_regexp } {
 
 # Check the implementations of functions against expected output.  Used as:
 #
-# { dg-do { check-function-bodies PREFIX TERMINATOR[ OPTION[ SELECTOR 
[MATCHED]]] } }
+# { dg-final { check-function-bodies PREFIX TERMINATOR[ OPTION[ SELECTOR 
[MATCHED]]] } }
 #
 # See sourcebuild.texi for details.
 

-- 
Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


[PATCH v2 6/8] LoongArch: Simplify lsx_vpick description

2025-02-13 Thread Xi Ruoyao
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use
special predicates instead of hard-coded const vectors.

This is not suitable for LASX, where lasx_xvpick has different
semantics.

gcc/ChangeLog:

* config/loongarch/simd.md (LVEC): New define_mode_attr.
(simdfmt_as_i): Make it the same as simdfmt for integer vector
modes.
(_f): New define_mode_attr.
* config/loongarch/lsx.md (lsx_vpickev_b): Remove.
(lsx_vpickev_h): Remove.
(lsx_vpickev_w): Remove.
(lsx_vpickev_w_f): Remove.
(lsx_vpickod_b): Remove.
(lsx_vpickod_h): Remove.
(lsx_vpickod_w): Remove.
(lsx_vpickev_w_f): Remove.
(lsx_pick_evod_): New define_insn.
(lsx_vpick_<_f>): New
define_expand.
---
 gcc/config/loongarch/lsx.md  | 142 ++-
 gcc/config/loongarch/simd.md |  24 +-
 2 files changed, 47 insertions(+), 119 deletions(-)

diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
index c7df04c6389..9d7254768ae 100644
--- a/gcc/config/loongarch/lsx.md
+++ b/gcc/config/loongarch/lsx.md
@@ -1624,125 +1624,33 @@ (define_insn "lsx_nor_"
   [(set_attr "type" "simd_logic")
(set_attr "mode" "")])
 
-(define_insn "lsx_vpickev_b"
-[(set (match_operand:V16QI 0 "register_operand" "=f")
-  (vec_select:V16QI
-   (vec_concat:V32QI
- (match_operand:V16QI 1 "register_operand" "f")
- (match_operand:V16QI 2 "register_operand" "f"))
-   (parallel [(const_int 0) (const_int 2)
-  (const_int 4) (const_int 6)
-  (const_int 8) (const_int 10)
-  (const_int 12) (const_int 14)
-  (const_int 16) (const_int 18)
-  (const_int 20) (const_int 22)
-  (const_int 24) (const_int 26)
-  (const_int 28) (const_int 30)])))]
-  "ISA_HAS_LSX"
-  "vpickev.b\t%w0,%w2,%w1"
-  [(set_attr "type" "simd_permute")
-   (set_attr "mode" "V16QI")])
-
-(define_insn "lsx_vpickev_h"
-[(set (match_operand:V8HI 0 "register_operand" "=f")
-  (vec_select:V8HI
-   (vec_concat:V16HI
- (match_operand:V8HI 1 "register_operand" "f")
- (match_operand:V8HI 2 "register_operand" "f"))
-   (parallel [(const_int 0) (const_int 2)
-  (const_int 4) (const_int 6)
-  (const_int 8) (const_int 10)
-  (const_int 12) (const_int 14)])))]
-  "ISA_HAS_LSX"
-  "vpickev.h\t%w0,%w2,%w1"
-  [(set_attr "type" "simd_permute")
-   (set_attr "mode" "V8HI")])
-
-(define_insn "lsx_vpickev_w"
-[(set (match_operand:V4SI 0 "register_operand" "=f")
-  (vec_select:V4SI
-   (vec_concat:V8SI
- (match_operand:V4SI 1 "register_operand" "f")
- (match_operand:V4SI 2 "register_operand" "f"))
-   (parallel [(const_int 0) (const_int 2)
-  (const_int 4) (const_int 6)])))]
-  "ISA_HAS_LSX"
-  "vpickev.w\t%w0,%w2,%w1"
-  [(set_attr "type" "simd_permute")
-   (set_attr "mode" "V4SI")])
-
-(define_insn "lsx_vpickev_w_f"
-[(set (match_operand:V4SF 0 "register_operand" "=f")
-  (vec_select:V4SF
-   (vec_concat:V8SF
- (match_operand:V4SF 1 "register_operand" "f")
- (match_operand:V4SF 2 "register_operand" "f"))
-   (parallel [(const_int 0) (const_int 2)
-  (const_int 4) (const_int 6)])))]
-  "ISA_HAS_LSX"
-  "vpickev.w\t%w0,%w2,%w1"
-  [(set_attr "type" "simd_permute")
-   (set_attr "mode" "V4SF")])
-
-(define_insn "lsx_vpickod_b"
-[(set (match_operand:V16QI 0 "register_operand" "=f")
-  (vec_select:V16QI
-   (vec_concat:V32QI
- (match_operand:V16QI 1 "register_operand" "f")
- (match_operand:V16QI 2 "register_operand" "f"))
-   (parallel [(const_int 1) (const_int 3)
-  (const_int 5) (const_int 7)
-  (const_int 9) (const_int 11)
-  (const_int 13) (const_int 15)
-  (const_int 17) (const_int 19)
-  (const_int 21) (const_int 23)
-  (const_int 25) (const_int 27)
-  (const_int 29) (const_int 31)])))]
-  "ISA_HAS_LSX"
-  "vpickod.b\t%w0,%w2,%w1"
-  [(set_attr "type" "simd_permute")
-   (set_attr "mode" "V16QI")])
-
-(define_insn "lsx_vpickod_h"
-[(set (match_operand:V8HI 0 "register_operand" "=f")
-  (vec_select:V8HI
-   (vec_concat:V16HI
- (match_operand:V8HI 1 "register_operand" "f")
- (match_operand:V8HI 2 "register_operand" "f"))
-   (parallel [(const_int 1) (const_int 3)
-  (const_int 5) (const_int 7)
-  (const_int 9) (const_int 11)
-  (const_int 13) (const_int 15)])))]
-  "ISA_HAS_LSX"
-  "vpickod.h\t%w0,%w2,%w1"
-  [(set_attr "type" "simd_permute")
-   (set_attr "mode" "V8HI")])
-
-(define_insn "lsx_vpickod_w"
-[(set (match_operand:V4SI 0 "register_operand" "=f")
-  (vec_select:V4SI
-   (vec_concat:V8SI
- (match_operand:V4SI 1 "register_operand" "f")
-   

[PATCH v2 5/8] LoongArch: Simplify {lsx_,lasx_x}vmaddw description

2025-02-13 Thread Xi Ruoyao
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use
special predicates and TImode RTL instead of hard-coded const vectors
and UNSPECs.

Also reorder two operands of the outer plus in the template, so combine
will recognize {x,}vadd + {x,}vmulw{ev,od} => {x,}vmaddw{ev,od}.

gcc/ChangeLog:

* config/loongarch/lasx.md (UNSPEC_LASX_XVMADDWEV): Remove.
(UNSPEC_LASX_XVMADDWEV2): Remove.
(UNSPEC_LASX_XVMADDWEV3): Remove.
(UNSPEC_LASX_XVMADDWOD): Remove.
(UNSPEC_LASX_XVMADDWOD2): Remove.
(UNSPEC_LASX_XVMADDWOD3): Remove.
(lasx_xvmaddwev_h_b): Remove.
(lasx_xvmaddwev_w_h): Remove.
(lasx_xvmaddwev_d_w): Remove.
(lasx_xvmaddwev_q_d): Remove.
(lasx_xvmaddwod_h_b): Remove.
(lasx_xvmaddwod_w_h): Remove.
(lasx_xvmaddwod_d_w): Remove.
(lasx_xvmaddwod_q_d): Remove.
(lasx_xvmaddwev_q_du): Remove.
(lasx_xvmaddwod_q_du): Remove.
(lasx_xvmaddwev_h_bu_b): Remove.
(lasx_xvmaddwev_w_hu_h): Remove.
(lasx_xvmaddwev_d_wu_w): Remove.
(lasx_xvmaddwev_q_du_d): Remove.
(lasx_xvmaddwod_h_bu_b): Remove.
(lasx_xvmaddwod_w_hu_h): Remove.
(lasx_xvmaddwod_d_wu_w): Remove.
(lasx_xvmaddwod_q_du_d): Remove.
* config/loongarch/lsx.md (UNSPEC_LSX_VMADDWEV): Remove.
(UNSPEC_LSX_VMADDWEV2): Remove.
(UNSPEC_LSX_VMADDWEV3): Remove.
(UNSPEC_LSX_VMADDWOD): Remove.
(UNSPEC_LSX_VMADDWOD2): Remove.
(UNSPEC_LSX_VMADDWOD3): Remove.
(lsx_vmaddwev_h_b): Remove.
(lsx_vmaddwev_w_h): Remove.
(lsx_vmaddwev_d_w): Remove.
(lsx_vmaddwev_q_d): Remove.
(lsx_vmaddwod_h_b): Remove.
(lsx_vmaddwod_w_h): Remove.
(lsx_vmaddwod_d_w): Remove.
(lsx_vmaddwod_q_d): Remove.
(lsx_vmaddwev_q_du): Remove.
(lsx_vmaddwod_q_du): Remove.
(lsx_vmaddwev_h_bu_b): Remove.
(lsx_vmaddwev_w_hu_h): Remove.
(lsx_vmaddwev_d_wu_w): Remove.
(lsx_vmaddwev_q_du_d): Remove.
(lsx_vmaddwod_h_bu_b): Remove.
(lsx_vmaddwod_w_hu_h): Remove.
(lsx_vmaddwod_d_wu_w): Remove.
(lsx_vmaddwod_q_du_d): Remove.
* config/loongarch/simd.md (simd_maddw_evod__):
New define_insn.
(_vmaddw__): New
define_expand.
(simd_maddw_evod__hetero): New define_insn.
(_vmaddw__u_):
New define_expand.
(_maddw_q_d_punned): New define_expand.
(_maddw_q_du_d_punned): New define_expand.
* config/loongarch/loongarch-builtins.cc
(CODE_FOR_lsx_vmaddwev_q_d): Define as a macro to override it
with the punned expand.
(CODE_FOR_lsx_vmaddwev_q_du): Likewise.
(CODE_FOR_lsx_vmaddwev_q_du_d): Likewise.
(CODE_FOR_lsx_vmaddwod_q_d): Likewise.
(CODE_FOR_lsx_vmaddwod_q_du): Likewise.
(CODE_FOR_lsx_vmaddwod_q_du_d): Likewise.
(CODE_FOR_lasx_xvmaddwev_q_d): Likewise.
(CODE_FOR_lasx_xvmaddwev_q_du): Likewise.
(CODE_FOR_lasx_xvmaddwev_q_du_d): Likewise.
(CODE_FOR_lasx_xvmaddwod_q_d): Likewise.
(CODE_FOR_lasx_xvmaddwod_q_du): Likewise.
(CODE_FOR_lasx_xvmaddwod_q_du_d): Likewise.
---
 gcc/config/loongarch/lasx.md   | 400 -
 gcc/config/loongarch/loongarch-builtins.cc |  14 +
 gcc/config/loongarch/lsx.md| 320 -
 gcc/config/loongarch/simd.md   | 104 ++
 4 files changed, 118 insertions(+), 720 deletions(-)

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 1dc11840187..4ac85b7fcf9 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -94,12 +94,6 @@ (define_c_enum "unspec" [
   UNSPEC_LASX_XVPERMI_Q
   UNSPEC_LASX_XVPERMI_D
 
-  UNSPEC_LASX_XVMADDWEV
-  UNSPEC_LASX_XVMADDWEV2
-  UNSPEC_LASX_XVMADDWEV3
-  UNSPEC_LASX_XVMADDWOD
-  UNSPEC_LASX_XVMADDWOD2
-  UNSPEC_LASX_XVMADDWOD3
   UNSPEC_LASX_XVADD_Q
   UNSPEC_LASX_XVSUB_Q
   UNSPEC_LASX_XVREPLVE
@@ -3122,400 +3116,6 @@ (define_insn "lasx_xvldrepl__insn_0"
(set_attr "mode" "")
(set_attr "length" "4")])
 
-;;XVMADDWEV.H.B   XVMADDWEV.H.BU
-(define_insn "lasx_xvmaddwev_h_b"
-  [(set (match_operand:V16HI 0 "register_operand" "=f")
-   (plus:V16HI
- (match_operand:V16HI 1 "register_operand" "0")
- (mult:V16HI
-   (any_extend:V16HI
- (vec_select:V16QI
-   (match_operand:V32QI 2 "register_operand" "%f")
-   (parallel [(const_int 0) (const_int 2)
-  (const_int 4) (const_int 6)
-  (const_int 8) (const_int 10)
-  (const_int 12) (const_int 14)
-  (const_int 16) (const_int 18)
-  (const_int 20) (const_int 22)
-  (const_int 24) (const_int 26)
-  (const_int 28) (const_int 30)])))
- 

Re: [PATCH 6/8] LoongArch: Simplify {lsx,lasx_x}vpick description

2025-02-13 Thread Xi Ruoyao
On Thu, 2025-02-13 at 17:01 +0800, Lulu Cheng wrote:
> Hi, Ruoyao:
> 
> When will it be convenient for you to submit the v2 version of the
> patch?

https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675672.html

> 
> I am planning to merge the current patches and then test the optimal
> values for -malign-{functions,labels,jumps,loops} on that basis.

Thanks!

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] gcc: testsuite: Fix builtin-speculation-overloads[14].C testism

2025-02-13 Thread Matthew Malcomson




On 2/12/25 23:30, Jason Merrill wrote:


> > In the new `check_known_compiler_messages_nocache` procedure I use some
>
> Why is it not enough to look for the message with "[regexp" like
> check_alias_available does?
>
> Jason



The goal was that I wanted to be able to query "the warnings/errors are 
*only* about this thing", rather than "warnings mention this thing".


That said, since my use-case here is to give a boolean, the hypothetical 
case of "extra" messages has to be categorised in one or the other bucket.
Since the final behaviour would be much the same -- possible "excess 
error" messages on targets which support 
__builtin_speculation_safe_value instead of on targets which don't -- a 
simple `regexp` would work for this patch just as well.


Shall I make that change?


Re: [PATCH] tree-optimization/86270 - improve SSA coalescing for loop exit test

2025-02-13 Thread Richard Biener
On Thu, 13 Feb 2025, Richard Biener wrote:

> On Wed, 12 Feb 2025, Andrew Pinski wrote:
> 
> > On Wed, Feb 12, 2025 at 4:04 AM Richard Biener  wrote:
> > >
> > > The PR indicates a very specific issue with regard to SSA coalescing
> > > failures because there's a pre IV increment loop exit test.  While
> > > IVOPTs created the desired IL we later simplify the exit test into
> > > the undesirable form again.  The following fixes this up during RTL
> > > expansion where we try to improve coalescing of IVs.  That seems
> > > easier that trying to avoid the simplification with some weird
> > > heuristics (it could also have been written this way).
> > >
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > >
> > > OK for trunk?
> > >
> > > Thanks,
> > > Richard.
> > >
> > > PR tree-optimization/86270
> > > * tree-outof-ssa.cc (insert_backedge_copies): Pattern
> > > match a single conflict in a loop condition and adjust
> > > that avoiding the conflict if possible.
> > >
> > > * gcc.target/i386/pr86270.c: Adjust to check for no reg-reg
> > > copies as well.
> > > ---
> > >  gcc/testsuite/gcc.target/i386/pr86270.c |  3 ++
> > >  gcc/tree-outof-ssa.cc   | 49 ++---
> > >  2 files changed, 47 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr86270.c 
> > > b/gcc/testsuite/gcc.target/i386/pr86270.c
> > > index 68562446fa4..89b9aeb317a 100644
> > > --- a/gcc/testsuite/gcc.target/i386/pr86270.c
> > > +++ b/gcc/testsuite/gcc.target/i386/pr86270.c
> > > @@ -13,3 +13,6 @@ test ()
> > >
> > >  /* Check we do not split the backedge but keep nice loop form.  */
> > >  /* { dg-final { scan-assembler-times "L\[0-9\]+:" 2 } } */
> > > +/* Check we do not end up with reg-reg moves from a pre-increment IV
> > > +   exit test.  */
> > > +/* { dg-final { scan-assembler-not "mov\[lq\]\?\t%\?\[er\].x, 
> > > %\?\[er\].x" } } */
> > > diff --git a/gcc/tree-outof-ssa.cc b/gcc/tree-outof-ssa.cc
> > > index d340d4ba529..f285c81599e 100644
> > > --- a/gcc/tree-outof-ssa.cc
> > > +++ b/gcc/tree-outof-ssa.cc
> > > @@ -1259,10 +1259,9 @@ insert_backedge_copies (void)
> > >   if (gimple_nop_p (def)
> > >   || gimple_code (def) == GIMPLE_PHI)
> > > continue;
> > > - tree name = copy_ssa_name (result);
> > > - gimple *stmt = gimple_build_assign (name, result);
> > >   imm_use_iterator imm_iter;
> > >   gimple *use_stmt;
> > > + auto_vec<use_operand_p> uses;
> > >   /* The following matches trivially_conflicts_p.  */
> > >   FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, result)
> > > {
> > > @@ -1273,11 +1272,51 @@ insert_backedge_copies (void)
> > > {
> > >   use_operand_p use;
> > >   FOR_EACH_IMM_USE_ON_STMT (use, imm_iter)
> > > -   SET_USE (use, name);
> > > +   uses.safe_push (use);
> > > }
> > > }
> > > - gimple_stmt_iterator gsi = gsi_for_stmt (def);
> > > - gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
> > > + /* When there is just a conflicting statement try to
> > > +adjust that to refer to the new definition.
> > > +In particular for now handle a conflict with the
> > > +use in a (exit) condition with a NE compare,
> > > +replacing a pre-IV-increment compare with a
> > > +post-IV-increment one.  */
> > > + if (uses.length () == 1
> > > + && is_a <gcond *> (USE_STMT (uses[0]))
> > > + && gimple_cond_code (USE_STMT (uses[0])) == NE_EXPR
> > > + && is_gimple_assign (def)
> > > + && gimple_assign_rhs1 (def) == result
> > > + && (gimple_assign_rhs_code (def) == PLUS_EXPR
> > > + || gimple_assign_rhs_code (def) == MINUS_EXPR
> > > + || gimple_assign_rhs_code (def) == 
> > > POINTER_PLUS_EXPR)
> > > + && TREE_CODE (gimple_assign_rhs2 (def)) == 
> > > INTEGER_CST)
> > > +   {
> > > + gcond *cond = as_a <gcond *> (USE_STMT (uses[0]));
> > > + tree *adj;
> > > + if (gimple_cond_lhs (cond) == result)
> > > +   adj = gimple_cond_rhs_ptr (cond);
> > > + else
> > > +   adj = gimple_cond_lhs_ptr (cond);
> > > + tree name = copy_ssa_name (result);
> > 
> > Should this be `copy_ssa_name (*adj)`? Since the new name is based on
> > `*adj` rather than based on the result.
> 
> Good point, I've adjusted this in my local copy.

Ah, but i

[PATCH][v2] tree-optimization/86270 - improve SSA coalescing for loop exit test

2025-02-13 Thread Richard Biener
The PR indicates a very specific issue with regard to SSA coalescing
failures because there's a pre IV increment loop exit test.  While
IVOPTs created the desired IL we later simplify the exit test into
the undesirable form again.  The following fixes this up during RTL
expansion where we try to improve coalescing of IVs.  That seems
easier than trying to avoid the simplification with some weird
heuristics (it could also have been written this way).

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

OK?

Thanks,
Richard.

PR tree-optimization/86270
* tree-outof-ssa.cc (insert_backedge_copies): Pattern
match a single conflict in a loop condition and adjust
that avoiding the conflict if possible.

* gcc.target/i386/pr86270.c: Adjust to check for no reg-reg
copies as well.
---
 gcc/testsuite/gcc.target/i386/pr86270.c |  3 ++
 gcc/tree-outof-ssa.cc   | 51 ++---
 2 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr86270.c 
b/gcc/testsuite/gcc.target/i386/pr86270.c
index 68562446fa4..89b9aeb317a 100644
--- a/gcc/testsuite/gcc.target/i386/pr86270.c
+++ b/gcc/testsuite/gcc.target/i386/pr86270.c
@@ -13,3 +13,6 @@ test ()
 
 /* Check we do not split the backedge but keep nice loop form.  */
 /* { dg-final { scan-assembler-times "L\[0-9\]+:" 2 } } */
+/* Check we do not end up with reg-reg moves from a pre-increment IV
+   exit test.  */
+/* { dg-final { scan-assembler-not "mov\[lq\]\?\t%\?\[er\].x, %\?\[er\].x" } } 
*/
diff --git a/gcc/tree-outof-ssa.cc b/gcc/tree-outof-ssa.cc
index d340d4ba529..1b5b67c2e2b 100644
--- a/gcc/tree-outof-ssa.cc
+++ b/gcc/tree-outof-ssa.cc
@@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-outof-ssa.h"
 #include "dojump.h"
 #include "internal-fn.h"
+#include "gimple-fold.h"
 
 /* FIXME: A lot of code here deals with expanding to RTL.  All that code
should be in cfgexpand.cc.  */
@@ -1259,10 +1260,9 @@ insert_backedge_copies (void)
  if (gimple_nop_p (def)
  || gimple_code (def) == GIMPLE_PHI)
continue;
- tree name = copy_ssa_name (result);
- gimple *stmt = gimple_build_assign (name, result);
  imm_use_iterator imm_iter;
  gimple *use_stmt;
+ auto_vec<use_operand_p> uses;
  /* The following matches trivially_conflicts_p.  */
  FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, result)
{
@@ -1273,11 +1273,52 @@ insert_backedge_copies (void)
{
  use_operand_p use;
  FOR_EACH_IMM_USE_ON_STMT (use, imm_iter)
-   SET_USE (use, name);
+   uses.safe_push (use);
}
}
- gimple_stmt_iterator gsi = gsi_for_stmt (def);
- gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
+ /* When there is just a conflicting statement try to
+adjust that to refer to the new definition.
+In particular for now handle a conflict with the
+use in a (exit) condition with a NE compare,
+replacing a pre-IV-increment compare with a
+post-IV-increment one.  */
+ if (uses.length () == 1
+ && is_a <gcond *> (USE_STMT (uses[0]))
+ && (gimple_cond_code (USE_STMT (uses[0])) == NE_EXPR
+ || gimple_cond_code (USE_STMT (uses[0])) == EQ_EXPR)
+ && is_gimple_assign (def)
+ && gimple_assign_rhs1 (def) == result
+ && (gimple_assign_rhs_code (def) == PLUS_EXPR
+ || gimple_assign_rhs_code (def) == MINUS_EXPR
+ || gimple_assign_rhs_code (def) == POINTER_PLUS_EXPR)
+ && TREE_CODE (gimple_assign_rhs2 (def)) == INTEGER_CST)
+   {
+ gcond *cond = as_a <gcond *> (USE_STMT (uses[0]));
+ tree *adj;
+ if (gimple_cond_lhs (cond) == result)
+   adj = gimple_cond_rhs_ptr (cond);
+ else
+   adj = gimple_cond_lhs_ptr (cond);
+ gimple_stmt_iterator gsi = gsi_for_stmt (cond);
+ tree newval
+   = gimple_build (&gsi, true, GSI_SAME_STMT,
+   UNKNOWN_LOCATION,
+   gimple_assign_rhs_code (def),
+   TREE_TYPE (*adj),
+   *adj, gimple_assign_rhs2 (def));
+ *adj = newval;
+ SET_USE (uses[0], arg);
+ update_stmt (cond);
+   }
+ 

Re: [PATCH] tree, gengtype: Fix up GC issue with DECL_VALUE_EXPR [PR118790]

2025-02-13 Thread Richard Biener
On Thu, 13 Feb 2025, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase ICEs, because we have multiple levels of
> DECL_VALUE_EXPR VAR_DECLs:
>   character(kind=1) id_string[1:.id_string] [value-expr: *id_string.55];
>   character(kind=1)[1:.id_string] * id_string.55 [value-expr: 
> FRAME.107.id_string.55];
>   integer(kind=8) .id_string [value-expr: FRAME.107..id_string];
> id_string is the user variable mentioned in BLOCK_VARS, it has
> DECL_VALUE_EXPR because it is a VLA, id_string.55 is a temporary created by
> gimplify_vla_decl as the address that points to the start of the VLA, what
> is normally used in the IL to access it.  But as this artificial var is then
> used inside of a nested function, tree-nested.cc adds DECL_VALUE_EXPR to it
> too and moves the actual value into the FRAME.107 object's member.
> Now, remove_unused_locals removes id_string.55 (and various other VAR_DECLs)
> from cfun->local_decls, simply because it is not mentioned in the IL at all
> (neither is id_string itself, but that is kept in BLOCK_VARS as it has
> DECL_VALUE_EXPR).  So, after this point, id_string.55 tree isn't referenced 
> from
> anywhere but id_string's DECL_VALUE_EXPR.  Next GC collection is triggered,
> and we are unlucky enough that in the value_expr_for_decl hash table
> (underlying hash map for DECL_VALUE_EXPR) the id_string.55 entry comes
> before the id_string entry.  id_string is ggc_marked_p because it is
> referenced from BLOCK_VARS, but id_string.55 is not, as we don't mark
> DECL_VALUE_EXPR anywhere but by gt_cleare_cache on value_expr_for_decl.
> But gt_cleare_cache does two things, it calls clear_slots on entries
> where the key is not ggc_marked_p (so the id_string.55 mapping to
> FRAME.107.id_string.55 is lost and DECL_VALUE_EXPR (id_string.55) becomes
> NULL) but then later we see id_string entry, which is ggc_marked_p, so mark
> the whole hash table entry, which sets ggc_set_mark on id_string.55.  But
> at this point its DECL_VALUE_EXPR is lost.
> Later during dwarf2out.cc we want to emit DW_AT_location for id_string, see
> it has DECL_VALUE_EXPR, so emit it as indirection of id_string.55 for which
> we again lookup DECL_VALUE_EXPR as it has DECL_HAS_VALUE_EXPR_P, but as it
> is NULL, we ICE, instead of finding it is a subobject of FRAME.107 for which
> we can find its stack location.
> 
> Now, as can be seen in the PR, I've tried to tweak tree-ssa-live.cc so that
> it would keep id_string.55 in cfun->local_decls; that prohibits it from
> the DECL_VALUE_EXPR of it being GC until expansion, but then we shrink and
> free cfun->local_decls completely and so GC at that point still can throw
> it away.
> 
> The following patch adds an extension to the GTY ((cache)) option, before
> calling the gt_cleare_cache on some hash table by specifying
> GTY ((cache ("somefn"))) it calls somefn on that hash table as well.
> And this extra hook can do any additional ggc_set_mark needed so that
> gt_cleare_cache preserves everything that is actually needed and throws
> away the rest.
> 
> In order to make it just 2 pass rather than up to n passes - (if we had
> say
> id1 -> something, id2 -> x(id1), id3 -> x(id2), id4 -> x(id3), id5 -> x(id4)
> in the value_expr_for_decl hash table in that order (where idN are VAR_DECLs
> with DECL_HAS_VALUE_EXPR_P, id5 is the only one mentioned from outside and
> idN -> X stands for idN having DECL_VALUE_EXPR X, something for some
> arbitrary tree and x(idN) for some arbitrary tree which mentions idN
> variable) and in each pass just marked the to part of entries with
> ggc_marked_p base.from we'd need to repeat until we don't mark anything)
> the patch calls walk_tree on DECL_VALUE_EXPR of the marked trees and if it
> finds yet unmarked tree, it marks it and walks its DECL_VALUE_EXPR as well
> the same way.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

So what this basically does is ensure we mark DECL_VALUE_EXPR when
VAR is marked which isn't done when marking a tree node.

That you special-case the hashtable walker is a workaround for
us not being able to say

struct GTY((mark_extra_stuff)) tree_decl_with_vis {

on 'tree' (or specifically the structs for a VAR_DECL).  And that we
rely on gengtype producing the 'tree' marker.  So we rely on the
hashtable keeping referenced trees live.

OK.

Thanks,
Richard.

> 2025-02-13  Jakub Jelinek  
> 
>   PR debug/118790
>   * gengtype.cc (write_roots): Remove cache variable, instead break from
>   the loop on match and test o for NULL.  If the cache option has
>   non-empty string argument, call the specified function with v->name
>   as argument before calling gt_cleare_cache on it.
>   * tree.cc (gt_value_expr_mark_2, gt_value_expr_mark_1,
>   gt_value_expr_mark): New functions.
>   (value_expr_for_decl): Use GTY ((cache ("gt_value_expr_mark"))) rather
>   than just GTY ((cache)).
>   * doc/gty.texi (cache): Document optional argument of cache option.
> 
>  

Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling

2025-02-13 Thread Edwin Lu



On 2/13/2025 4:12 AM, Vineet Gupta wrote:

On 2/13/25 14:17, Robin Dapp wrote:

Other thoughts?

The docs seem to hint TARGET_SCHED_CAN_SPECULATE_INSN is meant for stuff
we can't/don't model in the pipeline, but I have no idea how to model
the VL=0 case there.

Maybe so, but what Edwin is doing looks sensible enough.  It wouldn't be
the first time a hook got (ab)used in ways that weren't part of the
original intent.

I don't fully understand what's happening.  So the hoisting is being done
speculatively here?  And it just happens to be "bad" because that might
cause a VL=0 case.  But are we sure a lack of speculation cannot cause
such cases?

Exactly. My gut feeling w/o deep dive was this seemed like papering over the 
issue.

BTW what exactly is speculative scheduling ? As in what is it actually trying to
schedule ahead ?


Also, why doesn't the vsetvl pass fix the situation?  IMHO we need to
understand the problem more thoroughly before changing things.
In the end LCM minimizes the number of vsetvls and inserts them at the
"earliest" point.  If that is not sufficient I'd say we need modify
the constraints (maybe on a per-uarch basis)?

As far as LCM is concerned it is hoisting the insn to the optimal spot. However
there's some additional logic such as in can_use_next_avl_p () which influences
if things can be moved around.


Since sched1 put the vsetvl right before the branch, that was always 
determined to be the "earliest" point because it was now available on 
all outgoing edges. Without the vsetvl right before the branch, the 
"earliest" point to insert the vsetvls was determined to be the 
beginning of each basic block.


I did try adding some additional logic to adjust the way vsetvl fusion 
occurs across basic blocks in these scenarios  i.e. performing the 
fusion in the opposite manner (breaking lcm guarantees); however, from 
my testing, fusing two vsetvls didn't actually remove the fused 
expression from the vinfo list. I'm not sure if that's intended but as a 
result, phase 3 would remove the fused block and use the vinfo that 
should've been fused into the other.



That won't help with the problem here but might with others.

Right this needs to be evaluated independently with both icounts and BPI3 runs
to see if anything falls out.

-Vineet


I'll add an opt flag to gate this for testing purposes.

Edwin



Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling

2025-02-13 Thread Jeff Law




On 2/13/25 11:57 AM, Robin Dapp wrote:

I did try adding some additional logic to adjust the way vsetvl fusion
occurs across basic blocks in these scenarios  i.e. performing the
fusion in the opposite manner (breaking lcm guarantees); however, from
my testing, fusing two vsetvls didn't actually remove the fused
expression from the vinfo list. I'm not sure if that's intended but as a
result, phase 3 would remove the fused block and use the vinfo that
should've been fused into the other.


It depends on the specific example but keeping deleted vsetvls/infos around
has a purpose because it helps delete other vsetvls still.  I don't recall
details but I remember having at least a few examples for it.
Yea, that can certainly happen with LCM based algorithms when computing 
the availability and anticipatable sets.


Jeff


Re: [PATCH] driver: -fhardened and -z lazy/-z norelro [PR117739]

2025-02-13 Thread Jakub Jelinek
On Tue, Nov 26, 2024 at 05:35:50PM -0500, Marek Polacek wrote:
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> 
> -- >8 --
> As the manual states, using "-fhardened -fstack-protector" will produce
> a warning because -fhardened wants to enable -fstack-protector-strong,
> but it can't since it's been overriden by the weaker -fstack-protector.
> 
> -fhardened also attempts to enable -Wl,-z,relro,-z,now.  By the same
> logic as above, "-fhardened -z norelro" or "-fhardened -z lazy" should
> produce the same warning.  But we don't detect this combination, so
> this patch fixes it.  I also renamed a variable to better reflect its
> purpose.
> 
> Also don't check warn_hardened in process_command, since it's always
> true there.
> 
> Also tweak wording in the manual as Jon Wakely suggested on IRC.
> 
>   PR driver/117739
> 
> gcc/ChangeLog:
> 
>   * doc/invoke.texi: Tweak wording for -Whardened.
>   * gcc.cc (driver_handle_option): If -z lazy or -z norelro was
>   specified, don't enable linker hardening.
>   (process_command): Don't check warn_hardened.
> 
> gcc/testsuite/ChangeLog:
> 
>   * c-c++-common/fhardened-16.c: New test.
>   * c-c++-common/fhardened-17.c: New test.
>   * c-c++-common/fhardened-18.c: New test.
>   * c-c++-common/fhardened-19.c: New test.
>   * c-c++-common/fhardened-20.c: New test.
>   * c-c++-common/fhardened-21.c: New test.

LGTM.

Jakub



[PATCH] tree: Fix up the DECL_VALUE_EXPR GC marking [PR118790]

2025-02-13 Thread Jakub Jelinek
Hi!

The ggc_set_mark call in gt_value_expr_mark_2 is actually wrong, that
just marks the VAR_DECL itself, but doesn't mark the subtrees of it (type
etc.).  So, I think we need to test ggc_marked_p for whether it is marked
or not, if not marked walk the DECL_VALUE_EXPR and then gt_ggc_mx mark
the VAR_DECL that was determined not marked and needs to be marked now.
One option would be to call gt_ggc_mx (t) right after the DECL_VALUE_EXPR
walking, but I'm a little bit worried that the subtree marking could mark
other VAR_DECLs (e.g. seen from DECL_SIZE or TREE_TYPE and the like) and
if they would be DECL_HAS_VALUE_EXPR_P we might not walk their
DECL_VALUE_EXPR anymore later.
So, the patch defers the gt_ggc_mx calls until we've walked all the
DECL_VALUE_EXPRs directly or indirectly connected to already marked
VAR_DECLs.

Ok for trunk if this passes bootstrap/regtest?

2025-02-13  Jakub Jelinek  

PR debug/118790
* tree.cc (struct gt_value_expr_mark_data): New type.
(gt_value_expr_mark_2): Don't call ggc_set_mark, instead check
ggc_marked_p.  Treat data as gt_value_expr_mark_data * with pset
in it rather than address of the pset itself and push to be marked
VAR_DECLs into to_mark vec.
(gt_value_expr_mark_1): Change argument from hash_set *
to gt_value_expr_mark_data * and find pset in it.
(gt_value_expr_mark): Pass to traverse_noresize address of
gt_value_expr_mark_data object rather than hash_table and
for all entries in the to_mark vector after the traversal call
gt_ggc_mx.

--- gcc/tree.cc.jj  2025-02-13 14:14:44.330394074 +0100
+++ gcc/tree.cc 2025-02-13 16:24:39.609106712 +0100
@@ -211,6 +211,11 @@ struct cl_option_hasher : ggc_cache_ptr_hash<tree_node>
 
static GTY ((cache)) hash_table<cl_option_hasher> *cl_option_hash_table;
 
+struct gt_value_expr_mark_data {
+  hash_set<tree> pset;
+  auto_vec<tree> to_mark;
+};
+
 /* Callback called through walk_tree_1 to discover DECL_HAS_VALUE_EXPR_P
VAR_DECLs which weren't marked yet, in that case marks them and
walks their DECL_VALUE_EXPR expressions.  */
@@ -219,11 +224,12 @@ static tree
 gt_value_expr_mark_2 (tree *tp, int *, void *data)
 {
   tree t = *tp;
-  if (VAR_P (t) && DECL_HAS_VALUE_EXPR_P (t) && !ggc_set_mark (t))
+  if (VAR_P (t) && DECL_HAS_VALUE_EXPR_P (t) && !ggc_marked_p (t))
 {
   tree dve = DECL_VALUE_EXPR (t);
-  walk_tree_1 (&dve, gt_value_expr_mark_2, data,
-  (hash_set<tree> *) data, NULL);
+  gt_value_expr_mark_data *d = (gt_value_expr_mark_data *) data;
+  walk_tree_1 (&dve, gt_value_expr_mark_2, data, &d->pset, NULL);
+  d->to_mark.safe_push (t);
 }
   return NULL_TREE;
 }
@@ -232,10 +238,10 @@ gt_value_expr_mark_2 (tree *tp, int *, v
value_expr_for_decl hash table.  */
 
 int
-gt_value_expr_mark_1 (tree_decl_map **e, hash_set<tree> *pset)
+gt_value_expr_mark_1 (tree_decl_map **e, gt_value_expr_mark_data *data)
 {
   if (ggc_marked_p ((*e)->base.from))
-walk_tree_1 (&(*e)->to, gt_value_expr_mark_2, pset, pset, NULL);
+walk_tree_1 (&(*e)->to, gt_value_expr_mark_2, data, &data->pset, NULL);
   return 1;
 }
 
@@ -255,8 +261,11 @@ gt_value_expr_mark (hash_table<tree_decl_map_cache_hasher> *h)
-  hash_set<tree> pset;
-  h->traverse_noresize <hash_set<tree> *, gt_value_expr_mark_1> (&pset);
+  gt_value_expr_mark_data data;
+  h->traverse_noresize <gt_value_expr_mark_data *, gt_value_expr_mark_1> (&data);
+  for (auto v : data.to_mark)
+gt_ggc_mx (v);
 }
 
 /* General tree->tree mapping  structure for use in hash tables.  */

Jakub



Re: [patch, fortran] PR117430 gfortran allows type(C_ptr) in I/O list

2025-02-13 Thread Harald Anlauf

Am 12.02.25 um 21:49 schrieb Jerry D:

The attached patch is fairly obvious. The use of notify_std is changed
to a gfc_error. Several test cases had to be adjusted.

Regression tested on x86_64.

OK for trunk?


This is not a review, just some random comments on the testsuite changes
by your patch:

diff --git a/gcc/testsuite/gfortran.dg/c_loc_test_17.f90
b/gcc/testsuite/gfortran.dg/c_loc_test_17.f90
index 4c2a7d657ee..92bfca4363d 100644
--- a/gcc/testsuite/gfortran.dg/c_loc_test_17.f90
+++ b/gcc/testsuite/gfortran.dg/c_loc_test_17.f90
@@ -1,5 +1,4 @@
 ! { dg-do compile }
-! { dg-options "" }
 !
 ! PR fortran/56378
 ! PR fortran/52426
@@ -24,5 +23,5 @@ contains
 end module

 use iso_c_binding
-print *, c_loc([1]) ! { dg-error "Argument X at .1. to C_LOC shall have
either the POINTER or the TARGET attribute" }
+i = c_loc([1]) ! { dg-error "Argument X at .1. to C_LOC shall have
either the POINTER or the TARGET attribute" }
^^^ i is not declared as type(c_ptr)
 end

diff --git a/gcc/testsuite/gfortran.dg/c_ptr_tests_10.f03
b/gcc/testsuite/gfortran.dg/c_ptr_tests_10.f03
index 4ce1c6809e4..834570cb74d 100644
--- a/gcc/testsuite/gfortran.dg/c_ptr_tests_10.f03
+++ b/gcc/testsuite/gfortran.dg/c_ptr_tests_10.f03
@@ -1,5 +1,4 @@
 ! { dg-do run }
-! { dg-options "-std=gnu" }
 ! This test case exists because gfortran had an error in converting the
 ! expressions for the derived types from iso_c_binding in some cases.
 module c_ptr_tests_10
@@ -7,7 +6,7 @@ module c_ptr_tests_10

 contains
   subroutine sub0() bind(c)
-print *, 'c_null_ptr is: ', c_null_ptr
+print *, 'c_null_ptr is: ', transfer (cptr, C_LONG_LONG)
 
This does not do what one naively might think.
transfer (cptr, C_LONG_LONG) == transfer (cptr, 0)

You probably want: transfer (cptr, 0_C_INTPTR_T)

   end subroutine sub0
 end module c_ptr_tests_10


diff --git a/gcc/testsuite/gfortran.dg/c_ptr_tests_9.f03
b/gcc/testsuite/gfortran.dg/c_ptr_tests_9.f03
index 5a32553b8c5..711b9c157d4 100644
--- a/gcc/testsuite/gfortran.dg/c_ptr_tests_9.f03
+++ b/gcc/testsuite/gfortran.dg/c_ptr_tests_9.f03
@@ -16,9 +16,9 @@ contains
 type(myF90Derived), pointer :: my_f90_type_ptr

 my_f90_type%my_c_ptr = c_null_ptr
-print *, 'my_f90_type is: ', my_f90_type%my_c_ptr
+print *, 'my_f90_type is: ', transfer(my_f90_type%my_c_ptr,
C_LONG_LONG)
 my_f90_type_ptr => my_f90_type
-print *, 'my_f90_type_ptr is: ', my_f90_type_ptr%my_c_ptr
+print *, 'my_f90_type_ptr is: ', transfer(my_f90_type_ptr%my_c_ptr,
 C_LONG_LONG)
   end subroutine sub0
 end module c_ptr_tests_9

Likewise.

diff --git a/gcc/testsuite/gfortran.dg/init_flag_17.f90
b/gcc/testsuite/gfortran.dg/init_flag_17.f90
index 401830fccbc..8bb9f7b1ef7 100644
--- a/gcc/testsuite/gfortran.dg/init_flag_17.f90
+++ b/gcc/testsuite/gfortran.dg/init_flag_17.f90
@@ -19,8 +19,8 @@ program init_flag_17

   type(ty) :: t

-  print *, t%ptr
-  print *, t%fptr
+  print *, transfer(t%ptr, c_long_long)
+  print *, transfer(t%fptr, c_long_long)

 end program

Likewise.


diff --git a/gcc/testsuite/gfortran.dg/pr32601_1.f03
b/gcc/testsuite/gfortran.dg/pr32601_1.f03
index a297e1728ec..1a48419112d 100644
--- a/gcc/testsuite/gfortran.dg/pr32601_1.f03
+++ b/gcc/testsuite/gfortran.dg/pr32601_1.f03
@@ -4,9 +4,9 @@
 ! PR fortran/32601
 use, intrinsic :: iso_c_binding, only: c_loc, c_ptr
 implicit none
-
+integer i
 ! This was causing an ICE, but is an error because the argument to C_LOC
 ! needs to be a variable.
-print *, c_loc(4) ! { dg-error "shall have either the POINTER or the
TARGET attribute" }
+i = c_loc(4) ! { dg-error "shall have either the POINTER or the TARGET
attribute" }

 end

Again, i should be declared as type(c_ptr).

Cheers,
Harald


Regards,

Jerry


Author: Jerry DeLisle 
Date:   Tue Feb 11 20:57:50 2025 -0800

     Fortran:  gfortran allows type(C_ptr) in I/O list

     Before this patch, gfortran was accepting invalid use of
     type(c_ptr) in I/O statements. The fix affects several
     existing test cases so no new test case needed.

     Existing tests were modified to pass by either using the
     transfer function to convert to an acceptable value or
     using an assignment to a like type (non-I/O).

     PR fortran/117430

     gcc/fortran/ChangeLog:

     * resolve.cc (resolve_transfer): Issue the error
     with no exceptions allowed.

     gcc/testsuite/ChangeLog:

     * gfortran.dg/c_loc_test_17.f90: Modify to pass.
     * gfortran.dg/c_ptr_tests_10.f03: Likewise.
     * gfortran.dg/c_ptr_tests_16.f90: Likewise.
     * gfortran.dg/c_ptr_tests_9.f03: Likewise.
     * gfortran.dg/init_flag_17.f90: Likewise.
     * gfortran.dg/pr32601_1.f03: Likewise.





[PATCH] arm: Increment LABEL_NUSES when using minipool_vector_label

2025-02-13 Thread H.J. Lu
Increment LABEL_NUSES when using minipool_vector_label to avoid the zero
use count on minipool_vector_label.

PR target/118866
* config/arm/arm.cc (arm_reorg): Increment LABEL_NUSES when
using minipool_vector_label.

-- 
H.J.
From 91907dc6d948bf256dfa95a161af783df44b1b65 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Fri, 14 Feb 2025 05:25:47 +0800
Subject: [PATCH] arm: Increment LABEL_NUSES when using minipool_vector_label

Increment LABEL_NUSES when using minipool_vector_label to avoid the zero
use count on minipool_vector_label.

	PR target/118866
	* config/arm/arm.cc (arm_reorg): Increment LABEL_NUSES when
	using minipool_vector_label.

Signed-off-by: H.J. Lu 
---
 gcc/config/arm/arm.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index a95ddf8201f..2e3ffdd2607 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -19787,6 +19787,7 @@ arm_reorg (void)
 			   gen_rtx_LABEL_REF (VOIDmode,
 		  minipool_vector_label),
 			   this_fix->minipool->offset);
+	LABEL_NUSES (minipool_vector_label) += 1;
 	*this_fix->loc = gen_rtx_MEM (this_fix->mode, addr);
 	  }
 
-- 
2.48.1



[pushed] c++: omp declare variant tweak

2025-02-13 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

In r15-6707 I changed this function to use build_stub_object to more simply
produce the right type, but it occurs to me that forward_parm would be even
better, specifically for the diagnostic.

This changes nothing with respect to PR118791.

gcc/cp/ChangeLog:

* decl.cc (omp_declare_variant_finalize_one): Use forward_parm.

gcc/testsuite/ChangeLog:

* g++.dg/gomp/declare-variant-3.C: Adjust diagnostic.
* g++.dg/gomp/declare-variant-5.C: Adjust diagnostic.
---
 gcc/cp/decl.cc| 2 +-
 gcc/testsuite/g++.dg/gomp/declare-variant-3.C | 8 
 gcc/testsuite/g++.dg/gomp/declare-variant-5.C | 8 
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 7f7f4938f2c..df4e66798b1 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -8462,7 +8462,7 @@ omp_declare_variant_finalize_one (tree decl, tree attr)
   if (TREE_CODE (TREE_TYPE (decl)) == METHOD_TYPE)
 parm = DECL_CHAIN (parm);
   for (; parm; parm = DECL_CHAIN (parm))
-vec_safe_push (args, build_stub_object (TREE_TYPE (parm)));
+vec_safe_push (args, forward_parm (parm));
 
   unsigned nappend_args = 0;
   tree append_args_list = TREE_CHAIN (TREE_CHAIN (chain));
diff --git a/gcc/testsuite/g++.dg/gomp/declare-variant-3.C 
b/gcc/testsuite/g++.dg/gomp/declare-variant-3.C
index 8c0cfd218ad..fdf030fc429 100644
--- a/gcc/testsuite/g++.dg/gomp/declare-variant-3.C
+++ b/gcc/testsuite/g++.dg/gomp/declare-variant-3.C
@@ -86,8 +86,8 @@ struct E { int e; };
 
 void fn19 (E, int);
 
-#pragma omp declare variant (fn19)match(user={condition(0)})   // { dg-error 
{could not convert 'std::declval<int>\(\)' from 'int' to 'E'} }
-void fn20 (int, E);
+#pragma omp declare variant (fn19)match(user={condition(0)})   // { dg-error 
{could not convert 'i' from 'int' to 'E'} }
+void fn20 (int i, E e);
 
 struct F { operator int () const { return 42; } int f; };
 void fn21 (int, F);
@@ -95,8 +95,8 @@ void fn21 (int, F);
 #pragma omp declare variant ( fn21 ) match (user = { condition ( 1 - 1 ) } )   
// { dg-error "variant 'void fn21\\\(int, F\\\)' and base 'void fn22\\\(F, 
F\\\)' have incompatible types" }
 void fn22 (F, F);
 
-#pragma omp declare variant (fn19) match (user={condition(0)}) // { 
dg-error {could not convert 'std::declval<F>\(\)' from 'F' to 'E'} }
-void fn23 (F, int);
+#pragma omp declare variant (fn19) match (user={condition(0)}) // { 
dg-error {could not convert 'f' from 'F' to 'E'} }
+void fn23 (F f, int i);
 
 void fn24 (int);
 struct U { int u; };
diff --git a/gcc/testsuite/g++.dg/gomp/declare-variant-5.C 
b/gcc/testsuite/g++.dg/gomp/declare-variant-5.C
index a4747ac030b..f3697f66aba 100644
--- a/gcc/testsuite/g++.dg/gomp/declare-variant-5.C
+++ b/gcc/testsuite/g++.dg/gomp/declare-variant-5.C
@@ -74,8 +74,8 @@ struct E { int e; };
 
 void fn19 (E, int) {}
 
-#pragma omp declare variant (fn19)match(user={condition(0)})   // { dg-error 
{could not convert 'std::declval<int>\(\)' from 'int' to 'E'} }
-void fn20 (int, E) {}
+#pragma omp declare variant (fn19)match(user={condition(0)})   // { dg-error 
{could not convert 'i' from 'int' to 'E'} }
+void fn20 (int i, E e) {}
 
 struct F { operator int () const { return 42; } int f; };
 void fn21 (int, F) {}
@@ -83,8 +83,8 @@ void fn21 (int, F) {}
 #pragma omp declare variant ( fn21 ) match (user = { condition ( 1 - 1 ) } )   
// { dg-error "variant 'void fn21\\\(int, F\\\)' and base 'void fn22\\\(F, 
F\\\)' have incompatible types" }
 void fn22 (F, F) {}
 
-#pragma omp declare variant (fn19) match (user={condition(0)}) // { 
dg-error {could not convert 'std::declval<F>\(\)' from 'F' to 'E'} }
-void fn23 (F, int) {}
+#pragma omp declare variant (fn19) match (user={condition(0)}) // { 
dg-error {could not convert 'f' from 'F' to 'E'} }
+void fn23 (F f, int i) {}
 
 void fn24 (int);
 struct U { int u; };

base-commit: cdb4d27a4c2786cf1b1b0eb1872eac6a5f931578
-- 
2.48.1



[pushed] c++: use -Wprio-ctor-dtor for attribute init_priority

2025-02-13 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

gcc/cp/ChangeLog:

* tree.cc (handle_init_priority_attribute): Use OPT_prio_ctor_dtor.

gcc/testsuite/ChangeLog:

* g++.dg/special/initp1.C: Test disabling -Wprio-ctor-dtor.
---
 gcc/cp/tree.cc| 3 ++-
 gcc/testsuite/g++.dg/special/initp1.C | 6 +++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
index 79bc74fa2b7..bf84fb6bcec 100644
--- a/gcc/cp/tree.cc
+++ b/gcc/cp/tree.cc
@@ -5335,7 +5335,8 @@ handle_init_priority_attribute (tree* node,
   && !in_system_header_at (input_location))
 {
   warning
-   (0, "requested %<init_priority%> %i is reserved for internal use",
+   (OPT_Wprio_ctor_dtor,
+	"requested %<init_priority%> %i is reserved for internal use",
 pri);
 }
 
diff --git a/gcc/testsuite/g++.dg/special/initp1.C 
b/gcc/testsuite/g++.dg/special/initp1.C
index 4a539a5a4bd..ef88ca970b8 100644
--- a/gcc/testsuite/g++.dg/special/initp1.C
+++ b/gcc/testsuite/g++.dg/special/initp1.C
@@ -30,9 +30,9 @@ Two hoo[ 3 ] = {
 Two( 15, 16 )
 };
 
-Two coo[ 3 ] __attribute__((init_priority(1000)));
-
-Two koo[ 3 ] __attribute__((init_priority(1000))) = {
+Two coo[ 3 ] __attribute__((init_priority(10))); // { dg-warning "reserved" }
+#pragma GCC diagnostic ignored "-Wprio-ctor-dtor"
+Two koo[ 3 ] __attribute__((init_priority(10))) = {
 Two( 21, 22 ),
 Two( 23, 24 ),
 Two( 25, 26 )

base-commit: cdb4d27a4c2786cf1b1b0eb1872eac6a5f931578
prerequisite-patch-id: cf6b02f09f22e626404250f9e5fc33e6e0351db2
-- 
2.48.1



[pushed] testsuite: adjust nontype-class72 for implicit constexpr

2025-02-13 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

This test added by r15-7507 doesn't get some expected diagnostics if we
implicitly make I(E) constexpr.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/nontype-class72.C: Disable -fimplicit-constexpr.
---
 gcc/testsuite/g++.dg/cpp2a/nontype-class72.C | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class72.C 
b/gcc/testsuite/g++.dg/cpp2a/nontype-class72.C
index 1c48ff57add..c36be7a4a80 100644
--- a/gcc/testsuite/g++.dg/cpp2a/nontype-class72.C
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class72.C
@@ -1,6 +1,7 @@
 // PR c++/113800
 // P2308R1 - Template parameter initialization
 // { dg-do compile { target c++20 } }
+// { dg-additional-options "-fno-implicit-constexpr" }
 // Invalid cases.
 
 namespace std {

base-commit: cdb4d27a4c2786cf1b1b0eb1872eac6a5f931578
-- 
2.48.1



[PATCH] dwarf: emit DW_AT_name for DW_TAG_GNU_formal_parameter_pack [PR70536]

2025-02-13 Thread Jason Merrill
From: Ed Catmur 

Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

Per https://wiki.dwarfstd.org/C++0x_Variadic_templates.md
DW_TAG_GNU_formal_parameter_pack should have a DW_AT_name:

17$:  DW_TAG_formal_parameter_pack
  DW_AT_name("args")
18$:  DW_TAG_formal_parameter
  ! no DW_AT_name attribute
  DW_AT_type(reference to 13$)
(...)

PR c++/70536

gcc/ChangeLog:

* dwarf2out.cc (gen_formal_parameter_pack_die): Add name attr.

gcc/testsuite/ChangeLog:

* g++.dg/debug/dwarf2/template-func-params-7.C: Check for pack names.

Co-authored-by: Jason Merrill 
---
 gcc/dwarf2out.cc   | 2 +-
 gcc/testsuite/g++.dg/debug/dwarf2/template-func-params-7.C | 7 +--
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index 43884f206c0..ed7d9402200 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -23195,7 +23195,7 @@ gen_formal_parameter_pack_die  (tree parm_pack,
  && subr_die);
 
   parm_pack_die = new_die (DW_TAG_GNU_formal_parameter_pack, subr_die, 
parm_pack);
-  add_src_coords_attributes (parm_pack_die, parm_pack);
+  add_name_and_src_coords_attributes (parm_pack_die, parm_pack);
 
   for (arg = pack_arg; arg; arg = DECL_CHAIN (arg))
 {
diff --git a/gcc/testsuite/g++.dg/debug/dwarf2/template-func-params-7.C 
b/gcc/testsuite/g++.dg/debug/dwarf2/template-func-params-7.C
index 22b0e4f984d..4e95c238bcd 100644
--- a/gcc/testsuite/g++.dg/debug/dwarf2/template-func-params-7.C
+++ b/gcc/testsuite/g++.dg/debug/dwarf2/template-func-params-7.C
@@ -23,6 +23,9 @@
 // These 3 function template instantiations has a total of 3 template
 // parameters named T.
 // { dg-final { scan-assembler-times "\.ascii \"T.0\"\[\t 
\]+\[^\n\]*DW_AT_name" 3 } }
+// And the packs also have names.
+// { dg-final { scan-assembler-times "\.ascii \"PTs.0\"\[\t 
\]+\[^\n\]*DW_AT_name" 3 } }
+// { dg-final { scan-assembler-times "\.ascii \"args.0\"\[\t 
\]+\[^\n\]*DW_AT_name" 3 } }
 
 
 void
@@ -35,11 +38,11 @@ printf(const char* s)
   */
 }
 
-template<typename T, typename... PackTypes>
+template<typename T, typename... PTs>
 void
 printf(const char* s,
T value,
-   PackTypes... args)
+   PTs... args)
 {
   while (*s)
 {

base-commit: cdb4d27a4c2786cf1b1b0eb1872eac6a5f931578
prerequisite-patch-id: cf6b02f09f22e626404250f9e5fc33e6e0351db2
prerequisite-patch-id: 29fd7472d58735638f85059fd1678bba9acf7bf6
-- 
2.48.1



Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling

2025-02-13 Thread Palmer Dabbelt

On Thu, 13 Feb 2025 06:46:10 PST (-0800), jeffreya...@gmail.com wrote:



On 2/13/25 1:47 AM, Robin Dapp wrote:

Other thoughts?


The docs seem to hint TARGET_SCHED_CAN_SPECULATE_INSN is meant for stuff
we can't/don't model in the pipeline, but I have no idea how to model
the VL=0 case there.

Maybe so, but what Edwin is doing looks sensible enough.  It wouldn't be
the first time a hook got (ab)used in ways that weren't part of the
original intent.


I don't fully understand what's happening.  So the hoisting is being done
speculatively here?  And it just happens to be "bad" because that might
cause a VL=0 case.  But are we sure a lack of speculation cannot cause
such cases?

Yes/No.  The scheduler certainly has code to avoid hoisting when doing
so would  change semantics.  That's not what's happening here.

I'd have to put it in a debugger or read the full dumps with some crazy
scheduler dump verbosity setting to be sure, but what I suspect is
happening is the scheduler is processing a multi-block region
(effectively an extended basic block).   In this scenario the scheduler
can pull insns from a later block into an earlier block, including past
a conditional branch as long as it doesn't change program semantics.


(Sorry to keep crossing the threads here, there's just a lot in this one 
and stuff gets truncated.)


FWIW, that's what tripped up my "maybe there's a functional bug here" 
thought.  It looks like the scheduling is seeing


   bne t0, x0, end
   vsetvli t1, t2, ...
   vsetvli x0, t2, ...
   ...
 end:
   vsetvli x0, t2, ...

and thinking it's safe to schedule that like

   vsetvli t1, t2, ...
   bne t0, x0, end
   vsetvli x0, t2, ...
   ...
 end:
   vsetvli x0, t2, ...

which I'd assumed is because the scheduler sees both execution paths 
overwriting the vector control registers and thus thinks it's safe to 
move the first vsetvli to execute speculatively.  From reading 
"6. Configuration-Setting Instructions" in vector.md that seems 
intentional, though, so maybe it's all just fine?



Also, why doesn't the vsetvl pass fix the situation?  IMHO we need to
understand the problem more thoroughly before changing things.
In the end LCM minimizes the number of vsetvls and inserts them at the
"earliest" point.  If that is not sufficient I'd say we need to modify
the constraints (maybe on a per-uarch basis)?

The vsetvl pass is LCM based.  So it's not allowed to add a vsetvl on a
path that didn't have a vsetvl before.  Consider this simple graph.

     0
    / \
   2-->3

If we have need for a vsetvl in bb2, but not bb0 or bb3, then the vsetvl
will land in bb2.  bb0 is not a valid insertion point for the vsetvl
pass because the path 0->3 doesn't strictly need a vsetvl.  That's
inherent in the LCM algorithm (anticipatable).

The scheduler has no such limitations.  The scheduler might create a
scheduling region out of blocks 0 and 2.  In that scenario, insns from
block 2 may speculate into block 0 as long as doing so doesn't change
semantics.


Ya.  The combination of the scheduler moving a vsetvli before the 
branch (IIUC from bb2 to bb0 here) and the vsetvli merging causes it to 
look like the whole vsetvli was moved before the branch.


I'm not sure why the scheduler doesn't move both vsetvli instructions to 
execute speculatively, but otherwise this seems to be behaving as 
designed.  It's just tripping up the VL=0 cases for us.



On a separate note:  How about we move the vsetvl pass after sched2?
Then we could at least rely on LCM doing its work uninhibited and wouldn't
reorder vsetvls afterwards.  Or do we somehow rely on rtl_dce and BB
reorder to run afterwards?

That won't help with the problem here but might with others.

It's a double edged sword.  If you defer placement until after
scheduling, then the vsetvls can wreak havoc with whatever schedule
sched2 came up with.  It won't matter much for out of order designs, but
potentially does for others.


Maybe that's a broad uarch split point here?  For OOO designs we'd 
want to rely on HW scheduling and thus avoid hoisting possibly-expensive 
vsetvli instructions (where they'd need to execute in HW because of the 
side effects), while on in-order designs we'd want to aggressively 
schedule vsetvli instructions because we can't rely on HW scheduling to 
hide the latency.



In theory at sched2 time the insn stream should be fixed.  There are
practical/historical exceptions, but changes to the insn stream after
that point are discouraged.


We were just talking about this in our toolchain team meeting, and it 
seems like both GCC and LLVM are in similar spots here -- essentially 
the required set of vsetvli instructions depends very strongly on 
scheduling, so trying to do them independently is just always going to 
lead to sub-par results.  It feels kind of like we want some 
scheduling-based cost feedback in the vsetvli pass (or the other way 
around if they're in the other order) to get better results.


Maybe that's too much of a time sink for the OOO machines, though?  If 
we've got HW scheduling then the SW just has to be in the ballpark and 
everything should be fine.

RE: [PATCH v2]middle-end: delay checking for alignment to load [PR118464]

2025-02-13 Thread Tamar Christina



> -----Original Message-----
> From: Richard Sandiford 
> Sent: Thursday, February 13, 2025 4:55 PM
> To: Tamar Christina 
> Cc: Richard Biener ; gcc-patches@gcc.gnu.org; nd
> 
> Subject: Re: [PATCH v2]middle-end: delay checking for alignment to load
> [PR118464]
> 
> Tamar Christina  writes:
> >> -----Original Message-----
> >> That said, I'm quite sure we don't want to have a dr->target_alignment
> >> that isn't power-of-two, so if the computation doesn't end up with a
> >> power-of-two value we should leave it as the target prefers and
> >> fixup (or fail) during vectorizable_load.
> >
> > Ack I'll round up to power of 2.
> 
> I don't think that's enough.  Rounding up 3 would give 4, but a group
> size of 3 would produce vector iterations that start at 0, 3X, 6X, 9X, 12X
> for some X.  [3X, 6X) and [6X, 9X) both straddle a 4X alignment boundary.
> 

Indeed, instead of rounding up I just reject the non-power-of-2 alignment
requests in vectorizable_load, as Richi originally requested.  I thought I could
get it to work better by rounding up, but it doesn't seem worth it.

Cheers,
Tamar

> Thanks,
> Richard


[pushed: r15-7515] jit: add "final override" to diagnostic sink [PR116613]

2025-02-13 Thread David Malcolm
I added class jit_diagnostic_listener in r15-4760-g0b73e9382ab51c
but forgot to annotate one of the vfuncs with "override".

Fixed thusly.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-7515-g6ac313525a1fae.

gcc/jit/ChangeLog:
PR other/116613
* dummy-frontend.cc
(jit_diagnostic_listener::on_report_diagnostic): Add
"final override".

Signed-off-by: David Malcolm 
---
 gcc/jit/dummy-frontend.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/jit/dummy-frontend.cc b/gcc/jit/dummy-frontend.cc
index 1d0080d6fec..88784ec9e92 100644
--- a/gcc/jit/dummy-frontend.cc
+++ b/gcc/jit/dummy-frontend.cc
@@ -1017,7 +1017,7 @@ public:
   }
 
   void on_report_diagnostic (const diagnostic_info &info,
-diagnostic_t orig_diag_kind)
+diagnostic_t orig_diag_kind) final override
   {
 JIT_LOG_SCOPE (gcc::jit::active_playback_ctxt->get_logger ());
 
-- 
2.26.3



Patch ping^6 (Re: [PATCH] analyzer: Handle nonnull_if_nonzero attribute [PR117023])

2025-02-13 Thread Jakub Jelinek
On Thu, Feb 06, 2025 at 04:30:47PM +0100, Jakub Jelinek wrote:
> On Tue, Jan 21, 2025 at 04:59:16PM +0100, Jakub Jelinek wrote:
> > On Tue, Jan 07, 2025 at 01:49:04PM +0100, Jakub Jelinek wrote:
> > > On Wed, Dec 18, 2024 at 12:15:15PM +0100, Jakub Jelinek wrote:
> > > > On Fri, Dec 06, 2024 at 05:07:40PM +0100, Jakub Jelinek wrote:
> > > > > I'd like to ping the
> > > > > https://gcc.gnu.org/pipermail/gcc-patches/2024-November/668699.html
> > > > > patch.
> > > > > 
> > > > > The patches it depended on are already committed and there is a patch
> > > > > which depends on this (the builtins shift from nonnull to 
> > > > > nonnull_if_nonzero
> > > > > where needed) which has been approved but can't be committed.
> > > > 
> > > > Gentle ping on this one.
> > > 
> > > Ping.
> > 
> > Ping again.
> 
> Ping.

Ping again.

> > > > > > 2024-11-14  Jakub Jelinek  
> > > > > > 
> > > > > > PR c/117023
> > > > > > gcc/analyzer/
> > > > > > * sm-malloc.cc (malloc_state_machine::on_stmt): Handle
> > > > > > also nonnull_if_nonzero attributes.
> > > > > > gcc/testsuite/
> > > > > > * c-c++-common/analyzer/call-summaries-malloc.c
> > > > > > (test_use_without_check): Pass 4 rather than sz to memset.
> > > > > > * c-c++-common/analyzer/strncpy-1.c (test_null_dst,
> > > > > > test_null_src): Pass 42 rather than count to strncpy.

Jakub



Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling

2025-02-13 Thread Jeff Law




On 2/13/25 11:13 AM, Palmer Dabbelt wrote:



FWIW, that's what tripped up my "maybe there's a functional bug here" 
thought.  It looks like the scheduling is seeing


    bne t0, x0, end
    vsetvli t1, t2, ...
    vsetvli x0, t2, ...
    ...
  end:
    vsetvli x0, t2, ...

and thinking it's safe to schedule that like

    vsetvli t1, t2, ...
    bne t0, x0, end
    vsetvli x0, t2, ...
    ...
  end:
    vsetvli x0, t2, ...

which I'd assumed is because the scheduler sees both execution paths 
overwriting the vector control registers and thus thinks it's safe to 
move the first vsetvli to execute speculatively.  From reading "6. 
Configuration-Setting Instructions" in vector.md that seems intentional, 
though, so maybe it's all just fine?

I think it's fine.  Perhaps not what we want from a performance 
standpoint, but functionally safe.






Also, why doesn't the vsetvl pass fix the situation?  IMHO we need to
understand the problem more thoroughly before changing things.
In the end LCM minimizes the number of vsetvls and inserts them at the
"earliest" point.  If that is not sufficient I'd say we need to modify
the constraints (maybe on a per-uarch basis)?

The vsetvl pass is LCM based.  So it's not allowed to add a vsetvl on a
path that didn't have a vsetvl before.  Consider this simple graph.

     0
    / \
   2-->3

If we have need for a vsetvl in bb2, but not bb0 or bb3, then the vsetvl
will land in bb2.  bb0 is not a valid insertion point for the vsetvl
pass because the path 0->3 doesn't strictly need a vsetvl.  That's
inherent in the LCM algorithm (anticipatable).

The scheduler has no such limitations.  The scheduler might create a
scheduling region out of blocks 0 and 2.  In that scenario, insns from
block 2 may speculate into block 0 as long as doing so doesn't change
semantics.


Ya.  The combination of the scheduler moving a vsetvli before the branch 
(IIUC from bb2 to bb0 here) and the vsetvli merging causes it to look 
like the whole vsetvli was moved before the branch.


I'm not sure why the scheduler doesn't move both vsetvli instructions to 
execute speculatively, but otherwise this seems to be behaving as 
designed.  It's just tripping up the VL=0 cases for us.

You'd have to get into those dumps and possibly throw the compiler under 
a debugger.  My guess is it didn't see any advantage in doing so.





Maybe that's a broad uarch split point here?  For OOO designs we'd want 
to rely on HW scheduling and thus avoid hoisting possibly-expensive 
vsetvli instructions (where they'd need to execute in HW because of the 
side effects), while on in-order designs we'd want to aggressively 
schedule vsetvli instructions because we can't rely on HW scheduling to 
hide the latency.

There may be.  But the natural question would be cost/benefit.  It may 
not buy us anything on the performance side to defer vsetvl insertion 
for OOO cores.  At which point the only advantage is testsuite 
stability.  And if that's the only benefit, we may be able to do that 
through other mechanisms.






In theory at sched2 time the insn stream should be fixed.  There are
practical/historical exceptions, but changes to the insn stream after
that point are discouraged.


We were just talking about this in our toolchain team meeting, and it 
seems like both GCC and LLVM are in similar spots here -- essentially 
the required set of vsetvli instructions depends very strongly on 
scheduling, so trying to do them independently is just always going to 
lead to sub-par results.  It feels kind of like we want some scheduling- 
based cost feedback in the vsetvli pass (or the other way around if 
they're in the other order) to get better results.


Maybe that's too much of a time sink for the OOO machines, though?  If 
we've got HW scheduling then the SW just has to be in the ballpark and 
everything should be fine.
I'd guess it's more work than it'd be worth.  We're just not seeing 
vsetvls being all that problematic on our design.  I do see a lot of 
seemingly gratuitous changes in the vector config, but when we make 
changes to fix that we generally end up with worse-performing code.


Jeff



[PATCH] c++: fix propagating REF_PARENTHESIZED_P [PR116379]

2025-02-13 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Here we have:

  template
  struct X{
  T val;
  decltype(auto) value(){
  return (val);
  }
  };

where the return type of value should be 'int &' since '(val)' is an
expression, not a name, and decltype(auto) performs the type deduction
using the decltype rules.

The problem is that we weren't propagating REF_PARENTHESIZED_P
correctly: the return value of finish_non_static_data_member in this
test was a REFERENCE_REF_P, so we didn't set the flag.  We should
use force_paren_expr like below.

PR c++/116379

gcc/cp/ChangeLog:

* pt.cc (tsubst_expr) : Use force_paren_expr to set
REF_PARENTHESIZED_P.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/decltype-auto9.C: New test.
---
 gcc/cp/pt.cc|  4 ++--
 gcc/testsuite/g++.dg/cpp1y/decltype-auto9.C | 15 +++
 2 files changed, 17 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/decltype-auto9.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index a2fc8813e9d..5706a3987c3 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -21712,8 +21712,8 @@ tsubst_expr (tree t, tree args, tsubst_flags_t 
complain, tree in_decl)
  {
r = finish_non_static_data_member (member, object, NULL_TREE,
   complain);
-   if (TREE_CODE (r) == COMPONENT_REF)
- REF_PARENTHESIZED_P (r) = REF_PARENTHESIZED_P (t);
+   if (REF_PARENTHESIZED_P (t))
+ force_paren_expr (r);
RETURN (r);
  }
else if (type_dependent_expression_p (object))
diff --git a/gcc/testsuite/g++.dg/cpp1y/decltype-auto9.C 
b/gcc/testsuite/g++.dg/cpp1y/decltype-auto9.C
new file mode 100644
index 000..1ccf95a0170
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/decltype-auto9.C
@@ -0,0 +1,15 @@
+// PR c++/116379
+// { dg-do compile { target c++14 } }
+
+template
+struct X {
+  T val;
+  decltype(auto) value() { return (val); }
+};
+
+int main() {
+  int i = 0;
+  X x{ static_cast(i) };
+  using type = decltype(x.value());
+  using type = int&;
+}

base-commit: a134dcd8a010744a0097d190f73a4efc2e381531
-- 
2.48.1



[PATCH V3] RISC-V: Prevent speculative vsetvl insn scheduling

2025-02-13 Thread Edwin Lu
The instruction scheduler appears to be speculatively hoisting vsetvl
insns outside of their basic block without checking for data
dependencies. This resulted in a situation where the following occurs

vsetvli a5,a1,e32,m1,tu,ma
vle32.v v2,0(a0)
sub a1,a1,a5 <-- a1 potentially set to 0
sh2add  a0,a5,a0
vfmacc.vv   v1,v2,v2
vsetvli a5,a1,e32,m1,tu,ma <-- incompatible vinfo. update vl to 0
beq a1,zero,.L12 <-- check if avl is 0

This patch would essentially delay the vsetvl update to after the branch
to prevent unnecessarily updating the vinfo at the end of a basic block.

Since this is purely a performance-related patch, gate the target hook
with an opt flag to see the fallout.

PR 117974

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_sched_can_speculate_insn):
(TARGET_SCHED_CAN_SPECULATE_INSN): Implement.
* config/riscv/riscv.opt: Add temporary opt.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr117974.c: New test.

Signed-off-by: Edwin Lu 
---
V2: add testcase
V3: add opt flag to test performance
---
 gcc/config/riscv/riscv.cc | 25 +++
 gcc/config/riscv/riscv.opt|  4 +++
 .../gcc.target/riscv/rvv/vsetvl/pr117974.c| 17 +
 3 files changed, 46 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr117974.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 6e14126e3a4..7203594b526 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -10209,6 +10209,28 @@ riscv_sched_adjust_cost (rtx_insn *, int, rtx_insn 
*insn, int cost,
   return new_cost;
 }

+/* Implement TARGET_SCHED_CAN_SPECULATE_INSN hook.  Return true if insn
+   can be scheduled for speculative execution.  Reject vsetvl instructions to
+   prevent the scheduler from hoisting them out of basic blocks without
+   checking for data dependencies (PR117974).  */
+static bool
+riscv_sched_can_speculate_insn (rtx_insn *insn)
+{
+  /* Gate speculative scheduling of vsetvl instructions behind opt flag
+ for performance testing purposes.  */
+  if (!vsetvl_speculative_sched)
+return true;
+
+  switch (get_attr_type (insn))
+{
+  case TYPE_VSETVL:
+  case TYPE_VSETVL_PRE:
+   return false;
+  default:
+   return true;
+}
+}
+
 /* Auxiliary function to emit RISC-V ELF attribute. */
 static void
 riscv_emit_attribute ()
@@ -14055,6 +14077,9 @@ bool need_shadow_stack_push_pop_p ()
 #undef  TARGET_SCHED_ADJUST_COST
 #define TARGET_SCHED_ADJUST_COST riscv_sched_adjust_cost

+#undef TARGET_SCHED_CAN_SPECULATE_INSN
+#define TARGET_SCHED_CAN_SPECULATE_INSN riscv_sched_can_speculate_insn
+
 #undef TARGET_FUNCTION_OK_FOR_SIBCALL
 #define TARGET_FUNCTION_OK_FOR_SIBCALL riscv_function_ok_for_sibcall

diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 7515c8ea13d..486ba746d99 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -681,3 +681,7 @@ Specifies whether the fence.tso instruction should be used.
 mautovec-segment
 Target Integer Var(riscv_mautovec_segment) Init(1)
 Enable (default) or disable generation of vector segment load/store 
instructions.
+
+-param=vsetvl-speculative-sched
+Target Undocumented Uinteger Var(vsetvl_speculative_sched) Init(0)
+-param=vsetvl-speculative-sched Enable speculative scheduling of vsetvl 
instructions.
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr117974.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr117974.c
new file mode 100644
index 000..97839427987
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr117974.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -mrvv-vector-bits=zvl 
-Ofast" } */
+/* { dg-additional-options "--param=vsetvl-speculative-sched" } */
+
+float g(float q[], int N){
+float dqnorm = 0.0;
+
+#pragma GCC unroll 4
+
+for (int i=0; i < N; i++) {
+dqnorm = dqnorm + q[i] * q[i];
+}
+return dqnorm;
+}
+
+/* { dg-final { scan-assembler-times {beq\s+[a-x0-9]+,zero,.L12\s+vsetvli} 3 } 
} */
+
--
2.43.0



Re: [patch, Fortran] Fix PR 118845

2025-02-13 Thread Thomas Koenig

Hi Jerry,


This is OK.


Pushed as r15-7509.  Thanks for the review!

It would be good to get confirmation that Lapack builds 
now.  I used to be set up here to do that, but don't have it at the moment.


I checked the original test case; that passed.  But yes, a Lapack
tester would be nice.

Now on to PR118862 (but not this today :-)

Best regards

Thomas



[PATCH v2 1/1] gdc: define ELFv1 and ELFv2 versions for powerpc

2025-02-13 Thread liushuyu
From: Zixing Liu 

gcc/ChangeLog:
* config/rs6000/rs6000-d.cc: Define ELFv1 and ELFv2
  version identifiers according to the target options.

gcc/testsuite/ChangeLog:
* gdc.dg/ppcabi.d: Add a test for code generation
  correctness when using IEEE 128 and the new ELFv1 and
  ELFv2 identifiers.

Signed-off-by: Zixing Liu 
---
 gcc/config/rs6000/rs6000-d.cc |  5 +
 gcc/testsuite/gdc.dg/ppcabi.d | 23 +++
 2 files changed, 28 insertions(+)
 create mode 100644 gcc/testsuite/gdc.dg/ppcabi.d

diff --git a/gcc/config/rs6000/rs6000-d.cc b/gcc/config/rs6000/rs6000-d.cc
index c9e1acad88..bc5d643d49 100644
--- a/gcc/config/rs6000/rs6000-d.cc
+++ b/gcc/config/rs6000/rs6000-d.cc
@@ -45,6 +45,11 @@ rs6000_d_target_versions (void)
   d_add_builtin_version ("PPC_SoftFloat");
   d_add_builtin_version ("D_SoftFloat");
 }
+
+  if (DEFAULT_ABI == ABI_ELFv2)
+d_add_builtin_version ("ELFv2");
+  else
+d_add_builtin_version ("ELFv1");
 }
 
 /* Handle a call to `__traits(getTargetInfo, "floatAbi")'.  */
diff --git a/gcc/testsuite/gdc.dg/ppcabi.d b/gcc/testsuite/gdc.dg/ppcabi.d
new file mode 100644
index 00..9271c64436
--- /dev/null
+++ b/gcc/testsuite/gdc.dg/ppcabi.d
@@ -0,0 +1,23 @@
+// { dg-do compile { target { powerpc64*-linux-gnu* } } }
+// { dg-options "-mabi=ieeelongdouble -mabi=elfv2 -mcpu=power9 -O2" }
+
+// { dg-final { scan-assembler "_Z13test_functionu9__ieee128" } }
+extern (C++) bool test_function(real arg) {
+// { dg-final { scan-assembler "xscmpuqp" } }
+// { dg-final { scan-assembler-not "fcmpu" } }
+return arg > 0.0;
+}
+
+// { dg-final { scan-assembler "test_version" } }
+extern (C) bool test_version() {
+// { dg-final { scan-assembler "li 3,1" } }
+version (PPC64) return real.mant_dig == 113;
+else return false;
+}
+
+// { dg-final { scan-assembler "test_elf_version" } }
+extern (C) bool test_elf_version() {
+// { dg-final { scan-assembler "li 3,0" } }
+version (ELFv2) return false;
+else return true;
+}
-- 
2.48.1



[PATCH v2 0/1] gdc: define ELFv1 and ELFv2 versions for powerpc

2025-02-13 Thread liushuyu
From: Zixing Liu 

This patch was formerly known as
"gdc: define ELFv1, ELFv2 and D_PPCUseIEEE128 versions for powerpc";
due to new developments in https://github.com/dlang/dmd/pull/20826,
the compiler is no longer required to define the D_PPCUseIEEE128 version
identifier.  Instead, correctly setting real.mant_dig suffices (GDC already
provides the correct information).

The patch adds the ELFv1 and ELFv2 version identifiers to bridge
the gap between LLVM D Compiler (LDC) and GNU D Compiler (GDC) so that
the user can reliably use the "version(...)" syntax to check for which
ABI is currently in use.

The ELFv1 and ELFv2 ABI concepts seem to exist only on POWER platforms,
so other platforms do not need to follow this change, as far as I know.

Zixing Liu (1):
  gdc: define ELFv1 and ELFv2 versions for powerpc

 gcc/config/rs6000/rs6000-d.cc |  5 +
 gcc/testsuite/gdc.dg/ppcabi.d | 23 +++
 2 files changed, 28 insertions(+)
 create mode 100644 gcc/testsuite/gdc.dg/ppcabi.d

-- 
2.48.1



Re: 7/7 [Fortran, Patch, Coarray, PR107635] Remove deprecated coarray routines

2025-02-13 Thread Jerry D

On 2/10/25 2:25 AM, Andre Vehreschild wrote:

[PATCH 7/7] Fortran: Remove deprecated coarray routines [PR107635]



I have applied all patches. Regression tested OK here.

From patch 5 there was one reject:

patching file gcc/testsuite/gfortran.dg/coarray/send_char_array_1.f90
Hunk #1 FAILED at 39.
1 out of 1 hunk FAILED -- saving rejects to file 
gcc/testsuite/gfortran.dg/coarray/send_char_array_1.f90.rej


I commented earlier about changing the name of rewrite.cc.

I am now going through the whole enchilada for editorial stuff.

Regards,

Jerry


[patch, Fortran] Fix PR 118845

2025-02-13 Thread Thomas Koenig

Hello world,

this was an interesting regression.  It came from my recent
patch, where an assert was triggered because a procedure artificial
dummy argument generated for a global symbol did not have the
information whether it was a function or a subroutine.  Fixed by
adding the information in gfc_get_formal_from_actual_arglist.

This information then uncovered some new errors, also in the
testsuite, which needed fixing.  Finally, the error is made to
look a bit nicer, so the user gets a pointer to where the
original interface comes from, like this:

   10 | CALL bar (test2) ! { dg-error "Interface mismatch in dummy procedure" }
      |           1
..
   16 | CALL bar (test) ! { dg-error "Interface mismatch in dummy procedure" }
      |           2
Error: Interface mismatch in dummy procedure at (1) conflicts with (2):
'test2' is not a subroutine


Regression-tested. OK for trunk?

Best regards

Thomas

gcc/fortran/ChangeLog:

PR fortran/118845
* interface.cc (compare_parameter): If the formal attribute has been
generated from an actual argument list, also output a pointer to
there in case of an error.
(gfc_get_formal_from_actual_arglist): Set function and subroutine
attributes and (if it is a function) the typespec from the actual
argument.

gcc/testsuite/ChangeLog:

PR fortran/118845
* gfortran.dg/recursive_check_4.f03: Adjust call so types match.
* gfortran.dg/recursive_check_6.f03: Likewise.
* gfortran.dg/specifics_2.f90: Adjust calls so types match.
* gfortran.dg/interface_52.f90: New test.
* gfortran.dg/interface_53.f90: New test.
diff --git a/gcc/fortran/interface.cc b/gcc/fortran/interface.cc
index fdde84db80d..edec907d33a 100644
--- a/gcc/fortran/interface.cc
+++ b/gcc/fortran/interface.cc
@@ -2474,8 +2474,16 @@ compare_parameter (gfc_symbol *formal, gfc_expr *actual,
 	   sizeof(err),NULL, NULL))
 	{
 	  if (where)
-	gfc_error_opt (0, "Interface mismatch in dummy procedure %qs at %L:"
-			   " %s", formal->name, &actual->where, err);
+	{
+	  /* Artificially generated symbol names would only confuse.  */
+	  if (formal->attr.artificial)
+		gfc_error_opt (0, "Interface mismatch in dummy procedure "
+			   "at %L conflicts with %L: %s", &actual->where,
+			   &formal->declared_at, err);
+	  else
+		gfc_error_opt (0, "Interface mismatch in dummy procedure %qs "
+			   "at %L: %s", formal->name, &actual->where, err);
+	}
 	  return false;
 	}
 
@@ -2483,8 +2491,16 @@ compare_parameter (gfc_symbol *formal, gfc_expr *actual,
    sizeof(err), NULL, NULL))
 	{
 	  if (where)
-	gfc_error_opt (0, "Interface mismatch in dummy procedure %qs at %L:"
-			   " %s", formal->name, &actual->where, err);
+	{
+	  if (formal->attr.artificial)
+		gfc_error_opt (0, "Interface mismatch in dummy procedure "
+			   "at %L conflichts with %L: %s", &actual->where,
+			   &formal->declared_at, err);
+	  else
+		gfc_error_opt (0, "Interface mismatch in dummy procedure %qs at "
+			   "%L: %s", formal->name, &actual->where, err);
+
+	}
 	  return false;
 	}
 
@@ -5822,7 +5838,14 @@ gfc_get_formal_from_actual_arglist (gfc_symbol *sym,
 	  gfc_get_symbol (name, gfc_current_ns, &s);
 	  if (a->expr->ts.type == BT_PROCEDURE)
 	{
+	  gfc_symbol *asym = a->expr->symtree->n.sym;
 	  s->attr.flavor = FL_PROCEDURE;
+	  if (asym->attr.function)
+		{
+		  s->attr.function = 1;
+		  s->ts = asym->ts;
+		}
+	  s->attr.subroutine = asym->attr.subroutine;
 	}
 	  else
 	{
diff --git a/gcc/testsuite/gfortran.dg/interface_52.f90 b/gcc/testsuite/gfortran.dg/interface_52.f90
new file mode 100644
index 000..4d619241c27
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/interface_52.f90
@@ -0,0 +1,20 @@
+  ! { dg-do compile }
+MODULE m
+  IMPLICIT NONE
+
+CONTAINS
+
+  SUBROUTINE test ()
+IMPLICIT NONE
+
+CALL bar (test2) ! { dg-error "Interface mismatch in dummy procedure" }
+  END SUBROUTINE test
+
+  INTEGER FUNCTION test2 () RESULT (x)
+IMPLICIT NONE
+
+CALL bar (test) ! { dg-error "Interface mismatch in dummy procedure" }
+  END FUNCTION test2
+
+END MODULE m
+
diff --git a/gcc/testsuite/gfortran.dg/interface_53.f90 b/gcc/testsuite/gfortran.dg/interface_53.f90
new file mode 100644
index 000..99a2b959463
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/interface_53.f90
@@ -0,0 +1,8 @@
+! { dg-do compile }
+! PR 118845 - reduced from a segfault in Lapack.
+SUBROUTINE SDRVES(  RESULT )
+  external SSLECT
+  CALL SGEES( SSLECT )
+  CALL SGEES( SSLECT )
+  RESULT = SSLECT( 1, 2 )
+END
diff --git a/gcc/testsuite/gfortran.dg/recursive_check_4.f03 b/gcc/testsuite/gfortran.dg/recursive_check_4.f03
index ece42ca2312..da45762f9b1 100644
--- a/gcc/testsuite/gfortran.dg/recursive_check_4.f03
+++ b/gcc/testsuite/gfortran.dg/recursive_check_4.f03
@@ -20,7 +20,7 @@ CONTAINS
 IMPLICIT NONE
 PRO

[pushed] c++: -frange-for-ext-temps and reused temps [PR118856]

2025-02-13 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

Some things in the front-end use a TARGET_EXPR to create a temporary, then
refer to its TARGET_EXPR_SLOT separately later; in this testcase,
maybe_init_list_as_range does.  So we need to handle that pattern in
extend_all_temps.

PR c++/118856

gcc/cp/ChangeLog:

* call.cc (struct extend_temps_data): Add var_map.
(extend_all_temps): Adjust.
(set_up_extended_ref_temp): Make walk_data void*.
(extend_temps_r): Remap variables.  Handle pset here.
Extend all TARGET_EXPRs.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/range-for9.C: New test.
---
 gcc/cp/call.cc  | 89 +
 gcc/testsuite/g++.dg/cpp23/range-for9.C | 20 ++
 2 files changed, 68 insertions(+), 41 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp23/range-for9.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 2c77b4a4b68..38a8f7fdcda 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -14154,18 +14154,6 @@ make_temporary_var_for_ref_to_temp (tree decl, tree 
type)
   return pushdecl (var);
 }
 
-/* Data for extend_temps_r, mostly matching the parameters of
-   extend_ref_init_temps.  */
-
-struct extend_temps_data
-{
-  tree decl;
-  tree init;
-  vec **cleanups;
-  tree* cond_guard;
-  hash_set *pset;
-};
-
 static tree extend_temps_r (tree *, int *, void *);
 
 /* EXPR is the initializer for a variable DECL of reference or
@@ -14177,7 +14165,7 @@ static tree extend_temps_r (tree *, int *, void *);
 static tree
 set_up_extended_ref_temp (tree decl, tree expr, vec **cleanups,
  tree *initp, tree *cond_guard,
- extend_temps_data *walk_data)
+ void *walk_data)
 {
   tree init;
   tree type;
@@ -14218,7 +14206,7 @@ set_up_extended_ref_temp (tree decl, tree expr, 
vec **cleanups,
  maybe_constant_init because the extension might change its result.  */
   if (walk_data)
 cp_walk_tree (&TARGET_EXPR_INITIAL (expr), extend_temps_r,
- walk_data, walk_data->pset);
+ walk_data, nullptr);
   else
 TARGET_EXPR_INITIAL (expr)
   = extend_ref_init_temps (decl, TARGET_EXPR_INITIAL (expr), cleanups,
@@ -14833,6 +14821,19 @@ extend_ref_init_temps_1 (tree decl, tree init, 
vec **cleanups,
   return init;
 }
 
+/* Data for extend_temps_r, mostly matching the parameters of
+   extend_ref_init_temps.  */
+
+struct extend_temps_data
+{
+  tree decl;
+  tree init;
+  vec **cleanups;
+  tree* cond_guard;
+  hash_set *pset; // For avoiding redundant walk_tree.
+  hash_map *var_map; // For remapping extended temps.
+};
+
 /* Tree walk function for extend_all_temps.  Generally parallel to
extend_ref_init_temps_1, but adapted for walk_tree.  */
 
@@ -14841,7 +14842,15 @@ extend_temps_r (tree *tp, int *walk_subtrees, void 
*data)
 {
   extend_temps_data *d = (extend_temps_data *)data;
 
-  if (TYPE_P (*tp) || TREE_CODE (*tp) == CLEANUP_POINT_EXPR)
+  if (TREE_CODE (*tp) == VAR_DECL)
+{
+  if (tree *r = d->var_map->get (*tp))
+   *tp = *r;
+  return NULL_TREE;
+}
+
+  if (TYPE_P (*tp) || TREE_CODE (*tp) == CLEANUP_POINT_EXPR
+  || d->pset->add (*tp))
 {
   *walk_subtrees = 0;
   return NULL_TREE;
@@ -14849,13 +14858,13 @@ extend_temps_r (tree *tp, int *walk_subtrees, void 
*data)
 
   if (TREE_CODE (*tp) == COND_EXPR)
 {
-  cp_walk_tree (&TREE_OPERAND (*tp, 0), extend_temps_r, d, d->pset);
+  cp_walk_tree (&TREE_OPERAND (*tp, 0), extend_temps_r, d, nullptr);
 
   auto walk_arm = [d](tree &op)
   {
tree cur_cond_guard = NULL_TREE;
auto ov = make_temp_override (d->cond_guard, &cur_cond_guard);
-   cp_walk_tree (&op, extend_temps_r, d, d->pset);
+   cp_walk_tree (&op, extend_temps_r, d, nullptr);
if (cur_cond_guard)
  {
tree set = build2 (MODIFY_EXPR, boolean_type_node,
@@ -14870,29 +14879,25 @@ extend_temps_r (tree *tp, int *walk_subtrees, void 
*data)
   return NULL_TREE;
 }
 
-  if (TREE_CODE (*tp) == ADDR_EXPR
-  /* A discarded-value temporary.  */
-  || (TREE_CODE (*tp) == CONVERT_EXPR
- && VOID_TYPE_P (TREE_TYPE (*tp
-{
-  tree *p;
-  for (p = &TREE_OPERAND (*tp, 0);
-  TREE_CODE (*p) == COMPONENT_REF || TREE_CODE (*p) == ARRAY_REF; )
-   p = &TREE_OPERAND (*p, 0);
-  if (TREE_CODE (*p) == TARGET_EXPR)
-   {
- tree subinit = NULL_TREE;
- *p = set_up_extended_ref_temp (d->decl, *p, d->cleanups, &subinit,
-d->cond_guard, d);
- if (TREE_CODE (*tp) == ADDR_EXPR)
-   recompute_tree_invariant_for_addr_expr (*tp);
- if (subinit)
-   *tp = cp_build_compound_expr (subinit, *tp, tf_none);
-   }
-}
+  tree *p = tp;
 
-  /* TARGET_EXPRs that aren't handled by the above are implementation details
- that shouldn't be ref-exten

Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling

2025-02-13 Thread Jeff Law




On 2/13/25 8:19 AM, Robin Dapp wrote:

The vsevl pass is LCM based.  So it's not allowed to add a vsetvl on a
path that didn't have a vsetvl before.  Consider this simple graph.

     0
    / \
   2-->3

If we have need for a vsetvl in bb2, but not bb0 or bb3, then the vsetvl
will land in bb4.  bb0 is not a valid insertion point for the vsetvl
pass because the path 0->3 doesn't strictly need a vsetvl.  That's
inherent in the LCM algorithm (anticipatable).


Yeah, I remember the same issue with the rounding-mode setter placement.
Yes.  For VXRM placement, under the right circumstances we pretend there 
is a need for the VXRM state at the first instruction in the first BB. 
That enables very aggressive hoisting by LCM in those limited cases.






Wouldn't that be fixable by requiring a dummy/wildcard/dontcare vsetvl in bb3
(or any other block that doesn't require one)?  Such a dummy vsetvl would be
fusible with every other vsetvl.  If there are dummy vsetvls remaining after
LCM just delete them?

Just thinking out loud, the devil will be in the details.
But in Vineet's case they want to avoid speculation as that can result 
in a vl=0 case.  If we had a dummy fusible vsetvl in bb3, then that 
would allow movement into bb0 which is undesirable.


WRT a question Palmer asked earlier in the thread.  I went back and 
reviewed the code/docs around the hook Edwin is using.  My reading is a 
bit different and that what Edwin is doing is perfectly fine.


Jeff








Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-13 Thread Richard Sandiford
Vladimir Makarov  writes:
> On 2/7/25 12:18 PM, Richard Sandiford wrote:
>> FWIW, here's a very rough initial version of the kind of thing
>> I was thinking about.  Hopefully the hook documentation describes
>> the approach.  It's deliberately (overly?) flexible.
>>
>> I've included an aarch64 version that (a) models the fact that the
>> first caller-save can also allocate the frame more-or-less for free,
>> and (b) once we've saved an odd number of GPRs, saving one more is
essentially free.  I also hacked up an x86 version locally to model
>> the allocation benefits of using caller-saved registers.  It seemed
>> to fix the povray example above.
>>
>> This still needs a lot of clean-up and testing, but I thought I might
>> as well send what I have before leaving for the weekend.  Does it look
>> reasonable in principle?
>>
> Richard, thank you for continuing work on this problem.  These hooks and 
> their implementation have much more sense to me.  Although it is 
> difficult to predict that it will solve all existing related PRs. You 
> definitely get my approval of your hooks if you will manage not to have 
> new GCC testsuite failures with these hooks on x86-64, aarch64, and ppc64.

Thanks Vlad!  Here's an updated patch that passes testing aarch64-linux-gnu
and x86_64-linux-gnu.  I haven't yet checked ppc64, but will do that.
Just wanted to post what I have before going off on a long weekend.

As described below, the patch also shows no change to AArch64 SPEC2017
scores.  I'm afraid I'll need help from x86 folks to do performance
testing there.

Richard


From 46ad583e65a1c5a27e2203a7571bba6eb0766bc6 Mon Sep 17 00:00:00 2001
From: Richard Sandiford 
Date: Fri, 7 Feb 2025 15:40:21 +
Subject: [PATCH] ira: Add new hooks for callee-save vs spills [PR117477]
To: gcc-patches@gcc.gnu.org

Following on from the discussion in:

  https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675256.html

this patch removes TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE and
replaces it with two hooks: one that controls the cost of using an
extra callee-saved register and one that controls the cost of allocating
a frame for the first spill.

(The patch does not attempt to address the shrink-wrapping part of
the thread above.)

On AArch64, this is enough to fix PR117477, as verified by the new tests.
The patch does not change the SPEC2017 scores.  (An earlier version
did regress perlbench, because the aarch64 hook in that version
incorrectly treated call-preserved registers as having the same
cost as call-clobbered registers, even for pseudos that are not live
across a call.  Oops.)

The x86 change follows Honza's suggestion of deducting 2 from the
current cost, to model the saving of using push & pop.  With the
new hooks, we could instead increase the cost of using a caller-saved
register (i.e. model the extra add and sub), but I haven't tried that.
I did however check that deducting 1 instead of 2 was enough to make
pr91384.c pass for -mabi=32 but not for -mabi=64.

gcc/
PR rtl-optimization/117477
* config/aarch64/aarch64.cc (aarch64_count_saves): New function.
(aarch64_count_above_hard_fp_saves, aarch64_callee_save_cost)
(aarch64_frame_allocation_cost): Likewise.
(TARGET_CALLEE_SAVE_COST): Define.
(TARGET_FRAME_ALLOCATION_COST): Likewise.
* config/i386/i386.cc (ix86_ira_callee_saved_register_cost_scale):
Replace with...
(ix86_callee_save_cost): ...this new hook.
(TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Delete.
(TARGET_CALLEE_SAVE_COST): Define.
* target.h (spill_cost_type, frame_cost_type): New enums.
* target.def (callee_save_cost, frame_allocation_cost): New hooks.
(ira_callee_saved_register_cost_scale): Delete.
* doc/tm.texi.in (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Delete.
(TARGET_CALLEE_SAVE_COST, TARGET_FRAME_ALLOCATION_COST): New hooks.
* doc/tm.texi: Regenerate.
* hard-reg-set.h (hard_reg_set_popcount): New function.
* ira-color.cc (allocated_memory_p): New variable.
(allocated_callee_save_regs): Likewise.
(record_allocation): New function.
(assign_hard_reg): Use targetm.frame_allocation_cost to model
the cost of the first spill or first caller save.  Use
targetm.callee_save_cost to model the cost of using new callee-saved
registers.  Apply the exit rather than entry frequency to the cost
of restoring a register or deallocating the frame.  Update the
new variables above.
(improve_allocation): Use record_allocation.
(color): Initialize allocated_callee_save_regs.
(ira_color): Initialize allocated_memory_p.
* targhooks.h (default_callee_save_cost): Declare.
(default_frame_allocation_cost): Likewise.
* targhooks.cc (default_callee_save_cost): New function.
(default_frame_allocation_cost): Likewise.

gcc/testsuite/
PR rtl-

Re: [patch, Fortran] Fix PR 118845

2025-02-13 Thread Jerry D

On 2/13/25 11:59 AM, Thomas Koenig wrote:

Hello world,

this was an interesting regression.  It came from my recent
patch, where an assert was triggered because a procedure artificial
dummy argument generated for a global symbol did not have the
information whether it was a function or a subroutine.  Fixed by
adding the information in gfc_get_formal_from_actual_arglist.

This information then uncovered some new errors, also in the
testsuite, which needed fixing.  Finally, the error is made to
look a bit nicer, so the user gets a pointer to where the
original interface comes from, like this:

    10 | CALL bar (test2) ! { dg-error "Interface mismatch in dummy 
procedure" }

   |  1
..
    16 | CALL bar (test) ! { dg-error "Interface mismatch in dummy 
procedure" }

   |  2
Error: Interface mismatch in dummy procedure at (1) conflicts with 
(2): 'test2' is not a subroutine


Regression-tested. OK for trunk?


This is OK. It would be good to get confirmation that the lapack builds 
now.  I used to be set up here to do that, but don't have it at the moment.


Thanks for the quick fix.

Jerry


Best regards

 Thomas

gcc/fortran/ChangeLog:

 PR fortran/118845
 * interface.cc (compare_parameter): If the formal attribute has been
 generated from an actual argument list, also output an pointer to
 there in case of an error.
 (gfc_get_formal_from_actual_arglist): Set function and subroutine
 attributes and (if it is a function) the typespec from the actual
 argument.

gcc/testsuite/ChangeLog:

 PR fortran/118845
 * gfortran.dg/recursive_check_4.f03: Adjust call so types match.
 * gfortran.dg/recursive_check_6.f03: Likewise.
 * gfortran.dg/specifics_2.f90: Adjust calls so types match.
 * gfortran.dg/interface_52.f90: New test.
 * gfortran.dg/interface_53.f90: New test.




Re: [PATCH v2]middle-end: delay checking for alignment to load [PR118464]

2025-02-13 Thread Richard Sandiford
Tamar Christina  writes:
>> -Original Message-
>> That said, I'm quite sure we don't want to have a dr->target_alignment
>> that isn't power-of-two, so if the computation doesn't end up with a
>> power-of-two value we should leave it as the target prefers and
>> fixup (or fail) during vectorizable_load.
>
> Ack I'll round up to power of 2.

I don't think that's enough.  Rounding up 3 would give 4, but a group
size of 3 would produce vector iterations that start at 0, 3X, 6X, 9X, 12X
for some X.  [3X, 6X) and [6X, 9X) both straddle a 4X alignment boundary.

Thanks,
Richard


Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling

2025-02-13 Thread Robin Dapp
> I did try adding some additional logic to adjust the way vsetvl fusion 
> occurs across basic blocks in these scenarios  i.e. performing the 
> fusion in the opposite manner (breaking lcm guarantees); however, from 
> my testing, fusing two vsetvls didn't actually remove the fused 
> expression from the vinfo list. I'm not sure if that's intended but as a 
> result, phase 3 would remove the fused block and use the vinfo that 
> should've been fused into the other.

It depends on the specific example but keeping deleted vsetvls/infos around
has a purpose because it helps delete other vsetvls still.  I don't recall
details but I remember having at least a few examples for it.

-- 
Regards
 Robin



Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling

2025-02-13 Thread Palmer Dabbelt

On Thu, 13 Feb 2025 07:38:13 PST (-0800), jeffreya...@gmail.com wrote:



On 2/13/25 8:19 AM, Robin Dapp wrote:

The vsetvl pass is LCM based.  So it's not allowed to add a vsetvl on a
path that didn't have a vsetvl before.  Consider this simple graph.

  0
 / \
2-->3

If we have need for a vsetvl in bb2, but not bb0 or bb3, then the vsetvl
will land in bb4.  bb0 is not a valid insertion point for the vsetvl
pass because the path 0->3 doesn't strictly need a vsetvl.  That's
inherent in the LCM algorithm (anticipatable).


Yeah, I remember the same issue with the rounding-mode setter placement.

Yes.  For VXRM placement, under the right circumstances we pretend there
is a need for the VXRM state at the first instruction in the first BB.
That enables very aggressive hoisting by LCM in those limited cases.





Wouldn't that be fixable by requiring a dummy/wildcard/dontcare vsetvl in bb3
(or any other block that doesn't require one)?  Such a dummy vsetvl would be
fusible with every other vsetvl.  If there are dummy vsetvls remaining after
LCM just delete them?

Just thinking out loud, the devil will be in the details.

But in Vineet's case they want to avoid speculation as that can result
in a vl=0 case.  If we had a dummy fusible vsetvl in bb3, then that
would allow movement into bb0 which is undesirable.


Ya, I think we confused everyone because there's really two 
vsetvli/branch movement things we've been talking about and they're kind 
of the opposite.


There's the issue this patch works around, where we found some vsetvli 
instances that set VL=0 in unrolled loops.  That makes some of our 
hardware people upset.  Turns out the reduced test case has the branches 
to early-out of the unrolled loop when VL would be 0, so just banning 
vsetvli speculation fixes the issue.  It's kind of an indirect way to 
solve a uarch-specific problem, so who knows if it'll be worth doing.


Then there's the vsetvli loop-invariant hoisting / vector tail generation 
thing we were talking about in the meeting this week.  Having the 
vsetvli in the loop made a different subset of our hardware people upset.  
That's kind of the opposite optimization, though we'd want to avoid the 
VL=0 case.  

They're both "Vineet's bug", the hardware people tend to call Vineet 
when they get upset ;)



WRT a question Palmer asked earlier in the thread.  I went back and
reviewed the code/docs around the hook Edwin is using.  My reading is a
bit different and that what Edwin is doing is perfectly fine.


Awesome, thanks.  So I think if this is sane enough to run experiments 
we can at least try that out and see what happens.



Jeff






Re: [PATCH] driver: -fhardened and -z lazy/-z norelro [PR117739]

2025-02-13 Thread Marek Polacek
Ping.

On Thu, Feb 06, 2025 at 11:26:48AM -0500, Marek Polacek wrote:
> Ping.
> 
> On Tue, Jan 21, 2025 at 11:05:46AM -0500, Marek Polacek wrote:
> > Ping.
> > 
> > On Fri, Jan 10, 2025 at 03:07:52PM -0500, Marek Polacek wrote:
> > > Ping.
> > > 
> > > On Fri, Dec 20, 2024 at 08:58:05AM -0500, Marek Polacek wrote:
> > > > Ping.
> > > > 
> > > > On Tue, Nov 26, 2024 at 05:35:50PM -0500, Marek Polacek wrote:
> > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > > 
> > > > > -- >8 --
> > > > > As the manual states, using "-fhardened -fstack-protector" will 
> > > > > produce
> > > > > a warning because -fhardened wants to enable -fstack-protector-strong,
> > > > > but it can't since it's been overriden by the weaker 
> > > > > -fstack-protector.
> > > > > 
> > > > > -fhardened also attempts to enable -Wl,-z,relro,-z,now.  By the same
> > > > > logic as above, "-fhardened -z norelro" or "-fhardened -z lazy" should
> > > > > produce the same warning.  But we don't detect this combination, so
> > > > > this patch fixes it.  I also renamed a variable to better reflect its
> > > > > purpose.
> > > > > 
> > > > > Also don't check warn_hardened in process_command, since it's always
> > > > > true there.
> > > > > 
> > > > > Also tweak wording in the manual as Jon Wakely suggested on IRC.
> > > > > 
> > > > >   PR driver/117739
> > > > > 
> > > > > gcc/ChangeLog:
> > > > > 
> > > > >   * doc/invoke.texi: Tweak wording for -Whardened.
> > > > >   * gcc.cc (driver_handle_option): If -z lazy or -z norelro was
> > > > >   specified, don't enable linker hardening.
> > > > >   (process_command): Don't check warn_hardened.
> > > > > 
> > > > > gcc/testsuite/ChangeLog:
> > > > > 
> > > > >   * c-c++-common/fhardened-16.c: New test.
> > > > >   * c-c++-common/fhardened-17.c: New test.
> > > > >   * c-c++-common/fhardened-18.c: New test.
> > > > >   * c-c++-common/fhardened-19.c: New test.
> > > > >   * c-c++-common/fhardened-20.c: New test.
> > > > >   * c-c++-common/fhardened-21.c: New test.
> > > > > ---
> > > > >  gcc/doc/invoke.texi   |  4 ++--
> > > > >  gcc/gcc.cc| 20 ++--
> > > > >  gcc/testsuite/c-c++-common/fhardened-16.c |  5 +
> > > > >  gcc/testsuite/c-c++-common/fhardened-17.c |  5 +
> > > > >  gcc/testsuite/c-c++-common/fhardened-18.c |  5 +
> > > > >  gcc/testsuite/c-c++-common/fhardened-19.c |  5 +
> > > > >  gcc/testsuite/c-c++-common/fhardened-20.c |  5 +
> > > > >  gcc/testsuite/c-c++-common/fhardened-21.c |  5 +
> > > > >  8 files changed, 46 insertions(+), 8 deletions(-)
> > > > >  create mode 100644 gcc/testsuite/c-c++-common/fhardened-16.c
> > > > >  create mode 100644 gcc/testsuite/c-c++-common/fhardened-17.c
> > > > >  create mode 100644 gcc/testsuite/c-c++-common/fhardened-18.c
> > > > >  create mode 100644 gcc/testsuite/c-c++-common/fhardened-19.c
> > > > >  create mode 100644 gcc/testsuite/c-c++-common/fhardened-20.c
> > > > >  create mode 100644 gcc/testsuite/c-c++-common/fhardened-21.c
> > > > > 
> > > > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > > > index 346ac1369b8..371f723539c 100644
> > > > > --- a/gcc/doc/invoke.texi
> > > > > +++ b/gcc/doc/invoke.texi
> > > > > @@ -7012,8 +7012,8 @@ This warning is enabled by @option{-Wall}.
> > > > >  Warn when @option{-fhardened} did not enable an option from its set 
> > > > > (for
> > > > >  which see @option{-fhardened}).  For instance, using 
> > > > > @option{-fhardened}
> > > > >  and @option{-fstack-protector} at the same time on the command line 
> > > > > causes
> > > > > -@option{-Whardened} to warn because 
> > > > > @option{-fstack-protector-strong} is
> > > > > -not enabled by @option{-fhardened}.
> > > > > +@option{-Whardened} to warn because 
> > > > > @option{-fstack-protector-strong} will
> > > > > +not be enabled by @option{-fhardened}.
> > > > >  
> > > > >  This warning is enabled by default and has effect only when 
> > > > > @option{-fhardened}
> > > > >  is enabled.
> > > > > diff --git a/gcc/gcc.cc b/gcc/gcc.cc
> > > > > index 92c92996401..d2718d263bb 100644
> > > > > --- a/gcc/gcc.cc
> > > > > +++ b/gcc/gcc.cc
> > > > > @@ -305,9 +305,10 @@ static size_t dumpdir_length = 0;
> > > > > driver added to dumpdir after dumpbase or linker output name.  */
> > > > >  static bool dumpdir_trailing_dash_added = false;
> > > > >  
> > > > > -/* True if -r, -shared, -pie, or -no-pie were specified on the 
> > > > > command
> > > > > -   line.  */
> > > > > -static bool any_link_options_p;
> > > > > +/* True if -r, -shared, -pie, -no-pie, -z lazy, or -z norelro were
> > > > > +   specified on the command line, and therefore -fhardened should not
> > > > > +   add -z now/relro.  */
> > > > > +static bool avoid_linker_hardening_p;
> > > > >  
> > > > >  /* True if -static was specified on the command line.  */
> > > > >  static bool static_p;
> > > > > @@ -4434,10

Re: [PATCH] RISC-V: Avoid more unsplit insns in const expander [PR118832].

2025-02-13 Thread Jeff Law




On 2/12/25 7:03 AM, Robin Dapp wrote:

Hi,

in PR118832 we have another instance of the problem already noticed in
PR117878.  We sometimes use e.g. expand_simple_binop for vector
operations like shift or and.  While this is usually OK, it causes
problems when doing it late, e.g. during LRA.

In particular, we might rematerialize a const_vector during LRA, which
then leaves an insn lying around that cannot be split any more if it
requires a pseudo.  Therefore we should only use the split variants
in expand_const_vector.

This patch fixed the issue in the PR and also pre-emptively rewrites two
other spots that might be prone to the same issue.

Regtested on rv64gcv_zvl512b.  As the two other cases don't have a test
(so might not even trigger) I unconditionally enabled them for my testsuite
run.

Regards
  Robin

PR target/118832

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector):  Expand as
vlmax insn during lra.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr118832.c: New test.

Pushed to the trunk and I'll update the BZ entry momentarily.

jeff



Re: [RFA][PR tree-optimization/98028] Use relationship between operands to simplify SUB_OVERFLOW

2025-02-13 Thread Jeff Law




On 2/12/25 2:22 PM, Jakub Jelinek wrote:



I agree that the most common cases should be all the arguments the same
type.  I was working under the assumption that the args would be compatible
types already, forgetting that IFNs are different in that regard than other
gimple ops.  I wouldn't want to go any further than requiring all three
operands to have the same type, with the easy-to-reason-about relation checks.


For gcc-16 I think we can extend that block fairly easily to handle certain
mismatched-size cases, and look to see if there are cases where the
combination of a relationship between the arguments and some range
information would allow us to capture further cases.


For the GCC 16 version, I think best would be (given Andrew's mail that
the relations aren't likely very useful for incompatible types) to
   relation_kind rel = VREL_VARYING;
   if (code == MINUS_EXPR
   && types_compatible_p (TREE_TYPE (op0), TREE_TYPE (op1))
 {
   rel = query->relation().query (s, op0, op1);
   /* The result of the infinite precision subtraction of
 the same values will be always 0.  That will fit into any result
 type.  */
   if (rel == VREL_EQ)
return true;
 }

then do the current
   int_range_max vr0, vr1;
   if (!query->range_of_expr (vr0, op0, s) || vr0.undefined_p ())
 vr0.set_varying (TREE_TYPE (op0));
   if (!query->range_of_expr (vr1, op1, s) || vr1.undefined_p ())
 vr1.set_varying (TREE_TYPE (op1));

   tree vr0min = wide_int_to_tree (TREE_TYPE (op0), vr0.lower_bound ());
   tree vr0max = wide_int_to_tree (TREE_TYPE (op0), vr0.upper_bound ());
   tree vr1min = wide_int_to_tree (TREE_TYPE (op1), vr1.lower_bound ());
   tree vr1max = wide_int_to_tree (TREE_TYPE (op1), vr1.upper_bound ());

and then we can e.g. special case > and >=:
   /* If op1 is not negative, op0 - op1 for op0 >= op1 will be always
  in [0, op0] and so if vr0max - vr1min fits into type, there won't
  be any overflow.  */
   if ((rel == VREL_GT || rel == VREL_GE)
   && tree_int_cst_sgn (vr1min) >= 0
   && !arith_overflowed_p (MINUS_EXPR, type, vr0max, vr1min))
 return true;

Would need to think about if anything could be simplified for
VREL_G{T,E} if tree_int_cst_sgn (vr1min) < 0.

As for VREL_LT, one would need to think it through as well for both
tree_int_cst_sgn (vr1min) >= 0 and tree_int_cst_sgn (vr1min) < 0.
For the former, the infinite precision of subtraction is known given
the relation to be < 0.  Now obviously if TYPE_UNSIGNED (type) that
would imply always overflow.  But for !TYPE_UNSIGNED (type) that isn't
necessarily the case and the question is if the relation helps with the
reasoning.  Generally the code otherwise tries to check 2 boundaries
(for MULT_EXPR 4 but we don't care about that): if neither overflows,
it is ok; if only one overflows, we don't know; if both boundaries
overflow, we need to look further and check some corner cases in between.

Or just go with that even for GCC 15 (completely untested and dunno if
something needs to be done about s = NULL passed to query or not) for
now, with the advantage that it can do something even for the cases where
type is not compatible with types of arguments, and perhaps add additional
cases later?
This is further than I wanted to go for gcc-15.  But I can support 
something like this as it's not a major extension to what I was 
suggesting.  And of course it addresses the correctness issues around 
different types.  I'll play with it a bit.


And WRT an earlier message about gcc-16.  Yea, I think opening a bug for 
additional cases would be a good idea.


Jeff



Re: [PATCH] c++: fix propagating REF_PARENTHESIZED_P [PR116379]

2025-02-13 Thread Jason Merrill

On 2/13/25 11:37 PM, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

OK.


-- >8 --
Here we have:

   template<typename T>
   struct X{
   T val;
   decltype(auto) value(){
  return (val);
   }
   };

where the return type of value should be 'int &' since '(val)' is an
expression, not a name, and decltype(auto) performs the type deduction
using the decltype rules.

The problem is that we weren't propagating REF_PARENTHESIZED_P
correctly: the return value of finish_non_static_data_member in this
test was a REFERENCE_REF_P, so we didn't set the flag.  We should
use force_paren_expr like below.

PR c++/116379

gcc/cp/ChangeLog:

* pt.cc (tsubst_expr) : Use force_paren_expr to set
REF_PARENTHESIZED_P.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/decltype-auto9.C: New test.
---
  gcc/cp/pt.cc|  4 ++--
  gcc/testsuite/g++.dg/cpp1y/decltype-auto9.C | 15 +++
  2 files changed, 17 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/decltype-auto9.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index a2fc8813e9d..5706a3987c3 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -21712,8 +21712,8 @@ tsubst_expr (tree t, tree args, tsubst_flags_t 
complain, tree in_decl)
  {
r = finish_non_static_data_member (member, object, NULL_TREE,
   complain);
-   if (TREE_CODE (r) == COMPONENT_REF)
- REF_PARENTHESIZED_P (r) = REF_PARENTHESIZED_P (t);
+   if (REF_PARENTHESIZED_P (t))
+ force_paren_expr (r);
RETURN (r);
  }
else if (type_dependent_expression_p (object))
diff --git a/gcc/testsuite/g++.dg/cpp1y/decltype-auto9.C 
b/gcc/testsuite/g++.dg/cpp1y/decltype-auto9.C
new file mode 100644
index 000..1ccf95a0170
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/decltype-auto9.C
@@ -0,0 +1,15 @@
+// PR c++/116379
+// { dg-do compile { target c++14 } }
+
+template<typename T>
+struct X {
+  T val;
+  decltype(auto) value() { return (val); }
+};
+
+int main() {
+  int i = 0;
+  X<int&&> x{ static_cast<int&&>(i) };
+  using type = decltype(x.value());
+  using type = int&;
+}

base-commit: a134dcd8a010744a0097d190f73a4efc2e381531




[PATCH v3] x86: Properly find the maximum stack slot alignment

2025-02-13 Thread H.J. Lu
On Thu, Feb 13, 2025 at 5:17 PM Uros Bizjak  wrote:
>
> On Thu, Feb 13, 2025 at 9:31 AM H.J. Lu  wrote:
> >
> > Don't assume that stack slots can only be accessed by stack or frame
> > registers.  We first find all registers defined by stack or frame
> > registers.  Then check memory accesses by such registers, including
> > stack and frame registers.
> >
> > gcc/
> >
> > PR target/109780
> > PR target/109093
> > * config/i386/i386.cc (ix86_update_stack_alignment): New.
> > (ix86_find_all_reg_use_1): Likewise.
> > (ix86_find_all_reg_use): Likewise.
> > (ix86_find_max_used_stack_alignment): Also check memory accesses
> > from registers defined by stack or frame registers.
> >
> > gcc/testsuite/
> >
> > PR target/109780
> > PR target/109093
> > * g++.target/i386/pr109780-1.C: New test.
> > * gcc.target/i386/pr109093-1.c: Likewise.
> > * gcc.target/i386/pr109780-1.c: Likewise.
> > * gcc.target/i386/pr109780-2.c: Likewise.
> > * gcc.target/i386/pr109780-3.c: Likewise.
>
> Some non-algorithmical changes below, otherwise LGTM. Please also get
> someone to review dataflow infrastructure usage, I am not well versed
> with it.
>
> +/* Helper function for ix86_find_all_reg_use.  */
> +
> +static void
> +ix86_find_all_reg_use_1 (rtx set, HARD_REG_SET &stack_slot_access,
> + auto_bitmap &worklist)
> +{
> +  rtx src = SET_SRC (set);
> +  if (MEM_P (src))
>
> Also reject assignment from CONST_SCALAR_INT?

Done.

> +return;
> +
> +  rtx dest = SET_DEST (set);
> +  if (!REG_P (dest))
> +return;
>
> Can we switch these two so the test for REG_P (dest) will be first? We
> are not interested in anything that doesn't assign to a register.

Done.

> +/* Find all registers defined with REG.  */
> +
> +static void
> +ix86_find_all_reg_use (HARD_REG_SET &stack_slot_access,
> +   unsigned int reg, auto_bitmap &worklist)
> +{
> +  for (df_ref ref = DF_REG_USE_CHAIN (reg);
> +   ref != NULL;
> +   ref = DF_REF_NEXT_REG (ref))
> +{
> +  if (DF_REF_IS_ARTIFICIAL (ref))
> +continue;
> +
> +  rtx_insn *insn = DF_REF_INSN (ref);
> +  if (!NONDEBUG_INSN_P (insn))
> +continue;
>
> Here we pass only NONJUMP_INSN_P (X) || JUMP_P (X) || CALL_P (X)
>
> +  if (CALL_P (insn) || JUMP_P (insn))
> +continue;
>
> And here remains only NONJUMP_INSN_P (X), so both above conditions
> could be substituted with:
>
> if (!NONJUMP_INSN_P (X))
>   continue;

Done.

> +
> +  rtx set = single_set (insn);
> +  if (set)
> +ix86_find_all_reg_use_1 (set, stack_slot_access, worklist);
> +
> +  rtx pat = PATTERN (insn);
> +  if (GET_CODE (pat) != PARALLEL)
> +continue;
> +
> +  for (int i = 0; i < XVECLEN (pat, 0); i++)
> +{
> +  rtx exp = XVECEXP (pat, 0, i);
> +  switch (GET_CODE (exp))
> +{
> +case ASM_OPERANDS:
> +case CLOBBER:
> +case PREFETCH:
> +case USE:
> +  break;
> +case UNSPEC:
> +case UNSPEC_VOLATILE:
> +  for (int j = XVECLEN (exp, 0) - 1; j >= 0; j--)
> +{
> +  rtx x = XVECEXP (exp, 0, j);
> +  if (GET_CODE (x) == SET)
> +ix86_find_all_reg_use_1 (x, stack_slot_access,
> + worklist);
> +}
> +  break;
> +case SET:
> +  ix86_find_all_reg_use_1 (exp, stack_slot_access,
> +   worklist);
> +  break;
> +default:
> +  debug_rtx (exp);
>
> Stray debug remaining?

Removed.

> +  HARD_REG_SET stack_slot_access;
> +  CLEAR_HARD_REG_SET (stack_slot_access);
> +
> +  /* Stack slot can be accessed by stack pointer, frame pointer or
> + registers defined by stack pointer or frame pointer.  */
> +  auto_bitmap worklist;
>
> Please put a line of vertical space here ...

Done.

> +  add_to_hard_reg_set (&stack_slot_access, Pmode,
> +   STACK_POINTER_REGNUM);
> +  bitmap_set_bit (worklist, STACK_POINTER_REGNUM);
>
> ... here ...

Done.

> +  if (frame_pointer_needed)
> +{
> +  add_to_hard_reg_set (&stack_slot_access, Pmode,
> +   HARD_FRAME_POINTER_REGNUM);
> +  bitmap_set_bit (worklist, HARD_FRAME_POINTER_REGNUM);
> +}
>
> ... here ...
>

Done.

> +  unsigned int reg;
>
> ... here ...

Done.

> +  do
> +{
> +  reg = bitmap_clear_first_set_bit (worklist);
> +  ix86_find_all_reg_use (stack_slot_access, reg, worklist);
> +}
> +  while (!bitmap_empty_p (worklist));
> +
> +  hard_reg_set_iterator hrsi;
>
> ... here ...

Done.

> +  EXECUTE_IF_SET_IN_HARD_REG_SET (stack_slot_access, 0, reg, hrsi)
> +for (df_ref ref = DF_REG_USE_CHAIN (reg);
> + ref != NULL;
> + ref = DF_REF_NEXT_REG (ref))
> +  {
> +if (DF_REF_IS_ARTIFICIAL (ref))
> +  continue;
> +
> +rtx_insn *insn = DF_REF_INSN (ref);
>
> ... and here.
>

Done.

> +if (!NONDEBUG_INSN_P (insn))
>
> !NONJUMP_INSN_P ?

Changed.

> +  continue;
>
> Also some vertical space here.

Done.

> +note_stores (insn, ix86_

[PATCH] libstdc++: Improve list assumption after constructor [PR118865]

2025-02-13 Thread Andrew Pinski
The code example here does:
```
if (begin == end) __builtin_unreachable();
std::list nl(begin, end);

for (auto it = nl.begin(); it != nl.end(); it++)
{
...
}
/* Remove the first element of the list. */
nl.erase(nl.begin());
```

And we get a warning because we jump threaded the case where we
think the list was empty from the for loop, BUT we populated it with
a non-empty range. So we can help the compiler here by stating that, after
initializing the list from a non-empty range, the list will not be empty either.

This is able to remove the -Wfree-nonheap-object warning in the first reduced
testcase (with the fix for `begin == end` case added) in the PR 118865; the 
second
reduced testcase has been filed off as PR 118867.

Bootstrapped and tested on x86_64-linux-gnu.

libstdc++-v3/ChangeLog:

PR libstdc++/118865
* include/bits/stl_list.h (_M_initialize_dispatch): If the
iterator range was not empty, add an unreachable hint that the
list is now not empty.

Signed-off-by: Andrew Pinski 
---
 libstdc++-v3/include/bits/stl_list.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/libstdc++-v3/include/bits/stl_list.h 
b/libstdc++-v3/include/bits/stl_list.h
index be33eeb03d4..f987d8b9d0a 100644
--- a/libstdc++-v3/include/bits/stl_list.h
+++ b/libstdc++-v3/include/bits/stl_list.h
@@ -2384,12 +2384,18 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
_M_initialize_dispatch(_InputIterator __first, _InputIterator __last,
   __false_type)
{
+ bool __notempty = __first != __last;
  for (; __first != __last; ++__first)
 #if __cplusplus >= 201103L
emplace_back(*__first);
 #else
push_back(*__first);
 #endif
+if (__notempty)
+  {
+if (begin() == end())
+  __builtin_unreachable();
+  }
}
 
   // Called by list(n,v,a), and the range constructor when it turns out
-- 
2.43.0



Re: [patch, fortran] PR117430 gfortran allows type(C_ptr) in I/O list

2025-02-13 Thread Jerry D

On 2/13/25 1:42 PM, Harald Anlauf wrote:

Am 12.02.25 um 21:49 schrieb Jerry D:

The attached patch is fairly obvious. The use of notify_std is changed
to a gfc_error. Several test cases had to be adjusted.

Regression tested on x86_64.

OK for trunk?


This is not a review, just some random comments on the testsuite changes
by your patch:



I will update and give the i some declarations. I just tried integer and 
it worked. Of course you are correct to declare these as type(c_ptr).


Regarding the use of transfer, I will fix those as well.

The patch itself is trivial so I will wait a day or so for any other 
comments.


Thanks for feedback.

Jerry




diff --git a/gcc/testsuite/gfortran.dg/c_loc_test_17.f90
b/gcc/testsuite/gfortran.dg/c_loc_test_17.f90
index 4c2a7d657ee..92bfca4363d 100644
--- a/gcc/testsuite/gfortran.dg/c_loc_test_17.f90
+++ b/gcc/testsuite/gfortran.dg/c_loc_test_17.f90
@@ -1,5 +1,4 @@
  ! { dg-do compile }
-! { dg-options "" }
  !
  ! PR fortran/56378
  ! PR fortran/52426
@@ -24,5 +23,5 @@ contains
  end module

  use iso_c_binding
-print *, c_loc([1]) ! { dg-error "Argument X at .1. to C_LOC shall have
either the POINTER or the TARGET attribute" }
+i = c_loc([1]) ! { dg-error "Argument X at .1. to C_LOC shall have
either the POINTER or the TARGET attribute" }
^^^ i is not declared a type(c_ptr)
  end

diff --git a/gcc/testsuite/gfortran.dg/c_ptr_tests_10.f03
b/gcc/testsuite/gfortran.dg/c_ptr_tests_10.f03
index 4ce1c6809e4..834570cb74d 100644
--- a/gcc/testsuite/gfortran.dg/c_ptr_tests_10.f03
+++ b/gcc/testsuite/gfortran.dg/c_ptr_tests_10.f03
@@ -1,5 +1,4 @@
  ! { dg-do run }
-! { dg-options "-std=gnu" }
  ! This test case exists because gfortran had an error in converting the
  ! expressions for the derived types from iso_c_binding in some cases.
  module c_ptr_tests_10
@@ -7,7 +6,7 @@ module c_ptr_tests_10

  contains
    subroutine sub0() bind(c)
-    print *, 'c_null_ptr is: ', c_null_ptr
+    print *, 'c_null_ptr is: ', transfer (cptr, C_LONG_LONG)
  
This does not do what one naively might think.
transfer (cptr, C_LONG_LONG) == transfer (cptr, 0)

You probably want: transfer (cptr, 0_C_INTPTR_T)

    end subroutine sub0
  end module c_ptr_tests_10


diff --git a/gcc/testsuite/gfortran.dg/c_ptr_tests_9.f03
b/gcc/testsuite/gfortran.dg/c_ptr_tests_9.f03
index 5a32553b8c5..711b9c157d4 100644
--- a/gcc/testsuite/gfortran.dg/c_ptr_tests_9.f03
+++ b/gcc/testsuite/gfortran.dg/c_ptr_tests_9.f03
@@ -16,9 +16,9 @@ contains
  type(myF90Derived), pointer :: my_f90_type_ptr

  my_f90_type%my_c_ptr = c_null_ptr
-    print *, 'my_f90_type is: ', my_f90_type%my_c_ptr
+    print *, 'my_f90_type is: ', transfer(my_f90_type%my_c_ptr, C_LONG_LONG)
   my_f90_type_ptr => my_f90_type
-    print *, 'my_f90_type_ptr is: ', my_f90_type_ptr%my_c_ptr
+    print *, 'my_f90_type_ptr is: ', transfer(my_f90_type_ptr%my_c_ptr, C_LONG_LONG)
    end subroutine sub0
  end module c_ptr_tests_9

Likewise.

diff --git a/gcc/testsuite/gfortran.dg/init_flag_17.f90
b/gcc/testsuite/gfortran.dg/init_flag_17.f90
index 401830fccbc..8bb9f7b1ef7 100644
--- a/gcc/testsuite/gfortran.dg/init_flag_17.f90
+++ b/gcc/testsuite/gfortran.dg/init_flag_17.f90
@@ -19,8 +19,8 @@ program init_flag_17

    type(ty) :: t

-  print *, t%ptr
-  print *, t%fptr
+  print *, transfer(t%ptr, c_long_long)
+  print *, transfer(t%fptr, c_long_long)

  end program

Likewise.


diff --git a/gcc/testsuite/gfortran.dg/pr32601_1.f03
b/gcc/testsuite/gfortran.dg/pr32601_1.f03
index a297e1728ec..1a48419112d 100644
--- a/gcc/testsuite/gfortran.dg/pr32601_1.f03
+++ b/gcc/testsuite/gfortran.dg/pr32601_1.f03
@@ -4,9 +4,9 @@
  ! PR fortran/32601
  use, intrinsic :: iso_c_binding, only: c_loc, c_ptr
  implicit none
-
+integer i
  ! This was causing an ICE, but is an error because the argument to C_LOC
  ! needs to be a variable.
-print *, c_loc(4) ! { dg-error "shall have either the POINTER or the TARGET attribute" }
+i = c_loc(4) ! { dg-error "shall have either the POINTER or the TARGET attribute" }

  end

Again, i should be declared as type(c_ptr).

Cheers,
Harald


Regards,

Jerry


Author: Jerry DeLisle 
Date:   Tue Feb 11 20:57:50 2025 -0800

 Fortran:  gfortran allows type(C_ptr) in I/O list

 Before this patch, gfortran was accepting invalid use of
 type(c_ptr) in I/O statements. The fix affects several
 existing test cases so no new test case needed.

 Existing tests were modified to pass by either using the
 transfer function to convert to an acceptable value or
 using an assignment to a like type (non-I/O).

 PR fortran/117430

 gcc/fortran/ChangeLog:

 * resolve.cc (resolve_transfer): Issue the error
 with no exceptions allowed.

 gcc/testsuite/ChangeLog:

 * gfortran.dg/c_loc_test_17.f90: Modify to pass.
 * gfortran.dg/c_ptr_tests_10.f03: Likewise.
 * gfortran.dg/c_ptr_tes

Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling

2025-02-13 Thread Vineet Gupta
On 2/14/25 04:58, Jeff Law wrote:
> I'd guess it more work than it'd be worth.  We're just not seeing 
> vsetvls being all that problematical on our design.  I do see a lot of 
> seemingly gratuitous changes in the vector config, but when we make 
> changes to fix that we generally end up with worse performing code.

To be clear, the VSETVLs on their own are not problematic for us either;
it is them causing VL=0 that is.
I have a change in the works which could introduce additional VSETVLs ;-)
-Vineet


Re: [PATCH] tree-optimization/90579 - avoid STLF fail by better optimizing

2025-02-13 Thread Jeff Law




On 2/12/25 7:58 AM, Richard Biener wrote:

For the testcase in question which uses a fold-left vectorized
reduction of a reverse iterating loop we'd need two forwprop
invocations to first bypass the permute emitted for the reverse
iterating loop and then to decompose the vector load that only
feeds element extracts.  The following moves the first transform
to a match.pd pattern and makes sure we fold the element extracts
when the vectorizer emits them so the single forwprop pass can
then pick up the vector load decomposition, avoiding the
store-to-load-forwarding fail this otherwise causes.

Moving simplify_bitfield_ref also makes forwprop remove the dead
VEC_PERM_EXPR via the simple-dce it uses - this was also
previously missing.

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

PR tree-optimization/90579
* tree-ssa-forwprop.cc (simplify_bitfield_ref): Move to
match.pd.
(pass_forwprop::execute): Adjust.
* match.pd (bit_field_ref (vec_perm ...)): New pattern
modeled after simplify_bitfield_ref.
* tree-vect-loop.cc (vect_expand_fold_left): Fold the
element extract stmt, combining it with the vector def.

OK.

Jeff



[PATCH] i386: Do not check vector size conflict when AVX512 is not explicitly set [PR 118815]

2025-02-13 Thread Haochen Jiang
Hi all,

When AVX512 is not explicitly set, we should not take the EVEX512 bit into
consideration when checking vector size. This fixes the intrin header
files reporting warnings when compiling with -Wsystem-headers.

However, there is a side effect on the usage '-march=xxx -mavx10.1-256',
where xxx is an arch with AVX512: it will no longer report a warning on
vector size conflicts. Since this is a rare usage, we accept that.

Ok for trunk and backport to GCC 14?

gcc/ChangeLog:

PR target/118815
* config/i386/i386-options.cc (ix86_option_override_internal):
Do not check vector size conflict when AVX512 is not explicitly
set.

gcc/testsuite/ChangeLog:

PR target/118815
* gcc.target/i386/pr118815.c: New test.
---
 gcc/config/i386/i386-options.cc  | 1 +
 gcc/testsuite/gcc.target/i386/pr118815.c | 9 +
 2 files changed, 10 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr118815.c

diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index 3467ab0bbeb..7e85334d3d3 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -2711,6 +2711,7 @@ ix86_option_override_internal (bool main_args_p,
"using 512 as max vector size");
}
   else if (TARGET_AVX512F_P (opts->x_ix86_isa_flags)
+  && (opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_AVX512F)
   && !(OPTION_MASK_ISA2_EVEX512
& opts->x_ix86_isa_flags2_explicit))
warning (0, "Vector size conflicts between AVX10.1 and AVX512, using "
diff --git a/gcc/testsuite/gcc.target/i386/pr118815.c 
b/gcc/testsuite/gcc.target/i386/pr118815.c
new file mode 100644
index 000..84308fce08a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr118815.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64-v3" } */
+
+#pragma GCC push_options
+#pragma GCC target("avx10.2-256")
+
+void foo();
+
+#pragma GCC pop_options
-- 
2.31.1



Re: [pushed][PATCH v3 0/4] Organize the code and fix PR118828 and PR118843.

2025-02-13 Thread Lulu Cheng

Pushed to r15-7521..r15-7524

On 2025/2/13 at 8:59 PM, Lulu Cheng wrote:

v1 -> v2:
  1. Move __loongarch_{arch,tune} _LOONGARCH_{ARCH,TUNE}
__loongarch_{div32,am_bh,amcas,ld_seq_sa} and
__loongarch_version_major/__loongarch_version_minor to update function.
  2. Fixed PR118843.
  3. Add testsuites.

v2 -> v3:
   1. Modify test cases (pr118828-3.c pr118828-4.c).

Lulu Cheng (4):
   LoongArch: Move the function loongarch_register_pragmas to
 loongarch-c.cc.
   LoongArch: Split the function loongarch_cpu_cpp_builtins into two
 functions.
   LoongArch: After setting the compilation options, update the
 predefined macros.
   LoongArch: When -mfpu=none, '__loongarch_frecipe' shouldn't be defined
 [PR118843].

  gcc/config/loongarch/loongarch-c.cc   | 204 +-
  gcc/config/loongarch/loongarch-protos.h   |   1 +
  gcc/config/loongarch/loongarch-target-attr.cc |  48 -
  .../gcc.target/loongarch/pr118828-2.c |  30 +++
  .../gcc.target/loongarch/pr118828-3.c |  32 +++
  .../gcc.target/loongarch/pr118828-4.c |  32 +++
  gcc/testsuite/gcc.target/loongarch/pr118828.c |  34 +++
  gcc/testsuite/gcc.target/loongarch/pr118843.c |   6 +
  8 files changed, 287 insertions(+), 100 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828-2.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828-3.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828-4.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118843.c





Re:[pushed] [PATCH v2] LoongArch: Adjust the cost of ADDRESS_REG_REG.

2025-02-13 Thread Lulu Cheng

Pushed to r15-7525.

On 2025/2/13 at 4:40 PM, Lulu Cheng wrote:

After changing this cost from 1 to 3, the performance of spec2006
401 473 416 465 482 can be improved by about 2% on LA664.

Add option '-maddr-reg-reg-cost='.

gcc/ChangeLog:

* config/loongarch/genopts/loongarch.opt.in: Add
option '-maddr-reg-reg-cost='.
* config/loongarch/loongarch-def.cc
(loongarch_rtx_cost_data::loongarch_rtx_cost_data): Initialize
addr_reg_reg_cost to 3.
* config/loongarch/loongarch-opts.cc
(loongarch_target_option_override): If '-maddr-reg-reg-cost='
is not used, set it to the initial value.
* config/loongarch/loongarch-tune.h
(struct loongarch_rtx_cost_data): Add the member
addr_reg_reg_cost and its assignment function to the structure
loongarch_rtx_cost_data.
* config/loongarch/loongarch.cc (loongarch_address_insns):
Use la_addr_reg_reg_cost to set the cost of ADDRESS_REG_REG.
* config/loongarch/loongarch.opt: Regenerate.
* config/loongarch/loongarch.opt.urls: Regenerate.
* doc/invoke.texi: Add description of '-maddr-reg-reg-cost='.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/const-double-zero-stx.c: Add
'-maddr-reg-reg-cost=1'.
* gcc.target/loongarch/stack-check-alloca-1.c: Likewise.

Change-Id: I8fbf7a6d073b16c7829b1a9a8d239b131d53ab1b
---
  gcc/config/loongarch/genopts/loongarch.opt.in  | 4 
  gcc/config/loongarch/loongarch-def.cc  | 1 +
  gcc/config/loongarch/loongarch-opts.cc | 3 +++
  gcc/config/loongarch/loongarch-tune.h  | 7 +++
  gcc/config/loongarch/loongarch.cc  | 2 +-
  gcc/config/loongarch/loongarch.opt | 4 
  gcc/config/loongarch/loongarch.opt.urls| 3 +++
  gcc/doc/invoke.texi| 7 ++-
  gcc/testsuite/gcc.target/loongarch/const-double-zero-stx.c | 2 +-
  gcc/testsuite/gcc.target/loongarch/stack-check-alloca-1.c  | 2 +-
  10 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in 
b/gcc/config/loongarch/genopts/loongarch.opt.in
index 8c292c8600d..39c1545e540 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -177,6 +177,10 @@ mbranch-cost=
  Target RejectNegative Joined UInteger Var(la_branch_cost) Save
  -mbranch-cost=COST  Set the cost of branches to roughly COST instructions.
  
+maddr-reg-reg-cost=
+Target RejectNegative Joined UInteger Var(la_addr_reg_reg_cost) Save
+-maddr-reg-reg-cost=COST  Set the cost of ADDRESS_REG_REG to the value calculated by COST.
+
  mcheck-zero-division
  Target Mask(CHECK_ZERO_DIV) Save
  Trap on integer divide by zero.
diff --git a/gcc/config/loongarch/loongarch-def.cc 
b/gcc/config/loongarch/loongarch-def.cc
index b0271eb3b9a..5f235a04ef2 100644
--- a/gcc/config/loongarch/loongarch-def.cc
+++ b/gcc/config/loongarch/loongarch-def.cc
@@ -136,6 +136,7 @@ loongarch_rtx_cost_data::loongarch_rtx_cost_data ()
  movcf2gr (COSTS_N_INSNS (7)),
  movgr2cf (COSTS_N_INSNS (15)),
  branch_cost (6),
+  addr_reg_reg_cost (3),
  memory_latency (4) {}
  
  /* The following properties cannot be looked up directly using "cpucfg".

diff --git a/gcc/config/loongarch/loongarch-opts.cc 
b/gcc/config/loongarch/loongarch-opts.cc
index 36342cc9373..c2a63f75fc2 100644
--- a/gcc/config/loongarch/loongarch-opts.cc
+++ b/gcc/config/loongarch/loongarch-opts.cc
@@ -1010,6 +1010,9 @@ loongarch_target_option_override (struct loongarch_target 
*target,
if (!opts_set->x_la_branch_cost)
  opts->x_la_branch_cost = loongarch_cost->branch_cost;
  
+  if (!opts_set->x_la_addr_reg_reg_cost)
+    opts->x_la_addr_reg_reg_cost = loongarch_cost->addr_reg_reg_cost;
+
/* other stuff */
if (ABI_LP64_P (target->abi.base))
  opts->x_flag_pcc_struct_return = 0;
diff --git a/gcc/config/loongarch/loongarch-tune.h 
b/gcc/config/loongarch/loongarch-tune.h
index e69173ebf79..f7819fe7678 100644
--- a/gcc/config/loongarch/loongarch-tune.h
+++ b/gcc/config/loongarch/loongarch-tune.h
@@ -38,6 +38,7 @@ struct loongarch_rtx_cost_data
unsigned short movcf2gr;
unsigned short movgr2cf;
unsigned short branch_cost;
+  unsigned short addr_reg_reg_cost;
unsigned short memory_latency;
  
/* Default RTX cost initializer, implemented in loongarch-def.cc.  */

@@ -115,6 +116,12 @@ struct loongarch_rtx_cost_data
  return *this;
}
  
+  loongarch_rtx_cost_data addr_reg_reg_cost_ (unsigned short _addr_reg_reg_cost)
+  {
+    addr_reg_reg_cost = _addr_reg_reg_cost;
+    return *this;
+  }
+
loongarch_rtx_cost_data memory_latency_ (unsigned short _memory_latency)
{
  memory_latency = _memory_latency;
diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index e9978370e8c..495b
