date:20240924

Re: [PATCH] libstdc++: more #pragma diagnostic

2024-09-24 Thread Jonathan Wakely

On Tue, 24 Sept 2024, 21:43 Jason Merrill,  wrote:

> On 9/24/24 7:51 AM, Jason Merrill wrote:
> > Tested x86_64-pc-linux-gnu.
> >
> > Is this the right fix, or do we want to stop using these deprecated
> > classes, here and in stl_function.h?
>

We can't stop using them in stl_function.h for ABI compatibility reasons,
and the parallel mode should be deprecated in favour of C++17 parallel
algos so isn't worth "fixing", so I think the pragmas are the right answer
here.

OK, thanks.
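
For readers following along, here is a minimal sketch of the push/ignored/pop pattern the patch applies. The names `legacy_unary_function` and `is_negative` are hypothetical stand-ins for the deprecated `*nary_function` bases and their users; this is not the code from the parallel/ headers.

```cpp
#include <cassert>

// Hypothetical stand-in for a deprecated base such as std::unary_function.
template<typename Arg, typename Result>
struct [[deprecated("demo: prefer a plain functor")]] legacy_unary_function
{
  typedef Arg argument_type;
  typedef Result result_type;
};

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"

// Deriving from the deprecated base would normally warn; inside the
// push/pop region -Wdeprecated-declarations is suppressed.
struct is_negative : legacy_unary_function<int, bool>
{
  bool operator()(int v) const { return v < 0; }
};

#pragma GCC diagnostic pop // -Wdeprecated-declarations

// Small usage check: the functor still behaves normally.
void demo()
{
  assert(is_negative()(-1));
  assert(!is_negative()(3));
}
```

The pop restores whatever diagnostic state the including translation unit had, so user code after the header still gets the warning for its own uses.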



> Oops, adding libstdc++ CC.
>
> > -- 8< --
> >
> > The CI saw failures on 17_intro/headers/c++2011/parallel_mode.cc due to
> > -Wdeprecated-declarations warnings in some parallel/ headers.
> >
> > libstdc++-v3/ChangeLog:
> >
> >   * include/parallel/base.h: Suppress -Wdeprecated-declarations.
> >   * include/parallel/multiseq_selection.h: Likewise.
> > ---
> >   libstdc++-v3/include/parallel/base.h   | 4 
> >   libstdc++-v3/include/parallel/multiseq_selection.h | 6 ++
> >   2 files changed, 10 insertions(+)
> >
> > diff --git a/libstdc++-v3/include/parallel/base.h b/libstdc++-v3/include/parallel/base.h
> > index 5bc5350e723..fcbcc1e0b99 100644
> > --- a/libstdc++-v3/include/parallel/base.h
> > +++ b/libstdc++-v3/include/parallel/base.h
> > @@ -166,6 +166,8 @@ namespace __gnu_parallel
> > { return !_M_comp(__a, __b) && !_M_comp(__b, __a); }
> >   };
> >
> > +#pragma GCC diagnostic push
> > +#pragma GCC diagnostic ignored "-Wdeprecated-declarations" // *nary_function
> >
> > /** @brief Similar to std::unary_negate,
> >  *  but giving the argument types explicitly. */
> > @@ -297,6 +299,8 @@ namespace __gnu_parallel
> >   struct _Multiplies<_Tp, _Tp, _Tp>
> >   : public std::multiplies<_Tp> { };
> >
> > +#pragma GCC diagnostic pop // -Wdeprecated-declarations
> > +
> > /** @brief _Iterator associated with __gnu_parallel::_PseudoSequence.
> >  *  If features the usual random-access iterator functionality.
> >  *  @param _Tp Sequence _M_value type.
> > diff --git a/libstdc++-v3/include/parallel/multiseq_selection.h b/libstdc++-v3/include/parallel/multiseq_selection.h
> > index f25895adbdd..22bd97e6432 100644
> > --- a/libstdc++-v3/include/parallel/multiseq_selection.h
> > +++ b/libstdc++-v3/include/parallel/multiseq_selection.h
> > @@ -48,6 +48,10 @@
> >
> >   namespace __gnu_parallel
> >   {
> > +
> > +#pragma GCC diagnostic push
> > +#pragma GCC diagnostic ignored "-Wdeprecated-declarations" // *nary_function
> > +
> > /** @brief Compare __a pair of types lexicographically, ascending. */
> > template
> >   class _Lexicographic
> > @@ -100,6 +104,8 @@ namespace __gnu_parallel
> > }
> >   };
> >
> > +#pragma GCC diagnostic pop // -Wdeprecated-declarations
> > +
> > /**
> >  *  @brief Splits several sorted sequences at a certain global
> __rank,
> >  *  resulting in a splitting point for each sequence.
> >
> > base-commit: b752eed3e3f2f27570ea89b7c2339468698472a8
>
>

Re: [PATCH] gfortran testsuite: Remove unit-files in files having open-statements, PR116701

2024-09-24 Thread Jerry D


On 9/23/24 11:21 PM, Hans-Peter Nilsson wrote:

Here's a general approach to handle PR116701.  I considered
adding manual deletions as quoted below and mentioned in the
PR, but seeing the handling of "integer 8" in
fortran-torture-execute I decided to follow that example:
better scan the source for open-statements than relying on
manual annotations and people remembering to add them for
new test-cases.

I hope the inclusion of gfortran-dg.exp in
fortran-torture.exp is not controversial, but there's no
fortran-specific testsuite file common to dg and
classic-torture and also this placement is still in the
"Utility routines" section of gfortran-dg.exp.  (BTW, the C
torture-tests changed to the dg framework some time ago - no
more .x-files there and dg-directives actually work - there
are some in gfortran.fortran-torture that are apparently
ignored!)


Explain this change of including gfortran-dg.exp in fortran-torture.exp.

What does it mean in the case I do 'make -k -j4 check-fortran'? Does 
gfortran-dg-exp get performed twice? Forgive my ignorance of the 
testsuite incantations.


Regards,

Jerry

Re: [PATCH] gfortran testsuite: Remove unit-files in files having open-statements, PR116701

2024-09-24 Thread Hans-Peter Nilsson

Thanks for the review!

> Date: Tue, 24 Sep 2024 17:10:27 -0700
> Cc: Jerry D 
> From: Jerry D 
> On 9/23/24 11:21 PM, Hans-Peter Nilsson wrote:
> > I hope the inclusion of gfortran-dg.exp in
> > fortran-torture.exp is not controversial, but there's no
> > fortran-specific testsuite file common to dg and
> > classic-torture and also this placement is still in the
> > "Utility routines" section of gfortran-dg.exp.  (BTW, the C
> > torture-tests changed to the dg framework some time ago - no
> > more .x-files there and dg-directives actually work - there
> > are some in gfortran.fortran-torture that are apparently
> > ignored!)
> 
> Explain this change of including gfortran-dg.exp in fortran-torture.exp.

I need to put the new proc in a file, to be used by both dg
and classic-torture.  I picked among the utility-carrying
files gfortran-dg.exp, as it looked more fitting than
e.g. fortran-modules.exp.  Since it's not previously
included there, I included that file in fortran-torture.exp.

By including that file, not just the new proc
gfortran-dg-rmunits but also the other procs in that file
are available.  Since they don't collide with the
fortran-torture machinery, that should have no effect.

> What does it mean in the case I do 'make -k -j4 check-fortran'? Does 
> gfortran-dg-exp get performed twice?

(I assume you mean "are the gfortran.dg tests run twice" as
other interpretations make less sense to me.)

No.

> Forgive my ignorance of the 
> testsuite incantations.

There's nothing but load_lib and proc definitions in
gfortran-dg.exp, specifically no "top-level code" running
tests like execute.exp or dg.exp, so including it should
have no such effect...but I see that the files it includes
*do* have top-level code (setting global variables for use
by the testsuite machinery, *not* running tests).

Perhaps I should ignore that misnomer and put
gfortran-dg-rmunits in fortran-modules.exp in order to put
pollution worries to rest.  After all, that file already has
the utility proc igrep, used in gfortran-dg-rmunits.  So,
new version coming up.

brgds, H-P

[PATCH] libgcc, libstdc++: Make more entities no longer TU-local [PR115126]

2024-09-24 Thread Nathaniel Shead

I found that my previous minimal change to libstdc++ was only sufficient
to pass regtest on x86_64-pc-linux-gnu; Linaro complained about ARM and
aarch64.  This patch removes the rest of the internal-linkage entities I
could find exposed via libstdc++.

The libgcc changes include some blocks specific to FreeBSD, Solaris <10,
and HP-UX; I haven't been able to test these changes at this time.
Happy to adjust or remove those hunks as needed.  Apologies if I haven't
CC'd in the correct people.

Bootstrapped and regtested on x86_64-pc-linux-gnu and
aarch64-unknown-linux-gnu, OK for trunk?

-- >8 --

In C++20, modules streaming checks for exposures of TU-local entities.
In general exposing internal linkage functions in a header is liable to
cause ODR violations in C++, and this is now detected in a module
context.

This patch goes through and removes 'static' from many functions exposed
through libstdc++ to prevent code like the following from failing:

  export module M;
  extern "C++" {
#include 
  }

Since gthreads is used from C as well, we need to choose whether to use
'inline' or 'static inline' depending on whether we're compiling for C
or C++ (since the semantics of 'inline' are different between the
languages).  Additionally we need to remove static global variables, so
we migrate these to function-local statics to avoid the ODR issues.
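
A minimal sketch of the two techniques described above, assuming a hypothetical `DEMO_INLINE` macro in the spirit of the patch's `__GTHREAD_INLINE`. The function names and the "detection" logic are illustrative only and are not taken from gthr-posix.h.

```cpp
#include <cassert>

// Hypothetical analogue of __GTHREAD_INLINE: plain 'inline' in C++
// (external linkage, one entity program-wide), 'static inline' in C.
#ifdef __cplusplus
# define DEMO_INLINE inline
#else
# define DEMO_INLINE static inline
#endif

// Before (TU-local variable):  static int demo_active_flag = -1;
// After: the state is a function-local static of an inline function,
// so in C++ every translation unit refers to the same object.
DEMO_INLINE int *demo_active_flag(void)
{
  static int flag = -1;
  return &flag;
}

DEMO_INLINE int demo_active_p(void)
{
  int *flag = demo_active_flag();
  if (*flag < 0)
    *flag = 1;  // placeholder for the real detection logic
  return *flag;
}

// Usage check; intended to be called once on fresh state.
void demo()
{
  assert(*demo_active_flag() == -1);
  assert(demo_active_p() == 1);
  assert(*demo_active_flag() == 1);
}
```

In C the 'static inline' variant still gives each translation unit its own copy of the state, which matches the pre-patch behaviour for C users.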

There doesn't seem to be a good workaround for weakrefs, so I've left
them as-is and will work around it in the modules streaming code to
consider them as not TU-local.

The same issue occurs in the objective-C specific parts of gthreads, but
I'm not familiar with the surrounding context and we don't currently
test modules with Objective C++ anyway so I've left it as-is.

PR libstdc++/115126

libgcc/ChangeLog:

* gthr-posix.h (__GTHREAD_INLINE): New macro.
(__gthread_active): Convert from variable to function.
(__gthread_trigger): Mark as __GTHREAD_INLINE instead of static.
(__gthread_active_p): Likewise.
(__gthread_create): Likewise.
(__gthread_join): Likewise.
(__gthread_detach): Likewise.
(__gthread_equal): Likewise.
(__gthread_self): Likewise.
(__gthread_yield): Likewise.
(__gthread_once): Likewise.
(__gthread_key_create): Likewise.
(__gthread_key_delete): Likewise.
(__gthread_getspecific): Likewise.
(__gthread_setspecific): Likewise.
(__gthread_mutex_init_function): Likewise.
(__gthread_mutex_destroy): Likewise.
(__gthread_mutex_lock): Likewise.
(__gthread_mutex_trylock): Likewise.
(__gthread_mutex_timedlock): Likewise.
(__gthread_mutex_unlock): Likewise.
(__gthread_recursive_mutex_init_function): Likewise.
(__gthread_recursive_mutex_lock): Likewise.
(__gthread_recursive_mutex_trylock): Likewise.
(__gthread_recursive_mutex_timedlock): Likewise.
(__gthread_recursive_mutex_unlock): Likewise.
(__gthread_recursive_mutex_destroy): Likewise.
(__gthread_cond_init_function): Likewise.
(__gthread_cond_broadcast): Likewise.
(__gthread_cond_signal): Likewise.
(__gthread_cond_wait): Likewise.
(__gthread_cond_timedwait): Likewise.
(__gthread_cond_wait_recursive): Likewise.
(__gthread_cond_destroy): Likewise.
(__gthread_rwlock_rdlock): Likewise.
(__gthread_rwlock_tryrdlock): Likewise.
(__gthread_rwlock_wrlock): Likewise.
(__gthread_rwlock_trywrlock): Likewise.
(__gthread_rwlock_unlock): Likewise.
* gthr-single.h (__GTHREAD_INLINE): New macro.
(__gthread_active_p): Mark as __GTHREAD_INLINE instead of static.
(__gthread_once): Likewise.
(__gthread_key_create): Likewise.
(__gthread_key_delete): Likewise.
(__gthread_getspecific): Likewise.
(__gthread_setspecific): Likewise.
(__gthread_mutex_destroy): Likewise.
(__gthread_mutex_lock): Likewise.
(__gthread_mutex_trylock): Likewise.
(__gthread_mutex_unlock): Likewise.
(__gthread_recursive_mutex_lock): Likewise.
(__gthread_recursive_mutex_trylock): Likewise.
(__gthread_recursive_mutex_unlock): Likewise.
(__gthread_recursive_mutex_destroy): Likewise.

libstdc++-v3/ChangeLog:

* include/bits/shared_ptr.h (std::__is_shared_ptr): Remove
unnecessary 'static'.
* include/bits/unique_ptr.h (std::__is_unique_ptr): Likewise.
* include/std/future (std::__create_task_state): Likewise.
* include/std/shared_mutex (_GLIBCXX_GTHRW): Likewise.
(__glibcxx_rwlock_init): Likewise.
(__glibcxx_rwlock_timedrdlock): Likewise.
(__glibcxx_rwlock_timedwrlock): Likewise.
(__glibcxx_rwlock_rdlock): Likewise.
(__glibcxx_rwlock_tryrdlock): Likewise.
(__glibcxx_rwlock_wrlock): Likewise.
(__glibcxx_rwlock_trywrlock): Likewise.
(__gli

[PATCH 11/10] c++/modules: Treat weakrefs as not TU-local [PR115126]

2024-09-24 Thread Nathaniel Shead

This follows up on some more test failures reported by Linaro on
aarch64.  The testcase also depends on the libgcc/libstdc++ patch here: 
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663749.html

To avoid an intermediary state where aarch64 regtests fail I could
include the module.cc changes in patch 6 of this series.  Let me know if
you'd like me to send through a full updated v2 patch series instead of
having all these 'extra' patches fixing issues on other platforms...

Bootstrapped and regtested on x86_64-pc-linux and
aarch64-unknown-linux-gnu, OK for trunk?

-- >8 --

On some targets the gthreads support code uses weakref aliases on
entities marked 'static'.  By the C++ standard these have internal
linkage, but we really shouldn't consider these as TU-local.

This provides enough of the puzzle to pass the testcase in the PR on at
least x86_64-linux and aarch64-linux; we'll see what happens on other
targets.

PR c++/115126

gcc/cp/ChangeLog:

* module.cc (depset::hash::is_tu_local_entity): Don't treat weak
entities as TU-local.

gcc/testsuite/ChangeLog:

* g++.dg/modules/xtreme-header-8.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/module.cc   | 5 +
 gcc/testsuite/g++.dg/modules/xtreme-header-8.C | 8 
 2 files changed, 13 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/modules/xtreme-header-8.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index d54f1c88366..3e9b63c1e56 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -13135,6 +13135,11 @@ depset::hash::is_tu_local_entity (tree decl, bool explain/*=false*/)
   linkage_kind kind = decl_linkage (decl);
   if (kind == lk_internal)
 {
+  /* But don't consider weak entities as TU-local.  */
+  tree inner = STRIP_TEMPLATE (decl);
+  if (VAR_OR_FUNCTION_DECL_P (inner) && DECL_WEAK (inner))
+   return false;
+
   if (explain)
inform (loc, "%qD declared with internal linkage", decl);
   return true;
diff --git a/gcc/testsuite/g++.dg/modules/xtreme-header-8.C b/gcc/testsuite/g++.dg/modules/xtreme-header-8.C
new file mode 100644
index 000..9da4e01cc68
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/xtreme-header-8.C
@@ -0,0 +1,8 @@
+// PR c++/115126
+// { dg-additional-options "-fmodules-ts -Wignored-exposures" }
+// { dg-module-cmi xstd }
+
+export module xstd;
+extern "C++" {
+  #include "xtreme-header.h"
+}
-- 
2.46.0

Re: [PATCH] [x86] Define VECTOR_STORE_FLAG_VALUE

2024-09-24 Thread Hongtao Liu

On Tue, Sep 24, 2024 at 5:46 PM Uros Bizjak  wrote:
>
> On Tue, Sep 24, 2024 at 11:23 AM liuhongt  wrote:
> >
> > Return constm1_rtx when GET_MODE_CLASS (MODE) == MODE_VECTOR_INT.
> > Otherwise NULL_RTX.
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > Ready to push to trunk.
> >
> > gcc/ChangeLog:
> >
> > * config/i386/i386.h (VECTOR_STORE_FLAG_VALUE): New macro.
> >
> > gcc/testsuite/ChangeLog:
> > * gcc.dg/rtl/x86_64/vector_eq.c: New test.
> > ---
> >  gcc/config/i386/i386.h  |  5 +++-
> >  gcc/testsuite/gcc.dg/rtl/x86_64/vector_eq.c | 26 +
> >  2 files changed, 30 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/rtl/x86_64/vector_eq.c
> >
> > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> > index c1ec92ffb15..b12be41424f 100644
> > --- a/gcc/config/i386/i386.h
> > +++ b/gcc/config/i386/i386.h
> > @@ -899,7 +899,10 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
> > and give entire struct the alignment of an int.  */
> >  /* Required on the 386 since it doesn't have bit-field insns.  */
> >  #define PCC_BITFIELD_TYPE_MATTERS 1
> > -
> > +
> > +#define VECTOR_STORE_FLAG_VALUE(MODE) \
> > +  (GET_MODE_CLASS (MODE) == MODE_VECTOR_INT ? constm1_rtx : NULL_RTX)
> > +
> >  /* Standard register usage.  */
> >
> >  /* This processor has special stack-like registers.  See reg-stack.cc
> > diff --git a/gcc/testsuite/gcc.dg/rtl/x86_64/vector_eq.c b/gcc/testsuite/gcc.dg/rtl/x86_64/vector_eq.c
> > new file mode 100644
> > index 000..b82603d0b64
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/rtl/x86_64/vector_eq.c
> > @@ -0,0 +1,26 @@
> > +/* { dg-do compile { target x86_64-*-* } } */
>
> target { { i?86-*-* x86_64-*-* } && lp64 }
Thanks, changed.
>
> Uros.
>
> > +/* { dg-additional-options "-O2 -march=x86-64-v3" } */
> > +
> > +typedef int v4si __attribute__((vector_size(16)));
> > +
> > +v4si __RTL (startwith ("vregs")) foo (void)
> > +{
> > +(function "foo"
> > +  (insn-chain
> > +(block 2
> > +  (edge-from entry (flags "FALLTHRU"))
> > +  (cnote 1 [bb 2] NOTE_INSN_BASIC_BLOCK)
> > +  (cnote 2 NOTE_INSN_FUNCTION_BEG)
> > +  (cinsn 3 (set (reg:V4SI <0>) (const_vector:V4SI [(const_int 0) (const_int 0) (const_int 0) (const_int 0)])))
> > +  (cinsn 5 (set (reg:V4SI <2>)
> > +   (eq:V4SI (reg:V4SI <0>) (reg:V4SI <1>))))
> > +  (cinsn 6 (set (reg:V4SI <3>) (reg:V4SI <2>)))
> > +  (cinsn 7 (set (reg:V4SI xmm0) (reg:V4SI <3>)))
> > +  (edge-to exit (flags "FALLTHRU"))
> > +)
> > +  )
> > + (crtl (return_rtx (reg/i:V4SI xmm0)))
> > +)
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "vpxor" } } */
> > --
> > 2.31.1
> >



-- 
BR,
Hongtao
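
As a source-level analogue of the all-ones convention that the quoted macro describes for integer vector modes, the sketch below uses the GNU vector extension, where a true lane of a vector comparison is -1 and a false lane is 0. This is a language-level illustration under that assumption only; it does not exercise VECTOR_STORE_FLAG_VALUE or the RTL test itself, and `lanes_equal` is a hypothetical helper name.

```cpp
#include <cassert>

typedef int v4si __attribute__((vector_size(16)));

// Element-wise equality via the GNU vector extension: each result lane
// is -1 (all bits set) where the inputs match and 0 elsewhere.
inline v4si lanes_equal(v4si a, v4si b)
{
  return a == b;
}

// Usage check on one concrete pair of vectors.
void demo()
{
  v4si a = {1, 2, 3, 4};
  v4si b = {1, 0, 3, 0};
  v4si r = lanes_equal(a, b);
  assert(r[0] == -1 && r[1] == 0 && r[2] == -1 && r[3] == 0);
}
```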

Re: [PATCH] i386: Add GENERIC and GIMPLE folders of __builtin_ia32_{min,max}* [PR116738]

2024-09-24 Thread Hongtao Liu

On Wed, Sep 25, 2024 at 1:07 AM Jakub Jelinek  wrote:
>
> Hi!
>
> The following patch adds GENERIC and GIMPLE folders for various
> x86 min/max builtins.
> As discussed, these builtins have effectively x < y ? x : y
> (or x > y ? x : y) behavior.
> The GENERIC folding is done if all the (relevant) arguments are
> constants (such as VECTOR_CST for vectors) and is done because
> the GIMPLE folding can't easily handle masking, rounding and the
> ss/sd cases (in a way that it would be pattern recognized back to the
> corresponding instructions).  The GIMPLE folding is also done just
> for TARGET_SSE4 or later when optimizing, otherwise it is apparently
> not matched back.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2024-09-24  Jakub Jelinek  
>
> PR target/116738
> * config/i386/i386.cc (ix86_fold_builtin): Handle
> IX86_BUILTIN_M{IN,AX}{S,P}{S,H,D}*.
> (ix86_gimple_fold_builtin): Handle IX86_BUILTIN_M{IN,AX}P{S,H,D}*.
>
> * gcc.target/i386/avx512f-pr116738-1.c: New test.
> * gcc.target/i386/avx512f-pr116738-2.c: New test.
>
> --- gcc/config/i386/i386.cc.jj  2024-09-12 10:56:57.344683959 +0200
> +++ gcc/config/i386/i386.cc 2024-09-23 15:15:40.154783766 +0200
> @@ -18507,6 +18507,8 @@ ix86_fold_builtin (tree fndecl, int n_ar
> = (enum ix86_builtins) DECL_MD_FUNCTION_CODE (fndecl);
>enum rtx_code rcode;
>bool is_vshift;
> +  enum tree_code tcode;
> +  bool is_scalar;
>unsigned HOST_WIDE_INT mask;
>
>switch (fn_code)
> @@ -18956,6 +18958,133 @@ ix86_fold_builtin (tree fndecl, int n_ar
> }
>   break;
>
> +   case IX86_BUILTIN_MINSS:
> +   case IX86_BUILTIN_MINSH_MASK:
> + tcode = LT_EXPR;
> + is_scalar = true;
> + goto do_minmax;
> +
> +   case IX86_BUILTIN_MAXSS:
> +   case IX86_BUILTIN_MAXSH_MASK:
> + tcode = GT_EXPR;
> + is_scalar = true;
> + goto do_minmax;
> +
> +   case IX86_BUILTIN_MINPS:
> +   case IX86_BUILTIN_MINPD:
> +   case IX86_BUILTIN_MINPS256:
> +   case IX86_BUILTIN_MINPD256:
> +   case IX86_BUILTIN_MINPS512:
> +   case IX86_BUILTIN_MINPD512:
> +   case IX86_BUILTIN_MINPS128_MASK:
> +   case IX86_BUILTIN_MINPD128_MASK:
> +   case IX86_BUILTIN_MINPS256_MASK:
> +   case IX86_BUILTIN_MINPD256_MASK:
> +   case IX86_BUILTIN_MINPH128_MASK:
> +   case IX86_BUILTIN_MINPH256_MASK:
> +   case IX86_BUILTIN_MINPH512_MASK:
> + tcode = LT_EXPR;
> + is_scalar = false;
> + goto do_minmax;
> +
> +   case IX86_BUILTIN_MAXPS:
> +   case IX86_BUILTIN_MAXPD:
> +   case IX86_BUILTIN_MAXPS256:
> +   case IX86_BUILTIN_MAXPD256:
> +   case IX86_BUILTIN_MAXPS512:
> +   case IX86_BUILTIN_MAXPD512:
> +   case IX86_BUILTIN_MAXPS128_MASK:
> +   case IX86_BUILTIN_MAXPD128_MASK:
> +   case IX86_BUILTIN_MAXPS256_MASK:
> +   case IX86_BUILTIN_MAXPD256_MASK:
> +   case IX86_BUILTIN_MAXPH128_MASK:
> +   case IX86_BUILTIN_MAXPH256_MASK:
> +   case IX86_BUILTIN_MAXPH512_MASK:
> + tcode = GT_EXPR;
> + is_scalar = false;
> +   do_minmax:
> + gcc_assert (n_args >= 2);
> + if (TREE_CODE (args[0]) != VECTOR_CST
> + || TREE_CODE (args[1]) != VECTOR_CST)
> +   break;
> + mask = HOST_WIDE_INT_M1U;
> + if (n_args > 2)
> +   {
> + gcc_assert (n_args >= 4);
> + /* This is masked minmax.  */
> + if (TREE_CODE (args[3]) != INTEGER_CST
> + || TREE_SIDE_EFFECTS (args[2]))
> +   break;
> + mask = TREE_INT_CST_LOW (args[3]);
> + unsigned elems = TYPE_VECTOR_SUBPARTS (TREE_TYPE (args[0]));
> + mask |= HOST_WIDE_INT_M1U << elems;
> + if (mask != HOST_WIDE_INT_M1U
> + && TREE_CODE (args[2]) != VECTOR_CST)
> +   break;
> + if (n_args >= 5)
> +   {
> + if (!tree_fits_uhwi_p (args[4]))
> +   break;
> + if (tree_to_uhwi (args[4]) != 4
> + && tree_to_uhwi (args[4]) != 8)
> +   break;
> +   }
> + if (mask == (HOST_WIDE_INT_M1U << elems))
> +   return args[2];
> +   }
> + /* Punt on NaNs, unless exceptions are disabled.  */
> + if (HONOR_NANS (args[0])
> + && (n_args < 5 || tree_to_uhwi (args[4]) != 8))
> +   for (int i = 0; i < 2; ++i)
> + {
> +   unsigned count = vector_cst_encoded_nelts (args[i]), j;
> +   for (j = 0; j < count; ++j)
> + if (!tree_expr_nan_p (VECTOR_CST_ENCODED_ELT (args[i], j)))
Is this a typo? I assume you want to check if the component is NAN, so
tree_expr_nan_p, not !tree_expr_nan_p?
> +   break;
> +   if (j < count)
> +
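
To make the NaN question above concrete, here is a small scalar sketch of the "x < y ? x : y" behaviour quoted at the top of the patch description. The helper names `min_like` and `max_like` are hypothetical; the sketch models only the selection rule, not masking, rounding, or the builtins themselves.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Scalar models of the selection rule quoted in the patch description.
inline float min_like(float x, float y) { return x < y ? x : y; }
inline float max_like(float x, float y) { return x > y ? x : y; }

void demo()
{
  const float nan = std::numeric_limits<float>::quiet_NaN();

  assert(min_like(1.0f, 2.0f) == 1.0f);
  assert(max_like(1.0f, 2.0f) == 2.0f);

  // Any ordered comparison with NaN is false, so the second operand
  // is returned: the operation is not symmetric in its arguments.
  assert(min_like(nan, 1.0f) == 1.0f);
  assert(std::isnan(min_like(1.0f, nan)));

  // -0.0 < +0.0 is false as well, so the second operand wins here too.
  assert(!std::signbit(min_like(-0.0f, 0.0f)));
}
```

Because the result depends on which operand is the NaN, a constant folder has to either reproduce this rule exactly or decline to fold when a NaN element is present.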

[PATCH v2] gfortran testsuite: Remove unit-files in files having open-statements, PR116701

2024-09-24 Thread Hans-Peter Nilsson

Changes since v1:
- Rename gfortran-dg-rmunits to fortran-delete-unit-files.
- Move it to lib/fortran-modules.exp.
- Tweak commit message accordingly and mention cause of placement of
  the proc.
- Tweak proc comment to mention why keeping removals unique despite
  comment.

Here's a general approach to handle PR116701.  I considered
adding manual deletions as quoted below and mentioned in the
PR, but seeing the handling of "integer 8" in
fortran-torture-execute I decided to follow that example:
better scan the source for open-statements than relying on
manual annotations and people remembering to add them for
new test-cases.

I hope the inclusion of gfortran-dg.exp in
fortran-torture.exp is not controversial, but there's no
fortran-specific testsuite file common to dg and
classic-torture and also this placement is still in the
"Utility routines" section of gfortran-dg.exp.  (BTW, the C
torture-tests changed to the dg framework some time ago - no
more .x-files there and dg-directives actually work - there
are some in gfortran.fortran-torture that are apparently
ignored!)

There's one further cleanup possible, removing the manual
removal in open_errors_2.f90 (which should have used
"target", not "build")

Works for cris-elf (no regressions).  Version v1 was also
similarly regtested on native x86_64-linux-gnu.  Manual
checks have verified the unit-removal.

Ok to commit?

-- >8 --
PR testsuite/116701 shows that left-behind files from
unnamed gfortran open statements (named fort.N, where N =
unit number) can interfere with the result of a subsequent
run.  While that's unlikely to happen for a "real" fortran
target or a test with a deleting close-statement, test-cases
should not rely on previous test-cases passing and not
execute along different execution paths depending on earlier
runs, even if the difference is benevolent.

Most but not all fortran test-cases go through
gfortran-dg-runtest (gfortran.dg) or fortran-torture-execute
(gfortran.fortran-torture).  However, the exceptions, with
more complex framework and call-chains, either don't run or
don't have open-statements, so a more complex solution
doesn't seem worthwhile.  If test-cases with open-statements
are added later to those parts of the test-suite, calls to
fortran-delete-unit-files at the right spot may be added or
worst case, "manual" cleanup-calls added, like:
! { dg-final { remote_file target delete "fort.10" } }
Put the new proc in fortran-modules.exp since that's where other
common fortran-testsuite dejagnu-library functions are located.

PR testsuite/116701
* lib/fortran-modules.exp (fortran-delete-unit-files): New proc.
* lib/gfortran-dg.exp (gfortran-dg-runtest): Call
fortran-delete-unit-files after executing test.
* lib/fortran-torture.exp (fortran-torture-execute): Ditto.
---
 gcc/testsuite/lib/fortran-modules.exp | 21 +
 gcc/testsuite/lib/fortran-torture.exp |  2 ++
 gcc/testsuite/lib/gfortran-dg.exp |  1 +
 3 files changed, 24 insertions(+)

diff --git a/gcc/testsuite/lib/fortran-modules.exp b/gcc/testsuite/lib/fortran-modules.exp
index 158b16bada91..a7196f13ed22 100644
--- a/gcc/testsuite/lib/fortran-modules.exp
+++ b/gcc/testsuite/lib/fortran-modules.exp
@@ -172,3 +172,24 @@ proc igrep { args } {
 }
 return $grep_out
 }
+
+# If the code has any "open" statements for numbered units, make sure
+# no corresponding output file remains.  Redundant remove operations
+# are ok, but duplicate removals look sloppy, so track for uniqueness.
+proc fortran-delete-unit-files { src } {  
+set openpat {open *\( *(?:unit *= *)?([0-9]+)}
+set openmatches [igrep $src $openpat]
+if {![string match "" $openmatches]} {
+   # verbose -log "Found \"$openmatches\""
+   set deleted_units {}
+   foreach openmatch $openmatches {
+   regexp -nocase -- "$openpat" $openmatch match unit
+   if {[lsearch $deleted_units $unit] < 0} {
+   set rmfile "fort.$unit"
+   verbose -log "Deleting $rmfile"
+   remote_file target delete "fort.$unit"
+   lappend deleted_units $unit
+   }
+   }
+}
+}
diff --git a/gcc/testsuite/lib/fortran-torture.exp b/gcc/testsuite/lib/fortran-torture.exp
index 66f5bc822232..0727fb4fb0a6 100644
--- a/gcc/testsuite/lib/fortran-torture.exp
+++ b/gcc/testsuite/lib/fortran-torture.exp
@@ -332,6 +332,8 @@ proc fortran-torture-execute { src } {
catch { remote_file build delete $executable }
 }
$status "$testcase execution, $option"
+
+   fortran-delete-unit-files $src
 }
 cleanup-modules ""
 }
diff --git a/gcc/testsuite/lib/gfortran-dg.exp b/gcc/testsuite/lib/gfortran-dg.exp
index fcba95dc3961..2edc09e5c995 100644
--- a/gcc/testsuite/lib/gfortran-dg.exp
+++ b/gcc/testsuite/lib/gfortran-dg.exp
@@ -160,6 +160,7 @@ proc gfortran-dg-runtest { testcases flags 
default-extra-flags } {
foreach flags_t $option_list {
verbo
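
The scanning step of fortran-delete-unit-files can be sketched outside Tcl as follows. This is an illustration in C++ under stated assumptions: it reuses the regular expression from the proc, scans a whole source string rather than going line by line through igrep, and only returns the file names instead of calling remote_file. The function name `unit_files_to_delete` is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <set>
#include <string>
#include <vector>

// Collect "fort.N" for every numbered unit opened in the source text,
// keeping each name once, in order of first appearance.
std::vector<std::string> unit_files_to_delete(const std::string& source)
{
  // Same pattern as in the Tcl proc, matched case-insensitively.
  static const std::regex open_pat(R"(open *\( *(?:unit *= *)?([0-9]+))",
                                   std::regex::icase);
  std::set<std::string> seen;
  std::vector<std::string> files;
  for (std::sregex_iterator it(source.begin(), source.end(), open_pat), end;
       it != end; ++it)
    {
      const std::string unit = (*it)[1].str();
      if (seen.insert(unit).second)  // true only for the first occurrence
        files.push_back("fort." + unit);
    }
  return files;
}

void demo()
{
  const std::string src =
    "      open(10)\n"
    "      OPEN (unit=11, status='scratch')\n"
    "      open (unit = 10, file='x.dat')\n"
    "      open(newunit=u, file='y.dat')\n";
  const std::vector<std::string> files = unit_files_to_delete(src);
  assert(files.size() == 2);
  assert(files[0] == "fort.10");
  assert(files[1] == "fort.11");
}
```

As in the proc, a named open such as the third line still yields a (redundant) deletion candidate, while a newunit= open is not matched.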

Re: libgomp: with USM, init 'link' variables with host address

2024-09-24 Thread Tobias Burnus


Now committed as r15-3836-g4cb20dc043cf70

Contrary to the originally posted patch, it also acts on the newer/newly 
added 'omp requires self_maps'.


In the area of (unified-)shared memory/self maps, the next step seems to
be to still do the mapping for static variables – before moving to
refinements like how to handle implicit 'declare target' for static
variables, …


For this piece of code, we also want to run it for APUs even when no USM 
has been requested, avoid adding those to the mapping table (for self 
maps) and do a more efficient mapping (e.g. memcpy or avoid multiple locks).


Tobias

Tobias Burnus wrote:


short version: I think the patch as posted is fine and no action 
beyond is needed for this one issue.


See below for the long version.

Possible modifications (now or as follow up):
- using memcpy + or let the plugin do it
- not adding link variables to the splay tree with 'USM'.

Thomas Schwinge wrote:

Tested on x86-64-gnu-linux and nvptx offloading (that supports USM).

(I yet have to set up such a USM configuration...)


You already used a USM config, e.g., when running gfx90a (likewise:
gfx90c), except that on mainline USM currently only works if you
explicitly set 'export HSA_XNACK=1'.


For Nvptx, you need a post-Volta GPU with the open-kernels driver,
which is the default for newer driver versions.


* * *

Do I understand correctly that even if
'GOMP_REQUIRES_UNIFIED_SHARED_MEMORY', we cannot just skip all the
'mem_map' setup in 'gomp_load_image_to_device' etc., because we're not
(yet?) setting 'GOMP_OFFLOAD_CAP_SHARED_MEM'?


We actually do set GOMP_OFFLOAD_CAP_SHARED_MEM with 'requires 
unified_shared_memory'.


But, indeed, we cannot skip the memory mapping parts – due to the way 
we handle static variables.


* * *


+
+  if (is_link_var
+  && (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY))
+    gomp_copy_host2dev (devicep, NULL, (void *) target_var->start,
+    &k->host_start, sizeof (void *), false, NULL);
  }

Calling 'gomp_copy_host2dev' looks a bit funny given we've just
determined USM (..., but I'm not asking for plain 'memcpy').


I guess a plain memcpy would do as well. [Assuming that the device's 
static variable is host accessible, which it probably is and should be.]


I will add changing this to my to-do list for USM-related tasks;
possibly moving it to the plugin side has some advantages? Possibly
also not adding it to the splay tree if not needed. (Cf. below for env var
discussion.)


Regarding the unload: For 'declare target link(A)', we have, e.g., 
'static int *A' on the device side. Thus, we could do 'A = NULL' – and 
rather should do 'A = {clobber}', but that's rather pointless in 
general and especially when unloading the image.



What's the advantage/rationale of doing this here vs. in
'gomp_map_vars_internal' for 'REFCOUNT_LINK'?  (May be worth a source
code comment?)


(A, B, C refers to the following example.)

We don't see 'A' (or 'B') in the GOMP_target_ext call and thus not in 
gomp_map_vars_internal.


Besides: We only want to do the initialization once and not every time 
gomp_map_vars_internal is called.


I think the following program may help to understand the issue and the 
patch better.


Note: While A, B, C are 'int …[3]' on the host, on the device we only 
have 'int B[3]' while for A it's 'int *A' and C only exists on the host.


 * * *

#pragma omp requires unified_shared_memory

static int A[3], B[3], C[3];
#pragma omp declare target link(A) enter(B)

#pragma omp begin declare target
void f(int *p)
{
   A[2] += B[2] + p[2];  // p points to the host's C variable
}
#pragma omp end declare target

void foo(int dev) {
  int *ptr = C;
  #pragma omp target firstprivate(ptr) device(dev)
    f (ptr);
}


* * *

Here, 'ptr' (and thus 'p') point to the host 'C' variable, both before
the target region and inside the target region.

'B' points to the device local version of the variable.

And 'A' on a non-host device is likely to be NULL ('static int *A' + 
.BSS) before this patch.

Or pointing to the host's 'A' with this patch.

* * *

With A pointing to the host version (and likewise 'p' pointing to the
host C), host fallback and device version yield identical results for
'A' and for 'C' (via ptr/p). However, 'B' on host and non-host device
have nothing in common. While that might be fine, in general it is not.


Hence, in order to get for a .BSS valued 'B' the same result on host 
and device, we need, e.g.


#pragma omp target data map(always: B) device(dev)
  foo (dev);

to call 'foo' to ensure that the two 'B' are in sync.

* * *

Code wise, this means that with GOMP_OFFLOAD_CAP_SHARED_MEM, we still
have to apply the map for 'declare target enter(…)' variables, except if
host and device share the same code – but that should only be the case
for host fallback (= initial device) and, possibly,
GOMP_OFFLOAD_CAP_NATIVE_EXEC.


* * *

NOTE: OpenMP still permits to honor explicit 'map' with 'requires 
unified_sha

[pushed] libgcc, Darwin: Drop the legacy library build for macOS >= 15 [PR116809].

2024-09-24 Thread Iain Sandoe

Tested on i686-darwin9, 17; x86_64-darwin17, 19, 21, 23 and my FX on
x86_64 darwin24, pushed to trunk, thanks
Iain

--- 8< ---

We have been building a legacy libgcc_s.1 DSO to support code that
was built with older compilers.

From macOS 15, the unwinder no longer exports some of the symbols used
in that library, which (a) causes bootstrap to fail and (b) means that the
legacy library is no longer useful.

No open branch of GCC emits references to this library - and any already
-built code that depends on the symbols would need rework anyway.

PR target/116809

libgcc/ChangeLog:

* config.host: Build legacy libgcc_s.1 on hosts before macOS 15.
* config/i386/t-darwin: Remove reference to legacy libgcc_s.1
* config/rs6000/t-darwin: Likewise.
* config/t-darwin-libgccs1: New file.

Signed-off-by: Iain Sandoe 
---
 libgcc/config.host  | 11 +++
 libgcc/config/i386/t-darwin |  3 ---
 libgcc/config/rs6000/t-darwin   |  3 ---
 libgcc/config/t-darwin-libgccs1 |  3 +++
 4 files changed, 10 insertions(+), 10 deletions(-)
 create mode 100644 libgcc/config/t-darwin-libgccs1

diff --git a/libgcc/config.host b/libgcc/config.host
index 5c6b656531f..00bd6384c0f 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -239,22 +239,25 @@ case ${host} in
   esac
   tmake_file="$tmake_file t-slibgcc-darwin"
   case ${host} in
+x86_64-*-darwin2[0-3]*)
+  tmake_file="t-darwin-min-11 t-darwin-libgccs1 $tmake_file"
+  ;;
 *-*-darwin2*)
   tmake_file="t-darwin-min-11 $tmake_file"
   ;;
 *-*-darwin1[89]*)
-  tmake_file="t-darwin-min-8 $tmake_file"
+  tmake_file="t-darwin-min-8 t-darwin-libgccs1 $tmake_file"
   ;;
 *-*-darwin9* | *-*-darwin1[0-7]*)
-  tmake_file="t-darwin-min-5 $tmake_file"
+  tmake_file="t-darwin-min-5 t-darwin-libgccs1 $tmake_file"
   ;;
 *-*-darwin[4-8]*)
-  tmake_file="t-darwin-min-1 $tmake_file"
+  tmake_file="t-darwin-min-1 t-darwin-libgccs1 $tmake_file"
   ;;
 *)
   # Fall back to configuring for the oldest system known to work with
   # all archs and the current sources.
-  tmake_file="t-darwin-min-5 $tmake_file"
+  tmake_file="t-darwin-min-5 t-darwin-libgccs1 $tmake_file"
   echo "Warning: libgcc configured to support macOS 10.5" 1>&2
   ;;
   esac
diff --git a/libgcc/config/i386/t-darwin b/libgcc/config/i386/t-darwin
index 4c18da1efbf..c6b3acaaca2 100644
--- a/libgcc/config/i386/t-darwin
+++ b/libgcc/config/i386/t-darwin
@@ -4,6 +4,3 @@ LIB2FUNCS_EXCLUDE = _fixtfdi _fixunstfdi _floatditf _floatunditf
 
 # Extra symbols for this port.
 SHLIB_MAPFILES += $(srcdir)/config/i386/libgcc-darwin.ver
-
-# Build a legacy libgcc_s.1
-BUILD_LIBGCCS1 = YES
diff --git a/libgcc/config/rs6000/t-darwin b/libgcc/config/rs6000/t-darwin
index 183d0df92ce..8b513bdb1d7 100644
--- a/libgcc/config/rs6000/t-darwin
+++ b/libgcc/config/rs6000/t-darwin
@@ -56,6 +56,3 @@ unwind-dw2_s.o: HOST_LIBGCC2_CFLAGS += -maltivec
 unwind-dw2.o: HOST_LIBGCC2_CFLAGS += -maltivec
 
 LIB2ADDEH += $(srcdir)/config/rs6000/darwin-fallback.c
-
-# Build a legacy libgcc_s.1
-BUILD_LIBGCCS1 = YES
diff --git a/libgcc/config/t-darwin-libgccs1 b/libgcc/config/t-darwin-libgccs1
new file mode 100644
index 000..b88b1a5bba8
--- /dev/null
+++ b/libgcc/config/t-darwin-libgccs1
@@ -0,0 +1,3 @@
+
+# Build a legacy libgcc_s.1
+BUILD_LIBGCCS1 = YES
-- 
2.39.2 (Apple Git-143)

Re: [PATCH] c++, v2: Implement C++23 P2718R0 - Wording for P2644R1 Fix for Range-based for Loop [PR107637]

2024-09-24 Thread Jakub Jelinek

On Tue, Sep 24, 2024 at 01:34:44PM -0400, Jason Merrill wrote:
> Let's also give an error for trying to disable it in C++23+.
> Missing function comment, maybe just use the one below?
> Please add a comment to this and range-for4 explaining that this is to get
> the fix enabled in GNU modes.
> 
> OK with those changes.

Done, committed now, thanks for the review.

I've also committed the following tweak for the status page:

diff --git a/htdocs/projects/cxx-status.html b/htdocs/projects/cxx-status.html
index d986fc79..76f6ef6d 100644
--- a/htdocs/projects/cxx-status.html
+++ b/htdocs/projects/cxx-status.html
@@ -576,7 +576,7 @@
 
Wording for P2644R1 Fix for Range-based for Loop 
https://wg21.link/p2718";>P2718R0
-   https://gcc.gnu.org/PR107637";>No
+   15
__cpp_range_based_for >= 202211L
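
For readers who want to see what the feature changes at the source level, here is a minimal sketch. It assumes nothing about GCC internals: `Holder`, `run_loop` and `macro_reports_extension` are hypothetical names invented for this illustration. The temporary `Holder` only records when it is destroyed, and `items()` deliberately returns a reference to storage that outlives it, so the loop is well defined under both the old and the P2718R0 rules.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper state for illustration only.
static bool g_holder_destroyed = false;
static const std::vector<int> g_items{1, 2, 3};

struct Holder {
  ~Holder() { g_holder_destroyed = true; }
  // Returns a reference to storage that outlives the Holder, so the loop
  // below never touches a dead object even under the pre-P2718R0 rules.
  const std::vector<int>& items() const { return g_items; }
};

struct LoopReport {
  std::size_t iterations;
  bool holder_alive_in_body;
};

// Iterates over Holder().items() and reports whether the temporary Holder
// was still alive while the loop body ran.
inline LoopReport run_loop() {
  g_holder_destroyed = false;
  LoopReport report{0, true};
  for (int value : Holder().items()) {
    (void)value;
    ++report.iterations;
    if (g_holder_destroyed)
      report.holder_alive_in_body = false;
  }
  return report;
}

// True when the compiler advertises the P2718R0 behaviour via the macro.
inline bool macro_reports_extension() {
#if defined(__cpp_range_based_for) && __cpp_range_based_for >= 202211L
  return true;
#else
  return false;
#endif
}
```

When `macro_reports_extension()` is true, `run_loop().holder_alive_in_body` should be true as well, because the temporary then lives until the end of the loop. The reverse does not have to hold: a compiler may extend the lifetime without advertising the new macro value.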

[PATCH] [x86] Define VECTOR_STORE_FLAG_VALUE

2024-09-24 Thread liuhongt

Return constm1_rtx when GET_MODE_CLASS (MODE) == MODE_VECTOR_INT.
Otherwise NULL_RTX.
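
As a source-level analogue (a sketch only, not part of the patch), the GNU vector extension shows the same convention that the macro describes at the RTL level: a true lane of an integer vector comparison is all ones, i.e. -1. The helper names `lanes_equal` and `lane` are made up for this example, and the code relies on the GCC/Clang `vector_size` extension rather than standard C++.

```cpp
// GCC/Clang vector extension; not standard C++.
typedef int v4si __attribute__((vector_size(16)));

// Lane-wise equality: each result lane is -1 (all bits set) where the
// inputs match and 0 where they differ.
inline v4si lanes_equal(v4si a, v4si b) { return a == b; }

// Reads one lane of a vector (hypothetical convenience helper).
inline int lane(v4si v, int i) { return v[i]; }
```

Comparing `{1, 2, 3, 4}` with `{1, 0, 3, 0}` gives -1 in lanes 0 and 2 and 0 in lanes 1 and 3.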

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.

gcc/ChangeLog:

* config/i386/i386.h (VECTOR_STORE_FLAG_VALUE): New macro.

gcc/testsuite/ChangeLog:
* gcc.dg/rtl/x86_64/vector_eq.c: New test.
---
 gcc/config/i386/i386.h  |  5 +++-
 gcc/testsuite/gcc.dg/rtl/x86_64/vector_eq.c | 26 +
 2 files changed, 30 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/rtl/x86_64/vector_eq.c

diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index c1ec92ffb15..b12be41424f 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -899,7 +899,10 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
and give entire struct the alignment of an int.  */
 /* Required on the 386 since it doesn't have bit-field insns.  */
 #define PCC_BITFIELD_TYPE_MATTERS 1
-
+
+#define VECTOR_STORE_FLAG_VALUE(MODE) \
+  (GET_MODE_CLASS (MODE) == MODE_VECTOR_INT ? constm1_rtx : NULL_RTX)
+
 /* Standard register usage.  */
 
 /* This processor has special stack-like registers.  See reg-stack.cc
diff --git a/gcc/testsuite/gcc.dg/rtl/x86_64/vector_eq.c b/gcc/testsuite/gcc.dg/rtl/x86_64/vector_eq.c
new file mode 100644
index 000..b82603d0b64
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/rtl/x86_64/vector_eq.c
@@ -0,0 +1,26 @@
+/* { dg-do compile { target x86_64-*-* } } */
+/* { dg-additional-options "-O2 -march=x86-64-v3" } */
+
+typedef int v4si __attribute__((vector_size(16)));
+
+v4si __RTL (startwith ("vregs")) foo (void)
+{
+(function "foo"
+  (insn-chain
+(block 2
+  (edge-from entry (flags "FALLTHRU"))
+  (cnote 1 [bb 2] NOTE_INSN_BASIC_BLOCK)
+  (cnote 2 NOTE_INSN_FUNCTION_BEG)
+  (cinsn 3 (set (reg:V4SI <0>) (const_vector:V4SI [(const_int 0) (const_int 0) (const_int 0) (const_int 0)])))
+  (cinsn 5 (set (reg:V4SI <2>)
+   (eq:V4SI (reg:V4SI <0>) (reg:V4SI <1>))))
+  (cinsn 6 (set (reg:V4SI <3>) (reg:V4SI <2>)))
+  (cinsn 7 (set (reg:V4SI xmm0) (reg:V4SI <3>)))
+  (edge-to exit (flags "FALLTHRU"))
+)
+  )
+ (crtl (return_rtx (reg/i:V4SI xmm0)))
+)
+}
+
+/* { dg-final { scan-assembler-not "vpxor" } } */
-- 
2.31.1

[PATCH] c++, v2: Implement C++23 P2718R0 - Wording for P2644R1 Fix for Range-based for Loop [PR107637]

2024-09-24 Thread Jakub Jelinek

On Mon, Sep 23, 2024 at 03:46:36PM -0400, Jason Merrill wrote:
> > -frange-based-for-ext-temps
> > or do you have better suggestion?
> 
> I'd probably drop "based", "range-for" seems enough.
> 
> > Shall we allow also disabling it in C++23 or later modes, or override
> > user choice unconditionally for C++23+ and only allow users to
> > enable/disable it in C++11-C++20?
> 
> Hmm, I think the latter.
> 
> > What about the __cpp_range_based_for predefined macro?
> > Shall it be defined to the C++23 202211L value if the switch is on?
> > While that could be done in theory for C++17 and later code, for C++11/14
> > __cpp_range_based_for is 200907L and doesn't include the C++17
> > 201603L step.  Or keep the macro only for C++23 and later?
> 
> I think update the macro for 17 and later.

Ok.

Here is a new patch.

> > > > @@ -44600,11 +44609,14 @@ cp_convert_omp_range_for (tree &this_pre
> > > >  else
> > > > {
> > > >   range_temp = build_range_temp (init);
> > > > + tree name = DECL_NAME (range_temp);
> > > >   DECL_NAME (range_temp) = NULL_TREE;
> > > >   pushdecl (range_temp);
> > > > + DECL_NAME (range_temp) = name;
> > > >   cp_finish_decl (range_temp, init,
> > > >   /*is_constant_init*/false, NULL_TREE,
> > > >   LOOKUP_ONLYCONVERTING);
> > > > + DECL_NAME (range_temp) = NULL_TREE;
> > > 
> > > This messing with the name needs a rationale.  What wants it to be null?
> > 
> > I'll add comments.  The first = NULL_TREE; is needed so that pushdecl
> > doesn't register the temporary for name lookup, the = name now is so that
> > cp_finish_decl recognizes the temporary as range based for temporary
> > for the lifetime extension, and the last one is just to preserve previous
> > behavior, not have it visible in debug info etc.
> 
> But cp_convert_range_for doesn't ever set the name to NULL_TREE, why should
> the OMP variant be different?
> 
> Having it visible to name lookup in the debugger seems beneficial. Having it
> visible to the code seems less useful, but not important to prevent.

So, in the end it works fine even for the OpenMP case when not inside of a
template, all I had to add is the renaming of the symbol at the end after
pop_scope from "__for_range " to "__for_range" etc.
It doesn't work unfortunately during instantiation, we only create a single
scope in that case for the whole loop nest rather than one for each loop in
it and changing that isn't easy.  With the "__for_range " name in, if there
are 2+ range based for loops in the OpenMP loop nest (collapsed or ordered),
one gets then errors about defining it multiple times.
I'll try to fix that up incrementally later, for now I just went with
a new flag to the function, so that it does the DECL_NAME dances only when
called from the instantiation (and confirmed actually all 3 spots are
needed, clearing before pushdecl, resetting back before cp_finish_decl and
clearing after cp_finish_decl, the last one so that pop_scope doesn't ICE
on seeing the name change).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-09-24  Jakub Jelinek  

PR c++/107637
gcc/
* omp-general.cc (find_combined_omp_for, find_nested_loop_xform):
Handle CLEANUP_POINT_EXPR like TRY_FINALLY_EXPR.
* doc/invoke.texi (frange-for-ext-temps): Document.  Add
-fconcepts to the C++ option list.
gcc/c-family/
* c.opt (frange-for-ext-temps): New option.
* c-opts.cc (c_common_post_options): Set flag_range_for_ext_temps
for C++23 or later or for C++11 or later in !flag_iso mode if
the option wasn't set by user.
* c-cppbuiltin.cc (c_cpp_builtins): Change __cpp_range_based_for
value for flag_range_for_ext_temps from 201603L to 202211L in C++17
or later.
* c-omp.cc (c_find_nested_loop_xform_r): Handle CLEANUP_POINT_EXPR
like TRY_FINALLY_EXPR.
gcc/cp/
* cp-tree.h: Implement C++23 P2718R0 - Wording for P2644R1 Fix for
Range-based for Loop.
(cp_convert_omp_range_for): Add bool tmpl_p argument.
(find_range_for_decls): Declare.
* parser.cc (cp_convert_range_for): For flag_range_for_ext_temps call
push_stmt_list () before cp_finish_decl for range_temp and save it
temporarily to FOR_INIT_STMT.
(cp_convert_omp_range_for): Add tmpl_p argument.  If set, remember
DECL_NAME of range_temp and for cp_finish_decl call restore it before
clearing it again, if unset, don't adjust DECL_NAME of range_temp at
all.
(cp_parser_omp_loop_nest): For flag_range_for_ext_temps range for add
CLEANUP_POINT_EXPR around sl.  Call find_range_for_decls and adjust
DECL_NAMEs for range fors if not processing_template_decl.  Adjust
cp_convert_omp_range_for caller.  Remove superfluous backslash at the
end of line.
* decl.cc (initialize_local_v

[committed] i386: Fix comment typo

2024-09-24 Thread Jakub Jelinek

Hi!

Found a comment typo, fixed as obvious.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2024-09-24  Jakub Jelinek  

* config/i386/i386-expand.cc (ix86_expand_round_builtin): Fix comment
typo, insead -> instead.

--- gcc/config/i386/i386-expand.cc.jj   2024-09-20 08:57:02.496083163 +0200
+++ gcc/config/i386/i386-expand.cc  2024-09-23 11:01:14.128079764 +0200
@@ -12748,7 +12748,7 @@ ix86_expand_round_builtin (const struct
  /* Skip erasing embedded rounding for below expanders who
 generates multiple insns.  In ix86_erase_embedded_rounding
 the pattern will be transformed to a single set, and emit_insn
-appends the set insead of insert it to chain.  So the insns
+appends the set instead of insert it to chain.  So the insns
 emitted inside define_expander would be ignored.  */
  switch (icode)
{

Jakub

[PATCH] i386: Add GENERIC and GIMPLE folders of __builtin_ia32_{min,max}* [PR116738]

2024-09-24 Thread Jakub Jelinek

Hi!

The following patch adds GENERIC and GIMPLE folders for various
x86 min/max builtins.
As discussed, these builtins have effectively x < y ? x : y
(or x > y ? x : y) behavior.
The GENERIC folding is done if all the (relevant) arguments are
constants (such as VECTOR_CST for vectors) and is done because
the GIMPLE folding can't easily handle masking, rounding and the
ss/sd cases (in a way that it would be pattern recognized back to the
corresponding instructions).  The GIMPLE folding is also done just
for TARGET_SSE4 or later when optimizing, otherwise it is apparently
not matched back.
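
To make the "x < y ? x : y" wording concrete, here is a small scalar sketch. It is not the folder from the patch; `x86_style_min` and `x86_style_max` are hypothetical names. The point it illustrates is that this rule is not symmetric for NaN operands: whenever the comparison is false, the second operand is returned, which differs from `std::fmin`/`std::fmax`.

```cpp
#include <cmath>
#include <limits>

// Sketch of the per-element rule quoted above; not the GCC folder itself.
// If the comparison is false (including when either operand is a NaN),
// the second operand is returned.
inline float x86_style_min(float x, float y) { return x < y ? x : y; }
inline float x86_style_max(float x, float y) { return x > y ? x : y; }
```

With a quiet NaN as the first operand the result is the second operand, while with a NaN as the second operand the result is that NaN. `std::fmin(1.0f, NaN)` in contrast returns 1.0f, which is one reason the builtins cannot simply be treated as a standard minimum.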

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-09-24  Jakub Jelinek  

PR target/116738
* config/i386/i386.cc (ix86_fold_builtin): Handle
IX86_BUILTIN_M{IN,AX}{S,P}{S,H,D}*.
(ix86_gimple_fold_builtin): Handle IX86_BUILTIN_M{IN,AX}P{S,H,D}*.

* gcc.target/i386/avx512f-pr116738-1.c: New test.
* gcc.target/i386/avx512f-pr116738-2.c: New test.

--- gcc/config/i386/i386.cc.jj  2024-09-12 10:56:57.344683959 +0200
+++ gcc/config/i386/i386.cc 2024-09-23 15:15:40.154783766 +0200
@@ -18507,6 +18507,8 @@ ix86_fold_builtin (tree fndecl, int n_ar
= (enum ix86_builtins) DECL_MD_FUNCTION_CODE (fndecl);
   enum rtx_code rcode;
   bool is_vshift;
+  enum tree_code tcode;
+  bool is_scalar;
   unsigned HOST_WIDE_INT mask;
 
   switch (fn_code)
@@ -18956,6 +18958,133 @@ ix86_fold_builtin (tree fndecl, int n_ar
}
  break;
 
+   case IX86_BUILTIN_MINSS:
+   case IX86_BUILTIN_MINSH_MASK:
+ tcode = LT_EXPR;
+ is_scalar = true;
+ goto do_minmax;
+
+   case IX86_BUILTIN_MAXSS:
+   case IX86_BUILTIN_MAXSH_MASK:
+ tcode = GT_EXPR;
+ is_scalar = true;
+ goto do_minmax;
+
+   case IX86_BUILTIN_MINPS:
+   case IX86_BUILTIN_MINPD:
+   case IX86_BUILTIN_MINPS256:
+   case IX86_BUILTIN_MINPD256:
+   case IX86_BUILTIN_MINPS512:
+   case IX86_BUILTIN_MINPD512:
+   case IX86_BUILTIN_MINPS128_MASK:
+   case IX86_BUILTIN_MINPD128_MASK:
+   case IX86_BUILTIN_MINPS256_MASK:
+   case IX86_BUILTIN_MINPD256_MASK:
+   case IX86_BUILTIN_MINPH128_MASK:
+   case IX86_BUILTIN_MINPH256_MASK:
+   case IX86_BUILTIN_MINPH512_MASK:
+ tcode = LT_EXPR;
+ is_scalar = false;
+ goto do_minmax;
+
+   case IX86_BUILTIN_MAXPS:
+   case IX86_BUILTIN_MAXPD:
+   case IX86_BUILTIN_MAXPS256:
+   case IX86_BUILTIN_MAXPD256:
+   case IX86_BUILTIN_MAXPS512:
+   case IX86_BUILTIN_MAXPD512:
+   case IX86_BUILTIN_MAXPS128_MASK:
+   case IX86_BUILTIN_MAXPD128_MASK:
+   case IX86_BUILTIN_MAXPS256_MASK:
+   case IX86_BUILTIN_MAXPD256_MASK:
+   case IX86_BUILTIN_MAXPH128_MASK:
+   case IX86_BUILTIN_MAXPH256_MASK:
+   case IX86_BUILTIN_MAXPH512_MASK:
+ tcode = GT_EXPR;
+ is_scalar = false;
+   do_minmax:
+ gcc_assert (n_args >= 2);
+ if (TREE_CODE (args[0]) != VECTOR_CST
+ || TREE_CODE (args[1]) != VECTOR_CST)
+   break;
+ mask = HOST_WIDE_INT_M1U;
+ if (n_args > 2)
+   {
+ gcc_assert (n_args >= 4);
+ /* This is masked minmax.  */
+ if (TREE_CODE (args[3]) != INTEGER_CST
+ || TREE_SIDE_EFFECTS (args[2]))
+   break;
+ mask = TREE_INT_CST_LOW (args[3]);
+ unsigned elems = TYPE_VECTOR_SUBPARTS (TREE_TYPE (args[0]));
+ mask |= HOST_WIDE_INT_M1U << elems;
+ if (mask != HOST_WIDE_INT_M1U
+ && TREE_CODE (args[2]) != VECTOR_CST)
+   break;
+ if (n_args >= 5)
+   {
+ if (!tree_fits_uhwi_p (args[4]))
+   break;
+ if (tree_to_uhwi (args[4]) != 4
+ && tree_to_uhwi (args[4]) != 8)
+   break;
+   }
+ if (mask == (HOST_WIDE_INT_M1U << elems))
+   return args[2];
+   }
+ /* Punt on NaNs, unless exceptions are disabled.  */
+ if (HONOR_NANS (args[0])
+ && (n_args < 5 || tree_to_uhwi (args[4]) != 8))
+   for (int i = 0; i < 2; ++i)
+ {
+   unsigned count = vector_cst_encoded_nelts (args[i]), j;
+   for (j = 0; j < count; ++j)
+ if (!tree_expr_nan_p (VECTOR_CST_ENCODED_ELT (args[i], j)))
+   break;
+   if (j < count)
+ break;
+ }
+ {
+   tree res = const_binop (tcode,
+   truth_type_for (TREE_TYPE (args[0])),
+   args[0], args[1]);
+   if (res == NULL_TREE || TREE_CODE (res) != VECTOR_CST)
+ break;
+   res = fold_ternary (VEC_COND_EXPR, TREE_TYPE (args[0]), res,
+

[PATCH] libcpp: Add -Wleading-whitespace= warning

2024-09-24 Thread Jakub Jelinek

Hi!

The following patch on top of the
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663388.html
patch adds -Wleading-whitespace= warning option.
This warning doesn't care how much each line in the source is actually
indented (that is something that can't be easily checked in the
preprocessor without doing syntactic analysis); it just performs simple
checks on what kind of whitespace is used in the indentation.
I think it is still useful to get warnings about such issues early.
While git diagnoses some of it in patches (e.g. the tab after space
case), getting the warnings earlier might help catch such issues
sooner.

There are projects which ban use of tabs and require just spaces,
others which require indentation just with horizontal tabs, and finally
projects which want indentation with tabs for multiples of tabstop size
followed by spaces (fewer than tabstop size), like GCC.
For all 3 kinds the warning diagnoses indentation with '\v' or '\f'
characters (unless line contains just whitespace), and for the last one
also cases where a space in the indentation is followed by horizontal
tab or where there are N or more consecutive spaces in the indentation
(for -ftabstop=N).
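
The rules described above can be sketched roughly as follows. This is an illustration only, not the libcpp implementation: `IndentStyle`, `IndentIssue` and `check_indent` are hypothetical names, and the behaviour of the spaces-only and tabs-only styles (rejecting any tab, respectively any space, in the indentation) is my reading of the description rather than something taken from the patch.

```cpp
#include <cstddef>
#include <string>

// Hypothetical names for illustration; they are not libcpp identifiers.
enum class IndentStyle { Spaces, Tabs, TabsThenSpaces };
enum class IndentIssue {
  None,
  VerticalTabOrFormFeed,  // any style
  TabInIndent,            // Spaces style only (assumed rule)
  SpaceInIndent,          // Tabs style only (assumed rule)
  TabAfterSpace,          // TabsThenSpaces style only
  TooManySpaces           // TabsThenSpaces style only
};

inline bool is_indent_char(char c) {
  return c == ' ' || c == '\t' || c == '\v' || c == '\f';
}

// Classifies the first problem found in the leading whitespace of `line`
// (assumed to carry no trailing newline).  `tabstop` plays the role of
// -ftabstop=N.
inline IndentIssue check_indent(const std::string& line, IndentStyle style,
                                unsigned tabstop = 8) {
  std::size_t end = 0;
  while (end < line.size() && is_indent_char(line[end]))
    ++end;
  if (end == line.size())
    return IndentIssue::None;  // whitespace-only (or empty) lines are exempt

  unsigned run_of_spaces = 0;
  for (std::size_t i = 0; i < end; ++i) {
    char c = line[i];
    if (c == '\v' || c == '\f')
      return IndentIssue::VerticalTabOrFormFeed;
    if (c == '\t') {
      if (style == IndentStyle::Spaces)
        return IndentIssue::TabInIndent;
      if (style == IndentStyle::TabsThenSpaces && run_of_spaces > 0)
        return IndentIssue::TabAfterSpace;
      run_of_spaces = 0;
    } else {  // c == ' '
      if (style == IndentStyle::Tabs)
        return IndentIssue::SpaceInIndent;
      ++run_of_spaces;
      if (style == IndentStyle::TabsThenSpaces && run_of_spaces >= tabstop)
        return IndentIssue::TooManySpaces;
    }
  }
  return IndentIssue::None;
}
```

For example, a tab followed by two spaces is accepted in the GCC-like style, while two spaces followed by a tab, or eight consecutive spaces with the default tabstop, are reported.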

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

BTW, for additional testing I've enabled the warnings (without -Werror
for them) in stage3.  There are many warnings (both trailing and leading
whitespace), some of them something that can be easily fixed in the headers
or source files, but others with whitespace issues in generated sources,
so if we enable the warnings, either we'd need to adjust the generators
or disable the warnings in (some of the) generated files.

2024-09-24  Jakub Jelinek  

libcpp/
* include/cpplib.h (struct cpp_options): Add
cpp_warn_leading_whitespace and cpp_tabstop members.
(enum cpp_warning_reason): Add CPP_W_LEADING_WHITESPACE.
* internal.h (struct _cpp_line_note): Document new
line note kinds.
* init.cc (cpp_create_reader): Set cpp_tabstop to 8.
* lex.cc (find_leading_whitespace_issues): New function.
(_cpp_clean_line): Use it.
(_cpp_process_line_notes): Handle 'L', 'S' and 'T' line notes.
(lex_raw_string): Clear type on 'L', 'S' and 'T' line notes
inside of raw string literals.
gcc/
* doc/invoke.texi (Wleading-whitespace=): Document.
gcc/c-family/
* c.opt (Wleading-whitespace=): New option.
* c-opts.cc (c_common_post_options): Set cpp_opts->cpp_tabstop
to global_dc->m_tabstop.
gcc/testsuite/
* c-c++-common/cpp/Wleading-whitespace-1.c: New test.
* c-c++-common/cpp/Wleading-whitespace-2.c: New test.
* c-c++-common/cpp/Wleading-whitespace-3.c: New test.
* c-c++-common/cpp/Wleading-whitespace-4.c: New test.

--- libcpp/include/cpplib.h.jj  2024-09-23 16:08:40.846050280 +0200
+++ libcpp/include/cpplib.h 2024-09-23 17:09:32.250056701 +0200
@@ -594,9 +594,15 @@ struct cpp_options
   /* True if -finput-charset= option has been used explicitly.  */
   bool cpp_input_charset_explicit;
 
+  /* -Wleading-whitespace= value.  */
+  unsigned char cpp_warn_leading_whitespace;
+
   /* -Wtrailing-whitespace= value.  */
   unsigned char cpp_warn_trailing_whitespace;
 
+  /* -ftabstop= value.  */
+  unsigned int cpp_tabstop;
+
   /* Dependency generation.  */
   struct
   {
@@ -713,6 +719,7 @@ enum cpp_warning_reason {
   CPP_W_BIDIRECTIONAL,
   CPP_W_INVALID_UTF8,
   CPP_W_UNICODE,
+  CPP_W_LEADING_WHITESPACE,
   CPP_W_TRAILING_WHITESPACE
 };
 
--- libcpp/internal.h.jj2024-09-23 16:08:40.846050280 +0200
+++ libcpp/internal.h   2024-09-23 18:19:46.642467051 +0200
@@ -318,7 +318,8 @@ struct _cpp_line_note
 
   /* Type of note.  The 9 'from' trigraph characters represent those
  trigraphs, '\\' an escaped newline, ' ' an escaped newline with
- intervening space, 'W' trailing whitespace, 0 represents a note that
+ intervening space, 'W' trailing whitespace, 'L', 'S' and 'T' for
+ leading whitespace issues, 0 represents a note that
  has already been handled, and anything else is invalid.  */
   unsigned int type;
 };
--- libcpp/init.cc.jj   2024-09-20 08:57:03.041075703 +0200
+++ libcpp/init.cc  2024-09-23 17:24:53.564421636 +0200
@@ -246,6 +246,7 @@ cpp_create_reader (enum c_lang lang, cpp
   CPP_OPTION (pfile, cpp_warn_invalid_utf8) = 0;
   CPP_OPTION (pfile, cpp_warn_unicode) = 1;
   CPP_OPTION (pfile, cpp_input_charset_explicit) = 0;
+  CPP_OPTION (pfile, cpp_tabstop) = 8;
 
   /* Default CPP arithmetic to something sensible for the host for the
  benefit of dumb users like fix-header.  */
--- libcpp/lex.cc.jj2024-09-23 16:08:40.847050267 +0200
+++ libcpp/lex.cc   2024-09-24 09:32:57.293210930 +0200
@@ -818,6 +818,59 @@ _cpp_init_lexer (void)
 #endif
 }
 
+/* Look for leading whitespace style issues on lines which don't contain
+   just whitespace.
+   For -Wleading-whitespace=spaces report if such lines

Re: [PATCH] c++, v2: Implement C++23 P2718R0 - Wording for P2644R1 Fix for Range-based for Loop [PR107637]

2024-09-24 Thread Jason Merrill


On 9/24/24 12:53 PM, Jakub Jelinek wrote:

On Mon, Sep 23, 2024 at 03:46:36PM -0400, Jason Merrill wrote:

-frange-based-for-ext-temps
or do you have better suggestion?


I'd probably drop "based", "range-for" seems enough.


Shall we allow also disabling it in C++23 or later modes, or override
user choice unconditionally for C++23+ and only allow users to
enable/disable it in C++11-C++20?


Hmm, I think the latter.


What about the __cpp_range_based_for predefined macro?
Shall it be defined to the C++23 202211L value if the switch is on?
While that could be done in theory for C++17 and later code, for C++11/14
__cpp_range_based_for is 200907L and doesn't include the C++17
201603L step.  Or keep the macro only for C++23 and later?


I think update the macro for 17 and later.


Ok.

Here is a new patch.


@@ -44600,11 +44609,14 @@ cp_convert_omp_range_for (tree &this_pre
  else
{
  range_temp = build_range_temp (init);
+ tree name = DECL_NAME (range_temp);
  DECL_NAME (range_temp) = NULL_TREE;
  pushdecl (range_temp);
+ DECL_NAME (range_temp) = name;
  cp_finish_decl (range_temp, init,
  /*is_constant_init*/false, NULL_TREE,
  LOOKUP_ONLYCONVERTING);
+ DECL_NAME (range_temp) = NULL_TREE;


This messing with the name needs a rationale.  What wants it to be null?


I'll add comments.  The first = NULL_TREE; is needed so that pushdecl
doesn't register the temporary for name lookup, the = name now is so that
cp_finish_decl recognizes the temporary as range based for temporary
for the lifetime extension, and the last one is just to preserve previous
behavior, not have it visible in debug info etc.


But cp_convert_range_for doesn't ever set the name to NULL_TREE, why should
the OMP variant be different?

Having it visible to name lookup in the debugger seems beneficial. Having it
visible to the code seems less useful, but not important to prevent.


So, in the end it works fine even for the OpenMP case when not inside of a
template, all I had to add is the renaming of the symbol at the end after
pop_scope from "__for_range " to "__for_range" etc.
It doesn't work unfortunately during instantiation, we only create a single
scope in that case for the whole loop nest rather than one for each loop in
it and changing that isn't easy.  With the "__for_range " name in, if there
are 2+ range based for loops in the OpenMP loop nest (collapsed or ordered),
one gets then errors about defining it multiple times.
I'll try to fix that up incrementally later, for now I just went with
a new flag to the function, so that it does the DECL_NAME dances only when
called from the instantiation (and confirmed actually all 3 spots are
needed, clearing before pushdecl, resetting back before cp_finish_decl and
clearing after cp_finish_decl, the last one so that pop_scope doesn't ICE
on seeing the name change).


Don't worry too much about fixing it up if it's complicated.


Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-09-24  Jakub Jelinek  

PR c++/107637
gcc/
* omp-general.cc (find_combined_omp_for, find_nested_loop_xform):
Handle CLEANUP_POINT_EXPR like TRY_FINALLY_EXPR.
* doc/invoke.texi (frange-for-ext-temps): Document.  Add
-fconcepts to the C++ option list.
gcc/c-family/
* c.opt (frange-for-ext-temps): New option.
* c-opts.cc (c_common_post_options): Set flag_range_for_ext_temps
for C++23 or later or for C++11 or later in !flag_iso mode if
the option wasn't set by user.
* c-cppbuiltin.cc (c_cpp_builtins): Change __cpp_range_based_for
value for flag_range_for_ext_temps from 201603L to 202211L in C++17
or later.
* c-omp.cc (c_find_nested_loop_xform_r): Handle CLEANUP_POINT_EXPR
like TRY_FINALLY_EXPR.
gcc/cp/
* cp-tree.h: Implement C++23 P2718R0 - Wording for P2644R1 Fix for
Range-based for Loop.
(cp_convert_omp_range_for): Add bool tmpl_p argument.
(find_range_for_decls): Declare.
* parser.cc (cp_convert_range_for): For flag_range_for_ext_temps call
push_stmt_list () before cp_finish_decl for range_temp and save it
temporarily to FOR_INIT_STMT.
(cp_convert_omp_range_for): Add tmpl_p argument.  If set, remember
DECL_NAME of range_temp and for cp_finish_decl call restore it before
clearing it again, if unset, don't adjust DECL_NAME of range_temp at
all.
(cp_parser_omp_loop_nest): For flag_range_for_ext_temps range for add
CLEANUP_POINT_EXPR around sl.  Call find_range_for_decls and adjust
DECL_NAMEs for range fors if not processing_template_decl.  Adjust
cp_convert_omp_range_for caller.  Remove superfluous backslash at the
end of line.
* decl.cc (initialize_local_var): For flag_range_for_ext_temps
temporarily clear stmts_are_full_expr

Re: [PATCH 02/10] c++: Update decl_linkage for C++11

2024-09-24 Thread Jason Merrill


On 9/23/24 7:43 PM, Nathaniel Shead wrote:

This patch intends no change in functionality apart from the mangling
difference noted; more tests are in patch 4 of this series, which adds a
way to actually check what the linkage of decl_linkage provides more
directly.

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

Currently modules code uses a variety of ad-hoc methods to attempt to
determine whether an entity has internal linkage, which leads to
inconsistencies and some correctness issues as different edge cases are
neglected.  While investigating this I discovered 'decl_linkage', but it
doesn't seem to have been updated to account for the C++11 clarification
that all entities declared in an anonymous namespace are internal.

I'm not convinced that even in C++98 it was intended that e.g. types in
anonymous namespaces should be external, but some tests in the testsuite
rely on this, so for compatibility I restricted those modifications to
C++11 and later.

This should have relatively minimal impact as not much seems to actually
rely on decl_linkage, but does change the mangling of symbols in
anonymous namespaces slightly.  Previously, we had

   namespace {
 int x;  // mangled as '_ZN12_GLOBAL__N_11xE'
 static int y;  // mangled as '_ZN12_GLOBAL__N_1L1yE'
   }

but with this patch the x is now mangled like y (with the extra 'L').
For contrast, Clang currently mangles neither x nor y with the 'L'.
Since this only affects internal-linkage entities I don't believe this
should break ABI in any observable fashion.

gcc/cp/ChangeLog:

* name-lookup.cc (do_namespace_alias): Propagate TREE_PUBLIC for
namespace aliases.
* tree.cc (decl_linkage): Update rules for C++11.

gcc/testsuite/ChangeLog:

* g++.dg/modules/mod-sym-4.C: Update test to account for
non-static internal-linkage variables new mangling.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/name-lookup.cc|  1 +
  gcc/cp/tree.cc   | 92 +++-
  gcc/testsuite/g++.dg/modules/mod-sym-4.C |  4 +-
  3 files changed, 60 insertions(+), 37 deletions(-)

diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index c7a693e02d5..50e169eca43 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -6610,6 +6610,7 @@ do_namespace_alias (tree alias, tree name_space)
DECL_NAMESPACE_ALIAS (alias) = name_space;
DECL_EXTERNAL (alias) = 1;
DECL_CONTEXT (alias) = FROB_CONTEXT (current_scope ());
+  TREE_PUBLIC (alias) = TREE_PUBLIC (DECL_CONTEXT (alias));
set_originating_module (alias);
  
pushdecl (alias);

diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
index f43febed124..28e14295de4 100644
--- a/gcc/cp/tree.cc
+++ b/gcc/cp/tree.cc
@@ -5840,7 +5840,7 @@ char_type_p (tree type)
  || same_type_p (type, wchar_type_node));
  }
  
-/* Returns the kind of linkage associated with the indicated DECL.  Th

+/* Returns the kind of linkage associated with the indicated DECL.  The
 value returned is as specified by the language standard; it is
 independent of implementation details regarding template
 instantiation, etc.  For example, it is possible that a declaration
@@ -5857,53 +5857,75 @@ decl_linkage (tree decl)
   linkage first, and then transform that into a concrete
   implementation.  */
  
-  /* Things that don't have names have no linkage.  */

-  if (!DECL_NAME (decl))
-return lk_none;
+  /* An explicit type alias has no linkage.  */
+  if (TREE_CODE (decl) == TYPE_DECL
+  && !DECL_IMPLICIT_TYPEDEF_P (decl)
+  && !DECL_SELF_REFERENCE_P (decl))
+{
+  /* But this could be a typedef name for linkage purposes, in which
+case we're interested in the linkage of the main decl.  */


Perhaps we should move is_naming_typedef_decl out of dwarf2out.cc...

Anyway, the patch is OK.

Jason

Re: [PATCH] libstdc++: more #pragma diagnostic

2024-09-24 Thread Jason Merrill


On 9/24/24 7:51 AM, Jason Merrill wrote:

Tested x86_64-pc-linux-gnu.

Is this the right fix, or do we want to stop using these deprecated classes,
here and in stl_function.h?


Oops, adding libstdc++ CC.


-- 8< --

The CI saw failures on 17_intro/headers/c++2011/parallel_mode.cc due to
-Wdeprecated-declarations warnings in some parallel/ headers.

libstdc++-v3/ChangeLog:

* include/parallel/base.h: Suppress -Wdeprecated-declarations.
* include/parallel/multiseq_selection.h: Likewise.
---
  libstdc++-v3/include/parallel/base.h   | 4 
  libstdc++-v3/include/parallel/multiseq_selection.h | 6 ++
  2 files changed, 10 insertions(+)

diff --git a/libstdc++-v3/include/parallel/base.h b/libstdc++-v3/include/parallel/base.h
index 5bc5350e723..fcbcc1e0b99 100644
--- a/libstdc++-v3/include/parallel/base.h
+++ b/libstdc++-v3/include/parallel/base.h
@@ -166,6 +166,8 @@ namespace __gnu_parallel
{ return !_M_comp(__a, __b) && !_M_comp(__b, __a); }
  };
  
+#pragma GCC diagnostic push

+#pragma GCC diagnostic ignored "-Wdeprecated-declarations" // *nary_function
  
/** @brief Similar to std::unary_negate,

 *  but giving the argument types explicitly. */
@@ -297,6 +299,8 @@ namespace __gnu_parallel
  struct _Multiplies<_Tp, _Tp, _Tp>
  : public std::multiplies<_Tp> { };
  
+#pragma GCC diagnostic pop // -Wdeprecated-declarations

+
/** @brief _Iterator associated with __gnu_parallel::_PseudoSequence.
 *  If features the usual random-access iterator functionality.
 *  @param _Tp Sequence _M_value type.
diff --git a/libstdc++-v3/include/parallel/multiseq_selection.h b/libstdc++-v3/include/parallel/multiseq_selection.h
index f25895adbdd..22bd97e6432 100644
--- a/libstdc++-v3/include/parallel/multiseq_selection.h
+++ b/libstdc++-v3/include/parallel/multiseq_selection.h
@@ -48,6 +48,10 @@
  
  namespace __gnu_parallel

  {
+
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wdeprecated-declarations" // *nary_function
+
/** @brief Compare __a pair of types lexicographically, ascending. */
template
  class _Lexicographic
@@ -100,6 +104,8 @@ namespace __gnu_parallel
}
  };
  
+#pragma GCC diagnostic pop // -Wdeprecated-declarations

+
/**
 *  @brief Splits several sorted sequences at a certain global __rank,
 *  resulting in a splitting point for each sequence.

base-commit: b752eed3e3f2f27570ea89b7c2339468698472a8
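
The push/ignored/pop pattern used in the patch can be tried in isolation with a sketch like the one below. `old_answer` and `wrapped_answer` are hypothetical stand-ins for the deprecated `std::unary_function`-style bases and the parallel-mode code that uses them; nothing here is taken from the libstdc++ headers.

```cpp
// Hypothetical deprecated helper, illustrative only.
[[deprecated("use a newer interface instead")]]
inline int old_answer() { return 42; }

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"

// Inside the push/pop region this call does not trigger
// -Wdeprecated-declarations.
inline int wrapped_answer() { return old_answer(); }

#pragma GCC diagnostic pop  // restores the previous diagnostic state
```

Because the suppression is scoped by push and pop, code that appears after the region (including user code that includes the header) still gets the deprecation warning for its own uses.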

[PATCH v1 1/3] Match: Support form 1 for scalar signed integer SAT_SUB

2024-09-24 Thread pan2 . li

From: Pan Li 

This patch adds support for form 1 of the scalar signed integer
SAT_SUB, as in the example below:

Form 1:
  #define DEF_SAT_S_SUB_FMT_1(T, UT, MIN, MAX) \
  T __attribute__((noinline))  \
  sat_s_sub_##T##_fmt_1 (T x, T y) \
  {\
T minus = (UT)x - (UT)y;   \
return (x ^ y) >= 0\
  ? minus  \
  : (minus ^ x) >= 0   \
? minus\
: x < 0 ? MIN : MAX;   \
  }

DEF_SAT_S_SUB_FMT_1(int8_t, uint8_t, INT8_MIN, INT8_MAX)

Before this patch:
  __attribute__((noinline))
  int8_t sat_s_sub_int8_t_fmt_1 (int8_t x, int8_t y)
  {
    int8_t minus;
    unsigned char x.0_1;
    unsigned char y.1_2;
    unsigned char _3;
    signed char _4;
    signed char _5;
    int8_t _6;
    _Bool _11;
    signed char _12;
    signed char _13;
    signed char _14;
    signed char _15;

  ;;   basic block 2, loop depth 0
  ;;pred:   ENTRY
    x.0_1 = (unsigned char) x_7(D);
    y.1_2 = (unsigned char) y_8(D);
    _3 = x.0_1 - y.1_2;
    minus_9 = (int8_t) _3;
    _4 = x_7(D) ^ y_8(D);
    _5 = x_7(D) ^ minus_9;
    _15 = _4 & _5;
    if (_15 < 0)
      goto ; [41.00%]
    else
      goto ; [59.00%]
  ;;succ:   3
  ;;4

  ;;   basic block 3, loop depth 0
  ;;pred:   2
    _11 = x_7(D) < 0;
    _12 = (signed char) _11;
    _13 = -_12;
    _14 = _13 ^ 127;
  ;;succ:   4

  ;;   basic block 4, loop depth 0
  ;;pred:   2
  ;;3
    # _6 = PHI 
    return _6;
  ;;succ:   EXIT

  }

After this patch:
  __attribute__((noinline))
  int8_t sat_s_sub_int8_t_fmt_1 (int8_t x, int8_t y)
  {
    int8_t _6;

  ;;   basic block 2, loop depth 0
  ;;pred:   ENTRY
    _6 = .SAT_SUB (x_7(D), y_8(D)); [tail call]
    return _6;
  ;;succ:   EXIT

  }
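
As a sanity check on the semantics, the sketch below writes form 1 out by hand for int8_t, next to the branch-selecting shape described in the match.pd comment, and compares both against a widening reference over all input pairs. The function names are made up for this illustration and are not part of the patch; the narrowing casts rely on two's-complement wrapping, which GCC provides (it is implementation-defined before C++20).

```cpp
#include <cstdint>

// Form 1 from the patch description, instantiated by hand for int8_t.
static int8_t sat_s_sub_form1(int8_t x, int8_t y)
{
  int8_t minus = static_cast<int8_t>(static_cast<uint8_t>(x)
                                     - static_cast<uint8_t>(y));
  return (x ^ y) >= 0 ? minus
       : (minus ^ x) >= 0 ? minus
       : x < 0 ? INT8_MIN : INT8_MAX;
}

// The shape from the match.pd comment:
// (X ^ Y) & (X ^ minus) < 0 ? (-(T)(X < 0) ^ MAX) : minus.
static int8_t sat_s_sub_matched(int8_t x, int8_t y)
{
  int8_t minus = static_cast<int8_t>(static_cast<uint8_t>(x)
                                     - static_cast<uint8_t>(y));
  return ((x ^ y) & (x ^ minus)) < 0
       ? static_cast<int8_t>(-static_cast<int8_t>(x < 0) ^ INT8_MAX)
       : minus;
}

// Reference: subtract in a wider type, then clamp.
static int8_t sat_s_sub_ref(int8_t x, int8_t y)
{
  int wide = static_cast<int>(x) - static_cast<int>(y);
  if (wide > INT8_MAX) return INT8_MAX;
  if (wide < INT8_MIN) return INT8_MIN;
  return static_cast<int8_t>(wide);
}

// Counts inputs on which either form disagrees with the reference.
int count_mismatches()
{
  int bad = 0;
  for (int x = INT8_MIN; x <= INT8_MAX; x++)
    for (int y = INT8_MIN; y <= INT8_MAX; y++)
      {
        int8_t a = static_cast<int8_t>(x), b = static_cast<int8_t>(y);
        if (sat_s_sub_form1(a, b) != sat_s_sub_ref(a, b)
            || sat_s_sub_matched(a, b) != sat_s_sub_ref(a, b))
          bad++;
      }
  return bad;
}
```

Overflow on subtraction is only possible when x and y have different signs and the wrapped result has a different sign from x, which is what both conditions test.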

The following test suites pass with this patch:
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.

gcc/ChangeLog:

* match.pd: Add case 1 matching pattern for signed SAT_SUB.
* tree-ssa-math-opts.cc (gimple_signed_integer_sat_sub): Add new
decl for generated SAT_SUB matching func.
(match_unsigned_saturation_sub): Rename from...
(match_saturation_sub): ...Rename to and add signed SAT_SUB matching.
(math_opts_dom_walker::after_dom_children): Leverage the named
match func for both the unsigned and signed SAT_SUB.

Signed-off-by: Pan Li 
---
 gcc/match.pd  | 14 ++
 gcc/tree-ssa-math-opts.cc |  8 +---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 940292d0d49..63f7f3142c4 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3358,6 +3358,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   }
   (if (wi::eq_p (sum, wi::uhwi (0, precision)))
 
+/* Signed saturation sub, case 1:
+   T minus = (T)((UT)X - (UT)Y);
+   SAT_S_SUB = (X ^ Y) & (X ^ minus) < 0 ? (-(T)(X < 0) ^ MAX) : minus;
+
+   The T and UT are type pair like T=int8_t, UT=uint8_t.  */
+(match (signed_integer_sat_sub @0 @1)
+ (cond^ (lt (bit_and:c (bit_xor:c @0 @1)
+  (bit_xor @0 (nop_convert@2 (minus (nop_convert @0)
+(nop_convert @1)
+   integer_zerop)
+   (bit_xor:c (negate (convert (lt @0 integer_zerop))) max_value)
+   @2)
+ (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type
+
 /* Unsigned saturation truncate, case 1, sizeof (WT) > sizeof (NT).
SAT_U_TRUNC = (NT)x | (NT)(-(X > (WT)(NT)(-1))).  */
 (match (unsigned_integer_sat_trunc @0)
diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
index d61668aacfc..f04b17101db 100644
--- a/gcc/tree-ssa-math-opts.cc
+++ b/gcc/tree-ssa-math-opts.cc
@@ -4024,6 +4024,7 @@ extern bool gimple_unsigned_integer_sat_sub (tree, tree*, tree (*)(tree));
 extern bool gimple_unsigned_integer_sat_trunc (tree, tree*, tree (*)(tree));
 
 extern bool gimple_signed_integer_sat_add (tree, tree*, tree (*)(tree));
+extern bool gimple_signed_integer_sat_sub (tree, tree*, tree (*)(tree));
 
 static void
 build_saturation_binary_arith_call (gimple_stmt_iterator *gsi, internal_fn fn,
@@ -4162,7 +4163,7 @@ match_unsigned_saturation_sub (gimple_stmt_iterator *gsi, gassign *stmt)
  *   [local count: 1073741824]:
  *  _1 = .SAT_SUB (x_2(D), y_3(D));  */
 static void
-match_unsigned_sat

[PATCH v1 2/3] RISC-V: Implement scalar SAT_SUB for signed integer

2024-09-24 Thread pan2 . li

From: Pan Li 

This patch implements form 1 of sssub, namely:

Form 1:
  #define DEF_SAT_S_SUB_FMT_1(T, UT, MIN, MAX) \
  T __attribute__((noinline))  \
  sat_s_sub_##T##_fmt_1 (T x, T y) \
  {\
T minus = (UT)x - (UT)y;   \
return (x ^ y) >= 0\
  ? minus  \
  : (minus ^ x) >= 0   \
? minus\
: x < 0 ? MIN : MAX;   \
  }

DEF_SAT_S_SUB_FMT_1(int8_t, uint8_t, INT8_MIN, INT8_MAX)

Before this patch:
  sat_s_sub_int8_t_fmt_1:
          subw    a5,a0,a1
          slliw   a5,a5,24
          sraiw   a5,a5,24
          xor     a1,a0,a1
          xor     a4,a0,a5
          and     a1,a1,a4
          blt     a1,zero,.L4
          mv      a0,a5
          ret
  .L4:
          srai    a0,a0,63
          xori    a5,a0,127
          mv      a0,a5
          ret

After this patch:
  sat_s_sub_int8_t_fmt_1:
          sub     a4,a0,a1
          xor     a5,a0,a4
          xor     a1,a0,a1
          and     a5,a5,a1
          srli    a5,a5,7
          andi    a5,a5,1
          srai    a0,a0,63
          xori    a3,a0,127
          neg     a0,a5
          addi    a5,a5,-1
          and     a3,a3,a0
          and     a0,a4,a5
          or      a0,a0,a3
          slliw   a0,a0,24
          sraiw   a0,a0,24
          ret
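
To make the branch-free sequence easier to follow, here is a rough model of it for int8_t, using 64-bit integers in place of the XLEN registers. It is a sketch under that assumption only; sat_s_sub_i8_branchless and the local names are invented here, and the comments map each line to the instruction it is meant to mirror.

```cpp
#include <cstdint>

int8_t sat_s_sub_i8_branchless(int8_t x, int8_t y)
{
  const int64_t xr = x;                        // sign-extended a0
  const int64_t yr = y;                        // sign-extended a1
  const int64_t minus = xr - yr;               // sub
  const uint64_t both =
    static_cast<uint64_t>((xr ^ yr) & (xr ^ minus));  // xor, xor, and
  const int64_t ovf =
    static_cast<int64_t>((both >> 7) & 1);     // srli 7, andi 1: 1 on overflow
  const int64_t sign = xr < 0 ? -1 : 0;        // srai 63: all ones if x < 0
  const int64_t limit = sign ^ INT8_MAX;       // xori 127: INT8_MAX or INT8_MIN
  const int64_t take_limit = -ovf;             // neg: all ones on overflow
  const int64_t take_minus = ovf - 1;          // addi -1: all ones otherwise
  // Exactly one of the two masks is all ones, so the OR selects one value.
  return static_cast<int8_t>((minus & take_minus) | (limit & take_limit));
}
```

The two masks are complementary, which is how the sequence selects between the wrapped difference and the saturation limit without a branch.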

The following test suite passes with this patch:
* The rv64gcv full regression test.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_expand_sssub): Add new func
decl for expanding signed SAT_SUB.
* config/riscv/riscv.cc (riscv_expand_sssub): Add new func impl
for expanding signed SAT_SUB.
* config/riscv/riscv.md (sssub3): Add new pattern sssub
for scalar signed integer.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv-protos.h |  1 +
 gcc/config/riscv/riscv.cc   | 69 +
 gcc/config/riscv/riscv.md   | 11 ++
 3 files changed, 81 insertions(+)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 07a4d42e3a5..3d8775e582d 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -136,6 +136,7 @@ extern void riscv_legitimize_poly_move (machine_mode, rtx, rtx, rtx);
 extern void riscv_expand_usadd (rtx, rtx, rtx);
 extern void riscv_expand_ssadd (rtx, rtx, rtx);
 extern void riscv_expand_ussub (rtx, rtx, rtx);
+extern void riscv_expand_sssub (rtx, rtx, rtx);
 extern void riscv_expand_ustrunc (rtx, rtx);
 
 #ifdef RTX_CODE
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 7be3939a7f9..8708a7b42c6 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -12329,6 +12329,75 @@ riscv_expand_ussub (rtx dest, rtx x, rtx y)
   emit_move_insn (dest, gen_lowpart (mode, xmode_dest));
 }
 
+/* Implements the signed saturation sub standard name sssub for int mode.
+
+   z = SAT_SUB(x, y).
+   =>
+   1.  minus = x - y
+   2.  xor_0 = x ^ y
+   3.  xor_1 = x ^ minus
+   4.  lt_0 = xor_1 < 0
+   5.  lt_1 = xor_0 < 0
+   6.  and = lt_0 & lt_1
+   7.  lt = x < 0
+   8.  neg = -lt
+   9.  max = INT_MAX
+   10. max = max ^ neg
+   11. neg = -and
+   12. max = max & neg
+   13. and = and - 1
+   14. z = minus & and
+   15. z = z | max  */
+
+void
+riscv_expand_sssub (rtx dest, rtx x, rtx y)
+{
+  machine_mode mode = GET_MODE (dest);
+  unsigned bitsize = GET_MODE_BITSIZE (mode).to_constant ();
+  rtx shift_bits = GEN_INT (bitsize - 1);
+  rtx xmode_x = gen_lowpart (Xmode, x);
+  rtx xmode_y = gen_lowpart (Xmode, y);
+  rtx xmode_minus = gen_reg_rtx (Xmode);
+  rtx xmode_xor_0 = gen_reg_rtx (Xmode);
+  rtx xmode_xor_1 = gen_reg_rtx (Xmode);
+  rtx xmode_lt_0 = gen_reg_rtx (Xmode);
+  rtx xmode_lt_1 = gen_reg_rtx (Xmode);
+  rtx xmode_and = gen_reg_rtx (Xmode);
+  rtx xmode_lt = gen_reg_rtx (Xmode);
+  rtx xmode_neg = gen_reg_rtx (Xmode);
+  rtx xmode_max = gen_reg_rtx (Xmode);
+  rtx xmode_dest = gen_reg_rtx (Xmode);
+
+  /* Step-1: minus = x - y, xor_0 = x ^ y, xor_1 = x ^ minus.  */
+  riscv_emit_binary (MINUS, xmode_minus, xmode_x, xmode_y);
+  riscv_emit_binary (XOR, xmode_xor_0, xmode_x, xmode_y);
+  riscv_emit_binary (XOR, xmode_xor_1, xmode_x, xmode_minus);
+
+  /* Step-2: and = xor_0 < 0 & xor_1 < 0.  */
+  riscv_emit_binary (LSHIFTRT, xmode_lt_0, xmode_xor_0, shift_bits);
+  riscv_emit_binary (LSHIFTRT, xmode_lt_1, xmode_xor_1, shift_bits);
+  riscv_emit_binary (AND, xmode_and, xmode_lt_0, xmode_lt_1);
+  riscv_emit_binary (AND, xmode_and, xmode_and, CONST1_RTX (Xmode));
+
+  /* Step-3: lt = x < 0, neg = -lt.  */
+  riscv_emit_binary (LT, xmode_lt, xmode_x, CONST0_RTX (Xmode));
+  riscv_emit_unary (NEG, xmode_neg, xmode_lt);
+
+  /* Step-4: max = 0x7f..., max = max ^ neg, neg = -and, max = max & neg.  */
+  riscv_emit_move (xmode_max

[PATCH v1 3/3] RISC-V: Add testcases for form 1 of scalar signed SAT_SUB

2024-09-24 Thread pan2 . li

From: Pan Li 

Form 1:
  #define DEF_SAT_S_SUB_FMT_1(T, UT, MIN, MAX) \
  T __attribute__((noinline))  \
  sat_s_sub_##T##_fmt_1 (T x, T y) \
  {\
T minus = (UT)x - (UT)y;   \
return (x ^ y) >= 0\
  ? minus  \
  : (minus ^ x) >= 0   \
? minus\
: x < 0 ? MIN : MAX;   \
  }

DEF_SAT_S_SUB_FMT_1(int8_t, uint8_t, INT8_MIN, INT8_MAX)

The following test suite passes with this patch:
* The rv64gcv full regression test.

This is a test-only patch and mostly obvious; I will commit it
directly if there are no comments in the next 48 hours.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_arith_data.h: Add test data for SAT_SUB.
* gcc.target/riscv/sat_s_sub-1-i16.c: New test.
* gcc.target/riscv/sat_s_sub-1-i32.c: New test.
* gcc.target/riscv/sat_s_sub-1-i64.c: New test.
* gcc.target/riscv/sat_s_sub-1-i8.c: New test.
* gcc.target/riscv/sat_s_sub-run-1-i16.c: New test.
* gcc.target/riscv/sat_s_sub-run-1-i32.c: New test.
* gcc.target/riscv/sat_s_sub-run-1-i64.c: New test.
* gcc.target/riscv/sat_s_sub-run-1-i8.c: New test.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/sat_arith.h| 17 +
 .../gcc.target/riscv/sat_arith_data.h | 73 +++
 .../gcc.target/riscv/sat_s_sub-1-i16.c| 30 
 .../gcc.target/riscv/sat_s_sub-1-i32.c| 28 +++
 .../gcc.target/riscv/sat_s_sub-1-i64.c| 27 +++
 .../gcc.target/riscv/sat_s_sub-1-i8.c | 28 +++
 .../gcc.target/riscv/sat_s_sub-run-1-i16.c| 16 
 .../gcc.target/riscv/sat_s_sub-run-1-i32.c| 16 
 .../gcc.target/riscv/sat_s_sub-run-1-i64.c| 16 
 .../gcc.target/riscv/sat_s_sub-run-1-i8.c | 16 
 10 files changed, 267 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-1-i16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-1-i32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-1-i64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-1-i8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-run-1-i16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-run-1-i32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-run-1-i64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-run-1-i8.c

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index a2617b6db70..587f3f8348c 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -353,6 +353,23 @@ sat_u_sub_imm_type_check##_##INDEX##_##T##_fmt_4 (T x)\
   return x > IMM ? x - IMM : 0;   \
 }
 
+#define DEF_SAT_S_SUB_FMT_1(T, UT, MIN, MAX) \
+T __attribute__((noinline))  \
+sat_s_sub_##T##_fmt_1 (T x, T y) \
+{\
+  T minus = (UT)x - (UT)y;   \
+  return (x ^ y) >= 0\
+? minus  \
+: (minus ^ x) >= 0   \
+  ? minus\
+  : x < 0 ? MIN : MAX;   \
+}
+#define DEF_SAT_S_SUB_FMT_1_WRAP(T, UT, MIN, MAX) \
+  DEF_SAT_S_SUB_FMT_1(T, UT, MIN, MAX)
+
+#define RUN_SAT_S_SUB_FMT_1(T, x, y) sat_s_sub_##T##_fmt_1(x, y)
+#define RUN_SAT_S_SUB_FMT_1_WRAP(T, x, y) RUN_SAT_S_SUB_FMT_1(T, x, y)
+
 
/**/
 /* Saturation Truncate (unsigned and signed)  
*/
 
/**/
diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith_data.h b/gcc/testsuite/gcc.target/riscv/sat_arith_data.h
index 75037c5d806..39a1e17cd3d 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith_data.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith_data.h
@@ -37,6 +37,11 @@ TEST_BINARY_STRUCT (int16_t, ssadd)
 TEST_BINARY_STRUCT (int32_t, ssadd)
 TEST_BINARY_STRUCT (int64_t, ssadd)
 
+TEST_BINARY_STRUCT (int8_t,  sssub)
+TEST_BINARY_STRUCT (int16_t, sssub)
+TEST_BINARY_STRUCT (int32_t, sssub)
+TEST_BINARY_STRUCT (int64_t, sssub)
+
 TEST_UNARY_STRUCT_DECL(uint8_t, uint16_t) \
   TEST_UNARY_DATA(uint8_t, uint16_t)[] =
 {
@@ -189,4 +194,72 @@ TEST_BINARY_STRUCT_DECL(int64_t, ssadd) TEST_BINARY_DATA(int64_t, ssadd)[] =
   { -9223372036854775803ll,   9223372036854775805ll,   2},
 };
 
+TEST_BINARY_STRUCT_DECL(int8_t, sssub) TEST_BINARY_DATA(int8_t, sssub)[] =
+{
+  {   0,0,0},
+  {   2,4,   -2},
+  { 126,   -1,  127},
+  { 127,   -1,  127},
+  { 127, -127,  127},
+  {  -7,   -4,   -3},
+

Re: [r15-3834 Regression] FAIL: c-c++-common/gomp/declare-variant-duplicates.c (test for excess errors) on Linux/x86_64

2024-09-24 Thread Sandra Loosemore


On 9/24/24 14:08, haochen.jiang wrote:

On Linux/x86_64,

96246bff0bcd9e5cdec9e6cf811ee3db4997f6d4 is the first bad commit
commit 96246bff0bcd9e5cdec9e6cf811ee3db4997f6d4
Author: Sandra Loosemore 
Date:   Fri Sep 6 20:58:13 2024 +

 OpenMP: Check additional restrictions on context selector properties

caused

FAIL: c-c++-common/gomp/declare-variant-duplicates.c  (test for errors, line 11)
FAIL: c-c++-common/gomp/declare-variant-duplicates.c (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-3834/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="gomp.exp=c-c++-common/gomp/declare-variant-duplicates.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="gomp.exp=c-c++-common/gomp/declare-variant-duplicates.c 
--target_board='unix{-m32\ -march=cascadelake}'"


It turns out the problem here is that with -m32, "i386" is defined as a 
built-in preprocessor macro :-O so it cannot be used as an identifier. 
I've pushed the attached patch to adjust the testcase not to do that.
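
For anyone who wants to check this on their own target, a small probe along the following lines should show it. The helper names are made up for this note; whether the macro is defined depends on the target and on the language dialect in use (strict ISO modes typically do not define it).

```cpp
#include <cstring>

#define STRINGIFY_RAW(x) #x
#define STRINGIFY(x) STRINGIFY_RAW(x)

// True when the bare token i386 is a predefined macro on this target.
bool i386_is_predefined_macro()
{
#ifdef i386
  return true;
#else
  return false;
#endif
}

// True when the preprocessor replaced the token before stringification,
// i.e. the spelling "i386" did not survive macro expansion.
bool i386_token_was_replaced()
{
  return std::strcmp(STRINGIFY(i386), "i386") != 0;
}
```

When the token is replaced, an unquoted i386 in the arch selector never reaches the parser as an identifier, which is why the quoted string form is the safe spelling in the test.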


-Sandra

From 6935bddd8f90dde6009a1b8dea9745788ceeefb1 Mon Sep 17 00:00:00 2001
From: Sandra Loosemore 
Date: Wed, 25 Sep 2024 02:59:53 +
Subject: [PATCH] OpenMP: Fix testsuite failure on x86 with -m32

The testcase declare-variant-duplicates.c added in commit
96246bff0bcd9e5cdec9e6cf811ee3db4997f6d4 failed on 32-bit x86
because on that target "i386" is defined as a preprocessor macro
and cannot be used as an identifier.  Fixed by rewriting that test
not to do that.

gcc/testsuite/ChangeLog
	* c-c++-common/gomp/declare-variant-duplicates.c: Avoid using
	"i386" as an identifier.
---
 gcc/testsuite/c-c++-common/gomp/declare-variant-duplicates.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/c-c++-common/gomp/declare-variant-duplicates.c b/gcc/testsuite/c-c++-common/gomp/declare-variant-duplicates.c
index 47d34fc52e2..9f319c72449 100644
--- a/gcc/testsuite/c-c++-common/gomp/declare-variant-duplicates.c
+++ b/gcc/testsuite/c-c++-common/gomp/declare-variant-duplicates.c
@@ -8,6 +8,6 @@ extern int f4 (int);
 
 #pragma omp declare variant (f1) match (device={kind(cpu,gpu,"cpu")})  /* { dg-error "trait-property .cpu. specified more than once" } */
 #pragma omp declare variant (f2) match (device={isa(sse4,"avx",avx)})  /* { dg-error "trait-property .avx. specified more than once" } */
-#pragma omp declare variant (f3) match (device={arch(x86_64,i386,aarch64,"i386")})  /* { dg-error "trait-property .i386. specified more than once" } */
+#pragma omp declare variant (f3) match (device={arch(x86_64,"i386",aarch64,"x86_64")})  /* { dg-error "trait-property .x86_64. specified more than once" } */
 #pragma omp declare variant (f4) match (implementation={vendor(llvm,gnu,"arm",gnu)})  /* { dg-error "trait-property .gnu. specified more than once" } */
 int f (int);
-- 
2.25.1

Re: [PATCH] c++: compile time evaluation of prvalues [PR116416]

2024-09-24 Thread Marek Polacek

On Fri, Sep 20, 2024 at 06:39:52PM -0400, Jason Merrill wrote:
> On 9/20/24 12:18 AM, Marek Polacek wrote:
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > 
> > -- >8 --
> > This PR reports a missed optimization.  When we have:
> > 
> >Str str{"Test"};
> >callback(str);
> > 
> > as in the test, we're able to evaluate the Str::Str() call at compile
> > time.  But when we have:
> > 
> >callback(Str{"Test"});
> > 
> > we are not.  With this patch (in fact, it's Patrick's patch with a little
> > tweak), we turn
> > 
> >callback (TARGET_EXPR  >  5
> >  __ct_comp
> >  D.2890
> >  (struct Str *) <<< Unknown tree: void_cst >>>
> >  (const char *) "Test" )
> > 
> > into
> > 
> >callback (TARGET_EXPR )
> > 
> > I explored the idea of calling maybe_constant_value for the whole
> > TARGET_EXPR in cp_fold.  That has three problems:
> > - we can't always elide a TARGET_EXPR, so we'd have to make sure the
> >result is also a TARGET_EXPR;
> 
> I'd think that the result should always be a TARGET_EXPR for a class, and
> that's the case we want to fold; a TARGET_EXPR for a scalar is always the
> initialize-temp-and-use-it pattern you mention below.

Checking CLASS_TYPE_P would solve some of the problems, yes.  But...
 
> > - the resulting TARGET_EXPR must have the same flags, otherwise Bad
> >Things happen;
> 
> I guess maybe_constant_value should be fixed to preserve flags regardless of
> this change.

Yeah, cxx_eval_outermost_constant_expr already preserves TARGET_EXPR flags,
but here we go into the break_out_target_exprs block in maybe_constant_value
and that doesn't necessarily preserve them.

> > - getting a new slot is also problematic.  I've seen a test where we
> >had "TARGET_EXPR, D.2680", and folding the whole TARGET_EXPR
> >would get us "TARGET_EXPR", but since we don't see the outer
> >D.2680, we can't replace it with D.2681, and things break.
> 
> Hmm, yeah.  Maybe only if TARGET_EXPR_IMPLICIT_P?

...unfortunately that doesn't always help.  I've reduced an example into:

  struct optional {
constexpr optional(int) {}
  };
  optional foo() { return 2; }

where check_return_expr creates a COMPOUND_EXPR:

retval = build2 (COMPOUND_EXPR, TREE_TYPE (retval), retval,
 TREE_OPERAND (retval, 0));

where the TARGET_EXPR comes from build_cplus_new so it is _IMPLICIT_P.
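
For reference, the shapes under discussion can be put into one self-contained file, roughly as below. Str and callback here are simplified stand-ins I made up for the classes in the PR testcase, not the actual test; only the optional/foo part is the reduced example from above.

```cpp
#include <cstddef>

// Hypothetical stand-in for the Str class from the PR: a constexpr
// constructor that does some work (computes the string length).
struct Str
{
  const char *data;
  std::size_t len;
  constexpr Str(const char *p) : data(p), len(0)
  {
    while (p[len] != '\0')
      ++len;
  }
};

// Hypothetical consumer of the object.
std::size_t callback(const Str &s) { return s.len; }

// Named object: the constructor call was already evaluated at compile time.
std::size_t via_named_object()
{
  Str str{"Test"};
  return callback(str);
}

// Prvalue argument: the case the patch wants to fold as well.
std::size_t via_prvalue() { return callback(Str{"Test"}); }

// The reduced return-statement example.
struct optional
{
  constexpr optional(int) {}
};
optional foo() { return 2; }

// The constructor is usable in a constant expression either way.
static_assert(Str{"Test"}.len == 4, "constexpr constructor");
```

Both functions return the same value; the difference is only in whether the constructor call survives into the gimple dump.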
 
> > With this patch, two tree-ssa tests regressed: pr78687.C and pr90883.C.
> > 
> > FAIL: g++.dg/tree-ssa/pr90883.C   scan-tree-dump dse1 "Deleted redundant 
> > store: .*.a = {}"
> > is easy.  Previously, we would call C::C, so .gimple has:
> > 
> >D.2590 = {};
> >C::C (&D.2590);
> >D.2597 = D.2590;
> >return D.2597;
> > 
> > Then .einline inlines the C::C call:
> > 
> >D.2590 = {};
> >D.2590.a = {}; // #1
> >D.2590.b = 0;  // #2
> >D.2597 = D.2590;
> >D.2590 ={v} {CLOBBER(eos)};
> >return D.2597;
> > 
> > then #2 is removed in .fre1, and #1 is removed in .dse1.  So the test
> > passes.  But with the patch, .gimple won't have that C::C call, so the
> > IL is of course going to look different.
> 
> Maybe -fno-inline instead of the --param?

Then that C::C call isn't inlined and the test fails :/.
 
> > Unfortunately, pr78687.C is much more complicated and I can't explain
> > precisely what happens there.  But it seems like a good idea to have
> > a way to avoid this optimization.  So I've added the "noinline" check.
> 
> Hmm, I'm surprised make_object_1 would be affected, since the ref_proxy
> constructors are not constexpr.  And I wouldn't expect the optimization to
> affect the value-initialization option_2().

In pr78687.C we do this new optimization only once for
"constexpr eggs::variants::variant::variant(U&&) noexcept 
(std::is_nothrow_constructible::value)".
 
> > PR c++/116416
> > 
> > gcc/cp/ChangeLog:
> > 
> > * cp-gimplify.cc (cp_fold_r) : Try to fold
> > TARGET_EXPR_INITIAL and replace it with the folded result if
> > it's TREE_CONSTANT.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/analyzer/pr97116.C: Adjust dg-message.
> > * g++.dg/cpp2a/consteval-prop2.C: Adjust dg-bogus.
> > * g++.dg/tree-ssa/pr78687.C: Add __attribute__((noinline)).
> > * g++.dg/tree-ssa/pr90883.C: Likewise.
> > * g++.dg/cpp1y/constexpr-prvalue1.C: New test.
> > 
> > Co-authored-by: Patrick Palka 
> > ---
> >   gcc/cp/cp-gimplify.cc | 14 +
> >   gcc/testsuite/g++.dg/analyzer/pr97116.C   |  2 +-
> >   .../g++.dg/cpp1y/constexpr-prvalue1.C | 29 +++
> >   gcc/testsuite/g++.dg/cpp2a/consteval-prop2.C  |  2 +-
> >   gcc/testsuite/g++.dg/tree-ssa/pr78687.C   |  5 +++-
> >   gcc/testsuite/g++.dg/tree-ssa/pr90883.C   |  1 +
> >   6 files changed, 50 insertions(+), 3 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-prvalue1.C
> > 
> > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify

Re: [PATCH] c++: compile time evaluation of prvalues [PR116416]

2024-09-24 Thread Marek Polacek

On Sat, Sep 21, 2024 at 05:00:51PM +0200, Jakub Jelinek wrote:
> On Fri, Sep 20, 2024 at 07:03:45PM -0400, Jason Merrill wrote:
> > > The CALL_EXPR case in cp_fold uses !flag_no_inline instead, that makes 
> > > more
> > > sense to me.
> > > Because checking "noinline" attribute (which means don't inline this
> > > function) on current_function_decl rather than on functions being 
> > > "inlined"
> > > (the constexpr functions being non-manifestly constant evaluated) is just
> > > weird.
> > > If we really wanted, we could honor "noinline" during constant evaluation
> > > on the CALL_EXPR/AGGR_INIT_EXPR fndecls, but dunno if whenever doing the
> > > non-manifestly constant evaluated cases or just in special cases like 
> > > these
> > > two (CALL_EXPR in cp_fold, this in cp_fold_r).
> > 
> > Checking noinline in non-manifestly constant-evaluated cases might make
> > sense.
> 
> Though, if somebody marks some function explicitly constexpr they should be
> prepared to get some constexpr evaluation of it, doesn't have to be strictly
> standard required one.

Yeah, I would agree with that.

> And for -fimplicit-constexpr we already have "noinline" attribute check, so
> maybe it is ok as is.

Yeah.  I dropped the "noinline" attribute check though because I no longer
see any need for it.

Marek

[Patch] OpenMP: Update OMP_REQUIRES_TARGET_USED for declare_target + interop

2024-09-24 Thread Tobias Burnus

OpenMP mandates that when certain clauses are used with 'omp requires', 
the requires clause must appear in all compilation units.


Those clauses influence the offloading behavior (+ potentially codegen); 
hence, the requires clauses must match whenever device code is 
involved. That's the case for device functions (in particular 'declare 
target') and all OpenMP directives that take a 'device' clause.


Previously, OpenMP was rather vague, but e.g. TR13 is fortunately more 
explicit. Thus, this patch marks the target as used for 'declare target' and 
also ("device" clause!) for 'interop' (but only for Fortran, as C/C++ still 
do not support parsing the 'interop' directive).
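
As a concrete illustration (not taken from the patch or its testsuite), a translation unit of the following shape would now count as device use because of the 'declare target' region alone. The function name twice is invented for this note, and unified_shared_memory is just one example of a clause with the global requirement property. As I understand the rule, the requires directive has to come lexically before the region, as it does here.

```cpp
// The requires directive comes first in the translation unit.
#pragma omp requires unified_shared_memory

// A device function: with the patch, this region sets the target-used
// bit even though no 'target' construct appears in the file.
#pragma omp declare target
int twice(int v) { return 2 * v; }
#pragma omp end declare target
```

Without -fopenmp the pragmas are ignored and this is ordinary host code.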


Any comments before I commit it?

Tobias

PS: In TR13, page 321, lines 14–16 — 
https://www.openmp.org/wp-content/uploads/openmp-TR13.pdf
OpenMP: Update OMP_REQUIRES_TARGET_USED for declare_target + interop

Older versions of the OpenMP specification were not clear about what counted
as device usage. Newer ones (like TR13) are rather clear. Hence, this commit adds
"target used" also when 'declare target' or 'interop' are encountered.
(The latter only for Fortran, as C/C++ parsing support is still missing.)
TR13 also lists 'dispatch' as a construct and 'device_safesync' as affected by
device use, but neither is supported in GCC yet.

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_omp_declare_target): Set target-used bit
	in omp_requires_mask.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_omp_declare_target): Set target-used bit
	in omp_requires_mask.

gcc/fortran/ChangeLog:

	* parse.cc (decode_omp_directive): Set target-used bit of
	omp_requires_mask when encountering the declare_target or interop
	directive.

 gcc/c/c-parser.cc| 3 +++
 gcc/cp/parser.cc | 3 +++
 gcc/fortran/parse.cc | 8 ++--
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 6a46577f511..a681438cbbe 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -25492,6 +25492,9 @@ c_parser_omp_declare_target (c_parser *parser)
   int device_type = 0;
   bool indirect = false;
   bool only_device_type_or_indirect = true;
+  if (flag_openmp)
+omp_requires_mask
+  = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
   if (c_parser_next_token_is (parser, CPP_NAME)
   || (c_parser_next_token_is (parser, CPP_COMMA)
 	  && c_parser_peek_2nd_token (parser)->type == CPP_NAME))
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 35c266659e4..3b3ab0f1923 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -49524,6 +49524,9 @@ cp_parser_omp_declare_target (cp_parser *parser, cp_token *pragma_tok)
   int device_type = 0;
   bool indirect = false;
   bool only_device_type_or_indirect = true;
+  if (flag_openmp)
+omp_requires_mask
+  = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
   if (cp_lexer_next_token_is (parser->lexer, CPP_NAME)
   || (cp_lexer_next_token_is (parser->lexer, CPP_COMMA)
 	  && cp_lexer_nth_token_is (parser->lexer, 2, CPP_NAME)))
diff --git a/gcc/fortran/parse.cc b/gcc/fortran/parse.cc
index e749bbdc6b5..9e06dbf0911 100644
--- a/gcc/fortran/parse.cc
+++ b/gcc/fortran/parse.cc
@@ -1345,8 +1345,12 @@ decode_omp_directive (void)
 
   switch (ret)
 {
-/* Set omp_target_seen; exclude ST_OMP_DECLARE_TARGET.
-   FIXME: Get clarification, cf. OpenMP Spec Issue #3240.  */
+/* For the constraints on clauses with the global requirement property,
+   we set omp_target_seen. This includes all directives that take the
+   DEVICE clause, (BEGIN) DECLARE_TARGET and procedures run on the device
+   (which effectively is implied by the former).  */
+case ST_OMP_DECLARE_TARGET:
+case ST_OMP_INTEROP:
 case ST_OMP_TARGET:
 case ST_OMP_TARGET_DATA:
 case ST_OMP_TARGET_ENTER_DATA:

Re: [PATCH] ltmain.sh: allow more flags at link-time

2024-09-24 Thread Alan Modra

On Thu, Sep 19, 2024 at 11:52:48PM +0100, Sam James wrote:
> Sam James  writes:
> 
> > Sam James  writes:
> >
> >> libtool defaults to filtering flags passed at link-time.
> >>
> >> This brings the filtering in GCC's 'fork' of libtool into sync with
> >> upstream libtool commit 22a7e547e9857fc94fe5bc7c921d9a4b49c09f8e.

Looks OK to me, thanks.

-- 
Alan Modra

[r15-3841 Regression] FAIL: gfortran.dg/unsigned_25.f90 -Os (test for excess errors) on Linux/x86_64

2024-09-24 Thread haochen.jiang

On Linux/x86_64,

5d98fe096b5d17021875806ffc32ba41ea0e87b0 is the first bad commit
commit 5d98fe096b5d17021875806ffc32ba41ea0e87b0
Author: Thomas Koenig 
Date:   Tue Sep 24 21:51:42 2024 +0200

Implement MATMUL and DOT_PRODUCT for unsigned.

caused

FAIL: gfortran.dg/unsigned_25.f90   -O0  (test for excess errors)
FAIL: gfortran.dg/unsigned_25.f90   -O1  (test for excess errors)
FAIL: gfortran.dg/unsigned_25.f90   -O2  (test for excess errors)
FAIL: gfortran.dg/unsigned_25.f90   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: gfortran.dg/unsigned_25.f90   -O3 -g  (test for excess errors)
FAIL: gfortran.dg/unsigned_25.f90   -Os  (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-3841/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=gfortran.dg/unsigned_25.f90 --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=gfortran.dg/unsigned_25.f90 --target_board='unix{-m32\ 
-march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact me 
at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the command 
line might help.)
(However, please make sure that there are no potential problems with AVX512.)

[PATCH] tree-optimization/116819 - SLP with !STMT_VINFO_RELEVANT representative

2024-09-24 Thread Richard Biener

Under some circumstances we can end up picking a non-relevant stmt
as representative of a SLP node.  Instead of skipping stmt analysis
and declaring success we have to either ignore relevancy throughout
the code base or fail SLP operation verification.  The following
does the latter.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/116819
* tree-vect-stmts.cc (vect_analyze_stmt): When the SLP
representative isn't relevant signal failure instead of
success.
---
 gcc/tree-vect-stmts.cc | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index f7867c0803b..7e0a8095fe8 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -13289,6 +13289,12 @@ vect_analyze_stmt (vec_info *vinfo,
   if (dump_enabled_p ())
 dump_printf_loc (MSG_NOTE, vect_location, "irrelevant.\n");
 
+ if (node)
+   return opt_result::failure_at (stmt_info->stmt,
+  "not vectorized:"
+  " irrelevant stmt as SLP node %p "
+  "representative.\n",
+  (void *)node);
   return opt_result::success ();
 }
 }
-- 
2.43.0

RE: [PATCH v1 2/2] RISC-V: Add testcases for form 3 of signed vector SAT_ADD

2024-09-24 Thread Li, Pan2

Thanks Robin. This depends on the match.pd change in [PATCH 1/2]; I will commit it 
after that.

Pan

-Original Message-
From: Robin Dapp  
Sent: Tuesday, September 24, 2024 8:40 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: richard.guent...@gmail.com; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
kito.ch...@gmail.com; jeffreya...@gmail.com; Robin Dapp 
Subject: Re: [PATCH v1 2/2] RISC-V: Add testcases for form 3 of signed vector 
SAT_ADD

LGTM (in case you haven't committed it yet).

-- 
Regards
 Robin

Re: [PATCH] Update email in MAINTAINERS file.

2024-09-24 Thread Filip Kastl

On Mon 2024-09-23 09:43:28, Aldy Hernandez wrote:
> From: Aldy Hernandez 
> 
> ChangeLog:
> 
>   * MAINTAINERS: Update email and add myself to DCO.
> ---
>  MAINTAINERS | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index cfd96c9f33e..e9fafaf45a7 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -116,7 +116,7 @@ riscv port  Jim Wilson  
> 
>  rs6000/powerpc port David Edelsohn  
>  rs6000/powerpc port Segher Boessenkool  
>  rs6000/powerpc port Kewen Lin   
> -rs6000 vector extns Aldy Hernandez  
> +rs6000 vector extns Aldy Hernandez  
>  rx port Nick Clifton
>  s390 port   Ulrich Weigand  
>  s390 port   Andreas Krebbel 
> @@ -213,7 +213,7 @@ c++ runtime libsJonathan Wakely 
> 
>  c++ runtime libs special modes  François Dumont 
>  fixincludes Bruce Korb  
>  *gimpl* Jakub Jelinek   
> -*gimpl* Aldy Hernandez  
> +*gimpl* Aldy Hernandez  
>  *gimpl* Jason Merrill   
>  gcse.cc Jeff Law
>  global opt frameworkJeff Law
> @@ -240,7 +240,7 @@ option handling Joseph Myers
> 
>  middle-end  Jeff Law
>  middle-end  Ian Lance Taylor
>  middle-end  Richard Biener  
> -*vrp, rangerAldy Hernandez  
> +*vrp, rangerAldy Hernandez  
>  *vrp, rangerAndrew MacLeod  
>  tree-ssaAndrew MacLeod  
>  tree browser/unparser   Sebastian Pop   
> @@ -518,7 +518,7 @@ Daniel Hellstromdanielh 
> 
>  Fergus Henderson-   
>  Richard Henderson   rth 
>  Stuart Hendersonshenders
> -Aldy Hernandez  aldyh   
> +Aldy Hernandez  aldyh   
>  Philip Herron   redbrain
> 
>  Marius Hillenbrand  -   
>  Matthew Hiller  -   
> @@ -948,3 +948,4 @@ Jonathan Wakely 
> 
>  Alexander Westbrooks
>  Chung-Ju Wu 
>  Pengxuan Zheng  
> +Aldy Hernandez  
> -- 
> 2.43.0
> 

Hi Aldy,

Could you move your entry in the DCO list so that it respects surname
alphabetical order, please?  Your name should be between Robin Dapp and Michal
Jires.

Thanks,
Filip Kastl

[PATCH] RISC-V: Fix FIXED_REGISTERS comment missing return address register

2024-09-24 Thread chenyixuan

From: Yixuan Chen 

gcc/config/ChangeLog:

2024-09-24  Yixuan Chen  

* riscv/riscv.h: Fix FIXED_REGISTERS comment missing return address 
register.
---
 gcc/config/riscv/riscv.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index ead97867eb8..3aecb43f831 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -316,7 +316,7 @@ ASM_MISA_SPEC
 
 #define FIRST_PSEUDO_REGISTER 128
 
-/* x0, sp, gp, and tp are fixed.  */
+/* x0, ra, sp, gp, and tp are fixed.  */
 
 #define FIXED_REGISTERS
\
 { /* General registers.  */\
-- 
2.45.2

Re: [PATCH] MATCH: add abs support for half float

2024-09-24 Thread Richard Biener

On Mon, Sep 23, 2024 at 10:52 AM Kugan Vivekanandarajah
 wrote:
>
> Hi Richard,
>
> > On 20 Sep 2024, at 8:11 pm, Richard Biener  
> > wrote:
> >
> > On Fri, Sep 20, 2024 at 10:23 AM Kugan Vivekanandarajah
> >  wrote:
> >>
> >> Hi Richard,
> >>
> >>> On 17 Sep 2024, at 7:36 pm, Richard Biener  
> >>> wrote:
> >>>
> >>> On Tue, Sep 17, 2024 at 10:31 AM Kugan Vivekanandarajah
> >>>  wrote:
> 
>  Hi Richard,
> 
> > On 10 Sep 2024, at 9:33 pm, Richard Biener  
> > wrote:
> >
> > On Thu, Sep 5, 2024 at 3:19 AM Kugan Vivekanandarajah
> >  wrote:
> >>
> >> Thanks for the explanation.
> >>
> >>
> >>> On 2 Sep 2024, at 9:47 am, Andrew Pinski  wrote:
> >>>
> >>> On Sun, Sep 1, 2024 at 4:27 PM Kugan Vivekanandarajah
> >>>  wrote:
> 
>  Hi Andrew.
> 
> > On 28 Aug 2024, at 2:23 pm, Andrew Pinski  wrote:
> >
> > On Tue, Aug 27, 2024 at 8:54 PM Kugan Vivekanandarajah
> >  wrote:
> >>
> >> Hi Richard,
> >>
> >> Thanks for the reply.
> >>
> >>> On 27 Aug 2024, at 7:05 pm, Richard Biener 
> >>>  wrote:
> >>>
> >>> On Tue, Aug 27, 2024 at 8:23 AM Kugan Vivekanandarajah
> >>>  wrote:
> 
>  Hi Richard,
> 
> > On 22 Aug 2024, at 10:34 pm, Richard Biener 
> >  wrote:
> >
> > On Wed, Aug 21, 2024 at 12:08 PM Kugan Vivekanandarajah
> >  wrote:
> >>
> >> Hi Richard,
> >>
> >>> On 20 Aug 2024, at 6:09 pm, Richard Biener 
> >>>  wrote:
> >>>
> >>> On Fri, Aug 9, 2024 at 2:39 AM Kugan Vivekanandarajah
> >>>  wrote:
> 
>  Thanks for the comments.
> 
> > On 2 Aug 2024, at 8:36 pm, Richard Biener 
> >  wrote:
> >
> > On Fri, Aug 2, 2024 at 11:20 AM Kugan Vivekanandarajah
> >  wrote:
> >>
> >>
> >>
> >>> On 1 Aug 2024, at 10:46 pm, Richard Biener 
> >>>  wrote:
> >>>
> >>> On Thu, Aug 1, 2024 at 5:31 AM Kugan Vivekanandarajah
> >>>  wrote:
> 
> 
>  On Mon, Jul 29, 2024 at 10:11 AM Andrew Pinski 
>   wrote:
> >
> > On Mon, Jul 29, 2024 at 12:57 AM Kugan Vivekanandarajah
> >  wrote:
> >>
> >> On Thu, Jul 25, 2024 at 10:19 PM Richard Biener
> >>  wrote:
> >>>
> >>> On Thu, Jul 25, 2024 at 4:42 AM Kugan Vivekanandarajah
> >>>  wrote:
> 
>  On Tue, Jul 23, 2024 at 11:56 PM Richard Biener
>   wrote:
> >
> > On Tue, Jul 23, 2024 at 10:27 AM Kugan 
> > Vivekanandarajah
> >  wrote:
> >>
> >> On Tue, Jul 23, 2024 at 10:35 AM Andrew Pinski 
> >>  wrote:
> >>>
> >>> On Mon, Jul 22, 2024 at 5:26 PM Kugan 
> >>> Vivekanandarajah
> >>>  wrote:
> 
>  Revised based on the comment and moved it into 
>  existing patterns as.
> 
>  gcc/ChangeLog:
> 
>  * match.pd: Extend A CMP 0 ? A : -A into (type)A 
>  CMP 0 ? A : -A.
>  Extend A CMP 0 ? A : -A in

Re: [RFC][PATCH] AArch64: Remove AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS

2024-09-24 Thread Jennifer Schmitz



> On 28 Aug 2024, at 14:56, Kyrylo Tkachov  wrote:
> 
> 
> 
>> On 28 Aug 2024, at 10:27, Tamar Christina  wrote:
>> 
>>> -Original Message-
>>> From: Kyrylo Tkachov 
>>> Sent: Wednesday, August 28, 2024 8:55 AM
>>> To: Tamar Christina 
>>> Cc: Richard Sandiford ; Jennifer Schmitz
>>> ; gcc-patches@gcc.gnu.org; Kyrylo Tkachov
>>> 
>>> Subject: Re: [RFC][PATCH] AArch64: Remove
>>> AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
>>> 
>>> Hi all,
>>> 
>>> Thanks to Jennifer for proposing a patch and Tamar and Richard for digging 
>>> into it.
>>> 
 On 27 Aug 2024, at 13:16, Tamar Christina  wrote:
 
> -Original Message-
> From: Richard Sandiford 
> Sent: Tuesday, August 27, 2024 11:46 AM
> To: Tamar Christina 
> Cc: Jennifer Schmitz ; gcc-patches@gcc.gnu.org; 
> Kyrylo
> Tkachov 
> Subject: Re: [RFC][PATCH] AArch64: Remove
> AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> 
> Tamar Christina  writes:
>> Hi Jennifer,
>> 
>>> -Original Message-
>>> From: Jennifer Schmitz 
>>> Sent: Friday, August 23, 2024 1:07 PM
>>> To: gcc-patches@gcc.gnu.org
>>> Cc: Richard Sandiford ; Kyrylo Tkachov
>>> 
>>> Subject: [RFC][PATCH] AArch64: Remove
>>> AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
>>> 
>>> This patch removes the AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
>>> tunable and
>>> use_new_vector_costs entry in aarch64-tuning-flags.def and makes the
>>> AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS paths in the backend
>>> the
>>> default.
> 
> Thanks for doing this.  This has been on my TODO list ever since the
> tunable was added.
> 
> The history is that these "new" costs were originally added in stage 4
> of GCC 11 for Neoverse V1.  Since the costs were added so late, it wasn't
> appropriate to change the behaviour for any other core.  All the new code
> was therefore gated on this option.
> 
> The new costs do two main things:
> 
> (1) use throughput-based calculations where available, including to choose
>  between Advanced SIMD and SVE
> 
> (2) try to make the latency-based costs more precise, by looking more closely
>  at the provided stmt_info
> 
> Old cost models won't be affected by (1) either way, since they don't
> provide any throughput information.  But they should in principle benefit
> from (2).  So...
> 
>>> To that end, the function aarch64_use_new_vector_costs_p and its uses
>>> were removed. Additionally, guards were added to prevent null pointer
>>> dereferences of fields in cpu_vector_cost.
>>> 
>> 
>> I'm not against this change, but it does mean that we now switch old Adv. SIMD
>> cost models as well to the new throughput based cost models.  That means that
>> -mcpu=generic now behaves differently, and -mcpu=neoverse-n1 and I think
>> some distros explicitly use this (I believe yocto for instance does).
> 
> ...it shouldn't mean that we start using throughput-based models for
> cortexa53 etc., since there's no associated issue info.
 
 Yes, I was using throughput based model as a name.  But as you indicated 
 in (2)
 it does change the latency calculation.
 
 My question was because of things in e.g. aarch64_adjust_stmt_cost and friends,
 e.g. aarch64_multiply_add_p changes the cost between FMA SIMD vs scalar.
 
 So my question..
 
> 
>> Have we validated that the old generic cost model still behaves sensibly
>> with this change?
 
 is still valid I think, we *are* changing the cost for all models,
 and while they should indeed be more accurate, there could be knock on 
 effects.
 
>>> 
>>> We can run SPEC on a Grace system with -mcpu=generic to see what the effect
>>> is, but wider benchmarking would be more appropriate. Can you help with that
>>> Tamar once we agree on the other implementation details in this patch?
>>> 
>> 
>> Sure that's not a problem.  Just ping me when you have a patch you want me 
>> to test :)
>> 
>>> 
 Thanks,
 Tamar
 
>> 
>>> The patch was bootstrapped and regtested on aarch64-linux-gnu:
>>> No problems bootstrapping, but several test files (in aarch64-sve.exp:
>>> gather_load_extend_X.c where X is 1 to 4, strided_load_2.c,
>>> strided_store_2.c) fail because of small differences in codegen that
>>> make some of the scan-assembler-times tests fail.
>>> 
>>> Kyrill suggested to add a -fvect-cost-model=unlimited flag to these 
>>> tests and
> add
>>> some
>> 
>> I don't personally like unlimited here as unlimited means just vectorize 
>> at any
>> cost.  This means that costing between

Re: [committed] arc: Remove mlra option [PR113954]

2024-09-24 Thread Claudiu Zissulescu Ianculescu

I'll include your comment in my second patch where I clean some
patterns used by reload.

Thank you,
claudiu

On Mon, Sep 23, 2024 at 5:05 PM Andreas Schwab  wrote:
>
> On Sep 23 2024, Claudiu Zissulescu wrote:
>
> > diff --git a/gcc/config/arc/arc.cc b/gcc/config/arc/arc.cc
> > index c800226b179..a225adeff57 100644
> > --- a/gcc/config/arc/arc.cc
> > +++ b/gcc/config/arc/arc.cc
> > @@ -721,7 +721,7 @@ static rtx arc_legitimize_address_0 (rtx, rtx, 
> > machine_mode mode);
> >arc_no_speculation_in_delay_slots_p
> >
> >  #undef TARGET_LRA_P
> > -#define TARGET_LRA_P arc_lra_p
> > +#define TARGET_LRA_P hook_bool_void_true
>
> This is the default for lra_p, so you can remove the override.
>
> --
> Andreas Schwab, SUSE Labs, sch...@suse.de
> GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
> "And now for something completely different."

SVE intrinsics: Fold constant operands for svlsl.

2024-09-24 Thread Soumya AR

This patch implements constant folding for svlsl. Test cases have been added to
check for the following cases:

Zero, merge, and don't care predication.
Shift by 0.
Shift by register width.
Overflow shift on signed and unsigned integers.
Shift on a negative integer.
Maximum possible shift, e.g. shift by 7 on an 8-bit integer.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Soumya AR 

gcc/ChangeLog:

* config/aarch64/aarch64-sve-builtins-base.cc (svlsl_impl::fold):
Try constant folding.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/const_fold_lsl_1.c: New test.



0001-SVE-intrinsics-Fold-constant-operands-for-svlsl.patch
Description: 0001-SVE-intrinsics-Fold-constant-operands-for-svlsl.patch
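
To make the listed test cases concrete, here is a minimal scalar sketch of
what folding a constant left shift on one 8-bit lane could look like.  The
helper names are hypothetical and this is not the code from the patch.  It
assumes that a shift amount equal to or larger than the lane width yields 0
rather than being reduced modulo the width, and that signed lanes are shifted
as raw bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not the GCC implementation: fold a logical shift
// left of one unsigned 8-bit lane by a constant amount.
// Assumption: amounts >= 8 produce 0 instead of wrapping modulo 8.
constexpr std::uint8_t fold_lsl_u8(std::uint8_t value, std::uint64_t amount)
{
  return amount >= 8
           ? static_cast<std::uint8_t>(0)
           : static_cast<std::uint8_t>(static_cast<unsigned>(value) << amount);
}

// Signed lanes: shift the underlying bits and reinterpret the result.
// Assumption: two's complement narrowing, as GCC implements it.
constexpr std::int8_t fold_lsl_s8(std::int8_t value, std::uint64_t amount)
{
  return static_cast<std::int8_t>(
    fold_lsl_u8(static_cast<std::uint8_t>(value), amount));
}
```

Under those assumptions, shift by 0 returns the input, shift by 7 keeps only
the lowest bit in the top position, and shift by 8 clears the lane.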

[PATCH] Testsuite, darwin: account for macOS 15

2024-09-24 Thread FX Coudert

I’ve pushed the attached patch as obvious, taking into account the newly 
released macOS 15 (darwin24). It makes the test pass.

FX



0001-Testsuite-darwin-account-for-macOS-15.patch
Description: Binary data

[PATCH] tree-optimization/114855 - more update_ssa speedup

2024-09-24 Thread Richard Biener

The following tackles another source of slow bitmap operations,
namely populating blocks_to_update.  We already have that bitmap in
tree view around PHI insertion, but the initial population is slow
as well.  Unfortunately there is a conditional list view requirement
in between, and the bitmap API doesn't allow opportunistic switching:
it rejects tree -> tree and list -> list transitions.  So the
following patch wraps the early population in a tree view section,
with possibly one redundant tree -> list -> tree view transition.

This cuts tree SSA incremental from 228.25s (21%) to 65.05s (7%).

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/114855
* tree-into-ssa.cc (update_ssa): Use tree view for the
initial population of blocks_to_update.
---
 gcc/tree-into-ssa.cc | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/tree-into-ssa.cc b/gcc/tree-into-ssa.cc
index 1cce9d62809..fc61d47ca77 100644
--- a/gcc/tree-into-ssa.cc
+++ b/gcc/tree-into-ssa.cc
@@ -3445,6 +3445,7 @@ update_ssa (unsigned update_flags)
   blocks_with_phis_to_rewrite = BITMAP_ALLOC (NULL);
   bitmap_tree_view (blocks_with_phis_to_rewrite);
   blocks_to_update = BITMAP_ALLOC (NULL);
+  bitmap_tree_view (blocks_to_update);
 
   insert_phi_p = (update_flags != TODO_update_ssa_no_phi);
 
@@ -3492,6 +3493,8 @@ update_ssa (unsigned update_flags)
 placement heuristics.  */
   prepare_block_for_update (start_bb, insert_phi_p);
 
+  bitmap_list_view (blocks_to_update);
+
   tree name;
 
   if (flag_checking)
@@ -3517,6 +3520,8 @@ update_ssa (unsigned update_flags)
 }
   else
 {
+  bitmap_list_view (blocks_to_update);
+
   /* Otherwise, the entry block to the region is the nearest
 common dominator for the blocks in BLOCKS.  */
   start_bb = nearest_common_dominator_for_set (CDI_DOMINATORS,
-- 
2.43.0

Re: [PATCH v3] Widening-Mul: Fix one ICE for SAT_SUB matching operand checking

2024-09-24 Thread Richard Biener

On Tue, Sep 24, 2024 at 12:29 PM  wrote:
>
> From: Pan Li 
>
> This patch fixes the following ICE with -O2 -m32 on x86_64.
>
> during RTL pass: expand
> JackMidiAsyncWaitQueue.cpp.cpp: In function 'void DequeueEvent(unsigned
> int)':
> JackMidiAsyncWaitQueue.cpp.cpp:3:6: internal compiler error: in
> expand_fn_using_insn, at internal-fn.cc:263
> 3 | void DequeueEvent(unsigned frame) {
>   |  ^~~~
> 0x27b580d diagnostic_context::diagnostic_impl(rich_location*,
> diagnostic_metadata const*, diagnostic_option_id, char const*,
> __va_list_tag (*) [1], diagnostic_t)
> ???:0
> 0x27c4a3f internal_error(char const*, ...)
> ???:0
> 0x27b3994 fancy_abort(char const*, int, char const*)
> ???:0
> 0xf25ae5 expand_fn_using_insn(gcall*, insn_code, unsigned int, unsigned int)
> ???:0
> 0xf2a124 expand_direct_optab_fn(internal_fn, gcall*, optab_tag, unsigned int)
> ???:0
> 0xf2c87c expand_SAT_SUB(internal_fn, gcall*)
> ???:0
>
> We allow operand conversion when matching SAT_SUB in match.pd, to support
> the zip benchmark SAT_SUB pattern.  That is,
>
> (convert? (minus (convert1? @0) (convert1? @1))) for the sample code below.
>
> void test (uint16_t *x, unsigned b, unsigned n)
> {
>   unsigned a = 0;
>   register uint16_t *p = x;
>
>   do {
> a = *--p;
> *p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB
>   } while (--n);
> }
>
> The pattern match for SAT_SUB may also act on the scalar sample code
> below.
>
> unsigned long long GetTimeFromFrames(int);
> unsigned long long GetMicroSeconds();
>
> void DequeueEvent(unsigned frame) {
>   long long frame_time = GetTimeFromFrames(frame);
>   unsigned long long current_time = GetMicroSeconds();
>   DequeueEvent(frame_time < current_time ? 0 : frame_time - current_time);
> }
>
> Aka:
>
> uint32_t a = (uint32_t)SAT_SUB(uint64_t, uint64_t);
>
> This becomes a problem when compiling for ia32 or with -m32, because
> we only check that the lhs type (aka uint32_t) is supported by the ifn,
> not the operand type (aka uint64_t).  DImode is mostly disabled for
> 32-bit targets like ia32 or rv32gcv, which triggers the ICE when expanding.

OK.

Thanks,
Richard.

> The following test suites pass with this patch.
> * The rv64gcv full regression test.
> * The x86 bootstrap test.
> * The x86 full regression test.
>
> PR middle-end/116814
>
> gcc/ChangeLog:
>
> * tree-ssa-math-opts.cc (build_saturation_binary_arith_call): Make
> ifn is_supported type check based on operand instead of lhs.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/torture/pr116814-1.C: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/testsuite/g++.dg/torture/pr116814-1.C | 12 
>  gcc/tree-ssa-math-opts.cc |  2 +-
>  2 files changed, 13 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/torture/pr116814-1.C
>
> diff --git a/gcc/testsuite/g++.dg/torture/pr116814-1.C 
> b/gcc/testsuite/g++.dg/torture/pr116814-1.C
> new file mode 100644
> index 000..dd6f29daa7c
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/torture/pr116814-1.C
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target { ia32 } } } */
> +/* { dg-options "-O2" } */
> +
> +unsigned long long GetTimeFromFrames(int);
> +unsigned long long GetMicroSeconds();
> +
> +void DequeueEvent(unsigned frame) {
> +  long long frame_time = GetTimeFromFrames(frame);
> +  unsigned long long current_time = GetMicroSeconds();
> +
> +  DequeueEvent(frame_time < current_time ? 0 : frame_time - current_time);
> +}
> diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> index d61668aacfc..8c622514dbd 100644
> --- a/gcc/tree-ssa-math-opts.cc
> +++ b/gcc/tree-ssa-math-opts.cc
> @@ -4042,7 +4042,7 @@ build_saturation_binary_arith_call 
> (gimple_stmt_iterator *gsi, gphi *phi,
> internal_fn fn, tree lhs, tree op_0,
> tree op_1)
>  {
> -  if (direct_internal_fn_supported_p (fn, TREE_TYPE (lhs), 
> OPTIMIZE_FOR_BOTH))
> +  if (direct_internal_fn_supported_p (fn, TREE_TYPE (op_0), 
> OPTIMIZE_FOR_BOTH))
>  {
>gcall *call = gimple_build_call_internal (fn, 2, op_0, op_1);
>gimple_call_set_lhs (call, lhs);
> --
> 2.43.0
>
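
As an illustration of why the operand type is the one that matters in the
patch above, here is a small sketch of the "saturating subtract, then
truncate" shape.  The function names are made up for this example.  The
subtraction is carried out entirely in the 64-bit operand type and only the
final value is narrowed, so a target has to support the 64-bit operation even
though the result is 32 bits wide.

```cpp
#include <cassert>
#include <cstdint>

// Saturating unsigned subtraction in the operand type: a - b, clamped at 0.
constexpr std::uint64_t sat_sub_u64(std::uint64_t a, std::uint64_t b)
{
  return a >= b ? a - b : 0;
}

// The shape from the PR: the subtraction runs in 64 bits and only the
// result is narrowed to 32 bits afterwards.
constexpr std::uint32_t sat_sub_then_truncate(std::uint64_t a, std::uint64_t b)
{
  return static_cast<std::uint32_t>(sat_sub_u64(a, b));
}
```

A support check keyed on the 32-bit result type says nothing about whether
the 64-bit subtraction itself can be expanded, which is the gap the patch
closes.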

[PATCH] libstdc++: more #pragma diagnostic

2024-09-24 Thread Jason Merrill

Tested x86_64-pc-linux-gnu.

Is this the right fix, or do we want to stop using these deprecated classes,
here and in stl_function.h?

-- 8< --

The CI saw failures on 17_intro/headers/c++2011/parallel_mode.cc due to
-Wdeprecated-declarations warnings in some parallel/ headers.

libstdc++-v3/ChangeLog:

* include/parallel/base.h: Suppress -Wdeprecated-declarations.
* include/parallel/multiseq_selection.h: Likewise.
---
 libstdc++-v3/include/parallel/base.h   | 4 
 libstdc++-v3/include/parallel/multiseq_selection.h | 6 ++
 2 files changed, 10 insertions(+)

diff --git a/libstdc++-v3/include/parallel/base.h 
b/libstdc++-v3/include/parallel/base.h
index 5bc5350e723..fcbcc1e0b99 100644
--- a/libstdc++-v3/include/parallel/base.h
+++ b/libstdc++-v3/include/parallel/base.h
@@ -166,6 +166,8 @@ namespace __gnu_parallel
   { return !_M_comp(__a, __b) && !_M_comp(__b, __a); }
 };
 
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wdeprecated-declarations" // *nary_function
 
   /** @brief Similar to std::unary_negate,
*  but giving the argument types explicitly. */
@@ -297,6 +299,8 @@ namespace __gnu_parallel
 struct _Multiplies<_Tp, _Tp, _Tp>
 : public std::multiplies<_Tp> { };
 
+#pragma GCC diagnostic pop // -Wdeprecated-declarations
+
   /** @brief _Iterator associated with __gnu_parallel::_PseudoSequence.
*  If features the usual random-access iterator functionality.
*  @param _Tp Sequence _M_value type.
diff --git a/libstdc++-v3/include/parallel/multiseq_selection.h 
b/libstdc++-v3/include/parallel/multiseq_selection.h
index f25895adbdd..22bd97e6432 100644
--- a/libstdc++-v3/include/parallel/multiseq_selection.h
+++ b/libstdc++-v3/include/parallel/multiseq_selection.h
@@ -48,6 +48,10 @@
 
 namespace __gnu_parallel
 {
+
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wdeprecated-declarations" // *nary_function
+
   /** @brief Compare __a pair of types lexicographically, ascending. */
   template
 class _Lexicographic
@@ -100,6 +104,8 @@ namespace __gnu_parallel
   }
 };
 
+#pragma GCC diagnostic pop // -Wdeprecated-declarations
+
   /** 
*  @brief Splits several sorted sequences at a certain global __rank,
*  resulting in a splitting point for each sequence.

base-commit: b752eed3e3f2f27570ea89b7c2339468698472a8
-- 
2.46.0
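
For readers unfamiliar with the push/ignored/pop idiom used in the patch
above, here is a minimal standalone sketch.  The deprecated base class is a
hypothetical stand-in and is not taken from libstdc++.  The point is only that
the suppression is scoped to the region between push and pop.

```cpp
#include <cassert>

// Hypothetical deprecated base class standing in for something like
// std::unary_function; illustrative only.
template<typename Arg, typename Result>
struct [[deprecated("illustrative only")]] old_unary_base
{
  typedef Arg argument_type;
  typedef Result result_type;
};

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"

// Deriving from the deprecated base would normally warn; inside the
// push/pop region that warning is suppressed.
struct negate_int : old_unary_base<int, int>
{
  int operator()(int x) const { return -x; }
};

#pragma GCC diagnostic pop
// From here on -Wdeprecated-declarations is back to its previous state.
```

Because the pop restores the previous state, code that follows the region
still gets the warning for its own uses of deprecated declarations.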

Re: [Fortran, Patch, PR101100, v1] Fix ICE when compiling with caf-lib and using proc_pointer component.

2024-09-24 Thread Andre Vehreschild

Hi Harald,

thanks for the review. Committed as gcc-15-3827-g0c0d79c783f

Thanks again,
Andre

On Mon, 23 Sep 2024 21:25:55 +0200
Harald Anlauf  wrote:

> Hi Andre,
>
> Am 19.09.24 um 14:19 schrieb Andre Vehreschild:
> > Hi all,
> >
> > the attached patch fixes an ICE when compiling with -fcoarray=lib and using
> > (proc_-)pointer component in a coarray. The code was looking at the wrong
> > location for the caf-token.
> >
> > Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?
>
> this looks good to me.
>
> Thanks for the patch!
>
> Harald
>
> > Regards,
> > Andre
> > --
> > Andre Vehreschild * Email: vehre ad gmx dot de
>


--
Andre Vehreschild * Email: vehre ad gmx dot de

[PATCH 2/2] Disable add_store_equivs when -fno-expensive-optimizations

2024-09-24 Thread Richard Biener

IRA's add_store_equivs is quadratic in the size of the function in the
worst case, so disable it under -fno-expensive-optimizations, which
means at -O1 and -Og.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

OK?

Thanks,
Richard.

* ira.cc (ira): Gate add_store_equivs on flag_expensive_optimizations.
---
 gcc/ira.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ira.cc b/gcc/ira.cc
index 3936456c4ed..5231f63398e 100644
--- a/gcc/ira.cc
+++ b/gcc/ira.cc
@@ -5739,7 +5739,7 @@ ira (FILE *f)
 combine_and_move_insns ();
 
   /* Gather additional equivalences with memory.  */
-  if (optimize)
+  if (optimize && flag_expensive_optimizations)
 add_store_equivs ();
 
   loop_optimizer_finalize ();
-- 
2.43.0

[PATCH 1/2] rtl-optimization/114855 - slow add_store_equivs in IRA

2024-09-24 Thread Richard Biener

For the testcase in PR114855 at -O1, add_store_equivs shows up as the
main sink for bitmap_set_bit because it uses a bitmap to mark all
seen insns by UID to make sure the forward walk in memref_used_between_p
will find the insn in question.  Given we do have a CFG here, the
function's operation is questionable, and since memref_used_between_p
together with the walk of all insns is obviously quadratic in the
worst case, the whole thing should be re-done.  But for the
testcase, using an sbitmap of size get_max_uid () + 1 gets
bitmap_set_bit off the profile and improves IRA time from 15.58s (8%)
to 3.46s (2%).

Now, given the quadratic behaviour above, I wonder whether we should instead
gate add_store_equivs on optimize > 1 or flag_expensive_optimizations.

Jeff, you added the bitmap in r6-7529-g14d7d4be52585b.  I have no idea
how get_insns () works at this point or which CFG mode we are in, but
a simplification might be to simply verify both insns are in the same
BB and hope that get_insns lets us walk the insns in order there.  Then
we could elide the bitmap completely (with some loss of cases, but
the function comment suggests it is supposed to catch single-BB
cases only anyway?!).

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

OK if that succeeds?

Thanks,
Richard.

PR rtl-optimization/114855
* ira.cc (add_store_equivs): Use sbitmap for tracking
visited insns.
---
 gcc/ira.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/ira.cc b/gcc/ira.cc
index 156541df4e6..3936456c4ed 100644
--- a/gcc/ira.cc
+++ b/gcc/ira.cc
@@ -3838,7 +3838,8 @@ update_equiv_regs (void)
 static void
 add_store_equivs (void)
 {
-  auto_bitmap seen_insns;
+  auto_sbitmap seen_insns (get_max_uid () + 1);
+  bitmap_clear (seen_insns);
 
   for (rtx_insn *insn = get_insns (); insn; insn = NEXT_INSN (insn))
 {
-- 
2.43.0
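
The idea behind the change above, a dense set preallocated for every possible
ID instead of a sparse one, can be sketched outside GCC as follows.  The class
and function names are hypothetical, and std::vector<bool> only loosely
stands in for an sbitmap of size get_max_uid () + 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense "seen" set keyed by a small integer ID.  All storage
// is allocated up front, so marking and testing are plain indexed accesses.
class DenseSeenSet
{
public:
  explicit DenseSeenSet(std::size_t max_id) : bits_(max_id + 1, false) {}

  void mark(std::size_t id) { bits_.at(id) = true; }
  bool seen(std::size_t id) const { return bits_.at(id); }

private:
  std::vector<bool> bits_;
};

// Count the IDs that were already seen earlier in the sequence.
inline std::size_t count_repeats(const std::vector<std::size_t> &ids,
                                 std::size_t max_id)
{
  DenseSeenSet seen(max_id);
  std::size_t repeats = 0;
  for (std::size_t id : ids)
    {
      if (seen.seen(id))
        ++repeats;
      seen.mark(id);
    }
  return repeats;
}
```

The trade-off is memory proportional to the largest ID rather than to the
number of IDs actually marked, which is cheap when nearly every ID ends up
in the set anyway.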

Re: [PATCH v3 1/4] tree-optimization/116024 - simplify C1-X cmp C2 for UB-on-overflow types

2024-09-24 Thread Artemiy Volkov

On 9/24/2024 12:16 AM, Jeff Law wrote:
> 
> 
> On 9/23/24 2:32 AM, Artemiy Volkov wrote:
>> Implement a match.pd pattern for C1 - X cmp C2, where C1 and C2 are
>> integer constants and X is of a UB-on-overflow type.  The pattern is
>> simplified to X rcmp C1 - C2 by moving X and C2 to the other side of the
>> comparison (with opposite signs).  If C1 - C2 happens to overflow,
>> replace the whole expression with either a constant 0 or a constant 1
>> node, depending on the comparison operator and the sign of the overflow.
>>
>> This transformation occasionally saves load-immediate /
>> subtraction instructions, e.g. the following statement:
>>
>> 10 - (int) x <= 9;
>>
>> now compiles to
>>
>> sgt a0,a0,zero
>>
>> instead of
>>
>> li  a5,10
>> sub a0,a5,a0
>> slti    a0,a0,10
>>
>> on 32-bit RISC-V.
>>
>> Additional examples can be found in the newly added test file. This
>> patch has been bootstrapped and regtested on aarch64, x86_64, and
>> i386, and additionally regtested on riscv32.  Existing tests were
>> adjusted where necessary.
>>
>> gcc/ChangeLog:
>>
>> PR tree-optimization/116024
>>  * match.pd: New transformation around integer comparison.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * gcc.dg/tree-ssa/pr116024.c: New test.
>>  * gcc.dg/pr67089-6.c: Adjust.
> I think Richi is already engaged on the review side, so I'll let him own
> it, especially since he knows more about match.pd patterns than I do.
> 
> 
>> +int32_t i1(void)
>> +{
>> +  int32_t l = 2;
>> +  l = 10 - (int32_t)f();
>> +  return l <= 9; // f() > 0
>> +}
> Why the initialization of l = 2?  It's trivially dead and I expect it to 
> be cleaned up early in the optimization pipeline.  It looks like most of 
> the tests in the series have this trivially dead initialization code.

Hi Jeff,

These initializers come from the original reduced testcase in 116024 and 
are completely superfluous - I will remove them before resending the series.

Thanks,
Artemiy

> 
> Jeff
> 
> 
> 
>
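
For reference, the equivalence behind the 10 - x <= 9 example discussed above
can be checked with a small brute-force sketch.  The function names are made
up for this illustration, and the check is only valid on ranges where 10 - x
does not overflow, which is the same no-overflow assumption the pattern
relies on.

```cpp
#include <cassert>

// Original form: C1 - x cmp C2 with C1 = 10, C2 = 9.
// Only meaningful while 10 - x does not overflow.
constexpr bool original_form(int x) { return 10 - x <= 9; }

// Simplified form: move x and C2 across the comparison to get
// x >= C1 - C2, i.e. x >= 1, which is x > 0.
constexpr bool simplified_form(int x) { return x > 0; }

// Check that both forms agree for every x in [lo, hi].
// The caller must pick a range where 10 - x stays representable.
inline bool forms_agree(int lo, int hi)
{
  for (int x = lo; x <= hi; ++x)
    if (original_form(x) != simplified_form(x))
      return false;
  return true;
}
```

Note how the comparison direction flips when x moves to the other side, which
is what the "rcmp" in the description refers to.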

Re: [PATCH v1 2/2] RISC-V: Add testcases for form 3 of signed vector SAT_ADD

2024-09-24 Thread Robin Dapp

LGTM (in case you haven't committed it yet).

-- 
Regards
 Robin

Re: [PATCH v1 2/2] RISC-V: Add testcases for form 2 of signed vector SAT_ADD

2024-09-24 Thread Robin Dapp

LGTM.

-- 
Regards
 Robin

Re: [PATCH] Update email in MAINTAINERS file.

2024-09-24 Thread Aldy Hernandez

Pushed attached patch.

Thanks.
Aldy

On Tue, Sep 24, 2024 at 10:09 AM Filip Kastl  wrote:

> On Mon 2024-09-23 09:43:28, Aldy Hernandez wrote:
> > From: Aldy Hernandez 
> >
> > ChangeLog:
> >
> >   * MAINTAINERS: Update email and add myself to DCO.
> > ---
> >  MAINTAINERS | 9 +
> >  1 file changed, 5 insertions(+), 4 deletions(-)
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index cfd96c9f33e..e9fafaf45a7 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -116,7 +116,7 @@ riscv port  Jim Wilson  <
> jim.wilson@gmail.com>
> >  rs6000/powerpc port David Edelsohn  
> >  rs6000/powerpc port Segher Boessenkool  <
> seg...@kernel.crashing.org>
> >  rs6000/powerpc port Kewen Lin   
> > -rs6000 vector extns Aldy Hernandez  
> > +rs6000 vector extns Aldy Hernandez  
> >  rx port Nick Clifton
> >  s390 port   Ulrich Weigand  
> >  s390 port   Andreas Krebbel 
> > @@ -213,7 +213,7 @@ c++ runtime libsJonathan Wakely <
> jwak...@redhat.com>
> >  c++ runtime libs special modes  François Dumont 
> >  fixincludes Bruce Korb  
> >  *gimpl* Jakub Jelinek   
> > -*gimpl* Aldy Hernandez  
> > +*gimpl* Aldy Hernandez  
> >  *gimpl* Jason Merrill   
> >  gcse.cc Jeff Law
> >  global opt frameworkJeff Law
> > @@ -240,7 +240,7 @@ option handling Joseph Myers<
> josmy...@redhat.com>
> >  middle-end  Jeff Law
> >  middle-end  Ian Lance Taylor
> >  middle-end  Richard Biener  
> > -*vrp, rangerAldy Hernandez  
> > +*vrp, rangerAldy Hernandez  
> >  *vrp, rangerAndrew MacLeod  
> >  tree-ssaAndrew MacLeod  
> >  tree browser/unparser   Sebastian Pop   
> > @@ -518,7 +518,7 @@ Daniel Hellstromdanielh <
> dan...@gaisler.com>
> >  Fergus Henderson-   
> >  Richard Henderson   rth 
> >  Stuart Hendersonshenders
> > -Aldy Hernandez  aldyh   
> > +Aldy Hernandez  aldyh   
> >  Philip Herron   redbrain<
> herron.phi...@googlemail.com>
> >  Marius Hillenbrand  -   
> >  Matthew Hiller  -   
> > @@ -948,3 +948,4 @@ Jonathan Wakely <
> jwak...@redhat.com>
> >  Alexander Westbrooks >
> >  Chung-Ju Wu 
> >  Pengxuan Zheng  <
> quic_pzh...@quicinc.com>
> > +Aldy Hernandez  
> > --
> > 2.43.0
> >
>
> Hi Aldy,
>
> Could you move your entry in the DCO list so that it respects surname
> alphabetical order, please?  Your name should be between Robin Dapp and
> Michal
> Jires.
>
> Thanks,
> Filip Kastl
>
>
From 34366176046351250e1beb578664d926fbdd50c9 Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Tue, 24 Sep 2024 11:40:52 +0200
Subject: [PATCH] Alphabetize my entry in MAINTAINER's DCO list.

ChangeLog:

	* MAINTAINERS: Move my entry in DCO list into alphabetical order.
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 3b4cf9d20d8..47b5915e9f8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -917,6 +917,7 @@ information.
 Juergen Christ  
 Robin Dapp  
 Robin Dapp  
+Aldy Hernandez  
 Michal Jires
 Matthias Kretz  
 Prathamesh Kulkarni 
@@ -949,4 +950,3 @@ Jonathan Wakely 
 Alexander Westbrooks
 Chung-Ju Wu 
 Pengxuan Zheng  
-Aldy Hernandez  
-- 
2.43.0

Re: [PATCH] [x86] Define VECTOR_STORE_FLAG_VALUE

2024-09-24 Thread Uros Bizjak

On Tue, Sep 24, 2024 at 11:23 AM liuhongt  wrote:
>
> Return constm1_rtx when GET_MODE_CLASS (MODE) == MODE_VECTOR_INT.
> Otherwise NULL_RTX.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ready to push to trunk.
>
> gcc/ChangeLog:
>
> * config/i386/i386.h (VECTOR_STORE_FLAG_VALUE): New macro.
>
> gcc/testsuite/ChangeLog:
> * gcc.dg/rtl/x86_64/vector_eq.c: New test.
> ---
>  gcc/config/i386/i386.h  |  5 +++-
>  gcc/testsuite/gcc.dg/rtl/x86_64/vector_eq.c | 26 +
>  2 files changed, 30 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/rtl/x86_64/vector_eq.c
>
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index c1ec92ffb15..b12be41424f 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -899,7 +899,10 @@ extern const char *host_detect_local_cpu (int argc, 
> const char **argv);
> and give entire struct the alignment of an int.  */
>  /* Required on the 386 since it doesn't have bit-field insns.  */
>  #define PCC_BITFIELD_TYPE_MATTERS 1
> -
> +
> +#define VECTOR_STORE_FLAG_VALUE(MODE) \
> +  (GET_MODE_CLASS (MODE) == MODE_VECTOR_INT ? constm1_rtx : NULL_RTX)
> +
>  /* Standard register usage.  */
>
>  /* This processor has special stack-like registers.  See reg-stack.cc
> diff --git a/gcc/testsuite/gcc.dg/rtl/x86_64/vector_eq.c 
> b/gcc/testsuite/gcc.dg/rtl/x86_64/vector_eq.c
> new file mode 100644
> index 000..b82603d0b64
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/rtl/x86_64/vector_eq.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile { target x86_64-*-* } } */

target { { i?86-*-* x86_64-*-* } && lp64 }

Uros.

> +/* { dg-additional-options "-O2 -march=x86-64-v3" } */
> +
> +typedef int v4si __attribute__((vector_size(16)));
> +
> +v4si __RTL (startwith ("vregs")) foo (void)
> +{
> +(function "foo"
> +  (insn-chain
> +(block 2
> +  (edge-from entry (flags "FALLTHRU"))
> +  (cnote 1 [bb 2] NOTE_INSN_BASIC_BLOCK)
> +  (cnote 2 NOTE_INSN_FUNCTION_BEG)
> +  (cinsn 3 (set (reg:V4SI <0>) (const_vector:V4SI [(const_int 0) 
> (const_int 0) (const_int 0) (const_int 0)])))
> +  (cinsn 5 (set (reg:V4SI <2>)
> +   (eq:V4SI (reg:V4SI <0>) (reg:V4SI <1>
> +  (cinsn 6 (set (reg:V4SI <3>) (reg:V4SI <2>)))
> +  (cinsn 7 (set (reg:V4SI xmm0) (reg:V4SI <3>)))
> +  (edge-to exit (flags "FALLTHRU"))
> +)
> +  )
> + (crtl (return_rtx (reg/i:V4SI xmm0)))
> +)
> +}
> +
> +/* { dg-final { scan-assembler-not "vpxor" } } */
> --
> 2.31.1
>
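
As a source-level analogue of the value the macro above describes, the GNU
vector extensions produce -1 (all bits set) in each lane where an integer
vector comparison holds and 0 elsewhere.  The sketch below assumes GCC or
Clang with those extensions; make_v4si, lanes_equal and lane are hypothetical
helper names for this example.

```cpp
#include <cassert>

// GNU vector extension type, same shape as in the test above.
typedef int v4si __attribute__((vector_size(16)));

// Build a vector from four scalars.
inline v4si make_v4si(int a, int b, int c, int d)
{
  v4si v = { a, b, c, d };
  return v;
}

// Element-wise equality: each result lane is -1 where the operands are
// equal and 0 otherwise.
inline v4si lanes_equal(v4si a, v4si b)
{
  return a == b;
}

// Read one lane of a vector.
inline int lane(v4si v, int i)
{
  return v[i];
}
```

Knowing that a true lane is exactly -1 is what lets a compiler simplify a
comparison of a value with itself, or of known constants, into a constant
vector instead of emitting the compare.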

Re: [PATCH] Update email in MAINTAINERS file.

2024-09-24 Thread Filip Kastl

On Tue 2024-09-24 11:43:47, Aldy Hernandez wrote:
> Pushed attached patch.
> 
> Thanks.
> Aldy
> 

Nice.

Thanks!
Filip

> On Tue, Sep 24, 2024 at 10:09 AM Filip Kastl  wrote:
> 
> > On Mon 2024-09-23 09:43:28, Aldy Hernandez wrote:
> > > From: Aldy Hernandez 
> > >
> > > ChangeLog:
> > >
> > >   * MAINTAINERS: Update email and add myself to DCO.
> > > ---
> > >  MAINTAINERS | 9 +
> > >  1 file changed, 5 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/MAINTAINERS b/MAINTAINERS
> > > index cfd96c9f33e..e9fafaf45a7 100644
> > > --- a/MAINTAINERS
> > > +++ b/MAINTAINERS
> > > @@ -116,7 +116,7 @@ riscv port  Jim Wilson  <
> > jim.wilson@gmail.com>
> > >  rs6000/powerpc port David Edelsohn  
> > >  rs6000/powerpc port Segher Boessenkool  <
> > seg...@kernel.crashing.org>
> > >  rs6000/powerpc port Kewen Lin   
> > > -rs6000 vector extns Aldy Hernandez  
> > > +rs6000 vector extns Aldy Hernandez  
> > >  rx port Nick Clifton
> > >  s390 port   Ulrich Weigand  
> > >  s390 port   Andreas Krebbel 
> > > @@ -213,7 +213,7 @@ c++ runtime libsJonathan Wakely <
> > jwak...@redhat.com>
> > >  c++ runtime libs special modes  François Dumont 
> > >  fixincludes Bruce Korb  
> > >  *gimpl* Jakub Jelinek   
> > > -*gimpl* Aldy Hernandez  
> > > +*gimpl* Aldy Hernandez  
> > >  *gimpl* Jason Merrill   
> > >  gcse.cc Jeff Law
> > >  global opt frameworkJeff Law
> > > @@ -240,7 +240,7 @@ option handling Joseph Myers<
> > josmy...@redhat.com>
> > >  middle-end  Jeff Law
> > >  middle-end  Ian Lance Taylor
> > >  middle-end  Richard Biener  
> > > -*vrp, rangerAldy Hernandez  
> > > +*vrp, rangerAldy Hernandez  
> > >  *vrp, rangerAndrew MacLeod  
> > >  tree-ssaAndrew MacLeod  
> > >  tree browser/unparser   Sebastian Pop   
> > > @@ -518,7 +518,7 @@ Daniel Hellstromdanielh <
> > dan...@gaisler.com>
> > >  Fergus Henderson-   
> > >  Richard Henderson   rth 
> > >  Stuart Hendersonshenders
> > > -Aldy Hernandez  aldyh   
> > > +Aldy Hernandez  aldyh   
> > >  Philip Herron   redbrain<
> > herron.phi...@googlemail.com>
> > >  Marius Hillenbrand  -   
> > >  Matthew Hiller  -   
> > > @@ -948,3 +948,4 @@ Jonathan Wakely <
> > jwak...@redhat.com>
> > >  Alexander Westbrooks > >
> > >  Chung-Ju Wu 
> > >  Pengxuan Zheng  <
> > quic_pzh...@quicinc.com>
> > > +Aldy Hernandez  
> > > --
> > > 2.43.0
> > >
> >
> > Hi Aldy,
> >
> > Could you move your entry in the DCO list so that it respects surname
> > alphabetical order, please?  Your name should be between Robin Dapp and
> > Michal
> > Jires.
> >
> > Thanks,
> > Filip Kastl
> >
> >

> From 34366176046351250e1beb578664d926fbdd50c9 Mon Sep 17 00:00:00 2001
> From: Aldy Hernandez 
> Date: Tue, 24 Sep 2024 11:40:52 +0200
> Subject: [PATCH] Alphabetize my entry in MAINTAINER's DCO list.
> 
> ChangeLog:
> 
>   * MAINTAINERS: Move my entry in DCO list into alphabetical order.
> ---
>  MAINTAINERS | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 3b4cf9d20d8..47b5915e9f8 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -917,6 +917,7 @@ information.
>  Juergen Christ  
>  Robin Dapp  
>  Robin Dapp  
> +Aldy Hernandez  
>  Michal Jires
>  Matthias Kretz  
>  Prathamesh Kulkarni 
> @@ -949,4 +950,3 @@ Jonathan Wakely 
> 
>  Alexander Westbrooks
>  Chung-Ju Wu 
>  Pengxuan Zheng  
> -Aldy Hernandez  
> -- 
> 2.43.0
>

[PATCH] tree-optimization/114855 - slow VRP due to equiv oracle queries

2024-09-24 Thread Richard Biener

For the testcase in PR114855 VRP takes 320.41s (23%) (after mitigating
backwards threader slowness).  This is mostly due to the bitmap check
in equiv_oracle::find_equiv_dom.  The following switches this bitmap
to tree view, trading the linear search for an O(log N) one, which
improves VRP time to 54.54s (5%).
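
As an illustration only (not part of the patch), the list-versus-tree
trade-off can be sketched with standard containers.  ListBitmap and
TreeBitmap below are hypothetical stand-ins, not GCC's bitmap
implementation, and std::map is not necessarily the same kind of tree
that GCC's bitmap uses internally:

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <map>
#include <utility>

// Hypothetical stand-ins, not GCC's bitmap: bits live in 64-bit chunks
// keyed by chunk index, and only non-empty chunks are stored.

// "List view": chunks kept in a sorted linked list, lookup walks the list.
struct ListBitmap {
  std::list<std::pair<unsigned, std::uint64_t>> chunks;

  void set(unsigned bit) {
    const unsigned idx = bit / 64;
    auto it = chunks.begin();
    while (it != chunks.end() && it->first < idx)
      ++it;  // linear in the number of stored chunks
    if (it == chunks.end() || it->first != idx)
      it = chunks.emplace(it, idx, std::uint64_t{0});
    it->second |= std::uint64_t{1} << (bit % 64);
  }

  bool test(unsigned bit) const {
    const unsigned idx = bit / 64;
    for (const auto &c : chunks) {
      if (c.first == idx)
        return (c.second >> (bit % 64)) & 1;
      if (c.first > idx)
        break;  // list is sorted, the chunk does not exist
    }
    return false;
  }
};

// "Tree view": the same chunks in a balanced tree, lookup is O(log N).
struct TreeBitmap {
  std::map<unsigned, std::uint64_t> chunks;

  void set(unsigned bit) {
    chunks[bit / 64] |= std::uint64_t{1} << (bit % 64);
  }

  bool test(unsigned bit) const {
    auto it = chunks.find(bit / 64);
    return it != chunks.end() && ((it->second >> (bit % 64)) & 1);
  }
};

// Both views answer membership queries identically; only the cost differs.
inline void check_views_agree() {
  ListBitmap l;
  TreeBitmap t;
  for (unsigned b : {5u, 64u, 1000u, 70000u}) {
    l.set(b);
    t.set(b);
  }
  for (unsigned b : {5u, 6u, 64u, 1000u, 69999u, 70000u})
    assert(l.test(b) == t.test(b));
}
```

With many scattered bits and random-access membership tests, as in the
find_equiv_dom query pattern described above, the list walk is the part
that dominates.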

Bootstrap and regtest running on x86_64-unknown-linux-gnu, OK if that
succeeds?

Thanks,
Richard.

PR tree-optimization/114855
* value-relation.cc (equiv_oracle::equiv_oracle): Switch
m_equiv_set to tree view.
---
 gcc/value-relation.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/value-relation.cc b/gcc/value-relation.cc
index 45722fcd13a..d6ad2dd984f 100644
--- a/gcc/value-relation.cc
+++ b/gcc/value-relation.cc
@@ -321,6 +321,7 @@ equiv_oracle::equiv_oracle ()
   m_equiv.create (0);
   m_equiv.safe_grow_cleared (last_basic_block_for_fn (cfun) + 1);
   m_equiv_set = BITMAP_ALLOC (&m_bitmaps);
+  bitmap_tree_view (m_equiv_set);
   obstack_init (&m_chain_obstack);
   m_self_equiv.create (0);
   m_self_equiv.safe_grow_cleared (num_ssa_names + 1);
-- 
2.43.0

Re: [patch, fortran] Matmul and dot_product for unsigned

2024-09-24 Thread Andre Vehreschild

Hi Thomas,

thanks for your answers. I am ok with the patch.

- Andre

On Mon, 23 Sep 2024 15:07:31 +0200
Thomas Koenig  wrote:

> Hello Andre and everybody else?
>
> Any more comments on the matmul patch? The other ones depend on
> it, so I would like to commit (unless there are further
> questions, of course).
>
> Best regards
>
>   Thomas


--
Andre Vehreschild * Email: vehre ad gmx dot de

RE: [PATCH v2] Widening-Mul: Fix one ICE for SAT_SUB matching operand checking

2024-09-24 Thread Li, Pan2

Thanks Richard for comments.

> Since you're creating the call with op_0/op_1 shouldn't you _only_ check 
> support
> for op_type operation and not lhs_type?

Yes, you are right. Checking the operand makes much more sense to me. Let me
update in v3.

Pan

-----Original Message-----
From: Richard Biener  
Sent: Tuesday, September 24, 2024 3:42 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v2] Widening-Mul: Fix one ICE for SAT_SUB matching operand 
checking

On Tue, Sep 24, 2024 at 9:13 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to fix the following ICE for -O2 -m32 of x86_64.
>
> during RTL pass: expand
> JackMidiAsyncWaitQueue.cpp.cpp: In function 'void DequeueEvent(unsigned
> int)':
> JackMidiAsyncWaitQueue.cpp.cpp:3:6: internal compiler error: in
> expand_fn_using_insn, at internal-fn.cc:263
> 3 | void DequeueEvent(unsigned frame) {
>   |  ^~~~
> 0x27b580d diagnostic_context::diagnostic_impl(rich_location*,
> diagnostic_metadata const*, diagnostic_option_id, char const*,
> __va_list_tag (*) [1], diagnostic_t)
> ???:0
> 0x27c4a3f internal_error(char const*, ...)
> ???:0
> 0x27b3994 fancy_abort(char const*, int, char const*)
> ???:0
> 0xf25ae5 expand_fn_using_insn(gcall*, insn_code, unsigned int, unsigned int)
> ???:0
> 0xf2a124 expand_direct_optab_fn(internal_fn, gcall*, optab_tag, unsigned int)
> ???:0
> 0xf2c87c expand_SAT_SUB(internal_fn, gcall*)
> ???:0
>
> We allowed the operand convert when matching SAT_SUB in match.pd, to support
> the zip benchmark SAT_SUB pattern.  Aka,
>
> (convert? (minus (convert1? @0) (convert1? @1))) for below sample code.
>
> void test (uint16_t *x, unsigned b, unsigned n)
> {
>   unsigned a = 0;
>   register uint16_t *p = x;
>
>   do {
> a = *--p;
> *p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB
>   } while (--n);
> }
>
> The pattern match for SAT_SUB itself may also act on below scalar sample
> code too.
>
> unsigned long long GetTimeFromFrames(int);
> unsigned long long GetMicroSeconds();
>
> void DequeueEvent(unsigned frame) {
>   long long frame_time = GetTimeFromFrames(frame);
>   unsigned long long current_time = GetMicroSeconds();
>   DequeueEvent(frame_time < current_time ? 0 : frame_time - current_time);
> }
>
> Aka:
>
> uint32_t a = (uint32_t)SAT_SUB(uint64_t, uint64_t);
>
> Then there will be a problem when ia32 or -m32 is given when compiling.
> Because we only check the lhs (aka uint32_t) type is supported by ifn
> and missed the operand (aka uint64_t).  Mostly DImode is disabled for
> 32 bits target like ia32 or rv32gcv, and then trigger ICE when expanding.
>
> The below test suites are passed for this patch.
> * The rv64gcv fully regression test.
> * The x86 bootstrap test.
> * The x86 fully regression test.
>
> PR middle-end/116814
>
> gcc/ChangeLog:
>
> * tree-ssa-math-opts.cc (build_saturation_binary_arith_call): Add
> ifn is_supported check for operand TREE type.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/torture/pr116814-1.C: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/testsuite/g++.dg/torture/pr116814-1.C | 12 
>  gcc/tree-ssa-math-opts.cc | 23 +++
>  2 files changed, 27 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/torture/pr116814-1.C
>
> diff --git a/gcc/testsuite/g++.dg/torture/pr116814-1.C 
> b/gcc/testsuite/g++.dg/torture/pr116814-1.C
> new file mode 100644
> index 000..dd6f29daa7c
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/torture/pr116814-1.C
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target { ia32 } } } */
> +/* { dg-options "-O2" } */
> +
> +unsigned long long GetTimeFromFrames(int);
> +unsigned long long GetMicroSeconds();
> +
> +void DequeueEvent(unsigned frame) {
> +  long long frame_time = GetTimeFromFrames(frame);
> +  unsigned long long current_time = GetMicroSeconds();
> +
> +  DequeueEvent(frame_time < current_time ? 0 : frame_time - current_time);
> +}
> diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> index d61668aacfc..361761cedef 100644
> --- a/gcc/tree-ssa-math-opts.cc
> +++ b/gcc/tree-ssa-math-opts.cc
> @@ -4042,15 +4042,22 @@ build_saturation_binary_arith_call 
> (gimple_stmt_iterator *gsi, gphi *phi,
> internal_fn fn, tree lhs, tree op_0,
> tree op_1)
>  {
> -  if (direct_internal_fn_supported_p (fn, TREE_TYPE (lhs), 
> OPTIMIZE_FOR_BOTH))
> -{
> -  gcall *call = gimple_build_call_internal (fn, 2, op_0, op_1);
> -  gimple_call_set_lhs (call, lhs);
> -  gsi_insert_before (gsi, call, GSI_SAME_STMT);
> +  tree lhs_type = TREE_TYPE (lhs);
> +  tree op_type = TREE_TYPE (op_0);
>
> -  gimple_stmt_iterator psi = gsi_for_stmt (phi);
> -  remove_phi_node (&psi, /* releas

Re: [Fortran, Patch, PR84870, v1] Fix ICE and allocated memory not assigned correctly.

2024-09-24 Thread Andre Vehreschild

Hi Harald,

thanks for the review. Committed as gcc-15-3825-gf5035d7d015

Thanks again,
Andre

On Mon, 23 Sep 2024 21:19:40 +0200
Harald Anlauf  wrote:

> Hi Andre,
>
> Am 19.09.24 um 16:01 schrieb Andre Vehreschild:
> > Hi all,
> >
> > in PR84870 an ICE was reported, that has been fixed in the meantime by some
> > other patch. Nevertheless did a testcase reveal that the memory handling
> > still was not correct. I.e. the test case in the patch was answering 2 for
> > both x.b.a and y.b.a which is not correct.
> >
> > For a coarray all memory is allocated using an array descriptor. For scalars
> > just a temporary descriptor is created and handed to the caf-register
> > routine. The error here was, that the memory now handed back in the
> > temporary descriptor was not used for the memory in the component, thus the
> > pointer in the component was not updated. The patch fixes this.
> >
> > Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?
>
> this looks good to me.
>
> Thanks for the patch!
>
> Harald
>
> > Regards,
> > Andre
> > --
> > Andre Vehreschild * Email: vehre ad gmx dot de
>


--
Andre Vehreschild * Email: vehre ad gmx dot de

[PATCH v2] Widening-Mul: Fix one ICE for SAT_SUB matching operand checking

2024-09-24 Thread pan2 . li

From: Pan Li 

This patch would like to fix the following ICE for -O2 -m32 of x86_64.

during RTL pass: expand
JackMidiAsyncWaitQueue.cpp.cpp: In function 'void DequeueEvent(unsigned
int)':
JackMidiAsyncWaitQueue.cpp.cpp:3:6: internal compiler error: in
expand_fn_using_insn, at internal-fn.cc:263
3 | void DequeueEvent(unsigned frame) {
  |  ^~~~
0x27b580d diagnostic_context::diagnostic_impl(rich_location*,
diagnostic_metadata const*, diagnostic_option_id, char const*,
__va_list_tag (*) [1], diagnostic_t)
???:0
0x27c4a3f internal_error(char const*, ...)
???:0
0x27b3994 fancy_abort(char const*, int, char const*)
???:0
0xf25ae5 expand_fn_using_insn(gcall*, insn_code, unsigned int, unsigned int)
???:0
0xf2a124 expand_direct_optab_fn(internal_fn, gcall*, optab_tag, unsigned int)
???:0
0xf2c87c expand_SAT_SUB(internal_fn, gcall*)
???:0

We allowed the operand convert when matching SAT_SUB in match.pd, to support
the zip benchmark SAT_SUB pattern.  Aka,

(convert? (minus (convert1? @0) (convert1? @1))) for below sample code.

void test (uint16_t *x, unsigned b, unsigned n)
{
  unsigned a = 0;
  register uint16_t *p = x;

  do {
a = *--p;
*p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB
  } while (--n);
}

The pattern match for SAT_SUB itself may also act on below scalar sample
code too.

unsigned long long GetTimeFromFrames(int);
unsigned long long GetMicroSeconds();

void DequeueEvent(unsigned frame) {
  long long frame_time = GetTimeFromFrames(frame);
  unsigned long long current_time = GetMicroSeconds();
  DequeueEvent(frame_time < current_time ? 0 : frame_time - current_time);
}

Aka:

uint32_t a = (uint32_t)SAT_SUB(uint64_t, uint64_t);

Then there will be a problem when ia32 or -m32 is given when compiling.
Because we only check the lhs (aka uint32_t) type is supported by ifn
and missed the operand (aka uint64_t).  Mostly DImode is disabled for
32 bits target like ia32 or rv32gcv, and then trigger ICE when expanding.

The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.

PR middle-end/116814

gcc/ChangeLog:

* tree-ssa-math-opts.cc (build_saturation_binary_arith_call): Add
ifn is_supported check for operand TREE type.

gcc/testsuite/ChangeLog:

* g++.dg/torture/pr116814-1.C: New test.

Signed-off-by: Pan Li 
---
 gcc/testsuite/g++.dg/torture/pr116814-1.C | 12 
 gcc/tree-ssa-math-opts.cc | 23 +++
 2 files changed, 27 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/pr116814-1.C

diff --git a/gcc/testsuite/g++.dg/torture/pr116814-1.C 
b/gcc/testsuite/g++.dg/torture/pr116814-1.C
new file mode 100644
index 000..dd6f29daa7c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr116814-1.C
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { ia32 } } } */
+/* { dg-options "-O2" } */
+
+unsigned long long GetTimeFromFrames(int);
+unsigned long long GetMicroSeconds();
+
+void DequeueEvent(unsigned frame) {
+  long long frame_time = GetTimeFromFrames(frame);
+  unsigned long long current_time = GetMicroSeconds();
+
+  DequeueEvent(frame_time < current_time ? 0 : frame_time - current_time);
+}
diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
index d61668aacfc..361761cedef 100644
--- a/gcc/tree-ssa-math-opts.cc
+++ b/gcc/tree-ssa-math-opts.cc
@@ -4042,15 +4042,22 @@ build_saturation_binary_arith_call 
(gimple_stmt_iterator *gsi, gphi *phi,
internal_fn fn, tree lhs, tree op_0,
tree op_1)
 {
-  if (direct_internal_fn_supported_p (fn, TREE_TYPE (lhs), OPTIMIZE_FOR_BOTH))
-{
-  gcall *call = gimple_build_call_internal (fn, 2, op_0, op_1);
-  gimple_call_set_lhs (call, lhs);
-  gsi_insert_before (gsi, call, GSI_SAME_STMT);
+  tree lhs_type = TREE_TYPE (lhs);
+  tree op_type = TREE_TYPE (op_0);
 
-  gimple_stmt_iterator psi = gsi_for_stmt (phi);
-  remove_phi_node (&psi, /* release_lhs_p */ false);
-}
+  if (!direct_internal_fn_supported_p (fn, lhs_type, OPTIMIZE_FOR_BOTH))
+return;
+
+  if (lhs_type != op_type
+  && !direct_internal_fn_supported_p (fn, op_type, OPTIMIZE_FOR_BOTH))
+return;
+
+  gcall *call = gimple_build_call_internal (fn, 2, op_0, op_1);
+  gimple_call_set_lhs (call, lhs);
+  gsi_insert_before (gsi, call, GSI_SAME_STMT);
+
+  gimple_stmt_iterator psi = gsi_for_stmt (phi);
+  remove_phi_node (&psi, /* release_lhs_p */ false);
 }
 
 /*
-- 
2.43.0

Re: [PATCH]middle-end: check explicitly for external or constants when checking for loop invariant [PR116817]

2024-09-24 Thread Richard Biener

On Tue, 24 Sep 2024, Tamar Christina wrote:

> > Can you explain how you get to see constant/external defs with a
> > stmt_vec_info?  That's somehow a violation of some inherent invariant in
> > the vectorizer.
> 
> I'm not sure I actually get any. It could be the condition is never hit 
> with a stmt_vec_info. I had assumed however since the condition is part 
> of a gimple_cond and if one of the arguments of the gimple_cond is loop 
> bound, that the condition would be analyzed too.
> 
> So if you're saying you never get a stmt_vec_info for invariants at this 
> point (I assume you could see them in the corresponding slp 
> tree) then maybe checking for the stmt_vec_info is enough.
> 
> However, when I was looking around for how to check for externals I 
> noticed other patterns also check for externals and constants. So I 
> assumed that you could indeed get them.

You usually check that after doing vect_is_simple_use on the SSA name
or constant which internally makes all stmts with a stmt_vec_info
one of the internal def kinds.

So I guess you could do vect_is_simple_use on 'var' as well and check
the 'dt' it will populate.

Richard.

 
> Kind regards,
> Tamar
> 
> 
> 
> From: Richard Biener 
> Sent: Tuesday, September 24, 2024 7:45 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org ; nd ; 
> j...@ventanamicro.com 
> Subject: RE: [PATCH]middle-end: check explicitly for external or constants 
> when checking for loop invariant [PR116817]
> 
> On Mon, 23 Sep 2024, Tamar Christina wrote:
> 
> > I had made the condition too strict before, here's an updated patch:
> >
> > Hi All,
> >
> > The previous check if a value was external was checking
> > !vect_get_internal_def (vinfo, var) but this of course isn't completely
> > right as they could be reductions etc.
> >
> > This changes the check to just explicitly look at externals and constants.
> > Note that reductions remain unhandled here, but we don't support codegen of
> > boolean reductions today anyway.
> 
> Can you explain how you get to see constant/external defs with a
> stmt_vec_info?  That's somehow a violation of some inherent
> invariant in the vectorizer.
> 
> Richard.
> 
> > So at the time we do, this would have to be handled as well in
> > lowering.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf,
> > x86_64-pc-linux-gnu -m32, -m64 and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> >PR tree-optimization/116817
> >* tree-vect-patterns.cc (vect_recog_bool_pattern): Check for const or
> >externals.
> >
> > gcc/testsuite/ChangeLog:
> >
> >PR tree-optimization/116817
> >* g++.dg/vect/pr116817.cc: New test.
> >
> > -- inline copy of patch --
> >
> > diff --git a/gcc/testsuite/g++.dg/vect/pr116817.cc 
> > b/gcc/testsuite/g++.dg/vect/pr116817.cc
> > new file mode 100644
> > index 
> > ..7e28982fb138c24f956aedb03fa454d9d858
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/vect/pr116817.cc
> > @@ -0,0 +1,16 @@
> > +/* { dg-do compile } */
> > +/* { dg-additional-options "-O3" } */
> > +
> > +int main_ulData0;
> > +unsigned *main_pSrcBuffer;
> > +int main(void) {
> > +  int iSrc = 0;
> > +  bool bData0;
> > +  for (; iSrc < 4; iSrc++) {
> > +if (bData0)
> > +  main_pSrcBuffer[iSrc] = main_ulData0;
> > +else
> > +  main_pSrcBuffer[iSrc] = 0;
> > +bData0 = !bData0;
> > +  }
> > +}
> > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > index 
> > e7e877dd2adb55262822f1660f8d92b42d44e6d0..f0298b2ab97a1e7dd0d943340e1389c3c0fa796e
> >  100644
> > --- a/gcc/tree-vect-patterns.cc
> > +++ b/gcc/tree-vect-patterns.cc
> > @@ -6062,12 +6062,15 @@ vect_recog_bool_pattern (vec_info *vinfo,
> >if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE)
> >return NULL;
> >
> > +  stmt_vec_info var_def_info = vinfo->lookup_def (var);
> >if (check_bool_pattern (var, vinfo, bool_stmts))
> >var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo);
> >else if (integer_type_for_mask (var, vinfo))
> >return NULL;
> >else if (TREE_CODE (TREE_TYPE (var)) == BOOLEAN_TYPE
> > -&& !vect_get_internal_def (vinfo, var))
> > +&& (!var_def_info
> > +|| STMT_VINFO_DEF_TYPE (var_def_info) == vect_external_def
> > +|| STMT_VINFO_DEF_TYPE (var_def_info) == 
> > vect_constant_def))
> >{
> >  /* If the condition is already a boolean then manually convert it 
> > to a
> > mask of the given integer type but don't set a vectype.  */
> >
> 
> --
> Richard Biener 
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146,

[PATCH v3] Widening-Mul: Fix one ICE for SAT_SUB matching operand checking

2024-09-24 Thread pan2 . li

From: Pan Li 

This patch would like to fix the following ICE for -O2 -m32 of x86_64.

during RTL pass: expand
JackMidiAsyncWaitQueue.cpp.cpp: In function 'void DequeueEvent(unsigned
int)':
JackMidiAsyncWaitQueue.cpp.cpp:3:6: internal compiler error: in
expand_fn_using_insn, at internal-fn.cc:263
3 | void DequeueEvent(unsigned frame) {
  |  ^~~~
0x27b580d diagnostic_context::diagnostic_impl(rich_location*,
diagnostic_metadata const*, diagnostic_option_id, char const*,
__va_list_tag (*) [1], diagnostic_t)
???:0
0x27c4a3f internal_error(char const*, ...)
???:0
0x27b3994 fancy_abort(char const*, int, char const*)
???:0
0xf25ae5 expand_fn_using_insn(gcall*, insn_code, unsigned int, unsigned int)
???:0
0xf2a124 expand_direct_optab_fn(internal_fn, gcall*, optab_tag, unsigned int)
???:0
0xf2c87c expand_SAT_SUB(internal_fn, gcall*)
???:0

We allowed the operand convert when matching SAT_SUB in match.pd, to support
the zip benchmark SAT_SUB pattern.  Aka,

(convert? (minus (convert1? @0) (convert1? @1))) for below sample code.

void test (uint16_t *x, unsigned b, unsigned n)
{
  unsigned a = 0;
  register uint16_t *p = x;

  do {
a = *--p;
*p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB
  } while (--n);
}

The pattern match for SAT_SUB itself may also act on below scalar sample
code too.

unsigned long long GetTimeFromFrames(int);
unsigned long long GetMicroSeconds();

void DequeueEvent(unsigned frame) {
  long long frame_time = GetTimeFromFrames(frame);
  unsigned long long current_time = GetMicroSeconds();
  DequeueEvent(frame_time < current_time ? 0 : frame_time - current_time);
}

Aka:

uint32_t a = (uint32_t)SAT_SUB(uint64_t, uint64_t);

Then there will be a problem when ia32 or -m32 is given when compiling.
Because we only check the lhs (aka uint32_t) type is supported by ifn
instead of the operand (aka uint64_t).  Mostly DImode is disabled for
32 bits target like ia32 or rv32gcv, and then trigger ICE when expanding.
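
As an illustration only (not part of the patch), the shape of the
problematic case can be written out in plain C++.  The helper names
below are hypothetical.  The point is that the saturating subtraction
itself is a 64-bit operation and only its result is narrowed, so
support has to be judged on the operand type:

```cpp
#include <cassert>
#include <cstdint>

// Branch form of unsigned saturating subtraction on 64-bit operands.
constexpr std::uint64_t sat_sub_u64(std::uint64_t a, std::uint64_t b) {
  return a >= b ? a - b : 0;
}

// The narrowing happens after the 64-bit operation, mirroring
//   uint32_t a = (uint32_t)SAT_SUB(uint64_t, uint64_t);
constexpr std::uint32_t sat_sub_then_truncate(std::uint64_t a,
                                              std::uint64_t b) {
  return static_cast<std::uint32_t>(sat_sub_u64(a, b));
}

inline void check_sat_sub() {
  assert(sat_sub_u64(3, 5) == 0);             // saturates at zero
  assert(sat_sub_then_truncate(10, 4) == 6);  // ordinary case
  // 0x100000005 - 3 = 0x100000002, the low 32 bits are 2.
  assert(sat_sub_then_truncate(0x100000005ull, 3) == 2);
}
```

A check that only looks at the 32-bit result type says nothing about
whether the target can do the 64-bit operation.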

The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.

PR middle-end/116814

gcc/ChangeLog:

* tree-ssa-math-opts.cc (build_saturation_binary_arith_call): Make
ifn is_supported type check based on operand instead of lhs.

gcc/testsuite/ChangeLog:

* g++.dg/torture/pr116814-1.C: New test.

Signed-off-by: Pan Li 
---
 gcc/testsuite/g++.dg/torture/pr116814-1.C | 12 
 gcc/tree-ssa-math-opts.cc |  2 +-
 2 files changed, 13 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/pr116814-1.C

diff --git a/gcc/testsuite/g++.dg/torture/pr116814-1.C 
b/gcc/testsuite/g++.dg/torture/pr116814-1.C
new file mode 100644
index 000..dd6f29daa7c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr116814-1.C
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { ia32 } } } */
+/* { dg-options "-O2" } */
+
+unsigned long long GetTimeFromFrames(int);
+unsigned long long GetMicroSeconds();
+
+void DequeueEvent(unsigned frame) {
+  long long frame_time = GetTimeFromFrames(frame);
+  unsigned long long current_time = GetMicroSeconds();
+
+  DequeueEvent(frame_time < current_time ? 0 : frame_time - current_time);
+}
diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
index d61668aacfc..8c622514dbd 100644
--- a/gcc/tree-ssa-math-opts.cc
+++ b/gcc/tree-ssa-math-opts.cc
@@ -4042,7 +4042,7 @@ build_saturation_binary_arith_call (gimple_stmt_iterator 
*gsi, gphi *phi,
internal_fn fn, tree lhs, tree op_0,
tree op_1)
 {
-  if (direct_internal_fn_supported_p (fn, TREE_TYPE (lhs), OPTIMIZE_FOR_BOTH))
+  if (direct_internal_fn_supported_p (fn, TREE_TYPE (op_0), OPTIMIZE_FOR_BOTH))
 {
   gcall *call = gimple_build_call_internal (fn, 2, op_0, op_1);
   gimple_call_set_lhs (call, lhs);
-- 
2.43.0

[PATCH] Fix bogus SLP nvector compute in check_load_store_for_partial_vectors

2024-09-24 Thread Richard Biener

We have a new overload for vect_get_num_copies that handles both
SLP and non-SLP.  Use it.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

* tree-vect-stmts.cc (check_load_store_for_partial_vectors):
Use the new vect_get_num_copies overload.
---
 gcc/tree-vect-stmts.cc | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index cafcedb7b9e..f7867c0803b 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1507,13 +1507,7 @@ check_load_store_for_partial_vectors (loop_vec_info 
loop_vinfo, tree vectype,
   if (memory_access_type == VMAT_INVARIANT)
 return;
 
-  unsigned int nvectors;
-  if (slp_node)
-/* ???  Incorrect for multi-lane lanes.  */
-nvectors = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) / group_size;
-  else
-nvectors = vect_get_num_copies (loop_vinfo, vectype);
-
+  unsigned int nvectors = vect_get_num_copies (loop_vinfo, slp_node, vectype);
   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
   vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
   machine_mode vecmode = TYPE_MODE (vectype);
-- 
2.43.0

[PATCH] MATCH: Simplify `(trunc)copysign ((extend)x, CST)` to `copysign (x, -1.0/1.0)` [PR112472]

2024-09-24 Thread Eikansh Gupta

This patch simplifies `(trunc)copysign ((extend)x, CST)` to
`copysign (x, -1.0/1.0)` depending on the sign of CST.  Previously, it was
simplified to `copysign (x, CST)`.  This is valid because only the sign of
CST matters, not its value.

The patch also simplifies `(trunc)abs (extend x)` to `abs (x)`.
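
As an illustration only (not part of the patch), the equivalences can
be checked at the source level with <cmath>.  The function names below
are hypothetical, and std::signbit is used here as a stand-in for the
sign test on the constant:

```cpp
#include <cassert>
#include <cmath>

// Widen to double, copysign against a double constant, narrow back.
inline float copysign_via_double(float x, double cst) {
  return static_cast<float>(std::copysign(static_cast<double>(x), cst));
}

// Same-precision form: only the sign of the constant is kept.
inline float copysign_direct(float x, double cst) {
  return std::copysign(x, std::signbit(cst) ? -1.0f : 1.0f);
}

// Widen, take the absolute value, narrow back.
inline float abs_via_double(float x) {
  return static_cast<float>(std::fabs(static_cast<double>(x)));
}

// Same-precision absolute value.
inline float abs_direct(float x) { return std::fabs(x); }

inline void check_copysign_forms() {
  assert(copysign_via_double(2.5f, -3.0) == copysign_direct(2.5f, -3.0));
  assert(copysign_via_double(-2.5f, 5.0) == copysign_direct(-2.5f, 5.0));
  assert(abs_via_double(-1.25f) == abs_direct(-1.25f));
}
```

The round trip through double cannot change the magnitude of a float,
which is why the narrower operation gives the same result.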

PR tree-optimization/112472

gcc/ChangeLog:

* match.pd ((trunc)copysign ((extend)x, -CST) --> copysign (x, -1.0)): 
New pattern.
((trunc)abs (extend x) --> abs (x)): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr112472.c: New test.
---
 gcc/match.pd | 24 +++-
 gcc/testsuite/gcc.dg/tree-ssa/pr112472.c | 22 ++
 2 files changed, 45 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr112472.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 940292d0d49..52dc8b539fc 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -8535,7 +8535,29 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
&& TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@2))
&& direct_internal_fn_supported_p (IFN_COPYSIGN,
  type, OPTIMIZE_FOR_BOTH))
-(IFN_COPYSIGN @0 @1
+(IFN_COPYSIGN @0 @1)))
+ /* (trunc)copysign (extend)x, CST) to copysign (x, -1.0/1.0) */
+ (simplify
+  (convert (copysigns (convert@2 @0) REAL_CST@1))
+   (if (optimize
+   && !HONOR_SNANS (@2)
+   && types_match (type, TREE_TYPE (@0))
+   && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@2))
+   && direct_internal_fn_supported_p (IFN_COPYSIGN,
+ type, OPTIMIZE_FOR_BOTH))
+(if (REAL_VALUE_NEGATIVE (TREE_REAL_CST (@1)))
+ (IFN_COPYSIGN @0 { build_minus_one_cst (type); })
+ (IFN_COPYSIGN @0 { build_one_cst (type); })
+
+/* (trunc)abs (extend x) --> abs (x)
+   x is a float value */
+(simplify
+ (convert (abs (convert@1 @0)))
+  (if (optimize
+  && !HONOR_SNANS (@1)
+  && types_match (type, TREE_TYPE (@0))
+  && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@1)))
+   (abs @0)))
 
 (for froms (BUILT_IN_FMAF BUILT_IN_FMA BUILT_IN_FMAL)
  tos (IFN_FMA IFN_FMA IFN_FMA)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr112472.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr112472.c
new file mode 100644
index 000..8f97278ffe8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr112472.c
@@ -0,0 +1,22 @@
+/* PR tree-optimization/109878 */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-optimized" } */
+
+/* Optimized to .COPYSIGN(a, -1.0e+0) */
+float f(float a)
+{
+  return (float)__builtin_copysign(a, -3.0);
+}
+
+/* This gets converted to (float) abs((double) a)
+   With the patch it is optimized to abs(a) */
+float f2(float a)
+{
+  return (float)__builtin_copysign(a, 5.0);
+}
+
+/* { dg-final { scan-tree-dump-not "= __builtin_copysign" "optimized" } } */
+/* { dg-final { scan-tree-dump-not " double " "optimized" { target 
ifn_copysign } } } */
+/* { dg-final { scan-tree-dump-times ".COPYSIGN" 1 "optimized" { target 
ifn_copysign } } } */
+/* { dg-final { scan-tree-dump-times "-1.0e\\+0" 1 "optimized" { target 
ifn_copysign } } } */
+/* { dg-final { scan-tree-dump-times " ABS_EXPR " 1 "optimized" { target 
ifn_copysign } } } */
-- 
2.17.1

Re: [Patch] OpenMP: Add support for 'self_maps' to the 'require' directive

2024-09-24 Thread Tobias Burnus


Hi all,

now committed as r15-3822-gb752eed3e3f2f2, see attachment.

I fixed one C/C++ test issue (missing 's') and added the Fortran module 
check.


Tobias

PS: I noticed that 'declare target' does not add the target-used flag. 
At least TR13 is very clear that it counts, but currently GCC does not 
regard this (with a FIXME check spec note.) This needs to be fixed 
eventually.


PPS: Old discussion:

Andre Vehreschild:

Hi Tobias,

to my eye this looks fine. I would appreciate, if you could add some tests for
errors on the fortran side, esp. where modules are involved. But no must.

Ok for mainline. Thanks for the patch.

- Andre

On Sat, 21 Sep 2024 23:37:33 +0200
Tobias Burnus  wrote:


Add support of the 'self_maps' clause in 'omp requires',
an OpenMP 6 feature but added here mostly as part of the
on-going improvement of the unified-shared memory (USM) handling.

Comments, remarks concerns before I commit it?

* * *

Regarding USM, there is on one hand the hardware:

- some hardware cannot access the host memory at all
- other hardware can access it, but either only through
an interconnect or via page migration on page fault
- on the third type of hardware, a host and device share
the same memory controller

For the latter, a 'map' never makes sense, but for
the second case, it depends on the details whether it is
better to do mapping or directly accessing the memory
(i.e. via interconnect or page migration).

On the compile-time side, the user can demand:
- no requirement
- 'requires unified_shared_memory' (= memory has to be accessible
but the implementation can still do mapping for explicit maps)
- 'requires self_maps' - mapping is strictly not permitted.
- other hints using compiler flags

And at runtime, what is done depends on the actual hardware,
the compile-time wishes, and environment variables.

* * *

Currently, the runtime never maps with USM, i.e. both act the same.
At least using an environment variable, I would consider enabling
mapping - one could also consider to have it always do mappings,
except for self_maps.

On the compile side, we need to handle implicit 'declare target'
better - as it currently leads to separate memory. Using 'link',
we could point to the host memory (at least for 'self_maps').

And before we can enable USM by default for integrated/APU devices,
we need to solve some issues with 'link' (→ posted link) and for
those, 'map' has to be honored.

Those are 5.x follow up tasks, but having 'self_maps' available,
completes the what-does-the-user-want part.

Tobias

PS: There is also the 'self' modifier to the map clause, working
on a per-variable granularity. However, this like several other
6.0 items is completely out of scope of the current USM work.

PPS: See also
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663209.html and
the associated patch set, posted at
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655946.html

commit b752eed3e3f2f27570ea89b7c2339468698472a8
Author: Tobias Burnus 
Date:   Tue Sep 24 10:53:59 2024 +0200

OpenMP: Add support for 'self_maps' to the 'require' directive

'self_maps' implies 'unified_shared_memory', except that the latter
also permits that explicit maps copy data to device memory while
self_maps does not. In GCC, currently, both are handled identical.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_omp_requires): Handle self_maps clause.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_omp_requires): Handle self_maps clause.

gcc/fortran/ChangeLog:

* gfortran.h (enum gfc_omp_requires_kind): Add OMP_REQ_SELF_MAPS.
(gfc_namespace): Enlarge omp_requires bitfield.
* module.cc (enum ab_attribute, attr_bits): Add AB_OMP_REQ_SELF_MAPS.
(mio_symbol_attribute): Handle it.
* openmp.cc (gfc_check_omp_requires, gfc_match_omp_requires): Handle
self_maps clause.
* parse.cc (gfc_parse_file): Handle self_maps clause.

gcc/ChangeLog:

* lto-cgraph.cc (output_offload_tables, omp_requires_to_name): Handle
self_maps clause.
* omp-general.cc (struct omp_ts_info, omp_context_selector_matches):
Likewise for the associated trait.
* omp-general.h (enum omp_requires): Add OMP_REQUIRES_SELF_MAPS.
* omp-selectors.h (enum omp_ts_code): Add
OMP_TRAIT_IMPLEMENTATION_SELF_MAPS.

include/ChangeLog:

* gomp-constants.h (GOMP_REQUIRES_SELF_MAPS): #define.

libgomp/ChangeLog:

* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices):
Accept self_maps clause.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices):
Likewise.
* libgomp.texi (TR13 Impl. Status): Set to 'Y'.
* target.c (gomp_requires_to_name, GOMP_offload_register_ver,
gomp_target_init): Handle sel

Re: [PATCH v2] Widening-Mul: Fix one ICE for SAT_SUB matching operand checking

2024-09-24 Thread Richard Biener

On Tue, Sep 24, 2024 at 9:13 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to fix the following ICE for -O2 -m32 of x86_64.
>
> during RTL pass: expand
> JackMidiAsyncWaitQueue.cpp.cpp: In function 'void DequeueEvent(unsigned
> int)':
> JackMidiAsyncWaitQueue.cpp.cpp:3:6: internal compiler error: in
> expand_fn_using_insn, at internal-fn.cc:263
> 3 | void DequeueEvent(unsigned frame) {
>   |  ^~~~
> 0x27b580d diagnostic_context::diagnostic_impl(rich_location*,
> diagnostic_metadata const*, diagnostic_option_id, char const*,
> __va_list_tag (*) [1], diagnostic_t)
> ???:0
> 0x27c4a3f internal_error(char const*, ...)
> ???:0
> 0x27b3994 fancy_abort(char const*, int, char const*)
> ???:0
> 0xf25ae5 expand_fn_using_insn(gcall*, insn_code, unsigned int, unsigned int)
> ???:0
> 0xf2a124 expand_direct_optab_fn(internal_fn, gcall*, optab_tag, unsigned int)
> ???:0
> 0xf2c87c expand_SAT_SUB(internal_fn, gcall*)
> ???:0
>
> We allowed the operand convert when matching SAT_SUB in match.pd, to support
> the zip benchmark SAT_SUB pattern.  Aka,
>
> (convert? (minus (convert1? @0) (convert1? @1))) for below sample code.
>
> void test (uint16_t *x, unsigned b, unsigned n)
> {
>   unsigned a = 0;
>   register uint16_t *p = x;
>
>   do {
> a = *--p;
> *p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB
>   } while (--n);
> }
>
> The pattern match for SAT_SUB itself may also act on below scalar sample
> code too.
>
> unsigned long long GetTimeFromFrames(int);
> unsigned long long GetMicroSeconds();
>
> void DequeueEvent(unsigned frame) {
>   long long frame_time = GetTimeFromFrames(frame);
>   unsigned long long current_time = GetMicroSeconds();
>   DequeueEvent(frame_time < current_time ? 0 : frame_time - current_time);
> }
>
> Aka:
>
> uint32_t a = (uint32_t)SAT_SUB(uint64_t, uint64_t);
>
> Then there will be a problem when ia32 or -m32 is given when compiling,
> because we only check that the lhs type (aka uint32_t) is supported by the ifn
> and miss the operand type (aka uint64_t).  DImode is mostly disabled for
> 32-bit targets like ia32 or rv32gcv, which then triggers an ICE when expanding.
>
> The below test suites passed for this patch.
> * The rv64gcv full regression test.
> * The x86 bootstrap test.
> * The x86 full regression test.
>
> PR middle-end/116814
>
> gcc/ChangeLog:
>
> * tree-ssa-math-opts.cc (build_saturation_binary_arith_call): Add
> ifn is_supported check for operand TREE type.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/torture/pr116814-1.C: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/testsuite/g++.dg/torture/pr116814-1.C | 12 
>  gcc/tree-ssa-math-opts.cc | 23 +++
>  2 files changed, 27 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/torture/pr116814-1.C
>
> diff --git a/gcc/testsuite/g++.dg/torture/pr116814-1.C 
> b/gcc/testsuite/g++.dg/torture/pr116814-1.C
> new file mode 100644
> index 000..dd6f29daa7c
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/torture/pr116814-1.C
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target { ia32 } } } */
> +/* { dg-options "-O2" } */
> +
> +unsigned long long GetTimeFromFrames(int);
> +unsigned long long GetMicroSeconds();
> +
> +void DequeueEvent(unsigned frame) {
> +  long long frame_time = GetTimeFromFrames(frame);
> +  unsigned long long current_time = GetMicroSeconds();
> +
> +  DequeueEvent(frame_time < current_time ? 0 : frame_time - current_time);
> +}
> diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> index d61668aacfc..361761cedef 100644
> --- a/gcc/tree-ssa-math-opts.cc
> +++ b/gcc/tree-ssa-math-opts.cc
> @@ -4042,15 +4042,22 @@ build_saturation_binary_arith_call 
> (gimple_stmt_iterator *gsi, gphi *phi,
> internal_fn fn, tree lhs, tree op_0,
> tree op_1)
>  {
> -  if (direct_internal_fn_supported_p (fn, TREE_TYPE (lhs), 
> OPTIMIZE_FOR_BOTH))
> -{
> -  gcall *call = gimple_build_call_internal (fn, 2, op_0, op_1);
> -  gimple_call_set_lhs (call, lhs);
> -  gsi_insert_before (gsi, call, GSI_SAME_STMT);
> +  tree lhs_type = TREE_TYPE (lhs);
> +  tree op_type = TREE_TYPE (op_0);
>
> -  gimple_stmt_iterator psi = gsi_for_stmt (phi);
> -  remove_phi_node (&psi, /* release_lhs_p */ false);
> -}
> +  if (!direct_internal_fn_supported_p (fn, lhs_type, OPTIMIZE_FOR_BOTH))
> +return;
> +
> +  if (lhs_type != op_type
> +  && !direct_internal_fn_supported_p (fn, op_type, OPTIMIZE_FOR_BOTH))
> +return;
> +
> +  gcall *call = gimple_build_call_internal (fn, 2, op_0, op_1);

Since you're creating the call with op_0/op_1 shouldn't you _only_ check support
for op_type operation and not lhs_type?

Thanks,
Richard.

> +  gimple_call_set_lhs (call, lhs);
> +  gsi_insert_before (gsi, call, GSI_SAME_STMT);
> +
> +  gimple_stmt_iterator psi = gsi_for_st
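
To make the operand-versus-result point in this thread concrete, here is a minimal source-level sketch of the matched shape, uint32_t = (uint32_t) SAT_SUB (uint64_t, uint64_t). The names sat_sub and sat_sub_then_truncate are illustrative only and are not GCC internals; the sketch just shows that the saturating subtraction is carried out in the wider operand type, so that is the type the target needs to support.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned saturating subtraction in the operand type T: a - b, clamped at 0.
// (Hypothetical helper for illustration; not part of GCC.)
template <typename T>
constexpr T sat_sub(T a, T b)
{
  return a >= b ? static_cast<T>(a - b) : T{0};
}

// Shape of the matched expression: the subtraction happens in 64 bits and
// only the result is truncated to 32 bits afterwards.
constexpr std::uint32_t sat_sub_then_truncate(std::uint64_t a, std::uint64_t b)
{
  return static_cast<std::uint32_t>(sat_sub<std::uint64_t>(a, b));
}
```

Truncating the operands first would give a different answer: with a = 0x100000000 and b = 1 the 64-bit form yields 0xFFFFFFFF after truncation, whereas a 32-bit saturating subtraction of the truncated operands (0 and 1) yields 0.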

[PATCH] Simplify range-op shift mask generation

2024-09-24 Thread Richard Biener

The following reduces the number of wide_ints built which show up
in the profile for PR114855 as the largest remaining bit at -O1.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* range-op.cc (operator_rshift::op1_range): Use wi::mask instead
of shift and not.
---
 gcc/range-op.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index c576f688221..3f5cf083440 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -2863,7 +2863,7 @@ operator_rshift::op1_range (irange &r,
   // OP1 is anything from 0011 1000 to 0011 1111.  That is, a
   // range from LHS<<3 plus a mask of the 3 bits we shifted on the
   // right hand side (0x07).
-  wide_int mask = wi::bit_not (wi::lshift (wi::minus_one (prec), shift));
+  wide_int mask = wi::mask (shift.to_uhwi (), false, prec);
   int_range_max mask_range (type,
wi::zero (TYPE_PRECISION (type)),
mask);
-- 
2.43.0
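
As a rough illustration of why the two forms in the diff above produce the same mask, the sketch below redoes the computation on plain 64-bit integers. It is not wide_int code: all_ones, mask_by_shift_and_not and mask_direct are hypothetical helpers, and the precision is limited to 64 bits here.

```cpp
#include <cassert>
#include <cstdint>

// All-ones value of width PREC (1 <= PREC <= 64), held in a uint64_t.
inline std::uint64_t all_ones(unsigned prec)
{
  assert(prec >= 1 && prec <= 64);
  return prec == 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << prec) - 1;
}

// Old shape: start from -1, shift left, then invert (within PREC bits).
inline std::uint64_t mask_by_shift_and_not(unsigned shift, unsigned prec)
{
  assert(shift < prec);
  const std::uint64_t ones = all_ones(prec);
  return ~(ones << shift) & ones;
}

// New shape: build the low SHIFT bits directly, with no intermediate values.
inline std::uint64_t mask_direct(unsigned shift, unsigned prec)
{
  assert(shift < prec);
  return shift == 0 ? 0 : all_ones(shift);
}
```

For a shift of 3 both functions return 0x07, matching the comment in the patched code; the direct form simply avoids building the two temporaries.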

[PATCH 3/3] phiprop: VOP phi confuses phiprop [PR116824]

2024-09-24 Thread Andrew Pinski

Another small phiprop improvement: in some cases the statement
defining the vop is a phi, which might be in the same bb as the
load. This is OK since the phi here is not a store, so we can
just accept it.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/116824

gcc/ChangeLog:

* tree-ssa-phiprop.cc (propagate_with_phi): Don't
reject if the bb of the def_stmt is the same as load
and if the def_stmt was a phi.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phiprop-3.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/gcc.dg/tree-ssa/phiprop-3.c | 30 +++
 gcc/tree-ssa-phiprop.cc   |  3 ++-
 2 files changed, 32 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phiprop-3.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phiprop-3.c 
b/gcc/testsuite/gcc.dg/tree-ssa/phiprop-3.c
new file mode 100644
index 000..a0d5891dc60
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phiprop-3.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-phiprop1-details -fdump-tree-cselim-details 
-fdump-tree-phiopt2" } */
+
+/* PR tree-optimization/116824 */
+/* phiprop should be able to handle the case where the vops defining
+   statement was a phi in the same bb as the deference. */
+
+int g(int i, int *tt)
+{
+  const int t = 10;
+  const int *a;
+  {
+if (t < i)
+{
+  *tt = 1;
+  a = &t;
+}
+else
+{
+  *tt = 1;
+  a = &i;
+}
+  }
+  return *a;
+}
+
+/* { dg-final { scan-tree-dump-times "Inserting PHI for result of load" 1 
"phiprop1"} } */
+/* { dg-final { scan-tree-dump-times "factoring out stores" 1 "cselim"} } */
+/* { dg-final { scan-tree-dump-times "MIN_EXPR" 1 "phiopt2"} } */
+
diff --git a/gcc/tree-ssa-phiprop.cc b/gcc/tree-ssa-phiprop.cc
index f04990e8cb4..4d1df7d351e 100644
--- a/gcc/tree-ssa-phiprop.cc
+++ b/gcc/tree-ssa-phiprop.cc
@@ -401,7 +401,8 @@ propagate_with_phi (basic_block bb, gphi *phi, struct 
phiprop_d *phivn,
  def_stmt = SSA_NAME_DEF_STMT (vuse);
}
   if (!SSA_NAME_IS_DEFAULT_DEF (vuse)
- && (gimple_bb (def_stmt) == bb
+ && ((gimple_bb (def_stmt) == bb
+  && !is_a<gphi *>(def_stmt))
  || (gimple_bb (def_stmt)
  && !dominated_by_p (CDI_DOMINATORS,
  bb, gimple_bb (def_stmt)
-- 
2.43.0
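
For readers less familiar with the pass, the effect the new test scans for ("Inserting PHI for result of load", then MIN_EXPR) can be pictured at the source level roughly as follows. This is only an illustration of the idea, not what GCC does on GIMPLE; the function names are made up for the sketch.

```cpp
#include <algorithm>
#include <cassert>

// Source-level shape of the testcase: select an address, then load once.
inline int load_through_selected_pointer(int i)
{
  const int t = 10;
  const int *a = (t < i) ? &t : &i;
  return *a;
}

// What the load amounts to once it is moved into each arm: a select of
// values rather than of addresses, which is the same as std::min(t, i).
inline int select_of_values(int i)
{
  const int t = 10;
  return (t < i) ? t : i;
}
```

Once the load is expressed as a select of values, a later pass can recognise the minimum, which is what the MIN_EXPR scan in the test relies on.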

[PATCH 1/3] Add an alternative testcase for PR 70740

2024-09-24 Thread Andrew Pinski

While looking into improving phiprop, I noticed that
the current pr70740.c testcase was being optimized almost
all the way before phiprop because the addresses were considered
the same; the arrays were all zero in size.

This adds an alternative testcase which changes the array sizes to be 1
and phiprop can and will act on this testcase now and the fix which was
being tested is actually tested now.

Tested on x86_64-linux-gnu.

PR 70740

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr70740-1.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/gcc.dg/torture/pr70740-1.c | 41 
 1 file changed, 41 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr70740-1.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr70740-1.c 
b/gcc/testsuite/gcc.dg/torture/pr70740-1.c
new file mode 100644
index 000..77e6a2d7187
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr70740-1.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+
+/* This is an alternative to the original pr70740.c testcase,
+   arrays are now 1 in size where they were 0 in the other testcase. */
+
+extern int foo (void);
+extern void *memcpy (void *, const void *, __SIZE_TYPE__);
+
+struct
+{
+  char a[6];
+} d;
+struct
+{
+  int a1[1];
+  int a2[1];
+  int a3[1];
+  int a4[1];
+} a, c;
+int b;
+
+int *
+bar ()
+{
+  if (b)
+return a.a4;
+  return a.a2;
+}
+
+void
+baz ()
+{
+  int *e, *f;
+  if (foo ())
+e = c.a3;
+  else
+e = c.a1;
+  memcpy (d.a, e, 6);
+  f = bar ();
+  memcpy (d.a, f, 1);
+}
-- 
2.43.0

[PATCH 2/3] phiprop: Skip over clobbers [PR116823]

2024-09-24 Thread Andrew Pinski

In C++ code the clobber gets in the way of phiprop.
E.g.
```
  if (lr_bitpos.2401_412 < rr_bitpos.2402_413)
goto ; [INV]
  else
goto ; [INV]

   :

   :
  MEM[(struct poly_int *)&D.192544] ={v} {CLOBBER(bob)};
  _1060 = MEM[(const long int &)iftmp.2400_515];
```

The above comes from fold-const.cc. The clobber in the above case
is the clobber from the start of the constructor but other clobbers
can also get in the way, see gcc.dg/tree-ssa/phiprop-2.c for an example.
This shows up in a lot of C++ code where std::min/max (or even ?: like in the
fold-const.cc case) is used in connection with constructors.
So optimizing this early in phiprop can improve code generation and compile
time.

g++.dg/tree-ssa/phiprop-2.C contains the reduced testcase from fold-const.cc.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/116823

gcc/ChangeLog:

* tree-ssa-phiprop.cc (phiprop_insert_phi): Get
the use_vuse before the looping of the phi arguments,
also skip over clobbers to get the use_vuse.
(propagate_with_phi): Skip over clobbers for the vuse.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phiprop-2.c: New test.
* g++.dg/tree-ssa/phiprop-1.C: New test.
* g++.dg/tree-ssa/phiprop-2.C: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/g++.dg/tree-ssa/phiprop-1.C | 23 +++
 gcc/testsuite/g++.dg/tree-ssa/phiprop-2.C | 25 +
 gcc/testsuite/gcc.dg/tree-ssa/phiprop-2.c | 27 +++
 gcc/tree-ssa-phiprop.cc   | 25 -
 4 files changed, 99 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/phiprop-1.C
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/phiprop-2.C
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phiprop-2.c

diff --git a/gcc/testsuite/g++.dg/tree-ssa/phiprop-1.C 
b/gcc/testsuite/g++.dg/tree-ssa/phiprop-1.C
new file mode 100644
index 000..e3388d1d157
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/phiprop-1.C
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-phiprop1-details -fdump-tree-release_ssa" } */
+
+/* PR tree-optimization/116823 */
+/* The clobber on a should not get in the way of phiprop here even if
+   this is undefined code. */
+/* We should have MIN_EXPR early on then too. */
+
+static inline
+const int &c(const int &d, const int &e) {
+  if (d < e)
+return d;
+  return e;
+}
+
+int g(int i, struct f *ff)
+{
+  const int &a = c(i, 10);
+  return a;
+}
+/* { dg-final { scan-tree-dump-times "Inserting PHI for result of load" 1 
"phiprop1"} } */
+/* { dg-final { scan-tree-dump-times "MIN_EXPR" 1 "release_ssa"} } */
+
diff --git a/gcc/testsuite/g++.dg/tree-ssa/phiprop-2.C 
b/gcc/testsuite/g++.dg/tree-ssa/phiprop-2.C
new file mode 100644
index 000..1a0d6ed92ee
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/phiprop-2.C
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-phiprop1-details -fdump-tree-release_ssa" } */
+
+/* PR tree-optimization/116823 */
+/* The clobber on the temp s2 should not get in the way of phiprop here. */
+/* We should have MAX_EXPR early on then too. */
+/* This is derived from fold-const.cc; s2 is similar to poly_int. */
+
+struct s2
+{
+  int i;
+  s2(const int &a) : i (a) {}
+};
+
+
+int h(s2 b);
+
+int g(int l, int r)
+{
+  return h(l > r ? l : r);
+}
+
+/* { dg-final { scan-tree-dump-times "Inserting PHI for result of load" 1 
"phiprop1"} } */
+/* { dg-final { scan-tree-dump-times "MAX_EXPR" 1 "release_ssa"} } */
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phiprop-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/phiprop-2.c
new file mode 100644
index 000..546031e63d7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phiprop-2.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-phiprop1-details -fdump-tree-release_ssa" } */
+
+/* PR tree-optimization/116823 */
+/* The clobber on b should not get in the way of phiprop here. */
+/* We should have MIN_EXPR early on. */
+
+void f(int *);
+
+int g(int i)
+{
+  const int t = 10;
+  const int *a;
+  {
+int b;
+f(&b);
+if (t < i)
+  a = &t;
+else
+  a = &i;
+  }
+  return *a;
+}
+
+/* { dg-final { scan-tree-dump-times "Inserting PHI for result of load" 1 
"phiprop1"} } */
+/* { dg-final { scan-tree-dump-times "MIN_EXPR" 1 "release_ssa"} } */
+
diff --git a/gcc/tree-ssa-phiprop.cc b/gcc/tree-ssa-phiprop.cc
index 2a1cdae46d2..f04990e8cb4 100644
--- a/gcc/tree-ssa-phiprop.cc
+++ b/gcc/tree-ssa-phiprop.cc
@@ -159,6 +159,20 @@ phiprop_insert_phi (basic_block bb, gphi *phi, gimple 
*use_stmt,
 }
 
   gphi *vphi = get_virtual_phi (bb);
+  tree use_vuse = gimple_vuse (use_stmt);
+  gimple *def_stmt = SSA_NAME_DEF_STMT (use_vuse);
+  /* Skip over clobbers in the same bb as the use
+ as they don't interfer with loads. */
+  while (!SSA_NAME_IS_DEFAULT_DEF (use_vuse)
+&& gimple_clobber_p (def_stmt

[r15-3834 Regression] FAIL: c-c++-common/gomp/declare-variant-duplicates.c (test for excess errors) on Linux/x86_64

2024-09-24 Thread haochen.jiang

On Linux/x86_64,

96246bff0bcd9e5cdec9e6cf811ee3db4997f6d4 is the first bad commit
commit 96246bff0bcd9e5cdec9e6cf811ee3db4997f6d4
Author: Sandra Loosemore 
Date:   Fri Sep 6 20:58:13 2024 +

OpenMP: Check additional restrictions on context selector properties

caused

FAIL: c-c++-common/gomp/declare-variant-duplicates.c  (test for errors, line 11)
FAIL: c-c++-common/gomp/declare-variant-duplicates.c (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-3834/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="gomp.exp=c-c++-common/gomp/declare-variant-duplicates.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="gomp.exp=c-c++-common/gomp/declare-variant-duplicates.c 
--target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact me
at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the command
line might help.)
(However, please make sure that there are no potential problems with AVX512.)

Re: [PATCH 03/10] c++/modules: Use decl_linkage in maybe_record_mergeable_decl

2024-09-24 Thread Jason Merrill


On 9/23/24 7:44 PM, Nathaniel Shead wrote:

I don't currently have any testcases where this changes something, but I felt
it to be a valuable cleanup.

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?


OK.


-- >8 --

This avoids any possible inconsistencies (current or future) about
whether a declaration is internal or not.

gcc/cp/ChangeLog:

* name-lookup.cc (maybe_record_mergeable_decl): Use decl_linkage
instead of ad-hoc checks.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/name-lookup.cc | 9 +
  1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index 50e169eca43..c0f89f98d87 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -3725,17 +3725,10 @@ maybe_record_mergeable_decl (tree *slot, tree name, 
tree decl)
if (TREE_CODE (*slot) != BINDING_VECTOR)
  return;
  
-  if (!TREE_PUBLIC (CP_DECL_CONTEXT (decl)))

-/* Member of internal namespace.  */
+  if (decl_linkage (decl) == lk_internal)
  return;
  
tree not_tmpl = STRIP_TEMPLATE (decl);

-  if ((TREE_CODE (not_tmpl) == FUNCTION_DECL
-   || VAR_P (not_tmpl))
-  && DECL_THIS_STATIC (not_tmpl))
-/* Internal linkage.  */
-return;
-
bool is_attached = (DECL_LANG_SPECIFIC (not_tmpl)
  && DECL_MODULE_ATTACH_P (not_tmpl));
tree *gslot = get_fixed_binding_slot

Re: [PATCH] Simplify range-op shift mask generation

2024-09-24 Thread Aldy Hernandez

Richard Biener  writes:

> The following reduces the number of wide_ints built which show up
> in the profile for PR114855 as the largest remaining bit at -O1.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

Thanks.

Re: [PATCH] RISC-V: Fix FIXED_REGISTERS comment missing return address register

2024-09-24 Thread Jeff Law





On 9/24/24 2:11 AM, chenyix...@iscas.ac.cn wrote:

From: Yixuan Chen 

gcc/config/ChangeLog:

2024-09-24  Yixuan Chen  

* riscv/riscv.h: Fix FIXED_REGISTERS comment missing return address
register.

Thanks.  I made minor fixes to the ChangeLog entry and pushed this to
the trunk.


jeff

Re: [PATCH] c++/contracts: ICE in build_contract_condition_function [PR116490]

2024-09-24 Thread Jason Merrill


On 8/30/24 8:49 AM, Nina Dinka Ranns wrote:

We currently do not expect the comdat group of the guarded function to
be set at the time of generating the pre and post check functions.
However, in the case of an explicit instantiation, the guarded
function has been added to a comdat group before generating contract
check functions, which causes the observed ICE. The current assert is
removed and an additional check for the comdat group of the guarded
function is added. With this change, the pre and post check functions
get added to the same comdat group as the guarded function if the
guarded function is already placed in a comdat group.

Tested on x86_64-pc-linux-gnu.

Patch attached to the email.


Thanks for the ping, I missed this the first time.  Please CC me 
directly on C++ patches, especially on pings.


FWIW it was attached as application/x-patch, which Thunderbird doesn't
understand to display inline; text/plain or text/x-patch attachments 
work better.  I don't know how to tell gmail that it's text other than 
perhaps changing the extension to .txt.


Please include the ChangeLog entries in plaintext, along with the 
description/rationale.


Pushed, thanks!

Jason

Re: [PATCH 1/2] rtl-optimization/114855 - slow add_store_equivs in IRA

2024-09-24 Thread Jeff Law





On 9/24/24 6:34 AM, Richard Biener wrote:

For the testcase in PR114855 at -O1 add_store_equivs shows up as the
main sink for bitmap_set_bit because it uses a bitmap to mark all
seen insns by UID to make sure the forward walk in memref_used_between_p
will find the insn in question.  Given we do have a CFG here the
function's operation is questionable; given that memref_used_between_p
together with the walk of all insns is obviously quadratic in the
worst case, that whole thing should be re-done ... but, for the
testcase, using a sbitmap of size get_max_uid () + 1 gets
bitmap_set_bit off the profile and improves IRA time from 15.58s (8%)
to 3.46s (2%).

Now, given above quadraticness I wonder whether we should instead
gate add_store_equivs on optimize > 1 or flag_expensive_optimizations.

Jeff, you added the bitmap in r6-7529-g14d7d4be52585b, I have no idea
how get_insns () works at this point and in which CFG mode we are but
a simplification might be to simply verify both insns are in the same
BB and hopefully get_insns lets us walk the insns in order there, thus
we could elide the bitmap completely (with some loss of cases, but
the function comment suggests it is supposed to catch single-BB
cases only anyway?!).

I don't recall the work, but looking at the PR and history, I'm pretty
confident the equivalence code here is assuming linear IL, so BB or 
perhaps EBB.   In retrospect it probably would have been better to 
restrict the check to a BB/EBB.




Bootstrap and regtest running on x86_64-unknown-linux-gnu.

OK if that succeeds?

Thanks,
Richard.

PR rtl-optimization/114855
* ira.cc (add_store_equivs): Use sbitmap for tracking
visited insns.

OK
jeff
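
The data-structure change discussed above, one bit per insn UID in a flat array sized get_max_uid () + 1, can be sketched as a fixed-size dense bit set. This is a hedged illustration only: DenseBitSet is a hypothetical stand-in and does not reproduce GCC's sbitmap implementation or API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fixed-size dense bit set indexed by a small integer ID.  Setting and
// testing a bit are both a single word access, with no list to walk.
// (Illustrative only; not GCC's sbitmap.)
class DenseBitSet
{
public:
  explicit DenseBitSet(std::size_t nbits)
    : nbits_(nbits), words_((nbits + 63) / 64, 0) {}

  void set(std::size_t id)
  {
    assert(id < nbits_);
    words_[id / 64] |= std::uint64_t{1} << (id % 64);
  }

  bool test(std::size_t id) const
  {
    assert(id < nbits_);
    return ((words_[id / 64] >> (id % 64)) & 1) != 0;
  }

private:
  std::size_t nbits_;
  std::vector<std::uint64_t> words_;
};
```

The trade-off is memory proportional to the largest ID rather than to the number of set bits, which is presumably acceptable when nearly every insn gets marked anyway.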

Re: [PATCH RFA] libstdc++: #ifdef out #pragma GCC system_header

2024-09-24 Thread Jonathan Wakely


On 23/09/24 11:06 -0400, Jason Merrill wrote:

Tested x86_64-pc-linux-gnu, OK for trunk?


Yes please, I've wanted this for years, but it wasn't practical until
you changed some of the warnings recently.

Thanks!

Re: [PATCH] tree-optimization/114855 - slow VRP due to equiv oracle queries

2024-09-24 Thread Andrew MacLeod


Absolutely ok.

Thanks!

Andrew

On 9/24/24 05:52, Richard Biener wrote:

For the testcase in PR114855 VRP takes 320.41s (23%) (after mitigating
backwards threader slowness).  This is mostly due to the bitmap check
in equiv_oracle::find_equiv_dom.  The following turns this bitmap
to tree view, trading the linear search for an O(log N) one, which
improves VRP time to 54.54s (5%).

Bootstrap and regtest running on x86_64-unknown-linux-gnu, OK if that
succeeds?

Thanks,
Richard.

PR tree-optimization/114855
* value-relation.cc (equiv_oracle::equiv_oracle): Switch
m_equiv_set to tree view.
---
  gcc/value-relation.cc | 1 +
  1 file changed, 1 insertion(+)

diff --git a/gcc/value-relation.cc b/gcc/value-relation.cc
index 45722fcd13a..d6ad2dd984f 100644
--- a/gcc/value-relation.cc
+++ b/gcc/value-relation.cc
@@ -321,6 +321,7 @@ equiv_oracle::equiv_oracle ()
m_equiv.create (0);
m_equiv.safe_grow_cleared (last_basic_block_for_fn (cfun) + 1);
m_equiv_set = BITMAP_ALLOC (&m_bitmaps);
+  bitmap_tree_view (m_equiv_set);
obstack_init (&m_chain_obstack);
m_self_equiv.create (0);
m_self_equiv.safe_grow_cleared (num_ssa_names + 1);
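
The complexity trade described in this message, a linear search replaced by an O(log N) one, can be illustrated with standard containers. This is only an analogy under the assumption that the two views behave like a sorted list and an ordered tree for lookups; std::list and std::set are stand-ins, not GCC's bitmap code.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <set>

// Membership test by walking a linked list: O(N) per query.
inline bool contains_linear(const std::list<unsigned> &sorted_ids, unsigned id)
{
  return std::find(sorted_ids.begin(), sorted_ids.end(), id) != sorted_ids.end();
}

// Membership test in an ordered tree: O(log N) per query.
inline bool contains_tree(const std::set<unsigned> &ids, unsigned id)
{
  return ids.count(id) != 0;
}
```

Both functions answer the same question; only the cost per query differs, which matters when the query sits on a hot path as in the profile quoted above.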

Re: [PATCH 2/2] Disable add_store_equivs when -fno-expensive-optimizations

2024-09-24 Thread Jeff Law





On 9/24/24 6:35 AM, Richard Biener wrote:

IRA's add_store_equivs is quadratic in the size of the function in the
worst case; disable it with -fno-expensive-optimizations, which means at
-O1 and -Og.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

OK?

Thanks,
Richard.

* ira.cc (ira): Gate add_store_equivs on flag_expensive_optimizations.

Given it's quadratic, definitely OK :-)

jeff

[PATCH v1 1/3] RISC-V: Refine the testcase of vector SAT_ADD

2024-09-24 Thread pan2 . li

From: Pan Li 

Use scan-assembler-times to check for the vsadd insn instead of checking the
function body, as we only care about whether we can generate the fixed-point
insn vsadd.

The below tests passed for this patch.
* The rv64gcv full regression test.

It is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48 hours.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-1.c: Remove
func body check and take scan asm times instead.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-13.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-14.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-15.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-18.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-19.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-20.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-21.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-22.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-23.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-24.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-25.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-26.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-27.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-28.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-29.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-30.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-31.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-32.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-13.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-14.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-15.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-9.c: Ditto.

Signed-off-by: Pan Li 
---
 .../riscv/rvv/autovec/binop/vec_sat_s_add-1.c   | 13 ++---
 .../riscv/rvv/autovec/binop/vec_sat_s_add-2.c   | 13 ++---
 .../riscv/rvv/autovec/binop/vec_sat_s_add-3.c   | 13 ++---
 .../riscv/rvv/autovec/binop/vec_sat_s_add-4.c   | 13 ++---
 .../riscv/rvv/autovec/binop/vec_sat_u_add-1.c   | 13 ++---
 .../riscv/rvv/autovec/binop/vec_sat_u_add-10.c  |  5 ++---
 .../riscv/rvv/autovec/binop/vec_sat_u_add-11.c  | 13 ++---
 .../riscv/rvv/autovec/binop/vec_sat_u_add-12.c  | 13 ++---
 .../riscv/rvv/autovec/binop/vec_sat_u_add-13.c  | 12 +---
 .../riscv/rvv/autovec/binop/vec_sat_u_add-14.c  | 13 ++---
 .../riscv/rvv/autovec/binop/vec_sat_u_add-15.c

Re: [PATCH v1 3/3] RISC-V: Refine the testcase of vector SAT_TRUNC

2024-09-24 Thread 钟居哲

LGTM

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2024-09-25 14:45
To: gcc-patches
CC: juzhe.zhong; kito.cheng; jeffreyalaw; rdapp.gcc; Pan Li
Subject: [PATCH v1 3/3] RISC-V: Refine the testcase of vector SAT_TRUNC
From: Pan Li 

Use scan-assembler-times to check for the vnclip insn instead of checking the
function body, as we only care about whether we can generate the fixed-point
insn vnclip.

The below tests passed for this patch.
* The rv64gcv full regression test.

It is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48 hours.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-1.c: Remove
func body check and take scan asm times instead.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-13.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-14.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-15.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-18.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-19.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-20.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-21.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-22.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-23.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-24.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-9.c: Ditto.

Signed-off-by: Pan Li 
---
.../rvv/autovec/unop/vec_sat_u_trunc-1.c  | 13 ++--
.../rvv/autovec/unop/vec_sat_u_trunc-10.c | 13 ++--
.../rvv/autovec/unop/vec_sat_u_trunc-11.c | 16 +-
.../rvv/autovec/unop/vec_sat_u_trunc-12.c | 12 +--
.../rvv/autovec/unop/vec_sat_u_trunc-13.c | 13 ++--
.../rvv/autovec/unop/vec_sat_u_trunc-14.c | 17 ++-
.../rvv/autovec/unop/vec_sat_u_trunc-15.c | 21 ++-
.../rvv/autovec/unop/vec_sat_u_trunc-16.c | 13 ++--
.../rvv/autovec/unop/vec_sat_u_trunc-17.c | 17 ++-
.../rvv/autovec/unop/vec_sat_u_trunc-18.c | 13 ++--
.../rvv/autovec/unop/vec_sat_u_trunc-19.c | 13 ++--
.../rvv/autovec/unop/vec_sat_u_trunc-2.c  | 17 ++-
.../rvv/autovec/unop/vec_sat_u_trunc-20.c | 17 ++-
.../rvv/autovec/unop/vec_sat_u_trunc-21.c | 21 ++-
.../rvv/autovec/unop/vec_sat_u_trunc-22.c | 13 ++--
.../rvv/autovec/unop/vec_sat_u_trunc-23.c | 17 ++-
.../rvv/autovec/unop/vec_sat_u_trunc-24.c | 13 ++--
.../rvv/autovec/unop/vec_sat_u_trunc-3.c  | 21 ++-
.../rvv/autovec/unop/vec_sat_u_trunc-4.c  | 13 ++--
.../rvv/autovec/unop/vec_sat_u_trunc-5.c  | 17 ++-
.../rvv/autovec/unop/vec_sat_u_trunc-6.c  | 13 ++--
.../rvv/autovec/unop/vec_sat_u_trunc-7.c  | 13 ++--
.../rvv/autovec/unop/vec_sat_u_trunc-8.c  | 17 ++-
.../rvv/autovec/unop/vec_sat_u_trunc-9.c  | 21 ++-
24 files changed, 46 insertions(+), 328 deletions(-)

diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-1.c
index 186005733ec..3d29d26abff 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-1.c
@@ -1,18 +1,9 @@
/* { dg-do compile } */
-/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
-/* { dg-skip-if "" { *-*-* } { "-flto" } } */
-/* { dg-final { check-function-bodies "**" "" } } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fdump-rtl-expand-details" } */
#include "../vec_sat_arith.h"
-/*
-** vec_sat_u_trunc_uint8_t_uint16_t_fmt_1:
-** ...
-** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e8,\s*mf2,\s*ta,\s*ma
-** ...
-** vnclipu\.wi\s+v[0-9]+,\s*v[0-9]+,\s*0
-** ...
-*/
DEF_VEC_SAT_U_TRUNC_FMT_1 (uint8_t, uint16_t)
/* { dg-final { scan-rtl-dump-times ".SAT_TRUNC " 4 "expand" } } */
+/* { dg-final { scan-assembler-times {vnclipu\.wi} 1 } } */
diff --git 
a/gcc/testsuite/gcc.

[PATCH v1 2/3] RISC-V: Refine the testcase of vector SAT_SUB

2024-09-24 Thread pan2 . li

From: Pan Li 

Use scan-assembler-times for the vssub insn check instead of a function
body check, as we only care whether the fixed-point insn vssub is generated.

The following test passed for this patch:
* The rv64gcv full regression test.

This is a test-only patch and fairly obvious; I will commit it
directly if there are no comments in the next 48 hours.
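
For illustration only (not part of the patch): a minimal, self-contained
sketch of the kind of loop these tests vectorize, assuming the branchless
unsigned form (x - y) & -(x >= y). The names sat_u_sub_u8 and
vec_sat_u_sub_u8 are hypothetical; the real tests instantiate their
functions through macros in vec_sat_arith.h. The trailing comment shows a
count-based check in the style described above; the insn pattern and the
count of 1 are examples, not copied from any particular test.

```c
#include <stddef.h>
#include <stdint.h>

/* Unsigned saturating subtract: x - y, clamped to 0 when y > x.
   Hypothetical helper, written out by hand for illustration.  */
static inline uint8_t
sat_u_sub_u8 (uint8_t x, uint8_t y)
{
  /* -(x >= y) is all-ones when there is no underflow, zero otherwise.  */
  return (uint8_t) ((x - y) & -(uint8_t) (x >= y));
}

/* Element-wise loop that a vectorizer may turn into a saturating
   subtract insn.  */
void
vec_sat_u_sub_u8 (uint8_t *out, const uint8_t *a, const uint8_t *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = sat_u_sub_u8 (a[i], b[i]);
}

/* Illustrative count-based check, assuming one vssubu.vv is expected:
   { dg-final { scan-assembler-times {vssubu\.vv} 1 } }  */
```

A count-based check like this only asserts that the insn shows up the
expected number of times, so it does not depend on scheduling or register
allocation the way a full function-body match does.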

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-1.c: Remove
func body check and take scan asm times instead.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-13.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-14.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-15.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-18.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-19.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-20.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-21.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-22.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-23.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-24.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-25.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-26.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-27.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-28.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-29.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-30.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-31.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-32.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-33.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-34.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-35.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-36.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-37.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-38.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-39.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-40.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_zip.c: Ditto.

Signed-off-by: Pan Li 
---
 .../riscv/rvv/autovec/binop/vec_sat_u_sub-1.c  | 13 ++---
 .../riscv/rvv/autovec/binop/vec_sat_u_sub-10.c | 13 ++---
 .../riscv/rvv/autovec/binop/vec_sat_u_sub-11.c | 13 ++---
 .../riscv/rvv/autovec/binop/vec_sat_u_sub-12.c | 13 ++---
 .../riscv/rvv/autovec/binop/vec_sat_u_sub-13.c | 13 ++---
 .../riscv/rvv/autovec/binop/vec_sat_u_sub-14.c | 13 ++---
 .../riscv/rvv/autovec/binop/vec_sat_u_sub-15.c | 13 ++---
 .../riscv/rvv/autovec/binop/vec_sat_u_sub-16.c | 13 ++---
 .../riscv/rvv/autovec/binop/vec_sat_u_sub-17.c | 13 ++---
 .../riscv/rvv/autovec/binop/vec_sat_u_sub-18.c | 13 ++---
 .../riscv/rvv/autovec/binop/vec_sat_u_sub-19.c | 13 ++---
 .../riscv/rvv/autovec/binop/vec_sat_u_sub-2.c  | 13 ++---
 .../riscv/rvv/autovec/binop/vec_sat_u_sub-20.c | 13 ++---
 .../riscv/rvv/autovec/binop/vec_sat_u_sub-21.c | 13 ++---
 .../riscv/rvv/autovec/binop/vec_sat_u_sub-22.c | 13 ++---
 .../riscv/rvv/autovec/binop/vec_sat_u_sub-23.c | 13 ++---
 .../riscv/rvv/autovec/binop/vec_sat_u_sub-24.c | 13 ++---
 .../riscv/rvv/autovec/binop/vec_sat_u_sub-25.c | 13 ++---
 .../riscv/rvv/autovec/binop/vec_sat_u_sub-26.c | 13 ++---
 .../riscv/rvv/autovec/binop/vec_sat_u_sub-27.c | 13 ++---
 .../riscv/rvv/autovec/binop/vec_

[PATCH v1 3/3] RISC-V: Refine the testcase of vector SAT_TRUNC

2024-09-24 Thread pan2 . li

From: Pan Li 

Use scan-assembler-times for the vnclip insn check instead of a function
body check, as we only care whether the fixed-point insn vnclip is generated.

The following test passed for this patch:
* The rv64gcv full regression test.

This is a test-only patch and fairly obvious; I will commit it
directly if there are no comments in the next 48 hours.
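
For illustration only (not part of the patch): a minimal, self-contained
sketch of an unsigned saturating truncation from uint16_t to uint8_t, the
operation these tests exercise. The names sat_u_trunc_u16_to_u8 and
vec_sat_u_trunc_u16_to_u8 are hypothetical, and the clamp is written as a
plain conditional rather than any of the specific forms the real macros in
vec_sat_arith.h use. The trailing comment repeats the check that the diff
below adds to vec_sat_u_trunc-1.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Unsigned saturating truncate: values above UINT8_MAX clamp to
   UINT8_MAX.  Hypothetical helper, written by hand for illustration.  */
static inline uint8_t
sat_u_trunc_u16_to_u8 (uint16_t x)
{
  return x > UINT8_MAX ? UINT8_MAX : (uint8_t) x;
}

/* Element-wise loop that a vectorizer may turn into a narrowing
   clip insn.  */
void
vec_sat_u_trunc_u16_to_u8 (uint8_t *out, const uint16_t *in, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = sat_u_trunc_u16_to_u8 (in[i]);
}

/* Count-based check as added by the patch:
   { dg-final { scan-assembler-times {vnclipu\.wi} 1 } }  */
```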

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-1.c: Remove
func body check and take scan asm times instead.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-13.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-14.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-15.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-18.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-19.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-20.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-21.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-22.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-23.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-24.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-9.c: Ditto.

Signed-off-by: Pan Li 
---
 .../rvv/autovec/unop/vec_sat_u_trunc-1.c  | 13 ++--
 .../rvv/autovec/unop/vec_sat_u_trunc-10.c | 13 ++--
 .../rvv/autovec/unop/vec_sat_u_trunc-11.c | 16 +-
 .../rvv/autovec/unop/vec_sat_u_trunc-12.c | 12 +--
 .../rvv/autovec/unop/vec_sat_u_trunc-13.c | 13 ++--
 .../rvv/autovec/unop/vec_sat_u_trunc-14.c | 17 ++-
 .../rvv/autovec/unop/vec_sat_u_trunc-15.c | 21 ++-
 .../rvv/autovec/unop/vec_sat_u_trunc-16.c | 13 ++--
 .../rvv/autovec/unop/vec_sat_u_trunc-17.c | 17 ++-
 .../rvv/autovec/unop/vec_sat_u_trunc-18.c | 13 ++--
 .../rvv/autovec/unop/vec_sat_u_trunc-19.c | 13 ++--
 .../rvv/autovec/unop/vec_sat_u_trunc-2.c  | 17 ++-
 .../rvv/autovec/unop/vec_sat_u_trunc-20.c | 17 ++-
 .../rvv/autovec/unop/vec_sat_u_trunc-21.c | 21 ++-
 .../rvv/autovec/unop/vec_sat_u_trunc-22.c | 13 ++--
 .../rvv/autovec/unop/vec_sat_u_trunc-23.c | 17 ++-
 .../rvv/autovec/unop/vec_sat_u_trunc-24.c | 13 ++--
 .../rvv/autovec/unop/vec_sat_u_trunc-3.c  | 21 ++-
 .../rvv/autovec/unop/vec_sat_u_trunc-4.c  | 13 ++--
 .../rvv/autovec/unop/vec_sat_u_trunc-5.c  | 17 ++-
 .../rvv/autovec/unop/vec_sat_u_trunc-6.c  | 13 ++--
 .../rvv/autovec/unop/vec_sat_u_trunc-7.c  | 13 ++--
 .../rvv/autovec/unop/vec_sat_u_trunc-8.c  | 17 ++-
 .../rvv/autovec/unop/vec_sat_u_trunc-9.c  | 21 ++-
 24 files changed, 46 insertions(+), 328 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-1.c
index 186005733ec..3d29d26abff 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-1.c
@@ -1,18 +1,9 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
-/* { dg-skip-if "" { *-*-* } { "-flto" } } */
-/* { dg-final { check-function-bodies "**" "" } } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fdump-rtl-expand-details" } */
 
 #include "../vec_sat_arith.h"
 
-/*
-** vec_sat_u_trunc_uint8_t_uint16_t_fmt_1:
-** ...
-** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e8,\s*mf2,\s*ta,\s*ma
-** ...
-** vnclipu\.wi\s+v[0-9]+,\s*v[0-9]+,\s*0
-** ...
-*/
 DEF_VEC_SAT_U_TRUNC_FMT_1 (uint8_t, uint16_t)
 
 /* { dg-final { scan-rtl-dump-times ".SAT_TRUNC " 4 "expand" } } */
+/* { dg-final { scan-assembler-times {vnclipu\.wi} 1 } } */
diff --git 
a/gcc/te

RE: [PATCH v1] Widening-Mul: Fix one ICE for SAT_SUB matching operand promotion

2024-09-24 Thread Li, Pan2

Got it, thanks a lot.

Pan

-----Original Message-----
From: Uros Bizjak  
Sent: Tuesday, September 24, 2024 3:29 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; richard.guent...@gmail.com; 
tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Widening-Mul: Fix one ICE for SAT_SUB matching operand 
promotion

On Tue, Sep 24, 2024 at 8:53 AM Li, Pan2  wrote:
>
> Got it and thanks, let me rerun to make sure it works well as expected.

For reference, this is documented in:

https://gcc.gnu.org/wiki/Testing_GCC
https://gcc-newbies-guide.readthedocs.io/en/latest/working-with-the-testsuite.html
https://gcc.gnu.org/install/test.html

Uros.

Re: [PATCH v1] Widening-Mul: Fix one ICE for SAT_SUB matching operand promotion

2024-09-24 Thread Uros Bizjak

On Tue, Sep 24, 2024 at 8:53 AM Li, Pan2  wrote:
>
> Got it and thanks, let me rerun to make sure it works well as expected.

For reference, this is documented in:

https://gcc.gnu.org/wiki/Testing_GCC
https://gcc-newbies-guide.readthedocs.io/en/latest/working-with-the-testsuite.html
https://gcc.gnu.org/install/test.html

Uros.

Re: [Patch, fortran] PR116733: Generic processing of assumed rank objects (f202y)

2024-09-24 Thread Andre Vehreschild

Hi Paul,

in addition to Thomas' remarks (which I second), I have the following:

> diff --git a/gcc/fortran/intrinsic.cc b/gcc/fortran/intrinsic.cc
> index 0a6be215825..d95f35145b5 100644
> --- a/gcc/fortran/intrinsic.cc
> +++ b/gcc/fortran/intrinsic.cc
> @@ -293,11 +293,15 @@ do_ts29113_check (gfc_intrinsic_sym *specific, gfc_actual_arglist *arg)
>&a->expr->where, gfc_current_intrinsic);
> ok = false;
>   }
> -  else if (a->expr->rank == -1 && !specific->inquiry)
> +  else if (a->expr->rank == -1
> +&& !(specific->inquiry
> + || (specific->id == GFC_ISYM_RESHAPE
> + && (gfc_option.allow_std & GFC_STD_F202Y))))
>   {
> gfc_error ("Assumed-rank argument at %L is only permitted as actual "
> -  "argument to intrinsic inquiry functions",
> -  &a->expr->where);
> +  "argument to intrinsic inquiry functions or to reshape. "

Isn't it the convention to write Fortran intrinsic function names in all
uppercase? I.e. RESHAPE, to make clear that the function is meant, as in
the message above about C_LOC and PRESENT (lines 268--270).

> +  "The latter is an experimental F202y feature. Use "
> +  "-std=f202y to enable", &a->expr->where);
> ok = false;
>   }
>else if (a->expr->rank == -1 && arg != a)
> @@ -307,6 +311,13 @@ do_ts29113_check (gfc_intrinsic_sym *specific, gfc_actual_arglist *arg)
> &a->expr->where, gfc_current_intrinsic);
> ok = false;
>   }
> +  else if (a->expr->rank == -1 && specific->id == GFC_ISYM_RESHAPE
> +&& !gfc_is_simply_contiguous (a->expr, true, false))
> + {
> +   gfc_error ("Assumed rank argument to the reshape intrinsic at %L "

Here, too?

> +  "must be contiguous", &a->expr->where);
> +   ok = false;
> + }
>  }
>  
>return ok;



> diff --git a/gcc/fortran/match.cc b/gcc/fortran/match.cc
> index 0cd78a57a2f..81610b93345 100644
> --- a/gcc/fortran/match.cc
> +++ b/gcc/fortran/match.cc
> @@ -1920,7 +1920,31 @@ gfc_match_associate (void)
>gfc_association_list* a;
>  
>/* Match the next association.  */
> -  if (gfc_match (" %n =>", newAssoc->name) != MATCH_YES)
> +  if (gfc_match (" %n ", newAssoc->name) != MATCH_YES)
> + {
> +   /* "Expected associate name at %C" would be better.
> +   Change associate_3.f03 to match.  */

That's an odd comment. It sounds to me like a note to yourself.

> +   gfc_error ("Expected associate name at %C");
> +   goto assocListError;
> + }
> +
> +  /* Required for an assumed rank target.  */
> +  if (gfc_peek_char () == '(')
> + {
> +   newAssoc->ar = gfc_get_array_ref ();

This is not freed in case of an error and may result in a memory leak, right?
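
To illustrate the concern with a self-contained sketch (all types and
names below are hypothetical stand-ins, not gfortran's real data
structures or matcher API): an object allocated just before a match that
can fail has to be released on the error path, or be owned by something
the error path already cleans up.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the association node and its array ref.  */
struct array_ref { int dimen; };
struct assoc { struct array_ref *ar; };

/* Stand-in for a matcher that can fail; returns 0 on success.  */
static int
match_array_ref (struct array_ref *ar, int should_fail)
{
  ar->dimen = 1;
  return should_fail ? -1 : 0;
}

/* Returns 0 on success, -1 on error.  On error the array ref allocated
   here is freed and the pointer reset, so nothing leaks or dangles.  */
int
match_assoc (struct assoc *a, int should_fail)
{
  a->ar = calloc (1, sizeof *a->ar);
  if (a->ar == NULL)
    return -1;

  if (match_array_ref (a->ar, should_fail) != 0)
    goto error;

  return 0;

error:
  free (a->ar);
  a->ar = NULL;
  return -1;
}
```

Whether the existing assocListError path already takes care of this
depends on how newAssoc is released there, which is not visible in the
quoted hunk.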

> +   if (gfc_match_array_ref (newAssoc->ar, NULL, 0, 0) != MATCH_YES)
> + {
> +   gfc_error ("Bad bounds remapping list at %C");
> +   goto assocListError;
> + }
> + }
> +
> +  if (newAssoc->ar && !(gfc_option.allow_std & GFC_STD_F202Y))
> + gfc_error_now ("The bounds remapping list at %C is an experimental "
> +"F202y feature. Use std=f202y to enable");
> +
> +  /* Match the next association.  */
> +  if (gfc_match (" =>", newAssoc->name) != MATCH_YES)
>   {
> gfc_error ("Expected association at %C");
> goto assocListError;



> diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
> index 07e28a9f7a8..aa0ee1b0164 100644
> --- a/gcc/fortran/trans-expr.cc
> +++ b/gcc/fortran/trans-expr.cc



> @@ -10784,6 +10815,13 @@ gfc_trans_pointer_assignment (gfc_expr * expr1, gfc_expr * expr2)
> gcc_assert (remap->u.ar.start[dim] && remap->u.ar.end[dim]);
> +   if (remap->u.ar.start[dim]->expr_type != EXPR_CONSTANT
> +   || remap->u.ar.start[dim]->expr_type != EXPR_VARIABLE)
> + gfc_resolve_expr (remap->u.ar.start[dim]);
> +   if (remap->u.ar.end[dim]->expr_type != EXPR_CONSTANT
> +   || remap->u.ar.end[dim]->expr_type != EXPR_VARIABLE)
> + gfc_resolve_expr (remap->u.ar.end[dim]);
> +

Can't these resolves be done during the resolve stage? I have had serious
trouble with late resolves, hence the question.

> /* Convert declared bounds.  */
> gfc_init_se (&lower_se, NULL);
> gfc_init_se (&upper_se, NULL);



> diff --git a/gcc/fortran/trans-stmt.cc b/gcc/fortran/trans-stmt.cc
> index 86c54970475..450c11c06d7 100644
> --- a/gcc/fortran/trans-stmt.cc
> +++ b/gcc/fortran/trans-stmt.cc
> @@ -1910,6 +1910,20 @@ trans_associate_var (gfc_symbol *sym, gfc_wrapped_block *block)
> gfc_add_init_cleanup (block, gfc_finish_block (&se.pre), tmp);
> }
>/* Now all the other kinds of associate variable.  */
> +  else if (e->rank == -1 &&

[PATCH] x86/{,V}AES: adjust when to force EVEX encoding

2024-09-24 Thread Jan Beulich

Commit a79d13a01f8c ("i386: Fix aes/vaes patterns [PR114576]") correctly
said "..., but we need to emit {evex} prefix in the assembly if AES ISA
is not enabled". Yet it did so only for the TARGET_AES insns. Going from
the alternative chosen in the TARGET_VAES insns is wrong for two
reasons:
- if, with AES disabled, the latter alternative was chosen despite no
  "high" XMM register nor any eGPR in use, gas would still pick the AES
  (VEX) encoding when no {evex} pseudo-prefix is in use (which is
  against - as stated by the description of said commit - AES presently
  not being considered a prereq of VAES in gcc);
- if AES is (also) enabled, EVEX encoding would needlessly be forced.
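
The two cases above can be modelled with a small, self-contained sketch.
This is not GCC code: the enum, the function names and the boolean
target_aes parameter are hypothetical, and the only thing taken from the
patch is the shape of the old and new conditions.

```c
#include <stdbool.h>

/* Hypothetical model of the vector modes the insns come in.  */
enum vec_mode { V16QI, V32QI, V64QI };

/* Old check: keyed off which register alternative was chosen.  */
static bool
force_evex_old (int which_alternative, enum vec_mode mode)
{
  return which_alternative == 0 && mode == V16QI;
}

/* New check: keyed off whether the AES ISA is enabled.  */
static bool
force_evex_new (bool target_aes, enum vec_mode mode)
{
  return !target_aes && mode == V16QI;
}
```

With AES disabled and the latter alternative chosen, the old check
returns false, so no {evex} prefix is emitted although one is needed; the
new check returns true. With AES enabled and alternative 0 chosen, the old
check forces EVEX needlessly while the new one does not.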

gcc/

* config/i386/sse.md (vaesdec_<mode>, vaesdeclast_<mode>,
vaesenc_<mode>, vaesenclast_<mode>): Replace which_alternative
check by TARGET_AES one.
---
As an aside - {evex} (and other) pseudo-prefixes would better be avoided
anyway whenever possible, as those are getting in the way of code
putting in place macro overrides for certain insns: gas 2.43 rejects
such bogus placement of pseudo-prefixes.

--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -30802,7 +30802,7 @@
  UNSPEC_VAESDEC))]
   "TARGET_VAES"
 {
-  if (which_alternative == 0 && <MODE>mode == V16QImode)
+  if (!TARGET_AES && <MODE>mode == V16QImode)
 return "%{evex%} vaesdec\t{%2, %1, %0|%0, %1, %2}";
   else
 return "vaesdec\t{%2, %1, %0|%0, %1, %2}";
@@ -30816,7 +30816,7 @@
  UNSPEC_VAESDECLAST))]
   "TARGET_VAES"
 {
-  if (which_alternative == 0 && <MODE>mode == V16QImode)
+  if (!TARGET_AES && <MODE>mode == V16QImode)
 return "%{evex%} vaesdeclast\t{%2, %1, %0|%0, %1, %2}";
   else
 return "vaesdeclast\t{%2, %1, %0|%0, %1, %2}";
@@ -30830,7 +30830,7 @@
  UNSPEC_VAESENC))]
   "TARGET_VAES"
 {
-  if (which_alternative == 0 && <MODE>mode == V16QImode)
+  if (!TARGET_AES && <MODE>mode == V16QImode)
 return "%{evex%} vaesenc\t{%2, %1, %0|%0, %1, %2}";
   else
 return "vaesenc\t{%2, %1, %0|%0, %1, %2}";
@@ -30844,7 +30844,7 @@
  UNSPEC_VAESENCLAST))]
   "TARGET_VAES"
 {
-  if (which_alternative == 0 && <MODE>mode == V16QImode)
+  if (!TARGET_AES && <MODE>mode == V16QImode)
 return "%{evex%} vaesenclast\t{%2, %1, %0|%0, %1, %2}";
   else
 return "vaesenclast\t{%2, %1, %0|%0, %1, %2}";
