Re: [PATCH v2] cse: avoid signed overflow in compute_const_anchors [PR 104843]

2022-03-10 Thread Richard Biener via Gcc-patches
On Wed, Mar 9, 2022 at 5:12 PM Xi Ruoyao  wrote:
>
> On Wed, 2022-03-09 at 15:55 +0100, Richard Biener wrote:
>
> > isn't it better to make targetm.const_anchor unsigned?
> > The & and ~ are not subject to overflow rules.
>
> It's not enough: if n is the minimum value of HOST_WIDE_INT and
> const_anchor = 0x8000 (the value for MIPS), we'll have a signed 0x7fff
> in *upper_base.  Then the next line, "*upper_offs = n - *upper_base;"
> will be a signed overflow again.
>
> How about the following?

Hmm, so all this seems to be to round CST up and down to a multiple of
CONST_ANCHOR.  It works on CONST_INT only, which is sign-extended, so if
there is overflow the resulting anchor is broken as far as I can see.  So
instead of papering over this issue the function should return false when
n is negative, since then n & ~(targetm.const_anchor - 1) is also not n
rounded down to a multiple of const_anchor.
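
To make the rounding under discussion concrete, here is a minimal
stand-alone sketch; it is not the GCC code itself.  compute_anchors,
Anchors and the fixed-width types are hypothetical stand-ins for
compute_const_anchors and HOST_WIDE_INT.  All arithmetic is done in
unsigned, so nothing overflows in the signed sense, and the conversions
back to int64_t assume two's-complement wraparound (guaranteed since
C++20, implementation-defined before that).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the four out-parameters of compute_const_anchors.
struct Anchors {
  std::int64_t lower_base, lower_offs, upper_base, upper_offs;
};

// Round n down and up to a multiple of `anchor` (assumed to be a power of
// two, e.g. 0x8000) using unsigned arithmetic only.  Returns false when n
// is already a multiple of `anchor`.
inline bool compute_anchors(std::int64_t n_signed, std::uint64_t anchor,
                            Anchors *out) {
  assert(anchor != 0 && (anchor & (anchor - 1)) == 0);
  const std::uint64_t n = static_cast<std::uint64_t>(n_signed);
  const std::uint64_t mask = ~(anchor - 1);
  const std::uint64_t lb = n & mask;
  if (lb == n)
    return false;
  // The addition may wrap around, but unsigned wraparound is well defined.
  const std::uint64_t ub = (n + (anchor - 1)) & mask;
  out->lower_base = static_cast<std::int64_t>(lb);
  out->lower_offs = static_cast<std::int64_t>(n - lb);
  out->upper_base = static_cast<std::int64_t>(ub);
  out->upper_offs = static_cast<std::int64_t>(n - ub);
  return true;
}
```

With anchor = 0x8000, n = 0x12345 gives a lower base of 0x10000 and an
upper base of 0x18000.  For n = INT64_MAX the upper base wraps around to
INT64_MIN: no undefined behavior occurs, but the result is no longer n
rounded up, which is the kind of broken anchor this reply is worried about.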

But of course I know nothing about this ..

Richard.

> -- >8 --
>
> With a non-zero const_anchor, the behavior of this function relied on
> signed overflow.
>
> gcc/
>
> PR rtl-optimization/104843
> * cse.cc (compute_const_anchors): Use unsigned HOST_WIDE_INT for
> n to perform overflow arithmetics safely.
> ---
>  gcc/cse.cc | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/cse.cc b/gcc/cse.cc
> index a18b599d324..052fa0c3490 100644
> --- a/gcc/cse.cc
> +++ b/gcc/cse.cc
> @@ -1169,12 +1169,12 @@ compute_const_anchors (rtx cst,
>                        HOST_WIDE_INT *lower_base, HOST_WIDE_INT *lower_offs,
>                        HOST_WIDE_INT *upper_base, HOST_WIDE_INT *upper_offs)
>  {
> -  HOST_WIDE_INT n = INTVAL (cst);
> -
> -  *lower_base = n & ~(targetm.const_anchor - 1);
> -  if (*lower_base == n)
> +  unsigned HOST_WIDE_INT n = UINTVAL (cst);
> +  unsigned HOST_WIDE_INT lb = n & ~(targetm.const_anchor - 1);
> +  if (lb == n)
>      return false;
>
> +  *lower_base = lb;
>    *upper_base =
>      (n + (targetm.const_anchor - 1)) & ~(targetm.const_anchor - 1);
>    *upper_offs = n - *upper_base;
> --
> 2.35.1
>
>
> >


Re: [PATCH v2] Add TARGET_MOVE_WITH_MODE_P

2022-03-10 Thread Richard Biener via Gcc-patches
On Wed, Mar 9, 2022 at 7:04 PM Richard Sandiford
 wrote:
>
> Richard Biener via Gcc-patches  writes:
> > On Wed, Mar 2, 2022 at 10:18 PM H.J. Lu  wrote:
> >>
> >> On Wed, Mar 02, 2022 at 09:51:26AM +0100, Richard Biener wrote:
> >> > On Tue, Mar 1, 2022 at 11:41 PM H.J. Lu via Gcc-patches
> >> >  wrote:
> >> > >
> >> > > Add TARGET_FOLD_MEMCPY_MAX for the maximum number of bytes to fold
> >> > > memcpy.  The default is
> >> > >
> >> > > MOVE_MAX * MOVE_RATIO (optimize_function_for_size_p (cfun))
> >> > >
> >> > > For x86, it is MOVE_MAX to restore the old behavior before
> >> >
> >> > I know we've discussed this to death in the PR, I just want to repeat
> >> > here that the GIMPLE folding expects to generate a single load and a
> >> > single store (that is what it does on the GIMPLE level) which is why
> >> > MOVE_MAX was chosen originally (it's documented as what a "single
> >> > instruction" does).
> >> > In practice MOVE_MAX does not seem to cover vector register sizes
> >> > so Richard pulled MOVE_RATIO which is really intended to cover
> >> > the case of using multiple instructions for moving memory (but then I
> >> > don't remember whether for the ARM case the single load/store GIMPLE
> >> > will be expanded to multiple load/store instructions).
> >> >
> >> > TARGET_FOLD_MEMCPY_MAX sounds like a stop-gap solution,
> >> > being very specific for memcpy folding (we also fold memmove btw).
> >> >
> >> > There is also MOVE_MAX_PIECES which _might_ be more appropriate
> >> > than MOVE_MAX here and still honor the idea of single instructions.
> >> > Now neither arm nor aarch64 define this and it defaults to MOVE_MAX,
> >> > not MOVE_MAX * MOVE_RATIO.
> >> >
> >> > So if we need a new hook then that hook should at least get the
> >> > 'speed' argument of MOVE_RATIO and it should get a better name.
> >> >
> >> > I still think that it should be possible to improve the insn check to
> >> > avoid use of "disabled" modes, maybe that's also a point to add
> >> > a new hook like .move_with_mode_p or so?  To quote, we do
> >>
> >> Here is the v2 patch to add TARGET_MOVE_WITH_MODE_P.
> >
> > Again I'd like to shine light on MOVE_MAX_PIECES which explicitly
> > mentions "a load or store used TO COPY MEMORY" (emphasis mine)
> > and whose x86 implementation would already be fine (doing larger moves
> > and also not doing too large moves).  But apparently the arm folks
> > decided that that's not fit and instead (mis-?)used MOVE_MAX * MOVE_RATIO.
>
> It seems like there are old comments and old documentation that justify
> both interpretations, so there are good arguments on both sides.  But
> with this kind of thing I think we have to infer the meaning of the
> macro from the way it's currently used, rather than trusting such old
> and possibly out-of-date and contradictory information.
>
> FWIW, I agree that (if we exclude old reload, which we should!) the
> only direct uses of MOVE_MAX before the patch were not specific to
> integer registers and so MOVE_MAX should include vectors if the
> target wants vector modes to be used for general movement.
>
> Even if people disagree that that's the current meaning, I think it's
> at least a sensible meaning.  It provides information that AFAIK isn't
> available otherwise, and it avoids overlap with MAX_FIXED_MODE_SIZE.
>
> So FWIW, I think it'd be reasonable to change non-x86 targets if they
> want vector modes to be used for single-insn copies.

Note a slight complication in the GIMPLE folding case is that we
do not end up using vector modes but we're using "fake"
integer modes like OImode which x86 has move patterns for.
If we'd use vector modes we could use existing target hooks to
eventually decide whether auto-using those is desired or not.
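
For readers following along, the kind of source this folding targets can
be sketched as below.  Block32 and copy_block are hypothetical names.
Whether a given GCC turns the call into one 256-bit (OImode) load and
store, into two 128-bit moves, or leaves the library call alone depends
on the target, the enabled ISA and the macros and hooks discussed in this
thread, so no particular code generation is claimed here.

```cpp
#include <cstring>

// Hypothetical 32-byte object: the size of one 256-bit vector register.
struct Block32 {
  unsigned char bytes[32];
};

// A fixed-size memcpy like this is a candidate for being folded into a
// single load and a single store at the GIMPLE level.
inline void copy_block(Block32 *dst, const Block32 *src) {
  std::memcpy(dst, src, sizeof(Block32));
}
```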

>
> Thanks,
> Richard


Re: [PATCH v2] Add TARGET_MOVE_WITH_MODE_P

2022-03-10 Thread Richard Biener via Gcc-patches
On Wed, Mar 9, 2022 at 7:08 PM H.J. Lu  wrote:
>
> On Wed, Mar 9, 2022 at 12:25 AM Richard Biener
>  wrote:
> >
> > On Tue, Mar 8, 2022 at 4:44 PM H.J. Lu  wrote:
> > >
> > > On Mon, Mar 7, 2022 at 5:45 AM Richard Biener
> > >  wrote:
> > > >
> > > > On Wed, Mar 2, 2022 at 10:18 PM H.J. Lu  wrote:
> > > > >
> > > > > On Wed, Mar 02, 2022 at 09:51:26AM +0100, Richard Biener wrote:
> > > > > > On Tue, Mar 1, 2022 at 11:41 PM H.J. Lu via Gcc-patches
> > > > > >  wrote:
> > > > > > >
> > > > > > > > Add TARGET_FOLD_MEMCPY_MAX for the maximum number of bytes to
> > > > > > > > fold memcpy.  The default is
> > > > > > >
> > > > > > > MOVE_MAX * MOVE_RATIO (optimize_function_for_size_p (cfun))
> > > > > > >
> > > > > > > For x86, it is MOVE_MAX to restore the old behavior before
> > > > > >
> > > > > > > I know we've discussed this to death in the PR, I just want to
> > > > > > > repeat here that the GIMPLE folding expects to generate a single
> > > > > > > load and a single store (that is what it does on the GIMPLE level)
> > > > > > > which is why MOVE_MAX was chosen originally (it's documented as
> > > > > > > what a "single instruction" does).
> > > > > > > In practice MOVE_MAX does not seem to cover vector register sizes
> > > > > > > so Richard pulled MOVE_RATIO which is really intended to cover
> > > > > > > the case of using multiple instructions for moving memory (but
> > > > > > > then I don't remember whether for the ARM case the single
> > > > > > > load/store GIMPLE will be expanded to multiple load/store
> > > > > > > instructions).
> > > > > >
> > > > > > TARGET_FOLD_MEMCPY_MAX sounds like a stop-gap solution,
> > > > > > being very specific for memcpy folding (we also fold memmove btw).
> > > > > >
> > > > > > There is also MOVE_MAX_PIECES which _might_ be more appropriate
> > > > > > than MOVE_MAX here and still honor the idea of single instructions.
> > > > > > Now neither arm nor aarch64 define this and it defaults to MOVE_MAX,
> > > > > > not MOVE_MAX * MOVE_RATIO.
> > > > > >
> > > > > > So if we need a new hook then that hook should at least get the
> > > > > > 'speed' argument of MOVE_RATIO and it should get a better name.
> > > > > >
> > > > > > > I still think that it should be possible to improve the insn
> > > > > > > check to avoid use of "disabled" modes, maybe that's also a point
> > > > > > > to add a new hook like .move_with_mode_p or so?  To quote, we do
> > > > >
> > > > > Here is the v2 patch to add TARGET_MOVE_WITH_MODE_P.
> > > >
> > > > > Again I'd like to shine light on MOVE_MAX_PIECES which explicitly
> > > > > mentions "a load or store used TO COPY MEMORY" (emphasis mine)
> > > > > and whose x86 implementation would already be fine (doing larger moves
> > > > > and also not doing too large moves).  But apparently the arm folks
> > > > > decided that that's not fit and instead (mis-?)used
> > > > > MOVE_MAX * MOVE_RATIO.
> > > >
> > > > Yes, MOVE_MAX_PIECES is documented to apply to move_by_pieces.
> > > > Still GIMPLE memcpy/memmove inlining wants to mimic exactly that but
> > > > restrict itself to a single load and a single store.
> > > >
> > > > > >
> > > > > > >   scalar_int_mode mode;
> > > > > > >   if (int_mode_for_size (ilen * 8, 0).exists (&mode)
> > > > > > >       && GET_MODE_SIZE (mode) * BITS_PER_UNIT == ilen * 8
> > > > > > >       && have_insn_for (SET, mode)
> > > > > > >       /* If the destination pointer is not aligned we must be able
> > > > > > >          to emit an unaligned store.  */
> > > > > > >       && (dest_align >= GET_MODE_ALIGNMENT (mode)
> > > > > > >           || !targetm.slow_unaligned_access (mode, dest_align)
> > > > > > >           || (optab_handler (movmisalign_optab, mode)
> > > > > > >               != CODE_FOR_nothing)))
> > > > > >
> > > > > > > where I understand the ISA is enabled and if the user explicitly
> > > > > > > uses it that's OK but -mprefer-avx128 should tell GCC to never
> > > > > > > generate AVX256 code where the user was not explicitly using it
> > > > > > > (still for example glibc might happily use AVX256 code to implement
> > > > > > > the memcpy we are folding!)
> > > > > > >
> > > > > > > Note the BB vectorizer also might end up with using AVX256 because
> > > > > > > in places it also relies on optab queries and the
> > > > > > > vector_mode_supported_p check (but the memcpy folding uses the fake
> > > > > > > integer modes).  So x86 might need to implement the related_mode
> > > > > > > hook to avoid "auto"-using a larger vector mode which the default
> > > > > > > implementation would happily do.
> > > > > >
> > > > > > Richard.
> > > > >
> > > > > OK for master?
> > > >
> > > > Looking for opinions from others as well.
> > > >
> > > > Btw, there's a similar use in expand_DEFERRED_INIT:
> > > >
> > > >   && int_mode_for_size (tree

Re: [PATCH] rs6000: Improve .machine

2022-03-10 Thread Sebastian Huber

On 04/03/2022 17:51, Segher Boessenkool wrote:

> Hi!
>
> This adds more correct .machine for most older CPUs.  It should be
> conservative in the sense that everything we handled before we handle at
> least as well now.  This does not yet revamp the server CPU handling, it
> is too risky at this point in time.
>
> Tested on powerpc64-linux {-m32,-m64}.  Also manually tested with all
> -mcpu=, and the output of that passed through the GNU assembler.
>
> I plan to commit this later today.


Could this be backported to GCC 10 and 11? It would fix the following
issue for -mcpu=405:


Error: unrecognized opcode: `dlmzb.'

--
embedded brains GmbH
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Registration court: Amtsgericht München
Registration number: HRB 157899
Managing directors authorized to represent: Peter Rasmussen, Thomas Dörfler
Our privacy policy can be found here:
https://embedded-brains.de/datenschutzerklaerung/


[PATCH] c++: allow variadic operator[] for C++23 [PR103460]

2022-03-10 Thread Jakub Jelinek via Gcc-patches
Hi!

wg21.link/p2128 removed "with exactly one parameter" from over.sub
section.  grok_op_properties has for that the last 2 lines in:
    case OVL_OP_FLAG_BINARY:
      if (arity != 2)
        {
          if (operator_code == ARRAY_REF && cxx_dialect >= cxx23)
            break;
but unfortunately it isn't enough, we reject variadic operator[]
earlier.  The following patch accepts variadic operator[] for C++23
too.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-03-10  Jakub Jelinek  

PR c++/103460
* decl.cc (grok_op_properties): Allow variadic operator[] for
C++23.

* g++.dg/cpp23/subscript7.C: New test.

--- gcc/cp/decl.cc.jj   2022-03-09 15:24:57.285926439 +0100
+++ gcc/cp/decl.cc  2022-03-09 16:56:41.053901657 +0100
@@ -15214,6 +15214,9 @@ grok_op_properties (tree decl, bool comp
       if (!arg)
         {
           /* Variadic.  */
+          if (operator_code == ARRAY_REF && cxx_dialect >= cxx23)
+            break;
+
           error_at (loc, "%qD must not have variable number of arguments",
                     decl);
           return false;
@@ -15289,7 +15292,8 @@ grok_op_properties (tree decl, bool comp
     }
 
   /* There can be no default arguments.  */
-  for (tree arg = argtypes; arg != void_list_node; arg = TREE_CHAIN (arg))
+  for (tree arg = argtypes; arg && arg != void_list_node;
+       arg = TREE_CHAIN (arg))
     if (TREE_PURPOSE (arg))
       {
         TREE_PURPOSE (arg) = NULL_TREE;
--- gcc/testsuite/g++.dg/cpp23/subscript7.C.jj  2022-03-09 17:02:22.915179262 +0100
+++ gcc/testsuite/g++.dg/cpp23/subscript7.C 2022-03-09 17:02:18.446240994 +0100
@@ -0,0 +1,17 @@
+// PR c++/103460
+// { dg-do compile }
+// { dg-options "-std=c++23" }
+
+struct S {
+  int &operator[] (int, ...);
+} s;
+struct T {
+  int &operator[] (auto...);
+} t;
+struct U {
+  int &operator[] (...);
+} u;
+
+int a = s[1] + s[2, 1] + s[3, 2, 1] + s[4, 3, 2, 1]
+   + t[0.0] + t[nullptr, s, 42]
+   + u[] + u[42] + u[1.5L, 1LL];

Jakub



Re: [Patch] Fortran: OpenMP/OpenACC avoid uninit access in size calc for mapping

2022-03-10 Thread Thomas Schwinge
Hi Tobias!

On 2022-03-08T15:25:07+0100, Tobias Burnus  wrote:
> found when working on the deep-mapping patch* with OpenMP code
> (and part of that patch) but it already shows up in an existing
> OpenACC testcase. I think it makes sense to fix it already for GCC 12.
>
> Problem: Also for unallocated allocatables, their size was
> calculated - the 'if(desc.data == NULL)' check was only added
> for pointers.
>
> Result after the patch: When compiling with -O (which is the default
> for goacc.exp), the warning now disappears. Thus, I now use '-O0'
> and the previous "is uninitialized" is now "may be uninitialized".
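
The guard described in the quoted text can be illustrated with a small
stand-alone sketch.  ArrayDescriptor and mapped_size_bytes are
hypothetical and much simpler than the real gfortran array descriptor
(one dimension only), and treating the size of an unallocated array as
zero is an assumption made for the illustration.

```cpp
#include <cstddef>

// Hypothetical, simplified one-dimensional array descriptor; the real
// gfortran descriptor has more fields and supports multiple dimensions.
struct ArrayDescriptor {
  void *data;             // null when the array is unallocated
  std::ptrdiff_t lbound;  // lower bound, only meaningful when allocated
  std::ptrdiff_t ubound;  // upper bound, only meaningful when allocated
};

// Size in bytes to map.  The bounds are read only when data is non-null,
// so an unallocated array never causes an uninitialized read.
inline std::size_t mapped_size_bytes(const ArrayDescriptor &desc,
                                     std::size_t elem_size) {
  if (desc.data == nullptr)
    return 0;
  const std::ptrdiff_t extent = desc.ubound - desc.lbound + 1;
  return extent > 0 ? static_cast<std::size_t>(extent) * elem_size : 0;
}
```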

I recently added that checking in
commit 4bd8b1e881f0c26a5103cd1919809b3d63b60ef2
"Document current '-Wuninitialized'/'-Wmaybe-uninitialized' diagnostics
for OpenACC test cases", to document the status quo.

I'll leave it to you to decide what is more appropriate: (1), as you have
proposed, add '-O0' (but with source code comment, please); something
like:

 ! { dg-additional-options -Wuninitialized }
+! Trigger "may be used uninitialized".
+! { dg-additional-options -O0 }

..., or (2): update the test cases to simply reflect diagnostics that are
now (no longer) seen with (default) '-O' (rationale: the test cases
haven't originally been written for the '-Wuninitialized' diagnostics;
that's just tested additionally, and using '-O0' instead of '-O' may be
disturbing what they originally meant to test?), or (3): duplicate the
test cases to account for both (1) and (2) (in other words: write
dedicated test cases for your GCC/Fortran front end changes (for example,
based on the ones you've modified here), and for the existing test cases
apply (2)).  The latter, (3), would be my approach.

> Unrelated to the patch and the testcase, I added some
> 'allocate'**/'if(allocated())' to the testcase - as otherwise
> uninit vars would be accessed. (Not relevant for the warning
> or the patch - but I prefer no invalid code in testcases,
> if it can be avoided.)

Agreed in principle, but again: I don't know what these test cases
originally have been testing?

> OK for mainline?

I can't comment on the GCC/Fortran front end changes -- so unless
somebody else speaks up, that's an implicit approval for those, I
suppose.  ;-)

> Tobias
> * https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591144.html

> ** I am actually not sure whether 'acc update(b)' will/should map a
> previous allocated variable - or whether it should.

(Are the typos here: in "will/should map": 's%map%update', and in
"or whether it should": 's%should%shouldn't'?)

> But that's
> unrelated to this bug fix. See also: https://gcc.gnu.org/PR96668
> for the re-mapping in OpenMP (works for arrays but not scalars).

I don't quickly dig that, sorry.  Do we need to first clarify that with
OpenACC Technical Committee, or is this just a GCC/OpenACC implementation
issue?


Regards
 Thomas


> Fortran: OpenMP/OpenACC avoid uninit access in size calc for mapping
>
> gcc/fortran/ChangeLog:
>
>   * trans-openmp.cc (gfc_trans_omp_clauses, gfc_omp_finish_clause):
>   Obtain size for mapping only if allocatable array is allocated.
>
> gcc/testsuite/ChangeLog:
>
>   * gfortran.dg/goacc/array-with-dt-1.f90: Run with -O0 and
>   update dg-warning.
>   * gfortran.dg/goacc/pr93464.f90: Likewise.
>
>  gcc/fortran/trans-openmp.cc |  6 --
>  gcc/testsuite/gfortran.dg/goacc/array-with-dt-1.f90 | 12 +---
>  gcc/testsuite/gfortran.dg/goacc/pr93464.f90 |  8 
>  3 files changed, 17 insertions(+), 9 deletions(-)
>
> diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
> index 4d56a771349..fad76a4791f 100644
> --- a/gcc/fortran/trans-openmp.cc
> +++ b/gcc/fortran/trans-openmp.cc
> @@ -1597,7 +1597,8 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool openacc)
>tree size = create_tmp_var (gfc_array_index_type);
>tree elemsz = TYPE_SIZE_UNIT (gfc_get_element_type (type));
>elemsz = fold_convert (gfc_array_index_type, elemsz);
> -  if (GFC_TYPE_ARRAY_AKIND (type) == GFC_ARRAY_POINTER
> +  if (GFC_TYPE_ARRAY_AKIND (type) == GFC_ARRAY_ALLOCATABLE
> +   || GFC_TYPE_ARRAY_AKIND (type) == GFC_ARRAY_POINTER
> || GFC_TYPE_ARRAY_AKIND (type) == GFC_ARRAY_POINTER_CONT)
>   {
> stmtblock_t cond_block;
> @@ -3208,7 +3209,8 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
>
> /* We have to check for n->sym->attr.dimension because
>of scalar coarrays.  */
> -   if (n->sym->attr.pointer && n->sym->attr.dimension)
> +   if ((n->sym->attr.pointer || n->sym->attr.allocatable)
> +   && n->sym->attr.dimension)
>   {
> stmtblock_t cond_block;
> tree size
> diff --git a/gcc/testsuite/gfortran.dg/goacc/array-with-dt-1.f90 
> b/gcc/testsuite/gfortran.dg/

Re: [PATCH] rs6000, v3: Fix up __SIZEOF_{FLOAT, IBM}128__ defines [PR99708]

2022-03-10 Thread Jakub Jelinek via Gcc-patches
On Wed, Mar 09, 2022 at 04:57:01PM -0600, Segher Boessenkool wrote:
> > > If you are fed up with all this, please commit what you have now (after
> > > testing of course ;-) ), and I'll pick up things myself.  Either way,
> > > thank you for all your work on this!
> > 
> > Ok, here is what I'll test momentarily:
> 
> Thanks again!

Unfortunately, while regtesting on powerpc64le-linux went fine
(except for
  l += 2.0;
in the last testcase should have been
  h += 2.0;
already fixed in my copy), on powerpc64-linux it regressed (both -m32 and
-m64) following testcase:

+FAIL: gcc.target/powerpc/convert-fp-128.c (test for excess errors)
+UNRESOLVED: gcc.target/powerpc/convert-fp-128.c scan-assembler-times mbl __dpd_extendddtd2M 1
+UNRESOLVED: gcc.target/powerpc/convert-fp-128.c scan-assembler-times mbl __dpd_extendddtfM 1
+UNRESOLVED: gcc.target/powerpc/convert-fp-128.c scan-assembler-times mbl __dpd_extenddfddM 1
+UNRESOLVED: gcc.target/powerpc/convert-fp-128.c scan-assembler-times mbl __dpd_extenddftdM 1
+UNRESOLVED: gcc.target/powerpc/convert-fp-128.c scan-assembler-times mbl __dpd_extendsddd2M 1
+UNRESOLVED: gcc.target/powerpc/convert-fp-128.c scan-assembler-times mbl __dpd_extendsddfM 1
+UNRESOLVED: gcc.target/powerpc/convert-fp-128.c scan-assembler-times mbl __dpd_extendsdtd2M 1
+UNRESOLVED: gcc.target/powerpc/convert-fp-128.c scan-assembler-times mbl __dpd_extendsdtfM 1
+UNRESOLVED: gcc.target/powerpc/convert-fp-128.c scan-assembler-times mbl __dpd_extendsfddM 1
+UNRESOLVED: gcc.target/powerpc/convert-fp-128.c scan-assembler-times mbl __dpd_extendsfsdM 1
+UNRESOLVED: gcc.target/powerpc/convert-fp-128.c scan-assembler-times mbl __dpd_extendsftdM 1
+UNRESOLVED: gcc.target/powerpc/convert-fp-128.c scan-assembler-times mbl __dpd_extendtftdM 1
+UNRESOLVED: gcc.target/powerpc/convert-fp-128.c scan-assembler-times mbl __dpd_truncdddfM 1
+UNRESOLVED: gcc.target/powerpc/convert-fp-128.c scan-assembler-times mbl __dpd_truncddsd2M 1
+UNRESOLVED: gcc.target/powerpc/convert-fp-128.c scan-assembler-times mbl __dpd_truncddsfM 1
+UNRESOLVED: gcc.target/powerpc/convert-fp-128.c scan-assembler-times mbl __dpd_truncdfsdM 1
+UNRESOLVED: gcc.target/powerpc/convert-fp-128.c scan-assembler-times mbl __dpd_truncsdsfM 1
+UNRESOLVED: gcc.target/powerpc/convert-fp-128.c scan-assembler-times mbl __dpd_trunctddd2M 1
+UNRESOLVED: gcc.target/powerpc/convert-fp-128.c scan-assembler-times mbl __dpd_trunctddfM 1
+UNRESOLVED: gcc.target/powerpc/convert-fp-128.c scan-assembler-times mbl __dpd_trunctdsd2M 1
+UNRESOLVED: gcc.target/powerpc/convert-fp-128.c scan-assembler-times mbl __dpd_trunctdsfM 1
+UNRESOLVED: gcc.target/powerpc/convert-fp-128.c scan-assembler-times mbl __dpd_trunctdtfM 1
+UNRESOLVED: gcc.target/powerpc/convert-fp-128.c scan-assembler-times mbl __dpd_trunctfddM 1
+UNRESOLVED: gcc.target/powerpc/convert-fp-128.c scan-assembler-times mbl __dpd_trunctfsdM 1
+UNRESOLVED: gcc.target/powerpc/convert-fp-128.c scan-assembler-times mblM 24
+UNRESOLVED: gcc.target/powerpc/convert-fp-128.c scan-assembler-times mfmrM 0
+UNRESOLVED: gcc.target/powerpc/convert-fp-128.c scan-assembler-times mfrsp|xsrspM 2
+UNRESOLVED: gcc.target/powerpc/convert-fp-128.c scan-assembler-times mlfdM 2

The problem is that previously on the pre-VSX -mcpu=
where we support only TFmode being double double we accepted both:
typedef float __attribute__((mode(IF))) mode_if;
typedef float __attribute__((mode(KF))) mode_kf;
in the testcase.  handle_mode_attribute calls
  /* Allow the target a chance to translate MODE into something supported.
 See PR86324.  */
  mode = targetm.translate_mode_attribute (mode);
and the rs6000 hook for it looks like:
/* Target hook for translate_mode_attribute.  */
static machine_mode
rs6000_translate_mode_attribute (machine_mode mode)
{
  if ((FLOAT128_IEEE_P (mode)
   && ieee128_float_type_node == long_double_type_node)
  || (FLOAT128_IBM_P (mode)
  && ibm128_float_type_node == long_double_type_node))
return COMPLEX_MODE_P (mode) ? E_TCmode : E_TFmode;
  return mode;
}
With the v3 patch, ibm128_float_type_node == long_double_type_node
in that case and IF -> TF translation looks correct to me, under
the hood it will do the same thing.
But the fact that it accepted KFmode before and silently handled
it like TFmode, where KFmode should be IEEE quad, while TFmode in this
case is double double looks like a bug to me.
So, I think we just need to adjust the testcase.
As mode_kf is only used #ifdef __FLOAT128_TYPE__, I've guarded its
definition with that condition too.

Thus, here is what I've committed in the end.

Note, really unsure about backports, this patch is quite large and
changes behavior here and there.  Probably easiest would be just
to revert th

Re: [PATCH] rs6000: Improve .machine

2022-03-10 Thread Segher Boessenkool
Hi!

On Thu, Mar 10, 2022 at 09:25:21AM +0100, Sebastian Huber wrote:
> On 04/03/2022 17:51, Segher Boessenkool wrote:
> >This adds more correct .machine for most older CPUs.  It should be
> >conservative in the sense that everything we handled before we handle at
> >least as well now.  This does not yet revamp the server CPU handling, it
> >is too risky at this point in time.
> >
> >Tested on powerpc64-linux {-m32,-m64}.  Also manually tested with all
> >-mcpu=, and the output of that passed through the GNU assembler.
> >
> >I plan to commit this later today.
> 
> Could this be backported to GCC 10 and 11? It would fix the following
> issue for -mcpu=405:
> 
> Error: unrecognized opcode: `dlmzb.'

Good to hear!

Unfortunately there is PR104829 about this commit.  I don't see how the
commit can break anything (that wasn't already broken); it's not clear
how it happens at all, and neither I nor my colleagues could reproduce it
so far.

So I won't backport it yet, but will first wait to see what happens there.

Thanks for the report,


Segher


[PATCH] libphobos: Enable on Solaris/SPARC or with /bin/as [PR 103528]

2022-03-10 Thread Rainer Orth
libphobos is currently only enabled on Solaris/x86 with gas.  As
discovered when gdc was switched to the dmd frontend, this initially
broke bootstrap for the other Solaris configurations.

However, it's now perfectly possible to enable it both for Solaris/x86 with
as and for Solaris/SPARC (with either as or gas), since the original
problems (the x86 as line-length limit among others) are long gone.

The following patch does just that.

Tested on i386-pc-solaris2.11 and sparc-sun-solaris2.11 (both as and
gas) with gdc 9.3.0 (x86) and 9.4.0 (sparc, configured with
--enable-libphobos), respectively, as bootstrap compilers.

Ok for trunk?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2021-12-01  Rainer Orth  

libphobos:
PR d/103528
* configure.ac : Remove
gas requirement.
* configure: Regenerate.
* configure.tgt (sparc*-*-solaris2.11*): Mark supported.

# HG changeset patch
# Parent  0c244e272e86453f81642685e5eee36c3996fadf
libphobos: Enable on Solaris/SPARC or with /bin/as [PR ?]

diff --git a/libphobos/configure.ac b/libphobos/configure.ac
--- a/libphobos/configure.ac
+++ b/libphobos/configure.ac
@@ -189,18 +189,6 @@ AC_MSG_CHECKING([for host support for li
 . ${srcdir}/configure.tgt
 case ${host} in
   x86_64-*-solaris2.* | i?86-*-solaris2.*)
-# libphobos doesn't compile with the Solaris/x86 assembler due to a
-# relatively low linelength limit.
-as_prog=`$CC -print-prog-name=as`
-if test -n "$as_prog" && $as_prog -v /dev/null 2>&1 | grep GNU > /dev/null 2>&1; then
-  druntime_cv_use_gas=yes;
-else
-  druntime_cv_use_gas=no;
-fi
-rm -f a.out
-if test x$druntime_cv_use_gas = xno; then
-  LIBPHOBOS_SUPPORTED=no
-fi
 # 64-bit D execution fails with Solaris ld without -z relax=transtls support.
 if test "$druntime_ld_gld" = "no" && test "$druntime_ld_relax_transtls" = "no"; then
   LIBPHOBOS_SUPPORTED=no
diff --git a/libphobos/configure.tgt b/libphobos/configure.tgt
--- a/libphobos/configure.tgt
+++ b/libphobos/configure.tgt
@@ -49,6 +49,9 @@ case "${target}" in
   s390*-linux*)
 	LIBPHOBOS_SUPPORTED=yes
 	;;
+  sparc*-*-solaris2.11*)
+	LIBPHOBOS_SUPPORTED=yes
+	;;
   x86_64-*-freebsd* | i?86-*-freebsd*)
 	LIBPHOBOS_SUPPORTED=yes
 	;;


Re: [PATCH] rs6000, v3: Fix up __SIZEOF_{FLOAT, IBM}128__ defines [PR99708]

2022-03-10 Thread Segher Boessenkool
Hi!

On Thu, Mar 10, 2022 at 10:35:36AM +0100, Jakub Jelinek wrote:
> On Wed, Mar 09, 2022 at 04:57:01PM -0600, Segher Boessenkool wrote:
> > > > If you are fed up with all this, please commit what you have now (after
> > > > testing of course ;-) ), and I'll pick up things myself.  Either way,
> > > > thank you for all your work on this!
> > > 
> > > Ok, here is what I'll test momentarily:
> > 
> > Thanks again!
> 
> Unfortunately, while regtesting on powerpc64le-linux went fine
> (except for
>   l += 2.0;
> in the last testcase should have been
>   h += 2.0;
> already fixed in my copy), on powerpc64-linux it regressed (both -m32 and
> -m64) following testcase:
> 
> +FAIL: gcc.target/powerpc/convert-fp-128.c (test for excess errors)



> The problem is that previously on the pre-VSX -mcpu=
> where we support only TFmode being double double we accepted both:
> typedef float __attribute__((mode(IF))) mode_if;
> typedef float __attribute__((mode(KF))) mode_kf;

There is no KFmode in that case, so the test case is just broken?  (It
should not depend on VSX at all, but that has been the situation since
forever).

It is not hard to ICE the compiler with bad mode attributes, this has
nothing to do with IEEE QP or anything.  It is comparable to how with
bad inline assembler you can cause ICEs (by giving RA no way out, that
it can see anyway).

> in the testcase.  handle_mode_attribute calls
>   /* Allow the target a chance to translate MODE into something supported.
>  See PR86324.  */
>   mode = targetm.translate_mode_attribute (mode);
> and the rs6000 hook for it looks like:
> /* Target hook for translate_mode_attribute.  */
> static machine_mode
> rs6000_translate_mode_attribute (machine_mode mode)
> {
>   if ((FLOAT128_IEEE_P (mode)
>&& ieee128_float_type_node == long_double_type_node)
>   || (FLOAT128_IBM_P (mode)
>   && ibm128_float_type_node == long_double_type_node))
> return COMPLEX_MODE_P (mode) ? E_TCmode : E_TFmode;
>   return mode;
> }

Bah.  That looks like a workaround for some other bug :-(

> With the v3 patch, ibm128_float_type_node == long_double_type_node
> in that case and IF -> TF translation looks correct to me, under
> the hood it will do the same thing.

We should not use IFmode at all there, because it will *not* do the same
thing: we do not handle IFmode in all cases there, only the few that use
this hook.

This needs to be fixed.  That needs fixes in generic code, that still
thinks there is a total order on the floating point modes; but there are
IFmode values that are not representable in KFmode, and KFmode values
that are not representable in IFmode.  Pretending they can be ordered
(in any direction) gives problems.  This is why we have these
workarounds.
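
The lack of a total order can be seen with two concrete values; the
following is only a sketch with hypothetical names (DoubleDouble,
bits_needed, fits_in_quad, quad_only_exponent), modelling a double-double
as the exact sum of two doubles.  The value 1 + 2^-200 is a valid
double-double (hi = 1, lo = 2^-200) but needs 201 significand bits, more
than the 113 bits of IEEE binary128.  Conversely, 2^2000 is a finite
binary128 value but overflows a double, and hence a double-double.

```cpp
#include <cassert>
#include <cmath>

// Facts about the formats involved (IEEE binary128 and IEEE binary64).
constexpr int kQuadPrecisionBits = 113;   // binary128 significand bits
constexpr int kQuadMaxExponent = 16383;   // largest binary128 exponent
constexpr int kDoubleMaxExponent = 1023;  // largest binary64 exponent

// Hypothetical model of an IBM double-double value: the exact sum hi + lo.
struct DoubleDouble {
  double hi, lo;
};

// Number of significand bits needed to hold hi + lo exactly, assuming both
// parts are powers of two with hi > lo > 0.
inline int bits_needed(const DoubleDouble &v) {
  assert(v.hi > v.lo && v.lo > 0.0);
  int exp_hi = 0, exp_lo = 0;
  (void)std::frexp(v.hi, &exp_hi);
  (void)std::frexp(v.lo, &exp_lo);
  return exp_hi - exp_lo + 1;
}

// True if the modelled double-double value is exactly representable in
// binary128 as far as precision is concerned.
inline bool fits_in_quad(const DoubleDouble &v) {
  return bits_needed(v) <= kQuadPrecisionBits;
}

// True if 2^e is a finite binary128 value but too large for a double.
inline bool quad_only_exponent(int e) {
  return e > kDoubleMaxExponent && e <= kQuadMaxExponent;
}
```

Neither format's set of values contains the other's, so any conversion
rule that assumes one mode is "wider" than the other will lose
information for some inputs.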

> But the fact that it accepted KFmode before and silently handled
> it like TFmode, where KFmode should be IEEE quad, while TFmode in this
> case is double double looks like a bug to me.

Yes.  That is exactly why I did not want KFmode to be handled as the
wrong (but valid) mode: it hides problems, and not in a harmless way,
it makes problems much harder to find.

> So, I think we just need to adjust the testcase.

Hopefully!  There may be other things still depending on this
translation, so this does not give me the warm fuzzies :-(

> As mode_kf is only used #ifdef __FLOAT128_TYPE__, I've guarded its
> definition with that condition too.
> 
> Thus, here is what I've committed in the end.
> 
> Note, really unsure about backports, this patch is quite large and
> changes behavior here and there.  Probably easiest would be just
> to revert those __SIZEOF_*128__ rs6000 change on release branches?

Yes.

> Or backport a strictly __SIZEOF_*128__ related change (such as
> use TARGET_FLOAT128_TYPE as the condition on whether to predefine
> those macros or not together with moving __SIZEOF_FLOAT128__ to the other
> function next to __float128)?

And then more projects would need extra checks for broken code, making
this whole __SIZEOF_* thing less useful?  Not a fan.  Also, this would
be not a real backport at all :-(

Thanks again,


Segher


Enhance further testcases to verify handling of OpenACC privatization level [PR90115]

2022-03-10 Thread Thomas Schwinge
Hi!

On 2021-05-21T21:29:19+0200, I wrote:
> I've pushed "[OpenACC privatization] Largely extend diagnostics and
> corresponding testsuite coverage [PR90115]" to master branch in commit
> 11b8286a83289f5b54e813f14ff56d730c3f3185

To demonstrate where later changes do and do not alter behavior, I've
pushed to master branch commit 1d9dc3dd74eddd192bec1ac6f4d6548a81deb9a5
"Enhance further testcases to verify handling of OpenACC privatization
level [PR90115]", see attached.


Regards
 Thomas


-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
München; limited liability company; Managing directors: Thomas Heurung,
Frank Thürauf; Registered office: München; Commercial register: München,
HRB 106955

From 1d9dc3dd74eddd192bec1ac6f4d6548a81deb9a5 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 8 Mar 2022 11:51:55 +0100
Subject: [PATCH] Enhance further testcases to verify handling of OpenACC
 privatization level [PR90115]

As originally introduced in commit 11b8286a83289f5b54e813f14ff56d730c3f3185
"[OpenACC privatization] Largely extend diagnostics and corresponding testsuite
coverage [PR90115]".

	PR middle-end/90115
	gcc/testsuite/
	* c-c++-common/goacc/nesting-1.c: Enhance.
	* gcc.dg/goacc/nested-function-1.c: Likewise.
	* gcc.dg/goacc/nested-function-2.c: Likewise.
	* gfortran.dg/goacc/nested-function-1.f90: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-fortran/routine-1.f90: Enhance.
	* testsuite/libgomp.oacc-fortran/routine-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/routine-3.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/routine-9.f90: Likewise.
---
 gcc/testsuite/c-c++-common/goacc/nesting-1.c  | 57 +
 .../gcc.dg/goacc/nested-function-1.c  | 54 
 .../gcc.dg/goacc/nested-function-2.c  | 28 -
 .../gfortran.dg/goacc/nested-function-1.f90   | 62 +++
 .../libgomp.oacc-fortran/routine-1.f90| 19 +-
 .../libgomp.oacc-fortran/routine-2.f90| 19 +-
 .../libgomp.oacc-fortran/routine-3.f90| 19 +-
 .../libgomp.oacc-fortran/routine-9.f90| 19 +-
 8 files changed, 227 insertions(+), 50 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/goacc/nesting-1.c b/gcc/testsuite/c-c++-common/goacc/nesting-1.c
index cab4f98950d..83cbff767a4 100644
--- a/gcc/testsuite/c-c++-common/goacc/nesting-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/nesting-1.c
@@ -1,3 +1,15 @@
+/* { dg-additional-options "-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" } */
+
+/* It's only with Tcl 8.5 (released in 2007) that "the variable 'varName'
+   passed to 'incr' may be unset, and in that case, it will be set to [...]",
+   so to maintain compatibility with earlier Tcl releases, we manually
+   initialize counter variables:
+   { dg-line l_dummy[variable c_compute 0 c_loop_i 0] }
+   { dg-message dummy {} { target iN-VAl-Id } l_dummy } to avoid
+   "WARNING: dg-line var l_dummy defined, but not used".  */
+
 extern int i;
 
 void
@@ -5,7 +17,11 @@ f_acc_parallel (void)
 {
 #pragma acc parallel
   {
-#pragma acc loop
+#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+/* { dg-note {variable 'i\.[0-9]+' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
+/* { dg-note {variable 'i' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_loop_i$c_loop_i }
+   { dg-note {variable 'i' ought to be adjusted for OpenACC privatization level: 'vector'} {} { target *-*-* } l_loop_i$c_loop_i } */
+/* { dg-optimized {assigned OpenACC gang vector loop parallelism} {} { target *-*-* } l_loop_i$c_loop_i } */
 for (i = 0; i < 2; ++i)
   ;
   }
@@ -15,9 +31,12 @@ f_acc_parallel (void)
 void
 f_acc_kernels (void)
 {
-#pragma acc kernels
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } l_compute$c_compute } */
   {
-#pragma acc loop
+#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+/* { dg-note {variable 'i\.[0-9]+' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
+/* { dg-note {variable 'i' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_loop_i$c_loop_i } */
 for (i = 0; i < 2; ++i)
   ;
   }
@@ -34,17 +53,25 @@ f_acc_data (void)
 
 #pragma acc parallel
 {
-#pragma acc loop
+#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+  /* { dg-note {variable 'i\.[0-9]+' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
+  /* { dg-note {variable 'i' in 'private' clause is candidate for adjusting OpenACC privatization lev

Add 'gfortran.dg/goacc-gomp/pr102330-{1,2,3}.f90' [PR102330]

2022-03-10 Thread Thomas Schwinge
Hi!

On 2022-02-14T16:56:35+0100, I wrote:
> [...] give rise to  "[12 Regression] ICE in
> expand_gimple_stmt_1, at cfgexpand.c:3932 since r12-980-g29a2f51806c".

Pushed to master branch commit 687091257820f4a6a005186437917270ecd27416
"Add 'gfortran.dg/goacc-gomp/pr102330-{1,2,3}.f90' [PR102330]", see
attached: currently XFAILed with 'dg-ice'.


Grüße
 Thomas


>From 687091257820f4a6a005186437917270ecd27416 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 27 Jan 2022 14:17:28 +0100
Subject: [PATCH] Add 'gfortran.dg/goacc-gomp/pr102330-{1,2,3}.f90' [PR102330]

..., currently XFAILed with 'dg-ice'.

	PR middle-end/102330
	gcc/testsuite/
	* gfortran.dg/goacc-gomp/pr102330-1.f90: New file.
	* gfortran.dg/goacc-gomp/pr102330-2.f90: Likewise.
	* gfortran.dg/goacc-gomp/pr102330-3.f90: Likewise.
---
 .../gfortran.dg/goacc-gomp/pr102330-1.f90 | 20 +
 .../gfortran.dg/goacc-gomp/pr102330-2.f90 | 20 +
 .../gfortran.dg/goacc-gomp/pr102330-3.f90 | 22 +++
 3 files changed, 62 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-3.f90

diff --git a/gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-1.f90 b/gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-1.f90
new file mode 100644
index 000..fba8c718dc2
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-1.f90
@@ -0,0 +1,20 @@
+! { dg-additional-options -fchecking }
+! { dg-ice TODO }
+
+! { dg-additional-options -fopt-info-omp-note }
+
+! { dg-additional-options --param=openacc-privatization=noisy }
+
+program p
+  !$omp master taskloop simd
+  do i = 1, 8
+  end do
+  !$acc parallel loop ! { dg-line l_compute1 }
+  ! { dg-note {variable 'i' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_compute1 }
+  do i = 1, 8
+  end do
+end
+! { dg-bogus {Error: non-register as LHS of binary operation} TODO { target { ! offloading_enabled } xfail *-*-* } .-1 }
+! { dg-bogus {error: non-register as LHS of binary operation} TODO { target offloading_enabled xfail *-*-* } .-2 }
+! TODO See PR101551 for 'offloading_enabled' differences.
+! { dg-excess-errors ICE }
diff --git a/gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-2.f90 b/gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-2.f90
new file mode 100644
index 000..7a1ce8b088c
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-2.f90
@@ -0,0 +1,20 @@
+! { dg-additional-options -fchecking }
+! { dg-ice TODO }
+
+! { dg-additional-options -fopt-info-omp-note }
+
+! { dg-additional-options --param=openacc-privatization=noisy }
+
+program p
+  !$omp taskloop lastprivate(i)
+  do i = 1, 8
+  end do
+  !$acc parallel loop ! { dg-line l_compute1 }
+  ! { dg-note {variable 'i' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_compute1 }
+  do i = 1, 8
+  end do
+end
+! { dg-bogus {Error: non-register as LHS of binary operation} TODO { target { ! offloading_enabled } xfail *-*-* } .-1 }
+! { dg-bogus {error: non-register as LHS of binary operation} TODO { target offloading_enabled xfail *-*-* } .-2 }
+! TODO See PR101551 for 'offloading_enabled' differences.
+! { dg-excess-errors ICE }
diff --git a/gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-3.f90 b/gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-3.f90
new file mode 100644
index 000..b8b1479c7ea
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc-gomp/pr102330-3.f90
@@ -0,0 +1,22 @@
+! { dg-additional-options -fchecking }
+! { dg-ice TODO }
+
+! { dg-additional-options -fopt-info-omp-note }
+
+! { dg-additional-options --param=openacc-privatization=noisy }
+
+program p
+  i = 0
+  !$omp task shared(i)
+  i = 1
+  !$omp end task
+  !$omp taskwait
+  !$acc parallel loop ! { dg-line l_compute1 }
+  ! { dg-note {variable 'i' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_compute1 }
+  do i = 1, 8
+  end do
+end
+! { dg-bogus {Error: non-register as LHS of binary operation} TODO { target { ! offloading_enabled } xfail *-*-* } .-1 }
+! { dg-bogus {error: non-register as LHS of binary operation} TODO { target offloading_enabled xfail *-*-* } .-2 }
+! TODO See PR101551 for 'offloading_enabled' differences.
+! { dg-excess-errors ICE }
-- 
2.34.1



Add 'c-c++-common/goacc/kernels-decompose-pr104774-1.c' [PR104774]

2022-03-10 Thread Thomas Schwinge
Hi!

On 2022-03-10T12:13:29+0100, I wrote:
> On 2022-02-14T16:56:35+0100, I wrote:
>> [...] give rise to  "[12 Regression] ICE in
>> expand_gimple_stmt_1, at cfgexpand.c:3932 since r12-980-g29a2f51806c".
>
> Pushed to master branch commit 687091257820f4a6a005186437917270ecd27416
> "Add 'gfortran.dg/goacc-gomp/pr102330-{1,2,3}.f90' [PR102330]", see
> attached: currently XFAILed with 'dg-ice'.

Well, and as I later figured out, the very same problem/fix is what
causes/cures recently-filed  "OpenACC
'kernels' decomposition: internal compiler error: 'verify_gimple' failed,
with 'loop' with explicit 'seq' or 'independent'"!

Pushed to master branch commit 448741533a75862ebf51d8e73eb1dd1f6a47eec5
"Add 'c-c++-common/goacc/kernels-decompose-pr104774-1.c' [PR104774]", see
attached: currently XFAILed with 'dg-ice'.


Grüße
 Thomas


>From 448741533a75862ebf51d8e73eb1dd1f6a47eec5 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 3 Mar 2022 18:00:52 +0100
Subject: [PATCH] Add 'c-c++-common/goacc/kernels-decompose-pr104774-1.c'
 [PR104774]

..., currently XFAILed with 'dg-ice'.

	PR middle-end/104774
	gcc/testsuite/
	* c-c++-common/goacc/kernels-decompose-pr104774-1.c: New file.
---
 .../goacc/kernels-decompose-pr104774-1.c  | 41 +++
 1 file changed, 41 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104774-1.c

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104774-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104774-1.c
new file mode 100644
index 000..776f4d6befa
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104774-1.c
@@ -0,0 +1,41 @@
+/* { dg-additional-options "--param openacc-kernels=decompose" } */
+
+/* { dg-additional-options "-fchecking" }
+   { dg-ice TODO } */
+
+/* { dg-additional-options "-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" }
+   Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+   { dg-prune-output {note: variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} } */
+
+int arr_0;
+
+void
+foo (void)
+{
+#pragma acc kernels /* { dg-line l_compute1 } */
+  /* { dg-note {OpenACC 'kernels' decomposition: variable 'k' declared in block requested to be made addressable} {} { target *-*-* } l_compute1 } */
+  /* { dg-note {variable 'k' made addressable} {} { target *-*-* } l_compute1 } */
+  /* { dg-note {variable 'k' declared in block is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_compute1 } */
+  /* { dg-note {variable 'arr_0\.0' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute1 } */
+  {
+int k;
+
+/* { dg-note {parallelized loop nest in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+#pragma acc loop seq /* { dg-line l_loop_k1 } */
+/* { dg-note {variable 'k' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_loop_k1 } */
+for (k = 0; k < 2; k++)
+  arr_0 = k;
+
+/* { dg-note {parallelized loop nest in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+#pragma acc loop independent reduction(+: arr_0) /* { dg-line l_loop_k2 } */
+/* { dg-note {variable 'k' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_loop_k2 } */
+for (k = 0; k < 2; k++)
+  arr_0 += k;
+  }
+}
+/* { dg-bogus {error: non-register as LHS of binary operation} {} { xfail *-*-* } .-1 }
+   { dg-bogus {error: invalid RHS for gimple memory store: 'var_decl'} {} { xfail *-*-* } .-2 }
+   { dg-allow-blank-lines-in-output 1 }
+   { dg-excess-errors ICE } */
-- 
2.34.1



[committed][nvptx] Restore default to sm_30

2022-03-10 Thread Tom de Vries via Gcc-patches
Hi,

With commit 07667c911b1 ("[nvptx] Build libraries with misa=sm_30") the
intention was that the sm_xx for all libraries was switched back to sm_30
using MULTILIB_EXTRA_OPTS, without changing the default sm_35.

Testing on an sm_30 board revealed that some libs were still built with sm_35,
so fix this by switching the default back to sm_30.

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx] Restore default to sm_30

gcc/ChangeLog:

2022-03-07  Tom de Vries  

PR target/104758
* config/nvptx/nvptx.opt (misa): Set default to sm_30.
* config/nvptx/t-nvptx (MULTILIB_EXTRA_OPTS): Remove misa=sm_30.

---
 gcc/config/nvptx/nvptx.opt | 2 +-
 gcc/config/nvptx/t-nvptx   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
index c83ceb3568b..fea99c5d406 100644
--- a/gcc/config/nvptx/nvptx.opt
+++ b/gcc/config/nvptx/nvptx.opt
@@ -53,7 +53,7 @@ Generate code for OpenMP offloading: enables -msoft-stack and -muniform-simt.
 
 ; Default needs to be in sync with default in ASM_SPEC in nvptx.h.
 misa=
-Target RejectNegative ToLower Joined Enum(ptx_isa) Var(ptx_isa_option) Init(PTX_ISA_SM35)
+Target RejectNegative ToLower Joined Enum(ptx_isa) Var(ptx_isa_option) Init(PTX_ISA_SM30)
 Specify the version of the ptx ISA to use.
 
 Enum
diff --git a/gcc/config/nvptx/t-nvptx b/gcc/config/nvptx/t-nvptx
index 8f67264d132..a4a5341bb24 100644
--- a/gcc/config/nvptx/t-nvptx
+++ b/gcc/config/nvptx/t-nvptx
@@ -32,4 +32,4 @@ s-nvptx-gen-opt: $(srcdir)/config/nvptx/nvptx-sm.def
 
 MULTILIB_OPTIONS = mgomp
 
-MULTILIB_EXTRA_OPTS = misa=sm_30 mptx=3.1
+MULTILIB_EXTRA_OPTS = mptx=3.1


[committed][nvptx] Add multilib mptx=3.1

2022-03-10 Thread Tom de Vries via Gcc-patches
Hi,

With commit 5b5e456f018 ("[nvptx] Build libraries with mptx=3.1") the
intention was that the ptx isa version for all libraries was switched back to
3.1 using MULTILIB_EXTRA_OPTS, without changing the default 6.0.

Further testing revealed that this is not the case, and some libs were still
built with 6.0.

Fix this by introducing an mptx=3.1 multilib.

Adding a multilib should be avoided if possible, because it adds build time.
But I think it's a reasonable trade-off.  With --disable-multilib, the default
lib with misa=sm_30 and mptx=6.0 should be usable in most scenarios.  With
--enable-multilib, we can enable older drivers, as well as generate code
similar to how that was done in previous gcc releases, which is very useful.

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx] Add multilib mptx=3.1

gcc/ChangeLog:

2022-03-07  Tom de Vries  

* config/nvptx/t-nvptx (MULTILIB_EXTRA_OPTS): Move mptx=3.1 ...
(MULTILIB_OPTIONS): ... here.

---
 gcc/config/nvptx/t-nvptx | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/gcc/config/nvptx/t-nvptx b/gcc/config/nvptx/t-nvptx
index a4a5341bb24..b63c4a5a39d 100644
--- a/gcc/config/nvptx/t-nvptx
+++ b/gcc/config/nvptx/t-nvptx
@@ -30,6 +30,4 @@ s-nvptx-gen-opt: $(srcdir)/config/nvptx/nvptx-sm.def
  tmp-nvptx-gen.opt $(srcdir)/config/nvptx/nvptx-gen.opt
$(STAMP) s-nvptx-gen-opt
 
-MULTILIB_OPTIONS = mgomp
-
-MULTILIB_EXTRA_OPTS = mptx=3.1
+MULTILIB_OPTIONS = mgomp mptx=3.1


[committed][nvptx] Use atom.and.b64 instead of atom.b64.and

2022-03-10 Thread Tom de Vries via Gcc-patches
Hi,

The ptx manual prescribes the instruction format atom{.space}.op.type but the
compiler currently emits:
...
  atom.b64.and %r31, [%r30], %r32;
...
which uses the instruction format atom{.space}.type.op.

Fix this by emitting instead:
...
  atom.and.b64  %r31, [%r30], %r32;
...
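
As a rough illustration of the ordering rule only (this is not the nvptx.md
output template), here is a small C sketch that assembles a mnemonic in the
atom{.space}.op.type order.  The helper name format_atom_mnemonic and its
parameters are made up for this example.

```c
#include <stdio.h>

/* Hypothetical helper: build a PTX atom mnemonic in the order the manual
   prescribes, atom{.space}.op.type.  SPACE may be NULL or "" for the
   generic address space.  Returns whatever snprintf returns.  */
int
format_atom_mnemonic (char *buf, size_t len, const char *space,
                      const char *op, const char *type)
{
  if (space != NULL && space[0] != '\0')
    return snprintf (buf, len, "atom.%s.%s.%s", space, op, type);
  return snprintf (buf, len, "atom.%s.%s", op, type);
}
```

With op "and" and type "b64" this yields "atom.and.b64", and with space
"global" it yields "atom.global.and.b64", matching the strings the updated
tests scan for.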

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx] Use atom.and.b64 instead of atom.b64.and

gcc/ChangeLog:

2022-03-07  Tom de Vries  

* config/nvptx/nvptx.md (define_insn "atomic_fetch_"):
Emit atom.and.b64 instead of atom.b64.and.

gcc/testsuite/ChangeLog:

2022-03-07  Tom de Vries  

* gcc.target/nvptx/atomic_fetch-1.c: Update.
* gcc.target/nvptx/atomic_fetch-2.c: Update.

---
 gcc/config/nvptx/nvptx.md   |  2 +-
 gcc/testsuite/gcc.target/nvptx/atomic_fetch-1.c | 36 -
 gcc/testsuite/gcc.target/nvptx/atomic_fetch-2.c | 18 ++---
 3 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index a453c1de503..8079763077f 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -2226,7 +2226,7 @@ (define_insn "atomic_fetch_"
return "";
   }
 const char *t
-  = "%.\\tatom%A1.b%T0.\\t%0, %1, %2;";
+  = "%.\\tatom%A1..b%T0\\t%0, %1, %2;";
 return nvptx_output_atomic_insn (t, operands, 1, 3);
   }
 
diff --git a/gcc/testsuite/gcc.target/nvptx/atomic_fetch-1.c b/gcc/testsuite/gcc.target/nvptx/atomic_fetch-1.c
index 941cf3a2ab4..801572928cb 100644
--- a/gcc/testsuite/gcc.target/nvptx/atomic_fetch-1.c
+++ b/gcc/testsuite/gcc.target/nvptx/atomic_fetch-1.c
@@ -66,35 +66,35 @@ main()
 /* Generic.  */
 
 /* { dg-final { scan-assembler-times "atom.add.u64" 1 } } */
-/* { dg-final { scan-assembler-times "atom.b64.and" 1 } } */
-/* { dg-final { scan-assembler-times "atom.b64.or" 1 } } */
-/* { dg-final { scan-assembler-times "atom.b64.xor" 1 } } */
+/* { dg-final { scan-assembler-times "atom.and.b64" 1 } } */
+/* { dg-final { scan-assembler-times "atom.or.b64" 1 } } */
+/* { dg-final { scan-assembler-times "atom.xor.b64" 1 } } */
 
 /* { dg-final { scan-assembler-times "atom.add.u32" 1 } } */
-/* { dg-final { scan-assembler-times "atom.b32.and" 1 } } */
-/* { dg-final { scan-assembler-times "atom.b32.or" 1 } } */
-/* { dg-final { scan-assembler-times "atom.b32.xor" 1 } } */
+/* { dg-final { scan-assembler-times "atom.and.b32" 1 } } */
+/* { dg-final { scan-assembler-times "atom.or.b32" 1 } } */
+/* { dg-final { scan-assembler-times "atom.xor.b32" 1 } } */
 
 /* Global.  */
 
 /* { dg-final { scan-assembler-times "atom.global.add.u64" 1 } } */
-/* { dg-final { scan-assembler-times "atom.global.b64.and" 1 } } */
-/* { dg-final { scan-assembler-times "atom.global.b64.or" 1 } } */
-/* { dg-final { scan-assembler-times "atom.global.b64.xor" 1 } } */
+/* { dg-final { scan-assembler-times "atom.global.and.b64" 1 } } */
+/* { dg-final { scan-assembler-times "atom.global.or.b64" 1 } } */
+/* { dg-final { scan-assembler-times "atom.global.xor.b64" 1 } } */
 
 /* { dg-final { scan-assembler-times "atom.global.add.u32" 1 } } */
-/* { dg-final { scan-assembler-times "atom.global.b32.and" 1 } } */
-/* { dg-final { scan-assembler-times "atom.global.b32.or" 1 } } */
-/* { dg-final { scan-assembler-times "atom.global.b32.xor" 1 } } */
+/* { dg-final { scan-assembler-times "atom.global.and.b32" 1 } } */
+/* { dg-final { scan-assembler-times "atom.global.or.b32" 1 } } */
+/* { dg-final { scan-assembler-times "atom.global.xor.b32" 1 } } */
 
 /* Shared.  */
 
 /* { dg-final { scan-assembler-times "atom.shared.add.u64" 1 } } */
-/* { dg-final { scan-assembler-times "atom.shared.b64.and" 1 } } */
-/* { dg-final { scan-assembler-times "atom.shared.b64.or" 1 } } */
-/* { dg-final { scan-assembler-times "atom.shared.b64.xor" 1 } } */
+/* { dg-final { scan-assembler-times "atom.shared.and.b64" 1 } } */
+/* { dg-final { scan-assembler-times "atom.shared.or.b64" 1 } } */
+/* { dg-final { scan-assembler-times "atom.shared.xor.b64" 1 } } */
 
 /* { dg-final { scan-assembler-times "atom.shared.add.u32" 1 } } */
-/* { dg-final { scan-assembler-times "atom.shared.b32.and" 1 } } */
-/* { dg-final { scan-assembler-times "atom.shared.b32.or" 1 } } */
-/* { dg-final { scan-assembler-times "atom.shared.b32.xor" 1 } } */
+/* { dg-final { scan-assembler-times "atom.shared.and.b32" 1 } } */
+/* { dg-final { scan-assembler-times "atom.shared.or.b32" 1 } } */
+/* { dg-final { scan-assembler-times "atom.shared.xor.b32" 1 } } */
diff --git a/gcc/testsuite/gcc.target/nvptx/atomic_fetch-2.c b/gcc/testsuite/gcc.target/nvptx/atomic_fetch-2.c
index f5131fc4984..fa8d158cac3 100644
--- a/gcc/testsuite/gcc.target/nvptx/atomic_fetch-2.c
+++ b/gcc/testsuite/gcc.target/nvptx/atomic_fetch-2.c
@@ -69,9 +69,9 @@ main()
 /* { dg-final { scan-assembler-times "atom.cas.b64" 3 } } */
 
 /* { dg-final { scan-assembler-times "atom.add.u32" 1 } } */
-/* { dg-final { scan-assembler-times "atom.b32.and" 1 } } *

[committed][nvptx] Use bit-bucket operand for atom insns

2022-03-10 Thread Tom de Vries via Gcc-patches
Hi,

For an atomic fetch operation that doesn't use the result:
...
  __atomic_fetch_add (p64, v64, MEMMODEL_RELAXED);
...
we currently emit:
...
  atom.add.u64 %r26, [%r25], %r27;
...

Detect the REG_UNUSED reg-note for %r26, and emit instead:
...
  atom.add.u64 _, [%r25], %r27;
...

Likewise for all atom insns.
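
A simplified model of the operand choice, written as standalone C rather than
against GCC's RTL API: the struct dest_operand and the function format_dest
are hypothetical, and the 'unused' flag merely stands in for the REG_UNUSED
reg-note lookup.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the destination operand of an atom insn.  */
struct dest_operand
{
  unsigned regno; /* Pseudo register number, e.g. 26 for %r26.  */
  bool unused;    /* Models a REG_UNUSED note for this register.  */
};

/* Print the bit bucket "_" when the result is unused, otherwise the
   register name "%r<regno>".  */
int
format_dest (char *buf, size_t len, const struct dest_operand *d)
{
  if (d->unused)
    return snprintf (buf, len, "_");
  return snprintf (buf, len, "%%r%u", d->regno);
}
```

For register 26 this prints "%r26" when the result is used and "_" when it
is not, which is the difference between the two atom.add.u64 lines above.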

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx] Use bit-bucket operand for atom insns

gcc/ChangeLog:

2022-03-07  Tom de Vries  

PR target/104815
* config/nvptx/nvptx.cc (nvptx_print_operand): Handle 'x' operand
modifier.
* config/nvptx/nvptx.md: Use %x0 destination operand in atom insns.

gcc/testsuite/ChangeLog:

2022-03-07  Tom de Vries  

PR target/104815
* gcc.target/nvptx/atomic-bit-bucket-dest.c: New test.

---
 gcc/config/nvptx/nvptx.cc  | 11 ++-
 gcc/config/nvptx/nvptx.md  | 10 +++
 .../gcc.target/nvptx/atomic-bit-bucket-dest.c  | 35 ++
 3 files changed, 50 insertions(+), 6 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 6ca99a61cbd..14911bd15f1 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -2835,7 +2835,8 @@ nvptx_mem_maybe_shared_p (const_rtx x)
S -- print a shuffle kind specified by CONST_INT
t -- print a type opcode suffix, promoting QImode to 32 bits
T -- print a type size in bits
-   u -- print a type opcode suffix without promotions.  */
+   u -- print a type opcode suffix without promotions.
+   x -- print a destination operand that may also be a bit bucket.  */
 
 static void
 nvptx_print_operand (FILE *file, rtx x, int code)
@@ -2863,6 +2864,14 @@ nvptx_print_operand (FILE *file, rtx x, int code)
 
   switch (code)
 {
+case 'x':
+  if (current_output_insn != NULL
+ && find_reg_note (current_output_insn, REG_UNUSED, x) != NULL_RTX)
+   {
+ fputs ("_", file);
+ return;
+   }
+  goto common;
 case 'B':
   if (SYMBOL_REF_P (XEXP (x, 0)))
switch (SYMBOL_DATA_AREA (XEXP (x, 0)))
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 8079763077f..1cbf197065f 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -2050,7 +2050,7 @@ (define_insn "atomic_compare_and_swap_1"
   ""
   {
 const char *t
-  = "%.\\tatom%A1.cas.b%T0\\t%0, %1, %2, %3;";
+  = "%.\\tatom%A1.cas.b%T0\\t%x0, %1, %2, %3;";
 return nvptx_output_atomic_insn (t, operands, 1, 4);
   }
   [(set_attr "atomic" "true")])
@@ -2076,7 +2076,7 @@ (define_insn "atomic_exchange"
return "";
   }
 const char *t
-  = "%.\tatom%A1.exch.b%T0\t%0, %1, %2;";
+  = "%.\tatom%A1.exch.b%T0\t%x0, %1, %2;";
 return nvptx_output_atomic_insn (t, operands, 1, 3);
   }
   [(set_attr "atomic" "true")])
@@ -2166,7 +2166,7 @@ (define_insn "atomic_fetch_add"
return "";
   }
 const char *t
-  = "%.\\tatom%A1.add%t0\\t%0, %1, %2;";
+  = "%.\\tatom%A1.add%t0\\t%x0, %1, %2;";
 return nvptx_output_atomic_insn (t, operands, 1, 3);
   }
   [(set_attr "atomic" "true")])
@@ -2196,7 +2196,7 @@ (define_insn "atomic_fetch_addsf"
return "";
   }
 const char *t
-  = "%.\\tatom%A1.add%t0\\t%0, %1, %2;";
+  = "%.\\tatom%A1.add%t0\\t%x0, %1, %2;";
 return nvptx_output_atomic_insn (t, operands, 1, 3);
   }
   [(set_attr "atomic" "true")])
@@ -2226,7 +2226,7 @@ (define_insn "atomic_fetch_"
return "";
   }
 const char *t
-  = "%.\\tatom%A1..b%T0\\t%0, %1, %2;";
+  = "%.\\tatom%A1..b%T0\\t%x0, %1, %2;";
 return nvptx_output_atomic_insn (t, operands, 1, 3);
   }
 
diff --git a/gcc/testsuite/gcc.target/nvptx/atomic-bit-bucket-dest.c b/gcc/testsuite/gcc.target/nvptx/atomic-bit-bucket-dest.c
new file mode 100644
index 000..7e3ffcece06
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/atomic-bit-bucket-dest.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -misa=sm_35" } */
+
+enum memmodel
+{
+  MEMMODEL_RELAXED = 0
+};
+
+unsigned long long int *p64;
+unsigned long long int v64;
+
+int
+main()
+{
+  __atomic_fetch_add (p64, v64, MEMMODEL_RELAXED);
+  __atomic_fetch_and (p64, v64, MEMMODEL_RELAXED);
+  __atomic_fetch_or (p64, v64, MEMMODEL_RELAXED);
+  __atomic_fetch_xor (p64, v64, MEMMODEL_RELAXED);
+  __atomic_exchange_n (p64, v64, MEMMODEL_RELAXED);
+
+  {
+unsigned long long expected = v64;
+__atomic_compare_exchange_n (p64, &expected, 0, 0, MEMMODEL_RELAXED,
+MEMMODEL_RELAXED);
+  }
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "atom.add.u64\[\t \]+_," 1 } } */
+/* { dg-final { scan-assembler-times "atom.and.b64\[\t \]+_," 1 } } */
+/* { dg-final { scan-assembler-times "atom.or.b64\[\t \]+_," 1 } } */
+/* { dg-final { scan-assembler-times "atom.xor.b64\[\t \]+_," 1 } } */
+/* { dg-final { scan-assembler-times "atom.exch.b64\[\t \]+_," 1 } } */
+/* { 

[committed][nvptx] Handle unused result in nvptx_unisimt_handle_set

2022-03-10 Thread Tom de Vries via Gcc-patches
Hi,

For an example:
...
  #pragma omp target map(tofrom: counter_N0)
  #pragma omp simd
  for (int i = 0 ; i < 1 ; i++ )
{
  #pragma omp atomic update
  counter_N0 = counter_N0 + 1 ;
}
...
I noticed that the result of the atomic update (%r30) is propagated:
...
  @%r33 atom.add.u32 _, [%r29], 1;
shfl.sync.idx.b32   %r30, %r30, %r32, 31, 0x;
...
even though it is unused (which is why the bit bucket operand _ is used).

Fix this by not emitting the shuffle in this case, such that we have instead:
...
  @%r33 atom.add.u32 _, [%r29], 1;
bar.warp.sync   0x;
...
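
The decision can be summarized in a few lines of standalone C.  This is only
a model of the behavior described here, not the code in nvptx.cc: the enum
and the function choose_followup are invented names, and the two booleans
stand in for the REG_P test and the REG_UNUSED note lookup.

```c
#include <stdbool.h>

/* Which insn follows the predicated atomic (hypothetical names).  */
enum followup
{
  FOLLOWUP_SHUFFLE,  /* Propagate the result: shfl.sync.idx  */
  FOLLOWUP_WARP_SYNC /* Nothing to propagate: bar.warp.sync  */
};

/* Emit the shuffle only when the destination is a register whose value
   is actually used afterwards.  */
enum followup
choose_followup (bool dest_is_reg, bool dest_unused)
{
  if (dest_is_reg && !dest_unused)
    return FOLLOWUP_SHUFFLE;
  return FOLLOWUP_WARP_SYNC;
}
```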

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx] Handle unused result in nvptx_unisimt_handle_set

gcc/ChangeLog:

2022-03-07  Tom de Vries  

* config/nvptx/nvptx.cc (nvptx_unisimt_handle_set): Handle unused
result.

gcc/testsuite/ChangeLog:

2022-03-07  Tom de Vries  

* gcc.target/nvptx/uniform-simt-4.c: New test.

---
 gcc/config/nvptx/nvptx.cc   |  4 +++-
 gcc/testsuite/gcc.target/nvptx/uniform-simt-4.c | 22 ++
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 14911bd15f1..c41e305a34f 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -3274,7 +3274,9 @@ static bool
 nvptx_unisimt_handle_set (rtx set, rtx_insn *insn, rtx master)
 {
   rtx reg;
-  if (GET_CODE (set) == SET && REG_P (reg = SET_DEST (set)))
+  if (GET_CODE (set) == SET
+  && REG_P (reg = SET_DEST (set))
+  && find_reg_note (insn, REG_UNUSED, reg) == NULL_RTX)
 {
   emit_insn_after (nvptx_gen_shuffle (reg, reg, master, SHUFFLE_IDX),
   insn);
diff --git a/gcc/testsuite/gcc.target/nvptx/uniform-simt-4.c b/gcc/testsuite/gcc.target/nvptx/uniform-simt-4.c
new file mode 100644
index 000..c33de7a4111
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/uniform-simt-4.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -muniform-simt -mptx=_" } */
+
+enum memmodel
+{
+  MEMMODEL_RELAXED = 0
+};
+
+unsigned long long int *p64;
+unsigned long long int v64;
+
+int
+main()
+{
+  __atomic_fetch_add (p64, v64, MEMMODEL_RELAXED);
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "atom.add.u64\[\t \]+_," 1 } } */
+/* { dg-final { scan-assembler-times "bar.warp.sync" 1 } } */
+/* { dg-final { scan-assembler-not "shfl.sync.idx" } } */


[committed][nvptx] Disable warp sync in simt region

2022-03-10 Thread Tom de Vries via Gcc-patches
Hi,

I ran into a hang for this code:
...
  #pragma omp target map(tofrom: counter_N0)
  #pragma omp simd
  for (int i = 0 ; i < 1 ; i++ )
{
  #pragma omp atomic update
  counter_N0 = counter_N0 + 1 ;
}
...

This has to do with the nature of -muniform-simt.  It has two modes of
operation: inside and outside an SIMT region.

Outside an SIMT region, a warp pretends to execute a single thread, but
actually executes in all threads, to keep the local registers in all threads
consistent.  This approach works unless the insn that is executed is a syscall
or an atomic insn.  In that case, the insn is predicated, such that it
executes in only one thread.  If the predicated insn writes a result to a
register, then that register is propagated to the other threads, after which
the local registers in all threads are consistent again.

Inside an SIMT region, a warp executes in all threads.  However, the
predication and propagation for syscalls and atomic insns is also present
here, because nvptx_reorg_uniform_simt works on all code.  Care has been taken
though to ensure that the predication and propagation is a nop.  That is,
inside an SIMT region:
- the predicate evaluates to true for each thread, and
- the propagation insn copies a register from each thread to the same thread.

That works fine, until we use -mptx=6.0, and instead of using the deprecated
warp propagation insn shfl, we start using shfl.sync:
...
  @%r33 atom.add.u32 _, [%r29], 1;
shfl.sync.idx.b32   %r30, %r30, %r32, 31, 0x;
...

The shfl.sync specifies a member mask indicating all threads, but given that
the loop only has a single iteration, only thread 0 will execute the insn,
where it will hang waiting for the other threads.

Fix this by predicating the shfl.sync (and likewise, bar.warp.sync and the
uniform warp check) such that it only executes outside the SIMT region.
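
To make the hang and the fix easier to follow, here is a host-side C model.
It is a sketch under assumptions, not the PTX the compiler emits: the struct
unisimt_state and the three functions are hypothetical, and the masks assume
a 32-lane warp with one bit per lane.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-function state: true while outside an SIMT region.  */
struct unisimt_state
{
  bool outside_simt;
};

/* Entering an SIMT region clears the predicate, leaving it sets it.  */
void
unisimt_switch (struct unisimt_state *s, bool entering)
{
  s->outside_simt = !entering;
}

/* The sync insns are guarded by the outside-SIMT predicate.  */
bool
warp_sync_enabled (const struct unisimt_state *s)
{
  return s->outside_simt;
}

/* A sync waits for every lane named in MEMBER_MASK.  If some of those
   lanes never reach the insn (they are not in ACTIVE_MASK), it hangs.  */
bool
sync_would_hang (uint32_t member_mask, uint32_t active_mask)
{
  return (member_mask & ~active_mask) != 0;
}
```

In this model, a member mask naming all 32 lanes combined with only lane 0
active (the single-iteration loop) gives sync_would_hang true, while
warp_sync_enabled is false inside the SIMT region, so the guarded insn is
skipped there.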

Tested on x86_64 with nvptx accelerator.

Committed to trunk.

Thanks,
- Tom

[nvptx] Disable warp sync in simt region

gcc/ChangeLog:

2022-03-08  Tom de Vries  

PR target/104783
* config/nvptx/nvptx.cc (nvptx_init_unisimt_predicate)
(nvptx_output_unisimt_switch): Handle unisimt_outside_simt_predicate.
(nvptx_get_unisimt_outside_simt_predicate): New function.
(predicate_insn): New function, factored out of ...
(nvptx_reorg_uniform_simt): ... here.  Predicate all emitted insns.
* config/nvptx/nvptx.h (struct machine_function): Add
unisimt_outside_simt_predicate field.
* config/nvptx/nvptx.md (define_insn "nvptx_warpsync")
(define_insn "nvptx_uniform_warp_check"): Make predicable.

libgomp/ChangeLog:

2022-03-10  Tom de Vries  

* testsuite/libgomp.c/pr104783.c: New test.

---
 gcc/config/nvptx/nvptx.cc  | 45 +++---
 gcc/config/nvptx/nvptx.h   |  1 +
 gcc/config/nvptx/nvptx.md  | 29 --
 libgomp/testsuite/libgomp.c/pr104783.c | 18 ++
 4 files changed, 76 insertions(+), 17 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index c41e305a34f..3a7be63c290 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -1364,6 +1364,13 @@ nvptx_init_unisimt_predicate (FILE *file)
   int master = REGNO (cfun->machine->unisimt_master);
   int pred = REGNO (cfun->machine->unisimt_predicate);
   fprintf (file, "\t\tld.shared.u32 %%r%d, [%%r%d];\n", master, loc);
+  if (cfun->machine->unisimt_outside_simt_predicate)
+   {
+ int pred_outside_simt
+   = REGNO (cfun->machine->unisimt_outside_simt_predicate);
+ fprintf (file, "\t\tsetp.eq.u32 %%r%d, %%r%d, 0;\n",
+  pred_outside_simt, master);
+   }
   fprintf (file, "\t\tmov.u32 %%ustmp0, %%laneid;\n");
   /* Compute 'master lane index' as 'laneid & __nvptx_uni[tid.y]'.  */
   fprintf (file, "\t\tand.b32 %%r%d, %%r%d, %%ustmp0;\n", master, master);
@@ -1589,6 +1596,13 @@ nvptx_output_unisimt_switch (FILE *file, bool entering)
   fprintf (file, "\t{\n");
   fprintf (file, "\t\t.reg.u32 %%ustmp2;\n");
   fprintf (file, "\t\tmov.u32 %%ustmp2, %d;\n", entering ? -1 : 0);
+  if (cfun->machine->unisimt_outside_simt_predicate)
+{
+  int pred_outside_simt
+   = REGNO (cfun->machine->unisimt_outside_simt_predicate);
+  fprintf (file, "\t\tmov.pred %%r%d, %d;\n", pred_outside_simt,
+  entering ? 0 : 1);
+}
   if (!crtl->is_leaf)
 {
   int loc = REGNO (cfun->machine->unisimt_location);
@@ -3242,6 +3256,13 @@ nvptx_get_unisimt_predicate ()
   return pred ? pred : pred = gen_reg_rtx (BImode);
 }
 
+static rtx
+nvptx_get_unisimt_outside_simt_predicate ()
+{
+  rtx &pred = cfun->machine->unisimt_outside_simt_predicate;
+  return pred ? pred : pred = gen_reg_rtx (BImode);
+}
+
 /* Return true if given call insn references one of the functions provided by
the CUDA runtime: malloc,

[committed][nvptx] Use no,yes for attribute predicable

2022-03-10 Thread Tom de Vries via Gcc-patches
Hi,

The documentation states about the predicable instruction attribute:
...
This attribute must be a boolean (i.e. have exactly two elements in its
list-of-values), with the possible values being no and yes.
...

The nvptx port has instead:
...
(define_attr "predicable" "false,true"
  (const_string "true"))
...

Fix this by updating to:
...
(define_attr "predicable" "no,yes"
  (const_string "yes"))
...

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx] Use no,yes for attribute predicable

gcc/ChangeLog:

2022-03-08  Tom de Vries  

PR target/104840
* config/nvptx/nvptx.md (define_attr "predicable"): Use no,yes instead
of false,true.

---
 gcc/config/nvptx/nvptx.md | 40 
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 1ccb0f11e4c..1dec7caa0d1 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -172,8 +172,8 @@ (define_predicate "symbol_ref_function_operand"
   return SYMBOL_REF_FUNCTION_P (op);
 })
 
-(define_attr "predicable" "false,true"
-  (const_string "true"))
+(define_attr "predicable" "no,yes"
+  (const_string "yes"))
 
 (define_cond_exec
   [(match_operator 0 "predicate_operator"
@@ -911,7 +911,7 @@ (define_insn "br_true"
  (pc)))]
   ""
   "%j0\\tbra\\t%l1;"
-  [(set_attr "predicable" "false")])
+  [(set_attr "predicable" "no")])
 
 (define_insn "br_false"
   [(set (pc)
@@ -921,7 +921,7 @@ (define_insn "br_false"
  (pc)))]
   ""
   "%J0\\tbra\\t%l1;"
-  [(set_attr "predicable" "false")])
+  [(set_attr "predicable" "no")])
 
 ;; unified conditional branch
 (define_insn "br_true_uni"
@@ -931,7 +931,7 @@ (define_insn "br_true_uni"
 (label_ref (match_operand 1 "" "")) (pc)))]
   ""
   "%j0\\tbra.uni\\t%l1;"
-  [(set_attr "predicable" "false")])
+  [(set_attr "predicable" "no")])
 
 (define_insn "br_false_uni"
   [(set (pc) (if_then_else
@@ -940,7 +940,7 @@ (define_insn "br_false_uni"
 (label_ref (match_operand 1 "" "")) (pc)))]
   ""
   "%J0\\tbra.uni\\t%l1;"
-  [(set_attr "predicable" "false")])
+  [(set_attr "predicable" "no")])
 
 (define_expand "cbranch4"
   [(set (pc)
@@ -1619,7 +1619,7 @@ (define_insn "return"
 {
   return nvptx_output_return ();
 }
-  [(set_attr "predicable" "false")])
+  [(set_attr "predicable" "no")])
 
 (define_expand "epilogue"
   [(clobber (const_int 0))]
@@ -1712,7 +1712,7 @@ (define_insn "trap_if_true"
(const_int 0))]
   ""
   "%j0 trap; %j0 exit;"
-  [(set_attr "predicable" "false")])
+  [(set_attr "predicable" "no")])
 
 (define_insn "trap_if_false"
   [(trap_if (eq (match_operand:BI 0 "nvptx_register_operand" "R")
@@ -1720,7 +1720,7 @@ (define_insn "trap_if_false"
(const_int 0))]
   ""
   "%J0 trap; %J0 exit;"
-  [(set_attr "predicable" "false")])
+  [(set_attr "predicable" "no")])
 
 (define_expand "ctrap4"
   [(trap_if (match_operator 0 "nvptx_comparison_operator"
@@ -1769,28 +1769,28 @@ (define_insn "nvptx_fork"
   UNSPECV_FORK)]
   ""
   "// fork %0;"
-  [(set_attr "predicable" "false")])
+  [(set_attr "predicable" "no")])
 
 (define_insn "nvptx_forked"
   [(unspec_volatile:SI [(match_operand:SI 0 "const_int_operand" "")]
   UNSPECV_FORKED)]
   ""
   "// forked %0;"
-  [(set_attr "predicable" "false")])
+  [(set_attr "predicable" "no")])
 
 (define_insn "nvptx_joining"
   [(unspec_volatile:SI [(match_operand:SI 0 "const_int_operand" "")]
   UNSPECV_JOINING)]
   ""
   "// joining %0;"
-  [(set_attr "predicable" "false")])
+  [(set_attr "predicable" "no")])
 
 (define_insn "nvptx_join"
   [(unspec_volatile:SI [(match_operand:SI 0 "const_int_operand" "")]
   UNSPECV_JOIN)]
   ""
   "// join %0;"
-  [(set_attr "predicable" "false")])
+  [(set_attr "predicable" "no")])
 
 (define_expand "oacc_fork"
   [(set (match_operand:SI 0 "nvptx_nonmemory_operand" "")
@@ -2035,7 +2035,7 @@ (define_insn "atomic_compare_and_swap_1_local"
output_asm_insn ("}", NULL);
return "";
   }
-  [(set_attr "predicable" "false")])
+  [(set_attr "predicable" "no")])
 
 (define_insn "atomic_compare_and_swap_1"
   [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R")
@@ -2263,7 +2263,7 @@ (define_insn "nvptx_barsync"
  ? "\\tbarrier.sync\\t%0, %1;"
  : "\\tbar.sync\\t%0, %1;");
   }
-  [(set_attr "predicable" "false")])
+  [(set_attr "predicable" "no")])
 
 (define_insn "nvptx_warpsync"
   [(unspec_volatile [(const_int 0)] UNSPECV_WARPSYNC)]
@@ -2310,7 +2310,7 @@ (define_insn "*memory_barrier"
(unspec_volatile:BLK [(match_dup 0)] UNSPECV_MEMBAR))]
   ""
   "\\tmembar.sys;"
-  [(set_attr "predicable" "false")])
+  [(set_attr "predicable" "no")])
 
 (define_expand "nvptx_membar_cta"
   [(set (match_dup 0)
@@ -2326,7 +2326,7 @@ (define_insn "*nvptx_membar_cta"
(unspec_volatile:BLK [(match_dup 0)] UNSPECV_MEMBAR

[OpenACC privatization] Analyze 'lookup_decl'-translated DECL [PR90115, PR102330, PR104774]

2022-03-10 Thread Thomas Schwinge
Hi!

On 2022-02-15T13:40:09+, Julian Brown  wrote:
> On Mon, 14 Feb 2022 16:56:35 +0100
> Thomas Schwinge  wrote:
>> Two more questions here, in context of 
>> "[12 Regression] ICE in expand_gimple_stmt_1, at cfgexpand.c:3932
>> since r12-980-g29a2f51806c":
>>
>> On 2019-06-03T17:02:45+0100, Julian Brown  wrote:
>> > +/* Record vars listed in private clauses in CLAUSES in CTX.  This information
>> > +   is used to mark up variables that should be made private per-gang.  */
>> > +
>> > +static void
>> > +oacc_record_private_var_clauses (omp_context *ctx, tree clauses)
>> > +{
>> > +  tree c;
>> > +
>> > +  if (!ctx)
>> > +return;
>> > +
>> > +  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
>> > +if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE)
>> > +  {
>> > +  tree decl = OMP_CLAUSE_DECL (c);
>> > +  if (VAR_P (decl) && TREE_ADDRESSABLE (decl))
>> > +ctx->oacc_addressable_var_decls->safe_push (decl);
>> > +  }
>> > +}
>>
>> So, here we analyze 'OMP_CLAUSE_DECL (c)' (as is, without translation
>> through 'lookup_decl (decl, ctx)')...
>
> I think you're right that this one should be using lookup_decl, but...
>
>> > +/* Record addressable vars declared in BINDVARS in CTX.  This information is
>> > +   used to mark up variables that should be made private per-gang.  */
>> > +
>> > +static void
>> > +oacc_record_vars_in_bind (omp_context *ctx, tree bindvars)
>> > +{
>> > +  if (!ctx)
>> > +return;
>> > +
>> > +  for (tree v = bindvars; v; v = DECL_CHAIN (v))
>> > +if (VAR_P (v) && TREE_ADDRESSABLE (v))
>> > +  ctx->oacc_addressable_var_decls->safe_push (v);
>> > +}
>>
>> ..., and similarly here analyze 'v' (without 'lookup_decl (v, ctx)')...
>
> I'm not so sure about this one: if the variables are declared at a
> particular binding level, I think they have to be in the current OMP
> context (and thus shadow any definitions that might be present in the
> parent context)? Maybe that can be confirmed via an assertion.

Yes, I've added a 'gcc_checking_assert (lookup_decl (v, ctx) == v);'.

>> > +/* Mark addressable variables which are declared implicitly or explicitly as
>> > +   gang private with a special attribute.  These may need to have their
>> > +   declarations altered later on in compilation (e.g. in
>> > +   execute_oacc_device_lower or the backend, depending on how the OpenACC
>> > +   execution model is implemented on a given target) to ensure that sharing
>> > +   semantics are correct.  */
>> > +
>> > +static void
>> > +mark_oacc_gangprivate (vec<tree> *decls, omp_context *ctx)
>> > +{
>> > +  int i;
>> > +  tree decl;
>> > +
>> > +  FOR_EACH_VEC_ELT (*decls, i, decl)
>> > +{
>> > +  for (omp_context *thisctx = ctx; thisctx; thisctx = thisctx->outer)
>> > +  {
>> > +tree inner_decl = maybe_lookup_decl (decl, thisctx);
>> > +if (inner_decl)
>> > +  {
>> > +decl = inner_decl;
>> > +break;
>> > +  }
>> > +  }
>> > +  if (!lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES (decl)))
>> > +  {
>> > +if (dump_file && (dump_flags & TDF_DETAILS))
>> > +  {
>> > +fprintf (dump_file,
>> > + "Setting 'oacc gangprivate' attribute for decl:");
>> > +print_generic_decl (dump_file, decl, TDF_SLIM);
>> > +fputc ('\n', dump_file);
>> > +  }
>> > +DECL_ATTRIBUTES (decl)
>> > +  = tree_cons (get_identifier ("oacc gangprivate"),
>> > +   NULL, DECL_ATTRIBUTES (decl));
>> > +  }
>> > +}
>> > +}
>>
>> ..., but here we action on the 'maybe_lookup_decl'-translated
>> 'inner_decl', if applicable.  In certain cases that one may be
>> different from the original 'decl'.  (In particular (only?), when the
>> OMP lowering has made 'decl' "late 'TREE_ADDRESSABLE'".)  This
>> asymmetry I understand to give rise to
>> "[12 Regression] ICE in expand_gimple_stmt_1, at cfgexpand.c:3932
>> since r12-980-g29a2f51806c".
>>
>> It makes sense to me that we do the OpenACC privatization on the
>> 'lookup_decl' -- but shouldn't we then do that in the analysis phase,
>> too?  (This appears to work fine for OpenACC 'private' clauses (...,
>> and avoids marking a few as addressable/gang-private), and for those
>> in 'gimple_bind_vars' it doesn't seem to make a difference (for the
>> current test cases and/or compiler transformations).)
>
> Yes, I think you're right.
>
>> And, second question: what case did you run into or foresee, that you
>> here need the 'thisctx' loop and 'maybe_lookup_decl', instead of a
>> plain 'lookup_decl (decl, ctx)'?  Per my testing that's sufficient.
>
> I'd probably misunderstood about lookup_decl walking up through parent
> contexts itself... oops.
>
>> Unless you think this needs more consideration, I suggest to do these
>> two changes.  (I have a WIP patch in testing.)
>
> Sounds good to me.

Thanks for your conceptual review.  Pushed to master branch
commit 7a5e036b61aa088e6b856

Re: [PATCH] libphobos: Enable on Solaris/SPARC or with /bin/as [PR 103528]

2022-03-10 Thread Iain Buclaw via Gcc-patches
Excerpts from Rainer Orth's message of March 10, 2022 11:19 am:
> libphobos is currently only enabled on Solaris/x86 with gas.  As
> discovered when gdc was switched to the dmd frontend, this initially
> broke bootstrap for the other Solaris configurations.
> 
> However, it's now well possible to enable it both for Solaris/x86 with
> as and Solaris/SPARC (both as and gas) since the original problems (x86
> as linelength limit among others) are long gone.
> 
> The following patch does just that.
> 
> Tested on i386-pc-solaris2.11 and sparc-sun-solaris2.11 (both as and
> gas) with gdc 9.3.0 (x86) resp. 9.4.0 (sparc, configured with
> --enable-libphobos) as bootstrap compilers.
> 
> Ok for trunk?

OK.

Thanks,
Iain.


Re: [PATCH v2] cse: avoid signed overflow in compute_const_anchors [PR 104843]

2022-03-10 Thread Xi Ruoyao via Gcc-patches
On Thu, 2022-03-10 at 09:01 +0100, Richard Biener wrote:
> On Wed, Mar 9, 2022 at 5:12 PM Xi Ruoyao 
> wrote:
> > 
> > On Wed, 2022-03-09 at 15:55 +0100, Richard Biener wrote:
> > 
> > > isn't it better to make targetm.const_anchor unsigned?
> > > The & and ~ are not subject to overflow rules.
> > 
> > It's not enough: if n is the minimum value of HOST_WIDE_INT and
> > const_anchor = 0x8000 (the value for MIPS), we'll have a signed
> > 0x7fff
> > in *upper_base.  Then the next line, "*upper_offs = n -
> > *upper_base;"
> > will be a signed overflow again.
> > 
> > How about the following?
> 
> Hmm, so all this seems to be to round CST up and down to a multiple of
> CONST_ANCHOR.
> It works on CONST_INT only which is sign-extended, so if there is
> overflow the resulting
> anchor is broken as far as I can see.

On MIPS, addiu/daddiu perform two's-complement addition, so the overflowed
result is still usable.

> So instead of papering over this issue
> the function should return false when n is negative since then
> n & ~(targetm.const_anchor - 1) is also not n rounded down to a
> multiple of const_anchor.

This function does work for negative n, like:

void g (int, int);
void
f (void)
{
  g(0x8123, 0x81240001);
}

It should produce:

li  $4,-2128347136  # 0x8124
daddiu  $5,$4,1
daddiu  $4,$4,-1
jal g

But returning false for negative n would cause a regression in this case,
producing:

li  $5,-2128347136  # 0x8124
li  $4,-2128412672  # 0x8123
ori $5,$5,0x1
ori $4,$4,0x
jal g

That being said, it indeed does not work for:

void g (int, int);
void f ()
{
  g (0x7fff, 0x8001);
}

It produces:

li  $5,-2147483648  # 0x8000
li  $4,2147418112   # 0x7fff
daddiu  $5,$5,1
ori $4,$4,0x
jal g

Should be:

li  $5,-2147483648  # 0x8000
daddiu  $5,$5,1
addiu   $4,$5,-1
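
For readers following along, here is a minimal standalone sketch of the rounding that the patch quoted below performs in unsigned arithmetic. It is not the cse.cc code: `sketch_const_anchors` and `SKETCH_CONST_ANCHOR` are hypothetical names, `int64_t`/`uint64_t` stand in for HOST_WIDE_INT, and 0x8000 is the MIPS const_anchor value mentioned earlier in the thread. The conversions back to `int64_t` are assumed to wrap modulo 2^64, which is what GCC documents.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for targetm.const_anchor (0x8000 on MIPS).  */
#define SKETCH_CONST_ANCHOR UINT64_C (0x8000)

/* Round CST down and up to a multiple of the anchor and report the
   offsets of CST from both bases.  All arithmetic is done on unsigned
   values, so wrap-around is well defined.  Returns false when CST is
   already a multiple of the anchor.  */
static bool
sketch_const_anchors (int64_t cst,
                      int64_t *lower_base, int64_t *lower_offs,
                      int64_t *upper_base, int64_t *upper_offs)
{
  uint64_t n = (uint64_t) cst;
  uint64_t mask = ~(SKETCH_CONST_ANCHOR - 1);
  uint64_t lb = n & mask;

  if (lb == n)
    return false;

  uint64_t ub = (n + (SKETCH_CONST_ANCHOR - 1)) & mask;

  /* Converting out-of-range values back to int64_t is
     implementation-defined before C23; GCC wraps modulo 2^64.  */
  *lower_base = (int64_t) lb;
  *lower_offs = (int64_t) (n - lb);
  *upper_base = (int64_t) ub;
  *upper_offs = (int64_t) (n - ub);
  return true;
}
```

With cst = -2128347135 this yields a lower base of -2128347136 and a lower offset of 1, which matches the li/daddiu pair in the first example above. With cst = INT64_MAX the upper base wraps to INT64_MIN with offset -1, the case that overflowed in the signed version.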

> > -- >8 --
> > 
> > With a non-zero const_anchor, the behavior of this function relied on
> > signed overflow.
> > 
> > gcc/
> > 
> >     PR rtl-optimization/104843
> >     * cse.cc (compute_const_anchors): Use unsigned HOST_WIDE_INT for
> >     n to perform overflow arithmetics safely.
> > ---
> >  gcc/cse.cc | 8 
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/gcc/cse.cc b/gcc/cse.cc
> > index a18b599d324..052fa0c3490 100644
> > --- a/gcc/cse.cc
> > +++ b/gcc/cse.cc
> > @@ -1169,12 +1169,12 @@ compute_const_anchors (rtx cst,
> >    HOST_WIDE_INT *lower_base, HOST_WIDE_INT *lower_offs,
> >    HOST_WIDE_INT *upper_base, HOST_WIDE_INT *upper_offs)
> >  {
> > -  HOST_WIDE_INT n = INTVAL (cst);
> > -
> > -  *lower_base = n & ~(targetm.const_anchor - 1);
> > -  if (*lower_base == n)
> > +  unsigned HOST_WIDE_INT n = UINTVAL (cst);
> > +  unsigned HOST_WIDE_INT lb = n & ~(targetm.const_anchor - 1);
> > +  if (lb == n)
> >  return false;
> > 
> > +  *lower_base = lb;
> >    *upper_base =
> >  (n + (targetm.const_anchor - 1)) & ~(targetm.const_anchor - 1);
> >    *upper_offs = n - *upper_base;
> > --
> > 2.35.1
> > 
> > 
> > > 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] contrib: Avoid use of "echo -n" in git customization [PR102664]

2022-03-10 Thread Richard Earnshaw via Gcc-patches




On 09/03/2022 15:05, Jonathan Wakely via Gcc-patches wrote:

On 09/03/22 12:15 +, Richard Earnshaw wrote:

The -n option to echo is non-portable.  The generally recommended
alternative is to use the shell printf command.

contrib/ChangeLog:

PR other/102664
* gcc-git-customization.sh (ask): Use printf instead of echo -n.

diff --git a/contrib/gcc-git-customization.sh b/contrib/gcc-git-customization.sh
index b24948d9874..cf46c494a6a 100755
--- a/contrib/gcc-git-customization.sh
+++ b/contrib/gcc-git-customization.sh
@@ -7,7 +7,7 @@ ask () {
    question=$1
    default=$2
    var=$3
-    echo -n $question "["$default"]? "
+    printf "%s" "$question [$default]? "
    read answer
    if [ "x$answer" = "x" ]
    then


This isn't enough to get the script working on AIX and Solaris. The
attached patch has been tested on Fedora Linux, NetBSD 9.2, AIX 7 and
Solaris 11.

The part checking the result of `git rev-parse --git-path hooks` was
needed to work around Git 2.4.0 on gcc211 in the compile farm, which
is a Solaris 11 sparc box. That's a truly ancient version, but
handling the error (and just skipping installation of the hook) isn't
difficult, so seems worthwhile. I can revert that part if preferred.

OK for trunk?




OK.


Re: [Patch] Fortran: OpenMP/OpenACC avoid uninit access in size calc for mapping

2022-03-10 Thread Tobias Burnus

Hi Thomas, hi all,

(Updated patch attached – only changes the goacc testcases.
I intend to commit the patch tomorrow, unless more comments come up.)

On 10.03.22 10:00, Thomas Schwinge wrote:

[OpenACC testcases:]


I recently added that checking in ... [...] to document the status quo.
(1), as you have proposed, add '-O0' (but with source code comment, please); 
[...]
..., or (2): update the test cases [...] (rationale: the test cases
haven't originally been written for the '-Wuninitialized' diagnostics; [...]
(3): duplicate the test cases to account for both (1) and (2)
[...] (3), would be my approach.


The attached patch does (3). I also removed the code tweaking added in the
previous patch, but added a bunch of comments.

And I have to admit that I did not realize that the -Wuninitialized was only
added later. (I did not expect that new flags get added to existing patches.)


** I am actually not sure whether 'acc update(b)' will/should map a
previous allocated variable - or whether it should. [...]
testcases.


Should be: "previously *un*allocated" (+ Thomas s%...%.. comments).


I don't quickly dig that, sorry.  Do we need to first clarify that with
OpenACC Technical Committee, or is this just a GCC/OpenACC implementation
issue?


That's mostly an OpenACC spec question.
But I did not check what the spec says. Thus, I don't know
* whether the spec needs to be improved
* what the implications are on the implementation.

I assume that the implementation for OpenMP does also make sense for
OpenACC (i.e. either works as required or does more but in a sensible
way) - but I don't know.

I think either/both of us should check the OpenACC spec.

 * * *

On the OpenMP side (clarified in 5.1 or 5.2):
* Map first an unallocated allocatable
=> That one is regarded as mapped/present
* Allocate it and map it again (e.g. implicit/explicit
  mapping for a target region)
=> Spec: Implementation may or may not update the item.
   However, (only) with 'always' the spec guarantees that it
   will also show up as allocated on the device.
=> Sentiment: should also work without 'always' modifier
   if previously unallocated. (Discussion postponed.)
* I have not checked 'update' but I think it behaves
  like 'map(always,to/from:' (except for ref counting)

OpenMP GCC implementation: there is a known issue for
scalar allocatables (PR96668). It does work for arrays
(also without 'always') - and 'omp update' has not been
checked. (Ref counting issues?)

OpenACC GCC implementation: I think this code is shared
with OpenMP and, thus, works likewise. But I have
also not checked this.

Tobias
-
Fortran: OpenMP/OpenACC avoid uninit access in size calc for mapping

gcc/fortran/ChangeLog:

	* trans-openmp.cc (gfc_trans_omp_clauses, gfc_omp_finish_clause):
	Obtain size for mapping only if allocatable array is allocated.

gcc/testsuite/ChangeLog:

	* gfortran.dg/goacc/array-with-dt-1.f90: Update/add comments;
	remove dg-warning for 'is used uninitialized'.
	* gfortran.dg/goacc/pr93464.f90: Likewise.
	* gfortran.dg/goacc/array-with-dt-1a.f90: New; copied from
	gfortran.dg/goacc/array-with-dt-1.f90 but run with -O0. Update
	dg-warning for 'may be used uninitialized'.
	* gfortran.dg/goacc/pr93464-2.f90: Likewise; copied from
	gfortran.dg/goacc/pr93464.f90.

 gcc/fortran/trans-openmp.cc|  6 +++--
 .../gfortran.dg/goacc/array-with-dt-1.f90  | 18 ---
 .../gfortran.dg/goacc/array-with-dt-1a.f90 | 27 ++
 gcc/testsuite/gfortran.dg/goacc/pr93464-2.f90  | 26 +
 gcc/testsuite/gfortran.dg/goacc/pr93464.f90| 12 ++
 5 files changed, 80 insertions(+), 9 deletions(-)

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 4d56a771349..fad76a4791f 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -1597,7 +1597,8 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool openacc)
   tree size = create_tmp_var (gfc_array_index_type);
   tree elemsz = TYPE_SIZE_UNIT (gfc_get_element_type (type));
   elemsz = fold_convert (gfc_array_index_type, elemsz);
-  if (GFC_TYPE_ARRAY_AKIND (type) == GFC_ARRAY_POINTER
+  if (GFC_TYPE_ARRAY_AKIND (type) == GFC_ARRAY_ALLOCATABLE
+	  || GFC_TYPE_ARRAY_AKIND (type) == GFC_ARRAY_POINTER
 	  || GFC_TYPE_ARRAY_AKIND (type) == GFC_ARRAY_POINTER_CONT)
 	{
 	  stmtblock_t cond_block;
@@ -3208,7 +3209,8 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
 
 		  /* We have to check for n->sym->attr.dimension because
 			 of scalar coarrays.  */
-		  if (n->sym->attr.pointer && n->sym->attr.dimension)
+		  if ((n->sym->attr.pointer || n->sym->attr.allocatable)
+			

[PATCH] tree-optimization/102943 - avoid (re-)computing dominance bitmap

2022-03-10 Thread Richard Biener via Gcc-patches
Currently back_propagate_equivalences tries to optimize dominance
queries in a smart way but it fails to notice that when fast indexes
are available the dominance query is fast (when called from DOM).
It also re-computes the dominance bitmap for each equivalence recorded
on an edge, which for FP are usually several.  Finally it fails to
use the tree bitmap view for efficiency.  Overall this cuts 7
seconds of compile-time from originally 77 in the slowest LTRANS
unit when building 521.wrf_r.
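
As a rough illustration of the fallback path described above, the sketch below builds the set of blocks that dominate a destination block once, by walking immediate dominators, and then answers each query with a bit test. It is not the tree-ssa-dom.cc code: the `idom` array, the 64-block limit and all `sketch_` names are assumptions made for the example, and a plain 64-bit word replaces GCC's bitmap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cache of the blocks that strictly dominate DEST.
   Supports at most 64 blocks, purely for illustration.  */
struct sketch_domset
{
  bool valid;     /* Set once the bits have been computed.  */
  uint64_t bits;  /* Bit B is set if block B dominates DEST.  */
};

/* IDOM[b] is the immediate dominator of block B, or -1 for the entry
   block.  Fill SET lazily by walking up from DEST's immediate
   dominator; later calls reuse the cached bits.  */
static void
sketch_fill_domset (struct sketch_domset *set, const int *idom, int dest)
{
  if (set->valid)
    return;

  set->bits = 0;
  for (int bb = idom[dest]; bb >= 0; bb = idom[bb])
    set->bits |= UINT64_C (1) << bb;
  set->valid = true;
}

/* Return true if block USE_BB strictly dominates DEST.  */
static bool
sketch_use_dominates_dest (struct sketch_domset *set, const int *idom,
                           int dest, int use_bb)
{
  sketch_fill_domset (set, idom, dest);
  return (set->bits >> use_bb) & 1;
}
```

Keeping one such cache per edge, rather than per equivalence, is the part of the change that avoids recomputing the set several times for the same destination block.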

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2022-03-10  Richard Biener  

PR tree-optimization/102943
* tree-ssa-dom.cc (back_propagate_equivalences): Only
populate the dominance bitmap if fast queries are not
available.  Use a tree view bitmap.
(record_temporary_equivalences): Cache the dominance bitmap
across all equivalences on the edge.
---
 gcc/tree-ssa-dom.cc | 58 +++--
 1 file changed, 35 insertions(+), 23 deletions(-)

diff --git a/gcc/tree-ssa-dom.cc b/gcc/tree-ssa-dom.cc
index fc90c207b52..21745bf31d3 100644
--- a/gcc/tree-ssa-dom.cc
+++ b/gcc/tree-ssa-dom.cc
@@ -1025,12 +1025,13 @@ dom_valueize (tree t)
additional equivalences that are valid on edge E.  */
 static void
 back_propagate_equivalences (tree lhs, edge e,
-class const_and_copies *const_and_copies)
+class const_and_copies *const_and_copies,
+bitmap *domby)
 {
   use_operand_p use_p;
   imm_use_iterator iter;
-  bitmap domby = NULL;
   basic_block dest = e->dest;
+  bool domok = (dom_info_state (CDI_DOMINATORS) == DOM_OK);
 
   /* Iterate over the uses of LHS to see if any dominate E->dest.
  If so, they may create useful equivalences too.
@@ -1053,27 +1054,38 @@ back_propagate_equivalences (tree lhs, edge e,
   if (!lhs2 || TREE_CODE (lhs2) != SSA_NAME)
continue;
 
-  /* Profiling has shown the domination tests here can be fairly
-expensive.  We get significant improvements by building the
-set of blocks that dominate BB.  We can then just test
-for set membership below.
-
-We also initialize the set lazily since often the only uses
-are going to be in the same block as DEST.  */
-  if (!domby)
+  if (domok)
{
- domby = BITMAP_ALLOC (NULL);
- basic_block bb = get_immediate_dominator (CDI_DOMINATORS, dest);
- while (bb)
+ if (!dominated_by_p (CDI_DOMINATORS, dest, gimple_bb (use_stmt)))
+   continue;
+   }
+  else
+   {
+ /* Profiling has shown the domination tests here can be fairly
+expensive when the fast indexes are not computed.
+We get significant improvements by building the
+set of blocks that dominate BB.  We can then just test
+for set membership below.
+
+We also initialize the set lazily since often the only uses
+are going to be in the same block as DEST.  */
+
+ if (!*domby)
{
- bitmap_set_bit (domby, bb->index);
- bb = get_immediate_dominator (CDI_DOMINATORS, bb);
+ *domby = BITMAP_ALLOC (NULL);
+ bitmap_tree_view (*domby);
+ basic_block bb = get_immediate_dominator (CDI_DOMINATORS, dest);
+ while (bb)
+   {
+ bitmap_set_bit (*domby, bb->index);
+ bb = get_immediate_dominator (CDI_DOMINATORS, bb);
+   }
}
-   }
 
-  /* This tests if USE_STMT does not dominate DEST.  */
-  if (!bitmap_bit_p (domby, gimple_bb (use_stmt)->index))
-   continue;
+ /* This tests if USE_STMT does not dominate DEST.  */
+ if (!bitmap_bit_p (*domby, gimple_bb (use_stmt)->index))
+   continue;
+   }
 
   /* At this point USE_STMT dominates DEST and may result in a
 useful equivalence.  Try to simplify its RHS to a constant
@@ -1083,9 +1095,6 @@ back_propagate_equivalences (tree lhs, edge e,
   if (res && (TREE_CODE (res) == SSA_NAME || is_gimple_min_invariant (res)))
record_equality (lhs2, res, const_and_copies);
 }
-
-  if (domby)
-BITMAP_FREE (domby);
 }
 
 /* Record into CONST_AND_COPIES and AVAIL_EXPRS_STACK any equivalences implied
@@ -1110,6 +1119,7 @@ record_temporary_equivalences (edge e,
   for (i = 0; edge_info->cond_equivalences.iterate (i, &eq); ++i)
avail_exprs_stack->record_cond (eq);
 
+  bitmap domby = NULL;
   edge_info::equiv_pair *seq;
   for (i = 0; edge_info->simple_equivalences.iterate (i, &seq); ++i)
{
@@ -1146,8 +1156,10 @@ record_temporary_equivalences (edge e,
  /* Any equivalence found for LHS may result in additional
 equivalences for other uses of LHS that we have already
 processed.  */
- back_propagate_equivalences (lhs, e, c

[committed] libstdc++: Support VAX floats in std::strong_order

2022-03-10 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux, and basic soundness check on vax-dec-netbsdelf.

Pushed to trunk.

-- >8 --

The VAX float and double formats do not support NaN, so the
std::partial_ordering returned by <=> will never be 'unordered'. We can
just use the partial_ordering value as the strong_ordering.

libstdc++-v3/ChangeLog:

* libsupc++/compare (_Strong_ordering::_S_fp_cmp) [__vax__]: Use
<=> comparison.
---
 libstdc++-v3/libsupc++/compare | 5 +
 1 file changed, 5 insertions(+)

diff --git a/libstdc++-v3/libsupc++/compare b/libstdc++-v3/libsupc++/compare
index 050cf7ed20d..3c22d9addf1 100644
--- a/libstdc++-v3/libsupc++/compare
+++ b/libstdc++-v3/libsupc++/compare
@@ -843,6 +843,11 @@ namespace std
static constexpr strong_ordering
_S_fp_cmp(_Tp __x, _Tp __y) noexcept
{
+#ifdef __vax__
+ // VAX format has no NaN, only "excess" for Inf, so totally ordered.
+ return __builtin_bit_cast(strong_ordering, __x <=> __y);
+#endif
+
  auto __ix = _S_fp_bits(__x);
  auto __iy = _S_fp_bits(__y);
 
-- 
2.34.1



Re: [PATCH RFC] mips: add TARGET_ZERO_CALL_USED_REGS hook [PR104817, PR104820]

2022-03-10 Thread Xi Ruoyao via Gcc-patches
On Wed, 2022-03-09 at 18:25 +, Richard Sandiford wrote:
> Xi Ruoyao  writes:
> > Bootstrapped and regtested on mips64el-linux-gnuabi64.
> > 
> > I'm not sure if it's "correct" to clobber other registers during the
> > zeroing of scratch registers.  But I can't really come up with a
> > better
> > idea: on MIPS there is no simple way to clear one bit in FCSR (i. e.
> > FCC[x]).  We can't just use "c.f.s $fccx,$f0,$f0" because it will
> > raise
> > an exception if $f0 contains a sNaN.
> 
> Yeah, it's a bit of a grey area, but I think it should be fine,
> provided
> that the extra clobbers are never used as return registers (which is
> obviously true for the FCC registers).
> 
> But on that basis…

/* snip */

> > +  if (TEST_HARD_REG_BIT (need_zeroed_hardregs, LO_REGNUM))
> > +   SET_HARD_REG_BIT (zeroed_hardregs, LO_REGNUM);
> > +  else
> > +   emit_clobber (gen_rtx_REG (word_mode, LO_REGNUM));
> 
> …I don't think this conditional LO_REGNUM code is worth it.
> We might as well just add both registers to zeroed_hardregs.

(See below)

> > +    }
> > +
> > +  bool zero_fcc = false;
> > +  for (int i = ST_REG_FIRST; i <= ST_REG_LAST; i++)
> > +    if (TEST_HARD_REG_BIT (need_zeroed_hardregs, i))
> > +  zero_fcc = true;
> > +
> > +  /* MIPS does not have a simple way to clear one bit in FCC.  We just
> > + clear FCC with ctc1 and clobber all FCC bits.  */
> > +  if (zero_fcc)
> > +    {
> > +  emit_insn (gen_mips_zero_fcc ());
> > +  for (int i = ST_REG_FIRST; i <= ST_REG_LAST; i++)
> > +   if (TEST_HARD_REG_BIT (need_zeroed_hardregs, i))
> > + SET_HARD_REG_BIT (zeroed_hardregs, i);
> > +   else
> > + emit_clobber (gen_rtx_REG (CCmode, i));
> > +    }
> 
> Here too I think we should just do:
> 
>   zeroed_hardregs |= reg_class_contents[ST_REGS] & accessible_reg_set;
> 
> to include all available FCC registers.

I'm afraid that doing so will cause an ICE (triggering an assertion
somewhere).  Could someone confirm that returning "more" registers than
required is allowed?  The GCC Internals manual does not say so explicitly,
and the x86 port carefully avoids clearing registers that were not requested
to be cleared.
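
To make the two options concrete, here is a small model of the bookkeeping in the posted patch, using plain bit masks instead of HARD_REG_SET. Everything in it is an assumption made for illustration: the FCC registers are placed in bits 0..7 and the `sketch_` names do not exist in GCC. A single ctc1 clears all FCC bits, so the patch reports only the requested ones as zeroed and treats the rest as clobbered.

```c
#include <stdint.h>

/* Hypothetical layout: bits 0..7 model the eight FCC registers.  */
#define SKETCH_FCC_MASK UINT32_C (0xff)

struct sketch_fcc_result
{
  uint32_t zeroed;     /* Registers reported back as zeroed.  */
  uint32_t clobbered;  /* Registers cleared as a side effect.  */
};

/* Model of the FCC handling in the posted patch: if any FCC register
   was requested, all of them get cleared, but only the requested ones
   are reported as zeroed.  */
static struct sketch_fcc_result
sketch_zero_fcc (uint32_t need_zeroed)
{
  struct sketch_fcc_result r = { 0, 0 };
  uint32_t requested = need_zeroed & SKETCH_FCC_MASK;

  if (requested != 0)
    {
      r.zeroed = requested;
      r.clobbered = SKETCH_FCC_MASK & ~requested;
    }
  return r;
}
```

Richard's suggestion corresponds to reporting the whole of `SKETCH_FCC_MASK` as zeroed instead; whether a hook may return more registers than requested is exactly the open question above.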

> > +  need_zeroed_hardregs &= ~zeroed_hardregs;
> > +  return zeroed_hardregs |
> > +    default_zero_call_used_regs (need_zeroed_hardregs);
> 
> Nit, but: should be formatted as:
> 
>   return (zeroed_hardregs
>   | default_zero_call_used_regs (need_zeroed_hardregs));
> 
> > +}

Will do.

> >  /* Initialize the GCC target structure.  */
> >  #undef TARGET_ASM_ALIGNED_HI_OP
> > @@ -22919,6 +22964,8 @@ mips_asm_file_end (void)
> >  #undef TARGET_ASM_FILE_END
> >  #define TARGET_ASM_FILE_END mips_asm_file_end
> >  
> > +#undef TARGET_ZERO_CALL_USED_REGS
> > +#define TARGET_ZERO_CALL_USED_REGS mips_zero_call_used_regs
> >  
> >  struct gcc_target targetm = TARGET_INITIALIZER;
> >  
> > diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
> > index e0f0a582732..edf58710cdd 100644
> > --- a/gcc/config/mips/mips.md
> > +++ b/gcc/config/mips/mips.md
> > @@ -96,6 +96,7 @@ (define_c_enum "unspec" [
> >    ;; Floating-point environment.
> >    UNSPEC_GET_FCSR
> >    UNSPEC_SET_FCSR
> > +  UNSPEC_ZERO_FCC
> >  
> >    ;; HI/LO moves.
> >    UNSPEC_MFHI
> > @@ -7670,6 +7671,11 @@ (define_insn "*mips_set_fcsr"
> >    "TARGET_HARD_FLOAT"
> >    "ctc1\t%0,$31")
> >  
> > +(define_insn "mips_zero_fcc"
> > +  [(unspec_volatile [(const_int 0)] UNSPEC_ZERO_FCC)]
> > +  "TARGET_HARD_FLOAT"
> > +  "ctc1\t$0,$25")
> 
> I've forgotten a lot of MIPS stuff, so: does this clear only the
> FCC registers, or does it clear other things (such as exception bits)
> as well?

Yes, with fs = 25, CTC1 only clears the FCCs.

> Does it work even for !ISA_HAS_8CC?

For !ISA_HAS_8CC targets, ST_REG_FIRST is not added into
need_zeroed_hardregs at all. I think it's another bug I didn't
catch...

> I think this pattern should explicitly clear all eight registers, e.g.
> using:
> 
>   (set (reg:CC FCC0_REGNUM) (const_int 0))
>   (set (reg:CC FCC1_REGNUM) (const_int 0))
>   …
> 
> which unfortunately means defining 8 new register constants in mips.md.
> I guess for extra safety there should be a separate !ISA_HAS_8CC version
> that only sets FCC0_REGNUM.

Will do.

> An alternative would be to avoid clearing the FCC registers altogether.
> I suppose that's less secure, but residual information could leak through
> the exception bits as well, and it isn't clear whether those should be
> zeroed at the end of each function.  I guess it depends on people's
> appetite for risk.
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [committed] libstdc++: Support VAX floats in std::strong_order

2022-03-10 Thread Jonathan Wakely via Gcc-patches
On Thu, 10 Mar 2022 at 11:53, Jonathan Wakely via Libstdc++
 wrote:
>
> Tested x86_64-linux, and basic soundness check on vax-dec-netbsdelf.

But apparently not enough of a soundness check, because
isnan(__builtin_nan("")) is true for VAX, so GCC seems to have a NaN
pattern, despite what I read online about the format.

Fix on the way ...



[PATCH] ada/104861 - use target_noncanonical for Target_Name

2022-03-10 Thread Richard Biener via Gcc-patches
The following arranges for s-oscons.ads to record target_noncanonical
for Target_Name, matching the install directory layout and what
gcc -dumpmachine says.  This fixes build issues with gprbuild.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed as
approved in bugzilla.

OK also for branches after a while?

Thanks,
Richard.

2022-03-10  Richard Biener  

PR ada/104861
gcc/ada/
* gcc-interface/Makefile.in (target_noncanonical): Substitute.
(OSCONS_CPP): Pass target_noncanonical as TARGET.
---
 gcc/ada/gcc-interface/Makefile.in | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/gcc-interface/Makefile.in b/gcc/ada/gcc-interface/Makefile.in
index b8a24708280..1e9801a8b96 100644
--- a/gcc/ada/gcc-interface/Makefile.in
+++ b/gcc/ada/gcc-interface/Makefile.in
@@ -138,6 +138,7 @@ objdir = .
 
 target_alias=@target_alias@
 target=@target@
+target_noncanonical=@target_noncanonical@
 target_cpu=@target_cpu@
 target_vendor=@target_vendor@
 target_os=@target_os@
@@ -598,7 +599,7 @@ OSCONS_CC=$(subst ./xgcc,../../xgcc,$(subst -B./, -B../../,$(GCC_FOR_TARGET)))
 # has a  header).
 
 OSCONS_CPP=$(OSCONS_CC) $(GNATLIBCFLAGS_FOR_C) -E -C \
-  -DTARGET=\"$(target)\" -iquote $(fsrcpfx)ada $(fsrcpfx)ada/s-oscons-tmplt.c > s-oscons-tmplt.i
+  -DTARGET=\"$(target_noncanonical)\" -iquote $(fsrcpfx)ada $(fsrcpfx)ada/s-oscons-tmplt.c > s-oscons-tmplt.i
 OSCONS_EXTRACT=$(OSCONS_CC) $(GNATLIBCFLAGS_FOR_C) -S s-oscons-tmplt.i
 
 # Note: if you need to build with a non-GNU compiler, you could adapt the
-- 
2.34.1


[PATCH] tree-optimization/102943 - use tree form for sbr_sparse_bitmap

2022-03-10 Thread Richard Biener via Gcc-patches
The following arranges to remove an indirection to the bitvector
in sbr_sparse_bitmap by embedding bitmap_head instead of bitmap
and using the tree form (since we only ever set/query individual
aligned bit chunks).  That shaves off 6 seconds from 70 seconds
of the slowest 521.wrf_r LRANS unit build.
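
To illustrate what "individual aligned bit chunks" means here, a minimal
sketch follows.  It is not GCC's bitmap implementation (which is sparse);
the class quad_store and its members are hypothetical names, and the dense
std::vector backing is an assumption made only to keep the example short.
It shows 4-bit values ("quads") being stored and fetched at aligned
positions, with the storage embedded by value rather than reached through
a pointer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical dense store of 4-bit values ("quads"), 16 per 64-bit word.
// This only mirrors the access pattern, not GCC's sparse bitmap layout.
class quad_store
{
public:
  // Store the low four bits of VALUE at INDEX, growing storage on demand.
  void set (std::size_t index, unsigned value)
  {
    std::size_t word = index / 16;
    unsigned shift = static_cast<unsigned> (index % 16) * 4;
    if (word >= m_words.size ())
      m_words.resize (word + 1, 0);
    m_words[word] &= ~(std::uint64_t{0xf} << shift);
    m_words[word] |= static_cast<std::uint64_t> (value & 0xf) << shift;
  }

  // Return the quad at INDEX, or 0 if nothing was ever stored there.
  unsigned get (std::size_t index) const
  {
    std::size_t word = index / 16;
    if (word >= m_words.size ())
      return 0;
    unsigned shift = static_cast<unsigned> (index % 16) * 4;
    return static_cast<unsigned> ((m_words[word] >> shift) & 0xf);
  }

private:
  // Embedded by value: no extra pointer hop to reach the bits.
  std::vector<std::uint64_t> m_words;
};
```

Because each access touches exactly one aligned 4-bit field, a lookup
structure tuned for random single-element access suits it better than one
tuned for sequential walks.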

Bootstrap & regtest pending on x86_64-unknown-linux-gnu, will push
if successful.

2022-03-10  Richard Biener  

PR tree-optimization/102943
* gimple-range-cache.cc (sbr_sparse_bitmap::bitvec):
Make a bitmap_head.
(sbr_sparse_bitmap::sbr_sparse_bitmap): Adjust and switch
to tree view.
(sbr_sparse_bitmap::set_bb_range): Adjust.
(sbr_sparse_bitmap::get_bb_range): Likewise.
---
 gcc/gimple-range-cache.cc | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 613135266a4..583ba29eb63 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -313,7 +313,7 @@ private:
   int bitmap_get_quad (const_bitmap head, int quad);
   irange_allocator *m_irange_allocator;
   irange *m_range[SBR_NUM];
-  bitmap bitvec;
+  bitmap_head bitvec;
   tree m_type;
 };
 
@@ -324,7 +324,8 @@ sbr_sparse_bitmap::sbr_sparse_bitmap (tree t, 
irange_allocator *allocator,
 {
   gcc_checking_assert (TYPE_P (t));
   m_type = t;
-  bitvec = BITMAP_ALLOC (bm);
+  bitmap_initialize (&bitvec, bm);
+  bitmap_tree_view (&bitvec);
   m_irange_allocator = allocator;
   // Pre-cache varying.
   m_range[0] = m_irange_allocator->allocate (2);
@@ -370,7 +371,7 @@ sbr_sparse_bitmap::set_bb_range (const_basic_block bb, 
const irange &r)
 {
   if (r.undefined_p ())
 {
-  bitmap_set_quad (bitvec, bb->index, SBR_UNDEF);
+  bitmap_set_quad (&bitvec, bb->index, SBR_UNDEF);
   return true;
 }
 
@@ -380,11 +381,11 @@ sbr_sparse_bitmap::set_bb_range (const_basic_block bb, 
const irange &r)
   {
if (!m_range[x])
  m_range[x] = m_irange_allocator->allocate (r);
-   bitmap_set_quad (bitvec, bb->index, x + 1);
+   bitmap_set_quad (&bitvec, bb->index, x + 1);
return true;
   }
   // All values are taken, default to VARYING.
-  bitmap_set_quad (bitvec, bb->index, SBR_VARYING);
+  bitmap_set_quad (&bitvec, bb->index, SBR_VARYING);
   return false;
 }
 
@@ -394,7 +395,7 @@ sbr_sparse_bitmap::set_bb_range (const_basic_block bb, 
const irange &r)
 bool
 sbr_sparse_bitmap::get_bb_range (irange &r, const_basic_block bb)
 {
-  int value = bitmap_get_quad (bitvec, bb->index);
+  int value = bitmap_get_quad (&bitvec, bb->index);
 
   if (!value)
 return false;
@@ -412,7 +413,7 @@ sbr_sparse_bitmap::get_bb_range (irange &r, 
const_basic_block bb)
 bool
 sbr_sparse_bitmap::bb_range_p (const_basic_block bb)
 {
-  return (bitmap_get_quad (bitvec, bb->index) != 0);
+  return (bitmap_get_quad (&bitvec, bb->index) != 0);
 }
 
 // -
-- 
2.34.1


Re: [PATCH] ada/104861 - use target_noncanonial for Target_Name

2022-03-10 Thread Arnaud Charlet via Gcc-patches
> The following arranges for s-oscons.ads to record target_noncanonical
> for Target_Name, matching the install directory layout and what
> gcc -dumpmachine says.  This fixes build issues with gprbuild.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed as
> approved in bugzilla.
> 
> OK also for branches after a while?

Yes, thanks.

> 2022-03-10  Richard Biener  
> 
>   PR ada/104861
> gcc/ada/
>   * gcc-interface/Makefile.in (target_noncanonical): Substitute.
>   (OSCONS_CPP): Pass target_noncanonical as TARGET.


Re: [PATCH] [i386] Don't fold builtin into gimple when isa mismatches.

2022-03-10 Thread Hongtao Liu via Gcc-patches
On Tue, Mar 8, 2022 at 9:30 AM Hongtao Liu  wrote:
>
> ping^1
>
> On Fri, Feb 25, 2022 at 1:51 PM Hongtao Liu  wrote:
> >
> > On Fri, Feb 25, 2022 at 1:50 PM liuhongt  wrote:
> > >
> > > The patch fixes ICE in ix86_gimple_fold_builtin.
> > >
> > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> > Ok for main trunk?
I'm going to push the patch to trunk if there are no objections.
Basically, the patch moves part of the code in ix86_expand_builtin
into ix86_check_builtin_isa_match and uses it to guard
ix86_gimple_fold_builtin.
The patch fixes the GCC 12 regression in PR104666.
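
For readers skimming the thread, the core of the check can be sketched as
below.  This is only an illustration under stated assumptions: the ISA_*
constants are hypothetical stand-ins rather than the real OPTION_MASK_ISA_*
values, and the special cases handled in the actual function are left out.

```cpp
#include <cstdint>

// Hypothetical ISA option bits, for illustration only.
constexpr std::uint64_t ISA_SSE2 = std::uint64_t{1} << 0;
constexpr std::uint64_t ISA_AVX  = std::uint64_t{1} << 1;
constexpr std::uint64_t ISA_AVX2 = std::uint64_t{1} << 2;

// Return true when every bit the builtin requires (BISA, BISA2) is also
// present in the currently enabled sets (ISA, ISA2).
constexpr bool
isa_match (std::uint64_t bisa, std::uint64_t isa,
           std::uint64_t bisa2, std::uint64_t isa2)
{
  return (bisa & isa) == bisa && (bisa2 & isa2) == bisa2;
}
```

A caller in the folding path would skip the fold when the function returns
false and leave the call for expansion to diagnose.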
> >
> > > gcc/ChangeLog:
> > >
> > > PR target/104666
> > > * config/i386/i386-expand.cc
> > > (ix86_check_builtin_isa_match): New func.
> > > (ix86_expand_builtin): Move code to
> > > ix86_check_builtin_isa_match and call it.
> > > * config/i386/i386-protos.h
> > > (ix86_check_builtin_isa_match): Declare.
> > > * config/i386/i386.cc (ix86_gimple_fold_builtin): Don't fold
> > > builtin into gimple when isa mismatches.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/i386/pr104666.c: New test.
> > > ---
> > >  gcc/config/i386/i386-expand.cc   | 97 ++--
> > >  gcc/config/i386/i386-protos.h|  5 ++
> > >  gcc/config/i386/i386.cc  |  4 +
> > >  gcc/testsuite/gcc.target/i386/pr104666.c | 49 
> > >  4 files changed, 115 insertions(+), 40 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr104666.c
> > >
> > > diff --git a/gcc/config/i386/i386-expand.cc 
> > > b/gcc/config/i386/i386-expand.cc
> > > index faa0191c6dd..1d132f0181d 100644
> > > --- a/gcc/config/i386/i386-expand.cc
> > > +++ b/gcc/config/i386/i386-expand.cc
> > > @@ -12232,46 +12232,14 @@ ix86_expand_vec_set_builtin (tree exp)
> > >return target;
> > >  }
> > >
> > > -/* Expand an expression EXP that calls a built-in function,
> > > -   with result going to TARGET if that's convenient
> > > -   (and in mode MODE if that's convenient).
> > > -   SUBTARGET may be used as the target for computing one of EXP's 
> > > operands.
> > > -   IGNORE is nonzero if the value is to be ignored.  */
> > > -
> > > -rtx
> > > -ix86_expand_builtin (tree exp, rtx target, rtx subtarget,
> > > -machine_mode mode, int ignore)
> > > +/* Return true if the necessary isa options for this builtin exist,
> > > +   else false.
> > > +   fcode = DECL_MD_FUNCTION_CODE (fndecl);  */
> > > +bool
> > > +ix86_check_builtin_isa_match (unsigned int fcode,
> > > + HOST_WIDE_INT* pbisa,
> > > + HOST_WIDE_INT* pbisa2)
> > >  {
> > > -  size_t i;
> > > -  enum insn_code icode, icode2;
> > > -  tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
> > > -  tree arg0, arg1, arg2, arg3, arg4;
> > > -  rtx op0, op1, op2, op3, op4, pat, pat2, insn;
> > > -  machine_mode mode0, mode1, mode2, mode3, mode4;
> > > -  unsigned int fcode = DECL_MD_FUNCTION_CODE (fndecl);
> > > -
> > > -  /* For CPU builtins that can be folded, fold first and expand the 
> > > fold.  */
> > > -  switch (fcode)
> > > -{
> > > -case IX86_BUILTIN_CPU_INIT:
> > > -  {
> > > -   /* Make it call __cpu_indicator_init in libgcc. */
> > > -   tree call_expr, fndecl, type;
> > > -type = build_function_type_list (integer_type_node, NULL_TREE);
> > > -   fndecl = build_fn_decl ("__cpu_indicator_init", type);
> > > -   call_expr = build_call_expr (fndecl, 0);
> > > -   return expand_expr (call_expr, target, mode, EXPAND_NORMAL);
> > > -  }
> > > -case IX86_BUILTIN_CPU_IS:
> > > -case IX86_BUILTIN_CPU_SUPPORTS:
> > > -  {
> > > -   tree arg0 = CALL_EXPR_ARG (exp, 0);
> > > -   tree fold_expr = fold_builtin_cpu (fndecl, &arg0);
> > > -   gcc_assert (fold_expr != NULL_TREE);
> > > -   return expand_expr (fold_expr, target, mode, EXPAND_NORMAL);
> > > -  }
> > > -}
> > > -
> > >HOST_WIDE_INT isa = ix86_isa_flags;
> > >HOST_WIDE_INT isa2 = ix86_isa_flags2;
> > >HOST_WIDE_INT bisa = ix86_builtins_isa[fcode].isa;
> > > @@ -12321,7 +12289,56 @@ ix86_expand_builtin (tree exp, rtx target, rtx 
> > > subtarget,
> > >bisa |= OPTION_MASK_ISA_SSE2;
> > >  }
> > >
> > > -  if ((bisa & isa) != bisa || (bisa2 & isa2) != bisa2)
> > > +  if (pbisa)
> > > +*pbisa = bisa;
> > > +  if (pbisa2)
> > > +*pbisa2 = bisa2;
> > > +
> > > +  return (bisa & isa) == bisa && (bisa2 & isa2) == bisa2;
> > > +}
> > > +
> > > +/* Expand an expression EXP that calls a built-in function,
> > > +   with result going to TARGET if that's convenient
> > > +   (and in mode MODE if that's convenient).
> > > +   SUBTARGET may be used as the target for computing one of EXP's 
> > > operands.
> > > +   IGNORE is nonzero if the value is to be ignored.  */
> > > +
> > > +rtx
> > > +ix86_expand_builtin (tree exp, rtx target, rtx subtarget,
> > > 

Re: [PATCH v2 RFC] mips: add TARGET_ZERO_CALL_USED_REGS hook [PR104817, PR104820]

2022-03-10 Thread Xi Ruoyao via Gcc-patches
Changes from v1:

 * Added all zeroed registers into the return value of
   TARGET_ZERO_CALL_USED_REGS.  **The question: is this allowed?**
 * Define mips_zero_fcc insn only for ISA_HAS_8CC and mips_isa >
   MIPS_ISA_MIPS4, because MIPS I-IV and MIPSr6 don't support
   "ISA_HAS_8CC && mips_isa > MIPS_ISA_MIPS4".
 * Change mips_zero_fcc to explicitly clear all eight registers.
 * Report an error for MIPS I-IV.
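
The bookkeeping behind the first item can be sketched as follows.  This is
a toy model, not the hook itself: the register numbering, reg_set,
default_zero and target_zero are all hypothetical, and the stand-in
fallback simply claims to zero whatever it is handed.  It only shows how
the returned set can end up larger than the requested one.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical register numbering, for illustration only.
constexpr std::size_t NUM_REGS = 8;
using reg_set = std::bitset<NUM_REGS>;
constexpr std::size_t HI = 0, LO = 1, FCC0 = 2, FCC1 = 3;

// Stand-in for the generic fallback: pretends to zero exactly what it gets.
reg_set
default_zero (reg_set need)
{
  return need;
}

// Zero the registers in NEED and return the set actually zeroed.
reg_set
target_zero (reg_set need)
{
  reg_set zeroed;

  // HI and LO are cleared as a pair, so both are reported even when only
  // HI was requested.
  if (need.test (HI))
    {
      zeroed.set (HI);
      zeroed.set (LO);
    }

  // A single instruction clears the whole FCC group, so the whole group
  // is reported when any member was requested.
  reg_set fcc;
  fcc.set (FCC0);
  fcc.set (FCC1);
  if ((need & fcc).any ())
    zeroed |= fcc;

  // Whatever is left goes to the generic fallback.
  need &= ~zeroed;
  return zeroed | default_zero (need);
}
```

Whether returning such a superset is acceptable to the caller is exactly
the question raised above; the sketch does not settle it.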

-- >8 --

This fixes the ICEs when using -fzero-call-used-regs=all for the MIPS target.

OpenSSH-8.9p1 has started to enable this by default, giving us a reason
to fix -fzero-call-used-regs for more targets.

gcc/

 PR target/xx (WIP)
 PR target/xx (Don't push)
 * config/mips/mips.cc (mips_zero_call_used_regs): New function.
 (TARGET_ZERO_CALL_USED_REGS): Define.
 * config/mips/mips.md (FCC{0..7}_REGNUM): New constants.
 (mips_zero_fcc): New insn.

gcc/testsuite

 * c-c++-common/zero-scratch-regs-8.c: Enable for MIPS.
 * c-c++-common/zero-scratch-regs-9.c: Likewise.
 * c-c++-common/zero-scratch-regs-10.c: Likewise.
 * c-c++-common/zero-scratch-regs-11.c: Likewise.
---
 gcc/config/mips/mips.cc | 55 +++
 gcc/config/mips/mips.md | 20 +++
 .../c-c++-common/zero-scratch-regs-10.c | 2 +-
 .../c-c++-common/zero-scratch-regs-11.c | 2 +-
 .../c-c++-common/zero-scratch-regs-8.c | 2 +-
 .../c-c++-common/zero-scratch-regs-9.c | 2 +-
 6 files changed, 79 insertions(+), 4 deletions(-)

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index 4f9683e8bf4..59eef515826 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -22611,6 +22611,59 @@ mips_asm_file_end (void)
 if (NEED_INDICATE_EXEC_STACK)
 file_end_indicate_exec_stack ();
 }
+
+static HARD_REG_SET
+mips_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
+{
+ HARD_REG_SET zeroed_hardregs;
+ CLEAR_HARD_REG_SET (zeroed_hardregs);
+
+ if (TEST_HARD_REG_BIT (need_zeroed_hardregs, HI_REGNUM))
+ {
+ /* Clear HI and LO altogether. MIPS target treats HILO as a
+ double-word register. */
+ machine_mode dword_mode = TARGET_64BIT ? TImode : DImode;
+ rtx hilo = gen_rtx_REG (dword_mode, MD_REG_FIRST);
+ rtx zero = CONST0_RTX (dword_mode);
+ emit_move_insn (hilo, zero);
+
+ SET_HARD_REG_BIT (zeroed_hardregs, HI_REGNUM);
+ SET_HARD_REG_BIT (zeroed_hardregs, LO_REGNUM);
+ }
+
+ /* MIPS does not have a simple way to clear one bit in FCC. We just
+ clear FCC with ctc1 and clobber all FCC bits. */
+ HARD_REG_SET fcc = reg_class_contents[ST_REGS] & accessible_reg_set;
+ if (hard_reg_set_intersect_p (need_zeroed_hardregs, fcc))
+ {
+ static bool issued_error = false;
+ if (mips_isa <= MIPS_ISA_MIPS4)
+ {
+ /* We don't have an easy way to clear FCC on MIPS I, II, III,
+ and IV. */
+ if (!issued_error)
+ sorry ("%qs not supported on this target",
+ "-fzero-call-used-regs");
+ issued_error = true;
+
+ /* Prevent an ICE. */
+ need_zeroed_hardregs &= ~fcc;
+ }
+ else
+ {
+ /* If the target is MIPSr6, we should not reach here. All other
+ MIPS targets are ISA_HAS_8CC. */
+ gcc_assert (ISA_HAS_8CC);
+ emit_insn (gen_mips_zero_fcc ());
+ zeroed_hardregs |= fcc;
+ }
+ }
+
+ need_zeroed_hardregs &= ~zeroed_hardregs;
+ return (zeroed_hardregs |
+ default_zero_call_used_regs (need_zeroed_hardregs));
+}
+
 
 /* Initialize the GCC target structure. */
 #undef TARGET_ASM_ALIGNED_HI_OP
@@ -22919,6 +22972,8 @@ mips_asm_file_end (void)
 #undef TARGET_ASM_FILE_END
 #define TARGET_ASM_FILE_END mips_asm_file_end
+#undef TARGET_ZERO_CALL_USED_REGS
+#define TARGET_ZERO_CALL_USED_REGS mips_zero_call_used_regs
 struct gcc_target targetm = TARGET_INITIALIZER;
 
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index e0f0a582732..36d6a43d67f 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -167,6 +167,14 @@ (define_constants
 (SET_FCSR_REGNUM 4)
 (PIC_FUNCTION_ADDR_REGNUM 25)
 (RETURN_ADDR_REGNUM 31)
+ (FCC0_REGNUM 67)
+ (FCC1_REGNUM 68)
+ (FCC2_REGNUM 69)
+ (FCC3_REGNUM 70)
+ (FCC4_REGNUM 71)
+ (FCC5_REGNUM 72)
+ (FCC6_REGNUM 73)
+ (FCC7_REGNUM 74)
 (CPRESTORE_SLOT_REGNUM 76)
 (GOT_VERSION_REGNUM 79)
@@ -7670,6 +7678,18 @@ (define_insn "*mips_set_fcsr"
 "TARGET_HARD_FLOAT"
 "ctc1\t%0,$31")
+(define_insn "mips_zero_fcc"
+ [(set (reg:CC FCC0_REGNUM) (const_int 0))
+ (set (reg:CC FCC1_REGNUM) (const_int 0))
+ (set (reg:CC FCC2_REGNUM) (const_int 0))
+ (set (reg:CC FCC3_REGNUM) (const_int 0))
+ (set (reg:CC FCC4_REGNUM) (const_int 0))
+ (set (reg:CC FCC5_REGNUM) (const_int 0))
+ (set (reg:CC FCC6_REGNUM) (const_int 0))
+ (set (reg:CC FCC7_REGNUM) (const_int 0))]
+ "TARGET_HARD_FLOAT && ISA_HAS_8CC && mips_isa > MIPS_ISA_MIPS4"
+ "ctc1\t$0,$25")
+
 ;; See tls_get_tp_mips16_ for why this form is used.
 (define_insn "mips_set_fcsr_mips16_"
 [(unspec_volatile:SI [(match_operand:P 0 "call_insn_operand" "dS")
diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-10.c 
b/gcc/testsuite/c-c++-common/zero-scratch-regs-10.c
index 96e0b79b328..c23b2ceb391 100644
--- a/gcc/testsuite/c-c++-common/zero-scratch-regs-10.c
+++ b/gcc/tes

[PATCH v2 RFC, resend] mips: add TARGET_ZERO_CALL_USED_REGS hook [PR104817, PR104820]

2022-03-10 Thread Xi Ruoyao via Gcc-patches
On Thu, 2022-03-10 at 21:41 +0800, Xi Ruoyao wrote:
> Changes from v1:
> 
>  * Added all zeroed registers into the return value of
>    TARGET_ZERO_CALL_USED_REGS.  **The question: is this allowed?**
>  * Define mips_zero_fcc insn only for ISA_HAS_8CC and mips_isa >
>    MIPS_ISA_MIPS4, because MIPS I-IV and MIPSr6 don't support
>    "ISA_HAS_8CC && mips_isa > MIPS_ISA_MIPS4".
>  * Change mips_zero_fcc to explicitly clear all eight registers.
>  * Report an error for MIPS I-IV.

My mail client somehow mangled the patch.  Resending...

-- >8 --

This fixes the ICEs when using -fzero-call-used-regs=all for the MIPS target.

OpenSSH-8.9p1 has started to enable this by default, giving us a reason
to fix -fzero-call-used-regs for more targets.

gcc/

PR target/xx (WIP)
PR target/xx (Don't push)
* config/mips/mips.cc (mips_zero_call_used_regs): New function.
  (TARGET_ZERO_CALL_USED_REGS): Define.
* config/mips/mips.md (FCC{0..7}_REGNUM): New constants.
(mips_zero_fcc): New insn.

gcc/testsuite

* c-c++-common/zero-scratch-regs-8.c: Enable for MIPS.
* c-c++-common/zero-scratch-regs-9.c: Likewise.
* c-c++-common/zero-scratch-regs-10.c: Likewise.
* c-c++-common/zero-scratch-regs-11.c: Likewise.
---
 gcc/config/mips/mips.cc   | 55 +++
 gcc/config/mips/mips.md   | 20 +++
 .../c-c++-common/zero-scratch-regs-10.c   |  2 +-
 .../c-c++-common/zero-scratch-regs-11.c   |  2 +-
 .../c-c++-common/zero-scratch-regs-8.c|  2 +-
 .../c-c++-common/zero-scratch-regs-9.c|  2 +-
 6 files changed, 79 insertions(+), 4 deletions(-)

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index 4f9683e8bf4..59eef515826 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -22611,6 +22611,59 @@ mips_asm_file_end (void)
   if (NEED_INDICATE_EXEC_STACK)
 file_end_indicate_exec_stack ();
 }
+
+static HARD_REG_SET
+mips_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
+{
+  HARD_REG_SET zeroed_hardregs;
+  CLEAR_HARD_REG_SET (zeroed_hardregs);
+
+  if (TEST_HARD_REG_BIT (need_zeroed_hardregs, HI_REGNUM))
+{
+  /* Clear HI and LO altogether.  MIPS target treats HILO as a
+double-word register.  */
+  machine_mode dword_mode = TARGET_64BIT ? TImode : DImode;
+  rtx hilo = gen_rtx_REG (dword_mode, MD_REG_FIRST);
+  rtx zero = CONST0_RTX (dword_mode);
+  emit_move_insn (hilo, zero);
+
+  SET_HARD_REG_BIT (zeroed_hardregs, HI_REGNUM);
+  SET_HARD_REG_BIT (zeroed_hardregs, LO_REGNUM);
+}
+
+  /* MIPS does not have a simple way to clear one bit in FCC.  We just
+ clear FCC with ctc1 and clobber all FCC bits.  */
+  HARD_REG_SET fcc = reg_class_contents[ST_REGS] & accessible_reg_set;
+  if (hard_reg_set_intersect_p (need_zeroed_hardregs, fcc))
+{
+  static bool issued_error = false;
+  if (mips_isa <= MIPS_ISA_MIPS4)
+   {
+ /* We don't have an easy way to clear FCC on MIPS I, II, III,
+and IV.  */
+ if (!issued_error)
+   sorry ("%qs not supported on this target",
+  "-fzero-call-used-regs");
+ issued_error = true;
+
+ /* Prevent an ICE.  */
+ need_zeroed_hardregs &= ~fcc;
+   }
+  else
+   {
+ /* If the target is MIPSr6, we should not reach here.  All other
+MIPS targets are ISA_HAS_8CC.  */
+ gcc_assert (ISA_HAS_8CC);
+ emit_insn (gen_mips_zero_fcc ());
+ zeroed_hardregs |= fcc;
+   }
+}
+
+  need_zeroed_hardregs &= ~zeroed_hardregs;
+  return (zeroed_hardregs |
+ default_zero_call_used_regs (need_zeroed_hardregs));
+}
+
 
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
@@ -22919,6 +22972,8 @@ mips_asm_file_end (void)
 #undef TARGET_ASM_FILE_END
 #define TARGET_ASM_FILE_END mips_asm_file_end
 
+#undef TARGET_ZERO_CALL_USED_REGS
+#define TARGET_ZERO_CALL_USED_REGS mips_zero_call_used_regs
 
 struct gcc_target targetm = TARGET_INITIALIZER;
 
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index e0f0a582732..36d6a43d67f 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -167,6 +167,14 @@ (define_constants
(SET_FCSR_REGNUM4)
(PIC_FUNCTION_ADDR_REGNUM   25)
(RETURN_ADDR_REGNUM 31)
+   (FCC0_REGNUM67)
+   (FCC1_REGNUM68)
+   (FCC2_REGNUM69)
+   (FCC3_REGNUM70)
+   (FCC4_REGNUM71)
+   (FCC5_REGNUM72)
+   (FCC6_REGNUM73)
+   (FCC7_REGNUM74)
(CPRESTORE_SLOT_REGNUM  76)
(GOT_VERSION_REGNUM 79)
 
@@ -7670,6 +7678,18 @@ (define_insn "*mips_set_fcsr"
   "TARGET_HARD_FLOAT"
   "ctc1\t%0,$31")
 
+(define_insn "mips_zero_fcc"
+  [(set (reg:CC FCC0_REGNUM) (c

Re: [PATCH] PR c++/84964: Middle-end patch to expand_call for ICE after sorry.

2022-03-10 Thread Jason Merrill via Gcc-patches

On 2/28/22 03:52, Roger Sayle wrote:


This patch resolves PR c++/84964 which is an ICE in the middle-end after
emitting a "sorry, unimplemented" message, and is a regression from
earlier releases of GCC.  The issue is that after encountering a
function call requiring an unreasonable amount of stack space, the
code continues and falls foul of an assert checking that the stack
pointer has been correctly updated.  The fix is to (locally) consider aborted
function calls as "no return", which skips this downstream sanity check.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures.  Ok for mainline?


2022-02-28  Roger Sayle  

gcc/ChangeLog
PR c++/84964
* calls.cc (expand_call): Ignore stack adjustments after sorry.

gcc/testsuite/ChangeLog
PR c++/84964
* g++.dg/pr84964.C: New test case.


Again I'd prefer to have the test in a subdirectory, though which one is 
less clear; either opt or other, I guess.  OK with that change.




Re: [PATCH] c++: allow variadic operator[] for C++23 [PR103460]

2022-03-10 Thread Jason Merrill via Gcc-patches

On 3/10/22 04:34, Jakub Jelinek wrote:

Hi!

wg21.link/p2128 removed "with exactly one parameter" from over.sub
section.  grok_op_properties has for that the last 2 lines in:
 case OVL_OP_FLAG_BINARY:
   if (arity != 2)
 {
   if (operator_code == ARRAY_REF && cxx_dialect >= cxx23)
 break;
but unfortunately it isn't enough, we reject variadic operator[]
earlier.  The following patch accepts variadic operator[] for C++23
too.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


2022-03-10  Jakub Jelinek  

PR c++/103460
* decl.cc (grok_op_properties): Allow variadic operator[] for
C++23.

* g++.dg/cpp23/subscript7.C: New test.

--- gcc/cp/decl.cc.jj   2022-03-09 15:24:57.285926439 +0100
+++ gcc/cp/decl.cc  2022-03-09 16:56:41.053901657 +0100
@@ -15214,6 +15214,9 @@ grok_op_properties (tree decl, bool comp
if (!arg)
{
  /* Variadic.  */
+ if (operator_code == ARRAY_REF && cxx_dialect >= cxx23)
+   break;
+
  error_at (loc, "%qD must not have variable number of arguments",
decl);
  return false;
@@ -15289,7 +15292,8 @@ grok_op_properties (tree decl, bool comp
  }
  
/* There can be no default arguments.  */

-  for (tree arg = argtypes; arg != void_list_node; arg = TREE_CHAIN (arg))
+  for (tree arg = argtypes; arg && arg != void_list_node;
+   arg = TREE_CHAIN (arg))
  if (TREE_PURPOSE (arg))
{
TREE_PURPOSE (arg) = NULL_TREE;
--- gcc/testsuite/g++.dg/cpp23/subscript7.C.jj  2022-03-09 17:02:22.915179262 +0100
+++ gcc/testsuite/g++.dg/cpp23/subscript7.C 2022-03-09 17:02:18.446240994 +0100
@@ -0,0 +1,17 @@
+// PR c++/103460
+// { dg-do compile }
+// { dg-options "-std=c++23" }
+
+struct S {
+  int &operator[] (int, ...);
+} s;
+struct T {
+  int &operator[] (auto...);
+} t;
+struct U {
+  int &operator[] (...);
+} u;
+
+int a = s[1] + s[2, 1] + s[3, 2, 1] + s[4, 3, 2, 1]
+   + t[0.0] + t[nullptr, s, 42]
+   + u[] + u[42] + u[1.5L, 1LL];

Jakub





[committed] analyzer: fix duplicates in check_for_tainted_size_arg

2022-03-10 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r12-7594-g708646de75cba2.

gcc/analyzer/ChangeLog:
* sm-taint.cc (taint_state_machine::check_for_tainted_size_arg):
Avoid generating duplicate saved_diagnostics by only handling the
rdwr_map entry for the ptrarg, not the duplicate entry for the
sizarg.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/taint-size-access-attr-1.c: Add
-fanalyzer-show-duplicate-count to options; verify that a
duplicate was not created for the tainted size.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/sm-taint.cc | 4 
 gcc/testsuite/gcc.dg/analyzer/taint-size-access-attr-1.c | 7 ---
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/gcc/analyzer/sm-taint.cc b/gcc/analyzer/sm-taint.cc
index c7b28329fca..a13c2fe2cfa 100644
--- a/gcc/analyzer/sm-taint.cc
+++ b/gcc/analyzer/sm-taint.cc
@@ -904,6 +904,10 @@ taint_state_machine::check_for_tainted_size_arg 
(sm_context *sm_ctxt,
   if (!access)
continue;
 
+  /* Ignore any duplicate entry in the map for the size argument.  */
+  if (access->ptrarg != argno)
+   continue;
+
   if (access->sizarg == UINT_MAX)
continue;
 
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-size-access-attr-1.c 
b/gcc/testsuite/gcc.dg/analyzer/taint-size-access-attr-1.c
index 724679a8cf3..7d243a9570f 100644
--- a/gcc/testsuite/gcc.dg/analyzer/taint-size-access-attr-1.c
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-size-access-attr-1.c
@@ -1,8 +1,8 @@
 /* Passing tainted sizes to external functions with attribute ((access)) with
a size-index.  */
 
-// TODO: remove need for this option:
-/* { dg-additional-options "-fanalyzer-checker=taint" } */
+// TODO: remove need for the explicit taint option:
+/* { dg-additional-options "-fanalyzer-checker=taint 
-fanalyzer-show-duplicate-count" } */
 
 #include "analyzer-decls.h"
 #include 
@@ -27,7 +27,8 @@ void test_fn_read_only (FILE *f, void *p)
 __analyzer_dump_state ("taint", tmp.sz); /* { dg-warning "state: 
'tainted'" } */
 /* { dg-message "\\(\[0-9\]+\\) \\.\\.\\.to here" "event: to here" { 
target *-*-* } .-1 } */
 
-extern_fn_read_only (p, tmp.sz); /* { dg-warning "use of 
attacker-controlled value 'tmp.sz' as size without upper-bounds checking" } */
+extern_fn_read_only (p, tmp.sz); /* { dg-warning "use of 
attacker-controlled value 'tmp.sz' as size without upper-bounds checking" 
"warning" } */
+/* { dg-bogus "duplicate" "duplicate" { target *-*-* } .-1 } */
   }
 }
 
-- 
2.26.3



Re: [PATCH] call mark_dfs_back_edges() before testing EDGE_DFS_BACK [PR104761]

2022-03-10 Thread Jason Merrill via Gcc-patches

On 3/2/22 19:15, Martin Sebor wrote:

The -Wdangling-pointer code tests the EDGE_DFS_BACK bit but the pass never
calls the mark_dfs_back_edges() function that initializes the bit (I
didn't know about it).  As a result the bit is not set when expected,
which can cause false positives under the right conditions.

The attached patch adds a call to the warning pass to initialize
the bit.  Tested on x86_64-linux.
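
As background on why the flag has to be computed first, here is a minimal
sketch of back-edge detection by depth-first search on a toy graph.  It is
not GCC's implementation of mark_dfs_back_edges; the graph type and the
names visit and find_back_edges are hypothetical.  An edge is a back edge
when it leads to a node that is still on the DFS stack.

```cpp
#include <utility>
#include <vector>

// Hypothetical toy graph: succs[n] lists the successors of node n.
using graph = std::vector<std::vector<int>>;

enum class color { white, grey, black };

static void
visit (const graph &succs, int node, std::vector<color> &state,
       std::vector<std::pair<int, int>> &back_edges)
{
  state[node] = color::grey;	// NODE is on the DFS stack.
  for (int succ : succs[node])
    {
      if (state[succ] == color::grey)
	back_edges.emplace_back (node, succ);	// Edge to an ancestor.
      else if (state[succ] == color::white)
	visit (succs, succ, state, back_edges);
    }
  state[node] = color::black;	// NODE is fully explored.
}

// Return every edge that closes a cycle in a DFS starting at ENTRY.
std::vector<std::pair<int, int>>
find_back_edges (const graph &succs, int entry)
{
  std::vector<color> state (succs.size (), color::white);
  std::vector<std::pair<int, int>> back_edges;
  visit (succs, entry, state, back_edges);
  return back_edges;
}
```

Until a traversal like this has run, nothing has classified the edges, so
a test of the flag cannot be relied on.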


OK on Monday if no other comments.

Jason



[committed] analyzer: check for writes to consts via access attr [PR104793]

2022-03-10 Thread David Malcolm via Gcc-patches
This patch extends:
  -Wanalyzer-write-to-const
  -Wanalyzer-write-to-string-literal
so that they will check for __attribute__ ((access, ) on calls to
externally-defined functions, and complain about read-only regions
pointed to by arguments marked with a "write_only" or "read_write"
attribute.
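
A minimal sketch of the kind of declaration this concerns is shown below.
The function fill_buffer and the WRITE_ONLY macro are hypothetical; the
access attribute itself is a GCC extension (GCC 10 and later), so the macro
expands to nothing elsewhere.  The function is defined here only so the
sketch links; the check described above applies to functions whose bodies
the analyzer cannot see.

```cpp
#include <cstddef>
#include <cstring>

// The access attribute is a GCC extension; other compilers get an empty
// macro so the sketch still builds.
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 10
# define WRITE_ONLY(ptr_idx, size_idx) \
  __attribute__ ((access (write_only, ptr_idx, size_idx)))
#else
# define WRITE_ONLY(ptr_idx, size_idx)
#endif

// Hypothetical function: declares that it writes up to LEN bytes via BUF.
WRITE_ONLY (1, 2) void fill_buffer (void *buf, std::size_t len);

void
fill_buffer (void *buf, std::size_t len)
{
  std::memset (buf, 0x2a, len);
}

// Fine: the destination is writable storage.
char
fill_local ()
{
  char local[8];
  fill_buffer (local, sizeof local);
  return local[0];
}

// By contrast, a call such as
//   fill_buffer ((char *) "literal", 7);
// passes a read-only region for a write_only parameter, which is the
// pattern the extended warnings are meant to report.
```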

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r12-7595-gb6eaf90c64f915.

gcc/analyzer/ChangeLog:
PR analyzer/104793
* region-model.cc
(region_model::check_external_function_for_access_attr): New.
(region_model::handle_unrecognized_call): Call it.
* region-model.h
(region_model::check_external_function_for_access_attr): New decl.
(region_model::handle_unrecognized_call): New decl.

gcc/testsuite/ChangeLog:
PR analyzer/104793
* gcc.dg/analyzer/write-to-const-2.c: New test.
* gcc.dg/analyzer/write-to-function-1.c: New test.
* gcc.dg/analyzer/write-to-string-literal-2.c: New test.
* gcc.dg/analyzer/write-to-string-literal-3.c: New test.
* gcc.dg/analyzer/write-to-string-literal-4.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/region-model.cc  | 58 
 gcc/analyzer/region-model.h   |  3 +
 .../gcc.dg/analyzer/write-to-const-2.c| 60 +
 .../gcc.dg/analyzer/write-to-function-1.c | 15 +
 .../analyzer/write-to-string-literal-2.c  | 19 ++
 .../analyzer/write-to-string-literal-3.c  | 66 +++
 .../analyzer/write-to-string-literal-4.c  | 23 +++
 7 files changed, 244 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/write-to-const-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/write-to-function-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/write-to-string-literal-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/write-to-string-literal-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/write-to-string-literal-4.c

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 5cfa3543f17..5760ff70938 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -1583,6 +1583,61 @@ region_model::purge_state_involving (const svalue *sval,
 ctxt->purge_state_involving (sval);
 }
 
+/* Check CALL a call to external function CALLEE_FNDECL based on
+   any __attribute__ ((access, ) on the latter, complaining to
+   CTXT about any issues.
+
+   Currently we merely call check_region_for_write on any regions
+   pointed to by arguments marked with a "write_only" or "read_write"
+   attribute.  */
+
+void
+region_model::
+check_external_function_for_access_attr (const gcall *call,
+tree callee_fndecl,
+region_model_context *ctxt) const
+{
+  gcc_assert (call);
+  gcc_assert (callee_fndecl);
+  gcc_assert (ctxt);
+
+  tree fntype = TREE_TYPE (callee_fndecl);
+  if (!fntype)
+return;
+
+  if (!TYPE_ATTRIBUTES (fntype))
+return;
+
+  /* Initialize a map of attribute access specifications for arguments
+ to the function call.  */
+  rdwr_map rdwr_idx;
+  init_attr_rdwr_indices (&rdwr_idx, TYPE_ATTRIBUTES (fntype));
+
+  unsigned argno = 0;
+
+  for (tree iter = TYPE_ARG_TYPES (fntype); iter;
+   iter = TREE_CHAIN (iter), ++argno)
+{
+  const attr_access* access = rdwr_idx.get (argno);
+  if (!access)
+   continue;
+
+  /* Ignore any duplicate entry in the map for the size argument.  */
+  if (access->ptrarg != argno)
+   continue;
+
+  if (access->mode == access_write_only
+ || access->mode == access_read_write)
+   {
+ tree ptr_tree = gimple_call_arg (call, access->ptrarg);
+ const svalue *ptr_sval = get_rvalue (ptr_tree, ctxt);
+ const region *reg = deref_rvalue (ptr_sval, ptr_tree, ctxt);
+ check_region_for_write (reg, ctxt);
+ /* We don't use the size arg for now.  */
+   }
+}
+}
+
 /* Handle a call CALL to a function with unknown behavior.
 
Traverse the regions in this model, determining what regions are
@@ -1598,6 +1653,9 @@ region_model::handle_unrecognized_call (const gcall *call,
 {
   tree fndecl = get_fndecl_for_call (call, ctxt);
 
+  if (fndecl && ctxt)
+check_external_function_for_access_attr (call, fndecl, ctxt);
+
   reachable_regions reachable_regs (this);
 
   /* Determine the reachable regions and their mutability.  */
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index aa489d06a38..788d0c22bca 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -846,6 +846,9 @@ class region_model
  region_model_context *ctxt) const;
 
   void check_call_args (const call_details &cd) const;
+  void check_external_function_for_access_attr (const gcall *call,
+   tree callee_fndecl,
+  

[committed] analyzer: add notes to write-to-const/string from access attr [PR104793]

2022-03-10 Thread David Malcolm via Gcc-patches
The previous patch extended
  -Wanalyzer-write-to-const
  -Wanalyzer-write-to-string-literal
to make use of __attribute__ ((access, ), but the results could be
inscrutable.

This patch adds notes to such diagnostics to give the user a reason for
why the analyzer is complaining.

Example output:

test.c: In function 'main':
test.c:15:13: warning: write to string literal 
[-Wanalyzer-write-to-string-literal]
   15 | if (getrandom((char *)test, sizeof(buf), GRND_RANDOM))
  | ^
  'main': event 1
|
|   15 | if (getrandom((char *)test, sizeof(buf), GRND_RANDOM))
|  | ^
|  | |
|  | (1) write to string literal here
|
test.c:3:5: note: parameter 1 of 'getrandom' marked with attribute 'access 
(write_only, 1, 2)'
3 | int getrandom (void *__buffer, size_t __length,
  | ^

Unfortunately we don't have location information for the attributes
themselves, just the function declaration, and there doesn't seem to be
a good way of getting at the location of the individual parameters from
the middle end (the C and C++ FEs both have get_fndecl_argument_location,
but the implementations are different).

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r12-7596-gc65d3c7f9dade1.

gcc/analyzer/ChangeLog:
PR analyzer/104793
* analyzer.h (class pending_note): New forward decl.
* diagnostic-manager.cc (saved_diagnostic::saved_diagnostic):
Initialize m_notes.
(saved_diagnostic::operator==): Compare m_notes.
(saved_diagnostic::add_note): New.
(saved_diagnostic::emit_any_notes): New.
(diagnostic_manager::add_note): New.
(diagnostic_manager::emit_saved_diagnostic): Call emit_any_notes
after emitting the warning.
* diagnostic-manager.h (saved_diagnostic::add_note): New decl.
(saved_diagnostic::emit_any_notes): New decl.
(saved_diagnostic::m_notes): New field.
(diagnostic_manager::add_note): New decl.
* engine.cc (impl_region_model_context::add_note): New.
* exploded-graph.h (impl_region_model_context::add_note): New
decl.
* pending-diagnostic.h (class pending_note): New.
(class pending_note_subclass): New template.
* region-model.cc (class reason_attr_access): New.
(check_external_function_for_access_attr): Add class
annotating_ctxt and use it when checking region.
(noop_region_model_context::add_note): New.
* region-model.h (region_model_context::add_note): New vfunc.
(noop_region_model_context::add_note): New decl.
(class region_model_context_decorator): New.
(class note_adding_context): New.

gcc/testsuite/ChangeLog:
PR analyzer/104793
* gcc.dg/analyzer/write-to-const-2.c: Add dg-message directives
for expected notes.
* gcc.dg/analyzer/write-to-function-1.c: Likewise.
* gcc.dg/analyzer/write-to-string-literal-2.c: Likewise.
* gcc.dg/analyzer/write-to-string-literal-3.c: Likewise.
* gcc.dg/analyzer/write-to-string-literal-4.c: Likewise.
* gcc.dg/analyzer/write-to-string-literal-5.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/analyzer.h   |   1 +
 gcc/analyzer/diagnostic-manager.cc|  43 +-
 gcc/analyzer/diagnostic-manager.h |   7 +
 gcc/analyzer/engine.cc|  10 ++
 gcc/analyzer/exploded-graph.h |   1 +
 gcc/analyzer/pending-diagnostic.h |  43 ++
 gcc/analyzer/region-model.cc  |  73 -
 gcc/analyzer/region-model.h   | 146 ++
 .../gcc.dg/analyzer/write-to-const-2.c|   8 +-
 .../gcc.dg/analyzer/write-to-function-1.c |   2 +-
 .../analyzer/write-to-string-literal-2.c  |   2 +-
 .../analyzer/write-to-string-literal-3.c  |   8 +-
 .../analyzer/write-to-string-literal-4.c  |   2 +-
 .../analyzer/write-to-string-literal-5.c  |  31 
 14 files changed, 362 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/write-to-string-literal-5.c

diff --git a/gcc/analyzer/analyzer.h b/gcc/analyzer/analyzer.h
index 36db4f2b538..223ab7075f3 100644
--- a/gcc/analyzer/analyzer.h
+++ b/gcc/analyzer/analyzer.h
@@ -85,6 +85,7 @@ class bounded_ranges;
 class bounded_ranges_manager;
 
 class pending_diagnostic;
+class pending_note;
 class state_change_event;
 class checker_path;
 class extrinsic_state;
diff --git a/gcc/analyzer/diagnostic-manager.cc 
b/gcc/analyzer/diagnostic-manager.cc
index 680016e94bc..561bb18cee0 100644
--- a/gcc/analyzer/diagnostic-manager.cc
+++ b/gcc/analyzer/diagnostic-manager.cc
@@ -629,7 +629,8 @@ saved_diagnostic::saved_diagnostic (const state_machine *sm,
   m_var (var), m_sval (sv

Re: [committed] libstdc++: Support VAX floats in std::strong_order

2022-03-10 Thread Jonathan Wakely via Gcc-patches
On Thu, 10 Mar 2022 at 12:16, Jonathan Wakely wrote:
>
> On Thu, 10 Mar 2022 at 11:53, Jonathan Wakely via Libstdc++
>  wrote:
> >
> > Tested x86_64-linux, and basic soundness check on vax-dec-netbsdelf.
>
> But apparently not enough of a soundness check, because
> isnan(__builtin_nan("")) is true for VAX, so GCC seems to have a NaN
> pattern, despite what I read online about the format.
>
> Fix on the way ...

Here's the fix that adds support for VAX NaN (and works around
PR104865 which I discovered while trying to make this work).

Tested x86_64-linux, and slightly tested on vax-dec-netbsdelf again.

Pushed to trunk.
commit 73f3b8a53e6664c079731c2a183c16621481d039
Author: Jonathan Wakely 
Date:   Thu Mar 10 14:17:03 2022

libstdc++: Fix std::strong_order to handle NaN on VAX

I mistakenly believed that VAX floats do not support NaN, but with GCC
__builtin_isnan(__builtin_nan("")) is true. That means my previous
change to <compare> is wrong, because it fails to handle NaN.

When std::numeric_limits::is_iec559 is false, as on
VAX, the standard only requires an ordering that is consistent with the
ordering observed by comparison operators. With this change the ordering
is -NaN < numbers < +NaN, and there is no support for different NaN bit
patterns (as I'm not even sure if GCC supports any for VAX).
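
As an illustration of the ordering described above (-NaN < numbers <
+NaN), here is a minimal sketch.  It is not the libstdc++ code: the helper
name nan_aware_order is made up, it works on double only, and it returns
-1, 0 or 1 instead of std::strong_ordering so that it does not need C++20.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: returns -1, 0 or 1 for x < y, x == y, x > y,
// placing negative NaN below every number and positive NaN above.
int nan_aware_order (double x, double y)
{
  const bool x_nan = std::isnan (x);
  const bool y_nan = std::isnan (y);
  if (x_nan || y_nan)
    {
      // 0 for an ordinary number, -1 for a negative NaN, +1 for a
      // positive NaN.  All NaNs of the same sign compare equal.
      const int ix = x_nan ? (std::signbit (x) ? -1 : 1) : 0;
      const int iy = y_nan ? (std::signbit (y) ? -1 : 1) : 0;
      return (ix > iy) - (ix < iy);
    }

  // Neither operand is NaN here, so the built-in comparisons are total.
  assert (!std::isunordered (x, y));
  return (x > y) - (x < y);
}
```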

libstdc++-v3/ChangeLog:

* libsupc++/compare (_Strong_order::_S_fp_cmp) [__vax__]:
Handle NaN.

diff --git a/libstdc++-v3/libsupc++/compare b/libstdc++-v3/libsupc++/compare
index 3c22d9addf1..6e1ed53eeed 100644
--- a/libstdc++-v3/libsupc++/compare
+++ b/libstdc++-v3/libsupc++/compare
@@ -844,8 +844,16 @@ namespace std
_S_fp_cmp(_Tp __x, _Tp __y) noexcept
{
 #ifdef __vax__
- // VAX format has no NaN, only "excess" for Inf, so totally ordered.
- return __builtin_bit_cast(strong_ordering, __x <=> __y);
+ if (__builtin_isnan(__x) || __builtin_isnan(__y))
+   {
+ int __ix = (bool) __builtin_isnan(__x);
+ int __iy = (bool) __builtin_isnan(__y);
+ __ix *= __builtin_signbit(__x) ? -1 : 1;
+ __iy *= __builtin_signbit(__y) ? -1 : 1;
+ return __ix <=> __iy;
+   }
+ else
+   return __builtin_bit_cast(strong_ordering, __x <=> __y);
 #endif
 
  auto __ix = _S_fp_bits(__x);


Re: [PATCH] c++: fold calls to std::move/forward [PR96780]

2022-03-10 Thread Patrick Palka via Gcc-patches
On Wed, 9 Mar 2022, Jason Merrill wrote:

> On 3/1/22 18:08, Patrick Palka wrote:
> > A well-formed call to std::move/forward is equivalent to a cast, but the
> > former being a function call means it comes with bloated debug info, which
> > persists even after the call has been inlined away, for an operation that
> > is never interesting to debug.
> > 
> > This patch addresses this problem in a relatively ad-hoc way by folding
> > calls to std::move/forward into casts as part of the frontend's general
> > expression folding routine.  After this patch with -O2 and a non-checking
> > compiler, debug info size for some testcases decreases by about ~10% and
> > overall compile time and memory usage decreases by ~2%.
> 
> Impressive.  Which testcases?

I saw the largest percent reductions in debug object file size in
various tests from cmcstl2 and range-v3, e.g.
test/algorithm/set_symmetric_difference4.cpp and .../rotate_copy.cpp
(which are among their biggest tests).

Significant reductions in debug object file size can be observed in
some libstdc++ testcases too, such as a 5.5% reduction in
std/ranges/adaptor/join.cc

> 
> Do you also want to handle addressof and as_const in this patch, as Jonathan
> suggested?

Yes, good idea.  Since each of their argument and return types are
indirect types, I think we can use the same NOP_EXPR-based folding for
them.

> 
> I think we can do this now, and think about generalizing more in stage 1.
> 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, is this something we
> > want to consider for GCC 12?
> > 
> > PR c++/96780
> > 
> > gcc/cp/ChangeLog:
> > 
> > * cp-gimplify.cc (cp_fold) : When optimizing,
> > fold calls to std::move/forward into simple casts.
> > * cp-tree.h (is_std_move_p, is_std_forward_p): Declare.
> > * typeck.cc (is_std_move_p, is_std_forward_p): Export.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/opt/pr96780.C: New test.
> > ---
> >   gcc/cp/cp-gimplify.cc  | 18 ++
> >   gcc/cp/cp-tree.h   |  2 ++
> >   gcc/cp/typeck.cc   |  6 ++
> >   gcc/testsuite/g++.dg/opt/pr96780.C | 24 
> >   4 files changed, 46 insertions(+), 4 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C
> > 
> > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> > index d7323fb5c09..0b009b631c7 100644
> > --- a/gcc/cp/cp-gimplify.cc
> > +++ b/gcc/cp/cp-gimplify.cc
> > @@ -2756,6 +2756,24 @@ cp_fold (tree x)
> > case CALL_EXPR:
> > {
> > +   if (optimize
> 
> I think this should check flag_no_inline rather than optimize.

Sounds good.

Here's a patch that extends the folding to as_const and addressof (as
well as __addressof, which I'm kind of unsure about since it's
non-standard).  I suppose it also doesn't hurt to verify that the return
and argument type of the function are sane before we commit to folding.

-- >8 --

Subject: [PATCH] c++: fold calls to std::move/forward [PR96780]

A well-formed call to std::move/forward is equivalent to a cast, but the
former being a function call means the compiler generates debug info for
it, which persists even after the call has been inlined away, for an
operation that's never interesting to debug.

This patch addresses this problem in a relatively ad-hoc way by folding
calls to std::move/forward and other cast-like functions into simple
casts as part of the frontend's general expression folding routine.
After this patch with -O2 and a non-checking compiler, debug info size
for some testcases decreases by about 10% and overall compile time and
memory usage decrease by about 2%.
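
To make the "equivalent to a cast" claim concrete, here is a small
user-level sketch.  It only demonstrates the language-level equivalence
and says nothing about how cp_fold implements the folding.  The helper
names steal_with_move, steal_with_cast and check_equivalence are
hypothetical.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// std::move applied to an lvalue of type T has type T&&, which is the
// same type as the explicit static_cast.
static_assert (std::is_same<decltype (std::move (std::declval<std::string &> ())),
                            std::string &&>::value,
               "std::move yields an rvalue reference");
static_assert (std::is_same<decltype (static_cast<std::string &&> (
                              std::declval<std::string &> ())),
                            std::string &&>::value,
               "the cast yields the same type");

// Hypothetical helper: move-construct from S using the library call.
std::string steal_with_move (std::string &s)
{
  return std::string (std::move (s));
}

// Hypothetical helper: the same operation spelled as a plain cast.
std::string steal_with_cast (std::string &s)
{
  return std::string (static_cast<std::string &&> (s));
}

// Both spellings hand over the same value.
void check_equivalence ()
{
  std::string a ("payload"), b ("payload");
  assert (steal_with_move (a) == steal_with_cast (b));
}
```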

PR c++/96780

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold) <case CALL_EXPR>: When optimizing,
fold calls to std::move/forward and other cast-like functions
into simple casts.

gcc/testsuite/ChangeLog:

* g++.dg/opt/pr96780.C: New test.
---
 gcc/cp/cp-gimplify.cc  | 36 +++-
 gcc/testsuite/g++.dg/opt/pr96780.C | 38 ++
 2 files changed, 73 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index d7323fb5c09..efc4c8f0eb9 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -2756,9 +2756,43 @@ cp_fold (tree x)
 
 case CALL_EXPR:
   {
-   int sv = optimize, nw = sv;
tree callee = get_callee_fndecl (x);
 
+   /* "Inline" calls to std::move/forward and other cast-like functions
+  by simply folding them into the corresponding cast determined by
+  their return type.  This is cheaper than relying on the middle-end
+  to do so, and also means we avoid generating useless debug info for
+  them at all.
+
+  At this point the argument has already been converted into a
+  reference, so it suffices to use a NOP_EXPR to express the
+  cast.  */
+   i

Re: [PATCH] c++: fold calls to std::move/forward [PR96780]

2022-03-10 Thread Jonathan Wakely via Gcc-patches
On Thu, 10 Mar 2022 at 15:27, Patrick Palka wrote:
> Here's a patch that extends the folding to as_const and addressof (as
> well as __addressof, which I'm kind of unsure about since it's
> non-standard).

N.B. libstdc++ almost never uses std::addressof, because that calls
std::__addressof, so we just use that directly to avoid the double
indirection. I plan to change that in stage 1 and make std::addressof
just call the built-in directly, so that it won't have the extra
overhead. If they both get folded that wouldn't matter so much (it
would still be useful for Clang, and would presumably make GCC compile
ever so slightly faster).
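
For readers unfamiliar with the layering mentioned above, a small sketch
of what std::addressof is for and how it relates to the built-in.  The
type odd_address and the helper addressof_matches_builtin are made up for
illustration, and __builtin_addressof is a compiler extension (available
in GCC and Clang), not standard C++.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type whose unary & is deliberately unhelpful.
struct odd_address
{
  odd_address *operator& () { return nullptr; }
  int payload = 0;
};

// Returns true when the library function and the built-in agree on the
// real address of O, bypassing the overloaded operator&.
bool addressof_matches_builtin (odd_address &o)
{
  odd_address *via_library = std::addressof (o);
  odd_address *via_builtin = __builtin_addressof (o);  // extension
  assert (via_library != nullptr);
  return via_library == via_builtin;
}
```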


Re: [PATCH] OpenMP, libgomp: Add new runtime routine omp_get_mapped_ptr.

2022-03-10 Thread Marcel Vollweiler

Hi Jakub,

This is an update to the patch from Tue Mar 8:

https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591343.html

I just added "get_mapped_ptr" to the "omp_runtime_apis" array in omp-low.cc and
replaced "omp_get_num_devices" by "gomp_get_num_devices" in target.c.

Marcel
OpenMP, libgomp: Add new runtime routine omp_get_mapped_ptr.

gcc/ChangeLog:

* omp-low.cc (omp_runtime_api_call): Added get_mapped_ptr to
omp_runtime_apis array.

libgomp/ChangeLog:

* libgomp.map: Added omp_get_mapped_ptr.
* libgomp.texi: Tagged omp_get_mapped_ptr as supported.
* omp.h.in: Added omp_get_mapped_ptr.
* omp_lib.f90.in: Added interface for omp_get_mapped_ptr.
* omp_lib.h.in: Likewise.
* target.c (omp_get_mapped_ptr): Added implementation of
omp_get_mapped_ptr.
* testsuite/libgomp.c-c++-common/get-mapped-ptr-1.c: New test.
* testsuite/libgomp.c-c++-common/get-mapped-ptr-2.c: New test.
* testsuite/libgomp.c-c++-common/get-mapped-ptr-3.c: New test.
* testsuite/libgomp.c-c++-common/get-mapped-ptr-4.c: New test.
* testsuite/libgomp.fortran/get-mapped-ptr-1.f90: New test.
* testsuite/libgomp.fortran/get-mapped-ptr-2.f90: New test.
* testsuite/libgomp.fortran/get-mapped-ptr-3.f90: New test.
* testsuite/libgomp.fortran/get-mapped-ptr-4.f90: New test.

diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index 77176ef..02a0f72 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -3962,6 +3962,7 @@ omp_runtime_api_call (const_tree fndecl)
   "target_is_present",
   "target_memcpy",
   "target_memcpy_rect",
+  "get_mapped_ptr",
   NULL,
   /* Now omp_* calls that are available as omp_* and omp_*_; however, the
 DECL_NAME is always omp_* without tailing underscore.  */
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index 2ac5809..608a54c 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -226,6 +226,11 @@ OMP_5.1 {
omp_get_teams_thread_limit_;
 } OMP_5.0.2;
 
+OMP_5.1.1 {
+  global:
+   omp_get_mapped_ptr;
+} OMP_5.1;
+
 GOMP_1.0 {
   global:
GOMP_atomic_end;
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 161a423..c163b56 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -314,7 +314,7 @@ The OpenMP 4.5 specification is fully supported.
 @item @code{omp_target_is_accessible} runtime routine @tab N @tab
 @item @code{omp_target_memcpy_async} and @code{omp_target_memcpy_rect_async}
   runtime routines @tab N @tab
-@item @code{omp_get_mapped_ptr} runtime routine @tab N @tab
+@item @code{omp_get_mapped_ptr} runtime routine @tab Y @tab
 @item @code{omp_calloc}, @code{omp_realloc}, @code{omp_aligned_alloc} and
   @code{omp_aligned_calloc} runtime routines @tab Y @tab
 @item @code{omp_alloctrait_key_t} enum: @code{omp_atv_serialized} added,
diff --git a/libgomp/omp.h.in b/libgomp/omp.h.in
index 89c5d65..18d0152 100644
--- a/libgomp/omp.h.in
+++ b/libgomp/omp.h.in
@@ -282,6 +282,7 @@ extern int omp_target_memcpy_rect (void *, const void *, 
__SIZE_TYPE__, int,
 extern int omp_target_associate_ptr (const void *, const void *, __SIZE_TYPE__,
 __SIZE_TYPE__, int) __GOMP_NOTHROW;
 extern int omp_target_disassociate_ptr (const void *, int) __GOMP_NOTHROW;
+extern void *omp_get_mapped_ptr (const void *, int) __GOMP_NOTHROW;
 
 extern void omp_set_affinity_format (const char *) __GOMP_NOTHROW;
 extern __SIZE_TYPE__ omp_get_affinity_format (char *, __SIZE_TYPE__)
diff --git a/libgomp/omp_lib.f90.in b/libgomp/omp_lib.f90.in
index daf40dc..506f15c 100644
--- a/libgomp/omp_lib.f90.in
+++ b/libgomp/omp_lib.f90.in
@@ -835,6 +835,15 @@
   end function omp_target_disassociate_ptr
 end interface
 
+interface
+  function omp_get_mapped_ptr (ptr, device_num) bind(c)
+use, intrinsic :: iso_c_binding, only : c_ptr, c_int
+type(c_ptr) :: omp_get_mapped_ptr
+type(c_ptr), value :: ptr
+integer(c_int), value :: device_num
+  end function omp_get_mapped_ptr
+end interface
+
 #if _OPENMP >= 201811
 !GCC$ ATTRIBUTES DEPRECATED :: omp_get_nested, omp_set_nested
 #endif
diff --git a/libgomp/omp_lib.h.in b/libgomp/omp_lib.h.in
index ff857a4..0f48510 100644
--- a/libgomp/omp_lib.h.in
+++ b/libgomp/omp_lib.h.in
@@ -416,3 +416,12 @@
   integer(c_int), value :: device_num
 end function omp_target_disassociate_ptr
   end interface
+
+  interface
+function omp_get_mapped_ptr (ptr, device_num) bind(c)
+  use, intrinsic :: iso_c_binding, only : c_ptr, c_int
+  type(c_ptr) :: omp_get_mapped_ptr
+  t

Re: [PATCH] OpenMP, libgomp: Add new runtime routine omp_get_mapped_ptr.

2022-03-10 Thread Jakub Jelinek via Gcc-patches
On Thu, Mar 10, 2022 at 05:01:35PM +0100, Marcel Vollweiler wrote:
> --- a/gcc/omp-low.cc
> +++ b/gcc/omp-low.cc
> @@ -3962,6 +3962,7 @@ omp_runtime_api_call (const_tree fndecl)
>"target_is_present",
>"target_memcpy",
>"target_memcpy_rect",
> +  "get_mapped_ptr",
>NULL,
>/* Now omp_* calls that are available as omp_* and omp_*_; however, the
>DECL_NAME is always omp_* without tailing underscore.  */

The entries in each NULL-separated subsection are supposed to be sorted
alphabetically.
Other than that LGTM, but stage1 is still far...

Jakub
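
The ordering rule from the review can be checked mechanically.  The sketch
below is a hypothetical checker, not something in omp-low.cc, and it only
uses the entries visible in the quoted hunk; where "get_mapped_ptr" belongs
in the full array depends on entries not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical checker: returns true if every NULL-separated subsection
// of NAMES is in ascending strcmp order.
bool subsections_sorted (const char *const *names, std::size_t count)
{
  assert (names != nullptr || count == 0);
  for (std::size_t i = 1; i < count; ++i)
    {
      // A NULL entry ends one subsection; ordering restarts after it.
      if (names[i - 1] == nullptr || names[i] == nullptr)
        continue;
      if (std::strcmp (names[i - 1], names[i]) > 0)
        return false;
    }
  return true;
}
```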



Re: [PATCH] c++: naming a dependently-scoped template for CTAD [PR104641]

2022-03-10 Thread Patrick Palka via Gcc-patches
On Wed, 9 Mar 2022, Jason Merrill wrote:

> On 3/9/22 10:39, Patrick Palka wrote:
> > On Tue, 8 Mar 2022, Jason Merrill wrote:
> > 
> > > On 3/2/22 14:32, Patrick Palka wrote:
> > > > In order to be able to perform CTAD for a dependently-scoped template
> > > > such as A::B in the testcase below, we need to permit a
> > > > typename-specifier to resolve to a template as per [dcl.type.simple]/2,
> > > > at least when it appears in a CTAD-enabled context.
> > > > 
> > > > This patch implements this using a new tsubst flag tf_tst_ok to control
> > > > when a TYPENAME_TYPE is allowed to name a template, and sets this flag
> > > > when substituting into the type of a CAST_EXPR, CONSTRUCTOR or VAR_DECL
> > > > (each of which is a CTAD-enabled context).
> > > 
> > > What breaks if we always allow that, or at least in -std that support
> > > CTAD?
> > 
> > AFAICT no significant breakage, but some accepts-invalid and diagnostic
> > regressions crop up, e.g. accepts-invalid for
> > 
> >using type = typename A::B; // no more diagnostic if typename resolves
> > to a
> >   // template at instantiation time
> > 
> > and diagnostic regression for
> > 
> >template::B> void f();
> >// no more elaboration why deduction failed if typename resolves
> >// to a template
> 
> Ah, sure, the cost is that we would need to check for this case in various
> callers, rather than setting a flag in a different set of callers.  Fair
> enough.

Yes exactly, and presumably the set of callers for which typename is
permitted to resolve to a template is much smaller, so making the
behavior opt-in instead of opt-out seems more desirable.  Alternatively
we could add a new flag to TYPENAME_TYPE, set carefully at parse time,
that controls this behavior, but it seems simpler overall not to use a
new tree flag if we can get away with it.

> 
> > @@ -16229,6 +16237,12 @@ tsubst (tree t, tree args, tsubst_flags_t complain,
> > tree in_decl)
> >   }
> >   }
> >  +  if (TREE_CODE (f) == TEMPLATE_DECL)
> > + {
> > +   gcc_checking_assert (tst_ok);
> > +   f = make_template_placeholder (f);
> > + }
> 
> How about calling make_template_placeholder in make_typename_type?

That works nicely too, like so?

-- >8 --

Subject: [PATCH] c++: naming a dependently-scoped template for CTAD [PR104641]

In order to be able to perform CTAD for a dependently-scoped template
(such as A::B in the testcase below), we need to permit a
typename-specifier to resolve to a template as per [dcl.type.simple]/3,
at least when it appears in a CTAD-enabled context.

This patch implements this using a new tsubst flag tf_tst_ok to control
when a TYPENAME_TYPE is allowed to name a template, and sets this flag
when substituting into the type of a CAST_EXPR, CONSTRUCTOR or VAR_DECL
(each of which is a CTAD-enabled context).

PR c++/104641

gcc/cp/ChangeLog:

* cp-tree.h (tsubst_flags::tf_tst_ok): New flag.
* decl.cc (make_typename_type): Allow a typename-specifier to
resolve to a template when tf_tst_ok, in which case return
a CTAD placeholder for the template.
* pt.cc (tsubst_decl) : Set tf_tst_ok when
substituting the type.
(tsubst): Clear tf_tst_ok and remember if it was set.
: Pass tf_tst_ok to make_typename_type
appropriately.
(tsubst_copy) : Set tf_tst_ok when substituting
the type.
(tsubst_copy_and_build) : Likewise.
: Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/class-deduction107.C: New test.
---
 gcc/cp/cp-tree.h  |  2 ++
 gcc/cp/decl.cc| 20 ++---
 gcc/cp/pt.cc  | 29 +++
 .../g++.dg/cpp1z/class-deduction107.C | 20 +
 4 files changed, 61 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction107.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index b71bce1ab97..b7606f22287 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5557,6 +5557,8 @@ enum tsubst_flags {
(build_target_expr and friends) */
   tf_norm = 1 << 11,/* Build diagnostic information during
constraint normalization.  */
+  tf_tst_ok = 1 << 12,  /* Allow a typename-specifier to name
+   a template.  */
   /* Convenient substitution flags combinations.  */
   tf_warning_or_error = tf_warning | tf_error
 };
diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 992e38385c2..d2d46915068 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -4204,16 +4204,28 @@ make_typename_type (tree context, tree name, enum 
tag_types tag_type,
 }
   if (!want_template && TREE_CODE (t) != TYPE_DECL)
 {
-  if (complain & tf_error)
-   error ("% names %q#T, which is not a type",
-  context, name, t);
-  return error_ma

Re: [PATCH, V2] Eliminate power8 fusion options, use power8 tuning, PR target/102059

2022-03-10 Thread will schmidt via Gcc-patches
On Wed, 2022-03-09 at 22:49 -0500, Michael Meissner wrote:
> Eliminate power8 fusion options, use power8 tuning, PR target/102059

Hi,

> 
> The power8 fusion support used to be set automatically when -mcpu=power8 or
> -mtune=power8 was used, and it was cleared for other cpu's.  However, if you
> used the target attribute or target #pragma to change the default cpu type or
> tuning, you would get an error that a target specifiction option mismatch
> occurred.
> 
specification. 
(ok :-)


> This occurred because the rs6000_can_inline_p function just compares the ISA
> bits between the called inline function and the caller.  If the ISA flags of
> the called function is not a subset of the ISA flags of the caller, we won't 
> do
> the inlinging.  When a power9 or power10 function inlines a function that is
> explicitly compiled for power8, the power8 function has the power8 fusion bits
> set and the power9 or power10 functions do not have the fusion bits set.
> 

inlining.
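
The subset test described in the quoted paragraph can be sketched as a
plain bitmask check.  This is only a model of the idea, not the body of
rs6000_can_inline_p; the SKETCH_MASK_* bits and the function name are
hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical ISA flag bits, for illustration only.
constexpr std::uint64_t SKETCH_MASK_VSX = std::uint64_t{1} << 0;
constexpr std::uint64_t SKETCH_MASK_P8_FUSION = std::uint64_t{1} << 1;
constexpr std::uint64_t SKETCH_MASK_P9_VECTOR = std::uint64_t{1} << 2;

// Inlining is allowed only if every ISA bit the callee was compiled with
// is also set for the caller.
bool callee_isa_is_subset (std::uint64_t caller_isa, std::uint64_t callee_isa)
{
  return (caller_isa & callee_isa) == callee_isa;
}

// The situation from the report: a power8 callee carries the fusion bit,
// the power9 caller does not, so the subset test fails.
void show_mismatch ()
{
  const std::uint64_t callee = SKETCH_MASK_VSX | SKETCH_MASK_P8_FUSION;
  const std::uint64_t caller = SKETCH_MASK_VSX | SKETCH_MASK_P9_VECTOR;
  assert (!callee_isa_is_subset (caller, callee));
}
```

With the fusion bit no longer part of the compared flags, the same callee
would pass the check.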


> This code removes the -mpower8-fusion and -mpower8-fusion-sign options, and
> only enables power8 fusion if we are tuning for a power8.  Power8 sign fusion
> is only enabled if we are tuning for a power8 and we have -O3 optimization or
> higher.
> 
> I left the options -mno-power8-fusion and -mno-power8-fusion-sign in 
> rs6000.opt
> and they don't issue a warning.  If the user explicitly used -mpower8-fusion 
> or
> -mpower8-fusion-sign, then they will get a warning that the swtich has been
> removed.
> 

switch


> Similarly, I left in the pragma target and attribute target support for the
> fusion options, but they don't do anything now.  This is because I believe the
> customer who encountered this problem now is explicitly setting the
> no-power8-fusion option in the pragma or attribute to avoid the warning.
> 
> I have tested this on the following systems, and they all bootstraps fine and
> there were no regressions in the test suite:
> 
> big endian power8 (both 64-bit and 32-bit)
> little endian power9
> little endian power10
> 

ok.

> Can I check this patch into the current master branch for GCC and after a
> cooling period check in the patch to the GCC 11 and GCC 10 branches.  The
> customer is currently using GCC 10.
> 
> 2022-03-09   Michael Meissner  
> 
> gcc/
>   PR target/102059
>   * config/rs6000/rs6000-cpus.def (OTHER_FUSION_MASKS): Delete.
>   (ISA_3_0_MASKS_SERVER): Don't clear the fusion masks.
>   (POWERPC_MASKS): Remove OPTION_MASK_P8_FUSION.

ok

>   * config/rs6000/rs6000.cc (rs6000_option_override_internal):
>   Delete code that set the power8 fusion options automatically.
>   (rs6000_opt_masks): Allow #pragma target and attribute target to set
>   power8-fusion and power8-fusion-sign, but these no longer represent
>   options that the user can set.
>   (rs6000_print_options_internal): Skip printing nop options.

ok


>   * config/rs6000/rs6000.h (TARGET_P8_FUSION): New macro.
>   (TARGET_P8_FUSION_SIGN): Likewise.
>   (MASK_P8_FUSION): Delete.

ok


>   * config/rs6000/rs6000.opt (-mpower8-fusion): Recognize the option but
>   ignore the no form and warn that the option was removed for the regular
>   form.
>   (-mpower8-fusion-sign): Likewise.

ok

>   * doc/invoke.texi (RS/6000 and PowerPC Options): Delete -mpower8-fusion
>   and -mpower8-fusion-sign.

This change removes the -mpower8-fusion and -mno-power8-fusion options.
There is no direct reference to -mpower8-fusion-sign in the change
here.  It may be an implied removal, but that is not immediately obvious to me.


> 
> gcc/testsuite/
>   PR target/102059
>   * gcc.dg/lto/pr102059-1_0.c: Remove -mno-power8-fusion.
>   * gcc.dg/lto/pr102059-2_0.c: Likewise.
>   * gcc.target/powerpc/pr102059-3.c: Likewise.
>   * gcc.target/powerpc/pr102059-4.c: New test.

ok

> ---
>  gcc/config/rs6000/rs6000-cpus.def | 22 +++--
>  gcc/config/rs6000/rs6000.cc   | 49 +--
>  gcc/config/rs6000/rs6000.h| 14 +-
>  gcc/config/rs6000/rs6000.opt  | 19 +--
>  gcc/doc/invoke.texi   | 13 +
>  gcc/testsuite/gcc.dg/lto/pr102059-1_0.c   |  2 +-
>  gcc/testsuite/gcc.dg/lto/pr102059-2_0.c   |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr102059-3.c |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr102059-4.c | 23 +
>  9 files changed, 75 insertions(+), 71 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-4.c
> 
> diff --git a/gcc/config/rs6000/rs6000-cpus.def 
> b/gcc/config/rs6000/rs6000-cpus.def
> index 963947f6939..a05b2d8c41a 100644
> --- a/gcc/config/rs6000/rs6000-cpus.def
> +++ b/gcc/config/rs6000/rs6000-cpus.def
> @@ -43,9 +43,7 @@
>| OPTION_MASK_ALTIVEC  \
>| OPTION_MASK_VSX)
> 
> -/* For now, don't provide an embedded version of ISA 2.07.  Do n

[RFC] RISCV: Combine Pass Clobber Ops

2022-03-10 Thread Patrick O'Neill
RISC-V's C extension describes 2-byte instructions with special
constraints. One of those constraints is that the destination register
must equal one of the source registers (the op clobbers one of its
operands). This patch adds support for combining simple sequences:

r1 = r2 + r3 (4 bytes)
r2 DEAD
r4 = r1 + r5 (4 bytes)
r1 DEAD

Combine pass now generates:

r2 = r2 + r3 (2 bytes)
r4 = r2 + r5 (4 bytes)
r2 DEAD

This change results in a ~150 Byte decrease in the Linux kernel's
compiled size (text: 5327254 Bytes -> 5327102 Bytes).
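
The byte counts in the two sequences above can be reproduced with a toy
size model.  This is a simplification for illustration only: add_insn,
encoded_size and sequence_size are hypothetical names, only
register-register adds are modelled, and the remaining encoding
restrictions of the compressed instructions are ignored.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of a register-register add: rd = rs1 + rs2.
struct add_insn
{
  int rd;
  int rs1;
  int rs2;
};

// Size in bytes under the simplified rule: an add whose destination is
// also one of its sources (add is commutative) fits the 2-byte form,
// any other add needs the 4-byte form.
int encoded_size (const add_insn &insn)
{
  assert (insn.rd > 0 && insn.rs1 > 0 && insn.rs2 > 0);
  return (insn.rd == insn.rs1 || insn.rd == insn.rs2) ? 2 : 4;
}

// Total size of a straight-line sequence of adds.
int sequence_size (const add_insn *insns, std::size_t count)
{
  int total = 0;
  for (std::size_t i = 0; i < count; ++i)
    total += encoded_size (insns[i]);
  return total;
}
```

Under this model the original pair costs 4 + 4 bytes and the rewritten
pair 2 + 4 bytes, matching the annotations in the example.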

I added this enforcement during the combine pass since it looks at the
cost of certain expressions and can rely on the target to tell the
pass that clobber-ops are cheaper than regular ops.

The main thing holding this RFC back is the combine pass's behavior for
sequences like this:
b = a << 5;
c = b + 2;

Normally the combine pass modifies the RTL to be:
c = (a << 5) + 2
before expanding it back to the original statement.

With my changes, the RTL is prevented from being combined like that and
instead results in RTL like this:
c = 2
which is clearly wrong.

I think that the next step would be to figure out where this
re-expansion takes place and implement the same-register constraint
there. However, I'm opening the RFC for any input:
1. Are there better ways to enforce same-register constraints during the
   combine pass other than declaring the source/dest register to be the
   same in RTL? Specifically, I'm concerned that this addition may
   restrict subsequent RTL pass optimizations.
2. Are there other concerns with implementing source-dest constraints
   within the combine pass?
3. Any other thoughts/input you have is welcome!

2022-03-10 Patrick O'Neill 

* combine.cc: Add register equality replacement.
* riscv.cc (riscv_insn_cost): Add in order to tell combine pass
  that clobber-ops are cheaper.
* riscv.h: Add c extension argument macros.

Signed-off-by: Patrick O'Neill 
---
 gcc/combine.cc| 71 +++
 gcc/config/riscv/riscv.cc | 42 +++
 gcc/config/riscv/riscv.h  |  7 
 3 files changed, 120 insertions(+)

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 8f06ee0e54f..be910f8c734 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -3280,6 +3280,77 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
   i2_is_used = n_occurrences;
 }
 
+/* Attempt to replace ops with clobber-ops.
+ If the target implements clobber ops (set r1 (plus (r1)(r2))) as cheaper,
+ this pattern allows the combine pass to optimize with that in mind.
+ NOTE: This conditional is not triggered in most cases. Ideally we would be
+ able to move it above the if (i2_is_used == 0), but that breaks the
+ testsuite.
+ See RFC blurb for more info.  */
+  if (!i0 && !i1 && i2 && i3 && GET_CODE(PATTERN(i2)) == SET && 
GET_CODE(SET_DEST(PATTERN(i2))) == REG
+  && GET_CODE(PATTERN(i3)) == SET
+  && (GET_RTX_CLASS(GET_CODE(SET_SRC(PATTERN(i3)))) == RTX_BIN_ARITH || 
GET_RTX_CLASS(GET_CODE(SET_SRC(PATTERN(i3)))) == RTX_COMM_ARITH || 
GET_RTX_CLASS(GET_CODE(SET_SRC(PATTERN(i3)))) == RTX_UNARY)
+  && GET_CODE(SET_DEST(PATTERN(i3))) == REG && 
REGNO(XEXP(SET_SRC(PATTERN(i3)), 0)) != REGNO(SET_DEST(PATTERN(i3)))) {
+
+rtx_code insn_class = GET_CODE(SET_SRC (PATTERN(i2)));
+
+if (GET_RTX_CLASS(insn_class) == RTX_BIN_ARITH || 
GET_RTX_CLASS(insn_class) == RTX_COMM_ARITH || GET_RTX_CLASS(insn_class) == 
RTX_UNARY) {
+
+  rtx operand1 = XEXP (SET_SRC (PATTERN (i2)), 0);
+  rtx prior_reg = SET_DEST (PATTERN (i2));
+
+  if (GET_CODE(operand1) == REG
+  && GET_MODE(operand1) == GET_MODE(prior_reg)
+ && find_reg_note (i2, REG_DEAD, operand1)
+  && regno_use_in (REGNO(prior_reg), PATTERN(i3))
+ && find_reg_note (i3, REG_DEAD, SET_DEST (PATTERN(i2)))) {
+
+   // Now we have a dead operand register, and we know where the dest dies.
+
+   // Remove the note declaring the register as dead
+   rtx note = find_reg_note (i2, REG_DEAD, operand1);
+   remove_note (i2, note);
+
+   // Overwrite i2 dest with operand1
+   rtx i2_dest = copy_rtx(operand1);
+   SUBST (SET_DEST (PATTERN (i2)), i2_dest);
+
+   // Replace the previous i2 dest register with operand1 in i3
+   rtx op1_copy = copy_rtx(operand1);
+   rtx new_src = simplify_replace_rtx(SET_SRC (PATTERN (i3)), prior_reg, 
op1_copy);
+   SUBST (SET_SRC (PATTERN (i3)), new_src);
+
+   // Move the dest dead note to the new register
+   note = find_reg_note (i3, REG_DEAD, prior_reg);
+   if (note) {
+ remove_note (i3, note);
+ //add_reg_note (i3, REG_DEAD, op1_copy);
+   }
+
+   newi2pat = PATTERN (i2);
+   newpat = PATTERN (i3);
+
+   subst_insn = i3;
+   subst_low_luid = DF_INSN_LUID (i2);
+   added_sets_2 = added_sets_1 = added_sets_0 = 0;
+   i2dest = i2_dest;
+   i2dest_killed = dead_

Re: [RFC] RISCV: Combine Pass Clobber Ops

2022-03-10 Thread H.J. Lu via Gcc-patches
On Thu, Mar 10, 2022 at 9:36 AM Patrick O'Neill  wrote:
>
> RISC-V's C-extension describes 2-byte instructions with special
> constraints. One of those constraints is that one of the sources/dest
> registers are equal (op will clobber one of it's operands). This patch
> adds support for combining simple sequences:
>
> r1 = r2 + r3 (4 bytes)
> r2 DEAD
> r4 = r1 + r5 (4 bytes)
> r1 DEAD
>
> Combine pass now generates:
>
> r2 = r2 + r3 (2 bytes)
> r4 = r2 + r5 (4 bytes)
> r2 DEAD
>
> This change results in a ~150 Byte decrease in the linux kernel's
> compiled size (text: 5327254 Bytes -> 5327102 Bytes).
>
> I added this enforcement during the combine pass since it looks at the
> cost of certian expressions and can rely on the target to tell the
> pass that clobber-ops are cheaper than regular ops.
>
> The main thing holding this RFC back is the combine pass's behavior for
> sequences like this:
> b = a << 5;
> c = b + 2;
>
> Normally the combine pass modifies the RTL to be:
> c = (a << 5) + 2
> before expanding it back to the original statement.
>
> With my changes, the RTL is prevented from being combined like that and
> instead results in RTL like this:
> c = 2
> which is clearly wrong.
>
> I think that the next step would be to figure out where this
> re-expansion takes place and implement the same-register constraint
> there. However, I'm opening the RFC for any input:
> 1. Are there better ways to enforce same-register constraints during the
>combine pass other than declaring the source/dest register to be the
>same in RTL? Specifically, I'm concerned that this addition may
>restrict subsequent RTL pass optimizations.
> 2. Are there other concerns with implementing source-dest constraints
>within the combine pass?
> 3. Any other thoughts/input you have is welcome!
>
> 2022-03-10 Patrick O'Neill 
>
> * combine.cc: Add register equality replacement.
> * riscv.cc (riscv_insn_cost): Add in order to tell combine pass
>   that clobber-ops are cheaper.
> * riscv.h: Add c extension argument macros.
>
> Signed-off-by: Patrick O'Neill 
> ---
>  gcc/combine.cc| 71 +++
>  gcc/config/riscv/riscv.cc | 42 +++
>  gcc/config/riscv/riscv.h  |  7 
>  3 files changed, 120 insertions(+)
>
> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index 8f06ee0e54f..be910f8c734 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -3280,6 +3280,77 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
> rtx_insn *i0,
>i2_is_used = n_occurrences;
>  }
>
> +/* Attempt to replace ops with clobber-ops.
> + If the target implements clobber ops (set r1 (plus (r1)(r2))) as 
> cheaper,
> + this pattern allows the combine pass to optimize with that in mind.
> + NOTE: This conditional is not triggered in most cases. Ideally we would 
> be
> + able to move it above the if (i2_is_used == 0), but that breaks the
> + testsuite.
> + See RFC blurb for more info.  */
> +  if (!i0 && !i1 && i2 && i3 && GET_CODE(PATTERN(i2)) == SET && 
> GET_CODE(SET_DEST(PATTERN(i2))) == REG

Please put one condition per line to break the long line.

> +  && GET_CODE(PATTERN(i3)) == SET
> +  && (GET_RTX_CLASS(GET_CODE(SET_SRC(PATTERN(i3)))) == RTX_BIN_ARITH ||
> GET_RTX_CLASS(GET_CODE(SET_SRC(PATTERN(i3)))) == RTX_COMM_ARITH ||
> GET_RTX_CLASS(GET_CODE(SET_SRC(PATTERN(i3)))) == RTX_UNARY)

Likewise.

> +  && GET_CODE(SET_DEST(PATTERN(i3))) == REG && 
> REGNO(XEXP(SET_SRC(PATTERN(i3)), 0)) != REGNO(SET_DEST(PATTERN(i3)))) {

Likewise.

> +
> +rtx_code insn_class = GET_CODE(SET_SRC (PATTERN(i2)));
> +
> +if (GET_RTX_CLASS(insn_class) == RTX_BIN_ARITH || 
> GET_RTX_CLASS(insn_class) == RTX_COMM_ARITH || GET_RTX_CLASS(insn_class) == 
> RTX_UNARY) {
> +

Likewise.

> +  rtx operand1 = XEXP (SET_SRC (PATTERN (i2)), 0);
> +  rtx prior_reg = SET_DEST (PATTERN (i2));
> +
> +  if (GET_CODE(operand1) == REG
> +  && GET_MODE(operand1) == GET_MODE(prior_reg)
> + && find_reg_note (i2, REG_DEAD, operand1)
> +  && regno_use_in (REGNO(prior_reg), PATTERN(i3))
> + && find_reg_note (i3, REG_DEAD, SET_DEST (PATTERN(i2)))) {
> +
> +   // Now we have a dead operand register, and we know where the dest 
> dies.
> +
> +   // Remove the note declaring the register as dead
> +   rtx note = find_reg_note (i2, REG_DEAD, operand1);
> +   remove_note (i2, note);
> +
> +   // Overwrite i2 dest with operand1
> +   rtx i2_dest = copy_rtx(operand1);
> +   SUBST (SET_DEST (PATTERN (i2)), i2_dest);
> +
> +   // Replace the previous i2 dest register with operand1 in i3
> +   rtx op1_copy = copy_rtx(operand1);
> +   rtx new_src = simplify_replace_rtx(SET_SRC (PATTERN (i3)), prior_reg, 
> op1_copy);
> +   SUBST (SET_SRC (PATTERN (i3)), new_src);
> +
> +   // Move the dest dead note to the new register
> +   note = find_reg_note (i3

Re: [committed] libstdc++: Support VAX floats in std::strong_order

2022-03-10 Thread Koning, Paul via Gcc-patches



> On Mar 10, 2022, at 9:27 AM, Jonathan Wakely via Gcc-patches 
>  wrote:
> 
> On Thu, 10 Mar 2022 at 12:16, Jonathan Wakely wrote:
>> 
>> On Thu, 10 Mar 2022 at 11:53, Jonathan Wakely via Libstdc++
>>  wrote:
>>> 
>>> Tested x86_64-linux, and basic soundness check on vax-dec-netbsdelf.
>> 
>> But apparently not enough of a soundness check, because
>> isnan(__builtin_nan("")) is true for VAX, so GCC seems to have a NaN
>> pattern, despite what I read online about the format.

VAX float has signalling NaN, but not a non-signalling NaN nor an Inf.  See the 
VAX architecture manual.  Signalling NaN (called "reserved operand") is encoded 
as sign=1 and exponent=0.
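
To make the encoding concrete, here is a minimal sketch (not libstdc++ or GCC
code) that tests a VAX F_floating bit pattern for the reserved-operand
encoding described above.  It assumes the usual F_floating layout, with the
sign in bit 15 and the 8-bit exponent in bits 14..7 of the first 16-bit word;
the function name is made up for illustration.

```cpp
#include <cstdint>

// Sketch only.  Assumption: FIRST_WORD is the first 16-bit word of a VAX
// F_floating value, holding the sign in bit 15 and the exponent in
// bits 14..7.  The fraction bits do not take part in the test.
constexpr bool
vax_f_is_reserved_operand (std::uint16_t first_word)
{
  const bool sign_set = (first_word & 0x8000u) != 0;
  const bool exponent_zero = (first_word & 0x7f80u) == 0;
  return sign_set && exponent_zero;
}
```

Under that assumption, a pattern with sign=0 and exponent=0 is not flagged,
and neither is any pattern with a non-zero exponent.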

paul



Re: [PATCH] libgomp : OMPD implementation

2022-03-10 Thread Mohamed Atef via Gcc-patches
Hi all,
  This is a reminder about this patch.

Thanks

Mohamed

On Wed, Feb 16, 2022 at 11:04 PM Mohamed Atef
wrote:

> Sorry I forgot to uncomment 2 lines,
> here's the Patch Again.
>
> Thanks
> Mohamed
>
> On Wed, Feb 16, 2022 at 10:54 PM Mohamed Atef 
> wrote:
>
>> Hi,
>> I am sorry that the previous patch was buggy.
>> This patch contains the header and source files for the functions
>> specified in sections 5.1, 5.2, 5.3, 5.4, 5.5.1, and 5.5.2 of the
>> OpenMP Application Programming Interface book. The functions were
>> tested using the gdb plugin, and the results are correct.
>> Please review this patch and reply to us.
>>
>> 2022-02-16  Mohamed Atef  
>>
>> * Makefile.am (toolexeclib_LTLIBRARIES): Add libgompd.la.
>> (libgompd_la_LDFLAGS, libgompd_la_DEPENDENCIES, libgompd_la_LINK,
>> libgompd_la_SOURCES, libgompd_version_dep,
>> libgompd_version_script,
>> libgompd.ver-sun, libgompd.ver, libgompd_version_info): Defined.
>> * Makefile.in: Regenerate.
>> * aclocal.m4: Regenerate.
>> * config/darwin/plugin-suffix.h: Removed ().
>> * config/hpux/plugin-suffix.h: Removed ().
>> * config/posix/plugin-suffix.h: Removed ().
>> * configure: Regenerate.
>> * env.c: (#include "ompd-support.h") : Added.
>> (initialize_env) : Call ompd_load().
>> * parallel.c:(#include "ompd-support.h"): Added.
>> (GOMP_parallel) : Call ompd_bp_parallel_begin and
>> ompd_bp_parallel_end.
>> * libgomp.map: Add OMP_5.0.3 symbol versions.
>> * libgompd.map: New file.
>> * omp-tools.h.in : New file.
>> * omp-types.h.in : New file.
>> * ompd-support.h : New file.
>> * ompd-support.c : New file.
>> * ompd-helper.h : New file.
>> * ompd-helper.c: New file.
>> * ompd-init.c: New file.
>> * testsuite/Makefile.in: Regenerate.
>>
>>
>>
>>


[RFC v2] RISCV: Combine Pass Clobber Ops

2022-03-10 Thread Patrick O'Neill
RISC-V's C-extension describes 2-byte instructions with special
constraints. One of those constraints is that the destination register
equals one of the source registers (the op clobbers one of its operands).
This patch adds support for combining simple sequences:

r1 = r2 + r3 (4 bytes)
r2 DEAD
r4 = r1 + r5 (4 bytes)
r1 DEAD

Combine pass now generates:

r2 = r2 + r3 (2 bytes)
r4 = r2 + r5 (4 bytes)
r2 DEAD

This change results in a ~150 Byte decrease in the Linux kernel's
compiled size (text: 5327254 Bytes -> 5327102 Bytes).
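
As a rough illustration of the rewrite above, here is a toy model of the
transformation.  This is not combine.cc code: registers are plain integers,
the two liveness facts are passed in as flags rather than read from REG_DEAD
notes, and every name below is hypothetical.

```cpp
// Toy three-address instruction: dest = src1 op src2.
struct toy_insn
{
  int dest;
  int src1;
  int src2;
};

struct rewrite_result
{
  toy_insn i2;
  toy_insn i3;
  bool changed;
};

// If i2's first source dies in i2 and i2's destination dies in i3, reuse the
// first source as i2's destination and rename the use in i3 to match.
constexpr rewrite_result
rewrite_as_clobber_op (toy_insn i2, toy_insn i3,
		       bool i2_src1_dies_in_i2, bool i2_dest_dies_in_i3)
{
  const bool i3_uses_i2_dest = i3.src1 == i2.dest || i3.src2 == i2.dest;
  if (!i2_src1_dies_in_i2 || !i2_dest_dies_in_i3 || !i3_uses_i2_dest)
    return {i2, i3, false};

  const int old_dest = i2.dest;
  i2.dest = i2.src1;		/* r1 = r2 + r3 becomes r2 = r2 + r3.  */
  if (i3.src1 == old_dest)
    i3.src1 = i2.dest;		/* r4 = r1 + r5 becomes r4 = r2 + r5.  */
  if (i3.src2 == old_dest)
    i3.src2 = i2.dest;
  return {i2, i3, true};
}
```

Calling it with i2 = {1, 2, 3} and i3 = {4, 1, 5} and both flags set models
the r1/r2 example from the description; with either flag clear the pair is
returned unchanged.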

I added this enforcement during the combine pass since it looks at the
cost of certain expressions and can rely on the target to tell the
pass that clobber-ops are cheaper than regular ops.

The main thing holding this RFC back is the combine pass's behavior for
sequences like this:
b = a << 5;
c = b + 2;

Normally the combine pass modifies the RTL to be:
c = (a << 5) + 2
before expanding it back to the original statement.

With my changes, the RTL is prevented from being combined like that and
instead results in RTL like this:
c = 2
which is clearly wrong.

I think that the next step would be to figure out where this
re-expansion takes place and implement the same-register constraint
there. However, I'm opening the RFC for any input:
1. Are there better ways to enforce same-register constraints during the
   combine pass other than declaring the source/dest register to be the
   same in RTL? Specifically, I'm concerned that this addition may
   restrict subsequent RTL pass optimizations.
2. Are there other concerns with implementing source-dest constraints
   within the combine pass?
3. Any other thoughts/input you have is welcome!

2022-03-10 Patrick O'Neill 

* combine.cc: Add register equality replacement.
* riscv.cc (riscv_insn_cost): Add in order to tell combine pass
  that clobber-ops are cheaper.
* riscv.h: Add c extension argument macros.

Signed-off-by: Patrick O'Neill 
---
Changelog:
v2:
 - Fix whitespace
 - Rearrange conditionals to break long lines
---
 gcc/combine.cc| 78 +++
 gcc/config/riscv/riscv.cc | 42 +
 gcc/config/riscv/riscv.h  |  7 
 3 files changed, 127 insertions(+)

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 8f06ee0e54f..1b14e802166 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -3280,6 +3280,84 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
   i2_is_used = n_occurrences;
 }
 
+/* Attempt to replace ops with clobber-ops.
+ If the target implements clobber ops (set r1 (plus (r1)(r2))) as cheaper,
+ this pattern allows the combine pass to optimize with that in mind.
+ NOTE: This conditional is not triggered in most cases. Ideally we would be
+ able to move it above the if (i2_is_used == 0), but that breaks the
+ testsuite.
+ See RFC blurb for more info.  */
+  if (!i0 && !i1 && i2 && i3 && GET_CODE(PATTERN(i2)) == SET
+  && GET_CODE(SET_DEST(PATTERN(i2))) == REG
+  && GET_CODE(PATTERN(i3)) == SET
+  && (GET_RTX_CLASS(GET_CODE(SET_SRC(PATTERN(i3)))) == RTX_BIN_ARITH
+ || GET_RTX_CLASS(GET_CODE(SET_SRC(PATTERN(i3)))) == RTX_COMM_ARITH
+ || GET_RTX_CLASS(GET_CODE(SET_SRC(PATTERN(i3)))) == RTX_UNARY)
+  && GET_CODE(SET_DEST(PATTERN(i3))) == REG
+  && REGNO(XEXP(SET_SRC(PATTERN(i3)), 0)) != REGNO(SET_DEST(PATTERN(i3)))) {
+
+rtx_code insn_class = GET_CODE(SET_SRC (PATTERN(i2)));
+
+if (GET_RTX_CLASS(insn_class) == RTX_BIN_ARITH
+   || GET_RTX_CLASS(insn_class) == RTX_COMM_ARITH
+   || GET_RTX_CLASS(insn_class) == RTX_UNARY) {
+
+  rtx operand1 = XEXP (SET_SRC (PATTERN (i2)), 0);
+  rtx prior_reg = SET_DEST (PATTERN (i2));
+
+  if (GET_CODE(operand1) == REG
+ && GET_MODE(operand1) == GET_MODE(prior_reg)
+ && find_reg_note (i2, REG_DEAD, operand1)
+ && regno_use_in (REGNO(prior_reg), PATTERN(i3))
+ && find_reg_note (i3, REG_DEAD, SET_DEST (PATTERN(i2)))) {
+
+   // Now we have a dead operand register, and we know where the dest dies.
+
+   // Remove the note declaring the register as dead
+   rtx note = find_reg_note (i2, REG_DEAD, operand1);
+   remove_note (i2, note);
+
+   // Overwrite i2 dest with operand1
+   rtx i2_dest = copy_rtx(operand1);
+   SUBST (SET_DEST (PATTERN (i2)), i2_dest);
+
+   // Replace the previous i2 dest register with operand1 in i3
+   rtx op1_copy = copy_rtx(operand1);
+   rtx new_src = simplify_replace_rtx(SET_SRC (PATTERN (i3)),
+   prior_reg, op1_copy);
+   SUBST (SET_SRC (PATTERN (i3)), new_src);
+
+   // Move the dest dead note to the new register
+   note = find_reg_note (i3, REG_DEAD, prior_reg);
+   if (note) {
+ remove_note (i3, note);
+ //add_reg_note (i3, REG_DEAD, op1_copy);
+   }
+
+   newi2pat = PATTERN (i2);
+   newpat = PATTERN (i3);
+
+

[PATCH] c++: Fix ICE with bad conversion shortcutting [PR104622]

2022-03-10 Thread Patrick Palka via Gcc-patches
When shortcutting bad conversions during overload resolution, we assume
argument conversions get computed in sequential order and that therefore
we just need to inspect the last conversion in order to determine if _any_
conversion is missing.  But this assumption turns out to be false for
templates, because during deduction check_non_deducible_conversion can
compute an argument conversion out of order.

So in the testcase below, at the end of add_template_candidate the convs
array looks like {bad, missing, good} where the last conversion was
computed during deduction and the first was computed later from
add_function_candidate.  We need to add this candidate to BAD_FNS since
not all of its argument conversions were computed, but we don't do so
because we only checked if the last argument conversion was missing.

This patch fixes this by checking for a missing conversion exhaustively.
In passing, this cleans up check_non_deducible_conversion a bit since
AFAICT the only values of strict we expect to see here are the three
enumerators of unification_kind_t.
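
The difference between the two checks can be sketched with simplified
stand-ins for the real types.  The struct names and the example data below
are hypothetical and only mirror the {bad, missing, good} situation described
above; the actual change is the missing_conversion_p function in the patch.

```cpp
#include <cstddef>

// Hypothetical stand-ins for GCC's conversion and z_candidate.
struct toy_conversion
{
  bool bad;
};

struct toy_candidate
{
  const toy_conversion *const *convs;	// null slot = conversion not computed
  std::size_t num_convs;
};

// Old heuristic: only the last slot is inspected.
constexpr bool
last_conversion_missing (const toy_candidate &cand)
{
  return cand.num_convs != 0 && cand.convs[cand.num_convs - 1] == nullptr;
}

// Exhaustive check, in the spirit of missing_conversion_p.
constexpr bool
any_conversion_missing (const toy_candidate &cand)
{
  for (std::size_t i = 0; i < cand.num_convs; ++i)
    if (cand.convs[i] == nullptr)
      return true;
  return false;
}

// Example data: {bad, missing, good}.
constexpr toy_conversion example_bad{true};
constexpr toy_conversion example_good{false};
constexpr const toy_conversion *example_slots[] = {&example_bad, nullptr,
						    &example_good};
constexpr toy_candidate example_cand{example_slots, 3};
```

On example_cand the last-slot heuristic reports nothing missing, while the
exhaustive loop finds the null middle slot.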

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/104622

gcc/cp/ChangeLog:

* call.cc (missing_conversion_p): Define.
(add_candidates): Use it.
* pt.cc (check_non_deducible_conversion): Change type of strict
parameter to unification_kind_t and directly test for DEDUCE_CALL.

gcc/testsuite/ChangeLog:

* g++.dg/template/conv18.C: New test.
---
 gcc/cp/call.cc | 13 -
 gcc/cp/pt.cc   |  6 +++---
 gcc/testsuite/g++.dg/template/conv18.C | 14 ++
 3 files changed, 29 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/template/conv18.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index d6eed5ed835..8fe8ef306ea 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -6023,6 +6023,17 @@ perfect_candidate_p (z_candidate *cand)
   return true;
 }
 
+/* True iff one of CAND's argument conversions is NULL.  */
+
+static bool
+missing_conversion_p (const z_candidate *cand)
+{
+  for (unsigned i = 0; i < cand->num_convs; ++i)
+if (!cand->convs[i])
+  return true;
+  return false;
+}
+
 /* Add each of the viable functions in FNS (a FUNCTION_DECL or
OVERLOAD) to the CANDIDATES, returning an updated list of
CANDIDATES.  The ARGS are the arguments provided to the call;
@@ -6200,7 +6211,7 @@ add_candidates (tree fns, tree first_arg, const vec<tree, va_gc> *args,
 
   if (cand->viable == -1
  && shortcut_bad_convs
- && !cand->convs[cand->reversed () ? 0 : cand->num_convs - 1])
+ && missing_conversion_p (cand))
{
  /* This candidate has been tentatively marked non-strictly viable,
 and we didn't compute all argument conversions for it (having
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index f890d92d715..715eea27577 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -152,7 +152,7 @@ static tree coerce_innermost_template_parms (tree, tree, 
tree, tsubst_flags_t,
  bool, bool);
 static void tsubst_enum(tree, tree, tree);
 static bool check_instantiated_args (tree, tree, tsubst_flags_t);
-static int check_non_deducible_conversion (tree, tree, int, int,
+static int check_non_deducible_conversion (tree, tree, unification_kind_t, int,
   struct conversion **, bool);
 static int maybe_adjust_types_for_deduction (tree, unification_kind_t,
 tree*, tree*, tree);
@@ -22304,7 +22304,7 @@ maybe_adjust_types_for_deduction (tree tparms,
unify_one_argument.  */
 
 static int
-check_non_deducible_conversion (tree parm, tree arg, int strict,
+check_non_deducible_conversion (tree parm, tree arg, unification_kind_t strict,
int flags, struct conversion **conv_p,
bool explain_p)
 {
@@ -22324,7 +22324,7 @@ check_non_deducible_conversion (tree parm, tree arg, 
int strict,
   if (can_convert_arg (type, parm, NULL_TREE, flags, complain))
return unify_success (explain_p);
 }
-  else if (strict != DEDUCE_EXACT)
+  else if (strict == DEDUCE_CALL)
 {
   bool ok = false;
   tree conv_arg = TYPE_P (arg) ? NULL_TREE : arg;
diff --git a/gcc/testsuite/g++.dg/template/conv18.C 
b/gcc/testsuite/g++.dg/template/conv18.C
new file mode 100644
index 000..f59f6fda77c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/conv18.C
@@ -0,0 +1,14 @@
+// PR c++/104622
+// { dg-additional-options "-fpermissive" }
+
+template<class T>
+struct type_identity {
+  typedef T type;
+};
+
+template<class T> void f(typename type_identity<T>::type*, T, int*);
+
+int main() {
+  const int p = 0;
+  f(&p, 0, 0); // { dg-warning "invalid conversion" }
+}
-- 
2.35.1.455.g1a4874565f



Re: [PATCH] c++: return-type-req in constraint using only outer tparms [PR104527]

2022-03-10 Thread Jason Merrill via Gcc-patches

On 2/16/22 15:56, Patrick Palka wrote:

On Tue, 15 Feb 2022, Jason Merrill wrote:


On 2/14/22 11:32, Patrick Palka wrote:

Here the template context for the atomic constraint has two levels of
template arguments, but since it depends only on the innermost argument
T we use a single-level argument vector during substitution into the
constraint (built by get_mapped_args).  We eventually pass this vector
to do_auto_deduction as part of checking the return-type-requirement
inside the atom, but do_auto_deduction expects outer_targs to be a full
set of arguments for sake of satisfaction.


Could we note the current number of levels in the map and use that in
get_mapped_args instead of the highest level parameter we happened to use?


Ah yeah, that seems to work nicely.  IIUC it should suffice to remember
whether the atomic constraint expression came from a concept definition.
If it did, then the depth of the argument vector returned by
get_mapped_args must be one, otherwise (as in the testcase) it must be
the same as the template depth of the constrained entity, which is the
depth of ARGS.
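
Stated as code, the depth rule from the paragraph above is roughly the
following.  This is only a sketch with a made-up function name; the real
logic is in get_mapped_args and works on trees, not plain integers.

```cpp
// Sketch of the depth rule: an atom written inside a concept definition gets
// a depth-one argument vector, anything else gets the depth of ARGS.
constexpr int
mapped_args_depth (bool atom_from_concept, int args_depth)
{
  return atom_from_concept ? 1 : args_depth;
}
```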

How does the following look?  Bootstrapped and regtested on
x86_64-pc-linux-gnu and also on cmcstl2 and range-v3.

-- >8 --

Subject: [PATCH] c++: return-type-req in constraint using only outer tparms
  [PR104527]

Here the template context for the atomic constraint has two levels of
template parameters, but since it depends only on the innermost parameter
T we use a single-level argument vector (built by get_mapped_args) during
substitution into the atom.  We eventually pass this vector to
do_auto_deduction as part of checking the return-type-requirement within
the atom, but do_auto_deduction expects outer_targs to be a full set of
arguments for sake of satisfaction.

This patch fixes this by making get_mapped_args always return an
argument vector whose depth corresponds to the template depth of the
context in which the atomic constraint expression was written, instead
of the highest parameter level that the expression happens to use.

PR c++/104527

gcc/cp/ChangeLog:

* constraint.cc (normalize_atom): Set
ATOMIC_CONSTR_EXPR_FROM_CONCEPT_P appropriately.
(get_mapped_args):  Make static, adjust parameters.  Always
return a vector whose depth corresponds to the template depth of
the context of the atomic constraint expression.  Micro-optimize
by passing false as exact to safe_grow_cleared and by collapsing
a multi-level depth-one argument vector.
(satisfy_atom): Adjust call to get_mapped_args and
diagnose_atomic_constraint.
(diagnose_atomic_constraint): Replace map parameter with an args
parameter.
* cp-tree.h (ATOMIC_CONSTR_EXPR_FROM_CONCEPT_P): Define.
(get_mapped_args): Remove declaration.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-return-req4.C: New test.
---
  gcc/cp/constraint.cc  | 64 +++
  gcc/cp/cp-tree.h  |  7 +-
  .../g++.dg/cpp2a/concepts-return-req4.C   | 24 +++
  3 files changed, 69 insertions(+), 26 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-return-req4.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 12db7e5cf14..306e28955c6 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -764,6 +764,8 @@ normalize_atom (tree t, tree args, norm_info info)
tree ci = build_tree_list (t, info.context);
  
tree atom = build1 (ATOMIC_CONSTR, ci, map);

+  if (info.in_decl && concept_definition_p (info.in_decl))
+ATOMIC_CONSTR_EXPR_FROM_CONCEPT_P (atom) = true;


I'm a bit nervous about relying on in_decl, given that we support 
normalizing when it isn't set; I don't remember the circumstances for 
that.  Maybe make the flag indicate that ctx_parms had depth 1?



if (!info.generate_diagnostics ())
  {
/* Cache the ATOMIC_CONSTRs that we return, so that sat_hasher::equal
@@ -2826,33 +2828,37 @@ satisfaction_value (tree t)
  return boolean_true_node;
  }
  
-/* Build a new template argument list with template arguments corresponding

-   to the parameters used in an atomic constraint.  */
+/* Build a new template argument vector according to the parameter
+   mapping of the atomic constraint T, using arguments from ARGS.  */
  
-tree

-get_mapped_args (tree map)
+static tree
+get_mapped_args (tree t, tree args)
  {
+  tree map = ATOMIC_CONSTR_MAP (t);
+
/* No map, no arguments.  */
if (!map)
  return NULL_TREE;
  
-  /* Find the mapped parameter with the highest level.  */

-  int count = 0;
-  for (tree p = map; p; p = TREE_CHAIN (p))
-{
-  int level;
-  int index;
-  template_parm_level_and_index (TREE_VALUE (p), &level, &index);
-  if (level > count)
-count = level;
-}
+  /* Determine the depth of the resulting argument vector.  */
+  int depth;
+  if (ATOMIC_CONSTR_EXPR_FROM_CONCEPT_P (t))
+

Re: [PATCH, V2] Eliminate power8 fusion options, use power8 tuning, PR target/102059

2022-03-10 Thread Segher Boessenkool
On Thu, Mar 10, 2022 at 10:44:52AM -0600, will schmidt wrote:
> On Wed, 2022-03-09 at 22:49 -0500, Michael Meissner wrote:
> > --- a/gcc/config/rs6000/rs6000-cpus.def
> > +++ b/gcc/config/rs6000/rs6000-cpus.def
> > @@ -43,9 +43,7 @@
> >  | OPTION_MASK_ALTIVEC  \
> >  | OPTION_MASK_VSX)
> > 
> > -/* For now, don't provide an embedded version of ISA 2.07.  Do not set 
> > power8
> > -   fusion here, instead set it in rs6000.cc if we are tuning for a power8
> > -   system.  */
> > +/* For now, don't provide an embedded version of ISA 2.07.  */
> 
> ok.  (as far as removing the comment, I'm not clear what the remaining
> comment is telling me, but that's outside of the scope of this patch).

It is saying there is nothing that implements Book III-E of ISA 2.07
(nothing in GCC, but no actual CPU either).  Or Category: Embedded even
maybe :-)

It could be clearer perhaps, or just be removed completely; it might
have been useful historically, but it isn't anymore really.


Segher


Re: [PATCH, V2] Eliminate power8 fusion options, use power8 tuning, PR target/102059

2022-03-10 Thread will schmidt via Gcc-patches
On Thu, 2022-03-10 at 13:49 -0600, Segher Boessenkool wrote:
> On Thu, Mar 10, 2022 at 10:44:52AM -0600, will schmidt wrote:
> > On Wed, 2022-03-09 at 22:49 -0500, Michael Meissner wrote:
> > > --- a/gcc/config/rs6000/rs6000-cpus.def
> > > +++ b/gcc/config/rs6000/rs6000-cpus.def
> > > @@ -43,9 +43,7 @@
> > >| OPTION_MASK_ALTIVEC  
> > > \
> > >| OPTION_MASK_VSX)
> > > 
> > > -/* For now, don't provide an embedded version of ISA 2.07.  Do
> > > not set power8
> > > -   fusion here, instead set it in rs6000.cc if we are tuning for
> > > a power8
> > > -   system.  */
> > > +/* For now, don't provide an embedded version of ISA 2.07.  */
> > 
> > ok.  (as far as removing the comment, I'm not clear what the
> > remaining
> > comment is telling me, but that's outside of the scope of this
> > patch).
> 
> It is saying there is nothing that implements Book III-E of ISA 2.07
> (nothing in GCC, but no actual CPU either).  Or Category: Embedded
> even
> maybe :-)

Lol, Ok.  The small-e in embedded did not clue me in that this was
referring to the big-E Embedded category.  :-)

> It could be clearer perhaps, or just be removed completely; it might
> have been useful historically, but it isn't anymore really.


Thanks,
-Will

> 
> 
> Segher



Re: [PATCH] c++: improve location of fold expressions

2022-03-10 Thread Jason Merrill via Gcc-patches

On 3/1/22 00:14, Patrick Palka wrote:

This improves diagnostic quality for unsatisfied atomic constraints
that consist of a fold expression, e.g. in concepts/diagnostics3.C:

   .../diagnostic3.C:10:22: note: the expression ‘(foo && ...) [with Ts = 
{int, char}]’ evaluated to ‘false’
  10 | requires (foo && ...)
 |  ^~~~

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.


gcc/cp/ChangeLog:

* semantics.cc (finish_unary_fold_expr): Use input_location
instead of UNKNOWN_LOCATION.
(finish_binary_fold_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/concepts/diagnostic3.C: Adjusted expected location of
"evaluated to false" diagnostics.
---
  gcc/cp/semantics.cc | 4 ++--
  gcc/testsuite/g++.dg/concepts/diagnostic3.C | 8 
  2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index a2c0eb050e6..07cae993efe 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12185,7 +12185,7 @@ finish_unary_fold_expr (tree expr, int op, tree_code 
dir)
  
/* Build the fold expression.  */

tree code = build_int_cstu (integer_type_node, abs (op));
-  tree fold = build_min_nt_loc (UNKNOWN_LOCATION, dir, code, pack);
+  tree fold = build_min_nt_loc (input_location, dir, code, pack);
FOLD_EXPR_MODIFY_P (fold) = (op < 0);
TREE_TYPE (fold) = build_dependent_operator_type (NULL_TREE,
FOLD_EXPR_OP (fold),
@@ -12214,7 +12214,7 @@ finish_binary_fold_expr (tree pack, tree init, int op, 
tree_code dir)
  {
pack = make_pack_expansion (pack);
tree code = build_int_cstu (integer_type_node, abs (op));
-  tree fold = build_min_nt_loc (UNKNOWN_LOCATION, dir, code, pack, init);
+  tree fold = build_min_nt_loc (input_location, dir, code, pack, init);
FOLD_EXPR_MODIFY_P (fold) = (op < 0);
TREE_TYPE (fold) = build_dependent_operator_type (NULL_TREE,
FOLD_EXPR_OP (fold),
diff --git a/gcc/testsuite/g++.dg/concepts/diagnostic3.C 
b/gcc/testsuite/g++.dg/concepts/diagnostic3.C
index 7796e264251..410651a9c1a 100644
--- a/gcc/testsuite/g++.dg/concepts/diagnostic3.C
+++ b/gcc/testsuite/g++.dg/concepts/diagnostic3.C
@@ -7,18 +7,18 @@ template
concept foo = (bool)(foo_v | foo_v);
  
  template

-requires (foo && ...)
+requires (foo && ...) // { dg-message "with Ts = .int, char... evaluated to 
.false." }
  void
-bar() // { dg-message "with Ts = .int, char... evaluated to .false." }
+bar()
  { }
  
  template

  struct S { };
  
  template

-requires (foo> && ...)
+requires (foo> && ...) // { dg-message "with Is = .2, 3, 4... evaluated to 
.false." }
  void
-baz() // { dg-message "with Is = .2, 3, 4... evaluated to .false." }
+baz()
  { }
  
  void




Re: [PATCH] c++: Fix ICE with non-constant satisfaction [PR98644]

2022-03-10 Thread Jason Merrill via Gcc-patches

On 3/1/22 00:10, Patrick Palka wrote:

On Tue, 19 Jan 2021, Jason Merrill wrote:


On 1/13/21 12:05 PM, Patrick Palka wrote:

In the below testcase, the expression of the atomic constraint after
substitution is (int *) NON_LVALUE_EXPR <1> != 0B which is not a C++
constant expression, but its TREE_CONSTANT flag is set (from build2),
so satisfy_atom fails to notice that it's non-constant (and we end
up tripping over the assert in satisfaction_value).

Since TREE_CONSTANT doesn't necessarily correspond to C++ constantness,
this patch makes satisfy_atom instead check is_rvalue_constant_expression.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/10?

gcc/cp/ChangeLog:

PR c++/98644
* constraint.cc (satisfy_atom): Check is_rvalue_constant_expression
instead of TREE_CONSTANT.

gcc/testsuite/ChangeLog:

PR c++/98644
* g++.dg/cpp2a/concepts-pr98644.C: New test.
---
   gcc/cp/constraint.cc  | 2 +-
   gcc/testsuite/g++.dg/cpp2a/concepts-pr98644.C | 7 +++
   2 files changed, 8 insertions(+), 1 deletion(-)
   create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-pr98644.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 9049d087859..f99a25dc8a4 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -2969,7 +2969,7 @@ satisfy_atom (tree t, tree args, sat_info info)
   {
 result = maybe_constant_value (result, NULL_TREE,
 /*manifestly_const_eval=*/true);
-  if (!TREE_CONSTANT (result))


This should be sufficient.  If the result isn't constant, maybe_constant_value
shouldn't return it with TREE_CONSTANT set.  See


   /* This isn't actually constant, so unset TREE_CONSTANT.


in cxx_eval_outermost_constant_expr.


I see, so the problem seems to be that the fail-fast path of
maybe_constant_value isn't clearing TREE_CONSTANT sufficiently.  Would
it make sense to fix this like so?

-- >8 --

Subject: [PATCH] c++: ICE with non-constant satisfaction value [PR98644]

Here during satisfaction the expression of the atomic constraint after
substitution is (int *) NON_LVALUE_EXPR <1> != 0B, which is not a C++
constant expression due to the reinterpret_cast, but TREE_CONSTANT is
set since its value is otherwise effectively constant.  We then call
maybe_constant_value on it, which proceeds via its fail-fast path to
exit early without clearing TREE_CONSTANT.  But satisfy_atom relies
on checking TREE_CONSTANT of the result of maybe_constant_value in order
to detect non-constant satisfaction.

This patch fixes this by making the fail-fast path of maybe_constant_value
clear TREE_CONSTANT in this case, like cxx_eval_outermost_constant_expr
in the normal path would have done.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/98644

gcc/cp/ChangeLog:

* constexpr.cc (maybe_constant_value): In the fail-fast path,
clear TREE_CONSTANT on the result if it's set on the input.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-pr98644.C: New test.
* g++.dg/parse/array-size2.C: Remove expected diagnostic about a
narrowing conversion.
---
  gcc/cp/constexpr.cc   | 4 +++-
  gcc/testsuite/g++.dg/cpp2a/concepts-pr98644.C | 7 +++
  gcc/testsuite/g++.dg/parse/array-size2.C  | 2 --
  3 files changed, 10 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-pr98644.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 4716694cb71..234cf0acc26 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -7965,8 +7965,10 @@ maybe_constant_value (tree t, tree decl, bool 
manifestly_const_eval)
  
if (!is_nondependent_constant_expression (t))

  {
-  if (TREE_OVERFLOW_P (t))
+  if (TREE_OVERFLOW_P (t)
+ || (!processing_template_decl && TREE_CONSTANT (t)))
{
+ /* This isn't actually constant, so unset TREE_CONSTANT.  */
  t = build_nop (TREE_TYPE (t), t);


build_nop isn't appropriate for arbitrary expressions (classes, in 
particular).  We probably want to factor out the code in 
cxx_eval_outermost_constant_expr under the "this isn't actually 
constant" comment.



  TREE_CONSTANT (t) = false;
}
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-pr98644.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-pr98644.C
new file mode 100644
index 000..6772f72a3ce
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-pr98644.C
@@ -0,0 +1,7 @@
+// PR c++/98644
+// { dg-do compile { target c++20 } }
+
+template concept Signed = bool(T(1)); // { dg-error 
"reinterpret_cast" }
+static_assert(Signed); // { dg-error "non-constant" }
+
+constexpr bool B = requires { requires bool((char *)1); }; // { dg-error 
"reinterpret_cast" }
diff --git a/gcc/testsuite/g++.dg/parse/array-size2.C 
b/gcc/testsuite/g++.dg/parse/array-size2.C
index c4a69df3b01..e58fe266e77 100644
--- a/gcc/

Re: [PATCH RFC] mips: add TARGET_ZERO_CALL_USED_REGS hook [PR104817, PR104820]

2022-03-10 Thread Qing Zhao via Gcc-patches


> On Mar 9, 2022, at 12:25 PM, Richard Sandiford via Gcc-patches 
>  wrote:
> 
> Xi Ruoyao  writes:
>> Bootstrapped and regtested on mips64el-linux-gnuabi64.
>> 
>> I'm not sure if it's "correct" to clobber other registers during the
>> zeroing of scratch registers.  But I can't really come up with a better
>> idea: on MIPS there is no simple way to clear one bit in FCSR (i. e.
>> FCC[x]).  We can't just use "c.f.s $fccx,$f0,$f0" because it will raise
>> an exception if $f0 contains a sNaN.
> 
> Yeah, it's a bit of a grey area, but I think it should be fine, provided
> that the extra clobbers are never used as return registers (which is
> obviously true for the FCC registers).
> 
> But on that basis…
> 
>> +static HARD_REG_SET
>> +mips_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
>> +{
>> +  HARD_REG_SET zeroed_hardregs;
>> +  CLEAR_HARD_REG_SET (zeroed_hardregs);
>> +
>> +  if (TEST_HARD_REG_BIT (need_zeroed_hardregs, HI_REGNUM))
>> +{
>> +  /* Clear HI and LO altogether.  MIPS target treats HILO as a
>> + double-word register.  */
>> +  machine_mode dword_mode = TARGET_64BIT ? TImode : DImode;
>> +  rtx hilo = gen_rtx_REG (dword_mode, MD_REG_FIRST);
>> +  rtx zero = CONST0_RTX (dword_mode);
>> +  emit_move_insn (hilo, zero);
>> +
>> +  SET_HARD_REG_BIT (zeroed_hardregs, HI_REGNUM);
>> +  if (TEST_HARD_REG_BIT (need_zeroed_hardregs, LO_REGNUM))
>> +SET_HARD_REG_BIT (zeroed_hardregs, LO_REGNUM);
>> +  else
>> +emit_clobber (gen_rtx_REG (word_mode, LO_REGNUM));
> 
> …I don't think this conditional LO_REGNUM code is worth it.
> We might as well just add both registers to zeroed_hardregs.

If LO_REGNUM is NOT in “need_zeroed_hardregs”, adding it to
“zeroed_hardregs” does not seem right to me.
What do you mean by “not worth it”?

> 
>> +}
>> +
>> +  bool zero_fcc = false;
>> +  for (int i = ST_REG_FIRST; i <= ST_REG_LAST; i++)
>> +if (TEST_HARD_REG_BIT (need_zeroed_hardregs, i))
>> +  zero_fcc = true;
>> +
>> +  /* MIPS does not have a simple way to clear one bit in FCC.  We just
>> + clear FCC with ctc1 and clobber all FCC bits.  */
>> +  if (zero_fcc)
>> +{
>> +  emit_insn (gen_mips_zero_fcc ());
>> +  for (int i = ST_REG_FIRST; i <= ST_REG_LAST; i++)
>> +if (TEST_HARD_REG_BIT (need_zeroed_hardregs, i))
>> +  SET_HARD_REG_BIT (zeroed_hardregs, i);
>> +else
>> +  emit_clobber (gen_rtx_REG (CCmode, i));
>> +}
> 
> Here too I think we should just do:
> 
>  zeroed_hardregs |= reg_class_contents[ST_REGS] & accessible_reg_set;
> 
> to include all available FCC registers.

What’s the relationship between “ST_REGS” and FCC? (Sorry for the stupid 
question; I am not familiar with the MIPS register set.)

From the above code, it looks like FCC needs to be cleared whenever any of 
the “ST_REGS” registers is in “need_zeroed_hardregs”?

thanks.

Qing


> 
>> +
>> +  need_zeroed_hardregs &= ~zeroed_hardregs;
>> +  return zeroed_hardregs |
>> + default_zero_call_used_regs (need_zeroed_hardregs);
> 
> Nit, but: should be formatted as:
> 
>  return (zeroed_hardregs
> | default_zero_call_used_regs (need_zeroed_hardregs));
> 
>> +}
>> +
>> 
>> /* Initialize the GCC target structure.  */
>> #undef TARGET_ASM_ALIGNED_HI_OP
>> @@ -22919,6 +22964,8 @@ mips_asm_file_end (void)
>> #undef TARGET_ASM_FILE_END
>> #define TARGET_ASM_FILE_END mips_asm_file_end
>> 
>> +#undef TARGET_ZERO_CALL_USED_REGS
>> +#define TARGET_ZERO_CALL_USED_REGS mips_zero_call_used_regs
>> 
>> struct gcc_target targetm = TARGET_INITIALIZER;
>> 
>> diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
>> index e0f0a582732..edf58710cdd 100644
>> --- a/gcc/config/mips/mips.md
>> +++ b/gcc/config/mips/mips.md
>> @@ -96,6 +96,7 @@ (define_c_enum "unspec" [
>>   ;; Floating-point environment.
>>   UNSPEC_GET_FCSR
>>   UNSPEC_SET_FCSR
>> +  UNSPEC_ZERO_FCC
>> 
>>   ;; HI/LO moves.
>>   UNSPEC_MFHI
>> @@ -7670,6 +7671,11 @@ (define_insn "*mips_set_fcsr"
>>   "TARGET_HARD_FLOAT"
>>   "ctc1\t%0,$31")
>> 
>> +(define_insn "mips_zero_fcc"
>> +  [(unspec_volatile [(const_int 0)] UNSPEC_ZERO_FCC)]
>> +  "TARGET_HARD_FLOAT"
>> +  "ctc1\t$0,$25")
> 
> I've forgotten a lot of MIPS stuff, so: does this clear only the
> FCC registers, or does it clear other things (such as exception bits)
> as well?  Does it work even for !ISA_HAS_8CC?
> 
> I think this pattern should explicitly clear all eight registers, e.g. using:
> 
>  (set (reg:CC FCC0_REGNUM) (const_int 0))
>  (set (reg:CC FCC1_REGNUM) (const_int 0))
>  …
> 
> which unfortunately means defining 8 new register constants in mips.md.
> I guess for extra safety there should be a separate !ISA_HAS_8CC version
> that only sets FCC0_REGNUM.
> 
> An alternative would be to avoid clearing the FCC registers altogether.
> I suppose that's less secure, but residual information could leak through
> the exception bits as well, and it isn't clear whether those should be
> zeroed at the end of each function.  I guess it d

Re: [PATCH, V2] Eliminate power8 fusion options, use power8 tuning, PR target/102059

2022-03-10 Thread Michael Meissner via Gcc-patches
On Thu, Mar 10, 2022 at 01:49:36PM -0600, Segher Boessenkool wrote:
> On Thu, Mar 10, 2022 at 10:44:52AM -0600, will schmidt wrote:
> > On Wed, 2022-03-09 at 22:49 -0500, Michael Meissner wrote:
> > > --- a/gcc/config/rs6000/rs6000-cpus.def
> > > +++ b/gcc/config/rs6000/rs6000-cpus.def
> > > @@ -43,9 +43,7 @@
> > >| OPTION_MASK_ALTIVEC  \
> > >| OPTION_MASK_VSX)
> > > 
> > > -/* For now, don't provide an embedded version of ISA 2.07.  Do not set power8
> > > -   fusion here, instead set it in rs6000.cc if we are tuning for a power8
> > > -   system.  */
> > > +/* For now, don't provide an embedded version of ISA 2.07.  */
> > 
> > ok.  (as far as removing the comment, I'm not clear what the remaining
> > comment is telling me, but that's outside of the scope of this patch).
> 
> It is saying there is nothing that implements Book III-E of ISA 2.07
> (nothing in GCC, but no actual CPU either).  Or Category: Embedded even
> maybe :-)
> 
> It could be clearer perhaps, or just be removed completely; it might
> have been useful historically, but it isn't anymore really.

At the time it was written, there were possibilities of power8 that weren't
server class (i.e. with VSX).  But yeah, now it might be useful to delete it.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH] rs6000, v3: Fix up __SIZEOF_{FLOAT, IBM}128__ defines [PR99708]

2022-03-10 Thread Michael Meissner via Gcc-patches
On Wed, Mar 09, 2022 at 04:57:01PM -0600, Segher Boessenkool wrote:
> On Wed, Mar 09, 2022 at 10:10:07PM +0100, Jakub Jelinek wrote:
> > On Wed, Mar 09, 2022 at 02:57:20PM -0600, Segher Boessenkool wrote:
> > > But __ibm128 should *always* be supported, so this is a hypothetical
> > > problem.
> > 
> > I bet that will require much more work.  I think for the barely supported
> > (or really unsupported?) case of old sysv IEEE quad
> 
> The "q" library routines are double-double.  On RIOS2 (POWER2) there
> were "quad" instructions that worked on a pair of FP regs, but that was
> handled as a pair of FP regs, and since 2012 we do not support POWER2
> anymore anyway.
> 
> I have no clue if and when the "q_" library routines are used.  They do
> take KFmode params (or TFmode if we use double-double preferably).
> 
> Or are you thinking of something else still?

Probably libquadmath (which we still build).  Libquadmath defines all of the
functions with a 'q' suffix.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH] rs6000: Fix up __SIZEOF_{FLOAT,IBM}128__ defines [PR99708]

2022-03-10 Thread Michael Meissner via Gcc-patches
On Mon, Mar 07, 2022 at 03:37:18PM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Sat, Mar 05, 2022 at 09:21:51AM +0100, Jakub Jelinek wrote:
> > As mentioned in the PR, right now on powerpc* __SIZEOF_{FLOAT,IBM}128__
> > macros are predefined unconditionally, because {ieee,ibm}128_float_type_node
> > is always non-NULL, doesn't reflect whether __ieee128 or __ibm128 are
> > actually supported or not.
> > 
> > The following patch:
> > 1) makes those {ieee,ibm}128_float_type_node trees NULL if we don't
> >really support them instead of equal to long_double_type_node
> > 2) adjusts the builtins code to use
> >ibm128_float_type_node ? ibm128_float_type_node : long_double_type_node
> >for the 2 builtins, so that we don't ICE during builtins creation
> >if ibm128_float_type_node is NULL (using error_mark_node instead of
> >long_double_type_node sounds more risky to me)
> 
> I feel the opposite way: (potentially) using the wrong thing is just a
> ticking time bomb, never "safer".
> 
> > 3) in rs6000_type_string will not match NULL as __ibm128, and adds
> >a __ieee128 case
> > 4) removes the clearly unused ptr_{ieee,ibm}128_float_type_node trees;
> >if something needs them in the future, they can be easily added back,
> >but wasting GTY just in case...
> > 5) actually syncs __SIZEOF_FLOAT128__ with the __float128 macro being
> >defined in addition to __ieee128 being usable
> > 
> > Now, in the PR Segher said he doesn't like 5), but I think it is better
> > to match the reality and get this PR fixed and if we want to rethink
> > how __float128 is defined (whether it is a macro, or perhaps another
> > builtin type like __ieee128 which could be easily done by
> >lang_hooks.types.register_builtin_type (ieee128_float_type_node,
> >   "__ieee128");
> >lang_hooks.types.register_builtin_type (ieee128_float_type_node,
> >   "__float128");
> > perhaps under some conditions, rethink if the -mfloat128 option exists
> > and what it does etc., it can be done incrementally and the rs6000-c.cc
> > hunks in the patch can be easily reverted (apart from the formatting
> > fix).
> 
> There needs to be a __SIZEOF_IEEE128__ as well, if you like reality :-)
> Sorry I did not pick up on that earlier.

No, no, no.

The '__ieee128' keyword was used as a way to define the type but not enable the
'__float128' keyword.  Then rs6000-c.cc defines __float128 to be __ieee128,
similar to defining 'vector' and '__vector' to be
'__attribute__((altivec(vector__)))'.

Unfortunately, there is no way to remove a keyword after the creation (or at
least there wasn't in the GCC 8 time frame when I wrote the code), and you need
to create the types at GCC startup to set up the built-ins.

No one should be using '__ieee128'.  The official keywords are '__float128' and
for C (not C++) '_Float128'.

Perhaps in GCC 13 it is time to just remove it and always just define
'__float128' instead.
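
For user code, the practical upshot is to spell the type '__float128' and
to guard its use with the predefined size macro.  A minimal sketch of that,
assuming only that the compiler predefines __SIZEOF_FLOAT128__ when the
type is usable; the function name float128_size is made up for
illustration:

```cpp
#include <cstddef>

// Hypothetical helper: reports sizeof (__float128) when the compiler
// advertises the type through __SIZEOF_FLOAT128__, and 0 otherwise.
constexpr std::size_t float128_size ()
{
#ifdef __SIZEOF_FLOAT128__
  // The macro and the type are expected to agree whenever both exist.
  static_assert (sizeof (__float128) == __SIZEOF_FLOAT128__,
                 "__SIZEOF_FLOAT128__ disagrees with sizeof (__float128)");
  return sizeof (__float128);
#else
  return 0;
#endif
}
```

If the macro were predefined while '__float128' is not actually usable, the
guarded branch would fail to compile, which is the mismatch the PR is about.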

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH, V2] Eliminate power8 fusion options, use power8 tuning, PR target/102059

2022-03-10 Thread Michael Meissner via Gcc-patches
On Thu, Mar 10, 2022 at 01:49:36PM -0600, Segher Boessenkool wrote:
> On Thu, Mar 10, 2022 at 10:44:52AM -0600, will schmidt wrote:
> > On Wed, 2022-03-09 at 22:49 -0500, Michael Meissner wrote:
> > > --- a/gcc/config/rs6000/rs6000-cpus.def
> > > +++ b/gcc/config/rs6000/rs6000-cpus.def
> > > @@ -43,9 +43,7 @@
> > >| OPTION_MASK_ALTIVEC  \
> > >| OPTION_MASK_VSX)
> > > 
> > > -/* For now, don't provide an embedded version of ISA 2.07.  Do not set power8
> > > -   fusion here, instead set it in rs6000.cc if we are tuning for a power8
> > > -   system.  */
> > > +/* For now, don't provide an embedded version of ISA 2.07.  */
> > 
> > ok.  (as far as removing the comment, I'm not clear what the remaining
> > comment is telling me, but that's outside of the scope of this patch).
> 
> It is saying there is nothing that implements Book III-E of ISA 2.07
> (nothing in GCC, but no actual CPU either).  Or Category: Embedded even
> maybe :-)
> 
> It could be clearer perhaps, or just be removed completely; it might
> have been useful historically, but it isn't anymore really.

Other than possibly removing the comment, are there other things about the
patch that need to be done?

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH] c++: return-type-req in constraint using only outer tparms [PR104527]

2022-03-10 Thread Patrick Palka via Gcc-patches


On Thu, 10 Mar 2022, Jason Merrill wrote:

> On 2/16/22 15:56, Patrick Palka wrote:
> > On Tue, 15 Feb 2022, Jason Merrill wrote:
> > 
> > > On 2/14/22 11:32, Patrick Palka wrote:
> > > > Here the template context for the atomic constraint has two levels of
> > > > template arguments, but since it depends only on the innermost argument
> > > > T we use a single-level argument vector during substitution into the
> > > > constraint (built by get_mapped_args).  We eventually pass this vector
> > > > to do_auto_deduction as part of checking the return-type-requirement
> > > > inside the atom, but do_auto_deduction expects outer_targs to be a full
> > > > set of arguments for sake of satisfaction.
> > > 
> > > Could we note the current number of levels in the map and use that in
> > > get_mapped_args instead of the highest level parameter we happened to use?
> > 
> > Ah yeah, that seems to work nicely.  IIUC it should suffice to remember
> > whether the atomic constraint expression came from a concept definition.
> > If it did, then the depth of the argument vector returned by
> > get_mapped_args must be one, otherwise (as in the testcase) it must be
> > the same as the template depth of the constrained entity, which is the
> > depth of ARGS.
> > 
> > How does the following look?  Bootstrapped and regtested on
> > x86_64-pc-linux-gnu and also on cmcstl2 and range-v3.
> > 
> > -- >8 --
> > 
> > Subject: [PATCH] c++: return-type-req in constraint using only outer tparms
> >   [PR104527]
> > 
> > Here the template context for the atomic constraint has two levels of
> > template parameters, but since it depends only on the innermost parameter
> > T we use a single-level argument vector (built by get_mapped_args) during
> > substitution into the atom.  We eventually pass this vector to
> > do_auto_deduction as part of checking the return-type-requirement within
> > the atom, but do_auto_deduction expects outer_targs to be a full set of
> > arguments for sake of satisfaction.
> > 
> > This patch fixes this by making get_mapped_args always return an
> > argument vector whose depth corresponds to the template depth of the
> > context in which the atomic constraint expression was written, instead
> > of the highest parameter level that the expression happens to use.
> > 
> > PR c++/104527
> > 
> > gcc/cp/ChangeLog:
> > 
> > * constraint.cc (normalize_atom): Set
> > ATOMIC_CONSTR_EXPR_FROM_CONCEPT_P appropriately.
> > (get_mapped_args):  Make static, adjust parameters.  Always
> > return a vector whose depth corresponds to the template depth of
> > the context of the atomic constraint expression.  Micro-optimize
> > by passing false as exact to safe_grow_cleared and by collapsing
> > a multi-level depth-one argument vector.
> > (satisfy_atom): Adjust call to get_mapped_args and
> > diagnose_atomic_constraint.
> > (diagnose_atomic_constraint): Replace map parameter with an args
> > parameter.
> > * cp-tree.h (ATOMIC_CONSTR_EXPR_FROM_CONCEPT_P): Define.
> > (get_mapped_args): Remove declaration.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp2a/concepts-return-req4.C: New test.
> > ---
> >   gcc/cp/constraint.cc  | 64 +++
> >   gcc/cp/cp-tree.h  |  7 +-
> >   .../g++.dg/cpp2a/concepts-return-req4.C   | 24 +++
> >   3 files changed, 69 insertions(+), 26 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-return-req4.C
> > 
> > diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
> > index 12db7e5cf14..306e28955c6 100644
> > --- a/gcc/cp/constraint.cc
> > +++ b/gcc/cp/constraint.cc
> > @@ -764,6 +764,8 @@ normalize_atom (tree t, tree args, norm_info info)
> > tree ci = build_tree_list (t, info.context);
> >   tree atom = build1 (ATOMIC_CONSTR, ci, map);
> > +  if (info.in_decl && concept_definition_p (info.in_decl))
> > +ATOMIC_CONSTR_EXPR_FROM_CONCEPT_P (atom) = true;
> 
> I'm a bit nervous about relying on in_decl, given that we support normalizing
> when it isn't set; I don't remember the circumstances for that.  Maybe make
> the flag indicate that ctx_parms had depth 1?

in_decl gets reliably updated by norm_info::update_context whenever we
recurse inside a concept-id during normalization.  And I think the only
other situation we have to worry about is when starting out with a
concept-id, which is handled by normalize_concept_definition where we
also set in_decl appropriately.

AFAICT, in_decl is not set (at the start) only when normalizing a
placeholder type constraint or nested-requirement, and from some
subsumption entrypoints.  And we shouldn't see an atom that belongs to a
concept in these cases unless we recurse into a concept-id, in which
case norm_info::update_context will update in_decl appropriately.

So IMHO it should be safe to rely on in_decl here to detect if the atom
belongs to a concept, at least given the current entrypoint

[committed] libstdc++: Move closing brace outside #endif [PR104866]

2022-03-10 Thread Jonathan Wakely via Gcc-patches
From: Detlef Vollmann 

Tested x86_64-linux, pushed to trunk.
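
As a sketch of the bug class described in the ChangeLog entry below: a
namespace that is opened unconditionally has to be closed outside the
conditional block too, otherwise it stays open whenever the block is
compiled out.  All names here (DEMO_HAS_SLEEP, demo, after_this_thread)
are made up for illustration and are not the libstdc++ ones:

```cpp
// Hypothetical stand-in for the configuration macro; set it to 0 to
// simulate a target without sleep support.
#define DEMO_HAS_SLEEP 1

namespace demo
{
  namespace this_thread
  {
    // Always available, regardless of configuration.
    inline int get_id () { return 1; }

#if DEMO_HAS_SLEEP
    // Optional part, only present when sleeping is supported.
    inline int sleep_for (int ticks) { return ticks; }
#endif // DEMO_HAS_SLEEP
  } // namespace this_thread, closed outside the #if so it always closes

  // Lands in demo, not in demo::this_thread, in both configurations.
  inline int after_this_thread () { return 42; }
} // namespace demo
```

With the closing brace placed before the #endif instead, setting
DEMO_HAS_SLEEP to 0 would leave this_thread open and the braces unbalanced.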

-- >8--

Author: Detlef Vollmann 

libstdc++-v3/ChangeLog:

PR libstdc++/104866
* include/bits/this_thread_sleep.h: Fix order of #endif and
closing brace of namespace.
---
 libstdc++-v3/include/bits/this_thread_sleep.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/this_thread_sleep.h b/libstdc++-v3/include/bits/this_thread_sleep.h
index 86bc6ffd632..712de5a6ff9 100644
--- a/libstdc++-v3/include/bits/this_thread_sleep.h
+++ b/libstdc++-v3/include/bits/this_thread_sleep.h
@@ -105,8 +105,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
__now = _Clock::now();
  }
   }
-  } // namespace this_thread
 #endif // ! NO_SLEEP
+  } // namespace this_thread
 
   /// @}
 
-- 
2.34.1



[committed] [PR103074] LRA: Check new conflicts when splitting hard reg live range

2022-03-10 Thread Vladimir Makarov via Gcc-patches

The following patch solves

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103074

The patch was successfully bootstrapped and tested on x86-64 and aarch64.

commit d8e5fff6b74b82c2ac3254be9a1f0fb6b30dbdbf
Author: Vladimir N. Makarov 
Date:   Thu Mar 10 16:16:00 2022 -0500

[PR103074] LRA: Check new conflicts when splitting hard reg live range.

Splitting hard register live range can create (artificial)
conflict of the hard register with another pseudo because of simplified
conflict calculation in LRA.  We should check such conflict on the next
assignment sub-pass and spill and reassign the pseudo if necessary.
The patch implements this.

gcc/ChangeLog:

PR target/103074
* lra-constraints.cc (split_reg): Set up
check_and_force_assignment_correctness_p when splitting hard
register live range.

gcc/testsuite/ChangeLog:

PR target/103074
* gcc.target/i386/pr103074.c: New.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 080b44ad87a..d92ab76908c 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -5994,12 +5994,17 @@ split_reg (bool before_p, int original_regno, rtx_insn *insn,
 			 before_p ? NULL : save,
 			 call_save_p
 			 ?  "Add save<-reg" : "Add split<-reg");
-  if (nregs > 1)
+  if (nregs > 1 || original_regno < FIRST_PSEUDO_REGISTER)
 /* If we are trying to split multi-register.  We should check
conflicts on the next assignment sub-pass.  IRA can allocate on
sub-register levels, LRA do this on pseudos level right now and
this discrepancy may create allocation conflicts after
-   splitting.  */
+   splitting.
+
+   If we are trying to split hard register we should also check conflicts
+   as such splitting can create artificial conflict of the hard register
+   with another pseudo because of simplified conflict calculation in
+   LRA.  */
 check_and_force_assignment_correctness_p = true;
   if (lra_dump_file != NULL)
 fprintf (lra_dump_file,
diff --git a/gcc/testsuite/gcc.target/i386/pr103074.c b/gcc/testsuite/gcc.target/i386/pr103074.c
new file mode 100644
index 000..276ad82a1de
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103074.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-march=bonnell -Os -fPIC -fschedule-insns -w" } */
+
+void
+serialize_collection (char *ptr, int a, int need_owner)
+{
+  if (need_owner)
+__builtin_sprintf(ptr, "%d:%d", 0, a);
+  else
+{
+  static char buff[32];
+
+  __builtin_sprintf(buff, "%d:%d", a >> 32, a);
+  __builtin_sprintf(ptr, "%d:%d:\"%s\"", 0, 0, buff);
+}
+}


[committed] analyzer: fix ICE with -fanalyzer-transitivity [PR104863]

2022-03-10 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r12-7605-gd016dd7dbb8140f03cde7e2179ebaf9ec3e9d2f1.
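
To illustrate why IDs fetched before a merger cannot be reused afterwards,
here is a toy model.  It is not the analyzer's data structure: toy_classes
and its members are hypothetical, and the assumption that an ID is simply
an index into a vector of classes is mine.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy model: an equivalence-class ID is an index into a
// vector of classes, so erasing a class renumbers the ones after it.
struct toy_classes
{
  std::vector<std::vector<int>> classes;

  // Return the ID of the class containing VALUE, creating one if needed.
  std::size_t get_or_add (int value)
  {
    for (std::size_t i = 0; i < classes.size (); ++i)
      for (int member : classes[i])
        if (member == value)
          return i;
    classes.push_back ({value});
    return classes.size () - 1;
  }

  // Merge class DROP into class KEEP and erase DROP.  Every ID greater
  // than DROP that was fetched earlier is stale after this call.
  void merge (std::size_t keep, std::size_t drop)
  {
    if (keep == drop)
      return;
    classes[keep].insert (classes[keep].end (),
                          classes[drop].begin (), classes[drop].end ());
    classes.erase (classes.begin () + drop);
  }
};
```

In this model, an ID obtained before a merge may afterwards name a
different class or be out of range, so looking it up again after any
operation that can merge is the safe pattern.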

gcc/analyzer/ChangeLog:
PR analyzer/104863
* constraint-manager.cc (constraint_manager::add_constraint):
Refresh the EC IDs when adding constraints implied by offsets.

gcc/testsuite/ChangeLog:
PR analyzer/104863
* gcc.dg/analyzer/torture/pr104863.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/constraint-manager.cc   |  4 
 gcc/testsuite/gcc.dg/analyzer/torture/pr104863.c | 14 ++
 2 files changed, 18 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/torture/pr104863.c

diff --git a/gcc/analyzer/constraint-manager.cc b/gcc/analyzer/constraint-manager.cc
index ac1e4feaee5..9c8c60429f4 100644
--- a/gcc/analyzer/constraint-manager.cc
+++ b/gcc/analyzer/constraint-manager.cc
@@ -1818,6 +1818,10 @@ constraint_manager::add_constraint (const svalue *lhs,
  = m_mgr->get_or_create_constant_svalue (offset_of_cst);
if (!add_constraint (implied_lhs, implied_op, implied_rhs))
  return false;
+   /* The above add_constraint could lead to EC merger, so we need
+  to refresh the EC IDs.  */
+   lhs_ec_id = get_or_add_equiv_class (lhs);
+   rhs_ec_id = get_or_add_equiv_class (rhs);
  }
 
   add_unknown_constraint (lhs_ec_id, op, rhs_ec_id);
diff --git a/gcc/testsuite/gcc.dg/analyzer/torture/pr104863.c b/gcc/testsuite/gcc.dg/analyzer/torture/pr104863.c
new file mode 100644
index 000..30ed4fe022d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/torture/pr104863.c
@@ -0,0 +1,14 @@
+/* { dg-additional-options "-fanalyzer-transitivity" } */
+/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } { "" } } */
+
+extern void g();
+struct a {
+} b(int c, int d) {
+  struct a *e = 0;
+  int f;
+  if (c & 1 || !(c & 2))
+return *e;
+  f = 0;
+  for (; f < d - 1; f++)
+g(e[1]); /* { dg-warning "dereference of NULL" } */
+}
-- 
2.26.3



[PATCH] c++: ICE with template code in constexpr [PR104284]

2022-03-10 Thread Marek Polacek via Gcc-patches
Since r9-6073 cxx_eval_store_expression preevaluates the value to
be stored, and that revealed a crash where a template code (here,
code=IMPLICIT_CONV_EXPR) leaks into cxx_eval*.

It happens because we're performing build_vec_init while processing
a template, which calls get_temp_regvar which creates an INIT_EXPR.
This INIT_EXPR's RHS contains an rvalue conversion so we create an
IMPLICIT_CONV_EXPR.  Its operand is not type-dependent and the whole
INIT_EXPR is not type-dependent.  So we call build_non_dependent_expr
which, with -fchecking=2, calls fold_non_dependent_expr.  At this
point the expression still has an IMPLICIT_CONV_EXPR, which ought to
be handled in instantiate_non_dependent_expr_internal.  However,
tsubst_copy_and_build doesn't handle INIT_EXPR; it will just call
tsubst_copy which does nothing when args is null.  So we fail to
replace the IMPLICIT_CONV_EXPR and ICE.

Eliding the IMPLICIT_CONV_EXPR in this particular case would be too
risky, so we could do

  if (TREE_CODE (t) == INIT_EXPR)
t = TREE_OPERAND (t, 1);

in fold_non_dependent_expr, but that feels too ad hoc.  So it might
make sense to actually take care of INIT_EXPR in tsubst_c_and_b.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/11?

PR c++/104284

gcc/cp/ChangeLog:

* pt.cc (tsubst_copy_and_build): Handle INIT_EXPR.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-104284.C: New test.
---
 gcc/cp/pt.cc  |  8 
 gcc/testsuite/g++.dg/cpp1y/constexpr-104284.C | 17 +
 2 files changed, 25 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-104284.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index f7ee33a6dfd..e8920f98e4d 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -21289,6 +21289,14 @@ tsubst_copy_and_build (tree t,
 with constant operands.  */
   RETURN (t);
 
+case INIT_EXPR:
+  {
+   tree op0 = RECUR (TREE_OPERAND (t, 0));
+   tree op1 = RECUR (TREE_OPERAND (t, 1));
+   RETURN (build2_loc (input_location, INIT_EXPR, TREE_TYPE (op0),
+   op0, op1));
+  }
+
 case NON_LVALUE_EXPR:
 case VIEW_CONVERT_EXPR:
   if (location_wrapper_p (t))
diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-104284.C b/gcc/testsuite/g++.dg/cpp1y/constexpr-104284.C
new file mode 100644
index 000..f60033069e4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-104284.C
@@ -0,0 +1,17 @@
+// PR c++/104284
+// { dg-do compile { target c++14 } }
+// { dg-additional-options "-fchecking=2" }
+
+struct S {
+  char c{};
+};
+
+auto x = [](auto) { constexpr S s[]{{}}; };
+
+template
+constexpr void gn ()
+{
+  constexpr S s[]{{}};
+}
+
+static_assert ((gn(), true), "");

base-commit: b5417a0ba7e26bec2abf05cad6c6ef840a9be41c
-- 
2.35.1



Re: [PATCH] c++: ICE with template code in constexpr [PR104284]

2022-03-10 Thread Marek Polacek via Gcc-patches
On Thu, Mar 10, 2022 at 05:04:59PM -0500, Marek Polacek via Gcc-patches wrote:
> Since r9-6073 cxx_eval_store_expression preevaluates the value to
> be stored, and that revealed a crash where a template code (here,
> code=IMPLICIT_CONV_EXPR) leaks into cxx_eval*.
> 
> It happens because we're performing build_vec_init while processing
> a template, which calls get_temp_regvar which creates an INIT_EXPR.
> This INIT_EXPR's RHS contains an rvalue conversion so we create an
> IMPLICIT_CONV_EXPR.  Its operand is not type-dependent and the whole
> INIT_EXPR is not type-dependent.  So we call build_non_dependent_expr
> which, with -fchecking=2, calls fold_non_dependent_expr.  At this
> point the expression still has an IMPLICIT_CONV_EXPR, which ought to
> be handled in instantiate_non_dependent_expr_internal.  However,
> tsubst_copy_and_build doesn't handle INIT_EXPR; it will just call
> tsubst_copy which does nothing when args is null.  So we fail to
> replace the IMPLICIT_CONV_EXPR and ICE.

Forgot to mention: without -fchecking=2 there's no problem because
digest_init will subst the IMPLICIT_CONV_EXPR:

#0  tsubst_copy_and_build (t=, args=, complain=3, 
in_decl=, function_p=false, integral_constant_expression_p=true)
at /home/mpolacek/src/gcc/gcc/cp/pt.cc:20063
#1  0x00de1ae1 in instantiate_non_dependent_expr_internal 
(expr=, 
complain=3) at /home/mpolacek/src/gcc/gcc/cp/pt.cc:6358
#2  0x00b702d4 in fold_non_dependent_expr_template 
(t=, 
complain=3, manifestly_const_eval=false, object=)
at /home/mpolacek/src/gcc/gcc/cp/constexpr.cc:8050
#3  0x00b706f0 in fold_non_dependent_init (t=, complain=3, 
manifestly_const_eval=false, object=) at 
/home/mpolacek/src/gcc/gcc/cp/constexpr.cc:8143
#4  0x00f08f4f in massage_init_elt (type=, 
init=, nested=0, flags=257, complain=3)
at /home/mpolacek/src/gcc/gcc/cp/typeck2.cc:1437
#5  0x00f0949c in process_init_constructor_array (type=, 
init=, nested=0, flags=257, complain=3)
at /home/mpolacek/src/gcc/gcc/cp/typeck2.cc:1502
#6  0x00f0aec1 in process_init_constructor (type=, 
init=, nested=0, flags=257, complain=3)
at /home/mpolacek/src/gcc/gcc/cp/typeck2.cc:1917
#7  0x00f0890c in digest_init_r (type=, 
init=, nested=0, flags=257, complain=3)
at /home/mpolacek/src/gcc/gcc/cp/typeck2.cc:1324
#8  0x00f08b1b in digest_init_flags (type=, 
init=, flags=257, complain=3)
at /home/mpolacek/src/gcc/gcc/cp/typeck2.cc:1370
#9  0x00f06815 in store_init_value (decl=, 
init=, cleanups=0x7fffba68, flags=257)
at /home/mpolacek/src/gcc/gcc/cp/typeck2.cc:842
#10 0x00bf56cc in check_initializer (decl=, 
init=, flags=257, cleanups=0x7fffba68)
at /home/mpolacek/src/gcc/gcc/cp/decl.cc:7337
#11 0x00bfa8df in cp_finish_decl (decl=, 
init=, init_const_expr_p=true, 
asmspec_tree=, flags=1)
at /home/mpolacek/src/gcc/gcc/cp/decl.cc:8174
 
> Eliding the IMPLICIT_CONV_EXPR in this particular case would be too
> risky, so we could do
> 
>   if (TREE_CODE (t) == INIT_EXPR)
> t = TREE_OPERAND (t, 1);
> 
> in fold_non_dependent_expr, but that feels too ad hoc.  So it might
> make sense to actually take care of INIT_EXPR in tsubst_c_and_b.
> 
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/11?
> 
>   PR c++/104284
> 
> gcc/cp/ChangeLog:
> 
>   * pt.cc (tsubst_copy_and_build): Handle INIT_EXPR.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp1y/constexpr-104284.C: New test.
> ---
>  gcc/cp/pt.cc  |  8 
>  gcc/testsuite/g++.dg/cpp1y/constexpr-104284.C | 17 +
>  2 files changed, 25 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-104284.C
> 
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index f7ee33a6dfd..e8920f98e4d 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -21289,6 +21289,14 @@ tsubst_copy_and_build (tree t,
>with constant operands.  */
>RETURN (t);
>  
> +case INIT_EXPR:
> +  {
> + tree op0 = RECUR (TREE_OPERAND (t, 0));
> + tree op1 = RECUR (TREE_OPERAND (t, 1));
> + RETURN (build2_loc (input_location, INIT_EXPR, TREE_TYPE (op0),
> + op0, op1));
> +  }
> +
>  case NON_LVALUE_EXPR:
>  case VIEW_CONVERT_EXPR:
>if (location_wrapper_p (t))
> diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-104284.C b/gcc/testsuite/g++.dg/cpp1y/constexpr-104284.C
> new file mode 100644
> index 000..f60033069e4
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-104284.C
> @@ -0,0 +1,17 @@
> +// PR c++/104284
> +// { dg-do compile { target c++14 } }
> +// { dg-additional-options "-fchecking=2" }
> +
> +struct S {
> +  char c{};
> +};
> +
> +auto x = [](auto) { constexpr S s[]{{}}; };
> +
> +template
> +constexpr void gn ()
> +{
> +  constexpr S s[]{{}};
> +}
> +
> +static_assert ((gn(), true), "");
> 
> base-commit: b5417a0ba7e26b

[PATCH v2] RISCV: Add support for inlining subword atomics

2022-03-10 Thread Patrick O'Neill
RISC-V has no support for subword atomic operations; code currently
generates libatomic library calls.

This patch changes the default behavior to inline subword atomic calls 
(using the same logic as the existing library call).
Behavior can be specified using the -minline-atomics and
-mno-inline-atomics command line flags.

gcc/libgcc/config/riscv/atomic.c has the same logic implemented in asm.
This will need to stay for backwards compatibility and the
-mno-inline-atomics flag.
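
For readers unfamiliar with the technique, the masking idea can be
sketched in portable C++.  This is only an illustration under stated
assumptions: the name fetch_add_u8_in_word is made up, the byte is
selected by significance (0 = least significant) to sidestep endianness,
and a compare-exchange loop stands in for the LR/SC sequence that the
patch emits.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: atomically add VALUE to one byte of an aligned
// 32-bit word and return the old byte, using only word-sized atomics.
std::uint8_t fetch_add_u8_in_word (std::atomic<std::uint32_t> &word,
                                   unsigned byte_index, std::uint8_t value)
{
  const unsigned shift = byte_index * 8;
  const std::uint32_t mask = std::uint32_t{0xff} << shift;

  std::uint32_t old_word = word.load (std::memory_order_relaxed);
  for (;;)
    {
      // Extract the subword, apply the operation, and wrap to 8 bits.
      const std::uint32_t old_byte = (old_word & mask) >> shift;
      const std::uint32_t new_byte = (old_byte + value) & 0xff;
      // Splice the result back in, leaving the other bytes untouched.
      const std::uint32_t new_word
        = (old_word & ~mask) | (new_byte << shift);
      // On failure old_word is reloaded with the current contents.
      if (word.compare_exchange_weak (old_word, new_word,
                                      std::memory_order_seq_cst,
                                      std::memory_order_relaxed))
        return static_cast<std::uint8_t> (old_byte);
    }
}
```

The neighbouring bytes are preserved because the retry loop only commits
when the whole word is unchanged since it was read.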

2022-02-15 Patrick O'Neill 

PR target/104338
* riscv.opt: Add command-line flag.
* invoke.texi: Add blurb regarding command-line flag.
* sync.md (atomic_fetch_): logic for 
expanding subword atomic operations.
* sync.md (subword_atomic_fetch_strong_): LR/SC
block for performing atomic operation
* atomic.c: Add reference to duplicate logic.
* inline-atomics-1.c: New test.
* inline-atomics-2.c: Likewise.
* inline-atomics-3.c: Likewise.
* inline-atomics-4.c: Likewise.
* inline-atomics-5.c: Likewise.
* inline-atomics-6.c: Likewise.
* inline-atomics-7.c: Likewise.
* inline-atomics-8.c: Likewise.
* inline-atomics-9.c: Likewise.

Signed-off-by: Patrick O'Neill 
---
There may be further concerns about the memory consistency of these 
operations, but this patch focuses on simply moving the logic inline.
Those concerns can be addressed in a future patch.
---
v2 Changelog:
 - Add texi blurb
 - Update target flag
 - add 'UNSPEC_SYNC_OLD_OP_SUBWORD' for subword ops
---
 gcc/config/riscv/riscv.opt|   4 +
 gcc/config/riscv/sync.md  |  98 +++
 gcc/doc/invoke.texi   |   7 +
 .../gcc.target/riscv/inline-atomics-1.c   |  11 +
 .../gcc.target/riscv/inline-atomics-2.c   |  12 +
 .../gcc.target/riscv/inline-atomics-3.c   | 569 ++
 .../gcc.target/riscv/inline-atomics-4.c   | 566 +
 .../gcc.target/riscv/inline-atomics-5.c   |  13 +
 .../gcc.target/riscv/inline-atomics-6.c   |  12 +
 .../gcc.target/riscv/inline-atomics-7.c   |  12 +
 .../gcc.target/riscv/inline-atomics-8.c   |  17 +
 .../gcc.target/riscv/inline-atomics-9.c   |  17 +
 libgcc/config/riscv/atomic.c  |   2 +
 13 files changed, 1340 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-9.c

diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 9fffc08220d..8378e41aa85 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -225,3 +225,7 @@ Enum(isa_spec_class) String(20191213) Value(ISA_SPEC_CLASS_20191213)
 misa-spec=
 Target RejectNegative Joined Enum(isa_spec_class) Var(riscv_isa_spec) Init(TARGET_DEFAULT_ISA_SPEC)
 Set the version of RISC-V ISA spec.
+
+minline-atomics
+Target Mask(INLINE_SUBWORD_ATOMIC)
+Always inline subword atomic operations.
diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index 86b41e6b00a..05cbdfd5db3 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -22,6 +22,7 @@
 (define_c_enum "unspec" [
   UNSPEC_COMPARE_AND_SWAP
   UNSPEC_SYNC_OLD_OP
+  UNSPEC_SYNC_OLD_OP_SUBWORD
   UNSPEC_SYNC_EXCHANGE
   UNSPEC_ATOMIC_STORE
   UNSPEC_MEMORY_BARRIER
@@ -92,6 +93,103 @@
   "%F3amo.%A3 %0,%z2,%1"
   [(set (attr "length") (const_int 8))])
 
+(define_expand "atomic_fetch_"
+  [(set (match_operand:SHORT 0 "register_operand" "=&r") ;; old value at mem
+   (match_operand:SHORT 1 "memory_operand" "+A"));; mem location
+   (set (match_dup 1)
+   (unspec_volatile:SHORT
+ [(any_atomic:SHORT (match_dup 1)
+(match_operand:SHORT 2 "reg_or_0_operand" "rJ")) ;; value for op
+  (match_operand:SI 3 "const_int_operand")]  ;; model
+UNSPEC_SYNC_OLD_OP_SUBWORD))]
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+{
+  /* We have no QImode/HImode atomics, so form a mask, then use
+ subword_atomic_fetch_strong_ to implement a LR/SC version of the
+ operation. */
+
+  /* Logic duplicated in gcc/libgcc/config/riscv/atomic.c for use when inlining
+ is disabled */
+
+  rtx old = gen_reg_rtx (SImode);
+  rtx mem = operands[1];
+  rtx value = operands[2];
+  rtx mask = gen_reg_rtx (SImode);
+  rtx notmask = gen_reg_rtx (SImode);
+
+  rtx addr = force_reg (Pmode

[PATCH] PR middle-end/98420: Don't fold x - x to 0.0 with -frounding-math

2022-03-10 Thread Roger Sayle

This patch addresses PR middle-end/98420, which is inappropriate constant
folding of x - x to 0.0 (in match.pd) when -frounding-math is specified.
Specifically, x - x may be -0.0 with FE_DOWNWARD as the rounding mode.

To summarize, the desired IEEE behaviour, x - x for floating point x,
(1) can't be folded to 0.0 by default, due to the possibility of NaN or Inf
(2) can be folded to 0.0 with -ffinite-math-only
(3) can't be folded to 0.0 with -ffinite-math-only -frounding-math
(4) can be folded with -ffinite-math-only -frounding-math -fno-signed-zeros
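
The four cases above can be restated as a small predicate. This is only a sketch in plain C, not the match.pd code: the function name and its three boolean parameters are hypothetical stand-ins for the corresponding GCC flag state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: return true if x - x may be folded to +0.0 for a
   floating-point x.  The parameters stand in for -ffinite-math-only,
   -frounding-math and the (default on) signed-zeros semantics.  */
static bool
can_fold_x_minus_x (bool finite_math_only, bool rounding_math,
                    bool signed_zeros)
{
  /* NaN - NaN and Inf - Inf are NaN, so both must be ruled out.  */
  if (!finite_math_only)
    return false;
  /* With FE_DOWNWARD, x - x is -0.0.  That only matters when the
     rounding mode may be non-default and the sign of zero is honored.  */
  return !rounding_math || !signed_zeros;
}

/* The four cases listed in the mail, in order.  */
static void
check_email_cases (void)
{
  assert (!can_fold_x_minus_x (false, false, true));  /* (1) default */
  assert (can_fold_x_minus_x (true, false, true));    /* (2) */
  assert (!can_fold_x_minus_x (true, true, true));    /* (3) */
  assert (can_fold_x_minus_x (true, true, false));    /* (4) */
}
```

Calling check_email_cases () exercises exactly the combinations (1) to (4).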

Technically, this is a regression from GCC 4.1 (according to godbolt.org)
so hopefully this patch is suitable during stage4.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures.  Ok for mainline?


2022-03-10  Roger Sayle  

gcc/ChangeLog
PR middle-end/98420
* match.pd (minus @0 @0): Additional checks for -fno-rounding-math
(the default) or -fno-signed-zeros.

gcc/testsuite/ChangeLog
PR middle-end/98420
* gcc.dg/pr98420.c: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/match.pd b/gcc/match.pd
index 97399e5..3fe53d1 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -229,13 +229,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* Simplify x - x.
This is unsafe for certain floats even in non-IEEE formats.
In IEEE, it is unsafe because it does wrong for NaNs.
+   PR middle-end/98420: x - x may be -0.0 with FE_DOWNWARD.
Also note that operand_equal_p is always false if an operand
is volatile.  */
 (simplify
  (minus @0 @0)
  (if (!FLOAT_TYPE_P (type)
   || (!tree_expr_maybe_nan_p (@0)
- && !tree_expr_maybe_infinite_p (@0)))
+ && !tree_expr_maybe_infinite_p (@0)
+ && (!flag_rounding_math || !HONOR_SIGNED_ZEROS (type))))
   { build_zero_cst (type); }))
 (simplify
  (pointer_diff @@0 @0)
diff --git a/gcc/testsuite/gcc.dg/pr98420.c b/gcc/testsuite/gcc.dg/pr98420.c
new file mode 100644
index 000..c289b84
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr98420.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffinite-math-only -frounding-math -fdump-tree-optimized" 
} */
+double foo (double a)
+{
+  return a - a;
+}
+
+/* { dg-final { scan-tree-dump " = a_\[0-9\]\\(D\\) - a_\[0-9\]\\(D\\);" 
"optimized" } } */


[committed] libstdc++: Do not use fast_float for 16-bit size_t [PR104870]

2022-03-10 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux and built on msp430-elf, pushed to trunk.

-- >8 --

The preprocessor condition for using fast_float should match the one in
the header, and require at least 32-bit size_t.

libstdc++-v3/ChangeLog:

PR libstdc++/104870
* src/c++17/floating_from_chars.cc: Check __SIZE_WIDTH__ >= 32
before using fast_float.
---
 libstdc++-v3/src/c++17/floating_from_chars.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/src/c++17/floating_from_chars.cc 
b/libstdc++-v3/src/c++17/floating_from_chars.cc
index ba1345db3f2..ba0426b3344 100644
--- a/libstdc++-v3/src/c++17/floating_from_chars.cc
+++ b/libstdc++-v3/src/c++17/floating_from_chars.cc
@@ -61,7 +61,8 @@
 extern "C" __ieee128 __strtoieee128(const char*, char**);
 #endif
 
-#if _GLIBCXX_FLOAT_IS_IEEE_BINARY32 && _GLIBCXX_DOUBLE_IS_IEEE_BINARY64
+#if _GLIBCXX_FLOAT_IS_IEEE_BINARY32 && _GLIBCXX_DOUBLE_IS_IEEE_BINARY64 \
+&& __SIZE_WIDTH__ >= 32
 # define USE_LIB_FAST_FLOAT 1
 # if __LDBL_MANT_DIG__ == __DBL_MANT_DIG__
 #  undef USE_STRTOD_FOR_FROM_CHARS
-- 
2.34.1



Re: [PATCH RFC] mips: add TARGET_ZERO_CALL_USED_REGS hook [PR104817, PR104820]

2022-03-10 Thread Xi Ruoyao via Gcc-patches
On Thu, 2022-03-10 at 20:31 +, Qing Zhao wrote:

> > > +  SET_HARD_REG_BIT (zeroed_hardregs, HI_REGNUM);
> > > +  if (TEST_HARD_REG_BIT (need_zeroed_hardregs, LO_REGNUM))
> > > +   SET_HARD_REG_BIT (zeroed_hardregs, LO_REGNUM);
> > > +  else
> > > +   emit_clobber (gen_rtx_REG (word_mode, LO_REGNUM));
> > 
> > …I don't think this conditional LO_REGNUM code is worth it.
> > We might as well just add both registers to zeroed_hardregs.
> 
> If the LO_REGNUM is NOT in “need_zeroed_hardregs”, adding it to 
> “zeroed_hardregs” seems not right to me.
> What’s you mean by “not worth it”?

It's because the MIPS port almost always treats HI as "a subreg of the dword
HI-LO register".  A direct "mthi $0" is possible, but the MIPS backend does
not recognize "emit_move_insn (HI, CONST_0)".  In theory we could emit the
mthi instruction explicitly here, but we'll need to clear something NOT in
need_zeroed_hardregs for MIPS anyway (see below).

> > Here too I think we should just do:
> > 
> >  zeroed_hardregs |= reg_class_contents[ST_REGS] & accessible_reg_set;
> > 
> > to include all available FCC registers.
> 
> What’s the relationship between “ST_REGs” and FCC? (sorry for the stupid 
> question since I am not familiar with the MIPS register set).

The MIPS instruction manual names the 8 one-bit floating condition codes
FCC0, ..., FCC7, but the GCC MIPS backend code names them
ST_REG0, ..., ST_REG7.  Maybe it's better to always use the name
"ST_REG" instead of "FCC" then.

> From the above code, looks like that when any  “ST_REGs” is in 
> “need_zeroed_hardregs”,FCC need to be cleared? 

Because there is no elegant way to clear one specific FCC bit in MIPS. 
A "ctc1 $0, $25" instruction will zero them altogether.  If we really
need to clear only one of them (let's say ST_REG3), we'll have to emit
something like

mtc1  $0, $0   # zero FPR0 to ensure it won't contain sNaN
c.f.s $3, $0, $0

Then we'll still need to clobber FPR0 with zero.  So anyway we'll have
to clear some registers not specified in need_zeroed_hardregs.

And the question is: is it really allowed to return something other than
a subset of need_zeroed_hardregs for a TARGET_ZERO_CALL_USED_REGS hook?
If yes, we'll happily do so (as v2 of the patch does);
otherwise we'd need to clobber those registers NOT in
need_zeroed_hardregs explicitly.
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH 1/2] libsanitizer: cherry-pick db7bca28638e from upstream

2022-03-10 Thread Xi Ruoyao via Gcc-patches
libsanitizer/

* sanitizer_common/sanitizer_atomic_clang.h: Only include
sanitizer_atomic_clang_mips.h for O32.
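
As far as I know, MIPS compilers define all of the _ABIxx constants regardless of the selected ABI and set _MIPS_SIM to one of them, so testing defined(_ABIO32) alone does not identify O32. The sketch below shows the difference using hypothetical DEMO_ macros in place of the real predefined ones, pretending the target is n64.

```c
#include <assert.h>

/* Hypothetical stand-ins for the compiler-predefined _ABIO32, _ABIN32,
   _ABI64 and _MIPS_SIM.  All ABI constants exist at once; only the
   value of the SIM macro says which ABI is in use.  */
#define DEMO_ABIO32 1
#define DEMO_ABIN32 2
#define DEMO_ABI64 3
#define DEMO_MIPS_SIM DEMO_ABI64 /* pretend we are building for n64 */

/* Old-style check: only tests that the macros exist.  */
#if defined(DEMO_MIPS_SIM) && defined(DEMO_ABIO32)
# define OLD_CHECK_SAYS_O32 1
#else
# define OLD_CHECK_SAYS_O32 0
#endif

/* New-style check: also compares the values.  */
#if defined(DEMO_MIPS_SIM) && defined(DEMO_ABIO32) \
    && DEMO_MIPS_SIM == DEMO_ABIO32
# define NEW_CHECK_SAYS_O32 1
#else
# define NEW_CHECK_SAYS_O32 0
#endif

static void
check_abi_demo (void)
{
  assert (OLD_CHECK_SAYS_O32 == 1); /* wrongly treats n64 as O32 */
  assert (NEW_CHECK_SAYS_O32 == 0); /* correctly rejects n64 */
}
```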
---
 libsanitizer/sanitizer_common/sanitizer_atomic_clang.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libsanitizer/sanitizer_common/sanitizer_atomic_clang.h 
b/libsanitizer/sanitizer_common/sanitizer_atomic_clang.h
index fc13ca52dda..ccf18f0786d 100644
--- a/libsanitizer/sanitizer_common/sanitizer_atomic_clang.h
+++ b/libsanitizer/sanitizer_common/sanitizer_atomic_clang.h
@@ -96,8 +96,8 @@ inline bool atomic_compare_exchange_weak(volatile T *a,
 // This include provides explicit template instantiations for atomic_uint64_t
 // on MIPS32, which does not directly support 8 byte atomics. It has to
 // proceed the template definitions above.
-#if defined(_MIPS_SIM) && defined(_ABIO32)
-  #include "sanitizer_atomic_clang_mips.h"
+#if defined(_MIPS_SIM) && defined(_ABIO32) && _MIPS_SIM == _ABIO32
+#  include "sanitizer_atomic_clang_mips.h"
 #endif
 
 #undef ATOMIC_ORDER
-- 
2.35.1




[PATCH 2/2] Enable libsanitizer build on mips64

2022-03-10 Thread Xi Ruoyao via Gcc-patches
Bootstrapped and regtested on mips64el-linux-gnuabi64.

bootstrap-ubsan revealed 3 bugs (PR 104842, 104843, 104851).
bootstrap-asan did not reveal any new bug.

gcc/

* config/mips/mips.h (SUBTARGET_SHADOW_OFFSET): Define.
* config/mips/mips.cc (mips_option_override): Make
-fsanitize=address imply -fasynchronous-unwind-tables.  This is
needed by libasan for stack backtrace on MIPS.
(mips_asan_shadow_offset): Return SUBTARGET_SHADOW_OFFSET.

gcc/testsuite:

* c-c++-common/asan/global-overflow-1.c: Skip for MIPS with some
optimization levels because inaccurate debug info is causing
dg-output mismatch on line numbers.
* g++.dg/asan/large-func-test-1.C: Likewise.

libsanitizer/

* configure.tgt: Enable build on mips64.
---
 gcc/config/mips/mips.cc | 9 -
 gcc/config/mips/mips.h  | 7 +++
 gcc/testsuite/c-c++-common/asan/global-overflow-1.c | 1 +
 gcc/testsuite/g++.dg/asan/large-func-test-1.C   | 1 +
 libsanitizer/configure.tgt  | 4 
 5 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index 59eef515826..6b06c6380f6 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -19974,6 +19974,13 @@ mips_option_override (void)
target_flags |= MASK_64BIT;
 }
 
+  /* -fsanitize=address needs to turn on -fasynchronous-unwind-tables in
+ order for tracebacks to be complete but not if any
+ -fasynchronous-unwind-table were already specified.  */
+  if (flag_sanitize & SANITIZE_USER_ADDRESS
+  && !global_options_set.x_flag_asynchronous_unwind_tables)
+flag_asynchronous_unwind_tables = 1;
+
   if ((target_flags_explicit & MASK_FLOAT64) != 0)
 {
   if (mips_isa_rev >= 6 && !TARGET_FLOAT64)
@@ -22591,7 +22598,7 @@ mips_constant_alignment (const_tree exp, HOST_WIDE_INT 
align)
 static unsigned HOST_WIDE_INT
 mips_asan_shadow_offset (void)
 {
-  return 0x0aaa;
+  return SUBTARGET_SHADOW_OFFSET;
 }
 
 /* Implement TARGET_STARTING_FRAME_OFFSET.  See mips_compute_frame_info
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index 0029864fdcd..858bbba3a36 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -3463,3 +3463,10 @@ struct GTY(())  machine_function {
&& !TARGET_MICROMIPS && !TARGET_FIX_24K)
 
 #define NEED_INDICATE_EXEC_STACK 0
+
+/* Define the shadow offset for asan. Other OS's can override in the
+   respective tm.h files.  */
+#ifndef SUBTARGET_SHADOW_OFFSET
+#define SUBTARGET_SHADOW_OFFSET \
+  (POINTER_SIZE == 64 ? HOST_WIDE_INT_1 << 37 : HOST_WIDE_INT_C (0x0aaa))
+#endif
diff --git a/gcc/testsuite/c-c++-common/asan/global-overflow-1.c 
b/gcc/testsuite/c-c++-common/asan/global-overflow-1.c
index 1092a316681..ec412231be0 100644
--- a/gcc/testsuite/c-c++-common/asan/global-overflow-1.c
+++ b/gcc/testsuite/c-c++-common/asan/global-overflow-1.c
@@ -22,6 +22,7 @@ int main() {
   return res;
 }
 
+/* { dg-skip-if "inaccurate debug info" { mips*-*-* } { "*" } { "-O0" } } */
 /* { dg-output "READ of size 1 at 0x\[0-9a-f\]+ thread T0.*(\n|\r\n|\r)" } */
 /* { dg-output "#0 0x\[0-9a-f\]+ +(in _*main 
(\[^\n\r]*global-overflow-1.c:20|\[^\n\r]*:0|\[^\n\r]*\\+0x\[0-9a-z\]*)|\[(\])\[^\n\r]*(\n|\r\n|\r).*"
 } */
 /* { dg-output "0x\[0-9a-f\]+ is located 0 bytes to the right of global 
variable" } */
diff --git a/gcc/testsuite/g++.dg/asan/large-func-test-1.C 
b/gcc/testsuite/g++.dg/asan/large-func-test-1.C
index b42c09e3b0d..ac9deb898c8 100644
--- a/gcc/testsuite/g++.dg/asan/large-func-test-1.C
+++ b/gcc/testsuite/g++.dg/asan/large-func-test-1.C
@@ -35,6 +35,7 @@ int main() {
   delete x;
 }
 
+// { dg-skip-if "inaccurate debug info" { mips*-*-* } { "-Os" } { "" }  }
 // { dg-output "ERROR: AddressSanitizer:? heap-buffer-overflow on 
address\[^\n\r]*" }
 // { dg-output "0x\[0-9a-f\]+ at pc 0x\[0-9a-f\]+ bp 0x\[0-9a-f\]+ sp 
0x\[0-9a-f\]+\[^\n\r]*(\n|\r\n|\r)" }
 // { dg-output "\[^\n\r]*READ of size 4 at 0x\[0-9a-f\]+ thread 
T0\[^\n\r]*(\n|\r\n|\r)" }
diff --git a/libsanitizer/configure.tgt b/libsanitizer/configure.tgt
index 5a59ea6a1b5..fb89df4935c 100644
--- a/libsanitizer/configure.tgt
+++ b/libsanitizer/configure.tgt
@@ -54,10 +54,6 @@ case "${target}" in
;;
   arm*-*-linux*)
;;
-  mips*64*-*-linux*)
-   # This clause is only here to not match the supported mips*-*-linux*.
-   UNSUPPORTED=1
-   ;;
   mips*-*-linux*)
;;
   aarch64*-*-linux*)
-- 
2.35.1




Re: [PATCH] tree-optimization/102943 - avoid (re-)computing dominance bitmap

2022-03-10 Thread Jeff Law via Gcc-patches




On 3/10/2022 4:50 AM, Richard Biener via Gcc-patches wrote:

Currently back_propagate_equivalences tries to optimize dominance
queries in a smart way but it fails to notice that when fast indexes
are available the dominance query is fast (when called from DOM).
It also re-computes the dominance bitmap for each equivalence recorded
on an edge, which for FP are usually several.  Finally it fails to
use the tree bitmap view for efficiency.  Overall this cuts 7
seconds of compile-time from originally 77 in the slowest LTRANS
unit when building 521.wrf_r.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2022-03-10  Richard Biener  

PR tree-optimization/102943
* tree-ssa-dom.cc (back_propagate_equivalences): Only
populate the dominance bitmap if fast queries are not
available.  Use a tree view bitmap.
(record_temporary_equivalences): Cache the dominance bitmap
across all equivalences on the edge.

Anything that helps WRF build times is a win in my book.  IIRC the
worst module in there takes something like 27 minutes to cross compile
for our target.  Annoying as hell when you have to do it multiple times
a day.


jeff



[r12-7607 Regression] FAIL: g++.dg/other/pr84964.C -std=c++98 (test for warnings, line 6) on Linux/x86_64

2022-03-10 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

a717376e99fb33ba3b06bd8122e884f4b63a60c9 is the first bad commit
commit a717376e99fb33ba3b06bd8122e884f4b63a60c9
Author: Roger Sayle 
Date:   Thu Mar 10 23:49:15 2022 +

PR c++/84964: Middle-end patch to expand_call for ICE after sorry.

caused

FAIL: g++.dg/other/pr84964.C  -std=c++14  (test for warnings, line 6)
FAIL: g++.dg/other/pr84964.C  -std=c++17  (test for warnings, line 6)
FAIL: g++.dg/other/pr84964.C  -std=c++20  (test for warnings, line 6)
FAIL: g++.dg/other/pr84964.C  -std=c++98  (test for warnings, line 6)

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-7607/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/other/pr84964.C 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/other/pr84964.C 
--target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me
at skpgkp2 at gmail dot com)


[PATCH] Fix DImode to TImode sign extend issue, PR target/104868

2022-03-10 Thread Michael Meissner via Gcc-patches
Fix DImode to TImode sign extend issue, PR target/104868

PR target/104868 had an issue where my code that updated the DImode to
TImode sign extension for power10 failed.  In looking at the failure
message, the reason is that when extendditi2 tries to split the insn, it
generates an insn that does not satisfy its constraints:

(set (reg:V2DI 65 1)
 (vec_duplicate:V2DI (reg:DI 0)))

The reason is that vsx_splat_v2di does not allow GPR register 0 when it will
be generating a mtvsrdd instruction.  In the definition of the mtvsrdd
instruction, if the RA register is 0, it means clear the upper 64 bits of
the vector instead of moving register GPR 0 to those bits.
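
The RA = 0 special case can be modelled in a few lines of C. This is a rough sketch of the semantics as described above, not compiler or simulator code; the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit VSX register; dword[0] is the upper
   64 bits.  */
struct demo_vsr
{
  uint64_t dword[2];
};

/* Model of "mtvsrdd XT,RA,RB": RA = 0 encodes the constant zero, not
   the contents of GPR 0.  */
static struct demo_vsr
model_mtvsrdd (const uint64_t gpr[32], unsigned ra, unsigned rb)
{
  struct demo_vsr xt;

  assert (ra < 32 && rb < 32);
  xt.dword[0] = (ra == 0) ? 0 : gpr[ra]; /* upper half: 0 if RA is 0 */
  xt.dword[1] = gpr[rb];                 /* lower half: always GPR[RB] */
  return xt;
}
```

With the value in GPR 0, model_mtvsrdd (gpr, 0, 0) yields a zero upper half rather than a copy of GPR 0, which is why the splat must not be given register 0 for that operand.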

When I wrote the extendditi2 pattern, I forgot that mtvsrdd had that
behavior so I used a 'r' constraint instead of 'b'.  In the rare case
where the value is in GPR register 0, this split will fail.

This patch uses the right constraint for extendditi2.

Note, I was unable to get the example to fail.  I built a toolchain, and
modified it so libgfortran was built with -flto.  But I feel confident that
this patch is the right fix for the problem listed in the PR.

Can I check this into the master branch?  Assuming this patch is accepted, I
would incorporate it into the backport for GCC 11.  I wasn't planning on
backporting it to GCC 10, since the original bug (PR target/104698) does not
show up there.

2022-03-10   Michael Meissner  

gcc/
PR target/104868
* config/rs6000/vsx.md (extendditi2): Use a 'b' constraint when
moving from a GPR register to an Altivec register.
---
 gcc/config/rs6000/vsx.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index d0fb92f5985..15bd86dfdfb 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5033,7 +5033,7 @@ (define_expand "vsignextend_si_v2di"
 ;; generate the vextsd2q instruction.
 (define_insn_and_split "extendditi2"
   [(set (match_operand:TI 0 "register_operand" "=r,r,v,v,v")
-   (sign_extend:TI (match_operand:DI 1 "input_operand" "r,m,r,wa,Z")))
+   (sign_extend:TI (match_operand:DI 1 "input_operand" "r,m,b,wa,Z")))
(clobber (reg:DI CA_REGNO))]
   "TARGET_POWERPC64 && TARGET_POWER10"
   "#"
-- 
2.35.1


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH] Check if loading const from mem is faster

2022-03-10 Thread Jiufu Guo via Gcc-patches


Hi!

Richard Biener  writes:

> On Thu, 10 Mar 2022, Jiufu Guo wrote:
>
>> 
>> Hi!
>> 
>> Richard Biener  writes:
>> 
>> > On Wed, 9 Mar 2022, Jiufu Guo wrote:
>> >
>> >> 
>> >> Hi!
>> >> 
>> >> Richard Biener  writes:
>> >> 
>> >> > On Tue, 8 Mar 2022, Jiufu Guo wrote:
>> >> >
>> >> >> Jiufu Guo  writes:
>> >> >> 
>> >> >> Hi!
>> >> >> 
>> >> >> > Hi Sehger,
>> >> >> >
>> >> >> > Segher Boessenkool  writes:
>> >> >> >
>> >> >> >> On Tue, Mar 01, 2022 at 10:28:57PM +0800, Jiufu Guo wrote:
>> >> >> >>> Segher Boessenkool  writes:
>> >> >> >>> > No.  insn_cost is only for correct, existing instructions, not 
>> >> >> >>> > for
>> >> >> >>> > made-up nonsense.  I created insn_cost precisely to get away 
>> >> >> >>> > from that
>> >> >> >>> > aspect of rtx_cost (and some other issues, like, it is 
>> >> >> >>> > incredibly hard
>> >> >> >>> > and cumbersome to write a correct rtx_cost).
>> >> >> >>> 
>> >> >> >>> Thanks! The implementations of hook insn_cost are align with this
>> >> >> >>> design, they are  checking insn's attributes and COSTS_N_INSNS.
>> >> >> >>> 
>> >> >> >>> One question on the speciall case: 
>> >> >> >>> For instruction: "r119:DI=0x100803004101001"
>> >> >> >>> Would we treat it as valid instruction?
>> >> >> >>
>> >> >> >> Currently we do, alternative 6 in *movdi_internal64: we allow any 
>> >> >> >> r<-n.
>> >> >> >> This is costed as 5 insns (cost=20).
>> >> >> >>
>> >> >> >> It generally is better to split things into patterns close to the
>> >> >> >> eventual machine isntructions as early as possible: all the more 
>> >> >> >> generic
>> >> >> >> optimisations can take advantage of that then.
>> >> >> > Get it!
>> >> >> >>
>> >> >> >>> A patch, which is attached the end of this mail, accepts
>> >> >> >>> "r119:DI=0x100803004101001" as input of insn_cost.
>> >> >> >>> In this patch, 
>> >> >> >>> - A tmp instruction is generated via make_insn_raw.
>> >> >> >>> - A few calls to rtx_cost (in cse_insn) is replaced by insn_cost.
>> >> >> >>> - In hook of insn_cost, checking the special 'constant' 
>> >> >> >>> instruction.
>> >> >> >>> Are these make sense?
>> >> >> >>
>> >> >> >> I'll review that patch inline.
>> >> >> 
>> >> >> I drafted a new patch that replace rtx_cost with insn_cost for cse.cc.
>> >> >> Different from the previous partial patch, this patch replaces all 
>> >> >> usage
>> >> >> of rtx_cost. It may be better/aggressive than previous one.
>> >> >
>> >> > I think there's no advantage for using insn_cost over rtx_cost for
>> >> > the simple SET case.
>> >> 
>> >> Thanks for your comments and raise this concern.
>> >> 
>> >> For those targets which do not implement insn_cost, insn_cost calls
>> >> rtx_cost through pattern_cost, then insn_cost is equal to rtx_cost.
>> >> 
>> >> While, for those targets which have insn_cost, it seems insn_cost would
>> >> be better(or say more accurate/consistent?) than rtx_cost. Since:
>> >> - insn_cost recog the insn first, and compute cost through something
>> >
>> > target hooks are expected to call recog on the insn but the generic
>> > fallback does not!?  Or do you say a target _could_ call recog?
>> > I think it would be valid to only expect recognized insns here
>> > and thus cse.cc obligation would be to call regoc on the "fake"
>> > instruction which then raises the obvious issue whether you should
>> > ever call recog on something "fake".
>> Thanks Richard! I also feel this is a main point of insn_cost.
>> From my understanding: it would be better to let insn_cost check
>> the valid recognizable insns; this would be the major purpose of
>> insn_cost.  While, I'm also wondering, we may let it go for 'fake'
>> instruction for current implementations (e.g. arm_insn_cost) which
>> behaviors to walk/check pattern. 
>
> Note for the CSE case you'll always end up with the single move
> pattern so it's somewhat pointless to do the recog & insn_cost
> dance there.  For move cost using rtx_cost (SET, ..) should
> be good enough.  One could argue we should standardize a (wrapping)
> API like move_cost (enum machine_mode mode, rtx to, rtx from),
> with to/from optional (if omitted a REG of MODE).  But the existing
> rtx_cost target hook implementation should be sufficient to handle
> it, without building a fake insn and without doing (the pointless,
> if not failing) recog process.

Thanks so much for your comments and suggestions!

rtx_cost is able to handle those things in cse.cc.
insn_cost is good for checking an instruction.  Simulating
'rtx_cost' with insn_cost takes extra work: a 'recog' on the SET
of the rtx expr, plus alternative estimation.

I was just wondering whether we might prefer insn_cost for its consistency
and accuracy. :-)  That is why the patch was prepared.


BR,
Jiufu

>
> Richard.
>
>> >
>> > I also see that rs6000_insn_cost does
>> >
>> > static int
>> > rs6000_insn_cost (rtx_insn *insn, bool speed)
>> > {
>> >   if (recog_memoized (insn) < 0)
>> > return 0;
>> >
>> > so not recognized insns become quite che

[PATCH] middle-end/104854: Limit strncmp overread warnings

2022-03-10 Thread Siddhesh Poyarekar
The size argument in strncmp only describes the maximum length to which
to compare two strings and is not an indication of the sizes of the two
source strings.  Do not warn if it is larger than the two input strings
because it is entirely likely that the size argument is a conservative
maximum to accommodate inputs of different lengths and only a subset is
reachable through the current code path.
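
A typical caller of the kind described might look like the sketch below. The bound MAX_KEY_LEN and the function key_matches are hypothetical; the point is only that the bound exceeds the size of one input while the comparison still stops at the first NUL.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical conservative upper bound shared by many callers.  */
#define MAX_KEY_LEN 64

/* Return nonzero if CANDIDATE equals the fixed key.  */
static int
key_matches (const char *candidate)
{
  static const char key[] = "id"; /* 3 bytes including the NUL */

  assert (candidate != NULL);
  /* The bound is larger than sizeof key, but strncmp does not compare
     anything past a NUL terminator, so nothing is over-read.  */
  return strncmp (candidate, key, MAX_KEY_LEN) == 0;
}
```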

gcc/ChangeLog:

middle-end/104854
* gimple-ssa-warn-access.cc
(pass_waccess::warn_zero_sized_strncmp_inputs): New function.
(pass_waccess::check_strncmp): Use it.

gcc/testsuite/ChangeLog:

middle-end/104854
* gcc.dg/Wstringop-overread.c (test_strncmp_array): Don't expect
failures for non-zero sizes.

Signed-off-by: Siddhesh Poyarekar 
---

x86_64 bootstrap in progress.

 gcc/gimple-ssa-warn-access.cc | 39 +--
 gcc/testsuite/gcc.dg/Wstringop-overread.c |  2 +-
 2 files changed, 23 insertions(+), 18 deletions(-)

diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
index 75297ed7c9e..970f4b9b69f 100644
--- a/gcc/gimple-ssa-warn-access.cc
+++ b/gcc/gimple-ssa-warn-access.cc
@@ -2137,6 +2137,9 @@ private:
   /* Return true if use follows an invalidating statement.  */
   bool use_after_inval_p (gimple *, gimple *, bool = false);
 
+  /* Emit an overread warning for zero sized inputs to strncmp.  */
+  void warn_zero_sized_strncmp_inputs (gimple *, tree *, access_data *);
+
   /* A pointer_query object to store information about pointers and
  their targets in.  */
   pointer_query m_ptr_qry;
@@ -2619,8 +2622,20 @@ pass_waccess::check_stxncpy (gcall *stmt)
data.mode, &data, m_ptr_qry.rvals);
 }
 
-/* Check a call STMT to stpncpy() or strncpy() for overflow and warn
-   if it does.  */
+/* Warn for strncmp on a zero sized source or when an argument isn't
+   nul-terminated.  */
+void
+pass_waccess::warn_zero_sized_strncmp_inputs (gimple *stmt, tree *bndrng,
+ access_data *pad)
+{
+  tree func = get_callee_fndecl (stmt);
+  location_t loc = gimple_location (stmt);
+  maybe_warn_for_bound (OPT_Wstringop_overread, loc, stmt, func, bndrng,
+   size_zero_node, pad);
+}
+
+/* Check a call STMT to strncmp () for overflow and warn if it does.  This is
+   limited to checking for NUL terminated arrays for now.  */
 
 void
 pass_waccess::check_strncmp (gcall *stmt)
@@ -2703,21 +2718,11 @@ pass_waccess::check_strncmp (gcall *stmt)
   else if (rem2 == 0 || (rem2 < rem1 && lendata2.decl))
 rem1 = rem2;
 
-  /* Point PAD at the array to reference in the note if a warning
- is issued.  */
-  access_data *pad = len1 ? &adata2 : &adata1;
-  offset_int maxrem = wi::max (rem1, rem2, UNSIGNED);
-  if (lendata1.decl || lendata2.decl
-  || maxrem < wi::to_offset (bndrng[0]))
-{
-  /* Warn when either argument isn't nul-terminated or the maximum
-remaining space in the two arrays is less than the bound.  */
-  tree func = get_callee_fndecl (stmt);
-  location_t loc = gimple_location (stmt);
-  maybe_warn_for_bound (OPT_Wstringop_overread, loc, stmt, func,
-   bndrng, wide_int_to_tree (sizetype, maxrem),
-   pad);
-}
+  if (rem1 == 0)
+warn_zero_sized_strncmp_inputs (stmt, bndrng, &adata1);
+  if (rem2 == 0)
+warn_zero_sized_strncmp_inputs (stmt, bndrng, &adata2);
+
 }
 
 /* Determine and check the sizes of the source and the destination
diff --git a/gcc/testsuite/gcc.dg/Wstringop-overread.c 
b/gcc/testsuite/gcc.dg/Wstringop-overread.c
index 7db74029819..fb8e626439d 100644
--- a/gcc/testsuite/gcc.dg/Wstringop-overread.c
+++ b/gcc/testsuite/gcc.dg/Wstringop-overread.c
@@ -431,7 +431,7 @@ void test_strncmp_array (const char *s, int i)
 
   T (strncmp (a1, b1, 0));
   T (strncmp (a1, b1, 1));
-  T (strncmp (a1, b1, 2));  // { dg-warning "'strncmp' specified bound 2 
exceeds source size 1" }
+  T (strncmp (a1, b1, 2));
 }
 
 
-- 
2.35.1



[PATCH v4] libgo: Don't use pt_regs member in mcontext_t

2022-03-10 Thread soeren--- via Gcc-patches
From: Sören Tempel 

The .regs member is primarily intended to be used in conjunction with
ptrace. Since this code is not using ptrace, using .regs is a bad idea.
Furthermore, the code currently fails to compile on musl since the
pt_regs type (used by .regs) is an incomplete type which has to be
completed by inclusion of the asm/ptrace.h kernel header. Contrary to
glibc, this header is not indirectly included by musl through other
header files.

This patch fixes compilation of this code with musl libc by accessing
the register values via .gp_regs/.gregs (depending on 32-bit or 64-bit
PowerPC) instead of using .regs. For more details, see
https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591261.html

For the offsets in gp_regs, refer to the kernel asm/ptrace.h header.
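
For reference, the indices used by the patch can be given names as in the sketch below. The DEMO_ enumerators and demo_read_reg are hypothetical; the numeric values are the ones the patch uses, which to my understanding correspond to PT_NIP, PT_MSR, PT_CTR, PT_LNK, PT_XER and PT_CCR in that header.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the register-array indices used in the patch
   (indices 0..31 are the general purpose registers r0..r31).  */
enum demo_ppc_reg_index
{
  DEMO_PT_NIP = 32, /* program counter */
  DEMO_PT_MSR = 33, /* machine state register */
  DEMO_PT_CTR = 35, /* count register */
  DEMO_PT_LNK = 36, /* link register */
  DEMO_PT_XER = 37, /* fixed-point exception register */
  DEMO_PT_CCR = 38  /* condition register */
};

/* Read one entry from a register array of NREGS elements.  */
static unsigned long
demo_read_reg (const unsigned long *gpregs, size_t nregs,
               enum demo_ppc_reg_index idx)
{
  assert (gpregs != NULL && (size_t) idx < nregs);
  return gpregs[idx];
}
```

In the real code the array would come from PPC_GPREGS(m); here it is just a plain array so the sketch builds on any host.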

This patch has been tested on Alpine Linux ppc64le (uses musl libc).

Signed-off-by: Sören Tempel 

ChangeLog:

* libgo/runtime/go-signal.c (defined): Use .gp_regs/.gregs
  to access ppc64/ppc32 registers.
(dumpregs): Ditto.
---
Changes since v3: Add special handling for 32-bit PowerPC with glibc,
also avoid use of gregs_t type since glibc does not seem to define
it on PowerPC.

This version of the patch introduces a new macro (PPC_GPREGS) to access
these registers, so that the musl/glibc special-casing lives in one central
place instead of being duplicated.

 libgo/runtime/go-signal.c | 32 
 1 file changed, 24 insertions(+), 8 deletions(-)

diff --git a/libgo/runtime/go-signal.c b/libgo/runtime/go-signal.c
index d30d1603adc..3255046260d 100644
--- a/libgo/runtime/go-signal.c
+++ b/libgo/runtime/go-signal.c
@@ -16,6 +16,21 @@
   #define SA_RESTART 0
 #endif
 
+// The PowerPC API for accessing gregs/gp_regs differs greatly across
+// different libc implementations (musl and glibc).  To workaround that,
+// define the canonical way to access these registers once here.
+//
+// See https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591360.html
+#ifdef __PPC__
+#if defined(__PPC64__)   /* ppc64 glibc & musl */
+#define PPC_GPREGS(MCTX) (MCTX)->gp_regs
+#elif defined(__GLIBC__) /* ppc32 glibc */
+#define PPC_GPREGS(MCTX) (MCTX)->uc_regs->gregs
+#else/* ppc32 musl */
+#define PPC_GPREGS(MCTX) (MCTX)->gregs
+#endif /* __PPC64__ */
+#endif /* __PPC__ */
+
 #ifdef USING_SPLIT_STACK
 
 extern void __splitstack_getcontext(void *context[10]);
@@ -224,7 +239,8 @@ getSiginfo(siginfo_t *info, void *context 
__attribute__((unused)))
 #elif defined(__alpha__) && defined(__linux__)
ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.sc_pc;
 #elif defined(__PPC__) && defined(__linux__)
-   ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.regs->nip;
+   mcontext_t *m = &((ucontext_t*)(context))->uc_mcontext;
+   ret.sigpc = PPC_GPREGS(m)[32];
 #elif defined(__PPC__) && defined(_AIX)
ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.jmp_context.iar;
 #elif defined(__aarch64__) && defined(__linux__)
@@ -341,13 +357,13 @@ dumpregs(siginfo_t *info __attribute__((unused)), void 
*context __attribute__((u
int i;
 
for (i = 0; i < 32; i++)
-   runtime_printf("r%d %X\n", i, m->regs->gpr[i]);
-   runtime_printf("pc  %X\n", m->regs->nip);
-   runtime_printf("msr %X\n", m->regs->msr);
-   runtime_printf("cr  %X\n", m->regs->ccr);
-   runtime_printf("lr  %X\n", m->regs->link);
-   runtime_printf("ctr %X\n", m->regs->ctr);
-   runtime_printf("xer %X\n", m->regs->xer);
+   runtime_printf("r%d %X\n", i, PPC_GPREGS(m)[i]);
+   runtime_printf("pc  %X\n", PPC_GPREGS(m)[32]);
+   runtime_printf("msr %X\n", PPC_GPREGS(m)[33]);
+   runtime_printf("cr  %X\n", PPC_GPREGS(m)[38]);
+   runtime_printf("lr  %X\n", PPC_GPREGS(m)[36]);
+   runtime_printf("ctr %X\n", PPC_GPREGS(m)[35]);
+   runtime_printf("xer %X\n", PPC_GPREGS(m)[37]);
  }
 #elif defined(__PPC__) && defined(_AIX)
  {