date:20220810

[PATCH] rs6000: Enable generate const through pli+pli+rldimi

2022-08-10 Thread Jiufu Guo via Gcc-patches

Hi,

As mentioned in PR106550, since pli could support 34bits immediate, we could
use less instructions(3insn would be ok) to build 64bits constant with pli.

For example, for constant 0x020805006106003, we could generate it with:
asm code1:
pli 9,101736451 (0x6106003)
sldi 9,9,32
paddi 9,9, 213 (0x0208050)

or asm code2:
pli 10, 213
pli 9, 101736451
rldimi 9, 10, 32, 0

If there is only one register can be used, then the asm code1 is ok. Otherwise
asm code2 may be better.

This patch re-enable the constant building(splitter) before RA by updating the
constrains from int_reg_operand_not_pseudo to gpc_reg_operand.  And then, we
could use two different pseduo for two pli(s), and asm code2 can be generated.

This patch also could generate asm code1 if hard register is allocated for the
constant.

This patch pass boostrap and regtest on ppc64le(includes p10).
Is it ok for trunk?

BR,
Jeff(Jiufu)


PR target/106550

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Enable constant
build with pli instructions.
* config/rs6000/rs6000.md: Use gpc_reg_operand for constant splitter.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr106550.c: New test.


---
 gcc/config/rs6000/rs6000.cc | 40 +
 gcc/config/rs6000/rs6000.md |  2 +-
 gcc/testsuite/gcc.target/powerpc/pr106550.c | 14 
 3 files changed, 55 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106550.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index df491bee2ea..a2e6b7f59a0 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10181,6 +10181,46 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
gen_rtx_IOR (DImode, copy_rtx (temp),
 GEN_INT (ud1)));
 }
+  else if (TARGET_PREFIXED)
+{
+  /* pli 9,high32 + pli 10,low32 + rldimi 9,10,32,0.  */
+  if (can_create_pseudo_p ())
+   {
+ temp = gen_reg_rtx (DImode);
+ rtx temp1 = gen_reg_rtx (DImode);
+ emit_move_insn (copy_rtx (temp), GEN_INT ((ud4 << 16) | ud3));
+ emit_move_insn (copy_rtx (temp1), GEN_INT ((ud2 << 16) | ud1));
+
+ rtx one = gen_rtx_AND (DImode, temp1, GEN_INT (0x));
+ rtx two = gen_rtx_ASHIFT (DImode, temp, GEN_INT (32));
+ emit_move_insn (dest, gen_rtx_IOR (DImode, one, two));
+   }
+
+  /* pli 9,high32 + sldi 9,32 + paddi 9,9,low32.  */
+  else
+   {
+ emit_move_insn (copy_rtx (dest), GEN_INT ((ud4 << 16) | ud3));
+
+ emit_move_insn (copy_rtx (dest),
+ gen_rtx_ASHIFT (DImode, copy_rtx (dest),
+ GEN_INT (32)));
+
+ bool cannot_use_paddi = REGNO (dest) == FIRST_GPR_REGNO;
+
+ /* Use paddi for the low32 bits.  */
+ if (ud2 != 0 && ud1 != 0 && !cannot_use_paddi)
+   emit_move_insn (dest, gen_rtx_PLUS (DImode, copy_rtx (dest),
+   GEN_INT ((ud2 << 16) | ud1)));
+ /* Use oris, ori for low32 bits.  */
+ if (ud2 != 0 && (ud1 == 0 || cannot_use_paddi))
+   emit_move_insn (ud1 != 0 ? copy_rtx (dest) : dest,
+   gen_rtx_IOR (DImode, copy_rtx (dest),
+GEN_INT (ud2 << 16)));
+ if (ud1 != 0 && (ud2 == 0 || cannot_use_paddi))
+   emit_move_insn (dest, gen_rtx_IOR (DImode, copy_rtx (dest),
+  GEN_INT (ud1)));
+   }
+}
   else
 {
   temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 1367a2cb779..abe777a593c 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -9659,7 +9659,7 @@ (define_split
 ;; When non-easy constants can go in the TOC, this should use
 ;; easy_fp_constant predicate.
 (define_split
-  [(set (match_operand:DI 0 "int_reg_operand_not_pseudo")
+  [(set (match_operand:DI 0 "gpc_reg_operand")
(match_operand:DI 1 "const_int_operand"))]
   "TARGET_POWERPC64 && num_insns_constant (operands[1], DImode) > 1"
   [(set (match_dup 0) (match_dup 2))
diff --git a/gcc/testsuite/gcc.target/powerpc/pr106550.c 
b/gcc/testsuite/gcc.target/powerpc/pr106550.c
new file mode 100644
index 000..bca7760bad9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr106550.c
@@ -0,0 +1,14 @@
+/* PR target/106550 */
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2 -std=c99 -mdejagnu-cpu=power10" } */
+
+void
+foo (unsigned long long *a)
+{
+  *a++ = 0x020805006106003;
+  *a++ = 0x2351847027482577;  
+}
+
+/* { dg-final { scan-assembler-times {\mpli\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mrldimi\M} 2 } } */
+
-- 
2.17.1

Re: [PATCH 1/3] Factor out jobserver_active_p.

2022-08-10 Thread Martin Liška

On 8/10/22 08:56, Richard Biener wrote:
> C++ standard library includes have to go through system.h (#define
> INCLUDE_STRING).

Oh, yeah. That means I need to rely on the flat header files :/

> 
> Does the API really have to use std::string?

I would like to. My main motivation is std::string::rfind function that
has no C equivalent (would be rstrstr).

MartinFrom 8e1d866e9bf3005c8a8b1afa9df1bdf807b8394c Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 9 Aug 2022 13:59:32 +0200
Subject: [PATCH 1/3] Factor out jobserver_active_p.

gcc/ChangeLog:

	* gcc.cc (driver::detect_jobserver): Remove and move to
	jobserver.h.
	* lto-wrapper.cc (jobserver_active_p): Likewise.
	(run_gcc): Likewise.
	* jobserver.h: New file.
---
 gcc/gcc.cc | 37 +++--
 gcc/jobserver.h| 83 ++
 gcc/lto-wrapper.cc | 44 +---
 3 files changed, 97 insertions(+), 67 deletions(-)
 create mode 100644 gcc/jobserver.h

diff --git a/gcc/gcc.cc b/gcc/gcc.cc
index 5cbb38560b2..ca70dbd3a6d 100644
--- a/gcc/gcc.cc
+++ b/gcc/gcc.cc
@@ -27,6 +27,7 @@ CC recognizes how to compile each input file by suffixes in the file names.
 Once it knows which kind of compilation to perform, the procedure for
 compilation is specified by a string called a "spec".  */
 
+#define INCLUDE_STRING
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
@@ -43,6 +44,7 @@ compilation is specified by a string called a "spec".  */
 #include "opts.h"
 #include "filenames.h"
 #include "spellcheck.h"
+#include "jobserver.h"
 
 
 
@@ -9178,38 +9180,9 @@ driver::final_actions () const
 void
 driver::detect_jobserver () const
 {
-  /* Detect jobserver and drop it if it's not working.  */
-  const char *makeflags = env.get ("MAKEFLAGS");
-  if (makeflags != NULL)
-{
-  const char *needle = "--jobserver-auth=";
-  const char *n = strstr (makeflags, needle);
-  if (n != NULL)
-	{
-	  int rfd = -1;
-	  int wfd = -1;
-
-	  bool jobserver
-	= (sscanf (n + strlen (needle), "%d,%d", &rfd, &wfd) == 2
-	   && rfd > 0
-	   && wfd > 0
-	   && is_valid_fd (rfd)
-	   && is_valid_fd (wfd));
-
-	  /* Drop the jobserver if it's not working now.  */
-	  if (!jobserver)
-	{
-	  unsigned offset = n - makeflags;
-	  char *dup = xstrdup (makeflags);
-	  dup[offset] = '\0';
-
-	  const char *space = strchr (makeflags + offset, ' ');
-	  if (space != NULL)
-		strcpy (dup + offset, space);
-	  xputenv (concat ("MAKEFLAGS=", dup, NULL));
-	}
-	}
-}
+  jobserver_info jinfo;
+  if (!jinfo.is_active && !jinfo.skipped_makeflags.empty ())
+xputenv (jinfo.skipped_makeflags.c_str ());
 }
 
 /* Determine what the exit code of the driver should be.  */
diff --git a/gcc/jobserver.h b/gcc/jobserver.h
new file mode 100644
index 000..d57930ff7af
--- /dev/null
+++ b/gcc/jobserver.h
@@ -0,0 +1,83 @@
+/* GNU make's jobserver related functionality.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.
+
+See dbgcnt.def for usage information.  */
+
+#ifndef GCC_JOBSERVER_H
+#define GCC_JOBSERVER_H
+
+using namespace std;
+
+struct jobserver_info
+{
+  /* Default constructor.  */
+  jobserver_info ();
+
+  /* Error message if there is a problem.  */
+  string error_msg = "";
+  /* Skipped MAKEFLAGS where --jobserver-auth is skipped.  */
+  string skipped_makeflags = "";
+  /* File descriptor for reading used for jobserver communication.  */
+  int rfd = -1;
+  /* File descriptor for writing used for jobserver communication.  */
+  int wfd = -1;
+  /* Return true if jobserver is active.  */
+  bool is_active = false;
+};
+
+jobserver_info::jobserver_info ()
+{
+  /* Detect jobserver and drop it if it's not working.  */
+  string js_needle = "--jobserver-auth=";
+
+  const char *envval = getenv ("MAKEFLAGS");
+  if (envval != NULL)
+{
+  string makeflags = envval;
+  size_t n = makeflags.rfind (js_needle);
+  if (n != string::npos)
+	{
+	  if (sscanf (makeflags.c_str () + n + js_needle.size (),
+		  "%d,%d", &rfd, &wfd) == 2
+	  && rfd > 0
+	  && wfd > 0
+	  && is_valid_fd (rfd)
+	  && is_valid_fd (wfd))
+	is_active = true;
+	  else
+	{
+	  string dup = makeflags.substr (0, n);
+	  size_t pos = makeflags.find (' ', n);
+	  if (pos != string::npos)
+		dup += m

Re: [PATCH v2, rs6000] Add multiply-add expand pattern [PR103109]

2022-08-10 Thread Kewen.Lin via Gcc-patches

on 2022/8/10 05:34, Segher Boessenkool wrote:
> On Tue, Aug 09, 2022 at 11:14:16AM +0800, Kewen.Lin wrote:
>> on 2022/8/8 14:04, HAO CHEN GUI wrote:
>>> +/* { dg-do run { target { has_arch_ppc64 } } } */
>>> +/* { dg-options "-O2 -mdejagnu-cpu=power9 -save-temps" } */
>>> +/* { dg-require-effective-target int128 } */
>>> +/* { dg-require-effective-target p9modulo_hw } */
>>> +/* { dg-final { scan-assembler-times {\mmaddld\M} 2 } } */
>>> +/* { dg-final { scan-assembler-times {\mmaddhd\M} 1 } } */
>>> +/* { dg-final { scan-assembler-times {\mmaddhdu\M} 1 } } */
>>> +
>>
>> Maybe it's good to split this case into two, one for compiling and the other 
>> for running.
>> Since the generated asm is a test point here, with one separated case for 
>> compiling, we
>> can still have that part of test coverage on hosts which are unable to run 
>> this case.
>> You can move functions multiply_add and multiply_addu into one common header 
>> file, then
>> include it in both source files.
> 
> Yeah, good point.  You cannot make dg-do do different things on
> different targets.  Fortunatelt just duplicating this test and then
> removing the things not relevant to run resp. compile testing makes
> things even more clear :-)
> 
>> Nit: better to add one explicit "return 0;" to avoid possible warning.
> 
> This is in main(), the C standard requires this to work without return
> (and it is common).  But, before C99 the implicit return value from
> main() was undefined, so yes, it could warn then.  Does it?
> 

Yes, exactly, with explicit -std=c89 -Wreturn-type it will have a warning:

warning: control reaches end of non-void function...

BR,
Kewen

Re: [PATCH] rs6000: Rework ELFv2 support for -fpatchable-function-entry* [PR99888]

2022-08-10 Thread Kewen.Lin via Gcc-patches

on 2022/8/10 05:10, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Aug 09, 2022 at 08:51:59PM +0800, Kewen.Lin wrote:
>> on 2022/8/9 18:35, Segher Boessenkool wrote:
 +/* As ELFv2 ABI shows, the allowable bytes past the global entry
 +   point are 0, 4, 8, 16, 32 and 64.  Considering there are two
 +   non-prefixed instructions for global entry (8 bytes), the count
 +   for patchable NOPs before local entry would be 2, 6 and 14.  */
>>>
>>> The other option is to allow other numbers of nops, but in that case not
>>> have a local entry point (so, always use the global entry point).
>>
>> Good point, it's doable, but it means for the other counts of NOPs, the
>> patched function has to pay the cost of TOC initialization all the time,
>> IMHO it may not be what we want.
> 
> It isn't very expensive: the main benefit of the LEP is not not having
> to do those two insns, but having the r2 setter earlier, allowing loads
> via the TOC reg to execute earlier.
> 

OK.

>>> I don't know if that is useful for any users of this support (if there
>>> even are such users :-P )
>>
>> Yeah, as the discussions in PR98125, powerpc linux kernel doesn't adopt
>> this feature.  :-P
> 
> Right, -mprofile-kernel is more efficient.
> 
> So maybe just say in the comment that it is possible to support those
> other nop pad sizes, by not doing a LEP at all?  Instead of sasying it
> cannot be done :-)

OK, I'll update the comments like:

/* As ELFv2 ABI shows, the allowable bytes past the global entry
   point are 0, 4, 8, 16, 32 and 64 when there is a local entry.
   Considering there are two non-prefixed instructions for global
   entry (8 bytes), the count for patchable NOPs before local entry
   would be 2, 6 and 14.  It's possible to support those other
   counts of NOPs by not doing a local entry at all, but we don't
   have clear user cases for them, so leave them unsupported for
   now.  */

> 
>>
>>>
 +if (patch_area_entry > 0)
 +  {
 +if (patch_area_entry != 2
 +&& patch_area_entry != 6
 +&& patch_area_entry != 14)
 +  error ("for %<-fpatchable-function-entry=%u,%u%>, patching "
 + "%u NOP(s) before function entry is invalid, it can "
 + "cause assembler error",
>>>
>>> I would not say "it can [etc.]" at all.  Oh, and "NOP" (capitals) isn't
>>> a thing, it is not an acronym or such ;-)
>>>
>>
>> Poor at wording.  :(  Could you help to suggest some words here? 
> 
> I'll try...
> 
> "unsupported number of nops before function entry (%u)"
> 

Nice, will update with this.

 +/* { dg-require-effective-target powerpc_elfv2 } */
 +/* Specify -mcpu=power9 to ensure global entry is needed.  */
 +/* { dg-options "-mdejagnu-cpu=power9" } */
>>>
>>> Why would it be needed for p9, and not older, or newer?
>>>
>>
>> It can be p8 or p9, but not p10 and later.  
>>
>> It's meant to exclude pc-relative feature which can make the case not
>> generate a global entry point prologue and the test point will become
>> unavailable.  I thought about adding -mno-pcrel, but guessed it's safer
>> to use one cpu type which doesn't support pcrel at all, since it can
>> exclude all possibilities that pcrel gets re-enabled.
>>
>> Do you think -mno-pcrel is more elegant and relatively safe?
>> Or just update the comments to make it more meaningful?
> 
> Just use { ! powerpc_pcrel } ?  I don't think you can put that in a
> dg-require-effective-target, but you can do for example
>   dg-do compile { target { ! powerpc_pcrel } }
> or similar.
> 

Good idea, I'll send out one new version of patch after some testings.

BR,
Kewen

Re: [PATCH 1/3] Factor out jobserver_active_p.

2022-08-10 Thread Richard Biener via Gcc-patches

On Wed, Aug 10, 2022 at 9:17 AM Martin Liška  wrote:
>
> On 8/10/22 08:56, Richard Biener wrote:
> > C++ standard library includes have to go through system.h (#define
> > INCLUDE_STRING).
>
> Oh, yeah. That means I need to rely on the flat header files :/
>
> >
> > Does the API really have to use std::string?
>
> I would like to. My main motivation is std::string::rfind function that
> has no C equivalent (would be rstrstr).

The old code happily uses strstr though, not worrying about
finding the last instance of --jobserver-auth?

Anyway, I'm not going to insist - I just noticed the actual
users use .c_str on the error message and adjusting the
environment for a not working jobserver is done
inconsistently.  Since I'm coming from C I was more
expecting sth like

 bool jobserver_active = probe_jobserver (true /* diagnose */);

rather than pulling in a class instance from an all-inline
implementation.  But hey ;)

>
> Martin

Re: [PATCH 1/3] Factor out jobserver_active_p.

2022-08-10 Thread Martin Liška

On 8/10/22 09:47, Richard Biener wrote:
> On Wed, Aug 10, 2022 at 9:17 AM Martin Liška  wrote:
>>
>> On 8/10/22 08:56, Richard Biener wrote:
>>> C++ standard library includes have to go through system.h (#define
>>> INCLUDE_STRING).
>>
>> Oh, yeah. That means I need to rely on the flat header files :/
>>
>>>
>>> Does the API really have to use std::string?
>>
>> I would like to. My main motivation is std::string::rfind function that
>> has no C equivalent (would be rstrstr).
> 
> The old code happily uses strstr though, not worrying about
> finding the last instance of --jobserver-auth?

Yes, sorry, I forgot to mention that, it's something I was notified by the GNU 
make
developer here: https://savannah.gnu.org/bugs/index.php?57242#comment13

> 
> Anyway, I'm not going to insist - I just noticed the actual
> users use .c_str on the error message and adjusting the
> environment for a not working jobserver is done
> inconsistently.  Since I'm coming from C I was more
> expecting sth like
> 
>  bool jobserver_active = probe_jobserver (true /* diagnose */);

Well, the main problem is that I need to "extra" a bunch of information
when parsing the env variable (and each consumer needs something else,
so that's why the jobserver_info members). It was very ugly having all
these return values being given as params (of pointer type).

Martin

> 
> rather than pulling in a class instance from an all-inline
> implementation.  But hey ;)
> 
> 
>>
>> Martin

Re: [PATCH] tree-optimization/106514 - revisit m_import compute in backward threading

2022-08-10 Thread Richard Biener via Gcc-patches

On Wed, 10 Aug 2022, Richard Biener wrote:

> On Tue, 9 Aug 2022, Andrew MacLeod wrote:
> 
> > 
> > On 8/9/22 09:01, Richard Biener wrote:
> > > This revisits how we compute imports later used for the ranger path
> > > query during backwards threading.  The compute_imports function
> > > of the path solver ends up pulling the SSA def chain of regular
> > > stmts without limit and since it starts with just the gori imports
> > > of the path exit it misses some interesting names to translate
> > > during path discovery.  In fact with a still empty path this
> > > compute_imports function looks like not the correct tool.
> > 
> > I don't really know how this works in practice.  Aldys off this week, so he
> > can comment when he returns.
> > 
> > The original premise was along the line of recognizing that only changes to 
> > a
> > GORI import name to a block can affect the branch at the end of the block. 
> > ie, if the path doesn't change any import to block A, then the branch at the
> > end of block A will not change either.    Likewise, if it does change an
> > import, then we look at whether the branch can be threaded.    Beyond that
> > basic premise, I dont know what all it does.
> 
> Yep, I also think that's the idea.

[...]

> Anyway, the important result of the change is that the imports
> set is vastly smaller since it is now constrained to the
> actual path.  Plus it now contains the local defs in blocks
> of the path that do not dominate the exit block which means
> it might get more threading opportunities.  Which reminds me
> to re-do the cc1files experiment for this change.

Results are a bit unconclusive.  We have
ethread 14044 -> 14058, first threadfull 36986 -> 37174,
first thread 8060 -> 8049, dom 21723 -> 21822, second thread
(after loop opts) 6242 -> 5582, second dom 8522 -> 8765,
second threadfull 3072 -> 2998 which makes an overall drop
in the number of threads from 98649 to 98448.

I've isolated one testcase we threaded before but no longer after
this change and that is

void foo (int nest, int print_nest)
{
  _Bool t0 = nest != 0;
  _Bool t1 = nest == print_nest;
  _Bool t2 = t0 & t1;
  if (t2)
__builtin_puts ("x");
  nest++;
  if (nest > 2)
__builtin_abort ();
  if (print_nest == nest)
__builtin_puts ("y");
}

where we are able to thread from if (t2) to if (print_nest == nest)
resolving that to false when t2 is true using the nest == print_nest
relation.  Now, the reason is because of the imports added by

  // Exported booleans along the path, may help conditionals.
  if (m_resolve)
for (i = 0; i < m_path.length (); ++i)
  {   
basic_block bb = m_path[i];
tree name;
FOR_EACH_GORI_EXPORT_NAME (gori, bb, name)
  if (TREE_CODE (TREE_TYPE (name)) == BOOLEAN_TYPE)
bitmap_set_bit (imports, SSA_NAME_VERSION (name));
  }

_BUT_ - it turns out that the m_path local to the solver is still
the one from the last back_threader::find_taken_edge_switch
calling m_solver->compute_ranges (path, m_imports), so for a possibly
completely unrelated path.  Truncating the path also on the solver
side before calling compute_imports makes the testcase fail to thread.
I'll note the PHI handling also looks at the stale m_path.

I see the solver itself adds relations from edges on the path so
the cruical item here seems to be to add imports for the path
entry conditional, but those would likely be GORI imports for that
block?  Unfortunately that fails to add t[012], the GORI exports
seem to cover all that's needed but then exports might be too much
here?  At least that's what the code in compute_imports effectively
does (also for the entry block).  But I do wonder why compute_ranges
does not add relations computed by the entry conditional ...
That's probably because of

void
path_range_query::compute_outgoing_relations (basic_block bb, basic_block 
next)
{
  gimple *stmt = last_stmt (bb);

  if (stmt
  && gimple_code (stmt) == GIMPLE_COND
  && (import_p (gimple_cond_lhs (stmt))
  || import_p (gimple_cond_rhs (stmt

where for a condition like above the names in the imports are not
in the condition itself.  The gori.outgoing_edge_range_p compute
of the import ranges is appearantly not affected by this
restriction and picks up nest as ~[0, 0] on the respective path
even though, while 'nest' is in imports, the temporaries are not.
That would support the argument to drop the import_p checks in
path_range_query::compute_outgoing_relations.

I'm not sure it's worth fixing incrementally though, it's fallout
of r12-5157-gbfa04d0ec958eb that in some way did the reverse of
my patch.  Interestingly hybrid_jt_simplifier::compute_ranges_from_state
still computes imports itself before calling compute_ranges.

For comparing thread numbers fixing the current state incrementally
would be nice I guess, I will test something but not further pursue it
if not successful.

Richard.

Re: [PATCH 1/3] Factor out jobserver_active_p.

2022-08-10 Thread Richard Biener via Gcc-patches

On Wed, Aug 10, 2022 at 11:30 AM Martin Liška  wrote:
>
> On 8/10/22 09:47, Richard Biener wrote:
> > On Wed, Aug 10, 2022 at 9:17 AM Martin Liška  wrote:
> >>
> >> On 8/10/22 08:56, Richard Biener wrote:
> >>> C++ standard library includes have to go through system.h (#define
> >>> INCLUDE_STRING).
> >>
> >> Oh, yeah. That means I need to rely on the flat header files :/
> >>
> >>>
> >>> Does the API really have to use std::string?
> >>
> >> I would like to. My main motivation is std::string::rfind function that
> >> has no C equivalent (would be rstrstr).
> >
> > The old code happily uses strstr though, not worrying about
> > finding the last instance of --jobserver-auth?
>
> Yes, sorry, I forgot to mention that, it's something I was notified by the 
> GNU make
> developer here: https://savannah.gnu.org/bugs/index.php?57242#comment13
>
> >
> > Anyway, I'm not going to insist - I just noticed the actual
> > users use .c_str on the error message and adjusting the
> > environment for a not working jobserver is done
> > inconsistently.  Since I'm coming from C I was more
> > expecting sth like
> >
> >  bool jobserver_active = probe_jobserver (true /* diagnose */);
>
> Well, the main problem is that I need to "extra" a bunch of information
> when parsing the env variable (and each consumer needs something else,
> so that's why the jobserver_info members). It was very ugly having all
> these return values being given as params (of pointer type).

Yeah, fair enough.

> Martin
>
> >
> > rather than pulling in a class instance from an all-inline
> > implementation.  But hey ;)
> >
> >
> >>
> >> Martin
>

Re: [PATCH 3/3] lto: respect jobserver in parallel WPA streaming

2022-08-10 Thread Richard Biener via Gcc-patches

On Tue, Aug 9, 2022 at 2:02 PM Martin Liška  wrote:
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?
> Thanks,
> Martin
>
> PR lto/106328
>
> gcc/ChangeLog:
>
> * jobserver.h (struct jobserver_info): Add pipefd.
> (jobserver_info::connect): New.
> (jobserver_info::disconnect): Likewise.
> (jobserver_info::get_token): Likewise.
> (jobserver_info::return_token): Likewise.
>
> gcc/lto/ChangeLog:
>
> * lto.cc (wait_for_child): Decrement nruns once a process
> finishes.
> (stream_out_partitions): Use job server if active.
> (do_whole_program_analysis): Likewise.
> ---
>  gcc/jobserver.h | 54 
>  gcc/lto/lto.cc  | 55 +
>  2 files changed, 96 insertions(+), 13 deletions(-)
>
> diff --git a/gcc/jobserver.h b/gcc/jobserver.h
> index 856e326ddfc..2a7dc9f4113 100644
> --- a/gcc/jobserver.h
> +++ b/gcc/jobserver.h
> @@ -31,6 +31,18 @@ struct jobserver_info
>/* Default constructor.  */
>jobserver_info ();
>
> +  /* Connect to the server.  */
> +  void connect ();
> +
> +  /* Disconnect from the server.  */
> +  void disconnect ();
> +
> +  /* Get token from the server.  */
> +  bool get_token ();
> +
> +  /* Return token to the server.  */
> +  void return_token ();
> +
>/* Error message if there is a problem.  */
>string error_msg = "";
>/* Skipped MAKEFLAGS where --jobserver-auth is skipped.  */
> @@ -41,6 +53,8 @@ struct jobserver_info
>int wfd = -1;
>/* Named pipe path.  */
>string pipe_path = "";
> +  /* Pipe file descriptor.  */
> +  int pipefd = -1;
>/* Return true if jobserver is active.  */
>bool is_active = false;
>  };
> @@ -97,4 +111,44 @@ jobserver_info::jobserver_info ()
>  error_msg = "jobserver is not available: " + error_msg;
>  }
>
> +void
> +jobserver_info::connect ()
> +{
> +  if (!pipe_path.empty ())
> +pipefd = open (pipe_path.c_str (), O_RDWR);
> +}
> +
> +void
> +jobserver_info::disconnect ()

If it's not a template and the methods not inline won't we get multiple
definition errors when this include file is included in multiple places
from an executable?  opts-common.cc might be a good place to
stuff it, and opts.h to declare (if it were not for std::string, so possibly
jobserver.h is good enough, or opts-jobserver.h?)

> +{
> +  if (!pipe_path.empty ())
> +{
> +  gcc_assert (close (pipefd) == 0);
> +  pipefd = -1;
> +}
> +}
> +
> +bool
> +jobserver_info::get_token ()
> +{
> +  int fd = pipe_path.empty () ? rfd : pipefd;
> +  char c;
> +  unsigned n = read (fd, &c, 1);
> +  if (n != 1)
> +{
> +  gcc_assert (errno == EAGAIN);
> +  return false;
> +}
> +  else
> +return true;
> +}
> +
> +void
> +jobserver_info::return_token ()
> +{
> +  int fd = pipe_path.empty () ? wfd : pipefd;
> +  char c = 'G';
> +  gcc_assert (write (fd, &c, 1) == 1);
> +}
> +
>  #endif /* GCC_JOBSERVER_H */
> diff --git a/gcc/lto/lto.cc b/gcc/lto/lto.cc
> index 31b0c1862f7..56266195ead 100644
> --- a/gcc/lto/lto.cc
> +++ b/gcc/lto/lto.cc
> @@ -54,11 +54,17 @@ along with GCC; see the file COPYING3.  If not see
>  #include "attribs.h"
>  #include "builtins.h"
>  #include "lto-common.h"
> -
> +#include "jobserver.h"
>
>  /* Number of parallel tasks to run, -1 if we want to use GNU Make jobserver. 
>  */
>  static int lto_parallelism;
>
> +/* Number of active WPA streaming processes.  */
> +static int nruns = 0;
> +
> +/* GNU make's jobserver info.  */
> +static jobserver_info *jinfo = NULL;
> +
>  /* Return true when NODE has a clone that is analyzed (i.e. we need
> to load its body even if the node itself is not needed).  */
>
> @@ -205,6 +211,12 @@ wait_for_child ()
>  "streaming subprocess was killed by signal");
>  }
>while (!WIFEXITED (status) && !WIFSIGNALED (status));
> +
> +--nruns;
> +
> +/* Return token to the jobserver if active.  */
> +if (jinfo != NULL && jinfo->is_active)
> +  jinfo->return_token ();
>  }
>  #endif
>
> @@ -228,25 +240,35 @@ stream_out_partitions (char *temp_filename, int blen, 
> int min, int max,
>bool ARG_UNUSED (last))
>  {
>  #ifdef HAVE_WORKING_FORK
> -  static int nruns;
> -
>if (lto_parallelism <= 1)
>  {
>stream_out_partitions_1 (temp_filename, blen, min, max);
>return;
>  }
>
> -  /* Do not run more than LTO_PARALLELISM streamings
> - FIXME: we ignore limits on jobserver.  */
>if (lto_parallelism > 0 && nruns >= lto_parallelism)
> -{
> -  wait_for_child ();
> -  nruns --;
> -}
> +wait_for_child ();
> +
>/* If this is not the last parallel partition, execute new
>   streaming process.  */
>if (!last)
>  {
> +  if (jinfo != NULL && jinfo->is_active)
> +   while (true)
> + {
> +   if (jinfo->get_token ())
> +

Re: [PATCH 3/3] lto: respect jobserver in parallel WPA streaming

2022-08-10 Thread Martin Liška

On 8/10/22 12:52, Richard Biener wrote:
> If it's not a template and the methods not inline won't we get multiple
> definition errors when this include file is included in multiple places
> from an executable?  opts-common.cc might be a good place to
> stuff it, and opts.h to declare (if it were not for std::string, so possibly
> jobserver.h is good enough, or opts-jobserver.h?)

Works for me, updated in v2.

MartinFrom ff2073e1f549df40a5e859e2cde9a403307c1960 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 9 Aug 2022 13:59:39 +0200
Subject: [PATCH 3/3] lto: respect jobserver in parallel WPA streaming

	PR lto/106328

gcc/ChangeLog:

	* opts-jobserver.h (struct jobserver_info): Add pipefd.
	(jobserver_info::connect): New.
	(jobserver_info::disconnect): Likewise.
	(jobserver_info::get_token): Likewise.
	(jobserver_info::return_token): Likewise.
	* opts-common.cc: Implement the new functions.

gcc/lto/ChangeLog:

	* lto.cc (wait_for_child): Decrement nruns once a process
	finishes.
	(stream_out_partitions): Use job server if active.
	(do_whole_program_analysis): Likewise.
---
 gcc/lto/lto.cc   | 58 +---
 gcc/opts-common.cc   | 42 
 gcc/opts-jobserver.h | 14 +++
 3 files changed, 100 insertions(+), 14 deletions(-)

diff --git a/gcc/lto/lto.cc b/gcc/lto/lto.cc
index 31b0c1862f7..c82307f4f7e 100644
--- a/gcc/lto/lto.cc
+++ b/gcc/lto/lto.cc
@@ -18,6 +18,7 @@ You should have received a copy of the GNU General Public License
 along with GCC; see the file COPYING3.  If not see
 .  */
 
+#define INCLUDE_STRING
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
@@ -54,11 +55,17 @@ along with GCC; see the file COPYING3.  If not see
 #include "attribs.h"
 #include "builtins.h"
 #include "lto-common.h"
+#include "opts-jobserver.h"
 
-
-/* Number of parallel tasks to run, -1 if we want to use GNU Make jobserver.  */
+/* Number of parallel tasks to run.  */
 static int lto_parallelism;
 
+/* Number of active WPA streaming processes.  */
+static int nruns = 0;
+
+/* GNU make's jobserver info.  */
+static jobserver_info *jinfo = NULL;
+
 /* Return true when NODE has a clone that is analyzed (i.e. we need
to load its body even if the node itself is not needed).  */
 
@@ -205,6 +212,12 @@ wait_for_child ()
 		 "streaming subprocess was killed by signal");
 }
   while (!WIFEXITED (status) && !WIFSIGNALED (status));
+
+--nruns;
+
+/* Return token to the jobserver if active.  */
+if (jinfo != NULL && jinfo->is_active)
+  jinfo->return_token ();
 }
 #endif
 
@@ -228,25 +241,35 @@ stream_out_partitions (char *temp_filename, int blen, int min, int max,
 		   bool ARG_UNUSED (last))
 {
 #ifdef HAVE_WORKING_FORK
-  static int nruns;
-
   if (lto_parallelism <= 1)
 {
   stream_out_partitions_1 (temp_filename, blen, min, max);
   return;
 }
 
-  /* Do not run more than LTO_PARALLELISM streamings
- FIXME: we ignore limits on jobserver.  */
   if (lto_parallelism > 0 && nruns >= lto_parallelism)
-{
-  wait_for_child ();
-  nruns --;
-}
+wait_for_child ();
+
   /* If this is not the last parallel partition, execute new
  streaming process.  */
   if (!last)
 {
+  if (jinfo != NULL && jinfo->is_active)
+	while (true)
+	  {
+	if (jinfo->get_token ())
+	  break;
+	if (nruns > 0)
+	  wait_for_child ();
+	else
+	  {
+		/* There are no free tokens, lets do the job outselves.  */
+		stream_out_partitions_1 (temp_filename, blen, min, max);
+		asm_nodes_output = true;
+		return;
+	  }
+	  }
+
   pid_t cpid = fork ();
 
   if (!cpid)
@@ -264,10 +287,12 @@ stream_out_partitions (char *temp_filename, int blen, int min, int max,
   /* Last partition; stream it and wait for all children to die.  */
   else
 {
-  int i;
   stream_out_partitions_1 (temp_filename, blen, min, max);
-  for (i = 0; i < nruns; i++)
+  while (nruns > 0)
 	wait_for_child ();
+
+  if (jinfo != NULL && jinfo->is_active)
+	jinfo->disconnect ();
 }
   asm_nodes_output = true;
 #else
@@ -460,9 +485,14 @@ do_whole_program_analysis (void)
 
   lto_parallelism = 1;
 
-  /* TODO: jobserver communication is not supported, yet.  */
   if (!strcmp (flag_wpa, "jobserver"))
-lto_parallelism = param_max_lto_streaming_parallelism;
+{
+  jinfo = new jobserver_info ();
+  if (jinfo->is_active)
+	jinfo->connect ();
+
+  lto_parallelism = param_max_lto_streaming_parallelism;
+}
   else
 {
   lto_parallelism = atoi (flag_wpa);
diff --git a/gcc/opts-common.cc b/gcc/opts-common.cc
index c2993f9140a..6c9355f6372 100644
--- a/gcc/opts-common.cc
+++ b/gcc/opts-common.cc
@@ -2059,3 +2059,45 @@ jobserver_info::jobserver_info ()
   if (!error_msg.empty ())
 error_msg = "jobserver is not available: " + error_msg;
 }
+
+void
+jobserver_info::connect ()
+{
+  if (!pipe_path.empty ())
+pipef

Re: [PATCH 1/3] Factor out jobserver_active_p.

2022-08-10 Thread Martin Liška

On 8/10/22 12:47, Richard Biener wrote:
> Yeah, fair enough.

I'm going to install the v3 where I renamed jobserver.h
and moved the ctor implementation to opts-common.cc.

Cheers,
MartinFrom 6f3abcb3fa26f6ed719450f6bc70b2b37179973a Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 9 Aug 2022 13:59:32 +0200
Subject: [PATCH 1/3] Factor out jobserver_active_p.

gcc/ChangeLog:

	* gcc.cc (driver::detect_jobserver): Remove and move to
	jobserver.h.
	* lto-wrapper.cc (jobserver_active_p): Likewise.
	(run_gcc): Likewise.
	* opts-jobserver.h: New file.
	* opts-common.cc (jobserver_info::jobserver_info): New function.
---
 gcc/gcc.cc   | 37 +
 gcc/lto-wrapper.cc   | 44 +---
 gcc/opts-common.cc   | 41 +
 gcc/opts-jobserver.h | 44 
 4 files changed, 99 insertions(+), 67 deletions(-)
 create mode 100644 gcc/opts-jobserver.h

diff --git a/gcc/gcc.cc b/gcc/gcc.cc
index 5cbb38560b2..cac11c1a117 100644
--- a/gcc/gcc.cc
+++ b/gcc/gcc.cc
@@ -27,6 +27,7 @@ CC recognizes how to compile each input file by suffixes in the file names.
 Once it knows which kind of compilation to perform, the procedure for
 compilation is specified by a string called a "spec".  */
 
+#define INCLUDE_STRING
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
@@ -43,6 +44,7 @@ compilation is specified by a string called a "spec".  */
 #include "opts.h"
 #include "filenames.h"
 #include "spellcheck.h"
+#include "opts-jobserver.h"
 
 
 
@@ -9178,38 +9180,9 @@ driver::final_actions () const
 void
 driver::detect_jobserver () const
 {
-  /* Detect jobserver and drop it if it's not working.  */
-  const char *makeflags = env.get ("MAKEFLAGS");
-  if (makeflags != NULL)
-{
-  const char *needle = "--jobserver-auth=";
-  const char *n = strstr (makeflags, needle);
-  if (n != NULL)
-	{
-	  int rfd = -1;
-	  int wfd = -1;
-
-	  bool jobserver
-	= (sscanf (n + strlen (needle), "%d,%d", &rfd, &wfd) == 2
-	   && rfd > 0
-	   && wfd > 0
-	   && is_valid_fd (rfd)
-	   && is_valid_fd (wfd));
-
-	  /* Drop the jobserver if it's not working now.  */
-	  if (!jobserver)
-	{
-	  unsigned offset = n - makeflags;
-	  char *dup = xstrdup (makeflags);
-	  dup[offset] = '\0';
-
-	  const char *space = strchr (makeflags + offset, ' ');
-	  if (space != NULL)
-		strcpy (dup + offset, space);
-	  xputenv (concat ("MAKEFLAGS=", dup, NULL));
-	}
-	}
-}
+  jobserver_info jinfo;
+  if (!jinfo.is_active && !jinfo.skipped_makeflags.empty ())
+xputenv (jinfo.skipped_makeflags.c_str ());
 }
 
 /* Determine what the exit code of the driver should be.  */
diff --git a/gcc/lto-wrapper.cc b/gcc/lto-wrapper.cc
index 795ab74555c..1e8eba16dfb 100644
--- a/gcc/lto-wrapper.cc
+++ b/gcc/lto-wrapper.cc
@@ -37,6 +37,7 @@ along with GCC; see the file COPYING3.  If not see
./ccCJuXGv.lto.ltrans.o
 */
 
+#define INCLUDE_STRING
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
@@ -49,6 +50,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "lto-section-names.h"
 #include "collect-utils.h"
 #include "opts-diagnostic.h"
+#include "opt-suggestions.h"
+#include "opts-jobserver.h"
 
 /* Environment variable, used for passing the names of offload targets from GCC
driver to lto-wrapper.  */
@@ -1336,35 +1339,6 @@ init_num_threads (void)
 #endif
 }
 
-/* Test and return reason why a jobserver cannot be detected.  */
-
-static const char *
-jobserver_active_p (void)
-{
-  #define JS_PREFIX "jobserver is not available: "
-  #define JS_NEEDLE "--jobserver-auth="
-
-  const char *makeflags = getenv ("MAKEFLAGS");
-  if (makeflags == NULL)
-return JS_PREFIX "% environment variable is unset";
-
-  const char *n = strstr (makeflags, JS_NEEDLE);
-  if (n == NULL)
-return JS_PREFIX "%<" JS_NEEDLE "%> is not present in %";
-
-  int rfd = -1;
-  int wfd = -1;
-
-  if (sscanf (n + strlen (JS_NEEDLE), "%d,%d", &rfd, &wfd) == 2
-  && rfd > 0
-  && wfd > 0
-  && is_valid_fd (rfd)
-  && is_valid_fd (wfd))
-return NULL;
-  else
-return JS_PREFIX "cannot access %<" JS_NEEDLE "%> file descriptors";
-}
-
 /* Print link to -flto documentation with a hint message.  */
 
 void
@@ -1422,7 +1396,6 @@ run_gcc (unsigned argc, char *argv[])
   bool jobserver_requested = false;
   int auto_parallel = 0;
   bool no_partition = false;
-  const char *jobserver_error = NULL;
   bool fdecoded_options_first = true;
   vec fdecoded_options;
   fdecoded_options.create (16);
@@ -1653,14 +1626,14 @@ run_gcc (unsigned argc, char *argv[])
 }
   else
 {
-  jobserver_error = jobserver_active_p ();
-  if (jobserver && jobserver_error != NULL)
+  jobserver_info jinfo;
+  if (jobserver && !jinfo.is_active)
 	{
 	  /* Fall back to auto parallelism.  */
 	  jobserver = 0;
 	  auto_parallel = 1;
 	}
-

Re: [PATCH 2/3] lto: support --jobserver-style=fifo for recent GNU make

2022-08-10 Thread Martin Liška

On 8/9/22 14:00, Martin Liška wrote:
> |Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Ready 
> to be installed? Thanks, Martin|

Sending v2 where I renamed the file. I'm going to install it as the original 
fifo support
patch was already approved by Richi.

Cheers,
MartinFrom 4a678594435e2133166fad7cdc7ed144205455ff Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 9 Aug 2022 13:59:36 +0200
Subject: [PATCH 2/3] lto: support --jobserver-style=fifo for recent GNU make

gcc/ChangeLog:

	* opts-jobserver.h: Add one member.
	* opts-common.cc (jobserver_info::jobserver_info): Parse FIFO
	format of --jobserver-auth.
---
 gcc/opts-common.cc   | 17 +++--
 gcc/opts-jobserver.h |  2 ++
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/gcc/opts-common.cc b/gcc/opts-common.cc
index 4d4f424df13..c2993f9140a 100644
--- a/gcc/opts-common.cc
+++ b/gcc/opts-common.cc
@@ -2010,8 +2010,14 @@ void prepend_xassembler_to_collect_as_options (const char *collect_as_options,
 
 jobserver_info::jobserver_info ()
 {
+  /* Traditionally, GNU make uses opened pipes for jobserver-auth,
+e.g. --jobserver-auth=3,4.
+Starting with GNU make 4.4, one can use --jobserver-style=fifo
+and then named pipe is used: --jobserver-auth=fifo:/tmp/hcsparta.  */
+
   /* Detect jobserver and drop it if it's not working.  */
   string js_needle = "--jobserver-auth=";
+  string fifo_prefix = "fifo:";
 
   const char *envval = getenv ("MAKEFLAGS");
   if (envval != NULL)
@@ -2020,8 +2026,15 @@ jobserver_info::jobserver_info ()
   size_t n = makeflags.rfind (js_needle);
   if (n != string::npos)
 	{
-	  if (sscanf (makeflags.c_str () + n + js_needle.size (),
-		  "%d,%d", &rfd, &wfd) == 2
+	  string ending = makeflags.substr (n + js_needle.size ());
+	  if (ending.find (fifo_prefix) == 0)
+	{
+	  ending = ending.substr (fifo_prefix.size ());
+	  pipe_path = ending.substr (0, ending.find (' '));
+	  is_active = true;
+	}
+	  else if (sscanf (makeflags.c_str () + n + js_needle.size (),
+			   "%d,%d", &rfd, &wfd) == 2
 	  && rfd > 0
 	  && wfd > 0
 	  && is_valid_fd (rfd)
diff --git a/gcc/opts-jobserver.h b/gcc/opts-jobserver.h
index 68ce188b84a..98ea2579962 100644
--- a/gcc/opts-jobserver.h
+++ b/gcc/opts-jobserver.h
@@ -37,6 +37,8 @@ struct jobserver_info
   int rfd = -1;
   /* File descriptor for writing used for jobserver communication.  */
   int wfd = -1;
+  /* Named pipe path.  */
+  string pipe_path = "";
   /* Return true if jobserver is active.  */
   bool is_active = false;
 };
-- 
2.37.1

Re: [PATCH] libstdc++: Outline the overlapping case of string _M_replace into a separate function [PR105329]

2022-08-10 Thread Jakub Jelinek via Gcc-patches

On Wed, Jul 27, 2022 at 11:33:29AM +0200, Jakub Jelinek via Gcc-patches wrote:
> The following patch is partially a workaround for bogus warnings
> when the compiler isn't able to fold _M_disjunct call into constant
> false, but also an optimization attempt - assuming _M_disjunct (__s)
> is rare, the patch should shrink code size for the common case and
> use library or for non-standard instantiations an out of line
> function to handle the rare case.

I'd like to ping this patch.

Thanks.

Jakub

Re: [PATCH] PR106342 - IBM zSystems: Provide vsel for all vector modes

2022-08-10 Thread Ilya Leoshkevich via Gcc-patches

On Wed, 2022-08-03 at 12:20 +0200, Ilya Leoshkevich wrote:
> Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?
> 
> 
> 
> dg.exp=pr104612.c fails with an ICE on s390x, because copysignv2sf3
> produces an insn that vsel is supposed to recognize, but can't,
> because it's not defined for V2SF.  Fix by defining it for all vector
> modes supported by copysign3.
> 
> gcc/ChangeLog:
> 
> * config/s390/vector.md (V_HW_FT): New iterator.
> * config/s390/vx-builtins.md (vsel): Use V instead of
> V_HW.
> ---
>  gcc/config/s390/vector.md  |  6 ++
>  gcc/config/s390/vx-builtins.md | 12 ++--
>  2 files changed, 12 insertions(+), 6 deletions(-)

Jakub pointed out that this is broken in gcc-12 as well.
The patch applies cleanly, and I started a bootstrap/regtest.
Ok for gcc-12?

Re: [PATCH 1/3] Factor out jobserver_active_p.

2022-08-10 Thread Richard Biener via Gcc-patches

On Wed, Aug 10, 2022 at 1:11 PM Martin Liška  wrote:
>
> On 8/10/22 12:47, Richard Biener wrote:
> > Yeah, fair enough.
>
> I'm going to install the v3 where I renamed jobserver.h
> and moved the ctor implementation to opts-common.cc.

For the record, the v3 series is OK

> Cheers,
> Martin

[PATCH] Fix path query compute_imports for external path

2022-08-10 Thread Richard Biener via Gcc-patches

The following fixes the use of compute_imports from the backwards
threader which ends up accessing stale m_path from a previous
threading attempt.  The fix is to pass in the path explicitely
(and not the exit), and initializing it with the exit around this
call from the backwards threader.  That unfortunately exposed that
we rely on this broken behavior as the new testcase shows.  The
missed threading can be restored by registering all relations
from conditions on the path during solving, for the testcase the
particular important case is for relations provided by the path
entry conditional.

I've verified that the GORI query for imported ranges on edges
is not restricted this way.

This regresses the new ssa-thread-19.c testcase which is exactly
a case for the other patch re-doing how we compute imports since
this misses imports for defs that are not on the dominating path
from the exit.

That's one of the cases this regresses (it also progresses a few
due to more or the correct relations added).  Overall it
reduces the number of threads from 98649 to 98620 on my set of
cc1files.  I think it's a reasonable intermediate step to find
a stable, less random ground to compare stats to.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK?

Thanks,
Richard.

* gimple-range-path.h (path_range_query::compute_imports):
Take path as argument, not the exit block.
* gimple-range-path.cc (path_range_query::compute_imports):
Likewise, and adjust, avoiding possibly stale m_path.
(path_range_query::compute_outgoing_relations): Register
relations for all conditionals.
* tree-ssa-threadbackward.cc (back_threader::find_paths):
Adjust.

* gcc.dg/tree-ssa/ssa-thread-18.c: New testcase.
* gcc.dg/tree-ssa/ssa-thread-19.c: Likewise, but XFAILed.
---
 gcc/gimple-range-path.cc  | 21 +---
 gcc/gimple-range-path.h   |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-18.c | 20 +++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-19.c | 33 +++
 gcc/tree-ssa-threadbackward.cc|  4 ++-
 5 files changed, 65 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-18.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-19.c

diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index 43e7526b6fc..389faec260c 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -549,7 +549,7 @@ path_range_query::add_to_imports (tree name, bitmap imports)
   return false;
 }
 
-// Compute the imports to the path ending in EXIT.  These are
+// Compute the imports to PATH.  These are
 // essentially the SSA names used to calculate the final conditional
 // along the path.
 //
@@ -559,9 +559,10 @@ path_range_query::add_to_imports (tree name, bitmap 
imports)
 // we can solve.
 
 void
-path_range_query::compute_imports (bitmap imports, basic_block exit)
+path_range_query::compute_imports (bitmap imports, const vec 
&path)
 {
   // Start with the imports from the exit block...
+  basic_block exit = path[0];
   gori_compute &gori = m_ranger->gori ();
   bitmap r_imports = gori.imports (exit);
   bitmap_copy (imports, r_imports);
@@ -599,7 +600,7 @@ path_range_query::compute_imports (bitmap imports, 
basic_block exit)
  tree arg = gimple_phi_arg (phi, i)->def;
 
  if (TREE_CODE (arg) == SSA_NAME
- && m_path.contains (e->src)
+ && path.contains (e->src)
  && bitmap_set_bit (imports, SSA_NAME_VERSION (arg)))
worklist.safe_push (arg);
}
@@ -607,9 +608,9 @@ path_range_query::compute_imports (bitmap imports, 
basic_block exit)
 }
   // Exported booleans along the path, may help conditionals.
   if (m_resolve)
-for (i = 0; i < m_path.length (); ++i)
+for (i = 0; i < path.length (); ++i)
   {
-   basic_block bb = m_path[i];
+   basic_block bb = path[i];
tree name;
FOR_EACH_GORI_EXPORT_NAME (gori, bb, name)
  if (TREE_CODE (TREE_TYPE (name)) == BOOLEAN_TYPE)
@@ -636,7 +637,7 @@ path_range_query::compute_ranges (const vec 
&path,
   if (imports)
 bitmap_copy (m_imports, imports);
   else
-compute_imports (m_imports, exit_bb ());
+compute_imports (m_imports, m_path);
 
   if (m_resolve)
 get_path_oracle ()->reset_path ();
@@ -845,15 +846,9 @@ path_range_query::compute_phi_relations (basic_block bb, 
basic_block prev)
 void
 path_range_query::compute_outgoing_relations (basic_block bb, basic_block next)
 {
-  gimple *stmt = last_stmt (bb);
-
-  if (stmt
-  && gimple_code (stmt) == GIMPLE_COND
-  && (import_p (gimple_cond_lhs (stmt))
- || import_p (gimple_cond_rhs (stmt
+  if (gcond *cond = safe_dyn_cast  (last_stmt (bb)))
 {
   int_range<2> r;
-  gcond *cond = as_a (stmt);
   edge e0 = EDGE_SUCC (bb, 0);
   edge e1 = EDGE_SUCC (bb, 1);
 
dif

[PATCH] tree-optimization/106513 - fix mistake in bswap symbolic number shifts

2022-08-10 Thread Richard Biener via Gcc-patches

This fixes a mistake in typing a local variable in the symbolic
shift routine.

Bootstrap & regtest pending on x86_64-unknown-linux-gnu.

PR tree-optimization/106513
* gimple-ssa-store-merging.cc (do_shift_rotate): Use uint64_t
for head_marker.

* gcc.dg/torture/pr106513.c: New testcase.
---
 gcc/gimple-ssa-store-merging.cc |  2 +-
 gcc/testsuite/gcc.dg/torture/pr106513.c | 26 +
 2 files changed, 27 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr106513.c

diff --git a/gcc/gimple-ssa-store-merging.cc b/gcc/gimple-ssa-store-merging.cc
index 0640168bcc4..b80b8eac444 100644
--- a/gcc/gimple-ssa-store-merging.cc
+++ b/gcc/gimple-ssa-store-merging.cc
@@ -263,7 +263,7 @@ do_shift_rotate (enum tree_code code,
 int count)
 {
   int i, size = TYPE_PRECISION (n->type) / BITS_PER_UNIT;
-  unsigned head_marker;
+  uint64_t head_marker;
 
   if (count < 0
   || count >= TYPE_PRECISION (n->type)
diff --git a/gcc/testsuite/gcc.dg/torture/pr106513.c 
b/gcc/testsuite/gcc.dg/torture/pr106513.c
new file mode 100644
index 000..92c02ffb37b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr106513.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+
+typedef __INT64_TYPE__ int64_t;
+
+__attribute__((noinline)) int64_t
+swap64 (int64_t n)
+{
+  return (((n & (((int64_t) 0xff) )) << 56) |
+  ((n & (((int64_t) 0xff) << 8)) << 40) |
+  ((n & (((int64_t) 0xff) << 16)) << 24) |
+  ((n & (((int64_t) 0xff) << 24)) << 8) |
+  ((n & (((int64_t) 0xff) << 32)) >> 8) |
+  ((n & (((int64_t) 0xff) << 40)) >> 24) |
+  ((n & (((int64_t) 0xff) << 48)) >> 40) |
+  ((n & ((int64_t)(0xffull << 56))) >> 56));
+}
+
+int main (void)
+{
+  volatile int64_t n = 0x8000l;
+
+  if (swap64(n) != 0xff80l)
+__builtin_abort ();
+
+  return 0;
+}
-- 
2.35.3

Re: [PATCH] tree-optimization/106513 - fix mistake in bswap symbolic number shifts

2022-08-10 Thread Jakub Jelinek via Gcc-patches

On Wed, Aug 10, 2022 at 03:47:10PM +0200, Richard Biener via Gcc-patches wrote:
> This fixes a mistake in typing a local variable in the symbolic
> shift routine.
> 
> Bootstrap & regtest pending on x86_64-unknown-linux-gnu.
> 
>   PR tree-optimization/106513
>   * gimple-ssa-store-merging.cc (do_shift_rotate): Use uint64_t
>   for head_marker.

Ok.

> 
>   * gcc.dg/torture/pr106513.c: New testcase.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr106513.c
> @@ -0,0 +1,26 @@
> +/* { dg-do run } */
> +
> +typedef __INT64_TYPE__ int64_t;
> +
> +__attribute__((noinline)) int64_t
> +swap64 (int64_t n)
> +{
> +  return (((n & (((int64_t) 0xff) )) << 56) |
> +  ((n & (((int64_t) 0xff) << 8)) << 40) |
> +  ((n & (((int64_t) 0xff) << 16)) << 24) |
> +  ((n & (((int64_t) 0xff) << 24)) << 8) |
> +  ((n & (((int64_t) 0xff) << 32)) >> 8) |
> +  ((n & (((int64_t) 0xff) << 40)) >> 24) |
> +  ((n & (((int64_t) 0xff) << 48)) >> 40) |
> +  ((n & ((int64_t)(0xffull << 56))) >> 56));
> +}
> +
> +int main (void)
> +{
> +  volatile int64_t n = 0x8000l;
> +
> +  if (swap64(n) != 0xff80l)
> +__builtin_abort ();

Please use ll instead of l in both cases above.
Perhaps -0x80ll would be simpler...

Jakub

[PATCH] soft-fp: Update soft-fp from glibc

2022-08-10 Thread Kito Cheng

This patch is updating all soft-fp from glibc, most changes are
copyright years update, removing "Contributed by" lines and update URL for
license, and changes other than those update are adding conversion
function between IEEE half and 32-bit/64-bit integer, those functions are
required by RISC-V _Float16 support.

libgcc/ChangeLog:

* soft-fp/fixhfdi.c: New.
* soft-fp/fixhfsi.c: Likewise.
* soft-fp/fixunshfdi.c: Likewise.
* soft-fp/fixunshfsi.c: Likewise.
* soft-fp/floatdihf.c: Likewise.
* soft-fp/floatsihf.c: Likewise.
* soft-fp/floatundihf.c: Likewise.
* soft-fp/floatunsihf.c: Likewise.
* soft-fp/adddf3.c: Updating copyright years, removing "Contributed by"
lines and update URL for license.
* soft-fp/addsf3.c: Likewise.
* soft-fp/addtf3.c: Likewise.
* soft-fp/divdf3.c: Likewise.
* soft-fp/divsf3.c: Likewise.
* soft-fp/divtf3.c: Likewise.
* soft-fp/double.h: Likewise.
* soft-fp/eqdf2.c: Likewise.
* soft-fp/eqhf2.c: Likewise.
* soft-fp/eqsf2.c: Likewise.
* soft-fp/eqtf2.c: Likewise.
* soft-fp/extenddftf2.c: Likewise.
* soft-fp/extended.h: Likewise.
* soft-fp/extendhfdf2.c: Likewise.
* soft-fp/extendhfsf2.c: Likewise.
* soft-fp/extendhftf2.c: Likewise.
* soft-fp/extendhfxf2.c: Likewise.
* soft-fp/extendsfdf2.c: Likewise.
* soft-fp/extendsftf2.c: Likewise.
* soft-fp/extendxftf2.c: Likewise.
* soft-fp/fixdfdi.c: Likewise.
* soft-fp/fixdfsi.c: Likewise.
* soft-fp/fixdfti.c: Likewise.
* soft-fp/fixhfti.c: Likewise.
* soft-fp/fixsfdi.c: Likewise.
* soft-fp/fixsfsi.c: Likewise.
* soft-fp/fixsfti.c: Likewise.
* soft-fp/fixtfdi.c: Likewise.
* soft-fp/fixtfsi.c: Likewise.
* soft-fp/fixtfti.c: Likewise.
* soft-fp/fixunsdfdi.c: Likewise.
* soft-fp/fixunsdfsi.c: Likewise.
* soft-fp/fixunsdfti.c: Likewise.
* soft-fp/fixunshfti.c: Likewise.
* soft-fp/fixunssfdi.c: Likewise.
* soft-fp/fixunssfsi.c: Likewise.
* soft-fp/fixunssfti.c: Likewise.
* soft-fp/fixunstfdi.c: Likewise.
* soft-fp/fixunstfsi.c: Likewise.
* soft-fp/fixunstfti.c: Likewise.
* soft-fp/floatdidf.c: Likewise.
* soft-fp/floatdisf.c: Likewise.
* soft-fp/floatditf.c: Likewise.
* soft-fp/floatsidf.c: Likewise.
* soft-fp/floatsisf.c: Likewise.
* soft-fp/floatsitf.c: Likewise.
* soft-fp/floattidf.c: Likewise.
* soft-fp/floattihf.c: Likewise.
* soft-fp/floattisf.c: Likewise.
* soft-fp/floattitf.c: Likewise.
* soft-fp/floatundidf.c: Likewise.
* soft-fp/floatundisf.c: Likewise.
* soft-fp/floatunditf.c: Likewise.
* soft-fp/floatunsidf.c: Likewise.
* soft-fp/floatunsisf.c: Likewise.
* soft-fp/floatunsitf.c: Likewise.
* soft-fp/floatuntidf.c: Likewise.
* soft-fp/floatuntihf.c: Likewise.
* soft-fp/floatuntisf.c: Likewise.
* soft-fp/floatuntitf.c: Likewise.
* soft-fp/gedf2.c: Likewise.
* soft-fp/gesf2.c: Likewise.
* soft-fp/getf2.c: Likewise.
* soft-fp/half.h: Likewise.
* soft-fp/ledf2.c: Likewise.
* soft-fp/lesf2.c: Likewise.
* soft-fp/letf2.c: Likewise.
* soft-fp/muldf3.c: Likewise.
* soft-fp/mulsf3.c: Likewise.
* soft-fp/multf3.c: Likewise.
* soft-fp/negdf2.c: Likewise.
* soft-fp/negsf2.c: Likewise.
* soft-fp/negtf2.c: Likewise.
* soft-fp/op-1.h: Likewise.
* soft-fp/op-2.h: Likewise.
* soft-fp/op-4.h: Likewise.
* soft-fp/op-8.h: Likewise.
* soft-fp/op-common.h: Likewise.
* soft-fp/quad.h: Likewise.
* soft-fp/single.h: Likewise.
* soft-fp/soft-fp.h: Likewise.
* soft-fp/subdf3.c: Likewise.
* soft-fp/subsf3.c: Likewise.
* soft-fp/subtf3.c: Likewise.
* soft-fp/truncdfhf2.c: Likewise.
* soft-fp/truncdfsf2.c: Likewise.
* soft-fp/truncsfhf2.c: Likewise.
* soft-fp/trunctfdf2.c: Likewise.
* soft-fp/trunctfhf2.c: Likewise.
* soft-fp/trunctfsf2.c: Likewise.
* soft-fp/trunctfxf2.c: Likewise.
* soft-fp/truncxfhf2.c: Likewise.
* soft-fp/unorddf2.c: Likewise.
* soft-fp/unordsf2.c: Likewise.
---
 libgcc/soft-fp/adddf3.c  |  6 ++---
 libgcc/soft-fp/addsf3.c  |  6 ++---
 libgcc/soft-fp/addtf3.c  |  6 ++---
 libgcc/soft-fp/divdf3.c  |  6 ++---
 libgcc/soft-fp/divsf3.c  |  6 ++---
 libgcc/soft-fp/divtf3.c  |  6 ++---
 libgcc/soft-fp/double.h  |  8 ++-
 libgcc/soft-fp/eqdf2.c   |  6 ++---
 libgcc/soft-fp/eqhf2.c   |  2 +-
 libgcc/soft-fp/eqsf2.c   |  6 ++---
 libgcc/soft-fp/eqtf2.c   |  6 ++---
 libgcc/soft-fp/extenddftf2.c |  6 ++

Re: [PATCH] analyzer: fix ICE casued by dup2 in sm-fd.cc[PR106551]

2022-08-10 Thread Mir Immad via Gcc-patches

 > if you convert the "int m;" locals into an extern global, like in
> comment #0 of bug 106551, does that still trigger the crash on the
> unpatched sm-fd.cc?

Yes, it does, since m would be in "m_start" state. I'm sending an updated
patch.

Thanks
Immad.

On Wed, Aug 10, 2022 at 1:32 AM David Malcolm  wrote:

> On Tue, 2022-08-09 at 21:42 +0530, Immad Mir wrote:
> > This patch fixes the ICE caused by valid_to_unchecked_state,
> > at analyzer/sm-fd.cc by handling the m_start state in
> > check_for_dup.
> >
> > Tested lightly on x86_64.
> >
> > gcc/analyzer/ChangeLog:
> > PR analyzer/106551
> > * sm-fd.cc (check_for_dup): handle the m_start
> > state when transitioning the state of LHS
> > of dup, dup2 and dup3 call.
> >
> > gcc/testsuite/ChangeLog:
> > * gcc.dg/analyzer/fd-dup-1.c: New testcases.
> >
> > Signed-off-by: Immad Mir 
> > ---
> >  gcc/analyzer/sm-fd.cc|  4 ++--
> >  gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c | 28
> > +++-
> >  2 files changed, 29 insertions(+), 3 deletions(-)
> >
> > diff --git a/gcc/analyzer/sm-fd.cc b/gcc/analyzer/sm-fd.cc
> > index 8bb76d72b05..c8b9930a7b6 100644
> > --- a/gcc/analyzer/sm-fd.cc
> > +++ b/gcc/analyzer/sm-fd.cc
> > @@ -983,7 +983,7 @@ fd_state_machine::check_for_dup (sm_context
> > *sm_ctxt, const supernode *node,
> >  case DUP_1:
> >if (lhs)
> > {
> > - if (is_constant_fd_p (state_arg_1))
> > + if (is_constant_fd_p (state_arg_1) || state_arg_1 ==
> > m_start)
> > sm_ctxt->set_next_state (stmt, lhs,
> > m_unchecked_read_write);
> >   else
> > sm_ctxt->set_next_state (stmt, lhs,
> > @@ -1011,7 +1011,7 @@ fd_state_machine::check_for_dup (sm_context
> > *sm_ctxt, const supernode *node,
> >file descriptor i.e the first argument.  */
> >if (lhs)
> > {
> > - if (is_constant_fd_p (state_arg_1))
> > + if (is_constant_fd_p (state_arg_1) || state_arg_1 ==
> > m_start)
> > sm_ctxt->set_next_state (stmt, lhs,
> > m_unchecked_read_write);
> >   else
> > sm_ctxt->set_next_state (stmt, lhs,
> > diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c
> > b/gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c
> > index eba2570568f..ed4d6de57db 100644
> > --- a/gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c
> > +++ b/gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c
> > @@ -220,4 +220,30 @@ test_19 (const char *path, void *buf)
> >  close (fd);
> >  }
> >
> > -}
> > \ No newline at end of file
> > +}
> > +
> > +void
> > +test_20 ()
> > +{
> > +int m;
> > +int fd = dup (m); /* { dg-warning "'dup' on possibly invalid
> > file descriptor 'm'" } */
> > +close (fd);
> > +}
> > +
> > +void
> > +test_21 ()
> > +{
> > +int m;
> > +int fd = dup2 (m, 1); /* { dg-warning "'dup2' on possibly
> > invalid file descriptor 'm'" } */
> > +close (fd);
> > +}
> > +
> > +void
> > +test_22 (int flags)
> > +{
> > +int m;
> > +int fd = dup3 (m, 1, flags); /* { dg-warning "'dup3' on possibly
> > invalid file descriptor 'm'" } */
> > +close (fd);
> > +}
>
> Thanks for the updated patch.
>
> The test cases looked suspicious to me - I was wondering why the
> analyzer doesn't complain about the uninitialized values being passed
> to the various dup functions as parameters.  So your test cases seem to
> have uncovered a hidden pre-existing bug in the analyzer's
> uninitialized value detection, which I've filed for myself to deal with
> as PR analyzer/106573.
>
> If you convert the "int m;" locals into an extern global, like in
> comment #0 of bug 106551, does that still trigger the crash on the
> unpatched sm-fd.cc?  If so, then that's greatly preferable as a
> regression test, since otherwise I'll have to modify that test case
> when I fix bug 106573.
>
> Dave
>
>
>
>

[PATCH] analyzer: fix ICE casued by dup2 in sm-fd.cc[PR106551]

2022-08-10 Thread Immad Mir via Gcc-patches

This patch fixes the ICE caused by valid_to_unchecked_state,
at analyzer/sm-fd.cc by handling the m_start state in
check_for_dup.

Tested lightly on x86_64.

gcc/analyzer/ChangeLog:
PR analyzer/106551
* sm-fd.cc (check_for_dup): handle the m_start
state when transitioning the state of LHS
of dup, dup2 and dup3 call.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/fd-dup-1.c: New testcases.

Signed-off-by: Immad Mir 
---
 gcc/analyzer/sm-fd.cc|  4 ++--
 gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c | 27 +++-
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/gcc/analyzer/sm-fd.cc b/gcc/analyzer/sm-fd.cc
index 8bb76d72b05..c8b9930a7b6 100644
--- a/gcc/analyzer/sm-fd.cc
+++ b/gcc/analyzer/sm-fd.cc
@@ -983,7 +983,7 @@ fd_state_machine::check_for_dup (sm_context *sm_ctxt, const 
supernode *node,
 case DUP_1:
   if (lhs)
{
- if (is_constant_fd_p (state_arg_1))
+ if (is_constant_fd_p (state_arg_1) || state_arg_1 == m_start)
sm_ctxt->set_next_state (stmt, lhs, m_unchecked_read_write);
  else
sm_ctxt->set_next_state (stmt, lhs,
@@ -1011,7 +1011,7 @@ fd_state_machine::check_for_dup (sm_context *sm_ctxt, 
const supernode *node,
   file descriptor i.e the first argument.  */
   if (lhs)
{
- if (is_constant_fd_p (state_arg_1))
+ if (is_constant_fd_p (state_arg_1) || state_arg_1 == m_start)
sm_ctxt->set_next_state (stmt, lhs, m_unchecked_read_write);
  else
sm_ctxt->set_next_state (stmt, lhs,
diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c 
b/gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c
index eba2570568f..bc823d1ce4e 100644
--- a/gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c
+++ b/gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c
@@ -220,4 +220,29 @@ test_19 (const char *path, void *buf)
 close (fd);
 }
 
-}
\ No newline at end of file
+}
+
+extern int m;
+
+void
+test_20 ()
+{
+int fd = dup (m); /* { dg-warning "'dup' on possibly invalid file 
descriptor 'm'" } */
+close (fd);
+}
+
+void
+test_21 ()
+{
+int fd = dup2 (m, 1); /* { dg-warning "'dup2' on possibly invalid file 
descriptor 'm'" } */
+close (fd);
+}
+
+void
+test_22 (int flags)
+{
+int fd = dup3 (m, 1, flags); /* { dg-warning "'dup3' on possibly invalid 
file descriptor 'm'" } */
+close (fd);
+}
+
+
-- 
2.25.1

Re: [PATCH] tree-optimization/106514 - revisit m_import compute in backward threading

2022-08-10 Thread Aldy Hernandez via Gcc-patches

I'm still on vacation, but I figured I'd mention a few things to
either unblock you, or move the conversation along.

On Wed, Aug 10, 2022 at 8:45 AM Richard Biener  wrote:
>
> On Tue, 9 Aug 2022, Andrew MacLeod wrote:
>
> >
> > On 8/9/22 09:01, Richard Biener wrote:
> > > This revisits how we compute imports later used for the ranger path
> > > query during backwards threading.  The compute_imports function
> > > of the path solver ends up pulling the SSA def chain of regular
> > > stmts without limit and since it starts with just the gori imports
> > > of the path exit it misses some interesting names to translate
> > > during path discovery.  In fact with a still empty path this
> > > compute_imports function looks like not the correct tool.
> >
> > I don't really know how this works in practice.  Aldys off this week, so he
> > can comment when he returns.
> >
> > The original premise was along the line of recognizing that only changes to 
> > a
> > GORI import name to a block can affect the branch at the end of the block.
> > ie, if the path doesn't change any import to block A, then the branch at the
> > end of block A will not change either.Likewise, if it does change an
> > import, then we look at whether the branch can be threaded.Beyond that
> > basic premise, I dont know what all it does.
>
> Yep, I also think that's the idea.

Yes.

>
> > I presume the unbounded def chain is for local defs within a block that in
> > turn feeds the import to another block.   Im not sure why we need to do much
> > with those..  again, its only the import to the defchain that can affect the
> > outcome t the end of the chain.. and if it changes, then you need to
> > recalculate the entire chain.. but that would be part of the normal path
> > walk.  I suspect ther eis also some pruning that can be done there, as GORi
> > reflects "can affect the range" not "will affect the range".
> >
> > Perhaps whats going on is that all those local elements are being added up
> > front to the list of interesting names?  That would certainly blow up the
> > bitmaps and loops and such.
>
> What it does is, if we have
>
> bb:
>   _3 = _5 + 1;
>   _1 = _3 + _4;
>   if (_1 > _2)
>
> it puts _3 and _4 and _5 into the set of interesting names.  That's
> OK and desired I think?  The actual problem is that compute_imports
> will follow the def of _5 and _4 into dominating blocks recursively,
> adding things to the imports even if the definition blocks are not
> on the path (the path is empty at the point we call compute_imports).

That sounds like a bug.  We shouldn't recurse unbounded outside of the
path.  For that matter, for anything outside the path, we should be
querying the ranger, which can cache things appropriately.  The intent
was that for anything queried outside of the path, we should be using
the ranger through range_on_path_entry.

> For the testcase at hand this pulls in some 1000s of names into the
> initial set of imports.  Now, the current path discovery code
> only adds to imports by means of PHI translating from a PHI
> def to a PHI use on the path edge - it doesn't add any further
> local names used to define such PHI edge use from blocks not
> dominating the path exit (but on the about to be threaded path).
>
> I'll also note that compute_imports does
>
>   // Exported booleans along the path, may help conditionals.
>   if (m_resolve)
> for (i = 0; i < m_path.length (); ++i)
>   {
> basic_block bb = m_path[i];
> tree name;
> FOR_EACH_GORI_EXPORT_NAME (gori, bb, name)
>   if (TREE_CODE (TREE_TYPE (name)) == BOOLEAN_TYPE)
> bitmap_set_bit (imports, SSA_NAME_VERSION (name));
>   }
>
> but at least for the backwards threader this isn't effective
> since the path is empty at the point we call this.  And no
> other code in the backwards threader does sth like this.

IIRC, this came about to help with statements leading up to the
condition.  See fur_source::register_outgoing_edges:

  // Now look for other relations in the exports.  This will find stmts
  // leading to the condition such as:
  // c_2 = a_4 < b_7
  // if (c_2)
  FOR_EACH_GORI_EXPORT_NAME (*(gori ()), bb, name)
{
  if (TREE_CODE (TREE_TYPE (name)) != BOOLEAN_TYPE)
continue;

At least back when I wrote it, it made a difference in the number of
threads we got.

> I would also say for the exit block it doesn't make
> much sense to look at gori exports.  Plus I don't understand
> what this tries to capture - it presumably catches stmts
> like _1 = _2 == _3 that have uses in downstream blocks but
> might be not directly part of the conditional op def chain.
>
> In the end all the threader does is, once the path is complete,
> compute ranges for the path and the imports and then fold the
> path exit stmt.
>
> I'm not sure which names we need in the import set for this
> but as I guess the set of imports constrain the names we
> compute ranges for so for the BB above, if we want to have
> a

[PATCH 0/2] RISC-V: Support _Float16 type and implement zfh and zfhmin extension

2022-08-10 Thread Kito Cheng

This patch set implements Zfh and Zfhmin, adds soft-float for _Float16, and 
enables _Float16 type in C++ mode.

Zfh and Zfhmin are extensions for IEEE half precision, both are ratified in 
Jan. 2022[1]

v2 Changes:
Fix mangling for C++ mode to fit the RISC-V psABI spec.


[1] 
https://github.com/riscv/riscv-isa-manual/commit/b35a54079e0da11740ce5b1e6db999d1d5172768

[PATCH v2 1/2] RISC-V: Support _Float16 type.

2022-08-10 Thread Kito Cheng

RISC-V decide use _Float16 as primary IEEE half precision type, and this
already become part of psABI, this patch has added folloing support for
_Float16:

- Soft-float support for _Float16.
- Make sure _Float16 available on C++ mode.
- Name mangling for _Float16 on C++ mode.

gcc/ChangeLog

* config/riscv/riscv-builtins.cc: include stringpool.h
(riscv_float16_type_node): New.
(riscv_init_builtin_types): Ditto.
(riscv_init_builtins): Call riscv_init_builtin_types.
* config/riscv/riscv-modes.def (HF): New.
* gcc/config/riscv/riscv.cc (riscv_output_move): Handle HFmode.
(riscv_mangle_type): New.
(riscv_scalar_mode_supported_p): Ditto.
(riscv_libgcc_floating_mode_supported_p): Ditto.
(riscv_excess_precision): Ditto.
(riscv_floatn_mode): Ditto.
(riscv_init_libfuncs): Ditto.
(TARGET_MANGLE_TYPE): Ditto.
(TARGET_SCALAR_MODE_SUPPORTED_P): Ditto.
(TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P): Ditto.
(TARGET_INIT_LIBFUNCS): Ditto.
(TARGET_C_EXCESS_PRECISION): Ditto.
(TARGET_FLOATN_MODE): Ditto.
* gcc/config/riscv/riscv.md (mode): Add HF.
(softload): Add HF.
(softstore): Ditto.
(fmt): Ditto.
(UNITMODE): Ditto.
(movhf): New.
(*movhf_softfloat): New.

libgcc/ChangeLog:

* config/riscv/sfp-machine.h (_FP_NANFRAC_H): New.
(_FP_NANFRAC_H): Ditto.
(_FP_NANSIGN_H): Ditto.
* config/riscv/t-softfp32 (softfp_extensions): Add HF related
routines.
(softfp_truncations): Ditto.
(softfp_extras): Ditto.
* config/riscv/t-softfp64 (softfp_extras): Add HF related routines.

gcc/testsuite/ChangeLog:

* gcc/testsuite/g++.target/riscv/_Float16.C: New.
* gcc/testsuite/gcc.target/riscv/_Float16-soft-1.c: Ditto.
* gcc/testsuite/gcc.target/riscv/_Float16-soft-2.c: Ditto.
* gcc/testsuite/gcc.target/riscv/_Float16-soft-3.c: Ditto.
* gcc/testsuite/gcc.target/riscv/_Float16-soft-4.c: Ditto.
* gcc/testsuite/gcc.target/riscv/_Float16.c: Ditto.
---
 gcc/config/riscv/riscv-builtins.cc|  24 +++
 gcc/config/riscv/riscv-modes.def  |   1 +
 gcc/config/riscv/riscv.cc | 171 --
 gcc/config/riscv/riscv.md |  30 ++-
 gcc/testsuite/g++.target/riscv/_Float16.C |  18 ++
 .../gcc.target/riscv/_Float16-soft-1.c|   9 +
 .../gcc.target/riscv/_Float16-soft-2.c|  13 ++
 .../gcc.target/riscv/_Float16-soft-3.c|  12 ++
 .../gcc.target/riscv/_Float16-soft-4.c|  12 ++
 gcc/testsuite/gcc.target/riscv/_Float16.c |  19 ++
 libgcc/config/riscv/sfp-machine.h |   3 +
 libgcc/config/riscv/t-softfp32|   5 +
 libgcc/config/riscv/t-softfp64|   1 +
 13 files changed, 300 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/riscv/_Float16.C
 create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-soft-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-soft-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-soft-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-soft-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16.c

diff --git a/gcc/config/riscv/riscv-builtins.cc 
b/gcc/config/riscv/riscv-builtins.cc
index 1218fdfc67d..3009311604d 100644
--- a/gcc/config/riscv/riscv-builtins.cc
+++ b/gcc/config/riscv/riscv-builtins.cc
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "recog.h"
 #include "diagnostic-core.h"
 #include "stor-layout.h"
+#include "stringpool.h"
 #include "expr.h"
 #include "langhooks.h"
 
@@ -160,6 +161,8 @@ static GTY(()) int riscv_builtin_decl_index[NUM_INSN_CODES];
 #define GET_BUILTIN_DECL(CODE) \
   riscv_builtin_decls[riscv_builtin_decl_index[(CODE)]]
 
+tree riscv_float16_type_node = NULL_TREE;
+
 /* Return the function type associated with function prototype TYPE.  */
 
 static tree
@@ -185,11 +188,32 @@ riscv_build_function_type (enum riscv_function_type type)
   return types[(int) type];
 }
 
+static void
+riscv_init_builtin_types (void)
+{
+  /* Provide the _Float16 type and float16_type_node if needed.  */
+  if (!float16_type_node)
+{
+  riscv_float16_type_node = make_node (REAL_TYPE);
+  TYPE_PRECISION (riscv_float16_type_node) = 16;
+  SET_TYPE_MODE (riscv_float16_type_node, HFmode);
+  layout_type (riscv_float16_type_node);
+}
+  else
+riscv_float16_type_node = float16_type_node;
+
+  if (!maybe_get_identifier ("_Float16"))
+lang_hooks.types.register_builtin_type (riscv_float16_type_node,
+   "_Float16");
+}
+
 /* Implement TARGET_INIT_BUILTINS.  */
 
 void
 riscv_init_builtins (void)
 {
+  riscv_init_builtin_types ();
+
   for (size_t i = 0; i < ARRAY_SIZE (riscv_builtins); i++)
 {
   const struct riscv_bu

[PATCH v2 2/2] RISC-V: Support zfh and zfhmin extension

2022-08-10 Thread Kito Cheng

Zfh and Zfhmin are extensions for IEEE half precision, both are ratified
in Jan. 2022[1]:

- Zfh has full set of operation like F or D for single or double precision.
- Zfhmin has only provide minimal support for half precision operation,
  like conversion, load, store and move instructions.

[1] 
https://github.com/riscv/riscv-isa-manual/commit/b35a54079e0da11740ce5b1e6db999d1d5172768

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_implied_info): Add
zfh and zfhmin.
(riscv_ext_version_table): Ditto.
(riscv_ext_flag_table): Ditto.
* config/riscv/riscv-opts.h (MASK_ZFHMIN): New.
(MASK_ZFH): Ditto.
(TARGET_ZFHMIN): Ditto.
(TARGET_ZFH): Ditto.
* config/riscv/riscv.cc (riscv_output_move): Handle HFmode move
for zfh and zfhmin.
(riscv_emit_float_compare): Handle HFmode.
* config/riscv/riscv.md (ANYF): Add HF.
(SOFTF): Add HF.
(load): Ditto.
(store): Ditto.
(truncsfhf2): New.
(truncdfhf2): Ditto.
(extendhfsf2): Ditto.
(extendhfdf2): Ditto.
(*movhf_hardfloat): Ditto.
(*movhf_softfloat): Make sure not ZFHMIN.
* config/riscv/riscv.opt (riscv_zf_subext): New.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/_Float16-zfh-1.c: New.
* gcc.target/riscv/_Float16-zfh-2.c: Ditto.
* gcc.target/riscv/_Float16-zfh-3.c: Ditto.
* gcc.target/riscv/_Float16-zfhmin-1.c: Ditto.
* gcc.target/riscv/_Float16-zfhmin-2.c: Ditto.
* gcc.target/riscv/_Float16-zfhmin-3.c: Ditto.
* gcc.target/riscv/arch-16.c: Ditto.
* gcc.target/riscv/arch-17.c: Ditto.
* gcc.target/riscv/predef-21.c: Ditto.
* gcc.target/riscv/predef-22.c: Ditto.
---
 gcc/common/config/riscv/riscv-common.cc   |  8 +++
 gcc/config/riscv/riscv-opts.h |  6 ++
 gcc/config/riscv/riscv.cc | 33 ++-
 gcc/config/riscv/riscv.md | 59 +--
 gcc/config/riscv/riscv.opt|  3 +
 .../gcc.target/riscv/_Float16-zfh-1.c |  8 +++
 .../gcc.target/riscv/_Float16-zfh-2.c |  8 +++
 .../gcc.target/riscv/_Float16-zfh-3.c |  8 +++
 .../gcc.target/riscv/_Float16-zfhmin-1.c  |  9 +++
 .../gcc.target/riscv/_Float16-zfhmin-2.c  |  9 +++
 .../gcc.target/riscv/_Float16-zfhmin-3.c  |  9 +++
 gcc/testsuite/gcc.target/riscv/arch-16.c  |  5 ++
 gcc/testsuite/gcc.target/riscv/arch-17.c  |  5 ++
 gcc/testsuite/gcc.target/riscv/predef-21.c| 59 +++
 gcc/testsuite/gcc.target/riscv/predef-22.c| 59 +++
 15 files changed, 279 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-zfh-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-zfh-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-zfh-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-zfhmin-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-zfhmin-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-zfhmin-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-17.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/predef-21.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/predef-22.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 0e5be2ce105..4ee1b3198c5 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -96,6 +96,9 @@ static const riscv_implied_info_t riscv_implied_info[] =
   {"zvl32768b", "zvl16384b"},
   {"zvl65536b", "zvl32768b"},
 
+  {"zfh", "zfhmin"},
+  {"zfhmin", "f"},
+
   {NULL, NULL}
 };
 
@@ -193,6 +196,9 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
   {"zvl32768b", ISA_SPEC_CLASS_NONE, 1, 0},
   {"zvl65536b", ISA_SPEC_CLASS_NONE, 1, 0},
 
+  {"zfh",   ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zfhmin",ISA_SPEC_CLASS_NONE, 1, 0},
+
   /* Terminate the list.  */
   {NULL, ISA_SPEC_CLASS_NONE, 0, 0}
 };
@@ -1148,6 +1154,8 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
   {"zvl32768b", &gcc_options::x_riscv_zvl_flags, MASK_ZVL32768B},
   {"zvl65536b", &gcc_options::x_riscv_zvl_flags, MASK_ZVL65536B},
 
+  {"zfhmin",&gcc_options::x_riscv_zf_subext, MASK_ZFHMIN},
+  {"zfh",   &gcc_options::x_riscv_zf_subext, MASK_ZFH},
 
   {NULL, NULL, 0}
 };
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 1e153b3a6e7..85e869e62e3 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -153,6 +153,12 @@ enum stack_protector_guard {
 #define TARGET_ZICBOM ((riscv_zicmo_subext & MASK_ZICBOM) != 0)
 #define TARGET_ZICBOP ((riscv_zicmo_subext & MASK_ZICBOP) != 0)
 
+#define MASK_ZFHMIN   (1 << 0)
+#define MASK_ZFH  (1 << 1)
+
+#define TARGET_ZFHMIN

Re: [PATCH] RISC-V: Use the X iterator for eh_set_lr_{si,di}

2022-08-10 Thread Kito Cheng via Gcc-patches

LGTM, thanks!

On Sun, Aug 7, 2022 at 3:42 AM Palmer Dabbelt  wrote:
>
> These two patterns were independent, but exactly match the semantics of
> X.  Replace them with a single paramaterized pattern.  Thanks to Andrew
> for pointing this one out over IRC.
>
> gcc/ChangeLog
>
> * config/riscv/riscv.md (eh_set_lr_): New pattern.
> (eh_set_lr_si): Remove.
> (eh_set_lr_di): Likewise.
> ---
> No new failures on the Linux multilibs on trunk.
> ---
>  gcc/config/riscv/riscv.md | 14 --
>  1 file changed, 4 insertions(+), 10 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> index 0796f91dd30..11a59f98a9f 100644
> --- a/gcc/config/riscv/riscv.md
> +++ b/gcc/config/riscv/riscv.md
> @@ -2562,16 +2562,10 @@
>  ;; Clobber the return address on the stack.  We can't expand this
>  ;; until we know where it will be put in the stack frame.
>
> -(define_insn "eh_set_lr_si"
> -  [(unspec [(match_operand:SI 0 "register_operand" "r")] UNSPEC_EH_RETURN)
> -   (clobber (match_scratch:SI 1 "=&r"))]
> -  "! TARGET_64BIT"
> -  "#")
> -
> -(define_insn "eh_set_lr_di"
> -  [(unspec [(match_operand:DI 0 "register_operand" "r")] UNSPEC_EH_RETURN)
> -   (clobber (match_scratch:DI 1 "=&r"))]
> -  "TARGET_64BIT"
> +(define_insn "eh_set_lr_"
> +  [(unspec [(match_operand:X 0 "register_operand" "r")] UNSPEC_EH_RETURN)
> +   (clobber (match_scratch:X 1 "=&r"))]
> +  ""
>"#")
>
>  (define_split
> --
> 2.34.1
>

Re: [PATCH] RISC-V: Fix the sge ..., x0, ... pattern

2022-08-10 Thread Kito Cheng via Gcc-patches

LGTM, that's apparently some kind of copy & paste error (from *slt
pattern) when we add this pattern.

On Sun, Aug 7, 2022 at 3:42 AM Palmer Dabbelt  wrote:
>
> There's no operand 2 here, so referencing it doesn't make sense.  I
> couldn't find a way to trigger bad assembly output so I don't have a
> test.
>
> gcc/ChangeLog
>
> PR target/106543
> * config/riscv/riscv.md (sge_): Remove
> reference to non-existent operand.
> ---
> No new failures on the Linux multilibs on trunk.
> ---
>  gcc/config/riscv/riscv.md | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> index 0796f91dd30..ed1c7f241e6 100644
> --- a/gcc/config/riscv/riscv.md
> +++ b/gcc/config/riscv/riscv.md
> @@ -2386,7 +2386,7 @@
> (any_ge:GPR (match_operand:X 1 "register_operand" " r")
> (const_int 1)))]
>""
> -  "slt%i2\t%0,zero,%1"
> +  "slt\t%0,zero,%1"
>[(set_attr "type" "slt")
> (set_attr "mode" "")])
>
> --
> 2.34.1
>

Re: [09/23] Add a cut-down version of std::span (array_slice)

2022-08-10 Thread Martin Jambor

Hello,

I have one more question/comment about array_slice.  Ever since I
started to use it...

On Fri, Nov 13 2020, Richard Sandiford via Gcc-patches wrote:
> A later patch wants to be able to pass around subarray views of an
> existing array.  The standard class to do that is std::span, but it's
> a C++20 thing.  This patch just adds a cut-down version of it.
>
> The intention is just to provide what's currently needed.
>
> gcc/
>   * vec.h (array_slice): New class.
> ---
>  gcc/vec.h | 120 ++
>  1 file changed, 120 insertions(+)
>
> diff --git a/gcc/vec.h b/gcc/vec.h
> index f02beddc975..7768de9f518 100644
> --- a/gcc/vec.h
> +++ b/gcc/vec.h
> @@ -2128,6 +2128,126 @@ release_vec_vec (vec > &vec)
>vec.release ();
>  }
>  
> +// Provide a subset of the std::span functionality.  (We can't use std::span
> +// itself because it's a C++20 feature.)
> +//
> +// In addition, provide an invalid value that is distinct from all valid
> +// sequences (including the empty sequence).  This can be used to return
> +// failure without having to use std::optional.
> +//
> +// There is no operator bool because it would be ambiguous whether it is
> +// testing for a valid value or an empty sequence.
> +template
> +class array_slice
> +{
> +  template friend class array_slice;
> +
> +public:
> +  using value_type = T;
> +  using iterator = T *;
> +  using const_iterator = const T *;
> +
> +  array_slice () : m_base (nullptr), m_size (0) {}
> +
> +  template
> +  array_slice (array_slice other)
> +: m_base (other.m_base), m_size (other.m_size) {}
> +
> +  array_slice (iterator base, unsigned int size)
> +: m_base (base), m_size (size) {}
> +
> +  template
> +  array_slice (T (&array)[N]) : m_base (array), m_size (N) {}
> +
> +  template
> +  array_slice (const vec &v)
> +: m_base (v.address ()), m_size (v.length ()) {}
> +
> +  iterator begin () { return m_base; }
> +  iterator end () { return m_base + m_size; }
> +
> +  const_iterator begin () const { return m_base; }
> +  const_iterator end () const { return m_base + m_size; }
> +
> +  value_type &front ();
> +  value_type &back ();
> +  value_type &operator[] (unsigned int i);
> +
> +  const value_type &front () const;
> +  const value_type &back () const;
> +  const value_type &operator[] (unsigned int i) const;
> +
> +  size_t size () const { return m_size; }

...this has been a constant source of compile errors, because vectors
have length () and this is size ().

I understand that the motivation was consistency with std::span, but do
we really want to add another inconsistency with ourselves?

Given that array_slice is not that much used yet, I believe we can still
change to be consistent with vectors.  I personally think we should but
at the very least, if we keep it as it is, I'd like us to do so
deliberately.

Thanks,

Martin

Re: [PATCH] tree-optimization/106514 - revisit m_import compute in backward threading

2022-08-10 Thread Aldy Hernandez via Gcc-patches

On Wed, Aug 10, 2022 at 12:46 PM Richard Biener  wrote:
>
> On Wed, 10 Aug 2022, Richard Biener wrote:
>
> > On Tue, 9 Aug 2022, Andrew MacLeod wrote:
> >
> > >
> > > On 8/9/22 09:01, Richard Biener wrote:
> > > > This revisits how we compute imports later used for the ranger path
> > > > query during backwards threading.  The compute_imports function
> > > > of the path solver ends up pulling the SSA def chain of regular
> > > > stmts without limit and since it starts with just the gori imports
> > > > of the path exit it misses some interesting names to translate
> > > > during path discovery.  In fact with a still empty path this
> > > > compute_imports function looks like not the correct tool.
> > >
> > > I don't really know how this works in practice.  Aldys off this week, so 
> > > he
> > > can comment when he returns.
> > >
> > > The original premise was along the line of recognizing that only changes 
> > > to a
> > > GORI import name to a block can affect the branch at the end of the block.
> > > ie, if the path doesn't change any import to block A, then the branch at 
> > > the
> > > end of block A will not change either.Likewise, if it does change an
> > > import, then we look at whether the branch can be threaded.Beyond that
> > > basic premise, I dont know what all it does.
> >
> > Yep, I also think that's the idea.
>
> [...]
>
> > Anyway, the important result of the change is that the imports
> > set is vastly smaller since it is now constrained to the
> > actual path.  Plus it now contains the local defs in blocks
> > of the path that do not dominate the exit block which means
> > it might get more threading opportunities.  Which reminds me
> > to re-do the cc1files experiment for this change.
>
> Results are a bit unconclusive.  We have
> ethread 14044 -> 14058, first threadfull 36986 -> 37174,
> first thread 8060 -> 8049, dom 21723 -> 21822, second thread
> (after loop opts) 6242 -> 5582, second dom 8522 -> 8765,
> second threadfull 3072 -> 2998 which makes an overall drop
> in the number of threads from 98649 to 98448.

The threading dance between all passes is a bit fickle as you can see.
Particularly, DOM is a red herring.  If it starts firing up, as it
looks like from your numbers, it's usually because we regressed in the
backward threaders and DOM is picking up the slack.  Though a slightly
increase in DOM threading can be because we opened up more
opportunities for DOM because of better threading in the backward
threaders.  Sometimes it helps to run without DOM (or disable DOM
threading) to compare apples with apples...being careful of course,
that you're not dropping a bunch of threads in thread2 for instance,
and picking them up in threadfull2 or whatever.

Ughhh...I really hate that we have 20 million threading passes.

>
> I've isolated one testcase we threaded before but no longer after
> this change and that is
>
> void foo (int nest, int print_nest)
> {
>   _Bool t0 = nest != 0;
>   _Bool t1 = nest == print_nest;
>   _Bool t2 = t0 & t1;
>   if (t2)
> __builtin_puts ("x");
>   nest++;
>   if (nest > 2)
> __builtin_abort ();
>   if (print_nest == nest)
> __builtin_puts ("y");
> }
>
> where we are able to thread from if (t2) to if (print_nest == nest)
> resolving that to false when t2 is true using the nest == print_nest
> relation.  Now, the reason is because of the imports added by
>
>   // Exported booleans along the path, may help conditionals.
>   if (m_resolve)
> for (i = 0; i < m_path.length (); ++i)
>   {
> basic_block bb = m_path[i];
> tree name;
> FOR_EACH_GORI_EXPORT_NAME (gori, bb, name)
>   if (TREE_CODE (TREE_TYPE (name)) == BOOLEAN_TYPE)
> bitmap_set_bit (imports, SSA_NAME_VERSION (name));
>   }
>
> _BUT_ - it turns out that the m_path local to the solver is still
> the one from the last back_threader::find_taken_edge_switch
> calling m_solver->compute_ranges (path, m_imports), so for a possibly
> completely unrelated path.  Truncating the path also on the solver
> side before calling compute_imports makes the testcase fail to thread.
> I'll note the PHI handling also looks at the stale m_path.

Not good.  Thanks for noticing.

>
> I see the solver itself adds relations from edges on the path so
> the cruical item here seems to be to add imports for the path
> entry conditional, but those would likely be GORI imports for that
> block?  Unfortunately that fails to add t[012], the GORI exports
> seem to cover all that's needed but then exports might be too much
> here?  At least that's what the code in compute_imports effectively
> does (also for the entry block).  But I do wonder why compute_ranges
> does not add relations computed by the entry conditional ...
> That's probably because of
>
> void
> path_range_query::compute_outgoing_relations (basic_block bb, basic_block
> next)
> {
>   gimple *stmt = last_stmt (bb);
>
>   if (stmt
>   && gimple_code (stmt) == GIMPLE_COND
>   && (im

Re: [PATCH] Fix path query compute_imports for external path

2022-08-10 Thread Aldy Hernandez via Gcc-patches

Looks good to me.

Thanks.
Aldy

On Wed, Aug 10, 2022 at 3:01 PM Richard Biener  wrote:
>
> The following fixes the use of compute_imports from the backwards
> threader which ends up accessing stale m_path from a previous
> threading attempt.  The fix is to pass in the path explicitely
> (and not the exit), and initializing it with the exit around this
> call from the backwards threader.  That unfortunately exposed that
> we rely on this broken behavior as the new testcase shows.  The
> missed threading can be restored by registering all relations
> from conditions on the path during solving, for the testcase the
> particular important case is for relations provided by the path
> entry conditional.
>
> I've verified that the GORI query for imported ranges on edges
> is not restricted this way.
>
> This regresses the new ssa-thread-19.c testcase which is exactly
> a case for the other patch re-doing how we compute imports since
> this misses imports for defs that are not on the dominating path
> from the exit.
>
> That's one of the cases this regresses (it also progresses a few
> due to more or the correct relations added).  Overall it
> reduces the number of threads from 98649 to 98620 on my set of
> cc1files.  I think it's a reasonable intermediate step to find
> a stable, less random ground to compare stats to.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> OK?
>
> Thanks,
> Richard.
>
> * gimple-range-path.h (path_range_query::compute_imports):
> Take path as argument, not the exit block.
> * gimple-range-path.cc (path_range_query::compute_imports):
> Likewise, and adjust, avoiding possibly stale m_path.
> (path_range_query::compute_outgoing_relations): Register
> relations for all conditionals.
> * tree-ssa-threadbackward.cc (back_threader::find_paths):
> Adjust.
>
> * gcc.dg/tree-ssa/ssa-thread-18.c: New testcase.
> * gcc.dg/tree-ssa/ssa-thread-19.c: Likewise, but XFAILed.
> ---
>  gcc/gimple-range-path.cc  | 21 +---
>  gcc/gimple-range-path.h   |  2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-18.c | 20 +++
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-19.c | 33 +++
>  gcc/tree-ssa-threadbackward.cc|  4 ++-
>  5 files changed, 65 insertions(+), 15 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-18.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-19.c
>
> diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
> index 43e7526b6fc..389faec260c 100644
> --- a/gcc/gimple-range-path.cc
> +++ b/gcc/gimple-range-path.cc
> @@ -549,7 +549,7 @@ path_range_query::add_to_imports (tree name, bitmap 
> imports)
>return false;
>  }
>
> -// Compute the imports to the path ending in EXIT.  These are
> +// Compute the imports to PATH.  These are
>  // essentially the SSA names used to calculate the final conditional
>  // along the path.
>  //
> @@ -559,9 +559,10 @@ path_range_query::add_to_imports (tree name, bitmap 
> imports)
>  // we can solve.
>
>  void
> -path_range_query::compute_imports (bitmap imports, basic_block exit)
> +path_range_query::compute_imports (bitmap imports, const vec 
> &path)
>  {
>// Start with the imports from the exit block...
> +  basic_block exit = path[0];
>gori_compute &gori = m_ranger->gori ();
>bitmap r_imports = gori.imports (exit);
>bitmap_copy (imports, r_imports);
> @@ -599,7 +600,7 @@ path_range_query::compute_imports (bitmap imports, 
> basic_block exit)
>   tree arg = gimple_phi_arg (phi, i)->def;
>
>   if (TREE_CODE (arg) == SSA_NAME
> - && m_path.contains (e->src)
> + && path.contains (e->src)
>   && bitmap_set_bit (imports, SSA_NAME_VERSION (arg)))
> worklist.safe_push (arg);
> }
> @@ -607,9 +608,9 @@ path_range_query::compute_imports (bitmap imports, 
> basic_block exit)
>  }
>// Exported booleans along the path, may help conditionals.
>if (m_resolve)
> -for (i = 0; i < m_path.length (); ++i)
> +for (i = 0; i < path.length (); ++i)
>{
> -   basic_block bb = m_path[i];
> +   basic_block bb = path[i];
> tree name;
> FOR_EACH_GORI_EXPORT_NAME (gori, bb, name)
>   if (TREE_CODE (TREE_TYPE (name)) == BOOLEAN_TYPE)
> @@ -636,7 +637,7 @@ path_range_query::compute_ranges (const vec 
> &path,
>if (imports)
>  bitmap_copy (m_imports, imports);
>else
> -compute_imports (m_imports, exit_bb ());
> +compute_imports (m_imports, m_path);
>
>if (m_resolve)
>  get_path_oracle ()->reset_path ();
> @@ -845,15 +846,9 @@ path_range_query::compute_phi_relations (basic_block bb, 
> basic_block prev)
>  void
>  path_range_query::compute_outgoing_relations (basic_block bb, basic_block 
> next)
>  {
> -  gimple *stmt = last_stmt (bb);
> -
> -  if (stmt
> -  &&

Re: [PATCH] Fix path query compute_imports for external path

2022-08-10 Thread Aldy Hernandez via Gcc-patches

Oh, and if it wasn't clear from an earlier message.  I'm OK (and
thankful) for everything you're doing in this space, especially if you
stay on top of the threading counts from patch to patch, and you get
your gori questions answered by Andrew ;-).  No sense in you getting
blocked, since you seem to be making good progress.

Aldy

On Wed, Aug 10, 2022 at 6:10 PM Aldy Hernandez  wrote:
>
> Looks good to me.
>
> Thanks.
> Aldy
>
> On Wed, Aug 10, 2022 at 3:01 PM Richard Biener  wrote:
> >
> > The following fixes the use of compute_imports from the backwards
> > threader which ends up accessing stale m_path from a previous
> > threading attempt.  The fix is to pass in the path explicitely
> > (and not the exit), and initializing it with the exit around this
> > call from the backwards threader.  That unfortunately exposed that
> > we rely on this broken behavior as the new testcase shows.  The
> > missed threading can be restored by registering all relations
> > from conditions on the path during solving, for the testcase the
> > particular important case is for relations provided by the path
> > entry conditional.
> >
> > I've verified that the GORI query for imported ranges on edges
> > is not restricted this way.
> >
> > This regresses the new ssa-thread-19.c testcase which is exactly
> > a case for the other patch re-doing how we compute imports since
> > this misses imports for defs that are not on the dominating path
> > from the exit.
> >
> > That's one of the cases this regresses (it also progresses a few
> > due to more or the correct relations added).  Overall it
> > reduces the number of threads from 98649 to 98620 on my set of
> > cc1files.  I think it's a reasonable intermediate step to find
> > a stable, less random ground to compare stats to.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > OK?
> >
> > Thanks,
> > Richard.
> >
> > * gimple-range-path.h (path_range_query::compute_imports):
> > Take path as argument, not the exit block.
> > * gimple-range-path.cc (path_range_query::compute_imports):
> > Likewise, and adjust, avoiding possibly stale m_path.
> > (path_range_query::compute_outgoing_relations): Register
> > relations for all conditionals.
> > * tree-ssa-threadbackward.cc (back_threader::find_paths):
> > Adjust.
> >
> > * gcc.dg/tree-ssa/ssa-thread-18.c: New testcase.
> > * gcc.dg/tree-ssa/ssa-thread-19.c: Likewise, but XFAILed.
> > ---
> >  gcc/gimple-range-path.cc  | 21 +---
> >  gcc/gimple-range-path.h   |  2 +-
> >  gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-18.c | 20 +++
> >  gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-19.c | 33 +++
> >  gcc/tree-ssa-threadbackward.cc|  4 ++-
> >  5 files changed, 65 insertions(+), 15 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-18.c
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-19.c
> >
> > diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
> > index 43e7526b6fc..389faec260c 100644
> > --- a/gcc/gimple-range-path.cc
> > +++ b/gcc/gimple-range-path.cc
> > @@ -549,7 +549,7 @@ path_range_query::add_to_imports (tree name, bitmap 
> > imports)
> >return false;
> >  }
> >
> > -// Compute the imports to the path ending in EXIT.  These are
> > +// Compute the imports to PATH.  These are
> >  // essentially the SSA names used to calculate the final conditional
> >  // along the path.
> >  //
> > @@ -559,9 +559,10 @@ path_range_query::add_to_imports (tree name, bitmap 
> > imports)
> >  // we can solve.
> >
> >  void
> > -path_range_query::compute_imports (bitmap imports, basic_block exit)
> > +path_range_query::compute_imports (bitmap imports, const vec 
> > &path)
> >  {
> >// Start with the imports from the exit block...
> > +  basic_block exit = path[0];
> >gori_compute &gori = m_ranger->gori ();
> >bitmap r_imports = gori.imports (exit);
> >bitmap_copy (imports, r_imports);
> > @@ -599,7 +600,7 @@ path_range_query::compute_imports (bitmap imports, 
> > basic_block exit)
> >   tree arg = gimple_phi_arg (phi, i)->def;
> >
> >   if (TREE_CODE (arg) == SSA_NAME
> > - && m_path.contains (e->src)
> > + && path.contains (e->src)
> >   && bitmap_set_bit (imports, SSA_NAME_VERSION (arg)))
> > worklist.safe_push (arg);
> > }
> > @@ -607,9 +608,9 @@ path_range_query::compute_imports (bitmap imports, 
> > basic_block exit)
> >  }
> >// Exported booleans along the path, may help conditionals.
> >if (m_resolve)
> > -for (i = 0; i < m_path.length (); ++i)
> > +for (i = 0; i < path.length (); ++i)
> >{
> > -   basic_block bb = m_path[i];
> > +   basic_block bb = path[i];
> > tree name;
> > FOR_EACH_GORI_EXPORT_NAME (gori, bb, name)
> >   if (TRE

Re: [PATCH] rs6000: Enable generate const through pli+pli+rldimi

2022-08-10 Thread Segher Boessenkool

Hi!

On Wed, Aug 10, 2022 at 03:11:23PM +0800, Jiufu Guo wrote:
> As mentioned in PR106550, since pli could support 34bits immediate, we could
> use less instructions(3insn would be ok) to build 64bits constant with pli.
> 
> For example, for constant 0x020805006106003, we could generate it with:
> asm code1:
> pli 9,101736451 (0x6106003)
> sldi 9,9,32
> paddi 9,9, 213 (0x0208050)
> 
> or asm code2:
> pli 10, 213
> pli 9, 101736451
> rldimi 9, 10, 32, 0
> 
> If there is only one register can be used, then the asm code1 is ok. Otherwise
> asm code2 may be better.

It is significantly better yes.  That code with sldi is perhaps what we
have to do after reload, but all those three insns are sequential,
expensive.

> This patch re-enable the constant building(splitter) before RA by updating the
> constrains from int_reg_operand_not_pseudo to gpc_reg_operand.  And then, we
> could use two different pseduo for two pli(s), and asm code2 can be generated.

> This patch also could generate asm code1 if hard register is allocated for the
> constant.

> +  else if (TARGET_PREFIXED)
> +{
> +  /* pli 9,high32 + pli 10,low32 + rldimi 9,10,32,0.  */
> +  if (can_create_pseudo_p ())
> + {
> +   temp = gen_reg_rtx (DImode);
> +   rtx temp1 = gen_reg_rtx (DImode);
> +   emit_move_insn (copy_rtx (temp), GEN_INT ((ud4 << 16) | ud3));
> +   emit_move_insn (copy_rtx (temp1), GEN_INT ((ud2 << 16) | ud1));
> +
> +   rtx one = gen_rtx_AND (DImode, temp1, GEN_INT (0x));

Why do you meed to mask the value here?  That is a nop, no?

> +   rtx two = gen_rtx_ASHIFT (DImode, temp, GEN_INT (32));
> +   emit_move_insn (dest, gen_rtx_IOR (DImode, one, two));

But you can call gen_rotldi3_insert_3 explicitly, a better idea if this
code can run late (so we cannot rely on other optimisations to clean
things up).

> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -9659,7 +9659,7 @@ (define_split
>  ;; When non-easy constants can go in the TOC, this should use
>  ;; easy_fp_constant predicate.
>  (define_split
> -  [(set (match_operand:DI 0 "int_reg_operand_not_pseudo")
> +  [(set (match_operand:DI 0 "gpc_reg_operand")
>   (match_operand:DI 1 "const_int_operand"))]
>"TARGET_POWERPC64 && num_insns_constant (operands[1], DImode) > 1"
>[(set (match_dup 0) (match_dup 2))

This is a huge change.  Do you have some indication that it helps /
hurts / is neutral?  Some reasoning why it is a good idea?

I am not against it, but some more rationale would be good :-)

Btw, this splitter uses operands[2] and [3] in the replacement, and
neither of those exists.  The replacement never is used of course.
Instead, rs6000_emit_set_const is called always.  It would be less
misleading if the replacement text was just "(pc)" or such.


Segher

Re: [PATCH] analyzer: fix ICE casued by dup2 in sm-fd.cc[PR106551]

2022-08-10 Thread David Malcolm via Gcc-patches

On Wed, 2022-08-10 at 20:34 +0530, Mir Immad wrote:
>  > if you convert the "int m;" locals into an extern global, like in
> > comment #0 of bug 106551, does that still trigger the crash on the
> > unpatched sm-fd.cc?
> 
> Yes, it does, since m would be in "m_start" state. I'm sending an
> updated
> patch.

Great!  

Note that I recently committed a fix for bug 106573, which has an xfail
on a dg-bogus to mark a false positive which your patch hopefully also
fixes (in fd-uninit-1.c).  Can you please rebase and see if your patch
does fix it?

Thanks
Dave


> 
> Thanks
> Immad.
> 
> On Wed, Aug 10, 2022 at 1:32 AM David Malcolm 
> wrote:
> 
> > On Tue, 2022-08-09 at 21:42 +0530, Immad Mir wrote:
> > > This patch fixes the ICE caused by valid_to_unchecked_state,
> > > at analyzer/sm-fd.cc by handling the m_start state in
> > > check_for_dup.
> > > 
> > > Tested lightly on x86_64.
> > > 
> > > gcc/analyzer/ChangeLog:
> > >     PR analyzer/106551
> > >     * sm-fd.cc (check_for_dup): handle the m_start
> > >     state when transitioning the state of LHS
> > >     of dup, dup2 and dup3 call.
> > > 
> > > gcc/testsuite/ChangeLog:
> > >     * gcc.dg/analyzer/fd-dup-1.c: New testcases.
> > > 
> > > Signed-off-by: Immad Mir 
> > > ---
> > >  gcc/analyzer/sm-fd.cc    |  4 ++--
> > >  gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c | 28
> > > +++-
> > >  2 files changed, 29 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/gcc/analyzer/sm-fd.cc b/gcc/analyzer/sm-fd.cc
> > > index 8bb76d72b05..c8b9930a7b6 100644
> > > --- a/gcc/analyzer/sm-fd.cc
> > > +++ b/gcc/analyzer/sm-fd.cc
> > > @@ -983,7 +983,7 @@ fd_state_machine::check_for_dup (sm_context
> > > *sm_ctxt, const supernode *node,
> > >  case DUP_1:
> > >    if (lhs)
> > >     {
> > > - if (is_constant_fd_p (state_arg_1))
> > > + if (is_constant_fd_p (state_arg_1) || state_arg_1 ==
> > > m_start)
> > >     sm_ctxt->set_next_state (stmt, lhs,
> > > m_unchecked_read_write);
> > >   else
> > >     sm_ctxt->set_next_state (stmt, lhs,
> > > @@ -1011,7 +1011,7 @@ fd_state_machine::check_for_dup (sm_context
> > > *sm_ctxt, const supernode *node,
> > >    file descriptor i.e the first argument.  */
> > >    if (lhs)
> > >     {
> > > - if (is_constant_fd_p (state_arg_1))
> > > + if (is_constant_fd_p (state_arg_1) || state_arg_1 ==
> > > m_start)
> > >     sm_ctxt->set_next_state (stmt, lhs,
> > > m_unchecked_read_write);
> > >   else
> > >     sm_ctxt->set_next_state (stmt, lhs,
> > > diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c
> > > b/gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c
> > > index eba2570568f..ed4d6de57db 100644
> > > --- a/gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c
> > > +++ b/gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c
> > > @@ -220,4 +220,30 @@ test_19 (const char *path, void *buf)
> > >  close (fd);
> > >  }
> > > 
> > > -}
> > > \ No newline at end of file
> > > +}
> > > +
> > > +void
> > > +test_20 ()
> > > +{
> > > +    int m;
> > > +    int fd = dup (m); /* { dg-warning "'dup' on possibly invalid
> > > file descriptor 'm'" } */
> > > +    close (fd);
> > > +}
> > > +
> > > +void
> > > +test_21 ()
> > > +{
> > > +    int m;
> > > +    int fd = dup2 (m, 1); /* { dg-warning "'dup2' on possibly
> > > invalid file descriptor 'm'" } */
> > > +    close (fd);
> > > +}
> > > +
> > > +void
> > > +test_22 (int flags)
> > > +{
> > > +    int m;
> > > +    int fd = dup3 (m, 1, flags); /* { dg-warning "'dup3' on
> > > possibly
> > > invalid file descriptor 'm'" } */
> > > +    close (fd);
> > > +}
> > 
> > Thanks for the updated patch.
> > 
> > The test cases looked suspicious to me - I was wondering why the
> > analyzer doesn't complain about the uninitialized values being
> > passed
> > to the various dup functions as parameters.  So your test cases
> > seem to
> > have uncovered a hidden pre-existing bug in the analyzer's
> > uninitialized value detection, which I've filed for myself to deal
> > with
> > as PR analyzer/106573.
> > 
> > If you convert the "int m;" locals into an extern global, like in
> > comment #0 of bug 106551, does that still trigger the crash on the
> > unpatched sm-fd.cc?  If so, then that's greatly preferable as a
> > regression test, since otherwise I'll have to modify that test case
> > when I fix bug 106573.
> > 
> > Dave
> > 
> > 
> > 
> >

Re: [PATCH 0/5] IEEE 128-bit built-in overload support.

2022-08-10 Thread Segher Boessenkool

On Wed, Aug 10, 2022 at 02:23:27AM -0400, Michael Meissner wrote:
> On Fri, Aug 05, 2022 at 01:19:05PM -0500, Segher Boessenkool wrote:
> > On Thu, Jul 28, 2022 at 12:43:49AM -0400, Michael Meissner wrote:
> > > These patches lay the foundation for a set of follow-on patches that will
> > > change the internal handling of 128-bit floating point types in GCC.  In 
> > > the
> > > future patches, I hope to change the compiler to always use KFmode for the
> > > explicit _Float128/__float128 types, to always use TFmode for the long 
> > > double
> > > type, no matter which 128-bit floating point type is used, and IFmode for 
> > > the
> > > explicit __ibm128 type.
> > 
> > Making TFmode different from KFmode and IFmode is not an improvement.
> > NAK.
> 
> First of all, it already IS different from KFmode and IFmode, as we've talked
> about.

It always is the same as either IFmode or KFmode in the end.  It is a
separate mode, yes, because generic code always wants to use TFmode.

> I'm trying to clean this mess up.  Having explicit __float128's being
> converted to TFmode if -mabi=ieeelongdouble is just as bad, and it means that
> _Float128 and __float128 are not the same type.

What do types have to do with this at all?

If TFmode means IEEE QP float, TFmode and KFmode can be used
interchangeably.  When TFmode means double-double, TFmode and IFmode can
be used interchangeably.  We should never depend on TFmode being
different from both underlying modes, that way madness lies.

If you remember, in 2016 or such I experimented with making TFmode a
macro-like thingie, so that we always get KFmode and IFmode in the
instruction stream.  This did not work because of the fundamental
problem that KFmode and IFmode cannot be ordered: for both modes there
are numbers it can represent that cannot be represented in the other
mode; converting from IFmode to KFmode is lossty for some numbers, and
the same is true for converting from KFmode to IFmode.  But, some
internals of GCC require all pairs of floating point modes (that can be
converted between at least) to be comparable (in the mathmatical sense).

Until that problem is solved, we CANNOT move forward.  Your 126/127/128
precision hack gave us some time, but nothing has been improved since
then, and things have started to fall apart at the seams again

Segher

Re: [PATCH v2] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]

2022-08-10 Thread Segher Boessenkool

On Wed, Aug 10, 2022 at 02:39:02PM +0800, Xionghu Luo wrote:
> On 2022/8/9 11:01, Kewen.Lin wrote:
> >I have some concern on those changed "altivec_*_direct", IMHO the suffix
> >"_direct" is normally to indicate the define_insn is mapped to the
> >corresponding hw insn directly.  With this change, for example,
> >altivec_vmrghb_direct can be mapped into vmrghb or vmrglb, this looks
> >misleading.  Maybe we can add the corresponding _direct_le and _direct_be
> >versions, both are mapped into the same insn but have different RTL
> >patterns.  Looking forward to Segher's and David's suggestions.
> 
> Thanks!  Do you mean same RTL patterns with different hw insn?

A pattern called altivec_vmrghb_direct_le should always emit a vmrghb
instruction, never a vmrglb instead.  Misleading names are an expensive
problem.


Segher

[PATCH] match.pd: Add abs with bitwise and pattern [PR106243]

2022-08-10 Thread Sam Feifer via Gcc-patches

This patch adds a simplification to match.pd that was discussed on the
thread for pr106243. It simplifies the pattern, abs(x) & 1, to x & 1.

There are also tests for the simplification in this patch. I couldn't
figure out how to get abs to work with vectors. If a test for that is
necessary, could I get some guidance on using abs with vector types?

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR tree-optimization/106243

gcc/ChangeLog:

* match.pd (abs(x) & 1): New simplification.

gcc/testsuite/ChangeLog:

* gcc.dg/pr106243-2.c: New test.
* gcc.dg/pr106243-3.c: New test.
---
 gcc/match.pd  |  5 +
 gcc/testsuite/gcc.dg/pr106243-2.c | 31 +++
 gcc/testsuite/gcc.dg/pr106243-3.c | 18 ++
 3 files changed, 54 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr106243-2.c
 create mode 100644 gcc/testsuite/gcc.dg/pr106243-3.c

diff --git a/gcc/match.pd b/gcc/match.pd
index f82f94ad1fe..c04e70f34c1 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -8071,3 +8071,8 @@ and,
 (simplify 
   (bit_and (negate @0) integer_onep@1)
   (bit_and @0 @1))
+
+/* abs(x) & 1 -> x & 1.  */
+(simplify
+  (bit_and (abs @0) integer_onep@1)
+  (bit_and @0 @1))
diff --git a/gcc/testsuite/gcc.dg/pr106243-2.c 
b/gcc/testsuite/gcc.dg/pr106243-2.c
new file mode 100644
index 000..27e66f59160
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr106243-2.c
@@ -0,0 +1,31 @@
+/* PR tree-optimization/106243 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include 
+
+__attribute__((noipa)) int foo (int x) {
+return abs(x) & 1; 
+}
+
+__attribute__((noipa)) int bar (int x) {
+return (0 - abs(x)) & 1;
+}
+
+/* Commutative property.  */
+__attribute__((noipa)) int baz (int x) {
+return 1 & abs(x);
+}
+
+/* Forward propogation.  */
+__attribute__((noipa)) int qux (int x) {
+int y = abs(x);
+return y & 1;
+}
+
+/* Should not simplify.  */
+__attribute__((noipa)) int thud (int x) {
+return abs(x) & -1;
+}
+
+/* { dg-final {scan-tree-dump-times " ABS_EXPR " 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/pr106243-3.c 
b/gcc/testsuite/gcc.dg/pr106243-3.c
new file mode 100644
index 000..68800868751
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr106243-3.c
@@ -0,0 +1,18 @@
+/* PR tree-optimization/106243 */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include "pr106243-2.c"
+
+int main () {
+
+if (foo(6) != 0
+|| bar(-3) != 1
+|| baz(32) != 0
+|| qux(-128) != 0
+|| foo (127) != 1) {
+__builtin_abort();
+}
+
+return 0;
+}

base-commit: be58bf98e98bb431ed26ca8be84586075fe8be82
-- 
2.31.1

Re: [PATCH] analyzer: fix ICE casued by dup2 in sm-fd.cc[PR106551]

2022-08-10 Thread Mir Immad via Gcc-patches

 > Can you please rebase and see if your patch
> does fix it?

No, the patch that I sent did not attempt to fix this. Now that I have made
the correction, XFAIL in fd-uninit-1.c has changed to XPASS.

Should i remove the dg-bogus warning from fd-uninit-1.c test_1?

Thanks.
Immad.


On Wed, Aug 10, 2022 at 10:26 PM David Malcolm  wrote:

> On Wed, 2022-08-10 at 20:34 +0530, Mir Immad wrote:
> >  > if you convert the "int m;" locals into an extern global, like in
> > > comment #0 of bug 106551, does that still trigger the crash on the
> > > unpatched sm-fd.cc?
> >
> > Yes, it does, since m would be in "m_start" state. I'm sending an
> > updated
> > patch.
>
> Great!
>
> Note that I recently committed a fix for bug 106573, which has an xfail
> on a dg-bogus to mark a false positive which your patch hopefully also
> fixes (in fd-uninit-1.c).  Can you please rebase and see if your patch
> does fix it?
>
> Thanks
> Dave
>
>
> >
> > Thanks
> > Immad.
> >
> > On Wed, Aug 10, 2022 at 1:32 AM David Malcolm 
> > wrote:
> >
> > > On Tue, 2022-08-09 at 21:42 +0530, Immad Mir wrote:
> > > > This patch fixes the ICE caused by valid_to_unchecked_state,
> > > > at analyzer/sm-fd.cc by handling the m_start state in
> > > > check_for_dup.
> > > >
> > > > Tested lightly on x86_64.
> > > >
> > > > gcc/analyzer/ChangeLog:
> > > > PR analyzer/106551
> > > > * sm-fd.cc (check_for_dup): handle the m_start
> > > > state when transitioning the state of LHS
> > > > of dup, dup2 and dup3 call.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > > * gcc.dg/analyzer/fd-dup-1.c: New testcases.
> > > >
> > > > Signed-off-by: Immad Mir 
> > > > ---
> > > >  gcc/analyzer/sm-fd.cc|  4 ++--
> > > >  gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c | 28
> > > > +++-
> > > >  2 files changed, 29 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/gcc/analyzer/sm-fd.cc b/gcc/analyzer/sm-fd.cc
> > > > index 8bb76d72b05..c8b9930a7b6 100644
> > > > --- a/gcc/analyzer/sm-fd.cc
> > > > +++ b/gcc/analyzer/sm-fd.cc
> > > > @@ -983,7 +983,7 @@ fd_state_machine::check_for_dup (sm_context
> > > > *sm_ctxt, const supernode *node,
> > > >  case DUP_1:
> > > >if (lhs)
> > > > {
> > > > - if (is_constant_fd_p (state_arg_1))
> > > > + if (is_constant_fd_p (state_arg_1) || state_arg_1 ==
> > > > m_start)
> > > > sm_ctxt->set_next_state (stmt, lhs,
> > > > m_unchecked_read_write);
> > > >   else
> > > > sm_ctxt->set_next_state (stmt, lhs,
> > > > @@ -1011,7 +1011,7 @@ fd_state_machine::check_for_dup (sm_context
> > > > *sm_ctxt, const supernode *node,
> > > >file descriptor i.e the first argument.  */
> > > >if (lhs)
> > > > {
> > > > - if (is_constant_fd_p (state_arg_1))
> > > > + if (is_constant_fd_p (state_arg_1) || state_arg_1 ==
> > > > m_start)
> > > > sm_ctxt->set_next_state (stmt, lhs,
> > > > m_unchecked_read_write);
> > > >   else
> > > > sm_ctxt->set_next_state (stmt, lhs,
> > > > diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c
> > > > b/gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c
> > > > index eba2570568f..ed4d6de57db 100644
> > > > --- a/gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c
> > > > +++ b/gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c
> > > > @@ -220,4 +220,30 @@ test_19 (const char *path, void *buf)
> > > >  close (fd);
> > > >  }
> > > >
> > > > -}
> > > > \ No newline at end of file
> > > > +}
> > > > +
> > > > +void
> > > > +test_20 ()
> > > > +{
> > > > +int m;
> > > > +int fd = dup (m); /* { dg-warning "'dup' on possibly invalid
> > > > file descriptor 'm'" } */
> > > > +close (fd);
> > > > +}
> > > > +
> > > > +void
> > > > +test_21 ()
> > > > +{
> > > > +int m;
> > > > +int fd = dup2 (m, 1); /* { dg-warning "'dup2' on possibly
> > > > invalid file descriptor 'm'" } */
> > > > +close (fd);
> > > > +}
> > > > +
> > > > +void
> > > > +test_22 (int flags)
> > > > +{
> > > > +int m;
> > > > +int fd = dup3 (m, 1, flags); /* { dg-warning "'dup3' on
> > > > possibly
> > > > invalid file descriptor 'm'" } */
> > > > +close (fd);
> > > > +}
> > >
> > > Thanks for the updated patch.
> > >
> > > The test cases looked suspicious to me - I was wondering why the
> > > analyzer doesn't complain about the uninitialized values being
> > > passed
> > > to the various dup functions as parameters.  So your test cases
> > > seem to
> > > have uncovered a hidden pre-existing bug in the analyzer's
> > > uninitialized value detection, which I've filed for myself to deal
> > > with
> > > as PR analyzer/106573.
> > >
> > > If you convert the "int m;" locals into an extern global, like in
> > > comment #0 of bug 106551, does that still trigger the crash on the
> > > unpatched sm-fd.cc?  If so, then that's greatly preferable as a
> > > regression test, since otherwise I'll have to modify that test case
> > > when

Re: [PATCH v3] Modify combine pattern by a pseudo AND with its nonzero bits [PR93453]

2022-08-10 Thread Segher Boessenkool

Hi!

Sorry for the tardiness.

On Fri, Jul 22, 2022 at 03:07:55PM +0800, HAO CHEN GUI wrote:
>   This patch creates a new function - change_pseudo_and_mask. If recog fails,
> the function converts a single pseudo to the pseudo AND with a mask if the
> outer operator is IOR/XOR/PLUS and inner operator is ASHIFT or AND. The
> conversion helps pattern to match rotate and mask insn on some targets.

The name isn't so clear.  It isn't changing a mask, to start with.

> +/* When the outer code of set_src is IOR/XOR/PLUS and the inner code is
> +   ASHIFT/AND,

"When the outercode of the SET_SRC of PAT is ..."

> convert a pseudo to pseudo AND with a mask if its nonzero_bits
> +   is less than its mode mask.  The nonzero_bits in later passes is not a
> +   superset of what is known in combine pass.  So an insn with nonzero_bits
> +   can't be recoged later.  */

Can this not be done with a splitter in the machine description?


Segher

[PATCH] Complete __gnu_test::basic_string<>::compare support

2022-08-10 Thread François Dumont via Gcc-patches

Here is another patch to complete __gnu_debug::basic_string<> Standard 
conformity. This one is adding the missing compare overloads.


I also would like to propose to change how __gnu_debug::basic_string<> 
is tested. I considered activating  checks when 
_GLIBCXX_ASSERTIONS is defined but it turns out that to do so this 
light-debug mode should then also consider _GLIBCXX_DEBUG_PEDANTIC. I 
prefer to avoid this.


So I restored previous behavior. I'm now checking for the 
_GLIBCXX_TEST_DEBUG_STRING macro to force usage of . This 
way I am testing it using:


make check-debug CXXFLAGS=-D_GLIBCXX_TEST_DEBUG_STRING

    libstdc++: Add __gnu_debug::basic_string<>::compare overloads

    Rather than adding those implementations we are ading a:
    using _Base::compare;

    so that any compare method not implemented at __gnu_debug::basic_string
    level are injected from the base class.

    Also review how __gnu_debug::basic_string is tested. Now require to 
define

    _GLIBCXX_TEST_DEBUG_STRING when running 'make check-debug'.

    libstdc++-v3/ChangeLog

    * include/debug/string: Add using _Base::compare.
    (__gnu_debug::basic_string<>::compare(const 
basic_string<>&)): Remove.
    (__gnu_debug::basic_string<>::compare(size_type, size_type, 
const basic_string<>&)):

    Remove.
    (__gnu_debug::basic_string<>::compare(size_type, size_type, 
const basic_string<>&,

    size_type, size_type)): Remove.
    * testsuite/util/testsuite_string.h 
[_GLIBCXX_TEST_DEBUG_STRING]: Include .
    * 
testsuite/21_strings/basic_string/operations/compare/char/1.cc: Include 
testsuite_string.h

    and use __gnu_test::string.
    * 
testsuite/21_strings/basic_string/operations/compare/char/13650.cc: 
Likewise.
    * 
testsuite/21_strings/basic_string/operations/compare/char/2.cc: Likewise.
    * 
testsuite/21_strings/basic_string/operations/rfind/char/1.cc: Likewise.
    * 
testsuite/21_strings/basic_string/operations/rfind/char/2.cc: Likewise.
    * 
testsuite/21_strings/basic_string/operations/rfind/char/3.cc: Likewise.
    * 
testsuite/21_strings/basic_string/operations/compare/wchar_t/1.cc: 
Include testsuite_string.h

    and use __gnu_test::wstring.
    * 
testsuite/21_strings/basic_string/operations/compare/wchar_t/13650.cc: 
Likewise.
    * 
testsuite/21_strings/basic_string/operations/compare/wchar_t/2.cc: Likewise.


Tested under Linux x86_64.

Ok to commit ?

François
diff --git a/libstdc++-v3/include/debug/string b/libstdc++-v3/include/debug/string
index a4482db4af5..c32eb41eacd 100644
--- a/libstdc++-v3/include/debug/string
+++ b/libstdc++-v3/include/debug/string
@@ -1002,22 +1002,11 @@ namespace __gnu_debug
   substr(size_type __pos = 0, size_type __n = _Base::npos) const
   { return basic_string(_Base::substr(__pos, __n)); }
 
-  int
-  compare(const basic_string& __str) const
-  { return _Base::compare(__str); }
-
-  int
-  compare(size_type __pos1, size_type __n1,
-	  const basic_string& __str) const
-  { return _Base::compare(__pos1, __n1, __str); }
-
-  int
-  compare(size_type __pos1, size_type __n1, const basic_string& __str,
-	  size_type __pos2, size_type __n2) const
-  { return _Base::compare(__pos1, __n1, __str, __pos2, __n2); }
+  using _Base::compare;
 
+  _GLIBCXX20_CONSTEXPR
   int
-  compare(const _CharT* __s) const
+  compare(const _CharT* __s) const _GLIBCXX_NOEXCEPT
   {
 	__glibcxx_check_string(__s);
 	return _Base::compare(__s);
@@ -1025,6 +1014,7 @@ namespace __gnu_debug
 
   //  _GLIBCXX_RESOLVE_LIB_DEFECTS
   //  5. string::compare specification questionable
+  _GLIBCXX20_CONSTEXPR
   int
   compare(size_type __pos1, size_type __n1, const _CharT* __s) const
   {
@@ -1034,6 +1024,7 @@ namespace __gnu_debug
 
   //  _GLIBCXX_RESOLVE_LIB_DEFECTS
   //  5. string::compare specification questionable
+  _GLIBCXX20_CONSTEXPR
   int
   compare(size_type __pos1, size_type __n1,const _CharT* __s,
 	  size_type __n2) const
diff --git a/libstdc++-v3/testsuite/21_strings/basic_string/operations/compare/char/1.cc b/libstdc++-v3/testsuite/21_strings/basic_string/operations/compare/char/1.cc
index 0bef0d2dd3d..c04b83c4896 100644
--- a/libstdc++-v3/testsuite/21_strings/basic_string/operations/compare/char/1.cc
+++ b/libstdc++-v3/testsuite/21_strings/basic_string/operations/compare/char/1.cc
@@ -29,7 +29,7 @@
 // NB compare should be thought of as a lexographical compare, ie how
 // things would be sorted in a dictionary.
 
-#include 
+#include 
 #include 
 #include 
 
@@ -67,7 +67,7 @@ test_value(int result, want_value expected)
 int 
 test01()
 {
-  using namespace std;
+  using namespace __gnu_test;
 
   string 	str_0("costa rica");
   string 	str_1("costa marbella");
diff --git a/libstdc++-v3/testsuite/21_strings/basic_string/ope

Re: [PATCH] analyzer: fix ICE casued by dup2 in sm-fd.cc[PR106551]

2022-08-10 Thread David Malcolm via Gcc-patches

On Wed, 2022-08-10 at 22:51 +0530, Mir Immad wrote:
>  > Can you please rebase and see if your patch
> > does fix it?
> 
> No, the patch that I sent did not attempt to fix this. Now that I
> have made
> the correction, XFAIL in fd-uninit-1.c has changed to XPASS.

Great - that means that, with your fix, we no longer bogusly emit that
false positive.

> 
> Should i remove the dg-bogus warning from fd-uninit-1.c test_1?

Yes please.

Thanks
Dave

> 
> Thanks.
> Immad.
> 
> 
> On Wed, Aug 10, 2022 at 10:26 PM David Malcolm 
> wrote:
> 
> > On Wed, 2022-08-10 at 20:34 +0530, Mir Immad wrote:
> > >  > if you convert the "int m;" locals into an extern global, like
> > > in
> > > > comment #0 of bug 106551, does that still trigger the crash on
> > > > the
> > > > unpatched sm-fd.cc?
> > > 
> > > Yes, it does, since m would be in "m_start" state. I'm sending an
> > > updated
> > > patch.
> > 
> > Great!
> > 
> > Note that I recently committed a fix for bug 106573, which has an
> > xfail
> > on a dg-bogus to mark a false positive which your patch hopefully
> > also
> > fixes (in fd-uninit-1.c).  Can you please rebase and see if your
> > patch
> > does fix it?
> > 
> > Thanks
> > Dave
> > 
> > 
> > > 
> > > Thanks
> > > Immad.
> > > 
> > > On Wed, Aug 10, 2022 at 1:32 AM David Malcolm <
> > > dmalc...@redhat.com>
> > > wrote:
> > > 
> > > > On Tue, 2022-08-09 at 21:42 +0530, Immad Mir wrote:
> > > > > This patch fixes the ICE caused by valid_to_unchecked_state,
> > > > > at analyzer/sm-fd.cc by handling the m_start state in
> > > > > check_for_dup.
> > > > > 
> > > > > Tested lightly on x86_64.
> > > > > 
> > > > > gcc/analyzer/ChangeLog:
> > > > >     PR analyzer/106551
> > > > >     * sm-fd.cc (check_for_dup): handle the m_start
> > > > >     state when transitioning the state of LHS
> > > > >     of dup, dup2 and dup3 call.
> > > > > 
> > > > > gcc/testsuite/ChangeLog:
> > > > >     * gcc.dg/analyzer/fd-dup-1.c: New testcases.
> > > > > 
> > > > > Signed-off-by: Immad Mir 
> > > > > ---
> > > > >  gcc/analyzer/sm-fd.cc    |  4 ++--
> > > > >  gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c | 28
> > > > > +++-
> > > > >  2 files changed, 29 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/gcc/analyzer/sm-fd.cc b/gcc/analyzer/sm-fd.cc
> > > > > index 8bb76d72b05..c8b9930a7b6 100644
> > > > > --- a/gcc/analyzer/sm-fd.cc
> > > > > +++ b/gcc/analyzer/sm-fd.cc
> > > > > @@ -983,7 +983,7 @@ fd_state_machine::check_for_dup
> > > > > (sm_context
> > > > > *sm_ctxt, const supernode *node,
> > > > >  case DUP_1:
> > > > >    if (lhs)
> > > > >     {
> > > > > - if (is_constant_fd_p (state_arg_1))
> > > > > + if (is_constant_fd_p (state_arg_1) || state_arg_1
> > > > > ==
> > > > > m_start)
> > > > >     sm_ctxt->set_next_state (stmt, lhs,
> > > > > m_unchecked_read_write);
> > > > >   else
> > > > >     sm_ctxt->set_next_state (stmt, lhs,
> > > > > @@ -1011,7 +1011,7 @@ fd_state_machine::check_for_dup
> > > > > (sm_context
> > > > > *sm_ctxt, const supernode *node,
> > > > >    file descriptor i.e the first argument.  */
> > > > >    if (lhs)
> > > > >     {
> > > > > - if (is_constant_fd_p (state_arg_1))
> > > > > + if (is_constant_fd_p (state_arg_1) || state_arg_1
> > > > > ==
> > > > > m_start)
> > > > >     sm_ctxt->set_next_state (stmt, lhs,
> > > > > m_unchecked_read_write);
> > > > >   else
> > > > >     sm_ctxt->set_next_state (stmt, lhs,
> > > > > diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c
> > > > > b/gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c
> > > > > index eba2570568f..ed4d6de57db 100644
> > > > > --- a/gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c
> > > > > +++ b/gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c
> > > > > @@ -220,4 +220,30 @@ test_19 (const char *path, void *buf)
> > > > >  close (fd);
> > > > >  }
> > > > > 
> > > > > -}
> > > > > \ No newline at end of file
> > > > > +}
> > > > > +
> > > > > +void
> > > > > +test_20 ()
> > > > > +{
> > > > > +    int m;
> > > > > +    int fd = dup (m); /* { dg-warning "'dup' on possibly
> > > > > invalid
> > > > > file descriptor 'm'" } */
> > > > > +    close (fd);
> > > > > +}
> > > > > +
> > > > > +void
> > > > > +test_21 ()
> > > > > +{
> > > > > +    int m;
> > > > > +    int fd = dup2 (m, 1); /* { dg-warning "'dup2' on
> > > > > possibly
> > > > > invalid file descriptor 'm'" } */
> > > > > +    close (fd);
> > > > > +}
> > > > > +
> > > > > +void
> > > > > +test_22 (int flags)
> > > > > +{
> > > > > +    int m;
> > > > > +    int fd = dup3 (m, 1, flags); /* { dg-warning "'dup3' on
> > > > > possibly
> > > > > invalid file descriptor 'm'" } */
> > > > > +    close (fd);
> > > > > +}
> > > > 
> > > > Thanks for the updated patch.
> > > > 
> > > > The test cases looked suspicious to me - I was wondering why
> > > > the
> > > > analyzer doesn't complain about the uninitialized values b

Re: Rust frontend patches v1

2022-08-10 Thread Philip Herron

Hi everyone

For my v2 of the patches, I've been spending a lot of time ensuring
each patch is buildable. It would end up being simpler if it was
possible if each patch did not have to be like this so I could split
up the front-end in more patches. Does this make sense? In theory,
when everything goes well, does this still mean that we can merge in
one commit, or should it follow a series of buildable patches? I've
received feedback that it might be possible to ignore making each
patch an independent chunk and just focus on splitting it up as small
as possible even if they don't build.

I hope this makes sense.

Thanks

--Phil

On Thu, 28 Jul 2022 at 10:39, Philip Herron  wrote:
>
> Thanks, for confirming David. I think it was too big in the end. I was
> trying to figure out how to actually split that up but it seems
> reasonable that I can split up the front-end patches into patches for
> each separate pass in the compiler seems like a reasonable approach
> for now.
>
> --Phil
>
> On Wed, 27 Jul 2022 at 17:45, David Malcolm  wrote:
> >
> > On Wed, 2022-07-27 at 14:40 +0100, herron.philip--- via Gcc-patches
> > wrote:
> > > This is the initial version 1 patch set for the Rust front-end. There
> > > are more changes that need to be extracted out for all the target
> > > hooks we have implemented. The goal is to see if we are implementing
> > > the target hooks information for x86 and arm. We have more patches
> > > for the other targets I can add in here but they all follow the
> > > pattern established here.
> > >
> > > Each patch is buildable on its own and rebased ontop of
> > > 718cf8d0bd32689192200d2156722167fd21a647. As for ensuring we keep
> > > attribution for all the patches we have received in the front-end
> > > should we create a CONTRIBUTOR's file inside the front-end folder?
> > >
> > > Note thanks to Thomas Schwinge and Mark Wielaard, we are keeping a
> > > branch up to date with our code on:
> > > https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/devel/rust/master
> > >  but this is not rebased ontop of gcc head.
> > >
> > > Let me know if I have sent these patches correctly or not, this is a
> > > learning experience with git send-email.
> > >
> > > [PATCH Rust front-end v1 1/4] Add skeleton Rust front-end folder
> > > [PATCH Rust front-end v1 2/4] Add Rust lang TargetHooks for i386 and
> > > [PATCH Rust front-end v1 3/4] Add Rust target hooks to ARM
> > > [PATCH Rust front-end v1 4/4] Add Rust front-end and associated
> >
> > FWIW it looks like patch 4 of the kit didn't make it (I didn't get a
> > copy and I don't see it in the archives).
> >
> > Maybe it exceeded a size limit?  If so, maybe try splitting it up into
> > more patches.
> >
> > Dave
> >

Re: Rust frontend patches v1

2022-08-10 Thread David Malcolm via Gcc-patches

On Wed, 2022-08-10 at 19:56 +0100, Philip Herron wrote:
> Hi everyone
> 
> For my v2 of the patches, I've been spending a lot of time ensuring
> each patch is buildable. It would end up being simpler if it was
> possible if each patch did not have to be like this so I could split
> up the front-end in more patches. Does this make sense?

Yes.  I've often split up patches into chunks when posting them for
review, and I see other people here do that.

It makes it easier for the reviewer also, since it's usually easier to
deal with e.g. 10 small patches than one enormous one (e.g. if many of
them are uncontroversial, having them split up makes it easier to focus
attention on just the controversial areas).

>  In theory,
> when everything goes well, does this still mean that we can merge in
> one commit, 

Split up the patches for review, but make a note in the cover letter
than they would all be merged into one when committing.

(especially if doing the split is taking up a lot of time; we don't
want to be mandating busy-work)

Dave


> or should it follow a series of buildable patches? I've
> received feedback that it might be possible to ignore making each
> patch an independent chunk and just focus on splitting it up as small
> as possible even if they don't build.
> 
> I hope this makes sense.
> 
> Thanks
> 
> --Phil
> 
> On Thu, 28 Jul 2022 at 10:39, Philip Herron < 
> philip.her...@embecosm.com> wrote:
> > 
> > Thanks, for confirming David. I think it was too big in the end. I
> > was
> > trying to figure out how to actually split that up but it seems
> > reasonable that I can split up the front-end patches into patches
> > for
> > each separate pass in the compiler seems like a reasonable approach
> > for now.
> > 
> > --Phil
> > 
> > On Wed, 27 Jul 2022 at 17:45, David Malcolm 
> > wrote:
> > > 
> > > On Wed, 2022-07-27 at 14:40 +0100, herron.philip--- via Gcc-
> > > patches
> > > wrote:
> > > > This is the initial version 1 patch set for the Rust front-end.
> > > > There
> > > > are more changes that need to be extracted out for all the
> > > > target
> > > > hooks we have implemented. The goal is to see if we are
> > > > implementing
> > > > the target hooks information for x86 and arm. We have more
> > > > patches
> > > > for the other targets I can add in here but they all follow the
> > > > pattern established here.
> > > > 
> > > > Each patch is buildable on its own and rebased ontop of
> > > > 718cf8d0bd32689192200d2156722167fd21a647. As for ensuring we
> > > > keep
> > > > attribution for all the patches we have received in the front-
> > > > end
> > > > should we create a CONTRIBUTOR's file inside the front-end
> > > > folder?
> > > > 
> > > > Note thanks to Thomas Schwinge and Mark Wielaard, we are
> > > > keeping a
> > > > branch up to date with our code on:
> > > >  
> > > > https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/devel/rust/master
> > > >  but this is not rebased ontop of gcc head.
> > > > 
> > > > Let me know if I have sent these patches correctly or not, this
> > > > is a
> > > > learning experience with git send-email.
> > > > 
> > > > [PATCH Rust front-end v1 1/4] Add skeleton Rust front-end
> > > > folder
> > > > [PATCH Rust front-end v1 2/4] Add Rust lang TargetHooks for
> > > > i386 and
> > > > [PATCH Rust front-end v1 3/4] Add Rust target hooks to ARM
> > > > [PATCH Rust front-end v1 4/4] Add Rust front-end and associated
> > > 
> > > FWIW it looks like patch 4 of the kit didn't make it (I didn't
> > > get a
> > > copy and I don't see it in the archives).
> > > 
> > > Maybe it exceeded a size limit?  If so, maybe try splitting it up
> > > into
> > > more patches.
> > > 
> > > Dave
> > > 
>

Re: [PATCH V2] arm: add -static-pie support

2022-08-10 Thread Ramana Radhakrishnan via Gcc-patches

Hi Lance,

Thanks for your contribution - looks like your first one to GCC ?

The patch looks good to me, though it should probably go through a
full test suite run on arm-linux-gnueabihf and get a ChangeLog - see
here for more https://gcc.gnu.org/contribute.html#patches.

This is probably small enough to go under the 10 line rule but since
you've used Signed-off-by in your patch, is that indicating you are
contributing under DCO rules -
https://gcc.gnu.org/contribute.html#legal ?

regards
Ramana

On Thu, Aug 4, 2022 at 5:48 PM Lance Fredrickson via Gcc-patches
 wrote:
>
> Just a follow up trying to get some eyes on my static-pie patch
> submission for arm.
> Feedback welcome.
> https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598610.html
>
> thanks,
> Lance Fredrickson

[GCC13][Patch][PR106457]improve array_at_struct_end_p for array objects (PR106457)

2022-08-10 Thread Qing Zhao via Gcc-patches

Hi,

As mentioned in the bug report, I reopened this bug since the previous patch:

commit r13-1875-gff26f0ba68fe6e870f315d0601b596f889b89680
Author: Richard Biener 
Date:   Thu Jul 28 10:07:32 2022 +0200

middle-end/106457 - improve array_at_struct_end_p for array objects
Array references to array objects are never at struct end.


Didn’t resolve this bug.

This is a new patch, and my current work on -fstrict-flex-array depends on this 
patch.

Please take a look at the patch and let me know whether it’s good for 
committing.

Thanks.

Qing.


==

[PATCH] middle-end/106457 - improve array_at_struct_end_p for array
 objects (PR106457)

Array references are not handled correctly by current array_at_struct_end_p,
for the following array references:

Example 1: (from gcc/testsuite/gcc.dg/torture/pr50067-[1|2].c):

short a[32] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 };
 ... = (*((char(*)[32])&a[0]))[i+8];  // this array reference

Example 2: (from gcc/testsuite/gcc.target/aarch64/vadd_reduc-2.c):

int test (uint8_t *p, uint32_t t[1][1], int n) {
  for (int i = 0; i < 4; i++, p++)
t[i][0] = ...;  // this array reference
...
}

Example 3: (from gcc/testsuite/g++.dg/debug/debug5.C):

  int a = 1;
  int b = 1;
  int e[a][b];
  e[0][0] = 0;  // this array reference

All the above array references are identified as TRUE by the current
array_at_struct_end_p, therefore treated as flexible array members.
Obviously, they are just simple array references, not an array refs
to the last field of a struture. The current array_at_struct_end_p handles
such array references incorrectly.

In order to handle array references correctly, we could recursively check
its first operand if it's a MEM_REF or COMPONENT_REF and stop as FALSE
when otherwise. This resolved all the issues for ARRAY_REF.

bootstrapped and regression tested on both X86 and Aarch64.
Multiple testing cases behave differently due to array_at_struct_end_p now
behave correctly (return FALSE now, then they are not flexible array member
anymore). Adjust these testing cases.

There is one regression for gcc/target/aarch64/vadd_reduc-2.c is left
unresolved since the loop transformation is changed due to the changed behavior
of array_at_struct_end_p, simple adjustment of the testing case doesnt work.
I will file a bug to record this regression.

gcc/ChangeLog:

PR middle-end/106457
* tree.cc (array_at_struct_end_p): Handle array objects recursively
through its first operand.

gcc/testsuite/ChangeLog:

PR middle-end/106457
* gcc.dg/torture/pr50067-1.c: Add -Wno-aggressive-loop-optimizations
to suppress warnings.
* gcc.dg/torture/pr50067-2.c: Likewise.
* gcc.target/aarch64/vadd_reduc-2.c: Likewise.
* gcc.target/i386/pr104059.c: Likewise.


The complete patch is at:




0001-middle-end-106457-improve-array_at_struct_end_p-for-.patch
Description: 0001-middle-end-106457-improve-array_at_struct_end_p-for-.patch

Re: [PATCH v2 2/2] RISC-V: Support zfh and zfhmin extension

2022-08-10 Thread 钟居哲

LGTM. 



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2022-08-10 23:44
To: gcc-patches; kito.cheng; jim.wilson.gcc; palmer; andrew; juzhe.zhong; joseph
CC: Kito Cheng
Subject: [PATCH v2 2/2] RISC-V: Support zfh and zfhmin extension
Zfh and Zfhmin are extensions for IEEE half precision, both are ratified
in Jan. 2022[1]:
 
- Zfh has full set of operation like F or D for single or double precision.
- Zfhmin has only provide minimal support for half precision operation,
  like conversion, load, store and move instructions.
 
[1] 
https://github.com/riscv/riscv-isa-manual/commit/b35a54079e0da11740ce5b1e6db999d1d5172768
 
gcc/ChangeLog:
 
* common/config/riscv/riscv-common.cc (riscv_implied_info): Add
zfh and zfhmin.
(riscv_ext_version_table): Ditto.
(riscv_ext_flag_table): Ditto.
* config/riscv/riscv-opts.h (MASK_ZFHMIN): New.
(MASK_ZFH): Ditto.
(TARGET_ZFHMIN): Ditto.
(TARGET_ZFH): Ditto.
* config/riscv/riscv.cc (riscv_output_move): Handle HFmode move
for zfh and zfhmin.
(riscv_emit_float_compare): Handle HFmode.
* config/riscv/riscv.md (ANYF): Add HF.
(SOFTF): Add HF.
(load): Ditto.
(store): Ditto.
(truncsfhf2): New.
(truncdfhf2): Ditto.
(extendhfsf2): Ditto.
(extendhfdf2): Ditto.
(*movhf_hardfloat): Ditto.
(*movhf_softfloat): Make sure not ZFHMIN.
* config/riscv/riscv.opt (riscv_zf_subext): New.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/_Float16-zfh-1.c: New.
* gcc.target/riscv/_Float16-zfh-2.c: Ditto.
* gcc.target/riscv/_Float16-zfh-3.c: Ditto.
* gcc.target/riscv/_Float16-zfhmin-1.c: Ditto.
* gcc.target/riscv/_Float16-zfhmin-2.c: Ditto.
* gcc.target/riscv/_Float16-zfhmin-3.c: Ditto.
* gcc.target/riscv/arch-16.c: Ditto.
* gcc.target/riscv/arch-17.c: Ditto.
* gcc.target/riscv/predef-21.c: Ditto.
* gcc.target/riscv/predef-22.c: Ditto.
---
gcc/common/config/riscv/riscv-common.cc   |  8 +++
gcc/config/riscv/riscv-opts.h |  6 ++
gcc/config/riscv/riscv.cc | 33 ++-
gcc/config/riscv/riscv.md | 59 +--
gcc/config/riscv/riscv.opt|  3 +
.../gcc.target/riscv/_Float16-zfh-1.c |  8 +++
.../gcc.target/riscv/_Float16-zfh-2.c |  8 +++
.../gcc.target/riscv/_Float16-zfh-3.c |  8 +++
.../gcc.target/riscv/_Float16-zfhmin-1.c  |  9 +++
.../gcc.target/riscv/_Float16-zfhmin-2.c  |  9 +++
.../gcc.target/riscv/_Float16-zfhmin-3.c  |  9 +++
gcc/testsuite/gcc.target/riscv/arch-16.c  |  5 ++
gcc/testsuite/gcc.target/riscv/arch-17.c  |  5 ++
gcc/testsuite/gcc.target/riscv/predef-21.c| 59 +++
gcc/testsuite/gcc.target/riscv/predef-22.c| 59 +++
15 files changed, 279 insertions(+), 9 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-zfh-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-zfh-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-zfh-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-zfhmin-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-zfhmin-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-zfhmin-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/arch-16.c
create mode 100644 gcc/testsuite/gcc.target/riscv/arch-17.c
create mode 100644 gcc/testsuite/gcc.target/riscv/predef-21.c
create mode 100644 gcc/testsuite/gcc.target/riscv/predef-22.c
 
diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 0e5be2ce105..4ee1b3198c5 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -96,6 +96,9 @@ static const riscv_implied_info_t riscv_implied_info[] =
   {"zvl32768b", "zvl16384b"},
   {"zvl65536b", "zvl32768b"},
+  {"zfh", "zfhmin"},
+  {"zfhmin", "f"},
+
   {NULL, NULL}
};
@@ -193,6 +196,9 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
   {"zvl32768b", ISA_SPEC_CLASS_NONE, 1, 0},
   {"zvl65536b", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zfh",   ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zfhmin",ISA_SPEC_CLASS_NONE, 1, 0},
+
   /* Terminate the list.  */
   {NULL, ISA_SPEC_CLASS_NONE, 0, 0}
};
@@ -1148,6 +1154,8 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
   {"zvl32768b", &gcc_options::x_riscv_zvl_flags, MASK_ZVL32768B},
   {"zvl65536b", &gcc_options::x_riscv_zvl_flags, MASK_ZVL65536B},
+  {"zfhmin",&gcc_options::x_riscv_zf_subext, MASK_ZFHMIN},
+  {"zfh",   &gcc_options::x_riscv_zf_subext, MASK_ZFH},
   {NULL, NULL, 0}
};
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 1e153b3a6e7..85e869e62e3 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -153,6 +153,12 @@ enum stack_protector_guard {
#define TARGET_ZICBOM ((riscv_zicmo_subext & MASK_ZICBOM) != 0)
#define TARGET_ZICBOP ((riscv_zicmo_subext & MASK_ZICBOP) != 0)
+#define MASK_ZFHMIN   (1 << 0)
+#define MASK_ZFH  (1 << 1)
+
+#define TARGET_ZFHMIN ((riscv_zf_subext & MASK_ZFHMIN) != 0)
+#define TARGET_ZFH

Re: [PATCH v2] c-family: Honor -Wno-init-self for cv-qual vars [PR102633]

2022-08-10 Thread Jason Merrill via Gcc-patches


On 8/8/22 12:06, Marek Polacek wrote:

On Sat, Aug 06, 2022 at 03:29:05PM -0700, Jason Merrill wrote:

On 7/26/22 14:31, Marek Polacek wrote:

On Tue, Jul 26, 2022 at 04:24:18PM -0400, Jason Merrill wrote:

On 7/26/22 15:03, Marek Polacek wrote:

Since r11-5188-g32934a4f45a721, we drop qualifiers during l-to-r
conversion by creating a NOP_EXPR.  For e.g.

 const int i = i;

that means that the DECL_INITIAL is '(int) i' and not 'i' anymore.
Consequently, we don't suppress_warning here:

711 case DECL_EXPR:
715   if (VAR_P (DECL_EXPR_DECL (*expr_p))
716   && !DECL_EXTERNAL (DECL_EXPR_DECL (*expr_p))
717   && !TREE_STATIC (DECL_EXPR_DECL (*expr_p))
718   && (DECL_INITIAL (DECL_EXPR_DECL (*expr_p)) == DECL_EXPR_DECL 
(*expr_p))
719   && !warn_init_self)
720 suppress_warning (DECL_EXPR_DECL (*expr_p), OPT_Winit_self);

because of the check on line 718 -- (int) i is not i.  So -Wno-init-self
doesn't disable the warning as it's supposed to.

The following patch fixes it...except it doesn't, for volatile variables
in C++.  The problem is that for

 volatile int k = k;

we see that the initializer has TREE_SIDE_EFFECTS, so we perform dynamic
initialization.  So there's no DECL_INITIAL and the suppress_warning
call above is never done.  I suppose we could amend get_no_uninit_warning
to return true for volatile-qualified expressions.  I mean, can we ever
say for a fact that a volatile variable is uninitialized?

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR middle-end/102633

gcc/c-family/ChangeLog:

* c-gimplify.cc (c_gimplify_expr): Strip NOPs of DECL_INITIAL.


I wonder if we want to handle this i = i case earlier, like in finish_decl.


I could, something like

@@ -5381,7 +5381,14 @@ finish_decl (tree decl, location_t init_loc, tree init,
   init = NULL_TREE;

 if (init)
-store_init_value (init_loc, decl, init, origtype);
+{
+  /* In the self-init case, undo the artificial NOP_EXPR we may have
+added in convert_lvalue_to_rvalue so that c_gimplify_expr/DECL_EXPR
+can perform suppress_warning.  */
+  if (TREE_CODE (init) == NOP_EXPR && TREE_OPERAND (init, 0) == decl)
+   init = TREE_OPERAND (init, 0);
+  store_init_value (init_loc, decl, init, origtype);
+}

but then I'd have to do the same thing in cp_finish_decl because
decay_conversion also adds a NOP_EXPR for cv-qualified non-class prvalues.
Is that what we want?  To me that seems less clean than having c_gimplify_expr
see through NOP_EXPRs.


I was thinking of checking the form of the initializer before
decay_conversion or anything else messes with it, and calling
suppress_warning at that point instead of in c_gimplify_expr.


Aaah, okay.  Here's a patch that does it.  In the C FE it has to
happen really early.  Now both front ends behave the same wrt volatiles!

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


LGTM.


-- >8 --
Since r11-5188-g32934a4f45a721, we drop qualifiers during l-to-r
conversion by creating a NOP_EXPR.  For e.g.

   const int i = i;

that means that the DECL_INITIAL is '(int) i' and not 'i' anymore.
Consequently, we don't suppress_warning here:

711 case DECL_EXPR:
715   if (VAR_P (DECL_EXPR_DECL (*expr_p))
716   && !DECL_EXTERNAL (DECL_EXPR_DECL (*expr_p))
717   && !TREE_STATIC (DECL_EXPR_DECL (*expr_p))
718   && (DECL_INITIAL (DECL_EXPR_DECL (*expr_p)) == DECL_EXPR_DECL 
(*expr_p))
719   && !warn_init_self)
720 suppress_warning (DECL_EXPR_DECL (*expr_p), OPT_Winit_self);

because of the check on line 718 -- (int) i is not i.  So -Wno-init-self
doesn't disable the warning as it's supposed to.

The following patch fixes it by moving the suppress_warning call from
c_gimplify_expr to the front ends, at points where we haven't created
the NOP_EXPR yet.

PR middle-end/102633

gcc/c-family/ChangeLog:

* c-gimplify.cc (c_gimplify_expr) : Don't call
suppress_warning here.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_initializer): Add new tree parameter.  Use it.
Call suppress_warning.
(c_parser_declaration_or_fndef): Pass d down to c_parser_initializer.
(c_parser_omp_declare_reduction): Pass omp_priv down to
c_parser_initializer.

gcc/cp/ChangeLog:

* decl.cc (cp_finish_decl): Call suppress_warning.

gcc/testsuite/ChangeLog:

* c-c++-common/Winit-self1.c: New test.
* c-c++-common/Winit-self2.c: New test.
---
  gcc/c-family/c-gimplify.cc   | 12 -
  gcc/c/c-parser.cc| 19 ---
  gcc/cp/decl.cc   |  8 ++
  gcc/testsuite/c-c++-common/Winit-self1.c | 31 
  gcc/testsuite/c-c++-common/Winit-self2.c | 31 
  5 files changed, 85 insertions(+), 16 deletions(-)
  create mode 100644 gcc/testsuite/c-c++-common/Winit-self1.c
  create mode 100644 gcc/testsu

Re: [PATCH v3] Modify combine pattern by a pseudo AND with its nonzero bits [PR93453]

2022-08-10 Thread HAO CHEN GUI via Gcc-patches

Hi Segher,
  Really appreciate your review comments.

On 11/8/2022 上午 1:38, Segher Boessenkool wrote:
> Hi!
> 
> Sorry for the tardiness.
> 
> On Fri, Jul 22, 2022 at 03:07:55PM +0800, HAO CHEN GUI wrote:
>>   This patch creates a new function - change_pseudo_and_mask. If recog fails,
>> the function converts a single pseudo to the pseudo AND with a mask if the
>> outer operator is IOR/XOR/PLUS and inner operator is ASHIFT or AND. The
>> conversion helps pattern to match rotate and mask insn on some targets.
> 
> The name isn't so clear.  It isn't changing a mask, to start with.
How about setting function name to change_pseudo? It converts a pseudo to
the pseudo AND with a mask in a particular situation.

> 
>> +/* When the outer code of set_src is IOR/XOR/PLUS and the inner code is
>> +   ASHIFT/AND,
> 
> "When the outercode of the SET_SRC of PAT is ..."
Yeah, I will change it.

> 
>> convert a pseudo to pseudo AND with a mask if its nonzero_bits
>> +   is less than its mode mask.  The nonzero_bits in later passes is not a
>> +   superset of what is known in combine pass.  So an insn with nonzero_bits
>> +   can't be recoged later.  */
> 
> Can this not be done with a splitter in the machine description?
> 
Sorry, I don't quite understand it. Do you mean if the conversion can be done in
split pass?

If a pseudo has DImode and stem from a char, we get nonzero_bits as 0xff in 
combine
pass. But in split pass, it's nonzero_bits is 0x. So the 
conversion
can only be done in combine pass.

Thanks
Gui Haochen

Re: [PATCH] RISC-V/testsuite: Restrict remaining `fmin'/`fmax' tests to hard float

2022-08-10 Thread Kito Cheng via Gcc-patches

LGTM, thanks :)

On Fri, Jul 29, 2022 at 3:52 AM Maciej W. Rozycki  wrote:
>
> Complement commit 7915f6551343 ("RISC-V/testsuite: constraint some of
> tests to hard_float") and also restrict the remaining `fmin'/`fmax'
> tests to hard-float test configurations.
>
> gcc/testsuite/
> * gcc.target/riscv/fmax-snan.c: Add `dg-require-effective-target
> hard_float'.
> * gcc.target/riscv/fmaxf-snan.c: Likewise.
> * gcc.target/riscv/fmin-snan.c: Likewise.
> * gcc.target/riscv/fminf-snan.c: Likewise.
> ---
>  gcc/testsuite/gcc.target/riscv/fmax-snan.c  |1 +
>  gcc/testsuite/gcc.target/riscv/fmaxf-snan.c |1 +
>  gcc/testsuite/gcc.target/riscv/fmin-snan.c  |1 +
>  gcc/testsuite/gcc.target/riscv/fminf-snan.c |1 +
>  4 files changed, 4 insertions(+)
>
> gcc-riscv-fmin-fmax-test-hard-float.diff
> Index: gcc/gcc/testsuite/gcc.target/riscv/fmax-snan.c
> ===
> --- gcc.orig/gcc/testsuite/gcc.target/riscv/fmax-snan.c
> +++ gcc/gcc/testsuite/gcc.target/riscv/fmax-snan.c
> @@ -1,4 +1,5 @@
>  /* { dg-do compile } */
> +/* { dg-require-effective-target hard_float } */
>  /* { dg-options "-fno-finite-math-only -fsigned-zeros -fsignaling-nans -dp" 
> } */
>
>  double
> Index: gcc/gcc/testsuite/gcc.target/riscv/fmaxf-snan.c
> ===
> --- gcc.orig/gcc/testsuite/gcc.target/riscv/fmaxf-snan.c
> +++ gcc/gcc/testsuite/gcc.target/riscv/fmaxf-snan.c
> @@ -1,4 +1,5 @@
>  /* { dg-do compile } */
> +/* { dg-require-effective-target hard_float } */
>  /* { dg-options "-fno-finite-math-only -fsigned-zeros -fsignaling-nans -dp" 
> } */
>
>  float
> Index: gcc/gcc/testsuite/gcc.target/riscv/fmin-snan.c
> ===
> --- gcc.orig/gcc/testsuite/gcc.target/riscv/fmin-snan.c
> +++ gcc/gcc/testsuite/gcc.target/riscv/fmin-snan.c
> @@ -1,4 +1,5 @@
>  /* { dg-do compile } */
> +/* { dg-require-effective-target hard_float } */
>  /* { dg-options "-fno-finite-math-only -fsigned-zeros -fsignaling-nans -dp" 
> } */
>
>  double
> Index: gcc/gcc/testsuite/gcc.target/riscv/fminf-snan.c
> ===
> --- gcc.orig/gcc/testsuite/gcc.target/riscv/fminf-snan.c
> +++ gcc/gcc/testsuite/gcc.target/riscv/fminf-snan.c
> @@ -1,4 +1,5 @@
>  /* { dg-do compile } */
> +/* { dg-require-effective-target hard_float } */
>  /* { dg-options "-fno-finite-math-only -fsigned-zeros -fsignaling-nans -dp" 
> } */
>
>  float

Re: [PATCH] RISC-V: Avoid redundant sign-extension for SImode SGE, SGEU, SLE, SLEU

2022-08-10 Thread Kito Cheng via Gcc-patches

LGTM, but with a nit, I don't get set.w but get an andi like below, so
maybe we should also scan-assembler-not andi? feel free to commit that
directly with that fix

```asm
sleu:
   sgtua0,a0,a1# 9 [c=4 l=4]  *sgtu_disi
   xoria0,a0,1 # 10[c=4 l=4]  *xorsi3_internal/1
   andia0,a0,1 # 16[c=4 l=4]  anddi3/1
   ret # 25[c=0 l=4]  simple_return
```

On Wed, Aug 3, 2022 at 5:54 PM Maciej W. Rozycki  wrote:
>
> We produce inefficient code for some synthesized SImode conditional set
> operations (i.e. ones that are not directly implemented in hardware) on
> RV64.  For example a piece of C code like this:
>
> int
> sleu (unsigned int x, unsigned int y)
> {
>   return x <= y;
> }
>
> gets compiled (at `-O2') to this:
>
> sleu:
> sgtua0,a0,a1# 9 [c=4 l=4]  *sgtu_disi
> xoria0,a0,1 # 10[c=4 l=4]  *xorsi3_internal/1
> sext.w  a0,a0   # 16[c=4 l=4]  extendsidi2/0
> ret # 25[c=0 l=4]  simple_return
>
> This is because the middle end expands a SLEU operation missing from
> RISC-V hardware into a sequence of a SImode SGTU operation followed by
> an explicit SImode XORI operation with immediate 1.  And while the SGTU
> machine instruction (alias SLTU with the input operands swapped) gives a
> properly sign-extended 32-bit result which is valid both as a SImode or
> a DImode operand the middle end does not see that through a SImode XORI
> operation, because we tell the middle end that the RISC-V target (unlike
> MIPS) may hold values in DImode integer registers that are valid for
> SImode operations even if not properly sign-extended.
>
> However the RISC-V psABI requires that 32-bit function arguments and
> results passed in 64-bit integer registers be properly sign-extended, so
> this is explicitly done at the conclusion of the function.
>
> Fix this by making the backend use a sequence of a DImode SGTU operation
> followed by a SImode SEQZ operation instead.  The latter operation is
> known by the middle end to produce a properly sign-extended 32-bit
> result and therefore combine gets rid of the sign-extension operation
> that follows and actually folds it into the very same XORI machine
> operation resulting in:
>
> sleu:
> sgtua0,a0,a1# 9 [c=4 l=4]  *sgtu_didi
> xoria0,a0,1 # 16[c=4 l=4]  xordi3/1
> ret # 25[c=0 l=4]  simple_return
>
> instead (although the SEQZ alias SLTIU against immediate 1 machine
> instruction would equally do and is actually retained at `-O0').  This
> is handled analogously for the remaining synthesized operations of this
> kind, i.e. `SLE', `SGEU', and `SGE'.
>
> gcc/
> * config/riscv/riscv.cc (riscv_emit_int_order_test): Use EQ 0
> rather that XOR 1 for LE and LEU operations.
>
> gcc/testsuite/
> * gcc.target/riscv/sge.c: New test.
> * gcc.target/riscv/sgeu.c: New test.
> * gcc.target/riscv/sle.c: New test.
> * gcc.target/riscv/sleu.c: New test.
> ---
> Hi,
>
>  Regression-tested with the `riscv64-linux-gnu' target.  OK to apply?
>
>   Maciej
> ---
>  gcc/config/riscv/riscv.cc |4 ++--
>  gcc/testsuite/gcc.target/riscv/sge.c  |   11 +++
>  gcc/testsuite/gcc.target/riscv/sgeu.c |   11 +++
>  gcc/testsuite/gcc.target/riscv/sle.c  |   11 +++
>  gcc/testsuite/gcc.target/riscv/sleu.c |   11 +++
>  5 files changed, 46 insertions(+), 2 deletions(-)
>
> gcc-riscv-int-order-inv-seqz.diff
> Index: gcc/gcc/config/riscv/riscv.cc
> ===
> --- gcc.orig/gcc/config/riscv/riscv.cc
> +++ gcc/gcc/config/riscv/riscv.cc
> @@ -2500,9 +2500,9 @@ riscv_emit_int_order_test (enum rtx_code
> }
>else if (invert_ptr == 0)
> {
> - rtx inv_target = riscv_force_binary (GET_MODE (target),
> + rtx inv_target = riscv_force_binary (word_mode,
>inv_code, cmp0, cmp1);
> - riscv_emit_binary (XOR, target, inv_target, const1_rtx);
> + riscv_emit_binary (EQ, target, inv_target, const0_rtx);
> }
>else
> {
> Index: gcc/gcc/testsuite/gcc.target/riscv/sge.c
> ===
> --- /dev/null
> +++ gcc/gcc/testsuite/gcc.target/riscv/sge.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target rv64 } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" } } */
> +
> +int
> +sge (int x, int y)
> +{
> +  return x >= y;
> +}
> +
> +/* { dg-final { scan-assembler-not "sext\\.w" } } */
> Index: gcc/gcc/testsuite/gcc.target/riscv/sgeu.c
> ===
> --- /dev/null
> +++ gcc/gcc/testsuite/gcc.target/riscv/sgeu.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-t

Re: [PATCH] PR106342 - IBM zSystems: Provide vsel for all vector modes

2022-08-10 Thread Andreas Krebbel via Gcc-patches

On 8/10/22 13:42, Ilya Leoshkevich wrote:
> On Wed, 2022-08-03 at 12:20 +0200, Ilya Leoshkevich wrote:
>> Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?
>>
>>
>>
>> dg.exp=pr104612.c fails with an ICE on s390x, because copysignv2sf3
>> produces an insn that vsel is supposed to recognize, but can't,
>> because it's not defined for V2SF.  Fix by defining it for all vector
>> modes supported by copysign3.
>>
>> gcc/ChangeLog:
>>
>> * config/s390/vector.md (V_HW_FT): New iterator.
>> * config/s390/vx-builtins.md (vsel): Use V instead of
>> V_HW.
>> ---
>>  gcc/config/s390/vector.md  |  6 ++
>>  gcc/config/s390/vx-builtins.md | 12 ++--
>>  2 files changed, 12 insertions(+), 6 deletions(-)
> 
> Jakub pointed out that this is broken in gcc-12 as well.
> The patch applies cleanly, and I started a bootstrap/regtest.
> Ok for gcc-12?

Yes. Thanks!

Andreas

Re: [PATCH v2] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]

2022-08-10 Thread Xionghu Luo via Gcc-patches





On 2022/8/11 01:07, Segher Boessenkool wrote:

On Wed, Aug 10, 2022 at 02:39:02PM +0800, Xionghu Luo wrote:

On 2022/8/9 11:01, Kewen.Lin wrote:

I have some concern on those changed "altivec_*_direct", IMHO the suffix
"_direct" is normally to indicate the define_insn is mapped to the
corresponding hw insn directly.  With this change, for example,
altivec_vmrghb_direct can be mapped into vmrghb or vmrglb, this looks
misleading.  Maybe we can add the corresponding _direct_le and _direct_be
versions, both are mapped into the same insn but have different RTL
patterns.  Looking forward to Segher's and David's suggestions.


Thanks!  Do you mean same RTL patterns with different hw insn?


A pattern called altivec_vmrghb_direct_le should always emit a vmrghb
instruction, never a vmrglb instead.  Misleading names are an expensive
problem.




Thanks.  Then on LE platforms, if user calls altivec_vmrghw，it will be
expanded to RTL (vec_select (vec_concat (R0 R1 (0 4 1 5))), and
finally matched to altivec_vmrglw_direct_v4si_le with ASM "vmrglw".
For BE just strict forward, seems more clear :-), OK for master?


[PATCH v3] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS 
[PR106069]

v3: rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
the actual output ASM vmrglb. Likewise for all similar xxx_direct_le
patterns.
v2: Split the direct pattern to be and le with same RTL but different insn.

The native RTL expression for vec_mrghw should be same for BE and LE as
they are register and endian-independent.  So both BE and LE need
generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw
with vec_select and vec_concat.

(set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
   (subreg:V4SI (reg:V16QI 139) 0)
   (subreg:V4SI (reg:V16QI 140) 0))
   [const_int 0 4 1 5]))

Then combine pass could do the nested vec_select optimization
in simplify-rtx.c:simplify_binary_operation_1 also on both BE and LE:

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}

=>

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}

The endianness check need only once at ASM generation finally.
ASM would be better due to nested vec_select simplified to simple scalar
load.

Regression tested pass for Power8{LE,BE}{32,64} and Power{9,10}LE{64}
Linux(Thanks to Kewen).

gcc/ChangeLog:

PR target/106069
* config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
(altivec_vmrghb_direct_be): New pattern for BE.
(altivec_vmrglb_direct_le): New pattern for LE.
(altivec_vmrghh_direct): Remove.
(altivec_vmrghh_direct_be): New pattern for BE.
(altivec_vmrglh_direct_le): New pattern for LE.
(altivec_vmrghw_direct_): Remove.
(altivec_vmrghw_direct__be): New pattern for BE.
(altivec_vmrglw_direct__le): New pattern for LE.
(altivec_vmrglb_direct): Remove.
(altivec_vmrglb_direct_be): New pattern for BE.
(altivec_vmrghb_direct_le): New pattern for LE.
(altivec_vmrglh_direct): Remove.
(altivec_vmrglh_direct_be): New pattern for BE.
(altivec_vmrghh_direct_le): New pattern for LE.
(altivec_vmrglw_direct_): Remove.
(altivec_vmrglw_direct__be): New pattern for BE.
(altivec_vmrghw_direct__le): New pattern for LE.
* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const):
Adjust.
* config/rs6000/vsx.md: Likewise.

gcc/testsuite/ChangeLog:

PR target/106069
* g++.target/powerpc/pr106069.C: New test.

Signed-off-by: Xionghu Luo 
---
 gcc/config/rs6000/altivec.md| 223 ++--
 gcc/config/rs6000/rs6000.cc |  36 ++--
 gcc/config/rs6000/vsx.md|  24 +--
 gcc/testsuite/g++.target/powerpc/pr106069.C | 120 +++
 4 files changed, 305 insertions(+), 98 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 2c4940f2e21..78245f470e9 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1144,15 +1144,17 @@ (define_expand "altivec_vmrghb"
(use (match_operand:V16QI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
-   : gen_altivec_vmrglb_direct;
-  if (!BYTES_BIG_ENDIAN)
-std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  rtvec v = gen_rtvec (16, GEN_INT (0), GEN_INT (16), GEN_INT (1), GEN_INT 
(17),
+ GEN_INT (2), GEN_INT (18), GEN_INT (3), GEN_INT (19),
+ GEN_INT (4), GEN_INT (20), GEN_INT (5), GEN_INT (21),
+ GEN_I

Re: [09/23] Add a cut-down version of std::span (array_slice)

2022-08-10 Thread Richard Biener via Gcc-patches

On Wed, Aug 10, 2022 at 6:04 PM Martin Jambor  wrote:
>
> Hello,
>
> I have one more question/comment about array_slice.  Ever since I
> started to use it...
>
> On Fri, Nov 13 2020, Richard Sandiford via Gcc-patches wrote:
> > A later patch wants to be able to pass around subarray views of an
> > existing array.  The standard class to do that is std::span, but it's
> > a C++20 thing.  This patch just adds a cut-down version of it.
> >
> > The intention is just to provide what's currently needed.
> >
> > gcc/
> >   * vec.h (array_slice): New class.
> > ---
> >  gcc/vec.h | 120 ++
> >  1 file changed, 120 insertions(+)
> >
> > diff --git a/gcc/vec.h b/gcc/vec.h
> > index f02beddc975..7768de9f518 100644
> > --- a/gcc/vec.h
> > +++ b/gcc/vec.h
> > @@ -2128,6 +2128,126 @@ release_vec_vec (vec > &vec)
> >vec.release ();
> >  }
> >
> > +// Provide a subset of the std::span functionality.  (We can't use 
> > std::span
> > +// itself because it's a C++20 feature.)
> > +//
> > +// In addition, provide an invalid value that is distinct from all valid
> > +// sequences (including the empty sequence).  This can be used to return
> > +// failure without having to use std::optional.
> > +//
> > +// There is no operator bool because it would be ambiguous whether it is
> > +// testing for a valid value or an empty sequence.
> > +template
> > +class array_slice
> > +{
> > +  template friend class array_slice;
> > +
> > +public:
> > +  using value_type = T;
> > +  using iterator = T *;
> > +  using const_iterator = const T *;
> > +
> > +  array_slice () : m_base (nullptr), m_size (0) {}
> > +
> > +  template
> > +  array_slice (array_slice other)
> > +: m_base (other.m_base), m_size (other.m_size) {}
> > +
> > +  array_slice (iterator base, unsigned int size)
> > +: m_base (base), m_size (size) {}
> > +
> > +  template
> > +  array_slice (T (&array)[N]) : m_base (array), m_size (N) {}
> > +
> > +  template
> > +  array_slice (const vec &v)
> > +: m_base (v.address ()), m_size (v.length ()) {}
> > +
> > +  iterator begin () { return m_base; }
> > +  iterator end () { return m_base + m_size; }
> > +
> > +  const_iterator begin () const { return m_base; }
> > +  const_iterator end () const { return m_base + m_size; }
> > +
> > +  value_type &front ();
> > +  value_type &back ();
> > +  value_type &operator[] (unsigned int i);
> > +
> > +  const value_type &front () const;
> > +  const value_type &back () const;
> > +  const value_type &operator[] (unsigned int i) const;
> > +
> > +  size_t size () const { return m_size; }
>
> ...this has been a constant source of compile errors, because vectors
> have length () and this is size ().
>
> I understand that the motivation was consistency with std::span, but do
> we really want to add another inconsistency with ourselves?
>
> Given that array_slice is not that much used yet, I believe we can still
> change to be consistent with vectors.  I personally think we should but
> at the very least, if we keep it as it is, I'd like us to do so
> deliberately.

We could alternatively add length in addition to size (and maybe size to
vec<> if std::vector has size but not length) with a comment deprecating
the "non-standard" variant?

Richard.

>
> Thanks,
>
> Martin
>

52 matches

Mail list logo