Re: [PATCH] x86: Convert CONST_WIDE_INT to broadcast in move expanders

2021-06-03 Thread Uros Bizjak via Gcc-patches
On Thu, Jun 3, 2021 at 5:49 AM H.J. Lu  wrote:
>
> Update move expanders to convert the CONST_WIDE_INT operand to vector
> broadcast from a byte with AVX2.  Add ix86_gen_scratch_sse_rtx to
> return a scratch SSE register which won't increase stack alignment
> requirement and blocks transformation by the combine pass.

Using fixed scratch reg is just too hackish for my taste. The
expansion is OK to emit some optimized sequence, but the approach to
use fixed reg to bypass stack alignment functionality and combine is
not.

Perhaps a new insn pattern should be introduced, e.g.

(define_insn_and_split ""
   [(set (match_opreand:V 0 "memory_operand" "=m,m")
(vec_duplicate:V
  (match_operand:S 2 "reg_or_0_operand" "r,C"))
(clobber (match_scratch:V 1 "=x"))]

and split it at some appropriate point.

Uros.

>
> A small benchmark:
>
> https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/memset/broadcast
>
> shows that broadcast is a little bit faster on Intel Core i7-8559U:
>
> $ make
> gcc -g -I. -O2   -c -o test.o test.c
> gcc -g   -c -o memory.o memory.S
> gcc -g   -c -o broadcast.o broadcast.S
> gcc -o test test.o memory.o broadcast.o
> ./test
> memory   : 99333
> broadcast: 97208
> $
>
> broadcast is also smaller:
>
> $ size memory.o broadcast.o
>textdata bss dec hex filename
> 132   0   0 132  84 memory.o
> 122   0   0 122  7a broadcast.o
> $
>
> gcc/
>
> PR target/100865
> * config/i386/i386-expand.c (ix86_expand_vector_init_duplicate):
> New prototype.
> (ix86_byte_broadcast): New function.
> (ix86_convert_const_wide_int_to_broadcast): Likewise.
> (ix86_expand_move): Try ix86_convert_const_wide_int_to_broadcast
> if mode size is 16 bytes or bigger.
> (ix86_expand_vector_move): Try
> ix86_convert_const_wide_int_to_broadcast.
> * config/i386/i386-protos.h (ix86_gen_scratch_sse_rtx): New
> prototype.
> * config/i386/i386.c (ix86_minimum_incoming_stack_boundary): Add
> an argument to ignore stack_alignment_estimated.  It is passed
> as false by default.
> (ix86_gen_scratch_sse_rtx): New function.
>
> gcc/testsuite/
>
> PR target/100865
> * gcc.target/i386/pr100865-1.c: New test.
> * gcc.target/i386/pr100865-2.c: Likewise.
> * gcc.target/i386/pr100865-3.c: Likewise.
> * gcc.target/i386/pr100865-4.c: Likewise.
> * gcc.target/i386/pr100865-5.c: Likewise.
> ---
>  gcc/config/i386/i386-expand.c  | 103 ++---
>  gcc/config/i386/i386-protos.h  |   2 +
>  gcc/config/i386/i386.c |  50 +-
>  gcc/testsuite/gcc.target/i386/pr100865-1.c |  13 +++
>  gcc/testsuite/gcc.target/i386/pr100865-2.c |  14 +++
>  gcc/testsuite/gcc.target/i386/pr100865-3.c |  15 +++
>  gcc/testsuite/gcc.target/i386/pr100865-4.c |  16 
>  gcc/testsuite/gcc.target/i386/pr100865-5.c |  17 
>  8 files changed, 215 insertions(+), 15 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-4.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-5.c
>
> diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> index 4185f58eed5..658adafa269 100644
> --- a/gcc/config/i386/i386-expand.c
> +++ b/gcc/config/i386/i386-expand.c
> @@ -93,6 +93,9 @@ along with GCC; see the file COPYING3.  If not see
>  #include "i386-builtins.h"
>  #include "i386-expand.h"
>
> +static bool ix86_expand_vector_init_duplicate (bool, machine_mode, rtx,
> +  rtx);
> +
>  /* Split one or more double-mode RTL references into pairs of half-mode
> references.  The RTL can be REG, offsettable MEM, integer constant, or
> CONST_DOUBLE.  "operands" is a pointer to an array of double-mode RTLs to
> @@ -190,6 +193,65 @@ ix86_expand_clear (rtx dest)
>emit_insn (tmp);
>  }
>
> +/* Return a byte value which V can be broadcasted from.  Otherwise,
> +   return INT_MAX.  */
> +
> +static int
> +ix86_byte_broadcast (HOST_WIDE_INT v)
> +{
> +  wide_int val = wi::uhwi (v, HOST_BITS_PER_WIDE_INT);
> +  int byte_broadcast = wi::extract_uhwi (val, 0, BITS_PER_UNIT);
> +  for (unsigned int i = BITS_PER_UNIT;
> +   i < HOST_BITS_PER_WIDE_INT;
> +   i += BITS_PER_UNIT)
> +{
> +  int byte = wi::extract_uhwi (val, i, BITS_PER_UNIT);
> +  if (byte_broadcast != byte)
> +   return INT_MAX;
> +}
> +  return byte_broadcast;
> +}
> +
> +/* Convert the CONST_WIDE_INT operand OP to broadcast in MODE.  */
> +
> +static rtx
> +ix86_convert_const_wide_int_to_broadcast (machine_mode mode, rtx op)
> +{
> +  rtx target = nullptr;
> +
> +  /* Convert CONST_WIDE_INT to broadcast only if vector broadc

Re: [PATCH] ARC: gcc driver default to hs38_linux

2021-06-03 Thread Claudiu Zissulescu via Gcc-patches
Right, we can consider this cpu switch missing in gcc11.

Best,
Claudiu

From: Vineet Gupta 
Sent: Wednesday, June 2, 2021 8:36 PM
To: Claudiu Zissulescu ; gcc-patches@gcc.gnu.org 

Cc: linux-snps-...@lists.infradead.org 
Subject: Re: [PATCH] ARC: gcc driver default to hs38_linux

On 6/2/21 1:38 AM, Claudiu Zissulescu wrote:
> Approved.

Thx for the super quick action on this Claudiu. Can this be slated for
backports too as it causes issues when building toolchains for modern
cores without explicit defaults.

-Vineet

>
> //Claudiu
> 
> *From:* Vineet Gupta 
> *Sent:* Tuesday, June 1, 2021 10:42 PM
> *To:* gcc-patches@gcc.gnu.org 
> *Cc:* Claudiu Zissulescu ;
> linux-snps-...@lists.infradead.org
> ; Vineet Gupta 
> *Subject:* [PATCH] ARC: gcc driver default to hs38_linux
> arc700 is legacy and there's no active development for it, so switch to
> latest hs38_linux as default
>
> Signed-off-by: Vineet Gupta 
> ---
>  gcc/config/arc/arc.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
> index bd1fe0abd7af..252241a858c9 100644
> --- a/gcc/config/arc/arc.h
> +++ b/gcc/config/arc/arc.h
> @@ -34,7 +34,7 @@ along with GCC; see the file COPYING3. If not see
>  #define SYMBOL_FLAG_CMEM(SYMBOL_FLAG_MACH_DEP << 3)
>
>  #ifndef TARGET_CPU_DEFAULT
> -#define TARGET_CPU_DEFAULT PROCESSOR_arc700
> +#define TARGET_CPU_DEFAULT PROCESSOR_hs38_linux
>  #endif
>
>  /* Check if this symbol has a long_call attribute in its declaration */
> --
> 2.25.1
>



Re: [RFC/PATCH 00/11] Fix up some unexpected empty split conditions

2021-06-03 Thread Segher Boessenkool
Hi!

On Thu, Jun 03, 2021 at 01:22:38PM +0800, Kewen.Lin wrote:
> on 2021/6/3 上午7:52, Segher Boessenkool wrote:
> >> - add a new "define_independent_insn_and_split" that has the
> >>   current semantics of define_insn_and_split.  This should be
> >>   mechanical.
> > 
> > I'd rather not have that -- we can just write separate define_insn and
> > define_split in that case.
> 
> Not sure if someone would argue that he/she would like to go with one shared
> pattern as before, to avoid any possible differences between two seperated
> patterns and have good maintainability (like only editing on place) and
> slightly better efficiency.

You only would do this if you have a different insn condition and split
condition, which is a very important thing to know, it doesn't hurt to
draw attention to it.  The efficiency is exactly the same btw,
define_insn_and_split is just syntactic sugar.

The whole point of requiring the split condition to start with && is so
it will become harder to mess things up (it will make the gen* code a
tiny little bit simpler as well).  And there is no transition period or
anything like that needed either.  Just the bunch that will break will
need fixing.  So let's find out how many of those there are :-)

> > How many such cases *are* there?  There are no users exposed to this,
> > and when the split condition is required to start with "&&" (instead of
> > getting that implied) it is not a silent change ever, either.
> 
> If I read the proposal right, the explicit "&&" is only required when going
> to find all potential problematic places for final implied "&&" change.
> But one explicit "&&" does offer good readability.

My proposal is to always require && (or maybe identical insn and split
conditions should be allowed as well, if people still use that -- that
is how we wrote "&& 1" before that existed).


Segher


Re: [RFC/PATCH 00/11] Fix up some unexpected empty split conditions

2021-06-03 Thread Richard Sandiford via Gcc-patches
"Kewen.Lin"  writes:
> Hi Richi/Richard/Jeff/Segher,
>
> Thanks for the comments!
>
> on 2021/6/3 锟斤拷锟斤拷7:52, Segher Boessenkool wrote:
>> On Wed, Jun 02, 2021 at 06:32:13PM +0100, Richard Sandiford wrote:
>>> Richard Biener  writes:
 So what Richard suggests would be to disallow split conditions
 that do not start with "&& ", it's probably easy to do that as well
 and look for build fails.  That should catch all cases to look at.
>>>
>>> Yeah.  As a strawman proposal, how about:
>>>
>>> - add a new "define_independent_insn_and_split" that has the
>>>   current semantics of define_insn_and_split.  This should be
>>>   mechanical.
>> 
>> I'd rather not have that -- we can just write separate define_insn and
>> define_split in that case.
>> 
>
> Not sure if someone would argue that he/she would like to go with one shared
> pattern as before, to avoid any possible differences between two seperated
> patterns and have good maintainability (like only editing on place) and
> slightly better efficiency.

Right.  Plus it creates less make-work.  If we didn't have it, someone
would need to split the define_insn_and_splits that don't currently
use "&&", then later someone might decide that the missing "&&" was a
mistake and need to put them together again (or just revert the patch
and edit from there, I guess).

Plus define_independent_insn_and_split would act as a flag for something
that might be suspect.  If we split them then the define_split condition
will seem to have been chosen deliberately in isolation.

>> How many such cases *are* there?  There are no users exposed to this,
>> and when the split condition is required to start with "&&" (instead of
>> getting that implied) it is not a silent change ever, either.
>> 
>
> If I read the proposal right, the explicit "&&" is only required when going
> to find all potential problematic places for final implied "&&" change.
> But one explicit "&&" does offer good readability.

I don't know.  "&& 1" looks kind of weird to me.

One thing I'd been wondering about a while ago was whether we should key
the split part of define_insn_and_splits off the insn code, instead of
repeating the pattern match and insn C condition.  That would make the
split apply only to the associated define_insns, whereas at the moment
they also apply to any earlier (less general) define_insn that happens
to match the pattern and the C conditions.  It would also reduce the
complexity of the autogenerated define_split logic.

I don't know whether that's a good idea or not.  But having an explicit
"&&" implies that the generator shouldn't do that, and that it should
retest the insn condition from scratch.

>>> - find the define_insn_and_splits that are missing the "&&", and where
>>>   missing the "&&" might make a difference.  Change them to
>>>   define_independent_insn_and_splits.
>>>
>>>   Like Richard says, this can be done by temporarily disallowing
>>>   define_insn_and_splits that have no "&&".
>> 
>> If we make that change permanently, that is all steps we ever need!
>> 
>
> So the question is that: whether we need to demand an explicit "&&".
> Richard's proposal is for answer "no" which aligns with Richi's auto
> filling advice before.  I think it would result in fewer changes since
> those places without explicit "&&" are mostly unintentional, all the jobs
> are done by implied "&&".  Its downside seems to be bad readability, new
> readers may take it as two seperated conditions at first glance, but I
> guess if we emphasize this change in the document it would be fine?
> Or emitting one warning if missing an explicit "&&"?

IMO the natural way to read it is that the split C condition gives the
conditions under which the instruction should be split.  I think that's
why forgetting the "&&" is such a common mistake.  (I know I've done it
plenty of times.)

IMO requiring the "&&" is baking in an alternative, less intuitive,
interpretation.

Thanks,
Richard


Re: [PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-06-03 Thread Trevor Saunders
On Wed, Jun 02, 2021 at 10:04:03AM -0600, Martin Sebor via Gcc-patches wrote:
> On 6/2/21 12:55 AM, Richard Biener wrote:
> > On Tue, Jun 1, 2021 at 9:56 PM Martin Sebor  wrote:
> > > 
> > > On 5/27/21 2:53 PM, Jason Merrill wrote:
> > > > On 4/27/21 11:52 AM, Martin Sebor via Gcc-patches wrote:
> > > > > On 4/27/21 8:04 AM, Richard Biener wrote:
> > > > > > On Tue, Apr 27, 2021 at 3:59 PM Martin Sebor  
> > > > > > wrote:
> > > > > > > 
> > > > > > > On 4/27/21 1:58 AM, Richard Biener wrote:
> > > > > > > > On Tue, Apr 27, 2021 at 2:46 AM Martin Sebor via Gcc-patches
> > > > > > > >  wrote:
> > > > > > > > > 
> > > > > > > > > PR 90904 notes that auto_vec is unsafe to copy and assign 
> > > > > > > > > because
> > > > > > > > > the class manages its own memory but doesn't define (or 
> > > > > > > > > delete)
> > > > > > > > > either special function.  Since I first ran into the problem,
> > > > > > > > > auto_vec has grown a move ctor and move assignment from
> > > > > > > > > a dynamically-allocated vec but still no copy ctor or copy
> > > > > > > > > assignment operator.
> > > > > > > > > 
> > > > > > > > > The attached patch adds the two special functions to auto_vec 
> > > > > > > > > along
> > > > > > > > > with a few simple tests.  It makes auto_vec safe to use in 
> > > > > > > > > containers
> > > > > > > > > that expect copyable and assignable element types and passes
> > > > > > > > > bootstrap
> > > > > > > > > and regression testing on x86_64-linux.
> > > > > > > > 
> > > > > > > > The question is whether we want such uses to appear since those
> > > > > > > > can be quite inefficient?  Thus the option is to delete those
> > > > > > > > operators?
> > > > > > > 
> > > > > > > I would strongly prefer the generic vector class to have the 
> > > > > > > properties
> > > > > > > expected of any other generic container: copyable and assignable. 
> > > > > > >  If
> > > > > > > we also want another vector type with this restriction I suggest 
> > > > > > > to add
> > > > > > > another "noncopyable" type and make that property explicit in its 
> > > > > > > name.
> > > > > > > I can submit one in a followup patch if you think we need one.
> > > > > > 
> > > > > > I'm not sure (and not strictly against the copy and assign).  
> > > > > > Looking
> > > > > > around
> > > > > > I see that vec<> does not do deep copying.  Making auto_vec<> do it
> > > > > > might be surprising (I added the move capability to match how vec<>
> > > > > > is used - as "reference" to a vector)
> > > > > 
> > > > > The vec base classes are special: they have no ctors at all (because
> > > > > of their use in unions).  That's something we might have to live with
> > > > > but it's not a model to follow in ordinary containers.
> > > > 
> > > > I don't think we have to live with it anymore, now that we're writing
> > > > C++11.
> > > > 
> > > > > The auto_vec class was introduced to fill the need for a conventional
> > > > > sequence container with a ctor and dtor.  The missing copy ctor and
> > > > > assignment operators were an oversight, not a deliberate feature.
> > > > > This change fixes that oversight.

I've been away a while, but trying to get back into this, sorry.  It was
definitely an oversight to leave these undefined for the compiler to
provide a default definition of, but I agree with Richi, the better
thing to have done, or do now would be to mark them as deleted and make
auto_vec move only (with copy() for when you really need a deep copy.
> > > > > 
> > > > > The revised patch also adds a copy ctor/assignment to the auto_vec
> > > > > primary template (that's also missing it).  In addition, it adds
> > > > > a new class called auto_vec_ncopy that disables copying and
> > > > > assignment as you prefer.
> > > > 
> > > > Hmm, adding another class doesn't really help with the confusion richi
> > > > mentions.  And many uses of auto_vec will pass them as vec, which will
> > > > still do a shallow copy.  I think it's probably better to disable the
> > > > copy special members for auto_vec until we fix vec<>.
> > > 
> > > There are at least a couple of problems that get in the way of fixing
> > > all of vec to act like a well-behaved C++ container:
> > > 
> > > 1) The embedded vec has a trailing "flexible" array member with its
> > > instances having different size.  They're initialized by memset and
> > > copied by memcpy.  The class can't have copy ctors or assignments
> > > but it should disable/delete them instead.
> > > 
> > > 2) The heap-based vec is used throughout GCC with the assumption of
> > > shallow copy semantics (not just as function arguments but also as
> > > members of other such POD classes).  This can be changed by providing
> > > copy and move ctors and assignment operators for it, and also for
> > > some of the classes in which it's a member and that are used with
> > > the same assumption.
> > > 
> > > 3) The heap-based vec::block_remove() assumes its elements are PODs.
> > > That breaks in VEC

Re: [arm/testsuite]: Skip pr97969.c if -mthumb is not compatible [PR target/97969]

2021-06-03 Thread Christophe Lyon via Gcc-patches
On Mon, 15 Mar 2021 at 17:03, Christophe Lyon
 wrote:
>
> On Wed, 3 Mar 2021 at 15:00, Richard Earnshaw
>  wrote:
> >
> > On 02/03/2021 18:35, Christophe Lyon wrote:
> > > On Tue, 2 Mar 2021 at 19:18, Richard Earnshaw
> > >  wrote:
> > >>
> > >> On 02/03/2021 18:14, Richard Earnshaw via Gcc-patches wrote:
> > >>> On 02/03/2021 18:10, Christophe Lyon wrote:
> >  On Tue, 2 Mar 2021 at 17:25, Richard Earnshaw
> >   wrote:
> > >
> > > On 02/03/2021 16:19, Richard Earnshaw via Gcc-patches wrote:
> > >> On 01/03/2021 15:26, Christophe Lyon via Gcc-patches wrote:
> > >>> Ping?
> > >>>
> > >>> On Wed, 3 Feb 2021 at 10:01, Christophe Lyon 
> > >>>  wrote:
> > 
> >  Ping?
> >  I guess that's obvious enough?
> > 
> >  On Wed, 27 Jan 2021 at 10:03, Christophe Lyon
> >   wrote:
> > >
> > > Depending on how the toolchain is configured or how the testsuite 
> > > is
> > > executed, -mthumb may not be compatible. Like for other tests, 
> > > skip
> > > pr97969.c in this case.
> > >
> > > For instance arm-linux-gnueabihf and -march=armv5t in 
> > > RUNTESTFLAGS.
> > >
> > > 2021-01-27  Christophe Lyon  
> > >
> > > gcc/testsuite/
> > > PR target/97969
> > > * gcc.target/arm/pr97969.c: Skip if thumb mode is not available.
> > >
> > > diff --git a/gcc/testsuite/gcc.target/arm/pr97969.c
> > > b/gcc/testsuite/gcc.target/arm/pr97969.c
> > > index 714a1d1..0b5d07f 100644
> > > --- a/gcc/testsuite/gcc.target/arm/pr97969.c
> > > +++ b/gcc/testsuite/gcc.target/arm/pr97969.c
> > > @@ -1,4 +1,5 @@
> > >  /* { dg-do compile } */
> > > +/* { dg-skip-if "" { ! { arm_thumb1_ok || arm_thumb2_ok } } } */
> > >  /* { dg-options "-std=c99 -fno-omit-frame-pointer -mthumb -w 
> > > -Os" } */
> > >
> > >  typedef a[23];
> > >>
> > >> I'm working on a patch to make this sort of change unnecessary (I 
> > >> hope).
> > >>  Just running some final checks.
> > >>
> > >> R.
> > >>
> > >
> > > Ah, wait.  This one already has an explicit -mthumb, so my patch won't
> > > affect this.  But why is -mthumb needed for this test anyway?  It's 
> > > just
> > > a compilation test, so why not drop that and we'll generally get 
> > > better
> > > coverage all round.
> > >
> > 
> >  For instance I see the test fail for target arm-none-linux-gnueabihf
> >  --with-mode arm --with-cpu cortex-a9 --with-fpu vfp
> >  and running the tests with -march=armv5t
> > 
> >  We get the famous thumb-1 + hard-float ABI not supported.
> > 
> >  I guess -mthumb is inherited from the bug report?
> > 
> >  Christophe
> > 
> > >>>
> > >>> dropping the -mthumb should fix that though?
> > >>>
> > >>> In fact, I'd drop -Os as well, it's not needed as -Os is just one of the
> > >>> many options that are used to build this test already.
> > >>>
> > >>> R.
> > >>>
> > >>
> > >> But maybe then we need to change dg-options into dg-add-options.
> > >>
> > >
> > > Not sure to follow: the test is compiled only once, with:
> > > -std=c99 -fno-omit-frame-pointer -mthumb -w -Os
> > > in my logs
> > >
> >
> > I think it's only run the once /because/ the test sets dg-options rather
> > than dg-add-options.
> >
>
> Hi, sorry for the delay...
> I guess you mean dg-additional-options ?
> I did try that, to be sure, but the tests in gcc.target/arm are only
> compiled once.
>
> Back to the original discussion, if we drop -mthumb, which is required
> according to the PR (?), how do we ensure coverage? Sure I'm running
> the testsuite with various RUNTESTFLAGS settings, but wouldn't it be
> better to test what the PR reports by default?
>

Hi,

I'm resurrecting this discussion since Vladimir backported his patch
to gcc-9, and I just received a new failure warning from validation on
that branch.

Richard, any update?

Thanks

Christophe

> Thanks
>
> Christophe
> .


Re: [arm/testsuite]: Skip pr97969.c if -mthumb is not compatible [PR target/97969]

2021-06-03 Thread Christophe Lyon via Gcc-patches
On Thu, 3 Jun 2021 at 10:38, Christophe Lyon  wrote:
>
> On Mon, 15 Mar 2021 at 17:03, Christophe Lyon
>  wrote:
> >
> > On Wed, 3 Mar 2021 at 15:00, Richard Earnshaw
> >  wrote:
> > >
> > > On 02/03/2021 18:35, Christophe Lyon wrote:
> > > > On Tue, 2 Mar 2021 at 19:18, Richard Earnshaw
> > > >  wrote:
> > > >>
> > > >> On 02/03/2021 18:14, Richard Earnshaw via Gcc-patches wrote:
> > > >>> On 02/03/2021 18:10, Christophe Lyon wrote:
> > >  On Tue, 2 Mar 2021 at 17:25, Richard Earnshaw
> > >   wrote:
> > > >
> > > > On 02/03/2021 16:19, Richard Earnshaw via Gcc-patches wrote:
> > > >> On 01/03/2021 15:26, Christophe Lyon via Gcc-patches wrote:
> > > >>> Ping?
> > > >>>
> > > >>> On Wed, 3 Feb 2021 at 10:01, Christophe Lyon 
> > > >>>  wrote:
> > > 
> > >  Ping?
> > >  I guess that's obvious enough?
> > > 
> > >  On Wed, 27 Jan 2021 at 10:03, Christophe Lyon
> > >   wrote:
> > > >
> > > > Depending on how the toolchain is configured or how the 
> > > > testsuite is
> > > > executed, -mthumb may not be compatible. Like for other tests, 
> > > > skip
> > > > pr97969.c in this case.
> > > >
> > > > For instance arm-linux-gnueabihf and -march=armv5t in 
> > > > RUNTESTFLAGS.
> > > >
> > > > 2021-01-27  Christophe Lyon  
> > > >
> > > > gcc/testsuite/
> > > > PR target/97969
> > > > * gcc.target/arm/pr97969.c: Skip if thumb mode is not available.
> > > >
> > > > diff --git a/gcc/testsuite/gcc.target/arm/pr97969.c
> > > > b/gcc/testsuite/gcc.target/arm/pr97969.c
> > > > index 714a1d1..0b5d07f 100644
> > > > --- a/gcc/testsuite/gcc.target/arm/pr97969.c
> > > > +++ b/gcc/testsuite/gcc.target/arm/pr97969.c
> > > > @@ -1,4 +1,5 @@
> > > >  /* { dg-do compile } */
> > > > +/* { dg-skip-if "" { ! { arm_thumb1_ok || arm_thumb2_ok } } } 
> > > > */
> > > >  /* { dg-options "-std=c99 -fno-omit-frame-pointer -mthumb -w 
> > > > -Os" } */
> > > >
> > > >  typedef a[23];
> > > >>
> > > >> I'm working on a patch to make this sort of change unnecessary (I 
> > > >> hope).
> > > >>  Just running some final checks.
> > > >>
> > > >> R.
> > > >>
> > > >
> > > > Ah, wait.  This one already has an explicit -mthumb, so my patch 
> > > > won't
> > > > affect this.  But why is -mthumb needed for this test anyway?  It's 
> > > > just
> > > > a compilation test, so why not drop that and we'll generally get 
> > > > better
> > > > coverage all round.
> > > >
> > > 
> > >  For instance I see the test fail for target arm-none-linux-gnueabihf
> > >  --with-mode arm --with-cpu cortex-a9 --with-fpu vfp
> > >  and running the tests with -march=armv5t
> > > 
> > >  We get the famous thumb-1 + hard-float ABI not supported.
> > > 
> > >  I guess -mthumb is inherited from the bug report?
> > > 
> > >  Christophe
> > > 
> > > >>>
> > > >>> dropping the -mthumb should fix that though?
> > > >>>
> > > >>> In fact, I'd drop -Os as well, it's not needed as -Os is just one of 
> > > >>> the
> > > >>> many options that are used to build this test already.
> > > >>>
> > > >>> R.
> > > >>>
> > > >>
> > > >> But maybe then we need to change dg-options into dg-add-options.
> > > >>
> > > >
> > > > Not sure to follow: the test is compiled only once, with:
> > > > -std=c99 -fno-omit-frame-pointer -mthumb -w -Os
> > > > in my logs
> > > >
> > >
> > > I think it's only run the once /because/ the test sets dg-options rather
> > > than dg-add-options.
> > >
> >
> > Hi, sorry for the delay...
> > I guess you mean dg-additional-options ?
> > I did try that, to be sure, but the tests in gcc.target/arm are only
> > compiled once.
> >
> > Back to the original discussion, if we drop -mthumb, which is required
> > according to the PR (?), how do we ensure coverage? Sure I'm running
> > the testsuite with various RUNTESTFLAGS settings, but wouldn't it be
> > better to test what the PR reports by default?
> >
>
> Hi,
>
> I'm resurrecting this discussion since Vladimir backported his patch
> to gcc-9, and I just received a new failure warning from validation on
> that branch.
>
Sorry, I meant gcc-10.

> Richard, any update?
>
> Thanks
>
> Christophe
>
> > Thanks
> >
> > Christophe
> > .


Re: [backport gcc10, gcc9] Requet to backport PR97969

2021-06-03 Thread Christophe Lyon via Gcc-patches
On Thu, 3 Jun 2021 at 00:31, Przemyslaw Wirkus via Gcc-patches
 wrote:
>
> Hi,
>
> > -Original Message-
> > From: Vladimir Makarov 
> > Sent: 31 May 2021 16:52
> > To: Przemyslaw Wirkus ; Richard Biener
> > 
> > Cc: gcc-patches@gcc.gnu.org; ja...@redhat.com; ni...@redhat.com;
> > Richard Earnshaw ; Ramana Radhakrishnan
> > ; Kyrylo Tkachov
> > 
> > Subject: Re: [backport gcc10, gcc9] Requet to backport PR97969
> >
> >
> > On 2021-05-25 5:14 a.m., Przemyslaw Wirkus wrote:
> > > Hi,
> > > Just a follow up after GCC 11 release.
> > >
> > > I've backported to gcc-10 branch (without any change to original patches)
> > > PR97969 and following PR98722 & PR98777 patches.
> > >
> > > Commits apply cleanly without changes.
> > > Built and regression tested on:
> > > * arm-none-eabi and
> > > * aarch64-none-linux-gnu cross toolchains.
> > >
> > > There were no issues and no regressions (all OK).
> > >
> > > OK for backport to gcc-10 branch ?
> >
> > Sorry for delay with the answer due to my vacation.
> >
> > As the patches did not introduce new PRs I believe they are ok for gcc-10.
>
> Backported to gcc-10 branch. Thank you for your support.
>

Hi,

I'm surprised to see many new errors on arm after the backport for PR98722
See: 
https://people.linaro.org/~christophe.lyon/cross-validation/gcc/gcc-10/r10-9881-g1791b11d9cae388ae18a768eeb96c998439c986a/report-build-info.html

Przemyslaw, Vladimir do you confirm r10-9881 has no such errors (new
ICEs) on your side?

Thanks

> Kind regards
> Przemyslaw
>
> > Thank you.
> >
> > >
> > > Kind regards,
> > > Przemyslaw Wirkus
> > >
> > > ---
> > > commits I've backported:
> > >
> > > commit cf2ac1c30af0fa783c8d72e527904dda5d8cc330
> > > Author: Vladimir N. Makarov 
> > > Date:   Tue Jan 12 11:26:15 2021 -0500
> > >
> > >  [PR97969] LRA: Transform pattern `plus (plus (hard reg, const), 
> > > pseudo)`
> > after elimination
> > >
> > > commit 4334b524274203125193a08a8485250c41c2daa9
> > > Author: Vladimir N. Makarov 
> > > Date:   Wed Jan 20 11:40:14 2021 -0500
> > >
> > >  [PR98722] LRA: Check that target has no 3-op add insn to transform 2
> > plus expression.
> > >
> > > commit 68ba1039c7daf0485b167fe199ed7e8031158091
> > > Author: Vladimir N. Makarov 
> > > Date:   Thu Jan 21 17:27:01 2021 -0500
> > >
> > >  [PR98777] LRA: Use preliminary created pseudo for in LRA elimination
> > subpass
> > >
> > > $ ./contrib/git-backport.py cf2ac1c30af0fa783c8d72e527904dda5d8cc330
> > > $ ./contrib/git-backport.py 4334b524274203125193a08a8485250c41c2daa9
> > > $ ./contrib/git-backport.py 68ba1039c7daf0485b167fe199ed7e8031158091
> > >
> > >
> > >> Richard.
>


RE: [backport gcc10, gcc9] Requet to backport PR97969

2021-06-03 Thread Przemyslaw Wirkus via Gcc-patches


> -Original Message-
> From: Christophe Lyon 
> Sent: 03 June 2021 09:45
> To: Przemyslaw Wirkus 
> Cc: Vladimir Makarov ; ja...@redhat.com; Richard
> Earnshaw ; Richard Biener
> ; gcc-patches@gcc.gnu.org; Ramana Radhakrishnan
> 
> Subject: Re: [backport gcc10, gcc9] Requet to backport PR97969
> 
> On Thu, 3 Jun 2021 at 00:31, Przemyslaw Wirkus via Gcc-patches  patc...@gcc.gnu.org> wrote:
> >
> > Hi,
> >
> > > -Original Message-
> > > From: Vladimir Makarov 
> > > Sent: 31 May 2021 16:52
> > > To: Przemyslaw Wirkus ; Richard Biener
> > > 
> > > Cc: gcc-patches@gcc.gnu.org; ja...@redhat.com; ni...@redhat.com;
> > > Richard Earnshaw ; Ramana
> Radhakrishnan
> > > ; Kyrylo Tkachov
> > > 
> > > Subject: Re: [backport gcc10, gcc9] Requet to backport PR97969
> > >
> > >
> > > On 2021-05-25 5:14 a.m., Przemyslaw Wirkus wrote:
> > > > Hi,
> > > > Just a follow up after GCC 11 release.
> > > >
> > > > I've backported to gcc-10 branch (without any change to original
> > > > patches)
> > > > PR97969 and following PR98722 & PR98777 patches.
> > > >
> > > > Commits apply cleanly without changes.
> > > > Built and regression tested on:
> > > > * arm-none-eabi and
> > > > * aarch64-none-linux-gnu cross toolchains.
> > > >
> > > > There were no issues and no regressions (all OK).
> > > >
> > > > OK for backport to gcc-10 branch ?
> > >
> > > Sorry for delay with the answer due to my vacation.
> > >
> > > As the patches did not introduce new PRs I believe they are ok for gcc-10.
> >
> > Backported to gcc-10 branch. Thank you for your support.
> >
> 
> Hi,
> 
> I'm surprised to see many new errors on arm after the backport for PR98722
> See: https://people.linaro.org/~christophe.lyon/cross-validation/gcc/gcc-
> 10/r10-9881-g1791b11d9cae388ae18a768eeb96c998439c986a/report-build-
> info.html
> 
> Przemyslaw, Vladimir do you confirm r10-9881 has no such errors (new
> ICEs) on your side?

Apologies.

I've built and regtested before submitting backport yesterday.
I will check on my side and build one of your failing configurations.

Przemyslaw

> Thanks
> 
> > Kind regards
> > Przemyslaw
> >
> > > Thank you.
> > >
> > > >
> > > > Kind regards,
> > > > Przemyslaw Wirkus
> > > >
> > > > ---
> > > > commits I've backported:
> > > >
> > > > commit cf2ac1c30af0fa783c8d72e527904dda5d8cc330
> > > > Author: Vladimir N. Makarov 
> > > > Date:   Tue Jan 12 11:26:15 2021 -0500
> > > >
> > > >  [PR97969] LRA: Transform pattern `plus (plus (hard reg,
> > > > const), pseudo)`
> > > after elimination
> > > >
> > > > commit 4334b524274203125193a08a8485250c41c2daa9
> > > > Author: Vladimir N. Makarov 
> > > > Date:   Wed Jan 20 11:40:14 2021 -0500
> > > >
> > > >  [PR98722] LRA: Check that target has no 3-op add insn to
> > > > transform 2
> > > plus expression.
> > > >
> > > > commit 68ba1039c7daf0485b167fe199ed7e8031158091
> > > > Author: Vladimir N. Makarov 
> > > > Date:   Thu Jan 21 17:27:01 2021 -0500
> > > >
> > > >  [PR98777] LRA: Use preliminary created pseudo for in LRA
> > > > elimination
> > > subpass
> > > >
> > > > $ ./contrib/git-backport.py
> > > > cf2ac1c30af0fa783c8d72e527904dda5d8cc330
> > > > $ ./contrib/git-backport.py
> > > > 4334b524274203125193a08a8485250c41c2daa9
> > > > $ ./contrib/git-backport.py
> > > > 68ba1039c7daf0485b167fe199ed7e8031158091
> > > >
> > > >
> > > >> Richard.
> >


[PATCH] c++: Fix up attribute handling in methods in templates [PR100872]

2021-06-03 Thread Jakub Jelinek via Gcc-patches
Hi!

The following testcase FAILs because a dependent (late) attribute is never
tsubsted.  While the testcase is OpenMP, I think it is a generic C++ FE problem
that could affect any other dependent attribute.

apply_late_template_attributes documents that it relies on
  /* save_template_attributes puts the dependent attributes at the beginning of
 the list; find the non-dependent ones.  */
The "operator binding" attributes that are sometimes added are added to the
head of DECL_ATTRIBUTES list though and because it doesn't have
ATTR_IS_DEPENDENT set it violates this requirement.

The following patch fixes it by adding that attribute after all
ATTR_IS_DEPENDENT attributes.  I'm not 100% sure if DECL_ATTRIBUTES can't be
shared by multiple functions (e.g. the cdtor clones), but the code uses
later remove_attribute which could break that too.  In any case it passed
bootstrap/regtest on x86_64-linux and i686-linux.

Other option would be to copy_list the ATTR_IS_DEPENDENT portion of the
DECL_ATTRIBUTES list if we need to do this, that would be the same as this
patch but replace that *ap = op_attr; at the end with
  *ap = NULL_TREE;
  DECL_ATTRIBUTES (cfn) = chainon (copy_list (DECL_ATTRIBUTES (cfn)),
   op_attr);
Or perhaps set ATTR_IS_DEPENDENT on the "operator bindings" attribute,
though it would need to be studied what would it try to do with the
attribute during tsubst.

2021-06-03  Jakub Jelinek  

PR c++/100872
* name-lookup.c (maybe_save_operator_binding): Add op_attr after all
ATTR_IS_DEPENDENT attributes in the DECL_ATTRIBUTES list rather than
to the start.

* g++.dg/gomp/declare-simd-8.C: New test.

--- gcc/cp/name-lookup.c.jj 2021-05-11 09:06:24.281997782 +0200
+++ gcc/cp/name-lookup.c2021-06-02 15:50:52.042521824 +0200
@@ -9136,9 +9136,12 @@ maybe_save_operator_binding (tree e)
   tree op_attr = lookup_attribute (op_bind_attrname, attributes);
   if (!op_attr)
 {
+  tree *ap = &DECL_ATTRIBUTES (cfn);
+  while (*ap && ATTR_IS_DEPENDENT (*ap))
+   ap = &TREE_CHAIN (*ap);
   op_attr = tree_cons (get_identifier (op_bind_attrname),
-  NULL_TREE, attributes);
-  DECL_ATTRIBUTES (cfn) = op_attr;
+  NULL_TREE, *ap);
+  *ap = op_attr;
 }
 
   tree op_bind = purpose_member (fnname, TREE_VALUE (op_attr));
--- gcc/testsuite/g++.dg/gomp/declare-simd-8.C.jj   2021-06-02 
16:02:32.792681922 +0200
+++ gcc/testsuite/g++.dg/gomp/declare-simd-8.C  2021-06-02 16:02:09.849004442 
+0200
@@ -0,0 +1,15 @@
+// PR c++/100872
+
+template 
+struct S {
+  #pragma omp declare simd aligned(a : N * 2) aligned(b) linear(ref(b): N)
+  float foo (float *a, T *&b) { return *a + *b; }
+};
+
+S<16, float> s;
+
+float
+bar (float *a, float *p)
+{
+  return s.foo (a, p);
+}

Jakub



[PATCH] [i386] Fix ICE of insn does not satisfy its constraints.

2021-06-03 Thread liuhongt via Gcc-patches
For evex encoding extended instructions, when vector length is less
than 512 bits, AVX512VL is needed, besides some instructions like
vpmovzxbx need extra AVX512BW. So this patch refines corresponding
constraints, i.e. from "v/vm" to "Yv/Yvm", from "v/vm" to "Yw/Ywm".

Bootstrapped and regtested on x86_64-linux-gnu{-m32,}
Ok for trunk?
and backport to GCC10 and GCC11?

BTW, I'm not sure should this patch be backport to GCC8 and GCC9 since
constraints "Yw" is introduced in GCC10, if we want to do that, we also
need to backport defination of "Yw".

gcc/ChangeLog:

PR target/100885
* config/i386/sse.md (*sse4_1_zero_extendv8qiv8hi2_3): Refine
constraints.
(avx2_v8qiv8si2): Ditto.
(*avx2_v8qiv8si2_1): Ditto.
(sse4_1_v4qiv4si2): Ditto.
(*sse4_1_v4qiv4si2_1): Ditto.
(avx2_v8hiv8si2): Ditto.
(avx2_zero_extendv8hiv8si2_1): Ditto.
(sse4_1_v4hiv4si2): Ditto.
(*sse4_1_v4hiv4si2_1): Ditto.
(*sse4_1_zero_extendv4hiv4si2_3): Ditto.
(avx2_v4qiv4di2): Ditto.
(*avx2_v4qiv4di2_1): Ditto.
(sse4_1_v2qiv2di2): Ditto.
(avx2_v4hiv4di2): Ditto.
(*avx2_v4hiv4di2_1): Ditto.
(sse4_1_v2hiv2di2): Ditto.
(*sse4_1_v2hiv2di2_1): Ditto.
(avx2_v4siv4di2): Ditto.
(*avx2_zero_extendv4siv4di2_1): Ditto.
(v4siv4di2): Ditto.
(sse4_1_v2siv2di2): Ditto.
(*sse4_1_v2siv2di2_1): Ditto.
(*sse4_1_zero_extendv2siv2di2_3): Ditto.

gcc/testsuite/ChangeLog:

PR target/100885
* g++.target/i386/pr100885.C: New test.
---
 gcc/config/i386/sse.md   |  78 +--
 gcc/testsuite/g++.target/i386/pr100885.C | 159 +++
 2 files changed, 198 insertions(+), 39 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/i386/pr100885.C

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index a4503ddcb73..42e62fd4f89 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -18101,10 +18101,10 @@ (define_insn_and_split 
"*sse4_1_v8qiv8hi2_2"
   "operands[1] = adjust_address_nv (operands[1], V8QImode, 0);")
 
 (define_insn_and_split "*sse4_1_zero_extendv8qiv8hi2_3"
-  [(set (match_operand:V16QI 0 "register_operand" "=Yr,*x,v")
+  [(set (match_operand:V16QI 0 "register_operand" "=Yr,*x,Yw")
(vec_select:V16QI
  (vec_concat:V32QI
-   (match_operand:V16QI 1 "vector_operand" "YrBm,*xBm,vm")
+   (match_operand:V16QI 1 "vector_operand" "YrBm,*xBm,Ywm")
(match_operand:V16QI 2 "const0_operand" "C,C,C"))
  (match_parallel 3 "pmovzx_parallel"
[(match_operand 4 "const_int_operand" "n,n,n")])))]
@@ -18163,10 +18163,10 @@ (define_expand "v16qiv16si2"
   "TARGET_AVX512F")
 
 (define_insn "avx2_v8qiv8si2"
-  [(set (match_operand:V8SI 0 "register_operand" "=v")
+  [(set (match_operand:V8SI 0 "register_operand" "=Yv")
(any_extend:V8SI
  (vec_select:V8QI
-   (match_operand:V16QI 1 "register_operand" "v")
+   (match_operand:V16QI 1 "register_operand" "Yv")
(parallel [(const_int 0) (const_int 1)
   (const_int 2) (const_int 3)
   (const_int 4) (const_int 5)
@@ -18179,7 +18179,7 @@ (define_insn "avx2_v8qiv8si2"
(set_attr "mode" "OI")])
 
 (define_insn "*avx2_v8qiv8si2_1"
-  [(set (match_operand:V8SI 0 "register_operand" "=v")
+  [(set (match_operand:V8SI 0 "register_operand" "=Yv")
(any_extend:V8SI
  (match_operand:V8QI 1 "memory_operand" "m")))]
   "TARGET_AVX2 && "
@@ -18225,10 +18225,10 @@ (define_expand "v8qiv8si2"
 })
 
 (define_insn "sse4_1_v4qiv4si2"
-  [(set (match_operand:V4SI 0 "register_operand" "=Yr,*x,v")
+  [(set (match_operand:V4SI 0 "register_operand" "=Yr,*x,Yv")
(any_extend:V4SI
  (vec_select:V4QI
-   (match_operand:V16QI 1 "register_operand" "Yr,*x,v")
+   (match_operand:V16QI 1 "register_operand" "Yr,*x,Yv")
(parallel [(const_int 0) (const_int 1)
   (const_int 2) (const_int 3)]]
   "TARGET_SSE4_1 && "
@@ -18240,7 +18240,7 @@ (define_insn "sse4_1_v4qiv4si2"
(set_attr "mode" "TI")])
 
 (define_insn "*sse4_1_v4qiv4si2_1"
-  [(set (match_operand:V4SI 0 "register_operand" "=Yr,*x,v")
+  [(set (match_operand:V4SI 0 "register_operand" "=Yr,*x,Yv")
(any_extend:V4SI
  (match_operand:V4QI 1 "memory_operand" "m,m,m")))]
   "TARGET_SSE4_1 && "
@@ -18322,9 +18322,9 @@ (define_insn_and_split 
"avx512f_zero_extendv16hiv16si2_1"
 })
 
 (define_insn "avx2_v8hiv8si2"
-  [(set (match_operand:V8SI 0 "register_operand" "=v")
+  [(set (match_operand:V8SI 0 "register_operand" "=Yv")
(any_extend:V8SI
-   (match_operand:V8HI 1 "nonimmediate_operand" "vm")))]
+   (match_operand:V8HI 1 "nonimmediate_operand" "Yvm")))]
   "TARGET_AVX2 && "
   "vpmovwd\t{%1, %0|%0, %1}"
   [(set_attr "type" "ssemov")
@@ -18339,10 +18339,10 @@ (define_expand "v8hiv8si2"
   "T

Re: [backport gcc10, gcc9] Requet to backport PR97969

2021-06-03 Thread Christophe Lyon via Gcc-patches
On Thu, 3 Jun 2021 at 10:54, Przemyslaw Wirkus
 wrote:
>
>
>
> > -Original Message-
> > From: Christophe Lyon 
> > Sent: 03 June 2021 09:45
> > To: Przemyslaw Wirkus 
> > Cc: Vladimir Makarov ; ja...@redhat.com; Richard
> > Earnshaw ; Richard Biener
> > ; gcc-patches@gcc.gnu.org; Ramana Radhakrishnan
> > 
> > Subject: Re: [backport gcc10, gcc9] Requet to backport PR97969
> >
> > On Thu, 3 Jun 2021 at 00:31, Przemyslaw Wirkus via Gcc-patches  > patc...@gcc.gnu.org> wrote:
> > >
> > > Hi,
> > >
> > > > -Original Message-
> > > > From: Vladimir Makarov 
> > > > Sent: 31 May 2021 16:52
> > > > To: Przemyslaw Wirkus ; Richard Biener
> > > > 
> > > > Cc: gcc-patches@gcc.gnu.org; ja...@redhat.com; ni...@redhat.com;
> > > > Richard Earnshaw ; Ramana
> > Radhakrishnan
> > > > ; Kyrylo Tkachov
> > > > 
> > > > Subject: Re: [backport gcc10, gcc9] Requet to backport PR97969
> > > >
> > > >
> > > > On 2021-05-25 5:14 a.m., Przemyslaw Wirkus wrote:
> > > > > Hi,
> > > > > Just a follow up after GCC 11 release.
> > > > >
> > > > > I've backported to gcc-10 branch (without any change to original
> > > > > patches)
> > > > > PR97969 and following PR98722 & PR98777 patches.
> > > > >
> > > > > Commits apply cleanly without changes.
> > > > > Built and regression tested on:
> > > > > * arm-none-eabi and
> > > > > * aarch64-none-linux-gnu cross toolchains.
> > > > >
> > > > > There were no issues and no regressions (all OK).
> > > > >
> > > > > OK for backport to gcc-10 branch ?
> > > >
> > > > Sorry for delay with the answer due to my vacation.
> > > >
> > > > As the patches did not introduce new PRs I believe they are ok for 
> > > > gcc-10.
> > >
> > > Backported to gcc-10 branch. Thank you for your support.
> > >
> >
> > Hi,
> >
> > I'm surprised to see many new errors on arm after the backport for PR98722
> > See: https://people.linaro.org/~christophe.lyon/cross-validation/gcc/gcc-
> > 10/r10-9881-g1791b11d9cae388ae18a768eeb96c998439c986a/report-build-
> > info.html
> >
> > Przemyslaw, Vladimir do you confirm r10-9881 has no such errors (new
> > ICEs) on your side?
>
> Apologies.
>
> I've built and regtested before submitting backport yesterday.
> I will check on my side and build one of your failing configurations.
>

After I sent the previous email, I received the validation results for
the next backport,
and it seems it fixes the ICEs introduced by r10-9881:
https://people.linaro.org/~christophe.lyon/cross-validation/gcc/gcc-10/r10-9882-g05f6971ac40912ef062915f88b3ea0bf27278285/report-build-info.html

I guess you ran validations with the 3 backports combined, rather than
individually?

So it looks OK now.

Thanks,

Christophe


> Przemyslaw
>
> > Thanks
> >
> > > Kind regards
> > > Przemyslaw
> > >
> > > > Thank you.
> > > >
> > > > >
> > > > > Kind regards,
> > > > > Przemyslaw Wirkus
> > > > >
> > > > > ---
> > > > > commits I've backported:
> > > > >
> > > > > commit cf2ac1c30af0fa783c8d72e527904dda5d8cc330
> > > > > Author: Vladimir N. Makarov 
> > > > > Date:   Tue Jan 12 11:26:15 2021 -0500
> > > > >
> > > > >  [PR97969] LRA: Transform pattern `plus (plus (hard reg,
> > > > > const), pseudo)`
> > > > after elimination
> > > > >
> > > > > commit 4334b524274203125193a08a8485250c41c2daa9
> > > > > Author: Vladimir N. Makarov 
> > > > > Date:   Wed Jan 20 11:40:14 2021 -0500
> > > > >
> > > > >  [PR98722] LRA: Check that target has no 3-op add insn to
> > > > > transform 2
> > > > plus expression.
> > > > >
> > > > > commit 68ba1039c7daf0485b167fe199ed7e8031158091
> > > > > Author: Vladimir N. Makarov 
> > > > > Date:   Thu Jan 21 17:27:01 2021 -0500
> > > > >
> > > > >  [PR98777] LRA: Use preliminary created pseudo for in LRA
> > > > > elimination
> > > > subpass
> > > > >
> > > > > $ ./contrib/git-backport.py
> > > > > cf2ac1c30af0fa783c8d72e527904dda5d8cc330
> > > > > $ ./contrib/git-backport.py
> > > > > 4334b524274203125193a08a8485250c41c2daa9
> > > > > $ ./contrib/git-backport.py
> > > > > 68ba1039c7daf0485b167fe199ed7e8031158091
> > > > >
> > > > >
> > > > >> Richard.
> > >


RE: [backport gcc10, gcc9] Requet to backport PR97969

2021-06-03 Thread Przemyslaw Wirkus via Gcc-patches


> -Original Message-
> From: Christophe Lyon 
> Sent: 03 June 2021 10:10
> To: Przemyslaw Wirkus 
> Cc: Vladimir Makarov ; ja...@redhat.com; Richard
> Earnshaw ; Richard Biener
> ; gcc-patches@gcc.gnu.org; Ramana Radhakrishnan
> 
> Subject: Re: [backport gcc10, gcc9] Requet to backport PR97969
> 
> On Thu, 3 Jun 2021 at 10:54, Przemyslaw Wirkus
>  wrote:
> >
> >
> >
> > > -Original Message-
> > > From: Christophe Lyon 
> > > Sent: 03 June 2021 09:45
> > > To: Przemyslaw Wirkus 
> > > Cc: Vladimir Makarov ; ja...@redhat.com;
> > > Richard Earnshaw ; Richard Biener
> > > ; gcc-patches@gcc.gnu.org; Ramana Radhakrishnan
> > > 
> > > Subject: Re: [backport gcc10, gcc9] Requet to backport PR97969
> > >
> > > On Thu, 3 Jun 2021 at 00:31, Przemyslaw Wirkus via Gcc-patches  > > patc...@gcc.gnu.org> wrote:
> > > >
> > > > Hi,
> > > >
> > > > > -Original Message-
> > > > > From: Vladimir Makarov 
> > > > > Sent: 31 May 2021 16:52
> > > > > To: Przemyslaw Wirkus ; Richard
> > > > > Biener 
> > > > > Cc: gcc-patches@gcc.gnu.org; ja...@redhat.com; ni...@redhat.com;
> > > > > Richard Earnshaw ; Ramana
> > > Radhakrishnan
> > > > > ; Kyrylo Tkachov
> > > > > 
> > > > > Subject: Re: [backport gcc10, gcc9] Requet to backport PR97969
> > > > >
> > > > >
> > > > > On 2021-05-25 5:14 a.m., Przemyslaw Wirkus wrote:
> > > > > > Hi,
> > > > > > Just a follow up after GCC 11 release.
> > > > > >
> > > > > > I've backported to gcc-10 branch (without any change to
> > > > > > original
> > > > > > patches)
> > > > > > PR97969 and following PR98722 & PR98777 patches.
> > > > > >
> > > > > > Commits apply cleanly without changes.
> > > > > > Built and regression tested on:
> > > > > > * arm-none-eabi and
> > > > > > * aarch64-none-linux-gnu cross toolchains.
> > > > > >
> > > > > > There were no issues and no regressions (all OK).
> > > > > >
> > > > > > OK for backport to gcc-10 branch ?
> > > > >
> > > > > Sorry for delay with the answer due to my vacation.
> > > > >
> > > > > As the patches did not introduce new PRs I believe they are ok for 
> > > > > gcc-
> 10.
> > > >
> > > > Backported to gcc-10 branch. Thank you for your support.
> > > >
> > >
> > > Hi,
> > >
> > > I'm surprised to see many new errors on arm after the backport for
> > > PR98722
> > > See:
> > > https://people.linaro.org/~christophe.lyon/cross-validation/gcc/gcc-
> > > 10/r10-9881-g1791b11d9cae388ae18a768eeb96c998439c986a/report-
> build-
> > > info.html
> > >
> > > Przemyslaw, Vladimir do you confirm r10-9881 has no such errors (new
> > > ICEs) on your side?
> >
> > Apologies.
> >
> > I've built and regtested before submitting backport yesterday.
> > I will check on my side and build one of your failing configurations.
> >
> 
> After I sent the previous email, I received the validation results for the 
> next
> backport, and it seems it fixes the ICEs introduced by r10-9881:
> https://people.linaro.org/~christophe.lyon/cross-validation/gcc/gcc-10/r10-
> 9882-g05f6971ac40912ef062915f88b3ea0bf27278285/report-build-info.html
> 
> I guess you ran validations with the 3 backports combined, rather than
> individually?


Yes, these three PRs are all connected to each other. That's why I've ran 
validation
after third one, not for each one separately.

PS: Office folks now roast me with true passion ;)

P.

> So it looks OK now.
> 
> Thanks,
> 
> Christophe
> 
> 
> > Przemyslaw
> >
> > > Thanks
> > >
> > > > Kind regards
> > > > Przemyslaw
> > > >
> > > > > Thank you.
> > > > >
> > > > > >
> > > > > > Kind regards,
> > > > > > Przemyslaw Wirkus
> > > > > >
> > > > > > ---
> > > > > > commits I've backported:
> > > > > >
> > > > > > commit cf2ac1c30af0fa783c8d72e527904dda5d8cc330
> > > > > > Author: Vladimir N. Makarov 
> > > > > > Date:   Tue Jan 12 11:26:15 2021 -0500
> > > > > >
> > > > > >  [PR97969] LRA: Transform pattern `plus (plus (hard reg,
> > > > > > const), pseudo)`
> > > > > after elimination
> > > > > >
> > > > > > commit 4334b524274203125193a08a8485250c41c2daa9
> > > > > > Author: Vladimir N. Makarov 
> > > > > > Date:   Wed Jan 20 11:40:14 2021 -0500
> > > > > >
> > > > > >  [PR98722] LRA: Check that target has no 3-op add insn to
> > > > > > transform 2
> > > > > plus expression.
> > > > > >
> > > > > > commit 68ba1039c7daf0485b167fe199ed7e8031158091
> > > > > > Author: Vladimir N. Makarov 
> > > > > > Date:   Thu Jan 21 17:27:01 2021 -0500
> > > > > >
> > > > > >  [PR98777] LRA: Use preliminary created pseudo for in LRA
> > > > > > elimination
> > > > > subpass
> > > > > >
> > > > > > $ ./contrib/git-backport.py
> > > > > > cf2ac1c30af0fa783c8d72e527904dda5d8cc330
> > > > > > $ ./contrib/git-backport.py
> > > > > > 4334b524274203125193a08a8485250c41c2daa9
> > > > > > $ ./contrib/git-backport.py
> > > > > > 68ba1039c7daf0485b167fe199ed7e8031158091
> > > > > >
> > > > > >
> > > > > >> Richard.
> > > >


Re: [RFC/PATCH 00/11] Fix up some unexpected empty split conditions

2021-06-03 Thread Segher Boessenkool
On Thu, Jun 03, 2021 at 03:00:44AM -0500, Segher Boessenkool wrote:
> On Thu, Jun 03, 2021 at 01:22:38PM +0800, Kewen.Lin wrote:
> The whole point of requiring the split condition to start with && is so
> it will become harder to mess things up (it will make the gen* code a
> tiny little bit simpler as well).  And there is no transition period or
> anything like that needed either.  Just the bunch that will break will
> need fixing.  So let's find out how many of those there are :-)
> 
> > > How many such cases *are* there?  There are no users exposed to this,
> > > and when the split condition is required to start with "&&" (instead of
> > > getting that implied) it is not a silent change ever, either.
> > 
> > If I read the proposal right, the explicit "&&" is only required when going
> > to find all potential problematic places for final implied "&&" change.
> > But one explicit "&&" does offer good readability.
> 
> My proposal is to always require && (or maybe identical insn and split
> conditions should be allowed as well, if people still use that -- that
> is how we wrote "&& 1" before that existed).

I prototyped this.  There are very many errors.  Iterators often modify
the insn condition (for one iteration of it), but that does not work if
the split condition does not start with "&&"!

See attached prototype.


Segher

= = =

diff --git a/gcc/gensupport.c b/gcc/gensupport.c
index 2cb760ffb90f..05d46fd3775c 100644
--- a/gcc/gensupport.c
+++ b/gcc/gensupport.c
@@ -590,7 +590,6 @@ process_rtx (rtx desc, file_location loc)
 case DEFINE_INSN_AND_SPLIT:
 case DEFINE_INSN_AND_REWRITE:
   {
-   const char *split_cond;
rtx split;
rtvec attr;
int i;
@@ -611,15 +610,20 @@ process_rtx (rtx desc, file_location loc)
 
/* If the split condition starts with "&&", append it to the
   insn condition to create the new split condition.  */
-   split_cond = XSTR (desc, 4);
-   if (split_cond[0] == '&' && split_cond[1] == '&')
+   const char *insn_cond = XSTR (desc, 2);
+   const char *split_cond = XSTR (desc, 4);
+   if (!strncmp (split_cond, "&&", 2))
  {
rtx_reader_ptr->copy_md_ptr_loc (split_cond + 2, split_cond);
-   split_cond = rtx_reader_ptr->join_c_conditions (XSTR (desc, 2),
+   split_cond = rtx_reader_ptr->join_c_conditions (insn_cond,
split_cond + 2);
+ } else if (insn_cond[0]) {
+   if (GET_CODE (desc) == DEFINE_INSN_AND_REWRITE)
+ error_at (loc, "the rewrite condition must start with `&&'");
+   else
+ error_at (loc, "the split condition must start with `&&' [%s]", 
insn_cond);
  }
-   else if (GET_CODE (desc) == DEFINE_INSN_AND_REWRITE)
- error_at (loc, "the rewrite condition must start with `&&'");
+
XSTR (split, 1) = split_cond;
if (GET_CODE (desc) == DEFINE_INSN_AND_REWRITE)
  XVEC (split, 2) = gen_rewrite_sequence (XVEC (desc, 1));


[PATCH][committed]AArch64 Fix failing testcase for native cpu detection

2021-06-03 Thread Tamar Christina via Gcc-patches
Hi All,

A late change in the patch changed the implemented ID to one that
hasn't been used yet to avoid any ambiguity. Unfortunately the
chosen value of 0xFF matches the value of -1 which is used as an
invalid implementer so the test started failing.

This patch changes it to 0xFE which is the highest usable number.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Committed under the obvious rule.

Thanks,
Tamar

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cpunative/info_16: Update implementer.
* gcc.target/aarch64/cpunative/info_17: Likewise

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/info_16 
b/gcc/testsuite/gcc.target/aarch64/cpunative/info_16
index 
b0679579d9167d46c832e55cb63d9077f7a80f70..2c04ff19cb6ca99f7c62ea80c5bb6cd0985a0949
 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/info_16
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/info_16
@@ -1,7 +1,7 @@
 processor  : 0
 BogoMIPS   : 100.00
 Features   : fp asimd evtstrm aes pmull sha1 sha2 crc32 asimddp sve sve2
-CPU implementer: 0xff
+CPU implementer: 0xfe
 CPU architecture: 8
 CPU variant: 0x0
 CPU part   : 0xd08
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/info_17 
b/gcc/testsuite/gcc.target/aarch64/cpunative/info_17
index 
b0679579d9167d46c832e55cb63d9077f7a80f70..2c04ff19cb6ca99f7c62ea80c5bb6cd0985a0949
 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/info_17
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/info_17
@@ -1,7 +1,7 @@
 processor  : 0
 BogoMIPS   : 100.00
 Features   : fp asimd evtstrm aes pmull sha1 sha2 crc32 asimddp sve sve2
-CPU implementer: 0xff
+CPU implementer: 0xfe
 CPU architecture: 8
 CPU variant: 0x0
 CPU part   : 0xd08


-- 
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/info_16 b/gcc/testsuite/gcc.target/aarch64/cpunative/info_16
index b0679579d9167d46c832e55cb63d9077f7a80f70..2c04ff19cb6ca99f7c62ea80c5bb6cd0985a0949 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/info_16
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/info_16
@@ -1,7 +1,7 @@
 processor	: 0
 BogoMIPS	: 100.00
 Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 asimddp sve sve2
-CPU implementer	: 0xff
+CPU implementer	: 0xfe
 CPU architecture: 8
 CPU variant	: 0x0
 CPU part	: 0xd08
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/info_17 b/gcc/testsuite/gcc.target/aarch64/cpunative/info_17
index b0679579d9167d46c832e55cb63d9077f7a80f70..2c04ff19cb6ca99f7c62ea80c5bb6cd0985a0949 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/info_17
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/info_17
@@ -1,7 +1,7 @@
 processor	: 0
 BogoMIPS	: 100.00
 Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 asimddp sve sve2
-CPU implementer	: 0xff
+CPU implementer	: 0xfe
 CPU architecture: 8
 CPU variant	: 0x0
 CPU part	: 0xd08



Re: [RFC/PATCH 00/11] Fix up some unexpected empty split conditions

2021-06-03 Thread Segher Boessenkool
On Thu, Jun 03, 2021 at 09:05:02AM +0100, Richard Sandiford via Gcc-patches 
wrote:
> Right.  Plus it creates less make-work.  If we didn't have it, someone
> would need to split the define_insn_and_splits that don't currently
> use "&&", then later someone might decide that the missing "&&" was a
> mistake and need to put them together again (or just revert the patch
> and edit from there, I guess).

I would hope no maintainer would be foolish enough to flip-flop like
that.

> Plus define_independent_insn_and_split would act as a flag for something
> that might be suspect.

That always is a conceptual error, even.  The name says so already: the
insn and the split are very different things, with different conditions,
they just happen to share a pattern.

> If we split them then the define_split condition
> will seem to have been chosen deliberately in isolation.

And it will have beenm chosen deliberately!  Why *else* write things
like this?

> > If I read the proposal right, the explicit "&&" is only required when going
> > to find all potential problematic places for final implied "&&" change.
> > But one explicit "&&" does offer good readability.
> 
> I don't know.  "&& 1" looks kind of weird to me.

We have it in rs6000.md since 2004.  sparc has had it since 2002.  i386
has had it since 2001.  arm still does not have it :-)

> > So the question is that: whether we need to demand an explicit "&&".
> > Richard's proposal is for answer "no" which aligns with Richi's auto
> > filling advice before.  I think it would result in fewer changes since
> > those places without explicit "&&" are mostly unintentional, all the jobs
> > are done by implied "&&".  Its downside seems to be bad readability, new
> > readers may take it as two seperated conditions at first glance, but I
> > guess if we emphasize this change in the document it would be fine?
> > Or emitting one warning if missing an explicit "&&"?
> 
> IMO the natural way to read it is that the split C condition gives the
> conditions under which the instruction should be split.  I think that's
> why forgetting the "&&" is such a common mistake.  (I know I've done it
> plenty of times.)

You even do in this description!  :-)  You do not want the define_split
in a define_insn_and_split to happen for insns that do not match the
insn condition, that would not be intuitive at all.

> IMO requiring the "&&" is baking in an alternative, less intuitive,
> interpretation.

But you want the exact same semantics, right?  I do agree that the
syntax without "&&" looks nicer.  It has many practical problems though,
so it is nice to aim for, but we cannot move there until we have all
existing machine descriptions fixed up first.


Segher


Re: [RFC/PATCH 00/11] Fix up some unexpected empty split conditions

2021-06-03 Thread Richard Sandiford via Gcc-patches
Segher Boessenkool  writes:
> On Thu, Jun 03, 2021 at 09:05:02AM +0100, Richard Sandiford via Gcc-patches 
> wrote:
>> Right.  Plus it creates less make-work.  If we didn't have it, someone
>> would need to split the define_insn_and_splits that don't currently
>> use "&&", then later someone might decide that the missing "&&" was a
>> mistake and need to put them together again (or just revert the patch
>> and edit from there, I guess).
>
> I would hope no maintainer would be foolish enough to flip-flop like
> that.

But the point is that the maintainers wouldn't be flip-flopping.
The move to define_independent_insn_and_split would be a mechnical
change made by someone else.

We shouldn't just add "&&" to all define_insn_and_splits that currently
lack them.  That might well break things.  So to get the perfect result,
someone has to look at each individual define_insn_and_split that currently
lacks a "&&" to see what the effects of adding "&&" would be.

IMO it's not reasonable to ask Kewen to do that for all ports.  So the
process I suggested was a way of mechanically changing existing ports in
a way that would not require input from target maintainers, or extensive
testing on affected targets.  It would keep the behaviour of the port
the same as it is now (whether that's good or bad).  This part of the
process could be scripted.

>> > If I read the proposal right, the explicit "&&" is only required when going
>> > to find all potential problematic places for final implied "&&" change.
>> > But one explicit "&&" does offer good readability.
>> 
>> I don't know.  "&& 1" looks kind of weird to me.
>
> We have it in rs6000.md since 2004.  sparc has had it since 2002.  i386
> has had it since 2001.  arm still does not have it :-)

Sure, this syntax goes back 20 years.  I don't think that answers the
point though.  The question was whether a split condition "&& 1" is
more readable than a condition "", given that "" means "always" in other
contexts.  Given the choice, IMO "" is more readable and "&& 1" looks
weird/inconsistent.

Thanks,
Richard


Re: [patch] Tame fix for PR ipa/99122

2021-06-03 Thread Eric Botcazou
> Not sure whether we know VLA results are always returned by
> reference?  In particular does this mean we'll never see a
> WITH_SIZE_EXPR on the LHS of a call?  You might have noticed
> I've done WITH_SIZE_EXPR "enhancements" recently on trunk.

VLAs are always aggregate_value_p so the caller passes the address of the 
return slot, but they are not always DECL_BY_REFERENCE, at least in Ada.

In Ada we use DECL_BY_REFERENCE only for self-referential types, i.e. types 
whose size is not the same for all the objects, because we need to control the 
amount of returned bytes on a case-by-case basis; when the size is uniform, 
this complication is unnecessary.

> As for the patch we indeed didn't have a testcase covering the
> return case so it should be safe to revert the return part.

OK, thanks, patch installed.

-- 
Eric Botcazou




Re: GCC documentation: porting to Sphinx

2021-06-03 Thread Martin Liška

On 6/2/21 10:41 PM, Martin Sebor wrote:

On 5/31/21 7:25 AM, Martin Liška wrote:

Hello.

I've made quite some progress with the porting of the documentation and
I would like to present it to the community now:
https://splichal.eu/scripts/sphinx/




Hello.

Thank you for the review.


Just a few issues I noticed in the warnings section:

The headings of some warnings mention the same option twice (e.g.,
-Wabi, -Wabi, -Wno-abi;  -Wdouble-promotion, -Wdouble-promotion,
-Wno-double-promotion;  -Winit-self, -Winit-self, -Wno-init-self).
This looks like a pretty pervasive problem.


You are right, I fixed that.



Mentioning the -Wno-xxx option is redundant in a heading for -Wxxx.


Yes. Good reason for that is that Sphinx can then generated properly links
to the current non-documented version of the option. Hope it's improvement
over the current situation?



The headings of some other warnings also mention options that are
only remotely related to them.  E.g., -Wformat has all these:

   -Wformat, -Wno-format, -ffreestanding, -fno-builtin, -Wformat=

(I see the same problem in the attributes section where the headings
for some attributes include option names).

That seems quite puzzling.  I assume it's a consequence of having
index entries for the related options, but I don't think making
them visible in the headings is helfpful.


Oh, you are right. It was consequence of wrong parsing of index entries.
It should be fixed now.



Headings that in the manual today include a level like

   -Wformat-overflow
   -Wformat-overflow=level

don't mention the level in the Spinx manual:

   -Wformat-overflow, -Wno-format-overflow

When the /level/ is then discussed in the rest of the text it's
not clear what it refers to.


Should be also fixed now.

Can you please take a look at the current output and give me a feedback?
Thanks,
Martin



Martin



Note the documentation is automatically ([1]) generated from texinfo with a 
GitHub workflow ([2]).
It's built on the devel/sphinx GCC branch which I periodically with the master 
branch. One can
see the current source .rst files here: [3].

Changes made since the last time:
- a shared content is factored out ([4])
- conditional build is fully supported (even for shared parts)
- manual pages look reasonable well
- folders are created for files which have >= 5 TOC tree entries
- various formatting issues were resolved
- baseconf.py reads BASE-VER, DEV-PHASE, .. files

I've got couple of questions:

1) Do we have to you the following cover text?
    Copyright (c) 1988-2020 Free Software Foundation, Inc.

    Permission is granted to copy, distribute and/or modify this document under 
the terms of the GNU Free Documentation License, Version 1.3 or any later version 
published by the Free Software Foundation; with the Invariant Sections being 
"GNU General Public
    License" and "Funding Free Software", the Front-Cover texts being (a) 
(see below), and with the Back-Cover Texts being (b) (see below).  A copy of the license is 
included in the gfdl(7) man page.

    (a) The FSF's Front-Cover Text is:

 A GNU Manual

    (b) The FSF's Back-Cover Text is:

 You have freedom to copy and modify this GNU Manual, like GNU
 software.  Copies published by the Free Software Foundation raise
 funds for GNU development.

2) Do we want to generate fsf-funding, gpl and gfdl manual pages?
3) Do we want to preserve the current strange copy mechanism for 
./gcc/doc/tm.texi.in ?
4) Do we want a copyright header for the created .rst files?

Thoughts?
Thanks,
Martin

[1] https://github.com/davidmalcolm/texi2rst
[2] https://github.com/davidmalcolm/texi2rst/actions
[3] https://github.com/marxin/texi2rst-generated/tree/master/sphinx
[4] https://github.com/marxin/texi2rst-generated/tree/master/sphinx/share






RE: [PATCH] Canonicalize (vec_duplicate (not A)) to (not (vec_duplicate A)).

2021-06-03 Thread Liu, Hongtao via Gcc-patches



>-Original Message-
>From: Segher Boessenkool 
>Sent: Thursday, June 3, 2021 4:46 AM
>To: Richard Biener 
>Cc: Liu, Hongtao ; GCC Patches patc...@gcc.gnu.org>
>Subject: Re: [PATCH] Canonicalize (vec_duplicate (not A)) to (not
>(vec_duplicate A)).
>
>Hi!
>
>On Wed, Jun 02, 2021 at 09:07:35AM +0200, Richard Biener wrote:
>> On Wed, Jun 2, 2021 at 7:41 AM liuhongt via Gcc-patches
>>  wrote:
>> > For i386, it will enable below opt
>> >
>> > from
>> > notl%edi
>> > vpbroadcastd%edi, %xmm0
>> > vpand   %xmm1, %xmm0, %xmm0
>> > to
>> > vpbroadcastd%edi, %xmm0
>> > vpandn   %xmm1, %xmm0, %xmm0
>>
>> There will be cases where (vec_duplicate (not A)) is better than (not
>> (vec_duplicate A)), so I'm not sure it is a good idea to forcefully
>> canonicalize unary operations.
>
>It is two unaries in sequence, where the order does not matter either.
>As in all such cases you either have to handle both cases everywhere, or have
>a canonical order.
>
>> I suppose the
>> simplification happens inside combine
>
>combine uses simplify-rtx for most cases (it is part of combine, but used in
>quite a few other places these days).
>
>> - doesn't combine
>> already have code to try variants of an expression and isn't this a
>> good candidate that can be added there, avoiding the canonicalization?
>
>As I mentioned, this is done in simplify-rtx in cases that do not have a
>canonical representation.  This is critical because it prevents loops.
>
>A very typical example is how UMIN is optimised:
>
>   case UMIN:
>  if (trueop1 == CONST0_RTX (mode) && ! side_effects_p (op0))
>   return op1;
>  if (rtx_equal_p (trueop0, trueop1) && ! side_effects_p (op0))
>   return op0;
>  tem = simplify_associative_operation (code, mode, op0, op1);
>  if (tem)
>   return tem;
>  break;
>
>(the stuff using "tem").
>
>Hongtao, can we do something similar here?  Does that work well?  Please try
>it out :-)

In simplify_rtx, no simplication occurs, there is just the difference between
 (vec_duplicate (not REG)) and (not (vec_duplicate (REG)). So here tem will 
only be 0.
Basically we don't know it's a simplication until combine successfully split the
3->2 instructions (not + broadcast + and to andnot + broadcast), but it's 
pretty awkward
to do this in combine.

Consider andnot is existed for many backends, I think a canonicalization is 
needed here.
Maybe we can add insn canonicalization for transforming (and (vect_duplicate 
(not A)) B) to 
(and (not (duplicate (not A)) B) instead of (vec_duplicate (not A)) to (not 
(vec_duplicate A))?

>
>
>Segher


Re: [PATCH] Canonicalize (vec_duplicate (not A)) to (not (vec_duplicate A)).

2021-06-03 Thread Jakub Jelinek via Gcc-patches
On Thu, Jun 03, 2021 at 11:03:43AM +, Liu, Hongtao via Gcc-patches wrote:
> In simplify_rtx, no simplication occurs, there is just the difference between
>  (vec_duplicate (not REG)) and (not (vec_duplicate (REG)). So here tem will 
> only be 0.
> Basically we don't know it's a simplication until combine successfully split 
> the
> 3->2 instructions (not + broadcast + and to andnot + broadcast), but it's 
> pretty awkward
> to do this in combine.
> 
> Consider andnot is existed for many backends, I think a canonicalization is 
> needed here.
> Maybe we can add insn canonicalization for transforming (and (vect_duplicate 
> (not A)) B) to 
> (and (not (duplicate (not A)) B) instead of (vec_duplicate (not A)) to (not 
> (vec_duplicate A))?

For the (not (vec_duplicate)) vs. (vec_duplicate (not)) it isn't clear which
one is generally a win on major targets, so I'd say it is better to add a
combine splitter to swap it in backends that want that.

Jakub



[committed] arc: Remove obsolete options

2021-06-03 Thread Claudiu Zissulescu via Gcc-patches
This is a respin of an older series of three patches which I have
merged into one. The new (committed) patch is keeping the obsolete
options int .opt file marking them accrodingly for backwards
compatibility.

Remove the following obsolete options:
- munalign-prob-threshold
- malign-call
- mmixed-code

The ARC's options are marked as obsolete and ignored for backwards
compatibility.

gcc/
2021-06-03  Claudiu Zissulescu  

* common/config/arc/arc-common.c (arc_option_optimization_table):
Remove malign-call.
* config/arc/arc.c (arc_unalign_branch_p): Remove unused function.
* config/arc/arc.h (TARGET_MIXED_CODE): Remove macro.
(INDEX_REG_CLASS): Only refer to GENERAL_REGS.
* config/arc/arc.md (abssi2_mixed): Remove pattern.
* config/arc/arc.opt (munalign-prob-threshold): Mark it obsolete.
(malign-call): Likewise.
(mmixed-code): Likewise.
* doc/invoke.texi (ARC): Update doc.

Signed-off-by: Claudiu Zissulescu 
---
 gcc/common/config/arc/arc-common.c |  1 -
 gcc/config/arc/arc.c   | 23 ---
 gcc/config/arc/arc.h   |  4 +---
 gcc/config/arc/arc.md  |  8 
 gcc/config/arc/arc.opt | 18 ++
 gcc/doc/invoke.texi| 13 +++--
 6 files changed, 10 insertions(+), 57 deletions(-)

diff --git a/gcc/common/config/arc/arc-common.c 
b/gcc/common/config/arc/arc-common.c
index 86674dd3de7..6a119029616 100644
--- a/gcc/common/config/arc/arc-common.c
+++ b/gcc/common/config/arc/arc-common.c
@@ -62,7 +62,6 @@ static const struct default_options 
arc_option_optimization_table[] =
 { OPT_LEVELS_SIZE, OPT_fif_conversion, NULL, 0 },
 { OPT_LEVELS_1_PLUS, OPT_fomit_frame_pointer, NULL, 1 },
 { OPT_LEVELS_3_PLUS_SPEED_ONLY, OPT_msize_level_, NULL, 0 },
-{ OPT_LEVELS_3_PLUS_SPEED_ONLY, OPT_malign_call, NULL, 1 },
 { OPT_LEVELS_NONE, 0, NULL, 0 }
   };
 
diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index 9153f0529ab..b77d0566386 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -9868,29 +9868,6 @@ gen_acc2 (void)
   return gen_rtx_REG (SImode, TARGET_BIG_ENDIAN ? 57: 56);
 }
 
-/* FIXME: a parameter should be added, and code added to final.c,
-   to reproduce this functionality in shorten_branches.  */
-#if 0
-/* Return nonzero iff BRANCH should be unaligned if possible by upsizing
-   a previous instruction.  */
-int
-arc_unalign_branch_p (rtx branch)
-{
-  rtx note;
-
-  if (!TARGET_UNALIGN_BRANCH)
-return 0;
-  /* Do not do this if we have a filled delay slot.  */
-  if (get_attr_delay_slot_filled (branch) == DELAY_SLOT_FILLED_YES
-  && !NEXT_INSN (branch)->deleted ())
-return 0;
-  note = find_reg_note (branch, REG_BR_PROB, 0);
-  return (!note
- || (arc_unalign_prob_threshold && !br_prob_note_reliable_p (note))
- || INTVAL (XEXP (note, 0)) < arc_unalign_prob_threshold);
-}
-#endif
-
 /* When estimating sizes during arc_reorg, when optimizing for speed, there
are three reasons why we need to consider branches to be length 6:
- annull-false delay slot insns are implemented using conditional execution,
diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
index 252241a858c..0224ae65074 100644
--- a/gcc/config/arc/arc.h
+++ b/gcc/config/arc/arc.h
@@ -115,8 +115,6 @@ extern const char *arc_cpu_to_as (int argc, const char 
**argv);
 
 /* Run-time compilation parameters selecting different hardware subsets.  */
 
-#define TARGET_MIXED_CODE (TARGET_MIXED_CODE_SET)
-
 #define TARGET_SPFP (TARGET_SPFP_FAST_SET || TARGET_SPFP_COMPACT_SET)
 #define TARGET_DPFP (TARGET_DPFP_FAST_SET || TARGET_DPFP_COMPACT_SET   \
 || TARGET_FP_DP_AX)
@@ -571,7 +569,7 @@ extern enum reg_class arc_regno_reg_class[];
a scale factor or added to another register (as well as added to a
displacement).  */
 
-#define INDEX_REG_CLASS (TARGET_MIXED_CODE ? ARCOMPACT16_REGS : GENERAL_REGS)
+#define INDEX_REG_CLASS GENERAL_REGS
 
 /* The class value for valid base registers. A base register is one used in
an address which is the register value plus a displacement.  */
diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index a67bb581003..de61b2b790f 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -2011,14 +2011,6 @@ (define_expand "extendhisi2"
 
 ;; Absolute instructions
 
-(define_insn "*abssi2_mixed"
-  [(set (match_operand:SI 0 "compact_register_operand" "=q")
-   (abs:SI (match_operand:SI 1 "compact_register_operand" "q")))]
-  "TARGET_MIXED_CODE"
-  "abs%? %0,%1%&"
-  [(set_attr "type" "two_cycle_core")
-   (set_attr "iscompact" "true")])
-
 (define_insn "abssi2"
   [(set (match_operand:SI 0 "dest_reg_operand" "=Rcq#q,w,w")
(abs:SI (match_operand:SI 1 "nonmemory_operand" "Rcq#q,cL,Cal")))]
diff --git a/gcc/config/arc/arc.opt b/gcc/config/arc/arc.opt
index 85688d57574..a8935db2dd8 100644
--- a/gcc/config/arc/arc.opt
+++ b/gcc/conf

Re: [PATCH] xtensa: Fix 2 warnings during xtensa build [PR100841]

2021-06-03 Thread Max Filippov via Gcc-patches
On Wed, Jun 2, 2021 at 11:27 AM Jeff Law  wrote:
> On 6/2/2021 11:09 AM, Jakub Jelinek wrote:
> > When building gcc targetting xtensa-linux, there are 2 warnings the PR
> > complains about:
> > ../../gcc/dwarf2cfi.c: In function ‘void init_one_dwarf_reg_size(int, 
> > machine_mode, rtx, machine_mode, init_one_dwarf_reg_state*)’:
> > ../../gcc/dwarf2cfi.c:291:12: warning: comparison of integer expressions of 
> > different signedness: ‘const unsigned int’ and ‘int’ [-Wsign-compare]
> >291 |   if (rnum >= DWARF_FRAME_REGISTERS)
> > ../../gcc/function.c: In function ‘void gen_call_used_regs_seq(rtx_insn*, 
> > unsigned int)’:
> > ../../gcc/function.c:5897:63: warning: comparison of unsigned expression in 
> > ‘< 0’ is always false [-Wtype-limits]
> >   5897 |   if (crtl->uses_only_leaf_regs && LEAF_REG_REMAP (regno) < 0)
> > which might during bootstrap or when configured with --enable-werror-always
> > be turned into errors.
> >
> > The first one is the -Wsign-compare warning, in c-family we do:
> > 2281/* Do not warn if the signed quantity is an unsuffixed integer
> > 2282   literal (or some static constant expression involving such
> > 2283   literals or a conditional expression involving such literals)
> > 2284   and it is non-negative.  */
> > 2285if (tree_expr_nonnegative_warnv_p (sop, &ovf))
> > 2286  /* OK */;
> > and so don't warn if that function determines the expression is
> > non-negative.  But xtensa defines DWARF_FRAME_REGISTERS as
> > (16 + (something ? 0 : 1)) and that isn't handled by
> > tree_expr_nonnegative_warnv_p, VRP can handle it of course, but that is much
> > later.
> > The second chunk rewrites it into a form that tree_expr_nonnegative_warnv_p
> > can handle, in particular (something ? 16 : 16 + 1), where for COND_EXPRs
> > that function checks both the 2nd and 3rd operand of the ternary operator
> > and if both are nonnegative, returns true.
> >
> > The other warning has been introduced fairly recently; LEAF_REG_REMAP is
> > currently used by 2 targets only, and is documented to yield -1 if a hard
> > reg number can't be remapped and the remapped register number otherwise.
> > That means that the type of the expression should be signed (otherwise -1
> > could never appear), and on SPARC indeed it is defined as
> > extern char leaf_reg_remap[];
> > #define LEAF_REG_REMAP(REGNO) (leaf_reg_remap[REGNO])
> > so unless the host is -funsigned-char by default it works fine.
> > I guess sparc.[ch] should be fixed to use signed char of leaf_reg_remap,
> > Eric?
> > The argument to LEAF_REG_REMAP is often unsigned int though, hard
> > register numbers are usually not negative, and thus the warning.
> > I think xtensa doesn't have 2G hard registers and so it is ok to just cast
> > it to int.
> >
> > Verified just by making sure the warnings go away in a cross, ok for trunk?
> >
> > 2021-06-02  Jakub Jelinek  
> >
> >   PR target/100841
> >   * config/xtensa/xtensa.h (LEAF_REG_REMAP): Cast REGNO to int to avoid
> >   -Wtype-limits warnings.
> >   (DWARF_FRAME_REGISTER): Rewrite into ternary operator with addition
> >   in operands to avoid -Wsign-compare warnings.
> OK.
> jeff

Looks good to me. Thank you!

-- Max


[Ada] Fix miscompilation of predicate on bit-packed array types

2021-06-03 Thread Eric Botcazou
This is a regression present on the mainline and 11 branch in the form of a
miscompilation by the new mod/ref IPA pass of code that passes constrained
bit-packed array objets in a call to a subprograms taking unconstrained bit-
packed array parameters, which occurs for predicate on bit-packed array types.

Tested on x86-64/Linux, applied on the mainline.


2021-06-03  Eric Botcazou  

* gcc-interface/decl.c (gnat_to_gnu_entity) : Add PAT
local constant and use it throughout.  If it is set, use a ref-all
pointer type for the pointer-to-array field of the fat pointer type.
: Add PAT local constant and use it throughout.


2021-06-03  Eric Botcazou  

* gnat.dg/bit_packed_array6.adb: New test.
* gnat.dg/bit_packed_array6_pkg.ads: New helper.

-- 
Eric Botcazoudiff --git a/gcc/ada/gcc-interface/decl.c b/gcc/ada/gcc-interface/decl.c
index bc7046accb5..6fc94dda83b 100644
--- a/gcc/ada/gcc-interface/decl.c
+++ b/gcc/ada/gcc-interface/decl.c
@@ -2100,6 +2100,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
 
 case E_Array_Type:
   {
+	const Entity_Id PAT = Packed_Array_Impl_Type (gnat_entity);
 	const bool convention_fortran_p
 	  = (Convention (gnat_entity) == Convention_Fortran);
 	const int ndim = Number_Dimensions (gnat_entity);
@@ -2203,16 +2204,16 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
 	  }
 
 	/* If the GNAT encodings are used, give the fat pointer type a name.
-	   If this is a packed array, tell the debugger how to interpret the
-	   underlying bits by fetching that of the implementation type.  But
-	   in any case, mark it as artificial so the debugger can skip it.  */
+	   If this is a packed type implemented specially, tell the debugger
+	   how to interpret the underlying bits by fetching the name of the
+	   implementation type.  But, in any case, mark it as artificial so
+	   the debugger can skip it.  */
 	const Entity_Id gnat_name
-	  = (Present (Packed_Array_Impl_Type (gnat_entity))
-	 && gnat_encodings != DWARF_GNAT_ENCODINGS_MINIMAL)
-	? Packed_Array_Impl_Type (gnat_entity)
+	  = Present (PAT) && gnat_encodings != DWARF_GNAT_ENCODINGS_MINIMAL
+	? PAT
 	: gnat_entity;
 	tree xup_name
-	  = (gnat_encodings != DWARF_GNAT_ENCODINGS_MINIMAL)
+	  = gnat_encodings != DWARF_GNAT_ENCODINGS_MINIMAL
 	? create_concat_name (gnat_name, "XUP")
 	: gnu_entity_name;
 	create_type_decl (xup_name, gnu_fat_type, true, debug_info_p,
@@ -2347,9 +2348,8 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
 
 	/* If this is a packed type implemented specially, then process the
 	   implementation type so it is elaborated in the proper scope.  */
-	if (Present (Packed_Array_Impl_Type (gnat_entity)))
-	  gnat_to_gnu_entity (Packed_Array_Impl_Type (gnat_entity), NULL_TREE,
-			  false);
+	if (Present (PAT))
+	  gnat_to_gnu_entity (PAT, NULL_TREE, false);
 
 	/* Otherwise, if an alignment is specified, use it if valid and, if
 	   the alignment was requested with an explicit clause, state so.  */
@@ -2374,8 +2374,15 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
 	  tem = change_qualified_type (tem, TYPE_QUAL_VOLATILE);
 
 	/* Adjust the type of the pointer-to-array field of the fat pointer
-	   and record the aliasing relationships if necessary.  */
-	TREE_TYPE (TYPE_FIELDS (gnu_fat_type)) = build_pointer_type (tem);
+	   and record the aliasing relationships if necessary.  If this is
+	   a packed type implemented specially, then use a ref-all pointer
+	   type since the implementation type may vary between constrained
+	   subtypes and unconstrained base type.  */
+	if (Present (PAT))
+	  TREE_TYPE (TYPE_FIELDS (gnu_fat_type))
+	= build_pointer_type_for_mode (tem, ptr_mode, true);
+	else
+	  TREE_TYPE (TYPE_FIELDS (gnu_fat_type)) = build_pointer_type (tem);
 	if (TYPE_ALIAS_SET_KNOWN_P (gnu_fat_type))
 	  record_component_aliases (gnu_fat_type);
 
@@ -2439,6 +2446,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
 	;
   else
 	{
+	  const Entity_Id PAT = Packed_Array_Impl_Type (gnat_entity);
 	  Entity_Id gnat_index, gnat_base_index;
 	  const bool convention_fortran_p
 	= (Convention (gnat_entity) == Convention_Fortran);
@@ -2844,7 +2852,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
 
 	  /* If this is a packed type implemented specially, then replace our
 	 type with the implementation type.  */
-	  if (Present (Packed_Array_Impl_Type (gnat_entity)))
+	  if (Present (PAT))
 	{
 	  /* First finish the type we had been making so that we output
 		 debugging information for it.  */
@@ -2869,8 +2877,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
 		 this type again.  */
 	  save_gnu_tree (gnat_entity, gnu_tmp_decl, false);
 
-	  gnu_type
-		= gnat_to_gnu_type (Packed_Array_Impl_Type (gn

Re: [PATCH] libstdc++: Avoid hard error in ranges::unique_copy [PR100770]

2021-06-03 Thread Jonathan Wakely via Gcc-patches

On 27/05/21 09:50 -0400, Patrick Palka via Libstdc++ wrote:

On Wed, 26 May 2021, Tim Song wrote:


On Wed, May 26, 2021 at 1:43 PM Patrick Palka  wrote:
>
> On Wed, 26 May 2021, Tim Song wrote:
> >
> > On Wed, May 26, 2021 at 12:00 PM Patrick Palka via Libstdc++
> >  wrote:
> > >
> > > -   else if constexpr (input_iterator<_Out>
> > > -  && same_as, 
iter_value_t<_Out>>)
> > > +   else if constexpr (requires { requires (input_iterator<_Out>
> > > +   && 
same_as,
> > > +  
iter_value_t<_Out>>); })
> >
> > It's arguably cleaner to extract this into a concept which can then
> > also be used in the constraint.
>
> Sounds good, though I'm not sure what name to give to this relatively
> ad-hoc set of requirements.  Any suggestions? :)
>

Something along the lines of "__can_reread_output", perhaps?


Works for me.  Here's v2, which factors out the condition into a concept
and defines the value_type, pointer and reference of output_iterator_wrapper
to void but leaves alone its difference_type.  Tested on x86_64-pc-linux-gnu.


OK for 10, 11 and trunk, thanks.


-- >8 --

Subject: [PATCH] libstdc++: Avoid hard error in ranges::unique_copy [PR100770]

Here, in the constexpr if condition within unique_copy, when
input_iterator<_Out> isn't satisfied we must avoid substituting into
iter_value_t<_Out> because the latter isn't necessarily well-formed in
this case.  To that end, this patch factors out the condition into a
concept and uses it throughout.

This patch also makes the definition of our testsuite
output_iterator_wrapper more minimal by defining its value_type, pointer
and reference member types to void.  This means our existing tests for
unique_copy already exercise the fix for this bug, so we don't need
to add another test.  The only other fallout of this testsuite iterator
change appears in std/ranges/range.cc, where the use of range_value_t
on a test_output_range is now ill-formed.

libstdc++-v3/ChangeLog:

* include/bits/ranges_algo.h (__detail::__can_reread_output):
Factor out this concept from ...
(__unique_copy_fn::operator()): ... here.  Use the concept
throughout.
* testsuite/std/ranges/range.cc: Remove now ill-formed use
of range_value_t on an output_range.
* testsuite/util/testsuite_iterators.h (output_iterator_wrapper):
Define value_type, pointer and reference member types to void.
---
libstdc++-v3/include/bits/ranges_algo.h  | 16 ++--
libstdc++-v3/testsuite/std/ranges/range.cc   |  3 ---
.../testsuite/util/testsuite_iterators.h |  2 +-
3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
b/libstdc++-v3/include/bits/ranges_algo.h
index cda3042c11f..ecf1378742d 100644
--- a/libstdc++-v3/include/bits/ranges_algo.h
+++ b/libstdc++-v3/include/bits/ranges_algo.h
@@ -1396,6 +1396,13 @@ namespace ranges

  inline constexpr __unique_fn unique{};

+  namespace __detail
+  {
+template
+  concept __can_reread_output = input_iterator<_Out>
+   && same_as<_Tp, iter_value_t<_Out>>;
+  }
+
  template
using unique_copy_result = in_out_result<_Iter, _Out>;

@@ -1407,8 +1414,7 @@ namespace ranges
   projected<_Iter, _Proj>> _Comp = ranges::equal_to>
  requires indirectly_copyable<_Iter, _Out>
&& (forward_iterator<_Iter>
-   || (input_iterator<_Out>
-   && same_as, iter_value_t<_Out>>)
+   || __detail::__can_reread_output<_Out, iter_value_t<_Iter>>
|| indirectly_copyable_storable<_Iter, _Out>)
  constexpr unique_copy_result<_Iter, _Out>
  operator()(_Iter __first, _Sent __last, _Out __result,
@@ -1432,8 +1438,7 @@ namespace ranges
}
return {__next, std::move(++__result)};
  }
-   else if constexpr (input_iterator<_Out>
-  && same_as, iter_value_t<_Out>>)
+   else if constexpr (__detail::__can_reread_output<_Out, 
iter_value_t<_Iter>>)
  {
*__result = *__first;
while (++__first != __last)
@@ -1467,8 +1472,7 @@ namespace ranges
   projected, _Proj>> _Comp = ranges::equal_to>
  requires indirectly_copyable, _Out>
&& (forward_iterator>
-   || (input_iterator<_Out>
-   && same_as, iter_value_t<_Out>>)
+   || __detail::__can_reread_output<_Out, range_value_t<_Range>>
|| indirectly_copyable_storable, _Out>)
  constexpr unique_copy_result, _Out>
  operator()(_Range&& __r, _Out __result,
diff --git a/libstdc++-v3/testsuite/std/ranges/range.cc 
b/libstdc++-v3/testsuite/std/ranges/range.cc
index 795aca472e5..aa29af471a4 100644
--- a/libstdc++-v3/testsuite/std/ranges/range.cc
+++ b/libstdc++-v3/testsuite/std/ranges/range.cc
@@ -75,9 +75,6 @@ static_assert( same_as,
static_assert( same_as,
   

Re: [PATCH] libstdc++: Simplify range adaptors' forwarding of bound args when possible

2021-06-03 Thread Jonathan Wakely via Gcc-patches

On 14/05/21 14:27 -0400, Patrick Palka via Libstdc++ wrote:

r11-8053 rewrote the range adaptor implementation in conformance with
P2281, making partial application act like a SFINAE-friendly perfect
forwarding call wrapper.  Making SFINAE-friendliness coexist with
perfect forwarding here requires adding fallback deleted operator()
overloads (as described in e.g. section 5.5 of wg21.link/p0847r6).  But
unfortunately, as reported in PR100577, this necessary technique of
using of deleted overloads regresses diagnostics for ill-formed calls to
partially applied range adaptors in GCC.

Although GCC's diagnostics can arguably be improved here by having the
compiler explain why the other candidates weren't viable when overload
resolution selects a deleted candidate, we can also largely work around
this on the library side (and achieve more concise diagnostics than by
a frontend-side improvement alone) if we take advantage of the
observation that not all range adaptors need perfect forwarding call
wrapper semantics, in fact only views::split currently needs it, because
all other range adaptors either take no extra arguments or only
arguments that are expected to be freely/cheaply copyable, e.g. function
objects and integer-like types.  (The discussion section of P2281 goes
into detail about why views::split is special.)

To that end, this introduces opt-in flags for denoting a range adaptor
as having "simple" extra arguments (in the case of a range adaptor
non-closure) or having a "simple" call operator (in the case of a range
adaptor closure).  These flags are then used to conditionally simplify
the operator() for the generic _Partial and _Pipe class templates, down
from needing three overloads thereof (including one defined as deleted)
to just needing a single overload.  The end result is that diagnostic
quality is restored for all adaptors except for views::split, and
diagnostics for the adaptors are generally made more concise since
there's only a single _Partial/_Pipe overload to diagnose instead of
three of them.

Tested on x86_64-pc-linux-gnu, does this look OK for trunk/11?


OK for trunk and 11.

Is there any benefit in using [[no_unique_address]] for
_Partial::_M_args too?



Re: GCC documentation: porting to Sphinx

2021-06-03 Thread Martin Liška

On 6/2/21 6:44 PM, Joseph Myers wrote:

On Wed, 2 Jun 2021, Joel Sherrill wrote:


For RTEMS, we switched from texinfo to Sphinx and the dependency
on Python3 for Sphinx has caused a bit of hassle. Is this going to be
an issue for GCC?


What Sphinx (and, thus, Python) versions does the GCC manual build work
with?


I've just tried version 1.7.6 which we use for libgccjit and it's fine:
https://gcc.gnu.org/onlinedocs/jit/

About Python version: I'm not planning supporting Python2, it's dead 10 years 
already.


Can it work with e.g. any Sphinx versions from the past five years,
or are there newer Sphinx features that are critical for the GCC manuals?


The mentioned version is 3 years and I expect even older releases would work.
No, I don't rely on any new feature.

I would recommend testing the build. You can simply clone:
https://github.com/marxin/texi2rst-generated

and simply run 'make html' or 'make latexpdf'. Basic dependencies are mentioned 
here:
https://github.com/marxin/texi2rst-generated#requirements

Martin


I've seen a need for frequent Sphinx updates being a pain when building
other software using Sphinx for its manual.


Also we rely on TexLive for PDF output and that's a bit of a pain to
install. Tex was incorrectly packaged on some RHEL/CentOS versions.


This is nothing new, since building PDF manuals from Texinfo sources also
needs TeX.





Re: [PATCH] Use _GLIBCXX_ASSERTIONS as _GLIBCXX_DEBUG light

2021-06-03 Thread Jonathan Wakely via Gcc-patches

On 27/05/21 19:37 +0200, François Dumont via Libstdc++ wrote:
We have been talking for a long time of a debug mode with less impact 
on performances.


We already have it, that's what _GLIBCXX_ASSERTIONS already is :-)


I propose to simply use the existing _GLIBCXX_ASSERTIONS macro.

    libstdc++: [_GLIBCXX_ASSERTIONS] Activate basic debug checks

    Use _GLIBCXX_ASSERTIONS as a _GLIBCXX_DEBUG light mode. When 
defined it activates
    all _GLIBCXX_DEBUG checks but skipping those requiring to loop 
through the iterator

    range unless in case of constexpr.

    libstdc++-v3/ChangeLog:

    * include/debug/debug.h [_GLIBCXX_ASSERTIONS]: Define 
debug macros non-empty.
    * include/debug/helper_functions.h: Cleanup comment about 
removed _Iter_base.
    * include/debug/functions.h (__skip_debug_runtime_check): 
New, returns false if

    _GLIBCXX_DEBUG is defined or if constant evaluated.
    (__check_sorted, __check_partitioned_lower, 
__check_partitioned_upper): Use latter.


Tested under Linux x64.

Ok to commit ?

François




diff --git a/libstdc++-v3/include/debug/debug.h 
b/libstdc++-v3/include/debug/debug.h
index 116f2f023e2..2e6ce1c8a93 100644
--- a/libstdc++-v3/include/debug/debug.h
+++ b/libstdc++-v3/include/debug/debug.h
@@ -61,7 +61,7 @@ namespace __gnu_debug
struct _Safe_iterator;
}

-#ifndef _GLIBCXX_DEBUG
+#ifndef _GLIBCXX_ASSERTIONS

# define __glibcxx_requires_cond(_Cond,_Msg)
# define __glibcxx_requires_valid_range(_First,_Last)
diff --git a/libstdc++-v3/include/debug/functions.h 
b/libstdc++-v3/include/debug/functions.h
index 6cac11f2abd..ee0eb877568 100644
--- a/libstdc++-v3/include/debug/functions.h
+++ b/libstdc++-v3/include/debug/functions.h
@@ -48,6 +48,25 @@ namespace __gnu_debug
  template
struct _Is_contiguous_sequence : std::__false_type { };

+  _GLIBCXX20_CONSTEXPR


Should this be simply _GLIBCXX_CONSTEXPR so that it can be constexpr
in C++14 mode too? Or are there are never any debug checks in
functions that are already constexpr in C++14 or C++17?


+  inline bool
+  __skip_debug_runtime_check()
+  {
+// We could be here while only _GLIBCXX_ASSERTIONS has been defined.
+// In this case we skip expensive runtime checks, constexpr will still
+// be checked.
+return
+#ifndef _GLIBCXX_DEBUG
+# if _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED
+  !__builtin_is_constant_evaluated();
+# else
+  true;
+# endif
+#else
+  false;
+#endif


I think this would be simpler without the nesting, and without the
preprocessor checks halfway through the return statement:

#ifdef _GLIBCXX_DEBUG
return false;
#elif _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED
return !__builtin_is_constant_evaluated();
#else
return true;
#endif



+  }
+
  /* Checks that [first, last) is a valid range, and then returns
   * __first. This routine is useful when we can't use a separate
   * assertion statement because, e.g., we are in a constructor.
@@ -260,8 +279,9 @@ namespace __gnu_debug
inline bool
__check_sorted(const _InputIterator& __first, const _InputIterator& __last)
{
-  return __check_sorted_aux(__first, __last,
-   std::__iterator_category(__first));
+  return __skip_debug_runtime_check()
+   || __check_sorted_aux(__first, __last,
+ std::__iterator_category(__first));


Currently this function is never called at all ifndef _GLIBCXX_DEBUG.
With this change, it's going to be present for _GLIBCXX_ASSERTIONS,
and if it isn't inlined it's going to explode the code size.

Some linux distros are already building the entire distro with
_GLIBCXX_ASSERTIONS so I think we need to be quite careful about this
kind of large change affecting every algo.

So maybe we shouldn't enable these checks via _GLIBCXX_ASSERTIONS, but
a new macro.



Re: [PATCH][vect] Use main loop's thresholds and vectorization factor to narrow upper_bound of epilogue

2021-06-03 Thread Andre Vieira (lists) via Gcc-patches

Thank you Kewen!!

I will apply this now.

BR,
Andre

On 25/05/2021 09:42, Kewen.Lin wrote:

on 2021/5/24 下午3:21, Kewen.Lin via Gcc-patches wrote:

Hi Andre,

on 2021/5/24 下午2:17, Andre Vieira (lists) via Gcc-patches wrote:

Hi,

When vectorizing with --param vect-partial-vector-usage=1 the vectorizer uses 
an unpredicated (all-true predicate for SVE) main loop and a predicated tail 
loop. The way this was implemented seems to mean it re-uses the same 
vector-mode for both loops, which means the tail loop isn't an actual loop but 
only executes one iteration.

This patch uses the knowledge of the conditions to enter an epilogue loop to 
help come up with a potentially more restricive upper bound.

Regression tested on aarch64-linux-gnu and also ran the testsuite using 
'--param vect-partial-vector-usage=1' detecting no ICEs and no execution 
failures.

Would be good to have this tested for PPC too as I believe they are the main 
users of the --param vect-partial-vector-usage=1 option. Can someone help me 
test (and maybe even benchmark?) this on a PPC target?



Thanks for doing this!  I can test it on Power10 which enables this parameter
by default, also evaluate its impact on SPEC2017 Ofast/unroll.


Bootstrapped/regtested on powerpc64le-linux-gnu Power10.
SPEC2017 run didn't show any remarkable improvement/degradation.

BR,
Kewen


Re: [PATCH] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]

2021-06-03 Thread Bill Schmidt via Gcc-patches

On 6/2/21 7:46 PM, Xionghu Luo wrote:

Hi,

On 2021/6/3 06:20, Segher Boessenkool wrote:

On Wed, Jun 02, 2021 at 03:19:32AM -0500, Xionghu Luo wrote:

On P8LE, extra rot64+rot64 load or store instructions are generated
in float128 to vector __int128 conversion.

This patch teaches pass swaps to also handle such pattens to remove
extra swap instructions.

Did you check if this is already handled by simplify-rtx if the mode had
been TImode (not V1TImode)?  If not, why do you not handle it there?

I tried to do it in combine or peephole, the later pass split2
or split3 will still split it to rotate + rotate again as we have split
after reload, and this pattern is quite P8LE specific, so put it in pass
swap.  The simplify-rtx could simplify
r124:KF#0=r123:KF#0<-<0x40<-<0x40 to r124:KF#0=r123:KF#0 for register
operations already.


vsx.md:

;; The post-reload split requires that we re-permute the source
;; register in case it is still live.
(define_split
   [(set (match_operand:VSX_LE_128 0 "memory_operand")
 (match_operand:VSX_LE_128 1 "vsx_register_operand"))]
   "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed && !TARGET_P9_VECTOR
&& !altivec_indexed_or_indirect_operand (operands[0], mode)"
   [(const_int 0)]
{
   rs6000_emit_le_vsx_permute (operands[1], operands[1], mode);
   rs6000_emit_le_vsx_permute (operands[0], operands[1], mode);
   rs6000_emit_le_vsx_permute (operands[1], operands[1], mode);
   DONE;
})


Note also that swap optimization can handle more general cases than 
simplify-rtx.  In my view it's best to have it covered in both places.


Bill

  
Thanks,

Xionghu


Re: [PATCH] [i386] Fix ICE of insn does not satisfy its constraints.

2021-06-03 Thread Jakub Jelinek via Gcc-patches
On Thu, Jun 03, 2021 at 05:07:26PM +0800, liuhongt via Gcc-patches wrote:
> @@ -18163,10 +18163,10 @@ (define_expand "v16qiv16si2"
>"TARGET_AVX512F")
>  
>  (define_insn "avx2_v8qiv8si2"
> -  [(set (match_operand:V8SI 0 "register_operand" "=v")
> +  [(set (match_operand:V8SI 0 "register_operand" "=Yv")
>   (any_extend:V8SI
> (vec_select:V8QI
> - (match_operand:V16QI 1 "register_operand" "v")
> + (match_operand:V16QI 1 "register_operand" "Yv")
>   (parallel [(const_int 0) (const_int 1)
>  (const_int 2) (const_int 3)
>  (const_int 4) (const_int 5)

Why do you need this change (and similarly other v -> Yv changes)?
I mean, ix86_hard_regno_mode_ok for TARGET_AVX512F && !TARGET_AVX512VL
should return false for the 16-byte and 32-byte vector modes.

The reason to use Yv is typically where the match_operand has 64-byte
vector mode or scalar mode, yet it needs an AVX512VL instruction.

The changes to use Yw look ok, that is for the cases where the
insn requires both AVX512VL and AVX512BW, while ix86_hard_regno_mode_ok
ensures the xmm16+ regs won't be used for the 16/32-byte vectors when
AVX512VL is not on, it doesn't ensure that AVX512BW will be enabled.

Jakub



Re: [RFC/PATCH] updating global ranges and their effect on __builtin_unreachable code

2021-06-03 Thread Andrew MacLeod via Gcc-patches

On 6/2/21 6:32 AM, Aldy Hernandez wrote:
We've been having "issues" in our branch when exporting to the global 
space ranges that take into account previously known ranges 
(SSA_NAME_RANGE_INFO, etc).  For the longest time we had the export 
feature turned off because it had the potential of removing 
__builtin_unreachable code early in the pipeline.  This was causing 
one or two tests to fail.


I finally got fed up, and investigated why.

Take the following code:

  i_4 = somerandom ();
  if (i_4 < 0)
    goto ; [INV]
  else
    goto ; [INV]

   :
  __builtin_unreachable ();

   :

It turns out that both legacy evrp and VRP have code that notices the 
above pattern and sets the *global* range for i_4 to [0,MAX]. That is, 
the range for i_4 is set, not at BB4, but at the definition site.  See 
uses of assert_unreachable_fallthru_edge_p() for details.


This global range causes subsequent passes (VRP1 in the testcase 
below), to remove the checks and the __builtin_unreachable code 
altogether.


// pr80776-1.c
int somerandom (void);
void
Foo (void)
{
  int i = somerandom ();
  if (! (0 <= i))
    __builtin_unreachable ();
  if (! (0 <= i && i <= 99))
    __builtin_unreachable ();
  sprintf (number, "%d", i);
}

This means that by the time the -Wformat-overflow warning runs, the 
above sprintf has been left unguarded, and a bogus warning is issued.


Currently the above test does not warn, but that's because of an 
oversight in export_global_ranges().  This function is disregarding 
known global ranges (SSA_NAME_RANGE_INFO and SSA_NAME_PTR_INFO) and 
only setting ranges the ranger knows about.


For the above test the IL is:

   :
  i_4 = somerandom ();
  if (i_4 < 0)
    goto ; [INV]
  else
    goto ; [INV]

   :
  __builtin_unreachable ();

   :
  i.0_1 = (unsigned int) i_4;
  if (i.0_1 > 99)
    goto ; [INV]
  else
    goto ; [INV]

   :
  __builtin_unreachable ();

   :
  _7 = __builtin___sprintf_chk (&number, 1, 7, "%d", i_4);


Legacy evrp has determined that the range for i_4 is [0,MAX] per my 
analysis above, but ranger has no known range for i_4 at the 
definition site.  So at export_global_ranges time, ranger leaves the 
[0,MAX] alone.


OTOH, evrp sets the global range at the definition for i.0_1 to 
[0,99] per the same unreachable feature.  However, ranger has 
correctly determined that the range for i.0_1 at the definition is 
[0,MAX], which it then proceeds to export.  Since the current 
export_global_ranges (mistakenly) does not take into account previous 
global ranges, the ranges in the global tables end up like this:


i_4: [0, MAX]
i.0_1: [0, MAX]

This causes the first unreachable block to be removed in VRP1, but the 
second one to remain.  Later VRP can determine that i_4 in the sprintf 
call is [0,99], and no warning is issued.


But... the missing bogus warning is due to current 
export_global_ranges ignoring SSA_NAME_RANGE_INFO and friends, 
something which I'd like to fix.  However, fixing this, gets us back to:


i_4: [0, MAX]
i.0_1: [0, 99]

Which means, we'll be back to removing the unreachable blocks and 
issuing a warning in pr80776-1.c (like we have been since the 
beginning of time).


The attached patch fixes export_global_ranges to the expected 
behavior, and adds the previous XFAIL to pr80776-1.c, while 
documenting why this warning is issued in the first place.


Once legacy evrp is removed, this won't be an issue, as ranges in the 
IL will tell the truth.  However, this will mean that we will no 
longer remove the first __builtin_unreachable combo.  But ISTM, that 
would be correct behavior ??.




Tested on x86-64 Linux.

OK?


If I understand correctly, we will have to ensure that 
__builtin_unreachable ()  and the branches to them are removed by the 
end of gimple one way or another.  While legacy EVRP is still running, 
that isnt a problem.


This patch merely changes the values we export from ranger to be more up 
to date... and some fallout you've encountered.


This might get taken care of down the road.. there is some potential for 
identifying a range that dominates all uses and making that the global 
range.. that would also resolve the issue of builtin_unreachable().  
Anyway, we can deal with this when we try to move to ranger-only mode


OK.



Re: rs6000: Require ELFv2 ABI for ROP test (PR100750)

2021-06-03 Thread Bill Schmidt via Gcc-patches

On 6/1/21 7:44 PM, Segher Boessenkool wrote:

It would be nice if we had a selector for when we can use -mrop-protect,
instead of assuming it is only for ELFv2.


Agreed -- I'll take a note and try to get to this sometime.


[committed] diagnostic-show-locus: tweak rejection logic

2021-06-03 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as 4bdc6e17c9dc621f9502a84abceb2e17ae0418f8.

gcc/ChangeLog:
* diagnostic-show-locus.c (diagnostic_show_locus): Don't reject
printing the same location twice if there are fix-it hints,
multiple locations, or a label.

Signed-off-by: David Malcolm 
---
 gcc/diagnostic-show-locus.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/diagnostic-show-locus.c b/gcc/diagnostic-show-locus.c
index 4111cd66544..24bd0313126 100644
--- a/gcc/diagnostic-show-locus.c
+++ b/gcc/diagnostic-show-locus.c
@@ -2600,9 +2600,11 @@ diagnostic_show_locus (diagnostic_context * context,
 return;
 
   /* Don't print the same source location twice in a row, unless we have
- fix-it hints.  */
+ fix-it hints, or multiple locations, or a label.  */
   if (loc == context->last_location
-  && richloc->get_num_fixit_hints () == 0)
+  && richloc->get_num_fixit_hints () == 0
+  && richloc->get_num_locations () == 1
+  && richloc->get_range (0)->m_label == NULL)
 return;
 
   context->last_location = loc;
-- 
2.26.3



[committed] analyzer: show types for poisoned_svalue and compound_svalue

2021-06-03 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as e84fe25f638efe1ead4304693d91d7555e7a.

gcc/analyzer/ChangeLog:
* svalue.cc (poisoned_svalue::dump_to_pp): Dump type.
(compound_svalue::dump_to_pp): Dump any type.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/svalue.cc | 29 +++--
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/gcc/analyzer/svalue.cc b/gcc/analyzer/svalue.cc
index 897e84e8464..a16563d912a 100644
--- a/gcc/analyzer/svalue.cc
+++ b/gcc/analyzer/svalue.cc
@@ -735,9 +735,17 @@ void
 poisoned_svalue::dump_to_pp (pretty_printer *pp, bool simple) const
 {
   if (simple)
-pp_printf (pp, "POISONED(%s)", poison_kind_to_str (m_kind));
+{
+  pp_string (pp, "POISONED(");
+  print_quoted_type (pp, get_type ());
+  pp_printf (pp, ", %s)", poison_kind_to_str (m_kind));
+}
   else
-pp_printf (pp, "poisoned_svalue(%s)", poison_kind_to_str (m_kind));
+{
+  pp_string (pp, "poisoned_svalue(");
+  print_quoted_type (pp, get_type ());
+  pp_printf (pp, ", %s)", poison_kind_to_str (m_kind));
+}
 }
 
 /* Implementation of svalue::accept vfunc for poisoned_svalue.  */
@@ -1228,17 +1236,26 @@ compound_svalue::dump_to_pp (pretty_printer *pp, bool 
simple) const
   if (simple)
 {
   pp_string (pp, "COMPOUND(");
+  if (get_type ())
+   {
+ print_quoted_type (pp, get_type ());
+ pp_string (pp, ", ");
+   }
+  pp_character (pp, '{');
   m_map.dump_to_pp (pp, simple, false);
-  pp_character (pp, ')');
+  pp_string (pp, "})");
 }
   else
 {
   pp_string (pp, "compound_svalue (");
-  pp_string (pp, ", ");
+  if (get_type ())
+   {
+ print_quoted_type (pp, get_type ());
+ pp_string (pp, ", ");
+   }
   pp_character (pp, '{');
   m_map.dump_to_pp (pp, simple, false);
-  pp_string (pp, "}, ");
-  pp_character (pp, ')');
+  pp_string (pp, "})");
 }
 }
 
-- 
2.26.3



[committed] analyzer: remove unused prototypes

2021-06-03 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as 981d98b883ed521c88c295ed82227c605d82add4.

gcc/analyzer/ChangeLog:
* store.h (store::get_direct_binding): Remove unused decl.
(store::get_default_binding): Likewise.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/store.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/gcc/analyzer/store.h b/gcc/analyzer/store.h
index dc22d96f186..d68513ca94c 100644
--- a/gcc/analyzer/store.h
+++ b/gcc/analyzer/store.h
@@ -611,8 +611,6 @@ public:
 
   json::object *to_json () const;
 
-  const svalue *get_direct_binding (store_manager *mgr, const region *reg);
-  const svalue *get_default_binding (store_manager *mgr, const region *reg);
   const svalue *get_any_binding (store_manager *mgr, const region *reg) const;
 
   bool called_unknown_fn_p () const { return m_called_unknown_fn; }
-- 
2.26.3



Re: [Patch, fortran] PR fortran/100120/100816/100818/100819/100821 problems raised by aggregate data types

2021-06-03 Thread dhumieres.dominique--- via Gcc-patches

Hi José,


Patch tested only on x86_64-pc-linux-gnu.


Also tested on darwin20. The patch is OK for me.

Thanks for the work,

Dominique


Re: [Patch] Fortran/OpenMP: Add omp loop [PR99928]

2021-06-03 Thread Jakub Jelinek via Gcc-patches
On Thu, Jun 03, 2021 at 12:30:50AM +0200, Tobias Burnus wrote:
> This patch adds support for 'omp loop' to gfortran including the combined
> constructs. It also fixes some splitting issues with clauses in
> combined constructs.
> 
> It does not attempt to clean up all remaining Fortran issues with
> clauses in combined constructs (cf. below + PR).
> 
>  * * *
> 
> Since 'parallel loop' is now supported, using
>  !$omp parallel &
>  !$acc& loop
> no longer gave an error. (Same result before: '!$acc& do'.)

I think best would be just to remove that part of the testcase
in the loop patch and handle the !$omp with !$acc continuations
and vice versa in a separate change.  That seems to be a preexisting
bug not really related to whether we support loop or not.
fatal_error is certainly not ideal, but I can understand fixing
it otherwise might be hard.
Wonder if we just shouldn't treat the incorrect continuations
as if they were simple comments.

> The check does (for Fortran – and alike for C):
>   ! { dg-final { scan-tree-dump-not "omp 
> distribute\[^\n\r]*lastprivate\\(j00\\)" "gimple" } }
>   ! { dg-final { scan-tree-dump-not "omp 
> parallel\[^\n\r]*lastprivate\\(j00\\)" "gimple" } }
>   ! { dg-final { scan-tree-dump-not "omp for\[^\n\r]*lastprivate\\(j00\\)" 
> "gimple" } }
>   ! { dg-final { scan-tree-dump "omp simd\[^\n\r]*linear\\(j00:1\\)" "gimple" 
> } }
>   !$omp distribute parallel do simd default(none)
>   do j00 = 1, 64
>   end do
> 
> While C generates (original dump)
> 
>   #pragma omp distribute
>   #pragma omp parallel default(none)
> #pragma omp for nowait
> #pragma omp simd
> 
> Fortran generates the same except for:
> #pragma omp simd linear(j00:1)

The *-7 testcase is not appropriate for Fortran, it tests the
IV declared in the construct cases which Fortran has no counterpart for,
even if the IV is first seen within the construct, that would implicitly
declare it outside of the construct.

> +   if (gfc_match ("teams )") == MATCH_YES)
> + c->bind = OMP_BIND_TEAMS;
> +   else if (gfc_match ("parallel )") == MATCH_YES)
> + c->bind = OMP_BIND_PARALLEL;
> +   else if (gfc_match ("thread )") == MATCH_YES)
> + c->bind = OMP_BIND_THREAD;
> +   else
> + {
> +   gfc_error ("Expected TEAMS, PARALLEL or THEAD as binding in "

Typo, s/THEAD/THREAD/

> +  if (clauses->bind == OMP_BIND_TEAMS)
> +OMP_CLAUSE_BIND_KIND (c) = OMP_CLAUSE_BIND_TEAMS;
> +  else if (clauses->bind == OMP_BIND_PARALLEL)
> +OMP_CLAUSE_BIND_KIND (c) = OMP_CLAUSE_BIND_PARALLEL;
> +  else if (clauses->bind == OMP_BIND_THREAD)
> +OMP_CLAUSE_BIND_KIND (c) = OMP_CLAUSE_BIND_THREAD;
> +  else
> + gcc_unreachable ();

Wouldn't a switch (clauses->bind) be cleaner?

Otherwise LGTM.

Jakub



Re: [PATCH 1/2] Implement generic expression evaluator for range_query.

2021-06-03 Thread Aldy Hernandez via Gcc-patches
Andrew had some minor suggestions and cleanups after I posted this.

Andrew, is this what you had in mind?

Aldy

On Sat, May 29, 2021 at 6:06 AM Jeff Law  wrote:
>
>
>
> On 5/27/2021 2:55 AM, Aldy Hernandez via Gcc-patches wrote:
> > Right now, range_of_expr only works with constants, SSA names, and
> > pointers.  Anything else gets returned as VARYING.  This patch adds the
> > capability to deal with arbitrary expressions, inasmuch as these
> > tree codes are implemented in range-ops.cc.
> >
> > This will give us the ability to ask for the range of any tree expression,
> > not just constants and SSA names, with range_of_expr().
> >
> > This is a more generic implementation of determine_value_range in VRP.
> > A follow-up patch will remove all uses of it in favor of the
> > range_query API.
> >
> > Tested on x86-64 Linux.
> >
> > OK?
> >
> > gcc/ChangeLog:
> >
> >   * function-tests.c (test_ranges): Call gimple_range_tests.
> >   * gimple-range-gori.cc (gori_compute::expr_range_at_stmt): Use
> >   get_global_range_query instead of get_tree_range.
> >   * gimple-range.cc (fur_source::get_operand): Add argument to
> >   get_tree_range.
> >   (get_arith_expr_range): New.
> >   (get_tree_range): Add gimple and range_query arguments.
> >   Call get_arith_expr_range.
> >   (gimple_ranger::range_of_expr): Add argument to get_tree_range.
> >   Include gimple-range-tests.cc.
> >   * gimple-range.h (get_tree_range): Add argument.
> >   * selftest.h (gimple_range_tests): New.
> >   * value-query.cc (global_range_query::range_of_expr): Add
> >   argument to get_tree_range.
> >   * vr-values.c (vr_values::range_of_expr): Same.
> >   * gimple-range-tests.cc: New file.
> Both patches in this series are fine.
>
> Thanks,
> Jeff
>
From fe8f95d2baf92fc5fa967c5d183c5480f003e15d Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Wed, 26 May 2021 08:40:17 +0200
Subject: [PATCH 1/2] Implement generic expression evaluator for range_query.

Right now, range_of_expr only works with constants, SSA names, and
pointers.  Anything else gets returned as VARYING.  This patch adds the
capability to deal with arbitrary expressions, inasmuch as these
tree codes are implemented in range-ops.cc.

This will give us the ability to ask for the range of any tree expression,
not just constants and SSA names, with range_of_expr().

This is a more generic implementation of determine_value_range in VRP.
A follow-up patch will remove all uses of it in favor of the
range_query API.

gcc/ChangeLog:

	* function-tests.c (test_ranges): Call gimple_range_tests.
	* gimple-range-cache.cc (ranger_cache::range_of_expr): Pass stmt
	to get_tree_range.
	* gimple-range-gori.cc (gori_compute::expr_range_at_stmt): Use
	get_global_range_query instead of get_tree_range.
	* gimple-range.cc (fur_source::get_operand): Add argument to
	get_tree_range.
	(get_tree_range): Move to value-query.cc.
	Call get_arith_expr_range.
	(gimple_ranger::range_of_expr): Add argument to get_tree_range.
	Include gimple-range-tests.cc.
	* gimple-range.h (fold_range): Add argument.
	(get_tree_range): Remove.
	* selftest.h (gimple_range_tests): New.
	* value-query.cc (global_range_query::range_of_expr): Add
	stmt argument.
	(range_query::get_tree_range): Move from gimple-range.cc.
	* value-query.h (class range_query): Add get_tree_range and
	get_arith_expr_range.  Make fur_source a friend.
	* vr-values.c (vr_values::range_of_expr): Pass stmt to
	get_tree_range.
	* gimple-range-tests.cc: New file.
---
 gcc/function-tests.c  |  5 +++
 gcc/gimple-range-cache.cc |  4 +-
 gcc/gimple-range-tests.cc | 72 +
 gcc/gimple-range.cc   | 64 ++---
 gcc/gimple-range.h|  7 ++--
 gcc/selftest.h|  1 +
 gcc/value-query.cc| 84 ++-
 gcc/value-query.h |  2 +
 gcc/vr-values.c   |  2 +-
 9 files changed, 172 insertions(+), 69 deletions(-)
 create mode 100644 gcc/gimple-range-tests.cc

diff --git a/gcc/function-tests.c b/gcc/function-tests.c
index 1b8f665cde1..0ac1d37f8ff 100644
--- a/gcc/function-tests.c
+++ b/gcc/function-tests.c
@@ -581,6 +581,11 @@ test_ranges ()
   push_cfun (fun);
   range_tests ();
   range_op_tests ();
+
+  build_cfg (fndecl);
+  convert_to_ssa (fndecl);
+  gimple_range_tests ();
+
   pop_cfun ();
 }
 
diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index cc27574b7b4..c58acf48dfb 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -827,7 +827,7 @@ ranger_cache::range_of_expr (irange &r, tree name, gimple *stmt)
 {
   if (!gimple_range_ssa_p (name))
 {
-  get_tree_range (r, name);
+  get_tree_range (r, name, stmt);
   return true;
 }
 
@@ -860,7 +860,7 @@ ranger_cache::range_of_expr (irange &r, tree name, gimple *stmt)
 	}
 }
   else
-get_tree_range (r, expr);
+get_tree_range (r, expr, NULL);
   return false;
 }
 
d

Re: [PATCH 1/2] Implement generic expression evaluator for range_query.

2021-06-03 Thread Andrew MacLeod via Gcc-patches

On 6/3/21 11:39 AM, Aldy Hernandez via Gcc-patches wrote:

Andrew had some minor suggestions and cleanups after I posted this.

Andrew, is this what you had in mind?

Aldy

On Sat, May 29, 2021 at 6:06 AM Jeff Law  wrote:



On 5/27/2021 2:55 AM, Aldy Hernandez via Gcc-patches wrote:

Right now, range_of_expr only works with constants, SSA names, and
pointers.  Anything else gets returned as VARYING.  This patch adds the
capability to deal with arbitrary expressions, inasmuch as these
tree codes are implemented in range-ops.cc.

This will give us the ability to ask for the range of any tree expression,
not just constants and SSA names, with range_of_expr().

This is a more generic implementation of determine_value_range in VRP.
A follow-up patch will remove all uses of it in favor of the
range_query API.

Tested on x86-64 Linux.

OK?

gcc/ChangeLog:

   * function-tests.c (test_ranges): Call gimple_range_tests.
   * gimple-range-gori.cc (gori_compute::expr_range_at_stmt): Use
   get_global_range_query instead of get_tree_range.
   * gimple-range.cc (fur_source::get_operand): Add argument to
   get_tree_range.
   (get_arith_expr_range): New.
   (get_tree_range): Add gimple and range_query arguments.
   Call get_arith_expr_range.
   (gimple_ranger::range_of_expr): Add argument to get_tree_range.
   Include gimple-range-tests.cc.
   * gimple-range.h (get_tree_range): Add argument.
   * selftest.h (gimple_range_tests): New.
   * value-query.cc (global_range_query::range_of_expr): Add
   argument to get_tree_range.
   * vr-values.c (vr_values::range_of_expr): Same.
   * gimple-range-tests.cc: New file.

Both patches in this series are fine.

Thanks,
Jeff


yeah, OK.

Andrew



Re: [PATCH] wwwdocs: Do not rewrite the page titles

2021-06-03 Thread Jonathan Wakely via Gcc-patches

Ping.

Is this OK now?


On 18/04/21 23:45 +0100, Jonathan Wakely wrote:

Remove GNU and FSF attribution from HTML page titles.


I don't see why we should have to "comply with the GNU style" if we're
truly an independent project run by the GCC developers and aided by
the steering committee.

OK for wwwdocs?





commit d157082f49725510560cb83f2f1c045e2968ad3b
Author: Jonathan Wakely 
Date:   Sun Apr 18 23:42:26 2021 +0100

   Do not rewrite the page titles

   Remove GNU and FSF attribution from HTML page titles.

diff --git a/htdocs/style.mhtml b/htdocs/style.mhtml
index 677fd05a..eb7c353d 100644
--- a/htdocs/style.mhtml
+++ b/htdocs/style.mhtml
@@ -32,14 +32,6 @@
 


-;;; Redefine the  tag to comply with the GNU style.
-
-
-
-%body
-- GNU Project - Free Software Foundation (FSF)
-
-
;;; Redefine the  tag, adding navigation and a standard footer.






[c-family] Fix issue for external subtypes with -fdump-ada-spec

2021-06-03 Thread Eric Botcazou
This works around an irregularity of the language whereby subtypes, unlike
types, are not visible through a limited_with clause.

Tested on x86-64/Linux, applied on the mainline.


2021-06-03  Eric Botcazou  

c-family/
* c-ada-spec.c (pp_ada_tree_identifier): Tidy up.
(dump_ada_node) : Deal specially with external subtypes.

-- 
Eric Botcazoudiff --git a/gcc/c-family/c-ada-spec.c b/gcc/c-family/c-ada-spec.c
index ef0c74c3f08..751cc0edef8 100644
--- a/gcc/c-family/c-ada-spec.c
+++ b/gcc/c-family/c-ada-spec.c
@@ -1341,49 +1341,46 @@ pp_ada_tree_identifier (pretty_printer *buffer, tree node, tree type,
   char *s = to_ada_name (name, &space_found);
   tree decl = get_underlying_decl (type);
 
-  /* If the entity comes from another file, generate a package prefix.  */
   if (decl)
 {
-  expanded_location xloc = expand_location (decl_sloc (decl, false));
+  /* If the entity comes from another file, generate a package prefix.  */
+  const expanded_location xloc = expand_location (decl_sloc (decl, false));
 
-  if (xloc.file && xloc.line)
+  if (xloc.line && xloc.file && xloc.file != current_source_file)
 	{
-	  if (xloc.file != current_source_file)
+	  switch (TREE_CODE (type))
 	{
-	  switch (TREE_CODE (type))
-		{
-		  case ENUMERAL_TYPE:
-		  case INTEGER_TYPE:
-		  case REAL_TYPE:
-		  case FIXED_POINT_TYPE:
-		  case BOOLEAN_TYPE:
-		  case REFERENCE_TYPE:
-		  case POINTER_TYPE:
-		  case ARRAY_TYPE:
-		  case RECORD_TYPE:
-		  case UNION_TYPE:
-		  case TYPE_DECL:
-		if (package_prefix)
-		  {
-			char *s1 = get_ada_package (xloc.file);
-			append_withs (s1, limited_access);
-			pp_string (buffer, s1);
-			pp_dot (buffer);
-			free (s1);
-		  }
-		break;
-		  default:
-		break;
-		}
+	  case ENUMERAL_TYPE:
+	  case INTEGER_TYPE:
+	  case REAL_TYPE:
+	  case FIXED_POINT_TYPE:
+	  case BOOLEAN_TYPE:
+	  case REFERENCE_TYPE:
+	  case POINTER_TYPE:
+	  case ARRAY_TYPE:
+	  case RECORD_TYPE:
+	  case UNION_TYPE:
+	  case TYPE_DECL:
+		if (package_prefix)
+		  {
+		char *s1 = get_ada_package (xloc.file);
+		append_withs (s1, limited_access);
+		pp_string (buffer, s1);
+		pp_dot (buffer);
+		free (s1);
+		  }
+		break;
+	  default:
+		break;
+	}
 
-	  /* Generate the additional package prefix for C++ classes.  */
-	  if (separate_class_package (decl))
-		{
-		  pp_string (buffer, "Class_");
-		  pp_string (buffer, s);
-		  pp_dot (buffer);
-		}
-	 }
+	  /* Generate the additional package prefix for C++ classes.  */
+	  if (separate_class_package (decl))
+	{
+	  pp_string (buffer, "Class_");
+	  pp_string (buffer, s);
+	  pp_dot (buffer);
+	}
 	}
 }
 
@@ -2220,6 +2217,24 @@ dump_ada_node (pretty_printer *buffer, tree node, tree type, int spc,
 		{
 		  tree type_name = TYPE_NAME (TREE_TYPE (node));
 
+		  /* Generate "access " instead of "access "
+		 if the subtype comes from another file, because subtype
+		 declarations do not contribute to the limited view of a
+		 package and thus subtypes cannot be referenced through
+		 a limited_with clause.  */
+		  if (type_name
+		  && TREE_CODE (type_name) == TYPE_DECL
+		  && DECL_ORIGINAL_TYPE (type_name)
+		  && TYPE_NAME (DECL_ORIGINAL_TYPE (type_name)))
+		{
+		  const expanded_location xloc
+			= expand_location (decl_sloc (type_name, false));
+		  if (xloc.line
+			  && xloc.file
+			  && xloc.file != current_source_file)
+			type_name = DECL_ORIGINAL_TYPE (type_name);
+		}
+
 		  /* For now, handle access-to-access as System.Address.  */
 		  if (TREE_CODE (TREE_TYPE (node)) == POINTER_TYPE)
 		{
@@ -2241,8 +2256,8 @@ dump_ada_node (pretty_printer *buffer, tree node, tree type, int spc,
 		{
 		  if (!type || TREE_CODE (type) != FUNCTION_DECL)
 			{
-			  pp_string (buffer, "access ");
 			  is_access = true;
+			  pp_string (buffer, "access ");
 
 			  if (quals & TYPE_QUAL_CONST)
 			pp_string (buffer, "constant ");


[c-family] Fix issue for nested record types with -fdump-ada-spec

2021-06-03 Thread Eric Botcazou
Ada does not support anonymous record declarations nested in other record 
declarations so -fdump-ada-spec needs to unnest them, and this contains a few 
fixes for this machinery.

Tested on x86-64/Linux, applied on the mainline.


2021-06-03  Eric Botcazou  

c-family/
* c-ada-spec.c (dump_ada_macros): Minor tweaks.
(dump_ada_decl_name): Likewise.
(dump_anonymous_type_name): Remove parent parameter and adjust.
(dump_sloc): Minor tweak.
(dump_ada_array_type): Remove type parameter and adjust.
(dump_ada_enum_type): Remove parent parameter and adjust.
(dump_ada_node): Adjust calls to above functions.
(dumped_anonymous_types): New global variable.
(dump_nested_types_1): Rename into...
(dump_nested_types): ...this.
(dump_nested_type): Remove parent and dumped_types parameters.
: Replace dumped_types with dumped_anonymous_types.
Adjust calls to dump_anonymous_type_name and dump_ada_array_type.
(dump_ada_specs): Initialize and free dumped_anonymous_types.

-- 
Eric Botcazoudiff --git a/gcc/c-family/c-ada-spec.c b/gcc/c-family/c-ada-spec.c
index 751cc0edef8..a2669c604a9 100644
--- a/gcc/c-family/c-ada-spec.c
+++ b/gcc/c-family/c-ada-spec.c
@@ -408,8 +408,8 @@ dump_ada_macros (pretty_printer *pp, const char* file)
 			}
 		  else
 			{
-			  chars_seen = sprintf
-			((char *) buffer, "Character'Val (%d)", (int) c);
+			  chars_seen = sprintf ((char *) buffer,
+		"Character'Val (%d)", (int) c);
 			  buffer += chars_seen;
 			}
 		}
@@ -611,7 +611,7 @@ dump_ada_macros (pretty_printer *pp, const char* file)
 	  pp_string (pp, ";  --  ");
 	  pp_string (pp, sloc.file);
 	  pp_colon (pp);
-	  pp_scalar (pp, "%d", sloc.line);
+	  pp_decimal_int (pp, sloc.line);
 	  pp_newline (pp);
 	}
   else
@@ -1464,28 +1464,21 @@ dump_ada_decl_name (pretty_printer *buffer, tree decl, bool limited_access)
 	{
 	  pp_string (buffer, "anon");
 	  if (TREE_CODE (decl) == FIELD_DECL)
-	pp_scalar (buffer, "%d", DECL_UID (decl));
+	pp_decimal_int (buffer, DECL_UID (decl));
 	  else
-	pp_scalar (buffer, "%d", TYPE_UID (TREE_TYPE (decl)));
+	pp_decimal_int (buffer, TYPE_UID (TREE_TYPE (decl)));
 	}
   else if (TREE_CODE (type_name) == IDENTIFIER_NODE)
 	pp_ada_tree_identifier (buffer, type_name, decl, limited_access);
 }
 }
 
-/* Dump in BUFFER a name for the type T, which is a _TYPE without TYPE_NAME.
-   PARENT is the parent node of T.  */
+/* Dump in BUFFER a name for the type T, which is a TYPE without TYPE_NAME.  */
 
 static void
-dump_anonymous_type_name (pretty_printer *buffer, tree t, tree parent)
+dump_anonymous_type_name (pretty_printer *buffer, tree t)
 {
-  if (DECL_NAME (parent))
-pp_ada_tree_identifier (buffer, DECL_NAME (parent), parent, false);
-  else
-{
-  pp_string (buffer, "anon");
-  pp_scalar (buffer, "%d", TYPE_UID (TREE_TYPE (parent)));
-}
+  pp_string (buffer, "anon");
 
   switch (TREE_CODE (t))
 {
@@ -1506,7 +1499,7 @@ dump_anonymous_type_name (pretty_printer *buffer, tree t, tree parent)
   break;
 }
 
-  pp_scalar (buffer, "%d", TYPE_UID (t));
+  pp_decimal_int (buffer, TYPE_UID (t));
 }
 
 /* Dump in BUFFER aspect Import on a given node T.  SPC is the current
@@ -1757,12 +1750,12 @@ dump_sloc (pretty_printer *buffer, tree node)
 {
   expanded_location xloc;
 
-  xloc.file = NULL;
-
   if (DECL_P (node))
 xloc = expand_location (DECL_SOURCE_LOCATION (node));
   else if (EXPR_HAS_LOCATION (node))
 xloc = expand_location (EXPR_LOCATION (node));
+  else
+xloc.file = NULL;
 
   if (xloc.file)
 {
@@ -1790,11 +1783,11 @@ is_char_array (tree t)
 	 && id_equal (DECL_NAME (TYPE_NAME (t)), "char");
 }
 
-/* Dump in BUFFER an array type NODE of type TYPE in Ada syntax.  SPC is the
-   indentation level.  */
+/* Dump in BUFFER an array type NODE in Ada syntax.  SPC is the indentation
+   level.  */
 
 static void
-dump_ada_array_type (pretty_printer *buffer, tree node, tree type, int spc)
+dump_ada_array_type (pretty_printer *buffer, tree node, int spc)
 {
   const bool char_array = is_char_array (node);
 
@@ -1823,8 +1816,8 @@ dump_ada_array_type (pretty_printer *buffer, tree node, tree type, int spc)
 	  || (!RECORD_OR_UNION_TYPE_P (tmp)
 	  && TREE_CODE (tmp) != ENUMERAL_TYPE))
 	dump_ada_node (buffer, tmp, node, spc, false, true);
-  else if (type)
-	dump_anonymous_type_name (buffer, tmp, type);
+  else
+	dump_anonymous_type_name (buffer, tmp);
 }
 }
 
@@ -1954,11 +1947,10 @@ is_simple_enum (tree node)
 }
 
 /* Dump in BUFFER the declaration of enumeral NODE of type TYPE in Ada syntax.
-   PARENT is the parent node of NODE.  SPC is the indentation level.  */
+   SPC is the indentation level.  */
 
 static void
-dump_ada_enum_type (pretty_printer *buffer, tree node, tree type, tree parent,
-		int spc)
+dump_ada_enum_type (pretty_printer *buffer, tree node, tree type, int spc)
 {
   if (is_simple_enum

[c-family] Fix duplicate name issues in output of -fdump-ada-spec

2021-06-03 Thread Eric Botcazou
The namespace rules are different in the C family of languages and in Ada, and a
few adjustments are further needed in -fdump-ada-spec because of them.

Tested on x86-64/Linux, applied on the mainline.


2021-06-03  Eric Botcazou  

c-family/
* c-ada-spec.c (dump_ada_enum_type): Dump a prefix for constants.
(htable_t): New typedef.
(overloaded_names): Use it.
(add_name): New function.
(init_overloaded_names): Use add_name to populate the table and add
special cases for sigaction and stat.
(overloaded_name_p): Rename into...
(overloading_index): ...this.  Do not initialize overloaded_names table
here.  Return the index or zero.
(dump_ada_declaration): Minor tweaks.  Do not skip overloaded functions
but add an overloading suffix instead.
(dump_ada_specs): Initialize overloaded_names tables here.

-- 
Eric Botcazoudiff --git a/gcc/c-family/c-ada-spec.c b/gcc/c-family/c-ada-spec.c
index 29eb0b01a91..ef0c74c3f08 100644
--- a/gcc/c-family/c-ada-spec.c
+++ b/gcc/c-family/c-ada-spec.c
@@ -2003,7 +2003,15 @@ dump_ada_enum_type (pretty_printer *buffer, tree node, tree type, tree parent,
 	  pp_semicolon (buffer);
 	  newline_and_indent (buffer, spc);
 
+	  if (TYPE_NAME (node))
+	dump_ada_node (buffer, node, NULL_TREE, spc, false, true);
+	  else if (type)
+	dump_ada_node (buffer, type, NULL_TREE, spc, false, true);
+	  else
+	dump_anonymous_type_name (buffer, node, parent);
+	  pp_underscore (buffer);
 	  pp_ada_tree_identifier (buffer, TREE_PURPOSE (value), node, false);
+
 	  pp_string (buffer, " : constant ");
 
 	  if (TYPE_NAME (node))
@@ -2628,11 +2636,31 @@ struct overloaded_name_hasher : delete_ptr_hash
 { return a->name == b->name; }
 };
 
-static hash_table *overloaded_names;
+typedef hash_table htable_t;
+
+static htable_t *overloaded_names;
+
+/* Add an overloaded NAME with N occurrences to TABLE.  */
+
+static void
+add_name (const char *name, unsigned int n, htable_t *table)
+{
+  struct overloaded_name_hash in, *h, **slot;
+  tree id = get_identifier (name);
+  hashval_t hash = htab_hash_pointer (id);
+  in.hash = hash;
+  in.name = id;
+  slot = table->find_slot_with_hash (&in, hash, INSERT);
+  h = new overloaded_name_hash;
+  h->hash = hash;
+  h->name = id;
+  h->n = n;
+  *slot = h;
+}
 
 /* Initialize the table with the problematic overloaded names.  */
 
-static hash_table *
+static htable_t *
 init_overloaded_names (void)
 {
   static const char *names[] =
@@ -2640,41 +2668,31 @@ init_overloaded_names (void)
   { "memchr", "rawmemchr", "memrchr", "strchr", "strrchr", "strchrnul",
 "strpbrk", "strstr", "strcasestr", "index", "rindex", "basename" };
 
-  hash_table *table
-= new hash_table (64);
+  htable_t *table = new htable_t (64);
 
   for (unsigned int i = 0; i < ARRAY_SIZE (names); i++)
-{
-  struct overloaded_name_hash in, *h, **slot;
-  tree id = get_identifier (names[i]);
-  hashval_t hash = htab_hash_pointer (id);
-  in.hash = hash;
-  in.name = id;
-  slot = table->find_slot_with_hash (&in, hash, INSERT);
-  h = new overloaded_name_hash;
-  h->hash = hash;
-  h->name = id;
-  h->n = 0;
-  *slot = h;
-}
+add_name (names[i], 0, table);
+
+  /* Consider that sigaction() is overloaded by struct sigaction for QNX.  */
+  add_name ("sigaction", 1, table);
+
+  /* Consider that stat() is overloaded by struct stat for QNX.  */
+  add_name ("stat", 1, table);
 
   return table;
 }
 
-/* Return whether NAME cannot be supported as overloaded name.  */
+/* Return the overloading index of NAME or 0 if NAME is not overloaded.  */
 
-static bool
-overloaded_name_p (tree name)
+static unsigned int
+overloading_index (tree name)
 {
-  if (!overloaded_names)
-overloaded_names = init_overloaded_names ();
-
   struct overloaded_name_hash in, *h;
   hashval_t hash = htab_hash_pointer (name);
   in.hash = hash;
   in.name = name;
   h = overloaded_names->find_with_hash (&in, hash);
-  return h && ++h->n > 1;
+  return h ? ++h->n : 0;
 }
 
 /* Dump in BUFFER constructor spec corresponding to T for TYPE.  */
@@ -2798,14 +2816,17 @@ dump_ada_declaration (pretty_printer *buffer, tree t, tree type, int spc)
 	}
 
   /* Skip unnamed or anonymous structs/unions/enum types.  */
-  if (!orig && !decl_name && !name
+  if (!orig
 	  && (RECORD_OR_UNION_TYPE_P (TREE_TYPE (t))
-	  || TREE_CODE (TREE_TYPE (t)) == ENUMERAL_TYPE))
+	  || TREE_CODE (TREE_TYPE (t)) == ENUMERAL_TYPE)
+	  && !decl_name
+	  && !name)
 	return 0;
 
-	/* Skip anonymous enum types (duplicates of real types).  */
+  /* Skip duplicates of structs/unions/enum types built in C++.  */
   if (!orig
-	  && TREE_CODE (TREE_TYPE (t)) == ENUMERAL_TYPE
+	  && (RECORD_OR_UNION_TYPE_P (TREE_TYPE (t))
+	  || TREE_CODE (TREE_TYPE (t)) == ENUMERAL_TYPE)
 	  && decl_name
 	  && (*IDENTIFIER_POINTER (decl_name) == '.'
 	  || *IDENTIFIER_POINTER (decl_

Re: [PATCH] wwwdocs: Do not rewrite the page titles

2021-06-03 Thread Jonathan Wakely via Gcc-patches

On 03/06/21 16:50 +0100, Jonathan Wakely wrote:

Ping.

Is this OK now?


On 18/04/21 23:45 +0100, Jonathan Wakely wrote:

Remove GNU and FSF attribution from HTML page titles.


I don't see why we should have to "comply with the GNU style" if we're
truly an independent project run by the GCC developers and aided by
the steering committee.

OK for wwwdocs?


An alternative change would be to just drop the mention of the FSF,
since they don't fund GCC or provide hosting etc., as in the attached
patch.

And as I pointed out previously, none of these sites refer to the FSF
in their page s:

https://www.gnu.org/software/gdb/
https://www.gnu.org/software/libc/
https://www.gnu.org/software/guile/
https://www.gnu.org/software/emacs/
https://guix.gnu.org/

(Some don't refer to the GNU project either.)


diff --git a/htdocs/style.mhtml b/htdocs/style.mhtml
index 677fd05a..1f7139eb 100644
--- a/htdocs/style.mhtml
+++ b/htdocs/style.mhtml
@@ -32,12 +32,12 @@
  
 
 
-;;; Redefine the  tag to comply with the GNU style.
+;;; Redefine the  tag to mention the GNU project.
 
 
 
 %body
-- GNU Project - Free Software Foundation (FSF)
+- GNU Project
 
 
 ;;; Redefine the  tag, adding navigation and a standard footer.


Re: [PATCH 04/11] cris: Update unexpected empty split condition

2021-06-03 Thread Hans-Peter Nilsson via Gcc-patches
> From: Kewen.Lin 
> Date: Thu, 3 Jun 2021 07:45:57 +0200

> on 2021/6/2 Hans-Peter Nilsson wrote:
> >> From: Kewen Lin 
> >> Date: Wed, 2 Jun 2021 07:04:54 +0200
> > 
> >> gcc/ChangeLog:
> >>
> >>* config/cris/cris.md (*addi_reload): Fix empty split condition.

> >> -  ""
> >> +  "&& 1"

> > Ok, thanks, if only for all-round consistency.
> > 
> > In preparation for a warning for an empty condition?  I'm
> > usually all for .md-warnings, but I'm not sure about the
> > benefit of that one, though.  Those "&& 1" look...hackish.
> 
> Thanks!  Yeah, the 01/11 patch aims to raise one error message
> for the define_insn_and_split whose split condition is empty
> while insn condition isn't.  In most cases, when we write one
> define_insn_and_split we want the splitting only to take effect
> while we see the define_insn matching happen (insn cond holds),
> but if we leave the split condition empty, the splitting will
> be done always, it could result in some unexpected consequence.
> Mostly this is unintentional.

It certainly was in the patch above!

>  The error message is to avoid
> people to make it unintentionally.
> 
> As you may have seen from the discussion under the 00/11 thread,
> we will probably end up with some other solution, so I will hold
> the changes for the ports, sorry for wasting your time and the
> other port maintainers'.

No worries: I certainly don't consider it wasted and I'd
prefer to have the patch above committed sooner than the
conclusion of that discussion.  (If you don't get to it,
I'll do it, after a round of testing.)

If you're considering further target patches to adjust for
eventually changed semantics in the define_insn_and_split
split-condition, then whatever trivial patch to cris.md that
gets the effect of the one you sent is preapproved.

Again, thanks.

brgds, H-P


Fwd: [PUSHED] Skip out on processing __builtin_clz when varying.

2021-06-03 Thread Aldy Hernandez via Gcc-patches
Ping*2

-- Forwarded message -
From: Aldy Hernandez 
Date: Thu, May 13, 2021, 20:02
Subject: Re: [PUSHED] Skip out on processing __builtin_clz when varying.
To: Jakub Jelinek 
Cc: GCC patches 




On 5/12/21 5:08 PM, Jakub Jelinek wrote:
> On Wed, May 12, 2021 at 05:01:00PM -0400, Aldy Hernandez via Gcc-patches
wrote:
>>
>>  PR c/100521
>>  * gimple-range.cc (range_of_builtin_call): Skip out on
>>processing __builtin_clz when varying.
>> ---
>>   gcc/gimple-range.cc | 2 +-
>>   gcc/testsuite/gcc.dg/pr100521.c | 8 
>>   2 files changed, 9 insertions(+), 1 deletion(-)
>>   create mode 100644 gcc/testsuite/gcc.dg/pr100521.c
>>
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/pr100521.c
>> @@ -0,0 +1,8 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +int
>> +__builtin_clz (int a)
>
> Is this intentional?  People shouldn't be redefining builtins...

Ughhh.  I don't think that's intentional.  For that matter, the current
nor the old code is designed to deal with this, especially in this case
when the builtin is being redefined with incompatible arguments.  That
is, the above "builtin" has a signed integer as an argument, whereas the
original builtin had an unsigned one.

In looking at the original vr-values code, I think this could use a
cleanup.  First, ranges from range_of_expr are always numeric so we
should adjust.  Also, the checks for non-zero were assuming the argument
was unsigned, which in the above redirect is clearly not.  I've cleaned
this up, so that it works either way, though perhaps we should _also_
bail on non-builtins. I don't know...this is before my time.

BTW, I've removed the following annoying idiom:

- int newmini = prec - 1 - wi::floor_log2 (r.upper_bound ());
- if (newmini == prec)

This is really a check for r.upper_bound() == 0, as floor_log2(0)
returns -1.  It's confusing.

How does this look?  For reference, the original code where this all
came from is 82b6d25d289195.

Thanks for pointing this out.
Aldy
From f8a958e8028ed129558f9ad7ccf423c834d377bd Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Thu, 13 May 2021 13:47:41 -0400
Subject: [PATCH] Cleanup clz and ctz code in range_of_builtin_call.

gcc/ChangeLog:

	* gimple-range.cc (range_of_builtin_call): Cleanup clz and ctz
	code.
---
 gcc/gimple-range.cc | 43 ---
 1 file changed, 20 insertions(+), 23 deletions(-)

diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 5b288d8e6a7..b33ba1c8099 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -736,33 +736,29 @@ range_of_builtin_call (range_query &query, irange &r, gcall *call)
 	}
 
   query.range_of_expr (r, arg, call);
-  // From clz of minimum we can compute result maximum.
-  if (r.constant_p () && !r.varying_p ())
+  if (!r.undefined_p ())
 	{
-	  int newmaxi = prec - 1 - wi::floor_log2 (r.lower_bound ());
-	  // Argument is unsigned, so do nothing if it is [0, ...] range.
-	  if (newmaxi != prec)
+	  // From clz of minimum we can compute result maximum.
+	  if (wi::gt_p (r.lower_bound (), 0, TYPE_SIGN (r.type (
+	{
+	  maxi = prec - 1 - wi::floor_log2 (r.lower_bound ());
+	  if (mini == -2)
+		mini = 0;
+	}
+	  else if (!range_includes_zero_p (&r))
 	{
 	  mini = 0;
-	  maxi = newmaxi;
+	  maxi = prec - 1;
 	}
-	}
-  else if (!range_includes_zero_p (&r))
-	{
-	  maxi = prec - 1;
-	  mini = 0;
-	}
-  if (mini == -2)
-	break;
-  // From clz of maximum we can compute result minimum.
-  if (r.constant_p ())
-	{
-	  int newmini = prec - 1 - wi::floor_log2 (r.upper_bound ());
-	  if (newmini == prec)
+	  if (mini == -2)
+	break;
+	  // From clz of maximum we can compute result minimum.
+	  wide_int max = r.upper_bound ();
+	  int newmini = prec - 1 - wi::floor_log2 (max);
+	  if (max == 0)
 	{
-	  // Argument range is [0, 0].  If CLZ_DEFINED_VALUE_AT_ZERO
-	  // is 2 with VALUE of prec, return [prec, prec], otherwise
-	  // ignore the range.
+	  // If CLZ_DEFINED_VALUE_AT_ZERO is 2 with VALUE of prec,
+	  // return [prec, prec], otherwise ignore the range.
 	  if (maxi == prec)
 		mini = prec;
 	}
@@ -803,7 +799,8 @@ range_of_builtin_call (range_query &query, irange &r, gcall *call)
   query.range_of_expr (r, arg, call);
   if (!r.undefined_p ())
 	{
-	  if (r.lower_bound () != 0)
+	  // If arg is non-zero, then use [0, prec - 1].
+	  if (!range_includes_zero_p (&r))
 	{
 	  mini = 0;
 	  maxi = prec - 1;
-- 
2.31.1



[PATCH] PR libstdc++/98842: Fixed Constraints on operator<=>(optional, U)

2021-06-03 Thread Seija K. via Gcc-patches
The original operator was underconstrained. _Up needs to fulfill
compare_three_way_result,
as mentioned in this bug report
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98842

diff --git a/libstdc++-v3/include/std/optional
b/libstdc++-v3/include/std/optional
index 8b9e038e6e510..9e61c1b2cbfbd 100644
--- a/libstdc++-v3/include/std/optional
+++ b/libstdc++-v3/include/std/optional
@@ -1234,7 +1234,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { return !__rhs || __lhs >= *__rhs; }

 #ifdef __cpp_lib_three_way_comparison
-  template
+  template _Up>
 constexpr compare_three_way_result_t<_Tp, _Up>
 operator<=>(const optional<_Tp>& __x, const _Up& __v)
 { return bool(__x) ? *__x <=> __v : strong_ordering::less; }


[RFC] Implementing detection of saturation and rounding arithmetic

2021-06-03 Thread Andre Vieira (lists) via Gcc-patches

Hi,

This RFC is motivated by the IV sharing RFC in 
https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569502.html and the 
need to have the IVOPTS pass be able to clean up IV's shared between 
multiple loops. When creating a similar problem with C code I noticed 
IVOPTs treated IV's with uses outside the loop differently, this didn't 
even required multiple loops, take for instance the following example 
using SVE intrinsics:


#include 
#include 
extern void use (char *);
void bar (char  * __restrict__ a, char * __restrict__ b, char * 
__restrict__ c, unsigned n)

{
    svbool_t all_true = svptrue_b8 ();
  unsigned i = 0;
  if (n < (UINT_MAX - svcntb() - 1))
    {
    for (; i < n; i += svcntb())
    {
    svuint8_t va = svld1 (all_true, (uint8_t*)a);
    svuint8_t vb = svld1 (all_true, (uint8_t*)b);
    svst1 (all_true, (uint8_t *)c, svadd_z (all_true, va,vb));
    a += svcntb();
    b += svcntb();
    c += svcntb();
    }
    }
  use (a);
}

IVOPTs tends to generate a shared IV for SVE memory accesses, as we 
don't have a post-increment for SVE load/stores. If we had not included 
'use (a);' in this example, IVOPTs would have replaced the IV's for a, b 
and c with a single one, (also used for the loop-control). See:


   [local count: 955630225]:
  # ivtmp.7_8 = PHI 
  va_14 = MEM  [(unsigned char *)a_10(D) + ivtmp.7_8 * 1];
  vb_15 = MEM  [(unsigned char *)b_11(D) + ivtmp.7_8 * 1];
  _2 = svadd_u8_z ({ -1, ... }, va_14, vb_15);
  MEM <__SVUint8_t> [(unsigned char *)c_12(D) + ivtmp.7_8 * 1] = _2;
  ivtmp.7_25 = ivtmp.7_8 + POLY_INT_CST [16, 16];
  i_23 = (unsigned int) ivtmp.7_25;
  if (n_9(D) > i_23)
    goto ; [89.00%]
  else
    goto ; [11.00%]

 However, due to the 'use (a);' it will create two IVs one for 
loop-control, b and c and one for a. See:


  [local count: 955630225]:
  # a_28 = PHI 
  # ivtmp.7_25 = PHI 
  va_15 = MEM  [(unsigned char *)a_28];
  vb_16 = MEM  [(unsigned char *)b_12(D) + ivtmp.7_25 * 1];
  _2 = svadd_u8_z ({ -1, ... }, va_15, vb_16);
  MEM <__SVUint8_t> [(unsigned char *)c_13(D) + ivtmp.7_25 * 1] = _2;
  a_18 = a_28 + POLY_INT_CST [16, 16];
  ivtmp.7_24 = ivtmp.7_25 + POLY_INT_CST [16, 16];
  i_8 = (unsigned int) ivtmp.7_24;
  if (n_10(D) > i_8)
    goto ; [89.00%]
  else
    goto ; [11.00%]

With the first patch attached in this RFC 'no_cost.patch', I tell IVOPTs 
to not cost uses outside of the loop. This makes IVOPTs generate a 
single IV, but unfortunately it decides to create the variable for the 
use inside the loop and it also seems to use the pre-increment value of 
the shared-IV and add the [16,16] to it. See:


   [local count: 955630225]:
  # ivtmp.7_25 = PHI 
  va_15 = MEM  [(unsigned char *)a_11(D) + ivtmp.7_25 * 1];
  vb_16 = MEM  [(unsigned char *)b_12(D) + ivtmp.7_25 * 1];
  _2 = svadd_u8_z ({ -1, ... }, va_15, vb_16);
  MEM <__SVUint8_t> [(unsigned char *)c_13(D) + ivtmp.7_25 * 1] = _2;
  _8 = (unsigned long) a_11(D);
  _7 = _8 + ivtmp.7_25;
  _6 = _7 + POLY_INT_CST [16, 16];
  a_18 = (char * restrict) _6;
  ivtmp.7_24 = ivtmp.7_25 + POLY_INT_CST [16, 16];
  i_5 = (unsigned int) ivtmp.7_24;
  if (n_10(D) > i_5)
    goto ; [89.00%]
  else
    goto ; [11.00%]

With the patch 'var_after.patch' I make get_computation_aff_1 use 
'cand->var_after' for outside uses thus using the post-increment var of 
the candidate IV. This means I have to insert it in a different place 
and make sure to delete the old use->stmt. I'm sure there is a better 
way to do this using IVOPTs current framework, but I didn't find one 
yet. See the result:


  [local count: 955630225]:
  # ivtmp.7_25 = PHI 
  va_15 = MEM  [(unsigned char *)a_11(D) + ivtmp.7_25 * 1];
  vb_16 = MEM  [(unsigned char *)b_12(D) + ivtmp.7_25 * 1];
  _2 = svadd_u8_z ({ -1, ... }, va_15, vb_16);
  MEM <__SVUint8_t> [(unsigned char *)c_13(D) + ivtmp.7_25 * 1] = _2;
  ivtmp.7_24 = ivtmp.7_25 + POLY_INT_CST [16, 16];
  _8 = (unsigned long) a_11(D);
  _7 = _8 + ivtmp.7_24;
  a_18 = (char * restrict) _7;
  i_6 = (unsigned int) ivtmp.7_24;
  if (n_10(D) > i_6)
    goto ; [89.00%]
  else
    goto ; [11.00%]


This is still not optimal as we are still doing the update inside the 
loop and there is absolutely no need for that. I found that running sink 
would solve it and it seems someone has added a second sink pass, so 
that saves me a third patch :) see after sink2:


   [local count: 955630225]:
  # ivtmp.7_25 = PHI 
  va_15 = MEM  [(unsigned char *)a_11(D) + ivtmp.7_25 * 1];
  vb_16 = MEM  [(unsigned char *)b_12(D) + ivtmp.7_25 * 1];
  _2 = svadd_u8_z ({ -1, ... }, va_15, vb_16);
  MEM <__SVUint8_t> [(unsigned char *)c_13(D) + ivtmp.7_25 * 1] = _2;
  ivtmp.7_24 = ivtmp.7_25 + POLY_INT_CST [16, 16];
  i_6 = (unsigned int) ivtmp.7_24;
  if (i_6 < n_10(D))
    goto ; [89.00%]
  else
    goto ; [11.00%]

   [local count: 105119324]:
  _8 = (unsigned long) a_11(D);
  _7 = _8 + ivtmp.7_24;
  a_18 = (char * restrict) _7;
  goto ; [

[RFC][ivopts] Generate better code for IVs with uses outside the loop (was Re: [RFC] Implementing detection of saturation and rounding arithmetic)

2021-06-03 Thread Andre Vieira (lists) via Gcc-patches

Streams got crossed there and used the wrong subject ...

On 03/06/2021 17:34, Andre Vieira (lists) via Gcc-patches wrote:

Hi,

This RFC is motivated by the IV sharing RFC in 
https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569502.html and the 
need to have the IVOPTS pass be able to clean up IV's shared between 
multiple loops. When creating a similar problem with C code I noticed 
IVOPTs treated IV's with uses outside the loop differently, this 
didn't even required multiple loops, take for instance the following 
example using SVE intrinsics:


#include 
#include 
extern void use (char *);
void bar (char  * __restrict__ a, char * __restrict__ b, char * 
__restrict__ c, unsigned n)

{
    svbool_t all_true = svptrue_b8 ();
  unsigned i = 0;
  if (n < (UINT_MAX - svcntb() - 1))
    {
    for (; i < n; i += svcntb())
    {
    svuint8_t va = svld1 (all_true, (uint8_t*)a);
    svuint8_t vb = svld1 (all_true, (uint8_t*)b);
    svst1 (all_true, (uint8_t *)c, svadd_z (all_true, 
va,vb));

    a += svcntb();
    b += svcntb();
    c += svcntb();
    }
    }
  use (a);
}

IVOPTs tends to generate a shared IV for SVE memory accesses, as we 
don't have a post-increment for SVE load/stores. If we had not 
included 'use (a);' in this example, IVOPTs would have replaced the 
IV's for a, b and c with a single one, (also used for the 
loop-control). See:


   [local count: 955630225]:
  # ivtmp.7_8 = PHI 
  va_14 = MEM  [(unsigned char *)a_10(D) + ivtmp.7_8 * 1];
  vb_15 = MEM  [(unsigned char *)b_11(D) + ivtmp.7_8 * 1];
  _2 = svadd_u8_z ({ -1, ... }, va_14, vb_15);
  MEM <__SVUint8_t> [(unsigned char *)c_12(D) + ivtmp.7_8 * 1] = _2;
  ivtmp.7_25 = ivtmp.7_8 + POLY_INT_CST [16, 16];
  i_23 = (unsigned int) ivtmp.7_25;
  if (n_9(D) > i_23)
    goto ; [89.00%]
  else
    goto ; [11.00%]

 However, due to the 'use (a);' it will create two IVs one for 
loop-control, b and c and one for a. See:


  [local count: 955630225]:
  # a_28 = PHI 
  # ivtmp.7_25 = PHI 
  va_15 = MEM  [(unsigned char *)a_28];
  vb_16 = MEM  [(unsigned char *)b_12(D) + ivtmp.7_25 * 1];
  _2 = svadd_u8_z ({ -1, ... }, va_15, vb_16);
  MEM <__SVUint8_t> [(unsigned char *)c_13(D) + ivtmp.7_25 * 1] = _2;
  a_18 = a_28 + POLY_INT_CST [16, 16];
  ivtmp.7_24 = ivtmp.7_25 + POLY_INT_CST [16, 16];
  i_8 = (unsigned int) ivtmp.7_24;
  if (n_10(D) > i_8)
    goto ; [89.00%]
  else
    goto ; [11.00%]

With the first patch attached in this RFC 'no_cost.patch', I tell 
IVOPTs to not cost uses outside of the loop. This makes IVOPTs 
generate a single IV, but unfortunately it decides to create the 
variable for the use inside the loop and it also seems to use the 
pre-increment value of the shared-IV and add the [16,16] to it. See:


   [local count: 955630225]:
  # ivtmp.7_25 = PHI 
  va_15 = MEM  [(unsigned char *)a_11(D) + ivtmp.7_25 * 1];
  vb_16 = MEM  [(unsigned char *)b_12(D) + ivtmp.7_25 * 1];
  _2 = svadd_u8_z ({ -1, ... }, va_15, vb_16);
  MEM <__SVUint8_t> [(unsigned char *)c_13(D) + ivtmp.7_25 * 1] = _2;
  _8 = (unsigned long) a_11(D);
  _7 = _8 + ivtmp.7_25;
  _6 = _7 + POLY_INT_CST [16, 16];
  a_18 = (char * restrict) _6;
  ivtmp.7_24 = ivtmp.7_25 + POLY_INT_CST [16, 16];
  i_5 = (unsigned int) ivtmp.7_24;
  if (n_10(D) > i_5)
    goto ; [89.00%]
  else
    goto ; [11.00%]

With the patch 'var_after.patch' I make get_computation_aff_1 use 
'cand->var_after' for outside uses thus using the post-increment var 
of the candidate IV. This means I have to insert it in a different 
place and make sure to delete the old use->stmt. I'm sure there is a 
better way to do this using IVOPTs current framework, but I didn't 
find one yet. See the result:


  [local count: 955630225]:
  # ivtmp.7_25 = PHI 
  va_15 = MEM  [(unsigned char *)a_11(D) + ivtmp.7_25 * 1];
  vb_16 = MEM  [(unsigned char *)b_12(D) + ivtmp.7_25 * 1];
  _2 = svadd_u8_z ({ -1, ... }, va_15, vb_16);
  MEM <__SVUint8_t> [(unsigned char *)c_13(D) + ivtmp.7_25 * 1] = _2;
  ivtmp.7_24 = ivtmp.7_25 + POLY_INT_CST [16, 16];
  _8 = (unsigned long) a_11(D);
  _7 = _8 + ivtmp.7_24;
  a_18 = (char * restrict) _7;
  i_6 = (unsigned int) ivtmp.7_24;
  if (n_10(D) > i_6)
    goto ; [89.00%]
  else
    goto ; [11.00%]


This is still not optimal as we are still doing the update inside the 
loop and there is absolutely no need for that. I found that running 
sink would solve it and it seems someone has added a second sink pass, 
so that saves me a third patch :) see after sink2:


   [local count: 955630225]:
  # ivtmp.7_25 = PHI 
  va_15 = MEM  [(unsigned char *)a_11(D) + ivtmp.7_25 * 1];
  vb_16 = MEM  [(unsigned char *)b_12(D) + ivtmp.7_25 * 1];
  _2 = svadd_u8_z ({ -1, ... }, va_15, vb_16);
  MEM <__SVUint8_t> [(unsigned char *)c_13(D) + ivtmp.7_25 * 1] = _2;
  ivtmp.7_24 = ivtmp.7_25 + POLY_INT_CST [16, 16];
  i_6 = (unsigned int) ivtmp.7_24;
  if (i_6 < n_10(D))
    goto ; [89.00%]
  else
    goto ; [11.00%]

[PATCH] libgcc: Fix _Unwind_Backtrace() for SEH

2021-06-03 Thread Seija K. via Gcc-patches
Forgot to assign to gcc_context.cfa and gcc_context.ra. Note this fix can
be backported to earlier editions of gcc as well

diff --git a/libgcc/unwind-seh.c b/libgcc/unwind-seh.c
index 8c6aade9a3b39..d40d16702a9e1 100644
--- a/libgcc/unwind-seh.c
+++ b/libgcc/unwind-seh.c
@@ -466,6 +466,9 @@ _Unwind_Backtrace(_Unwind_Trace_Fn trace,
&gcc_context.disp->HandlerData,
&gcc_context.disp->EstablisherFrame, NULL);

+  gcc_context.cfa = ms_context.Rsp;
+  gcc_context.ra = ms_context.Rip;
+
   /* Call trace function.  */
   if (trace (&gcc_context, trace_argument) != _URC_NO_REASON)
return _URC_FATAL_PHASE1_ERROR;


Re: [RFC/PATCH 00/11] Fix up some unexpected empty split conditions

2021-06-03 Thread Jeff Law via Gcc-patches




On 6/3/2021 2:00 AM, Segher Boessenkool wrote:

Hi!

On Thu, Jun 03, 2021 at 01:22:38PM +0800, Kewen.Lin wrote:

on 2021/6/3 上午7:52, Segher Boessenkool wrote:

- add a new "define_independent_insn_and_split" that has the
   current semantics of define_insn_and_split.  This should be
   mechanical.

I'd rather not have that -- we can just write separate define_insn and
define_split in that case.

Not sure if someone would argue that he/she would like to go with one shared
pattern as before, to avoid any possible differences between two seperated
patterns and have good maintainability (like only editing on place) and
slightly better efficiency.

You only would do this if you have a different insn condition and split
condition, which is a very important thing to know, it doesn't hurt to
draw attention to it.  The efficiency is exactly the same btw,
define_insn_and_split is just syntactic sugar.

The whole point of requiring the split condition to start with && is so
it will become harder to mess things up (it will make the gen* code a
tiny little bit simpler as well).  And there is no transition period or
anything like that needed either.  Just the bunch that will break will
need fixing.  So let's find out how many of those there are :-)
Exactly.   While these empty conditions or those not starting with "&&" 
are technically valid, they're all suspicious from a port correctness 
standpoint, particularly if the main condition is non-empty.


Having made that mistake when converting the H8 away from CC0, I can say 
fairly confidently that if we had this in place a year ago that those 
mistakes would likely have been avoided.  Thankfully the H8 isn't a 
heavily used port and has limped along until I stumbled over the issue a 
week or so ago while polishing some improvements to the port.

Jeff



Re: GCC documentation: porting to Sphinx

2021-06-03 Thread Joseph Myers
On Thu, 3 Jun 2021, Martin Liška wrote:

> On 6/2/21 6:44 PM, Joseph Myers wrote:
> > On Wed, 2 Jun 2021, Joel Sherrill wrote:
> > 
> > > For RTEMS, we switched from texinfo to Sphinx and the dependency
> > > on Python3 for Sphinx has caused a bit of hassle. Is this going to be
> > > an issue for GCC?
> > 
> > What Sphinx (and, thus, Python) versions does the GCC manual build work
> > with?
> 
> I've just tried version 1.7.6 which we use for libgccjit and it's fine:
> https://gcc.gnu.org/onlinedocs/jit/
> 
> About Python version: I'm not planning supporting Python2, it's dead 10 years
> already.

There should be appropriate configure checks to avoid building manuals 
with too-old versions (i.e. disable the info/man manual build/install when 
Sphinx, or the Python version it's using, is too old or missing, not fail 
configure).

Actually this code is depending on Python 3.6 or later because of the use 
of an f-string in baseconf.py (without that f-string, it works with older 
versions, even 2.7).  Formally 3.5 and older are no longer supported 
upstream, but certainly still present in some maintained long-term-support 
distribution versions.

> I would recommend testing the build. You can simply clone:
> https://github.com/marxin/texi2rst-generated
> 
> and simply run 'make html' or 'make latexpdf'. Basic dependencies are
> mentioned here:
> https://github.com/marxin/texi2rst-generated#requirements

It appears "make html" works (with lots of WARNINGs) with Sphinx 1.6.1 but 
fails with 1.4 ("Theme error: unsupported theme option 
'prev_next_buttons_location' given").

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 2/2, rs6000] Remove mode promotion for pseudos

2021-06-03 Thread Segher Boessenkool
Hi!

On Thu, May 20, 2021 at 05:49:49PM +0800, HAO CHEN GUI wrote:
>   rs6000 has instructions that can do almost everything 32 bit
>   at least as efficiently as corresponding 64 bit things. The
>   mode promotion can be defered to when a wide mode is necessary.
>   So it helps a lot not promote mode for pseudos. SPECint test
>   shows that the overall performance improvement (by geomean) is
>   more than 2% with this patch.
>   testsuite/gcc.target/powerpc/not-promote-mode.c illustrates how
>   the patch eliminates the redundant extensions and do further
>   optimization by disabling mode promotion for pseduos.

I'd still like to see if (and why) this works better than explicitly
promoting QImode and HImode here.  But that can be done later.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/not-promote-mode.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */

Just

/* { dg-do compile { target lp64 } } */

because the rest is already implied by this being in gcc.target/powerpc .

The patch is okay for trunk.  Thank you very much for finding this huge
performance gain!


Segher


Re: [PATCH] MAINTAINERS: create DCO section; add myself to it

2021-06-03 Thread Jason Merrill via Gcc-patches

On 6/2/21 12:33 PM, Koning, Paul wrote:




On Jun 2, 2021, at 11:03 AM, Jason Merrill via Gcc-patches 
 wrote:

On 6/1/21 3:22 PM, Richard Biener via Gcc wrote:

On June 1, 2021 7:30:54 PM GMT+02:00, David Malcolm via Gcc  
wrote:

...


The MAINTAINERS file doesn't seem to have such a "DCO list"
yet; does the following patch look like what you had in mind?

ChangeLog

* MAINTAINERS: Create DCO section; add myself to it.
---
MAINTAINERS | 12 
1 file changed, 12 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index db25583b37b..1148e0915cf 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -685,3 +685,15 @@ Josef Zlomek   

James Dennett   
Christian Ehrhardt  
Dara Hazeghi
+
+
+DCO
+===
+
+Developers with commit access may add their name to the following list
+to certify the DCO (https://developercertificate.org/) for all

There should be a verbatim copy of the DCO in this file or the repository.


It's on the website now, at gcc.gnu.org/dco.html , and I've added the section 
to MAINTAINERS.  It's not clear to me that it needs to be in the source tree as 
well, since it's project contribution policy rather than license.


I'm wondering about change control of this document.  The GPL has a version 
number and references to use the version number.  The DCO seems to have a 
version number, but the DCO section in the MAINTAINERS file does not give it.  
I would think that a certification should call out which DCO it uses, whether 
in a one-off (in a patch) or in the MAINTAINERS DCO list.


I've added the version to the MAINTAINERS list, thanks.  It is not 
customary to mention the version in a Signed-off-by tag.


Jason



Re: [PATCH 1/2] rs6000: Add support for _mm_minpos_epu16

2021-06-03 Thread Paul A. Clarke via Gcc-patches
On Wed, Jun 02, 2021 at 07:27:35PM -0500, Segher Boessenkool wrote:
> On Wed, Jun 02, 2021 at 05:13:15PM -0500, Paul A. Clarke wrote:
> > Add a naive implementation of the subject x86 intrinsic to
> > ease porting.
> 
> > +/* Return horizontal packed word minimum and its index in bits [15:0]
> > +   and bits [18:16] respectively.  */
> > +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
> > __artificial__))
> > +_mm_minpos_epu16 (__m128i __A)
> > +{
> > +  union __u
> > +{
> > +  __m128i __m;
> > +  __v8hu __uh;
> > +};
> > +  union __u __u = { .__m = __A }, __r = { .__m = {0} };
> > +  unsigned short __ridx = 0;
> > +  unsigned short __rmin = __u.__uh[__ridx];
> > +  for (unsigned long __i = __ridx+1;
> 
> (spaces around the "+"?)

ok

> 
> > +   __i < sizeof (__u.__uh) / sizeof (__u.__uh[0]);
> 
> You should either use a macro for that, or just write "8" :-)

ok. (There should be a standard thing for this operation.)

> > +   __i++)
> > +{
> > +  if (__u.__uh[__i] < __rmin)
> > +{
> > +  __rmin = __u.__uh[__i];
> > +  __ridx = __i;
> > +}
> > +}
> > +  __r.__uh[0] = __rmin;
> > +  __r.__uh[1] = __ridx;
> > +  return __r.__m;
> > +}
> 
> This does not compute the index correctly for big endian (it needs to
> walk from right to left for that).  The construction of the return value
> looks wrong as well.
> 
> Okay for trunk with that fixed.  Thanks!

I'm not seeing the issue here. The values are numbered by element order,
and the results are in the "first" (minimum value) and "second" (index of
first encountered minimum value in element order) elements of the result.

PC


Re: [PATCH 2/2] rs6000: Add test for _mm_minpos_epu16

2021-06-03 Thread Paul A. Clarke via Gcc-patches
On Wed, Jun 02, 2021 at 08:50:56PM -0500, Segher Boessenkool wrote:
> On Wed, Jun 02, 2021 at 05:13:16PM -0500, Paul A. Clarke wrote:
> > +  for (i = 0; i < NUM; i++)
> > +src.s[i] = i * i - 68 * i + 1200;
> 
> Could you do tests with some identical elements as well?  Because that
> is where I think it fails on BE currently.

Let me re-do the test case a bit more to provide a better set of
input data, rather than this computational attempt which misses a
bunch of interesting cases.

I'll send a v2 in a bit.

PC


[PATCH] c++: top-level cv-quals on type of NTTP [PR100893]

2021-06-03 Thread Patrick Palka via Gcc-patches
Here, we're rejecting the specialization of g with T=A, F=&f in the
first testcase due to a spurious constness mismatch between the type of
the template argument &f and the substituted type of F (the substituted
type has a top-level const).  Note that this mismatch doesn't occur with
object pointers because in that case a call to
perform_qualification_conversions from convert_nontype_argument
implicitly adds a top-level const to the argument (via a cast) to match.

This however seems to be a manifestation of a more general conformance
issue -- that we're not dropping top-level cv-quals after substituting
into the type of an NTTP as per [temp.param]/6 (we only do so at parse
time in process_template_parm).  This patch makes convert_template_argument
drop top-level cv-quals accordingly.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/100893

gcc/cp/ChangeLog:

* pt.c (convert_template_argument): Strip top-level cv-quals
on the substituted type of a non-type template parameter.

gcc/testsuite/ChangeLog:

* g++.dg/template/param4.C: New test.
* g++.dg/template/param5.C: New test.
---
 gcc/cp/pt.c|  7 ++-
 gcc/testsuite/g++.dg/template/param4.C | 10 ++
 gcc/testsuite/g++.dg/template/param5.C |  7 +++
 3 files changed, 23 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/template/param4.C
 create mode 100644 gcc/testsuite/g++.dg/template/param5.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 7211bdc5bbc..66cc88a331f 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -8494,7 +8494,12 @@ convert_template_argument (tree parm,
return error_mark_node;
}
   else
-   t = tsubst (t, args, complain, in_decl);
+   {
+ t = tsubst (t, args, complain, in_decl);
+ /* Ignore top-level qualifiers on the substituted type of this
+non-type template parameter, as per [temp.param]/6.  */
+ t = cv_unqualified (t);
+   }
 
   if (invalid_nontype_parm_type_p (t, complain))
return error_mark_node;
diff --git a/gcc/testsuite/g++.dg/template/param4.C 
b/gcc/testsuite/g++.dg/template/param4.C
new file mode 100644
index 000..d55fecec2d9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/param4.C
@@ -0,0 +1,10 @@
+// PR c++/100893
+
+template void g() { }
+
+struct A { typedef void (* const type)(); };
+void f();
+template void g();
+
+struct B { typedef void (B::* const type)(); void f(); };
+template void g();
diff --git a/gcc/testsuite/g++.dg/template/param5.C 
b/gcc/testsuite/g++.dg/template/param5.C
new file mode 100644
index 000..b8c92f4c217
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/param5.C
@@ -0,0 +1,7 @@
+// Verify top-level cv-qualifiers are dropped when determining the substituted
+// type of a non-type-template parameter, as per [temp.param]/6.
+// { dg-do compile { target c++11 } }
+
+template decltype(V)& f();
+using type = decltype(f());
+using type = int&;
-- 
2.32.0.rc2



[PATCH] i386: Add insert and extract patterns for 4-byte vectors [PR100637]

2021-06-03 Thread Uros Bizjak via Gcc-patches
The patch introduces insert and extract patterns for 4-byte vectors.
It effectively only emits PINSR and PEXTR instructions when available,
otherwise falls back to generic code that emulates these instructions
via inserts, extracts, logic operations and shifts in integer registers.

Please note that generic fallback produces better code than the current
approach of constructing new vector in memory (due to store forwarding stall)
so also enable QImode 8-byte vector inserts only with TARGET_SSE4_1.

2021-06-03  Uroš Bizjak  

gcc/
PR target/100637
* config/i386/i386-expand.c (ix86_expand_vector_set):
Handle V2HI and V4QI modes.
(ix86_expand_vector_extract): Ditto.
* config/i386/mmx.md (*pinsrw): New insn pattern.
(*pinsrb): Ditto.
(*pextrw): Ditto.
(*pextrw_zext): Ditto.
(*pextrb): Ditto.
(*pextrb_zext): Ditto.
(vec_setv2hi): New expander.
(vec_extractv2hihi): Ditto.
(vec_setv4qi): Ditto.
(vec_extractv4qiqi): Ditto.

(vec_setv8qi): Enable only for TARGET_SSE4_1.
(vec_extractv8qiqi): Ditto.

gcc/testsuite/

PR target/100637
* gcc.target/i386/vperm-v2hi.c: New test.
* gcc.target/i386/vperm-v4qi.c: Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 4185f58eed5..eb7cdb0c14f 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -14968,6 +14968,7 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, rtx 
val, int elt)
   return;
 
 case E_V8HImode:
+case E_V2HImode:
   use_vec_merge = TARGET_SSE2;
   break;
 case E_V4HImode:
@@ -14975,6 +14976,7 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, rtx 
val, int elt)
   break;
 
 case E_V16QImode:
+case E_V4QImode:
   use_vec_merge = TARGET_SSE4_1;
   break;
 
@@ -15274,6 +15276,7 @@ ix86_expand_vector_extract (bool mmx_ok, rtx target, 
rtx vec, int elt)
   break;
 
 case E_V8HImode:
+case E_V2HImode:
   use_vec_extr = TARGET_SSE2;
   break;
 case E_V4HImode:
@@ -15294,6 +15297,9 @@ ix86_expand_vector_extract (bool mmx_ok, rtx target, 
rtx vec, int elt)
  return;
}
   break;
+case E_V4QImode:
+  use_vec_extr = TARGET_SSE4_1;
+  break;
 
 case E_V8SFmode:
   if (TARGET_AVX)
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index f39e062ddfc..914e5e91e90 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -3092,7 +3092,7 @@ (define_expand "vec_setv8qi"
   [(match_operand:V8QI 0 "register_operand")
(match_operand:QI 1 "register_operand")
(match_operand 2 "const_int_operand")]
-  "TARGET_MMX || TARGET_MMX_WITH_SSE"
+  "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE"
 {
   ix86_expand_vector_set (TARGET_MMX_WITH_SSE, operands[0], operands[1],
  INTVAL (operands[2]));
@@ -3103,7 +3103,7 @@ (define_expand "vec_extractv8qiqi"
   [(match_operand:QI 0 "register_operand")
(match_operand:V8QI 1 "register_operand")
(match_operand 2 "const_int_operand")]
-  "TARGET_MMX || TARGET_MMX_WITH_SSE"
+  "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE"
 {
   ix86_expand_vector_extract (TARGET_MMX_WITH_SSE, operands[0],
  operands[1], INTVAL (operands[2]));
@@ -3120,6 +3120,178 @@ (define_expand "vec_initv8qiqi"
   DONE;
 })
 
+(define_insn "*pinsrw"
+  [(set (match_operand:V2HI 0 "register_operand" "=x,YW")
+(vec_merge:V2HI
+  (vec_duplicate:V2HI
+(match_operand:HI 2 "nonimmediate_operand" "rm,rm"))
+ (match_operand:V2HI 1 "register_operand" "0,YW")
+  (match_operand:SI 3 "const_int_operand")))]
+  "TARGET_SSE2
+   && ((unsigned) exact_log2 (INTVAL (operands[3]))
+   < GET_MODE_NUNITS (V2HImode))"
+{
+  operands[3] = GEN_INT (exact_log2 (INTVAL (operands[3])));
+  switch (which_alternative)
+{
+case 1:
+  if (MEM_P (operands[2]))
+   return "vpinsrw\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+  else
+   return "vpinsrw\t{%3, %k2, %1, %0|%0, %1, %k2, %3}";
+case 0:
+  if (MEM_P (operands[2]))
+   return "pinsrw\t{%3, %2, %0|%0, %2, %3}";
+  else
+   return "pinsrw\t{%3, %k2, %0|%0, %k2, %3}";
+default:
+  gcc_unreachable ();
+}
+}
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sselog")
+   (set_attr "length_immediate" "1")
+   (set_attr "mode" "TI")])
+
+(define_insn "*pinsrb"
+  [(set (match_operand:V4QI 0 "register_operand" "=x,YW")
+(vec_merge:V4QI
+  (vec_duplicate:V4QI
+(match_operand:QI 2 "nonimmediate_operand" "rm,rm"))
+ (match_operand:V4QI 1 "register_operand" "0,YW")
+  (match_operand:SI 3 "const_int_operand")))]
+  "TARGET_SSE4_1
+   && ((unsigned) exact_log2 (INTVAL (operands[3]))
+   < GET_MODE_NUNITS (V4QImode))"
+{
+  operands[3] = GEN_INT (exact_log2 (INTVAL (operands[3])));
+  switch (which_alternative)
+{
+ca

Re: [PATCH] c++: top-level cv-quals on type of NTTP [PR100893]

2021-06-03 Thread Patrick Palka via Gcc-patches
On Thu, 3 Jun 2021, Patrick Palka wrote:

> Here, we're rejecting the specialization of g with T=A, F=&f in the
> first testcase due to a spurious constness mismatch between the type of
> the template argument &f and the substituted type of F (the substituted
> type has a top-level const).  Note that this mismatch doesn't occur with
> object pointers because in that case a call to
> perform_qualification_conversions from convert_nontype_argument
> implicitly adds a top-level const to the argument (via a cast) to match.
> 
> This however seems to be a manifestation of a more general conformance
> issue -- that we're not dropping top-level cv-quals after substituting
> into the type of an NTTP as per [temp.param]/6 (we only do so at parse
> time in process_template_parm).  This patch makes convert_template_argument
> drop top-level cv-quals accordingly.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> trunk?
> 
>   PR c++/100893
> 
> gcc/cp/ChangeLog:
> 
>   * pt.c (convert_template_argument): Strip top-level cv-quals
>   on the substituted type of a non-type template parameter.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/template/param4.C: New test.
>   * g++.dg/template/param5.C: New test.
> ---
>  gcc/cp/pt.c|  7 ++-
>  gcc/testsuite/g++.dg/template/param4.C | 10 ++
>  gcc/testsuite/g++.dg/template/param5.C |  7 +++
>  3 files changed, 23 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/template/param4.C
>  create mode 100644 gcc/testsuite/g++.dg/template/param5.C
> 
> diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
> index 7211bdc5bbc..66cc88a331f 100644
> --- a/gcc/cp/pt.c
> +++ b/gcc/cp/pt.c
> @@ -8494,7 +8494,12 @@ convert_template_argument (tree parm,
>   return error_mark_node;
>   }
>else
> - t = tsubst (t, args, complain, in_decl);
> + {
> +   t = tsubst (t, args, complain, in_decl);
> +   /* Ignore top-level qualifiers on the substituted type of this
> +  non-type template parameter, as per [temp.param]/6.  */
> +   t = cv_unqualified (t);
> + }

Err, shortly after posting I realized we could get top-level cv-quals not
only after substitution, but also after deduction of a decltype(auto)
NTTP (as in nontype-auto19.C below), so the call to cv_unqualified would
need to happen after do_auto_deduction too, like so.  Testing in
progress.

-- >8 --

Subject: [PATCH] c++: top-level cv-quals on type of NTTP [PR100893]

Here, we're rejecting the specialization of g with T=A, F=&f in
param4.C below due to a spurious constness mismatch between the type of
the template argument &f and the substituted type of F (the substituted
type has a top-level const).  Note that this mismatch doesn't occur with
object pointers because in that case a call to
perform_qualification_conversions from convert_nontype_argument
implicitly adds a top-level const to the argument (via a cast) to match.

This however seems to be a manifestation of a more general conformance
issue -- that we're not dropping top-level cv-quals after substituting
into the type of an NTTP as per [temp.param]/6 (we only do so at parse
time in process_template_parm).  So this patch makes
convert_template_argument drop top-level cv-quals accordingly.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/100893

gcc/cp/ChangeLog:

* pt.c (convert_template_argument): Strip top-level cv-quals
on the substituted type of a non-type template parameter.

gcc/testsuite/ChangeLog:

* g++.dg/template/param4.C: New test.
* g++.dg/template/param5.C: New test.
* g++.dg/cpp1z/nontype-auto19.C: New test.
* g++.dg/cpp2a/concepts-decltype.C: Don't expect that the
deduced type of a decltype(auto) NTTP has top-level cv-quals.
---
 gcc/cp/pt.c|  4 
 gcc/testsuite/g++.dg/cpp1z/nontype-auto19.C|  8 
 gcc/testsuite/g++.dg/cpp2a/concepts-decltype.C |  2 +-
 gcc/testsuite/g++.dg/template/param4.C | 10 ++
 gcc/testsuite/g++.dg/template/param5.C |  7 +++
 5 files changed, 30 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/nontype-auto19.C
 create mode 100644 gcc/testsuite/g++.dg/template/param4.C
 create mode 100644 gcc/testsuite/g++.dg/template/param5.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 7211bdc5bbc..3cac073ed50 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -8496,6 +8496,10 @@ convert_template_argument (tree parm,
   else
t = tsubst (t, args, complain, in_decl);
 
+  /* Drop top-level cv-qualifiers on the substituted/deduced type of
+this non-type template parameter, as per [temp.param]/6.  */
+  t = cv_unqualified (t);
+
   if (invalid_nontype_parm_type_p (t, complain))
return error_mark_node;
 
diff --git a/gcc/testsuite/g++.dg/cpp1z/nontype-auto19.C 
b/gcc/testsuite

[PATCH] libgcc libiberty: optimize and modernize standard string and memory functions

2021-06-03 Thread Seija K. via Gcc-patches
This patch optimizes and simplifies many of the standard string functions.

Since C99, some of the standard string functions have been changed to use
the restrict modifier.

diff --git a/libgcc/memcmp.c b/libgcc/memcmp.c
index 2348afe1d27f7..74195cf6baf13 100644
--- a/libgcc/memcmp.c
+++ b/libgcc/memcmp.c
@@ -7,10 +7,11 @@ memcmp (const void *str1, const void *str2, size_t count)
   const unsigned char *s1 = str1;
   const unsigned char *s2 = str2;

-  while (count-- > 0)
+  while (count--)
 {
-  if (*s1++ != *s2++)
- return s1[-1] < s2[-1] ? -1 : 1;
+  if (*s1 != *s2)
+ return *s1 < *s2 ? -1 : 1;
+  s1++, s2++;
 }
   return 0;
 }
diff --git a/libgcc/memcpy.c b/libgcc/memcpy.c
index 58b1e405627aa..616df78fd2969 100644
--- a/libgcc/memcpy.c
+++ b/libgcc/memcpy.c
@@ -2,7 +2,7 @@
 #include 

 void *
-memcpy (void *dest, const void *src, size_t len)
+memcpy (void * restrict dest, const void * restrict src, size_t len)
 {
   char *d = dest;
   const char *s = src;
diff --git a/libgcc/memset.c b/libgcc/memset.c
index 3e7025ee39443..b3b27cd63e12d 100644
--- a/libgcc/memset.c
+++ b/libgcc/memset.c
@@ -5,7 +5,7 @@ void *
 memset (void *dest, int val, size_t len)
 {
   unsigned char *ptr = dest;
-  while (len-- > 0)
-*ptr++ = val;
+  while (len--)
+*ptr++ = (unsigned char)val;
   return dest;
 }
diff --git a/libiberty/memchr.c b/libiberty/memchr.c
index 7448ab9e71c32..6f03e9c281108 100644
--- a/libiberty/memchr.c
+++ b/libiberty/memchr.c
@@ -23,7 +23,7 @@ memchr (register const PTR src_void, int c, size_t length)
 {
   const unsigned char *src = (const unsigned char *)src_void;

-  while (length-- > 0)
+  while (length--)
   {
 if (*src == c)
  return (PTR)src;
diff --git a/libiberty/memcmp.c b/libiberty/memcmp.c
index 37db60f38267a..f41b35a758cc4 100644
--- a/libiberty/memcmp.c
+++ b/libiberty/memcmp.c
@@ -27,8 +27,9 @@ memcmp (const PTR str1, const PTR str2, size_t count)

   while (count-- > 0)
 {
-  if (*s1++ != *s2++)
- return s1[-1] < s2[-1] ? -1 : 1;
+  if (*s1 != *s2)
+ return *s1 < *s2 ? -1 : 1;
+  s1++, s2++;
 }
   return 0;
 }
diff --git a/libiberty/memcpy.c b/libiberty/memcpy.c
index 7f67d0bd1f26c..d388ae7f3506b 100644
--- a/libiberty/memcpy.c
+++ b/libiberty/memcpy.c
@@ -19,7 +19,7 @@ Copies @var{length} bytes from memory region
@var{in} to region
 void bcopy (const void*, void*, size_t);

 PTR
-memcpy (PTR out, const PTR in, size_t length)
+memcpy (PTR restrict out, const PTR restrict in, size_t length)
 {
 bcopy(in, out, length);
 return out;
diff --git a/libiberty/mempcpy.c b/libiberty/mempcpy.c
index f4c624d4a3227..ac56eeaee0d5e 100644
--- a/libiberty/mempcpy.c
+++ b/libiberty/mempcpy.c
@@ -33,10 +33,10 @@ Copies @var{length} bytes from memory region
@var{in} to region
 #include 
 #include 

-extern PTR memcpy (PTR, const PTR, size_t);
+extern PTR memcpy (PTR restrict, const PTR restrict, size_t);

 PTR
-mempcpy (PTR dst, const PTR src, size_t len)
+mempcpy (PTR restrict dst, const PTR restrict src, size_t len)
 {
   return (char *) memcpy (dst, src, len) + len;
 }


Re: [PATCH] rtl: constm64_rtx..const64_rtx

2021-06-03 Thread Jeff Law via Gcc-patches




On 6/2/2021 4:43 PM, Segher Boessenkool wrote:

Hi!

On Wed, Jun 02, 2021 at 06:07:28PM +0100, Richard Sandiford wrote:

Segher Boessenkool  writes:

Since times immemorial there has been const_int_rtx for all values from
-64 to 64, but only constm1_rtx..const2_rtx have been available for
convenient use.  Change this, so that we can use all values in
{-64,...,64} in RTL easily.  This matters, because then we we just say
   if (XEXP (x, 1) == const16_rtx)
and things like that, since all const_int in that range are unique.  We
already do for -1, 0, 1, 2, but we could for everything.

No strong objection, but personally I'd rather not add something
that is very specific to VOIDmode CONST_INTs.  I realise it's very
unlikely that we'll ever be able to give CONST_INTs their proper mode
(no-one has the kind of time needed to do that), but I don't think we
should make the switch actively harder either.

How does this make that harder?

Having no mode for CONST_INTs makes some things significantly *easier*
btw.  Well you know that, that is what makes any conversion away from
this so much harder :-)

We have has const0_rtx etc. since forever, this patch just increases the
range (to those values that have had guaranteed unique RTXes since
decades as well).
Yea, but often what you really want is CONST0_RTX (mode) instead of 
const0_rtx.   It's easily goof'd and often the cause minor missed 
optimizations.


jeff


Re: [PATCH] Canonicalize (vec_duplicate (not A)) to (not (vec_duplicate A)).

2021-06-03 Thread Segher Boessenkool
On Thu, Jun 03, 2021 at 11:03:43AM +, Liu, Hongtao wrote:
> >A very typical example is how UMIN is optimised:
> >
> >   case UMIN:
> >  if (trueop1 == CONST0_RTX (mode) && ! side_effects_p (op0))
> > return op1;
> >  if (rtx_equal_p (trueop0, trueop1) && ! side_effects_p (op0))
> > return op0;
> >  tem = simplify_associative_operation (code, mode, op0, op1);
> >  if (tem)
> > return tem;
> >  break;
> >
> >(the stuff using "tem").
> >
> >Hongtao, can we do something similar here?  Does that work well?  Please try
> >it out :-)
> 
> In simplify_rtx, no simplication occurs, there is just the difference between
>  (vec_duplicate (not REG)) and (not (vec_duplicate (REG)). So here tem will 
> only be 0.

simplify-rtx is used by combine.  When you do and+not+splat for example
my suggestion should kick in.  Try it out, don't just dismiss it?

> Basically we don't know it's a simplication until combine successfully split 
> the
> 3->2 instructions (not + broadcast + and to andnot + broadcast), but it's 
> pretty awkward
> to do this in combine.

But you need to do this *before* it is split.  That is the whole point.

> Consider andnot is existed for many backends, I think a canonicalization is 
> needed here.

Please do note that that is not as easy as yoou may think: you need to
make sure nothing ever creates non-canonical code.

> Maybe we can add insn canonicalization for transforming (and (vect_duplicate 
> (not A)) B) to 
> (and (not (duplicate (not A)) B) instead of (vec_duplicate (not A)) to (not 
> (vec_duplicate A))?

I don't understand what this means?


Segher


Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-03 Thread Qing Zhao via Gcc-patches
Hi, Richard,

For the following, I need more clarification:



+/* Expand the IFN_DEFERRED_INIT function according to its second
argument.  */
+static void
+expand_DEFERRED_INIT (internal_fn, gcall *stmt)
+{
+  tree var = gimple_call_lhs (stmt);
+  tree init = NULL_TREE;
+  enum auto_init_type init_type
+= (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
+
+  switch (init_type)
+{
+default:
+  gcc_unreachable ();
+case AUTO_INIT_PATTERN:
+  init = build_pattern_cst_for_auto_init (TREE_TYPE (var));
+  expand_assignment (var, init, false);
+  break;
+case AUTO_INIT_ZERO:
+  init = build_zero_cst (TREE_TYPE (var));
+  expand_assignment (var, init, false);
+  break;
+}

I think actually building build_pattern_cst_for_auto_init can generate
massive garbage and for big auto vars code size is also a concern and
ideally on x86 you'd produce rep movq.  So I don't think going
via expand_assignment is good.  Instead you possibly want to lower
.DEFERRED_INIT to MEMs following expand_builtin_memset and
eventually enhance that to allow storing pieces larger than a byte.


I will lower .DEFFERED_INIT to MEMS following expand_builtin_memset for 
“AUTO_INIT_PATTERN”.
My question is:
Do I need to do the same for “AUTO_INIT_ZERO”?

Thanks.

Qing



Re: [PATCH] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]

2021-06-03 Thread Segher Boessenkool
Hi!

On Thu, Jun 03, 2021 at 08:46:46AM +0800, Xionghu Luo wrote:
> On 2021/6/3 06:20, Segher Boessenkool wrote:
> > On Wed, Jun 02, 2021 at 03:19:32AM -0500, Xionghu Luo wrote:
> >> On P8LE, extra rot64+rot64 load or store instructions are generated
> >> in float128 to vector __int128 conversion.
> >>
> >> This patch teaches pass swaps to also handle such pattens to remove
> >> extra swap instructions.
> > 
> > Did you check if this is already handled by simplify-rtx if the mode had
> > been TImode (not V1TImode)?  If not, why do you not handle it there?
> 
> I tried to do it in combine or peephole, the later pass split2
> or split3 will still split it to rotate + rotate again as we have split
> after reload, and this pattern is quite P8LE specific, so put it in pass
> swap.  The simplify-rtx could simplify 
> r124:KF#0=r123:KF#0<-<0x40<-<0x40 to r124:KF#0=r123:KF#0 for register
> operations already.

What mode are those subregs?  Abbreviated RTL printouts are very lossy.
Assuming those are TImode (please check), then yes, that is what I
asked, thanks.

> ;; The post-reload split requires that we re-permute the source
> ;; register in case it is still live.
> (define_split
>   [(set (match_operand:VSX_LE_128 0 "memory_operand")
> (match_operand:VSX_LE_128 1 "vsx_register_operand"))]
>   "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed && !TARGET_P9_VECTOR
>&& !altivec_indexed_or_indirect_operand (operands[0], mode)"
>   [(const_int 0)]
> {
>   rs6000_emit_le_vsx_permute (operands[1], operands[1], mode);
>   rs6000_emit_le_vsx_permute (operands[0], operands[1], mode);
>   rs6000_emit_le_vsx_permute (operands[1], operands[1], mode);
>   DONE;
> })

Yes, that needs improvement itself.

The tthing to realise is that TImode is optimised by generic code just
fine (as all scalar integer modes are), but V1TImode is not.  We have
that mode because we really needed to not put TImode in vector registers
so much on older cpus, but that balance may have changed by now.  Worth
experimenting with, we now can do pretty much all noormal operations in
vector registers!


Segher


Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-03 Thread Qing Zhao via Gcc-patches
Hi, Richard,


On May 26, 2021, at 6:18 AM, Richard Biener 
mailto:rguent...@suse.de>> wrote:

On Wed, 12 May 2021, Qing Zhao wrote:

Hi,

This is the 3rd version of the patch for the new security feature for GCC.

Please take look and let me know your comments and suggestions.


+/* Returns true when the given TYPE has padding inside it.
+   return false otherwise.  */
+bool
+type_has_padding (tree type)
+{
+  switch (TREE_CODE (type))
+{
+case RECORD_TYPE:
+  {

btw, there's __builtin_clear_padding and a whole machinery around
it in gimple-fold.c, I'm sure that parts could be re-used if they
are neccessary in the end.

To address the above suggestion:

My study shows: the call to __builtin_clear_padding is expanded during 
gimplification phase.
And there is no __bultin_clear_padding expanding during rtx expanding phase.
However, for -ftrivial-auto-var-init, padding initialization should be done 
both in gimplification phase and rtx expanding phase.
since the __builtin_clear_padding might not be good for rtx expanding, reusing 
__builtin_clear_padding might not work.

Let me know if you have any more comments on this.

Thanks.

Qing


Re: [PATCH] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]

2021-06-03 Thread Segher Boessenkool
On Thu, Jun 03, 2021 at 08:09:36AM -0500, Bill Schmidt wrote:
> Note also that swap optimization can handle more general cases than 
> simplify-rtx.

Do you have examples?  That should be fixed (unless it is something
Power-specific?)

> In my view it's best to have it covered in both places.

Oh certainly, and we need that p8swaps pass on at least p8 anyway (but
perhaps we can allow TImode in vector regs on later cpus).


Segher


PING [PATCH] PR fortran/99839 - [9/10/11/12 Regression] ICE in inline_matmul_assign, at fortran/frontend-passes.c:4234

2021-06-03 Thread Harald Anlauf via Gcc-patches
*PING*

> Gesendet: Donnerstag, 27. Mai 2021 um 22:20 Uhr
> Von: "Harald Anlauf" 
> An: "fortran" , "gcc-patches" 
> Betreff: [PATCH] PR fortran/99839 - [9/10/11/12 Regression] ICE in 
> inline_matmul_assign, at fortran/frontend-passes.c:4234
>
> Dear Fortranners,
>
> frontend optimization tries to inline matmul, but then it also needs
> to take care of the assignment to the result array.  If that one is
> not of canonical type, we currently get an ICE.  The straightforward
> solution is to simply punt in those cases and avoid inlining.
>
> Regtested on x86_64-pc-linux-gnu.
>
> OK for mainline?  Backport to affected branches?
>
> Thanks,
> Harald
>
>
> Fortran - ICE in inline_matmul_assign
>
> Restrict inlining of matmul to those cases where assignment to the
> result array does not need special treatment.
>
> gcc/fortran/ChangeLog:
>
>   PR fortran/99839
>   * frontend-passes.c (inline_matmul_assign): Do not inline matmul
>   if the assignment to the resulting array if it is not of canonical
>   type (real/integer/complex/logical).
>
> gcc/testsuite/ChangeLog:
>
>   PR fortran/99839
>   * gfortran.dg/inline_matmul_25.f90: New test.
>
>


Re: [PATCH] c++: Fix up attribute handling in methods in templates [PR100872]

2021-06-03 Thread Jason Merrill via Gcc-patches

On 6/3/21 5:00 AM, Jakub Jelinek wrote:

Hi!

The following testcase FAILs because a dependent (late) attribute is never
tsubsted.  While the testcase is OpenMP, I think it is a generic C++ FE problem
that could affect any other dependent attribute.

apply_late_template_attributes documents that it relies on
   /* save_template_attributes puts the dependent attributes at the beginning of
  the list; find the non-dependent ones.  */
The "operator binding" attributes that are sometimes added are added to the
head of DECL_ATTRIBUTES list though and because it doesn't have
ATTR_IS_DEPENDENT set it violates this requirement.

The following patch fixes it by adding that attribute after all
ATTR_IS_DEPENDENT attributes.  I'm not 100% sure if DECL_ATTRIBUTES can't be
shared by multiple functions (e.g. the cdtor clones), but the code uses
later remove_attribute which could break that too.  In any case it passed
bootstrap/regtest on x86_64-linux and i686-linux.


OK.


Other option would be to copy_list the ATTR_IS_DEPENDENT portion of the
DECL_ATTRIBUTES list if we need to do this, that would be the same as this
patch but replace that *ap = op_attr; at the end with
   *ap = NULL_TREE;
   DECL_ATTRIBUTES (cfn) = chainon (copy_list (DECL_ATTRIBUTES (cfn)),
   op_attr);
Or perhaps set ATTR_IS_DEPENDENT on the "operator bindings" attribute,
though it would need to be studied what would it try to do with the
attribute during tsubst.
2021-06-03  Jakub Jelinek  

PR c++/100872
* name-lookup.c (maybe_save_operator_binding): Add op_attr after all
ATTR_IS_DEPENDENT attributes in the DECL_ATTRIBUTES list rather than
to the start.

* g++.dg/gomp/declare-simd-8.C: New test.

--- gcc/cp/name-lookup.c.jj 2021-05-11 09:06:24.281997782 +0200
+++ gcc/cp/name-lookup.c2021-06-02 15:50:52.042521824 +0200
@@ -9136,9 +9136,12 @@ maybe_save_operator_binding (tree e)
tree op_attr = lookup_attribute (op_bind_attrname, attributes);
if (!op_attr)
  {
+  tree *ap = &DECL_ATTRIBUTES (cfn);
+  while (*ap && ATTR_IS_DEPENDENT (*ap))
+   ap = &TREE_CHAIN (*ap);
op_attr = tree_cons (get_identifier (op_bind_attrname),
-  NULL_TREE, attributes);
-  DECL_ATTRIBUTES (cfn) = op_attr;
+  NULL_TREE, *ap);
+  *ap = op_attr;
  }
  
tree op_bind = purpose_member (fnname, TREE_VALUE (op_attr));

--- gcc/testsuite/g++.dg/gomp/declare-simd-8.C.jj   2021-06-02 
16:02:32.792681922 +0200
+++ gcc/testsuite/g++.dg/gomp/declare-simd-8.C  2021-06-02 16:02:09.849004442 
+0200
@@ -0,0 +1,15 @@
+// PR c++/100872
+
+template 
+struct S {
+  #pragma omp declare simd aligned(a : N * 2) aligned(b) linear(ref(b): N)
+  float foo (float *a, T *&b) { return *a + *b; }
+};
+
+S<16, float> s;
+
+float
+bar (float *a, float *p)
+{
+  return s.foo (a, p);
+}

Jakub





Re: [PATCH] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]

2021-06-03 Thread Segher Boessenkool
On Thu, Jun 03, 2021 at 02:49:15PM +0800, Xionghu Luo wrote:
> If remove the rotate in simplify-rtx like below:
> 
> +++ b/gcc/simplify-rtx.c
> @@ -3830,10 +3830,16 @@ simplify_context::simplify_binary_operation_1 
> (rtx_code code,
>  case ROTATE:
>if (trueop1 == CONST0_RTX (mode))
> return op0;
> +
> +  if (GET_CODE (trueop0) == ROTATE && trueop1 == GEN_INT (64)
> + && CONST_INT_P (XEXP (trueop0, 1))
> + && INTVAL (XEXP (trueop0, 1)) == 64)
> +   return XEXP (trueop0, 0);

(The hardcoded 64 need improving -- but this is just a proof of concept
I'll assume :-) )

> Combine still fail to merge the two instructions:
> 
> Trying 6 -> 7:
> 6: r120:KF#0=r125:KF#0<-<0x40
>   REG_DEAD r125:KF
> 7: [sfp:DI+r123:DI]=r120:KF#0<-<0x40
>   REG_DEAD r120:KF
> Successfully matched this instruction:
> (set (mem/c:V1TI (plus:DI (reg/f:DI 110 sfp)
> (reg:DI 123)) [1  S16 A128])
> (subreg:V1TI (reg:KF 125) 0))
> rejecting combination of insns 6 and 7
> original costs 4 + 4 = 8
> replacement cost 12

So what instructions were these?  Why did the store cost 4 but the new
one costs 12?

> By hacking the vsx_le_perm_store_v1ti INSN_COST from 12 to 8,

It should be the same cost as the other store!

> it could merge the instructions:
> 
> 21: r125:KF=%v2:KF
>   REG_DEAD %v2:KF
> 2: NOTE_INSN_DELETED
> 3: NOTE_INSN_FUNCTION_BEG
> 6: NOTE_INSN_DELETED
>17: r123:DI=0x20
> 7: [sfp:DI+r123:DI]=r125:KF#0
>   REG_DEAD r125:KF
>19: NOTE_INSN_DELETED
>14: %v2:V1TI=[sfp:DI+r123:DI]
>   REG_DEAD r123:DI
>15: use %v2:V1TI
> 
> Then followed split1 pass will still split it to due to no dse pass
> between to remove the memory operations on stack, remove the rotate
> in swap won't face such problem since it runs before dse and no split
> pass between them:

Sure, but none of that is the point.  I asked if we did this for TImode
properly, and maybe we do, but:

>22: r126:V1TI=r125:KF#0<-<0x40
>23: [sfp:DI+r123:DI]=r126:V1TI<-<0x40

... this is V1TI mode.


Segher


Re: [PATCH] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]

2021-06-03 Thread Bill Schmidt via Gcc-patches



On 6/3/21 3:19 PM, Segher Boessenkool wrote:

On Thu, Jun 03, 2021 at 08:09:36AM -0500, Bill Schmidt wrote:

Note also that swap optimization can handle more general cases than
simplify-rtx.

Do you have examples?  That should be fixed (unless it is something
Power-specific?)
It is Power-specific.  This optimization looks at entire webs of 
computation to determine whether the computation can be done in the 
"wrong" lanes without problems, and removes all the extra swaps at 
once.  That is beyond what simplify-rtx is capable of.  It takes full 
dataflow analysis to see what can legally be done.  As a side effect, 
p8swaps takes care of some cases that simplify-rtx would catch at that 
particular point in compilation, but not all.

  In my view it's best to have it covered in both places.

Oh certainly, and we need that p8swaps pass on at least p8 anyway (but
perhaps we can allow TImode in vector regs on later cpus).


Yes, I agree that should potentially be revisited, but we still have the 
ABI requirement to keep them in GPRs for parameter passing, so there are 
limits on what we can do.


Bill




Segher


Re: [PATCH] c++: top-level cv-quals on type of NTTP [PR100893]

2021-06-03 Thread Jason Merrill via Gcc-patches

On 6/3/21 2:40 PM, Patrick Palka wrote:

On Thu, 3 Jun 2021, Patrick Palka wrote:


Here, we're rejecting the specialization of g with T=A, F=&f in the
first testcase due to a spurious constness mismatch between the type of
the template argument &f and the substituted type of F (the substituted
type has a top-level const).  Note that this mismatch doesn't occur with
object pointers because in that case a call to
perform_qualification_conversions from convert_nontype_argument
implicitly adds a top-level const to the argument (via a cast) to match.

This however seems to be a manifestation of a more general conformance
issue -- that we're not dropping top-level cv-quals after substituting
into the type of an NTTP as per [temp.param]/6 (we only do so at parse
time in process_template_parm).  This patch makes convert_template_argument
drop top-level cv-quals accordingly.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/100893

gcc/cp/ChangeLog:

* pt.c (convert_template_argument): Strip top-level cv-quals
on the substituted type of a non-type template parameter.

gcc/testsuite/ChangeLog:

* g++.dg/template/param4.C: New test.
* g++.dg/template/param5.C: New test.
---
  gcc/cp/pt.c|  7 ++-
  gcc/testsuite/g++.dg/template/param4.C | 10 ++
  gcc/testsuite/g++.dg/template/param5.C |  7 +++
  3 files changed, 23 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/template/param4.C
  create mode 100644 gcc/testsuite/g++.dg/template/param5.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 7211bdc5bbc..66cc88a331f 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -8494,7 +8494,12 @@ convert_template_argument (tree parm,
return error_mark_node;
}
else
-   t = tsubst (t, args, complain, in_decl);
+   {
+ t = tsubst (t, args, complain, in_decl);
+ /* Ignore top-level qualifiers on the substituted type of this
+non-type template parameter, as per [temp.param]/6.  */
+ t = cv_unqualified (t);
+   }


Err, shortly after posting I realized we could get top-level cv-quals not
only after substitution, but also after deduction of a decltype(auto)
NTTP (as in nontype-auto19.C below), so the call to cv_unqualified would
need to happen after do_auto_deduction too, like so.  Testing in
progress.

-- >8 --

Subject: [PATCH] c++: top-level cv-quals on type of NTTP [PR100893]

Here, we're rejecting the specialization of g with T=A, F=&f in
param4.C below due to a spurious constness mismatch between the type of
the template argument &f and the substituted type of F (the substituted
type has a top-level const).  Note that this mismatch doesn't occur with
object pointers because in that case a call to
perform_qualification_conversions from convert_nontype_argument
implicitly adds a top-level const to the argument (via a cast) to match.

This however seems to be a manifestation of a more general conformance
issue -- that we're not dropping top-level cv-quals after substituting
into the type of an NTTP as per [temp.param]/6 (we only do so at parse
time in process_template_parm).  So this patch makes
convert_template_argument drop top-level cv-quals accordingly.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/100893

gcc/cp/ChangeLog:

* pt.c (convert_template_argument): Strip top-level cv-quals
on the substituted type of a non-type template parameter.

gcc/testsuite/ChangeLog:

* g++.dg/template/param4.C: New test.
* g++.dg/template/param5.C: New test.
* g++.dg/cpp1z/nontype-auto19.C: New test.
* g++.dg/cpp2a/concepts-decltype.C: Don't expect that the
deduced type of a decltype(auto) NTTP has top-level cv-quals.
---
  gcc/cp/pt.c|  4 
  gcc/testsuite/g++.dg/cpp1z/nontype-auto19.C|  8 
  gcc/testsuite/g++.dg/cpp2a/concepts-decltype.C |  2 +-
  gcc/testsuite/g++.dg/template/param4.C | 10 ++
  gcc/testsuite/g++.dg/template/param5.C |  7 +++
  5 files changed, 30 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/nontype-auto19.C
  create mode 100644 gcc/testsuite/g++.dg/template/param4.C
  create mode 100644 gcc/testsuite/g++.dg/template/param5.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 7211bdc5bbc..3cac073ed50 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -8496,6 +8496,10 @@ convert_template_argument (tree parm,
else
t = tsubst (t, args, complain, in_decl);
  
+  /* Drop top-level cv-qualifiers on the substituted/deduced type of

+this non-type template parameter, as per [temp.param]/6.  */
+  t = cv_unqualified (t);


Might move this after...


if (invalid_nontype_parm_type_p (t, complain))
return error_mark_node;


...this test.  OK either way.


diff --git a/gcc/testsui

[PATCH] RISC-V: Enable riscv attributes by default for all riscv targets.

2021-06-03 Thread Jim Wilson
These were only enabled for embedded elf originally because that was
the safe option, and linux had no obvious use for them.  But now that
we have new extensions coming like V that affect process state and ABIs,
the attributes are expected to be useful for linux, and may be required
by the psABI.  clang already emits them for all riscv targets.

Tested with a patched open embedded build and boot, and a native
toolchain build.

Committed.

Jim

gcc/
* config.gcc (riscv*-*-*): If --with-riscv-attribute not used,
turn TARGET_RISCV_ATTRIBUTES on for all riscv targets.
---
 gcc/config.gcc | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 92fad8e20ca..6833a6c13d9 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4605,14 +4605,7 @@ case "${target}" in
tm_defines="${tm_defines} TARGET_RISCV_ATTRIBUTE=0"
;;
""|default)
-   case "${target}" in
-   riscv*-*-elf*)
-   tm_defines="${tm_defines} 
TARGET_RISCV_ATTRIBUTE=1"
-   ;;
-   *)
-   tm_defines="${tm_defines} 
TARGET_RISCV_ATTRIBUTE=0"
-   ;;
-   esac
+   tm_defines="${tm_defines} TARGET_RISCV_ATTRIBUTE=1"
;;
*)
echo "--with-riscv-attribute=${with_riscv_attribute} is 
not supported.  The argument must begin with yes, no or default." 1>&2
-- 
2.25.1



Re: [PATCH] RISC-V: Enable riscv attributes by default for all riscv targets.

2021-06-03 Thread Palmer Dabbelt

On Thu, 03 Jun 2021 13:55:40 PDT (-0700), Jim Wilson wrote:

These were only enabled for embedded elf originally because that was
the safe option, and linux had no obvious use for them.  But now that
we have new extensions coming like V that affect process state and ABIs,
the attributes are expected to be useful for linux, and may be required
by the psABI.  clang already emits them for all riscv targets.

Tested with a patched open embedded build and boot, and a native
toolchain build.

Committed.


Works for me.

Nelson: I'm assuming this is why the .riscv.attributes is in 
binutils-all now?  That seems fine, we should just call it out as the 
reason explicitly by splitting that out into its own commit.




Jim

gcc/
* config.gcc (riscv*-*-*): If --with-riscv-attribute not used,
turn TARGET_RISCV_ATTRIBUTES on for all riscv targets.
---
 gcc/config.gcc | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 92fad8e20ca..6833a6c13d9 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4605,14 +4605,7 @@ case "${target}" in
tm_defines="${tm_defines} TARGET_RISCV_ATTRIBUTE=0"
;;
""|default)
-   case "${target}" in
-   riscv*-*-elf*)
-   tm_defines="${tm_defines} 
TARGET_RISCV_ATTRIBUTE=1"
-   ;;
-   *)
-   tm_defines="${tm_defines} 
TARGET_RISCV_ATTRIBUTE=0"
-   ;;
-   esac
+   tm_defines="${tm_defines} TARGET_RISCV_ATTRIBUTE=1"
;;
*)
echo "--with-riscv-attribute=${with_riscv_attribute} is not 
supported.  The argument must begin with yes, no or default." 1>&2


Re: [RFC/PATCH 00/11] Fix up some unexpected empty split conditions

2021-06-03 Thread Segher Boessenkool
On Thu, Jun 03, 2021 at 11:25:49AM +0100, Richard Sandiford wrote:
> We shouldn't just add "&&" to all define_insn_and_splits that currently
> lack them.

My previous post shows that this *already* is required.

> IMO it's not reasonable to ask Kewen to do that for all ports.  So the
> process I suggested was a way of mechanically changing existing ports in
> a way that would not require input from target maintainers, or extensive
> testing on affected targets.

I fear it will end up as Yet Another unfinished transition this way :-(

> >> I don't know.  "&& 1" looks kind of weird to me.
> >
> > We have it in rs6000.md since 2004.  sparc has had it since 2002.  i386
> > has had it since 2001.  arm still does not have it :-)
> 
> Sure, this syntax goes back 20 years.  I don't think that answers the
> point though.  The question was whether a split condition "&& 1" is
> more readable than a condition "", given that "" means "always" in other
> contexts.  Given the choice, IMO "" is more readable and "&& 1" looks
> weird/inconsistent.

In most ports "&& 1" is all over the place and well-known already.  Sure
we could change to "" meaning always, but that is not what it currently
means!

If we could just start all over we could do it perfectly (but see
second-system syndrome, heh).  But we cannot.  IMO we should especially
avoid everything that uses new semantics for old syntax.


Segher


Re: [RFC/PATCH 00/11] Fix up some unexpected empty split conditions

2021-06-03 Thread Jakub Jelinek via Gcc-patches
On Thu, Jun 03, 2021 at 04:25:44PM -0500, Segher Boessenkool wrote:
> If we could just start all over we could do it perfectly (but see
> second-system syndrome, heh).  But we cannot.  IMO we should especially
> avoid everything that uses new semantics for old syntax.

Agreed, that would be a nightmare for backporting.

Jakub



Re: [RFC/PATCH 00/11] Fix up some unexpected empty split conditions

2021-06-03 Thread Segher Boessenkool
On Thu, Jun 03, 2021 at 11:11:53AM -0600, Jeff Law wrote:
> On 6/3/2021 2:00 AM, Segher Boessenkool wrote:
> >The whole point of requiring the split condition to start with && is so
> >it will become harder to mess things up (it will make the gen* code a
> >tiny little bit simpler as well).  And there is no transition period or
> >anything like that needed either.  Just the bunch that will break will
> >need fixing.  So let's find out how many of those there are :-)
> Exactly.   While these empty conditions or those not starting with "&&" 
> are technically valid, they're all suspicious from a port correctness 
> standpoint, particularly if the main condition is non-empty.

And note that this is also the case if you wrote the insn condition as
an empty string, but you used some iterator with a condition.  I found
many of these in rs6000.  This will need to be fixed before we do
anything else.

> Having made that mistake when converting the H8 away from CC0, I can say 
> fairly confidently that if we had this in place a year ago that those 
> mistakes would likely have been avoided.  Thankfully the H8 isn't a 
> heavily used port and has limped along until I stumbled over the issue a 
> week or so ago while polishing some improvements to the port.

I've noticed unexpected splits in rs6000 only a few times over the
years. That doesn't mean more didn't happen, they just didn't cause
obvious enough problems :-)


Segher


Re: [PATCH 04/11] cris: Update unexpected empty split condition

2021-06-03 Thread Hans-Peter Nilsson via Gcc-patches
> From: Hans-Peter Nilsson 
> CC: "gcc-patches@gcc.gnu.org" 
> Date: Thu, 3 Jun 2021 18:12:25 +0200

> I'd
> prefer to have the patch above committed sooner than the
> conclusion of that discussion.  (If you don't get to it,
> I'll do it, after a round of testing.)

Done; no regressions.

brgds, H-P


Re: [PATCH] x86: Convert CONST_WIDE_INT to broadcast in move expanders

2021-06-03 Thread H.J. Lu via Gcc-patches
On Thu, Jun 3, 2021 at 12:39 AM Uros Bizjak  wrote:
>
> On Thu, Jun 3, 2021 at 5:49 AM H.J. Lu  wrote:
> >
> > Update move expanders to convert the CONST_WIDE_INT operand to vector
> > broadcast from a byte with AVX2.  Add ix86_gen_scratch_sse_rtx to
> > return a scratch SSE register which won't increase stack alignment
> > requirement and blocks transformation by the combine pass.
>
> Using fixed scratch reg is just too hackish for my taste. The

It was recommended to use hard register for things like this:

https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569945.html

> expansion is OK to emit some optimized sequence, but the approach to
> use fixed reg to bypass stack alignment functionality and combine is
> not.
>
> Perhaps a new insn pattern should be introduced, e.g.
>
> (define_insn_and_split ""
>[(set (match_opreand:V 0 "memory_operand" "=m,m")
> (vec_duplicate:V
>   (match_operand:S 2 "reg_or_0_operand" "r,C"))
> (clobber (match_scratch:V 1 "=x"))]
>
> and split it at some appropriate point.

I will give it a try.

Thanks.

> Uros.
>
> >
> > A small benchmark:
> >
> > https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/memset/broadcast
> >
> > shows that broadcast is a little bit faster on Intel Core i7-8559U:
> >
> > $ make
> > gcc -g -I. -O2   -c -o test.o test.c
> > gcc -g   -c -o memory.o memory.S
> > gcc -g   -c -o broadcast.o broadcast.S
> > gcc -o test test.o memory.o broadcast.o
> > ./test
> > memory   : 99333
> > broadcast: 97208
> > $
> >
> > broadcast is also smaller:
> >
> > $ size memory.o broadcast.o
> >textdata bss dec hex filename
> > 132   0   0 132  84 memory.o
> > 122   0   0 122  7a broadcast.o
> > $
> >
> > gcc/
> >
> > PR target/100865
> > * config/i386/i386-expand.c (ix86_expand_vector_init_duplicate):
> > New prototype.
> > (ix86_byte_broadcast): New function.
> > (ix86_convert_const_wide_int_to_broadcast): Likewise.
> > (ix86_expand_move): Try ix86_convert_const_wide_int_to_broadcast
> > if mode size is 16 bytes or bigger.
> > (ix86_expand_vector_move): Try
> > ix86_convert_const_wide_int_to_broadcast.
> > * config/i386/i386-protos.h (ix86_gen_scratch_sse_rtx): New
> > prototype.
> > * config/i386/i386.c (ix86_minimum_incoming_stack_boundary): Add
> > an argument to ignore stack_alignment_estimated.  It is passed
> > as false by default.
> > (ix86_gen_scratch_sse_rtx): New function.
> >
> > gcc/testsuite/
> >
> > PR target/100865
> > * gcc.target/i386/pr100865-1.c: New test.
> > * gcc.target/i386/pr100865-2.c: Likewise.
> > * gcc.target/i386/pr100865-3.c: Likewise.
> > * gcc.target/i386/pr100865-4.c: Likewise.
> > * gcc.target/i386/pr100865-5.c: Likewise.
> > ---
> >  gcc/config/i386/i386-expand.c  | 103 ++---
> >  gcc/config/i386/i386-protos.h  |   2 +
> >  gcc/config/i386/i386.c |  50 +-
> >  gcc/testsuite/gcc.target/i386/pr100865-1.c |  13 +++
> >  gcc/testsuite/gcc.target/i386/pr100865-2.c |  14 +++
> >  gcc/testsuite/gcc.target/i386/pr100865-3.c |  15 +++
> >  gcc/testsuite/gcc.target/i386/pr100865-4.c |  16 
> >  gcc/testsuite/gcc.target/i386/pr100865-5.c |  17 
> >  8 files changed, 215 insertions(+), 15 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-3.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-4.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-5.c
> >
> > diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> > index 4185f58eed5..658adafa269 100644
> > --- a/gcc/config/i386/i386-expand.c
> > +++ b/gcc/config/i386/i386-expand.c
> > @@ -93,6 +93,9 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "i386-builtins.h"
> >  #include "i386-expand.h"
> >
> > +static bool ix86_expand_vector_init_duplicate (bool, machine_mode, rtx,
> > +  rtx);
> > +
> >  /* Split one or more double-mode RTL references into pairs of half-mode
> > references.  The RTL can be REG, offsettable MEM, integer constant, or
> > CONST_DOUBLE.  "operands" is a pointer to an array of double-mode RTLs 
> > to
> > @@ -190,6 +193,65 @@ ix86_expand_clear (rtx dest)
> >emit_insn (tmp);
> >  }
> >
> > +/* Return a byte value which V can be broadcasted from.  Otherwise,
> > +   return INT_MAX.  */
> > +
> > +static int
> > +ix86_byte_broadcast (HOST_WIDE_INT v)
> > +{
> > +  wide_int val = wi::uhwi (v, HOST_BITS_PER_WIDE_INT);
> > +  int byte_broadcast = wi::extract_uhwi (val, 0, BITS_PER_UNIT);
> > +  for (unsigned int i = BITS_PER_UNIT;
> > +   i < HOST_BITS_PER_WIDE_INT;
> > +   i += BITS

[PATCH] [og11] OpenMP/OpenACC: Move array_ref/indirect_ref handling code out of extract_base_bit_offset

2021-06-03 Thread Julian Brown
At Richard Biener's suggestion, this patch undoes the following patch:

  https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571712.html

and moves the stripping of ARRAY_REFS/INDIRECT_REFS out of
extract_base_bit_offset and back into the (two) call sites of the
function. The difference between the two ways of looking through these
nodes comes down to (I think) what processing has been done on the
clause in question already: in the case where BASE_REF is non-NULL,
we are processing an OMP_CLAUSE_DECL for the first time. Conversely,
when BASE_REF is NULL, we are processing a node from the sorted list
that is being constructed after a GOMP_MAP_STRUCT node.

In practice, this appears to have no effect on test results (and I
haven't come up with a new test where it makes a difference), though
I will fold this version into the next iteration of these patches sent
upstream in order to avoid potentially introducing a bug.

Tested with offloading to NVPTX. I will apply to the og11 branch after
the weekend.

2021-06-03  Julian Brown  

gcc/
* gimplify.c (extract_base_bit_offset): Don't look through ARRAY_REFs or
INDIRECT_REFs here.
(build_struct_group): Reinstate previous behaviour for handling
ARRAY_REFs/INDIRECT_REFs.
---
 gcc/gimplify.c | 46 ++
 1 file changed, 26 insertions(+), 20 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index c6ebef8e41c..1742a2cb564 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -8526,20 +8526,6 @@ extract_base_bit_offset (tree base, tree *base_ind, tree 
*base_ref,
   if (base_ref)
 *base_ref = NULL_TREE;
 
-  if (TREE_CODE (base) == ARRAY_REF)
-{
-  while (TREE_CODE (base) == ARRAY_REF)
-   base = TREE_OPERAND (base, 0);
-  if (TREE_CODE (base) != COMPONENT_REF
- || TREE_CODE (TREE_TYPE (base)) != ARRAY_TYPE)
-   return NULL_TREE;
-}
-  else if (TREE_CODE (base) == INDIRECT_REF
-  && TREE_CODE (TREE_OPERAND (base, 0)) == COMPONENT_REF
-  && (TREE_CODE (TREE_TYPE (TREE_OPERAND (base, 0)))
-  == REFERENCE_TYPE))
-base = TREE_OPERAND (base, 0);
-
   base = get_inner_reference (base, &bitsize, &bitpos, &offset, &mode,
  &unsignedp, &reversep, &volatilep);
 
@@ -9116,11 +9102,17 @@ build_struct_group (struct gimplify_omp_ctx *ctx,
   poly_offset_int coffset;
   poly_int64 cbitpos;
   tree base_ind, base_ref, tree_coffset;
+  tree ocd = OMP_CLAUSE_DECL (c);
   bool openmp = !(region_type & ORT_ACC);
 
-  tree base = extract_base_bit_offset (OMP_CLAUSE_DECL (c), &base_ind,
-  &base_ref, &cbitpos, &coffset,
-  &tree_coffset, openmp);
+  while (TREE_CODE (ocd) == ARRAY_REF)
+ocd = TREE_OPERAND (ocd, 0);
+
+  if (TREE_CODE (ocd) == INDIRECT_REF)
+ocd = TREE_OPERAND (ocd, 0);
+
+  tree base = extract_base_bit_offset (ocd, &base_ind, &base_ref, &cbitpos,
+  &coffset, &tree_coffset, openmp);
 
   bool do_map_struct = (base == decl && !tree_coffset);
 
@@ -9347,9 +9339,23 @@ build_struct_group (struct gimplify_omp_ctx *ctx,
poly_offset_int offset;
poly_int64 bitpos;
tree tree_offset;
-   tree base = extract_base_bit_offset (sc_decl, NULL, NULL,
-&bitpos, &offset,
-&tree_offset, openmp);
+
+   if (TREE_CODE (sc_decl) == ARRAY_REF)
+ {
+   while (TREE_CODE (sc_decl) == ARRAY_REF)
+ sc_decl = TREE_OPERAND (sc_decl, 0);
+   if (TREE_CODE (sc_decl) != COMPONENT_REF
+   || TREE_CODE (TREE_TYPE (sc_decl)) != ARRAY_TYPE)
+ break;
+ }
+   else if (TREE_CODE (sc_decl) == INDIRECT_REF
+&& TREE_CODE (TREE_OPERAND (sc_decl, 0)) == COMPONENT_REF
+&& (TREE_CODE (TREE_TYPE (TREE_OPERAND (sc_decl, 0)))
+== REFERENCE_TYPE))
+ sc_decl = TREE_OPERAND (sc_decl, 0);
+
+   tree base = extract_base_bit_offset (sc_decl, NULL, NULL, &bitpos,
+&offset, &tree_offset, openmp);
if (!base || !operand_equal_p (base, decl, 0))
  break;
if (scp)
-- 
2.29.2



Re: [Patch] Fortran/OpenMP: Add omp loop [PR99928]

2021-06-03 Thread Jakub Jelinek via Gcc-patches
On Thu, Jun 03, 2021 at 05:07:22PM +0200, Jakub Jelinek via Gcc-patches wrote:
> I think best would be just to remove that part of the testcase
> in the loop patch and handle the !$omp with !$acc continuations
> and vice versa in a separate change.  That seems to be a preexisting
> bug not really related to whether we support loop or not.
> fatal_error is certainly not ideal, but I can understand fixing
> it otherwise might be hard.
> Wonder if we just shouldn't treat the incorrect continuations
> as if they were simple comments.

Perhaps can't it gfc_error_now for the mixing of OMP and ACC line
continuations and then act as if the incorrect ones were comments
(like for -fopenmp -fno-openacc or -fopenacc -fno-openmp)?

Jakub



Re: [PATCH] rtl: constm64_rtx..const64_rtx

2021-06-03 Thread Segher Boessenkool
On Thu, Jun 03, 2021 at 01:32:54PM -0600, Jeff Law wrote:
> On 6/2/2021 4:43 PM, Segher Boessenkool wrote:
> >We have has const0_rtx etc. since forever, this patch just increases the
> >range (to those values that have had guaranteed unique RTXes since
> >decades as well).
> Yea, but often what you really want is CONST0_RTX (mode) instead of 
> const0_rtx.   It's easily goof'd and often the cause minor missed 
> optimizations.

I'd say "sometimes" instead of "often"...  almost always you really do
want a scalar integer.  Also, although CONST0_RTX makes sense for pretty
much all modes, CONST64_RTX will not (and those are not currently unique
RTX anyway).


Segher


Re: [wwwdocs] lists: Correct procmail recipe

2021-06-03 Thread Hans-Peter Nilsson
On Wed, 2 Jun 2021, Gerald Pfeifer wrote:

> On Tue, 1 Jun 2021, Segher Boessenkool wrote:
> > We haven't had Sender: for a while now.
>
> "a while now" was about four(?) hours when you sent that yesterday. :-)
>
> I know since I still had been using that and was looking for all my
> missing gcc-related mails yesterday afternoon.  Thanks for spotting
> this and this change!

JFTR, nitpickish: I still get a "Sender:" field (obviously
generated at gcc.gnu.org), just not matching that format.

brgds, H-P


RE: [PATCH] [i386] Fix ICE of insn does not satisfy its constraints.

2021-06-03 Thread Liu, Hongtao via Gcc-patches


>-Original Message-
>From: Jakub Jelinek 
>Sent: Thursday, June 3, 2021 9:49 PM
>To: Liu, Hongtao 
>Cc: gcc-patches@gcc.gnu.org
>Subject: Re: [PATCH] [i386] Fix ICE of insn does not satisfy its constraints.
>
>On Thu, Jun 03, 2021 at 05:07:26PM +0800, liuhongt via Gcc-patches wrote:
>> @@ -18163,10 +18163,10 @@ (define_expand "v16qiv16si2"
>>"TARGET_AVX512F")
>>
>>  (define_insn "avx2_v8qiv8si2"
>> -  [(set (match_operand:V8SI 0 "register_operand" "=v")
>> +  [(set (match_operand:V8SI 0 "register_operand" "=Yv")
>>  (any_extend:V8SI
>>(vec_select:V8QI
>> -(match_operand:V16QI 1 "register_operand" "v")
>> +(match_operand:V16QI 1 "register_operand" "Yv")
>>  (parallel [(const_int 0) (const_int 1)
>> (const_int 2) (const_int 3)
>> (const_int 4) (const_int 5)
>
>Why do you need this change (and similarly other v -> Yv changes)?
>I mean, ix86_hard_regno_mode_ok for TARGET_AVX512F
>&& !TARGET_AVX512VL should return false for the 16-byte and 32-byte vector
>modes.
>
>The reason to use Yv is typically where the match_operand has 64-byte vector
>mode or scalar mode, yet it needs an AVX512VL instruction.
>
>The changes to use Yw look ok, that is for the cases where the insn requires
>both AVX512VL and AVX512BW, while ix86_hard_regno_mode_ok ensures
>the xmm16+ regs won't be used for the 16/32-byte vectors when AVX512VL is
>not on, it doesn't ensure that AVX512BW will be enabled.
Thanks for the review.
Yes, you're right, AVX512VL parts are already guaranteed by 
ix86_hard_regno_mode_ok.

Here is updated patch.
>
>   Jakub



0001-i386-Fix-ICE-of-insn-does-not-satisfy-its-constraint_v2.patch
Description: 0001-i386-Fix-ICE-of-insn-does-not-satisfy-its-constraint_v2.patch


Re: [PATCH] RISC-V: Enable riscv attributes by default for all riscv targets.

2021-06-03 Thread Nelson Chu
On Fri, Jun 4, 2021 at 5:20 AM Palmer Dabbelt  wrote:
>
> On Thu, 03 Jun 2021 13:55:40 PDT (-0700), Jim Wilson wrote:
> > These were only enabled for embedded elf originally because that was
> > the safe option, and linux had no obvious use for them.  But now that
> > we have new extensions coming like V that affect process state and ABIs,
> > the attributes are expected to be useful for linux, and may be required
> > by the psABI.  clang already emits them for all riscv targets.
> >
> > Tested with a patched open embedded build and boot, and a native
> > toolchain build.
> >
> > Committed.
>
> Works for me.
>
> Nelson: I'm assuming this is why the .riscv.attributes is in
> binutils-all now?  That seems fine, we should just call it out as the
> reason explicitly by splitting that out into its own commit.

Yes this patch will make .riscv.attributes always be generated, both
for elf and linux toolchains.
Got it, I will split this one into a separate commit,  thanks.

FYI, this update resolves the ABI problems for linux toolchain that I
met before,
https://sourceware.org/pipermail/binutils/2020-November/114016.html
Now linux toolchain also generate the elf architecture attributes, so
binutils will try to choose the correct ABI according to the
attributes, if user compile a assembly code without setting -mabi=.

Thanks
Nelson


> > Jim
> >
> >   gcc/
> >   * config.gcc (riscv*-*-*): If --with-riscv-attribute not used,
> >   turn TARGET_RISCV_ATTRIBUTES on for all riscv targets.
> > ---
> >  gcc/config.gcc | 9 +
> >  1 file changed, 1 insertion(+), 8 deletions(-)
> >
> > diff --git a/gcc/config.gcc b/gcc/config.gcc
> > index 92fad8e20ca..6833a6c13d9 100644
> > --- a/gcc/config.gcc
> > +++ b/gcc/config.gcc
> > @@ -4605,14 +4605,7 @@ case "${target}" in
> >   tm_defines="${tm_defines} TARGET_RISCV_ATTRIBUTE=0"
> >   ;;
> >   ""|default)
> > - case "${target}" in
> > - riscv*-*-elf*)
> > - tm_defines="${tm_defines} 
> > TARGET_RISCV_ATTRIBUTE=1"
> > - ;;
> > - *)
> > - tm_defines="${tm_defines} 
> > TARGET_RISCV_ATTRIBUTE=0"
> > - ;;
> > - esac
> > + tm_defines="${tm_defines} TARGET_RISCV_ATTRIBUTE=1"
> >   ;;
> >   *)
> >   echo "--with-riscv-attribute=${with_riscv_attribute} 
> > is not supported.  The argument must begin with yes, no or default." 1>&2


Re: [PATCH] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]

2021-06-03 Thread Xionghu Luo via Gcc-patches



On 2021/6/4 04:31, Segher Boessenkool wrote:
> On Thu, Jun 03, 2021 at 02:49:15PM +0800, Xionghu Luo wrote:
>> If remove the rotate in simplify-rtx like below:
>>
>> +++ b/gcc/simplify-rtx.c
>> @@ -3830,10 +3830,16 @@ simplify_context::simplify_binary_operation_1 
>> (rtx_code code,
>>   case ROTATE:
>> if (trueop1 == CONST0_RTX (mode))
>>  return op0;
>> +
>> +  if (GET_CODE (trueop0) == ROTATE && trueop1 == GEN_INT (64)
>> + && CONST_INT_P (XEXP (trueop0, 1))
>> + && INTVAL (XEXP (trueop0, 1)) == 64)
>> +   return XEXP (trueop0, 0);
> 
> (The hardcoded 64 need improving -- but this is just a proof of concept
> I'll assume :-) )
> 
>> Combine still fail to merge the two instructions:
>>
>> Trying 6 -> 7:
>>  6: r120:KF#0=r125:KF#0<-<0x40
>>REG_DEAD r125:KF
>>  7: [sfp:DI+r123:DI]=r120:KF#0<-<0x40
>>REG_DEAD r120:KF
>> Successfully matched this instruction:
>> (set (mem/c:V1TI (plus:DI (reg/f:DI 110 sfp)
>>  (reg:DI 123)) [1  S16 A128])
>>  (subreg:V1TI (reg:KF 125) 0))
>> rejecting combination of insns 6 and 7
>> original costs 4 + 4 = 8
>> replacement cost 12
> 
> So what instructions were these?  Why did the store cost 4 but the new
> one costs 12?

For this case of __float128 to vector __int128:

typedef union
{
  __float128 vf1;
  vector __int128 vi128;
  __int128 i128;
} VF_128;

vector __int128
foo1 (__float128 f128)
{
  VF_128 vunion;

  vunion.vf1 = f128;
  return vunion.vi128;
}

Without this patch, the RTL in combine is:

(insn 6 3 17 2 (set (subreg:V1TI (reg:KF 120 [ f128 ]) 0)
(rotate:V1TI (subreg:V1TI (reg:KF 125) 0)
(const_int 64 [0x40]))) "pr100085.c":258:14 1113 
{*vsx_le_permute_v1ti}
 (expr_list:REG_DEAD (reg:KF 125)
(nil)))
(insn 17 6 7 2 (set (reg:DI 123)
(const_int 32 [0x20])) "pr100085.c":258:14 636 {*movdi_internal64}
 (nil))
(insn 7 17 19 2 (set (mem/c:V1TI (plus:DI (reg/f:DI 110 sfp)
(reg:DI 123)) [1  S16 A128])
(rotate:V1TI (subreg:V1TI (reg:KF 120 [ f128 ]) 0)
(const_int 64 [0x40]))) "pr100085.c":258:14 1113 
{*vsx_le_permute_v1ti}
 (expr_list:REG_DEAD (reg:KF 120 [ f128 ])
(nil)))
(note 19 7 14 2 NOTE_INSN_DELETED)
(insn 14 19 15 2 (set (reg/i:V1TI 66 %v2)
(mem/c:V1TI (plus:DI (reg/f:DI 110 sfp)
(reg:DI 123)) [1  S16 A128])) "pr100085.c":260:1 1119 
{*vsx_le_perm_load_v1ti}
 (expr_list:REG_DEAD (reg:DI 123)
(nil)))
(insn 15 14 0 2 (use (reg/i:V1TI 66 %v2)) "pr100085.c":260:1 -1
 (nil))

insn 6 and insn 7 are two vsx_le_permute_v1ti instructions each with costs 4,
(The two instructions are VSX and LE specific like Bill said, swap pass tries 
to remove insn if legal).  If remove the rotates in simplify-rtx.c
(simplify_context::simplify_binary_operation_1) like my last reply, combine will
try to merge them to vsx_le_perm_store_v1ti whose insn cost is 12 and meet 
"rejecting
combination".  They are all V1TI mode.


> 
>> By hacking the vsx_le_perm_store_v1ti INSN_COST from 12 to 8,
> 
> It should be the same cost as the other store!

vsx_le_permute_v1ti's cost is defined to 4 in vsx.md:

;; Little endian word swapping for 128-bit types that are either scalars or the
;; special V1TI container class, which it is not appropriate to use vec_select
;; for the type.
(define_insn "*vsx_le_permute_"
  [(set (match_operand:VSX_TI 0 "nonimmediate_operand" "=wa,wa,Z,&r,&r,Q")
(rotate:VSX_TI
 (match_operand:VSX_TI 1 "input_operand" "wa,Z,wa,r,Q,r")
 (const_int 64)))]
  "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR"
  "@
   xxpermdi %x0,%x1,%x1,2
   lxvd2x %x0,%y1
   stxvd2x %x1,%y0
   mr %0,%L1\;mr %L0,%1
   ld%U1%X1 %0,%L1\;ld%U1%X1 %L0,%1
   std%U0%X0 %L1,%0\;std%U0%X0 %1,%L0"
  [(set_attr "length" "*,*,*,8,8,8")
   (set_attr "type" "vecperm,vecload,vecstore,*,load,store")])

> 
>> it could merge the instructions:
>>
>>  21: r125:KF=%v2:KF
>>REG_DEAD %v2:KF
>>  2: NOTE_INSN_DELETED
>>  3: NOTE_INSN_FUNCTION_BEG
>>  6: NOTE_INSN_DELETED
>> 17: r123:DI=0x20
>>  7: [sfp:DI+r123:DI]=r125:KF#0
>>REG_DEAD r125:KF
>> 19: NOTE_INSN_DELETED
>> 14: %v2:V1TI=[sfp:DI+r123:DI]
>>REG_DEAD r123:DI
>> 15: use %v2:V1TI
>>
>> Then followed split1 pass will still split it to due to no dse pass
>> between to remove the memory operations on stack, remove the rotate
>> in swap won't face such problem since it runs before dse and no split
>> pass between them:
> 
> Sure, but none of that is the point.  I asked if we did this for TImode
> properly, and maybe we do, but:
> 
>> 22: r126:V1TI=r125:KF#0<-<0x40
>> 23: [sfp:DI+r123:DI]=r126:V1TI<-<0x40
> 
> ... this is V1TI mode.
> 
> 
> Segher
> 

-- 
Thanks,
Xionghu


Re: [PATCH] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]

2021-06-03 Thread Xionghu Luo via Gcc-patches
Hi,


On 2021/6/3 21:09, Bill Schmidt wrote:
> On 6/2/21 7:46 PM, Xionghu Luo wrote:
>> Hi,
>>
>> On 2021/6/3 06:20, Segher Boessenkool wrote:
>>> On Wed, Jun 02, 2021 at 03:19:32AM -0500, Xionghu Luo wrote:
 On P8LE, extra rot64+rot64 load or store instructions are generated
 in float128 to vector __int128 conversion.

 This patch teaches pass swaps to also handle such pattens to remove
 extra swap instructions.
>>> Did you check if this is already handled by simplify-rtx if the mode had
>>> been TImode (not V1TImode)?  If not, why do you not handle it there?
>> I tried to do it in combine or peephole, the later pass split2
>> or split3 will still split it to rotate + rotate again as we have split
>> after reload, and this pattern is quite P8LE specific, so put it in pass
>> swap.  The simplify-rtx could simplify
>> r124:KF#0=r123:KF#0<-<0x40<-<0x40 to r124:KF#0=r123:KF#0 for register
>> operations already.
>>
>>
>> vsx.md:
>>
>> ;; The post-reload split requires that we re-permute the source
>> ;; register in case it is still live.
>> (define_split
>>    [(set (match_operand:VSX_LE_128 0 "memory_operand")
>>  (match_operand:VSX_LE_128 1 "vsx_register_operand"))]
>>    "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed && 
>> !TARGET_P9_VECTOR
>>     && !altivec_indexed_or_indirect_operand (operands[0], mode)"
>>    [(const_int 0)]
>> {
>>    rs6000_emit_le_vsx_permute (operands[1], operands[1], mode);
>>    rs6000_emit_le_vsx_permute (operands[0], operands[1], mode);
>>    rs6000_emit_le_vsx_permute (operands[1], operands[1], mode);
>>    DONE;
>> })
> 
> Note also that swap optimization can handle more general cases than 
> simplify-rtx.  In my view it's best to have it covered in both places.
> 

But this pattern is after reload quite later than swap optimization,
so it couldn't remove the swap operations as expected, I have a below
example that matched the above pattern in pass split2, this may be not 
quite appropriate as there is a function call between the load and store.

extern vector __int128 foo1 (__float128 a);

int foo2 ()
{
  __binary128 f128 = {3.1415926535897932384626433832795028841971693993751058Q};
  vector __int128 ret = foo1 (f128);
  return ret[0];
}


295r.split (*see insn 35, 36, 37*):

...
Splitting with gen_split_558 (vsx.md:1079)
...

(insn 33 12 34 2 (set (reg/f:DI 9 %r9 [121])
(high:DI (unspec:DI [
(symbol_ref:DI ("*.LANCHOR0") [flags 0x182])
(reg:DI 2 %r2)
] UNSPEC_TOCREL))) "pr100085.c":279:25 715 {*largetoc_high}
 (nil))
(insn 34 33 6 2 (set (reg/f:DI 9 %r9 [121])
(lo_sum:DI (reg/f:DI 9 %r9 [121])
(unspec:DI [
(symbol_ref:DI ("*.LANCHOR0") [flags 0x182])
(reg:DI 2 %r2)
] UNSPEC_TOCREL))) "pr100085.c":279:25 717 {*largetoc_low}
 (expr_list:REG_EQUAL (symbol_ref:DI ("*.LANCHOR0") [flags 0x182])
(nil)))
(insn 6 34 8 2 (set (reg:V1TI 66 %v2 [123])
(rotate:V1TI (mem/c:V1TI (reg/f:DI 9 %r9 [121]) [1 f128+0 S16 A128])
(const_int 64 [0x40]))) "pr100085.c":279:25 1113 
{*vsx_le_permute_v1ti}
 (nil))
(insn 8 6 9 2 (set (reg:V1TI 66 %v2)
(rotate:V1TI (reg:V1TI 66 %v2 [123])
(const_int 64 [0x40]))) "pr100085.c":279:25 1113 
{*vsx_le_permute_v1ti}
 (nil))
(call_insn 9 8 32 2 (parallel [
(set (reg:V1TI 66 %v2)
(call (mem:SI (symbol_ref:DI ("foo1") [flags 0x41]  
) [0 foo
1 S4 A8])
(const_int 0 [0])))
(use (const_int 0 [0]))
(clobber (reg:DI 96 lr))
]) "pr100085.c":279:25 735 {*call_value_nonlocal_aixdi}
 (expr_list:REG_CALL_DECL (symbol_ref:DI ("foo1") [flags 0x41]  
)
(nil))
(expr_list (use (reg:DI 2 %r2))
(expr_list:KF (use (reg:KF 66 %v2))
(nil
(insn 32 9 35 2 (set (reg:DI 9 %r9 [138])
(plus:DI (reg/f:DI 1 %r1)
(const_int 32 [0x20]))) "pr100085.c":279:25 66 {*adddi3}
 (nil))
(insn 35 32 36 2 (set (reg:V1TI 66 %v2)
(rotate:V1TI (reg:V1TI 66 %v2)
(const_int 64 [0x40]))) "pr100085.c":279:25 1113 
{*vsx_le_permute_v1ti}
 (nil))
(insn 36 35 37 2 (set (mem/c:V1TI (reg:DI 9 %r9 [138]) [2 %sfp+32 S16 A128])
(rotate:V1TI (reg:V1TI 66 %v2)
(const_int 64 [0x40]))) "pr100085.c":279:25 1113 
{*vsx_le_permute_v1ti}
 (nil))
(insn 37 36 28 2 (set (reg:V1TI 66 %v2)
(rotate:V1TI (reg:V1TI 66 %v2)
(const_int 64 [0x40]))) "pr100085.c":279:25 1113 
{*vsx_le_permute_v1ti}
 (nil))
(insn 28 37 17 2 (set (reg:DI 3 %r3 [133])
(mem/c:DI (plus:DI (reg/f:DI 1 %r1)
(const_int 32 [0x20])) [2 %sfp+32 S8 A128])) 
"pr100085.c":279:25 636 {*movdi_internal64}
 (nil))
(insn 17 28 18 2 (set (reg/i:DI 3 %r3)
(sign_extend:DI (reg:SI 3 %r3 [129]))) "pr100085.c":281:1 31 
{extendsidi2}
 (nil))
(insn 18 17 30 2 (use (reg/i:DI 3 %r3)) "pr100085.c":

Re: [PATCH] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]

2021-06-03 Thread Xionghu Luo via Gcc-patches



On 2021/6/4 04:16, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Jun 03, 2021 at 08:46:46AM +0800, Xionghu Luo wrote:
>> On 2021/6/3 06:20, Segher Boessenkool wrote:
>>> On Wed, Jun 02, 2021 at 03:19:32AM -0500, Xionghu Luo wrote:
 On P8LE, extra rot64+rot64 load or store instructions are generated
 in float128 to vector __int128 conversion.

 This patch teaches pass swaps to also handle such pattens to remove
 extra swap instructions.
>>>
>>> Did you check if this is already handled by simplify-rtx if the mode had
>>> been TImode (not V1TImode)?  If not, why do you not handle it there?
>>
>> I tried to do it in combine or peephole, the later pass split2
>> or split3 will still split it to rotate + rotate again as we have split
>> after reload, and this pattern is quite P8LE specific, so put it in pass
>> swap.  The simplify-rtx could simplify
>> r124:KF#0=r123:KF#0<-<0x40<-<0x40 to r124:KF#0=r123:KF#0 for register
>> operations already.
> 
> What mode are those subregs?  Abbreviated RTL printouts are very lossy.
> Assuming those are TImode (please check), then yes, that is what I
> asked, thanks.

typedef union
{
  __float128 vf1;
  vector __int128 vi128;
  __int128 i128;
} VF_128;


VF_128 vu;

int foo3 ()
{
  __float128 f128 = {3.1415926535897932384626433832795028841971693993751058Q};
  vu.vf1 = f128;
  vector __int128 ret = vu.vi128;
  return ret[0];
}

This case catches such optimization, they are also V1TImode:

Trying 8 -> 9:
8: r122:KF#0=r123:KF#0<-<0x40
  REG_DEAD r123:KF
9: r124:KF#0=r122:KF#0<-<0x40
  REG_DEAD r122:KF
Successfully matched this instruction:
(set (subreg:V1TI (reg:KF 124) 0)
(rotate:V1TI (rotate:V1TI (subreg:V1TI (reg:KF 123) 0)
(const_int 64 [0x40]))
(const_int 64 [0x40])))
allowing combination of insns 8 and 9
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 8.
modifying insn i3 9: r124:KF#0=r123:KF#0<-<0x40<-<0x40
  REG_DEAD r123:KF
deferring rescan insn with uid = 9.


With confirmation, actually it was optimized by this pattern from vsx.md
in split1 pass instead of simplify-rtx.


(define_insn_and_split "*vsx_le_undo_permute_"
  [(set (match_operand:VSX_TI 0 "vsx_register_operand" "=wa,wa")
(rotate:VSX_TI
 (rotate:VSX_TI
  (match_operand:VSX_TI 1 "vsx_register_operand" "0,wa")
  (const_int 64))
 (const_int 64)))]
  "!BYTES_BIG_ENDIAN && TARGET_VSX"
  "@
   #
   xxlor %x0,%x1"
  "&& 1"
  [(set (match_dup 0) (match_dup 1))]
{
  if (reload_completed && REGNO (operands[0]) == REGNO (operands[1]))
{
  emit_note (NOTE_INSN_DELETED);
  DONE;
}
}
  [(set_attr "length" "0,4")
   (set_attr "type" "veclogical")])


> 
>> ;; The post-reload split requires that we re-permute the source
>> ;; register in case it is still live.
>> (define_split
>>[(set (match_operand:VSX_LE_128 0 "memory_operand")
>>  (match_operand:VSX_LE_128 1 "vsx_register_operand"))]
>>"!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed && !TARGET_P9_VECTOR
>> && !altivec_indexed_or_indirect_operand (operands[0], mode)"
>>[(const_int 0)]
>> {
>>rs6000_emit_le_vsx_permute (operands[1], operands[1], mode);
>>rs6000_emit_le_vsx_permute (operands[0], operands[1], mode);
>>rs6000_emit_le_vsx_permute (operands[1], operands[1], mode);
>>DONE;
>> })
> 
> Yes, that needs improvement itself.
> 
> The tthing to realise is that TImode is optimised by generic code just
> fine (as all scalar integer modes are), but V1TImode is not.  We have
> that mode because we really needed to not put TImode in vector registers
> so much on older cpus, but that balance may have changed by now.  Worth
> experimenting with, we now can do pretty much all noormal operations in
> vector registers!
>
We have two issues stated in PR100085, one is __float128 to vector __int128 
(V1TImode),
the other is float128 to __float128 to __int128 (TImode).   The first one could 
be 
solved by this patch(by pass swap optimization), so to cover the TImode case, 
we should
also generate rotate permute instructions when gen_movti for P8LE like gen_movkf
in vector.md(below change is exactly copied from "mov"...)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 84820d3b5cb..f35c235a39e 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -7385,7 +7385,22 @@ (define_expand "mov"
(match_operand:INT 1 "any_operand"))]
   ""
 {
-  rs6000_emit_move (operands[0], operands[1], mode);
+  /* When generating load/store instructions to/from VSX registers on
+ pre-power9 hardware in little endian mode, we need to emit register
+ permute instructions to byte swap the contents, since the VSX load/store
+ instructions do not include a byte swap as part of their operation.
+ Altivec loads and stores have no such problem, so we skip them below.  */
+  if (!BYTES_BIG_ENDIAN
+  && VECTOR_MEM_VSX_P (mode)
+  &

RE: [PATCH] Canonicalize (vec_duplicate (not A)) to (not (vec_duplicate A)).

2021-06-03 Thread Liu, Hongtao via Gcc-patches



>-Original Message-
>From: Segher Boessenkool 
>Sent: Friday, June 4, 2021 4:00 AM
>To: Liu, Hongtao 
>Cc: Richard Biener ; GCC Patches patc...@gcc.gnu.org>
>Subject: Re: [PATCH] Canonicalize (vec_duplicate (not A)) to (not
>(vec_duplicate A)).
>
>On Thu, Jun 03, 2021 at 11:03:43AM +, Liu, Hongtao wrote:
>> >A very typical example is how UMIN is optimised:
>> >
>> >   case UMIN:
>> >  if (trueop1 == CONST0_RTX (mode) && ! side_effects_p (op0))
>> >return op1;
>> >  if (rtx_equal_p (trueop0, trueop1) && ! side_effects_p (op0))
>> >return op0;
>> >  tem = simplify_associative_operation (code, mode, op0, op1);
>> >  if (tem)
>> >return tem;
>> >  break;
>> >
>> >(the stuff using "tem").
>> >
>> >Hongtao, can we do something similar here?  Does that work well?
>> >Please try it out :-)
>>
>> In simplify_rtx, no simplication occurs, there is just the difference
>> between  (vec_duplicate (not REG)) and (not (vec_duplicate (REG)). So here
>tem will only be 0.
>
>simplify-rtx is used by combine.  When you do and+not+splat for example my
>suggestion should kick in.  Try it out, don't just dismiss it?
>
Forgive my obtuseness, do you mean try the following changes, if so then there 
will be no "kick in", 
temp will be 0, there's no simplification here since it's just the difference 
between  (vec_duplicate (not REG))
 and (not (vec_duplicate (REG)). Or maybe you mean something else?

@@ -1708,6 +1708,17 @@ simplify_context::simplify_unary_operation_1 (rtx_code 
code, machine_mode mode,
 #endif
   break;

+  /* Canonicalize (vec_duplicate (not A)) to (not (vec_duplicate A)).  */
+case VEC_DUPLICATE:
+  if (GET_CODE (op) == NOT)
+   {
+ rtx vec_dup = gen_rtx_VEC_DUPLICATE (mode, XEXP (op, 0));
+ temp = simplify_unary_operation (NOT, mode, vec_dup, GET_MODE (op));
+ if (temp)
+   return temp;
+   }
+  break;
+
>> Basically we don't know it's a simplication until combine successfully
>> split the
>> 3->2 instructions (not + broadcast + and to andnot + broadcast), but
>> 3->it's pretty awkward
>> to do this in combine.
>
>But you need to do this *before* it is split.  That is the whole point.
>
>> Consider andnot is existed for many backends, I think a canonicalization is
>needed here.
>
>Please do note that that is not as easy as yoou may think: you need to make
>sure nothing ever creates non-canonical code.
>
>> Maybe we can add insn canonicalization for transforming (and
>> (vect_duplicate (not A)) B) to (and (not (duplicate (not A)) B) instead of
>(vec_duplicate (not A)) to (not (vec_duplicate A))?
>
>I don't understand what this means?
I mean let's give a last shot for andnot in case AND like below

@ -3702,6 +3702,16 @@ simplify_context::simplify_binary_operation_1 (rtx_code 
code,
   tem = simplify_associative_operation (code, mode, op0, op1);
   if (tem)
return tem;
+
+  if (GET_CODE (op0) == VEC_DUPLICATE
+ && GET_CODE (XEXP (op0, 0)) == NOT)
+   {
+ rtx vec_dup = gen_rtx_VEC_DUPLICATE (GET_MODE (op0),
+  XEXP (XEXP (op0, 0), 0));
+ return simplify_gen_binary (AND, mode,
+ gen_rtx_NOT (mode, vec_dup),
+ op1);
+   }
   break;
>
>
>Segher


Re: [PATCH 1/2] CALL_INSN may not be a real function call.

2021-06-03 Thread Hongtao Liu via Gcc-patches
Ping,
This is a splitted middle-end patch as a follow up of
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571544.html

On Thu, Jun 3, 2021 at 2:54 PM liuhongt via Gcc-patches
 wrote:
>
> Use "used" flag for CALL_INSN to indicate it's a fake call. If it's a
> fake call, it won't have its own function stack.
>
> gcc/ChangeLog
>
> PR target/82735
> * df-scan.c (df_get_call_refs): When call_insn is a fake call,
> it won't use stack pointer reg.
> * final.c (leaf_function_p): When call_insn is a fake call, it
> won't affect caller as a leaf function.
> * reg-stack.c (callee_clobbers_any_stack_reg): New.
> (subst_stack_regs): When call_insn doesn't clobber any stack
> reg, don't clear the arguments.
> * rtl.c (shallow_copy_rtx): Don't clear flag used when orig is
> a insn.
> * shrink-wrap.c (requires_stack_frame_p): No need for stack
> frame for a fake call.
> * rtl.h (FAKE_CALL_P): New macro.
> ---
>  gcc/df-scan.c |  3 ++-
>  gcc/final.c   |  3 ++-
>  gcc/reg-stack.c   | 18 +-
>  gcc/rtl.c |  6 --
>  gcc/rtl.h |  5 +
>  gcc/shrink-wrap.c |  2 +-
>  6 files changed, 31 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/df-scan.c b/gcc/df-scan.c
> index 6691c3e8357..1268536b3f0 100644
> --- a/gcc/df-scan.c
> +++ b/gcc/df-scan.c
> @@ -3090,7 +3090,8 @@ df_get_call_refs (class df_collection_rec 
> *collection_rec,
>
>for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
>  {
> -  if (i == STACK_POINTER_REGNUM)
> +  if (i == STACK_POINTER_REGNUM
> + && !FAKE_CALL_P (insn_info->insn))
> /* The stack ptr is used (honorarily) by a CALL insn.  */
> df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
>NULL, bb, insn_info, DF_REF_REG_USE,
> diff --git a/gcc/final.c b/gcc/final.c
> index e0a70fcd830..817f7722cb2 100644
> --- a/gcc/final.c
> +++ b/gcc/final.c
> @@ -4109,7 +4109,8 @@ leaf_function_p (void)
>for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
>  {
>if (CALL_P (insn)
> - && ! SIBLING_CALL_P (insn))
> + && ! SIBLING_CALL_P (insn)
> + && ! FAKE_CALL_P (insn))
> return 0;
>if (NONJUMP_INSN_P (insn)
>   && GET_CODE (PATTERN (insn)) == SEQUENCE
> diff --git a/gcc/reg-stack.c b/gcc/reg-stack.c
> index 25210f0c17f..1d9ea035cf4 100644
> --- a/gcc/reg-stack.c
> +++ b/gcc/reg-stack.c
> @@ -174,6 +174,7 @@
>  #include "reload.h"
>  #include "tree-pass.h"
>  #include "rtl-iter.h"
> +#include "function-abi.h"
>
>  #ifdef STACK_REGS
>
> @@ -2368,6 +2369,18 @@ subst_asm_stack_regs (rtx_insn *insn, stack_ptr 
> regstack)
> }
>}
>  }
> +
> +/* Return true if a function call is allowed to alter some or all bits
> +   of any stack reg.  */
> +static bool
> +callee_clobbers_any_stack_reg (const function_abi & callee_abi)
> +{
> +  for (unsigned regno = FIRST_STACK_REG; regno <= LAST_STACK_REG; regno++)
> +if (callee_abi.clobbers_at_least_part_of_reg_p (regno))
> +  return true;
> +  return false;
> +}
> +
>
>  /* Substitute stack hard reg numbers for stack virtual registers in
> INSN.  Non-stack register numbers are not changed.  REGSTACK is the
> @@ -2382,7 +2395,10 @@ subst_stack_regs (rtx_insn *insn, stack_ptr regstack)
>bool control_flow_insn_deleted = false;
>int i;
>
> -  if (CALL_P (insn))
> +  /* If the target of the call doesn't clobber any stack registers,
> + Don't clear the arguments.  */
> +  if (CALL_P (insn)
> +  && callee_clobbers_any_stack_reg (insn_callee_abi (insn)))
>  {
>int top = regstack->top;
>
> diff --git a/gcc/rtl.c b/gcc/rtl.c
> index b0ba1ff684c..aaee882f5ca 100644
> --- a/gcc/rtl.c
> +++ b/gcc/rtl.c
> @@ -395,8 +395,10 @@ shallow_copy_rtx (const_rtx orig MEM_STAT_DECL)
>  case SCRATCH:
>break;
>  default:
> -  /* For all other RTXes clear the used flag on the copy.  */
> -  RTX_FLAG (copy, used) = 0;
> +  /* For all other RTXes clear the used flag on the copy.
> +CALL_INSN use "used" flag to indicate it's a fake call.  */
> +  if (!INSN_P (orig))
> +   RTX_FLAG (copy, used) = 0;
>break;
>  }
>return copy;
> diff --git a/gcc/rtl.h b/gcc/rtl.h
> index 35178b5bfac..5ed0d6dd6fa 100644
> --- a/gcc/rtl.h
> +++ b/gcc/rtl.h
> @@ -839,6 +839,11 @@ struct GTY(()) rtvec_def {
>  /* Predicate yielding nonzero iff X is a call insn.  */
>  #define CALL_P(X) (GET_CODE (X) == CALL_INSN)
>
> +/* 1 if RTX is a call_insn for a fake call.
> +   CALL_INSN use "used" flag to indicate it's a fake call.  */
> +#define FAKE_CALL_P(RTX)\
> +  (RTL_FLAG_CHECK1 ("FAKE_CALL_P", (RTX), CALL_INSN)->used)
> +
>  /* Predicate yielding nonzero iff X is an insn that cannot jump.  */
>  #define NONJUMP_INSN_P(X) (GET_CODE (X) == INSN)
>
> diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
> i

Re: [PATCH 2/2] Fix _mm256_zeroupper by representing the instructions as call_insns in which the call has a special vzeroupper ABI.

2021-06-03 Thread Hongtao Liu via Gcc-patches
Ping

This is a splitted backend patch as a follow up of
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571545.html

On Thu, Jun 3, 2021 at 2:55 PM liuhongt via Gcc-patches
 wrote:
>
> When __builtin_ia32_vzeroupper is called explicitly, the corresponding
> vzeroupper pattern does not carry any CLOBBERS or SETs before LRA,
> which leads to incorrect optimization in pass_reload. In order to
> solve this problem, this patch refine instructions as call_insns in
> which the call has a special vzeroupper ABI.
>
> gcc/ChangeLog:
>
> PR target/82735
> * config/i386/i386-expand.c (ix86_expand_builtin): Remove
> assignment of cfun->machine->has_explicit_vzeroupper.
> * config/i386/i386-features.c
> (ix86_add_reg_usage_to_vzerouppers): Delete.
> (ix86_add_reg_usage_to_vzeroupper): Ditto.
> (rest_of_handle_insert_vzeroupper): Remove
> ix86_add_reg_usage_to_vzerouppers, add df_analyze at the end
> of the function.
> (gate): Remove cfun->machine->has_explicit_vzeroupper.
> * config/i386/i386-protos.h (ix86_expand_avx_vzeroupper):
> Declared.
> * config/i386/i386.c (ix86_insn_callee_abi): New function.
> (ix86_initialize_callee_abi): Ditto.
> (ix86_expand_avx_vzeroupper): Ditto.
> (ix86_hard_regno_call_part_clobbered): Adjust for vzeroupper
> ABI.
> (TARGET_INSN_CALLEE_ABI): Define as ix86_insn_callee_abi.
> (ix86_emit_mode_set): Call ix86_expand_avx_vzeroupper
> directly.
> * config/i386/i386.h (struct GTY(()) machine_function): Delete
> has_explicit_vzeroupper.
> * config/i386/i386.md (enum unspec): New member
> UNSPEC_CALLEE_ABI.
> (I386_DEFAULT,I386_VZEROUPPER,I386_UNKNOWN): New
> define_constants for insn callee abi index.
> * config/i386/predicates.md (vzeroupper_pattern): Adjust.
> * config/i386/sse.md (UNSPECV_VZEROUPPER): Deleted.
> (avx_vzeroupper): Call ix86_expand_avx_vzeroupper.
> (*avx_vzeroupper): Rename to ..
> (avx_vzeroupper_callee_abi): .. this, and adjust pattern as
> call_insn which has a special vzeroupper ABI.
> (*avx_vzeroupper_1): Deleted.
>
> gcc/testsuite/ChangeLog:
>
> PR target/82735
> * gcc.target/i386/pr82735-1.c: New test.
> * gcc.target/i386/pr82735-2.c: New test.
> * gcc.target/i386/pr82735-3.c: New test.
> * gcc.target/i386/pr82735-4.c: New test.
> * gcc.target/i386/pr82735-5.c: New test.
> ---
>  gcc/config/i386/i386-expand.c |  4 -
>  gcc/config/i386/i386-features.c   | 99 +++
>  gcc/config/i386/i386-protos.h |  1 +
>  gcc/config/i386/i386.c| 55 -
>  gcc/config/i386/i386.h|  4 -
>  gcc/config/i386/i386.md   | 10 +++
>  gcc/config/i386/predicates.md |  5 +-
>  gcc/config/i386/sse.md| 59 --
>  gcc/testsuite/gcc.target/i386/pr82735-1.c | 29 +++
>  gcc/testsuite/gcc.target/i386/pr82735-2.c | 22 +
>  gcc/testsuite/gcc.target/i386/pr82735-3.c |  5 ++
>  gcc/testsuite/gcc.target/i386/pr82735-4.c | 48 +++
>  gcc/testsuite/gcc.target/i386/pr82735-5.c | 54 +
>  13 files changed, 252 insertions(+), 143 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr82735-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr82735-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr82735-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr82735-4.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr82735-5.c
>
> diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> index 9f3d41955a2..d25d59aa4e7 100644
> --- a/gcc/config/i386/i386-expand.c
> +++ b/gcc/config/i386/i386-expand.c
> @@ -13282,10 +13282,6 @@ rdseed_step:
>
>return 0;
>
> -case IX86_BUILTIN_VZEROUPPER:
> -  cfun->machine->has_explicit_vzeroupper = true;
> -  break;
> -
>  default:
>break;
>  }
> diff --git a/gcc/config/i386/i386-features.c b/gcc/config/i386/i386-features.c
> index 77783a154b6..a25769ae478 100644
> --- a/gcc/config/i386/i386-features.c
> +++ b/gcc/config/i386/i386-features.c
> @@ -1768,92 +1768,22 @@ convert_scalars_to_vector (bool timode_p)
>return 0;
>  }
>
> -/* Modify the vzeroupper pattern in INSN so that it describes the effect
> -   that the instruction has on the SSE registers.  LIVE_REGS are the set
> -   of registers that are live across the instruction.
> -
> -   For a live register R we use:
> -
> - (set (reg:V2DF R) (reg:V2DF R))
> -
> -   which preserves the low 128 bits but clobbers the upper bits.  */
> -
> -static void
> -ix86_add_reg_usage_to_vzeroupper (rtx_insn *insn, bitmap live_regs)
> -{
> -  rtx pattern = PATTERN (insn);
> -  unsigned int nregs = TARGET_64BIT ? 16 : 8;
> -  unsigned int npats = nregs;
> -  fo

Re: [RFC/PATCH 00/11] Fix up some unexpected empty split conditions

2021-06-03 Thread Kewen.Lin via Gcc-patches
Hi Segher,

on 2021/6/3 下午5:18, Segher Boessenkool wrote:
> On Thu, Jun 03, 2021 at 03:00:44AM -0500, Segher Boessenkool wrote:
>> On Thu, Jun 03, 2021 at 01:22:38PM +0800, Kewen.Lin wrote:
>> The whole point of requiring the split condition to start with && is so
>> it will become harder to mess things up (it will make the gen* code a
>> tiny little bit simpler as well).  And there is no transition period or
>> anything like that needed either.  Just the bunch that will break will
>> need fixing.  So let's find out how many of those there are :-)
>>

To find out those need fixing seems to be the critical part.  It's
not hard to add one explicit "&&" to those that don't have it now, but
even with further bootstrapped and regression tested I'm still not
confident the adjustments are safe enough, since the testing coverage
could be limited.  It may need more efforts to revisit, or/and test
with more coverages, and port maintainers' reviews.

In order to find one example which needs more fixing, for rs6000/i386/
aarch64, I fixed all define_insn_and_splits whose insn cond isn't empty
(from gensupport's view since the iterator can add more on) while split
cond don't start with "&&" , also skipped those whose insn conds are
the same as their split conds.  Unfortunately (or fortunately :-\) all
were bootstrapped and regress-tested.

The related diffs are attached, which is based on r12-0.

 How many such cases *are* there?  There are no users exposed to this,
 and when the split condition is required to start with "&&" (instead of
 getting that implied) it is not a silent change ever, either.
>>>
>>> If I read the proposal right, the explicit "&&" is only required when going
>>> to find all potential problematic places for final implied "&&" change.
>>> But one explicit "&&" does offer good readability.
>>
>> My proposal is to always require && (or maybe identical insn and split
>> conditions should be allowed as well, if people still use that -- that
>> is how we wrote "&& 1" before that existed).
> 
> I prototyped this.  There are very many errors.  Iterators often modify
> the insn condition (for one iteration of it), but that does not work if
> the split condition does not start with "&&"!
> 
> See attached prototype.
> 
> 

Thanks for the prototype!

BR,
Kewen

> Segher
> 
> = = =
> 
> diff --git a/gcc/gensupport.c b/gcc/gensupport.c
> index 2cb760ffb90f..05d46fd3775c 100644
> --- a/gcc/gensupport.c
> +++ b/gcc/gensupport.c
> @@ -590,7 +590,6 @@ process_rtx (rtx desc, file_location loc)
>  case DEFINE_INSN_AND_SPLIT:
>  case DEFINE_INSN_AND_REWRITE:
>{
> - const char *split_cond;
>   rtx split;
>   rtvec attr;
>   int i;
> @@ -611,15 +610,20 @@ process_rtx (rtx desc, file_location loc)
>  
>   /* If the split condition starts with "&&", append it to the
>  insn condition to create the new split condition.  */
> - split_cond = XSTR (desc, 4);
> - if (split_cond[0] == '&' && split_cond[1] == '&')
> + const char *insn_cond = XSTR (desc, 2);
> + const char *split_cond = XSTR (desc, 4);
> + if (!strncmp (split_cond, "&&", 2))
> {
>   rtx_reader_ptr->copy_md_ptr_loc (split_cond + 2, split_cond);
> - split_cond = rtx_reader_ptr->join_c_conditions (XSTR (desc, 2),
> + split_cond = rtx_reader_ptr->join_c_conditions (insn_cond,
>   split_cond + 2);
> +   } else if (insn_cond[0]) {
> + if (GET_CODE (desc) == DEFINE_INSN_AND_REWRITE)
> +   error_at (loc, "the rewrite condition must start with `&&'");
> + else
> +   error_at (loc, "the split condition must start with `&&' [%s]", 
> insn_cond);
> }
> - else if (GET_CODE (desc) == DEFINE_INSN_AND_REWRITE)
> -   error_at (loc, "the rewrite condition must start with `&&'");
> +
>   XSTR (split, 1) = split_cond;
>   if (GET_CODE (desc) == DEFINE_INSN_AND_REWRITE)
> XVEC (split, 2) = gen_rewrite_sequence (XVEC (desc, 1));
> 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index abfd845..86869d9 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1283,7 +1283,7 @@ (define_insn_and_split "*movsi_aarch64"
fmov\\t%w0, %s1
fmov\\t%s0, %s1
* return aarch64_output_scalar_simd_mov_immediate (operands[1], SImode);"
-  "CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL (operands[1]), 
SImode)
+  "&& CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL (operands[1]), 
SImode)
 && REG_P (operands[0]) && GP_REGNUM_P (REGNO (operands[0]))"
[(const_int 0)]
"{
@@ -1319,7 +1319,7 @@ (define_insn_and_split "*movdi_aarch64"
fmov\\t%x0, %d1
fmov\\t%d0, %d1
* return aarch64_output_scalar_simd_mov_immediate (operands[1], DImode);"
-   "(CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL (operands[1]), 
DImode))
+   "&& (CONST_INT_P (operands[1]) && !aarch64_move_imm (

Re: [PATCH 04/11] cris: Update unexpected empty split condition

2021-06-03 Thread Kewen.Lin via Gcc-patches
on 2021/6/4 上午12:12, Hans-Peter Nilsson wrote:
>> From: Kewen.Lin 
>> Date: Thu, 3 Jun 2021 07:45:57 +0200
> 
>> on 2021/6/2 Hans-Peter Nilsson wrote:
 From: Kewen Lin 
 Date: Wed, 2 Jun 2021 07:04:54 +0200
>>>
 gcc/ChangeLog:

* config/cris/cris.md (*addi_reload): Fix empty split condition.
> 
 -  ""
 +  "&& 1"
> 
>>> Ok, thanks, if only for all-round consistency.
>>>
>>> In preparation for a warning for an empty condition?  I'm
>>> usually all for .md-warnings, but I'm not sure about the
>>> benefit of that one, though.  Those "&& 1" look...hackish.
>>
>> Thanks!  Yeah, the 01/11 patch aims to raise one error message
>> for the define_insn_and_split whose split condition is empty
>> while insn condition isn't.  In most cases, when we write one
>> define_insn_and_split we want the splitting only to take effect
>> while we see the define_insn matching happen (insn cond holds),
>> but if we leave the split condition empty, the splitting will
>> be done always, it could result in some unexpected consequence.
>> Mostly this is unintentional.
> 
> It certainly was in the patch above!
> 
>>  The error message is to avoid
>> people to make it unintentionally.
>>
>> As you may have seen from the discussion under the 00/11 thread,
>> we will probably end up with some other solution, so I will hold
>> the changes for the ports, sorry for wasting your time and the
>> other port maintainers'.
> 
> No worries: I certainly don't consider it wasted and I'd
> prefer to have the patch above committed sooner than the
> conclusion of that discussion.  (If you don't get to it,
> I'll do it, after a round of testing.)
> 

Thanks for your help on testing!

> If you're considering further target patches to adjust for
> eventually changed semantics in the define_insn_and_split
> split-condition, then whatever trivial patch to cris.md that
> gets the effect of the one you sent is preapproved.
> 

OK, thanks again!

BR,
Kewen


Re: [RFC/PATCH 00/11] Fix up some unexpected empty split conditions

2021-06-03 Thread Kewen.Lin via Gcc-patches
on 2021/6/3 下午4:05, Richard Sandiford wrote:
> "Kewen.Lin"  writes:
>> Hi Richi/Richard/Jeff/Segher,
>>
>> Thanks for the comments!
>>
>> on 2021/6/3 锟斤拷锟斤拷7:52, Segher Boessenkool wrote:
>>> On Wed, Jun 02, 2021 at 06:32:13PM +0100, Richard Sandiford wrote:
 Richard Biener  writes:
> So what Richard suggests would be to disallow split conditions
> that do not start with "&& ", it's probably easy to do that as well
> and look for build fails.  That should catch all cases to look at.

 Yeah.  As a strawman proposal, how about:

 - add a new "define_independent_insn_and_split" that has the
   current semantics of define_insn_and_split.  This should be
   mechanical.
>>>
>>> I'd rather not have that -- we can just write separate define_insn and
>>> define_split in that case.
>>>
>>
>> Not sure if someone would argue that he/she would like to go with one shared
>> pattern as before, to avoid any possible differences between two seperated
>> patterns and have good maintainability (like only editing on place) and
>> slightly better efficiency.
> 
> Right.  Plus it creates less make-work.  If we didn't have it, someone
> would need to split the define_insn_and_splits that don't currently
> use "&&", then later someone might decide that the missing "&&" was a
> mistake and need to put them together again (or just revert the patch
> and edit from there, I guess).
> 
> Plus define_independent_insn_and_split would act as a flag for something
> that might be suspect.  If we split them then the define_split condition
> will seem to have been chosen deliberately in isolation.
> 
>>> How many such cases *are* there?  There are no users exposed to this,
>>> and when the split condition is required to start with "&&" (instead of
>>> getting that implied) it is not a silent change ever, either.
>>>
>>
>> If I read the proposal right, the explicit "&&" is only required when going
>> to find all potential problematic places for final implied "&&" change.
>> But one explicit "&&" does offer good readability.
> 
> I don't know.  "&& 1" looks kind of weird to me.
> 
> One thing I'd been wondering about a while ago was whether we should key
> the split part of define_insn_and_splits off the insn code, instead of
> repeating the pattern match and insn C condition.  That would make the
> split apply only to the associated define_insns, whereas at the moment
> they also apply to any earlier (less general) define_insn that happens
> to match the pattern and the C conditions.  It would also reduce the
> complexity of the autogenerated define_split logic.
> 
> I don't know whether that's a good idea or not.  But having an explicit
> "&&" implies that the generator shouldn't do that, and that it should
> retest the insn condition from scratch.
> 
 - find the define_insn_and_splits that are missing the "&&", and where
   missing the "&&" might make a difference.  Change them to
   define_independent_insn_and_splits.

   Like Richard says, this can be done by temporarily disallowing
   define_insn_and_splits that have no "&&".
>>>
>>> If we make that change permanently, that is all steps we ever need!
>>>
>>
>> So the question is that: whether we need to demand an explicit "&&".
>> Richard's proposal is for answer "no" which aligns with Richi's auto
>> filling advice before.  I think it would result in fewer changes since
>> those places without explicit "&&" are mostly unintentional, all the jobs
>> are done by implied "&&".  Its downside seems to be bad readability, new
>> readers may take it as two seperated conditions at first glance, but I
>> guess if we emphasize this change in the document it would be fine?
>> Or emitting one warning if missing an explicit "&&"?
> 
> IMO the natural way to read it is that the split C condition gives the
> conditions under which the instruction should be split.  I think that's
> why forgetting the "&&" is such a common mistake.  (I know I've done it
> plenty of times.)
> 
> IMO requiring the "&&" is baking in an alternative, less intuitive,
> interpretation.
> 

Thanks for the explanation, I was thinking people may have got used to
the starting "&&" in split condition, so it's easy for them to read.
But I agree it's better not to have it in the natural way.

BR,
Kewen


[PATCH] c++: tsubst_function_decl and excess arg levels [PR100102]

2021-06-03 Thread Patrick Palka via Gcc-patches
Here, when instantiating the dependent alias template
duration::__is_harmonic with args={{T,U},{int}}, we find ourselves
substituting the function decl _S_gcd.  Since we have more arg levels
than _S_gcd has parm levels, an old special case in tsubst_function_decl
causes us to unwantedly reduce args to its innermost level, yielding
args={int}, which leads to a nonsensical substitution into the decl's
context and an eventual crash.

The comment for this special case refers to three examples for which we
ought to see more arg levels than parm levels here, but none of the
examples actually demonstrate this.  In the first example, when
defining S::f(U) parms_depth is 2 and args_depth is 1, and
later when instantiating say S::f both depths are 2.  In the
second example, when substituting the template friend declaration
parms_depth is 2 and args_depth is 1, and later when instantiating f
both depths are 1.  Finally, the third example is invalid since we can't
specialize a member template of an unspecialized class template like
that.

Given that this reduction code seems no longer relevant for its
documented purpose and that it causes problems as in the PR, this patch
just removes it.  Note that as far as bootstrap/regtest is concerned,
this code is dead; the below two tests would be the first to trigger the
removed code.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk and perhaps backports?  Also tested on various other libraries,
e.g. range-v3 and cmcstl2.

PR c++/100102

gcc/cp/ChangeLog:

* pt.c (tsubst_function_decl): Remove old code for reducing
args when it has excess levels.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/alias-decl-72.C: New test.
* g++.dg/cpp0x/alias-decl-72a.C: New test.
---
 gcc/cp/pt.c | 39 -
 gcc/testsuite/g++.dg/cpp0x/alias-decl-72.C  |  9 +
 gcc/testsuite/g++.dg/cpp0x/alias-decl-72a.C |  9 +
 3 files changed, 18 insertions(+), 39 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/alias-decl-72.C
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/alias-decl-72a.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 3cac073ed50..a6acdf864d1 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -13909,45 +13909,6 @@ tsubst_function_decl (tree t, tree args, 
tsubst_flags_t complain,
  if (tree spec = retrieve_specialization (gen_tmpl, argvec, hash))
return spec;
}
-
-  /* We can see more levels of arguments than parameters if
-there was a specialization of a member template, like
-this:
-
-template  struct S { template  void f(); }
-template <> template  void S::f(U);
-
-Here, we'll be substituting into the specialization,
-because that's where we can find the code we actually
-want to generate, but we'll have enough arguments for
-the most general template.
-
-We also deal with the peculiar case:
-
-template  struct S {
-  template  friend void f();
-};
-template  void f() {}
-template S;
-template void f();
-
-Here, the ARGS for the instantiation of will be {int,
-double}.  But, we only need as many ARGS as there are
-levels of template parameters in CODE_PATTERN.  We are
-careful not to get fooled into reducing the ARGS in
-situations like:
-
-template  struct S { template  void f(U); }
-template  template <> void S::f(int) {}
-
-which we can spot because the pattern will be a
-specialization in this case.  */
-  int args_depth = TMPL_ARGS_DEPTH (args);
-  int parms_depth =
-   TMPL_PARMS_DEPTH (DECL_TEMPLATE_PARMS (DECL_TI_TEMPLATE (t)));
-
-  if (args_depth > parms_depth && !DECL_TEMPLATE_SPECIALIZATION (t))
-   args = get_innermost_template_args (args, parms_depth);
 }
   else
 {
diff --git a/gcc/testsuite/g++.dg/cpp0x/alias-decl-72.C 
b/gcc/testsuite/g++.dg/cpp0x/alias-decl-72.C
new file mode 100644
index 000..8009756dcba
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/alias-decl-72.C
@@ -0,0 +1,9 @@
+// PR c++/100102
+// { dg-do compile { target c++11 } }
+
+template struct ratio;
+template struct duration {
+  static constexpr int _S_gcd();
+  template using __is_harmonic = ratio<_S_gcd>;
+  using type = __is_harmonic;
+};
diff --git a/gcc/testsuite/g++.dg/cpp0x/alias-decl-72a.C 
b/gcc/testsuite/g++.dg/cpp0x/alias-decl-72a.C
new file mode 100644
index 000..a4443e18f9d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/alias-decl-72a.C
@@ -0,0 +1,9 @@
+// PR c++/100102
+// { dg-do compile { target c++11 } }
+
+template struct ratio;
+template struct duration {
+  static constexpr int _S_gcd();
+  template using __is_harmonic = ratio<(duration::_S_gcd)()>;
+  using type = __is_harmonic;
+};
-- 
2.32.0.rc2



Re: [PATCH] Simplify (view_convert ~a) < 0 to (view_convert a) >= 0 [PR middle-end/100738]

2021-06-03 Thread Hongtao Liu via Gcc-patches
On Tue, Jun 1, 2021 at 6:17 PM Marc Glisse  wrote:
>
> On Tue, 1 Jun 2021, Hongtao Liu via Gcc-patches wrote:
>
> > Hi:
> >  This patch is about to simplify (view_convert:type ~a) < 0 to
> > (view_convert:type a) >= 0 when type is signed integer. Similar for
> > (view_convert:type ~a) >= 0.
> >  Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> >  Ok for the trunk?
> >
> > gcc/ChangeLog:
> >
> >PR middle-end/100738
> >* match.pd ((view_convert ~a) < 0 --> (view_convert a) >= 0,
> >(view_convert ~a) >= 0 --> (view_convert a) < 0): New GIMPLE
> >simplification.
>
> We already have
>
> /* Fold ~X op C as X op' ~C, where op' is the swapped comparison.  */
> (for cmp (simple_comparison)
>   scmp (swapped_simple_comparison)
>   (simplify
>(cmp (bit_not@2 @0) CONSTANT_CLASS_P@1)
>(if (single_use (@2)
> && (TREE_CODE (@1) == INTEGER_CST || TREE_CODE (@1) == VECTOR_CST))
> (scmp @0 (bit_not @1)
>
> Would it make sense to try and generalize it a bit, say with
>
> (cmp (nop_convert1? (bit_not @0)) CONSTANT_CLASS_P)
>
> (scmp (view_convert:XXX @0) (bit_not @1))
>
Thanks for your advice, it looks great.
And can I use *view_convert1?* instead of *nop_convert1?* here,
because the original case is view_convert, and nop_convert would fail
to simplify the case.
> (I still believe that it is a bad idea that SSA_NAMEs are strongly typed,
> encoding the type in operations would be more convenient, but I think the
> time for that choice has long gone)
>
> --
> Marc Glisse



-- 
BR,
Hongtao


Re: [PATCH 1/2] [i386] Fold blendv builtins into gimple.

2021-06-03 Thread Hongtao Liu via Gcc-patches
ping

On Mon, May 24, 2021 at 12:56 PM Hongtao Liu  wrote:
>
> Hi:
>   This patch is about to Fold __builtin_ia32_pblendvb128 (a, b, c) as
>  VEC_COND_EXPR (c < 0, b, a), similar for float version but with
>  mask operand VIEW_CONVERT_EXPR to same sized integer vectype.
>
> After folding, blendv related patterns can be redefined as
> vec_merge since all elements of mask operand is either const0_rtx or
> constm1_rtx now. It could potentially enable more rtl optimizations.
>
> Besides, although there's no pblendv{d,q} instructions, backend can
> still define their patterns and generate blendv{ps,pd} instead.
>
>   Bootstrap and regtested on x86_64-linux-gnu{-m32,}.
>   Ok for trunk?
>
> gcc/ChangeLog:
>
> * config/i386/i386-builtin.def (IX86_BUILTIN_BLENDVPD256,
> IX86_BUILTIN_BLENDVPS256, IX86_BUILTIN_PBLENDVB256,
> IX86_BUILTIN_BLENDVPD, IX86_BUILTIN_BLENDVPS,
> IX86_BUILTIN_PBLENDVB128): Replace icode with
> CODE_FOR_nothing.
> * config/i386/i386-expand.c (ix86_expand_sse_movcc): Use
> 
> gen_avx_blendvd256/gen_avx_blendvq256/gen_sse4_1_blendvd/gen_sse4_1_blendvq
> for V8SI/V4DI/V4SI/V2DImode.
> * config/i386/i386.c (ix86_gimple_fold_builtin): Fold blendv
> builtins.
> * config/i386/mmx.md (mmx_blendvps): Change to define_expand.
> (*mmx_blendvps): New pattern implemented as vec_merge.
> * config/i386/sse.md
> (_blendv): Change to
> define_expand.
> (_pblendvb): Ditto.
> (*_blendv): New pattern
> implemented as vec_merge.
> (*_pblendvb): Ditto.
> (*_pblendvb_lt): Redefined as define_insn with
> pattern implemented as vec_merge instead of UNSPEC_BLENDV.
> (*_blendv_lt): Ditto,
> and extend mode to V48_AVX.
> (*_pblendvb_not_lt): New.
> (*_blendv_ltint): Deleted.
> (*_pblendvb_lt): Ditto.
> (*_pblendvb_not_lt): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/funcspec-8.c: Replace
> __builtin_ia32_blendvpd with  __builtin_ia32_roundps_az.
> * gcc.target/i386/blendv-1.c: New test.
> * gcc.target/i386/blendv-2.c: New test.
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao


Re: [PATCH] x86: Convert CONST_WIDE_INT to broadcast in move expanders

2021-06-03 Thread Uros Bizjak via Gcc-patches
On Fri, Jun 4, 2021 at 12:39 AM H.J. Lu  wrote:
>
> On Thu, Jun 3, 2021 at 12:39 AM Uros Bizjak  wrote:
> >
> > On Thu, Jun 3, 2021 at 5:49 AM H.J. Lu  wrote:
> > >
> > > Update move expanders to convert the CONST_WIDE_INT operand to vector
> > > broadcast from a byte with AVX2.  Add ix86_gen_scratch_sse_rtx to
> > > return a scratch SSE register which won't increase stack alignment
> > > requirement and blocks transformation by the combine pass.
> >
> > Using fixed scratch reg is just too hackish for my taste. The
>
> It was recommended to use hard register for things like this:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569945.html

I was worried that the temporary (even hard reg) will be spilled under
some rare cases. If this is not the case, and even recommended for
your use case, then it should be OK. Maybe use a hard register instead
of match_scratch in the clobber of the proposed insn pattern below.

> > expansion is OK to emit some optimized sequence, but the approach to
> > use fixed reg to bypass stack alignment functionality and combine is
> > not.
> >
> > Perhaps a new insn pattern should be introduced, e.g.
> >
> > (define_insn_and_split ""
> >[(set (match_opreand:V 0 "memory_operand" "=m,m")
> > (vec_duplicate:V
> >   (match_operand:S 2 "reg_or_0_operand" "r,C"))
> > (clobber (match_scratch:V 1 "=x"))]
> >
> > and split it at some appropriate point.

Please note that zero (C) can be assigned directly to the XMM
register, so there is no need for the intermediate integer reg,
assuming that zero was propagated into the insn (-1 can be handled
this way, too). Due to these propagations, it looks that the correct
point to split the insn is after the reload.

On a related note, you are using only vpbroadcastb, assuming byte
(QImode) granularity. Is it possible to also use HImode and larger
modes to handle e.g. initializations of int arrays?

Uros.

> I will give it a try.
>
> Thanks.
>
> > Uros.
> >
> > >
> > > A small benchmark:
> > >
> > > https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/memset/broadcast
> > >
> > > shows that broadcast is a little bit faster on Intel Core i7-8559U:
> > >
> > > $ make
> > > gcc -g -I. -O2   -c -o test.o test.c
> > > gcc -g   -c -o memory.o memory.S
> > > gcc -g   -c -o broadcast.o broadcast.S
> > > gcc -o test test.o memory.o broadcast.o
> > > ./test
> > > memory   : 99333
> > > broadcast: 97208
> > > $
> > >
> > > broadcast is also smaller:
> > >
> > > $ size memory.o broadcast.o
> > >textdata bss dec hex filename
> > > 132   0   0 132  84 memory.o
> > > 122   0   0 122  7a broadcast.o
> > > $
> > >
> > > gcc/
> > >
> > > PR target/100865
> > > * config/i386/i386-expand.c (ix86_expand_vector_init_duplicate):
> > > New prototype.
> > > (ix86_byte_broadcast): New function.
> > > (ix86_convert_const_wide_int_to_broadcast): Likewise.
> > > (ix86_expand_move): Try ix86_convert_const_wide_int_to_broadcast
> > > if mode size is 16 bytes or bigger.
> > > (ix86_expand_vector_move): Try
> > > ix86_convert_const_wide_int_to_broadcast.
> > > * config/i386/i386-protos.h (ix86_gen_scratch_sse_rtx): New
> > > prototype.
> > > * config/i386/i386.c (ix86_minimum_incoming_stack_boundary): Add
> > > an argument to ignore stack_alignment_estimated.  It is passed
> > > as false by default.
> > > (ix86_gen_scratch_sse_rtx): New function.
> > >
> > > gcc/testsuite/
> > >
> > > PR target/100865
> > > * gcc.target/i386/pr100865-1.c: New test.
> > > * gcc.target/i386/pr100865-2.c: Likewise.
> > > * gcc.target/i386/pr100865-3.c: Likewise.
> > > * gcc.target/i386/pr100865-4.c: Likewise.
> > > * gcc.target/i386/pr100865-5.c: Likewise.
> > > ---
> > >  gcc/config/i386/i386-expand.c  | 103 ++---
> > >  gcc/config/i386/i386-protos.h  |   2 +
> > >  gcc/config/i386/i386.c |  50 +-
> > >  gcc/testsuite/gcc.target/i386/pr100865-1.c |  13 +++
> > >  gcc/testsuite/gcc.target/i386/pr100865-2.c |  14 +++
> > >  gcc/testsuite/gcc.target/i386/pr100865-3.c |  15 +++
> > >  gcc/testsuite/gcc.target/i386/pr100865-4.c |  16 
> > >  gcc/testsuite/gcc.target/i386/pr100865-5.c |  17 
> > >  8 files changed, 215 insertions(+), 15 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-1.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-2.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-3.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-4.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-5.c
> > >
> > > diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> > > index 4185f58eed5..658adafa269 100644
> > > --- a/gcc/config/i386/i386-expand.c
> > > +++ b/gcc/config/i386/

Re: [PATCH 2/2] Fix _mm256_zeroupper by representing the instructions as call_insns in which the call has a special vzeroupper ABI.

2021-06-03 Thread Uros Bizjak via Gcc-patches
On Thu, Jun 3, 2021 at 8:54 AM liuhongt  wrote:
>
> When __builtin_ia32_vzeroupper is called explicitly, the corresponding
> vzeroupper pattern does not carry any CLOBBERS or SETs before LRA,
> which leads to incorrect optimization in pass_reload. In order to
> solve this problem, this patch refine instructions as call_insns in
> which the call has a special vzeroupper ABI.
>
> gcc/ChangeLog:
>
> PR target/82735
> * config/i386/i386-expand.c (ix86_expand_builtin): Remove
> assignment of cfun->machine->has_explicit_vzeroupper.
> * config/i386/i386-features.c
> (ix86_add_reg_usage_to_vzerouppers): Delete.
> (ix86_add_reg_usage_to_vzeroupper): Ditto.
> (rest_of_handle_insert_vzeroupper): Remove
> ix86_add_reg_usage_to_vzerouppers, add df_analyze at the end
> of the function.
> (gate): Remove cfun->machine->has_explicit_vzeroupper.
> * config/i386/i386-protos.h (ix86_expand_avx_vzeroupper):
> Declared.
> * config/i386/i386.c (ix86_insn_callee_abi): New function.
> (ix86_initialize_callee_abi): Ditto.
> (ix86_expand_avx_vzeroupper): Ditto.
> (ix86_hard_regno_call_part_clobbered): Adjust for vzeroupper
> ABI.
> (TARGET_INSN_CALLEE_ABI): Define as ix86_insn_callee_abi.
> (ix86_emit_mode_set): Call ix86_expand_avx_vzeroupper
> directly.
> * config/i386/i386.h (struct GTY(()) machine_function): Delete
> has_explicit_vzeroupper.
> * config/i386/i386.md (enum unspec): New member
> UNSPEC_CALLEE_ABI.
> (I386_DEFAULT,I386_VZEROUPPER,I386_UNKNOWN): New
> define_constants for insn callee abi index.
> * config/i386/predicates.md (vzeroupper_pattern): Adjust.
> * config/i386/sse.md (UNSPECV_VZEROUPPER): Deleted.
> (avx_vzeroupper): Call ix86_expand_avx_vzeroupper.
> (*avx_vzeroupper): Rename to ..
> (avx_vzeroupper_callee_abi): .. this, and adjust pattern as
> call_insn which has a special vzeroupper ABI.
> (*avx_vzeroupper_1): Deleted.
>
> gcc/testsuite/ChangeLog:
>
> PR target/82735
> * gcc.target/i386/pr82735-1.c: New test.
> * gcc.target/i386/pr82735-2.c: New test.
> * gcc.target/i386/pr82735-3.c: New test.
> * gcc.target/i386/pr82735-4.c: New test.
> * gcc.target/i386/pr82735-5.c: New test.

LGTM, with a small nit below.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386-expand.c |  4 -
>  gcc/config/i386/i386-features.c   | 99 +++
>  gcc/config/i386/i386-protos.h |  1 +
>  gcc/config/i386/i386.c| 55 -
>  gcc/config/i386/i386.h|  4 -
>  gcc/config/i386/i386.md   | 10 +++
>  gcc/config/i386/predicates.md |  5 +-
>  gcc/config/i386/sse.md| 59 --
>  gcc/testsuite/gcc.target/i386/pr82735-1.c | 29 +++
>  gcc/testsuite/gcc.target/i386/pr82735-2.c | 22 +
>  gcc/testsuite/gcc.target/i386/pr82735-3.c |  5 ++
>  gcc/testsuite/gcc.target/i386/pr82735-4.c | 48 +++
>  gcc/testsuite/gcc.target/i386/pr82735-5.c | 54 +
>  13 files changed, 252 insertions(+), 143 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr82735-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr82735-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr82735-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr82735-4.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr82735-5.c
>
> diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> index 9f3d41955a2..d25d59aa4e7 100644
> --- a/gcc/config/i386/i386-expand.c
> +++ b/gcc/config/i386/i386-expand.c
> @@ -13282,10 +13282,6 @@ rdseed_step:
>
>return 0;
>
> -case IX86_BUILTIN_VZEROUPPER:
> -  cfun->machine->has_explicit_vzeroupper = true;
> -  break;
> -
>  default:
>break;
>  }
> diff --git a/gcc/config/i386/i386-features.c b/gcc/config/i386/i386-features.c
> index 77783a154b6..a25769ae478 100644
> --- a/gcc/config/i386/i386-features.c
> +++ b/gcc/config/i386/i386-features.c
> @@ -1768,92 +1768,22 @@ convert_scalars_to_vector (bool timode_p)
>return 0;
>  }
>
> -/* Modify the vzeroupper pattern in INSN so that it describes the effect
> -   that the instruction has on the SSE registers.  LIVE_REGS are the set
> -   of registers that are live across the instruction.
> -
> -   For a live register R we use:
> -
> - (set (reg:V2DF R) (reg:V2DF R))
> -
> -   which preserves the low 128 bits but clobbers the upper bits.  */
> -
> -static void
> -ix86_add_reg_usage_to_vzeroupper (rtx_insn *insn, bitmap live_regs)
> -{
> -  rtx pattern = PATTERN (insn);
> -  unsigned int nregs = TARGET_64BIT ? 16 : 8;
> -  unsigned int npats = nregs;
> -  for (unsigned int i = 0; i < nregs; ++i)
> -{
> -  unsigned int regno = GET_SSE_REGNO

Re: [PATCH 2/2] Fix _mm256_zeroupper by representing the instructions as call_insns in which the call has a special vzeroupper ABI.

2021-06-03 Thread Hongtao Liu via Gcc-patches
On Fri, Jun 4, 2021 at 2:27 PM Uros Bizjak via Gcc-patches
 wrote:
>
> On Thu, Jun 3, 2021 at 8:54 AM liuhongt  wrote:
> >
> > When __builtin_ia32_vzeroupper is called explicitly, the corresponding
> > vzeroupper pattern does not carry any CLOBBERS or SETs before LRA,
> > which leads to incorrect optimization in pass_reload. In order to
> > solve this problem, this patch refine instructions as call_insns in
> > which the call has a special vzeroupper ABI.
> >
> > gcc/ChangeLog:
> >
> > PR target/82735
> > * config/i386/i386-expand.c (ix86_expand_builtin): Remove
> > assignment of cfun->machine->has_explicit_vzeroupper.
> > * config/i386/i386-features.c
> > (ix86_add_reg_usage_to_vzerouppers): Delete.
> > (ix86_add_reg_usage_to_vzeroupper): Ditto.
> > (rest_of_handle_insert_vzeroupper): Remove
> > ix86_add_reg_usage_to_vzerouppers, add df_analyze at the end
> > of the function.
> > (gate): Remove cfun->machine->has_explicit_vzeroupper.
> > * config/i386/i386-protos.h (ix86_expand_avx_vzeroupper):
> > Declared.
> > * config/i386/i386.c (ix86_insn_callee_abi): New function.
> > (ix86_initialize_callee_abi): Ditto.
> > (ix86_expand_avx_vzeroupper): Ditto.
> > (ix86_hard_regno_call_part_clobbered): Adjust for vzeroupper
> > ABI.
> > (TARGET_INSN_CALLEE_ABI): Define as ix86_insn_callee_abi.
> > (ix86_emit_mode_set): Call ix86_expand_avx_vzeroupper
> > directly.
> > * config/i386/i386.h (struct GTY(()) machine_function): Delete
> > has_explicit_vzeroupper.
> > * config/i386/i386.md (enum unspec): New member
> > UNSPEC_CALLEE_ABI.
> > (I386_DEFAULT,I386_VZEROUPPER,I386_UNKNOWN): New
> > define_constants for insn callee abi index.
> > * config/i386/predicates.md (vzeroupper_pattern): Adjust.
> > * config/i386/sse.md (UNSPECV_VZEROUPPER): Deleted.
> > (avx_vzeroupper): Call ix86_expand_avx_vzeroupper.
> > (*avx_vzeroupper): Rename to ..
> > (avx_vzeroupper_callee_abi): .. this, and adjust pattern as
> > call_insn which has a special vzeroupper ABI.
> > (*avx_vzeroupper_1): Deleted.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/82735
> > * gcc.target/i386/pr82735-1.c: New test.
> > * gcc.target/i386/pr82735-2.c: New test.
> > * gcc.target/i386/pr82735-3.c: New test.
> > * gcc.target/i386/pr82735-4.c: New test.
> > * gcc.target/i386/pr82735-5.c: New test.
>
> LGTM, with a small nit below.
>
> Thanks,
> Uros.
>
> > ---
> >  gcc/config/i386/i386-expand.c |  4 -
> >  gcc/config/i386/i386-features.c   | 99 +++
> >  gcc/config/i386/i386-protos.h |  1 +
> >  gcc/config/i386/i386.c| 55 -
> >  gcc/config/i386/i386.h|  4 -
> >  gcc/config/i386/i386.md   | 10 +++
> >  gcc/config/i386/predicates.md |  5 +-
> >  gcc/config/i386/sse.md| 59 --
> >  gcc/testsuite/gcc.target/i386/pr82735-1.c | 29 +++
> >  gcc/testsuite/gcc.target/i386/pr82735-2.c | 22 +
> >  gcc/testsuite/gcc.target/i386/pr82735-3.c |  5 ++
> >  gcc/testsuite/gcc.target/i386/pr82735-4.c | 48 +++
> >  gcc/testsuite/gcc.target/i386/pr82735-5.c | 54 +
> >  13 files changed, 252 insertions(+), 143 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82735-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82735-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82735-3.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82735-4.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82735-5.c
> >
> > diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> > index 9f3d41955a2..d25d59aa4e7 100644
> > --- a/gcc/config/i386/i386-expand.c
> > +++ b/gcc/config/i386/i386-expand.c
> > @@ -13282,10 +13282,6 @@ rdseed_step:
> >
> >return 0;
> >
> > -case IX86_BUILTIN_VZEROUPPER:
> > -  cfun->machine->has_explicit_vzeroupper = true;
> > -  break;
> > -
> >  default:
> >break;
> >  }
> > diff --git a/gcc/config/i386/i386-features.c 
> > b/gcc/config/i386/i386-features.c
> > index 77783a154b6..a25769ae478 100644
> > --- a/gcc/config/i386/i386-features.c
> > +++ b/gcc/config/i386/i386-features.c
> > @@ -1768,92 +1768,22 @@ convert_scalars_to_vector (bool timode_p)
> >return 0;
> >  }
> >
> > -/* Modify the vzeroupper pattern in INSN so that it describes the effect
> > -   that the instruction has on the SSE registers.  LIVE_REGS are the set
> > -   of registers that are live across the instruction.
> > -
> > -   For a live register R we use:
> > -
> > - (set (reg:V2DF R) (reg:V2DF R))
> > -
> > -   which preserves the low 128 bits but clobbers the upper bits.  */
> > -
> > -static

  1   2   >