Re: [PATCH] x86: Convert CONST_WIDE_INT to broadcast in move expanders
On Thu, Jun 3, 2021 at 5:49 AM H.J. Lu wrote: > > Update move expanders to convert the CONST_WIDE_INT operand to vector > broadcast from a byte with AVX2. Add ix86_gen_scratch_sse_rtx to > return a scratch SSE register which won't increase stack alignment > requirement and blocks transformation by the combine pass. Using fixed scratch reg is just too hackish for my taste. The expansion is OK to emit some optimized sequence, but the approach to use fixed reg to bypass stack alignment functionality and combine is not. Perhaps a new insn pattern should be introduced, e.g. (define_insn_and_split "" [(set (match_opreand:V 0 "memory_operand" "=m,m") (vec_duplicate:V (match_operand:S 2 "reg_or_0_operand" "r,C")) (clobber (match_scratch:V 1 "=x"))] and split it at some appropriate point. Uros. > > A small benchmark: > > https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/memset/broadcast > > shows that broadcast is a little bit faster on Intel Core i7-8559U: > > $ make > gcc -g -I. -O2 -c -o test.o test.c > gcc -g -c -o memory.o memory.S > gcc -g -c -o broadcast.o broadcast.S > gcc -o test test.o memory.o broadcast.o > ./test > memory : 99333 > broadcast: 97208 > $ > > broadcast is also smaller: > > $ size memory.o broadcast.o >textdata bss dec hex filename > 132 0 0 132 84 memory.o > 122 0 0 122 7a broadcast.o > $ > > gcc/ > > PR target/100865 > * config/i386/i386-expand.c (ix86_expand_vector_init_duplicate): > New prototype. > (ix86_byte_broadcast): New function. > (ix86_convert_const_wide_int_to_broadcast): Likewise. > (ix86_expand_move): Try ix86_convert_const_wide_int_to_broadcast > if mode size is 16 bytes or bigger. > (ix86_expand_vector_move): Try > ix86_convert_const_wide_int_to_broadcast. > * config/i386/i386-protos.h (ix86_gen_scratch_sse_rtx): New > prototype. > * config/i386/i386.c (ix86_minimum_incoming_stack_boundary): Add > an argument to ignore stack_alignment_estimated. It is passed > as false by default. > (ix86_gen_scratch_sse_rtx): New function. > > gcc/testsuite/ > > PR target/100865 > * gcc.target/i386/pr100865-1.c: New test. > * gcc.target/i386/pr100865-2.c: Likewise. > * gcc.target/i386/pr100865-3.c: Likewise. > * gcc.target/i386/pr100865-4.c: Likewise. > * gcc.target/i386/pr100865-5.c: Likewise. > --- > gcc/config/i386/i386-expand.c | 103 ++--- > gcc/config/i386/i386-protos.h | 2 + > gcc/config/i386/i386.c | 50 +- > gcc/testsuite/gcc.target/i386/pr100865-1.c | 13 +++ > gcc/testsuite/gcc.target/i386/pr100865-2.c | 14 +++ > gcc/testsuite/gcc.target/i386/pr100865-3.c | 15 +++ > gcc/testsuite/gcc.target/i386/pr100865-4.c | 16 > gcc/testsuite/gcc.target/i386/pr100865-5.c | 17 > 8 files changed, 215 insertions(+), 15 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-1.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-2.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-3.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-4.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-5.c > > diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c > index 4185f58eed5..658adafa269 100644 > --- a/gcc/config/i386/i386-expand.c > +++ b/gcc/config/i386/i386-expand.c > @@ -93,6 +93,9 @@ along with GCC; see the file COPYING3. If not see > #include "i386-builtins.h" > #include "i386-expand.h" > > +static bool ix86_expand_vector_init_duplicate (bool, machine_mode, rtx, > + rtx); > + > /* Split one or more double-mode RTL references into pairs of half-mode > references. The RTL can be REG, offsettable MEM, integer constant, or > CONST_DOUBLE. "operands" is a pointer to an array of double-mode RTLs to > @@ -190,6 +193,65 @@ ix86_expand_clear (rtx dest) >emit_insn (tmp); > } > > +/* Return a byte value which V can be broadcasted from. Otherwise, > + return INT_MAX. */ > + > +static int > +ix86_byte_broadcast (HOST_WIDE_INT v) > +{ > + wide_int val = wi::uhwi (v, HOST_BITS_PER_WIDE_INT); > + int byte_broadcast = wi::extract_uhwi (val, 0, BITS_PER_UNIT); > + for (unsigned int i = BITS_PER_UNIT; > + i < HOST_BITS_PER_WIDE_INT; > + i += BITS_PER_UNIT) > +{ > + int byte = wi::extract_uhwi (val, i, BITS_PER_UNIT); > + if (byte_broadcast != byte) > + return INT_MAX; > +} > + return byte_broadcast; > +} > + > +/* Convert the CONST_WIDE_INT operand OP to broadcast in MODE. */ > + > +static rtx > +ix86_convert_const_wide_int_to_broadcast (machine_mode mode, rtx op) > +{ > + rtx target = nullptr; > + > + /* Convert CONST_WIDE_INT to broadcast only if vector broadc
Re: [PATCH] ARC: gcc driver default to hs38_linux
Right, we can consider this cpu switch missing in gcc11. Best, Claudiu From: Vineet Gupta Sent: Wednesday, June 2, 2021 8:36 PM To: Claudiu Zissulescu ; gcc-patches@gcc.gnu.org Cc: linux-snps-...@lists.infradead.org Subject: Re: [PATCH] ARC: gcc driver default to hs38_linux On 6/2/21 1:38 AM, Claudiu Zissulescu wrote: > Approved. Thx for the super quick action on this Claudiu. Can this be slated for backports too as it causes issues when building toolchains for modern cores without explicit defaults. -Vineet > > //Claudiu > > *From:* Vineet Gupta > *Sent:* Tuesday, June 1, 2021 10:42 PM > *To:* gcc-patches@gcc.gnu.org > *Cc:* Claudiu Zissulescu ; > linux-snps-...@lists.infradead.org > ; Vineet Gupta > *Subject:* [PATCH] ARC: gcc driver default to hs38_linux > arc700 is legacy and there's no active development for it, so switch to > latest hs38_linux as default > > Signed-off-by: Vineet Gupta > --- > gcc/config/arc/arc.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h > index bd1fe0abd7af..252241a858c9 100644 > --- a/gcc/config/arc/arc.h > +++ b/gcc/config/arc/arc.h > @@ -34,7 +34,7 @@ along with GCC; see the file COPYING3. If not see > #define SYMBOL_FLAG_CMEM(SYMBOL_FLAG_MACH_DEP << 3) > > #ifndef TARGET_CPU_DEFAULT > -#define TARGET_CPU_DEFAULT PROCESSOR_arc700 > +#define TARGET_CPU_DEFAULT PROCESSOR_hs38_linux > #endif > > /* Check if this symbol has a long_call attribute in its declaration */ > -- > 2.25.1 >
Re: [RFC/PATCH 00/11] Fix up some unexpected empty split conditions
Hi! On Thu, Jun 03, 2021 at 01:22:38PM +0800, Kewen.Lin wrote: > on 2021/6/3 上午7:52, Segher Boessenkool wrote: > >> - add a new "define_independent_insn_and_split" that has the > >> current semantics of define_insn_and_split. This should be > >> mechanical. > > > > I'd rather not have that -- we can just write separate define_insn and > > define_split in that case. > > Not sure if someone would argue that he/she would like to go with one shared > pattern as before, to avoid any possible differences between two seperated > patterns and have good maintainability (like only editing on place) and > slightly better efficiency. You only would do this if you have a different insn condition and split condition, which is a very important thing to know, it doesn't hurt to draw attention to it. The efficiency is exactly the same btw, define_insn_and_split is just syntactic sugar. The whole point of requiring the split condition to start with && is so it will become harder to mess things up (it will make the gen* code a tiny little bit simpler as well). And there is no transition period or anything like that needed either. Just the bunch that will break will need fixing. So let's find out how many of those there are :-) > > How many such cases *are* there? There are no users exposed to this, > > and when the split condition is required to start with "&&" (instead of > > getting that implied) it is not a silent change ever, either. > > If I read the proposal right, the explicit "&&" is only required when going > to find all potential problematic places for final implied "&&" change. > But one explicit "&&" does offer good readability. My proposal is to always require && (or maybe identical insn and split conditions should be allowed as well, if people still use that -- that is how we wrote "&& 1" before that existed). Segher
Re: [RFC/PATCH 00/11] Fix up some unexpected empty split conditions
"Kewen.Lin" writes: > Hi Richi/Richard/Jeff/Segher, > > Thanks for the comments! > > on 2021/6/3 锟斤拷锟斤拷7:52, Segher Boessenkool wrote: >> On Wed, Jun 02, 2021 at 06:32:13PM +0100, Richard Sandiford wrote: >>> Richard Biener writes: So what Richard suggests would be to disallow split conditions that do not start with "&& ", it's probably easy to do that as well and look for build fails. That should catch all cases to look at. >>> >>> Yeah. As a strawman proposal, how about: >>> >>> - add a new "define_independent_insn_and_split" that has the >>> current semantics of define_insn_and_split. This should be >>> mechanical. >> >> I'd rather not have that -- we can just write separate define_insn and >> define_split in that case. >> > > Not sure if someone would argue that he/she would like to go with one shared > pattern as before, to avoid any possible differences between two seperated > patterns and have good maintainability (like only editing on place) and > slightly better efficiency. Right. Plus it creates less make-work. If we didn't have it, someone would need to split the define_insn_and_splits that don't currently use "&&", then later someone might decide that the missing "&&" was a mistake and need to put them together again (or just revert the patch and edit from there, I guess). Plus define_independent_insn_and_split would act as a flag for something that might be suspect. If we split them then the define_split condition will seem to have been chosen deliberately in isolation. >> How many such cases *are* there? There are no users exposed to this, >> and when the split condition is required to start with "&&" (instead of >> getting that implied) it is not a silent change ever, either. >> > > If I read the proposal right, the explicit "&&" is only required when going > to find all potential problematic places for final implied "&&" change. > But one explicit "&&" does offer good readability. I don't know. "&& 1" looks kind of weird to me. One thing I'd been wondering about a while ago was whether we should key the split part of define_insn_and_splits off the insn code, instead of repeating the pattern match and insn C condition. That would make the split apply only to the associated define_insns, whereas at the moment they also apply to any earlier (less general) define_insn that happens to match the pattern and the C conditions. It would also reduce the complexity of the autogenerated define_split logic. I don't know whether that's a good idea or not. But having an explicit "&&" implies that the generator shouldn't do that, and that it should retest the insn condition from scratch. >>> - find the define_insn_and_splits that are missing the "&&", and where >>> missing the "&&" might make a difference. Change them to >>> define_independent_insn_and_splits. >>> >>> Like Richard says, this can be done by temporarily disallowing >>> define_insn_and_splits that have no "&&". >> >> If we make that change permanently, that is all steps we ever need! >> > > So the question is that: whether we need to demand an explicit "&&". > Richard's proposal is for answer "no" which aligns with Richi's auto > filling advice before. I think it would result in fewer changes since > those places without explicit "&&" are mostly unintentional, all the jobs > are done by implied "&&". Its downside seems to be bad readability, new > readers may take it as two seperated conditions at first glance, but I > guess if we emphasize this change in the document it would be fine? > Or emitting one warning if missing an explicit "&&"? IMO the natural way to read it is that the split C condition gives the conditions under which the instruction should be split. I think that's why forgetting the "&&" is such a common mistake. (I know I've done it plenty of times.) IMO requiring the "&&" is baking in an alternative, less intuitive, interpretation. Thanks, Richard
Re: [PATCH] define auto_vec copy ctor and assignment (PR 90904)
On Wed, Jun 02, 2021 at 10:04:03AM -0600, Martin Sebor via Gcc-patches wrote: > On 6/2/21 12:55 AM, Richard Biener wrote: > > On Tue, Jun 1, 2021 at 9:56 PM Martin Sebor wrote: > > > > > > On 5/27/21 2:53 PM, Jason Merrill wrote: > > > > On 4/27/21 11:52 AM, Martin Sebor via Gcc-patches wrote: > > > > > On 4/27/21 8:04 AM, Richard Biener wrote: > > > > > > On Tue, Apr 27, 2021 at 3:59 PM Martin Sebor > > > > > > wrote: > > > > > > > > > > > > > > On 4/27/21 1:58 AM, Richard Biener wrote: > > > > > > > > On Tue, Apr 27, 2021 at 2:46 AM Martin Sebor via Gcc-patches > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > PR 90904 notes that auto_vec is unsafe to copy and assign > > > > > > > > > because > > > > > > > > > the class manages its own memory but doesn't define (or > > > > > > > > > delete) > > > > > > > > > either special function. Since I first ran into the problem, > > > > > > > > > auto_vec has grown a move ctor and move assignment from > > > > > > > > > a dynamically-allocated vec but still no copy ctor or copy > > > > > > > > > assignment operator. > > > > > > > > > > > > > > > > > > The attached patch adds the two special functions to auto_vec > > > > > > > > > along > > > > > > > > > with a few simple tests. It makes auto_vec safe to use in > > > > > > > > > containers > > > > > > > > > that expect copyable and assignable element types and passes > > > > > > > > > bootstrap > > > > > > > > > and regression testing on x86_64-linux. > > > > > > > > > > > > > > > > The question is whether we want such uses to appear since those > > > > > > > > can be quite inefficient? Thus the option is to delete those > > > > > > > > operators? > > > > > > > > > > > > > > I would strongly prefer the generic vector class to have the > > > > > > > properties > > > > > > > expected of any other generic container: copyable and assignable. > > > > > > > If > > > > > > > we also want another vector type with this restriction I suggest > > > > > > > to add > > > > > > > another "noncopyable" type and make that property explicit in its > > > > > > > name. > > > > > > > I can submit one in a followup patch if you think we need one. > > > > > > > > > > > > I'm not sure (and not strictly against the copy and assign). > > > > > > Looking > > > > > > around > > > > > > I see that vec<> does not do deep copying. Making auto_vec<> do it > > > > > > might be surprising (I added the move capability to match how vec<> > > > > > > is used - as "reference" to a vector) > > > > > > > > > > The vec base classes are special: they have no ctors at all (because > > > > > of their use in unions). That's something we might have to live with > > > > > but it's not a model to follow in ordinary containers. > > > > > > > > I don't think we have to live with it anymore, now that we're writing > > > > C++11. > > > > > > > > > The auto_vec class was introduced to fill the need for a conventional > > > > > sequence container with a ctor and dtor. The missing copy ctor and > > > > > assignment operators were an oversight, not a deliberate feature. > > > > > This change fixes that oversight. I've been away a while, but trying to get back into this, sorry. It was definitely an oversight to leave these undefined for the compiler to provide a default definition of, but I agree with Richi, the better thing to have done, or do now would be to mark them as deleted and make auto_vec move only (with copy() for when you really need a deep copy. > > > > > > > > > > The revised patch also adds a copy ctor/assignment to the auto_vec > > > > > primary template (that's also missing it). In addition, it adds > > > > > a new class called auto_vec_ncopy that disables copying and > > > > > assignment as you prefer. > > > > > > > > Hmm, adding another class doesn't really help with the confusion richi > > > > mentions. And many uses of auto_vec will pass them as vec, which will > > > > still do a shallow copy. I think it's probably better to disable the > > > > copy special members for auto_vec until we fix vec<>. > > > > > > There are at least a couple of problems that get in the way of fixing > > > all of vec to act like a well-behaved C++ container: > > > > > > 1) The embedded vec has a trailing "flexible" array member with its > > > instances having different size. They're initialized by memset and > > > copied by memcpy. The class can't have copy ctors or assignments > > > but it should disable/delete them instead. > > > > > > 2) The heap-based vec is used throughout GCC with the assumption of > > > shallow copy semantics (not just as function arguments but also as > > > members of other such POD classes). This can be changed by providing > > > copy and move ctors and assignment operators for it, and also for > > > some of the classes in which it's a member and that are used with > > > the same assumption. > > > > > > 3) The heap-based vec::block_remove() assumes its elements are PODs. > > > That breaks in VEC
Re: [arm/testsuite]: Skip pr97969.c if -mthumb is not compatible [PR target/97969]
On Mon, 15 Mar 2021 at 17:03, Christophe Lyon wrote: > > On Wed, 3 Mar 2021 at 15:00, Richard Earnshaw > wrote: > > > > On 02/03/2021 18:35, Christophe Lyon wrote: > > > On Tue, 2 Mar 2021 at 19:18, Richard Earnshaw > > > wrote: > > >> > > >> On 02/03/2021 18:14, Richard Earnshaw via Gcc-patches wrote: > > >>> On 02/03/2021 18:10, Christophe Lyon wrote: > > On Tue, 2 Mar 2021 at 17:25, Richard Earnshaw > > wrote: > > > > > > On 02/03/2021 16:19, Richard Earnshaw via Gcc-patches wrote: > > >> On 01/03/2021 15:26, Christophe Lyon via Gcc-patches wrote: > > >>> Ping? > > >>> > > >>> On Wed, 3 Feb 2021 at 10:01, Christophe Lyon > > >>> wrote: > > > > Ping? > > I guess that's obvious enough? > > > > On Wed, 27 Jan 2021 at 10:03, Christophe Lyon > > wrote: > > > > > > Depending on how the toolchain is configured or how the testsuite > > > is > > > executed, -mthumb may not be compatible. Like for other tests, > > > skip > > > pr97969.c in this case. > > > > > > For instance arm-linux-gnueabihf and -march=armv5t in > > > RUNTESTFLAGS. > > > > > > 2021-01-27 Christophe Lyon > > > > > > gcc/testsuite/ > > > PR target/97969 > > > * gcc.target/arm/pr97969.c: Skip if thumb mode is not available. > > > > > > diff --git a/gcc/testsuite/gcc.target/arm/pr97969.c > > > b/gcc/testsuite/gcc.target/arm/pr97969.c > > > index 714a1d1..0b5d07f 100644 > > > --- a/gcc/testsuite/gcc.target/arm/pr97969.c > > > +++ b/gcc/testsuite/gcc.target/arm/pr97969.c > > > @@ -1,4 +1,5 @@ > > > /* { dg-do compile } */ > > > +/* { dg-skip-if "" { ! { arm_thumb1_ok || arm_thumb2_ok } } } */ > > > /* { dg-options "-std=c99 -fno-omit-frame-pointer -mthumb -w > > > -Os" } */ > > > > > > typedef a[23]; > > >> > > >> I'm working on a patch to make this sort of change unnecessary (I > > >> hope). > > >> Just running some final checks. > > >> > > >> R. > > >> > > > > > > Ah, wait. This one already has an explicit -mthumb, so my patch won't > > > affect this. But why is -mthumb needed for this test anyway? It's > > > just > > > a compilation test, so why not drop that and we'll generally get > > > better > > > coverage all round. > > > > > > > For instance I see the test fail for target arm-none-linux-gnueabihf > > --with-mode arm --with-cpu cortex-a9 --with-fpu vfp > > and running the tests with -march=armv5t > > > > We get the famous thumb-1 + hard-float ABI not supported. > > > > I guess -mthumb is inherited from the bug report? > > > > Christophe > > > > >>> > > >>> dropping the -mthumb should fix that though? > > >>> > > >>> In fact, I'd drop -Os as well, it's not needed as -Os is just one of the > > >>> many options that are used to build this test already. > > >>> > > >>> R. > > >>> > > >> > > >> But maybe then we need to change dg-options into dg-add-options. > > >> > > > > > > Not sure to follow: the test is compiled only once, with: > > > -std=c99 -fno-omit-frame-pointer -mthumb -w -Os > > > in my logs > > > > > > > I think it's only run the once /because/ the test sets dg-options rather > > than dg-add-options. > > > > Hi, sorry for the delay... > I guess you mean dg-additional-options ? > I did try that, to be sure, but the tests in gcc.target/arm are only > compiled once. > > Back to the original discussion, if we drop -mthumb, which is required > according to the PR (?), how do we ensure coverage? Sure I'm running > the testsuite with various RUNTESTFLAGS settings, but wouldn't it be > better to test what the PR reports by default? > Hi, I'm resurrecting this discussion since Vladimir backported his patch to gcc-9, and I just received a new failure warning from validation on that branch. Richard, any update? Thanks Christophe > Thanks > > Christophe > .
Re: [arm/testsuite]: Skip pr97969.c if -mthumb is not compatible [PR target/97969]
On Thu, 3 Jun 2021 at 10:38, Christophe Lyon wrote: > > On Mon, 15 Mar 2021 at 17:03, Christophe Lyon > wrote: > > > > On Wed, 3 Mar 2021 at 15:00, Richard Earnshaw > > wrote: > > > > > > On 02/03/2021 18:35, Christophe Lyon wrote: > > > > On Tue, 2 Mar 2021 at 19:18, Richard Earnshaw > > > > wrote: > > > >> > > > >> On 02/03/2021 18:14, Richard Earnshaw via Gcc-patches wrote: > > > >>> On 02/03/2021 18:10, Christophe Lyon wrote: > > > On Tue, 2 Mar 2021 at 17:25, Richard Earnshaw > > > wrote: > > > > > > > > On 02/03/2021 16:19, Richard Earnshaw via Gcc-patches wrote: > > > >> On 01/03/2021 15:26, Christophe Lyon via Gcc-patches wrote: > > > >>> Ping? > > > >>> > > > >>> On Wed, 3 Feb 2021 at 10:01, Christophe Lyon > > > >>> wrote: > > > > > > Ping? > > > I guess that's obvious enough? > > > > > > On Wed, 27 Jan 2021 at 10:03, Christophe Lyon > > > wrote: > > > > > > > > Depending on how the toolchain is configured or how the > > > > testsuite is > > > > executed, -mthumb may not be compatible. Like for other tests, > > > > skip > > > > pr97969.c in this case. > > > > > > > > For instance arm-linux-gnueabihf and -march=armv5t in > > > > RUNTESTFLAGS. > > > > > > > > 2021-01-27 Christophe Lyon > > > > > > > > gcc/testsuite/ > > > > PR target/97969 > > > > * gcc.target/arm/pr97969.c: Skip if thumb mode is not available. > > > > > > > > diff --git a/gcc/testsuite/gcc.target/arm/pr97969.c > > > > b/gcc/testsuite/gcc.target/arm/pr97969.c > > > > index 714a1d1..0b5d07f 100644 > > > > --- a/gcc/testsuite/gcc.target/arm/pr97969.c > > > > +++ b/gcc/testsuite/gcc.target/arm/pr97969.c > > > > @@ -1,4 +1,5 @@ > > > > /* { dg-do compile } */ > > > > +/* { dg-skip-if "" { ! { arm_thumb1_ok || arm_thumb2_ok } } } > > > > */ > > > > /* { dg-options "-std=c99 -fno-omit-frame-pointer -mthumb -w > > > > -Os" } */ > > > > > > > > typedef a[23]; > > > >> > > > >> I'm working on a patch to make this sort of change unnecessary (I > > > >> hope). > > > >> Just running some final checks. > > > >> > > > >> R. > > > >> > > > > > > > > Ah, wait. This one already has an explicit -mthumb, so my patch > > > > won't > > > > affect this. But why is -mthumb needed for this test anyway? It's > > > > just > > > > a compilation test, so why not drop that and we'll generally get > > > > better > > > > coverage all round. > > > > > > > > > > For instance I see the test fail for target arm-none-linux-gnueabihf > > > --with-mode arm --with-cpu cortex-a9 --with-fpu vfp > > > and running the tests with -march=armv5t > > > > > > We get the famous thumb-1 + hard-float ABI not supported. > > > > > > I guess -mthumb is inherited from the bug report? > > > > > > Christophe > > > > > > >>> > > > >>> dropping the -mthumb should fix that though? > > > >>> > > > >>> In fact, I'd drop -Os as well, it's not needed as -Os is just one of > > > >>> the > > > >>> many options that are used to build this test already. > > > >>> > > > >>> R. > > > >>> > > > >> > > > >> But maybe then we need to change dg-options into dg-add-options. > > > >> > > > > > > > > Not sure to follow: the test is compiled only once, with: > > > > -std=c99 -fno-omit-frame-pointer -mthumb -w -Os > > > > in my logs > > > > > > > > > > I think it's only run the once /because/ the test sets dg-options rather > > > than dg-add-options. > > > > > > > Hi, sorry for the delay... > > I guess you mean dg-additional-options ? > > I did try that, to be sure, but the tests in gcc.target/arm are only > > compiled once. > > > > Back to the original discussion, if we drop -mthumb, which is required > > according to the PR (?), how do we ensure coverage? Sure I'm running > > the testsuite with various RUNTESTFLAGS settings, but wouldn't it be > > better to test what the PR reports by default? > > > > Hi, > > I'm resurrecting this discussion since Vladimir backported his patch > to gcc-9, and I just received a new failure warning from validation on > that branch. > Sorry, I meant gcc-10. > Richard, any update? > > Thanks > > Christophe > > > Thanks > > > > Christophe > > .
Re: [backport gcc10, gcc9] Requet to backport PR97969
On Thu, 3 Jun 2021 at 00:31, Przemyslaw Wirkus via Gcc-patches wrote: > > Hi, > > > -Original Message- > > From: Vladimir Makarov > > Sent: 31 May 2021 16:52 > > To: Przemyslaw Wirkus ; Richard Biener > > > > Cc: gcc-patches@gcc.gnu.org; ja...@redhat.com; ni...@redhat.com; > > Richard Earnshaw ; Ramana Radhakrishnan > > ; Kyrylo Tkachov > > > > Subject: Re: [backport gcc10, gcc9] Requet to backport PR97969 > > > > > > On 2021-05-25 5:14 a.m., Przemyslaw Wirkus wrote: > > > Hi, > > > Just a follow up after GCC 11 release. > > > > > > I've backported to gcc-10 branch (without any change to original patches) > > > PR97969 and following PR98722 & PR98777 patches. > > > > > > Commits apply cleanly without changes. > > > Built and regression tested on: > > > * arm-none-eabi and > > > * aarch64-none-linux-gnu cross toolchains. > > > > > > There were no issues and no regressions (all OK). > > > > > > OK for backport to gcc-10 branch ? > > > > Sorry for delay with the answer due to my vacation. > > > > As the patches did not introduce new PRs I believe they are ok for gcc-10. > > Backported to gcc-10 branch. Thank you for your support. > Hi, I'm surprised to see many new errors on arm after the backport for PR98722 See: https://people.linaro.org/~christophe.lyon/cross-validation/gcc/gcc-10/r10-9881-g1791b11d9cae388ae18a768eeb96c998439c986a/report-build-info.html Przemyslaw, Vladimir do you confirm r10-9881 has no such errors (new ICEs) on your side? Thanks > Kind regards > Przemyslaw > > > Thank you. > > > > > > > > Kind regards, > > > Przemyslaw Wirkus > > > > > > --- > > > commits I've backported: > > > > > > commit cf2ac1c30af0fa783c8d72e527904dda5d8cc330 > > > Author: Vladimir N. Makarov > > > Date: Tue Jan 12 11:26:15 2021 -0500 > > > > > > [PR97969] LRA: Transform pattern `plus (plus (hard reg, const), > > > pseudo)` > > after elimination > > > > > > commit 4334b524274203125193a08a8485250c41c2daa9 > > > Author: Vladimir N. Makarov > > > Date: Wed Jan 20 11:40:14 2021 -0500 > > > > > > [PR98722] LRA: Check that target has no 3-op add insn to transform 2 > > plus expression. > > > > > > commit 68ba1039c7daf0485b167fe199ed7e8031158091 > > > Author: Vladimir N. Makarov > > > Date: Thu Jan 21 17:27:01 2021 -0500 > > > > > > [PR98777] LRA: Use preliminary created pseudo for in LRA elimination > > subpass > > > > > > $ ./contrib/git-backport.py cf2ac1c30af0fa783c8d72e527904dda5d8cc330 > > > $ ./contrib/git-backport.py 4334b524274203125193a08a8485250c41c2daa9 > > > $ ./contrib/git-backport.py 68ba1039c7daf0485b167fe199ed7e8031158091 > > > > > > > > >> Richard. >
RE: [backport gcc10, gcc9] Requet to backport PR97969
> -Original Message- > From: Christophe Lyon > Sent: 03 June 2021 09:45 > To: Przemyslaw Wirkus > Cc: Vladimir Makarov ; ja...@redhat.com; Richard > Earnshaw ; Richard Biener > ; gcc-patches@gcc.gnu.org; Ramana Radhakrishnan > > Subject: Re: [backport gcc10, gcc9] Requet to backport PR97969 > > On Thu, 3 Jun 2021 at 00:31, Przemyslaw Wirkus via Gcc-patches patc...@gcc.gnu.org> wrote: > > > > Hi, > > > > > -Original Message- > > > From: Vladimir Makarov > > > Sent: 31 May 2021 16:52 > > > To: Przemyslaw Wirkus ; Richard Biener > > > > > > Cc: gcc-patches@gcc.gnu.org; ja...@redhat.com; ni...@redhat.com; > > > Richard Earnshaw ; Ramana > Radhakrishnan > > > ; Kyrylo Tkachov > > > > > > Subject: Re: [backport gcc10, gcc9] Requet to backport PR97969 > > > > > > > > > On 2021-05-25 5:14 a.m., Przemyslaw Wirkus wrote: > > > > Hi, > > > > Just a follow up after GCC 11 release. > > > > > > > > I've backported to gcc-10 branch (without any change to original > > > > patches) > > > > PR97969 and following PR98722 & PR98777 patches. > > > > > > > > Commits apply cleanly without changes. > > > > Built and regression tested on: > > > > * arm-none-eabi and > > > > * aarch64-none-linux-gnu cross toolchains. > > > > > > > > There were no issues and no regressions (all OK). > > > > > > > > OK for backport to gcc-10 branch ? > > > > > > Sorry for delay with the answer due to my vacation. > > > > > > As the patches did not introduce new PRs I believe they are ok for gcc-10. > > > > Backported to gcc-10 branch. Thank you for your support. > > > > Hi, > > I'm surprised to see many new errors on arm after the backport for PR98722 > See: https://people.linaro.org/~christophe.lyon/cross-validation/gcc/gcc- > 10/r10-9881-g1791b11d9cae388ae18a768eeb96c998439c986a/report-build- > info.html > > Przemyslaw, Vladimir do you confirm r10-9881 has no such errors (new > ICEs) on your side? Apologies. I've built and regtested before submitting backport yesterday. I will check on my side and build one of your failing configurations. Przemyslaw > Thanks > > > Kind regards > > Przemyslaw > > > > > Thank you. > > > > > > > > > > > Kind regards, > > > > Przemyslaw Wirkus > > > > > > > > --- > > > > commits I've backported: > > > > > > > > commit cf2ac1c30af0fa783c8d72e527904dda5d8cc330 > > > > Author: Vladimir N. Makarov > > > > Date: Tue Jan 12 11:26:15 2021 -0500 > > > > > > > > [PR97969] LRA: Transform pattern `plus (plus (hard reg, > > > > const), pseudo)` > > > after elimination > > > > > > > > commit 4334b524274203125193a08a8485250c41c2daa9 > > > > Author: Vladimir N. Makarov > > > > Date: Wed Jan 20 11:40:14 2021 -0500 > > > > > > > > [PR98722] LRA: Check that target has no 3-op add insn to > > > > transform 2 > > > plus expression. > > > > > > > > commit 68ba1039c7daf0485b167fe199ed7e8031158091 > > > > Author: Vladimir N. Makarov > > > > Date: Thu Jan 21 17:27:01 2021 -0500 > > > > > > > > [PR98777] LRA: Use preliminary created pseudo for in LRA > > > > elimination > > > subpass > > > > > > > > $ ./contrib/git-backport.py > > > > cf2ac1c30af0fa783c8d72e527904dda5d8cc330 > > > > $ ./contrib/git-backport.py > > > > 4334b524274203125193a08a8485250c41c2daa9 > > > > $ ./contrib/git-backport.py > > > > 68ba1039c7daf0485b167fe199ed7e8031158091 > > > > > > > > > > > >> Richard. > >
[PATCH] c++: Fix up attribute handling in methods in templates [PR100872]
Hi! The following testcase FAILs because a dependent (late) attribute is never tsubsted. While the testcase is OpenMP, I think it is a generic C++ FE problem that could affect any other dependent attribute. apply_late_template_attributes documents that it relies on /* save_template_attributes puts the dependent attributes at the beginning of the list; find the non-dependent ones. */ The "operator binding" attributes that are sometimes added are added to the head of DECL_ATTRIBUTES list though and because it doesn't have ATTR_IS_DEPENDENT set it violates this requirement. The following patch fixes it by adding that attribute after all ATTR_IS_DEPENDENT attributes. I'm not 100% sure if DECL_ATTRIBUTES can't be shared by multiple functions (e.g. the cdtor clones), but the code uses later remove_attribute which could break that too. In any case it passed bootstrap/regtest on x86_64-linux and i686-linux. Other option would be to copy_list the ATTR_IS_DEPENDENT portion of the DECL_ATTRIBUTES list if we need to do this, that would be the same as this patch but replace that *ap = op_attr; at the end with *ap = NULL_TREE; DECL_ATTRIBUTES (cfn) = chainon (copy_list (DECL_ATTRIBUTES (cfn)), op_attr); Or perhaps set ATTR_IS_DEPENDENT on the "operator bindings" attribute, though it would need to be studied what would it try to do with the attribute during tsubst. 2021-06-03 Jakub Jelinek PR c++/100872 * name-lookup.c (maybe_save_operator_binding): Add op_attr after all ATTR_IS_DEPENDENT attributes in the DECL_ATTRIBUTES list rather than to the start. * g++.dg/gomp/declare-simd-8.C: New test. --- gcc/cp/name-lookup.c.jj 2021-05-11 09:06:24.281997782 +0200 +++ gcc/cp/name-lookup.c2021-06-02 15:50:52.042521824 +0200 @@ -9136,9 +9136,12 @@ maybe_save_operator_binding (tree e) tree op_attr = lookup_attribute (op_bind_attrname, attributes); if (!op_attr) { + tree *ap = &DECL_ATTRIBUTES (cfn); + while (*ap && ATTR_IS_DEPENDENT (*ap)) + ap = &TREE_CHAIN (*ap); op_attr = tree_cons (get_identifier (op_bind_attrname), - NULL_TREE, attributes); - DECL_ATTRIBUTES (cfn) = op_attr; + NULL_TREE, *ap); + *ap = op_attr; } tree op_bind = purpose_member (fnname, TREE_VALUE (op_attr)); --- gcc/testsuite/g++.dg/gomp/declare-simd-8.C.jj 2021-06-02 16:02:32.792681922 +0200 +++ gcc/testsuite/g++.dg/gomp/declare-simd-8.C 2021-06-02 16:02:09.849004442 +0200 @@ -0,0 +1,15 @@ +// PR c++/100872 + +template +struct S { + #pragma omp declare simd aligned(a : N * 2) aligned(b) linear(ref(b): N) + float foo (float *a, T *&b) { return *a + *b; } +}; + +S<16, float> s; + +float +bar (float *a, float *p) +{ + return s.foo (a, p); +} Jakub
[PATCH] [i386] Fix ICE of insn does not satisfy its constraints.
For evex encoding extended instructions, when vector length is less than 512 bits, AVX512VL is needed, besides some instructions like vpmovzxbx need extra AVX512BW. So this patch refines corresponding constraints, i.e. from "v/vm" to "Yv/Yvm", from "v/vm" to "Yw/Ywm". Bootstrapped and regtested on x86_64-linux-gnu{-m32,} Ok for trunk? and backport to GCC10 and GCC11? BTW, I'm not sure should this patch be backport to GCC8 and GCC9 since constraints "Yw" is introduced in GCC10, if we want to do that, we also need to backport defination of "Yw". gcc/ChangeLog: PR target/100885 * config/i386/sse.md (*sse4_1_zero_extendv8qiv8hi2_3): Refine constraints. (avx2_v8qiv8si2): Ditto. (*avx2_v8qiv8si2_1): Ditto. (sse4_1_v4qiv4si2): Ditto. (*sse4_1_v4qiv4si2_1): Ditto. (avx2_v8hiv8si2): Ditto. (avx2_zero_extendv8hiv8si2_1): Ditto. (sse4_1_v4hiv4si2): Ditto. (*sse4_1_v4hiv4si2_1): Ditto. (*sse4_1_zero_extendv4hiv4si2_3): Ditto. (avx2_v4qiv4di2): Ditto. (*avx2_v4qiv4di2_1): Ditto. (sse4_1_v2qiv2di2): Ditto. (avx2_v4hiv4di2): Ditto. (*avx2_v4hiv4di2_1): Ditto. (sse4_1_v2hiv2di2): Ditto. (*sse4_1_v2hiv2di2_1): Ditto. (avx2_v4siv4di2): Ditto. (*avx2_zero_extendv4siv4di2_1): Ditto. (v4siv4di2): Ditto. (sse4_1_v2siv2di2): Ditto. (*sse4_1_v2siv2di2_1): Ditto. (*sse4_1_zero_extendv2siv2di2_3): Ditto. gcc/testsuite/ChangeLog: PR target/100885 * g++.target/i386/pr100885.C: New test. --- gcc/config/i386/sse.md | 78 +-- gcc/testsuite/g++.target/i386/pr100885.C | 159 +++ 2 files changed, 198 insertions(+), 39 deletions(-) create mode 100644 gcc/testsuite/g++.target/i386/pr100885.C diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index a4503ddcb73..42e62fd4f89 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -18101,10 +18101,10 @@ (define_insn_and_split "*sse4_1_v8qiv8hi2_2" "operands[1] = adjust_address_nv (operands[1], V8QImode, 0);") (define_insn_and_split "*sse4_1_zero_extendv8qiv8hi2_3" - [(set (match_operand:V16QI 0 "register_operand" "=Yr,*x,v") + [(set (match_operand:V16QI 0 "register_operand" "=Yr,*x,Yw") (vec_select:V16QI (vec_concat:V32QI - (match_operand:V16QI 1 "vector_operand" "YrBm,*xBm,vm") + (match_operand:V16QI 1 "vector_operand" "YrBm,*xBm,Ywm") (match_operand:V16QI 2 "const0_operand" "C,C,C")) (match_parallel 3 "pmovzx_parallel" [(match_operand 4 "const_int_operand" "n,n,n")])))] @@ -18163,10 +18163,10 @@ (define_expand "v16qiv16si2" "TARGET_AVX512F") (define_insn "avx2_v8qiv8si2" - [(set (match_operand:V8SI 0 "register_operand" "=v") + [(set (match_operand:V8SI 0 "register_operand" "=Yv") (any_extend:V8SI (vec_select:V8QI - (match_operand:V16QI 1 "register_operand" "v") + (match_operand:V16QI 1 "register_operand" "Yv") (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3) (const_int 4) (const_int 5) @@ -18179,7 +18179,7 @@ (define_insn "avx2_v8qiv8si2" (set_attr "mode" "OI")]) (define_insn "*avx2_v8qiv8si2_1" - [(set (match_operand:V8SI 0 "register_operand" "=v") + [(set (match_operand:V8SI 0 "register_operand" "=Yv") (any_extend:V8SI (match_operand:V8QI 1 "memory_operand" "m")))] "TARGET_AVX2 && " @@ -18225,10 +18225,10 @@ (define_expand "v8qiv8si2" }) (define_insn "sse4_1_v4qiv4si2" - [(set (match_operand:V4SI 0 "register_operand" "=Yr,*x,v") + [(set (match_operand:V4SI 0 "register_operand" "=Yr,*x,Yv") (any_extend:V4SI (vec_select:V4QI - (match_operand:V16QI 1 "register_operand" "Yr,*x,v") + (match_operand:V16QI 1 "register_operand" "Yr,*x,Yv") (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3)]] "TARGET_SSE4_1 && " @@ -18240,7 +18240,7 @@ (define_insn "sse4_1_v4qiv4si2" (set_attr "mode" "TI")]) (define_insn "*sse4_1_v4qiv4si2_1" - [(set (match_operand:V4SI 0 "register_operand" "=Yr,*x,v") + [(set (match_operand:V4SI 0 "register_operand" "=Yr,*x,Yv") (any_extend:V4SI (match_operand:V4QI 1 "memory_operand" "m,m,m")))] "TARGET_SSE4_1 && " @@ -18322,9 +18322,9 @@ (define_insn_and_split "avx512f_zero_extendv16hiv16si2_1" }) (define_insn "avx2_v8hiv8si2" - [(set (match_operand:V8SI 0 "register_operand" "=v") + [(set (match_operand:V8SI 0 "register_operand" "=Yv") (any_extend:V8SI - (match_operand:V8HI 1 "nonimmediate_operand" "vm")))] + (match_operand:V8HI 1 "nonimmediate_operand" "Yvm")))] "TARGET_AVX2 && " "vpmovwd\t{%1, %0|%0, %1}" [(set_attr "type" "ssemov") @@ -18339,10 +18339,10 @@ (define_expand "v8hiv8si2" "T
Re: [backport gcc10, gcc9] Requet to backport PR97969
On Thu, 3 Jun 2021 at 10:54, Przemyslaw Wirkus wrote: > > > > > -Original Message- > > From: Christophe Lyon > > Sent: 03 June 2021 09:45 > > To: Przemyslaw Wirkus > > Cc: Vladimir Makarov ; ja...@redhat.com; Richard > > Earnshaw ; Richard Biener > > ; gcc-patches@gcc.gnu.org; Ramana Radhakrishnan > > > > Subject: Re: [backport gcc10, gcc9] Requet to backport PR97969 > > > > On Thu, 3 Jun 2021 at 00:31, Przemyslaw Wirkus via Gcc-patches > patc...@gcc.gnu.org> wrote: > > > > > > Hi, > > > > > > > -Original Message- > > > > From: Vladimir Makarov > > > > Sent: 31 May 2021 16:52 > > > > To: Przemyslaw Wirkus ; Richard Biener > > > > > > > > Cc: gcc-patches@gcc.gnu.org; ja...@redhat.com; ni...@redhat.com; > > > > Richard Earnshaw ; Ramana > > Radhakrishnan > > > > ; Kyrylo Tkachov > > > > > > > > Subject: Re: [backport gcc10, gcc9] Requet to backport PR97969 > > > > > > > > > > > > On 2021-05-25 5:14 a.m., Przemyslaw Wirkus wrote: > > > > > Hi, > > > > > Just a follow up after GCC 11 release. > > > > > > > > > > I've backported to gcc-10 branch (without any change to original > > > > > patches) > > > > > PR97969 and following PR98722 & PR98777 patches. > > > > > > > > > > Commits apply cleanly without changes. > > > > > Built and regression tested on: > > > > > * arm-none-eabi and > > > > > * aarch64-none-linux-gnu cross toolchains. > > > > > > > > > > There were no issues and no regressions (all OK). > > > > > > > > > > OK for backport to gcc-10 branch ? > > > > > > > > Sorry for delay with the answer due to my vacation. > > > > > > > > As the patches did not introduce new PRs I believe they are ok for > > > > gcc-10. > > > > > > Backported to gcc-10 branch. Thank you for your support. > > > > > > > Hi, > > > > I'm surprised to see many new errors on arm after the backport for PR98722 > > See: https://people.linaro.org/~christophe.lyon/cross-validation/gcc/gcc- > > 10/r10-9881-g1791b11d9cae388ae18a768eeb96c998439c986a/report-build- > > info.html > > > > Przemyslaw, Vladimir do you confirm r10-9881 has no such errors (new > > ICEs) on your side? > > Apologies. > > I've built and regtested before submitting backport yesterday. > I will check on my side and build one of your failing configurations. > After I sent the previous email, I received the validation results for the next backport, and it seems it fixes the ICEs introduced by r10-9881: https://people.linaro.org/~christophe.lyon/cross-validation/gcc/gcc-10/r10-9882-g05f6971ac40912ef062915f88b3ea0bf27278285/report-build-info.html I guess you ran validations with the 3 backports combined, rather than individually? So it looks OK now. Thanks, Christophe > Przemyslaw > > > Thanks > > > > > Kind regards > > > Przemyslaw > > > > > > > Thank you. > > > > > > > > > > > > > > Kind regards, > > > > > Przemyslaw Wirkus > > > > > > > > > > --- > > > > > commits I've backported: > > > > > > > > > > commit cf2ac1c30af0fa783c8d72e527904dda5d8cc330 > > > > > Author: Vladimir N. Makarov > > > > > Date: Tue Jan 12 11:26:15 2021 -0500 > > > > > > > > > > [PR97969] LRA: Transform pattern `plus (plus (hard reg, > > > > > const), pseudo)` > > > > after elimination > > > > > > > > > > commit 4334b524274203125193a08a8485250c41c2daa9 > > > > > Author: Vladimir N. Makarov > > > > > Date: Wed Jan 20 11:40:14 2021 -0500 > > > > > > > > > > [PR98722] LRA: Check that target has no 3-op add insn to > > > > > transform 2 > > > > plus expression. > > > > > > > > > > commit 68ba1039c7daf0485b167fe199ed7e8031158091 > > > > > Author: Vladimir N. Makarov > > > > > Date: Thu Jan 21 17:27:01 2021 -0500 > > > > > > > > > > [PR98777] LRA: Use preliminary created pseudo for in LRA > > > > > elimination > > > > subpass > > > > > > > > > > $ ./contrib/git-backport.py > > > > > cf2ac1c30af0fa783c8d72e527904dda5d8cc330 > > > > > $ ./contrib/git-backport.py > > > > > 4334b524274203125193a08a8485250c41c2daa9 > > > > > $ ./contrib/git-backport.py > > > > > 68ba1039c7daf0485b167fe199ed7e8031158091 > > > > > > > > > > > > > > >> Richard. > > >
RE: [backport gcc10, gcc9] Requet to backport PR97969
> -Original Message- > From: Christophe Lyon > Sent: 03 June 2021 10:10 > To: Przemyslaw Wirkus > Cc: Vladimir Makarov ; ja...@redhat.com; Richard > Earnshaw ; Richard Biener > ; gcc-patches@gcc.gnu.org; Ramana Radhakrishnan > > Subject: Re: [backport gcc10, gcc9] Requet to backport PR97969 > > On Thu, 3 Jun 2021 at 10:54, Przemyslaw Wirkus > wrote: > > > > > > > > > -Original Message- > > > From: Christophe Lyon > > > Sent: 03 June 2021 09:45 > > > To: Przemyslaw Wirkus > > > Cc: Vladimir Makarov ; ja...@redhat.com; > > > Richard Earnshaw ; Richard Biener > > > ; gcc-patches@gcc.gnu.org; Ramana Radhakrishnan > > > > > > Subject: Re: [backport gcc10, gcc9] Requet to backport PR97969 > > > > > > On Thu, 3 Jun 2021 at 00:31, Przemyslaw Wirkus via Gcc-patches > > patc...@gcc.gnu.org> wrote: > > > > > > > > Hi, > > > > > > > > > -Original Message- > > > > > From: Vladimir Makarov > > > > > Sent: 31 May 2021 16:52 > > > > > To: Przemyslaw Wirkus ; Richard > > > > > Biener > > > > > Cc: gcc-patches@gcc.gnu.org; ja...@redhat.com; ni...@redhat.com; > > > > > Richard Earnshaw ; Ramana > > > Radhakrishnan > > > > > ; Kyrylo Tkachov > > > > > > > > > > Subject: Re: [backport gcc10, gcc9] Requet to backport PR97969 > > > > > > > > > > > > > > > On 2021-05-25 5:14 a.m., Przemyslaw Wirkus wrote: > > > > > > Hi, > > > > > > Just a follow up after GCC 11 release. > > > > > > > > > > > > I've backported to gcc-10 branch (without any change to > > > > > > original > > > > > > patches) > > > > > > PR97969 and following PR98722 & PR98777 patches. > > > > > > > > > > > > Commits apply cleanly without changes. > > > > > > Built and regression tested on: > > > > > > * arm-none-eabi and > > > > > > * aarch64-none-linux-gnu cross toolchains. > > > > > > > > > > > > There were no issues and no regressions (all OK). > > > > > > > > > > > > OK for backport to gcc-10 branch ? > > > > > > > > > > Sorry for delay with the answer due to my vacation. > > > > > > > > > > As the patches did not introduce new PRs I believe they are ok for > > > > > gcc- > 10. > > > > > > > > Backported to gcc-10 branch. Thank you for your support. > > > > > > > > > > Hi, > > > > > > I'm surprised to see many new errors on arm after the backport for > > > PR98722 > > > See: > > > https://people.linaro.org/~christophe.lyon/cross-validation/gcc/gcc- > > > 10/r10-9881-g1791b11d9cae388ae18a768eeb96c998439c986a/report- > build- > > > info.html > > > > > > Przemyslaw, Vladimir do you confirm r10-9881 has no such errors (new > > > ICEs) on your side? > > > > Apologies. > > > > I've built and regtested before submitting backport yesterday. > > I will check on my side and build one of your failing configurations. > > > > After I sent the previous email, I received the validation results for the > next > backport, and it seems it fixes the ICEs introduced by r10-9881: > https://people.linaro.org/~christophe.lyon/cross-validation/gcc/gcc-10/r10- > 9882-g05f6971ac40912ef062915f88b3ea0bf27278285/report-build-info.html > > I guess you ran validations with the 3 backports combined, rather than > individually? Yes, these three PRs are all connected to each other. That's why I've ran validation after third one, not for each one separately. PS: Office folks now roast me with true passion ;) P. > So it looks OK now. > > Thanks, > > Christophe > > > > Przemyslaw > > > > > Thanks > > > > > > > Kind regards > > > > Przemyslaw > > > > > > > > > Thank you. > > > > > > > > > > > > > > > > > Kind regards, > > > > > > Przemyslaw Wirkus > > > > > > > > > > > > --- > > > > > > commits I've backported: > > > > > > > > > > > > commit cf2ac1c30af0fa783c8d72e527904dda5d8cc330 > > > > > > Author: Vladimir N. Makarov > > > > > > Date: Tue Jan 12 11:26:15 2021 -0500 > > > > > > > > > > > > [PR97969] LRA: Transform pattern `plus (plus (hard reg, > > > > > > const), pseudo)` > > > > > after elimination > > > > > > > > > > > > commit 4334b524274203125193a08a8485250c41c2daa9 > > > > > > Author: Vladimir N. Makarov > > > > > > Date: Wed Jan 20 11:40:14 2021 -0500 > > > > > > > > > > > > [PR98722] LRA: Check that target has no 3-op add insn to > > > > > > transform 2 > > > > > plus expression. > > > > > > > > > > > > commit 68ba1039c7daf0485b167fe199ed7e8031158091 > > > > > > Author: Vladimir N. Makarov > > > > > > Date: Thu Jan 21 17:27:01 2021 -0500 > > > > > > > > > > > > [PR98777] LRA: Use preliminary created pseudo for in LRA > > > > > > elimination > > > > > subpass > > > > > > > > > > > > $ ./contrib/git-backport.py > > > > > > cf2ac1c30af0fa783c8d72e527904dda5d8cc330 > > > > > > $ ./contrib/git-backport.py > > > > > > 4334b524274203125193a08a8485250c41c2daa9 > > > > > > $ ./contrib/git-backport.py > > > > > > 68ba1039c7daf0485b167fe199ed7e8031158091 > > > > > > > > > > > > > > > > > >> Richard. > > > >
Re: [RFC/PATCH 00/11] Fix up some unexpected empty split conditions
On Thu, Jun 03, 2021 at 03:00:44AM -0500, Segher Boessenkool wrote: > On Thu, Jun 03, 2021 at 01:22:38PM +0800, Kewen.Lin wrote: > The whole point of requiring the split condition to start with && is so > it will become harder to mess things up (it will make the gen* code a > tiny little bit simpler as well). And there is no transition period or > anything like that needed either. Just the bunch that will break will > need fixing. So let's find out how many of those there are :-) > > > > How many such cases *are* there? There are no users exposed to this, > > > and when the split condition is required to start with "&&" (instead of > > > getting that implied) it is not a silent change ever, either. > > > > If I read the proposal right, the explicit "&&" is only required when going > > to find all potential problematic places for final implied "&&" change. > > But one explicit "&&" does offer good readability. > > My proposal is to always require && (or maybe identical insn and split > conditions should be allowed as well, if people still use that -- that > is how we wrote "&& 1" before that existed). I prototyped this. There are very many errors. Iterators often modify the insn condition (for one iteration of it), but that does not work if the split condition does not start with "&&"! See attached prototype. Segher = = = diff --git a/gcc/gensupport.c b/gcc/gensupport.c index 2cb760ffb90f..05d46fd3775c 100644 --- a/gcc/gensupport.c +++ b/gcc/gensupport.c @@ -590,7 +590,6 @@ process_rtx (rtx desc, file_location loc) case DEFINE_INSN_AND_SPLIT: case DEFINE_INSN_AND_REWRITE: { - const char *split_cond; rtx split; rtvec attr; int i; @@ -611,15 +610,20 @@ process_rtx (rtx desc, file_location loc) /* If the split condition starts with "&&", append it to the insn condition to create the new split condition. */ - split_cond = XSTR (desc, 4); - if (split_cond[0] == '&' && split_cond[1] == '&') + const char *insn_cond = XSTR (desc, 2); + const char *split_cond = XSTR (desc, 4); + if (!strncmp (split_cond, "&&", 2)) { rtx_reader_ptr->copy_md_ptr_loc (split_cond + 2, split_cond); - split_cond = rtx_reader_ptr->join_c_conditions (XSTR (desc, 2), + split_cond = rtx_reader_ptr->join_c_conditions (insn_cond, split_cond + 2); + } else if (insn_cond[0]) { + if (GET_CODE (desc) == DEFINE_INSN_AND_REWRITE) + error_at (loc, "the rewrite condition must start with `&&'"); + else + error_at (loc, "the split condition must start with `&&' [%s]", insn_cond); } - else if (GET_CODE (desc) == DEFINE_INSN_AND_REWRITE) - error_at (loc, "the rewrite condition must start with `&&'"); + XSTR (split, 1) = split_cond; if (GET_CODE (desc) == DEFINE_INSN_AND_REWRITE) XVEC (split, 2) = gen_rewrite_sequence (XVEC (desc, 1));
[PATCH][committed]AArch64 Fix failing testcase for native cpu detection
Hi All, A late change in the patch changed the implemented ID to one that hasn't been used yet to avoid any ambiguity. Unfortunately the chosen value of 0xFF matches the value of -1 which is used as an invalid implementer so the test started failing. This patch changes it to 0xFE which is the highest usable number. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Committed under the obvious rule. Thanks, Tamar gcc/testsuite/ChangeLog: * gcc.target/aarch64/cpunative/info_16: Update implementer. * gcc.target/aarch64/cpunative/info_17: Likewise --- inline copy of patch -- diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/info_16 b/gcc/testsuite/gcc.target/aarch64/cpunative/info_16 index b0679579d9167d46c832e55cb63d9077f7a80f70..2c04ff19cb6ca99f7c62ea80c5bb6cd0985a0949 100644 --- a/gcc/testsuite/gcc.target/aarch64/cpunative/info_16 +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/info_16 @@ -1,7 +1,7 @@ processor : 0 BogoMIPS : 100.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 asimddp sve sve2 -CPU implementer: 0xff +CPU implementer: 0xfe CPU architecture: 8 CPU variant: 0x0 CPU part : 0xd08 diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/info_17 b/gcc/testsuite/gcc.target/aarch64/cpunative/info_17 index b0679579d9167d46c832e55cb63d9077f7a80f70..2c04ff19cb6ca99f7c62ea80c5bb6cd0985a0949 100644 --- a/gcc/testsuite/gcc.target/aarch64/cpunative/info_17 +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/info_17 @@ -1,7 +1,7 @@ processor : 0 BogoMIPS : 100.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 asimddp sve sve2 -CPU implementer: 0xff +CPU implementer: 0xfe CPU architecture: 8 CPU variant: 0x0 CPU part : 0xd08 -- diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/info_16 b/gcc/testsuite/gcc.target/aarch64/cpunative/info_16 index b0679579d9167d46c832e55cb63d9077f7a80f70..2c04ff19cb6ca99f7c62ea80c5bb6cd0985a0949 100644 --- a/gcc/testsuite/gcc.target/aarch64/cpunative/info_16 +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/info_16 @@ -1,7 +1,7 @@ processor : 0 BogoMIPS : 100.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 asimddp sve sve2 -CPU implementer : 0xff +CPU implementer : 0xfe CPU architecture: 8 CPU variant : 0x0 CPU part : 0xd08 diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/info_17 b/gcc/testsuite/gcc.target/aarch64/cpunative/info_17 index b0679579d9167d46c832e55cb63d9077f7a80f70..2c04ff19cb6ca99f7c62ea80c5bb6cd0985a0949 100644 --- a/gcc/testsuite/gcc.target/aarch64/cpunative/info_17 +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/info_17 @@ -1,7 +1,7 @@ processor : 0 BogoMIPS : 100.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 asimddp sve sve2 -CPU implementer : 0xff +CPU implementer : 0xfe CPU architecture: 8 CPU variant : 0x0 CPU part : 0xd08
Re: [RFC/PATCH 00/11] Fix up some unexpected empty split conditions
On Thu, Jun 03, 2021 at 09:05:02AM +0100, Richard Sandiford via Gcc-patches wrote: > Right. Plus it creates less make-work. If we didn't have it, someone > would need to split the define_insn_and_splits that don't currently > use "&&", then later someone might decide that the missing "&&" was a > mistake and need to put them together again (or just revert the patch > and edit from there, I guess). I would hope no maintainer would be foolish enough to flip-flop like that. > Plus define_independent_insn_and_split would act as a flag for something > that might be suspect. That always is a conceptual error, even. The name says so already: the insn and the split are very different things, with different conditions, they just happen to share a pattern. > If we split them then the define_split condition > will seem to have been chosen deliberately in isolation. And it will have beenm chosen deliberately! Why *else* write things like this? > > If I read the proposal right, the explicit "&&" is only required when going > > to find all potential problematic places for final implied "&&" change. > > But one explicit "&&" does offer good readability. > > I don't know. "&& 1" looks kind of weird to me. We have it in rs6000.md since 2004. sparc has had it since 2002. i386 has had it since 2001. arm still does not have it :-) > > So the question is that: whether we need to demand an explicit "&&". > > Richard's proposal is for answer "no" which aligns with Richi's auto > > filling advice before. I think it would result in fewer changes since > > those places without explicit "&&" are mostly unintentional, all the jobs > > are done by implied "&&". Its downside seems to be bad readability, new > > readers may take it as two seperated conditions at first glance, but I > > guess if we emphasize this change in the document it would be fine? > > Or emitting one warning if missing an explicit "&&"? > > IMO the natural way to read it is that the split C condition gives the > conditions under which the instruction should be split. I think that's > why forgetting the "&&" is such a common mistake. (I know I've done it > plenty of times.) You even do in this description! :-) You do not want the define_split in a define_insn_and_split to happen for insns that do not match the insn condition, that would not be intuitive at all. > IMO requiring the "&&" is baking in an alternative, less intuitive, > interpretation. But you want the exact same semantics, right? I do agree that the syntax without "&&" looks nicer. It has many practical problems though, so it is nice to aim for, but we cannot move there until we have all existing machine descriptions fixed up first. Segher
Re: [RFC/PATCH 00/11] Fix up some unexpected empty split conditions
Segher Boessenkool writes: > On Thu, Jun 03, 2021 at 09:05:02AM +0100, Richard Sandiford via Gcc-patches > wrote: >> Right. Plus it creates less make-work. If we didn't have it, someone >> would need to split the define_insn_and_splits that don't currently >> use "&&", then later someone might decide that the missing "&&" was a >> mistake and need to put them together again (or just revert the patch >> and edit from there, I guess). > > I would hope no maintainer would be foolish enough to flip-flop like > that. But the point is that the maintainers wouldn't be flip-flopping. The move to define_independent_insn_and_split would be a mechnical change made by someone else. We shouldn't just add "&&" to all define_insn_and_splits that currently lack them. That might well break things. So to get the perfect result, someone has to look at each individual define_insn_and_split that currently lacks a "&&" to see what the effects of adding "&&" would be. IMO it's not reasonable to ask Kewen to do that for all ports. So the process I suggested was a way of mechanically changing existing ports in a way that would not require input from target maintainers, or extensive testing on affected targets. It would keep the behaviour of the port the same as it is now (whether that's good or bad). This part of the process could be scripted. >> > If I read the proposal right, the explicit "&&" is only required when going >> > to find all potential problematic places for final implied "&&" change. >> > But one explicit "&&" does offer good readability. >> >> I don't know. "&& 1" looks kind of weird to me. > > We have it in rs6000.md since 2004. sparc has had it since 2002. i386 > has had it since 2001. arm still does not have it :-) Sure, this syntax goes back 20 years. I don't think that answers the point though. The question was whether a split condition "&& 1" is more readable than a condition "", given that "" means "always" in other contexts. Given the choice, IMO "" is more readable and "&& 1" looks weird/inconsistent. Thanks, Richard
Re: [patch] Tame fix for PR ipa/99122
> Not sure whether we know VLA results are always returned by > reference? In particular does this mean we'll never see a > WITH_SIZE_EXPR on the LHS of a call? You might have noticed > I've done WITH_SIZE_EXPR "enhancements" recently on trunk. VLAs are always aggregate_value_p so the caller passes the address of the return slot, but they are not always DECL_BY_REFERENCE, at least in Ada. In Ada we use DECL_BY_REFERENCE only for self-referential types, i.e. types whose size is not the same for all the objects, because we need to control the amount of returned bytes on a case-by-case basis; when the size is uniform, this complication is unnecessary. > As for the patch we indeed didn't have a testcase covering the > return case so it should be safe to revert the return part. OK, thanks, patch installed. -- Eric Botcazou
Re: GCC documentation: porting to Sphinx
On 6/2/21 10:41 PM, Martin Sebor wrote: On 5/31/21 7:25 AM, Martin Liška wrote: Hello. I've made quite some progress with the porting of the documentation and I would like to present it to the community now: https://splichal.eu/scripts/sphinx/ Hello. Thank you for the review. Just a few issues I noticed in the warnings section: The headings of some warnings mention the same option twice (e.g., -Wabi, -Wabi, -Wno-abi; -Wdouble-promotion, -Wdouble-promotion, -Wno-double-promotion; -Winit-self, -Winit-self, -Wno-init-self). This looks like a pretty pervasive problem. You are right, I fixed that. Mentioning the -Wno-xxx option is redundant in a heading for -Wxxx. Yes. Good reason for that is that Sphinx can then generated properly links to the current non-documented version of the option. Hope it's improvement over the current situation? The headings of some other warnings also mention options that are only remotely related to them. E.g., -Wformat has all these: -Wformat, -Wno-format, -ffreestanding, -fno-builtin, -Wformat= (I see the same problem in the attributes section where the headings for some attributes include option names). That seems quite puzzling. I assume it's a consequence of having index entries for the related options, but I don't think making them visible in the headings is helfpful. Oh, you are right. It was consequence of wrong parsing of index entries. It should be fixed now. Headings that in the manual today include a level like -Wformat-overflow -Wformat-overflow=level don't mention the level in the Spinx manual: -Wformat-overflow, -Wno-format-overflow When the /level/ is then discussed in the rest of the text it's not clear what it refers to. Should be also fixed now. Can you please take a look at the current output and give me a feedback? Thanks, Martin Martin Note the documentation is automatically ([1]) generated from texinfo with a GitHub workflow ([2]). It's built on the devel/sphinx GCC branch which I periodically with the master branch. One can see the current source .rst files here: [3]. Changes made since the last time: - a shared content is factored out ([4]) - conditional build is fully supported (even for shared parts) - manual pages look reasonable well - folders are created for files which have >= 5 TOC tree entries - various formatting issues were resolved - baseconf.py reads BASE-VER, DEV-PHASE, .. files I've got couple of questions: 1) Do we have to you the following cover text? Copyright (c) 1988-2020 Free Software Foundation, Inc. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with the Invariant Sections being "GNU General Public License" and "Funding Free Software", the Front-Cover texts being (a) (see below), and with the Back-Cover Texts being (b) (see below). A copy of the license is included in the gfdl(7) man page. (a) The FSF's Front-Cover Text is: A GNU Manual (b) The FSF's Back-Cover Text is: You have freedom to copy and modify this GNU Manual, like GNU software. Copies published by the Free Software Foundation raise funds for GNU development. 2) Do we want to generate fsf-funding, gpl and gfdl manual pages? 3) Do we want to preserve the current strange copy mechanism for ./gcc/doc/tm.texi.in ? 4) Do we want a copyright header for the created .rst files? Thoughts? Thanks, Martin [1] https://github.com/davidmalcolm/texi2rst [2] https://github.com/davidmalcolm/texi2rst/actions [3] https://github.com/marxin/texi2rst-generated/tree/master/sphinx [4] https://github.com/marxin/texi2rst-generated/tree/master/sphinx/share
RE: [PATCH] Canonicalize (vec_duplicate (not A)) to (not (vec_duplicate A)).
>-Original Message- >From: Segher Boessenkool >Sent: Thursday, June 3, 2021 4:46 AM >To: Richard Biener >Cc: Liu, Hongtao ; GCC Patches patc...@gcc.gnu.org> >Subject: Re: [PATCH] Canonicalize (vec_duplicate (not A)) to (not >(vec_duplicate A)). > >Hi! > >On Wed, Jun 02, 2021 at 09:07:35AM +0200, Richard Biener wrote: >> On Wed, Jun 2, 2021 at 7:41 AM liuhongt via Gcc-patches >> wrote: >> > For i386, it will enable below opt >> > >> > from >> > notl%edi >> > vpbroadcastd%edi, %xmm0 >> > vpand %xmm1, %xmm0, %xmm0 >> > to >> > vpbroadcastd%edi, %xmm0 >> > vpandn %xmm1, %xmm0, %xmm0 >> >> There will be cases where (vec_duplicate (not A)) is better than (not >> (vec_duplicate A)), so I'm not sure it is a good idea to forcefully >> canonicalize unary operations. > >It is two unaries in sequence, where the order does not matter either. >As in all such cases you either have to handle both cases everywhere, or have >a canonical order. > >> I suppose the >> simplification happens inside combine > >combine uses simplify-rtx for most cases (it is part of combine, but used in >quite a few other places these days). > >> - doesn't combine >> already have code to try variants of an expression and isn't this a >> good candidate that can be added there, avoiding the canonicalization? > >As I mentioned, this is done in simplify-rtx in cases that do not have a >canonical representation. This is critical because it prevents loops. > >A very typical example is how UMIN is optimised: > > case UMIN: > if (trueop1 == CONST0_RTX (mode) && ! side_effects_p (op0)) > return op1; > if (rtx_equal_p (trueop0, trueop1) && ! side_effects_p (op0)) > return op0; > tem = simplify_associative_operation (code, mode, op0, op1); > if (tem) > return tem; > break; > >(the stuff using "tem"). > >Hongtao, can we do something similar here? Does that work well? Please try >it out :-) In simplify_rtx, no simplication occurs, there is just the difference between (vec_duplicate (not REG)) and (not (vec_duplicate (REG)). So here tem will only be 0. Basically we don't know it's a simplication until combine successfully split the 3->2 instructions (not + broadcast + and to andnot + broadcast), but it's pretty awkward to do this in combine. Consider andnot is existed for many backends, I think a canonicalization is needed here. Maybe we can add insn canonicalization for transforming (and (vect_duplicate (not A)) B) to (and (not (duplicate (not A)) B) instead of (vec_duplicate (not A)) to (not (vec_duplicate A))? > > >Segher
Re: [PATCH] Canonicalize (vec_duplicate (not A)) to (not (vec_duplicate A)).
On Thu, Jun 03, 2021 at 11:03:43AM +, Liu, Hongtao via Gcc-patches wrote: > In simplify_rtx, no simplication occurs, there is just the difference between > (vec_duplicate (not REG)) and (not (vec_duplicate (REG)). So here tem will > only be 0. > Basically we don't know it's a simplication until combine successfully split > the > 3->2 instructions (not + broadcast + and to andnot + broadcast), but it's > pretty awkward > to do this in combine. > > Consider andnot is existed for many backends, I think a canonicalization is > needed here. > Maybe we can add insn canonicalization for transforming (and (vect_duplicate > (not A)) B) to > (and (not (duplicate (not A)) B) instead of (vec_duplicate (not A)) to (not > (vec_duplicate A))? For the (not (vec_duplicate)) vs. (vec_duplicate (not)) it isn't clear which one is generally a win on major targets, so I'd say it is better to add a combine splitter to swap it in backends that want that. Jakub
[committed] arc: Remove obsolete options
This is a respin of an older series of three patches which I have merged into one. The new (committed) patch is keeping the obsolete options int .opt file marking them accrodingly for backwards compatibility. Remove the following obsolete options: - munalign-prob-threshold - malign-call - mmixed-code The ARC's options are marked as obsolete and ignored for backwards compatibility. gcc/ 2021-06-03 Claudiu Zissulescu * common/config/arc/arc-common.c (arc_option_optimization_table): Remove malign-call. * config/arc/arc.c (arc_unalign_branch_p): Remove unused function. * config/arc/arc.h (TARGET_MIXED_CODE): Remove macro. (INDEX_REG_CLASS): Only refer to GENERAL_REGS. * config/arc/arc.md (abssi2_mixed): Remove pattern. * config/arc/arc.opt (munalign-prob-threshold): Mark it obsolete. (malign-call): Likewise. (mmixed-code): Likewise. * doc/invoke.texi (ARC): Update doc. Signed-off-by: Claudiu Zissulescu --- gcc/common/config/arc/arc-common.c | 1 - gcc/config/arc/arc.c | 23 --- gcc/config/arc/arc.h | 4 +--- gcc/config/arc/arc.md | 8 gcc/config/arc/arc.opt | 18 ++ gcc/doc/invoke.texi| 13 +++-- 6 files changed, 10 insertions(+), 57 deletions(-) diff --git a/gcc/common/config/arc/arc-common.c b/gcc/common/config/arc/arc-common.c index 86674dd3de7..6a119029616 100644 --- a/gcc/common/config/arc/arc-common.c +++ b/gcc/common/config/arc/arc-common.c @@ -62,7 +62,6 @@ static const struct default_options arc_option_optimization_table[] = { OPT_LEVELS_SIZE, OPT_fif_conversion, NULL, 0 }, { OPT_LEVELS_1_PLUS, OPT_fomit_frame_pointer, NULL, 1 }, { OPT_LEVELS_3_PLUS_SPEED_ONLY, OPT_msize_level_, NULL, 0 }, -{ OPT_LEVELS_3_PLUS_SPEED_ONLY, OPT_malign_call, NULL, 1 }, { OPT_LEVELS_NONE, 0, NULL, 0 } }; diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c index 9153f0529ab..b77d0566386 100644 --- a/gcc/config/arc/arc.c +++ b/gcc/config/arc/arc.c @@ -9868,29 +9868,6 @@ gen_acc2 (void) return gen_rtx_REG (SImode, TARGET_BIG_ENDIAN ? 57: 56); } -/* FIXME: a parameter should be added, and code added to final.c, - to reproduce this functionality in shorten_branches. */ -#if 0 -/* Return nonzero iff BRANCH should be unaligned if possible by upsizing - a previous instruction. */ -int -arc_unalign_branch_p (rtx branch) -{ - rtx note; - - if (!TARGET_UNALIGN_BRANCH) -return 0; - /* Do not do this if we have a filled delay slot. */ - if (get_attr_delay_slot_filled (branch) == DELAY_SLOT_FILLED_YES - && !NEXT_INSN (branch)->deleted ()) -return 0; - note = find_reg_note (branch, REG_BR_PROB, 0); - return (!note - || (arc_unalign_prob_threshold && !br_prob_note_reliable_p (note)) - || INTVAL (XEXP (note, 0)) < arc_unalign_prob_threshold); -} -#endif - /* When estimating sizes during arc_reorg, when optimizing for speed, there are three reasons why we need to consider branches to be length 6: - annull-false delay slot insns are implemented using conditional execution, diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h index 252241a858c..0224ae65074 100644 --- a/gcc/config/arc/arc.h +++ b/gcc/config/arc/arc.h @@ -115,8 +115,6 @@ extern const char *arc_cpu_to_as (int argc, const char **argv); /* Run-time compilation parameters selecting different hardware subsets. */ -#define TARGET_MIXED_CODE (TARGET_MIXED_CODE_SET) - #define TARGET_SPFP (TARGET_SPFP_FAST_SET || TARGET_SPFP_COMPACT_SET) #define TARGET_DPFP (TARGET_DPFP_FAST_SET || TARGET_DPFP_COMPACT_SET \ || TARGET_FP_DP_AX) @@ -571,7 +569,7 @@ extern enum reg_class arc_regno_reg_class[]; a scale factor or added to another register (as well as added to a displacement). */ -#define INDEX_REG_CLASS (TARGET_MIXED_CODE ? ARCOMPACT16_REGS : GENERAL_REGS) +#define INDEX_REG_CLASS GENERAL_REGS /* The class value for valid base registers. A base register is one used in an address which is the register value plus a displacement. */ diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md index a67bb581003..de61b2b790f 100644 --- a/gcc/config/arc/arc.md +++ b/gcc/config/arc/arc.md @@ -2011,14 +2011,6 @@ (define_expand "extendhisi2" ;; Absolute instructions -(define_insn "*abssi2_mixed" - [(set (match_operand:SI 0 "compact_register_operand" "=q") - (abs:SI (match_operand:SI 1 "compact_register_operand" "q")))] - "TARGET_MIXED_CODE" - "abs%? %0,%1%&" - [(set_attr "type" "two_cycle_core") - (set_attr "iscompact" "true")]) - (define_insn "abssi2" [(set (match_operand:SI 0 "dest_reg_operand" "=Rcq#q,w,w") (abs:SI (match_operand:SI 1 "nonmemory_operand" "Rcq#q,cL,Cal")))] diff --git a/gcc/config/arc/arc.opt b/gcc/config/arc/arc.opt index 85688d57574..a8935db2dd8 100644 --- a/gcc/config/arc/arc.opt +++ b/gcc/conf
Re: [PATCH] xtensa: Fix 2 warnings during xtensa build [PR100841]
On Wed, Jun 2, 2021 at 11:27 AM Jeff Law wrote: > On 6/2/2021 11:09 AM, Jakub Jelinek wrote: > > When building gcc targetting xtensa-linux, there are 2 warnings the PR > > complains about: > > ../../gcc/dwarf2cfi.c: In function ‘void init_one_dwarf_reg_size(int, > > machine_mode, rtx, machine_mode, init_one_dwarf_reg_state*)’: > > ../../gcc/dwarf2cfi.c:291:12: warning: comparison of integer expressions of > > different signedness: ‘const unsigned int’ and ‘int’ [-Wsign-compare] > >291 | if (rnum >= DWARF_FRAME_REGISTERS) > > ../../gcc/function.c: In function ‘void gen_call_used_regs_seq(rtx_insn*, > > unsigned int)’: > > ../../gcc/function.c:5897:63: warning: comparison of unsigned expression in > > ‘< 0’ is always false [-Wtype-limits] > > 5897 | if (crtl->uses_only_leaf_regs && LEAF_REG_REMAP (regno) < 0) > > which might during bootstrap or when configured with --enable-werror-always > > be turned into errors. > > > > The first one is the -Wsign-compare warning, in c-family we do: > > 2281/* Do not warn if the signed quantity is an unsuffixed integer > > 2282 literal (or some static constant expression involving such > > 2283 literals or a conditional expression involving such literals) > > 2284 and it is non-negative. */ > > 2285if (tree_expr_nonnegative_warnv_p (sop, &ovf)) > > 2286 /* OK */; > > and so don't warn if that function determines the expression is > > non-negative. But xtensa defines DWARF_FRAME_REGISTERS as > > (16 + (something ? 0 : 1)) and that isn't handled by > > tree_expr_nonnegative_warnv_p, VRP can handle it of course, but that is much > > later. > > The second chunk rewrites it into a form that tree_expr_nonnegative_warnv_p > > can handle, in particular (something ? 16 : 16 + 1), where for COND_EXPRs > > that function checks both the 2nd and 3rd operand of the ternary operator > > and if both are nonnegative, returns true. > > > > The other warning has been introduced fairly recently; LEAF_REG_REMAP is > > currently used by 2 targets only, and is documented to yield -1 if a hard > > reg number can't be remapped and the remapped register number otherwise. > > That means that the type of the expression should be signed (otherwise -1 > > could never appear), and on SPARC indeed it is defined as > > extern char leaf_reg_remap[]; > > #define LEAF_REG_REMAP(REGNO) (leaf_reg_remap[REGNO]) > > so unless the host is -funsigned-char by default it works fine. > > I guess sparc.[ch] should be fixed to use signed char of leaf_reg_remap, > > Eric? > > The argument to LEAF_REG_REMAP is often unsigned int though, hard > > register numbers are usually not negative, and thus the warning. > > I think xtensa doesn't have 2G hard registers and so it is ok to just cast > > it to int. > > > > Verified just by making sure the warnings go away in a cross, ok for trunk? > > > > 2021-06-02 Jakub Jelinek > > > > PR target/100841 > > * config/xtensa/xtensa.h (LEAF_REG_REMAP): Cast REGNO to int to avoid > > -Wtype-limits warnings. > > (DWARF_FRAME_REGISTER): Rewrite into ternary operator with addition > > in operands to avoid -Wsign-compare warnings. > OK. > jeff Looks good to me. Thank you! -- Max
[Ada] Fix miscompilation of predicate on bit-packed array types
This is a regression present on the mainline and 11 branch in the form of a miscompilation by the new mod/ref IPA pass of code that passes constrained bit-packed array objets in a call to a subprograms taking unconstrained bit- packed array parameters, which occurs for predicate on bit-packed array types. Tested on x86-64/Linux, applied on the mainline. 2021-06-03 Eric Botcazou * gcc-interface/decl.c (gnat_to_gnu_entity) : Add PAT local constant and use it throughout. If it is set, use a ref-all pointer type for the pointer-to-array field of the fat pointer type. : Add PAT local constant and use it throughout. 2021-06-03 Eric Botcazou * gnat.dg/bit_packed_array6.adb: New test. * gnat.dg/bit_packed_array6_pkg.ads: New helper. -- Eric Botcazoudiff --git a/gcc/ada/gcc-interface/decl.c b/gcc/ada/gcc-interface/decl.c index bc7046accb5..6fc94dda83b 100644 --- a/gcc/ada/gcc-interface/decl.c +++ b/gcc/ada/gcc-interface/decl.c @@ -2100,6 +2100,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition) case E_Array_Type: { + const Entity_Id PAT = Packed_Array_Impl_Type (gnat_entity); const bool convention_fortran_p = (Convention (gnat_entity) == Convention_Fortran); const int ndim = Number_Dimensions (gnat_entity); @@ -2203,16 +2204,16 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition) } /* If the GNAT encodings are used, give the fat pointer type a name. - If this is a packed array, tell the debugger how to interpret the - underlying bits by fetching that of the implementation type. But - in any case, mark it as artificial so the debugger can skip it. */ + If this is a packed type implemented specially, tell the debugger + how to interpret the underlying bits by fetching the name of the + implementation type. But, in any case, mark it as artificial so + the debugger can skip it. */ const Entity_Id gnat_name - = (Present (Packed_Array_Impl_Type (gnat_entity)) - && gnat_encodings != DWARF_GNAT_ENCODINGS_MINIMAL) - ? Packed_Array_Impl_Type (gnat_entity) + = Present (PAT) && gnat_encodings != DWARF_GNAT_ENCODINGS_MINIMAL + ? PAT : gnat_entity; tree xup_name - = (gnat_encodings != DWARF_GNAT_ENCODINGS_MINIMAL) + = gnat_encodings != DWARF_GNAT_ENCODINGS_MINIMAL ? create_concat_name (gnat_name, "XUP") : gnu_entity_name; create_type_decl (xup_name, gnu_fat_type, true, debug_info_p, @@ -2347,9 +2348,8 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition) /* If this is a packed type implemented specially, then process the implementation type so it is elaborated in the proper scope. */ - if (Present (Packed_Array_Impl_Type (gnat_entity))) - gnat_to_gnu_entity (Packed_Array_Impl_Type (gnat_entity), NULL_TREE, - false); + if (Present (PAT)) + gnat_to_gnu_entity (PAT, NULL_TREE, false); /* Otherwise, if an alignment is specified, use it if valid and, if the alignment was requested with an explicit clause, state so. */ @@ -2374,8 +2374,15 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition) tem = change_qualified_type (tem, TYPE_QUAL_VOLATILE); /* Adjust the type of the pointer-to-array field of the fat pointer - and record the aliasing relationships if necessary. */ - TREE_TYPE (TYPE_FIELDS (gnu_fat_type)) = build_pointer_type (tem); + and record the aliasing relationships if necessary. If this is + a packed type implemented specially, then use a ref-all pointer + type since the implementation type may vary between constrained + subtypes and unconstrained base type. */ + if (Present (PAT)) + TREE_TYPE (TYPE_FIELDS (gnu_fat_type)) + = build_pointer_type_for_mode (tem, ptr_mode, true); + else + TREE_TYPE (TYPE_FIELDS (gnu_fat_type)) = build_pointer_type (tem); if (TYPE_ALIAS_SET_KNOWN_P (gnu_fat_type)) record_component_aliases (gnu_fat_type); @@ -2439,6 +2446,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition) ; else { + const Entity_Id PAT = Packed_Array_Impl_Type (gnat_entity); Entity_Id gnat_index, gnat_base_index; const bool convention_fortran_p = (Convention (gnat_entity) == Convention_Fortran); @@ -2844,7 +2852,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition) /* If this is a packed type implemented specially, then replace our type with the implementation type. */ - if (Present (Packed_Array_Impl_Type (gnat_entity))) + if (Present (PAT)) { /* First finish the type we had been making so that we output debugging information for it. */ @@ -2869,8 +2877,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition) this type again. */ save_gnu_tree (gnat_entity, gnu_tmp_decl, false); - gnu_type - = gnat_to_gnu_type (Packed_Array_Impl_Type (gn
Re: [PATCH] libstdc++: Avoid hard error in ranges::unique_copy [PR100770]
On 27/05/21 09:50 -0400, Patrick Palka via Libstdc++ wrote: On Wed, 26 May 2021, Tim Song wrote: On Wed, May 26, 2021 at 1:43 PM Patrick Palka wrote: > > On Wed, 26 May 2021, Tim Song wrote: > > > > On Wed, May 26, 2021 at 12:00 PM Patrick Palka via Libstdc++ > > wrote: > > > > > > - else if constexpr (input_iterator<_Out> > > > - && same_as, iter_value_t<_Out>>) > > > + else if constexpr (requires { requires (input_iterator<_Out> > > > + && same_as, > > > + iter_value_t<_Out>>); }) > > > > It's arguably cleaner to extract this into a concept which can then > > also be used in the constraint. > > Sounds good, though I'm not sure what name to give to this relatively > ad-hoc set of requirements. Any suggestions? :) > Something along the lines of "__can_reread_output", perhaps? Works for me. Here's v2, which factors out the condition into a concept and defines the value_type, pointer and reference of output_iterator_wrapper to void but leaves alone its difference_type. Tested on x86_64-pc-linux-gnu. OK for 10, 11 and trunk, thanks. -- >8 -- Subject: [PATCH] libstdc++: Avoid hard error in ranges::unique_copy [PR100770] Here, in the constexpr if condition within unique_copy, when input_iterator<_Out> isn't satisfied we must avoid substituting into iter_value_t<_Out> because the latter isn't necessarily well-formed in this case. To that end, this patch factors out the condition into a concept and uses it throughout. This patch also makes the definition of our testsuite output_iterator_wrapper more minimal by defining its value_type, pointer and reference member types to void. This means our existing tests for unique_copy already exercise the fix for this bug, so we don't need to add another test. The only other fallout of this testsuite iterator change appears in std/ranges/range.cc, where the use of range_value_t on a test_output_range is now ill-formed. libstdc++-v3/ChangeLog: * include/bits/ranges_algo.h (__detail::__can_reread_output): Factor out this concept from ... (__unique_copy_fn::operator()): ... here. Use the concept throughout. * testsuite/std/ranges/range.cc: Remove now ill-formed use of range_value_t on an output_range. * testsuite/util/testsuite_iterators.h (output_iterator_wrapper): Define value_type, pointer and reference member types to void. --- libstdc++-v3/include/bits/ranges_algo.h | 16 ++-- libstdc++-v3/testsuite/std/ranges/range.cc | 3 --- .../testsuite/util/testsuite_iterators.h | 2 +- 3 files changed, 11 insertions(+), 10 deletions(-) diff --git a/libstdc++-v3/include/bits/ranges_algo.h b/libstdc++-v3/include/bits/ranges_algo.h index cda3042c11f..ecf1378742d 100644 --- a/libstdc++-v3/include/bits/ranges_algo.h +++ b/libstdc++-v3/include/bits/ranges_algo.h @@ -1396,6 +1396,13 @@ namespace ranges inline constexpr __unique_fn unique{}; + namespace __detail + { +template + concept __can_reread_output = input_iterator<_Out> + && same_as<_Tp, iter_value_t<_Out>>; + } + template using unique_copy_result = in_out_result<_Iter, _Out>; @@ -1407,8 +1414,7 @@ namespace ranges projected<_Iter, _Proj>> _Comp = ranges::equal_to> requires indirectly_copyable<_Iter, _Out> && (forward_iterator<_Iter> - || (input_iterator<_Out> - && same_as, iter_value_t<_Out>>) + || __detail::__can_reread_output<_Out, iter_value_t<_Iter>> || indirectly_copyable_storable<_Iter, _Out>) constexpr unique_copy_result<_Iter, _Out> operator()(_Iter __first, _Sent __last, _Out __result, @@ -1432,8 +1438,7 @@ namespace ranges } return {__next, std::move(++__result)}; } - else if constexpr (input_iterator<_Out> - && same_as, iter_value_t<_Out>>) + else if constexpr (__detail::__can_reread_output<_Out, iter_value_t<_Iter>>) { *__result = *__first; while (++__first != __last) @@ -1467,8 +1472,7 @@ namespace ranges projected, _Proj>> _Comp = ranges::equal_to> requires indirectly_copyable, _Out> && (forward_iterator> - || (input_iterator<_Out> - && same_as, iter_value_t<_Out>>) + || __detail::__can_reread_output<_Out, range_value_t<_Range>> || indirectly_copyable_storable, _Out>) constexpr unique_copy_result, _Out> operator()(_Range&& __r, _Out __result, diff --git a/libstdc++-v3/testsuite/std/ranges/range.cc b/libstdc++-v3/testsuite/std/ranges/range.cc index 795aca472e5..aa29af471a4 100644 --- a/libstdc++-v3/testsuite/std/ranges/range.cc +++ b/libstdc++-v3/testsuite/std/ranges/range.cc @@ -75,9 +75,6 @@ static_assert( same_as, static_assert( same_as,
Re: [PATCH] libstdc++: Simplify range adaptors' forwarding of bound args when possible
On 14/05/21 14:27 -0400, Patrick Palka via Libstdc++ wrote: r11-8053 rewrote the range adaptor implementation in conformance with P2281, making partial application act like a SFINAE-friendly perfect forwarding call wrapper. Making SFINAE-friendliness coexist with perfect forwarding here requires adding fallback deleted operator() overloads (as described in e.g. section 5.5 of wg21.link/p0847r6). But unfortunately, as reported in PR100577, this necessary technique of using of deleted overloads regresses diagnostics for ill-formed calls to partially applied range adaptors in GCC. Although GCC's diagnostics can arguably be improved here by having the compiler explain why the other candidates weren't viable when overload resolution selects a deleted candidate, we can also largely work around this on the library side (and achieve more concise diagnostics than by a frontend-side improvement alone) if we take advantage of the observation that not all range adaptors need perfect forwarding call wrapper semantics, in fact only views::split currently needs it, because all other range adaptors either take no extra arguments or only arguments that are expected to be freely/cheaply copyable, e.g. function objects and integer-like types. (The discussion section of P2281 goes into detail about why views::split is special.) To that end, this introduces opt-in flags for denoting a range adaptor as having "simple" extra arguments (in the case of a range adaptor non-closure) or having a "simple" call operator (in the case of a range adaptor closure). These flags are then used to conditionally simplify the operator() for the generic _Partial and _Pipe class templates, down from needing three overloads thereof (including one defined as deleted) to just needing a single overload. The end result is that diagnostic quality is restored for all adaptors except for views::split, and diagnostics for the adaptors are generally made more concise since there's only a single _Partial/_Pipe overload to diagnose instead of three of them. Tested on x86_64-pc-linux-gnu, does this look OK for trunk/11? OK for trunk and 11. Is there any benefit in using [[no_unique_address]] for _Partial::_M_args too?
Re: GCC documentation: porting to Sphinx
On 6/2/21 6:44 PM, Joseph Myers wrote: On Wed, 2 Jun 2021, Joel Sherrill wrote: For RTEMS, we switched from texinfo to Sphinx and the dependency on Python3 for Sphinx has caused a bit of hassle. Is this going to be an issue for GCC? What Sphinx (and, thus, Python) versions does the GCC manual build work with? I've just tried version 1.7.6 which we use for libgccjit and it's fine: https://gcc.gnu.org/onlinedocs/jit/ About Python version: I'm not planning supporting Python2, it's dead 10 years already. Can it work with e.g. any Sphinx versions from the past five years, or are there newer Sphinx features that are critical for the GCC manuals? The mentioned version is 3 years and I expect even older releases would work. No, I don't rely on any new feature. I would recommend testing the build. You can simply clone: https://github.com/marxin/texi2rst-generated and simply run 'make html' or 'make latexpdf'. Basic dependencies are mentioned here: https://github.com/marxin/texi2rst-generated#requirements Martin I've seen a need for frequent Sphinx updates being a pain when building other software using Sphinx for its manual. Also we rely on TexLive for PDF output and that's a bit of a pain to install. Tex was incorrectly packaged on some RHEL/CentOS versions. This is nothing new, since building PDF manuals from Texinfo sources also needs TeX.
Re: [PATCH] Use _GLIBCXX_ASSERTIONS as _GLIBCXX_DEBUG light
On 27/05/21 19:37 +0200, François Dumont via Libstdc++ wrote: We have been talking for a long time of a debug mode with less impact on performances. We already have it, that's what _GLIBCXX_ASSERTIONS already is :-) I propose to simply use the existing _GLIBCXX_ASSERTIONS macro. libstdc++: [_GLIBCXX_ASSERTIONS] Activate basic debug checks Use _GLIBCXX_ASSERTIONS as a _GLIBCXX_DEBUG light mode. When defined it activates all _GLIBCXX_DEBUG checks but skipping those requiring to loop through the iterator range unless in case of constexpr. libstdc++-v3/ChangeLog: * include/debug/debug.h [_GLIBCXX_ASSERTIONS]: Define debug macros non-empty. * include/debug/helper_functions.h: Cleanup comment about removed _Iter_base. * include/debug/functions.h (__skip_debug_runtime_check): New, returns false if _GLIBCXX_DEBUG is defined or if constant evaluated. (__check_sorted, __check_partitioned_lower, __check_partitioned_upper): Use latter. Tested under Linux x64. Ok to commit ? François diff --git a/libstdc++-v3/include/debug/debug.h b/libstdc++-v3/include/debug/debug.h index 116f2f023e2..2e6ce1c8a93 100644 --- a/libstdc++-v3/include/debug/debug.h +++ b/libstdc++-v3/include/debug/debug.h @@ -61,7 +61,7 @@ namespace __gnu_debug struct _Safe_iterator; } -#ifndef _GLIBCXX_DEBUG +#ifndef _GLIBCXX_ASSERTIONS # define __glibcxx_requires_cond(_Cond,_Msg) # define __glibcxx_requires_valid_range(_First,_Last) diff --git a/libstdc++-v3/include/debug/functions.h b/libstdc++-v3/include/debug/functions.h index 6cac11f2abd..ee0eb877568 100644 --- a/libstdc++-v3/include/debug/functions.h +++ b/libstdc++-v3/include/debug/functions.h @@ -48,6 +48,25 @@ namespace __gnu_debug template struct _Is_contiguous_sequence : std::__false_type { }; + _GLIBCXX20_CONSTEXPR Should this be simply _GLIBCXX_CONSTEXPR so that it can be constexpr in C++14 mode too? Or are there are never any debug checks in functions that are already constexpr in C++14 or C++17? + inline bool + __skip_debug_runtime_check() + { +// We could be here while only _GLIBCXX_ASSERTIONS has been defined. +// In this case we skip expensive runtime checks, constexpr will still +// be checked. +return +#ifndef _GLIBCXX_DEBUG +# if _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED + !__builtin_is_constant_evaluated(); +# else + true; +# endif +#else + false; +#endif I think this would be simpler without the nesting, and without the preprocessor checks halfway through the return statement: #ifdef _GLIBCXX_DEBUG return false; #elif _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED return !__builtin_is_constant_evaluated(); #else return true; #endif + } + /* Checks that [first, last) is a valid range, and then returns * __first. This routine is useful when we can't use a separate * assertion statement because, e.g., we are in a constructor. @@ -260,8 +279,9 @@ namespace __gnu_debug inline bool __check_sorted(const _InputIterator& __first, const _InputIterator& __last) { - return __check_sorted_aux(__first, __last, - std::__iterator_category(__first)); + return __skip_debug_runtime_check() + || __check_sorted_aux(__first, __last, + std::__iterator_category(__first)); Currently this function is never called at all ifndef _GLIBCXX_DEBUG. With this change, it's going to be present for _GLIBCXX_ASSERTIONS, and if it isn't inlined it's going to explode the code size. Some linux distros are already building the entire distro with _GLIBCXX_ASSERTIONS so I think we need to be quite careful about this kind of large change affecting every algo. So maybe we shouldn't enable these checks via _GLIBCXX_ASSERTIONS, but a new macro.
Re: [PATCH][vect] Use main loop's thresholds and vectorization factor to narrow upper_bound of epilogue
Thank you Kewen!! I will apply this now. BR, Andre On 25/05/2021 09:42, Kewen.Lin wrote: on 2021/5/24 下午3:21, Kewen.Lin via Gcc-patches wrote: Hi Andre, on 2021/5/24 下午2:17, Andre Vieira (lists) via Gcc-patches wrote: Hi, When vectorizing with --param vect-partial-vector-usage=1 the vectorizer uses an unpredicated (all-true predicate for SVE) main loop and a predicated tail loop. The way this was implemented seems to mean it re-uses the same vector-mode for both loops, which means the tail loop isn't an actual loop but only executes one iteration. This patch uses the knowledge of the conditions to enter an epilogue loop to help come up with a potentially more restricive upper bound. Regression tested on aarch64-linux-gnu and also ran the testsuite using '--param vect-partial-vector-usage=1' detecting no ICEs and no execution failures. Would be good to have this tested for PPC too as I believe they are the main users of the --param vect-partial-vector-usage=1 option. Can someone help me test (and maybe even benchmark?) this on a PPC target? Thanks for doing this! I can test it on Power10 which enables this parameter by default, also evaluate its impact on SPEC2017 Ofast/unroll. Bootstrapped/regtested on powerpc64le-linux-gnu Power10. SPEC2017 run didn't show any remarkable improvement/degradation. BR, Kewen
Re: [PATCH] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]
On 6/2/21 7:46 PM, Xionghu Luo wrote: Hi, On 2021/6/3 06:20, Segher Boessenkool wrote: On Wed, Jun 02, 2021 at 03:19:32AM -0500, Xionghu Luo wrote: On P8LE, extra rot64+rot64 load or store instructions are generated in float128 to vector __int128 conversion. This patch teaches pass swaps to also handle such pattens to remove extra swap instructions. Did you check if this is already handled by simplify-rtx if the mode had been TImode (not V1TImode)? If not, why do you not handle it there? I tried to do it in combine or peephole, the later pass split2 or split3 will still split it to rotate + rotate again as we have split after reload, and this pattern is quite P8LE specific, so put it in pass swap. The simplify-rtx could simplify r124:KF#0=r123:KF#0<-<0x40<-<0x40 to r124:KF#0=r123:KF#0 for register operations already. vsx.md: ;; The post-reload split requires that we re-permute the source ;; register in case it is still live. (define_split [(set (match_operand:VSX_LE_128 0 "memory_operand") (match_operand:VSX_LE_128 1 "vsx_register_operand"))] "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed && !TARGET_P9_VECTOR && !altivec_indexed_or_indirect_operand (operands[0], mode)" [(const_int 0)] { rs6000_emit_le_vsx_permute (operands[1], operands[1], mode); rs6000_emit_le_vsx_permute (operands[0], operands[1], mode); rs6000_emit_le_vsx_permute (operands[1], operands[1], mode); DONE; }) Note also that swap optimization can handle more general cases than simplify-rtx. In my view it's best to have it covered in both places. Bill Thanks, Xionghu
Re: [PATCH] [i386] Fix ICE of insn does not satisfy its constraints.
On Thu, Jun 03, 2021 at 05:07:26PM +0800, liuhongt via Gcc-patches wrote: > @@ -18163,10 +18163,10 @@ (define_expand "v16qiv16si2" >"TARGET_AVX512F") > > (define_insn "avx2_v8qiv8si2" > - [(set (match_operand:V8SI 0 "register_operand" "=v") > + [(set (match_operand:V8SI 0 "register_operand" "=Yv") > (any_extend:V8SI > (vec_select:V8QI > - (match_operand:V16QI 1 "register_operand" "v") > + (match_operand:V16QI 1 "register_operand" "Yv") > (parallel [(const_int 0) (const_int 1) > (const_int 2) (const_int 3) > (const_int 4) (const_int 5) Why do you need this change (and similarly other v -> Yv changes)? I mean, ix86_hard_regno_mode_ok for TARGET_AVX512F && !TARGET_AVX512VL should return false for the 16-byte and 32-byte vector modes. The reason to use Yv is typically where the match_operand has 64-byte vector mode or scalar mode, yet it needs an AVX512VL instruction. The changes to use Yw look ok, that is for the cases where the insn requires both AVX512VL and AVX512BW, while ix86_hard_regno_mode_ok ensures the xmm16+ regs won't be used for the 16/32-byte vectors when AVX512VL is not on, it doesn't ensure that AVX512BW will be enabled. Jakub
Re: [RFC/PATCH] updating global ranges and their effect on __builtin_unreachable code
On 6/2/21 6:32 AM, Aldy Hernandez wrote: We've been having "issues" in our branch when exporting to the global space ranges that take into account previously known ranges (SSA_NAME_RANGE_INFO, etc). For the longest time we had the export feature turned off because it had the potential of removing __builtin_unreachable code early in the pipeline. This was causing one or two tests to fail. I finally got fed up, and investigated why. Take the following code: i_4 = somerandom (); if (i_4 < 0) goto ; [INV] else goto ; [INV] : __builtin_unreachable (); : It turns out that both legacy evrp and VRP have code that notices the above pattern and sets the *global* range for i_4 to [0,MAX]. That is, the range for i_4 is set, not at BB4, but at the definition site. See uses of assert_unreachable_fallthru_edge_p() for details. This global range causes subsequent passes (VRP1 in the testcase below), to remove the checks and the __builtin_unreachable code altogether. // pr80776-1.c int somerandom (void); void Foo (void) { int i = somerandom (); if (! (0 <= i)) __builtin_unreachable (); if (! (0 <= i && i <= 99)) __builtin_unreachable (); sprintf (number, "%d", i); } This means that by the time the -Wformat-overflow warning runs, the above sprintf has been left unguarded, and a bogus warning is issued. Currently the above test does not warn, but that's because of an oversight in export_global_ranges(). This function is disregarding known global ranges (SSA_NAME_RANGE_INFO and SSA_NAME_PTR_INFO) and only setting ranges the ranger knows about. For the above test the IL is: : i_4 = somerandom (); if (i_4 < 0) goto ; [INV] else goto ; [INV] : __builtin_unreachable (); : i.0_1 = (unsigned int) i_4; if (i.0_1 > 99) goto ; [INV] else goto ; [INV] : __builtin_unreachable (); : _7 = __builtin___sprintf_chk (&number, 1, 7, "%d", i_4); Legacy evrp has determined that the range for i_4 is [0,MAX] per my analysis above, but ranger has no known range for i_4 at the definition site. So at export_global_ranges time, ranger leaves the [0,MAX] alone. OTOH, evrp sets the global range at the definition for i.0_1 to [0,99] per the same unreachable feature. However, ranger has correctly determined that the range for i.0_1 at the definition is [0,MAX], which it then proceeds to export. Since the current export_global_ranges (mistakenly) does not take into account previous global ranges, the ranges in the global tables end up like this: i_4: [0, MAX] i.0_1: [0, MAX] This causes the first unreachable block to be removed in VRP1, but the second one to remain. Later VRP can determine that i_4 in the sprintf call is [0,99], and no warning is issued. But... the missing bogus warning is due to current export_global_ranges ignoring SSA_NAME_RANGE_INFO and friends, something which I'd like to fix. However, fixing this, gets us back to: i_4: [0, MAX] i.0_1: [0, 99] Which means, we'll be back to removing the unreachable blocks and issuing a warning in pr80776-1.c (like we have been since the beginning of time). The attached patch fixes export_global_ranges to the expected behavior, and adds the previous XFAIL to pr80776-1.c, while documenting why this warning is issued in the first place. Once legacy evrp is removed, this won't be an issue, as ranges in the IL will tell the truth. However, this will mean that we will no longer remove the first __builtin_unreachable combo. But ISTM, that would be correct behavior ??. Tested on x86-64 Linux. OK? If I understand correctly, we will have to ensure that __builtin_unreachable () and the branches to them are removed by the end of gimple one way or another. While legacy EVRP is still running, that isnt a problem. This patch merely changes the values we export from ranger to be more up to date... and some fallout you've encountered. This might get taken care of down the road.. there is some potential for identifying a range that dominates all uses and making that the global range.. that would also resolve the issue of builtin_unreachable(). Anyway, we can deal with this when we try to move to ranger-only mode OK.
Re: rs6000: Require ELFv2 ABI for ROP test (PR100750)
On 6/1/21 7:44 PM, Segher Boessenkool wrote: It would be nice if we had a selector for when we can use -mrop-protect, instead of assuming it is only for ELFv2. Agreed -- I'll take a note and try to get to this sometime.
[committed] diagnostic-show-locus: tweak rejection logic
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. Pushed to trunk as 4bdc6e17c9dc621f9502a84abceb2e17ae0418f8. gcc/ChangeLog: * diagnostic-show-locus.c (diagnostic_show_locus): Don't reject printing the same location twice if there are fix-it hints, multiple locations, or a label. Signed-off-by: David Malcolm --- gcc/diagnostic-show-locus.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/gcc/diagnostic-show-locus.c b/gcc/diagnostic-show-locus.c index 4111cd66544..24bd0313126 100644 --- a/gcc/diagnostic-show-locus.c +++ b/gcc/diagnostic-show-locus.c @@ -2600,9 +2600,11 @@ diagnostic_show_locus (diagnostic_context * context, return; /* Don't print the same source location twice in a row, unless we have - fix-it hints. */ + fix-it hints, or multiple locations, or a label. */ if (loc == context->last_location - && richloc->get_num_fixit_hints () == 0) + && richloc->get_num_fixit_hints () == 0 + && richloc->get_num_locations () == 1 + && richloc->get_range (0)->m_label == NULL) return; context->last_location = loc; -- 2.26.3
[committed] analyzer: show types for poisoned_svalue and compound_svalue
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. Pushed to trunk as e84fe25f638efe1ead4304693d91d7555e7a. gcc/analyzer/ChangeLog: * svalue.cc (poisoned_svalue::dump_to_pp): Dump type. (compound_svalue::dump_to_pp): Dump any type. Signed-off-by: David Malcolm --- gcc/analyzer/svalue.cc | 29 +++-- 1 file changed, 23 insertions(+), 6 deletions(-) diff --git a/gcc/analyzer/svalue.cc b/gcc/analyzer/svalue.cc index 897e84e8464..a16563d912a 100644 --- a/gcc/analyzer/svalue.cc +++ b/gcc/analyzer/svalue.cc @@ -735,9 +735,17 @@ void poisoned_svalue::dump_to_pp (pretty_printer *pp, bool simple) const { if (simple) -pp_printf (pp, "POISONED(%s)", poison_kind_to_str (m_kind)); +{ + pp_string (pp, "POISONED("); + print_quoted_type (pp, get_type ()); + pp_printf (pp, ", %s)", poison_kind_to_str (m_kind)); +} else -pp_printf (pp, "poisoned_svalue(%s)", poison_kind_to_str (m_kind)); +{ + pp_string (pp, "poisoned_svalue("); + print_quoted_type (pp, get_type ()); + pp_printf (pp, ", %s)", poison_kind_to_str (m_kind)); +} } /* Implementation of svalue::accept vfunc for poisoned_svalue. */ @@ -1228,17 +1236,26 @@ compound_svalue::dump_to_pp (pretty_printer *pp, bool simple) const if (simple) { pp_string (pp, "COMPOUND("); + if (get_type ()) + { + print_quoted_type (pp, get_type ()); + pp_string (pp, ", "); + } + pp_character (pp, '{'); m_map.dump_to_pp (pp, simple, false); - pp_character (pp, ')'); + pp_string (pp, "})"); } else { pp_string (pp, "compound_svalue ("); - pp_string (pp, ", "); + if (get_type ()) + { + print_quoted_type (pp, get_type ()); + pp_string (pp, ", "); + } pp_character (pp, '{'); m_map.dump_to_pp (pp, simple, false); - pp_string (pp, "}, "); - pp_character (pp, ')'); + pp_string (pp, "})"); } } -- 2.26.3
[committed] analyzer: remove unused prototypes
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. Pushed to trunk as 981d98b883ed521c88c295ed82227c605d82add4. gcc/analyzer/ChangeLog: * store.h (store::get_direct_binding): Remove unused decl. (store::get_default_binding): Likewise. Signed-off-by: David Malcolm --- gcc/analyzer/store.h | 2 -- 1 file changed, 2 deletions(-) diff --git a/gcc/analyzer/store.h b/gcc/analyzer/store.h index dc22d96f186..d68513ca94c 100644 --- a/gcc/analyzer/store.h +++ b/gcc/analyzer/store.h @@ -611,8 +611,6 @@ public: json::object *to_json () const; - const svalue *get_direct_binding (store_manager *mgr, const region *reg); - const svalue *get_default_binding (store_manager *mgr, const region *reg); const svalue *get_any_binding (store_manager *mgr, const region *reg) const; bool called_unknown_fn_p () const { return m_called_unknown_fn; } -- 2.26.3
Re: [Patch, fortran] PR fortran/100120/100816/100818/100819/100821 problems raised by aggregate data types
Hi José, Patch tested only on x86_64-pc-linux-gnu. Also tested on darwin20. The patch is OK for me. Thanks for the work, Dominique
Re: [Patch] Fortran/OpenMP: Add omp loop [PR99928]
On Thu, Jun 03, 2021 at 12:30:50AM +0200, Tobias Burnus wrote: > This patch adds support for 'omp loop' to gfortran including the combined > constructs. It also fixes some splitting issues with clauses in > combined constructs. > > It does not attempt to clean up all remaining Fortran issues with > clauses in combined constructs (cf. below + PR). > > * * * > > Since 'parallel loop' is now supported, using > !$omp parallel & > !$acc& loop > no longer gave an error. (Same result before: '!$acc& do'.) I think best would be just to remove that part of the testcase in the loop patch and handle the !$omp with !$acc continuations and vice versa in a separate change. That seems to be a preexisting bug not really related to whether we support loop or not. fatal_error is certainly not ideal, but I can understand fixing it otherwise might be hard. Wonder if we just shouldn't treat the incorrect continuations as if they were simple comments. > The check does (for Fortran – and alike for C): > ! { dg-final { scan-tree-dump-not "omp > distribute\[^\n\r]*lastprivate\\(j00\\)" "gimple" } } > ! { dg-final { scan-tree-dump-not "omp > parallel\[^\n\r]*lastprivate\\(j00\\)" "gimple" } } > ! { dg-final { scan-tree-dump-not "omp for\[^\n\r]*lastprivate\\(j00\\)" > "gimple" } } > ! { dg-final { scan-tree-dump "omp simd\[^\n\r]*linear\\(j00:1\\)" "gimple" > } } > !$omp distribute parallel do simd default(none) > do j00 = 1, 64 > end do > > While C generates (original dump) > > #pragma omp distribute > #pragma omp parallel default(none) > #pragma omp for nowait > #pragma omp simd > > Fortran generates the same except for: > #pragma omp simd linear(j00:1) The *-7 testcase is not appropriate for Fortran, it tests the IV declared in the construct cases which Fortran has no counterpart for, even if the IV is first seen within the construct, that would implicitly declare it outside of the construct. > + if (gfc_match ("teams )") == MATCH_YES) > + c->bind = OMP_BIND_TEAMS; > + else if (gfc_match ("parallel )") == MATCH_YES) > + c->bind = OMP_BIND_PARALLEL; > + else if (gfc_match ("thread )") == MATCH_YES) > + c->bind = OMP_BIND_THREAD; > + else > + { > + gfc_error ("Expected TEAMS, PARALLEL or THEAD as binding in " Typo, s/THEAD/THREAD/ > + if (clauses->bind == OMP_BIND_TEAMS) > +OMP_CLAUSE_BIND_KIND (c) = OMP_CLAUSE_BIND_TEAMS; > + else if (clauses->bind == OMP_BIND_PARALLEL) > +OMP_CLAUSE_BIND_KIND (c) = OMP_CLAUSE_BIND_PARALLEL; > + else if (clauses->bind == OMP_BIND_THREAD) > +OMP_CLAUSE_BIND_KIND (c) = OMP_CLAUSE_BIND_THREAD; > + else > + gcc_unreachable (); Wouldn't a switch (clauses->bind) be cleaner? Otherwise LGTM. Jakub
Re: [PATCH 1/2] Implement generic expression evaluator for range_query.
Andrew had some minor suggestions and cleanups after I posted this. Andrew, is this what you had in mind? Aldy On Sat, May 29, 2021 at 6:06 AM Jeff Law wrote: > > > > On 5/27/2021 2:55 AM, Aldy Hernandez via Gcc-patches wrote: > > Right now, range_of_expr only works with constants, SSA names, and > > pointers. Anything else gets returned as VARYING. This patch adds the > > capability to deal with arbitrary expressions, inasmuch as these > > tree codes are implemented in range-ops.cc. > > > > This will give us the ability to ask for the range of any tree expression, > > not just constants and SSA names, with range_of_expr(). > > > > This is a more generic implementation of determine_value_range in VRP. > > A follow-up patch will remove all uses of it in favor of the > > range_query API. > > > > Tested on x86-64 Linux. > > > > OK? > > > > gcc/ChangeLog: > > > > * function-tests.c (test_ranges): Call gimple_range_tests. > > * gimple-range-gori.cc (gori_compute::expr_range_at_stmt): Use > > get_global_range_query instead of get_tree_range. > > * gimple-range.cc (fur_source::get_operand): Add argument to > > get_tree_range. > > (get_arith_expr_range): New. > > (get_tree_range): Add gimple and range_query arguments. > > Call get_arith_expr_range. > > (gimple_ranger::range_of_expr): Add argument to get_tree_range. > > Include gimple-range-tests.cc. > > * gimple-range.h (get_tree_range): Add argument. > > * selftest.h (gimple_range_tests): New. > > * value-query.cc (global_range_query::range_of_expr): Add > > argument to get_tree_range. > > * vr-values.c (vr_values::range_of_expr): Same. > > * gimple-range-tests.cc: New file. > Both patches in this series are fine. > > Thanks, > Jeff > From fe8f95d2baf92fc5fa967c5d183c5480f003e15d Mon Sep 17 00:00:00 2001 From: Aldy Hernandez Date: Wed, 26 May 2021 08:40:17 +0200 Subject: [PATCH 1/2] Implement generic expression evaluator for range_query. Right now, range_of_expr only works with constants, SSA names, and pointers. Anything else gets returned as VARYING. This patch adds the capability to deal with arbitrary expressions, inasmuch as these tree codes are implemented in range-ops.cc. This will give us the ability to ask for the range of any tree expression, not just constants and SSA names, with range_of_expr(). This is a more generic implementation of determine_value_range in VRP. A follow-up patch will remove all uses of it in favor of the range_query API. gcc/ChangeLog: * function-tests.c (test_ranges): Call gimple_range_tests. * gimple-range-cache.cc (ranger_cache::range_of_expr): Pass stmt to get_tree_range. * gimple-range-gori.cc (gori_compute::expr_range_at_stmt): Use get_global_range_query instead of get_tree_range. * gimple-range.cc (fur_source::get_operand): Add argument to get_tree_range. (get_tree_range): Move to value-query.cc. Call get_arith_expr_range. (gimple_ranger::range_of_expr): Add argument to get_tree_range. Include gimple-range-tests.cc. * gimple-range.h (fold_range): Add argument. (get_tree_range): Remove. * selftest.h (gimple_range_tests): New. * value-query.cc (global_range_query::range_of_expr): Add stmt argument. (range_query::get_tree_range): Move from gimple-range.cc. * value-query.h (class range_query): Add get_tree_range and get_arith_expr_range. Make fur_source a friend. * vr-values.c (vr_values::range_of_expr): Pass stmt to get_tree_range. * gimple-range-tests.cc: New file. --- gcc/function-tests.c | 5 +++ gcc/gimple-range-cache.cc | 4 +- gcc/gimple-range-tests.cc | 72 + gcc/gimple-range.cc | 64 ++--- gcc/gimple-range.h| 7 ++-- gcc/selftest.h| 1 + gcc/value-query.cc| 84 ++- gcc/value-query.h | 2 + gcc/vr-values.c | 2 +- 9 files changed, 172 insertions(+), 69 deletions(-) create mode 100644 gcc/gimple-range-tests.cc diff --git a/gcc/function-tests.c b/gcc/function-tests.c index 1b8f665cde1..0ac1d37f8ff 100644 --- a/gcc/function-tests.c +++ b/gcc/function-tests.c @@ -581,6 +581,11 @@ test_ranges () push_cfun (fun); range_tests (); range_op_tests (); + + build_cfg (fndecl); + convert_to_ssa (fndecl); + gimple_range_tests (); + pop_cfun (); } diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc index cc27574b7b4..c58acf48dfb 100644 --- a/gcc/gimple-range-cache.cc +++ b/gcc/gimple-range-cache.cc @@ -827,7 +827,7 @@ ranger_cache::range_of_expr (irange &r, tree name, gimple *stmt) { if (!gimple_range_ssa_p (name)) { - get_tree_range (r, name); + get_tree_range (r, name, stmt); return true; } @@ -860,7 +860,7 @@ ranger_cache::range_of_expr (irange &r, tree name, gimple *stmt) } } else -get_tree_range (r, expr); +get_tree_range (r, expr, NULL); return false; } d
Re: [PATCH 1/2] Implement generic expression evaluator for range_query.
On 6/3/21 11:39 AM, Aldy Hernandez via Gcc-patches wrote: Andrew had some minor suggestions and cleanups after I posted this. Andrew, is this what you had in mind? Aldy On Sat, May 29, 2021 at 6:06 AM Jeff Law wrote: On 5/27/2021 2:55 AM, Aldy Hernandez via Gcc-patches wrote: Right now, range_of_expr only works with constants, SSA names, and pointers. Anything else gets returned as VARYING. This patch adds the capability to deal with arbitrary expressions, inasmuch as these tree codes are implemented in range-ops.cc. This will give us the ability to ask for the range of any tree expression, not just constants and SSA names, with range_of_expr(). This is a more generic implementation of determine_value_range in VRP. A follow-up patch will remove all uses of it in favor of the range_query API. Tested on x86-64 Linux. OK? gcc/ChangeLog: * function-tests.c (test_ranges): Call gimple_range_tests. * gimple-range-gori.cc (gori_compute::expr_range_at_stmt): Use get_global_range_query instead of get_tree_range. * gimple-range.cc (fur_source::get_operand): Add argument to get_tree_range. (get_arith_expr_range): New. (get_tree_range): Add gimple and range_query arguments. Call get_arith_expr_range. (gimple_ranger::range_of_expr): Add argument to get_tree_range. Include gimple-range-tests.cc. * gimple-range.h (get_tree_range): Add argument. * selftest.h (gimple_range_tests): New. * value-query.cc (global_range_query::range_of_expr): Add argument to get_tree_range. * vr-values.c (vr_values::range_of_expr): Same. * gimple-range-tests.cc: New file. Both patches in this series are fine. Thanks, Jeff yeah, OK. Andrew
Re: [PATCH] wwwdocs: Do not rewrite the page titles
Ping. Is this OK now? On 18/04/21 23:45 +0100, Jonathan Wakely wrote: Remove GNU and FSF attribution from HTML page titles. I don't see why we should have to "comply with the GNU style" if we're truly an independent project run by the GCC developers and aided by the steering committee. OK for wwwdocs? commit d157082f49725510560cb83f2f1c045e2968ad3b Author: Jonathan Wakely Date: Sun Apr 18 23:42:26 2021 +0100 Do not rewrite the page titles Remove GNU and FSF attribution from HTML page titles. diff --git a/htdocs/style.mhtml b/htdocs/style.mhtml index 677fd05a..eb7c353d 100644 --- a/htdocs/style.mhtml +++ b/htdocs/style.mhtml @@ -32,14 +32,6 @@ -;;; Redefine the tag to comply with the GNU style. - - - -%body -- GNU Project - Free Software Foundation (FSF) - - ;;; Redefine the tag, adding navigation and a standard footer.
[c-family] Fix issue for external subtypes with -fdump-ada-spec
This works around an irregularity of the language whereby subtypes, unlike types, are not visible through a limited_with clause. Tested on x86-64/Linux, applied on the mainline. 2021-06-03 Eric Botcazou c-family/ * c-ada-spec.c (pp_ada_tree_identifier): Tidy up. (dump_ada_node) : Deal specially with external subtypes. -- Eric Botcazoudiff --git a/gcc/c-family/c-ada-spec.c b/gcc/c-family/c-ada-spec.c index ef0c74c3f08..751cc0edef8 100644 --- a/gcc/c-family/c-ada-spec.c +++ b/gcc/c-family/c-ada-spec.c @@ -1341,49 +1341,46 @@ pp_ada_tree_identifier (pretty_printer *buffer, tree node, tree type, char *s = to_ada_name (name, &space_found); tree decl = get_underlying_decl (type); - /* If the entity comes from another file, generate a package prefix. */ if (decl) { - expanded_location xloc = expand_location (decl_sloc (decl, false)); + /* If the entity comes from another file, generate a package prefix. */ + const expanded_location xloc = expand_location (decl_sloc (decl, false)); - if (xloc.file && xloc.line) + if (xloc.line && xloc.file && xloc.file != current_source_file) { - if (xloc.file != current_source_file) + switch (TREE_CODE (type)) { - switch (TREE_CODE (type)) - { - case ENUMERAL_TYPE: - case INTEGER_TYPE: - case REAL_TYPE: - case FIXED_POINT_TYPE: - case BOOLEAN_TYPE: - case REFERENCE_TYPE: - case POINTER_TYPE: - case ARRAY_TYPE: - case RECORD_TYPE: - case UNION_TYPE: - case TYPE_DECL: - if (package_prefix) - { - char *s1 = get_ada_package (xloc.file); - append_withs (s1, limited_access); - pp_string (buffer, s1); - pp_dot (buffer); - free (s1); - } - break; - default: - break; - } + case ENUMERAL_TYPE: + case INTEGER_TYPE: + case REAL_TYPE: + case FIXED_POINT_TYPE: + case BOOLEAN_TYPE: + case REFERENCE_TYPE: + case POINTER_TYPE: + case ARRAY_TYPE: + case RECORD_TYPE: + case UNION_TYPE: + case TYPE_DECL: + if (package_prefix) + { + char *s1 = get_ada_package (xloc.file); + append_withs (s1, limited_access); + pp_string (buffer, s1); + pp_dot (buffer); + free (s1); + } + break; + default: + break; + } - /* Generate the additional package prefix for C++ classes. */ - if (separate_class_package (decl)) - { - pp_string (buffer, "Class_"); - pp_string (buffer, s); - pp_dot (buffer); - } - } + /* Generate the additional package prefix for C++ classes. */ + if (separate_class_package (decl)) + { + pp_string (buffer, "Class_"); + pp_string (buffer, s); + pp_dot (buffer); + } } } @@ -2220,6 +2217,24 @@ dump_ada_node (pretty_printer *buffer, tree node, tree type, int spc, { tree type_name = TYPE_NAME (TREE_TYPE (node)); + /* Generate "access " instead of "access " + if the subtype comes from another file, because subtype + declarations do not contribute to the limited view of a + package and thus subtypes cannot be referenced through + a limited_with clause. */ + if (type_name + && TREE_CODE (type_name) == TYPE_DECL + && DECL_ORIGINAL_TYPE (type_name) + && TYPE_NAME (DECL_ORIGINAL_TYPE (type_name))) + { + const expanded_location xloc + = expand_location (decl_sloc (type_name, false)); + if (xloc.line + && xloc.file + && xloc.file != current_source_file) + type_name = DECL_ORIGINAL_TYPE (type_name); + } + /* For now, handle access-to-access as System.Address. */ if (TREE_CODE (TREE_TYPE (node)) == POINTER_TYPE) { @@ -2241,8 +2256,8 @@ dump_ada_node (pretty_printer *buffer, tree node, tree type, int spc, { if (!type || TREE_CODE (type) != FUNCTION_DECL) { - pp_string (buffer, "access "); is_access = true; + pp_string (buffer, "access "); if (quals & TYPE_QUAL_CONST) pp_string (buffer, "constant ");
[c-family] Fix issue for nested record types with -fdump-ada-spec
Ada does not support anonymous record declarations nested in other record declarations so -fdump-ada-spec needs to unnest them, and this contains a few fixes for this machinery. Tested on x86-64/Linux, applied on the mainline. 2021-06-03 Eric Botcazou c-family/ * c-ada-spec.c (dump_ada_macros): Minor tweaks. (dump_ada_decl_name): Likewise. (dump_anonymous_type_name): Remove parent parameter and adjust. (dump_sloc): Minor tweak. (dump_ada_array_type): Remove type parameter and adjust. (dump_ada_enum_type): Remove parent parameter and adjust. (dump_ada_node): Adjust calls to above functions. (dumped_anonymous_types): New global variable. (dump_nested_types_1): Rename into... (dump_nested_types): ...this. (dump_nested_type): Remove parent and dumped_types parameters. : Replace dumped_types with dumped_anonymous_types. Adjust calls to dump_anonymous_type_name and dump_ada_array_type. (dump_ada_specs): Initialize and free dumped_anonymous_types. -- Eric Botcazoudiff --git a/gcc/c-family/c-ada-spec.c b/gcc/c-family/c-ada-spec.c index 751cc0edef8..a2669c604a9 100644 --- a/gcc/c-family/c-ada-spec.c +++ b/gcc/c-family/c-ada-spec.c @@ -408,8 +408,8 @@ dump_ada_macros (pretty_printer *pp, const char* file) } else { - chars_seen = sprintf - ((char *) buffer, "Character'Val (%d)", (int) c); + chars_seen = sprintf ((char *) buffer, + "Character'Val (%d)", (int) c); buffer += chars_seen; } } @@ -611,7 +611,7 @@ dump_ada_macros (pretty_printer *pp, const char* file) pp_string (pp, "; -- "); pp_string (pp, sloc.file); pp_colon (pp); - pp_scalar (pp, "%d", sloc.line); + pp_decimal_int (pp, sloc.line); pp_newline (pp); } else @@ -1464,28 +1464,21 @@ dump_ada_decl_name (pretty_printer *buffer, tree decl, bool limited_access) { pp_string (buffer, "anon"); if (TREE_CODE (decl) == FIELD_DECL) - pp_scalar (buffer, "%d", DECL_UID (decl)); + pp_decimal_int (buffer, DECL_UID (decl)); else - pp_scalar (buffer, "%d", TYPE_UID (TREE_TYPE (decl))); + pp_decimal_int (buffer, TYPE_UID (TREE_TYPE (decl))); } else if (TREE_CODE (type_name) == IDENTIFIER_NODE) pp_ada_tree_identifier (buffer, type_name, decl, limited_access); } } -/* Dump in BUFFER a name for the type T, which is a _TYPE without TYPE_NAME. - PARENT is the parent node of T. */ +/* Dump in BUFFER a name for the type T, which is a TYPE without TYPE_NAME. */ static void -dump_anonymous_type_name (pretty_printer *buffer, tree t, tree parent) +dump_anonymous_type_name (pretty_printer *buffer, tree t) { - if (DECL_NAME (parent)) -pp_ada_tree_identifier (buffer, DECL_NAME (parent), parent, false); - else -{ - pp_string (buffer, "anon"); - pp_scalar (buffer, "%d", TYPE_UID (TREE_TYPE (parent))); -} + pp_string (buffer, "anon"); switch (TREE_CODE (t)) { @@ -1506,7 +1499,7 @@ dump_anonymous_type_name (pretty_printer *buffer, tree t, tree parent) break; } - pp_scalar (buffer, "%d", TYPE_UID (t)); + pp_decimal_int (buffer, TYPE_UID (t)); } /* Dump in BUFFER aspect Import on a given node T. SPC is the current @@ -1757,12 +1750,12 @@ dump_sloc (pretty_printer *buffer, tree node) { expanded_location xloc; - xloc.file = NULL; - if (DECL_P (node)) xloc = expand_location (DECL_SOURCE_LOCATION (node)); else if (EXPR_HAS_LOCATION (node)) xloc = expand_location (EXPR_LOCATION (node)); + else +xloc.file = NULL; if (xloc.file) { @@ -1790,11 +1783,11 @@ is_char_array (tree t) && id_equal (DECL_NAME (TYPE_NAME (t)), "char"); } -/* Dump in BUFFER an array type NODE of type TYPE in Ada syntax. SPC is the - indentation level. */ +/* Dump in BUFFER an array type NODE in Ada syntax. SPC is the indentation + level. */ static void -dump_ada_array_type (pretty_printer *buffer, tree node, tree type, int spc) +dump_ada_array_type (pretty_printer *buffer, tree node, int spc) { const bool char_array = is_char_array (node); @@ -1823,8 +1816,8 @@ dump_ada_array_type (pretty_printer *buffer, tree node, tree type, int spc) || (!RECORD_OR_UNION_TYPE_P (tmp) && TREE_CODE (tmp) != ENUMERAL_TYPE)) dump_ada_node (buffer, tmp, node, spc, false, true); - else if (type) - dump_anonymous_type_name (buffer, tmp, type); + else + dump_anonymous_type_name (buffer, tmp); } } @@ -1954,11 +1947,10 @@ is_simple_enum (tree node) } /* Dump in BUFFER the declaration of enumeral NODE of type TYPE in Ada syntax. - PARENT is the parent node of NODE. SPC is the indentation level. */ + SPC is the indentation level. */ static void -dump_ada_enum_type (pretty_printer *buffer, tree node, tree type, tree parent, - int spc) +dump_ada_enum_type (pretty_printer *buffer, tree node, tree type, int spc) { if (is_simple_enum
[c-family] Fix duplicate name issues in output of -fdump-ada-spec
The namespace rules are different in the C family of languages and in Ada, and a few adjustments are further needed in -fdump-ada-spec because of them. Tested on x86-64/Linux, applied on the mainline. 2021-06-03 Eric Botcazou c-family/ * c-ada-spec.c (dump_ada_enum_type): Dump a prefix for constants. (htable_t): New typedef. (overloaded_names): Use it. (add_name): New function. (init_overloaded_names): Use add_name to populate the table and add special cases for sigaction and stat. (overloaded_name_p): Rename into... (overloading_index): ...this. Do not initialize overloaded_names table here. Return the index or zero. (dump_ada_declaration): Minor tweaks. Do not skip overloaded functions but add an overloading suffix instead. (dump_ada_specs): Initialize overloaded_names tables here. -- Eric Botcazoudiff --git a/gcc/c-family/c-ada-spec.c b/gcc/c-family/c-ada-spec.c index 29eb0b01a91..ef0c74c3f08 100644 --- a/gcc/c-family/c-ada-spec.c +++ b/gcc/c-family/c-ada-spec.c @@ -2003,7 +2003,15 @@ dump_ada_enum_type (pretty_printer *buffer, tree node, tree type, tree parent, pp_semicolon (buffer); newline_and_indent (buffer, spc); + if (TYPE_NAME (node)) + dump_ada_node (buffer, node, NULL_TREE, spc, false, true); + else if (type) + dump_ada_node (buffer, type, NULL_TREE, spc, false, true); + else + dump_anonymous_type_name (buffer, node, parent); + pp_underscore (buffer); pp_ada_tree_identifier (buffer, TREE_PURPOSE (value), node, false); + pp_string (buffer, " : constant "); if (TYPE_NAME (node)) @@ -2628,11 +2636,31 @@ struct overloaded_name_hasher : delete_ptr_hash { return a->name == b->name; } }; -static hash_table *overloaded_names; +typedef hash_table htable_t; + +static htable_t *overloaded_names; + +/* Add an overloaded NAME with N occurrences to TABLE. */ + +static void +add_name (const char *name, unsigned int n, htable_t *table) +{ + struct overloaded_name_hash in, *h, **slot; + tree id = get_identifier (name); + hashval_t hash = htab_hash_pointer (id); + in.hash = hash; + in.name = id; + slot = table->find_slot_with_hash (&in, hash, INSERT); + h = new overloaded_name_hash; + h->hash = hash; + h->name = id; + h->n = n; + *slot = h; +} /* Initialize the table with the problematic overloaded names. */ -static hash_table * +static htable_t * init_overloaded_names (void) { static const char *names[] = @@ -2640,41 +2668,31 @@ init_overloaded_names (void) { "memchr", "rawmemchr", "memrchr", "strchr", "strrchr", "strchrnul", "strpbrk", "strstr", "strcasestr", "index", "rindex", "basename" }; - hash_table *table -= new hash_table (64); + htable_t *table = new htable_t (64); for (unsigned int i = 0; i < ARRAY_SIZE (names); i++) -{ - struct overloaded_name_hash in, *h, **slot; - tree id = get_identifier (names[i]); - hashval_t hash = htab_hash_pointer (id); - in.hash = hash; - in.name = id; - slot = table->find_slot_with_hash (&in, hash, INSERT); - h = new overloaded_name_hash; - h->hash = hash; - h->name = id; - h->n = 0; - *slot = h; -} +add_name (names[i], 0, table); + + /* Consider that sigaction() is overloaded by struct sigaction for QNX. */ + add_name ("sigaction", 1, table); + + /* Consider that stat() is overloaded by struct stat for QNX. */ + add_name ("stat", 1, table); return table; } -/* Return whether NAME cannot be supported as overloaded name. */ +/* Return the overloading index of NAME or 0 if NAME is not overloaded. */ -static bool -overloaded_name_p (tree name) +static unsigned int +overloading_index (tree name) { - if (!overloaded_names) -overloaded_names = init_overloaded_names (); - struct overloaded_name_hash in, *h; hashval_t hash = htab_hash_pointer (name); in.hash = hash; in.name = name; h = overloaded_names->find_with_hash (&in, hash); - return h && ++h->n > 1; + return h ? ++h->n : 0; } /* Dump in BUFFER constructor spec corresponding to T for TYPE. */ @@ -2798,14 +2816,17 @@ dump_ada_declaration (pretty_printer *buffer, tree t, tree type, int spc) } /* Skip unnamed or anonymous structs/unions/enum types. */ - if (!orig && !decl_name && !name + if (!orig && (RECORD_OR_UNION_TYPE_P (TREE_TYPE (t)) - || TREE_CODE (TREE_TYPE (t)) == ENUMERAL_TYPE)) + || TREE_CODE (TREE_TYPE (t)) == ENUMERAL_TYPE) + && !decl_name + && !name) return 0; - /* Skip anonymous enum types (duplicates of real types). */ + /* Skip duplicates of structs/unions/enum types built in C++. */ if (!orig - && TREE_CODE (TREE_TYPE (t)) == ENUMERAL_TYPE + && (RECORD_OR_UNION_TYPE_P (TREE_TYPE (t)) + || TREE_CODE (TREE_TYPE (t)) == ENUMERAL_TYPE) && decl_name && (*IDENTIFIER_POINTER (decl_name) == '.' || *IDENTIFIER_POINTER (decl_
Re: [PATCH] wwwdocs: Do not rewrite the page titles
On 03/06/21 16:50 +0100, Jonathan Wakely wrote: Ping. Is this OK now? On 18/04/21 23:45 +0100, Jonathan Wakely wrote: Remove GNU and FSF attribution from HTML page titles. I don't see why we should have to "comply with the GNU style" if we're truly an independent project run by the GCC developers and aided by the steering committee. OK for wwwdocs? An alternative change would be to just drop the mention of the FSF, since they don't fund GCC or provide hosting etc., as in the attached patch. And as I pointed out previously, none of these sites refer to the FSF in their page s: https://www.gnu.org/software/gdb/ https://www.gnu.org/software/libc/ https://www.gnu.org/software/guile/ https://www.gnu.org/software/emacs/ https://guix.gnu.org/ (Some don't refer to the GNU project either.) diff --git a/htdocs/style.mhtml b/htdocs/style.mhtml index 677fd05a..1f7139eb 100644 --- a/htdocs/style.mhtml +++ b/htdocs/style.mhtml @@ -32,12 +32,12 @@ -;;; Redefine the tag to comply with the GNU style. +;;; Redefine the tag to mention the GNU project. %body -- GNU Project - Free Software Foundation (FSF) +- GNU Project ;;; Redefine the tag, adding navigation and a standard footer.
Re: [PATCH 04/11] cris: Update unexpected empty split condition
> From: Kewen.Lin > Date: Thu, 3 Jun 2021 07:45:57 +0200 > on 2021/6/2 Hans-Peter Nilsson wrote: > >> From: Kewen Lin > >> Date: Wed, 2 Jun 2021 07:04:54 +0200 > > > >> gcc/ChangeLog: > >> > >>* config/cris/cris.md (*addi_reload): Fix empty split condition. > >> - "" > >> + "&& 1" > > Ok, thanks, if only for all-round consistency. > > > > In preparation for a warning for an empty condition? I'm > > usually all for .md-warnings, but I'm not sure about the > > benefit of that one, though. Those "&& 1" look...hackish. > > Thanks! Yeah, the 01/11 patch aims to raise one error message > for the define_insn_and_split whose split condition is empty > while insn condition isn't. In most cases, when we write one > define_insn_and_split we want the splitting only to take effect > while we see the define_insn matching happen (insn cond holds), > but if we leave the split condition empty, the splitting will > be done always, it could result in some unexpected consequence. > Mostly this is unintentional. It certainly was in the patch above! > The error message is to avoid > people to make it unintentionally. > > As you may have seen from the discussion under the 00/11 thread, > we will probably end up with some other solution, so I will hold > the changes for the ports, sorry for wasting your time and the > other port maintainers'. No worries: I certainly don't consider it wasted and I'd prefer to have the patch above committed sooner than the conclusion of that discussion. (If you don't get to it, I'll do it, after a round of testing.) If you're considering further target patches to adjust for eventually changed semantics in the define_insn_and_split split-condition, then whatever trivial patch to cris.md that gets the effect of the one you sent is preapproved. Again, thanks. brgds, H-P
Fwd: [PUSHED] Skip out on processing __builtin_clz when varying.
Ping*2 -- Forwarded message - From: Aldy Hernandez Date: Thu, May 13, 2021, 20:02 Subject: Re: [PUSHED] Skip out on processing __builtin_clz when varying. To: Jakub Jelinek Cc: GCC patches On 5/12/21 5:08 PM, Jakub Jelinek wrote: > On Wed, May 12, 2021 at 05:01:00PM -0400, Aldy Hernandez via Gcc-patches wrote: >> >> PR c/100521 >> * gimple-range.cc (range_of_builtin_call): Skip out on >>processing __builtin_clz when varying. >> --- >> gcc/gimple-range.cc | 2 +- >> gcc/testsuite/gcc.dg/pr100521.c | 8 >> 2 files changed, 9 insertions(+), 1 deletion(-) >> create mode 100644 gcc/testsuite/gcc.dg/pr100521.c >> >> --- /dev/null >> +++ b/gcc/testsuite/gcc.dg/pr100521.c >> @@ -0,0 +1,8 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2" } */ >> + >> +int >> +__builtin_clz (int a) > > Is this intentional? People shouldn't be redefining builtins... Ughhh. I don't think that's intentional. For that matter, the current nor the old code is designed to deal with this, especially in this case when the builtin is being redefined with incompatible arguments. That is, the above "builtin" has a signed integer as an argument, whereas the original builtin had an unsigned one. In looking at the original vr-values code, I think this could use a cleanup. First, ranges from range_of_expr are always numeric so we should adjust. Also, the checks for non-zero were assuming the argument was unsigned, which in the above redirect is clearly not. I've cleaned this up, so that it works either way, though perhaps we should _also_ bail on non-builtins. I don't know...this is before my time. BTW, I've removed the following annoying idiom: - int newmini = prec - 1 - wi::floor_log2 (r.upper_bound ()); - if (newmini == prec) This is really a check for r.upper_bound() == 0, as floor_log2(0) returns -1. It's confusing. How does this look? For reference, the original code where this all came from is 82b6d25d289195. Thanks for pointing this out. Aldy From f8a958e8028ed129558f9ad7ccf423c834d377bd Mon Sep 17 00:00:00 2001 From: Aldy Hernandez Date: Thu, 13 May 2021 13:47:41 -0400 Subject: [PATCH] Cleanup clz and ctz code in range_of_builtin_call. gcc/ChangeLog: * gimple-range.cc (range_of_builtin_call): Cleanup clz and ctz code. --- gcc/gimple-range.cc | 43 --- 1 file changed, 20 insertions(+), 23 deletions(-) diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc index 5b288d8e6a7..b33ba1c8099 100644 --- a/gcc/gimple-range.cc +++ b/gcc/gimple-range.cc @@ -736,33 +736,29 @@ range_of_builtin_call (range_query &query, irange &r, gcall *call) } query.range_of_expr (r, arg, call); - // From clz of minimum we can compute result maximum. - if (r.constant_p () && !r.varying_p ()) + if (!r.undefined_p ()) { - int newmaxi = prec - 1 - wi::floor_log2 (r.lower_bound ()); - // Argument is unsigned, so do nothing if it is [0, ...] range. - if (newmaxi != prec) + // From clz of minimum we can compute result maximum. + if (wi::gt_p (r.lower_bound (), 0, TYPE_SIGN (r.type ( + { + maxi = prec - 1 - wi::floor_log2 (r.lower_bound ()); + if (mini == -2) + mini = 0; + } + else if (!range_includes_zero_p (&r)) { mini = 0; - maxi = newmaxi; + maxi = prec - 1; } - } - else if (!range_includes_zero_p (&r)) - { - maxi = prec - 1; - mini = 0; - } - if (mini == -2) - break; - // From clz of maximum we can compute result minimum. - if (r.constant_p ()) - { - int newmini = prec - 1 - wi::floor_log2 (r.upper_bound ()); - if (newmini == prec) + if (mini == -2) + break; + // From clz of maximum we can compute result minimum. + wide_int max = r.upper_bound (); + int newmini = prec - 1 - wi::floor_log2 (max); + if (max == 0) { - // Argument range is [0, 0]. If CLZ_DEFINED_VALUE_AT_ZERO - // is 2 with VALUE of prec, return [prec, prec], otherwise - // ignore the range. + // If CLZ_DEFINED_VALUE_AT_ZERO is 2 with VALUE of prec, + // return [prec, prec], otherwise ignore the range. if (maxi == prec) mini = prec; } @@ -803,7 +799,8 @@ range_of_builtin_call (range_query &query, irange &r, gcall *call) query.range_of_expr (r, arg, call); if (!r.undefined_p ()) { - if (r.lower_bound () != 0) + // If arg is non-zero, then use [0, prec - 1]. + if (!range_includes_zero_p (&r)) { mini = 0; maxi = prec - 1; -- 2.31.1
[PATCH] PR libstdc++/98842: Fixed Constraints on operator<=>(optional, U)
The original operator was underconstrained. _Up needs to fulfill compare_three_way_result, as mentioned in this bug report https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98842 diff --git a/libstdc++-v3/include/std/optional b/libstdc++-v3/include/std/optional index 8b9e038e6e510..9e61c1b2cbfbd 100644 --- a/libstdc++-v3/include/std/optional +++ b/libstdc++-v3/include/std/optional @@ -1234,7 +1234,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { return !__rhs || __lhs >= *__rhs; } #ifdef __cpp_lib_three_way_comparison - template + template _Up> constexpr compare_three_way_result_t<_Tp, _Up> operator<=>(const optional<_Tp>& __x, const _Up& __v) { return bool(__x) ? *__x <=> __v : strong_ordering::less; }
[RFC] Implementing detection of saturation and rounding arithmetic
Hi, This RFC is motivated by the IV sharing RFC in https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569502.html and the need to have the IVOPTS pass be able to clean up IV's shared between multiple loops. When creating a similar problem with C code I noticed IVOPTs treated IV's with uses outside the loop differently, this didn't even required multiple loops, take for instance the following example using SVE intrinsics: #include #include extern void use (char *); void bar (char * __restrict__ a, char * __restrict__ b, char * __restrict__ c, unsigned n) { svbool_t all_true = svptrue_b8 (); unsigned i = 0; if (n < (UINT_MAX - svcntb() - 1)) { for (; i < n; i += svcntb()) { svuint8_t va = svld1 (all_true, (uint8_t*)a); svuint8_t vb = svld1 (all_true, (uint8_t*)b); svst1 (all_true, (uint8_t *)c, svadd_z (all_true, va,vb)); a += svcntb(); b += svcntb(); c += svcntb(); } } use (a); } IVOPTs tends to generate a shared IV for SVE memory accesses, as we don't have a post-increment for SVE load/stores. If we had not included 'use (a);' in this example, IVOPTs would have replaced the IV's for a, b and c with a single one, (also used for the loop-control). See: [local count: 955630225]: # ivtmp.7_8 = PHI va_14 = MEM [(unsigned char *)a_10(D) + ivtmp.7_8 * 1]; vb_15 = MEM [(unsigned char *)b_11(D) + ivtmp.7_8 * 1]; _2 = svadd_u8_z ({ -1, ... }, va_14, vb_15); MEM <__SVUint8_t> [(unsigned char *)c_12(D) + ivtmp.7_8 * 1] = _2; ivtmp.7_25 = ivtmp.7_8 + POLY_INT_CST [16, 16]; i_23 = (unsigned int) ivtmp.7_25; if (n_9(D) > i_23) goto ; [89.00%] else goto ; [11.00%] However, due to the 'use (a);' it will create two IVs one for loop-control, b and c and one for a. See: [local count: 955630225]: # a_28 = PHI # ivtmp.7_25 = PHI va_15 = MEM [(unsigned char *)a_28]; vb_16 = MEM [(unsigned char *)b_12(D) + ivtmp.7_25 * 1]; _2 = svadd_u8_z ({ -1, ... }, va_15, vb_16); MEM <__SVUint8_t> [(unsigned char *)c_13(D) + ivtmp.7_25 * 1] = _2; a_18 = a_28 + POLY_INT_CST [16, 16]; ivtmp.7_24 = ivtmp.7_25 + POLY_INT_CST [16, 16]; i_8 = (unsigned int) ivtmp.7_24; if (n_10(D) > i_8) goto ; [89.00%] else goto ; [11.00%] With the first patch attached in this RFC 'no_cost.patch', I tell IVOPTs to not cost uses outside of the loop. This makes IVOPTs generate a single IV, but unfortunately it decides to create the variable for the use inside the loop and it also seems to use the pre-increment value of the shared-IV and add the [16,16] to it. See: [local count: 955630225]: # ivtmp.7_25 = PHI va_15 = MEM [(unsigned char *)a_11(D) + ivtmp.7_25 * 1]; vb_16 = MEM [(unsigned char *)b_12(D) + ivtmp.7_25 * 1]; _2 = svadd_u8_z ({ -1, ... }, va_15, vb_16); MEM <__SVUint8_t> [(unsigned char *)c_13(D) + ivtmp.7_25 * 1] = _2; _8 = (unsigned long) a_11(D); _7 = _8 + ivtmp.7_25; _6 = _7 + POLY_INT_CST [16, 16]; a_18 = (char * restrict) _6; ivtmp.7_24 = ivtmp.7_25 + POLY_INT_CST [16, 16]; i_5 = (unsigned int) ivtmp.7_24; if (n_10(D) > i_5) goto ; [89.00%] else goto ; [11.00%] With the patch 'var_after.patch' I make get_computation_aff_1 use 'cand->var_after' for outside uses thus using the post-increment var of the candidate IV. This means I have to insert it in a different place and make sure to delete the old use->stmt. I'm sure there is a better way to do this using IVOPTs current framework, but I didn't find one yet. See the result: [local count: 955630225]: # ivtmp.7_25 = PHI va_15 = MEM [(unsigned char *)a_11(D) + ivtmp.7_25 * 1]; vb_16 = MEM [(unsigned char *)b_12(D) + ivtmp.7_25 * 1]; _2 = svadd_u8_z ({ -1, ... }, va_15, vb_16); MEM <__SVUint8_t> [(unsigned char *)c_13(D) + ivtmp.7_25 * 1] = _2; ivtmp.7_24 = ivtmp.7_25 + POLY_INT_CST [16, 16]; _8 = (unsigned long) a_11(D); _7 = _8 + ivtmp.7_24; a_18 = (char * restrict) _7; i_6 = (unsigned int) ivtmp.7_24; if (n_10(D) > i_6) goto ; [89.00%] else goto ; [11.00%] This is still not optimal as we are still doing the update inside the loop and there is absolutely no need for that. I found that running sink would solve it and it seems someone has added a second sink pass, so that saves me a third patch :) see after sink2: [local count: 955630225]: # ivtmp.7_25 = PHI va_15 = MEM [(unsigned char *)a_11(D) + ivtmp.7_25 * 1]; vb_16 = MEM [(unsigned char *)b_12(D) + ivtmp.7_25 * 1]; _2 = svadd_u8_z ({ -1, ... }, va_15, vb_16); MEM <__SVUint8_t> [(unsigned char *)c_13(D) + ivtmp.7_25 * 1] = _2; ivtmp.7_24 = ivtmp.7_25 + POLY_INT_CST [16, 16]; i_6 = (unsigned int) ivtmp.7_24; if (i_6 < n_10(D)) goto ; [89.00%] else goto ; [11.00%] [local count: 105119324]: _8 = (unsigned long) a_11(D); _7 = _8 + ivtmp.7_24; a_18 = (char * restrict) _7; goto ; [
[RFC][ivopts] Generate better code for IVs with uses outside the loop (was Re: [RFC] Implementing detection of saturation and rounding arithmetic)
Streams got crossed there and used the wrong subject ... On 03/06/2021 17:34, Andre Vieira (lists) via Gcc-patches wrote: Hi, This RFC is motivated by the IV sharing RFC in https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569502.html and the need to have the IVOPTS pass be able to clean up IV's shared between multiple loops. When creating a similar problem with C code I noticed IVOPTs treated IV's with uses outside the loop differently, this didn't even required multiple loops, take for instance the following example using SVE intrinsics: #include #include extern void use (char *); void bar (char * __restrict__ a, char * __restrict__ b, char * __restrict__ c, unsigned n) { svbool_t all_true = svptrue_b8 (); unsigned i = 0; if (n < (UINT_MAX - svcntb() - 1)) { for (; i < n; i += svcntb()) { svuint8_t va = svld1 (all_true, (uint8_t*)a); svuint8_t vb = svld1 (all_true, (uint8_t*)b); svst1 (all_true, (uint8_t *)c, svadd_z (all_true, va,vb)); a += svcntb(); b += svcntb(); c += svcntb(); } } use (a); } IVOPTs tends to generate a shared IV for SVE memory accesses, as we don't have a post-increment for SVE load/stores. If we had not included 'use (a);' in this example, IVOPTs would have replaced the IV's for a, b and c with a single one, (also used for the loop-control). See: [local count: 955630225]: # ivtmp.7_8 = PHI va_14 = MEM [(unsigned char *)a_10(D) + ivtmp.7_8 * 1]; vb_15 = MEM [(unsigned char *)b_11(D) + ivtmp.7_8 * 1]; _2 = svadd_u8_z ({ -1, ... }, va_14, vb_15); MEM <__SVUint8_t> [(unsigned char *)c_12(D) + ivtmp.7_8 * 1] = _2; ivtmp.7_25 = ivtmp.7_8 + POLY_INT_CST [16, 16]; i_23 = (unsigned int) ivtmp.7_25; if (n_9(D) > i_23) goto ; [89.00%] else goto ; [11.00%] However, due to the 'use (a);' it will create two IVs one for loop-control, b and c and one for a. See: [local count: 955630225]: # a_28 = PHI # ivtmp.7_25 = PHI va_15 = MEM [(unsigned char *)a_28]; vb_16 = MEM [(unsigned char *)b_12(D) + ivtmp.7_25 * 1]; _2 = svadd_u8_z ({ -1, ... }, va_15, vb_16); MEM <__SVUint8_t> [(unsigned char *)c_13(D) + ivtmp.7_25 * 1] = _2; a_18 = a_28 + POLY_INT_CST [16, 16]; ivtmp.7_24 = ivtmp.7_25 + POLY_INT_CST [16, 16]; i_8 = (unsigned int) ivtmp.7_24; if (n_10(D) > i_8) goto ; [89.00%] else goto ; [11.00%] With the first patch attached in this RFC 'no_cost.patch', I tell IVOPTs to not cost uses outside of the loop. This makes IVOPTs generate a single IV, but unfortunately it decides to create the variable for the use inside the loop and it also seems to use the pre-increment value of the shared-IV and add the [16,16] to it. See: [local count: 955630225]: # ivtmp.7_25 = PHI va_15 = MEM [(unsigned char *)a_11(D) + ivtmp.7_25 * 1]; vb_16 = MEM [(unsigned char *)b_12(D) + ivtmp.7_25 * 1]; _2 = svadd_u8_z ({ -1, ... }, va_15, vb_16); MEM <__SVUint8_t> [(unsigned char *)c_13(D) + ivtmp.7_25 * 1] = _2; _8 = (unsigned long) a_11(D); _7 = _8 + ivtmp.7_25; _6 = _7 + POLY_INT_CST [16, 16]; a_18 = (char * restrict) _6; ivtmp.7_24 = ivtmp.7_25 + POLY_INT_CST [16, 16]; i_5 = (unsigned int) ivtmp.7_24; if (n_10(D) > i_5) goto ; [89.00%] else goto ; [11.00%] With the patch 'var_after.patch' I make get_computation_aff_1 use 'cand->var_after' for outside uses thus using the post-increment var of the candidate IV. This means I have to insert it in a different place and make sure to delete the old use->stmt. I'm sure there is a better way to do this using IVOPTs current framework, but I didn't find one yet. See the result: [local count: 955630225]: # ivtmp.7_25 = PHI va_15 = MEM [(unsigned char *)a_11(D) + ivtmp.7_25 * 1]; vb_16 = MEM [(unsigned char *)b_12(D) + ivtmp.7_25 * 1]; _2 = svadd_u8_z ({ -1, ... }, va_15, vb_16); MEM <__SVUint8_t> [(unsigned char *)c_13(D) + ivtmp.7_25 * 1] = _2; ivtmp.7_24 = ivtmp.7_25 + POLY_INT_CST [16, 16]; _8 = (unsigned long) a_11(D); _7 = _8 + ivtmp.7_24; a_18 = (char * restrict) _7; i_6 = (unsigned int) ivtmp.7_24; if (n_10(D) > i_6) goto ; [89.00%] else goto ; [11.00%] This is still not optimal as we are still doing the update inside the loop and there is absolutely no need for that. I found that running sink would solve it and it seems someone has added a second sink pass, so that saves me a third patch :) see after sink2: [local count: 955630225]: # ivtmp.7_25 = PHI va_15 = MEM [(unsigned char *)a_11(D) + ivtmp.7_25 * 1]; vb_16 = MEM [(unsigned char *)b_12(D) + ivtmp.7_25 * 1]; _2 = svadd_u8_z ({ -1, ... }, va_15, vb_16); MEM <__SVUint8_t> [(unsigned char *)c_13(D) + ivtmp.7_25 * 1] = _2; ivtmp.7_24 = ivtmp.7_25 + POLY_INT_CST [16, 16]; i_6 = (unsigned int) ivtmp.7_24; if (i_6 < n_10(D)) goto ; [89.00%] else goto ; [11.00%]
[PATCH] libgcc: Fix _Unwind_Backtrace() for SEH
Forgot to assign to gcc_context.cfa and gcc_context.ra. Note this fix can be backported to earlier editions of gcc as well diff --git a/libgcc/unwind-seh.c b/libgcc/unwind-seh.c index 8c6aade9a3b39..d40d16702a9e1 100644 --- a/libgcc/unwind-seh.c +++ b/libgcc/unwind-seh.c @@ -466,6 +466,9 @@ _Unwind_Backtrace(_Unwind_Trace_Fn trace, &gcc_context.disp->HandlerData, &gcc_context.disp->EstablisherFrame, NULL); + gcc_context.cfa = ms_context.Rsp; + gcc_context.ra = ms_context.Rip; + /* Call trace function. */ if (trace (&gcc_context, trace_argument) != _URC_NO_REASON) return _URC_FATAL_PHASE1_ERROR;
Re: [RFC/PATCH 00/11] Fix up some unexpected empty split conditions
On 6/3/2021 2:00 AM, Segher Boessenkool wrote: Hi! On Thu, Jun 03, 2021 at 01:22:38PM +0800, Kewen.Lin wrote: on 2021/6/3 上午7:52, Segher Boessenkool wrote: - add a new "define_independent_insn_and_split" that has the current semantics of define_insn_and_split. This should be mechanical. I'd rather not have that -- we can just write separate define_insn and define_split in that case. Not sure if someone would argue that he/she would like to go with one shared pattern as before, to avoid any possible differences between two seperated patterns and have good maintainability (like only editing on place) and slightly better efficiency. You only would do this if you have a different insn condition and split condition, which is a very important thing to know, it doesn't hurt to draw attention to it. The efficiency is exactly the same btw, define_insn_and_split is just syntactic sugar. The whole point of requiring the split condition to start with && is so it will become harder to mess things up (it will make the gen* code a tiny little bit simpler as well). And there is no transition period or anything like that needed either. Just the bunch that will break will need fixing. So let's find out how many of those there are :-) Exactly. While these empty conditions or those not starting with "&&" are technically valid, they're all suspicious from a port correctness standpoint, particularly if the main condition is non-empty. Having made that mistake when converting the H8 away from CC0, I can say fairly confidently that if we had this in place a year ago that those mistakes would likely have been avoided. Thankfully the H8 isn't a heavily used port and has limped along until I stumbled over the issue a week or so ago while polishing some improvements to the port. Jeff
Re: GCC documentation: porting to Sphinx
On Thu, 3 Jun 2021, Martin Liška wrote: > On 6/2/21 6:44 PM, Joseph Myers wrote: > > On Wed, 2 Jun 2021, Joel Sherrill wrote: > > > > > For RTEMS, we switched from texinfo to Sphinx and the dependency > > > on Python3 for Sphinx has caused a bit of hassle. Is this going to be > > > an issue for GCC? > > > > What Sphinx (and, thus, Python) versions does the GCC manual build work > > with? > > I've just tried version 1.7.6 which we use for libgccjit and it's fine: > https://gcc.gnu.org/onlinedocs/jit/ > > About Python version: I'm not planning supporting Python2, it's dead 10 years > already. There should be appropriate configure checks to avoid building manuals with too-old versions (i.e. disable the info/man manual build/install when Sphinx, or the Python version it's using, is too old or missing, not fail configure). Actually this code is depending on Python 3.6 or later because of the use of an f-string in baseconf.py (without that f-string, it works with older versions, even 2.7). Formally 3.5 and older are no longer supported upstream, but certainly still present in some maintained long-term-support distribution versions. > I would recommend testing the build. You can simply clone: > https://github.com/marxin/texi2rst-generated > > and simply run 'make html' or 'make latexpdf'. Basic dependencies are > mentioned here: > https://github.com/marxin/texi2rst-generated#requirements It appears "make html" works (with lots of WARNINGs) with Sphinx 1.6.1 but fails with 1.4 ("Theme error: unsupported theme option 'prev_next_buttons_location' given"). -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH 2/2, rs6000] Remove mode promotion for pseudos
Hi! On Thu, May 20, 2021 at 05:49:49PM +0800, HAO CHEN GUI wrote: > rs6000 has instructions that can do almost everything 32 bit > at least as efficiently as corresponding 64 bit things. The > mode promotion can be defered to when a wide mode is necessary. > So it helps a lot not promote mode for pseudos. SPECint test > shows that the overall performance improvement (by geomean) is > more than 2% with this patch. > testsuite/gcc.target/powerpc/not-promote-mode.c illustrates how > the patch eliminates the redundant extensions and do further > optimization by disabling mode promotion for pseduos. I'd still like to see if (and why) this works better than explicitly promoting QImode and HImode here. But that can be done later. > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/not-promote-mode.c > @@ -0,0 +1,13 @@ > +/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */ Just /* { dg-do compile { target lp64 } } */ because the rest is already implied by this being in gcc.target/powerpc . The patch is okay for trunk. Thank you very much for finding this huge performance gain! Segher
Re: [PATCH] MAINTAINERS: create DCO section; add myself to it
On 6/2/21 12:33 PM, Koning, Paul wrote: On Jun 2, 2021, at 11:03 AM, Jason Merrill via Gcc-patches wrote: On 6/1/21 3:22 PM, Richard Biener via Gcc wrote: On June 1, 2021 7:30:54 PM GMT+02:00, David Malcolm via Gcc wrote: ... The MAINTAINERS file doesn't seem to have such a "DCO list" yet; does the following patch look like what you had in mind? ChangeLog * MAINTAINERS: Create DCO section; add myself to it. --- MAINTAINERS | 12 1 file changed, 12 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index db25583b37b..1148e0915cf 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -685,3 +685,15 @@ Josef Zlomek James Dennett Christian Ehrhardt Dara Hazeghi + + +DCO +=== + +Developers with commit access may add their name to the following list +to certify the DCO (https://developercertificate.org/) for all There should be a verbatim copy of the DCO in this file or the repository. It's on the website now, at gcc.gnu.org/dco.html , and I've added the section to MAINTAINERS. It's not clear to me that it needs to be in the source tree as well, since it's project contribution policy rather than license. I'm wondering about change control of this document. The GPL has a version number and references to use the version number. The DCO seems to have a version number, but the DCO section in the MAINTAINERS file does not give it. I would think that a certification should call out which DCO it uses, whether in a one-off (in a patch) or in the MAINTAINERS DCO list. I've added the version to the MAINTAINERS list, thanks. It is not customary to mention the version in a Signed-off-by tag. Jason
Re: [PATCH 1/2] rs6000: Add support for _mm_minpos_epu16
On Wed, Jun 02, 2021 at 07:27:35PM -0500, Segher Boessenkool wrote: > On Wed, Jun 02, 2021 at 05:13:15PM -0500, Paul A. Clarke wrote: > > Add a naive implementation of the subject x86 intrinsic to > > ease porting. > > > +/* Return horizontal packed word minimum and its index in bits [15:0] > > + and bits [18:16] respectively. */ > > +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, > > __artificial__)) > > +_mm_minpos_epu16 (__m128i __A) > > +{ > > + union __u > > +{ > > + __m128i __m; > > + __v8hu __uh; > > +}; > > + union __u __u = { .__m = __A }, __r = { .__m = {0} }; > > + unsigned short __ridx = 0; > > + unsigned short __rmin = __u.__uh[__ridx]; > > + for (unsigned long __i = __ridx+1; > > (spaces around the "+"?) ok > > > + __i < sizeof (__u.__uh) / sizeof (__u.__uh[0]); > > You should either use a macro for that, or just write "8" :-) ok. (There should be a standard thing for this operation.) > > + __i++) > > +{ > > + if (__u.__uh[__i] < __rmin) > > +{ > > + __rmin = __u.__uh[__i]; > > + __ridx = __i; > > +} > > +} > > + __r.__uh[0] = __rmin; > > + __r.__uh[1] = __ridx; > > + return __r.__m; > > +} > > This does not compute the index correctly for big endian (it needs to > walk from right to left for that). The construction of the return value > looks wrong as well. > > Okay for trunk with that fixed. Thanks! I'm not seeing the issue here. The values are numbered by element order, and the results are in the "first" (minimum value) and "second" (index of first encountered minimum value in element order) elements of the result. PC
Re: [PATCH 2/2] rs6000: Add test for _mm_minpos_epu16
On Wed, Jun 02, 2021 at 08:50:56PM -0500, Segher Boessenkool wrote: > On Wed, Jun 02, 2021 at 05:13:16PM -0500, Paul A. Clarke wrote: > > + for (i = 0; i < NUM; i++) > > +src.s[i] = i * i - 68 * i + 1200; > > Could you do tests with some identical elements as well? Because that > is where I think it fails on BE currently. Let me re-do the test case a bit more to provide a better set of input data, rather than this computational attempt which misses a bunch of interesting cases. I'll send a v2 in a bit. PC
[PATCH] c++: top-level cv-quals on type of NTTP [PR100893]
Here, we're rejecting the specialization of g with T=A, F=&f in the first testcase due to a spurious constness mismatch between the type of the template argument &f and the substituted type of F (the substituted type has a top-level const). Note that this mismatch doesn't occur with object pointers because in that case a call to perform_qualification_conversions from convert_nontype_argument implicitly adds a top-level const to the argument (via a cast) to match. This however seems to be a manifestation of a more general conformance issue -- that we're not dropping top-level cv-quals after substituting into the type of an NTTP as per [temp.param]/6 (we only do so at parse time in process_template_parm). This patch makes convert_template_argument drop top-level cv-quals accordingly. Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk? PR c++/100893 gcc/cp/ChangeLog: * pt.c (convert_template_argument): Strip top-level cv-quals on the substituted type of a non-type template parameter. gcc/testsuite/ChangeLog: * g++.dg/template/param4.C: New test. * g++.dg/template/param5.C: New test. --- gcc/cp/pt.c| 7 ++- gcc/testsuite/g++.dg/template/param4.C | 10 ++ gcc/testsuite/g++.dg/template/param5.C | 7 +++ 3 files changed, 23 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/template/param4.C create mode 100644 gcc/testsuite/g++.dg/template/param5.C diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 7211bdc5bbc..66cc88a331f 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -8494,7 +8494,12 @@ convert_template_argument (tree parm, return error_mark_node; } else - t = tsubst (t, args, complain, in_decl); + { + t = tsubst (t, args, complain, in_decl); + /* Ignore top-level qualifiers on the substituted type of this +non-type template parameter, as per [temp.param]/6. */ + t = cv_unqualified (t); + } if (invalid_nontype_parm_type_p (t, complain)) return error_mark_node; diff --git a/gcc/testsuite/g++.dg/template/param4.C b/gcc/testsuite/g++.dg/template/param4.C new file mode 100644 index 000..d55fecec2d9 --- /dev/null +++ b/gcc/testsuite/g++.dg/template/param4.C @@ -0,0 +1,10 @@ +// PR c++/100893 + +template void g() { } + +struct A { typedef void (* const type)(); }; +void f(); +template void g(); + +struct B { typedef void (B::* const type)(); void f(); }; +template void g(); diff --git a/gcc/testsuite/g++.dg/template/param5.C b/gcc/testsuite/g++.dg/template/param5.C new file mode 100644 index 000..b8c92f4c217 --- /dev/null +++ b/gcc/testsuite/g++.dg/template/param5.C @@ -0,0 +1,7 @@ +// Verify top-level cv-qualifiers are dropped when determining the substituted +// type of a non-type-template parameter, as per [temp.param]/6. +// { dg-do compile { target c++11 } } + +template decltype(V)& f(); +using type = decltype(f()); +using type = int&; -- 2.32.0.rc2
[PATCH] i386: Add insert and extract patterns for 4-byte vectors [PR100637]
The patch introduces insert and extract patterns for 4-byte vectors. It effectively only emits PINSR and PEXTR instructions when available, otherwise falls back to generic code that emulates these instructions via inserts, extracts, logic operations and shifts in integer registers. Please note that generic fallback produces better code than the current approach of constructing new vector in memory (due to store forwarding stall) so also enable QImode 8-byte vector inserts only with TARGET_SSE4_1. 2021-06-03 Uroš Bizjak gcc/ PR target/100637 * config/i386/i386-expand.c (ix86_expand_vector_set): Handle V2HI and V4QI modes. (ix86_expand_vector_extract): Ditto. * config/i386/mmx.md (*pinsrw): New insn pattern. (*pinsrb): Ditto. (*pextrw): Ditto. (*pextrw_zext): Ditto. (*pextrb): Ditto. (*pextrb_zext): Ditto. (vec_setv2hi): New expander. (vec_extractv2hihi): Ditto. (vec_setv4qi): Ditto. (vec_extractv4qiqi): Ditto. (vec_setv8qi): Enable only for TARGET_SSE4_1. (vec_extractv8qiqi): Ditto. gcc/testsuite/ PR target/100637 * gcc.target/i386/vperm-v2hi.c: New test. * gcc.target/i386/vperm-v4qi.c: Ditto. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Pushed to master. Uros. diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index 4185f58eed5..eb7cdb0c14f 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -14968,6 +14968,7 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, rtx val, int elt) return; case E_V8HImode: +case E_V2HImode: use_vec_merge = TARGET_SSE2; break; case E_V4HImode: @@ -14975,6 +14976,7 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, rtx val, int elt) break; case E_V16QImode: +case E_V4QImode: use_vec_merge = TARGET_SSE4_1; break; @@ -15274,6 +15276,7 @@ ix86_expand_vector_extract (bool mmx_ok, rtx target, rtx vec, int elt) break; case E_V8HImode: +case E_V2HImode: use_vec_extr = TARGET_SSE2; break; case E_V4HImode: @@ -15294,6 +15297,9 @@ ix86_expand_vector_extract (bool mmx_ok, rtx target, rtx vec, int elt) return; } break; +case E_V4QImode: + use_vec_extr = TARGET_SSE4_1; + break; case E_V8SFmode: if (TARGET_AVX) diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md index f39e062ddfc..914e5e91e90 100644 --- a/gcc/config/i386/mmx.md +++ b/gcc/config/i386/mmx.md @@ -3092,7 +3092,7 @@ (define_expand "vec_setv8qi" [(match_operand:V8QI 0 "register_operand") (match_operand:QI 1 "register_operand") (match_operand 2 "const_int_operand")] - "TARGET_MMX || TARGET_MMX_WITH_SSE" + "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE" { ix86_expand_vector_set (TARGET_MMX_WITH_SSE, operands[0], operands[1], INTVAL (operands[2])); @@ -3103,7 +3103,7 @@ (define_expand "vec_extractv8qiqi" [(match_operand:QI 0 "register_operand") (match_operand:V8QI 1 "register_operand") (match_operand 2 "const_int_operand")] - "TARGET_MMX || TARGET_MMX_WITH_SSE" + "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE" { ix86_expand_vector_extract (TARGET_MMX_WITH_SSE, operands[0], operands[1], INTVAL (operands[2])); @@ -3120,6 +3120,178 @@ (define_expand "vec_initv8qiqi" DONE; }) +(define_insn "*pinsrw" + [(set (match_operand:V2HI 0 "register_operand" "=x,YW") +(vec_merge:V2HI + (vec_duplicate:V2HI +(match_operand:HI 2 "nonimmediate_operand" "rm,rm")) + (match_operand:V2HI 1 "register_operand" "0,YW") + (match_operand:SI 3 "const_int_operand")))] + "TARGET_SSE2 + && ((unsigned) exact_log2 (INTVAL (operands[3])) + < GET_MODE_NUNITS (V2HImode))" +{ + operands[3] = GEN_INT (exact_log2 (INTVAL (operands[3]))); + switch (which_alternative) +{ +case 1: + if (MEM_P (operands[2])) + return "vpinsrw\t{%3, %2, %1, %0|%0, %1, %2, %3}"; + else + return "vpinsrw\t{%3, %k2, %1, %0|%0, %1, %k2, %3}"; +case 0: + if (MEM_P (operands[2])) + return "pinsrw\t{%3, %2, %0|%0, %2, %3}"; + else + return "pinsrw\t{%3, %k2, %0|%0, %k2, %3}"; +default: + gcc_unreachable (); +} +} + [(set_attr "isa" "noavx,avx") + (set_attr "type" "sselog") + (set_attr "length_immediate" "1") + (set_attr "mode" "TI")]) + +(define_insn "*pinsrb" + [(set (match_operand:V4QI 0 "register_operand" "=x,YW") +(vec_merge:V4QI + (vec_duplicate:V4QI +(match_operand:QI 2 "nonimmediate_operand" "rm,rm")) + (match_operand:V4QI 1 "register_operand" "0,YW") + (match_operand:SI 3 "const_int_operand")))] + "TARGET_SSE4_1 + && ((unsigned) exact_log2 (INTVAL (operands[3])) + < GET_MODE_NUNITS (V4QImode))" +{ + operands[3] = GEN_INT (exact_log2 (INTVAL (operands[3]))); + switch (which_alternative) +{ +ca
Re: [PATCH] c++: top-level cv-quals on type of NTTP [PR100893]
On Thu, 3 Jun 2021, Patrick Palka wrote: > Here, we're rejecting the specialization of g with T=A, F=&f in the > first testcase due to a spurious constness mismatch between the type of > the template argument &f and the substituted type of F (the substituted > type has a top-level const). Note that this mismatch doesn't occur with > object pointers because in that case a call to > perform_qualification_conversions from convert_nontype_argument > implicitly adds a top-level const to the argument (via a cast) to match. > > This however seems to be a manifestation of a more general conformance > issue -- that we're not dropping top-level cv-quals after substituting > into the type of an NTTP as per [temp.param]/6 (we only do so at parse > time in process_template_parm). This patch makes convert_template_argument > drop top-level cv-quals accordingly. > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for > trunk? > > PR c++/100893 > > gcc/cp/ChangeLog: > > * pt.c (convert_template_argument): Strip top-level cv-quals > on the substituted type of a non-type template parameter. > > gcc/testsuite/ChangeLog: > > * g++.dg/template/param4.C: New test. > * g++.dg/template/param5.C: New test. > --- > gcc/cp/pt.c| 7 ++- > gcc/testsuite/g++.dg/template/param4.C | 10 ++ > gcc/testsuite/g++.dg/template/param5.C | 7 +++ > 3 files changed, 23 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/g++.dg/template/param4.C > create mode 100644 gcc/testsuite/g++.dg/template/param5.C > > diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c > index 7211bdc5bbc..66cc88a331f 100644 > --- a/gcc/cp/pt.c > +++ b/gcc/cp/pt.c > @@ -8494,7 +8494,12 @@ convert_template_argument (tree parm, > return error_mark_node; > } >else > - t = tsubst (t, args, complain, in_decl); > + { > + t = tsubst (t, args, complain, in_decl); > + /* Ignore top-level qualifiers on the substituted type of this > + non-type template parameter, as per [temp.param]/6. */ > + t = cv_unqualified (t); > + } Err, shortly after posting I realized we could get top-level cv-quals not only after substitution, but also after deduction of a decltype(auto) NTTP (as in nontype-auto19.C below), so the call to cv_unqualified would need to happen after do_auto_deduction too, like so. Testing in progress. -- >8 -- Subject: [PATCH] c++: top-level cv-quals on type of NTTP [PR100893] Here, we're rejecting the specialization of g with T=A, F=&f in param4.C below due to a spurious constness mismatch between the type of the template argument &f and the substituted type of F (the substituted type has a top-level const). Note that this mismatch doesn't occur with object pointers because in that case a call to perform_qualification_conversions from convert_nontype_argument implicitly adds a top-level const to the argument (via a cast) to match. This however seems to be a manifestation of a more general conformance issue -- that we're not dropping top-level cv-quals after substituting into the type of an NTTP as per [temp.param]/6 (we only do so at parse time in process_template_parm). So this patch makes convert_template_argument drop top-level cv-quals accordingly. Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk? PR c++/100893 gcc/cp/ChangeLog: * pt.c (convert_template_argument): Strip top-level cv-quals on the substituted type of a non-type template parameter. gcc/testsuite/ChangeLog: * g++.dg/template/param4.C: New test. * g++.dg/template/param5.C: New test. * g++.dg/cpp1z/nontype-auto19.C: New test. * g++.dg/cpp2a/concepts-decltype.C: Don't expect that the deduced type of a decltype(auto) NTTP has top-level cv-quals. --- gcc/cp/pt.c| 4 gcc/testsuite/g++.dg/cpp1z/nontype-auto19.C| 8 gcc/testsuite/g++.dg/cpp2a/concepts-decltype.C | 2 +- gcc/testsuite/g++.dg/template/param4.C | 10 ++ gcc/testsuite/g++.dg/template/param5.C | 7 +++ 5 files changed, 30 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/cpp1z/nontype-auto19.C create mode 100644 gcc/testsuite/g++.dg/template/param4.C create mode 100644 gcc/testsuite/g++.dg/template/param5.C diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 7211bdc5bbc..3cac073ed50 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -8496,6 +8496,10 @@ convert_template_argument (tree parm, else t = tsubst (t, args, complain, in_decl); + /* Drop top-level cv-qualifiers on the substituted/deduced type of +this non-type template parameter, as per [temp.param]/6. */ + t = cv_unqualified (t); + if (invalid_nontype_parm_type_p (t, complain)) return error_mark_node; diff --git a/gcc/testsuite/g++.dg/cpp1z/nontype-auto19.C b/gcc/testsuite
[PATCH] libgcc libiberty: optimize and modernize standard string and memory functions
This patch optimizes and simplifies many of the standard string functions. Since C99, some of the standard string functions have been changed to use the restrict modifier. diff --git a/libgcc/memcmp.c b/libgcc/memcmp.c index 2348afe1d27f7..74195cf6baf13 100644 --- a/libgcc/memcmp.c +++ b/libgcc/memcmp.c @@ -7,10 +7,11 @@ memcmp (const void *str1, const void *str2, size_t count) const unsigned char *s1 = str1; const unsigned char *s2 = str2; - while (count-- > 0) + while (count--) { - if (*s1++ != *s2++) - return s1[-1] < s2[-1] ? -1 : 1; + if (*s1 != *s2) + return *s1 < *s2 ? -1 : 1; + s1++, s2++; } return 0; } diff --git a/libgcc/memcpy.c b/libgcc/memcpy.c index 58b1e405627aa..616df78fd2969 100644 --- a/libgcc/memcpy.c +++ b/libgcc/memcpy.c @@ -2,7 +2,7 @@ #include void * -memcpy (void *dest, const void *src, size_t len) +memcpy (void * restrict dest, const void * restrict src, size_t len) { char *d = dest; const char *s = src; diff --git a/libgcc/memset.c b/libgcc/memset.c index 3e7025ee39443..b3b27cd63e12d 100644 --- a/libgcc/memset.c +++ b/libgcc/memset.c @@ -5,7 +5,7 @@ void * memset (void *dest, int val, size_t len) { unsigned char *ptr = dest; - while (len-- > 0) -*ptr++ = val; + while (len--) +*ptr++ = (unsigned char)val; return dest; } diff --git a/libiberty/memchr.c b/libiberty/memchr.c index 7448ab9e71c32..6f03e9c281108 100644 --- a/libiberty/memchr.c +++ b/libiberty/memchr.c @@ -23,7 +23,7 @@ memchr (register const PTR src_void, int c, size_t length) { const unsigned char *src = (const unsigned char *)src_void; - while (length-- > 0) + while (length--) { if (*src == c) return (PTR)src; diff --git a/libiberty/memcmp.c b/libiberty/memcmp.c index 37db60f38267a..f41b35a758cc4 100644 --- a/libiberty/memcmp.c +++ b/libiberty/memcmp.c @@ -27,8 +27,9 @@ memcmp (const PTR str1, const PTR str2, size_t count) while (count-- > 0) { - if (*s1++ != *s2++) - return s1[-1] < s2[-1] ? -1 : 1; + if (*s1 != *s2) + return *s1 < *s2 ? -1 : 1; + s1++, s2++; } return 0; } diff --git a/libiberty/memcpy.c b/libiberty/memcpy.c index 7f67d0bd1f26c..d388ae7f3506b 100644 --- a/libiberty/memcpy.c +++ b/libiberty/memcpy.c @@ -19,7 +19,7 @@ Copies @var{length} bytes from memory region @var{in} to region void bcopy (const void*, void*, size_t); PTR -memcpy (PTR out, const PTR in, size_t length) +memcpy (PTR restrict out, const PTR restrict in, size_t length) { bcopy(in, out, length); return out; diff --git a/libiberty/mempcpy.c b/libiberty/mempcpy.c index f4c624d4a3227..ac56eeaee0d5e 100644 --- a/libiberty/mempcpy.c +++ b/libiberty/mempcpy.c @@ -33,10 +33,10 @@ Copies @var{length} bytes from memory region @var{in} to region #include #include -extern PTR memcpy (PTR, const PTR, size_t); +extern PTR memcpy (PTR restrict, const PTR restrict, size_t); PTR -mempcpy (PTR dst, const PTR src, size_t len) +mempcpy (PTR restrict dst, const PTR restrict src, size_t len) { return (char *) memcpy (dst, src, len) + len; }
Re: [PATCH] rtl: constm64_rtx..const64_rtx
On 6/2/2021 4:43 PM, Segher Boessenkool wrote: Hi! On Wed, Jun 02, 2021 at 06:07:28PM +0100, Richard Sandiford wrote: Segher Boessenkool writes: Since times immemorial there has been const_int_rtx for all values from -64 to 64, but only constm1_rtx..const2_rtx have been available for convenient use. Change this, so that we can use all values in {-64,...,64} in RTL easily. This matters, because then we we just say if (XEXP (x, 1) == const16_rtx) and things like that, since all const_int in that range are unique. We already do for -1, 0, 1, 2, but we could for everything. No strong objection, but personally I'd rather not add something that is very specific to VOIDmode CONST_INTs. I realise it's very unlikely that we'll ever be able to give CONST_INTs their proper mode (no-one has the kind of time needed to do that), but I don't think we should make the switch actively harder either. How does this make that harder? Having no mode for CONST_INTs makes some things significantly *easier* btw. Well you know that, that is what makes any conversion away from this so much harder :-) We have has const0_rtx etc. since forever, this patch just increases the range (to those values that have had guaranteed unique RTXes since decades as well). Yea, but often what you really want is CONST0_RTX (mode) instead of const0_rtx. It's easily goof'd and often the cause minor missed optimizations. jeff
Re: [PATCH] Canonicalize (vec_duplicate (not A)) to (not (vec_duplicate A)).
On Thu, Jun 03, 2021 at 11:03:43AM +, Liu, Hongtao wrote: > >A very typical example is how UMIN is optimised: > > > > case UMIN: > > if (trueop1 == CONST0_RTX (mode) && ! side_effects_p (op0)) > > return op1; > > if (rtx_equal_p (trueop0, trueop1) && ! side_effects_p (op0)) > > return op0; > > tem = simplify_associative_operation (code, mode, op0, op1); > > if (tem) > > return tem; > > break; > > > >(the stuff using "tem"). > > > >Hongtao, can we do something similar here? Does that work well? Please try > >it out :-) > > In simplify_rtx, no simplication occurs, there is just the difference between > (vec_duplicate (not REG)) and (not (vec_duplicate (REG)). So here tem will > only be 0. simplify-rtx is used by combine. When you do and+not+splat for example my suggestion should kick in. Try it out, don't just dismiss it? > Basically we don't know it's a simplication until combine successfully split > the > 3->2 instructions (not + broadcast + and to andnot + broadcast), but it's > pretty awkward > to do this in combine. But you need to do this *before* it is split. That is the whole point. > Consider andnot is existed for many backends, I think a canonicalization is > needed here. Please do note that that is not as easy as yoou may think: you need to make sure nothing ever creates non-canonical code. > Maybe we can add insn canonicalization for transforming (and (vect_duplicate > (not A)) B) to > (and (not (duplicate (not A)) B) instead of (vec_duplicate (not A)) to (not > (vec_duplicate A))? I don't understand what this means? Segher
Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc
Hi, Richard, For the following, I need more clarification: +/* Expand the IFN_DEFERRED_INIT function according to its second argument. */ +static void +expand_DEFERRED_INIT (internal_fn, gcall *stmt) +{ + tree var = gimple_call_lhs (stmt); + tree init = NULL_TREE; + enum auto_init_type init_type += (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1)); + + switch (init_type) +{ +default: + gcc_unreachable (); +case AUTO_INIT_PATTERN: + init = build_pattern_cst_for_auto_init (TREE_TYPE (var)); + expand_assignment (var, init, false); + break; +case AUTO_INIT_ZERO: + init = build_zero_cst (TREE_TYPE (var)); + expand_assignment (var, init, false); + break; +} I think actually building build_pattern_cst_for_auto_init can generate massive garbage and for big auto vars code size is also a concern and ideally on x86 you'd produce rep movq. So I don't think going via expand_assignment is good. Instead you possibly want to lower .DEFERRED_INIT to MEMs following expand_builtin_memset and eventually enhance that to allow storing pieces larger than a byte. I will lower .DEFFERED_INIT to MEMS following expand_builtin_memset for “AUTO_INIT_PATTERN”. My question is: Do I need to do the same for “AUTO_INIT_ZERO”? Thanks. Qing
Re: [PATCH] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]
Hi! On Thu, Jun 03, 2021 at 08:46:46AM +0800, Xionghu Luo wrote: > On 2021/6/3 06:20, Segher Boessenkool wrote: > > On Wed, Jun 02, 2021 at 03:19:32AM -0500, Xionghu Luo wrote: > >> On P8LE, extra rot64+rot64 load or store instructions are generated > >> in float128 to vector __int128 conversion. > >> > >> This patch teaches pass swaps to also handle such pattens to remove > >> extra swap instructions. > > > > Did you check if this is already handled by simplify-rtx if the mode had > > been TImode (not V1TImode)? If not, why do you not handle it there? > > I tried to do it in combine or peephole, the later pass split2 > or split3 will still split it to rotate + rotate again as we have split > after reload, and this pattern is quite P8LE specific, so put it in pass > swap. The simplify-rtx could simplify > r124:KF#0=r123:KF#0<-<0x40<-<0x40 to r124:KF#0=r123:KF#0 for register > operations already. What mode are those subregs? Abbreviated RTL printouts are very lossy. Assuming those are TImode (please check), then yes, that is what I asked, thanks. > ;; The post-reload split requires that we re-permute the source > ;; register in case it is still live. > (define_split > [(set (match_operand:VSX_LE_128 0 "memory_operand") > (match_operand:VSX_LE_128 1 "vsx_register_operand"))] > "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed && !TARGET_P9_VECTOR >&& !altivec_indexed_or_indirect_operand (operands[0], mode)" > [(const_int 0)] > { > rs6000_emit_le_vsx_permute (operands[1], operands[1], mode); > rs6000_emit_le_vsx_permute (operands[0], operands[1], mode); > rs6000_emit_le_vsx_permute (operands[1], operands[1], mode); > DONE; > }) Yes, that needs improvement itself. The tthing to realise is that TImode is optimised by generic code just fine (as all scalar integer modes are), but V1TImode is not. We have that mode because we really needed to not put TImode in vector registers so much on older cpus, but that balance may have changed by now. Worth experimenting with, we now can do pretty much all noormal operations in vector registers! Segher
Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc
Hi, Richard, On May 26, 2021, at 6:18 AM, Richard Biener mailto:rguent...@suse.de>> wrote: On Wed, 12 May 2021, Qing Zhao wrote: Hi, This is the 3rd version of the patch for the new security feature for GCC. Please take look and let me know your comments and suggestions. +/* Returns true when the given TYPE has padding inside it. + return false otherwise. */ +bool +type_has_padding (tree type) +{ + switch (TREE_CODE (type)) +{ +case RECORD_TYPE: + { btw, there's __builtin_clear_padding and a whole machinery around it in gimple-fold.c, I'm sure that parts could be re-used if they are neccessary in the end. To address the above suggestion: My study shows: the call to __builtin_clear_padding is expanded during gimplification phase. And there is no __bultin_clear_padding expanding during rtx expanding phase. However, for -ftrivial-auto-var-init, padding initialization should be done both in gimplification phase and rtx expanding phase. since the __builtin_clear_padding might not be good for rtx expanding, reusing __builtin_clear_padding might not work. Let me know if you have any more comments on this. Thanks. Qing
Re: [PATCH] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]
On Thu, Jun 03, 2021 at 08:09:36AM -0500, Bill Schmidt wrote: > Note also that swap optimization can handle more general cases than > simplify-rtx. Do you have examples? That should be fixed (unless it is something Power-specific?) > In my view it's best to have it covered in both places. Oh certainly, and we need that p8swaps pass on at least p8 anyway (but perhaps we can allow TImode in vector regs on later cpus). Segher
PING [PATCH] PR fortran/99839 - [9/10/11/12 Regression] ICE in inline_matmul_assign, at fortran/frontend-passes.c:4234
*PING* > Gesendet: Donnerstag, 27. Mai 2021 um 22:20 Uhr > Von: "Harald Anlauf" > An: "fortran" , "gcc-patches" > Betreff: [PATCH] PR fortran/99839 - [9/10/11/12 Regression] ICE in > inline_matmul_assign, at fortran/frontend-passes.c:4234 > > Dear Fortranners, > > frontend optimization tries to inline matmul, but then it also needs > to take care of the assignment to the result array. If that one is > not of canonical type, we currently get an ICE. The straightforward > solution is to simply punt in those cases and avoid inlining. > > Regtested on x86_64-pc-linux-gnu. > > OK for mainline? Backport to affected branches? > > Thanks, > Harald > > > Fortran - ICE in inline_matmul_assign > > Restrict inlining of matmul to those cases where assignment to the > result array does not need special treatment. > > gcc/fortran/ChangeLog: > > PR fortran/99839 > * frontend-passes.c (inline_matmul_assign): Do not inline matmul > if the assignment to the resulting array if it is not of canonical > type (real/integer/complex/logical). > > gcc/testsuite/ChangeLog: > > PR fortran/99839 > * gfortran.dg/inline_matmul_25.f90: New test. > >
Re: [PATCH] c++: Fix up attribute handling in methods in templates [PR100872]
On 6/3/21 5:00 AM, Jakub Jelinek wrote: Hi! The following testcase FAILs because a dependent (late) attribute is never tsubsted. While the testcase is OpenMP, I think it is a generic C++ FE problem that could affect any other dependent attribute. apply_late_template_attributes documents that it relies on /* save_template_attributes puts the dependent attributes at the beginning of the list; find the non-dependent ones. */ The "operator binding" attributes that are sometimes added are added to the head of DECL_ATTRIBUTES list though and because it doesn't have ATTR_IS_DEPENDENT set it violates this requirement. The following patch fixes it by adding that attribute after all ATTR_IS_DEPENDENT attributes. I'm not 100% sure if DECL_ATTRIBUTES can't be shared by multiple functions (e.g. the cdtor clones), but the code uses later remove_attribute which could break that too. In any case it passed bootstrap/regtest on x86_64-linux and i686-linux. OK. Other option would be to copy_list the ATTR_IS_DEPENDENT portion of the DECL_ATTRIBUTES list if we need to do this, that would be the same as this patch but replace that *ap = op_attr; at the end with *ap = NULL_TREE; DECL_ATTRIBUTES (cfn) = chainon (copy_list (DECL_ATTRIBUTES (cfn)), op_attr); Or perhaps set ATTR_IS_DEPENDENT on the "operator bindings" attribute, though it would need to be studied what would it try to do with the attribute during tsubst. 2021-06-03 Jakub Jelinek PR c++/100872 * name-lookup.c (maybe_save_operator_binding): Add op_attr after all ATTR_IS_DEPENDENT attributes in the DECL_ATTRIBUTES list rather than to the start. * g++.dg/gomp/declare-simd-8.C: New test. --- gcc/cp/name-lookup.c.jj 2021-05-11 09:06:24.281997782 +0200 +++ gcc/cp/name-lookup.c2021-06-02 15:50:52.042521824 +0200 @@ -9136,9 +9136,12 @@ maybe_save_operator_binding (tree e) tree op_attr = lookup_attribute (op_bind_attrname, attributes); if (!op_attr) { + tree *ap = &DECL_ATTRIBUTES (cfn); + while (*ap && ATTR_IS_DEPENDENT (*ap)) + ap = &TREE_CHAIN (*ap); op_attr = tree_cons (get_identifier (op_bind_attrname), - NULL_TREE, attributes); - DECL_ATTRIBUTES (cfn) = op_attr; + NULL_TREE, *ap); + *ap = op_attr; } tree op_bind = purpose_member (fnname, TREE_VALUE (op_attr)); --- gcc/testsuite/g++.dg/gomp/declare-simd-8.C.jj 2021-06-02 16:02:32.792681922 +0200 +++ gcc/testsuite/g++.dg/gomp/declare-simd-8.C 2021-06-02 16:02:09.849004442 +0200 @@ -0,0 +1,15 @@ +// PR c++/100872 + +template +struct S { + #pragma omp declare simd aligned(a : N * 2) aligned(b) linear(ref(b): N) + float foo (float *a, T *&b) { return *a + *b; } +}; + +S<16, float> s; + +float +bar (float *a, float *p) +{ + return s.foo (a, p); +} Jakub
Re: [PATCH] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]
On Thu, Jun 03, 2021 at 02:49:15PM +0800, Xionghu Luo wrote: > If remove the rotate in simplify-rtx like below: > > +++ b/gcc/simplify-rtx.c > @@ -3830,10 +3830,16 @@ simplify_context::simplify_binary_operation_1 > (rtx_code code, > case ROTATE: >if (trueop1 == CONST0_RTX (mode)) > return op0; > + > + if (GET_CODE (trueop0) == ROTATE && trueop1 == GEN_INT (64) > + && CONST_INT_P (XEXP (trueop0, 1)) > + && INTVAL (XEXP (trueop0, 1)) == 64) > + return XEXP (trueop0, 0); (The hardcoded 64 need improving -- but this is just a proof of concept I'll assume :-) ) > Combine still fail to merge the two instructions: > > Trying 6 -> 7: > 6: r120:KF#0=r125:KF#0<-<0x40 > REG_DEAD r125:KF > 7: [sfp:DI+r123:DI]=r120:KF#0<-<0x40 > REG_DEAD r120:KF > Successfully matched this instruction: > (set (mem/c:V1TI (plus:DI (reg/f:DI 110 sfp) > (reg:DI 123)) [1 S16 A128]) > (subreg:V1TI (reg:KF 125) 0)) > rejecting combination of insns 6 and 7 > original costs 4 + 4 = 8 > replacement cost 12 So what instructions were these? Why did the store cost 4 but the new one costs 12? > By hacking the vsx_le_perm_store_v1ti INSN_COST from 12 to 8, It should be the same cost as the other store! > it could merge the instructions: > > 21: r125:KF=%v2:KF > REG_DEAD %v2:KF > 2: NOTE_INSN_DELETED > 3: NOTE_INSN_FUNCTION_BEG > 6: NOTE_INSN_DELETED >17: r123:DI=0x20 > 7: [sfp:DI+r123:DI]=r125:KF#0 > REG_DEAD r125:KF >19: NOTE_INSN_DELETED >14: %v2:V1TI=[sfp:DI+r123:DI] > REG_DEAD r123:DI >15: use %v2:V1TI > > Then followed split1 pass will still split it to due to no dse pass > between to remove the memory operations on stack, remove the rotate > in swap won't face such problem since it runs before dse and no split > pass between them: Sure, but none of that is the point. I asked if we did this for TImode properly, and maybe we do, but: >22: r126:V1TI=r125:KF#0<-<0x40 >23: [sfp:DI+r123:DI]=r126:V1TI<-<0x40 ... this is V1TI mode. Segher
Re: [PATCH] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]
On 6/3/21 3:19 PM, Segher Boessenkool wrote: On Thu, Jun 03, 2021 at 08:09:36AM -0500, Bill Schmidt wrote: Note also that swap optimization can handle more general cases than simplify-rtx. Do you have examples? That should be fixed (unless it is something Power-specific?) It is Power-specific. This optimization looks at entire webs of computation to determine whether the computation can be done in the "wrong" lanes without problems, and removes all the extra swaps at once. That is beyond what simplify-rtx is capable of. It takes full dataflow analysis to see what can legally be done. As a side effect, p8swaps takes care of some cases that simplify-rtx would catch at that particular point in compilation, but not all. In my view it's best to have it covered in both places. Oh certainly, and we need that p8swaps pass on at least p8 anyway (but perhaps we can allow TImode in vector regs on later cpus). Yes, I agree that should potentially be revisited, but we still have the ABI requirement to keep them in GPRs for parameter passing, so there are limits on what we can do. Bill Segher
Re: [PATCH] c++: top-level cv-quals on type of NTTP [PR100893]
On 6/3/21 2:40 PM, Patrick Palka wrote: On Thu, 3 Jun 2021, Patrick Palka wrote: Here, we're rejecting the specialization of g with T=A, F=&f in the first testcase due to a spurious constness mismatch between the type of the template argument &f and the substituted type of F (the substituted type has a top-level const). Note that this mismatch doesn't occur with object pointers because in that case a call to perform_qualification_conversions from convert_nontype_argument implicitly adds a top-level const to the argument (via a cast) to match. This however seems to be a manifestation of a more general conformance issue -- that we're not dropping top-level cv-quals after substituting into the type of an NTTP as per [temp.param]/6 (we only do so at parse time in process_template_parm). This patch makes convert_template_argument drop top-level cv-quals accordingly. Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk? PR c++/100893 gcc/cp/ChangeLog: * pt.c (convert_template_argument): Strip top-level cv-quals on the substituted type of a non-type template parameter. gcc/testsuite/ChangeLog: * g++.dg/template/param4.C: New test. * g++.dg/template/param5.C: New test. --- gcc/cp/pt.c| 7 ++- gcc/testsuite/g++.dg/template/param4.C | 10 ++ gcc/testsuite/g++.dg/template/param5.C | 7 +++ 3 files changed, 23 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/template/param4.C create mode 100644 gcc/testsuite/g++.dg/template/param5.C diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 7211bdc5bbc..66cc88a331f 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -8494,7 +8494,12 @@ convert_template_argument (tree parm, return error_mark_node; } else - t = tsubst (t, args, complain, in_decl); + { + t = tsubst (t, args, complain, in_decl); + /* Ignore top-level qualifiers on the substituted type of this +non-type template parameter, as per [temp.param]/6. */ + t = cv_unqualified (t); + } Err, shortly after posting I realized we could get top-level cv-quals not only after substitution, but also after deduction of a decltype(auto) NTTP (as in nontype-auto19.C below), so the call to cv_unqualified would need to happen after do_auto_deduction too, like so. Testing in progress. -- >8 -- Subject: [PATCH] c++: top-level cv-quals on type of NTTP [PR100893] Here, we're rejecting the specialization of g with T=A, F=&f in param4.C below due to a spurious constness mismatch between the type of the template argument &f and the substituted type of F (the substituted type has a top-level const). Note that this mismatch doesn't occur with object pointers because in that case a call to perform_qualification_conversions from convert_nontype_argument implicitly adds a top-level const to the argument (via a cast) to match. This however seems to be a manifestation of a more general conformance issue -- that we're not dropping top-level cv-quals after substituting into the type of an NTTP as per [temp.param]/6 (we only do so at parse time in process_template_parm). So this patch makes convert_template_argument drop top-level cv-quals accordingly. Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk? PR c++/100893 gcc/cp/ChangeLog: * pt.c (convert_template_argument): Strip top-level cv-quals on the substituted type of a non-type template parameter. gcc/testsuite/ChangeLog: * g++.dg/template/param4.C: New test. * g++.dg/template/param5.C: New test. * g++.dg/cpp1z/nontype-auto19.C: New test. * g++.dg/cpp2a/concepts-decltype.C: Don't expect that the deduced type of a decltype(auto) NTTP has top-level cv-quals. --- gcc/cp/pt.c| 4 gcc/testsuite/g++.dg/cpp1z/nontype-auto19.C| 8 gcc/testsuite/g++.dg/cpp2a/concepts-decltype.C | 2 +- gcc/testsuite/g++.dg/template/param4.C | 10 ++ gcc/testsuite/g++.dg/template/param5.C | 7 +++ 5 files changed, 30 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/cpp1z/nontype-auto19.C create mode 100644 gcc/testsuite/g++.dg/template/param4.C create mode 100644 gcc/testsuite/g++.dg/template/param5.C diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 7211bdc5bbc..3cac073ed50 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -8496,6 +8496,10 @@ convert_template_argument (tree parm, else t = tsubst (t, args, complain, in_decl); + /* Drop top-level cv-qualifiers on the substituted/deduced type of +this non-type template parameter, as per [temp.param]/6. */ + t = cv_unqualified (t); Might move this after... if (invalid_nontype_parm_type_p (t, complain)) return error_mark_node; ...this test. OK either way. diff --git a/gcc/testsui
[PATCH] RISC-V: Enable riscv attributes by default for all riscv targets.
These were only enabled for embedded elf originally because that was the safe option, and linux had no obvious use for them. But now that we have new extensions coming like V that affect process state and ABIs, the attributes are expected to be useful for linux, and may be required by the psABI. clang already emits them for all riscv targets. Tested with a patched open embedded build and boot, and a native toolchain build. Committed. Jim gcc/ * config.gcc (riscv*-*-*): If --with-riscv-attribute not used, turn TARGET_RISCV_ATTRIBUTES on for all riscv targets. --- gcc/config.gcc | 9 + 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/gcc/config.gcc b/gcc/config.gcc index 92fad8e20ca..6833a6c13d9 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -4605,14 +4605,7 @@ case "${target}" in tm_defines="${tm_defines} TARGET_RISCV_ATTRIBUTE=0" ;; ""|default) - case "${target}" in - riscv*-*-elf*) - tm_defines="${tm_defines} TARGET_RISCV_ATTRIBUTE=1" - ;; - *) - tm_defines="${tm_defines} TARGET_RISCV_ATTRIBUTE=0" - ;; - esac + tm_defines="${tm_defines} TARGET_RISCV_ATTRIBUTE=1" ;; *) echo "--with-riscv-attribute=${with_riscv_attribute} is not supported. The argument must begin with yes, no or default." 1>&2 -- 2.25.1
Re: [PATCH] RISC-V: Enable riscv attributes by default for all riscv targets.
On Thu, 03 Jun 2021 13:55:40 PDT (-0700), Jim Wilson wrote: These were only enabled for embedded elf originally because that was the safe option, and linux had no obvious use for them. But now that we have new extensions coming like V that affect process state and ABIs, the attributes are expected to be useful for linux, and may be required by the psABI. clang already emits them for all riscv targets. Tested with a patched open embedded build and boot, and a native toolchain build. Committed. Works for me. Nelson: I'm assuming this is why the .riscv.attributes is in binutils-all now? That seems fine, we should just call it out as the reason explicitly by splitting that out into its own commit. Jim gcc/ * config.gcc (riscv*-*-*): If --with-riscv-attribute not used, turn TARGET_RISCV_ATTRIBUTES on for all riscv targets. --- gcc/config.gcc | 9 + 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/gcc/config.gcc b/gcc/config.gcc index 92fad8e20ca..6833a6c13d9 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -4605,14 +4605,7 @@ case "${target}" in tm_defines="${tm_defines} TARGET_RISCV_ATTRIBUTE=0" ;; ""|default) - case "${target}" in - riscv*-*-elf*) - tm_defines="${tm_defines} TARGET_RISCV_ATTRIBUTE=1" - ;; - *) - tm_defines="${tm_defines} TARGET_RISCV_ATTRIBUTE=0" - ;; - esac + tm_defines="${tm_defines} TARGET_RISCV_ATTRIBUTE=1" ;; *) echo "--with-riscv-attribute=${with_riscv_attribute} is not supported. The argument must begin with yes, no or default." 1>&2
Re: [RFC/PATCH 00/11] Fix up some unexpected empty split conditions
On Thu, Jun 03, 2021 at 11:25:49AM +0100, Richard Sandiford wrote: > We shouldn't just add "&&" to all define_insn_and_splits that currently > lack them. My previous post shows that this *already* is required. > IMO it's not reasonable to ask Kewen to do that for all ports. So the > process I suggested was a way of mechanically changing existing ports in > a way that would not require input from target maintainers, or extensive > testing on affected targets. I fear it will end up as Yet Another unfinished transition this way :-( > >> I don't know. "&& 1" looks kind of weird to me. > > > > We have it in rs6000.md since 2004. sparc has had it since 2002. i386 > > has had it since 2001. arm still does not have it :-) > > Sure, this syntax goes back 20 years. I don't think that answers the > point though. The question was whether a split condition "&& 1" is > more readable than a condition "", given that "" means "always" in other > contexts. Given the choice, IMO "" is more readable and "&& 1" looks > weird/inconsistent. In most ports "&& 1" is all over the place and well-known already. Sure we could change to "" meaning always, but that is not what it currently means! If we could just start all over we could do it perfectly (but see second-system syndrome, heh). But we cannot. IMO we should especially avoid everything that uses new semantics for old syntax. Segher
Re: [RFC/PATCH 00/11] Fix up some unexpected empty split conditions
On Thu, Jun 03, 2021 at 04:25:44PM -0500, Segher Boessenkool wrote: > If we could just start all over we could do it perfectly (but see > second-system syndrome, heh). But we cannot. IMO we should especially > avoid everything that uses new semantics for old syntax. Agreed, that would be a nightmare for backporting. Jakub
Re: [RFC/PATCH 00/11] Fix up some unexpected empty split conditions
On Thu, Jun 03, 2021 at 11:11:53AM -0600, Jeff Law wrote: > On 6/3/2021 2:00 AM, Segher Boessenkool wrote: > >The whole point of requiring the split condition to start with && is so > >it will become harder to mess things up (it will make the gen* code a > >tiny little bit simpler as well). And there is no transition period or > >anything like that needed either. Just the bunch that will break will > >need fixing. So let's find out how many of those there are :-) > Exactly. While these empty conditions or those not starting with "&&" > are technically valid, they're all suspicious from a port correctness > standpoint, particularly if the main condition is non-empty. And note that this is also the case if you wrote the insn condition as an empty string, but you used some iterator with a condition. I found many of these in rs6000. This will need to be fixed before we do anything else. > Having made that mistake when converting the H8 away from CC0, I can say > fairly confidently that if we had this in place a year ago that those > mistakes would likely have been avoided. Thankfully the H8 isn't a > heavily used port and has limped along until I stumbled over the issue a > week or so ago while polishing some improvements to the port. I've noticed unexpected splits in rs6000 only a few times over the years. That doesn't mean more didn't happen, they just didn't cause obvious enough problems :-) Segher
Re: [PATCH 04/11] cris: Update unexpected empty split condition
> From: Hans-Peter Nilsson > CC: "gcc-patches@gcc.gnu.org" > Date: Thu, 3 Jun 2021 18:12:25 +0200 > I'd > prefer to have the patch above committed sooner than the > conclusion of that discussion. (If you don't get to it, > I'll do it, after a round of testing.) Done; no regressions. brgds, H-P
Re: [PATCH] x86: Convert CONST_WIDE_INT to broadcast in move expanders
On Thu, Jun 3, 2021 at 12:39 AM Uros Bizjak wrote: > > On Thu, Jun 3, 2021 at 5:49 AM H.J. Lu wrote: > > > > Update move expanders to convert the CONST_WIDE_INT operand to vector > > broadcast from a byte with AVX2. Add ix86_gen_scratch_sse_rtx to > > return a scratch SSE register which won't increase stack alignment > > requirement and blocks transformation by the combine pass. > > Using fixed scratch reg is just too hackish for my taste. The It was recommended to use hard register for things like this: https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569945.html > expansion is OK to emit some optimized sequence, but the approach to > use fixed reg to bypass stack alignment functionality and combine is > not. > > Perhaps a new insn pattern should be introduced, e.g. > > (define_insn_and_split "" >[(set (match_opreand:V 0 "memory_operand" "=m,m") > (vec_duplicate:V > (match_operand:S 2 "reg_or_0_operand" "r,C")) > (clobber (match_scratch:V 1 "=x"))] > > and split it at some appropriate point. I will give it a try. Thanks. > Uros. > > > > > A small benchmark: > > > > https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/memset/broadcast > > > > shows that broadcast is a little bit faster on Intel Core i7-8559U: > > > > $ make > > gcc -g -I. -O2 -c -o test.o test.c > > gcc -g -c -o memory.o memory.S > > gcc -g -c -o broadcast.o broadcast.S > > gcc -o test test.o memory.o broadcast.o > > ./test > > memory : 99333 > > broadcast: 97208 > > $ > > > > broadcast is also smaller: > > > > $ size memory.o broadcast.o > >textdata bss dec hex filename > > 132 0 0 132 84 memory.o > > 122 0 0 122 7a broadcast.o > > $ > > > > gcc/ > > > > PR target/100865 > > * config/i386/i386-expand.c (ix86_expand_vector_init_duplicate): > > New prototype. > > (ix86_byte_broadcast): New function. > > (ix86_convert_const_wide_int_to_broadcast): Likewise. > > (ix86_expand_move): Try ix86_convert_const_wide_int_to_broadcast > > if mode size is 16 bytes or bigger. > > (ix86_expand_vector_move): Try > > ix86_convert_const_wide_int_to_broadcast. > > * config/i386/i386-protos.h (ix86_gen_scratch_sse_rtx): New > > prototype. > > * config/i386/i386.c (ix86_minimum_incoming_stack_boundary): Add > > an argument to ignore stack_alignment_estimated. It is passed > > as false by default. > > (ix86_gen_scratch_sse_rtx): New function. > > > > gcc/testsuite/ > > > > PR target/100865 > > * gcc.target/i386/pr100865-1.c: New test. > > * gcc.target/i386/pr100865-2.c: Likewise. > > * gcc.target/i386/pr100865-3.c: Likewise. > > * gcc.target/i386/pr100865-4.c: Likewise. > > * gcc.target/i386/pr100865-5.c: Likewise. > > --- > > gcc/config/i386/i386-expand.c | 103 ++--- > > gcc/config/i386/i386-protos.h | 2 + > > gcc/config/i386/i386.c | 50 +- > > gcc/testsuite/gcc.target/i386/pr100865-1.c | 13 +++ > > gcc/testsuite/gcc.target/i386/pr100865-2.c | 14 +++ > > gcc/testsuite/gcc.target/i386/pr100865-3.c | 15 +++ > > gcc/testsuite/gcc.target/i386/pr100865-4.c | 16 > > gcc/testsuite/gcc.target/i386/pr100865-5.c | 17 > > 8 files changed, 215 insertions(+), 15 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-1.c > > create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-2.c > > create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-3.c > > create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-4.c > > create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-5.c > > > > diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c > > index 4185f58eed5..658adafa269 100644 > > --- a/gcc/config/i386/i386-expand.c > > +++ b/gcc/config/i386/i386-expand.c > > @@ -93,6 +93,9 @@ along with GCC; see the file COPYING3. If not see > > #include "i386-builtins.h" > > #include "i386-expand.h" > > > > +static bool ix86_expand_vector_init_duplicate (bool, machine_mode, rtx, > > + rtx); > > + > > /* Split one or more double-mode RTL references into pairs of half-mode > > references. The RTL can be REG, offsettable MEM, integer constant, or > > CONST_DOUBLE. "operands" is a pointer to an array of double-mode RTLs > > to > > @@ -190,6 +193,65 @@ ix86_expand_clear (rtx dest) > >emit_insn (tmp); > > } > > > > +/* Return a byte value which V can be broadcasted from. Otherwise, > > + return INT_MAX. */ > > + > > +static int > > +ix86_byte_broadcast (HOST_WIDE_INT v) > > +{ > > + wide_int val = wi::uhwi (v, HOST_BITS_PER_WIDE_INT); > > + int byte_broadcast = wi::extract_uhwi (val, 0, BITS_PER_UNIT); > > + for (unsigned int i = BITS_PER_UNIT; > > + i < HOST_BITS_PER_WIDE_INT; > > + i += BITS
[PATCH] [og11] OpenMP/OpenACC: Move array_ref/indirect_ref handling code out of extract_base_bit_offset
At Richard Biener's suggestion, this patch undoes the following patch: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571712.html and moves the stripping of ARRAY_REFS/INDIRECT_REFS out of extract_base_bit_offset and back into the (two) call sites of the function. The difference between the two ways of looking through these nodes comes down to (I think) what processing has been done on the clause in question already: in the case where BASE_REF is non-NULL, we are processing an OMP_CLAUSE_DECL for the first time. Conversely, when BASE_REF is NULL, we are processing a node from the sorted list that is being constructed after a GOMP_MAP_STRUCT node. In practice, this appears to have no effect on test results (and I haven't come up with a new test where it makes a difference), though I will fold this version into the next iteration of these patches sent upstream in order to avoid potentially introducing a bug. Tested with offloading to NVPTX. I will apply to the og11 branch after the weekend. 2021-06-03 Julian Brown gcc/ * gimplify.c (extract_base_bit_offset): Don't look through ARRAY_REFs or INDIRECT_REFs here. (build_struct_group): Reinstate previous behaviour for handling ARRAY_REFs/INDIRECT_REFs. --- gcc/gimplify.c | 46 ++ 1 file changed, 26 insertions(+), 20 deletions(-) diff --git a/gcc/gimplify.c b/gcc/gimplify.c index c6ebef8e41c..1742a2cb564 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -8526,20 +8526,6 @@ extract_base_bit_offset (tree base, tree *base_ind, tree *base_ref, if (base_ref) *base_ref = NULL_TREE; - if (TREE_CODE (base) == ARRAY_REF) -{ - while (TREE_CODE (base) == ARRAY_REF) - base = TREE_OPERAND (base, 0); - if (TREE_CODE (base) != COMPONENT_REF - || TREE_CODE (TREE_TYPE (base)) != ARRAY_TYPE) - return NULL_TREE; -} - else if (TREE_CODE (base) == INDIRECT_REF - && TREE_CODE (TREE_OPERAND (base, 0)) == COMPONENT_REF - && (TREE_CODE (TREE_TYPE (TREE_OPERAND (base, 0))) - == REFERENCE_TYPE)) -base = TREE_OPERAND (base, 0); - base = get_inner_reference (base, &bitsize, &bitpos, &offset, &mode, &unsignedp, &reversep, &volatilep); @@ -9116,11 +9102,17 @@ build_struct_group (struct gimplify_omp_ctx *ctx, poly_offset_int coffset; poly_int64 cbitpos; tree base_ind, base_ref, tree_coffset; + tree ocd = OMP_CLAUSE_DECL (c); bool openmp = !(region_type & ORT_ACC); - tree base = extract_base_bit_offset (OMP_CLAUSE_DECL (c), &base_ind, - &base_ref, &cbitpos, &coffset, - &tree_coffset, openmp); + while (TREE_CODE (ocd) == ARRAY_REF) +ocd = TREE_OPERAND (ocd, 0); + + if (TREE_CODE (ocd) == INDIRECT_REF) +ocd = TREE_OPERAND (ocd, 0); + + tree base = extract_base_bit_offset (ocd, &base_ind, &base_ref, &cbitpos, + &coffset, &tree_coffset, openmp); bool do_map_struct = (base == decl && !tree_coffset); @@ -9347,9 +9339,23 @@ build_struct_group (struct gimplify_omp_ctx *ctx, poly_offset_int offset; poly_int64 bitpos; tree tree_offset; - tree base = extract_base_bit_offset (sc_decl, NULL, NULL, -&bitpos, &offset, -&tree_offset, openmp); + + if (TREE_CODE (sc_decl) == ARRAY_REF) + { + while (TREE_CODE (sc_decl) == ARRAY_REF) + sc_decl = TREE_OPERAND (sc_decl, 0); + if (TREE_CODE (sc_decl) != COMPONENT_REF + || TREE_CODE (TREE_TYPE (sc_decl)) != ARRAY_TYPE) + break; + } + else if (TREE_CODE (sc_decl) == INDIRECT_REF +&& TREE_CODE (TREE_OPERAND (sc_decl, 0)) == COMPONENT_REF +&& (TREE_CODE (TREE_TYPE (TREE_OPERAND (sc_decl, 0))) +== REFERENCE_TYPE)) + sc_decl = TREE_OPERAND (sc_decl, 0); + + tree base = extract_base_bit_offset (sc_decl, NULL, NULL, &bitpos, +&offset, &tree_offset, openmp); if (!base || !operand_equal_p (base, decl, 0)) break; if (scp) -- 2.29.2
Re: [Patch] Fortran/OpenMP: Add omp loop [PR99928]
On Thu, Jun 03, 2021 at 05:07:22PM +0200, Jakub Jelinek via Gcc-patches wrote: > I think best would be just to remove that part of the testcase > in the loop patch and handle the !$omp with !$acc continuations > and vice versa in a separate change. That seems to be a preexisting > bug not really related to whether we support loop or not. > fatal_error is certainly not ideal, but I can understand fixing > it otherwise might be hard. > Wonder if we just shouldn't treat the incorrect continuations > as if they were simple comments. Perhaps can't it gfc_error_now for the mixing of OMP and ACC line continuations and then act as if the incorrect ones were comments (like for -fopenmp -fno-openacc or -fopenacc -fno-openmp)? Jakub
Re: [PATCH] rtl: constm64_rtx..const64_rtx
On Thu, Jun 03, 2021 at 01:32:54PM -0600, Jeff Law wrote: > On 6/2/2021 4:43 PM, Segher Boessenkool wrote: > >We have has const0_rtx etc. since forever, this patch just increases the > >range (to those values that have had guaranteed unique RTXes since > >decades as well). > Yea, but often what you really want is CONST0_RTX (mode) instead of > const0_rtx. It's easily goof'd and often the cause minor missed > optimizations. I'd say "sometimes" instead of "often"... almost always you really do want a scalar integer. Also, although CONST0_RTX makes sense for pretty much all modes, CONST64_RTX will not (and those are not currently unique RTX anyway). Segher
Re: [wwwdocs] lists: Correct procmail recipe
On Wed, 2 Jun 2021, Gerald Pfeifer wrote: > On Tue, 1 Jun 2021, Segher Boessenkool wrote: > > We haven't had Sender: for a while now. > > "a while now" was about four(?) hours when you sent that yesterday. :-) > > I know since I still had been using that and was looking for all my > missing gcc-related mails yesterday afternoon. Thanks for spotting > this and this change! JFTR, nitpickish: I still get a "Sender:" field (obviously generated at gcc.gnu.org), just not matching that format. brgds, H-P
RE: [PATCH] [i386] Fix ICE of insn does not satisfy its constraints.
>-Original Message- >From: Jakub Jelinek >Sent: Thursday, June 3, 2021 9:49 PM >To: Liu, Hongtao >Cc: gcc-patches@gcc.gnu.org >Subject: Re: [PATCH] [i386] Fix ICE of insn does not satisfy its constraints. > >On Thu, Jun 03, 2021 at 05:07:26PM +0800, liuhongt via Gcc-patches wrote: >> @@ -18163,10 +18163,10 @@ (define_expand "v16qiv16si2" >>"TARGET_AVX512F") >> >> (define_insn "avx2_v8qiv8si2" >> - [(set (match_operand:V8SI 0 "register_operand" "=v") >> + [(set (match_operand:V8SI 0 "register_operand" "=Yv") >> (any_extend:V8SI >>(vec_select:V8QI >> -(match_operand:V16QI 1 "register_operand" "v") >> +(match_operand:V16QI 1 "register_operand" "Yv") >> (parallel [(const_int 0) (const_int 1) >> (const_int 2) (const_int 3) >> (const_int 4) (const_int 5) > >Why do you need this change (and similarly other v -> Yv changes)? >I mean, ix86_hard_regno_mode_ok for TARGET_AVX512F >&& !TARGET_AVX512VL should return false for the 16-byte and 32-byte vector >modes. > >The reason to use Yv is typically where the match_operand has 64-byte vector >mode or scalar mode, yet it needs an AVX512VL instruction. > >The changes to use Yw look ok, that is for the cases where the insn requires >both AVX512VL and AVX512BW, while ix86_hard_regno_mode_ok ensures >the xmm16+ regs won't be used for the 16/32-byte vectors when AVX512VL is >not on, it doesn't ensure that AVX512BW will be enabled. Thanks for the review. Yes, you're right, AVX512VL parts are already guaranteed by ix86_hard_regno_mode_ok. Here is updated patch. > > Jakub 0001-i386-Fix-ICE-of-insn-does-not-satisfy-its-constraint_v2.patch Description: 0001-i386-Fix-ICE-of-insn-does-not-satisfy-its-constraint_v2.patch
Re: [PATCH] RISC-V: Enable riscv attributes by default for all riscv targets.
On Fri, Jun 4, 2021 at 5:20 AM Palmer Dabbelt wrote: > > On Thu, 03 Jun 2021 13:55:40 PDT (-0700), Jim Wilson wrote: > > These were only enabled for embedded elf originally because that was > > the safe option, and linux had no obvious use for them. But now that > > we have new extensions coming like V that affect process state and ABIs, > > the attributes are expected to be useful for linux, and may be required > > by the psABI. clang already emits them for all riscv targets. > > > > Tested with a patched open embedded build and boot, and a native > > toolchain build. > > > > Committed. > > Works for me. > > Nelson: I'm assuming this is why the .riscv.attributes is in > binutils-all now? That seems fine, we should just call it out as the > reason explicitly by splitting that out into its own commit. Yes this patch will make .riscv.attributes always be generated, both for elf and linux toolchains. Got it, I will split this one into a separate commit, thanks. FYI, this update resolves the ABI problems for linux toolchain that I met before, https://sourceware.org/pipermail/binutils/2020-November/114016.html Now linux toolchain also generate the elf architecture attributes, so binutils will try to choose the correct ABI according to the attributes, if user compile a assembly code without setting -mabi=. Thanks Nelson > > Jim > > > > gcc/ > > * config.gcc (riscv*-*-*): If --with-riscv-attribute not used, > > turn TARGET_RISCV_ATTRIBUTES on for all riscv targets. > > --- > > gcc/config.gcc | 9 + > > 1 file changed, 1 insertion(+), 8 deletions(-) > > > > diff --git a/gcc/config.gcc b/gcc/config.gcc > > index 92fad8e20ca..6833a6c13d9 100644 > > --- a/gcc/config.gcc > > +++ b/gcc/config.gcc > > @@ -4605,14 +4605,7 @@ case "${target}" in > > tm_defines="${tm_defines} TARGET_RISCV_ATTRIBUTE=0" > > ;; > > ""|default) > > - case "${target}" in > > - riscv*-*-elf*) > > - tm_defines="${tm_defines} > > TARGET_RISCV_ATTRIBUTE=1" > > - ;; > > - *) > > - tm_defines="${tm_defines} > > TARGET_RISCV_ATTRIBUTE=0" > > - ;; > > - esac > > + tm_defines="${tm_defines} TARGET_RISCV_ATTRIBUTE=1" > > ;; > > *) > > echo "--with-riscv-attribute=${with_riscv_attribute} > > is not supported. The argument must begin with yes, no or default." 1>&2
Re: [PATCH] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]
On 2021/6/4 04:31, Segher Boessenkool wrote: > On Thu, Jun 03, 2021 at 02:49:15PM +0800, Xionghu Luo wrote: >> If remove the rotate in simplify-rtx like below: >> >> +++ b/gcc/simplify-rtx.c >> @@ -3830,10 +3830,16 @@ simplify_context::simplify_binary_operation_1 >> (rtx_code code, >> case ROTATE: >> if (trueop1 == CONST0_RTX (mode)) >> return op0; >> + >> + if (GET_CODE (trueop0) == ROTATE && trueop1 == GEN_INT (64) >> + && CONST_INT_P (XEXP (trueop0, 1)) >> + && INTVAL (XEXP (trueop0, 1)) == 64) >> + return XEXP (trueop0, 0); > > (The hardcoded 64 need improving -- but this is just a proof of concept > I'll assume :-) ) > >> Combine still fail to merge the two instructions: >> >> Trying 6 -> 7: >> 6: r120:KF#0=r125:KF#0<-<0x40 >>REG_DEAD r125:KF >> 7: [sfp:DI+r123:DI]=r120:KF#0<-<0x40 >>REG_DEAD r120:KF >> Successfully matched this instruction: >> (set (mem/c:V1TI (plus:DI (reg/f:DI 110 sfp) >> (reg:DI 123)) [1 S16 A128]) >> (subreg:V1TI (reg:KF 125) 0)) >> rejecting combination of insns 6 and 7 >> original costs 4 + 4 = 8 >> replacement cost 12 > > So what instructions were these? Why did the store cost 4 but the new > one costs 12? For this case of __float128 to vector __int128: typedef union { __float128 vf1; vector __int128 vi128; __int128 i128; } VF_128; vector __int128 foo1 (__float128 f128) { VF_128 vunion; vunion.vf1 = f128; return vunion.vi128; } Without this patch, the RTL in combine is: (insn 6 3 17 2 (set (subreg:V1TI (reg:KF 120 [ f128 ]) 0) (rotate:V1TI (subreg:V1TI (reg:KF 125) 0) (const_int 64 [0x40]))) "pr100085.c":258:14 1113 {*vsx_le_permute_v1ti} (expr_list:REG_DEAD (reg:KF 125) (nil))) (insn 17 6 7 2 (set (reg:DI 123) (const_int 32 [0x20])) "pr100085.c":258:14 636 {*movdi_internal64} (nil)) (insn 7 17 19 2 (set (mem/c:V1TI (plus:DI (reg/f:DI 110 sfp) (reg:DI 123)) [1 S16 A128]) (rotate:V1TI (subreg:V1TI (reg:KF 120 [ f128 ]) 0) (const_int 64 [0x40]))) "pr100085.c":258:14 1113 {*vsx_le_permute_v1ti} (expr_list:REG_DEAD (reg:KF 120 [ f128 ]) (nil))) (note 19 7 14 2 NOTE_INSN_DELETED) (insn 14 19 15 2 (set (reg/i:V1TI 66 %v2) (mem/c:V1TI (plus:DI (reg/f:DI 110 sfp) (reg:DI 123)) [1 S16 A128])) "pr100085.c":260:1 1119 {*vsx_le_perm_load_v1ti} (expr_list:REG_DEAD (reg:DI 123) (nil))) (insn 15 14 0 2 (use (reg/i:V1TI 66 %v2)) "pr100085.c":260:1 -1 (nil)) insn 6 and insn 7 are two vsx_le_permute_v1ti instructions each with costs 4, (The two instructions are VSX and LE specific like Bill said, swap pass tries to remove insn if legal). If remove the rotates in simplify-rtx.c (simplify_context::simplify_binary_operation_1) like my last reply, combine will try to merge them to vsx_le_perm_store_v1ti whose insn cost is 12 and meet "rejecting combination". They are all V1TI mode. > >> By hacking the vsx_le_perm_store_v1ti INSN_COST from 12 to 8, > > It should be the same cost as the other store! vsx_le_permute_v1ti's cost is defined to 4 in vsx.md: ;; Little endian word swapping for 128-bit types that are either scalars or the ;; special V1TI container class, which it is not appropriate to use vec_select ;; for the type. (define_insn "*vsx_le_permute_" [(set (match_operand:VSX_TI 0 "nonimmediate_operand" "=wa,wa,Z,&r,&r,Q") (rotate:VSX_TI (match_operand:VSX_TI 1 "input_operand" "wa,Z,wa,r,Q,r") (const_int 64)))] "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR" "@ xxpermdi %x0,%x1,%x1,2 lxvd2x %x0,%y1 stxvd2x %x1,%y0 mr %0,%L1\;mr %L0,%1 ld%U1%X1 %0,%L1\;ld%U1%X1 %L0,%1 std%U0%X0 %L1,%0\;std%U0%X0 %1,%L0" [(set_attr "length" "*,*,*,8,8,8") (set_attr "type" "vecperm,vecload,vecstore,*,load,store")]) > >> it could merge the instructions: >> >> 21: r125:KF=%v2:KF >>REG_DEAD %v2:KF >> 2: NOTE_INSN_DELETED >> 3: NOTE_INSN_FUNCTION_BEG >> 6: NOTE_INSN_DELETED >> 17: r123:DI=0x20 >> 7: [sfp:DI+r123:DI]=r125:KF#0 >>REG_DEAD r125:KF >> 19: NOTE_INSN_DELETED >> 14: %v2:V1TI=[sfp:DI+r123:DI] >>REG_DEAD r123:DI >> 15: use %v2:V1TI >> >> Then followed split1 pass will still split it to due to no dse pass >> between to remove the memory operations on stack, remove the rotate >> in swap won't face such problem since it runs before dse and no split >> pass between them: > > Sure, but none of that is the point. I asked if we did this for TImode > properly, and maybe we do, but: > >> 22: r126:V1TI=r125:KF#0<-<0x40 >> 23: [sfp:DI+r123:DI]=r126:V1TI<-<0x40 > > ... this is V1TI mode. > > > Segher > -- Thanks, Xionghu
Re: [PATCH] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]
Hi, On 2021/6/3 21:09, Bill Schmidt wrote: > On 6/2/21 7:46 PM, Xionghu Luo wrote: >> Hi, >> >> On 2021/6/3 06:20, Segher Boessenkool wrote: >>> On Wed, Jun 02, 2021 at 03:19:32AM -0500, Xionghu Luo wrote: On P8LE, extra rot64+rot64 load or store instructions are generated in float128 to vector __int128 conversion. This patch teaches pass swaps to also handle such pattens to remove extra swap instructions. >>> Did you check if this is already handled by simplify-rtx if the mode had >>> been TImode (not V1TImode)? If not, why do you not handle it there? >> I tried to do it in combine or peephole, the later pass split2 >> or split3 will still split it to rotate + rotate again as we have split >> after reload, and this pattern is quite P8LE specific, so put it in pass >> swap. The simplify-rtx could simplify >> r124:KF#0=r123:KF#0<-<0x40<-<0x40 to r124:KF#0=r123:KF#0 for register >> operations already. >> >> >> vsx.md: >> >> ;; The post-reload split requires that we re-permute the source >> ;; register in case it is still live. >> (define_split >> [(set (match_operand:VSX_LE_128 0 "memory_operand") >> (match_operand:VSX_LE_128 1 "vsx_register_operand"))] >> "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed && >> !TARGET_P9_VECTOR >> && !altivec_indexed_or_indirect_operand (operands[0], mode)" >> [(const_int 0)] >> { >> rs6000_emit_le_vsx_permute (operands[1], operands[1], mode); >> rs6000_emit_le_vsx_permute (operands[0], operands[1], mode); >> rs6000_emit_le_vsx_permute (operands[1], operands[1], mode); >> DONE; >> }) > > Note also that swap optimization can handle more general cases than > simplify-rtx. In my view it's best to have it covered in both places. > But this pattern is after reload quite later than swap optimization, so it couldn't remove the swap operations as expected, I have a below example that matched the above pattern in pass split2, this may be not quite appropriate as there is a function call between the load and store. extern vector __int128 foo1 (__float128 a); int foo2 () { __binary128 f128 = {3.1415926535897932384626433832795028841971693993751058Q}; vector __int128 ret = foo1 (f128); return ret[0]; } 295r.split (*see insn 35, 36, 37*): ... Splitting with gen_split_558 (vsx.md:1079) ... (insn 33 12 34 2 (set (reg/f:DI 9 %r9 [121]) (high:DI (unspec:DI [ (symbol_ref:DI ("*.LANCHOR0") [flags 0x182]) (reg:DI 2 %r2) ] UNSPEC_TOCREL))) "pr100085.c":279:25 715 {*largetoc_high} (nil)) (insn 34 33 6 2 (set (reg/f:DI 9 %r9 [121]) (lo_sum:DI (reg/f:DI 9 %r9 [121]) (unspec:DI [ (symbol_ref:DI ("*.LANCHOR0") [flags 0x182]) (reg:DI 2 %r2) ] UNSPEC_TOCREL))) "pr100085.c":279:25 717 {*largetoc_low} (expr_list:REG_EQUAL (symbol_ref:DI ("*.LANCHOR0") [flags 0x182]) (nil))) (insn 6 34 8 2 (set (reg:V1TI 66 %v2 [123]) (rotate:V1TI (mem/c:V1TI (reg/f:DI 9 %r9 [121]) [1 f128+0 S16 A128]) (const_int 64 [0x40]))) "pr100085.c":279:25 1113 {*vsx_le_permute_v1ti} (nil)) (insn 8 6 9 2 (set (reg:V1TI 66 %v2) (rotate:V1TI (reg:V1TI 66 %v2 [123]) (const_int 64 [0x40]))) "pr100085.c":279:25 1113 {*vsx_le_permute_v1ti} (nil)) (call_insn 9 8 32 2 (parallel [ (set (reg:V1TI 66 %v2) (call (mem:SI (symbol_ref:DI ("foo1") [flags 0x41] ) [0 foo 1 S4 A8]) (const_int 0 [0]))) (use (const_int 0 [0])) (clobber (reg:DI 96 lr)) ]) "pr100085.c":279:25 735 {*call_value_nonlocal_aixdi} (expr_list:REG_CALL_DECL (symbol_ref:DI ("foo1") [flags 0x41] ) (nil)) (expr_list (use (reg:DI 2 %r2)) (expr_list:KF (use (reg:KF 66 %v2)) (nil (insn 32 9 35 2 (set (reg:DI 9 %r9 [138]) (plus:DI (reg/f:DI 1 %r1) (const_int 32 [0x20]))) "pr100085.c":279:25 66 {*adddi3} (nil)) (insn 35 32 36 2 (set (reg:V1TI 66 %v2) (rotate:V1TI (reg:V1TI 66 %v2) (const_int 64 [0x40]))) "pr100085.c":279:25 1113 {*vsx_le_permute_v1ti} (nil)) (insn 36 35 37 2 (set (mem/c:V1TI (reg:DI 9 %r9 [138]) [2 %sfp+32 S16 A128]) (rotate:V1TI (reg:V1TI 66 %v2) (const_int 64 [0x40]))) "pr100085.c":279:25 1113 {*vsx_le_permute_v1ti} (nil)) (insn 37 36 28 2 (set (reg:V1TI 66 %v2) (rotate:V1TI (reg:V1TI 66 %v2) (const_int 64 [0x40]))) "pr100085.c":279:25 1113 {*vsx_le_permute_v1ti} (nil)) (insn 28 37 17 2 (set (reg:DI 3 %r3 [133]) (mem/c:DI (plus:DI (reg/f:DI 1 %r1) (const_int 32 [0x20])) [2 %sfp+32 S8 A128])) "pr100085.c":279:25 636 {*movdi_internal64} (nil)) (insn 17 28 18 2 (set (reg/i:DI 3 %r3) (sign_extend:DI (reg:SI 3 %r3 [129]))) "pr100085.c":281:1 31 {extendsidi2} (nil)) (insn 18 17 30 2 (use (reg/i:DI 3 %r3)) "pr100085.c":
Re: [PATCH] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]
On 2021/6/4 04:16, Segher Boessenkool wrote: > Hi! > > On Thu, Jun 03, 2021 at 08:46:46AM +0800, Xionghu Luo wrote: >> On 2021/6/3 06:20, Segher Boessenkool wrote: >>> On Wed, Jun 02, 2021 at 03:19:32AM -0500, Xionghu Luo wrote: On P8LE, extra rot64+rot64 load or store instructions are generated in float128 to vector __int128 conversion. This patch teaches pass swaps to also handle such pattens to remove extra swap instructions. >>> >>> Did you check if this is already handled by simplify-rtx if the mode had >>> been TImode (not V1TImode)? If not, why do you not handle it there? >> >> I tried to do it in combine or peephole, the later pass split2 >> or split3 will still split it to rotate + rotate again as we have split >> after reload, and this pattern is quite P8LE specific, so put it in pass >> swap. The simplify-rtx could simplify >> r124:KF#0=r123:KF#0<-<0x40<-<0x40 to r124:KF#0=r123:KF#0 for register >> operations already. > > What mode are those subregs? Abbreviated RTL printouts are very lossy. > Assuming those are TImode (please check), then yes, that is what I > asked, thanks. typedef union { __float128 vf1; vector __int128 vi128; __int128 i128; } VF_128; VF_128 vu; int foo3 () { __float128 f128 = {3.1415926535897932384626433832795028841971693993751058Q}; vu.vf1 = f128; vector __int128 ret = vu.vi128; return ret[0]; } This case catches such optimization, they are also V1TImode: Trying 8 -> 9: 8: r122:KF#0=r123:KF#0<-<0x40 REG_DEAD r123:KF 9: r124:KF#0=r122:KF#0<-<0x40 REG_DEAD r122:KF Successfully matched this instruction: (set (subreg:V1TI (reg:KF 124) 0) (rotate:V1TI (rotate:V1TI (subreg:V1TI (reg:KF 123) 0) (const_int 64 [0x40])) (const_int 64 [0x40]))) allowing combination of insns 8 and 9 original costs 4 + 4 = 8 replacement cost 4 deferring deletion of insn with uid = 8. modifying insn i3 9: r124:KF#0=r123:KF#0<-<0x40<-<0x40 REG_DEAD r123:KF deferring rescan insn with uid = 9. With confirmation, actually it was optimized by this pattern from vsx.md in split1 pass instead of simplify-rtx. (define_insn_and_split "*vsx_le_undo_permute_" [(set (match_operand:VSX_TI 0 "vsx_register_operand" "=wa,wa") (rotate:VSX_TI (rotate:VSX_TI (match_operand:VSX_TI 1 "vsx_register_operand" "0,wa") (const_int 64)) (const_int 64)))] "!BYTES_BIG_ENDIAN && TARGET_VSX" "@ # xxlor %x0,%x1" "&& 1" [(set (match_dup 0) (match_dup 1))] { if (reload_completed && REGNO (operands[0]) == REGNO (operands[1])) { emit_note (NOTE_INSN_DELETED); DONE; } } [(set_attr "length" "0,4") (set_attr "type" "veclogical")]) > >> ;; The post-reload split requires that we re-permute the source >> ;; register in case it is still live. >> (define_split >>[(set (match_operand:VSX_LE_128 0 "memory_operand") >> (match_operand:VSX_LE_128 1 "vsx_register_operand"))] >>"!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed && !TARGET_P9_VECTOR >> && !altivec_indexed_or_indirect_operand (operands[0], mode)" >>[(const_int 0)] >> { >>rs6000_emit_le_vsx_permute (operands[1], operands[1], mode); >>rs6000_emit_le_vsx_permute (operands[0], operands[1], mode); >>rs6000_emit_le_vsx_permute (operands[1], operands[1], mode); >>DONE; >> }) > > Yes, that needs improvement itself. > > The tthing to realise is that TImode is optimised by generic code just > fine (as all scalar integer modes are), but V1TImode is not. We have > that mode because we really needed to not put TImode in vector registers > so much on older cpus, but that balance may have changed by now. Worth > experimenting with, we now can do pretty much all noormal operations in > vector registers! > We have two issues stated in PR100085, one is __float128 to vector __int128 (V1TImode), the other is float128 to __float128 to __int128 (TImode). The first one could be solved by this patch(by pass swap optimization), so to cover the TImode case, we should also generate rotate permute instructions when gen_movti for P8LE like gen_movkf in vector.md(below change is exactly copied from "mov"...) diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 84820d3b5cb..f35c235a39e 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -7385,7 +7385,22 @@ (define_expand "mov" (match_operand:INT 1 "any_operand"))] "" { - rs6000_emit_move (operands[0], operands[1], mode); + /* When generating load/store instructions to/from VSX registers on + pre-power9 hardware in little endian mode, we need to emit register + permute instructions to byte swap the contents, since the VSX load/store + instructions do not include a byte swap as part of their operation. + Altivec loads and stores have no such problem, so we skip them below. */ + if (!BYTES_BIG_ENDIAN + && VECTOR_MEM_VSX_P (mode) + &
RE: [PATCH] Canonicalize (vec_duplicate (not A)) to (not (vec_duplicate A)).
>-Original Message- >From: Segher Boessenkool >Sent: Friday, June 4, 2021 4:00 AM >To: Liu, Hongtao >Cc: Richard Biener ; GCC Patches patc...@gcc.gnu.org> >Subject: Re: [PATCH] Canonicalize (vec_duplicate (not A)) to (not >(vec_duplicate A)). > >On Thu, Jun 03, 2021 at 11:03:43AM +, Liu, Hongtao wrote: >> >A very typical example is how UMIN is optimised: >> > >> > case UMIN: >> > if (trueop1 == CONST0_RTX (mode) && ! side_effects_p (op0)) >> >return op1; >> > if (rtx_equal_p (trueop0, trueop1) && ! side_effects_p (op0)) >> >return op0; >> > tem = simplify_associative_operation (code, mode, op0, op1); >> > if (tem) >> >return tem; >> > break; >> > >> >(the stuff using "tem"). >> > >> >Hongtao, can we do something similar here? Does that work well? >> >Please try it out :-) >> >> In simplify_rtx, no simplication occurs, there is just the difference >> between (vec_duplicate (not REG)) and (not (vec_duplicate (REG)). So here >tem will only be 0. > >simplify-rtx is used by combine. When you do and+not+splat for example my >suggestion should kick in. Try it out, don't just dismiss it? > Forgive my obtuseness, do you mean try the following changes, if so then there will be no "kick in", temp will be 0, there's no simplification here since it's just the difference between (vec_duplicate (not REG)) and (not (vec_duplicate (REG)). Or maybe you mean something else? @@ -1708,6 +1708,17 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode, #endif break; + /* Canonicalize (vec_duplicate (not A)) to (not (vec_duplicate A)). */ +case VEC_DUPLICATE: + if (GET_CODE (op) == NOT) + { + rtx vec_dup = gen_rtx_VEC_DUPLICATE (mode, XEXP (op, 0)); + temp = simplify_unary_operation (NOT, mode, vec_dup, GET_MODE (op)); + if (temp) + return temp; + } + break; + >> Basically we don't know it's a simplication until combine successfully >> split the >> 3->2 instructions (not + broadcast + and to andnot + broadcast), but >> 3->it's pretty awkward >> to do this in combine. > >But you need to do this *before* it is split. That is the whole point. > >> Consider andnot is existed for many backends, I think a canonicalization is >needed here. > >Please do note that that is not as easy as yoou may think: you need to make >sure nothing ever creates non-canonical code. > >> Maybe we can add insn canonicalization for transforming (and >> (vect_duplicate (not A)) B) to (and (not (duplicate (not A)) B) instead of >(vec_duplicate (not A)) to (not (vec_duplicate A))? > >I don't understand what this means? I mean let's give a last shot for andnot in case AND like below @ -3702,6 +3702,16 @@ simplify_context::simplify_binary_operation_1 (rtx_code code, tem = simplify_associative_operation (code, mode, op0, op1); if (tem) return tem; + + if (GET_CODE (op0) == VEC_DUPLICATE + && GET_CODE (XEXP (op0, 0)) == NOT) + { + rtx vec_dup = gen_rtx_VEC_DUPLICATE (GET_MODE (op0), + XEXP (XEXP (op0, 0), 0)); + return simplify_gen_binary (AND, mode, + gen_rtx_NOT (mode, vec_dup), + op1); + } break; > > >Segher
Re: [PATCH 1/2] CALL_INSN may not be a real function call.
Ping, This is a splitted middle-end patch as a follow up of https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571544.html On Thu, Jun 3, 2021 at 2:54 PM liuhongt via Gcc-patches wrote: > > Use "used" flag for CALL_INSN to indicate it's a fake call. If it's a > fake call, it won't have its own function stack. > > gcc/ChangeLog > > PR target/82735 > * df-scan.c (df_get_call_refs): When call_insn is a fake call, > it won't use stack pointer reg. > * final.c (leaf_function_p): When call_insn is a fake call, it > won't affect caller as a leaf function. > * reg-stack.c (callee_clobbers_any_stack_reg): New. > (subst_stack_regs): When call_insn doesn't clobber any stack > reg, don't clear the arguments. > * rtl.c (shallow_copy_rtx): Don't clear flag used when orig is > a insn. > * shrink-wrap.c (requires_stack_frame_p): No need for stack > frame for a fake call. > * rtl.h (FAKE_CALL_P): New macro. > --- > gcc/df-scan.c | 3 ++- > gcc/final.c | 3 ++- > gcc/reg-stack.c | 18 +- > gcc/rtl.c | 6 -- > gcc/rtl.h | 5 + > gcc/shrink-wrap.c | 2 +- > 6 files changed, 31 insertions(+), 6 deletions(-) > > diff --git a/gcc/df-scan.c b/gcc/df-scan.c > index 6691c3e8357..1268536b3f0 100644 > --- a/gcc/df-scan.c > +++ b/gcc/df-scan.c > @@ -3090,7 +3090,8 @@ df_get_call_refs (class df_collection_rec > *collection_rec, > >for (i = 0; i < FIRST_PSEUDO_REGISTER; i++) > { > - if (i == STACK_POINTER_REGNUM) > + if (i == STACK_POINTER_REGNUM > + && !FAKE_CALL_P (insn_info->insn)) > /* The stack ptr is used (honorarily) by a CALL insn. */ > df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i], >NULL, bb, insn_info, DF_REF_REG_USE, > diff --git a/gcc/final.c b/gcc/final.c > index e0a70fcd830..817f7722cb2 100644 > --- a/gcc/final.c > +++ b/gcc/final.c > @@ -4109,7 +4109,8 @@ leaf_function_p (void) >for (insn = get_insns (); insn; insn = NEXT_INSN (insn)) > { >if (CALL_P (insn) > - && ! SIBLING_CALL_P (insn)) > + && ! SIBLING_CALL_P (insn) > + && ! FAKE_CALL_P (insn)) > return 0; >if (NONJUMP_INSN_P (insn) > && GET_CODE (PATTERN (insn)) == SEQUENCE > diff --git a/gcc/reg-stack.c b/gcc/reg-stack.c > index 25210f0c17f..1d9ea035cf4 100644 > --- a/gcc/reg-stack.c > +++ b/gcc/reg-stack.c > @@ -174,6 +174,7 @@ > #include "reload.h" > #include "tree-pass.h" > #include "rtl-iter.h" > +#include "function-abi.h" > > #ifdef STACK_REGS > > @@ -2368,6 +2369,18 @@ subst_asm_stack_regs (rtx_insn *insn, stack_ptr > regstack) > } >} > } > + > +/* Return true if a function call is allowed to alter some or all bits > + of any stack reg. */ > +static bool > +callee_clobbers_any_stack_reg (const function_abi & callee_abi) > +{ > + for (unsigned regno = FIRST_STACK_REG; regno <= LAST_STACK_REG; regno++) > +if (callee_abi.clobbers_at_least_part_of_reg_p (regno)) > + return true; > + return false; > +} > + > > /* Substitute stack hard reg numbers for stack virtual registers in > INSN. Non-stack register numbers are not changed. REGSTACK is the > @@ -2382,7 +2395,10 @@ subst_stack_regs (rtx_insn *insn, stack_ptr regstack) >bool control_flow_insn_deleted = false; >int i; > > - if (CALL_P (insn)) > + /* If the target of the call doesn't clobber any stack registers, > + Don't clear the arguments. */ > + if (CALL_P (insn) > + && callee_clobbers_any_stack_reg (insn_callee_abi (insn))) > { >int top = regstack->top; > > diff --git a/gcc/rtl.c b/gcc/rtl.c > index b0ba1ff684c..aaee882f5ca 100644 > --- a/gcc/rtl.c > +++ b/gcc/rtl.c > @@ -395,8 +395,10 @@ shallow_copy_rtx (const_rtx orig MEM_STAT_DECL) > case SCRATCH: >break; > default: > - /* For all other RTXes clear the used flag on the copy. */ > - RTX_FLAG (copy, used) = 0; > + /* For all other RTXes clear the used flag on the copy. > +CALL_INSN use "used" flag to indicate it's a fake call. */ > + if (!INSN_P (orig)) > + RTX_FLAG (copy, used) = 0; >break; > } >return copy; > diff --git a/gcc/rtl.h b/gcc/rtl.h > index 35178b5bfac..5ed0d6dd6fa 100644 > --- a/gcc/rtl.h > +++ b/gcc/rtl.h > @@ -839,6 +839,11 @@ struct GTY(()) rtvec_def { > /* Predicate yielding nonzero iff X is a call insn. */ > #define CALL_P(X) (GET_CODE (X) == CALL_INSN) > > +/* 1 if RTX is a call_insn for a fake call. > + CALL_INSN use "used" flag to indicate it's a fake call. */ > +#define FAKE_CALL_P(RTX)\ > + (RTL_FLAG_CHECK1 ("FAKE_CALL_P", (RTX), CALL_INSN)->used) > + > /* Predicate yielding nonzero iff X is an insn that cannot jump. */ > #define NONJUMP_INSN_P(X) (GET_CODE (X) == INSN) > > diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c > i
Re: [PATCH 2/2] Fix _mm256_zeroupper by representing the instructions as call_insns in which the call has a special vzeroupper ABI.
Ping This is a splitted backend patch as a follow up of https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571545.html On Thu, Jun 3, 2021 at 2:55 PM liuhongt via Gcc-patches wrote: > > When __builtin_ia32_vzeroupper is called explicitly, the corresponding > vzeroupper pattern does not carry any CLOBBERS or SETs before LRA, > which leads to incorrect optimization in pass_reload. In order to > solve this problem, this patch refine instructions as call_insns in > which the call has a special vzeroupper ABI. > > gcc/ChangeLog: > > PR target/82735 > * config/i386/i386-expand.c (ix86_expand_builtin): Remove > assignment of cfun->machine->has_explicit_vzeroupper. > * config/i386/i386-features.c > (ix86_add_reg_usage_to_vzerouppers): Delete. > (ix86_add_reg_usage_to_vzeroupper): Ditto. > (rest_of_handle_insert_vzeroupper): Remove > ix86_add_reg_usage_to_vzerouppers, add df_analyze at the end > of the function. > (gate): Remove cfun->machine->has_explicit_vzeroupper. > * config/i386/i386-protos.h (ix86_expand_avx_vzeroupper): > Declared. > * config/i386/i386.c (ix86_insn_callee_abi): New function. > (ix86_initialize_callee_abi): Ditto. > (ix86_expand_avx_vzeroupper): Ditto. > (ix86_hard_regno_call_part_clobbered): Adjust for vzeroupper > ABI. > (TARGET_INSN_CALLEE_ABI): Define as ix86_insn_callee_abi. > (ix86_emit_mode_set): Call ix86_expand_avx_vzeroupper > directly. > * config/i386/i386.h (struct GTY(()) machine_function): Delete > has_explicit_vzeroupper. > * config/i386/i386.md (enum unspec): New member > UNSPEC_CALLEE_ABI. > (I386_DEFAULT,I386_VZEROUPPER,I386_UNKNOWN): New > define_constants for insn callee abi index. > * config/i386/predicates.md (vzeroupper_pattern): Adjust. > * config/i386/sse.md (UNSPECV_VZEROUPPER): Deleted. > (avx_vzeroupper): Call ix86_expand_avx_vzeroupper. > (*avx_vzeroupper): Rename to .. > (avx_vzeroupper_callee_abi): .. this, and adjust pattern as > call_insn which has a special vzeroupper ABI. > (*avx_vzeroupper_1): Deleted. > > gcc/testsuite/ChangeLog: > > PR target/82735 > * gcc.target/i386/pr82735-1.c: New test. > * gcc.target/i386/pr82735-2.c: New test. > * gcc.target/i386/pr82735-3.c: New test. > * gcc.target/i386/pr82735-4.c: New test. > * gcc.target/i386/pr82735-5.c: New test. > --- > gcc/config/i386/i386-expand.c | 4 - > gcc/config/i386/i386-features.c | 99 +++ > gcc/config/i386/i386-protos.h | 1 + > gcc/config/i386/i386.c| 55 - > gcc/config/i386/i386.h| 4 - > gcc/config/i386/i386.md | 10 +++ > gcc/config/i386/predicates.md | 5 +- > gcc/config/i386/sse.md| 59 -- > gcc/testsuite/gcc.target/i386/pr82735-1.c | 29 +++ > gcc/testsuite/gcc.target/i386/pr82735-2.c | 22 + > gcc/testsuite/gcc.target/i386/pr82735-3.c | 5 ++ > gcc/testsuite/gcc.target/i386/pr82735-4.c | 48 +++ > gcc/testsuite/gcc.target/i386/pr82735-5.c | 54 + > 13 files changed, 252 insertions(+), 143 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr82735-1.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr82735-2.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr82735-3.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr82735-4.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr82735-5.c > > diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c > index 9f3d41955a2..d25d59aa4e7 100644 > --- a/gcc/config/i386/i386-expand.c > +++ b/gcc/config/i386/i386-expand.c > @@ -13282,10 +13282,6 @@ rdseed_step: > >return 0; > > -case IX86_BUILTIN_VZEROUPPER: > - cfun->machine->has_explicit_vzeroupper = true; > - break; > - > default: >break; > } > diff --git a/gcc/config/i386/i386-features.c b/gcc/config/i386/i386-features.c > index 77783a154b6..a25769ae478 100644 > --- a/gcc/config/i386/i386-features.c > +++ b/gcc/config/i386/i386-features.c > @@ -1768,92 +1768,22 @@ convert_scalars_to_vector (bool timode_p) >return 0; > } > > -/* Modify the vzeroupper pattern in INSN so that it describes the effect > - that the instruction has on the SSE registers. LIVE_REGS are the set > - of registers that are live across the instruction. > - > - For a live register R we use: > - > - (set (reg:V2DF R) (reg:V2DF R)) > - > - which preserves the low 128 bits but clobbers the upper bits. */ > - > -static void > -ix86_add_reg_usage_to_vzeroupper (rtx_insn *insn, bitmap live_regs) > -{ > - rtx pattern = PATTERN (insn); > - unsigned int nregs = TARGET_64BIT ? 16 : 8; > - unsigned int npats = nregs; > - fo
Re: [RFC/PATCH 00/11] Fix up some unexpected empty split conditions
Hi Segher, on 2021/6/3 下午5:18, Segher Boessenkool wrote: > On Thu, Jun 03, 2021 at 03:00:44AM -0500, Segher Boessenkool wrote: >> On Thu, Jun 03, 2021 at 01:22:38PM +0800, Kewen.Lin wrote: >> The whole point of requiring the split condition to start with && is so >> it will become harder to mess things up (it will make the gen* code a >> tiny little bit simpler as well). And there is no transition period or >> anything like that needed either. Just the bunch that will break will >> need fixing. So let's find out how many of those there are :-) >> To find out those need fixing seems to be the critical part. It's not hard to add one explicit "&&" to those that don't have it now, but even with further bootstrapped and regression tested I'm still not confident the adjustments are safe enough, since the testing coverage could be limited. It may need more efforts to revisit, or/and test with more coverages, and port maintainers' reviews. In order to find one example which needs more fixing, for rs6000/i386/ aarch64, I fixed all define_insn_and_splits whose insn cond isn't empty (from gensupport's view since the iterator can add more on) while split cond don't start with "&&" , also skipped those whose insn conds are the same as their split conds. Unfortunately (or fortunately :-\) all were bootstrapped and regress-tested. The related diffs are attached, which is based on r12-0. How many such cases *are* there? There are no users exposed to this, and when the split condition is required to start with "&&" (instead of getting that implied) it is not a silent change ever, either. >>> >>> If I read the proposal right, the explicit "&&" is only required when going >>> to find all potential problematic places for final implied "&&" change. >>> But one explicit "&&" does offer good readability. >> >> My proposal is to always require && (or maybe identical insn and split >> conditions should be allowed as well, if people still use that -- that >> is how we wrote "&& 1" before that existed). > > I prototyped this. There are very many errors. Iterators often modify > the insn condition (for one iteration of it), but that does not work if > the split condition does not start with "&&"! > > See attached prototype. > > Thanks for the prototype! BR, Kewen > Segher > > = = = > > diff --git a/gcc/gensupport.c b/gcc/gensupport.c > index 2cb760ffb90f..05d46fd3775c 100644 > --- a/gcc/gensupport.c > +++ b/gcc/gensupport.c > @@ -590,7 +590,6 @@ process_rtx (rtx desc, file_location loc) > case DEFINE_INSN_AND_SPLIT: > case DEFINE_INSN_AND_REWRITE: >{ > - const char *split_cond; > rtx split; > rtvec attr; > int i; > @@ -611,15 +610,20 @@ process_rtx (rtx desc, file_location loc) > > /* If the split condition starts with "&&", append it to the > insn condition to create the new split condition. */ > - split_cond = XSTR (desc, 4); > - if (split_cond[0] == '&' && split_cond[1] == '&') > + const char *insn_cond = XSTR (desc, 2); > + const char *split_cond = XSTR (desc, 4); > + if (!strncmp (split_cond, "&&", 2)) > { > rtx_reader_ptr->copy_md_ptr_loc (split_cond + 2, split_cond); > - split_cond = rtx_reader_ptr->join_c_conditions (XSTR (desc, 2), > + split_cond = rtx_reader_ptr->join_c_conditions (insn_cond, > split_cond + 2); > + } else if (insn_cond[0]) { > + if (GET_CODE (desc) == DEFINE_INSN_AND_REWRITE) > + error_at (loc, "the rewrite condition must start with `&&'"); > + else > + error_at (loc, "the split condition must start with `&&' [%s]", > insn_cond); > } > - else if (GET_CODE (desc) == DEFINE_INSN_AND_REWRITE) > - error_at (loc, "the rewrite condition must start with `&&'"); > + > XSTR (split, 1) = split_cond; > if (GET_CODE (desc) == DEFINE_INSN_AND_REWRITE) > XVEC (split, 2) = gen_rewrite_sequence (XVEC (desc, 1)); > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index abfd845..86869d9 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -1283,7 +1283,7 @@ (define_insn_and_split "*movsi_aarch64" fmov\\t%w0, %s1 fmov\\t%s0, %s1 * return aarch64_output_scalar_simd_mov_immediate (operands[1], SImode);" - "CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL (operands[1]), SImode) + "&& CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL (operands[1]), SImode) && REG_P (operands[0]) && GP_REGNUM_P (REGNO (operands[0]))" [(const_int 0)] "{ @@ -1319,7 +1319,7 @@ (define_insn_and_split "*movdi_aarch64" fmov\\t%x0, %d1 fmov\\t%d0, %d1 * return aarch64_output_scalar_simd_mov_immediate (operands[1], DImode);" - "(CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL (operands[1]), DImode)) + "&& (CONST_INT_P (operands[1]) && !aarch64_move_imm (
Re: [PATCH 04/11] cris: Update unexpected empty split condition
on 2021/6/4 上午12:12, Hans-Peter Nilsson wrote: >> From: Kewen.Lin >> Date: Thu, 3 Jun 2021 07:45:57 +0200 > >> on 2021/6/2 Hans-Peter Nilsson wrote: From: Kewen Lin Date: Wed, 2 Jun 2021 07:04:54 +0200 >>> gcc/ChangeLog: * config/cris/cris.md (*addi_reload): Fix empty split condition. > - "" + "&& 1" > >>> Ok, thanks, if only for all-round consistency. >>> >>> In preparation for a warning for an empty condition? I'm >>> usually all for .md-warnings, but I'm not sure about the >>> benefit of that one, though. Those "&& 1" look...hackish. >> >> Thanks! Yeah, the 01/11 patch aims to raise one error message >> for the define_insn_and_split whose split condition is empty >> while insn condition isn't. In most cases, when we write one >> define_insn_and_split we want the splitting only to take effect >> while we see the define_insn matching happen (insn cond holds), >> but if we leave the split condition empty, the splitting will >> be done always, it could result in some unexpected consequence. >> Mostly this is unintentional. > > It certainly was in the patch above! > >> The error message is to avoid >> people to make it unintentionally. >> >> As you may have seen from the discussion under the 00/11 thread, >> we will probably end up with some other solution, so I will hold >> the changes for the ports, sorry for wasting your time and the >> other port maintainers'. > > No worries: I certainly don't consider it wasted and I'd > prefer to have the patch above committed sooner than the > conclusion of that discussion. (If you don't get to it, > I'll do it, after a round of testing.) > Thanks for your help on testing! > If you're considering further target patches to adjust for > eventually changed semantics in the define_insn_and_split > split-condition, then whatever trivial patch to cris.md that > gets the effect of the one you sent is preapproved. > OK, thanks again! BR, Kewen
Re: [RFC/PATCH 00/11] Fix up some unexpected empty split conditions
on 2021/6/3 下午4:05, Richard Sandiford wrote: > "Kewen.Lin" writes: >> Hi Richi/Richard/Jeff/Segher, >> >> Thanks for the comments! >> >> on 2021/6/3 锟斤拷锟斤拷7:52, Segher Boessenkool wrote: >>> On Wed, Jun 02, 2021 at 06:32:13PM +0100, Richard Sandiford wrote: Richard Biener writes: > So what Richard suggests would be to disallow split conditions > that do not start with "&& ", it's probably easy to do that as well > and look for build fails. That should catch all cases to look at. Yeah. As a strawman proposal, how about: - add a new "define_independent_insn_and_split" that has the current semantics of define_insn_and_split. This should be mechanical. >>> >>> I'd rather not have that -- we can just write separate define_insn and >>> define_split in that case. >>> >> >> Not sure if someone would argue that he/she would like to go with one shared >> pattern as before, to avoid any possible differences between two seperated >> patterns and have good maintainability (like only editing on place) and >> slightly better efficiency. > > Right. Plus it creates less make-work. If we didn't have it, someone > would need to split the define_insn_and_splits that don't currently > use "&&", then later someone might decide that the missing "&&" was a > mistake and need to put them together again (or just revert the patch > and edit from there, I guess). > > Plus define_independent_insn_and_split would act as a flag for something > that might be suspect. If we split them then the define_split condition > will seem to have been chosen deliberately in isolation. > >>> How many such cases *are* there? There are no users exposed to this, >>> and when the split condition is required to start with "&&" (instead of >>> getting that implied) it is not a silent change ever, either. >>> >> >> If I read the proposal right, the explicit "&&" is only required when going >> to find all potential problematic places for final implied "&&" change. >> But one explicit "&&" does offer good readability. > > I don't know. "&& 1" looks kind of weird to me. > > One thing I'd been wondering about a while ago was whether we should key > the split part of define_insn_and_splits off the insn code, instead of > repeating the pattern match and insn C condition. That would make the > split apply only to the associated define_insns, whereas at the moment > they also apply to any earlier (less general) define_insn that happens > to match the pattern and the C conditions. It would also reduce the > complexity of the autogenerated define_split logic. > > I don't know whether that's a good idea or not. But having an explicit > "&&" implies that the generator shouldn't do that, and that it should > retest the insn condition from scratch. > - find the define_insn_and_splits that are missing the "&&", and where missing the "&&" might make a difference. Change them to define_independent_insn_and_splits. Like Richard says, this can be done by temporarily disallowing define_insn_and_splits that have no "&&". >>> >>> If we make that change permanently, that is all steps we ever need! >>> >> >> So the question is that: whether we need to demand an explicit "&&". >> Richard's proposal is for answer "no" which aligns with Richi's auto >> filling advice before. I think it would result in fewer changes since >> those places without explicit "&&" are mostly unintentional, all the jobs >> are done by implied "&&". Its downside seems to be bad readability, new >> readers may take it as two seperated conditions at first glance, but I >> guess if we emphasize this change in the document it would be fine? >> Or emitting one warning if missing an explicit "&&"? > > IMO the natural way to read it is that the split C condition gives the > conditions under which the instruction should be split. I think that's > why forgetting the "&&" is such a common mistake. (I know I've done it > plenty of times.) > > IMO requiring the "&&" is baking in an alternative, less intuitive, > interpretation. > Thanks for the explanation, I was thinking people may have got used to the starting "&&" in split condition, so it's easy for them to read. But I agree it's better not to have it in the natural way. BR, Kewen
[PATCH] c++: tsubst_function_decl and excess arg levels [PR100102]
Here, when instantiating the dependent alias template duration::__is_harmonic with args={{T,U},{int}}, we find ourselves substituting the function decl _S_gcd. Since we have more arg levels than _S_gcd has parm levels, an old special case in tsubst_function_decl causes us to unwantedly reduce args to its innermost level, yielding args={int}, which leads to a nonsensical substitution into the decl's context and an eventual crash. The comment for this special case refers to three examples for which we ought to see more arg levels than parm levels here, but none of the examples actually demonstrate this. In the first example, when defining S::f(U) parms_depth is 2 and args_depth is 1, and later when instantiating say S::f both depths are 2. In the second example, when substituting the template friend declaration parms_depth is 2 and args_depth is 1, and later when instantiating f both depths are 1. Finally, the third example is invalid since we can't specialize a member template of an unspecialized class template like that. Given that this reduction code seems no longer relevant for its documented purpose and that it causes problems as in the PR, this patch just removes it. Note that as far as bootstrap/regtest is concerned, this code is dead; the below two tests would be the first to trigger the removed code. Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk and perhaps backports? Also tested on various other libraries, e.g. range-v3 and cmcstl2. PR c++/100102 gcc/cp/ChangeLog: * pt.c (tsubst_function_decl): Remove old code for reducing args when it has excess levels. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/alias-decl-72.C: New test. * g++.dg/cpp0x/alias-decl-72a.C: New test. --- gcc/cp/pt.c | 39 - gcc/testsuite/g++.dg/cpp0x/alias-decl-72.C | 9 + gcc/testsuite/g++.dg/cpp0x/alias-decl-72a.C | 9 + 3 files changed, 18 insertions(+), 39 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp0x/alias-decl-72.C create mode 100644 gcc/testsuite/g++.dg/cpp0x/alias-decl-72a.C diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 3cac073ed50..a6acdf864d1 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -13909,45 +13909,6 @@ tsubst_function_decl (tree t, tree args, tsubst_flags_t complain, if (tree spec = retrieve_specialization (gen_tmpl, argvec, hash)) return spec; } - - /* We can see more levels of arguments than parameters if -there was a specialization of a member template, like -this: - -template struct S { template void f(); } -template <> template void S::f(U); - -Here, we'll be substituting into the specialization, -because that's where we can find the code we actually -want to generate, but we'll have enough arguments for -the most general template. - -We also deal with the peculiar case: - -template struct S { - template friend void f(); -}; -template void f() {} -template S; -template void f(); - -Here, the ARGS for the instantiation of will be {int, -double}. But, we only need as many ARGS as there are -levels of template parameters in CODE_PATTERN. We are -careful not to get fooled into reducing the ARGS in -situations like: - -template struct S { template void f(U); } -template template <> void S::f(int) {} - -which we can spot because the pattern will be a -specialization in this case. */ - int args_depth = TMPL_ARGS_DEPTH (args); - int parms_depth = - TMPL_PARMS_DEPTH (DECL_TEMPLATE_PARMS (DECL_TI_TEMPLATE (t))); - - if (args_depth > parms_depth && !DECL_TEMPLATE_SPECIALIZATION (t)) - args = get_innermost_template_args (args, parms_depth); } else { diff --git a/gcc/testsuite/g++.dg/cpp0x/alias-decl-72.C b/gcc/testsuite/g++.dg/cpp0x/alias-decl-72.C new file mode 100644 index 000..8009756dcba --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/alias-decl-72.C @@ -0,0 +1,9 @@ +// PR c++/100102 +// { dg-do compile { target c++11 } } + +template struct ratio; +template struct duration { + static constexpr int _S_gcd(); + template using __is_harmonic = ratio<_S_gcd>; + using type = __is_harmonic; +}; diff --git a/gcc/testsuite/g++.dg/cpp0x/alias-decl-72a.C b/gcc/testsuite/g++.dg/cpp0x/alias-decl-72a.C new file mode 100644 index 000..a4443e18f9d --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/alias-decl-72a.C @@ -0,0 +1,9 @@ +// PR c++/100102 +// { dg-do compile { target c++11 } } + +template struct ratio; +template struct duration { + static constexpr int _S_gcd(); + template using __is_harmonic = ratio<(duration::_S_gcd)()>; + using type = __is_harmonic; +}; -- 2.32.0.rc2
Re: [PATCH] Simplify (view_convert ~a) < 0 to (view_convert a) >= 0 [PR middle-end/100738]
On Tue, Jun 1, 2021 at 6:17 PM Marc Glisse wrote: > > On Tue, 1 Jun 2021, Hongtao Liu via Gcc-patches wrote: > > > Hi: > > This patch is about to simplify (view_convert:type ~a) < 0 to > > (view_convert:type a) >= 0 when type is signed integer. Similar for > > (view_convert:type ~a) >= 0. > > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. > > Ok for the trunk? > > > > gcc/ChangeLog: > > > >PR middle-end/100738 > >* match.pd ((view_convert ~a) < 0 --> (view_convert a) >= 0, > >(view_convert ~a) >= 0 --> (view_convert a) < 0): New GIMPLE > >simplification. > > We already have > > /* Fold ~X op C as X op' ~C, where op' is the swapped comparison. */ > (for cmp (simple_comparison) > scmp (swapped_simple_comparison) > (simplify >(cmp (bit_not@2 @0) CONSTANT_CLASS_P@1) >(if (single_use (@2) > && (TREE_CODE (@1) == INTEGER_CST || TREE_CODE (@1) == VECTOR_CST)) > (scmp @0 (bit_not @1) > > Would it make sense to try and generalize it a bit, say with > > (cmp (nop_convert1? (bit_not @0)) CONSTANT_CLASS_P) > > (scmp (view_convert:XXX @0) (bit_not @1)) > Thanks for your advice, it looks great. And can I use *view_convert1?* instead of *nop_convert1?* here, because the original case is view_convert, and nop_convert would fail to simplify the case. > (I still believe that it is a bad idea that SSA_NAMEs are strongly typed, > encoding the type in operations would be more convenient, but I think the > time for that choice has long gone) > > -- > Marc Glisse -- BR, Hongtao
Re: [PATCH 1/2] [i386] Fold blendv builtins into gimple.
ping On Mon, May 24, 2021 at 12:56 PM Hongtao Liu wrote: > > Hi: > This patch is about to Fold __builtin_ia32_pblendvb128 (a, b, c) as > VEC_COND_EXPR (c < 0, b, a), similar for float version but with > mask operand VIEW_CONVERT_EXPR to same sized integer vectype. > > After folding, blendv related patterns can be redefined as > vec_merge since all elements of mask operand is either const0_rtx or > constm1_rtx now. It could potentially enable more rtl optimizations. > > Besides, although there's no pblendv{d,q} instructions, backend can > still define their patterns and generate blendv{ps,pd} instead. > > Bootstrap and regtested on x86_64-linux-gnu{-m32,}. > Ok for trunk? > > gcc/ChangeLog: > > * config/i386/i386-builtin.def (IX86_BUILTIN_BLENDVPD256, > IX86_BUILTIN_BLENDVPS256, IX86_BUILTIN_PBLENDVB256, > IX86_BUILTIN_BLENDVPD, IX86_BUILTIN_BLENDVPS, > IX86_BUILTIN_PBLENDVB128): Replace icode with > CODE_FOR_nothing. > * config/i386/i386-expand.c (ix86_expand_sse_movcc): Use > > gen_avx_blendvd256/gen_avx_blendvq256/gen_sse4_1_blendvd/gen_sse4_1_blendvq > for V8SI/V4DI/V4SI/V2DImode. > * config/i386/i386.c (ix86_gimple_fold_builtin): Fold blendv > builtins. > * config/i386/mmx.md (mmx_blendvps): Change to define_expand. > (*mmx_blendvps): New pattern implemented as vec_merge. > * config/i386/sse.md > (_blendv): Change to > define_expand. > (_pblendvb): Ditto. > (*_blendv): New pattern > implemented as vec_merge. > (*_pblendvb): Ditto. > (*_pblendvb_lt): Redefined as define_insn with > pattern implemented as vec_merge instead of UNSPEC_BLENDV. > (*_blendv_lt): Ditto, > and extend mode to V48_AVX. > (*_pblendvb_not_lt): New. > (*_blendv_ltint): Deleted. > (*_pblendvb_lt): Ditto. > (*_pblendvb_not_lt): Ditto. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/funcspec-8.c: Replace > __builtin_ia32_blendvpd with __builtin_ia32_roundps_az. > * gcc.target/i386/blendv-1.c: New test. > * gcc.target/i386/blendv-2.c: New test. > > > -- > BR, > Hongtao -- BR, Hongtao
Re: [PATCH] x86: Convert CONST_WIDE_INT to broadcast in move expanders
On Fri, Jun 4, 2021 at 12:39 AM H.J. Lu wrote: > > On Thu, Jun 3, 2021 at 12:39 AM Uros Bizjak wrote: > > > > On Thu, Jun 3, 2021 at 5:49 AM H.J. Lu wrote: > > > > > > Update move expanders to convert the CONST_WIDE_INT operand to vector > > > broadcast from a byte with AVX2. Add ix86_gen_scratch_sse_rtx to > > > return a scratch SSE register which won't increase stack alignment > > > requirement and blocks transformation by the combine pass. > > > > Using fixed scratch reg is just too hackish for my taste. The > > It was recommended to use hard register for things like this: > > https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569945.html I was worried that the temporary (even hard reg) will be spilled under some rare cases. If this is not the case, and even recommended for your use case, then it should be OK. Maybe use a hard register instead of match_scratch in the clobber of the proposed insn pattern below. > > expansion is OK to emit some optimized sequence, but the approach to > > use fixed reg to bypass stack alignment functionality and combine is > > not. > > > > Perhaps a new insn pattern should be introduced, e.g. > > > > (define_insn_and_split "" > >[(set (match_opreand:V 0 "memory_operand" "=m,m") > > (vec_duplicate:V > > (match_operand:S 2 "reg_or_0_operand" "r,C")) > > (clobber (match_scratch:V 1 "=x"))] > > > > and split it at some appropriate point. Please note that zero (C) can be assigned directly to the XMM register, so there is no need for the intermediate integer reg, assuming that zero was propagated into the insn (-1 can be handled this way, too). Due to these propagations, it looks that the correct point to split the insn is after the reload. On a related note, you are using only vpbroadcastb, assuming byte (QImode) granularity. Is it possible to also use HImode and larger modes to handle e.g. initializations of int arrays? Uros. > I will give it a try. > > Thanks. > > > Uros. > > > > > > > > A small benchmark: > > > > > > https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/memset/broadcast > > > > > > shows that broadcast is a little bit faster on Intel Core i7-8559U: > > > > > > $ make > > > gcc -g -I. -O2 -c -o test.o test.c > > > gcc -g -c -o memory.o memory.S > > > gcc -g -c -o broadcast.o broadcast.S > > > gcc -o test test.o memory.o broadcast.o > > > ./test > > > memory : 99333 > > > broadcast: 97208 > > > $ > > > > > > broadcast is also smaller: > > > > > > $ size memory.o broadcast.o > > >textdata bss dec hex filename > > > 132 0 0 132 84 memory.o > > > 122 0 0 122 7a broadcast.o > > > $ > > > > > > gcc/ > > > > > > PR target/100865 > > > * config/i386/i386-expand.c (ix86_expand_vector_init_duplicate): > > > New prototype. > > > (ix86_byte_broadcast): New function. > > > (ix86_convert_const_wide_int_to_broadcast): Likewise. > > > (ix86_expand_move): Try ix86_convert_const_wide_int_to_broadcast > > > if mode size is 16 bytes or bigger. > > > (ix86_expand_vector_move): Try > > > ix86_convert_const_wide_int_to_broadcast. > > > * config/i386/i386-protos.h (ix86_gen_scratch_sse_rtx): New > > > prototype. > > > * config/i386/i386.c (ix86_minimum_incoming_stack_boundary): Add > > > an argument to ignore stack_alignment_estimated. It is passed > > > as false by default. > > > (ix86_gen_scratch_sse_rtx): New function. > > > > > > gcc/testsuite/ > > > > > > PR target/100865 > > > * gcc.target/i386/pr100865-1.c: New test. > > > * gcc.target/i386/pr100865-2.c: Likewise. > > > * gcc.target/i386/pr100865-3.c: Likewise. > > > * gcc.target/i386/pr100865-4.c: Likewise. > > > * gcc.target/i386/pr100865-5.c: Likewise. > > > --- > > > gcc/config/i386/i386-expand.c | 103 ++--- > > > gcc/config/i386/i386-protos.h | 2 + > > > gcc/config/i386/i386.c | 50 +- > > > gcc/testsuite/gcc.target/i386/pr100865-1.c | 13 +++ > > > gcc/testsuite/gcc.target/i386/pr100865-2.c | 14 +++ > > > gcc/testsuite/gcc.target/i386/pr100865-3.c | 15 +++ > > > gcc/testsuite/gcc.target/i386/pr100865-4.c | 16 > > > gcc/testsuite/gcc.target/i386/pr100865-5.c | 17 > > > 8 files changed, 215 insertions(+), 15 deletions(-) > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-1.c > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-2.c > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-3.c > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-4.c > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-5.c > > > > > > diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c > > > index 4185f58eed5..658adafa269 100644 > > > --- a/gcc/config/i386/i386-expand.c > > > +++ b/gcc/config/i386/
Re: [PATCH 2/2] Fix _mm256_zeroupper by representing the instructions as call_insns in which the call has a special vzeroupper ABI.
On Thu, Jun 3, 2021 at 8:54 AM liuhongt wrote: > > When __builtin_ia32_vzeroupper is called explicitly, the corresponding > vzeroupper pattern does not carry any CLOBBERS or SETs before LRA, > which leads to incorrect optimization in pass_reload. In order to > solve this problem, this patch refine instructions as call_insns in > which the call has a special vzeroupper ABI. > > gcc/ChangeLog: > > PR target/82735 > * config/i386/i386-expand.c (ix86_expand_builtin): Remove > assignment of cfun->machine->has_explicit_vzeroupper. > * config/i386/i386-features.c > (ix86_add_reg_usage_to_vzerouppers): Delete. > (ix86_add_reg_usage_to_vzeroupper): Ditto. > (rest_of_handle_insert_vzeroupper): Remove > ix86_add_reg_usage_to_vzerouppers, add df_analyze at the end > of the function. > (gate): Remove cfun->machine->has_explicit_vzeroupper. > * config/i386/i386-protos.h (ix86_expand_avx_vzeroupper): > Declared. > * config/i386/i386.c (ix86_insn_callee_abi): New function. > (ix86_initialize_callee_abi): Ditto. > (ix86_expand_avx_vzeroupper): Ditto. > (ix86_hard_regno_call_part_clobbered): Adjust for vzeroupper > ABI. > (TARGET_INSN_CALLEE_ABI): Define as ix86_insn_callee_abi. > (ix86_emit_mode_set): Call ix86_expand_avx_vzeroupper > directly. > * config/i386/i386.h (struct GTY(()) machine_function): Delete > has_explicit_vzeroupper. > * config/i386/i386.md (enum unspec): New member > UNSPEC_CALLEE_ABI. > (I386_DEFAULT,I386_VZEROUPPER,I386_UNKNOWN): New > define_constants for insn callee abi index. > * config/i386/predicates.md (vzeroupper_pattern): Adjust. > * config/i386/sse.md (UNSPECV_VZEROUPPER): Deleted. > (avx_vzeroupper): Call ix86_expand_avx_vzeroupper. > (*avx_vzeroupper): Rename to .. > (avx_vzeroupper_callee_abi): .. this, and adjust pattern as > call_insn which has a special vzeroupper ABI. > (*avx_vzeroupper_1): Deleted. > > gcc/testsuite/ChangeLog: > > PR target/82735 > * gcc.target/i386/pr82735-1.c: New test. > * gcc.target/i386/pr82735-2.c: New test. > * gcc.target/i386/pr82735-3.c: New test. > * gcc.target/i386/pr82735-4.c: New test. > * gcc.target/i386/pr82735-5.c: New test. LGTM, with a small nit below. Thanks, Uros. > --- > gcc/config/i386/i386-expand.c | 4 - > gcc/config/i386/i386-features.c | 99 +++ > gcc/config/i386/i386-protos.h | 1 + > gcc/config/i386/i386.c| 55 - > gcc/config/i386/i386.h| 4 - > gcc/config/i386/i386.md | 10 +++ > gcc/config/i386/predicates.md | 5 +- > gcc/config/i386/sse.md| 59 -- > gcc/testsuite/gcc.target/i386/pr82735-1.c | 29 +++ > gcc/testsuite/gcc.target/i386/pr82735-2.c | 22 + > gcc/testsuite/gcc.target/i386/pr82735-3.c | 5 ++ > gcc/testsuite/gcc.target/i386/pr82735-4.c | 48 +++ > gcc/testsuite/gcc.target/i386/pr82735-5.c | 54 + > 13 files changed, 252 insertions(+), 143 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr82735-1.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr82735-2.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr82735-3.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr82735-4.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr82735-5.c > > diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c > index 9f3d41955a2..d25d59aa4e7 100644 > --- a/gcc/config/i386/i386-expand.c > +++ b/gcc/config/i386/i386-expand.c > @@ -13282,10 +13282,6 @@ rdseed_step: > >return 0; > > -case IX86_BUILTIN_VZEROUPPER: > - cfun->machine->has_explicit_vzeroupper = true; > - break; > - > default: >break; > } > diff --git a/gcc/config/i386/i386-features.c b/gcc/config/i386/i386-features.c > index 77783a154b6..a25769ae478 100644 > --- a/gcc/config/i386/i386-features.c > +++ b/gcc/config/i386/i386-features.c > @@ -1768,92 +1768,22 @@ convert_scalars_to_vector (bool timode_p) >return 0; > } > > -/* Modify the vzeroupper pattern in INSN so that it describes the effect > - that the instruction has on the SSE registers. LIVE_REGS are the set > - of registers that are live across the instruction. > - > - For a live register R we use: > - > - (set (reg:V2DF R) (reg:V2DF R)) > - > - which preserves the low 128 bits but clobbers the upper bits. */ > - > -static void > -ix86_add_reg_usage_to_vzeroupper (rtx_insn *insn, bitmap live_regs) > -{ > - rtx pattern = PATTERN (insn); > - unsigned int nregs = TARGET_64BIT ? 16 : 8; > - unsigned int npats = nregs; > - for (unsigned int i = 0; i < nregs; ++i) > -{ > - unsigned int regno = GET_SSE_REGNO
Re: [PATCH 2/2] Fix _mm256_zeroupper by representing the instructions as call_insns in which the call has a special vzeroupper ABI.
On Fri, Jun 4, 2021 at 2:27 PM Uros Bizjak via Gcc-patches wrote: > > On Thu, Jun 3, 2021 at 8:54 AM liuhongt wrote: > > > > When __builtin_ia32_vzeroupper is called explicitly, the corresponding > > vzeroupper pattern does not carry any CLOBBERS or SETs before LRA, > > which leads to incorrect optimization in pass_reload. In order to > > solve this problem, this patch refine instructions as call_insns in > > which the call has a special vzeroupper ABI. > > > > gcc/ChangeLog: > > > > PR target/82735 > > * config/i386/i386-expand.c (ix86_expand_builtin): Remove > > assignment of cfun->machine->has_explicit_vzeroupper. > > * config/i386/i386-features.c > > (ix86_add_reg_usage_to_vzerouppers): Delete. > > (ix86_add_reg_usage_to_vzeroupper): Ditto. > > (rest_of_handle_insert_vzeroupper): Remove > > ix86_add_reg_usage_to_vzerouppers, add df_analyze at the end > > of the function. > > (gate): Remove cfun->machine->has_explicit_vzeroupper. > > * config/i386/i386-protos.h (ix86_expand_avx_vzeroupper): > > Declared. > > * config/i386/i386.c (ix86_insn_callee_abi): New function. > > (ix86_initialize_callee_abi): Ditto. > > (ix86_expand_avx_vzeroupper): Ditto. > > (ix86_hard_regno_call_part_clobbered): Adjust for vzeroupper > > ABI. > > (TARGET_INSN_CALLEE_ABI): Define as ix86_insn_callee_abi. > > (ix86_emit_mode_set): Call ix86_expand_avx_vzeroupper > > directly. > > * config/i386/i386.h (struct GTY(()) machine_function): Delete > > has_explicit_vzeroupper. > > * config/i386/i386.md (enum unspec): New member > > UNSPEC_CALLEE_ABI. > > (I386_DEFAULT,I386_VZEROUPPER,I386_UNKNOWN): New > > define_constants for insn callee abi index. > > * config/i386/predicates.md (vzeroupper_pattern): Adjust. > > * config/i386/sse.md (UNSPECV_VZEROUPPER): Deleted. > > (avx_vzeroupper): Call ix86_expand_avx_vzeroupper. > > (*avx_vzeroupper): Rename to .. > > (avx_vzeroupper_callee_abi): .. this, and adjust pattern as > > call_insn which has a special vzeroupper ABI. > > (*avx_vzeroupper_1): Deleted. > > > > gcc/testsuite/ChangeLog: > > > > PR target/82735 > > * gcc.target/i386/pr82735-1.c: New test. > > * gcc.target/i386/pr82735-2.c: New test. > > * gcc.target/i386/pr82735-3.c: New test. > > * gcc.target/i386/pr82735-4.c: New test. > > * gcc.target/i386/pr82735-5.c: New test. > > LGTM, with a small nit below. > > Thanks, > Uros. > > > --- > > gcc/config/i386/i386-expand.c | 4 - > > gcc/config/i386/i386-features.c | 99 +++ > > gcc/config/i386/i386-protos.h | 1 + > > gcc/config/i386/i386.c| 55 - > > gcc/config/i386/i386.h| 4 - > > gcc/config/i386/i386.md | 10 +++ > > gcc/config/i386/predicates.md | 5 +- > > gcc/config/i386/sse.md| 59 -- > > gcc/testsuite/gcc.target/i386/pr82735-1.c | 29 +++ > > gcc/testsuite/gcc.target/i386/pr82735-2.c | 22 + > > gcc/testsuite/gcc.target/i386/pr82735-3.c | 5 ++ > > gcc/testsuite/gcc.target/i386/pr82735-4.c | 48 +++ > > gcc/testsuite/gcc.target/i386/pr82735-5.c | 54 + > > 13 files changed, 252 insertions(+), 143 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/i386/pr82735-1.c > > create mode 100644 gcc/testsuite/gcc.target/i386/pr82735-2.c > > create mode 100644 gcc/testsuite/gcc.target/i386/pr82735-3.c > > create mode 100644 gcc/testsuite/gcc.target/i386/pr82735-4.c > > create mode 100644 gcc/testsuite/gcc.target/i386/pr82735-5.c > > > > diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c > > index 9f3d41955a2..d25d59aa4e7 100644 > > --- a/gcc/config/i386/i386-expand.c > > +++ b/gcc/config/i386/i386-expand.c > > @@ -13282,10 +13282,6 @@ rdseed_step: > > > >return 0; > > > > -case IX86_BUILTIN_VZEROUPPER: > > - cfun->machine->has_explicit_vzeroupper = true; > > - break; > > - > > default: > >break; > > } > > diff --git a/gcc/config/i386/i386-features.c > > b/gcc/config/i386/i386-features.c > > index 77783a154b6..a25769ae478 100644 > > --- a/gcc/config/i386/i386-features.c > > +++ b/gcc/config/i386/i386-features.c > > @@ -1768,92 +1768,22 @@ convert_scalars_to_vector (bool timode_p) > >return 0; > > } > > > > -/* Modify the vzeroupper pattern in INSN so that it describes the effect > > - that the instruction has on the SSE registers. LIVE_REGS are the set > > - of registers that are live across the instruction. > > - > > - For a live register R we use: > > - > > - (set (reg:V2DF R) (reg:V2DF R)) > > - > > - which preserves the low 128 bits but clobbers the upper bits. */ > > - > > -static