[PATCH] [i386] Revert x86_order_regs_for_local_alloc changes in r12-1669.
Still put general regs as first alloca order. This should fix 2 failures introduced by r12-1669, also add xfail to new failed testcases to temporarily avoid regression, eventually xfail should be removed. compare_test log on non-avx512 target Tests that now fail, but worked before (6 tests): gcc.target/i386/bitwise_mask_op-3.c scan-assembler-times kmovb[\t ] 4 gcc.target/i386/bitwise_mask_op-3.c scan-assembler-times korb[\t ] 1 gcc.target/i386/bitwise_mask_op-3.c scan-assembler-times kxorb[\t ] 1 unix/-m32: gcc.target/i386/bitwise_mask_op-3.c scan-assembler-times kmovb[\t ] 4 unix/-m32: gcc.target/i386/bitwise_mask_op-3.c scan-assembler-times korb[\t ] 1 unix/-m32: gcc.target/i386/bitwise_mask_op-3.c scan-assembler-times kxorb[\t ] 1 Tests that now work, but didn't before (2 tests): unix/-m32: gcc.target/i386/avx512bw-pr70329-1.c execution test unix/-m32: gcc.target/i386/pr96814.c execution test Bootstrap is ok, so is regression test on x86-64-linux-gnu{-m32,}. Ok for trunk? gcc/ChangeLog: PR target/101185 * config/i386/i386.c (x86_order_regs_for_local_alloc): Revert r12-1669. gcc/testsuite/ChangeLog PR target/101185 * gcc.target/i386/bitwise_mask_op-3.c: Add xfail to temporarily avoid regression, eventually xfail should be removed. --- gcc/config/i386/i386.c| 13 - gcc/testsuite/gcc.target/i386/bitwise_mask_op-3.c | 6 +++--- 2 files changed, 7 insertions(+), 12 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 3d5883b8d0e..c71c9e666a4 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -20476,15 +20476,6 @@ x86_order_regs_for_local_alloc (void) int pos = 0; int i; - /* When allocano cost of GENERAL_REGS is same as MASK_REGS, allocate - MASK_REGS first since it has already been disparaged. This is for - testcase bitwise_mask_op3.c where the input is allocated as mask - registers, then mask bitwise instructions should be used there. - Refer to pr101142. */ - /* Mask register. */ - for (i = FIRST_MASK_REG; i <= LAST_MASK_REG; i++) - reg_alloc_order [pos++] = i; - /* First allocate the local general purpose registers. */ for (i = 0; i < FIRST_PSEUDO_REGISTER; i++) if (GENERAL_REGNO_P (i) && call_used_or_fixed_reg_p (i)) @@ -20511,6 +20502,10 @@ x86_order_regs_for_local_alloc (void) for (i = FIRST_EXT_REX_SSE_REG; i <= LAST_EXT_REX_SSE_REG; i++) reg_alloc_order [pos++] = i; + /* Mask register. */ + for (i = FIRST_MASK_REG; i <= LAST_MASK_REG; i++) + reg_alloc_order [pos++] = i; + /* x87 registers. */ if (TARGET_SSE_MATH) for (i = FIRST_STACK_REG; i <= LAST_STACK_REG; i++) diff --git a/gcc/testsuite/gcc.target/i386/bitwise_mask_op-3.c b/gcc/testsuite/gcc.target/i386/bitwise_mask_op-3.c index 4a9078615aa..352c49d6c6b 100644 --- a/gcc/testsuite/gcc.target/i386/bitwise_mask_op-3.c +++ b/gcc/testsuite/gcc.target/i386/bitwise_mask_op-3.c @@ -12,7 +12,7 @@ foo_orb (__m512i a, __m512i b) foo = m1 | m2; } -/* { dg-final { scan-assembler-times "korb\[\t \]" "1" } } */ +/* { dg-final { scan-assembler-times "korb\[\t \]" "1" { xfail *-*-* } } } */ void foo_xorb (__m512i a, __m512i b) @@ -22,7 +22,7 @@ foo_xorb (__m512i a, __m512i b) foo = m1 ^ m2; } -/* { dg-final { scan-assembler-times "kxorb\[\t \]" "1" } } */ +/* { dg-final { scan-assembler-times "kxorb\[\t \]" "1" { xfail *-*-* } } } */ void foo_andb (__m512i a, __m512i b) @@ -40,4 +40,4 @@ foo_andnb (__m512i a, __m512i b) foo = m1 & ~m2; } -/* { dg-final { scan-assembler-times "kmovb\[\t \]" "4"} } */ +/* { dg-final { scan-assembler-times "kmovb\[\t \]" "4" { xfail *-*-* } } } */ -- 2.27.0
Re: [PATCH] [i386] Revert x86_order_regs_for_local_alloc changes in r12-1669.
On Thu, Jun 24, 2021 at 10:44 AM liuhongt wrote: > > Still put general regs as first alloca order. > > This should fix 2 failures introduced by r12-1669, also add xfail to new > failed testcases to temporarily avoid regression, eventually xfail should > be removed. > > compare_test log on non-avx512 target > > Tests that now fail, but worked before (6 tests): > > gcc.target/i386/bitwise_mask_op-3.c scan-assembler-times kmovb[\t ] 4 > gcc.target/i386/bitwise_mask_op-3.c scan-assembler-times korb[\t ] 1 > gcc.target/i386/bitwise_mask_op-3.c scan-assembler-times kxorb[\t ] 1 > unix/-m32: gcc.target/i386/bitwise_mask_op-3.c scan-assembler-times kmovb[\t > ] 4 > unix/-m32: gcc.target/i386/bitwise_mask_op-3.c scan-assembler-times korb[\t ] > 1 > unix/-m32: gcc.target/i386/bitwise_mask_op-3.c scan-assembler-times kxorb[\t > ] 1 > > Tests that now work, but didn't before (2 tests): > > unix/-m32: gcc.target/i386/avx512bw-pr70329-1.c execution test > unix/-m32: gcc.target/i386/pr96814.c execution test > > Bootstrap is ok, so is regression test on x86-64-linux-gnu{-m32,}. > Ok for trunk? Yes, let's start improvements from non-regressed state. Thanks, Uros. > gcc/ChangeLog: > > PR target/101185 > * config/i386/i386.c (x86_order_regs_for_local_alloc): > Revert r12-1669. > > gcc/testsuite/ChangeLog > > PR target/101185 > * gcc.target/i386/bitwise_mask_op-3.c: Add xfail to > temporarily avoid regression, eventually xfail should be > removed. > --- > gcc/config/i386/i386.c| 13 - > gcc/testsuite/gcc.target/i386/bitwise_mask_op-3.c | 6 +++--- > 2 files changed, 7 insertions(+), 12 deletions(-) > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index 3d5883b8d0e..c71c9e666a4 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -20476,15 +20476,6 @@ x86_order_regs_for_local_alloc (void) > int pos = 0; > int i; > > - /* When allocano cost of GENERAL_REGS is same as MASK_REGS, allocate > - MASK_REGS first since it has already been disparaged. This is for > - testcase bitwise_mask_op3.c where the input is allocated as mask > - registers, then mask bitwise instructions should be used there. > - Refer to pr101142. */ > - /* Mask register. */ > - for (i = FIRST_MASK_REG; i <= LAST_MASK_REG; i++) > - reg_alloc_order [pos++] = i; > - > /* First allocate the local general purpose registers. */ > for (i = 0; i < FIRST_PSEUDO_REGISTER; i++) > if (GENERAL_REGNO_P (i) && call_used_or_fixed_reg_p (i)) > @@ -20511,6 +20502,10 @@ x86_order_regs_for_local_alloc (void) > for (i = FIRST_EXT_REX_SSE_REG; i <= LAST_EXT_REX_SSE_REG; i++) > reg_alloc_order [pos++] = i; > > + /* Mask register. */ > + for (i = FIRST_MASK_REG; i <= LAST_MASK_REG; i++) > + reg_alloc_order [pos++] = i; > + > /* x87 registers. */ > if (TARGET_SSE_MATH) > for (i = FIRST_STACK_REG; i <= LAST_STACK_REG; i++) > diff --git a/gcc/testsuite/gcc.target/i386/bitwise_mask_op-3.c > b/gcc/testsuite/gcc.target/i386/bitwise_mask_op-3.c > index 4a9078615aa..352c49d6c6b 100644 > --- a/gcc/testsuite/gcc.target/i386/bitwise_mask_op-3.c > +++ b/gcc/testsuite/gcc.target/i386/bitwise_mask_op-3.c > @@ -12,7 +12,7 @@ foo_orb (__m512i a, __m512i b) >foo = m1 | m2; > } > > -/* { dg-final { scan-assembler-times "korb\[\t \]" "1" } } */ > +/* { dg-final { scan-assembler-times "korb\[\t \]" "1" { xfail *-*-* } } } > */ > > void > foo_xorb (__m512i a, __m512i b) > @@ -22,7 +22,7 @@ foo_xorb (__m512i a, __m512i b) >foo = m1 ^ m2; > } > > -/* { dg-final { scan-assembler-times "kxorb\[\t \]" "1" } } */ > +/* { dg-final { scan-assembler-times "kxorb\[\t \]" "1" { xfail *-*-* } } } > */ > > void > foo_andb (__m512i a, __m512i b) > @@ -40,4 +40,4 @@ foo_andnb (__m512i a, __m512i b) >foo = m1 & ~m2; > } > > -/* { dg-final { scan-assembler-times "kmovb\[\t \]" "4"} } */ > +/* { dg-final { scan-assembler-times "kmovb\[\t \]" "4" { xfail *-*-* } } } > */ > -- > 2.27.0 >
Re: [RFC] Return NULL from gimple_call_return_type if no return available.
On Wed, Jun 23, 2021 at 9:25 PM Andrew MacLeod wrote: > > On 6/23/21 2:37 PM, Richard Biener via Gcc-patches wrote: > > On June 23, 2021 5:03:05 PM GMT+02:00, Aldy Hernandez via Gcc-patches > > wrote: > >> The call to gimple_call_fntype() in gimple_call_return_type() may > >> return > >> NULL, which causes the TREE_TYPE(lhs) to ICE. I think it would be best > >> to > >> return NULL (or void_type_node) rather than aborting. > >> > >> I'm running into this because fold_using_range::range_of_call, calls > >> gimple_call_return_type which may ICE for builtins with no LHS. > >> Instead > >> of special casing things in range_of_call, perhaps it's best to plug > >> the > >> source. > >> > >> Does this sound reasonable? > > No, you need to make sure to not call this on an internal function call > > instead. > > Otherwise it is never NULL. > > > > Richard. > > Out of curiosity, why is it not OK to call this on an internal function > call? Shouldn't all calls return something at least? like VOIDmode if > they don't return anything? It just seems like it needs to be special > cased either at any possible call site, or simply in the routine. we > stumbled across it and it wasn't obvious why. Well, gimple_call_fntype simply returns NULL because internal functions do not have any API/ABI. So you either deal with a NULL return value but then explicitely checking for an internal function call is clearly better from a source documentation point of view. I think the existing type == NULL check was likely added to avoid touching too much code and we somehow didn't think of internal function calls w/o a LHS (but why are you asking for the return type for a call w/o LHS?) So yes, you could argue that if (type == NULL_TREE) { gcc_assert (gimple_call_internal_p (gs)); tree lhs = gimple_call_lhs (gs); return lhs ? TREE_TYPE (lhs) : void_type_node; } would make gimple_call_return_type more robust. But then I wonder why the function exists in the first place ;) I suppose like gimple_expr_type it's one of those I'd eventually see to go away. Richard. > Andrew > >
Re: [PATCH 5/6] make get_domminated_by_region return a auto_vec
On Wed, Jun 23, 2021 at 12:22 PM Richard Sandiford wrote: > > Richard Biener writes: > > On Wed, Jun 23, 2021 at 7:23 AM Trevor Saunders > > wrote: > >> > >> On Tue, Jun 22, 2021 at 02:01:24PM -0600, Martin Sebor wrote: > >> > On 6/21/21 1:15 AM, Richard Biener wrote: > > [...] > >> > > > >> > > But maybe I'm misunderstanding C++ too much :/ > >> > > > >> > > Well, I guess b) from above means auto_vec<> passing to > >> > > vec<> taking functions will need changes? > >> > > >> > Converting an auto_vec object to a vec slices off its data members. > >> > The auto_vec specialization has no data members so that's not > >> > a bug in and of itself, but auto_vec does have data members > >> > so that would be a bug. The risk is not just passing it to > >> > functions by value but also returning it. That risk was made > >> > worse by the addition of the move ctor. > >> > >> I would agree that the conversion from auto_vec<> to vec<> is > >> questionable, and should get some work at some point, perhaps just > >> passingauto_vec references is good enough, or perhaps there is value in > >> some const_vec view to avoid having to rely on optimizations, I'm not > >> sure without looking more at the usage. > > > > We do need to be able to provide APIs that work with both auto_vec<> > > and vec<>, I agree that those currently taking a vec<> by value are > > fragile (and we've had bugs there before), but I'm not ready to say > > that changing them all to [const] vec<>& is OK. The alternative > > would be passing a const_vec<> by value, passing that along to > > const vec<>& APIs should be valid then (I can see quite some API > > boundary cleanups being necessary here ...). > > FWIW, as far as const_vec<> goes, we already have array_slice, which is > mostly a version of std::span. (The only real reason for not using > std::span itself is that it requires a later version of C++.) > > Taking those as arguments has the advantage that we can pass normal > arrays as well. It's not really a "const" thing it seems though it certainly would not expose any API that would reallocate the vector (which is the problematic part of passing vec<> by value, not necessarily changing elements in-place). Since array_slice doesn't inherit most of the vec<> API transforming an API from vec<> to array_slice<> will be also quite some work. But I agree it might be useful for generic API stuff. Richard. > Thanks, > Richard
Re: predcom: Refactor more by encapsulating global states
Hi Martin, on 2021/6/23 上午12:14, Martin Sebor wrote: > On 6/21/21 8:35 PM, Kewen.Lin wrote: >> Hi Richi and Martin, >> Thanks Richi! One draft (not ready for review) is attached for the further discussion. It follows the idea of RAII-style cleanup. I noticed that Martin suggested stepping forward to make tree_predictive_commoning_loop and its callees into one class (Thanks Martin), since there are not many this kind of C++-style work functions, I want to double confirm which option do you guys prefer? >>> >>> Such general cleanup is of course desired - Giuliano started some of it >>> within >>> GSoC two years ago in the attempt to thread the compilation process. The >>> cleanup then helps to get rid of global state which of course interferes >>> here >>> (and avoids unnecessary use of TLS vars). >>> >>> So yes, encapsulating global state into a class and making accessors >>> member functions is something that is desired (but a lot of mechanical >>> work). >>> >>> Thanks >>> Richard. >>> >>> I meant that not necessarily as something to include in this patch >>> but as a suggestion for a future improvement. If you'd like to >>> tackle it at any point that would be great of course In any >>> event, thanks for double-checking! >>> >>> The attached patch looks good to me as well (more for the sake of >>> style than anything else, declaring the class copy ctor and copy assignment >>> = delete would > make it clear it's not meant to be >>> copied, although in this case it's unlikely to make a practical >>> difference). >>> >>> Martin. >> >> >> Thanks for your explanation! Sorry for the late response. >> As the way to encapsulate global state into a class and making accessors >> member functions looks more complete, I gave up the RAII draft and >> switched onto this way. >> >> This patch is to encapsulate global states into a class and >> making their accessors as member functions, remove some >> consequent useless clean up code, and do some clean up with >> RAII. > > Nice! > > A further improvement worth considering (if you're so inclined :) > is replacing the pcom_worker vec members with auto_vec (obviating > having to explicitly release them) and for the same reason also > replacing the comp_ptrs bare pointer members with auto_vecs. > There may be other opportunities to do the same in individual > functions (I'd look to get rid of as many calls to functions > like XNEW()/XNEWVEC() and free() use auto_vec instead). > > An unrelated but worthwhile change is to replace the FOR_EACH_ > loops with C++ 11 range loops, analogously to: > https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572315.html > > Finally, the only loosely followed naming convention for member > variables is to start them with the m_ prefix. > > These just suggestions that could be done in a followup, not > something I would consider prerequisite for accepting the patch > as is if I were in a position to make such a decision. > Many thanks for all the great suggestions! I'll deal with them in a follow up patch. BR, Kewen
Re: [PATCH 5/6] make get_domminated_by_region return a auto_vec
On Thu, Jun 24, 2021 at 12:56 AM Martin Sebor wrote: > > On 6/23/21 1:43 AM, Richard Biener wrote: > > On Wed, Jun 23, 2021 at 7:23 AM Trevor Saunders > > wrote: > >> > >> On Tue, Jun 22, 2021 at 02:01:24PM -0600, Martin Sebor wrote: > >>> On 6/21/21 1:15 AM, Richard Biener wrote: > > [...] > > But maybe I'm misunderstanding C++ too much :/ > > Well, I guess b) from above means auto_vec<> passing to > vec<> taking functions will need changes? > >>> > >>> Converting an auto_vec object to a vec slices off its data members. > >>> The auto_vec specialization has no data members so that's not > >>> a bug in and of itself, but auto_vec does have data members > >>> so that would be a bug. The risk is not just passing it to > >>> functions by value but also returning it. That risk was made > >>> worse by the addition of the move ctor. > >> > >> I would agree that the conversion from auto_vec<> to vec<> is > >> questionable, and should get some work at some point, perhaps just > >> passingauto_vec references is good enough, or perhaps there is value in > >> some const_vec view to avoid having to rely on optimizations, I'm not > >> sure without looking more at the usage. > > > > We do need to be able to provide APIs that work with both auto_vec<> > > and vec<>, I agree that those currently taking a vec<> by value are > > fragile (and we've had bugs there before), but I'm not ready to say > > that changing them all to [const] vec<>& is OK. The alternative > > would be passing a const_vec<> by value, passing that along to > > const vec<>& APIs should be valid then (I can see quite some API > > boundary cleanups being necessary here ...). > > > > But with all this I don't know how to adjust auto_vec<> to no > > longer "decay" to vec<> but still being able to pass it to vec<>& > > and being able to call vec<> member functions w/o jumping through > > hoops. Any hints on that? private inheritance achieves the first > > but also hides all the API ... > > The simplest way to do that is by preventing the implicit conversion > between the two types and adding a to_vec() member function to > auto_vec as Jason suggested. This exposes the implicit conversions > to the base vec, forcing us to review each and "fix" it either by > changing the argument to either vec& or const vec&, or less often > to avoid the indirection if necessary, by converting the auto_vec > to vec explicitly by calling to_vec(). The attached diff shows > a very rough POC of what that might look like. A side benefit > is that it improves const-safety and makes the effects of functions > on their callers easier to understand. > > But that's orthogonal to making auto_vec copy constructible and copy > assignable. That change can be made independently and with much less > effort and no risk. There's a bunch of stuff I can't review because of lack of C++ knowledge (the vNULL changes). I suppose that + /* Prevent implicit conversion from auto_vec. Use auto_vec::to_vec() + instead. */ + template + vec (auto_vec &) = delete; + + template + void operator= (auto_vec &) = delete; still allows passing auto_vec<> to [const] vec<> & without the .to_vec so it's just the by-value passing that becomes explicit? If so then I agree this is an improvement. The patch is likely incomplete though or is it really only so few signatures that need changing? Thanks, Richard. > Martin
Re: predcom: Refactor more by encapsulating global states
on 2021/6/23 下午3:22, Richard Biener wrote: > On Tue, Jun 22, 2021 at 4:35 AM Kewen.Lin wrote: >> >> Hi Richi and Martin, >> Thanks Richi! One draft (not ready for review) is attached for the further discussion. It follows the idea of RAII-style cleanup. I noticed that Martin suggested stepping forward to make tree_predictive_commoning_loop and its callees into one class (Thanks Martin), since there are not many this kind of C++-style work functions, I want to double confirm which option do you guys prefer? >>> >>> Such general cleanup is of course desired - Giuliano started some of it >>> within >>> GSoC two years ago in the attempt to thread the compilation process. The >>> cleanup then helps to get rid of global state which of course interferes >>> here >>> (and avoids unnecessary use of TLS vars). >>> >>> So yes, encapsulating global state into a class and making accessors >>> member functions is something that is desired (but a lot of mechanical >>> work). >>> >>> Thanks >>> Richard. >>> >>> I meant that not necessarily as something to include in this patch >>> but as a suggestion for a future improvement. If you'd like to >>> tackle it at any point that would be great of course In any >>> event, thanks for double-checking! >>> >>> The attached patch looks good to me as well (more for the sake of >>> style than anything else, declaring the class copy ctor and copy assignment >>> = delete would > make it clear it's not meant to be >>> copied, although in this case it's unlikely to make a practical >>> difference). >>> >>> Martin. >> >> >> Thanks for your explanation! Sorry for the late response. >> As the way to encapsulate global state into a class and making accessors >> member functions looks more complete, I gave up the RAII draft and >> switched onto this way. >> >> This patch is to encapsulate global states into a class and >> making their accessors as member functions, remove some >> consequent useless clean up code, and do some clean up with >> RAII. >> >> Bootstrapped/regtested on powerpc64le-linux-gnu P9, >> x86_64-redhat-linux and aarch64-linux-gnu, also >> bootstrapped on ppc64le P9 with bootstrap-O3 config. >> >> Is it ok for trunk? > > OK, thanks for the work - Martins suggestions are good but indeed can be > handled as followup. > Thanks Richi! Re-tested and committed in r12-1767, I will make up a follow up patch for Martin's suggestions. BR, Kewen
Re: [EXTERNAL] Re: rs6000: Fix typos in float128 ISA3.1 support
on 2021/6/24 上午12:58, Segher Boessenkool wrote: > On Wed, Jun 23, 2021 at 12:17:07PM +0800, Kewen.Lin wrote: +#ifdef FLOAT128_HW_INSNS_ISA3_1 TFtype __floattikf (TItype_ppc) __attribute__ ((__ifunc__ ("__floattikf_resolve"))); >>> >>> I wonder if we now need TItype_ppc at all anymore, btw? >> >> Sorry that I don't quite follow this question. > > I thought it may do the same as just TItype now, but the ifunc stuff > probably makes it different still :-) > Ah, thanks for the clarification! If I read it right, TItype is defined with __attribute__ ((mode (TI))) while TItype_ppc is defined with __attribute__ ((__mode__ (__TI__))), the later writing looks special. BR, Kewen
Re: [PING][PATCH 9/13] v2 Use new per-location warning APIs in LTO
On Mon, Jun 21, 2021 at 11:55 PM Martin Sebor via Gcc-patches wrote: > > Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571980.html > > Looking for a review of the LTO changes to switch TREE_NO_WARNING to > the suppress_warning() API. Hmm, since the warning suppressions are on location ad-hoc data the appropriate thing is to adjust the location streaming and that part seems to be missing? So what you now stream is simply the "everything" fallback, correct? In particular: else -bp_pack_value (bp, TREE_NO_WARNING (expr), 1); +/* FIXME: pack all warning bits. */ +bp_pack_value (bp, warning_suppressed_p (expr), 1); this looks like a wrong comment in that light. - else -compare_values (TREE_NO_WARNING); + else if (t1->base.nowarning_flag != t2->base.nowarning_flag) +return false; uh. Can you keep sth like TREE_NO_WARNING_RAW or so? Thanks, Richard. > On 6/4/21 3:43 PM, Martin Sebor wrote: > > The attached patch replaces the uses of TREE_NO_WARNING in the LTO > > front end with the new suppress_warning() API. It adds a couple of > > FIXMEs that I plan to take care of in a follow up. >
[committed] openmp: in_reduction clause support on target construct
Hi! This patch adds support for in_reduction clause on target construct, though for now only for synchronous targets (without nowait clause). The encountering thread in that case runs the target task and blocks until the target region ends, so it is implemented by remapping it before entering the target, initializing the private copy if not yet initialized for the current thread and then using the remapped addresses for the mapping addresses. For nowait combined with in_reduction the patch contains a hack where the nowait clause is ignored. To implement it correctly, I think we would need to create a new private variable for the in_reduction and initialize it before doing the async target and adjust the map addresses to that private variable and then pass a function pointer to the library routine with code where the callback would remap the address to the current threads private variable and use in_reduction combiner to combine the private variable we've created into the thread's copy. The library would then need to make sure that the routine is called in some thread participating in the parallel (and not in an unshackeled thread). Bootstrapped/regtested on x86_64-linux and i686-linux (with no offloading) and regtested on x86_64-linux with offloading to nvptx-none, committed to trunk. 2021-06-24 Jakub Jelinek gcc/ * tree.h (OMP_CLAUSE_MAP_IN_REDUCTION): Document meaning for OpenMP. * gimplify.c (gimplify_scan_omp_clauses): For OpenMP map clauses with OMP_CLAUSE_MAP_IN_REDUCTION flag partially defer gimplification of non-decl OMP_CLAUSE_DECL. For OMP_CLAUSE_IN_REDUCTION on OMP_TARGET user outer_ctx instead of ctx for placeholders and initializer/combiner gimplification. * omp-low.c (scan_sharing_clauses): Handle OMP_CLAUSE_MAP_IN_REDUCTION on target constructs. (lower_rec_input_clauses): Likewise. (lower_omp_target): Likewise. * omp-expand.c (expand_omp_target): Temporarily ignore nowait clause on target if in_reduction is present. gcc/c-family/ * c-common.h (enum c_omp_region_type): Add C_ORT_TARGET and C_ORT_OMP_TARGET. * c-omp.c (c_omp_split_clauses): For OMP_CLAUSE_IN_REDUCTION on combined target constructs also add map (always, tofrom:) clause. gcc/c/ * c-parser.c (omp_split_clauses): Pass C_ORT_OMP_TARGET instead of C_ORT_OMP for clauses on target construct. (OMP_TARGET_CLAUSE_MASK): Add in_reduction clause. (c_parser_omp_target): For non-combined target add map (always, tofrom:) clauses for OMP_CLAUSE_IN_REDUCTION. Pass C_ORT_OMP_TARGET to c_finish_omp_clauses. * c-typeck.c (handle_omp_array_sections): Adjust ort handling for addition of C_ORT_OMP_TARGET and simplify, mapping clauses are never present on C_ORT_*DECLARE_SIMD. (c_finish_omp_clauses): Likewise. Handle OMP_CLAUSE_IN_REDUCTION on C_ORT_OMP_TARGET, set OMP_CLAUSE_MAP_IN_REDUCTION on corresponding map clauses. gcc/cp/ * parser.c (cp_omp_split_clauses): Pass C_ORT_OMP_TARGET instead of C_ORT_OMP for clauses on target construct. (OMP_TARGET_CLAUSE_MASK): Add in_reduction clause. (cp_parser_omp_target): For non-combined target add map (always, tofrom:) clauses for OMP_CLAUSE_IN_REDUCTION. Pass C_ORT_OMP_TARGET to finish_omp_clauses. * semantics.c (handle_omp_array_sections_1): Adjust ort handling for addition of C_ORT_OMP_TARGET and simplify, mapping clauses are never present on C_ORT_*DECLARE_SIMD. (handle_omp_array_sections): Likewise. (finish_omp_clauses): Likewise. Handle OMP_CLAUSE_IN_REDUCTION on C_ORT_OMP_TARGET, set OMP_CLAUSE_MAP_IN_REDUCTION on corresponding map clauses. * pt.c (tsubst_expr): Pass C_ORT_OMP_TARGET instead of C_ORT_OMP for clauses on target construct. gcc/testsuite/ * c-c++-common/gomp/target-in-reduction-1.c: New test. * c-c++-common/gomp/clauses-1.c: Add in_reduction clauses on target or combined target constructs. libgomp/ * testsuite/libgomp.c-c++-common/target-in-reduction-1.c: New test. * testsuite/libgomp.c-c++-common/target-in-reduction-2.c: New test. * testsuite/libgomp.c++/target-in-reduction-1.C: New test. * testsuite/libgomp.c++/target-in-reduction-2.C: New test. --- gcc/tree.h.jj 2021-06-23 09:57:34.624714820 +0200 +++ gcc/tree.h 2021-06-23 13:33:00.372434259 +0200 @@ -1651,7 +1651,8 @@ class auto_suppress_location_wrappers #define OMP_CLAUSE_MAP_MAYBE_ZERO_LENGTH_ARRAY_SECTION(NODE) \ TREE_PROTECTED (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP)) /* Nonzero if this map clause is for an OpenACC compute construct's reduction - variable. */ + variable or OpenMP map clause mentioned also in in_reduction clause on the + same construct. */ #define OMP_CLAUSE_MAP_IN_REDUCTION(NOD
[PATCH] Fix SLP permute propagation error
This fixes SLP permute propagation to not propagate across operations that have different semantics on different lanes like for example the recently added COMPLEX_ADD_ROT90. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. 2021-06-24 Richard Biener * tree-vect-slp.c (vect_optimize_slp): Do not propagate across operations that have different semantics on different lanes. --- gcc/tree-vect-slp.c | 30 +++--- 1 file changed, 23 insertions(+), 7 deletions(-) diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index 29db56ed532..69ee8faed09 100644 --- a/gcc/tree-vect-slp.c +++ b/gcc/tree-vect-slp.c @@ -3680,18 +3680,34 @@ vect_optimize_slp (vec_info *vinfo) { int idx = ipo[i-1]; slp_tree node = vertices[idx].node; - /* For leafs there's nothing to do - we've seeded permutes -on those above. */ - if (SLP_TREE_DEF_TYPE (node) != vect_internal_def) + + /* Handle externals and constants optimistically throughout the +iteration. */ + if (SLP_TREE_DEF_TYPE (node) == vect_external_def + || SLP_TREE_DEF_TYPE (node) == vect_constant_def) continue; vertices[idx].visited = 1; - /* We cannot move a permute across a store. */ - if (STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (node)) - && DR_IS_WRITE - (STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (node + /* We do not handle stores with a permutation. */ + stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (node); + if (STMT_VINFO_DATA_REF (rep) + && DR_IS_WRITE (STMT_VINFO_DATA_REF (rep))) continue; + /* We cannot move a permute across an operation that is +not independent on lanes. Note this is an explicit +negative list since that's much shorter than the respective +positive one but it's critical to keep maintaining it. */ + if (is_gimple_call (STMT_VINFO_STMT (rep))) + switch (gimple_call_combined_fn (STMT_VINFO_STMT (rep))) + { + case CFN_COMPLEX_ADD_ROT90: + case CFN_COMPLEX_ADD_ROT270: + case CFN_COMPLEX_MUL: + case CFN_COMPLEX_MUL_CONJ: + continue; + default:; + } int perm = -1; for (graph_edge *succ = slpg->vertices[idx].succ; -- 2.26.2
[PATCH] stor-layout: Avoid DECL_BIT_FIELD_REPRESENTATIVE with NULL TREE_TYPE [PR101172]
Hi! finish_bitfield_representative has an early out if the field after a bitfield has error_mark_node type, but that early out leads to TREE_TYPE of the DECL_BIT_FIELD_REPRESENTATIVE being NULL, which breaks assumptions on code that uses the DECL_BIT_FIELD_REPRESENTATIVE during error-recovery. The following patch instead sets TREE_TYPE of the representative to error_mark_node, something the users can deal with better. At this point the representative can be set as DECL_BIT_FIELD_REPRESENTATIVE for multiple bitfields, so making sure that we clear the DECL_BIT_FIELD_REPRESENTATIVE instead would be harder (but doable, e.g. with the error_mark_node TREE_TYPE set by this patch set some flag in the caller and if the flag is there, walk all the fields once again and clear all DECL_BIT_FIELD_REPRESENTATIVE that have error_mark_node TREE_TYPE). Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2021-06-24 Jakub Jelinek PR middle-end/101172 * stor-layout.c (finish_bitfield_representative): If nextf has error_mark_node type, set repr type to error_mark_node too. * gcc.dg/pr101172.c: New test. --- gcc/stor-layout.c.jj2021-06-21 09:39:21.798485474 +0200 +++ gcc/stor-layout.c 2021-06-23 11:10:22.617680051 +0200 @@ -2086,7 +2086,10 @@ finish_bitfield_representative (tree rep /* If there was an error, the field may be not laid out correctly. Don't bother to do anything. */ if (TREE_TYPE (nextf) == error_mark_node) - return; + { + TREE_TYPE (repr) = error_mark_node; + return; + } maxsize = size_diffop (DECL_FIELD_OFFSET (nextf), DECL_FIELD_OFFSET (repr)); if (tree_fits_uhwi_p (maxsize)) --- gcc/testsuite/gcc.dg/pr101172.c.jj 2021-06-23 11:15:19.934670004 +0200 +++ gcc/testsuite/gcc.dg/pr101172.c 2021-06-23 11:15:07.255841009 +0200 @@ -0,0 +1,20 @@ +/* PR middle-end/101172 */ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +union U +{ + int a[3]; + struct + { +int a : 3; +struct this_struct var;/* { dg-error "field 'var' has incomplete type" } */ + } b; +}; + +const union U hello = {.a = {1, 2, 3}}; + +void foo() +{ + int x = hello.b.a; +} Jakub
[PATCH] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction
This patch enhances FRE by recording equivalences generated by conditional statements, so that we can find more optimize opportunities in situations like following two cases: case 1: void f (unsigned int a, unsigned int b) { if (a == b) { for (unsigned i = 0; i < a; i++) { if (i == b) foo (); /* Unreachable */ } } } The "i == b" condition is always false, yet original FRE cannot predict this. Because VN only stores "i < a" and "a == b", so there won't be "i == b"'s result. (Moreover, VRP can't infer "i == b" is false either, as its boundary calculation is hindered by the "unsigned" modifier.) case 2: Consider GIMPLE code: : if (a != 0) goto ; [INV] else goto ; [INV] : : # c = PHI if (a != 0) goto ; [INV] else goto ; [INV] : if (c > x) goto ; [INV] else goto ; [INV] : __builtin_puts (&"Unreachable!"[0]); : __builtin_puts (&"a"[0]); : ... The result of "if (c > x)" in bb4 is unknown. However bb2 and bb4 have the same conditional statement, meanwhile bb2 dominates bb4, so it is predictable that c==x at bb5 and c==y at bb7. Keeping record of this might bring further optimizations, such as removing the conditional in bb5. The basic idea is to use a hashmap to record additional equivalence information generated by conditional statements. Insert to the map: 1. When recording an EQ_EXPR=true predicate, e.g. "x==y is true", record the equivalence of x and y at edge destiny in the map. 2. Consider case 2 above, when we fail to predict the result of a conditional expression (at bb4), if following conditions be satisfied: a. There is a previous corresponding predicate. In this case, it will be "a != 0 is true" at bb3. b. bb3's single predecessor bb2 dominates bb4. c. bb3's conditional statement's predicate code (or inverted code) is identical with that of bb4. (This condition can be relaxed.) Then we can try to find a PHI operation along A's true and false edge, and record the equivalence of PHI operation's result and arguments at bb4's true and false destinations. Regarding this case, "c==x at bb5" and "c==y at bb7" will be recorded. Use the map: When we cannot find a prediction result for a conditional statement by original process, replace conditional expression's operands with qualified equivalence and try again. As searching predicates and the ssa names to record are based on value numbering, we need to "unwind" the equivalence map for iteration. Bootstrapped and tested on x86_64-pc-linux-gnu and aarch64-unknown-linux-gnu. Regards, Di Zhao Extend FRE with an "equivalence map" for condition prediction. 2021-06-24 Di Zhao gcc/ChangeLog: PR tree-optimization/101186 * tree-ssa-sccvn.c (vn_tracking_edge): Extracted utility function. (dominated_by_p_w_unex): Moved upward, no change. (vn_nary_op_get_predicated_value): Moved upward, no change. (struct val_equiv_hasher): Hasher for the "equivalence map". (compute_single_op_hash): Compute hashcode from ssa name. (is_vn_qualified_at_bb): Check if vn_pval is valid at BB. (val_equiv_insert): Insert into "equivalence map". (vn_lookup_cond_result): Lookup conditional expression's result by VN. (find_predicated_value_by_equiv): Search for predicated value, utilizing equivalences that we have recorded. (record_equiv_from_previous_edge): Record equivalence relation from a previouse edge on current edge. (record_equiv_from_previous_cond): Search for equivalences generated by previous conditional statements, and record them on current BB's successors. (vn_nary_op_insert_pieces_predicated): Extract utility function. Insert into the "equivalence map" for predicate like "x==y is true". (free_rpo_vn): Free the "equivalence map". (process_bb): Insert into & lookup from the "equivalence map". (struct unwind_state): Add "equivalence map" unwind state. (do_unwind): Unwind the "equivalence map". (do_rpo_vn): Update "equivalence map" unwind state. gcc/testsuite/ChangeLog: PR tree-optimization/101186 * gcc.dg/tree-ssa/vrp03.c: Disable fre. * gcc.dg/tree-ssa/ssa-fre-95.c: New test. * gcc.dg/tree-ssa/ssa-fre-96.c: New test. tree-optimization-101186.patch Description: tree-optimization-101186.patch
[PATCH] df: Fix up handling of paradoxical subregs in debug insns [PR101170]
Hi! The recent addition of gcc_assert (regno < endregno); triggers during glibc build on m68k. The problem is that RA decisions shouldn't depend on expressions in DEBUG_INSNs and those expressions can contain paradoxical subregs of certain pseudos. If RA then decides to allocate the pseudo to a register with very small hard register REGNO, we can trigger the new assert, as (int) subreg_regno_offset may be negative on big endian and the small REGNO + the negative offset can wrap around. The following patch in that case records the range from the REGNO 0 to endregno, before the addition of the assert as both regno and endregno are unsigned it wouldn't record anything at all silently. Bootstrapped/regtested on x86_64-linux and i686-linux and tested with a cross compiler to m68k-liux on the testcase, ok for trunk? 2021-06-24 Jakub Jelinek PR middle-end/101170 * df-scan.c (df_ref_record): For paradoxical big-endian SUBREGs where regno + subreg_regno_offset wraps around use 0 as starting regno. * gcc.dg/pr101170.c: New test. --- gcc/df-scan.c.jj2021-06-22 10:04:46.371208994 +0200 +++ gcc/df-scan.c 2021-06-23 12:46:51.654678805 +0200 @@ -2576,9 +2576,21 @@ df_ref_record (enum df_ref_class cl, if (GET_CODE (reg) == SUBREG) { - regno += subreg_regno_offset (regno, GET_MODE (SUBREG_REG (reg)), - SUBREG_BYTE (reg), GET_MODE (reg)); - endregno = regno + subreg_nregs (reg); + int off = subreg_regno_offset (regno, GET_MODE (SUBREG_REG (reg)), +SUBREG_BYTE (reg), GET_MODE (reg)); + unsigned int nregno = regno + off; + endregno = nregno + subreg_nregs (reg); + if (off < 0 && regno < (unsigned) -off) + /* Deal with paradoxical SUBREGs on big endian where + in debug insns the hard reg number might be smaller + than -off, such as (subreg:DI (reg:SI 0 [+4 ]) 0)); + RA decisions shouldn't be affected by debug insns + and so RA can decide to put pseudo into a hard reg + with small REGNO, even when it is referenced in + a paradoxical SUBREG in a debug insn. */ + regno = 0; + else + regno = nregno; } else endregno = END_REGNO (reg); --- gcc/testsuite/gcc.dg/pr101170.c.jj 2021-06-23 12:27:08.866593960 +0200 +++ gcc/testsuite/gcc.dg/pr101170.c 2021-06-23 12:26:55.823769555 +0200 @@ -0,0 +1,37 @@ +/* PR middle-end/101170 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -g" } */ + +#include + +struct S { int a; int b[4]; } s; +va_list ap; +int i; +long long l; + +struct S +foo (int x) +{ + struct S a = {}; + do +if (x) + return a; + while (1); +} + +__attribute__((noipa)) void +bar (void) +{ + for (; i; i++) +l |= va_arg (ap, long long) << s.b[i]; + if (l) +foo (l); +} + +void +baz (int v, ...) +{ + va_start (ap, v); + bar (); + va_end (ap); +} Jakub
Re: [PATCH] stor-layout: Avoid DECL_BIT_FIELD_REPRESENTATIVE with NULL TREE_TYPE [PR101172]
On Thu, 24 Jun 2021, Jakub Jelinek wrote: > Hi! > > finish_bitfield_representative has an early out if the field after a > bitfield has error_mark_node type, but that early out leads to TREE_TYPE > of the DECL_BIT_FIELD_REPRESENTATIVE being NULL, which breaks assumptions > on code that uses the DECL_BIT_FIELD_REPRESENTATIVE during error-recovery. > > The following patch instead sets TREE_TYPE of the representative to > error_mark_node, something the users can deal with better. At this point > the representative can be set as DECL_BIT_FIELD_REPRESENTATIVE for multiple > bitfields, so making sure that we clear the DECL_BIT_FIELD_REPRESENTATIVE > instead would be harder (but doable, e.g. with the error_mark_node TREE_TYPE > set by this patch set some flag in the caller and if the flag is there, walk > all the fields once again and clear all DECL_BIT_FIELD_REPRESENTATIVE that > have error_mark_node TREE_TYPE). > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. > 2021-06-24 Jakub Jelinek > > PR middle-end/101172 > * stor-layout.c (finish_bitfield_representative): If nextf has > error_mark_node type, set repr type to error_mark_node too. > > * gcc.dg/pr101172.c: New test. > > --- gcc/stor-layout.c.jj 2021-06-21 09:39:21.798485474 +0200 > +++ gcc/stor-layout.c 2021-06-23 11:10:22.617680051 +0200 > @@ -2086,7 +2086,10 @@ finish_bitfield_representative (tree rep >/* If there was an error, the field may be not laid out > correctly. Don't bother to do anything. */ >if (TREE_TYPE (nextf) == error_mark_node) > - return; > + { > + TREE_TYPE (repr) = error_mark_node; > + return; > + } >maxsize = size_diffop (DECL_FIELD_OFFSET (nextf), >DECL_FIELD_OFFSET (repr)); >if (tree_fits_uhwi_p (maxsize)) > --- gcc/testsuite/gcc.dg/pr101172.c.jj2021-06-23 11:15:19.934670004 > +0200 > +++ gcc/testsuite/gcc.dg/pr101172.c 2021-06-23 11:15:07.255841009 +0200 > @@ -0,0 +1,20 @@ > +/* PR middle-end/101172 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +union U > +{ > + int a[3]; > + struct > + { > +int a : 3; > +struct this_struct var; /* { dg-error "field 'var' has incomplete type" > } */ > + } b; > +}; > + > +const union U hello = {.a = {1, 2, 3}}; > + > +void foo() > +{ > + int x = hello.b.a; > +} > > Jakub > > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
[PATCH] c: Fix C cast error-recovery [PR101171]
Hi! The following testcase ICEs during error-recovery, as build_c_cast calls note_integer_operands on error_mark_node and that wraps it into C_MAYBE_CONST_EXPR which is unexpected and causes ICE later on. Seems most other callers of note_integer_operands check early if something is error_mark_node and return before calling note_integer_operands on it. The following patch fixes it by not calling on error_mark_node, another possibility would be to handle error_mark_node in note_integer_operands and just return it. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2021-06-24 Jakub Jelinek PR c/101171 * c-typeck.c (build_c_cast): Don't call note_integer_operands on error_mark_node. * gcc.dg/pr101171.c: New test. --- gcc/c/c-typeck.c.jj 2021-06-23 13:33:00.375434219 +0200 +++ gcc/c/c-typeck.c2021-06-23 17:51:17.501401208 +0200 @@ -6131,6 +6131,7 @@ build_c_cast (location_t loc, tree type, return value reflects this. */ if (int_operands && INTEGRAL_TYPE_P (type) + && value != error_mark_node && !EXPR_INT_CONST_OPERANDS (value)) value = note_integer_operands (value); --- gcc/testsuite/gcc.dg/pr101171.c.jj 2021-06-23 17:56:53.409896019 +0200 +++ gcc/testsuite/gcc.dg/pr101171.c 2021-06-23 17:56:28.668227851 +0200 @@ -0,0 +1,13 @@ +/* PR c/101171 */ +/* { dg-do compile } */ +/* { dg-options "" } */ + +extern void foo (void); +int x = 0x1234; + +void +bar (void) +{ + if (x != (sizeof ((enum T) 0x1234))) /* { dg-error "conversion to incomplete type" } */ +foo (); +} Jakub
Re: [PATCH] df: Fix up handling of paradoxical subregs in debug insns [PR101170]
On Thu, 24 Jun 2021, Jakub Jelinek wrote: > Hi! > > The recent addition of gcc_assert (regno < endregno); triggers during > glibc build on m68k. > The problem is that RA decisions shouldn't depend on expressions in > DEBUG_INSNs and those expressions can contain paradoxical subregs of certain > pseudos. If RA then decides to allocate the pseudo to a register > with very small hard register REGNO, we can trigger the new assert, > as (int) subreg_regno_offset may be negative on big endian and the small > REGNO + the negative offset can wrap around. Hm, I wonder if we should reset the debug_insn if the RA made a decision that produces such non-sensical result for a debug_insn use? > The following patch in that case records the range from the REGNO 0 to > endregno, before the addition of the assert as both regno and endregno are > unsigned it wouldn't record anything at all silently. > > Bootstrapped/regtested on x86_64-linux and i686-linux and tested with a > cross compiler to m68k-liux on the testcase, ok for trunk? OK. Thanks, Richard. > 2021-06-24 Jakub Jelinek > > PR middle-end/101170 > * df-scan.c (df_ref_record): For paradoxical big-endian SUBREGs > where regno + subreg_regno_offset wraps around use 0 as starting > regno. > > * gcc.dg/pr101170.c: New test. > > --- gcc/df-scan.c.jj 2021-06-22 10:04:46.371208994 +0200 > +++ gcc/df-scan.c 2021-06-23 12:46:51.654678805 +0200 > @@ -2576,9 +2576,21 @@ df_ref_record (enum df_ref_class cl, > >if (GET_CODE (reg) == SUBREG) > { > - regno += subreg_regno_offset (regno, GET_MODE (SUBREG_REG (reg)), > - SUBREG_BYTE (reg), GET_MODE (reg)); > - endregno = regno + subreg_nregs (reg); > + int off = subreg_regno_offset (regno, GET_MODE (SUBREG_REG (reg)), > + SUBREG_BYTE (reg), GET_MODE (reg)); > + unsigned int nregno = regno + off; > + endregno = nregno + subreg_nregs (reg); > + if (off < 0 && regno < (unsigned) -off) > + /* Deal with paradoxical SUBREGs on big endian where > +in debug insns the hard reg number might be smaller > +than -off, such as (subreg:DI (reg:SI 0 [+4 ]) 0)); > +RA decisions shouldn't be affected by debug insns > +and so RA can decide to put pseudo into a hard reg > +with small REGNO, even when it is referenced in > +a paradoxical SUBREG in a debug insn. */ > + regno = 0; > + else > + regno = nregno; > } >else > endregno = END_REGNO (reg); > --- gcc/testsuite/gcc.dg/pr101170.c.jj2021-06-23 12:27:08.866593960 > +0200 > +++ gcc/testsuite/gcc.dg/pr101170.c 2021-06-23 12:26:55.823769555 +0200 > @@ -0,0 +1,37 @@ > +/* PR middle-end/101170 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -g" } */ > + > +#include > + > +struct S { int a; int b[4]; } s; > +va_list ap; > +int i; > +long long l; > + > +struct S > +foo (int x) > +{ > + struct S a = {}; > + do > +if (x) > + return a; > + while (1); > +} > + > +__attribute__((noipa)) void > +bar (void) > +{ > + for (; i; i++) > +l |= va_arg (ap, long long) << s.b[i]; > + if (l) > +foo (l); > +} > + > +void > +baz (int v, ...) > +{ > + va_start (ap, v); > + bar (); > + va_end (ap); > +} > > Jakub > > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
Re: [PATCH] df: Fix up handling of paradoxical subregs in debug insns [PR101170]
On Thu, Jun 24, 2021 at 12:13:18PM +0200, Richard Biener wrote: > > The recent addition of gcc_assert (regno < endregno); triggers during > > glibc build on m68k. > > The problem is that RA decisions shouldn't depend on expressions in > > DEBUG_INSNs and those expressions can contain paradoxical subregs of certain > > pseudos. If RA then decides to allocate the pseudo to a register > > with very small hard register REGNO, we can trigger the new assert, > > as (int) subreg_regno_offset may be negative on big endian and the small > > REGNO + the negative offset can wrap around. > > Hm, I wonder if we should reset the debug_insn if the RA made a decision > that produces such non-sensical result for a debug_insn use? For debug info purposes it isn't necessarily invalid. If we extract later on only the well defined bits from the paradoxical subreg (e.g. by using a lowpart subreg of the debug expr), or AND it with a constant mask that only contains bits on the low part positions, then it is fine. And otherwise, it is useless for debug info anyway, we can't express that in DWARF (low bits set to ..., high bits undefined) - there the whole thing would be undefined. Jakub
[PATCH] c: Fix up c_parser_has_attribute_expression [PR101176]
Hi! This function keeps src_range member of the result uninitialized, which at least under valgrind can show up later when those uninitialized location_t's can make it into the IL or location_t hash tables. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2021-06-24 Jakub Jelinek PR c/101176 * c-parser.c (c_parser_has_attribute_expression): Set source range for the result. --- gcc/c/c-parser.c.jj 2021-06-23 13:33:00.374434233 +0200 +++ gcc/c/c-parser.c2021-06-23 23:53:45.146083923 +0200 @@ -8406,6 +8406,7 @@ c_parser_has_attribute_expression (c_par { gcc_assert (c_parser_next_token_is_keyword (parser, RID_BUILTIN_HAS_ATTRIBUTE)); + location_t start = c_parser_peek_token (parser)->location; c_parser_consume_token (parser); c_inhibit_evaluation_warnings++; @@ -8484,6 +8485,7 @@ c_parser_has_attribute_expression (c_par parser->translate_strings_p = save_translate_strings_p; + location_t finish = c_parser_peek_token (parser)->location; if (c_parser_next_token_is (parser, CPP_CLOSE_PAREN)) c_parser_consume_token (parser); else @@ -8512,6 +8514,7 @@ c_parser_has_attribute_expression (c_par else result.value = boolean_false_node; + set_c_expr_source_range (&result, start, finish); return result; } Jakub
[RFC PATCH] i386: Add pack/unpack patterns for 64bit vectors [PR89021]
2021-06-24 Uroš Bizjak gcc/ PR target/89021 * config/i386/i386-expand.c (ix86_expand_sse_unpack): Handle V8QI and V4HI modes. * config/i386/mmx.md (sse4_1_v4qiv4hi2): New insn pattern. (sse4_1_v4qiv4hi2): Ditto. (mmxpackmode): New mode attribute. (vec_pack_trunc_): New expander. (mmxunpackmode): New mode attribute. (vec_unpacks_lo_): New expander. (vec_unpacks_hi_): Ditto. (vec_unpacku_lo_): Ditto. (vec_unpacku_hi_): Ditto. * config/i386/i386.md (extsuffix): Move from ... * config/i386/sse.md: ... here. gcc/testsuite/ PR target/89021 * gcc.target/i386/pr97249-1.c (foo): Add #pragma to avoid loop vectorization. (foo1): Ditto. (foo2): Ditto. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. There is still one scan-tree-not failure in generic vectorization testsuite: FAIL: gcc.dg/vect/vect-nb-iter-ub-3.c scan-tree-dump-not cunroll "loop turned into non-loop; it never loops" This probably happens due to the additional epilogue vectorization, but I don't know how to "fix" this failure. Richi, can you perhaps help me here? Uros. diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index 2cb939e51c3..e9763eb5b3e 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -5161,6 +5161,18 @@ ix86_expand_sse_unpack (rtx dest, rtx src, bool unsigned_p, bool high_p) else unpack = gen_sse4_1_sign_extendv2siv2di2; break; + case E_V8QImode: + if (unsigned_p) + unpack = gen_sse4_1_zero_extendv4qiv4hi2; + else + unpack = gen_sse4_1_sign_extendv4qiv4hi2; + break; + case E_V4HImode: + if (unsigned_p) + unpack = gen_sse4_1_zero_extendv2hiv2si2; + else + unpack = gen_sse4_1_sign_extendv2hiv2si2; + break; default: gcc_unreachable (); } @@ -5172,10 +5184,24 @@ ix86_expand_sse_unpack (rtx dest, rtx src, bool unsigned_p, bool high_p) } else if (high_p) { - /* Shift higher 8 bytes to lower 8 bytes. */ - tmp = gen_reg_rtx (V1TImode); - emit_insn (gen_sse2_lshrv1ti3 (tmp, gen_lowpart (V1TImode, src), -GEN_INT (64))); + switch (GET_MODE_SIZE (imode)) + { + case 16: + /* Shift higher 8 bytes to lower 8 bytes. */ + tmp = gen_reg_rtx (V1TImode); + emit_insn (gen_sse2_lshrv1ti3 (tmp, gen_lowpart (V1TImode, src), +GEN_INT (64))); + break; + case 8: + /* Shift higher 4 bytes to lower 4 bytes. */ + tmp = gen_reg_rtx (V1DImode); + emit_insn (gen_mmx_lshrv1di3 (tmp, gen_lowpart (V1DImode, src), + GEN_INT (32))); + break; + default: + gcc_unreachable (); + } + tmp = gen_lowpart (imode, tmp); } else @@ -5207,6 +5233,18 @@ ix86_expand_sse_unpack (rtx dest, rtx src, bool unsigned_p, bool high_p) else unpack = gen_vec_interleave_lowv4si; break; + case E_V8QImode: + if (high_p) + unpack = gen_mmx_punpckhbw; + else + unpack = gen_mmx_punpcklbw; + break; + case E_V4HImode: + if (high_p) + unpack = gen_mmx_punpckhwd; + else + unpack = gen_mmx_punpcklwd; + break; default: gcc_unreachable (); } diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 9043be3105d..9b619e2f78f 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -1000,6 +1000,9 @@ (define_code_iterator any_truncate [ss_truncate truncate us_truncate]) (define_code_attr trunsuffix [(ss_truncate "s") (truncate "") (us_truncate "us")]) +;; Instruction suffix for SSE sign and zero extensions. +(define_code_attr extsuffix [(sign_extend "sx") (zero_extend "zx")]) + ;; Used in signed and unsigned fix. (define_code_iterator any_fix [fix unsigned_fix]) (define_code_attr fixsuffix [(fix "") (unsigned_fix "u")]) diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md index 7a827dceb01..e887f03474d 100644 --- a/gcc/config/i386/mmx.md +++ b/gcc/config/i386/mmx.md @@ -2639,6 +2639,78 @@ (define_insn_and_split "mmx_punpckldq" (set_attr "type" "mmxcvt,sselog,sselog") (set_attr "mode" "DI,TI,TI")]) +(define_insn "sse4_1_v4qiv4hi2" + [(set (match_operand:V4HI 0 "register_operand" "=Yr,*x,Yw") + (any_extend:V4HI + (vec_select:V4QI + (match_operand:V8QI 1 "register_operand" "Yr,*x,Yw") + (parallel [(const_int 0) (const_int 1) + (const_int 2) (const_int 3)]] + "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE" + "%vpmovbw\t{%1, %0|%0, %1}" + [(set_attr "isa" "noavx,noavx,avx") + (set_at
Re: [RFC PATCH] i386: Add pack/unpack patterns for 64bit vectors [PR89021]
On Thu, 24 Jun 2021, Uros Bizjak wrote: > 2021-06-24 Uroš Bizjak > > gcc/ > PR target/89021 > * config/i386/i386-expand.c (ix86_expand_sse_unpack): > Handle V8QI and V4HI modes. > * config/i386/mmx.md (sse4_1_v4qiv4hi2): > New insn pattern. > (sse4_1_v4qiv4hi2): Ditto. > (mmxpackmode): New mode attribute. > (vec_pack_trunc_): New expander. > (mmxunpackmode): New mode attribute. > (vec_unpacks_lo_): New expander. > (vec_unpacks_hi_): Ditto. > (vec_unpacku_lo_): Ditto. > (vec_unpacku_hi_): Ditto. > * config/i386/i386.md (extsuffix): Move from ... > * config/i386/sse.md: ... here. > > gcc/testsuite/ > > PR target/89021 > * gcc.target/i386/pr97249-1.c (foo): Add #pragma > to avoid loop vectorization. > (foo1): Ditto. > (foo2): Ditto. > > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. > > There is still one scan-tree-not failure in generic vectorization testsuite: > > FAIL: gcc.dg/vect/vect-nb-iter-ub-3.c scan-tree-dump-not cunroll "loop > turned into non-loop; it never loops" > > This probably happens due to the additional epilogue vectorization, > but I don't know how to "fix" this failure. Richi, can you perhaps > help me here? I would suggest to add --param vect-epilogues-nomask=0 to dg-additional-options to preserve what the testcase tested. Richard.
Re: [PATCH][RFC] Add x86 subadd SLP pattern
On Tue, 22 Jun 2021, Uros Bizjak wrote: > On Tue, Jun 22, 2021 at 12:34 PM Richard Biener wrote: > > > > On Tue, 22 Jun 2021, Uros Bizjak wrote: > > > > > On Tue, Jun 22, 2021 at 11:42 AM Richard Sandiford > > > wrote: > > > > > > > >> Well, the pattern is called addsub in the x86 world because highpart > > > > >> does add and lowpart does sub. In left-to-right writing systems > > > > >> highpart comes before lowpart, so you have addsub. > > > > > > > > > > The other targets mentioned do not seem to agree but I can live > > > > > with that, thus I'll change back to addsub. > > > > > > > > FWIW, subadd sounds clearer to me too. It seems surprising to put > > > > imaginary before real when interpreting something as complex, for > > > > example. > > > > > > > > Putting the highpart first feels especially odd on an LE system like > > > > x86? > > > > > > The XMM vector is documented left to right with MSB at the left (c.f. > > > most significant *DIGIT* of the number at the left) > > > > > > xmm[MSB] xmm[LSB] > > > > > > so, looking at x86 ADDSUBPD insn documentation: > > > > > > xmm2[127:64] xmm2[63:0] > > > ( + -) > > > xmm1[127:64] xmm1[63:0] > > > (=) > > > xmm1[127:64] holds ADD > > > xmm1[63:0] holds SUB > > > > > > xmm1[127:64] xmm1 [63:0] > > > ADD SUB > > > > I think if we really want to resolve the "ambiguity" we have to > > spell it out in the optab. vec_addodd_subeven or vec_addsub_oddeven. > > As I noted there are targets who have the opposite so we could > > then add vec_addsub_evenodd (not vec_subadd_evenodd). > > > > Just tell me what you prefer - I'll adjust the patch accordingly. > > I'd use addsub when add comes at the left of sub, and MSB is also > considered at the left. subadd for when sub comes at the left of add > where MSB is also at the left. > > I think that the documentation should clear the ambiguity. > > Otherwise, just my 0.02?, I don't want to bikeshed about this ad > infinitum, so I'll stop there. Yeah, so I've not changed anything but after the permute propagation fix I just pushed I've added a testcase that would be broken without it and adding the new CFN_VEC_ADDSUB case (which I now did). Re-bootstrapped and tested on x86_64-unknown-linux-gnu, now pushed since it helps me working with Tamar on the SLP pattern reorg to have at least one matching pattern on x86_64. Richard. >From a6bfb9106f5605c0ce6ffe1ed91b50510e43343b Mon Sep 17 00:00:00 2001 From: Richard Biener Date: Mon, 31 May 2021 13:19:01 +0200 Subject: [PATCH] Add x86 addsub SLP pattern To: gcc-patches@gcc.gnu.org This addds SLP pattern recognition for the SSE3/AVX [v]addsubp{ds} v0, v1 instructions which compute { v0[0] - v1[0], v0[1], + v1[1], ... } thus subtract, add alternating on lanes, starting with subtract. It adds a corresponding optab and direct internal function, vec_addsub$a3 and renames the existing i386 backend patterns to the new canonical name. The SLP pattern matches the exact alternating lane sequence rather than trying to be clever and anticipating incoming permutes - we could permute the two input vectors to the needed lane alternation, do the addsub and then permute the result vector back but that's only profitable in case the two input or the output permute will vanish - something Tamars refactoring of SLP pattern recog should make possible. 2021-06-17 Richard Biener * config/i386/sse.md (avx_addsubv4df3): Rename to vec_addsubv4df3. (avx_addsubv8sf3): Rename to vec_addsubv8sf3. (sse3_addsubv2df3): Rename to vec_addsubv2df3. (sse3_addsubv4sf3): Rename to vec_addsubv4sf3. * internal-fn.def (VEC_ADDSUB): New internal optab fn. * optabs.def (vec_addsub_optab): New optab. * tree-vect-slp-patterns.c (class addsub_pattern): New. (slp_patterns): Add addsub_pattern. (vect_optimize_slp): Disable propagation across CFN_VEC_ADDSUB. * tree-vectorizer.h (vect_pattern::vect_pattern): Make m_ops optional. * doc/md.texi (vec_addsub3): Document. * gcc.target/i386/vect-addsubv2df.c: New testcase. * gcc.target/i386/vect-addsubv4sf.c: Likewise. * gcc.target/i386/vect-addsubv4df.c: Likewise. * gcc.target/i386/vect-addsubv8sf.c: Likewise. * gcc.target/i386/vect-addsub-2.c: Likewise. * gcc.target/i386/vect-addsub-3.c: Likewise. --- gcc/config/i386/i386-builtin.def | 8 +- gcc/config/i386/sse.md| 8 +- gcc/doc/md.texi | 8 ++ gcc/internal-fn.def | 1 + gcc/optabs.def| 1 + gcc/testsuite/gcc.target/i386/vect-addsub-2.c | 21 gcc/testsuite/gcc.target/i386/vect-addsub-3.c | 38 +++ .../gcc.target/i386/vect-addsubv2df.c | 42 .../gcc.target/i386/vect-addsubv4df.c | 36 +++ .../gcc.target/i386/vect-addsubv4sf.c | 46 .../gcc.target/i386/vect-addsubv8sf.c
[ARM] PR66791: Replace builtins for vdup_n and vmov_n intrinsics
Hi, This patch replaces builtins for vdup_n and vmov_n. The patch results in regression for pr51534.c. Consider following function: uint8x8_t f1 (uint8x8_t a) { return vcgt_u8(a, vdup_n_u8(0)); } code-gen before patch: f1: vmov.i32 d16, #0 @ v8qi vcgt.u8 d0, d0, d16 bx lr code-gen after patch: f1: vceq.i8 d0, d0, #0 vmvnd0, d0 bx lr I am not sure which one is better tho ? Also, this patch regressed bf16_dup.c on arm-linux-gnueabi, which is due to a missed opt in lowering. I had filed it as PR98435, and posted a fix for it here: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572648.html Thanks, Prathamesh 2021-06-24 Prathamesh Kulkarni PR target/66791 * gcc/config/arm/arm_neon.h (vdup_n_s8): Replace call to builtin with constructor. (vdup_n_s16): Likewise. (vdup_n_s32): Likewise. (vdup_n_s64): Likewise. (vdup_n_u8): Likewise. (vdup_n_u16): Likewise. (vdup_n_u32): Likewise. (vdup_n_u64): Likewise. (vdup_n_p8): Likewise. (vdup_n_p16): Likewise. (vdup_n_p64): Likewise. (vdup_n_f16): Likewise. (vdup_n_f32): Likewise. (vdupq_n_s8): Likewise. (vdupq_n_s16): Likewise. (vdupq_n_s32): Likewise. (vdupq_n_s64): Likewise. (vdupq_n_u8): Likewise. (vdupq_n_u16): Likewise. (vdupq_n_u32): Likewise. (vdupq_n_u64): Likewise. (vdupq_n_p8): Likewise. (vdupq_n_p16): Likewise. (vdupq_n_p64): Likewise. (vdupq_n_f16): Likewise. (vdupq_n_f32): Likewise. (vmov_n_s8): Replace call to builtin with call to corresponding vdup intrinsic. (vmov_n_s16): Likewise. (vmov_n_s32): Likewise. (vmov_n_s64): Likewise. (vmov_n_u8): Likewise. (vmov_n_u16): Likewise. (vmov_n_u32): Likewise. (vmov_n_u64): Likewise. (vmov_n_p8): Likewise. (vmov_n_p16): Likewise. (vmov_n_f16): Likewise. (vmov_n_f32): Likewise. (vmovq_n_s8): Likewise. (vmovq_n_s16): Likewise. (vmovq_n_s32): Likewise. (vmovq_n_s64): Likewise. (vmovq_n_u8): Likewise. (vmovq_n_u16): Likewise. (vmovq_n_u32): Likewise. (vmovq_n_u64): Likewise. (vmovq_n_p8): Likewise. (vmovq_n_p16): Likewise. (vmovq_n_f16): Likewise. (vmovq_n_f32): Likewise. * config/arm/arm_neon_builtins.def: Remove entries for vdup_n. diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index 3efcfa45229..bf26cd49d53 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -6625,63 +6625,63 @@ __extension__ extern __inline int8x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vdup_n_s8 (int8_t __a) { - return (int8x8_t)__builtin_neon_vdup_nv8qi ((__builtin_neon_qi) __a); + return (int8x8_t) {__a, __a, __a, __a, __a, __a, __a, __a}; } __extension__ extern __inline int16x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vdup_n_s16 (int16_t __a) { - return (int16x4_t)__builtin_neon_vdup_nv4hi ((__builtin_neon_hi) __a); + return (int16x4_t) {__a, __a, __a, __a}; } __extension__ extern __inline int32x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vdup_n_s32 (int32_t __a) { - return (int32x2_t)__builtin_neon_vdup_nv2si ((__builtin_neon_si) __a); + return (int32x2_t) {__a, __a}; } __extension__ extern __inline float32x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vdup_n_f32 (float32_t __a) { - return (float32x2_t)__builtin_neon_vdup_nv2sf ((__builtin_neon_sf) __a); + return (float32x2_t) {__a, __a}; } __extension__ extern __inline uint8x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vdup_n_u8 (uint8_t __a) { - return (uint8x8_t)__builtin_neon_vdup_nv8qi ((__builtin_neon_qi) __a); + return (uint8x8_t) {__a, __a, __a, __a, __a, __a, __a, __a}; } __extension__ extern __inline uint16x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vdup_n_u16 (uint16_t __a) { - return (uint16x4_t)__builtin_neon_vdup_nv4hi ((__builtin_neon_hi) __a); + return (uint16x4_t) {__a, __a, __a, __a}; } __extension__ extern __inline uint32x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vdup_n_u32 (uint32_t __a) { - return (uint32x2_t)__builtin_neon_vdup_nv2si ((__builtin_neon_si) __a); + return (uint32x2_t) {__a, __a}; } __extension__ extern __inline poly8x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vdup_n_p8 (poly8_t __a) { - return (poly8x8_t)__builtin_neon_vdup_nv8qi ((__builtin_neon_qi) __a); + return (poly8x8_t) {__a, __a, __a, __a, __a, __a, __a, __a}; } __extension__ extern __inline poly16x4_t __attribute__ ((__always_inline__, __g
[PATCH] IBM Z: Use @PLT symbols for local functions in 64-bit mode
Bootstrapped and regtested on s390x-redhat-linux. Ok for master? This helps with generating the code for kernel hotpatches, which contain individual functions and are loaded more than 2G away from vmlinux. This should not create performance regressions for the normal use cases, because for local functions ld replaces @PLT calls with direct calls. gcc/ChangeLog: * config/s390/s390.c (print_operand): Handle %K. * config/s390/s390.md (*movdi_64): Use %K for larl. (reload_base_64): Likewise. (*sibcall_brc): Use %K for j. (*sibcall_brcl): Use %K for jg. (*sibcall_value_brc): Use %K for j. (*sibcall_value_brcl): Use %K for jg. (*bras): Use %K. (*brasl): Likewise. (*bras_r): Likewise. (*brasl_r): Likewise. (main_base_64): Use %K for larl. (reload_base_64): Likewise. (@split_stack_call): Use %K for jg. gcc/testsuite/ChangeLog: * g++.dg/ext/visibility/noPLT.C: Skip on s390x. * gcc.target/s390/nodatarel-1.c: Move foostatic to the new tests. * gcc.target/s390/pr80080-4.c: Allow @PLT suffix. * gcc.target/s390/risbg-ll-3.c: Likewise. * gcc.target/s390/call.h: Common code for the new tests. * gcc.target/s390/call31-z10-pic-nodatarel.c: New test. * gcc.target/s390/call31-z10-pic.c: New test. * gcc.target/s390/call31-z10.c: New test. * gcc.target/s390/call31-z9-pic-nodatarel.c: New test. * gcc.target/s390/call31-z9-pic.c: New test. * gcc.target/s390/call31-z9.c: New test. * gcc.target/s390/call64-z10-pic-nodatarel.c: New test. * gcc.target/s390/call64-z10-pic.c: New test. * gcc.target/s390/call64-z10.c: New test. * gcc.target/s390/call64-z9-pic-nodatarel.c: New test. * gcc.target/s390/call64-z9-pic.c: New test. * gcc.target/s390/call64-z9.c: New test. --- gcc/config/s390/s390.c| 9 + gcc/config/s390/s390.md | 26 ++--- gcc/testsuite/g++.dg/ext/visibility/noPLT.C | 2 +- gcc/testsuite/gcc.target/s390/call.h | 38 +++ .../s390/call31-z10-pic-nodatarel.c | 16 .../gcc.target/s390/call31-z10-pic.c | 16 gcc/testsuite/gcc.target/s390/call31-z10.c| 15 .../gcc.target/s390/call31-z9-pic-nodatarel.c | 16 gcc/testsuite/gcc.target/s390/call31-z9-pic.c | 16 gcc/testsuite/gcc.target/s390/call31-z9.c | 15 .../s390/call64-z10-pic-nodatarel.c | 17 + .../gcc.target/s390/call64-z10-pic.c | 17 + gcc/testsuite/gcc.target/s390/call64-z10.c| 15 .../gcc.target/s390/call64-z9-pic-nodatarel.c | 17 + gcc/testsuite/gcc.target/s390/call64-z9-pic.c | 17 + gcc/testsuite/gcc.target/s390/call64-z9.c | 15 gcc/testsuite/gcc.target/s390/nodatarel-1.c | 26 + gcc/testsuite/gcc.target/s390/pr80080-4.c | 2 +- gcc/testsuite/gcc.target/s390/risbg-ll-3.c| 6 +-- 19 files changed, 258 insertions(+), 43 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/call.h create mode 100644 gcc/testsuite/gcc.target/s390/call31-z10-pic-nodatarel.c create mode 100644 gcc/testsuite/gcc.target/s390/call31-z10-pic.c create mode 100644 gcc/testsuite/gcc.target/s390/call31-z10.c create mode 100644 gcc/testsuite/gcc.target/s390/call31-z9-pic-nodatarel.c create mode 100644 gcc/testsuite/gcc.target/s390/call31-z9-pic.c create mode 100644 gcc/testsuite/gcc.target/s390/call31-z9.c create mode 100644 gcc/testsuite/gcc.target/s390/call64-z10-pic-nodatarel.c create mode 100644 gcc/testsuite/gcc.target/s390/call64-z10-pic.c create mode 100644 gcc/testsuite/gcc.target/s390/call64-z10.c create mode 100644 gcc/testsuite/gcc.target/s390/call64-z9-pic-nodatarel.c create mode 100644 gcc/testsuite/gcc.target/s390/call64-z9-pic.c create mode 100644 gcc/testsuite/gcc.target/s390/call64-z9.c diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index 6bbeb640e1f..e7839044a40 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -7943,6 +7943,7 @@ print_operand_address (FILE *file, rtx addr) 'E': print opcode suffix for branch on index instruction. 'G': print the size of the operand in bytes. 'J': print tls_load/tls_gdcall/tls_ldcall suffix +'K': print @PLT suffix for call targets and load address values. 'M': print the second word of a TImode operand. 'N': print the second word of a DImode operand. 'O': print only the displacement of a memory reference or address. @@ -8129,6 +8130,14 @@ print_operand (FILE *file, rtx x, int code) case 'Y': print_shift_count_operand (file, x); return; + +case 'K': + if (TARGET_64BIT + && flag_pic + && GET_CODE (x) == SYMBOL_REF + && SYMBOL_REF_FUNCTION_P (x)) + fprintf (file, "@PLT"); + return
Re: [PATCH] libstdc++: More efficient std::chrono::year::leap.
Thanks Jonathan. Here are some benchmarks (assembly in [1]): https://quick-bench.com/q/jclBXmi4QLDcRMLuuVpxTUsFmQw Unfortunately, quick-bench times out unless some implementations are commented out. You can copy the code and run it locally (needs google benchmark) to get the full picture. I really like Ulrich Drepper's implementation (version 8). GCC 11 emmits really good code and looking at previous versions this has been very consistent. This implementation also highlights very clearly that my hack to calculate __is_multiple_of_100 (as opposed to % 100 used in version 7) saves a ror instruction with remarkable performance effects. In my AMD Ryzen 7 box, his implementation doesn't seem to be the fastest. Nevertheless, compared to other fast alternatives, performance differences are small and results seem very much platform dependent. In my opinion, his implementation is the best, especially, given recent compiler changes. My reasoning follows. My original motivation was contrasting 'year % 400 == 0' with 'year % 16' as in versions 1 and 2: __is_multiple_of_100 ? year % 400 == 0 : year % 4 == 0; // version 1 __is_multiple_of_100 ? year % 16 == 0 : year % 4 == 0; // version 2 It's fair to expect that version 2 won't be slower than version 1. However, this is the case and by quite a big margin. The emitted instructions are very reasonable and, I guess, the issue is the choice of registers. I've reimplemented half of version 2 in inline asm using similar register choices as version 1 and got the performance boost that I was expecting. (By no means I'm suggesting a non portable implementation here. It was just an exercise to make my point. Also, it performs very poorly when compiled by clang.) The point here is that code emitted for sources like versions 1 and 2 (involving %) seems sensitive to very small variations and has been changing across compiler releases in the last 3 years or so. (This can be checked on godbolt.) Clang also seems very confused [2]. Even if the emitted code for version 2 and others were good today, I fear a new compiler release could regress. The stability of the code emitted for Ulrich Drepper's implementation is much more reliable, its performance is very good and its clarity is only compromised by my own hack for __is_multiple_of_100. (Recall, however, that this hack is crucial for his implementation's performance.) Best wishes, Cassio. [1] https://godbolt.org/z/TbGYEqx33 [2] https://bugs.llvm.org/show_bug.cgi?id=50334 On Wed, Jun 23, 2021 at 6:52 PM Jonathan Wakely wrote: > On 23/06/21 18:51 +0100, Jonathan Wakely wrote: > >Here's what I've committed. Tested x86_64-linux and powerpc64le-linux. > >Pushed to trunk. > > > > > > > > >commit b92d12d3fe3f1aa56d190d960e40c62869a6cfbb > >Author: Cassio Neri > >Date: Wed Jun 23 15:32:16 2021 > > > >libstdc++: More efficient std::chrono::year::leap > > > >Simple change to std::chrono::year::is_leap. If a year is multiple of > 100, > >then it's divisible by 400 if and only if it's divisible by 16. The > latter > >allows for better code generation. > > > >The expression is then either y%16 or y%4 which are both powers of two > >and so it can be rearranged to use simple bitmask operations. > > > >Co-authored-by: Jonathan Wakely > >Co-authored-by: Ulrich Drepper > > > >libstdc++-v3/ChangeLog: > > > >* include/std/chrono (chrono::year::is_leap()): Optimize. > > > >diff --git a/libstdc++-v3/include/std/chrono > b/libstdc++-v3/include/std/chrono > >index 4631a727d73..863b6a27bdf 100644 > >--- a/libstdc++-v3/include/std/chrono > >+++ b/libstdc++-v3/include/std/chrono > >@@ -1606,13 +1606,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > > // [1] https://github.com/cassioneri/calendar > > // [2] > https://accu.org/journals/overload/28/155/overload155.pdf#page=16 > > > >+ // Furthermore, if y%100 != 0, then y%400==0 is equivalent to > y%16==0, > >+ // so we can rearrange the expression to (mult_100 ? y % 4 : y % > 16)==0 > > But Ulrich pointed out I got my boolean logic all muddled up in the > comment. Fixed with the attached patch! > >
[PATCH] x86: Compile CPUID functions with -mgeneral-regs-only
CPUID functions are used to detect CPU features. If vector ISAs are enabled, compiler is free to use them in these functions. Add __attribute__ ((target("general-regs-only"))) to CPUID functions to avoid vector instructions. gcc/ PR target/101185 * config/i386/cpuid.h (__get_cpuid_max): Add __attribute__ ((target("general-regs-only"))). (__get_cpuid): Likewise. (__get_cpuid_count): Likewise. (__cpuidex): Likewise. gcc/testsuite/ PR target/101185 * gcc.target/i386/avx512-check.h (check_osxsave): Add __attribute__ ((target("general-regs-only"))). (main): Likewise. --- gcc/config/i386/cpuid.h | 4 gcc/testsuite/gcc.target/i386/avx512-check.h | 2 ++ 2 files changed, 6 insertions(+) diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h index aebc17c6827..74881ee91e5 100644 --- a/gcc/config/i386/cpuid.h +++ b/gcc/config/i386/cpuid.h @@ -243,6 +243,7 @@ pointer is non-null, then first four bytes of the signature (as found in ebx register) are returned in location pointed by sig. */ +__attribute__ ((target("general-regs-only"))) static __inline unsigned int __get_cpuid_max (unsigned int __ext, unsigned int *__sig) { @@ -298,6 +299,7 @@ __get_cpuid_max (unsigned int __ext, unsigned int *__sig) supported and returns 1 for valid cpuid information or 0 for unsupported cpuid leaf. All pointers are required to be non-null. */ +__attribute__ ((target("general-regs-only"))) static __inline int __get_cpuid (unsigned int __leaf, unsigned int *__eax, unsigned int *__ebx, @@ -315,6 +317,7 @@ __get_cpuid (unsigned int __leaf, /* Same as above, but sub-leaf can be specified. */ +__attribute__ ((target("general-regs-only"))) static __inline int __get_cpuid_count (unsigned int __leaf, unsigned int __subleaf, unsigned int *__eax, unsigned int *__ebx, @@ -330,6 +333,7 @@ __get_cpuid_count (unsigned int __leaf, unsigned int __subleaf, return 1; } +__attribute__ ((target("general-regs-only"))) static __inline void __cpuidex (int __cpuid_info[4], int __leaf, int __subleaf) { diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h b/gcc/testsuite/gcc.target/i386/avx512-check.h index 0a377dba1d5..406faf8fe03 100644 --- a/gcc/testsuite/gcc.target/i386/avx512-check.h +++ b/gcc/testsuite/gcc.target/i386/avx512-check.h @@ -25,6 +25,7 @@ do_test (void) } #endif +__attribute__ ((target("general-regs-only"))) static int check_osxsave (void) { @@ -34,6 +35,7 @@ check_osxsave (void) return (ecx & bit_OSXSAVE) != 0; } +__attribute__ ((target("general-regs-only"))) int main () { -- 2.31.1
Re: [PATCH] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction
On Thu, Jun 24, 2021 at 11:55 AM Di Zhao via Gcc-patches wrote: > > This patch enhances FRE by recording equivalences generated by conditional > statements, so that we can find more optimize opportunities in situations like > following two cases: > > case 1: > void f (unsigned int a, unsigned int b) > { > if (a == b) > { > for (unsigned i = 0; i < a; i++) > { > if (i == b) > foo (); /* Unreachable */ > } > } > } > The "i == b" condition is always false, yet original FRE cannot predict this. > Because VN only stores "i < a" and "a == b", so there won't be "i == b"'s > result. (Moreover, VRP can't infer "i == b" is false either, as its boundary > calculation is hindered by the "unsigned" modifier.) > > case 2: > Consider GIMPLE code: >: > if (a != 0) > goto ; [INV] > else > goto ; [INV] > >: > >: > # c = PHI > if (a != 0) > goto ; [INV] > else > goto ; [INV] > >: > if (c > x) > goto ; [INV] > else > goto ; [INV] > >: > __builtin_puts (&"Unreachable!"[0]); > >: > __builtin_puts (&"a"[0]); > >: > ... > The result of "if (c > x)" in bb4 is unknown. However bb2 and bb4 have the > same > conditional statement, meanwhile bb2 dominates bb4, so it is predictable that > c==x at bb5 and c==y at bb7. Keeping record of this might bring further > optimizations, such as removing the conditional in bb5. > > The basic idea is to use a hashmap to record additional equivalence > information > generated by conditional statements. > > Insert to the map: > 1. When recording an EQ_EXPR=true predicate, e.g. "x==y is true", record > the equivalence of x and y at edge destiny in the map. > 2. Consider case 2 above, when we fail to predict the result of a > conditional expression (at bb4), if following conditions be satisfied: > a. There is a previous corresponding predicate. In this case, > it will > be "a != 0 is true" at bb3. > b. bb3's single predecessor bb2 dominates bb4. > c. bb3's conditional statement's predicate code (or inverted > code) is > identical with that of bb4. (This condition can be relaxed.) > Then we can try to find a PHI operation along A's true and false edge, and > record the equivalence of PHI operation's result and arguments at bb4's > true and false destinations. Regarding this case, "c==x at bb5" and > "c==y at bb7" will be recorded. > > Use the map: > When we cannot find a prediction result for a conditional statement by > original process, replace conditional expression's operands with qualified > equivalence and try again. > > As searching predicates and the ssa names to record are based on > value numbering, we need to "unwind" the equivalence map for iteration. > > Bootstrapped and tested on x86_64-pc-linux-gnu and aarch64-unknown-linux-gnu. I have some reservations about extending the ad-hoc "predicated value" code. Some comments on the patch: +/* hashtable & helpers to record equivalences at given bb. */ + +typedef struct val_equiv_s +{ + val_equiv_s *next; + val_equiv_s *unwind_to; + hashval_t hashcode; + /* SSA name this val_equiv_s is associated with. */ + tree name; + /* RESULT in a vn_pval entry is SSA name of a equivalence. */ + vn_pval *values; +} * val_equiv_t; all of this (and using a hashtable for recording) is IMHO a bit overkill. Since you only ever record equivalences for values the more natural place to hook those in is the vn_ssa_aux structure where we also record the availability chain. There's little commentary in the new code, in particular function-level comments are missing everywhere. There's complexity issues, like I see val_equiv_insert has a "recurse" feature but also find_predicated_value_by_equiv is quadratic in the number of equivalences of the lhs/rhs. Without knowing what the recursion on the former is for - nothing tells me - I suspect either of both should be redundant. You seem to record equivalences at possible use points which looks odd at best - I'd expected equivalences being recorded at the same point we record predicated values and for the current condition, not the one determining some other predication. What was the motivation to do it the way you do it? Why is the code conditional on 'iterate'? You've probably realized that there's no "nice" way to handle temporary equivalences in a VN scheme using hashing for expressions (unless you degrade hashes a lot). You quote opportunities that are catched with this like + if (a != 0) +{ + c = b; +} +
Re: [PATCH] x86: Compile CPUID functions with -mgeneral-regs-only
On Thu, Jun 24, 2021 at 2:13 PM H.J. Lu via Gcc-patches wrote: > > CPUID functions are used to detect CPU features. If vector ISAs > are enabled, compiler is free to use them in these functions. Add > __attribute__ ((target("general-regs-only"))) to CPUID functions > to avoid vector instructions. But there are GPR instructions not in x86_64, so shouldn't we use target("march=x86_64") or so? Note doing either will of course prevent inlining of those "inlines". So I'm not sure how much of a fix this is ... the error will almost always be visible in the caller as well. > gcc/ > > PR target/101185 > * config/i386/cpuid.h (__get_cpuid_max): Add > __attribute__ ((target("general-regs-only"))). > (__get_cpuid): Likewise. > (__get_cpuid_count): Likewise. > (__cpuidex): Likewise. > > gcc/testsuite/ > > PR target/101185 > * gcc.target/i386/avx512-check.h (check_osxsave): Add > __attribute__ ((target("general-regs-only"))). > (main): Likewise. > --- > gcc/config/i386/cpuid.h | 4 > gcc/testsuite/gcc.target/i386/avx512-check.h | 2 ++ > 2 files changed, 6 insertions(+) > > diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h > index aebc17c6827..74881ee91e5 100644 > --- a/gcc/config/i386/cpuid.h > +++ b/gcc/config/i386/cpuid.h > @@ -243,6 +243,7 @@ > pointer is non-null, then first four bytes of the signature > (as found in ebx register) are returned in location pointed by sig. */ > > +__attribute__ ((target("general-regs-only"))) > static __inline unsigned int > __get_cpuid_max (unsigned int __ext, unsigned int *__sig) > { > @@ -298,6 +299,7 @@ __get_cpuid_max (unsigned int __ext, unsigned int *__sig) > supported and returns 1 for valid cpuid information or 0 for > unsupported cpuid leaf. All pointers are required to be non-null. */ > > +__attribute__ ((target("general-regs-only"))) > static __inline int > __get_cpuid (unsigned int __leaf, > unsigned int *__eax, unsigned int *__ebx, > @@ -315,6 +317,7 @@ __get_cpuid (unsigned int __leaf, > > /* Same as above, but sub-leaf can be specified. */ > > +__attribute__ ((target("general-regs-only"))) > static __inline int > __get_cpuid_count (unsigned int __leaf, unsigned int __subleaf, >unsigned int *__eax, unsigned int *__ebx, > @@ -330,6 +333,7 @@ __get_cpuid_count (unsigned int __leaf, unsigned int > __subleaf, >return 1; > } > > +__attribute__ ((target("general-regs-only"))) > static __inline void > __cpuidex (int __cpuid_info[4], int __leaf, int __subleaf) > { > diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h > b/gcc/testsuite/gcc.target/i386/avx512-check.h > index 0a377dba1d5..406faf8fe03 100644 > --- a/gcc/testsuite/gcc.target/i386/avx512-check.h > +++ b/gcc/testsuite/gcc.target/i386/avx512-check.h > @@ -25,6 +25,7 @@ do_test (void) > } > #endif > > +__attribute__ ((target("general-regs-only"))) > static int > check_osxsave (void) > { > @@ -34,6 +35,7 @@ check_osxsave (void) >return (ecx & bit_OSXSAVE) != 0; > } > > +__attribute__ ((target("general-regs-only"))) > int > main () > { > -- > 2.31.1 >
Re: [PATCH] x86: Compile CPUID functions with -mgeneral-regs-only
On Thu, Jun 24, 2021 at 5:35 AM Richard Biener wrote: > > On Thu, Jun 24, 2021 at 2:13 PM H.J. Lu via Gcc-patches > wrote: > > > > CPUID functions are used to detect CPU features. If vector ISAs > > are enabled, compiler is free to use them in these functions. Add > > __attribute__ ((target("general-regs-only"))) to CPUID functions > > to avoid vector instructions. > > But there are GPR instructions not in x86_64, so shouldn't > we use target("march=x86_64") or so? Note doing either will > of course prevent inlining of those "inlines". Does -march=x86_64, which enables CMOV and other GPR ISAs, work for -m32? > So I'm not sure how much of a fix this is ... the error will almost > always be visible in the caller as well. I think _attribute__ ((target("general-regs-only"))) is a step forward. > > gcc/ > > > > PR target/101185 > > * config/i386/cpuid.h (__get_cpuid_max): Add > > __attribute__ ((target("general-regs-only"))). > > (__get_cpuid): Likewise. > > (__get_cpuid_count): Likewise. > > (__cpuidex): Likewise. > > > > gcc/testsuite/ > > > > PR target/101185 > > * gcc.target/i386/avx512-check.h (check_osxsave): Add > > __attribute__ ((target("general-regs-only"))). > > (main): Likewise. > > --- > > gcc/config/i386/cpuid.h | 4 > > gcc/testsuite/gcc.target/i386/avx512-check.h | 2 ++ > > 2 files changed, 6 insertions(+) > > > > diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h > > index aebc17c6827..74881ee91e5 100644 > > --- a/gcc/config/i386/cpuid.h > > +++ b/gcc/config/i386/cpuid.h > > @@ -243,6 +243,7 @@ > > pointer is non-null, then first four bytes of the signature > > (as found in ebx register) are returned in location pointed by sig. */ > > > > +__attribute__ ((target("general-regs-only"))) > > static __inline unsigned int > > __get_cpuid_max (unsigned int __ext, unsigned int *__sig) > > { > > @@ -298,6 +299,7 @@ __get_cpuid_max (unsigned int __ext, unsigned int > > *__sig) > > supported and returns 1 for valid cpuid information or 0 for > > unsupported cpuid leaf. All pointers are required to be non-null. */ > > > > +__attribute__ ((target("general-regs-only"))) > > static __inline int > > __get_cpuid (unsigned int __leaf, > > unsigned int *__eax, unsigned int *__ebx, > > @@ -315,6 +317,7 @@ __get_cpuid (unsigned int __leaf, > > > > /* Same as above, but sub-leaf can be specified. */ > > > > +__attribute__ ((target("general-regs-only"))) > > static __inline int > > __get_cpuid_count (unsigned int __leaf, unsigned int __subleaf, > >unsigned int *__eax, unsigned int *__ebx, > > @@ -330,6 +333,7 @@ __get_cpuid_count (unsigned int __leaf, unsigned int > > __subleaf, > >return 1; > > } > > > > +__attribute__ ((target("general-regs-only"))) > > static __inline void > > __cpuidex (int __cpuid_info[4], int __leaf, int __subleaf) > > { > > diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h > > b/gcc/testsuite/gcc.target/i386/avx512-check.h > > index 0a377dba1d5..406faf8fe03 100644 > > --- a/gcc/testsuite/gcc.target/i386/avx512-check.h > > +++ b/gcc/testsuite/gcc.target/i386/avx512-check.h > > @@ -25,6 +25,7 @@ do_test (void) > > } > > #endif > > > > +__attribute__ ((target("general-regs-only"))) > > static int > > check_osxsave (void) > > { > > @@ -34,6 +35,7 @@ check_osxsave (void) > >return (ecx & bit_OSXSAVE) != 0; > > } > > > > +__attribute__ ((target("general-regs-only"))) > > int > > main () > > { > > -- > > 2.31.1 > > -- H.J.
Re: [PATCH] x86: Compile CPUID functions with -mgeneral-regs-only
On Thu, Jun 24, 2021 at 2:42 PM H.J. Lu wrote: > > On Thu, Jun 24, 2021 at 5:35 AM Richard Biener > wrote: > > > > On Thu, Jun 24, 2021 at 2:13 PM H.J. Lu via Gcc-patches > > wrote: > > > > > > CPUID functions are used to detect CPU features. If vector ISAs > > > are enabled, compiler is free to use them in these functions. Add > > > __attribute__ ((target("general-regs-only"))) to CPUID functions > > > to avoid vector instructions. > > > > But there are GPR instructions not in x86_64, so shouldn't > > we use target("march=x86_64") or so? Note doing either will > > of course prevent inlining of those "inlines". > > Does -march=x86_64, which enables CMOV and other GPR > ISAs, work for -m32? I don't think so. I'm also not sure whether -march=xyz in a target attribute overrides -mavx512f on the command-line ;) > > So I'm not sure how much of a fix this is ... the error will almost > > always be visible in the caller as well. > > I think _attribute__ ((target("general-regs-only"))) is a step > forward. That I agree to, but then the cpuid code is likely written the way it is to allow inlining. But code using CPUID should best compile functions under the check with additional target attribute (or in a separate TU) rather than compiling everything with extra -mXYZ and trying to "disable" things in the dispatching code (and the code leading to it!). Richard. > > > gcc/ > > > > > > PR target/101185 > > > * config/i386/cpuid.h (__get_cpuid_max): Add > > > __attribute__ ((target("general-regs-only"))). > > > (__get_cpuid): Likewise. > > > (__get_cpuid_count): Likewise. > > > (__cpuidex): Likewise. > > > > > > gcc/testsuite/ > > > > > > PR target/101185 > > > * gcc.target/i386/avx512-check.h (check_osxsave): Add > > > __attribute__ ((target("general-regs-only"))). > > > (main): Likewise. > > > --- > > > gcc/config/i386/cpuid.h | 4 > > > gcc/testsuite/gcc.target/i386/avx512-check.h | 2 ++ > > > 2 files changed, 6 insertions(+) > > > > > > diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h > > > index aebc17c6827..74881ee91e5 100644 > > > --- a/gcc/config/i386/cpuid.h > > > +++ b/gcc/config/i386/cpuid.h > > > @@ -243,6 +243,7 @@ > > > pointer is non-null, then first four bytes of the signature > > > (as found in ebx register) are returned in location pointed by sig. > > > */ > > > > > > +__attribute__ ((target("general-regs-only"))) > > > static __inline unsigned int > > > __get_cpuid_max (unsigned int __ext, unsigned int *__sig) > > > { > > > @@ -298,6 +299,7 @@ __get_cpuid_max (unsigned int __ext, unsigned int > > > *__sig) > > > supported and returns 1 for valid cpuid information or 0 for > > > unsupported cpuid leaf. All pointers are required to be non-null. */ > > > > > > +__attribute__ ((target("general-regs-only"))) > > > static __inline int > > > __get_cpuid (unsigned int __leaf, > > > unsigned int *__eax, unsigned int *__ebx, > > > @@ -315,6 +317,7 @@ __get_cpuid (unsigned int __leaf, > > > > > > /* Same as above, but sub-leaf can be specified. */ > > > > > > +__attribute__ ((target("general-regs-only"))) > > > static __inline int > > > __get_cpuid_count (unsigned int __leaf, unsigned int __subleaf, > > >unsigned int *__eax, unsigned int *__ebx, > > > @@ -330,6 +333,7 @@ __get_cpuid_count (unsigned int __leaf, unsigned int > > > __subleaf, > > >return 1; > > > } > > > > > > +__attribute__ ((target("general-regs-only"))) > > > static __inline void > > > __cpuidex (int __cpuid_info[4], int __leaf, int __subleaf) > > > { > > > diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h > > > b/gcc/testsuite/gcc.target/i386/avx512-check.h > > > index 0a377dba1d5..406faf8fe03 100644 > > > --- a/gcc/testsuite/gcc.target/i386/avx512-check.h > > > +++ b/gcc/testsuite/gcc.target/i386/avx512-check.h > > > @@ -25,6 +25,7 @@ do_test (void) > > > } > > > #endif > > > > > > +__attribute__ ((target("general-regs-only"))) > > > static int > > > check_osxsave (void) > > > { > > > @@ -34,6 +35,7 @@ check_osxsave (void) > > >return (ecx & bit_OSXSAVE) != 0; > > > } > > > > > > +__attribute__ ((target("general-regs-only"))) > > > int > > > main () > > > { > > > -- > > > 2.31.1 > > > > > > > -- > H.J.
Re: [PATCH] x86: Compile CPUID functions with -mgeneral-regs-only
On Thu, Jun 24, 2021 at 5:47 AM Richard Biener wrote: > > On Thu, Jun 24, 2021 at 2:42 PM H.J. Lu wrote: > > > > On Thu, Jun 24, 2021 at 5:35 AM Richard Biener > > wrote: > > > > > > On Thu, Jun 24, 2021 at 2:13 PM H.J. Lu via Gcc-patches > > > wrote: > > > > > > > > CPUID functions are used to detect CPU features. If vector ISAs > > > > are enabled, compiler is free to use them in these functions. Add > > > > __attribute__ ((target("general-regs-only"))) to CPUID functions > > > > to avoid vector instructions. > > > > > > But there are GPR instructions not in x86_64, so shouldn't > > > we use target("march=x86_64") or so? Note doing either will > > > of course prevent inlining of those "inlines". > > > > Does -march=x86_64, which enables CMOV and other GPR > > ISAs, work for -m32? > > I don't think so. I'm also not sure whether -march=xyz in a > target attribute overrides -mavx512f on the command-line ;) > > > > So I'm not sure how much of a fix this is ... the error will almost > > > always be visible in the caller as well. > > > > I think _attribute__ ((target("general-regs-only"))) is a step > > forward. > > That I agree to, but then the cpuid code is likely written the > way it is to allow inlining. But code using CPUID should best compile > functions under the check with additional target attribute > (or in a separate TU) rather than compiling everything with > extra -mXYZ and trying to "disable" things in the dispatching > code (and the code leading to it!). CPUID checks in GCC tests should be compiled with noinline, ... plus minimum ISAs allowed. > Richard. > > > > > gcc/ > > > > > > > > PR target/101185 > > > > * config/i386/cpuid.h (__get_cpuid_max): Add > > > > __attribute__ ((target("general-regs-only"))). > > > > (__get_cpuid): Likewise. > > > > (__get_cpuid_count): Likewise. > > > > (__cpuidex): Likewise. > > > > > > > > gcc/testsuite/ > > > > > > > > PR target/101185 > > > > * gcc.target/i386/avx512-check.h (check_osxsave): Add > > > > __attribute__ ((target("general-regs-only"))). > > > > (main): Likewise. > > > > --- > > > > gcc/config/i386/cpuid.h | 4 > > > > gcc/testsuite/gcc.target/i386/avx512-check.h | 2 ++ > > > > 2 files changed, 6 insertions(+) > > > > > > > > diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h > > > > index aebc17c6827..74881ee91e5 100644 > > > > --- a/gcc/config/i386/cpuid.h > > > > +++ b/gcc/config/i386/cpuid.h > > > > @@ -243,6 +243,7 @@ > > > > pointer is non-null, then first four bytes of the signature > > > > (as found in ebx register) are returned in location pointed by sig. > > > > */ > > > > > > > > +__attribute__ ((target("general-regs-only"))) > > > > static __inline unsigned int > > > > __get_cpuid_max (unsigned int __ext, unsigned int *__sig) > > > > { > > > > @@ -298,6 +299,7 @@ __get_cpuid_max (unsigned int __ext, unsigned int > > > > *__sig) > > > > supported and returns 1 for valid cpuid information or 0 for > > > > unsupported cpuid leaf. All pointers are required to be non-null. > > > > */ > > > > > > > > +__attribute__ ((target("general-regs-only"))) > > > > static __inline int > > > > __get_cpuid (unsigned int __leaf, > > > > unsigned int *__eax, unsigned int *__ebx, > > > > @@ -315,6 +317,7 @@ __get_cpuid (unsigned int __leaf, > > > > > > > > /* Same as above, but sub-leaf can be specified. */ > > > > > > > > +__attribute__ ((target("general-regs-only"))) > > > > static __inline int > > > > __get_cpuid_count (unsigned int __leaf, unsigned int __subleaf, > > > >unsigned int *__eax, unsigned int *__ebx, > > > > @@ -330,6 +333,7 @@ __get_cpuid_count (unsigned int __leaf, unsigned > > > > int __subleaf, > > > >return 1; > > > > } > > > > > > > > +__attribute__ ((target("general-regs-only"))) > > > > static __inline void > > > > __cpuidex (int __cpuid_info[4], int __leaf, int __subleaf) > > > > { > > > > diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h > > > > b/gcc/testsuite/gcc.target/i386/avx512-check.h > > > > index 0a377dba1d5..406faf8fe03 100644 > > > > --- a/gcc/testsuite/gcc.target/i386/avx512-check.h > > > > +++ b/gcc/testsuite/gcc.target/i386/avx512-check.h > > > > @@ -25,6 +25,7 @@ do_test (void) > > > > } > > > > #endif > > > > > > > > +__attribute__ ((target("general-regs-only"))) > > > > static int > > > > check_osxsave (void) > > > > { > > > > @@ -34,6 +35,7 @@ check_osxsave (void) > > > >return (ecx & bit_OSXSAVE) != 0; > > > > } > > > > > > > > +__attribute__ ((target("general-regs-only"))) > > > > int > > > > main () > > > > { > > > > -- > > > > 2.31.1 > > > > > > > > > > > > -- > > H.J. -- H.J.
Re: [PATCH 5/7] Try inverted comparison for match_simplify in phiopt
Hi Andrew, just a nit.. On Wed, 23 Jun 2021 15:19:13 -0700 apinski--- via Gcc-patches wrote: > From: Andrew Pinski > > Since match and simplify does not have all of the inverted > comparison patterns, it make sense to just have > phi-opt try to do the inversion and try match and simplify again. > > OK? Bootstrapped and tested on x86_64-linux-gnu. > > Thanks, > Andrew Pinski > > gcc/ChangeLog: > > * tree-ssa-phiopt.c (gimple_simplify_phiopt): > If "A ? B : C" fails to simplify, try "(!A) ? C : B". > --- > gcc/tree-ssa-phiopt.c | 27 ++- > 1 file changed, 26 insertions(+), 1 deletion(-) > > diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c > index 8b0e68c1e90..5f099eca343 100644 > --- a/gcc/tree-ssa-phiopt.c > +++ b/gcc/tree-ssa-phiopt.c > @@ -828,7 +828,8 @@ phiopt_early_allow (enum tree_code code) > with parts pushed if EARLY_P was true. Also rejects non allowed tree code > if EARLY_P is set. > Takes the comparison from COMP_STMT and two args, ARG0 and ARG1 and tries > - to simplify CMP ? ARG0 : ARG1. */ > + to simplify CMP ? ARG0 : ARG1. > + Also try to simplify (!CMP) ? ARG0 : ARG1 if the non-inverse failed. */ commentary typo above, args need swap: + Also try to simplify (!CMP) ? ARG1 : ARG0 if the non-inverse failed. */ thanks,
Re: [PATCH] c: Fix up c_parser_has_attribute_expression [PR101176]
On Thu, Jun 24, 2021 at 12:20:56PM +0200, Jakub Jelinek wrote: > Hi! > > This function keeps src_range member of the result uninitialized, which at > least under valgrind can show up later when those uninitialized location_t's > can make it into the IL or location_t hash tables. > > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for > trunk? Ok, thanks. > 2021-06-24 Jakub Jelinek > > PR c/101176 > * c-parser.c (c_parser_has_attribute_expression): Set source range for > the result. > > --- gcc/c/c-parser.c.jj 2021-06-23 13:33:00.374434233 +0200 > +++ gcc/c/c-parser.c 2021-06-23 23:53:45.146083923 +0200 > @@ -8406,6 +8406,7 @@ c_parser_has_attribute_expression (c_par > { >gcc_assert (c_parser_next_token_is_keyword (parser, > RID_BUILTIN_HAS_ATTRIBUTE)); > + location_t start = c_parser_peek_token (parser)->location; >c_parser_consume_token (parser); > >c_inhibit_evaluation_warnings++; > @@ -8484,6 +8485,7 @@ c_parser_has_attribute_expression (c_par > >parser->translate_strings_p = save_translate_strings_p; > > + location_t finish = c_parser_peek_token (parser)->location; >if (c_parser_next_token_is (parser, CPP_CLOSE_PAREN)) > c_parser_consume_token (parser); >else > @@ -8512,6 +8514,7 @@ c_parser_has_attribute_expression (c_par >else > result.value = boolean_false_node; > > + set_c_expr_source_range (&result, start, finish); >return result; > } Marek
[committed] libstdc++: Implement LWG 2762 for std::unique_ptr::operator*
The LWG issue proposes to add a conditional noexcept-specifier to std::unique_ptr's dereference operator. The issue is currently in Tentatively Ready status, but even if it isn't voted into the draft, we can do it as a conforming extensions. This commit also adds a similar noexcept-specifier to operator[] for the unique_ptr partial specialization. Also ensure that all dereference operators for shared_ptr are noexcept, and adds tests for the std::optional accessors modified by the issue, which were already noexcept in our implementation. Signed-off-by: Jonathan Wakely libstdc++-v3/ChangeLog: * include/bits/shared_ptr_base.h (__shared_ptr_access::operator[]): Add noexcept. * include/bits/unique_ptr.h (unique_ptr::operator*): Add conditional noexcept as per LWG 2762. * testsuite/20_util/shared_ptr/observers/array.cc: Check that dereferencing cannot throw. * testsuite/20_util/shared_ptr/observers/get.cc: Likewise. * testsuite/20_util/optional/observers/lwg2762.cc: New test. * testsuite/20_util/unique_ptr/lwg2762.cc: New test. Tested powerpc64le-linux. Committed to trunk. commit 17bc3848e065c0980523e1a1592f2f03b24b4f1c Author: Jonathan Wakely Date: Thu Jun 24 12:56:20 2021 libstdc++: Implement LWG 2762 for std::unique_ptr::operator* The LWG issue proposes to add a conditional noexcept-specifier to std::unique_ptr's dereference operator. The issue is currently in Tentatively Ready status, but even if it isn't voted into the draft, we can do it as a conforming extensions. This commit also adds a similar noexcept-specifier to operator[] for the unique_ptr partial specialization. Also ensure that all dereference operators for shared_ptr are noexcept, and adds tests for the std::optional accessors modified by the issue, which were already noexcept in our implementation. Signed-off-by: Jonathan Wakely libstdc++-v3/ChangeLog: * include/bits/shared_ptr_base.h (__shared_ptr_access::operator[]): Add noexcept. * include/bits/unique_ptr.h (unique_ptr::operator*): Add conditional noexcept as per LWG 2762. * testsuite/20_util/shared_ptr/observers/array.cc: Check that dereferencing cannot throw. * testsuite/20_util/shared_ptr/observers/get.cc: Likewise. * testsuite/20_util/optional/observers/lwg2762.cc: New test. * testsuite/20_util/unique_ptr/lwg2762.cc: New test. diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h b/libstdc++-v3/include/bits/shared_ptr_base.h index eb9ad23ba1e..5be935d174d 100644 --- a/libstdc++-v3/include/bits/shared_ptr_base.h +++ b/libstdc++-v3/include/bits/shared_ptr_base.h @@ -1035,7 +1035,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #endif element_type& - operator[](ptrdiff_t __i) const + operator[](ptrdiff_t __i) const noexcept { __glibcxx_assert(_M_get() != nullptr); __glibcxx_assert(!extent<_Tp>::value || __i < extent<_Tp>::value); diff --git a/libstdc++-v3/include/bits/unique_ptr.h b/libstdc++-v3/include/bits/unique_ptr.h index 6e5537536e8..1781fe15649 100644 --- a/libstdc++-v3/include/bits/unique_ptr.h +++ b/libstdc++-v3/include/bits/unique_ptr.h @@ -402,7 +402,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION /// Dereference the stored pointer. typename add_lvalue_reference::type - operator*() const + operator*() const noexcept(noexcept(*std::declval())) { __glibcxx_assert(get() != pointer()); return *get(); @@ -655,6 +655,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION /// Access an element of owned array. typename std::add_lvalue_reference::type operator[](size_t __i) const + noexcept(noexcept(std::declval()[std::declval()])) { __glibcxx_assert(get() != pointer()); return get()[__i]; diff --git a/libstdc++-v3/testsuite/20_util/optional/observers/lwg2762.cc b/libstdc++-v3/testsuite/20_util/optional/observers/lwg2762.cc new file mode 100644 index 000..a0cf0bc19a0 --- /dev/null +++ b/libstdc++-v3/testsuite/20_util/optional/observers/lwg2762.cc @@ -0,0 +1,21 @@ +// { dg-do compile { target c++17 } } + +// LWG 2762 adds noexcept to operator-> and operator* +#include + +struct S +{ + void can_throw(); + void cannot_throw() noexcept; +}; + +static_assert( ! noexcept(std::declval&>()->can_throw()) ); +static_assert( noexcept(std::declval&>()->cannot_throw()) ); + +static_assert( noexcept(std::declval&>().operator->()) ); +static_assert( noexcept(std::declval&>().operator->()) ); + +static_assert( noexcept(*std::declval&>()) ); +static_assert( noexcept(*std::declval&>()) ); +static_assert( noexcept(*std::declval&&>()) ); +static_assert( noexcept(*std::declval&&>()) ); diff --git a/libstdc++-v3/testsuite/20_util/shared_ptr/observers/array.cc b/libstdc++-v3/testsuite/20_util/shared_ptr/observers/array.cc index f6
Re: [PATCH 0/3] Improve and document stdx::simd testsuite
On Wed, 23 Jun 2021 at 16:46, Jonathan Wakely wrote: > > On Tue, 8 Jun 2021 at 09:57, Matthias Kretz wrote: > > > > As discussed a long time ago on IRC, this improves (i.e. decreases by > > default) > > the verbosity of make check-simd, gives more verbosity options, and finally > > documents how the simd testsuite is used and how it works. In addition, > > after > > PR98834 was resolved, remove the -fno-tree-vrp workaround. > > > > Tested on x86_64-linux (and more). > > Great, thanks, I'm about to push all 3 patches to trunk. I've pushed this fix for some typos in the new README.md commit 07ba52849ffca26a3d461f94921b23a9cdbaea7f Author: Jonathan Wakely Date: Thu Jun 24 13:49:19 2021 libstdc++: Fix typos and markdown errors in new simd/README.md Signed-off-by: Jonathan Wakely libstdc++-v3/ChangeLog: * testsuite/experimental/simd/README.md: Fix typos. diff --git a/libstdc++-v3/testsuite/experimental/simd/README.md b/libstdc++-v3/testsuite/experimental/simd/README.md index db0d71f8d43..b82453df403 100644 --- a/libstdc++-v3/testsuite/experimental/simd/README.md +++ b/libstdc++-v3/testsuite/experimental/simd/README.md @@ -11,7 +11,7 @@ ### `target_list` Similar to dejagnu target lists: E.g. -`target_list="unix{-march=sandybridge,-march=native/-ffast-math,-march=native/-ffinite-math-only}" +`target_list="unix{-march=sandybridge,-march=native/-ffast-math,-march=native/-ffinite-math-only}"` would create three subdirs in `testsuite/simd/` to run the complete simd testsuite first with `-march=sandybridge`, then with `-march=native -ffast-math`, and finally with `-march=native -ffinite-math-only`. @@ -21,7 +21,7 @@ This variable can be set to a path to a file which is equivalent to a dejagnu board. The file needs to be a valid `sh` script since it is sourced from the -`scripts/check_simd` script. It's purpose is to set the `target_list` variable +`scripts/check_simd` script. Its purpose is to set the `target_list` variable depending on `$target_triplet` (or whatever else makes sense for you). Example: ```sh @@ -52,7 +52,7 @@ `power7` which always uses the flags `-mcpu=power7` and `-static` when compiling tests and prepends `$HOME/bin/run_on_gccfarm gcc112` to test executables. In `target_list` you can now use the name `power7`. E.g. -`target_list="power7 power7/-ffast-math"` or it's shorthand +`target_list="power7 power7/-ffast-math"` or its shorthand `target_list="power7{,-ffast-math}"`. @@ -109,7 +109,7 @@ influencing the set of tests to generate and whether the test driver should expect a failure. -Then the test must at least `#include "bits/verify.h", which provides `main` +Then the test must at least `#include "bits/verify.h"`, which provides `main` and declares a `template void test()` function, which the test has to define. The template parameter is set to `simd` type where `T` and `Abi` are determined by the type and ABI subset dimensions.
Re: [PATCH, OpenMP 5.0] Improve OpenMP target support for C++ [PR92120 v4]
On Fri, Jun 18, 2021 at 10:25:16PM +0800, Chung-Lin Tang wrote: Note, you'll need to rebase your patch, it clashes with r12-1768-g7619d33471c10fe3d149dcbb701d99ed3dd23528. Sorry for that. And sorry for patch review delay. > --- a/gcc/c/c-typeck.c > +++ b/gcc/c/c-typeck.c > @@ -13104,6 +13104,12 @@ handle_omp_array_sections_1 (tree c, tree t, > vec &types, > return error_mark_node; > } > t = TREE_OPERAND (t, 0); > + if ((ort == C_ORT_ACC || ort == C_ORT_OMP) Map clauses never appear on declare simd, so (ort == C_ORT_ACC || ort == C_ORT_OMP) previously meant always and since the in_reduction change is incorrect (as C_ORT_OMP_TARGET is used for target construct but not for e.g. target data* or target update). > + && TREE_CODE (t) == MEM_REF) So please just use if (TREE_CODE (t) == MEM_REF) or explain when it shouldn't trigger. > @@ -14736,6 +14743,11 @@ c_finish_omp_clauses (tree clauses, enum > c_omp_region_type ort) > { > while (TREE_CODE (t) == COMPONENT_REF) > t = TREE_OPERAND (t, 0); > + if (TREE_CODE (t) == MEM_REF) > + { > + t = TREE_OPERAND (t, 0); > + STRIP_NOPS (t); > + } This doesn't look correct. At least the parsing (and the spec AFAIK) doesn't ensure that if there is ->, it must come before all the dots. So, if one uses map (s->x.y) the above would work, but if map (s->x.y->z) or map (s.a->b->c->d->e) is used, it wouldn't. I'd expect a single while loop that looks through COMPONENT_REFs and MEM_REFs as they appear. Maybe the handle_omp_array_sections_1 MEM_REF case too? Or do you want to have it done incrementally, start with supporting only a single -> first before all the dots and later on add support for the rest? I think the 5.0 and especially 5.1 wording basically says that map clause operand is arbitrary lvalue expression that includes array section support too, so eventually we should just have somewhere in parsing scope a bool whether OpenMP array sections are allowed or not, add OMP_ARRAY_REF or similar tree code for those and after parsing the expression, ensure array sections appear only where they can appear and for a subset of the lvalue expressions where we have decl plus series of -> field or . field or [ index ] or [ array section stuff ] handle those specially. That arbitrary lvalue can certainly be done incrementally. map (foo(123)->a.b[3]->c.d[:7]) and the like. > if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP > && OMP_CLAUSE_MAP_IMPLICIT (c) > && (bitmap_bit_p (&map_head, DECL_UID (t)) > @@ -14802,6 +14814,15 @@ c_finish_omp_clauses (tree clauses, enum > c_omp_region_type ort) > bias) to zero here, so it is not set erroneously to the pointer > size later on in gimplify.c. */ > OMP_CLAUSE_SIZE (c) = size_zero_node; > + indir_component_ref_p = false; > + if ((ort == C_ORT_ACC || ort == C_ORT_OMP) Same comment about ort tests. > + && TREE_CODE (t) == COMPONENT_REF > + && TREE_CODE (TREE_OPERAND (t, 0)) == MEM_REF) > + { > + t = TREE_OPERAND (TREE_OPERAND (t, 0), 0); > + indir_component_ref_p = true; > + STRIP_NOPS (t); > + } Again, this can handle only a single -> > @@ -42330,16 +42328,10 @@ cp_parser_omp_target (cp_parser *parser, cp_token > *pragma_tok, > cclauses[C_OMP_CLAUSE_SPLIT_TARGET] = tc; > } > } > - tree stmt = make_node (OMP_TARGET); > - TREE_TYPE (stmt) = void_type_node; > - OMP_TARGET_CLAUSES (stmt) = cclauses[C_OMP_CLAUSE_SPLIT_TARGET]; > - c_omp_adjust_map_clauses (OMP_TARGET_CLAUSES (stmt), true); > - OMP_TARGET_BODY (stmt) = body; > - OMP_TARGET_COMBINED (stmt) = 1; > - SET_EXPR_LOCATION (stmt, pragma_tok->location); > - add_stmt (stmt); > - pc = &OMP_TARGET_CLAUSES (stmt); > - goto check_clauses; > + c_omp_adjust_map_clauses (cclauses[C_OMP_CLAUSE_SPLIT_TARGET], true); > + finish_omp_target (pragma_tok->location, > + cclauses[C_OMP_CLAUSE_SPLIT_TARGET], body, true); What is the advantage of finish_omp_target. Perhaps the check_clauses label can be renamed and more things common to both paths moved after the label if needed, but as long as it isn't something also called during instantiation, I find it cleaner to do it in cp_parser_omp_target at one place. The reason for e.g. finish_omp_parallel is that it is called from both parsing and instantiation. > --- a/gcc/cp/semantics.c > +++ b/gcc/cp/semantics.c > @@ -4990,6 +4990,9 @@ handle_omp_array_sections_1 (tree c, tree t, vec > &types, > { >if (error_operand_p (t)) > return error_mark_node; > + if ((ort == C_ORT_ACC || ort ==
Re: [PATCH] c: Fix C cast error-recovery [PR101171]
On Thu, Jun 24, 2021 at 12:11:23PM +0200, Jakub Jelinek wrote: > Hi! > > The following testcase ICEs during error-recovery, as build_c_cast calls > note_integer_operands on error_mark_node and that wraps it into > C_MAYBE_CONST_EXPR which is unexpected and causes ICE later on. > Seems most other callers of note_integer_operands check early if something > is error_mark_node and return before calling note_integer_operands on it. Yeah. > The following patch fixes it by not calling on error_mark_node, another > possibility would be to handle error_mark_node in note_integer_operands and > just return it. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? This looks OK to me, thanks. > 2021-06-24 Jakub Jelinek > > PR c/101171 > * c-typeck.c (build_c_cast): Don't call note_integer_operands on > error_mark_node. > > * gcc.dg/pr101171.c: New test. > > --- gcc/c/c-typeck.c.jj 2021-06-23 13:33:00.375434219 +0200 > +++ gcc/c/c-typeck.c 2021-06-23 17:51:17.501401208 +0200 > @@ -6131,6 +6131,7 @@ build_c_cast (location_t loc, tree type, > return value reflects this. */ >if (int_operands >&& INTEGRAL_TYPE_P (type) > + && value != error_mark_node >&& !EXPR_INT_CONST_OPERANDS (value)) > value = note_integer_operands (value); > > --- gcc/testsuite/gcc.dg/pr101171.c.jj2021-06-23 17:56:53.409896019 > +0200 > +++ gcc/testsuite/gcc.dg/pr101171.c 2021-06-23 17:56:28.668227851 +0200 > @@ -0,0 +1,13 @@ > +/* PR c/101171 */ > +/* { dg-do compile } */ > +/* { dg-options "" } */ > + > +extern void foo (void); > +int x = 0x1234; > + > +void > +bar (void) > +{ > + if (x != (sizeof ((enum T) 0x1234))) /* { dg-error "conversion to > incomplete type" } */ > +foo (); > +} > > Jakub > Marek
Re: [PATCH] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction
On 6/24/21 8:29 AM, Richard Biener wrote: On Thu, Jun 24, 2021 at 11:55 AM Di Zhao via Gcc-patches wrote: You quote opportunities that are catched with this like + if (a != 0) +{ + c = b; +} + for (unsigned i = 0; i < c; i++) +{ + if (a != 0) + { + if (i >= b) + /* Should be eliminated. +*/ + foo (); but say other "obvious" cases like + if (a != 0) +{ + c = b; +} + for (unsigned i = 0; i < c; i++) +{ + if (a != 0) + { + /* Should be zero. */ return b - c; are not handled. That's in line with the current "value predication" which mainly aims at catching simple jump threading opportunities; you only simplify conditions with the recorded equivalences. But then the complexity of handling equivalences does probably not outweight the opportunities catched - can you share some numbers on how many branches are newly known taken during VN with this patch during say bootstrap or build of SPEC CPU? I've hoped to ditch the current "value predication" code by eventually using the relation oracle from ranger but did not yet have the chance to look into that. Now, the predicated equivalences are likely not something that infrastructure can handle? In the end I think we should research into maintaining an alternate expression table for conditions (those we like to simplify with equivalences) and use a data structure that more easily allows to introduce (temporary) equivalences. Like by maintaining back-references of values we've recorded a condition for and a way to quickly re-canonicalize conditions. Well - it requires some research, as said. The idea would be to handle all this eventually if it doesnt now. It'll certainly be capable of it. we havent looked into any missed cases yet. early days :-) The oracle handles predicated things just fine. We're missing transitive relations, and any time an edge has multiple predecessors we sort of bail on predicated things for now. I also haven't gotten to producing a nice relation/equivlance map of what we actually know... That might be worthwhile sooner than later. THe original function in EVRP currently looks like: === BB 2 : if (a_5(D) == b_6(D)) goto ; [INV] else goto ; [INV] === BB 8 Equivalence set : [a_5(D), b_6(D)] edge 2->8 provides a_5 and b_6 as equivalences : goto ; [100.00%] === BB 6 : # i_1 = PHI <0(8), i_10(5)> if (i_1 < a_5(D)) goto ; [INV] else goto ; [INV] === BB 3 Relational : (i_1 < a_5(D)) edge 6->3 provides this relation : if (i_1 == b_6(D)) goto ; [INV] else goto ; [INV] So It knows that a_5 and b_6 are equivalence, and it knows that i_1 < a_5 in BB3 as well.. so we should be able to indicate that i_1 == b_6 as [0,0].. we currently aren't. I think I had turned on equivalence mapping during relational processing, so should be able to tag that without transitive relations... I'll have a look at why. And once we get a bit further along, you will be able to access this without ranger.. if one wants to simply register the relations directly. Anyway, I'll get back to you why its currently being missed. Andrew
Re: [RFC] Return NULL from gimple_call_return_type if no return available.
On 6/24/21 5:07 AM, Richard Biener wrote: On Wed, Jun 23, 2021 at 9:25 PM Andrew MacLeod wrote: On 6/23/21 2:37 PM, Richard Biener via Gcc-patches wrote: On June 23, 2021 5:03:05 PM GMT+02:00, Aldy Hernandez via Gcc-patches wrote: The call to gimple_call_fntype() in gimple_call_return_type() may return NULL, which causes the TREE_TYPE(lhs) to ICE. I think it would be best to return NULL (or void_type_node) rather than aborting. I'm running into this because fold_using_range::range_of_call, calls gimple_call_return_type which may ICE for builtins with no LHS. Instead of special casing things in range_of_call, perhaps it's best to plug the source. Does this sound reasonable? No, you need to make sure to not call this on an internal function call instead. Otherwise it is never NULL. Richard. Out of curiosity, why is it not OK to call this on an internal function call? Shouldn't all calls return something at least? like VOIDmode if they don't return anything? It just seems like it needs to be special cased either at any possible call site, or simply in the routine. we stumbled across it and it wasn't obvious why. Well, gimple_call_fntype simply returns NULL because internal functions do not have any API/ABI. So you either deal with a NULL return value but then explicitely checking for an internal function call is clearly better from a source documentation point of view. I think the existing type == NULL check was likely added to avoid touching too much code and we somehow didn't think of internal function calls w/o a LHS (but why are you asking for the return type for a call w/o LHS?) We'll still compute values for statements that don't have a LHS.. there's nothing inherently wrong with that. The primary example is if (x_2 < y_3) we will compute [0,0] [1,1] or [0,1] for that statement, without a LHS. It primarily becomes a generic way to ask for the range of each of the operands of the statement, and process it regardless of the presence of a LHS. I don't know, maybe there is (or will be) an internal function that doesn't have a LHS but which can be folded away/rewritten if the operands are certain values. Andrew
Re: [PATCH 00/11] stdx::simd optimizations, corrections, and cleanups
On Tue, 8 Jun 2021 at 13:10, Matthias Kretz wrote: > > The following patches mostly contain code cleanups and minor corrections. The > major feature in this patchset is the last patch, which should make the use of > stdx::simd much safer wrt. ODR violations involuntarily introduced by linking > TUs that were compiled with different -m and floating-point flags. > > Matthias Kretz (11): > libstdc++: Improve copysign codegen > libstdc++: Remove dead code > libstdc++: Improve fixed_size codegen > libstdc++: Make use of __builtin_bit_cast > libstdc++: Remove incorrect fabs overload > libstdc++: Minor simd_math cleanups > libstdc++: Fix condition when AVX512F ldexp implementation is used > libstdc++: Avoid raising fp exceptions in trunc, floor, and ceil > libstdc++: Ensure unrolled loops inline the lambda > libstdc++: Fix internal names: add missing underscores > libstdc++: Fix ODR issues with different -m flags Thanks! I've pushed all except the bit_cast one (as discussed on IRC) and the ODR one (which I'm still reviewing).
Re: [RFC] Return NULL from gimple_call_return_type if no return available.
On Thu, Jun 24, 2021 at 09:31:13AM -0400, Andrew MacLeod via Gcc-patches wrote: > We'll still compute values for statements that don't have a LHS.. there's > nothing inherently wrong with that. The primary example is > > if (x_2 < y_3) > > we will compute [0,0] [1,1] or [0,1] for that statement, without a LHS. It > primarily becomes a generic way to ask for the range of each of the operands > of the statement, and process it regardless of the presence of a LHS. I > don't know, maybe there is (or will be) an internal function that doesn't > have a LHS but which can be folded away/rewritten if the operands are > certain values. There are many internal functions that aren't ECF_CONST or ECF_PURE. Some of them, like IFN*STORE* I think never have an lhs, others have them, but if the lhs is unused, various optimization passes can just remove those lhs from the internal fn calls (if they'd be ECF_CONST or ECF_PURE, the calls would be DCEd). I think generally, if a call doesn't have lhs, there is no point in computing a value range for that missing lhs. It won't be useful for the call arguments to lhs direction (nothing would care about that value) and it won't be useful on the direction from the lhs to the call arguments either. Say if one has p_23 = __builtin_memcpy (p_75, q_23, 16); then one can imply from ~[0, 0] range on p_75 that p_23 has that range too (and vice versa), but if one has __builtin_memcpy (p_125, q_23, 16); none of that makes sense. So instead of punting when gimple_call_return_type returns NULL IMHO the code should punt when gimple_call_lhs is NULL. Jakub
[PATCH v2 1/2] Add -f[no-]direct-extern-access
-fdirect-extern-access is the default. With -fno-direct-extern-access: 1. Always use GOT to access undefined data and function symbols, including in PIE and non-PIE. These will avoid copy relocations in executables. This is compatible with existing executables and shared libraries. 2. In executable and shared library, bind symbols with the STV_PROTECTED visibility locally: a. The address of data symbol is the address of data body. b. For systems without function descriptor, the function pointer is the address of function body. c. The resulting shared libraries may not be incompatible with executables which have copy relocations on protected symbols. 3. Update asm_preferred_eh_data_format to properly select EH encoding format with -fno-direct-extern-access. 4. Add ix86_reloc_rw_mask for TARGET_ASM_RELOC_RW_MASK to avoid copy relocation with -fno-direct-extern-access. gcc/ PR target/35513 PR target/100593 * common.opt: Add -fdirect-extern-access. * config/i386/i386-protos.h (ix86_force_load_from_GOT_p): Add a bool argument. * config/i386/i386.c (ix86_force_load_from_GOT_p): Add a bool argument to indicate call operand. Force non-call load from GOT for -fno-direct-extern-access. (legitimate_pic_address_disp_p): Avoid copy relocation in PIE for -fno-direct-extern-access. (ix86_print_operand): Pass true to ix86_force_load_from_GOT_p for call operand. (asm_preferred_eh_data_format): Use PC-relative format for -fno-direct-extern-access to avoid copy relocation. Check ptr_mode instead of TARGET_64BIT when selecting DW_EH_PE_sdata4. (ix86_binds_local_p): Don't treat protected data as extern and avoid copy relocation on common symbol with -fno-direct-extern-access. (ix86_reloc_rw_mask): New to avoid copy relocation for -fno-direct-extern-access. (TARGET_ASM_RELOC_RW_MASK): New. * doc/invoke.texi: Document -f[no-]direct-extern-access. gcc/testsuite/ PR target/35513 PR target/100593 * g++.dg/pr35513-1.C: New file. * g++.dg/pr35513-2.C: Likewise. * gcc.target/i386/pr35513-1.c: Likewise. * gcc.target/i386/pr35513-2.c: Likewise. * gcc.target/i386/pr35513-3.c: Likewise. * gcc.target/i386/pr35513-4.c: Likewise. * gcc.target/i386/pr35513-5.c: Likewise. * gcc.target/i386/pr35513-6.c: Likewise. * gcc.target/i386/pr35513-7.c: Likewise. * gcc.target/i386/pr35513-8.c: Likewise. --- gcc/common.opt| 4 ++ gcc/config/i386/i386-protos.h | 2 +- gcc/config/i386/i386.c| 50 +++-- gcc/doc/invoke.texi | 13 ++ gcc/testsuite/g++.dg/pr35513-1.C | 25 +++ gcc/testsuite/g++.dg/pr35513-2.C | 53 +++ gcc/testsuite/gcc.target/i386/pr35513-1.c | 16 +++ gcc/testsuite/gcc.target/i386/pr35513-2.c | 15 +++ gcc/testsuite/gcc.target/i386/pr35513-3.c | 15 +++ gcc/testsuite/gcc.target/i386/pr35513-4.c | 15 +++ gcc/testsuite/gcc.target/i386/pr35513-5.c | 15 +++ gcc/testsuite/gcc.target/i386/pr35513-6.c | 14 ++ gcc/testsuite/gcc.target/i386/pr35513-7.c | 15 +++ gcc/testsuite/gcc.target/i386/pr35513-8.c | 41 ++ 14 files changed, 278 insertions(+), 15 deletions(-) create mode 100644 gcc/testsuite/g++.dg/pr35513-1.C create mode 100644 gcc/testsuite/g++.dg/pr35513-2.C create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-1.c create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-2.c create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-3.c create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-4.c create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-5.c create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-6.c create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-7.c create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-8.c diff --git a/gcc/common.opt b/gcc/common.opt index a1353e06bdc..b4827f59cfa 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -1427,6 +1427,10 @@ fdiagnostics-minimum-margin-width= Common Joined UInteger Var(diagnostics_minimum_margin_width) Init(6) Set minimum width of left margin of source code when showing source. +fdirect-extern-access +Common Var(flag_direct_extern_access) Init(1) Optimization +Do not use GOT to access external symbols. + fdisable- Common Joined RejectNegative Var(common_deferred_options) Defer -fdisable-[tree|rtl|ipa]-=range1+range2 Disable an optimization pass. diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h index 65fc307dc7b..3f1bdd14e6d 100644 --- a/gcc/config/i386/i386-protos.h +++ b/gcc/config/i386/i386-protos.h @@ -77,7 +77,7 @@ extern bool ix86_expand_cmpstrn_or_cmpmem (rtx, rtx, rtx, rtx, rtx, bool); extern bool constant_address
[PATCH v2 0/2] Implement indirect external access
Changes in the v2 patch. 1. Rename the option to -fdirect-extern-access. --- On systems with copy relocation: * A copy in executable is created for the definition in a shared library at run-time by ld.so. * The copy is referenced by executable and shared libraries. * Executable can access the copy directly. Issues are: * Overhead of a copy, time and space, may be visible at run-time. * Read-only data in the shared library becomes read-write copy in executable at run-time. * Local access to data with the STV_PROTECTED visibility in the shared library must use GOT. On systems without function descriptor, function pointers vary depending on where and how the functions are defined. * If the function is defined in executable, it can be the address of function body. * If the function, including the function with STV_PROTECTED visibility, is defined in the shared library, it can be the address of the PLT entry in executable or shared library. Issues are: * The address of function body may not be used as its function pointer. * ld.so needs to search loaded shared libraries for the function pointer of the function with STV_PROTECTED visibility. Here is a proposal to remove copy relocation and use canonical function pointer: 1. Accesses, including in PIE and non-PIE, to undefined symbols must use GOT. a. Linker may optimize out GOT access if the data is defined in PIE or non-PIE. 2. Read-only data in the shared library remain read-only at run-time 3. Address of global data with the STV_PROTECTED visibility in the shared library is the address of data body. a. Can use IP-relative access. b. May need GOT without IP-relative access. 4. For systems without function descriptor, a. All global function pointers of undefined functions in PIE and non-PIE must use GOT. Linker may optimize out GOT access if the function is defined in PIE or non-PIE. b. Function pointer of functions with the STV_PROTECTED visibility in executable and shared library is the address of function body. i. Can use IP-relative access. ii. May need GOT without IP-relative access. iii. Branches to undefined functions may use PLT. 5. Single global definition marker: Add GNU_PROPERTY_1_NEEDED: #define GNU_PROPERTY_1_NEEDED GNU_PROPERTY_UINT32_OR_LO to indicate the needed properties by the object file. Add GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS: #define GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS (1U << 0) to indicate that the object file requires canonical function pointers and cannot be used with copy relocation. This bit should be cleared in executable when there are non-GOT or non-PLT relocations in relocatable input files without this bit set. a. Protected symbol access within the shared library can be treated as local. b. Copy relocation should be disallowed at link-time and run-time. c. GOT function pointer reference is required at link-time and run-time. The indirect external access marker can be used in the following ways: 1. Linker can decide the best way to resolve a relocation against a protected symbol before seeing all relocations against the symbol. 2. Dynamic linker can decide if it is an error to have a copy relocation in executable against the protected symbol in a shared library by checking if the shared library is built with -fno-direct-extern-access. Add a compiler option, -fdirect-extern-access. -fdirect-extern-access is the default. With -fno-direct-extern-access: 1. Always to use GOT to access undefined symbols, including in PIE and non-PIE. This is safe to do and does not break the ABI. 2. In executable and shared library, for symbols with the STV_PROTECTED visibility: a. The address of data symbol is the address of data body. b. For systems without function descriptor, the function pointer is the address of function body. These break the ABI and resulting shared libraries may not be compatible with executables which are not compiled with -fno-direct-extern-access. 3. Generate an indirect external access marker in relocatable objects if supported by linker. H.J. Lu (2): Add -f[no-]direct-extern-access Add TARGET_ASM_EMIT_GNU_PROPERTY_NOTE gcc/common.opt| 4 ++ gcc/config.in | 6 +++ gcc/config/i386/gnu-property.c| 31 - gcc/config/i386/i386-protos.h | 2 +- gcc/config/i386/i386.c| 52 -- gcc/configure | 24 ++ gcc/configure.ac | 20 + gcc/doc/invoke.texi | 13 ++ gcc/doc/tm.texi | 5 +++ gcc/doc/tm.texi.in| 2 + gcc/output.h | 2 + gcc/target.def| 8 gcc/testsuite/g++.dg/pr35513-1.C | 25 +++ gcc/testsuite/g++.dg/pr35513-2.C | 53 +++ gcc/testsuite/gcc.target/i386/pr35513
[PATCH v2 2/2] Add TARGET_ASM_EMIT_GNU_PROPERTY_NOTE
Generate the marker for -fno-direct-extern-access to indicate that the object file uses GOT to access all external symbols. Access to protected symbols in the resulting shared library is treated as local, which requires canonical function pointers and cannot be used with copy relocation. * configure.ac (HAVE_LD_INDIRECT_EXTERN_ACCESS_SUPPORT): New. Define to 1 if linker supports -z indirect-extern-access. * output.h (emit_gnu_property): New. (emit_gnu_property_note): Likewise. * target.def (emit_gnu_property_note): Add a argetm.asm_out hook. * toplev.c (compile_file): Call emit_gnu_property_note before file_end. * varasm.c (emit_gnu_property): New. (emit_gnu_property_note): Likewise. * config.in: Regenerated. * configure: Likewise. * doc/tm.texi: Likewise. * config/i386/gnu-property.c (emit_gnu_property): Removed. (TARGET_ASM_EMIT_GNU_PROPERTY_NOTE): New. * doc/tm.texi.in: Add TARGET_ASM_EMIT_GNU_PROPERTY_NOTE. --- gcc/config.in | 6 + gcc/config/i386/gnu-property.c | 31 -- gcc/config/i386/i386.c | 2 ++ gcc/configure | 24 + gcc/configure.ac | 20 +++ gcc/doc/tm.texi| 5 gcc/doc/tm.texi.in | 2 ++ gcc/output.h | 2 ++ gcc/target.def | 8 ++ gcc/toplev.c | 3 +++ gcc/varasm.c | 47 ++ 11 files changed, 119 insertions(+), 31 deletions(-) diff --git a/gcc/config.in b/gcc/config.in index 18e627141cc..7a3e003f8ac 100644 --- a/gcc/config.in +++ b/gcc/config.in @@ -1640,6 +1640,12 @@ #endif +/* Define to 1 if your linker supports -z indirect-extern-access */ +#ifndef USED_FOR_TARGET +#undef HAVE_LD_INDIRECT_EXTERN_ACCESS_SUPPORT +#endif + + /* Define if your PowerPC64 linker supports a large TOC. */ #ifndef USED_FOR_TARGET #undef HAVE_LD_LARGE_TOC diff --git a/gcc/config/i386/gnu-property.c b/gcc/config/i386/gnu-property.c index 4ba04403002..9fe8d00132e 100644 --- a/gcc/config/i386/gnu-property.c +++ b/gcc/config/i386/gnu-property.c @@ -24,37 +24,6 @@ along with GCC; see the file COPYING3. If not see #include "output.h" #include "linux-common.h" -static void -emit_gnu_property (unsigned int type, unsigned int data) -{ - int p2align = ptr_mode == SImode ? 2 : 3; - - switch_to_section (get_section (".note.gnu.property", - SECTION_NOTYPE, NULL)); - - ASM_OUTPUT_ALIGN (asm_out_file, p2align); - /* name length. */ - fprintf (asm_out_file, ASM_LONG "1f - 0f\n"); - /* data length. */ - fprintf (asm_out_file, ASM_LONG "4f - 1f\n"); - /* note type: NT_GNU_PROPERTY_TYPE_0. */ - fprintf (asm_out_file, ASM_LONG "5\n"); - fprintf (asm_out_file, "0:\n"); - /* vendor name: "GNU". */ - fprintf (asm_out_file, STRING_ASM_OP "\"GNU\"\n"); - fprintf (asm_out_file, "1:\n"); - ASM_OUTPUT_ALIGN (asm_out_file, p2align); - /* pr_type. */ - fprintf (asm_out_file, ASM_LONG "0x%x\n", type); - /* pr_datasz. */ - fprintf (asm_out_file, ASM_LONG "3f - 2f\n"); - fprintf (asm_out_file, "2:\n"); - fprintf (asm_out_file, ASM_LONG "0x%x\n", data); - fprintf (asm_out_file, "3:\n"); - ASM_OUTPUT_ALIGN (asm_out_file, p2align); - fprintf (asm_out_file, "4:\n"); -} - void file_end_indicate_exec_stack_and_gnu_property (void) { diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 4ab85cc4fe0..8e86781299e 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -24042,6 +24042,8 @@ ix86_run_selftests (void) #if !TARGET_MACHO && !TARGET_DLLIMPORT_DECL_ATTRIBUTES # undef TARGET_ASM_RELOC_RW_MASK # define TARGET_ASM_RELOC_RW_MASK ix86_reloc_rw_mask +# undef TARGET_ASM_EMIT_GNU_PROPERTY_NOTE +# define TARGET_ASM_EMIT_GNU_PROPERTY_NOTE emit_gnu_property_note #endif static bool ix86_libc_has_fast_function (int fcode ATTRIBUTE_UNUSED) diff --git a/gcc/configure b/gcc/configure index f0b2ebde3cf..1b92dd803ca 100755 --- a/gcc/configure +++ b/gcc/configure @@ -32177,6 +32177,30 @@ fi { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ld_bndplt_support" >&5 $as_echo "$ld_bndplt_support" >&6; } +# Check linker supports '-z indirect-extern-access' +ld_indirect_extern_access=0 +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking linker -z indirect-extern-access option" >&5 +$as_echo_n "checking linker -z indirect-extern-access option... " >&6; } +if test x"$ld_is_gold" = xno; then + if test $in_tree_ld = yes ; then +if test "$gcc_cv_gld_major_version" -eq 2 -a "$gcc_cv_gld_minor_version" -ge 37 -o "$gcc_cv_gld_major_version" -gt 2; then + ld_indirect_extern_access=1 +fi + elif test x$gcc_cv_ld != x; then +# Check if linker supports -z indirect-extern-access option +if $gcc_cv_ld --help 2>&1 | grep -- '-z indirect-extern-access' > /dev/null; then + ld_indirect
Re: [[PATCH V9] 1/7] dwarf: add a dwarf2int.h internal interface
Hi Jason. > On 5/31/21 12:57 PM, Jose E. Marchesi via Gcc-patches wrote: >> This patch introduces a dwarf2int.h header, to be used by code that >> needs access to the internal DIE structures and their attributes. > > Why not put these bits in dwarf2out.h? We think that it makes sense to have a separated interface file for the implementation of DWARF-based debug formats. It is called `internal' because it provides access to internal data structures as well as the basic accessor functions to the internals of the DWARF DIEs. That said, if you think this is a blocker, we can put these definitions in dwarf2out.h. But we think dwarf2out.c is in much need of some refactoring, and this could be a first step in that direction :)
Re: [RFC] Return NULL from gimple_call_return_type if no return available.
On 6/24/21 9:45 AM, Jakub Jelinek wrote: On Thu, Jun 24, 2021 at 09:31:13AM -0400, Andrew MacLeod via Gcc-patches wrote: We'll still compute values for statements that don't have a LHS.. there's nothing inherently wrong with that. The primary example is if (x_2 < y_3) we will compute [0,0] [1,1] or [0,1] for that statement, without a LHS. It primarily becomes a generic way to ask for the range of each of the operands of the statement, and process it regardless of the presence of a LHS. I don't know, maybe there is (or will be) an internal function that doesn't have a LHS but which can be folded away/rewritten if the operands are certain values. There are many internal functions that aren't ECF_CONST or ECF_PURE. Some of them, like IFN*STORE* I think never have an lhs, others have them, but if the lhs is unused, various optimization passes can just remove those lhs from the internal fn calls (if they'd be ECF_CONST or ECF_PURE, the calls would be DCEd). I think generally, if a call doesn't have lhs, there is no point in computing a value range for that missing lhs. It won't be useful for the call arguments to lhs direction (nothing would care about that value) and it won't be useful on the direction from the lhs to the call arguments either. Say if one has p_23 = __builtin_memcpy (p_75, q_23, 16); then one can imply from ~[0, 0] range on p_75 that p_23 has that range too (and vice versa), but if one has __builtin_memcpy (p_125, q_23, 16); none of that makes sense. So instead of punting when gimple_call_return_type returns NULL IMHO the code should punt when gimple_call_lhs is NULL. Well, we are going to punt anyway, because the call type, whether it is NULL or VOIDmode is not supported by irange. It was more just a matter of figuring out whether us checking for internal call or the gimple_function_return_type call should do the check... Ultimately in the end it doesnt matter.. just seemed like something someone else could trip across if we didnt strengthen gimple_call_return_type to not ice. Andrew
Re: [PATCH 04/11 v3] libstdc++: Make use of __builtin_bit_cast
For -ffast-math there was a missing using namespace __proposed left. The attached patch resolves the issue. From: Matthias Kretz The __bit_cast function was a hack to achieve what __builtin_bit_cast can do, therefore use __builtin_bit_cast if possible. However, __builtin_bit_cast cannot be used to cast from/to fixed_size_simd, since it isn't trivially copyable (in the language sense — in principle it is). Therefore add __proposed::simd_bit_cast to enable the use case required in the test framework. Signed-off-by: Matthias Kretz libstdc++-v3/ChangeLog: * include/experimental/bits/simd.h (__bit_cast): Implement via __builtin_bit_cast #if available. (__proposed::simd_bit_cast): Add overloads for simd and simd_mask, which use __builtin_bit_cast (or __bit_cast #if not available), which return an object of the requested type with the same bits as the argument. * include/experimental/bits/simd_math.h: Use simd_bit_cast instead of __bit_cast to allow casts to fixed_size_simd. (copysign): Remove branch that was only required if __bit_cast cannot be constexpr. * testsuite/experimental/simd/tests/bits/test_values.h: Switch from __bit_cast to __proposed::simd_bit_cast since the former will not cast fixed_size objects anymore. --- libstdc++-v3/include/experimental/bits/simd.h | 57 ++- .../include/experimental/bits/simd_math.h | 37 ++-- .../simd/tests/bits/test_values.h | 8 +-- 3 files changed, 76 insertions(+), 26 deletions(-) -- ── Dr. Matthias Kretz https://mattkretz.github.io GSI Helmholtz Centre for Heavy Ion Research https://gsi.de std::experimental::simd https://github.com/VcDevel/std-simd ── diff --git a/libstdc++-v3/include/experimental/bits/simd.h b/libstdc++-v3/include/experimental/bits/simd.h index 163f1b574e2..852d0b62012 100644 --- a/libstdc++-v3/include/experimental/bits/simd.h +++ b/libstdc++-v3/include/experimental/bits/simd.h @@ -1598,7 +1598,9 @@ template _GLIBCXX_SIMD_INTRINSIC constexpr _To __bit_cast(const _From __x) { -// TODO: implement with / replace by __builtin_bit_cast ASAP +#if __has_builtin(__builtin_bit_cast) +return __builtin_bit_cast(_To, __x); +#else static_assert(sizeof(_To) == sizeof(_From)); constexpr bool __to_is_vectorizable = is_arithmetic_v<_To> || is_enum_v<_To>; @@ -1629,6 +1631,7 @@ template reinterpret_cast(&__x), sizeof(_To)); return __r; } +#endif } // }}} @@ -2900,6 +2903,58 @@ template (__x)}; } + +template + _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR + _To + simd_bit_cast(const simd<_Up, _Abi>& __x) + { +using _Tp = typename _To::value_type; +using _ToMember = typename _SimdTraits<_Tp, typename _To::abi_type>::_SimdMember; +using _From = simd<_Up, _Abi>; +using _FromMember = typename _SimdTraits<_Up, _Abi>::_SimdMember; +// with concepts, the following should be constraints +static_assert(sizeof(_To) == sizeof(_From)); +static_assert(is_trivially_copyable_v<_Tp> && is_trivially_copyable_v<_Up>); +static_assert(is_trivially_copyable_v<_ToMember> && is_trivially_copyable_v<_FromMember>); +#if __has_builtin(__builtin_bit_cast) +return {__private_init, __builtin_bit_cast(_ToMember, __data(__x))}; +#else +return {__private_init, __bit_cast<_ToMember>(__data(__x))}; +#endif + } + +template + _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR + _To + simd_bit_cast(const simd_mask<_Up, _Abi>& __x) + { +using _From = simd_mask<_Up, _Abi>; +static_assert(sizeof(_To) == sizeof(_From)); +static_assert(is_trivially_copyable_v<_From>); +// _To can be simd, specifically simd> in which case _To is not trivially +// copyable. +if constexpr (is_simd_v<_To>) + { + using _Tp = typename _To::value_type; + using _ToMember = typename _SimdTraits<_Tp, typename _To::abi_type>::_SimdMember; + static_assert(is_trivially_copyable_v<_ToMember>); +#if __has_builtin(__builtin_bit_cast) + return {__private_init, __builtin_bit_cast(_ToMember, __x)}; +#else + return {__private_init, __bit_cast<_ToMember>(__x)}; +#endif + } +else + { + static_assert(is_trivially_copyable_v<_To>); +#if __has_builtin(__builtin_bit_cast) + return __builtin_bit_cast(_To, __x); +#else + return __bit_cast<_To>(__x); +#endif + } + } } // namespace __proposed // simd_cast {{{2 diff --git a/libstdc++-v3/include/experimental/bits/simd_math.h b/libstdc++-v3/include/experimental/bits/simd_math.h index d954e761eee..ef2bdc641b8 100644 --- a/libstdc++-v3/include/experimental/bits/simd_math.h +++ b/libstdc++-v3/include/experimental/bits/simd_math.h @@ -405,10 +405,11 @@ template using _Vp = simd<_Tp, _Abi>; usin
Re: GCC documentation: porting to Sphinx
On 6/23/21 6:00 PM, Joseph Myers wrote: On Wed, 23 Jun 2021, Martin Liška wrote: @Joseph: Can you share your thoughts about the used Makefile integration? What do you suggest for 2) (note that explicit listing of all .rst file would be crazy)? You can write dependencies on e.g. doc/gcc/*.rst (which might be more files than actually are relevant in some cases, if the directory includes some common files shared by some but not all manuals, but should be conservatively safe if you list appropriate directories there), rather than needing to name all the individual files. Doing things with makefile dependencies seems better than relying on what sphinx-build does when rerun unnecessarily (if sphinx-build avoids rebuilding in some cases where the makefiles think a rebuild is needed, that's fine as an optimization). All right. I've just done that and it was easier than I expected. Now the dependencies are properly followed. It looks like this makefile integration loses some of the srcinfo / srcman support. That support should stay (be updated for the use of Sphinx) so that release tarballs (as generated by maintainer-scripts/gcc_release, which uses --enable-generated-files-in-srcdir) continue to include man pages / info files (and make sure that, if those files are present in the source directory, then building and installing GCC does install them even when sphinx-build is absent at build/install time). Oh, and I've just recovered this one as well. Pushed changes to the me/sphinx-v2 branch and I'm waiting for more feedback. In the meantime, I'm going to prepare further integration of other manuals and targets (PDF, HTML). Martin
Re: [PATCH 04/11 v3] libstdc++: Make use of __builtin_bit_cast
On Thu, Jun 24, 2021 at 04:01:34PM +0200, Matthias Kretz wrote: > --- a/libstdc++-v3/include/experimental/bits/simd.h > +++ b/libstdc++-v3/include/experimental/bits/simd.h > @@ -1598,7 +1598,9 @@ template >_GLIBCXX_SIMD_INTRINSIC constexpr _To >__bit_cast(const _From __x) >{ > -// TODO: implement with / replace by __builtin_bit_cast ASAP > +#if __has_builtin(__builtin_bit_cast) Shouldn't that use #if _GLIBCXX_HAS_BUILTIN(__builtin_bit_cast) in c++config to define a new macro and use that macro here? Though it is true that c++config already uses #if __has_builtin(__builtin_is_constant_evaluated) and so would fail miserably for compilers that don't support __has_builtin Jakub
Re: [PATCH 04/11 v3] libstdc++: Make use of __builtin_bit_cast
On Thu, 24 Jun 2021 at 15:08, Jakub Jelinek wrote: > > On Thu, Jun 24, 2021 at 04:01:34PM +0200, Matthias Kretz wrote: > > --- a/libstdc++-v3/include/experimental/bits/simd.h > > +++ b/libstdc++-v3/include/experimental/bits/simd.h > > @@ -1598,7 +1598,9 @@ template > >_GLIBCXX_SIMD_INTRINSIC constexpr _To > >__bit_cast(const _From __x) > >{ > > -// TODO: implement with / replace by __builtin_bit_cast ASAP > > +#if __has_builtin(__builtin_bit_cast) > > Shouldn't that use #if _GLIBCXX_HAS_BUILTIN(__builtin_bit_cast) in > c++config to define a new macro and use that macro here? > Though it is true that c++config already uses > #if __has_builtin(__builtin_is_constant_evaluated) > and so would fail miserably for compilers that don't support __has_builtin GCC was the last of our supported compilers to implement __has_builtin, so for GCC trunk we can assume that it's always supported. The code in c++config.h still has some value for built-ins that aren't called __builtin_xxx because older versions of Clang need different handling for those. But for __builtin_bit_cast and __builtin_is_constant_evaluted we can just use __is_builtin directly.
Re: [PATCH 04/11 v3] libstdc++: Make use of __builtin_bit_cast
On Thu, 24 Jun 2021 at 15:11, Jonathan Wakely wrote: > > On Thu, 24 Jun 2021 at 15:08, Jakub Jelinek wrote: > > > > On Thu, Jun 24, 2021 at 04:01:34PM +0200, Matthias Kretz wrote: > > > --- a/libstdc++-v3/include/experimental/bits/simd.h > > > +++ b/libstdc++-v3/include/experimental/bits/simd.h > > > @@ -1598,7 +1598,9 @@ template > > >_GLIBCXX_SIMD_INTRINSIC constexpr _To > > >__bit_cast(const _From __x) > > >{ > > > -// TODO: implement with / replace by __builtin_bit_cast ASAP > > > +#if __has_builtin(__builtin_bit_cast) > > > > Shouldn't that use #if _GLIBCXX_HAS_BUILTIN(__builtin_bit_cast) in > > c++config to define a new macro and use that macro here? > > Though it is true that c++config already uses > > #if __has_builtin(__builtin_is_constant_evaluated) > > and so would fail miserably for compilers that don't support __has_builtin > > GCC was the last of our supported compilers to implement > __has_builtin, so for GCC trunk we can assume that it's always > supported. > > The code in c++config.h still has some value for built-ins that aren't > called __builtin_xxx because older versions of Clang need different > handling for those. But for __builtin_bit_cast and > __builtin_is_constant_evaluted we can just use __is_builtin directly. s/__is_builtin/__has_builtin/
Re: [PATCH 04/11 v3] libstdc++: Make use of __builtin_bit_cast
On Thu, Jun 24, 2021 at 03:11:01PM +0100, Jonathan Wakely wrote: > On Thu, 24 Jun 2021 at 15:08, Jakub Jelinek wrote: > > > > On Thu, Jun 24, 2021 at 04:01:34PM +0200, Matthias Kretz wrote: > > > --- a/libstdc++-v3/include/experimental/bits/simd.h > > > +++ b/libstdc++-v3/include/experimental/bits/simd.h > > > @@ -1598,7 +1598,9 @@ template > > >_GLIBCXX_SIMD_INTRINSIC constexpr _To > > >__bit_cast(const _From __x) > > >{ > > > -// TODO: implement with / replace by __builtin_bit_cast ASAP > > > +#if __has_builtin(__builtin_bit_cast) > > > > Shouldn't that use #if _GLIBCXX_HAS_BUILTIN(__builtin_bit_cast) in > > c++config to define a new macro and use that macro here? > > Though it is true that c++config already uses > > #if __has_builtin(__builtin_is_constant_evaluated) > > and so would fail miserably for compilers that don't support __has_builtin > > GCC was the last of our supported compilers to implement > __has_builtin, so for GCC trunk we can assume that it's always > supported. We don't support mixing GCC and libstdc++ versions, so I'm not worried about GCC. At least according to godbolt, already clang 3.0 supports it which is 10 years old, so probably fine too, but ICC 19.0/19.1 still doesn't support it, only ICC 2021 does. And ICC 19.1 seems to be released in October 2020. So, wouldn't it be better not to #undef _GLIBCXX_HAS_BUILTIN, move its definition a little bit earlier and use it also for __builtin_is_constant_evaluated? Jakub
Re: [PATCH][RFC] Add x86 subadd SLP pattern
On Thu, Jun 24, 2021 at 1:07 PM Richard Biener wrote: > This addds SLP pattern recognition for the SSE3/AVX [v]addsubp{ds} v0, v1 > instructions which compute { v0[0] - v1[0], v0[1], + v1[1], ... } > thus subtract, add alternating on lanes, starting with subtract. > > It adds a corresponding optab and direct internal function, > vec_addsub$a3 and renames the existing i386 backend patterns to > the new canonical name. > > The SLP pattern matches the exact alternating lane sequence rather > than trying to be clever and anticipating incoming permutes - we > could permute the two input vectors to the needed lane alternation, > do the addsub and then permute the result vector back but that's > only profitable in case the two input or the output permute will > vanish - something Tamars refactoring of SLP pattern recog should > make possible. Using the attached patch, I was also able to generate addsub for the following testcase: float x[2], y[2], z[2]; void foo () { x[0] = y[0] - z[0]; x[1] = y[1] + z[1]; } vmovq y(%rip), %xmm0 vmovq z(%rip), %xmm1 vaddsubps %xmm1, %xmm0, %xmm0 vmovlps %xmm0, x(%rip) ret Uros. diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md index e887f03474d..5f10572718d 100644 --- a/gcc/config/i386/mmx.md +++ b/gcc/config/i386/mmx.md @@ -788,6 +788,24 @@ (define_insn "*mmx_haddsubv2sf3" (set_attr "prefix_extra" "1") (set_attr "mode" "V2SF")]) +(define_insn "vec_addsubv2sf3" + [(set (match_operand:V2SF 0 "register_operand" "=x,x") + (vec_merge:V2SF + (minus:V2SF + (match_operand:V2SF 1 "register_operand" "0,x") + (match_operand:V2SF 2 "register_operand" "x,x")) + (plus:V2SF (match_dup 1) (match_dup 2)) + (const_int 1)))] + "TARGET_SSE3 && TARGET_MMX_WITH_SSE" + "@ + addsubps\t{%2, %0|%0, %2} + vaddsubps\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "isa" "noavx,avx") + (set_attr "type" "sseadd") + (set_attr "prefix" "orig,vex") + (set_attr "prefix_rep" "1,*") + (set_attr "mode" "V4SF")]) + ; ;; ;; Parallel single-precision floating point comparisons
Re: [PATCH] add -ltrans-objects lto-plugin debug option
On 6/22/2021 12:49 AM, Richard Biener wrote: This adds a -ltrans-objects option to lto-plugin that by-passes lto-wrapper invocation and instead feeds LD the final LTRANS objects directly from the response file given as argument to the option. This allows LD issues involving the linker-plugin path to be debugged in an easier way with just the IR objects (their symtab) and the LTRANS objects as testcase. I've tested the path re-building stage2 build/genmatch from an LTO bootstrap and got a bit-identical executable by adding -plugin-opt=-ltrans-objects=y to the original collect2 invocation, seeding y with the final objects as printed by building genmatch with -save-temps -v. Bootstrapped and tested on x86_64-unknown-linux-gnu. OK? 2021-06-22 Richard Biener lto-plugin/ * lto-plugin.c (ltrans_objects): New global. (all_symbols_read_handler): If -ltrans-objects was specified, add the output files from the specified file directly. (process_option): Handle -ltrans-objects. OK jeff
Re: [PATCH 04/11 v3] libstdc++: Make use of __builtin_bit_cast
On Thu, 24 Jun 2021 at 15:21, Jakub Jelinek wrote: > > On Thu, Jun 24, 2021 at 03:11:01PM +0100, Jonathan Wakely wrote: > > On Thu, 24 Jun 2021 at 15:08, Jakub Jelinek wrote: > > > > > > On Thu, Jun 24, 2021 at 04:01:34PM +0200, Matthias Kretz wrote: > > > > --- a/libstdc++-v3/include/experimental/bits/simd.h > > > > +++ b/libstdc++-v3/include/experimental/bits/simd.h > > > > @@ -1598,7 +1598,9 @@ template > > > >_GLIBCXX_SIMD_INTRINSIC constexpr _To > > > >__bit_cast(const _From __x) > > > >{ > > > > -// TODO: implement with / replace by __builtin_bit_cast ASAP > > > > +#if __has_builtin(__builtin_bit_cast) > > > > > > Shouldn't that use #if _GLIBCXX_HAS_BUILTIN(__builtin_bit_cast) in > > > c++config to define a new macro and use that macro here? > > > Though it is true that c++config already uses > > > #if __has_builtin(__builtin_is_constant_evaluated) > > > and so would fail miserably for compilers that don't support __has_builtin > > > > GCC was the last of our supported compilers to implement > > __has_builtin, so for GCC trunk we can assume that it's always > > supported. > > We don't support mixing GCC and libstdc++ versions, so I'm not worried > about GCC. At least according to godbolt, already clang 3.0 supports it > which is 10 years old, so probably fine too, but ICC 19.0/19.1 still doesn't > support it, only ICC 2021 does. And ICC 19.1 seems to be released in > October 2020. > > So, wouldn't it be better not to #undef _GLIBCXX_HAS_BUILTIN, move its > definition a little bit earlier and use it also for > __builtin_is_constant_evaluated? I discussed this with Judy Ward on the Intel compiler team. If you're using their compiler, you should be using the latest version. They also claim 100% compatibility with GCC, for versions they've been able to test. So if you are using libstdc++ headers from a GCC release that supports __has_builtin, then you need to use a release of the Intel compiler that supports __has_builtin. Otherwise, it's unsupported. So in GCC 12 C++ headers we support GCC 12, versions of Intel compatible with GCC 12, and the last few releases of Clang. All of those have __has_builtin. Rather than use the _GLIBCXX_HAS_BUILTIN macro more widely, I'd prefer to not use it where it isn't needed, as in the attached (untested) patch. diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h index 9911d4deb72..3c07590 100644 --- a/libstdc++-v3/include/bits/basic_string.h +++ b/libstdc++-v3/include/bits/basic_string.h @@ -55,7 +55,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #ifdef __cpp_lib_is_constant_evaluated // Support P1032R1 in C++20 (but not P0980R1 yet). # define __cpp_lib_constexpr_string 201811L -#elif __cplusplus >= 201703L && _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED +#elif __cplusplus >= 201703L && __has_builtin(__builtin_is_constant_evaluated) // Support P0426R1 changes to char_traits in C++17. # define __cpp_lib_constexpr_string 201611L #elif __cplusplus > 201703L diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config index 9314117aed8..3ec668b65cf 100644 --- a/libstdc++-v3/include/bits/c++config +++ b/libstdc++-v3/include/bits/c++config @@ -720,13 +720,11 @@ namespace std # define _GLIBCXX_DOUBLE_IS_IEEE_BINARY64 1 #endif -#ifdef __has_builtin -# ifdef __is_identifier +#ifdef __is_identifier // Intel and older Clang require !__is_identifier for some built-ins: -# define _GLIBCXX_HAS_BUILTIN(B) __has_builtin(B) || ! __is_identifier(B) -# else -# define _GLIBCXX_HAS_BUILTIN(B) __has_builtin(B) -# endif +# define _GLIBCXX_HAS_BUILTIN(B) __has_builtin(B) || ! __is_identifier(B) +#else +# define _GLIBCXX_HAS_BUILTIN(B) __has_builtin(B) #endif #if _GLIBCXX_HAS_BUILTIN(__has_unique_object_representations) @@ -737,18 +735,10 @@ namespace std # define _GLIBCXX_HAVE_BUILTIN_IS_AGGREGATE 1 #endif -#if _GLIBCXX_HAS_BUILTIN(__builtin_is_constant_evaluated) -# define _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED 1 -#endif - #if _GLIBCXX_HAS_BUILTIN(__is_same) # define _GLIBCXX_HAVE_BUILTIN_IS_SAME 1 #endif -#if _GLIBCXX_HAS_BUILTIN(__builtin_launder) -# define _GLIBCXX_HAVE_BUILTIN_LAUNDER 1 -#endif - #undef _GLIBCXX_HAS_BUILTIN diff --git a/libstdc++-v3/include/bits/char_traits.h b/libstdc++-v3/include/bits/char_traits.h index 3da6e28a513..77ad7be5dfb 100644 --- a/libstdc++-v3/include/bits/char_traits.h +++ b/libstdc++-v3/include/bits/char_traits.h @@ -238,7 +238,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #ifdef __cpp_lib_is_constant_evaluated // Unofficial macro indicating P1032R1 support in C++20 # define __cpp_lib_constexpr_char_traits 201811L -#elif __cplusplus >= 201703L && _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED +#elif __cplusplus >= 201703L && __has_builtin(__builtin_is_constant_evaluated) // Unofficial macro indicating P0426R1 support in C++17 # define __cpp_lib_constexpr_char_traits 201611L #endif @@ -295,7 +295,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
Re: [PATCH 04/11 v3] libstdc++: Make use of __builtin_bit_cast
On Thu, 24 Jun 2021 at 15:34, Jonathan Wakely wrote: > Rather than use the _GLIBCXX_HAS_BUILTIN macro more widely, I'd prefer > to not use it where it isn't needed, as in the attached (untested) > patch. My rationale for this is that I'd prefer to use standardized features like __has_include and __has_cpp_attribute where possible, instead of adding more and more configure macros. You don't need to look in c++config.h to see how the macro is defined if you just use a standard feature directly. __has_builtin obviously isn't standardized, but as long as it's available on all the compilers we care about (which it is) then the same rationale applies.
Re: [[PATCH V9] 1/7] dwarf: add a dwarf2int.h internal interface
On 6/24/21 9:52 AM, Jose E. Marchesi wrote: On 5/31/21 12:57 PM, Jose E. Marchesi via Gcc-patches wrote: This patch introduces a dwarf2int.h header, to be used by code that needs access to the internal DIE structures and their attributes. Why not put these bits in dwarf2out.h? We think that it makes sense to have a separated interface file for the implementation of DWARF-based debug formats. It is called `internal' because it provides access to internal data structures as well as the basic accessor functions to the internals of the DWARF DIEs. Yes, but "internal data structures" also describes most of the current dwarf2out.h. I'm not opposed to refactoring the header, but splitting off a dwarf2cfi.h (for print-rtl.c and final.c) seems like a better dividing line. Jason
Re: [PATCH] c++: requires-expression folding [PR101182]
On 6/23/21 7:37 PM, Patrick Palka wrote: Here we're crashing because cp_fold_function walks into the (templated) requirements of a requires-expression outside a template, but the folding routines aren't prepared to handle templated trees. This patch fixes this by making cp_fold use evaluate_requires_expr to fold a requires-expression as a whole, which also means we no longer need to explicitly do so during gimplification. (We don't immediately fold non-dependent requires-expressions at parse time for sake of better diagnostics in case one appears as the condition of a failed static_assert.) Bootstrapped and regtested on x86_64-pc-linux-gnu, also tested on cmcstl2 and range-v3, does this look OK for trunk/11? OK. PR c++/101182 gcc/cp/ChangeLog: * constraint.cc (evaluate_requires_expr): Adjust function comment. * cp-gimplify.c (cp_genericize_r) : Move to ... (cp_fold) : ... here. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/concepts-requires27.C: New test. --- gcc/cp/constraint.cc | 2 +- gcc/cp/cp-gimplify.c | 10 -- gcc/testsuite/g++.dg/cpp2a/concepts-requires27.C | 10 ++ 3 files changed, 15 insertions(+), 7 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-requires27.C diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc index 74b16d27101..286332ba2a7 100644 --- a/gcc/cp/constraint.cc +++ b/gcc/cp/constraint.cc @@ -3340,7 +3340,7 @@ evaluate_concept_check (tree check) } /* Evaluate the requires-expression T, returning either boolean_true_node - or boolean_false_node. This is used during gimplification and constexpr + or boolean_false_node. This is used during folding and constexpr evaluation. */ tree diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c index 96d91b6ef2f..33e10556e32 100644 --- a/gcc/cp/cp-gimplify.c +++ b/gcc/cp/cp-gimplify.c @@ -1465,12 +1465,6 @@ cp_genericize_r (tree *stmt_p, int *walk_subtrees, void *data) TARGET_EXPR_NO_ELIDE (stmt) = 1; break; -case REQUIRES_EXPR: - /* Emit the value of the requires-expression. */ - *stmt_p = evaluate_requires_expr (stmt); - *walk_subtrees = 0; - break; - case TEMPLATE_ID_EXPR: gcc_assert (concept_check_p (stmt)); /* Emit the value of the concept check. */ @@ -2708,6 +2702,10 @@ cp_fold (tree x) x = r; break; +case REQUIRES_EXPR: + x = evaluate_requires_expr (x); + break; + default: return org_x; } diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-requires27.C b/gcc/testsuite/g++.dg/cpp2a/concepts-requires27.C new file mode 100644 index 000..87a759a689a --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp2a/concepts-requires27.C @@ -0,0 +1,10 @@ +// PR c++/101182 +// { dg-do compile { target concepts } } + +int a; +int g(bool); + +int f() { + g(requires { a++; }); + return requires { a++; }; +}
Re: [PATCH 04/11 v3] libstdc++: Make use of __builtin_bit_cast
On Thu, Jun 24, 2021 at 03:40:09PM +0100, Jonathan Wakely wrote: > On Thu, 24 Jun 2021 at 15:34, Jonathan Wakely wrote: > > Rather than use the _GLIBCXX_HAS_BUILTIN macro more widely, I'd prefer > > to not use it where it isn't needed, as in the attached (untested) > > patch. > > My rationale for this is that I'd prefer to use standardized features > like __has_include and __has_cpp_attribute where possible, instead of > adding more and more configure macros. You don't need to look in > c++config.h to see how the macro is defined if you just use a standard > feature directly. > > __has_builtin obviously isn't standardized, but as long as it's > available on all the compilers we care about (which it is) then the > same rationale applies. Okay. Jakub
MSPs, MSSPs Records
Hi, Hope you find this email well! Are you looking for a customer base of your competitors installed base which will help you reach niche target and also helps you to grab new clients for your latest service and products? We also have an exclusive database of: 1. Cloud Service Providers- CSPs 2. Independent Software Vendors- ISVs 3. System Integrators- SIs and more 4. Managed Service Providers- MSPs 5. Managed Security Service Providers- MSSPs and more Data can be customized based on your requirement (e.g. Job title, Verticals, Geography etc.) Please feel free to get back to me with your target criteria, if your criteria are different from the above-mentioned applications let me know we have close to 5000+ technology installations. Best Regards, Jennifer Daly Sr. Lead Analyst Note: This email is not expected to be a spam. It would be ideal if you acknowledge our conciliatory sentiments and answer in the subject taking with leave off to be expelled from our Mailing list. Why not try it/us out?
Re: [PATCH] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction
On 6/24/21 9:25 AM, Andrew MacLeod wrote: On 6/24/21 8:29 AM, Richard Biener wrote: THe original function in EVRP currently looks like: === BB 2 : if (a_5(D) == b_6(D)) goto ; [INV] else goto ; [INV] === BB 8 Equivalence set : [a_5(D), b_6(D)] edge 2->8 provides a_5 and b_6 as equivalences : goto ; [100.00%] === BB 6 : # i_1 = PHI <0(8), i_10(5)> if (i_1 < a_5(D)) goto ; [INV] else goto ; [INV] === BB 3 Relational : (i_1 < a_5(D)) edge 6->3 provides this relation : if (i_1 == b_6(D)) goto ; [INV] else goto ; [INV] So It knows that a_5 and b_6 are equivalence, and it knows that i_1 < a_5 in BB3 as well.. so we should be able to indicate that i_1 == b_6 as [0,0].. we currently aren't. I think I had turned on equivalence mapping during relational processing, so should be able to tag that without transitive relations... I'll have a look at why. And once we get a bit further along, you will be able to access this without ranger.. if one wants to simply register the relations directly. Anyway, I'll get back to you why its currently being missed. Andrew As promised. There was a typo in the equivalency comparisons... so it was getting missed. With the fix, the oracle identifies the relation and evrp will now fold that expression away and the IL becomes: : if (a_5(D) == b_6(D)) goto ; [INV] else goto ; [INV] : i_10 = i_1 + 1; : # i_1 = PHI <0(2), i_10(3)> if (i_1 < a_5(D)) goto ; [INV] else goto ; [INV] : return; for the other cases you quote, there are no predictions such that if a != 0 then this equivalency exists... + if (a != 0) + { + c = b; + } but the oracle would register that in the TRUE block, c and b are equivalent... so some other pass that was interested in tracking conditions that make a block relevant would be able to compare relations... Andrew
Re: [PATCH 5/6] make get_domminated_by_region return a auto_vec
On 6/24/21 3:27 AM, Richard Biener wrote: On Thu, Jun 24, 2021 at 12:56 AM Martin Sebor wrote: On 6/23/21 1:43 AM, Richard Biener wrote: On Wed, Jun 23, 2021 at 7:23 AM Trevor Saunders wrote: On Tue, Jun 22, 2021 at 02:01:24PM -0600, Martin Sebor wrote: On 6/21/21 1:15 AM, Richard Biener wrote: [...] But maybe I'm misunderstanding C++ too much :/ Well, I guess b) from above means auto_vec<> passing to vec<> taking functions will need changes? Converting an auto_vec object to a vec slices off its data members. The auto_vec specialization has no data members so that's not a bug in and of itself, but auto_vec does have data members so that would be a bug. The risk is not just passing it to functions by value but also returning it. That risk was made worse by the addition of the move ctor. I would agree that the conversion from auto_vec<> to vec<> is questionable, and should get some work at some point, perhaps just passingauto_vec references is good enough, or perhaps there is value in some const_vec view to avoid having to rely on optimizations, I'm not sure without looking more at the usage. We do need to be able to provide APIs that work with both auto_vec<> and vec<>, I agree that those currently taking a vec<> by value are fragile (and we've had bugs there before), but I'm not ready to say that changing them all to [const] vec<>& is OK. The alternative would be passing a const_vec<> by value, passing that along to const vec<>& APIs should be valid then (I can see quite some API boundary cleanups being necessary here ...). But with all this I don't know how to adjust auto_vec<> to no longer "decay" to vec<> but still being able to pass it to vec<>& and being able to call vec<> member functions w/o jumping through hoops. Any hints on that? private inheritance achieves the first but also hides all the API ... The simplest way to do that is by preventing the implicit conversion between the two types and adding a to_vec() member function to auto_vec as Jason suggested. This exposes the implicit conversions to the base vec, forcing us to review each and "fix" it either by changing the argument to either vec& or const vec&, or less often to avoid the indirection if necessary, by converting the auto_vec to vec explicitly by calling to_vec(). The attached diff shows a very rough POC of what that might look like. A side benefit is that it improves const-safety and makes the effects of functions on their callers easier to understand. But that's orthogonal to making auto_vec copy constructible and copy assignable. That change can be made independently and with much less effort and no risk. There's a bunch of stuff I can't review because of lack of C++ knowledge (the vNULL changes). I suppose that + /* Prevent implicit conversion from auto_vec. Use auto_vec::to_vec() + instead. */ + template + vec (auto_vec &) = delete; + + template + void operator= (auto_vec &) = delete; still allows passing auto_vec<> to [const] vec<> & without the .to_vec so it's just the by-value passing that becomes explicit? If so then I agree this is an improvement. The patch is likely incomplete though or is it really only so few signatures that need changing? Yes, to both questions. I just wanted to show how we might go about making this transition. I converted a few files but many more remain. I stopped when changing a vec argument to const vec& started to cause errors due to missing const down the line (e.g., assigning the address of a vec element to an uqualified pointer). I'll need to follow where that pointer flows and see if it's used to modify the object or if it too could be made const. To me that seems worthwhile doing now but it makes progress slow. A separate question is whether the vec value arguments to functions that modify them should be changed to references or pointers. Both kinds of APIs exist but the latter seems prevalent. Changing them to the latter means more churn. Martin
Re: [PATCH 7/7] Port most of the A CMP 0 ? A : -A to match
On 6/19/2021 3:49 PM, apinski--- via Gcc-patches wrote: From: Andrew Pinski To improve phiopt and be able to remove abs_replacement, this ports most of "A CMP 0 ? A : -A" from fold_cond_expr_with_comparison to match.pd. There is a few extra changes that are needed to remove the "A CMP 0 ? A : -A" part from fold_cond_expr_with_comparison: * Need to handle (A - B) case * Need to handle UN* comparisons. I will handle those in a different patch. Note phi-opt-15.c test needed to be updated as we get ABSU now instead of not getting ABS. When ABSU was added phiopt was not updated even to use ABSU instead of not creating ABS. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * match.pd (A CMP 0 ? A : -A): New patterns. * tree-ssa-phiopt.c (abs_replacement): Delete function. (tree_ssa_phiopt_worker): Don't call abs_replacement. Update comment about abs_replacement. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/phi-opt-15.c: Update test to expect ABSU and still not expect ABS_EXPR. +(for cnd (cond vec_cond) + /* A == 0? A : -Asame as -A */ ^^ Missing whitespace. This occurs in the comment each pattern. OK with the nits fixed. jeff
Re: [PATCH 1/7] Expand the comparison argument of fold_cond_expr_with_comparison
On 6/23/2021 4:19 PM, apinski--- via Gcc-patches wrote: From: Andrew Pinski To make things slightly easiler to convert fold_cond_expr_with_comparison over to match.pd, expanding the arg0 argument into 3 different arguments is done. Also this was simple because we don't use arg0 after grabbing the code and the two operands. Also since we do this, we don't need to fold the comparison to get the inverse but just use invert_tree_comparison directly. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * fold-const.c (fold_cond_expr_with_comparison): Exand arg0 into comp_code, arg00, and arg01. (fold_ternary_loc): Use invert_tree_comparison instead of fold_invert_truthvalue for the case where we have A CMP B ? C : A. OK jeff
Re: [PATCH 2/7] Reset the range info on the moved instruction in PHIOPT
On 6/23/2021 4:19 PM, apinski--- via Gcc-patches wrote: From: Andrew Pinski I had missed this when wrote the patch which allowed the gimple to be moved from inside the conditional as it. It was also missed in the review. Anyways the range information needs to be reset for the moved gimple as it was under a conditional and the flow has changed to be unconditional. I have not seen any testcase in the wild that produces wrong code yet which is why there is no testcase but this is similar to what the other code in phiopt does so after moving those to match, there might be some. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * tree-ssa-phiopt.c (match_simplify_replacement): Reset flow senatitive info on the moved ssa set. OK jeff
Re: [[PATCH V9] 1/7] dwarf: add a dwarf2int.h internal interface
This patch introduces a dwarf2int.h header, to be used by code that needs access to the internal DIE structures and their attributes. >>> >>> Why not put these bits in dwarf2out.h? >> We think that it makes sense to have a separated interface file for >> the >> implementation of DWARF-based debug formats. It is called `internal' >> because it provides access to internal data structures as well as the >> basic accessor functions to the internals of the DWARF DIEs. > > Yes, but "internal data structures" also describes most of the current > dwarf2out.h. Yes right, dwarf2out.h contains a mixture of function prototypes and several data structures, many of them "internal". > I'm not opposed to refactoring the header, but splitting > off a dwarf2cfi.h (for print-rtl.c and final.c) seems like a better > dividing line. Ok, so what about this: at this point we remove dwarf2int.h from our patch, put the definitions in dwarf2out.h instead, and then once the stuff is upstream we can discuss on how better refactor the dwarf2out* stuff. Is that ok?
[PATCH] c++: alias CTAD and aggregate deduction cand [PR98832]
During alias CTAD, we're accidentally ignoring the aggregate deduction candidate of the underlying template because it's added to the candidate set separately via maybe_aggr_guide (which doesn't yet handle alias templates) rather than via deduction_guides_for (which does). This patch makes maybe_aggr_guide handle alias templates in a manner similar to deduction_guides_for. Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk? PR c++/98832 gcc/cp/ChangeLog: * pt.c (maybe_aggr_guide): Handle an alias template. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/class-deduction-alias9.C: New test. --- gcc/cp/pt.c | 11 +++ gcc/testsuite/g++.dg/cpp2a/class-deduction-alias9.C | 6 ++ 2 files changed, 17 insertions(+) create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-alias9.C diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index aeb1e8a6f97..db15e4714d5 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -28880,6 +28880,8 @@ is_spec_or_derived (tree etype, tree tmpl) return !err; } +static tree alias_ctad_tweaks (tree, tree); + /* Return a C++20 aggregate deduction candidate for TYPE initialized from INIT. */ @@ -28892,6 +28894,15 @@ maybe_aggr_guide (tree tmpl, tree init, vec *args) if (init == NULL_TREE) return NULL_TREE; + if (DECL_ALIAS_TEMPLATE_P (tmpl)) +{ + tree under = DECL_ORIGINAL_TYPE (DECL_TEMPLATE_RESULT (tmpl)); + tree tinfo = get_template_info (under); + if (tree guide = maybe_aggr_guide (TI_TEMPLATE (tinfo), init, args)) + return alias_ctad_tweaks (tmpl, guide); + return NULL_TREE; +} + /* We might be creating a guide for a class member template, e.g., template struct A { diff --git a/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias9.C b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias9.C new file mode 100644 index 000..0aaf203639a --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias9.C @@ -0,0 +1,6 @@ +// PR c++/98832 +// { dg-do compile { target c++20 } } + +template struct X{ U u; }; +template using Y = X; +Y y{0}; -- 2.32.0.93.g670b81a890
Re: [PATCH 3/7] Duplicate the range information of the phi onto the new ssa_name
On 6/23/2021 4:19 PM, apinski--- via Gcc-patches wrote: From: Andrew Pinski Since match_simplify_replacement uses gimple_simplify, there is a new ssa name created sometimes and then we go and replace the phi edge with this new ssa name, the range information on the phi is lost. Placing this in replace_phi_edge_with_variable is the best option instead of doing it in each time replace_phi_edge_with_variable is called which is what is done today. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * tree-ssa-phiopt.c (replace_phi_edge_with_variable): Duplicate range info if we're the only things setting the target PHI. (value_replacement): Don't duplicate range here. (minmax_replacement): Likewise. --- gcc/tree-ssa-phiopt.c | 35 ++- 1 file changed, 18 insertions(+), 17 deletions(-) diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c index 24cbce9955a..147397ad82c 100644 --- a/gcc/tree-ssa-phiopt.c +++ b/gcc/tree-ssa-phiopt.c @@ -391,6 +391,24 @@ replace_phi_edge_with_variable (basic_block cond_block, basic_block bb = gimple_bb (phi); basic_block block_to_remove; gimple_stmt_iterator gsi; + tree phi_result = PHI_RESULT (phi); + + /* Duplicate range info as needed. */ + if (TREE_CODE (new_tree) == SSA_NAME + && EDGE_COUNT (gimple_bb (phi)->preds) == 2 + && INTEGRAL_TYPE_P (TREE_TYPE (phi_result))) +{ + if (!SSA_NAME_RANGE_INFO (new_tree) + && SSA_NAME_RANGE_INFO (phi_result)) + duplicate_ssa_name_range_info (new_tree, + SSA_NAME_RANGE_TYPE (phi_result), + SSA_NAME_RANGE_INFO (phi_result)); + if (SSA_NAME_RANGE_INFO (new_tree) + && !SSA_NAME_RANGE_INFO (phi_result)) + duplicate_ssa_name_range_info (phi_result, + SSA_NAME_RANGE_TYPE (new_tree), + SSA_NAME_RANGE_INFO (new_tree)); +} I'm not sure you need the EDGE_COUNT test here. I worry that copying the range info from the argument to the PHI result isn't right. It was OK for the ABS replacement, but I'm not sure that's true in general. Jeff /* Change the PHI argument to new. */ SET_USE (PHI_ARG_DEF_PTR (phi, e->dest_idx), new_tree); @@ -1385,16 +1403,6 @@ value_replacement (basic_block cond_bb, basic_block middle_bb, : # u_3 = PHI */ reset_flow_sensitive_info (lhs); - if (INTEGRAL_TYPE_P (TREE_TYPE (lhs))) - { - /* If available, we can use VR of phi result at least. */ - tree phires = gimple_phi_result (phi); - struct range_info_def *phires_range_info - = SSA_NAME_RANGE_INFO (phires); - if (phires_range_info) - duplicate_ssa_name_range_info (lhs, SSA_NAME_RANGE_TYPE (phires), - phires_range_info); - } gimple_stmt_iterator gsi_from; for (int i = prep_cnt - 1; i >= 0; --i) { @@ -1794,13 +1802,6 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb, gimple_seq stmts = NULL; tree phi_result = PHI_RESULT (phi); result = gimple_build (&stmts, minmax, TREE_TYPE (phi_result), arg0, arg1); - /* Duplicate range info if we're the only things setting the target PHI. */ - if (!gimple_seq_empty_p (stmts) - && EDGE_COUNT (gimple_bb (phi)->preds) == 2 - && !POINTER_TYPE_P (TREE_TYPE (phi_result)) - && SSA_NAME_RANGE_INFO (phi_result)) -duplicate_ssa_name_range_info (result, SSA_NAME_RANGE_TYPE (phi_result), - SSA_NAME_RANGE_INFO (phi_result)); gsi = gsi_last_bb (cond_bb); gsi_insert_seq_before (&gsi, stmts, GSI_NEW_STMT);
Re: [PATCH 5/7] Try inverted comparison for match_simplify in phiopt
On 6/23/2021 4:19 PM, apinski--- via Gcc-patches wrote: From: Andrew Pinski Since match and simplify does not have all of the inverted comparison patterns, it make sense to just have phi-opt try to do the inversion and try match and simplify again. OK? Bootstrapped and tested on x86_64-linux-gnu. Thanks, Andrew Pinski gcc/ChangeLog: * tree-ssa-phiopt.c (gimple_simplify_phiopt): If "A ? B : C" fails to simplify, try "(!A) ? C : B". OK jeff
Re: [PATCH 7/7] Port most of the A CMP 0 ? A : -A to match
On 6/23/2021 4:19 PM, apinski--- via Gcc-patches wrote: From: Andrew Pinski To improve phiopt and be able to remove abs_replacement, this ports most of "A CMP 0 ? A : -A" from fold_cond_expr_with_comparison to match.pd. There is a few extra changes that are needed to remove the "A CMP 0 ? A : -A" part from fold_cond_expr_with_comparison: * Need to handle (A - B) case * Need to handle UN* comparisons. I will handle those in a different patch. Note phi-opt-15.c test needed to be updated as we get ABSU now instead of not getting ABS. When ABSU was added phiopt was not updated even to use ABSU instead of not creating ABS. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * match.pd (A CMP 0 ? A : -A): New patterns. * tree-ssa-phiopt.c (abs_replacement): Delete function. (tree_ssa_phiopt_worker): Don't call abs_replacement. Update comment about abs_replacement. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/phi-opt-15.c: Update test to expect ABSU and still not expect ABS_EXPR. And I've already ack'd this part :-) I think it's unchanged since he original. Jeff
Re: [PING][PATCH 9/13] v2 Use new per-location warning APIs in LTO
On 6/24/21 3:32 AM, Richard Biener wrote: On Mon, Jun 21, 2021 at 11:55 PM Martin Sebor via Gcc-patches wrote: Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571980.html Looking for a review of the LTO changes to switch TREE_NO_WARNING to the suppress_warning() API. Hmm, since the warning suppressions are on location ad-hoc data the appropriate thing is to adjust the location streaming and that part seems to be missing? So what you now stream is simply the "everything" fallback, correct? In particular: else -bp_pack_value (bp, TREE_NO_WARNING (expr), 1); +/* FIXME: pack all warning bits. */ +bp_pack_value (bp, warning_suppressed_p (expr), 1); this looks like a wrong comment in that light. Yes, this is just a fallback. I haven't thought about how to handle the FIXME yet but from your comment it sounds like this code might stay the same (or maybe even go back to streaming the flag directly) and the nowarn_spec_t bitmap should be streamed elsewhere? - else -compare_values (TREE_NO_WARNING); + else if (t1->base.nowarning_flag != t2->base.nowarning_flag) +return false; uh. Can you keep sth like TREE_NO_WARNING_RAW or so? The flag is used directly in fold-const.c and cp/module.cc so this would be in keeping with that, but I also don't mind adding a macro for it. My only concern is with macro getting used to inadvertently bypass the API. Martin Thanks, Richard. On 6/4/21 3:43 PM, Martin Sebor wrote: The attached patch replaces the uses of TREE_NO_WARNING in the LTO front end with the new suppress_warning() API. It adds a couple of FIXMEs that I plan to take care of in a follow up.
Re: [[PATCH V9] 1/7] dwarf: add a dwarf2int.h internal interface
On 6/24/21 11:13 AM, Jose E. Marchesi wrote: This patch introduces a dwarf2int.h header, to be used by code that needs access to the internal DIE structures and their attributes. Why not put these bits in dwarf2out.h? We think that it makes sense to have a separated interface file for the implementation of DWARF-based debug formats. It is called `internal' because it provides access to internal data structures as well as the basic accessor functions to the internals of the DWARF DIEs. Yes, but "internal data structures" also describes most of the current dwarf2out.h. Yes right, dwarf2out.h contains a mixture of function prototypes and several data structures, many of them "internal". I'm not opposed to refactoring the header, but splitting off a dwarf2cfi.h (for print-rtl.c and final.c) seems like a better dividing line. Ok, so what about this: at this point we remove dwarf2int.h from our patch, put the definitions in dwarf2out.h instead, and then once the stuff is upstream we can discuss on how better refactor the dwarf2out* stuff. Is that ok? Sounds good. Jason
Re: [PATCH] c++: alias CTAD and aggregate deduction cand [PR98832]
On 6/24/21 11:15 AM, Patrick Palka wrote: During alias CTAD, we're accidentally ignoring the aggregate deduction candidate of the underlying template because it's added to the candidate set separately via maybe_aggr_guide (which doesn't yet handle alias templates) rather than via deduction_guides_for (which does). This patch makes maybe_aggr_guide handle alias templates in a manner similar to deduction_guides_for. Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk? OK. PR c++/98832 gcc/cp/ChangeLog: * pt.c (maybe_aggr_guide): Handle an alias template. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/class-deduction-alias9.C: New test. --- gcc/cp/pt.c | 11 +++ gcc/testsuite/g++.dg/cpp2a/class-deduction-alias9.C | 6 ++ 2 files changed, 17 insertions(+) create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-alias9.C diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index aeb1e8a6f97..db15e4714d5 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -28880,6 +28880,8 @@ is_spec_or_derived (tree etype, tree tmpl) return !err; } +static tree alias_ctad_tweaks (tree, tree); + /* Return a C++20 aggregate deduction candidate for TYPE initialized from INIT. */ @@ -28892,6 +28894,15 @@ maybe_aggr_guide (tree tmpl, tree init, vec *args) if (init == NULL_TREE) return NULL_TREE; + if (DECL_ALIAS_TEMPLATE_P (tmpl)) +{ + tree under = DECL_ORIGINAL_TYPE (DECL_TEMPLATE_RESULT (tmpl)); + tree tinfo = get_template_info (under); + if (tree guide = maybe_aggr_guide (TI_TEMPLATE (tinfo), init, args)) + return alias_ctad_tweaks (tmpl, guide); + return NULL_TREE; +} + /* We might be creating a guide for a class member template, e.g., template struct A { diff --git a/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias9.C b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias9.C new file mode 100644 index 000..0aaf203639a --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias9.C @@ -0,0 +1,6 @@ +// PR c++/98832 +// { dg-do compile { target c++20 } } + +template struct X{ U u; }; +template using Y = X; +Y y{0};
Re: [PATCH, OpenMP 5.0] Implement relaxation of implicit map vs. existing device mappings (for mainline trunk)
On Fri, May 14, 2021 at 09:20:25PM +0800, Chung-Lin Tang wrote: > diff --git a/gcc/gimplify.c b/gcc/gimplify.c > index e790f08b23f..69c4a8e0a0a 100644 > --- a/gcc/gimplify.c > +++ b/gcc/gimplify.c > @@ -10374,6 +10374,7 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, > void *data) > gcc_unreachable (); > } >OMP_CLAUSE_SET_MAP_KIND (clause, kind); > + OMP_CLAUSE_MAP_IMPLICIT_P (clause) = 1; >if (DECL_SIZE (decl) > && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST) > { As Thomas mentioned, there is now also OMP_CLAUSE_MAP_IMPLICIT that means something different: /* Nonzero on map clauses added implicitly for reduction clauses on combined or composite constructs. They shall be removed if there is an explicit map clause. */ Having OMP_CLAUSE_MAP_IMPLICIT and OMP_CLAUSE_MAP_IMPLICIT_P would be too confusing. So either we need to use just one flag for both purposes or have two different flags and find a better name for one of them. The former would be possible if no OMP_CLAUSE_MAP clauses added by the FEs are implicit - then you could clear OMP_CLAUSE_MAP_IMPLICIT in gimplify_scan_omp_clauses. I wonder if it is the case though, e.g. doesn't your "Improve OpenMP target support for C++ [PR92120 v4]" patch add a lot of such implicit map clauses (e.g. the this[:1] and various others)? Also, gimplify_adjust_omp_clauses_1 sometimes doesn't add just one map clause, but several, shouldn't those be marked implicit too? And similarly it calls lang_hooks.decls.omp_finish_clause which can add even further map clauses implicitly, shouldn't those be implicit too (in that case copy the flag from the clause it is called on to the extra clauses it adds)? Also as Thomas mentioned, it should be restricted to non-OpenACC, it can check gimplify_omp_ctxp->region_type if it is OpenMP or OpenACC. > @@ -10971,9 +10972,15 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, > gimple_seq body, tree *list_p, > list_p = &OMP_CLAUSE_CHAIN (c); > } > > - /* Add in any implicit data sharing. */ > + /* Add in any implicit data sharing. Implicit clauses are added at the > start Two spaces after dot in comments. > + of the clause list, but after any non-map clauses. */ >struct gimplify_adjust_omp_clauses_data data; > - data.list_p = list_p; > + tree *implicit_add_list_p = orig_list_p; > + while (*implicit_add_list_p > + && OMP_CLAUSE_CODE (*implicit_add_list_p) != OMP_CLAUSE_MAP) > +implicit_add_list_p = &OMP_CLAUSE_CHAIN (*implicit_add_list_p); Why are the implicit map clauses added first and not last? There is also the OpenMP 5.1 [352:17-22] case which basically says that the implicit mappings should be ignored if there are explicit ones on the same construct (though, do we really create implicit clauses in that case?). > --- a/include/gomp-constants.h > +++ b/include/gomp-constants.h > @@ -40,11 +40,22 @@ > #define GOMP_MAP_FLAG_SPECIAL_0 (1 << 2) > #define GOMP_MAP_FLAG_SPECIAL_1 (1 << 3) > #define GOMP_MAP_FLAG_SPECIAL_2 (1 << 4) > +#define GOMP_MAP_FLAG_SPECIAL_3 (1 << 5) > #define GOMP_MAP_FLAG_SPECIAL_4 (1 << 6) > #define GOMP_MAP_FLAG_SPECIAL(GOMP_MAP_FLAG_SPECIAL_1 \ >| GOMP_MAP_FLAG_SPECIAL_0) > #define GOMP_MAP_DEEP_COPY (GOMP_MAP_FLAG_SPECIAL_4 \ >| GOMP_MAP_FLAG_SPECIAL_2) > +/* This value indicates the map was created implicitly according to > + OpenMP rules. */ > +#define GOMP_MAP_IMPLICIT(GOMP_MAP_FLAG_SPECIAL_3 \ > + | GOMP_MAP_FLAG_SPECIAL_4) > +/* Mask for entire set of special map kind bits. */ > +#define GOMP_MAP_FLAG_SPECIAL_BITS (GOMP_MAP_FLAG_SPECIAL_0 \ > + | GOMP_MAP_FLAG_SPECIAL_1 \ > + | GOMP_MAP_FLAG_SPECIAL_2 \ > + | GOMP_MAP_FLAG_SPECIAL_3 \ > + | GOMP_MAP_FLAG_SPECIAL_4) > /* Flag to force a specific behavior (or else, trigger a run-time error). */ > #define GOMP_MAP_FLAG_FORCE (1 << 7) > > @@ -186,6 +197,9 @@ enum gomp_map_kind > #define GOMP_MAP_ALWAYS_P(X) \ >(GOMP_MAP_ALWAYS_TO_P (X) || ((X) == GOMP_MAP_ALWAYS_FROM)) > > +#define GOMP_MAP_IMPLICIT_P(X) \ > + (((X) & GOMP_MAP_FLAG_SPECIAL_BITS) == GOMP_MAP_IMPLICIT) I think here we need to decide with which GOMP_MAP* kinds the implicit bit will need to be combined with, with looking forward into what features we still need to implement for OpenMP 5.0/5.1 (not aware of anything in 5.2 that would need special care but perhaps I've missed it). E.g. for declare mapper those single OMP_CLAUSE_MAPs with the implicit bit might need to be split into various smaller ones, map this FIELD_DECL, map that other FIELD_DECL, what it points to, etc. Even without declare mapp
Re: [PATCH] x86: Compile CPUID functions with -mgeneral-regs-only
On Thu, Jun 24, 2021 at 2:12 PM H.J. Lu wrote: > > CPUID functions are used to detect CPU features. If vector ISAs > are enabled, compiler is free to use them in these functions. Add > __attribute__ ((target("general-regs-only"))) to CPUID functions > to avoid vector instructions. These functions are intended to be inlined, so how does target attribute affect inlining? Uros. > > gcc/ > > PR target/101185 > * config/i386/cpuid.h (__get_cpuid_max): Add > __attribute__ ((target("general-regs-only"))). > (__get_cpuid): Likewise. > (__get_cpuid_count): Likewise. > (__cpuidex): Likewise. > > gcc/testsuite/ > > PR target/101185 > * gcc.target/i386/avx512-check.h (check_osxsave): Add > __attribute__ ((target("general-regs-only"))). > (main): Likewise. > --- > gcc/config/i386/cpuid.h | 4 > gcc/testsuite/gcc.target/i386/avx512-check.h | 2 ++ > 2 files changed, 6 insertions(+) > > diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h > index aebc17c6827..74881ee91e5 100644 > --- a/gcc/config/i386/cpuid.h > +++ b/gcc/config/i386/cpuid.h > @@ -243,6 +243,7 @@ > pointer is non-null, then first four bytes of the signature > (as found in ebx register) are returned in location pointed by sig. */ > > +__attribute__ ((target("general-regs-only"))) > static __inline unsigned int > __get_cpuid_max (unsigned int __ext, unsigned int *__sig) > { > @@ -298,6 +299,7 @@ __get_cpuid_max (unsigned int __ext, unsigned int *__sig) > supported and returns 1 for valid cpuid information or 0 for > unsupported cpuid leaf. All pointers are required to be non-null. */ > > +__attribute__ ((target("general-regs-only"))) > static __inline int > __get_cpuid (unsigned int __leaf, > unsigned int *__eax, unsigned int *__ebx, > @@ -315,6 +317,7 @@ __get_cpuid (unsigned int __leaf, > > /* Same as above, but sub-leaf can be specified. */ > > +__attribute__ ((target("general-regs-only"))) > static __inline int > __get_cpuid_count (unsigned int __leaf, unsigned int __subleaf, >unsigned int *__eax, unsigned int *__ebx, > @@ -330,6 +333,7 @@ __get_cpuid_count (unsigned int __leaf, unsigned int > __subleaf, >return 1; > } > > +__attribute__ ((target("general-regs-only"))) > static __inline void > __cpuidex (int __cpuid_info[4], int __leaf, int __subleaf) > { > diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h > b/gcc/testsuite/gcc.target/i386/avx512-check.h > index 0a377dba1d5..406faf8fe03 100644 > --- a/gcc/testsuite/gcc.target/i386/avx512-check.h > +++ b/gcc/testsuite/gcc.target/i386/avx512-check.h > @@ -25,6 +25,7 @@ do_test (void) > } > #endif > > +__attribute__ ((target("general-regs-only"))) > static int > check_osxsave (void) > { > @@ -34,6 +35,7 @@ check_osxsave (void) >return (ecx & bit_OSXSAVE) != 0; > } > > +__attribute__ ((target("general-regs-only"))) > int > main () > { > -- > 2.31.1 >
Re: [PATCH 4/7] Allow match-and-simplified phiopt to run in early phiopt
On 6/23/2021 4:19 PM, apinski--- via Gcc-patches wrote: From: Andrew Pinski To move a few things more to match-and-simplify from phiopt, we need to allow match_simplify_replacement to run in early phiopt. To do this we add a replacement for gimple_simplify that is explictly for phiopt. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * tree-ssa-phiopt.c (match_simplify_replacement): Add early_p argument. Call gimple_simplify_phiopt instead of gimple_simplify. (tree_ssa_phiopt_worker): Update call to match_simplify_replacement and allow unconditionally. (phiopt_early_allow): New function. (gimple_simplify_phiopt): New function. So the two questions on my end are why did we restrict when this could run before and why restrict the codes we're willing to optimize in the early phase? Not an ACK or NAK at this point, just trying to understand a bit of the history. jeff
Re: [PATCH 6/7] Lower for loops before lowering cond in genmatch
On 6/23/2021 4:19 PM, apinski--- via Gcc-patches wrote: From: Andrew Pinski While converting some fold_cond_expr_with_comparison to match, I found that I wanted to use "for cnd (cond vec_cond)" but that was not causing the lowering of cond to happen. What was happening was the lowering of the for loop was happening after the lowering of the cond. So swapping was the correct thing to do but it also means we need to copy for_subst_vec in lower_cond. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * genmatch.c (lower_cond): Copy for_subst_vec for the simplify also. (lower): Swap the order for lower_for and lower_cond. I need to defer this to Richi. I don't know the genmatch code at all. jeff
RE: [ARM] PR66791: Replace builtins for vdup_n and vmov_n intrinsics
> -Original Message- > From: Prathamesh Kulkarni > Sent: 24 June 2021 12:11 > To: gcc Patches ; Kyrylo Tkachov > > Subject: [ARM] PR66791: Replace builtins for vdup_n and vmov_n intrinsics > > Hi, > This patch replaces builtins for vdup_n and vmov_n. > The patch results in regression for pr51534.c. > Consider following function: > > uint8x8_t f1 (uint8x8_t a) { > return vcgt_u8(a, vdup_n_u8(0)); > } > > code-gen before patch: > f1: > vmov.i32 d16, #0 @ v8qi > vcgt.u8 d0, d0, d16 > bx lr > > code-gen after patch: > f1: > vceq.i8 d0, d0, #0 > vmvnd0, d0 > bx lr > > I am not sure which one is better tho ? I think they're equivalent in practice, in any case the patch itself is good (move away from RTL builtins). Ok. Thanks, Kyrill > > Also, this patch regressed bf16_dup.c on arm-linux-gnueabi, > which is due to a missed opt in lowering. I had filed it as > PR98435, and posted a fix for it here: > https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572648.html > > Thanks, > Prathamesh
Re: [PATCH 5/6] make get_domminated_by_region return a auto_vec
Richard Biener via Gcc-patches writes: > On Wed, Jun 23, 2021 at 12:22 PM Richard Sandiford > wrote: >> >> Richard Biener writes: >> > On Wed, Jun 23, 2021 at 7:23 AM Trevor Saunders >> > wrote: >> >> >> >> On Tue, Jun 22, 2021 at 02:01:24PM -0600, Martin Sebor wrote: >> >> > On 6/21/21 1:15 AM, Richard Biener wrote: >> > [...] >> >> > > >> >> > > But maybe I'm misunderstanding C++ too much :/ >> >> > > >> >> > > Well, I guess b) from above means auto_vec<> passing to >> >> > > vec<> taking functions will need changes? >> >> > >> >> > Converting an auto_vec object to a vec slices off its data members. >> >> > The auto_vec specialization has no data members so that's not >> >> > a bug in and of itself, but auto_vec does have data members >> >> > so that would be a bug. The risk is not just passing it to >> >> > functions by value but also returning it. That risk was made >> >> > worse by the addition of the move ctor. >> >> >> >> I would agree that the conversion from auto_vec<> to vec<> is >> >> questionable, and should get some work at some point, perhaps just >> >> passingauto_vec references is good enough, or perhaps there is value in >> >> some const_vec view to avoid having to rely on optimizations, I'm not >> >> sure without looking more at the usage. >> > >> > We do need to be able to provide APIs that work with both auto_vec<> >> > and vec<>, I agree that those currently taking a vec<> by value are >> > fragile (and we've had bugs there before), but I'm not ready to say >> > that changing them all to [const] vec<>& is OK. The alternative >> > would be passing a const_vec<> by value, passing that along to >> > const vec<>& APIs should be valid then (I can see quite some API >> > boundary cleanups being necessary here ...). >> >> FWIW, as far as const_vec<> goes, we already have array_slice, which is >> mostly a version of std::span. (The only real reason for not using >> std::span itself is that it requires a later version of C++.) >> >> Taking those as arguments has the advantage that we can pass normal >> arrays as well. > > It's not really a "const" thing it seems though it certainly would not expose > any API that would reallocate the vector (which is the problematic part > of passing vec<> by value, not necessarily changing elements in-place). > > Since array_slice doesn't inherit most of the vec<> API transforming an > API from vec<> to array_slice<> will be also quite some work. But I > agree it might be useful for generic API stuff. Which API functions would be the most useful ones to have? We could add them if necessary. There again, for things like searching and sorting, I think it would be better to use where possible. It should usually be more efficient than the void* callback approach that the C code tended to use. Thanks, Richard
RE: [ARM] PR98435: Missed optimization in expanding vector constructor
> -Original Message- > From: Prathamesh Kulkarni > Sent: 14 June 2021 09:02 > To: Christophe Lyon > Cc: gcc Patches ; Kyrylo Tkachov > > Subject: Re: [ARM] PR98435: Missed optimization in expanding vector > constructor > > On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni > wrote: > > > > On Fri, 4 Jun 2021 at 13:15, Christophe Lyon > wrote: > > > > > > On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via Gcc-patches > > > wrote: > > > > > > > > Hi, > > > > As mentioned in PR, for the following test-case: > > > > > > > > #include > > > > > > > > bfloat16x4_t f1 (bfloat16_t a) > > > > { > > > > return vdup_n_bf16 (a); > > > > } > > > > > > > > bfloat16x4_t f2 (bfloat16_t a) > > > > { > > > > return (bfloat16x4_t) {a, a, a, a}; > > > > } > > > > > > > > Compiling with arm-linux-gnueabi -O3 -mfpu=neon -mfloat-abi=softfp > > > > -march=armv8.2-a+bf16+fp16 results in f2 not being vectorized: > > > > > > > > f1: > > > > vdup.16 d16, r0 > > > > vmovr0, r1, d16 @ v4bf > > > > bx lr > > > > > > > > f2: > > > > mov r3, r0 @ __bf16 > > > > adr r1, .L4 > > > > ldrdr0, [r1] > > > > mov r2, r3 @ __bf16 > > > > mov ip, r3 @ __bf16 > > > > bfi r1, r2, #0, #16 > > > > bfi r0, ip, #0, #16 > > > > bfi r1, r3, #16, #16 > > > > bfi r0, r2, #16, #16 > > > > bx lr > > > > > > > > This seems to happen because vec_init pattern in neon.md has VDQ > mode > > > > iterator, which doesn't include V4BF. In attached patch, I changed > > > > mode > > > > to VDQX which seems to work for the test-case, and the compiler now > generates: > > > > > > > > f2: > > > > vdup.16 d16, r0 > > > > vmovr0, r1, d16 @ v4bf > > > > bx lr > > > > > > > > However, the pattern is also gated on TARGET_HAVE_MVE and I am > not > > > > sure if either VDQ or VDQX are correct modes for MVE since MVE has > > > > only 128-bit vectors ? > > > > > > > > > > I think patterns common to both Neon and MVE should be moved to > > > vec-common.md, I don't know why such patterns were left in neon.md. > > Since we end up calling neon_expand_vector_init for both NEON and MVE, > > I am not sure if we should separate the pattern ? > > Would it make sense to FAIL if the mode size isn't 16 bytes for MVE as > > in attached patch so > > it will call neon_expand_vector_init only for 128-bit vectors ? > > Altho hard-coding 16 in the pattern doesn't seem a good idea to me either. > ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572342.html > (attaching patch as text). > --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -459,10 +459,12 @@ ) (define_expand "vec_init" - [(match_operand:VDQ 0 "s_register_operand") + [(match_operand:VDQX 0 "s_register_operand") (match_operand 1 "" "")] "TARGET_NEON || TARGET_HAVE_MVE" { + if (TARGET_HAVE_MVE && GET_MODE_SIZE (GET_MODE (operands[0])) != 16) +FAIL; neon_expand_vector_init (operands[0], operands[1]); DONE; }) I think we should move this to vec-common.md like Christophe said. Perhaps rather than making it FAIL for non-16 MVE sizes we just disable it in the expander condition? "TARGET_NEON || (TARGET_HAVE_MVE && GET_MODE_SIZE (< VDQ>mode) != 16)" Thanks, Kyrill > Thanks, > Prathamesh > > > > Thanks, > > Prathamesh > > > > > > That being said, I suggest you look at other similar patterns in > > > vec-common.md, most of which are gated on > > > ARM_HAVE__ARITH > > > and possibly beware of issues with iwmmxt :-) > > > > > > Christophe > > > > > > > Thanks, > > > > Prathamesh
Re: [PATCH] testsuite: mips: use noinline attribute instead of -fno-inline
On 6/22/2021 1:05 AM, Xi Ruoyao via Gcc-patches wrote: mips.exp does not support -fno-inline, causing the tests return "ERROR: Unrecognised option: -fno-inline for dg-options ... ". Use noinline attribute like other mips target tests, to workaround it. gcc/testsuite/ * gcc.target/mips/cfgcleanup-jalr2.c: Remove -fno-inline and add __attribute__((noinline)). * gcc.target/mips/cfgcleanup-jalr3.c: Likewise. I'd like to know a bit more here. mips.exp shouldn't care about the options passed to the compiler and to the best of my knowledge -fno-inline is still a valid GCC option. So while I don't think the patch itself is wrong, I question if it's necessary and whether or not your just papering over some other issue. Jeff
Re: [PATCH] testsuite: mips: use noinline attribute instead of -fno-inline
On Thu, 2021-06-24 at 10:48 -0600, Jeff Law wrote: > > > On 6/22/2021 1:05 AM, Xi Ruoyao via Gcc-patches wrote: > > mips.exp does not support -fno-inline, causing the tests return > > "ERROR: > > Unrecognised option: -fno-inline for dg-options ... ". > > > > Use noinline attribute like other mips target tests, to workaround > > it. > > > > gcc/testsuite/ > > > > * gcc.target/mips/cfgcleanup-jalr2.c: Remove -fno-inline and > > add > > __attribute__((noinline)). > > * gcc.target/mips/cfgcleanup-jalr3.c: Likewise. > I'd like to know a bit more here. mips.exp shouldn't care about the > options passed to the compiler and to the best of my knowledge > -fno-inline is still a valid GCC option. So while I don't think the > patch itself is wrong, I question if it's necessary and whether or not > your just papering over some other issue. There is some logic processing options in mips.exp. Some options are overrided for multilib. It seems the mips.exp was originally designed as: * MIPS options should go in dg-options * Other options should go in dg-additional-options In d2148424165 marxin merged some dg-additional-options into dg-options, exploited the problem. And, the "origin" convention seems already broken: there is something like -funroll-loops which is not a MIPS option, but accepted by mips.exp in dg-options. Possiblities are: (1) this patch (2) make mips.exp accept -fno-inline as "if it is a MIPS option" (3) refactor mips.exp to pass everything itself doesn't know directly to gcc -- Xi Ruoyao
PING^1 [PATCH v4 0/2] x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast
On Wed, Jun 9, 2021 at 4:39 PM H.J. Lu wrote: > > 1. Update move expanders to convert the CONST_WIDE_INT and CONST_VECTO > operands to vector broadcast from an integer with AVX2. > 2. Add ix86_gen_scratch_sse_rtx to return a scratch SSE register which > won't increase stack alignment requirement and blocks transformation by > the combine pass. > > A small benchmark: > > https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/memset/broadcast > > shows that broadcast is a little bit faster on Intel Core i7-8559U: > > $ make > gcc -g -I. -O2 -c -o test.o test.c > gcc -g -c -o memory.o memory.S > gcc -g -c -o broadcast.o broadcast.S > gcc -g -c -o vec_dup_sse2.o vec_dup_sse2.S > gcc -o test test.o memory.o broadcast.o vec_dup_sse2.o > ./test > memory : 147215 > broadcast : 121213 > vec_dup_sse2: 171366 > $ > > broadcast is also smaller: > > $ size memory.o broadcast.o >textdata bss dec hex filename > 132 0 0 132 84 memory.o > 122 0 0 122 7a broadcast.o > $ > > 3. Update PR 87767 tests to expect integer broadcast instead of broadcast > from memory. > 4. Update avx512f_cond_move.c to expect integer broadcast. > > A small benchmark: > > https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/vpaddd/broadcast > > shows that integer broadcast is faster than embedded memory broadcast: > > $ make > gcc -g -I. -O2 -march=skylake-avx512 -c -o test.o test.c > gcc -g -c -o memory.o memory.S > gcc -g -c -o broadcast.o broadcast.S > gcc -o test test.o memory.o broadcast.o > ./test > memory : 425538 > broadcast : 375260 > $ > > 5. Update vec_duplicate to allow to fail so that backend can only allow > broadcasting an integer constant to a vector when broadcast instruction > is available. This can be used by memset expander to avoid vec_duplicate > when loading from constant pool is more efficient. > 6. Add vec_duplicate expander and enable vec_duplicate from a > non-standard SSE constant integer only if vector broadcast is available. > > H.J. Lu (2): > x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast > x86: Add vec_duplicate expander > > gcc/config/i386/i386-expand.c | 208 +- > gcc/config/i386/i386-protos.h | 3 + > gcc/config/i386/i386.c| 13 ++ > gcc/config/i386/sse.md| 21 ++ > gcc/doc/md.texi | 2 - > .../i386/avx512f-broadcast-pr87767-1.c| 7 +- > .../i386/avx512f-broadcast-pr87767-5.c| 5 +- > .../gcc.target/i386/avx512f_cond_move.c | 4 +- > .../i386/avx512vl-broadcast-pr87767-1.c | 12 +- > .../i386/avx512vl-broadcast-pr87767-5.c | 9 +- > gcc/testsuite/gcc.target/i386/pr100865-1.c| 13 ++ > gcc/testsuite/gcc.target/i386/pr100865-10a.c | 33 +++ > gcc/testsuite/gcc.target/i386/pr100865-10b.c | 7 + > gcc/testsuite/gcc.target/i386/pr100865-11a.c | 23 ++ > gcc/testsuite/gcc.target/i386/pr100865-11b.c | 8 + > gcc/testsuite/gcc.target/i386/pr100865-12a.c | 20 ++ > gcc/testsuite/gcc.target/i386/pr100865-12b.c | 8 + > gcc/testsuite/gcc.target/i386/pr100865-2.c| 14 ++ > gcc/testsuite/gcc.target/i386/pr100865-3.c| 15 ++ > gcc/testsuite/gcc.target/i386/pr100865-4a.c | 16 ++ > gcc/testsuite/gcc.target/i386/pr100865-4b.c | 9 + > gcc/testsuite/gcc.target/i386/pr100865-5a.c | 16 ++ > gcc/testsuite/gcc.target/i386/pr100865-5b.c | 9 + > gcc/testsuite/gcc.target/i386/pr100865-6a.c | 16 ++ > gcc/testsuite/gcc.target/i386/pr100865-6b.c | 9 + > gcc/testsuite/gcc.target/i386/pr100865-7a.c | 17 ++ > gcc/testsuite/gcc.target/i386/pr100865-7b.c | 9 + > gcc/testsuite/gcc.target/i386/pr100865-8a.c | 24 ++ > gcc/testsuite/gcc.target/i386/pr100865-8b.c | 7 + > gcc/testsuite/gcc.target/i386/pr100865-9a.c | 25 +++ > gcc/testsuite/gcc.target/i386/pr100865-9b.c | 7 + > 31 files changed, 563 insertions(+), 26 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-1.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-10a.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-10b.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-11a.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-11b.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-12a.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-12b.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-2.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-3.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-4a.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-4b.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-5a.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-5b.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-6a.c > create mode 100644 gcc/testsuite/gcc.target/i
[COMMITTED] Correctly unify recomputation with existing range.
When propagating the on-entry cache, new block ranges are calculated by combining all the incoming edges and comparing to the old value. When a recomputation was performed on an edge, it didn't take into account that the value in the block may already be better than a potential recompuation... Thus a worse values was sometimes propagated. Fixed by simply calling the now correct range_on_edge the cache provides. bootstraps on x86_64-pc-linux-gnu with no regressions. pushed. Andrew >From 5bdcfb74ff97b42f08993af8614c35685fdd8689 Mon Sep 17 00:00:00 2001 From: Andrew MacLeod Date: Wed, 23 Jun 2021 15:25:45 -0400 Subject: [PATCH 2/3] Correctly unify recomputation with existing range. When propagating the on-entry cache, new block ranges are calculated by combining all the incoming edges and comparing to the old value. When a recomputation was performed on an edge, it didn't take into account that the value in the block may already be better than a potential recompuation... Thus a worse values was sometimes propagated. Fixed by simply calling the now correct range_on_edge the cache provides. * gimple-range-cache.cc (ranger_cache::propagate_cache): Call range_on_edge instead of manually calculating. --- gcc/gimple-range-cache.cc | 29 + 1 file changed, 9 insertions(+), 20 deletions(-) diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc index a377261c5c7..98ecdbbd68e 100644 --- a/gcc/gimple-range-cache.cc +++ b/gcc/gimple-range-cache.cc @@ -1037,27 +1037,12 @@ ranger_cache::propagate_cache (tree name) new_range.set_undefined (); FOR_EACH_EDGE (e, ei, bb->preds) { + range_on_edge (e_range, e, name); if (DEBUG_RANGE_CACHE) - fprintf (dump_file, " edge %d->%d :", e->src->index, bb->index); - // Get whatever range we can for this edge. - if (!m_gori.outgoing_edge_range_p (e_range, e, name, *this)) { - exit_range (e_range, name, e->src); - if (DEBUG_RANGE_CACHE) - { - fprintf (dump_file, "No outgoing edge range, picked up "); - e_range.dump (dump_file); - fprintf (dump_file, "\n"); - } - } - else - { - if (DEBUG_RANGE_CACHE) - { - fprintf (dump_file, "outgoing range :"); - e_range.dump (dump_file); - fprintf (dump_file, "\n"); - } + fprintf (dump_file, " edge %d->%d :", e->src->index, bb->index); + e_range.dump (dump_file); + fprintf (dump_file, "\n"); } new_range.union_ (e_range); if (new_range.varying_p ()) @@ -1074,7 +1059,11 @@ ranger_cache::propagate_cache (tree name) if (DEBUG_RANGE_CACHE) { if (!ok_p) - fprintf (dump_file, " Cache failure to store value."); + { + fprintf (dump_file, " Cache failure to store value:"); + print_generic_expr (dump_file, name, TDF_SLIM); + fprintf (dump_file, " "); + } else { fprintf (dump_file, " Updating range to "); -- 2.17.2
[COMMITTED] Fix relation query of equivalences.
When looking for relations between equivalencies, a typo was causing the same bitmap to be checked for both operands, instead of the correct one for each. This caused us to never notice relations between equivalences. I also noticed that under some circumstances the relation dump would call blocks which were NULL and trap.. Also fixed. bootstraps on x86_64-pc-linux-gnu with no regressions. pushed. Andrew >From ce0b409f562cd09c67cc2dce74143a0f0647cde5 Mon Sep 17 00:00:00 2001 From: Andrew MacLeod Date: Thu, 24 Jun 2021 11:13:47 -0400 Subject: [PATCH 3/3] Fix relation query of equivalences. When looking for relations between equivalencies, a typo was causing the wrong bitmap to be checked. Effect was is missed them. Plus don't dump blocks which don't exist. * value-relation.cc (equiv_oracle::dump): Do not dump NULL blocks. (relation_oracle::find_relation_block): Check correct bitmap. (relation_oracle::dump): Do not dump NULL blocks. --- gcc/value-relation.cc | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/gcc/value-relation.cc b/gcc/value-relation.cc index 3c8698f2a54..43fcab7995a 100644 --- a/gcc/value-relation.cc +++ b/gcc/value-relation.cc @@ -444,7 +444,7 @@ equiv_oracle::dump (FILE *f) const { fprintf (f, "Equivalency dump\n"); for (unsigned i = 0; i < m_equiv.length (); i++) -if (m_equiv[i]) +if (m_equiv[i] && BASIC_BLOCK_FOR_FN (cfun, i)) { fprintf (f, "BB%d\n", i); dump (f, BASIC_BLOCK_FOR_FN (cfun, i)); @@ -757,9 +757,9 @@ relation_oracle::find_relation_block (unsigned bb, const_bitmap b1, { unsigned op1 = SSA_NAME_VERSION (ptr->op1 ()); unsigned op2 = SSA_NAME_VERSION (ptr->op2 ()); - if (bitmap_bit_p (b1, op1) && bitmap_bit_p (b1, op2)) + if (bitmap_bit_p (b1, op1) && bitmap_bit_p (b2, op2)) return ptr->kind (); - if (bitmap_bit_p (b1, op2) && bitmap_bit_p (b1, op1)) + if (bitmap_bit_p (b1, op2) && bitmap_bit_p (b2, op1)) return relation_swap (ptr->kind ()); } @@ -925,8 +925,9 @@ relation_oracle::dump (FILE *f) const { fprintf (f, "Relation dump\n"); for (unsigned i = 0; i < m_relations.length (); i++) -{ - fprintf (f, "BB%d\n", i); - dump (f, BASIC_BLOCK_FOR_FN (cfun, i)); -} +if (BASIC_BLOCK_FOR_FN (cfun, i)) + { + fprintf (f, "BB%d\n", i); + dump (f, BASIC_BLOCK_FOR_FN (cfun, i)); + } } -- 2.17.2
[PATCH] fixinc: don't "fix" machine names in quoted string [PR91085]
Bootstrapped and regtested on x86_64-linux-gnu. Ok for trunk? fixincludes/ * fixfixes.c (print_quote): Define it unconditionally, taking and returning const char *. (machine_name_fix): Output quoted strings verbatim. --- fixincludes/fixfixes.c | 30 -- 1 file changed, 24 insertions(+), 6 deletions(-) diff --git a/fixincludes/fixfixes.c b/fixincludes/fixfixes.c index 5b23a8b640d..2f4ee28a897 100644 --- a/fixincludes/fixfixes.c +++ b/fixincludes/fixfixes.c @@ -78,15 +78,14 @@ static void fix (const char* filname ATTRIBUTE_UNUSED , \ const char* text ATTRIBUTE_UNUSED , \ tFixDesc* p_fixd ATTRIBUTE_UNUSED ) -#ifdef NEED_PRINT_QUOTE /* * Skip over a quoted string. Single quote strings may * contain multiple characters if the first character is * a backslash. Especially a backslash followed by octal digits. * We are not doing a correctness syntax check here. */ -static char* -print_quote(char q, char* text ) +static const char* +print_quote(char q, const char* text ) { fputc( q, stdout ); @@ -118,7 +117,6 @@ print_quote(char q, char* text ) return text; } -#endif /* NEED_PRINT_QUOTE */ /* @@ -488,7 +486,7 @@ FIX_PROC_HEAD( char_macro_def_fix ) FIX_PROC_HEAD( machine_name_fix ) { regmatch_t match[2]; - const char *line, *base, *limit, *p, *q; + const char *line, *base, *limit, *p, *q, *quote; regex_t *label_re, *name_re; char scratch[SCRATCHSZ]; size_t len; @@ -533,7 +531,7 @@ FIX_PROC_HEAD( machine_name_fix ) for (;;) { again: - if (base == limit) + if (base >= limit) break; if (xregexec (name_re, base, 1, match, REG_NOTBOL)) @@ -543,6 +541,26 @@ FIX_PROC_HEAD( machine_name_fix ) if (match[0].rm_eo > limit - base) break; + /* Looking for the next non-escaped quote in the line. */ + quote = base - 1; + do +{ + quote++; + quote = strpbrk (quote, "'\""); + if (quote && quote > limit) +quote = NULL; +} + while (quote && quote[-1] == '\\'); + + if (quote && match[0].rm_eo > quote - base) +{ + /* Match is after the quote: print the quoted string and + everything before it verbatim. */ + fwrite (text, 1, quote - text, stdout); + base = text = print_quote (*quote, quote + 1); + goto again; +} + p = base + match[0].rm_so; base += match[0].rm_eo; -- 2.32.0
Re: [PATCH] x86: Compile CPUID functions with -mgeneral-regs-only
On Thu, Jun 24, 2021 at 9:12 AM Uros Bizjak wrote: > > On Thu, Jun 24, 2021 at 2:12 PM H.J. Lu wrote: > > > > CPUID functions are used to detect CPU features. If vector ISAs > > are enabled, compiler is free to use them in these functions. Add > > __attribute__ ((target("general-regs-only"))) to CPUID functions > > to avoid vector instructions. > > These functions are intended to be inlined, so how does target > attribute affect inlining? CPUID function won't be inlined. Then we need to rewrite x86 tests to put CPUID check in a separate function with target attribute to limit ISAs used. > Uros. > > > > > gcc/ > > > > PR target/101185 > > * config/i386/cpuid.h (__get_cpuid_max): Add > > __attribute__ ((target("general-regs-only"))). > > (__get_cpuid): Likewise. > > (__get_cpuid_count): Likewise. > > (__cpuidex): Likewise. > > > > gcc/testsuite/ > > > > PR target/101185 > > * gcc.target/i386/avx512-check.h (check_osxsave): Add > > __attribute__ ((target("general-regs-only"))). > > (main): Likewise. > > --- > > gcc/config/i386/cpuid.h | 4 > > gcc/testsuite/gcc.target/i386/avx512-check.h | 2 ++ > > 2 files changed, 6 insertions(+) > > > > diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h > > index aebc17c6827..74881ee91e5 100644 > > --- a/gcc/config/i386/cpuid.h > > +++ b/gcc/config/i386/cpuid.h > > @@ -243,6 +243,7 @@ > > pointer is non-null, then first four bytes of the signature > > (as found in ebx register) are returned in location pointed by sig. */ > > > > +__attribute__ ((target("general-regs-only"))) > > static __inline unsigned int > > __get_cpuid_max (unsigned int __ext, unsigned int *__sig) > > { > > @@ -298,6 +299,7 @@ __get_cpuid_max (unsigned int __ext, unsigned int > > *__sig) > > supported and returns 1 for valid cpuid information or 0 for > > unsupported cpuid leaf. All pointers are required to be non-null. */ > > > > +__attribute__ ((target("general-regs-only"))) > > static __inline int > > __get_cpuid (unsigned int __leaf, > > unsigned int *__eax, unsigned int *__ebx, > > @@ -315,6 +317,7 @@ __get_cpuid (unsigned int __leaf, > > > > /* Same as above, but sub-leaf can be specified. */ > > > > +__attribute__ ((target("general-regs-only"))) > > static __inline int > > __get_cpuid_count (unsigned int __leaf, unsigned int __subleaf, > >unsigned int *__eax, unsigned int *__ebx, > > @@ -330,6 +333,7 @@ __get_cpuid_count (unsigned int __leaf, unsigned int > > __subleaf, > >return 1; > > } > > > > +__attribute__ ((target("general-regs-only"))) > > static __inline void > > __cpuidex (int __cpuid_info[4], int __leaf, int __subleaf) > > { > > diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h > > b/gcc/testsuite/gcc.target/i386/avx512-check.h > > index 0a377dba1d5..406faf8fe03 100644 > > --- a/gcc/testsuite/gcc.target/i386/avx512-check.h > > +++ b/gcc/testsuite/gcc.target/i386/avx512-check.h > > @@ -25,6 +25,7 @@ do_test (void) > > } > > #endif > > > > +__attribute__ ((target("general-regs-only"))) > > static int > > check_osxsave (void) > > { > > @@ -34,6 +35,7 @@ check_osxsave (void) > >return (ecx & bit_OSXSAVE) != 0; > > } > > > > +__attribute__ ((target("general-regs-only"))) > > int > > main () > > { > > -- > > 2.31.1 > > -- H.J.
Re: [committed] libstdc++: Implement LWG 2762 for std::unique_ptr::operator*
On Thu, 24 Jun 2021, Jonathan Wakely via Libstdc++ wrote: > The LWG issue proposes to add a conditional noexcept-specifier to > std::unique_ptr's dereference operator. The issue is currently in > Tentatively Ready status, but even if it isn't voted into the draft, we > can do it as a conforming extensions. This commit also adds a similar > noexcept-specifier to operator[] for the unique_ptr partial > specialization. The conditional noexcept added to unique_ptr::operator[] seems to break the case where T is complete only after the fact: struct T; extern std::unique_ptr p; auto& t = p[1]; struct T { }; /include/c++/12.0.0/bits/unique_ptr.h: In instantiation of ‘typename std::add_lvalue_reference<_Tp>::type std::unique_ptr<_Tp [], _Dp>::operator[](std::size_t) co nst [with _Tp = A; _Dp = std::default_delete; typename std::add_lvalue_reference<_Tp>::type = A&; std::size_t = long unsigned int]’: testcase.cc:5:14: required from here /include/c++/12.0.0/bits/unique_ptr.h:658:48: error: invalid use of incomplete type ‘struct A’ 658 | noexcept(noexcept(std::declval()[std::declval()])) | ~~~^ testcase.cc:3:8: note: forward declaration of ‘struct A’ 3 | struct A; |^ > > Also ensure that all dereference operators for shared_ptr are noexcept, > and adds tests for the std::optional accessors modified by the issue, > which were already noexcept in our implementation. > > Signed-off-by: Jonathan Wakely > > libstdc++-v3/ChangeLog: > > * include/bits/shared_ptr_base.h (__shared_ptr_access::operator[]): > Add noexcept. > * include/bits/unique_ptr.h (unique_ptr::operator*): Add > conditional noexcept as per LWG 2762. > * testsuite/20_util/shared_ptr/observers/array.cc: Check that > dereferencing cannot throw. > * testsuite/20_util/shared_ptr/observers/get.cc: Likewise. > * testsuite/20_util/optional/observers/lwg2762.cc: New test. > * testsuite/20_util/unique_ptr/lwg2762.cc: New test. > > Tested powerpc64le-linux. Committed to trunk. > >
Re: [PATCH] aix: handle 64bit inodes for include directories
On 6/23/2021 12:53 AM, CHIGOT, CLEMENT via Gcc-patches wrote: Hi David, Did you have a chance to take look at this patch ? Thanks, Clément +DavidMalcolm Can you review this patch when you have a moment? Thanks, David On Mon, May 17, 2021 at 3:05 PM David Edelsohn wrote: The aix.h change is okay with me, but you need to get approval for the incpath.c and cpplib.h parts of the patch from the appropriate maintainers. Thanks, David On Mon, May 17, 2021 at 7:44 AM CHIGOT, CLEMENT wrote: On AIX, stat will store inodes in 32bit even when using LARGE_FILES. If the inode is larger, it will return -1 in st_ino. Thus, in incpath.c when comparing include directories, if several of them have 64bit inodes, they will be considered as duplicated. gcc/ChangeLog: 2021-05-06 Clément Chigot * configure.ac: Check sizeof ino_t and dev_t. * config.in: Regenerate. * configure: Regenerate. * config/rs6000/aix.h (HOST_STAT_FOR_64BIT_INODES): New define. * incpath.c (HOST_STAT_FOR_64BIT_INODES): New define. (remove_duplicates): Use it. libcpp/ChangeLog: 2021-05-06 Clément Chigot * configure.ac: Check sizeof ino_t and dev_t. * config.in: Regenerate. * configure: Regenerate. * include/cpplib.h (INO_T_CPP): Change for AIX. (DEV_T_CPP): New macro. (struct cpp_dir): Use it. So my worry here is this is really a host property -- ie, this is behavior of where GCC runs, not the target for which GCC is generating code. That implies that the change in aix.h is wrong. aix.h is for the target, not the host -- you don't want to define something like HOST_STAT_FOR_64BIT_INODES there. You'd want to be triggering this behavior via a host fragment, x-aix, or better yet via an autoconf test. jeff
[r12-1764 Regression] FAIL: g++.dg/opt/pr91838.C -std=c++2a scan-assembler pxor\\s+%xmm0,\\s+%xmm0 on Linux/x86_64
On Linux/x86_64, 3bd86940c428de9dde53e41265fb1435ed236f5e is the first bad commit commit 3bd86940c428de9dde53e41265fb1435ed236f5e Author: liuhongt Date: Tue Jan 26 16:29:32 2021 +0800 i386: Add vashlm3/vashrm3/vlshrm3 to enable vectorization of vector shift vector. [PR98434] caused FAIL: g++.dg/opt/pr91838.C -std=c++14 scan-assembler pxor\\s+%xmm0,\\s+%xmm0 FAIL: g++.dg/opt/pr91838.C -std=c++17 scan-assembler pxor\\s+%xmm0,\\s+%xmm0 FAIL: g++.dg/opt/pr91838.C -std=c++2a scan-assembler pxor\\s+%xmm0,\\s+%xmm0 with GCC configured with ../../gcc/configure --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-1764/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap To reproduce: $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/opt/pr91838.C --target_board='unix{-m64\ -march=cascadelake}'" (Please do not reply to this email, for question about this report, contact me at skpgkp2 at gmail dot com)
[r12-1773 Regression] FAIL: gcc.dg/vect/pr96854.c (test for excess errors) on Linux/x86_64
On Linux/x86_64, 7a6c31f0f84a7295433ebac09b94fae2d5cc2892 is the first bad commit commit 7a6c31f0f84a7295433ebac09b94fae2d5cc2892 Author: Richard Biener Date: Mon May 31 13:19:01 2021 +0200 Add x86 addsub SLP pattern caused FAIL: gcc.dg/vect/pr96854.c -flto -ffat-lto-objects (internal compiler error) FAIL: gcc.dg/vect/pr96854.c -flto -ffat-lto-objects (test for excess errors) FAIL: gcc.dg/vect/pr96854.c (internal compiler error) FAIL: gcc.dg/vect/pr96854.c (test for excess errors) with GCC configured with ../../gcc/configure --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-1773/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap To reproduce: $ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/pr96854.c --target_board='unix{-m32\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/pr96854.c --target_board='unix{-m64\ -march=cascadelake}'" (Please do not reply to this email, for question about this report, contact me at skpgkp2 at gmail dot com)
Re: [committed] libstdc++: Implement LWG 2762 for std::unique_ptr::operator*
On Thu, 24 Jun 2021 at 19:54, Patrick Palka wrote: > > On Thu, 24 Jun 2021, Jonathan Wakely via Libstdc++ wrote: > > > The LWG issue proposes to add a conditional noexcept-specifier to > > std::unique_ptr's dereference operator. The issue is currently in > > Tentatively Ready status, but even if it isn't voted into the draft, we > > can do it as a conforming extensions. This commit also adds a similar > > noexcept-specifier to operator[] for the unique_ptr partial > > specialization. > > The conditional noexcept added to unique_ptr::operator[] seems to break > the case where T is complete only after the fact: > > struct T; > extern std::unique_ptr p; > auto& t = p[1]; > struct T { }; > > /include/c++/12.0.0/bits/unique_ptr.h: In instantiation of ‘typename > std::add_lvalue_reference<_Tp>::type std::unique_ptr<_Tp [], > _Dp>::operator[](std::size_t) co > nst [with _Tp = A; _Dp = std::default_delete; typename > std::add_lvalue_reference<_Tp>::type = A&; std::size_t = long unsigned int]’: > testcase.cc:5:14: required from here > /include/c++/12.0.0/bits/unique_ptr.h:658:48: error: invalid use of > incomplete type ‘struct A’ > 658 | > noexcept(noexcept(std::declval()[std::declval()])) > | ~~~^ > testcase.cc:3:8: note: forward declaration of ‘struct A’ > 3 | struct A; > |^ OK, I'll remove that again.
Re: [EXTERNAL] Re: rs6000: Fix typos in float128 ISA3.1 support
On Thu, Jun 24, 2021 at 05:32:20PM +0800, Kewen.Lin wrote: > on 2021/6/24 上午12:58, Segher Boessenkool wrote: > > On Wed, Jun 23, 2021 at 12:17:07PM +0800, Kewen.Lin wrote: > +#ifdef FLOAT128_HW_INSNS_ISA3_1 > TFtype __floattikf (TItype_ppc) > __attribute__ ((__ifunc__ ("__floattikf_resolve"))); > >>> > >>> I wonder if we now need TItype_ppc at all anymore, btw? > >> > >> Sorry that I don't quite follow this question. > > > > I thought it may do the same as just TItype now, but the ifunc stuff > > probably makes it different still :-) > > Ah, thanks for the clarification! If I read it right, TItype is defined > with __attribute__ ((mode (TI))) while TItype_ppc is defined with > __attribute__ ((__mode__ (__TI__))), the later writing looks special. I managed to read things wrong, I thought there was some ifunc stuff in the definition of TItype_ppc. Of course there is not, it is just setting the mode. mode(__TI__) is just the more portable way of writing mode(TI), the latter will not work if something #define's TI (you cannot do that with __TI__, you are not allowed to by the C standard, in application code). So it looks like we could just use {U,}TItype here, no _ppc. Carl, is there some reason I'm not seeing you used it? Segher
[COMMITTED] Add a test case to confirm the equivalence's are being checked by EVRP.
Adding a testcase which shows the equivalence/relation engine working as it should. pushed. Andrew >From ce3316e9c02c81c509173572c71a101f4eb62a24 Mon Sep 17 00:00:00 2001 From: Andrew MacLeod Date: Thu, 24 Jun 2021 13:49:51 -0400 Subject: [PATCH 2/2] Add a testcase to confirm the equivalence's are being checked by EVRP. * gcc.dg/tree-ssa/evrp30.c: New. --- gcc/testsuite/gcc.dg/tree-ssa/evrp30.c | 16 1 file changed, 16 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/evrp30.c diff --git a/gcc/testsuite/gcc.dg/tree-ssa/evrp30.c b/gcc/testsuite/gcc.dg/tree-ssa/evrp30.c new file mode 100644 index 000..2c5ff41ecd9 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/evrp30.c @@ -0,0 +1,16 @@ +/* Confirm the ranger is picking up a relationship with equivalences. */ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-evrp" } */ + +extern void foo (); + +void f (unsigned int a, unsigned int b) +{ + if (a == b) +for (unsigned i = 0; i < a; i++) + if (i == b) // Confirm i < a also means i < b. + foo (); /* Unreachable */ +} + +/* { dg-final { scan-tree-dump-times "foo\\(" 0 "evrp"} } */ + -- 2.17.2
[COMMITTED] tree-optimization/101189 - Only register relations on live edges
As mentioned in the PR analysis, we shouldn't register a relation on an outgoing conditional edge if range analysis proves that edge can never be taken. Its just asking for trouble :-) Bootstrapped on x86_64-pc-linux-gnu. Pushed. Andrew >From a0accaa99844b0c40661202635859f8c0be76cdd Mon Sep 17 00:00:00 2001 From: Andrew MacLeod Date: Thu, 24 Jun 2021 13:35:21 -0400 Subject: [PATCH 1/2] Only register relations on live edges Register a relation on a conditional edge only if the LHS supports this edge being taken. gcc/ PR tree-optimization/101189 * gimple-range-fold.cc (fold_using_range::range_of_range_op): Pass LHS range of condition to postfold routine. (fold_using_range::postfold_gcond_edges): Only process the TRUE or FALSE edge if the LHS range supports it being taken. * gimple-range-fold.h (postfold_gcond_edges): Add range parameter. gcc/testsuite/ * gcc.dg/tree-ssa/pr101189.c: New. --- gcc/gimple-range-fold.cc | 29 +++- gcc/gimple-range-fold.h | 2 +- gcc/testsuite/gcc.dg/tree-ssa/pr101189.c | 17 ++ 3 files changed, 41 insertions(+), 7 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr101189.c diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc index 583348e6e36..1fa4ace32b9 100644 --- a/gcc/gimple-range-fold.cc +++ b/gcc/gimple-range-fold.cc @@ -617,7 +617,7 @@ fold_using_range::range_of_range_op (irange &r, gimple *s, fur_source &src) } } else if (is_a (s)) - postfold_gcond_edges (as_a (s), src); + postfold_gcond_edges (as_a (s), r, src); } else r.set_varying (type); @@ -1247,9 +1247,11 @@ fold_using_range::relation_fold_and_or (irange& lhs_range, gimple *s, // Register any outgoing edge relations from a conditional branch. void -fold_using_range::postfold_gcond_edges (gcond *s, fur_source &src) +fold_using_range::postfold_gcond_edges (gcond *s, irange& lhs_range, + fur_source &src) { int_range_max r; + int_range<2> e0_range, e1_range; tree name; range_operator *handler; basic_block bb = gimple_bb (s); @@ -1257,10 +1259,27 @@ fold_using_range::postfold_gcond_edges (gcond *s, fur_source &src) edge e0 = EDGE_SUCC (bb, 0); if (!single_pred_p (e0->dest)) e0 = NULL; + else +{ + // If this edge is never taken, ignore it. + gcond_edge_range (e0_range, e0); + e0_range.intersect (lhs_range); + if (e0_range.undefined_p ()) + e0 = NULL; +} + edge e1 = EDGE_SUCC (bb, 1); if (!single_pred_p (e1->dest)) e1 = NULL; + else +{ + // If this edge is never taken, ignore it. + gcond_edge_range (e1_range, e1); + e1_range.intersect (lhs_range); + if (e1_range.undefined_p ()) + e1 = NULL; +} // At least one edge needs to be single pred. if (!e0 && !e1) @@ -1276,15 +1295,13 @@ fold_using_range::postfold_gcond_edges (gcond *s, fur_source &src) gcc_checking_assert (handler); if (e0) { - gcond_edge_range (r, e0); - relation_kind relation = handler->op1_op2_relation (r); + relation_kind relation = handler->op1_op2_relation (e0_range); if (relation != VREL_NONE) src.register_relation (e0, relation, ssa1, ssa2); } if (e1) { - gcond_edge_range (r, e1); - relation_kind relation = handler->op1_op2_relation (r); + relation_kind relation = handler->op1_op2_relation (e1_range); if (relation != VREL_NONE) src.register_relation (e1, relation, ssa1, ssa2); } diff --git a/gcc/gimple-range-fold.h b/gcc/gimple-range-fold.h index aeb923145ca..dc1b28f9acc 100644 --- a/gcc/gimple-range-fold.h +++ b/gcc/gimple-range-fold.h @@ -158,6 +158,6 @@ protected: void range_of_ssa_name_with_loop_info (irange &, tree, class loop *, gphi *, fur_source &src); void relation_fold_and_or (irange& lhs_range, gimple *s, fur_source &src); - void postfold_gcond_edges (gcond *s, fur_source &src); + void postfold_gcond_edges (gcond *s, irange &lhs_range, fur_source &src); }; #endif // GCC_GIMPLE_RANGE_FOLD_H diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr101189.c b/gcc/testsuite/gcc.dg/tree-ssa/pr101189.c new file mode 100644 index 000..5730708a0b8 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr101189.c @@ -0,0 +1,17 @@ +/* PR tree-optimization/101189 */ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +static int a, b; +int main() { + int d = 0, e, f = 5; + if (a) +f = 0; + for (; f < 4; f++) +; + e = f ^ -f; + e && d; + if (!e) +e || b; + return 0; +} -- 2.17.2
[PATCH] c++: CTAD within alias template [PR91911]
In the first testcase below, during parsing of the alias template ConstSpanType, transparency of alias template specializations means we replace SpanType with SpanType's substituted definition. But this substitution lowers the level of the CTAD placeholder for span(T()) from 2 to 1, and so the later instantiantion of ConstSpanType erroneously substitutes this CTAD placeholder with the template argument at level 1 index 0, i.e. with int, before we get a chance to perform the CTAD. In light of this, it seems we should avoid level lowering when substituting through through the type-id of a dependent alias template specialization. To that end this patch makes lookup_template_class_1 pass tf_partial to tsubst in this situation. Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk? PR c++/91911 gcc/cp/ChangeLog: * pt.c (lookup_template_class_1): When looking up a dependent alias template specialization, pass tf_partial to tsubst. gcc/testsuite/ChangeLog: * g++.dg/cpp1z/class-deduction92.C: New test. --- gcc/cp/pt.c | 7 +- .../g++.dg/cpp1z/class-deduction92.C | 17 + .../g++.dg/cpp1z/class-deduction93.C | 25 +++ 3 files changed, 48 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction92.C create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction93.C diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index f73c7471a33..23c5f515716 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -9954,7 +9954,12 @@ lookup_template_class_1 (tree d1, tree arglist, tree in_decl, tree context, template-arguments for the template-parameters in the type-id of the alias template. */ - t = tsubst (TREE_TYPE (gen_tmpl), arglist, complain, in_decl); + /* When substituting a dependent alias template specialization, +we pass tf_partial to avoid lowering the level of any 'auto' +in its type-id which might correspond to CTAD placeholders. */ + t = tsubst (TREE_TYPE (gen_tmpl), arglist, + complain | (is_dependent_type * tf_partial), + in_decl); /* Note that the call above (by indirectly calling register_specialization in tsubst_decl) registers the TYPE_DECL representing the specialization of the alias diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction92.C b/gcc/testsuite/g++.dg/cpp1z/class-deduction92.C new file mode 100644 index 000..ae3c55508b2 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction92.C @@ -0,0 +1,17 @@ +// PR c++/91911 +// { dg-do compile { target c++17 } } + +template +struct span { + using value_type = T; + span(T); +}; + +template +using SpanType = decltype(span(T())); + +template +using ConstSpanType = span::value_type>; + +using type = ConstSpanType; +using type = span; diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction93.C b/gcc/testsuite/g++.dg/cpp1z/class-deduction93.C new file mode 100644 index 000..eebc986832e --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction93.C @@ -0,0 +1,25 @@ +// PR c++/98077 +// { dg-do compile { target c++17 } } + +template +struct function { + template function(T) { } + using type = R; +}; + +template function(T) -> function; + +template +struct CallableTrait; + +template +struct CallableTrait> { using ReturnType = R; }; + +template +using CallableTraitT = decltype(function{F()}); + +template +using ReturnType = typename CallableTraitT::type; + +using type = ReturnType; +using type = int; -- 2.32.0.93.g670b81a890
Re: [committed] libstdc++: Implement LWG 2762 for std::unique_ptr::operator*
That example violates http://eel.is/c++draft/unique.ptr.runtime.general#3 On Thu, Jun 24, 2021 at 1:55 PM Patrick Palka via Gcc-patches wrote: > > On Thu, 24 Jun 2021, Jonathan Wakely via Libstdc++ wrote: > > > The LWG issue proposes to add a conditional noexcept-specifier to > > std::unique_ptr's dereference operator. The issue is currently in > > Tentatively Ready status, but even if it isn't voted into the draft, we > > can do it as a conforming extensions. This commit also adds a similar > > noexcept-specifier to operator[] for the unique_ptr partial > > specialization. > > The conditional noexcept added to unique_ptr::operator[] seems to break > the case where T is complete only after the fact: > > struct T; > extern std::unique_ptr p; > auto& t = p[1]; > struct T { }; > > /include/c++/12.0.0/bits/unique_ptr.h: In instantiation of ‘typename > std::add_lvalue_reference<_Tp>::type std::unique_ptr<_Tp [], > _Dp>::operator[](std::size_t) co > nst [with _Tp = A; _Dp = std::default_delete; typename > std::add_lvalue_reference<_Tp>::type = A&; std::size_t = long unsigned int]’: > testcase.cc:5:14: required from here > /include/c++/12.0.0/bits/unique_ptr.h:658:48: error: invalid use of > incomplete type ‘struct A’ > 658 | > noexcept(noexcept(std::declval()[std::declval()])) > | ~~~^ > testcase.cc:3:8: note: forward declaration of ‘struct A’ > 3 | struct A; > |^ > > > > > Also ensure that all dereference operators for shared_ptr are noexcept, > > and adds tests for the std::optional accessors modified by the issue, > > which were already noexcept in our implementation. > > > > Signed-off-by: Jonathan Wakely > > > > libstdc++-v3/ChangeLog: > > > > * include/bits/shared_ptr_base.h (__shared_ptr_access::operator[]): > > Add noexcept. > > * include/bits/unique_ptr.h (unique_ptr::operator*): Add > > conditional noexcept as per LWG 2762. > > * testsuite/20_util/shared_ptr/observers/array.cc: Check that > > dereferencing cannot throw. > > * testsuite/20_util/shared_ptr/observers/get.cc: Likewise. > > * testsuite/20_util/optional/observers/lwg2762.cc: New test. > > * testsuite/20_util/unique_ptr/lwg2762.cc: New test. > > > > Tested powerpc64le-linux. Committed to trunk. > > > >
Re: [PATCH 1/13] v2 [PATCH 1/13] Add support for per-location warning groups (PR 74765)
On 6/23/21 11:26 PM, Jeff Law wrote: On 6/23/2021 1:47 PM, Martin Sebor via Gcc-patches wrote: On 6/22/21 5:28 PM, David Malcolm wrote: On Tue, 2021-06-22 at 19:18 -0400, David Malcolm wrote: On Fri, 2021-06-04 at 15:41 -0600, Martin Sebor wrote: The attached patch introduces the suppress_warning(), warning_suppressed(), and copy_no_warning() APIs without making use of them in the rest of GCC. They are in three files: diagnostic-spec.{h,c}: Location-centric overloads. warning-control.cc: Tree- and gimple*-centric overloads. The location-centric overloads are suitable to use from the diagnostic subsystem. The rest can be used from the front ends and the middle end. [...snip...] +/* Return true if warning OPT is suppressed for decl/expression EXPR. + By default tests the disposition for any warning. */ + +bool +warning_suppressed_p (const_tree expr, opt_code opt /* = all_warnings */) +{ + const nowarn_spec_t *spec = get_nowarn_spec (expr); + + if (!spec) + return get_no_warning_bit (expr); + + const nowarn_spec_t optspec (opt); + bool dis = *spec & optspec; + gcc_assert (get_no_warning_bit (expr) || !dis); + return dis; Looking through the patch, I don't like the use of "dis" for the "is suppressed" bool, since... It's an argument to a a function, unlikely to confuse anyone. But I also don't think it's important enough to argue about so I changed it along with the comment below. [...snip...] + +/* Enable, or by default disable, a warning for the statement STMT. + The wildcard OPT of -1 controls all warnings. */ ...I find the above comment to be confusingly worded due to the double- negative. If I'm reading your intent correctly, how about this wording: /* Change the supression of warnings for statement STMT. OPT controls which warnings are affected. The wildcard OPT of -1 controls all warnings. If SUPP is true (the default), enable the suppression of the warnings. If SUPP is false, disable the suppression of the warnings. */ ...and rename "dis" to "supp". Or have I misread the intent of the patch? + +void +suppress_warning (gimple *stmt, opt_code opt /* = all_warnings */, + bool dis /* = true */) +{ + if (opt == no_warning) + return; + + const key_type_t key = convert_to_key (stmt); + + dis = suppress_warning_at (key, opt, dis) || dis; + set_no_warning_bit (stmt, dis); +} [...snip...] Also, I think I prefer having a separate entrypoints for the "all warnings" case; on reading through the various patches I think that in e.g.: - TREE_NO_WARNING (*expr_p) = 1; + suppress_warning (*expr_p); I prefer: suppress_warnings (*expr_p); (note the plural) since that way we can grep for them, and it seems like better grammar to me. Both entrypoints could be implemented by a static suppress_warning_1 internally if that makes it easier. In that vein, "unsuppress_warning" seems far clearer to me that "suppress_warning (FOO, false)"; IIRC there are very few uses of this non-default arg (I couldn't find any in a quick look through the v2 kit). Does this make sense? In my experience, names that differ only slightly (such as in just one letter) tend to easily lead to frustrating mistakes. The new warning_suppressed_p() added in this patch also handles multiple warnings (via a wildcard) and uses a singular form, and as do the gimple_{get,set}no_warning() functions that are being replaced. So I don't think the suggested change is needed or would be an improvement. Similarly, there is no gimple_unset_no_warning() today and I don't think adding unsuppress_warning() is necessary or would add value, especially since there are just a handful of callers that re-enable warnings. For the callers that might need to enable or disable a warning based on some condition, it's more convenient to do so in a single call then by having to call different functions based the value of the condition. In the attached patch I've changed the comment as you asked and also renamed the dis argument to supp but kept the function names the same. I've also moved a few changes to tree.h from a later patch to this one to let it stand alone and regtested it on its own on x86_64-linux. I'll plan to go ahead with this version unless requestsfor substantive changes come up in the review of the rest of the changes. Thanks Martin gcc-no-warning-1.diff Add support for per-location warning groups. gcc/ChangeLog: * Makefile.in (OBJS-libcommon): Add diagnostic-spec.o. * gengtype.c (open_base_files): Add diagnostic-spec.h. * diagnostic-spec.c: New file. * diagnostic-spec.h: New file. * tree.h (no_warning, all_warnings, suppress_warning_at): New declarations. * warning-control.cc: New file. diff --git a/gcc/diagnostic-spec.h b/gcc/diagnostic-spec.h new file mode 100644 index 000..62d9270ca6d --- /dev/null +++ b/gcc/diagnostic-spec.h @@ -0,0 +1,140 @@ +/* Language-ind
Re: [PATCH 3/13] v2 Use new per-location warning APIs in C front end
On 6/23/21 11:09 PM, Jeff Law wrote: On 6/4/2021 3:41 PM, Martin Sebor via Gcc-patches wrote: The attached patch replaces the uses of TREE_NO_WARNING in the C front end with the new suppress_warning(), warning_suppressed_p(), and copy_warning() APIs. gcc-no-warning-c.diff Add support for per-location warning groups. gcc/c/ChangeLog: * c-decl.c (pop_scope): Replace direct uses of TREE_NO_WARNING with warning_suppressed_p, suppress_warning, and copy_no_warning. (diagnose_mismatched_decls): Same. (duplicate_decls): Same. (grokdeclarator): Same. (finish_function): Same. (c_write_global_declarations_1): Same. * c-fold.c (c_fully_fold_internal): Same. * c-parser.c (c_parser_expr_no_commas): Same. (c_parser_postfix_expression): Same. * c-typeck.c (array_to_pointer_conversion): Same. (function_to_pointer_conversion): Same. (default_function_array_conversion): Same. (convert_lvalue_to_rvalue): Same. (default_conversion): Same. (build_indirect_ref): Same. (build_function_call_vec): Same. (build_atomic_assign): Same. (build_unary_op): Same. (c_finish_return): Same. (emit_side_effect_warnings): Same. (c_finish_stmt_expr): Same. (c_omp_clause_copy_ctor): Same. OK once prereqs are approved. I've retested this on top of the first patch and pushed r12-1802. Martin
Re: [PATCH 4/13] v2 Use new per-location warning APIs in C family code
On 6/23/21 11:06 PM, Jeff Law wrote: On 6/4/2021 3:42 PM, Martin Sebor via Gcc-patches wrote: The attached patch replaces the uses of TREE_NO_WARNING in the shared C family front end with the new suppress_warning(), warning_suppressed_p(), and copy_warning() APIs. gcc-no-warning-c-family.diff Add support for per-location warning groups. gcc/c-family/ChangeLog: * c-common.c (c_wrap_maybe_const): Remove TREE_NO_WARNING. (c_common_truthvalue_conversion): Replace direct uses of TREE_NO_WARNING with warning_suppressed_p, suppress_warning, and copy_no_warning. (check_function_arguments_recurse): Same. * c-gimplify.c (c_gimplify_expr): Same. * c-warn.c (overflow_warning): Same. (warn_logical_operator): Same. (warn_if_unused_value): Same. (do_warn_unused_parameter): Same. OK once prereqs are approved. jeff Retested on top patch 1 and the C FE stuff and pushed r12-1803. Martin