[PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p simplification [PR103417]
Hi!

The following testcase is miscompiled since the r12-5489-g0888d6bbe97e10
changes.
The simplification triggers on
(x & 4294967040U) >= 0U
and turns it into:
x <= 255U
which is incorrect; it should fold to 1 because unsigned >= 0U is always
true, and normally the
/* Non-equality compare simplifications from fold_binary  */
    (if (wi::to_wide (cst) == min)
     (if (cmp == GE_EXPR)
      { constant_boolean_node (true, type); })
simplification folds that, but this simplification was done earlier.

The simplification correctly doesn't include lt, which for the same
reason shouldn't be handled; we'll fold it to 0 elsewhere.

But, IMNSHO, while it isn't incorrect to handle le and gt there, it is
unnecessary, because (x & cst) <= 0U and (x & cst) > 0U should never
appear.  Again in
/* Non-equality compare simplifications from fold_binary  */
we have a simplification for it:
   (if (cmp == LE_EXPR)
    (eq @2 @1))
   (if (cmp == GT_EXPR)
    (ne @2 @1
This is done for
(cmp (convert?@2 @0) uniform_integer_cst_p@1)
and so should be done for both integers and vectors.
As the bitmask_inv_cst_vector_p simplification only handles eq and ne
for signed types, I think it can be simplified to just the following patch.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

I wonder if (for cst (VECTOR_CST INTEGER_CST) is good for the best size
of *-match.c; wouldn't it be better to accept just CONSTANT_CLASS_P@1
and have bitmask_inv_cst_vector_p return NULL_TREE if it isn't
INTEGER_CST or VECTOR_CST?

Also, without/with this patch I see on i686-linux (can be reproduced with
RUNTESTFLAGS="--target_board=unix/-m32/-mno-sse dg.exp='bic-bitmask* signbit-2*'"
too):
FAIL: gcc.dg/bic-bitmask-10.c scan-tree-dump dce7 "<=\\s*.+{ 255,.+}"
FAIL: gcc.dg/bic-bitmask-11.c scan-tree-dump dce7 ">\\s*.+{ 255,.+}"
FAIL: gcc.dg/bic-bitmask-12.c scan-tree-dump dce7 "<=\\s*.+{ 255,.+}"
FAIL: gcc.dg/bic-bitmask-2.c scan-tree-dump-times dce7 "<=\\s*.+{ 255,.+}" 1
FAIL: gcc.dg/bic-bitmask-23.c (test for excess errors)
FAIL: gcc.dg/bic-bitmask-23.c scan-tree-dump dce7 "<=\\s*.+{ 255, 15, 1, 65535 }"
FAIL: gcc.dg/bic-bitmask-3.c scan-tree-dump-times dce7 "<=\\s*.+{ 255,.+}" 1
FAIL: gcc.dg/bic-bitmask-4.c scan-tree-dump-times dce7 "=\\s*.+{ 1,.+}" 1
FAIL: gcc.dg/bic-bitmask-5.c scan-tree-dump-times dce7 ">\\s*.+{ 255,.+}" 1
FAIL: gcc.dg/bic-bitmask-6.c scan-tree-dump-times dce7 "<=\\s*.+{ 255,.+}" 1
FAIL: gcc.dg/bic-bitmask-8.c scan-tree-dump-times dce7 ">\\s*.+{ 1,.+}" 1
FAIL: gcc.dg/bic-bitmask-9.c scan-tree-dump dce7 "&\\s*.+{ 4294967290,.+}"
FAIL: gcc.dg/signbit-2.c scan-tree-dump optimized "\\s+>\\s+{ 0(, 0)+ }"
Those tests use the vect_int effective target, but AFAIK that can be used
only in *.dg/vect/ because it relies on vect.exp enabling options to
support vectorization on the particular target (e.g. for i686-linux that
is -msse2).
I think there isn't another way to get the DEFAULT_VECTCFLAGS into
dg-options other than having the test driven by vect.exp.

And, finally, I've noticed incorrect formatting in the new
bitmask_inv_cst_vector_p routine:
  do {
    if (idx > 0)
      cst = vector_cst_elt (t, idx);
    ...
    builder.quick_push (newcst);
  } while (++idx < nelts);
It should be
  do
    {
      if (idx > 0)
        cst = vector_cst_elt (t, idx);
      ...
      builder.quick_push (newcst);
    }
  while (++idx < nelts);

2021-11-25  Jakub Jelinek

	PR tree-optimization/103417
	* match.pd ((X & Y) CMP 0): Only handle eq and ne.  Commonalize
	common tests.

	* gcc.c-torture/execute/pr103417.c: New test.
--- gcc/match.pd.jj	2021-11-24 11:46:03.191918052 +0100
+++ gcc/match.pd	2021-11-24 22:33:43.852575772 +0100
@@ -5214,20 +5214,16 @@ (define_operator_list SYNC_FETCH_AND_AND
 /* Transform comparisons of the form (X & Y) CMP 0 to X CMP2 Z
    where ~Y + 1 == pow2 and Z = ~Y.  */
 (for cst (VECTOR_CST INTEGER_CST)
- (for cmp (le eq ne ge gt)
-      icmp (le le gt le gt)
-  (simplify
-   (cmp (bit_and:c@2 @0 cst@1) integer_zerop)
-   (with { tree csts = bitmask_inv_cst_vector_p (@1); }
-    (switch
-     (if (csts && TYPE_UNSIGNED (TREE_TYPE (@1))
-	  && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)))
-      (icmp @0 { csts; }))
-     (if (csts && !TYPE_UNSIGNED (TREE_TYPE (@1))
-	  && (cmp == EQ_EXPR || cmp == NE_EXPR)
-	  && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)))
+ (for cmp (eq ne)
+      icmp (le gt)
+  (simplify
+   (cmp (bit_and:c@2 @0 cst@1) integer_zerop)
+   (with { tree csts = bitmask_inv_cst_vector_p (@1); }
+    (if (csts && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)))
+     (if (TYPE_UNSIGNED (TREE_TYPE (@1)))
+      (icmp @0 { csts; }
       (with { tree utype = unsigned_type_for (TREE_TYPE (@1)); }
-	(icmp (convert:utype @0) { csts; }
+	(icmp (convert:utype @0) { csts; }

 /* -A CMP -B -> B CMP A.  */
 (for cmp (tcc_comparison)
--- gcc/testsuite
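For reference, a minimal runtime reproducer matching the description
above; this is a sketch, not necessarily the actual
gcc.c-torture/execute/pr103417.c added by the patch:

  /* Sketch of a reproducer for PR103417; the committed testcase may differ.  */
  __attribute__((noipa)) unsigned
  foo (unsigned x)
  {
    /* Always 1: unsigned >= 0U is always true.  The broken simplification
       instead turned this into x <= 255U.  */
    return (x & 4294967040U) >= 0U;
  }

  int
  main ()
  {
    if (foo (0) != 1 || foo (256) != 1 || foo (~0U) != 1)
      __builtin_abort ();
    return 0;
  }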
Re: [PATCH][RFC] middle-end/46476 - resurrect -Wunreachable-code
On Wed, 24 Nov 2021, Michael Matz wrote:

> Hello,
>
> On Wed, 24 Nov 2021, Richard Biener wrote:
>
> > >> +/* Unreachable code in if (0) block.  */
> > >> +void baz(int *p)
> > >> +{
> > >> +  if (0)
> > >> +    {
> > >> +      return; /* { dg-bogus "not reachable" } */
> > >
> > > Hmm?  Why are you explicitly saying that warning here would be bogus?
> >
> > Because I don't think we want to warn here.  Such code is common from
> > template instantiation or macro expansion.
>
> Like all code with a (const-propagated) explicit 'if (0)', which is of
> course the reason why -Wunreachable-code is a challenge.

OK, so I probably shouldn't have taken -Wunreachable-code but named
it somehow differently.  We want to diagnose obvious programming
mistakes, not (source code) optimization opportunities.  So

int foo (int i)
{
  return i;
  i += 1;
  return i;
}

should be diagnosed, for example, but not so

int foo (int i)
{
  if (USE_NOOP_FOO)
    return i;
  i += 1;
  return i;
}

when compiling with -DUSE_NOOP_FOO=1.

> IOW: I could accept your argument but then wonder why you want to warn
> about the second statement of the guarded block.  The situation was:
>
>   if (0) {
>     return;      // (1) don't warn here?
>     whatever++;  // (2) but warn here?

because, as said above, the whatever++ will never be reachable even if
you change the condition in the if().  See my response to Martin where
I said I think if (0) of a block is a good way to comment it out
but keep it syntactically correct.

>   }
>
> That seems even more confusing.  So you don't want to warn about
> unreachable code (the 'return') but you do want to warn about unreachable
> code within unreachable code (point (2) is unreachable because of the
> if(0) and because of the return).  If your worry is macro/template
> expansion resulting in if(0)'s then I don't see why you would only disable
> warnings for some of the statements therein.

The point is not to disable the warning for some statements therein
but to avoid diagnosing following stmts.

> It seems we are actually interested in code unreachable via fallthrough or
> labels, not in all unreachable code, so maybe the warning is mis-named.

Yes, that's definitely the case - I was too lazy to re-use the old
option name here.  But I don't have a good name at hand; maybe clang
has an option covering the cases I'm thinking about.

Btw, the diagnostic spotted qsort_chk doing

  if (CMP (i1, i2))
    break;
  else if (CMP (i2, i1))
    return ERR2 (i1, i2);

where ERR2 expands to a call to a noreturn void "returning"
qsort_chk_error, so the 'return' stmt is not reachable.  Not exactly
a bug but somewhat difficult to avoid the diagnostic for.  I suppose
the pointless 'return' is to make it more visible that the loop
terminates here (albeit we don't return normally).

Likewise we diagnose (c_tree_equal):

    default:
      gcc_unreachable ();
    }
  /* We can get here with --disable-checking.  */
  return false;

where the 'return false' is never reachable.  The return was likely
inserted to avoid very strange error paths when the unreachable
falls through to some other random function.

> Btw. what does the code now do about this situation:
>
>   if (0) {
>     something++;     // 1
>     return;          // 2
>     somethingelse++; // 3
>   }
>
> does it warn at (1) or not?  (I assume it unconditionally warns at (3))

It warns at (3).  It basically assumes that if (0) might become if (1)
in some other configuration and thus the diagnostic is difficult to
silence in source.

Any suggestion for a better option name?

Richard.
Re: [PATCH] bswap: Improve perform_symbolic_merge [PR103376]
On Thu, 25 Nov 2021, Jakub Jelinek wrote:

> On Wed, Nov 24, 2021 at 09:45:16AM +0100, Richard Biener wrote:
> > > Thinking more about it, perhaps we could do more for BIT_XOR_EXPR.
> > > We could allow masked1 == masked2 case for it, but would need to
> > > do something different than the
> > >   n->n = n1->n | n2->n;
> > > we do on all the bytes together.
> > > In particular, for masked1 == masked2 if masked1 != 0 (well, for 0
> > > both variants are the same) and masked1 != 0xff we would need to
> > > clear corresponding n->n byte instead of setting it to the input
> > > as x ^ x = 0 (but if we don't know what x and y are, the result is
> > > also don't know).  Now, for plus it is much harder, because not only
> > > for non-zero operands we don't know what the result is, but it can
> > > modify upper bytes as well.  So perhaps only if current's byte
> > > masked1 && masked2 set the resulting byte to 0xff (unknown) iff
> > > the byte above it is 0 and 0, and set that resulting byte to 0xff too.
> > > Also, even for | we could instead of return NULL just set the resulting
> > > byte to 0xff if it is different, perhaps it will be masked off later on.
> > > Ok to handle that incrementally?
> >
> > Not sure if it is worth the trouble - the XOR handling sounds
> > straight forward at least.  But sure, the merging routine could
> > simply be conservatively correct here.
>
> This patch implements that (except that for + it just punts whenever
> both operand bytes aren't 0 like before).
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK if you can add a testcase that exercises this "feature".

Thanks,
Richard.

> 2021-11-25  Jakub Jelinek
>
> 	PR tree-optimization/103376
> 	* gimple-ssa-store-merging.c (perform_symbolic_merge): For
> 	BIT_IOR_EXPR, if masked1 && masked2 && masked1 != masked2, don't
> 	punt, but set the corresponding result byte to MARKER_BYTE_UNKNOWN.
> 	For BIT_XOR_EXPR similarly and if masked1 == masked2 and the
> 	byte isn't MARKER_BYTE_UNKNOWN, set the corresponding result byte to
> 	0.
>
> --- gcc/gimple-ssa-store-merging.c.jj	2021-11-24 09:54:37.684365460 +0100
> +++ gcc/gimple-ssa-store-merging.c	2021-11-24 11:18:54.46266 +0100
> @@ -556,6 +556,7 @@ perform_symbolic_merge (gimple *source_s
>    n->bytepos = n_start->bytepos;
>    n->type = n_start->type;
>    size = TYPE_PRECISION (n->type) / BITS_PER_UNIT;
> +  uint64_t res_n = n1->n | n2->n;
>
>    for (i = 0, mask = MARKER_MASK; i < size; i++, mask <<= BITS_PER_MARKER)
>      {
> @@ -563,12 +564,33 @@ perform_symbolic_merge (gimple *source_s
>
>        masked1 = n1->n & mask;
>        masked2 = n2->n & mask;
> -      /* For BIT_XOR_EXPR or PLUS_EXPR, at least one of masked1 and masked2
> -	 has to be 0, for BIT_IOR_EXPR x | x is still x.  */
> -      if (masked1 && masked2 && (code != BIT_IOR_EXPR || masked1 != masked2))
> -	return NULL;
> +      /* If at least one byte is 0, all of 0 | x == 0 ^ x == 0 + x == x.  */
> +      if (masked1 && masked2)
> +	{
> +	  /* + can carry into upper bits, just punt.  */
> +	  if (code == PLUS_EXPR)
> +	    return NULL;
> +	  /* x | x is still x.  */
> +	  if (code == BIT_IOR_EXPR && masked1 == masked2)
> +	    continue;
> +	  if (code == BIT_XOR_EXPR)
> +	    {
> +	      /* x ^ x is 0, but MARKER_BYTE_UNKNOWN stands for
> +		 unknown values and unknown ^ unknown is unknown.  */
> +	      if (masked1 == masked2
> +		  && masked1 != ((uint64_t) MARKER_BYTE_UNKNOWN
> +				 << i * BITS_PER_MARKER))
> +		{
> +		  res_n &= ~mask;
> +		  continue;
> +		}
> +	    }
> +	  /* Otherwise set the byte to unknown, it might still be
> +	     later masked off.  */
> +	  res_n |= mask;
> +	}
>      }
> -  n->n = n1->n | n2->n;
> +  n->n = res_n;
>    n->n_ops = n1->n_ops + n2->n_ops;
>
>    return source_stmt;
>
> 	Jakub
>

-- 
Richard Biener
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)
RE: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p simplification [PR103417]
Hi Jakub,

> -----Original Message-----
> From: Jakub Jelinek
> Sent: Thursday, November 25, 2021 8:19 AM
> To: Richard Biener
> Cc: Tamar Christina ; gcc-patches@gcc.gnu.org
> Subject: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p
> simplification [PR103417]
>
> Hi!
>
> The following testcase is miscompiled since the r12-5489-g0888d6bbe97e10
> changes.
> The simplification triggers on
> (x & 4294967040U) >= 0U
> and turns it into:
> x <= 255U
> which is incorrect, it should fold to 1 because unsigned >= 0U is always true
> and normally the
> /* Non-equality compare simplifications from fold_binary  */
> (if (wi::to_wide (cst) == min)
>    (if (cmp == GE_EXPR)
>     { constant_boolean_node (true, type); }) simplification folds that,
> but this simplification was done earlier.
>
> The simplification correctly doesn't include lt which has the same reason why
> it shouldn't be handled, we'll fold it to 0 elsewhere.

Yes, this was a bug; sorry, I'm not sure why I didn't catch it...

> But, IMNSHO while it isn't incorrect to handle le and gt there, it is
> unnecessary.  Because (x & cst) <= 0U and (x & cst) > 0U should never appear,
> again in
> /* Non-equality compare simplifications from fold_binary */ we have a
> simplification for it:
>    (if (cmp == LE_EXPR)
>     (eq @2 @1))
>    (if (cmp == GT_EXPR)
>     (ne @2 @1
> This is done for
> (cmp (convert?@2 @0) uniform_integer_cst_p@1) and so should be done
> for both integers and vectors.
> As the bitmask_inv_cst_vector_p simplification only handles eq and ne for
> signed types, I think it can be simplified to just following patch.

As I mentioned on the PR, I don't think LE and GT should be removed; the
patch is attempting to simplify the bitmask used because most vector ISAs
can create the simpler mask much more easily than the complex mask.

I.e. 0xFF00 is harder to create than 0xFF.  So while for scalar it
doesn't matter as much, it does for vector code.

Regards,
Tamar

> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> I wonder if (for cst (VECTOR_CST INTEGER_CST) is good for the best size of
> *-match.c, wouldn't accepting just CONSTANT_CLASS_P@1 and then just say in
> bitmask_inv_cst_vector_p return NULL_TREE if it isn't INTEGER_CST or
> VECTOR_CST?
>
> Also, without/with this patch I see on i686-linux (can be reproduced with
> RUNTESTFLAGS="--target_board=unix/-m32/-mno-sse dg.exp='bic-bitmask*
> signbit-2*'"
> too):
> FAIL: gcc.dg/bic-bitmask-10.c scan-tree-dump dce7 "<=\\s*.+{ 255,.+}"
> FAIL: gcc.dg/bic-bitmask-11.c scan-tree-dump dce7 ">\\s*.+{ 255,.+}"
> FAIL: gcc.dg/bic-bitmask-12.c scan-tree-dump dce7 "<=\\s*.+{ 255,.+}"
> FAIL: gcc.dg/bic-bitmask-2.c scan-tree-dump-times dce7 "<=\\s*.+{ 255,.+}" 1
> FAIL: gcc.dg/bic-bitmask-23.c (test for excess errors)
> FAIL: gcc.dg/bic-bitmask-23.c scan-tree-dump dce7 "<=\\s*.+{ 255, 15, 1,
> 65535 }"
> FAIL: gcc.dg/bic-bitmask-3.c scan-tree-dump-times dce7 "<=\\s*.+{ 255,.+}" 1
> FAIL: gcc.dg/bic-bitmask-4.c scan-tree-dump-times dce7 "=\\s*.+{ 1,.+}" 1
> FAIL: gcc.dg/bic-bitmask-5.c scan-tree-dump-times dce7 ">\\s*.+{ 255,.+}" 1
> FAIL: gcc.dg/bic-bitmask-6.c scan-tree-dump-times dce7 "<=\\s*.+{ 255,.+}" 1
> FAIL: gcc.dg/bic-bitmask-8.c scan-tree-dump-times dce7 ">\\s*.+{ 1,.+}" 1
> FAIL: gcc.dg/bic-bitmask-9.c scan-tree-dump dce7 "&\\s*.+{ 4294967290,.+}"
> FAIL: gcc.dg/signbit-2.c scan-tree-dump optimized "\\s+>\\s+{ 0(, 0)+ }"
> Those tests use vect_int effective target, but AFAIK that can be used only in
> *.dg/vect/ because it relies on vect.exp enabling options to support
> vectorization on the particular target (e.g. for i686-linux that is -msse2).
> I think there isn't other way to get the DEFAULT_VECTCFLAGS into
> dg-options other than having the test driven by vect.exp.
>
> And, finally, I've noticed incorrect formatting in the new
> bitmask_inv_cst_vector_p routine:
>   do {
>     if (idx > 0)
>       cst = vector_cst_elt (t, idx);
>     ...
>     builder.quick_push (newcst);
>   } while (++idx < nelts);
> It should be
>   do
>     {
>       i
RE: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p simplification [PR103417]
On Thu, 25 Nov 2021, Tamar Christina wrote:

> Hi Jakub,
>
> > -----Original Message-----
> > From: Jakub Jelinek
> > Sent: Thursday, November 25, 2021 8:19 AM
> > To: Richard Biener
> > Cc: Tamar Christina ; gcc-patches@gcc.gnu.org
> > Subject: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p
> > simplification [PR103417]
> >
> > Hi!
> >
> > The following testcase is miscompiled since the r12-5489-g0888d6bbe97e10
> > changes.
> > The simplification triggers on
> > (x & 4294967040U) >= 0U
> > and turns it into:
> > x <= 255U
> > which is incorrect, it should fold to 1 because unsigned >= 0U is always
> > true and normally the
> > /* Non-equality compare simplifications from fold_binary  */
> > (if (wi::to_wide (cst) == min)
> >    (if (cmp == GE_EXPR)
> >     { constant_boolean_node (true, type); }) simplification folds that,
> > but this simplification was done earlier.
> >
> > The simplification correctly doesn't include lt which has the same reason
> > why it shouldn't be handled, we'll fold it to 0 elsewhere.
>
> Yes this was a bug, sorry I'm not sure why I didn't catch it...
>
> > But, IMNSHO while it isn't incorrect to handle le and gt there, it is
> > unnecessary.  Because (x & cst) <= 0U and (x & cst) > 0U should never
> > appear, again in
> > /* Non-equality compare simplifications from fold_binary */ we have a
> > simplification for it:
> >    (if (cmp == LE_EXPR)
> >     (eq @2 @1))
> >    (if (cmp == GT_EXPR)
> >     (ne @2 @1
> > This is done for
> > (cmp (convert?@2 @0) uniform_integer_cst_p@1) and so should be done
> > for both integers and vectors.
> > As the bitmask_inv_cst_vector_p simplification only handles eq and ne for
> > signed types, I think it can be simplified to just following patch.

Note that would mean the transform should be ordered _after_ the above,
even if we retain it for vector le/gt.

> As I mentioned on the PR I don't think LE and GT should be removed, the
> patch is attempting to simplify the bitmask used because most vector ISAs
> can create the simpler mask much easier than the complex mask.
>
> I.e. 0xFF00 is harder to create than 0xFF.  So while for scalar it
> doesn't matter as much, it does for vector code.
>
> Regards,
> Tamar
>
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> >
> > I wonder if (for cst (VECTOR_CST INTEGER_CST) is good for the best size
> > of *-match.c, wouldn't accepting just CONSTANT_CLASS_P@1 and then just
> > say in bitmask_inv_cst_vector_p return NULL_TREE if it isn't INTEGER_CST
> > or VECTOR_CST?

In the end that should be recoverable by genmatch.  I do have some ideas
to improve it for size in this area, maybe during stage4.  Originally
genmatch was trying to optimize for matching speed, but now, with
honoring the ordering of patterns, that very much became secondary
(note re-ordering patterns in match.pd can also improve *-match.c size
greatly!  Maybe some script can try to brute-force the "optimal" order -
but note some pattern order matters ;))

> > Also, without/with this patch I see on i686-linux (can be reproduced with
> > RUNTESTFLAGS="--target_board=unix/-m32/-mno-sse dg.exp='bic-bitmask*
> > signbit-2*'"
> > too):
> > FAIL: gcc.dg/bic-bitmask-10.c scan-tree-dump dce7 "<=\\s*.+{ 255,.+}"
> > FAIL: gcc.dg/bic-bitmask-11.c scan-tree-dump dce7 ">\\s*.+{ 255,.+}"
> > FAIL: gcc.dg/bic-bitmask-12.c scan-tree-dump dce7 "<=\\s*.+{ 255,.+}"
> > FAIL: gcc.dg/bic-bitmask-2.c scan-tree-dump-times dce7 "<=\\s*.+{ 255,.+}" 1
> > FAIL: gcc.dg/bic-bitmask-23.c (test for excess errors)
> > FAIL: gcc.dg/bic-bitmask-23.c scan-tree-dump dce7 "<=\\s*.+{ 255, 15, 1,
> > 65535 }"
> > FAIL: gcc.dg/bic-bitmask-3.c scan-tree-dump-times dce7 "<=\\s*.+{ 255,.+}" 1
> > FAIL: gcc.dg/bic-bitmask-4.c scan-tree-dump-times dce7 "=\\s*.+{ 1,.+}" 1
> > FAIL: gcc.dg/bic-bitmask-5.c scan-tree-dump-times dce7 ">\\s*.+{ 255,.+}" 1
> > FAIL: gcc.dg/bic-bitmask-6.c scan-tree-dump-times dce7 "<=\\s*.+{ 255,.+}" 1
> > FAIL: gcc.dg/bic-bitmask-8.c scan-tree-dump-times dce7 ">\\s*.+{ 1,.+}" 1
> > FAIL: gcc.dg/bic-bitmask-9.c scan-tree-dump dce7 "&\\s*.+{ 4294967290,.+}"
> > FAIL: gcc.dg/signbit-2.c scan-tree-dump optimized "\\s+>\\s+{ 0(, 0)+ }"
> > Those tests use vect_int effective target, but AFAIK that can be used only
> > in *.dg/vect/ because it relies on vect.exp enabling options to support
> > vectorization on the particular target (e.g. for i686-linux that is -msse2).
> > I think there isn't other way to get the DEFAULT_VECTCFLAGS into
> > dg-options other than having the test driven by vect.exp.
> >
> > And, finally, I've noticed incorrect formatting in the new
> > bitmask_inv_cst_vector_p routine:
> >   do {
> >     if (idx > 0)
> >       cst = vector_cst_elt (t, idx);
> >     ...
> >     builder.quick_push (newcst);
> >   } while (++idx < nelts);
> > It should be
> >   do
> >     {
> >       i
Re: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p simplification [PR103417]
On Thu, Nov 25, 2021 at 08:23:50AM +0000, Tamar Christina wrote:
> > But, IMNSHO while it isn't incorrect to handle le and gt there, it is
> > unnecessary.  Because (x & cst) <= 0U and (x & cst) > 0U should never
> > appear, again in
> > /* Non-equality compare simplifications from fold_binary */ we have a
> > simplification for it:
> >    (if (cmp == LE_EXPR)
> >     (eq @2 @1))
> >    (if (cmp == GT_EXPR)
> >     (ne @2 @1
> > This is done for
> > (cmp (convert?@2 @0) uniform_integer_cst_p@1) and so should be done
> > for both integers and vectors.
> > As the bitmask_inv_cst_vector_p simplification only handles eq and ne for
> > signed types, I think it can be simplified to just following patch.
>
> As I mentioned on the PR I don't think LE and GT should be removed, the patch
> is attempting to simplify the bitmask used because most vector ISAs can create
> the simpler mask much easier than the complex mask.
>
> I.e. 0xFF00 is harder to create than 0xFF.  So while for scalar it doesn't
> matter as much, it does for vector code.

What I'm trying to explain is that you should never see those le or gt
cases with TYPE_UNSIGNED (especially when the simplification is moved
after those
/* Non-equality compare simplifications from fold_binary */
I've mentioned), because if you try:

typedef unsigned V __attribute__((vector_size (4)));

unsigned f1 (unsigned x) { unsigned z = 0; return x > z; }
unsigned f2 (unsigned x) { unsigned z = 0; return x <= z; }
V f3 (V x) { V z = (V) {}; return x > z; }
V f4 (V x) { V z = (V) {}; return x <= z; }

you'll see that at ccp1, when the constants propagate, this is simplified
using the rules I mentioned into x != 0U, x == 0U, x != (V) {} and
x == (V) {}.

The important rule of match.pd is composability; the simplifications
should rely on other simplifications and not repeat all their decisions,
because that makes the *match.c larger and more expensive (and a source
of extra possible bugs).

	Jakub
Re: [PATCH][RFC] middle-end/46476 - resurrect -Wunreachable-code
On Thu, 25 Nov 2021, Richard Biener wrote:

> On Wed, 24 Nov 2021, Michael Matz wrote:
>
> > Hello,
> >
> > On Wed, 24 Nov 2021, Richard Biener wrote:
> >
> > > >> +/* Unreachable code in if (0) block.  */
> > > >> +void baz(int *p)
> > > >> +{
> > > >> +  if (0)
> > > >> +    {
> > > >> +      return; /* { dg-bogus "not reachable" } */
> > > >
> > > > Hmm?  Why are you explicitly saying that warning here would be bogus?
> > >
> > > Because I don't think we want to warn here.  Such code is common from
> > > template instantiation or macro expansion.
> >
> > Like all code with a (const-propagated) explicit 'if (0)', which is of
> > course the reason why -Wunreachable-code is a challenge.
>
> OK, so I probably shouldn't have taken -Wunreachable-code but named
> it somehow differently.  We want to diagnose obvious programming
> mistakes, not (source code) optimization opportunities.  So
>
> int foo (int i)
> {
>   return i;
>   i += 1;
>   return i;
> }
>
> should be diagnosed, for example, but not so
>
> int foo (int i)
> {
>   if (USE_NOOP_FOO)
>     return i;
>   i += 1;
>   return i;
> }
>
> when compiling with -DUSE_NOOP_FOO=1.
>
> > IOW: I could accept your argument but then wonder why you want to warn
> > about the second statement of the guarded block.  The situation was:
> >
> >   if (0) {
> >     return;      // (1) don't warn here?
> >     whatever++;  // (2) but warn here?
>
> because, as said above, the whatever++ will never be reachable even if
> you change the condition in the if().  See my response to Martin where
> I said I think if (0) of a block is a good way to comment it out
> but keep it syntactically correct.
>
> >   }
> >
> > That seems even more confusing.  So you don't want to warn about
> > unreachable code (the 'return') but you do want to warn about unreachable
> > code within unreachable code (point (2) is unreachable because of the
> > if(0) and because of the return).  If your worry is macro/template
> > expansion resulting in if(0)'s then I don't see why you would only disable
> > warnings for some of the statements therein.
>
> The point is not to disable the warning for some statements therein
> but to avoid diagnosing following stmts.
>
> > It seems we are actually interested in code unreachable via fallthrough or
> > labels, not in all unreachable code, so maybe the warning is mis-named.
>
> Yes, that's definitely the case - I was too lazy to re-use the old
> option name here.  But I don't have a good name at hand; maybe clang
> has an option covering the cases I'm thinking about.
>
> Btw, the diagnostic spotted qsort_chk doing
>
>   if (CMP (i1, i2))
>     break;
>   else if (CMP (i2, i1))
>     return ERR2 (i1, i2);
>
> where ERR2 expands to a call to a noreturn void "returning"
> qsort_chk_error, so the 'return' stmt is not reachable.  Not exactly
> a bug but somewhat difficult to avoid the diagnostic for.  I suppose
> the pointless 'return' is to make it more visible that the loop
> terminates here (albeit we don't return normally).
>
> Likewise we diagnose (c_tree_equal):
>
>     default:
>       gcc_unreachable ();
>     }
>   /* We can get here with --disable-checking.  */
>   return false;
>
> where the 'return false' is never reachable.  The return was likely
> inserted to avoid very strange error paths when the unreachable
> falls through to some other random function.

It also finds this strange code in label_rtx_for_bb:

  /* Find the tree label if it is present.  */
  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
    {
      glabel *lab_stmt;

      lab_stmt = dyn_cast <glabel *> (gsi_stmt (gsi));
      if (!lab_stmt)
	break;

      lab = gimple_label_label (lab_stmt);
      if (DECL_NONLOCAL (lab))
	break;

      return jump_target_rtx (lab);
    }

diagnosing

/home/rguenther/src/trunk/gcc/cfgexpand.c:2476:60: error: statement is not reachable [-Werror=unreachable-code]
 2476 |   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
      |                                                   ~^~

indeed the loop looks pointless.  Unless the DECL_NONLOCAL case was
meant to continue;

Richard.
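For illustration, here is the loop with the speculated 'continue' (a
sketch only; whether that is the intended semantics is exactly the open
question above):

  /* Sketch only: skip non-local labels instead of stopping at them,
     so later labels in the block are still considered.  */
  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
    {
      glabel *lab_stmt = dyn_cast <glabel *> (gsi_stmt (gsi));
      if (!lab_stmt)
	break;

      lab = gimple_label_label (lab_stmt);
      if (DECL_NONLOCAL (lab))
	continue;

      return jump_target_rtx (lab);
    }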
RE: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p simplification [PR103417]
> -----Original Message-----
> From: Jakub Jelinek
> Sent: Thursday, November 25, 2021 8:39 AM
> To: Tamar Christina
> Cc: Richard Biener ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p
> simplification [PR103417]
>
> On Thu, Nov 25, 2021 at 08:23:50AM +0000, Tamar Christina wrote:
> > > But, IMNSHO while it isn't incorrect to handle le and gt there, it
> > > is unnecessary.  Because (x & cst) <= 0U and (x & cst) > 0U should
> > > never appear, again in
> > > /* Non-equality compare simplifications from fold_binary */ we have
> > > a simplification for it:
> > >    (if (cmp == LE_EXPR)
> > >     (eq @2 @1))
> > >    (if (cmp == GT_EXPR)
> > >     (ne @2 @1
> > > This is done for
> > > (cmp (convert?@2 @0) uniform_integer_cst_p@1) and so should be done
> > > for both integers and vectors.
> > > As the bitmask_inv_cst_vector_p simplification only handles eq and ne
> > > for signed types, I think it can be simplified to just following patch.
> >
> > As I mentioned on the PR I don't think LE and GT should be removed, the
> > patch is attempting to simplify the bitmask used because most vector
> > ISAs can create the simpler mask much easier than the complex mask.
> >
> > I.e. 0xFF00 is harder to create than 0xFF.  So while for scalar it
> > doesn't matter as much, it does for vector code.
>
> What I'm trying to explain is that you should never see those le or gt
> cases with TYPE_UNSIGNED (especially when the simplification is moved
> after those
> /* Non-equality compare simplifications from fold_binary */ I've
> mentioned), because if you try:
> typedef unsigned V __attribute__((vector_size (4)));
>
> unsigned f1 (unsigned x) { unsigned z = 0; return x > z; }
> unsigned f2 (unsigned x) { unsigned z = 0; return x <= z; }
> V f3 (V x) { V z = (V) {}; return x > z; }
> V f4 (V x) { V z = (V) {}; return x <= z; }
> you'll see that this is at ccp1, when the constants propagate, simplified
> using the rules I mentioned into x != 0U, x == 0U, x != (V) {} and
> x == (V) {}.

Ah I see, sorry I didn't see that rule before, you're right that if this
is ordered after it then they can be dropped.

Thanks,
Tamar

> The important rule of match.pd is composability, the simplifications
> should rely on other simplifications and not repeating all their
> decisions because that makes the *match.c larger and more expensive
> (and a source of extra possible bugs).
>
> 	Jakub
RE: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p simplification [PR103417]
> -----Original Message-----
> From: Jakub Jelinek
> Sent: Thursday, November 25, 2021 8:19 AM
> To: Richard Biener
> Cc: Tamar Christina ; gcc-patches@gcc.gnu.org
> Subject: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p
> simplification [PR103417]
>
> Hi!
>
> The following testcase is miscompiled since the r12-5489-g0888d6bbe97e10
> changes.
> The simplification triggers on
> (x & 4294967040U) >= 0U
> and turns it into:
> x <= 255U
> which is incorrect, it should fold to 1 because unsigned >= 0U is always true
> and normally the
> /* Non-equality compare simplifications from fold_binary  */
> (if (wi::to_wide (cst) == min)
>    (if (cmp == GE_EXPR)
>     { constant_boolean_node (true, type); }) simplification folds that, but
> this simplification was done earlier.
>
> The simplification correctly doesn't include lt which has the same reason why
> it shouldn't be handled, we'll fold it to 0 elsewhere.
>
> But, IMNSHO while it isn't incorrect to handle le and gt there, it is
> unnecessary.  Because (x & cst) <= 0U and (x & cst) > 0U should never appear,
> again in
> /* Non-equality compare simplifications from fold_binary */ we have a
> simplification for it:
>    (if (cmp == LE_EXPR)
>     (eq @2 @1))
>    (if (cmp == GT_EXPR)
>     (ne @2 @1
> This is done for
> (cmp (convert?@2 @0) uniform_integer_cst_p@1) and so should be done
> for both integers and vectors.
> As the bitmask_inv_cst_vector_p simplification only handles eq and ne for
> signed types, I think it can be simplified to just following patch.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> I wonder if (for cst (VECTOR_CST INTEGER_CST) is good for the best size of
> *-match.c, wouldn't accepting just CONSTANT_CLASS_P@1 and then just say in
> bitmask_inv_cst_vector_p return NULL_TREE if it isn't INTEGER_CST or
> VECTOR_CST?
>
> Also, without/with this patch I see on i686-linux (can be reproduced with
> RUNTESTFLAGS="--target_board=unix/-m32/-mno-sse dg.exp='bic-bitmask*
> signbit-2*'"
> too):
> FAIL: gcc.dg/bic-bitmask-10.c scan-tree-dump dce7 "<=\\s*.+{ 255,.+}"
> FAIL: gcc.dg/bic-bitmask-11.c scan-tree-dump dce7 ">\\s*.+{ 255,.+}"
> FAIL: gcc.dg/bic-bitmask-12.c scan-tree-dump dce7 "<=\\s*.+{ 255,.+}"
> FAIL: gcc.dg/bic-bitmask-2.c scan-tree-dump-times dce7 "<=\\s*.+{ 255,.+}" 1
> FAIL: gcc.dg/bic-bitmask-23.c (test for excess errors)
> FAIL: gcc.dg/bic-bitmask-23.c scan-tree-dump dce7 "<=\\s*.+{ 255, 15, 1,
> 65535 }"
> FAIL: gcc.dg/bic-bitmask-3.c scan-tree-dump-times dce7 "<=\\s*.+{ 255,.+}" 1
> FAIL: gcc.dg/bic-bitmask-4.c scan-tree-dump-times dce7 "=\\s*.+{ 1,.+}" 1
> FAIL: gcc.dg/bic-bitmask-5.c scan-tree-dump-times dce7 ">\\s*.+{ 255,.+}" 1
> FAIL: gcc.dg/bic-bitmask-6.c scan-tree-dump-times dce7 "<=\\s*.+{ 255,.+}" 1
> FAIL: gcc.dg/bic-bitmask-8.c scan-tree-dump-times dce7 ">\\s*.+{ 1,.+}" 1
> FAIL: gcc.dg/bic-bitmask-9.c scan-tree-dump dce7 "&\\s*.+{ 4294967290,.+}"
> FAIL: gcc.dg/signbit-2.c scan-tree-dump optimized "\\s+>\\s+{ 0(, 0)+ }"
> Those tests use vect_int effective target, but AFAIK that can be used only in
> *.dg/vect/ because it relies on vect.exp enabling options to support
> vectorization on the particular target (e.g. for i686-linux that is -msse2).
> I think there isn't other way to get the DEFAULT_VECTCFLAGS into
> dg-options other than having the test driven by vect.exp.

Yeah, I now see that vect_int is different from some of the other
effective target checks like the SVE one.  I'll move the ones testing
the vector code to vect and leave the scalars where they are.

Thanks,
Tamar

> And, finally, I've noticed incorrect formatting in the new
> bitmask_inv_cst_vector_p routine:
>   do {
>     if (idx > 0)
>       cst = vector_cst_elt (t, idx);
>     ...
>     builder.quick_push (newcst);
>   } while (++idx < nelts);
> It should be
>   do
>     {
>       if (idx > 0)
>         cst = vector_cst_elt (t, idx);
>       ...
>       builder.quick_push (newcst);
>     }
>   while (++idx < nelts);
>
> 2021-11-25  Jakub Jelinek
>
> 	PR tree-optimization/103417
> 	* match.pd ((X & Y) CMP 0): Only handle eq and ne.  Commonalize
> 	common tests.
>
> 	* gcc.c-torture/execute/pr103417.c: New test.
>
> --- gcc/match.pd.jj	2021-11-24 11:46:03.191918052 +0100
> +++ gcc/match.pd	2021-11-24 22:33:43.852575772 +0100
> @@ -5214,20 +5214,16 @@ (define_operator_list SYNC_FETCH_AND_AND
>  /* Transform comparisons of the form (X & Y) CMP 0 to X CMP2 Z
>     where ~Y + 1 == pow2 and Z = ~Y.  */
>  (for cst (VECTOR_CST INTEGER_CST)
> - (for cmp (le eq ne ge gt)
> -      icmp (le le gt le gt)
> -  (simplify
> -   (cmp (bit_and:c@2 @0 cst@1) integer_zerop)
> -   (with { tree csts = bitmask_inv_cst_vector_p (@1); }
> -    (switch
> -     (if (csts && TYPE_UNSIGNED (TREE_TYPE (@1))
> -	  && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)))
> -      (icmp @0 { csts; }))
> -     (if (csts && !
RE: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p simplification [PR103417]
On Thu, 25 Nov 2021, Tamar Christina wrote:

> > -----Original Message-----
> > From: Jakub Jelinek
> > Sent: Thursday, November 25, 2021 8:39 AM
> > To: Tamar Christina
> > Cc: Richard Biener ; gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p
> > simplification [PR103417]
> >
> > On Thu, Nov 25, 2021 at 08:23:50AM +0000, Tamar Christina wrote:
> > > > But, IMNSHO while it isn't incorrect to handle le and gt there, it
> > > > is unnecessary.  Because (x & cst) <= 0U and (x & cst) > 0U should
> > > > never appear, again in
> > > > /* Non-equality compare simplifications from fold_binary */ we have
> > > > a simplification for it:
> > > >    (if (cmp == LE_EXPR)
> > > >     (eq @2 @1))
> > > >    (if (cmp == GT_EXPR)
> > > >     (ne @2 @1
> > > > This is done for
> > > > (cmp (convert?@2 @0) uniform_integer_cst_p@1) and so should be
> > > > done for both integers and vectors.
> > > > As the bitmask_inv_cst_vector_p simplification only handles eq and
> > > > ne for signed types, I think it can be simplified to just following
> > > > patch.
> > >
> > > As I mentioned on the PR I don't think LE and GT should be removed,
> > > the patch is attempting to simplify the bitmask used because most
> > > vector ISAs can create the simpler mask much easier than the complex
> > > mask.
> > >
> > > I.e. 0xFF00 is harder to create than 0xFF.  So while for scalar it
> > > doesn't matter as much, it does for vector code.
> >
> > What I'm trying to explain is that you should never see those le or gt
> > cases with TYPE_UNSIGNED (especially when the simplification is moved
> > after those
> > /* Non-equality compare simplifications from fold_binary */ I've
> > mentioned), because if you try:
> > typedef unsigned V __attribute__((vector_size (4)));
> >
> > unsigned f1 (unsigned x) { unsigned z = 0; return x > z; }
> > unsigned f2 (unsigned x) { unsigned z = 0; return x <= z; }
> > V f3 (V x) { V z = (V) {}; return x > z; }
> > V f4 (V x) { V z = (V) {}; return x <= z; }
> > you'll see that this is at ccp1, when the constants propagate, simplified
> > using the rules I mentioned into x != 0U, x == 0U, x != (V) {} and
> > x == (V) {}.
>
> Ah I see, sorry I didn't see that rule before, you're right that if this
> is ordered after it then they can be dropped.

So the patch is OK, possibly with re-ordering the matches.

Thanks,
Richard.
Re: [PATCH 2/2][GCC] arm: Declare MVE types internally via pragma
Changes from original patch:
1. Merged test_redef_* test files into one.
2. Encapsulated contents of arm-mve-builtins.h in namespace arm_mve
   (missed in initial patch).
3. Added extern declarations for scalar_types and acle_vector types to
   arm-mve-builtins.h (missed in initial patch).
4. Added arm-mve-builtins.(cc|h) to gt_targets for arm-*-*-* (missed in
   initial patch).
5. Added include for gt-arm-mve-builtins.h to arm-mve-builtins.cc
   (missed in initial patch).
6. Removed explicit initialisation of handle_arm_mve_types_p as it is
   unneeded.

---

This patch moves the implementation of MVE ACLE types from
arm_mve_types.h to inside GCC via a new pragma, which replaces the
prior type definitions.  This allows the types to be used internally
for intrinsic function definitions.

Bootstrapped and regression tested on arm-none-linux-gnuabihf, and
regression tested on arm-eabi -- no issues.

Thanks,
Murray

gcc/ChangeLog:

	* config.gcc: Add arm-mve-builtins.o to extra_objs for arm-*-*-*
	targets.
	* config/arm/arm-c.c (arm_pragma_arm): Handle new pragma.
	(arm_register_target_pragmas): Register new pragma.
	* config/arm/arm-protos.h: Add arm_mve namespace and declare
	arm_handle_mve_types_h.
	* config/arm/arm_mve_types.h: Replace MVE type definitions with
	new pragma.
	* config/arm/t-arm: Add arm-mve-builtins.o target.
	* config/arm/arm-mve-builtins.cc: New file.
	* config/arm/arm-mve-builtins.def: New file.
	* config/arm/arm-mve-builtins.h: New file.

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/mve.exp: Add new subdirectories.
	* gcc.target/arm/mve/general-c/type_redef_1.c: New test.
	* gcc.target/arm/mve/general/double_pragmas_1.c: New test.
	* gcc.target/arm/mve/general/nomve_1.c: New test.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index edd12655c4a1e6feb09aabbee77eacd9f66b4171..0aa386403112eff80cb5071fa6ff2fdbe610c9fc 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -352,14 +352,14 @@ arc*-*-*)
 	;;
 arm*-*-*)
 	cpu_type=arm
-	extra_objs="arm-builtins.o aarch-common.o"
+	extra_objs="arm-builtins.o arm-mve-builtins.o aarch-common.o"
 	extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h arm_cmse.h arm_bf16.h arm_mve_types.h arm_mve.h arm_cde.h"
 	target_type_format_char='%'
 	c_target_objs="arm-c.o"
 	cxx_target_objs="arm-c.o"
 	d_target_objs="arm-d.o"
 	extra_options="${extra_options} arm/arm-tables.opt"
-	target_gtfiles="\$(srcdir)/config/arm/arm-builtins.c"
+	target_gtfiles="\$(srcdir)/config/arm/arm-builtins.c \$(srcdir)/config/arm/arm-mve-builtins.h \$(srcdir)/config/arm/arm-mve-builtins.cc"
 	;;
 avr-*-*)
 	cpu_type=avr
diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index cc7901bca8dc9c5c27ed6afc5bc26afd42689e6d..d1414f6e0e1c2bd0a7364b837c16adf493221376 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -28,6 +28,7 @@
 #include "c-family/c-pragma.h"
 #include "stringpool.h"
 #include "arm-builtins.h"
+#include "arm-protos.h"

 tree
 arm_resolve_cde_builtin (location_t loc, tree fndecl, void *arglist)
@@ -129,6 +130,24 @@ arm_resolve_cde_builtin (location_t loc, tree fndecl, void *arglist)
   return call_expr;
 }

+/* Implement "#pragma GCC arm".  */
+static void
+arm_pragma_arm (cpp_reader *)
+{
+  tree x;
+  if (pragma_lex (&x) != CPP_STRING)
+    {
+      error ("%<#pragma GCC arm%> requires a string parameter");
+      return;
+    }
+
+  const char *name = TREE_STRING_POINTER (x);
+  if (strcmp (name, "arm_mve_types.h") == 0)
+    arm_mve::handle_arm_mve_types_h ();
+  else
+    error ("unknown %<#pragma GCC arm%> option %qs", name);
+}
+
 /* Implement TARGET_RESOLVE_OVERLOADED_BUILTIN.  This is currently only
    used for the MVE related builtins for the CDE extension.
    Here we ensure the type of arguments is such that the size is correct, and
@@ -476,6 +495,8 @@ arm_register_target_pragmas (void)
   targetm.target_option.pragma_parse = arm_pragma_target_parse;
   targetm.resolve_overloaded_builtin = arm_resolve_overloaded_builtin;

+  c_register_pragma ("GCC", "arm", arm_pragma_arm);
+
 #ifdef REGISTER_SUBTARGET_PRAGMAS
   REGISTER_SUBTARGET_PRAGMAS ();
 #endif
diff --git a/gcc/config/arm/arm-mve-builtins.cc b/gcc/config/arm/arm-mve-builtins.cc
new file mode 100644
index 0000000000000000000000000000000000000000..99ddc8d49aad39e057c1c0d349c6c02c278553d6
--- /dev/null
+++ b/gcc/config/arm/arm-mve-builtins.cc
@@ -0,0 +1,196 @@
+/* ACLE support for Arm MVE
+   Copyright (C) 2021 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
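For context, a sketch of what the header side reduces to after this
change, based on the description above (the exact arm_mve_types.h hunk
is in the full patch, which is truncated here):

  /* arm_mve_types.h, in essence: the hand-written MVE type definitions
     are replaced by a pragma that the compiler handles internally via
     arm_pragma_arm above.  Sketch, not the exact hunk.  */
  #pragma GCC arm "arm_mve_types.h"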
Re: [PATCH] bswap: Improve perform_symbolic_merge [PR103376]
On Thu, Nov 25, 2021 at 09:21:37AM +0100, Richard Biener wrote:
> OK if you can add a testcase that exercises this "feature".

Sure, that is easy.  Here is what I've committed.
f2 tests the x | x = x handling in it, f3 tests x | y = unknown instead
of punting, f4 tests x ^ x = 0 and f5 tests x ^ y = unknown.
Without the patch only f2 is optimized to __builtin_bswap32, with the
patch all of them.

2021-11-25  Jakub Jelinek

	PR tree-optimization/103376
	* gimple-ssa-store-merging.c (perform_symbolic_merge): For
	BIT_IOR_EXPR, if masked1 && masked2 && masked1 != masked2, don't
	punt, but set the corresponding result byte to MARKER_BYTE_UNKNOWN.
	For BIT_XOR_EXPR similarly and if masked1 == masked2 and the
	byte isn't MARKER_BYTE_UNKNOWN, set the corresponding result byte to
	0.

	* gcc.dg/optimize-bswapsi-7.c: New test.

--- gcc/gimple-ssa-store-merging.c.jj	2021-11-24 09:54:37.684365460 +0100
+++ gcc/gimple-ssa-store-merging.c	2021-11-24 11:18:54.46266 +0100
@@ -556,6 +556,7 @@ perform_symbolic_merge (gimple *source_s
   n->bytepos = n_start->bytepos;
   n->type = n_start->type;
   size = TYPE_PRECISION (n->type) / BITS_PER_UNIT;
+  uint64_t res_n = n1->n | n2->n;

   for (i = 0, mask = MARKER_MASK; i < size; i++, mask <<= BITS_PER_MARKER)
     {
@@ -563,12 +564,33 @@ perform_symbolic_merge (gimple *source_s

       masked1 = n1->n & mask;
       masked2 = n2->n & mask;
-      /* For BIT_XOR_EXPR or PLUS_EXPR, at least one of masked1 and masked2
-	 has to be 0, for BIT_IOR_EXPR x | x is still x.  */
-      if (masked1 && masked2 && (code != BIT_IOR_EXPR || masked1 != masked2))
-	return NULL;
+      /* If at least one byte is 0, all of 0 | x == 0 ^ x == 0 + x == x.  */
+      if (masked1 && masked2)
+	{
+	  /* + can carry into upper bits, just punt.  */
+	  if (code == PLUS_EXPR)
+	    return NULL;
+	  /* x | x is still x.  */
+	  if (code == BIT_IOR_EXPR && masked1 == masked2)
+	    continue;
+	  if (code == BIT_XOR_EXPR)
+	    {
+	      /* x ^ x is 0, but MARKER_BYTE_UNKNOWN stands for
+		 unknown values and unknown ^ unknown is unknown.  */
+	      if (masked1 == masked2
+		  && masked1 != ((uint64_t) MARKER_BYTE_UNKNOWN
+				 << i * BITS_PER_MARKER))
+		{
+		  res_n &= ~mask;
+		  continue;
+		}
+	    }
+	  /* Otherwise set the byte to unknown, it might still be
+	     later masked off.  */
+	  res_n |= mask;
+	}
     }
-  n->n = n1->n | n2->n;
+  n->n = res_n;
   n->n_ops = n1->n_ops + n2->n_ops;

   return source_stmt;
--- gcc/testsuite/gcc.dg/optimize-bswapsi-7.c.jj	2021-11-25 10:36:03.847529686 +0100
+++ gcc/testsuite/gcc.dg/optimize-bswapsi-7.c	2021-11-25 10:35:46.522778192 +0100
@@ -0,0 +1,37 @@
+/* PR tree-optimization/103376 */
+/* { dg-do compile } */
+/* { dg-require-effective-target bswap } */
+/* { dg-options "-O2 -fno-tree-vectorize -fdump-tree-optimized" } */
+/* { dg-additional-options "-march=z900" { target s390-*-* } } */
+
+static unsigned int
+f1 (unsigned int x)
+{
+  return (x << 24) | (x >> 8);
+}
+
+unsigned int
+f2 (unsigned *p)
+{
+  return ((f1 (p[0]) | (p[0] >> 8)) & 0xff00U) | (p[0] >> 24) | ((p[0] & 0xff00U) << 8) | ((p[0] & 0xffU) >> 8);
+}
+
+unsigned int
+f3 (unsigned *p)
+{
+  return ((f1 (p[0]) | (p[0] & 0x00ff00ffU)) & 0xff00ff00U) | (f1 (f1 (f1 (p[0]))) & 0x00ff00ffU);
+}
+
+unsigned int
+f4 (unsigned *p)
+{
+  return (f1 (p[0]) ^ (p[0] >> 8)) ^ (p[0] >> 24) ^ ((p[0] & 0xff00U) << 8) ^ ((p[0] & 0xffU) >> 8);
+}
+
+unsigned int
+f5 (unsigned *p)
+{
+  return (((f1 (p[0]) | (p[0] >> 16)) ^ (p[0] >> 8)) & 0xU) ^ (p[0] >> 24) ^ ((p[0] & 0xff00U) << 8) ^ ((p[0] & 0xffU) >> 8);
+}
+
+/* { dg-final { scan-tree-dump-times "= __builtin_bswap32 \\\(" 4 "optimized" } } */

	Jakub
Re: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p simplification [PR103417]
On Thu, Nov 25, 2021 at 10:17:52AM +0100, Richard Biener wrote:
> > Ah I see, sorry I didn't see that rule before, you're right that if
> > this is ordered after it then they can be dropped.
>
> So the patch is OK, possibly with re-ordering the matches.

I've committed the patch as is, because it has been tested that way and
I'd like to avoid dups of that PR flowing in.
Even when not reordered, the new earlier match.pd simplification will
not trigger for the lt le gt ge cases anymore, and the later old
simplifications will trigger; I'd expect that after the latter
simplification the earlier should trigger again because the IL changed,
no?

Tamar, can you handle the reordering together with the testsuite changes
(and perhaps formatting fixes in the tree.c routine)?

	Jakub
[ping][vect-patterns] Refactor widen_plus/widen_minus as internal_fns
Just a quick ping to check this hasn't been forgotten.

> -----Original Message-----
> From: Joel Hutton
> Sent: 12 November 2021 11:42
> To: Richard Biener
> Cc: gcc-patches@gcc.gnu.org; Richard Sandiford
> Subject: RE: [vect-patterns] Refactor widen_plus/widen_minus as
> internal_fns
>
> > please use #define INCLUDE_MAP before the system.h include instead.
> > Is it really necessary to build a new std::map for each optab lookup?!
> > That looks quite ugly and inefficient.  We'd usually - if necessary at
> > all - build an auto_vec<...> and .sort () and .bsearch () it.
>
> Ok, I'll rework this part.  In the meantime, to address your other comment.
>
> > I'm not sure I understand DEF_INTERNAL_OPTAB_MULTI_FN, neither this
> > cover letter nor the patch ChangeLog explains anything.
>
> I'll attempt to clarify; if this makes things clearer I can include this
> in the commit message of the respun patch:
>
> DEF_INTERNAL_OPTAB_MULTI_FN is like DEF_INTERNAL_OPTAB_FN except it
> provides convenience wrappers for defining conversions that require a
> hi/lo split, like widening and narrowing operations.  Each definition
> for <NAME> will require an optab named <OPTAB> and two other optabs
> that you specify for signed and unsigned.  The hi/lo pair is necessary
> because the widening operations take n narrow elements as inputs and
> return n/2 wide elements as outputs.  The 'lo' operation operates on
> the first n/2 elements of input.  The 'hi' operation operates on the
> second n/2 elements of input.  Defining an internal_fn along with hi/lo
> variations allows a single internal function to be returned from a
> vect_recog function that will later be expanded to hi/lo.
>
> DEF_INTERNAL_OPTAB_MULTI_FN is used in internal-fn.def to register a
> widening internal_fn.  It is defined differently in different places,
> and internal-fn.def is sourced from those places so the parameters
> given can be reused:
> internal-fn.c: defined to expand to hi/lo signed/unsigned optabs, later
> defined to generate the 'expand_' functions for the hi/lo versions of
> the fn.
> internal-fn.def: defined to invoke DEF_INTERNAL_OPTAB_FN for the
> original and hi/lo variants of the internal_fn.
>
> For example:
> IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO
> for aarch64: IFN_VEC_WIDEN_PLUS_HI -> vec_widen_addl_hi_<mode>
>                -> (u/s)addl2
>              IFN_VEC_WIDEN_PLUS_LO -> vec_widen_addl_lo_<mode>
>                -> (u/s)addl
>
> This gives the same functionality as the previous
> WIDEN_PLUS/WIDEN_MINUS tree codes, which are expanded into
> VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.
>
> Let me know if I'm not expressing this clearly.
>
> Thanks,
> Joel
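As a purely illustrative scalar model of the hi/lo semantics described
above (not the actual expansion code; element count chosen for the
example):

  /* Scalar model of VEC_WIDEN_PLUS_LO/_HI for n = 8 narrow elements:
     LO widens and adds elements 0..3, HI elements 4..7.  The real
     optabs operate on vector modes; this is illustration only.  */
  void
  widen_plus_lo (const signed char *a, const signed char *b, short *out)
  {
    for (int i = 0; i < 4; i++)
      out[i] = (short) a[i] + (short) b[i];
  }

  void
  widen_plus_hi (const signed char *a, const signed char *b, short *out)
  {
    for (int i = 0; i < 4; i++)
      out[i] = (short) a[i + 4] + (short) b[i + 4];
  }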
Re: [PATCH] introduce predicate analysis class
On Mon, Aug 30, 2021 at 10:06 PM Martin Sebor via Gcc-patches wrote:
>
> The predicate analysis subset of the tree-ssa-uninit pass isn't
> necessarily specific to the detection of uninitialized reads.
> Suitably parameterized, the same core logic could be used in
> other warning passes to improve their S/N ratio, or issue more
> nuanced diagnostics (e.g., when an invalid access cannot be
> ruled out but also need not in reality be unavoidable, issue
> a "may be invalid" type of warning rather than "is invalid").
>
> Separating the predicate analysis logic from the uninitialized
> pass and exposing a narrow API should also make it easier to
> understand and evolve each part independently of the other,
> or replace one with a better implementation without modifying
> the other.(*)
>
> As the first step in this direction, the attached patch extracts
> the predicate analysis logic out of the pass, turns the interface
> into public class members, and hides the internals in either
> private members or static functions defined in a new source file.
> (**)
>
> The changes should have no externally observable effect (i.e.,
> should cause no changes in warnings), except on the contents of
> the uninitialized dump.  While making the changes I enhanced
> the dumps to help me follow the logic.  Turning some previously
> free-standing functions into members involved changing their
> signatures and adjusting their callers.  While making these
> changes I also renamed some of them as well some variables for
> improved clarity.  Finally, I moved declarations of locals
> closer to their point of initialization.
>
> Tested on x86_64-linux.  Besides the usual bootstrap/regtest
> I also tentatively verified the generality of the new class
> interfaces by making use of it in -Warray-bounds.  Besides there,
> I'd like to make use of it in the new gimple-ssa-warn-access pass
> and, longer term, any other flow-sensitive warnings that might
> benefit from it.

This changed can_chain_union_be_invalidated_p from

  for (size_t i = 0; i < uninit_pred.length (); ++i)
    {
      pred_chain c = uninit_pred[i];
      size_t j;
      for (j = 0; j < c.length (); ++j)
	if (can_one_predicate_be_invalidated_p (c[j], use_guard))
	  break;

      /* If we were unable to invalidate any predicate in C, then there
	 is a viable path from entry to the PHI where the PHI takes
	 an uninitialized value and continues to a use of the PHI.  */
      if (j == c.length ())
	return false;
    }
  return true;

to

  for (unsigned i = 0; i < preds.length (); ++i)
    {
      const pred_chain &chain = preds[i];
      for (unsigned j = 0; j < chain.length (); ++j)
	if (can_be_invalidated_p (chain[j], guard))
	  return true;

      /* If we were unable to invalidate any predicate in C, then there
	 is a viable path from entry to the PHI where the PHI takes
	 an interesting value and continues to a use of the PHI.  */
      return false;
    }
  return true;

which isn't semantically equivalent (it also uses overloading to confuse
me).  In particular the old code checked whether an invalidation can
happen for _each_ predicate chain in 'preds', while the new one just
checks preds[0], so the loop is pointless.  Caught by -Wunreachable-code
complaining about the unreachable ++i.

Martin, was that change intended?

Richard.

> Martin
>
> [*] A review of open -Wuninitialized bugs I did while working
> on this project made me aware of a number of opportunities to
> improve the analyzer to reduce the number of false positives
> -Wmaybe-uninitiailzed suffers from.
>
> [**] The class isn't fully general and, like the uninit pass,
> only works with PHI nodes.  I plan to generalize it to compute
> the set of predicates between any two basic blocks.
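For reference, a sketch of the new-style code restructured to keep the
old semantics, assuming that was the intent (names taken from the
snippets above):

  /* Sketch only: require every chain in PREDS to contain at least one
     invalidatable predicate, as the old code did.  */
  for (unsigned i = 0; i < preds.length (); ++i)
    {
      const pred_chain &chain = preds[i];
      unsigned j;
      for (j = 0; j < chain.length (); ++j)
	if (can_be_invalidated_p (chain[j], guard))
	  break;

      /* No predicate in CHAIN could be invalidated, so there is a
	 viable path through it.  */
      if (j == chain.length ())
	return false;
    }
  return true;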
Re: [PATCH] Fix typo in r12-5486.
On Thu, Nov 25, 2021 at 9:00 AM liuhongt via Gcc-patches wrote:
>
> TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@2)) is supposed to
> check an integer type, not a pointer type, so use the second parameter
> instead.
>
> i.e. the first parameter is VPTR, the second parameter is I4:
>
> 582	DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_FETCH_OR_4,
> 583			  "__atomic_fetch_or_4",
> 584			  BT_FN_I4_VPTR_I4_INT, ATTR_NOTHROWCALL_LEAF_LIST)
>
> Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> Failed testcases in PR are verified.
> Ok for trunk?

OK.

> gcc/ChangeLog:
>
> 	PR middle-end/103419
> 	* match.pd: Fix typo, use the type of second parameter, not
> 	first one.
> ---
>  gcc/match.pd | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 5adcd6bd02c..09c7ce749dc 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -4053,7 +4053,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>      }
>    (if (ibit == ibit2
>         && ibit >= 0
> -       && TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@2))
> +       && TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@0))
>
>  (match (nop_atomic_bit_test_and_p @0 @1 @3)
>   (bit_and (convert?@3 (SYNC_FETCH_OR_XOR_N @2 INTEGER_CST@0))
> @@ -4064,21 +4064,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>      }
>    (if (ibit == ibit2
>         && ibit >= 0
> -       && TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@2))
> +       && TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@0))
>
>  (match (nop_atomic_bit_test_and_p @0 @0 @4)
>   (bit_and:c
>    (convert1?@4
>     (ATOMIC_FETCH_OR_XOR_N @2 (nop_convert? (lshift@0 integer_onep@5 @6)) @3))
>    (convert2? @0))
> - (if (TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@2)
> + (if (TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@0)
>
>  (match (nop_atomic_bit_test_and_p @0 @0 @4)
>   (bit_and:c
>    (convert1?@4
>     (SYNC_FETCH_OR_XOR_N @2 (nop_convert? (lshift@0 integer_onep@3 @5
>    (convert2? @0))
> - (if (TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@2)
> + (if (TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@0)
>
>  (match (nop_atomic_bit_test_and_p @0 @1 @3)
>   (bit_and@4 (convert?@3 (ATOMIC_FETCH_AND_N @2 INTEGER_CST@0 @5))
> @@ -4090,7 +4090,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>      }
>    (if (ibit == ibit2
>         && ibit >= 0
> -       && TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@2))
> +       && TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@0))
>
>  (match (nop_atomic_bit_test_and_p @0 @1 @3)
>   (bit_and@4
> @@ -4103,21 +4103,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>      }
>    (if (ibit == ibit2
>         && ibit >= 0
> -       && TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@2))
> +       && TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@0))
>
>  (match (nop_atomic_bit_test_and_p @4 @0 @3)
>   (bit_and:c
>    (convert1?@3
>     (ATOMIC_FETCH_AND_N @2 (nop_convert?@4 (bit_not (lshift@0 integer_onep@6 @7))) @5))
>    (convert2? @0))
> - (if (TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@2)
> + (if (TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@4)
>
>  (match (nop_atomic_bit_test_and_p @4 @0 @3)
>   (bit_and:c
>    (convert1?@3
>     (SYNC_FETCH_AND_AND_N @2 (nop_convert?@4 (bit_not (lshift@0 integer_onep@6 @7)
>    (convert2? @0))
> - (if (TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@2)
> + (if (TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@4)
>
>  #endif
>
> --
> 2.18.1
>
Re: [PATCH] Loop unswitching: support gswitch statements.
On Wed, Nov 24, 2021 at 9:00 AM Richard Biener wrote: > > On Tue, Nov 23, 2021 at 5:36 PM Martin Liška wrote: > > > > On 11/23/21 16:20, Martin Liška wrote: > > > Sure, so for e.g. case 1 ... 5 we would need to create a new > > > unswitch_predicate > > > with 1 <= index && index <= 5 tree predicate (and the corresponding > > > irange range). > > > Later once we unswitch on it, we should use a special unreachable_flag > > > that will > > > be used for marking of dead edges (similarly how we fold gconds to > > > boolean_{false/true}_node. > > > Does it make sense? > > > > I have thought about it more and it's not enough. What we really want is > > having a irange > > for *each edge* (2 for gconds and multiple for gswitchs). Once we select a > > unswitch_predicate, > > then we need to fold_range in true/false loop all these iranges. Doing that > > we can handle situations like: > > > > if (index < 1) > > do_something1 > > > > if (index > 2) > > do_something2 > > > > switch (index) > > case 1 ... 2: > > do_something; > > ... > > > > as seen the once we unswitch on 'index < 1' and 'index > 2', then the first > > case will be taken in the false_edge > > of 'index > 2' loop unswitching. > > Hmm. I'm not sure it needs to be this complicated. We're basically > evaluating ranges/predicates based > on a fixed set of versioning predicates. Your implementation created > "predicates" for the to be simplified > conditions but in the end we like to evaluate the actual stmt to > figure the taken/not taken edges. IIRC > elsewhere Andrew showed a snipped on how to evaluate a stmt with a > given range - not sure if that > was useful enough. So what I think would be nice if we could somehow > use rangers path query > without an actual CFG. So we virtuall have > > if (versioning-predicate1) > if (versioning-predicate2) >; >else > for (;;) // out current loop > { > ... > if (condition) > ; > ... > switch (var) > { > ... > } > } > > and versioning-predicate1 and versioning-predicate2 are not in the IL. > What we'd like > to do is seed the path query with a "virtual" path through the two > predicates to the > entry of the loop and compute_ranges based on those. Then we like to > use range_of_stmt on 'if (condition)' and 'switch (var)' to determine > not taken edges. Huh, that's an interesting idea. We could definitely adapt path_range_query to work with an artificial sequence of blocks, but it would need some surgery. Off the top of my head: a) The phi handling code looks for specific edges in the path (both for intra path ranges and for relations inherent in PHIs). b) The exported ranges between blocks in the path, probably needs some massaging. c) compute_outgoing_relations would need some work as you mention below... > Looking somewhat at the sources it seems like we "simply" need to do what > compute_outgoing_relations does - unfortunately the code lacks comments > so I have no idea what jt_fur_source src (...).register_outgoing_edges does > ... fur_source is an abstraction for operands to the folding mechanism: // Source of all operands for fold_using_range and gori_compute. // It abstracts out the source of an operand so it can come from a stmt or // and edge or anywhere a derived class of fur_source wants. // The default simply picks up ranges from the current range_query. class fur_source { } When passed to register_outgoing_edges, it registers outgoing relations out of a conditional. I pass it the known outgoing edge out of the conditional, so only the relational on that edge is recorded. 
I have overloaded fur_source into a path-specific jt_fur_source that uses
a path_oracle to register relations as they would occur along a path.
Once register_outgoing_edges is called on each outgoing edge between
blocks in a path, the relations will have been set and can be seen by
range_of_stmt:

path_range_query::range_of_stmt (irange &r, gimple *stmt, tree)
{
  ...
  // If resolving unknowns, fold the statement making use of any
  // relations along the path.
  if (m_resolve)
    {
      fold_using_range f;
      jt_fur_source src (stmt, this, &m_ranger->gori (), m_path);
      if (!f.fold_stmt (r, stmt, src))
	r.set_varying (type);
    }
  ...
}

register_outgoing_edges would probably have to be adjusted for your
CFG-less paths, and maybe the path_oracle (Andrew??).

My apologies.  The jt_fur_source is not only confusingly named after
"jump_thread", but it is also the least obvious part of the solver.
There are some comments for jt_fur_source, but its use could benefit
from better comments throughout.  Let's see if I have some time before
my leave to document things better.

Aldy

> Anyway, for now manually simplifying things is fine but I probably would still
> stick to a basic interface that marks not taken outgoing edges of a stmt based
> on the set of version
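For orientation, a minimal sketch of how a client could drive the path
solver described above.  The entry points (compute_ranges, range_of_stmt)
appear in the snippets in this thread, but the helper itself and the
exact argument types are my illustrative assumptions, not the actual GCC
API:

// Hedged sketch: resolve a conditional at the end of a known path.
// QUERY is assumed set up with a ranger; PATH and IMPORTS are the
// blocks and the interesting SSA names, as in compute_imports.
static bool
cond_folds_on_path_p (path_range_query &query,
		      const vec<basic_block> &path,
		      const bitmap imports, gcond *cond)
{
  // Seed the solver with the path; relations along the path get
  // registered internally via jt_fur_source/register_outgoing_edges.
  query.compute_ranges (path, imports);

  // Fold the conditional with the path-sensitive ranges; a singleton
  // result means one outgoing edge is known not taken.
  int_range_max r;
  return query.range_of_stmt (r, cond) && r.singleton_p ();
}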
Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling
On 24/11/2021 11:00, Richard Biener wrote:
> On Wed, 24 Nov 2021, Andre Vieira (lists) wrote:
>> On 22/11/2021 12:39, Richard Biener wrote:
>>> +  if (first_loop_vinfo->suggested_unroll_factor > 1)
>>> +    {
>>> +      if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo))
>>> +	{
>>> +	  if (dump_enabled_p ())
>>> +	    dump_printf_loc (MSG_NOTE, vect_location,
>>> +			     "* Re-trying analysis with first vector mode"
>>> +			     " %s for epilogue with partial vectors of"
>>> +			     " unrolled first loop.\n",
>>> +			     GET_MODE_NAME (vector_modes[0]));
>>> +	  mode_i = 0;
>>>
>>> and the later done check for bigger VF than main loop - why would
>>> we re-start at 0 rather than at the old mode?  Maybe we want to
>>> remember the iterator value we started at when arriving at the
>>> main loop mode?  So if we analyzed successfully with mode_i == 2,
>>> then successfully at mode_i == 4 which suggested an unroll of 2,
>>> re-start at the mode_i we continued after the mode_i == 2
>>> successful analysis?  To just consider the "simple" case of
>>> AVX vs SSE it IMHO doesn't make much sense to succeed with
>>> AVX V4DF, succeed with SSE V2DF and figure it's better than V4DF AVX
>>> but get a suggestion of 2 times unroll and then re-try AVX V4DF
>>> just to re-compute that yes, it's worse than SSE V2DF?  You
>>> are probably thinking of SVE vs ADVSIMD here but do we need to
>>> start at 0?  Adding a comment to the code would be nice.
>>>
>>> Thanks,
>> I was indeed thinking SVE vs Advanced SIMD where we end up having to
>> compare different vectorization strategies, which will have different
>> costs depending.  The hypothetical case, as in I don't think I've come
>> across one, is where if we decide to vectorize the main loop for V8QI
>> and unroll 2x, yielding a VF of 16, we may then want to use a
>> predicated VNx16QI epilogue.
> But this isn't the epilogue handling ...
Am I misunderstanding the code here?  To me it looks like this is picking
which mode_i the 'while (1)' loop that does the loop analysis for the
epilogues starts at?
[COMMITTED] path solver: Compute ranges in path in gimple order.
Andrew's patch for PR103254 papered over some underlying performance
issues in the path solver that I'd like to address.

We are currently solving the SSA's defined in the current block in
bitmap order, which amounts to random order for all purposes.  This is
causing unnecessary recursion in gori.  This patch changes the order
to gimple order, thus solving dependencies before uses.

There is no change in threadable paths with this change.

Tested on x86-64 & ppc64le Linux.

gcc/ChangeLog:

	PR tree-optimization/103254
	* gimple-range-path.cc (path_range_query::compute_ranges_defined): New.
	(path_range_query::compute_ranges_in_block): Move to
	compute_ranges_defined.
	* gimple-range-path.h (compute_ranges_defined): New.
---
 gcc/gimple-range-path.cc | 33 ++---
 gcc/gimple-range-path.h  |  1 +
 2 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index 4aa666d2c8b..e24086691c4 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -401,6 +401,27 @@ path_range_query::compute_ranges_in_phis (basic_block bb)
     }
 }
 
+// Compute ranges defined in block.
+
+void
+path_range_query::compute_ranges_defined (basic_block bb)
+{
+  int_range_max r;
+
+  compute_ranges_in_phis (bb);
+
+  // Iterate in gimple order to minimize recursion.
+  for (auto gsi = gsi_start_nondebug_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    if (gimple_has_lhs (gsi_stmt (gsi)))
+      {
+	tree name = gimple_get_lhs (gsi_stmt (gsi));
+	if (TREE_CODE (name) == SSA_NAME
+	    && bitmap_bit_p (m_imports, SSA_NAME_VERSION (name))
+	    && range_defined_in_block (r, name, bb))
+	  set_cache (r, name);
+      }
+}
+
 // Compute ranges defined in the current block, or exported to the
 // next block.
 
@@ -423,17 +444,7 @@ path_range_query::compute_ranges_in_block (basic_block bb)
 	clear_cache (name);
     }
 
-  // Solve imports defined in this block, starting with the PHIs...
-  compute_ranges_in_phis (bb);
-  // ...and then the rest of the imports.
-  EXECUTE_IF_SET_IN_BITMAP (m_imports, 0, i, bi)
-    {
-      tree name = ssa_name (i);
-
-      if (gimple_code (SSA_NAME_DEF_STMT (name)) != GIMPLE_PHI
-	  && range_defined_in_block (r, name, bb))
-	set_cache (r, name);
-    }
+  compute_ranges_defined (bb);
 
   if (at_exit ())
     return;
diff --git a/gcc/gimple-range-path.h b/gcc/gimple-range-path.h
index 57a9ae9bdcd..81c87d475dd 100644
--- a/gcc/gimple-range-path.h
+++ b/gcc/gimple-range-path.h
@@ -58,6 +58,7 @@ private:
   // Methods to compute ranges for the given path.
   bool range_defined_in_block (irange &, tree name, basic_block bb);
   void compute_ranges_in_block (basic_block bb);
+  void compute_ranges_defined (basic_block bb);
   void compute_ranges_in_phis (basic_block bb);
   void adjust_for_non_null_uses (basic_block bb);
   void ssa_range_in_phi (irange &r, gphi *phi);
-- 
2.31.1
[COMMITTED] path solver: Move boolean import code to compute_imports.
In a follow-up patch I will be pruning the set of exported ranges within
blocks to avoid unnecessary work.  In order to do this, all the
interesting SSA names must be in the internal import bitmap ahead of
time.  I had already abstracted them out into compute_imports, but I
missed the boolean code.  This fixes the oversight.

There's a net gain of 25 threadable paths, which is unexpected but
welcome.

Tested on x86-64 & ppc64le Linux.

gcc/ChangeLog:

	PR tree-optimization/103254
	* gimple-range-path.cc (path_range_query::compute_ranges): Move
	exported boolean code...
	(path_range_query::compute_imports): ...here.
---
 gcc/gimple-range-path.cc | 25 -
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index e24086691c4..806bce9ff11 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -537,7 +537,8 @@ void
 path_range_query::compute_imports (bitmap imports, basic_block exit)
 {
   // Start with the imports from the exit block...
-  bitmap r_imports = m_ranger->gori ().imports (exit);
+  gori_compute &gori = m_ranger->gori ();
+  bitmap r_imports = gori.imports (exit);
   bitmap_copy (imports, r_imports);
 
   auto_vec<tree> worklist (bitmap_count_bits (imports));
@@ -579,6 +580,16 @@ path_range_query::compute_imports (bitmap imports, basic_block exit)
 	    }
 	}
     }
+  // Exported booleans along the path, may help conditionals.
+  if (m_resolve)
+    for (i = 0; i < m_path.length (); ++i)
+      {
+	basic_block bb = m_path[i];
+	tree name;
+	FOR_EACH_GORI_EXPORT_NAME (gori, bb, name)
+	  if (TREE_CODE (TREE_TYPE (name)) == BOOLEAN_TYPE)
+	    bitmap_set_bit (imports, SSA_NAME_VERSION (name));
+      }
 }
 
 // Compute the ranges for IMPORTS along PATH.
@@ -622,18 +633,6 @@ path_range_query::compute_ranges (const vec<basic_block> &path,
     {
       basic_block bb = curr_bb ();
 
-      if (m_resolve)
-	{
-	  gori_compute &gori = m_ranger->gori ();
-	  tree name;
-
-	  // Exported booleans along the path, may help conditionals.
-	  // Add them as interesting imports.
-	  FOR_EACH_GORI_EXPORT_NAME (gori, bb, name)
-	    if (TREE_CODE (TREE_TYPE (name)) == BOOLEAN_TYPE)
-	      bitmap_set_bit (m_imports, SSA_NAME_VERSION (name));
-	}
-
       compute_ranges_in_block (bb);
       adjust_for_non_null_uses (bb);
-- 
2.31.1
Re: [PATCH][RFC] middle-end/46476 - resurrect -Wunreachable-code
On Thu, 25 Nov 2021, Richard Biener wrote:

> On Wed, 24 Nov 2021, Jason Merrill wrote:
>
> > On 11/24/21 11:15, Marek Polacek wrote:
> > > On Wed, Nov 24, 2021 at 04:21:31PM +0100, Richard Biener via Gcc-patches wrote:
> > >> This resurrects -Wunreachable-code and implements a warning for
> > >> trivially unreachable code as of CFG construction.  Most problematic
> > >> with this is the 'return 0;' stmt the C/C++ frontend adds in main,
> > >> which the patch handles for C++ like the C frontend already does,
> > >> by using BUILTINS_LOCATION.
> > >>
> > >> Another problem for future enhancement is that after CFG construction
> > >> we can no longer point to the stmt making a stmt unreachable, so
> > >> this implementation tries to warn on the first unreachable
> > >> statement of a region.  It might be possible to retain a pointer
> > >> to the stmt that triggered creation of a basic-block but I'm not
> > >> sure how reliable that would be.
> > >>
> > >> So this is really a simple attempt for now, triggered by myself
> > >> running into such a coding error.  As always, the perfect is the
> > >> enemy of the good.
> > >>
> > >> It does not pass bootstrap (which enables -Wextra), because of the
> > >> situation in g++.dg/Wunreachable-code-5.C where the C++ frontend
> > >> prematurely elides conditions like if (! GATHER_STATISTICS) that
> > >> evaluate to true - oddly enough it does _not_ do this for
> > >> conditions evaluating to false ... (one of the
> > >> c-c++-common/Wunreachable-code-2.c cases).
> > >
> > > I've taken a look into the C++ thing.  This is genericize_if_stmt:
> > > if we have
> > >
> > >    if (0)
> > >      return;
> > >
> > > then cond is integer_zerop, then_ is a return_expr, but since it has
> > > TREE_SIDE_EFFECTS, we create a COND_EXPR.  For
> > >
> > >    if (!0)
> > >      return;
> > >
> > > we do
> > > 170       else if (integer_nonzerop (cond) && !TREE_SIDE_EFFECTS (else_))
> > > 171         stmt = then_;
> > > which elides the if completely.
> > >
> > > So it seems it would help if we avoided eliding the if stmt if
> > > -Wunreachable-code is in effect.  I'd be happy to make that change,
> > > if it sounds sane.
>
> Yes, that seems to work.
>
> > Sure.
> >
> > Currently the front end does various constant folding as part of
> > genericization, as I recall because there were missed optimizations
> > without it.  Is this particular one undesirable because it's at the
> > statement level rather than within an expression?
>
> It's undesirable because it short-circuits control flow and thus
>
>   if (0)
>     return;
>   foo ();
>
> becomes
>
>   return;
>   foo ();
>
> which looks exactly like a case we want to diagnose (very likely a
> programming error).
>
> So yes, it applies to the statement level and there only to control
> statements.

So another case in GCC is

  if (WORDS_BIG_ENDIAN == BYTES_BIG_ENDIAN)
    ...
  else
    {
      /* Assert that we're only dealing with the PDP11 case.  */
      gcc_assert (!BYTES_BIG_ENDIAN);
      gcc_assert (WORDS_BIG_ENDIAN);
      cpp_define (pfile, "__BYTE_ORDER__=__ORDER_PDP_ENDIAN__");

where that macro expands to

  ((void)(!(!0) ? fancy_abort ("/home/rguenther/src/trunk/gcc/cppbuiltin.c",
			       180, __FUNCTION__), 0 : 0));
  ((void)(!(0) ? fancy_abort ("/home/rguenther/src/trunk/gcc/cppbuiltin.c",
			      181, __FUNCTION__), 0 : 0));
  cpp_define (pfile, "__BYTE_ORDER__=__ORDER_PDP_ENDIAN__");

and the frontend elides the COND_EXPRs, making the cpp_define unreachable.
That's only exposed because we no longer elide the if (1) guarding this
else path ...
Also this is a case where we definitely do not want to diagnose that either the else or the true path is statically unreachable. Richard.
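To make the diagnosed situation concrete, here is a tiny example of the
kind of mistake the resurrected warning is meant to catch (my own
illustration, not from the patch's testsuite):

extern void cleanup (void);

int
f (int *p)
{
  if (!p)
    return -1;
  return *p;
  cleanup ();  /* trivially unreachable at CFG construction; with the
		  proposed -Wunreachable-code this statement would be
		  diagnosed - most likely it was meant to run before
		  the return.  */
}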
Re: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p simplification [PR103417]
On Thu, 25 Nov 2021, Jakub Jelinek wrote:

> On Thu, Nov 25, 2021 at 10:17:52AM +0100, Richard Biener wrote:
> > > Ah I see, sorry I didn't see that rule before, you're right that if this
> > > is ordered after it then they can be dropped.
> >
> > So the patch is OK, possibly with re-ordering the matches.
>
> I've committed the patch as is because it has been tested that way and I'd
> like to avoid dups of that PR flowing in.  Even when not reordered, the new
> earlier match.pd simplification will not trigger for the lt le gt ge cases
> anymore and the later old simplifications will trigger, and I'd expect after
> that latter simplification the earlier should trigger again because the IL
> changed, no?

Yes, the result always is re-folded.

> Tamar, can you handle the reordering together with the testsuite changes
> (and perhaps formatting fixes in the tree.c routine)?
Re: [PATCH] ipa: Teach IPA-CP transformation about IPA-SRA modifications (PR 103227)
>
> gcc/ChangeLog:
>
> 2021-11-23  Martin Jambor
>
>	PR ipa/103227
>	* ipa-prop.h (ipa_get_param): New overload.  Move bits of the existing
>	one to the new one.
>	* ipa-param-manipulation.h (ipa_param_adjustments): New member
>	function get_updated_index_or_split.
>	* ipa-param-manipulation.c
>	(ipa_param_adjustments::get_updated_index_or_split): New function.
>	* ipa-prop.c (adjust_agg_replacement_values): Reimplement, add
>	capability to identify scalarized parameters and perform substitution
>	on them.
>	(ipcp_transform_function): Create descriptors earlier, handle new
>	return values of adjust_agg_replacement_values.
>
> gcc/testsuite/ChangeLog:
>
> 2021-11-23  Martin Jambor
>
>	PR ipa/103227
>	* gcc.dg/ipa/pr103227-1.c: New test.
>	* gcc.dg/ipa/pr103227-3.c: Likewise.
>	* gcc.dg/ipa/pr103227-2.c: Likewise.
>	* gfortran.dg/pr53787.f90: Disable IPA-SRA.
> ---
>  gcc/ipa-param-manipulation.c          | 33 ++++++++++
>  gcc/ipa-param-manipulation.h          |  7 +++
>  gcc/ipa-prop.c                        | 73 +++++++++++-------
>  gcc/ipa-prop.h                        | 15 ++----
>  gcc/testsuite/gcc.dg/ipa/pr103227-1.c | 29 +++++++
>  gcc/testsuite/gcc.dg/ipa/pr103227-2.c | 29 +++++++
>  gcc/testsuite/gcc.dg/ipa/pr103227-3.c | 52 +++++++++
>  gcc/testsuite/gfortran.dg/pr53787.f90 |  2 +-
>  8 files changed, 216 insertions(+), 24 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/pr103227-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/pr103227-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/pr103227-3.c
>
> diff --git a/gcc/ipa-param-manipulation.c b/gcc/ipa-param-manipulation.c
> index cec1dba701f..479c20b3871 100644
> --- a/gcc/ipa-param-manipulation.c
> +++ b/gcc/ipa-param-manipulation.c
> @@ -449,6 +449,39 @@ ipa_param_adjustments::get_updated_indices (vec<int> *new_indices)
>      }
>  }
>
> +/* If a parameter with original INDEX has survived intact, return its new
> +   index.  Otherwise return -1.  In that case, if it has been split and there
> +   is a new parameter representing a portion at unit OFFSET for which a value
> +   of a TYPE can be substituted, store its new index into SPLIT_INDEX,
> +   otherwise store -1 there.  */
> +int
> +ipa_param_adjustments::get_updated_index_or_split (int index,
> +						   unsigned unit_offset,
> +						   tree type, int *split_index)
> +{
> +  unsigned adj_len = vec_safe_length (m_adj_params);
> +  for (unsigned i = 0; i < adj_len ; i++)

In ipa-modref I precompute this into a map so we do not need to walk all
params, but the loop is probably not bad since functions do not have tens
of thousands of parameters :)

Can I use it in ipa-modref to discover which parameters were turned from
by-reference to scalar, too?

> +    {
> +      ipa_adjusted_param *apm = &(*m_adj_params)[i];
> +      if (apm->base_index != index)
> +	continue;
> +      if (apm->op == IPA_PARAM_OP_COPY)
> +	return i;
> +      if (apm->op == IPA_PARAM_OP_SPLIT
> +	  && apm->unit_offset == unit_offset)
> +	{
> +	  if (useless_type_conversion_p (apm->type, type))
> +	    *split_index = i;
> +	  else
> +	    *split_index = -1;
> +	  return -1;
> +	}
> +    }
> +
> +  *split_index = -1;
> +  return -1;
> +}
> +
>  /* Return the original index for the given new parameter index.  Return a
>     negative number if not available.  */
>
> diff --git a/gcc/ipa-param-manipulation.h b/gcc/ipa-param-manipulation.h
> index 5adf8a22356..d1dad9fac73 100644
> --- a/gcc/ipa-param-manipulation.h
> +++ b/gcc/ipa-param-manipulation.h
> @@ -236,6 +236,13 @@ public:
>    void get_surviving_params (vec<bool> *surviving_params);
>    /* Fill a vector with new indices of surviving original parameters.  */
>    void get_updated_indices (vec<int> *new_indices);
> +  /* If a parameter with original INDEX has survived intact, return its new
> +     index.  Otherwise return -1.  In that case, if it has been split and
> +     there is a new parameter representing a portion at UNIT_OFFSET for which
> +     a value of a TYPE can be substituted, store its new index into
> +     SPLIT_INDEX, otherwise store -1 there.  */
> +  int get_updated_index_or_split (int index, unsigned unit_offset, tree type,
> +				  int *split_index);
>    /* Return the original index for the given new parameter index.  Return a
>       negative number if not available.  */
>    int get_original_index (int newidx);
>
> diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
> index e85df0971fc..a297f50e945 100644
> --- a/gcc/ipa-prop.c
> +++ b/gcc/ipa-prop.c
> @@ -5578,32 +5578,55 @@ ipcp_read_transformation_summaries (void)
>  }
>
>  /* Adjust the aggregate replacements in AGGVAL to reflect parameters skipped in
> -   NODE.  */
> +   NODE but also if any parameter was I
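The quoted hunk is cut off above; to make the intended use of the new
member function a bit more concrete, here is a hedged sketch of a caller,
loosely modeled on what adjust_agg_replacement_values is described as
doing.  The helper name and the surrounding logic are my illustration,
not code from the patch:

/* Hedged sketch: map an aggregate replacement at UNIT_OFFSET of the
   pre-IPA-SRA parameter ORIG_INDEX to the new parameter layout
   described by ADJ.  */
static void
remap_agg_replacement (ipa_param_adjustments *adj, int orig_index,
		       unsigned unit_offset, tree type)
{
  int split_idx;
  int new_idx = adj->get_updated_index_or_split (orig_index, unit_offset,
						 type, &split_idx);
  if (new_idx >= 0)
    ;  /* Parameter survived intact: keep the aggregate replacement,
	  rewritten to refer to NEW_IDX.  */
  else if (split_idx >= 0)
    ;  /* The portion was scalarized by IPA-SRA: substitute the known
	  value directly for the new scalar parameter SPLIT_IDX.  */
  else
    ;  /* No usable replacement: drop this aggregate value.  */
}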
[PATCH][DOCS] docs: Add missing @option keyword.
Pushed as obvious.

Martin

gcc/ChangeLog:

	* doc/invoke.texi: Use @option for -Wuninitialized.
---
 gcc/doc/invoke.texi | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d0ac59768b9..3bddfbaae6a 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12117,8 +12117,8 @@
 Initialize automatic variables with either a pattern or with zeroes
 to increase the security and predictability of a program by preventing
 uninitialized memory disclosure and use.
 GCC still considers an automatic variable that doesn't have an explicit
-initializer as uninitialized, -Wuninitialized will still report warning messages
-on such automatic variables.
+initializer as uninitialized, @option{-Wuninitialized} will still report
+warning messages on such automatic variables.
 With this option, GCC will also initialize any padding of automatic variables
 that have structure or union types to zeroes.
-- 
2.33.1
[PATCH] PR middle-end/103406: Check for Inf before simplifying x-x.
This is a simple one-line fix to the regression PR middle-end/103406,
where x - x is being folded to 0.0 even when x is +Inf or -Inf.
In GCC 11 and previously, we'd check whether the type honored NaNs
(which implicitly covered the case where the type honors infinities),
but my patch to test whether the operand could potentially be NaN
failed to also check whether the operand could potentially be Inf.

This patch doesn't address the issue of NaN signedness from binary
arithmetic operations, just the regression.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures.  Ok for mainline?

2021-11-25  Roger Sayle

gcc/ChangeLog
	PR middle-end/103406
	* match.pd (minus @0 @0): Check tree_expr_maybe_infinite_p.

gcc/testsuite/ChangeLog
	PR middle-end/103406
	* gcc.dg/pr103406.c: New test case.

Thanks in advance (and sorry for the inconvenience),
Roger
--

diff --git a/gcc/match.pd b/gcc/match.pd
index f059b47..d28dfe2 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -232,7 +232,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    is volatile.  */
 (simplify
  (minus @0 @0)
- (if (!FLOAT_TYPE_P (type) || !tree_expr_maybe_nan_p (@0))
+ (if (!FLOAT_TYPE_P (type)
+      || (!tree_expr_maybe_nan_p (@0)
+	  && !tree_expr_maybe_infinite_p (@0)))
  { build_zero_cst (type); }))
 (simplify
  (pointer_diff @@0 @0)
diff --git a/gcc/testsuite/gcc.dg/pr103406.c b/gcc/testsuite/gcc.dg/pr103406.c
new file mode 100644
index 000..9c7b83b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr103406.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#define HUGE __DBL_MAX__
+#define INF (HUGE + HUGE)
+#define NAN (INF - INF)
+
+double foo() {
+  double x = -NAN;
+  double y = NAN;
+  return x + y;
+}
+
+/* { dg-final { scan-tree-dump-not "return 0\.0" "optimized" } } */
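For context, the reason the fold is invalid when x may be infinite is
that IEEE arithmetic defines Inf - Inf as NaN, not 0.  A small
illustration of my own (not part of the patch or its testcase):

#include <math.h>

double
sub (double x)
{
  return x - x;  /* must not fold to 0.0 unless x is provably
		    neither NaN nor +-Inf */
}

/* sub (INFINITY) evaluates Inf - Inf, which is NaN, so folding the
   expression to 0.0 would silently change the result.  */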
Re: [COMMITTED] path solver: Compute ranges in path in gimple order.
On Thu, Nov 25, 2021 at 11:55 AM Aldy Hernandez via Gcc-patches wrote:
>
> Andrew's patch for PR103254 papered over some underlying
> performance issues in the path solver that I'd like to address.
>
> We are currently solving the SSA's defined in the current block in
> bitmap order, which amounts to random order for all purposes.  This is
> causing unnecessary recursion in gori.  This patch changes the order
> to gimple order, thus solving dependencies before uses.
>
> There is no change in threadable paths with this change.
>
> Tested on x86-64 & ppc64le Linux.
>
> gcc/ChangeLog:
>
>	PR tree-optimization/103254
>	* gimple-range-path.cc (path_range_query::compute_ranges_defined): New.
>	(path_range_query::compute_ranges_in_block): Move to
>	compute_ranges_defined.
>	* gimple-range-path.h (compute_ranges_defined): New.
> ---
>  gcc/gimple-range-path.cc | 33 ++---
>  gcc/gimple-range-path.h  |  1 +
>  2 files changed, 23 insertions(+), 11 deletions(-)
>
> diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
> index 4aa666d2c8b..e24086691c4 100644
> --- a/gcc/gimple-range-path.cc
> +++ b/gcc/gimple-range-path.cc
> @@ -401,6 +401,27 @@ path_range_query::compute_ranges_in_phis (basic_block bb)
>      }
>  }
>
> +// Compute ranges defined in block.
> +
> +void
> +path_range_query::compute_ranges_defined (basic_block bb)
> +{
> +  int_range_max r;
> +
> +  compute_ranges_in_phis (bb);
> +
> +  // Iterate in gimple order to minimize recursion.
> +  for (auto gsi = gsi_start_nondebug_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))

gsi_next_nondebug (&gsi)?

Of course this all has the extra cost of iterating over a possibly very
large BB for just a few bits in m_imports?  How often does m_imports
have exactly one bit set?

> +    if (gimple_has_lhs (gsi_stmt (gsi)))
> +      {
> +	tree name = gimple_get_lhs (gsi_stmt (gsi));
> +	if (TREE_CODE (name) == SSA_NAME
> +	    && bitmap_bit_p (m_imports, SSA_NAME_VERSION (name))
> +	    && range_defined_in_block (r, name, bb))
> +	  set_cache (r, name);
> +      }

So if you ever handle SSA DEFs in asms then this will not pick them up.
I think it would be more generic to do

  FOR_EACH_SSA_DEF_OPERAND (..., SSA_OP_DEF)

> +}
> +
>  // Compute ranges defined in the current block, or exported to the
>  // next block.
>
> @@ -423,17 +444,7 @@ path_range_query::compute_ranges_in_block (basic_block bb)
>  	clear_cache (name);
>      }
>
> -  // Solve imports defined in this block, starting with the PHIs...
> -  compute_ranges_in_phis (bb);
> -  // ...and then the rest of the imports.
> -  EXECUTE_IF_SET_IN_BITMAP (m_imports, 0, i, bi)
> -    {
> -      tree name = ssa_name (i);
> -
> -      if (gimple_code (SSA_NAME_DEF_STMT (name)) != GIMPLE_PHI
> -	  && range_defined_in_block (r, name, bb))
> -	set_cache (r, name);
> -    }
> +  compute_ranges_defined (bb);
>
>    if (at_exit ())
>      return;
> diff --git a/gcc/gimple-range-path.h b/gcc/gimple-range-path.h
> index 57a9ae9bdcd..81c87d475dd 100644
> --- a/gcc/gimple-range-path.h
> +++ b/gcc/gimple-range-path.h
> @@ -58,6 +58,7 @@ private:
>    // Methods to compute ranges for the given path.
>    bool range_defined_in_block (irange &, tree name, basic_block bb);
>    void compute_ranges_in_block (basic_block bb);
> +  void compute_ranges_defined (basic_block bb);
>    void compute_ranges_in_phis (basic_block bb);
>    void adjust_for_non_null_uses (basic_block bb);
>    void ssa_range_in_phi (irange &r, gphi *phi);
> --
> 2.31.1
>
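For reference, a hedged sketch of the more generic iteration suggested
above; the iterator macro is the one from ssa-iterators.h, but treat the
exact shape here as an approximation rather than a drop-in replacement:

// Sketch: iterate over all statement DEFs (including asm outputs)
// instead of relying on gimple_get_lhs.
ssa_op_iter iter;
def_operand_p def_p;
for (auto gsi = gsi_start_nondebug_bb (bb); !gsi_end_p (gsi);
     gsi_next_nondebug (&gsi))
  FOR_EACH_SSA_DEF_OPERAND (def_p, gsi_stmt (gsi), iter, SSA_OP_DEF)
    {
      tree name = DEF_FROM_PTR (def_p);
      if (bitmap_bit_p (m_imports, SSA_NAME_VERSION (name))
	  && range_defined_in_block (r, name, bb))
	set_cache (r, name);
    }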
Re: [PATCH] PR middle-end/103406: Check for Inf before simplifying x-x.
On Thu, Nov 25, 2021 at 12:30 PM Roger Sayle wrote:
>
> This is a simple one-line fix to the regression PR middle-end/103406,
> where x - x is being folded to 0.0 even when x is +Inf or -Inf.
> In GCC 11 and previously, we'd check whether the type honored NaNs
> (which implicitly covered the case where the type honors infinities),
> but my patch to test whether the operand could potentially be NaN
> failed to also check whether the operand could potentially be Inf.
>
> This patch doesn't address the issue of NaN signedness from binary
> arithmetic operations, just the regression.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures.  Ok for mainline?

OK.

Thanks,
Richard.

> 2021-11-25  Roger Sayle
>
> gcc/ChangeLog
>	PR middle-end/103406
>	* match.pd (minus @0 @0): Check tree_expr_maybe_infinite_p.
>
> gcc/testsuite/ChangeLog
>	PR middle-end/103406
>	* gcc.dg/pr103406.c: New test case.
>
> Thanks in advance (and sorry for the inconvenience),
> Roger
> --
>
Re: [COMMITTED] path solver: Compute ranges in path in gimple order.
On Thu, Nov 25, 2021 at 12:57 PM Richard Biener wrote:
>
> On Thu, Nov 25, 2021 at 11:55 AM Aldy Hernandez via Gcc-patches wrote:
> >
> > Andrew's patch for PR103254 papered over some underlying
> > performance issues in the path solver that I'd like to address.
> >
> > We are currently solving the SSA's defined in the current block in
> > bitmap order, which amounts to random order for all purposes.  This is
> > causing unnecessary recursion in gori.  This patch changes the order
> > to gimple order, thus solving dependencies before uses.
> >
> > There is no change in threadable paths with this change.
> >
> > Tested on x86-64 & ppc64le Linux.
> >
> > gcc/ChangeLog:
> >
> >	PR tree-optimization/103254
> >	* gimple-range-path.cc (path_range_query::compute_ranges_defined): New.
> >	(path_range_query::compute_ranges_in_block): Move to
> >	compute_ranges_defined.
> >	* gimple-range-path.h (compute_ranges_defined): New.
> > ---
> >  gcc/gimple-range-path.cc | 33 ++---
> >  gcc/gimple-range-path.h  |  1 +
> >  2 files changed, 23 insertions(+), 11 deletions(-)
> >
> > diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
> > index 4aa666d2c8b..e24086691c4 100644
> > --- a/gcc/gimple-range-path.cc
> > +++ b/gcc/gimple-range-path.cc
> > @@ -401,6 +401,27 @@ path_range_query::compute_ranges_in_phis (basic_block bb)
> >      }
> >  }
> >
> > +// Compute ranges defined in block.
> > +
> > +void
> > +path_range_query::compute_ranges_defined (basic_block bb)
> > +{
> > +  int_range_max r;
> > +
> > +  compute_ranges_in_phis (bb);
> > +
> > +  // Iterate in gimple order to minimize recursion.
> > +  for (auto gsi = gsi_start_nondebug_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>
> gsi_next_nondebug (&gsi)?
>
> Of course this all has the extra cost of iterating over a possibly very
> large BB for just a few bits in m_imports?  How often does m_imports
> have exactly one bit set?

Hmmm, good point.

Perhaps this isn't worth it then.  I mean, the underlying bug I'm
tackling is an excess of outgoing edge ranges, not the excess
recursion this patch attacks.

If you think the cost would be high for large ILs, I can revert the patch.

Aldy
[PATCH] Remove dead code and function
The only use of get_alias_symbol is gated by a gcc_unreachable (),
so the following patch gets rid of it.

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

2021-11-24  Richard Biener

	* cgraphunit.c (symbol_table::output_weakrefs): Remove unreachable
	init.
	(get_alias_symbol): Remove now unused function.
---
 gcc/cgraphunit.c | 16 +---------------
 1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 1e58ffd65e8..3a803a34cbc 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -,17 +,6 @@ ipa_passes (void)
 }
 
 
-/* Return string alias is alias of.  */
-
-static tree
-get_alias_symbol (tree decl)
-{
-  tree alias = lookup_attribute ("alias", DECL_ATTRIBUTES (decl));
-  return get_identifier (TREE_STRING_POINTER
-			 (TREE_VALUE (TREE_VALUE (alias))));
-}
-
-
 /* Weakrefs may be associated to external decls and thus not output
    at expansion time.  Emit all necessary aliases.  */
 
@@ -2259,10 +2248,7 @@ symbol_table::output_weakrefs (void)
 	else if (node->analyzed)
 	  target = DECL_ASSEMBLER_NAME (node->get_alias_target ()->decl);
 	else
-	  {
-	    gcc_unreachable ();
-	    target = get_alias_symbol (node->decl);
-	  }
+	  gcc_unreachable ();
 	do_assemble_alias (node->decl, target);
       }
 }
-- 
2.31.1
[PATCH] Continue RTL verifying in rtl_verify_fallthru
One case used fatal_insn, which does not return; that isn't what was
intended, as can be seen from the err = 1 that follows.  The following
change refactors this to inline the relevant parts of fatal_insn
instead and continue validating the RTL IL.

Bootstrapped and tested on x86_64-unknown-linux-gnu, will push.

2021-11-25  Richard Biener

	* cfgrtl.c (rtl_verify_fallthru): Do not stop verifying
	with fatal_insn.
	(skip_insns_after_block): Remove unreachable break and continue.
---
 gcc/cfgrtl.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/gcc/cfgrtl.c b/gcc/cfgrtl.c
index e3a724bddb4..c7ba9006b4e 100644
--- a/gcc/cfgrtl.c
+++ b/gcc/cfgrtl.c
@@ -3001,7 +3001,8 @@ rtl_verify_fallthru (void)
 	    {
 	      error ("verify_flow_info: Incorrect fallthru %i->%i",
 		     e->src->index, e->dest->index);
-	      fatal_insn ("wrong insn in the fallthru edge", insn);
+	      error ("wrong insn in the fallthru edge");
+	      debug_rtx (insn);
 	      err = 1;
 	    }
 	}
@@ -3540,10 +3541,8 @@ skip_insns_after_block (basic_block bb)
 	  {
 	  case NOTE_INSN_BLOCK_END:
 	    gcc_unreachable ();
-	    continue;
 	  default:
 	    continue;
-	    break;
 	  }
 	  break;
-- 
2.31.1
[PATCH] Remove never looping loop in label_rtx_for_bb
This refactors the IL "walk" in a way to avoid the loop which will
never iterate.

Bootstrapped and tested on x86_64-unknown-linux-gnu, will push later
unless there are comments explaining the function is wrong in other
ways.

Richard.

2021-11-25  Richard Biener

	* cfgexpand.c (label_rtx_for_bb): Remove dead loop construct.
---
 gcc/cfgexpand.c | 24 ++++++------------------
 1 file changed, 6 insertions(+), 18 deletions(-)

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index eb6466f4be6..fb84d469f1e 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -2461,9 +2461,6 @@ static hash_map<basic_block, rtx_code_label *> *lab_rtx_for_bb;
 static rtx_code_label *
 label_rtx_for_bb (basic_block bb ATTRIBUTE_UNUSED)
 {
-  gimple_stmt_iterator gsi;
-  tree lab;
-
   if (bb->flags & BB_RTL)
     return block_label (bb);
 
@@ -2472,21 +2469,12 @@ label_rtx_for_bb (basic_block bb ATTRIBUTE_UNUSED)
     return *elt;
 
   /* Find the tree label if it is present.  */
-
-  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-    {
-      glabel *lab_stmt;
-
-      lab_stmt = dyn_cast <glabel *> (gsi_stmt (gsi));
-      if (!lab_stmt)
-	break;
-
-      lab = gimple_label_label (lab_stmt);
-      if (DECL_NONLOCAL (lab))
-	break;
-
-      return jump_target_rtx (lab);
-    }
+  gimple_stmt_iterator gsi = gsi_start_bb (bb);
+  glabel *lab_stmt;
+  if (!gsi_end_p (gsi)
+      && (lab_stmt = dyn_cast <glabel *> (gsi_stmt (gsi)))
+      && !DECL_NONLOCAL (gimple_label_label (lab_stmt)))
+    return jump_target_rtx (gimple_label_label (lab_stmt));
 
   rtx_code_label *l = gen_label_rtx ();
   lab_rtx_for_bb->put (bb, l);
-- 
2.31.1
[PATCH] Introduce REG_SET_EMPTY_P
This avoids a -Wunreachable-code diagnostic with EXECUTE_IF_* in case
the first iteration will exit the loop.  For the case in thread_jump
using bitmap_empty_p looks preferable, so this adds REG_SET_EMPTY_P to
make that available for register sets.

Bootstrapped and tested on x86_64-unknown-linux-gnu, will push.

Richard.

2021-11-25  Richard Biener

	* regset.h (REG_SET_EMPTY_P): New macro.
	* cfgcleanup.c (thread_jump): Use REG_SET_EMPTY_P.
---
 gcc/cfgcleanup.c | 3 +--
 gcc/regset.h     | 3 +++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/cfgcleanup.c b/gcc/cfgcleanup.c
index 82fc505ff50..67ae0597cee 100644
--- a/gcc/cfgcleanup.c
+++ b/gcc/cfgcleanup.c
@@ -256,7 +256,6 @@ thread_jump (edge e, basic_block b)
   unsigned i;
   regset nonequal;
   bool failed = false;
-  reg_set_iterator rsi;
 
   /* Jump threading may cause fixup_partitions to introduce new crossing edges,
      which is not allowed after reload.  */
@@ -379,7 +378,7 @@ thread_jump (edge e, basic_block b)
       goto failed_exit;
     }
 
-  EXECUTE_IF_SET_IN_REG_SET (nonequal, 0, i, rsi)
+  if (!REG_SET_EMPTY_P (nonequal))
     goto failed_exit;
 
   BITMAP_FREE (nonequal);
diff --git a/gcc/regset.h b/gcc/regset.h
index aee6d6f974f..997b4d2d827 100644
--- a/gcc/regset.h
+++ b/gcc/regset.h
@@ -49,6 +49,9 @@ typedef bitmap regset;
 /* Clear a register set by freeing up the linked list.  */
 #define CLEAR_REG_SET(HEAD) bitmap_clear (HEAD)
 
+/* True if the register set is empty.  */
+#define REG_SET_EMPTY_P(HEAD) bitmap_empty_p (HEAD)
+
 /* Copy a register set to another register set.  */
 #define COPY_REG_SET(TO, FROM) bitmap_copy (TO, FROM)
-- 
2.31.1
Re: [COMMITTED] path solver: Compute ranges in path in gimple order.
On Thu, Nov 25, 2021 at 1:10 PM Aldy Hernandez wrote:
>
> On Thu, Nov 25, 2021 at 12:57 PM Richard Biener wrote:
> >
> > On Thu, Nov 25, 2021 at 11:55 AM Aldy Hernandez via Gcc-patches wrote:
> > >
> > > Andrew's patch for PR103254 papered over some underlying
> > > performance issues in the path solver that I'd like to address.
> > >
> > > We are currently solving the SSA's defined in the current block in
> > > bitmap order, which amounts to random order for all purposes.  This is
> > > causing unnecessary recursion in gori.  This patch changes the order
> > > to gimple order, thus solving dependencies before uses.
> > >
> > > There is no change in threadable paths with this change.
> > >
> > > Tested on x86-64 & ppc64le Linux.
> > >
> > > gcc/ChangeLog:
> > >
> > >	PR tree-optimization/103254
> > >	* gimple-range-path.cc (path_range_query::compute_ranges_defined): New.
> > >	(path_range_query::compute_ranges_in_block): Move to
> > >	compute_ranges_defined.
> > >	* gimple-range-path.h (compute_ranges_defined): New.
> > > ---
> > >  gcc/gimple-range-path.cc | 33 ++---
> > >  gcc/gimple-range-path.h  |  1 +
> > >  2 files changed, 23 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
> > > index 4aa666d2c8b..e24086691c4 100644
> > > --- a/gcc/gimple-range-path.cc
> > > +++ b/gcc/gimple-range-path.cc
> > > @@ -401,6 +401,27 @@ path_range_query::compute_ranges_in_phis (basic_block bb)
> > >      }
> > >  }
> > >
> > > +// Compute ranges defined in block.
> > > +
> > > +void
> > > +path_range_query::compute_ranges_defined (basic_block bb)
> > > +{
> > > +  int_range_max r;
> > > +
> > > +  compute_ranges_in_phis (bb);
> > > +
> > > +  // Iterate in gimple order to minimize recursion.
> > > +  for (auto gsi = gsi_start_nondebug_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> >
> > gsi_next_nondebug (&gsi)?
> >
> > Of course this all has the extra cost of iterating over a possibly very
> > large BB for just a few bits in m_imports?  How often does m_imports
> > have exactly one bit set?
>
> Hmmm, good point.
>
> Perhaps this isn't worth it then.  I mean, the underlying bug I'm
> tackling is an excess of outgoing edge ranges, not the excess
> recursion this patch attacks.
>
> If you think the cost would be high for large ILs, I can revert the patch.

I think so.  If ordering is important then that should be achieved in
some other way (always a bit difficult for on-demand infrastructure).

Richard.

> Aldy
>
Re: [PATCH] PR tree-optimization/103359 - Check for equivalences between PHI argument and def.
On Wed, Nov 24, 2021 at 9:49 PM Andrew MacLeod via Gcc-patches wrote:
>
> PHI nodes frequently feed each other, and this is particularly true of
> the one/two incoming edge PHIs inserted by some of the loop analysis
> code which is introduced at the start of the VRP passes.
>
> Ranger has a hybrid of optimistic vs pessimistic evaluation, and when it
> switches to pessimistic, it has to assume VARYING for a range.  PHIs are
> calculated as the union of all incoming edges, so once we throw a
> VARYING into the mix, there's not much chance of going back.  (mostly
> true... we can sometimes update the range when inputs change, but we
> prefer to avoid iterating when possible)
>
> We already have code to recognize that if an argument to a PHI is the
> same as the def, it cannot provide any additional information and is
> skipped.  ie,
>
>    # h_10 = PHI <4(2), h_10(3), h_10(4), 1(7)>
>
> We can skip the h_10 arguments, and produce [1,1][4,4] as the range
> without any additional information/processing.
>
> This patch extends that slightly to recognize that if the argument is a
> known equivalence of the def, it also does not provide any additional
> information.  This allows us to "ignore" some of the pessimistic VARYING
> values that come in on back edges when the relation oracle indicates
> that there is a known equivalence.
>
> Take for instance the sequence from the PR testcase:
>
>    :
>    # h_7 = PHI <4(2), 1(4)>
>
>    :
>    # h_18 = PHI
>
>    :
>    # h_22 = PHI
>
>    :
>    # h_20 = PHI
>
> We only fully calculate one range at a time, so when calculating h_18,
> we need to first resolve the range of h_22 on the back edge 3->6.  That
> feeds back to h_18, which isn't fully calculated yet and is
> pessimistically assumed to be VARYING until we do get a value.  With
> h_22 being varying when resolving h_18 now, we end up making h_18
> varying, and lose the info from h_7.
>
> This patch extends the equivalence observation slightly to recognize
> that if the argument is a known equivalence of the def in the
> predecessor block, it also does not provide any additional information.
> This allows us to ignore some of the pessimistic VARYING values that
> are set when the relation oracle indicates that there is a known
> equivalence.
>
> In the above case, h_22 is known to be equivalent to h_18 in BB3, and
> so we can ignore the range h_22 provides on any edge coming from bb3.
> There is a caveat that if *all* the arguments to a PHI are in the
> equivalence set, then you have to use the range of the equivalence...
> otherwise you get UNDEFINED.
>
> This will help us to see through some of the artifacts of cycling PHIs
> in these simple cases, and in the above case, we end up with h_7, h_18,
> h_22 and h_20 all in the equivalence set with a range of [1, 1][4, 4],
> and we can remove the code we need to, like we did in GCC 11.
>
> This won't help with more complex PHI cycles, but that seems like
> something we could be looking at elsewhere, phi-opt maybe, utilizing
> ranger to set the global range when it's complex.
>
> Bootstrapped on x86_64-pc-linux-gnu with no regressions.  OK?

OK.

Thanks,
Richard.

> Andrew
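As a concrete source-level illustration (hypothetical - this is not the
PR's testcase), a loop of the following shape is the kind of code where
loop analysis tends to insert the chain of PHIs discussed above, all of
which end up as equivalences of one another:

/* Illustrative only: h's only real values are 4 and 1; the PHIs the
   loop creates for h are equivalences of each other, so with the patch
   ranger can see the final range [1,1][4,4] through the PHI cycle.  */
int
f (int n)
{
  int h = 4;
  for (int i = 0; i < n; i++)
    if (i & 1)
      h = 1;
  return h;
}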
Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling
On Thu, 25 Nov 2021, Andre Vieira (lists) wrote:

> On 24/11/2021 11:00, Richard Biener wrote:
> > On Wed, 24 Nov 2021, Andre Vieira (lists) wrote:
> >
> >> On 22/11/2021 12:39, Richard Biener wrote:
> >>> +  if (first_loop_vinfo->suggested_unroll_factor > 1)
> >>> +    {
> >>> +      if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo))
> >>> +	{
> >>> +	  if (dump_enabled_p ())
> >>> +	    dump_printf_loc (MSG_NOTE, vect_location,
> >>> +			     "* Re-trying analysis with first vector mode"
> >>> +			     " %s for epilogue with partial vectors of"
> >>> +			     " unrolled first loop.\n",
> >>> +			     GET_MODE_NAME (vector_modes[0]));
> >>> +	  mode_i = 0;
> >>>
> >>> and the later done check for bigger VF than main loop - why would
> >>> we re-start at 0 rather than at the old mode?  Maybe we want to
> >>> remember the iterator value we started at when arriving at the
> >>> main loop mode?  So if we analyzed successfully with mode_i == 2,
> >>> then successfully at mode_i == 4 which suggested an unroll of 2,
> >>> re-start at the mode_i we continued after the mode_i == 2
> >>> successful analysis?  To just consider the "simple" case of
> >>> AVX vs SSE it IMHO doesn't make much sense to succeed with
> >>> AVX V4DF, succeed with SSE V2DF and figure it's better than V4DF AVX
> >>> but get a suggestion of 2 times unroll and then re-try AVX V4DF
> >>> just to re-compute that yes, it's worse than SSE V2DF?  You
> >>> are probably thinking of SVE vs ADVSIMD here but do we need to
> >>> start at 0?  Adding a comment to the code would be nice.
> >>>
> >>> Thanks,
> >> I was indeed thinking SVE vs Advanced SIMD where we end up having to
> >> compare different vectorization strategies, which will have different
> >> costs depending.  The hypothetical case, as in I don't think I've come
> >> across one, is where if we decide to vectorize the main loop for V8QI
> >> and unroll 2x, yielding a VF of 16, we may then want to use a
> >> predicated VNx16QI epilogue.
> > But this isn't the epilogue handling ...
> Am I misunderstanding the code here?  To me it looks like this is picking
> which mode_i the 'while (1)' loop that does the loop analysis for the
> epilogues starts at?

Oops, my fault, yes, it does.

I would suggest to refactor things so that the mode_i = first_loop_i
case is there only once.  I also wonder if the argument about starting
at 0 doesn't apply to the not-unrolled
LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P case as well?  So what's the
reason to differ here?  So in the end I'd just change the existing

  if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo))
    {

to

  if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo)
      || first_loop_vinfo->suggested_unroll_factor > 1)
    {

and maybe revisit this when we have an actual testcase showing that
doing sth else has a positive effect?

Thanks,
Richard.
Re: [COMMITTED] path solver: Compute ranges in path in gimple order.
On Thu, Nov 25, 2021 at 1:38 PM Richard Biener wrote:
>
> On Thu, Nov 25, 2021 at 1:10 PM Aldy Hernandez wrote:
> >
> > On Thu, Nov 25, 2021 at 12:57 PM Richard Biener wrote:
> > >
> > > On Thu, Nov 25, 2021 at 11:55 AM Aldy Hernandez via Gcc-patches wrote:
> > > >
> > > > Andrew's patch for PR103254 papered over some underlying
> > > > performance issues in the path solver that I'd like to address.
> > > >
> > > > We are currently solving the SSA's defined in the current block in
> > > > bitmap order, which amounts to random order for all purposes.  This is
> > > > causing unnecessary recursion in gori.  This patch changes the order
> > > > to gimple order, thus solving dependencies before uses.
> > > >
> > > > There is no change in threadable paths with this change.
> > > >
> > > > Tested on x86-64 & ppc64le Linux.
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >	PR tree-optimization/103254
> > > >	* gimple-range-path.cc (path_range_query::compute_ranges_defined): New.
> > > >	(path_range_query::compute_ranges_in_block): Move to
> > > >	compute_ranges_defined.
> > > >	* gimple-range-path.h (compute_ranges_defined): New.
> > > > ---
> > > >  gcc/gimple-range-path.cc | 33 ++---
> > > >  gcc/gimple-range-path.h  |  1 +
> > > >  2 files changed, 23 insertions(+), 11 deletions(-)
> > > >
> > > > diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
> > > > index 4aa666d2c8b..e24086691c4 100644
> > > > --- a/gcc/gimple-range-path.cc
> > > > +++ b/gcc/gimple-range-path.cc
> > > > @@ -401,6 +401,27 @@ path_range_query::compute_ranges_in_phis (basic_block bb)
> > > >      }
> > > >  }
> > > >
> > > > +// Compute ranges defined in block.
> > > > +
> > > > +void
> > > > +path_range_query::compute_ranges_defined (basic_block bb)
> > > > +{
> > > > +  int_range_max r;
> > > > +
> > > > +  compute_ranges_in_phis (bb);
> > > > +
> > > > +  // Iterate in gimple order to minimize recursion.
> > > > +  for (auto gsi = gsi_start_nondebug_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > >
> > > gsi_next_nondebug (&gsi)?
> > >
> > > Of course this all has the extra cost of iterating over a possibly very
> > > large BB for just a few bits in m_imports?  How often does m_imports
> > > have exactly one bit set?
> >
> > Hmmm, good point.
> >
> > Perhaps this isn't worth it then.  I mean, the underlying bug I'm
> > tackling is an excess of outgoing edge ranges, not the excess
> > recursion this patch attacks.
> >
> > If you think the cost would be high for large ILs, I can revert the patch.
>
> I think so.  If ordering is important then that should be achieved in
> some other way (always a bit difficult for on-demand infrastructure).

Nah, this isn't a correctness issue.  It's not worth it.  I will revert
the patch.

Thanks.
Aldy
Re: [PATCH] Remove dead code and function
> The only use of get_alias_symbol is gated by a gcc_unreachable (),
> so the following patch gets rid of it.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

OK, thanks!

Honza
Do not check gimple_call_chain in tree-ssa-alias
Hi,
this patch removes the gimple_call_chain check in ref_maybe_used_by_call_p
that disables the CONST function handling.  I suppose it was meant to allow
consts to read variables from the static chain, but this is not what other
places do.  The testcase:

int
main()
{
  int a = 0;
  __attribute__ ((noinline,const))
  int reta ()
  {
    return a;
  }
  int val = reta();
  a = 1;
  return val + reta ();
}

gets optimized to a single call of reta since at least gcc 4.1.

LTO bootstrapped and regtested x86_64-linux all languages.  OK?

	* tree-ssa-alias.c (ref_maybe_used_by_call_p_1): Do not check
	gimple_call_chain when handling const functions.

diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
index cd6a0b2f67b..3c253e2843f 100644
--- a/gcc/tree-ssa-alias.c
+++ b/gcc/tree-ssa-alias.c
@@ -2743,9 +2743,7 @@ ref_maybe_used_by_call_p_1 (gcall *call, ao_ref *ref, bool tbaa_p)
   unsigned i;
   int flags = gimple_call_flags (call);
 
-  /* Const functions without a static chain do not implicitly use memory.  */
-  if (!gimple_call_chain (call)
-      && (flags & (ECF_CONST|ECF_NOVOPS)))
+  if (flags & (ECF_CONST|ECF_NOVOPS))
     goto process_args;
 
   /* A call that is not without side-effects might involve volatile
Re: [PATCH][RFC] middle-end/46476 - resurrect -Wunreachable-code
Hello,

On Thu, 25 Nov 2021, Richard Biener wrote:

> > Yes, that's definitely the case - I was too lazy to re-use the old
> > option name here.  But I don't have a good name at hand, maybe clang
> > has an option covering the cases I'm thinking about.

As you asked: I already have difficulties describing the exact semantics
of the warning in sentences, so I don't find a good name either :-)

> > Btw, the diagnostic spotted qsort_chk doing
> >
> >   if (CMP (i1, i2))
> >     break;
> >   else if (CMP (i2, i1))
> >     return ERR2 (i1, i2);
> >
> > where ERR2 expands to a call to a noreturn void "returning"
> > qsort_chk_error, so the 'return' stmt is not reachable.  Not exactly
> > a bug but somewhat difficult to avoid the diagnostic for.  I suppose
> > the pointless 'return' is to make it more visible that the loop
> > terminates here (albeit we don't return normally).

Tough one.  You could also disable the warning when the fallthrough
doesn't exist because of a non-returning call.  If it's supposed to find
obvious programming mistakes it might make sense to regard all function
calls the same, like they look, i.e. as function calls that can return.
Or it might make sense to not do that for programmers who happen to know
about non-returning functions. :-/

> It also finds this strange code in label_rtx_for_bb:

So the warning is definitely useful!

> indeed the loop looks pointless.  Unless the DECL_NONLOCAL case was
> meant to continue;

It's been like that since it was introduced in 2007.  It's an invariant
that DECL_NONLOCAL labels are first in a BB and are not followed by
normal labels, so a 'continue' wouldn't change anything; the loop is
useless.


Ciao,
Michael.
[PATCH] [RFC] unreachable returns
We have quite a number of "default" returns that cannot be reached.
One is particularly interesting since it says (see patch below):

    default:
      gcc_unreachable ();
    }
  /* We can get here with --disable-checking.  */
  return false;

which suggests that _maybe_ the intention was to have the
gcc_unreachable () which expands to __builtin_unreachable ()
with --disable-checking and thus a fallthru to "somewhere"
be caught with a "sane" default return value rather than
falling through to the next function or so.  BUT - that
isn't what actually happens since the 'return false' is
unreachable after CFG construction and will be elided.
In fact the IL after CFG construction is exactly the same
with and without the spurious return.

Now, I wonder if we should, instead of expanding gcc_unreachable
to __builtin_unreachable () with --disable-checking, expand it
to __builtin_trap () (or remove the --disable-checking variant
completely, always retaining assert-level checking but maybe
make it cheaper in size by using __builtin_trap () or abort ())

Thoughts?

That said, I do have a set of changes removing such spurious
returns.

2021-11-25  Richard Biener

gcc/c/
	* c-typeck.c (c_tree_equal): Remove unreachable return.
---
 gcc/c/c-typeck.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index b71358e1821..7524304f2bd 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -15984,8 +15984,6 @@ c_tree_equal (tree t1, tree t2)
     default:
       gcc_unreachable ();
     }
-  /* We can get here with --disable-checking.  */
-  return false;
 }
 
 /* Returns true when the function declaration FNDECL is implicit,
-- 
2.31.1
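To make the alternative concrete, a sketch of the shape the macro change
could take.  The real definition lives in system.h and is keyed off the
checking macros; treat this as an assumption about one possible variant,
not the actual proposal's code:

/* Hedged sketch.  With checking enabled, gcc_unreachable () aborts
   with location information as it does today; the idea floated above
   is to trap instead of expanding to __builtin_unreachable () when
   checking is disabled, so a "can't happen" fallthru stops the
   compiler rather than running into undefined behavior.  */
#if CHECKING_P
#define gcc_unreachable() \
  (fancy_abort (__FILE__, __LINE__, __FUNCTION__))
#else
#define gcc_unreachable() (__builtin_trap ())
#endif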
Re: [PATCH] [RFC] unreachable returns
> We have quite a number of "default" returns that cannot be reached.
> One is particularly interesting since it says (see patch below):
>
>     default:
>       gcc_unreachable ();
>     }
>   /* We can get here with --disable-checking.  */
>   return false;
>
> which suggests that _maybe_ the intention was to have the
> gcc_unreachable () which expands to __builtin_unreachable ()
> with --disable-checking and thus a fallthru to "somewhere"
> be caught with a "sane" default return value rather than
> falling through to the next function or so.  BUT - that
> isn't what actually happens since the 'return false' is
> unreachable after CFG construction and will be elided.

I think this is just a remnant of the times when we did not have
__builtin_unreachable.  I like the idea of removing the redundant code.

Honza
[PATCH] Remove unreachable gcc_unreachable () at the end of functions
It seems to be a style to place gcc_unreachable () after a switch that
handles all cases, with every case returning.  Those are unreachable
(well, yes!), so they will be elided at CFG construction time and the
middle-end will place another __builtin_unreachable "after" them to
note the path doesn't lead to a return when the function is not
declared void.

So IMHO those explicit gcc_unreachable () serve no purpose; they could
be replaced by a comment.  But with switches that cover all cases, not
handling a case or not returning will likely cause some diagnostic to
be emitted, which is better than running into an ICE only at runtime.

Bootstrapped and tested on x86_64-unknown-linux-gnu - any comments?

Thanks,
Richard.

2021-11-24  Richard Biener

	* tree.h (reverse_storage_order_for_component_p): Remove
	spurious gcc_unreachable.
	* cfganal.c (dfs_find_deadend): Likewise.
	* fold-const-call.c (fold_const_logb): Likewise.
	(fold_const_significand): Likewise.
	* gimple-ssa-store-merging.c (lhs_valid_for_store_merging_p):
	Likewise.

gcc/c-family/
	* c-format.c (check_format_string): Remove spurious
	gcc_unreachable.
---
 gcc/c-family/c-format.c        | 2 --
 gcc/cfganal.c                  | 2 --
 gcc/fold-const-call.c          | 2 --
 gcc/gimple-ssa-store-merging.c | 2 --
 gcc/tree.h                     | 2 --
 5 files changed, 10 deletions(-)

diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
index e735e092043..617fb5ea626 100644
--- a/gcc/c-family/c-format.c
+++ b/gcc/c-family/c-format.c
@@ -296,8 +296,6 @@ check_format_string (const_tree fntype, unsigned HOST_WIDE_INT format_num,
       *no_add_attrs = true;
       return false;
     }
-
-  gcc_unreachable ();
 }
 
 /* Under the control of FLAGS, verify EXPR is a valid constant that
diff --git a/gcc/cfganal.c b/gcc/cfganal.c
index 0cba612738d..48598e55c01 100644
--- a/gcc/cfganal.c
+++ b/gcc/cfganal.c
@@ -752,8 +752,6 @@ dfs_find_deadend (basic_block bb)
 	  next = e ? e->dest : EDGE_SUCC (bb, 0)->dest;
 	}
     }
-
-  gcc_unreachable ();
 }
 
 diff --git a/gcc/fold-const-call.c b/gcc/fold-const-call.c
index d6cb9b11a31..c542e780a18 100644
--- a/gcc/fold-const-call.c
+++ b/gcc/fold-const-call.c
@@ -429,7 +429,6 @@ fold_const_logb (real_value *result, const real_value *arg,
 	}
       return false;
     }
-  gcc_unreachable ();
 }
 
 /* Try to evaluate:
@@ -463,7 +462,6 @@ fold_const_significand (real_value *result, const real_value *arg,
 	}
       return false;
     }
-  gcc_unreachable ();
 }
 
 /* Try to evaluate:
diff --git a/gcc/gimple-ssa-store-merging.c b/gcc/gimple-ssa-store-merging.c
index e7c90ba8b59..13413ca4cd6 100644
--- a/gcc/gimple-ssa-store-merging.c
+++ b/gcc/gimple-ssa-store-merging.c
@@ -4861,8 +4861,6 @@ lhs_valid_for_store_merging_p (tree lhs)
     default:
       return false;
     }
-
-  gcc_unreachable ();
 }
 
 /* Return true if the tree RHS is a constant we want to consider
diff --git a/gcc/tree.h b/gcc/tree.h
index f0e72b55abe..094501bd9b1 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -5110,8 +5110,6 @@ reverse_storage_order_for_component_p (tree t)
     default:
       return false;
     }
-
-  gcc_unreachable ();
 }
 
 /* Return true if T is a storage order barrier, i.e. a VIEW_CONVERT_EXPR
-- 
2.31.1
Re: Do not check gimple_call_chain in tree-ssa-alias
On Thu, 25 Nov 2021, Jan Hubicka wrote:

> Hi,
> this patch removes the gimple_call_chain check in ref_maybe_used_by_call_p
> that disables the CONST function handling.  I suppose it was meant to allow
> consts to read variables from the static chain, but this is not what other
> places do.  The testcase:
>
> int
> main()
> {
>   int a = 0;
>   __attribute__ ((noinline,const))
>   int reta ()
>   {
>     return a;
>   }
>   int val = reta();
>   a = 1;
>   return val + reta ();
> }
>
> gets optimized to a single call of reta since at least gcc 4.1.
>
> LTO bootstrapped and regtested x86_64-linux all languages.  OK?

I suppose at some point it broke.  But yes, I agree, thus OK.

Thanks,
Richard.

> 	* tree-ssa-alias.c (ref_maybe_used_by_call_p_1): Do not check
> 	gimple_call_chain when handling const functions.
>
> diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
> index cd6a0b2f67b..3c253e2843f 100644
> --- a/gcc/tree-ssa-alias.c
> +++ b/gcc/tree-ssa-alias.c
> @@ -2743,9 +2743,7 @@ ref_maybe_used_by_call_p_1 (gcall *call, ao_ref *ref, bool tbaa_p)
>    unsigned i;
>    int flags = gimple_call_flags (call);
>
> -  /* Const functions without a static chain do not implicitly use memory.  */
> -  if (!gimple_call_chain (call)
> -      && (flags & (ECF_CONST|ECF_NOVOPS)))
> +  if (flags & (ECF_CONST|ECF_NOVOPS))
>     goto process_args;
>
>    /* A call that is not without side-effects might involve volatile

-- 
Richard Biener
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)
Re: [PATCH] PR tree-optimization/103359 - Check for equivalences between PHI argument and def.
On 11/25/21 07:40, Richard Biener wrote:
> On Wed, Nov 24, 2021 at 9:49 PM Andrew MacLeod via Gcc-patches wrote:
> >
> > PHI nodes frequently feed each other, and this is particularly true of
> > the one/two incoming edge PHIs inserted by some of the loop analysis
> > code which is introduced at the start of the VRP passes.
> >
> > Ranger has a hybrid of optimistic vs pessimistic evaluation, and when
> > it switches to pessimistic, it has to assume VARYING for a range.
> > PHIs are calculated as the union of all incoming edges, so once we
> > throw a VARYING into the mix, there's not much chance of going back.
> > (mostly true... we can sometimes update the range when inputs change,
> > but we prefer to avoid iterating when possible)
> >
> > We already have code to recognize that if an argument to a PHI is the
> > same as the def, it cannot provide any additional information and is
> > skipped.  ie,
> >
> >    # h_10 = PHI <4(2), h_10(3), h_10(4), 1(7)>
> >
> > We can skip the h_10 arguments, and produce [1,1][4,4] as the range
> > without any additional information/processing.
> >
> > This patch extends that slightly to recognize that if the argument is
> > a known equivalence of the def, it also does not provide any
> > additional information.  This allows us to "ignore" some of the
> > pessimistic VARYING values that come in on back edges when the
> > relation oracle indicates that there is a known equivalence.
> >
> > Take for instance the sequence from the PR testcase:
> >
> >    :
> >    # h_7 = PHI <4(2), 1(4)>
> >
> >    :
> >    # h_18 = PHI
> >
> >    :
> >    # h_22 = PHI
> >
> >    :
> >    # h_20 = PHI
> >
> > We only fully calculate one range at a time, so when calculating h_18,
> > we need to first resolve the range of h_22 on the back edge 3->6.
> > That feeds back to h_18, which isn't fully calculated yet and is
> > pessimistically assumed to be VARYING until we do get a value.  With
> > h_22 being varying when resolving h_18 now, we end up making h_18
> > varying, and lose the info from h_7.
> >
> > This patch extends the equivalence observation slightly to recognize
> > that if the argument is a known equivalence of the def in the
> > predecessor block, it also does not provide any additional
> > information.  This allows us to ignore some of the pessimistic VARYING
> > values that are set when the relation oracle indicates that there is a
> > known equivalence.
> >
> > In the above case, h_22 is known to be equivalent to h_18 in BB3, and
> > so we can ignore the range h_22 provides on any edge coming from bb3.
> > There is a caveat that if *all* the arguments to a PHI are in the
> > equivalence set, then you have to use the range of the equivalence...
> > otherwise you get UNDEFINED.
> >
> > This will help us to see through some of the artifacts of cycling PHIs
> > in these simple cases, and in the above case, we end up with h_7,
> > h_18, h_22 and h_20 all in the equivalence set with a range of
> > [1, 1][4, 4], and we can remove the code we need to, like we did in
> > GCC 11.
> >
> > This won't help with more complex PHI cycles, but that seems like
> > something we could be looking at elsewhere, phi-opt maybe, utilizing
> > ranger to set the global range when it's complex.
> >
> > Bootstrapped on x86_64-pc-linux-gnu with no regressions.  OK?
>
> OK.
>
> Thanks,
> Richard.

Committed.
Re: [PATCH][RFC] middle-end/46476 - resurrect -Wunreachable-code
On Thu, 25 Nov 2021, Michael Matz wrote: > Hello, > > On Thu, 25 Nov 2021, Richard Biener wrote: > > > > Yes, that's definitely the case - I was too lazy to re-use the old > > > option name here. But I don't have a good name at hand, maybe clang > > > has an option covering the cases I'm thinking about. > > As you asked: I already have difficulties describing the exact semantics > of the warning in sentences, so I don't find a good name either :-) It diagnoses some cases of unreachable code so -Wunreachable-code sounded like the obvious fit :P But names can create (wrong) expectation ... clang has -Wunreachable-code{,-aggressive,-break,-fallthrough,-loop-increment,-return} but documentation is very sparse, -break and -return are what -aggressive enables. > > > Btw, the diagnostic spotted qsort_chk doing > > > > > > if (CMP (i1, i2)) > > > break; > > > else if (CMP (i2, i1)) > > > return ERR2 (i1, i2); > > > > > > where ERR2 expands to a call to a noreturn void "returning" > > > qsort_chk_error, so the 'return' stmt is not reachable. Not exactly > > > a bug but somewhat difficult to avoid the diagnostic for. I suppose > > > the pointless 'return' is to make it more visible that the loop > > > terminates here (albeit we don't return normally). > > Tough one. You could also disable the warning when the fallthrough > doesn't exist because of a non-returning call. If it's supposed to find > obvious programming mistakes it might make sense to regard all function > calls the same, like they look, i.e. as function calls that can return. > Or it might make sense to not do that for programmers who happen to know > about non-returning functions. :-/ > > > It also finds this strange code in label_rtx_for_bb: > > So the warning is definitely useful! Yep, also found some more real issues. But I'm not managing to get it clean in a bootstrap due to some remaining issues with early folding exposing unreachable code following gcc_assert()s. Richard.
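To make the qsort_chk shape concrete, here is a hedged, self-contained sketch of the pattern that trips the diagnostic (die () is an invented stand-in for the noreturn qsort_chk_error):

extern void die (void) __attribute__ ((noreturn));

int
chk (const int *a, int n)
{
  for (int i = 0; i + 1 < n; i++)
    {
      if (a[i] < a[i + 1])
        break;
      else if (a[i + 1] < a[i])
        {
          die ();
          return -1;  /* Unreachable: die () never returns.  The return
                         only makes the loop exit visible to readers.  */
        }
    }
  return 0;
}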
Re: [AArch64] Enable generation of FRINTNZ instructions
On 22/11/2021 11:41, Richard Biener wrote: On 18/11/2021 11:05, Richard Biener wrote: This is a good shout and made me think about something I hadn't before... I thought I could handle the vector forms later, but the problem is if I add support for the scalar, it will stop the vectorizer. It seems vectorizable_call expects all arguments to have the same type, which doesn't work with the workaround of passing the integer type as an operand. We already special case some IFNs there (masked load/store and gather) to ignore some args, so that would just add to this set. Richard. Hi, Reworked it to add support for the new IFN to the vectorizer. I was initially trying to make vectorizable_call and vectorizable_internal_function handle IFNs with different inputs more generically, using the information we have in the _direct structs regarding what operands to get the modes from. Unfortunately, that wasn't straightforward because of how vectorizable_call assumes operands have the same type and uses the type of the DEF_STMT_INFO of the non-constant operands (either output operand or non-constant inputs) to determine the type of constants. I assume there is some reason why we use the DEF_STMT_INFO and not always use get_vectype_for_scalar_type on the argument types. That is why I ended up with this sort of half-way mix of both, which still allows room to add more IFNs that don't take inputs of the same type, but require adding a bit of special casing similar to the IFN_FTRUNC_INT and masking ones. Bootstrapped on aarch64-none-linux. OK for trunk? gcc/ChangeLog: * config/aarch64/aarch64.md (ftrunc2): New pattern. * config/aarch64/iterators.md (FRINTNZ): New iterator. (frintnz_mode): New int attribute. (VSFDF): Make iterator conditional. * internal-fn.def (FTRUNC_INT): New IFN. * internal-fn.c (ftrunc_int_direct): New define. (expand_ftrunc_int_optab_fn): New custom expander. (direct_ftrunc_int_optab_supported_p): New supported_p. * match.pd: Add to the existing TRUNC pattern match. * optabs.def (ftrunc_int): New entry. * stor-layout.h (element_precision): Moved from here... * tree.h (element_precision): ... to here. (element_type): New declaration. * tree.c (element_type): New function. (element_precision): Changed to use element_type. * tree-vect-stmts.c (vectorizable_internal_function): Add support for IFNs with different input types. (vectorizable_call): Teach to handle IFN_FTRUNC_INT. * doc/md.texi: New entry for ftrunc pattern name. * doc/sourcebuild.texi (aarch64_frintzx_ok): New target. gcc/testsuite/ChangeLog: * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintNz instruction available. * lib/target-supports.exp: Added arm_v8_5a_frintnzx_ok target. * gcc.target/aarch64/frintnz.c: New test. 
* gcc.target/aarch64/frintnz_vec.c: New test.diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 4035e061706793849c68ae09bcb2e4b9580ab7b6..c5c60e7a810e22b0ea9ed6bf056ddd6431d60269 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -7345,12 +7345,18 @@ (define_insn "despeculate_simpleti" (set_attr "speculation_barrier" "true")] ) +(define_expand "ftrunc2" + [(set (match_operand:VSFDF 0 "register_operand" "=w") +(unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")] + FRINTNZ))] + "TARGET_FRINT" +) + (define_insn "aarch64_" [(set (match_operand:VSFDF 0 "register_operand" "=w") (unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")] FRINTNZX))] - "TARGET_FRINT && TARGET_FLOAT - && !(VECTOR_MODE_P (mode) && !TARGET_SIMD)" + "TARGET_FRINT" "\\t%0, %1" [(set_attr "type" "f_rint")] ) diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index bdc8ba3576cf2c9b4ae96b45a382234e4e25b13f..51f00344b02d0d1d4adf97463f6a46f9fd0fb43f 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -160,7 +160,11 @@ (define_mode_iterator VHSDF_HSDF [(V4HF "TARGET_SIMD_F16INST") SF DF]) ;; Scalar and vetor modes for SF, DF. -(define_mode_iterator VSFDF [V2SF V4SF V2DF DF SF]) +(define_mode_iterator VSFDF [ (V2SF "TARGET_SIMD") + (V4SF "TARGET_SIMD") + (V2DF "TARGET_SIMD") + (DF "TARGET_FLOAT") + (SF "TARGET_FLOAT")]) ;; Advanced SIMD single Float modes. (define_mode_iterator VDQSF [V2SF V4SF]) @@ -3067,6 +3071,8 @@ (define_int_iterator FCMLA [UNSPEC_FCMLA (define_int_iterator FRINTNZX [UNSPEC_FRINT32Z UNSPEC_FRINT32X UNSPEC_FRINT64Z UNSPEC_FRINT64X]) +(define_int_iterator FRINTNZ [UNSPEC_FRINT32Z UNSPEC_FRINT64Z]) + (define_int_iter
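For readers unfamiliar with the instructions involved, a hedged sketch of the source pattern the new ftrunc optab targets (invented function name; the flags and codegen claims are illustrative only, e.g. FEAT_FRINTTS as enabled by -march=armv8.5-a):

/* Truncate toward zero via a 32-bit integer round trip.  With the new
   pattern this can become a single frint32z instead of an fcvtzs +
   scvtf pair; FRINT64Z covers a 64-bit intermediate range.  */
double
trunc_via_int32 (double x)
{
  return (double) (int) x;
}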
[PATCH] d: fix ASAN in option processing
Fixes: ==129444==ERROR: AddressSanitizer: global-buffer-overflow on address 0x0666ca5c at pc 0x00ef094b bp 0x7fff8180 sp 0x7fff8178 READ of size 4 at 0x0666ca5c thread T0 #0 0xef094a in parse_optimize_options ../../gcc/d/d-attribs.cc:855 #1 0xef0d36 in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:916 #2 0xef107e in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:887 #3 0xff85b1 in decl_attributes(tree_node**, tree_node*, int, tree_node*) ../../gcc/attribs.c:829 #4 0xef2a91 in apply_user_attributes(Dsymbol*, tree_node*) ../../gcc/d/d-attribs.cc:427 #5 0xf7b7f3 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:1346 #6 0xf87bc7 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:967 #7 0xf87bc7 in DeclVisitor::visit(FuncDeclaration*) ../../gcc/d/decl.cc:808 #8 0xf83db5 in DeclVisitor::build_dsymbol(Dsymbol*) ../../gcc/d/decl.cc:146 for the following test-case: gcc/testsuite/gdc.dg/attr_optimize1.d. Ready for master? Thanks, Martin gcc/d/ChangeLog: * d-attribs.cc (parse_optimize_options): Check index before accessing cl_options. --- gcc/d/d-attribs.cc | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/gcc/d/d-attribs.cc b/gcc/d/d-attribs.cc index d81b7d122f7..1ec800526f7 100644 --- a/gcc/d/d-attribs.cc +++ b/gcc/d/d-attribs.cc @@ -852,7 +852,9 @@ parse_optimize_options (tree args) unsigned j = 1; for (unsigned i = 1; i < decoded_options_count; ++i) { - if (! (cl_options[decoded_options[i].opt_index].flags & CL_OPTIMIZATION)) + unsigned opt_index = decoded_options[i].opt_index; + if (opt_index >= cl_options_count + || ! (cl_options[opt_index].flags & CL_OPTIMIZATION)) { ret = false; warning (OPT_Wattributes, -- 2.34.0
Re: [PATCH] Remove unreachable gcc_unreachable () at the end of functions
Hello, On Thu, 25 Nov 2021, Richard Biener via Gcc-patches wrote: > It seems to be a style to place gcc_unreachable () after a > switch that handles all cases with every case returning. > Those are unreachable (well, yes!), so they will be elided > at CFG construction time and the middle-end will place > another __builtin_unreachable "after" them to note the > path doesn't lead to a return when the function is not declared > void. > > So IMHO those explicit gcc_unreachable () serve no purpose, > they could be replaced by a comment. Never document in comments what you can document in code (IMO). I think the code as-is clearly documents the invariants and expectations and removing the gcc_unreachable() leads to worse sources. Can't you simply exempt the warning from unreachable __builtin_unreachable()? It seems an obvious thing that the warning should _not_ warn about; after all, quite clearly, the author is aware of that being unreachable, it says so, right there. Ciao, Michael.
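For reference, a minimal sketch of the style under discussion (invented names; the #define is a stand-in for GCC's internal macro, which actually expands to fancy_abort rather than __builtin_unreachable, as discussed below):

#define gcc_unreachable() __builtin_unreachable ()  /* stand-in only */

enum dir { DIR_UP, DIR_DOWN };

static int
sign_of (enum dir d)
{
  switch (d)
    {
    case DIR_UP:
      return 1;
    case DIR_DOWN:
      return -1;
    }
  gcc_unreachable ();  /* Every case returns; this documents the invariant.  */
}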
[COMMITTED] PR tree-optimization/102648 - Add the testcase for this PR to the testsuite.
Various ranger-enabled passes, such as threading or VRP2, resolve this now. I'm adding the test case before closing. Committed as obvious. Andrew commit 1598bd47b2a4a5f12b5a987d16d82634644db4b6 Author: Andrew MacLeod Date: Thu Nov 25 08:58:19 2021 -0500 Add the testcase for this PR to the testsuite. Various ranger-enabled passes like threading and VRP2 can do this now, so add the testcase for posterity. gcc/testsuite/ PR tree-optimization/102648 * gcc.dg/pr102648.c: New. diff --git a/gcc/testsuite/gcc.dg/pr102648.c b/gcc/testsuite/gcc.dg/pr102648.c new file mode 100644 index 000..a0f6386dde3 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr102648.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-optimized" } */ + +void foo(); +static char a, c; +static int d, e; +static short b(short f, short g) { return f * g; } +int main() { + short h = 4; + for (; d;) +if (h) + if(e) { +if (!b(a & 1 | h, 3)) + c = 0; +h = 1; + } + if (c) +foo(); +} + +/* { dg-final { scan-tree-dump-not "foo" "optimized" } } */
[PATCH 00/16] OpenMP: lvalues in "map" clauses and struct handling rework
Hi Jakub, This is a rebased/slightly bug-fixed version of several previously posted patch series, all in one place for ease of reference. The series should be applied on top of Chung-Lin's two patches: "Improve OpenMP target support for C++ [PR92120 v5]" https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584602.html "Remove array section base-pointer mapping semantics, and other front-end adjustments (mainline trunk)" https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584994.html And supersedes the following three patch series: "Topological sort for OpenMP 5.0 base pointers" https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577211.html "OpenMP: Deep struct dereferences" https://gcc.gnu.org/pipermail/gcc-patches/2021-October/580721.html "Parsing of lvalues for "map" clauses for C and C++" https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584445.html Tested with offloading to NVPTX and bootstrapped. Further commentary on individual patches. OK? Thanks, Julian Julian Brown (16): Rewrite GOMP_MAP_ATTACH_DETACH mappings unconditionally OpenMP/OpenACC: Move array_ref/indirect_ref handling code out of extract_base_bit_offset OpenACC/OpenMP: Refactor struct lowering in gimplify.c OpenACC: Rework indirect struct handling in gimplify.c Remove base_ind/base_ref handling from extract_base_bit_offset OpenMP 5.0: Clause ordering for OpenMP 5.0 (topological sorting by base pointer) Remove omp_target_reorder_clauses OpenMP/OpenACC: Hoist struct sibling list handling in gimplification OpenMP: Allow array ref components for C & C++ OpenMP: Fix non-zero attach/detach bias for struct dereferences OpenMP: Handle reference-typed struct members OpenACC: Make deep-copy-arrayofstruct.c a libgomp/runtime test Add debug_omp_expr OpenMP: Add inspector class to unify mapped address analysis OpenMP: lvalue parsing for map clauses (C++) OpenMP: lvalue parsing for map clauses (C) gcc/c-family/c-common.h | 45 + gcc/c-family/c-omp.c | 210 ++ gcc/c/c-parser.c | 150 +- gcc/c/c-tree.h|1 + gcc/c/c-typeck.c | 250 +- gcc/cp/error.c|9 + gcc/cp/parser.c | 141 +- gcc/cp/parser.h |3 + gcc/cp/semantics.c| 290 +- gcc/fortran/trans-openmp.c| 20 +- gcc/gimplify.c| 2458 +++-- gcc/omp-low.c | 23 +- gcc/testsuite/c-c++-common/gomp/map-1.c |3 +- gcc/testsuite/c-c++-common/gomp/map-6.c |6 +- gcc/testsuite/g++.dg/goacc/member-array-acc.C | 13 + gcc/testsuite/g++.dg/gomp/ind-base-3.C| 38 + gcc/testsuite/g++.dg/gomp/map-assignment-1.C | 12 + gcc/testsuite/g++.dg/gomp/map-inc-1.C | 10 + gcc/testsuite/g++.dg/gomp/map-lvalue-ref-1.C | 19 + gcc/testsuite/g++.dg/gomp/map-ptrmem-1.C | 36 + gcc/testsuite/g++.dg/gomp/map-ptrmem-2.C | 39 + .../g++.dg/gomp/map-static-cast-lvalue-1.C| 17 + gcc/testsuite/g++.dg/gomp/map-ternary-1.C | 20 + gcc/testsuite/g++.dg/gomp/member-array-2.C| 86 + gcc/testsuite/g++.dg/gomp/member-array-omp.C | 13 + gcc/testsuite/g++.dg/gomp/pr67522.C |2 +- gcc/testsuite/g++.dg/gomp/target-3.C |4 +- gcc/testsuite/g++.dg/gomp/target-lambda-1.C |6 +- gcc/testsuite/g++.dg/gomp/target-this-2.C |2 +- gcc/testsuite/g++.dg/gomp/target-this-3.C |4 +- gcc/testsuite/g++.dg/gomp/target-this-4.C |4 +- .../g++.dg/gomp/unmappable-component-1.C | 21 + gcc/tree-pretty-print.c | 45 + gcc/tree-pretty-print.h |1 + gcc/tree.def |3 + libgomp/testsuite/libgomp.c++/baseptrs-3.C| 275 ++ libgomp/testsuite/libgomp.c++/ind-base-1.C| 162 ++ libgomp/testsuite/libgomp.c++/ind-base-2.C| 49 + libgomp/testsuite/libgomp.c++/map-comma-1.C | 15 + .../testsuite/libgomp.c++/map-rvalue-ref-1.C | 22 + .../testsuite/libgomp.c++/member-array-1.C| 89 + 
libgomp/testsuite/libgomp.c++/struct-ref-1.C | 97 + .../libgomp.c-c++-common/baseptrs-1.c | 50 + .../libgomp.c-c++-common/baseptrs-2.c | 70 + .../libgomp.c-c++-common/ind-base-4.c | 50 + .../libgomp.c-c++-common/unary-ptr-1.c| 16 + .../testsuite/libgomp.oacc-c++/deep-copy-17.C | 101 + .../libgomp.oacc-c-c++-common/deep-copy-15.c | 68 + .../libgomp.oacc-c-c++-common/deep-copy-16.c | 231 ++ .../deep-copy-arrayofstruct.c |2 +- 50 files changed, 4114 insertions(+), 1187 deletions(-) create mode 100644 gcc/testsuite/g++.dg/goacc/member-array-acc.C create mode 100644 gcc/testsuite/g++.dg/gomp/ind-base-3.C cr
[PATCH 01/16] Rewrite GOMP_MAP_ATTACH_DETACH mappings unconditionally
It never makes sense for a GOMP_MAP_ATTACH_DETACH mapping to survive beyond gimplify.c, so this patch rewrites such mappings to GOMP_MAP_ATTACH or GOMP_MAP_DETACH unconditionally (rather than checking for a list of types of OpenACC or OpenMP constructs), in cases where it hasn't otherwise been done already in the preceding code. Previously posted here: https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570399.html https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571711.html (og11) OK? Thanks, Julian 2021-06-02 Julian Brown gcc/ * gimplify.c (gimplify_scan_omp_clauses): Simplify condition for changing GOMP_MAP_ATTACH_DETACH to GOMP_MAP_ATTACH or GOMP_MAP_DETACH. --- gcc/gimplify.c | 10 +- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/gcc/gimplify.c b/gcc/gimplify.c index 4cd62270a10..8d8735ae4c1 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -9965,15 +9965,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, skip_map_struct: ; } - else if ((code == OACC_ENTER_DATA - || code == OACC_EXIT_DATA - || code == OACC_DATA - || code == OACC_PARALLEL - || code == OACC_KERNELS - || code == OACC_SERIAL - || code == OMP_TARGET_ENTER_DATA - || code == OMP_TARGET_EXIT_DATA) - && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH_DETACH) + else if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH_DETACH) { gomp_map_kind k = ((code == OACC_EXIT_DATA || code == OMP_TARGET_EXIT_DATA) -- 2.29.2
[PATCH 02/16] OpenMP/OpenACC: Move array_ref/indirect_ref handling code out of extract_base_bit_offset
This patch slightly cleans up the semantics of extract_base_bit_offset, in that the stripping of ARRAY_REFS/INDIRECT_REFS out of extract_base_bit_offset is moved back into the (two) call sites of the function. This is done in preparation for follow-on patches that extend the function. Previously posted for the og11 branch here (patch & reversion/rework): https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571712.html https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571884.html OK? Thanks, Julian 2021-06-03 Julian Brown gcc/ * gimplify.c (extract_base_bit_offset): Don't look through ARRAY_REFs or INDIRECT_REFs here. (build_struct_group): Reinstate previous behaviour for handling ARRAY_REFs/INDIRECT_REFs. --- gcc/gimplify.c | 59 +- 1 file changed, 29 insertions(+), 30 deletions(-) diff --git a/gcc/gimplify.c b/gcc/gimplify.c index 8d8735ae4c1..1baea68920b 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -8675,31 +8675,7 @@ extract_base_bit_offset (tree base, tree *base_ref, poly_int64 *bitposp, poly_offset_int poffset; if (base_ref) -{ - *base_ref = NULL_TREE; - - while (TREE_CODE (base) == ARRAY_REF) - base = TREE_OPERAND (base, 0); - - if (TREE_CODE (base) == INDIRECT_REF) - base = TREE_OPERAND (base, 0); -} - else -{ - if (TREE_CODE (base) == ARRAY_REF) - { - while (TREE_CODE (base) == ARRAY_REF) - base = TREE_OPERAND (base, 0); - if (TREE_CODE (base) != COMPONENT_REF - || TREE_CODE (TREE_TYPE (base)) != ARRAY_TYPE) - return NULL_TREE; - } - else if (TREE_CODE (base) == INDIRECT_REF - && TREE_CODE (TREE_OPERAND (base, 0)) == COMPONENT_REF - && (TREE_CODE (TREE_TYPE (TREE_OPERAND (base, 0))) - == REFERENCE_TYPE)) - base = TREE_OPERAND (base, 0); -} +*base_ref = NULL_TREE; base = get_inner_reference (base, &bitsize, &bitpos, &offset, &mode, &unsignedp, &reversep, &volatilep); @@ -9673,12 +9649,17 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, poly_offset_int offset1; poly_int64 bitpos1; tree tree_offset1; - tree base_ref; + tree base_ref, ocd = OMP_CLAUSE_DECL (c); - tree base - = extract_base_bit_offset (OMP_CLAUSE_DECL (c), &base_ref, - &bitpos1, &offset1, - &tree_offset1); + while (TREE_CODE (ocd) == ARRAY_REF) + ocd = TREE_OPERAND (ocd, 0); + + if (TREE_CODE (ocd) == INDIRECT_REF) + ocd = TREE_OPERAND (ocd, 0); + + tree base = extract_base_bit_offset (ocd, &base_ref, + &bitpos1, &offset1, + &tree_offset1); bool do_map_struct = (base == decl && !tree_offset1); @@ -9871,6 +9852,24 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, poly_offset_int offsetn; poly_int64 bitposn; tree tree_offsetn; + + if (TREE_CODE (sc_decl) == ARRAY_REF) + { + while (TREE_CODE (sc_decl) == ARRAY_REF) + sc_decl = TREE_OPERAND (sc_decl, 0); + if (TREE_CODE (sc_decl) != COMPONENT_REF + || (TREE_CODE (TREE_TYPE (sc_decl)) + != ARRAY_TYPE)) + break; + } + else if (TREE_CODE (sc_decl) == INDIRECT_REF +&& (TREE_CODE (TREE_OPERAND (sc_decl, 0)) +== COMPONENT_REF) +&& (TREE_CODE (TREE_TYPE + (TREE_OPERAND (sc_decl, 0))) +== REFERENCE_TYPE)) + sc_decl = TREE_OPERAND (sc_decl, 0); + tree base = extract_base_bit_offset (sc_decl, NULL, &bitposn, &offsetn, -- 2.29.2
[PATCH 03/16] OpenACC/OpenMP: Refactor struct lowering in gimplify.c
(Previously submitted here: https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570398.html) This patch is a second attempt at refactoring struct component mapping handling for OpenACC/OpenMP during gimplification, after the patch I posted here: https://gcc.gnu.org/pipermail/gcc-patches/2018-November/510503.html And improved here, post-review: https://gcc.gnu.org/pipermail/gcc-patches/2019-November/533394.html This patch goes further, in that the struct-handling code is outlined into its own function (to create the "GOMP_MAP_STRUCT" node and the sorted list of nodes immediately following it, from a set of mappings of components of a given struct or derived type). I've also gone through the list-handling code and attempted to add comments documenting how it works to the best of my understanding, and broken out a couple of helper functions in order to (hopefully) have the code self-document better also. OK? Thanks, Julian 2021-06-02 Julian Brown gcc/ * gimplify.c (insert_struct_comp_map): Refactor function into... (build_struct_comp_nodes): This new function. Remove list handling and improve self-documentation. (insert_node_after, move_node_after, move_nodes_after, move_concat_nodes_after): New helper functions. (build_struct_group): New function to build up GOMP_MAP_STRUCT node groups to map struct components. Outlined from... (gimplify_scan_omp_clauses): Here. Call above function. --- gcc/gimplify.c | 976 +++-- 1 file changed, 611 insertions(+), 365 deletions(-) diff --git a/gcc/gimplify.c b/gcc/gimplify.c index 1baea68920b..c5e058d6d1f 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -8588,73 +8588,66 @@ gimplify_omp_depend (tree *list_p, gimple_seq *pre_p) return 1; } -/* Insert a GOMP_MAP_ALLOC or GOMP_MAP_RELEASE node following a - GOMP_MAP_STRUCT mapping. C is an always_pointer mapping. STRUCT_NODE is - the struct node to insert the new mapping after (when the struct node is - initially created). PREV_NODE is the first of two or three mappings for a - pointer, and is either: - - the node before C, when a pair of mappings is used, e.g. for a C/C++ - array section. - - not the node before C. This is true when we have a reference-to-pointer - type (with a mapping for the reference and for the pointer), or for - Fortran derived-type mappings with a GOMP_MAP_TO_PSET. - If SCP is non-null, the new node is inserted before *SCP. - if SCP is null, the new node is inserted before PREV_NODE. - The return type is: - - PREV_NODE, if SCP is non-null. - - The newly-created ALLOC or RELEASE node, if SCP is null. - - The second newly-created ALLOC or RELEASE node, if we are mapping a - reference to a pointer. */ +/* For a set of mappings describing an array section pointed to by a struct + (or derived type, etc.) component, create an "alloc" or "release" node to + insert into a list following a GOMP_MAP_STRUCT node. For some types of + mapping (e.g. Fortran arrays with descriptors), an additional mapping may + be created that is inserted into the list of mapping nodes attached to the + directive being processed -- not part of the sorted list of nodes after + GOMP_MAP_STRUCT. + + CODE is the code of the directive being processed. GRP_START and GRP_END + are the first and last of two or three nodes representing this array section + mapping (e.g. a data movement node like GOMP_MAP_{TO,FROM}, optionally a + GOMP_MAP_TO_PSET, and finally a GOMP_MAP_ALWAYS_POINTER). EXTRA_NODE is + filled with the additional node described above, if needed. + + This function does not add the new nodes to any lists itself. 
It is the + responsibility of the caller to do that. */ static tree -insert_struct_comp_map (enum tree_code code, tree c, tree struct_node, - tree prev_node, tree *scp) +build_struct_comp_nodes (enum tree_code code, tree grp_start, tree grp_end, +tree *extra_node) { enum gomp_map_kind mkind = (code == OMP_TARGET_EXIT_DATA || code == OACC_EXIT_DATA) ? GOMP_MAP_RELEASE : GOMP_MAP_ALLOC; - tree c2 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP); - tree cl = scp ? prev_node : c2; + gcc_assert (grp_start != grp_end); + + tree c2 = build_omp_clause (OMP_CLAUSE_LOCATION (grp_end), OMP_CLAUSE_MAP); OMP_CLAUSE_SET_MAP_KIND (c2, mkind); - OMP_CLAUSE_DECL (c2) = unshare_expr (OMP_CLAUSE_DECL (c)); - OMP_CLAUSE_CHAIN (c2) = scp ? *scp : prev_node; - if (OMP_CLAUSE_CHAIN (prev_node) != c - && OMP_CLAUSE_CODE (OMP_CLAUSE_CHAIN (prev_node)) == OMP_CLAUSE_MAP - && (OMP_CLAUSE_MAP_KIND (OMP_CLAUSE_CHAIN (prev_node)) - == GOMP_MAP_TO_PSET)) -OMP_CLAUSE_SIZE (c2) = OMP_CLAUSE_SIZE (OMP_CLAUSE_CHAIN (prev_node)); + OMP_CLAUSE_DECL (c2) = unshare_expr (OMP_CLAUSE_DECL (grp_end)); + OMP_CLAUSE_CHAIN (c2)
[PATCH 04/16] OpenACC: Rework indirect struct handling in gimplify.c
(Previously posted here: https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570400.html) This patch reworks indirect struct handling in gimplify.c (i.e. for struct components mapped with "mystruct->a[0:n]", "mystruct->b", etc.), for OpenACC. The key observation leading to these changes was that component mappings of references-to-structures is already implemented and working, and indirect struct component handling via a pointer can work quite similarly. That lets us remove some earlier, special-case handling for mapping indirect struct component accesses for OpenACC, which required the pointed-to struct to be manually mapped before the indirect component mapping. With this patch, you can map struct components directly (e.g. an array slice "mystruct->a[0:n]") just like you can map a non-indirect struct component slice ("mystruct.a[0:n]"). Both references-to-pointers (with the former syntax) and references to structs (with the latter syntax) work now. For Fortran class pointers, we no longer re-use GOMP_MAP_TO_PSET for the class metadata (the structure that points to the class data and vptr) -- it is instead treated as any other struct. For C++, the struct handling also works for class members ("this->foo"), without having to explicitly map "this[:1]" first. For OpenACC, we permit chained indirect component references ("mystruct->a->b[0:n]"), though only the last part of such mappings will trigger an attach/detach operation. To properly use such a construct on the target, you must still manually map "mystruct->a[:1]" first -- but there's no need to map "mystruct[:1]" explicitly before that. This version of the patch avoids altering code paths for OpenMP, where possible. (Those are dealt with by later patches in this series.) OK? Thanks, Julian 2021-06-02 Julian Brown gcc/fortran/ * trans-openmp.c (gfc_trans_omp_clauses): Don't create GOMP_MAP_TO_PSET mappings for class metadata, nor GOMP_MAP_POINTER mappings for POINTER_TYPE_P decls. gcc/ * gimplify.c (extract_base_bit_offset): Add BASE_IND and OPENMP parameters. Handle pointer-typed indirect references for OpenACC alongside reference-typed ones. (strip_components_and_deref, aggregate_base_p): New functions. (build_struct_group): Add pointer type indirect ref handling, including chained references, for OpenACC. Also handle references to structs for OpenACC. Conditionalise bits for OpenMP only where appropriate. (gimplify_scan_omp_clauses): Rework pointer-type indirect structure access handling to work more like the reference-typed handling for OpenACC only. * omp-low.c (scan_sharing_clauses): Handle pointer-type indirect struct references, and references to pointers to structs also. gcc/testsuite/ * g++.dg/goacc/member-array-acc.C: New test. * g++.dg/gomp/member-array-omp.C: New test. libgomp/ * testsuite/libgomp.oacc-c-c++-common/deep-copy-15.c: New test. * testsuite/libgomp.oacc-c-c++-common/deep-copy-16.c: New test. * testsuite/libgomp.oacc-c++/deep-copy-17.C: New test. 
--- gcc/fortran/trans-openmp.c| 20 +- gcc/gimplify.c| 214 +--- gcc/omp-low.c | 16 +- gcc/testsuite/g++.dg/goacc/member-array-acc.C | 13 + gcc/testsuite/g++.dg/gomp/member-array-omp.C | 13 + .../testsuite/libgomp.oacc-c++/deep-copy-17.C | 101 .../libgomp.oacc-c-c++-common/deep-copy-15.c | 68 ++ .../libgomp.oacc-c-c++-common/deep-copy-16.c | 231 ++ 8 files changed, 618 insertions(+), 58 deletions(-) create mode 100644 gcc/testsuite/g++.dg/goacc/member-array-acc.C create mode 100644 gcc/testsuite/g++.dg/gomp/member-array-omp.C create mode 100644 libgomp/testsuite/libgomp.oacc-c++/deep-copy-17.C create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-15.c create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-16.c diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c index 7d761e90dd7..508e02306e9 100644 --- a/gcc/fortran/trans-openmp.c +++ b/gcc/fortran/trans-openmp.c @@ -3034,30 +3034,16 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses, tree present = gfc_omp_check_optional_argument (decl, true); if (openacc && n->sym->ts.type == BT_CLASS) { - tree type = TREE_TYPE (decl); if (n->sym->attr.optional) sorry ("optional class parameter"); - if (POINTER_TYPE_P (type)) - { - node4 = build_omp_clause (input_location, - OMP_CLAUSE_MAP); - OMP_CLAUSE_SET_MAP_KIND (node4, GOMP_MAP_POINTER); - OMP_CLAUSE_DECL (node4)
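As a hedged illustration (invented code, not from the testsuite) of what the rework permits for OpenACC: an array slice can be mapped through a pointer-to-struct directly, with the enclosing struct no longer needing an explicit map first:

struct s { int *a; };

void
f (struct s *mystruct, int n)
{
  /* Before this rework, OpenACC required mapping mystruct[:1] by hand
     before attaching mystruct->a; now the slice maps directly.  */
#pragma acc parallel loop copy(mystruct->a[0:n])
  for (int i = 0; i < n; i++)
    mystruct->a[i] += 1;
}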
Re: [PATCH] Remove unreachable gcc_unreachable () at the end of functions
On Thu, 25 Nov 2021, Michael Matz wrote: > Hello, > > On Thu, 25 Nov 2021, Richard Biener via Gcc-patches wrote: > > > It seems to be a style to place gcc_unreachable () after a > > switch that handles all cases with every case returning. > > Those are unreachable (well, yes!), so they will be elided > > at CFG construction time and the middle-end will place > > another __builtin_unreachable "after" them to note the > > path doesn't lead to a return when the function is not declared > > void. > > > > So IMHO those explicit gcc_unreachable () serve no purpose, > > if they could be replaced by a comment. > > Never document in comments what you can document in code (IMO). I think > the code as-is clearly documents the invariants and expectations and > removing the gcc_unreachable() leads to worse sources. > > Can't you simply exempt warning on unreachable __builtin_unreachable()? > It seems an obvious thing that the warning should _not_ warn about, after > all, quite clearly, the author is aware of that being unreachable, it says > so, right there. gcc_unreachable () is not actually __builtin_unreachable () but instead fancy_abort (__FILE__, __LINE__, __FUNCTION__). Yes, I agree that the warning shouldn't warn about "this is unrechable", but if it's not plain __builtin_unreachable () then we'd need a new function attribute on it which in this particular case means an alternate "fancy_abort" since in general fancy_aborts are of course reachable. We could also handle all noreturn calls this way and not diagnose those if they are unreachable in exchange for some false negatives. Btw, I don't agree with "Never document in comments what you can document in code" in this case, but I take it as a hint that removing gcc_unreachable in those cases should at least leave a comment in there? Richard.
[PATCH 05/16] Remove base_ind/base_ref handling from extract_base_bit_offset
In preparation for follow-up patches extending struct dereference handling for OpenMP, this patch removes base_ind/base_ref handling from gimplify.c:extract_base_bit_offset. This arguably simplifies some of the code around the callers of the function also, though subsequent patches modify those parts further. (This one has already been approved, pending approval of the rest of the series: https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581426.html) 2021-09-29 Julian Brown gcc/ * gimplify.c (extract_base_bit_offset): Remove BASE_IND, BASE_REF and OPENMP parameters. (strip_indirections): New function. (build_struct_group): Update calls to extract_base_bit_offset. Rearrange indirect/reference handling accordingly. Use extracted base instead of passed-in decl when grouping component accesses together. --- gcc/gimplify.c | 109 ++--- 1 file changed, 57 insertions(+), 52 deletions(-) diff --git a/gcc/gimplify.c b/gcc/gimplify.c index fcc278d07cf..73b839daa09 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -8658,9 +8658,8 @@ build_struct_comp_nodes (enum tree_code code, tree grp_start, tree grp_end, has array type, else return NULL. */ static tree -extract_base_bit_offset (tree base, tree *base_ind, tree *base_ref, -poly_int64 *bitposp, poly_offset_int *poffsetp, -tree *offsetp, bool openmp) +extract_base_bit_offset (tree base, poly_int64 *bitposp, +poly_offset_int *poffsetp, tree *offsetp) { tree offset; poly_int64 bitsize, bitpos; @@ -8668,38 +8667,12 @@ extract_base_bit_offset (tree base, tree *base_ind, tree *base_ref, int unsignedp, reversep, volatilep = 0; poly_offset_int poffset; - if (base_ind) -*base_ind = NULL_TREE; - - if (base_ref) -*base_ref = NULL_TREE; + STRIP_NOPS (base); base = get_inner_reference (base, &bitsize, &bitpos, &offset, &mode, &unsignedp, &reversep, &volatilep); - if (!openmp - && (TREE_CODE (base) == INDIRECT_REF - || (TREE_CODE (base) == MEM_REF - && integer_zerop (TREE_OPERAND (base, 1 - && TREE_CODE (TREE_TYPE (TREE_OPERAND (base, 0))) == POINTER_TYPE) -{ - if (base_ind) - *base_ind = base; - base = TREE_OPERAND (base, 0); -} - if ((TREE_CODE (base) == INDIRECT_REF - || (TREE_CODE (base) == MEM_REF - && integer_zerop (TREE_OPERAND (base, 1 - && DECL_P (TREE_OPERAND (base, 0)) - && TREE_CODE (TREE_TYPE (TREE_OPERAND (base, 0))) == REFERENCE_TYPE) -{ - if (base_ref) - *base_ref = base; - base = TREE_OPERAND (base, 0); -} - - if (!openmp) -STRIP_NOPS (base); + STRIP_NOPS (base); if (offset && poly_int_tree_p (offset)) { @@ -8756,6 +8729,17 @@ strip_components_and_deref (tree expr) return expr; } +static tree +strip_indirections (tree expr) +{ + while (TREE_CODE (expr) == INDIRECT_REF +|| (TREE_CODE (expr) == MEM_REF +&& integer_zerop (TREE_OPERAND (expr, 1 +expr = TREE_OPERAND (expr, 0); + + return expr; +} + /* Return TRUE if EXPR is something we will use as the base of an aggregate access, either: @@ -9249,7 +9233,7 @@ build_struct_group (struct gimplify_omp_ctx *ctx, { poly_offset_int coffset; poly_int64 cbitpos; - tree base_ind, base_ref, tree_coffset; + tree tree_coffset; tree ocd = OMP_CLAUSE_DECL (c); bool openmp = !(region_type & ORT_ACC); @@ -9259,10 +9243,25 @@ build_struct_group (struct gimplify_omp_ctx *ctx, if (TREE_CODE (ocd) == INDIRECT_REF) ocd = TREE_OPERAND (ocd, 0); - tree base = extract_base_bit_offset (ocd, &base_ind, &base_ref, &cbitpos, - &coffset, &tree_coffset, openmp); + tree base = extract_base_bit_offset (ocd, &cbitpos, &coffset, &tree_coffset); + tree sbase; - bool do_map_struct = (base == decl && !tree_coffset); + if (openmp) 
+{ + if (TREE_CODE (base) == INDIRECT_REF + && TREE_CODE (TREE_TYPE (TREE_OPERAND (base, 0))) == REFERENCE_TYPE) + sbase = strip_indirections (base); + else + sbase = base; +} + else +{ + sbase = strip_indirections (base); + + STRIP_NOPS (sbase); +} + + bool do_map_struct = (sbase == decl && !tree_coffset); /* Here, DECL is usually a DECL_P, unless we have chained indirect member accesses, e.g. mystruct->a->b. In that case it'll be the "mystruct->a" @@ -9322,19 +9321,12 @@ build_struct_group (struct gimplify_omp_ctx *ctx, OMP_CLAUSE_SET_MAP_KIND (l, k); - if (!openmp && base_ind) - OMP_CLAUSE_DECL (l) = unshare_expr (base_ind); - else if (base_ref) - OMP_CLAUSE_DECL (l) = unshare_expr (base_ref); - else - { - OMP_CLAUSE_DECL (l) = unshare_expr (decl); - if (openmp - && !DECL_P (OMP_CLAUSE_DECL (l)) -
[PATCH 06/16] OpenMP 5.0: Clause ordering for OpenMP 5.0 (topological sorting by base pointer)
This patch reimplements the omp_target_reorder_clauses function in anticipation of supporting "deeper" struct mappings (that is, with several structure dereference operators, or similar). The idea is that in place of the (possibly quadratic) algorithm in omp_target_reorder_clauses that greedily moves clauses containing addresses that are subexpressions of other addresses before those other addresses, we employ a topological sort algorithm to calculate a proper order for map clauses. This should run in linear time, and hopefully handles degenerate cases where multiple "levels" of indirect accesses are present on a given directive. The new method also takes care to keep clause groups together, addressing the concerns raised in: https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570501.html To figure out if some given clause depends on a base pointer in another clause, we strip off the outer layers of the address expression, and check (via a tree_operand_hash hash table we have built) if the result is a "base pointer" as defined in OpenMP 5.0 (1.2.6 Data Terminology). There are some subtleties involved, however: - We must treat MEM_REF with zero offset the same as INDIRECT_REF. This should probably be fixed in the front ends instead so we always use a canonical form (probably INDIRECT_REF). The following patch shows one instance of the problem, but there may be others: https://gcc.gnu.org/pipermail/gcc-patches/2021-May/571382.html - Mapping a whole struct implies mapping each of that struct's elements, which may be base pointers. Because those base pointers aren't necessarily explicitly referenced in the directive in question, we treat the whole-struct mapping as a dependency instead. This version of the patch fixes a bug in omp_reorder_mapping_groups, relative to the last version posted. OK? Thanks, Julian 2021-11-23 Julian Brown gcc/ * gimplify.c (is_or_contains_p, omp_target_reorder_clauses): Delete functions. (omp_tsort_mark): Add enum. (omp_mapping_group): Add struct. (debug_mapping_group, omp_get_base_pointer, omp_get_attachment, omp_group_last, omp_gather_mapping_groups, omp_group_base, omp_index_mapping_groups, omp_containing_struct, omp_tsort_mapping_groups_1, omp_tsort_mapping_groups, omp_segregate_mapping_groups, omp_reorder_mapping_groups): New functions. (gimplify_scan_omp_clauses): Call above functions instead of omp_target_reorder_clauses, unless we've seen an error. * omp-low.c (scan_sharing_clauses): Avoid strict test if we haven't sorted mapping groups. gcc/testsuite/ * g++.dg/gomp/target-lambda-1.C: Adjust expected output. * g++.dg/gomp/target-this-3.C: Likewise. * g++.dg/gomp/target-this-4.C: Likewise. --- gcc/gimplify.c | 807 +++- gcc/omp-low.c | 7 +- gcc/testsuite/g++.dg/gomp/target-lambda-1.C | 6 +- gcc/testsuite/g++.dg/gomp/target-this-3.C | 4 +- gcc/testsuite/g++.dg/gomp/target-this-4.C | 4 +- 5 files changed, 791 insertions(+), 37 deletions(-) diff --git a/gcc/gimplify.c b/gcc/gimplify.c index 73b839daa09..6778fb25e45 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -8692,29 +8692,6 @@ extract_base_bit_offset (tree base, poly_int64 *bitposp, return base; } -/* Returns true if EXPR is or contains (as a sub-component) BASE_PTR. 
*/ - -static bool -is_or_contains_p (tree expr, tree base_ptr) -{ - if ((TREE_CODE (expr) == INDIRECT_REF && TREE_CODE (base_ptr) == MEM_REF) - || (TREE_CODE (expr) == MEM_REF && TREE_CODE (base_ptr) == INDIRECT_REF)) -return operand_equal_p (TREE_OPERAND (expr, 0), - TREE_OPERAND (base_ptr, 0)); - while (!operand_equal_p (expr, base_ptr)) -{ - if (TREE_CODE (base_ptr) == COMPOUND_EXPR) - base_ptr = TREE_OPERAND (base_ptr, 1); - if (TREE_CODE (base_ptr) == COMPONENT_REF - || TREE_CODE (base_ptr) == POINTER_PLUS_EXPR - || TREE_CODE (base_ptr) == SAVE_EXPR) - base_ptr = TREE_OPERAND (base_ptr, 0); - else - break; -} - return operand_equal_p (expr, base_ptr); -} - /* Remove COMPONENT_REFS and indirections from EXPR. */ static tree @@ -8768,6 +8745,7 @@ aggregate_base_p (tree expr) return false; } +#if 0 /* Implement OpenMP 5.x map ordering rules for target directives. There are several rules, and with some level of ambiguity, hopefully we can at least collect the complexity here in one place. */ @@ -8947,6 +8925,761 @@ omp_target_reorder_clauses (tree *list_p) } } } +#endif + + +enum omp_tsort_mark { + UNVISITED, + TEMPORARY, + PERMANENT +}; + +struct omp_mapping_group { + tree *grp_start; + tree grp_end; + omp_tsort_mark mark; + struct omp_mapping_group *sibling; + struct omp_mapping_group *next; +}; + +__attribute__((used)) static void +debug_ma
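For reference, the UNVISITED/TEMPORARY/PERMANENT marks above implement the classic three-state depth-first topological sort; a simplified, self-contained sketch follows (one dependency per node, unlike the real gimplify.c groups):

#include <stdbool.h>

enum mark { UNVISITED, TEMPORARY, PERMANENT };

struct node
{
  struct node *dep;   /* the group this node's base pointer depends on */
  enum mark mark;
};

/* Post-order DFS: dependencies are emitted before their dependents.
   Meeting a TEMPORARY node means the walk re-entered the current DFS
   stack, i.e. a dependency cycle, which must be rejected.  */
static bool
visit (struct node *n, struct node **out, int *pos)
{
  if (n->mark == PERMANENT)
    return true;
  if (n->mark == TEMPORARY)
    return false;               /* cycle detected */
  n->mark = TEMPORARY;
  if (n->dep && !visit (n->dep, out, pos))
    return false;
  n->mark = PERMANENT;
  out[(*pos)++] = n;
  return true;
}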
[PATCH 07/16] Remove omp_target_reorder_clauses
This patch has been split out from the previous one to avoid a confusingly-interleaved diff. The two patches should probably be committed squashed together. 2021-10-01 Julian Brown gcc/ * gimplify.c (omp_target_reorder_clauses): Delete. --- gcc/gimplify.c | 183 - 1 file changed, 183 deletions(-) diff --git a/gcc/gimplify.c b/gcc/gimplify.c index 6778fb25e45..fb923f05314 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -8745,189 +8745,6 @@ aggregate_base_p (tree expr) return false; } -#if 0 -/* Implement OpenMP 5.x map ordering rules for target directives. There are - several rules, and with some level of ambiguity, hopefully we can at least - collect the complexity here in one place. */ - -static void -omp_target_reorder_clauses (tree *list_p) -{ - /* Collect refs to alloc/release/delete maps. */ - auto_vec ard; - tree *cp = list_p; - while (*cp != NULL_TREE) -if (OMP_CLAUSE_CODE (*cp) == OMP_CLAUSE_MAP - && (OMP_CLAUSE_MAP_KIND (*cp) == GOMP_MAP_ALLOC - || OMP_CLAUSE_MAP_KIND (*cp) == GOMP_MAP_RELEASE - || OMP_CLAUSE_MAP_KIND (*cp) == GOMP_MAP_DELETE)) - { - /* Unlink cp and push to ard. */ - tree c = *cp; - tree nc = OMP_CLAUSE_CHAIN (c); - *cp = nc; - ard.safe_push (c); - - /* Any associated pointer type maps should also move along. */ - while (*cp != NULL_TREE - && OMP_CLAUSE_CODE (*cp) == OMP_CLAUSE_MAP - && (OMP_CLAUSE_MAP_KIND (*cp) == GOMP_MAP_FIRSTPRIVATE_REFERENCE - || OMP_CLAUSE_MAP_KIND (*cp) == GOMP_MAP_FIRSTPRIVATE_POINTER - || OMP_CLAUSE_MAP_KIND (*cp) == GOMP_MAP_ATTACH_DETACH - || OMP_CLAUSE_MAP_KIND (*cp) == GOMP_MAP_POINTER - || OMP_CLAUSE_MAP_KIND (*cp) == GOMP_MAP_ALWAYS_POINTER - || OMP_CLAUSE_MAP_KIND (*cp) == GOMP_MAP_TO_PSET)) - { - c = *cp; - nc = OMP_CLAUSE_CHAIN (c); - *cp = nc; - ard.safe_push (c); - } - } -else - cp = &OMP_CLAUSE_CHAIN (*cp); - - /* Link alloc/release/delete maps to the end of list. */ - for (unsigned int i = 0; i < ard.length (); i++) -{ - *cp = ard[i]; - cp = &OMP_CLAUSE_CHAIN (ard[i]); -} - *cp = NULL_TREE; - - /* OpenMP 5.0 requires that pointer variables are mapped before - its use as a base-pointer. */ - auto_vec atf; - for (tree *cp = list_p; *cp; cp = &OMP_CLAUSE_CHAIN (*cp)) -if (OMP_CLAUSE_CODE (*cp) == OMP_CLAUSE_MAP) - { - /* Collect alloc, to, from, to/from clause tree pointers. */ - gomp_map_kind k = OMP_CLAUSE_MAP_KIND (*cp); - if (k == GOMP_MAP_ALLOC - || k == GOMP_MAP_TO - || k == GOMP_MAP_FROM - || k == GOMP_MAP_TOFROM - || k == GOMP_MAP_ALWAYS_TO - || k == GOMP_MAP_ALWAYS_FROM - || k == GOMP_MAP_ALWAYS_TOFROM) - atf.safe_push (cp); - } - - for (unsigned int i = 0; i < atf.length (); i++) -if (atf[i]) - { - tree *cp = atf[i]; - tree decl = OMP_CLAUSE_DECL (*cp); - if (TREE_CODE (decl) == INDIRECT_REF || TREE_CODE (decl) == MEM_REF) - { - tree base_ptr = TREE_OPERAND (decl, 0); - STRIP_TYPE_NOPS (base_ptr); - for (unsigned int j = i + 1; j < atf.length (); j++) - if (atf[j]) - { - tree *cp2 = atf[j]; - tree decl2 = OMP_CLAUSE_DECL (*cp2); - - decl2 = OMP_CLAUSE_DECL (*cp2); - if (is_or_contains_p (decl2, base_ptr)) - { - /* Move *cp2 to before *cp. 
*/ - tree c = *cp2; - *cp2 = OMP_CLAUSE_CHAIN (c); - OMP_CLAUSE_CHAIN (c) = *cp; - *cp = c; - - if (*cp2 != NULL_TREE - && OMP_CLAUSE_CODE (*cp2) == OMP_CLAUSE_MAP - && OMP_CLAUSE_MAP_KIND (*cp2) == GOMP_MAP_ALWAYS_POINTER) - { - tree c2 = *cp2; - *cp2 = OMP_CLAUSE_CHAIN (c2); - OMP_CLAUSE_CHAIN (c2) = OMP_CLAUSE_CHAIN (c); - OMP_CLAUSE_CHAIN (c) = c2; - } - - atf[j] = NULL; - } - } - } - } - - /* For attach_detach map clauses, if there is another map that maps the - attached/detached pointer, make sure that map is ordered before the - attach_detach. */ - atf.truncate (0); - for (tree *cp = list_p; *cp; cp = &OMP_CLAUSE_CHAIN (*cp)) -if (OMP_CLAUSE_CODE (*cp) == OMP_CLAUSE_MAP) - { - /* Collect alloc, to, from, to/from clauses, and - always_pointer/attach_detach clauses. */ - gomp_map_kind k = OMP_CLAUSE_MAP_KIND (*cp); - if (k == GOMP_MAP_ALLOC -
[PATCH 08/16] OpenMP/OpenACC: Hoist struct sibling list handling in gimplification
This patch lifts struct sibling-list handling out of the main loop in gimplify_scan_omp_clauses. The reasons for this are several: first, it means that we can subject created sibling list groups to topological sorting (see previous patch) so base-pointer data dependencies are handled correctly. Secondly, it means that in the first pass gathering up sibling lists from parsed OpenMP/OpenACC clauses, we don't need to worry about gimplifying: that means we can see struct bases & components we need to sort sibling lists properly, even when we're using a non-DECL_P struct base. Gimplification proper still happens in the main loop in gimplify_scan_omp_clauses. Thirdly, because we use more than one pass through the clause list and gather appropriate data, we can tell if we're mapping a whole struct in a different node, and avoid building struct sibling lists for that struct appropriately. Fourthly, we can re-use the node grouping functions from the previous patch, and thus mostly avoid the "prev_list_p" handling in gimplify_scan_omp_clauses that tracks the first node in such groups at present. Some redundant code has been removed and code paths for OpenACC/OpenMP are now shared where appropriate, though OpenACC doesn't do the topological sorting of nodes (yet?). OK? Thanks, Julian 2021-09-29 Julian Brown gcc/ * gimplify.c (gimplify_omp_var_data): Remove GOVD_MAP_HAS_ATTACHMENTS. (extract_base_bit_offset): Remove OFFSETP parameter. (strip_components_and_deref): Extend with POINTER_PLUS_EXPR and COMPOUND_EXPR handling. (aggregate_base_p): Remove. (omp_group_last, omp_group_base): Add GOMP_MAP_STRUCT handling. (build_struct_group): Remove CTX, DECL, PD, COMPONENT_REF_P, FLAGS, STRUCT_SEEN_CLAUSE, PRE_P, CONT parameters. Replace PREV_LIST_P and C parameters with GRP_START_P and GRP_END. Add INNER. Update calls to extract_base_bit_offset. Remove gimplification of clauses for OpenMP. Rework inner struct handling for OpenACC. Don't use context's variables splay tree. (omp_build_struct_sibling_lists): New function, extracted from gimplify_scan_omp_clauses and refactored. (gimplify_scan_omp_clauses): Call above function to handle struct sibling lists. Remove STRUCT_MAP_TO_CLAUSE, STRUCT_SEEN_CLAUSE, STRUCT_DEREF_SET. Rework flag handling, adding decl for struct variables. (gimplify_adjust_omp_clauses_1): Remove GOVD_MAP_HAS_ATTACHMENTS handling, unused now. gcc/testsuite/ * g++.dg/goacc/member-array-acc.C: Update expected output. * g++.dg/gomp/target-3.C: Likewise. * g++.dg/gomp/target-lambda-1.C: Likewise. * g++.dg/gomp/target-this-2.C: Likewise. * g++.dg/gomp/target-this-4.C: Likewise. --- gcc/gimplify.c| 943 -- gcc/testsuite/g++.dg/goacc/member-array-acc.C | 2 +- gcc/testsuite/g++.dg/gomp/target-3.C | 4 +- gcc/testsuite/g++.dg/gomp/target-lambda-1.C | 2 +- gcc/testsuite/g++.dg/gomp/target-this-2.C | 2 +- gcc/testsuite/g++.dg/gomp/target-this-4.C | 4 +- 6 files changed, 410 insertions(+), 547 deletions(-) diff --git a/gcc/gimplify.c b/gcc/gimplify.c index fb923f05314..56f0aaaf979 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -125,10 +125,6 @@ enum gimplify_omp_var_data /* Flag for GOVD_REDUCTION: inscan seen in {in,ex}clusive clause. */ GOVD_REDUCTION_INSCAN = 0x200, - /* Flag for GOVD_MAP: (struct) vars that have pointer attachments for - fields. */ - GOVD_MAP_HAS_ATTACHMENTS = 0x400, - /* Flag for GOVD_FIRSTPRIVATE: OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT. 
*/ GOVD_FIRSTPRIVATE_IMPLICIT = 0x800, @@ -8659,7 +8655,7 @@ build_struct_comp_nodes (enum tree_code code, tree grp_start, tree grp_end, static tree extract_base_bit_offset (tree base, poly_int64 *bitposp, -poly_offset_int *poffsetp, tree *offsetp) +poly_offset_int *poffsetp) { tree offset; poly_int64 bitsize, bitpos; @@ -8687,7 +8683,6 @@ extract_base_bit_offset (tree base, poly_int64 *bitposp, *bitposp = bitpos; *poffsetp = poffset; - *offsetp = offset; return base; } @@ -8700,8 +8695,15 @@ strip_components_and_deref (tree expr) while (TREE_CODE (expr) == COMPONENT_REF || TREE_CODE (expr) == INDIRECT_REF || (TREE_CODE (expr) == MEM_REF -&& integer_zerop (TREE_OPERAND (expr, 1 -expr = TREE_OPERAND (expr, 0); +&& integer_zerop (TREE_OPERAND (expr, 1))) +|| TREE_CODE (expr) == POINTER_PLUS_EXPR +|| TREE_CODE (expr) == COMPOUND_EXPR) + if (TREE_CODE (expr) == COMPOUND_EXPR) + expr = TREE_OPERAND (expr, 1); + else + expr = TREE_OPERAND (expr, 0); + + STRIP_NOPS (expr); return expr; } @@ -8717,34 +8719,6 @@ strip_indirections (tree expr) return expr; } -/* Re
[PATCH 09/16] OpenMP: Allow array ref components for C & C++
This patch fixes parsing for struct components that are array references in OMP clauses in both the C and C++ front ends. OK? Thanks, Julian 2021-09-29 Julian Brown gcc/c/ * c-typeck.c (c_finish_omp_clauses): Allow ARRAY_REF components. gcc/cp/ * semantics.c (finish_omp_clauses): Allow ARRAY_REF components. --- gcc/c/c-typeck.c | 3 ++- gcc/cp/semantics.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c index ee6362d4274..4d156f6d3ec 100644 --- a/gcc/c/c-typeck.c +++ b/gcc/c/c-typeck.c @@ -14918,7 +14918,8 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort) { t = TREE_OPERAND (t, 0); if (TREE_CODE (t) == MEM_REF - || TREE_CODE (t) == INDIRECT_REF) + || TREE_CODE (t) == INDIRECT_REF + || TREE_CODE (t) == ARRAY_REF) { t = TREE_OPERAND (t, 0); STRIP_NOPS (t); diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c index 50f95751d1c..e882c302f31 100644 --- a/gcc/cp/semantics.c +++ b/gcc/cp/semantics.c @@ -7910,7 +7910,8 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort) if (REFERENCE_REF_P (t)) t = TREE_OPERAND (t, 0); if (TREE_CODE (t) == MEM_REF - || TREE_CODE (t) == INDIRECT_REF) + || TREE_CODE (t) == INDIRECT_REF + || TREE_CODE (t) == ARRAY_REF) { t = TREE_OPERAND (t, 0); STRIP_NOPS (t); -- 2.29.2
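A hedged sketch (invented types) of the clause shape this permits, where an ARRAY_REF appears among the components of the mapped expression:

struct t { int *p; };
struct u { struct t arr[8]; };

void
g (struct u *v, int n)
{
  /* The component path v->arr[2].p contains an array reference, which
     the front ends previously rejected when checking map clauses.  */
#pragma omp target map(v->arr[2].p[:n])
  for (int i = 0; i < n; i++)
    v->arr[2].p[i] = i;
}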
Re: [committed 03/12] d: Insert null terminator in obstack buffers
On 7/30/21 13:01, Iain Buclaw via Gcc-patches wrote: > Covers cases where functions that handle the extracted strings ignore > the explicit length. This isn't something that's known to happen in the > current front-end, but the self-hosted front-end has been observed to do > this in its conversions between D and C-style strings. Can you please cherry-pick this to the gcc-11 branch, as I see nasty output when using --verbose: $ gcc /home/marxin/Programming/gcc/gcc/testsuite/gdc.dg/attr_optimize4.d -c --verbose ... predefs GNU D_Version2 LittleEndian GNU_DWARF2_Exceptions GNU_StackGrowsDown GNU_InlineAsm D_LP64 assert D_ModuleInfo D_Exceptions D_TypeInfo all X86_64 D_HardFloat Posix linux CRuntime_Glibc CppRuntime_Gcc��... Thanks, Martin
[PATCH 10/16] OpenMP: Fix non-zero attach/detach bias for struct dereferences
This patch fixes attach/detach operations for OpenMP that have a non-zero bias: these can occur if we have a mapping such as: #pragma omp target map(mystruct->a.b[idx].c[:arrsz]) i.e. where there is an offset between the attachment point ("mystruct" here) and the pointed-to data. (The "b" and "c" members would be array types here, not pointers themselves). In this example the difference (thus bias encoded in the attach/detach node) will be something like: (uintptr_t) &mystruct->a.b[idx].c[0] - (uintptr_t) &mystruct->a OK? Thanks, Julian 2021-09-29 Julian Brown gcc/c-family/ * c-common.h (c_omp_decompose_attachable_address): Add prototype. * c-omp.c (c_omp_decompose_attachable_address): New function. gcc/c/ * c-typeck.c (handle_omp_array_sections): Handle attach/detach for struct dereferences with non-zero bias. gcc/cp/ * semantics.c (handle_omp_array_section): Handle attach/detach for struct dereferences with non-zero bias. libgomp/ * testsuite/libgomp.c++/baseptrs-3.C: Add test (XFAILed for now). * testsuite/libgomp.c-c++-common/baseptrs-1.c: Add test. * testsuite/libgomp.c-c++-common/baseptrs-2.c: Add test. --- gcc/c-family/c-common.h | 1 + gcc/c-family/c-omp.c | 42 gcc/c/c-typeck.c | 12 +- gcc/cp/semantics.c| 14 +- libgomp/testsuite/libgomp.c++/baseptrs-3.C| 182 ++ .../libgomp.c-c++-common/baseptrs-1.c | 50 + .../libgomp.c-c++-common/baseptrs-2.c | 70 +++ 7 files changed, 364 insertions(+), 7 deletions(-) create mode 100644 libgomp/testsuite/libgomp.c++/baseptrs-3.C create mode 100644 libgomp/testsuite/libgomp.c-c++-common/baseptrs-1.c create mode 100644 libgomp/testsuite/libgomp.c-c++-common/baseptrs-2.c diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h index d5dad99ff97..dd103d8eecd 100644 --- a/gcc/c-family/c-common.h +++ b/gcc/c-family/c-common.h @@ -1251,6 +1251,7 @@ extern tree c_omp_check_context_selector (location_t, tree); extern void c_omp_mark_declare_variant (location_t, tree, tree); extern const char *c_omp_map_clause_name (tree, bool); extern void c_omp_adjust_map_clauses (tree, bool); +extern tree c_omp_decompose_attachable_address (tree t, tree *virtbase); enum c_omp_directive_kind { C_OMP_DIR_STANDALONE, diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c index 3f84fd1b5cb..a90696fe706 100644 --- a/gcc/c-family/c-omp.c +++ b/gcc/c-family/c-omp.c @@ -3113,6 +3113,48 @@ c_omp_adjust_map_clauses (tree clauses, bool is_target) } } +tree +c_omp_decompose_attachable_address (tree t, tree *virtbase) +{ + *virtbase = t; + + /* It's already a pointer. Just use that. */ + if (POINTER_TYPE_P (TREE_TYPE (t))) +return NULL_TREE; + + /* Otherwise, look for a base pointer deeper within the expression. */ + + while (TREE_CODE (t) == COMPONENT_REF +&& (TREE_CODE (TREE_OPERAND (t, 0)) == COMPONENT_REF +|| TREE_CODE (TREE_OPERAND (t, 0)) == ARRAY_REF)) +{ + t = TREE_OPERAND (t, 0); + while (TREE_CODE (t) == ARRAY_REF) + t = TREE_OPERAND (t, 0); +} + + + *virtbase = t; + + if (TREE_CODE (t) != COMPONENT_REF) +return NULL_TREE; + + t = TREE_OPERAND (t, 0); + + tree attach_pt = NULL_TREE; + + if ((TREE_CODE (t) == INDIRECT_REF + || TREE_CODE (t) == MEM_REF) + && TREE_CODE (TREE_TYPE (TREE_OPERAND (t, 0))) == POINTER_TYPE) +{ + attach_pt = TREE_OPERAND (t, 0); + if (TREE_CODE (attach_pt) == POINTER_PLUS_EXPR) + attach_pt = TREE_OPERAND (attach_pt, 0); +} + + return attach_pt; +} + static const struct c_omp_directive omp_directives[] = { /* Keep this alphabetically sorted by the first word. Non-null second/third if any should precede null ones. 
*/ diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c index 4d156f6d3ec..cfac7d0a2b5 100644 --- a/gcc/c/c-typeck.c +++ b/gcc/c/c-typeck.c @@ -13799,9 +13799,15 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort) if (size) size = c_fully_fold (size, false, NULL); OMP_CLAUSE_SIZE (c) = size; + tree virtbase = t; + tree attach_pt + = ((ort != C_ORT_ACC) + ? c_omp_decompose_attachable_address (t, &virtbase) + : NULL_TREE); if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_MAP || (TREE_CODE (t) == COMPONENT_REF - && TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE)) + && TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE + && !attach_pt)) return false; gcc_assert (OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_FORCE_DEVICEPTR); switch (OMP_CLAUSE_MAP_KIND (c)) @@ -13834,10 +13840,10 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort) if (OMP_CLAUSE_MAP_KIND (c2) != GOMP_MAP_FIRSTPRIVATE_POINTER && !c_mark_addressable (t)) return fa
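To make the bias computation concrete, a hedged sketch with invented struct types, mirroring the map(mystruct->a.b[idx].c[:arrsz]) example above:

#include <stdint.h>

struct s2 { int c[8]; };
struct s1 { struct s2 b[4]; };
struct s0 { struct s1 a; };

/* The attach node records the distance between the start of the mapped
   slice and the data at the attachment point, so the runtime can later
   translate the host pointer to the right device address.  */
static uintptr_t
attach_bias (struct s0 *mystruct, int idx)
{
  return (uintptr_t) &mystruct->a.b[idx].c[0]
         - (uintptr_t) &mystruct->a;
}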
[PATCH 12/16] OpenACC: Make deep-copy-arrayofstruct.c a libgomp/runtime test
I noticed that the test in question now compiles properly, and in fact runs properly too. Thus it's more useful as a runtime test than a passing compilation test that otherwise doesn't do much. This patch moves it to libgomp. OK? Thanks, Julian 2021-10-11 Julian Brown gcc/testsuite/ * c-c++-common/goacc/deep-copy-arrayofstruct.c: Move test from here. libgomp/ * testsuite/libgomp.oacc-c-c++-common/deep-copy-arrayofstruct.c: Move test to here. --- .../libgomp.oacc-c-c++-common}/deep-copy-arrayofstruct.c| 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) rename {gcc/testsuite/c-c++-common/goacc => libgomp/testsuite/libgomp.oacc-c-c++-common}/deep-copy-arrayofstruct.c (98%) diff --git a/gcc/testsuite/c-c++-common/goacc/deep-copy-arrayofstruct.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-arrayofstruct.c similarity index 98% rename from gcc/testsuite/c-c++-common/goacc/deep-copy-arrayofstruct.c rename to libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-arrayofstruct.c index 4247607b61c..a11c64749cc 100644 --- a/gcc/testsuite/c-c++-common/goacc/deep-copy-arrayofstruct.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-arrayofstruct.c @@ -1,4 +1,4 @@ -/* { dg-do compile } */ +/* { dg-do run } */ #include #include -- 2.29.2
[PATCH 11/16] OpenMP: Handle reference-typed struct members
This patch fixes the baseptrs-3.C test case introduced in the patch: https://gcc.gnu.org/pipermail/gcc-patches/2021-October/580729.html The problematic case concerns OpenMP mapping clauses containing struct members of reference type, e.g. "mystruct.myref.myptr[:N]". To be able to access the array slice through the reference in the middle, we need to perform an attach action for that reference, since it is represented internally as a pointer. I don't think the spec allows for this case explicitly. The closest clause is (OpenMP 5.0, "2.19.7.1 map Clause"): "If the type of a list item is a reference to a type T then the reference in the device data environment is initialized to refer to the object in the device data environment that corresponds to the object referenced by the list item. If mapping occurs, it occurs as though the object were mapped through a pointer with an array section of type T and length one." The patch as is allows the mapping to work with just "mystruct.myref.myptr[:N]", without an explicit "mystruct.myref" mapping also (because, would that refer to the hidden pointer used by the reference, or the automatically-dereferenced data itself?). An attach/detach operation is thus synthesised for the reference. OK? Thanks, Julian 2021-10-11 Julian Brown gcc/cp/ * semantics.c (finish_omp_clauses): Handle reference-typed members. gcc/ * gimplify.c (build_struct_group): Arrange for attach/detach nodes to be created for reference-typed struct members for OpenMP. Only create firstprivate_pointer/firstprivate_reference nodes for innermost struct accesses, those with an optionally-indirected DECL_P base. (omp_build_struct_sibling_lists): Handle two-element chain for inner struct component returned from build_struct_group. libgomp/ * testsuite/libgomp.c++/baseptrs-3.C: Remove XFAILs and extend test. --- gcc/cp/semantics.c | 4 + gcc/gimplify.c | 56 +-- libgomp/testsuite/libgomp.c++/baseptrs-3.C | 109 +++-- 3 files changed, 154 insertions(+), 15 deletions(-) diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c index 068c0c69e58..6d30a9ed97d 100644 --- a/gcc/cp/semantics.c +++ b/gcc/cp/semantics.c @@ -7923,6 +7923,8 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort) STRIP_NOPS (t); if (TREE_CODE (t) == POINTER_PLUS_EXPR) t = TREE_OPERAND (t, 0); + if (REFERENCE_REF_P (t)) + t = TREE_OPERAND (t, 0); } } while (TREE_CODE (t) == COMPONENT_REF); @@ -8021,6 +8023,8 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort) { t = TREE_OPERAND (TREE_OPERAND (t, 0), 0); indir_component_ref_p = true; + if (REFERENCE_REF_P (t)) + t = TREE_OPERAND (t, 0); STRIP_NOPS (t); if (TREE_CODE (t) == POINTER_PLUS_EXPR) t = TREE_OPERAND (t, 0); diff --git a/gcc/gimplify.c b/gcc/gimplify.c index 56f0aaaf979..8f07da8a991 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -9802,7 +9802,10 @@ build_struct_group (enum omp_region_type region_type, enum tree_code code, /* FIXME: If we're not mapping the base pointer in some other clause on this directive, I think we want to create ALLOC/RELEASE here -- i.e. not early-exit. 
*/ - if (openmp && attach_detach) + if (openmp + && attach_detach + && !(TREE_CODE (TREE_TYPE (ocd)) == REFERENCE_TYPE + && TREE_CODE (TREE_TYPE (TREE_TYPE (ocd))) != POINTER_TYPE)) return NULL; if (!struct_map_to_clause || struct_map_to_clause->get (base) == NULL) @@ -9851,9 +9854,32 @@ build_struct_group (enum omp_region_type region_type, enum tree_code code, tree noind = strip_indirections (base); - if (!openmp + if (openmp + && TREE_CODE (TREE_TYPE (noind)) == REFERENCE_TYPE && (region_type & ORT_TARGET) && TREE_CODE (noind) == COMPONENT_REF) + { + tree c2 = build_omp_clause (OMP_CLAUSE_LOCATION (grp_end), + OMP_CLAUSE_MAP); + OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_TO); + OMP_CLAUSE_DECL (c2) = unshare_expr (base); + OMP_CLAUSE_SIZE (c2) = TYPE_SIZE_UNIT (TREE_TYPE (noind)); + + tree c3 = build_omp_clause (OMP_CLAUSE_LOCATION (grp_end), + OMP_CLAUSE_MAP); + OMP_CLAUSE_SET_MAP_KIND (c3, GOMP_MAP_ATTACH_DETACH); + OMP_CLAUSE_DECL (c3) = unshare_expr (noind); + OMP_CLAUSE_SIZE (c3) = size_zero_node; + + OMP_CLAUSE_CHAIN (c2) = c3; + OMP_CLAUSE_CHAIN (c3) = NULL_TREE; + + *inner = c2; + return NULL; + } + e
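For reference, here is a minimal sketch of the construct this patch enables (names are illustrative, not taken from baseptrs-3.C):

struct T { int *myptr; };
struct S { T &myref; };

void f (S &mystruct, int N)
{
  /* An attach operation for the reference "myref" is synthesised so the
     slice can be reached through the reference's hidden pointer.  */
  #pragma omp target map(mystruct.myref.myptr[:N])
  for (int i = 0; i < N; i++)
    mystruct.myref.myptr[i]++;
}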
[PATCH 13/16] Add debug_omp_expr
The C and C++ front-ends use a TREE_LIST as a 3-tuple representing an OpenMP array section, which tends to crash debug_generic_expr if one wants to print such an expression in the debugger. This little helper function works around that. We might want to adjust the representation of array sections to use the soon-to-be-introduced OMP_ARRAY_SECTION tree code throughout instead, at which point this patch will no longer be necessary. OK? Thanks, Julian 2021-11-15 Julian Brown gcc/ * tree-pretty-print.c (print_omp_expr, debug_omp_expr): New functions. * tree-pretty-print.h (debug_omp_expr): Add prototype. --- gcc/tree-pretty-print.c | 31 +++ gcc/tree-pretty-print.h | 1 + 2 files changed, 32 insertions(+) diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c index a81ba401ef9..13b64fd52e1 100644 --- a/gcc/tree-pretty-print.c +++ b/gcc/tree-pretty-print.c @@ -103,6 +103,37 @@ debug_generic_stmt (tree t) fprintf (stderr, "\n"); } +static void +print_omp_expr (tree t) +{ + if (TREE_CODE (t) == TREE_LIST) +{ + tree low = TREE_PURPOSE (t); + tree len = TREE_VALUE (t); + tree base = TREE_CHAIN (t); + if (TREE_CODE (base) == TREE_LIST) + print_omp_expr (base); + else + print_generic_expr (stderr, base, TDF_VOPS|TDF_MEMSYMS); + fprintf (stderr, "["); + if (low) + print_generic_expr (stderr, low, TDF_VOPS|TDF_MEMSYMS); + fprintf (stderr, ":"); + if (len) + print_generic_expr (stderr, len, TDF_VOPS|TDF_MEMSYMS); + fprintf (stderr, "]"); +} + else +print_generic_expr (stderr, t, TDF_VOPS|TDF_MEMSYMS); +} + +DEBUG_FUNCTION void +debug_omp_expr (tree t) +{ + print_omp_expr (t); + fprintf (stderr, "\n"); +} + /* Debugging function to print out a chain of trees . */ DEBUG_FUNCTION void diff --git a/gcc/tree-pretty-print.h b/gcc/tree-pretty-print.h index dacd256302b..bc910f9a1b1 100644 --- a/gcc/tree-pretty-print.h +++ b/gcc/tree-pretty-print.h @@ -33,6 +33,7 @@ along with GCC; see the file COPYING3. If not see extern void debug_generic_expr (tree); extern void debug_generic_stmt (tree); +extern void debug_omp_expr (tree); extern void debug_tree_chain (tree); extern void print_generic_decl (FILE *, tree, dump_flags_t); extern void print_generic_stmt (FILE *, tree, dump_flags_t = TDF_NONE); -- 2.29.2
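As a usage sketch (assumed, not part of the patch): an array section such as arr[1:n] arrives as a TREE_LIST with the low bound in TREE_PURPOSE, the length in TREE_VALUE and the base in TREE_CHAIN, which is exactly what print_omp_expr walks:

/* Sketch: LOW_BOUND, LENGTH and BASE_EXPR are existing trees; the front
   ends build the 3-tuple for arr[1:n] like this.  */
tree section = tree_cons (low_bound, length, base_expr);
/* Then, from the debugger:
     (gdb) call debug_omp_expr (section)
   prints "arr[1:n]" instead of crashing debug_generic_expr.  */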
[PATCH 14/16] OpenMP: Add inspector class to unify mapped address analysis
Several places in the C and C++ front-ends dig through OpenMP addresses from "map" clauses (etc.) in order to determine whether they are component accesses that need "attach" operations, check duplicate mapping clauses, and so on. When we're extending support for more kinds of lvalues in map clauses, it seems helpful to bring these all into one place in order to keep all the analyses in sync, and to make it easier to reason about which kinds of expressions are supported. This patch introduces an "address inspector" class for that purpose, and adjusts the C and C++ front-ends to use it. (The adjacent "c_omp_decompose_attachable_address" function could also be moved into the address inspector class, perhaps. That's not been done yet.) OK? Thanks, Julian 2021-11-15 Julian Brown gcc/c-family/ * c-common.h (c_omp_address_inspector): New class. * c-omp.c (c_omp_address_inspector::init, c_omp_address_inspector::analyze_components, c_omp_address_inspector::map_supported_p, c_omp_address_inspector::mappable_type): New methods. gcc/c/ * c-typeck.c (handle_omp_array_sections_1, c_finish_omp_clauses): Use c_omp_address_inspector class. gcc/cp/ * semantics.c (cp_omp_address_inspector): New class, derived from c_omp_address_inspector. (handle_omp_array_sections_1): Use cp_omp_address_inspector class to analyze OpenMP map clause expressions. Support POINTER_PLUS_EXPR. (finish_omp_clauses): Likewise. Support some additional kinds of lvalues in map clauses. gcc/testsuite/ * g++.dg/gomp/unmappable-component-1.C: New test. --- gcc/c-family/c-common.h | 44 +++ gcc/c-family/c-omp.c | 147 ++ gcc/c/c-typeck.c | 198 +++--- gcc/cp/semantics.c| 252 ++ .../g++.dg/gomp/unmappable-component-1.C | 21 ++ 5 files changed, 338 insertions(+), 324 deletions(-) create mode 100644 gcc/testsuite/g++.dg/gomp/unmappable-component-1.C diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h index dd103d8eecd..05d479e4d2f 100644 --- a/gcc/c-family/c-common.h +++ b/gcc/c-family/c-common.h @@ -1253,6 +1253,50 @@ extern const char *c_omp_map_clause_name (tree, bool); extern void c_omp_adjust_map_clauses (tree, bool); extern tree c_omp_decompose_attachable_address (tree t, tree *virtbase); +class c_omp_address_inspector +{ + tree clause; + tree orig; + tree deref_toplevel; + tree outer_virtual_base; + tree root_term; + bool component_access; + bool indirections; + int map_supported; + +protected: + virtual bool reference_ref_p (tree) { return false; } + virtual bool processing_template_decl_p () { return false; } + virtual bool mappable_type (tree t); + virtual void emit_unmappable_type_notes (tree) { } + +public: + c_omp_address_inspector (tree c, tree t) +: clause (c), orig (t), deref_toplevel (NULL_TREE), + outer_virtual_base (NULL_TREE), root_term (NULL_TREE), + component_access (false), indirections (false), map_supported (-1) + { } + + ~c_omp_address_inspector () {} + + virtual void init (); + + tree analyze_components (bool); + + tree get_deref_toplevel () { return deref_toplevel; } + tree get_outer_virtual_base () { return outer_virtual_base; } + tree get_root_term () { gcc_assert (root_term); return root_term; } + bool component_access_p () { return component_access; } + + bool indir_component_ref_p () +{ + gcc_assert (!component_access || root_term != NULL_TREE); + return component_access && indirections; +} + + bool map_supported_p (); +}; + enum c_omp_directive_kind { C_OMP_DIR_STANDALONE, C_OMP_DIR_CONSTRUCT, diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c index a90696fe706..5b2fbf6809b 100644 --- 
a/gcc/c-family/c-omp.c +++ b/gcc/c-family/c-omp.c @@ -3113,6 +3113,153 @@ c_omp_adjust_map_clauses (tree clauses, bool is_target) } } +/* This could just be done in the constructor, but we need to call the + subclass's version of reference_ref_p, etc. */ + +void +c_omp_address_inspector::init () +{ + tree t = orig; + + gcc_assert (TREE_CODE (t) != ARRAY_REF); + + /* We may have a reference-typed component access at the outermost level + that has had convert_from_reference called on it. Look through that + access. */ + if (reference_ref_p (t) + && TREE_CODE (TREE_OPERAND (t, 0)) == COMPONENT_REF) +{ + t = TREE_OPERAND (t, 0); + deref_toplevel = t; +} + else +deref_toplevel = t; + + /* Strip off expression nodes that may enclose a COMPONENT_REF. Look through + references, but not indirections through pointers. */ + while (1) +{ + if (TREE_CODE (t) == COMPOUND_EXPR) + { + t = TREE_OPERAND (t, 1); + STRIP_NOPS (t); + } + else if (TREE_CODE (t) == POINTER_PLUS_EXPR + || TREE_CODE (t
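A hypothetical caller-side sketch (not from the patch) of how a front end might drive the inspector on a map-clause operand:

c_omp_address_inspector ai (c, OMP_CLAUSE_DECL (c));
ai.init ();
if (ai.component_access_p () && ai.map_supported_p ())
  {
    /* The root term is the (possibly indirected) base of the access;
       indir_component_ref_p () says whether a pointer was traversed.  */
    tree root = ai.get_root_term ();
  }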
[PATCH 16/16] OpenMP: lvalue parsing for map clauses (C)
This patch adds support for parsing general lvalues for OpenMP "map" clauses to the C front-end, similar to the previous patch for C++. This version of the patch fixes several omissions regarding non-DECL_P root terms in map clauses (i.e. "*foo" in "(*foo)->ptr->arr[:N]") -- similar to the cp/semantics.c changes in the previous patch -- and adds a couple of new tests. OK? Thanks, Julian 2021-11-24 Julian Brown gcc/c/ * c-parser.c (c_parser_postfix_expression_after_primary): Add support for OpenMP array section parsing. (c_parser_omp_variable_list): Change ALLOW_DEREF parameter to MAP_LVALUE. Support parsing of general lvalues in "map" clauses. (c_parser_omp_var_list_parens): Change ALLOW_DEREF parameter to MAP_LVALUE. Update call to c_parser_omp_variable_list. (c_parser_oacc_data_clause, c_parser_omp_clause_to, c_parser_omp_clause_from): Update calls to c_parser_omp_var_list_parens. * c-tree.h (c_omp_array_section_p): Add extern declaration. * c-typeck.c (c_omp_array_section_p): Add flag. (mark_exp_read): Support OMP_ARRAY_SECTION. (handle_omp_array_sections_1): Handle more kinds of expressions. (handle_omp_array_sections): Handle non-DECL_P attachment points. (c_finish_omp_clauses): Check for supported expression types. Support non-DECL_P root term for map clauses. gcc/testsuite/ * c-c++-common/gomp/map-1.c: Adjust expected output. * c-c++-common/gomp/map-6.c: Likewise. libgomp/ * testsuite/libgomp.c-c++-common/ind-base-4.c: New test. * testsuite/libgomp.c-c++-common/unary-ptr-1.c: New test. --- gcc/c/c-parser.c | 150 +++--- gcc/c/c-tree.h| 1 + gcc/c/c-typeck.c | 45 +- gcc/testsuite/c-c++-common/gomp/map-1.c | 3 +- gcc/testsuite/c-c++-common/gomp/map-6.c | 2 + .../libgomp.c-c++-common/ind-base-4.c | 50 ++ .../libgomp.c-c++-common/unary-ptr-1.c| 16 ++ 7 files changed, 243 insertions(+), 24 deletions(-) create mode 100644 libgomp/testsuite/libgomp.c-c++-common/ind-base-4.c create mode 100644 libgomp/testsuite/libgomp.c-c++-common/unary-ptr-1.c diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c index 322f30c90b4..702a0b7d8a9 100644 --- a/gcc/c/c-parser.c +++ b/gcc/c/c-parser.c @@ -10460,7 +10460,7 @@ c_parser_postfix_expression_after_primary (c_parser *parser, struct c_expr expr) { struct c_expr orig_expr; - tree ident, idx; + tree ident, idx, len; location_t sizeof_arg_loc[3], comp_loc; tree sizeof_arg[3]; unsigned int literal_zero_mask; @@ -10479,15 +10479,44 @@ c_parser_postfix_expression_after_primary (c_parser *parser, case CPP_OPEN_SQUARE: /* Array reference. */ c_parser_consume_token (parser); - idx = c_parser_expression (parser).value; - c_parser_skip_until_found (parser, CPP_CLOSE_SQUARE, -"expected %<]%>"); - start = expr.get_start (); - finish = parser->tokens_buf[0].location; - expr.value = build_array_ref (op_loc, expr.value, idx); - set_c_expr_source_range (&expr, start, finish); - expr.original_code = ERROR_MARK; - expr.original_type = NULL; + idx = len = NULL_TREE; + if (!c_omp_array_section_p + || c_parser_next_token_is_not (parser, CPP_COLON)) + idx = c_parser_expression (parser).value; + + if (c_omp_array_section_p + && c_parser_next_token_is (parser, CPP_COLON)) + { + c_parser_consume_token (parser); + if (c_parser_next_token_is_not (parser, CPP_CLOSE_SQUARE)) + len = c_parser_expression (parser).value; + + c_parser_skip_until_found (parser, CPP_CLOSE_SQUARE, +"expected %<]%>"); + +/* NOTE: We are reusing using the type of the whole array as the + type of the array section here, which isn't necessarily + entirely correct. Might need revisiting. 
*/ + start = expr.get_start (); + finish = parser->tokens_buf[0].location; + expr.value = build3_loc (op_loc, OMP_ARRAY_SECTION, + TREE_TYPE (expr.value), expr.value, + idx, len); + set_c_expr_source_range (&expr, start, finish); + expr.original_code = ERROR_MARK; + expr.original_type = NULL; + } + else + { + c_parser_skip_until_found (parser, CPP_CLOSE_SQUARE, +"expected %<]%>"); + start = expr.get_start (); + finish = parser->tokens_buf[0].location; + expr.value = build_array_ref (op_
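For illustration, a sketch (not one of the new tests) of a clause the extended parser now accepts, including the non-DECL_P base case mentioned above:

struct inner { int arr[100]; };
struct outer { struct inner *ptr; };

void f (struct outer **foo, int N)
{
  /* The base "*foo" is not a DECL_P: the whole access is now parsed as a
     general lvalue followed by an OMP_ARRAY_SECTION.  */
  #pragma omp target map((*foo)->ptr->arr[:N])
  for (int i = 0; i < N; i++)
    (*foo)->ptr->arr[i]++;
}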
[PATCH 15/16] OpenMP: lvalue parsing for map clauses (C++)
This patch changes parsing for OpenMP map clauses in C++ to use the generic expression parser, hence adds support for parsing general lvalues (as required by OpenMP 5.0+). So far only a few new types of expression are actually supported throughout compilation (including everything in the testsuite of course, and newly-added tests), and we attempt to reject unsupported expressions in order to avoid surprises for the user. This version of the patch adds a number of additional tests for various expressions accepted as lvalues in C++, many of which are currently rejected as not yet supported -- and really only a handful of the rejected cases would be plausible in the context of an OpenMP "map" clause anyway, IMO. OK? Thanks, Julian 2021-11-24 Julian Brown gcc/c-family/ * c-omp.c (c_omp_decompose_attachable_address): Handle more types of expressions. gcc/cp/ * error.c (dump_expr): Handle OMP_ARRAY_SECTION. * parser.c (cp_parser_new): Initialize parser->omp_array_section_p. (cp_parser_postfix_open_square_expression): Support OMP_ARRAY_SECTION parsing. (cp_parser_omp_var_list_no_open): Remove ALLOW_DEREF parameter, add MAP_LVALUE in its place. Supported generalised lvalue parsing for map clauses. (cp_parser_omp_var_list): Remove ALLOW_DEREF parameter, add MAP_LVALUE. Pass to cp_parser_omp_var_list_no_open. (cp_parser_oacc_data_clause, cp_parser_omp_all_clauses): Update calls to cp_parser_omp_var_list. * parser.h (cp_parser): Add omp_array_section_p field. * semantics.c (handle_omp_array_sections_1): Handle more types of map expression. (handle_omp_array_section): Handle non-DECL_P attachment points. (finish_omp_clauses): Check for supported types of expression. gcc/ * gimplify.c (build_struct_group): Handle reference-typed component accesses. Fix support for non-DECL_P struct bases. (omp_build_struct_sibling_lists): Support length-two group for synthesized inner struct mapping. * tree-pretty-print.c (dump_generic_node): Support OMP_ARRAY_SECTION. * tree.def (OMP_ARRAY_SECTION): New tree code. gcc/testsuite/ * c-c++-common/gomp/map-6.c: Update expected output. * g++.dg/gomp/pr67522.C: Likewise. * g++.dg/gomp/ind-base-3.C: New test. * g++.dg/gomp/map-assignment-1.C: New test. * g++.dg/gomp/map-inc-1.C: New test. * g++.dg/gomp/map-lvalue-ref-1.C: New test. * g++.dg/gomp/map-ptrmem-1.C: New test. * g++.dg/gomp/map-ptrmem-2.C: New test. * g++.dg/gomp/map-static-cast-lvalue-1.C: New test. * g++.dg/gomp/map-ternary-1.C: New test. * g++.dg/gomp/member-array-2.C: New test. libgomp/ * testsuite/libgomp.c++/ind-base-1.C: New test. * testsuite/libgomp.c++/ind-base-2.C: New test. * testsuite/libgomp.c++/map-comma-1.C: New test. * testsuite/libgomp.c++/map-rvalue-ref-1.C: New test. * testsuite/libgomp.c++/member-array-1.C: New test. * testsuite/libgomp.c++/struct-ref-1.C: New test. 
--- gcc/c-family/c-omp.c | 25 ++- gcc/cp/error.c| 9 + gcc/cp/parser.c | 141 +-- gcc/cp/parser.h | 3 + gcc/cp/semantics.c| 35 +++- gcc/gimplify.c| 37 +++- gcc/testsuite/c-c++-common/gomp/map-6.c | 4 +- gcc/testsuite/g++.dg/gomp/ind-base-3.C| 38 gcc/testsuite/g++.dg/gomp/map-assignment-1.C | 12 ++ gcc/testsuite/g++.dg/gomp/map-inc-1.C | 10 ++ gcc/testsuite/g++.dg/gomp/map-lvalue-ref-1.C | 19 ++ gcc/testsuite/g++.dg/gomp/map-ptrmem-1.C | 36 gcc/testsuite/g++.dg/gomp/map-ptrmem-2.C | 39 + .../g++.dg/gomp/map-static-cast-lvalue-1.C| 17 ++ gcc/testsuite/g++.dg/gomp/map-ternary-1.C | 20 +++ gcc/testsuite/g++.dg/gomp/member-array-2.C| 86 ++ gcc/testsuite/g++.dg/gomp/pr67522.C | 2 +- gcc/tree-pretty-print.c | 14 ++ gcc/tree.def | 3 + libgomp/testsuite/libgomp.c++/ind-base-1.C| 162 ++ libgomp/testsuite/libgomp.c++/ind-base-2.C| 49 ++ libgomp/testsuite/libgomp.c++/map-comma-1.C | 15 ++ .../testsuite/libgomp.c++/map-rvalue-ref-1.C | 22 +++ .../testsuite/libgomp.c++/member-array-1.C| 89 ++ libgomp/testsuite/libgomp.c++/struct-ref-1.C | 97 +++ 25 files changed, 956 insertions(+), 28 deletions(-) create mode 100644 gcc/testsuite/g++.dg/gomp/ind-base-3.C create mode 100644 gcc/testsuite/g++.dg/gomp/map-assignment-1.C create mode 100644 gcc/testsuite/g++.dg/gomp/map-inc-1.C create mode 100644 gcc/testsuite/g++.dg/gomp/map-lvalue-ref-1.C create mode 100644 gcc/testsuite/g++.dg/gomp/map-ptrmem-1.C create
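To give a flavour of what general lvalue parsing means here, a sketch (illustrative; the added tests cover shapes like these, and semantic analysis still decides whether each form is accepted or rejected as unsupported):

struct T { int x; };
void f (T &ref, T *p, T *q, bool cond)
{
  #pragma omp target map(ref.x)              // lvalue reference base
  ref.x++;
  #pragma omp target map((cond ? *p : *q).x) // ternary lvalue
  (cond ? *p : *q).x++;
}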
[PATCH] Remove unreachable returns
This removes unreachable return statements as diagnosed by the -Wunreachable-code patch. Some cases are more obviously an improvement than others - in fact some may get you the idea to replace them with gcc_unreachable () instead, leading to cases of the 'Remove unreachable gcc_unreachable () at the end of functions' patch. Bootstrapped and tested on x86_64-unknown-linux-gnu. OK? Comments? Feel free to approve select cases only. Thanks, Richard. 2021-11-25 Richard Biener * vec.c (qsort_chk): Do not return the void return value from the noreturn qsort_chk_error. * ccmp.c (expand_ccmp_expr_1): Remove unreachable return. * df-scan.c (df_ref_equal_p): Likewise. * dwarf2out.c (is_base_type): Likewise. (add_const_value_attribute): Likewise. * fixed-value.c (fixed_arithmetic): Likewise. * gimple-fold.c (gimple_fold_builtin_fputs): Likewise. * gimple-ssa-strength-reduction.c (stmt_cost): Likewise. * graphite-isl-ast-to-gimple.c (gcc_expression_from_isl_expr_op): Likewise. (gcc_expression_from_isl_expression): Likewise. * ipa-fnsummary.c (will_be_nonconstant_expr_predicate): Likewise. * lto-streamer-in.c (lto_input_mode_table): Likewise. gcc/c-family/ * c-opts.c (c_common_post_options): Remove unreachable return. * c-pragma.c (handle_pragma_target): Likewise. (handle_pragma_optimize): Likewise. gcc/c/ * c-typeck.c (c_tree_equal): Remove unreachable return. * c-parser.c (get_matching_symbol): Likewise. libgomp/ * oacc-plugin.c (GOMP_PLUGIN_acc_default_dim): Remove unreachable return. --- gcc/c-family/c-opts.c | 5 + gcc/c-family/c-pragma.c | 10 ++ gcc/c/c-parser.c| 1 - gcc/c/c-typeck.c| 2 -- gcc/ccmp.c | 2 -- gcc/df-scan.c | 1 - gcc/dwarf2out.c | 3 --- gcc/fixed-value.c | 1 - gcc/gimple-fold.c | 1 - gcc/gimple-ssa-strength-reduction.c | 1 - gcc/graphite-isl-ast-to-gimple.c| 4 gcc/ipa-fnsummary.c | 1 - gcc/lto-streamer-in.c | 7 ++- gcc/vec.c | 10 +- libgomp/oacc-plugin.c | 1 - 15 files changed, 10 insertions(+), 40 deletions(-) diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c index 2030eb1a4cd..93845d57dee 100644 --- a/gcc/c-family/c-opts.c +++ b/gcc/c-family/c-opts.c @@ -1109,10 +1109,7 @@ c_common_post_options (const char **pfilename) out_stream = fopen (out_fname, "w"); if (out_stream == NULL) - { - fatal_error (input_location, "opening output file %s: %m", out_fname); - return false; - } + fatal_error (input_location, "opening output file %s: %m", out_fname); init_pp_output (out_stream); } diff --git a/gcc/c-family/c-pragma.c b/gcc/c-family/c-pragma.c index 3663eb1cfbb..c4ed4205820 100644 --- a/gcc/c-family/c-pragma.c +++ b/gcc/c-family/c-pragma.c @@ -916,10 +916,7 @@ handle_pragma_target(cpp_reader *ARG_UNUSED(dummy)) } if (token != CPP_STRING) -{ - GCC_BAD_AT (loc, "%<#pragma GCC option%> is not a string"); - return; -} +GCC_BAD_AT (loc, "%<#pragma GCC option%> is not a string"); /* Strings are user options. */ else @@ -991,10 +988,7 @@ handle_pragma_optimize (cpp_reader *ARG_UNUSED(dummy)) } if (token != CPP_STRING && token != CPP_NUMBER) -{ - GCC_BAD ("%<#pragma GCC optimize%> is not a string or number"); - return; -} +GCC_BAD ("%<#pragma GCC optimize%> is not a string or number"); /* Strings/numbers are user options. 
*/ else diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c index f312630448f..af2bb5bc8cc 100644 --- a/gcc/c/c-parser.c +++ b/gcc/c/c-parser.c @@ -1132,7 +1132,6 @@ get_matching_symbol (enum cpp_ttype type) { default: gcc_unreachable (); - return ""; case CPP_CLOSE_PAREN: return "("; case CPP_CLOSE_BRACE: diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c index b71358e1821..7524304f2bd 100644 --- a/gcc/c/c-typeck.c +++ b/gcc/c/c-typeck.c @@ -15984,8 +15984,6 @@ c_tree_equal (tree t1, tree t2) default: gcc_unreachable (); } - /* We can get here with --disable-checking. */ - return false; } /* Returns true when the function declaration FNDECL is implicit, diff --git a/gcc/ccmp.c b/gcc/ccmp.c index d581cfadf06..616fe035e79 100644 --- a/gcc/ccmp.c +++ b/gcc/ccmp.c @@ -273,8 +273,6 @@ expand_ccmp_expr_1 (gimple *g, rtx_insn **prep_seq, rtx_insn **gen_seq) return NULL_RTX; return expand_ccmp_next (op1, code, tmp, prep_seq, gen_seq); } - - return NULL_RTX; } /* Main entry to expand conditional compare statement G. diff --git a/gcc/df-scan.c b/gcc/df-scan.c index 3dbda7aa52c..1baa6e73
[PATCH] c++, v3: Fix up diagnostics about taking address of an immediate member function [PR102753]
On Wed, Nov 24, 2021 at 09:07:48PM -0500, Jason Merrill wrote: > > --- gcc/cp/tree.c.jj2021-11-24 15:05:23.371927735 +0100 > > +++ gcc/cp/tree.c 2021-11-24 17:09:05.348164621 +0100 > > @@ -5167,6 +5167,7 @@ make_ptrmem_cst (tree type, tree member) > > tree ptrmem_cst = make_node (PTRMEM_CST); > > TREE_TYPE (ptrmem_cst) = type; > > PTRMEM_CST_MEMBER (ptrmem_cst) = member; > > + PTRMEM_CST_LOCATION (ptrmem_cst) = input_location; > > return ptrmem_cst; > > } > > Please also change build_x_unary_op to improve PTRMEM_CST_LOCATION instead > of adding a wrapper, and teach cp_expr_location about PTRMEM_CST_LOCATION. Done. Though, had to also change convert_for_assignment from EXPR_LOC_OR_LOC to cp_expr_loc_or_input_loc and expand_ptrmemfunc_cst to copy over location to ADDR_EXPR from PTRMEM_CST. > > --- gcc/cp/pt.c.jj 2021-11-24 15:05:23.336928234 +0100 > > +++ gcc/cp/pt.c 2021-11-24 15:34:29.018014159 +0100 > > @@ -17012,6 +17012,12 @@ tsubst_copy (tree t, tree args, tsubst_f > > r = build1 (code, type, op0); > > This should become build1_loc (EXPR_LOCATION (t), ... Done. > > > if (code == ALIGNOF_EXPR) > > ALIGNOF_EXPR_STD_P (r) = ALIGNOF_EXPR_STD_P (t); > > + /* For addresses of immediate functions ensure we have EXPR_LOCATION > > + set for possible later diagnostics. */ > > + if (code == ADDR_EXPR > > + && TREE_CODE (op0) == FUNCTION_DECL > > + && DECL_IMMEDIATE_FUNCTION_P (op0)) > > + SET_EXPR_LOCATION (r, input_location); > > ...and then do this only if t didn't have a location. And this too. 2021-11-25 Jakub Jelinek PR c++/102753 * cp-tree.h (struct ptrmem_cst): Add locus member. (PTRMEM_CST_LOCATION): Define. * tree.c (make_ptrmem_cst): Set PTRMEM_CST_LOCATION to input_location. (cp_expr_location): Return PTRMEM_CST_LOCATION for PTRMEM_CST. * typeck.c (build_x_unary_op): Overwrite PTRMEM_CST_LOCATION for PTRMEM_CST instead of calling maybe_wrap_with_location. (cp_build_addr_expr_1): Don't diagnose taking address of immediate functions here. Instead when taking their address make sure the returned ADDR_EXPR has EXPR_LOCATION set. (expand_ptrmemfunc_cst): Copy over PTRMEM_CST_LOCATION to ADDR_EXPR if taking address of immediate member function. (convert_for_assignment): Use cp_expr_loc_or_input_loc instead of EXPR_LOC_OR_LOC. * pt.c (tsubst_copy): Use build1_loc instead of build1. Ensure ADDR_EXPR of immediate function has EXPR_LOCATION set. * cp-gimplify.c (cp_fold_r): Diagnose taking address of immediate functions here. For consteval if don't walk THEN_CLAUSE. (cp_genericize_r): Move evaluation of calls to std::source_location::current from here to... (cp_fold): ... here. Don't assert calls to immediate functions must be source_location_current_p, instead only constant evaluate calls to source_location_current_p. * g++.dg/cpp2a/consteval20.C: Add some extra tests. * g++.dg/cpp2a/consteval23.C: Likewise. * g++.dg/cpp2a/consteval25.C: New test. * g++.dg/cpp2a/srcloc20.C: New test. --- gcc/cp/cp-tree.h.jj 2021-11-25 08:35:39.856073838 +0100 +++ gcc/cp/cp-tree.h2021-11-25 14:25:33.411081733 +0100 @@ -703,6 +703,7 @@ struct GTY(()) template_parm_index { struct GTY(()) ptrmem_cst { struct tree_common common; tree member; + location_t locus; }; typedef struct ptrmem_cst * ptrmem_cst_t; @@ -4726,6 +4727,11 @@ more_aggr_init_expr_args_p (const aggr_i #define PTRMEM_CST_MEMBER(NODE) \ (((ptrmem_cst_t)PTRMEM_CST_CHECK (NODE))->member) +/* For a pointer-to-member constant `X::Y' this is a location where + the address of the member has been taken. 
*/ +#define PTRMEM_CST_LOCATION(NODE) \ + (((ptrmem_cst_t)PTRMEM_CST_CHECK (NODE))->locus) + /* The expression in question for a TYPEOF_TYPE. */ #define TYPEOF_TYPE_EXPR(NODE) (TYPE_VALUES_RAW (TYPEOF_TYPE_CHECK (NODE))) --- gcc/cp/tree.c.jj2021-11-25 08:35:39.942072610 +0100 +++ gcc/cp/tree.c 2021-11-25 14:31:32.784899701 +0100 @@ -5196,6 +5196,7 @@ make_ptrmem_cst (tree type, tree member) tree ptrmem_cst = make_node (PTRMEM_CST); TREE_TYPE (ptrmem_cst) = type; PTRMEM_CST_MEMBER (ptrmem_cst) = member; + PTRMEM_CST_LOCATION (ptrmem_cst) = input_location; return ptrmem_cst; } @@ -6040,6 +6041,8 @@ cp_expr_location (const_tree t_) return STATIC_ASSERT_SOURCE_LOCATION (t); case TRAIT_EXPR: return TRAIT_EXPR_LOCATION (t); +case PTRMEM_CST: + return PTRMEM_CST_LOCATION (t); default: return EXPR_LOCATION (t); } --- gcc/cp/typeck.c.jj 2021-11-25 08:32:50.585489416 +0100 +++ gcc/cp/typeck.c 2021-11-25 15:22:25.554996949 +0100 @@ -6497,7 +6497,7 @@ build_x_unary_op (location_t loc, enum t exp = cp_build_addr_expr_strict (xarg, complain); if (TREE_CODE (exp) ==
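For context, a sketch (assumed, not one of the new tests) of the situation whose diagnostic location this improves:

struct S { consteval int f () const { return 0; } };
auto pmf = &S::f;  // error: taking address of an immediate function;
                   // with this patch the error points here, via the
                   // location recorded in PTRMEM_CST_LOCATION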
RE: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p simplification [PR103417]
> -----Original Message----- > From: Jakub Jelinek > Sent: Thursday, November 25, 2021 9:53 AM > To: Richard Biener > Cc: Tamar Christina ; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p > simplification [PR103417] > > On Thu, Nov 25, 2021 at 10:17:52AM +0100, Richard Biener wrote: > > > Ah I see, sorry I didn't see that rule before, you're right that if > > > this is ordered after it then they can be dropped. > > > > So the patch is OK, possibly with re-ordering the matches. > > I've committed the patch as is because it has been tested that way and I'd > like to avoid dups of that PR flowing in. Even when not reordered, the new > earlier match.pd simplification will not trigger for the lt le gt ge cases > anymore > and the later old simplifications will trigger and I'd expect after that > latter > simplification the earlier should trigger again because the IL changed, no? > Tamar, can you handle the reordering together with the testsuite changes > (and perhaps formatting fixes in the tree.c routine)? Yes I will, I'll send a patch tomorrow morning. Thanks! Regards, Tamar > > Jakub
Re: [PATCH] c++, v3: Fix up diagnostics about taking address of an immediate member function [PR102753]
On 11/25/21 09:38, Jakub Jelinek wrote:
Re: [PATCH] c++: __builtin_bit_cast To C array target type [PR103140]
On 11/8/21 15:03, Will Wray via Gcc-patches wrote: This patch allows __builtin_bit_cast to materialize a C array as its To type. It was developed as part of an implementation of P1997, array copy-semantics, but is independent, so makes sense to submit, review and merge ahead of it. gcc/cp/ChangeLog: * constexpr.c (check_bit_cast_type): handle ARRAY_TYPE check, (cxx_eval_bit_cast): handle ARRAY_TYPE copy. * semantics.c (cp_build_bit_cast): warn only on unbounded/VLA. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/bit-cast2.C: update XFAIL tests. * g++.dg/cpp2a/bit-cast-to-array1.C: New test. --- gcc/cp/constexpr.c | 8 - gcc/cp/semantics.c | 7 ++--- gcc/testsuite/g++.dg/cpp2a/bit-cast-to-array1.C | 40 + gcc/testsuite/g++.dg/cpp2a/bit-cast2.C | 8 ++--- 4 files changed, 53 insertions(+), 10 deletions(-) diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c index 453007c686b..be1cdada6f8 100644 --- a/gcc/cp/constexpr.c +++ b/gcc/cp/constexpr.c @@ -4124,6 +4124,11 @@ static bool check_bit_cast_type (const constexpr_ctx *ctx, location_t loc, tree type, tree orig_type) { + if (TREE_CODE (type) == ARRAY_TYPE) + return check_bit_cast_type (ctx, loc, + TYPE_MAIN_VARIANT (TREE_TYPE (type)), + orig_type); + if (TREE_CODE (type) == UNION_TYPE) { if (!ctx->quiet) @@ -4280,7 +4285,8 @@ cxx_eval_bit_cast (const constexpr_ctx *ctx, tree t, bool *non_constant_p, tree r = NULL_TREE; if (can_native_interpret_type_p (TREE_TYPE (t))) r = native_interpret_expr (TREE_TYPE (t), ptr, len); - else if (TREE_CODE (TREE_TYPE (t)) == RECORD_TYPE) + else if (TREE_CODE (TREE_TYPE (t)) == RECORD_TYPE + || TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE) { r = native_interpret_aggregate (TREE_TYPE (t), ptr, 0, len); if (r != NULL_TREE) diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c index 2443d032749..b3126b12abc 100644 --- a/gcc/cp/semantics.c +++ b/gcc/cp/semantics.c @@ -11562,13 +11562,10 @@ cp_build_bit_cast (location_t loc, tree type, tree arg, { if (!complete_type_or_maybe_complain (type, NULL_TREE, complain)) return error_mark_node; - if (TREE_CODE (type) == ARRAY_TYPE) + if (TREE_CODE (type) == ARRAY_TYPE && !TYPE_DOMAIN (type)) { - /* std::bit_cast for destination ARRAY_TYPE is not possible, -as functions may not return an array, so don't bother trying -to support this (and then deal with VLAs etc.). */ error_at (loc, "%<__builtin_bit_cast%> destination type %qT " -"is an array type", type); +"is a VLA variable-length array type", type); Null TYPE_DOMAIN doesn't mean VLA, it means unknown length. Probably better to check for null or non-constant TYPE_SIZE rather than specifically for VLA. 
return error_mark_node; } if (!trivially_copyable_p (type)) diff --git a/gcc/testsuite/g++.dg/cpp2a/bit-cast-to-array1.C b/gcc/testsuite/g++.dg/cpp2a/bit-cast-to-array1.C new file mode 100644 index 000..e6e50c06389 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp2a/bit-cast-to-array1.C @@ -0,0 +1,40 @@ +// { dg-do compile } + +class S { int s; }; +S s(); +class U { int a, b; }; +U u(); + +void +foo (int *q) +{ + __builtin_bit_cast (int [1], 0); + __builtin_bit_cast (S [1], 0); + __builtin_bit_cast (U [1], u); +} + +template +void +bar (int *q) +{ + int intN[N] = {}; + int int2N[2*N] = {}; + __builtin_bit_cast (int [N], intN); + __builtin_bit_cast (S [N], intN); + __builtin_bit_cast (U [N], int2N); +} + +template +void +baz (T1 ia, T2 sa, T3 ua) +{ + __builtin_bit_cast (T1, *ia); + __builtin_bit_cast (T2, *sa); + __builtin_bit_cast (T3, *ua); +} + +void +qux (S* sp, int *ip, U* up) +{ + baz (ip, sp, up); +} diff --git a/gcc/testsuite/g++.dg/cpp2a/bit-cast2.C b/gcc/testsuite/g++.dg/cpp2a/bit-cast2.C index 6bb1760e621..7f1836ee4e9 100644 --- a/gcc/testsuite/g++.dg/cpp2a/bit-cast2.C +++ b/gcc/testsuite/g++.dg/cpp2a/bit-cast2.C @@ -14,7 +14,7 @@ foo (int *q) __builtin_bit_cast (int, s);// { dg-error "'__builtin_bit_cast' source type 'S' is not trivially copyable" } __builtin_bit_cast (S, 0); // { dg-error "'__builtin_bit_cast' destination type 'S' is not trivially copyable" } __builtin_bit_cast (int &, q); // { dg-error "'__builtin_bit_cast' destination type 'int&' is not trivially copyable" } - __builtin_bit_cast (int [1], 0); // { dg-error "'__builtin_bit_cast' destination type \[^\n\r]* is an array type" } + __builtin_bit_cast (S [1], 0); // { dg-error "'__builtin_bit_cast' destination type \[^\n\r]* is not trivially copyable" } __builtin_bit_cast (V, 0); // { dg-error "invalid use of incomplete type
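A sketch of the check Jason suggests (hypothetical, not a tested patch): reject on a missing or non-constant TYPE_SIZE, which covers both arrays of unknown bound and VLAs:

if (TREE_CODE (type) == ARRAY_TYPE
    && (TYPE_SIZE (type) == NULL_TREE
        || TREE_CODE (TYPE_SIZE (type)) != INTEGER_CST))
  {
    error_at (loc, "%<__builtin_bit_cast%> destination type %qT "
              "has unknown or non-constant size", type);
    return error_mark_node;
  }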
Re: [PATCH 1/3] c++: designated init of char array by string constant [PR55227]
On 11/21/21 21:51, Will Wray via Gcc-patches wrote: Also address "FIXME: this code is duplicated from reshape_init" in cp_complete_array_type by always calling reshape_init on init-list. PR c++/55227 gcc/cp/ChangeLog: * decl.c (reshape_init_r): Only call has_designator_check when first_initializer_p or for the inner constructor element. (cp_complete_array_type): Call reshape_init on braced-init-list. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/desig20.C: New test. --- gcc/cp/decl.c| 42 +-- gcc/testsuite/g++.dg/cpp2a/desig20.C | 48 2 files changed, 65 insertions(+), 25 deletions(-) diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index 2ddf0e4a524..83a2d3bf8f1 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -6824,28 +6824,31 @@ reshape_init_r (tree type, reshape_iter *d, tree first_initializer_p, if (TREE_CODE (type) == ARRAY_TYPE && char_type_p (TYPE_MAIN_VARIANT (TREE_TYPE (type { - tree str_init = init; - tree stripped_str_init = stripped_init; + tree arr_init = init; + tree stripped_arr_init = stripped_init; This renaming seems unnecessary; OK without the name change. + reshape_iter stripd = {}; /* Strip one level of braces if and only if they enclose a single element (as allowed by [dcl.init.string]). */ if (!first_initializer_p - && TREE_CODE (stripped_str_init) == CONSTRUCTOR - && CONSTRUCTOR_NELTS (stripped_str_init) == 1) + && TREE_CODE (stripped_arr_init) == CONSTRUCTOR + && CONSTRUCTOR_NELTS (stripped_arr_init) == 1) { - str_init = (*CONSTRUCTOR_ELTS (stripped_str_init))[0].value; - stripped_str_init = tree_strip_any_location_wrapper (str_init); + stripd.cur = CONSTRUCTOR_ELT (stripped_arr_init, 0); + arr_init = stripd.cur->value; + stripped_arr_init = tree_strip_any_location_wrapper (arr_init); } /* If it's a string literal, then it's the initializer for the array as a whole. Otherwise, continue with normal initialization for array types (one value per array element). */ - if (TREE_CODE (stripped_str_init) == STRING_CST) + if (TREE_CODE (stripped_arr_init) == STRING_CST) { - if (has_designator_problem (d, complain)) + if ((first_initializer_p && has_designator_problem (d, complain)) + || (stripd.cur && has_designator_problem (&stripd, complain))) return error_mark_node; d->cur++; - return str_init; + return arr_init; } } @@ -9545,22 +9548,11 @@ cp_complete_array_type (tree *ptype, tree initial_value, bool do_default) if (initial_value) { /* An array of character type can be initialized from a -brace-enclosed string constant. - -FIXME: this code is duplicated from reshape_init. Probably -we should just call reshape_init here? */ - if (char_type_p (TYPE_MAIN_VARIANT (TREE_TYPE (*ptype))) - && TREE_CODE (initial_value) == CONSTRUCTOR - && !vec_safe_is_empty (CONSTRUCTOR_ELTS (initial_value))) - { - vec *v = CONSTRUCTOR_ELTS (initial_value); - tree value = (*v)[0].value; - STRIP_ANY_LOCATION_WRAPPER (value); - - if (TREE_CODE (value) == STRING_CST - && v->length () == 1) - initial_value = value; - } +brace-enclosed string constant so call reshape_init to +remove the optional braces from a braced string literal. */ + if (BRACE_ENCLOSED_INITIALIZER_P (initial_value)) + initial_value = reshape_init (*ptype, initial_value, + tf_warning_or_error); /* If any of the elements are parameter packs, we can't actually complete this type now because the array size is dependent. 
*/ diff --git a/gcc/testsuite/g++.dg/cpp2a/desig20.C b/gcc/testsuite/g++.dg/cpp2a/desig20.C new file mode 100644 index 000..daadfa58855 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp2a/desig20.C @@ -0,0 +1,48 @@ +// PR c++/55227 +// Test designated initializer for char array by string constant + +// { dg-options "" } + +struct C {char a[2];}; + +/* Case a, designated, unbraced, string-literal of the exact same size + as the initialized char array; valid and accepted before and after. */ +C a = {.a="a"}; + +/* Cases b,c,d, designated, braced or mimatched-size, string literal, + previously rejected; "C99 designator 'a' outside aggregate initializer". */ +C b = {.a=""}; +C c = {.a={""}}; +C d = {.a={"a"}}; + +/* Case e, designated char array field and braced, designated array element(s) + (with GNU [N]= extension) valid and accepted before and after. */ +C e = {.a={[0]='a'}}; + +/* Cases f,g,h, braced string literal, 'designated' within inner braces; + invalid, previously accepted a
Re: [COMMITTED] path solver: Compute ranges in path in gimple order.
Pushed. Sorry for the noise. On Thu, Nov 25, 2021 at 1:51 PM Aldy Hernandez wrote: > > On Thu, Nov 25, 2021 at 1:38 PM Richard Biener > wrote: > > > > On Thu, Nov 25, 2021 at 1:10 PM Aldy Hernandez wrote: > > > > > > On Thu, Nov 25, 2021 at 12:57 PM Richard Biener > > > wrote: > > > > > > > > On Thu, Nov 25, 2021 at 11:55 AM Aldy Hernandez via Gcc-patches > > > > wrote: > > > > > > > > > > Andrew's patch for this PR103254 papered over some underlying > > > > > performance issues in the path solver that I'd like to address. > > > > > > > > > > We are currently solving the SSA's defined in the current block in > > > > > bitmap order, which amounts to random order for all purposes. This is > > > > > causing unnecessary recursion in gori. This patch changes the order > > > > > to gimple order, thus solving dependencies before uses. > > > > > > > > > > There is no change in threadable paths with this change. > > > > > > > > > > Tested on x86-64 & ppc64le Linux. > > > > > > > > > > gcc/ChangeLog: > > > > > > > > > > PR tree-optimization/103254 > > > > > * gimple-range-path.cc > > > > > (path_range_query::compute_ranges_defined): New > > > > > (path_range_query::compute_ranges_in_block): Move to > > > > > compute_ranges_defined. > > > > > * gimple-range-path.h (compute_ranges_defined): New. > > > > > --- > > > > > gcc/gimple-range-path.cc | 33 ++--- > > > > > gcc/gimple-range-path.h | 1 + > > > > > 2 files changed, 23 insertions(+), 11 deletions(-) > > > > > > > > > > diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc > > > > > index 4aa666d2c8b..e24086691c4 100644 > > > > > --- a/gcc/gimple-range-path.cc > > > > > +++ b/gcc/gimple-range-path.cc > > > > > @@ -401,6 +401,27 @@ path_range_query::compute_ranges_in_phis > > > > > (basic_block bb) > > > > > } > > > > > } > > > > > > > > > > +// Compute ranges defined in block. > > > > > + > > > > > +void > > > > > +path_range_query::compute_ranges_defined (basic_block bb) > > > > > +{ > > > > > + int_range_max r; > > > > > + > > > > > + compute_ranges_in_phis (bb); > > > > > + > > > > > + // Iterate in gimple order to minimize recursion. > > > > > + for (auto gsi = gsi_start_nondebug_bb (bb); !gsi_end_p (gsi); > > > > > gsi_next (&gsi)) > > > > > > > > gsi_next_nondebug (&gsi)? > > > > > > > > Of course this all has the extra cost of iterating over a possibly > > > > very large > > > > BB for just a few bits in m_imports? How often does m_imports have > > > > exactly one bit set? > > > > > > Hmmm, good point. > > > > > > Perhaps this isn't worth it then. I mean, the underlying bug I'm > > > tackling is an excess of outgoing edge ranges, not the excess > > > recursion this patch attacks. > > > > > > If you think the cost would be high for large ILs, I can revert the patch. > > > > I think so. If ordering is important then that should be achieved in some > > other ways (always a bit difficult for on-demand infrastructure). > > Nah, this isn't a correctness issue. It's not worth it. > > I will revert the patch. > > Thanks. > Aldy From f21dc29d923f559c069fbd0b32e473f5a76de12c Mon Sep 17 00:00:00 2001 From: Aldy Hernandez Date: Thu, 25 Nov 2021 17:30:07 +0100 Subject: [PATCH] path solver: Revert computation of ranges in gimple order. Revert the patch below, as it may slow down compilation with large CFGs. commit 8acbd7bef6edbf537e3037174907029b530212f6 Author: Aldy Hernandez Date: Wed Nov 24 09:43:36 2021 +0100 path solver: Compute ranges in path in gimple order. 
--- gcc/gimple-range-path.cc | 33 +++-- gcc/gimple-range-path.h | 1 - 2 files changed, 11 insertions(+), 23 deletions(-) diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc index 806bce9ff11..b9c71226c1c 100644 --- a/gcc/gimple-range-path.cc +++ b/gcc/gimple-range-path.cc @@ -401,27 +401,6 @@ path_range_query::compute_ranges_in_phis (basic_block bb) } } -// Compute ranges defined in block. - -void -path_range_query::compute_ranges_defined (basic_block bb) -{ - int_range_max r; - - compute_ranges_in_phis (bb); - - // Iterate in gimple order to minimize recursion. - for (auto gsi = gsi_start_nondebug_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) -if (gimple_has_lhs (gsi_stmt (gsi))) - { - tree name = gimple_get_lhs (gsi_stmt (gsi)); - if (TREE_CODE (name) == SSA_NAME - && bitmap_bit_p (m_imports, SSA_NAME_VERSION (name)) - && range_defined_in_block (r, name, bb)) - set_cache (r, name); - } -} - // Compute ranges defined in the current block, or exported to the // next block. @@ -444,7 +423,17 @@ path_range_query::compute_ranges_in_block (basic_block bb) clear_cache (name); } - compute_ranges_defined (bb); + // Solve imports defined in this block, starting with the PHIs... + compute_ranges_in_phis (bb); + // ...and then the rest of the imports. + EXECUTE_IF_SET_IN_BITMAP (m_imports, 0, i, bi)
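The shape of the restored loop (a sketch reconstructed from the truncated hunk above): solve only the names in m_imports, instead of walking every statement of the block:

int_range_max r;
bitmap_iterator bi;
unsigned i;
EXECUTE_IF_SET_IN_BITMAP (m_imports, 0, i, bi)
  {
    tree name = ssa_name (i);
    if (range_defined_in_block (r, name, bb))
      set_cache (r, name);
  }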
Re: [PATCH] PR fortran/103411 - ICE in gfc_conv_array_initializer, at fortran/trans-array.c:6377
Hello, On 24/11/2021 at 22:32, Harald Anlauf via Fortran wrote: diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c index 5a5aca10ebe..837eb0912c0 100644 --- a/gcc/fortran/check.c +++ b/gcc/fortran/check.c @@ -4866,10 +4868,17 @@ gfc_check_reshape (gfc_expr *source, gfc_expr *shape, { gfc_constructor *c; bool test; + gfc_constructor_base b; + if (shape->expr_type == EXPR_ARRAY) + b = shape->value.constructor; + else if (shape->expr_type == EXPR_VARIABLE) + b = shape->symtree->n.sym->value->value.constructor; This misses a check that shape->symtree->n.sym->value is an array, so that it makes sense to access its constructor. Actually, this only supports the case where the parameter value is defined by an array; but it could be an intrinsic call, a sum of parameters, a reference to another parameter, etc. The usual way to handle this is to call gfc_reduce_init_expr which (pray for it) will make an array out of whatever the shape expression is. The rest looks good. In the test, can you add a comment telling what it is testing? Something like: "This tests that constant shape expressions passed to the reshape intrinsic are properly simplified before being used to diagnose invalid values" We also used to put a comment mentioning the person who submitted the test, but not everybody seems to do it these days. Mikael
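A sketch of the suggested fix (hypothetical; it assumes gfc_copy_expr and gfc_reduce_init_expr behave as in the rest of the front end):

/* Reduce the shape expression, whatever its form (intrinsic call, sum of
   parameters, reference to another parameter, ...), to a constant array
   constructor before walking its elements.  */
gfc_expr *shape_exp = gfc_copy_expr (shape);
if (gfc_reduce_init_expr (shape_exp))
  b = shape_exp->value.constructor;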
[commit][master+OG11] amdgcn: Fix ICE generating CFI [PR103396]
I've committed this patch to fix the amdgcn ICE reported in PR103396. The problem was that it was mis-counting the number of registers to save when the link register was only clobbered implicitly by calls. The issue is easily fixed by adjusting the condition to match elsewhere in the same function. Committed to master and backported to devel/omp/gcc-11. It should affect GCC 11. Andrew

amdgcn: Fix ICE generating CFI [PR103396] gcc/ChangeLog: PR target/103396 * config/gcn/gcn.c (move_callee_saved_registers): Ensure that the number of spilled registers is counted correctly. diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c index 75a9c576694..2bde88afc32 100644 --- a/gcc/config/gcn/gcn.c +++ b/gcc/config/gcn/gcn.c @@ -2785,7 +2785,7 @@ move_callee_saved_registers (rtx sp, machine_function *offsets, int start = (regno == VGPR_REGNO (7) ? 64 : 0); int count = MIN (saved_scalars - start, 64); int add_lr = (regno == VGPR_REGNO (6) - && df_regs_ever_live_p (LINK_REGNUM)); + && offsets->lr_needs_saving); int lrdest = -1; rtvec seq = rtvec_alloc (count + add_lr);
Re: [PATCH] ipa: Teach IPA-CP transformation about IPA-SRA modifications (PR 103227)
Hi, On Thu, Nov 25 2021, Jan Hubicka wrote: >> >> gcc/ChangeLog: >> >> 2021-11-23 Martin Jambor >> >> PR ipa/103227 >> * ipa-prop.h (ipa_get_param): New overload. Move bits of the existing >> one to the new one. >> * ipa-param-manipulation.h (ipa_param_adjustments): New member >> function get_updated_index_or_split. >> * ipa-param-manipulation.c >> (ipa_param_adjustments::get_updated_index_or_split): New function. >> * ipa-prop.c (adjust_agg_replacement_values): Reimplement, add >> capability to identify scalarized parameters and perform substitution >> on them. >> (ipcp_transform_function): Create descriptors earlier, handle new >> return values of adjust_agg_replacement_values. >> >> gcc/testsuite/ChangeLog: >> >> 2021-11-23 Martin Jambor >> >> PR ipa/103227 >> * gcc.dg/ipa/pr103227-1.c: New test. >> * gcc.dg/ipa/pr103227-3.c: Likewise. >> * gcc.dg/ipa/pr103227-2.c: Likewise. >> * gfortran.dg/pr53787.f90: Disable IPA-SRA. >> --- >> gcc/ipa-param-manipulation.c | 33 >> gcc/ipa-param-manipulation.h | 7 +++ >> gcc/ipa-prop.c| 73 +++ >> gcc/ipa-prop.h| 15 -- >> gcc/testsuite/gcc.dg/ipa/pr103227-1.c | 29 +++ >> gcc/testsuite/gcc.dg/ipa/pr103227-2.c | 29 +++ >> gcc/testsuite/gcc.dg/ipa/pr103227-3.c | 52 +++ >> gcc/testsuite/gfortran.dg/pr53787.f90 | 2 +- >> 8 files changed, 216 insertions(+), 24 deletions(-) >> create mode 100644 gcc/testsuite/gcc.dg/ipa/pr103227-1.c >> create mode 100644 gcc/testsuite/gcc.dg/ipa/pr103227-2.c >> create mode 100644 gcc/testsuite/gcc.dg/ipa/pr103227-3.c >> >> diff --git a/gcc/ipa-param-manipulation.c b/gcc/ipa-param-manipulation.c >> index cec1dba701f..479c20b3871 100644 >> --- a/gcc/ipa-param-manipulation.c >> +++ b/gcc/ipa-param-manipulation.c >> @@ -449,6 +449,39 @@ ipa_param_adjustments::get_updated_indices (vec >> *new_indices) >> } >> } >> >> +/* If a parameter with original INDEX has survived intact, return its new >> + index. Otherwise return -1. In that case, if it has been split and >> there >> + is a new parameter representing a portion at unit OFFSET for which a >> value >> + of a TYPE can be substituted, store its new index into SPLIT_INDEX, >> + otherwise store -1 there. */ >> +int >> +ipa_param_adjustments::get_updated_index_or_split (int index, >> + unsigned unit_offset, >> + tree type, int *split_index) >> +{ >> + unsigned adj_len = vec_safe_length (m_adj_params); >> + for (unsigned i = 0; i < adj_len ; i++) > > In ipa-modref I precompute this to map so we do not need to walk all > params, but the loop is probably not bad since functions do not have > tens of thousdands parameters :) The most I have seen is about 70 and those were big outliers. I was thinking of precomputing it somehow but for one parameter there can be up to param ipa-sra-max-replacements replacements (default 8 - and there is another, by default stricter, limit for pointers). So it would have to be a hash table or something like it. > > Can I use it in ipa-modref to discover what parameters was turned from > by-reference to scalar, too? IIUC, I don't think you directly can, also because for one parameter you can have more scalar replacements and the interface needs an offset for which to look. OTOH, if you only care about simple scalars passed by reference, then passing zero as offset - and probably adding a flag to check there are no replacements at other offsets - would work. (But that information could also be easily pre-computed.) 
>> +{ >> + ipa_adjusted_param *apm = &(*m_adj_params)[i]; >> + if (apm->base_index != index) >> +continue; >> + if (apm->op == IPA_PARAM_OP_COPY) >> +return i; >> + if (apm->op == IPA_PARAM_OP_SPLIT >> + && apm->unit_offset == unit_offset) >> +{ >> + if (useless_type_conversion_p (apm->type, type)) >> +*split_index = i; >> + else >> +*split_index = -1; >> + return -1; >> +} >> +} >> + >> + *split_index = -1; >> + return -1; >> +} >> + >> /* Return the original index for the given new parameter index. Return a >> negative number if not available. */ >> >> diff --git a/gcc/ipa-param-manipulation.h b/gcc/ipa-param-manipulation.h >> index 5adf8a22356..d1dad9fac73 100644 >> --- a/gcc/ipa-param-manipulation.h >> +++ b/gcc/ipa-param-manipulation.h >> @@ -236,6 +236,13 @@ public: >>void get_surviving_params (vec *surviving_params); >>/* Fill a vector with new indices of surviving original parameters. */ >>void get_updated_indices (vec *new_indices); >> + /* If a parameter with original INDEX has survived intact, return its new >> + index. Otherwise return -1. In that case, if it has been spli
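A hypothetical caller-side sketch of the new API:

int split_idx;
int new_idx
  = adjustments->get_updated_index_or_split (orig_idx, unit_offset, type,
                                             &split_idx);
if (new_idx >= 0)
  /* The parameter survived intact at NEW_IDX.  */;
else if (split_idx >= 0)
  /* The piece at UNIT_OFFSET was split into the new parameter at
     SPLIT_IDX and a value can be substituted directly.  */;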
Re: [PATCH] ipa: Teach IPA-CP transformation about IPA-SRA modifications (PR 103227)
> > > > In ipa-modref I precompute this into a map so we do not need to walk all > > params, but the loop is probably not bad since functions do not have > > tens of thousands of parameters :) > > The most I have seen is about 70 and those were big outliers. > > I was thinking of precomputing it somehow but for one parameter there > can be up to param ipa-sra-max-replacements replacements (default 8 - > and there is another, by default stricter, limit for pointers). So it > would have to be a hash table or something like it. Yep, I think given that we have the API, we can play with this later. > > > > > Can I use it in ipa-modref to discover what parameters were turned from > > by-reference to scalar, too? > > IIUC, I don't think you directly can, also because for one parameter you > can have more scalar replacements and the interface needs an offset for > which to look. OTOH, if you only care about simple scalars passed by > reference, then passing zero as offset - and probably adding a flag to > check there are no replacements at other offsets - would work. (But > that information could also be easily pre-computed.) If a parameter is broken up into multiple pieces, I can just duplicate its ECF flags (if I know that pointers from the whole structure do not escape, neither do pointers from its parts). However, presently modref computes nothing useful for aggregate parameters (I have a patch for that, but it was too late in this stage1 to push out everything, so it will come next stage1). If a parameter is turned from by-reference to scalar and is possibly offsetted, I can use the original ECF flag after applying the deref_flags translation. Again, it is not a problem to multiply it if the parameter is split into multiple subparameters. Honza
Re: [PATCH 3/4] libgcc: Split FDE search code from PT_GNU_EH_FRAME lookup
On Tue, Nov 23, 2021 at 06:56:14PM +0100, Florian Weimer wrote: > 8<----------8< > This allows switching to a different implementation for > PT_GNU_EH_FRAME lookup in a subsequent commit. > > This moves some of the PT_GNU_EH_FRAME parsing out of the glibc loader > lock that is implied by dl_iterate_phdr. However, the FDE is already > parsed outside the lock before this change, so this does not introduce > additional crashes in case of a concurrent dlclose. > > libgcc/ChangeLog > > * unwind-dw2-fde-dip.c (struct unw_eh_callback_data): Add hdr. > Remove func, ret. > (find_fde_tail): New function. Split from > _Unwind_IteratePhdrCallback. Move the result initialization > from _Unwind_Find_FDE. > (_Unwind_Find_FDE): Updated to call find_fde_tail. LGTM, thanks. Jakub
PING^3 [PATCH v4 0/2] Implement indirect external access
On Mon, Nov 1, 2021 at 7:02 AM H.J. Lu wrote: > > On Thu, Oct 21, 2021 at 12:56 PM H.J. Lu wrote: > > > > On Wed, Sep 22, 2021 at 7:02 PM H.J. Lu wrote: > > > > > > Changes in the v4 patch. > > > > > > 1. Add nodirect_extern_access attribute. > > > > > > Changes in the v3 patch. > > > > > > 1. GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS support has been added to > > > GNU binutils 2.38. But the -z indirect-extern-access linker option is > > > only available for Linux/x86. However, the --max-cache-size=SIZE linker > > > option was also addded within a day. --max-cache-size=SIZE is used to > > > check for GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS support. > > > > > > Changes in the v2 patch. > > > > > > 1. Rename the option to -fdirect-extern-access. > > > > > > --- > > > On systems with copy relocation: > > > * A copy in executable is created for the definition in a shared library > > > at run-time by ld.so. > > > * The copy is referenced by executable and shared libraries. > > > * Executable can access the copy directly. > > > > > > Issues are: > > > * Overhead of a copy, time and space, may be visible at run-time. > > > * Read-only data in the shared library becomes read-write copy in > > > executable at run-time. > > > * Local access to data with the STV_PROTECTED visibility in the shared > > > library must use GOT. > > > > > > On systems without function descriptor, function pointers vary depending > > > on where and how the functions are defined. > > > * If the function is defined in executable, it can be the address of > > > function body. > > > * If the function, including the function with STV_PROTECTED visibility, > > > is defined in the shared library, it can be the address of the PLT entry > > > in executable or shared library. > > > > > > Issues are: > > > * The address of function body may not be used as its function pointer. > > > * ld.so needs to search loaded shared libraries for the function pointer > > > of the function with STV_PROTECTED visibility. > > > > > > Here is a proposal to remove copy relocation and use canonical function > > > pointer: > > > > > > 1. Accesses, including in PIE and non-PIE, to undefined symbols must > > > use GOT. > > > a. Linker may optimize out GOT access if the data is defined in PIE or > > > non-PIE. > > > 2. Read-only data in the shared library remain read-only at run-time > > > 3. Address of global data with the STV_PROTECTED visibility in the shared > > > library is the address of data body. > > > a. Can use IP-relative access. > > > b. May need GOT without IP-relative access. > > > 4. For systems without function descriptor, > > > a. All global function pointers of undefined functions in PIE and > > > non-PIE must use GOT. Linker may optimize out GOT access if the > > > function is defined in PIE or non-PIE. > > > b. Function pointer of functions with the STV_PROTECTED visibility in > > > executable and shared library is the address of function body. > > >i. Can use IP-relative access. > > >ii. May need GOT without IP-relative access. > > >iii. Branches to undefined functions may use PLT. > > > 5. Single global definition marker: > > > > > > Add GNU_PROPERTY_1_NEEDED: > > > > > > #define GNU_PROPERTY_1_NEEDED GNU_PROPERTY_UINT32_OR_LO > > > > > > to indicate the needed properties by the object file. 
> > > Add GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS:
> > >
> > > #define GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS (1U << 0)
> > >
> > > to indicate that the object file requires canonical function pointers and
> > > cannot be used with copy relocation.  This bit should be cleared in
> > > executable when there are non-GOT or non-PLT relocations in relocatable
> > > input files without this bit set.
> > >
> > >   a. Protected symbol access within the shared library can be treated
> > >   as local.
> > >   b. Copy relocation should be disallowed at link-time and run-time.
> > >   c. GOT function pointer reference is required at link-time and
> > >   run-time.
> > >
> > > The indirect external access marker can be used in the following ways:
> > >
> > > 1. Linker can decide the best way to resolve a relocation against a
> > > protected symbol before seeing all relocations against the symbol.
> > > 2. Dynamic linker can decide if it is an error to have a copy relocation
> > > in executable against the protected symbol in a shared library by checking
> > > if the shared library is built with -fno-direct-extern-access.
> > >
> > > Add a compiler option, -fdirect-extern-access.  -fdirect-extern-access is
> > > the default.  With -fno-direct-extern-access:
> > >
> > > 1. Always use GOT to access undefined symbols, including in PIE and
> > > non-PIE.  This is safe to do and does not break the ABI.
> > > 2. In executable and shared library, for symbols with the STV_PROTECTED
> > > visibility:
> > >   a. The address of data symbol is the address of data body.
> > >   b. For systems without function descriptor, the function
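A hypothetical illustration of the protected-visibility cases described above (our own example, not from the patch).  Built into a shared library with -fno-direct-extern-access, both addresses below are the addresses of the definitions themselves (IP-relative where possible), not of a copy or a PLT entry in the executable:

__attribute__ ((visibility ("protected"))) int counter;

__attribute__ ((visibility ("protected"))) int
get_counter (void)
{
  return counter;
}

/* Inside the library, these resolve locally.  */
int *
counter_address (void)
{
  return &counter;              /* Address of the data body.  */
}

int (*
get_pointer (void)) (void)
{
  return get_counter;           /* Canonical function pointer.  */
}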
[PATCH take 3] ivopts: Improve code generated for very simple loops.
On Tue, Nov 23, 2021 at 12:46 PM Richard Biener <richard.guent...@gmail.com> wrote:
> On Thu, Nov 18, 2021 at 4:18 PM Roger Sayle wrote:
> > > The patch doesn't add any testcase.
> >
> > The three new attached tests check that the critical invariants have a
> > simpler form, and hopefully shouldn't be affected by whether the
> > optimizer and/or backend costs actually decide to perform this iv
> > substitution or not.
>
> The testcases might depend on lp64 though, did you test them with -m32?
> IMHO it's fine to require lp64 here.

Great catch.  You're right that when the loop index has the same precision
as the target's pointer, fold is (already) able to simplify the
((EXPR)-1)+1, so that with -m32 my new tests ivopts-[567].c fail.  I've
added "require lp64" to those tests, but I've also added two more tests,
using char and unsigned char for the loop expression, which are optimized
on both ilp32 and lp64.  For example, with -O2 -m32, we see the following
improvements in ivopts-8.c:

diff ivopts-8.old.s ivopts-8.new.s
14,16c14,15
< 	subl	$1, %ecx
< 	movzbl	%cl, %ecx
< 	leal	4(%eax,%ecx,4), %ecx
---
> 	movsbl	%cl, %ecx
> 	leal	(%eax,%ecx,4), %ecx

This might also explain why GCC currently generates sub-optimal code.
Back when ivopts was written, most folks were on i686, so the generated
code was optimal.  But with the transition to x86_64, the code is correct,
just slightly less efficient.

> I'm a bit unsure about adding this special-casing in cand_value_at in
> general - it does seem that we're doing sth wrong elsewhere - either by
> not simplifying even though enough knowledge is there or by throwing
> away knowledge earlier (during niter analysis?).

I agree this approach is a bit ugly.  Conceptually, an alternative might
be to avoid throwing away knowledge earlier, during niter analysis, by
adding an extra tree field to the tree_niter_desc structure, so that it
returns both niter0 (the iteration count at the top of the loop) and
niter1 (the iteration count at the bottom of the loop), so that later
passes (cand_value_at) can use the tree that's relevant.  Alas, this too
is ugly, and inefficient as we're creating/folding trees that may never
be used/useful.  A compromise might be to add an enum field describing
how the niter was calculated to tree_niter_desc, and this can be
inspected/used by cand_value_at.  The current patch figures this out by
examining the other fields already in tree_niter_desc.

> Anyway, the patch does look quite safe - can you show some statistics in
> how many times there's extra simplification this way during say bootstrap?

Certainly.  During stage2 and stage3 of a bootstrap on x86_64-pc-linux-gnu,
cand_value_at is called 500657 times.  The majority of calls, 447607
(89.4%), request the value at the end of the loop (after_adjust), while
53050 (10.6%) request the value at the start of the loop.

102437 calls (20.5%) are optimized by clause 1 [0..N loops]
27939 calls (5.6%) are optimized by clause 2 [beg..end loops]

Looking for opportunities to improve things further, I see that

319608 calls (63.8%) have a LT_EXPR exit test.
160965 calls (32.2%) have a NE_EXPR exit test.
20084 calls (4.0%) have a GT_EXPR exit test.

so handling descending loops wouldn’t be a big win.  I'll investigate
whether (constant) step sizes other than 1 are (i) sufficiently common
and (ii) benefit from improved folding.

This revised patch has been tested on x86_64-pc-linux-gnu with a make
bootstrap and make -k check, both with and without
--target_board=unix{-m32}, with no new failures.  Ok for mainline?
2021-11-25  Roger Sayle

gcc/ChangeLog
	* tree-ssa-loop-ivopts.c (cand_value_at): Take a class
	tree_niter_desc* argument instead of just a tree for NITER.
	If we require the iv candidate value at the end of the final
	loop iteration, try using the original loop bound as the
	NITER for sufficiently simple loops.
	(may_eliminate_iv): Update (only) call to cand_value_at.

gcc/testsuite
	* gcc.dg/wrapped-binop-simplify.c: Update expected test result.
	* gcc.dg/tree-ssa/ivopts-5.c: New test case.
	* gcc.dg/tree-ssa/ivopts-6.c: New test case.
	* gcc.dg/tree-ssa/ivopts-7.c: New test case.
	* gcc.dg/tree-ssa/ivopts-8.c: New test case.
	* gcc.dg/tree-ssa/ivopts-9.c: New test case.

Roger
--

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 4769b65..067f823 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -5030,28 +5030,57 @@ determine_group_iv_cost_address (struct ivopts_data *data,
   return !sum_cost.infinite_cost_p ();
 }

-/* Computes value of candidate CAND at position AT in iteration NITER, and
-   stores it to VAL.  */
+/* Computes value of candidate CAND at position AT in iteration DESC->NITER,
+   and stores it to VAL.  */

 static void
-cand_value_at (class loop *loop, struct iv_cand *cand, gimple *at,
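To make the discussion concrete, here is a hypothetical loop of the shape the new char/unsigned char tests exercise (our own example, not one of the attached testcases): a narrow induction variable whose value at the loop exit cand_value_at must compute when eliminating the counter.

int a[256];

void
f (unsigned char n)
{
  for (unsigned char i = 0; i < n; i++)
    a[i] = i;
}

Because the induction variable is narrower than the pointer, the ((EXPR)-1)+1 pattern is not folded away by the generic machinery, which is where the patch helps on both ilp32 and lp64.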
Re: [PATCH] PR fortran/103411 - ICE in gfc_conv_array_initializer, at fortran/trans-array.c:6377
Hi Mikael,

Am 25.11.21 um 17:46 schrieb Mikael Morin:

Hello,

Le 24/11/2021 à 22:32, Harald Anlauf via Fortran a écrit :

diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c
index 5a5aca10ebe..837eb0912c0 100644
--- a/gcc/fortran/check.c
+++ b/gcc/fortran/check.c
@@ -4866,10 +4868,17 @@ gfc_check_reshape (gfc_expr *source, gfc_expr *shape,
 {
   gfc_constructor *c;
   bool test;
+  gfc_constructor_base b;
+
+  if (shape->expr_type == EXPR_ARRAY)
+    b = shape->value.constructor;
+  else if (shape->expr_type == EXPR_VARIABLE)
+    b = shape->symtree->n.sym->value->value.constructor;

This misses a check that shape->symtree->n.sym->value is an array, so that it makes sense to access its constructor.

there are checks further above for the cases shape->expr_type == EXPR_ARRAY and for shape->expr_type == EXPR_VARIABLE which look at the elements of array shape to see if they are non-negative.  Only in those cases where the full "if ()'s" pass do we set shape_is_const = true; and proceed.  The purpose of the auxiliary bool shape_is_const is to avoid repeating the lengthy if's again.  Only then should the above cited code segment get executed.

For shape->expr_type == EXPR_ARRAY there is really no change in logic.  For shape->expr_type == EXPR_VARIABLE the above snippet is now executed, but then we already had

else if (shape->expr_type == EXPR_VARIABLE
	 && shape->ref
	 && shape->ref->u.ar.type == AR_FULL
	 && shape->ref->u.ar.dimen == 1
	 && shape->ref->u.ar.as
	 && shape->ref->u.ar.as->lower[0]->expr_type == EXPR_CONSTANT
	 && shape->ref->u.ar.as->lower[0]->ts.type == BT_INTEGER
	 && shape->ref->u.ar.as->upper[0]->expr_type == EXPR_CONSTANT
	 && shape->ref->u.ar.as->upper[0]->ts.type == BT_INTEGER
	 && shape->symtree->n.sym->attr.flavor == FL_PARAMETER
	 && shape->symtree->n.sym->value)

In which situations do I miss anything new?

Actually, this only supports the case where the parameter value is defined by an array; but it could be an intrinsic call, a sum of parameters, a reference to an other parameter, etc.

E.g. the following (still) does get rejected:

print *, reshape([1,2,3,4,5], a+1)
print *, reshape([1,2,3,4,5], a+a)
print *, reshape([1,2,3,4,5], 2*a)
print *, reshape([1,2,3,4,5], [3,3])
print *, reshape([1,2,3,4,5], spread(3,dim=1,ncopies=2))

and has been rejected before.

The usual way to handle this is to call gfc_reduce_init_expr which (pray for it) will make an array out of whatever the shape expression is.

Can you give an example where it fails?  I think the current code would almost certainly fail, too.

The rest looks good.  In the test, can you add a comment telling what it is testing?  Something like: "This tests that constant shape expressions passed to the reshape intrinsic are properly simplified before being used to diagnose invalid values"

Can do.

We also used to put a comment mentioning the person who submitted the test, but not everybody seems to do it these days.

Can do.

Mikael

Harald
[committed] libstdc++: Do not use memset in constexpr calls to ranges::fill_n [PR101608]
Tested x86_64-linux, pushed to trunk.

libstdc++-v3/ChangeLog:

	PR libstdc++/101608
	* include/bits/ranges_algobase.h (__fill_n_fn): Check for
	constant evaluation before using memset.
	* testsuite/25_algorithms/fill_n/constrained.cc: Check
	byte-sized values as well.
---
 libstdc++-v3/include/bits/ranges_algobase.h | 28 ---
 .../25_algorithms/fill_n/constrained.cc     |  6 ++--
 2 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/libstdc++-v3/include/bits/ranges_algobase.h b/libstdc++-v3/include/bits/ranges_algobase.h
index c8c4d032983..9929e5e828b 100644
--- a/libstdc++-v3/include/bits/ranges_algobase.h
+++ b/libstdc++-v3/include/bits/ranges_algobase.h
@@ -527,17 +527,25 @@ namespace ranges
       if (__n <= 0)
	 return __first;

-      // TODO: Generalize this optimization to contiguous iterators.
-      if constexpr (is_pointer_v<_Out>
-		    // Note that __is_byte already implies !is_volatile.
-		    && __is_byte<remove_pointer_t<_Out>>::__value
-		    && integral<_Tp>)
-	{
-	  __builtin_memset(__first, static_cast<unsigned char>(__value), __n);
-	  return __first + __n;
-	}
-      else if constexpr (is_scalar_v<_Tp>)
+      if constexpr (is_scalar_v<_Tp>)
	 {
+	  // TODO: Generalize this optimization to contiguous iterators.
+	  if constexpr (is_pointer_v<_Out>
+			// Note that __is_byte already implies !is_volatile.
+			&& __is_byte<remove_pointer_t<_Out>>::__value
+			&& integral<_Tp>)
+	    {
+#ifdef __cpp_lib_is_constant_evaluated
+	      if (!std::is_constant_evaluated())
+#endif
+		{
+		  __builtin_memset(__first,
+				   static_cast<unsigned char>(__value),
+				   __n);
+		  return __first + __n;
+		}
+	    }
+
	  const auto __tmp = __value;
	  for (; __n > 0; --__n, (void)++__first)
	    *__first = __tmp;

diff --git a/libstdc++-v3/testsuite/25_algorithms/fill_n/constrained.cc b/libstdc++-v3/testsuite/25_algorithms/fill_n/constrained.cc
index 6a015d34a89..1d1e1c104d4 100644
--- a/libstdc++-v3/testsuite/25_algorithms/fill_n/constrained.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/fill_n/constrained.cc
@@ -73,11 +73,12 @@ test01()
     }
 }

+template<typename T>
 constexpr bool
 test02()
 {
   bool ok = true;
-  int x[6] = { 1, 2, 3, 4, 5, 6 };
+  T x[6] = { 1, 2, 3, 4, 5, 6 };
   const int y[6] = { 1, 2, 3, 4, 5, 6 };
   const int z[6] = { 17, 17, 17, 4, 5, 6 };

@@ -94,5 +95,6 @@ int
 main()
 {
   test01();
-  static_assert(test02());
+  static_assert(test02<int>());
+  static_assert(test02<char>()); // PR libstdc++/101608
 }
--
2.31.1
[PATCH v2] elf: Add _dl_find_object function
I have reworded the previous patch to make the interface more generally useful.  Since there are now four words in the core arrays, I did away with the separate base address array.  (We can bring it back in the future if necessary.)  I fixed a bug in the handling of proxy maps (by not copying proxy maps during the dlopen update).  The placement of the function is also different, as explained in the commit message.  The performance seems unchanged.

I haven't included the obvious future performance enhancements in this patch, and also did not update Arm's __gnu_Unwind_Find_exidx to use the new interface.  I think this work can be done in follow-up patches.

Thanks,
Florian

Subject: elf: Add _dl_find_object function

It can be used to speed up the libgcc unwinder, and the internal _dl_find_dso_for_object function (which is used for caller identification in dlopen and related functions, and in dladdr).

_dl_find_object is in the internal namespace due to bug 28503.  If libgcc switches to _dl_find_object, this namespace issue will be fixed.  It is located in libc for two reasons: it is necessary to forward the call to the static libc after static dlopen, and there is a link ordering issue with -static-libgcc and libgcc_eh.a because libc.so is not a linker script that includes ld.so in the glibc build tree (so that GCC's internal -lc after libgcc_eh.a does not pick up ld.so).

It is necessary to do the i386 customization in the sysdeps/x86/bits/dl_find_object.h header shared with x86-64 because otherwise, multilib installations are broken.

The implementation uses software transactional memory, as suggested by Torvald Riegel.  Two copies of the supporting data structures are used, also achieving full async-signal-safety.

---
 NEWS | 4 +
 bits/dl_find_object.h | 32 +
 dlfcn/Makefile | 2 +-
 dlfcn/dlfcn.h | 22 +
 elf/Makefile | 47 +-
 elf/Versions | 3 +
 elf/dl-close.c | 4 +
 elf/dl-find_object.c | 841 +
 elf/dl-find_object.h | 115 +++
 elf/dl-libc_freeres.c | 2 +
 elf/dl-open.c | 5 +
 elf/dl-support.c | 3 +
 elf/libc-dl_find_object.c | 26 +
 elf/rtld.c | 11 +
 elf/rtld_static_init.c | 1 +
 elf/tst-dl_find_object-mod1.c | 10 +
 elf/tst-dl_find_object-mod2.c | 15 +
 elf/tst-dl_find_object-mod3.c | 10 +
 elf/tst-dl_find_object-mod4.c | 10 +
 elf/tst-dl_find_object-mod5.c | 11 +
 elf/tst-dl_find_object-mod6.c | 11 +
 elf/tst-dl_find_object-mod7.c | 10 +
 elf/tst-dl_find_object-mod8.c | 10 +
 elf/tst-dl_find_object-mod9.c | 10 +
 elf/tst-dl_find_object-static.c | 22 +
 elf/tst-dl_find_object-threads.c | 275 +++
 elf/tst-dl_find_object.c | 240 ++
 include/atomic_wide_counter.h | 14 +
 include/bits/dl_find_object.h | 1 +
 include/dlfcn.h | 2 +
 include/link.h | 3 +
 manual/Makefile | 2 +-
 manual/dynlink.texi | 137
 manual/libdl.texi | 10 -
 manual/probes.texi | 2 +-
 manual/threads.texi | 2 +-
 sysdeps/arm/bits/dl_find_object.h | 25 +
 sysdeps/generic/ldsodefs.h | 5 +
 sysdeps/mach/hurd/i386/libc.abilist | 1 +
 sysdeps/nios2/bits/dl_find_object.h | 25 +
 sysdeps/unix/sysv/linux/aarch64/libc.abilist | 1 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist | 1 +
 sysdeps/unix/sysv/linux/arc/libc.abilist | 1 +
 sysdeps/unix/sysv/linux/arm/be/libc.abilist | 1 +
 sysdeps/unix/sysv/linux/arm/le/libc.abilist | 1 +
 sysdeps/unix/sysv/linux/csky/libc.abilist | 1 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist | 1 +
 sysdeps/unix/sysv/linux/i386/libc.abilist | 1 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist | 1 +
 sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist | 1 +
 sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist | 1 +
 sysdeps/unix/sysv/linux/microblaze/be/libc.abilist | 1 +
 sysdeps/unix/sysv/linux/microblaze/le/libc.abilist | 1 +
 .../unix/sysv/linux/mips/m
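A minimal sketch of calling the new interface (the dlfo_* field names follow the glibc API this patch introduces; error handling and async-signal-safety concerns are omitted here).  _dl_find_object returns 0 on success:

#include <dlfcn.h>
#include <stdio.h>

static void
describe_pc (void *pc)
{
  struct dl_find_object dlfo;
  if (_dl_find_object (pc, &dlfo) == 0)
    printf ("object [%p, %p), eh_frame=%p\n",
	    dlfo.dlfo_map_start, dlfo.dlfo_map_end, dlfo.dlfo_eh_frame);
  else
    puts ("no object contains this address");
}

Unlike dl_iterate_phdr, this lookup does not take the loader lock, which is the basis for the unwinder speedup discussed in the follow-up libgcc patch.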
Re: [PATCH 4/4] libgcc: Use _dl_find_eh_frame in _Unwind_Find_FDE
* Jakub Jelinek:

>> +/* Fallback declaration for old glibc headers.  DL_FIND_EH_FRAME_DBASE is used
>> +   as a proxy to determine if <link.h> declares _dl_find_eh_frame.  */
>> +#if defined __GLIBC__ && !defined DL_FIND_EH_FRAME_DBASE
>> +#if NEED_DBASE_MEMBER
>> +void *_dl_find_eh_frame (void *__pc, void **__dbase) __attribute__ ((weak));
>> +#else
>> +void *_dl_find_eh_frame (void *__pc) __attribute__ ((weak));
>> +#endif
>> +#define USE_DL_FIND_EH_FRAME 1
>> +#define DL_FIND_EH_FRAME_CONDITION (_dl_find_eh_frame != NULL)
>> +#endif
>
> I'd prefer not to do this.  If we find glibc with the support in the
> headers, let's use it, otherwise let's keep using what we were doing before.

I've included a simplified version below, based on the _dl_find_object patch for glibc.

This is a bit difficult to test, but I ran a full toolchain bootstrap with GCC + glibc on all glibc-supported architectures (except Hurd and one m68k variant; they do not presently build, see Joseph's testers).  I also tested this by copying the respective GCC-built libgcc_s into a glibc build tree for run-time testing on i686-linux-gnu and x86_64-linux-gnu.  There weren't any issues.  There are a bunch of unwinder tests in glibc, giving at least some coverage.

Thanks,
Florian

Subject: libgcc: Use _dl_find_object in _Unwind_Find_FDE

libgcc/ChangeLog:

	* unwind-dw2-fde-dip.c (_Unwind_Find_FDE): Call _dl_find_object
	if available.

---
 libgcc/unwind-dw2-fde-dip.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/libgcc/unwind-dw2-fde-dip.c b/libgcc/unwind-dw2-fde-dip.c
index fbb0fbdebb9..b837d8e4904 100644
--- a/libgcc/unwind-dw2-fde-dip.c
+++ b/libgcc/unwind-dw2-fde-dip.c
@@ -504,6 +504,24 @@ _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
   if (ret != NULL)
     return ret;

+  /* Use DLFO_STRUCT_HAS_EH_DBASE as a proxy for the existence of a glibc-style
+     _dl_find_object function.  */
+#ifdef DLFO_STRUCT_HAS_EH_DBASE
+  {
+    struct dl_find_object dlfo;
+    if (_dl_find_object (pc, &dlfo) == 0)
+      return find_fde_tail ((_Unwind_Ptr) pc, dlfo.dlfo_eh_frame,
+# if DLFO_STRUCT_HAS_EH_DBASE
+			    (_Unwind_Ptr) dlfo.dlfo_eh_dbase,
+# else
+			    NULL,
+# endif
+			    bases);
+    else
+      return NULL;
+  }
+#endif /* DLFO_STRUCT_HAS_EH_DBASE */
+
   data.pc = (_Unwind_Ptr) pc;
 #if NEED_DBASE_MEMBER
   data.dbase = NULL;
Re: [PATCH] PR fortran/103411 - ICE in gfc_conv_array_initializer, at fortran/trans-array.c:6377
Le 25/11/2021 à 21:03, Harald Anlauf a écrit :

Hi Mikael,

Am 25.11.21 um 17:46 schrieb Mikael Morin:

Hello,

Le 24/11/2021 à 22:32, Harald Anlauf via Fortran a écrit :

diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c
index 5a5aca10ebe..837eb0912c0 100644
--- a/gcc/fortran/check.c
+++ b/gcc/fortran/check.c
@@ -4866,10 +4868,17 @@ gfc_check_reshape (gfc_expr *source, gfc_expr *shape,
 {
   gfc_constructor *c;
   bool test;
+  gfc_constructor_base b;
+
+  if (shape->expr_type == EXPR_ARRAY)
+    b = shape->value.constructor;
+  else if (shape->expr_type == EXPR_VARIABLE)
+    b = shape->symtree->n.sym->value->value.constructor;

This misses a check that shape->symtree->n.sym->value is an array, so that it makes sense to access its constructor.

there are checks further above for the cases shape->expr_type == EXPR_ARRAY and for shape->expr_type == EXPR_VARIABLE which look at the elements of array shape to see if they are non-negative.  Only in those cases where the full "if ()'s" pass we set shape_is_const = true; and proceed.  The purpose of the auxiliary bool shape_is_const is to avoid repeating the lengthy if's again.  Only then the above cited code segment should get executed.

For shape->expr_type == EXPR_ARRAY there is really no change in logic.  For shape->expr_type == EXPR_VARIABLE the above snippet is now executed, but then we already had

else if (shape->expr_type == EXPR_VARIABLE
	 && shape->ref
	 && shape->ref->u.ar.type == AR_FULL
	 && shape->ref->u.ar.dimen == 1
	 && shape->ref->u.ar.as
	 && shape->ref->u.ar.as->lower[0]->expr_type == EXPR_CONSTANT
	 && shape->ref->u.ar.as->lower[0]->ts.type == BT_INTEGER
	 && shape->ref->u.ar.as->upper[0]->expr_type == EXPR_CONSTANT
	 && shape->ref->u.ar.as->upper[0]->ts.type == BT_INTEGER
	 && shape->symtree->n.sym->attr.flavor == FL_PARAMETER
	 && shape->symtree->n.sym->value)

In which situations do I miss anything new?

Yes, I agree with all of this.  My comment wasn’t about a check on shape->expr_type, but on shape->value->expr_type if shape->expr_type is a (parameter) variable.

Actually, this only supports the case where the parameter value is defined by an array; but it could be an intrinsic call, a sum of parameters, a reference to an other parameter, etc.

E.g. the following (still) does get rejected:

print *, reshape([1,2,3,4,5], a+1)
print *, reshape([1,2,3,4,5], a+a)
print *, reshape([1,2,3,4,5], 2*a)
print *, reshape([1,2,3,4,5], [3,3])
print *, reshape([1,2,3,4,5], spread(3,dim=1,ncopies=2))

and has been rejected before.

The usual way to handle this is to call gfc_reduce_init_expr which (pray for it) will make an array out of whatever the shape expression is.

Can you give an example where it fails?  I think the current code would almost certainly fail, too.

Probably, I was just trying to avoid followup bugs. ;-)

I have checked the following:

integer, parameter :: a(2) = [1,1]
integer, parameter :: b(2) = a + 1
print *, reshape([1,2,3,4], b)
end

and it doesn’t fail as I thought it would.  So yes, I was wrong; b has been expanded to an array before.

Can you add an assert or a comment saying that the parameter value has been expanded to a constant array?

Ok with that change.
Re: [PATCH v7] rtl: builtins: (not just) rs6000: Add builtins for fegetround, feclearexcept and feraiseexcept [PR94193]
Hi!

On Wed, Nov 24, 2021 at 08:48:47PM -0300, Raoni Fassina Firmino wrote:
> gcc/ChangeLog:
> 	* builtins.c (expand_builtin_fegetround): New function.
> 	(expand_builtin_feclear_feraise_except): New function.
> 	(expand_builtin): Add cases for BUILT_IN_FEGETROUND,
> 	BUILT_IN_FECLEAREXCEPT and BUILT_IN_FERAISEEXCEPT

Something is missing here (maybe just a full stop?)

> 	* config/rs6000/rs6000.md (fegetroundsi): New pattern.
> 	(feclearexceptsi): New Pattern.
> 	(feraiseexceptsi): New Pattern.
> 	* doc/extend.texi: Add a new introductory paragraph about the
> 	new builtins.

Pet peeve: please don't break lines early, we have only 72 columns per
line and we have many long symbol names.  Trying to make many lines very
short only results in everything looking very irregular, which is harder
to read.

> 	* doc/md.texi: (fegetround@var{m}): Document new optab.
> 	(feclearexcept@var{m}): Document new optab.
> 	(feraiseexcept@var{m}): Document new optab.
> 	* optabs.def (fegetround_optab): New optab.
> 	(feclearexcept_optab): New optab.
> 	(feraiseexcept_optab): New optab.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-1.c: New
> 	test.
> 	* gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-2.c: New
> 	test.
> 	* gcc.target/powerpc/builtin-fegetround.c: New test.

> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -6860,6 +6860,117 @@
>    [(set_attr "type" "fpload")
>     (set_attr "length" "8")
>     (set_attr "isa" "*,p8v,p8v")])
> +
> +;; int fegetround(void)
> +;;
> +;; This expansion for the C99 function only expands for compatible
> +;; target libcs.  Because it needs to return one of FE_DOWNWARD,
> +;; FE_TONEAREST, FE_TOWARDZERO or FE_UPWARD with the values as defined
> +;; by the target libc, and since they are free to
> +;; choose the values and the expand needs to know then beforehand,
> +;; this expand only expands for target libcs that it can handle the
> +;; values is knows.
> +;; Because of these restriction, this only expands on the desired
> +;; case and fallback to a call to libc on any otherwise.
> +(define_expand "fegetroundsi"

(This needs some wordsmithing.)

> +;; int feclearexcept(int excepts)
> +;;
> +;; This expansion for the C99 function only works when EXCEPTS is a
> +;; constant known at compile time and specifies any one of
> +;; FE_INEXACT, FE_DIVBYZERO, FE_UNDERFLOW and FE_OVERFLOW flags.
> +;; It doesn't handle values out of range, and always returns 0.

It FAILs the expansion if a parameter is bad?  Is this comment out of
date?

> +;; Note that FE_INVALID is unsupported because it maps to more than
> +;; one bit of the FPSCR register.

It could be implemented, now that you check for the libc used.  It is a
fixed part of the ABI :-)

> +;; The FE_* are defined in the targed libc, and since they are free to
> +;; choose the values and the expand needs to know then beforehand,

s/then/them/

> +;; this expand only expands for target libcs that it can handle the

(this expander)

> +;; values is knows.

s/is/it/

> +/* This testcase ensures that the builtins expand with the matching arguments
> + * or otherwise fallback gracefully to a function call, and don't ICE during
> + * compilation.
> + * "-fno-builtin" option is used to enable calls to libc implementation of the
> + * gcc builtins tested when not using __builtin_ prefix. */

Don't use leading * in comments, btw.  This is a testcase so anything
goes, but FYI :-)

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/builtin-fegetround.c
> +  int i, rounding, expected;
> +  const int rm[] = {FE_TONEAREST, FE_TOWARDZERO, FE_UPWARD, FE_DOWNWARD};
> +  for (i = 0; i < sizeof(rm); i++)

That should be sizeof rm / sizeof rm[0] ?  It accesses out of bounds as
it is.

Maybe test more values?  At least 0, but also combinations of these FE_
bits, and maybe even FE_INVALID?

With such changes the rs6000 parts are okay for trunk.  Thanks!

I looked at the generic changes as well, and they all look fine to me.


Segher
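As a footnote to the sizeof issue flagged above, the fixed iteration would look roughly like this (a sketch; the loop body and the __builtin_fegetround spelling are our assumptions based on the test's name, not the committed testcase):

#include <fenv.h>

void
check_rounding_modes (void)
{
  const int rm[] = { FE_TONEAREST, FE_TOWARDZERO, FE_UPWARD, FE_DOWNWARD };
  /* sizeof rm is the byte size of the array, so divide by the element
     size to get the element count.  */
  for (unsigned i = 0; i < sizeof rm / sizeof rm[0]; i++)
    {
      fesetround (rm[i]);
      if (__builtin_fegetround () != rm[i])
	__builtin_abort ();
    }
}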
Re: libstdc++: Make atomic::wait() const [PR102994]
On Wed, 24 Nov 2021 at 01:27, Thomas Rodgers wrote:
>
> const qualification was also missing in the free functions for
> wait/wait_explicit/notify_one/notify_all. Revised patch attached.

Please tweak the whitespace in the new test:

> +test1(const std::atomic &a, char*p)

The '&' should be on the type not the variable, and there should be a space before 'p':

> +test1(const std::atomic& a, char* p)

OK for trunk and gcc-11 with that tweak, thanks!
[PATCH, v2] PR fortran/103411 - ICE in gfc_conv_array_initializer, at fortran/trans-array.c:6377
Hi Mikael,

Am 25.11.21 um 22:02 schrieb Mikael Morin:

Le 25/11/2021 à 21:03, Harald Anlauf a écrit :

Hi Mikael,

Am 25.11.21 um 17:46 schrieb Mikael Morin:

Hello,

Le 24/11/2021 à 22:32, Harald Anlauf via Fortran a écrit :

diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c
index 5a5aca10ebe..837eb0912c0 100644
--- a/gcc/fortran/check.c
+++ b/gcc/fortran/check.c
@@ -4866,10 +4868,17 @@ gfc_check_reshape (gfc_expr *source, gfc_expr *shape,
 {
   gfc_constructor *c;
   bool test;
+  gfc_constructor_base b;
+
+  if (shape->expr_type == EXPR_ARRAY)
+    b = shape->value.constructor;
+  else if (shape->expr_type == EXPR_VARIABLE)
+    b = shape->symtree->n.sym->value->value.constructor;

This misses a check that shape->symtree->n.sym->value is an array, so that it makes sense to access its constructor.

there are checks further above for the cases shape->expr_type == EXPR_ARRAY and for shape->expr_type == EXPR_VARIABLE which look at the elements of array shape to see if they are non-negative.  Only in those cases where the full "if ()'s" pass we set shape_is_const = true; and proceed.  The purpose of the auxiliary bool shape_is_const is to avoid repeating the lengthy if's again.  Only then the above cited code segment should get executed.

For shape->expr_type == EXPR_ARRAY there is really no change in logic.  For shape->expr_type == EXPR_VARIABLE the above snippet is now executed, but then we already had

else if (shape->expr_type == EXPR_VARIABLE
	 && shape->ref
	 && shape->ref->u.ar.type == AR_FULL
	 && shape->ref->u.ar.dimen == 1
	 && shape->ref->u.ar.as
	 && shape->ref->u.ar.as->lower[0]->expr_type == EXPR_CONSTANT
	 && shape->ref->u.ar.as->lower[0]->ts.type == BT_INTEGER
	 && shape->ref->u.ar.as->upper[0]->expr_type == EXPR_CONSTANT
	 && shape->ref->u.ar.as->upper[0]->ts.type == BT_INTEGER
	 && shape->symtree->n.sym->attr.flavor == FL_PARAMETER
	 && shape->symtree->n.sym->value)

In which situations do I miss anything new?

Yes, I agree with all of this.  My comment wasn’t about a check on shape->expr_type, but on shape->value->expr_type if shape->expr_type is a (parameter) variable.

Actually, this only supports the case where the parameter value is defined by an array; but it could be an intrinsic call, a sum of parameters, a reference to an other parameter, etc.

E.g. the following (still) does get rejected:

print *, reshape([1,2,3,4,5], a+1)
print *, reshape([1,2,3,4,5], a+a)
print *, reshape([1,2,3,4,5], 2*a)
print *, reshape([1,2,3,4,5], [3,3])
print *, reshape([1,2,3,4,5], spread(3,dim=1,ncopies=2))

and has been rejected before.

The usual way to handle this is to call gfc_reduce_init_expr which (pray for it) will make an array out of whatever the shape expression is.

Can you give an example where it fails?  I think the current code would almost certainly fail, too.

Probably, I was just trying to avoid followup bugs. ;-)

I have checked the following:

integer, parameter :: a(2) = [1,1]
integer, parameter :: b(2) = a + 1
print *, reshape([1,2,3,4], b)
end

and it doesn’t fail as I thought it would.

well, that one is actually valid, since b=[2,2].

So yes, I was wrong; b has been expanded to an array before.

Motivated by your reasoning I tried gfc_reduce_init_expr.  That attempt failed miserably (many regressions), and I think it is not right.  Then I found that array sections posed a problem that wasn't detected before.  gfc_simplify_expr seemed to be a better choice that makes more sense for the present situations and seems to work here.  And it even detects many more invalid cases now than e.g. Intel ;-)

I've updated the patch and testcase accordingly.

Can you add an assert or a comment saying that the parameter value has been expanded to a constant array?

Ok with that change.

Given the above discussion, I'll give you another day or two to have a further look.  Otherwise Gerhard will... ;-)

Cheers,
Harald

From 56fd0d23ac0a5bda802e5cce3024b947e497555a Mon Sep 17 00:00:00 2001
From: Harald Anlauf
Date: Thu, 25 Nov 2021 22:39:44 +0100
Subject: [PATCH] Fortran: improve check of arguments to the RESHAPE intrinsic

gcc/fortran/ChangeLog:

	PR fortran/103411
	* check.c (gfc_check_reshape): Improve check of size of source
	array for the RESHAPE intrinsic against the given shape when pad
	is not given, and shape is a parameter.  Try other
	simplifications of shape.

gcc/testsuite/ChangeLog:

	PR fortran/103411
	* gfortran.dg/pr68153.f90: Adjust test to improved check.
	* gfortran.dg/reshape_7.f90: Likewise.
	* gfortran.dg/reshape_9.f90: New test.
---
 gcc/fortran/check.c                     | 22 +-
 gcc/testsuite/gfortran.dg/pr68153.f90   |  2 +-
 gcc/testsuite/gfortran.dg/reshape_7.f90 |  2 +-
 gcc/testsuite/gfortran.dg/reshape_9.f90 | 24 ++++
 4 files changed, 43 insertions(+), 7 deletions(-)
 create mode 100644 gcc/t
[PATCH] x86: Add -mmove-max=bits and -mstore-max=bits
Add -mmove-max=bits and -mstore-max=bits to enable 256-bit/512-bit move and store, independent of -mprefer-vector-width=bits:

1. Add X86_TUNE_AVX512_MOVE_BY_PIECES and X86_TUNE_AVX512_STORE_BY_PIECES, which are enabled for the Intel Sapphire Rapids processor.
2. Add -mmove-max=bits to set the maximum number of bits that can be moved from memory to memory efficiently.  The default value is derived from X86_TUNE_AVX512_MOVE_BY_PIECES, X86_TUNE_AVX256_MOVE_BY_PIECES and the preferred vector width.
3. Add -mstore-max=bits to set the maximum number of bits that can be stored to memory efficiently.  The default value is derived from X86_TUNE_AVX512_STORE_BY_PIECES, X86_TUNE_AVX256_STORE_BY_PIECES and the preferred vector width.

gcc/

	PR target/103269
	* config/i386/i386-expand.c (ix86_expand_builtin): Pass PVW_NONE
	and PVW_NONE to ix86_target_string.
	* config/i386/i386-options.c (ix86_target_string): Add arguments
	for move_max and store_max.
	(ix86_target_string::add_vector_width): New lambda.
	(ix86_debug_options): Pass ix86_move_max and ix86_store_max to
	ix86_target_string.
	(ix86_function_specific_print): Pass ptr->x_ix86_move_max and
	ptr->x_ix86_store_max to ix86_target_string.
	(ix86_valid_target_attribute_tree): Handle x_ix86_move_max and
	x_ix86_store_max.
	(ix86_option_override_internal): Set the default x_ix86_move_max
	and x_ix86_store_max.
	* config/i386/i386-options.h (ix86_target_string): Add
	prefer_vector_width and prefer_vector_width.
	* config/i386/i386.h (TARGET_AVX256_MOVE_BY_PIECES): Removed.
	(TARGET_AVX256_STORE_BY_PIECES): Likewise.
	(MOVE_MAX): Use 64 if ix86_move_max or ix86_store_max ==
	PVW_AVX512.  Use 32 if ix86_move_max or ix86_store_max >=
	PVW_AVX256.
	(STORE_MAX_PIECES): Use 64 if ix86_store_max == PVW_AVX512.
	Use 32 if ix86_store_max >= PVW_AVX256.
	* config/i386/i386.opt: Add -mmove-max=bits and -mstore-max=bits.
	* config/i386/x86-tune.def (X86_TUNE_AVX512_MOVE_BY_PIECES): New.
	(X86_TUNE_AVX512_STORE_BY_PIECES): Likewise.
	* doc/invoke.texi: Document -mmove-max=bits and -mstore-max=bits.

gcc/testsuite/

	PR target/103269
	* gcc.target/i386/pieces-memcpy-17.c: New test.
	* gcc.target/i386/pieces-memcpy-18.c: Likewise.
	* gcc.target/i386/pieces-memcpy-19.c: Likewise.
	* gcc.target/i386/pieces-memcpy-20.c: Likewise.
	* gcc.target/i386/pieces-memcpy-21.c: Likewise.
	* gcc.target/i386/pieces-memset-45.c: Likewise.
	* gcc.target/i386/pieces-memset-46.c: Likewise.
	* gcc.target/i386/pieces-memset-47.c: Likewise.
	* gcc.target/i386/pieces-memset-48.c: Likewise.
	* gcc.target/i386/pieces-memset-49.c: Likewise.
--- gcc/config/i386/i386-expand.c | 1 + gcc/config/i386/i386-options.c| 75 +-- gcc/config/i386/i386-options.h| 6 +- gcc/config/i386/i386.h| 18 ++--- gcc/config/i386/i386.opt | 8 ++ gcc/config/i386/x86-tune.def | 10 +++ gcc/doc/invoke.texi | 13 .../gcc.target/i386/pieces-memcpy-17.c| 16 .../gcc.target/i386/pieces-memcpy-18.c| 16 .../gcc.target/i386/pieces-memcpy-19.c| 16 .../gcc.target/i386/pieces-memcpy-20.c| 16 .../gcc.target/i386/pieces-memcpy-21.c| 16 .../gcc.target/i386/pieces-memset-45.c| 16 .../gcc.target/i386/pieces-memset-46.c| 17 + .../gcc.target/i386/pieces-memset-47.c| 17 + .../gcc.target/i386/pieces-memset-48.c| 17 + .../gcc.target/i386/pieces-memset-49.c| 16 17 files changed, 276 insertions(+), 18 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-17.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-18.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-19.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-20.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-21.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-45.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-46.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-47.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-48.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-49.c diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index 0d5d1a0e205..7e77ff56ddc 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -12295,6 +12295,7 @@ ix86_expand_builtin (tree exp, rtx target, rtx subtarget, char *opts = ix86_target_string (bisa, bisa2, 0, 0, NULL, NULL, (enum fpmath_unit) 0,
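A hypothetical illustration of the new options (our own example, not one of the new testcases).  Compiled with, e.g., -O2 -mmove-max=512 -mstore-max=512 and AVX-512 enabled (or simply -march=sapphirerapids, which enables the new tunes), this 64-byte copy can be expanded inline as a single 512-bit load/store pair instead of two 256-bit ones:

void
copy64 (char *dst, const char *src)
{
  __builtin_memcpy (dst, src, 64);
}

The point of separating these knobs from -mprefer-vector-width= is that a target can keep 256-bit vectorization defaults while still using 512-bit moves for memcpy/memset expansion.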
[committed] libstdc++: Make std::pointer_traits SFINAE-friendly [PR96416]
Tested x86_64-linux, pushed to trunk. This implements the resolution I'm proposing for LWG 3545, to avoid hard errors when using std::to_address for types that make pointer_traits ill-formed. Consistent with std::iterator_traits, instantiating std::pointer_traits for a non-pointer type will be well-formed, but give an empty type with no member types. This avoids the problematic cases for std::to_address. Additionally, the pointer_to member is now only declared when the element type is not cv void (and for C++20, when the function body would be well-formed). The rebind member was already SFINAE-friendly in our implementation. libstdc++-v3/ChangeLog: PR libstdc++/96416 * include/bits/ptr_traits.h (pointer_traits): Reimplement to be SFINAE-friendly (LWG 3545). * testsuite/20_util/pointer_traits/lwg3545.cc: New test. * testsuite/20_util/to_address/1_neg.cc: Adjust dg-error line. * testsuite/20_util/to_address/lwg3545.cc: New test. --- libstdc++-v3/include/bits/ptr_traits.h| 167 +- .../20_util/pointer_traits/lwg3545.cc | 120 + .../testsuite/20_util/to_address/1_neg.cc | 2 +- .../testsuite/20_util/to_address/lwg3545.cc | 12 ++ 4 files changed, 251 insertions(+), 50 deletions(-) create mode 100644 libstdc++-v3/testsuite/20_util/pointer_traits/lwg3545.cc create mode 100644 libstdc++-v3/testsuite/20_util/to_address/lwg3545.cc diff --git a/libstdc++-v3/include/bits/ptr_traits.h b/libstdc++-v3/include/bits/ptr_traits.h index 115b86d43e4..4987fa9942f 100644 --- a/libstdc++-v3/include/bits/ptr_traits.h +++ b/libstdc++-v3/include/bits/ptr_traits.h @@ -35,6 +35,7 @@ #include #if __cplusplus > 201703L +#include #define __cpp_lib_constexpr_memory 201811L namespace __gnu_debug { struct _Safe_iterator_base; } #endif @@ -45,55 +46,119 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION class __undefined; - // Given Template return T, otherwise invalid. + // For a specialization `SomeTemplate` the member `type` is T, + // otherwise `type` is `__undefined`. template struct __get_first_arg { using type = __undefined; }; - template class _Template, typename _Tp, + template class _SomeTemplate, typename _Tp, typename... _Types> -struct __get_first_arg<_Template<_Tp, _Types...>> +struct __get_first_arg<_SomeTemplate<_Tp, _Types...>> { using type = _Tp; }; - template -using __get_first_arg_t = typename __get_first_arg<_Tp>::type; - - // Given Template and U return Template, otherwise invalid. + // For a specialization `SomeTemplate` and a type `U` the member + // `type` is `SomeTemplate`, otherwise there is no member `type`. template struct __replace_first_arg { }; - template class _Template, typename _Up, + template class _SomeTemplate, typename _Up, typename _Tp, typename... _Types> -struct __replace_first_arg<_Template<_Tp, _Types...>, _Up> -{ using type = _Template<_Up, _Types...>; }; +struct __replace_first_arg<_SomeTemplate<_Tp, _Types...>, _Up> +{ using type = _SomeTemplate<_Up, _Types...>; }; - template -using __replace_first_arg_t = typename __replace_first_arg<_Tp, _Up>::type; - - template -using __make_not_void - = __conditional_t::value, __undefined, _Tp>; - - /** - * @brief Uniform interface to all pointer-like types - * @ingroup pointer_abstractions - */ +#if __cpp_concepts + // When concepts are supported detection of _Ptr::element_type is done + // by a requires-clause, so __ptr_traits_elem_t only needs to do this: template -struct pointer_traits +using __ptr_traits_elem_t = typename __get_first_arg<_Ptr>::type; +#else + // Detect the element type of a pointer-like type. 
+ template +struct __ptr_traits_elem : __get_first_arg<_Ptr> +{ }; + + // Use _Ptr::element_type if is a valid type. + template +struct __ptr_traits_elem<_Ptr, __void_t> +{ using type = typename _Ptr::element_type; }; + + template +using __ptr_traits_elem_t = typename __ptr_traits_elem<_Ptr>::type; +#endif + + // Define pointer_traits::pointer_to. + template::value> +struct __ptr_traits_ptr_to +{ + using pointer = _Ptr; + using element_type = _Elt; + + /** + * @brief Obtain a pointer to an object + * @param __r A reference to an object of type `element_type` + * @return `pointer::pointer_to(__e)` + * @pre `pointer::pointer_to(__e)` is a valid expression. + */ + static pointer + pointer_to(element_type& __e) +#if __cpp_lib_concepts + requires requires { + { pointer::pointer_to(__e) } -> convertible_to; + } +#endif + { return pointer::pointer_to(__e); } +}; + + // Do not define pointer_traits::pointer_to if element type is void. + template +struct __ptr_traits_ptr_to<_Ptr, _Elt, true> +{ }; + + // Partial specialization defining pointer_traits::pointer_to(T&). + template +
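A sketch of the user-visible effect (C++20; the concept name here is our own, not part of the library): with the LWG 3545 change, std::pointer_traits of a non-pointer-like type is a valid but empty specialization, so a constraint can probe it without triggering a hard error.

#include <memory>

struct not_pointer_like { };

template<typename P>
  concept has_element_type
    = requires { typename std::pointer_traits<P>::element_type; };

// Previously, instantiating pointer_traits<not_pointer_like> was a
// hard error; now the specialization is simply empty.
static_assert( !has_element_type<not_pointer_like> );
static_assert( has_element_type<int*> );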
[committed] libstdc++: Remove dg-error that no longer happens
Tested x86_64-linux, pushed to trunk.

There was a c++11_only dg-error in this testcase, for a "body of constexpr function is not a return statement" diagnostic that was bogus, but happened because the return statement was ill-formed.  A change to G++ earlier this month means that diagnostic is no longer emitted, so remove the dg-error.

libstdc++-v3/ChangeLog:

	* testsuite/20_util/tuple/comparison_operators/overloaded2.cc:
	Remove dg-error for C++11_only error.
---
 .../testsuite/20_util/tuple/comparison_operators/overloaded2.cc | 1 -
 1 file changed, 1 deletion(-)

diff --git a/libstdc++-v3/testsuite/20_util/tuple/comparison_operators/overloaded2.cc b/libstdc++-v3/testsuite/20_util/tuple/comparison_operators/overloaded2.cc
index bac16ffd521..6a7a584c71e 100644
--- a/libstdc++-v3/testsuite/20_util/tuple/comparison_operators/overloaded2.cc
+++ b/libstdc++-v3/testsuite/20_util/tuple/comparison_operators/overloaded2.cc
@@ -52,4 +52,3 @@ auto b = a < a;
 // { dg-error "no match for 'operator<'" "" { target c++20 } 0 }
 // { dg-error "no match for .*_Synth3way|in requirements" "" { target c++20 } 0 }
 // { dg-error "ordered comparison" "" { target c++17_down } 0 }
-// { dg-error "not a return-statement" "" { target c++11_only } 0 }
--
2.31.1
[r12-5531 Regression] FAIL: gcc.dg/ipa/inline-9.c scan-ipa-dump inline "Inlined 1 calls" on Linux/x86_64
On Linux/x86_64, 1b0acc4b800b589a39d637d7312da5cf969a5765 is the first bad commit

commit 1b0acc4b800b589a39d637d7312da5cf969a5765
Author: Jan Hubicka
Date:   Thu Nov 25 23:58:48 2021 +0100

    Remove forgotten early return in ipa_value_range_from_jfunc

caused

FAIL: gcc.dg/ipa/inline-9.c scan-ipa-dump inline "Inlined 1 calls"

with GCC configured with

../../gcc/configure --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-5531/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="ipa.exp=gcc.dg/ipa/inline-9.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="ipa.exp=gcc.dg/ipa/inline-9.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="ipa.exp=gcc.dg/ipa/inline-9.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="ipa.exp=gcc.dg/ipa/inline-9.c --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me at skpgkp2 at gmail dot com)
[PATCH v3 0/8] __builtin_dynamic_object_size
This patchset implements the __builtin_dynamic_object_size builtin for gcc.  The primary motivation to have this builtin in gcc is to enable _FORTIFY_SOURCE=3 support with gcc, thus allowing greater fortification in use cases where the potential performance tradeoff is acceptable.

Semantics:
----------

__builtin_dynamic_object_size has the same signature as __builtin_object_size; it accepts a pointer and type ranging from 0 to 3 and it returns an object size estimate for the pointer based on an analysis of which objects the pointer could point to.  The actual properties of the object size estimate are different:

- In the best case __builtin_dynamic_object_size evaluates to an expression that represents a precise size of the object being pointed to.

- In case a precise object size expression cannot be evaluated, __builtin_dynamic_object_size attempts to evaluate an estimate size expression based on the object size type.

- In what situations the builtin returns an estimate vs a precise expression is an implementation detail and may change in future.  Users must always assume, as in the case of __builtin_object_size, that the returned value is the maximum or minimum based on the object size type they have provided.

- In the worst case of failure, __builtin_dynamic_object_size returns a constant (size_t)-1 or (size_t)0.

Implementation:
---------------

- The __builtin_dynamic_object_size support is implemented in tree-object-size.  In most cases, in the first pass (early_objsz) the builtin is treated like __builtin_object_size to preserve subobject bounds.

- Each element of the object_sizes vector is now a TREE_VEC of size 2 holding bytes to the end of the object and the full size of the object.  This allows proper handling of negative offsets, allowing them to the extent of the whole object bounds.  This improves __builtin_object_size usage too with negative offsets, consistently returning valid results for pointer decrementing loops too.

- The patchset begins with structural modification of the tree-object-size pass, followed by enhancement to return size expressions.  I have split the implementation into one feature per patch (calls, function parameters, PHI, etc.) to hopefully ease review.

Performance:
------------

Expressions generated by this pass in theory could be arbitrarily complex.  I have not made an attempt to limit nesting of objects since it seemed too early to do that.  In practice based on the few applications I built, most of the complexity of the expressions got folded away.  Even so, the performance overhead is likely to be non-zero.  If we find performance degradation to be significant, we could later add nesting limits to bail out if a size expression gets too complex.

I have implemented simplification of __*_chk to their normal variants if we can determine at compile time that it is safe.  This should limit the performance overhead of the expressions in valid cases.

Build time performance doesn't seem to be affected much based on an unscientific check to time `make check-gcc RUNTESTFLAGS="dg.exp=builtin*"`.  It only increases by about a couple of seconds when the dynamic tests are added and remains more or less in the same ballpark otherwise.

Testing:
--------

I have added tests for dynamic object sizes as well as wrappers for all __builtin_object_size tests to provide wide coverage.  I have also done a full bootstrap build and test run on x86_64.
I have also built bash, cmake, wpa_supplicant and systemtap with _FORTIFY_SOURCE=2 and _FORTIFY_SOURCE=3 (with a hacked up glibc to make sure it works) and saw no issues in any of those builds.  I did some rudimentary analysis of the generated binaries using fortify-metrics[1] to confirm that there was a difference in coverage between the two fortification levels.  Here is a summary of coverage in the above packages:

F = number of fortified calls
T = Total number of calls to fortifiable functions (fortified as well as unfortified)
C = F * 100 / T

Package         F(2)   T(2)   F(3)   T(3)   C(2)     C(3)
bash             428   1220   1005   1196   35.08%   84.03%
wpa_supplicant  1635   3232   2350   3408   50.59%   68.96%
systemtap        324   1990    343   1994   16.28%   17.20%
cmake            830  14181    958  14196    5.85%    6.75%

The numbers are slightly lower than the previous patch series because in the interim I pushed an improvement to folding of the _chk builtins so that they can use ranges to simplify the calls to their regular variants.  Also note that even _FORTIFY_SOURCE=2 coverage should be improved due to negative offset handling.

Additional testing plans (i.e. I've already started to do some of this):

- Build packages to compare values returned by __builtin_object_size with the older pass and this new one.  Also compare with __builtin_dynamic_object_size.
- Expand the list of packages to get more coverage metrics.
- Explore performance impact on ap
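A minimal illustration (our own example, not from the testsuite) of the semantics described above: for a dynamically sized allocation the builtin evaluates to an expression in terms of n, where __builtin_object_size could only return (size_t)-1.

#include <stdlib.h>

size_t
allocated_size (size_t n)
{
  char *p = (char *) malloc (n);
  size_t sz = __builtin_dynamic_object_size (p, 0);  /* Folds to n.  */
  free (p);
  return sz;
}

This is what lets _FORTIFY_SOURCE=3 check buffers whose size is only known at run time.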
[PATCH v3 1/8] tree-object-size: Replace magic numbers with enums
A simple cleanup to allow inserting dynamic size code more easily. gcc/ChangeLog: * tree-object-size.c: New enum. (object_sizes, computed, addr_object_size, compute_builtin_object_size, expr_object_size, call_object_size, merge_object_sizes, plus_stmt_object_size, collect_object_sizes_for, init_object_sizes, fini_object_sizes, object_sizes_execute): Replace magic numbers with enums. Signed-off-by: Siddhesh Poyarekar --- Changes from v2: - Incorporated review suggestions. gcc/tree-object-size.c | 59 -- 1 file changed, 34 insertions(+), 25 deletions(-) diff --git a/gcc/tree-object-size.c b/gcc/tree-object-size.c index 4334e05ef70..5e93bb74f92 100644 --- a/gcc/tree-object-size.c +++ b/gcc/tree-object-size.c @@ -45,6 +45,13 @@ struct object_size_info unsigned int *stack, *tos; }; +enum +{ + OST_SUBOBJECT = 1, + OST_MINIMUM = 2, + OST_END = 4, +}; + static tree compute_object_offset (const_tree, const_tree); static bool addr_object_size (struct object_size_info *, const_tree, int, unsigned HOST_WIDE_INT *); @@ -67,10 +74,10 @@ static void check_for_plus_in_loops_1 (struct object_size_info *, tree, the subobject (innermost array or field with address taken). object_sizes[2] is lower bound for number of bytes till the end of the object and object_sizes[3] lower bound for subobject. */ -static vec object_sizes[4]; +static vec object_sizes[OST_END]; /* Bitmaps what object sizes have been computed already. */ -static bitmap computed[4]; +static bitmap computed[OST_END]; /* Maximum value of offset we consider to be addition. */ static unsigned HOST_WIDE_INT offset_limit; @@ -227,11 +234,11 @@ addr_object_size (struct object_size_info *osi, const_tree ptr, { unsigned HOST_WIDE_INT sz; - if (!osi || (object_size_type & 1) != 0 + if (!osi || (object_size_type & OST_SUBOBJECT) != 0 || TREE_CODE (TREE_OPERAND (pt_var, 0)) != SSA_NAME) { compute_builtin_object_size (TREE_OPERAND (pt_var, 0), - object_size_type & ~1, &sz); + object_size_type & ~OST_SUBOBJECT, &sz); } else { @@ -266,7 +273,7 @@ addr_object_size (struct object_size_info *osi, const_tree ptr, } else if (DECL_P (pt_var)) { - pt_var_size = decl_init_size (pt_var, object_size_type & 2); + pt_var_size = decl_init_size (pt_var, object_size_type & OST_MINIMUM); if (!pt_var_size) return false; } @@ -287,7 +294,7 @@ addr_object_size (struct object_size_info *osi, const_tree ptr, { tree var; - if (object_size_type & 1) + if (object_size_type & OST_SUBOBJECT) { var = TREE_OPERAND (ptr, 0); @@ -528,7 +535,7 @@ bool compute_builtin_object_size (tree ptr, int object_size_type, unsigned HOST_WIDE_INT *psize) { - gcc_assert (object_size_type >= 0 && object_size_type <= 3); + gcc_assert (object_size_type >= 0 && object_size_type < OST_END); /* Set to unknown and overwrite just before returning if the size could be determined. */ @@ -546,7 +553,7 @@ compute_builtin_object_size (tree ptr, int object_size_type, if (computed[object_size_type] == NULL) { - if (optimize || object_size_type & 1) + if (optimize || object_size_type & OST_SUBOBJECT) return false; /* When not optimizing, rather than failing, make a small effort @@ -586,8 +593,8 @@ compute_builtin_object_size (tree ptr, int object_size_type, if (dump_file) { fprintf (dump_file, "Computing %s %sobject size for ", - (object_size_type & 2) ? "minimum" : "maximum", - (object_size_type & 1) ? "sub" : ""); + (object_size_type & OST_MINIMUM) ? "minimum" : "maximum", + (object_size_type & OST_SUBOBJECT) ? 
"sub" : ""); print_generic_expr (dump_file, ptr, dump_flags); fprintf (dump_file, ":\n"); } @@ -620,7 +627,7 @@ compute_builtin_object_size (tree ptr, int object_size_type, terminate, it could take a long time. If a pointer is increasing this way, we need to assume 0 object size. E.g. p = &buf[0]; while (cond) p = p + 4; */ - if (object_size_type & 2) + if (object_size_type & OST_MINIMUM) { osi.depths = XCNEWVEC (unsigned int, num_ssa_names); osi.stack = XNEWVEC (unsigned int, num_ssa_names); @@ -679,8 +686,9 @@ compute_builtin_object_size (tree ptr, int object_size_type, fprintf (dump_file, ": %s %sobject size " HOST_WIDE_INT_PRINT_UNSIGNED "\n", -(object_size_type & 2) ? "minimum" : "maximum", -(object_size_type & 1) ? "sub" :
[PATCH v3 2/8] tree-object-size: Abstract object_sizes array
Put all accesses to object_sizes behind functions so that we can add dynamic capability more easily. gcc/ChangeLog: * tree-object-size.c (object_sizes_grow, object_sizes_release, object_sizes_unknown_p, object_sizes_get, object_size_set_force, object_sizes_set): New functions. (addr_object_size, compute_builtin_object_size, expr_object_size, call_object_size, unknown_object_size, merge_object_sizes, plus_stmt_object_size, cond_expr_object_size, collect_object_sizes_for, check_for_plus_in_loops_1, init_object_sizes, fini_object_sizes): Adjust. Signed-off-by: Siddhesh Poyarekar --- Changes from v2: - Incorporated review suggestions. gcc/tree-object-size.c | 177 +++-- 1 file changed, 98 insertions(+), 79 deletions(-) diff --git a/gcc/tree-object-size.c b/gcc/tree-object-size.c index 5e93bb74f92..3780437ff91 100644 --- a/gcc/tree-object-size.c +++ b/gcc/tree-object-size.c @@ -88,6 +88,71 @@ unknown (int object_size_type) return ((unsigned HOST_WIDE_INT) -((object_size_type >> 1) ^ 1)); } +/* Grow object_sizes[OBJECT_SIZE_TYPE] to num_ssa_names. */ + +static inline void +object_sizes_grow (int object_size_type) +{ + if (num_ssa_names > object_sizes[object_size_type].length ()) +object_sizes[object_size_type].safe_grow (num_ssa_names, true); +} + +/* Release object_sizes[OBJECT_SIZE_TYPE]. */ + +static inline void +object_sizes_release (int object_size_type) +{ + object_sizes[object_size_type].release (); +} + +/* Return true if object_sizes[OBJECT_SIZE_TYPE][VARNO] is unknown. */ + +static inline bool +object_sizes_unknown_p (int object_size_type, unsigned varno) +{ + return (object_sizes[object_size_type][varno] + == unknown (object_size_type)); +} + +/* Return size for VARNO corresponding to OSI. */ + +static inline unsigned HOST_WIDE_INT +object_sizes_get (struct object_size_info *osi, unsigned varno) +{ + return object_sizes[osi->object_size_type][varno]; +} + +/* Set size for VARNO corresponding to OSI to VAL. */ + +static inline bool +object_sizes_set_force (struct object_size_info *osi, unsigned varno, + unsigned HOST_WIDE_INT val) +{ + object_sizes[osi->object_size_type][varno] = val; + return true; +} + +/* Set size for VARNO corresponding to OSI to VAL if it is the new minimum or + maximum. */ + +static inline bool +object_sizes_set (struct object_size_info *osi, unsigned varno, + unsigned HOST_WIDE_INT val) +{ + int object_size_type = osi->object_size_type; + if ((object_size_type & OST_MINIMUM) == 0) +{ + if (object_sizes[object_size_type][varno] < val) + return object_sizes_set_force (osi, varno, val); +} + else +{ + if (object_sizes[object_size_type][varno] > val) + return object_sizes_set_force (osi, varno, val); +} + return false; +} + /* Initialize OFFSET_LIMIT variable. 
*/ static void init_offset_limit (void) @@ -247,7 +312,7 @@ addr_object_size (struct object_size_info *osi, const_tree ptr, collect_object_sizes_for (osi, var); if (bitmap_bit_p (computed[object_size_type], SSA_NAME_VERSION (var))) - sz = object_sizes[object_size_type][SSA_NAME_VERSION (var)]; + sz = object_sizes_get (osi, SSA_NAME_VERSION (var)); else sz = unknown (object_size_type); } @@ -582,14 +647,14 @@ compute_builtin_object_size (tree ptr, int object_size_type, return false; } + struct object_size_info osi; + osi.object_size_type = object_size_type; if (!bitmap_bit_p (computed[object_size_type], SSA_NAME_VERSION (ptr))) { - struct object_size_info osi; bitmap_iterator bi; unsigned int i; - if (num_ssa_names > object_sizes[object_size_type].length ()) - object_sizes[object_size_type].safe_grow (num_ssa_names, true); + object_sizes_grow (object_size_type); if (dump_file) { fprintf (dump_file, "Computing %s %sobject size for ", @@ -601,7 +666,6 @@ compute_builtin_object_size (tree ptr, int object_size_type, osi.visited = BITMAP_ALLOC (NULL); osi.reexamine = BITMAP_ALLOC (NULL); - osi.object_size_type = object_size_type; osi.depths = NULL; osi.stack = NULL; osi.tos = NULL; @@ -678,8 +742,7 @@ compute_builtin_object_size (tree ptr, int object_size_type, if (dump_file) { EXECUTE_IF_SET_IN_BITMAP (osi.visited, 0, i, bi) - if (object_sizes[object_size_type][i] - != unknown (object_size_type)) + if (!object_sizes_unknown_p (object_size_type, i)) { print_generic_expr (dump_file, ssa_name (i), dump_flags); @@ -689,7 +752,7 @@ compute_builtin_object_size (tree ptr, int object_size_type, ((object_size_type & OST_MINIMUM) ? "minimum"
[PATCH v3 3/8] tree-object-size: Save sizes as trees and support negative offsets
Transform tree-object-size to operate on tree objects instead of host wide integers. This makes it easier to extend to dynamic expressions for object sizes. The compute_builtin_object_size interface also now returns a tree expression instead of HOST_WIDE_INT, so callers have been adjusted to account for that. The trees in object_sizes are each a TREE_VEC with the first element being the bytes from the pointer to the end of the object and the second, the size of the whole object. This allows analysis of negative offsets, which can now be allowed to the extent of the object bounds. Tests have been added to verify that it actually works. gcc/ChangeLog: * tree-object-size.h (compute_builtin_object_size): Return tree instead of HOST_WIDE_INT. * builtins.c (fold_builtin_object_size): Adjust. * gimple-fold.c (gimple_fold_builtin_strncat): Likewise. * ubsan.c (instrument_object_size): Likewise. * tree-object-size.c (object_sizes): Change type to vec. (initval): New function. (unknown): Use it. (size_unknown_p, size_initval, size_unknown): New functions. (object_sizes_unknown_p): Use it. (object_sizes_get): Return tree. (object_sizes_initialize): Rename from object_sizes_set_force and set VAL parameter type as tree. Add new parameter WHOLEVAL. (object_sizes_set): Set VAL parameter type as tree and adjust implementation. Add new parameter WHOLEVAL. (size_for_offset): New function. (decl_init_size): Adjust comment. (addr_object_size): Change PSIZE parameter to tree and adjust implementation. Add new parameter PWHOLESIZE. (alloc_object_size): Return tree. (compute_builtin_object_size): Return tree in PSIZE. (expr_object_size, call_object_size, unknown_object_size): Adjust for object_sizes_set change. (merge_object_sizes): Drop OFFSET parameter and adjust implementation for tree change. (plus_stmt_object_size): Call collect_object_sizes_for directly instead of merge_object_size and call size_for_offset to get net size. (cond_expr_object_size, collect_object_sizes_for, object_sizes_execute): Adjust for change of type from HOST_WIDE_INT to tree. (check_for_plus_in_loops_1): Likewise and skip non-positive offsets. gcc/testsuite/ChangeLog: * gcc.dg/builtin-object-size-1.c (test9): New test. (main): Call it. * gcc.dg/builtin-object-size-2.c (test8): New test. (main): Call it. * gcc.dg/builtin-object-size-3.c (test9): New test. (main): Call it. * gcc.dg/builtin-object-size-4.c (test8): New test. (main): Call it. * gcc.dg/builtin-object-size-5.c (test5, test6, test7): New tests. Signed-off-by: Siddhesh Poyarekar --- Changes from v2: - Incorporated review suggestions. - Added support for negative offsets. 
 gcc/builtins.c                               |  10 +-
 gcc/gimple-fold.c                            |  11 +-
 gcc/testsuite/gcc.dg/builtin-object-size-1.c |  30 ++
 gcc/testsuite/gcc.dg/builtin-object-size-2.c |  30 ++
 gcc/testsuite/gcc.dg/builtin-object-size-3.c |  31 ++
 gcc/testsuite/gcc.dg/builtin-object-size-4.c |  30 ++
 gcc/testsuite/gcc.dg/builtin-object-size-5.c |  25 ++
 gcc/tree-object-size.c                       | 388 +++++++++++++---------
 gcc/tree-object-size.h                       |   2 +-
 gcc/ubsan.c                                  |   5 +-
 10 files changed, 403 insertions(+), 159 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 384864bfb3a..50e66692775 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -10226,7 +10226,7 @@ maybe_emit_sprintf_chk_warning (tree exp, enum built_in_function fcode)
 static tree
 fold_builtin_object_size (tree ptr, tree ost)
 {
-  unsigned HOST_WIDE_INT bytes;
+  tree bytes;
   int object_size_type;
 
   if (!validate_arg (ptr, POINTER_TYPE)
@@ -10251,8 +10251,8 @@ fold_builtin_object_size (tree ptr, tree ost)
   if (TREE_CODE (ptr) == ADDR_EXPR)
     {
       compute_builtin_object_size (ptr, object_size_type, &bytes);
-      if (wi::fits_to_tree_p (bytes, size_type_node))
-	return build_int_cstu (size_type_node, bytes);
+      if (int_fits_type_p (bytes, size_type_node))
+	return fold_convert (size_type_node, bytes);
     }
   else if (TREE_CODE (ptr) == SSA_NAME)
     {
@@ -10260,8 +10260,8 @@ fold_builtin_object_size (tree ptr, tree ost)
	 later.  Maybe subsequent passes will help determining
	 it.  */
       if (compute_builtin_object_size (ptr, object_size_type, &bytes)
-	  && wi::fits_to_tree_p (bytes, size_type_node))
-	return build_int_cstu (size_type_node, bytes);
+	  && int_fits_type_p (bytes, size_type_node))
+	return fold_convert (size_type_node, bytes);
     }
 
   return NULL_TREE;
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 73f090b
[PATCH v3 4/8] __builtin_dynamic_object_size: Recognize builtin
Recognize the __builtin_dynamic_object_size builtin and add paths in
the object size pass to deal with it, but treat it like
__builtin_object_size for now.  Also add tests to provide the same
testing coverage for the new builtin name.

gcc/ChangeLog:

	* builtins.def (BUILT_IN_DYNAMIC_OBJECT_SIZE): New builtin.
	* tree-object-size.h: Move object size type bits enum from
	tree-object-size.c and add new value OST_DYNAMIC.
	* builtins.c (expand_builtin, fold_builtin_2): Handle it.
	(fold_builtin_object_size): Handle new builtin and adjust for
	change to compute_builtin_object_size.
	* tree-object-size.c: Include builtins.h.
	(compute_builtin_object_size): Adjust.
	(early_object_sizes_execute_one,
	dynamic_object_sizes_execute_one): New functions.
	(object_sizes_execute): Rename insert_min_max_p argument to
	early.  Handle BUILT_IN_DYNAMIC_OBJECT_SIZE and call the new
	functions.
	* doc/extend.texi (__builtin_dynamic_object_size): Document new
	builtin.

gcc/testsuite/ChangeLog:

	* g++.dg/ext/builtin-dynamic-object-size1.C: New test.
	* g++.dg/ext/builtin-dynamic-object-size2.C: Likewise.
	* gcc.dg/builtin-dynamic-alloc-size.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-1.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-10.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-11.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-12.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-13.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-14.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-15.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-16.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-17.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-18.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-19.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-2.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-3.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-4.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-5.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-6.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-7.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-8.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-9.c: Likewise.
	* gcc.dg/builtin-object-size-16.c: Adjust to allow inclusion
	from builtin-dynamic-object-size-16.c.
	* gcc.dg/builtin-object-size-17.c: Likewise.

Signed-off-by: Siddhesh Poyarekar
---
Changes from v2:
- Incorporated review suggestions.
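For readers unfamiliar with the new builtin, a minimal usage sketch
(not part of the patch; the function name is invented): at this point
in the series it folds exactly like __builtin_object_size, so with a
constant-size object only the spelling differs, and later patches make
non-constant sizes work.

char buf[32];

__SIZE_TYPE__
bdos_example (void)
{
  /* With a constant-size object, both builtins fold to 32.  */
  return __builtin_dynamic_object_size (buf, 0);
}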
 gcc/builtins.c                                |  11 +-
 gcc/builtins.def                              |   1 +
 gcc/doc/extend.texi                           |  13 ++
 .../g++.dg/ext/builtin-dynamic-object-size1.C |   5 +
 .../g++.dg/ext/builtin-dynamic-object-size2.C |   5 +
 .../gcc.dg/builtin-dynamic-alloc-size.c       |   7 +
 .../gcc.dg/builtin-dynamic-object-size-1.c    |   6 +
 .../gcc.dg/builtin-dynamic-object-size-10.c   |   9 ++
 .../gcc.dg/builtin-dynamic-object-size-11.c   |   7 +
 .../gcc.dg/builtin-dynamic-object-size-12.c   |   5 +
 .../gcc.dg/builtin-dynamic-object-size-13.c   |   5 +
 .../gcc.dg/builtin-dynamic-object-size-14.c   |   5 +
 .../gcc.dg/builtin-dynamic-object-size-15.c   |   5 +
 .../gcc.dg/builtin-dynamic-object-size-16.c   |   6 +
 .../gcc.dg/builtin-dynamic-object-size-17.c   |   7 +
 .../gcc.dg/builtin-dynamic-object-size-18.c   |   8 +
 .../gcc.dg/builtin-dynamic-object-size-19.c   | 104 ++++++++++++++++
 .../gcc.dg/builtin-dynamic-object-size-2.c    |   6 +
 .../gcc.dg/builtin-dynamic-object-size-3.c    |   6 +
 .../gcc.dg/builtin-dynamic-object-size-4.c    |   6 +
 .../gcc.dg/builtin-dynamic-object-size-5.c    |   7 +
 .../gcc.dg/builtin-dynamic-object-size-6.c    |   5 +
 .../gcc.dg/builtin-dynamic-object-size-7.c    |   5 +
 .../gcc.dg/builtin-dynamic-object-size-8.c    |   5 +
 .../gcc.dg/builtin-dynamic-object-size-9.c    |   5 +
 gcc/testsuite/gcc.dg/builtin-object-size-16.c |   2 +
 gcc/testsuite/gcc.dg/builtin-object-size-17.c |   2 +
 gcc/tree-object-size.c                        | 152 +++++++++++++---
 gcc/tree-object-size.h                        |  10 ++
 29 files changed, 378 insertions(+), 42 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/builtin-dynamic-object-size1.C
 create mode 100644 gcc/testsuite/g++.dg/ext/builtin-dynamic-object-size2.C
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-alloc-size.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-1.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-10.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-11.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-12.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-13.c
 create mode 100644 gcc/testsuite/gcc
[PATCH v3 6/8] tree-object-size: Handle function parameters
Handle hints provided by __attribute__ ((access (...))) to compute
dynamic sizes for objects.

gcc/ChangeLog:

	* tree-object-size.c: Include tree-dfa.h.
	(parm_object_size): New function.
	(collect_object_sizes_for): Call it.

gcc/testsuite/ChangeLog:

	* gcc.dg/builtin-dynamic-object-size-0.c (test_parmsz_simple):
	New function.
	(main): Call it.

Signed-off-by: Siddhesh Poyarekar
---
 .../gcc.dg/builtin-dynamic-object-size-0.c    | 11 +++++
 gcc/tree-object-size.c                        | 50 ++++++++++++++-
 2 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
index ddedf6a49bd..ce0f4eb17f3 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
@@ -46,6 +46,14 @@ test_deploop (size_t sz, size_t cond)
   return __builtin_dynamic_object_size (bin, 0);
 }
 
+size_t
+__attribute__ ((access (__read_write__, 1, 2)))
+__attribute__ ((noinline))
+test_parmsz_simple (void *obj, size_t sz)
+{
+  return __builtin_dynamic_object_size (obj, 0);
+}
+
 unsigned nfails = 0;
 
 #define FAIL() ({ \
@@ -64,6 +72,9 @@ main (int argc, char **argv)
     FAIL ();
   if (test_deploop (128, 129) != 32)
     FAIL ();
+  if (test_parmsz_simple (argv[0], __builtin_strlen (argv[0]) + 1)
+      != __builtin_strlen (argv[0]) + 1)
+    FAIL ();
 
   if (nfails > 0)
     __builtin_abort ();
diff --git a/gcc/tree-object-size.c b/gcc/tree-object-size.c
index 5b4dcb619cd..48b1ec6e26a 100644
--- a/gcc/tree-object-size.c
+++ b/gcc/tree-object-size.c
@@ -32,6 +32,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-fold.h"
 #include "gimple-iterator.h"
 #include "tree-cfg.h"
+#include "tree-dfa.h"
 #include "stringpool.h"
 #include "attribs.h"
 #include "builtins.h"
@@ -1446,6 +1447,53 @@ cond_expr_object_size (struct object_size_info *osi, tree var, gimple *stmt)
   return reexamine;
 }
 
+/* Find size of an object passed as a parameter to the function.  */
+
+static void
+parm_object_size (struct object_size_info *osi, tree var)
+{
+  int object_size_type = osi->object_size_type;
+  tree parm = SSA_NAME_VAR (var);
+
+  if (!(object_size_type & OST_DYNAMIC) || !POINTER_TYPE_P (TREE_TYPE (parm)))
+    expr_object_size (osi, var, parm);
+
+  /* Look for access attribute.  */
+  rdwr_map rdwr_idx;
+
+  tree fndecl = cfun->decl;
+  const attr_access *access = get_parm_access (rdwr_idx, parm, fndecl);
+  tree typesize = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (parm)));
+  tree sz = NULL_TREE;
+
+  if (access && access->sizarg != UINT_MAX)
+    {
+      tree fnargs = DECL_ARGUMENTS (fndecl);
+      tree arg = NULL_TREE;
+      unsigned argpos = 0;
+
+      /* Walk through the parameters to pick the size parameter and
	 safely scale it by the type size.  */
+      for (arg = fnargs; argpos != access->sizarg && arg;
	   arg = TREE_CHAIN (arg), ++argpos);
+
+      if (arg != NULL_TREE && INTEGRAL_TYPE_P (TREE_TYPE (arg)))
+	{
+	  sz = get_or_create_ssa_default_def (cfun, arg);
+	  if (sz != NULL_TREE)
+	    {
+	      sz = fold_convert (sizetype, sz);
+	      if (typesize)
+		sz = size_binop (MULT_EXPR, sz, typesize);
+	    }
+	}
+    }
+  if (!sz)
+    sz = size_unknown (object_size_type);
+
+  object_sizes_set (osi, SSA_NAME_VERSION (var), sz, sz);
+}
+
 /* Compute an object size expression for VAR, which is the result of a PHI
    node.  */
@@ -1603,7 +1651,7 @@ collect_object_sizes_for (struct object_size_info *osi, tree var)
     case GIMPLE_NOP:
       if (SSA_NAME_VAR (var)
	  && TREE_CODE (SSA_NAME_VAR (var)) == PARM_DECL)
-	expr_object_size (osi, var, SSA_NAME_VAR (var));
+	parm_object_size (osi, var);
       else
	/* Uninitialized SSA names point nowhere.  */
	unknown_object_size (osi, var);
-- 
2.31.1
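To illustrate the scaling logic in parm_object_size above, here is a
hedged variant of the test (not in the patch's testsuite; the function
name is invented): since typesize is TYPE_SIZE_UNIT of the pointee
type, a non-void pointer whose access attribute names an element-count
argument should report that count scaled by the element size.

__SIZE_TYPE__
__attribute__ ((access (__read_write__, 1, 2)))
__attribute__ ((noinline))
test_parmsz_scaled (int *obj, __SIZE_TYPE__ sz)
{
  /* For int *, typesize is sizeof (int), so this should evaluate to
     sz * sizeof (int); for void * (as in test_parmsz_simple), typesize
     is NULL_TREE and sz is used unscaled.  */
  return __builtin_dynamic_object_size (obj, 0);
}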