[Bug tree-optimization/98736] [10/11 Regression] Wrong partition order generated in loop distribution pass since r10-619-g5879ab5fafedc8f6
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98736 --- Comment #4 from bin cheng --- (In reply to bin cheng from comment #3) > hmm, seems topological order isn't enough for distributing a loop nest, we > need topological order plus inner loop depth-first. Well, not really. In this case, problem is that rev-post order algorithm puts "a[c] = d[3];" before the inner loop which violates the original program order. Seems that it can be fixed by inner loop depth-first order wrto how we distribute inner loop, but I am not sure if this always preserves programming order because loop has been reformed by various optimizers.
[Bug tree-optimization/95638] [10 Regression] Legit-looking code doesn't work with -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95638 bin cheng changed: What|Removed |Added Status|ASSIGNED|RESOLVED Known to work||10.2.0 Resolution|--- |FIXED --- Comment #15 from bin cheng --- Confirmed fixed for 10.2.0 also. Closing.
[Bug tree-optimization/98736] [10/11 Regression] Wrong partition order generated in loop distribution pass since r10-619-g5879ab5fafedc8f6
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98736 --- Comment #6 from bin cheng --- Shall this be backported to 10/11 later? Thanks.
[Bug c++/97627] [9/10/11 Regression] loop end condition missing - endless loop with -fPIC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97627 --- Comment #5 from bin cheng --- (In reply to Jakub Jelinek from comment #3) > Started with r9-4145-ga81e2c6240655f60a49c16e0d8bbfd2ba40bba51 Sorry for the breakage. Will fix this.
[Bug tree-optimization/78427] missed optimization of loop condition
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78427 bin cheng changed: What|Removed |Added CC||amker at gcc dot gnu.org --- Comment #5 from bin cheng --- (In reply to Antony Polukhin from comment #4) > Any progress? Oh, I missed this one. Will try to find time later. Thanks
[Bug tree-optimization/98598] Missed opportunity to optimize dependent loads in loops
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98598 --- Comment #4 from bin cheng --- Didn't go deep into the case. For simple cases taken as examples, it's possible to interchange the two loops thus enables loop invariant code motion. Though loop interchange may fail because of complicated data dependences, we may take some useful points from it, for example, the cost model checking new loop invariants wrto the outer loop.
[Bug c++/97627] [9/10/11 Regression] loop end condition missing - endless loop with -fPIC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97627 --- Comment #9 from bin cheng --- (In reply to Jakub Jelinek from comment #8) > Still broken on current 10 branch, as written works fine on the trunk due to > the C++ FE loop changes. > Bin, did you have time to look into this yet? I am very sorry, seems I have two correctness PRs now? Will try to investigate these on this WE.
[Bug c++/97627] [9/10/11 Regression] loop end condition missing - endless loop with -fPIC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97627 --- Comment #10 from bin cheng --- hmm, For below basic block: 128 ;; basic block 4, loop depth 2, maybe hot 129 ;;prev block 3, next block 9, flags: (NEW, VISITED) 130 ;;pred: 3 (FALLTHRU,EXECUTABLE) 131 ;;7 (FALLTHRU,DFS_BACK,EXECUTABLE) 132 # RANGE [0, 2147483647] NONZERO 2147483647 133 # c_5 = PHI <0(3), c_17(7)> 134 # .MEM_8 = PHI <.MEM_7(3), .MEM_9(7)> 135 if (_2 < c_5) 136 goto ; [INV] 137 else 138 goto ; [INV] 139 ;;succ: 8 (TRUE_VALUE,EXECUTABLE) 140 ;;9 (FALSE_VALUE,EXECUTABLE) Code in : 4276 4277 basic_block *body = get_loop_body (loop); 4278 exits = get_loop_exit_edges (loop, body); 4279 likely_exit = single_likely_exit (loop, exits); 4280 FOR_EACH_VEC_ELT (exits, i, ex) 4281 { 4282 if (ex == likely_exit) 4283 { 4284 gimple *stmt = last_stmt (ex->src); 4285 if (stmt != NULL) 4286 { gets three exit edges, one of which is bb1>, as a result, 0 niter is computed for this exit in function number_of_iterations_exit_assumptions. This seems strange, is it a fake edge added for some reason? Thanks
[Bug c++/97627] [9/10/11 Regression] loop end condition missing - endless loop with -fPIC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97627 --- Comment #11 from bin cheng --- (In reply to bin cheng from comment #10) > hmm, > For below basic block: > 128 ;; basic block 4, loop depth 2, maybe hot > 129 ;;prev block 3, next block 9, flags: (NEW, VISITED) > 130 ;;pred: 3 (FALLTHRU,EXECUTABLE) > 131 ;;7 (FALLTHRU,DFS_BACK,EXECUTABLE) > 132 # RANGE [0, 2147483647] NONZERO 2147483647 > 133 # c_5 = PHI <0(3), c_17(7)> > 134 # .MEM_8 = PHI <.MEM_7(3), .MEM_9(7)> > 135 if (_2 < c_5) > 136 goto ; [INV] > 137 else > 138 goto ; [INV] > 139 ;;succ: 8 (TRUE_VALUE,EXECUTABLE) > 140 ;;9 (FALSE_VALUE,EXECUTABLE) > > Code in : > 4276 > 4277 basic_block *body = get_loop_body (loop); > 4278 exits = get_loop_exit_edges (loop, body); > 4279 likely_exit = single_likely_exit (loop, exits); > 4280 FOR_EACH_VEC_ELT (exits, i, ex) > 4281 { > 4282 if (ex == likely_exit) > 4283 { > 4284 gimple *stmt = last_stmt (ex->src); > 4285 if (stmt != NULL) > 4286 { > > gets three exit edges, one of which is bb1>, as a result, 0 niter is > computed for this exit in function number_of_iterations_exit_assumptions. > This seems strange, is it a fake edge added for some reason? > > Thanks Right, it's added by connect_infinite_loops_to_exit.
[Bug c++/97627] [9/10/11 Regression] loop end condition missing - endless loop with -fPIC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97627 --- Comment #12 from bin cheng --- a. why the loop is considered as infinite b. we need to skip fake exit edges in niter analysis?
[Bug tree-optimization/99067] Missed optimization for induction variable elimination
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99067 bin cheng changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |amker at gcc dot gnu.org --- Comment #2 from bin cheng --- Mine, will have a look. Thanks for reporting.
[Bug tree-optimization/98736] [10/11 Regression] Wrong partition order generated in loop distribution pass since r10-619-g5879ab5fafedc8f6
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98736 --- Comment #3 from bin cheng --- hmm, seems topological order isn't enough for distributing a loop nest, we need topological order plus inner loop depth-first.
[Bug tree-optimization/99067] Missed optimization for induction variable elimination
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99067 --- Comment #3 from bin cheng --- Though not sure if the underlying root causes are the same, I think these are two different issues, at least, they are handled by different parts of code in IVOPTs. For the first one, it's a known issue in GCC and IV elimination is complicated yet quite conservative for long time, while for the second one, we indeed don't know whether "i*N+j" wraps or not. Even though we might be able to improve IVOPTs under condition of wrapping behavior.
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 bin cheng changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |amker at gcc dot gnu.org --- Comment #7 from bin cheng --- (In reply to Martin Liška from comment #4) > (In reply to Martin Liška from comment #3) > > But expected result is end g_2823 = 32768, right? > > Clang returns the same result 32768. > > Which regresses since r7-2373-g69b806f6a60efcf1. Hmm, that was a fix long ago. Will investigate this. Sorry for the breakage.
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 --- Comment #9 from bin cheng --- Seems we have a long standing bug in fold-const.c:multiple_of_p in case of wrapping types. Take unsigned int as an example: (0xfffc * 0x3) % 0x3 = 0x1 But multiple_of_p returns true here. The same issue also stands for MINUS_EXPR and PLUS_EXPR. Given multiple_of_p is used elsewhere, the fix might break existing optimizations. Especially, number of loop iterations is computed in unsigned types
[Bug tree-optimization/90078] [9 Regression] ICE with deep templates caused by overflow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90078 --- Comment #19 from bin cheng --- I will check if the latter fix can be easily backported to GCC-9.
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 --- Comment #13 from bin cheng --- (In reply to Richard Biener from comment #10) > (In reply to bin cheng from comment #9) > > Seems we have a long standing bug in fold-const.c:multiple_of_p in case of > > wrapping types. Take unsigned int as an example: > > (0xfffc * 0x3) % 0x3 = 0x1 > > But multiple_of_p returns true here. > > > > The same issue also stands for MINUS_EXPR and PLUS_EXPR. Given > > multiple_of_p is used elsewhere, the fix might break existing optimizations. > > Especially, number of loop iterations is computed in unsigned types > > multiple_of_p is mostly used in contexts where overflow "cannot happen" > (in TYPE/DECL_SIZE computation context), and in niter analysis it seems to > be guarded similarly. This restriction of multiple_of_p seems undocumented, Oh, I am not aware of this. Actually my previous change to it seems broke this assumption already. Will see how to fix or revert the change. > so fixing that might be good. > > Now, you don't say what's the chain of events that lead to a multiple_of_p > call > eventually leading to the wrong answer, but I guess it's the code added > under the > > + if (!niter->control.no_overflow > + && (integer_onep (s) || multiple_of_p (type, c, s))) > > check as !niter->control.no_overflow seems to suggest that the multiple_of_p > check is not properly guarded?
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 --- Comment #14 from bin cheng --- (In reply to Richard Biener from comment #12) > So in number_of_iterations_ne it looks like the step 's' is always constant > which makes me wonder if we can somehow use ranger to tell multiple_of_p > (type, c, s) > or at least whether, if c is x * s, the multiplication could have overflowed? Yeah, I am looking if "multiple of" can be feasibly checked in niter analysis, with help of some basic information from multiple_of_p. BTW, I am not following changes in "ranger", how should I used in analysis? or similar to value range info? Thanks
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 --- Comment #18 from bin cheng --- Did some experiments, there are two fallouts after explicitly returning false for unsigned/wrapping types in MULT_EXPR/MINUS_EXPR/PLUS_EXPR. One is the mentioned use of multiple_of_p in number_of_iterations_ne, the other is for alignment warning in stor-layout.c. As pointed out, the latter case is known not overflow/wrap. So I am thinking to introduce an additional parameter indicating that caller knows "top" doesn't overfow/wrap, otherwise, try to get rid of the undocumented assumption. we can always improve the accuracy using ranger or other tools. Not sure if this is the right way to do. As for MULT_NO_OVERFLOW/PLUS_NO_OVERFLOW, IMHO, it's not that simple? For example, unsigned_num(multiple of 4, and larger than 0) + 0xfffc is multiple of 4, but it's overflow behavior on which we rely here.
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 --- Comment #19 from bin cheng --- (In reply to bin cheng from comment #18) > Did some experiments, there are two fallouts after explicitly returning > false for unsigned/wrapping types in MULT_EXPR/MINUS_EXPR/PLUS_EXPR. One is > the mentioned use of multiple_of_p in number_of_iterations_ne, the other is > for alignment warning in stor-layout.c. As pointed out, the latter case is > known not overflow/wrap. > > So I am thinking to introduce an additional parameter indicating that caller > knows "top" doesn't overfow/wrap, otherwise, try to get rid of the > undocumented assumption. we can always improve the accuracy using ranger or > other tools. Not sure if this is the right way to do. > > As for MULT_NO_OVERFLOW/PLUS_NO_OVERFLOW, IMHO, it's not that simple? For > example, unsigned_num(multiple of 4, and larger than 0) + 0xfffc is > multiple of 4, but it's overflow behavior on which we rely here. Hmm, 4 is special and not a correct example. Considering: n (unsigned, multiple of 3, and > 0) + 0xfffd It's multiple of 3, but we need to rely on wrapping to get answer.
[Bug tree-optimization/100740] [9/10/11/12 Regression] wrong code at -O1 and above on x86_64-linux-gnu since r9-4145
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100740 bin cheng changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |amker at gcc dot gnu.org --- Comment #2 from bin cheng --- mine. Sorry for the breakage.
[Bug tree-optimization/101173] [9/10/11/12 Regression] wrong code at -O3 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101173 --- Comment #5 from bin cheng --- (In reply to Richard Biener from comment #3) > So we're exchanging the inner two loops > > a[1][3] = 8; > for (int b = 1; b <= 5; b++) > for (int d = 0; d <= 5; d++) > for (c = 0; c <= 5; c++) > a[b][c] = a[b][c + 2] & 216; > > to > > a[1][3] = 8; > for (int b = 1; b <= 5; b++) > for (c = 0; c <= 5; c++) > for (int d = 0; d <= 5; d++) > a[b][c] = a[b][c + 2] & 216; > > but that looks wrong from a dependence analysis perspective. We have > > (compute_affine_dependence > ref_a: a[b_33][_1], stmt_a: _2 = a[b_33][_1]; > ref_b: a[b_33][c.3_32], stmt_b: a[b_33][c.3_32] = _3; > (analyze_overlapping_iterations > (chrec_a = {2, +, 1}_5) > (chrec_b = {0, +, 1}_5) > (analyze_siv_subscript > (analyze_subscript_affine_affine > (overlaps_a = [0 + 1 * x_1]) > (overlaps_b = [2 + 1 * x_1])) > ) > (overlap_iterations_a = [0 + 1 * x_1]) > (overlap_iterations_b = [2 + 1 * x_1])) > (analyze_overlapping_iterations > (chrec_a = {1, +, 1}_1) > (chrec_b = {1, +, 1}_1) > (overlap_iterations_a = [0]) > (overlap_iterations_b = [0])) > (analyze_overlapping_iterations > (chrec_a = {0, +, 1}_5) > (chrec_b = {2, +, 1}_5) > (analyze_siv_subscript > (analyze_subscript_affine_affine > (overlaps_a = [2 + 1 * x_1]) > (overlaps_b = [0 + 1 * x_1])) > ) > (overlap_iterations_a = [2 + 1 * x_1]) > (overlap_iterations_b = [0 + 1 * x_1])) > (analyze_overlapping_iterations > (chrec_a = {1, +, 1}_1) > (chrec_b = {1, +, 1}_1) > (overlap_iterations_a = [0]) > (overlap_iterations_b = [0])) > (build_classic_dist_vector > dist_vector = ( 0 0 2 > ) > ) > ) > > I don't see anything wrong with that at a first glance so the bug must be in > tree_loop_interchange::valid_data_dependences it checks > > /* Be conservative, skip case if either direction at i_idx/o_idx > levels is not '=' or '<'. */ > if (dist_vect[i_idx] < 0 || dist_vect[o_idx] < 0) > return false; > > dist_vect is [0 0 2], i_idx 2 and o_idx 1 but I think that dist_vect[o_idx] > should exclude zero, thus > > diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop-interchange.cc > index f45b9364644..265e36c48d4 100644 > --- a/gcc/gimple-loop-interchange.cc > +++ b/gcc/gimple-loop-interchange.cc > @@ -1043,8 +1043,8 @@ tree_loop_interchange::valid_data_dependences > (unsigned i_idx, unsigned o_idx, > continue; > > /* Be conservative, skip case if either direction at i_idx/o_idx > -levels is not '=' or '<'. */ > - if (dist_vect[i_idx] < 0 || dist_vect[o_idx] < 0) > +levels is not '=' (for the inner loop) or '<'. */ > + if (dist_vect[i_idx] < 0 || dist_vect[o_idx] <= 0) > return false; > } > } > > Bin - does this analysis look sound? Hi Richard, Thanks very much for helping on this. Sorry I would need a bit more time to answer this question. Thanks again.
[Bug tree-optimization/101145] niter analysis fails for until-wrap condition
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101145 --- Comment #2 from bin cheng --- (In reply to Richard Biener from comment #1) > This comes up with a pending patch to split loops like > > void > foo (int *a, int *b, unsigned l, unsigned n) > { > while (++l != n) > a[l] = b[l] + 1; > } > > into > > while (++l > n) > a[l] = b[l] + 1; > while (++l < n) > a[l] = b[l] + 1; > > since for the second loop (the "usual" case involving no wrapping of the IV) > this results in affine IVs and thus analyzable data dependence. Special case like "i++ > constant" are handled in function adjust_cond_for_loop_until_wrap, however, it only handles constant invariant on the other side right now. Will see how to cover simple cases as reported here.
[Bug tree-optimization/101145] niter analysis fails for until-wrap condition
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101145 --- Comment #4 from bin cheng --- (In reply to Jiu Fu Guo from comment #3) > Yes, while the code in adjust_cond_for_loop_until_wrap seems somehow tricky: > > /* Only support simple cases for the moment. */ > if (TREE_CODE (iv0->base) != INTEGER_CST > || TREE_CODE (iv1->base) != INTEGER_CST) > return false; > > This code requires both sides are constant. Actually it requires an IV with constant base.
[Bug tree-optimization/101145] niter analysis fails for until-wrap condition
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101145 --- Comment #7 from bin cheng --- (In reply to Jiu Fu Guo from comment #5) > (In reply to bin cheng from comment #4) > > (In reply to Jiu Fu Guo from comment #3) > > > Yes, while the code in adjust_cond_for_loop_until_wrap seems somehow > > > tricky: > > > > > > /* Only support simple cases for the moment. */ > > > if (TREE_CODE (iv0->base) != INTEGER_CST > > > || TREE_CODE (iv1->base) != INTEGER_CST) > > > return false; > > > > > > This code requires both sides are constant. > > Actually it requires an IV with constant base. > > I also feel that the intention of this function may only require one side > constant for IV0 CODE IV1. > As tests, for below loop, adjust_cond_for_loop_until_wrap return false: > > foo (int *__restrict__ a, int *__restrict__ b, unsigned i) > { > while (++i > 100) > *a++ = *b++ + 1; > } > > For below code, adjust_cond_for_loop_until_wrap returns true: > i = UINT_MAX - 200; > while (++i > 100) > *a++ = *b++ + 1; Oh sorry for being misleading. When I mentioned it requires something(...), I was describing the current behavior, not that the conditions are necessary. Feel free to improve such cases. Looking into niter analysis, these cases(trade-offs) are not rare. Thanks
[Bug tree-optimization/102131] [12 Regression] wrong code at -O1 and above on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102131 --- Comment #4 from bin cheng --- (In reply to Jiu Fu Guo from comment #3) > The issue may come from 'iv0 cmp iv1' transform: > >if (c -->if (c>=b) in-loop > -->if (b<=c) in-loop > > c: {4, +, 3} > b: {1, +, 1} > > if ({1, +, 1} <= {4, +, 3}) > ==> if ({1,+,-2} <= {4,+,0}) here, error occur > ==> if ({1,+,-2} < {5,+,0}) le-->lt So this duplicates to PR100740? Thanks