Re: [RFC] Induction variable candidates not sufficiently general
On Fri, Jul 13, 2018 at 6:04 AM, Kelvin Nilsen wrote: > A somewhat old "issue report" pointed me to the code generated for a 4-fold > manually unrolled version of the following loop: > >> while (++len != len_limit) /* this is loop */ >> if (pb[len] != cur[len]) >> break; > > As unrolled, the loop appears as: > >> while (++len != len_limit) /* this is loop */ { >> if (pb[len] != cur[len]) >> break; >> if (++len == len_limit) /* unrolled 2nd iteration */ >> break; >> if (pb[len] != cur[len]) >> break; >> if (++len == len_limit) /* unrolled 3rd iteration */ >> break; >> if (pb[len] != cur[len]) >> break; >> if (++len == len_limit) /* unrolled 4th iteration */ >> break; >> if (pb[len] != cur[len]) >> break; >> } > > In examining the behavior of tree-ssa-loop-ivopts.c, I've discovered the only > induction variable candidates that are being considered are all forms of the > len variable. We are not considering any induction variables to represent > the address expressions &pb[len] and &cur[len]. > > I rewrote the source code for this loop to make the addressing expressions > more explicit, as in the following: > >> cur++; >> while (++pb != last_pb) /* this is loop */ { >> if (*pb != *cur) >> break; >> ++cur; >> if (++pb == last_pb) /* unrolled 2nd iteration */ >> break; >> if (*pb != *cur) >> break; >> ++cur; >> if (++pb == last_pb) /* unrolled 3rd iteration */ >> break; >> if (*pb != *cur) >> break; >> ++cur; >> if (++pb == last_pb) /* unrolled 4th iteration */ >> break; >> if (*pb != *cur) >> break; >> ++cur; >> } > > Now, gcc does a better job of identifying the "address expression induction > variables". This version of the loop runs about 10% faster than the original > on my target architecture. > > This would seem to be a textbook pattern for the induction variable analysis. > Does anyone have any thoughts on the best way to add these candidates to the > set of induction variables that are considered by tree-ssa-loop-ivopts.c? > > Thanks in advance for any suggestions. > Hi, Could you please file a bug with your original slow test code attached? I tried to construct meaningful test case from your code snippet but not successful. There is difference in generated assembly, but it's not that fundamental. So a bug with preprocessed test would be high appreciated. I think there are two potential issues in cost computation for such case: invariant expression and iv uses outside of loop handled as inside uses. Thanks, bin
Re: [PATCH] When using -fprofile-generate=/some/path mangle absolute path of file (PR lto/85759).
On Fri, Jun 29, 2018 at 9:54 PM, Martin Liška wrote: > On 06/22/2018 10:35 PM, Jeff Law wrote: >> On 05/16/2018 05:53 AM, Martin Liška wrote: >>> On 12/21/2017 10:13 AM, Martin Liška wrote: On 12/20/2017 06:45 PM, Jakub Jelinek wrote: > Another thing is that the "/" in there is wrong, so > const char dir_separator_str[] = { DIR_SEPARATOR, '\0' }; > char *b = concat (profile_data_prefix, dir_separator_str, pwd, NULL); > needs to be used instead. This looks much nicer, I forgot about DIR_SEPARATOR. > Does profile_data_prefix have any dir separators stripped from the end? That's easy to achieve.. > Is pwd guaranteed to be relative in this case? .. however this is absolute path, which would be problematic on a DOC based FS. Maybe we should do the same path mangling as we do for purpose of gcov: https://github.com/gcc-mirror/gcc/blob/master/gcc/gcov.c#L2424 >>> Hi. >>> >>> I decided to implement that. Which means for: >>> >>> $ gcc -fprofile-generate=/tmp/myfolder empty.c -O2 && ./a.out >>> >>> we get following file: >>> /tmp/myfolder/#home#marxin#Programming#testcases#tmp#empty.gcda >>> >>> That guarantees we have a unique file path. As seen in the PR it >>> can produce a funny ICE. >>> >>> I've been testing the patch. >>> Ready after it finishes tests? >>> >>> Martin >>> What do you think about it? Regarding the string manipulation: I'm not an expert, but work with string in C is for me always a pain :) Martin >>> >>> 0001-When-using-fprofile-generate-some-path-mangle-absolu.patch >>> >>> >>> From 386a4561a4d1501e8959871791289e95f6a89af5 Mon Sep 17 00:00:00 2001 >>> From: marxin >>> Date: Wed, 16 Aug 2017 10:22:57 +0200 >>> Subject: [PATCH] When using -fprofile-generate=/some/path mangle absolute >>> path >>> of file (PR lto/85759). >>> >>> gcc/ChangeLog: >>> >>> 2018-05-16 Martin Liska >>> >>> PR lto/85759 >>> * coverage.c (coverage_init): Mangle full path name. >>> * doc/invoke.texi: Document the change. >>> * gcov-io.c (mangle_path): New. >>> * gcov-io.h (mangle_path): Likewise. >>> * gcov.c (mangle_name): Use mangle_path for path mangling. >> ISTM you can self-approve this now if you want it to move forward :-) >> >> jeff >> > > Sure, let me install the patch then. Hi, I am a bit confused after path mangling change. Now with below command line: $ ./gcc -O2 -fprofile-use=./ sort.c -o sort.c or $ ./gcc -O2 -fprofile-use=./sort.gcda sort.c -o sort.c The da_file_name and the final name used in gcov_open is as: $ p name $11 = 0x2e63050 ./#home#chengbin.cb#work#gcc-patches#trunk-orig#target.build#bin#sort.gcda or p da_file_name $1 = 0x2e63050 "sort.gcda/#home#chengbin.cb#work#gcc-patches#trunk-orig#target.build#bin#sort.gcda" These are not valid paths? Or how should I modify the command line options? Thanks, bin > > Martin
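For reference, the mangling discussed in this thread flattens an absolute path into a single unique file name by replacing directory separators, which is how names like /tmp/myfolder/#home#marxin#Programming#testcases#tmp#empty.gcda arise. Below is a minimal sketch of the idea; mangle_path_sketch is an illustrative stand-in and not the mangle_path added to gcov-io.c.

    #include <stdlib.h>
    #include <string.h>

    /* Fold an absolute path into one flat file name by replacing the
       directory separator, e.g. "/home/user/src/empty.gcda" becomes
       "#home#user#src#empty.gcda"; the result is then placed under the
       directory given to -fprofile-generate=.  */
    static char *
    mangle_path_sketch (const char *path)
    {
      char *copy = strdup (path);
      if (!copy)
        return NULL;
      for (char *p = copy; *p; p++)
        if (*p == '/')
          *p = '#';
      return copy;
    }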
Re: [RFC] Induction variable candidates not sufficiently general
On Tue, Jul 17, 2018 at 2:08 AM, Kelvin Nilsen wrote: > Thanks for looking at this for me. In simplifying the test case for a bug > report, I've narrowed the "problem" to integer overflow considerations. My > len variable is declared int, and the target has 64-bit pointers. I'm > gathering that the "manual transformation" I quoted below is not considered > "equivalent" to the original source code due to different integer overflow > behaviors. If I redeclare len to be unsigned long long, then I automatically > get the optimizations that I was originally expecting. > > I suppose this is really NOT a bug? As your test case demonstrates, it is caused by wrapping unsigned int32. > > Is there a compiler optimization flag that allows the optimizer to ignore > array index integer overflow in considering legal optimizations? I am not aware of one for unsigned integer, and I guess it won't be introduced in the future either? Thanks, bin > > > > On 7/13/18 9:14 PM, Bin.Cheng wrote: >> On Fri, Jul 13, 2018 at 6:04 AM, Kelvin Nilsen >> wrote: >>> A somewhat old "issue report" pointed me to the code generated for a 4-fold >>> manually unrolled version of the following loop: >>> >>>> while (++len != len_limit) /* this is loop */ >>>> if (pb[len] != cur[len]) >>>> break; >>> >>> As unrolled, the loop appears as: >>> >>>> while (++len != len_limit) /* this is loop */ { >>>> if (pb[len] != cur[len]) >>>> break; >>>> if (++len == len_limit) /* unrolled 2nd iteration */ >>>> break; >>>> if (pb[len] != cur[len]) >>>> break; >>>> if (++len == len_limit) /* unrolled 3rd iteration */ >>>> break; >>>> if (pb[len] != cur[len]) >>>> break; >>>> if (++len == len_limit) /* unrolled 4th iteration */ >>>> break; >>>> if (pb[len] != cur[len]) >>>> break; >>>> } >>> >>> In examining the behavior of tree-ssa-loop-ivopts.c, I've discovered the >>> only induction variable candidates that are being considered are all forms >>> of the len variable. We are not considering any induction variables to >>> represent the address expressions &pb[len] and &cur[len]. >>> >>> I rewrote the source code for this loop to make the addressing expressions >>> more explicit, as in the following: >>> >>>> cur++; >>>> while (++pb != last_pb) /* this is loop */ { >>>> if (*pb != *cur) >>>> break; >>>> ++cur; >>>> if (++pb == last_pb) /* unrolled 2nd iteration */ >>>> break; >>>> if (*pb != *cur) >>>> break; >>>> ++cur; >>>> if (++pb == last_pb) /* unrolled 3rd iteration */ >>>> break; >>>> if (*pb != *cur) >>>> break; >>>> ++cur; >>>> if (++pb == last_pb) /* unrolled 4th iteration */ >>>> break; >>>> if (*pb != *cur) >>>> break; >>>> ++cur; >>>> } >>> >>> Now, gcc does a better job of identifying the "address expression induction >>> variables". This version of the loop runs about 10% faster than the >>> original on my target architecture. >>> >>> This would seem to be a textbook pattern for the induction variable >>> analysis. Does anyone have any thoughts on the best way to add these >>> candidates to the set of induction variables that are considered by >>> tree-ssa-loop-ivopts.c? >>> >>> Thanks in advance for any suggestions. >>> >> Hi, >> Could you please file a bug with your original slow test code >> attached? I tried to construct meaningful test case from your code >> snippet but not successful. There is difference in generated >> assembly, but it's not that fundamental. So a bug with preprocessed >> test would be high appreciated. 
>> I think there are two potential issues in cost computation for such >> case: invariant expression and iv uses outside of loop handled as >> inside uses. >> >> Thanks, >> bin >> >>
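To make the overflow point in this thread concrete, here is a minimal sketch of the rewrite that recovers the address induction variables; the function and its parameter types are made up for illustration and are not Kelvin's original test. Once the index has a 64-bit type it cannot wrap inside the 64-bit address computation, so ivopts is free to create induction variables for &pb[len] and &cur[len].

    /* Sketch only: with a 32-bit index, &pb[len] is not an affine function
       of a 64-bit pointer IV because the index arithmetic may wrap, so no
       address IV is created; widening the index removes that obstacle.  */
    static unsigned long long
    match_len (const unsigned char *pb, const unsigned char *cur,
               unsigned long long len, unsigned long long len_limit)
    {
      while (++len != len_limit)
        if (pb[len] != cur[len])
          break;
      return len;
    }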
Re: [PATCH] Improve debug info in ivopts optimized loops (PR debug/90231)
On Tue, Oct 22, 2019 at 3:32 PM Jakub Jelinek wrote: > > On Mon, Oct 21, 2019 at 01:24:30PM +0200, Jakub Jelinek wrote: > > So I wonder if for correctness I don't need to add: > > > > if (!use->iv->no_overflow > > && !cand->iv->no_overflow > > && !integer_pow2p (cstep)) > > return NULL_TREE; > > > > with some of the above as comment explaining why. > > > > On the other side, if cand->iv->no_overflow, couldn't we bypass the extra > > precision test? > > Here are these two in patch form. > > 2019-10-22 Jakub Jelinek > > PR debug/90231 > * tree-ssa-loop-ivopts.c (get_debug_computation_at): New function. > (remove_unused_ivs): Use it instead of get_computation_at. When > choosing best candidate, only consider candidates where > get_debug_computation_at actually returns non-NULL. > > --- gcc/tree-ssa-loop-ivopts.c.jj 2019-10-21 14:17:57.598198162 +0200 > +++ gcc/tree-ssa-loop-ivopts.c 2019-10-22 09:30:09.782238157 +0200 > @@ -4089,6 +4089,94 @@ get_computation_at (class loop *loop, gi >return fold_convert (type, aff_combination_to_tree (&aff)); > } > > +/* Like get_computation_at, but try harder, even if the computation > + is more expensive. Intended for debug stmts. */ > + > +static tree > +get_debug_computation_at (class loop *loop, gimple *at, > + struct iv_use *use, struct iv_cand *cand) > +{ > + if (tree ret = get_computation_at (loop, at, use, cand)) > +return ret; > + > + tree ubase = use->iv->base, ustep = use->iv->step; > + tree cbase = cand->iv->base, cstep = cand->iv->step; > + tree var; > + tree utype = TREE_TYPE (ubase), ctype = TREE_TYPE (cbase); > + widest_int rat; > + > + /* We must have a precision to express the values of use. */ > + if (TYPE_PRECISION (utype) >= TYPE_PRECISION (ctype)) > +return NULL_TREE; > + > + /* Try to handle the case that get_computation_at doesn't, > + try to express > + use = ubase + (var - cbase) / ratio. */ > + if (!constant_multiple_of (cstep, fold_convert (TREE_TYPE (cstep), ustep), > +&rat)) > +return NULL_TREE; > + > + bool neg_p = false; > + if (wi::neg_p (rat)) > +{ > + if (TYPE_UNSIGNED (ctype)) > + return NULL_TREE; > + neg_p = true; > + rat = wi::neg (rat); > +} > + > + /* If both IVs can wrap around and CAND doesn't have a power of two step, > + it is unsafe. Consider uint16_t CAND with step 9, when wrapping around, > + the values will be ... 0xfff0, 0xfff9, 2, 11 ... and when use is say > + uint8_t with step 3, those values divided by 3 cast to uint8_t will be > + ... 0x50, 0x53, 0, 3 ... rather than expected 0x50, 0x53, 0x56, 0x59. > */ Interesting, so we can still get correct debug info for iter in such special cases. > + if (!use->iv->no_overflow > + && !cand->iv->no_overflow > + && !integer_pow2p (cstep)) > +return NULL_TREE; > + > + int bits = wi::exact_log2 (rat); > + if (bits == -1) > +bits = wi::floor_log2 (rat) + 1; > + if (!cand->iv->no_overflow > + && TYPE_PRECISION (utype) + bits > TYPE_PRECISION (ctype)) > +return NULL_TREE; The patch is fine for me. Just for the record, guess we may try to find (by recording information in early phase) the correct/bijection candidate in computing the iv in debuginfo in the future, then those checks would be unnecessary. 
Thanks, bin > + > + var = var_at_stmt (loop, cand, at); > + > + if (POINTER_TYPE_P (ctype)) > +{ > + ctype = unsigned_type_for (ctype); > + cbase = fold_convert (ctype, cbase); > + cstep = fold_convert (ctype, cstep); > + var = fold_convert (ctype, var); > +} > + > + ubase = unshare_expr (ubase); > + cbase = unshare_expr (cbase); > + if (stmt_after_increment (loop, cand, at)) > +var = fold_build2 (MINUS_EXPR, TREE_TYPE (var), var, > + unshare_expr (cstep)); > + > + var = fold_build2 (MINUS_EXPR, TREE_TYPE (var), var, cbase); > + var = fold_build2 (EXACT_DIV_EXPR, TREE_TYPE (var), var, > +wide_int_to_tree (TREE_TYPE (var), rat)); > + if (POINTER_TYPE_P (utype)) > +{ > + var = fold_convert (sizetype, var); > + if (neg_p) > + var = fold_build1 (NEGATE_EXPR, sizetype, var); > + var = fold_build2 (POINTER_PLUS_EXPR, utype, ubase, var); > +} > + else > +{ > + var = fold_convert (utype, var); > + var = fold_build2 (neg_p ? MINUS_EXPR : PLUS_EXPR, utype, > +ubase, var); > +} > + return var; > +} > + > /* Adjust the cost COST for being in loop setup rather than loop body. > If we're optimizing for space, the loop setup overhead is constant; > if we're optimizing for speed, amortize it over the per-iteration cost. > @@ -7523,6 +7611,7 @@ remove_unused_ivs (struct ivopts_data *d > struct iv_use dummy_use; > struct iv_cand *best_cand = NULL, *cand; > unsigned
Re: [PATCH GCC11]Improve uninitialized warning with value range info
-- Sender:Richard Biener Sent At:2020 Jan. 9 (Thu.) 20:01 Recipient:Bin.Cheng Cc:bin.cheng ; GCC Patches Subject:Re: [PATCH GCC11]Improve uninitialized warning with value range info On Thu, Jan 9, 2020 at 11:17 AM Bin.Cheng wrote: > > I am not quite follow here. Do you mean we collect three cases "i < > > j", "i < min(j)", "max(i) < j" then > > call prune_uninit_phi_opnds for all three conditions? > > No, I've meant to somehow arrange that the 'preds' passed to > use_pred_not_overlap_with_undef_path_pred contain all three predicates > rather than just i < j, thus "expand" fully symbolic predicates. Seems this would require non-trivial refactoring of the original code. > > This is another question? because now we simply break out of for loop > > for finding such condition: > > > > - if ((gimple_code (flag_def) == GIMPLE_PHI) > > - && (gimple_bb (flag_def) == gimple_bb (phi)) > > - && find_matching_predicate_in_rest_chains (the_pred, preds, > > -num_preds)) > > - break; > > > > It's always possible that this flag_def can't prune use predicates > > against undefined path predicates, while a later one can prune but is > > skipped? > > I don't follow but I also don't want to understand the code too much ;) > > I'm fine with the idea and if the patch cannot introudce extra bogus warnings > let's go with it. Can you amed the comment before the two find_var_cmp_const > calls? I wonder whether eliding the second sweep when the first didn't find > any predicate it skipped is worth the trouble. Thanks for the comments, I updated the patch as attached. Thanks, bin 2020-01-08 Bin Cheng * tree-ssa-uninit.c (find_var_cmp_const): New function. (use_pred_not_overlap_with_undef_path_pred): Call above. (find_matching_predicate_in_rest_chains): Remove param. 0001-Fix-false-uninitialized-warning-message.patch Description: Binary data
Re: [PATCH PR92926]Fix wrong code caused by ctor node translation unit wide sharing
On Wed, Jan 15, 2020 at 5:04 AM Jeff Law wrote: > > On Thu, 2020-01-09 at 14:20 +0800, Bin.Cheng wrote: > > On Fri, Dec 20, 2019 at 3:10 PM Richard Biener > > wrote: > > > On December 20, 2019 2:13:47 AM GMT+01:00, "Bin.Cheng" > > > wrote: > > > > On Fri, Dec 13, 2019 at 11:26 AM bin.cheng > > > > wrote: > > > > > Hi, > > > > > > > > > > As reported in PR92926, constant ctor is shared translation unit wide > > > > because of constexpr_call_table, > > > > > however, during gimplify, the shared ctor could be modified. This > > > > patch fixes the issue by unsharing > > > > > it before modification in gimplify. A test is reduced from cppcoro > > > > library and added. > > > > > Bootstrap and test ongoing. Not sure if this is the correct fix > > > > though, any comments? > > > > Ping. Any comment? > > > > > > Looks reasonable to me. > > Given PR92926 is marked as duplicate of PR93143, I updated test case > > of the patch. > > > > Thanks, > > bin > > > > 2019-12-13 Bin Cheng > > > > PR tree-optimization/93143 > > * gimplify.c (gimplify_init_constructor): Unshare ctor node before > > clearing. > > > > gcc/testsuite > > 2019-12-13 Bin Cheng > > > > PR tree-optimization/93143 > > * g++.dg/pr93143.C: New test. > Is this still needed? I think Jason fixed the outstanding sharing > problems a couple days ago. Yes, the issue is fixed and this can be discarded. Thanks, bin > > jeff >
Re: [C++ coroutines 4/6] Middle end expanders and transforms.
> > + gassign *get_res = gimple_build_assign (lhs, done); > > + gsi_replace (gsi, get_res, true); > > + *handled_ops_p = true; > > + } > > + break; > > + } > > +} > > + return NULL_TREE; > > +} > > + > > +/* Main entry point for lowering coroutine FE builtins. */ > > + > > +static unsigned int > > +execute_lower_coro_builtins (void) > > +{ > > + struct walk_stmt_info wi; > > + gimple_seq body; > > + > > + body = gimple_body (current_function_decl); > > + memset (&wi, 0, sizeof (wi)); > > + walk_gimple_seq_mod (&body, lower_coro_builtin, NULL, &wi); > > + gimple_set_body (current_function_decl, body); > > it would be nice to elide the function walk for functions not > containing any CO* stuff (you noted that below for other parts). > We can spend a bit in struct function noting functions with > coroutine code inside and set the bit from frontends or from > the gimplifier for example. Since it's behind the flag_coroutines > paywall this can be addressed as followup. Yes, this bit flag is necessary for following optimization passes, I wonder which bit you would suggest? Thanks, bin
[PATCH Coroutines]Fix false warning message about missing return
Hi,

The patch sets the current_function_returns_value flag in the templates for all co_return/co_yield/co_await cases, as well as for the ramp function. This fixes the false warning message for cases like the added test, and for various cases in cppcoro.

Bootstrap and test on X86_64, is it OK?

Thanks,
bin

gcc/cp
2020-01-20  Bin Cheng

        * coroutines.cc (finish_co_await_expr): Set return value flag.
        (finish_co_yield_expr, morph_fn_to_coro): Ditto.

gcc/testsuite
2020-01-20  Bin Cheng

        * g++.dg/coroutines/co-return-warning-1.C: New test.

0001-Fix-false-warning-messages-about-missing-return-in-c.patch
Description: Binary data
[PATCH Coroutines]Fix ICE when co_awaiting on void type
Hi, This simple patch skips calling complete_type_or_else for void type, which fixes the corresponding ICE. Thanks, bin gcc/cp 2020-01-20 Bin Cheng * coroutines.cc (build_co_await): Skip getting complete type for void. gcc/testsuite 2020-01-20 Bin Cheng * g++.dg/coroutines/co-await-void_type.C: New test. 0001-Fix-ICE-when-co_awaiting-on-void-type.patch Description: Binary data
Re: [PATCH Coroutines]Fix ICE when co_awaiting on void type
On Mon, Jan 20, 2020 at 5:28 PM Jakub Jelinek wrote: > > On Mon, Jan 20, 2020 at 08:59:20AM +, Iain Sandoe wrote: > > Hi Bin, > > > > bin.cheng wrote: > > > > > gcc/cp > > > 2020-01-20 Bin Cheng > > > > > >* coroutines.cc (build_co_await): Skip getting complete type for > > > void. > > > > > > gcc/testsuite > > > 2020-01-20 Bin Cheng > > > > > >* g++.dg/coroutines/co-await-void_type.C: New > > > test.<0001-Fix-ICE-when-co_awaiting-on-void-type.patch> > > > > This LGTM, (borderline obvious, in fact) but you will need to wait for a C++ > > maintainer to approve, > > The patch actually looks wrong to me, there is nothing magic about void > type here, it will ICE on any other incomplete type, because > complete_type_or_else returns NULL if the type is incomplete. > > So, I think you want instead: >tree o_type = complete_type_or_else (TREE_TYPE (o), o); > + if (o_type == NULL_TREE) > +return error_mark_node; >if (TREE_CODE (o_type) != RECORD_TYPE) > > or similar (the diagnostics should be emitted already, so I don't see > further point to emit it once again). Thanks very much for pointing out. I was trying to bypass generic void error message to make it more related to coroutine, like: e1.cc: In function ‘resumable foo(int)’: e1.cc:45:3: error: awaitable type ‘void’ is not a structure 45 | co_yield n + x + y; | ^~~~ vs. e1.cc: In function ‘resumable foo(int)’: e1.cc:45:20: error: invalid use of ‘void’ 45 | co_yield n + x + y; |^ Anyway I will update the patch. Thanks, bin > > But it also means that other uses of complete_type_or_else look broken: > > /* Complete this, we're going to use it. */ > coro_info->handle_type = complete_type_or_else (handle_type, fndecl); > /* Diagnostic would be emitted by complete_type_or_else. */ > if (coro_info->handle_type == error_mark_node) > return false; > > or > > if (!COMPLETE_TYPE_P (actual_type)) > actual_type = complete_type_or_else (actual_type, *stmt); > > if (TREE_CODE (actual_type) == REFERENCE_TYPE) > actual_type = build_pointer_type (TREE_TYPE (actual_type)); > > where I think the first one should check for handle_type being NULL_TREE > rather than error_mark_node, and the latter should handle the case of > it returning NULL_TREE. > > Right below the last hunk, I've noticed: > > size_t namsize = sizeof ("__parm.") + IDENTIFIER_LENGTH (pname) + 1; > char *buf = (char *) alloca (namsize); > snprintf (buf, namsize, "__parm.%s", IDENTIFIER_POINTER (pname)); > > Do you really need one byte extra? I mean, sizeof ("__parm.") already > includes +1 for the terminating '\0', so unless you append something to > the buffer later, I don't see the point for the + 1. > And the middle line could be written as > char *buf = XALLOCAVEC (char, namsize); > > Jakub >
Re: [PATCH Coroutines]Fix ICE when co_awaiting on void type
On Mon, Jan 20, 2020 at 6:31 PM Iain Sandoe wrote: > > Hi Bin, > > Bin.Cheng wrote: > > > On Mon, Jan 20, 2020 at 5:28 PM Jakub Jelinek wrote: > >> On Mon, Jan 20, 2020 at 08:59:20AM +, Iain Sandoe wrote: > >>> Hi Bin, > >>> > >>> bin.cheng wrote: > >>> > >>>> gcc/cp > >>>> 2020-01-20 Bin Cheng > >>>> > >>>> * coroutines.cc (build_co_await): Skip getting complete type for > >>>> void. > >>>> > >>>> gcc/testsuite > >>>> 2020-01-20 Bin Cheng > >>>> > >>>> * g++.dg/coroutines/co-await-void_type.C: New > >>>> test.<0001-Fix-ICE-when-co_awaiting-on-void-type.patch> > >>> > >>> This LGTM, (borderline obvious, in fact) but you will need to wait for > >>> a C++ > >>> maintainer to approve, > >> > >> The patch actually looks wrong to me, there is nothing magic about void > >> type here, it will ICE on any other incomplete type, because > >> complete_type_or_else returns NULL if the type is incomplete. > >> > >> So, I think you want instead: > >> tree o_type = complete_type_or_else (TREE_TYPE (o), o); > >> + if (o_type == NULL_TREE) > >> +return error_mark_node; > >> if (TREE_CODE (o_type) != RECORD_TYPE) > >> > >> or similar (the diagnostics should be emitted already, so I don't see > >> further point to emit it once again). > > > > Thanks very much for pointing out. I was trying to bypass generic > > void error message to make it more related to coroutine, like: > > e1.cc: In function ‘resumable foo(int)’: > > e1.cc:45:3: error: awaitable type ‘void’ is not a structure > > 45 | co_yield n + x + y; > > | ^~~~ > > > > vs. > > > > e1.cc: In function ‘resumable foo(int)’: > > e1.cc:45:20: error: invalid use of ‘void’ > > 45 | co_yield n + x + y; > > |^ > > > > Anyway I will update the patch. > > ...I had already made a start... Thanks very much! > > The patch below gives the improved diagnostic while checking for NULL > returns fom complete_type_or_else. > > OK? > Iain > > > gcc/cp > > 2020-01-20 Bin Cheng You can remove me from the ChangeLog or keep it as "bin.cheng" :-) > Iain Sandoe > > * coroutines.cc (coro_promise_type_found_p): Check for NULL return > from > complete_type_or_else. > (register_param_uses): Likewise. > (build_co_await): Do not try to use complete_type_or_else for void > types, > otherwise for incomplete types, check for NULL return from > complete_type_or_else. > > gcc/testsuite > 2020-01-20 Bin Cheng > >* g++.dg/coroutines/co-await-void_type.C: New > test.<0001-Fix-ICE-when-co_awaiting-on-void-type.patch> > > > diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc > index f0febfe0c8a..4e1ba81fb46 100644 > --- a/gcc/cp/coroutines.cc > +++ b/gcc/cp/coroutines.cc > @@ -428,8 +428,9 @@ coro_promise_type_found_p (tree fndecl, location_t loc) > > /* Complete this, we're going to use it. */ > coro_info->handle_type = complete_type_or_else (handle_type, fndecl); > + > /* Diagnostic would be emitted by complete_type_or_else. */ > - if (coro_info->handle_type == error_mark_node) > + if (!coro_info->handle_type) > return false; > > /* Build a proxy for a handle to "self" as the param to > @@ -651,8 +652,11 @@ build_co_await (location_t loc, tree a, > suspend_point_kind suspend_kind) > o = a; /* This is most likely about to fail anyway. */ > > tree o_type = TREE_TYPE (o); > - if (!VOID_TYPE_P (o_type)) > -o_type = complete_type_or_else (TREE_TYPE (o), o); > + if (o_type && !VOID_TYPE_P (o_type) && !COMPLETE_TYPE_P (o_type)) IIUC, Jakub doesn't like checking void type specially here? 
> +o_type = complete_type_or_else (o_type, o); > + > + if (!o_type) > +return error_mark_node; > > if (TREE_CODE (o_type) != RECORD_TYPE) > { > @@ -2746,6 +2750,10 @@ register_param_uses (tree *stmt, int *do_subtree > ATTRIBUTE_UNUSED, void *d) > if (!COMPLETE_TYPE_P (actual_type)) > actual_type = complete_type_or_else (actual_type, *stmt); > > + if (actual_type == NULL_TREE) > + /* Diagnostic emitted by complete_type_or_else. */ > + actual_type = error_mark_node; > + > if (TREE_CODE (actual_type) == REFERENCE_TYPE) > actual_type = build_pointer_type (TREE_TYPE (actual_type)); > > Thanks, bin
[PATCH Coroutines]Access promise via actor function's frame pointer argument
Hi,

By the standard, the coroutine body should be encapsulated in a try-catch block as follows:

    try {
      // coroutine body
    } catch(...) {
      promise.unhandled_exception();
    }

Since the above try-catch block is implemented in the coroutine actor function called by the coroutine ramp function, the promise should be accessed via the actor function's coroutine frame pointer argument, rather than via the ramp function's coroutine frame variable.

This patch also refactors code to make the fix easier; unfortunately, I failed to reduce a test case from cppcoro. Bootstrap and test ongoing. Is it OK?

Thanks,
bin

gcc/cp
2020-01-20  Bin Cheng

        * coroutines.cc (act_des_fn, wrap_coroutine_body): New.
        (morph_fn_to_coro): Call act_des_fn to build actor/destroy decls, as
        well as access promise via actor function's frame pointer argument.
        Refactor code into above functions.
        (build_actor_fn, build_destroy_fn): Use frame pointer argument.

0001-Use-promise-in-coroutine-frame-in-actor-function.patch
Description: Binary data
Re: [PATCH Coroutines]Access promise via actor function's frame pointer argument
On Mon, Jan 20, 2020 at 10:59 PM Iain Sandoe wrote: > > Hi Bin, > > bin.cheng wrote: > > > By standard, coroutine body should be encapsulated in try-catch block > > as following: > > try { > > // coroutine body > > } catch(...) { > > promise.unhandled_exception(); > > } > > Given above try-catch block is implemented in the coroutine actor > > function called by coroutine ramp function, so the promise should > > be accessed via actor function's coroutine frame pointer argument, > > rather than the ramp function's coroutine frame variable. > > thanks, good catch! > it has not triggered for me (including on some more complex test-cases that > are not useable > in the testsuite). > > > This patch also refactored code to make the fix easy, unfortunately, > > see below, > > > I failed to reduce a test case from cpproro. > > it would be good if we could try to find a reproducer. I’m happy to try and > throw > creduce at a preprocessed file, if you have one. > > > gcc/cp > > 2020-01-20 Bin Cheng > > > > * coroutines.cc (act_des_fn, wrap_coroutine_body): New. > > (morph_fn_to_coro): Call act_des_fn to build actor/destroy decls, as > > well as access promise via actor function's frame pointer argument. > > Refactor code into above functions. > > (build_actor_fn, build_destroy_fn): Use frame pointer argument > + /* We still try to look for the promise method and warn if it's not > + present. */ > + tree ueh_meth > + = lookup_promise_method (orig, coro_unhandled_exception_identifier, > + loc, /*musthave=*/ false); > + if (!ueh_meth || ueh_meth == error_mark_node) > + warning_at (loc, 0, "no member named %qE in %qT", > + coro_unhandled_exception_identifier, > + get_coroutine_promise_type (orig)); > + } > + /* Else we don't check and don't care if the method is missing. */ > + > + return fnbody; > +} > > IMO this ^^^ obfuscates the fix, and I don’t think it should be done at the > same time. Sure, given we are in this stage, I split the patch and leave refactoring to future. IMHO, the function is too long so self-contained operations worth outlined function even it's called once. Patch updated as attached. Thanks, bin gcc/cp 2020-01-20 Bin Cheng * coroutines.cc (act_des_fn): New. (morph_fn_to_coro): Call act_des_fn to build actor/destroy decls. Access promise via actor function's frame pointer argument. (build_actor_fn, build_destroy_fn): Use frame pointer argument. 0001-Use-promise-in-coroutine-frame-in-actor-function.patch Description: Binary data
[PATCH Coroutines]Fix an ICE case in expanding co_await expression
Hi,

Though function co_await_expander may need further revision, this simple patch fixes an ICE in it by handling CO_AWAIT_EXPR appearing in a conversion. Function co_await_expander expands CO_AWAIT_EXPR and inserts the expanded code before the result of co_await is used; however, it doesn't cover the type conversion case, which leads to a gimplify ICE. This patch fixes that.

Bootstrap and test on x86_64. Is it OK?

Thanks,
bin

gcc/cp
2020-01-22  Bin Cheng

        * coroutines.cc (co_await_expander): Handle type conversion case.

gcc/testsuite
2020-01-22  Bin Cheng

        * g++.dg/coroutines/co-await-syntax-09-convert.C: New test.

0001-Handle-CO_AWAIT_EXPR-in-conversion-in-co_await_expan.patch
Description: Binary data
[PATCH PR93334][GCC11]Refine data dependence of two refs storing the same constant with the same bytes
Hi,

As discussed in the PR, this simple patch refines the data dependence between two write references that store the same constant with the same bytes. It simply detects the case, with some restrictions, and treats it as no dependence. For now the added interface in tree-data-ref.c is only used in loop distribution; it might be generalized in the future.

Bootstrap and test on x86_64. Any comments?

Thanks,
bin

2020-01-29  Bin Cheng

        * tree-data-ref.c (refine_affine_dependence): New function.
        * tree-data-ref.h (refine_affine_dependence): New declaration.
        * tree.h (const_with_all_bytes_same): External declaration.
        * tree.c (const_with_all_bytes_same): Moved from...
        * tree-loop-distribution.c (const_with_all_bytes_same): ...here.
        Call refine_affine_dependence.

gcc/testsuite
2020-01-29  Bin Cheng

        * gcc/testsuite/gcc.dg/tree-ssa/pr93334.c: New test.

0001-pr93334.patch
Description: Binary data
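As a rough illustration of the case being targeted (the function below is made up and is not the test from the PR): when both statements store the same constant value, the bytes written are identical no matter which write happens last, so the output dependence between them can safely be ignored when distributing the loop.

    /* Illustrative only: a[i] and a[i + 1] overlap across iterations, but
       both stores write the same constant (all bytes zero), so the final
       memory contents do not depend on the order of the two writes and
       loop distribution may treat them as independent.  */
    void
    clear_pairs (char *a, int n)
    {
      for (int i = 0; i < n; i++)
        {
          a[i] = 0;
          a[i + 1] = 0;
        }
    }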
Re: [PATCH, ivopts] Fix fast-math-pr55281.c ICE
On Thu, Jan 30, 2020 at 8:53 PM Andrew Stubbs wrote: > > On 29/01/2020 08:24, Richard Biener wrote: > > On Tue, Jan 28, 2020 at 5:53 PM Andrew Stubbs wrote: > >> > >> This patch fixes an ICE compiling fast-math-pr55281.c for amdgcn. > >> > >> The problem is that an "iv" is created in which both base and step are > >> pointer types, > > > > How did you get a POINTER_TYPE step? That's where the issue lies > > I think. > > It can come from "find_inv_vars_cb": > >set_iv (idata, op, op, build_int_cst (TREE_TYPE (op), 0), true); This is recording invariant with zero step. It seems we are using wrong type building the zero-step. How about detecting that op has pointer type and using integer type here? Thanks, bin > > whenever "op" has a pointer type. > > Similarly for "get_iv": > >set_iv (data, var, var, build_int_cst (type, 0), true); > > whenever "var" has a pointer type. > > In this particular case, I traced the origin back to the second one, I > think (but it's somewhat hard to unpick). > > I could change one or both of those, but I don't understand enough about > the consequences of that to be sure it's the right thing to do. I can > confirm that converting the step at this point does appear to have the > desired effect in this instance. > > At least at the point of writing it to gimple I can determine what is > definitely malformed. > > Andrew
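As a concrete sketch of the suggestion in the reply above (untested, and only illustrating the find_inv_vars_cb call that was quoted; a similar change would apply to the call in get_iv): give the zero step an integer type whenever the invariant itself has pointer type, so that no iv ever carries a POINTER_TYPE step.

    /* Sketch only, not a reviewed patch: use sizetype for the zero step of
       pointer-typed invariants instead of the pointer type itself.  */
    tree step_type = POINTER_TYPE_P (TREE_TYPE (op)) ? sizetype : TREE_TYPE (op);
    set_iv (idata, op, op, build_int_cst (step_type, 0), true);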
[PATCH Coroutines]Insert the default return_void call at correct position
Hi,

Exceptions in coroutines are not correctly handled because the default return_void call is currently inserted before the final suspend point, rather than at the end of the original coroutine body. This patch fixes the issue by generating the following code:

    co_await promise.initial_suspend();
    try {
      // The original coroutine body

      promise.return_void(); // The default return_void call.
    } catch (...) {
      promise.unhandled_exception();
    }
    final_suspend:
    // ...

Bootstrap and test on x86_64. Is it OK?

Thanks,
bin

gcc/cp
2020-02-03  Bin Cheng

        * coroutines.cc (build_actor_fn): Factor out code inserting the
        default return_void call to...
        (morph_fn_to_coro): ...here, also hoist local var declarations.

gcc/testsuite
2020-02-03  Bin Cheng

        * g++.dg/coroutines/torture/co-ret-15-default-return_void.C: New.

0001-Insert-default-return_void-at-the-end-of-coroutine-b.patch
Description: Binary data
[PATCH Coroutines]Pickup more CO_AWAIT_EXPR expanding cases
Hi, This simple patch actually is a missing part of previous one at https://gcc.gnu.org/ml/gcc-patches/2020-01/msg01451.html It picks up more CO_AWAIT_EXPR expanding cases. Test is also added. Bootstrap and test on x86_64. Is it OK? Thanks, bin gcc/cp 2020-02-05 Bin Cheng * coroutines.cc (co_await_expander): Handle more CO_AWAIT_EXPR expanding cases. gcc/testsuite 2020-02-05 Bin Cheng * g++.dg/coroutines/co-await-syntax-10.C: New test. 0001-Pickup-more-CO_AWAIT_EXPR-expanding-cases.patch Description: Binary data
[PATCH Coroutines][OBVIOUS]Fix typo of temporary variable name
Hi,

This is an obvious patch fixing a typo in maybe_promote_captured_temps: previously, the same vnum (== 0) was used for all captured temps... Will commit later.

Thanks,
bin

gcc/cp
2020-02-05  Bin Cheng

        * coroutines.cc (maybe_promote_captured_temps): Increase the index
        number for temporary variables' name.

obvious-vnum.patch
Description: Binary data
Re: [PATCH coroutines] Change lowering behavior of alias variable from copy to substitute
On Thu, Feb 6, 2020 at 5:12 PM Iain Sandoe wrote: > > Hi JunMa, > > JunMa wrote: > > > 在 2020/2/4 下午8:17, JunMa 写道: > >> Hi > >> When testing coroutines with lambda function, I find we copy each captured > >> variable to frame. However, according to gimplify pass, for each > >> declaration > >> that is an alias for another expression(DECL_VALUE_EXPR), we can > >> substitute them directly. > >> > >> Since lambda captured variables is one of this kind. It is better to > >> replace them > >> rather than copy them, This can reduce frame size (all of the captured > >> variables > >> are field of closure class) and avoid extra copy behavior as well. > >> > >> This patch remove all of the code related to copy captured variable. > >> Instead, we first rewrite DECL_VALUE_EXPR with frame field, then we check > >> every variable whether it has DECL_VALUE_EXPR, and then substitute it, > >> this > >> patch does not create frame field for such variables. > >> > >> Bootstrap and test on X86_64, is it OK? > > > minor update: only handle var_decl when iterate BIND_EXPR_VARS > > in register_local_var_uses. > > Do you have any other local patches applied along with this? > > Testing locally (on Darwin), I see regressions with optimisation O2/O3/Os > e.g: > > class-05-lambda-capture-copy-local.C -O2 (internal compiler error) > class-06-lambda-capture-ref.C -O2 (internal compiler error) > lambda-05-capture-copy-local.C -O2 (internal compiler error) > lambda-06-multi-capture.C -O2 (internal compiler error) > lambda-07-multi-capture.C -O2 (internal compiler error) > lambda-08-co-ret-parm-ref.C -O3 -g (internal compiler error) > > I have applied this to master, and on top of the patches posted by you and > Bin, but the results are the same. Hi Iains, Thanks for helping. Yes, there will be another patch fixing the O2/O3 issues. Will send it out for review soon. Thanks, bin > > thanks > Iain > > >> gcc/cp > >> 2020-02-04 Jun Ma > >> > >> * coroutines.cc (morph_fn_to_coro): Remove code related to > >> copy captured variable. > >> (register_local_var_uses): Ditto. > >> (register_param_uses): Collect use of parameters inside > >> DECL_VALUE_EXPR. > >> (transform_local_var_uses): Substitute the alias variable > >> with DECL_VALUE_EXPR if it has one. > >> > >> > >> gcc/testsuite > >> 2020-02-04 Jun Ma > >> > >> * g++.dg/coroutines/lambda-07-multi-capture.C: New test. > > > > > > <0001-fix-alias-variable.patch> > >
Re: [PATCH Coroutines]Pickup more CO_AWAIT_EXPR expanding cases
Hi, We found more ICEs because of unexpanded CO_AWAIT_EXPR, it turned out we can fix these issues with more simplification in function co_await_expander. Here is the patch with a new test. Bootstrap and test on x86_64. Is it OK? Thanks, bin gcc/cp 2020-02-10 Bin Cheng * coroutines.cc (co_await_expander): Simplify. gcc/testsuite 2020-02-10 Bin Cheng * g++.dg/coroutines/co-await-syntax-10.C: New test. * g++.dg/coroutines/co-await-syntax-11.C: New test. -- Sender:bin.cheng Sent At:2020 Feb. 5 (Wed.) 09:34 Recipient:GCC Patches Cc:Iain Sandoe ; Nathan Sidwell Subject:[PATCH Coroutines]Pickup more CO_AWAIT_EXPR expanding cases Hi, This simple patch actually is a missing part of previous one at https://gcc.gnu.org/ml/gcc-patches/2020-01/msg01451.html It picks up more CO_AWAIT_EXPR expanding cases. Test is also added. Bootstrap and test on x86_64. Is it OK? Thanks, bin 0002-Simplify-co_await_expander.patch Description: Binary data
Re: [PATCH Coroutines]Insert the default return_void call at correct position
Ping. Thanks, bin On Mon, Feb 3, 2020 at 1:55 PM bin.cheng wrote: > > Hi, > > Exception in coroutine is not correctly handled because the default > return_void call is now inserted before the finish suspend point, > rather than at the end of the original coroutine body. This patch > fixes the issue by generating following code: > co_await promise.initial_suspend(); > try { > // The original coroutine body > > promise.return_void(); // The default return_void call. > } catch (...) { > promise.unhandled_exception(); > } > final_suspend: > // ... > > Bootstrap and test on x86_64. Is it OK? > > Thanks, > bin > > gcc/cp > 2020-02-03 Bin Cheng > > * coroutines.cc (build_actor_fn): Factor out code inserting the > default return_void call to... > (morph_fn_to_coro): ...here, also hoist local var declarations. > > gcc/testsuite > 2020-02-03 Bin Cheng > > * g++.dg/coroutines/torture/co-ret-15-default-return_void.C: New.
[PATCH PR92926]Fix wrong code caused by ctor node translation unit wide sharing
Hi, As reported in PR92926, constant ctor is shared translation unit wide because of constexpr_call_table, however, during gimplify, the shared ctor could be modified. This patch fixes the issue by unsharing it before modification in gimplify. A test is reduced from cppcoro library and added. Bootstrap and test ongoing. Not sure if this is the correct fix though, any comments? Thanks, bin 2019-12-13 Bin Cheng PR tree-optimization/92926 * gimplify.c (gimplify_init_constructor): Unshare ctor node before clearing. gcc/testsuite 2019-12-13 Bin Cheng PR tree-optimization/92926 * g++.dg/pr92926.C: New test. 0001-pr92926.patch Description: Binary data
Re: [PATCH PR92926]Fix wrong code caused by ctor node translation unit wide sharing
On Fri, Dec 13, 2019 at 11:26 AM bin.cheng wrote: > > Hi, > > As reported in PR92926, constant ctor is shared translation unit wide because > of constexpr_call_table, > however, during gimplify, the shared ctor could be modified. This patch > fixes the issue by unsharing > it before modification in gimplify. A test is reduced from cppcoro library > and added. > > Bootstrap and test ongoing. Not sure if this is the correct fix though, any > comments? Ping. Any comment? Thanks, bin > > Thanks, > bin > > 2019-12-13 Bin Cheng > > PR tree-optimization/92926 > * gimplify.c (gimplify_init_constructor): Unshare ctor node before > clearing. > > gcc/testsuite > 2019-12-13 Bin Cheng > > PR tree-optimization/92926 > * g++.dg/pr92926.C: New test.
[PATCH GCC11]Improve uninitialized warning with value range info
Hi, Function use_pred_not_overlap_with_undef_path_pred of pass_late_warn_uninitialized checks if predicate of variable use overlaps with predicate of undefined control flow path. For now, it only checks ssa_var comparing against constant, this can be improved where ssa_var compares against another ssa_var with value range info, as described in comment: + /* Check value range info of rhs, do following transforms: + flag_var < [min, max] -> flag_var < max + flag_var > [min, max] -> flag_var > min + +We can also transform LE_EXPR/GE_EXPR to LT_EXPR/GT_EXPR: + flag_var <= [min, max] -> flag_var < [min, max+1] + flag_var >= [min, max] -> flag_var > [min-1, max] +if no overflow/wrap. */ This change can avoid some false warning. Bootstrap and test on x86_64, any comment? Thanks, bin 2020-01-08 Bin Cheng * tree-ssa-uninit.c (find_var_cmp_const): New function. (use_pred_not_overlap_with_undef_path_pred): Call above.
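As a small worked example of those transforms (the numbers are chosen only for illustration), suppose the guarding comparison is against a variable j whose value range is known to be [2, 7]:

    i <  j   can be treated as   i < 7     (compare against the range maximum)
    i >  j   can be treated as   i > 2     (compare against the range minimum)
    i <= j   can be treated as   i < 8     (max + 1, when it cannot wrap)
    i >= j   can be treated as   i > 1     (min - 1, when it cannot wrap)

Each rewritten form is weaker than the original comparison, so it is conservative to use it when checking that the use predicate does not overlap with the undefined-path predicate; the point is simply to let the existing constant-comparison machinery handle guards that compare against another SSA name.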
Re: [PATCH GCC11]Improve uninitialized warning with value range info
Sorry, here is the patch. -- Sender:bin.cheng Sent At:2020 Jan. 8 (Wed.) 12:58 Recipient:GCC Patches Subject:[PATCH GCC11]Improve uninitialized warning with value range info Hi, Function use_pred_not_overlap_with_undef_path_pred of pass_late_warn_uninitialized checks if predicate of variable use overlaps with predicate of undefined control flow path. For now, it only checks ssa_var comparing against constant, this can be improved where ssa_var compares against another ssa_var with value range info, as described in comment: + /* Check value range info of rhs, do following transforms: + flag_var < [min, max] -> flag_var < max + flag_var > [min, max] -> flag_var > min + +We can also transform LE_EXPR/GE_EXPR to LT_EXPR/GT_EXPR: + flag_var <= [min, max] -> flag_var < [min, max+1] + flag_var >= [min, max] -> flag_var > [min-1, max] +if no overflow/wrap. */ This change can avoid some false warning. Bootstrap and test on x86_64, any comment? Thanks, bin 2020-01-08 Bin Cheng * tree-ssa-uninit.c (find_var_cmp_const): New function. (use_pred_not_overlap_with_undef_path_pred): Call above. check-uninit_warning-with-vrinfo-20200108.txt Description: Binary data
Re: [RFC] IVOPTs select cand with preferred D-form access
On Tue, Jan 7, 2020 at 6:48 PM Kewen.Lin wrote: > > on 2020/1/7 下午5:14, Richard Biener wrote: > > On Mon, 6 Jan 2020, Kewen.Lin wrote: > > > >> We are thinking whether it can be handled in IVOPTs instead of one RTL > >> pass. > >> > >> During IVOPTs selecting IV cands, it doesn't know the loop will be > >> unrolled so > >> it doesn't count the possible step cost in with X-form. If we can teach > >> it to > >> consider the case, the IV cands which plays with D-form can be preferred. > >> Currently unrolling (incomplete) happens in RTL, it looks we have to > >> predict > >> the loop whether unroll in IVOPTs. Since there is some parameter checks > >> on RTL > >> insn counts and target hooks, it seems not easy to get that. Besides, we > >> need > >> to check the step is valid to put into D-form field (eg: DQ-form requires > >> divide > >> 16 exactly), to ensure no extra ADDIs needed. > >> > >> I'm not sure whether it's a good idea to implement in IVOPTs, but I did > >> some > >> changes in IVOPTs to prove it's doable to get expected codes, the patch is > >> attached. > >> > >> Any comments/suggestions are highly appreiciated! > > > > Is the unrolled code better than the not unrolled code (assuming > > optimal IV choice)? Then IMHO IVOPTs should drive the unrolling, > > either by actually doing it or by forcing it via the loop->unroll > > setting. I don't think second-guessing the RTL unroller at this > > point is going to work. Alternatively turn X-form into D-form during > > RTL unrolling? > > > > Hi Richard, > > Thanks for the comments! > > Yes, unrolled version is better on Power9 for both forms, but D-form unrolled > is better > than X-form unrolled. If we drive unrolling in IVOPTs, not sure it will be a > concern > that IVOPTs becomes too heavy? or too rude with forced UF if imprecise? Do we > still > have the plan to introduce one middle-end unroll pass, does it help if yes? I am a bit worried that would make IVOPTs heavy too, it might be possible to compute heuristics whether loop should be unrolled as a post-IVOPTs transformation. Of course the transformation needs to do more work than simply unrolling in order to take advantage of aforementioned addressing mode. BTW, unrolled loop won't perform as good as ppc if the target doesn't support [base + register + offset] addressing mode? Another point, in case of multiple passes doing unrolling, the "already unrolled" information may need to be recorded as a flag of loop properties. Thanks, bin > The quoted RTL patch is to propose one RTL pass after RTL loop passes, it > also sounds > good to check whether RTL unrolling is a good place! > > > BR, > Kewen >
Re: [PATCH GCC11]Improve uninitialized warning with value range info
On Wed, Jan 8, 2020 at 6:31 PM Richard Biener wrote: > > On Wed, Jan 8, 2020 at 6:01 AM bin.cheng wrote: > > > > Sorry, here is the patch. > > -- > > Sender:bin.cheng > > Sent At:2020 Jan. 8 (Wed.) 12:58 > > Recipient:GCC Patches > > Subject:[PATCH GCC11]Improve uninitialized warning with value range info > > > > > > Hi, > > > > Function use_pred_not_overlap_with_undef_path_pred of > > pass_late_warn_uninitialized > > checks if predicate of variable use overlaps with predicate of undefined > > control flow path. > > For now, it only checks ssa_var comparing against constant, this can be > > improved where > > ssa_var compares against another ssa_var with value range info, as > > described in comment: > > > > + /* Check value range info of rhs, do following transforms: > > + flag_var < [min, max] -> flag_var < max > > + flag_var > [min, max] -> flag_var > min > > + > > +We can also transform LE_EXPR/GE_EXPR to LT_EXPR/GT_EXPR: > > + flag_var <= [min, max] -> flag_var < [min, max+1] > > + flag_var >= [min, max] -> flag_var > [min-1, max] > > +if no overflow/wrap. */ > > > > This change can avoid some false warning. Bootstrap and test on x86_64, > > any comment? > > Definitely a good idea - the refactoring makes the patch hard to > follow though. The > original code seems to pick any (the "first") compare against a constant. You > return the "first" but maybe from range info that might also be > [-INF,+INF]. It seems > that we'd want to pick the best so eventually sort the predicate chain > so that compares > against constants come first at least? Not sure if it really makes a > difference but I don't know either, but I simply tried to not break existing code int the patch. Function prune_uninit_phi_opnds is called for the first compares against constant, actually it should be called for each comparison, but I guess it's just avoiding O(M*N) complexity here. > even currently we could have i < 5, i < 1 so the "better" one later? > It might also make > sense to simply push three predicates for i < j, namely i < j (if ever > useful), i < min(j) > and max(i) < j to avoid repeatedly doing the range computations. IIUC, with current implementation, it's not useful to check value rang info for both sides of comparison because prune_uninit_phi_opnds requires the flag_var be defined by PHI node in the same basic block as PHI parameter. Thanks, bin > > Thanks, > Richard. > > > Thanks, > > bin > > > > 2020-01-08 Bin Cheng > > > > * tree-ssa-uninit.c (find_var_cmp_const): New function. > > (use_pred_not_overlap_with_undef_path_pred): Call above.
Re: [PATCH PR92926]Fix wrong code caused by ctor node translation unit wide sharing
On Fri, Dec 20, 2019 at 3:10 PM Richard Biener wrote: > > On December 20, 2019 2:13:47 AM GMT+01:00, "Bin.Cheng" > wrote: > >On Fri, Dec 13, 2019 at 11:26 AM bin.cheng > > wrote: > >> > >> Hi, > >> > >> As reported in PR92926, constant ctor is shared translation unit wide > >because of constexpr_call_table, > >> however, during gimplify, the shared ctor could be modified. This > >patch fixes the issue by unsharing > >> it before modification in gimplify. A test is reduced from cppcoro > >library and added. > >> > >> Bootstrap and test ongoing. Not sure if this is the correct fix > >though, any comments? > >Ping. Any comment? > > Looks reasonable to me. Given PR92926 is marked as duplicate of PR93143, I updated test case of the patch. Thanks, bin 2019-12-13 Bin Cheng PR tree-optimization/93143 * gimplify.c (gimplify_init_constructor): Unshare ctor node before clearing. gcc/testsuite 2019-12-13 Bin Cheng PR tree-optimization/93143 * g++.dg/pr93143.C: New test. From 77252c3bb41887af1daa9e83615a8aa32dc330f9 Mon Sep 17 00:00:00 2001 From: "bin.cheng" Date: Thu, 9 Jan 2020 14:13:08 +0800 Subject: [PATCH] Fix pr93143. --- gcc/gimplify.c | 2 ++ gcc/testsuite/g++.dg/pr93143.C | 73 ++ 2 files changed, 75 insertions(+) create mode 100644 gcc/testsuite/g++.dg/pr93143.C diff --git a/gcc/gimplify.c b/gcc/gimplify.c index 73fb2e7..55d7a93 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -5083,6 +5083,8 @@ gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, /* Zap the CONSTRUCTOR element list, which simplifies this case. Note that we still have to gimplify, in order to handle the case of variable sized types. Avoid shared tree structures. */ + ctor = unshare_expr (ctor); + TREE_OPERAND (*expr_p, 1) = ctor; CONSTRUCTOR_ELTS (ctor) = NULL; TREE_SIDE_EFFECTS (ctor) = 0; object = unshare_expr (object); diff --git a/gcc/testsuite/g++.dg/pr93143.C b/gcc/testsuite/g++.dg/pr93143.C new file mode 100644 index 000..40710cf --- /dev/null +++ b/gcc/testsuite/g++.dg/pr93143.C @@ -0,0 +1,73 @@ +// { dg-do run } +// { dg-options "-O3 -std=c++14" } + +struct array +{ +constexpr unsigned char operator[](int i) const noexcept +{ +return arr[i]; +} + +unsigned char arr[16]; +}; + + +class v6 { +public: +using bytes_type = array; +constexpr v6(bytes_type const & bytes); +constexpr bool is_loopback() const noexcept; +static constexpr v6 loopback() noexcept +{ +return v6(v6::bytes_type{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1}); +} +private: +bytes_type bytes_; +}; + + + +constexpr v6::v6(bytes_type const & bytes) +: bytes_(bytes) +{} + +constexpr +bool v6::is_loopback() const noexcept +{ +return bytes_[0] == 0 && +bytes_[1] == 0 && +bytes_[2] == 0 && +bytes_[3] == 0 && +bytes_[4] == 0 && +bytes_[5] == 0 && +bytes_[6] == 0 && +bytes_[7] == 0 && +bytes_[8] == 0 && +bytes_[9] == 0 && +bytes_[10] == 0 && +bytes_[11] == 0 && +bytes_[12] == 0 && +bytes_[13] == 0 && +bytes_[14] == 0 && +bytes_[15] == 1; +} + +void v6_type() +{ +[[maybe_unused]] constexpr auto loopback = v6::loopback(); +} + +int main() +{ +v6_type(); + +constexpr auto a = v6::loopback(); +if (!a.is_loopback()) +__builtin_abort(); + +auto b = v6::loopback(); +if (!b.is_loopback()) +__builtin_abort(); + +return 0; +} -- 1.8.3.1
Re: [PATCH GCC11]Improve uninitialized warning with value range info
On Wed, Jan 8, 2020 at 7:50 PM Richard Biener wrote: > > On Wed, Jan 8, 2020 at 12:30 PM Bin.Cheng wrote: > > > > On Wed, Jan 8, 2020 at 6:31 PM Richard Biener > > wrote: > > > > > > On Wed, Jan 8, 2020 at 6:01 AM bin.cheng > > > wrote: > > > > > > > > Sorry, here is the patch. > > > > -- > > > > Sender:bin.cheng > > > > Sent At:2020 Jan. 8 (Wed.) 12:58 > > > > Recipient:GCC Patches > > > > Subject:[PATCH GCC11]Improve uninitialized warning with value range info > > > > > > > > > > > > Hi, > > > > > > > > Function use_pred_not_overlap_with_undef_path_pred of > > > > pass_late_warn_uninitialized > > > > checks if predicate of variable use overlaps with predicate of > > > > undefined control flow path. > > > > For now, it only checks ssa_var comparing against constant, this can be > > > > improved where > > > > ssa_var compares against another ssa_var with value range info, as > > > > described in comment: > > > > > > > > + /* Check value range info of rhs, do following transforms: > > > > + flag_var < [min, max] -> flag_var < max > > > > + flag_var > [min, max] -> flag_var > min > > > > + > > > > +We can also transform LE_EXPR/GE_EXPR to LT_EXPR/GT_EXPR: > > > > + flag_var <= [min, max] -> flag_var < [min, max+1] > > > > + flag_var >= [min, max] -> flag_var > [min-1, max] > > > > +if no overflow/wrap. */ > > > > > > > > This change can avoid some false warning. Bootstrap and test on > > > > x86_64, any comment? > > > > > > Definitely a good idea - the refactoring makes the patch hard to > > > follow though. The > > > original code seems to pick any (the "first") compare against a constant. > > > You > > > return the "first" but maybe from range info that might also be > > > [-INF,+INF]. It seems > > > that we'd want to pick the best so eventually sort the predicate chain > > > so that compares > > > against constants come first at least? Not sure if it really makes a > > > difference but > > I don't know either, but I simply tried to not break existing code int > > the patch. > > Function prune_uninit_phi_opnds is called for the first compares against > > constant, actually it should be called for each comparison, but I guess it's > > just avoiding O(M*N) complexity here. > > Yeah. I'm just worried finding a "bad" value-range predicate cuts the search > in a way that causes extra bogus warnings? hmm, the code is now as: + cmp_code = find_var_cmp_const (preds, phi, &flag_def, &boundary_cst, false); + if (cmp_code == ERROR_MARK) +cmp_code = find_var_cmp_const (preds, phi, &flag_def, &boundary_cst, true); + if (cmp_code == ERROR_MARK) return false; First call to find_var_cmp_const preserves the original behavior, while the second call only finds value range case if there is no comparison cont is found by the first call. So there is no extra bogus warnings? > > > > > > even currently we could have i < 5, i < 1 so the "better" one later? > > > It might also make > > > sense to simply push three predicates for i < j, namely i < j (if ever > > > useful), i < min(j) > > > and max(i) < j to avoid repeatedly doing the range computations. > > IIUC, with current implementation, it's not useful to check value rang > > info for both sides of comparison because prune_uninit_phi_opnds > > requires the flag_var be defined by PHI node in the same basic block > > as PHI parameter. > > Yes, but without remembering the code very well my suggestion allows > "new" predicates to be gathered during collecting phase while your patch > adjusts the query phase? I am not quite follow here. 
Do you mean we collect three cases "i < j", "i < min(j)", "max(i) < j" then call prune_uninit_phi_opnds for all three conditions? This is another question? because now we simply break out of for loop for finding such condition: - if ((gimple_code (flag_def) == GIMPLE_PHI) - && (gimple_bb (flag_def) == gimple_bb (phi)) - && find_matching_predicate_in_rest_chains (the_pred, preds, -num_preds)) - break; It's always possible that this flag_def can't prune use predicates against undefined path predicates, while a later one can prune but is skipped? Thanks, bin > > Richard. > > > Thanks, > > bin > > > > > > Thanks, > > > Richard. > > > > > > > Thanks, > > > > bin > > > > > > > > 2020-01-08 Bin Cheng > > > > > > > > * tree-ssa-uninit.c (find_var_cmp_const): New function. > > > > (use_pred_not_overlap_with_undef_path_pred): Call above.
RE: [PING][PATCH ARM]Extend thumb1_reorg to save more comparison instructions
Ping^2 Thanks. bin -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of bin.cheng Sent: Thursday, August 08, 2013 4:51 PM To: gcc-patches@gcc.gnu.org Cc: Richard Earnshaw Subject: [PING][PATCH ARM]Extend thumb1_reorg to save more comparison instructions Ping this patch, http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01057.html Thanks. bin
[PATCH ARM]Refine scaled address expression on ARM
Hi, This patch refines scaled address expression on ARM. It supports "base+index*scale" in arm_legitimate_address_outer_p. It also tries to legitimize "base + index * scale + offset" with "reg <- base + offset; reg + index * scale" by introducing thumb2_legitimize_address. For now function thumb2_legitimize_address is a kind of placeholder and just does the mentioned transformation by calling to try_multiplier_address. Hoping we can improve it in the future. With this patch: 1) "base+index*scale" is recognized. 2) PR57540 is fixed. Bootstrapped and Tested on A15. Is it OK? Thanks. Bin 2013-08-28 Bin Cheng * config/arm/arm.c (arm_legitimate_address_outer_p): Support addressing mode like "base + index * scale". (try_multiplier_address): New function. (arm_legitimize_address): Call try_multiplier_address. Index: gcc/config/arm/arm.c === --- gcc/config/arm/arm.c(revision 200774) +++ gcc/config/arm/arm.c(working copy) @@ -5931,7 +5931,9 @@ arm_legitimate_address_outer_p (enum machine_mode && arm_legitimate_index_p (mode, xop1, outer, strict_p)) || (!strict_p && will_be_in_index_register (xop1 || (arm_address_register_rtx_p (xop1, strict_p) - && arm_legitimate_index_p (mode, xop0, outer, strict_p))); + && arm_legitimate_index_p (mode, xop0, outer, strict_p)) + || (arm_address_register_rtx_p (xop0, strict_p) + && arm_legitimate_index_p (mode, xop1, outer, strict_p))); } #if 0 @@ -6652,6 +6654,106 @@ legitimize_tls_address (rtx x, rtx reg) } } +/* Try to find address expression like base + index * scale + offset + in X. If we find one, force base + offset into register and + construct new expression reg + index * scale; return the new + address expression if it's valid. Otherwise return X. */ +static rtx +try_multiplier_address (rtx x, enum machine_mode mode ATTRIBUTE_UNUSED) +{ + rtx tmp, base_reg, new_rtx; + rtx base = NULL_RTX, index = NULL_RTX, scale = NULL_RTX, offset = NULL_RTX; + + gcc_assert (GET_CODE (x) == PLUS); + + /* Try to find and record base/index/scale/offset in X. */ + if (GET_CODE (XEXP (x, 1)) == MULT) +{ + tmp = XEXP (x, 0); + index = XEXP (XEXP (x, 1), 0); + scale = XEXP (XEXP (x, 1), 1); + if (GET_CODE (tmp) != PLUS) + return x; + + base = XEXP (tmp, 0); + offset = XEXP (tmp, 1); +} + else +{ + tmp = XEXP (x, 0); + offset = XEXP (x, 1); + if (GET_CODE (tmp) != PLUS) + return x; + + base = XEXP (tmp, 0); + scale = XEXP (tmp, 1); + if (GET_CODE (base) == MULT) + { + tmp = base; + base = scale; + scale = tmp; + } + if (GET_CODE (scale) != MULT) + return x; + + index = XEXP (scale, 0); + scale = XEXP (scale, 1); +} + + if (CONST_INT_P (base)) +{ + tmp = base; + base = offset; + offset = tmp; +} + + if (CONST_INT_P (index)) +{ + tmp = index; + index = scale; + scale = tmp; +} + + /* ARM only supports constant scale in address. */ + if (!CONST_INT_P (scale)) +return x; + + if (GET_MODE (base) != SImode || GET_MODE (index) != SImode) +return x; + + /* Only register/constant are allowed in each part. */ + if (!symbol_mentioned_p (base) + && !symbol_mentioned_p (offset) + && !symbol_mentioned_p (index) + && !symbol_mentioned_p (scale)) +{ + /* Force "base+offset" into register and construct +"register+index*scale". Return the new expression +only if it's valid. */ + tmp = gen_rtx_PLUS (SImode, base, offset); + base_reg = force_reg (SImode, tmp); + tmp = gen_rtx_fmt_ee (MULT, SImode, index, scale); + new_rtx = gen_rtx_PLUS (SImode, base_reg, tmp); + return new_rtx; +} + + return x; +} + +/* Try machine-dependent ways of modifying an illegitimate Thumb2 address + to be legitimate. 
If we find one, return the new address. + + TODO: legitimize_address for Thumb2. */ +static rtx +thumb2_legitimize_address (rtx x, rtx orig_x ATTRIBUTE_UNUSED, + enum machine_mode mode) +{ + if (GET_CODE (x) == PLUS) +return try_multiplier_address (x, mode); + + return x; +} + /* Try machine-dependent ways of modifying an illegitimate address to be legitimate. If we find one, return the new, valid address. */ rtx @@ -6659,9 +6761,9 @@ arm_legitimize_address (rtx x, rtx orig_x, enum ma { if (!TARGET_ARM) { - /* TODO: legitimize_address for Thumb2. */ if (TARGET_THUMB2) -return x; + return thumb2_legitimize_address (x, orig_x, mode); + return thumb_legitimize_address (x, orig_x, mode); } @@ -6673,6 +6775,10 @@ arm_legitimize_address (rtx x, rtx orig_x, enum ma rtx xop0 = XEXP (x, 0);
[PATCH ARM/Embedded-4_8-branch]disable rtl loop invariant when optimizing for size
Hi, The attached patch disables rtl loop invariant when optimizing for code size. Committed to ARM/Embedded-4_8-branch as r202067. Thanks. bin 2013-08-29 Zhenqiang Chen * config/arm/arm.c (arm_option_override): Disable loop2_invariant pass when optimize_size and ira-loop-pressure is not enabled. Index: gcc/ChangeLog.arm === --- gcc/ChangeLog.arm (revision 202066) +++ gcc/ChangeLog.arm (revision 202067) @@ -1,3 +1,8 @@ +2013-08-29 Zhenqiang Chen + + * config/arm/arm.c (arm_option_override): Disable loop2_invariant + pass when optimize_size and ira-loop-pressure is not enabled. + 2013-08-05 Terry Guo Backport from mainline r197956 Index: gcc/config/arm/arm.c === --- gcc/config/arm/arm.c(revision 202066) +++ gcc/config/arm/arm.c(revision 202067) @@ -2134,6 +2134,13 @@ global_options.x_param_values, global_options_set.x_param_values); + /* Do not move invariants out of loops since it tends to increase register + pressure. The heuristic to estimate register pressure does not fit for + ARM. -fira-loop-pressure tends to get more precise estimation. But it + still need more tuning. */ + if (optimize_function_for_size_p (cfun) && !flag_ira_loop_pressure) +flag_move_loop_invariants = 0; + /* Register global variables with the garbage collector. */ arm_add_gc_roots (); }
[PATCH GCC]Catch more MEM_REFs sharing common addressing part in gimple strength reduction
Hi, The gimple-ssa-strength-reduction pass handles CAND_REFs in order to find different MEM_REFs sharing common part in addressing expression. If such MEM_REFs are found, the pass rewrites MEM_REFs, and produces more efficient addressing expression during the RTL passes. The pass analyzes addressing expression in each MEM_REF to see if it can be formalized as follows: base:MEM_REF (T1, C1) offset: MULT_EXPR (PLUS_EXPR (T2, C2), C3) bitpos: C4 * BITS_PER_UNIT Then restructures it into below form: MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)), C1 + (C2 * C3) + C4) At last, rewrite the MEM_REFs if there are two or more sharing common (non-constant) part. The problem is it doesn't back trace T2. If T2 is recorded as a CAND_ADD in form of "T2' + C5", the MEM_REF should be restructure into: MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2', C3)), C1 + (C2 * C3) + C4 + (C5 * C3)) The patch also includes a test case to illustrate the problem. Bootstrapped and tested on x86/x86_64/arm-a15, is it ok? Thanks. bin 2013-09-02 Bin Cheng * gimple-ssa-strength-reduction.c (backtrace_base_for_ref): New. (restructure_reference): Call backtrace_base_for_ref. gcc/testsuite/ChangeLog 2013-09-02 Bin Cheng * gcc.dg/tree-ssa/slsr-39.c: New test.Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-39.c === --- gcc/testsuite/gcc.dg/tree-ssa/slsr-39.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-39.c (revision 0) @@ -0,0 +1,26 @@ +/* Verify straight-line strength reduction for back-tracing + CADN_ADD for T2 in: + +*PBASE:T1 +*POFFSET: MULT_EXPR (T2, C3) +*PINDEX: C1 + (C2 * C3) + C4 */ + +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-slsr" } */ + +typedef int arr_2[50][50]; + +void foo (arr_2 a2, int v1) +{ + int i, j; + + i = v1 + 5; + j = i; + a2 [i] [j++] = i; + a2 [i] [j++] = i; + a2 [i] [i-1] += 1; + return; +} + +/* { dg-final { scan-tree-dump-times "MEM" 4 "slsr" } } */ +/* { dg-final { cleanup-tree-dump "slsr" } } */ Index: gcc/gimple-ssa-strength-reduction.c === --- gcc/gimple-ssa-strength-reduction.c (revision 202067) +++ gcc/gimple-ssa-strength-reduction.c (working copy) @@ -750,6 +750,57 @@ slsr_process_phi (gimple phi, bool speed) add_cand_for_stmt (phi, c); } +/* Given PBASE which is a pointer to tree, loop up the defining + statement for it and check whether the candidate is in the + form of: + + X = B + (1 * S), S is integer constant + X = B + (i * S), S is integer one + + If so, set PBASE to the candiate's base_expr and return double + int (i * S). + Otherwise, just return double int zero. */ + +static double_int +backtrace_base_for_ref (tree *pbase) +{ + tree base_in = *pbase; + slsr_cand_t base_cand; + + STRIP_NOPS (base_in); + if (TREE_CODE (base_in) != SSA_NAME) +return tree_to_double_int (integer_zero_node); + + base_cand = base_cand_from_table (base_in); + + while (base_cand && base_cand->kind != CAND_PHI) +{ + if (base_cand->kind == CAND_ADD + && base_cand->index.is_one () + && TREE_CODE (base_cand->stride) == INTEGER_CST) + { + /* X = B + (1 * S), S is integer constant. */ + *pbase = base_cand->base_expr; + return tree_to_double_int (base_cand->stride); + } + else if (base_cand->kind == CAND_ADD + && TREE_CODE (base_cand->stride) == INTEGER_CST + && integer_onep (base_cand->stride)) +{ + /* X = B + (i * S), S is integer one. 
*/ + *pbase = base_cand->base_expr; + return base_cand->index; + } + + if (base_cand->next_interp) + base_cand = lookup_cand (base_cand->next_interp); + else + base_cand = NULL; +} + + return tree_to_double_int (integer_zero_node); +} + /* Look for the following pattern: *PBASE:MEM_REF (T1, C1) @@ -767,8 +818,15 @@ slsr_process_phi (gimple phi, bool speed) *PBASE:T1 *POFFSET: MULT_EXPR (T2, C3) -*PINDEX: C1 + (C2 * C3) + C4 */ +*PINDEX: C1 + (C2 * C3) + C4 + When T2 is recorded by an CAND_ADD in the form of (T2' + C5), It + will be further restructured to: + +*PBASE:T1 +*POFFSET: MULT_EXPR (T2', C3) +*PINDEX: C1 + (C2 * C3) + C4 + (C5 * C3) */ + static bool restructure_reference (tree *pbase, tree *poffset, double_int *pindex, tree *ptype) @@ -777,7 +835,7 @@ restructure_reference (tree *pbase, tree *poffset, double_int index = *pindex; double_int bpu = double_int::from_uhwi (BITS_PER_UNIT); tree mult_op0, mult_op1, t1, t2, type; - double_int c1, c2, c3, c4; + double_int c1, c2, c3, c4, c5; if (!base || !offset @@ -823,11 +881,12 @@ restructure_reference (tree *pbase, tree *poffset, } c4 = index.udiv (bp
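To make the back-tracing rule concrete, here is a worked instance with made-up constants. Suppose the candidate table records T2 = t + 3 (so T2' = t and C5 = 3), and the reference has C1 = 0, C2 = 2, C3 = 4, C4 = 1, i.e. the address being analyzed is T1 + C1 + (T2 + C2) * C3 + C4 = T1 + (t + 5) * 4 + 1 = T1 + 4*t + 21. Without back-tracing, restructure_reference yields base T1, offset (t + 3) * 4, index 0 + 2*4 + 1 = 9, so two references that differ only in that folded "+ 3" keep distinct non-constant offset expressions. With back-tracing, the same address becomes base T1, offset t * 4, index 9 + 3*4 = 21, and references derived from t now share the common offset expression t * 4.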
[PATCH GCC]Find auto-increment use both before and after candidate's increment in IVOPT
Hi, For now set_autoinc_for_original_candidates only searches auto-inc uses before candidate's increment, causing pre-increment opportunities missed for original candidates. This is a straightforward fix by searching after candidate's increment too. The patch also includes a test case to illustrate the problem. Without the patch, assembly of the test is: foo: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. movwr3, #:lower16:__ctype_ptr__ ldrbr2, [r0]@ zero_extendqisi2 movtr3, #:upper16:__ctype_ptr__ ldr r1, [r3] addsr3, r1, r2 ldrbr3, [r3, #1]@ zero_extendqisi2 lslsr3, r3, #29 bmi .L2 addsr3, r0, #1 .L3: mov r0, r3 addsr3, r3, #1 ldrbr2, [r0]@ zero_extendqisi2 add r2, r2, r1 ldrbr2, [r2, #1]@ zero_extendqisi2 lslsr2, r2, #29 bpl .L3 .L2: bx lr .size foo, .-foo Which can be optimized into below: foo: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. movwr3, #:lower16:__ctype_ptr__ ldrbr1, [r0]@ zero_extendqisi2 movtr3, #:upper16:__ctype_ptr__ ldr r2, [r3] addsr3, r2, r1 ldrbr3, [r3, #1]@ zero_extendqisi2 lslsr1, r3, #29 bmi .L2 .L3: ldrbr3, [r0, #1]! @ zero_extendqisi2 add r3, r3, r2 ldrbr3, [r3, #1]@ zero_extendqisi2 lslsr3, r3, #29 bpl .L3 .L2: bx lr .size foo, .-foo Bootstrapped and tested on arm a15, is it OK? Thanks. bin 2013-09-02 Bin Cheng * tree-ssa-loop-ivopts.c (set_autoinc_for_original_candidates): Find auto-increment use both before and after candidate. gcc/testsuite/ChangeLog 2013-09-02 Bin Cheng * gcc.target/arm/ivopts-orig_biv-inc.c: New test.Index: gcc/testsuite/gcc.target/arm/ivopts-orig_biv-inc.c === --- gcc/testsuite/gcc.target/arm/ivopts-orig_biv-inc.c (revision 0) +++ gcc/testsuite/gcc.target/arm/ivopts-orig_biv-inc.c (revision 0) @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-ivopts-details" } */ +/* { dg-skip-if "" { arm_thumb1 } } */ + +extern char *__ctype_ptr__; + +unsigned char * foo(unsigned char *ReadPtr) +{ + + unsigned char c; + + while (!(((__ctype_ptr__+sizeof(""[*ReadPtr]))[(int)(*ReadPtr)])&04) == (!(0))) + ReadPtr++; + + return ReadPtr; +} + +/* { dg-final { scan-tree-dump-times "original biv" 2 "ivopts"} } */ +/* { dg-final { cleanup-tree-dump "ivopts" } } */ Index: gcc/tree-ssa-loop-ivopts.c === --- gcc/tree-ssa-loop-ivopts.c (revision 200774) +++ gcc/tree-ssa-loop-ivopts.c (working copy) @@ -4876,22 +4876,36 @@ set_autoinc_for_original_candidates (struct ivopts for (i = 0; i < n_iv_cands (data); i++) { struct iv_cand *cand = iv_cand (data, i); - struct iv_use *closest = NULL; + struct iv_use *closest_before = NULL; + struct iv_use *closest_after = NULL; if (cand->pos != IP_ORIGINAL) continue; + for (j = 0; j < n_iv_uses (data); j++) { struct iv_use *use = iv_use (data, j); unsigned uid = gimple_uid (use->stmt); - if (gimple_bb (use->stmt) != gimple_bb (cand->incremented_at) - || uid > gimple_uid (cand->incremented_at)) + + if (gimple_bb (use->stmt) != gimple_bb (cand->incremented_at)) continue; - if (closest == NULL || uid > gimple_uid (closest->stmt)) - closest = use; + + if (uid < gimple_uid (cand->incremented_at) + && (closest_before == NULL + || uid > gimple_uid (closest_before->stmt))) + closest_before = use; + + if (uid > gimple_uid (cand->incremented_at) + && (closest_after == NULL + || uid < gimple_uid (closest_after->stmt))) + closest_after = use; } - if (closest == NULL || !autoinc_possible_for_pair (data, closest, cand)) - continue; - cand->ainc_use = closest; + + if (closest_before != 
NULL + && autoinc_possible_for_pair (data, closest_before, cand)) + cand->ainc_use = closest_before; + else if (closest_after != NULL + && autoinc_possible_for_pair (data, closest_after, cand)) + cand->ainc_use = closest_after; } }
RE: [PATCH ARM]Refine scaled address expression on ARM
> -Original Message- > From: Richard Earnshaw > Sent: Thursday, August 29, 2013 9:06 PM > To: Bin Cheng > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH ARM]Refine scaled address expression on ARM > > On 28/08/13 08:00, bin.cheng wrote: > > Hi, > > > > This patch refines scaled address expression on ARM. It supports > > "base+index*scale" in arm_legitimate_address_outer_p. It also tries > > to legitimize "base + index * scale + offset" with "reg <- base + > > offset; reg > > + index * scale" by introducing thumb2_legitimize_address. For now > > + function > > thumb2_legitimize_address is a kind of placeholder and just does the > > mentioned transformation by calling to try_multiplier_address. Hoping > > we can improve it in the future. > > > > With this patch: > > 1) "base+index*scale" is recognized. > > That's because (PLUS (REG) (MULT (REG) (CONST))) is not canonical form. > So this shouldn't be necessary. Can you identify where this non-canoncial form is being generated? > Oh, for now ivopt constructs "index*scale" to test whether backend supports scaled addressing mode, which is not valid on ARM, so I was going to construct "base + index*scale" instead. Since "base + index * scale" is not canonical form, I will construct the canonical form and drop this part of the patch. Is rest of this patch OK? Thanks. bin
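For reference, the canonical form Richard refers to puts the MULT operand first inside the PLUS. A minimal sketch of building it with the same helpers already used in the patch above (reg1 and reg2 are placeholder pseudo registers, scale 4 is arbitrary):

  rtx scale = gen_int_mode (4, SImode);
  rtx index = gen_rtx_fmt_ee (MULT, SImode, reg2, scale);  /* (mult (reg) (const_int 4)) */
  rtx addr  = gen_rtx_PLUS (SImode, index, reg1);          /* (plus (mult ...) (reg)): canonical */

whereas gen_rtx_PLUS (SImode, reg1, index) would produce the non-canonical (plus (reg) (mult ...)) form discussed here.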
Re: [PATCH][RFC] Move IVOPTs closer to RTL expansion
On Wed, Sep 4, 2013 at 5:20 PM, Richard Biener wrote: > > The patch below moves IVOPTs out of the GIMPLE loop pipeline more > closer to RTL expansion. That's done for multiple reasons. > > First, the loop passes that at the moment preceede IVOPTs leave > around IL that is in desparate need of basic re-optimization > like CSE, constant propagation and DCE. That puts extra load > on IVOPTs and its cost model, increasing compile-time and > possibly confusing it. > > Second, IVOPTs introduces lowered memory accesses that it > expects to stay as is, likewise it produces auto-inc/dec > sequences that it expects to stay as is until RTL expansion. > Passes such as DOM can break this expectation and make the > work done by IVOPTs a waste. > > I remember doing this excercise in the GCC 4.3 timeframe where > benchmarking on x86_64 showed no gains or losses (but x86_64 > isn't very sensitive to IV choices). > > Any help with benchmarking this on targets other than x86_64 > is appreciated (I'll re-do x86_64). > > Bootstrapped and tested on x86_64-unknown-linux-gnu. > > General comments are of course also welcome. > > Thanks, > Richard. > > 2013-09-04 Richard Biener > > * passes.def: Move IVOPTs before final DCE pass. > * tree-ssa-loop.c (tree_ssa_loop_ivopts): Adjust for being > outside of the loop pipeline. > > * gcc.dg/tree-ssa/ivopts-3.c: Scan non-details dump. > * gcc.dg/tree-ssa/reassoc-19.c: Be more permissive. > > Index: gcc/passes.def > === > *** gcc/passes.def.orig 2013-09-04 10:57:33.0 +0200 > --- gcc/passes.def 2013-09-04 11:11:27.535952665 +0200 > *** along with GCC; see the file COPYING3. > *** 221,227 > NEXT_PASS (pass_complete_unroll); > NEXT_PASS (pass_slp_vectorize); > NEXT_PASS (pass_loop_prefetch); > - NEXT_PASS (pass_iv_optimize); > NEXT_PASS (pass_lim); > NEXT_PASS (pass_tree_loop_done); > POP_INSERT_PASSES () > --- 221,226 > *** along with GCC; see the file COPYING3. > *** 237,242 > --- 236,246 > opportunities. */ > NEXT_PASS (pass_phi_only_cprop); > NEXT_PASS (pass_vrp); > + /* IVOPTs lowers memory accesses and exposes auto-inc/dec > + opportunities. Run it after the above passes cleaned up > +the loop optimized IL but before DCE as IVOPTs generates > +quite some garbage. */ > + NEXT_PASS (pass_iv_optimize); > NEXT_PASS (pass_cd_dce); > NEXT_PASS (pass_tracer); > > Index: gcc/tree-ssa-loop.c > === > *** gcc/tree-ssa-loop.c.orig2013-09-04 10:57:32.0 +0200 > --- gcc/tree-ssa-loop.c 2013-09-04 11:11:27.536952677 +0200 > *** make_pass_loop_prefetch (gcc::context *c > *** 906,915 > static unsigned int > tree_ssa_loop_ivopts (void) > { > ! if (number_of_loops (cfun) <= 1) > ! return 0; > > - tree_ssa_iv_optimize (); > return 0; > } > > --- 906,924 > static unsigned int > tree_ssa_loop_ivopts (void) > { > ! loop_optimizer_init (LOOPS_NORMAL > ! | LOOPS_HAVE_RECORDED_EXITS); > ! > ! if (number_of_loops (cfun) > 1) > ! { > ! rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa); > ! scev_initialize (); > ! tree_ssa_iv_optimize (); > ! scev_finalize (); > ! } > ! > ! loop_optimizer_finalize (); > > return 0; > } > > Index: gcc/testsuite/gcc.dg/tree-ssa/ivopts-3.c > === > *** gcc/testsuite/gcc.dg/tree-ssa/ivopts-3.c.orig 2013-09-04 > 10:57:33.0 +0200 > --- gcc/testsuite/gcc.dg/tree-ssa/ivopts-3.c2013-09-04 11:11:27.559952952 > +0200 > *** > *** 1,5 > /* { dg-do compile } */ > ! /* { dg-options "-O2 -fdump-tree-ivopts-details" } */ > > void main (void) > { > --- 1,5 > /* { dg-do compile } */ > ! 
/* { dg-options "-O2 -fdump-tree-ivopts" } */ > > void main (void) > { > *** void main (void) > *** 8,12 > f2 (); > } > > ! /* { dg-final { scan-tree-dump-times "!= 0" 5 "ivopts" } } */ > /* { dg-final { cleanup-tree-dump "ivopts" } } */ > --- 8,12 > f2 (); > } > > ! /* { dg-final { scan-tree-dump-times "!= 0" 1 "ivopts" } } */ > /* { dg-final { cleanup-tree-dump "ivopts" } } */ > Index: gcc/testsuite/gcc.dg/tree-ssa/reassoc-19.c > === > *** gcc/testsuite/gcc.dg/tree-ssa/reassoc-19.c.orig 2012-12-18 > 14:24:58.0 +0100 > --- gcc/testsuite/gcc.dg/tree-ssa/reassoc-19.c 2013-09-04 11:13:30.895416700 > +0200 > *** void foo(char* left, char* rite, int ele > *** 16,22 > } > } > > ! /* { dg-final { scan-tree-dump-times "= \\\(sizetype\\\) element" 1 > "optimized" } } */
Re: [PATCH][RFC] Move IVOPTs closer to RTL expansion
On Mon, Sep 9, 2013 at 10:01 AM, Bin.Cheng wrote: > On Wed, Sep 4, 2013 at 5:20 PM, Richard Biener wrote: >> >> The patch below moves IVOPTs out of the GIMPLE loop pipeline more >> closer to RTL expansion. That's done for multiple reasons. >> >> First, the loop passes that at the moment preceede IVOPTs leave >> around IL that is in desparate need of basic re-optimization >> like CSE, constant propagation and DCE. That puts extra load >> on IVOPTs and its cost model, increasing compile-time and >> possibly confusing it. >> >> Second, IVOPTs introduces lowered memory accesses that it >> expects to stay as is, likewise it produces auto-inc/dec >> sequences that it expects to stay as is until RTL expansion. >> Passes such as DOM can break this expectation and make the >> work done by IVOPTs a waste. >> >> I remember doing this excercise in the GCC 4.3 timeframe where >> benchmarking on x86_64 showed no gains or losses (but x86_64 >> isn't very sensitive to IV choices). >> >> Any help with benchmarking this on targets other than x86_64 >> is appreciated (I'll re-do x86_64). I will bootstrap and make check on arm a15. -- Best Regards.
RE: [PATCH GCC]Catch more MEM_REFs sharing common addressing part in gimple strength reduction
Thanks for reviewing, I will correct all stupid spelling problem in the next version of patch. On Mon, Sep 9, 2013 at 8:15 AM, Bill Schmidt wrote: > >>>+ int (i * S). >>>+ Otherwise, just return double int zero. */ > > This is sufficient, since you are properly checking the next_interp > chain. Another possible form would be > > X = (B + i) * 1, > > but if this is present, then one of the forms you're checking for should > also be present, so there's no need to check the MULT_CANDs. I'm not very sure here since I didn't check MULT_CAND in the patch. Could you please explain more about this? > >>>+ >>>+static double_int >>>+backtrace_base_for_ref (tree *pbase) >>>+{ >>>+ tree base_in = *pbase; >>>+ slsr_cand_t base_cand; >>>+ >>>+ STRIP_NOPS (base_in); >>>+ if (TREE_CODE (base_in) != SSA_NAME) >>>+return tree_to_double_int (integer_zero_node); >>>+ >>>+ base_cand = base_cand_from_table (base_in); >>>+ >>>+ while (base_cand && base_cand->kind != CAND_PHI) >>>+{ >>>+ if (base_cand->kind == CAND_ADD >>>+ && base_cand->index.is_one () >>>+ && TREE_CODE (base_cand->stride) == INTEGER_CST) >>>+ { >>>+ /* X = B + (1 * S), S is integer constant. */ >>>+ *pbase = base_cand->base_expr; >>>+ return tree_to_double_int (base_cand->stride); >>>+ } >>>+ else if (base_cand->kind == CAND_ADD >>>+&& TREE_CODE (base_cand->stride) == INTEGER_CST >>>+&& integer_onep (base_cand->stride)) >>>+{ >>>+ /* X = B + (i * S), S is integer one. */ >>>+ *pbase = base_cand->base_expr; >>>+ return base_cand->index; >>>+ } >>>+ >>>+ if (base_cand->next_interp) >>>+ base_cand = lookup_cand (base_cand->next_interp); >>>+ else >>>+ base_cand = NULL; >>>+} >>>+ >>>+ return tree_to_double_int (integer_zero_node); >>>+} >>>+ >>> /* Look for the following pattern: >>> >>> *PBASE:MEM_REF (T1, C1) >>>@@ -767,8 +818,15 @@ slsr_process_phi (gimple phi, bool speed) >>> >>> *PBASE:T1 >>> *POFFSET: MULT_EXPR (T2, C3) >>>-*PINDEX: C1 + (C2 * C3) + C4 */ >>>+*PINDEX: C1 + (C2 * C3) + C4 >>> >>>+ When T2 is recorded by an CAND_ADD in the form of (T2' + C5), It > ^ ^ > a it > >>>+ will be further restructured to: >>>+ >>>+*PBASE:T1 >>>+*POFFSET: MULT_EXPR (T2', C3) >>>+*PINDEX: C1 + (C2 * C3) + C4 + (C5 * C3) */ >>>+ >>> static bool >>> restructure_reference (tree *pbase, tree *poffset, double_int > *pindex, >>> tree *ptype) >>>@@ -777,7 +835,7 @@ restructure_reference (tree *pbase, tree *poffset, >>> double_int index = *pindex; >>> double_int bpu = double_int::from_uhwi (BITS_PER_UNIT); >>> tree mult_op0, mult_op1, t1, t2, type; >>>- double_int c1, c2, c3, c4; >>>+ double_int c1, c2, c3, c4, c5; >>> >>> if (!base >>> || !offset >>>@@ -823,11 +881,12 @@ restructure_reference (tree *pbase, tree > *poffset, >>> } >>> >>> c4 = index.udiv (bpu, FLOOR_DIV_EXPR); >>>+ c5 = backtrace_base_for_ref (&t2); >>> >>> *pbase = t1; >>>- *poffset = fold_build2 (MULT_EXPR, sizetype, t2, >>>- double_int_to_tree (sizetype, c3)); >>>- *pindex = c1 + c2 * c3 + c4; >>>+ *poffset = size_binop (MULT_EXPR, fold_convert (sizetype, t2), >>>+ double_int_to_tree (sizetype, c3)); > > I am not sure why you changed this call. fold_build2 is a more > efficient call than size_binop. size_binop makes several checks that > will fail in this case, and then calls fold_build2_loc, right? Not a > big deal but seems like changing it back would be better. Perhaps I'm > missing something (as usual ;). I rely on size_binop to convert T2 into sizetype, because T2' may be in other kind of type. Otherwise there will be ssa_verify error later. Thanks. bin
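Concretely, Bill's suggestion amounts to a call of roughly this shape (shown here only for clarity; the actual change is in the follow-up version of the patch):

  *poffset = fold_build2 (MULT_EXPR, sizetype,
                          fold_convert (sizetype, t2),
                          double_int_to_tree (sizetype, c3));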
Re: [PATCH][RFC] Move IVOPTs closer to RTL expansion
On Tue, Sep 10, 2013 at 3:30 AM, Steven Bosscher wrote: > On Mon, Sep 9, 2013 at 10:01 AM, Richard Biener wrote: >>> >> First, the loop passes that at the moment preceede IVOPTs leave >>> >> around IL that is in desparate need of basic re-optimization >>> >> like CSE, constant propagation and DCE. That puts extra load >>> >> on IVOPTs and its cost model, increasing compile-time and >>> >> possibly confusing it. > > So why not just run DCE and DOM just before IVOPTs? > > >>> >> Second, IVOPTs introduces lowered memory accesses that it >>> >> expects to stay as is, likewise it produces auto-inc/dec >>> >> sequences that it expects to stay as is until RTL expansion. >>> >> Passes such as DOM can break this expectation and make the >>> >> work done by IVOPTs a waste. > > But IVOPTs leaves its own messy code behind (see below). > > >>> >> I remember doing this excercise in the GCC 4.3 timeframe where >>> >> benchmarking on x86_64 showed no gains or losses (but x86_64 >>> >> isn't very sensitive to IV choices). >>> >> >>> >> Any help with benchmarking this on targets other than x86_64 >>> >> is appreciated (I'll re-do x86_64). > > Targets like POWER and ARM would be interesting to test on. Yes, I will benchmark on ARM 32bit. > > >> We already run LIM twice, moving the one that is currently after >> IVOPTs as well should be easy. But of course as you note IVOPTs >> may introduce loop invariant code it also may introduce full >> redundancies in the way it re-writes IVs. And for both people may >> claim that we have both CSE and LIM on the RTL level, too. > > I would claim that relying on RTL (G)CSE and RTL LIM is a step in the > wrong direction. You end up creating a lot of garbage RTL, and many > transformations that are easy on GIMPLE cannot be performed anymore on > RTL. > > Is it possible to make IVOPTs clean up after itself? It should be easy > for IVOPTs to notice that it creates loop-invariant code, and position It would be even better to have some loop-invariant sensitive re-association, because IVOPT (by using general tree association) splits loop-invariant expression into different gimple, making lim's life hard. > it on the loop pre-header edge. I suppose full redundancies are > harder, but I would expect that to happen less frequently (the only > situation I can think of right now is where a loop is rewritten with > two IVs where the two IVs share a common sub-expression). > > Ciao! > Steven -- Best Regards.
RE: [PATCH GCC]Catch more MEM_REFs sharing common addressing part in gimple strength reduction
On Mon, Sep 9, 2013 at 11:35 PM, Bill Schmidt wrote: > >> > I rely on size_binop to convert T2 into sizetype, because T2' may be in >> > other kind of type. Otherwise there will be ssa_verify error later. >> >> OK, I see now. I had thought this was handled by fold_build2, but >> apparently not. I guess all T2's formerly handled were already sizetype >> as expected. Thanks for the explanation! > > So, wouldn't it suffice to change t2 to fold_convert (sizetype, t2) in > the argument list to fold_build2? It's picking nits, but that would be > slightly more efficient. Hi Bill, This is the 2nd version of patch with your comments incorporated. Bootstrap and re-test on x86. Re-test on ARM ongoing. Is it ok if tests pass? Thanks. binIndex: gcc/testsuite/gcc.dg/tree-ssa/slsr-39.c === --- gcc/testsuite/gcc.dg/tree-ssa/slsr-39.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-39.c (revision 0) @@ -0,0 +1,26 @@ +/* Verify straight-line strength reduction for back-tracing + CAND_ADD for T2 in: + +*PBASE:T1 +*POFFSET: MULT_EXPR (T2, C3) +*PINDEX: C1 + (C2 * C3) + C4 */ + +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-slsr" } */ + +typedef int arr_2[50][50]; + +void foo (arr_2 a2, int v1) +{ + int i, j; + + i = v1 + 5; + j = i; + a2 [i] [j++] = i; + a2 [i] [j++] = i; + a2 [i] [i-1] += 1; + return; +} + +/* { dg-final { scan-tree-dump-times "MEM" 4 "slsr" } } */ +/* { dg-final { cleanup-tree-dump "slsr" } } */ Index: gcc/gimple-ssa-strength-reduction.c === --- gcc/gimple-ssa-strength-reduction.c (revision 202067) +++ gcc/gimple-ssa-strength-reduction.c (working copy) @@ -750,6 +750,57 @@ slsr_process_phi (gimple phi, bool speed) add_cand_for_stmt (phi, c); } +/* Given PBASE which is a pointer to tree, look up the defining + statement for it and check whether the candidate is in the + form of: + + X = B + (1 * S), S is integer constant + X = B + (i * S), S is integer one + + If so, set PBASE to the candidate's base_expr and return double + int (i * S). + Otherwise, just return double int zero. */ + +static double_int +backtrace_base_for_ref (tree *pbase) +{ + tree base_in = *pbase; + slsr_cand_t base_cand; + + STRIP_NOPS (base_in); + if (TREE_CODE (base_in) != SSA_NAME) +return tree_to_double_int (integer_zero_node); + + base_cand = base_cand_from_table (base_in); + + while (base_cand && base_cand->kind != CAND_PHI) +{ + if (base_cand->kind == CAND_ADD + && base_cand->index.is_one () + && TREE_CODE (base_cand->stride) == INTEGER_CST) + { + /* X = B + (1 * S), S is integer constant. */ + *pbase = base_cand->base_expr; + return tree_to_double_int (base_cand->stride); + } + else if (base_cand->kind == CAND_ADD + && TREE_CODE (base_cand->stride) == INTEGER_CST + && integer_onep (base_cand->stride)) +{ + /* X = B + (i * S), S is integer one. 
*/ + *pbase = base_cand->base_expr; + return base_cand->index; + } + + if (base_cand->next_interp) + base_cand = lookup_cand (base_cand->next_interp); + else + base_cand = NULL; +} + + return tree_to_double_int (integer_zero_node); +} + /* Look for the following pattern: *PBASE:MEM_REF (T1, C1) @@ -767,8 +818,15 @@ slsr_process_phi (gimple phi, bool speed) *PBASE:T1 *POFFSET: MULT_EXPR (T2, C3) -*PINDEX: C1 + (C2 * C3) + C4 */ +*PINDEX: C1 + (C2 * C3) + C4 + When T2 is recorded by a CAND_ADD in the form of (T2' + C5), it + will be further restructured to: + +*PBASE:T1 +*POFFSET: MULT_EXPR (T2', C3) +*PINDEX: C1 + (C2 * C3) + C4 + (C5 * C3) */ + static bool restructure_reference (tree *pbase, tree *poffset, double_int *pindex, tree *ptype) @@ -777,7 +835,7 @@ restructure_reference (tree *pbase, tree *poffset, double_int index = *pindex; double_int bpu = double_int::from_uhwi (BITS_PER_UNIT); tree mult_op0, mult_op1, t1, t2, type; - double_int c1, c2, c3, c4; + double_int c1, c2, c3, c4, c5; if (!base || !offset @@ -823,11 +881,12 @@ restructure_reference (tree *pbase, tree *poffset, } c4 = index.udiv (bpu, FLOOR_DIV_EXPR); + c5 = backtrace_base_for_ref (&t2); *pbase = t1; - *poffset = fold_build2 (MULT_EXPR, sizetype, t2, + *poffset = fold_build2 (MULT_EXPR, sizetype, fold_convert (sizetype, t2), double_int_to_tree (sizetype, c3)); - *pindex = c1 + c2 * c3 + c4; + *pindex = c1 + c2 * c3 + c4 + c5 * c3; *ptype = type; return true;
RE: [PATCH GCC]Catch more MEM_REFs sharing common addressing part in gimple strength reduction
On Tue, Sep 10, 2013 at 9:30 PM, Bill Schmidt wrote: > > > On Tue, 2013-09-10 at 15:41 +0800, bin.cheng wrote: >> On Mon, Sep 9, 2013 at 11:35 PM, Bill Schmidt >> wrote: >> > >> >> > I rely on size_binop to convert T2 into sizetype, because T2' may be in >> >> > other kind of type. Otherwise there will be ssa_verify error later. >> >> >> >> OK, I see now. I had thought this was handled by fold_build2, but >> >> apparently not. I guess all T2's formerly handled were already sizetype >> >> as expected. Thanks for the explanation! >> > >> > So, wouldn't it suffice to change t2 to fold_convert (sizetype, t2) in >> > the argument list to fold_build2? It's picking nits, but that would be >> > slightly more efficient. >> >> Hi Bill, >> >> This is the 2nd version of patch with your comments incorporated. >> Bootstrap and re-test on x86. Re-test on ARM ongoing. Is it ok if tests >> pass? > > Looks good to me! Thanks, Bin. > Sorry, I have to put this patch on hold since it causes several test failures on ARM. Will investigate it and get back ASAP. Thanks. bin
RE: [PING][PATCH ARM]Extend thumb1_reorg to save more comparison instructions
Ping ^ 3 -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of bin.cheng Sent: Friday, August 23, 2013 3:23 PM To: gcc-patches@gcc.gnu.org Cc: Richard Earnshaw Subject: RE: [PING][PATCH ARM]Extend thumb1_reorg to save more comparison instructions Ping^2 Thanks. bin -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of bin.cheng Sent: Thursday, August 08, 2013 4:51 PM To: gcc-patches@gcc.gnu.org Cc: Richard Earnshaw Subject: [PING][PATCH ARM]Extend thumb1_reorg to save more comparison instructions Ping this patch, http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01057.html Thanks. bin
RE: [PATCH ARM]Extend thumb1_reorg to save more comparison instructions
> -Original Message- > From: Richard Earnshaw > Sent: Thursday, September 12, 2013 11:24 PM > To: Bin Cheng > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH ARM]Extend thumb1_reorg to save more comparison > instructions > > On 18/04/13 06:34, Bin Cheng wrote: > > Sorry for the delay, I've been trying to get my head around this one. > > > thumb1_reorg-20130417.txt > > > > > > Index: gcc/config/arm/arm.c > > > == > = > > --- gcc/config/arm/arm.c(revision 197562) > > +++ gcc/config/arm/arm.c(working copy) > > @@ -14026,6 +14026,7 @@ thumb1_reorg (void) > >rtx set, dest, src; > >rtx pat, op0; > >rtx prev, insn = BB_END (bb); > > + bool insn_clobbered = false; > > > >while (insn != BB_HEAD (bb) && DEBUG_INSN_P (insn)) > > insn = PREV_INSN (insn); > > @@ -14034,12 +14035,29 @@ thumb1_reorg (void) > >if (INSN_CODE (insn) != CODE_FOR_cbranchsi4_insn) > > continue; > > > > - /* Find the first non-note insn before INSN in basic block BB. */ > > + /* Get the register with which we are comparing. */ > > + pat = PATTERN (insn); > > + op0 = XEXP (XEXP (SET_SRC (pat), 0), 0); > > + > > + /* Find the first flag setting insn before INSN in basic block > > + BB. */ > >gcc_assert (insn != BB_HEAD (bb)); > > - prev = PREV_INSN (insn); > > - while (prev != BB_HEAD (bb) && (NOTE_P (prev) || DEBUG_INSN_P > (prev))) > > - prev = PREV_INSN (prev); > > + for (prev = PREV_INSN (insn); > > + (!insn_clobbered > > + && prev != BB_HEAD (bb) > > + && (NOTE_P (prev) > > + || DEBUG_INSN_P (prev) > > + || (GET_CODE (prev) == SET > > This can't be right. prev is an insn of some form, so the test that it is a SET will > always fail. > > What you need to do here is to initialize 'set' to null before the loop and then > have something like > > || ((set = single_set (prev)) != NULL > > > + && get_attr_conds (prev) == CONDS_NOCOND))); > > + prev = PREV_INSN (prev)) > > + { > > + if (reg_set_p (op0, prev)) > > + insn_clobbered = true; > > + } > > > > + /* Skip if op0 is clobbered by insn other than prev. */ > > + if (insn_clobbered) > > + continue; > > + > >set = single_set (prev); > > This now becomes redundant and ... > > >if (!set) > > continue; > > This will be based on the set you extracted above. > Hi Richard, here is the updated patch according to your comments. Tested on thumb1, please review. Thanks. binIndex: gcc/config/arm/arm.c === --- gcc/config/arm/arm.c(revision 202599) +++ gcc/config/arm/arm.c(working copy) @@ -14265,9 +14265,10 @@ thumb1_reorg (void) FOR_EACH_BB (bb) { - rtx set, dest, src; - rtx pat, op0; + rtx dest, src; + rtx pat, op0, set = NULL; rtx prev, insn = BB_END (bb); + bool insn_clobbered = false; while (insn != BB_HEAD (bb) && DEBUG_INSN_P (insn)) insn = PREV_INSN (insn); @@ -14276,13 +14277,29 @@ thumb1_reorg (void) if (INSN_CODE (insn) != CODE_FOR_cbranchsi4_insn) continue; - /* Find the first non-note insn before INSN in basic block BB. */ + /* Get the register with which we are comparing. */ + pat = PATTERN (insn); + op0 = XEXP (XEXP (SET_SRC (pat), 0), 0); + + /* Find the first flag setting insn before INSN in basic block BB. 
*/ gcc_assert (insn != BB_HEAD (bb)); - prev = PREV_INSN (insn); - while (prev != BB_HEAD (bb) && (NOTE_P (prev) || DEBUG_INSN_P (prev))) - prev = PREV_INSN (prev); + for (prev = PREV_INSN (insn); + (!insn_clobbered + && prev != BB_HEAD (bb) + && (NOTE_P (prev) + || DEBUG_INSN_P (prev) + || ((set = single_set (prev)) != NULL + && get_attr_conds (prev) == CONDS_NOCOND))); + prev = PREV_INSN (prev)) + { + if (reg_set_p (op0, prev)) + insn_clobbered = true; + } - set = single_set (prev); + /* Skip if op0 is clobbered by insn other than prev. */ + if (insn_clobbered) + continue; + if (!set) continue; @@ -14292,12 +14309,9 @@ thumb1_reorg (void) || !low_register_operand (src, SImode)) continue; - pat = PATTERN (insn); - op0 = XEXP (XEXP (SET_SRC (pat), 0), 0); /* Rewrite move into subtract of 0 if its operand is compared with ZERO -in INSN. Don't need to check dest since cprop_hardreg pass propagates -src into INSN. */ - if (REGNO (op0) == REGNO (src)) +in INSN. Both src and dest of the move insn are checked. */ + if (REGNO (op0) == REGNO (src) || REGNO (op0) == REGNO (dest)) { dest = copy_rtx (dest); src = copy_rtx (src)
RE: [PATCH GCC]Catch more MEM_REFs sharing common addressing part in gimple strength reduction
> -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of bin.cheng > Sent: Thursday, September 12, 2013 2:13 PM > To: 'Bill Schmidt'; Yufeng Zhang; Yufeng Zhang > Cc: Richard Biener; GCC Patches > Subject: RE: [PATCH GCC]Catch more MEM_REFs sharing common addressing > part in gimple strength reduction > > > On Tue, Sep 10, 2013 at 9:30 PM, Bill Schmidt > wrote: > > > > > > On Tue, 2013-09-10 at 15:41 +0800, bin.cheng wrote: > >> On Mon, Sep 9, 2013 at 11:35 PM, Bill Schmidt > wrote: > >> > > >> >> > I rely on size_binop to convert T2 into sizetype, because T2' may be > in other kind of type. Otherwise there will be ssa_verify error later. > >> >> > >> >> OK, I see now. I had thought this was handled by fold_build2, but > >> >> apparently not. I guess all T2's formerly handled were already > >> >> sizetype as expected. Thanks for the explanation! > >> > > >> > So, wouldn't it suffice to change t2 to fold_convert (sizetype, t2) > >> > in the argument list to fold_build2? It's picking nits, but that > >> > would be slightly more efficient. > >> > >> Hi Bill, > >> > >> This is the 2nd version of patch with your comments incorporated. > >> Bootstrap and re-test on x86. Re-test on ARM ongoing. Is it ok if tests > pass? > > > > Looks good to me! Thanks, Bin. > > > > Sorry I have to hold on this patch since it causes several tests failed on > ARM. > Will investigate it and get back ASAP. > The reported failure is false alarm and happens on trunk too. I must have compared wrong testing results. Since there is no regression and the patch is approved before, I will apply it to trunk. Thanks. bin
RE: [PATCH GCC]Catch more MEM_REFs sharing common addressing part in gimple strength reduction
> -Original Message- > From: Dominique Dhumieres [mailto:domi...@lps.ens.fr] > Sent: Wednesday, September 18, 2013 1:47 AM > To: gcc-patches@gcc.gnu.org > Cc: hjl.to...@gmail.com; Bin Cheng > Subject: Re: [PATCH GCC]Catch more MEM_REFs sharing common > addressing part in gimple strength reduction > > The new test gcc.dg/tree-ssa/slsr-39.c fails in 64 bit mode (see > http://gcc.gnu.org/ml/gcc-regression/2013-09/msg00455.html ). > Looking for MEM in the dump returns > > _12 = MEM[(int[50] *)_17]; > MEM[(int[50] *)_20] = _13; > Thanks for reporting, I think this can be fixed by patch: http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00761.html Thanks. bin
RE: [PATCH ARM]Refine scaled address expression on ARM
> -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of bin.cheng > Sent: Monday, September 02, 2013 3:09 PM > To: Richard Earnshaw > Cc: gcc-patches@gcc.gnu.org > Subject: RE: [PATCH ARM]Refine scaled address expression on ARM > > > > > -Original Message- > > From: Richard Earnshaw > > Sent: Thursday, August 29, 2013 9:06 PM > > To: Bin Cheng > > Cc: gcc-patches@gcc.gnu.org > > Subject: Re: [PATCH ARM]Refine scaled address expression on ARM > > > > On 28/08/13 08:00, bin.cheng wrote: > > > Hi, > > > > > > This patch refines scaled address expression on ARM. It supports > > > "base+index*scale" in arm_legitimate_address_outer_p. It also tries > > > to legitimize "base + index * scale + offset" with "reg <- base + > > > offset; reg > > > + index * scale" by introducing thumb2_legitimize_address. For now > > > + function > > > thumb2_legitimize_address is a kind of placeholder and just does the > > > mentioned transformation by calling to try_multiplier_address. > > > Hoping we can improve it in the future. > > > > > > With this patch: > > > 1) "base+index*scale" is recognized. > > > > That's because (PLUS (REG) (MULT (REG) (CONST))) is not canonical form. > > So this shouldn't be necessary. Can you identify where this > non-canoncial form is being generated? > > > > Oh, for now ivopt constructs "index*scale" to test whether backend > supports scaled addressing mode, which is not valid on ARM, so I was going > to construct "base + index*scale" instead. Since "base + index * scale" is not > canonical form, I will construct the canonical form and drop this part of the > patch. > > Is rest of this patch OK? > Hi Richard, I removed the part over which you concerned and created this updated patch. Is it OK? Thanks. bin 2013-09-18 Bin Cheng * config/arm/arm.c (try_multiplier_address): New function. (thumb2_legitimize_address): New function. (arm_legitimize_address): Call try_multiplier_address and thumb2_legitimize_address.Index: gcc/config/arm/arm.c === --- gcc/config/arm/arm.c(revision 200774) +++ gcc/config/arm/arm.c(working copy) @@ -6652,6 +6654,106 @@ legitimize_tls_address (rtx x, rtx reg) } } +/* Try to find address expression like base + index * scale + offset + in X. If we find one, force base + offset into register and + construct new expression reg + index * scale; return the new + address expression if it's valid. Otherwise return X. */ +static rtx +try_multiplier_address (rtx x, enum machine_mode mode ATTRIBUTE_UNUSED) +{ + rtx tmp, base_reg, new_rtx; + rtx base = NULL_RTX, index = NULL_RTX, scale = NULL_RTX, offset = NULL_RTX; + + gcc_assert (GET_CODE (x) == PLUS); + + /* Try to find and record base/index/scale/offset in X. */ + if (GET_CODE (XEXP (x, 1)) == MULT) +{ + tmp = XEXP (x, 0); + index = XEXP (XEXP (x, 1), 0); + scale = XEXP (XEXP (x, 1), 1); + if (GET_CODE (tmp) != PLUS) + return x; + + base = XEXP (tmp, 0); + offset = XEXP (tmp, 1); +} + else +{ + tmp = XEXP (x, 0); + offset = XEXP (x, 1); + if (GET_CODE (tmp) != PLUS) + return x; + + base = XEXP (tmp, 0); + scale = XEXP (tmp, 1); + if (GET_CODE (base) == MULT) + { + tmp = base; + base = scale; + scale = tmp; + } + if (GET_CODE (scale) != MULT) + return x; + + index = XEXP (scale, 0); + scale = XEXP (scale, 1); +} + + if (CONST_INT_P (base)) +{ + tmp = base; + base = offset; + offset = tmp; +} + + if (CONST_INT_P (index)) +{ + tmp = index; + index = scale; + scale = tmp; +} + + /* ARM only supports constant scale in address. 
*/ + if (!CONST_INT_P (scale)) +return x; + + if (GET_MODE (base) != SImode || GET_MODE (index) != SImode) +return x; + + /* Only register/constant are allowed in each part. */ + if (!symbol_mentioned_p (base) + && !symbol_mentioned_p (offset) + && !symbol_mentioned_p (index) + && !symbol_mentioned_p (scale)) +{ + /* Force "base+offset" into register and construct +"register+index*scale". Return the new expression +only if it's valid. */ + tmp = gen_rtx_PLUS (SImode, base, offset); + base_reg = force_reg (SImode, tmp); + tmp = gen_rtx_fmt_ee (MULT, SImode, index, scale); + new_rtx = gen_rtx_PLUS (SImode, base_reg, tmp); + ret
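For illustration, the kind of source-level address this transformation targets can arise from something as simple as a field access in an array of structs. The example below is hypothetical (it is not from the patch or its tests); assuming 4-byte int and natural alignment, &p[i].y is p + i*8 + 4, i.e. base + index*scale + offset, which the patch rewrites as reg = p + 4; reg + i*8.

struct pair { int x; int y; };

int
sum_y (struct pair *p, int n)
{
  int i, s = 0;
  for (i = 0; i < n; i++)
    s += p[i].y;
  return s;
}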
[PATCH]Construct canonical scaled address expression in IVOPT
Hi, For now IVOPT constructs scaled address expression in the form of "scaled*index" and checks whether backend supports it. The problem is the address expression is invalid on ARM, causing scaled expression disabled in IVOPT on ARM. This patch fixes the IVOPT part by constructing rtl address expression like "index*scaled+base". Hi Richard, I thought about the suggestion constructing TARGET_MEM[.] and adding new target hook to check whether backend supports such target memory accesses, but still want to give this patch a try because: 1) RTL pattern "index*scaled+base" is some kind of canonical form of scaled address expression and it works fine. 2) It won't save us any inconvenience by constructing TARGET_MEM node, on contrary, we have to add new target hook checking whether scaled addressing mode is supported, which in essence is nothing else than current implementation. Also "base+index*scaled" is re-structured to canonical form "index*scaled+base", I constructed the latter form in patch. Bootstrapped and tested on x86_64 and arm_a15. Is it OK? Thanks. bin 2013-09-20 Bin Cheng * tree-ssa-loop-ivopts.c (multiplier_allowed_in_address_p): Construct canonical scaled rtl address expression.Index: gcc/tree-ssa-loop-ivopts.c === --- gcc/tree-ssa-loop-ivopts.c (revision 202599) +++ gcc/tree-ssa-loop-ivopts.c (working copy) @@ -3134,15 +3134,22 @@ multiplier_allowed_in_address_p (HOST_WIDE_INT rat { enum machine_mode address_mode = targetm.addr_space.address_mode (as); rtx reg1 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1); - rtx addr; + rtx addr, index; HOST_WIDE_INT i; valid_mult = sbitmap_alloc (2 * MAX_RATIO + 1); bitmap_clear (valid_mult); - addr = gen_rtx_fmt_ee (MULT, address_mode, reg1, NULL_RTX); + /* Construct address expression in the canonical form of +"base+index*scale" and call memory_address_addr_space_p +to see whether it is allowed by target processors. */ + index = gen_rtx_fmt_ee (MULT, address_mode, reg1, NULL_RTX); for (i = -MAX_RATIO; i <= MAX_RATIO; i++) { - XEXP (addr, 1) = gen_int_mode (i, address_mode); + if (i == 1) + continue; + + XEXP (index, 1) = gen_int_mode (i, address_mode); + addr = gen_rtx_fmt_ee (PLUS, address_mode, index, reg1); if (memory_address_addr_space_p (mode, addr, as)) bitmap_set_bit (valid_mult, i + MAX_RATIO); }
RE: [PATCH]Construct canonical scaled address expression in IVOPT
> -Original Message- > From: Richard Biener [mailto:richard.guent...@gmail.com] > Sent: Monday, September 23, 2013 8:08 PM > To: Bin Cheng > Cc: GCC Patches > Subject: Re: [PATCH]Construct canonical scaled address expression in IVOPT > > On Fri, Sep 20, 2013 at 12:00 PM, bin.cheng wrote: > > Hi, > > For now IVOPT constructs scaled address expression in the form of > > "scaled*index" and checks whether backend supports it. The problem is > > the address expression is invalid on ARM, causing scaled expression > > disabled in IVOPT on ARM. This patch fixes the IVOPT part by > > constructing rtl address expression like "index*scaled+base". > > - addr = gen_rtx_fmt_ee (MULT, address_mode, reg1, NULL_RTX); > + /* Construct address expression in the canonical form of > + "base+index*scale" and call memory_address_addr_space_p to see > whether > + it is allowed by target processors. */ > + index = gen_rtx_fmt_ee (MULT, address_mode, reg1, NULL_RTX); >for (i = -MAX_RATIO; i <= MAX_RATIO; i++) > { > - XEXP (addr, 1) = gen_int_mode (i, address_mode); > + if (i == 1) > +continue; > + > + XEXP (index, 1) = gen_int_mode (i, address_mode); addr = > + gen_rtx_fmt_ee (PLUS, address_mode, index, reg1); >if (memory_address_addr_space_p (mode, addr, as)) > bitmap_set_bit (valid_mult, i + MAX_RATIO); > > so you now build reg1*scale+reg1 - which checks if offset and scale work at > the same time (and with the same value - given this is really reg1*(scale+1)). > It might be that this was implicitely assumed (or that no targets allow scale > but no offset), but it's a change that requires audit of behavior on more > targets. So literally, function multiplier_allowed_in_address_p should check whether multiplier is allowed in any kind of addressing mode, like [reg*scale + offset] and [reg*scale + reg]. Apparently it's infeasible to check every possibility for each architecture, is it ok we at least check "index", then "addr" if "index" is failed? By "any kind of addressing modes", I mean modes supported by function get_address_cost, i.e., in form of "[base] + [off] + [var] + (reg|reg*scale)". > > The above also builds more RTX waste which you can fix by re-using the PLUS > by building it up-front similar to the multiplication. You also miss the Yes, this can be fixed. > opportunity to have scale == 1 denote as to whether reg1 + reg2 is valid. I > would expect that many targets support reg1 * scale + constant-offset but > not many reg1 * scale + reg2. I thought scale==1 is unnecessary because the addressing mode degrades into "reg" or "reg+reg". Moreover, calls of multiplier_allowed_in_address_p in both get_address_cost and get_computation_cost_at have scale other than 1. > > So no, the helper now checks sth completely different. What's the problem > with arm supporting reg1 * scale? Why shouldn't it being able to handle the > implicit zero offset? As Richard clarified, ARM does not support scaled addressing mode without base register. Thanks. bin
[PATCH]Fix computation of offset in ivopt
Hi, This patch fix two minor bugs when computing offset in IVOPT. 1) Considering below example: #define MAX 100 struct tag { int i; int j; } struct tag arr[MAX] int foo (int len) { int i = 0; for (; i < len; i++) { access arr[i].j; } } Without this patch, the offset computed by strip_offset_1 for address arr[i].j is ZERO, which is apparently not. 2) Considering below example: //... : KeyIndex_66 = KeyIndex_194 + 4294967295; if (KeyIndex_66 != 0) goto ; else goto ; : : # KeyIndex_194 = PHI _62 = KeyIndex_194 + 1073741823; _63 = _62 * 4; _64 = pretmp_184 + _63; _65 = *_64; if (_65 == 0) goto ; else goto ; //... There are iv use and candidate like: use 1 address in statement _65 = *_64; at position *_64 type handletype * base pretmp_184 + ((sizetype) KeyIndex_180 + 1073741823) * 4 step 4294967292 base object (void *) pretmp_184 related candidates candidate 6 var_before ivtmp.16 var_after ivtmp.16 incremented before use 1 type unsigned int base (unsigned int) (pretmp_184 + (sizetype) KeyIndex_180 * 4) step 4294967292 base object (void *) pretmp_184 Candidate 6 is related to use 1 In function get_computation_cost_at for use 1 using cand 6, ubase and cbase are: pretmp_184 + ((sizetype) KeyIndex_180 + 1073741823) * 4 pretmp_184 + (sizetype) KeyIndex_180 * 4 The cstepi computed in HOST_WIDE_INT is : 0xfffc, while offset computed in TYPE(utype) is : 0xfffc. Though they both stand for value "-4" in different precision, statement "offset -= ratio * cstepi" returns 0x1, which is wrong. Tested on x86_64 and arm. Is it OK? Thanks. bin 2013-09-24 Bin Cheng * tree-ssa-loop-ivopts.c (strip_offset_1): Count DECL_FIELD_BIT_OFFSET when computing offset for COMPONENT_REF. (get_computation_cost_at): Sign extend offset.Index: gcc/tree-ssa-loop-ivopts.c === --- gcc/tree-ssa-loop-ivopts.c (revision 200774) +++ gcc/tree-ssa-loop-ivopts.c (working copy) @@ -2133,19 +2133,28 @@ strip_offset_1 (tree expr, bool inside_addr, bool break; case COMPONENT_REF: - if (!inside_addr) - return orig_expr; + { + tree field; + HOST_WIDE_INT boffset = 0; + if (!inside_addr) + return orig_expr; - tmp = component_ref_field_offset (expr); - if (top_compref - && cst_and_fits_in_hwi (tmp)) - { - /* Strip the component reference completely. */ - op0 = TREE_OPERAND (expr, 0); - op0 = strip_offset_1 (op0, inside_addr, top_compref, &off0); - *offset = off0 + int_cst_value (tmp); - return op0; - } + field = TREE_OPERAND (expr, 1); + if (DECL_FIELD_BIT_OFFSET (field) + && cst_and_fits_in_hwi (DECL_FIELD_BIT_OFFSET (field))) + boffset = int_cst_value (DECL_FIELD_BIT_OFFSET (field)); + + tmp = component_ref_field_offset (expr); + if (top_compref + && cst_and_fits_in_hwi (tmp)) + { + /* Strip the component reference completely. */ + op0 = TREE_OPERAND (expr, 0); + op0 = strip_offset_1 (op0, inside_addr, top_compref, &off0); + *offset = off0 + int_cst_value (tmp) + boffset / BITS_PER_UNIT; + return op0; + } + } break; case ADDR_EXPR: @@ -4133,6 +4142,9 @@ get_computation_cost_at (struct ivopts_data *data, bitmap_clear (*depends_on); } + /* Sign-extend offset if utype has lower precision than HOST_WIDE_INT. */ + offset = sext_hwi (offset, TYPE_PRECISION (utype)); + /* If we are after the increment, the value of the candidate is higher by one iteration. */ if (stmt_is_after_inc)
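To spell out the second problem with explicit widths (taking ratio as 1 for illustration): both values stand for -4, but in different precisions.

  cstepi (HOST_WIDE_INT, 64 bits) = 0xfffffffffffffffc
  offset (utype, 32 bits)         = 0xfffffffc

  "offset -= ratio * cstepi" evaluated in 64 bits without sign extension:
    0xfffffffc - 0xfffffffffffffffc = 0x100000000   (wrong; should be 0)

  after sext_hwi (offset, TYPE_PRECISION (utype)), offset = 0xfffffffffffffffc:
    0xfffffffffffffffc - 0xfffffffffffffffc = 0     (as intended)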
Re: [PATCH]Construct canonical scaled address expression in IVOPT
On Tue, Sep 24, 2013 at 6:12 PM, Richard Biener wrote: > On Tue, Sep 24, 2013 at 8:20 AM, bin.cheng wrote: >> >> >>> -Original Message- > > Or even [reg*scale] (not sure about that). But yes, at least reg*scale + > offset > and reg*scale + reg. > >> Apparently it's infeasible to check every >> possibility for each architecture, is it ok we at least check "index", then >> "addr" if "index" is failed? By "any kind of addressing modes", I mean >> modes supported by function get_address_cost, i.e., in form of "[base] + >> [off] + [var] + (reg|reg*scale)". > > I suppose so, but it of course depends on what IVOPTs uses the answer > for in the end. Appearantly it doesn't distinguish between the various cases > even though TARGET_MEM_REF does support all the variants in question > (reg * scale, reg * scale + reg, reg * scale + const, reg * scale + > reg + const). > > So the better answer may be to teach the costs about the differences? Ideally, IVOPT should be aware whether scaling is allowed in every kind of addressing modes and account cost of multiplier accordingly. For current code, there are two scenarios here 1) If target supports reg*scale+reg, but not reg*scale, in this case, IVOPT considers multiplier is not allowed in any addressing mode and account multiplier with high cost. This is the problem arm having. 2) If target supports reg*scale, but not some kind of addressing mode (saying reg*scale+reg), in this case, IVOPT still constructs various scaled addressing mode in get_address_cost and depends on address_cost to compute correct cost for that addressing expression. I think this happens to work even IVOPT doesn't know "reg*scale+reg" is actually not supported. > >>> The above also builds more RTX waste which you can fix by re-using the >> PLUS >>> by building it up-front similar to the multiplication. You also miss the >> Yes, this can be fixed. >> >>> opportunity to have scale == 1 denote as to whether reg1 + reg2 is valid. >> I >>> would expect that many targets support reg1 * scale + constant-offset but >>> not many reg1 * scale + reg2. >> I thought scale==1 is unnecessary because the addressing mode degrades into >> "reg" or "reg+reg". Moreover, calls of multiplier_allowed_in_address_p in >> both get_address_cost and get_computation_cost_at have scale other than 1. > > Ok. > >>> >>> So no, the helper now checks sth completely different. What's the problem >>> with arm supporting reg1 * scale? Why shouldn't it being able to handle >> the >>> implicit zero offset? >> >> As Richard clarified, ARM does not support scaled addressing mode without >> base register. > > I see. > Also from the newer comments: > Btw, it should be reasonably possible to compute the whole > multiplier_allowed_in_address_p table for all primary and secondary archs > (simply build cross-cc1) and compare the results before / after a patch > candidate. Querying both reg * scale and reg + reg * scale if the first > fails sounds like a good solution to me. I take this as we should do minimal change by checking reg + reg * scale if reg * scale is failed, right? Thanks. bin -- Best Regards.
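A minimal sketch of the query order being converged on here, reusing the names from the multiplier_allowed_in_address_p patch earlier in this thread (this is only an illustration, not the final committed change; reg2 is an extra scratch virtual register introduced for the example):

  rtx reg2 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 2);
  rtx addr_mult = gen_rtx_fmt_ee (MULT, address_mode, reg1, NULL_RTX);
  rtx addr_sum = gen_rtx_fmt_ee (PLUS, address_mode, addr_mult, reg2);
  for (i = -MAX_RATIO; i <= MAX_RATIO; i++)
    {
      /* Share the single MULT rtx; only its constant operand changes.  */
      XEXP (addr_mult, 1) = gen_int_mode (i, address_mode);
      /* Accept the ratio if the bare scaled address is legitimate,
         falling back to the "reg * scale + reg" form.  */
      if (memory_address_addr_space_p (mode, addr_mult, as)
          || memory_address_addr_space_p (mode, addr_sum, as))
        bitmap_set_bit (valid_mult, i + MAX_RATIO);
    }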
RE: [PATCH]Construct canonical scaled address expression in IVOPT
> -Original Message- > From: Richard Biener [mailto:richard.guent...@gmail.com] > Sent: Tuesday, September 24, 2013 7:58 PM > To: Bin.Cheng > Cc: Bin Cheng; GCC Patches; Richard Earnshaw > Subject: Re: [PATCH]Construct canonical scaled address expression in IVOPT > > On Tue, Sep 24, 2013 at 1:40 PM, Bin.Cheng > wrote: > > On Tue, Sep 24, 2013 at 6:12 PM, Richard Biener > > wrote: > >> On Tue, Sep 24, 2013 at 8:20 AM, bin.cheng wrote: > >>> > >>> > >>>> -Original Message- > >> > >> Or even [reg*scale] (not sure about that). But yes, at least > >> reg*scale + offset and reg*scale + reg. > >> > >>> Apparently it's infeasible to check every possibility for each > >>> architecture, is it ok we at least check "index", then "addr" if > >>> "index" is failed? By "any kind of addressing modes", I mean modes > >>> supported by function get_address_cost, i.e., in form of "[base] + > >>> [off] + [var] + (reg|reg*scale)". > >> > >> I suppose so, but it of course depends on what IVOPTs uses the answer > >> for in the end. Appearantly it doesn't distinguish between the > >> various cases even though TARGET_MEM_REF does support all the > >> variants in question (reg * scale, reg * scale + reg, reg * scale + > >> const, reg * scale + reg + const). > >> > >> So the better answer may be to teach the costs about the differences? > > Ideally, IVOPT should be aware whether scaling is allowed in every > > kind of addressing modes and account cost of multiplier accordingly. > > For current code, there are two scenarios here > > 1) If target supports reg*scale+reg, but not reg*scale, in this case, > > IVOPT considers multiplier is not allowed in any addressing mode and > > account multiplier with high cost. This is the problem arm having. > > 2) If target supports reg*scale, but not some kind of addressing mode > > (saying reg*scale+reg), in this case, IVOPT still constructs various > > scaled addressing mode in get_address_cost and depends on address_cost > > to compute correct cost for that addressing expression. I think this > > happens to work even IVOPT doesn't know "reg*scale+reg" is actually > > not supported. > > > >> > >>>> The above also builds more RTX waste which you can fix by re-using > >>>> the > >>> PLUS > >>>> by building it up-front similar to the multiplication. You also > >>>> miss the > >>> Yes, this can be fixed. > >>> > >>>> opportunity to have scale == 1 denote as to whether reg1 + reg2 is valid. > >>> I > >>>> would expect that many targets support reg1 * scale + > >>>> constant-offset but not many reg1 * scale + reg2. > >>> I thought scale==1 is unnecessary because the addressing mode > >>> degrades into "reg" or "reg+reg". Moreover, calls of > >>> multiplier_allowed_in_address_p in both get_address_cost and > get_computation_cost_at have scale other than 1. > >> > >> Ok. > >> > >>>> > >>>> So no, the helper now checks sth completely different. What's the > >>>> problem with arm supporting reg1 * scale? Why shouldn't it being > >>>> able to handle > >>> the > >>>> implicit zero offset? > >>> > >>> As Richard clarified, ARM does not support scaled addressing mode > >>> without base register. > >> > >> I see. > >> > > Also from the newer comments: > > > >> Btw, it should be reasonably possible to compute the whole > >> multiplier_allowed_in_address_p table for all primary and secondary > >> archs (simply build cross-cc1) and compare the results before / after > >> a patch candidate. Querying both reg * scale and reg + reg * scale > >> if the first fails sounds like a good solution to me. 
> > I take this as we should do minimal change by checking reg + reg * > > scale if reg * scale is failed, right? > > Yes, you can share a single RTL expression for all this and I think querying reg > + reg * scale first makes sense (then fallback to reg * scale for compatibility). > I updated the patch as discussed. Please review. It bootstraps and passes regression test on x86/x86_64, but fails tree-ssa/scev-4.c on arm cortex-a15. Hi Richard, I investigated the failure and found out it reveals two ot
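For illustration, a rough sketch of the query shape being discussed (this is not the posted patch; the function name and placement are approximate), sharing one MULT rtx between the two checks and trying reg + reg*scale before falling back to plain reg*scale:

  /* Rough sketch only, not the committed patch.  Check whether RATIO is
     usable as an address multiplier for MODE/AS, preferring the
     reg + reg*scale form and falling back to bare reg*scale.  */
  static bool
  scale_allowed_in_address_p (HOST_WIDE_INT ratio, enum machine_mode mode,
                              addr_space_t as)
  {
    enum machine_mode address_mode = targetm.addr_space.address_mode (as);
    rtx reg1 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1);
    rtx reg2 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 2);
    rtx mult = gen_rtx_MULT (address_mode, reg1,
                             gen_int_mode (ratio, address_mode));
    rtx addr = gen_rtx_PLUS (address_mode, mult, reg2);

    /* Query reg + reg*scale first, then fall back to reg*scale.  */
    return (memory_address_addr_space_p (mode, addr, as)
            || memory_address_addr_space_p (mode, mult, as));
  }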
RE: [PATCH]Construct canonical scaled address expression in IVOPT
Sorry for missing the patch. Thanks. bin > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of bin.cheng > Sent: Thursday, September 26, 2013 8:05 PM > To: 'Richard Biener'; Bin.Cheng > Cc: GCC Patches; Richard Earnshaw > Subject: RE: [PATCH]Construct canonical scaled address expression in IVOPT > > > > > -Original Message- > > From: Richard Biener [mailto:richard.guent...@gmail.com] > > Sent: Tuesday, September 24, 2013 7:58 PM > > To: Bin.Cheng > > Cc: Bin Cheng; GCC Patches; Richard Earnshaw > > Subject: Re: [PATCH]Construct canonical scaled address expression in > > IVOPT > > > > On Tue, Sep 24, 2013 at 1:40 PM, Bin.Cheng > > wrote: > > > On Tue, Sep 24, 2013 at 6:12 PM, Richard Biener > > > wrote: > > >> On Tue, Sep 24, 2013 at 8:20 AM, bin.cheng > wrote: > > >>> > > >>> > > >>>> -Original Message- > > >> > > >> Or even [reg*scale] (not sure about that). But yes, at least > > >> reg*scale + offset and reg*scale + reg. > > >> > > >>> Apparently it's infeasible to check every possibility for each > > >>> architecture, is it ok we at least check "index", then "addr" if > > >>> "index" is failed? By "any kind of addressing modes", I mean > > >>> modes supported by function get_address_cost, i.e., in form of > > >>> "[base] + [off] + [var] + (reg|reg*scale)". > > >> > > >> I suppose so, but it of course depends on what IVOPTs uses the > > >> answer for in the end. Appearantly it doesn't distinguish between > > >> the various cases even though TARGET_MEM_REF does support all the > > >> variants in question (reg * scale, reg * scale + reg, reg * scale + > > >> const, reg * scale + reg + const). > > >> > > >> So the better answer may be to teach the costs about the differences? > > > Ideally, IVOPT should be aware whether scaling is allowed in every > > > kind of addressing modes and account cost of multiplier accordingly. > > > For current code, there are two scenarios here > > > 1) If target supports reg*scale+reg, but not reg*scale, in this > > > case, IVOPT considers multiplier is not allowed in any addressing > > > mode and account multiplier with high cost. This is the problem arm > having. > > > 2) If target supports reg*scale, but not some kind of addressing > > > mode (saying reg*scale+reg), in this case, IVOPT still constructs > > > various scaled addressing mode in get_address_cost and depends on > > > address_cost to compute correct cost for that addressing expression. > > > I think this happens to work even IVOPT doesn't know "reg*scale+reg" > > > is actually not supported. > > > > > >> > > >>>> The above also builds more RTX waste which you can fix by > > >>>> re-using the > > >>> PLUS > > >>>> by building it up-front similar to the multiplication. You also > > >>>> miss the > > >>> Yes, this can be fixed. > > >>> > > >>>> opportunity to have scale == 1 denote as to whether reg1 + reg2 > > >>>> is > valid. > > >>> I > > >>>> would expect that many targets support reg1 * scale + > > >>>> constant-offset but not many reg1 * scale + reg2. > > >>> I thought scale==1 is unnecessary because the addressing mode > > >>> degrades into "reg" or "reg+reg". Moreover, calls of > > >>> multiplier_allowed_in_address_p in both get_address_cost and > > get_computation_cost_at have scale other than 1. > > >> > > >> Ok. > > >> > > >>>> > > >>>> So no, the helper now checks sth completely different. What's > > >>>> the problem with arm supporting reg1 * scale? 
Why shouldn't it > > >>>> being able to handle > > >>> the > > >>>> implicit zero offset? > > >>> > > >>> As Richard clarified, ARM does not support scaled addressing mode > > >>> without base register. > > >> > > >> I see. > > >> > > > Also from the newer comments: > > > > > >> Btw, it should be reasonably possible to compute t
RE: [PATCH]Fix computation of offset in ivopt
> -Original Message- > From: Richard Biener [mailto:richard.guent...@gmail.com] > Sent: Tuesday, September 24, 2013 6:31 PM > To: Bin Cheng > Cc: GCC Patches > Subject: Re: [PATCH]Fix computation of offset in ivopt > > On Tue, Sep 24, 2013 at 11:13 AM, bin.cheng wrote: > > + field = TREE_OPERAND (expr, 1); > + if (DECL_FIELD_BIT_OFFSET (field) > + && cst_and_fits_in_hwi (DECL_FIELD_BIT_OFFSET (field))) > + boffset = int_cst_value (DECL_FIELD_BIT_OFFSET (field)); > + > + tmp = component_ref_field_offset (expr); > + if (top_compref > + && cst_and_fits_in_hwi (tmp)) > + { > + /* Strip the component reference completely. */ > + op0 = TREE_OPERAND (expr, 0); > + op0 = strip_offset_1 (op0, inside_addr, top_compref, &off0); > + *offset = off0 + int_cst_value (tmp) + boffset / BITS_PER_UNIT; > + return op0; > + } > > the failure paths seem mangled, that is, if cst_and_fits_in_hwi is false for > either offset part you may end up doing half accounting and not stripping. > > Btw, DECL_FIELD_BIT_OFFSET is always non-NULL. I suggest to rewrite to > > if (!inside_addr) >return orig_expr; > > tmp = component_ref_field_offset (expr); > field = TREE_OPERAND (expr, 1); > if (top_compref > && cst_and_fits_in_hwi (tmp) > && cst_and_fits_in_hwi (DECL_FIELD_BIT_OFFSET (field))) > { > ... > } Will be refined. > > note that this doesn't really handle overflows correctly as > > + *offset = off0 + int_cst_value (tmp) + boffset / > + BITS_PER_UNIT; > > may still overflow. Since it's "unsigned + signed + signed", according to implicit conversion, the signed operand will be converted to unsigned, so the overflow would only happen when off0 is huge number and tmp/boffset is large positive number, right? Do I need to check whether off0 is larger than the overflowed result? Also there is signed->unsigned problem here, see below. > > @@ -4133,6 +4142,9 @@ get_computation_cost_at (struct ivopts_data *data, > bitmap_clear (*depends_on); > } > > + /* Sign-extend offset if utype has lower precision than > + HOST_WIDE_INT. */ offset = sext_hwi (offset, TYPE_PRECISION > + (utype)); > + > > offset is computed elsewhere in difference_cost and the bug to me seems > that it is unsigned. sign-extending it here is odd at least (and the extension > should probably happen at sizetype precision, not that of utype). I agree, The root cause is in split_offset_1, in which offset is computed. Every time offset is computed in this function with a signed operand (like "int_cst_value (tmp)" above), we need to take care the possible negative number problem. Take this case as an example, we need to do below change: case INTEGER_CST: //... *offset = int_cst_value (expr); change to case INTEGER_CST: //... *offset = sext_hwi (int_cst_value (expr), type); and case MULT_EXPR: //... *offset = sext_hwi (int_cst_value (expr), type); to case MULT_EXPR: //... HOST_WIDE_INT xxx = (HOST_WIDE_INT)off0 * int_cst_value (op1); *offset = sext_hwi (xxx, type); Any comments? Thanks. bin
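To illustrate the sign-extension point in a standalone way, here is plain C that mimics what sext_hwi does (illustrative only, not GCC code; it assumes the usual two's-complement behavior of the final cast):

  #include <stdio.h>
  #include <stdint.h>

  /* Rough stand-in for GCC's sext_hwi: sign-extend SRC from PREC bits.  */
  static int64_t
  sext (uint64_t src, unsigned int prec)
  {
    uint64_t mask = (prec < 64) ? (((uint64_t) 1 << prec) - 1) : ~(uint64_t) 0;
    src &= mask;
    if (prec < 64 && (src & ((uint64_t) 1 << (prec - 1))))
      src |= ~mask;                        /* propagate the sign bit */
    return (int64_t) src;
  }

  int
  main (void)
  {
    /* A 32-bit sizetype holding -4 looks like 0xFFFFFFFC.  Zero-extending it
       into a 64-bit HOST_WIDE_INT gives 4294967292, not -4.  */
    uint32_t off32 = 0xFFFFFFFCu;
    printf ("%lld %lld\n", (long long) (uint64_t) off32,
            (long long) sext (off32, 32));    /* prints: 4294967292 -4 */
    return 0;
  }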
RE: [PATCH]Fix computation of offset in ivopt
> -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of bin.cheng > Sent: Friday, September 27, 2013 1:07 PM > To: 'Richard Biener' > Cc: GCC Patches > Subject: RE: [PATCH]Fix computation of offset in ivopt > > > > > -Original Message- > > From: Richard Biener [mailto:richard.guent...@gmail.com] > > Sent: Tuesday, September 24, 2013 6:31 PM > > To: Bin Cheng > > Cc: GCC Patches > > Subject: Re: [PATCH]Fix computation of offset in ivopt > > > > On Tue, Sep 24, 2013 at 11:13 AM, bin.cheng wrote: > > > > + field = TREE_OPERAND (expr, 1); > > + if (DECL_FIELD_BIT_OFFSET (field) > > + && cst_and_fits_in_hwi (DECL_FIELD_BIT_OFFSET (field))) > > + boffset = int_cst_value (DECL_FIELD_BIT_OFFSET (field)); > > + > > + tmp = component_ref_field_offset (expr); > > + if (top_compref > > + && cst_and_fits_in_hwi (tmp)) > > + { > > + /* Strip the component reference completely. */ > > + op0 = TREE_OPERAND (expr, 0); > > + op0 = strip_offset_1 (op0, inside_addr, top_compref, &off0); > > + *offset = off0 + int_cst_value (tmp) + boffset / > BITS_PER_UNIT; > > + return op0; > > + } > > > > the failure paths seem mangled, that is, if cst_and_fits_in_hwi is > > false > for > > either offset part you may end up doing half accounting and not stripping. > > > > Btw, DECL_FIELD_BIT_OFFSET is always non-NULL. I suggest to rewrite > > to > > > > if (!inside_addr) > >return orig_expr; > > > > tmp = component_ref_field_offset (expr); > > field = TREE_OPERAND (expr, 1); > > if (top_compref > > && cst_and_fits_in_hwi (tmp) > > && cst_and_fits_in_hwi (DECL_FIELD_BIT_OFFSET (field))) > > { > > ... > > } > Will be refined. > > > > > note that this doesn't really handle overflows correctly as > > > > + *offset = off0 + int_cst_value (tmp) + boffset / > > + BITS_PER_UNIT; > > > > may still overflow. > Since it's "unsigned + signed + signed", according to implicit conversion, the > signed operand will be converted to unsigned, so the overflow would only > happen when off0 is huge number and tmp/boffset is large positive number, > right? Do I need to check whether off0 is larger than the overflowed result? > Also there is signed->unsigned problem here, see below. > > > > > @@ -4133,6 +4142,9 @@ get_computation_cost_at (struct ivopts_data > *data, > > bitmap_clear (*depends_on); > > } > > > > + /* Sign-extend offset if utype has lower precision than > > + HOST_WIDE_INT. */ offset = sext_hwi (offset, TYPE_PRECISION > > + (utype)); > > + > > > > offset is computed elsewhere in difference_cost and the bug to me > > seems that it is unsigned. sign-extending it here is odd at least > > (and the > extension > > should probably happen at sizetype precision, not that of utype). > I agree, The root cause is in split_offset_1, in which offset is computed. > Every time offset is computed in this function with a signed operand (like > "int_cst_value (tmp)" above), we need to take care the possible negative > number problem. Take this case as an example, we need to do below > change: > > case INTEGER_CST: > //... > *offset = int_cst_value (expr); > change to > case INTEGER_CST: > //... > *offset = sext_hwi (int_cst_value (expr), type); > > and > case MULT_EXPR: > //... > *offset = sext_hwi (int_cst_value (expr), type); to > case MULT_EXPR: > //... > HOST_WIDE_INT xxx = (HOST_WIDE_INT)off0 * int_cst_value (op1); > *offset = sext_hwi (xxx, type); > > Any comments? 
On second thought, I guess we can compute a signed offset in strip_offset_1 and sign-extend it in strip_offset, so we don't need to change every computation of the offset in that function. Thanks. bin
RE: [PATCH]Fix computation of offset in ivopt
> -Original Message- > From: Richard Biener [mailto:richard.guent...@gmail.com] > Sent: Friday, September 27, 2013 4:30 PM > To: Bin Cheng > Cc: GCC Patches > Subject: Re: [PATCH]Fix computation of offset in ivopt > > On Fri, Sep 27, 2013 at 7:07 AM, bin.cheng wrote: > > > > > > case INTEGER_CST: > > //... > > *offset = int_cst_value (expr); > > change to > > case INTEGER_CST: > > //... > > *offset = sext_hwi (int_cst_value (expr), type); > > > > and > > case MULT_EXPR: > > //... > > *offset = sext_hwi (int_cst_value (expr), type); to > > case MULT_EXPR: > > //... > > HOST_WIDE_INT xxx = (HOST_WIDE_INT)off0 * int_cst_value (op1); > > *offset = sext_hwi (xxx, type); > > > > Any comments? > > The issue is of course that we end up converting offsets to sizetype at some > point which makes them all appear unsigned. The fix for this is to simply > interpret them as signed ... but it's really a mess ;) > Hi, this is updated patch which calculates signed offset in strip_offset_1 then sign extend it in strip_offset. Bootstrap and test on x86_64/x86/arm. Is it OK? Thanks. bin 2013-09-30 Bin Cheng * tree-ssa-loop-ivopts.c (strip_offset_1): Change parameter type. Count DECL_FIELD_BIT_OFFSET when computing offset for COMPONENT_REF. (strip_offset): Sign extend before return.Index: gcc/tree-ssa-loop-ivopts.c === --- gcc/tree-ssa-loop-ivopts.c (revision 202599) +++ gcc/tree-ssa-loop-ivopts.c (working copy) @@ -2037,12 +2037,12 @@ find_interesting_uses (struct ivopts_data *data) static tree strip_offset_1 (tree expr, bool inside_addr, bool top_compref, - unsigned HOST_WIDE_INT *offset) + HOST_WIDE_INT *offset) { tree op0 = NULL_TREE, op1 = NULL_TREE, tmp, step; enum tree_code code; tree type, orig_type = TREE_TYPE (expr); - unsigned HOST_WIDE_INT off0, off1, st; + HOST_WIDE_INT off0, off1, st; tree orig_expr = expr; STRIP_NOPS (expr); @@ -2133,19 +2133,26 @@ strip_offset_1 (tree expr, bool inside_addr, bool break; case COMPONENT_REF: - if (!inside_addr) - return orig_expr; + { + tree field; + HOST_WIDE_INT boffset = 0; + if (!inside_addr) + return orig_expr; - tmp = component_ref_field_offset (expr); - if (top_compref - && cst_and_fits_in_hwi (tmp)) - { - /* Strip the component reference completely. */ - op0 = TREE_OPERAND (expr, 0); - op0 = strip_offset_1 (op0, inside_addr, top_compref, &off0); - *offset = off0 + int_cst_value (tmp); - return op0; - } + tmp = component_ref_field_offset (expr); + field = TREE_OPERAND (expr, 1); + if (top_compref + && cst_and_fits_in_hwi (tmp) + && cst_and_fits_in_hwi (DECL_FIELD_BIT_OFFSET (field))) + { + /* Strip the component reference completely. */ + op0 = TREE_OPERAND (expr, 0); + op0 = strip_offset_1 (op0, inside_addr, top_compref, &off0); + boffset = int_cst_value (DECL_FIELD_BIT_OFFSET (field)); + *offset = off0 + int_cst_value (tmp) + boffset / BITS_PER_UNIT; + return op0; + } + } break; case ADDR_EXPR: @@ -2196,7 +2203,16 @@ strip_offset_1 (tree expr, bool inside_addr, bool static tree strip_offset (tree expr, unsigned HOST_WIDE_INT *offset) { - return strip_offset_1 (expr, false, false, offset); + HOST_WIDE_INT off; + tree core = strip_offset_1 (expr, false, false, &off); + + /* Sign extend off if expr is in type which has lower precision + than HOST_WIDE_INT. */ + if (TYPE_PRECISION (TREE_TYPE (expr)) <= HOST_BITS_PER_WIDE_INT) +off = sext_hwi (off, TYPE_PRECISION (TREE_TYPE (expr))); + + *offset = off; + return core; } /* Returns variant of TYPE that can be used as base for different uses.
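As a concrete, hypothetical case the new COMPONENT_REF handling covers (not the test case from the patch; on a typical 64-bit target field f below has DECL_FIELD_OFFSET 0 and DECL_FIELD_BIT_OFFSET 32):

  struct s
  {
    int pad;
    int f;       /* byte offset 4, visible only via DECL_FIELD_BIT_OFFSET */
  };

  int
  sum_f (struct s *p, int n)
  {
    int r = 0;
    for (int i = 0; i < n; i++)
      r += p[i].f;   /* address is p + i*8 + 4; with the change strip_offset_1
                        folds the trailing +4 into the stripped offset */
    return r;
  }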
RE: [PATCH]Fix computation of offset in ivopt
> -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Oleg Endo > Sent: Wednesday, September 25, 2013 1:41 AM > To: Richard Biener > Cc: Bin Cheng; GCC Patches > Subject: Re: [PATCH]Fix computation of offset in ivopt > > > > > After reading "overflow" and "ivopt", I was wondering whether > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55190 is somehow related. No, this patch is unrelated to that PR. But I do have some thoughts on PR 55190 and will follow up in the bug entry. Thanks. bin
Re: [PATCH]Fix computation of offset in ivopt
On Tue, Oct 1, 2013 at 6:50 PM, Richard Biener wrote: > On Mon, Sep 30, 2013 at 7:39 AM, bin.cheng wrote: >> >> > > I don't think you need > > + /* Sign extend off if expr is in type which has lower precision > + than HOST_WIDE_INT. */ > + if (TYPE_PRECISION (TREE_TYPE (expr)) <= HOST_BITS_PER_WIDE_INT) > +off = sext_hwi (off, TYPE_PRECISION (TREE_TYPE (expr))); > > at least it would be suspicious if you did ... There is a problem for example of the first message. The iv base if like: pretmp_184 + ((sizetype) KeyIndex_180 + 1073741823) * 4 I am not sure why but it seems (-4/0xFFFC) is represented by (1073741823*4). For each operand strip_offset_1 returns exactly the positive number and result of multiplication never get its chance of sign extend. That's why the sign extend is necessary to fix the problem. Or it should be fixed elsewhere by representing iv base with: "pretmp_184 + ((sizetype) KeyIndex_180 + 4294967295) * 4" in the first place. > > The only case that I can think of points to a bug in strip_offset_1 > again, namely if sizetype (the type of all offsets) is smaller than > a HOST_WIDE_INT in which case > > +boffset = int_cst_value (DECL_FIELD_BIT_OFFSET (field)); > +*offset = off0 + int_cst_value (tmp) + boffset / BITS_PER_UNIT; > > is wrong as boffset / BITS_PER_UNIT does not do a signed division > then (for negative boffset which AFAIK does not happen - but it would > be technically allowed). Thus, the predicates like > > +&& cst_and_fits_in_hwi (tmp) > > would need to be amended with a check that the MSB is not set. So I can handle it like: +abs_boffset = abs_hwi (boffset); +x = abs_boffset / BITS_PER_UNIT; +if (boffset < 0) + x = -x; +*offset = off0 + int_cst_value (tmp) + x; Right? > > Btw, the cst_and_fits_in_hwi implementation is odd: > > bool > cst_and_fits_in_hwi (const_tree x) > { > if (TREE_CODE (x) != INTEGER_CST) > return false; > > if (TYPE_PRECISION (TREE_TYPE (x)) > HOST_BITS_PER_WIDE_INT) > return false; > > return (TREE_INT_CST_HIGH (x) == 0 > || TREE_INT_CST_HIGH (x) == -1); > } > > the precision check seems totally pointless and I wonder what's > the point of this routine as there is host_integerp () already > and tree_low_cst instead of int_cst_value - oh, I see, the latter > forcefully sign-extends that should make the extension > not necessary. See above. Thanks. bin
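A standalone illustration of that representation, with plain 32-bit unsigned arithmetic standing in for sizetype (the value 100 below is arbitrary):

  #include <stdio.h>

  int
  main (void)
  {
    unsigned int i = 100;                            /* stand-in for KeyIndex_180 */
    unsigned int literal = (i + 1073741823u) * 4u;   /* what the IV base literally says */
    unsigned int meant   = i * 4u - 4u;              /* what it means: i*4 - 4 */
    printf ("%u %u\n", literal, meant);              /* both print 396 */
    return 0;
  }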
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Wed, Oct 17, 2012 at 6:54 AM, Steven Bosscher wrote: > On Tue, Oct 16, 2012 at 1:38 PM, Paolo Bonzini wrote: Bottom line: This is a bug in df_kill_notes. >> >> Yep, and it could cause wrong code generation, couldn't it? Because the >> new (reg:DI 72) is obviously not equal to (plus:DI (reg:DI 72) >> (const_int 24)), but it could be propagated to subsequent insns. > > Right. > > >> So I think this patch should be backported to all release branches after >> regstrapping. > > Agreed, but let's wait a few days at least to see what happens with > the patch on the trunk. > > >> Ok after bootstrap, regtest and checking that it actually fixes the bug. > > I made sure of that before posting the patch ;-) > > Bootstrap & regtest is running on x86_64-unknown-linux-gnu and > powerpc64-unknown-linux-gnu. I'll commit the patch tomorrow if there > are no test suite regressions. > > Well done to us all for probing until we found the root cause of this bug! > Hi Jan, for the test case main() { configure2(); }, please make main return zero explicitly. When cross-testing with qemu or a simulator, the exit wrapper checks whether the test case passed by verifying its return value. For the ARM target, R0 is checked, but it may be clobbered (and thus non-zero) if main does not return a value, which makes the test case fail. Thanks very much. -- Best Regards.
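Concretely, the fixed test would look something like this (illustrative; the actual testsuite file may differ):

  extern void configure2 (void);

  int
  main (void)
  {
    configure2 ();
    return 0;   /* explicit zero so the simulator's exit wrapper sees a passing status */
  }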
Re: [patch] Unify bitmap interface.
On Tue, Oct 30, 2012 at 8:23 AM, Lawrence Crowl wrote: > On 10/29/12, Diego Novillo wrote: >> On Oct 29, 2012 Diego Novillo wrote: >> > Just to make sure. Testing on ppc should be fast, for example. >> >> And useless. Your patch does not touch ppc. > > I've fixed the #if 0 and the remaining suggestions will happen in > another patch. I've committed this one. > > === > > This patch implements the unification of the *bitmap interfaces as discussed. > Essentially, we rename ebitmap and sbitmap functions to use the same names > as the bitmap functions. This rename works because we can now overload > on the bitmap type. Some macros now become inline functions to enable > that overloading. > > The sbitmap non-bool returning bitwise operations have been merged with > the bool versions. Sometimes this merge involved modifying the non-bool > version to compute the bool value, and sometimes modifying bool version to > add additional work from the non-bool version. The redundant routines have > been removed. > > The allocation functions have not been renamed, because we often do not > have an argument on which to overload. The cardinality functions have not > been renamed, because they have different parameters, and are thus not > interchangable. The iteration functions have not been renamed, because > they are functionally different. > > Tested on x86_64, contrib/config-list.mk testing passed. > Index: gcc/ChangeLog > Just one question: Should we change the name of functions "sbitmap_intersection_of_succs/sbitmap_intersection_of_preds/sbitmap_union_of_succs/sbitmap_union_of_preds" too? It might be a little confusing that sbitmap_* is used among bitmap_*. -- Best Regards.
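For illustration, a usage fragment showing the effect of the renaming (GCC-internal types; this is not code from the patch):

  /* Fragment for illustration only.  After the unification the same function
     names work for both sbitmap and bitmap operands, resolved by overloading;
     allocation functions keep their distinct names.  */
  static void
  example (void)
  {
    sbitmap s = sbitmap_alloc (128);
    bitmap b = BITMAP_ALLOC (NULL);

    bitmap_clear (s);                 /* was sbitmap_zero (s) */
    bitmap_set_bit (s, 5);            /* was SET_BIT (s, 5) */
    if (bitmap_bit_p (s, 5))          /* was TEST_BIT (s, 5) */
      bitmap_set_bit (b, 5);          /* pre-existing bitmap function, unchanged */

    sbitmap_free (s);
    BITMAP_FREE (b);
  }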
Re: [PATCH]Fix computation of offset in ivopt
Sorry that I don't understand tree type system well, so here are two more questions, could you please explain little bit more? Thanks very much. On Tue, Oct 1, 2013 at 6:50 PM, Richard Biener wrote: > On Mon, Sep 30, 2013 at 7:39 AM, bin.cheng wrote: > > I don't think you need > > + /* Sign extend off if expr is in type which has lower precision > + than HOST_WIDE_INT. */ > + if (TYPE_PRECISION (TREE_TYPE (expr)) <= HOST_BITS_PER_WIDE_INT) > +off = sext_hwi (off, TYPE_PRECISION (TREE_TYPE (expr))); Is it possible to have type of expr smaller than the type of embedded offset (i.e., sizetype)? Should the sign extend be against sizetype or it's safe with TREE_TYPE(expr)? > > at least it would be suspicious if you did ... > > The only case that I can think of points to a bug in strip_offset_1 > again, namely if sizetype (the type of all offsets) is smaller than > a HOST_WIDE_INT in which case > > +boffset = int_cst_value (DECL_FIELD_BIT_OFFSET (field)); > +*offset = off0 + int_cst_value (tmp) + boffset / BITS_PER_UNIT; > > is wrong as boffset / BITS_PER_UNIT does not do a signed division > then (for negative boffset which AFAIK does not happen - but it would > be technically allowed). Thus, the predicates like > > +&& cst_and_fits_in_hwi (tmp) > > would need to be amended with a check that the MSB is not set. I am confused, why "boffset / BITS_PER_UNIT" won't do signed division and how it relates to "sizetype smaller than HOST_WIDE_INT"? If sizetype is smaller, won't int_cst_value sign extends it into HOST_WIDE_INT? Thanks. bin -- Best Regards.
Re: [PATCH]Fix computation of offset in ivopt
On Tue, Oct 1, 2013 at 6:50 PM, Richard Biener wrote: > On Mon, Sep 30, 2013 at 7:39 AM, bin.cheng wrote: >> > > I don't think you need > > + /* Sign extend off if expr is in type which has lower precision > + than HOST_WIDE_INT. */ > + if (TYPE_PRECISION (TREE_TYPE (expr)) <= HOST_BITS_PER_WIDE_INT) > +off = sext_hwi (off, TYPE_PRECISION (TREE_TYPE (expr))); > > at least it would be suspicious if you did ... > > The only case that I can think of points to a bug in strip_offset_1 > again, namely if sizetype (the type of all offsets) is smaller than > a HOST_WIDE_INT in which case > > +boffset = int_cst_value (DECL_FIELD_BIT_OFFSET (field)); > +*offset = off0 + int_cst_value (tmp) + boffset / BITS_PER_UNIT; > > is wrong as boffset / BITS_PER_UNIT does not do a signed division > then (for negative boffset which AFAIK does not happen - but it would > be technically allowed). Thus, the predicates like > > +&& cst_and_fits_in_hwi (tmp) > > would need to be amended with a check that the MSB is not set. I thought it over and have the following understandings. 1) "boffset / BITS_PER_UNIT" does not do a signed division because boffset might be negative and BITS_PER_UNIT could be defined as an unsigned constant for a target. 2) If sizetype is sufficiently smaller than HOST_WIDE_INT, and the field offset is large enough to have its MSB set, then boffset would be computed as negative by int_cst_value. 3) If sizetype is as large as HOST_WIDE_INT, boffset would be negative only if the field offset were a very large number with the MSB set, which should be impossible for any real structure. Are these understandings right? Sorry to bother you with the basic questions. Thanks. -- Best Regards.
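A standalone illustration of point 1); note the trap only bites when the unsigned constant has at least the rank of the signed operand (plain C, not GCC code):

  #include <stdio.h>

  int
  main (void)
  {
    long long boffset = -32;   /* a hypothetical negative bit offset */

    /* Signed divisor: the quotient is -4, as intended.  */
    printf ("%lld\n", boffset / 8);

    /* Unsigned divisor of the same width: boffset is converted to unsigned
       first, so the quotient is a huge positive number, not -4.  */
    printf ("%lld\n", (long long) (boffset / 8ull));
    return 0;
  }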
RE: [PATCH]Fix computation of offset in ivopt
Hi, As noted in previous messages, GCC forces offset to unsigned in middle end. It also gets offset value and stores it in HOST_WIDE_INT then uses it in various computation. In this scenario, It should use int_cst_value to do additional sign extension against the type of tree node, otherwise we might lose sign information. Take function fold_plusminus_mult_expr as an example, the sign information of arg01/arg11 might be lost because it uses TREE_INT_CST_LOW directly. I think this is the offender of the problem in this thread. I think we should fix the offender, rather the hacking strip_offset as in the original patch, so I split original patch into two as attached, with one fixing the offset of COMPONENT_REF in strip_offset_1, the other fixing the mentioned sign extension problem. Bootstrap and test on x86/x86_64/arm. Is it OK? Thanks. bin Patch a: 2013-10-17 Bin Cheng * tree-ssa-loop-ivopts.c (strip_offset_1): Change parameter type. Count DECL_FIELD_BIT_OFFSET when computing offset for COMPONENT_REF. Patch b: 2013-10-17 Bin Cheng * fold-const.c (fold_plusminus_mult_expr): Use int_cst_value instead of TREE_INT_CST_LOW. gcc/testsuite/ChangeLog 2013-10-17 Bin Cheng * gcc.dg/tree-ssa/ivopts-outside-loop-use-1.c: New test. > -Original Message- > From: Richard Biener [mailto:richard.guent...@gmail.com] > Sent: Wednesday, October 02, 2013 4:32 PM > To: Bin.Cheng > Cc: Bin Cheng; GCC Patches > Subject: Re: [PATCH]Fix computation of offset in ivopt > > On Tue, Oct 1, 2013 at 6:13 PM, Bin.Cheng wrote: > > On Tue, Oct 1, 2013 at 6:50 PM, Richard Biener > > wrote: > >> On Mon, Sep 30, 2013 at 7:39 AM, bin.cheng > wrote: > >>> > >>> > >> > >> I don't think you need > >> > >> + /* Sign extend off if expr is in type which has lower precision > >> + than HOST_WIDE_INT. */ > >> + if (TYPE_PRECISION (TREE_TYPE (expr)) <= HOST_BITS_PER_WIDE_INT) > >> +off = sext_hwi (off, TYPE_PRECISION (TREE_TYPE (expr))); > >> > >> at least it would be suspicious if you did ... > > There is a problem for example of the first message. The iv base if like: > > pretmp_184 + ((sizetype) KeyIndex_180 + 1073741823) * 4 I am not sure > > why but it seems (-4/0xFFFC) is represented by (1073741823*4). > > For each operand strip_offset_1 returns exactly the positive number > > and result of multiplication never get its chance of sign extend. > > That's why the sign extend is necessary to fix the problem. Or it > > should be fixed elsewhere by representing iv base with: > > "pretmp_184 + ((sizetype) KeyIndex_180 + 4294967295) * 4" in the first > place. > > Yeah, that's why I said the whole issue with forcing all offsets to be unsigned > is a mess ... > > There is really no good answer besides not doing that I fear. > > Yes, in the above case we could fold the whole thing differently (interpret > the offset of a POINTER_PLUS_EXPR as signed). You can try tracking down > the offender, but it'll get non-trivial easily as you have to consider the fact > that GCC will treat signed operations as having undefined behavior on > overflow. > > So I see why you want to do the extension above (re-interpret the result), I > suppose we can live with it but please make sure to add a big fat ??? > comment before it explaining why it is necessary. > > Richard. 
> > >> > >> The only case that I can think of points to a bug in strip_offset_1 > >> again, namely if sizetype (the type of all offsets) is smaller than a > >> HOST_WIDE_INT in which case > >> > >> +boffset = int_cst_value (DECL_FIELD_BIT_OFFSET (field)); > >> +*offset = off0 + int_cst_value (tmp) + boffset / BITS_PER_UNIT; > >> > >> is wrong as boffset / BITS_PER_UNIT does not do a signed division > >> then (for negative boffset which AFAIK does not happen - but it would > >> be technically allowed). Thus, the predicates like > >> > >> +&& cst_and_fits_in_hwi (tmp) > >> > >> would need to be amended with a check that the MSB is not set. > > So I can handle it like: > > > > +abs_boffset = abs_hwi (boffset); > > +x = abs_boffset / BITS_PER_UNIT; > > +if (boffset < 0) > > + x = -x; > > +*offset = off0 + int_cst_value (tmp) + x; > > > > Right? > > > >> > >> Btw, the cst_and_fits_in_hwi implementation is odd: > >> > >> bool > >> cst_and_fits_in_hwi (const_tree x) > >> { > >> if (TREE_CODE (x) != INTEGER_CST) > >
Re: [PATCH]Fix computation of offset in ivopt
Hi Richard, Is the first patch OK? Since there is a patch depending on it. http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01931.html Thanks. bin On Fri, Oct 18, 2013 at 7:18 PM, Richard Biener wrote: > On Thu, Oct 17, 2013 at 7:52 AM, bin.cheng wrote: >> Hi, >> As noted in previous messages, GCC forces offset to unsigned in middle end. >> It also gets offset value and stores it in HOST_WIDE_INT then uses it in >> various computation. In this scenario, It should use int_cst_value to do >> additional sign extension against the type of tree node, otherwise we might >> lose sign information. Take function fold_plusminus_mult_expr as an >> example, the sign information of arg01/arg11 might be lost because it uses >> TREE_INT_CST_LOW directly. I think this is the offender of the problem in >> this thread. I think we should fix the offender, rather the hacking >> strip_offset as in the original patch, so I split original patch into two as >> attached, with one fixing the offset of COMPONENT_REF in strip_offset_1, the >> other fixing the mentioned sign extension problem. > > Index: gcc/fold-const.c > === > --- gcc/fold-const.c (revision 203267) > +++ gcc/fold-const.c (working copy) > @@ -7270,8 +7270,8 @@ fold_plusminus_mult_expr (location_t loc, enum tre >HOST_WIDE_INT int01, int11, tmp; >bool swap = false; >tree maybe_same; > - int01 = TREE_INT_CST_LOW (arg01); > - int11 = TREE_INT_CST_LOW (arg11); > + int01 = int_cst_value (arg01); > + int11 = int_cst_value (arg11); > > this is not correct - it will mishandle all large unsigned numbers. > > The real issue is that we rely on twos-complement arithmetic to work > when operating on pointer offsets (because we convert them all to > unsigned sizetype). That makes interpreting the offsets or expressions > that compute offsets hard (or impossible in some cases), as you can > see from the issues in IVOPTs. > > The above means that strip_offset_1 cannot reliably look through > MULT_EXPRs as that can make an offset X * CST that is > "effectively" signed "surely" unsigned in the process. Think of > this looking into X * CST as performing a division - whose result > is dependent on the sign of X which we lost with our unsigned > casting. Now, removing the MULT_EXPR look-through might > be too pessimizing ... but it may be worth trying to see if it fixes > your issue? In this context I also remember a new bug filed > recently of us not folding x /[ex] 4 * 4 to x. > > In the past I was working multiple times on stopping doing that > (make all offsets unsigned), but I never managed to finish ... > > Richard. > >> Bootstrap and test on x86/x86_64/arm. Is it OK? >> >> Thanks. >> bin >> >> Patch a: >> 2013-10-17 Bin Cheng >> >> * tree-ssa-loop-ivopts.c (strip_offset_1): Change parameter type. >> Count DECL_FIELD_BIT_OFFSET when computing offset for COMPONENT_REF. >> >> Patch b: >> 2013-10-17 Bin Cheng >> >> * fold-const.c (fold_plusminus_mult_expr): Use int_cst_value instead >> of TREE_INT_CST_LOW. >> >> gcc/testsuite/ChangeLog >> 2013-10-17 Bin Cheng >> >> * gcc.dg/tree-ssa/ivopts-outside-loop-use-1.c: New test. 
>> >>> -Original Message- >>> From: Richard Biener [mailto:richard.guent...@gmail.com] >>> Sent: Wednesday, October 02, 2013 4:32 PM >>> To: Bin.Cheng >>> Cc: Bin Cheng; GCC Patches >>> Subject: Re: [PATCH]Fix computation of offset in ivopt >>> >>> On Tue, Oct 1, 2013 at 6:13 PM, Bin.Cheng wrote: >>> > On Tue, Oct 1, 2013 at 6:50 PM, Richard Biener >>> > wrote: >>> >> On Mon, Sep 30, 2013 at 7:39 AM, bin.cheng >>> wrote: >>> >>> >>> >>> >>> >> >>> >> I don't think you need >>> >> >>> >> + /* Sign extend off if expr is in type which has lower precision >>> >> + than HOST_WIDE_INT. */ >>> >> + if (TYPE_PRECISION (TREE_TYPE (expr)) <= HOST_BITS_PER_WIDE_INT) >>> >> +off = sext_hwi (off, TYPE_PRECISION (TREE_TYPE (expr))); >>> >> >>> >> at least it would be suspicious if you did ... >>> > There is a problem for example of the first message. The iv base if >> like: >>> > pretmp_184 + ((sizetype) KeyIndex_180 + 1073741823) * 4 I am not sure >>> > why but it seems (-4/0xFFFC) is represented by (1073741823*4). >>&
Re: [PATCH]Fix computation of offset in ivopt
On Fri, Oct 18, 2013 at 7:18 PM, Richard Biener wrote: > On Thu, Oct 17, 2013 at 7:52 AM, bin.cheng wrote: >> Hi, >> As noted in previous messages, GCC forces offset to unsigned in middle end. >> It also gets offset value and stores it in HOST_WIDE_INT then uses it in >> various computation. In this scenario, It should use int_cst_value to do >> additional sign extension against the type of tree node, otherwise we might >> lose sign information. Take function fold_plusminus_mult_expr as an >> example, the sign information of arg01/arg11 might be lost because it uses >> TREE_INT_CST_LOW directly. I think this is the offender of the problem in >> this thread. I think we should fix the offender, rather the hacking >> strip_offset as in the original patch, so I split original patch into two as >> attached, with one fixing the offset of COMPONENT_REF in strip_offset_1, the >> other fixing the mentioned sign extension problem. > > Index: gcc/fold-const.c > === > --- gcc/fold-const.c (revision 203267) > +++ gcc/fold-const.c (working copy) > @@ -7270,8 +7270,8 @@ fold_plusminus_mult_expr (location_t loc, enum tre >HOST_WIDE_INT int01, int11, tmp; >bool swap = false; >tree maybe_same; > - int01 = TREE_INT_CST_LOW (arg01); > - int11 = TREE_INT_CST_LOW (arg11); > + int01 = int_cst_value (arg01); > + int11 = int_cst_value (arg11); > > this is not correct - it will mishandle all large unsigned numbers. As far as the patch itself. I think the preceding host_integerp already checks for this case: int host_integerp (const_tree t, int pos) { if (t == NULL_TREE) return 0; return (TREE_CODE (t) == INTEGER_CST && ((TREE_INT_CST_HIGH (t) == 0 && (HOST_WIDE_INT) TREE_INT_CST_LOW (t) >= 0) || (! pos && TREE_INT_CST_HIGH (t) == -1 && (HOST_WIDE_INT) TREE_INT_CST_LOW (t) < 0 && !TYPE_UNSIGNED (TREE_TYPE (t))) || (pos && TREE_INT_CST_HIGH (t) == 0))); } Since the call is host_integerp (xxx, 0), it returns 0 for large unsigned numbers, right? > > The real issue is that we rely on twos-complement arithmetic to work > when operating on pointer offsets (because we convert them all to > unsigned sizetype). That makes interpreting the offsets or expressions > that compute offsets hard (or impossible in some cases), as you can > see from the issues in IVOPTs. > > The above means that strip_offset_1 cannot reliably look through > MULT_EXPRs as that can make an offset X * CST that is > "effectively" signed "surely" unsigned in the process. Think of > this looking into X * CST as performing a division - whose result > is dependent on the sign of X which we lost with our unsigned > casting. Now, removing the MULT_EXPR look-through might > be too pessimizing ... but it may be worth trying to see if it fixes > your issue? In this context I also remember a new bug filed > recently of us not folding x /[ex] 4 * 4 to x. > > In the past I was working multiple times on stopping doing that > (make all offsets unsigned), but I never managed to finish ... > > Richard. > >> Bootstrap and test on x86/x86_64/arm. Is it OK? >> >> Thanks. >> bin >> >> Patch a: >> 2013-10-17 Bin Cheng >> >> * tree-ssa-loop-ivopts.c (strip_offset_1): Change parameter type. >> Count DECL_FIELD_BIT_OFFSET when computing offset for COMPONENT_REF. >> >> Patch b: >> 2013-10-17 Bin Cheng >> >> * fold-const.c (fold_plusminus_mult_expr): Use int_cst_value instead >> of TREE_INT_CST_LOW. >> >> gcc/testsuite/ChangeLog >> 2013-10-17 Bin Cheng >> >> * gcc.dg/tree-ssa/ivopts-outside-loop-use-1.c: New test. 
>> >>> -Original Message- >>> From: Richard Biener [mailto:richard.guent...@gmail.com] >>> Sent: Wednesday, October 02, 2013 4:32 PM >>> To: Bin.Cheng >>> Cc: Bin Cheng; GCC Patches >>> Subject: Re: [PATCH]Fix computation of offset in ivopt >>> >>> On Tue, Oct 1, 2013 at 6:13 PM, Bin.Cheng wrote: >>> > On Tue, Oct 1, 2013 at 6:50 PM, Richard Biener >>> > wrote: >>> >> On Mon, Sep 30, 2013 at 7:39 AM, bin.cheng >>> wrote: >>> >>> >>> >>> >>> >> >>> >> I don't think you need >>> >> >>> >> + /* Sign extend off if expr is in type which has lower precision >>> >> + than HOST_WIDE_INT. */ >>> >> + if
Re: [PATCH]Fix computation of offset in ivopt
On Mon, Oct 21, 2013 at 8:21 PM, Richard Biener wrote: > On Fri, Oct 18, 2013 at 5:48 PM, Bin.Cheng wrote: >> Hi Richard, >> Is the first patch OK? Since there is a patch depending on it. >> http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01931.html > > Yes. I committed the patch fixing strip_offset_1 as r204116. Since the root cause of the second issue is POINTER_PLUS_EXPR requires an unsigned offset operand and can't be fixed as in the second patch, I will discard that patch. Thanks. bin >> On Fri, Oct 18, 2013 at 7:18 PM, Richard Biener >> wrote: >>> On Thu, Oct 17, 2013 at 7:52 AM, bin.cheng wrote: >>>> Hi, >>>> As noted in previous messages, GCC forces offset to unsigned in middle end. >>>> It also gets offset value and stores it in HOST_WIDE_INT then uses it in >>>> various computation. In this scenario, It should use int_cst_value to do >>>> additional sign extension against the type of tree node, otherwise we might >>>> lose sign information. Take function fold_plusminus_mult_expr as an >>>> example, the sign information of arg01/arg11 might be lost because it uses >>>> TREE_INT_CST_LOW directly. I think this is the offender of the problem in >>>> this thread. I think we should fix the offender, rather the hacking >>>> strip_offset as in the original patch, so I split original patch into two >>>> as >>>> attached, with one fixing the offset of COMPONENT_REF in strip_offset_1, >>>> the >>>> other fixing the mentioned sign extension problem. >>> >>> Index: gcc/fold-const.c >>> === >>> --- gcc/fold-const.c (revision 203267) >>> +++ gcc/fold-const.c (working copy) >>> @@ -7270,8 +7270,8 @@ fold_plusminus_mult_expr (location_t loc, enum tre >>>HOST_WIDE_INT int01, int11, tmp; >>>bool swap = false; >>>tree maybe_same; >>> - int01 = TREE_INT_CST_LOW (arg01); >>> - int11 = TREE_INT_CST_LOW (arg11); >>> + int01 = int_cst_value (arg01); >>> + int11 = int_cst_value (arg11); >>> >>> this is not correct - it will mishandle all large unsigned numbers. >>> >>> The real issue is that we rely on twos-complement arithmetic to work >>> when operating on pointer offsets (because we convert them all to >>> unsigned sizetype). That makes interpreting the offsets or expressions >>> that compute offsets hard (or impossible in some cases), as you can >>> see from the issues in IVOPTs. >>> >>> The above means that strip_offset_1 cannot reliably look through >>> MULT_EXPRs as that can make an offset X * CST that is >>> "effectively" signed "surely" unsigned in the process. Think of >>> this looking into X * CST as performing a division - whose result >>> is dependent on the sign of X which we lost with our unsigned >>> casting. Now, removing the MULT_EXPR look-through might >>> be too pessimizing ... but it may be worth trying to see if it fixes >>> your issue? In this context I also remember a new bug filed >>> recently of us not folding x /[ex] 4 * 4 to x. >>> >>> In the past I was working multiple times on stopping doing that >>> (make all offsets unsigned), but I never managed to finish ... >>> >>> Richard. >>> >>>> Bootstrap and test on x86/x86_64/arm. Is it OK? >>>> >>>> Thanks. >>>> bin >>>> >>>> Patch a: >>>> 2013-10-17 Bin Cheng >>>> >>>> * tree-ssa-loop-ivopts.c (strip_offset_1): Change parameter type. >>>> Count DECL_FIELD_BIT_OFFSET when computing offset for >>>> COMPONENT_REF. >>>> >>>> Patch b: >>>> 2013-10-17 Bin Cheng >>>> >>>> * fold-const.c (fold_plusminus_mult_expr): Use int_cst_value >>>> instead >>>> of TREE_INT_CST_LOW. 
>>>> >>>> gcc/testsuite/ChangeLog >>>> 2013-10-17 Bin Cheng >>>> >>>> * gcc.dg/tree-ssa/ivopts-outside-loop-use-1.c: New test. >>>> >>>>> -Original Message- >>>>> From: Richard Biener [mailto:richard.guent...@gmail.com] >>>>> Sent: Wednesday, October 02, 2013 4:32 PM >>>>> To: Bin.Cheng >>>>> Cc: Bin Cheng; GCC Patches >>>>> Subject: Re: [PATCH]Fix computation of
[PATCH GCC]Simplify address expression in IVOPT
Hi, I noticed that IVOPT generates complex address expressions like below for iv base. &arr_base[0].y &arr[0] &MEM[p+o] It's even worse for targets support auto-increment addressing mode because IVOPT adjusts such base expression with +/- step, then creates below: &arr_base[0].y +/- step &arr[0] +/- step &MEM[p+o] +/- step It has two disadvantages: 1) Cost computation in IVOPT can't handle complex address expression and general returns spill_cost for it, which is bad since address iv is important to IVOPT. 2) IVOPT creates duplicate candidates for IVs which have same value in different forms, for example, two candidates are generated with each for "&a[0]" and "&a". Again, it's even worse for auto-increment addressing mode. This patch fixes the issue by simplifying address expression at the entry of allocating IV struct. Maybe the simplification can be put in various fold* functions but I think it might be better in this way, because: 1) fold* functions are used from front-end to various tree optimizations, the simplified address expressions may not be what each optimizer wanted. Think about parallelism related passes, they might want the array index information kept for further analysis. 2) In some way, the simplification is conflict with current implementation of fold* function. Take fold_binary_loc as an example, it tries to simplify "&a[i1] +p c* i2" into "&a[i1+i2]". Of course we can simplify in this way for IVOPT too, but that will cause new problems like: a) we have to add code in IVOPT to cope with complex ARRAY_REF which is the exactly thing we want to avoid; b) the simplification can't always be done because of the sign/unsigned offset problem (especially for auto-increment addressing mode). 3) There are many entry point for fold* functions, the change will be non-trivial. 4) The simplification is only done in alloc_iv for true (not duplicate ones) iv struct, the number of such iv should be moderate. With these points, I think it might be a win to do the simplification in IVOPT and create a kind of sand box to let IVOPT play. Any suggestions? Bootstrap and tested on x86/x86_64/arm. The patch causes three cases failed on some target, but all of them are false alarm, which can be resolved by refining test cases to check more accurate information. Is it OK? Thanks. bin gcc/testsuite/ChangeLog 2013-10-29 Bin Cheng * gcc.dg/tree-ssa/loop-2.c: Refine check condition. * gcc.dg/tree-ssa/ivopt_infer_2.c: Ditto. * gcc.dg/tree-ssa/ivopt_mult_3.c: Ditto. 2013-10-29 Bin Cheng * tree-ssa-loop-ivopts.c (alloc_iv): Simplify base expression. (get_shiftadd_cost): Check equality using operand_equal_p. (force_expr_to_var_cost): Refactor the code. Handle type conversion. (split_address_cost): Call force_expr_to_var_cost.Index: gcc/testsuite/gcc.dg/tree-ssa/loop-2.c === --- gcc/testsuite/gcc.dg/tree-ssa/loop-2.c (revision 204117) +++ gcc/testsuite/gcc.dg/tree-ssa/loop-2.c (working copy) @@ -27,7 +27,7 @@ void xxx(void) /* { dg-final { scan-tree-dump-times " \\* \[^\\n\\r\]*=" 0 "optimized" } } */ /* { dg-final { scan-tree-dump-times "\[^\\n\\r\]*= \\* " 0 "optimized" } } */ -/* { dg-final { scan-tree-dump-times "MEM" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "MEM\\\[base" 1 "optimized" } } */ /* 17 * iter should be strength reduced. 
*/ Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c === --- gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c (revision 204117) +++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c (working copy) @@ -7,7 +7,8 @@ extern char a[]; -/* Can not infer loop iteration from array -- exit test can not be replaced. */ +/* Can not infer loop iteration from array -- exit test can not be + replaced by the array address. */ void foo (unsigned int i_width, TYPE dst) { unsigned long long i = 0; @@ -21,5 +22,5 @@ void foo (unsigned int i_width, TYPE dst) } } -/* { dg-final { scan-tree-dump-times "Replacing" 0 "ivopts"} } */ +/* { dg-final { scan-tree-dump-times "\[^:\]*if \\(.*j_\[0-9\]+.*\\)" 1 "ivopts"} } */ /* { dg-final { cleanup-tree-dump "ivopts" } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c === --- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c(revision 204117) +++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c(working copy) @@ -18,5 +18,5 @@ long foo(long* p, long* p2, int N1, int N2) return s; } -/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts"} } */ +/* { dg-final { scan-tree-dump-times "Replacing exit test: if \\(.*p2.*\\)" 1 "ivopts"} } */ /* { dg-final { cleanup-tree-dump "ivopts" } } */ Index: gcc/tree-ssa-loop-ivopts.c ==
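To make the motivation at the top of this message concrete, here is a hypothetical loop (not the attached test case) in which the address IV base shows up in the &arr_base[0].y form; the field offset and step assume a typical 64-bit target:

  struct t { int x; int y; };
  extern struct t arr_base[];

  int
  sum_y (int n)
  {
    int s = 0;
    for (int i = 0; i < n; i++)
      s += arr_base[i].y;   /* the address IV base tends to be written as
                               &arr_base[0].y (plus/minus step for auto-increment
                               targets); lowering it to the equivalent
                               "&arr_base + 4" form lets the cost code handle it
                               and keeps equal-valued bases from producing
                               duplicate candidates */
    return s;
  }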
[PATCH GCC]Compute, cache and use cost of auto-increment rtx patterns in IVOPT
Hi, The IVOPT in GCC has a problem that it does not use cost of auto-increment address expression in accounting, while it retreats to cost of address expression if auto-increment addressing mode is unavailable. For example, on ARM target: 1) the cost of "[reg]" (which is 6) is used for address expression "[reg], #off"; 2) the cost of "[reg+off]" (which is 2) is used for address expression "[reg, #off]!"; This causes: 1) cost of non-auto increment address expression is used for auto-increment address expression; 2) different address costs are used for pre/post increment address expressions. This patch fixes the problem by computing, caching and using the cost of auto-increment address expressions. Bootstrap and test on x86/arm. Is it OK? 2013-11-01 Bin Cheng * tree-ssa-loop-ivopts.c (enum ainc_type): New. (address_cost_data): New field. (get_address_cost): Compute auto-increment rtx cost in ainc_costs. Use ainc_costs for auto-increment rtx patterns. Cleanup TWS. Index: gcc/tree-ssa-loop-ivopts.c === --- gcc/tree-ssa-loop-ivopts.c (revision 204117) +++ gcc/tree-ssa-loop-ivopts.c (working copy) @@ -3185,10 +3185,21 @@ multiplier_allowed_in_address_p (HOST_WIDE_INT rat TODO -- there must be some better way. This all is quite crude. */ +enum ainc_type +{ + AINC_PRE_INC,/* Pre increment. */ + AINC_PRE_DEC,/* Pre decrement. */ + AINC_POST_INC, /* Post increment. */ + AINC_POST_DEC, /* Post decrement. */ + AINC_NUM,/* Number of auto increment types. */ + AINC_NONE +}; + typedef struct address_cost_data_s { HOST_WIDE_INT min_offset, max_offset; unsigned costs[2][2][2][2]; + unsigned ainc_costs[AINC_NUM]; } *address_cost_data; @@ -3206,6 +3217,7 @@ get_address_cost (bool symbol_present, bool var_pr static bool has_preinc[MAX_MACHINE_MODE], has_postinc[MAX_MACHINE_MODE]; static bool has_predec[MAX_MACHINE_MODE], has_postdec[MAX_MACHINE_MODE]; unsigned cost, acost, complexity; + enum ainc_type autoinc_type; bool offset_p, ratio_p, autoinc; HOST_WIDE_INT s_offset, autoinc_offset, msize; unsigned HOST_WIDE_INT mask; @@ -3277,33 +3289,49 @@ get_address_cost (bool symbol_present, bool var_pr reg0 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1); reg1 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 2); - if (USE_LOAD_PRE_DECREMENT (mem_mode) + if (USE_LOAD_PRE_DECREMENT (mem_mode) || USE_STORE_PRE_DECREMENT (mem_mode)) { addr = gen_rtx_PRE_DEC (address_mode, reg0); has_predec[mem_mode] = memory_address_addr_space_p (mem_mode, addr, as); + + if (has_predec[mem_mode]) + data->ainc_costs[AINC_PRE_DEC] + = address_cost (addr, mem_mode, as, speed); } - if (USE_LOAD_POST_DECREMENT (mem_mode) + if (USE_LOAD_POST_DECREMENT (mem_mode) || USE_STORE_POST_DECREMENT (mem_mode)) { addr = gen_rtx_POST_DEC (address_mode, reg0); has_postdec[mem_mode] = memory_address_addr_space_p (mem_mode, addr, as); + + if (has_postdec[mem_mode]) + data->ainc_costs[AINC_POST_DEC] + = address_cost (addr, mem_mode, as, speed); } - if (USE_LOAD_PRE_INCREMENT (mem_mode) + if (USE_LOAD_PRE_INCREMENT (mem_mode) || USE_STORE_PRE_DECREMENT (mem_mode)) { addr = gen_rtx_PRE_INC (address_mode, reg0); has_preinc[mem_mode] = memory_address_addr_space_p (mem_mode, addr, as); + + if (has_preinc[mem_mode]) + data->ainc_costs[AINC_PRE_INC] + = address_cost (addr, mem_mode, as, speed); } - if (USE_LOAD_POST_INCREMENT (mem_mode) + if (USE_LOAD_POST_INCREMENT (mem_mode) || USE_STORE_POST_INCREMENT (mem_mode)) { addr = gen_rtx_POST_INC (address_mode, reg0); has_postinc[mem_mode] = memory_address_addr_space_p (mem_mode, addr, as); + + if 
(has_postinc[mem_mode]) + data->ainc_costs[AINC_POST_INC] + = address_cost (addr, mem_mode, as, speed); } for (i = 0; i < 16; i++) { @@ -3429,22 +3457,32 @@ get_address_cost (bool symbol_present, bool var_pr s_offset = offset; autoinc = false; + autoinc_type = AINC_NONE; msize = GET_MODE_SIZE (mem_mode); autoinc_offset = offset; if (stmt_after_inc) autoinc_offset += ratio * cstep; if (symbol_present || var_present || ratio != 1) autoinc = false; - else if ((has_postinc[mem_mode] && autoinc_offset == 0 + else +{ + if (has_postinc[mem_mode] && autoinc_offset == 0 + && msize == cstep) + autoinc_type = AINC_POST_INC; + else if (has_postdec[mem_mode] && autoinc_offset == 0 + && msize == -cstep) + autoinc_type = AINC_
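To make the ARM example at the top of this message concrete, an illustrative loop (not a testsuite case) where every access is a post-increment candidate:

  /* On ARM each load below can use a post-increment address such as
     "ldr rD, [rP], #4", so the cost IVOPT should account is that of the
     auto-increment form, not the cost of a plain "[reg]" access.  */
  int
  sum (const int *p, int n)
  {
    int s = 0;
    for (int i = 0; i < n; i++)
      s += *p++;
    return s;
  }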
RE: [PATCH GCC]Simplify address expression in IVOPT
> -Original Message- > From: Richard Biener [mailto:richard.guent...@gmail.com] > Sent: Wednesday, October 30, 2013 10:46 PM > To: Bin Cheng > Cc: GCC Patches > Subject: Re: [PATCH GCC]Simplify address expression in IVOPT > > On Tue, Oct 29, 2013 at 10:18 AM, bin.cheng wrote: > > Hmm. I think you want what get_inner_reference_aff does, using the return > value of get_inner_reference as starting point for determine_base_object. > And you don't want to restrict yourselves so much on what exprs to process, > but only exclude DECL_Ps. Did you mean I should pass all address expressions to get_inner_reference_aff except the one with declaration operand? I am a little confused here why DECL_Ps should be handled specially? Seems it can be handled properly by the get_* function. Anything important I missed? > Just amend get_inner_reference_aff to return the tree base object. I found it's inappropriate to do this because: functions like get_inner_reference* only accept reference expressions, while base_object has to be computed for other kinds of expressions too. Take gimple statement "a_1 = *ptr_1" as an example, the base passed to alloc_iv is ptr_1; the base_object computed by determine_base_object is also ptr_1, which we can't rely on get_inner_reference* . Also It seems the comment before determine_base_object might be inaccurate: " Returns a memory object to that EXPR points." which I think is " Returns a pointer pointing to the main memory object to that EXPR points." Right? > > Note that this isn't really "simplifying" but rather lowering all addresses. > > The other changes in this patch are unrelated, right? Right, I should have named this message like "refine cost computation of expressions in IVOPT" just as the patch file. Thanks. bin
Re: [PATCH GCC]Compute, cache and use cost of auto-increment rtx patterns in IVOPT
On Mon, Nov 4, 2013 at 7:38 PM, Richard Biener wrote: > On Mon, Nov 4, 2013 at 4:31 AM, bin.cheng wrote: >> Hi, >> >> The IVOPT in GCC has a problem that it does not use cost of auto-increment >> address expression in accounting, while it retreats to cost of address >> expression if auto-increment addressing mode is unavailable. >> For example, on ARM target: >> 1) the cost of "[reg]" (which is 6) is used for address expression "[reg], >> #off"; >> 2) the cost of "[reg+off]" (which is 2) is used for address expression >> "[reg, #off]!"; >> >> This causes: >> 1) cost of non-auto increment address expression is used for auto-increment >> address expression; >> 2) different address costs are used for pre/post increment address >> expressions. >> This patch fixes the problem by computing, caching and using the cost of >> auto-increment address expressions. >> >> Bootstrap and test on x86/arm. Is it OK? > > But don't you need to adjust > > static bool > determine_use_iv_cost_address (struct ivopts_data *data, >struct iv_use *use, struct iv_cand *cand) > { > bitmap depends_on; > bool can_autoinc; > int inv_expr_id = -1; > comp_cost cost = get_computation_cost (data, use, cand, true, &depends_on, > &can_autoinc, &inv_expr_id); > > if (cand->ainc_use == use) > { > if (can_autoinc) > cost.cost -= cand->cost_step; > > this which seems to try to compensate for your issue? That's where the problem gets complicated, depending on how the back end defines address costs. For back ends that define costs roughly according to the true cost of the addressing mode, the address cost of an auto-increment addressing mode does not take the saved stepping instruction into consideration, so this code is still necessary. Moreover, according to the GCC internals description of TARGET_ADDRESS_COST, RISC machines may define different address costs for addressing modes that actually execute identically at the micro-architecture level (like ARM for now). The problems are: 1) Though address costs are defined in this "discriminated" way, the saved stepping instruction is unlikely to be accounted for either; the address cost of an auto-increment address expression shouldn't go that far. 2) We think this "discriminated" address cost model was established before the gimple passes existed and is outdated. The true micro-architecture address cost (or the cost normalized with COSTS_N_INSNS) should be used in GCC nowadays. RTL passes like fwprop_addr which use address cost as heuristic information should be refined accordingly... but that's another problem entirely (I am investigating it). So overall, I think the stepping cost still needs to be subtracted here. > > Or maybe I don't understand. > > CCing Bernd who implemented this IIRC. Any suggestions appreciated. Thanks. bin -- Best Regards.
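Spelling the accounting out with the ARM numbers from the opening message (the step cost is target dependent and only named symbolically here):

  before:  cost("[reg], #off")  = address_cost("[reg]")     - cost_step = 6 - cost_step
           cost("[reg, #off]!") = address_cost("[reg+off]") - cost_step = 2 - cost_step
  after:   cost("[reg], #off")  = ainc_costs[AINC_POST_INC] - cost_step
           cost("[reg, #off]!") = ainc_costs[AINC_PRE_INC]  - cost_step

so with the patch the two writeback forms are compared using their own cached costs instead of the unrelated "[reg]" and "[reg+off]" costs.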
RE: [PATCH GCC]Simplify address expression in IVOPT
> -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of bin.cheng > Sent: Monday, November 04, 2013 4:35 PM > To: 'Richard Biener' > Cc: GCC Patches > Subject: RE: [PATCH GCC]Simplify address expression in IVOPT > > > > > -Original Message- > > From: Richard Biener [mailto:richard.guent...@gmail.com] > > Sent: Wednesday, October 30, 2013 10:46 PM > > To: Bin Cheng > > Cc: GCC Patches > > Subject: Re: [PATCH GCC]Simplify address expression in IVOPT > > > > On Tue, Oct 29, 2013 at 10:18 AM, bin.cheng wrote: > > > > Hmm. I think you want what get_inner_reference_aff does, using the > > return value of get_inner_reference as starting point for > determine_base_object. > > And you don't want to restrict yourselves so much on what exprs to > process, > > but only exclude DECL_Ps. > Did you mean I should pass all address expressions to > get_inner_reference_aff except the one with declaration operand? > I am a little confused here why DECL_Ps should be handled specially? Seems > it can be handled properly by the get_* function. Anything important I > missed? > > > Just amend get_inner_reference_aff to return the tree base object. > I found it's inappropriate to do this because: functions like > get_inner_reference* only accept reference expressions, while base_object > has to be computed for other kinds of expressions too. Take gimple > statement "a_1 = *ptr_1" as an example, the base passed to alloc_iv is ptr_1; > the base_object computed by determine_base_object is also ptr_1, which > we can't rely on get_inner_reference* . > > Also It seems the comment before determine_base_object might be > inaccurate: > " Returns a memory object to that EXPR points." which I think is " Returns a > pointer pointing to the main memory object to that EXPR points." > Right? > > > > Note that this isn't really "simplifying" but rather lowering all > addresses. > > > > The other changes in this patch are unrelated, right? > Right, I should have named this message like "refine cost computation of > expressions in IVOPT" just as the patch file. Hi, I updated the patch according to review comments. Now it lowers all address expressions except ones with DECL_P to get_inner_reference_aff, it also computes base_object which is used later to ease determine_base_object. Bootstrap and test on x86/x86_64/arm on going, is it OK if tests pass? Thanks. bin gcc/testsuite/ChangeLog 2013-11-05 Bin Cheng * gcc.dg/tree-ssa/loop-2.c: Refine check condition. * gcc.dg/tree-ssa/ivopt_infer_2.c: Ditto. * gcc.dg/tree-ssa/ivopt_mult_3.c: Ditto. 2013-11-05 Bin Cheng * tree-ssa-loop-ivopts.c (alloc_iv): Lower address expressions. (get_shiftadd_cost): Check equality using operand_equal_p. (force_expr_to_var_cost): Refactor the code. Handle type conversion. (split_address_cost): Call force_expr_to_var_cost. * tree-affine.c (get_inner_reference_aff): Return base_addr. * tree-affine.h (get_inner_reference_aff): Change prototype.Index: gcc/testsuite/gcc.dg/tree-ssa/loop-2.c === --- gcc/testsuite/gcc.dg/tree-ssa/loop-2.c (revision 204117) +++ gcc/testsuite/gcc.dg/tree-ssa/loop-2.c (working copy) @@ -27,7 +27,7 @@ void xxx(void) /* { dg-final { scan-tree-dump-times " \\* \[^\\n\\r\]*=" 0 "optimized" } } */ /* { dg-final { scan-tree-dump-times "\[^\\n\\r\]*= \\* " 0 "optimized" } } */ -/* { dg-final { scan-tree-dump-times "MEM" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "MEM\\\[base" 1 "optimized" } } */ /* 17 * iter should be strength reduced. 
*/ Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c === --- gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c (revision 204117) +++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c (working copy) @@ -7,7 +7,8 @@ extern char a[]; -/* Can not infer loop iteration from array -- exit test can not be replaced. */ +/* Can not infer loop iteration from array -- exit test can not be + replaced by the array address. */ void foo (unsigned int i_width, TYPE dst) { unsigned long long i = 0; @@ -21,5 +22,5 @@ void foo (unsigned int i_width, TYPE dst) } } -/* { dg-final { scan-tree-dump-times "Replacing" 0 "ivopts"} } */ +/* { dg-final { scan-tree-dump-times "\[^:\]*if \\(.*j_\[0-9\]+.*\\)" 1 "ivopts"} } */ /* { dg-final { cleanup-tree-dump "ivopts&
Re: [PATCH GCC]Simplify address expression in IVOPT
On Tue, Nov 5, 2013 at 7:19 PM, Yufeng Zhang wrote: > On 11/05/13 10:13, bin.cheng wrote: >> >> Index: gcc/tree-affine.c >> === >> --- gcc/tree-affine.c (revision 204117) >> +++ gcc/tree-affine.c (working copy) >> @@ -874,10 +874,11 @@ debug_aff (aff_tree *val) >> fprintf (stderr, "\n"); >> } >> >> -/* Returns address of the reference REF in ADDR. The size of the >> accessed >> - location is stored to SIZE. */ >> +/* Computes address of the reference REF in ADDR. The size of the >> accessed >> + location is stored to SIZE. Returns pointer to the ultimate >> containing >> + object to which REF refers. */ >> >> -void >> +tree >> get_inner_reference_aff (tree ref, aff_tree *addr, double_int *size) >> { >> HOST_WIDE_INT bitsize, bitpos; >> @@ -904,6 +905,8 @@ get_inner_reference_aff (tree ref, aff_tree *addr, >> aff_combination_add (addr,&tmp); >> >> *size = double_int::from_shwi ((bitsize + BITS_PER_UNIT - 1) / >> BITS_PER_UNIT); >> + >> + return base_addr; >> } >> > > I think what Richard suggests is to return the base object rather the > address of the base object, i.e. I am not sure about that. We have to pass pointer_type expression to function determine_base_object for address expressions, because there is no way to tell pointer from object once we are in determine_base_object. Thanks. bin > > return base; > > This is good in keeping the consistency in the return values between > get_inner_reference_aff and get_inner_reference. > > Yufeng > -- Best Regards.
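To make the pointer-vs-object distinction above concrete, here is a minimal sketch of how an IVOPT-side caller could consume the new return value. This is hypothetical caller code, not part of the posted patch; the wrapper name lower_ref_base_object is invented, while get_inner_reference_aff (as modified above) and determine_base_object are the existing functions mentioned in the thread.

  /* Sketch only: lower memory reference REF with the affine machinery
     and derive its base object.  */
  static tree
  lower_ref_base_object (tree ref)
  {
    aff_tree addr;
    double_int size;
    tree base_addr = get_inner_reference_aff (ref, &addr, &size);
    /* base_addr is pointer-typed, which is what determine_base_object
       expects; returning the object itself would lose that.  */
    return determine_base_object (base_addr);
  }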
Re: [PATCH PR68021]Set ratio to 1 when computing the value of biv cand by itself
On Wed, Feb 10, 2016 at 1:27 PM, Richard Biener wrote: > > On Wed, Feb 10, 2016 at 12:34 PM, Bin Cheng wrote: > > Hi, > > This is another way to fix PR68021, and I think it's the least intrusive > > way. The issue is triggered in a special case in which cand is an original > > biv, and use denotes the value of the biv itself. In this case, the use is > > added specifically for the original biv, as a result, get_computation_aff > > isn't called for the pair before rewriting the use. It is > > possible that constant_multiple_of/operand_equal_p could fail because of > > inconsistent fold behavior. The fold behavior is fully described in > > PR68021. > > This patch fixes the IVOPT part of the issue by setting ratio to 1, because it is > > known that the use has the value of the biv cand. > > > > Bootstrap and test on x86_64 and aarch64. Is it OK if no failures? > > Ok, but please add a comment why we have this special-casing instead > of relying on constant_multiple_of. Done, patch applied at revision 233269. Thanks, bin > > > Thanks, > Richard. > > > Thanks, > > bin > > > > 2016-02-09 Bin Cheng > > > > PR tree-optimization/68021 > > * tree-ssa-loop-ivopts.c (get_computation_aff): Set ratio to 1 if > > when computing the value of biv cand by itself. > > > > gcc/testsuite/ChangeLog > > 2016-02-09 Bin Cheng > > > > PR tree-optimization/68021 > > * gcc.dg/tree-ssa/pr68021.c: New test.
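A heavily hedged sketch of the special case described above. The guard condition and the local names use, cand, ustep, cstep and rat are assumptions, not the committed hunk: the point is only that when the use is the original biv expressed by the biv candidate itself, the ratio between the two steps is 1 by construction, so the fold-sensitive call to constant_multiple_of can be skipped.

  /* Sketch, inside get_computation_aff (local names assumed).  */
  if (cand->pos == IP_ORIGINAL && cand->incremented_at == use->stmt)
    /* The use is the value of the original biv candidate itself.  */
    rat = 1;
  else if (!constant_multiple_of (ustep, cstep, &rat))
    return false;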
Re: [PATCH PR69052]Check if loop inv can be propagated into mem ref with additional addr expr canonicalization
On Thu, Feb 11, 2016 at 7:14 AM, Jeff Law wrote: > On 02/09/2016 04:08 AM, Bin Cheng wrote: >> >> Hi, >> When counting cost for loop inv, GCC checks if a loop inv can be >> propagated into its use site (a memory reference). If it cannot be >> propagated, we increase its cost so that it's expensive enough to be hoisted >> out of loop. Currently we simply replace loop inv register in the use site >> with its definition expression, then call validate_changes to check if the >> result insn is valid. This is weak because validate_changes doesn't take >> canonicalization into consideration. Given below example: >> >>Loop inv def: >> 69: r149:SI=r87:SI+const(unspec[`'] 1) >>REG_DEAD r87:SI >>Loop inv use: >> 70: r150:SI=[r90:SI*0x4+r149:SI] >>REG_DEAD r149:SI >> >> The address expression after propagation is "r90 * 0x4 + (r87 + >> const(unspec[`']))". Function validate_changes simply returns false to >> it. As a matter of fact, the propagation is feasible if we canonicalize >> address expression into the form like "(r90 * 0x4 + r87) + >> const(unspec[`'])". >> >> This patch fixes the problem by canonicalizing address expression and >> verifying if the new addr is valid. The canonicalization follows GCC insn >> canonicalization rules. The test case from bugzilla PR is also included. >> As for the canonicalize_address interface, there is another >> canonicalize_address in fwprop.c which only changes shift into mult. I >> think it would be good to factor out a common RTL interface in GCC, but >> that's stage1 work. > > > Also note there's bits in combine that will canonicalize appropriate shifts > into mults. Clearly there's a need for some generalized routines to take a > fairly generic address and perform canonicalizations and simplifications on > it. > >> Bootstrap and test on x86_64 and AArch64. Is it OK? >> >> Thanks, >> bin >> >> 2016-02-09 Bin Cheng >> >> PR tree-optimization/69052 >> * loop-invariant.c (canonicalize_address): New function. >> (inv_can_prop_to_addr_use): Check validity of address expression >> which is canonicalized by above function. >> >> gcc/testsuite/ChangeLog >> 2016-02-09 Bin Cheng >> >> PR tree-optimization/69052 >> * gcc.target/i386/pr69052.c: New test. >> >> >> pr69052-20160204.txt >> >> >> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c >> index 707f044..157e273 100644 >> --- a/gcc/loop-invariant.c >> +++ b/gcc/loop-invariant.c >> @@ -754,6 +754,74 @@ create_new_invariant (struct def *def, rtx_insn >> *insn, bitmap depends_on, >> return inv; >> } >> >> +/* Returns a canonical version address for X. It identifies >> + addr expr in the form of A + B + C. Following instruction >> + canonicalization rules, MULT operand is moved to the front, >> + CONST operand is moved to the end; also PLUS operators are >> + chained to the left. */ >> + >> +static rtx >> +canonicalize_address (rtx x) >> +{ >> + rtx op0, op1, op2; >> + machine_mode mode = GET_MODE (x); >> + enum rtx_code code = GET_CODE (x); >> + >> + if (code != PLUS) >> +return x; >> + >> + /* Extract operands from A + B (+ C). */ >> + if (GET_CODE (XEXP (x, 0)) == PLUS) >> +{ >> + op0 = XEXP (XEXP (x, 0), 0); >> + op1 = XEXP (XEXP (x, 0), 1); >> + op2 = XEXP (x, 1); >> +} >> + else if (GET_CODE (XEXP (x, 1)) == PLUS) >> +{ >> + op0 = XEXP (x, 0); >> + op1 = XEXP (XEXP (x, 1), 0); >> + op2 = XEXP (XEXP (x, 1), 1); >> +} >> + else >> +{ >> + op0 = XEXP (x, 0); >> + op1 = XEXP (x, 1); >> + op2 = NULL_RTX; >> +} >> + >> + /* Move MULT operand to the front. 
*/ >> + if (!REG_P (op1) && !CONST_INT_P (op1)) >> +std::swap (op0, op1); > > This feels a bit hack-ish in the sense that you already know the form of the > RTL you're expecting and just assume that you'll be given something of that > form, but no more complex. > > ISTM you're better off walking the whole rtx, recording the tidbits as you > go into a vec. If you see something unexpected during that walk, you punt > canonicalization of the whole expression. > > You then sort the vec. You want to move things like MULT to the start and > all the constants to the end I think. > > You then do simplifications, particularly on the constants, but there may be > something useful to do with MULT terms as well. You could also arrange to > rewrite ASHIFTs into MULTs at this stage. > > Then you generate a new equivalent expression from the simplified operands > in the vec. > > You might look at tree-ssa-reassoc for ideas on implementation details. > > Initially just use it in the LICM code, but I think given that kind of > structure it'd be generally useful to replace bits of combine and fwprop > > If your contention is that only a few forms really matter, then I'd like to > see those forms spelled out better in the comment and some kind of checking > that we have reasonable inc
Re: [PATCH PR69052]Check if loop inv can be propagated into mem ref with additional addr expr canonicalization
On Fri, Feb 19, 2016 at 10:24 PM, Jeff Law wrote: > On 02/16/2016 11:43 AM, Bin Cheng wrote: >> >> >> From: Jeff Law >> Sent: 11 February 2016 23:26 >> To: Bin.Cheng >> Cc: Bin Cheng; gcc-patches@gcc.gnu.org; nd >> Subject: Re: [PATCH PR69052]Check if loop inv can be propagated into mem >> ref with additional addr expr canonicalization >> >>>> On 02/11/2016 10:59 AM, Bin.Cheng wrote: >> >> >>>> Hi Jeff, >>>> Thanks for detailed review. I also think a generic canonical >>>> interface for RTL is much better. I will give it a try. But with >>>> high chance it's a next stage1 stuff. >>> >>> That is, of course, fine. However, if you do get something ready, I'd >>> support using it within LICM for gcc-6, then using it in other places >>> for gcc-7. >> >> Hi, >> This is the updated version patch. It fixes the problem by introducing a >> generic address canonicalization interface. This new interface >> canonicalizes address expression in following steps: >> 1) Rewrite ASHIFT into MULT recursively. >> 2) Divide address into sub expressions with PLUS as the separator. >> 3) Sort sub expressions according to precedence defined for >> communative operations. >> 4) Simplify CONST_INT_P sub expressions. >> 5) Create new canonicalized address and return. >> >> According to review comments, this interface is now restricted in LCIM, >> and will probably be expanded to other passes like fwprop and combine after >> entering GCC7. >> Bootstrap and test on x86_64 and AArch64. Is it OK? >> >> Thanks, >> bin >> >> 2016-02-15 Bin Cheng >> >> PR tree-optimization/69052 >> * loop-invariant.c (canonicalize_address_mult): New function. >> (MAX_CANON_ADDR_PARTS): New macro. >> (collect_address_parts): New function. >> (compare_address_parts, canonicalize_address): New functions. >> (inv_can_prop_to_addr_use): Check validity of address expression >> which is canonicalized by above canonicalize_address. >> >> gcc/testsuite/ChangeLog >> 2016-02-15 Bin Cheng >> >> PR tree-optimization/69052 >> * gcc.target/i386/pr69052.c: New test. > > This is exactly what I was looking for from a design standpoint. > > My only question is why didn't you use FOR_EACH_SUBRTX_VRA from rtl-iter.h > to walk the RTX expressions in collect_address_parts and > canonicalize_address_mult? Hi Jeff, Nothing special, just I haven't used this before, also canonicalize_address_mult is alphabetically copied from fwprop.c. One question is when rewriting SHIFT to MULT, we need to modify rtl expression in place, does FOR_EACH_SUBRTX iterator support this? If yes, what is the behavior for modified sub-expression? Thanks, bin > > Jeff > > >> >>> >>> Jeff >> >> >
Re: [PATCH PR69052]Check if loop inv can be propagated into mem ref with additional addr expr canonicalization
On Fri, Feb 19, 2016 at 10:24 PM, Jeff Law wrote: > On 02/16/2016 11:43 AM, Bin Cheng wrote: >> >> >> From: Jeff Law >> Sent: 11 February 2016 23:26 >> To: Bin.Cheng >> Cc: Bin Cheng; gcc-patches@gcc.gnu.org; nd >> Subject: Re: [PATCH PR69052]Check if loop inv can be propagated into mem >> ref with additional addr expr canonicalization >> >>>> On 02/11/2016 10:59 AM, Bin.Cheng wrote: >> >> >>>> Hi Jeff, >>>> Thanks for detailed review. I also think a generic canonical >>>> interface for RTL is much better. I will give it a try. But with >>>> high chance it's a next stage1 stuff. >>> >>> That is, of course, fine. However, if you do get something ready, I'd >>> support using it within LICM for gcc-6, then using it in other places >>> for gcc-7. >> >> Hi, >> This is the updated version patch. It fixes the problem by introducing a >> generic address canonicalization interface. This new interface >> canonicalizes address expression in following steps: >> 1) Rewrite ASHIFT into MULT recursively. >> 2) Divide address into sub expressions with PLUS as the separator. >> 3) Sort sub expressions according to precedence defined for >> communative operations. >> 4) Simplify CONST_INT_P sub expressions. >> 5) Create new canonicalized address and return. >> >> According to review comments, this interface is now restricted in LCIM, >> and will probably be expanded to other passes like fwprop and combine after >> entering GCC7. >> Bootstrap and test on x86_64 and AArch64. Is it OK? >> >> Thanks, >> bin >> >> 2016-02-15 Bin Cheng >> >> PR tree-optimization/69052 >> * loop-invariant.c (canonicalize_address_mult): New function. >> (MAX_CANON_ADDR_PARTS): New macro. >> (collect_address_parts): New function. >> (compare_address_parts, canonicalize_address): New functions. >> (inv_can_prop_to_addr_use): Check validity of address expression >> which is canonicalized by above canonicalize_address. >> >> gcc/testsuite/ChangeLog >> 2016-02-15 Bin Cheng >> >> PR tree-optimization/69052 >> * gcc.target/i386/pr69052.c: New test. > > This is exactly what I was looking for from a design standpoint. > > My only question is why didn't you use FOR_EACH_SUBRTX_VRA from rtl-iter.h > to walk the RTX expressions in collect_address_parts and > canonicalize_address_mult? Hi Jeff, Here comes the updated patch using FOR_EACH_SUBRTX_VAR in both functions you mentioned. Bootstrap and test on x86_64 and AArch64, is it OK? The ChangeLog isn't affected. Thanks, bin > > Jeff > > >> >>> >>> Jeff >> >> > diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index 707f044..dcbe932 100644 --- a/gcc/loop-invariant.c +++ b/gcc/loop-invariant.c @@ -52,6 +52,7 @@ along with GCC; see the file COPYING3. If not see #include "cfgloop.h" #include "expr.h" #include "params.h" +#include "rtl-iter.h" #include "dumpfile.h" /* The data stored for the loop. */ @@ -754,6 +755,130 @@ create_new_invariant (struct def *def, rtx_insn *insn, bitmap depends_on, return inv; } +/* Return a canonical version of X for the address, from the point of view, + that all multiplications are represented as MULT instead of the multiply + by a power of 2 being represented as ASHIFT. + + Callers should prepare a copy of X because this function may modify it + in place. 
*/ + +static void +canonicalize_address_mult (rtx x) +{ + subrtx_var_iterator::array_type array; + FOR_EACH_SUBRTX_VAR (iter, array, x, NONCONST) +{ + rtx sub = *iter; + + if (GET_CODE (sub) == ASHIFT + && CONST_INT_P (XEXP (sub, 1)) + && INTVAL (XEXP (sub, 1)) < GET_MODE_BITSIZE (GET_MODE (sub)) + && INTVAL (XEXP (sub, 1)) >= 0) + { + HOST_WIDE_INT shift = INTVAL (XEXP (sub, 1)); + PUT_CODE (sub, MULT); + XEXP (sub, 1) = gen_int_mode ((HOST_WIDE_INT) 1 << shift, + GET_MODE (sub)); + iter.skip_subrtxes (); + } +} +} + +/* Maximum number of sub expressions in address. We set it to + a small integer since it's unlikely to have a complicated + address expression. */ + +#define MAX_CANON_ADDR_PARTS (5) + +/* Collect sub expressions in address X with PLUS as the seperator. + Sub expressions are stored in vector ADDR_PARTS. */ + +static void +collect_address_parts (rtx x,
Re: [PATCH PR69052]Check if loop inv can be propagated into mem ref with additional addr expr canonicalization
On Thu, Feb 25, 2016 at 6:39 AM, Jeff Law wrote: > On 02/22/2016 02:22 AM, Bin.Cheng wrote: > >>> My only question is why didn't you use FOR_EACH_SUBRTX_VRA from >>> rtl-iter.h >>> to walk the RTX expressions in collect_address_parts and >>> canonicalize_address_mult? >> >> Hi Jeff, >> Nothing special, just I haven't used this before, also >> canonicalize_address_mult is alphabetically copied from fwprop.c. One >> question is when rewriting SHIFT to MULT, we need to modify rtl >> expression in place, does FOR_EACH_SUBRTX iterator support this? If >> yes, what is the behavior for modified sub-expression? > > Hmm. The question of semantics when we change the underlying > sub-expressions is an interesting one. > > While I think we're OK in practice using the iterators, I think that's more > of an accident than by design -- if MULT/ASHIFT had a different underlying > RTL structure then I'm much less sure using the iterators would be safe. Hi Jeff, Yes, I thought about this too and finally decided to skip sub rtxes in modified MULT/ASHIFT expressions. I think this will make the patch with FOR_EACH_SUBRTX iterator safe. Although it doesn't dive into MULT/ASHIFT while the original one does, I don't think there is such case in practice, after all we can't handle such complicated address expression either. > > Let's go with your original patch that didn't use the iterators. Sorry for > making you do the additional work/testing to build the iterator version. Not even a problem. > But after pondering the issue you raised, I think your original patch is > safer. So in conclusion, I think both versions should be safe, the iterator one is definitely cleaner. It is your call which version I should apply. Thanks, bin > > > jeff
Re: [PATCH PR69052]Check if loop inv can be propagated into mem ref with additional addr expr canonicalization
On Thu, Feb 25, 2016 at 8:47 AM, Bin.Cheng wrote: > On Thu, Feb 25, 2016 at 6:39 AM, Jeff Law wrote: >> On 02/22/2016 02:22 AM, Bin.Cheng wrote: >> >>>> My only question is why didn't you use FOR_EACH_SUBRTX_VRA from >>>> rtl-iter.h >>>> to walk the RTX expressions in collect_address_parts and >>>> canonicalize_address_mult? >>> >>> Hi Jeff, >>> Nothing special, just I haven't used this before, also >>> canonicalize_address_mult is alphabetically copied from fwprop.c. One >>> question is when rewriting SHIFT to MULT, we need to modify rtl >>> expression in place, does FOR_EACH_SUBRTX iterator support this? If >>> yes, what is the behavior for modified sub-expression? >> >> Hmm. The question of semantics when we change the underlying >> sub-expressions is an interesting one. >> >> While I think we're OK in practice using the iterators, I think that's more >> of an accident than by design -- if MULT/ASHIFT had a different underlying >> RTL structure then I'm much less sure using the iterators would be safe. > Hi Jeff, > Yes, I thought about this too and finally decided to skip sub rtxes in > modified MULT/ASHIFT expressions. I think this will make the patch > with FOR_EACH_SUBRTX iterator safe. Although it doesn't dive into > MULT/ASHIFT while the original one does, I don't think there is such > case in practice, after all we can't handle such complicated address > expression either. >> >> Let's go with your original patch that didn't use the iterators. Sorry for >> making you do the additional work/testing to build the iterator version. > Not even a problem. >> But after pondering the issue you raised, I think your original patch is >> safer. > So in conclusion, I think both versions should be safe, the iterator > one is definitely cleaner. It is your call which version I should > apply. Hi Jeff, I tend to apply the new patch with FOR_EACH_SUBRTX because it's clearer. Does it work for you? Of course if you do think it's not that safe I will fall back to the old one. Thanks, bin
Re: [PATCH PR69052]Set higher precedence for CONST_WIDE_INT than CONST_INT when canonicalizing addr expr
On Fri, Mar 4, 2016 at 11:57 AM, Richard Biener wrote: > On Fri, Mar 4, 2016 at 12:07 PM, Bin Cheng wrote: >> Hi, >> An address canonicalization interface was introduced by my original patch to >> PR69052. The interface sorts sub-parts in an address expression based on >> precedences defined by function commutative_operand_precedence. It also >> assumes that all CONST_INT sub-parts are at the end of the vector after sorting. >> But this is not always true because commutative_operand_precedence sets the >> same precedence to both CONST_INT and CONST_WIDE_INT. The patch could be >> broken in cases which have CONST_WIDE_INT sorted after CONST_INT. Even >> though I don't think we can really run into such a complicated case, I worked >> out a patch forcing CONST_INT to a lower precedence than CONST_WIDE_INT, so >> that it will certainly be sorted after all other kinds of sub-parts. >> >> This is an obvious change. Bootstrap&test on x86_64, bootstrap&test on >> AArch64. Is it OK for this stage? > > I believe the obvious change would be to change > commutative_operand_precedence to pre-CONST_WIDE_INT behavior, namely > giving CONST_WIDE_INT the same precedence as CONST_DOUBLE. Yes, but I am not sure what else this change will affect, so I made the smallest change in the original code. I am testing this now. It would be great if anyone could describe it a bit. Thanks, bin > > Richard. > > > >> Thanks, >> bin >> >> 2016-03-04 Bin Cheng >> >> PR rtl-optimization/69052 >> * loop-invariant.c (compare_address_parts): Force CONST_INT to lower >> precedence than CONST_WIDE_INT.
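A minimal sketch of the ordering tweak under discussion. This is a hypothetical comparator, not the actual compare_address_parts from the patch: CONST_INT is forced behind every other kind of sub-part, including CONST_WIDE_INT, and everything else falls back to commutative_operand_precedence.

  /* Hypothetical qsort-style comparator over address sub-parts.  */
  static int
  addr_part_cmp (const void *p1, const void *p2)
  {
    rtx x = *(const rtx *) p1;
    rtx y = *(const rtx *) p2;

    /* CONST_INT always sorts last, even against CONST_WIDE_INT.  */
    if (CONST_INT_P (x) && !CONST_INT_P (y))
      return 1;
    if (!CONST_INT_P (x) && CONST_INT_P (y))
      return -1;

    /* Higher precedence sub-parts (e.g. MULT) sort towards the front.  */
    return (commutative_operand_precedence (y)
            - commutative_operand_precedence (x));
  }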
Re: [PING][GCC ARM]Refine scaled address expression on ARM
Ping^2 Thanks, bin On Fri, Nov 22, 2013 at 3:32 PM, bin.cheng wrote: > PING. > Hi, there is a patch at > http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01353.html which slipped away. > > Thanks, > bin > > > -- Best Regards.
[PATCH ARM/Embedded-4_8-branch]Backport revision 200103 from mainline
Hi, This patch back ports revision 200103 from mainline to ARM/Embedded-4_8-branch. It is tested on arm cortex-m3 and no regressions. Thanks, binIndex: gcc/tree-ssa-uncprop.c === --- gcc/tree-ssa-uncprop.c (revision 205471) +++ gcc/tree-ssa-uncprop.c (revision 205472) @@ -466,12 +466,11 @@ struct equiv_hash_elt equiv_hash_elt; void **slot; - /* If the argument is not an invariant, and refers to the same -underlying variable as the PHI result, then there's no -point in un-propagating the argument. */ + /* If the argument is not an invariant and can be potentially +coalesced with the result, then there's no point in +un-propagating the argument. */ if (!is_gimple_min_invariant (arg) - && (SSA_NAME_VAR (arg) == SSA_NAME_VAR (res) - && TREE_TYPE (arg) == TREE_TYPE (res))) + && gimple_can_coalesce_p (arg, res)) continue; /* Lookup this argument's value in the hash table. */ @@ -485,7 +484,7 @@ int j; /* Walk every equivalence with the same value. If we find -one with the same underlying variable as the PHI result, +one that can potentially coalesce with the PHI rsult, then replace the value in the argument with its equivalent SSA_NAME. Use the most recent equivalence as hopefully that results in shortest lifetimes. */ @@ -493,8 +492,7 @@ { tree equiv = elt->equivalences[j]; - if (SSA_NAME_VAR (equiv) == SSA_NAME_VAR (res) - && TREE_TYPE (equiv) == TREE_TYPE (res)) + if (gimple_can_coalesce_p (equiv, res)) { SET_PHI_ARG_DEF (phi, e->dest_idx, equiv); break; Index: gcc/testsuite/ChangeLog.arm === --- gcc/testsuite/ChangeLog.arm (revision 205471) +++ gcc/testsuite/ChangeLog.arm (revision 205472) @@ -1,3 +1,10 @@ +2013-11-28 Bin Cheng + + Backport mainline r200103 + 2013-06-15 Jeff Law (missing) + + * gcc.dg/tree-ssa/coalesce-1.c: New test. + 2013-11-27 Terry Guo Backport mainline r205391 Index: gcc/testsuite/gcc.dg/tree-ssa/coalesce-1.c === --- gcc/testsuite/gcc.dg/tree-ssa/coalesce-1.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/coalesce-1.c (revision 205472) @@ -0,0 +1,195 @@ +/* { dg-do compile } */ + +/* { dg-options "-O2 -fdump-rtl-expand-details" } */ + +typedef long unsigned int size_t; +union tree_node; +typedef union tree_node *tree; +union gimple_statement_d; +typedef union gimple_statement_d *gimple; +typedef const union tree_node *const_tree; +typedef const union gimple_statement_d *const_gimple; +struct gimple_seq_d; +typedef struct gimple_seq_d *gimple_seq; +struct edge_def; +typedef struct edge_def *edge; +struct basic_block_def; +typedef struct basic_block_def *basic_block; +typedef const struct basic_block_def *const_basic_block; +struct tree_exp +{ + tree operands[1]; +}; +typedef struct ssa_use_operand_d +{ + tree *use; +} ssa_use_operand_t; +struct phi_arg_d +{ + struct ssa_use_operand_d imm_use; +}; +union tree_node +{ + struct tree_exp exp; +}; +struct function +{ +}; +extern struct function *cfun; +struct edge_def +{ + unsigned int dest_idx; +}; +static __inline__ void +VEC_edge_must_be_pointer_type (void) +{ + (void) ((edge) 1 == (void *) 1); +} typedef struct VEC_edge_base + +{ + unsigned num; + unsigned alloc; + edge vec[1]; +} VEC_edge_base; +typedef struct VEC_edge_none +{ + VEC_edge_base base; +} VEC_edge_none; + +static __inline__ edge +VEC_edge_base_index (const VEC_edge_base * vec_, unsigned ix_, +const char *file_, unsigned line_, const char *function_) +{ + return vec_->vec[ix_]; +} + +typedef struct VEC_edge_gc +{ + VEC_edge_base base; +} VEC_edge_gc; +struct basic_block_def +{ + VEC_edge_gc *succs; +}; +static __inline__ edge +single_succ_edge 
(const_basic_block bb) +{ + return (VEC_edge_base_index + bb)->succs) ? &((bb)->succs)->base : 0), (0), + "/home/gcc/virgin-gcc/gcc/basic-block.h", 563, __FUNCTION__)); +} + +edge find_edge (basic_block, basic_block); +typedef tree *def_operand_p; +typedef ssa_use_operand_t *use_operand_p; +struct gimple_seq_node_d; +typedef struct gimple_seq_node_d *gimple_seq_node; +struct gimple_seq_node_d +{ + gimple stmt; +}; +typedef struct +{ + gimple_seq_node ptr; + gimple_seq seq; + basic_block bb; +} gimple_stmt_iterator; +struct gimple_statement_phi +{ + struct phi_arg_d args[1]; +}; +union gimple_statement_d +{ + struct gimple_statement_phi gimple_phi; +}; +extern size_t const gimple_ops_offset_[]; +static __inline__ tre
Re: [PATCH ARM]Refine scaled address expression on ARM
On Thu, Nov 28, 2013 at 6:48 PM, Richard Earnshaw wrote: > On 18/09/13 10:15, bin.cheng wrote: >> >> >>> -Original Message- >>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- >>> ow...@gcc.gnu.org] On Behalf Of bin.cheng >>> Sent: Monday, September 02, 2013 3:09 PM >>> To: Richard Earnshaw >>> Cc: gcc-patches@gcc.gnu.org >>> Subject: RE: [PATCH ARM]Refine scaled address expression on ARM >>> >>> >>> >>>> -Original Message- >>>> From: Richard Earnshaw >>>> Sent: Thursday, August 29, 2013 9:06 PM >>>> To: Bin Cheng >>>> Cc: gcc-patches@gcc.gnu.org >>>> Subject: Re: [PATCH ARM]Refine scaled address expression on ARM >>>> >>>> On 28/08/13 08:00, bin.cheng wrote: >>>>> Hi, >>>>> >>>>> This patch refines scaled address expression on ARM. It supports >>>>> "base+index*scale" in arm_legitimate_address_outer_p. It also tries >>>>> to legitimize "base + index * scale + offset" with "reg <- base + >>>>> offset; reg >>>>> + index * scale" by introducing thumb2_legitimize_address. For now >>>>> + function >>>>> thumb2_legitimize_address is a kind of placeholder and just does the >>>>> mentioned transformation by calling to try_multiplier_address. >>>>> Hoping we can improve it in the future. >>>>> >>>>> With this patch: >>>>> 1) "base+index*scale" is recognized. >>>> >>>> That's because (PLUS (REG) (MULT (REG) (CONST))) is not canonical form. >>>> So this shouldn't be necessary. Can you identify where this >>> non-canoncial form is being generated? >>>> >>> >>> Oh, for now ivopt constructs "index*scale" to test whether backend >>> supports scaled addressing mode, which is not valid on ARM, so I was going >>> to construct "base + index*scale" instead. Since "base + index * scale" >> is not >>> canonical form, I will construct the canonical form and drop this part of >> the >>> patch. >>> >>> Is rest of this patch OK? >>> >> Hi Richard, I removed the part over which you concerned and created this >> updated patch. >> >> Is it OK? >> >> Thanks. >> bin >> >> 2013-09-18 Bin Cheng >> >> * config/arm/arm.c (try_multiplier_address): New function. >> (thumb2_legitimize_address): New function. >> (arm_legitimize_address): Call try_multiplier_address and >> thumb2_legitimize_address. >> >> >> 6-arm-scaled_address-20130918.txt >> >> >> Index: gcc/config/arm/arm.c >> === >> --- gcc/config/arm/arm.c (revision 200774) >> +++ gcc/config/arm/arm.c (working copy) >> @@ -6652,6 +6654,106 @@ legitimize_tls_address (rtx x, rtx reg) >> } >> } >> >> +/* Try to find address expression like base + index * scale + offset >> + in X. If we find one, force base + offset into register and >> + construct new expression reg + index * scale; return the new >> + address expression if it's valid. Otherwise return X. */ >> +static rtx >> +try_multiplier_address (rtx x, enum machine_mode mode ATTRIBUTE_UNUSED) >> +{ >> + rtx tmp, base_reg, new_rtx; >> + rtx base = NULL_RTX, index = NULL_RTX, scale = NULL_RTX, offset = >> NULL_RTX; >> + >> + gcc_assert (GET_CODE (x) == PLUS); >> + >> + /* Try to find and record base/index/scale/offset in X. 
*/ >> + if (GET_CODE (XEXP (x, 1)) == MULT) >> +{ >> + tmp = XEXP (x, 0); >> + index = XEXP (XEXP (x, 1), 0); >> + scale = XEXP (XEXP (x, 1), 1); >> + if (GET_CODE (tmp) != PLUS) >> + return x; >> + >> + base = XEXP (tmp, 0); >> + offset = XEXP (tmp, 1); >> +} >> + else >> +{ >> + tmp = XEXP (x, 0); >> + offset = XEXP (x, 1); >> + if (GET_CODE (tmp) != PLUS) >> + return x; >> + >> + base = XEXP (tmp, 0); >> + scale = XEXP (tmp, 1); >> + if (GET_CODE (base) == MULT) >> + { >> + tmp = base; >> + base = scale; >> + scale = tmp; >> + } >> + if (GET_CODE (scale) != MULT) >> + return x; >> + >> +
Re: [PATCH ARM]Refine scaled address expression on ARM
On Thu, Nov 28, 2013 at 8:06 PM, Bin.Cheng wrote: > On Thu, Nov 28, 2013 at 6:48 PM, Richard Earnshaw wrote: >> On 18/09/13 10:15, bin.cheng wrote: >>> >>> >>>> -Original Message- >>>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- >>>> ow...@gcc.gnu.org] On Behalf Of bin.cheng >>>> Sent: Monday, September 02, 2013 3:09 PM >>>> To: Richard Earnshaw >>>> Cc: gcc-patches@gcc.gnu.org >>>> Subject: RE: [PATCH ARM]Refine scaled address expression on ARM >>>> >>>> >>>> >>>>> -Original Message- >>>>> From: Richard Earnshaw >>>>> Sent: Thursday, August 29, 2013 9:06 PM >>>>> To: Bin Cheng >>>>> Cc: gcc-patches@gcc.gnu.org >>>>> Subject: Re: [PATCH ARM]Refine scaled address expression on ARM >>>>> >>>>> On 28/08/13 08:00, bin.cheng wrote: >>>>>> Hi, >>>>>> >>>>>> This patch refines scaled address expression on ARM. It supports >>>>>> "base+index*scale" in arm_legitimate_address_outer_p. It also tries >>>>>> to legitimize "base + index * scale + offset" with "reg <- base + >>>>>> offset; reg >>>>>> + index * scale" by introducing thumb2_legitimize_address. For now >>>>>> + function >>>>>> thumb2_legitimize_address is a kind of placeholder and just does the >>>>>> mentioned transformation by calling to try_multiplier_address. >>>>>> Hoping we can improve it in the future. >>>>>> >>>>>> With this patch: >>>>>> 1) "base+index*scale" is recognized. >>>>> >>>>> That's because (PLUS (REG) (MULT (REG) (CONST))) is not canonical form. >>>>> So this shouldn't be necessary. Can you identify where this >>>> non-canoncial form is being generated? >>>>> >>>> >>>> Oh, for now ivopt constructs "index*scale" to test whether backend >>>> supports scaled addressing mode, which is not valid on ARM, so I was going >>>> to construct "base + index*scale" instead. Since "base + index * scale" >>> is not >>>> canonical form, I will construct the canonical form and drop this part of >>> the >>>> patch. >>>> >>>> Is rest of this patch OK? >>>> >>> Hi Richard, I removed the part over which you concerned and created this >>> updated patch. >>> >>> Is it OK? >>> >>> Thanks. >>> bin >>> >>> 2013-09-18 Bin Cheng >>> >>> * config/arm/arm.c (try_multiplier_address): New function. >>> (thumb2_legitimize_address): New function. >>> (arm_legitimize_address): Call try_multiplier_address and >>> thumb2_legitimize_address. >>> >>> >>> 6-arm-scaled_address-20130918.txt >>> >>> >>> Index: gcc/config/arm/arm.c >>> === >>> --- gcc/config/arm/arm.c (revision 200774) >>> +++ gcc/config/arm/arm.c (working copy) >>> @@ -6652,6 +6654,106 @@ legitimize_tls_address (rtx x, rtx reg) >>> } >>> } >>> >>> +/* Try to find address expression like base + index * scale + offset >>> + in X. If we find one, force base + offset into register and >>> + construct new expression reg + index * scale; return the new >>> + address expression if it's valid. Otherwise return X. */ >>> +static rtx >>> +try_multiplier_address (rtx x, enum machine_mode mode ATTRIBUTE_UNUSED) >>> +{ >>> + rtx tmp, base_reg, new_rtx; >>> + rtx base = NULL_RTX, index = NULL_RTX, scale = NULL_RTX, offset = >>> NULL_RTX; >>> + >>> + gcc_assert (GET_CODE (x) == PLUS); >>> + >>> + /* Try to find and record base/index/scale/offset in X. */ >>> + if (GET_CODE (XEXP (x, 1)) == MULT) >>> +{ >>> + tmp = XEXP (x, 0); >>> + index = XEXP (XEXP (x, 1), 0); >>> + scale = XEXP (XEXP (x, 1), 1); >>> + if (GET_CODE (tmp) != PLUS) >>> + return x; >>> + >>> + base = XEXP (tmp, 0); >>> + offset = XEXP (tmp, 1); >>> +} >>> + else >>> +{ >>>
Re: [PATCH ARM]Refine scaled address expression on ARM
On Fri, Nov 29, 2013 at 6:44 PM, Richard Biener wrote: > On Fri, Nov 29, 2013 at 8:52 AM, Bin.Cheng wrote: >> On Thu, Nov 28, 2013 at 8:06 PM, Bin.Cheng wrote: >>> On Thu, Nov 28, 2013 at 6:48 PM, Richard Earnshaw wrote: >>>> On 18/09/13 10:15, bin.cheng wrote: >>>>> >>>>> >>>>>> -Original Message- >>>>>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- >>>>>> ow...@gcc.gnu.org] On Behalf Of bin.cheng >>>>>> Sent: Monday, September 02, 2013 3:09 PM >>>>>> To: Richard Earnshaw >>>>>> Cc: gcc-patches@gcc.gnu.org >>>>>> Subject: RE: [PATCH ARM]Refine scaled address expression on ARM >>>>>> >>>>>> >>>>>> >>>>>>> -Original Message- >>>>>>> From: Richard Earnshaw >>>>>>> Sent: Thursday, August 29, 2013 9:06 PM >>>>>>> To: Bin Cheng >>>>>>> Cc: gcc-patches@gcc.gnu.org >>>>>>> Subject: Re: [PATCH ARM]Refine scaled address expression on ARM >>>>>>> >>>>>>> On 28/08/13 08:00, bin.cheng wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> This patch refines scaled address expression on ARM. It supports >>>>>>>> "base+index*scale" in arm_legitimate_address_outer_p. It also tries >>>>>>>> to legitimize "base + index * scale + offset" with "reg <- base + >>>>>>>> offset; reg >>>>>>>> + index * scale" by introducing thumb2_legitimize_address. For now >>>>>>>> + function >>>>>>>> thumb2_legitimize_address is a kind of placeholder and just does the >>>>>>>> mentioned transformation by calling to try_multiplier_address. >>>>>>>> Hoping we can improve it in the future. >>>>>>>> >>>>>>>> With this patch: >>>>>>>> 1) "base+index*scale" is recognized. >>>>>>> >>>>>>> That's because (PLUS (REG) (MULT (REG) (CONST))) is not canonical form. >>>>>>> So this shouldn't be necessary. Can you identify where this >>>>>> non-canoncial form is being generated? >>>>>>> >>>>>> >>>>>> Oh, for now ivopt constructs "index*scale" to test whether backend >>>>>> supports scaled addressing mode, which is not valid on ARM, so I was >>>>>> going >>>>>> to construct "base + index*scale" instead. Since "base + index * scale" >>>>> is not >>>>>> canonical form, I will construct the canonical form and drop this part of >>>>> the >>>>>> patch. >>>>>> >>>>>> Is rest of this patch OK? >>>>>> >>>>> Hi Richard, I removed the part over which you concerned and created this >>>>> updated patch. >>>>> >>>>> Is it OK? >>>>> >>>>> Thanks. >>>>> bin >>>>> >>>>> 2013-09-18 Bin Cheng >>>>> >>>>> * config/arm/arm.c (try_multiplier_address): New function. >>>>> (thumb2_legitimize_address): New function. >>>>> (arm_legitimize_address): Call try_multiplier_address and >>>>> thumb2_legitimize_address. >>>>> >>>>> >>>>> 6-arm-scaled_address-20130918.txt >>>>> >>>>> >>>>> Index: gcc/config/arm/arm.c >>>>> === >>>>> --- gcc/config/arm/arm.c (revision 200774) >>>>> +++ gcc/config/arm/arm.c (working copy) >>>>> @@ -6652,6 +6654,106 @@ legitimize_tls_address (rtx x, rtx reg) >>>>> } >>>>> } >>>>> >>>>> +/* Try to find address expression like base + index * scale + offset >>>>> + in X. If we find one, force base + offset into register and >>>>> + construct new expression reg + index * scale; return the new >>>>> + address expression if it's valid. Otherwise return X. */ >>>>> +static rtx >>>>> +try_multiplier_address (rtx x, enum m
Re: [PATCH ARM]Refine scaled address expression on ARM
On Sat, Nov 30, 2013 at 12:34 AM, Richard Earnshaw wrote: > On 29/11/13 11:46, Yufeng Zhang wrote: >> On 11/29/13 07:52, Bin.Cheng wrote: >>> After thinking twice, I some kind of think we should not re-associate >>> addresses during expanding, because of lacking of context information. >>> Take base + scaled_index + offset as an example in PR57540, we just >>> don't know if "base+offset" is loop invariant from either backend or >>> RTL expander. >> >> I'm getting less convinced by re-associating base with offset >> unconditionally. One counter example is >> >> typedef int arr_1[20]; >> void foo (arr_1 a1, int i) >> { >>a1[i+10] = 1; >> } >> >> I'm experimenting a patch to get the immediate offset in the above >> example to be the last addend in the address computing (as mentioned in >> http://gcc.gnu.org/ml/gcc/2013-11/msg00581.html), aiming to get the >> following code-gen: >> >> add r1, r0, r1, asl #2 >> mov r3, #1 >> str r3, [r1, #40] >> >> With your patch applied, the effort will be reverted to >> >> add r0, r0, #40 >> mov r3, #1 >> str r3, [r0, r1, asl #2] >> > > And another one is: > > > > typedef int arr_1[20]; > void foo (arr_1 a1, int i) > { >a1[i+10] = 1; >a1[i+11] = 1; > } > > This should compile to: > > add r1, r0, r1, asl #2 > mov r3, #1 > str r3, [r1, #40] > str r3, [r1, #44] > > And which on Thumb2 should then collapse to: > > add r1, r0, r1, asl #2 > mov r3, #1 > strdr3, r3, [r1, #40] > > With your patch I don't see any chance of being able to get to this > situation. > > (BTW, we currently generate: > > mov r3, #1 > add r1, r1, #10 > add r2, r0, r1, asl #2 > str r3, [r0, r1, asl #2] > str r3, [r2, #4] > > which is insane). The two memory references share common sub expressions, SLSR is designed to handle this case, and it should be improved to handle. The original patch are only used to pick up cases not handled by SLSR and IVOPT. Anyway, as you saw from previous message, to do the refactoring during expand is not a good practice, without enough CSE/INVARIANT information, there will be always catched and missed opportunities, that's why I think another lowering besides SLSR/IVOPT on gimple might be a win. Thanks, bin > > I think I see where you're coming from on the original testcase, but I > think you're trying to solve the wrong problem. In your test case the > base is an eliminable register, which is likely to be replaced with an > offset expression during register allocation. The problem then, I > think, is that the cost of these virtual registers is treated the same > as any other pseudo register, when it may really have the cost of a PLUS > expression. > > Perhaps the cost of using an eliminable register should be raised in > rtx_costs() (treating them as equivalent to (PLUS (reg) (CONST_INT > (TBD))), so that loop optimizations will try to hoist suitable > sub-expressions out the loop and replace them with real pseudos. > > R. > > > > > > > > -- Best Regards.
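To spell out the rtx_costs idea quoted above, here is an untested sketch of the idea only; the exact register checks and the cost value are assumptions, not a proposed patch. Inside the arm rtx_costs hook, a use of an eliminable register such as the soft frame pointer or the argument pointer could be charged as if it were a PLUS of a register and a constant, so loop-invariant motion sees its real cost.

  /* Sketch for the arm rtx_costs hook: eliminable registers will be
     replaced by reg + offset at elimination time, so cost them like
     an addition instead of a plain register.  */
  if (REG_P (x)
      && (REGNO (x) == FRAME_POINTER_REGNUM
          || REGNO (x) == ARG_POINTER_REGNUM))
    {
      *total = COSTS_N_INSNS (1);
      return true;
    }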
Re: [PATCH ARM]Refine scaled address expression on ARM
On Sat, Nov 30, 2013 at 12:34 AM, Richard Earnshaw wrote: > On 29/11/13 11:46, Yufeng Zhang wrote: >> On 11/29/13 07:52, Bin.Cheng wrote: >>> After thinking twice, I some kind of think we should not re-associate >>> addresses during expanding, because of lacking of context information. >>> Take base + scaled_index + offset as an example in PR57540, we just >>> don't know if "base+offset" is loop invariant from either backend or >>> RTL expander. >> >> I'm getting less convinced by re-associating base with offset >> unconditionally. One counter example is >> >> typedef int arr_1[20]; >> void foo (arr_1 a1, int i) >> { >>a1[i+10] = 1; >> } >> >> I'm experimenting a patch to get the immediate offset in the above >> example to be the last addend in the address computing (as mentioned in >> http://gcc.gnu.org/ml/gcc/2013-11/msg00581.html), aiming to get the >> following code-gen: >> >> add r1, r0, r1, asl #2 >> mov r3, #1 >> str r3, [r1, #40] >> >> With your patch applied, the effort will be reverted to >> >> add r0, r0, #40 >> mov r3, #1 >> str r3, [r0, r1, asl #2] >> > > And another one is: > > > > typedef int arr_1[20]; > void foo (arr_1 a1, int i) > { >a1[i+10] = 1; >a1[i+11] = 1; > } > > This should compile to: > > add r1, r0, r1, asl #2 > mov r3, #1 > str r3, [r1, #40] > str r3, [r1, #44] > > And which on Thumb2 should then collapse to: > > add r1, r0, r1, asl #2 > mov r3, #1 > strdr3, r3, [r1, #40] > > With your patch I don't see any chance of being able to get to this > situation. > > (BTW, we currently generate: > > mov r3, #1 > add r1, r1, #10 > add r2, r0, r1, asl #2 > str r3, [r0, r1, asl #2] > str r3, [r2, #4] > > which is insane). > > I think I see where you're coming from on the original testcase, but I > think you're trying to solve the wrong problem. In your test case the > base is an eliminable register, which is likely to be replaced with an > offset expression during register allocation. The problem then, I > think, is that the cost of these virtual registers is treated the same > as any other pseudo register, when it may really have the cost of a PLUS > expression. > > Perhaps the cost of using an eliminable register should be raised in > rtx_costs() (treating them as equivalent to (PLUS (reg) (CONST_INT > (TBD))), so that loop optimizations will try to hoist suitable > sub-expressions out the loop and replace them with real pseudos. > I now have access to the code. The gimple before expanding is like: : # j_26 = PHI # k_29 = PHI j_9 = j_26 + 1; k_8 = parent[k_29]; if (k_8 >= 0) goto ; else goto ; The rtl generated after expanding is like: 88: NOTE_INSN_BASIC_BLOCK 7 89: r174:SI=r174:SI+0x1 90: r191:SI=r173:SI<<0x2 91: r190:SI=r105:SI+r191:SI 92: r173:SI=[r190:SI-0x810] with r105 == virtual_stack_vars_rtx, and it will be instantiated into frame_pointer_rtx in vregs pass: 88: NOTE_INSN_BASIC_BLOCK 7 89: r174:SI=r174:SI+0x1 90: r191:SI=r173:SI<<0x2 91: r190:SI=sfp:SI+r191:SI 92: r173:SI=[r190:SI-0x810] As you pointed out, sfp is not hoisted as a high cost invariant. I am not sure if loop-invariant will hoist a single pseudo reg even it's assigned with a higher cost. But before the invariant problem, the choice made by RTL expand is bad because it hides the CSE opportunity, because (sfp + r173<<2 - 0x810) == (sp + r173<<2), (sfp-0x810) can be folded into (sp), then we can embed the shift instruction in scaled addressing mode [sp + r173 << 2]. Thanks, bin -- Best Regards.
Re: Make C11 _Alignof return least not greatest alignment for a type (PR c/52023)
On Thu, Dec 5, 2013 at 6:57 AM, Joseph S. Myers wrote: > Index: c-family/c-common.c > === > --- c-family/c-common.c (revision 205668) > +++ c-family/c-common.c (working copy) > @@ -4921,14 +4921,17 @@ c_common_get_alias_set (tree t) > } > > /* Compute the value of 'sizeof (TYPE)' or '__alignof__ (TYPE)', where > - the second parameter indicates which OPERATOR is being applied. > + the IS_SIZEOF parameter indicates which operator is being applied. > The COMPLAIN flag controls whether we should diagnose possibly > ill-formed constructs or not. LOC is the location of the SIZEOF or > - TYPEOF operator. */ > + TYPEOF operator. If MIN_ALIGNOF, the least alignment required for > + a type in any context should be returned, rather than the normal > + alignment for that type. */ > > tree > c_sizeof_or_alignof_type (location_t loc, > - tree type, bool is_sizeof, int complain) > + tree type, bool is_sizeof, bool min_alignof, > + int complain) > { >const char *op_name; >tree value = NULL; > @@ -4994,6 +4997,22 @@ c_sizeof_or_alignof_type (location_t loc, > value = size_binop_loc (loc, CEIL_DIV_EXPR, TYPE_SIZE_UNIT (type), > size_int (TYPE_PRECISION (char_type_node) > / BITS_PER_UNIT)); > + else if (min_alignof) > + { > + unsigned int align = TYPE_ALIGN (type); > + align = MIN (align, BIGGEST_ALIGNMENT); > +#ifdef BIGGEST_FIELD_ALIGNMENT > + align = MIN (align, BIGGEST_FIELD_ALIGNMENT); > +#endif > + tree field = build_decl (UNKNOWN_LOCATION, FIELD_DECL, NULL_TREE, > + type); > + unsigned int field_align = align; > +#ifdef ADJUST_FIELD_ALIGN > + field_align = ADJUST_FIELD_ALIGN (field, field_align); > +#endif Won't *field* be unused if ADJUST_FIELD_ALIGN not defined? Thanks, bin -- Best Regards.
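One way to address the unused-variable concern raised above, sketched as a suggestion only and not a committed fix: build the dummy FIELD_DECL inside the same conditional block that consumes it, so nothing is left unused when ADJUST_FIELD_ALIGN is not defined. All identifiers below come from the quoted hunk.

  unsigned int field_align = align;
#ifdef ADJUST_FIELD_ALIGN
  /* The FIELD_DECL exists only to feed ADJUST_FIELD_ALIGN, so build it
     here rather than unconditionally above.  */
  tree field = build_decl (UNKNOWN_LOCATION, FIELD_DECL, NULL_TREE,
                           type);
  field_align = ADJUST_FIELD_ALIGN (field, field_align);
#endif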
Re: [Ping]Two pending IVOPT patches
On Fri, Dec 6, 2013 at 2:10 PM, Jeff Law wrote: > On 11/26/13 03:52, Bin.Cheng wrote: >> >> On Tue, Nov 26, 2013 at 6:06 AM, Jeff Law wrote: >>> >>> On 11/25/13 02:11, Bin.Cheng wrote: >>>> >>>> >>>> >>>> Slightly tune to make iv cand choosing algorithm more accurate: >>>> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01574.html >>> >>> >>> It would help if you had some sample codes where this patch was useful. >>> I >>> can kind-of see what's going on, but I'm way too unfamiliar with the >>> tree-IV >>> code to review this change. >>> >> Jeff, Thanks to your review. >> As for example, consider below code on ARM/cortex-m4, (the code itself >> may be non-sense): > > [ ... ] > So in this testcase, it looks like the effect is to eliminate the IVs for > the loop counters, instead expressing the loop termination in terms of the > pointers. Yes, it's linear function test elimination. > > As far as I can tell this is entirely due to the changes to iv_ca_narrow. Exactly. > > Do you have any codes where iv_ca_extend helps? I can see how that hunk > appears to be safe, and I'm guessing that setting the cost pair at each step > could potentially give more accurate costing on the next iteration of the > loop. But I'd love to be able to see this effect directly rather than just > assuming it's helpful. Given that I'm prepared to approve the iv_ca_extend > hunk. Very sorry I can't provide an example about this now. I remember it's a case in eembc I encountered, but with current trunk I can't reproduce it with the change about iv_ca_extend. Maybe recent checking has changed the behavior of IVOPT. Considering there is no case about this change, I am fine to discard this part of patch and continue with iv_ca_narrow part. > > I realize the changes to iv_ca_extend are of lesser importance, but I'm > still working my way through the tree IV code to get some basic > understanding of what it's doing and why. I hope to be able to look at > iv_ca_narrow in more detail over the weekend. Thanks very much. Thanks, bin > > jeff > > -- Best Regards.
[PATCH PR41488]Recognize more induction variables by simplifying PEELED chrec in scalar evolution
Hi, Entry pr41488 shows a case in which induction variable can't be recognized or coalesced. As noted in the bug entry, one possible fix is to teach PRE not to do some specific transformation. However, it's in nature a scalar evolution issue. Considering below snippet: # i_17 = PHI # _20 = PHI <_5(5), start_4(D)(3)> ... i_13 = i_17 + 1; _5 = start_4(D) + i_13; Variable _20 appears in the form of PEELED chrec like (start_4, _5)_LOOP, meaning it has value "start_4" in the 1st iteration, then changes to _5 in following iterations. PEELED chrec is not implemented by GCC right now, it can be simplified into either POLYNOMIAL or PERIOD one. The POLYNOMIAL chrec is processed by GCC after being recognized, as in the examle, _20 is actually {start_4, 1}_LOOP. This patch modifies scalar evolution by trying to simplify PEELED chrec. Without this patch, generated code for pr41488.c is like: foo: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. cmp r1, r2 bge .L7 push{r4} mov r3, r1 ldr r4, [r0] addsr2, r2, #1 addsr1, r1, #1 movsr0, #0 .L3: str r0, [r4, r3, lsl #2] mov r3, r1 addsr1, r1, #1 cmp r1, r2 bne .L3 ldr r4, [sp], #4 .L7: bx lr .size foo, .-foo Which is improved to: foo: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. cmp r1, r2 bge .L1 ldr r3, [r0] movsr0, #0 add r1, r3, r1, lsl #2 add r2, r3, r2, lsl #2 .L3: str r0, [r1], #4 cmp r1, r2 bne .L3 .L1: bx lr .size foo, .-foo One point needs to be clarified that I use tree_to_aff_combination_expand in the patch. Rational is the number of the specific PEELED_CHRECs should be moderate, I also check the equality literally firstly and resort to affine facility if that fails. By measuring bootstrap of gcc and spec2kint, I collected the number of opportunities caught by this patch like below: literal comparison/affine comparison GCC ~1200/~250 Spec2Kint 93/34 I could back trace the ssa chain instead of using affine functions, but that would miss some cases. Bootstrap and test on x86/x86_64/arm. Is it OK? Thanks, bin 2013-12-06 Bin Cheng PR tree-optimization/41488 * tree-ssa-loop-ivopts.c (add_old_iv_candidates): Don't add cand for PEELED_CHREC kind of IV. * tree-scalar-evolution.c: Include necessary header files. (peeled_chrec_map, simplify_peeled_chrec): New. (analyze_evolution_in_loop): New static variable. Call simplify_peeled_chrec. (scev_initialize): Initialize peeled_chrec_map. (scev_reset, scev_finalize): Reset and release peeled_chrec_map. gcc/testsuite/ChangeLog 2013-12-06 Bin Cheng * gcc.dg/tree-ssa/scev-7.c: New test. * gcc.dg/pr41488.c: New test.Index: gcc/tree-scalar-evolution.c === --- gcc/tree-scalar-evolution.c (revision 205688) +++ gcc/tree-scalar-evolution.c (working copy) @@ -280,6 +280,8 @@ along with GCC; see the file COPYING3. If not see #include "tree-ssa.h" #include "cfgloop.h" #include "tree-chrec.h" +#include "pointer-set.h" +#include "tree-affine.h" #include "tree-scalar-evolution.h" #include "dumpfile.h" #include "params.h" @@ -1380,7 +1382,66 @@ follow_ssa_edge (struct loop *loop, gimple def, gi } +/* Pointer map used when simplifying PEELED_CHREC into POLYNOMIAL_CHREC. */ +static pointer_map_t *peeled_chrec_map; +/* Simplify PEELED_CHREC represented by (init_cond, arg) in LOOP. + Handle below case and return the corresponding POLYNOMIAL_CHREC: + + # i_17 = PHI + # _20 = PHI <_5(5), start_4(D)(3)> + ... 
+ i_13 = i_17 + 1; + _5 = start_4(D) + i_13; + + Though variable _20 appears as a PEELED_CHREC in the form of + (start_4, _5)_LOOP, it's a POLYNOMIAL_CHREC like {start_4, 1}_LOOP. + + See PR41488. */ + +static tree +simplify_peeled_chrec (struct loop *loop, tree arg, tree init_cond) +{ + aff_tree aff1, aff2; + tree ev, left, right, type, step_val; + + ev = instantiate_parameters (loop, analyze_scalar_evolution (loop, arg)); + if (ev == NULL_TREE || TREE_CODE (ev) != POLYNOMIAL_CHREC) +return chrec_dont_know; + + left = CHREC_LEFT (ev); + right = CHREC_RIGHT (ev); + type = TREE_TYPE (left); + step_val = chrec_fold_plus (type, init_cond, right); + + /* Transform (init, {left, right}_LOOP)_LOOP to {init, right}_LOOP + if "left" equals to "init + right". */ + if (operand_equal_p (left, step_val, 0)) +{ + if (dump_file && (dump_flags & TDF_S
Re: [PATCH GCC]Pick up more address lowering cases for ivopt and tree-affine.c
On Fri, Dec 6, 2013 at 6:19 PM, Richard Biener wrote: > On Mon, Nov 25, 2013 at 7:41 PM, Jeff Law wrote: >> On 11/25/13 02:22, bin.cheng wrote: >> >> Unless there's a PR for this problem, I think this needs to wait. > > I agree. Btw, please split the patch. Yes, I will get back to this after entering stage 1 again :) Hi Richard, I talked with you about clean strip_offset_1 up after this series of base simplification patches, but I realized it's not safe because affine facility has it's limit, like can only support 8 elements. Though the cleanup passes bootstrap and test on x86/x86_64 and most of codes in strip_offset_1 won't be executed usually, I guess we'd better to live with it, so what do you think? Thanks, bin > > Index: gcc/tree-affine.c > === > --- gcc/tree-affine.c(revision 205087) > +++ gcc/tree-affine.c(working copy) > @@ -328,7 +328,19 @@ tree_to_aff_combination (tree expr, tree type, aff > double_int::from_uhwi (bitpos / BITS_PER_UNIT)); >core = build_fold_addr_expr (core); >if (TREE_CODE (core) == ADDR_EXPR) > -aff_combination_add_elt (comb, core, double_int_one); > +{ > + /* Handle &MEM[ptr + CST] in core part of complex reference. */ > + if (TREE_CODE (TREE_OPERAND (core, 0)) == MEM_REF) > +{ > + core = TREE_OPERAND (core, 0); > + tree_to_aff_combination (TREE_OPERAND (core, 0), type, &tmp); > + aff_combination_add (comb, &tmp); > + tree_to_aff_combination (TREE_OPERAND (core, 1), sizetype, &tmp); > + aff_combination_add (comb, &tmp); > +} > + else > +aff_combination_add_elt (comb, core, double_int_one); > +} >else > { >tree_to_aff_combination (core, type, &tmp) > > please handle the offset before taking the address, thus > > if (TREE_CODE (core) == MEM_REF) > { >add constant offset; >core = TREE_OPERAND (core, 0); > } > else > core = build_fold_addr_expr (core); > > that simplifies things and avoids the address building. > > Richard. > >> jeff >> >> -- Best Regards.
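For reference, a sketch of how the quoted hunk looks when reshaped along the suggestion above; this is a sketch only, not the committed form. The constant offset is peeled off the MEM_REF before any address is built, so no ADDR_EXPR of a MEM_REF is ever constructed. The variables core, comb, tmp and type are the ones from the surrounding tree_to_aff_combination code.

  if (TREE_CODE (core) == MEM_REF)
    {
      /* Add the constant offset first, then recurse on the pointer.  */
      tree_to_aff_combination (TREE_OPERAND (core, 1), sizetype, &tmp);
      aff_combination_add (comb, &tmp);
      core = TREE_OPERAND (core, 0);
      tree_to_aff_combination (core, type, &tmp);
      aff_combination_add (comb, &tmp);
    }
  else
    {
      core = build_fold_addr_expr (core);
      if (TREE_CODE (core) == ADDR_EXPR)
        aff_combination_add_elt (comb, core, double_int_one);
      else
        {
          tree_to_aff_combination (core, type, &tmp);
          aff_combination_add (comb, &tmp);
        }
    }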
Re: [PATCH GCC]Pick up more address lowering cases for ivopt and tree-affine.c
On Fri, Dec 6, 2013 at 7:20 PM, Richard Biener wrote: > On Fri, Dec 6, 2013 at 11:40 AM, Bin.Cheng wrote: >> On Fri, Dec 6, 2013 at 6:19 PM, Richard Biener >> wrote: >>> On Mon, Nov 25, 2013 at 7:41 PM, Jeff Law wrote: >>>> On 11/25/13 02:22, bin.cheng wrote: >>>> >>>> Unless there's a PR for this problem, I think this needs to wait. >>> >>> I agree. Btw, please split the patch. >> Yes, I will get back to this after entering stage 1 again :) >> >> Hi Richard, >> I talked with you about clean strip_offset_1 up after this series of >> base simplification patches, but I realized it's not safe because >> affine facility has it's limit, like can only support 8 elements. >> Though the cleanup passes bootstrap and test on x86/x86_64 and most of >> codes in strip_offset_1 won't be executed usually, I guess we'd better >> to live with it, so what do you think? > > Not sure - I'm lacking some context here ;) If you have a cleanup patch > fine - WRT the affine limit of 8 elements, further elements will just > add to the rest tree. This is to limit compile-time. Yes, so it's possible to have COMPONENT_REF stored in rest tree if we reach the 8 elements limit. In this case the COMPONENT_REF will be visited by the code in strip_offset_1, although it sounds unlikely to happen. Thanks, bin > > Richard.
Re: [PATCH PR41488]Recognize more induction variables by simplifying PEELED chrec in scalar evolution
On Sat, Dec 7, 2013 at 3:20 AM, Jeff Law wrote: > On 12/06/13 03:29, bin.cheng wrote: >> >> Hi, >> Entry pr41488 shows a case in which induction variable can't be recognized >> or coalesced. As noted in the bug entry, one possible fix is to teach PRE >> not to do some specific transformation. However, it's in nature a scalar >> evolution issue. Considering below snippet: >> >> # i_17 = PHI >> # _20 = PHI <_5(5), start_4(D)(3)> >> ... >> i_13 = i_17 + 1; >> _5 = start_4(D) + i_13; >> >> Variable _20 appears in the form of PEELED chrec like (start_4, _5)_LOOP, >> meaning it has value "start_4" in the 1st iteration, then changes to _5 in >> following iterations. PEELED chrec is not implemented by GCC right now, >> it >> can be simplified into either POLYNOMIAL or PERIOD one. > > Right. PEELED_CHREC was removed from the LNO branch back in 2004, > presumably before the LNO branch was merged into the mainline. No reason > was given. Maybe we don't have any user for such CHRECs in GCC now, but is's just guessing. > > But what you're really discovering here is that _20 is just another standard > looking IV that appears in the form of a peeled chrec, right? Exactly, IVOPT handles only polynomial ones, and _20 is one such chrec only appearing in peeled form. > > > > The POLYNOMIAL >> >> chrec is processed by GCC after being recognized, as in the examle, _20 is >> actually {start_4, 1}_LOOP. > > Right. Based on reading the archives, it looks like this stuff is/was > generated by PRE. I also suspect jump threading can create them. There was > talk of throttling PRE to leave things in a form that the IV analysis could > more easily digest, but I'm not sure if that was ever completed. Also could just because it is coded in that way, just as the case I added in patch. I found real examples from ggc-page.c in GCC. But it's always good to cleanup input of an optimization, I presume that's why Richard tried to move IVOPT later before. > > [ snip ] > > >> >> One point needs to be clarified that I use tree_to_aff_combination_expand >> in >> the patch. Rational is the number of the specific PEELED_CHRECs should be >> moderate, I also check the equality literally firstly and resort to affine >> facility if that fails. By measuring bootstrap of gcc and spec2kint, I >> collected the number of opportunities caught by this patch like below: >> literal comparison/affine comparison >> GCC ~1200/~250 >> Spec2Kint 93/34 >> >> I could back trace the ssa chain instead of using affine functions, but >> that >> would miss some cases. > > I assume tree_to_aff_combination_expand is relatively expensive, thus the > two approaches, one which avoids tree_to_aff_combination_expand. Considering the dump of case in the patch: : _16 = start_4(D) + 1000; if (end_6(D) < _16) goto ; else goto ; : pretmp_22 = sp_7(D)->data; : # i_17 = PHI # _20 = PHI <_5(5), _16(3)> _9 = (unsigned int) _20; _10 = _9 * 4; _11 = pretmp_22 + _10; *_11 = 0; i_13 = i_17 + -1; _5 = start_4(D) + i_13; if (_5 > end_6(D)) goto ; else goto ; : goto ; I have to prove (_16 + -1) equals to (start_4 + 999) for _20, so either using affine function to back trace the definition of _16, or I have to back trace ssa_chain manually. Here I use affine function because the number of such cases should be moderate and there are more complicated case in which simple back trace will lose. Another question, is it acceptable to add an parameter for tree_to_aff_combination_expand to limit the number of recursive call for it? Thus we don't need to expand to leaf node every time. 
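A sketch of the affine-based equality check described above; the shape and the function name are assumptions, not the exact patch text. The idea is to expand both trees into affine combinations, subtract one from the other, and test whether the difference folds to an integer zero.

  /* Sketch: return true iff LEFT and STEP_VAL are provably equal,
     using the affine machinery to look through SSA definitions.
     CACHE is the shared expansion cache (peeled_chrec_map in the
     patch); the function name is hypothetical.  */
  static bool
  peeled_chrec_values_equal_p (tree left, tree step_val, tree type,
                               pointer_map_t **cache)
  {
    aff_tree aff1, aff2;

    tree_to_aff_combination_expand (left, type, &aff1, cache);
    tree_to_aff_combination_expand (step_val, type, &aff2, cache);
    aff_combination_scale (&aff2, double_int::from_shwi (-1));
    aff_combination_add (&aff1, &aff2);

    return integer_zerop (aff_combination_to_tree (&aff1));
  }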
> > > In add_old_iv_candidates, is it the case that the only time > SSA_NAME_DEF_STMT (def) is another PHI node is when it's one of these affine I suppose so. Actually, IVOPT makes an assert on IP_ORIGINAL candidates in function rewrite_use_nonlinear_expr, like:

  /* An important special case -- if we are asked to express value of
     the original iv by itself, just exit; there is no need to introduce
     a new computation (that might also need casting the variable to
     unsigned and back).  */
  if (cand->pos == IP_ORIGINAL && cand->incremented_at == use->stmt)
    {
      enum tree_code stmt_code;

      gcc_assert (is_gimple_assign (use->stmt));
      ...

The only case I can think of involving another kind of PHI node is a merging PHI in conditional code, like: LOOP: x_1 = phi if (cond) x_3 = x_1 + 1; else x_4
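For readers following the thread, the GIMPLE dump quoted in the message above corresponds roughly to source of the following shape. This is only a reconstruction for illustration (the identifier names, types, and the bound 1000 are guesses, not the testcase attached to the patch):

    struct stack
    {
      int *data;
    };

    void
    foo (struct stack *sp, int start, int end)
    {
      int i;

      /* The loop exit test computes start + i, and the next iteration's
         array index reuses that value through a loop-carried PHI.  That
         PHI is the extra variable (_20 above): it is start + 1000 on the
         first iteration and the previous start + i afterwards, i.e. the
         peeled chrec (_16, _5)_LOOP, which is really just the ordinary
         polynomial chrec {start + 1000, -1}_LOOP.  */
      for (i = 1000; start + i > end; i--)
        sp->data[start + i] = 0;
    }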
Re: [PATCH PR41488]Recognize more induction variables by simplifying PEELED chrec in scalar evolution
On Mon, Dec 9, 2013 at 12:00 PM, Jeff Law wrote: > On 12/06/13 19:44, Bin.Cheng wrote: >>> >>> Right. Based on reading the archives, it looks like this stuff is/was >>> generated by PRE. I also suspect jump threading can create them. There >>> was >>> talk of throttling PRE to leave things in a form that the IV analysis >>> could >>> more easily digest, but I'm not sure if that was ever completed. >> >> >> Also could just because it is coded in that way, just as the case I >> added in patch. I found real examples from ggc-page.c in GCC. >> But it's always good to cleanup input of an optimization, I presume >> that's why Richard tried to move IVOPT later before. > > Certainly. That possibility was mentioned as well in either 41488 or > elsewhere in my research. > > >>> >>> I assume tree_to_aff_combination_expand is relatively expensive, thus the >>> two approaches, one which avoids tree_to_aff_combination_expand. >> >> Considering the dump of case in the patch: > > [ snip ] > So it's also a case where using the affine stuff gets you a simpler interface > to express the value in terms you may be able to match. Yeah. > > >> >> Another question, is it acceptable to add an parameter for >> tree_to_aff_combination_expand to limit the number of recursive call >> for it? Thus we don't need to expand to leaf node every time. > > If there's a compelling reason to limit the recursion, then I'd think such a > parameter would be reasonable. Not for now. The idea just came up because some recent patches also use this interface to get back-trace information, and it's expensive. Maybe Richard has some comments here too. > > > >> >>> >>> >>> In add_old_iv_candidates, is it the case that the only time >>> SSA_NAME_DEF_STMT (def) is another PHI node is when it's one of these >>> affine >> >> >> I suppose. Actually IVOPT make an assert on IP_ORIGINAL candidates in >> function rewrite_use_nonlinear_expr, like: > > Well, the reason I ask is after your patch we'd be ignoring anything where > SSA_NAME_DEF_STMT (def) is a PHI node and not a PEELED_CHREC. I want to > make sure there aren't other expected cases where SSA_NAME_DEF_STMT (def) is > a PHI that we'd want to call add_candidate_1 for. > > It sounds like there aren't other cases you'd expect to see here where > SSA_NAME_DEF_STMT (def) is a PHI. Yes. > > >> IVOPT adds original cand and tries to keep it the original IV is >> because it does have an updating statement. For IVs picked up by this >> patch, it doesn't have stepping statement, so no need/way to leaving >> it untouched. It will be represented by candidates for IVs from which >> it is peeled. > > Understood. Thanks for clarifying. The patch is fine for the trunk. Thanks very much for reviewing. Since Sebastian hasn't added any comments, I will change the patch as suggested by Richard before, then retest it, and check in the new version if it's OK. Thanks, bin > > jeff -- Best Regards.
RE: [PATCH PR41488]Recognize more induction variables by simplifying PEELED chrec in scalar evolution
> -Original Message- > From: Bin.Cheng [mailto:amker.ch...@gmail.com] > Sent: Monday, December 09, 2013 1:15 PM > To: Jeff Law > Cc: Bin Cheng; gcc-patches List > Subject: Re: [PATCH PR41488]Recognize more induction variables by > simplifying PEELED chrec in scalar evolution > > On Mon, Dec 9, 2013 at 12:00 PM, Jeff Law wrote: > > On 12/06/13 19:44, Bin.Cheng wrote: > >>> > >>> Right. Based on reading the archives, it looks like this stuff > >>> is/was generated by PRE. I also suspect jump threading can create > >>> them. There was talk of throttling PRE to leave things in a form > >>> that the IV analysis could more easily digest, but I'm not sure if > >>> that was ever completed. > >> > >> > >> Also could just because it is coded in that way, just as the case I > >> added in patch. I found real examples from ggc-page.c in GCC. > >> But it's always good to cleanup input of an optimization, I presume > >> that's why Richard tried to move IVOPT later before. > > > > Certainly. That possibility was mentioned as well in either 41488 or > > elsewhere in my research. > > > > > >>> > >>> I assume tree_to_aff_combination_expand is relatively expensive, > >>> thus the two approaches, one which avoids > tree_to_aff_combination_expand. > >> > >> Considering the dump of case in the patch: > > > > [ snip ] > > So it's also a case using the affine stuff gets you get a simpler > > interface to express the value in terms you may be able match. > Yeah. > > > > > > >> > >> Another question, is it acceptable to add an parameter for > >> tree_to_aff_combination_expand to limit the number of recursive call > >> for it? Thus we don't need to expand to leaf node every time. > > > > If there's a compelling reason to limit the recursion, then I'd think > > such a parameter would be reasonable. > Not for now. Just come up this idea given that some recent patches also use > the interface to get back trace information and it's expensive. Maybe > Richard have some comment here too. > > > > > > > > >> > >>> > >>> > >>> In add_old_iv_candidates, is it the case that the only time > >>> SSA_NAME_DEF_STMT (def) is another PHI node is when it's one of > >>> these affine > >> > >> > >> I suppose. Actually IVOPT make an assert on IP_ORIGINAL candidates > >> in function rewrite_use_nonlinear_expr, like: > > > > Well, the reason I ask is after your patch we'd be ignoring anything > > where SSA_NAME_DEF_STMT (def) is a PHI node and not a PEELED_CHREC. > I > > want to make sure there aren't other expected cases where > > SSA_NAME_DEF_STMT (def) is a PHI that we'd want to call > add_candidate_1 for. > > > > It sounds like there aren't other cases you'd expect to see here where > > SSA_NAME_DEF_STMT (defFFF) is a PHI. > Yes. > > > > > > >> IVOPT adds original cand and tries to keep it the original IV is > >> because it does have an updating statement. For IVs picked up by > >> this patch, it doesn't have stepping statement, so no need/way to > >> leaving it untouched. It will be represented by candidates for IVs > >> from which it is peeled. > > > > Understood. Thanks for clarifying. The patch is fine for the trunk. > Thanks very much for reviewing. Since Sebastian hasn't added any > comments, I will change the patch as suggested by Richard before, then > retest it, and check in the new version if it's ok. > Hi, This is the new version patch computing the difference in tree affine then comparing against integer_zero_node. Bootstrap and test on current upstream. I will commit it if there is no other objection. 
Thanks, bin > > > > jeff > > > > -- > Best Regards. Index: gcc/tree-scalar-evolution.c === --- gcc/tree-scalar-evolution.c (revision 205688) +++ gcc/tree-scalar-evolution.c (working copy) @@ -280,6 +280,8 @@ along with GCC; see the file COPYING3. If not see #include "tree-ssa.h" #include "cfgloop.h" #include "tree-chrec.h" +#include "pointer-set.h" +#include "tree-affine.h" #include "tree-scalar-evolution.h" #include "dumpfile.h" #include "params.h" @@ -1380,7 +1382,67 @@ follow_ssa_edge (struct loop *loop, gimple def, gi }
RE: [Ping]Two pending IVOPT patches
> -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Jeff Law > Sent: Tuesday, December 10, 2013 6:31 AM > To: Bin.Cheng > Cc: gcc-patches List; Richard Biener; Zdenek Dvorak > Subject: Re: [Ping]Two pending IVOPT patches > > On 11/26/13 03:52, Bin.Cheng wrote: > > On Tue, Nov 26, 2013 at 6:06 AM, Jeff Law wrote: > >> On 11/25/13 02:11, Bin.Cheng wrote: > >>> > >>> > >>> Slightly tune to make iv cand choosing algorithm more accurate: > >>> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01574.html > >> > >> It would help if you had some sample codes where this patch was > >> useful. I can kind-of see what's going on, but I'm way too > >> unfamiliar with the tree-IV code to review this change. > >> > > Jeff, Thanks to your review. > > As for example, consider below code on ARM/cortex-m4, (the code itself > > may be non-sense): > [ snip ] > > So in iv_ca_narrow you have the following change:

> @@ -5705,13 +5709,15 @@ iv_ca_narrow (struct ivopts_data *data, struct iv_
>        if (!cp)
>          continue;
>
> -      if (!iv_ca_has_deps (ivs, cp))
> -        continue;
>
> -      if (!cheaper_cost_pair (cp, new_cp))
> -        continue;
> -
> -      new_cp = cp;
> +      iv_ca_set_cp (data, ivs, use, cp);
> +      acost = iv_ca_cost (ivs);
> +      iv_ca_set_cp (data, ivs, use, old_cp);
>
> +      if (compare_costs (acost, best_cost) < 0)
> +        {
> +          best_cost = acost;
> +          new_cp = cp;
> +        }
>      }
>  }
>  else

> You set a new cost pair (cp), compute the cost into acost, then restore the old cost pair (old_cp). Then you see if you got a cheaper cost and if so, record the best cost and set new_cp. > > What's the point behind restoring the old cost pair within the loop? > Can't you just restore it outside the loop if nothing cheaper was found?

Yes, I updated the patch as attached. I also skipped the computation for START in the innermost loop. Actually there is another chance to reduce computation, by merging the restore of the old cost pair and the first call to iv_ca_delta_commit if we don't skip the computation for START, but that should be done in another patch.

> > AFAICT, the calls to iv_ca_has_deps aren't there for correctness, they were > there to prune out IVs that were not useful in this loop, correct?

Hmm, kind of. The cost of an IV candidate set consists of several parts: the representation cost in each cost pair, the register-pressure cost arising from dependences on invariant expressions, etc. Here iv_ca_has_deps checks whether the new cost pair depends on an invariant expression that was not involved before; if so, the current algorithm considers the new cost pair to have a higher cost and just skips it. In fact, the new cost pair may give us a lower overall cost even if it introduces a new dependence (similar to the case I gave). That's why I switched to comparing the overall cost instead.

Does this new version of the patch look good to you? I will re-test it if so.

Thanks, bin > > Jeff

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 205798)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -5718,18 +5718,20 @@ iv_ca_extend (struct ivopts_data *data, struct iv_
 }
 
 /* Try narrowing set IVS by removing CAND.  Return the cost of
-   the new set and store the differences in DELTA.  */
+   the new set and store the differences in DELTA.  START is
+   the candidate with which we start narrowing.  */
 
 static comp_cost
 iv_ca_narrow (struct ivopts_data *data, struct iv_ca *ivs,
-	      struct iv_cand *cand, struct iv_ca_delta **delta)
+	      struct iv_cand *cand, struct iv_cand *start,
+	      struct iv_ca_delta **delta)
 {
   unsigned i, ci;
   struct iv_use *use;
   struct cost_pair *old_cp, *new_cp, *cp;
   bitmap_iterator bi;
   struct iv_cand *cnd;
-  comp_cost cost;
+  comp_cost cost, best_cost, acost;
 
   *delta = NULL;
   for (i = 0; i < n_iv_uses (data); i++)
@@ -5740,13 +5742,15 @@ iv_ca_narrow (struct ivopts_data *data, struct iv_
       if (old_cp->cand != cand)
 	continue;
 
-      new_cp = NULL;
+      best_cost = iv_ca_cost (ivs);
+      /* Start narrowing with START.  */
+      new_cp = get_use_iv_cost (data, use, start);
 
       if (data->consider_all_candidates)
 	{
 	  EXECUTE_IF_SET_IN_BITMAP (ivs->cands, 0, ci, bi)
 	    {
-	      if (ci == cand->id)
+	      if (ci == cand->id || (start && ci == start->id))
Re: [PATCH PR41488]Recognize more induction variables by simplifying PEELED chrec in scalar evolution
On Wed, Dec 11, 2013 at 6:50 AM, Jakub Jelinek wrote: > On Tue, Dec 10, 2013 at 10:46:32AM +0800, bin.cheng wrote: >> This is the new version patch computing the difference in tree affine then >> comparing against integer_zero_node. >> Bootstrap and test on current upstream. I will commit it if there is no >> other objection. > > This breaks bootstrap on x86_64-linux for me too, though in a different > pass. > Anyway, the bugs I'm seeing here: > 1) other passes that use a cache for tree-affine computations > don't initialize it by pointer_map_create (), tree-affine.c does that > for you > 2) the cache shouldn't be destroyed by pointer_map_destroy, but > instead by free_affine_expand_cache that will actually make sure > it doesn't leak memory, etc. > 3) the actual issue why this breaks bootstrap is that you've added > the pointer_map_destroy (should be free_affine_expand_cache per 2) ) > to scev_finalize, but scev_initialize is performed at the start > of the loop passes and scev_finalize several passes later, and > in between passes ggc_collect is called. As the cache isn't registered > with GC, if you are unlucky and ggc_collect actually collects > in between scev_initialize and scev_finalize and garbage collects > some trees only mentioned in the affine cache peeled_chrec_map, > things can break completely. > > So, IMHO please revert this patch for now and try to find some > better place for the free_affine_expand_cache (&peeled_chrec_map); > call so that the cache is never non-NULL when ggc_collect may be called. > Alternatively you'd need to either tell GC about it (but the structures > are heap allocated, not GC), or say instruct through some GTY magic > that during collection free_affine_expand_cache (&peeled_chrec_map); > should be called. Sorry for the trouble; I have reverted the patch and will investigate. Thanks, bin > > Jakub -- Best Regards.
Re: [PATCH PR41488]Recognize more induction variables by simplifying PEELED chrec in scalar evolution
On Wed, Dec 11, 2013 at 4:31 AM, H.J. Lu wrote: > On Mon, Dec 9, 2013 at 6:46 PM, bin.cheng wrote: >> >>> >> Hi, >> This is the new version patch computing the difference in tree affine then >> comparing against integer_zero_node. >> Bootstrap and test on current upstream. I will commit it if there is no >> other objection. >> > > This caused: > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59445 I found out the cause and addressed it in the bug entry. Thanks, bin > > -- > H.J. -- Best Regards.
Re: [PATCH PR41488]Recognize more induction variables by simplifying PEELED chrec in scalar evolution
On Wed, Dec 11, 2013 at 6:50 AM, Jakub Jelinek wrote: > On Tue, Dec 10, 2013 at 10:46:32AM +0800, bin.cheng wrote: >> This is the new version patch computing the difference in tree affine then >> comparing against integer_zero_node. >> Bootstrap and test on current upstream. I will commit it if there is no >> other objection. > > This breaks bootstrap on x86_64-linux for me too, though in a different > pass. > Anyway, the bugs I'm seeing here: > 1) other passes that use a cache for tree-affine computations > don't initialize it by pointer_map_create (), tree-affine.c does that > for you > 2) the cache shouldn't be destroyed by pointer_map_destroy, but > instead by free_affine_expand_cache that will actually make sure > it doesn't leak memory, etc. > 3) the actual issue why this breaks bootstrap is that you've added > the pointer_map_destroy (should be free_affine_expand_cache per 2) ) > to scev_finalize, but scev_initialize is performed at the start > of the loop passes and scev_finalize several passes later, and > in between passes ggc_collect is called. As the cache isn't registered > with GC, if you are unlucky and ggc_collect actually collects > in between scev_initialize and scev_finalize and garbage collects > some trees only mentioned in the affine cache peeled_chrec_map, > things can break completely. > > So, IMHO please revert this patch for now and try to find some > better place for the free_affine_expand_cache (&peeled_chrec_map); > call so that the cache is never non-NULL when ggc_collect may be called. > Alternatively you'd need to either tell GC about it (but the structures > are heap allocated, not GC), or say instruct through some GTY magic > that during collection free_affine_expand_cache (&peeled_chrec_map); > should be called. Thanks for helping with the issue; I shouldn't have used the affine facility before fully understanding it. I know little about GC, so when may ggc_collect be called (between two passes)? If so, I have to call free_affine_expand_cache just after the call to tree_to_aff_combination_expand in SCEV, because SCEV is an analyzer and, as you pointed out, the analysis result is used by different optimizers. The pointer map would otherwise stay alive between different passes if it's not freed immediately. Thanks, bin > > Jakub -- Best Regards.
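Concretely, the direction proposed here is to keep the expansion cache emptied whenever control can return to the pass manager: do the two expansions, use the result, and release the cache right away. The fragment below only illustrates that pattern with the names used in this thread (left, step_val and type stand for the trees being compared, as in the sketch earlier); it is not the committed code:

      aff_tree aff1, aff2;
      bool equal_p;

      /* tree-affine.c creates the cache on first use, so no
         pointer_map_create () call is needed here.  */
      tree_to_aff_combination_expand (left, type, &aff1, &peeled_chrec_map);
      tree_to_aff_combination_expand (step_val, type, &aff2, &peeled_chrec_map);
      aff_combination_scale (&aff2, double_int_minus_one);
      aff_combination_add (&aff1, &aff2);
      equal_p = aff_combination_zero_p (&aff1);

      /* Release it immediately with free_affine_expand_cache (not
         pointer_map_destroy): it also frees the heap-allocated entries,
         and it guarantees no stale tree references survive until the
         next ggc_collect between passes.  */
      free_affine_expand_cache (&peeled_chrec_map);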
Re: [PATCH PR41488]Recognize more induction variables by simplifying PEELED chrec in scalar evolution
On Wed, Dec 11, 2013 at 4:17 PM, Jakub Jelinek wrote: > On Tue, Dec 10, 2013 at 11:59:16PM -0700, Jeff Law wrote: >> On 12/10/13 23:35, Bin.Cheng wrote: >> >I know little about GC, so when ggc_collect may be called (between two >> >passes)? If so, I have to call free_affine_expand_cache just after the >> >calling to tree_to_affine_combination_expand in SCEV because it's an >> >analyzer and as you pointed out, the analyzing result is used by >> >different optimizers. The pointer map would be maintained between >> >different passes if it's not instantly freed. >> The garbage collector only runs between passes. If you have roots >> in static storage, then you'll need to decorate them so the garbage >> collector scans them. >> >> There's a fairly extensive discussion of the garbage collector in >> the GCC internals manual. > > It isn't that easy, pointer_map isn't a data structure recognized by the > garbage collector at all (plus, as I've said before, the code is inserting > XNEW allocated structures into the pointer_map, which isn't GC friendly > either, but making them GC allocated might increase garbage unnecessarily, > as all the structs are short lived). > > I've looked through scev_{initialize,finalize} callers, and almost all > passes initialize it and finalize within the same pass, it is only the loop > optimizations: > NEXT_PASS (pass_tree_loop_init); > NEXT_PASS (pass_lim); > NEXT_PASS (pass_copy_prop); > NEXT_PASS (pass_dce_loop); > NEXT_PASS (pass_tree_unswitch); > NEXT_PASS (pass_scev_cprop); > NEXT_PASS (pass_record_bounds); > NEXT_PASS (pass_check_data_deps); > NEXT_PASS (pass_loop_distribution); > NEXT_PASS (pass_copy_prop); > NEXT_PASS (pass_graphite); > PUSH_INSERT_PASSES_WITHIN (pass_graphite) > NEXT_PASS (pass_graphite_transforms); > NEXT_PASS (pass_lim); > NEXT_PASS (pass_copy_prop); > NEXT_PASS (pass_dce_loop); > POP_INSERT_PASSES () > NEXT_PASS (pass_iv_canon); > NEXT_PASS (pass_parallelize_loops); > NEXT_PASS (pass_if_conversion); > /* pass_vectorize must immediately follow pass_if_conversion. > Please do not add any other passes in between. */ > NEXT_PASS (pass_vectorize); > PUSH_INSERT_PASSES_WITHIN (pass_vectorize) > NEXT_PASS (pass_dce_loop); > POP_INSERT_PASSES () > NEXT_PASS (pass_predcom); > NEXT_PASS (pass_complete_unroll); > NEXT_PASS (pass_slp_vectorize); > NEXT_PASS (pass_loop_prefetch); > NEXT_PASS (pass_iv_optimize); > NEXT_PASS (pass_lim); > NEXT_PASS (pass_tree_loop_done); > where scev is live in between the passes, and unfortunately there > is no call that each pass calls that you could easily add the free_affine.* > call to (except for execute_function_todo but you'd need to invent another > TODO_* flag for it and the question is how you'd ensure it will be handled > for the non-loop specific passes that are just run in between these > passes (say pass_copy_prop, would we need pass_copy_prop_loop?). > > Perhaps you could replace in tree-affine.c and all it's users > {,struct} pointer_map_t * with > struct GTY((user)) affine_expand_cache { > pointer_map_t *cache; > }; > and write custom gt_ggc_mx (affine_expand_cache *) function that would > would either pointer_map_traverse the cache and for each cache entry > call gt_ggc_mx on the embedded trees in the structures, or drop the cache > (dunno if the cache is really just a cache and it doesn't matter if it is > dropped any time, or if it can affect code generation). 
> There are also PCH > related functions that need to be defined, though I guess you could just > assert there that the cache is NULL and not walk anything. Thanks to both of you for being patient with this patch. I went through the documentation quickly and realized that I would have to modify the pointer-map structure to make it recognized by GC (maybe more work, as suggested by Jakub). It seems I shouldn't take on that task in this patch at stage 3, so I am thinking of just calling the free_affine* function at the place the cache is created for SCEV. Of course, that makes it more expensive. I just found that the patch also fixes PR58296 in IVOPT, and I do want to include it in 4.9 if we are OK with it. So what's your opinion? Thanks, bin > > Jakub -- Best Regards.
RE: [PATCH PR41488]Recognize more induction variables by simplifying PEELED chrec in scalar evolution
> -Original Message- > From: Richard Biener [mailto:richard.guent...@gmail.com] > Sent: Wednesday, December 11, 2013 6:04 PM > To: Jakub Jelinek > Cc: Bin.Cheng; Jeff Law; Bin Cheng; gcc-patches List > Subject: Re: [PATCH PR41488]Recognize more induction variables by > simplifying PEELED chrec in scalar evolution > > On Wed, Dec 11, 2013 at 9:50 AM, Jakub Jelinek wrote: > > On Wed, Dec 11, 2013 at 04:31:55PM +0800, Bin.Cheng wrote: > >> Thank both of you for being patient on this patch. > >> I went through the documentation quickly and realized that I have to > >> modify pointer-map structure to make it recognized by GC (maybe more > > > > Changing pointer_map at this point is IMHO not appropriate. > > > >> work suggested by Jakub). It seems I shouldn't include that task in > >> this patch at this stage 3, I am thinking just call free_affine* > >> function in the place it is created for SCEV. Of course, that makes > >> it more expensive. > > > > Perhaps you could call free_affine_expand_cache say from > > analyze_scalar_evolution, that is the innermost enclosing exported API > > that indirectly calls your new code, but it seems it is also called > > recursively from analyze_scalar_evolution_1 or functions it calls. > > So maybe you'd need to rename analyze_scalar_evolution to > > analyze_scalar_evolution_2 and adjust some calls to > > analyze_scalar_evolution to the latter in tree-scalar-evolution.c, and > > add analyze_scalar_evolution wrapper that calls > analyze_scalar_evolution_2 and free_affine_expand_cache. > > Whether this would work depends on detailed analysis of the > > tree-scalar-evolution.c callgraph, there are tons of functions, most > > of them static, and the question is if there is a single non-static > > (or at most a few of them) function that cover all calls to your new > > code in the static functions it (indirectly) calls, i.e. if there > > isn't any other external entry point that might call your new code > > without doing free_affine_expand_cache afterwards. > > > > If you can find that, I'd say it would be the safest thing to do. > > But let's see what say Richard thinks about it too. > > I think keeping the SCEV cache live over multiple passes is fragile (we do re- > use SSA names and passes do not appropriately call scev_reset[_htab] in all > cases). > > With fixing that the easiest approach is to associate a affine-expand cache > with the scev info. > > Without that there isn't a very good solution other than discarding the cache > after each expansion at the point you use it. The SCEV cache should > effectively make most of the affine expansion caching moot (but you can try > instrumenting that to verify it). > > That is, I suggest to try freeing the cache right after the second > tree_to_aff_combination_expand. > > Btw, > > + /* Try harder to check if they are equal. */ > + tree_to_aff_combination_expand (left, type, &aff1, &peeled_chrec_map); > + tree_to_aff_combination_expand (step_val, type, &aff2, > + &peeled_chrec_map); aff_combination_scale (&aff2, > + double_int_minus_one); aff_combination_add (&aff1, &aff2); left = > + fold_convert (type, aff_combination_to_tree (&aff1)); > + > + /* Transform (init, {left, right}_LOOP)_LOOP to {init, right}_LOOP > + if "left" equals to "init + right". 
*/ if (operand_equal_p > + (left, integer_zero_node, 0)) > > you don't need aff_combination_to_tree, simply check > > aff1.n == 0 && aff1.offset.is_zero () > > (you may want to add aff_combination_zero_p or even > aff_combination_equal_p) Hi, This is the new version of the patch, with bug 59445 fixed and the free_affine stuff called just after it is used. I also added aff_combination_zero_p as suggested. The change in add_old_iv_candidates is unnecessary now; since the previous version of the patch triggered the assertion, I just leave it there as an additional check. Lastly, apologies that I missed Java in the bootstrap before. Bootstrapped and tested on x86/x86_64; arm is still ongoing. Is it OK if the tests pass? Thanks, bin

2013-12-12  Bin Cheng

	PR tree-optimization/58296
	PR tree-optimization/41488
	* tree-scalar-evolution.c: Include necessary header files.
	(simplify_peeled_chrec): New function.
	(analyze_evolution_in_loop): New static variable.
	Call simplify_peeled_chrec.
	* tree-ssa-loop-ivopts.c (mark_bivs): Don't mark peeled IV as biv.
	(add_old_iv_candidates): Don't add candidate for peeled IV.
	* tree-affine.h (aff_combination_zero_p): New function.

gcc/test
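For reference, the helper Richard asks for is tiny; a sketch of what tree-affine.h gains might look like the following, directly following the check he spells out above (the committed version may differ, for example in how it treats a null argument):

    /* Return true if AFF is a trivially zero affine combination, i.e. it
       has no elements and a zero offset.  Sketch only, based on the
       suggestion quoted above.  */

    static inline bool
    aff_combination_zero_p (aff_tree *aff)
    {
      if (!aff)
        return false;
      return aff->n == 0 && aff->offset.is_zero ();
    }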