On Sun, May 11, 2014 at 2:49 PM, Bin.Cheng <amker.ch...@gmail.com> wrote: > On Thu, May 8, 2014 at 5:08 PM, Bin.Cheng <amker.ch...@gmail.com> wrote: >> On Tue, May 6, 2014 at 6:44 PM, Richard Biener >> <richard.guent...@gmail.com> wrote: >>> On Tue, May 6, 2014 at 10:39 AM, Bin.Cheng <amker.ch...@gmail.com> wrote: >>>> On Fri, Dec 6, 2013 at 6:19 PM, Richard Biener >>>> <richard.guent...@gmail.com> wrote: >> >>>> Hi, >>>> I split the patch into two and updated the test case. >>>> The patches pass bootstrap/tests on x86/x86_64, also pass test on arm >>>> cortex-m3. >>>> Is it OK? >>>> >>>> Thanks, >>>> bin >>>> >>>> PATCH 1: >>>> >>>> 2014-05-06 Bin Cheng <bin.ch...@arm.com> >>>> >>>> * gcc/tree-affine.c (tree_to_aff_combination): Handle MEM_REF for >>>> core part of address expressions. >>> >>> No gcc/ in the changelog >>> >>> Simplify that by using aff_combination_add_cst: >>> >>> + if (TREE_CODE (core) == MEM_REF) >>> + { >>> + aff_combination_add_cst (comb, mem_ref_offset (core)); >>> + core = TREE_OPERAND (core, 0); >>> >>> patch 1 is ok with that change. >> >> Installed with below change because of wide-int merge: >> - core = build_fold_addr_expr (core); >> + if (TREE_CODE (core) == MEM_REF) >> + { >> + aff_combination_add_cst (comb, wi::to_widest (TREE_OPERAND (core, >> 1))); >> + core = TREE_OPERAND (core, 0); >> >>> >>>> PATCH 2: >>>> >>>> 2014-05-06 Bin Cheng <bin.ch...@arm.com> >>>> >>>> * gcc/tree-ssa-loop-ivopts.c (contain_complex_addr_expr): New. >>>> (alloc_iv): Lower base expressions containing ADDR_EXPR. >>> >>> So this "lowers" addresses(?) which are based on &<not-decl>, >>> like &a[0] + 4 but not &a + 4. I question this odd choice. ISTR >> &a+4 is already in its lowered form, what we want to handle is &EXPR + >> 4, in which EXPR is MEM_REF/ARRAY_REF, etc.. >> >>> when originally introducing address lowering (in rev. 204497) I was >>> concerned about the cost? >> Yes, you did. I still think the iv number is relative small for each >> loop, thus the change won't cause compilation time issue considering >> the existing tree<->affine transformation in ivopt. >> I would like to collect more accurate time information for ivopt in >> gcc bootstrap. Should I use "-ftime-report" then add all time slices >> in TV_TREE_LOOP_IVOPTS category? Is there any better solutions? >> Thanks. >> >>> >>> Now I wonder why we bother to convert the lowered form back >>> to trees to store it in ->base and not simply keep (and always >>> compute) the affine expansion only. Thus, change struct iv >>> to have aff_tree base instead of tree base? >>> >>> Can you see what it takes to do such change? At the point >>> we replace uses we go into affine representation again anyway. >> Good idea, I may have a look. > After going through work flow of ivopt, I think it's practical to make > such change and can help compilation time. Ivopt calls > tree_to_aff_combination to convert iv->base into affine_tree whenever > it tries to represent an iv_use by an iv_cand, i.e., N*M times for > computing cost of each iv_use represented by each iv_cand, and M times > for rewriting each iv_use. Here, N and M stands for number of iv_use > and iv_cand. Also strip_offset (which is a recursive function) is > called for iv->base also at complexity of O(NM), so it may be improved > too. > To make a start, it's possible to store both tree and affine_tree of > base in struct iv, and use either of them whenever needed. > > It seems to me there are two potential issues of this change. First, > though affine_tree of base is stored, we can't operate on it directly, > which means we have to copy it before using it.
You'd use it as unchanged operand to some aff_* function to avoid actual copying of course. > Second, ivopt > sometimes computes affine_tree of base in different type other than > TREE_TYPE(base), we may need to do additional calls to > aff_combination_convert to get wanted type. All these will introduce > additional computation and compromise benefit of the change. Sure. > At last, I did some experiments on how many additional calls to > tree_to_aff_combination are introduced by this patch. The data of > bootstrap shows it's less than 3% comparing to calls made by current > implementation of ivopt, which confirms that this patch shouldn't have > issue of compilation time. > > Any comments? I'd see the use of affines as making the algorithmic structure of IVOPTs clearer and more consistent (also with possibly using the _expand variants later to get even more simplifications). I don't want to force you to do this change but I thought it may help further changes (as you seem to be doing a lot of IVOPTs changes lately). So - the patch is ok as-is then, but please consider refactoring IVOPTs code when you do changes. It's current structure is somewhat awkward. Thanks, Richard. > Thanks, > bin > > > -- > Best Regards.