On Thu, May 14, 2015 at 1:11 PM, Bin.Cheng <amker.ch...@gmail.com> wrote: > On Wed, May 13, 2015 at 7:38 PM, Richard Biener > <richard.guent...@gmail.com> wrote: >> On Fri, May 8, 2015 at 12:47 PM, Bin Cheng <bin.ch...@arm.com> wrote: >>> Hi, >>> GCC's IVO currently handles every IV use independently, which is not right >>> by learning from cases reported in PR65447. >>> >>> The rationale is: >>> 1) Lots of address type IVs refer to the same memory object, share similar >>> base and have same step. We should handle these IVs as a group in order to >>> maximize CSE opportunities, prefer reg+offset addressing mode. >>> 2) GCC's IVO algorithm is expensive and only is run when candidate set is >>> small enough. By grouping same family uses, we can decrease the number of >>> both uses and candidates. Before this patch, number of candidates for >>> PR65447 is too big to run expensive IVO algorithm, resulting in bad assembly >>> code on targets like AArch64 and Mips. >>> 3) Even for cases the assembly code isn't improved, we can still get >>> compilation time benefit with this patch. >>> 4) This is a prerequisite for enabling auto-increment support in IVO on >>> AArch64. >>> >>> For now, this is only done to address type IVs, in the future I may extend >>> it to general IVs too. >>> >>> For AArch64: >>> Benchmarks 470.lbm/spec2k6 and 173.applu/spec2k are improved obviously by >>> this patch. A couple of cases from spec2k/fp appear regressed. I looked >>> into generated assembly code and can confirm the regression is false alarm >>> except one case (189.lucas). For that case, I think it's another issue >>> exposed by this patch (GCC failed to CSE candidate setup code, resulting in >>> bloated loop header). Anyway, I also fined tuned the patch to minimize the >>> impact. >>> >>> For AArch32, this patch seems to be able to improve spec2kfp too, but I >>> didn't look deep into it. I guess the reason is it can make life for >>> auto-increment support in IVO better. >>> >>> One of defects of this patch is computation of max offset in >>> compute_max_addr_offset is basically borrowed from get_address_cost. The >>> comment says we should find a better way to compute all information. People >>> also complained we need to refactor that part of code. I don't have good >>> solution to that yet, though I did try best to keep compute_max_addr_offset >>> simple. >>> >>> I believe this is a generally wanted change, bootstrap and test on x86_64 >>> and AArch64, so is it ok? >> >> I'm a little bit worried about the linked list of sub-uses and the "sorting" >> (that's quadratic). A little. I don't have any good idea but to use a >> tree... >> We don't seem to limit the number of sub-uses (if we'd do that it would >> become O(1)). >> >> Similar is searching in the list of uses for a group with same base/step >> (but ISTR IVOPTs has multiple similar loops?) > > Hi Richard, > Thanks for reviewing. Instead of tree, I can also keep the linked > list, then quick sort it by using a vector as temporary storage. This > can avoid the complexity of BST operations without non-trivial > overload.
Sounds good if constructing & querying are not done at the same time. > For the searching routine, a local hash table could help. >> >> Overall the patch looks like a good improvement to how we do IVO, so I think >> it is ok as-is. > Do you want me to do this change now, or I can pick it up later when > dealing with compilation time issue? Also it would be nice if any > compilation time issue will be reported after applying this version > patch. Because we will have a IVO compilation time benchmark then. You can pick it up later. Richard. > Thanks, > bin > >> >> Thanks, >> Richard. >> >> >>> >>> 2015-05-08 Bin Cheng <bin.ch...@arm.com> >>> >>> PR tree-optimization/65447 >>> * tree-ssa-loop-ivopts.c (struct iv_use): New fields. >>> (dump_use, dump_uses): Support to dump sub use. >>> (record_use): New parameters to support sub use. Remove call to >>> dump_use. >>> (record_sub_use, record_group_use): New functions. >>> (compute_max_addr_offset, split_all_small_groups): New functions. >>> (group_address_uses, rewrite_use_address): New functions. >>> (strip_offset): New declaration. >>> (find_interesting_uses_address): Call record_group_use. >>> (add_candidate): New assertion. >>> (infinite_cost_p): Move definition forward. >>> (add_costs): Check INFTY cost and return immediately. >>> (get_computation_cost_at): Clear setup cost and dependent bitmap >>> for sub uses. >>> (determine_use_iv_cost_address): Compute cost for sub uses. >>> (rewrite_use_address_1): Rename from old rewrite_use_address. >>> (free_loop_data): Free sub uses. >>> (tree_ssa_iv_optimize_loop): Call group_address_uses. >>> >>> gcc/testsuite/ChangeLog >>> 2015-05-08 Bin Cheng <bin.ch...@arm.com> >>> >>> PR tree-optimization/65447 >>> * gcc.dg/tree-ssa/pr65447.c: New test.