On Wed, Nov 4, 2015 at 11:18 AM, Bin Cheng <[email protected]> wrote:
> Hi,
> PR52272 reported a performance regression in spec2006/410.bwaves once GCC is
> prevented from representing address of one memory object using address of
> another memory object. Also as I commented in that PR, we have two possible
> fixes for this:
> 1) Improve how TMR.base is deduced, so that we can represent addr of mem obj
> using another one, while not breaking PR50955.
> 2) Add iv candidates with base object stripped. In this way, we use the
> common base-stripped part to represent all address expressions, in the form
> of [base_1 + common], [base_2 + common], ..., [base_n + common].
>
> In terms of code generation, method 2) is at least as good as 1), actually
> better in my opinion. The problem of 2) is we need to tell when iv
> candidates should be added for the common part and when shouldn't. This
> issue can be generalized and described as: We know IVO tries to add
> candidates by deriving from iv uses. One disadvantage is that candidates
> are derived from iv use independently. It doesn't take common sub
> expression among different iv uses into consideration. As a result,
> candidate for common sub expression is not added, while many useless
> candidates are added.
>
> As a matter of fact, candidate derived from iv use is useful only if it's
> common enough and could be shared among different uses. A candidate is most
> likely useless if it's derived from a single use and could not be shared by
> others. This patch works in this way by firstly recording all kinds
> candidates derived from iv uses, then adding candidates for common ones.
>
> The patch improves 410.bwaves by 3-4% on x86_64. I also saw regression for
> 400.perlbench and small regression for 401.bzip on x86_64, but I can confirm
> they are false alarms caused by align issues.
> For aarch64, fp cases are obviously improved for both spec2000 and spec2006.
> Also the patch causes 2-3% regression for 459.GemsFDTD, which I think is
> another irrelevant issue caused by heuristic candidate selecting algorithm.
> Unfortunately, I don't have fix to it currently.
>
> This patch may add more candidates in some cases, but generally candidates
> number is smaller because we don't need to add useless candidates now.
> Statistic data shows there are quite fewer loops with more than 30
> candidates when building spec2k6 on x86_64 using this patch.
>
> Bootstrap and test on x86_64. I will re-test it against latest trunk on
> AArch64. Is it OK?
+inline bool
+iv_common_cand_hasher::equal (const iv_common_cand *ccand1,
+ const iv_common_cand *ccand2)
+{
+ return ccand1->hash == ccand2->hash
+ && operand_equal_p (ccand1->base, ccand2->base, 0)
+ && operand_equal_p (ccand1->step, ccand2->step, 0)
+ && TYPE_PRECISION (TREE_TYPE (ccand1->base))
+ == TYPE_PRECISION (TREE_TYPE (ccand2->base));
I'm wondering on the TYPE_PRECISION check. a) why is that needed?
and b) what kind of tree is base so that it is safe to inspect TYPE_PRECISION
unconditionally?
+ slot = data->iv_common_cand_tab->find_slot (&ent, INSERT);
+ if (*slot == NULL)
+ {
+ *slot = XNEW (struct iv_common_cand);
allocate from the IV obstack instead? I see we do a lot of heap allocations
in IVOPTs, so we can improve that as followup as well.
We probably should empty the obstack after each processed loop.
Thanks,
Richard.
> Thanks,
> bin
>
> 2015-11-03 Bin Cheng <[email protected]>
>
> PR tree-optimization/52272
> * tree-ssa-loop-ivopts.c (struct iv_common_cand): New struct.
> (struct iv_common_cand_hasher): New struct.
> (iv_common_cand_hasher::hash): New function.
> (iv_common_cand_hasher::equal): New function.
> (struct ivopts_data): New fields, iv_common_cand_tab and
> iv_common_cands.
> (tree_ssa_iv_optimize_init): Initialize above fields.
> (record_common_cand, common_cand_cmp): New functions.
> (add_iv_candidate_derived_from_uses): New function.
> (add_iv_candidate_for_use): Record iv_common_cands derived from
> iv use in hash table, instead of adding candidates directly.
> (add_iv_candidate_for_uses): Call
> add_iv_candidate_derived_from_uses.
> (record_important_candidates): Add important candidates to iv uses'
> related_cands. Always keep related_cands for future use.
> (try_add_cand_for): Use iv uses' related_cands.
> (free_loop_data, tree_ssa_iv_optimize_finalize): Release new fields
> in struct ivopts_data, iv_common_cand_tab and iv_common_cands.