On Fri, Nov 6, 2015 at 9:24 PM, Richard Biener
<[email protected]> wrote:
> On Wed, Nov 4, 2015 at 11:18 AM, Bin Cheng <[email protected]> wrote:
>> Hi,
>> PR52272 reported a performance regression in spec2006/410.bwaves once GCC is
>> prevented from representing address of one memory object using address of
>> another memory object. Also as I commented in that PR, we have two possible
>> fixes for this:
>> 1) Improve how TMR.base is deduced, so that we can represent addr of mem obj
>> using another one, while not breaking PR50955.
>> 2) Add iv candidates with base object stripped. In this way, we use the
>> common base-stripped part to represent all address expressions, in the form
>> of [base_1 + common], [base_2 + common], ..., [base_n + common].
>>
>> In terms of code generation, method 2) is at least as good as 1), actually
>> better in my opinion. The problem of 2) is we need to tell when iv
>> candidates should be added for the common part and when shouldn't. This
>> issue can be generalized and described as: We know IVO tries to add
>> candidates by deriving from iv uses. One disadvantage is that candidates
>> are derived from iv use independently. It doesn't take common sub
>> expression among different iv uses into consideration. As a result,
>> candidate for common sub expression is not added, while many useless
>> candidates are added.
>>
>> As a matter of fact, candidate derived from iv use is useful only if it's
>> common enough and could be shared among different uses. A candidate is most
>> likely useless if it's derived from a single use and could not be shared by
>> others. This patch works in this way by firstly recording all kinds
>> candidates derived from iv uses, then adding candidates for common ones.
>>
>> The patch improves 410.bwaves by 3-4% on x86_64. I also saw regression for
>> 400.perlbench and small regression for 401.bzip on x86_64, but I can confirm
>> they are false alarms caused by align issues.
>> For aarch64, fp cases are obviously improved for both spec2000 and spec2006.
>> Also the patch causes 2-3% regression for 459.GemsFDTD, which I think is
>> another irrelevant issue caused by heuristic candidate selecting algorithm.
>> Unfortunately, I don't have fix to it currently.
>>
>> This patch may add more candidates in some cases, but generally candidates
>> number is smaller because we don't need to add useless candidates now.
>> Statistic data shows there are quite fewer loops with more than 30
>> candidates when building spec2k6 on x86_64 using this patch.
>>
>> Bootstrap and test on x86_64. I will re-test it against latest trunk on
>> AArch64. Is it OK?
>
> +inline bool
> +iv_common_cand_hasher::equal (const iv_common_cand *ccand1,
> + const iv_common_cand *ccand2)
> +{
> + return ccand1->hash == ccand2->hash
> + && operand_equal_p (ccand1->base, ccand2->base, 0)
> + && operand_equal_p (ccand1->step, ccand2->step, 0)
> + && TYPE_PRECISION (TREE_TYPE (ccand1->base))
> + == TYPE_PRECISION (TREE_TYPE (ccand2->base));
>
Hi Richard,
Thanks for reviewing.
> I'm wondering on the TYPE_PRECISION check. a) why is that needed?
Because operand_equal_p doesn't check type precision for constant int
nodes, and IVO needs to take precision into consideration.
> and b) what kind of tree is base so that it is safe to inspect TYPE_PRECISION
> unconditionally?
Both SCEV and IVO work on expressions with type satisfying
POINTER_TYPE_P or INTEGRAL_TYPE_P, so it's safe to access precision
unconditionally?
>
> + slot = data->iv_common_cand_tab->find_slot (&ent, INSERT);
> + if (*slot == NULL)
> + {
> + *slot = XNEW (struct iv_common_cand);
>
> allocate from the IV obstack instead? I see we do a lot of heap allocations
> in IVOPTs, so we can improve that as followup as well.
>
Yes, small structures in IVO like iv, iv_use, iv_cand, iv_common_cand
are better to be allocated in obstack. Actually I have already make
that change to struct iv. others will be followup too.
Thanks,
bin
> We probably should empty the obstack after each processed loop.
>
> Thanks,
> Richard.
>
>
>> Thanks,
>> bin
>>
>> 2015-11-03 Bin Cheng <[email protected]>
>>
>> PR tree-optimization/52272
>> * tree-ssa-loop-ivopts.c (struct iv_common_cand): New struct.
>> (struct iv_common_cand_hasher): New struct.
>> (iv_common_cand_hasher::hash): New function.
>> (iv_common_cand_hasher::equal): New function.
>> (struct ivopts_data): New fields, iv_common_cand_tab and
>> iv_common_cands.
>> (tree_ssa_iv_optimize_init): Initialize above fields.
>> (record_common_cand, common_cand_cmp): New functions.
>> (add_iv_candidate_derived_from_uses): New function.
>> (add_iv_candidate_for_use): Record iv_common_cands derived from
>> iv use in hash table, instead of adding candidates directly.
>> (add_iv_candidate_for_uses): Call
>> add_iv_candidate_derived_from_uses.
>> (record_important_candidates): Add important candidates to iv uses'
>> related_cands. Always keep related_cands for future use.
>> (try_add_cand_for): Use iv uses' related_cands.
>> (free_loop_data, tree_ssa_iv_optimize_finalize): Release new fields
>> in struct ivopts_data, iv_common_cand_tab and iv_common_cands.