On Fri, May 9, 2014 at 8:18 AM, Jeff Law <l...@redhat.com> wrote:
> On 05/07/14 14:52, Richard Sandiford wrote:
>>
>> I noticed for_each_rtx showing up in profiles and thought I'd have a go
>> at using worklist-based iterators instead.  So far I have three:
>>
>>    FOR_EACH_SUBRTX: iterates over const_rtx subrtxes of a const_rtx
>>    FOR_EACH_SUBRTX_VAR: iterates over rtx subrtxes of an rtx
>>    FOR_EACH_SUBRTX_PTR: iterates over subrtx pointers of an rtx *
>>
>> with FOR_EACH_SUBRTX_PTR being the direct for_each_rtx replacement.
>
> Yea, for_each_rtx is a bit of a workhorse, so I'm not surprised it's showing
> up on some profiles.
>
>
>
>
>> I've locally replaced all for_each_rtx calls in the generic code with
>> these iterators and they make things reproducably faster.  The speed-up
>> on full --enable-checking=release ./cc1 and ./cc1plus times is only about
>> 1%,
>> but maybe that's enough to justify the churn.
>
> I'd consider that enough to justify the churn, especially if it's an
> improvement we're seeing consistently rather than just something that's
> helping pathological cases.
>
>
>
>>
>> Implementation-wise, the main observation is that most subrtxes are part
>> of a single contiguous sequence of "e" fields.  E.g. when compiling an
>> oldish combine.ii on x86_64-linux-gnu with -O2, we iterate over the
>> subrtxes of 7,636,542 rtxes.  Of those:
>
> Yea, makes sense.  One could possibly carry this a step further and realize
> that we could specialize the walkers.  ie, rather than indexing the rtx_code
> to get the rtx format, then iterate over that to find the 'e' fields, if we
> have an PLUS, then damn it, we ought to know how to walk pretty directly.
>
>
>>
>> (A) 4,459,135 (58.4%) are leaf rtxes with no "e" or "E" fields,
>> (B) 3,133,875 (41.0%) are rtxes with a single block of "e" fields and
>>                        no "E" fields, and
>> (C)    43,532 (00.6%) are more complicated.
>>
>> (A) is really a special case of (B) in which the block has zero length.
>> Those are the only two cases that really need to be handled inline.
>> The implementation does this by having a mapping from an rtx code to the
>> bounds of its "e" sequence, in the form of a start index and count.
>
> Right.  So my thought above us just extending this a bit and realizing that
> we don't need an index/count for most RTXs.  That may (or may not) be worth
> the extra pain.  Your call whether or not to investigate that further.
>
>
>
>>
>> Does this look like it's worth pursuing?  If so, is it OK to start
>> submitting now or should it wait until 4.9.1 is out?  If it is OK,
>> I plan to convert the ports too and get rid of for_each_rtx.
>
> I think it's definitely worth pursuing.  I suspect the release managers
> would definitely want to wait for 4.9.1 :-)

It's not a too big change so no need to wait.

Thanks,
Richard.

> jeff
>

Reply via email to