Re: RFC: Faster for_each_rtx-like iterators

Richard Sandiford Sat, 17 May 2014 00:34:24 -0700

Thanks for the comments.

Jeff Law <l...@redhat.com> writes:
>> Implementation-wise, the main observation is that most subrtxes are part
>> of a single contiguous sequence of "e" fields.  E.g. when compiling an
>> oldish combine.ii on x86_64-linux-gnu with -O2, we iterate over the
>> subrtxes of 7,636,542 rtxes.  Of those:
> Yea, makes sense.  One could possibly carry this a step further and 
> realize that we could specialize the walkers.  ie, rather than indexing 
> the rtx_code to get the rtx format, then iterate over that to find the 
> 'e' fields, if we have an PLUS, then damn it, we ought to know how to 
> walk pretty directly.


One thing I tried was to add STATIC_CONSTANT_P for specific codes to the
next () routine.  The idea was that if we enter next () after handling
a particular code, we could skip directly to the right handling for that
code.  (I tried REG specifically, since that's such a commonly-handled
case and since the leaf case should be the easiest to thread.)  It didn't
help though because it would rely on the kind of jump threading that
only happens once __builtin_constant_p has been removed.

I suppose we could put the onus on the users of the iterator to invoke
a "handle subrtxes of this code" routine once they know what the code is.
That could make things a bit ugly though.  E.g.:

  FOR_EACH_SUBRTX (iter, array, expr, NONCONST)
    if (GET_CODE (*iter) == VALUE && CSELIB_VAL_PTR (*iter)->uid > minuid)
      return true;

would become:

  FOR_EACH_SUBRTX (iter, array, expr, NONCONST)
    if (GET_CODE (*iter) == VALUE)
      {
        if (CSELIB_VAL_PTR (*iter)->uid > minuid)
          return true;
        iter.code_is (VALUE);
      }

It began to feel like premature optimisation.

>> (A) 4,459,135 (58.4%) are leaf rtxes with no "e" or "E" fields,
>> (B) 3,133,875 (41.0%) are rtxes with a single block of "e" fields and
>>                        no "E" fields, and
>> (C)    43,532 (00.6%) are more complicated.
>>
>> (A) is really a special case of (B) in which the block has zero length.
>> Those are the only two cases that really need to be handled inline.
>> The implementation does this by having a mapping from an rtx code to the
>> bounds of its "e" sequence, in the form of a start index and count.
> Right.  So my thought above us just extending this a bit and realizing 
> that we don't need an index/count for most RTXs.  That may (or may not) 
> be worth the extra pain.  Your call whether or not to investigate that 
> further.

Well, the idea was to try to keep the code small enough for inlining
to be worthwhile.  I don't think we'd want a tree of checks for specific
codes because I'd have thought it would introduce extra hard-to-predict
branches.  E.g. although PLUS and MEM are very common codes, they probably 
only occur once in most patterns.

I suppose we could organise the codes so that all the leaf rtxes come
before all the "e"s, which in turn come before all the "ee"s, etc.,
so that we can just check for code ranges.  The problem is that we'll
probably then end up with a separate bounds check (for the stack
worklist) for each length, so the code would end up bigger and
with more branches.  Also, I think the codes are currently arranged
so that the codes that are likely to appear in top-level patterns
are in the same block, and that constants are in the same block, etc.,
so that jump-table-based cases can be used where profitable.
(E.g. SET...TRAP_IF.)

Thanks,
Richard

Re: RFC: Faster for_each_rtx-like iterators

Reply via email to