Thanks for the comments. Jeff Law <l...@redhat.com> writes: >> Implementation-wise, the main observation is that most subrtxes are part >> of a single contiguous sequence of "e" fields. E.g. when compiling an >> oldish combine.ii on x86_64-linux-gnu with -O2, we iterate over the >> subrtxes of 7,636,542 rtxes. Of those: > Yea, makes sense. One could possibly carry this a step further and > realize that we could specialize the walkers. ie, rather than indexing > the rtx_code to get the rtx format, then iterate over that to find the > 'e' fields, if we have an PLUS, then damn it, we ought to know how to > walk pretty directly.
One thing I tried was to add STATIC_CONSTANT_P for specific codes to the next () routine. The idea was that if we enter next () after handling a particular code, we could skip directly to the right handling for that code. (I tried REG specifically, since that's such a commonly-handled case and since the leaf case should be the easiest to thread.) It didn't help though because it would rely on the kind of jump threading that only happens once __builtin_constant_p has been removed. I suppose we could put the onus on the users of the iterator to invoke a "handle subrtxes of this code" routine once they know what the code is. That could make things a bit ugly though. E.g.: FOR_EACH_SUBRTX (iter, array, expr, NONCONST) if (GET_CODE (*iter) == VALUE && CSELIB_VAL_PTR (*iter)->uid > minuid) return true; would become: FOR_EACH_SUBRTX (iter, array, expr, NONCONST) if (GET_CODE (*iter) == VALUE) { if (CSELIB_VAL_PTR (*iter)->uid > minuid) return true; iter.code_is (VALUE); } It began to feel like premature optimisation. >> (A) 4,459,135 (58.4%) are leaf rtxes with no "e" or "E" fields, >> (B) 3,133,875 (41.0%) are rtxes with a single block of "e" fields and >> no "E" fields, and >> (C) 43,532 (00.6%) are more complicated. >> >> (A) is really a special case of (B) in which the block has zero length. >> Those are the only two cases that really need to be handled inline. >> The implementation does this by having a mapping from an rtx code to the >> bounds of its "e" sequence, in the form of a start index and count. > Right. So my thought above us just extending this a bit and realizing > that we don't need an index/count for most RTXs. That may (or may not) > be worth the extra pain. Your call whether or not to investigate that > further. Well, the idea was to try to keep the code small enough for inlining to be worthwhile. I don't think we'd want a tree of checks for specific codes because I'd have thought it would introduce extra hard-to-predict branches. E.g. although PLUS and MEM are very common codes, they probably only occur once in most patterns. I suppose we could organise the codes so that all the leaf rtxes come before all the "e"s, which in turn come before all the "ee"s, etc., so that we can just check for code ranges. The problem is that we'll probably then end up with a separate bounds check (for the stack worklist) for each length, so the code would end up bigger and with more branches. Also, I think the codes are currently arranged so that the codes that are likely to appear in top-level patterns are in the same block, and that constants are in the same block, etc., so that jump-table-based cases can be used where profitable. (E.g. SET...TRAP_IF.) Thanks, Richard