Hello!

>> I also wonder if compare-elim ought to be helping here.  Isn't that the
>> point here, to eliminate the comparison and instead get it for free as
>> part of the arithmetic?  If so, is it the fact that we have memory
>> references that prevents compare-elim from kicking in?
>
> Yes, compare-elim doesn't work with memory references but, more radically, it
> is not enabled for x86 (it is only enabled for aarch64, mn10300 and rx).

I did experiment a bit with a compare-elim pass on x86. However, as rth
said in [1]:

--quote--
If we want to use this pass for x86, then for 4.8 we should also fix the
discrepancy between the compare-elim canonical

  [(operate)
   (set-cc)]

and the combine canonical

  [(set-cc)
   (operate)]

(Because of the simplicity of the substitution in compare-elim, I prefer
the former as the canonical canonical.)
--/quote--

There were some patches floating around [2], [3] that enhanced the
compare-elim pass for x86 needs, but the target never switched to the new
pass, mostly because the compare-elim pass did not catch all the cases
that the traditional RTX combine pass did. However, the compare-elim pass
can cross BB boundaries, where the traditional RTX combine pass doesn't
(and IIRC it even has a comment explaining why it doesn't try too hard to
do so).

The reason why x86 doesn't use both passes is simply the discrepancy
quoted above. The compare-elim pass substitutes the clobber in the
PARALLEL RTX with a new set-cc in place, so all relevant patterns in
i386.md (and a couple of support functions in i386.c) would have to be
swapped around. Unfortunately, simply changing the i386.md insn patterns
would disable existing RTX combiner functionality, leading to various
missed-optimization regressions.

Due to the above, I would like to propose that the existing RTX combine
pass be updated to handle [(operate)(set-cc)] patterns (exclusively?).
From my experience, the post-reload compare-elim pass would catch a bunch
of the remaining cross-BB opportunities left by the RTX combine pass, so
the compare-elim pass would be effective on x86 even after the RTX
combiner has done its job. While the target-dependent changes would be
fairly trivial, I don't know how much work would be needed in combine.c
to handle the new canonical patterns. Maybe an RTL maintainer can chime
in (hint, hint, wink wink ;)

There is also a hidden benefit for the "other", compare-elim-only
targets. Having this pass enabled on a wildly popular target would help
catch any bugs in the pass.

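To make the ordering issue above concrete, here is a simplified RTL
sketch of the two canonical forms. It is an illustration only: the
register numbers, the symbolic "flags" register and the SI/CCZ modes are
assumptions picked for readability, not patterns copied from i386.md or
from either pass.

```
;; Before: an arithmetic insn that clobbers the flags, followed by a
;; separate compare of its result against zero.
(parallel [(set (reg:SI 0) (plus:SI (reg:SI 1) (reg:SI 2)))
           (clobber (reg:CC flags))])
(set (reg:CCZ flags) (compare:CCZ (reg:SI 0) (const_int 0)))

;; compare-elim canonical form, [(operate) (set-cc)]: the clobber is
;; replaced in place by the flags set and the compare insn goes away.
(parallel [(set (reg:SI 0) (plus:SI (reg:SI 1) (reg:SI 2)))
           (set (reg:CCZ flags)
                (compare:CCZ (plus:SI (reg:SI 1) (reg:SI 2))
                             (const_int 0)))])

;; combine canonical form, [(set-cc) (operate)]: the same two sets in
;; the opposite order.
(parallel [(set (reg:CCZ flags)
                (compare:CCZ (plus:SI (reg:SI 1) (reg:SI 2))
                             (const_int 0)))
           (set (reg:SI 0) (plus:SI (reg:SI 1) (reg:SI 2)))])
```

The two combined forms describe the same operation and differ only in
the order of the elements inside the PARALLEL, which is why a pattern
written for one pass is not matched by the other.
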
[1] https://gcc.gnu.org/ml/gcc-patches/2012-02/msg00251.html
[2] https://gcc.gnu.org/ml/gcc-patches/2012-02/msg00466.html
[3] https://gcc.gnu.org/ml/gcc-patches/2012-04/msg01487.html

Uros.