Michael Meissner schrieb: > In particular, go to pages 5-7 of my tutorial (chapter 6) where I talk about > scratch registers and allocating a new pseudo in split1 which is now always > run.
Ah great! That's the missing link. The pseudo hatches in split1 from scratch. This makes combine even more powerful an split1 can be seen as expand2 :-) imho, combine could be slightly improved by some minor modifications: I. In the old 3.4.x (private port) I introduced a target hook in combine, just prior to where recog_for_combine gets called. The hook did some canonicalization of rtx and thereby considerably reduced the number of patterns that would have been necessary without that hook. It transformed some unspecs. A second use case of such a hook could look as follows: Imagine a combined pattern that does not match but would be legal if the backend knew that some register contains some specific value like, e.g., non-negative, zero-or-storeflagvalue, combiner has proved that some bits will always be set or always be cleared; that kind of things. The hook could accept the pattern (provided it passes recog_for_combine, in that setup the hook will dispatch to recog_for_combine, so it has the opportunity to canonicalize the pattern or take some decision depending on the insn matched) if backend is allowed so see the proof (pattern ok) or there is no such proof (pattern rejected). In the actual setup you could write more complex patterns to make combiner look deeper into things. That implies writing an insn that has the information explicit in the pattern. Most probably you are going so split the stuff in split1 again. A little bit more explicit: Suppose f(x) is the pattern in question, just kooked up by combine. You have an insn that can do f(x) under the condition that x \in {0,1} but not f(x) generally, and combine doesn't look deeper into things because of some reason. In a case where combine can proof that x will always be 0 or 1 it is legal to accept f(x), otherwise not. The current situation, however, would go like this: In the case where the source happens to be like x &= 1, f(x) and there is a combine bridge that can do f(x & 1) combine might take a try. f(x & 1) matches, and in split1 things get split as x &= 1, f(x). Now you have the knowledge that x is either 0 or 1 and you can replace (or directly split it like) x &=1, g(x) with an insn g(x) that is cheaper than f(x). Note that the first case is more general and need no combine bridge. The only thing that has to be done is replacing f(x) with g(x), which is legal under the assumption x \in {0,1}. II. Suppose insns A, B and C with costs f(A) = f(B) = f(C) = 2. Combine combines A+B and sees costs f(A+B) = 5. This makes combine reject the pattern, not looking deeper. If f(A+B+C) = 2 you either will never see A+B+C, or you have to lie about cost and say f(A+B) = 2. Ok. As combine is myopic (I do not blame combine here, it's complex enough) you go and write the combine bridge A+B. But: In situations where A+B matches but there is no thing like C we see a dis-optimization. split1 could fix that by splitting A+B into A, B again, but that clutters up machine description. Combine could go like this: a) f(A+B) = 5 is more expensive than f(A)+f(B) = 4 b) do the combine A+B c) look deeper d1) there is match A+B+C cost f(A+B+C) = 2 => accept d2) there is match A+B+C cost f(A+B+C) = 12 => costly => reject d3) there is no match A+B+C => A+B ist costly => reject Skimming combine dumps, I see most patterns get rejected because they do not match, just a few insns get rejected because of higher cost. So looking deeper even if the costs are higher may give better code and increasing compile time just a little bit. Case II is similar to tunneling known from physics. Georg