Thank you very much. This was very informative.

Richard Sandiford writes:
> If we have an instruction:
>
>     A: (set (reg Z) (plus (reg X) (const_int 0xdeadbeef)))
>
> we will need to use something like:
>
>        (set (reg Y) (const_int 0xdead0000))
>        (set (reg Y) (ior (reg Y) (const_int 0xbeef)))
>     B: (set (reg Z) (plus (reg X) (reg Y)))
>
> But if A is in a loop, the Y loads can be hoisted, and the cost
> of A is effectively the same as the cost of B. In other words,
> the (il)legitimacy of the constant operand doesn't really matter.

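To check that I read the example correctly, here is a minimal C sketch of it. The helper names (high_part, low_part, sum_a, sum_b) and the 16-bit split are my own illustration, assuming a target that can only load 16 bits of an immediate at a time; they are not taken from GCC or from any backend.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: split a 32-bit constant the way the two
   "set (reg Y)" instructions above do.  */
static uint32_t high_part(uint32_t c) { return c & 0xffff0000u; }
static uint32_t low_part(uint32_t c)  { return c & 0x0000ffffu; }

/* Sanity check: OR-ing the two halves gives back the constant.  */
static void check_split(uint32_t c)
{
  assert((high_part(c) | low_part(c)) == c);
}

/* Form A: the constant is written as a direct operand of the add.  */
static uint32_t sum_a(const uint32_t *x, int n)
{
  uint32_t acc = 0;
  for (int i = 0; i < n; i++)
    acc += x[i] + 0xdeadbeefu;
  return acc;
}

/* Form B: Y is built once, outside the loop (the hoisted loads),
   so the loop body only contains the register-register add.  */
static uint32_t sum_b(const uint32_t *x, int n)
{
  uint32_t y = high_part(0xdeadbeefu);   /* (set (reg Y) 0xdead0000) */
  y |= low_part(0xdeadbeefu);            /* (ior (reg Y) 0xbeef)     */

  uint32_t acc = 0;
  for (int i = 0; i < n; i++)
    acc += x[i] + y;                     /* B: plus (reg X) (reg Y)  */
  return acc;
}
```

With the two Y loads outside the loop, each iteration of sum_b does one add, which is how I understand the claim that A effectively costs the same as B.
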
My guess is that, since A is not a recognizable insn, this is relevant at RTL
expansion. Is this correct?

> In summary, the current costs generally work because:
>
> (a) We _usually_ only apply costs to arbitrary instructions
>     (rather than candidate instruction patterns) before
>     loop optimisation.

I don't think I understand this point. I see that the cost is typically
queried before loop optimization, but I don't understand the distinction
between "arbitrary instructions" and "candidate instruction patterns". Can
you please explain the difference?

> (b) It doesn't matter what we return for invalid candidate
>     instruction patterns, because recog will reject them anyway.
>
> So I suppose my next question is: are you seeing this problem with cse1
> or cse2? The reasoning behind the zero cost might still be valid for
> REG_EQUAL notes in cse1. However, it's probably not right for cse2,
> which runs after loop hoisting.

I am seeing it with both, so at least at cse2 we could do it with this.

> Perhaps we could add some kind of context parameter to rtx_costs
> to choose between the hoisting and non-hoisting cost. As well as
> helping with your case, it could let us use the non-hoisting cost
> before loop optimisation in cases where the insn isn't going to
> go in a loop. The drawback is that we then have to replicate
> even more of the .md file in rtx_costs.
>
> Alternatively, perhaps we could just assume that rtx_costs always
> returns the hoisted cost when optimising for speed, in which case
> I think your alternative solution would be theoretically correct
> (i.e. not a hack ;)).

OK, I think I am going to propose this in the patch then. It might still be
interesting to experiment with providing more context to rtx_costs.

> E.g. suppose we're deciding how to implement an in-loop multiplication.
> We calculate the cost of a multiplication instruction vs. the cost of a
> shift/add sequence, but we don't consider whether any of the backend-specific
> shift/add set-up instructions could be hoisted. This would lead to us
> using multiplication insns in cases where we don't want to.
>
> (This was one of the most common situations in which the zero cost helped.)

I am not sure I understand this. Why would we decide to hoist suboperations
of a multiplication? If it is loop-variant, then even the suboperations are
loop-variant, whereas if it is loop-invariant, then we can hoist the whole
operation. What am I missing?

Adam
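
P.S. To make my last question concrete, here is a small C sketch of how I picture the in-loop multiplication case. The multiplier 10 and the function names are my own assumptions for illustration; nothing here comes from the thread or from a real backend.

```c
#include <stdint.h>

/* Variant 1: one multiplication instruction per iteration.  */
static uint32_t scale_mul(const uint32_t *x, int n)
{
  uint32_t acc = 0;
  for (int i = 0; i < n; i++)
    acc += x[i] * 10u;
  return acc;
}

/* Variant 2: the same product as a shift/add sequence,
   using 10 * x == (x << 3) + (x << 1).  Both shifts depend on
   x[i], so in this plain C form they are loop-variant.  */
static uint32_t scale_shift_add(const uint32_t *x, int n)
{
  uint32_t acc = 0;
  for (int i = 0; i < n; i++)
    acc += (x[i] << 3) + (x[i] << 1);
  return acc;
}
```

In this form every suboperation of the shift/add sequence depends on x[i], which is why I do not yet see what set-up work could be hoisted.
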