> You seem to be confused. We've known *why* CSE does stuff that GCSE doesn't catch for almost as long as we've had GCSE.
> It's because CSE *doesn't just do CSE*! It does value numbering, and a bunch of other things, which are not really implemented at the RTL level as separate passes.

Well, sure, but most of the benefit of running those is within a basic block. I don't see why the combination of a global just-CSE with the current intra-block code wouldn't be effective.

> Also, the viewpoint that absolutely everything CSE currently does needs to be done in order to remove CSE is wrong.

I'm not talking about removing CSE. Indeed, the part of CSE that chooses the best operand from a cost point of view likely needs to stay forever. I was just talking about removing the following of jumps.

> The correct viewpoint is "we shouldn't remove CSE until every *profitable* transformation it makes is subsumed by something else".

And, as I understand it, the claim is that this is not yet true for the following of jumps, and my question is why.
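For anyone following along, here is a minimal sketch of what "value numbering within a basic block" means, using classic local value numbering. This is not GCC's RTL-level CSE code. The tuple instruction format, the `local_value_numbering` function and the `"copy"` opcode are hypothetical, chosen only to illustrate the idea. Because the pass only ever sees one block, it cannot catch a redundancy whose first computation sits across a branch. That cross-block case is what a global pass, or CSE's following of jumps, would have to pick up.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical three-address form: (dest, op, lhs, rhs).
# A "copy" instruction uses only lhs; rhs is ignored.
Instr = Tuple[str, str, str, str]

COMMUTATIVE = {"+", "*"}


def local_value_numbering(block: List[Instr]) -> List[Instr]:
    """Turn recomputations of an already-available value into copies.

    Works on a single basic block only; no control flow is considered.
    Keeps one holder variable per value, so it may miss some reuse.
    """
    var_vn: Dict[str, int] = {}                    # variable -> value number it holds now
    expr_vn: Dict[Tuple[str, int, int], int] = {}  # (op, vn, vn) -> value number
    holder: Dict[int, str] = {}                    # value number -> variable that held it
    counter = 0

    def fresh() -> int:
        nonlocal counter
        counter += 1
        return counter

    def vn_of(var: str) -> int:
        # Variables live on entry to the block get a fresh value number.
        if var not in var_vn:
            var_vn[var] = fresh()
            holder[var_vn[var]] = var
        return var_vn[var]

    def available(vn: int) -> Optional[str]:
        # The recorded holder is usable only if it was not overwritten since.
        var = holder.get(vn)
        if var is not None and var_vn.get(var) == vn:
            return var
        return None

    out: List[Instr] = []
    for dest, op, lhs, rhs in block:
        if op == "copy":
            vn = vn_of(lhs)
            out.append((dest, op, lhs, rhs))
        else:
            a, b = vn_of(lhs), vn_of(rhs)
            if op in COMMUTATIVE and b < a:
                a, b = b, a  # canonical order so a+b and b+a get the same key
            key = (op, a, b)
            vn = expr_vn.get(key)
            src = available(vn) if vn is not None else None
            if src is not None:
                out.append((dest, "copy", src, ""))  # redundant: reuse the value
            else:
                if vn is None:
                    vn = fresh()
                    expr_vn[key] = vn
                out.append((dest, op, lhs, rhs))
        var_vn[dest] = vn
        if available(vn) is None:
            holder[vn] = dest
    return out
```

For example, in the block `t1 = a + b; t2 = b + a; t3 = t1 * c; a = c * c; t4 = a + b`, the second addition becomes `t2 = t1`, while `t4 = a + b` is kept because `a` was redefined in between.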