On 07/21/11 17:42, Richard Sandiford wrote: > At the moment, auto-inc-dec.c only considers pairs of instructions, > so it can't optimise this kind of sequence. The attached patch is a > WIP (but almost complete) attempt to handle longer sequences too.
So, I promised to look at it, I guess I better do so before disappearing on vacation. It's a lot to take in; so far I think I get the overall structure and some of the details. On the whole I think it looks good. I've tried to come up with such algorithms in the past, but I think I always tried too hard to find "optimal" solutions. The algorithm here looks reasonable. > /* When optimizing for speed, don't introduce dependencies between > memory references in the chain and memory references outside of it, > since doing so would limit scheduling. If the chain is truly > worthwhile, we can still try again using a new pseudo register. */ Now, it's good that there is at least some consideration for this, but I wonder whether it's possible to do better here. The basic principle is that if something is achievable without autoinc, in the same number of insns (or a factor 1.x thereof with x small), then it's preferrable not to use autoinc at all. For example, > ... = *r1 ... = *rN++ > ... = *(r1 + 4) ... = *rN > *(r1 + 4) = ... *rN++ = ... > r2 = r1 + 8 r2 = rN > ... = *r2 ---> ... = rN++ > r3 = r2 + 4 r3 = rN > ... = *r3 ... = rN++ > r4 = r1 + 16 r4 = rN > ... = *r4 ... = rN On many targets, this would best be handled as a series of (reg + offset) addressing modes. If an update is required, on targets that support both (reg + offset) and {pre,post}_modify, it would be good to have an automodify in just one of the addresses; if a single add insn is required this may still be better if the number of memory references is large enough. Do you think this is something that can be done in the context of this pass? Minor details: I'm not sure all the code is completely safe if a hardreg is used with a mix of single- and multi-word accesses. Maybe just mark a hardreg that occurs in a multi-word mode as unavailable for the optimization? It would be nice not to have semi-identical pairs of functions like other_uses_before and other_uses_after or valid_mem_add_pair and valid_add_mem_pair. Can these be merged sensibly? Just to make sure, my understanding is that this will also generate (automod (reg1) (plus (reg1) (reg2))). Right? > /* Represents an offset that is applied to a valno base. */ > union valno_offset { > HOST_WIDE_INT constant; > struct valno_def *valno; > }; Should mention that this is used as splay tree key? For some reason I find it slightly scary to use a union as a key; I see that things like valno_splay have a constant_p parameter - maybe you want two separate trees instead? > /* Create a new entry with key KEY in ROOT's derived tree. */ > > static struct valno_def * > new_splay_valno (bool constant_p, struct valno_def *base, > union valno_offset *offset) Comment doesn't match definition. > Using rtx costs does make things worse on some targets, which I think > is due to dubious MEM costs. m68k is a particularly bad case, because > for -Os, it uses byte counts rather than COSTS_N_INSNS. Let the m68k maintainers worry about that particular problem. The basic principle must be to have the optimizers use costs reasonably and expect targets to define them reasonably. Bernd