On 07/21/11 17:42, Richard Sandiford wrote:

> At the moment, auto-inc-dec.c only considers pairs of instructions,
> so it can't optimise this kind of sequence.  The attached patch is a
> WIP (but almost complete) attempt to handle longer sequences too.

So, I promised to look at it, I guess I better do so before disappearing
on vacation. It's a lot to take in; so far I think I get the overall
structure and some of the details.

On the whole I think it looks good. I've tried to come up with such
algorithms in the past, but I think I always tried too hard to find
"optimal" solutions. The algorithm here looks reasonable.

>       /* When optimizing for speed, don't introduce dependencies between
>          memory references in the chain and memory references outside of it,
>          since doing so would limit scheduling.  If the chain is truly
>          worthwhile, we can still try again using a new pseudo register.  */

Now, it's good that there is at least some consideration for this, but I
wonder whether it's possible to do better here. The basic principle is
that if something is achievable without autoinc, in the same number of
insns (or a factor 1.x thereof with x small), then it's preferrable not
to use autoinc at all.

For example,

>            ... = *r1                ... = *rN++
>            ... = *(r1 + 4)          ... = *rN
>      *(r1 + 4) = ...              *rN++ = ...
>             r2 = r1 + 8              r2 = rN
>            ... = *r2          --->  ... = rN++
>             r3 = r2 + 4              r3 = rN
>            ... = *r3                ... = rN++
>             r4 = r1 + 16             r4 = rN
>            ... = *r4                ... = rN

On many targets, this would best be handled as a series of (reg +
offset) addressing modes. If an update is required, on targets that
support both (reg + offset) and {pre,post}_modify, it would be good to
have an automodify in just one of the addresses; if a single add insn is
required this may still be better if the number of memory references is
large enough. Do you think this is something that can be done in the
context of this pass?

Minor details:

I'm not sure all the code is completely safe if a hardreg is used with a
mix of single- and multi-word accesses. Maybe just mark a hardreg that
occurs in a multi-word mode as unavailable for the optimization?

It would be nice not to have semi-identical pairs of functions like
other_uses_before and other_uses_after or valid_mem_add_pair and
valid_add_mem_pair. Can these be merged sensibly?

Just to make sure, my understanding is that this will also generate
(automod (reg1) (plus (reg1) (reg2))). Right?

> /* Represents an offset that is applied to a valno base.  */
> union valno_offset {
>   HOST_WIDE_INT constant;
>   struct valno_def *valno;
> };

Should mention that this is used as splay tree key? For some reason I
find it slightly scary to use a union as a key; I see that things like
valno_splay have a constant_p parameter - maybe you want two separate
trees instead?

> /* Create a new entry with key KEY in ROOT's derived tree.  */
> 
> static struct valno_def *
> new_splay_valno (bool constant_p, struct valno_def *base,
>                  union valno_offset *offset)

Comment doesn't match definition.

> Using rtx costs does make things worse on some targets, which I think
> is due to dubious MEM costs.  m68k is a particularly bad case, because
> for -Os, it uses byte counts rather than COSTS_N_INSNS.

Let the m68k maintainers worry about that particular problem. The basic
principle must be to have the optimizers use costs reasonably and expect
targets to define them reasonably.


Bernd

Reply via email to