> Is it possible that the virtual pass causes inefficiencies in some
> cases by sticking with r8-r31 when one of the 'normal' registers
> would be better?

That's not a fair question to ask, since the virtual pass can *only*
use r8-r31.  The first bank has to be left alone, otherwise the
devirtualizer becomes a few orders of magnitude harder, if not
impossible, to make work correctly.

> In some cases, the normal optimization steps remove a lot, if not all, 
> of the unnecessary register passing, but not always.

I've found that "removing unneeded moves through registers" is
something gcc does poorly in the post-reload optimizers.  I've written
my own on some occasions (for rl78 too).  Perhaps this is a good
starting point to look at?

> much needless copying, which strengthens my suspicion that it's 
> something in the RL78 backend that needs 'tweaking'.

Of course it is; I think I've said that before.  The RL78 uses a
virtual model until reload, then converts each virtual instruction
into multiple real instructions, then optimizes the result.  This is
going to be worse than if the real model had been used throughout
(like arm or x86), but in this case, the real model *can't* be used
throughout, because gcc can't understand it well enough to get through
regalloc and reload.  The RL78 is just too "weird" to be modelled
as-is.
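
To make the cost of that conversion concrete, here is a rough toy
sketch in Python.  It is not the real devirtualizer: the function name
devirtualize and the load/operate/store expansion pattern are
assumptions chosen only for illustration.  It shows how expanding each
virtual instruction on its own leaves a store followed immediately by a
reload of the same register.

```python
# Toy model only.  The expansion pattern below is an assumption made
# for illustration; it is not gcc's actual RL78 devirtualizer.

def devirtualize(virtual_insns):
    """Expand each virtual (op, dest, src) into accumulator-based steps."""
    real = []
    for op, dst, src in virtual_insns:
        real.append(("movw", "ax", dst))   # load the virtual register
        real.append((op, "ax", src))       # operate in the accumulator
        real.append(("movw", dst, "ax"))   # store the result back
    return real

# Two virtual adds into the same virtual register.
virtual = [("addw", "r8", "r10"), ("addw", "r8", "r12")]
real = devirtualize(virtual)

for insn in real:
    print("%s %s,%s" % insn)
```

In the printed sequence, the third and fourth instructions store ax to
r8 and then load r8 straight back into ax.  That pair is the kind of
copy a later pass has to clean up.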

I keep hoping that gcc's own post-reload optimizers would do a better
job, though.  Combine should be able to combine, for example, the "mov
r8,ax; cmp r8,#4" types of insns together.
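
As a rough sketch of the kind of folding I mean, here is a toy peephole
in Python.  Everything in it is hypothetical: the tuple representation,
the helper names (reads, writes, dead_after, fold_copies) and the
simplified liveness check are mine, not gcc's RTL or any gcc API.  It
only handles a register copy followed directly by a cmp of the copy.

```python
# Toy model only: instructions are (opcode, dest, src) tuples in
# "op dest,src" order.  None of this is gcc's actual RTL or API.

def reads(insn):
    """Registers an instruction reads (toy rules)."""
    op, dst, src = insn
    regs = set()
    if not src.startswith("#"):   # "#4" is an immediate, not a register
        regs.add(src)
    if op != "mov":               # cmp, add, ... also read operand one
        regs.add(dst)
    return regs

def writes(insn):
    """Registers an instruction writes (cmp only sets flags)."""
    op, dst, _ = insn
    return set() if op == "cmp" else {dst}

def dead_after(reg, insns, start, live_out):
    """True if reg is overwritten, or never read, from index start on."""
    for insn in insns[start:]:
        if reg in reads(insn):
            return False
        if reg in writes(insn):
            return True
    return reg not in live_out

def fold_copies(insns, live_out=frozenset()):
    """Fold 'mov tmp,src; cmp tmp,x' into 'cmp src,x' when tmp is dead."""
    out = []
    i = 0
    while i < len(insns):
        cur = insns[i]
        nxt = insns[i + 1] if i + 1 < len(insns) else None
        if (nxt is not None
                and cur[0] == "mov" and nxt[0] == "cmp"
                and nxt[1] == cur[1]              # cmp tests the copy
                and not cur[2].startswith("#")    # copy is reg to reg
                and nxt[2] != cur[1]              # copy not used twice
                and dead_after(cur[1], insns, i + 2, live_out)):
            out.append(("cmp", cur[2], nxt[2]))
            i += 2
        else:
            out.append(cur)
            i += 1
    return out

folded = fold_copies([("mov", "r8", "ax"), ("cmp", "r8", "#4")])
```

The important part is the dead_after check: the copy into r8 can only
be dropped if nothing later reads r8.  If r8 is still live after the
cmp, the sketch leaves both instructions alone.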
