On 24/03/14 04:44, Jeff Law wrote:
On 03/22/14 05:29, Richard Hulme wrote:
On 22/03/14 01:47, Jeff Law wrote:
On 03/21/14 18:35, DJ Delorie wrote:

I've found that "removing unneeded moves through registers" is
something gcc does poorly in the post-reload optimizers.  I've written
my own on some occasions (for rl78 too).  Perhaps this is a good
starting point to look at?

much needless copying, which strengthens my suspicion that it's
something in the RL78 backend that needs 'tweaking'.

Of course it is, I've said that before I think.  The RL78 uses a
virtual model until reload, then converts each virtual instruction
into multiple real instructions, then optimizes the result.  This is
going to be worse than if the real model had been used throughout
(like arm or x86), but in this case, the real model *can't* be used
throughout, because gcc can't understand it well enough to get through
regalloc and reload.  The RL78 is just too "weird" to be modelled
as-is.

I keep hoping that gcc's own post-reload optimizers would do a better
job, though.  Combine should be able to combine, for example, the "mov
r8,ax; cmp r8,#4" types of insns together.
I can't recall the details, but when you described the situation to me,
the virtual register file was the only way I could see to make the
RL78 work in the IRA+reload world.

What would be quite interesting to try would be to continue to use the
virtualized register set, but instead use the IRA+LRA path.  Presumably
that wouldn't be terribly hard to try and there's a reasonable chance
that'll improve the code in a noticeable way.

Looking at how that's done by other backends, as far as I can tell, I
just need to add something like:

#undef  TARGET_LRA_P
#define TARGET_LRA_P rl78_enable_lra

static bool
rl78_enable_lra (void)
{
   return true;
}

to rl78.c?  At least in theory, even if other work is needed elsewhere
to make things run smoothly.

Unfortunately, that function never seems to be called.

How does TARGET_LRA_P get used, anyway?  I can't find anything that
tries to use it, only places where it gets set.  Is there some funky
preprocessor stuff going on that's stopping me grepping for it?
That should be enough to switch to the LRA path.   It's a target hook.
Grep for "targetm.lra_p"

Hi Jeff,

Ok, I figured out what was wrong eventually. I'd added the lines above *after* the declaration of the targetm variable.
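
For anyone hitting the same thing, here is a minimal standalone sketch of why the ordering matters. It is not GCC source: `struct target_hooks`, `default_lra_p`, `use_lra` and the two `targetm_*` variables are simplified stand-ins invented for illustration. It only assumes what the thread describes, namely that the hook table is filled in from a macro initializer, so the `#undef`/`#define` pair only takes effect if it appears before the initializer is expanded.

```c
#include <stdbool.h>

/* Hypothetical, cut-down stand-in for the target hook table. */
struct target_hooks
{
  bool (*lra_p) (void);
};

/* Hypothetical default hook: keep using reload. */
static bool default_lra_p (void) { return false; }

/* The override from the mail above. */
static bool rl78_enable_lra (void) { return true; }

/* Defaults, standing in for what the shared headers would provide. */
#define TARGET_LRA_P default_lra_p
#define TARGET_INITIALIZER { TARGET_LRA_P }

/* Case 1: override placed BEFORE the initializer is expanded.
   The macro is looked up at the point of use, so the override wins. */
#undef  TARGET_LRA_P
#define TARGET_LRA_P rl78_enable_lra
struct target_hooks targetm_override_first = TARGET_INITIALIZER;

/* Case 2: restore the default, expand the initializer, THEN override.
   The table has already captured default_lra_p; the later #define
   changes nothing, and rl78_enable_lra is never called. */
#undef  TARGET_LRA_P
#define TARGET_LRA_P default_lra_p
struct target_hooks targetm_override_late = TARGET_INITIALIZER;
#undef  TARGET_LRA_P
#define TARGET_LRA_P rl78_enable_lra

/* Callers go through the table, never through the macro, which is
   why grepping for TARGET_LRA_P finds only definitions. */
bool use_lra (const struct target_hooks *t)
{
  return t->lra_p ();
}
```

In this sketch `use_lra (&targetm_override_first)` returns true and `use_lra (&targetm_override_late)` returns false, which matches the "never seems to be called" symptom.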

Activating LRA alone is certainly not the answer. Whilst I can see that *some* of the "to me, to you" register passing has been eliminated, LRA seems to have an intense dislike of indirect memory addressing with an offset. So instead of something like:

mov   a, [sp+4]

it's now producing:

movw   ax, sp
addw   ax, #4
movw   hl, ax
mov    a, [hl]

which takes 7 bytes (compared to 4). Overall I've got a code size increase of about 31%.

I don't know why it's avoiding the indirect-with-offset addressing mode. It *does* generate code using it, but seemingly only as a last resort.

Something else to track down, I guess.

Regards,

Richard.
