Andrew Stubbs <a...@codesourcery.com> writes:
> On 18/04/12 16:53, Richard Sandiford wrote:
>> Andrew Stubbs<a...@codesourcery.com>  writes:
>>> On 18/04/12 11:55, Richard Sandiford wrote:
>>>> The problem is that not all register moves are always going to be
>>>> eliminated, even when no mode changes are involved.  It might make
>>>> sense to restrict that code you quoted:
>>>>
>>>>        case SIMPLE_PSEUDO_REG_MOVE:
>>>>          if (MODES_TIEABLE_P (GET_MODE (x), word_mode))
>>>>            bitmap_set_bit (decomposable_context, regno);
>>>>          break;
>>>>
>>>> to the second pass though.
>>>
>>> Yes, I thought of that, but I dismissed it because the second pass is
>>> really very late. It would be just in time to take advantage of the
>>> relaxed register allocation, but would miss out on all the various
>>> optimizations that forward-propagation, combining, and such can offer.
>>>
>>> This is why I've tried to find a way to do something about it in the
>>> first pass. I thought it makes sense to do something for none-no-op
>>> moves (when is there such a thing, btw, without it being and extend,
>>> truncate, or subreg?),
>>
>> AFAIK there isn't, which is why I'm a bit unsure what you're suggesting.
>
> And why I don't understand what the current code is trying to achieve.

See below.

>> Different modes like DI and DF can both be stored in NEON registers,
>> so if you have a situation where one is punned into the other,
>> I think that's an even stronger reason to want to keep them together.
>
> Does the compiler use pseudo-reg copies for that? I thought it mostly 
> just referred to the same register with a different mode and everything 
> just DTRT.
>
> OK, let's go back to the start: at first sight, the lower-subregs pass 
> decomposes every psuedo-register that is larger than a core register, is 
> only defined or used via subreg or a simple copy, or is a copy of a 
> decomposed register that has no non-decomposable features itself 
> (forward propagation). It does not deliberately decompose 
> pseudo-registers that are only copies from or to a hard-register, even 
> though there's nothing intrinsically non-decomposable about that 
> (besides that there's no benefit), but it can happen if forward 
> propagation occurs. It explicitly does not decompose any pseudo that is 
> used in a non-move DImode operation.
>
> All this makes sense to me: if the backend is written such that DImode 
> operations are expanded in terms of SImode subregs, then it's better to 
> think of the subregs independently. (On ARM, this *is* the case when 
> NEON is disabled.)
>
> But then there's this extra "feature" that a pseudo-to-pseudo copy 
> triggers both pseudo registers to be considered decomposable (unless 
> there's some other use that prohibits it), and I don't know why?
>
> Yes, I understand that a move from NEON to core might benefit from this, 
> but those don't exist before reload. I also theorized that moves that 
> convert to some other kind of mode might be interesting (the existing 
> code checks for "tieable" modes, presumable with reason), but I can't 
> come up with a valid example (mode changes usually require a non-move 
> operation of some kind).
>
> In fact, the only examples of a pseudo-pseudo copy that won't be 
> eliminated by fwprop et al would be to do with loops and conditionals, 
> and I don't understand why they should be special.

Not just those, because loads, stores, calls, volatiles, etc.,
can't be moved freely.  E.g. code like:

    uint64_t foo (uint64_t *x, uint64_t z)
    {
      uint64_t y = *x;
      *x = z;
      return y;
    }

benefits too, because y must be a pseudo.

I don't think the idea is that these cases are special in themselves.
What we're looking for are pseudos that _may_ be decomposed into
separate registers.  If one of the pseudos in the move is only used in
decomposable contexts (including nonvolatile loads and stores, as well
as copies to and from hard registers, etc.), then we may be able to
completely replace the original pseudo with two smaller ones.  E.g.:

    (set (reg:DI X) (mem:DI ...))
    ...
    (set (reg:DI Y) (reg:DI X))

In this case, X can be completely replaced by two SImode registers.

What isn't clear to me is why we don't seem to do the same for:

    (set (reg:DI X) (mem:DI ...))
    (set (mem:DI ...) (reg:DI X))

Perhaps we do and I'm just misreading the code.  Or perhaps it's just
too hard to get the costs right.  Splitting that would be moving even
further from what you want though :-)

> The result of this extra feature is that if I copy the output of a 
> DImode insn *directly* to a DImode hard reg (say a return value) then 
> there's no decomposition, but if the expand pass happens to have put an 
> intermediate pseudo register (as it does do) then this extra rule 
> decomposes it most unhelpfully (ok, there's only actually a problem if 
> the compiler can reason that one subreg or the other is unchanged, as is 
> the case with sign_extend).

But remember that this pass is not designed for targets like NEON that
have lots of native double-word operations.  It's designed for targets
that don't.  I think you said earlier that your testcase was handled
correctly for non-NEON, which is the point: decomposing in that case
_may_ be a benefit (if we end up being able to replace all uses of a
doubleword pseudo with uses of 2 word pseudos) and should be no worse
otherwise.

> So, after having thought all this through again, unless somebody can 
> show why not, I propose that we remove this mis-feature entirely, or at 
> least disable it in the first pass.

I still prefer the idea of disabling in the first pass.  It'll need to
be tested on something like non-NEON ARM to see whether it makes things
worse or better there.  (I think size testing would be fine.)

Richard

Reply via email to