Re: [PATCH v6] Implement new RTL optimizations pass: fold-mem-offsets.

Jeff Law Fri, 06 Oct 2023 10:58:10 -0700



On 10/6/23 08:17, Manolis Tsamis wrote:
SNIP

So I was ready to ACK, but realized there weren't any testresults for a
primary platform mentioned.  So I ran this on x86.

It's triggering one regression (code quality).

Specifically gcc.target/i386/pr52146.c

The f-m-o code is slightly worse than without f-m-o.

Without f-m-o we get this:

     9 0000 B88000E0              movl    $-18874240, %eax
     9      FE
    10 0005 67C70000              movl    $0, (%eax)
    10      000000
    11 000c C3                    ret

With f-m-o we get this:

     9 0000 B8000000              movl    $0, %eax
     9      00
    10 0005 67C78080              movl    $0, -18874240(%eax)
    10      00E0FE00
    10      000000
    11 0010 C3                    ret


The key being that we don't get rid of the original move instruction,
nor does the original move instruction get smaller due to simplification
of its constant.  Additionally, the memory store gets larger.  The net
is a 4 byte increase in code size.

Yes, this case is not good for f-m-o. In theory there could be a cost
calculation step that tries to estimate the benefit of a
transformation, but given that f-m-o cannot transform code in a way
that we have big regressions it's unclear to me whether complicating
the code is worth it. At least if we can solve the issues in other
ways (also see discussion below).


This is probably a fairly rare scenario and the original bug report was
for a correctness issue in using addresses in the range
0x80000000..0xffffffff in x32.  So I wouldn't lose any sleep if we
adjusted the test to pass -fno-fold-mem-offsets.  But before doing that
I wanted to give you the chance to ponder if this is something you'd
prefer to improve in f-m-o itself.   At some level if the base register
collapses down to 0, then we could take the offset as a constant address
and try to recognize that form.  If that fails, then just consider the
change unprofitable rather than trying to recognize it as reg+d.

Anyway, waiting to hear your thoughts...

Yes, this testcase has been bugging me too, I have brought that up in
previous iterations as well.

I must have missed that in the earlier discussion.

I'm also not sure whether this is a code quality or a correctness
issue? From what I understand from the relevant ticket, if we emit
movl $0, -18874240 then it's wrong code?

It's a code quality issue as long as we don't transform the code intomovl $0, -18874240, at which point it would become a correctness issue.


With regards to the "recognize that the base register is 0", that
would be nice but how would we recognise that? f-m-o can only
calculate the folded offset but that is not enough to prove that the
base register is zero or not.

It's a chain of insns that produce an address and use it in the memoryreference. We essentially changed the first insn in the chain from movl-18874240, %eax into movl 0, %eax. So we'd have to someone note thatthe base register in the memory reference has the value zero in thechain of instructions. That may or may not be reasonable to do.


One thought that I've had is that this started being an issue on x86
when I enabled folding of mv REG, INT in addition to the existing ADD
REG, REG, INT. The idea was that a move will be folded to mv REG, 0
and on targets that we have a zero register that can be beneficial for
a number of reasons... but on x86 we don't have a zero register so the
benefit is much more limited anyway. So maybe we could disable folding
of moves on targets that don't have a zero register? That would solve
the issue and I believe it also makes sense in general. If so, is
there a way to query wether the target has such register?

We don't have a generalized query to do that. You might be able to askwhat the cost to load 0 into a register is, but many targetsartificially decrease that value.

You could use the costing model to cost the entire sequencebefore/after. There's an interface to walk a sequence and return acost. In the case of f-m-o the insns are part of the larger chain, sowe'd need a different API.

The other option would be to declare this is known, but not importantissue. I would think that it's going to be rare to have absoluteaddresses and x32 isn't really used much. The combination of the twoshould be exceedingly rare. Hence my willingness to use-fno-fold-mem-offsets in the test.


Jeff

Re: [PATCH v6] Implement new RTL optimizations pass: fold-mem-offsets.

Reply via email to