On 10/6/23 08:17, Manolis Tsamis wrote:
SNIP
So I was ready to ACK, but realized there weren't any testresults for a
primary platform mentioned. So I ran this on x86.
It's triggering one regression (code quality).
Specifically gcc.target/i386/pr52146.c
The f-m-o code is slightly worse than without f-m-o.
Without f-m-o we get this:
9 0000 B88000E0 movl $-18874240, %eax
9 FE
10 0005 67C70000 movl $0, (%eax)
10 000000
11 000c C3 ret
With f-m-o we get this:
9 0000 B8000000 movl $0, %eax
9 00
10 0005 67C78080 movl $0, -18874240(%eax)
10 00E0FE00
10 000000
11 0010 C3 ret
The key being that we don't get rid of the original move instruction,
nor does the original move instruction get smaller due to simplification
of its constant. Additionally, the memory store gets larger. The net
is a 4 byte increase in code size.
Yes, this case is not good for f-m-o. In theory there could be a cost
calculation step that tries to estimate the benefit of a
transformation, but given that f-m-o cannot transform code in a way
that we have big regressions it's unclear to me whether complicating
the code is worth it. At least if we can solve the issues in other
ways (also see discussion below).
This is probably a fairly rare scenario and the original bug report was
for a correctness issue in using addresses in the range
0x80000000..0xffffffff in x32. So I wouldn't lose any sleep if we
adjusted the test to pass -fno-fold-mem-offsets. But before doing that
I wanted to give you the chance to ponder if this is something you'd
prefer to improve in f-m-o itself. At some level if the base register
collapses down to 0, then we could take the offset as a constant address
and try to recognize that form. If that fails, then just consider the
change unprofitable rather than trying to recognize it as reg+d.
Anyway, waiting to hear your thoughts...
Yes, this testcase has been bugging me too, I have brought that up in
previous iterations as well.
I must have missed that in the earlier discussion.
I'm also not sure whether this is a code quality or a correctness
issue? From what I understand from the relevant ticket, if we emit
movl $0, -18874240 then it's wrong code?
It's a code quality issue as long as we don't transform the code into
movl $0, -18874240, at which point it would become a correctness issue.
With regards to the "recognize that the base register is 0", that
would be nice but how would we recognise that? f-m-o can only
calculate the folded offset but that is not enough to prove that the
base register is zero or not.
It's a chain of insns that produce an address and use it in the memory
reference. We essentially changed the first insn in the chain from movl
-18874240, %eax into movl 0, %eax. So we'd have to someone note that
the base register in the memory reference has the value zero in the
chain of instructions. That may or may not be reasonable to do.
One thought that I've had is that this started being an issue on x86
when I enabled folding of mv REG, INT in addition to the existing ADD
REG, REG, INT. The idea was that a move will be folded to mv REG, 0
and on targets that we have a zero register that can be beneficial for
a number of reasons... but on x86 we don't have a zero register so the
benefit is much more limited anyway. So maybe we could disable folding
of moves on targets that don't have a zero register? That would solve
the issue and I believe it also makes sense in general. If so, is
there a way to query wether the target has such register?
We don't have a generalized query to do that. You might be able to ask
what the cost to load 0 into a register is, but many targets
artificially decrease that value.
You could use the costing model to cost the entire sequence
before/after. There's an interface to walk a sequence and return a
cost. In the case of f-m-o the insns are part of the larger chain, so
we'd need a different API.
The other option would be to declare this is known, but not important
issue. I would think that it's going to be rare to have absolute
addresses and x32 isn't really used much. The combination of the two
should be exceedingly rare. Hence my willingness to use
-fno-fold-mem-offsets in the test.
Jeff