On 10/10/23 05:59, Manolis Tsamis wrote:
It's a code quality issue as long as we don't transform the code into
movl $0, -18874240, at which point it would become a correctness issue.
Ok, thanks for pointing that out as I thought that movl $0, -18874240
and movl $0, -18874240(eax) with eax == 0 were the same.
It's a quirk of x32 related to sign extension of the address. Certainly
not obvious! But it's better than the PA where:
(mem (plus (reg A) (reg B)))
Is semantically different than
(mem (plus (reg B) (reg A)))
Due to the implicit segment register selection that happens on the high
bits of the base register rather than the high bits of the effective
address.
With regards to the "recognize that the base register is 0", that
would be nice but how would we recognise that? f-m-o can only
calculate the folded offset but that is not enough to prove that the
base register is zero or not.
It's a chain of insns that produce an address and use it in the memory
reference. We essentially changed the first insn in the chain from movl
-18874240, %eax into movl 0, %eax. So we'd have to someone note that
the base register in the memory reference has the value zero in the
chain of instructions. That may or may not be reasonable to do.
Ok, do you believe it would be worthwhile to add a REG_EQ note or
something similar? I assume some REG_EQ notes will be added from the
following cprop-hardreg pass too?
I think it really depends on the degree of difficulty and whether or not
we think there are other cases like this one lurking.
Given this scenario can only happen with an absolute address, I would
probably explore if there's a way to trivially detect this case, but I
wouldn't spend a lot of time on it.
And I wasn't necessarily thinking of a note in the RTL, just a bit of
state within f-m-o when updating an arithmetic chain so that we could
determine that the final instruction was a memory reference to an
absolute address.
You could use the costing model to cost the entire sequence
before/after. There's an interface to walk a sequence and return a
cost. In the case of f-m-o the insns are part of the larger chain, so
we'd need a different API.
The other option would be to declare this is known, but not important
issue. I would think that it's going to be rare to have absolute
addresses and x32 isn't really used much. The combination of the two
should be exceedingly rare. Hence my willingness to use
-fno-fold-mem-offsets in the test.
Yes, I now better understand why you suggested adding
-fno-fold-mem-offsets to the testcase. Or we could also make the test
not fail in this case where the memory access has the base register,
as there's no correctness issue.
:)
Then, going back to the code quality regression, wouldn't things be
much better if we would emit xor eax, eax instead of mov eax, 0? In
that case xor eax, eax should be 2 bytes instead of 5 for mov eax, 0
and hence the code size difference should be trivial. Then we can keep
f-m-o as is, including this testcase.
The selection of xor vs mov imm is handled elsewhere. On some uarchs
the former is preferred, on others the latter.
If you think about xor %eax,%eax as an example. Unless you special case
that scenario in your cpu front-end, the xor will have a data dependency
on any prior set of %eax which inhibits ILP.
Is there a way to emit a taret-specific optimized register zero insn?
If so I'll use that and adjust the testcase as needed, and I think
we're done with this one.
I can live with just adjusting the testcase. Let's go with that.
jeff