>> If the port has a splitter to rip apart a double-word load into single-word
>> loads, then we'd obviously only want to do that in cases where the 
>> double-word load actually generates > 1 assembly instruction.

Or indeed only if it is really a performance win. I think that should
be purely a per-port / micro-architectural decision.

>
> For arm/aarch64, I guess it's not an issue, otherwise the peephole2
> won't work at all.  ARM maintainers should have answer to this.

Generating more ldrd's and strd's (ldp / stp on AArch64) will be
beneficial in the ARM and the AArch64 ports: we save code size and
make use of more of the memory bandwidth available per instruction on
most higher-end cores that I'm aware of. Even on the smaller
microcontrollers I expect it to be a win because you've saved code
size. There may well be pathological cases, given that we've shortened
some dependencies or increased the lifetimes of others, but overall
I'd expect it to be more positive than negative.
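
As a rough illustration of the kind of source that produces adjacent
word accesses, here is a minimal sketch. The struct and function names
are made up for this example, and whether the two loads actually end up
as a single ldrd / ldp depends on the compiler version, options,
alignment and register allocation:

```c
#include <stdint.h>

/* Hypothetical example type: two adjacent 32-bit fields. */
struct pair {
    int32_t lo;
    int32_t hi;
};

/* Reads two consecutive words from the same base address.  These two
   loads are candidates for merging into one ldrd (ARM) or ldp
   (AArch64); nothing here guarantees that the merge happens.  */
int64_t sum_pair(const struct pair *p)
{
    return (int64_t)p->lo + (int64_t)p->hi;
}
```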

I also expect this to be more effective in the T32 (Thumb2) ISA and
AArch64 because ldrd / strd and ldp / stp respectively can work with
any registers, unlike the A32 ISA, where the registers loaded or stored
must be consecutive registers. I'm hoping for some more review on the
generic bits before looking into the backend implementation, on the
expectation that this is the direction folks want to take.
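
To make the register constraint concrete, here is a simplified sketch
of the pairing rules as I understand them. The function names are
hypothetical and this is not code from the ARM backend; it models only
the load case and leaves out details such as writeback and base
register overlap, so check the architecture reference manual for the
exact rules:

```c
#include <stdbool.h>

/* Simplified A32 model: ldrd needs an even-numbered first register
   and the very next register as the second one.  r14 is excluded
   because its partner would be the PC (r15).  */
bool a32_ldrd_pair_ok(unsigned rt, unsigned rt2)
{
    return rt < 16 && (rt % 2 == 0) && rt != 14 && rt2 == rt + 1;
}

/* Simplified T32 model: ldrd accepts any two distinct core registers
   other than SP (r13) and PC (r15).  */
bool t32_ldrd_pair_ok(unsigned rt, unsigned rt2)
{
    bool rt_ok  = rt  < 16 && rt  != 13 && rt  != 15;
    bool rt2_ok = rt2 < 16 && rt2 != 13 && rt2 != 15;
    return rt_ok && rt2_ok && rt != rt2;
}
```

Under this model a pair such as (r1, r4) is fine for T32 but rejected
for A32, which is why I'd expect fewer missed opportunities there.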


regards
Ramana




>
>
>>
>> jeff
