>> If the port has a splitter to rip apart a double-word load into single-word
>> loads, then we'd obviously only want to do that in cases where the
>> double-word load actually generates > 1 assembly instruction.
Or indeed if it is really a performance win. And I think that should purely
be a per-port / micro-architectural decision.

>
> For arm/aarch64, I guess it's not an issue, otherwise the peephole2
> won't work at all. ARM maintainers should have an answer to this.

Generating more ldrd's and strd's will be beneficial in both the ARM and the
AArch64 ports: we save code size and make use of more of the memory
bandwidth available per instruction on most of the higher-end cores that I'm
aware of. Even on the smaller microcontrollers I expect it to be a win,
because of the code-size saving. There may well be pathological cases, given
that we've shortened some dependencies or increased the lifetimes of others,
but overall I'd expect the effect to be more positive than negative.

I also expect this to be more effective in the T32 (Thumb-2) ISA and in
AArch64, because ldrd / strd and ldp / stp respectively can work with any
registers, unlike the A32 ISA, where the registers loaded or stored must be
consecutive.

I'm hoping for some more review of the generic bits before looking into the
backend implementation, on the expectation that this is the direction folks
want to proceed in.

regards
Ramana

>>
>> jeff