https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86209
Ramana Radhakrishnan <ramana at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ramana at gcc dot gnu.org
--- Comment #1 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> ---
(In reply to sameerad from comment #0)
> While implementing a peephole2 that combines loads/stores of shorter types
> into a load/store of a larger type, the following testcase was found for
> aarch64. The peephole does not fire for it because the zero/sign-extended
> operands do not have the same type.
>
> Test program:
> unsigned short
> subus (unsigned short *array)
> {
> return array[0] + array[1];
> }
>
> Expander generated RTL:
> (insn 6 3 7 2 (set (reg:HI 96)
> (mem:HI (reg/v/f:DI 94 [ array ]) [1 *array_4(D)+0 S2 A16]))
> (nil))
> (insn 7 6 8 2 (set (reg:HI 97)
> (mem:HI (plus:DI (reg/v/f:DI 94 [ array ])
> (const_int 2 [0x2])) [1 MEM[(short unsigned int *)array_4(D)
> + 2B]+0 S2 A16]))
> (nil))
> (insn 8 7 9 2 (set (reg:SI 99)
> (subreg:SI (reg:HI 97) 0))
> (nil))
> (insn 9 8 10 2 (set (reg:SI 98)
> (plus:SI (subreg:SI (reg:HI 96) 0)
> (reg:SI 99)))
> (expr_list:REG_EQUAL (plus:SI (subreg:SI (reg:HI 96) 0)
> (subreg:SI (reg:HI 97) 0))
> (nil)))
>
> The combiner combines insn 7 and 8 to generate zero extension to SI mode.
>
> (insn 8 7 9 2 (set (reg:SI 99 [ MEM[(short unsigned int *)array_4(D) + 2B] ])
> (zero_extend:SI (mem:HI (plus:DI (reg/v/f:DI 94 [ array ])
> (const_int 2 [0x2])) [1 MEM[(short unsigned int
> *)array_4(D) + 2B]+0 S2 A16]))) {*zero_extendhisi2_aarch64}
> (expr_list:REG_DEAD (reg/v/f:DI 94 [ array ])
> (nil)))
>
> The reload pass removes the SUBREGs, which hold the information about the
> desired type. As a result, one HImode load is zero-extended to SImode
> (insn 8) while the other is zero-extended to DImode (insn 6).
>
> (insn 8 7 6 2 (set (reg:SI 1 x1 [orig:99 MEM[(short unsigned int
> *)array_4(D) + 2B] ] [99])
> (zero_extend:SI (mem:HI (plus:DI (reg/v/f:DI 0 x0 [orig:94 array ]
> [94])
> (const_int 2 [0x2])) [1 MEM[(short unsigned int
> *)array_4(D) + 2B]+0 S2 A16]))) {*zero_extendhisi2_aarch64}
> (nil))
> (insn 6 8 9 2 (set (reg:DI 0 x0)
> (zero_extend:DI (mem:HI (reg/v/f:DI 0 x0 [orig:94 array ] [94]) [1
> *array_4(D)+0 S2 A16]))) {*zero_extendhidi2_aarch64}
> (nil))
> (insn 9 6 14 2 (set (reg:SI 0 x0 [98])
> (plus:SI (reg:SI 0 x0 [orig:96 *array_4(D) ] [96])
> (reg:SI 1 x1 [orig:99 MEM[(short unsigned int *)array_4(D) + 2B]
> ] [99]))){*addsi3_aarch64}
> (nil))
> (insn 14 9 15 2 (set (reg/i:HI 0 x0)
> (reg:HI 0 x0 [98])) {*movhi_aarch64}
> (nil))
> (insn 15 14 17 2 (use (reg/i:HI 0 x0))
> (nil))
> (note 17 15 18 NOTE_INSN_DELETED)
> (note 18 17 0 NOTE_INSN_DELETED)
>
> Now as both memory accesses have different extended types, they cannot be
> combined by peephole.
>
> Because of this, even when sched_fusion has brought the loads/stores closer,
> they cannot be merged.
Hmmm,
ldr w0, [x0]
ldr w1, [x0, 2]
is not the same as
ldp w0, w1, [x0]
because ldp w0, w1, [x0] loads two 32-bit words 4 bytes apart, i.e. it is
the same as merging
ldr w0, [x0]
ldr w1, [x0, 4]
Am I missing something? If not, it isn't possible to merge this
combination into an ldp.
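
To make the offset mismatch concrete, here is a minimal C sketch. It is not
GCC code: the helper names (load_u16, load_u32, model_ldp_w,
model_two_halfword_loads) are made up for illustration, and it only models
which bytes each form reads, assuming 16-bit unsigned short as on aarch64.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 16-bit load at byte offset OFF, zero-extended. */
static uint32_t load_u16(const unsigned char *base, size_t off)
{
    uint16_t v;
    memcpy(&v, base + off, sizeof v);
    return v;
}

/* Hypothetical helper: 32-bit load at byte offset OFF. */
static uint32_t load_u32(const unsigned char *base, size_t off)
{
    uint32_t v;
    memcpy(&v, base + off, sizeof v);
    return v;
}

/* Models "ldp w0, w1, [x0]": two 32-bit loads at byte offsets 0 and 4. */
static void model_ldp_w(const unsigned char *base, uint32_t *w0, uint32_t *w1)
{
    *w0 = load_u32(base, 0);
    *w1 = load_u32(base, 4);
}

/* Models the testcase's accesses: two 16-bit loads at byte offsets 0 and 2. */
static void model_two_halfword_loads(const unsigned char *base,
                                     uint32_t *w0, uint32_t *w1)
{
    *w0 = load_u16(base, 0);
    *w1 = load_u16(base, 2);
}
```

For an array holding {1, 2, 3, 4}, the halfword model yields array[0] and
array[1], whereas the ldp model yields two 32-bit values that each span two
array elements and together cover 8 bytes instead of 4.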
Thoughts ...