https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #3 from Jim Wilson <wilson at gcc dot gnu.org> ---
The basic idea here is that the movqi pattern in riscv.md currently emits RTL
for a load that looks like this
  (set (reg:QI target) (mem:QI (address)))
As an experiment, we want to try changing that to something like this
  (set (reg:DI temp) (zero_extend:DI (mem:DI (address))))
  (set (reg:QI target) (subreg:QI (reg:DI temp) 0))
The hope is that the optimizer will combine the subreg with following
operations resulting in smaller faster code at the end.  And this should also
solve the volatile load optimization problem.  So we need a patch, and then we
need experiments to see if the patch actually produces better code on real
programs.  It should be fairly easy to write the patch even if you don't have
any gcc experience.  The testing part of this is probably more work than the
patch writing.

The movqi pattern calls riscv_legitmize_move in riscv.c, so that would have to
be modified to look for qimode loads from memory, allocate a temporary
register, do a zero_extending load into the temp reg, and then a subreg copy
into the target register.

You will probably also need to handle cases where both the target and source
are memory locations, in which case this already gets split into two
instructions, a load followed by a store.

You can look at the movqi pattern in arm.md file to see an example of how to do
this, where it calls gen_zero_extendqisi2.  Though for RISC-V, we would want
gen_zero_extendqidi2 for rv64 and gen_zero_extendqisi2 for rv32.

If the movqi change works, then we would want similar changes for movhi and
maybe also movsi for rv64.

It might also be worth checking whether zero-extend or sign-extend is the
better choice.  We zero extend char by default, so that should be best.  For
rv64, the hardware sign extends simode to dimode by default, so sign-extend is
probably best for that.  For himode I'm not sure, I think we prefer sign-extend
by default, but that isn't necessarily the best choice for loads.  This would
have to be tested.

You can see rtl dumps by using -fdump-rtl-all.  The combiner is the pass that
should be optimizing away the unnecessary zero-extend.  You can see details of
what the combiner pass is doing by using -fdump-rtl-combine-all.

Reply via email to