On Fri, Oct 25, 2019 at 10:40 AM Craig Blackmore
<craig.blackm...@embecosm.com> wrote:
> This patch aims to allow more load/store instructions to be compressed by
> replacing a load/store of 'base register + large offset' with a new load/store
> of 'new base + small offset'. If the new base gets stored in a compressed
> register, then the new load/store can be compressed. Since there is an 
> overhead
> in creating the new base, this change is only attempted when 'base register' 
> is
> referenced in at least 4 load/stores in a basic block.
>
> The optimization is implemented in a new RISC-V specific pass called
> shorten_memrefs which is enabled for RVC targets. It has been developed for 
> the
> 32-bit lw/sw instructions but could also be extended to 64-bit ld/sd in 
> future.

The fact that this needs 4 load/stores in a block with the same base
address means it won't trigger very often.  But it does seem to work.
I see about a 0.05% code size reduction for a rv32gc newlib nano
libc.a.  There might be other codes that benefit more.

I'm concerned about the number of RISC-V specific optimization passes
people are writing.  I've seen at least 3 so far.  This one is at
least small and simple enough to understand that it doesn't look like
it will cause maintenance problems.

The config.gcc change conflicts with the save-restore optimization
pass that Andrew Burgess added, but that is a trivial fix.

The code only works for 32-bit load/stores.  rv64 has compressed
64-bit load/stores, and the F and D extensions have float and double
compressed loads and stores.  The patch would be more useful if it
handled all of these.

The patch doesn't check the other operand, it only looks at the memory
operand.  This results in some cases where the code rewrites
instructions that can never be compressed.  For instance, given
void
store1z (int *array)
{
  array[200] = 0;
  array[201] = 0;
  array[202] = 0;
  array[203] = 0;
}
the patch increases code size by 4 bytes because it rewrites the
stores, but we don't have compressed store 0, and x0 is not a
compressed reg, so there is no point in rewriting the stores.  I can
also create examples that show a size increase with loads but it is a
little trickier.
extern int sub1 (int, int, int, int, int, int, int);
int
load1a (int a0, int a1, int a2, int a3, int a4, int *array)
{
  int a = 0;
  a += array[200];
  a += array[201];
  a += array[202];
  a += array[203];
  return sub1 (a0, a1, a2, a3, a4, 0, a);
}
The register allocator will pass a0-a4 directly through from args to
the call, leaving only a5-a7 for the load base address and dest, and
only one of those regs is a compressed reg, but we need two, so these
loads can't be compressed.  The code still gets rewritten, resulting
in a size increase for the extra add.  Not sure if you can do anything
about that though, since you are doing this before register
allocation.

It isn't clear that the change riscv_address_cost is for.  That should
be commented.

I'd suggest parens in the riscv_compressed_lw_offset_p return
statement, that makes it easier to align the code correctly (in emacs
at least).

The RISCV_MAX_COMPRESSED_LW_OFFSET should be moved down to the "ISA
constants" section of riscv.h, and maybe renamed similar to the other
constants.

There is a REG_P check in get_si_mem_reg.  That should probably handle
SUBREGs too.

A testcase to verify the patch would be nice.  I have one I wrote for
testing that shows some tests that work, some tests that don't work,
and some that work but maybe shouldn't.

I vaguely remember Micheal Meissner talking about doing something
similar for PPC in a GNU Cauldron maybe 2 years ago.  But I don't know
if he tried, or how far he got if he did try.  It might be useful to
ask him about that work.

Otherwise this looks OK to me.

Jim

Reply via email to