Hi Kito,

Thank you for taking the time to review my patch. I am posting an updated patchset taking into account your comments.
On 18/09/2019 11:01, Kito Cheng wrote:
> Hi Craig:
>
> Some general review comment:
> - Split new pass into new file.
> - Add new option to enable/disable this pass.
> - Could you extend this patch to support lw/sw/ld/sd/flw/fsw/fld/fsd?
>   I think there is lots of common logic for supporting other types
>   compressed load/store instruction, but I'd like to see those
>   support at once.

I agree the patch could be extended to other load/store instructions, but unfortunately I don't have the time to do this at the moment. Can the lw/sw support be merged and the others added later?

> - Do you have experimental data about doing that after register
>   allocation/reload,

I don't think it is feasible to move the pass after reload, because the pass requires a new register to be allocated for the new base.

> I'd prefer doing such optimization after RA, because we can
> accurately estimate how many byte we can gain, I guess it because
> RA didn't assign fit src/dest reg or base reg so that increase
> code size?

Before reload, we do not know whether the base reg will be a compressed register or not.

> On Fri, Sep 13, 2019 at 12:20 AM Craig Blackmore
> <craig.blackm...@embecosm.com> wrote:
>>
>> This patch aims to allow more load/store instructions to be compressed by
>> replacing a load/store of 'base register + large offset' with a new
>> load/store of 'new base + small offset'. If the new base gets stored in a
>> compressed register, then the new load/store can be compressed. Since
>> there is an overhead in creating the new base, this change is only
>> attempted when 'base register' is referenced in at least 4 load/stores in
>> a basic block.
>>
>> The optimization is implemented in a new RISC-V specific pass called
>> shorten_memrefs which is enabled for RVC targets. It has been developed
>> for the 32-bit lw/sw instructions but could also be extended to 64-bit
>> ld/sd in future.
>>
>> The patch saves 164 bytes (0.3%) on a proprietary application (59450
>> bytes compared to 59286 bytes) compiled for rv32imc bare metal with -Os.
>> On the Embench benchmark suite (https://www.embench.org/) we see code
>> size reductions of up to 18 bytes (0.7%) and only two cases where code
>> size is increased slightly, by 2 bytes each:
>>
>> Embench results (.text size in bytes, excluding .rodata)
>>
>> Benchmark         Without patch   With patch   Diff
>> aha-mont64        1052            1052         0
>> crc32             232             232          0
>> cubic             2446            2448         2
>> edn               1454            1450         -4
>> huffbench         1642            1642         0
>> matmult-int       420             420          0
>> minver            1056            1056         0
>> nbody             714             714          0
>> nettle-aes        2888            2884         -4
>> nettle-sha256     5566            5564         -2
>> nsichneu          15052           15052        0
>> picojpeg          8078            8078         0
>> qrduino           6140            6140         0
>> sglib-combined    2444            2444         0
>> slre              2438            2420         -18
>> st                880             880          0
>> statemate         3842            3842         0
>> ud                702             702          0
>> wikisort          4278            4280         2
>> -------------------------------------------------
>> Total             61324           61300        -24
>>
>> The patch has been tested on the following bare metal targets using QEMU
>> and there were no regressions:
>>
>> rv32i
>> rv32iac
>> rv32im
>> rv32imac
>> rv32imafc
>> rv64imac
>> rv64imafdc
>>
>> We noticed that sched2 undoes some of the addresses generated by this
>> optimization and consequently increases code size, therefore this patch
>> adds a check in sched-deps.c to avoid changes that are expected to
>> increase code size when not optimizing for speed. Since this change
>> touches target-independent code, the patch has been bootstrapped and
>> tested on x86 with no regressions.
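To make the transformation described above concrete, here is a minimal C sketch of the kind of code the pass targets. The struct layout, function name and register assignments below are illustrative assumptions on my part, not output taken from the pass itself:

  /* Four loads off a single base register, at offsets too large for the
     compressed c.lw encoding (which only reaches offsets 0..124, in
     multiples of 4).  */
  struct big
  {
    int pad[100];
    int a, b, c, d;   /* Offsets 400, 404, 408 and 412.  */
  };

  int
  sum4 (struct big *p)
  {
    /* Without the pass, each field access needs a 32-bit instruction,
       e.g. 'lw a1, 400(a0)'.  The pass instead anchors a new base,
       roughly:

         addi  a5, a0, 400
         c.lw  a1, 0(a5)
         c.lw  a2, 4(a5)
         ...

       shrinking each load to 16 bits, provided the register allocator
       places the new base in one of the eight compressed registers
       (x8-x15).  */
    return p->a + p->b + p->c + p->d;
  }

Since the function has four loads off one base, it meets the pass's threshold; whether the c.lw encodings are actually emitted still depends on register allocation, which is why the pass runs before reload.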
>>
>> diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c
>> index 52db3cc..92a0893 100644
>> --- a/gcc/sched-deps.c
>> +++ b/gcc/sched-deps.c
>> @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "sched-int.h"
>>  #include "params.h"
>>  #include "cselib.h"
>> +#include "predict.h"
>>
>>  #ifdef INSN_SCHEDULING
>>
>> @@ -4707,6 +4708,15 @@ attempt_change (struct mem_inc_info *mii, rtx new_addr)
>>    rtx mem = *mii->mem_loc;
>>    rtx new_mem;
>>
>> +  /* When not optimizing for speed, avoid changes that are expected to make
>> +     code size larger.  */
>> +  addr_space_t as = MEM_ADDR_SPACE (mem);
>> +  bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (mii->mem_insn));
>> +  int old_cost = address_cost (XEXP (mem, 0), GET_MODE (mem), as, speed);
>> +  int new_cost = address_cost (new_addr, GET_MODE (mem), as, speed);
>> +  if (new_cost > old_cost && !speed)

> I think !speed should not needed here, it mean address_cost is
> incorrect if generated worse code, but this change will effect all
> other targets, so I think it would be better to split into another
> patch and CC related reviewer.

I have removed !speed in the updated patch and CC'd Jeff Law.

Jeff - please could you review my change to sched-deps.c in patch 2/2?

Thanks,
Craig

--

Craig Blackmore (2):
  RISC-V: Add shorten_memrefs pass
  sched-deps.c: Avoid replacing address if it increases address cost

 gcc/config.gcc                           |   2 +-
 gcc/config/riscv/riscv-passes.def        |  20 +++
 gcc/config/riscv/riscv-protos.h          |   2 +
 gcc/config/riscv/riscv-shorten-memrefs.c | 188 +++++++++++++++++++++++
 gcc/config/riscv/riscv.c                 |  86 ++++++++++-
 gcc/config/riscv/riscv.h                 |   5 +
 gcc/config/riscv/riscv.opt               |   6 +
 gcc/config/riscv/t-riscv                 |   5 +
 gcc/doc/invoke.texi                      |  10 ++
 gcc/sched-deps.c                         |   9 ++
 10 files changed, 327 insertions(+), 6 deletions(-)
 create mode 100644 gcc/config/riscv/riscv-passes.def
 create mode 100644 gcc/config/riscv/riscv-shorten-memrefs.c

--
2.17.1