On 08/30/2017 12:34 AM, Michael Clark wrote: > >> On 30 Aug 2017, at 12:36 PM, Michael Clark <michaeljcl...@mac.com> wrote: >> >> Dear GCC folk, >> >> >> # Issue Background >> >> We’re investigating an issue with redundant sign-extension instructions >> being emitted with the riscv backend. Firstly I would like to state that >> riscv is possibly a unique backend with respect to its canonical >> sign-extended register form due to the following: >> >> - POINTERS_EXTEND_UNSIGNED is false, and the canonical form after 32-bit >> operations on RV64 is sign-extended not zero-extended. >> >> - PROMOTE_MODE is defined on RV64 such that SI mode values are promoted to >> signed DI mode values holding SI mode subregs >> >> - RISC-V does not have register aliases for these different modes, rather >> different instructions are selected to operate on different register widths. >> >> - The 32-bit instructions sign-extend results. e.g. all shifts, add, sub, >> etc. >> >> - Some instructions such as logical ops only have full word width variants >> (AND, OR, XOR) but these instructions maintain canonical form as there is no >> horizontal movement of bits. >> >> >> # Issue Details >> >> During porting from GCC 6.1 to GCC 7.1, the RISC-V port was changed to >> define TRULY_NOOP_TRUNCATION to 1 and PROMOTE_MODE was set up to promote >> values to wider modes. This was done to ensure ABI correctness so that >> registers holding narrower modes always contain canonical sign extended >> values. >> >> The result of this is that sign_extend operations are inserted but later >> eliminated in ree/simplyfy_rtx. In testcase 3 the sign_extend is correctly >> eliminated, and indeed most 32-bit instructions are handled correctly. This >> is what the pattern looks like for a typical 32-bit instruction after expand: >> >> ;; >> ;; Full RTL generated for this function: >> ;; >> (note 1 0 4 NOTE_INSN_DELETED) >> (note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK) >> (insn 2 4 3 2 (set (reg/v:DI 73 [ rs1 ]) >> (reg:DI 10 a0 [ rs1 ])) "lshift.c":1 -1 >> (nil)) >> (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG) >> (insn 6 3 7 2 (set (reg:SI 74) >> (ashift:SI (subreg/s/u:SI (reg/v:DI 73 [ rs1 ]) 0) >> (const_int 24 [0x18]))) "lshift.c":1 -1 >> (nil)) >> (insn 7 6 8 2 (set (reg:DI 75) >> (sign_extend:DI (reg:SI 74))) "lshift.c":1 -1 >> (nil)) >> (insn 8 7 12 2 (set (reg:DI 72 [ <retval> ]) >> (reg:DI 75)) "lshift.c":1 -1 >> (nil)) >> (insn 12 8 13 2 (set (reg/i:DI 10 a0) >> (reg:DI 72 [ <retval> ])) "lshift.c":1 -1 >> (nil)) >> (insn 13 12 0 2 (use (reg/i:DI 10 a0)) "lshift.c":1 -1 >> (nil)) >> >> >> # Problem test cases >> >> - testcase 1 - open coded bswap - redundant sign extension instructions >> - testcase 2 - open coded right shift - redundant sign extension instructions >> - testcase 3 - open coded left shift - control / correct behaviour >> >> It appears that the downside to the PROMOTE_MODE transition is that several >> redundant sign-extension operations are cropping up under specific >> circumstances. One of the first small isolators is testcase 1 (below), which >> is an open-coded bswap. You’ll notice the SEXT.W emitted before the return. >> This was then further isolated to an even simpler testcase 2 which is a >> single right shift which also emits SEXT.W before the return. A 32-bit left >> shift or other 32-bit operations correctly have the redundant sign_extend >> eliminated. >> >> We have isolated the code that prevented the right shift from having its >> redundant sign extension eliminated and it is target independent code in >> simplify-rtx. Code in simplify_unary_operation_1 assumes zero extension is >> more efficient, likely an assumption based on the fact that most platforms >> define POINTERS_EXTEND_UNSIGNED to true and naturally promote words to zero >> extended form vs sign extended form. The redundant sign extension appears to >> be due to an optimisation in simply_rtx that generates zero extend for right >> shifts where the value is non zero, based on the assumption that zero extend >> is” better”. This is not true on platforms that define >> POINTERS_EXTEND_UNSIGNED to false i.e. naturally promote subregister width >> operations to canonical sign-extended form internally. >> >> >> # Partial resolution of test case 2 >> >> We have prepared a patch that disables the zero extend optimisation on >> platforms that define POINTERS_EXTEND_UNSIGNED. It fixes the issue in >> testcase 2. We believe this fix is very low risk because right shift for >> values above zero will always have a zero sign bit, thus either sign or zero >> extension can be used (this is in the comment), however sign-extension is >> “better” on RISC-V and likely all other platforms with >> POINTERS_EXTEND_UNSIGNED false. Platforms like x86, Aarch64 and most others >> define POINTERS_EXTEND_UNSIGNED to 1 so would not be affected by this change. >> >> Please review this candidate patch to fix the codegen issue for testcase 2. >> >> >> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c >> index ce632ae..25dd70f 100644 >> --- a/gcc/simplify-rtx.c >> +++ b/gcc/simplify-rtx.c >> @@ -1503,6 +1503,10 @@ simplify_unary_operation_1 (enum rtx_code code, >> machine_mode mode, rtx op) >> /* (sign_extend:M (lshiftrt:N <X> (const_int I))) is better as >> (zero_extend:M (lshiftrt:N <X> (const_int I))) if I is not 0. */ >> if (GET_CODE (op) == LSHIFTRT >> +#if defined(POINTERS_EXTEND_UNSIGNED) >> + /* we skip this optimisation if pointers naturally extend signed */ >> + && POINTERS_EXTEND_UNSIGNED >> +#endif >> && CONST_INT_P (XEXP (op, 1)) >> && XEXP (op, 1) != const0_rtx) >> return simplify_gen_unary (ZERO_EXTEND, mode, op, GET_MODE (op)); >> >> >> However, after applying this patch, we noticed that while it resolved the >> redundant sign extension after the right shift in testcase 2, it has not >> resolved the redundant sign extension emitted in testcase 1. >> >> >> # Questions >> >> 1. Is the first patch an appropriate fix for testcase 2? >> >> This particular patch seems harmless as zero or sign extending a right shift >> of a non-zero value will have the same result, however in RISC-V’s case, >> sign-extension can be eliminated at no cost and the zero_extend results in >> the redundant SEXT.W due to an inserted sign_extend for canonicalisation of >> the register value. It seemed to me that we can’t easily access promote_mode >> in simplify-rtx so we used POINTERS_EXTEND_UNSIGNED to indicate the >> canonical representation for wider types is sign-extended. I believe that is >> the intention of that flag as it is used this way in the sign and zero >> extension logic in emit-rtl.c >> >> 2. Where should we fix the original bswap testcase 1? >> >> The redundant sign extension is not solved by the patch above and it seems >> we have found 2 distinct issues. We are now wondering if logic to handle >> redundant sign extension elimination for the word width logical operations >> (AND/OR/XOR) is missing in simplify_rtx or REE (Redundant Extension >> Elimination). 64-bit logical operations on DI mode registers with SI mode >> subregs should also have their sign_extend operations eliminated as there is >> no horizontal bit movement introduced by these operations. Upon looking at >> the RTL for testcase 1, we can see that there is a very similar pattern to >> the right shift case. We are seeking guidance on where to find the logic (or >> lack thereof) to eliminate sign_extend on DI mode registers containing SI >> mode subregs for full width logical ops. Should this logic be in >> simplify-rtx.c or ree.c? > > I believe these patterns can have their sign extension eliminated on riscv > but REE is not touching them for some reason: > > (sign_extend:DI (and:DI (subreg:DI (reg:SI)) (subreg:DI (reg:SI))) > (sign_extend:DI (or:DI (subreg:DI (reg:SI)) (subreg:DI (reg:SI))) > (sign_extend:DI (xor:DI (subreg:DI (reg:SI)) (subreg:DI (reg:SI)))
Couldn't can't we factor out the subregs so that you have (sign_extend:DI (subreg:DI (and:SI (reg1:SI) (reg2:SI)))) Which something should be able to trivially simplify into: (sign_extend:DI (and:SI (reg1:SI) (reg2:SI)) That's a generic simplification and ought to be done in simplify-rtx or somewhere similar. In general, eliminating the subregs like that should be a good thing. Jeff