Hi, I’ve spun an alternative version of RISC-V GCC which preserves RV64 ABI sign extended canonicalisation for passed values but keeps values in SImode until return.
https://github.com/michaeljclark/riscv-gcc/commits/riscv-gcc-7-mc These are the changes: - #undef PROMOTE_MODE - #define TARGET_PROMOTE_FUNCTION_MODE - Implement riscv_function_promote to promote arguments and return values - Add SImode logical op patterns to riscv.md The RTL for the sign extension regression test cases now contain all SImode ops, however REE is still not able to remove sign_extend after and/ior/xor/not with SImode inputs that are in canonical form. I’m still seeing shifts not being coalesced as this likely happens in combine, and the redundant sign extensions are not eliminated until much later in REE with this particular configuration. Observations. - Aarch64 leaves upper bits undefined instead of canonicalising them and PROMOTE_MODE promotes QImode and HImode to SImode. - RISC-V defines explicit sign_extend + op pattern matches in its metadata to collapse redundant sign extends - Explicit _extend metadata patterns appear to be required for REE to eliminate redundant sign_extend. - I removed these patterns and then all functions returning int had: sext.w a0, a0; ret - Explicit _extend metadata patterns span only two ops - e.g. (sign_extend:DI (plus:SI)), (sign_extend:DI (minus:SI)), (sign_extend:DI (any_shift:SI)), etc - REE may not be able to cross enough transitive dependencies to handle sign_extend and/ior/xor/not ops with canonical inputs - e.g. (sign_extend:DI (logical_op:SI (plus_minus_shift:SI) (plus_minus_shift:SI))) - e.g. (sign_extend:DI (logical_op:SI (logical_op:SI (logical_op:SI (plus_minus_shift:SI) (plus_minus_shift:SI))))) - metadata pattern matching approach is untenable where there is a series of logical ops on canonical inputs. e.g. the bswap test case - Shift coalescing is done before REE in combine so this may be why we are seeing missed optimisations. - there may be more optimisations that are missed due to the sign_extend promotions changing patterns - Keeping values in SImode until return doesn’t appear to help much however it may change how the problem is solved. Actions - Backport Prathamesh’s type promotion patch to riscv-gcc. It may suit the eager definition of PROMOTE_MODE - Benchmark Alternative ABI that leaves upper bits undefined, out of curiosities sake… I’ve been reasoning about the alternate ABI. I can see it has the cost of widening narrower return values when they are promoted to wider types, but the general case of adding integers to pointers would not be affected when it is inline. Deferring canonicalisation when there are several routines that return ints and work on ints as inputs would seem to indicate generation of less operations. I’d really like to see SPECint run on the alternate ABI. However the current ABI could be improved if the compiler is better able to reason about cases where the sign_extend operations don’t need to be emitted, so it would not be a fair test against the current version of the port. Michael. > On 30 Aug 2017, at 12:36 PM, Michael Clark <michaeljcl...@mac.com> wrote: > > Dear GCC folk, > > > # Issue Background > > We’re investigating an issue with redundant sign-extension instructions being > emitted with the riscv backend. Firstly I would like to state that riscv is > possibly a unique backend with respect to its canonical sign-extended > register form due to the following: > > - POINTERS_EXTEND_UNSIGNED is false, and the canonical form after 32-bit > operations on RV64 is sign-extended not zero-extended. > > - PROMOTE_MODE is defined on RV64 such that SI mode values are promoted to > signed DI mode values holding SI mode subregs > > - RISC-V does not have register aliases for these different modes, rather > different instructions are selected to operate on different register widths. > > - The 32-bit instructions sign-extend results. e.g. all shifts, add, sub, etc. > > - Some instructions such as logical ops only have full word width variants > (AND, OR, XOR) but these instructions maintain canonical form as there is no > horizontal movement of bits. > > > # Issue Details > > During porting from GCC 6.1 to GCC 7.1, the RISC-V port was changed to define > TRULY_NOOP_TRUNCATION to 1 and PROMOTE_MODE was set up to promote values to > wider modes. This was done to ensure ABI correctness so that registers > holding narrower modes always contain canonical sign extended values. > > The result of this is that sign_extend operations are inserted but later > eliminated in ree/simplyfy_rtx. In testcase 3 the sign_extend is correctly > eliminated, and indeed most 32-bit instructions are handled correctly. This > is what the pattern looks like for a typical 32-bit instruction after expand: > > ;; > ;; Full RTL generated for this function: > ;; > (note 1 0 4 NOTE_INSN_DELETED) > (note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK) > (insn 2 4 3 2 (set (reg/v:DI 73 [ rs1 ]) > (reg:DI 10 a0 [ rs1 ])) "lshift.c":1 -1 > (nil)) > (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG) > (insn 6 3 7 2 (set (reg:SI 74) > (ashift:SI (subreg/s/u:SI (reg/v:DI 73 [ rs1 ]) 0) > (const_int 24 [0x18]))) "lshift.c":1 -1 > (nil)) > (insn 7 6 8 2 (set (reg:DI 75) > (sign_extend:DI (reg:SI 74))) "lshift.c":1 -1 > (nil)) > (insn 8 7 12 2 (set (reg:DI 72 [ <retval> ]) > (reg:DI 75)) "lshift.c":1 -1 > (nil)) > (insn 12 8 13 2 (set (reg/i:DI 10 a0) > (reg:DI 72 [ <retval> ])) "lshift.c":1 -1 > (nil)) > (insn 13 12 0 2 (use (reg/i:DI 10 a0)) "lshift.c":1 -1 > (nil)) > > > # Problem test cases > > - testcase 1 - open coded bswap - redundant sign extension instructions > - testcase 2 - open coded right shift - redundant sign extension instructions > - testcase 3 - open coded left shift - control / correct behaviour > > It appears that the downside to the PROMOTE_MODE transition is that several > redundant sign-extension operations are cropping up under specific > circumstances. One of the first small isolators is testcase 1 (below), which > is an open-coded bswap. You’ll notice the SEXT.W emitted before the return. > This was then further isolated to an even simpler testcase 2 which is a > single right shift which also emits SEXT.W before the return. A 32-bit left > shift or other 32-bit operations correctly have the redundant sign_extend > eliminated. > > We have isolated the code that prevented the right shift from having its > redundant sign extension eliminated and it is target independent code in > simplify-rtx. Code in simplify_unary_operation_1 assumes zero extension is > more efficient, likely an assumption based on the fact that most platforms > define POINTERS_EXTEND_UNSIGNED to true and naturally promote words to zero > extended form vs sign extended form. The redundant sign extension appears to > be due to an optimisation in simply_rtx that generates zero extend for right > shifts where the value is non zero, based on the assumption that zero extend > is” better”. This is not true on platforms that define > POINTERS_EXTEND_UNSIGNED to false i.e. naturally promote subregister width > operations to canonical sign-extended form internally. > > > # Partial resolution of test case 2 > > We have prepared a patch that disables the zero extend optimisation on > platforms that define POINTERS_EXTEND_UNSIGNED. It fixes the issue in > testcase 2. We believe this fix is very low risk because right shift for > values above zero will always have a zero sign bit, thus either sign or zero > extension can be used (this is in the comment), however sign-extension is > “better” on RISC-V and likely all other platforms with > POINTERS_EXTEND_UNSIGNED false. Platforms like x86, Aarch64 and most others > define POINTERS_EXTEND_UNSIGNED to 1 so would not be affected by this change. > > Please review this candidate patch to fix the codegen issue for testcase 2. > > > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c > index ce632ae..25dd70f 100644 > --- a/gcc/simplify-rtx.c > +++ b/gcc/simplify-rtx.c > @@ -1503,6 +1503,10 @@ simplify_unary_operation_1 (enum rtx_code code, > machine_mode mode, rtx op) > /* (sign_extend:M (lshiftrt:N <X> (const_int I))) is better as > (zero_extend:M (lshiftrt:N <X> (const_int I))) if I is not 0. */ > if (GET_CODE (op) == LSHIFTRT > +#if defined(POINTERS_EXTEND_UNSIGNED) > + /* we skip this optimisation if pointers naturally extend signed */ > + && POINTERS_EXTEND_UNSIGNED > +#endif > && CONST_INT_P (XEXP (op, 1)) > && XEXP (op, 1) != const0_rtx) > return simplify_gen_unary (ZERO_EXTEND, mode, op, GET_MODE (op)); > > > However, after applying this patch, we noticed that while it resolved the > redundant sign extension after the right shift in testcase 2, it has not > resolved the redundant sign extension emitted in testcase 1. > > > # Questions > > 1. Is the first patch an appropriate fix for testcase 2? > > This particular patch seems harmless as zero or sign extending a right shift > of a non-zero value will have the same result, however in RISC-V’s case, > sign-extension can be eliminated at no cost and the zero_extend results in > the redundant SEXT.W due to an inserted sign_extend for canonicalisation of > the register value. It seemed to me that we can’t easily access promote_mode > in simplify-rtx so we used POINTERS_EXTEND_UNSIGNED to indicate the canonical > representation for wider types is sign-extended. I believe that is the > intention of that flag as it is used this way in the sign and zero extension > logic in emit-rtl.c > > 2. Where should we fix the original bswap testcase 1? > > The redundant sign extension is not solved by the patch above and it seems we > have found 2 distinct issues. We are now wondering if logic to handle > redundant sign extension elimination for the word width logical operations > (AND/OR/XOR) is missing in simplify_rtx or REE (Redundant Extension > Elimination). 64-bit logical operations on DI mode registers with SI mode > subregs should also have their sign_extend operations eliminated as there is > no horizontal bit movement introduced by these operations. Upon looking at > the RTL for testcase 1, we can see that there is a very similar pattern to > the right shift case. We are seeking guidance on where to find the logic (or > lack thereof) to eliminate sign_extend on DI mode registers containing SI > mode subregs for full width logical ops. Should this logic be in > simplify-rtx.c or ree.c? > > Regards, > Michael. > > P.S. if the patch is acceptable, I can post to gcc-patches and/or raise bugs > in GCC bugzilla… I guess we’re initially seeking guidance on the issue… > > > testcase 1: > > $ cat bswap.c > unsigned bswap(unsigned p) { > return ((p >> 24) & 0x000000ff) | ((p << 8 ) & 0x00ff0000) | > ((p >> 8 ) & 0x0000ff00) | ((p << 24) & 0xff000000); > } > $ cat bswap.s > bswap2: > sllw a3,a0,24 > srlw a5,a0,24 > sllw a4,a0,8 > or a5,a5,a3 > li a3,16711680 > and a4,a4,a3 > or a5,a5,a4 > li a4,65536 > add a4,a4,-256 > srlw a0,a0,8 > and a0,a0,a4 > or a0,a5,a0 > sext.w a0,a0 # redundant > ret > > > testcase 2: > > $ cat rshift.c > unsigned rs24(unsigned rs1) { return rs1 >> 24; } > $ cat rshift.s > rs24: > sllw a0,a0,24 > sext.w a0,a0 # redundant > ret > > > testcase 3: > > $ cat lshift.c > unsigned ls24(unsigned rs1) { return rs1 << 24; } > $ cat lshift.s > ls24: > sllw a0,a0,24 > ret