https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79057
Bug ID: 79057
Summary: Lra reloads to used register
Product: gcc
Version: 7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: vogt at linux dot vnet.ibm.com
CC: krebbel at gcc dot gnu.org
Target Milestone: ---
Host: s390x
Target: s390x

Created attachment 40500
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40500&action=edit
ira output

With a new experimental pattern for the z10 risbg instruction a regression
appeared that seems to point towards a situation where Lra does not work too
well, but it may also be a backend thing.

This code:

--
unsigned long foo(unsigned char *s, unsigned int flags)
{
  unsigned long uv = *s;
  long unsigned int len;

  len = 6;
  while (len--) {
    uv = (uv << 6 | ((unsigned char)*s & 0x3f));
    s++;
  }

  return uv;
}
--

used to result in alternating

  (set (reg:DI) (zero_extend:DI (mem:QI ...)))
  (set (reg:DI) (some logical operation using two regs))
  (set (reg:DI) (zero_extend:DI (mem:QI ...)))
  (set (reg:DI) (some logical operation using two regs))
  ...

Now, there's the experimental pattern that allows combining these into a
single insn:

(define_insn "*<risbg_n>_disi_<shift>"
  [(set (match_operand:DI 0 "register_operand" "=d,d")
        (ior:DI (and:DI (subreg:DI
                          (match_operand:SINT 1 "nonimmediate_operand" "0,0") 0)
                        (match_operand:DI 2 "const_int_operand" ""))
                (SHIFT:DI (match_operand:DI 3 "nonimmediate_operand" "d,0")
                          (match_operand:DI 4 "rXsbg_rotate_count_operand" ""))
        ))]
  ...

->

  (set (reg:DI) (some logical operation using one reg and one memory operand))
  (set (reg:DI) (some logical operation using one reg and one memory operand))
  ...

after "ira", which is exactly what the pattern is supposed to do.  So far,
everything is fine.

The resulting risbg instruction takes one register, masks and rotates some of
its bits, and ors them into the output register.  With the above C code, if
the result of the last unrolled loop pass is in %r3, one could load the next
byte into, say, %r1 and merge the bits from %r3 into %r1, so that the new
intermediary result is in %r1.  However, reload forces the intermediary result
to always use the same register, adding a lot of register moves to free the
still-used register:

llgc    %r1,0(%r2)       # 7   *zero_extendqidi2_extimm/2   [length = 6]
llgc    %r3,1(%r2)       # 60  *zero_extendqidi2_extimm/2   [length = 6]
risbgn  %r1,%r1,0,57,6   # 14  *risbgn_ashldi/1             [length = 6]
risbgn  %r3,%r1,0,57,6   # 21  *risbgn_disi_ashl/1          [length = 6]
lgr     %r1,%r3          # 62  *movdi_64/13                 [length = 4]
llgc    %r3,2(%r2)       # 64  *zero_extendqidi2_extimm/2   [length = 6]
risbgn  %r3,%r1,0,57,6   # 28  *risbgn_disi_ashl/1          [length = 6]
lgr     %r1,%r3          # 66  *movdi_64/13                 [length = 4]
llgc    %r3,3(%r2)       # 68  *zero_extendqidi2_extimm/2   [length = 6]
risbgn  %r3,%r1,0,57,6   # 35  *risbgn_disi_ashl/1          [length = 6]
lgr     %r1,%r3          # 70  *movdi_64/13                 [length = 4]
llgc    %r3,4(%r2)       # 72  *zero_extendqidi2_extimm/2   [length = 6]
llgc    %r2,5(%r2)       # 76  *zero_extendqidi2_extimm/2   [length = 6]
risbgn  %r3,%r1,0,57,6   # 42  *risbgn_disi_ashl/1          [length = 6]
lgr     %r1,%r3          # 74  *movdi_64/13                 [length = 4]
lr      %r3,%r2          # 77  *movqi/1                     [length = 2]
risbgn  %r3,%r1,0,57,6   # 54  *risbgn_disi_ashl/1          [length = 6]
lgr     %r2,%r3          # 78  *movdi_64/13                 [length = 4]
br      %r14             # 81  *return                      [length = 2]

The instructions "lgr %r1,%r3" and "lgr %r2,%r3" could be avoided by better
register allocation.

The two-part question is why reload does this and whether there is anything
the backend can do to get this right.
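For illustration only, a hand-written sketch (not compiler output) of the
allocation described above, alternating the intermediary result between %r1
and %r3 so that no lgr/lr moves are needed; the risbgn mask/rotate operands
are copied from the listing above, and the register choices and scheduling
are an assumption about what a better allocation could look like:

  llgc    %r1,0(%r2)        # load s[0], uv = s[0]
  risbgn  %r1,%r1,0,57,6    # first iteration, uv in %r1
  llgc    %r3,1(%r2)        # load s[1]
  risbgn  %r3,%r1,0,57,6    # uv now in %r3
  llgc    %r1,2(%r2)        # load s[2]
  risbgn  %r1,%r3,0,57,6    # uv now in %r1
  llgc    %r3,3(%r2)        # load s[3]
  risbgn  %r3,%r1,0,57,6    # uv now in %r3
  llgc    %r1,4(%r2)        # load s[4]
  risbgn  %r1,%r3,0,57,6    # uv now in %r1
  llgc    %r2,5(%r2)        # load s[5]; the pointer is no longer needed
  risbgn  %r2,%r1,0,57,6    # result lands directly in the return register
  br      %r14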