https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70236
Bug ID: 70236 Summary: Register allocation and loop unrolling lead to waste of registers Product: gcc Version: 6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: vogt at linux dot vnet.ibm.com CC: vmakarov at gcc dot gnu.org Target Milestone: --- Created attachment 37966 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37966&action=edit ira dump A new s390x pattern for shift-and-xor does not yield a satisfying result with this code when compiled with "-O3 -funroll-loops": -- snip -- unsigned long hash(unsigned long l) { unsigned long v = 0; unsigned long i; for (i = 0; i < 8; i++) { v <<= 1; v ^= l; } return v; } -- snip -- => lgr %r1,%r2 lgr %r3,%r2 rxsbg %r1,%r2,0,62,1 # (shift r2 by one bit left and xor with r1) rxsbg %r3,%r1,0,62,1 lgr %r1,%r2 rxsbg %r1,%r3,0,62,1 lgr %r4,%r1 <----- unnecessary lgr %r1,%r2 rxsbg %r1,%r4,0,62,1 lgr %r5,%r1 <----- unnecessary lgr %r1,%r2 rxsbg %r1,%r5,0,62,1 lgr %r0,%r1 <----- unnecessary lgr %r1,%r2 rxsbg %r1,%r0,0,62,1 rxsbg %r2,%r1,0,62,1 br %r14 ("%r1,%r2,0,62,1" means "r1 := r1 ^ (r2 << 1)"; the ",0,62,1" part of the instruction effectively means "shift left by one".) (gets worse with more loop passes). The code got unrolled in tree: v_16 = l_4(D) << 1; v_17 = l_4(D) ^ v_16; v_21 = v_17 << 1; v_22 = l_4(D) ^ v_21; v_26 = v_22 << 1; v_27 = l_4(D) ^ v_26; v_31 = v_27 << 1; v_32 = l_4(D) ^ v_31; v_36 = v_32 << 1; v_37 = l_4(D) ^ v_36; v_41 = v_37 << 1; v_42 = l_4(D) ^ v_41; v_3 = v_42 << 1; v_5 = v_3 ^ l_4(D); return v_5; Register allocation insists on having the value of "l" in r1. As the result of the previous pass through the loop is in r1, it's necessary to move that value out of the way first. Later on, regrename fails to clean up this situation, probaby because the problem is too complex with many sequential overlapping register use chains. This: lgr %r1,%r2 rxsbg %r1,%r3,0,62,1 lgr %r4,%r1 lgr %r1,%r2 rxsbg %r1,%r4,0,62,1 lgr %r5,%r1 lgr %r1,%r2 rxsbg %r1,%r5,0,62,1 lgr %r0,%r1 lgr %r1,%r2 rxsbg %r1,%r0,0,62,1 could be rewritten to lgr %r1,%r2 rxsbg %r1,%r3,0,62,1 lgr %r3,%r2 rxsbg %r3,%r1,0,62,1 lgr %r1,%r2 rxsbg %r1,%r3,0,62,1 lgr %r3,%r2 rxsbg %r3,%r1,0,62,1 just using three registers. The question is whether this situation can be improved, either in the register allocator, or regrename, or in the pattern.