https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79057

            Bug ID: 79057
           Summary: Lra reloads to used register
           Product: gcc
           Version: 7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vogt at linux dot vnet.ibm.com
                CC: krebbel at gcc dot gnu.org
  Target Milestone: ---
              Host: s390x
            Target: s390x

Created attachment 40500
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40500&action=edit
ira output

With a new experimental pattern for the z10 risbg instruction, a regression
appeared that seems to point to a situation that LRA does not handle well,
although it may also be a backend issue.  This code:

--
unsigned long foo(unsigned char *s, unsigned int flags)
{
  unsigned long uv = *s;
  long unsigned int len;
  len = 6;
  while (len--)
    {
      uv = (uv << 6 | ((unsigned char)*s & 0x3f));
      s++;
    }
  return uv;
}
--

used to result in an alternating sequence of

  (set (reg:DI) (zero_extend:DI (mem:QI ...)))
  (set (reg:DI) (some logical operation using two regs))
  (set (reg:DI) (zero_extend:DI (mem:QI ...)))
  (set (reg:DI) (some logical operation using two regs))
  ...

Now there is the experimental pattern, which allows combining each such pair
into a single insn:

  (define_insn "*<risbg_n>_disi_<shift>"
    [(set (match_operand:DI 0 "register_operand" "=d,d")
        (ior:DI (and:DI
                 (subreg:DI
                  (match_operand:SINT 1 "nonimmediate_operand" "0,0") 0)
                 (match_operand:DI 2 "const_int_operand" ""))
                (SHIFT:DI
                 (match_operand:DI 3 "nonimmediate_operand" "d,0")
                 (match_operand:DI 4 "rXsbg_rotate_count_operand" ""))
                ))] ...

->

  (set (reg:DI) (some logical operation using one reg and one memory operand))
  (set (reg:DI) (some logical operation using one reg and one memory operand))
  ...

after "ira", which is exactly what the pattern is supposed to do.  So far,
everything is fine.  The resulting risbg instruction takes one register, masks
and rotates some bits, and ORs them into the output register.  With the above C
code, if the result of the last unrolled loop pass is in %r3, one could load the
next byte into, say, %r1 and merge the bits from %r3 into %r1, so that the
new intermediate result is in %r1.  However, reload forces the intermediate
result to always use the same register, adding a lot of register moves to free
the still-used register.
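
To make the intended data flow concrete, here is a minimal C sketch of what
"risbgn dst,src,0,57,6" does in this code.  It assumes the usual RISBG
semantics (rotate the second operand left, insert the selected bit range into
the first operand, leave the remaining bits of the first operand unchanged).
The names rotl64, risbgn_0_57_6 and foo_model are made up for illustration and
are not part of GCC or of the test case; the unused "flags" argument of foo is
dropped.

```c
#include <stdint.h>

/* Rotate a 64-bit value left by n bits (valid for 0 < n < 64). */
static uint64_t rotl64(uint64_t v, unsigned n)
{
  return (v << n) | (v >> (64 - n));
}

/* Illustrative model of "risbgn dst,src,0,57,6": rotate src left by 6,
   take bits 0..57 of the rotated value (numbered from the most
   significant bit, i.e. everything except the low 6 bits) and insert
   them into dst.  The low 6 bits of dst are kept.  */
static uint64_t risbgn_0_57_6(uint64_t dst, uint64_t src)
{
  const uint64_t selected = ~UINT64_C(0x3f);
  return (rotl64(src, 6) & selected) | (dst & ~selected);
}

/* The loop from foo() expressed with the model.  Each pass loads the
   next byte into a fresh value and merges the previous result into it,
   which is the register usage described above.  */
static uint64_t foo_model(const unsigned char *s)
{
  uint64_t prev = s[0];                 /* llgc: initial uv */
  prev = risbgn_0_57_6(prev, prev);     /* first pass: byte and uv coincide */
  for (int i = 1; i < 6; i++)
    {
      uint64_t next = s[i];             /* llgc into a free register */
      next = risbgn_0_57_6(next, prev); /* uv << 6 | (byte & 0x3f) */
      prev = next;                      /* a renaming only, not a needed move */
    }
  return prev;
}
```

In this model the destination of each pass can be any free register, since
only its low 6 bits (the freshly loaded byte) are live on entry.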

        llgc    %r1,0(%r2)      # 7     *zero_extendqidi2_extimm/2      [length = 6]
        llgc    %r3,1(%r2)      # 60    *zero_extendqidi2_extimm/2      [length = 6]
        risbgn  %r1,%r1,0,57,6  # 14    *risbgn_ashldi/1        [length = 6]
        risbgn  %r3,%r1,0,57,6  # 21    *risbgn_disi_ashl/1     [length = 6]
        lgr     %r1,%r3 # 62    *movdi_64/13    [length = 4]
        llgc    %r3,2(%r2)      # 64    *zero_extendqidi2_extimm/2      [length = 6]
        risbgn  %r3,%r1,0,57,6  # 28    *risbgn_disi_ashl/1     [length = 6]
        lgr     %r1,%r3 # 66    *movdi_64/13    [length = 4]
        llgc    %r3,3(%r2)      # 68    *zero_extendqidi2_extimm/2      [length = 6]
        risbgn  %r3,%r1,0,57,6  # 35    *risbgn_disi_ashl/1     [length = 6]
        lgr     %r1,%r3 # 70    *movdi_64/13    [length = 4]
        llgc    %r3,4(%r2)      # 72    *zero_extendqidi2_extimm/2      [length = 6]
        llgc    %r2,5(%r2)      # 76    *zero_extendqidi2_extimm/2      [length = 6]
        risbgn  %r3,%r1,0,57,6  # 42    *risbgn_disi_ashl/1     [length = 6]
        lgr     %r1,%r3 # 74    *movdi_64/13    [length = 4]
        lr      %r3,%r2 # 77    *movqi/1        [length = 2]
        risbgn  %r3,%r1,0,57,6  # 54    *risbgn_disi_ashl/1     [length = 6]
        lgr     %r2,%r3 # 78    *movdi_64/13    [length = 4]
        br      %r14    # 81    *return [length = 2]

The instructions "lgr %r1,%r3" and "lgr %r2,%r3" could be avoided by better
register allocation.

The two-part question is why reload does this, and whether there is anything
the backend can do to get this right.
