In some cases, the work of the cse1 pass is counterproductive, as
we noticed on s390x.  The effect described below is present since
at least 4.8.0.  Note that this may not become manifest in a
performance issue problem on all platforms.  Also note that -O1
does not show this behaviour because the responsible code is only
executed with -O2 or higher.

The core of the problem is the was cse1 sometimes handles function
parameters.  Roughly, the observed situation is

Before cse1

  start of function
  set pseudoreg Rp to the first argument from hardreg R2
  (some code that uses Rp)
  set R2 to Rp

After cse1:

  start of function
  set pseudoreg Rp to the first argument from hardreg R2
  (some code that uses Rp)  <--- The use of Rp is still present
  set R2 to R2              <--- cse1 has replaced Rp with R2

After that, the set pattern is removed completely, and now we have
both, Rp and R2 live in the drafted code snippet.  Because R2 ist
still supposed to be live later on, the ira pass chooses a
different hard register (R1) for Rp, and code to copy R1 back to
R2 is added later.  (See further down for Rtl and assembly code.)

--

There seems to be code to prevent this in cse.c:hash_rtx_cb() as a
comment from that function suggests:

    /* On some machines, we can't record any non-fixed hard register,
       because extending its life will cause reload problems.  We
       consider ap, fp, sp, gp to be fixed for this purpose.
    ...

Unfortunately this is not caused by hashing but by the code
dealing with src_related in cse_insn().  When cse_insn() handles
the "copy Rp to R2" instruction, it does nothing up to line 5020
and sets src_related there:

  /* This is the same as the destination of the insns, we want
     to prefer it.  Copy it to src_related.  The code below will
     then give it a negative cost.  */
  if (GET_CODE (dest) == code && rtx_equal_p (p->exp, dest))
    src_related = dest;

Eventually, the term src_related is used to replace the source
expression of the set pattern.  So, while the above comment may be
applicable to hashed expressions that are considered for
replacement, there's no such "safety net" for the expressions
src_related, src_folded etc.  I guess if there was, that would fix
the issue.

--

So, I've made an experimental hack (see attachment) and treid
that.  In a larger test suite, register copies could be saved in
quite some places (including the test program below), but in other
places new register copies were introduced, resulting in about
twice as much "issues" as without the patch.

Maybe the patch is just too coarse.  In general I'd assume that
the register allocator does a better job of assigning hard
registers to pseudo registers.  Is it possible to better describe
when cse1 should keep its hands off pseudo registers?

--

Example (complete filles in the attached .tgz file):

--- foo.c ---
extern long bar(long y);
long foo(long x, long z)
{
  if (x != 0)
    x = bar(x);
  return x + z;
}
--- foo.c ---

Assembly output with three superfluous instructions:

--- foo.s ---
foo:
        stmg    %r12,%r15,96(%r15)
        aghi    %r15,-160
        ltgr    %r1,%r2       <--- unnecessary copy of r2 to r1
        lgr     %r12,%r3
        je      .L2
        brasl   %r14,bar      <--- r2 th as argument, return value in r2
        lgr     %r1,%r2       <--- unnecessary copy of r2 to r1
.L2:
        agr     %r1,%r12      <--- could use r2 instead of r1
        lg      %r4,272(%r15)
        lgr     %r2,%r1       <--- unnecessary copy of r1 to r2
        lmg     %r12,%r15,256(%r15)
        br      %r4
--- foo.s ---

Rtl before cse1 pass:

--- foo.c.195r.dfinit ---
(insn 2 5 3 2 (set (reg/v:DI 61 [ x ])
        (reg:DI 2 %r2 [ x ])) /home/vogt/foo.c:3 897 {*movdi_64}
     (nil))
...
(insn 7 4 8 2 (set (reg:CCZ 33 %cc)
        (compare:CCZ (reg/v:DI 61 [ x ])
            (const_int 0 [0]))) /home/vogt/foo.c:4 846 {*tstdi_cconly2}
     (nil))
(jump_insn 8 7 9 2 (set (pc)
        (if_then_else (eq (reg:CCZ 33 %cc)
                (const_int 0 [0]))
            (label_ref 13)
            (pc))) /home/vogt/foo.c:4 1440 {*cjump_64}
     (int_list:REG_BR_PROB 6102 (nil))
 -> 13)
...
(insn 10 9 11 3 (set (reg:DI 2 %r2)
        (reg/v:DI 61 [ x ])) /home/vogt/foo.c:5 897 {*movdi_64}
     (nil))
(call_insn 11 10 12 3 (parallel [
            (set (reg:DI 2 %r2)
                (call (mem:QI (symbol_ref:DI ("bar") [flags 0x41]  
<function_decl 0x3fff08a7c00 bar>) [0 bar S1 A8])
                    (const_int 0 [0])))
            (clobber (reg:DI 14 %r14))
        ]) /home/vogt/foo.c:5 1480 {*brasl_r}
     (expr_list:REG_CALL_DECL (symbol_ref:DI ("bar") [flags 0x41]  
<function_decl 0x3fff08a7c00 bar>)
        (nil))
    (expr_list:DI (use (reg:DI 2 %r2))
        (nil)))
(insn 12 11 13 3 (set (reg/v:DI 61 [ x ])
        (reg:DI 2 %r2)) /home/vogt/foo.c:5 897 {*movdi_64}
     (nil))
(code_label 13 12 14 4 2 "" [1 uses])
--- foo.c.195r.dfinit ---

Cse1 replaces r61 with r2 in just one insn:

--- foo.c.196r.cse1 ---
(insn 10 9 11 3 (set (reg:DI 2 %r2)
        (reg:DI 2 %r2)) /home/vogt/foo.c:5 897 {*movdi_64}
     (expr_list:REG_DEAD (reg/v:DI 61 [ x ])
        (nil)))
--- foo.c.196r.cse1 ---

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

Attachment: foo.tgz
Description: application/gtar-compressed

>From a7fc5b13c7f0eb59e0c0d1389d031acfae3f7c6b Mon Sep 17 00:00:00 2001
From: Dominik Vogt <v...@linux.vnet.ibm.com>
Date: Mon, 12 Oct 2015 13:05:46 +0100
Subject: [PATCH] !!!attempt fix

---
 gcc/cse.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/gcc/cse.c b/gcc/cse.c
index a9cc26a..74daf12 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -4600,10 +4600,16 @@ cse_insn (rtx_insn *insn)
       /* Set nonzero if we need to call force_const_mem on with the
 	 contents of src_folded before using it.  */
       int src_folded_force_flag = 0;
+      bool set_hard_reg_from_pseudo;
 
       dest = SET_DEST (sets[i].rtl);
       src = SET_SRC (sets[i].rtl);
 
+      set_hard_reg_from_pseudo = (src && REG_P (src)
+				  && REGNO (src) >= FIRST_PSEUDO_REGISTER
+				  && dest && REG_P (dest)
+				  && REGNO (dest) < FIRST_PSEUDO_REGISTER);
+
       /* If SRC is a constant that has no machine mode,
 	 hash it with the destination's machine mode.
 	 This way we can keep different modes separate.  */
@@ -5168,6 +5174,13 @@ cse_insn (rtx_insn *insn)
 	      src_elt_cost = MAX_COST;
 	    }
 
+	  if (set_hard_reg_from_pseudo
+	      && trial && REG_P (trial) && REGNO (trial) == REGNO (dest))
+	    /* The register allocator is expected to do a better job of
+	       replacing pseudo registers with hard registers a la
+	       (set hardreg<n> pseudoreg<m>) -> (set hardreg<n> hardreg<n>)  */
+	    continue;
+
 	  /* Avoid creation of overlapping memory moves.  */
 	  if (MEM_P (trial) && MEM_P (SET_DEST (sets[i].rtl)))
 	    {
-- 
2.3.0

Reply via email to