Re: backward threading heuristics tweek

Jan Hubicka Thu, 11 Aug 2016 05:36:18 -0700

> This also caused:
> 
> FAIL: gcc.dg/tree-ssa/pr69270-3.c scan-tree-dump-times uncprop1 ", 1" 4
> 
>   Failures:
>       gcc.dg/tree-ssa/pr69270-3.c
>       
>   Bisected to: 
> 
>   Author: hubicka
>   Date:   Sun Aug 7 10:50:16 2016 +0000
> 
>       * tree-ssa-threadbackward.c: Include tree-inline.h
>       (profitable_jump_thread_path): Use estimate_num_insns to estimate
>       size of copied block; for cold paths reduce duplication.
>       (find_jump_threads_backwards): Remove redundant tests.
>       (pass_thread_jumps::gate): Enable for -Os.
>       * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Update testcase.
> 
>   svn+ssh://gcc.gnu.org/svn/gcc/trunk@239219 
> 
> On my aarch64-none-linux-gnu and x86_64-none-linux-gnu automatic bisect
> robots.


Sorry for that - it seems I have missed this failure.  The reason is that FSM
now stops on:
  /* We avoid creating irreducible inner loops unless we thread through
     a multiway branch, in which case we have deemed it worth losing
     other loop optimizations later.

     We also consider it worth creating an irreducible inner loop if
     the number of copied statement is low relative to the length of
     the path -- in that case there's little the traditional loop
     optimizer would have done anyway, so an irreducible loop is not
     so bad.  */
  if (!threaded_multiway_branch && creates_irreducible_loop
      && (n_insns * PARAM_VALUE (PARAM_FSM_SCALE_PATH_STMTS)
          > path_length * PARAM_VALUE (PARAM_FSM_SCALE_PATH_BLOCKS)))

    {
      if (dump_file && (dump_flags & TDF_DETAILS))
        fprintf (dump_file,
                 "FSM would create irreducible loop without threading "
                 "multiway branch.\n");
      path->pop ();
      return NULL;
    }

The path threaded now gets n_insn==13 and path_lengt=6. I guess the difference
is that the path consists of several calls that are considered heavy by the
new code size estimate which is correct. It is definitly heaver than path
consisting of few increments.

<bb 6>:                                                                         
                                                                                
                           
if (phi_inserted_5 == 0)                                                        
                                                                                
                           
  goto <bb 8>;                                                                  
                                                                                
                           
else                                                                            
                                                                                
                           
  goto <bb 7>;

<bb 4>:
_2 = boo ();
if (_2 != 20)
  goto <bb 8>;
else
  goto <bb 6>;

<bb 3>:
_1 = arf ();
if (_1 != 10)
  goto <bb 8>;
else
  goto <bb 4>;

<bb 9>:
# phi_inserted_5 = PHI <0(2), phi_inserted_4(8)>
_3 = end_imm_use_stmt_p ();
if (_3 == 0)
  goto <bb 3>;
else
  goto <bb 10>;

loop latch.
<bb 8>: 
# phi_inserted_4 = PHI <phi_inserted_5(4), 1(6), phi_inserted_5(7), 
phi_inserted_5(3)>
next_imm_use_stmt ();

<bb 6>:
if (phi_inserted_5 == 0)
  goto <bb 8>;
else
  goto <bb 7>;


I would say that optimizing this path to dead is not the most important thing.  
The question is whether
there is really problem with an irreducible loop. THere are two loops in the 
function body prior threading:

;; Loop 0
;;  header 0, latch 1
;;  depth 0, outer -1
;;  nodes: 0 1 2 3 4 6 7 8 9 10
;;
;; Loop 1
;;  header 9, latch 8
;;  depth 1, outer 0
;;  nodes: 9 8 4 6 7 3
;; 2 succs { 9 }
;; 3 succs { 8 4 }
;; 4 succs { 8 6 }
;; 6 succs { 8 7 }
;; 7 succs { 8 }
;; 8 succs { 9 }
;; 9 succs { 3 10 }
;; 10 succs { 1 }

So the threaded path lives fully inside loop1: 6->8->9->3->4->6 propagating
that phi_inserted is 0 after the first iteration of the loop.  This looks like
useful loop peeling oppurtunity which does not garble loop structure. So
perhaps threading paths starting and passing loop latch (i.e. peeling) is
sane? Perhaps all paths fully captured in the loop in question are?

;; 2 loops found
;;
;; Loop 0
;;  header 0, latch 1
;;  depth 0, outer -1
;;  nodes: 0 1 2 3 4 5 6 7 8 9 10 11 12 13
;;
;; Loop 1
;;  header 12, latch 11
;;  depth 1, outer 0
;;  nodes: 12 11 4 9 10 3 8 7 6 5
;; 2 succs { 12 }
;; 3 succs { 11 4 }
;; 4 succs { 11 5 }
;; 5 succs { 6 10 }
;; 6 succs { 7 }
;; 7 succs { 8 13 }
;; 8 succs { 11 9 }
;; 9 succs { 11 10 }
;; 10 succs { 11 }
;; 11 succs { 12 }
;; 12 succs { 3 13 }
;; 13 succs { 1 }

Honza

Re: backward threading heuristics tweek

Reply via email to