https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118317

--- Comment #2 from Jeffrey A. Law <law at gcc dot gnu.org> ---
Yea.  If we look at the .optimized output we get something like this for rv64:

unsigned int bar (unsigned int len)
{
  unsigned int t;
  _Bool _1;
  unsigned int _2; 
  unsigned int _3;

;;   basic block 2, loop depth 0, maybe hot
;;    prev block 0, next block 1, flags: (NEW, VISITED)
;;    pred:       ENTRY (FALLTHRU,EXECUTABLE)
  _1 = len_4(D) != 0;
  _2 = (unsigned int) _1;
  _3 = len_4(D) - _2;
  t_5 = _3 + 1;
  return t_5;
;;    succ:       EXIT (EXECUTABLE) j.c:4:12

}



;; Function bar1 (bar1, funcdef_no=1, decl_uid=2466, cgraph_uid=2,
symbol_order=1)

Removing basic block 3
;; basic block 3, loop depth 0
;;  pred:       2
;;  succ:       4


COND_EXPR in block 2 and PHI in block 4 converted to straightline code.
Merging blocks 2 and 4
fix_loop_structure: fixing up loops for function
unsigned int bar1 (unsigned int len)
{   
  unsigned int _4;

;;   basic block 2, loop depth 0, maybe hot
;;    prev block 0, next block 1, flags: (NEW, VISITED)
;;    pred:       ENTRY (FALLTHRU,EXECUTABLE)
  _4 = MAX_EXPR <len_2(D), 1>;
  return _4;
;;    succ:       EXIT (EXECUTABLE) j.c:9:25

We can see the unsigned types in bar(), which should allow us to generate this
via MAX_EXPR.  From a codegen standpoint we get this on rv64:

bar:
        seqz    a5,a0   # 8     [c=4 l=4]  *seq_zero_disi
        addw    a0,a5,a0        # 17    [c=8 l=4]  addsi3_extended/0
        ret             # 25    [c=0 l=4]  simple_return
        .size   bar, .-bar
        .align  1
        .globl  bar1
        .type   bar1, @function
bar1:
        li      a5,1            # 8     [c=4 l=4]  *movdi_64bit/1
        maxu    a0,a0,a5        # 16    [c=4 l=4]  *umaxdi3
        ret             # 24    [c=0 l=4]  simple_return


bar1 is marginally better because the li has no incoming data dependencies
and can thus issue wherever we want.

Given both are 2 insn sequences, we're not likely going to be able to fix this
cleanly in combine.  So generating a MAX_EXPR in gimple for bar() seems like
the only viable path forward.
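
For reference, here is a rough C sketch of the two forms being compared.  The
body of bar() is reconstructed from the GIMPLE dump above.  The body of bar1()
is an assumption (the original source is not shown in this comment), chosen
only because a conditional of this shape is consistent with the "COND_EXPR ...
converted to straightline code" note and the resulting MAX_EXPR <len, 1>.
forms_agree() is a hypothetical helper added for illustration.  The equivalence
relies on the type being unsigned: len != 0 then implies len >= 1.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Reconstructed from the .optimized GIMPLE for bar():
   _1 = len != 0; _2 = (unsigned int) _1; _3 = len - _2; t = _3 + 1.  */
unsigned int bar(unsigned int len)
{
    unsigned int t = len - (len != 0);
    return t + 1;
}

/* Assumed source form for bar1(); not taken from the bug report.
   For unsigned len this is the same as MAX_EXPR <len, 1>.  */
unsigned int bar1(unsigned int len)
{
    return len ? len : 1;
}

/* Hypothetical helper: returns 1 if both forms agree on a few
   boundary values, 0 otherwise.  */
int forms_agree(void)
{
    static const unsigned int samples[] = {
        0u, 1u, 2u, 255u, UINT_MAX - 1u, UINT_MAX
    };
    size_t i;

    for (i = 0; i < sizeof samples / sizeof samples[0]; i++)
        if (bar(samples[i]) != bar1(samples[i]))
            return 0;
    return 1;
}
```

With a signed type the two forms would differ for negative inputs, so the
unsigned types seen in the dump are what make the MAX_EXPR rewrite valid.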
