https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118317
--- Comment #2 from Jeffrey A. Law <law at gcc dot gnu.org> ---
Yea. If we look at the .optimized output we get something like this for rv64:
unsigned int bar (unsigned int len)
{
  unsigned int t;
  _Bool _1;
  unsigned int _2;
  unsigned int _3;

;;   basic block 2, loop depth 0, maybe hot
;;    prev block 0, next block 1, flags: (NEW, VISITED)
;;    pred:       ENTRY (FALLTHRU,EXECUTABLE)
  _1 = len_4(D) != 0;
  _2 = (unsigned int) _1;
  _3 = len_4(D) - _2;
  t_5 = _3 + 1;
  return t_5;
;;    succ:       EXIT (EXECUTABLE)  j.c:4:12

}
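For reference, a plausible C source that produces exactly this gimple (the actual j.c isn't quoted in this report, so this is a reconstruction from the dump, not the submitter's code):

```c
/* Hypothetical reconstruction of bar() from the .optimized dump:
   _1 = len != 0; _2 = (unsigned) _1; _3 = len - _2; t = _3 + 1.
   For len == 0 this yields 1; for any len >= 1 it yields len.  */
unsigned int bar (unsigned int len)
{
  return len - (len != 0) + 1;
}
```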
;; Function bar1 (bar1, funcdef_no=1, decl_uid=2466, cgraph_uid=2, symbol_order=1)

Removing basic block 3
;; basic block 3, loop depth 0
;;  pred:       2
;;  succ:       4
COND_EXPR in block 2 and PHI in block 4 converted to straightline code.
Merging blocks 2 and 4
fix_loop_structure: fixing up loops for function
unsigned int bar1 (unsigned int len)
{
  unsigned int _4;

;;   basic block 2, loop depth 0, maybe hot
;;    prev block 0, next block 1, flags: (NEW, VISITED)
;;    pred:       ENTRY (FALLTHRU,EXECUTABLE)
  _4 = MAX_EXPR <len_2(D), 1>;
  return _4;
;;    succ:       EXIT (EXECUTABLE)  j.c:9:25

}
We can see the types in bar() are unsigned, which should allow us to generate the
same thing via MAX_EXPR. From a codegen standpoint we get this on rv64:
bar:
	seqz	a5,a0	# 8	[c=4 l=4]  *seq_zero_disi
	addw	a0,a5,a0	# 17	[c=8 l=4]  addsi3_extended/0
	ret		# 25	[c=0 l=4]  simple_return
	.size	bar, .-bar
	.align	1
	.globl	bar1
	.type	bar1, @function
bar1:
	li	a5,1	# 8	[c=4 l=4]  *movdi_64bit/1
	maxu	a0,a0,a5	# 16	[c=4 l=4]  *umaxdi3
	ret		# 24	[c=0 l=4]  simple_return
bar1 is marginally better because the li has no incoming data dependencies
and can thus issue wherever we want.
Given both are two-insn sequences, we're unlikely to be able to fix this
cleanly in combine. So generating a MAX_EXPR in gimple for bar() seems like
the only viable path forward.