https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45215
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|tree-optimization |rtl-optimization
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note: on trunk I have changed the code slightly so that a cmove is generated.
With the cmove we could simplify the following RTL:
Trying 27, 28 -> 29:
27: {flags:CCZ=cmp(r86:SI&0x100,0);r82:SI=r86:SI&0x100;}
REG_DEAD r86:SI
28: r85:SI=0xffffffffffffffe6
29: r82:SI={(flags:CCZ==0)?r82:SI:r85:SI}
REG_DEAD r85:SI
REG_DEAD flags:CCZ
Failed to match this instruction:
(set (reg/v:SI 82 [ tt ])
(if_then_else:SI (eq (zero_extract:SI (reg:SI 86)
(const_int 1 [0x1])
(const_int 8 [0x8]))
(const_int 0 [0]))
(and:SI (reg:SI 86)
(const_int 256 [0x100]))
(const_int -26 [0xffffffffffffffe6])))
But that would be a 3->3 combine, and I don't know whether combine does that; I know it does 3->1 and 3->2.
andl $256, %edi
movl $-26, %eax
cmovne %eax, %edi
I also don't know what the cost of doing a cmov vs. the shifts is here, though.
I know that for aarch64 it is worse, but that should already have been modeled.