https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97872
Bug ID: 97872
Summary: Missed optimization for less-than comparison on vectors
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: prathamesh3492 at gcc dot gnu.org
Target Milestone: ---

Hi,
For the following test-case:

#include <arm_neon.h>

uint8x8_t f1(int8x8_t a, int8x8_t b) {
  return a < b;
}

uint8x8_t f2(int8x8_t a, int8x8_t b) {
  return vclt_s8 (a, b);
}

Code-gen for f2 uses the vcgt insn:
f2:
        vcgt.s8 d0, d1, d0
        bx      lr

However, code-gen for f1 results in:
f1:
        vmov.i32        d16, #0xffffffff  @ v8qi
        vmov.i32        d17, #0  @ v8qi
        vcgt.s8 d0, d1, d0
        vbsl    d0, d16, d17
        bx      lr

which IIUC is redundant, since vcgt already sets each lane of d0 to
all-ones or all-zeros depending on the comparison, so the vbsl between
all-ones and all-zeros just reproduces the vcgt result.

This happens because vclt_s8 uses __builtin_neon_vcgtv8qi, which emits
vcgt.s8, while f1 is lowered to VCOND in the optimized dump:

f1 (int8x8_t a, int8x8_t b)
{
  vector(8) signed char _2;
  uint8x8_t _5;

  <bb 2> [local count: 1073741824]:
  _2 = .VCOND (a_3(D), b_4(D), { -1, -1, -1, -1, -1, -1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0 }, 107);
  _5 = VIEW_CONVERT_EXPR<uint8x8_t>(_2);
  return _5;
}

and correspondingly expanded to:

;; _2 = .VCOND (a_3(D), b_4(D), { -1, -1, -1, -1, -1, -1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0 }, 107);

(insn 7 6 8 (set (reg:V8QI 117)
        (const_vector:V8QI [
                (const_int -1 [0xffffffffffffffff]) repeated x8
            ])) "foo.c":4:12 -1
     (nil))

(insn 8 7 9 (set (reg:V8QI 118)
        (const_vector:V8QI [
                (const_int 0 [0]) repeated x8
            ])) "foo.c":4:12 -1
     (nil))

(insn 9 8 10 (set (reg:V8QI 119)
        (neg:V8QI (gt:V8QI (reg/v:V8QI 116 [ b ])
                (reg/v:V8QI 115 [ a ])))) "foo.c":4:12 -1
     (nil))

(insn 10 9 0 (set (reg:V8QI 113 [ _2 ])
        (unspec:V8QI [
                (reg:V8QI 119)
                (reg:V8QI 117)
                (reg:V8QI 118)
            ] UNSPEC_VBSL)) "foo.c":4:12 -1
     (nil))

Thanks,
Prathamesh
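
P.S. As a rough sanity check of the redundancy claim, here is a portable
scalar model of a single 8-bit lane. It is only a sketch: the helper names
(lane_lt_mask, lane_bsl, select_is_redundant) are made up for illustration
and are not GCC or arm_neon.h functions, and the model assumes the usual
semantics (the compare yields all-ones or all-zeros per lane; the select
takes bits from its first value operand where the mask bit is set).

```c
#include <stdint.h>

/* Hypothetical scalar model of one lane of the compare emitted for
   a < b (vcgt.s8 with swapped operands): all-ones if a < b, else zero. */
static uint8_t lane_lt_mask(int8_t a, int8_t b)
{
    return (a < b) ? 0xFF : 0x00;
}

/* Hypothetical scalar model of one lane of a bitwise select (vbsl):
   take bits from if_set where mask is 1, from if_clear where mask is 0. */
static uint8_t lane_bsl(uint8_t mask, uint8_t if_set, uint8_t if_clear)
{
    return (uint8_t)((mask & if_set) | (~mask & if_clear));
}

/* Returns 1 if selecting between all-ones and all-zeros reproduces the
   compare mask for every pair of int8_t inputs, 0 otherwise. */
static int select_is_redundant(void)
{
    for (int a = -128; a <= 127; a++) {
        for (int b = -128; b <= 127; b++) {
            uint8_t mask = lane_lt_mask((int8_t)a, (int8_t)b);
            if (lane_bsl(mask, 0xFF, 0x00) != mask)
                return 0;
        }
    }
    return 1;
}
```

Under these assumptions select_is_redundant() covers all 65536 input pairs
for one lane, which is the property that would let the two vmov and the
vbsl be dropped in f1.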