https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109955

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
One thing I see is

-(insn 11 10 15 2 (set (subreg:V16QI (reg:V2DI 83 [ <retval> ]) 0)
-        (unspec:V16QI [
-                (reg:V16QI 92)
-                (reg:V16QI 91)
-                (lt:V16QI (reg:V16QI 90)
-                    (const_vector:V16QI [
-                            (const_int 0 [0]) repeated x16
-                        ]))
-            ] UNSPEC_BLENDV)) "/space/rguenther/src/gcc/gcc/testsuite/gcc.target/i386/sse4_1-pr99908.c":22:10 discrim 1 7431 {*sse4_1_pblendvb_lt}
                 (nil)))))

vs

+(insn 8 5 9 2 (set (reg:V16QI 89)
+        (const_vector:V16QI [
+                (const_int -1 [0xffffffffffffffff]) repeated x16
+            ])) "/spc/abuild/rguenther/obj-gcc-g/gcc/include/smmintrin.h":181:20 1838 {movv16qi_internal}
+     (nil))
+(insn 9 8 11 2 (set (reg:V16QI 90)
+        (gt:V16QI (reg:V16QI 92)
+            (reg:V16QI 89))) "/spc/abuild/rguenther/obj-gcc-g/gcc/include/smmintrin.h":181:20 6749 {*sse2_gtv16qi3}
      (expr_list:REG_DEAD (reg:V16QI 92)
+        (expr_list:REG_DEAD (reg:V16QI 89)
+            (nil))))
+(note 11 9 12 2 NOTE_INSN_DELETED)
+(insn 12 11 16 2 (set (subreg:V16QI (reg:V2DI 84 [ <retval> ]) 0)
+        (unspec:V16QI [
+                (reg:V16QI 93)
+                (reg:V16QI 94)
+                (reg:V16QI 90)
+            ] UNSPEC_BLENDV)) "/space/rguenther/src/gcc/gcc/testsuite/gcc.target/i386/sse4_1-pr99908.c":22:10 discrim 1 7429 {sse4_1_pblendvb}
+     (expr_list:REG_DEAD (reg:V16QI 93)
+        (expr_list:REG_DEAD (reg:V16QI 90)
+            (expr_list:REG_DEAD (reg:V16QI 94)
                 (nil)))))

after the combiner, which seems to be a missing simplification of

(insn 8 5 9 2 (set (reg:V16QI 89)
        (const_vector:V16QI [
                (const_int -1 [0xffffffffffffffff]) repeated x16
            ]))
(insn 9 8 11 2 (set (reg:V16QI 90)
        (gt:V16QI (reg:V16QI 92)
            (reg:V16QI 89)))

to

(lt:V16QI (reg:V16QI 90)
    (const_vector:V16QI [
            (const_int 0 [0]) repeated x16
        ]))

Trying 8 -> 9:
    8: r89:V16QI=const_vector
    9: r90:V16QI=r92:V16QI>r89:V16QI
      REG_DEAD r92:V16QI
      REG_DEAD r89:V16QI
Failed to match this instruction:
(set (reg:V16QI 90)
    (gt:V16QI (reg:V16QI 92)
        (const_vector:V16QI [
                (const_int -1 [0xffffffffffffffff]) repeated x16
            ])))

Trying 8, 9 -> 12:
    8: r89:V16QI=const_vector
    9: r90:V16QI=r92:V16QI>r89:V16QI
      REG_DEAD r92:V16QI
      REG_DEAD r89:V16QI
   12: r84:V2DI#0=unspec[r93:V16QI,r94:V16QI,r90:V16QI] 47
      REG_DEAD r93:V16QI
      REG_DEAD r90:V16QI
      REG_DEAD r94:V16QI
Failed to match this instruction:
(set (subreg:V16QI (reg:V2DI 84 [ <retval> ]) 0)
    (unspec:V16QI [
            (reg:V16QI 93)
            (reg:V16QI 94)
            (gt:V16QI (reg:V16QI 92)
                (const_vector:V16QI [
                        (const_int -1 [0xffffffffffffffff]) repeated x16
                    ]))
        ] UNSPEC_BLENDV))

Not sure whether the lt form is a standalone pattern.  Maybe we just need a
define_insn_and_split for the _gt variant as well.  All of these patterns seem
to be tuned to the exact RTL that expansion produces when the vcond patterns
are present.

Getting rid of vcond* (but not vcond_mask) would allow considerable
simplification in middle-end code and the vectorizer.
