> Note for blendv, it checks the significant bit of the mask, not simple
>  if_then_else
>   mask
>   if_true 
>   if_false
> 
> It should be 
> if_then_else
>    ashiftrt mask 31
>    if_true
>    if_false
I think canonical form (produced by combine) would be

if_then_else
  ge mask 0
  if_false
  if_true
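
The equivalence between the sign-bit test and the (ge mask 0) form with swapped arms can be checked with a small scalar model of one lane. This is only a sketch: the function names are made up for illustration, and it models a single 32-bit lane rather than the actual RTL.

```c
#include <assert.h>
#include <stdint.h>

/* Per-lane model of blendv: only the sign bit of the mask selects. */
static int32_t blend_signbit(int32_t mask, int32_t if_true, int32_t if_false)
{
    return ((uint32_t)mask >> 31) ? if_true : if_false;
}

/* Same selection written as (if_then_else (ge mask 0) if_false if_true). */
static int32_t blend_ge0(int32_t mask, int32_t if_true, int32_t if_false)
{
    return (mask >= 0) ? if_false : if_true;
}

/* Both forms agree, including for masks that are not all 0 or all 1. */
static void check_blend_forms(void)
{
    static const int32_t masks[] = { 0, 1, -1, INT32_MIN, INT32_MAX, 0x40000000 };
    for (unsigned i = 0; i < sizeof masks / sizeof masks[0]; i++)
        assert(blend_signbit(masks[i], 7, 9) == blend_ge0(masks[i], 7, 9));
}
```

A mask of 1 is nonzero but has a clear sign bit, so both forms pick if_false there, which is where a plain (if_then_else mask ...) reading would differ.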
> 
> Maybe not very useful in practice, just like why there's UNSPEC_FMADDSUB
> 
> ;; It would be possible to represent these without the UNSPEC as
> ;;
> ;; (vec_merge
> ;;   (fma op1 op2 op3)
> ;;   (fma op1 op2 (neg op3))
> ;;   (merge-const))
> ;;
> ;; But this doesn't seem useful in practice.

I am not so sure about this when it comes to relatively common
instructions.  Hiding things in an unspec prevents combine and other RTL
passes from doing their job.  I would say that it only makes sense in
situations where the RTL equivalent is very inconvenient.

I noticed that we miss other optimizations of conditional moves. For
example:

int a[1000];
int b[1000];

void test()
{
        for (int i = 0; i < 1000; i++)
                a[i] = b[i] > 10 ? 2 : 3;
}

is compiled by clang to:
        pcmpgtd %xmm0, %xmm2
        paddd   %xmm1, %xmm2

while we do
        pcmpgtd %xmm4, %xmm0
        pand    %xmm0, %xmm1
        pandn   %xmm2, %xmm0
        por     %xmm1, %xmm0

Here I guess combine is out of luck.  It fails to simplify the three
logical operations:

        Trying 17, 16 -> 18:
           17: r112:V4SI=~r110:V4SI&r104:V4SI
              REG_DEAD r110:V4SI
           16: r111:V4SI=r107:V4SI&r110:V4SI
           18: r100:V4SI=r112:V4SI|r111:V4SI
              REG_DEAD r112:V4SI
              REG_DEAD r111:V4SI
        Failed to match this instruction:
        (set (reg:V4SI 100 [ vect_iftmp.10 ])
            (ior:V4SI (and:V4SI (not:V4SI (reg:V4SI 110))
                    (reg:V4SI 104))
                (and:V4SI (reg:V4SI 107)
                    (reg:V4SI 110))))

Here reg 110 is set by compare:

        (insn 15 14 16 3 (set (reg:V4SI 110)
                (gt:V4SI (reg:V4SI 109 [ MEM <vector(4) int> [(int *)&b + ivtmp.17_13 * 1] ])
                    (reg:V4SI 104))) 6997 {*sse2_gtv4si3}
             (expr_list:REG_DEAD (reg:V4SI 109 [ MEM <vector(4) int> [(int *)&b + ivtmp.17_13 * 1] ])
                (nil)))

and I think it misses the fact that the mask is either all 0 or all 1
for each lane (which is value range info it does not track).
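
To illustrate why the all-0/all-1 property matters, here is a scalar sketch of one lane. The helper names are hypothetical, and the mask is modeled as 0 or -1, as pcmpgtd would produce. With that restriction the pand/pandn/por sequence collapses to a single add, which matches the clang output above.

```c
#include <assert.h>
#include <stdint.h>

/* Compare result for one lane: -1 (all ones) if b > 10, else 0. */
static int32_t lane_mask(int32_t b)
{
    return -(int32_t)(b > 10);
}

/* pand/pandn/por form: a general select, valid for any x and y. */
static int32_t select_logical(int32_t mask, int32_t x, int32_t y)
{
    return (mask & x) | (~mask & y);
}

/* paddd form: valid only because mask is 0 or -1 and 2 == 3 - 1. */
static int32_t select_add(int32_t mask)
{
    return 3 + mask;
}

/* Both forms compute b > 10 ? 2 : 3 for every tested b. */
static void check_select_2_3(void)
{
    for (int32_t b = -100; b <= 100; b++) {
        int32_t m = lane_mask(b);
        assert(select_logical(m, 2, 3) == (b > 10 ? 2 : 3));
        assert(select_add(m) == (b > 10 ? 2 : 3));
    }
}
```

For an arbitrary mask value the two functions disagree, so the rewrite is only sound if the pass knows where the mask comes from.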

Similarly one can simplify e.g.
                a[i] = b[i] > 10 ? 2 : 5;
into and and or...
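
One possible two-operation form for this case, as a scalar sketch of a single lane: the text above does not spell out the exact pair of operations, so the use of and plus xor here is an assumption of this example, and the helper names are hypothetical. It relies on the identity y ^ (mask & (x ^ y)) with mask being 0 or -1.

```c
#include <assert.h>
#include <stdint.h>

/* Compare result for one lane: -1 (all ones) if b > 10, else 0. */
static int32_t lane_mask_gt10(int32_t b)
{
    return -(int32_t)(b > 10);
}

/* Two-operation select of 2 or 5: (mask & (2 ^ 5)) ^ 5.
   With mask == -1 this is 7 ^ 5 == 2, with mask == 0 it is 5. */
static int32_t select_and_xor(int32_t mask)
{
    return (mask & (2 ^ 5)) ^ 5;
}

/* The two-operation form computes b > 10 ? 2 : 5 for every tested b. */
static void check_select_2_5(void)
{
    for (int32_t b = -100; b <= 100; b++)
        assert(select_and_xor(lane_mask_gt10(b)) == (b > 10 ? 2 : 5));
}
```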

Honza
