https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124076

            Bug ID: 124076
           Summary: riscv: Redundant vec-vec move and masking, RTL
                    predication.
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rdapp at gcc dot gnu.org
  Target Milestone: ---
            Target: riscv

When looking at vect-early-break_81.c I noticed the following:

        vsetvli zero,zero,e32,m1,ta,mu   # Can be ma instead of mu
        vzext.vf2       v3,v1
        vmv1r.v v1,v2                    # Redundant
        vmsne.vv        v1,v4,v3

We perform a comparison with "mask undisturbed" policy but without actually
specifying a mask register, so the operation is unmasked.

The issue is that at expand time the operation still appears masked:

(insn 185 184 186 15 (set (reg:RVVMF32BI 363)
        (if_then_else:RVVMF32BI (unspec:RVVMF32BI [
                    (reg:RVVMF32BI 361)
                    (reg:DI 364)
                    (const_int 0 [0])
                    (const_int 1 [0x1])
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (ne:RVVMF32BI (reg:RVVM1SI 358 [ vect__8.33_153 ])
                (reg:RVVM1SI 360 [ vect__3.24_134 ]))
            (reg:RVVMF32BI 365))) "vect-early-break_81.c":29:10 discrim 99328
14262 {*pred_cmprvvm1si}
     (nil))

with a real merge operand (reg 365). 

cprop3 recognizes that the mask is always true:

(insn 185 179 187 11 (set (reg:RVVMF32BI 363)
        (if_then_else:RVVMF32BI (unspec:RVVMF32BI [
                    (const_vector:RVVMF32BI repeat [
                            (const_int 1 [0x1])
                        ])
                    (reg:DI 431)
                    (const_int 0 [0])
                    (const_int 1 [0x1])
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (ne:RVVMF32BI (reg:RVVM1SI 358 [ vect__8.33_153 ])
                (reg:RVVM1SI 360 [ vect__3.24_134 ]))
            (reg:RVVMF32BI 365))) "vect-early-break_81.c":29:10 discrim 99328
14262 {*pred_cmprvvm1si}

The now-redundant merge operand remains unchanged.

When the merge operand differs from the target operand, as it does here, we
have no choice but to reload one of them.  As a downstream effect, the vsetvl
pass later even chooses the "mask undisturbed" policy (which might be less
efficient depending on the uarch).

This can happen for any predicated instruction, so a targeted combine pattern
would only help in one case, not in all of them.  We might need a
target-specific pass that recognizes these situations (always-true mask with
merge operand, or always-false mask) and simplifies them.
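
To make the two situations concrete, here is a minimal sketch of the decision
such a pass would take.  It is a toy model, not GCC code: PredicatedOp,
MaskKind, Action and simplify are hypothetical names, and it assumes that
dropping the merge operand is only safe when the tail policy is agnostic.

```cpp
#include <cassert>
#include <optional>

// Toy stand-ins for RTL operands; none of these are GCC types.
enum class MaskKind { AllTrue, AllFalse, Unknown };

struct PredicatedOp {
  MaskKind mask;             // what constant propagation learned about the mask
  std::optional<int> merge;  // pseudo number of the merge operand, if any
  bool tail_agnostic;        // true = ta, false = tu
  bool mask_undisturbed;     // true = mu, false = ma
};

enum class Action { Keep, DropMerge, ReplaceWithMerge };

// Decide how a predicated operation could be simplified.
inline Action simplify(PredicatedOp &op) {
  if (!op.merge)
    return Action::Keep;  // nothing to clean up

  // Always-true mask: no body element reads the merge operand.  Assuming an
  // agnostic tail, the merge operand can become undefined and "ma" suffices.
  if (op.mask == MaskKind::AllTrue && op.tail_agnostic) {
    op.merge.reset();
    op.mask_undisturbed = false;
    return Action::DropMerge;
  }

  // Always-false mask: every element comes from the merge operand, so the
  // operation degenerates into a copy of it.
  if (op.mask == MaskKind::AllFalse)
    return Action::ReplaceWithMerge;

  return Action::Keep;
}
```

For insn 185 after cprop3 this corresponds to an all-true mask with merge
operand reg 365, i.e. the DropMerge case.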

Even nicer would be to have a real RTL mask representation so these kinds of
optimizations could be handled generically.  (We've been wanting a predicated
gimple representation for a while as well.)

I guess if_then_else is not that bad for such cases, but its condition would
need to contain two AND'ed masks (the element mask and the length mask).  For
aarch64 the length mask would always be true.  Or do we need a new RTL
expression for it altogether?
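
As a scalar reference for what such an if_then_else would mean, here is a
small sketch of the element-wise semantics with the two masks AND'ed together.
It is only an illustration: predicated_ne is a hypothetical helper, and it
assumes that elements beyond the length take the merge value, just like
elements disabled by the mask.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element-wise model of
//   (if_then_else (and len_mask mask) (ne a b) merge)
// where len_mask[i] is true for i < vl.
inline std::vector<int> predicated_ne(const std::vector<int> &a,
                                      const std::vector<int> &b,
                                      const std::vector<bool> &mask,
                                      std::size_t vl,
                                      const std::vector<int> &merge) {
  std::vector<int> result(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    const bool len_active = i < vl;            // length mask
    const bool active = len_active && mask[i]; // the two AND'ed masks
    result[i] = active ? (a[i] != b[i] ? 1 : 0) : merge[i];
  }
  return result;
}
```

With an all-true mask and vl covering the whole vector, the result no longer
depends on merge at all, which is the situation described above.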
