Re: [PATCH 1/2] LoongArch: Fix wrong code with _alsl_reversesi_extended

Lulu Cheng Wed, 12 Feb 2025 18:52:17 -0800


On 2025/1/24 at 7:44 PM, Richard Sandiford wrote:

Lulu Cheng <chengl...@loongson.cn> writes:

On 2025/1/24 at 3:58 PM, Richard Sandiford wrote:

Lulu Cheng <chengl...@loongson.cn> writes:

On 2025/1/22 at 8:49 AM, Xi Ruoyao wrote:
I have no problem with this patch.
But, I have always been confused about the use of reload_completed.

I can understand that it needs to be true here, but I don't quite
understand the following:

```

(define_insn_and_split "*zero_extendsidi2_internal"
     [(set (match_operand:DI 0 "register_operand" "=r,r,r,r")
           (zero_extend:DI (match_operand:SI 1 "nonimmediate_operand" "r,m,ZC,k")))]
     "TARGET_64BIT"
     "@
      bstrpick.d\t%0,%1,31,0
      ld.wu\t%0,%1
      #
      ldx.wu\t%0,%1"
     "&& reload_completed
      && MEM_P (operands[1])
      && (loongarch_14bit_shifted_offset_address_p (XEXP (operands[1], 0), SImode)
          && !loongarch_12bit_offset_address_p (XEXP (operands[1], 0), SImode))
      && !paradoxical_subreg_p (operands[0])"
     [(set (match_dup 3) (match_dup 1))
      (set (match_dup 0)
           (ior:DI (zero_extend:DI
                     (subreg:SI (match_dup 0) 0))
                   (match_dup 2)))]
     {
       operands[1] = gen_lowpart (SImode, operands[1]);
       operands[3] = gen_lowpart (SImode, operands[0]);
       operands[2] = const0_rtx;
     }
     [(set_attr "move_type" "arith,load,load,load")
      (set_attr "mode" "DI")])
```

What is the role of reload_completed here?

Yeah, I agree it looks odd.  In particular, operands[0] should never be
a subreg after RA, so the paradoxical_subreg_p test shouldn't be needed.
And the hard-coded (subreg:SI ... 0) in the expansion pattern doesn't
seem correct for hard registers -- it should be folded down to a single
(reg:SI ...) instead, as for operands[3].

Thanks,
Richard

I still have only a very vague idea of when reload_completed needs to be
checked in the split condition and when it does not. :-(

Could you please give me some guidance?

Two of the main uses of reload_completed in splits that I know of are:

(1) Splitting an instruction that has multiple alternatives, in cases
     where the choice between splitting and not splitting depends on
     the register allocation.  An aarch64 example of this is:

     (define_insn_and_split "aarch64_simd_mov_from_<mode>low"
       [(set (match_operand:<VHALF> 0 "register_operand")
             (vec_select:<VHALF>
               (match_operand:VQMOV_NO2E 1 "register_operand")
               (match_operand:VQMOV_NO2E 2 "vect_par_cnst_lo_half")))]
       "TARGET_FLOAT"
       {@ [ cons: =0 , 1 ; attrs: type   , arch      ]
          [ w        , w ; mov_reg       , simd      ] #
          [ ?r       , w ; neon_to_gp<q> , base_simd ] umov\t%0, %1.d[0]
          [ ?r       , w ; f_mrc         , *         ] fmov\t%0, %d1
       }
       "&& reload_completed && aarch64_simd_register (operands[0], <VHALF>mode)"
       [(set (match_dup 0) (match_dup 1))]
       {
         operands[1] = aarch64_replace_reg_mode (operands[1], <VHALF>mode);
       }
       [(set_attr "length" "4")]
     )

     Here, we want to split the first alternative (the one where the
     destination is a SIMD register), but we don't know until after RA
     whether the destination is a SIMD register.

(2) Splitting an instruction that the RA finds easier to allocate when
     unsplit.  A common instance of this is multiword moves.  An aarch64
     example is:

     (define_split
       [(set (match_operand:VSTRUCT_2QD 0 "register_operand")
             (match_operand:VSTRUCT_2QD 1 "register_operand"))]
       "TARGET_FLOAT && reload_completed"
       [(const_int 0)]
     {
       aarch64_simd_emit_reg_reg_move (operands, <VSTRUCT_ELT>mode, 2);
       DONE;
     })

     In particular, the unsplit form allows input and output registers to
     overlap.  The RA would not allow overlap if the instructions were
     split before RA (since the RA doesn't track the liveness of individual
     SIMD registers in multi-register tuples).

     This might become less of an issue in future, if the RA does become
     able to track the liveness of individual registers in a multi-register
     value.
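
     The overlap point in (2) can be illustrated with a toy model in C.
     This is only a sketch under stated assumptions: the register file,
     `NUM_REGS` and `move_reg_tuple` are hypothetical names invented for
     illustration, and the code is not what
     aarch64_simd_emit_reg_reg_move does internally.  It shows why a
     multi-register move can tolerate overlapping source and destination
     once the hard registers are known: the per-register moves can then
     be ordered so that no source is overwritten before it is read.

```c
#include <assert.h>
#include <stdint.h>

/* Toy register file.  Hypothetical; not a GCC data structure.  */
enum { NUM_REGS = 8 };

/* Copy COUNT consecutive registers starting at SRC to the registers
   starting at DST, one register at a time.  The copy direction is
   chosen from the (now known) register numbers so that an overlapping
   source register is always read before it is overwritten.  */
static void
move_reg_tuple (uint64_t regs[NUM_REGS], int dst, int src, int count)
{
  assert (dst >= 0 && src >= 0 && count >= 0);
  assert (dst + count <= NUM_REGS && src + count <= NUM_REGS);

  if (dst <= src)
    /* Destination starts at or below the source: move low part first.  */
    for (int i = 0; i < count; i++)
      regs[dst + i] = regs[src + i];
  else
    /* Destination starts above the source: move high part first.  */
    for (int i = count - 1; i >= 0; i--)
      regs[dst + i] = regs[src + i];
}
```

     The ordering decision depends on the concrete register numbers,
     which in this model stand in for the hard registers that only
     exist after RA.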

There'll be other uses too, though.

Richard


I have modified the `*zero_extendsidi2_internal` pattern.

I think this version conforms to the description of subreg in gccint.pdf. :-)


Thanks!


@@ -1766,18 +1766,13 @@ (define_insn_and_split "*zero_extendsidi2_internal"
    ldx.wu\t%0,%1"
   "&& reload_completed
    && MEM_P (operands[1])
-   && (loongarch_14bit_shifted_offset_address_p (XEXP (operands[1], 0), SImode)
-       && !loongarch_12bit_offset_address_p (XEXP (operands[1], 0), SImode))
-   && !paradoxical_subreg_p (operands[0])"
+   && loongarch_14bit_shifted_offset_address_p (XEXP (operands[1], 0), SImode)
+   && !loongarch_12bit_offset_address_p (XEXP (operands[1], 0), SImode)"
   [(set (match_dup 3) (match_dup 1))
    (set (match_dup 0)
-       (ior:DI (zero_extend:DI
-                 (subreg:SI (match_dup 0) 0))
-               (match_dup 2)))]
+       (zero_extend:DI (match_dup 3)))]
   {
-    operands[1] = gen_lowpart (SImode, operands[1]);
-    operands[3] = gen_lowpart (SImode, operands[0]);
-    operands[2] = const0_rtx;
+    operands[3] = gen_rtx_REG (SImode, REGNO (operands[0]));
   }
   [(set_attr "move_type" "arith,load,load,load")
    (set_attr "mode" "DI")])
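
As a rough check of what the two sets in the new split compute, here is a small C model. It is a sketch only: the function names are hypothetical, and it assumes that the SImode load in the first set leaves a sign-extended value in the 64-bit register (as a LoongArch 32-bit load such as ldptr.w does). Under that assumption, zero-extending the low word afterwards gives the same result as a zero-extending load.

```c
#include <stdint.h>

/* First set of the split: load a 32-bit word into a 64-bit register.
   ASSUMPTION: the load sign-extends bit 31 into the upper 32 bits.  */
static uint64_t
load_word_sign_extended (const uint32_t *mem)
{
  uint64_t reg = *mem;
  if (reg & UINT64_C (0x80000000))
    reg |= UINT64_C (0xffffffff00000000);
  return reg;
}

/* Second set of the split: (zero_extend:DI (reg:SI ...)), i.e. keep the
   low 32 bits of the register and clear the upper 32 bits.  */
static uint64_t
zero_extend_low_word (uint64_t reg)
{
  return reg & UINT64_C (0xffffffff);
}

/* The two steps in sequence, both operating on the same register.  */
static uint64_t
split_sequence (const uint32_t *mem)
{
  uint64_t reg = load_word_sign_extended (mem);
  return zero_extend_low_word (reg);
}
```

For a word with bit 31 set, the intermediate register value has all upper bits set, and the second step clears them again, so the final value equals the original word zero-extended to 64 bits.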
