在 2025/1/24 下午7:44, Richard Sandiford 写道:
Lulu Cheng <chengl...@loongson.cn> writes:
在 2025/1/24 下午3:58, Richard Sandiford 写道:
Lulu Cheng <chengl...@loongson.cn> writes:
在 2025/1/22 上午8:49, Xi Ruoyao 写道:
I have no problem with this patch.
But, I have always been confused about the use of reload_completed.
I can understand that it needs to be true here, but I don't quite
understand the following:
```
(define_insn_and_split "*zero_extendsidi2_internal"
[(set (match_operand:DI 0 "register_operand" "=r,r,r,r")
(zero_extend:DI (match_operand:SI 1 "nonimmediate_operand"
"r,m,ZC,k")))]
"TARGET_64BIT"
"@
bstrpick.d\t%0,%1,31,0
ld.wu\t%0,%1
#
ldx.wu\t%0,%1"
"&& reload_completed
&& MEM_P (operands[1])
&& (loongarch_14bit_shifted_offset_address_p (XEXP (operands[1], 0),
SImode)
&& !loongarch_12bit_offset_address_p (XEXP (operands[1], 0),
SImode))
&& !paradoxical_subreg_p (operands[0])"
[(set (match_dup 3) (match_dup 1))
(set (match_dup 0)
(ior:DI (zero_extend:DI
(subreg:SI (match_dup 0) 0))
(match_dup 2)))]
{
operands[1] = gen_lowpart (SImode, operands[1]);
operands[3] = gen_lowpart (SImode, operands[0]);
operands[2] = const0_rtx;
}
[(set_attr "move_type" "arith,load,load,load")
(set_attr "mode" "DI")])
```
What is the role of reload_complete here?
Yeah, I agree it looks odd. In particular, operands[0] should never be
a subreg after RA, so the paradoxical_subreg_p test shouldn't be needed.
And the hard-coded (subreg:SI ... 0) in the expansion pattern doesn't
seem correct for hard registers -- it should be folded down to a single
(reg:SI ...) instead, as for operands[3].
Thanks,
Richard
Now I have a very vague idea of when reload_completed needs to be judged
in the split stage and when it does not need to be judged.:-(
Could you please give me some guidance?
Two of the main uses of reload_completed in splits that I know of are:
(1) Splitting an instruction that has multiple alternatives, in cases
where the choice between splitting and not splitting depends on
the register allocation. An aarch64 example of this is:
(define_insn_and_split "aarch64_simd_mov_from_<mode>low"
[(set (match_operand:<VHALF> 0 "register_operand")
(vec_select:<VHALF>
(match_operand:VQMOV_NO2E 1 "register_operand")
(match_operand:VQMOV_NO2E 2 "vect_par_cnst_lo_half")))]
"TARGET_FLOAT"
{@ [ cons: =0 , 1 ; attrs: type , arch ]
[ w , w ; mov_reg , simd ] #
[ ?r , w ; neon_to_gp<q> , base_simd ] umov\t%0, %1.d[0]
[ ?r , w ; f_mrc , * ] fmov\t%0, %d1
}
"&& reload_completed && aarch64_simd_register (operands[0], <VHALF>mode)"
[(set (match_dup 0) (match_dup 1))]
{
operands[1] = aarch64_replace_reg_mode (operands[1], <VHALF>mode);
}
[(set_attr "length" "4")]
)
Here, we want to split the first alternative (the one where the
destination is a SIMD register), but we don't know until after RA
whether the destination is a SIMD register.
(2) Splitting an instruction that the RA finds easier to allocate when
unsplit. A common instance of this is multiword moves. An aarch64
example is:
(define_split
[(set (match_operand:VSTRUCT_2QD 0 "register_operand")
(match_operand:VSTRUCT_2QD 1 "register_operand"))]
"TARGET_FLOAT && reload_completed"
[(const_int 0)]
{
aarch64_simd_emit_reg_reg_move (operands, <VSTRUCT_ELT>mode, 2);
DONE;
})
In particular, the unsplit form allows input and output registers to
overlap. The RA would not allow overlap if the instructions were
split before RA (since the RA doesn't track the liveness of individual
SIMD registers in multi-register tuples).
This might become less of an issue in future, if the RA does become
able to track the liveness of individual registers in a multi-register
value.
There'll be other uses too, though.
Richard
I have modified the `zero_extendsidi_internal` template.
I think this writing conforms to the description of subreg in gccint.pdf.:-)
Thanks!
@@ -1766,18 +1766,13 @@ (define_insn_and_split "*zero_extendsidi2_internal"
ldx.wu\t%0,%1"
"&& reload_completed
&& MEM_P (operands[1])
- && (loongarch_14bit_shifted_offset_address_p (XEXP (operands[1], 0),
SImode)
- && !loongarch_12bit_offset_address_p (XEXP (operands[1], 0),
SImode))
- && !paradoxical_subreg_p (operands[0])"
+ && loongarch_14bit_shifted_offset_address_p (XEXP (operands[1], 0),
SImode)
+ && !loongarch_12bit_offset_address_p (XEXP (operands[1], 0), SImode)"
[(set (match_dup 3) (match_dup 1))
(set (match_dup 0)
- (ior:DI (zero_extend:DI
- (subreg:SI (match_dup 0) 0))
- (match_dup 2)))]
+ (zero_extend:DI (match_dup 3)))]
{
- operands[1] = gen_lowpart (SImode, operands[1]);
- operands[3] = gen_lowpart (SImode, operands[0]);
- operands[2] = const0_rtx;
+ operands[3] = gen_rtx_REG (SImode, REGNO (operands[0]));
}
[(set_attr "move_type" "arith,load,load,load")
(set_attr "mode" "DI")])