On 2023/9/4 09:43, gaosong wrote:
> Hi, yijun
>
> On 2023/9/3 9:10 AM, Jiajie Chen wrote:
>>
>> On 2023/9/3 09:06, Richard Henderson wrote:
>>> On 9/1/23 22:02, Jiajie Chen wrote:
>>>> If LSX is available, use LSX instructions to implement 128-bit load &
>>>> store.
>>>
>>> Is this really guaranteed to be an atomic 128-bit operation?
>>>
>>
>> Song Gao, please check this.
>>
>>
> Could you explain this issue? Thanks.
If the address is 16-byte aligned, the 128-bit load/store is atomic.
Otherwise it is not atomic, since the access may cross two cache lines or pages.
Regards
Bibo Mao
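
The alignment condition above can be sketched in C. This is only an
illustration: the helper names and the 64-byte line size used in the
usage note are assumptions, not taken from the patch or from any
LoongArch documentation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if addr is a multiple of 16 bytes. */
static bool is_16_byte_aligned(uintptr_t addr)
{
    return (addr & 0xf) == 0;
}

/*
 * Hypothetical helper: true if a 16-byte access starting at addr
 * touches two different lines of size line_size (bytes).
 */
static bool crosses_line(uintptr_t addr, uintptr_t line_size)
{
    return (addr / line_size) != ((addr + 15) / line_size);
}
```

With an assumed line size of 64 bytes, a 16-byte aligned address such as
0x1030 stays within one line, while the unaligned address 0x1038 spans
two lines.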
>
>>> Or, as for many vector processors, is this really two separate 64-bit
>>> memory operations under the hood?
>>>
>>>
>>>> +static void tcg_out_qemu_ldst_i128(TCGContext *s, TCGReg data_lo, TCGReg data_hi,
>>>> +                                   TCGReg addr_reg, MemOpIdx oi, bool is_ld)
>>>> +{
>>>> +    TCGLabelQemuLdst *ldst;
>>>> +    HostAddress h;
>>>> +
>>>> +    ldst = prepare_host_addr(s, &h, addr_reg, oi, true);
>>>> +    if (is_ld) {
>>>> +        tcg_out_opc_vldx(s, TCG_VEC_TMP0, h.base, h.index);
>>>> +        tcg_out_opc_vpickve2gr_d(s, data_lo, TCG_VEC_TMP0, 0);
>>>> +        tcg_out_opc_vpickve2gr_d(s, data_hi, TCG_VEC_TMP0, 1);
>>>> +    } else {
>>>> +        tcg_out_opc_vinsgr2vr_d(s, TCG_VEC_TMP0, data_lo, 0);
>>>> +        tcg_out_opc_vinsgr2vr_d(s, TCG_VEC_TMP0, data_hi, 1);
>>>> +        tcg_out_opc_vstx(s, TCG_VEC_TMP0, h.base, h.index);
>>>> +    }
>>>
>>> You should use h.aa.atom < MO_128 to determine if 128-bit atomicity, and
>>> therefore the vector operation, is required. I assume the gr<->vr moves
>>> have a cost and two integer operations are preferred when allowable.
>>>
>>> Compare the other implementations of this function.
>>>
>>>
>>> r~
>
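
The review suggestion to test h.aa.atom < MO_128 can be sketched as a
small decision function. This is a standalone illustration, not the
backend code: the MO_64/MO_128 values are stand-ins assumed for this
sketch, and LdstStrategy and pick_i128_strategy are hypothetical names.

```c
/* Stand-ins for the MemOp size codes; values assumed for this sketch. */
enum { MO_64 = 3, MO_128 = 4 };

/* Hypothetical type naming the two ways to emit a 128-bit access. */
typedef enum {
    LDST_TWO_64BIT,  /* two integer 64-bit accesses, no gr<->vr moves */
    LDST_ONE_VECTOR  /* one 128-bit vector access plus gr<->vr moves */
} LdstStrategy;

/*
 * Hypothetical helper: if the required atomicity is below 128 bits,
 * two integer accesses are allowed; otherwise a single vector access
 * is needed.
 */
static LdstStrategy pick_i128_strategy(int atom)
{
    return atom < MO_128 ? LDST_TWO_64BIT : LDST_ONE_VECTOR;
}
```

For example, pick_i128_strategy(MO_64) selects the two-integer path,
and pick_i128_strategy(MO_128) selects the vector path.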