loongarch64: Implement 128-bit load & store

bibo mao Mon, 04 Sep 2023 02:44:08 -0700

On 2023/9/4 09:43, gaosong wrote:
> Hi, yijun
> 
> On 2023/9/3 9:10 AM, Jiajie Chen wrote:
>>
>> On 2023/9/3 09:06, Richard Henderson wrote:
>>> On 9/1/23 22:02, Jiajie Chen wrote:
>>>> If LSX is available, use LSX instructions to implement 128-bit load &
>>>> store.
>>>
>>> Is this really guaranteed to be an atomic 128-bit operation?
>>>
>>
>> Song Gao, please check this.
>>
>>
> Could you explain this issue?  Thanks.
If the address is 16-byte aligned, the 128-bit load/store is atomic.
Otherwise it is not atomic, since the access may cross two cache lines or pages.
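
To make the alignment condition concrete, here is a minimal sketch in C. The
helper names and the 64-byte cache-line size are assumptions for illustration
only; they are not taken from the patch or from the LoongArch manual.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed cache-line size, for illustration; the real value is
 * implementation-specific. */
#define CACHE_LINE_SIZE 64u

/* Hypothetical helper: true if a 16-byte access at addr is naturally aligned. */
static bool is_aligned_16(uint64_t addr)
{
    return (addr & 15u) == 0;
}

/* Hypothetical helper: true if the 16 bytes starting at addr span two
 * cache lines, i.e. the first and last byte fall in different lines. */
static bool crosses_cache_line(uint64_t addr)
{
    return (addr / CACHE_LINE_SIZE) != ((addr + 15u) / CACHE_LINE_SIZE);
}
```

A 16-byte-aligned access can never straddle a 64-byte line, which is why the
aligned case is the one that can be treated as a single atomic operation.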

Regards
Bibo Mao
> 
>>> Or, as for many vector processors, is this really two separate 64-bit 
>>> memory operations under the hood?
>>>
>>>
>>>> +static void tcg_out_qemu_ldst_i128(TCGContext *s, TCGReg data_lo, TCGReg data_hi,
>>>> +                                   TCGReg addr_reg, MemOpIdx oi, bool is_ld)
>>>> +{
>>>> +    TCGLabelQemuLdst *ldst;
>>>> +    HostAddress h;
>>>> +
>>>> +    ldst = prepare_host_addr(s, &h, addr_reg, oi, true);
>>>> +    if (is_ld) {
>>>> +        tcg_out_opc_vldx(s, TCG_VEC_TMP0, h.base, h.index);
>>>> +        tcg_out_opc_vpickve2gr_d(s, data_lo, TCG_VEC_TMP0, 0);
>>>> +        tcg_out_opc_vpickve2gr_d(s, data_hi, TCG_VEC_TMP0, 1);
>>>> +    } else {
>>>> +        tcg_out_opc_vinsgr2vr_d(s, TCG_VEC_TMP0, data_lo, 0);
>>>> +        tcg_out_opc_vinsgr2vr_d(s, TCG_VEC_TMP0, data_hi, 1);
>>>> +        tcg_out_opc_vstx(s, TCG_VEC_TMP0, h.base, h.index);
>>>> +    }
>>>
>>> You should use h.aa.atom < MO_128 to determine if 128-bit atomicity, and 
>>> therefore the vector operation, is required.  I assume the gr<->vr moves 
>>> have a cost and two integer operations are preferred when allowable.
>>>
>>> Compare the other implementations of this function.
>>>
>>>
>>> r~
>
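
The selection rule suggested in the quoted review (take the vector path only
when 128-bit atomicity is required, otherwise prefer two integer operations)
could be modelled roughly as below. This is a standalone sketch, not QEMU
code: ATOM_64, ATOM_128, LdstPlan and plan_i128() are hypothetical stand-ins
for the h.aa.atom < MO_128 comparison, and nothing here emits instructions.

```c
/* Stand-ins for a log2-of-bytes size encoding (8 bytes -> 3, 16 bytes -> 4).
 * Illustrative names only, not the backend's own definitions. */
enum { ATOM_64 = 3, ATOM_128 = 4 };

/* The two code-generation strategies discussed in the thread. */
typedef enum {
    EMIT_TWO_64BIT_OPS,  /* two integer loads/stores, no gr<->vr moves */
    EMIT_ONE_VECTOR_OP   /* single vldx/vstx plus gr<->vr moves */
} LdstPlan;

/* required_atom models the atomicity the memory operation asks for.
 * Below 128 bits, two integer operations are sufficient and cheaper;
 * otherwise a single 128-bit vector access is needed. */
static LdstPlan plan_i128(int required_atom)
{
    if (required_atom < ATOM_128) {
        return EMIT_TWO_64BIT_OPS;
    }
    return EMIT_ONE_VECTOR_OP;
}
```

For example, plan_i128(ATOM_64) selects the two-integer path, while
plan_i128(ATOM_128) selects the vector path.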
Re: [PATCH v3 16/16] tcg/loongarch64: Implement 128-bit load & store
