On 2025/8/18 at 7:28 PM, Xi Ruoyao wrote:
On Mon, 2025-08-18 at 11:10 +0800, Xi Ruoyao wrote:
On Mon, 2025-08-18 at 09:15 +0800, Lulu Cheng wrote:
Pushed to r16-3247 ... r16-3264.
Sorry it took so long to merge.
LoongArch: Implement 16-byte CAS with sc.q
Sorry, but it seems we need to revert this one in particular.
The implementation has a bug: in a CAS operation, if expected != original,
we still need to (atomically) read original into the pair of
registers. But my implementation fails to do that :(.
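For reference, this is the same contract that the standard C11 compare-exchange has: on failure, the value actually found in memory is written back to the caller's "expected" object. A minimal sketch follows, assuming a 64-bit object so that it does not depend on 16-byte atomic support; the helper name try_cas is hypothetical and used only for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of the patch): a thin wrapper around
   the standard C11 compare-exchange.  On success *obj becomes
   desired.  On failure *expected is overwritten with the value
   actually found in *obj, which is the step the 16-byte
   implementation has to get right as well.  */
static bool
try_cas (_Atomic uint64_t *obj, uint64_t *expected, uint64_t desired)
{
  return atomic_compare_exchange_strong (obj, expected, desired);
}
```

A caller that passes a stale expected value gets false back and finds the current contents of the object in its expected variable, so it can retry without a separate load.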
I tried to fix it but it seems difficult. The ll-sc loop will be
something like
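1:  ll.d  $t0, $a0, 0
    dbar
    ld.d  $t1, $a0, 8
    bne   $t0, $a1, 2f
    bne   $t1, $a2, 2f
    ori   $a3, $r0, 1       # set the success flag
    move  $t2, $a4          # success: store the desired value
    move  $t3, $a5
    b     3f
2:  move  $a3, $zero        # clear the success flag
    move  $t2, $t0          # failure: store back the value just read
    move  $t3, $t1
3:  sc.q  $t2, $t3, $a0, 0
    beqz  $t2, 1b           # retry if the sc.q failed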
Note that we still need to issue the sc.q even if the comparison fails
to ensure the atomicity of the load (i.e. if another thread changed the
content of memory between ll.d and ld.d, we shouldn't see a "torn"
state). But now it will trigger a page exception if expected !=
original and the memory is in a read-only page (see all the discussions
in PR 80878).
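To make the problem concrete, here is a small single-threaded C model of the loop sketched above. It is only an illustration and is not atomic itself: the types, the function name cas16_model and the stores counter are all hypothetical, and the sc.q is assumed to succeed on the first try so there is no retry. The point it shows is that a store is issued on both paths, which is what would fault on a read-only page.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model types, for illustration only.  */
struct pair { uint64_t lo, hi; };

struct model_mem
{
  struct pair value;
  unsigned stores;	/* Number of modelled sc.q stores.  */
};

/* Single-threaded model of the ll.d/ld.d/sc.q sequence.  The sc.q is
   assumed to succeed on the first attempt.  */
static bool
cas16_model (struct model_mem *m, struct pair *expected,
	     struct pair desired)
{
  struct pair orig = m->value;		/* ll.d + ld.d */
  bool ok = orig.lo == expected->lo && orig.hi == expected->hi;

  /* On success store the desired value, otherwise store back what
     was just read.  Either way a store is issued.  */
  m->value = ok ? desired : orig;	/* sc.q */
  m->stores++;

  if (!ok)
    *expected = orig;			/* Report the original value.  */
  return ok;
}
```

In this model a failed comparison still increments stores, even though the memory contents do not change.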
Thus IMO it's better to revert this for now until I can find a proper
solution.
I'm testing a patch using vldx and sc.q together to fix the issue.
Do you mean using vldx to load the value again when the atomic operation
does not succeed or the comparison fails?