*sending this email again, now in plain text

Hi Will,

I work at Huawei on verification of atomic primitives. I'm writing to you
because you are mentioned in several papers on ARM concurrency
(https://www.cl.cam.ac.uk/~pes20/papers/topics.html) and in GCC patches, and
you have authored several kernel patches in this area.

I've recently been looking into our implementation of atomic loads/stores on
ARMv7 and found that we treat stores specially - with an LDREXD/STREXD loop -
while a load is just a bare LDREXD.
The latest version of the manual (DDI 0406C.d) explicitly prohibits this,
saying in
A3.5.3 that "The way to atomically load two 32-bit quantities is to perform an 
LDREXD/STREXD sequence, reading and writing the same value, for which the 
STREXD succeeds, and use the read values."
Both GCC and LLVM produce the same code as we do
(https://godbolt.org/z/bYaWbEbjh). The explanation for this in GCC
(https://gcc.gnu.org/pipermail/gcc-patches/2012-April/338841.html), given by
Richard Earnshaw, is based on an older version of the ARM ARM (before issue
C.c), which says that LDREXD accesses are single-copy atomic:

--- C.b
+++ C.c
  In ARMv7, the single-copy atomic processor accesses are
  ***
- memory accesses caused by LDREXD and STREXD instructions to
-   doubleword-aligned locations.
+ Memory accesses caused by a LDREXD/STREXD to a doubleword-aligned location
+   for which the STREXD succeeds
+ cause single-copy atomic updates of the doubleword being accessed.

Interestingly, prior to issue C.c the LDREXD pseudocode contained a single
single-copy atomic memory access (0406C.b A8.8.77): MemA[address,8], whereas
now it contains two (0406C.d A8.8.78): MemA[address,4] and MemA[address+4,4].
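
For concreteness, the load sequence quoted from A3.5.3 (read the doubleword,
write the same value back, use the read value only if the store succeeds)
could be sketched in C roughly as follows. This is only an illustrative
sketch: the helper name load64_via_rmw is hypothetical, __atomic_fetch_add is
the GCC/Clang builtin, and it is an assumption that an ARMv7 compiler expands
it to an LDREXD/STREXD retry loop. A compiler is allowed to simplify an
idempotent read-modify-write, so the generated code has to be checked.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the kernel, GCC or LLVM.
 * Loads a 64-bit value through an atomic read-modify-write that stores
 * back the value it read, instead of through a plain load. */
static inline uint64_t load64_via_rmw(uint64_t *p)
{
    /* LDREXD/STREXD operate on doubleword-aligned locations. */
    assert(((uintptr_t)p & 7u) == 0);

    /* Adding 0 leaves the stored value unchanged but still requests a
     * full read-modify-write. Assumption: on ARMv7 this becomes an
     * LDREXD/STREXD loop that retries until the STREXD succeeds. */
    return __atomic_fetch_add(p, (uint64_t)0, __ATOMIC_RELAXED);
}
```

One consequence of this form is that the pointer cannot be const: the
location must be writable, which a load implemented as a bare LDREXD does not
require.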

Regarding LPAE, the prose and the pseudocode disagree on the atomicity of
LDRD/STRD. In the prose, LDRD/STRD are atomic only for locations that might
be used to hold translations, while in the pseudocode they are always atomic.
LLVM doesn't change its code output for LPAE. GCC produces a regular LDRD for
loads and keeps the LDREXD/STREXD loop for stores. In the kernel both loads
and stores are regular LDRD/STRD.

I've run some litmus tests on a bunch of boards. Here are the results:
- Cortex-A7 (BCM2836, RK3128), Cortex-A9 (Exynos4412) and Cortex-A17 (RK3188) -
LDRD/STRD are single-copy atomic;
- Cortex-A5 (MSM8625Q) - LDRD/STRD are not single-copy atomic, but it's enough
to use LDREXD for loads and STRD for stores to fix it: two writers with STRD
and one reader with LDREXD don't produce inconsistent results.
Interestingly, on Cortex-A9 (without LPAE) regular LDRD/STRD are atomic.
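
The reader-side check in such a test could look roughly like the sketch
below. It is not the harness used for the results above: the patterns, the
names is_torn, count_torn and load64_builtin, and the load64_fn hook are all
hypothetical, thread creation and the writer loops are omitted, and the
stand-in load uses the GCC/Clang __atomic_load_n builtin. On a real board the
hook would wrap the instruction under test (LDRD or LDREXD).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical writer patterns. Both 32-bit halves of each value are
 * equal, so a mix of halves from different writes is detectable. */
#define PATTERN_A 0x0000000000000000ULL
#define PATTERN_B 0xFFFFFFFFFFFFFFFFULL

/* Hook for the 64-bit load under test. */
typedef uint64_t (*load64_fn)(uint64_t *);

/* Returns 1 if v matches neither writer pattern, i.e. the read was torn. */
static int is_torn(uint64_t v)
{
    return v != PATTERN_A && v != PATTERN_B;
}

/* Stand-in load for hosts other than the board under test. */
static uint64_t load64_builtin(uint64_t *p)
{
    return __atomic_load_n(p, __ATOMIC_RELAXED);
}

/* Reader body: loads the shared doubleword `iterations` times and
 * counts how many observations were torn. */
static size_t count_torn(uint64_t *shared, load64_fn load, size_t iterations)
{
    size_t torn = 0;

    assert(load != NULL);
    for (size_t i = 0; i < iterations; i++) {
        if (is_torn(load(shared)))
            torn++;
    }
    return torn;
}
```

With two writers repeatedly storing PATTERN_A and PATTERN_B, a non-zero count
from the reader indicates that the load/store pair under test is not
single-copy atomic on that core.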

Can you shed some light on the situation with LDREXD/STREXD? Why was the
manual changed? Do you think we should change the implementation in the
kernel and elsewhere to what the manual suggests?
Also, regarding LPAE, the manual doesn't require that all locations in the
memory system are 64-bit single-copy atomic - only those that might be used
to hold translations, "such as bulk SDRAM". Does this mean we can safely use
LDRD/STRD?

Related patches/discussion:
http://lists.infradead.org/pipermail/linux-arm-kernel/2009-December/005934.html
https://lists.infradead.org/pipermail/linux-arm-kernel/2013-March/157817.html
https://gcc.gnu.org/pipermail/gcc-patches/2012-April/338781.html
https://gcc.gnu.org/pipermail/gcc-patches/2016-February/442717.html
https://reviews.llvm.org/rGc882eb0723afa9dfe626eebb9699c1871a8bbbab

---
Peter
