Yvan,
That is correct. The addresses need to be aligned as per the restrictions in
the architecture. Yes, we could have an issue, but software writers need to
deal with it because, IIUC, the alternatives are either a huge performance
penalty (x86 / lock prefix) or correctness issues (ARM, PowerPC).
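For example, the caller can guarantee the required alignment along these
lines (a minimal sketch; the names are invented for illustration):

  #include <stdint.h>

  /* uint32_t is naturally 4-byte aligned; the attribute just makes the
     requirement for single-copy atomic word accesses explicit.  */
  static uint32_t flag __attribute__ ((aligned (4)));

  void set_flag (void)
  {
    __atomic_store_n (&flag, 1, __ATOMIC_SEQ_CST);
  }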
Cheers,
Ramana
From: Yvan Roux [mailto:yvan.r...@linaro.org]
Sent: 23 November 2012 10:29
To: Ramana Radhakrishnan
Cc: linaro-toolchain@lists.linaro.org
Subject: Re: Atomic builtins questions
Hi Ramana and Peter,
There is no issue in the first case. You are correct that the dmb's there are
to ensure the sequential consistency you'd want to see with __ATOMIC_SEQ_CST
in the call to the builtin. However, what you must remember is that STRs are
guaranteed to be single-copy atomic by the architecture on v7-A. In general on
v7-A, loads and stores to byte, half-word (2-byte aligned) and word (4-byte
aligned) addresses are single-copy atomic, i.e. ldrb, ldrh and ldr.
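For instance (an illustrative sketch; the exact code generation may differ),
a SEQ_CST word store expands to a plain str bracketed by barriers:

  void store_seq_cst (int *p, int v)
  {
    /* Typical v7-A expansion:
         dmb ish
         str v, [p]
         dmb ish
       The str itself is already single-copy atomic for an aligned word;
       the dmb's only provide the ordering.  */
    __atomic_store_n (p, v, __ATOMIC_SEQ_CST);
  }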
OK, but if I understand the ARM ARM atomicity chapter (A3.5.3) correctly,
single-copy atomicity is not guaranteed for unaligned accesses. Maybe I missed
something, but if it is allowed to call an atomic builtin on an unaligned
address, we could have an issue.
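For instance (a contrived sketch), a packed struct can hand the builtin an
unaligned address:

  #include <stdint.h>

  struct __attribute__ ((packed)) s
  {
    uint8_t pad;
    uint32_t word;   /* typically ends up 1-byte aligned */
  };

  uint32_t load_word (struct s *p)
  {
    /* The builtin accepts this, but per the ARM ARM (A3.5.3) the
       underlying access is not single-copy atomic when unaligned.  */
    return __atomic_load_n (&p->word, __ATOMIC_SEQ_CST);
  }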
Only on systems with LPAE enabled are ldrd and strd also guaranteed to be
single-copy atomic from 64-bit aligned addresses, so in general for v7-A you
should be using the ldrexd / strexd variant in this particular case. Thus the
code for the first example should work correctly as is and does *not* require
an ldrex / strex implementation.
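In builtin terms (sketch only):

  #include <stdint.h>

  uint64_t load64 (uint64_t *p)
  {
    /* Without LPAE a plain ldrd is not guaranteed to be single-copy
       atomic, so on v7-A this is expected to expand via ldrexd (and a
       64-bit store via an ldrexd/strexd loop) rather than ldrd/strd.  */
    return __atomic_load_n (p, __ATOMIC_SEQ_CST);
  }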
The second case always hurts my head - the short story is that you are right
and it looks like a bug. The barrier needs to be after the load and not
before, because `Acquire' semantics imply that no reads in the current thread
that depend on the value just loaded can be reordered before this load. What
this means is that loads before this operation can percolate downwards towards
the barrier.
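Concretely (a sketch), the correct mapping for an acquire load is:

  int load_acquire (int *p)
  {
    /* Correct v7-A sequence:
         ldr r0, [p]
         dmb ish        <-- barrier *after* the load
       The buggy expansion emits the dmb before the ldr instead.  */
    return __atomic_load_n (p, __ATOMIC_ACQUIRE);
  }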
The other clue to this is the definition of `Release' semantics, where no
writes in the current thread can be reordered after this store. This implies
that the write operation puts out a barrier *before* the write, which is what
we do with __atomic_store_n (addr, 0, __ATOMIC_RELEASE); and that appears to
be correct.
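i.e. (again a sketch) the release store maps to:

  void store_release (int *p)
  {
    /* v7-A sequence:
         dmb ish        <-- barrier *before* the store
         str ...
    */
    __atomic_store_n (p, 0, __ATOMIC_RELEASE);
  }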
I agree.
However what's not clear to me is why there is this deliberate choice in the
default implementation of expand_atomic_load to put out a memory fence before
the load and that seems to have percolated back down into the backend for
atomic_loaddi. We should take this up upstream as a bug.
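From the behaviour described, the default presumably reads roughly like this
(a reconstruction for illustration only, not checked against the actual
optabs.c sources):

  expand_mem_thread_fence (model);   /* fence *before* the load - the bug */
  emit_move_insn (target, mem);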
It is not clear to me either; I would have written expand_atomic_load like
this:

  if (model == MEMMODEL_SEQ_CST)
    expand_mem_thread_fence (model);

  emit_move_insn (target, mem);

  if (model == MEMMODEL_SEQ_CST || model == MEMMODEL_ACQUIRE)
    expand_mem_thread_fence (model);
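By the same reasoning the store side would be the mirror image (again only a
sketch, not the actual optabs.c code):

  if (model == MEMMODEL_SEQ_CST || model == MEMMODEL_RELEASE)
    expand_mem_thread_fence (model);

  emit_move_insn (mem, val);

  if (model == MEMMODEL_SEQ_CST)
    expand_mem_thread_fence (model);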
I'll ask the question upstream.
Yvan
_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain