> On 6 May 2015, at 18:19, alvise rigo <[email protected]> wrote:
>
> Hi Mark,
>
> Firstly, thank you for your feedback.
>
> On Wed, May 6, 2015 at 5:55 PM, Mark Burton <[email protected]> wrote:
>> A massive thank you for doing this work Alvise,
>>
>> On our side, the patch we suggested is only applicable for ARM, though the
>> mechanism would work for any CPU,
>> - BUT
>> It doesn’t force atomic instructions out through the slow path. This is
>> either a very good thing (it’s much faster), or a very bad thing (it doesn’t
>> allow you to treat them in the IO space), depending on your point of view.
>
> Indeed, this is for sure a more invasive approach, but it's made on
> purpose to have control over those non-atomic stores that might modify
> the 'linked' memory.
>
exactly :-)

Cheers
Mark.

>>
>> Depending on what the rest of the community thinks, it seems to me we should
>> apply both patches so that e.g. ARM’s existing atomic instructions run much
>> faster and above all more ‘accurately’ - (with the patch we’ve provided),
>> and the same mechanism can be applied to all other architectures - but we
>> can - somehow - swap for this more ‘controllable’ implementation when e.g.
>> the mutex is located in IO space….
>
> Yes, this makes sense.
>
> Thank you,
> alvise
>
>>
>> Cheers
>>
>> Mark.
>>
>>> On 6 May 2015, at 17:38, Alvise Rigo <[email protected]> wrote:
>>>
>>> This patch series provides an infrastructure for atomic
>>> instruction implementation in QEMU, paving the way for TCG multi-threading.
>>> The adopted design does not rely on host atomic
>>> instructions and is intended to propose a 'legacy' solution for
>>> translating guest atomic instructions.
>>>
>>> The underlying idea is to provide new TCG instructions that guarantee
>>> atomicity for some memory accesses or, more generally, a way to define
>>> memory transactions. More specifically, a new pair of TCG instructions
>>> is implemented, qemu_ldlink_i32 and qemu_stcond_i32, that behave as
>>> LoadLink and StoreConditional primitives (only the 32-bit variant is
>>> implemented). In order to achieve this, a new bitmap is added to the
>>> ram_list structure (always unique) which flags all memory pages that
>>> could not be accessed directly through the fast-path, due to previous
>>> exclusive operations. This new bitmap is coupled with a new TLB flag
>>> which forces the slow-path execution. Any store by another vCPU to the
>>> same memory page that takes place between a LoadLink and its
>>> StoreConditional will make that StoreConditional fail.
>>>
>>> In theory, the provided implementation of TCG LoadLink/StoreConditional
>>> can be used to properly handle atomic instructions on any architecture.
>>>
>>> The new slow-path is implemented such that:
>>> - the LoadLink behaves as a normal load slow-path, except for clearing
>>>   the dirty flag in the bitmap. The TLB entries created from now on will
>>>   force the slow-path. To ensure this, we flush the TLB cache for the
>>>   other vCPUs.
>>> - the StoreConditional behaves as a normal store slow-path, except for
>>>   checking the state of the dirty bitmap and returning 0 or 1 depending
>>>   on whether the StoreConditional succeeded (0 when no vCPU has touched
>>>   the same memory in the meantime).
>>>
>>> All those write accesses that are forced to follow the 'legacy'
>>> slow-path will set the accessed memory page to dirty.
>>>
>>> In this series only the ARM ldrex/strex instructions are implemented.
>>> The code was tested with bare-metal test cases and with Linux, using
>>> upstream QEMU.
>>>
>>> This work has been sponsored by Huawei Technologies Dusseldorf GmbH.
>>>
>>> Alvise Rigo (5):
>>>   exec: Add new exclusive bitmap to ram_list
>>>   Add new TLB_EXCL flag
>>>   softmmu: Add helpers for a new slow-path
>>>   tcg-op: create new TCG qemu_ldlink and qemu_stcond instructions
>>>   target-arm: translate: implement qemu_ldlink and qemu_stcond ops
>>>
>>>  cputlb.c                |  11 ++-
>>>  include/exec/cpu-all.h  |   1 +
>>>  include/exec/cpu-defs.h |   2 +
>>>  include/exec/memory.h   |   3 +-
>>>  include/exec/ram_addr.h |  19 +++-
>>>  softmmu_llsc_template.h | 233 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>  softmmu_template.h      |  52 ++++++++++-
>>>  target-arm/translate.c  |  94 ++++++++++++++++++-
>>>  tcg/arm/tcg-target.c    | 105 ++++++++++++++++------
>>>  tcg/tcg-be-ldst.h       |   2 +
>>>  tcg/tcg-op.c            |  20 +++++
>>>  tcg/tcg-op.h            |   3 +
>>>  tcg/tcg-opc.h           |   4 +
>>>  tcg/tcg.c               |   2 +
>>>  tcg/tcg.h               |  20 +++++
>>>  15 files changed, 538 insertions(+), 33 deletions(-)
>>>  create mode 100644 softmmu_llsc_template.h
>>>
>>> --
>>> 2.4.0
>>>
>>
>>
>> +44 (0)20 7100 3485 x 210
>> +33 (0)5 33 52 01 77x 210
>>
>> +33 (0)603762104
>> mark.burton
>>
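The LoadLink/StoreConditional behaviour described in the cover letter above can be modelled in a few lines of C. This is only an illustrative sketch under stated assumptions: every name in it (ram, dirty, VCPU, load32, store_slowpath, ldlink, stcond, PAGE_BITS) is hypothetical and is not QEMU's API, and it ignores TLBs, the fast path and real host concurrency. Like the series, it tracks links at page granularity and makes StoreConditional return 0 on success and 1 on failure (the ARM strex convention), but it deliberately keeps one dirty flag per (vCPU, page) pair rather than a single shared bitmap, so that one vCPU's LoadLink cannot clear the record of a store that another vCPU's pending link should see.

```c
/* Hypothetical single-threaded model of page-granular LL/SC; not QEMU code. */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BITS 12                    /* assumed 4 KiB pages */
#define NUM_PAGES 4
#define NUM_VCPUS 2
#define RAM_SIZE  (NUM_PAGES << PAGE_BITS)

/* Guest RAM; callers must pass in-range addresses (no bounds checks). */
static uint8_t ram[RAM_SIZE];

/* dirty[cpu][page] becomes true once any store hits the page after this
 * vCPU's last LoadLink on it (hypothetical stand-in for the bitmap). */
static bool dirty[NUM_VCPUS][NUM_PAGES];

typedef struct {
    int index;        /* vCPU number, used to select its dirty flags */
    bool linked;      /* is a LoadLink outstanding? */
    uint32_t addr;    /* address of the outstanding LoadLink */
} VCPU;

static unsigned page_of(uint32_t addr) { return addr >> PAGE_BITS; }

static uint32_t load32(uint32_t addr)
{
    uint32_t v;
    memcpy(&v, &ram[addr], sizeof v);
    return v;
}

/* Every store that goes through the slow path marks its page dirty for
 * all vCPUs, which makes any pending StoreConditional on that page fail.
 * In this model that includes the linking vCPU's own stores. */
static void store_slowpath(uint32_t addr, uint32_t val)
{
    memcpy(&ram[addr], &val, sizeof val);
    for (int i = 0; i < NUM_VCPUS; i++) {
        dirty[i][page_of(addr)] = true;
    }
}

/* LoadLink: an ordinary load that also clears this vCPU's dirty flag. */
static uint32_t ldlink(VCPU *cpu, uint32_t addr)
{
    dirty[cpu->index][page_of(addr)] = false;
    cpu->linked = true;
    cpu->addr = addr;
    return load32(addr);
}

/* StoreConditional: 0 on success, 1 on failure. */
static int stcond(VCPU *cpu, uint32_t addr, uint32_t val)
{
    bool ok = cpu->linked && cpu->addr == addr &&
              !dirty[cpu->index][page_of(addr)];
    cpu->linked = false;          /* the link is consumed either way */
    if (!ok) {
        return 1;
    }
    store_slowpath(addr, val);    /* succeeding SC also dirties the page */
    return 0;
}
```

For example, if vCPU 0 performs ldlink on an address and any store then lands anywhere in the same 4 KiB page, vCPU 0's following stcond returns 1 and leaves memory unchanged, while a store to a different page does not disturb the link. Page granularity means some failures are spurious, which is safe for LL/SC because guest code retries the sequence.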
