Re: Fwd: Re: GCC libatomic questions
Hi,

I have a revised version of the libatomic ABI draft which tries to accommodate Richard's comments. The new version is attached, and the diff is appended below.

Thanks,
- Bin

diff ABI.txt ABI-1.1.txt
28a29,30
> - The versioning of the library external symbols
>
47a50,57
> Note
>
> Some 64-bit x86 ISA does not support the cmpxchg16b instruction, for
> example, some early AMD64 processors and later Intel Xeon Phi co-
> processor. Whether cmpxchg16b is supported may affect the ABI
> specification for certain atomic types. We will discuss the detail
> where it has an impact.
>
101c111,112
< _Atomic __int128                 16  16     N    not applicable
---
> _Atomic __int128 (with at16)     16  16     Y    not applicable
> _Atomic __int128 (w/o at16)      16  16     N    not applicable
105c116,117
< _Atomic long double              16  16     N    12  4      N
---
> _Atomic long double (with at16)  16  16     Y    12  4      N
> _Atomic long double (w/o at16)   16  16     N    12  4      N
106a119,120
> _Atomic double _Complex          16  16(8)  Y    16  16(8)  N
>   (with at16)
107a122
>   (w/o at16)
110a126,127
> _Atomic long double _Imaginary   16  16     Y    12  4      N
>   (with at16)
111a129
>   (w/o at16)
146a165,167
> with at16 means the ISA supports cmpxchg16b, w/o at16 means the ISA
> does not support cmpxchg16b.
>
191a213,214
> _Atomic struct {char a[16];}     16  16(1)  Y    16  16(1)  N
>   (with at16)
192a216
>   (w/o at16)
208a233,235
> with at16 means the ISA supports cmpxchg16b, w/o at16 means the ISA
> does not support cmpxchg16b.
>
246a274,276
> On the 64-bit x86 platform which supports the cmpxchg16b instruction,
> 16-byte atomic types whose alignment matches the size is inlineable.
>
303,306c333,338
< CMPXCHG16B is not always available on 64-bit x86 platforms, so 16-byte
< naturally aligned atomics are not inlineable. The support functions for
< such atomics are free to use lock-free implementation if the instruction
< is available on specific platforms.
---
> "Inlineability" is a compile time property, which in most cases depends
> only on the type. In a few cases it also depends on whether the target
> ISA supports the cmpxchg16b instruction. A compiler may get the ISA
> information by either compilation flags or inquiring the hardware
> capabilities. When the hardware capabilities information is not available,
> the compiler should assume the cmpxchg16b instruction is not supported.
665a698,705
> The function takes the size of an object and an address which
> is one of the following three cases
> - the address of the object
> - a faked address that solely indicates the alignment of the
>   object's address
> - NULL, which means that the alignment of the object matches size
> and returns whether the object is lock-free.
>
711c751
< 5. Libatomic Assumption on Non-blocking Memory Instructions
---
> 5. Libatomic symbol versioning
712a753,868
> Here is the mapfile for symbol versioning of the libatomic library
> specified by this ABI specification
>
> LIBATOMIC_1.0 {
>   global:
>     __atomic_load;
>     __atomic_store;
>     __atomic_exchange;
>     __atomic_compare_exchange;
>     __atomic_is_lock_free;
>
>     __atomic_add_fetch_1;
>     __atomic_add_fetch_2;
>     __atomic_add_fetch_4;
>     __atomic_add_fetch_8;
>     __atomic_add_fetch_16;
>     __atomic_and_fetch_1;
>     __atomic_and_fetch_2;
>     __atomic_and_fetch_4;
>     __atomic_and_fetch_8;
>     __atomic_and_fetch_16;
>     __atomic_compare_exchange_1;
>     __atomic_compare_exchange_2;
>     __atomic_compare_exchange_4;
>     __atomic_compare_exchange_8;
>     __atomic_compare_exchange_16;
>     __atomic_exchange_1;
>     __atomic_exchange_2;
>     __atomic_exchange_4;
>     __atomic_exchange_8;
>     __atomic_exchange_16;
>     __atomic_fetch_add_1;
>     __atomic_fetch_add_2;
>     __atomic_fetch_add_4;
>     __atomic_fetch_add_8;
>     __atomic_fetch_add_16;
>     __atomic_fetch_and_1;
>     __atomic_fetch_and_2;
>     __atomic_fetch_and_4;
>     __atomic_fetch_and_8;
>     __atomic_fetch_and_16;
>     __atomic_fetch_nand_1;
>     __atomic_fetch_nand_2;
>     __atomic_fetch_nand_4;
>     __atomic_fetch_nand_8;
>     __atomic_fetch_nand_16;
>     __atomic_fetch_or_1;
>     __atomic_fetch_or_2;
>     __atomic_fetch_or_4;
>     __atomic_fetch_or_8;
>     __atomic_fetch_or_16;
>     __atomic_fetch_sub_1;
>     __atomic_fetch_sub_2;
>     __atomic_fetch_sub_4;
>     __atomic_fetch_sub_8;
>     __atomic_fetch_sub_16;
>     __atomic_fetch_xor_1;
>     __atomic_fetch_xor_2;
>     __atomic_fetch_xor_4;
>     __atomic_fetch_xor_8;
>     __atomic_fetch_xor_16;
>     __atomic_load_1;
>     __atomic_load_2;
>     __atomic_l
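The 665a698 hunk lets the caller of __atomic_is_lock_free pass a real address, a faked address carrying only the alignment, or NULL. The C sketch below is a hypothetical, simplified model of how those three cases could be interpreted; `model_is_lock_free` and `have_cmpxchg16b` are illustrative names, not libatomic symbols, and the real libatomic logic may well be more permissive (for example, for small objects that fit inside a larger aligned word).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for "with at16": whether the ISA supports
 * cmpxchg16b. A real runtime would detect this at run time. */
static bool have_cmpxchg16b = false;

/* Hypothetical, simplified model of the three address cases described for
 * __atomic_is_lock_free. This is NOT libatomic's actual implementation. */
static bool model_is_lock_free(size_t size, const void *ptr)
{
    /* Sizes assumed to have a hardware-backed atomic instruction on
     * x86-64; 16 bytes only when cmpxchg16b is available. */
    bool supported = size == 1 || size == 2 || size == 4 || size == 8 ||
                     (size == 16 && have_cmpxchg16b);
    if (!supported)
        return false;

    /* Case 3: NULL means the object's alignment matches its size. */
    if (ptr == NULL)
        return true;

    /* Cases 1 and 2: a real address, or a faked one that only encodes the
     * alignment. Either way, only the low-order bits matter. */
    return (uintptr_t)ptr % size == 0;
}
```

A per-type caller would pass NULL, or a faked address such as `(const void *)(uintptr_t)4` to describe 4-byte alignment; a per-object caller passes the object's real address. Because only the alignment is inspected, all three forms share one code path.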
GCC libatomic ABI specification draft
Got an error from the gcc@gcc.gnu.org alias, so I removed the PDF attachment and am re-sending it to the alias...

On 11/14/2016 4:34 PM, Bin Fan wrote:

Hi All,

I have an updated version of the libatomic ABI specification draft. Please take a look to see if it matches the GCC implementation. The purpose of this document is to establish an official GCC libatomic ABI, and to allow compatible compiler and runtime implementations on the affected platforms.

Compared to the last version you reviewed, here are the major updates:

- Rewrite the notes in N2.3.2 to explicitly mention that the implementation of __atomic_compare_exchange follows memcmp/memcpy semantics, and the consequences of this.
- Rewrite section 3 to replace "lock-free" operations with "hardware backed" instructions. The digest of this section is:
  1) inlineable atomics must be implemented with hardware-backed atomic instructions;
  2) for non-inlineable atomics, the compiler must generate a runtime call, and the runtime support function is free to use any implementation.
- The Rationale section in section 3 is also revised to remove the mention of "lock-free", but there is no major change of concept.
- Add note N3.1 to emphasize the assumption of general hardware-supported atomic instructions.
- Add note N3.2 to discuss the issues with cmpxchg16b.
- Add a paragraph in section 4.1 to specify that memory_order_consume must be implemented through memory_order_acquire. Section 4.2 emphasizes it again.
- The specification of each runtime function mostly maps to the corresponding generic function in the C11 standard. Two functions are worth noting:
  1) C11 atomic_compare_exchange compares and updates the "value", while the __atomic_compare_exchange functions in this ABI compare and update the "memory", which implies memcmp and memcpy semantics.
  2) The specification of __atomic_is_lock_free allows both a per-object result and a per-type result. A per-type implementation could pass NULL or a faked address as the address of the object. A per-object implementation could pass the actual address of the object.

Thanks,
- Bin

On 8/10/2016 3:33 PM, Bin Fan wrote:

Hi Torvald,

Thanks a lot for your review. Please find my response inline...

On 8/5/2016 8:51 AM, Torvald Riegel wrote:

[CC'ing Andrew MacLeod, who has been working on the atomics too.]

On Tue, 2016-08-02 at 16:28 -0700, Bin Fan wrote:

I'm wondering if you have had a chance to review the revised libatomic ABI draft. The email was rejected by the gcc alias once due to some HTML in the email text. Though I resent a plain-text version, I'm not sure whether it went through, so this time I dropped the gcc alias.

If you do not have any issues, I'm wondering if this ABI draft could be published in some GCC wiki or documentation? I'd be happy to prepare a version without the "notes" part.

Because the padding of structure types is not affected by the _Atomic modifier, the contents of any padding in an atomic structure object are still undefined; therefore the atomic compare-and-exchange operation on such objects may fail due to differences in the padding.

I think this isn't quite clear.

This paragraph is just to clarify that _Atomic does not change the padding bits (e.g., by zeroing them out), whose contents were undefined in the current SPARC and x86 ABI specifications and will still be undefined for _Atomic aggregates. This paragraph is part of the "notes" rather than the main body of the ABI draft. If it is not clear, I will change it by mentioning the memcmp/memcpy-like semantics.

Perhaps it's easier to describe it in the way that C++ does, referring to the memcmp/memcpy-like semantics of compare_exchange (e.g., see N4606 29.6.5p27). C11 isn't quite clear about this, or I am misunderstanding what they really mean by "value of the object" (see N1570 7.17.7.4p2).
This is the subject of C11 Defect Report 431: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2059.htm#dr_431 which has been resolved to align with the C++ standard and closed with a Proposed Technical Corrigendum that will appear in the next revision of the C standard (~2017). Note that in section 4.2 of this ABI draft, the function description of __atomic_compare_exchange uses "compares the memory pointed to by object" instead of "compares the value pointed to by object" as you quoted from N1570 7.17.7.4p. Since you asked whether you should review the function descriptions, this is one of the two cases worth noting. I will mention the other one later in this email.

Lock-free atomic operations do not require runtime support functions. The compiler may generate inlined code for efficiency. This ABI specification defines a few inlineable atomic types. An atomic type being inlineable means the compiler may generate an inlined instruction sequence for atomic operations on such types. The
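To make the "compares the memory" wording concrete, here is a hypothetical lock-based sketch of a generic compare-and-exchange with memcmp/memcpy semantics, plus a structure whose padding differs while its members are equal. `model_compare_exchange`, `model_lock`, `struct padded` and `fill_padding` are illustrative names only; the single global spinlock is a simplification (a real runtime might hash addresses to several locks), and the padding layout assumes a typical ABI where `int` is 4-byte aligned.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical global spinlock protecting every object (a simplification). */
static atomic_flag model_lock = ATOMIC_FLAG_INIT;

/* Hypothetical sketch: compare and update raw memory, not member values. */
static bool model_compare_exchange(size_t size, void *object,
                                   void *expected, const void *desired)
{
    bool equal;

    while (atomic_flag_test_and_set_explicit(&model_lock, memory_order_acquire))
        ; /* spin until the lock is free */

    /* Byte-wise comparison: padding bytes take part in the comparison. */
    equal = memcmp(object, expected, size) == 0;
    if (equal)
        memcpy(object, desired, size);   /* success: store desired bytes */
    else
        memcpy(expected, object, size);  /* failure: report current bytes */

    atomic_flag_clear_explicit(&model_lock, memory_order_release);
    return equal;
}

/* Example type with padding between c and i on typical ABIs. */
struct padded {
    char c;
    int i;
};

/* Hypothetical helper: overwrite only the padding bytes that follow c. */
static void fill_padding(struct padded *p, unsigned char byte)
{
    size_t start = offsetof(struct padded, c) + sizeof p->c;
    size_t len = offsetof(struct padded, i) - start;
    memset((unsigned char *)p + start, byte, len);
}
```

If two objects hold the same `c` and `i` but different padding bytes, the first exchange fails and copies the current bytes into `expected`; a retry then succeeds. This is the retry pattern a caller needs when padding is undefined.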
Re: GCC libatomic ABI specification draft
Hi Szabolcs,

> On Nov 29, 2016, at 3:11 AM, Szabolcs Nagy wrote:
>
> On 17/11/16 20:12, Bin Fan wrote:
>>
>> Although this ABI specification specifies that 16-byte properly aligned
>> atomics are inlineable on platforms
>> supporting cmpxchg16b, we document the caveats here for further discussion.
>> If we decide to change the
>> inlineable attribute for those atomics, then this ABI, the compiler and the
>> runtime implementation should be
>> updated together at the same time.
>>
>>
>> The compiler and runtime need to check the availability of cmpxchg16b to
>> implement this ABI specification.
>> Here is how it would work: The compiler can get the information either from
>> the compiler flags or by
>> inquiring the hardware capabilities. When the information is not available,
>> the compiler should assume that
>> cmpxchg16b instruction is not supported. The runtime library implementation
>> can also query the hardware
>> compatibility and choose the implementation at runtime. Assuming the user
>> provides correct compiler options
>
> with this abi the runtime implementation *must* query the hardware
> (because there might be inlined cmpxchg16b in use in another module
> on a hardware that supports it and the runtime must be able to sync
> with it).

Thanks for the comment. Yes, the ABI requires that libatomic query the hardware. This is necessary if we want the compiler to generate inlined code for 16-byte atomics. Note that this particular issue only affects x86. I notice GCC already has a few helper functions declared in cpuid.h. These functions are x86-specific. So couldn't the query be done with those functions?

>
> currently gcc libatomic does not guarantee this which is dangerously
> broken: if gcc is configured with --disable-gnu-indirect-function
> (or on targets without ifunc support: solaris, bsd, android, musl,..)
> the compiler may inline cmpxchg16b in one translation unit but use
> incompatible runtime function in another.
>
> there is PR 70191 but this issue has wider scope.

This issue was actually found by us while we were working on the ABI draft, so we filed the bug, and we think it should be fixed. Having the compiler inline 16-byte atomics has other issues, as noted in the ABI draft. The alternative is to stop inlining those atomics, but that would require a compiler fix.

Thanks,
- Bin

>
>> and the inquiry returns the correct information, on a platform that supports
>> cmpxchg16b, the code generated
>> by the compiler will both use cmpxchg16b; on a platform that does not
>> support cmpxchg16b, the code generated
>> by the compiler, including the code generated for a generic platform, always
>> call the support function, so
>> there is no compatibility problem.
>
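The two ways of learning about cmpxchg16b discussed in this thread, compilation flags on the compiler side and a hardware query on the runtime side, could look roughly like the sketch below. It assumes GCC or Clang on x86-64: `__get_cpuid` and `bit_CMPXCHG16B` come from the compiler-provided `cpuid.h`, and `__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16` is the predefined macro both compilers set when they can inline a 16-byte CAS (e.g. under `-mcx16`). The function names `compiler_may_inline_cas16`, `cpu_has_cmpxchg16b` and `model_16byte_strategy` are hypothetical, and the dispatch rule only illustrates Szabolcs's point rather than what libatomic actually does.

```c
#include <stdbool.h>

#if defined(__x86_64__)
#include <cpuid.h>
#endif

/* Compile-time path: set when the compiler may inline a 16-byte CAS
 * (on x86-64, e.g. when built with -mcx16). */
static bool compiler_may_inline_cas16(void)
{
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16
    return true;
#else
    return false;
#endif
}

/* Run-time path: CPUID leaf 1 reports cmpxchg16b in ECX (bit_CMPXCHG16B). */
static bool cpu_has_cmpxchg16b(void)
{
#if defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;  /* leaf unavailable: assume no cmpxchg16b */
    return (ecx & bit_CMPXCHG16B) != 0;
#else
    return false;      /* not x86-64: cmpxchg16b is not usable */
#endif
}

/* Hypothetical dispatch rule: the runtime must take the cmpxchg16b path
 * whenever the hardware has it, because another module may have inlined
 * cmpxchg16b on the same object, and a lock would not synchronize with it. */
static const char *model_16byte_strategy(void)
{
    return cpu_has_cmpxchg16b() ? "cmpxchg16b" : "lock";
}
```

The design point is that the runtime's choice depends only on the hardware query, never on how the runtime itself was compiled; otherwise inlined cmpxchg16b in one module and a lock-based support function in another would silently fail to synchronize.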