GCC interpretation of C11 atomics (DR 459)

2018-02-25 Thread Ruslan Nikolaev via gcc
Hi
I have read multiple bug reports (84522, 80878, 70490) and the past decision
regarding GCC's change to redirect double-width (128-bit) atomics for x86-64 and
arm64 to libatomic. Below I describe the major concerns, as well as the response
from the C11 committee (WG14) regarding DR 459, which most likely triggered this
change in recent GCC releases in the first place.
If I understand correctly, the redirection to libatomic was made for 2 reasons:
1. cmpxchg16b is not available on early amd64 processors. (However, the mcx16
flag already specifies that the target CPUs have this instruction, so it should
not be a concern when the flag is given.)
2. atomic_load on read-only memory. DR 459 now requires a 'const' qualifier on
the argument of atomic_load, which probably resulted in the interpretation that
read-only memory must be supported. However, per the response from C11/WG14
(see below), this does not seem to be the case at all. Therefore, the
previously filed bug 70490 does not seem to be valid.
There are several concerns with current GCC behavior:

1. Not consistent with clang/llvm, which fully supports double-width atomics
for arm32, arm64, x86 and x86-64, making it possible to write portable code
(without specific extensions or assembly code) across all these architectures
(which is finally possible with C11!). The behavior of clang: if mcx16 is
specified, cmpxchg16b is generated for x86-64 (without any calls to libatomic);
otherwise, calls are redirected to libatomic. For arm64, ldaxp/stlxp are always
generated. In my opinion, this is very logical and non-confusing.

2. Oftentimes you want to have strict guarantees (by specifying mcx16 flag for 
x86-64) that the generated code is lock-free, otherwise it is useless. 
Double-width atomics are often used in lock-free algorithms that use tags 
(stamps) for pointers to resolve the ABA problem. So, it is very useful to have 
corresponding support in the compiler.

3. The behavior is inconsistent even within GCC. Older (and more limited, less 
portable, etc) __sync builtins still use cmpxchg16b directly, newer __atomic 
and C11 -- do not. Moreover, __sync builtins are probably less suitable for 
arm/arm64.

4. atomic_load can be implemented using read-modify-write, as that is the only
option for x86-64 and arm64 (see below).
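To make point 4 concrete, here is a minimal C11 sketch (not GCC's or libatomic's code) of an atomic load built purely from compare-and-swap. It is shown on a 64-bit word so that it compiles anywhere; the helper name `load_via_cas` is hypothetical. Note that it needs a non-const pointer, which is exactly the tension with the const-qualified atomic_load signature introduced by DR 459.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Atomic load implemented with compare-and-swap only (illustrative).
 * If *p == 0, the CAS succeeds and stores 0 back, leaving expected == 0.
 * Otherwise the CAS fails and copies the current value into expected.
 * Either way expected ends up holding the value of *p, but the CAS is a
 * read-modify-write, so *p must live in writable memory. */
uint64_t load_via_cas(_Atomic uint64_t *p) {
    uint64_t expected = 0;
    atomic_compare_exchange_strong(p, &expected, 0);
    return expected;
}
```

At 16 bytes on x86-64, this is the pattern a cmpxchg16b-based load follows; since that instruction always performs a write cycle (writing the old value back when the comparison fails), it faults on a page mapped read-only even though the value never changes.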

For these reasons, it may be a good idea for the GCC folks to reconsider the
past decision. And just to clarify: if mcx16 (x86-64) is not specified during
compilation, it is totally OK to redirect to libatomic and let it make the
final decision about whether the target CPU supports a given instruction. But if
it is specified, it makes sense, for performance reasons and lock-freedom
guarantees, to always generate the instruction directly.

-- Ruslan

Response from the WG14 (C11) Convener regarding DR 459 (I asked for
permission to publish this response here):
Ruslan,

     Thank you for your comments.  There is no normative requirement that const 
objects be suitable for read-only memory.  An example and a footnote refer to 
read-only memory as a way to illustrate a point, but examples and footnotes are 
not normative.  The actual nature of read-only memory and how it can be used 
are outside the scope of the standard, so there is nothing to prevent 
atomic_load from being implemented as a read-modify-write operation.

                                        David
My original email:

Dear David Keaton,
After reviewing the proposed change DR 459 for C11:
http://www.open-std.org/jtc1/sc22/wg14/www/docs/summary.htm#dr_459, I
identified that adding a const qualifier to atomic_load (C11 specifies it
without one) may actually be harmful in some cases.
Particularly, for the double-width (128-bit) atomics found in x86-64
(cmpxchg16b instruction) and arm64 (ldaxp/stlxp instructions), it is currently
only possible to implement a 128-bit atomic_load using the corresponding
read-modify-write instructions (i.e., potentially rewriting memory with the
same value, but, in essence, not changing it). But these implementations will
not work on read-only memory. Similar concerns apply to some extent to x86 and
arm32 for double-width (64-bit) atomics. Otherwise, there is no obstacle to
implementing all C11 atomics for the corresponding types on these
architectures. Moreover, the well-known clang/llvm compiler already implements
all double-width operations for x86, x86-64, arm32 and arm64 (atomic_load is
implemented using the corresponding read-modify-write instructions).
Double-width atomics are often used in data structures that need pointer
tagging to avoid the ABA problem (e.g., in lock-free stacks and queues).
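For readers unfamiliar with the instruction, the following is a minimal x86-64-only sketch, using GCC/clang extended inline assembly (the kind of fallback mentioned later in this thread), of a double-width CAS on cmpxchg16b and a 128-bit load built from it. The type `pair128` and the functions `dwcas` and `load128` are hypothetical names for illustration; the object must be 16-byte aligned.

```c
#include <stdbool.h>
#include <stdint.h>

#if !defined(__x86_64__)
#error "this sketch relies on the x86-64 cmpxchg16b instruction"
#endif

/* 16-byte value; cmpxchg16b faults unless the operand is 16-byte aligned. */
typedef struct {
    uint64_t lo, hi;
} __attribute__((aligned(16))) pair128;

/* Double-width CAS: if *p equals *expected, store desired and return true;
 * otherwise copy the current contents of *p into *expected, return false.
 * cmpxchg16b compares RDX:RAX with the memory operand and, on a match,
 * stores RCX:RBX; on a mismatch it loads the memory operand into RDX:RAX. */
bool dwcas(pair128 *p, pair128 *expected, pair128 desired) {
    bool ok;
    __asm__ __volatile__("lock cmpxchg16b %1\n\tsete %0"
                         : "=q"(ok), "+m"(*p),
                           "+a"(expected->lo), "+d"(expected->hi)
                         : "b"(desired.lo), "c"(desired.hi)
                         : "memory", "cc");
    return ok;
}

/* 128-bit atomic load built from the RMW instruction: CAS {0,0} -> {0,0}.
 * The instruction always writes the line back, so *p must be writable. */
pair128 load128(pair128 *p) {
    pair128 expected = {0, 0};
    dwcas(p, &expected, expected);
    return expected;
}
```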
It is my understanding that C11 aimed to make atomics more or less portable
across different microarchitectures, while at the same time giving a compiler
the ability to optimize code well and utilize the full potential of the
corresponding microarchitecture.
If it is now required to support read-only memory (i.e., the const qualifier)
for atomic_load, 128-bit atomics are likely to be impossible to implement [...]

Fw: GCC interpretation of C11 atomics (DR 459)

2018-02-25 Thread Ruslan Nikolaev via gcc
Alexander,
Thank you for your comments. Please see my response below. I do not want to
fight for or against this change in gcc, but there are definitely legitimate
concerns to consider. I think it would really be good to consider this change
to make things more compatible (at least between clang/llvm and gcc, which can
both be used within the same ecosystem). There are real practical benefits to
having truly lock-free double-width operations when implementing algorithms
that rely on ABA tagging for pointers, and C11 at last gives an opportunity to
do that without resorting to assembly or platform-specific implementations.
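As an illustration of the ABA tagging pattern, here is a minimal C11 sketch that packs a 32-bit node index and a 32-bit tag into one 64-bit word, i.e., a double-width value from the point of view of a 32-bit target. The names `pack`, `index_of`, `tag_of` and `tagged_replace` are hypothetical; with real pointers on a 64-bit target the word becomes 128 bits, which is the case under discussion.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* One 64-bit word: node index in the low half, modification tag in the
 * high half. */
static inline uint64_t pack(uint32_t index, uint32_t tag) {
    return ((uint64_t)tag << 32) | index;
}
static inline uint32_t index_of(uint64_t w) { return (uint32_t)w; }
static inline uint32_t tag_of(uint64_t w) { return (uint32_t)(w >> 32); }

/* Install new_index only if the whole word (index AND tag) still equals
 * what the caller observed. Each successful update bumps the tag, so an
 * index that went A -> B -> A in the meantime no longer matches. */
bool tagged_replace(_Atomic uint64_t *top, uint64_t observed,
                    uint32_t new_index) {
    uint64_t desired = pack(new_index, tag_of(observed) + 1);
    return atomic_compare_exchange_strong(top, &observed, desired);
}
```

A thread that read the word before an A -> B -> A sequence sees the same index but a newer tag, so its stale CAS fails instead of corrupting the structure.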


> Note that there's more issues to that than just behavior on readonly memory:
> you need to ensure that the whole program, including all static and shared
> libraries, is compiled with -mcx16 (and currently there's no ld.so/ld-level
> support to ensure that), or you'd need to be sure that it's safe to mix code
> compiled with different -mcx16 settings because it never happens to interop
> on wide atomic objects.

Well, if libatomic is already doing it when the corresponding CPU feature is
available (i.e., effectively implementing operations using cmpxchg16b), I do
not see any problem here. mcx16 implies that you *have* cmpxchg16b; other code,
compiled without the -mcx16 flag, will go to libatomic. Inside libatomic, it
will detect that cmpxchg16b *is* available, thus making code compiled with and
without the -mcx16 flag completely compatible on a given system. Or am I
missing something here?

If you do not have cmpxchg16b, but the program is compiled with the flag, it 
will simply not run (as expected).
 
So, in other words, libatomic should still decide whether you have cmpxchg16b
or not for cases when -mcx16 is not specified. But if it is specified,
cmpxchg16b can be generated unconditionally. If you want better compatibility,
you will not specify the flag. Code compiled with -mcx16 and with -mno-cx16 can
thus be mixed in a binary-compatible way.

> Note that there's no "load" function in the __sync family, so the original
> concern about operations on readonly memory does not apply.
Yes, but per the clarification from WG14/C11, read-only memory should not be a
concern at all, as this behavior is not specified anyway (regardless of the
const qualifier). Read-modify-write is allowed for atomic_load as long as there
is no 'visible' change to the value being loaded. In this sense, the bug that
was filed previously regarding read-only memory accesses and the const
qualifier does not seem to be valid.
Additionally, it is really odd and counterintuitive to still provide this
support for the (almost) deprecated __sync builtins while not giving the same
opportunity to the newer and more advanced functions.
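For comparison, here is a minimal sketch of the same compare-and-swap through both builtin families on a 64-bit word. The 16-byte variant is compiled only when the compiler predefines `__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16` (GCC and clang do so on x86-64 when mcx16 is in effect), so the block builds without libatomic; the wrapper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Legacy __sync builtin: returns the previous value, full barrier. */
uint64_t cas_sync(uint64_t *p, uint64_t expected, uint64_t desired) {
    return __sync_val_compare_and_swap(p, expected, desired);
}

/* Newer __atomic builtin: explicit memory orders; on failure the current
 * value is written back into *expected. */
bool cas_atomic(uint64_t *p, uint64_t *expected, uint64_t desired) {
    return __atomic_compare_exchange_n(p, expected, desired, false,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16
/* Only built when the compiler can inline a 16-byte __sync CAS. */
unsigned __int128 cas_sync_16(unsigned __int128 *p,
                              unsigned __int128 expected,
                              unsigned __int128 desired) {
    return __sync_val_compare_and_swap(p, expected, desired);
}
#endif
```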

> You don't mention it directly, so just to make it clear for readers: on 
> systems
> where GNU IFUNC extension is available (i.e. on Glibc), libatomic tries to do
> exactly that: test for cmpxchg16b availability and redirect 128-bit atomics to
> lock-free RMW implementations if so.  (I don't like this solution)

Yes, but libatomic makes things slower due to indirection. Also, it is much
harder to track what is going on, as there is no guarantee of lock-freedom in
this case. BTW, the fact that it currently uses cmpxchg16b if available may
actually help to switch to the suggested behavior without breaking binary
compatibility (if I understand everything correctly).

-- Ruslan
   

Re: Fw: GCC interpretation of C11 atomics (DR 459)

2018-02-26 Thread Ruslan Nikolaev via gcc
> I'd say the main issue is that libatomic is not guaranteed to work like that.


> Today it relies on IFUNC for redirection, so you may (and not "will") get the
> desired behavior on Glibc (implying Linux), not on other OSes, and neither on
> Linux with non-GNU libc (nor on bare metal, for that matter).
I think that if IFUNC is not available (i.e., outside glibc), redirection is
still possible by introducing a regular function pointer there. Yes, it is an
extra cost, but better than nothing (plus consistent behavior on all
platforms), and it probably will not add much anyway, because there is already
a performance hit from going to libatomic.
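A minimal C11 sketch of such IFUNC-free redirection: the implementation is chosen on first use and cached in an atomic function pointer. `cpu_has_native_wide_cas` is a hypothetical probe hard-wired to true here, and the 64-bit load stands in for the 16-byte routines libatomic would actually choose between; none of these names come from libatomic.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*load_fn)(uint64_t *);

/* Hypothetical CPU probe; a real library would query CPUID (x86) or the
 * auxiliary vector (Linux). Hard-wired here for illustration. */
static bool cpu_has_native_wide_cas(void) { return true; }

/* Lock-free path (stand-in for a cmpxchg16b-based routine). */
static uint64_t load_native(uint64_t *p) {
    return __atomic_load_n(p, __ATOMIC_SEQ_CST);
}

/* Fallback path: serialize through a spinlock. */
static atomic_flag fallback_lock = ATOMIC_FLAG_INIT;
static uint64_t load_locked(uint64_t *p) {
    while (atomic_flag_test_and_set_explicit(&fallback_lock,
                                             memory_order_acquire))
        ; /* spin */
    uint64_t v = *p;
    atomic_flag_clear_explicit(&fallback_lock, memory_order_release);
    return v;
}

/* NULL until first use; racing threads compute and store the same value. */
static _Atomic(load_fn) resolved_load;

uint64_t dispatched_load(uint64_t *p) {
    load_fn f = atomic_load_explicit(&resolved_load, memory_order_acquire);
    if (f == NULL) {
        f = cpu_has_native_wide_cas() ? load_native : load_locked;
        atomic_store_explicit(&resolved_load, f, memory_order_release);
    }
    return f(p);
}

bool dispatch_uses_native(void) {
    return atomic_load(&resolved_load) == load_native;
}
```

Because every call goes through the one cached pointer, all code in the process agrees on a single implementation, which is the property the proposal relies on for mixing code built with and without mcx16.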

   


Re: GCC interpretation of C11 atomics (DR 459)

2018-02-26 Thread Ruslan Nikolaev via gcc
Thank you for more comments, my response is below.



On Mon, 26 Feb 2018, Szabolcs Nagy wrote:
> rmw load is only valid if the implementation can
> guarantee that atomic objects are never read-only.
But per the response from WG14 regarding DR 459, which I quoted, the standard
does not seem to define behavior for read-only memory (and the const qualifier
should not be taken to imply it). RMW, according to them, is fine for
atomic_load.

> current implementations on linux (including clang)
> don't do that, so an rmw load can observably break
> conforming c code: a static global const object is
> placed in .rodata section and thus rmw on it is a
> crash at runtime contrary to c standard requirements.
I have just tried to compile the code using clang. The latest stable version of
clang seems to emit cmpxchg16b for the code you mentioned if I specify mcx16.
If I do not, it redirects to libatomic. (I have not tried the version from
trunk, though.)


  On Monday, February 26, 2018 8:57 AM, Alexander Monakov wrote:
> OK, but that sounds like a matter of not emitting atomic
> objects into .rodata, which shouldn't be a big problem,
> if not for backwards compatibility concern?

I agree, sounds like a good idea. Certainly for _Atomic objects > 8 bytes.

> and then with new enough libatomic on Glibc this segfaults
> with GCC on x86_64 too due to IFUNC redirection mentioned
> in the other subthread.

It seems to be a problem anyway: another reason never to emit _Atomic objects
into .rodata.




   

Re: GCC interpretation of C11 atomics (DR 459)

2018-02-26 Thread Ruslan Nikolaev via gcc
On Monday, February 26, 2018 1:15 PM, Florian Weimer wrote:
 


> I think x86-64 should be able to do atomic load and store via SSE2 
> registers, but perhaps if the memory is suitably aligned (which is the 
> other problem—the libatomic code will work irrespective of alignment, as 
> far as I understand it).

IIRC, 16-byte SSE loads and stores are not guaranteed to be atomic, so RMW is
probably the only safe option for x86-64. The same applies to ARM64, as far as
I understand.

Just to summarize what can be done if the proposed change is accepted (from the 
discussion so far):

1. _Atomic objects larger than 8 bytes should not be placed in .rodata, even if
declared const. It can also be specified that atomic_load should not be used
on read-only memory for double-width operations.

2. libatomic can be modified to redirect to functions that use cmpxchg16b
(whenever available on the target CPU) through regular function pointers, even
if IFUNC is not available. This will provide consistent behavior everywhere,
and binary compatibility between code compiled with mcx16 and mno-cx16.

3. Never redirect to libatomic for arm64 (since ldaxp/stlxp are available); for
x86-64, redirect only if mcx16 is not specified. For ARM64, there is no mcx16
option at all.

-- Ruslan



   

Re: GCC interpretation of C11 atomics (DR 459)

2018-02-26 Thread Ruslan Nikolaev via gcc
Torvald, thank you for your input. See my response below.

On Monday, February 26, 2018 1:35 PM, Torvald Riegel wrote:

> ... does not imply this latter statement.  The statement you cited is
> about what the standard itself requires, not what makes sense for a
> particular implementation. 

True, but it makes sense to provide true atomics when they are available.
Since the standard seems to allow implementing atomic_load using RMW, this does
not seem to be a problem.
In fact, the lock-free flag for this type can report true only if mcx16 is
specified; otherwise it reports false (availability can only be determined at
run time, so the worst-case scenario is assumed).
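The C11 flag referred to here is exposed as the ATOMIC_*_LOCK_FREE macros (0 = never, 1 = sometimes, 2 = always lock-free); GCC and clang also provide the compile-time builtin `__atomic_always_lock_free`. A minimal sketch for inspecting them follows. The reported values depend on compiler, target, and flags such as mcx16, so the sketch only reports them; the wrapper names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* C11 lock-free macros: 0 = never, 1 = sometimes (known only at run
 * time), 2 = always lock-free for the type. */
int llong_lock_free_level(void) { return ATOMIC_LLONG_LOCK_FREE; }
int pointer_lock_free_level(void) { return ATOMIC_POINTER_LOCK_FREE; }

/* GCC/clang builtin: compile-time answer for a 16-byte object at the
 * alignment typical for its size (second argument 0). */
bool sixteen_bytes_always_lock_free(void) {
    return __atomic_always_lock_free(16, 0);
}
```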

> So, in such a case, using the wide CAS for
> atomic loads breaks a reasonable assumption.  Moreover, it's also a
> special case, in that 32b atomics do work as intended.

But in this case the programmer already assumes that atomic_load does not use
RMW, which C11 does not seem to guarantee. Of course, for single-width
operations, the programmer may in most practical cases assume it (even though
there is no guarantee).
Anyway, there is no good solution here for double-width operations, and the
programmer should not assume it is possible when writing portable code. In
fact, a lock-based solution is even more confusing and potentially error-prone
(e.g., it cannot be safely used inside signal handlers, since it is not
lock-free).
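The signal-handler concern can be made concrete: C11 only permits a handler to refer to static objects that are lock-free atomics (besides assigning to a volatile sig_atomic_t), so portable code has to check the lock-free macro before relying on it. A minimal sketch, assuming a target where long long atomics are always lock-free (the _Static_assert rejects others); the function and variable names are hypothetical.

```c
#include <signal.h>
#include <stdatomic.h>

/* Refuse to build where 64-bit atomics might fall back to a lock. */
_Static_assert(ATOMIC_LLONG_LOCK_FREE == 2,
               "need always-lock-free long long atomics");

static atomic_ullong shared_value;
static atomic_ullong seen_in_handler;

static void on_signal(int sig) {
    (void)sig;
    /* Permitted only because these objects are lock-free. A lock-based
     * load here could deadlock if the interrupted code held the lock. */
    atomic_store(&seen_in_handler, atomic_load(&shared_value));
}

unsigned long long signal_demo(unsigned long long v) {
    atomic_store(&shared_value, v);
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return 0;
    raise(SIGINT); /* the handler has run by the time raise returns */
    signal(SIGINT, SIG_DFL);
    return atomic_load(&seen_in_handler);
}
```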

> The behavior you favor would violate that, and
> there's no portable way to distinguish one from the other. 

There is already a similar problem with IFUNC (when used with Linux and
glibc). In fact, I do not see any difference here. Redirection to libatomic
when mcx16 is specified just adds extra cost and less predictable behavior.
Moreover, it seems counterintuitive: I specify a flag saying that cmpxchg16b is
supported, but gcc still does not use it (at least not directly). It is
possible to change libatomic to always use cmpxchg16b when available (even on
systems without IFUNC); this way it is totally consistent and binary compatible
for code compiled with and without mcx16.


> I see your point in wanting to have a builtin or such for the 64b atomic
> CAS.  However, IMO, this doesn't fit into the world of C11/C++11
> atomics, and thus rather should be accessible through a separate
> interface.
Why not? If atomic_load is not really an issue, then it may be good to use a
standardized interface.




   

Re: Fw: GCC interpretation of C11 atomics (DR 459)

2018-02-26 Thread Ruslan Nikolaev via gcc
Torvald, I definitely do not want to insist on this design choice, but it makes
sense to at least seriously consider it given the concerns I described,
especially because IFUNC in libatomic already redirects to cmpxchg16b, so the
current approach just adds extra cost and indirection. Quite frankly, I do not
see any serious problem here with respect to binary compatibility. Even if
cmpxchg16b was not used on some platforms outside Linux, old binaries will go
to libatomic, which can now be updated to simply use cmpxchg16b. (Even
statically linked binaries should not be an issue: they will not have any
direct interaction with newer binaries.)

 > Not getting the performance usually associated with atomic loads can be
> a big problem for code that tries to be portable.

I do not think it is a common use case anyway. How often is atomic_load used on
double-width types? If a programmer needs some guarantees and does not care
about lock-freedom, why not use a regular lock? This way nothing magical
happens. Otherwise, they may hit unexpected issues in places like signal
handlers (which are hard to debug, since the program will hang only once in a
while). With cmpxchg16b, it is at least more or less reproducible: if you try
to use it on read-only memory, you will immediately get a segfault.

> I think I now remember why we "didn't fix" libatomic: There might be
> compiled code out there that does use the wide CAS, so changing
> libatomic from the status quo to using its internal locks could break
> programs.
Well, it already happens on Linux with glibc, and nothing will break there. For
other platforms, it would be good to implement the same, so that consistent
behavior is observed everywhere.


> No, they only said that it doesn't need to be a concern for the
> standard.  Implementations have to pay attention to more things, so it
> is a concern for implementation.
Yes, but the only problem I see is that such objects are currently placed in
.rodata when const is used. It is easy to resolve: just do not place _Atomic
objects larger than 8 bytes there. Then also clarify that a programmer cannot
safely cast some arbitrary object that may be placed in .rodata for use with
atomic_load.
It needs to be addressed anyway, as the provided example already segfaults on
x86-64 Linux, even with redirection to libatomic.

> It's not "visible" in the abstract machine under some setting of the
> as-if rule.  But it is definitely visible in an implementation in which
> the effects of read-only memory are visible (see my example of mapping
> memory from another process read-only so as to read data from that
> process).
True, but behavior on read-only memory is not defined anyway, and no
assumptions about it can be made in portable code.

-- Ruslan






   

Re: Fw: GCC interpretation of C11 atomics (DR 459)

2018-02-26 Thread Ruslan Nikolaev via gcc
Thanks, everyone, for the input; it is very useful. I am just proposing to
consider the change unless there are clear roadblocks. (Formally speaking,
either design choice is probably OK with respect to the standard, but the
proposed one also has some clear advantages.) I wrote a summary of pros & cons
(which, of course, is slightly biased towards the change :) ).
I also opened Bug 84563 with the rationale.


Pros of the proposed approach:
1. Ability to use guaranteed lock-free double-width atomics (when mcx16 is
specified for x86-64, and always for arm64) in a more or less portable manner
across the supported architectures (without resorting to non-standard
extensions or writing separate assembly code for each architecture).
Hopefully, the behavior can also be made more or less consistent across
compilers over time; it is already the case for clang/llvm. As mentioned,
double-width lock-free atomics have real practical use (ABA tags for pointers).

2. A bug is more likely to be found immediately if a programmer tries to do
something that is not guaranteed by the standard (i.e., getting a segfault on
read-only memory when using double-width atomic_load). This is true even if
mcx16 is not used, as most CPUs have cmpxchg16b and libatomic will use it. On
the other hand, an atomic_load implemented through locks may cause hard-to-find
and hard-to-debug issues in signal handlers, interrupt contexts, etc., when a
programmer erroneously assumes that atomic_load is non-blocking.

3. For arm64 the corresponding instructions are always available, no need for 
mcx16 flag or redirection to libatomic at all (libatomic may still keep old 
implementation for backward compatibility).

4. Faster and easier-to-analyze code when mcx16 is specified.

5. Ability to tell for sure whether the implementation is lock-free by checking
the corresponding C11 flag when mcx16 is specified. When it is not specified,
the flag will be false to accommodate the worst-case scenario.

6. Consistent behavior everywhere, on all platforms, regardless of IFUNC, the
mcx16 flag, etc. If cmpxchg16b is available, it is always used (platforms that
do not support IFUNC will use function pointers for redirection). The only
things the mcx16 flag changes are removing the indirection through libatomic
and guaranteeing the lock_free flag for the corresponding types. (BTW, in
practice, if you use the flag, you should already know what you are doing.)

7. Ability to finally deprecate old __sync builtins, and use new and more 
advanced __atomic everywhere.


Cons of the proposed approach:

1. The compiler may place const atomic objects in .rodata. (Avoided by making
sure that _Atomic objects larger than 8 bytes are not placed in .rodata, and by
clarifying that casting arbitrary .rodata objects for use with double-width
atomics is undefined and not allowed.)

2. Backward compatibility concerns if used outside glibc/IFUNC. Most likely,
even in this case, this is not an issue, since all calls there are already
redirected to libatomic anyway, and statically linked binaries will not
interact with new binaries directly.

3. Read-only memory will not be supported by atomic_load for double-width
types. But that is actually better than sweeping the problem under the carpet
(the current behavior is even worse, because it is inconsistent across
platforms, i.e., different for x86-64 on Linux and for arm64). Anyway, it is
better to use a lock-based approach explicitly if for whatever reason it is
preferable (read-only memory, performance (?), etc.).
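Where a lock-based approach is genuinely wanted, it can be written explicitly so that nobody mistakes it for a lock-free atomic. A minimal C11 sketch, with hypothetical names, guarding a two-word value with its own spinlock. Note that because the lock lives inside the object, even a load writes to it, so this particular layout still needs writable memory.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Two words guarded by their own spinlock: explicitly lock-based. */
typedef struct {
    atomic_flag lock;
    uint64_t lo, hi;
} locked_pair;

#define LOCKED_PAIR_INIT { ATOMIC_FLAG_INIT, 0, 0 }

static void lp_acquire(locked_pair *p) {
    while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
        ; /* spin until the holder clears the flag */
}

static void lp_release(locked_pair *p) {
    atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

void lp_store(locked_pair *p, uint64_t lo, uint64_t hi) {
    lp_acquire(p);
    p->lo = lo;
    p->hi = hi;
    lp_release(p);
}

void lp_load(locked_pair *p, uint64_t *lo, uint64_t *hi) {
    lp_acquire(p); /* writes the flag, so *p must be writable */
    *lo = p->lo;
    *hi = p->hi;
    lp_release(p);
}
```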
-- Ruslan


Re: Fw: GCC interpretation of C11 atomics (DR 459)

2018-02-27 Thread Ruslan Nikolaev via gcc
Formally speaking, either implementation satisfies C11, because the standard
allows much leeway in interpretation here. But, of course, it is annoying that
double-width types (and that potentially includes 64-bit types on some 32-bit
processors; e.g., i586 also has cmpxchg8b and no official way to read
atomically otherwise) need special handling and compiler extensions. That
basically means that in a number of cases I cannot write portable code: I need
to add a bunch of architecture-dependent ifdefs, even for, say, 64-bit atomics.
(And all this mess is to accommodate the almost non-existent case where someone
wants to use atomic_load on read-only memory for wide types, for which no good
solution exists anyway.)

Particularly, imagine someone writing lock-free code for different types (in
templates, macros, etc.). It basically uses the same C11 atomic primitives, but
for various integer sizes. Now I need special handling for larger types,
because whatever libatomic provides does not guarantee lock-freedom (i.e., is
useless), which I otherwise do not need. True, wider types may not be available
on all architectures, but I would prefer to have generic, standard-conforming
code at least for those that have them.
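The kind of width-generic code described here might look like the following minimal sketch: one CAS loop stamped out by a macro for each integer width. The macro and function names are hypothetical; a 128-bit instantiation (not shown) is where, per this thread, GCC and clang currently differ.

```c
#include <stdatomic.h>
#include <stdint.h>

/* One CAS-loop body stamped out per width. On failure,
 * atomic_compare_exchange_weak refreshes `old`, so the loop just retries. */
#define DEFINE_FETCH_INC(T, name)                                    \
    static T name(_Atomic T *p) {                                    \
        T old = atomic_load(p);                                      \
        while (!atomic_compare_exchange_weak(p, &old, (T)(old + 1))) \
            ;                                                        \
        return old;                                                  \
    }

DEFINE_FETCH_INC(uint8_t, fetch_inc8)
DEFINE_FETCH_INC(uint16_t, fetch_inc16)
DEFINE_FETCH_INC(uint32_t, fetch_inc32)
DEFINE_FETCH_INC(uint64_t, fetch_inc64)
```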



> That's a valid goal, but it does not imply that we should mess with how
> atomics are implemented by default, nor should we mess with the default
> use cases.  This goal wants something special, and that is exposing the
> fact that *only* a CAS is available to synchronize atomically on a
> particular type.  That is an extension of the existing atomics design.

See above

> The standard doesn't specify read-only memory, so it also doesn't forbid
> the concept.  The implementation takes it into account though, and thus
> it's defined in that context.
But my point is that a programmer cannot rely on this feature anyway unless
she/he wants to write code that compiles only with gcc. It is unspecified by
the standard, and implementations that use read-modify-write for atomic_load
are perfectly valid. The whole point of having this standard in the first place
is to allow code to be compiled by different compilers; otherwise people could
just rely on gcc-specific extensions.


> The topic we're currently discussing does not significantly affect when
> we can remove __sync builtins, IMO.

They are the only builtins that directly expose double-width operations. Short 
of using assembly fall-backs, they are the only option right now.

> They do care about whether atomic operations are natively supported on
> that particular type -- and that should include a load.
I think the whole point of having atomic operations is the ability to provide
lock-free operations whenever possible. Even though the standard does not
guarantee it, that is almost the only sane use case. Otherwise, there is no
point: you can always use locks. If they do not care about lock-freedom, they
should just use locks.


> Nobody is proposing to mark things as lock-free if they aren't.  Thus, I
> don't see any change to what's usable in signal handlers.
It is not obvious to anyone that atomic_load may block. It will *not* for
single-width types. So, again, we see differences between single- and
double-width types. Even though you do not have problems with read-only memory,
you have another problem for double-width types, which may be even more subtle
and much harder to debug in a number of cases. Of course, no one can assume
that it will not block, but the same can be said about read-only memory.
Anyway, I do not have a horse in the race... I just proposed to consider this
change for a number of legitimate use cases, but it is eventually up to the gcc
developers to decide.
-- Ruslan
   

Re: Fw: GCC interpretation of C11 atomics (DR 459)

2018-02-27 Thread Ruslan Nikolaev via gcc



> 1) your proposal would make gcc non-conforming to iso c unless it changes how 
> static const objects are emitted.
I do not think ISO C requires const objects to be placed in .rodata. And it is
easily solved by not placing _Atomic objects that cannot be safely loaded from
read-only memory there.

> 2) the two implementations are not abi compatible, the choice is already 
> made, changing it is an abi break.
Since current implementations redirect to libatomic anyway, almost nothing
should break. The only case that will break is if somebody erroneously used
atomic_load for a 128-bit type on read-only memory (which, again, is not
guaranteed by the standard). In practice, this case is almost non-existent. The
worst that may happen is that you get a segfault right away.

> 3) Torvald pointed out further considerations such as users expecting 
> lock-free atomic loads to be faster than stores.

Is that even true? Is it faster to use some global lock (implemented through
RMW) than a single RMW operation? If you use such a global lock, you will not
get loads that are faster than stores.
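For reference, a lock-based fallback need not be one global lock; a common design hashes the object's address into a fixed table of locks. The following is a minimal C11 sketch of that idea, not libatomic's actual code; `NLOCKS`, the 16-byte granule shift, and the function names are arbitrary assumptions. Since loads only read the object, this layout also works when the object itself is mapped read-only, the scenario raised elsewhere in this thread.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed table of spinlocks selected by hashing the object's address.
 * Static atomic objects are zero-initialized, i.e. all unlocked here. */
#define NLOCKS 64
static _Atomic int lock_table[NLOCKS];

static _Atomic int *lock_for(const void *obj) {
    uintptr_t a = (uintptr_t)obj;
    return &lock_table[(a >> 4) % NLOCKS]; /* 16-byte granules */
}

static void table_lock(_Atomic int *l) {
    while (atomic_exchange_explicit(l, 1, memory_order_acquire))
        ; /* spin */
}

static void table_unlock(_Atomic int *l) {
    atomic_store_explicit(l, 0, memory_order_release);
}

/* Every access to one object must pass the same start address, so that
 * all of them map to the same lock. Loads only read *obj. */
void locked_load(const void *obj, void *out, size_t n) {
    _Atomic int *l = lock_for(obj);
    table_lock(l);
    memcpy(out, obj, n);
    table_unlock(l);
}

void locked_store(void *obj, const void *in, size_t n) {
    _Atomic int *l = lock_for(obj);
    table_lock(l);
    memcpy(obj, in, n);
    table_unlock(l);
}
```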

   


Re: Fw: GCC interpretation of C11 atomics (DR 459)

2018-02-27 Thread Ruslan Nikolaev via gcc
Torvald, thank you for your input, but I think this discussion is getting a
little pointless. There is nothing else I can add, since the gcc folks are
reluctant to make this change anyway. In my opinion, there is no compelling
reason against such an implementation (it is perfectly fine with respect to the
standard; read-only memory is not guaranteed for atomic_load anyway). Even the
binary compatibility issue that was mentioned is unlikely to be a problem if
implemented as I described. And finally, this is something that can actually
be useful in practice (at least as far as I can judge from my experience). By
the way, this issue has already been raised multiple times during the last
couple of years by different people who actually use it in real projects (the
bugs were eventually closed as 'INVALID').
All the described challenges are purely technical and can easily be resolved.
Moreover, clang/llvm chose this implementation, and it seems very logical and
non-confusing to me. It certainly makes sense to expose hardware capabilities
through standard interfaces whenever possible.

For my projects, I will simply fall back to my own implementation using inline 
assembly (at least for now) because, unfortunately, it is the only thing that 
is guaranteed to work outside of clang/llvm in the foreseeable future (__sync 
functions have some limitations and do not look like an attractive option 
either, by the way).



On Tuesday, February 27, 2018 11:21 AM, Torvald Riegel wrote:
 

 On Tue, 2018-02-27 at 13:16 +0000, Ruslan Nikolaev via gcc wrote:
> > 3) Torvald pointed out further considerations such as users expecting 
> > lock-free atomic loads to be faster than stores.
> 
> Is it even true? Is it faster to use some global lock (implemented through 
> RMW) than a single RMW operation? If you use this global lock, you will not 
> get loads faster than stores.

If GCC declares a type as lock-free, atomic loads on this type will be
natively supported through some sort of load instruction.  That means
they are faster than stores under concurrent accesses, in particular
when there are concurrent atomic loads (for all major HW we care about).

If there is no natively supported atomic load, GCC will not declare the
type to be lock-free.

Nobody made a statement about the performance of locks vs. RMWs.





   

Re: GCC interpretation of C11 atomics (DR 459)

2018-02-27 Thread Ruslan Nikolaev via gcc



> Consider a producer-consumer relationship between two processes where
> the producer doesn't want to wait for the consumer.  For example, the
> producer could be an application that's being traced, and the consumer
> is a trace aggregation tool.  The producer can provide a read-only
> mapping to the consumer, and put a nonblocking ring buffer or something
> similar in there.  That allows the consumer to read, but it still needs
> atomic access because the consumer is modifying the ring buffer
> concurrently.
Sorry for jumping into someone else's conversation... And what good solution
does gcc offer right now? It forces the producer and the consumer to use a
lock-based (BTW: global lock!) approach for *both* sides if we are talking about
128-bit types. Therefore, sometimes producers *will* wait (effectively, by
blocking). Basically, it becomes useless. In this case, I would rather use a
lock-based approach that at least does not use a global lock. By contrast, the
alternative implementation would at least have been useful when both producers
and consumers have full (RW) access.

Anyway, I already said that I personally will go with assembly inlines for 
right now. I just wanted to raise this concern since other people may find it 
useful in their projects.


   

Re: GCC interpretation of C11 atomics (DR 459)

2018-02-27 Thread Ruslan Nikolaev via gcc


> But we're not talking about that special case of 128b types here.  The
> majority of synchronization doesn't need more than machine word size.
Then why do you worry about read-only access for 128b types? (it is a special 
case anyway).

> No, such a program would have a bug anyway.  It wouldn't even
> synchronize properly. 

Therefore, it was not a valid example / use case (for 128-bit) in the first 
place. It was a *valid* example for smaller atomics, though. But that is 
exactly my point -- your current solution for 128 bit does not add any 
practical value except when you want to use lock-based solution (but see my 
explanation below).


> The lock would need to be shared between processes in the example I
> gave.  You have to build your own lock for that currently, because C/C++
> don't give you any process-shared locks.
At least on Linux, you can simply use eventfd(2) to do it reliably (without
relying on an "array of locks"). Given that it is not a very common use case,
there does not seem to be a need for special C standard support for this. And
whatever C11 provides cannot be relied upon anyway, since you do not have
strict guarantees that read-only memory is supported for larger types. For
example, clang (and possibly other compilers) will break this assumption. At
least, I would prefer to use eventfd in my application (if ever needed at all),
since it has reliable and well-defined behavior on Linux.
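A minimal Linux-only sketch of that suggestion: an eventfd created with EFD_SEMAPHORE and an initial count of 1 behaves as a mutex, with read(2) acquiring and write(2) releasing. The function names are hypothetical. The descriptor is inherited across fork(), so related processes can share the lock while the data itself stays in a mapping that one side sees read-only, since the lock state lives in the kernel rather than in that mapping.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Semaphore-mode eventfd with count 1: "unlocked". */
int efd_lock_create(bool nonblocking) {
    return eventfd(1, EFD_SEMAPHORE | (nonblocking ? EFD_NONBLOCK : 0));
}

/* Acquire: in semaphore mode, read() decrements the count by 1. While the
 * count is 0 it blocks, or fails with EAGAIN on a non-blocking fd. */
bool efd_lock(int fd) {
    uint64_t one;
    ssize_t n;
    do {
        n = read(fd, &one, sizeof one);
    } while (n < 0 && errno == EINTR);
    return n == (ssize_t)sizeof one;
}

/* Release: write() adds 1 back to the count. */
bool efd_unlock(int fd) {
    uint64_t one = 1;
    return write(fd, &one, sizeof one) == (ssize_t)sizeof one;
}
```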


   

Re: GCC interpretation of C11 atomics (DR 459)

2018-02-27 Thread Ruslan Nikolaev via gcc
Torvald, I think this discussion is indeed getting pointless. Some of your
responses clearly take my comments out of the larger picture and context of the
discussion.
One thing is clear: either implementation is fine with the standard (formally
speaking), simply because the standard allows too much leeway in how you
implement atomics. In fact, as I mentioned, clang/llvm implements it
differently. I actually find this a weakness of the standard, because for code
that must be portable across compilers, the only thing you can more or less
safely assume is single-width types.
Thank you for your input and the discussion.