Re: [RFC] Bridging the gap between the Linux Kernel Memory Consistency Model (LKMM) and C11/C++11 atomics

2023-07-04 Thread Peter Zijlstra
On Mon, Jul 03, 2023 at 03:20:31PM -0400, Olivier Dion wrote:

>   int x = 0;
>   int y = 0;
>   int r0, r1;
> 
>   int dummy;
> 
>   void t0(void)
>   {
>   __atomic_store_n(&x, 1, __ATOMIC_RELAXED);
> 
>   __atomic_exchange_n(&dummy, 1, __ATOMIC_SEQ_CST);
>   __atomic_thread_fence(__ATOMIC_SEQ_CST);
> 
>   r0 = __atomic_load_n(&y, __ATOMIC_RELAXED);
>   }
> 
>   void t1(void)
>   {
>   __atomic_store_n(&y, 1, __ATOMIC_RELAXED);
>   __atomic_thread_fence(__ATOMIC_SEQ_CST);
>   r1 = __atomic_load_n(&x, __ATOMIC_RELAXED);
>   }
> 
>   // BUG_ON(r0 == 0 && r1 == 0)
> 
> On x86-64 (gcc 13.1 -O2) we get:
> 
>   t0():
>   movl    $1, x(%rip)
>   movl    $1, %eax
>   xchgl   dummy(%rip), %eax
>   lock orq $0, (%rsp)   ;; Redundant with previous exchange.
>   movl    y(%rip), %eax
>   movl    %eax, r0(%rip)
>   ret
>   t1():
>   movl    $1, y(%rip)
>   lock orq $0, (%rsp)
>   movl    x(%rip), %eax
>   movl    %eax, r1(%rip)
>   ret

So I would expect the compilers to do better here. It should know those
__atomic_thread_fence() thingies are superfluous and simply not emit
them. This could even be done as a peephole pass later, where it sees
consecutive atomic ops and the second being a no-op.

> On x86-64 (clang 16 -O2) we get:
> 
>   t0():
>   movl    $1, x(%rip)
>   movl    $1, %eax
>   xchgl   %eax, dummy(%rip)
>   mfence                ;; Redundant with previous exchange.

And that's just terrible :/ Nobody should be using MFENCE for this. And
using MFENCE after a LOCK-prefixed instruction (implicit in this case)
is just fail, because I don't think C++ atomics cover MMIO and other
such 'lovely' things.

>   movl    y(%rip), %eax
>   movl    %eax, r0(%rip)
>   retq
>   t1():
>   movl    $1, y(%rip)
>   mfence
>   movl    x(%rip), %eax
>   movl    %eax, r1(%rip)
>   retq



Re: [RFC] Bridging the gap between the Linux Kernel Memory Consistency Model (LKMM) and C11/C++11 atomics

2023-07-04 Thread Jonathan Wakely via Gcc
On Tue, 4 Jul 2023 at 10:47, Peter Zijlstra wrote:
>
> On Mon, Jul 03, 2023 at 03:20:31PM -0400, Olivier Dion wrote:
>
> >   int x = 0;
> >   int y = 0;
> >   int r0, r1;
> >
> >   int dummy;
> >
> >   void t0(void)
> >   {
> >   __atomic_store_n(&x, 1, __ATOMIC_RELAXED);
> >
> >   __atomic_exchange_n(&dummy, 1, __ATOMIC_SEQ_CST);
> >   __atomic_thread_fence(__ATOMIC_SEQ_CST);
> >
> >   r0 = __atomic_load_n(&y, __ATOMIC_RELAXED);
> >   }
> >
> >   void t1(void)
> >   {
> >   __atomic_store_n(&y, 1, __ATOMIC_RELAXED);
> >   __atomic_thread_fence(__ATOMIC_SEQ_CST);
> >   r1 = __atomic_load_n(&x, __ATOMIC_RELAXED);
> >   }
> >
> >   // BUG_ON(r0 == 0 && r1 == 0)
> >
> > On x86-64 (gcc 13.1 -O2) we get:
> >
> >   t0():
> >   movl    $1, x(%rip)
> >   movl    $1, %eax
> >   xchgl   dummy(%rip), %eax
> >   lock orq $0, (%rsp)   ;; Redundant with previous exchange.
> >   movl    y(%rip), %eax
> >   movl    %eax, r0(%rip)
> >   ret
> >   t1():
> >   movl    $1, y(%rip)
> >   lock orq $0, (%rsp)
> >   movl    x(%rip), %eax
> >   movl    %eax, r1(%rip)
> >   ret
>
> So I would expect the compilers to do better here. It should know those
> __atomic_thread_fence() thingies are superfluous and simply not emit
> them. This could even be done as a peephole pass later, where it sees
> consecutive atomic ops and the second being a no-op.

Right, I don't see why we need a whole set of new built-ins that say
"this fence isn't needed if the adjacent atomic op already implies a
fence". If the adjacent atomic op already implies a fence for a given
ISA, then the compiler should already be able to elide the explicit
fence.

So just write your code with the explicit fence, and rely on the
compiler to optimize it properly. Admittedly, today's compilers don't
do that optimization well, but they also don't support your proposed
built-ins, so you're going to have to wait for compilers to make
improvements either way.
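A minimal sketch of that approach, assuming a liburcu-style helper that wants LKMM-like full-barrier semantics around an exchange. The name `xchg_full_mb` is made up for illustration; the point is that the fences stay in the source, and removing them on targets where the exchange already implies them (such as x86-64) is left to the compiler:

```c
/* Hypothetical helper: exchange with full-barrier semantics in the LKMM sense.
 * On x86-64 both fences are redundant with the LOCK-prefixed XCHG, but they
 * are written out so the code is correct on every target; eliding them is
 * the compiler's job. */
static inline int xchg_full_mb(int *p, int newval)
{
    __atomic_thread_fence(__ATOMIC_SEQ_CST);   /* order earlier accesses */
    int old = __atomic_exchange_n(p, newval, __ATOMIC_SEQ_CST);
    __atomic_thread_fence(__ATOMIC_SEQ_CST);   /* order later accesses */
    return old;
}
```

Whether a given compiler version actually drops the fences can be checked by looking at the generated assembly, as in the listings above.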

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4455.html
discusses that compilers could (and should) optimize around atomics
better.

>
> > On x86-64 (clang 16 -O2) we get:
> >
> >   t0():
> >   movl    $1, x(%rip)
> >   movl    $1, %eax
> >   xchgl   %eax, dummy(%rip)
> >   mfence                ;; Redundant with previous exchange.
>
> And that's just terrible :/ Nobody should be using MFENCE for this. And
> using MFENCE after a LOCK-prefixed instruction (implicit in this case)
> is just fail, because I don't think C++ atomics cover MMIO and other
> such 'lovely' things.
>
> >   movl    y(%rip), %eax
> >   movl    %eax, r0(%rip)
> >   retq
> >   t1():
> >   movl    $1, y(%rip)
> >   mfence
> >   movl    x(%rip), %eax
> >   movl    %eax, r1(%rip)
> >   retq
>


Re: wishlist: support for shorter pointers

2023-07-04 Thread David Brown via Gcc

On 03/07/2023 18:42, Rafał Pietrak via Gcc wrote:

Hi Ian,

On 3.07.2023 at 17:07, Ian Lance Taylor wrote:
On Wed, Jun 28, 2023 at 11:21 PM Rafał Pietrak via Gcc 
 wrote:

[]

I was thinking about that, and it doesn't look like it would require such
deep rewrites. An ABI spec that could accommodate the functionality could be
as little as one additional attribute on linker segments.


If I understand correctly, you are looking for something like the x32
mode that was available for a while on x86_64 processors:
https://en.wikipedia.org/wiki/X32_ABI .  That was a substantial amount
of work including changes to the compiler, assembler, linker, standard
library, and kernel.  And at least to me it's never seemed
particularly popular.


Yes.

And the Wiki reporting up to 40% performance improvements in some corner 
cases is impressive and encouraging. I believe that the reported 
average improvement of 5-8% would be significantly larger in the 
tiny-resource environment of an MCU. In the MCU world, such an improvement 
could decide whether or not a project fits into a particular device.


-R



A key difference is that using 32-bit pointers on an x86 is enough 
address space for a large majority of use-cases, while even on the 
smallest ARM microcontroller, 16-bit is not enough.  (It's not 
even enough to access all memory on larger AVR microcontrollers - the 
only 8-bit device supported by mainline gcc.)  So while 16 bits would 
cover the address space of the RAM on a small ARM microcontroller, it 
would not cover access to code/flash space (including read-only data), 
IO registers, or other areas of memory-mapped memory and peripherals. 
Generic low-level pointers really have to be able to access everything.


So an equivalent of x32 mode would not work at all.  Really, what you 
want is a 16-bit "small pointer" that is added to 0x20000000 (the base 
address for RAM in small ARM devices, in case anyone following this 
thread is unfamiliar with the details) to get a real data pointer.  And 
you'd like these small pointers to have convenient syntax and efficient use.


I think a C++ class (or rather, class template) with inline functions is 
the way to go here.  gcc's optimiser will give good code, and the C++ 
class will let you get nice syntax to hide the messy details.


There is no good way to do this in C.  Named address spaces would be a 
possibility, but require quite a bit of effort and change to the 
compiler to implement, and they don't give you anything that you would 
not get from a C++ class.


(That's not quite true - named address spaces can, I believe, also 
influence the section name used for allocation of data defined in these 
spaces, which cannot be done by a C++ class.)


David



Re: wishlist: support for shorter pointers

2023-07-04 Thread Oleg Endo
> I think a C++ class (or rather, class template) with inline functions is 
> the way to go here.  gcc's optimiser will give good code, and the C++ 
> class will let you get nice syntax to hide the messy details.
> 
> There is no good way to do this in C.  Named address spaces would be a 
> possibility, but require quite a bit of effort and change to the 
> compiler to implement, and they don't give you anything that you would 
> not get from a C++ class.
> 
> (That's not quite true - named address spaces can, I believe, also 
> influence the section name used for allocation of data defined in these 
> spaces, which cannot be done by a C++ class.)
> 

Does the C++ template class shebang work for storing "short code pointers"
for things like compile-time/link-time generated function tables?  Haven't
tried it myself, but somehow I doubt it.

Cheers,
Oleg




Re: wishlist: support for shorter pointers

2023-07-04 Thread Rafał Pietrak via Gcc




On 3.07.2023 at 18:29, Rafał Pietrak wrote:

Hi David,


[--]
4. It is worth taking a step back, and thinking about how you would 
like to use these pointers.  It is likely that you would be better 
thinking in terms of an array, rather than pointers - after all, you 
don't want to be using dynamically allocated memory here if you can 
avoid it, and certainly not generic malloc().  If you can use an 
array, then your index type can be as small as you like - maybe 
uint8_t is enough.


I went down that road some time ago. Maybe I discarded the idea 
prematurely, but I dropped it because I was afraid of the cost of 

I remember now what my main problem with the index implementation was: 
the inability to express/write chained "references" with them. The 
table/index semantics of:

t[a][b][c][d]

is a "multidimensional table", which is completely different from the 
"pointer semantics" of:

*t->a->b->c->d

It is quite legitimate to go full circle around a circular list this way, 
while table semantics doesn't allow that.


Indexes are off the table.

-R


Re: wishlist: support for shorter pointers

2023-07-04 Thread Rafał Pietrak via Gcc

Hi,

On 4.07.2023 at 14:38, David Brown wrote:
[-]
A key difference is that using 32-bit pointers on an x86 is enough 
address space for a large majority of use-cases, while even on the 
smallest ARM microcontroller, 16-bit is not enough.  (It's not 
even enough to access all memory on larger AVR microcontrollers - the 
only 8-bit device supported by mainline gcc.)  So while 16 bits would 
cover the address space of the RAM on a small ARM microcontroller, it 
would not cover access to code/flash space (including read-only data), 
IO registers, or other areas of memory-mapped memory and peripherals. 
Generic low-level pointers really have to be able to access everything.


Naturally, 16 bits are "most of the time" not enough to cover the entire 
address space of even the smallest MCU (AVR being about the only 
exception here), but in my limited experience that is not really 
necessary. As for "generic low-level pointers really have to...", I 
don't think so. I really don't. Programs often manipulate quite 
"localized" data, and the compiler is capable enough to distinguish and 
keep separate pointers of different "domains". What currently makes it 
impossible is the lack of tools (semantic constructs like a pragma or 
named sections) that would let it happen.




So an equivalent of x32 mode would not work at all.  Really, what you 
want is a 16-bit "small pointer" that is added to 0x20000000 (the base 
address for RAM in small ARM devices, in case anyone following this 
thread is unfamiliar with the details) to get a real data pointer.  And 
you'd like these small pointers to have convenient syntax and efficient 
use.


More or less, yes. But "with a twist". A "compiler construct" that would 
be (say) sufficient to get the RAM savings/optimization I'm aiming at 
could be "reduced" to the ability to create a "medium-size" array of "some 
objects" and have them reference each other, all WITHIN that "array". 
In my earlier emails that array was referred to as a segment or section. 
So whenever a programmer writes a construct like:


struct test_s attribute((small-and-funny)) {
    struct test_s attribute((small-and-funny)) *next, *prev, *head;
    struct test_s attribute((small-and-funny)) *user, *group;
} repository[1000];
struct test_s attribute((small-and-funny)) *master, *trash;

the compiler puts that data into that small array (dedicated section), so no 
"generic low-level pointers" referring to that data would need to exist 
within the program. And if one does, an error is thrown (or an automatic 
conversion happens).




I think a C++ class (or rather, class template) with inline functions is 
the way to go here.  gcc's optimiser will give good code, and the C++ 
class will let you get nice syntax to hide the messy details.


OK. Thanks for the advice, but moving to C++ is a major step for me, and 
(at least for the time being) I'll stay with ordinary "big" pointers in 
plain C instead.


There is no good way to do this in C.  Named address spaces would be a 
possibility, but require quite a bit of effort and change to the 
compiler to implement, and they don't give you anything that you would 
not get from a C++ class.


Yes. Named address spaces would be great. And for code, too.

(That's not quite true - named address spaces can, I believe, also 
influence the section name used for allocation of data defined in these 
spaces, which cannot be done by a C++ class.)


OK.

-R


Re: wishlist: support for shorter pointers

2023-07-04 Thread David Brown via Gcc

On 04/07/2023 16:20, Rafał Pietrak wrote:



On 3.07.2023 at 18:29, Rafał Pietrak wrote:

Hi David,


[--]
4. It is worth taking a step back, and thinking about how you would 
like to use these pointers.  It is likely that you would be better 
thinking in terms of an array, rather than pointers - after all, you 
don't want to be using dynamically allocated memory here if you can 
avoid it, and certainly not generic malloc().  If you can use an 
array, then your index type can be as small as you like - maybe 
uint8_t is enough.


I went down that road some time ago. Maybe I discarded the idea 
prematurely, but I dropped it because I was afraid of the cost of 


I remember now what my main problem with the index implementation was: 
the inability to express/write chained "references" with them. The 
table/index semantics of:

 t[a][b][c][d]

is a "multidimensional table", which is completely different from the 
"pointer semantics" of:

 *t->a->b->c->d

It is quite legitimate to go full circle around a circular list this way, 
while table semantics doesn't allow that.


Indexes are off the table.

-R


If you have a circular buffer, it is vastly more efficient to have an 
array with no pointers or indices in its elements, and use separate head and tail indices to 
track the current position.  But I'm not sure if that is what you are 
looking for.  And you can use indices in fields for chaining, but the 
syntax will be different.  (For some microcontrollers, the 
multiplications involved in array index calculations can be an issue, 
but not for ARM devices.)
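A minimal single-producer/single-consumer sketch of the head/tail scheme described above, assuming byte-sized elements and a power-of-two size so that wrap-around is a cheap mask. The names are illustrative. Sharing it between an interrupt handler and the main loop would additionally need atomic (or at least volatile) accesses to `head` and `tail`, which this sketch omits:

```c
#include <stdbool.h>
#include <stdint.h>

#define RB_SIZE 16u                 /* power of two: wrap-around is a mask */

struct ring {
    uint8_t data[RB_SIZE];          /* elements carry no pointers or indices */
    uint8_t head;                   /* next slot to write */
    uint8_t tail;                   /* next slot to read */
};

/* Returns false when full; one slot is kept empty to tell full from empty. */
static bool rb_put(struct ring *rb, uint8_t byte)
{
    uint8_t next = (uint8_t)((rb->head + 1u) & (RB_SIZE - 1u));
    if (next == rb->tail)
        return false;
    rb->data[rb->head] = byte;
    rb->head = next;
    return true;
}

/* Returns false when empty. */
static bool rb_get(struct ring *rb, uint8_t *byte)
{
    if (rb->tail == rb->head)
        return false;
    *byte = rb->data[rb->tail];
    rb->tail = (uint8_t)((rb->tail + 1u) & (RB_SIZE - 1u));
    return true;
}
```

With this layout the buffer holds at most `RB_SIZE - 1` elements, and the only bookkeeping is two 8-bit indices.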





Re: wishlist: support for shorter pointers

2023-07-04 Thread David Brown via Gcc

On 04/07/2023 16:46, Rafał Pietrak wrote:

Hi,

On 4.07.2023 at 14:38, David Brown wrote:
[-]
A key difference is that using 32-bit pointers on an x86 is enough 
address space for a large majority of use-cases, while even on the 
smallest ARM microcontroller, 16-bit is not enough.  (It's not 
even enough to access all memory on larger AVR microcontrollers - the 
only 8-bit device supported by mainline gcc.)  So while 16 bits would 
cover the address space of the RAM on a small ARM microcontroller, it 
would not cover access to code/flash space (including read-only data), 
IO registers, or other areas of memory-mapped memory and peripherals. 
Generic low-level pointers really have to be able to access everything.


Naturally, 16 bits are "most of the time" not enough to cover the entire 
address space of even the smallest MCU (AVR being about the only 
exception here), but in my limited experience that is not really 
necessary.


(Most MSP430 devices, also supported by GCC, are also covered by a 
16-bit address space.)


As for "generic low-level pointers really have to...", I 
don't think so. I really don't. Programs often manipulate quite 
"localized" data, and the compiler is capable enough to distinguish and 
keep separate pointers of different "domains". What currently makes it 
impossible is the lack of tools (semantic constructs like a pragma or 
named sections) that would let it happen.




No, generic low-level pointers /do/ have to work with all reasonable 
address spaces on the device.  A generic pointer has to support pointing 
to modifiable ram, to constant data (flash on small microcontrollers), 
to IO registers, etc.  If you want something that can access a specific, 
restricted area, then it is a specialised pointer - not a generic one. 
C has no support for making your own pointer types, but C++ does.




So an equivalent of x32 mode would not work at all.  Really, what you 
want is a 16-bit "small pointer" that is added to 0x20000000 (the base 
address for RAM in small ARM devices, in case anyone following this 
thread is unfamiliar with the details) to get a real data pointer.  
And you'd like these small pointers to have convenient syntax and 
efficient use.


More or less, yes. But "with a twist". A "compiler construct" that would 
be (say) sufficient to get the RAM savings/optimization I'm aiming at 
could be "reduced" to the ability to create a "medium-size" array of "some 
objects" and have them reference each other, all WITHIN that "array". 
In my earlier emails that array was referred to as a segment or section. 
So whenever a programmer writes a construct like:


struct test_s attribute((small-and-funny)) {
 struct test_s attribute((small-and-funny)) *next, *prev, *head;
 struct test_s attribute((small-and-funny)) *user, *group;
} repository[1000];
struct test_s attribute((small-and-funny)) *master, *trash;

the compiler puts that data into that small array (dedicated section), so no 
"generic low-level pointers" referring to that data would need to exist 
within the program. And if one does, an error is thrown (or an automatic 
conversion happens).




GCC attributes for sections already exist.

And again - indices will give you what you need here more efficiently 
than pointers.  All of your pointers can be converted to "repository[i]" 
format.  (And if your repository has no more than 256 entries, 8-bit 
indices will be sufficient.)  It can be efficient to store pointers to 
the entries in local variables if you are using them a lot, though GCC 
will do a fair amount of that automatically.
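A sketch of that suggestion applied to the `test_s` example above, assuming no more than 256 entries so that `uint8_t` links suffice. The section name `.repo_ram` is made up and would need a matching linker-script entry on a real device; `make_ring()` and `walk_ring()` are illustrative helpers. It also shows that going full circle around a list still works, with `t = repository[t].next` in place of `t = t->next`:

```c
#include <stdint.h>

#define REPO_SIZE 200u              /* at most 256 entries, so 8-bit links suffice */

typedef uint8_t link_t;             /* index into repository[], replaces a pointer */

struct test_s {
    link_t next, prev, head;
    link_t user, group;
    uint8_t payload;                /* placeholder for the object's real data */
};

/* GCC's section attribute places the array in a named section; ".repo_ram"
 * is a made-up name that a real linker script would have to map to RAM. */
static struct test_s repository[REPO_SIZE] __attribute__((section(".repo_ram")));

/* Link entries 0..n-1 (1 <= n <= REPO_SIZE) into a circular doubly linked list. */
static void make_ring(link_t n)
{
    for (link_t i = 0; i < n; i++) {
        repository[i].next = (link_t)((i + 1u) % n);
        repository[i].prev = (link_t)((i + n - 1u) % n);
        repository[i].payload = i;
    }
}

/* Go once around the circle and count the nodes visited. */
static unsigned walk_ring(link_t start)
{
    unsigned count = 0;
    link_t t = start;
    do {
        count++;
        t = repository[t].next;     /* the index form of t = t->next */
    } while (t != start);
    return count;
}
```

Each object here needs 6 bytes for five links and a payload byte, where five 32-bit pointers alone would take 20.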




I think a C++ class (or rather, class template) with inline functions 
is the way to go here.  gcc's optimiser will give good code, and the 
C++ class will let you get nice syntax to hide the messy details.


OK. Thanks for the advice, but moving to C++ is a major step for me, and 
(at least for the time being) I'll stay with ordinary "big" pointers in 
plain C instead.


There is no good way to do this in C.  Named address spaces would be a 
possibility, but require quite a bit of effort and change to the 
compiler to implement, and they don't give you anything that you would 
not get from a C++ class.


Yes. Named address spaces would be great. And for code, too.



It is good to have a wishlist (and you can file a wishlist "bug" in the 
gcc bugzilla, so that it won't be forgotten).  But it is also good to be 
realistic.  Indices will give you what you need in terms of space 
efficiency, but will be messier in the syntax.  A small pointer class 
will give you efficient code and neat syntax, but require C++.  These 
two solutions will, however, work today.  (And they are both target 
independent.)


David


(That's not quite true - named address spaces can, I believe, also 
influence the section name used for allocation of data defined in 
these spaces, which cannot be done by a C++ class.)


OK.

-R




Re: wishlist: support for shorter pointers

2023-07-04 Thread Rafał Pietrak via Gcc




On 4.07.2023 at 17:13, David Brown wrote:
[]


If you have a circular buffer, it is vastly more efficient to have an 
array with no pointers or indices in its elements, and use separate head and tail indices to 
track the current position.  But I'm not sure if that is what you are 
looking for.  And you can use indices in fields for chaining, but the 
syntax will be different.  (For some microcontrollers, the 
multiplications involved in array index calculations can be an issue, 
but not for ARM devices.)


Ring buffers: yes and no. They have their uses, but in this particular 
case (my current project) using them is pointless. A little explanation: 
at this point I have an "object" (a structure, or rather a union of 
structures) with 6 pointers and some additional data. Those 6 pointers 
are entangled in something that looks like a "neural network" (although 
it's NOT one). This structure is sort of a demo, a template. It's 
expected to grow somewhat for the real thing ... like 3-5 times the current 
structure. This translates to 100-150 bytes each (from the current 32 bytes), 
with a good handful of them expected in the complete system. And my target 
is devices with 2 KB/4 KB of RAM.
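To put rough numbers on the current 32-byte object, a sketch assuming 32-bit pointers (as on Cortex-M) and that the remaining 8 bytes are other data; both are assumptions, not figures from the message. `uint32_t` stands in for a 4-byte target pointer so that the sizes come out the same on a 64-bit host:

```c
#include <stdint.h>

/* Rough models of one object: six links plus 8 bytes of other data
 * (assumed: 32 bytes total minus 6 x 4-byte pointers). */
struct obj_ptr32 { uint32_t link[6]; uint8_t data[8]; };  /* full 32-bit pointers */
struct obj_off16 { uint16_t link[6]; uint8_t data[8]; };  /* 16-bit offsets       */
struct obj_idx8  { uint8_t  link[6]; uint8_t data[8]; };  /* 8-bit array indices  */

/* Objects that fit in a RAM budget, ignoring stack, globals and other data. */
static unsigned objects_in(unsigned ram_bytes, unsigned obj_size)
{
    return ram_bytes / obj_size;
}
```

The objects come out at 32, 20 and 14 bytes, so 2 KiB of RAM would hold at most 64, 102 or 146 of them respectively, before anything else is placed in RAM.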


I can't imagine turning this web into any number of ring buffers. For me, 
that would make it unmanageable.


But this is just me. I thought people doing embedded work out there face 
similar problems, and a nice compiler "pragma" in the direction of named 
spaces/segments could really help here.


-R


Re: wishlist: support for shorter pointers

2023-07-04 Thread Rafał Pietrak via Gcc




On 4.07.2023 at 17:55, David Brown wrote:

On 04/07/2023 16:46, Rafał Pietrak wrote:

[--]


Yes. Named address spaces would be great. And for code, too.



It is good to have a wishlist (and you can file a wishlist "bug" in the 
gcc bugzilla, so that it won't be forgotten).  But it is also good to be 
realistic.  Indices will give you what you need in terms of space 
efficiency, but will be messier in the syntax.  A small pointer class 
will give you efficient code and neat syntax, but require C++.  These 
two solutions will, however, work today.  (And they are both target 
independent.)


OK, eventually I may invest in C++. For now, thanks for the 
discussion and for pointing me in the most promising directions. See you.


-R


Re: [RFC] Bridging the gap between the Linux Kernel Memory Consistency Model (LKMM) and C11/C++11 atomics

2023-07-04 Thread Olivier Dion via Gcc
On Mon, 03 Jul 2023, Alan Stern  wrote:
> On Mon, Jul 03, 2023 at 03:20:31PM -0400, Olivier Dion wrote:
>> This is a request for comments on extending the atomic builtins API to
>> help avoid redundant memory barriers.  Indeed, there are
>
> What atomic builtins API are you talking about?  The kernel's?  That's 
> what it sounded like when I first read this sentence -- why else post 
> your message on a kernel mailing list?

Good point, we meant the `__atomic' builtins from GCC and Clang.  Sorry
for the confusion.

[...]

>> fully-ordered atomic operations like xchg and cmpxchg success in LKMM
>> have implicit memory barriers before/after the operations [1-2], while
>> atomic operations using the __ATOMIC_SEQ_CST memory order in C11/C++11
>> do not have any ordering guarantees of an atomic thread fence
>> __ATOMIC_SEQ_CST with respect to other non-SEQ_CST operations [3].
>
> After reading what you wrote below, I realized that the API you're 
> thinking of modifying is the one used by liburcu for user programs.  
> It's a shame you didn't mention this in either the subject line or the 
> first few paragraphs of the email; that would have made understanding 
> the message a little easier.

Indeed, our intent is to discuss the Userspace RCU uatomic API by extending
the toolchain's atomic builtins, not to change the LKMM itself.  We reached
out to the Linux kernel developers because the original Userspace RCU
uatomic API is based on the LKMM.

> In any case, your proposal seems reasonable to me at first glance, with 
> two possible exceptions:
>
> 1.  I can see why you have special fences for before/after load, 
>   store, and rmw operations.  But why clear?  In what way is 
>   clearing an atomic variable different from storing a 0 in it?

We could indeed group the clear with the store.

We had two approaches in mind:

  a) A before/after pair by category of operation:

 - load
 - store
 - RMW
  
  b) A before/after pair for every operation:

 - load
 - store
 - exchange
 - compare_exchange
 - {add,sub,and,xor,or,nand}_fetch
 - fetch_{add,sub,and,xor,or,nand}
 - test_and_set
 - clear

If we go for the grouping in a), we have to take into account that the
barriers emitted need to cover the worst-case scenario.  As an example,
Clang can emit a store for an exchange with SEQ_CST on x86-64 if the
returned value is not used.

Therefore, for the grouping in a), all RMW would need to emit a memory
barrier (with Clang on x86-64).  But with the scheme in b), we can emit
the barrier explicitly for the exchange operation.  We however question
the usefulness of this kind of optimization made by the compiler, since
a user should use a store operation instead.

> 2.  You don't have a special fence for use after initializing an 
>   atomic.  This operation can be treated specially, because at the 
>   point where an atomic is initialized, it generally has not yet 
>   been made visible to any other threads.

I assume that you're referring to something like std::atomic_init from
C++11 and deprecated in C++20?  I do not see any scenario on any
architecture where a compiler would emit an atomic operation for the
initialization of an atomic variable.  If a memory barrier is required
in this situation, then an explicit one can be emitted using the
existing API.

In our case -- with the compiler's atomic builtins -- the initialization
of a variable can be done without any atomic operations and does not
require any memory barrier.  This is a consequence of being capable of
working with integral-scalar/pointer type without an atomic qualifier.

> Therefore the fence which would normally appear after a store (or
> clear) generally need not appear after an initialization, and you
> might want to add a special API to force the generation of such a
> fence.

I am puzzled by this.  Initialization of a shared variable does not need
to be atomic until its publication.  Could you expand on this?

Thanks for the feedback,
Olivier

-- 
Olivier Dion
EfficiOS Inc.
https://www.efficios.com


Re: [RFC] Bridging the gap between the Linux Kernel Memory Consistency Model (LKMM) and C11/C++11 atomics

2023-07-04 Thread Alan Stern
On Tue, Jul 04, 2023 at 01:19:23PM -0400, Olivier Dion wrote:
> On Mon, 03 Jul 2023, Alan Stern  wrote:
> > On Mon, Jul 03, 2023 at 03:20:31PM -0400, Olivier Dion wrote:
> >> This is a request for comments on extending the atomic builtins API to
> >> help avoid redundant memory barriers.  Indeed, there are
> >
> > What atomic builtins API are you talking about?  The kernel's?  That's 
> > what it sounded like when I first read this sentence -- why else post 
> > your message on a kernel mailing list?
> 
> Good point, we meant the `__atomic' builtins from GCC and Clang.  Sorry
> for the confusion.

Oh, is that it?  Then I misunderstood entirely; I thought you were 
talking about augmenting the set of functions or macros made available 
in liburcu.  I did not realize you intended to change the compilers.

> Indeed, our intent is to discuss the Userspace RCU uatomic API by extending
> the toolchain's atomic builtins, not to change the LKMM itself.  We reached
> out to the Linux kernel developers because the original Userspace RCU
> uatomic API is based on the LKMM.

But why do you want to change the compilers to better support urcu?  
That seems like going about things backward; wouldn't it make more sense 
to change urcu to better match the facilities offered by the current 
compilers?

What if everybody started to do this: modifying the compilers to better 
support their pet projects?  The end result would be chaos!

> > 1.  I can see why you have special fences for before/after load, 
> > store, and rmw operations.  But why clear?  In what way is 
> > clearing an atomic variable different from storing a 0 in it?
> 
> We could indeed group the clear with the store.
> 
> We had two approaches in mind:
> 
>   a) A before/after pair by category of operation:
> 
>  - load
>  - store
>  - RMW
>   
>   b) A before/after pair for every operation:
> 
>  - load
>  - store
>  - exchange
>  - compare_exchange
>  - {add,sub,and,xor,or,nand}_fetch
>  - fetch_{add,sub,and,xor,or,nand}
>  - test_and_set
>  - clear
> 
> If we go for the grouping in a), we have to take into account that the
> barriers emitted need to cover the worst-case scenario.  As an example,
> Clang can emit a store for an exchange with SEQ_CST on x86-64 if the
> returned value is not used.
> 
> Therefore, for the grouping in a), all RMW would need to emit a memory
> barrier (with Clang on x86-64).  But with the scheme in b), we can emit
> the barrier explicitly for the exchange operation.  We however question
> the usefulness of this kind of optimization made by the compiler, since
> a user should use a store operation instead.

So in the end you settled on a compromise?

> > 2.  You don't have a special fence for use after initializing an 
> > atomic.  This operation can be treated specially, because at the 
> > point where an atomic is initialized, it generally has not yet 
> > been made visible to any other threads.
> 
> I assume that you're referring to something like std::atomic_init from
> C++11 and deprecated in C++20?  I do not see any scenario on any
> architecture where a compiler would emit an atomic operation for the
> initialization of an atomic variable.  If a memory barrier is required
> in this situation, then an explicit one can be emitted using the
> existing API.
> 
> In our case -- with the compiler's atomic builtins -- the initialization
> of a variable can be done without any atomic operations and does not
> require any memory barrier.  This is a consequence of being capable of
> working with integral-scalar/pointer type without an atomic qualifier.
> 
> > Therefore the fence which would normally appear after a store (or
> > clear) generally need not appear after an initialization, and you
> > might want to add a special API to force the generation of such a
> > fence.
> 
> I am puzzled by this.  Initialization of a shared variable does not need
> to be atomic until its publication.  Could you expand on this?

In the kernel, I believe it sometimes happens that an atomic variable 
may be published before it is initialized.  (If that's wrong, Paul or 
Peter can correct me.)  But since this doesn't apply to the situations 
you're concerned with, you can forget I mentioned it.

Alan


Re: [RFC] Bridging the gap between the Linux Kernel Memory Consistency Model (LKMM) and C11/C++11 atomics

2023-07-04 Thread Paul E. McKenney via Gcc
On Tue, Jul 04, 2023 at 04:25:45PM -0400, Alan Stern wrote:
> On Tue, Jul 04, 2023 at 01:19:23PM -0400, Olivier Dion wrote:

[ . . . ]

> > I am puzzled by this.  Initialization of a shared variable does not need
> > to be atomic until its publication.  Could you expand on this?
> 
> In the kernel, I believe it sometimes happens that an atomic variable 
> may be published before it is initialized.  (If that's wrong, Paul or 
> Peter can correct me.)  But since this doesn't apply to the situations 
> you're concerned with, you can forget I mentioned it.

Both use cases exist.

A global atomic is implicitly published at compile time.  If the desired
initial value is not known until multiple threads are running, then it
is necessary to be careful.  Hence double-checked locking and its various
replacements.  (Clearly, if you can determine the initial value before
going multithreaded, life is simpler.)
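A minimal sketch of double-checked initialization of a global using the GCC/Clang `__atomic` builtins, assuming a hypothetical initializer `compute_config()` and a simple test-and-set spinlock in place of a mutex. The acquire load on the fast path pairs with the release store that publishes the flag:

```c
#include <stdbool.h>

static int config_value;            /* the lazily computed global */
static bool config_ready;           /* publication flag, accessed atomically */
static bool config_lock;            /* tiny spinlock for the slow path */
static unsigned init_calls;         /* counts how often the slow path ran */

static int compute_config(void)     /* hypothetical expensive initializer */
{
    init_calls++;
    return 42;
}

static int get_config(void)
{
    /* Fast path: acquire pairs with the release store below. */
    if (!__atomic_load_n(&config_ready, __ATOMIC_ACQUIRE)) {
        while (__atomic_test_and_set(&config_lock, __ATOMIC_ACQUIRE))
            ;                       /* spin until this thread owns the lock */
        /* Re-check under the lock: another thread may have won the race. */
        if (!__atomic_load_n(&config_ready, __ATOMIC_RELAXED)) {
            config_value = compute_config();
            __atomic_store_n(&config_ready, true, __ATOMIC_RELEASE);
        }
        __atomic_clear(&config_lock, __ATOMIC_RELEASE);
    }
    return config_value;
}
```

Where it is available, POSIX `pthread_once()` (or C11 `call_once()`) packages the same pattern without hand-written atomics.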

And dynamically allocated or on-stack storage is the other case, where
there is a point in time when the storage is private even after multiple
threads are running.
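For the dynamically allocated case, a sketch assuming a single hypothetical `struct item` published through a global pointer: while the object is still private, plain stores initialize it, and only the publishing store needs ordering (release, paired with an acquire load in readers):

```c
#include <stdlib.h>

struct item {
    int key;
    int value;
};

static struct item *shared_item;    /* published pointer, accessed atomically */

/* Writer: the object is private until the release store, so plain
 * (non-atomic) initialization is enough. */
static int publish_item(int key, int value)
{
    struct item *p = malloc(sizeof *p);
    if (!p)
        return -1;
    p->key = key;
    p->value = value;
    /* Readers that observe the pointer also observe the fields above. */
    __atomic_store_n(&shared_item, p, __ATOMIC_RELEASE);
    return 0;
}

/* Reader: the acquire load pairs with the writer's release store. */
static struct item *lookup_item(void)
{
    return __atomic_load_n(&shared_item, __ATOMIC_ACQUIRE);
}
```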

Or am I missing the point?

Thanx, Paul


Re: wishlist: support for shorter pointers

2023-07-04 Thread Martin Uecker
On Tuesday, 04.07.2023 at 16:46 +0200, Rafał Pietrak wrote:...

> > 
> > I think a C++ class (or rather, class template) with inline functions is 
> > the way to go here.  gcc's optimiser will give good code, and the C++ 
> > class will let you get nice syntax to hide the messy details.
> 
> OK. Thanks for the advice, but going into C++ is a major thing for me and 
> (at least for the time being) I'll stay with ordinary "big" pointers in 
> plain C instead.

Depending on what you are doing, "nice syntax" may not be
worth dealing with C++ issues. But this depends a lot on 
circumstances.  If the space savings are really valuable,
I would personally just wrap accesses with a macro.
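A minimal sketch of the macro approach follows. It assumes all objects of interest live in one statically allocated pool, so a 16-bit index can stand in for a full pointer. The pool, `struct node` and the `SPTR_*` macros are hypothetical illustrations, not an existing GCC or libc facility.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node type; `next` is a 16-bit "short pointer" (pool index). */
struct node {
    int16_t value;
    uint16_t next;
};

#define POOL_SIZE 256
struct node node_pool[POOL_SIZE];

/* Index 0 is reserved to play the role of NULL. */
#define SPTR_NULL        ((uint16_t)0)
/* Short pointer -> real pointer. */
#define SPTR_DEREF(sp)   (&node_pool[(sp)])
/* Real pointer (into node_pool) -> short pointer. */
#define SPTR_FROM(p)     ((uint16_t)((p) - node_pool))

/* Sums a linked list whose links are short pointers. */
int list_sum(uint16_t head)
{
    int sum = 0;
    for (uint16_t sp = head; sp != SPTR_NULL; sp = SPTR_DEREF(sp)->next)
        sum += SPTR_DEREF(sp)->value;
    return sum;
}
```

On a 32-bit target, each node here takes 4 bytes instead of the 8 it would need with a full pointer. The cost is one extra address computation per dereference.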

> > There is no good way to do this in C.  Named address spaces would be a 
> > possibility, but require quite a bit of effort and change to the 
> > compiler to implement, and they don't give you anything that you would 
> > not get from a C++ class.
> 
> Yes. Named address spaces would be great. And for code, too.
> 

While certainly some work, implementation effort for 
new kinds of named address spaces does not seem to be
terrible at first glance:

https://gcc.gnu.org/onlinedocs/gccint/target-macros/adding-support-for-named-address-spaces.html

> 
Martin





Warning specifically for a returning noreturn

2023-07-04 Thread Julian Waters via Gcc
Hi all,

Currently to disable the warning that a noreturn method does return, it's
required to disable warnings entirely. This can be very inconvenient when
-Werror is enabled with a noreturn method that isn't specifically calling
something like std::abort() at the end, when one wants all other -Wall and
-Wextra warnings to be reported, for instance in the Java HotSpot VM (which
I'm currently adapting to compile with gcc on all supported platforms). Is
there a possibility we can add a disable warning option specifically for
this case? Something like -Wno-returning-noreturn. I'm interested in adding
this myself if it's not convenient for gcc's maintainers to do so at the
moment, but I'd need some guidance on where to look and what the relevant
code is

best regards,
Julian


Re: Warning specifically for a returning noreturn

2023-07-04 Thread Andrew Pinski via Gcc
On Tue, Jul 4, 2023 at 5:54 PM Julian Waters via Gcc  wrote:
>
> Hi all,
>
> Currently to disable the warning that a noreturn method does return, it's
> required to disable warnings entirely. This can be very inconvenient when
> -Werror is enabled with a noreturn method that isn't specifically calling
> something like std::abort() at the end, when one wants all other -Wall and
> -Wextra warnings to be reported, for instance in the Java HotSpot VM (which
> I'm currently adapting to compile with gcc on all supported platforms). Is
> there a possibility we can add a disable warning option specifically for
> this case? Something like -Wno-returning-noreturn. I'm interested in adding
> this myself if it's not convenient for gcc's maintainers to do so at the
> moment, but I'd need some guidance on where to look and what the relevant
> code is

You could just add
__builtin_unreachable(); (or std::unreachable(); if you are using C++23,
or unreachable() if you are using C23).
Or even add while(true) ;
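Here is the first workaround sketched in C. The `noreturn` function ends by telling the compiler that its end is unreachable. The names `report_error()`, `fatal()` and `checked_div()` are hypothetical. `__builtin_unreachable()` is the GCC builtin mentioned above. Reaching it at run time is undefined behavior, so it is only appropriate when the preceding call really never returns.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical error reporter: never returns in practice, but is not
   declared noreturn (e.g. it lives in code you do not control). */
void report_error(const char *msg)
{
    fprintf(stderr, "fatal: %s\n", msg);
    abort();
}

__attribute__((noreturn))
void fatal(const char *msg)
{
    report_error(msg);
    /* Without this line GCC can emit "'noreturn' function does return".
       Actually reaching it at run time would be undefined behavior. */
    __builtin_unreachable();
}

/* Hypothetical caller. */
int checked_div(int a, int b)
{
    if (b == 0)
        fatal("division by zero");
    return a / b;
}
```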

I am pretty sure not having an option is on purpose, and I am not really
interested in adding one here because of the above workarounds.

Thanks,
Andrew Pinski

>
> best regards,
> Julian


Re: Warning specifically for a returning noreturn

2023-07-04 Thread Julian Waters via Gcc
Hi Andrew, thanks for the quick response,

What if the method has a return value? I know it sounds counterintuitive,
but in some places HotSpot relies on the noreturn attribute being applied
to methods that do return a value in an unreachable code path. Does the
unreachable builtin cover that case too?

best regards.
Julian

On Wed, Jul 5, 2023 at 9:07 AM Andrew Pinski  wrote:

> On Tue, Jul 4, 2023 at 5:54 PM Julian Waters via Gcc wrote:
> >
> > Hi all,
> >
> > Currently to disable the warning that a noreturn method does return, it's
> > required to disable warnings entirely. This can be very inconvenient when
> > -Werror is enabled with a noreturn method that isn't specifically calling
> > something like std::abort() at the end, when one wants all other -Wall and
> > -Wextra warnings to be reported, for instance in the Java HotSpot VM (which
> > I'm currently adapting to compile with gcc on all supported platforms). Is
> > there a possibility we can add a disable warning option specifically for
> > this case? Something like -Wno-returning-noreturn. I'm interested in adding
> > this myself if it's not convenient for gcc's maintainers to do so at the
> > moment, but I'd need some guidance on where to look and what the relevant
> > code is
>
> You could just add
> __builtin_unreachable(); (or std::unreachable(); if you are using C++23,
> or unreachable() if you are using C23).
> Or even add while(true) ;
>
> I am pretty sure not having an option is on purpose, and I am not really
> interested in adding one here because of the above workarounds.
>
> Thanks,
> Andrew Pinski
>
> >
> > best regards,
> > Julian
>


Re: Warning specifically for a returning noreturn

2023-07-04 Thread Andrew Pinski via Gcc
On Tue, Jul 4, 2023 at 6:32 PM Julian Waters  wrote:
>
> Hi Andrew, thanks for the quick response,
>
> What if the method has a return value? I know it sounds counterintuitive, but 
> in some places HotSpot relies on the noreturn attribute being applied to 
> methods that do return a value in an unreachable code path. Does the 
> unreachable builtin cover that case too?

It is wrong to use noreturn on a function other than one which has a
return type of void as documented.
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-noreturn-function-attribute
:
```
It does not make sense for a noreturn function to have a return type
other than void.
```
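On the question of methods that return a value, keeping the attribute on a `void` function as documented still works for callers that return a value. Such a caller can end in a call to the `noreturn` function without a dummy `return`, because GCC does not treat the path after a `noreturn` call as falling off the end (so `-Wreturn-type` stays quiet). Here is a minimal sketch with hypothetical `die()` and `digit_value()`:

```c
#include <stdio.h>
#include <stdlib.h>

/* Documented usage: noreturn on a function whose return type is void. */
__attribute__((noreturn))
void die(const char *msg)
{
    fprintf(stderr, "fatal: %s\n", msg);
    exit(EXIT_FAILURE);
}

/* Hypothetical value-returning function: since die() is noreturn,
   no dummy "return 0;" is needed after the call. */
int digit_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    die("not a digit");
}
```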

Thanks,
Andrew Pinski


>
> best regards.
> Julian
>
> On Wed, Jul 5, 2023 at 9:07 AM Andrew Pinski  wrote:
>>
>> On Tue, Jul 4, 2023 at 5:54 PM Julian Waters via Gcc  wrote:
>> >
>> > Hi all,
>> >
>> > Currently to disable the warning that a noreturn method does return, it's
>> > required to disable warnings entirely. This can be very inconvenient when
>> > -Werror is enabled with a noreturn method that isn't specifically calling
>> > something like std::abort() at the end, when one wants all other -Wall and
>> > -Wextra warnings to be reported, for instance in the Java HotSpot VM (which
>> > I'm currently adapting to compile with gcc on all supported platforms). Is
>> > there a possibility we can add a disable warning option specifically for
>> > this case? Something like -Wno-returning-noreturn. I'm interested in adding
>> > this myself if it's not convenient for gcc's maintainers to do so at the
>> > moment, but I'd need some guidance on where to look and what the relevant
>> > code is
>>
>> You could just add
>> __builtin_unreachable(); (or std::unreachable(); if you are using C++23,
>> or unreachable() if you are using C23).
>> Or even add while(true) ;
>>
>> I am pretty sure not having an option is on purpose, and I am not really
>> interested in adding one here because of the above workarounds.
>>
>> Thanks,
>> Andrew Pinski
>>
>> >
>> > best regards,
>> > Julian


Re: wishlist: support for shorter pointers

2023-07-04 Thread Rafał Pietrak via Gcc

Hi,

On 5.07.2023 at 00:57, Martin Uecker wrote:

> On Tuesday, 04.07.2023 at 16:46 +0200, Rafał Pietrak wrote:...
>
> []
>
>> Yes. Named address spaces would be great. And for code, too.
>
> While certainly some work, implementation effort for
> new kinds of named address spaces does not seem to be
> terrible at first glance:
>
> https://gcc.gnu.org/onlinedocs/gccint/target-macros/adding-support-for-named-address-spaces.html


Oh! I see. This is good news. Although that internals documentation is 
complete black magic to me and I cannot tell heads from tails in it, the 
surrounding comments sound promising, as if GCC 13 already had the 
internal "machinery" supporting named address spaces and only the 
CPU-platform-specific code is missing (for all but the "SPU port"). Is that 
right?


And if so, there is no mention of how it shows up for a "simple user" 
of GCC (as opposed to the creators of a particular GCC port, who use that 
"machinery"). In other words: what should the sources look like for the 
compiler to do "the thing"?


-R


Re: Warning specifically for a returning noreturn

2023-07-04 Thread LIU Hao via Gcc

On 2023/7/5 09:40, Andrew Pinski via Gcc wrote:

> It is wrong to use noreturn on a function other than one which has a
> return type of void as documented.
> https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-noreturn-function-attribute
> :
> ```
> It does not make sense for a noreturn function to have a return type
> other than void.
> ```


It makes sense for a callback or virtual function which requires a non-void 
return type, e.g.

```
#include <stdexcept>
#include <streambuf>

class my_stream : public std::streambuf
  {
  protected:
    virtual
    int
    underflow() override;
  };

__attribute__((__noreturn__))   // [[noreturn]] won't work
int
my_stream::
underflow()
  {
    throw std::invalid_argument("not implemented");
  }
```



--
Best regards,
LIU Hao


