Re: [RFC] Bridging the gap between the Linux Kernel Memory Consistency Model (LKMM) and C11/C++11 atomics
On Mon, Jul 03, 2023 at 03:20:31PM -0400, Olivier Dion wrote:

> int x = 0;
> int y = 0;
> int r0, r1;
>
> int dummy;
>
> void t0(void)
> {
>     __atomic_store_n(&x, 1, __ATOMIC_RELAXED);
>
>     __atomic_exchange_n(&dummy, 1, __ATOMIC_SEQ_CST);
>     __atomic_thread_fence(__ATOMIC_SEQ_CST);
>
>     r0 = __atomic_load_n(&y, __ATOMIC_RELAXED);
> }
>
> void t1(void)
> {
>     __atomic_store_n(&y, 1, __ATOMIC_RELAXED);
>     __atomic_thread_fence(__ATOMIC_SEQ_CST);
>     r1 = __atomic_load_n(&x, __ATOMIC_RELAXED);
> }
>
> // BUG_ON(r0 == 0 && r1 == 0)
>
> On x86-64 (gcc 13.1 -O2) we get:
>
> t0():
>     movl    $1, x(%rip)
>     movl    $1, %eax
>     xchgl   dummy(%rip), %eax
>     lock orq $0, (%rsp)     ;; Redundant with previous exchange.
>     movl    y(%rip), %eax
>     movl    %eax, r0(%rip)
>     ret
> t1():
>     movl    $1, y(%rip)
>     lock orq $0, (%rsp)
>     movl    x(%rip), %eax
>     movl    %eax, r1(%rip)
>     ret

So I would expect the compilers to do better here. It should know those
__atomic_thread_fence() thingies are superfluous and simply not emit
them. This could even be done as a peephole pass later, where it sees
consecutive atomic ops and the second being a no-op.

> On x86-64 (clang 16 -O2) we get:
>
> t0():
>     movl    $1, x(%rip)
>     movl    $1, %eax
>     xchgl   %eax, dummy(%rip)
>     mfence                  ;; Redundant with previous exchange.

And that's just terrible :/ Nobody should be using MFENCE for this. And
using MFENCE after a LOCK prefixes instruction (implicit in this case)
is just fail, because I don't think C++ atomics cover MMIO and other
such 'lovely' things.

>     movl    y(%rip), %eax
>     movl    %eax, r0(%rip)
>     retq
> t1():
>     movl    $1, y(%rip)
>     mfence
>     movl    x(%rip), %eax
>     movl    %eax, r1(%rip)
>     retq
Re: [RFC] Bridging the gap between the Linux Kernel Memory Consistency Model (LKMM) and C11/C++11 atomics
On Tue, 4 Jul 2023 at 10:47, Peter Zijlstra wrote:
>
> On Mon, Jul 03, 2023 at 03:20:31PM -0400, Olivier Dion wrote:
>
> > int x = 0;
> > int y = 0;
> > int r0, r1;
> >
> > int dummy;
> >
> > void t0(void)
> > {
> >     __atomic_store_n(&x, 1, __ATOMIC_RELAXED);
> >
> >     __atomic_exchange_n(&dummy, 1, __ATOMIC_SEQ_CST);
> >     __atomic_thread_fence(__ATOMIC_SEQ_CST);
> >
> >     r0 = __atomic_load_n(&y, __ATOMIC_RELAXED);
> > }
> >
> > void t1(void)
> > {
> >     __atomic_store_n(&y, 1, __ATOMIC_RELAXED);
> >     __atomic_thread_fence(__ATOMIC_SEQ_CST);
> >     r1 = __atomic_load_n(&x, __ATOMIC_RELAXED);
> > }
> >
> > // BUG_ON(r0 == 0 && r1 == 0)
> >
> > On x86-64 (gcc 13.1 -O2) we get:
> >
> > t0():
> >     movl    $1, x(%rip)
> >     movl    $1, %eax
> >     xchgl   dummy(%rip), %eax
> >     lock orq $0, (%rsp)     ;; Redundant with previous exchange.
> >     movl    y(%rip), %eax
> >     movl    %eax, r0(%rip)
> >     ret
> > t1():
> >     movl    $1, y(%rip)
> >     lock orq $0, (%rsp)
> >     movl    x(%rip), %eax
> >     movl    %eax, r1(%rip)
> >     ret
>
> So I would expect the compilers to do better here. It should know those
> __atomic_thread_fence() thingies are superfluous and simply not emit
> them. This could even be done as a peephole pass later, where it sees
> consecutive atomic ops and the second being a no-op.

Right, I don't see why we need a whole set of new built-ins that say
"this fence isn't needed if the adjacent atomic op already implies a
fence". If the adjacent atomic op already implies a fence for a given
ISA, then the compiler should already be able to elide the explicit
fence. So just write your code with the explicit fence, and rely on the
compiler to optimize it properly.

Admittedly, today's compilers don't do that optimization well, but they
also don't support your proposed built-ins, so you're going to have to
wait for compilers to make improvements either way.

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4455.html
discusses that compilers could (and should) optimize around atomics
better.

>
> > On x86-64 (clang 16 -O2) we get:
> >
> > t0():
> >     movl    $1, x(%rip)
> >     movl    $1, %eax
> >     xchgl   %eax, dummy(%rip)
> >     mfence                  ;; Redundant with previous exchange.
>
> And that's just terrible :/ Nobody should be using MFENCE for this. And
> using MFENCE after a LOCK prefixes instruction (implicit in this case)
> is just fail, because I don't think C++ atomics cover MMIO and other
> such 'lovely' things.
>
> >     movl    y(%rip), %eax
> >     movl    %eax, r0(%rip)
> >     retq
> > t1():
> >     movl    $1, y(%rip)
> >     mfence
> >     movl    x(%rip), %eax
> >     movl    %eax, r1(%rip)
> >     retq
>
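For anyone who wants to poke at the litmus test quoted above rather than just read the generated assembly, here is a minimal sketch of a harness, assuming POSIX threads are available. The functions t0() and t1() mirror the quoted code; count_forbidden() is a hypothetical helper name, not something from the original post. It only counts how often the outcome flagged by the BUG_ON comment shows up, and a finite run can of course never prove that an outcome is impossible.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static int x, y, dummy;
static int r0, r1;

/* Same body as t0() in the quoted mail, wrapped as a thread entry point. */
static void *t0(void *arg)
{
    (void)arg;
    __atomic_store_n(&x, 1, __ATOMIC_RELAXED);
    __atomic_exchange_n(&dummy, 1, __ATOMIC_SEQ_CST);
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
    r0 = __atomic_load_n(&y, __ATOMIC_RELAXED);
    return NULL;
}

/* Same body as t1() in the quoted mail. */
static void *t1(void *arg)
{
    (void)arg;
    __atomic_store_n(&y, 1, __ATOMIC_RELAXED);
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
    r1 = __atomic_load_n(&x, __ATOMIC_RELAXED);
    return NULL;
}

/* Hypothetical helper: run both threads `iterations` times and count
 * how often the outcome r0 == 0 && r1 == 0 is observed. */
static int count_forbidden(int iterations)
{
    int forbidden = 0;

    for (int i = 0; i < iterations; i++) {
        pthread_t a, b;

        x = y = dummy = 0;
        r0 = r1 = -1;

        int rc_a = pthread_create(&a, NULL, t0, NULL);
        int rc_b = pthread_create(&b, NULL, t1, NULL);
        assert(rc_a == 0 && rc_b == 0);

        /* Joining orders the plain writes to r0/r1 before the reads below. */
        pthread_join(a, NULL);
        pthread_join(b, NULL);

        if (r0 == 0 && r1 == 0)
            forbidden++;
    }
    return forbidden;
}
```

With both SEQ_CST fences in place, count_forbidden() should return 0; removing one of the fences is the interesting experiment, though whether the weak outcome then actually appears depends on the hardware and on timing.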
Re: wishlist: support for shorter pointers
On 03/07/2023 18:42, Rafał Pietrak via Gcc wrote:
> Hi Ian,
>
> On 3.07.2023 at 17:07, Ian Lance Taylor wrote:
> > On Wed, Jun 28, 2023 at 11:21 PM Rafał Pietrak via Gcc wrote:
> > []
> > > I was thinking about that, and it doesn't look as requiring that
> > > deep rewrites. ABI spec, that could accommodate the functionality
> > > could be as little as one additional attribute to linker segments.
> >
> > If I understand correctly, you are looking for something like the
> > x32 mode that was available for a while on x86_64 processors:
> > https://en.wikipedia.org/wiki/X32_ABI . That was a substantial
> > amount of work including changes to the compiler, assembler, linker,
> > standard library, and kernel. And at least to me it's never seemed
> > particularly popular.
>
> Yes. And WiKi reporting up to 40% performance improvements in some
> corner cases is impressive and encouraging. I believe, that the
> reported average of 5-8% improvement would be significantly better
> within MCU tiny resources environment. In MCU world, such improvement
> could mean fit-nofit of a project into a particular device.
>
> -R

A key difference is that using 32-bit pointers on an x86 is enough
address space for a large majority of use-cases, while even on the
smallest small ARM microcontroller, 16-bit is not enough. (It's not
even enough to access all memory on larger AVR microcontrollers - the
only 8-bit device supported by mainline gcc.) So while 16 bits would
cover the address space of the RAM on a small ARM microcontroller, it
would not cover access to code/flash space (including read-only data),
IO registers, or other areas of memory-mapped memory and peripherals.
Generic low-level pointers really have to be able to access everything.

So an equivalent of x32 mode would not work at all. Really, what you
want is a 16-bit "small pointer" that is added to 0x20000000 (the base
address for RAM in small ARM devices, in case anyone following this
thread is unfamiliar with the details) to get a real data pointer. And
you'd like these small pointers to have convenient syntax and efficient
use.

I think a C++ class (or rather, class template) with inline functions
is the way to go here. gcc's optimiser will give good code, and the C++
class will let you get nice syntax to hide the messy details.

There is no good way to do this in C. Named address spaces would be a
possibility, but require quite a bit of effort and change to the
compiler to implement, and they don't give you anything that you would
not get from a C++ class.

(That's not quite true - named address spaces can, I believe, also
influence the section name used for allocation of data defined in these
spaces, which cannot be done by a C++ class.)

David
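To make the "16-bit offset added to a base address" idea concrete, here is a rough sketch in plain C rather than the C++ class suggested above, so it has none of the nice syntax. Everything named here (small_ptr, SMALL_NULL, arena, small_from, small_to, sum_list) is hypothetical. The base is the start of a static array so the code can run on a host machine; on a real microcontroller the base would be the fixed start of RAM instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A 16-bit "small pointer": a byte offset from the start of `arena`.
 * The offset 0xFFFF is reserved to play the role of NULL. */
typedef struct { uint16_t off; } small_ptr;
#define SMALL_NULL ((small_ptr){ 0xFFFFu })

struct node {
    small_ptr next;   /* 2 bytes instead of a full-width pointer */
    uint16_t  value;
};

/* Stand-in for the RAM region the small pointers can reach. */
static struct node arena[64];

/* Compress a real pointer into the arena to a 16-bit offset. */
static inline small_ptr small_from(const struct node *p)
{
    if (p == NULL)
        return SMALL_NULL;
    uintptr_t delta = (uintptr_t)p - (uintptr_t)arena;
    assert(delta < 0xFFFFu);          /* must fit and must not hit the null marker */
    return (small_ptr){ (uint16_t)delta };
}

/* Expand a 16-bit offset back into a real pointer. */
static inline struct node *small_to(small_ptr s)
{
    if (s.off == 0xFFFFu)
        return NULL;
    return (struct node *)((unsigned char *)arena + s.off);
}

/* Walk a list linked through small pointers and add up the values. */
static unsigned sum_list(small_ptr head)
{
    unsigned total = 0;
    for (struct node *n = small_to(head); n != NULL; n = small_to(n->next))
        total += n->value;
    return total;
}
```

Wrapping the offset in a struct is a deliberate choice: it stops a small pointer from being mixed up with an ordinary integer by accident, which is about as much type safety as C offers here.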
Re: wishlist: support for shorter pointers
> I think a C++ class (or rather, class template) with inline functions is
> the way to go here. gcc's optimiser will give good code, and the C++
> class will let you get nice syntax to hide the messy details.
>
> There is no good way to do this in C. Named address spaces would be a
> possibility, but require quite a bit of effort and change to the
> compiler to implement, and they don't give you anything that you would
> not get from a C++ class.
>
> (That's not quite true - named address spaces can, I believe, also
> influence the section name used for allocation of data defined in these
> spaces, which cannot be done by a C++ class.)
>

Does the C++ template class shebang work for storing "short code
pointers" for things like compile-time/link-time generated function
tables? Haven't tried it myself, but somehow I doubt it.

Cheers,
Oleg
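This does not answer the question about the C++ class, but for comparison, one way "short code pointers" are sometimes handled in plain C is to store a small index into a table of function pointers. The sketch below uses a hand-written table; the names handler_ref, handler_table, call_handler and the three op_* functions are all made up for illustration, and a table generated at link time would need linker support that is not shown.

```c
#include <assert.h>
#include <stdint.h>

typedef int (*handler_fn)(int);

static int op_inc(int v)    { return v + 1; }
static int op_double(int v) { return v * 2; }
static int op_negate(int v) { return -v; }

/* Indices into the table below; H_COUNT is the number of entries. */
enum { H_INC, H_DOUBLE, H_NEGATE, H_COUNT };

/* The full-width function pointers live once, in read-only data. */
static const handler_fn handler_table[H_COUNT] = {
    [H_INC]    = op_inc,
    [H_DOUBLE] = op_double,
    [H_NEGATE] = op_negate,
};

/* The "short code pointer": one byte that selects a table entry. */
typedef uint8_t handler_ref;

/* Call through a short reference, checking the index first. */
static int call_handler(handler_ref h, int arg)
{
    assert(h < H_COUNT);
    return handler_table[h](arg);
}
```

Each stored reference costs one byte, at the price of an extra table lookup per call.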
Re: wishlist: support for shorter pointers
On 3.07.2023 at 18:29, Rafał Pietrak wrote:
> Hi David,
>
> [--]
> > 4. It is worth taking a step back, and thinking about how you would
> > like to use these pointers. It is likely that you would be better
> > thinking in terms of an array, rather than pointers - after all, you
> > don't want to be using dynamically allocated memory here if you can
> > avoid it, and certainly not generic malloc(). If you can use an
> > array, then your index type can be as small as you like - maybe
> > uint8_t is enough.
>
> I did that trip ... some time ago. May be I discarded the idea
> prematurely, but I dropped it because I was afraid of cost of

I remember now what was my main problem with indexes implementation:
inability to express/write chain "references" with them. Table/index
semantic of:

    t[a][b][c][d]

is a "multidimensional table" which is completely different from
"pointer semantic" of:

    *t->a->b->c->d

It is quite legit to do a full circle around a circular list this way,
while table semantics doesn't allow that.

Indexes are off the table.

-R
Re: wishlist: support for shorter pointers
Hi,

On 4.07.2023 at 14:38, David Brown wrote:
[-]
> A key difference is that using 32-bit pointers on an x86 is enough
> address space for a large majority of use-cases, while even on the
> smallest small ARM microcontroller, 16-bit is not enough. (It's not
> even enough to access all memory on larger AVR microcontrollers - the
> only 8-bit device supported by mainline gcc.) So while 16 bits would
> cover the address space of the RAM on a small ARM microcontroller, it
> would not cover access to code/flash space (including read-only data),
> IO registers, or other areas of memory-mapped memory and peripherals.
> Generic low-level pointers really have to be able to access everything.

Naturally 16-bit is "most of the time" not enough to cover the entire
workspace on even the smallest MCU (AVR being the only close to an
exception here), but in my little experience, that is not really
necessary. Meaning "generic low-level pointers really have to...", I
don't think so. I really don't. Programs often manipulate quite
"localized" data, and compiler is capable enough to distinguish and
keep separate pointers of different "domains". What makes it currently
impossible is the lack of tools (semantic constructs like pragma or
named sections) that would let it happen.

> So an equivalent of x32 mode would not work at all. Really, what you
> want is a 16-bit "small pointer" that is added to 0x20000000 (the base
> address for RAM in small ARM devices, in case anyone following this
> thread is unfamiliar with the details) to get a real data pointer. And
> you'd like these small pointers to have convenient syntax and
> efficient use.

more or less yes. But "with a twist". A "compiler construct" that would
be (say) sufficient to get the RAM-savings/optimization I'm aiming at
could be "reduced" to the ability to create "medium-size" array of
"some objects" and have them reference each other all WITHIN that
"array". That array was in my earlier emails referred to as segment or
section.

So whenever a programmer writes a construct like:

    struct test_s attribute((small-and-funny)) {
        struct test_s attribute((small-and-funny)) *next, *prev, *head;
        struct test_s attribute((small-and-funny)) *user, *group;
    } repository[1000];
    struct test_s attribute((small-and-funny)) *master, *trash;

compiler puts that data into that small array (dedicated section), so
no "generic low-level pointers" referring that data would need to exist
within the program. And if it happens, error is thrown (or
autoconversion happen).

> I think a C++ class (or rather, class template) with inline functions
> is the way to go here. gcc's optimiser will give good code, and the
> C++ class will let you get nice syntax to hide the messy details.

OK. Thenx for the advice, but going into c++ is a major thing for me
and (at least for the time being) I'll stay with ordinary "big"
pointers in plain C instead.

> There is no good way to do this in C. Named address spaces would be a
> possibility, but require quite a bit of effort and change to the
> compiler to implement, and they don't give you anything that you would
> not get from a C++ class.

Yes. named address spaces would be great. And for code, too.

> (That's not quite true - named address spaces can, I believe, also
> influence the section name used for allocation of data defined in
> these spaces, which cannot be done by a C++ class.)

OK.

-R
Re: wishlist: support for shorter pointers
On 04/07/2023 16:20, Rafał Pietrak wrote:
> On 3.07.2023 at 18:29, Rafał Pietrak wrote:
> > Hi David,
> >
> > [--]
> > > 4. It is worth taking a step back, and thinking about how you
> > > would like to use these pointers. It is likely that you would be
> > > better thinking in terms of an array, rather than pointers - after
> > > all, you don't want to be using dynamically allocated memory here
> > > if you can avoid it, and certainly not generic malloc(). If you
> > > can use an array, then your index type can be as small as you like
> > > - maybe uint8_t is enough.
> >
> > I did that trip ... some time ago. May be I discarded the idea
> > prematurely, but I dropped it because I was afraid of cost of
>
> I remember now what was my main problem with indexes implementation:
> inability to express/write chain "references" with them. Table/index
> semantic of:
>
>     t[a][b][c][d]
>
> is a "multidimensional table" which is completely different from
> "pointer semantic" of:
>
>     *t->a->b->c->d
>
> It is quite legit to do a full circle around a circular list this way,
> while table semantics doesn't allow that.
>
> Indexes are off the table.
>
> -R

If you have a circular buffer, it is vastly more efficient to have an
array with no pointers or indices, and use head and tail indices to
track the current position. But I'm not sure if that is what you are
looking for. And you can use indices in fields for chaining, but the
syntax will be different. (For some microcontrollers, the
multiplications involved in array index calculations can be an issue,
but not for ARM devices.)
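For readers unfamiliar with the circular-buffer layout mentioned above, a minimal sketch follows: a plain array plus a head and a tail index, with no per-element links at all. The names (struct ring, ring_put, ring_get, RING_SIZE) are illustrative only, and the size is assumed to be a power of two so that the free-running 8-bit counters wrap consistently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u   /* assumed to be a power of two that divides 256 */

struct ring {
    uint8_t data[RING_SIZE];
    uint8_t head;      /* free-running count of elements ever written */
    uint8_t tail;      /* free-running count of elements ever read */
};

/* Append one byte; returns false if the buffer is full. */
static bool ring_put(struct ring *r, uint8_t v)
{
    if ((uint8_t)(r->head - r->tail) == RING_SIZE)
        return false;
    r->data[r->head % RING_SIZE] = v;
    r->head++;
    return true;
}

/* Remove the oldest byte; returns false if the buffer is empty. */
static bool ring_get(struct ring *r, uint8_t *out)
{
    if (r->head == r->tail)
        return false;
    *out = r->data[r->tail % RING_SIZE];
    r->tail++;
    return true;
}
```

The whole bookkeeping overhead is two bytes per buffer, regardless of how many elements it holds.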
Re: wishlist: support for shorter pointers
On 04/07/2023 16:46, Rafał Pietrak wrote:
> Hi,
>
> On 4.07.2023 at 14:38, David Brown wrote:
> [-]
> > A key difference is that using 32-bit pointers on an x86 is enough
> > address space for a large majority of use-cases, while even on the
> > smallest small ARM microcontroller, 16-bit is not enough. (It's not
> > even enough to access all memory on larger AVR microcontrollers -
> > the only 8-bit device supported by mainline gcc.) So while 16 bits
> > would cover the address space of the RAM on a small ARM
> > microcontroller, it would not cover access to code/flash space
> > (including read-only data), IO registers, or other areas of
> > memory-mapped memory and peripherals. Generic low-level pointers
> > really have to be able to access everything.
>
> Naturaly 16-bit is "most of the time" not enough to cover the entire
> workspace on even the smallest MCU (AVR being the only close to an
> exception here), but in my little experience, that is not really
> necessary.

(Most MSP430 devices, also supported by GCC, are also covered by a
16-bit address space.)

> Meaning "generic low-level pointers really have to...", I don't think
> so. I really don't. Programs often manipulate quite "localized" data,
> and compiler is capable enough to distinguish and keep separate
> pointers of different "domains". What makes it currently impossible is
> tools (semantic constructs like pragma or named sections) that would
> let it happen.

No, generic low-level pointers /do/ have to work with all reasonable
address spaces on the device. A generic pointer has to support pointing
to modifiable ram, to constant data (flash on small microcontrollers),
to IO registers, etc. If you want something that can access a specific,
restricted area, then it is a specialised pointer - not a generic one.
C has no support for making your own pointer types, but C++ does.

> > So an equivalent of x32 mode would not work at all. Really, what you
> > want is a 16-bit "small pointer" that is added to 0x20000000 (the
> > base address for RAM in small ARM devices, in case anyone following
> > this thread is unfamiliar with the details) to get a real data
> > pointer. And you'd like these small pointers to have convenient
> > syntax and efficient use.
>
> more or less yes. But "with a twist". A "compiler construct" that
> would be (say) sufficient to get the RAM-savings/optimization I'm
> aiming at could be "reduced" to the ability to create "medium-size"
> array of "some objects" and have them reference each other all WITHIN
> that "array". That array was in my earlier emails referred to as
> segment or section.
>
> So whenever a programmer writes a construct like:
>
>     struct test_s attribute((small-and-funny)) {
>         struct test_s attribute((small-and-funny)) *next, *prev, *head;
>         struct test_s attribute((small-and-funny)) *user, *group;
>     } repository[1000];
>     struct test_s attribute((small-and-funny)) *master, *trash;
>
> compiler puts that data into that small array (dedicated section), so
> no "generic low-level pointers" referring that data would need to
> exist within the program. And if it happens, error is thrown (or
> autoconversion happen).

GCC attributes for sections already exist. And again - indices will
give you what you need here more efficiently than pointers. All of your
pointers can be converted to "repository[i]" format. (And if your
repository has no more than 256 entries, 8-bit indices will be
sufficient.) It can be efficient to store pointers to the entries in
local variables if you are using them a lot, though GCC will do a fair
amount of that automatically.

> > I think a C++ class (or rather, class template) with inline
> > functions is the way to go here. gcc's optimiser will give good
> > code, and the C++ class will let you get nice syntax to hide the
> > messy details.
>
> OK. Thenx for the advice, but going into c++ is a major thing for me
> and (at least for the time being) I'll stay with ordinary "big"
> pointers in plain C instead.
>
> > There is no good way to do this in C. Named address spaces would be
> > a possibility, but require quite a bit of effort and change to the
> > compiler to implement, and they don't give you anything that you
> > would not get from a C++ class.
>
> Yes. named address spaces would be great. And for code, too.

It is good to have a wishlist (and you can file a wishlist "bug" in the
gcc bugzilla, so that it won't be forgotten). But it is also good to be
realistic. Indices will give you what you need in terms of space
efficiency, but will be messier in the syntax. A small pointer class
will give you efficient code and neat syntax, but require C++. These
two solutions will, however, work today. (And they are both target
independent.)

David

> > (That's not quite true - named address spaces can, I believe, also
> > influence the section name used for allocation of data defined in
> > these spaces, which cannot be done by a C++ class.)
>
> OK.
>
> -R
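As an illustration of converting the links to "repository[i]" format, here is a small sketch in plain C. The field names follow the struct quoted above; ref_t, AT(), make_ring() and ring_length() are hypothetical names introduced for this example. It also shows that walking all the way around a circular list works with indices, just with clumsier syntax than the pointer chain.

```c
#include <assert.h>
#include <stdint.h>

#define REPO_SIZE 1000u

typedef uint16_t ref_t;            /* index into repository[], 2 bytes */

struct test_s {
    ref_t next, prev, head;
    ref_t user, group;
};

static struct test_s repository[REPO_SIZE];

/* Shorthand for "the entry this index refers to". */
#define AT(i) (repository[(i)])

/* Link `count` consecutive entries starting at `first` into a circle. */
static void make_ring(ref_t first, ref_t count)
{
    for (ref_t k = 0; k < count; k++) {
        ref_t cur = (ref_t)(first + k);
        ref_t nxt = (ref_t)(first + (k + 1) % count);
        AT(cur).next = nxt;
        AT(nxt).prev = cur;
        AT(cur).head = first;
    }
}

/* Go once around the circle that `start` belongs to and count entries. */
static unsigned ring_length(ref_t start)
{
    unsigned n = 0;
    ref_t i = start;
    do {
        n++;
        i = AT(i).next;
    } while (i != start);
    return n;
}
```

The index equivalent of the pointer chain t->next->next->prev is AT(AT(AT(t).next).next).prev, which is the "messier syntax" trade-off mentioned above.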
Re: wishlist: support for shorter pointers
On 4.07.2023 at 17:13, David Brown wrote:
[]
> If you have a circular buffer, it is vastly more efficient to have an
> array with no pointers or indices, and use head and tail indices to
> track the current position. But I'm not sure if that is what you are
> looking for. And you can use indices in fields for chaining, but the
> syntax will be different. (For some microcontrollers, the
> multiplications involved in array index calculations can be an issue,
> but not for ARM devices.)

Ring Buffers, yes and no. They have their uses, but in this particular
case (my current project) using them is pointless. A little
explanation: at this point I have an "object" (a structure, or rather a
union of structures) with 6 pointers and some additional data. Those 6
pointers are entangled in something that look like "neural network"
(although it's NOT one). This structure is sort of a demo, a template.
It's expected to grow somewhat for the real thing ... like 3-5 times
current structure. This translates to 100-150 bytes each (from current
32bytes) with "big several" expected as total size of the system. And
my target is 2K-RAM/4kRAM devices.

I don't imagine turning this web into any amount of RB. In my capacity,
that'd make it unmanageable.

But this is just me. I thought people doing embedded out there face
similar problems, and a nice compiler "pragma" into direction of named
spaces/segments could really help here.

-R
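To put rough numbers on the saving under discussion, here is a back-of-the-envelope sketch. It assumes that the current 32-byte object is six 4-byte pointers plus 8 bytes of other data; that split is a guess from the figures above, not something stated in the thread. uint32_t stands in for a 32-bit pointer so the sizes come out the same on a 64-bit host, and how_many_fit() is a made-up helper that ignores stack, globals and everything else competing for RAM.

```c
#include <assert.h>
#include <stdint.h>

/* Current layout: six full-width (32-bit) links plus assumed 8 bytes of data. */
struct big_links {
    uint32_t link[6];
    uint8_t  payload[8];
};

/* Same object with 16-bit links (small pointers or indices). */
struct small_links {
    uint16_t link[6];
    uint8_t  payload[8];
};

_Static_assert(sizeof(struct big_links) == 32, "6*4 + 8 bytes");
_Static_assert(sizeof(struct small_links) == 20, "6*2 + 8 bytes");

/* Upper bound on how many objects of a given size fit into `ram_bytes`. */
static unsigned how_many_fit(unsigned ram_bytes, unsigned object_bytes)
{
    return ram_bytes / object_bytes;
}
```

Under these assumptions a 2 KiB device would hold at most 64 of the large objects and at most 102 of the small ones.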
Re: wishlist: support for shorter pointers
On 4.07.2023 at 17:55, David Brown wrote:
> On 04/07/2023 16:46, Rafał Pietrak wrote:
[--]
> > Yes. named address spaces would be great. And for code, too.
>
> It is good to have a wishlist (and you can file a wishlist "bug" in
> the gcc bugzilla, so that it won't be forgotten). But it is also good
> to be realistic. Indices will give you what you need in terms of space
> efficiency, but will be messier in the syntax. A small pointer class
> will give you efficient code and neat syntax, but require C++. These
> two solutions will, however, work today. (And they are both target
> independent.)

OK, Eventually I may invest into the ++. For now, thenx for the
discussion and pointing me to the most promising directions.

See U.

-R
Re: [RFC] Bridging the gap between the Linux Kernel Memory Consistency Model (LKMM) and C11/C++11 atomics
On Mon, 03 Jul 2023, Alan Stern wrote:
> On Mon, Jul 03, 2023 at 03:20:31PM -0400, Olivier Dion wrote:
>> This is a request for comments on extending the atomic builtins API to
>> help avoiding redundant memory barriers. Indeed, there are
>
> What atomic builtins API are you talking about? The kernel's? That's
> what it sounded like when I first read this sentence -- why else post
> your message on a kernel mailing list?

Good point, we meant the `__atomic' builtins from GCC and Clang. Sorry
for the confusion.

[...]

>> fully-ordered atomic operations like xchg and cmpxchg success in LKMM
>> have implicit memory barriers before/after the operations [1-2], while
>> atomic operations using the __ATOMIC_SEQ_CST memory order in C11/C++11
>> do not have any ordering guarantees of an atomic thread fence
>> __ATOMIC_SEQ_CST with respect to other non-SEQ_CST operations [3].
>
> After reading what you wrote below, I realized that the API you're
> thinking of modifying is the one used by liburcu for user programs.
> It's a shame you didn't mention this in either the subject line or the
> first few paragraphs of the email; that would have made understanding
> the message a little easier.

Indeed, our intent is to discuss the Userspace RCU uatomic API by
extending the toolchain's atomic builtins and not the LKMM itself. The
reason why we've reached out to the Linux kernel developers is because
the original Userspace RCU uatomic API is based on the LKMM.

> In any case, your proposal seems reasonable to me at first glance, with
> two possible exceptions:
>
> 1. I can see why you have special fences for before/after load,
>    store, and rmw operations. But why clear? In what way is
>    clearing an atomic variable different from storing a 0 in it?

We could indeed group the clear with the store.

We had two approaches in mind:

  a) A before/after pair by category of operation:

     - load
     - store
     - RMW

  b) A before/after pair for every operation:

     - load
     - store
     - exchange
     - compare_exchange
     - {add,sub,and,xor,or,nand}_fetch
     - fetch_{add,sub,and,xor,or,nand}
     - test_and_set
     - clear

If we go for the grouping in a), we have to take into account that the
barriers emitted need to cover the worst-case scenario. As an example,
Clang can emit a store for an exchange with SEQ_CST on x86-64, if the
returned value is not used.

Therefore, for the grouping in a), all RMW would need to emit a memory
barrier (with Clang on x86-64). But with the scheme in b), we can emit
the barrier explicitly for the exchange operation. We however question
the usefulness of this kind of optimization made by the compiler, since
a user should use a store operation instead.

> 2. You don't have a special fence for use after initializing an
>    atomic. This operation can be treated specially, because at the
>    point where an atomic is initialized, it generally has not yet
>    been made visible to any other threads.

I assume that you're referring to something like std::atomic_init from
C++11 and deprecated in C++20? I do not see any scenario on any
architecture where a compiler would emit an atomic operation for the
initialization of an atomic variable. If a memory barrier is required
in this situation, then an explicit one can be emitted using the
existing API.

In our case -- with the compiler's atomic builtins -- the initialization
of a variable can be done without any atomic operations and does not
require any memory barrier. This is a consequence of being capable of
working with integral-scalar/pointer type without an atomic qualifier.

> Therefore the fence which would normally appear after a store (or
> clear) generally need not appear after an initialization, and you
> might want to add a special API to force the generation of such a
> fence.

I am puzzled by this. Initialization of a shared variable does not need
to be atomic until its publication. Could you expand on this?

Thanks for the feedback,
Olivier
-- 
Olivier Dion
EfficiOS Inc.
https://www.efficios.com
Re: [RFC] Bridging the gap between the Linux Kernel Memory Consistency Model (LKMM) and C11/C++11 atomics
On Tue, Jul 04, 2023 at 01:19:23PM -0400, Olivier Dion wrote:
> On Mon, 03 Jul 2023, Alan Stern wrote:
> > On Mon, Jul 03, 2023 at 03:20:31PM -0400, Olivier Dion wrote:
> >> This is a request for comments on extending the atomic builtins API to
> >> help avoiding redundant memory barriers. Indeed, there are
> >
> > What atomic builtins API are you talking about? The kernel's? That's
> > what it sounded like when I first read this sentence -- why else post
> > your message on a kernel mailing list?
>
> Good point, we meant the `__atomic' builtins from GCC and Clang. Sorry
> for the confusion.

Oh, is that it? Then I misunderstood entirely; I thought you were
talking about augmenting the set of functions or macros made available
in liburcu. I did not realize you intended to change the compilers.

> Indeed, our intent is to discuss the Userspace RCU uatomic API by extending
> the toolchain's atomic builtins and not the LKMM itself. The reason why
> we've reached out to the Linux kernel developers is because the
> original Userspace RCU uatomic API is based on the LKMM.

But why do you want to change the compilers to better support urcu?
That seems like going about things backward; wouldn't it make more
sense to change urcu to better match the facilities offered by the
current compilers?

What if everybody started to do this: modifying the compilers to better
support their pet projects? The end result would be chaos!

> > 1. I can see why you have special fences for before/after load,
> >    store, and rmw operations. But why clear? In what way is
> >    clearing an atomic variable different from storing a 0 in it?
>
> We could indeed group the clear with the store.
>
> We had two approaches in mind:
>
>   a) A before/after pair by category of operation:
>
>      - load
>      - store
>      - RMW
>
>   b) A before/after pair for every operation:
>
>      - load
>      - store
>      - exchange
>      - compare_exchange
>      - {add,sub,and,xor,or,nand}_fetch
>      - fetch_{add,sub,and,xor,or,nand}
>      - test_and_set
>      - clear
>
> If we go for the grouping in a), we have to take into account that the
> barriers emitted need to cover the worse case scenario. As an example,
> Clang can emit a store for a exchange with SEQ_CST on x86-64, if the
> returned value is not used.
>
> Therefore, for the grouping in a), all RMW would need to emit a memory
> barrier (with Clang on x86-64). But with the scheme in b), we can emit
> the barrier explicitly for the exchange operation. We however question
> the usefulness of this kind of optimization made by the compiler, since
> a user should use a store operation instead.

So in the end you settled on a compromise?

> > 2. You don't have a special fence for use after initializing an
> >    atomic. This operation can be treated specially, because at the
> >    point where an atomic is initialized, it generally has not yet
> >    been made visible to any other threads.
>
> I assume that you're referring to something like std::atomic_init from
> C++11 and deprecated in C++20? I do not see any scenario on any
> architecture where a compiler would emit an atomic operation for the
> initialization of an atomic variable. If a memory barrier is required
> in this situation, then an explicit one can be emitted using the
> existing API.
>
> In our case -- with the compiler's atomic builtins -- the initialization
> of a variable can be done without any atomic operations and does not
> require any memory barrier. This is a consequence of being capable of
> working with integral-scalar/pointer type without an atomic qualifier.
>
> > Therefore the fence which would normally appear after a store (or
> > clear) generally need not appear after an initialization, and you
> > might want to add a special API to force the generation of such a
> > fence.
>
> I am puzzled by this. Initialization of a shared variable does not need
> to be atomic until its publication. Could you expand on this?

In the kernel, I believe it sometimes happens that an atomic variable
may be published before it is initialized. (If that's wrong, Paul or
Peter can correct me.) But since this doesn't apply to the situations
you're concerned with, you can forget I mentioned it.

Alan
Re: [RFC] Bridging the gap between the Linux Kernel Memory Consistency Model (LKMM) and C11/C++11 atomics
On Tue, Jul 04, 2023 at 04:25:45PM -0400, Alan Stern wrote:
> On Tue, Jul 04, 2023 at 01:19:23PM -0400, Olivier Dion wrote:

[ . . . ]

> > I am puzzled by this. Initialization of a shared variable does not need
> > to be atomic until its publication. Could you expand on this?
>
> In the kernel, I believe it sometimes happens that an atomic variable
> may be published before it is initialized. (If that's wrong, Paul or
> Peter can correct me.) But since this doesn't apply to the situations
> you're concerned with, you can forget I mentioned it.

Both use cases exist.

A global atomic is implicitly published at compile time. If the desired
initial value is not known until multiple threads are running, then it
is necessary to be careful. Hence double-check locking and its various
replacements. (Clearly, if you can determine the initial value before
going multithreaded, life is simpler.)

And dynamically allocated or on-stack storage is the other case, where
there is a point in time when the storage is private even after
multiple threads are running.

Or am I missing the point?

							Thanx, Paul
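The "private first, published later" case can be sketched with the existing __atomic builtins as follows. This is only an illustration of the usual release/acquire publication pattern, not code from liburcu or the kernel; struct config, storage, published, publish_config() and try_get_config() are made-up names.

```c
#include <assert.h>
#include <stddef.h>

struct config {
    int threshold;
    int limit;
};

static struct config storage;

/* NULL until the configuration has been published. */
static struct config *published;

/* Writer side: fill in the object while it is still private, then
 * publish the pointer with release ordering. */
static void publish_config(int threshold, int limit)
{
    /* No other thread can see `storage` yet, so plain stores suffice. */
    storage.threshold = threshold;
    storage.limit = limit;

    /* The release store orders the plain stores above before the
     * pointer becomes visible to readers. */
    __atomic_store_n(&published, &storage, __ATOMIC_RELEASE);
}

/* Reader side: returns NULL if nothing has been published yet. The
 * acquire load pairs with the release store in publish_config(). */
static const struct config *try_get_config(void)
{
    return __atomic_load_n(&published, __ATOMIC_ACQUIRE);
}
```

A reader that gets a non-NULL pointer back is guaranteed to see the initialized fields. The other case, a global that is visible from the start and initialized late, needs more care and is not covered by this sketch.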
Re: wishlist: support for shorter pointers
On Tuesday, 04.07.2023 at 16:46 +0200, Rafał Pietrak wrote:
...
> >
> > I think a C++ class (or rather, class template) with inline functions is
> > the way to go here. gcc's optimiser will give good code, and the C++
> > class will let you get nice syntax to hide the messy details.
>
> OK. Thenx for the advice, but going into c++ is a major thing for me and
> (at least for the time being) I'll stay with ordinary "big" pointers in
> plain C instead.

Depending on what you are doing, "nice syntax" may not be worth dealing
with C++ issues. But this depends a lot on circumstances. If the space
savings are really valuable, I would personally just wrap accesses with
a macro.

> > There is no good way to do this in C. Named address spaces would be a
> > possibility, but require quite a bit of effort and change to the
> > compiler to implement, and they don't give you anything that you would
> > not get from a C++ class.
>
> Yes. named address spaces would be great. And for code, too.
>

While certainly some work, implementation effort for new kinds of named
address spaces does not seem to be terrible at first glance:

https://gcc.gnu.org/onlinedocs/gccint/target-macros/adding-support-for-named-address-spaces.html

Martin
Warning specifically for a returning noreturn
Hi all,

Currently to disable the warning that a noreturn method does return,
it's required to disable warnings entirely. This can be very
inconvenient when -Werror is enabled with a noreturn method that isn't
specifically calling something like std::abort() at the end, when one
wants all other -Wall and -Wextra warnings to be reported, for instance
in the Java HotSpot VM (which I'm currently adapting to compile with
gcc on all supported platforms). Is there a possibility we can add a
disable warning option specifically for this case? Something like
-Wno-returning-noreturn. I'm interested in adding this myself if it's
not convenient for gcc's maintainers to do so at the moment, but I'd
need some guidance on where to look and what the relevant code is.

best regards,
Julian
Re: Warning specifically for a returning noreturn
On Tue, Jul 4, 2023 at 5:54 PM Julian Waters via Gcc wrote:
>
> Hi all,
>
> Currently to disable the warning that a noreturn method does return, it's
> required to disable warnings entirely. This can be very inconvenient when
> -Werror is enabled with a noreturn method that isn't specifically calling
> something like std::abort() at the end, when one wants all other -Wall and
> -Wextra warnings to be reported, for instance in the Java HotSpot VM (which
> I'm currently adapting to compile with gcc on all supported platforms). Is
> there a possibility we can add a disable warning option specifically for
> this case? Something like -Wno-returning-noreturn. I'm interested in adding
> this myself if it's not convenient for gcc's maintainers to do so at the
> moment, but I'd need some guidance on where to look and what the relevant
> code is

You could just add
__builtin_unreachable(); (or std::unreachable(); if you are using C++23, or
unreachable() if you are using C23).
Or even add while(true) ;

I am pretty sure that not having an option is on purpose, and I am not really interested in adding one because of the above workarounds.

Thanks,
Andrew Pinski

>
> best regards,
> Julian
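The first workaround above can be sketched in C as follows. This is a minimal illustration, not HotSpot code: `jump_to_recovery`, `fatal`, `run_guarded` and the two static variables are hypothetical names, and `longjmp` stands in for whatever exit path the real function takes. The point is only that the trailing `__builtin_unreachable()` tells GCC that control never falls off the end of the noreturn function.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf recovery_point;   /* hypothetical recovery target */
static int last_fatal_code;      /* 0 means "no fatal error yet" */

/* Stand-in for an exit routine that is NOT declared noreturn, so the
 * compiler cannot rely on it never returning. */
void jump_to_recovery(int code)
{
    last_fatal_code = code;
    longjmp(recovery_point, 1);
}

/* Without the final __builtin_unreachable(), GCC may warn that this
 * noreturn function does return. */
__attribute__((noreturn))
void fatal(int code)
{
    assert(code != 0);           /* 0 is reserved in this sketch */
    jump_to_recovery(code);
    __builtin_unreachable();     /* promise: control never gets here */
}

/* Call fatal() and report which code it bailed out with. */
int run_guarded(int code)
{
    if (setjmp(recovery_point) == 0)
        fatal(code);             /* leaves via longjmp, never returns */
    return last_fatal_code;
}
```

Note that `__builtin_unreachable()` is a promise rather than a check: if control ever did reach it, the behaviour would be undefined. The `while(true) ;` variant trades that risk for a hang.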
Re: Warning specifically for a returning noreturn
Hi Andrew, thanks for the quick response,

What if the method has a return value? I know it sounds counterintuitive, but in some places HotSpot relies on the noreturn attribute being applied to methods that do return a value in an unreachable code path. Does the unreachable builtin cover that case too?

Best regards,
Julian

On Wed, Jul 5, 2023 at 9:07 AM Andrew Pinski wrote:
> On Tue, Jul 4, 2023 at 5:54 PM Julian Waters via Gcc wrote:
> >
> > Hi all,
> >
> > Currently to disable the warning that a noreturn method does return, it's
> > required to disable warnings entirely. This can be very inconvenient when
> > -Werror is enabled with a noreturn method that isn't specifically calling
> > something like std::abort() at the end, when one wants all other -Wall and
> > -Wextra warnings to be reported, for instance in the Java HotSpot VM (which
> > I'm currently adapting to compile with gcc on all supported platforms). Is
> > there a possibility we can add a disable warning option specifically for
> > this case? Something like -Wno-returning-noreturn. I'm interested in adding
> > this myself if it's not convenient for gcc's maintainers to do so at the
> > moment, but I'd need some guidance on where to look and what the relevant
> > code is
>
> You could just add
> __builtin_unreachable(); (or std::unreachable(); if you are C++23 or
> unreachable() if you are using C23).
> Or even add while(true) ;
>
> I am pretty sure not having an option is on purpose and not really
> interested in adding an option here because of the above workarounds.
>
> Thanks,
> Andrew Pinski
>
> >
> > best regards,
> > Julian
Re: Warning specifically for a returning noreturn
On Tue, Jul 4, 2023 at 6:32 PM Julian Waters wrote:
>
> Hi Andrew, thanks for the quick response,
>
> What if the method has a return value? I know it sounds counterintuitive, but
> in some places HotSpot relies on the noreturn attribute being applied to
> methods that do return a value in an unreachable code path. Does the
> unreachable builtin cover that case too?

It is wrong to use noreturn on a function whose return type is anything other than void, as documented at
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-noreturn-function-attribute :
```
It does not make sense for a noreturn function to have a return type
other than void.
```

Thanks,
Andrew Pinski

>
> best regards.
> Julian
>
> On Wed, Jul 5, 2023 at 9:07 AM Andrew Pinski wrote:
>>
>> On Tue, Jul 4, 2023 at 5:54 PM Julian Waters via Gcc wrote:
>> >
>> > Hi all,
>> >
>> > Currently to disable the warning that a noreturn method does return, it's
>> > required to disable warnings entirely. This can be very inconvenient when
>> > -Werror is enabled with a noreturn method that isn't specifically calling
>> > something like std::abort() at the end, when one wants all other -Wall and
>> > -Wextra warnings to be reported, for instance in the Java HotSpot VM (which
>> > I'm currently adapting to compile with gcc on all supported platforms). Is
>> > there a possibility we can add a disable warning option specifically for
>> > this case? Something like -Wno-returning-noreturn. I'm interested in adding
>> > this myself if it's not convenient for gcc's maintainers to do so at the
>> > moment, but I'd need some guidance on where to look and what the relevant
>> > code is
>>
>> You could just add
>> __builtin_unreachable(); (or std::unreachable(); if you are C++23 or
>> unreachable() if you are using C23).
>> Or even add while(true) ;
>>
>> I am pretty sure not having an option is on purpose and not really
>> interested in adding an option here because of the above workarounds.
>>
>> Thanks,
>> Andrew Pinski
>>
>> >
>> > best regards,
>> > Julian
Re: wishlist: support for shorter pointers
Hi,

On 5.07.2023 at 00:57, Martin Uecker wrote:
> On Tuesday, 04.07.2023 at 16:46 +0200, Rafał Pietrak wrote:...
[]
> > Yes. named address spaces would be great. And for code, too.
>
> While certainly some work, implementation effort for new kinds of named
> address spaces does not seem to be terrible at first glance:
>
> https://gcc.gnu.org/onlinedocs/gccint/target-macros/adding-support-for-named-address-spaces.html

Oh! I see. This is good news. Although that internals documentation is complete black magic to me and I cannot make heads or tails of it, the surrounding comments sound promising... as if GCC 13 already had the internal "machinery" supporting named address spaces and only the CPU/platform-specific code were missing (for all but the "SPU port"). Is that right?

And if so... there is no mention of how this shows up for a "simple user" of GCC (as opposed to the creators of a particular GCC port, who use that "machinery"). In other words: what should the sources look like for the compiler to do "the thing"?

-R
Re: Warning specifically for a returning noreturn
On 2023/7/5 09:40, Andrew Pinski via Gcc wrote:
> It is wrong to use noreturn on a function other than one which has a
> return type of void as documented.
> https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-noreturn-function-attribute :
> ```
> It does not make sense for a noreturn function to have a return type
> other than void.
> ```

It makes sense for a callback or virtual function which requires a non-void return type, e.g.

```
#include <stdexcept>   // std::invalid_argument
#include <streambuf>   // std::streambuf

class my_stream : public std::streambuf
  {
  protected:
    // The base class fixes the return type; it cannot be changed to void.
    virtual int underflow() override;
  };

__attribute__((__noreturn__))  // [[noreturn]] won't work
int
my_stream::
underflow()
  {
    throw std::invalid_argument("not implemented");
  }
```

--
Best regards,
LIU Hao
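For the callback half of that argument, a plain C analogue might look like the sketch below. It rests on assumptions that the thread does not confirm: the names `read_fn`, `report_unsupported`, `read_not_implemented`, `read_five` and `try_read` are hypothetical, `longjmp` stands in for throwing, and the expectation that `__builtin_unreachable()` keeps GCC quiet in a non-void noreturn function is an inference from how the builtin works in the void case. The GCC documentation quoted above still advises against this combination.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

typedef int (*read_fn)(void *ctx);   /* hypothetical callback signature */

static jmp_buf unsupported_handler;  /* hypothetical bail-out target */

/* Stand-in for an error routine that is not declared noreturn. */
void report_unsupported(void)
{
    longjmp(unsupported_handler, 1);
}

/* The callback type forces an int return type even though this
 * implementation never returns. */
__attribute__((noreturn))
int read_not_implemented(void *ctx)
{
    (void)ctx;
    report_unsupported();
    __builtin_unreachable();         /* no return statement needed */
}

/* An ordinary callback for comparison. */
int read_five(void *ctx)
{
    (void)ctx;
    return 5;
}

/* Invoke a callback; yield -1 if it bailed out instead of returning. */
int try_read(read_fn fn)
{
    assert(fn != NULL);
    if (setjmp(unsupported_handler) != 0)
        return -1;
    return fn(NULL);
}
```

Here `try_read(read_five)` yields the callback's value, while `try_read(read_not_implemented)` takes the bail-out path. The caller sees the same `int`-returning signature in both cases.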