Implementing atomic load as compare-and-swap for read-only memory
Hi all, expand_atomic_load in optabs.c tries to expand a wide atomic load using an atomic_compare_and_swap with the comment saying that sometimes a redundant harmless store may be performed. Is the store really valid if the memory is read-only? I've been looking at implementing a similar compare-and-swap strategy for atomic_loaddi for some arm targets and this concern came up. I don't think GCC can statically prove that a particular piece of memory is guaranteed to be writeable at runtime in all cases, so emitting a spurious store would not be always valid. I see this concern was already raised in https://gcc.gnu.org/ml/gcc-patches/2011-11/msg00278.html but that doesn't seem to have gone anywhere. Any thoughts? Should we remove the assumption that atomic loads always access writeable memory? Thanks, Kyrill
Re: Implementing atomic load as compare-and-swap for read-only memory
On Fri, Jun 03, 2016 at 10:34:15AM +0100, Kyrill Tkachov wrote: > expand_atomic_load in optabs.c tries to expand a wide atomic load using an > atomic_compare_and_swap > with the comment saying that sometimes a redundant harmless store may be > performed. > Is the store really valid if the memory is read-only? > > I've been looking at implementing a similar compare-and-swap strategy for > atomic_loaddi for some > arm targets and this concern came up. I don't think GCC can statically prove > that a particular > piece of memory is guaranteed to be writeable at runtime in all cases, so > emitting a spurious > store would not be always valid. > > I see this concern was already raised in > https://gcc.gnu.org/ml/gcc-patches/2011-11/msg00278.html > but that doesn't seem to have gone anywhere. > > Any thoughts? Should we remove the assumption that atomic loads always access > writeable memory? I guess it is a tough decision. If you don't have HW instruction to read say double word aligned integer atomically, if you don't implement atomic load on it through compare and swap (which indeed won't work with read-only memory), then you probably need to fall back to locks in libatomic. But doesn't that mean you should fall back to locked operation also for any other atomic operation on such types, because otherwise if you atomic_store or any other kind of atomic operation, it wouldn't use the locking, while for atomic load it would? That would be an ABI change and quite significant pessimization in many cases. Jakub
GCC 5.5 Status Report (2016-06-03)
Status == The GCC 5 branch is open again for regression and documentation fixes. Quality Data Priority # Change from last report --- --- P10 P2 138 - 9 P3 20 + 7 P4 76 - 4 P5 28 --- --- Total P1-P3 158 - 5 Total 262 - 9 Previous Report === https://gcc.gnu.org/ml/gcc/2016-05/msg00164.html
Re: Implementing atomic load as compare-and-swap for read-only memory
On Fri, 2016-06-03 at 12:03 +0200, Jakub Jelinek wrote: > On Fri, Jun 03, 2016 at 10:34:15AM +0100, Kyrill Tkachov wrote: > > expand_atomic_load in optabs.c tries to expand a wide atomic load using an > > atomic_compare_and_swap > > with the comment saying that sometimes a redundant harmless store may be > > performed. > > Is the store really valid if the memory is read-only? > > > > I've been looking at implementing a similar compare-and-swap strategy for > > atomic_loaddi for some > > arm targets and this concern came up. I don't think GCC can statically > > prove that a particular > > piece of memory is guaranteed to be writeable at runtime in all cases, so > > emitting a spurious > > store would not be always valid. > > > > I see this concern was already raised in > > https://gcc.gnu.org/ml/gcc-patches/2011-11/msg00278.html > > but that doesn't seem to have gone anywhere. > > > > Any thoughts? Should we remove the assumption that atomic loads always > > access writeable memory? I think we shouldn't store to memory that we cannot be certain is writable. > I guess it is a tough decision. If you don't have HW instruction to read > say double word aligned integer atomically, if you don't implement atomic > load on it through compare and swap (which indeed won't work with read-only > memory), then you probably need to fall back to locks in libatomic. And that would be fine, IMO. If you can't even load atomically, doing something useful with this type will be hard except in special cases. Also, doing a CAS (compare-and-swap) and thus potentially bringing in the cache line in exclusive mode can be a lot more costly than what users might expect from a load. A short critical section might not be much slower. If you only have a CAS as base of the atomic operations on a type, then a CAS operation exposed to the user will still be a just a single HW CAS. But any other operation besides the CAS and a load will need *two* CAS operations; even an atomic store has to be implemented as a CAS loop. > But doesn't that mean you should fall back to locked operation also for any > other atomic operation on such types, because otherwise if you atomic_store > or any other kind of atomic operation, it wouldn't use the locking, while > for atomic load it would? I suppose you mean that one must fall back to using locking for all operations? If load isn't atomic, then it can't be made atomic using external locks if the other operations don't use the locks. > That would be an ABI change and quite significant > pessimization in many cases. A change from wide CAS to locking would be an ABI change I suppose, but it could also be considered a necessary bugfix if we don't want to write to read-only memory. Does this affect anything but i686?
Re: Implementing atomic load as compare-and-swap for read-only memory
On Fri, Jun 03, 2016 at 02:26:09PM +0200, Torvald Riegel wrote: > And that would be fine, IMO. If you can't even load atomically, doing > something useful with this type will be hard except in special cases. > Also, doing a CAS (compare-and-swap) and thus potentially bringing in > the cache line in exclusive mode can be a lot more costly than what > users might expect from a load. A short critical section might not be > much slower. > > If you only have a CAS as base of the atomic operations on a type, then > a CAS operation exposed to the user will still be a just a single HW > CAS. But any other operation besides the CAS and a load will need *two* > CAS operations; even an atomic store has to be implemented as a CAS > loop. Would we just stop expanding all those __atomic_*/__sync_* builtins inline then (which would IMHO break tons of stuff), or just some predicate that atomic.h/atomic headers use? > > But doesn't that mean you should fall back to locked operation also for any > > other atomic operation on such types, because otherwise if you atomic_store > > or any other kind of atomic operation, it wouldn't use the locking, while > > for atomic load it would? > > I suppose you mean that one must fall back to using locking for all > operations? If load isn't atomic, then it can't be made atomic using > external locks if the other operations don't use the locks. > > > That would be an ABI change and quite significant > > pessimization in many cases. > > A change from wide CAS to locking would be an ABI change I suppose, but > it could also be considered a necessary bugfix if we don't want to write > to read-only memory. Does this affect anything but i686? Also x86_64 (for 128-bit atomics), clearly also either arm or aarch64 (judging from who initiated this thread), I bet there are many others. Jakub
Re: Implementing atomic load as compare-and-swap for read-only memory
Hi Jakub, Torvald, On 03/06/16 13:32, Jakub Jelinek wrote: On Fri, Jun 03, 2016 at 02:26:09PM +0200, Torvald Riegel wrote: And that would be fine, IMO. If you can't even load atomically, doing something useful with this type will be hard except in special cases. Also, doing a CAS (compare-and-swap) and thus potentially bringing in the cache line in exclusive mode can be a lot more costly than what users might expect from a load. A short critical section might not be much slower. If you only have a CAS as base of the atomic operations on a type, then a CAS operation exposed to the user will still be a just a single HW CAS. But any other operation besides the CAS and a load will need *two* CAS operations; even an atomic store has to be implemented as a CAS loop. Would we just stop expanding all those __atomic_*/__sync_* builtins inline then (which would IMHO break tons of stuff), or just some predicate that atomic.h/atomic headers use? But doesn't that mean you should fall back to locked operation also for any other atomic operation on such types, because otherwise if you atomic_store or any other kind of atomic operation, it wouldn't use the locking, while for atomic load it would? I suppose you mean that one must fall back to using locking for all operations? If load isn't atomic, then it can't be made atomic using external locks if the other operations don't use the locks. That would be an ABI change and quite significant pessimization in many cases. A change from wide CAS to locking would be an ABI change I suppose, but it could also be considered a necessary bugfix if we don't want to write to read-only memory. Does this affect anything but i686? Also x86_64 (for 128-bit atomics), clearly also either arm or aarch64 (judging from who initiated this thread), I bet there are many others. I'm looking at pre-LPAE ARMv7-A targets for which the ARM Architecture Reference Manual (rev C.c) section A3.5.3 recommends: "The way to atomically load two 32-bit quantities is to perform a LDREXD/STREXD sequence, reading and writing the same value, for which the STREXD succeeds, and use the read values." Currently we emit just a single load-doubleword-exclusive which, according to the above, would not be enough on such targets. On aarch64 doubleword (128 bit) atomic loads are done through locks (PR 70814). Kyrill Jakub
RE: (R5900) Implementing Vector Support
Woon yung Liu writes: > On Wednesday, June 1, 2016 5:45 AM, Richard Henderson > wrote: > > This is almost always incorrect, and certainly before reload. > > You need to use gen_lowpart. There are examples in the code > > > fragments that I sent the other week. > > The problem is that gen_lowpart() doesn't seem to support casting to > other modes of the same size. > When I use it, the assert within gen_lowpart_general() will fail due to > gen_lowpart_common() rejecting the operation (new mode size is not > smaller than the old). The conclusion we came to when developing MSA is that simplify_gen_subreg is the way to go for converting between vector modes: simplify_gen_subreg (new_mode, rtx, old_mode, 0) I'm not sure there is much need to change modes after reload so do it upfront at expand time or when splitting and you should be OK. See trunk mips.c for a number of uses of this when converting vector modes. > > You need to read the gcc internals documentation. They are all three > different > > > uses, though there is some overlap between define_insn and > define_expand. > > I actually read the GCC internals documentation before I even begun any > attempt at this, but there was still a lot that I did not successfully > grasp. > > I'll go with define_expand then. define_expand only provides a way to generate an instruction or sequence of instructions to achieve the overall goal. You must also have define_insn definitions for any pattern you emit or the generated code will fail to match. A define_insn_and_split is just shorthand for a define_insn where one or more output patterns are '#' (split) and you want to define the split alongside the instruction rather than as a separate define_split. As far as I understand the difference is syntactic sugar. > But I am already doubting that I will complete this port as I can no > longer see a favourable conclusion. It may take time but I'm sure we can help talk through the problems. As a new GCC developer you are a welcome addition to the community. Thanks, Matthew
Re: Implementing atomic load as compare-and-swap for read-only memory
On Fri, 2016-06-03 at 14:32 +0200, Jakub Jelinek wrote: > On Fri, Jun 03, 2016 at 02:26:09PM +0200, Torvald Riegel wrote: > > And that would be fine, IMO. If you can't even load atomically, doing > > something useful with this type will be hard except in special cases. > > Also, doing a CAS (compare-and-swap) and thus potentially bringing in > > the cache line in exclusive mode can be a lot more costly than what > > users might expect from a load. A short critical section might not be > > much slower. > > > > If you only have a CAS as base of the atomic operations on a type, then > > a CAS operation exposed to the user will still be a just a single HW > > CAS. But any other operation besides the CAS and a load will need *two* > > CAS operations; even an atomic store has to be implemented as a CAS > > loop. > > Would we just stop expanding all those __atomic_*/__sync_* builtins inline > then (which would IMHO break tons of stuff), I agree that switching from atomic HW instructions to lock-based atomics or vice versa is an ABI break. I'm not quite sure whether to be more concerned about this or about storing to read-only memory. Using read-only memory is only meaningful if the pages are also mapped by another process that has write access to them, but maybe that's more common than we'd think? OTOH, having that and having use of wide atomics might be rare? > or just some predicate that > atomic.h/atomic headers use? Not sure which predicate you mean precisely, but if it's about not advertising operations as lock-free that use weird things like CAS for loads, then I'd probably agree. I believe that many users understand "lock-free" to also imply that they are natively supported by the hardware (and also efficiently, not generating unncessary contention), so not advertising load-based-on-CAS as lock-free might be helpful to them.
Re: Implementing atomic load as compare-and-swap for read-only memory
On Fri, 2016-06-03 at 13:46 +0100, Kyrill Tkachov wrote: > Hi Jakub, Torvald, > > On 03/06/16 13:32, Jakub Jelinek wrote: > > On Fri, Jun 03, 2016 at 02:26:09PM +0200, Torvald Riegel wrote: > >> And that would be fine, IMO. If you can't even load atomically, doing > >> something useful with this type will be hard except in special cases. > >> Also, doing a CAS (compare-and-swap) and thus potentially bringing in > >> the cache line in exclusive mode can be a lot more costly than what > >> users might expect from a load. A short critical section might not be > >> much slower. > >> > >> If you only have a CAS as base of the atomic operations on a type, then > >> a CAS operation exposed to the user will still be a just a single HW > >> CAS. But any other operation besides the CAS and a load will need *two* > >> CAS operations; even an atomic store has to be implemented as a CAS > >> loop. > > Would we just stop expanding all those __atomic_*/__sync_* builtins inline > > then (which would IMHO break tons of stuff), or just some predicate that > > atomic.h/atomic headers use? > > > >>> But doesn't that mean you should fall back to locked operation also for > >>> any > >>> other atomic operation on such types, because otherwise if you > >>> atomic_store > >>> or any other kind of atomic operation, it wouldn't use the locking, while > >>> for atomic load it would? > >> I suppose you mean that one must fall back to using locking for all > >> operations? If load isn't atomic, then it can't be made atomic using > >> external locks if the other operations don't use the locks. > >> > >>> That would be an ABI change and quite significant > >>> pessimization in many cases. > >> A change from wide CAS to locking would be an ABI change I suppose, but > >> it could also be considered a necessary bugfix if we don't want to write > >> to read-only memory. Does this affect anything but i686? > > Also x86_64 (for 128-bit atomics), clearly also either arm or aarch64 > > (judging from who initiated this thread), I bet there are many others. > > I'm looking at pre-LPAE ARMv7-A targets for which the > ARM Architecture Reference Manual (rev C.c) section A3.5.3 recommends: > "The way to atomically load two 32-bit quantities is to perform a > LDREXD/STREXD sequence, reading and writing the same value, for which the > STREXD succeeds, and use the read values." > > Currently we emit just a single load-doubleword-exclusive which, according to > the above, > would not be enough on such targets. If the store is idempotent, will the hardware actually execute a store that could conflict with a read-only memory mapping? In any case, I assume they would have to at least enforce the ordering constraints that might be caused with such a store, because a program could use an idempotent intentially just to get ordering. A similar question applies to archs with CAS instructions, but I'd assume they'd always create a visible store. I'm not quite sure whether current HW would also always put the cacheline into exclusive mode even when the CAS would fail because the expected value doesn't match the actual one. Do the arch maintainers know something about this? > On aarch64 doubleword (128 bit) atomic loads are done through locks (PR > 70814). Note that the bug reporter also states that llvm recently changed to calling libatomic functions.
GCC 5.4 Released
The GNU Compiler Collection version 5.4 has been released. GCC 5.4 is a bug-fix release from the GCC 5 branch containing important fixes for regressions and serious bugs in GCC 5.3 with more than 147 bugs fixed since the previous release. This release is available from the FTP servers listed at: http://www.gnu.org/order/ftp.html Please do not contact me directly regarding questions or comments about this release. Instead, use the resources available from http://gcc.gnu.org. As always, a vast number of people contributed to this GCC release -- far too many to thank them individually!
Re: Implementing atomic load as compare-and-swap for read-only memory
On 06/03/2016 05:32 AM, Jakub Jelinek wrote: A change from wide CAS to locking would be an ABI change I suppose, but it could also be considered a necessary bugfix if we don't want to write to read-only memory. Does this affect anything but i686? Also x86_64 (for 128-bit atomics), clearly also either arm or aarch64 (judging from who initiated this thread), I bet there are many others. Both arm and aarch64 are not affected. Both use the double-word load-locked instruction for the atomic load; they simply don't pair it with a store-conditional in that case. There's nothing to be done about <= i686, but for recent-ish cpus we should be able to use either sse or fpu loads. While it's not promised in the spec, I would suggest that any 64-bit capable cpu will in practice perform all aligned 64-bit loads atomicly. There's nothing that can be done for x86_64 128-bit load. There are probably some implementations for which an aligned 128-bit sse load is atomic, but we also know that there are some implementations for which it is not (those that split sse instructions into 2 64-bit micro-ops, e.g. some AMD and all Atoms). It's an oversight that it would be nice for Intel+AMD to fix for us... r~
Re: (R5900) Implementing Vector Support
On 06/03/2016 05:54 AM, Woon yung Liu wrote: The problem is that gen_lowpart() doesn't seem to support casting to other modes of the same size. It certainly does. The only place you get into trouble with gen_lowpart is with CONST_INT, which is mode-less. But I am already doubting that I will complete this port as I can no longer see a favourable conclusion. I would suggest that you post actual code to the list so that people can help you. Simply answering questions in the abstract, as I have been doing, can only go so far. r~
April/May 2016 GNU Toolchain Update
Hi Guys, Well now that GCC 6 is out lets see what new features have started to appear in the toolchain: Several new warning options have been added to GCC: * The option -Wno-duplicate-decl-specifier has been added to generate warnings whenever a declaration contains duplicate const, volatile, restrict or _Atomic specifiers. This warning is enabled by -Wall. * The option -Wignored-attributes warns when an attribute is correctly assigned, but the compiler decided to ignore it anyway. This is different from the -Wattributes which warns when an attribute is either unknown or used in the wrong place. * The option -Wswitch-unreachable warns whenever a switch statement contains statements between the controlling expression and the first case label, which will never be executed. For example: switch (cond) { i = 15; case 5: The option does not warn if the statement(s) between the controlling expression and the first case label are just variable declarations: switch (cond) { int i; case 5: * The option -Wdangling-else warns about constructions where there may be confusion to which if statement an else branch belongs. For example: if (a) if (b) foo (); else bar (); * The option -Wmemset-elt-size warns about suspicious calls to the memset function, if the first argument references an array, and the third argument is a number equal to the number of elements, but not equal to the size of the array in memory. For example: int array[10]; memset (array, 0, 10); // Should be: memset (array, 0, 10 * sizeof (int)); A new point release of GDB is out: 7.11.1. This is a bugfix release addressing these issues: * PR remote/19863 (7.10 regression: gdb remote.c due to "setfs" with gdbserver < 7.7) * PR gdb/19829 (gdb crashes with PT and reverse next) * PR gdb/19676 (gdb fails with assert error if /proc is not mounted) * PR gdb/19828 (7.11 regression: non-stop gdb -p : internal error) * PR remote/19840 (gdb crashes on reverse-stepi) * PR gdb/19858 (GDB doesn't register the JIT libraries on attach) * PR gdb/19958 (Breakpoints/watchpoints broken on MIPS Linux <= 4.5) * PR build/20029 (symfile.c ambiguous else warning) * PR python/20037 (GDB use-after-free error) * PR gdb/20039 (Using MI/all-stop, can't interrupt multi-threaded program after continue)a In the development GDB sources a couple of new features have been added: * Fortran: Support structures with fields of dynamic types and arrays of dynamic types. * Rust language support. GDB now supports debugging programs written in the Rust programming language. Development in the binutils has mostly concentrated on bugfixing, but there have been a few new features added: * The ARM port of GAS now has support for the ARMv8-M architecture, including the security and DSP extensions. * The ARC port of GAS now accepts .extInstruction, .extCondCode, .extAuxRegister, and .extCoreRegister pseudo-ops that allow an user to define custom instructions, conditional codes, auxiliary and core registers. * The MIPS port of GAS can now generate code for the DSP Release 3 Application Specific Extension. * Linker scripts can now use a NOCROSSREFSTO directive. This is like the NOCROSSREFS directive which ensures that two or more output sections are entirely independent from each other, except that it does allow one way referencing. The NOCROSSREFS_TO directive takes a list of output section names and complains if the first section is referenced from any of the other sections. Cheers Nick
An issue with GCC 6.1.0's make install?
Hello all, Yesterday I managed to successfully build GCC and all of the accompanying languages that it supports by default (Ada, C, C++, Fortran, Go, Java, Objective-C, Objective-C++, and Link-time Optimization (LTO)). I did not build JIT support because I have not herd if it is stable or not. Anyways, seeing as I didn't (and still do not) want to wait another 12 hours for that to build, I compressed it into a .tar.bz2 archive, copied it over to another server, decompressed it, and here's when the problems start. Keep in mind that I did ensure that all files were compressed and extracted. When I go into my build subdirectory build tree, and type "make install -s", it installs gnat, gcc (and g++), gfortran, gccgo, and gcj, but it errors out (and, subsequently, bales out) and says the following: Making install in tools make[3]: *** [install-recursive] Error 1 make[2]: *** [install-recursive] Error 1 make[1]: *** [install-target-libjava] Error 2 make: *** [install] Error 2 And then: $ gcj gcj: error: libgcj.spec: No such file or directory I'm considering the test suite, but until it installs, I'm not sure if executing the test suite would be very wise at this point. To get it to say that no input file was specified, I have to manually run the following commands: $ cd x86_64-pc-linux-gnu/libjava $ cp libgcj.spec /usr/bin Has the transportation of the source code caused the build tree to be messed up? I know that it works perfectly fine on my other server. Running make install without the -s command line parameter yields nothing. Have I done something wrong? -- Signed, Ethin D. Probst