Implementing atomic load as compare-and-swap for read-only memory

2016-06-03 Thread Kyrill Tkachov

Hi all,

expand_atomic_load in optabs.c tries to expand a wide atomic load using an 
atomic_compare_and_swap
with the comment saying that sometimes a redundant harmless store may be 
performed.
Is the store really valid if the memory is read-only?

I've been looking at implementing a similar compare-and-swap strategy for 
atomic_loaddi for some
arm targets and this concern came up. I don't think GCC can statically prove 
that a particular
piece of memory is guaranteed to be writeable at runtime in all cases, so 
emitting a spurious
store would not be always valid.

I see this concern was already raised in 
https://gcc.gnu.org/ml/gcc-patches/2011-11/msg00278.html
but that doesn't seem to have gone anywhere.

Any thoughts? Should we remove the assumption that atomic loads always access 
writeable memory?

Thanks,
Kyrill


Re: Implementing atomic load as compare-and-swap for read-only memory

2016-06-03 Thread Jakub Jelinek
On Fri, Jun 03, 2016 at 10:34:15AM +0100, Kyrill Tkachov wrote:
> expand_atomic_load in optabs.c tries to expand a wide atomic load using an 
> atomic_compare_and_swap
> with the comment saying that sometimes a redundant harmless store may be 
> performed.
> Is the store really valid if the memory is read-only?
> 
> I've been looking at implementing a similar compare-and-swap strategy for 
> atomic_loaddi for some
> arm targets and this concern came up. I don't think GCC can statically prove 
> that a particular
> piece of memory is guaranteed to be writeable at runtime in all cases, so 
> emitting a spurious
> store would not be always valid.
> 
> I see this concern was already raised in 
> https://gcc.gnu.org/ml/gcc-patches/2011-11/msg00278.html
> but that doesn't seem to have gone anywhere.
> 
> Any thoughts? Should we remove the assumption that atomic loads always access 
> writeable memory?

I guess it is a tough decision.  If you don't have HW instruction to read
say double word aligned integer atomically, if you don't implement atomic
load on it through compare and swap (which indeed won't work with read-only
memory), then you probably need to fall back to locks in libatomic.
But doesn't that mean you should fall back to locked operation also for any
other atomic operation on such types, because otherwise if you atomic_store
or any other kind of atomic operation, it wouldn't use the locking, while
for atomic load it would?  That would be an ABI change and quite significant
pessimization in many cases.

Jakub


GCC 5.5 Status Report (2016-06-03)

2016-06-03 Thread Richard Biener

Status
==

The GCC 5 branch is open again for regression and documentation fixes.


Quality Data


Priority  #   Change from last report
---   ---
P10
P2  138   -   9
P3   20   +   7
P4   76   -   4
P5   28
---   ---
Total P1-P3 158   -   5
Total   262   -   9


Previous Report
===

https://gcc.gnu.org/ml/gcc/2016-05/msg00164.html


Re: Implementing atomic load as compare-and-swap for read-only memory

2016-06-03 Thread Torvald Riegel
On Fri, 2016-06-03 at 12:03 +0200, Jakub Jelinek wrote:
> On Fri, Jun 03, 2016 at 10:34:15AM +0100, Kyrill Tkachov wrote:
> > expand_atomic_load in optabs.c tries to expand a wide atomic load using an 
> > atomic_compare_and_swap
> > with the comment saying that sometimes a redundant harmless store may be 
> > performed.
> > Is the store really valid if the memory is read-only?
> > 
> > I've been looking at implementing a similar compare-and-swap strategy for 
> > atomic_loaddi for some
> > arm targets and this concern came up. I don't think GCC can statically 
> > prove that a particular
> > piece of memory is guaranteed to be writeable at runtime in all cases, so 
> > emitting a spurious
> > store would not be always valid.
> > 
> > I see this concern was already raised in 
> > https://gcc.gnu.org/ml/gcc-patches/2011-11/msg00278.html
> > but that doesn't seem to have gone anywhere.
> > 
> > Any thoughts? Should we remove the assumption that atomic loads always 
> > access writeable memory?

I think we shouldn't store to memory that we cannot be certain is
writable.

> I guess it is a tough decision.  If you don't have HW instruction to read
> say double word aligned integer atomically, if you don't implement atomic
> load on it through compare and swap (which indeed won't work with read-only
> memory), then you probably need to fall back to locks in libatomic.

And that would be fine, IMO.  If you can't even load atomically, doing
something useful with this type will be hard except in special cases.
Also, doing a CAS (compare-and-swap) and thus potentially bringing in
the cache line in exclusive mode can be a lot more costly than what
users might expect from a load.  A short critical section might not be
much slower.

If you only have a CAS as base of the atomic operations on a type, then
a CAS operation exposed to the user will still be a just a single HW
CAS.  But any other operation besides the CAS and a load will need *two*
CAS operations; even an atomic store has to be implemented as a CAS
loop.

> But doesn't that mean you should fall back to locked operation also for any
> other atomic operation on such types, because otherwise if you atomic_store
> or any other kind of atomic operation, it wouldn't use the locking, while
> for atomic load it would?

I suppose you mean that one must fall back to using locking for all
operations?  If load isn't atomic, then it can't be made atomic using
external locks if the other operations don't use the locks.

> That would be an ABI change and quite significant
> pessimization in many cases.

A change from wide CAS to locking would be an ABI change I suppose, but
it could also be considered a necessary bugfix if we don't want to write
to read-only memory.  Does this affect anything but i686?




Re: Implementing atomic load as compare-and-swap for read-only memory

2016-06-03 Thread Jakub Jelinek
On Fri, Jun 03, 2016 at 02:26:09PM +0200, Torvald Riegel wrote:
> And that would be fine, IMO.  If you can't even load atomically, doing
> something useful with this type will be hard except in special cases.
> Also, doing a CAS (compare-and-swap) and thus potentially bringing in
> the cache line in exclusive mode can be a lot more costly than what
> users might expect from a load.  A short critical section might not be
> much slower.
> 
> If you only have a CAS as base of the atomic operations on a type, then
> a CAS operation exposed to the user will still be a just a single HW
> CAS.  But any other operation besides the CAS and a load will need *two*
> CAS operations; even an atomic store has to be implemented as a CAS
> loop.

Would we just stop expanding all those __atomic_*/__sync_* builtins inline
then (which would IMHO break tons of stuff), or just some predicate that
atomic.h/atomic headers use?

> > But doesn't that mean you should fall back to locked operation also for any
> > other atomic operation on such types, because otherwise if you atomic_store
> > or any other kind of atomic operation, it wouldn't use the locking, while
> > for atomic load it would?
> 
> I suppose you mean that one must fall back to using locking for all
> operations?  If load isn't atomic, then it can't be made atomic using
> external locks if the other operations don't use the locks.
> 
> > That would be an ABI change and quite significant
> > pessimization in many cases.
> 
> A change from wide CAS to locking would be an ABI change I suppose, but
> it could also be considered a necessary bugfix if we don't want to write
> to read-only memory.  Does this affect anything but i686?

Also x86_64 (for 128-bit atomics), clearly also either arm or aarch64
(judging from who initiated this thread), I bet there are many others.

Jakub


Re: Implementing atomic load as compare-and-swap for read-only memory

2016-06-03 Thread Kyrill Tkachov

Hi Jakub, Torvald,

On 03/06/16 13:32, Jakub Jelinek wrote:

On Fri, Jun 03, 2016 at 02:26:09PM +0200, Torvald Riegel wrote:

And that would be fine, IMO.  If you can't even load atomically, doing
something useful with this type will be hard except in special cases.
Also, doing a CAS (compare-and-swap) and thus potentially bringing in
the cache line in exclusive mode can be a lot more costly than what
users might expect from a load.  A short critical section might not be
much slower.

If you only have a CAS as base of the atomic operations on a type, then
a CAS operation exposed to the user will still be a just a single HW
CAS.  But any other operation besides the CAS and a load will need *two*
CAS operations; even an atomic store has to be implemented as a CAS
loop.

Would we just stop expanding all those __atomic_*/__sync_* builtins inline
then (which would IMHO break tons of stuff), or just some predicate that
atomic.h/atomic headers use?


But doesn't that mean you should fall back to locked operation also for any
other atomic operation on such types, because otherwise if you atomic_store
or any other kind of atomic operation, it wouldn't use the locking, while
for atomic load it would?

I suppose you mean that one must fall back to using locking for all
operations?  If load isn't atomic, then it can't be made atomic using
external locks if the other operations don't use the locks.


That would be an ABI change and quite significant
pessimization in many cases.

A change from wide CAS to locking would be an ABI change I suppose, but
it could also be considered a necessary bugfix if we don't want to write
to read-only memory.  Does this affect anything but i686?

Also x86_64 (for 128-bit atomics), clearly also either arm or aarch64
(judging from who initiated this thread), I bet there are many others.


I'm looking at pre-LPAE ARMv7-A targets for which the
ARM Architecture Reference Manual (rev C.c) section A3.5.3 recommends:
"The way to atomically load two 32-bit quantities is to perform a
LDREXD/STREXD sequence, reading and writing the same value, for which the
STREXD succeeds, and use the read values."

Currently we emit just a single load-doubleword-exclusive which, according to 
the above,
would not be enough on such targets.

On aarch64 doubleword (128 bit) atomic loads are done through locks (PR 70814).

Kyrill


Jakub




RE: (R5900) Implementing Vector Support

2016-06-03 Thread Matthew Fortune
Woon yung Liu  writes:
> On Wednesday, June 1, 2016 5:45 AM, Richard Henderson 
> wrote:
> > This is almost always incorrect, and certainly before reload.
> > You need to use gen_lowpart.  There are examples in the code
> 
> > fragments that I sent the other week.
> 
> The problem is that gen_lowpart() doesn't seem to support casting to
> other modes of the same size.
> When I use it, the assert within gen_lowpart_general() will fail due to
> gen_lowpart_common() rejecting the operation (new mode size is not
> smaller than the old).

The conclusion we came to when developing MSA is that simplify_gen_subreg
is the way to go for converting between vector modes:

simplify_gen_subreg (new_mode, rtx, old_mode, 0)

I'm not sure there is much need to change modes after reload so do it
upfront at expand time or when splitting and you should be OK.

See trunk mips.c for a number of uses of this when converting vector modes.
 
> > You need to read the gcc internals documentation.  They are all three
> different
> 
> > uses, though there is some overlap between define_insn and
> define_expand.
> 
> I actually read the GCC internals documentation before I even begun any
> attempt at this, but there was still a lot that I did not successfully
> grasp.
> 
> I'll go with define_expand then.

define_expand only provides a way to generate an instruction or sequence
of instructions to achieve the overall goal. You must also have
define_insn definitions for any pattern you emit or the generated code
will fail to match.

A define_insn_and_split is just shorthand for a define_insn where one
or more output patterns are '#' (split) and you want to define the
split alongside the instruction rather than as a separate define_split.
As far as I understand the difference is syntactic sugar.

> But I am already doubting that I will complete this port as I can no
> longer see a favourable conclusion.

It may take time but I'm sure we can help talk through the problems. As
a new GCC developer you are a welcome addition to the community.

Thanks,
Matthew



Re: Implementing atomic load as compare-and-swap for read-only memory

2016-06-03 Thread Torvald Riegel
On Fri, 2016-06-03 at 14:32 +0200, Jakub Jelinek wrote:
> On Fri, Jun 03, 2016 at 02:26:09PM +0200, Torvald Riegel wrote:
> > And that would be fine, IMO.  If you can't even load atomically, doing
> > something useful with this type will be hard except in special cases.
> > Also, doing a CAS (compare-and-swap) and thus potentially bringing in
> > the cache line in exclusive mode can be a lot more costly than what
> > users might expect from a load.  A short critical section might not be
> > much slower.
> > 
> > If you only have a CAS as base of the atomic operations on a type, then
> > a CAS operation exposed to the user will still be a just a single HW
> > CAS.  But any other operation besides the CAS and a load will need *two*
> > CAS operations; even an atomic store has to be implemented as a CAS
> > loop.
> 
> Would we just stop expanding all those __atomic_*/__sync_* builtins inline
> then (which would IMHO break tons of stuff),

I agree that switching from atomic HW instructions to lock-based atomics
or vice versa is an ABI break.  I'm not quite sure whether to be more
concerned about this or about storing to read-only memory.  Using
read-only memory is only meaningful if the pages are also mapped by
another process that has write access to them, but maybe that's more
common than we'd think?  OTOH, having that and having use of wide
atomics might be rare?

> or just some predicate that
> atomic.h/atomic headers use?

Not sure which predicate you mean precisely, but if it's about not
advertising operations as lock-free that use weird things like CAS for
loads, then I'd probably agree.  I believe that many users understand
"lock-free" to also imply that they are natively supported by the
hardware (and also efficiently, not generating unncessary contention),
so not advertising load-based-on-CAS as lock-free might be helpful to
them.



Re: Implementing atomic load as compare-and-swap for read-only memory

2016-06-03 Thread Torvald Riegel
On Fri, 2016-06-03 at 13:46 +0100, Kyrill Tkachov wrote:
> Hi Jakub, Torvald,
> 
> On 03/06/16 13:32, Jakub Jelinek wrote:
> > On Fri, Jun 03, 2016 at 02:26:09PM +0200, Torvald Riegel wrote:
> >> And that would be fine, IMO.  If you can't even load atomically, doing
> >> something useful with this type will be hard except in special cases.
> >> Also, doing a CAS (compare-and-swap) and thus potentially bringing in
> >> the cache line in exclusive mode can be a lot more costly than what
> >> users might expect from a load.  A short critical section might not be
> >> much slower.
> >>
> >> If you only have a CAS as base of the atomic operations on a type, then
> >> a CAS operation exposed to the user will still be a just a single HW
> >> CAS.  But any other operation besides the CAS and a load will need *two*
> >> CAS operations; even an atomic store has to be implemented as a CAS
> >> loop.
> > Would we just stop expanding all those __atomic_*/__sync_* builtins inline
> > then (which would IMHO break tons of stuff), or just some predicate that
> > atomic.h/atomic headers use?
> >
> >>> But doesn't that mean you should fall back to locked operation also for 
> >>> any
> >>> other atomic operation on such types, because otherwise if you 
> >>> atomic_store
> >>> or any other kind of atomic operation, it wouldn't use the locking, while
> >>> for atomic load it would?
> >> I suppose you mean that one must fall back to using locking for all
> >> operations?  If load isn't atomic, then it can't be made atomic using
> >> external locks if the other operations don't use the locks.
> >>
> >>> That would be an ABI change and quite significant
> >>> pessimization in many cases.
> >> A change from wide CAS to locking would be an ABI change I suppose, but
> >> it could also be considered a necessary bugfix if we don't want to write
> >> to read-only memory.  Does this affect anything but i686?
> > Also x86_64 (for 128-bit atomics), clearly also either arm or aarch64
> > (judging from who initiated this thread), I bet there are many others.
> 
> I'm looking at pre-LPAE ARMv7-A targets for which the
> ARM Architecture Reference Manual (rev C.c) section A3.5.3 recommends:
> "The way to atomically load two 32-bit quantities is to perform a
> LDREXD/STREXD sequence, reading and writing the same value, for which the
> STREXD succeeds, and use the read values."
> 
> Currently we emit just a single load-doubleword-exclusive which, according to 
> the above,
> would not be enough on such targets.

If the store is idempotent, will the hardware actually execute a store
that could conflict with a read-only memory mapping?
In any case, I assume they would have to at least enforce the ordering
constraints that might be caused with such a store, because a program
could use an idempotent intentially just to get ordering.

A similar question applies to archs with CAS instructions, but I'd
assume they'd always create a visible store.  I'm not quite sure whether
current HW would also always put the cacheline into exclusive mode even
when the CAS would fail because the expected value doesn't match the
actual one.  Do the arch maintainers know something about this?

> On aarch64 doubleword (128 bit) atomic loads are done through locks (PR 
> 70814).

Note that the bug reporter also states that llvm recently changed to
calling libatomic functions.



GCC 5.4 Released

2016-06-03 Thread Richard Biener

The GNU Compiler Collection version 5.4 has been released.

GCC 5.4 is a bug-fix release from the GCC 5 branch
containing important fixes for regressions and serious bugs in
GCC 5.3 with more than 147 bugs fixed since the previous release.
This release is available from the FTP servers listed at:

  http://www.gnu.org/order/ftp.html

Please do not contact me directly regarding questions or comments
about this release.  Instead, use the resources available from
http://gcc.gnu.org.

As always, a vast number of people contributed to this GCC release
-- far too many to thank them individually!


Re: Implementing atomic load as compare-and-swap for read-only memory

2016-06-03 Thread Richard Henderson

On 06/03/2016 05:32 AM, Jakub Jelinek wrote:

A change from wide CAS to locking would be an ABI change I suppose, but
it could also be considered a necessary bugfix if we don't want to write
to read-only memory.  Does this affect anything but i686?


Also x86_64 (for 128-bit atomics), clearly also either arm or aarch64
(judging from who initiated this thread), I bet there are many others.


Both arm and aarch64 are not affected.  Both use the double-word load-locked 
instruction for the atomic load; they simply don't pair it with a 
store-conditional in that case.


There's nothing to be done about <= i686, but for recent-ish cpus we should be 
able to use either sse or fpu loads.  While it's not promised in the spec, I 
would suggest that any 64-bit capable cpu will in practice perform all aligned 
64-bit loads atomicly.


There's nothing that can be done for x86_64 128-bit load.  There are probably 
some implementations for which an aligned 128-bit sse load is atomic, but we 
also know that there are some implementations for which it is not (those that 
split sse instructions into 2 64-bit micro-ops, e.g. some AMD and all Atoms).


It's an oversight that it would be nice for Intel+AMD to fix for us...


r~


Re: (R5900) Implementing Vector Support

2016-06-03 Thread Richard Henderson

On 06/03/2016 05:54 AM, Woon yung Liu wrote:

The problem is that gen_lowpart() doesn't seem to support casting to other 
modes of the same size.


It certainly does.  The only place you get into trouble with gen_lowpart is 
with CONST_INT, which is mode-less.



But I am already doubting that I will complete this port as I can no longer see 
a favourable conclusion.


I would suggest that you post actual code to the list so that people can help 
you.  Simply answering questions in the abstract, as I have been doing, can 
only go so far.



r~



April/May 2016 GNU Toolchain Update

2016-06-03 Thread Nick Clifton
Hi Guys,

  Well now that GCC 6 is out lets see what new features have started
  to appear in the toolchain:

  Several new warning options have been added to GCC:
  
  * The option -Wno-duplicate-decl-specifier has been added to
generate warnings whenever a declaration contains duplicate const,
volatile, restrict or _Atomic specifiers.  This warning is enabled
by -Wall.

  * The option -Wignored-attributes warns when an attribute is
correctly assigned, but the compiler decided to ignore it anyway.
This is different from the -Wattributes which warns when an
attribute is either unknown or used in the wrong place.

  * The option -Wswitch-unreachable warns whenever a switch statement
contains statements between the controlling expression and the
first case label, which will never be executed.  For example:

  switch (cond)
{
  i = 15;
  case 5:
 

The option does not warn if the statement(s) between the
controlling expression and the first case label are just variable
declarations:

  switch (cond)
{
  int i;
  case 5:
 

  * The option -Wdangling-else warns about constructions where there
may be confusion to which if statement an else branch belongs.
For example:

   if (a)
 if (b)
   foo ();
   else
 bar ();

  * The option -Wmemset-elt-size warns about suspicious calls to
the memset function, if the first argument references an array,
and the third argument is a number equal to the number of
elements, but not equal to the size of the array in memory.
For example:

  int array[10];
  memset (array, 0, 10);  // Should be: memset (array, 0, 10 * sizeof 
(int));


  A new point release of GDB is out: 7.11.1.  This is a bugfix release
  addressing these issues:

* PR remote/19863 (7.10 regression: gdb remote.c due to "setfs" with
  gdbserver < 7.7)
* PR gdb/19829 (gdb crashes with PT and reverse next)
* PR gdb/19676 (gdb fails with assert error if /proc is not mounted)
* PR gdb/19828 (7.11 regression: non-stop gdb -p : internal error)
* PR remote/19840 (gdb crashes on reverse-stepi)
* PR gdb/19858 (GDB doesn't register the JIT libraries on attach)
* PR gdb/19958 (Breakpoints/watchpoints broken on MIPS Linux <= 4.5)
* PR build/20029 (symfile.c ambiguous else warning)
* PR python/20037 (GDB use-after-free error)
* PR gdb/20039 (Using MI/all-stop, can't interrupt multi-threaded
  program after continue)a

  In the development GDB sources a couple of new features have been
  added:
  
* Fortran: Support structures with fields of dynamic types and 
  arrays of dynamic types.

* Rust language support.
  GDB now supports debugging programs written in the Rust
  programming language.


  Development in the binutils has mostly concentrated on bugfixing,
  but there have been a few new features added:

  * The ARM port of GAS now has support for the ARMv8-M architecture,
including the security and DSP extensions.

  * The ARC port of GAS now accepts .extInstruction, .extCondCode, 
.extAuxRegister, and .extCoreRegister pseudo-ops that allow an
user to define custom instructions, conditional codes, auxiliary
and core registers.

  * The MIPS port of GAS can now generate code for the DSP Release 3
Application Specific Extension.

  * Linker scripts can now use a NOCROSSREFSTO directive.  This is
like the NOCROSSREFS directive which ensures that two or more
output sections are entirely independent from each other, except
that it does allow one way referencing. The NOCROSSREFS_TO
directive takes a list of output section names and complains if
the first section is referenced from any of the other sections. 

Cheers
  Nick


An issue with GCC 6.1.0's make install?

2016-06-03 Thread Ethin Probst
Hello all,
Yesterday I managed to successfully build GCC and all of the
accompanying languages that it supports by default (Ada, C, C++,
Fortran, Go, Java, Objective-C, Objective-C++, and Link-time
Optimization (LTO)). I did not build JIT support because I have not
herd if it is stable or not.
Anyways, seeing as I didn't (and still do not) want to wait another 12
hours for that to build, I compressed it into a .tar.bz2 archive,
copied it over to another server, decompressed it, and here's when the
problems start. Keep in mind that I did ensure that all files were
compressed and extracted.
When I go into my build subdirectory build tree, and type "make
install -s", it installs gnat, gcc (and g++), gfortran, gccgo, and
gcj, but it errors out (and, subsequently, bales out) and says the
following:
Making install in tools
make[3]: *** [install-recursive] Error 1
make[2]: *** [install-recursive] Error 1
make[1]: *** [install-target-libjava] Error 2
make: *** [install] Error 2
And then:
$ gcj
gcj: error: libgcj.spec: No such file or directory
I'm considering the test suite, but until it installs, I'm not sure if
executing the test suite would be very wise at this point. To get it
to say that no input file was specified, I have to manually run the
following commands:
$ cd x86_64-pc-linux-gnu/libjava
$ cp libgcj.spec /usr/bin
Has the transportation of the source code caused the build tree to be
messed up? I know that it works perfectly fine on my other server.
Running make install without the -s command line parameter yields
nothing. Have I done something wrong?
-- 
Signed,
Ethin D. Probst