[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888

2018-04-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

--- Comment #16 from Richard Biener  ---
It's hard to tell - will try to look at more dumps produced by a cross which
hopefully matches your setup.

[Bug c/65892] gcc fails to implement N685 aliasing of union members

2018-04-19 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65892

--- Comment #29 from rguenther at suse dot de  ---
On Thu, 19 Apr 2018, jameskuyper at verizon dot net wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65892
> 
> James Kuyper Jr.  changed:
> 
>What|Removed |Added
> 
>  CC||jameskuyper at verizon dot net
> 
> --- Comment #28 from James Kuyper Jr.  ---
> (In reply to Martin Sebor from comment #17)
> > The C Union Visibility rule was intended to cover that case.  The trouble is
> > that the rule tends to be interpreted differently by different people, users
> > and implementers alike: Is it the union object that must be visible at the
> > point of the access, or just the union type?
> 
> The relevant wording is "anywhere that a declaration of the completed type of
> the union is visible.", so it's unambiguously the type, not the object, which
> must be visible. A declaration of the completed type can be visible without
> any objects of that type being visible, and that's sufficient for this rule
> to apply.
> 
> > ...  Must the access be performed
> > using the union object, or just the union type, or neither?
> 
> It says only that "it is permitted to inspect the common initial part"; it
> imposes no restrictions on how the inspection may be performed. Clearly,
> inspection through an lvalue of the union's type must be permitted, but it is
> also permitted to use the more indirect methods which are the subject of this
> bug report, simply because the standard says nothing to restrict the
> permission it grants to the more direct cases.

Note I repeatedly said this part of the standard is just stupid.  It makes
most if not all type-based alias analysis useless.

Which means I'll refuse any patches implementing it in a way that affects
default behavior.  A clean patch (I really can't think of any clean 
approach besides forcing -fno-strict-aliasing!) with some extra flag
(well, just use -fno-strict-aliasing ...) would be fine with me.

[Bug libstdc++/85439] mt19937_64 producing unexpected result only in certain configuration

2018-04-19 Thread foddex at foddex dot net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85439

--- Comment #4 from Marc "Foddex" Oude Kotte  ---
The reason I was expecting the same result everywhere is because of this
statement on cppreference.com:


"Notes
The 10000th consecutive invocation of a default-constructed std::mt19937 is
required to produce the value 4123659995.

The 10000th consecutive invocation of a default-constructed std::mt19937_64 is
required to produce the value 9981545732273789042"


Source: http://en.cppreference.com/w/cpp/numeric/random/mersenne_twister_engine

Obviously I'm not testing the 10000th consecutive invocation but the 1st, but
I still expected things to be similar (and they are, in 7/8 test cases).

[Bug fortran/85448] the compiler selects the wrong subroutine because of bind(c,name=...)

2018-04-19 Thread francois.jacq at irsn dot fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85448

--- Comment #2 from francois.jacq at irsn dot fr ---
(In reply to kargl from comment #1)
> (In reply to francois.jacq from comment #0)
> > In the following example, the subroutine c_open of the module m2, which
> > should call the subroutine odopen of the module m1, calls itself instead...
> 
> The behavior you see is correct.  There is no 'c_odopen'
> in the m2.o file.  You gave it the binding name 'odopen'.
> You have created a recursive call.

No, the behavior is not correct: at the very least, the compiler should issue
an error message stating that there is a name conflict, because odopen is also
accessible via the use statement.

Another variant which clearly shows that the compiler behavior is wrong:

module m2
   use iso_c_binding
   use m1, only : odopen_m1 => odopen
   implicit none
   contains
   subroutine c_odopen(unit) bind(c,name="odopen")
      integer(c_int),intent(out) :: unit
      write(*,*) 'c_odopen'
      call odopen_m1(unit)
   end subroutine
end module

With this variant, I again get the same behavior: c_odopen is called
recursively.

[Bug tree-optimization/85446] [7/8 Regression] wrong-code on riscv64

2018-04-19 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85446

--- Comment #2 from Jakub Jelinek  ---
Author: jakub
Date: Thu Apr 19 07:46:54 2018
New Revision: 259488

URL: https://gcc.gnu.org/viewcvs?rev=259488&root=gcc&view=rev
Log:
PR tree-optimization/85446
* match.pd ((intptr_t) x eq/ne CST to x eq/ne (typeof x) cst): Require
the integral and pointer types to have the same precision.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/match.pd
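
The precision requirement in the ChangeLog entry above can be illustrated with a minimal sketch. It uses plain 64-bit and 32-bit integers as stand-ins for a pointer and a narrower integral type; the function names are hypothetical and this is not the match.pd pattern itself, only an assumption-laden illustration of why comparing after a narrowing conversion is not the same as comparing at full width.

```cpp
#include <cstdint>

// "Cast, then compare": the wide value is first truncated to 32 bits.
// Hypothetical helper, for illustration only.
bool narrow_compare(std::uint64_t x, std::uint32_t cst) {
  return static_cast<std::uint32_t>(x) == cst;
}

// "Compare at the original width": what a folded comparison would compute
// if the conversion were simply dropped.
bool wide_compare(std::uint64_t x, std::uint32_t cst) {
  return x == static_cast<std::uint64_t>(cst);
}
```

For a value such as 0x100000000 the two functions disagree when compared against 0, which is why such a fold is only safe when both types have the same precision.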

[Bug tree-optimization/85446] [7 Regression] wrong-code on riscv64

2018-04-19 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85446

Jakub Jelinek  changed:

   What|Removed |Added

Summary|[7/8 Regression] wrong-code |[7 Regression] wrong-code
   |on riscv64  |on riscv64

--- Comment #3 from Jakub Jelinek  ---
Fixed on the trunk so far.

[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888

2018-04-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org

--- Comment #17 from Richard Biener  ---
Created attachment 43984
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43984&action=edit
patch I am testing

It is a similar issue, if not the same.  With the attached I see the pcom
performed - can you throw it on some ppc testing please?

[Bug ipa/85449] [8 Regression] Wrong specialization is called in self recursive functions after r259319

2018-04-19 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85449

--- Comment #6 from Martin Liška  ---
So it's very hard to isolate a simple test-case without not doing an UBSAN.
Thus the easiest way to reproduce it is:

$ cd src
$ gcc *.c -I. -O3 && ./a.out 
Fatal error: glibc detected an invalid stdio handle
Aborted (core dumped)

with the main.c file I sent.

[Bug libgomp/85463] New: [nvptx] "exit" in offloaded region doesn't terminate process

2018-04-19 Thread tschwinge at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85463

Bug ID: 85463
   Summary: [nvptx] "exit" in offloaded region doesn't terminate
process
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Keywords: openacc, openmp
  Severity: normal
  Priority: P3
 Component: libgomp
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tschwinge at gcc dot gnu.org
CC: jakub at gcc dot gnu.org
  Target Milestone: ---
Target: nvptx

Consider:

#include <stdlib.h>

int main(int argc, char *argv[])
{
#pragma acc parallel
  {
    if (argc != 1)
      exit(35);
  }

  return 0;
}

No matter the argc value, this program always terminates normally (exit code
zero).

Is the right solution to have GOACC_parallel etc. detect that exit etc. have
been called inside offloaded regions, and then have them terminate the process
in the appropriate way?

Can we make use of PTX trap in some way for that, to avoid adding the overhead
of retval checking for (the majority of) kernel launches that terminate
normally?  That is, when C exit is called, set some retval (as is currently
only done in stand-alone nvptx target testing), and then call PTX trap instead
of PTX exit, which will cause cuStreamSynchronize to indicate a launch error;
then we can check the retval and do the appropriate thing?

Or, similarly, a scheme where the low-overhead PTX exit is only ever used for
normal (exit code zero) C exit(0), and everything else uses the PTX trap
variant?

This relates to PR85166 "[nvptx, libgfortran] Libgomp fortran tests using stop
in offloaded fns fail to compile", where (still now) a Fortran STOP usage will
not actually terminate the process with the appropriate error code.  (The
"abort" testcases in the libgomp testsuite might appear to do the right thing
nevertheless; see the (unresolved) discussion in
.)

[Bug c++/85461] A simple recursive TMP static const initializer defeats gcc

2018-04-19 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85461

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek  ---
The testcase is incomplete; something like:
int a = bitWidthHolding<255>::width;
is missing, otherwise it doesn't fail.  The reason for giving up is that it
tries to instantiate bitWidthHolding<0> many times.  This doesn't seem to be a
regression: all gcc versions that have -std=c++0x support fail similarly.

[Bug c++/85464] New: Wignored-qualifiers is emitted by cc1plus without diagnostics when triggered by a cast operator.

2018-04-19 Thread janniksilvanus at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85464

Bug ID: 85464
   Summary: Wignored-qualifiers is emitted by cc1plus without
diagnostics when triggered by a cast operator.
   Product: gcc
   Version: 7.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: janniksilvanus at gmail dot com
  Target Milestone: ---

Consider the following example:

struct Test {
   operator int const() const;
};

When compiling using

g++ -Wextra -c cast.C

I get 

cc1plus: warning: type qualifiers ignored on function return type
[-Wignored-qualifiers]

Note that the warning is emitted by cc1plus instead of g++, and does not
contain any further info about e.g. the line that triggers this warning.

Output of g++ -v:
Target: x86_64-redhat-linux
Configured with: ./../gcc-7.3.0/configure --prefix=/lfs/vlsi/tools/gcc/7.3.0
--disable-multilib --enable-languages=c,c++,lto --enable-bootstrap
--enable-shared --enable-threads=posix --enable-checking=release
--with-system-zlib --enable-gnu-unique-object
--with-default-libstdcxx-abi=gcc4-compatible --enable-libmpx
--enable-gnu-indirect-function --with-tune=generic --build=x86_64-redhat-linux
--with-advance-toolchain=/opt/rh/devtoolset-7/root/
Thread model: posix
gcc version 7.3.0 (GCC)

The same problem seems to be present with version 7.1.0, but not with 6.1.0
(although gcc-6.1.0 places the marker on the wrong const, i.e. the last const
of the line, which seems to have been reported and fixed via bug 69733).

[Bug c++/85462] [8 Regression] internal compiler error: in inc_refcount_use, at cp/pt.c:8955

2018-04-19 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85462

Jakub Jelinek  changed:

   What|Removed |Added

   Priority|P3  |P1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2018-04-19
 CC||aoliva at gcc dot gnu.org,
   ||jakub at gcc dot gnu.org
   Target Milestone|--- |8.0
Summary|internal compiler error: in |[8 Regression] internal
   |inc_refcount_use, at|compiler error: in
   |cp/pt.c:8955|inc_refcount_use, at
   ||cp/pt.c:8955
 Ever confirmed|0   |1

--- Comment #1 from Jakub Jelinek  ---
Reducing and bisecting now.  I suspect r259457.

[Bug c++/85461] A simple recursive TMP static const initializer defeats gcc

2018-04-19 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85461

Jonathan Wakely  changed:

   What|Removed |Added

   Keywords||rejects-valid
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2018-04-19
 Ever confirmed|0   |1

--- Comment #2 from Jonathan Wakely  ---
I think https://wg21.link/cwg712 makes this valid.

[Bug libgomp/85463] [nvptx] "exit" in offloaded region doesn't terminate process

2018-04-19 Thread tschwinge at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85463

--- Comment #1 from Thomas Schwinge  ---
Author: tschwinge
Date: Thu Apr 19 08:53:38 2018
New Revision: 259491

URL: https://gcc.gnu.org/viewcvs?rev=259491&root=gcc&view=rev
Log:
PR85463 '[nvptx] "exit" in offloaded region doesn't terminate process'

libgomp/
PR libfortran/85166
* testsuite/libgomp.oacc-fortran/abort-1.f90: Switch back to "call
abort".
* testsuite/libgomp.oacc-fortran/abort-2.f90: Likewise.

libgfortran/
PR libfortran/85166
PR libgomp/85463
* runtime/minimal.c (stop_numeric): Reimplement.
(stop_string, error_stop_string, error_stop_numeric): New
functions.
libgomp/
PR libgomp/85463
* testsuite/libgomp.oacc-fortran/error_stop-1.f: New file.
* testsuite/libgomp.oacc-fortran/error_stop-2.f: Likewise.
* testsuite/libgomp.oacc-fortran/error_stop-3.f: Likewise.
* testsuite/libgomp.oacc-fortran/stop-1.f: Likewise.
* testsuite/libgomp.oacc-fortran/stop-2.f: Likewise.
* testsuite/libgomp.oacc-fortran/stop-3.f: Likewise.

Added:
trunk/libgomp/testsuite/libgomp.oacc-fortran/error_stop-1.f
trunk/libgomp/testsuite/libgomp.oacc-fortran/error_stop-2.f
trunk/libgomp/testsuite/libgomp.oacc-fortran/error_stop-3.f
trunk/libgomp/testsuite/libgomp.oacc-fortran/stop-1.f
trunk/libgomp/testsuite/libgomp.oacc-fortran/stop-2.f
trunk/libgomp/testsuite/libgomp.oacc-fortran/stop-3.f
Modified:
trunk/libgfortran/ChangeLog
trunk/libgfortran/runtime/minimal.c
trunk/libgomp/ChangeLog
trunk/libgomp/testsuite/libgomp.oacc-fortran/abort-1.f90
trunk/libgomp/testsuite/libgomp.oacc-fortran/abort-2.f90

[Bug libfortran/85166] [nvptx, libgfortran] Libgomp fortran tests using stop in offloaded fns fail to compile

2018-04-19 Thread tschwinge at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85166

--- Comment #7 from Thomas Schwinge  ---
Author: tschwinge
Date: Thu Apr 19 08:53:38 2018
New Revision: 259491

URL: https://gcc.gnu.org/viewcvs?rev=259491&root=gcc&view=rev
Log:
PR85463 '[nvptx] "exit" in offloaded region doesn't terminate process'

libgomp/
PR libfortran/85166
* testsuite/libgomp.oacc-fortran/abort-1.f90: Switch back to "call
abort".
* testsuite/libgomp.oacc-fortran/abort-2.f90: Likewise.

libgfortran/
PR libfortran/85166
PR libgomp/85463
* runtime/minimal.c (stop_numeric): Reimplement.
(stop_string, error_stop_string, error_stop_numeric): New
functions.
libgomp/
PR libgomp/85463
* testsuite/libgomp.oacc-fortran/error_stop-1.f: New file.
* testsuite/libgomp.oacc-fortran/error_stop-2.f: Likewise.
* testsuite/libgomp.oacc-fortran/error_stop-3.f: Likewise.
* testsuite/libgomp.oacc-fortran/stop-1.f: Likewise.
* testsuite/libgomp.oacc-fortran/stop-2.f: Likewise.
* testsuite/libgomp.oacc-fortran/stop-3.f: Likewise.

Added:
trunk/libgomp/testsuite/libgomp.oacc-fortran/error_stop-1.f
trunk/libgomp/testsuite/libgomp.oacc-fortran/error_stop-2.f
trunk/libgomp/testsuite/libgomp.oacc-fortran/error_stop-3.f
trunk/libgomp/testsuite/libgomp.oacc-fortran/stop-1.f
trunk/libgomp/testsuite/libgomp.oacc-fortran/stop-2.f
trunk/libgomp/testsuite/libgomp.oacc-fortran/stop-3.f
Modified:
trunk/libgfortran/ChangeLog
trunk/libgfortran/runtime/minimal.c
trunk/libgomp/ChangeLog
trunk/libgomp/testsuite/libgomp.oacc-fortran/abort-1.f90
trunk/libgomp/testsuite/libgomp.oacc-fortran/abort-2.f90

[Bug c++/85460] g++ does not check incompatible type

2018-04-19 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85460

Jonathan Wakely  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2018-04-19
 Ever confirmed|0   |1
   Severity|normal  |enhancement

--- Comment #1 from Jonathan Wakely  ---
The standard doesn't require a diagnostic here, because you haven't
instantiated the template.

Confirming as an enhancement, because rejecting the code before instantiation
is permitted, just not required.

[Bug other/50639] -flto=jobserver broken on large LTO build

2018-04-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50639

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Richard Biener  ---
So fixed.

[Bug lto/50679] [meta-bug] Linux kernel LTO tracking bug

2018-04-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50679
Bug 50679 depends on bug 50639, which changed state.

Bug 50639 Summary: -flto=jobserver broken on large LTO build
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50639

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

[Bug fortran/85448] the compiler selects the wrong subroutine because of bind(c,name=...)

2018-04-19 Thread francois.jacq at irsn dot fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85448

--- Comment #3 from francois.jacq at irsn dot fr ---
Note that this is a regression: version 4.8.5 produces the result I
expected...

[Bug rtl-optimization/85455] [8 Regression] ICE in verify_loop_structure, at cfgloop.c:1708 (error: basic block 3 should be marked irreducible)

2018-04-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85455

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2018-04-19
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
   Target Milestone|--- |8.0
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
So we end up in

bool
cleanup_cfg (int mode)
{
...

with mode == 41 and

  /* ???  We probably do this way too often.  */
  if (current_loops
      && (changed
          || (mode & CLEANUP_CFG_CHANGED)))
    {
      timevar_push (TV_REPAIR_LOOPS);
      /* The above doesn't preserve dominance info if available.  */
      gcc_assert (!dom_info_available_p (CDI_DOMINATORS));
      calculate_dominance_info (CDI_DOMINATORS);
      fix_loop_structure (NULL);
      free_dominance_info (CDI_DOMINATORS);
      timevar_pop (TV_REPAIR_LOOPS);
    }

here changed is false and mode doesn't include CLEANUP_CFG_CHANGED.

  while (try_optimize_cfg (mode))
{

returns false here so changed isn't set but the call wrecks loop info.
That is because of

  if (mode & (CLEANUP_CROSSJUMP | CLEANUP_THREADING))
    clear_bb_flags ();

and

/* Bit mask for all basic block flags that must be preserved.  These are
   the bit masks that are *not* cleared by clear_bb_flags.  */
#define BB_FLAGS_TO_PRESERVE\
  (BB_DISABLE_SCHEDULE | BB_RTL | BB_NON_LOCAL_GOTO_TARGET  \
   | BB_HOT_PARTITION | BB_COLD_PARTITION)

which doesn't include loop related flags.  This is a latent issue, certainly
not a regression in general(?), a fix could be

Index: gcc/cfg.c
===
--- gcc/cfg.c   (revision 259486)
+++ gcc/cfg.c   (working copy)
@@ -386,9 +386,13 @@ void
 clear_bb_flags (void)
 {
   basic_block bb;
+  int flags_to_preserve = BB_FLAGS_TO_PRESERVE;
+  if (current_loops
+      && loops_state_satisfies_p (cfun, LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS))
+    flags_to_preserve |= BB_IRREDUCIBLE_LOOP;

   FOR_ALL_BB_FN (bb, cfun)
-    bb->flags &= BB_FLAGS_TO_PRESERVE;
+    bb->flags &= flags_to_preserve;
 }


 /* Check the consistency of profile information.  We can't do that
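
The idea behind the proposed change, widening the preserve mask only when the irreducible-region marking is known to be valid, can be sketched outside GCC as follows. The flag values, the vector of flag words and the function name are all hypothetical stand-ins (the real flags are defined inside GCC and are not reproduced here); this is an illustration of the masking logic only, not the actual patch.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical flag bits, chosen only for this sketch.
constexpr std::uint32_t FLAG_RTL = 1u << 0;              // always preserved
constexpr std::uint32_t FLAG_IRREDUCIBLE_LOOP = 1u << 1; // loop-related flag
constexpr std::uint32_t FLAG_VISITED = 1u << 2;          // scratch flag, always cleared

constexpr std::uint32_t FLAGS_TO_PRESERVE = FLAG_RTL;

// Clears every flag that is not in the preserve mask.  When the caller says
// irreducible regions are currently marked, that flag is preserved as well.
void clear_flags(std::vector<std::uint32_t>& block_flags,
                 bool irreducible_regions_marked) {
  std::uint32_t preserve = FLAGS_TO_PRESERVE;
  if (irreducible_regions_marked)
    preserve |= FLAG_IRREDUCIBLE_LOOP;
  for (std::uint32_t& flags : block_flags)
    flags &= preserve;
}
```

With the marking reported as valid, the loop flag survives the clearing pass; otherwise only the unconditional flags remain.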

[Bug tree-optimization/85459] [8 Regression] Larger code generated from GMP template meta-programming

2018-04-19 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85459

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek  ---
I think this is a result of many changes.
E.g. r249885 bumps .s size from 3709 to 4599 bytes, r254724 from 4599 to 5768,
r255510 from 5772 to 7713.  You are compiling with -O3, not -Os, so the size is
not what counts though, but the code speed, right?

[Bug tree-optimization/85459] [8 Regression] Larger code generated from GMP template meta-programming

2018-04-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85459

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization
   Target Milestone|--- |8.0

[Bug testsuite/61606] About GCC install, testing step (mostly check...)

2018-04-19 Thread juergen.reuter at desy dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61606

Jürgen Reuter  changed:

   What|Removed |Added

 CC||juergen.reuter at desy dot de

--- Comment #3 from Jürgen Reuter  ---
Is it really necessary to keep this one here open? Just stumbled on this via
the discussion on c.l.f. This was with 4.9 which is no longer supported.
Probably on an OS that is no longer used by the OP.

[Bug c++/85464] [7 Regression] Wignored-qualifiers is emitted by cc1plus without diagnostics when triggered by a cast operator.

2018-04-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85464

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2
 Status|UNCONFIRMED |NEW
  Known to work||6.4.0, 8.0.1
   Keywords||diagnostic
   Last reconfirmed||2018-04-19
 Ever confirmed|0   |1
Summary|Wignored-qualifiers is  |[7 Regression]
   |emitted by cc1plus without  |Wignored-qualifiers is
   |diagnostics when triggered  |emitted by cc1plus without
   |by a cast operator. |diagnostics when triggered
   ||by a cast operator.
   Target Milestone|--- |7.4
  Known to fail||7.3.1

--- Comment #1 from Richard Biener  ---
Confirmed.  Works again on trunk.

[Bug c++/85464] [7 Regression] Wignored-qualifiers is emitted by cc1plus without diagnostics when triggered by a cast operator.

2018-04-19 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85464

--- Comment #2 from Jonathan Wakely  ---
Started with r240863 (the fix for PR 69733) and was fixed by r249935 (for PR
65775).

[Bug ipa/85449] [8 Regression] Wrong specialization is called in self recursive functions after r259319

2018-04-19 Thread jamborm at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85449

--- Comment #7 from Martin Jambor  ---
I believe I understand the issue and will prepare a testcase from scratch. 
Possibly after I test/submit the patch if it takes too long.  Thanks for your
effort!

[Bug c++/85464] [7 Regression] Wignored-qualifiers is emitted by cc1plus without diagnostics when triggered by a cast operator.

2018-04-19 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85464

Jonathan Wakely  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |redi at gcc dot gnu.org

--- Comment #3 from Jonathan Wakely  ---
Looks easy to fix, I'll take it.

[Bug ipa/85449] [8 Regression] Wrong specialization is called in self recursive functions after r259319

2018-04-19 Thread jamborm at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85449

--- Comment #8 from Martin Jambor  ---
I believe I understand the issue and will prepare a testcase from scratch. 
Possibly after I test/submit the patch if it takes too long.  Thanks for your
effort!

[Bug tree-optimization/85459] [8 Regression] Larger code generated from GMP template meta-programming

2018-04-19 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85459

--- Comment #2 from Marc Glisse  ---
(In reply to Jakub Jelinek from comment #1)
> I think this is a result of many changes.
> E.g. r249885 bumps .s size from 3709 to 4599 bytes, r254724 from 4599 to
> 5768, r255510 from 5772 to 7713.  You are compiling with -O3, not -Os, so
> the size is not what counts though, but the code speed, right?

-Os gives something much larger (16633). I agree 100% that what matters at -O3
is code speed, not size, and I am only using the size as a crude indicator
before looking into the code more closely. And what I saw were many extra moves
and some missed __builtin_constant_p opportunities, both of which obviously
have a negative impact on performance. If the only size increase came from
outlining the exception code, that would be fine. But compiling with
-fno-exceptions (after removing enough exception-related unnecessary stuff from
the file so it compiles) still shows an increase from 2639 to 4268.

[Bug ipa/85449] [8 Regression] Wrong specialization is called in self recursive functions after r259319

2018-04-19 Thread tnfchris at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85449

--- Comment #9 from Tamar Christina  ---
Thanks Martin & Martin! The patch seems to work for me as well.

I did try to minimize it before, but it seems even the smallest change alters
the conditions for the cloning, so the bug no longer triggers.

[Bug tree-optimization/85459] [8 Regression] Larger code generated from GMP template meta-programming

2018-04-19 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85459

Marc Glisse  changed:

   What|Removed |Added

  Attachment #43982|0   |1
is obsolete||

--- Comment #3 from Marc Glisse  ---
Created attachment 43985
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43985&action=edit
preprocessed testcase (slightly reduced, compiles with -fno-exceptions)

[Bug libstdc++/85439] mt19937_64 producing unexpected result only in certain configuration

2018-04-19 Thread webrown.cpp at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85439

--- Comment #5 from W E Brown  ---
(In reply to Marc "Foddex" Oude Kotte from comment #4)
> The reason I was expecting the same result everywhere is because of this
> statement on cppreference.com:
> 
> 
> "Notes
> The 10000th consecutive invocation of a default-constructed std::mt19937 is
> required to produce the value 4123659995.
> 
> The 10000th consecutive invocation of a default-constructed std::mt19937_64
> is required to produce the value 9981545732273789042"
> 
> 
> Source:
> http://en.cppreference.com/w/cpp/numeric/random/mersenne_twister_engine
> 
> Obviously I'm not testing the 10000th consecutive invocation but the 1st,
> but I still expected things to be similar (and they are, in 7/8 test cases).

The cited text is correct, but the attached program doesn't test it.

In particular, the attached program tests nothing about calling any Mersenne
twister object; rather, it looks at the results of calling
uniform_int_distribution objects.  As specified in C++, distribution objects
are not required to produce identical results across platforms, even if given
identical inputs.  Any expectation to the contrary is not supported by the C++
standard.

ISTM that testing the behavior of std::mt19937 and std::mt19937_64 -- or of any
engine, for that matter -- ought not involve any distribution object at all. 
Instead, such a test program should IMO be looking at the result(s) of directly
invoking the engine.

Comment #2 was exactly right.  I recommend this report be closed as invalid.
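
A direct engine test of the kind described above could look like the following minimal sketch. The helper name nth_output is hypothetical; the two expected values in the usage note are the ones quoted from cppreference earlier in this thread, and the sketch assumes the quoted requirement refers to the 10000th invocation.

```cpp
#include <random>

// Returns the n-th output (1-based) of a default-constructed engine,
// without involving any distribution object.
template <typename Engine>
typename Engine::result_type nth_output(unsigned long long n) {
  Engine engine;          // default-constructed, i.e. default seed
  engine.discard(n - 1);  // skip the first n-1 outputs
  return engine();        // the n-th output
}
```

Under those assumptions, nth_output<std::mt19937>(10000) should compare equal to 4123659995 and nth_output<std::mt19937_64>(10000) to 9981545732273789042 on a conforming implementation.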

[Bug c++/69733] -Wignored-qualifiers points to wrong const

2018-04-19 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69733

--- Comment #5 from Jonathan Wakely  ---
N.B. this caused a regression for conversion operators, see PR 85464.

[Bug c++/85464] [7 Regression] Wignored-qualifiers is emitted by cc1plus without diagnostics when triggered by a cast operator.

2018-04-19 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85464

--- Comment #4 from Jonathan Wakely  ---
Trunk prints a location, but it's the wrong one again:

ign.cc:3:25: warning: type qualifiers ignored on function return type
[-Wignored-qualifiers]
operator int const() const; // { dg-error "type qualifiers ignored" }
 ^

So this is a regression for gcc-7 and an ordinary bug for gcc-8.

[Bug c++/85462] [8 Regression] internal compiler error: in inc_refcount_use, at cp/pt.c:8955

2018-04-19 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85462

Jakub Jelinek  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek  ---
Created attachment 43986
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43986&action=edit
gcc8-pr85462.patch

Untested fix; still waiting to see whether reduction comes up with a
reasonably sized testcase.

Confirmed r259457 as commit introducing this regression.

[Bug testsuite/61606] About GCC install, testing step (mostly check...)

2018-04-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61606

Richard Biener  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |WORKSFORME

--- Comment #4 from Richard Biener  ---
Works for us.

[Bug libstdc++/85439] mt19937_64 producing unexpected result only in certain configuration

2018-04-19 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85439

Jonathan Wakely  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |INVALID

--- Comment #6 from Jonathan Wakely  ---
(In reply to Marc "Foddex" Oude Kotte from comment #4)
> Obviously I'm not testing the 10000th consecutive invocation but the 1st,

No, as Walter explained, you're not even doing that.

If you want to inspect the 1st output of the MT engines, or the 10000th, then
do that:

template<typename Generator, template<typename> class, typename Type>
Type generateRandom() {
  return Generator()();
}

If you do this you'll see that every implementation produces the same results
from the Mersenne twister engines.

Your program inspects the output of std::uniform_int_distribution instead.
The standard says nothing about the exact values it produces, only the
distribution of values given a range of inputs. So as Walter suggested, I'm
closing this as INVALID, as there's no bug here.

(Also, stop using names like _Generator, _Dist and _Type, those are reserved
names, see http://en.cppreference.com/w/cpp/language/identifiers for details).

[Bug testsuite/61606] About GCC install, testing step (mostly check...)

2018-04-19 Thread fernando at info dot unlp.edu.ar
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61606

--- Comment #5 from Fernando G. Tinetti  ---
(In reply to Jürgen Reuter from comment #3)
> Is it really necessary to keep this one here open? Just stumbled on this via
> the discussion on c.l.f. This was with 4.9 which is no longer supported.
> Probably on an OS that is no longer used by the OP.

I understand. 

(the current link for the contents I was referring to is
http://fernando.bl.ee/GCC-gfortran-install.html)

Maybe I was the only one with that problem, for which the original questions
possibly remain. I did not try with the new versions.

Thanks anyway, maybe the wrong issues are fixed as time goes by,

Fernando.

[Bug c/85465] New: [og7, openacc] ICE in mark_vars_oacc_gangprivate, at c/c-parser.c:14213

2018-04-19 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85465

Bug ID: 85465
   Summary: [og7, openacc] ICE in mark_vars_oacc_gangprivate, at
c/c-parser.c:14213
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

$ cat bug.c
int
main (void)
{

  #pragma acc parallel
  foo ();

  return 0;
}

$ ./install/bin/gcc bug.c -fopenacc
bug.c: In function ‘main’:
bug.c:6:3: warning: implicit declaration of function ‘foo’
[-Wimplicit-function-declaration]
   foo ();
   ^~~
bug.c:6:3: internal compiler error: in mark_vars_oacc_gangprivate, at
c/c-parser.c:14213
0x7ecba5 mark_vars_oacc_gangprivate
gcc/c/c-parser.c:14213
0x110d5b8 walk_tree_1(tree_node**, tree_node* (*)(tree_node**, int*, void*),
void*, hash_set >*, tree_node*
(*)(tree_node**, int*, tree_node* (*)(tree_node**, int*, void*), void*,
hash_set >*))
gcc/tree.c:11832
0x110e32f walk_tree_1(tree_node**, tree_node* (*)(tree_node**, int*, void*),
void*, hash_set >*, tree_node*
(*)(tree_node**, int*, tree_node* (*)(tree_node**, int*, void*), void*,
hash_set >*))
gcc/tree.c:12158
0x7edea2 c_parser_oacc_kernels_parallel
gcc/c/c-parser.c:14269
0x7fb762 c_parser_omp_construct
gcc/c/c-parser.c:17580
0x7e2de7 c_parser_pragma
gcc/c/c-parser.c:10372
0x7d5544 c_parser_compound_statement_nostart
gcc/c/c-parser.c:4890
0x7d5015 c_parser_compound_statement
gcc/c/c-parser.c:4755
0x7cfb0a c_parser_declaration_or_fndef
gcc/c/c-parser.c:2128
0x7ce202 c_parser_external_declaration
gcc/c/c-parser.c:1472
0x7cdd98 c_parser_translation_unit
gcc/c/c-parser.c:1352
0x7fd41e c_parse_file()
gcc/c/c-parser.c:18349
0x85aaab c_common_parse_file()
gcc/c-family/c-opts.c:1107
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

[Bug c++/85466] New: Performance is slow when doing 'branchless' conditional style math operations

2018-04-19 Thread cpphackster at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85466

Bug ID: 85466
   Summary: Performance is slow when doing 'branchless'
conditional style math operations
   Product: gcc
   Version: 7.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: cpphackster at gmail dot com
  Target Milestone: ---

I have been investigating turning if statements into math operations inspired
by a blog article...

http://theorangeduck.com/page/avoiding-shader-conditionals

...and other resources listed here...
https://gist.github.com/unitycoder/4d988bb21b3ce820eaa23028ed6d04bd

There are also many 'branchless' type things on Stack Overflow (like the
signum function needed for the branchless operations)

https://stackoverflow.com/questions/1903954/is-there-a-standard-sign-function-signum-sgn-in-c-c


I set up a quickbench benchmark to test if this branchless code is faster on
CPU as well.

http://quick-bench.com/o5lYur5c9rVuOyAn6-fzDf6xTuk

It seems that for a case such as...

if (myVector[n] > 0.5) {
    result[n] = 0.8f;
}
else {
    result[n] = 0.1f;
}

...which gets turned into the branchless

result[n] = lerp(0.1f, 0.8f, when_gt(myVec[n], 0.5f));

...clang runs ~2x faster than the standard if statement (it seems to turn it
into a lot of vectorized code which seems to be many movups)

gcc is very slow compared to even the standard base case.

One suspicious part is that ~68% of the time is spent in one part of the code.

3.00%  mulss  0x8(%rsp),%xmm0
67.88% addss  %xmm3,%xmm0
4.60%  movss  %xmm0,(%rbx,%rdx,1)
2.14%  add    $0x4,%rdx
       cmp    $0x61a80,%rdx
       je     4053a0
       movss  0x0(%rbp,%rdx,1),%xmm0
0.68%  xor    %eax,%eax
0.45%  subss  %xmm2,%xmm0
1.89%  ucomiss %xmm1,%xmm0
1.47%  seta   %al
1.85%  xor    %ecx,%ecx
       ucomiss %xmm0,%xmm1
       pxor   %xmm0,%xmm0
       seta   %cl
1.17%  sub    %ecx,%eax
0.90%  cvtsi2ss %eax,%xmm0
4.29%  ucomiss %xmm0,%xmm1
2.89%  movaps %xmm4,%xmm0
       jbe    405350
3.02%  mulss  0xc(%rsp),%xmm0
3.72%  jmp    405356


I'm happy to help out with testing any builds or fixes for this. My assembly
knowledge is limited, but I'm willing to help out where possible, run
benchmarks, etc.

Cheers
Dan
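
For readers without access to the quick-bench link, the two variants being
compared can be sketched roughly as follows. This is only an illustration, not
the reporter's actual benchmark: the helper definitions of sign, when_gt and
lerp are assumptions modelled on the linked blog article, and the function
names select_branchy, select_branchless and demo are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed helper: -1, 0 or +1 according to the sign of x.
inline float sign(float x) {
  return static_cast<float>((x > 0.0f) - (x < 0.0f));
}

// Assumed helper: 1.0f when x > y, otherwise 0.0f. Returned by value.
inline float when_gt(float x, float y) {
  return std::max(sign(x - y), 0.0f);
}

// Linear interpolation: a for t == 0, approximately b for t == 1.
inline float lerp(float a, float b, float t) {
  return a + t * (b - a);
}

// Conventional version: one if statement per element.
inline std::vector<float> select_branchy(const std::vector<float>& v) {
  std::vector<float> result(v.size());
  for (std::size_t n = 0; n < v.size(); ++n) {
    if (v[n] > 0.5f) {
      result[n] = 0.8f;
    } else {
      result[n] = 0.1f;
    }
  }
  return result;
}

// 'Branchless' version: the condition becomes an arithmetic 0/1 factor.
inline std::vector<float> select_branchless(const std::vector<float>& v) {
  std::vector<float> result(v.size());
  for (std::size_t n = 0; n < v.size(); ++n) {
    result[n] = lerp(0.1f, 0.8f, when_gt(v[n], 0.5f));
  }
  return result;
}

// Small self-check: both versions agree up to floating-point rounding.
inline void demo() {
  const std::vector<float> in{0.2f, 0.9f, 0.5f};
  const std::vector<float> a = select_branchy(in);
  const std::vector<float> b = select_branchless(in);
  for (std::size_t i = 0; i < in.size(); ++i) {
    const float d = a[i] - b[i];
    assert(d < 1e-6f && d > -1e-6f);
  }
}
```

The interpolated result can differ from the literal 0.8f in the last bit, so
the comparison allows a small tolerance instead of testing for exact equality.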

[Bug tree-optimization/85467] New: [8 Regression] ICE: verify_gimple failed: non-trivial conversion at assignment with -O2 -fno-tree-ccp --param=sccvn-max-scc-size=10

2018-04-19 Thread zsojka at seznam dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85467

Bug ID: 85467
   Summary: [8 Regression] ICE: verify_gimple failed: non-trivial
conversion at assignment with -O2 -fno-tree-ccp
--param=sccvn-max-scc-size=10
   Product: gcc
   Version: 8.0.1
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zsojka at seznam dot cz
  Target Milestone: ---
  Host: x86_64-pc-linux-gnu
Target: x86_64-pc-linux-gnu

Created attachment 43987
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43987&action=edit
reduced testcase

This looks very similar to PR85195

The ICE can be reproduced with any of the typedefs:
typedef char T;
typedef short T;
typedef int T;
typedef long T;
typedef long long T;
typedef __int128 T;

as long as the vector has just 1 member.

Compiler output:
$ x86_64-pc-linux-gnu-gcc -O2 -fno-tree-ccp --param=sccvn-max-scc-size=10
testcase.c
testcase.c: In function 'foo':
testcase.c:24:1: error: non-trivial conversion at assignment
 }
 ^
vector(1) char
char
_5 = 0;
testcase.c:24:1: error: non-trivial conversion at assignment
V
char
_8 = 0;
during GIMPLE pass: forwprop
testcase.c:24:1: internal compiler error: verify_gimple failed
0xdbaca9 verify_gimple_in_cfg(function*, bool)
/repo/gcc-trunk/gcc/tree-cfg.c:5585
0xc37613 execute_function_todo
/repo/gcc-trunk/gcc/passes.c:1994
0xc37f59 execute_todo
/repo/gcc-trunk/gcc/passes.c:2048
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

x86_64-pc-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-amd64/bin/x86_64-pc-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-259457-checking-yes-rtl-df-extra-nobootstrap-amd64/bin/../libexec/gcc/x86_64-pc-linux-gnu/8.0.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++
--enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra
--disable-bootstrap --with-cloog --with-ppl --with-isl
--build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu
--target=x86_64-pc-linux-gnu --with-ld=/usr/bin/x86_64-pc-linux-gnu-ld
--with-as=/usr/bin/x86_64-pc-linux-gnu-as --disable-libstdcxx-pch
--prefix=/repo/gcc-trunk//binary-trunk-259457-checking-yes-rtl-df-extra-nobootstrap-amd64
Thread model: posix
gcc version 8.0.1 20180418 (experimental) (GCC) 

$

[Bug c++/85466] Performance is slow when doing 'branchless' conditional style math operations

2018-04-19 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85466

--- Comment #1 from Marc Glisse  ---
Please always include your code in the bug report (this external website
doesn't even seem to have a "download the code" option).

[Bug tree-optimization/85467] [8 Regression] ICE: verify_gimple failed: non-trivial conversion at assignment with -O2 -fno-tree-ccp --param=sccvn-max-scc-size=10

2018-04-19 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85467

Jakub Jelinek  changed:

   What|Removed |Added

   Priority|P3  |P1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2018-04-19
 CC||jakub at gcc dot gnu.org
   Target Milestone|--- |8.0
 Ever confirmed|0   |1

--- Comment #1 from Jakub Jelinek  ---
Started with r247092.

[Bug c++/85466] Performance is slow when doing 'branchless' conditional style math operations

2018-04-19 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85466

Jonathan Wakely  changed:

   What|Removed |Added

 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2018-04-19
 Ever confirmed|0   |1

[Bug c++/61982] Optimizer does not eliminate stores to destroyed objects

2018-04-19 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61982

--- Comment #12 from Jonathan Wakely  ---
The C++ standard. Specifically, [basic.life].

After the constructor runs and before the destructor runs, that memory location
contains an object of type X. After the destructor finishes you can reuse the
storage for another object.

The memcmp call treats the storage as an array of unsigned char and can
implicitly begin the lifetime of an array of unsigned char because unsigned
char does not have non-vacuous initialization (i.e. it doesn't need to be
initialized). However, if you don't initialize those characters then they have
indeterminate values.

So the first memcmp call reuses the storage to create an array of unsigned char
with indeterminate values. That ends the lifetime of the X (without calling the
destructor). Then you call the destructor on an object that already ended its
lifetime, which is undefined. Then you reuse it as an array of unsigned char
again, and it still has indeterminate values.

Even if we ignore the UB caused by invoking the destructor for an object that
no longer exists, the value representation of the X cannot be inspected by
memcmp, because X is not trivially copyable, and since the values are
indeterminate there is no requirement for the values written by the destructor
to be observable.

[Bug tree-optimization/85467] [8 Regression] ICE: verify_gimple failed: non-trivial conversion at assignment with -O2 -fno-tree-ccp --param=sccvn-max-scc-size=10

2018-04-19 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85467

--- Comment #2 from Jakub Jelinek  ---
I'll have a look.

[Bug tree-optimization/85467] [8 Regression] ICE: verify_gimple failed: non-trivial conversion at assignment with -O2 -fno-tree-ccp --param=sccvn-max-scc-size=10

2018-04-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85467

--- Comment #3 from Richard Biener  ---
Must be some match.pd pattern again.

[Bug libstdc++/85439] mt19937_64 producing unexpected result only in certain configuration

2018-04-19 Thread foddex at foddex dot net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85439

--- Comment #7 from Marc "Foddex" Oude Kotte  ---
OK. My apologies for the confusion, and thanks for the clarification!

[Bug c++/85466] Performance is slow when doing 'branchless' conditional style math operations

2018-04-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85466

Richard Biener  changed:

   What|Removed |Added

 Blocks||53947

--- Comment #2 from Richard Biener  ---
From the webpage you seem to just use -O3 -std=c++17; the assembly shows the
branchless code isn't vectorized.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations

[Bug c++/85466] Performance is slow when doing 'branchless' conditional style math operations

2018-04-19 Thread jgreenhalgh at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85466

James Greenhalgh  changed:

   What|Removed |Added

 CC||jgreenhalgh at gcc dot gnu.org

--- Comment #3 from James Greenhalgh  ---
Created attachment 43988
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43988&action=edit
Reduced testcase

I believe this testcase shows the issue being reported here. Clang seems to
spot this is essentially a memset across the array, while GCC doesn't.

On AArch64 with Clang:

  .LBB1_9:                // =>This Inner Loop Header: Depth=1
        stp     q0, q0, [x8, #-16]
        subs    x20, x20, #8            // =8
        add     x8, x8, #32             // =32
        b.ne    .LBB1_9

On x86-64 with Clang:

  .LBB1_9:                # =>This Inner Loop Header: Depth=1
        movups  %xmm0, -144(%rax,%rcx,4)
        movups  %xmm0, -128(%rax,%rcx,4)
        movups  %xmm0, -112(%rax,%rcx,4)
        movups  %xmm0, -96(%rax,%rcx,4)
        movups  %xmm0, -80(%rax,%rcx,4)
        movups  %xmm0, -64(%rax,%rcx,4)
        movups  %xmm0, -48(%rax,%rcx,4)
        movups  %xmm0, -32(%rax,%rcx,4)
        movups  %xmm0, -16(%rax,%rcx,4)
        movups  %xmm0, (%rax,%rcx,4)
        addq    $40, %rcx
        cmpq    $100036, %rcx   # imm = 0x186C4
        jne     .LBB1_9

GCC doesn't spot this.

On the other hand G++'s inlining of the various random number initialisation
routines really hammers Clang, which ends up emulating 128-bit arithmetic on
AArch64.

[Bug c++/85462] [8 Regression] internal compiler error: in inc_refcount_use, at cp/pt.c:8955

2018-04-19 Thread manuel.lauss at googlemail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85462

--- Comment #3 from Manuel Lauss  ---
(In reply to Jakub Jelinek from comment #2)
> Created attachment 43986 [details]
> gcc8-pr85462.patch
> 
> Untested fix, still waiting if reduction comes up with some reasonably sized
> testcase.

It does fix the issue.

[Bug c++/85466] Performance is slow when doing 'branchless' conditional style math operations

2018-04-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85466

Richard Biener  changed:

   What|Removed |Added

 Status|WAITING |NEW

--- Comment #4 from Richard Biener  ---
On x86_64 somehow x87 math ends up being used.  Huh.  I also see calls to
nextafter() in the assembly?!  Note you need sth better than SSE2 to vectorize
the spaghetti, AVX2 does it for me.

[Bug libstdc++/85466] Performance is slow when doing 'branchless' conditional style math operations

2018-04-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85466

Richard Biener  changed:

   What|Removed |Added

 CC||jwakely.gcc at gmail dot com
  Component|c++ |libstdc++

--- Comment #5 from Richard Biener  ---
I see sth like

  template<typename _RealType, size_t __bits,
           typename _UniformRandomNumberGenerator>
_RealType
generate_canonical(_UniformRandomNumberGenerator& __urng)
{
...
   __ret = std::nextafter(_RealType(1), _RealType(0));

instantiated as

_RealType std::generate_canonical(_UniformRandomNumberGenerator&) [with
_RealType = double; long unsigned int __bits = 53;
_UniformRandomNumberGenerator = std::mersenne_twister_engine<long unsigned
int, 32, 624, 397, 31, 2567483615, 11, 4294967295, 7, 2636928640, 15,
4022730752, 18, 1812433253>] (struct mersenne_twister_engine & __urng)
{
...
  const long double __r;
^^^
(but that's unused it seems)
...
  __ret = nextafter (1.0e+0, 0.0);

and cmath containing

  constexpr float
  nextafter(float __x, float __y)
  { return __builtin_nextafterf(__x, __y); }

  constexpr long double
  nextafter(long double __x, long double __y)
  { return __builtin_nextafterl(__x, __y); }



  template<typename _Tp, typename _Up>
constexpr typename __gnu_cxx::__promote_2<_Tp, _Up>::__type
nextafter(_Tp __x, _Up __y)
{
  typedef typename __gnu_cxx::__promote_2<_Tp, _Up>::__type __type;
  return nextafter(__type(__x), __type(__y));
}

which means std::nextafter will use the long double variant?

[Bug libstdc++/85466] Performance is slow when doing 'branchless' conditional style math operations

2018-04-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85466

--- Comment #6 from Richard Biener  ---
(In reply to Richard Biener from comment #5)
> I see sth like
> 
>   template<typename _RealType, size_t __bits,
>            typename _UniformRandomNumberGenerator>
> _RealType
> generate_canonical(_UniformRandomNumberGenerator& __urng)
> {
> ...
>__ret = std::nextafter(_RealType(1), _RealType(0));
> 
> instantiated as
> 
> _RealType std::generate_canonical(_UniformRandomNumberGenerator&) [with
> _RealType = double; long unsigned int __bits = 53;
> _UniformRandomNumberGenerator = std::mersenne_twister_engine<long unsigned
> int, 32, 624, 397, 31, 2567483615, 11, 4294967295, 7, 2636928640, 15,
> 4022730752, 18, 1812433253>] (struct mersenne_twister_engine & __urng)
> {
> ...
>   const long double __r;
> ^^^
> (but that's unused it seems)
> ...
>   __ret = nextafter (1.0e+0, 0.0);
> 
> and cmath containing
> 
>   constexpr float
>   nextafter(float __x, float __y)
>   { return __builtin_nextafterf(__x, __y); }
> 
>   constexpr long double
>   nextafter(long double __x, long double __y)
>   { return __builtin_nextafterl(__x, __y); }
> 
> 
> 
>   template<typename _Tp, typename _Up>
> constexpr typename __gnu_cxx::__promote_2<_Tp, _Up>::__type
> nextafter(_Tp __x, _Up __y)
> {
>   typedef typename __gnu_cxx::__promote_2<_Tp, _Up>::__type __type;
>   return nextafter(__type(__x), __type(__y));
> }
> 
> which means std::nextafter will use the long double variant?

Ah, and in the same function:

_3 = (long double) __tmp;
_4 = _3 * 4.294967296e+9;
__tmp = (double) _4;

likely from

  const long double __r = static_cast<long double>(__urng.max())
                        - static_cast<long double>(__urng.min()) + 1.0L;

the loop to be vectorized is

   [1.36%]:
  # n_315 = PHI <0(31), n_290(35)>
  # ivtmp_313 = PHI <1000(31), ivtmp_289(35)>
  _312 = (long unsigned int) n_315;
  _311 = _312 * 4;
  _310 = _82 + _311;
  _309 = *_310;
  _308 = _309 - 5.0e-1;
  _307 = _308 > 0.0;
  _306 = (int) _307;
  _305 = _308 < 0.0;
  _304 = (int) _305;
  _303 = _306 - _304;
  _302 = (float) _303;
  if (_302 < 0.0)
goto ; [36.00%]
  else
goto ; [64.00%]

   [0.87%]:

   [1.37%]:
  # _301 = PHI <&D.56316(32), &D.56315(33)>
  D.56315 ={v} {CLOBBER};
  D.56316 ={v} {CLOBBER};
  _298 = *_301;
  _297 = _298 * 6.9988079071044921875e-1;
  _296 = _297 + 1.0001490116119384765625e-1;
  _295 = pretmp_336 + _311;
  *_295 = _296;
  n_290 = n_315 + 1;
  ivtmp_289 = ivtmp_313 - 1;
  if (ivtmp_289 == 0)
goto ; [1.01%]
  else
goto ; [98.99%]

   [1.36%]:
  goto ; [100.00%]

which failed to if-convert, quite possibly because of the strange clobbers.
Which means the testcase is likely undefined as well.

[Bug libstdc++/85466] Performance is slow when doing 'branchless' conditional style math operations

2018-04-19 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85466

--- Comment #7 from Jonathan Wakely  ---
(In reply to Richard Biener from comment #5)
> I see sth like
> 
>   template<typename _RealType, size_t __bits,
>            typename _UniformRandomNumberGenerator>
> _RealType
> generate_canonical(_UniformRandomNumberGenerator& __urng)
> {
> ...
>__ret = std::nextafter(_RealType(1), _RealType(0));
> 
> instantiated as
> 
> _RealType std::generate_canonical(_UniformRandomNumberGenerator&) [with
> _RealType = double; long unsigned int __bits = 53;
> _UniformRandomNumberGenerator = std::mersenne_twister_engine<long unsigned
> int, 32, 624, 397, 31, 2567483615, 11, 4294967295, 7, 2636928640, 15,
> 4022730752, 18, 1812433253>] (struct mersenne_twister_engine & __urng)
> {
> ...
>   const long double __r;
> ^^^
> (but that's unused it seems)

It's used to calculate __log2r which is the maximum range of the random bit
generator __urng.

> ...
>   __ret = nextafter (1.0e+0, 0.0);
> 
> and cmath containing
> 
>   constexpr float
>   nextafter(float __x, float __y)
>   { return __builtin_nextafterf(__x, __y); }
> 
>   constexpr long double
>   nextafter(long double __x, long double __y)
>   { return __builtin_nextafterl(__x, __y); }
> 
> 
> 
>   template<typename _Tp, typename _Up>
> constexpr typename __gnu_cxx::__promote_2<_Tp, _Up>::__type
> nextafter(_Tp __x, _Up __y)
> {
>   typedef typename __gnu_cxx::__promote_2<_Tp, _Up>::__type __type;
>   return nextafter(__type(__x), __type(__y));
> }

And also:

  using ::nextafter;

> which means std::nextafter will use the long double variant?

No, in the instantiation of generate_canonical it uses the
nextafter(double, double) from libc that's added to namespace std by "using
::nextafter".
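
The overload selection described above can be checked at compile time. The
following is only a sketch relying on the standard guarantees for the
std::nextafter overload set in <cmath>; the function names largest_below_one
and demo are hypothetical and do not come from libstdc++.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// Two double arguments select the double overload, not the long double one.
static_assert(
    std::is_same<decltype(std::nextafter(1.0, 0.0)), double>::value,
    "nextafter(double, double) yields double");

// float and long double each have their own exact-match overload.
static_assert(
    std::is_same<decltype(std::nextafter(1.0f, 0.0f)), float>::value,
    "nextafter(float, float) yields float");
static_assert(
    std::is_same<decltype(std::nextafter(1.0L, 0.0L)), long double>::value,
    "nextafter(long double, long double) yields long double");

// Mixed arguments are promoted: float and double give double.
static_assert(
    std::is_same<decltype(std::nextafter(1.0f, 0.0)), double>::value,
    "nextafter(float, double) is promoted to double");

// The same call generate_canonical makes when _RealType is double.
inline double largest_below_one() {
  return std::nextafter(1.0, 0.0);
}

inline void demo() {
  // The result is strictly below 1.0 but still very close to it.
  assert(largest_below_one() < 1.0);
  assert(largest_below_one() > 0.999999);
}
```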

[Bug libstdc++/85466] Performance is slow when doing 'branchless' conditional style math operations

2018-04-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85466

--- Comment #8 from Richard Biener  ---
(In reply to James Greenhalgh from comment #3)
> Created attachment 43988 [details]
> Reduced testcase
> 
> I believe this testcase shows the issue being reported here. Clang seems to
> spot this is essentially a memset across the array, while GCC doesn't.
> 
> On AArch64 with Clang:
> 
>   .LBB1_9:                // =>This Inner Loop Header: Depth=1
>         stp     q0, q0, [x8, #-16]
>         subs    x20, x20, #8            // =8
>         add     x8, x8, #32             // =32
>         b.ne    .LBB1_9
> 
> On x86-64 with Clang:
> 
>   .LBB1_9:                # =>This Inner Loop Header: Depth=1
>         movups  %xmm0, -144(%rax,%rcx,4)
>         movups  %xmm0, -128(%rax,%rcx,4)
>         movups  %xmm0, -112(%rax,%rcx,4)
>         movups  %xmm0, -96(%rax,%rcx,4)
>         movups  %xmm0, -80(%rax,%rcx,4)
>         movups  %xmm0, -64(%rax,%rcx,4)
>         movups  %xmm0, -48(%rax,%rcx,4)
>         movups  %xmm0, -32(%rax,%rcx,4)
>         movups  %xmm0, -16(%rax,%rcx,4)
>         movups  %xmm0, (%rax,%rcx,4)
>         addq    $40, %rcx
>         cmpq    $100036, %rcx   # imm = 0x186C4
>         jne     .LBB1_9
> 
> GCC doesn't spot this.

I bet clang is clever on the undefinedness I pointed out.

> On the other hand G++'s inlining of the various random number initialisation
> routines really hammers Clang, which ends up emulating 128-bit arithmetic on
> AArch64.

[Bug libstdc++/85466] Performance is slow when doing 'branchless' conditional style math operations

2018-04-19 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85466

--- Comment #9 from Jonathan Wakely  ---
This is undefined:

template <typename T>
inline auto when_greater_then(T x, T y) -> decltype(std::max(sign(x - y),
T(0)))  {
  return std::max(sign(x - y), T(0));
}


The return type of std::max is a reference to one of its arguments, so the
decltype expression is a reference, and you end up returning a reference to a
local temporary.

This would fix that problem:

template <typename T>
inline auto when_greater_then(T x, T y)
-> std::remove_reference_t<decltype(std::max(sign(x - y), T(0)))>  {
  return std::max(sign(x - y), T(0));
}

But it's dumb anyway, because the return type of std::max is const T& so you
don't need to use decltype and remove_reference to know the type.

template <typename T>
inline T when_greater_then(T x, T y)
{
  return std::max(sign(x - y), T(0));
}

Much simpler, no dangling references.
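
The reference-ness of the original return type can be confirmed with a couple
of static_asserts. This is a sketch only: the sign helper is an assumed
definition (the reporter's version is not shown in this thread), and
when_greater_by_value and demo are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <type_traits>

// Assumed helper: -1, 0 or +1 according to the sign of x.
template <typename T>
inline T sign(T x) {
  return T((x > T(0)) - (x < T(0)));
}

// decltype applied to a std::max call yields a const lvalue reference,
// so using it directly as a return type returns a reference.
static_assert(
    std::is_same<decltype(std::max(1.0f, 0.0f)), const float&>::value,
    "std::max returns const T&");

// Returning T by value copies the result before the temporaries die.
template <typename T>
inline T when_greater_by_value(T x, T y) {
  return std::max(sign(x - y), T(0));
}

static_assert(
    std::is_same<decltype(when_greater_by_value(1.0f, 0.0f)), float>::value,
    "the by-value version returns a plain float");

inline void demo() {
  assert(when_greater_by_value(0.9f, 0.5f) == 1.0f);  // x > y
  assert(when_greater_by_value(0.2f, 0.5f) == 0.0f);  // x < y
  assert(when_greater_by_value(0.5f, 0.5f) == 0.0f);  // x == y
}
```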

[Bug libstdc++/85466] Performance is slow when doing 'branchless' conditional style math operations

2018-04-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85466

--- Comment #10 from Richard Biener  ---
And you need -fno-trapping-math to allow if-conversion.  Then things are fast.

[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

2018-04-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
Bug 26163 depends on bug 84737, which changed state.

Bug 84737 Summary: [8 Regression] 20% degradation in CPU2000 172.mgrid starting 
with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888

2018-04-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #18 from Richard Biener  ---
Fixed (hopefully).

[Bug ipa/85447] [8 Regression] ICE in create_specialized_node, at ipa-cp.c:3870 since r259319

2018-04-19 Thread jamborm at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85447

Martin Jambor  changed:

   What|Removed |Added

  Attachment #43981|0   |1
is obsolete||

--- Comment #8 from Martin Jambor  ---
Created attachment 43989
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43989&action=edit
Fix undergoing testing

This version of the fix replaces a bool parameter with a hash_set, which
actually corresponds to what I intended to do in the first place.  It also
contains a testcase that was miscompiled with the previous version.

[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888

2018-04-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

--- Comment #19 from Richard Biener  ---
Author: rguenth
Date: Thu Apr 19 12:41:42 2018
New Revision: 259493

URL: https://gcc.gnu.org/viewcvs?rev=259493&root=gcc&view=rev
Log:
2018-04-19  Richard Biener  

PR tree-optimization/84737
* tree-vect-data-refs.c (vect_copy_ref_info): New function
copying restrict info.
(vect_setup_realignment): Use it.
* tree-vectorizer.h (vect_copy_ref_info): Declare.
* tree-vect-stmts.c (vectorizable_store): Copy ref info from
the first DR to all generated stores.
(vectorizable_load): Likewise for loads.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-vect-data-refs.c
trunk/gcc/tree-vect-stmts.c
trunk/gcc/tree-vectorizer.h

[Bug c++/84588] [8 Regression] internal compiler error: Segmentation fault (contains_struct_check())

2018-04-19 Thread paolo.carlini at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84588

Paolo Carlini  changed:

   What|Removed |Added

 CC||paolo.carlini at oracle dot com

--- Comment #3 from Paolo Carlini  ---
I believe that for the regression we want something like (it tests fine):

Index: parser.c
===
--- parser.c(revision 259489)
+++ parser.c(working copy)
@@ -21358,6 +21358,8 @@ cp_parser_parameter_declaration_list (cp_parser* p
{
  *is_error = true;
  parameters = error_mark_node;
+ if (parser->fully_implicit_function_template_p)
+   abort_fully_implicit_template (parser);
  break;
}

Then the diagnostic is not perfect (a spurious final warning) but is exactly
the same we issue for, say:

struct a {
  void b() {}
  void c(auto = [] {
if (a a(int int){})
  ;
  }) {}
};

[Bug ipa/85449] [8 Regression] Wrong specialization is called in self recursive functions after r259319

2018-04-19 Thread jamborm at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85449

--- Comment #10 from Martin Jambor  ---
Created attachment 43990
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43990&action=edit
Simple testcase

This is a simple testcase.  Let me prepare the final patch then.

[Bug libstdc++/85466] Performance is slow when doing 'branchless' conditional style math operations

2018-04-19 Thread jgreenhalgh at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85466

--- Comment #11 from James Greenhalgh  ---
With Jonathan's suggested change, copied in to the original poster's framework
(without -fno-trapping-math), Clang hot loop ( score: 165065
http://quick-bench.com/6NaD8ay0f8qMh9n0aMriYEiuKNA ) is:

0.16%  movups 0x61a80(%r15,%rax,4),%xmm6
1.15%  movups 0x61a90(%r15,%rax,4),%xmm7
0.60%  movaps %xmm1,%xmm3
5.44%  cmpltps %xmm6,%xmm3
0.44%  movaps %xmm1,%xmm6
0.40%  cmpltps %xmm7,%xmm6
0.44%  movaps %xmm5,%xmm7
4.97%  andps  %xmm3,%xmm7
0.20%  andnps %xmm4,%xmm3
0.36%  orps   %xmm7,%xmm3
1.04%  movaps %xmm5,%xmm7
4.97%  andps  %xmm6,%xmm7
0.11%  andnps %xmm4,%xmm6
4.95%  orps   %xmm7,%xmm6
5.53%  movups %xmm3,0x61a80(%rbx,%rax,4)
0.47%  movups %xmm6,0x61a90(%rbx,%rax,4)
4.42%  movups 0x61aa0(%r15,%rax,4),%xmm3
20.42% movups 0x61ab0(%r15,%rax,4),%xmm6
1.00%  movaps %xmm1,%xmm7
0.49%  cmpltps %xmm3,%xmm7
9.79%  movaps %xmm1,%xmm3
0.16%  cmpltps %xmm6,%xmm3
2.26%  movaps %xmm5,%xmm6
0.60%  andps  %xmm7,%xmm6
4.20%  andnps %xmm4,%xmm7
1.18%  orps   %xmm6,%xmm7
2.22%  movaps %xmm5,%xmm6
0.47%  andps  %xmm3,%xmm6
4.24%  andnps %xmm4,%xmm3
4.88%  movups %xmm7,0x61aa0(%rbx,%rax,4)
0.27%  orps   %xmm6,%xmm3
5.22%  movups %xmm3,0x61ab0(%rbx,%rax,4)
6.02%  add    $0x10,%rax
       jne    405b30

GCC hot loop ( score: 2385754
http://quick-bench.com/ehLe-aqkpXkkx2sHLd6TWq_p4g4 ) is:

0.56%  movss  0x0(%rbp,%rdx,1),%xmm0
1.47%  xor    %eax,%eax
2.00%  subss  %xmm2,%xmm0
7.02%  ucomiss %xmm1,%xmm0
6.77%  seta   %al
4.96%  xor    %ecx,%ecx
0.25%  ucomiss %xmm0,%xmm1
0.84%  pxor   %xmm0,%xmm0
0.09%  seta   %cl
5.40%  sub    %ecx,%eax
3.22%  cvtsi2ss %eax,%xmm0
9.87%  ucomiss %xmm0,%xmm1
6.53%  ja     4053a8
10.24% mulss  %xmm4,%xmm0
11.55% addss  %xmm3,%xmm0
5.46%  movss  %xmm0,(%rbx,%rdx,1)
2.00%  add    $0x4,%rdx
       cmp    $0x61a80,%rdx
       jne    405350

Daniel Elliott, does that better match your expectations? If so, I think this
can be resolved as a missed optimization of invalid code.

[Bug tree-optimization/85467] [8 Regression] ICE: verify_gimple failed: non-trivial conversion at assignment with -O2 -fno-tree-ccp --param=sccvn-max-scc-size=10

2018-04-19 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85467

Jakub Jelinek  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek  ---
Created attachment 43991
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43991&action=edit
gcc8-pr85467.patch

fold-const.c this time.  Untested patch.

[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888

2018-04-19 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

--- Comment #20 from Pat Haugen  ---
(In reply to Richard Biener from comment #18)
> Fixed (hopefully).

Yes, mgrid performance is back. Thanks.

[Bug rtl-optimization/85455] [8 Regression] ICE in verify_loop_structure, at cfgloop.c:1708 (error: basic block 3 should be marked irreducible)

2018-04-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85455

Richard Biener  changed:

   What|Removed |Added

   Keywords||ice-checking

--- Comment #2 from Richard Biener  ---
patch posted

[Bug c++/85468] New: Wrong location for -Wignored-qualifiers diagnostic on conversion operator

2018-04-19 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85468

Bug ID: 85468
   Summary: Wrong location for -Wignored-qualifiers diagnostic on
conversion operator
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Keywords: diagnostic
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redi at gcc dot gnu.org
  Target Milestone: ---

struct Test {
   operator int const() const; // { dg-error "type qualifiers ignored" }
};

ign.cc:2:25: warning: type qualifiers ignored on function return type
[-Wignored-qualifiers]
operator int const() const; // { dg-error "type qualifiers ignored" }
 ^

The location should point to the first const.

This is the subject of PR 69733 but that fix didn't cover conversion operators.

[Bug c++/85464] [7 Regression] Wignored-qualifiers is emitted by cc1plus without diagnostics when triggered by a cast operator.

2018-04-19 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85464

Jonathan Wakely  changed:

   What|Removed |Added

   Keywords||patch

--- Comment #5 from Jonathan Wakely  ---
Patch: https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00951.html

I've created PR 85468 for the wrong location, so this one is just about the
regression in gcc-7-branch.

[Bug target/81084] [8 Regression] powerpcspe port full of confusing configury / command-line options not related to SPE

2018-04-19 Thread ebotcazou at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81084

Eric Botcazou  changed:

   What|Removed |Added

 CC||ebotcazou at gcc dot gnu.org

--- Comment #35 from Eric Botcazou  ---
> A port does not need maintenance only for that port, and its users, but also
> for GCC itself.  All ports are a cost to _all_ GCC developers.  If a port is
> not maintained it has to be removed.

Do you IBM guys have a hidden agenda to bury the left-overs of Freescale? ;-)

The SPE port has already been moved out of the way so I don't really see the
point in further hammering it like that; there are plenty of obsolete ports in
the tree that would have to be removed before this one if the above criterion
was followed to the letter.

[Bug target/81084] [8 Regression] powerpcspe port full of confusing configury / command-line options not related to SPE

2018-04-19 Thread glaubitz at physik dot fu-berlin.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81084

--- Comment #36 from John Paul Adrian Glaubitz  ---
(In reply to Jakub Jelinek from comment #33)
> Yes, but the port split was done in May last year, and nothing substantial
> happened since then.  Port maintenance is not about promises, but about
> doing the work.  If he does the work soon, the port can be de-obsoleted in
> GCC9, otherwise it will be removed, which doesn't mean it can't be added at
> some point later.  Of course, the later it will be done, the harder it will
> be.

He is working on it and people are using it. It's not surprising that it takes
longer than any work done by paid developers on the x86 or POWER targets.

> > > m68k needs some serious work, too, in the not far future (if the cc0
> > > removal finally goes through -- that has been over ten years now).
> > 
> > Yes, I am aware of that. But there are enough people interested in such work
> > so I think we will be able get around doing that at some point.
> 
> Nobody did the work in the last 15+ years for m68k, it doesn't seem likely
> that all of a sudden it will happen.  There have been numerous posts about
> what to do to get rid of cc0, e.g. in 2005 and several other years.
> See https://gcc.gnu.org/backends.html for details, a healthy port doesn't
> have c (cc0), p (not using define_peephole2), has a (uses LRA).  We can't
> maintain old reload, or cc0 support indefinitely.

I did some research myself yesterday and couldn't find a page which documents
how to write a port. I couldn't even find documentation on the cc0 stuff.

> > > A port does not need maintenance only for that port, and its users, but
> > > also for GCC itself.  All ports are a cost to _all_ GCC developers.  If a
> > > port is not maintained it has to be removed.
> > 
> > So, again my question is: What exactly is the issue with the powerpcspe target at
> > the moment and why does upstream claim the port is broken when it apparently
> > works for us in Debian? Am I missing something?
> 
> Have you read all the threads mentioned in
> https://gcc.gnu.org/ml/gcc/2018-04/msg00102.html
> and all the above comments?  All the details are in there.

Could you please just mention the issue in question that causes trouble? The
messages directly linked only mention PR81084 but I haven't run into this issue
myself.

Again, could you please mention an urgent issue with the powerpcspe target that
causes serious issues for other users or developers? Thanks!

[Bug target/81084] [8 Regression] powerpcspe port full of confusing configury / command-line options not related to SPE

2018-04-19 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81084

--- Comment #37 from Jakub Jelinek  ---
Not sure about IBM, I as a GCC developer and RM have a major problem with the
amount of dead code in the port, because anyone who makes changes to the
middle-end that need backend changes will waste time adjusting code that is
dead and can't be really tested (all the Altivec/VMX code, all the 64-bit
support in there, all the power6/7/8/9, all the floating point modes stuff,
etc.).  Furthermore the lack of -mno-lra removal in it.  If somebody is willing
to change all backends rather than waiting on target maintainers to fix stuff
up, at least the work should not be wasted on dead code.  Just look e.g. how
many times in the last year Richard Sandiford modified this dead code in
config/powerpcspe.  That is pretty much all wasted effort (others too).

[Bug target/81084] [8 Regression] powerpcspe port full of confusing configury / command-line options not related to SPE

2018-04-19 Thread glaubitz at physik dot fu-berlin.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81084

--- Comment #38 from John Paul Adrian Glaubitz  ---
(In reply to Eric Botcazou from comment #35)
> Do you IBM guys have a hidden agenda to bury the left-overs of Freescale? ;-)

I thought Jakub works for RedHat?

> The SPE port has already been moved out of the way so I don't really see the
> point in further hammering it like that; there are plenty of obsolete ports
> in the tree that would have to removed before this one if the above
> criterion was followed to the letter.

To be honest: I have already had this experience in multiple projects. IBM
employees have been quite unfriendly towards PPC targets which were not POWER8
or newer. The answer was usually "Anything lower than POWER8 is no longer
supported." talking as if the project in question belongs to IBM. Quite
frustrating.

[Bug rtl-optimization/85455] [8 Regression] ICE in verify_loop_structure, at cfgloop.c:1708 (error: basic block 3 should be marked irreducible)

2018-04-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85455

--- Comment #3 from Richard Biener  ---
Author: rguenth
Date: Thu Apr 19 13:53:06 2018
New Revision: 259494

URL: https://gcc.gnu.org/viewcvs?rev=259494&root=gcc&view=rev
Log:
2018-04-19  Richard Biener  

PR middle-end/85455
* cfg.c (clear_bb_flags): When loop state says we have
marked irreducible regions also preserve BB_IRREDUCIBLE_LOOP.

* gcc.dg/pr85455.c: New testcase.

Added:
trunk/gcc/testsuite/gcc.dg/pr85455.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/cfg.c
trunk/gcc/testsuite/ChangeLog

[Bug target/81084] [8 Regression] powerpcspe port full of confusing configury / command-line options not related to SPE

2018-04-19 Thread glaubitz at physik dot fu-berlin.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81084

--- Comment #39 from John Paul Adrian Glaubitz  ---
(In reply to Jakub Jelinek from comment #37)
> Not sure about IBM, I as a GCC developer and RM have a major problem with the
> amount of dead code in the port, because anyone who makes changes to the
> middle-end that need backend changes will waste time adjusting code that is
> dead and can't be really tested (all the Altivec/VMX code, all the 64-bit
> support in there, all the power6/7/8/9, all the floating point modes stuff,
> etc.).

It's not a waste of time if people are actually still using the code which is
the case for both PowerPCSPE and m68k. I don't understand why some people think
it's acceptable to kill open source stuff that is actively being used by
others.

> Furthermore the lack of -mno-lra removal in it.  If somebody is
> willing to change all backends rather than waiting on target maintainers to
> fix stuff up, at least the work should not be wasted on dead code.  Just
> look e.g. how many times in the last year Richard Sandiford modified this
> dead code in config/powerpcspe.  That is pretty much all wasted effort
> (others too).

I think your problem lies in the fact that you are solely seeing this from the
gcc maintainers perspective (which I cannot blame you for and which is not
surprising). However, you have to keep in mind that - in the end - gcc is a
product that people are using for productive work. So, there has to be a
balance between the objective quality of the code and the usefulness of the
project.

If I understand Eric correctly, the PowerPCSPE backend has been split off from
the other PPC code so as long as the code works and people are using it (and
with someone even working on updating it), why the hurry to remove it?

[Bug debug/81307] [8 regression] g++.dg/debug/debug9.C -gstabs FAILs

2018-04-19 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81307

--- Comment #9 from Martin Liška  ---
Thanks Jakub, I can confirm I don't see any other UBSANs related to gstabs in
the test-suite.

[Bug target/81084] [8 Regression] powerpcspe port full of confusing configury / command-line options not related to SPE

2018-04-19 Thread glaubitz at physik dot fu-berlin.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81084

--- Comment #40 from John Paul Adrian Glaubitz  ---
Is there documentation like this for gcc?

> https://llvm.org/docs/WritingAnLLVMBackend.html

Would be very useful for people wanting to help with the old backends.

[Bug objc/50909] Process "#pragma options align=reset" correctly on Mac OS X

2018-04-19 Thread egallager at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50909

Eric Gallager  changed:

   What|Removed |Added

 CC||iains at gcc dot gnu.org,
   ||mikestump at comcast dot net

--- Comment #11 from Eric Gallager  ---
(In reply to Manuel López-Ibáñez from comment #9)
> (In reply to Alex from comment #8)
> > As to someone implementing clang behavior: I was under the impression that
> > the attached patch just needed to be applied? That is why I was inquiring as
> > to why nobody had applied it after 3 years.
> 
> Writing a patch is often the easy part. The hard part is getting the patch
> approved and following up on reviewer's comments. See
> https://gcc.gnu.org/contribute.html and https://gcc.gnu.org/wiki/GCC_Research
> 
> The patch doesn't look correct to me anyway, but you will get a better
> review from a Darwin maintainer, see the MAINTAINERS file in the gcc sources.

Adding them on cc (Only Mike is currently listed as a Darwin maintainer, but
IMO Iain ought to be listed as one, too)

[Bug target/81084] [8 Regression] powerpcspe port full of confusing configury / command-line options not related to SPE

2018-04-19 Thread dje at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81084

--- Comment #41 from David Edelsohn  ---
SPE mostly is a separate architecture that happens to share many of the basic
mnemonics with PowerPC. Maintaining the SPE port was a burden to the
Power/PowerPC maintainers. As discussed in the other threads, despite years of
promises, the SPE port was not maintained, not even regular testsuite results.
The communication mostly consisted of bug reports raised a year after the patch
or release.

In regard to IBM employee response to PowerPC targets older than Power8, you
said yourself that these are IBM employees. They are directed to work on
specific projects and patches.  Sometimes IBM developers can become too focused
on the Power8 and later issue and include portability in their design, but IBM
also is not a general support center for all PowerPC. IBM is a business.  As
with all commercially-supported Open Source projects and all forms of work in
general, some developers have broader interest in the project and others
approach it as a job.

[Bug target/81084] [8 Regression] powerpcspe port full of confusing configury / command-line options not related to SPE

2018-04-19 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81084

--- Comment #42 from Jakub Jelinek  ---
See e.g. https://gcc.gnu.org/onlinedocs/gccint/
https://gcc.gnu.org/onlinedocs/gccint/#toc-RTL-Representation
https://gcc.gnu.org/onlinedocs/gccint/Machine-Desc.html#Machine-Desc
https://kristerw.blogspot.cz/2017/08/writing-gcc-backend_4.html
and many others.  Don't really know how writing a new backend is relevant to
maintaining an existing one.

[Bug c/65892] gcc fails to implement N685 aliasing of union members

2018-04-19 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65892

--- Comment #30 from Martin Sebor  ---
Richard, I offered to write a proposal (with Clark) to improve the rules.  With
the object model proposals already in the pipeline (N2223) this is a good time
to review them and see if it makes sense to extend or change them to also cover
this case in an acceptable way.  It would be helpful if you could take some
time to summarize your main concerns or suggestions for changes in this area. 
I can start working on the proposal for the fall 2018 WG14 meeting.

[Bug target/81084] [8 Regression] powerpcspe port full of confusing configury / command-line options not related to SPE

2018-04-19 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81084

--- Comment #43 from rguenther at suse dot de  ---
On Thu, 19 Apr 2018, glaubitz at physik dot fu-berlin.de wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81084
> 
> --- Comment #40 from John Paul Adrian Glaubitz  ---
> Is there documentation like this for gcc?
> 
> > https://llvm.org/docs/WritingAnLLVMBackend.html
> 
> Would be very useful for people wanting to help with the old backends.

The wiki has some material, a quick search finds
https://gcc.gnu.org/wiki/WritingANewBackEnd not sure how useful or
up-to-date it is.

[Bug libstdc++/85466] Performance is slow when doing 'branchless' conditional style math operations

2018-04-19 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85466

--- Comment #12 from Marc Glisse  ---
Constant folding for nextafter seems like a useful thing to add, whatever we
say about the rest of the testcase.

[Bug objc/53905] -Wformat-nonliteral gives false positives with __attribute__((format(NSString,...)))

2018-04-19 Thread egallager at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53905

Eric Gallager  changed:

   What|Removed |Added

 CC||iains at gcc dot gnu.org,
   ||mikestump at comcast dot net

--- Comment #3 from Eric Gallager  ---
Adding objc maintainers on cc

[Bug target/81084] [8 Regression] powerpcspe port full of confusing configury / command-line options not related to SPE

2018-04-19 Thread glaubitz at physik dot fu-berlin.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81084

--- Comment #44 from John Paul Adrian Glaubitz  ---
(In reply to David Edelsohn from comment #41)
> SPE mostly is a separate architecture that happens to share many of the
> basic mnemonics with PowerPC. Maintaining the SPE port was a burden to the
> Power/PowerPC maintainers. As discussed in the other threads, despite years
> of promises, the SPE port was not maintained, not even regular testsuite
> results. The communication mostly consisted of bug reports raised a year
> after the patch or release.

Those are apparent mistakes made in the past. I am happy to provide testsuite
results starting next week. As I said, gcc-8 is stable at the moment on real
PowerPCSPE hardware.

> In regard to IBM employee response to PowerPC targets older than Power8, you
> said yourself that these are IBM employees. They are directed to work on
> specific projects and patches.  Sometimes IBM developers can become too
> focused on the Power8 and later issue and include portability in their
> design, but IBM also is not a general support center for all PowerPC. IBM is
> a business.  As with all commercially-supported Open Source projects and all
> forms of work in general, some developers have broader interest in the
> project and others approach it as a job.

I understand how that works. No one is actually asking IBM people to fix
anything if they don't want to. But I have a problem with people removing
working code that people are using and then dismissing it with answers like the
one I received. It would be different if project X belongs to IBM.

One such example was that POWER5 support was forcefully removed by IBM
people from Golang. They claimed that this move was urgently necessary to
reduce maintenance burden. Yet all that was needed to get Golang working again
on POWER5 was to revert three patches and slightly modify them. I am still
rebasing it regularly without any problems.

[Bug libstdc++/85466] Performance is slow when doing 'branchless' conditional style math operations

2018-04-19 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85466

--- Comment #13 from rguenther at suse dot de  ---
On Thu, 19 Apr 2018, glisse at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85466
> 
> --- Comment #12 from Marc Glisse  ---
> Constant folding for nextafter seems like a useful thing to add, whatever we
> say about the rest of the testcase.

Indeed - looks like this and related functions are not constant folded for
whatever reason...

[Bug target/81084] [8 Regression] powerpcspe port full of confusing configury / command-line options not related to SPE

2018-04-19 Thread glaubitz at physik dot fu-berlin.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81084

--- Comment #45 from John Paul Adrian Glaubitz  ---
(In reply to Jakub Jelinek from comment #42)
> See e.g. https://gcc.gnu.org/onlinedocs/gccint/
> https://gcc.gnu.org/onlinedocs/gccint/#toc-RTL-Representation
> https://gcc.gnu.org/onlinedocs/gccint/Machine-Desc.html#Machine-Desc
> https://kristerw.blogspot.cz/2017/08/writing-gcc-backend_4.html
> and many others.

Thank you.

> Don't really know how writing a new backend is relevant to
> maintaining an existing one.

Well, I guess there is no documentation "How to fix an unmaintained target and
bring it back into shape.", is there?

Documentation which explains how to write a backend should also help fixing an
existing one.

[Bug target/81084] [8 Regression] powerpcspe port full of confusing configury / command-line options not related to SPE

2018-04-19 Thread dje at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81084

--- Comment #46 from David Edelsohn  ---
I understand the issues with Golang and have been raising the issue internally.

[Bug rtl-optimization/85455] [8 Regression] ICE in verify_loop_structure, at cfgloop.c:1708 (error: basic block 3 should be marked irreducible)

2018-04-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85455

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Richard Biener  ---
Fixed.

[Bug target/85381] [og7, nvptx, openacc] parallel-loop-1.c fails with default vector length 128

2018-04-19 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85381

--- Comment #6 from Tom de Vries  ---
Created attachment 43992
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43992&action=edit
tentative patch

(In reply to Tom de Vries from comment #4)
> This looks like a JIT bug, but with this tentative patch:
> ...
> diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
> index 8c478c874bd..ac394ee1ae6 100644
> --- a/gcc/config/nvptx/nvptx.c
> +++ b/gcc/config/nvptx/nvptx.c
> @@ -4479,7 +4479,7 @@ nvptx_process_pars (parallel *par)
>   threads = nvptx_mach_vector_length ();
> }
>  
> -  if (!empty || !is_call)
> +  if (!(empty || is_call))
> {
>   /* Insert begin and end synchronizations.  */
>   emit_insn_before (nvptx_cta_sync (barrier, threads),
> ...
> no barriers are generated, and the minimized testcase passes.

Actually, this was a bit more complicated than that.

The condition correctly identifies the situation of a call and no state
propagation as not needing barriers.

But in the case of not a call (so, fork/join), even if there's no state
propagation, we need synchronization at the end of worker and vector loops.

[ More precisely, we need it in between loops, but we're currently not detecting
that situation.

And even more precisely, we need it in between dependent loops, but we're also
currently not detecting that situation. ]

This patch skips the first of the bar.syncs, keeping the one after the loop,
and that allows parallel-loop-1.c to pass with vector length 128.
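
To make the difference between the two conditions in the diff concrete, here is a small standalone sketch. It is not nvptx.c code: the function names are hypothetical, and only the flag names `empty` and `is_call` are taken from the quoted patch (with `empty` read here as "no state propagation", which is an assumption based on the explanation above).

```c
#include <assert.h>
#include <stdbool.h>

/* Condition before the tentative patch: false only when BOTH flags are set,
   i.e. barriers are skipped only for a call with nothing to propagate. */
static bool needs_sync_old(bool empty, bool is_call)
{
    return !empty || !is_call;
}

/* Condition from the tentative patch in comment #4: true only when NEITHER
   flag is set, so barriers are also skipped for the fork/join case. */
static bool needs_sync_new(bool empty, bool is_call)
{
    return !(empty || is_call);
}

/* Count the (empty, is_call) combinations on which the two forms disagree. */
static int count_differences(void)
{
    int n = 0;
    for (int e = 0; e <= 1; e++)
        for (int c = 0; c <= 1; c++)
            if (needs_sync_old(e != 0, c != 0) != needs_sync_new(e != 0, c != 0))
                n++;
    return n;
}

/* Small self-check of the case discussed above: not a call, nothing to
   propagate.  The old form still asks for barriers, the new one does not. */
static void check_fork_join_case(void)
{
    assert(needs_sync_old(true, false));
    assert(!needs_sync_new(true, false));
}
```

The two forms disagree on exactly the cases where one flag is set and the other is not, which includes the fork/join case without state propagation described above.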

[Bug fortran/85448] the compiler selects the wrong subroutine because of bind(c,name=...)

2018-04-19 Thread sgk at troutmask dot apl.washington.edu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85448

--- Comment #4 from Steve Kargl  ---
On Thu, Apr 19, 2018 at 09:07:15AM +, francois.jacq at irsn dot fr wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85448
> 
> --- Comment #3 from francois.jacq at irsn dot fr ---
> Notice that this is a regression : The version 4.8.5 returns the result I
> expected...
> 

Or it was a bug in 4.8.5 and is now fixed.

Now, the version where you rename odopen from m1
to odopen_m1 is probably a bug.

[Bug target/85198] long long int vector mistaken as long int vector

2018-04-19 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85198

--- Comment #4 from Bill Schmidt  ---
Ah, but vulli does have the wrong element type, when you get a little deeper.

 
V2DI
size 
unit-size 
align:128 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x3fffb5bc5dd8 nunits:2>
public static common V2DI defer-output pr85198-2.c:3:22 size
 unit-size 
align:128 warn_if_not_align:0>>

[Bug target/85381] [og7, nvptx, openacc] parallel-loop-1.c fails with default vector length 128

2018-04-19 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85381

--- Comment #7 from Tom de Vries  ---
For this example:
...
#define n 1024

int
main (void)
{
  #pragma acc parallel vector_length(128)
  {
#pragma acc loop vector
for (int i = 0; i < n; i++)
  ;

#pragma acc loop vector
for (int i = 0; i < n; i++)
  ;
  }

  return 0;
}
...

we currently generate:
...
.entry main$_omp_fn$0
{
.reg.u64 %r24;
.reg.u64 %r25;
.reg.u64 %r26;
.reg.u64 %r27;
.reg.pred %r28;
{
.reg.u32%x;
mov.u32 %x, %tid.x;
setp.ne.u32 %r28, %x, 0;
}
bar.sync0;
@%r28   bra $L2;
// join 4;
// fork 4;
$L2:
bar.sync0;
ret;
}
...

so if we fix the branch around nothing problem here, we'll get back-to-back
bar.syncs again, and may run into the JIT bug again.

We may wanna insert dummy ops in between (it would be nice if something less
heavy than a membar.cta will work).

[Bug fortran/85448] the compiler selects the wrong subroutine because of bind(c,name=...)

2018-04-19 Thread francois.jacq at irsn dot fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85448

--- Comment #5 from francois.jacq at irsn dot fr ---
On Thursday 19 April 2018 16:30:09 you wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85448
> 
> --- Comment #4 from Steve Kargl ---
> 
> On Thu, Apr 19, 2018 at 09:07:15AM +, francois.jacq at irsn dot fr wrote:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85448
> > 
> > --- Comment #3 from francois.jacq at irsn dot fr ---
> > Notice that this is a regression : The version 4.8.5 returns the result I
> > expected...
> 
> Or it was a bug in 4.8.5 and is now fixed.
> 
> Now, the version where you rename odopen from m1
> to odopen_m1 is probably a bug.

No : both are bugs (and probably identical). The bind clause cannot change how 
the procedure name "odopen" has to be considered from the Fortran side !

Intel, Oracle and Nag compilers agree with me.

[Bug fortran/85448] the compiler selects the wrong subroutine because of bind(c,name=...)

2018-04-19 Thread kargl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85448

kargl at gcc dot gnu.org changed:

   What|Removed |Added

   Priority|P3  |P4
 Status|RESOLVED|REOPENED
   Last reconfirmed||2018-04-19
 CC|kargl at gcc dot gnu.org   |
 Resolution|INVALID |---
 Ever confirmed|0   |1

--- Comment #6 from kargl at gcc dot gnu.org ---
Re-open.
Remove my email address.
Let someone else deal with the problem.

[Bug c/65892] gcc fails to implement N685 aliasing of union members

2018-04-19 Thread jameskuyper at verizon dot net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65892

--- Comment #31 from James Kuyper Jr.  ---
(In reply to rguent...@suse.de from comment #29)
> On Thu, 19 Apr 2018, jameskuyper at verizon dot net wrote:
...
> > The relevant wording is "anywhere that a declaration of the completed type
> > of the union is visible.", so it's unambiguously the type, not the object,
> > which must be visible. A declaration of the completed type can be visible
> > without any objects of that type being visible, and that's sufficient for
> > this rule to apply.
> > 
> > > ...  Must the access be performed
> > > using the union object, or just the union type, or neither?
> > 
> > It says only that "it is permitted to inspect the common initial part"; it
> > imposes no restrictions on how the inspection may be performed. Clearly,
> > inspection through an lvalue of the union's type must be permitted, but it
> > is also permitted to use the more indirect methods which are the subject of
> > this bug report, simply because the standard says nothing to restrict the
> > permission it grants to the more direct cases.
> 
> Note I repeatedly said this part of the standard is just stupid.

As a judgement call, I reserve the right to disagree with you on that point,
particularly if that judgement was based primarily on the following
misconception:

> ...  It makes
> most if not all type-based alias analysis useless.

How could that be true? It only applies to pairs of struct types that are the
types of members of the same union, it only applies within the scope of a
completed definition of that union's type, and it doesn't apply if the
implementation can prove to itself that the two objects in question are not
actually members of the same union object. It seems to me that the need to take
this rule into consideration would come up pretty infrequently.

Code which relies upon this feature to implement a C-style approximation to
inheritance has been fairly common, which is precisely why the C committee
decided to create this rule, to make sure such code had well-defined behavior.
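
As a rough illustration of the kind of code meant here, the following sketch shows two struct types that share a common initial sequence and are members of the same union. All type and function names are hypothetical. It only uses the uncontroversial form, inspection through the union object itself, not the more indirect accesses that this bug report is about.

```c
#include <assert.h>

enum { KIND_CIRCLE = 1, KIND_RECT = 2 };

/* Two struct types sharing a common initial sequence: the `kind` member. */
struct circle { int kind; double radius; };
struct rect   { int kind; double width, height; };

/* The completed union type is visible from this point on. */
union shape {
    struct circle c;
    struct rect   r;
};

/* Inspect the common initial part through the union object, using the
   `circle` member regardless of which member was last stored. */
static int shape_kind(const union shape *s)
{
    return s->c.kind;
}

/* Store a rect into the union, then read its kind via the circle member. */
static int rect_kind_seen_as_circle(void)
{
    union shape s;
    s.r.kind = KIND_RECT;
    s.r.width = 2.0;
    s.r.height = 3.0;
    return shape_kind(&s);
}

/* Self-check: the tag written through `r` is read back through `c`. */
static void check_common_initial_part(void)
{
    assert(rect_kind_seen_as_circle() == KIND_RECT);
}
```

A "base class" tag placed first in every struct, as `kind` is here, is the usual shape of the C-style inheritance approximation mentioned above.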

> Which means I'll refuse any patches implementing it in a way that affects
> default behavior.  A clean patch (I really can't think of any clean 
> approach besides forcing -fno-strict-aliasing!) with some extra flag
> (well, just use -fno-strict-aliasing ...) would be fine with me.

I can understand not making this the default behavior if you feel that way;
I only use gcc in fully standard-conforming mode, anyway, so that doesn't
matter to me. However, personally, I would prefer it if gcc's fully-conforming
mode took full advantage of all the optimization opportunities legitimately
enabled by 6.5p7 (which does not include opportunities revoked by 6.5.2.3p6).

[Bug target/81084] [8 Regression] powerpcspe port full of confusing configury / command-line options not related to SPE

2018-04-19 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81084

--- Comment #47 from Segher Boessenkool  ---
(In reply to Eric Botcazou from comment #35)
> > A port does not need maintenance only for that port, and its users, but also
> > for GCC itself.  All ports are a cost to _all_ GCC developers.  If a port is
> > not maintained it has to be removed.
> 
> Do you IBM guys have a hidden agenda to bury the left-overs of Freescale? ;-)
> 
> The SPE port has already been moved out of the way so I don't really see the
> point in further hammering it like that; there are plenty of obsolete ports
> in the tree that would have to be removed before this one if the above
> criterion was followed to the letter.

This is not about IBM.

Believe it or not, but the rs6000 port maintainers *care* about older systems.

I wanted to obsolete SPE support because it is a big burden, not in small part
because no one maintains it.  This has been going on for years and years.

Big pushback; people still want SPE, they just don't want to spend work on it.
Well neither do we, it's been enough.  So I spent a week splitting off the port
(also tested removing VSX etc.; removing unused code does not take that long;
I just have no way to *test* it so that was not included).  It was agreed the
powerpcspe port would be maintained or it would be removed.

Now a year later GCC 8 is on the horizon, and the powerpcspe port is still not
maintained.  And the RMs decided to give it *another* year: it is not removed
but merely obsoleted.

[Bug target/85417] -fcf-protection should provide CET protection on x86

2018-04-19 Thread hjl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85417

--- Comment #6 from hjl at gcc dot gnu.org  ---
Author: hjl
Date: Thu Apr 19 15:15:04 2018
New Revision: 259496

URL: https://gcc.gnu.org/viewcvs?rev=259496&root=gcc&view=rev
Log:
x86: Enable -fcf-protection with multi-byte NOPs

-fcf-protection -mcet can't be used with IFUNC features, like symbol
multiversioning or target clone, since IBT/SHSTK are applied to the whole
program and they may be disabled in some functions.  But -fcf-protection
is implemented with multi-byte NOPs on all 64-bit processors as well as
32-bit processors starting with Pentium Pro.  If -fcf-protection requires
-mcet, IFUNC features can't be used on Linux when -fcf-protection is
enabled by default.

This patch changes -fcf-protection to implement indirect branch and
return address tracking with multi-byte NOPs.  -mibt and -mshstk are
changed to only enable CET built-in functions.  CET tests are updated
to allow -fcf-protection without -mibt, -mshstk and -mcet on x86.
-fcf-protection=none is also added to tests which fail with
-fcf-protection so that -fcf-protection can be added to RUNTESTFLAGS
to verify -fcf-protection implementation.

gcc/

PR target/85417
* config/i386/cet.c (file_end_indicate_exec_stack_and_cet):
Check flag_cf_protection instead of TARGET_IBT and TARGET_SHSTK.
* config/i386/i386-c.c (ix86_target_macros_internal): Also
define __IBT__ and __SHSTK__ for -fcf-protection.
* config/i386/i386.c (pass_insert_endbranch::gate): Don't check
TARGET_IBT.
(ix86_trampoline_init): Likewise.
(x86_output_mi_thunk): Likewise.
(ix86_notrack_prefixed_insn_p): Likewise.
(ix86_option_override_internal): Don't disallow -fcf-protection.
* config/i386/i386.md (rdssp): Also enable for
-fcf-protection.
(incssp): Likewise.
(nop_endbr): Likewise.
* config/i386/i386.opt (mcet): Change help message to built-in
functions only.
(mibt): Likewise.
(mshstk): Likewise.
* doc/invoke.texi: Remove -mcet, -mibt and -mshstk condition
on -fcf-protection.  Change -mcet, -mibt and -mshstk to only
enable CET built-in functions.

gcc/testsuite/

PR target/85417
* c-c++-common/attr-nocf-check-1.c: Compile with
-fcf-protection=none.
* c-c++-common/attr-nocf-check-3.c: Likewise.
* gcc.dg/march-generic.c: Likewise.
* gcc.target/i386/align-limit.c: Likewise.
* gcc.target/i386/cet-notrack-icf-1.c: Likewise.
* gcc.target/i386/cet-notrack-icf-3.c: Likewise.
* gcc.target/i386/cet-property-2.c: Likewise.
* gcc.target/i386/ret-thunk-26.c: Likewise.
* c-c++-common/fcf-protection-1.c: Remove dg-error for x86
targets.
* c-c++-common/fcf-protection-2.c: Likewise.
* c-c++-common/fcf-protection-3.c: Likewise.
* c-c++-common/fcf-protection-5.c: Likewise.
* c-c++-common/fcf-protection-6.c: Likewise.
* c-c++-common/fcf-protection-7.c: Likewise.
* gcc.target/i386/cet-label-3.c: New test.
* gcc.target/i386/cet-property-3.c: Likewise.
* gcc.target/i386/cet-sjlj-7.c: Likewise.
* gcc.target/i386/pr85417-1.c: Likewise.
* gcc.target/i386/indirect-thunk-attr-7.c: Also expect
__x86_indirect_thunk_nt_(r|e)ax
* gcc.target/i386/indirect-thunk-extern-7.c: Likewise.
* gcc.target/i386/pr85403.c: Remove dg-error.

Added:
trunk/gcc/testsuite/gcc.target/i386/cet-label-3.c
trunk/gcc/testsuite/gcc.target/i386/cet-property-3.c
trunk/gcc/testsuite/gcc.target/i386/cet-sjlj-7.c
trunk/gcc/testsuite/gcc.target/i386/pr85417-1.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/cet.c
trunk/gcc/config/i386/i386-c.c
trunk/gcc/config/i386/i386.c
trunk/gcc/config/i386/i386.md
trunk/gcc/config/i386/i386.opt
trunk/gcc/doc/invoke.texi
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/c-c++-common/attr-nocf-check-1.c
trunk/gcc/testsuite/c-c++-common/attr-nocf-check-3.c
trunk/gcc/testsuite/c-c++-common/fcf-protection-1.c
trunk/gcc/testsuite/c-c++-common/fcf-protection-2.c
trunk/gcc/testsuite/c-c++-common/fcf-protection-3.c
trunk/gcc/testsuite/c-c++-common/fcf-protection-5.c
trunk/gcc/testsuite/c-c++-common/fcf-protection-6.c
trunk/gcc/testsuite/c-c++-common/fcf-protection-7.c
trunk/gcc/testsuite/gcc.dg/march-generic.c
trunk/gcc/testsuite/gcc.target/i386/align-limit.c
trunk/gcc/testsuite/gcc.target/i386/cet-notrack-icf-1.c
trunk/gcc/testsuite/gcc.target/i386/cet-notrack-icf-3.c
trunk/gcc/testsuite/gcc.target/i386/cet-property-2.c
trunk/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-7.c
trunk/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-7.c
trunk/gcc/testsuite/gcc.target/i386/pr85403.c
trunk/gcc/testsuite/gcc.target/i386/ret-thunk-26.c

[Bug target/85469] New: -mibt is unused

2018-04-19 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85469

Bug ID: 85469
   Summary: -mibt is unused
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hjl.tools at gmail dot com
CC: igor.v.tsimbalist at intel dot com
Blocks: 81652
  Target Milestone: ---
Target: x86_64-*-*, i?86-*-*

With r259496, now -mibt does nothing and should be removed.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81652
[Bug 81652] [meta-bug] -fcf-protection=full -mcet bugs
