[Bug tree-optimization/88970] ICE: verify_ssa failed (error: definition in block 2 follows the use)

2019-01-22 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88970

Martin Liška  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-01-22
 CC||marxin at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Martin Liška  ---
Confirmed, it's very old (at least 4.9.0).

[Bug c++/88967] [9 regression] openmp default(none) broken

2019-01-22 Thread lebedev.ri at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88967

--- Comment #4 from Roman Lebedev  ---
(In reply to Roman Lebedev from comment #3)
> While there, any advice on how that is supposed to be rewritten?
> Simply adding "shared(begin, len)" makes older gcc's unhappy
> https://godbolt.org/z/gyZBR-
> Only keeping "shared(begin, len)" (and dropping "default(none)") does not
> work either.

Right, "firstprivate(begin, len)" works while being backward compatible,
sorry for panicking too early.
https://godbolt.org/z/tEYKIq
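For reference, here is a minimal C sketch of the clause combination that works across releases. The function `fill_range` is hypothetical and is not the code from the godbolt links. GCC 9 stopped treating const-qualified variables as predetermined shared, so under default(none) they now have to be listed explicitly. Older releases reject them in shared(...) but accept them in firstprivate(...), which is why firstprivate satisfies both. The sketch compiles with or without -fopenmp.

```c
#include <assert.h>

/* Hypothetical kernel: begin and len are const-qualified. Since GCC 9 they are
   no longer predetermined shared, so default(none) requires listing them;
   firstprivate(...) is also accepted by pre-9 releases. */
void fill_range(int *data, const int begin, const int len) {
#pragma omp parallel for default(none) firstprivate(begin, len) shared(data)
  for (int i = begin; i < begin + len; i++)
    data[i] = 2 * i;
}

/* Usage check: only data[begin .. begin + len - 1] is written. */
static void check_fill_range(void) {
  int buf[8] = {0};
  fill_range(buf, 2, 4);
  assert(buf[1] == 0 && buf[2] == 4 && buf[5] == 10 && buf[6] == 0);
}
```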

[Bug c++/88969] ICE in build_op_delete_call, at cp/call.c:6509

2019-01-22 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88969

Martin Liška  changed:

   What|Removed |Added

 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2019-01-22
 CC||marxin at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Martin Liška  ---
Can't reproduce with current trunk, I only see:

g++ pr88969.cpp -c -std=c++2a -fchecking=1
pr88969.cpp:13:21: error: expected initializer at end of input
   13 |   void delete_B(B *b)
      |                     ^
pr88969.cpp:13:21: error: expected ‘}’ at end of input
pr88969.cpp:8:28: note: to match this ‘{’
    8 | namespace delete_selection {
      |                            ^

[Bug c/88968] [8/9 Regression] Stack overflow in gimplify_expr

2019-01-22 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88968

Martin Liška  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-01-22
 CC||jakub at gcc dot gnu.org,
   ||marxin at gcc dot gnu.org
   Target Milestone|--- |8.3
 Ever confirmed|0   |1
  Known to fail||8.2.0, 9.0

--- Comment #1 from Martin Liška  ---
Confirmed, it's rejected with GCC 7.4.0:
pr88968.c: In function ‘yp’:
pr88968.c:9:9: error: cannot take address of bit-field ‘hq’
 #pragma omp atomic
 ^~~

so started with r250929.

[Bug preprocessor/88966] Indirect stringification of "linux" produces "1"

2019-01-22 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88966

Martin Liška  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-01-22
 CC||marxin at gcc dot gnu.org
  Component|c   |preprocessor
 Ever confirmed|0   |1

--- Comment #3 from Martin Liška  ---
Confirmed, happens with all releases I have (4.8.0+).

[Bug target/88965] powerpc64le vector builtin hits ICE in verify_gimple

2019-01-22 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88965

Martin Liška  changed:

   What|Removed |Added

 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2019-01-22
 CC||marxin at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #2 from Martin Liška  ---
Can you please provide the full command line you use for the compilation?
I can't reproduce the issue with the snippet using a cross-compiler.

[Bug target/88965] powerpc64le vector builtin hits ICE in verify_gimple

2019-01-22 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88965

Jakub Jelinek  changed:

   What|Removed |Added

 Status|WAITING |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek  ---
Created attachment 45488
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45488&action=edit
gcc9-pr88965.patch

Untested fix.

[Bug tree-optimization/88964] [8/9 Regression] ICE in wide_int_to_tree_1, at tree.c:1561

2019-01-22 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88964

Martin Liška  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-01-22
 CC||amker at gcc dot gnu.org,
   ||marxin at gcc dot gnu.org
   Target Milestone|--- |8.3
 Ever confirmed|0   |1

--- Comment #2 from Martin Liška  ---
Anyway, started with r255472.

[Bug d/88958] ICE in walk_aliased_vdefs_1, at tree-ssa-alias.c:2887

2019-01-22 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88958

Martin Liška  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-01-22
 CC||ibuclaw at gcc dot gnu.org,
   ||marxin at gcc dot gnu.org
 Ever confirmed|0   |1

[Bug d/88957] ICE: Segmentation fault in tree_could_trap_p, at tree-eh.c:2672

2019-01-22 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88957

Martin Liška  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-01-22
 CC||ibuclaw at gcc dot gnu.org,
   ||marxin at gcc dot gnu.org
 Ever confirmed|0   |1

[Bug c/88956] [9 Regression] ICE: Floating point exception

2019-01-22 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88956

Martin Liška  changed:

   What|Removed |Added

   Priority|P3  |P1
 Status|NEW |ASSIGNED
 CC||marxin at gcc dot gnu.org
  Known to work||8.2.0
   Assignee|unassigned at gcc dot gnu.org  |msebor at gcc dot gnu.org
  Known to fail||9.0

--- Comment #2 from Martin Liška  ---
Started with r262522.

[Bug tree-optimization/88713] Vectorized code slow vs. flang

2019-01-22 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713

--- Comment #21 from rguenther at suse dot de  ---
On Tue, 22 Jan 2019, elrodc at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
> 
> --- Comment #19 from Chris Elrod  ---
> To add a little more:
> I used inline asm for direct access to the rsqrt instruction "vrsqrt14ps" in
> Julia. Without adding a Newton step, the answers are wrong beyond just a
> couple significant digits.
> With the Newton step, the answers are correct.
> 
> My point is that LLVM-compiled code (Clang/Flang/ispc) are definitely adding
> the Newton step. They get the correct answer.
> 
> That leaves my best guess for the performance difference as owing to the
> masked "vrsqrt14ps" that gcc is using:
> 
> vcmpps  $4, %zmm0, %zmm5, %k1
> vrsqrt14ps  %zmm0, %zmm1{%k1}{z}
> 
> Is there any way for me to test that idea?
> Edit the asm to remove the vcmppss and mask, compile the asm with gcc, and
> benchmark it?

Usually it's easiest to compile to assembly with GCC (-S), test theories
of this kind by editing the GCC-generated assembly, and then benchmark
that.  Just pass the assembly file to the gfortran command line in place
of the .f file when linking the program.

[Bug preprocessor/88966] Indirect stringification of "linux" produces "1"

2019-01-22 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88966

--- Comment #4 from Andrew Pinski  ---
It's the same reason why:
#define mymacro 1

str(mymacro)
stringify(mymacro)

give different results.
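A minimal C demonstration of the two-level stringification idiom behind this answer. The comment names str and stringify but does not show their definitions, so the conventional ones are assumed here. `#x` turns the argument into a string exactly as written, before macro expansion. Routing it through a second macro lets the argument be expanded first. In the bug, `linux` plays the role of `mymacro`: GCC predefines `linux` as 1 in its GNU language modes on Linux targets, but not in strict modes such as -std=c11.

```c
#include <assert.h>
#include <string.h>

/* Assumed definitions: the comment names these macros but does not show them. */
#define str(x) #x           /* stringizes the argument tokens as written */
#define stringify(x) str(x) /* argument is macro-expanded before str sees it */

#define mymacro 1

static const char *direct = str(mymacro);         /* "mymacro" */
static const char *indirect = stringify(mymacro); /* "1" */

static void check_stringify(void) {
  assert(strcmp(direct, "mymacro") == 0);
  assert(strcmp(indirect, "1") == 0);
}
```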

[Bug c++/88969] ICE in build_op_delete_call, at cp/call.c:6509

2019-01-22 Thread asolokha at gmx dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88969

--- Comment #3 from Arseny Solokha  ---
--- mi9qy2yt.cpp2019-01-22 15:51:33.410845340 +0700
+++ tbfkgb7c.cpp2019-01-22 15:51:28.620898102 +0700
@@ -7,7 +7,7 @@
 namespace delete_selection {
   struct B {
 void operator delete(void*) = delete;
-void operator delete(B *, std::destroying_delete_t) = delete;
+void operator delete(void *, std::destroying_delete_t) = delete;
   };
   void delete_B(B *b) { delete b; }
 }

% g++-9.0.0-alpha20190120 -c tbfkgb7c.cpp
tbfkgb7c.cpp:10:62: internal compiler error: Segmentation fault
   10 | void operator delete(void *, std::destroying_delete_t) = delete;
  |  ^~
0xf9cb6f crash_signal
    /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/toplev.c:326
0xa5ad2f tree_class_check(tree_node*, tree_code_class, char const*, int, char const*)
    /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/tree.h:3298
0xa5ad2f comptypes(tree_node*, tree_node*, int)
    /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/typeck.c:1465
0x926cc7 coerce_delete_type(tree_node*, unsigned int)
    /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/decl2.c:1776
0x8ff2ba grok_op_properties(tree_node*, bool)
    /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/decl.c:13472
0x90c3ac grokfndecl
    /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/decl.c:9034
0x916b60 grokdeclarator(cp_declarator const*, cp_decl_specifier_seq*, decl_context, int, tree_node**)
    /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/decl.c:12424
0x92a54e grokfield(cp_declarator const*, cp_decl_specifier_seq*, tree_node*, bool, tree_node*, tree_node*)
    /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/decl2.c:814
0x9c13cf cp_parser_member_declaration
    /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/parser.c:24656
0x999f9f cp_parser_member_specification_opt
    /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/parser.c:24129
0x999f9f cp_parser_class_specifier_1
    /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/parser.c:23273
0x99bc98 cp_parser_class_specifier
    /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/parser.c:23535
0x99bc98 cp_parser_type_specifier
    /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/parser.c:17356
0x99cc50 cp_parser_decl_specifier_seq
    /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/parser.c:14049
0x99d424 cp_parser_simple_declaration
    /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/parser.c:13354
0x9c2cdd cp_parser_declaration
    /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/parser.c:13173
0x9c389c cp_parser_declaration_seq_opt
    /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/parser.c:13049
0x9c389c cp_parser_namespace_body
    /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/parser.c:19252
0x9c389c cp_parser_namespace_definition
    /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/parser.c:19230
0x9c2df0 cp_parser_declaration
    /var/tmp/portage/sys-devel/gcc-9.0.0_alpha20190120/work/gcc-9-20190120/gcc/cp/parser.c:13153

[Bug c++/88969] ICE in build_op_delete_call, at cp/call.c:6509

2019-01-22 Thread asolokha at gmx dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88969

--- Comment #2 from Arseny Solokha  ---
I get what you see when I modify the testcase from comment 0 the following way:

--- mi9qy2yt.cpp2019-01-22 15:48:53.473604944 +0700
+++ r9d6mwt2.cpp2019-01-22 15:46:45.567008369 +0700
@@ -1,3 +1,4 @@
+
 namespace std {
   struct destroying_delete_t {
 struct __construct { explicit __construct() = default; };
@@ -9,5 +10,5 @@
 void operator delete(void*) = delete;
 void operator delete(B *, std::destroying_delete_t) = delete;
   };
-  void delete_B(B *b) { delete b; }
-}
+  void delete_B(B *b)
+


Looks like a copy-paste error?

[Bug fortran/35476] Accepts invalid: USE/host association of generics with same specifics

2019-01-22 Thread juergen.reuter at desy dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35476

Jürgen Reuter  changed:

   What|Removed |Added

 CC||juergen.reuter at desy dot de

--- Comment #9 from Jürgen Reuter  ---
This is still present and not caught by gfortran; according to the interp from
J3, the code is invalid.

[Bug gcov-profile/88924] [GCOV] Wrong frequencies when there is complicated if expressions in gcov

2019-01-22 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88924

Martin Liška  changed:

   What|Removed |Added

   Priority|P3  |P5
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-01-22
 Ever confirmed|0   |1

--- Comment #1 from Martin Liška  ---
Confirmed, it's related to some subexpression folding, and is thus a low
priority to fix.

[Bug fortran/35779] error pointer wrong in PARAMETER

2019-01-22 Thread juergen.reuter at desy dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35779

Jürgen Reuter  changed:

   What|Removed |Added

 CC||juergen.reuter at desy dot de

--- Comment #13 from Jürgen Reuter  ---
Still present in trunk.

[Bug gcov-profile/88913] [GCOV] Wrong frequencies when a global variable is in a while expression in gcov

2019-01-22 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88913

Martin Liška  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #1 from Martin Liška  ---
Fixed on trunk in r247374.

[Bug gcov-profile/88913] [GCOV] Wrong frequencies when a global variable is in a while expression in gcov

2019-01-22 Thread yangyibiao at nju dot edu.cn
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88913

--- Comment #2 from Yibiao Yang  ---
(In reply to Martin Liška from comment #1)
> Fixed on trunk in r247374.

Thanks.

[Bug c++/88969] ICE in build_op_delete_call, at cp/call.c:6509

2019-01-22 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88969

Martin Liška  changed:

   What|Removed |Added

 Status|WAITING |NEW
 CC||jason at gcc dot gnu.org

--- Comment #4 from Martin Liška  ---
Confirmed, started with r266053.

[Bug rtl-optimization/49429] [4.7 Regression] dse.c change (r175063) causes execution failures

2019-01-22 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49429

--- Comment #21 from Jakub Jelinek  ---
Author: jakub
Date: Tue Jan 22 09:10:25 2019
New Revision: 268138

URL: https://gcc.gnu.org/viewcvs?rev=268138&root=gcc&view=rev
Log:
PR rtl-optimization/49429
PR target/49454
PR rtl-optimization/86334
PR target/88906
* expr.c (emit_block_move_hints): Move marking of MEM_EXPRs
addressable from here...
(emit_block_op_via_libcall): ... to here.

* gcc.target/i386/pr86334.c: New test.
* gcc.target/i386/pr88906.c: New test.

Added:
trunk/gcc/testsuite/gcc.target/i386/pr86334.c
trunk/gcc/testsuite/gcc.target/i386/pr88906.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/expr.c
trunk/gcc/testsuite/ChangeLog

[Bug rtl-optimization/86334] wrong code with -march=athlon -mmemcpy-strategy=libcall:-1:noalign

2019-01-22 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86334

--- Comment #4 from Jakub Jelinek  ---
Author: jakub
Date: Tue Jan 22 09:10:25 2019
New Revision: 268138

URL: https://gcc.gnu.org/viewcvs?rev=268138&root=gcc&view=rev
Log:
PR rtl-optimization/49429
PR target/49454
PR rtl-optimization/86334
PR target/88906
* expr.c (emit_block_move_hints): Move marking of MEM_EXPRs
addressable from here...
(emit_block_op_via_libcall): ... to here.

* gcc.target/i386/pr86334.c: New test.
* gcc.target/i386/pr88906.c: New test.

Added:
trunk/gcc/testsuite/gcc.target/i386/pr86334.c
trunk/gcc/testsuite/gcc.target/i386/pr88906.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/expr.c
trunk/gcc/testsuite/ChangeLog

[Bug target/49454] [4.7 Regression] /usr/include/libio.h:336:3: internal compiler error: Segmentation fault

2019-01-22 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49454

--- Comment #9 from Jakub Jelinek  ---
Author: jakub
Date: Tue Jan 22 09:10:25 2019
New Revision: 268138

URL: https://gcc.gnu.org/viewcvs?rev=268138&root=gcc&view=rev
Log:
PR rtl-optimization/49429
PR target/49454
PR rtl-optimization/86334
PR target/88906
* expr.c (emit_block_move_hints): Move marking of MEM_EXPRs
addressable from here...
(emit_block_op_via_libcall): ... to here.

* gcc.target/i386/pr86334.c: New test.
* gcc.target/i386/pr88906.c: New test.

Added:
trunk/gcc/testsuite/gcc.target/i386/pr86334.c
trunk/gcc/testsuite/gcc.target/i386/pr88906.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/expr.c
trunk/gcc/testsuite/ChangeLog

[Bug target/88906] wrong code with -march=k6 -minline-all-stringops -minline-stringops-dynamically -mmemcpy-strategy=libcall:-1:align and vector argument

2019-01-22 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88906

--- Comment #8 from Jakub Jelinek  ---
Author: jakub
Date: Tue Jan 22 09:10:25 2019
New Revision: 268138

URL: https://gcc.gnu.org/viewcvs?rev=268138&root=gcc&view=rev
Log:
PR rtl-optimization/49429
PR target/49454
PR rtl-optimization/86334
PR target/88906
* expr.c (emit_block_move_hints): Move marking of MEM_EXPRs
addressable from here...
(emit_block_op_via_libcall): ... to here.

* gcc.target/i386/pr86334.c: New test.
* gcc.target/i386/pr88906.c: New test.

Added:
trunk/gcc/testsuite/gcc.target/i386/pr86334.c
trunk/gcc/testsuite/gcc.target/i386/pr88906.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/expr.c
trunk/gcc/testsuite/ChangeLog

[Bug fortran/35718] deallocating non-allocated pointer target does not fail

2019-01-22 Thread juergen.reuter at desy dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35718

--- Comment #6 from Jürgen Reuter  ---
Still present in trunk.

[Bug target/88905] [8/9 Regression] ICE: in decompose, at rtl.h:2253 with -mabm and __builtin_popcountll

2019-01-22 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88905

--- Comment #5 from Jakub Jelinek  ---
Author: jakub
Date: Tue Jan 22 09:11:35 2019
New Revision: 268139

URL: https://gcc.gnu.org/viewcvs?rev=268139&root=gcc&view=rev
Log:
PR target/88905
* optabs.c (add_equal_note): Add op0_mode argument, use it instead of
GET_MODE (op0).
(expand_binop_directly, expand_doubleword_clz,
expand_doubleword_popcount, expand_ctz, expand_ffs,
expand_unop_direct, maybe_emit_unop_insn): Adjust callers.

* gcc.dg/pr88905.c: New test.

Added:
trunk/gcc/testsuite/gcc.dg/pr88905.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/optabs.c
trunk/gcc/testsuite/ChangeLog

[Bug rtl-optimization/88904] [9 Regression] Basic block incorrectly skipped in jump threading.

2019-01-22 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88904

--- Comment #5 from Jakub Jelinek  ---
Author: jakub
Date: Tue Jan 22 09:12:31 2019
New Revision: 268140

URL: https://gcc.gnu.org/viewcvs?rev=268140&root=gcc&view=rev
Log:
PR rtl-optimization/88904
* cfgcleanup.c (thread_jump): Verify cond2 doesn't mention
any nonequal registers before processing BB_END (b).

* gcc.c-torture/execute/pr88904.c: New test.

Added:
trunk/gcc/testsuite/gcc.c-torture/execute/pr88904.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/cfgcleanup.c
trunk/gcc/testsuite/ChangeLog

[Bug target/88905] [8 Regression] ICE: in decompose, at rtl.h:2253 with -mabm and __builtin_popcountll

2019-01-22 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88905

Jakub Jelinek  changed:

   What|Removed |Added

Summary|[8/9 Regression] ICE: in|[8 Regression] ICE: in
   |decompose, at rtl.h:2253|decompose, at rtl.h:2253
   |with -mabm and  |with -mabm and
   |__builtin_popcountll|__builtin_popcountll

--- Comment #6 from Jakub Jelinek  ---
Fixed on the trunk so far.

[Bug target/88952] The asm operator modifiers for rs6000 should be documented like they are for x86

2019-01-22 Thread christopher.leonard at abaco dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88952

--- Comment #5 from Christopher Leonard  ---
Is the order at least consistent with x86-32? i.e. if you give a 64-bit input
operand to inline assembly, is the order hi:lo? I'm worried this is a bizarre
convention imposed on big-endian architectures.

[Bug target/88963] gcc generates terrible code for vectors of 64+ length which are not natively supported

2019-01-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88963

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 CC||jakub at gcc dot gnu.org
 Blocks||88670
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
Confirmed.  The reason is that vector lowering only lowers the arithmetic
but leaves the loads and stores alone:

   _1 = *b_5(D);
   _2 = *c_6(D);
-  _3 = _1 + _2;
+  _9 = BIT_FIELD_REF <_1, 128, 0>;
+  _10 = BIT_FIELD_REF <_2, 128, 0>;
+  _11 = _9 + _10;
+  _12 = BIT_FIELD_REF <_1, 128, 128>;
+  _13 = BIT_FIELD_REF <_2, 128, 128>;
+  _14 = _12 + _13;
+  _15 = BIT_FIELD_REF <_1, 128, 256>;
+  _16 = BIT_FIELD_REF <_2, 128, 256>;
+  _17 = _15 + _16;
+  _18 = BIT_FIELD_REF <_1, 128, 384>;
+  _19 = BIT_FIELD_REF <_2, 128, 384>;
+  _20 = _18 + _19;
+  _3 = {_11, _14, _17, _20};
   *a_7(D) = _3;

there's some hack^Wcode in tree-ssa-forwprop.c to deal with similar cases
using {REAL,IMAG}PART_EXPR and COMPLEX_EXPR, splitting feeding/destination
memory accesses.  The same trick is missing for vector loads/stores.

OTOH it would be more reasonable for vector lowering to split the loads.
It's not so difficult to do - the main "issue" would be making sure
the wide vector load goes away (or maybe that's even a secondary issue
that could be ignored).
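A rough C illustration, using GNU vector extensions, of what "splitting the loads" means at source level. The shape of `add_v16si` is a guess based on the a_7/b_5/c_6 names and the four 128-bit BIT_FIELD_REFs in the dump; it is not the attached testcase, and both function names are hypothetical. The split version loads, adds and stores each 128-bit piece independently. That is the form the lowering would ideally produce instead of keeping one 512-bit load alive.

```c
#include <assert.h>
#include <string.h>

typedef int v16si __attribute__((vector_size(64))); /* 512-bit: wider than SSE2/AVX2 */
typedef int v4si __attribute__((vector_size(16)));  /* 128-bit: native SSE2 width */

/* Presumed shape of the testcase: one oversized vector add. */
void add_v16si(v16si *a, v16si *b, v16si *c) {
  *a = *b + *c;
}

/* Hand-split equivalent: each 128-bit piece is loaded, added and stored on
   its own, so the source never forms a full 512-bit temporary. */
void add_v16si_split(v16si *a, v16si *b, v16si *c) {
  for (int part = 0; part < 4; part++) {
    v4si x, y, r;
    memcpy(&x, (const char *)b + 16 * part, sizeof x);
    memcpy(&y, (const char *)c + 16 * part, sizeof y);
    r = x + y;
    memcpy((char *)a + 16 * part, &r, sizeof r);
  }
}

static void check_add_v16si(void) {
  v16si a, s, b, c;
  for (int i = 0; i < 16; i++) {
    b[i] = i;
    c[i] = 100 * i;
  }
  add_v16si(&a, &b, &c);
  add_v16si_split(&s, &b, &c);
  for (int i = 0; i < 16; i++)
    assert(a[i] == 101 * i && s[i] == a[i]);
}
```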

With just the loads handled code generation improves to

test:
.LFB0:
.cfi_startproc
pushq   %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq    %rsp, %rbp
.cfi_def_cfa_register 6
andq    $-64, %rsp
subq    $8, %rsp
movdqa  (%rsi), %xmm3
movdqa  16(%rsi), %xmm2
movdqa  32(%rsi), %xmm1
movdqa  48(%rsi), %xmm0
paddd   (%rdx), %xmm3
paddd   16(%rdx), %xmm2
paddd   32(%rdx), %xmm1
paddd   48(%rdx), %xmm0
movaps  %xmm3, (%rdi)
movaps  %xmm2, 16(%rdi)
movaps  %xmm1, 32(%rdi)
movaps  %xmm0, 48(%rdi)
leave
ret
.cfi_endproc

for SSE2 and

test:
.LFB0:
.cfi_startproc
pushq   %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq    %rsp, %rbp
.cfi_def_cfa_register 6
andq    $-64, %rsp
subq    $8, %rsp
vmovdqa (%rsi), %ymm3
vpaddd  (%rdx), %ymm3, %ymm0
vmovdqa %xmm0, %xmm2
vmovdqa %ymm0, -120(%rsp)
vmovdqa 32(%rsi), %ymm0
vmovdqa -104(%rsp), %xmm4
vpaddd  32(%rdx), %ymm0, %ymm0
vmovaps %xmm2, (%rdi)
vmovdqa %ymm0, -88(%rsp)
vmovdqa -72(%rsp), %xmm5
vmovaps %xmm4, 16(%rdi)
vmovaps %xmm0, 32(%rdi)
vmovaps %xmm0, 32(%rdi)
vmovaps %xmm5, 48(%rdi)
vzeroupper
leave
.cfi_def_cfa 7, 8
ret

for skylake.  Not sure why we spill anything with the above; with the SSE
code we manage to elide the spills (but not the stack reservation).  I
guess we need to handle the stores as well.

The odd thing is that if I simply do

  _12 = BIT_FIELD_REF <*b_5(D), 256, 256>;
  _9 = BIT_FIELD_REF <*b_5(D), 256, 0>;
  _13 = BIT_FIELD_REF <*c_6(D), 256, 256>;
  _10 = BIT_FIELD_REF <*c_6(D), 256, 0>;
  _11 = _9 + _10;
  _14 = _12 + _13;
  BIT_FIELD_REF <*a_7(D), 256, 0> = _11;
  BIT_FIELD_REF <*a_7(D), 256, 256> = _14;

code-generation is even worse:

vmovdqa (%rsi), %ymm0
vmovdqa 32(%rsi), %ymm2
vpaddd  (%rdx), %ymm0, %ymm3
vpaddd  32(%rdx), %ymm2, %ymm1
vmovdqa %ymm3, -64(%rsp)
movq    -56(%rsp), %rax
vmovdqa %ymm1, -32(%rsp)
movq    %rax, 8(%rdi)
movq    -48(%rsp), %rax
vmovdqa -64(%rsp), %xmm0
movq    %rax, 16(%rdi)
movq    -40(%rsp), %rax
vmovq   %xmm0, (%rdi)
movq    %rax, 24(%rdi)
movq    -24(%rsp), %rax
vmovdqa -32(%rsp), %xmm0
movq    %rax, 40(%rdi)
movq    -16(%rsp), %rax
vmovq   %xmm0, 32(%rdi)
movq    %rax, 48(%rdi)
movq    -8(%rsp), %rax
movq    %rax, 56(%rdi)
vzeroupper
leave
.cfi_def_cfa 7, 8
ret

the stores expand to

;; BIT_FIELD_REF <*a_7(D), 256, 0> = _11;

(insn 14 13 15 (set (mem/j:DI (reg/v/f:DI 88 [ a ]) [1 *a_7(D)+0 S8 A256])
(subreg:DI (reg:V8SI 84 [ _11 ]) 0)) "t.c":6:6 -1
 (nil))

(insn 15 14 16 (set (mem/j:DI (plus:DI (reg/v/f:DI 88 [ a ])
(const_int 8 [0x8])) [1 *a_7(D)+8 S8 A64])
(subreg:DI (reg:V8SI 84 [ _11 ]) 8)) "t.c":6:6 -1
 (nil))

(insn 16 15 17 (set (mem/j:DI (plus:DI (reg/v/f:DI 88 [ a ])
(const_int 16 [0x10])) [1 *a_7(D)+16 S8 A128])
(subreg:DI (reg:V8SI 84 [ _11 ]) 16)) "t.c":6:6 -1
 (nil))

(insn 17 16 0 (set (mem/j:DI (plus:DI (reg/v/f:DI 88 

[Bug target/88963] gcc generates terrible code for vectors of 64+ length which are not natively supported

2019-01-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88963

--- Comment #2 from Richard Biener  ---
Created attachment 45489
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45489&action=edit
untested patch

forwprop patch I was playing with.

[Bug target/88963] gcc generates terrible code for vectors of 64+ length which are not natively supported

2019-01-22 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88963

--- Comment #3 from Jakub Jelinek  ---
Yeah, I noticed that already when working on __builtin_convertvector: we
don't really do much TER for the oversized vector SSA_NAMEs and force them onto
the stack all the time.  I wonder if we couldn't do a kind of SRA for these
vectors after generic vector lowering, splitting them into multiple unrelated
SSA_NAMEs where possible.

[Bug c++/88951] [9 Regression] No fpermissive offerred on 'error: jump to case label'

2019-01-22 Thread paolo.carlini at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88951

--- Comment #1 from Paolo Carlini  ---
The rationale for the change is here:
https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00623.html I my experience,
accepting such kind of code is really dangerous, because -fpermissive isn't
fine grained thus in some cases users want to pass it to allow for other *safe*
legacy constructs. That said, clarified that I vote NO, NO based on real, hard,
experience, it's the front-end maintainers call, reverting the change would be
trivial.
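To make the trade-off concrete, here is a minimal sketch of the usual source-level fix, which needs no -fpermissive at all. The function `classify` is hypothetical; it is written in C to match the other examples here and is equally valid C++. In C++ the error fires when a later case label would jump past the initialization of a variable that is still in scope. Enclosing that case's body in braces ends the variable's scope before the next label.

```c
#include <assert.h>

/* Hypothetical example: the braces around case 0 end the scope of `doubled`
   before the following labels, so no label bypasses its initialization. */
static int classify(int k) {
  switch (k) {
  case 0: {
    int doubled = k * 2; /* visible only inside this block */
    return doubled + 1;
  }
  case 1:
    return 10;
  default:
    return -1;
  }
}

static void check_classify(void) {
  assert(classify(0) == 1 && classify(1) == 10 && classify(7) == -1);
}
```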

[Bug target/88952] The asm operator modifiers for rs6000 should be documented like they are for x86

2019-01-22 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88952

--- Comment #6 from Andrew Pinski  ---
(In reply to Christopher Leonard from comment #5)
> Is the order at least consistent with x86-32? i.e. if you give a 64-bit
> input operand to inline assembly, is the order hi:lo? I'm worried this is a
> bizarre convention imposed on big-endian architectures.

Yes the order is always hi:lo (reg:reg+1) on all targets I know of; endianness
only matters when it comes to memory.
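A small C sketch of the distinction drawn here. Splitting a 64-bit value into hi/lo words with shifts gives the same result on every target, but which half is stored at the lower memory address depends on byte order. The sketch does not model how the compiler assigns an inline-asm operand to a register pair. The helper names are hypothetical. `__BYTE_ORDER__` and `__ORDER_BIG_ENDIAN__` are macros predefined by GCC and Clang.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Value-level split into hi:lo words: identical on every target, because
   shifts operate on values, not on memory. */
static void split_u64(uint64_t v, uint32_t *hi, uint32_t *lo) {
  *hi = (uint32_t)(v >> 32);
  *lo = (uint32_t)v;
}

/* Memory-level view: which 32-bit half sits at the lower address depends on
   byte order. Returns 1 if the high word is stored first. */
static int high_word_first_in_memory(void) {
  uint64_t v = UINT64_C(0x0000000100000000); /* hi = 1, lo = 0 */
  uint32_t first;
  memcpy(&first, &v, sizeof first);
  return first == 1;
}

static void check_split_u64(void) {
  uint32_t hi, lo;
  split_u64(UINT64_C(0x1122334455667788), &hi, &lo);
  assert(hi == 0x11223344u && lo == 0x55667788u);
  assert(high_word_first_in_memory() ==
         (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__));
}
```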

[Bug tree-optimization/88713] Vectorized code slow vs. flang

2019-01-22 Thread elrodc at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713

--- Comment #22 from Chris Elrod  ---
Okay. I did that, and the time went from about 4.25 microseconds down to 4.0
microseconds. So that is an improvement, but accounts for only a small part of
the difference with the LLVM-compilers.

-O3 -fno-math-errno

was about 3.5 microseconds, so -funsafe-math-optimizations still results in a
regression in this code.

3.5 microseconds is roughly as fast as you can get with vsqrt and div.

My best guess now is that gcc does a lot more to improve the accuracy of vsqrt.
If I understand correctly, these are all the involved instructions:

vmovaps .LC2(%rip), %zmm7
vmovaps .LC3(%rip), %zmm6
# for loop begins
vrsqrt14ps  %zmm1, %zmm2 # comparison and mask removed
vmulps  %zmm1, %zmm2, %zmm0
vmulps  %zmm2, %zmm0, %zmm1
vmulps  %zmm6, %zmm0, %zmm0
vaddps  %zmm7, %zmm1, %zmm1
vmulps  %zmm0, %zmm1, %zmm1
vrcp14ps    %zmm1, %zmm0
vmulps  %zmm1, %zmm0, %zmm1
vmulps  %zmm1, %zmm0, %zmm1
vaddps  %zmm0, %zmm0, %zmm0
vsubps  %zmm1, %zmm0, %zmm0
vfnmadd213ps (%r10,%rax), %zmm0, %zmm2

If I understand this correctly:

zmm2 =(approx) 1 / sqrt(zmm1)
zmm0 = zmm1 * zmm2 = (approx) sqrt(zmm1)
zmm1 = zmm0 * zmm2 = (approx) 1
zmm0 = zmm6 * zmm0 = (approx) constant6 * sqrt(zmm1)
zmm1 = zmm7 * zmm1 = (approx) constant7
zmm1 = zmm0 * zmm1 = (approx) constant6 * constant6 * sqrt(zmm1)
zmm0 = (approx) 1 / zmm1 = (approx) 1 / sqrt(zmm1) * 1 / (constant6 *
constant7)
zmm1 = zmm1 * zmm0 = (approx) 1
zmm1 = zmm1 * zmm0 = (approx) 1 / sqrt(zmm1) * 1 / (constant6 * constant7)
zmm0 = 2 * zmm0 = (approx) 2 / sqrt(zmm1) * 2 / (constant6 * constant7)
zmm0 = zmm1 - zmm0 = (approx) -1 / sqrt(zmm1) * 1 / (constant6 * constant7)

which implies that constant6 * constant6 = approximately -1?


LLVM seems to do a much simpler / briefer update of the output of vrsqrt.

When I implemented a vrsqrt intrinsic in a Julia library, I just looked at
Wikipedia and did (roughly):

constant1 = -0.5
constant2 = 1.5

zmm2 = (approx) 1 / sqrt(zmm1)
zmm3 = constant1 * zmm1
zmm1 = zmm2 * zmm2
zmm3 = zmm3 * zmm1 + constant2
zmm2 = zmm2 * zmm3


I am not a numerical analyst, so I can't comment on relative validities or
accuracies of these approaches.
I also don't know what LLVM 7+ does. LLVM 6 doesn't use vrsqrt.

I would be interested in reading explanations or discussions, if any are
available.
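A minimal C sketch of the Wikipedia-style refinement described above, y1 = y0 * (constant1 * x * y0 * y0 + constant2) with constant1 = -0.5 and constant2 = 1.5. The function name is hypothetical. Portable C has no vrsqrt14ps, so the usage check feeds in a hand-made estimate with a relative error of 2^-14, which matches the accuracy Intel documents for that instruction. One step roughly squares the relative error (up to a factor of 1.5), which is why a single step is enough to reach near full float precision.

```c
#include <assert.h>

/* One Newton-Raphson step for y ~ 1/sqrt(x), mirroring the sequence above:
     y1 = y0 * (constant1 * x * y0 * y0 + constant2)                       */
static float rsqrt_newton_step(float x, float y0) {
  const float constant1 = -0.5f, constant2 = 1.5f;
  float t = constant1 * x; /* zmm3 = constant1 * zmm1 */
  float y2 = y0 * y0;      /* zmm1 = zmm2 * zmm2 */
  t = t * y2 + constant2;  /* zmm3 = zmm3 * zmm1 + constant2 */
  return y0 * t;           /* zmm2 = zmm2 * zmm3 */
}

/* Usage check: an estimate of 1/sqrt(4) = 0.5 that is off by a relative
   2^-14 (standing in for vrsqrt14ps), refined once. */
static void check_rsqrt_newton_step(void) {
  float y0 = 0.5f * (1.0f + 1.0f / 16384.0f);
  float y1 = rsqrt_newton_step(4.0f, y0);
  assert(y0 - 0.5f > 1e-5f);                       /* raw estimate: ~3e-5 off */
  assert(y1 - 0.5f < 1e-6f && y1 - 0.5f > -1e-6f); /* refined estimate */
}
```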

[Bug fortran/37222] [OOP] Checks when overriding type-bound procedures are incomplete

2019-01-22 Thread juergen.reuter at desy dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37222

Jürgen Reuter  changed:

   What|Removed |Added

 CC||juergen.reuter at desy dot de

--- Comment #4 from Jürgen Reuter  ---
As Janus commented, there is just one leftover (perhaps already fixed in the
past six years?). So what is really left to do here?

[Bug fortran/37398] Statement functions mask missing PURE procedures.

2019-01-22 Thread juergen.reuter at desy dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37398

Jürgen Reuter  changed:

   What|Removed |Added

 CC||juergen.reuter at desy dot de

--- Comment #3 from Jürgen Reuter  ---
This has correctly given the expected error messages since at least gfortran
5.4. Close as FIXED?

[Bug fortran/38113] on warning/error: skip whitespaces, move position marker to actual variable name

2019-01-22 Thread juergen.reuter at desy dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38113

Jürgen Reuter  changed:

   What|Removed |Added

 CC||juergen.reuter at desy dot de

--- Comment #9 from Jürgen Reuter  ---
Some of the problems here have been fixed, and some new ones have been
revealed. It is not clear to me what the exact context is now. Maybe close this
as WORKSFORME and wait for someone to open an actual issue about the
alignment of markers?

[Bug tree-optimization/88713] Vectorized code slow vs. flang

2019-01-22 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713

--- Comment #23 from rguenther at suse dot de  ---
On Tue, 22 Jan 2019, elrodc at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
> 
> --- Comment #22 from Chris Elrod  ---
> Okay. I did that, and the time went from about 4.25 microseconds down to 4.0
> microseconds. So that is an improvement, but accounts for only a small part of
> the difference with the LLVM-compilers.
> 
> -O3 -fno-math-errno
> 
> was about 3.5 microseconds, so -funsafe-math-optimizations still results in a
> regression in this code.
> 
> 3.5 microseconds is roughly as fast as you can get with vsqrt and div.
> 
> My best guess now is that gcc does a lot more to improve the accuracy of vsqrt.
> If I understand correctly, these are all the involved instructions:
> 
> vmovaps .LC2(%rip), %zmm7
> vmovaps .LC3(%rip), %zmm6
> # for loop begins
> vrsqrt14ps  %zmm1, %zmm2 # comparison and mask removed
> vmulps  %zmm1, %zmm2, %zmm0
> vmulps  %zmm2, %zmm0, %zmm1
> vmulps  %zmm6, %zmm0, %zmm0
> vaddps  %zmm7, %zmm1, %zmm1
> vmulps  %zmm0, %zmm1, %zmm1
> vrcp14ps    %zmm1, %zmm0
> vmulps  %zmm1, %zmm0, %zmm1
> vmulps  %zmm1, %zmm0, %zmm1
> vaddps  %zmm0, %zmm0, %zmm0
> vsubps  %zmm1, %zmm0, %zmm0
> vfnmadd213ps (%r10,%rax), %zmm0, %zmm2
> 
> If I understand this correctly:
> 
> zmm2 =(approx) 1 / sqrt(zmm1)
> zmm0 = zmm1 * zmm2 = (approx) sqrt(zmm1)
> zmm1 = zmm0 * zmm2 = (approx) 1
> zmm0 = zmm6 * zmm0 = (approx) constant6 * sqrt(zmm1)
> zmm1 = zmm7 * zmm1 = (approx) constant7
> zmm1 = zmm0 * zmm1 = (approx) constant6 * constant6 * sqrt(zmm1)
> zmm0 = (approx) 1 / zmm1 = (approx) 1 / sqrt(zmm1) * 1 / (constant6 *
> constant7)
> zmm1 = zmm1 * zmm0 = (approx) 1
> zmm1 = zmm1 * zmm0 = (approx) 1 / sqrt(zmm1) * 1 / (constant6 * constant7)
> zmm0 = 2 * zmm0 = (approx) 2 / sqrt(zmm1) * 2 / (constant6 * constant7)
> zmm0 = zmm1 - zmm0 = (approx) -1 / sqrt(zmm1) * 1 / (constant6 * constant7)
> 
> which implies that constant6 * constant7 = approximately -1?

GCC implements

 /* sqrt(a)  = -0.5 * a * rsqrtss(a) * (a * rsqrtss(a) * rsqrtss(a) - 3.0)
rsqrt(a) = -0.5 * rsqrtss(a) * (a * rsqrtss(a) * rsqrtss(a) - 3.0) */

which looks similar to what LLVM does.  You can look at the
-fdump-tree-optimized dump to see if there's anything fishy.

> 
> LLVM seems to do a much simpler / briefer update of the output of vrsqrt.
> 
> When I implemented a vrsqrt intrinsic in a Julia library, I just looked at
> Wikipedia and did (roughly):
> 
> constant1 = -0.5
> constant2 = 1.5
> 
> zmm2 = (approx) 1 / sqrt(zmm1)
> zmm3 = constant1 * zmm1
> zmm1 = zmm2 * zmm2
> zmm3 = zmm3 * zmm1 + constant2
> zmm2 = zmm2 * zmm3
> 
> 
> I am not a numerical analyst, so I can't comment on relative validities or
> accuracies of these approaches.
> I also don't know what LLVM 7+ does. LLVM 6 doesn't use vrsqrt.
> 
> I would be interested in reading explanations or discussions, if any are
> available.
> 
>

[Bug lto/88422] collect2.exe: fatal error: lto-wrapper returned 1 exit status: file not recognized: file truncated

2019-01-22 Thread nidal.faour at wdc dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88422

--- Comment #6 from Nidal Faour  ---
Andrew Pinski is right. After chasing this bug with the help of Andrew Burgess,
we found that simple-object.c creates the output file with creat:

outfd = creat (dest, 00777);

creat wraps open but does not pass an open mode such as O_BINARY, so the file
is opened in text mode. The fix mentioned by Andrew was as follows:

When opening output files for simple-object creation, we must ensure that the
file is opened in binary mode.  Failure to do so causes file corruption, and
LTO failure on Windows targets.

libiberty/ChangeLog:

PR lto/88422
* simple-object.c (O_BINARY): Define if not already defined.
(simple_object_copy_lto_debug_sections): Create file in binary
mode.
---
 libiberty/ChangeLog   | 7 +++
 libiberty/simple-object.c | 6 +-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/libiberty/simple-object.c b/libiberty/simple-object.c
index c1f38cee8ee..e061073abd1 100644
--- a/libiberty/simple-object.c
+++ b/libiberty/simple-object.c
@@ -44,6 +44,10 @@ Boston, MA 02110-1301, USA.  */
 #define SEEK_SET 0
 #endif

+#ifndef O_BINARY
+# define O_BINARY 0
+#endif
+
 #include "simple-object-common.h"

 /* The known object file formats.  */
@@ -349,7 +353,7 @@ simple_object_copy_lto_debug_sections (simple_object_read *sobj,
   return errmsg;
 }

-  outfd = creat (dest, 00777);
+  outfd = open (dest, O_CREAT|O_WRONLY|O_TRUNC|O_BINARY, 00777);
   if (outfd == -1)
 {
   *err = errno;
--

[Bug tree-optimization/88919] New test case gcc.dg/vect/pr88903-1.c in r268076 fails

2019-01-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88919

--- Comment #2 from Richard Biener  ---
Sandra posted a patch that will probably fix this (out-of-bounds shift values).

[Bug c++/88951] [9 Regression] No fpermissive offerred on 'error: jump to case label'

2019-01-22 Thread paolo.carlini at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88951

--- Comment #2 from Paolo Carlini  ---
Also note, further clarifying what I said in the linked messages, that we
accepted such broken code with -fpermissive only temporarily, for a few
releases: before gcc5, -fpermissive suppressed the first error, but an
additional hard error was emitted anyway.

[Bug target/88963] gcc generates terrible code for vectors of 64+ length which are not natively supported

2019-01-22 Thread husseydevin at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88963

Devin Hussey  changed:

   What|Removed |Added

 CC||husseydevin at gmail dot com

--- Comment #4 from Devin Hussey  ---
Strangely, this doesn't seem to affect the ARM or aarch64 backends, although I
am on a December build (specifically Dec 29). 8.2 is also unaffected.

arm-none-eabi-gcc -mfloat-abi=hard -mfpu=neon -march=armv7-a -O3 -S test.c

test:
vldmia  r1, {d0-d7}
vldmia  r2, {d24-d31}
vadd.i32 q8, q0, q12
vadd.i32 q9, q1, q13
vadd.i32 q10, q2, q14
vadd.i32 q11, q3, q15
vstmia  r0, {d16-d23}
bx  lr

aarch64-none-eabi-gcc -O3 -S test.c

test:
ld1 {v16.16b - v19.16b}, [x1]
ld1 {v4.16b - v7.16b}, [x2]
add v0.4s, v16.4s, v4.4s
add v1.4s, v17.4s, v5.4s
add v2.4s, v18.4s, v6.4s
add v3.4s, v19.4s, v7.4s
st1 {v0.16b - v3.16b}, [x0]
ret

Amusingly, Clang trunk for ARMv7-a has a similar issue (aarch64 is fine).

test:
.fnstart
.save   {r11, lr}
push {r11, lr}
add r3, r1, #48
mov lr, r1
mov r12, r2
vld1.64 {d20, d21}, [r3]
add r3, r2, #48
add r1, r1, #32
vld1.32 {d16, d17}, [lr]!
vld1.32 {d18, d19}, [r12]!
vadd.i32 q8, q9, q8
vld1.64 {d22, d23}, [r3]
vadd.i32 q10, q11, q10
vld1.64 {d26, d27}, [r1]
add r1, r2, #32
vld1.64 {d28, d29}, [r1]
add r1, r0, #48
vadd.i32 q11, q14, q13
vld1.64 {d24, d25}, [lr]
vld1.64 {d18, d19}, [r12]
vadd.i32 q9, q9, q12
vst1.64 {d20, d21}, [r1]
add r1, r0, #32
vst1.32 {d16, d17}, [r0]!
vst1.64 {d22, d23}, [r1]
vst1.64 {d18, d19}, [r0]
pop {r11, pc}

[Bug fortran/37222] [OOP] Checks when overriding type-bound procedures are incomplete

2019-01-22 Thread janus at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37222

janus at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from janus at gcc dot gnu.org ---
(In reply to Jürgen Reuter from comment #4)
> As Janus commented there is just one left-over (already fixed in the past
> six years?). So what is really left to do here?

I don't think the left-over is actually fixed (at least the FIXME notes are
still present in interface.c).

In any case, further improvement in this area is rather hard and yields little
gain, so I think it's reasonable to close this ten-year-old PR, which presents
no concrete test case (after all, the FIXMEs are still there for future
reference).

[Bug target/88952] The asm operator modifiers for rs6000 should be documented like they are for x86

2019-01-22 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88952

--- Comment #7 from Uroš Bizjak  ---
(In reply to Christopher Leonard from comment #5)
> Is the order at least consistent with x86-32? i.e. if you give a 64-bit
> input operand to inline assembly the order is hi:lo? I'm worried this is a
> bizarre convention imposed on big-endian architectures.

On x86, we don't allow register pairs in asm at all. Please see print_reg,
where:

  switch (msize)
{
case 16:
case 12:
case 8:
  if (GENERAL_REGNO_P (regno) && msize > GET_MODE_SIZE (word_mode))
warning (0, "unsupported size for integer register");
  /* FALLTHRU */

So, if someone wants to handle DImode on 32bit targets, both registers have to
be passed to assembly explicitly, using "(int) lval" and "(int) (lval >> 32)".

[Bug target/88963] gcc generates terrible code for vectors of 64+ length which are not natively supported

2019-01-22 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88963

--- Comment #5 from Andrew Pinski  ---
(In reply to Devin Hussey from comment #4)
> Strangely, this doesn't seem to affect the ARM or aarch64 backends, although
> I am on a December build (specifically Dec 29). 8.2 is also unaffected.

This is because those backends support very wide integer modes (OI, etc.).


> aarch64-none-eabi-gcc -O3 -S test.c
> 
> test:
> ld1 {v16.16b - v19.16b}, [x1]
> ld1 {v4.16b - v7.16b}, [x2]
> add v0.4s, v16.4s, v4.4s
> add v1.4s, v17.4s, v5.4s
> add v2.4s, v18.4s, v6.4s
> add v3.4s, v19.4s, v7.4s
> st1 {v0.16b - v3.16b}, [x0]
> ret

This is not really good code either, on most if not all ARMv8
micro-architectures.
Doing 8 ldr/ld1 and 4 st1 is almost always better.

[Bug lto/88422] collect2.exe: fatal error: lto-wrapper returned 1 exit status: file not recognized: file truncated

2019-01-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88422

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2019-01-22
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #7 from Richard Biener  ---
Thanks for catching this!  I'll apply the patch.

[Bug lto/88422] collect2.exe: fatal error: lto-wrapper returned 1 exit status: file not recognized: file truncated

2019-01-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88422

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
  Known to work||8.2.1, 9.0
 Resolution|--- |FIXED
  Known to fail||8.2.0

--- Comment #9 from Richard Biener  ---
Fixed.

[Bug lto/88422] collect2.exe: fatal error: lto-wrapper returned 1 exit status: file not recognized: file truncated

2019-01-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88422

--- Comment #8 from Richard Biener  ---
Author: rguenth
Date: Tue Jan 22 09:47:52 2019
New Revision: 268141

URL: https://gcc.gnu.org/viewcvs?rev=268141&root=gcc&view=rev
Log:
2019-01-22  Nidal Faour  

PR lto/88422
* simple-object.c (O_BINARY): Define if not already defined.
(simple_object_copy_lto_debug_sections): Create file in binary
mode.

Modified:
trunk/libiberty/ChangeLog
trunk/libiberty/simple-object.c

[Bug lto/88422] collect2.exe: fatal error: lto-wrapper returned 1 exit status: file not recognized: file truncated

2019-01-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88422

--- Comment #10 from Richard Biener  ---
Author: rguenth
Date: Tue Jan 22 09:49:27 2019
New Revision: 268142

URL: https://gcc.gnu.org/viewcvs?rev=268142&root=gcc&view=rev
Log:
2019-01-22  Nidal Faour  

PR lto/88422
* simple-object.c (O_BINARY): Define if not already defined.
(simple_object_copy_lto_debug_sections): Create file in binary
mode.

Modified:
branches/gcc-8-branch/libiberty/ChangeLog
branches/gcc-8-branch/libiberty/simple-object.c

[Bug target/88963] gcc generates terrible code for vectors of 64+ length which are not natively supported

2019-01-22 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88963

--- Comment #6 from Andrew Pinski  ---
Try using 128 (or 256) and you might see that aarch64 falls down similarly.

[Bug target/88952] The asm operator modifiers for rs6000 should be documented like they are for x86

2019-01-22 Thread sch...@linux-m68k.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88952

--- Comment #8 from Andreas Schwab  ---
reg:reg+1 maps to lo:hi on x86.

[Bug target/88952] The asm operator modifiers for rs6000 should be documented like they are for x86

2019-01-22 Thread christopher.leonard at abaco dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88952

--- Comment #9 from Christopher Leonard  ---
(In reply to Andrew Pinski from comment #6)
> Yes the order is always hi:lo (reg:reg+1) on all targets I know of

This is definitely not the natural choice (on any platform; I agree, endianness
is irrelevant here), so I would recommend documenting this as well. The docs
could also recommend explicitly casting, e.g., a parameter of a function-style
macro that is used as an input operand expression for inline asm. (%L0 is no
help when the size is unknown: it seems to select the "next" register when you
give a 32-bit type, and that register isn't even loaded with a value in the
generated PPC assembly.)

This is how the code broke for me: I wrote a function-style macro to generate
MTSPR instructions for a given SPR and load value (this is needed because the
SPR number in MTSPR is an immediate; there is no alternative form that takes
the SPR from a register). One of the constants I used to calculate an SPR's
load value became a 64-bit type in a later code change, which made the input
operand 64-bit instead of 32-bit, breaking my code.

[Bug target/88952] The asm operator modifiers for rs6000 should be documented like they are for x86

2019-01-22 Thread christopher.leonard at abaco dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88952

--- Comment #10 from Christopher Leonard  ---
Getting contradictory statements now:
>reg:reg+1 maps to lo:hi on x86.
>On x86, we don't allow register pairs in asm at all.

Not allowing, or printing a warning, is much better behavior than what I have
been getting on PPC.

[Bug tree-optimization/88044] [9 regression] gfortran.dg/transfer_intrinsic_3.f90 hangs after r266171

2019-01-22 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88044

--- Comment #15 from Jakub Jelinek  ---
Author: jakub
Date: Tue Jan 22 09:58:23 2019
New Revision: 268143

URL: https://gcc.gnu.org/viewcvs?rev=268143&root=gcc&view=rev
Log:
PR tree-optimization/88044
* tree-ssa-loop-niter.c (number_of_iterations_cond): If condition
is false in the first iteration, but !every_iteration, return false
instead of true with niter->niter zero.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-ssa-loop-niter.c

[Bug tree-optimization/88044] [9 regression] gfortran.dg/transfer_intrinsic_3.f90 hangs after r266171

2019-01-22 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88044

Jakub Jelinek  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED
   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org

--- Comment #16 from Jakub Jelinek  ---
Fixed.

[Bug tree-optimization/88862] [9 Regression] ICE in extract_affine, at graphite-sese-to-poly.c:313

2019-01-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88862

--- Comment #2 from Richard Biener  ---
Huh.  We get &itarg1 here from originally (integer(kind=4)) &itarg1.  The
stmt we analyze is

  if (_4 != _316)

I have a simple patch.

[Bug target/88963] gcc generates terrible code for vectors of 64+ length which are not natively supported

2019-01-22 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88963

--- Comment #7 from Marc Glisse  ---
See PR 55266 (and several others).

[Bug target/88963] gcc generates terrible code for vectors of 64+ length which are not natively supported

2019-01-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88963

--- Comment #8 from Richard Biener  ---
You can try the attached patch; it "fixes" the issue on the GIMPLE side, but
apparently the BIT_FIELD_REF stores take a weird path during RTL expansion,
so we end up spilling again.

[Bug preprocessor/88966] Indirect stringification of "linux" produces "1"

2019-01-22 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88966

Jonathan Wakely  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |INVALID

--- Comment #5 from Jonathan Wakely  ---
This is not a bug, "linux" is a predefined macro and the preprocessor is doing
exactly what it's supposed to. See
https://gcc.gnu.org/onlinedocs/cpp/System-specific-Predefined-Macros.html

[Bug middle-end/88897] Bogus maybe-uninitialized warning on class field

2019-01-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88897

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-01-22
 Ever confirmed|0   |1

--- Comment #6 from Richard Biener  ---
So this boils down to a missed optimization (as many cases do...).  The
uninit warning sees

 [local count: 1073741825]:
_3 = bar ();
future_state::future_state (&_local_state);
MEM[(struct  &)&_local_state] ={v} {CLOBBER};
MEM[(struct optional *)&_local_state]._M_engaged = 0;
MEM[(struct optional *)_3]._M_engaged = 0;
_7 = MEM[(struct optional &)&_local_state]._M_engaged;
if (_7 != 0)
  goto ; [50.00%]
else
  goto ; [50.00%]

 [local count: 536870912]:
_6 = MEM[(struct temporary_buffer &)&_local_state]._buffer;
...

and warns about the load _6 = ...

As you can see, the condition isn't elided: somehow we didn't manage to CSE
the load of _M_engaged here, possibly due to the apparent aliasing of
the store via _3.  Points-to analysis explicitly says it might alias
_local_state, because _local_state escapes to future_state::future_state
and PTA is not flow-sensitive:

   [local count: 1073741825]:
  # PT = nonlocal escaped null
  # USE = nonlocal null { D.2493 } (escaped)
  # CLB = nonlocal null { D.2493 } (escaped)
  _3 = bar ();
  # USE = nonlocal null { D.2493 } (escaped)
  # CLB = nonlocal null { D.2493 } (escaped)
  future_state::future_state (&_local_stateD.2493);
  MEM[(struct  &)&_local_stateD.2493] ={v} {CLOBBER};
  MEM[(struct optionalD.2409 *)&_local_stateD.2493]._M_engagedD.2426 = 0;
  MEM[(struct optionalD.2409 *)_3]._M_engagedD.2426 = 0;

[Bug rtl-optimization/88904] [9 Regression] Basic block incorrectly skipped in jump threading.

2019-01-22 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88904

Jakub Jelinek  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Jakub Jelinek  ---
Fixed.

[Bug target/88906] wrong code with -march=k6 -minline-all-stringops -minline-stringops-dynamically -mmemcpy-strategy=libcall:-1:align and vector argument

2019-01-22 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88906

Jakub Jelinek  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org

--- Comment #9 from Jakub Jelinek  ---
Fixed on the trunk so far.

[Bug fortran/37398] Statement functions mask missing PURE procedures.

2019-01-22 Thread dominiq at lps dot ens.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37398

--- Comment #4 from Dominique d'Humieres  ---
> This correctly gives the expected error messages since at least gfortran 5.4.
> Closing as FIXED?

FORALL(i=1:4) a(i) = st3 (i) is still not caught.

[Bug rtl-optimization/88953] Unrecognizable insn on architecture zEC12 with boost::bimap

2019-01-22 Thread krebbel at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88953

--- Comment #4 from Andreas Krebbel  ---
Looks like a problem which was fixed with r265158:

S/390: Fix problem with vec_init expander

gcc/ChangeLog:

2018-10-15  Andreas Krebbel  

* config/s390/s390.c (s390_expand_vec_init): Force vector element
into reg if it isn't a general operand.

gcc/testsuite/ChangeLog:

2018-10-15  Andreas Krebbel  

* g++.dg/vec-init-1.C: New test.



I've backported the patch to GCC 7 and 8 branch on 2018-10-19. Canonical is
aware of the problem and will pick the patch up for their next GCC updates.

Could you please check whether this fixes your problem?

[Bug tree-optimization/88964] [8/9 Regression] ICE in wide_int_to_tree_1, at tree.c:1561

2019-01-22 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88964

Jakub Jelinek  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
 CC||jakub at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek  ---
--- gcc/gimple-loop-interchange.cc.jj   2019-01-01 12:37:17.416970701 +0100
+++ gcc/gimple-loop-interchange.cc  2019-01-22 11:34:42.303796570 +0100
@@ -692,7 +692,7 @@ loop_cand::analyze_induction_var (tree v
   iv->var = var;
   iv->init_val = init;
   iv->init_expr = chrec;
-  iv->step = build_int_cst (TREE_TYPE (chrec), 0);
+  iv->step = build_zero_cst (TREE_TYPE (chrec));
   m_inductions.safe_push (iv);
   return true;
 }

fixes this.  SCEV is able to deal with IVs that are neither integral nor
pointer (SCALAR_FLOAT_TYPE_P in this case), and so is create_iv; only
build_int_cst must not be used for them.

[Bug c++/88971] New: Branch optimization inconsistency (missed optimization)

2019-01-22 Thread maratrus at mail dot ru
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88971

Bug ID: 88971
   Summary: Branch optimization inconsistency (missed optimization)
   Product: gcc
   Version: 8.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: maratrus at mail dot ru
  Target Milestone: ---

Created attachment 45490
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45490&action=edit
A code that demonstrates different patterns in optimization technique

In the code attached I expect the compiler not to generate any code between two
`mfence` instructions in the method `CheckAndPrint()`.

Indeed, it does a good job if I call the `PrintGood()` method: no code
is generated. But if I switch to `PrintBad()`, or even a simple return,
the compiler generates code for the if-expression `if (t.j > 0)`.

In all three cases there seems to be no reason to generate any code.

The code attached is compiled as:

`g++ -std=c++11 -Ofast opt_template.cc -o opt_template`

I must be missing something, but is there a good reason why the compiler
managed to optimize the code in one case but not in the other two?

[Bug rtl-optimization/88953] Unrecognizable insn on architecture zEC12 with boost::bimap

2019-01-22 Thread Jan.Kossmann at hpi dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88953

--- Comment #5 from Jan Kossmann  ---
You are right, I verified with:

gcc version 9.0.0 20190122 (experimental) (GCC)
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O3'  '-o' 'test.cpp.o'
'-shared-libgcc' '-march=z13' '-mno-htm' '-mzarch' '-m64'
 gcc/bin/../libexec/gcc/s390x-ibm-linux-gnu/9.0.0/cc1plus -E -quiet -v
-imultiarch s390x-linux-gnu -iprefix
gcc/bin/../lib/gcc/s390x-ibm-linux-gnu/9.0.0/ -D_GNU_SOURCE test.cpp -march=z13
-mno-htm -mzarch -m64 -O3 -fpch-preprocess -o test.ii

and it worked out fine. Sorry for the trouble, thanks for your help!

[Bug rtl-optimization/88948] [9 Regression] ICE in elimination_costs_in_insn, at reload1.c:3640 since r264148

2019-01-22 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88948

--- Comment #2 from Uroš Bizjak  ---
The problem is with can_assign_to_reg_without_clobbers_p in gcse.c, where we
have:

  /* If the test insn is valid and doesn't need clobbers, and the target also
 has no objections, we're good.  */
  if (icode >= 0
  && (num_clobbers == 0 || !added_clobbers_hard_reg_p (icode))
  && ! (targetm.cannot_copy_insn_p
&& targetm.cannot_copy_insn_p (test_insn)))
can_assign = true;

The test instruction is created as:

(insn 26 0 0 (set (reg:SI 152)
(fix:SI (reg:DF 89))) -1
 (nil))

which is (correctly) recognized as

(define_insn "fix_trunc_i387_fisttp"
  [(set (match_operand:SWI248x 0 "nonimmediate_operand" "=m")
(fix:SWI248x (match_operand 1 "register_operand" "f")))
   (clobber (match_scratch:XF 2 "=&f"))]

However, recog also reports that 1 clobber needs to be added. The instruction
is accepted nevertheless due to the "|| !added_clobbers_hard_reg_p (icode)"
bypass: the recognized insn doesn't clobber a hard reg, but it does need a
clobber of a scratch reg to be recognized.

[Bug tree-optimization/88713] Vectorized code slow vs. flang

2019-01-22 Thread elrodc at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713

--- Comment #24 from Chris Elrod  ---
The dump looks like this:

  vect__67.78_217 = SQRT (vect__213.77_225);
  vect_ui33_68.79_248 = { 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0,
1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0
} / vect__67.78_217;
  vect__71.80_249 = vect__246.59_65 * vect_ui33_68.79_248;
  vect_u13_73.81_250 = vect__187.71_14 * vect_ui33_68.79_248;
  vect_u23_75.82_251 = vect__200.74_5 * vect_ui33_68.79_248;

so the vrsqrt optimization happens later. g++ shows the same problems with
weird code generation. However this:

 /* sqrt(a)  = -0.5 * a * rsqrtss(a) * (a * rsqrtss(a) * rsqrtss(a) - 3.0)
rsqrt(a) = -0.5 * rsqrtss(a) * (a * rsqrtss(a) * rsqrtss(a) - 3.0) */

does not match this:

vrsqrt14ps  %zmm1, %zmm2 # comparison and mask removed
vmulps  %zmm1, %zmm2, %zmm0
vmulps  %zmm2, %zmm0, %zmm1
vmulps  %zmm6, %zmm0, %zmm0
vaddps  %zmm7, %zmm1, %zmm1
vmulps  %zmm0, %zmm1, %zmm1
vrcp14ps %zmm1, %zmm0
vmulps  %zmm1, %zmm0, %zmm1
vmulps  %zmm1, %zmm0, %zmm1
vaddps  %zmm0, %zmm0, %zmm0
vsubps  %zmm1, %zmm0, %zmm0

Recommendations on the next place to look for what's going on?

[Bug middle-end/88950] stack_protect_prologue can be reordered by sched1 around memory accesses

2019-01-22 Thread ktkachov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88950

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed|2019-01-21 00:00:00 |2019-01-22
 CC||ktkachov at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #4 from ktkachov at gcc dot gnu.org ---
Confirmed on aarch64 then.

[Bug tree-optimization/88972] New: popcnt of limited 128-bit number with unnecessary zeroing

2019-01-22 Thread drepper.fsp+rhbz at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88972

Bug ID: 88972
   Summary: popcnt of limited 128-bit number with unnecessary zeroing
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: drepper.fsp+rhbz at gmail dot com
  Target Milestone: ---

Compile the following code on x86-64 with -Ofast -march=haswell:

int f(__uint128_t m)
{
  if (m < 64000)
return __builtin_popcount(m);
  return -1;
}


The generated code with the trunk gcc looks like this:

   0:   b8 ff f9 00 00          mov    $0xf9ff,%eax
   5:   48 39 f8                cmp    %rdi,%rax
   8:   b8 00 00 00 00          mov    $0x0,%eax
   d:   48 19 f0                sbb    %rsi,%rax
  10:   72 0e                   jb     20
  12:   31 c0                   xor    %eax,%eax
  14:   f3 0f b8 c7             popcnt %edi,%eax
  18:   c3                      retq
  19:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)
  20:   b8 ff ff ff ff          mov    $0xffffffff,%eax
  25:   c3                      retq


The instruction at offset 12 is unnecessary.  I guess this is a left-over from
the popcnt of the upper half which is recognized to be unnecessary and left
out.  There is no addition anymore but somehow the register clearing survived.

[Bug tree-optimization/88713] Vectorized code slow vs. flang

2019-01-22 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713

--- Comment #25 from rguenther at suse dot de  ---
On Tue, 22 Jan 2019, elrodc at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
> 
> --- Comment #24 from Chris Elrod  ---
> The dump looks like this:
> 
>   vect__67.78_217 = SQRT (vect__213.77_225);
>   vect_ui33_68.79_248 = { 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0,
> 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0
> } / vect__67.78_217;
>   vect__71.80_249 = vect__246.59_65 * vect_ui33_68.79_248;
>   vect_u13_73.81_250 = vect__187.71_14 * vect_ui33_68.79_248;
>   vect_u23_75.82_251 = vect__200.74_5 * vect_ui33_68.79_248;
> 
> so the vrsqrt optimization happens later. g++ shows the same problems with
> weird code generation. However this:
> 
>  /* sqrt(a)  = -0.5 * a * rsqrtss(a) * (a * rsqrtss(a) * rsqrtss(a) - 3.0)
> rsqrt(a) = -0.5 * rsqrtss(a) * (a * rsqrtss(a) * rsqrtss(a) - 3.0) */
> 
> does not match this:
> 
> vrsqrt14ps  %zmm1, %zmm2 # comparison and mask removed
> vmulps  %zmm1, %zmm2, %zmm0
> vmulps  %zmm2, %zmm0, %zmm1
> vmulps  %zmm6, %zmm0, %zmm0
> vaddps  %zmm7, %zmm1, %zmm1
> vmulps  %zmm0, %zmm1, %zmm1
> vrcp14ps %zmm1, %zmm0
> vmulps  %zmm1, %zmm0, %zmm1
> vmulps  %zmm1, %zmm0, %zmm1
> vaddps  %zmm0, %zmm0, %zmm0
> vsubps  %zmm1, %zmm0, %zmm0
> 
> Recommendations on the next place to look for what's going on?

You can try enabling -mrecip to see RSQRT in .optimized - there's
probably late 1/sqrt optimization on RTL.

[Bug target/88963] gcc generates terrible code for vectors of 64+ length which are not natively supported

2019-01-22 Thread husseydevin at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88963

--- Comment #9 from Devin Hussey  ---
(In reply to Andrew Pinski from comment #6)
> Try using 128 (or 256) and you might see that aarch64 falls down similarly.

yup. Oof.

test:
sub sp, sp, #560
stp x29, x30, [sp]
mov x29, sp
stp x19, x20, [sp, 16]
mov x19, 128
mov x20, x0
add x0, sp, 176
str x21, [sp, 32]
mov x21, x2
mov x2, x19
bl  memcpy
mov x2, x19
mov x1, x21
add x0, sp, 304
bl  memcpy
ldr q7, [sp, 176]
mov x2, x19
ldr q6, [sp, 192]
add x1, sp, 48
ldr q5, [sp, 208]
mov x0, x20
ldr q4, [sp, 224]
ldr q3, [sp, 240]
ldr q2, [sp, 256]
ldr q1, [sp, 272]
ldr q0, [sp, 288]
ldr q23, [sp, 304]
ldr q22, [sp, 320]
ldr q21, [sp, 336]
ldr q20, [sp, 352]
ldr q19, [sp, 368]
ldr q18, [sp, 384]
ldr q17, [sp, 400]
ldr q16, [sp, 416]
add v7.4s, v7.4s, v23.4s
add v6.4s, v6.4s, v22.4s
add v5.4s, v5.4s, v21.4s
add v4.4s, v4.4s, v20.4s
add v3.4s, v3.4s, v19.4s
str q7, [sp, 48]
add v2.4s, v2.4s, v18.4s
str q6, [sp, 64]
add v1.4s, v1.4s, v17.4s
str q5, [sp, 80]
add v0.4s, v0.4s, v16.4s
str q4, [sp, 96]
str q3, [sp, 112]
str q2, [sp, 128]
str q1, [sp, 144]
str q0, [sp, 160]
bl  memcpy
ldp x29, x30, [sp]
ldp x19, x20, [sp, 16]
ldr x21, [sp, 32]
add sp, sp, 560
ret

[Bug rtl-optimization/88953] Unrecognizable insn on architecture zEC12 with boost::bimap

2019-01-22 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88953

Jakub Jelinek  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Jakub Jelinek  ---
Fixed then on all active branches.

[Bug tree-optimization/88964] [8/9 Regression] ICE in wide_int_to_tree_1, at tree.c:1561

2019-01-22 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88964

--- Comment #4 from Jakub Jelinek  ---
Created attachment 45491
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45491&action=edit
gcc9-pr88964.patch

Untested fix.

[Bug tree-optimization/88713] Vectorized code slow vs. flang

2019-01-22 Thread elrodc at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713

--- Comment #26 from Chris Elrod  ---
> You can try enabling -mrecip to see RSQRT in .optimized - there's
> probably late 1/sqrt optimization on RTL.

No luck. The full commands I used:

gfortran -Ofast -mrecip -S -fdump-tree-optimized -march=native -shared -fPIC
-mprefer-vector-width=512 -fno-semantic-interposition -o
gfortvectorizationdump.s  vectorization_test.f90

g++ -mrecip -Ofast -fdump-tree-optimized -S -march=native -shared -fPIC
-mprefer-vector-width=512 -fno-semantic-interposition -o
gppvectorization_test.s  vectorization_test.cpp

g++'s output was similar:

  vect_U33_60.31_372 = SQRT (vect_S33_59.30_371);
  vect_Ui33_61.32_374 = { 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0,
1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0
} / vect_U33_60.31_372;
  vect_U13_62.33_375 = vect_S13_47.24_359 * vect_Ui33_61.32_374;
  vect_U23_63.34_376 = vect_S23_53.27_365 * vect_Ui33_61.32_374;

and it has the same assembly as gfortran for the rsqrt:

vcmpps  $4, %zmm0, %zmm5, %k1
vrsqrt14ps  %zmm0, %zmm1{%k1}{z}
vmulps  %zmm0, %zmm1, %zmm2
vmulps  %zmm1, %zmm2, %zmm0
vmulps  %zmm6, %zmm2, %zmm2
vaddps  %zmm7, %zmm0, %zmm0
vmulps  %zmm2, %zmm0, %zmm0
vrcp14ps %zmm0, %zmm10
vmulps  %zmm0, %zmm10, %zmm0
vmulps  %zmm0, %zmm10, %zmm0
vaddps  %zmm10, %zmm10, %zmm10
vsubps  %zmm0, %zmm10, %zmm10

[Bug middle-end/88950] stack_protect_prologue can be reordered by sched1 around memory accesses

2019-01-22 Thread matmal01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88950

Matthew Malcomson  changed:

   What|Removed |Added

  Known to fail||5.4.0

--- Comment #5 from Matthew Malcomson  ---
This problem has been around for a long time -- I have seen the same
fundamental problem on gcc 5.4 (when looking for a version to put in the "known
to work" field).

With "gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609" on the same
testcase, the stack_protect_test pattern gets reordered to before the second
memory access (the "buf[b] = c" line), and again the stack protection does not
guard this memory access.


(insn:TI 8 126 16 (parallel [
(set (mem/v/f/c:DI (plus:DI (reg/f:DI 29 x29)
(const_int 88 [0x58])) [1 D.2834+0 S8 A64])
(unspec:DI [
(mem/v/f/c:DI (reg/f:DI 3 x3 [100]) [1
__stack_chk_guard+0 S8 A64])
] UNSPEC_SP_SET))
(set (reg:DI 5 x5 [126])
(const_int 0 [0]))
]) stack-reorder.c:1 864 {stack_protect_set_di}
 (expr_list:REG_UNUSED (reg:DI 5 x5 [126])
(nil)))
(insn:TI 16 8 71 (set (mem/j:QI (plus:DI (reg:DI 0 x0 [105])
(const_int 4016 [0xfb0])) [0 buf S1 A8])
(reg:QI 4 x4 [106])) stack-reorder.c:3 45 {*movqi_aarch64}
 (expr_list:REG_DEAD (reg:QI 4 x4 [106])
(expr_list:REG_DEAD (reg:DI 0 x0 [105])
(nil))))
(insn 71 16 22 (parallel [
(set (reg:DI 3 x3 [125])
(unspec:DI [
(mem/v/f/c:DI (plus:DI (reg/f:DI 29 x29)
(const_int 88 [0x58])) [1 D.2834+0 S8 A64])
(mem/v/f/c:DI (reg/f:DI 3 x3 [100]) [1
__stack_chk_guard+0 S8 A64])
] UNSPEC_SP_TEST))
(clobber (reg:DI 0 x0 [127]))
]) stack-reorder.c:14 866 {stack_protect_test_di}
 (expr_list:REG_UNUSED (reg:DI 0 x0 [127])
(nil)))
(insn:TI 22 71 140 (set (mem/j:QI (plus:DI (reg:DI 1 x1 [110])
(const_int 4016 [0xfb0])) [0 buf S1 A8])
(reg:QI 2 x2 [ c ])) stack-reorder.c:4 45 {*movqi_aarch64}
 (expr_list:REG_DEAD (reg:QI 2 x2 [ c ])
(expr_list:REG_DEAD (reg:DI 1 x1 [110])
(nil))))
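The reordering is easier to follow with the shape of the testcase in mind. The function below is a hypothetical sketch: the name two_stores, the buffer size and the parameter types are assumptions, and the actual stack-reorder.c is not quoted here. It performs two stores into a local buffer at caller-controlled indices. With -fstack-protector-all, the stack_protect_test pattern should come after both stores. In the sched1 dump above, however, it sits between them, so an out-of-bounds buf[b] write would not be caught before the function returns.

```c
#include <stddef.h>

/* Hypothetical shape of the testcase; compile on an aarch64 target with
   e.g. gcc -O2 -fstack-protector-all -fdump-rtl-sched1 -S
   and look at where stack_protect_test_di ends up. */
int two_stores(size_t a, size_t b, char c)
{
    char buf[4096];   /* size is an assumption */
    buf[a] = 1;       /* first store (stack-reorder.c:3 in the dump) */
    buf[b] = c;       /* second store ("buf[b] = c", stack-reorder.c:4) */
    return buf[a] + buf[b];
}
```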

[Bug target/88954] __attribute__((noplt)) doesn't work with function pointers

2019-01-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88954

--- Comment #5 from Richard Biener  ---
For indirect calls the attributes on the function type pointed to are relevant.
Unioning attributes from the actually called function (if the compiler can
figure that out) can be appropriate depending on the actual attribute.

[Bug tree-optimization/88713] Vectorized code slow vs. flang

2019-01-22 Thread elrodc at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713

--- Comment #27 from Chris Elrod  ---
g++ -mrecip=all -O3  -fno-signed-zeros -fassociative-math -freciprocal-math
-fno-math-errno -ffinite-math-only -fno-trapping-math -fdump-tree-optimized -S
-march=native -shared -fPIC -mprefer-vector-width=512
-fno-semantic-interposition -o gppvectorization_test.s  vectorization_test.cpp

is not enough to get vrsqrt. I need -funsafe-math-optimizations for the
instruction to appear in the asm.

[Bug tree-optimization/88973] New: New -Wrestrict warning since r268048

2019-01-22 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88973

Bug ID: 88973
   Summary: New -Wrestrict warning since r268048
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: marxin at gcc dot gnu.org
CC: msebor at gcc dot gnu.org
  Target Milestone: ---

Created attachment 45492
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45492&action=edit
test-case

The test-case comes from autogen package:

$ gcc autogen.i -c -O2 -Werror=restrict
In function ‘strcpy’,
inlined from ‘canonicalize_pathname’ at autogen.i:10536:17,
inlined from ‘option_pathfind.constprop’ at autogen.i:10420:32:
autogen.i:4050:10: error: ‘__builtin_strcpy’ accessing 1 byte at offsets [0, 9223372036854775807] and [0, 9223372036854775807] may overlap 1 byte at offset 0 [-Werror=restrict]
 4050 |   return __builtin___strcpy_chk (__dest, __src, __builtin_object_size (__dest, 2 > 1));
      |          ^
cc1: some warnings being treated as errors

Martin, can you please verify that the warning is correct?
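Some background on what is being checked here: -Wrestrict flags calls such as strcpy whose source and destination may overlap, because an overlapping strcpy is undefined behaviour in ISO C. Whether the overlap in canonicalize_pathname can actually happen is the open question. If it could, the usual fix is memmove, which is defined for overlapping buffers. The following is a minimal sketch; drop_prefix is a hypothetical helper, not autogen code.

```c
#include <string.h>

/* Remove the first n characters of s in place (clamped to strlen(s)).
   Source and destination overlap, so memmove is required here;
   strcpy(s, s + n) would be undefined and is the kind of call
   -Wrestrict diagnoses. */
void drop_prefix(char *s, size_t n)
{
    size_t len = strlen(s);
    if (n > len)
        n = len;
    /* len - n + 1 also copies the terminating NUL */
    memmove(s, s + n, len - n + 1);
}
```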

[Bug c/88955] transparent_union for vector types not accepted

2019-01-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88955

Richard Biener  changed:

   What|Removed |Added

   Keywords||rejects-valid
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-01-22
 CC||hjl.tools at gmail dot com,
   ||jsm28 at gcc dot gnu.org,
   ||rguenth at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #2 from Richard Biener  ---
Hmm.  I guess the "issue" is that the union has TImode rather than V2DImode.
stor-layout doesn't look at TYPE_TRANSPARENT_AGGR at all though.  Relevant
is

  /* If we only have one real field; use its mode if that mode's size
     matches the type's size.  This generally only applies to RECORD_TYPE.
     For UNION_TYPE, if the widest field is MODE_INT then use that mode.
     If the widest field is MODE_PARTIAL_INT, and the union will be passed
     by reference, then use that mode.  */
  poly_uint64 type_size;
  if ((TREE_CODE (type) == RECORD_TYPE
       || (TREE_CODE (type) == UNION_TYPE
           && (GET_MODE_CLASS (mode) == MODE_INT
               || (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT
                   && targetm.calls.pass_by_reference (pack_cumulative_args (0),
                                                       mode, type, 0)))))
      && mode != VOIDmode
      && poly_int_tree_p (TYPE_SIZE (type), &type_size)
      && known_eq (GET_MODE_BITSIZE (mode), type_size))
    ;
  else
    mode = mode_for_size_tree (TYPE_SIZE (type), MODE_INT, 1).else_blk ();

where we reject vector modes.  The C++ diagnostic is a bit clearer:

> g++ t.c -S
t.c:5:1: error: type transparent ‘union’ cannot be made transparent
because the type of the first field has a different ABI from the class overall
 {
 ^

which hints at the implementation of the argument passing being the culprit
for the restriction (not sure why the ABI of the class overall should matter
given the docs of transparent_union say the ABI is specified by the first
field...)
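For contrast with the rejected vector case, GCC accepts a transparent union whose members are all pointers. There the widest field has an integer mode (MODE_INT) whose size matches the union, so the code quoted above keeps that mode and it agrees with the first field. With a vector first field (MODE_VECTOR_INT), the same code falls through to mode_for_size_tree and picks TImode instead. The following is a minimal sketch using the attribute syntax from the GCC manual; byte_source and first_byte are hypothetical names.

```c
/* All members are pointers, so the union gets the same integer mode as
   its first field and the transparent_union attribute is accepted. */
typedef union {
    const char *cp;
    const int *ip;
} byte_source __attribute__((__transparent_union__));

/* Callers can pass a const char * or a const int * directly; the
   argument is passed as if it were the first member. */
int first_byte(byte_source src)
{
    return *(const unsigned char *)src.cp;
}
```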

[Bug tree-optimization/88862] [9 Regression] ICE in extract_affine, at graphite-sese-to-poly.c:313

2019-01-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88862

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Richard Biener  ---
Fixed.

[Bug tree-optimization/88862] [9 Regression] ICE in extract_affine, at graphite-sese-to-poly.c:313

2019-01-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88862

--- Comment #4 from Richard Biener  ---
Author: rguenth
Date: Tue Jan 22 11:28:56 2019
New Revision: 268147

URL: https://gcc.gnu.org/viewcvs?rev=268147&root=gcc&view=rev
Log:
2019-01-22  Richard Biener  

PR tree-optimization/88862
* graphite-scop-detection.c
(scop_detection::graphite_can_represent_scev): Reject ADDR_EXPR.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/graphite-scop-detection.c

[Bug tree-optimization/88964] [8/9 Regression] ICE in wide_int_to_tree_1, at tree.c:1561

2019-01-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88964

--- Comment #5 from Richard Biener  ---
Hmm, I wonder if handling FP inductions during interchange causes correctness
issues as well (FP rounding, etc.).  Otherwise the patch looks obvious.

[Bug target/88965] powerpc64le vector builtin hits ICE in verify_gimple

2019-01-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88965

--- Comment #4 from Richard Biener  ---
LGTM

[Bug c/88968] [8/9 Regression] Stack overflow in gimplify_expr

2019-01-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88968

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2

[Bug c++/88969] [9 Regression] ICE in build_op_delete_call, at cp/call.c:6509

2019-01-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88969

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P1
   Target Milestone|--- |9.0
Summary|ICE in  |[9 Regression] ICE in
   |build_op_delete_call, at|build_op_delete_call, at
   |cp/call.c:6509  |cp/call.c:6509

[Bug tree-optimization/88964] [8/9 Regression] ICE in wide_int_to_tree_1, at tree.c:1561

2019-01-22 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88964

--- Comment #6 from Jakub Jelinek  ---
In the spot I'm changing, IMHO it shouldn't: that + 0.0 really should be
folded (and if not, we should tweak create_iv not to do any addition if
real_zerop).  Though of course for other floating-point IVs where the step is
non-zero it could make a difference.

[Bug tree-optimization/88970] ICE: verify_ssa failed (error: definition in block 2 follows the use)

2019-01-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88970

Richard Biener  changed:

   What|Removed |Added

 CC||jason at gcc dot gnu.org
Version|unknown |9.0

--- Comment #2 from Richard Biener  ---
Looks like a missing/incomplete DECL_EXPR.

;; Function void d() (null)
;; enabled by -tree-original


{
  typedef int e[0:(sizetype) SAVE_EXPR ];
^^^ shouldn't this have (ssizetype) b (1) + -1)?

  int f[0:(sizetype) SAVE_EXPR ];
  int c;
  typedef struct __lambda0 __lambda0;

ssizetype D.2306;
  < (1) + -1) >;
  <];>>;
int c;
  <::operator() (&TARGET_EXPR ) >;
}

[Bug libstdc++/88971] Branch optimization inconsistency (missed optimization)

2019-01-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88971

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization
 CC||rguenth at gcc dot gnu.org
  Component|c++ |libstdc++

--- Comment #1 from Richard Biener  ---
This is because it still needs to generate the std::string objects at the
caller
site (outside of the if (print)).  This involves quite some code to get
rid of, and even at -O3 we do not inline basic_string::basic_string it seems
(ISTR that is out-of-line in the library):

  __asm__ __volatile__("mfence" :  :  : "memory");
  _6 = MEM[(const int *)&data + 4B];
  if (_6 > 0)
goto ; [41.48%]
  else
goto ; [58.52%]

   [local count: 445388109]:
  std::basic_string::basic_string (&D.39204, "<", &D.39205);
  _7 = MEM[(char * *)&D.39204];
  _8 = _7 + 18446744073709551592;
  if (_8 != &_S_empty_rep_storage)
goto ; [10.00%]
  else
goto ; [90.00%]

   [local count: 434030711]:
  goto ; [100.00%]

   [local count: 44538811]:
  if (__gthrw___pthread_key_create != 0B)
goto ; [53.47%]
  else
goto ; [46.53%]

   [local count: 23814902]:
  _9 = &MEM[(struct _Rep *)_7 + -24B].D.23940._M_refcount;
  _10 = __atomic_fetch_add_4 (_9, 4294967295, 4);
  _11 = (int) _10;
  goto ; [100.00%]

   [local count: 20723909]:
  __result_12 = MEM[(_Atomic_word *)_7 + -8B];
  _13 = __result_12 + -1;
  MEM[(_Atomic_word *)_7 + -8B] = _13;

   [local count: 44538811]:
  # _14 = PHI <_11(6), __result_12(7)>
  if (_14 <= 0)
goto ; [25.50%]
  else
goto ; [74.50%]

   [local count: 11357397]:
  std::basic_string::_Rep::_M_destroy (_8, &D.39206);

   [local count: 445388108]:
  D.39206 ={v} {CLOBBER};
  D.39204 ={v} {CLOBBER};
  D.39205 ={v} {CLOBBER};

   [local count: 1073741825]:
  __asm__ __volatile__("mfence" :  :  : "memory");
  data ={v} {CLOBBER};

[Bug target/88972] popcnt of limited 128-bit number with unnecessary zeroing

2019-01-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88972

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization
 Target||x86_64-*-*, i?86-*-*
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-01-22
  Component|tree-optimization   |target
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
Err, __builtin_popcount has an integer argument so you call popcount on
(int)m.  The reason must be different.

(insn 17 16 26 4 (parallel [
(set (reg:SI 88 [  ])
(popcount:SI (subreg:SI (reg/v:TI 89 [ m ]) 0)))
(clobber (reg:CC 17 flags))
]) "t.c":4 -1
 (nil))
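Richard's argument-type point can be shown directly. __builtin_popcount takes an unsigned int, so an unsigned __int128 argument is first converted to its low 32 bits. Counting all 128 bits takes two 64-bit popcounts. The following is a minimal sketch, assuming a target that provides unsigned __int128 (such as x86_64); popcount_u128 is a hypothetical helper name.

```c
/* Population count of all 128 bits, split into two 64-bit halves. */
int popcount_u128(unsigned __int128 m)
{
    unsigned long long lo = (unsigned long long)m;         /* bits 0..63 */
    unsigned long long hi = (unsigned long long)(m >> 64); /* bits 64..127 */
    return __builtin_popcountll(lo) + __builtin_popcountll(hi);
}
```

Passing the same value straight to __builtin_popcount only counts the low 32 bits, which matches the (int)m subreg in the RTL above.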

[Bug tree-optimization/88964] [8/9 Regression] ICE in wide_int_to_tree_1, at tree.c:1561

2019-01-22 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88964

--- Comment #7 from Jakub Jelinek  ---
Actually no, with HONOR_SIGNED_ZEROS it shouldn't be optimized out.
So, if we don't have another way to distinguish between a normal chrec
with step +0.0 and a loop-invariant variable, we should punt at least for
HONOR_SIGNED_ZEROS.
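The signed-zero concern can be checked directly. Under the default round-to-nearest mode, (-0.0) + (+0.0) is +0.0, so folding x + 0.0 to x is only valid when signed zeros need not be honoured. By contrast, x - 0.0 leaves a negative zero negative. The following is a minimal sketch; the function names are hypothetical and only illustrate the arithmetic, not the create_iv code.

```c
#include <math.h>

/* Not an identity: add_pos_zero(-0.0) yields +0.0. */
double add_pos_zero(double x) { return x + 0.0; }

/* Preserves the sign of zero: sub_pos_zero(-0.0) yields -0.0. */
double sub_pos_zero(double x) { return x - 0.0; }
```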

[Bug tree-optimization/88973] [8/9 Regression] New -Wrestrict warning since r268048

2019-01-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88973

Richard Biener  changed:

   What|Removed |Added

   Keywords||diagnostic
   Priority|P3  |P2
  Known to work||8.2.0
   Target Milestone|--- |8.3
Summary|New -Wrestrict warning  |[8/9 Regression] New
   |since r268048   |-Wrestrict warning since
   ||r268048
  Known to fail||8.2.1

--- Comment #1 from Richard Biener  ---
I believe the change was backported as well.

[Bug tree-optimization/88964] [8/9 Regression] ICE in wide_int_to_tree_1, at tree.c:1561

2019-01-22 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88964

Jakub Jelinek  changed:

   What|Removed |Added

  Attachment #45491|0   |1
is obsolete||

--- Comment #8 from Jakub Jelinek  ---
Created attachment 45493
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45493&action=edit
gcc9-pr88964.patch

Updated patch.

[Bug libstdc++/88971] Branch optimization inconsistency (missed optimization)

2019-01-22 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88971

--- Comment #2 from Jonathan Wakely  ---
(In reply to Richard Biener from comment #1)
> rid of, and even at -O3 we do not inline basic_string::basic_string it seems
> (ISTR that is out-of-line in the library):

There's an explicit instantiation in the library, but the definition is inline
in the headers. If the compiler wanted to inline it, all the code is visible
and nothing forces it to use the explicit instantiation in the library.

[Bug target/88972] popcnt of limited 128-bit number with unnecessary zeroing

2019-01-22 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88972

Uroš Bizjak  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |INVALID

--- Comment #2 from Uroš Bizjak  ---
This is by design.

/* X86_TUNE_AVOID_FALSE_DEP_FOR_BMI: Avoid false dependency
   for bit-manipulation instructions.  */
DEF_TUNE (X86_TUNE_AVOID_FALSE_DEP_FOR_BMI, "avoid_false_dep_for_bmi",
  m_SANDYBRIDGE | m_CORE_AVX2 | m_GENERIC)

[Bug libstdc++/88971] Branch optimization inconsistency (missed optimization)

2019-01-22 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88971

--- Comment #3 from Jonathan Wakely  ---
(In reply to Richard Biener from comment #1)
> This is because it still needs to generate the std::string objects at the
> caller
> site (outside of the if (print)).  This involves quite some code to get
> rid of, and even at -O3 we do not inline basic_string::basic_string it seems
> (ISTR that is out-of-line in the library):
> 
>   __asm__ __volatile__("mfence" :  :  : "memory");
>   _6 = MEM[(const int *)&data + 4B];
>   if (_6 > 0)
> goto ; [41.48%]
>   else
> goto ; [58.52%]
> 
>[local count: 445388109]:
>   std::basic_string::basic_string (&D.39204, "<", &D.39205);
>   _7 = MEM[(char * *)&D.39204];
>   _8 = _7 + 18446744073709551592;
>   if (_8 != &_S_empty_rep_storage)
> goto ; [10.00%]
>   else
> goto ; [90.00%]

Looks like you're using -D_GLIBCXX_USE_CXX11_ABI=0 but the OP is not.

[Bug target/88952] The asm operator modifiers for rs6000 should be documented like they are for x86

2019-01-22 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88952

--- Comment #11 from Uroš Bizjak  ---
(In reply to Christopher Leonard from comment #10)
> Getting contradictory statements now:
> >reg:reg+1 maps to lo:hi on x86.
> >On x86, we don't allow register pairs in asm at all.
> 
> Not allowing, or printing a warning, is much better behavior than what I
> have been getting on PPC.

Ah, sorry - x86 emits a warning.
