[Bug fortran/88750] [9 Regression] runtime error in statically linked binaries

2019-01-09 Thread iains at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88750

--- Comment #17 from Iain Sandoe  ---
(In reply to Jürgen Reuter from comment #16)
> Yes, after the problem occurred, I did a completely clean new build of gmp,
> mpfr, mpc, gcc (configured with ../configure --prefix=/usr/local/
> --with-gmp=/usr/local/ --with-mpfr=/usr/local/ --with-mpc=/usr/local/
> --enable-checking=release --enable-languages=c,c++,fortran,lto),
> all the tools our software depends, and our software.

OK, FWIW (thinking a bit more last night) if you examine the logs from building
GCC, you will see the same linker complaint in the log for building
libstdc++.dylib.  Which kinda reinforces the expectation that this is not the
source of the problem.  However, I'm thinking to try and construct some small
experiment to check that the newer ld64 doesn't do something active as well as
complain.

> It turns out that
> external C++ libraries linked into our (Fortran) project via bind(C)

I might be wrong, but suspect there was some change to the C binding around
that time too - but I also recall seeing a recent patch go by to fix a problem
in that area (but not sure if it's been applied yet).  Will let Dominique
comment on that.

> are not
> a problem if they have been built via libtool, such that a .dylib, a .a and
> a .la file are present. The two projects that have problem either exist as
> .dylib and .a produced by hand-written configure and makefiles (i.e. not
> using autotools), or only as dynamic libraries produced via cmake and make.

That's an interesting observation; what we need is to find the specific
difference in the output exe.

* Narrowing this down by knowing where and what causes the problem will become
important at some point - so a debug build and lldb session could be a useful
next step.

* As a general rule, it's also useful to see if an -O0 build exhibits the
problem - in case it's an optimisation issue.

[Bug middle-end/88758] [9 Regression] 186.crafty in SPEC CPU 2000 failed to build

2019-01-09 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88758

--- Comment #5 from Martin Liška  ---
What about this:

$ cat 11.i
void PreEvaluate(void);
int main() { PreEvaluate(); return 0; }

$ cat 22.i
extern int a[];
int b;
int c;

void PreEvaluate(void) {
  b = 0;
  for (; b < 8; b++)
    a[b] = c * (b > 0 ? b - 1 : 0);
}

$ gcc-8 11.i 22.i -flto -O3 -shared -fPIC
$ gcc 11.i 22.i -flto -O3 -shared -fPIC
during GIMPLE pass: dom
22.i: In function ‘PreEvaluate’:
22.i:5:6: internal compiler error: Segmentation fault
5 | void PreEvaluate(void) {
  |  ^
0xc186df crash_signal
/home/marxin/Programming/gcc/gcc/toplev.c:326
0x76d8910f ???
/usr/src/debug/glibc-2.27-6.1.x86_64/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
0xeb6184 location_wrapper_p(tree_node const*)
/home/marxin/Programming/gcc/gcc/tree.h:3807
0xeb6184 tree_strip_any_location_wrapper(tree_node*)
/home/marxin/Programming/gcc/gcc/tree.h:3819
0xeb6184 initializer_each_zero_or_onep(tree_node const*)
/home/marxin/Programming/gcc/gcc/tree.c:11239
0xeb6264 initializer_each_zero_or_onep(tree_node const*)
/home/marxin/Programming/gcc/gcc/tree.c:11259
0x1083fcf gimple_simplify_MULT_EXPR
/dev/shm/objdir/gcc/gimple-match.c:47953
0xfa636f gimple_simplify
/dev/shm/objdir/gcc/gimple-match.c:90161
0xfa79a3 gimple_resimplify2(gimple**, gimple_match_op*, tree_node*
(*)(tree_node*))
/home/marxin/Programming/gcc/gcc/gimple-match-head.c:285
0x10bb1df gimple_simplify(gimple*, gimple_match_op*, gimple**, tree_node*
(*)(tree_node*), tree_node* (*)(tree_node*))
/home/marxin/Programming/gcc/gcc/gimple-match-head.c:895
0x98f334 fold_stmt_1
/home/marxin/Programming/gcc/gcc/gimple-fold.c:4934
0xd2c566 dom_opt_dom_walker::optimize_stmt(basic_block_def*,
gimple_stmt_iterator)
/home/marxin/Programming/gcc/gcc/tree-ssa-dom.c:1967
0xd2db2c dom_opt_dom_walker::before_dom_children(basic_block_def*)
/home/marxin/Programming/gcc/gcc/tree-ssa-dom.c:1468
0x13fd3a7 dom_walker::walk(basic_block_def*)
/home/marxin/Programming/gcc/gcc/domwalk.c:353
0xd2e99d execute
/home/marxin/Programming/gcc/gcc/tree-ssa-dom.c:706

[Bug c++/88752] ICE in enclosing_instantiation_of, at cp/pt.c:13328

2019-01-09 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88752

Matthias Kretz  changed:

   What|Removed |Added

  Attachment #45376|0   |1
is obsolete||

--- Comment #4 from Matthias Kretz  ---
Created attachment 45385
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45385&action=edit
valid code test case

True, I made an error in the verification script. Better reduction attached.

[Bug tree-optimization/88739] [7/8/9 Regression] Big-endian union bug

2019-01-09 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739

--- Comment #24 from rguenther at suse dot de  ---
On Wed, 9 Jan 2019, dongjianqiang2 at huawei dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739
> 
> --- Comment #23 from John Dong  ---
> diff -urp a/gcc/expr.c b/gcc/expr.c
> --- a/gcc/expr.c	2019-01-09 03:19:03.750205982 +0800
> +++ b/gcc/expr.c	2019-01-09 03:38:23.414174738 +0800
> @@ -10760,6 +10760,16 @@ expand_expr_real_1 (tree exp, rtx target
> && GET_MODE_CLASS (ext_mode) == MODE_INT)
>   reversep = TYPE_REVERSE_STORAGE_ORDER (type);
> 
> +   int modePrecision = GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE
> (tem)));
> +   int typePrecision = TYPE_PRECISION (TREE_TYPE (tem));
> +   int shiftSize = modePrecision - typePrecision;
> +   rtx regTarget = gen_reg_rtx (GET_MODE (op0));
> +
> +   if (shiftSize && REG_P (op0))
> + op0 = expand_shift (LSHIFT_EXPR, GET_MODE (op0), op0,
> + shiftSize, regTarget,
> + TYPE_UNSIGNED (TREE_TYPE (tem)));
> +
> op0 = extract_bit_field (op0, bitsize, bitpos, unsignedp,
>  (modifier == EXPAND_STACK_PARM
>   ? NULL_RTX : target),
> 
> Tried to fix the bug when expand.

The bug is clearly in value-numbering, not RTL expansion.

[Bug rtl-optimization/49330] Integer arithmetic on addresses optimised with pointer arithmetic rules

2019-01-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49330

--- Comment #19 from Richard Biener  ---
(In reply to Richard Biener from comment #18)
> So for find_base_term to compute sth conservative we'd need to track
> RTX_SURELY_NON_POINTER (what RTX is surely _not_ based on a pointer
> and thus can be ignored).  And when find_base_term ever figures
> two bases in say a PLUS it has to conservatively return 0.
> 
> I fear the existing REG_POINTER does not help at all.  For the testcase
> we have
> 
> (plus:DI (reg:DI 83 [ d.0_2 ])
> (symbol_ref:DI ("y") [flags 0x2]  ))
> 
> where reg:DI 83 is not marked with REG_POINTER and find_base_term
> doesn't find it to be an alternate base.  For the testcase the
> offending MEM has a MEM_EXPR and we have proper points-to info.
> 
> IMHO the proper solution is to kill base_alias_check or all problematic
> cases in find_base_term (binary ops with more than one non-CONST_INT
> operand).
> 
> And eventually make sure to more properly preserve MEM_EXPRs.
> 
> Maybe sth as "simple" as the following which of course fixes the
> testcase but will make find_base_term fail on any variable-indexed
> thing.
> 
> diff --git a/gcc/alias.c b/gcc/alias.c
> index 93f53543d12..3a66e10b431 100644
> --- a/gcc/alias.c
> +++ b/gcc/alias.c
> @@ -2009,12 +2009,14 @@ find_base_term (rtx x, vec rtx base = find_base_term (tmp1, visited_vals);
> if (base != NULL_RTX
> && ((REG_P (tmp1) && REG_POINTER (tmp1))
> -|| known_base_value_p (base)))
> +|| known_base_value_p (base))
> +   && CONST_INT_P (tmp2))
>   return base;
> base = find_base_term (tmp2, visited_vals);
> if (base != NULL_RTX
> && ((REG_P (tmp2) && REG_POINTER (tmp2))
> -|| known_base_value_p (base)))
> +|| known_base_value_p (base))
> +   && CONST_INT_P (tmp1))
>   return base;
>  
> /* We could not determine which of the two operands was the

"benchmarking" this by comparing cc1 with/without shows a difference mostly
in scheduling (but the number of differences is comparatively small!).  Also
overall text size shrinks with the patch (whatever that means).

On GIMPLE we try hard not to construct addresses "based" on the wrong
object; in fact, IVOPTs has code to avoid building IVs based on
things like &a - &b, and propagation avoids turning uintptr_t arithmetic
into pointer arithmetic even if it can see the addresses it was converted from.

All those things cannot be done on RTL since we lost the distinction between
pointers and integers and there's only PLUS.

So I have a _very_ hard time seeing how RTL can ever be fixed to discover
bases for alias analysis purposes without just resorting to MEM_EXPRs.

That is, unless we want to live with this kind of wrong-code bugs.

Similarly fishy is may_be_sp_based_p.
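
For illustration only, here is a minimal C sketch of the kind of source-level integer arithmetic on addresses discussed above. It is not the PR's testcase; the helper name pointer_to_y_via_x and the globals x and y are hypothetical. The sum is computed purely on uintptr_t, so on GIMPLE it is not pointer arithmetic "based" on x, whereas on RTL it becomes an ordinary PLUS involving the address of x.

```c
#include <assert.h>
#include <stdint.h>

static int x, y;

/* Hypothetical helper: rebuild the address of y starting from the
   address of x, using only uintptr_t (integer) arithmetic.  */
static int *
pointer_to_y_via_x (void)
{
  uintptr_t d = (uintptr_t) &y - (uintptr_t) &x;  /* plain integer offset */
  uintptr_t addr = (uintptr_t) &x + d;            /* integer addition */

  /* Holds by modular unsigned arithmetic, whatever the layout of x and y.  */
  assert (addr == (uintptr_t) &y);
  return (int *) addr;
}
```

A store through the returned pointer modifies y, not x, so an alias oracle that derives the base purely from the "&x + d" shape would be looking at the wrong object.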

[Bug fortran/88750] [9 Regression] runtime error in statically linked binaries

2019-01-09 Thread iains at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88750

--- Comment #18 from Iain Sandoe  ---
(In reply to Jürgen Reuter from comment #14)

Does the application use exceptions?

> This one is failing:
> gfortran -g -O2 -Wl,-rpath -Wl,/usr/local/packages/OpenLoops/lib -o
> static_1.exe .libs/static_1.exe_prclib_dispatcher.o 



> /usr/local/lib/libstdc++.a

^^^ please confirm that this is from the "current compiler build".



> -L/usr/local/packages/gcc_9.0/_build/x86_64-apple-darwin18.2.0/libstdc++-v3/
> libsupc++/.libs -lm 

 note - no "-lSystem -lgcc_ext.10.5" (which is what I'd expect).

> 
> while that one is working:
> 
> gfortran -g -O2 -Wl,-rpath -Wl,/usr/local/packages/OpenLoops/lib -o
> static_1.exe .libs/static_1.exe_prclib_dispatcher.o 



> libsupc++/.libs -lSystem -lgcc_ext.10.5 /usr/local//lib/libHepMC.a -lstdc++
> -llcio -lm

^^^ this looks like the build process in this case is adding libs that the
compiler driver normally adds ( they are not present in the case above ).

* Please extract these two Fortran link lines and execute them
separately in the build dir with "-v", so that we can see the output of the
compiler-driver's internal link line and what its search paths are.

* According to your posted otool output, the version of libstdc++.dylib that is
bound is the one in /usr/local/lib/ which is where you pick up the static lib
in the non-working case.

* The object files used to build the static (.a) and dynamic (.dylib) versions
of libstdc++ are the same, so we really need to pin down where the issue
occurs.

* DYLD_PRINT_LIBRARIES=1 DYLD_PRINT_BINDINGS=1  
will show you which libraries are used, and from which library each symbol is
resolved - it will probably produce a lot of output.

[Bug tree-optimization/87214] [9 Regression] r263772 miscompiled 520.omnetpp_r in SPEC CPU 2017

2019-01-09 Thread rsandifo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87214

rsandifo at gcc dot gnu.org  changed:

   What|Removed |Added

 Status|REOPENED|ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rsandifo at gcc dot gnu.org

--- Comment #4 from rsandifo at gcc dot gnu.org ---
Mine then.

[Bug libstdc++/88204] New test case 26_numerics/complex/operators/more_constexpr.cc from r266416 fails

2019-01-09 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88204

--- Comment #2 from Jonathan Wakely  ---
Author: redi
Date: Wed Jan  9 09:37:34 2019
New Revision: 267757

URL: https://gcc.gnu.org/viewcvs?rev=267757&root=gcc&view=rev
Log:
PR libstdc++/88204 disable std::complex tests

The IBM128 long double format isn't foldable in constant expressions, so
conditionally skip the std::complex cases when they'll
fail.

PR libstdc++/88204
* testsuite/26_numerics/complex/operators/more_constexpr.cc: Do not
test std::complex if long double format is IBM128.
* testsuite/26_numerics/complex/requirements/more_constexpr.cc:
Likewise.

Modified:
trunk/libstdc++-v3/ChangeLog
trunk/libstdc++-v3/testsuite/26_numerics/complex/operators/more_constexpr.cc
trunk/libstdc++-v3/testsuite/26_numerics/complex/requirements/more_constexpr.cc

[Bug tree-optimization/88763] Better Output for Loop Unswitching

2019-01-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88763

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-01-09
 CC||dmalcolm at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
I guess the logging should be switched to dump_* so that -fopt-info- can
report these.

[Bug c++/88761] [8/9 Regression] ICE in tsubst_copy, at cp/pt.c:15478 when chaining lambda calls & fold-expressions

2019-01-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88761

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2
  Known to work||7.3.1
Version|8.2.0   |8.2.1
   Target Milestone|9.0 |8.3
Summary|[9 Regression] ICE in   |[8/9 Regression] ICE in
   |tsubst_copy, at |tsubst_copy, at
   |cp/pt.c:15478 when chaining |cp/pt.c:15478 when chaining
   |lambda calls &  |lambda calls &
   |fold-expressions|fold-expressions

[Bug middle-end/88758] [9 Regression] 186.crafty in SPEC CPU 2000 failed to build

2019-01-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88758

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P1

[Bug tree-optimization/88760] GCC unrolling is suboptimal

2019-01-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760

--- Comment #1 from Richard Biener  ---
So LLVM unrolls 4 times while GCC (always) unrolls 8 times.  The unrolled body
for GCC (x86_64 this time) is

.L4:
        movl    (%rdx), %ecx
        vmovsd  (%rax), %xmm8
        addq    $32, %rdx
        addq    $64, %rax
        vmovsd  -56(%rax), %xmm9
        vmovsd  -48(%rax), %xmm10
        vfmadd231sd     (%rsi,%rcx,8), %xmm8, %xmm0
        movl    -28(%rdx), %ecx
        vmovsd  -40(%rax), %xmm11
        vmovsd  -32(%rax), %xmm12
        vfmadd231sd     (%rsi,%rcx,8), %xmm9, %xmm0
        movl    -24(%rdx), %ecx
        vmovsd  -24(%rax), %xmm13
        vmovsd  -16(%rax), %xmm14
        vfmadd231sd     (%rsi,%rcx,8), %xmm10, %xmm0
        movl    -20(%rdx), %ecx
        vmovsd  -8(%rax), %xmm15
        vfmadd231sd     (%rsi,%rcx,8), %xmm11, %xmm0
        movl    -16(%rdx), %ecx
        vfmadd231sd     (%rsi,%rcx,8), %xmm12, %xmm0
        movl    -12(%rdx), %ecx
        vfmadd231sd     (%rsi,%rcx,8), %xmm13, %xmm0
        movl    -8(%rdx), %ecx
        vfmadd231sd     (%rsi,%rcx,8), %xmm14, %xmm0
        movl    -4(%rdx), %ecx
        vfmadd231sd     (%rsi,%rcx,8), %xmm15, %xmm0
        cmpq    %rax, %r9
        jne     .L4

and what you quoted is the prologue.  You didn't quote LLVM's prologue,
but if I read my clang's output correctly it uses a loop there.
(Is there sth like -fdump-tree-optimized for clang?)

Our RTL unroller cannot do a loopy prologue but it always has this
jump-into peeled copies thing.  Using --param max-unroll-times=4
produces

.L4:
        movl    (%rdx), %ecx
        vmovsd  (%rax), %xmm2
        addq    $16, %rdx
        addq    $32, %rax
        vmovsd  -24(%rax), %xmm3
        vmovsd  -16(%rax), %xmm4
        vfmadd231sd     (%rsi,%rcx,8), %xmm2, %xmm0
        movl    -12(%rdx), %ecx
        vmovsd  -8(%rax), %xmm5
        vfmadd231sd     (%rsi,%rcx,8), %xmm3, %xmm0
        movl    -8(%rdx), %ecx
        vfmadd231sd     (%rsi,%rcx,8), %xmm4, %xmm0
        movl    -4(%rdx), %ecx
        vfmadd231sd     (%rsi,%rcx,8), %xmm5, %xmm0
        cmpq    %rax, %r8
        jne     .L4

which is nearly equivalent to clang's variant?
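
As a rough source-level picture of the difference discussed above, here is a hand-written C sketch of a 4x-unrolled loop whose leftover iterations are handled by a small loop rather than by jumping into peeled copies. The kernel (an indexed multiply-accumulate) is only inferred from the shape of the assembly; the function name sparse_dot_unrolled4 and its parameters are hypothetical and this is not the PR's testcase.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: sum of val[i] * x[idx[i]], unrolled by hand.  */
static double
sparse_dot_unrolled4 (const double *val, const unsigned *idx,
                      const double *x, size_t n)
{
  assert (n == 0 || (val && idx && x));

  double sum = 0.0;
  size_t i = 0;

  /* "Loopy" prologue: run the n % 4 leftover iterations in a loop.  */
  for (; i < n % 4; i++)
    sum += val[i] * x[idx[i]];

  /* Main body, unrolled 4 times; the remaining trip count is a multiple of 4.  */
  for (; i < n; i += 4)
    {
      sum += val[i] * x[idx[i]];
      sum += val[i + 1] * x[idx[i + 1]];
      sum += val[i + 2] * x[idx[i + 2]];
      sum += val[i + 3] * x[idx[i + 3]];
    }
  return sum;
}
```

The order of additions matches the rolled loop, so the result is the same as for the original form; only the loop structure changes.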

[Bug rtl-optimization/88331] [9 Regression] ICE in rtl_verify_bb_layout, at cfgrtl.c:2987

2019-01-09 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88331

--- Comment #15 from Jakub Jelinek  ---
Author: jakub
Date: Wed Jan  9 10:16:10 2019
New Revision: 267758

URL: https://gcc.gnu.org/viewcvs?rev=267758&root=gcc&view=rev
Log:
PR rtl-optimization/88331
* function.c (assign_stack_local_1): Don't set dynamic_align_addr if
not currently_expanding_to_rtl.

* gcc.target/i386/pr88331.c: New test.

Added:
trunk/gcc/testsuite/gcc.target/i386/pr88331.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/function.c
trunk/gcc/testsuite/ChangeLog

[Bug libstdc++/87855] std::optional only copy-constructible if T is trivially copy-constructible

2019-01-09 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87855

--- Comment #22 from Jonathan Wakely  ---
Author: redi
Date: Wed Jan  9 10:17:10 2019
New Revision: 267759

URL: https://gcc.gnu.org/viewcvs?rev=267759&root=gcc&view=rev
Log:
PR libstdc++/87855 fix optional for types with non-trivial copy/move

Backport both parts of the fix for PR libstdc++/87855, as well as a test
tweak from r263657 to avoid having to adjust dg-error line numbers.

* testsuite/20_util/optional/cons/value_neg.cc: Change dg-error to
dg-prune-output. Remove unused header.

Backport from mainline
2019-01-08  Jonathan Wakely  

When the contained value is not trivially copy (or move) constructible
the union's copy (or move) constructor will be deleted, and so the
_Optional_payload delegating constructors are invalid. G++ fails to
diagnose this because it incorrectly performs copy elision in the
delegating constructors. Clang does diagnose it (llvm.org/PR40245).

The solution is to avoid performing any copy (or move) when the
contained value's copy (or move) constructor isn't trivial. Instead the
contained value can be constructed by calling _M_construct. This is OK,
because the relevant constructor doesn't need to be constexpr when the
contained value isn't trivially copy (or move) constructible.

Additionally, this patch removes a lot of code duplication in the
_Optional_payload partial specializations and the _Optional_base partial
specialization, by hoisting it into common base classes.

The Python pretty printer for std::optional needs to be adjusted to
support the new layout. Retain support for the old layout, and add a
test to verify that the support still works.

PR libstdc++/87855
* include/std/optional (_Optional_payload_base): New class template
for common code hoisted from _Optional_payload specializations. Use
a template for the union, to allow a partial specialization for
types with non-trivial destructors. Add constructors for in-place
initialization to the union.
(_Optional_payload(bool, const _Optional_payload&)): Use _M_construct
to perform non-trivial copy construction, instead of relying on
non-standard copy elision in a delegating constructor.
(_Optional_payload(bool, _Optional_payload&&)): Likewise for
non-trivial move construction.
(_Optional_payload): Derive from _Optional_payload_base and use it
for everything except the non-trivial assignment operators, which are
defined as needed.
(_Optional_payload): Derive from the specialization
_Optional_payload and add a destructor.
(_Optional_base_impl::_M_destruct, _Optional_base_impl::_M_reset):
Forward to corresponding members of _Optional_payload.
(_Optional_base_impl::_M_is_engaged, _Optional_base_impl::_M_get):
Hoist common members from _Optional_base.
(_Optional_base): Make all members and base class public.
(_Optional_base::_M_get, _Optional_base::_M_is_engaged): Move to
_Optional_base_impl.
* python/libstdcxx/v6/printers.py (StdExpOptionalPrinter): Add
support for new std::optional layout.
* testsuite/libstdc++-prettyprinters/compat.cc: New test.

Backport from mainline
2018-11-19  Ville Voutilainen  

PR libstdc++/87855
Also implement P0602R4 (variant and optional
should propagate copy/move triviality) for std::optional.
* include/std/optional (_Optional_payload): Change
the main constraints to check constructibility in
addition to assignability.
(operator=): Make constexpr.
(_M_reset): Likewise.
(_M_construct): Likewise.
(operator->): Likewise.
* testsuite/20_util/optional/assignment/8.cc: Adjust.
* testsuite/20_util/optional/assignment/9.cc: New.

Added:
branches/gcc-8-branch/libstdc++-v3/testsuite/20_util/optional/assignment/9.cc
Modified:
branches/gcc-8-branch/libstdc++-v3/ChangeLog
branches/gcc-8-branch/libstdc++-v3/include/std/optional
branches/gcc-8-branch/libstdc++-v3/testsuite/20_util/optional/assignment/8.cc
branches/gcc-8-branch/libstdc++-v3/testsuite/20_util/optional/cons/value_neg.cc

[Bug libstdc++/87855] std::optional only copy-constructible if T is trivially copy-constructible

2019-01-09 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87855

Jonathan Wakely  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |8.3

--- Comment #21 from Jonathan Wakely  ---
Also fixed for GCC 8.3

[Bug target/88756] [nvptx, openacc] Override too many num_workers in nvptx plugin, instead of erroring out

2019-01-09 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88756

--- Comment #2 from Tom de Vries  ---
(In reply to Tom de Vries from comment #0)
> For the user, it's somewhat confusing that this passes with warning when
> compiling as C++, and fails to execute when compiling as C.

> I wonder why we don't do the
> same in the plugin, that is, override with warning.
> 
> We would have the more acceptable difference of "compile with warning and
> run" vs "compile and run with warning".

Thomas, any comments from OpenACC usability perspective?

Thanks,
- Tom
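
To make the "override with warning" behaviour concrete, here is a minimal C sketch. It is purely illustrative: clamp_num_workers and its parameters are hypothetical names, not the nvptx plugin's actual code or interface, and the message wording is made up.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: instead of erroring out when the requested worker
   count is too large, fall back to the maximum and emit a warning.  */
static int
clamp_num_workers (int requested, int max_workers)
{
  assert (max_workers > 0);

  if (requested > max_workers)
    {
      fprintf (stderr, "warning: using num_workers (%d), ignoring %d\n",
               max_workers, requested);
      return max_workers;
    }
  return requested;
}
```

With this shape the program still runs, and the user is told at run time that the requested value was overridden.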

[Bug middle-end/88758] [9 Regression] 186.crafty in SPEC CPU 2000 failed to build

2019-01-09 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88758

--- Comment #6 from Jakub Jelinek  ---
Author: jakub
Date: Wed Jan  9 10:24:43 2019
New Revision: 267760

URL: https://gcc.gnu.org/viewcvs?rev=267760&root=gcc&view=rev
Log:
PR middle-end/88758
* tree.c (initializer_each_zero_or_onep) : Use
vector_cst_elt instead of VECTOR_CST_ENCODED_ELT.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree.c

[Bug c/88766] New: [9 Regression] Rejects valid? C code since r259641

2019-01-09 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88766

Bug ID: 88766
   Summary: [9 Regression] Rejects valid? C code since r259641
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Keywords: rejects-valid
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: marxin at gcc dot gnu.org
CC: jakub at gcc dot gnu.org, jsm28 at gcc dot gnu.org
  Target Milestone: ---

Following code (reduced from gpg2) now fails to compile:

$ cat dns-stuff.i
struct dns_options {
  struct {
    void *a;
    int b;
  };
  int *socks_host;
  char *socks_user;
  char *socks_password;
};
static char tor_socks_user[1], tor_socks_password[1];
struct {
  int socks_host;
} libdns;
int d;
int *c();
int ax() {
  int *az;
  int ba;
  az = c((&__extension__({
           (struct dns_options){{0, 0},
                                0,
                                0,
                                .socks_host = &libdns.socks_host,
                                .socks_user = tor_socks_user,
                                .socks_password = tor_socks_password};
         })),
         &ba);
  d = *az;
  return 0;
}

$ gcc dns-stuff.i
dns-stuff.i: In function ‘ax’:
dns-stuff.i:19:11: error: lvalue required as unary ‘&’ operand
   19 |   az = c((&__extension__({
  |   ^
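
For comparison, a small sketch of the same idea without the statement expression, assuming the caller only needs a pointer to a temporary struct. In C99 a compound literal is itself an lvalue, so its address can be taken directly. The struct and function names below are hypothetical and much simpler than the reduced testcase; this says nothing about whether the original code should be accepted.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the options struct.  */
struct opts
{
  int socks_host;
  int socks_port;
};

static int
port_of (const struct opts *o)
{
  assert (o != 0);
  return o->socks_port;
}

static int
demo (void)
{
  /* Address of a compound literal, taken directly: no ({ ... }) wrapper.  */
  return port_of (&(struct opts){ .socks_host = 1, .socks_port = 9050 });
}
```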

[Bug rtl-optimization/88331] [9 Regression] ICE in rtl_verify_bb_layout, at cfgrtl.c:2987

2019-01-09 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88331

Jakub Jelinek  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #16 from Jakub Jelinek  ---
Fixed.

[Bug middle-end/88758] [9 Regression] 186.crafty in SPEC CPU 2000 failed to build

2019-01-09 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88758

Jakub Jelinek  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from Jakub Jelinek  ---
Fixed.  If you manage to turn the testcase into testsuite suitable form, please
commit it with this PR's number in the ChangeLog.

[Bug c/88766] [9 Regression] Rejects valid? C code since r259641

2019-01-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88766

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |9.0

[Bug tree-optimization/88760] GCC unrolling is suboptimal

2019-01-09 Thread ktkachov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760

--- Comment #2 from ktkachov at gcc dot gnu.org ---
Created attachment 45386
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45386&action=edit
aarch64-llvm output with -Ofast -mcpu=cortex-a57

I'm attaching the full LLVM aarch64 output.

The output you quoted is with -funroll-loops. If that's not given, GCC doesn't
seem to unroll by default at all (on aarch64 or x86_64 from my testing).

Is there anything we can do to make the default unrolling a bit more
aggressive?

[Bug fortran/88750] [9 Regression] runtime error in statically linked binaries

2019-01-09 Thread juergen.reuter at desy dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88750

--- Comment #19 from Jürgen Reuter  ---
(In reply to Iain Sandoe from comment #18)
> (In reply to Jürgen Reuter from comment #14)
> 
> does the application use exceptions?

No exceptions, only a poor man's C signal catcher. 

> 
> > /usr/local/lib/libstdc++.a
> 
> ^^^ please confirm that this is from the "current compiler build".
> 

Yes, they are the same. Unfortunately, there is no uninstall target for gcc,
but all stdc++ libraries in /usr/local/lib are from my Jan 8 clean build.


> 
> ^^^ this looks like the build process in this case is adding libs that
> the compiler driver normally adds ( they are not present in the case above ).
> 

Yes, that is for a different reason, a different build with a tutorial C and
C++ wrapper for our code, but they don't hurt here.


> * If you can extract these two fortran link lines - and then execute them
> separately in the build dir with "-v" so that we can see the output of the
> compiler-driver's internal link line and what its search paths are.

This is the output for the non-working linking:
$ gfortran -g -O2 -Wl,-rpath -Wl,/usr/local/packages/OpenLoops/lib -o
static_1.exe .libs/static_1.exe_prclib_dispatcher.o 
-L/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/whizard-core
-L/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src
-L/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/hepmc
-L/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/lcio
-L/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/hoppet
-L/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/looptools
-L/usr/local/packages/OpenLoops/lib -L/usr/local/lib -L../src
./.libs/static_1_lib.a
-L/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/models
/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/whizard-core/.libs/libwhizard_main.a
/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/.libs/libomega.a
/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/omega/src/.libs/libomega_core.a
/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/.libs/libwhizard.a
/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/vamp/src/.libs/libvamp.a
/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/circe1/src/.libs/libcirce1.a
/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/circe2/src/.libs/libcirce2.a
-lcuttools -lopenloops -loneloop -lolcommon -lrambo /usr/local/lib/libLHAPDF.a
/usr/local//lib/libHepMC.a -llcio /usr/local/lib/libstdc++.a
-L/usr/local/packages/gcc_9.0/_build/x86_64-apple-darwin18.2.0/libstdc++-v3/src
-L/usr/local/packages/gcc_9.0/_build/x86_64-apple-darwin18.2.0/libstdc++-v3/src/.libs
-L/usr/local/packages/gcc_9.0/_build/x86_64-apple-darwin18.2.0/libstdc++-v3/libsupc++/.libs
-lm -v
Driving: gfortran -g -O2 -Wl,-rpath -Wl,/usr/local/packages/OpenLoops/lib -o
static_1.exe .libs/static_1.exe_prclib_dispatcher.o
-L/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/whizard-core
-L/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src
-L/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/hepmc
-L/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/lcio
-L/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/hoppet
-L/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/looptools
-L/usr/local/packages/OpenLoops/lib -L/usr/local/lib -L../src
./.libs/static_1_lib.a
-L/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/models
/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/whizard-core/.libs/libwhizard_main.a
/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/.libs/libomega.a
/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/omega/src/.libs/libomega_core.a
/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/.libs/libwhizard.a
/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/vamp/src/.libs/libvamp.a
/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/circe1/src/.libs/libcirce1.a
/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/circe2/src/.libs/libcirce2.a
-lcuttools -lopenloops -loneloop -lolcommon -lrambo /usr/local/lib/libLHAPDF.a
/usr/local//lib/libHepMC.a -llcio /usr/local/lib/libstdc++.a
-L/usr/local/packages/gcc_9.0/_build/x86_64-apple-darwin18.2.0/libstdc++-v3/src
-L/usr/local/packages/gcc_9.0/_build/x86_64-apple-darwin18.2.0/libstdc++-v3/src/.libs
-L/usr/local/packages/gcc_9.0/_build/x86_64-apple-darwin18.2.0/libstdc++-v3/libsupc++/.libs
-lm -v -mmacosx-version-min=10.14.0 -asm_macosx_version_min=10.14 -l gfortran
-shared-libgcc
Using built-in specs.
COLLECT_GCC=gfortran
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-apple-darwin18.2.0/9.0.0/lto-wrapper
Target: x86_64-apple-darwin18.2.0
Configured with: ../configure --prefix=/usr/local/ --with-gmp=/usr/local/
--with-mpfr=/usr/local/ --with-mpc=/usr/local/ --enable-checking=release
--enable-languages=c,c++,fortran,lto
Thread model: posix
gcc version 9.0.0 20190107 (experimental) (GCC) 
Reading specs from
/usr/local/lib/gcc/x86_64-apple-darwin18.

[Bug tree-optimization/88760] GCC unrolling is suboptimal

2019-01-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760

--- Comment #3 from Richard Biener  ---
(In reply to ktkachov from comment #2)
> Created attachment 45386 [details]
> aarch64-llvm output with -Ofast -mcpu=cortex-a57
> 
> I'm attaching the full LLVM aarch64 output.
> 
> The output you quoted is with -funroll-loops. If that's not given, GCC
> doesn't seem to unroll by default at all (on aarch64 or x86_64 from my
> testing).
> 
> Is there anything we can do to make the default unrolling a bit more
> aggressive?

Well, the RTL loop unroller is not enabled by default at any
optimization level (unless you are using FDO).  There are also
related flags not enabled (-fsplit-ivs-in-unroller and
-fvariable-expansion-in-unroller).

The RTL loop unroller is simply not good at estimating benefit
of unrolling (which is also why you usually see it unrolling
--param max-unroll-times times) and the tunables it has are
not very well tuned across targets.

Micha did quite extensive benchmarking (on x86_64) which shows that
the cases where unrolling is profitable are rare and the reason
is often hard to understand.

That's of course in the context of CPUs having caches of
pre-decoded/fused/etc. instructions optimizing issue which
makes peeled prologues expensive as well as even more special
caches for small loops avoiding more frontend costs.

Not sure if arm archs have any of this.

I generally don't believe in unrolling as a separately profitable
transform.  Rather unrolling could be done as part of another
transform (vectorization is the best example).  For sth still
done on RTL that would then include scheduling which is where
the best cost estimates should be available (and if you do
this post-reload then you even have a very good idea of
register pressure).  This is also why I think a standalone
unrolling phase belongs on RTL since I don't see a good way
of estimating cost/benefit on GIMPLE (see how difficult it is
to cost vectorization vs. non-vectorization there).

[Bug fortran/88750] [9 Regression] runtime error in statically linked binaries

2019-01-09 Thread juergen.reuter at desy dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88750

--- Comment #20 from Jürgen Reuter  ---
Created attachment 45387
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45387&action=edit
DYLD_PRINT output non-working example

DYLD_PRINT_LIBRARIES=1 DYLD_PRINT_BINDINGS=1 ./static_1.exe >
non_working_output 2>&1

[Bug fortran/88750] [9 Regression] runtime error in statically linked binaries

2019-01-09 Thread juergen.reuter at desy dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88750

--- Comment #21 from Jürgen Reuter  ---
Created attachment 45388
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45388&action=edit
DYLD_PRINT output working example

DYLD_PRINT_LIBRARIES=1 DYLD_PRINT_BINDINGS=1 ./static_1.exe > working_output
2>&1

[Bug tree-optimization/88767] New: 'unroll and jam' not optimizing some loops

2019-01-09 Thread helijia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767

Bug ID: 88767
   Summary: 'unroll and jam' not optimizing some loops
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: helijia at gcc dot gnu.org
  Target Milestone: ---

The test source is as follows:
__attribute__((noinline)) void calculate(const double* __restrict__ A, const
double* __restrict__ B, double* __restrict__ C) {
  unsigned int l_m = 0;
  unsigned int l_n = 0;
  unsigned int l_k = 0;

  A = (const double*)__builtin_assume_aligned(A,16);
  B = (const double*)__builtin_assume_aligned(B,16);
  C = (double*)__builtin_assume_aligned(C,16);

  for ( l_n = 0; l_n < 9; l_n++ ) { // loop 1
    for ( l_m = 0; l_m < 10; l_m++ ) { C[(l_n*10)+l_m] = 0.0; } // loop 2

    for ( l_k = 0; l_k < 17; l_k++ ) { // loop 3
      for ( l_m = 0; l_m < 10; l_m++ ) { // loop 4
        C[(l_n*10)+l_m] += A[(l_k*20)+l_m] * B[(l_n*20)+l_k];
      }
    }
  }
}

#define SIZE 36
double A[SIZE][SIZE] __attribute__((aligned(16)));
double B[SIZE][SIZE] __attribute__((aligned(16)));
double C[SIZE][SIZE] __attribute__((aligned(16)));

int main()
{
  long r, i, j;

  for (i=0; i < SIZE; i++) {
    for (j=0; j < SIZE; j++) {
      A[i][j] = 1.0;
      B[i][j] = 2.0;
      C[i][j] = 3.0;
    }
  }

  for (r=0; r < 100; r++) {
    calculate(&A[0][0],&B[0][0], &C[0][0]);
  }

  return 0;
}

First, I compiled the test case with the following command:
g++ unroll_jam_bug.cpp -O3 -funroll-loops -floop-unroll-and-jam -o unroll_jam_bug
-fdump-tree-unrolljam-details
In the generated dump file unroll_jam_bug.cpp.143t.unrolljam, I found that no
unroll and jam optimization was applied to the loops in the calculate function.

Second, I added -fdump-tree-all to the command line. The dumps show that the
innermost loops (loops 3 and 4) are completely unrolled, because the
pass_data_complete_unrolli pass considers the innermost loop small. With the
inner loop fully expanded, the enclosing loop becomes large. The pass_loop_jam
pass then checks whether unroll_factor * (number of loop instructions) > 200.
If that holds, the optimization is abandoned; otherwise it proceeds.

Based on this analysis, I tried disabling the cunrolli pass with the following
command line:
g++ unroll_jam_bug.cpp -O3 -mcpu=power8 -fdisable-tree-cunrolli
-floop-unroll-and-jam -o unroll_jam_bug -fdump-tree-unrolljam-details
With this command, the unroll and jam optimization is performed, but there
still seems to be room for optimization.

Original code:
for ( l_n = 0; l_n < 9; l_n++ ) {
  for ( l_m = 0; l_m < 10; l_m++ ) { C[(l_n*10)+l_m] = 0.0; }

  for ( l_k = 0; l_k < 17; l_k++ ) {
    for ( l_m = 0; l_m < 10; l_m++ ) {
      C[(l_n*10)+l_m] += A[(l_k*20)+l_m] * B[(l_n*20)+l_k];
    }
  }
}
After unroll and jam pass:
for ( l_n = 0; l_n < 9; l_n++ ) {
  for ( l_m = 0; l_m < 10; l_m++ ) { C[(l_n*10)+l_m] = 0.0; }

  for ( l_k = 0; l_k < 17; l_k += 2 ) {
    for ( l_m = 0; l_m < 10; l_m++ ) {
      C[(l_n*10)+l_m] += A[(l_k*20)+l_m] * B[(l_n*20)+l_k];
      C[(l_n*10)+l_m] += A[(l_k*20 + 20)+l_m] * B[(l_n*20)+l_k + 1];
    }
  }
}
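
For comparison, here is a hand-written sketch of the same transformation that
can be checked against the original loop nest. It is not what the pass emits;
the function names calc_ref and calc_jam2 are made up for this example.
Because the trip count 17 is odd, a version written by hand needs a separate
remainder iteration for the last value of l_k, which the simplified listing
above leaves out.

```c
/* Reference kernel, same index arithmetic as the test case above. */
void calc_ref(const double *A, const double *B, double *C)
{
    for (unsigned n = 0; n < 9; n++) {
        for (unsigned m = 0; m < 10; m++)
            C[n * 10 + m] = 0.0;
        for (unsigned k = 0; k < 17; k++)
            for (unsigned m = 0; m < 10; m++)
                C[n * 10 + m] += A[k * 20 + m] * B[n * 20 + k];
    }
}

/* Loop 3 unrolled by 2 and jammed into loop 4.  Because 17 is odd,
   the last k iteration is handled by a separate remainder loop. */
void calc_jam2(const double *A, const double *B, double *C)
{
    for (unsigned n = 0; n < 9; n++) {
        for (unsigned m = 0; m < 10; m++)
            C[n * 10 + m] = 0.0;
        unsigned k = 0;
        for (; k + 1 < 17; k += 2)
            for (unsigned m = 0; m < 10; m++) {
                C[n * 10 + m] += A[k * 20 + m] * B[n * 20 + k];
                C[n * 10 + m] += A[(k + 1) * 20 + m] * B[n * 20 + k + 1];
            }
        for (; k < 17; k++)         /* remainder: k == 16 */
            for (unsigned m = 0; m < 10; m++)
                C[n * 10 + m] += A[k * 20 + m] * B[n * 20 + k];
    }
}
```

Each element of C is accumulated in the same order in both versions, so the
results should match exactly when given the same inputs.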

[Bug fortran/88750] [9 Regression] runtime error in statically linked binaries

2019-01-09 Thread juergen.reuter at desy dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88750

--- Comment #22 from Jürgen Reuter  ---
This is the output from the lldb command (but this was not a debug build of gcc
yet):
$ lldb ./static_1.exe
(lldb) target create "./static_1.exe"
Current executable set to './static_1.exe' (x86_64).
(lldb) run
Process 36799 launched: './static_1.exe' (x86_64)
static_1.exe(36799,0x1048f75c0) malloc: *** error for object 0x105c5eee0:
pointer being freed was not allocated
static_1.exe(36799,0x1048f75c0) malloc: *** set a breakpoint in
malloc_error_break to debug
Process 36799 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
frame #0: 0x7fff5a2d023e libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill:
->  0x7fff5a2d023e <+10>: jae    0x7fff5a2d0248 ; <+20>
    0x7fff5a2d0240 <+12>: movq   %rax, %rdi
    0x7fff5a2d0243 <+15>: jmp    0x7fff5a2ca3b7 ; cerror_nocancel
    0x7fff5a2d0248 <+20>: retq
Target 0: (static_1.exe) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x7fff5a2d023e libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x7fff5a386c1c libsystem_pthread.dylib`pthread_kill + 285
frame #2: 0x7fff5a2391c9 libsystem_c.dylib`abort + 127
frame #3: 0x7fff5a3486e2 libsystem_malloc.dylib`malloc_vreport + 545
frame #4: 0x7fff5a3484a3 libsystem_malloc.dylib`malloc_report + 152
frame #5: 0x000100929c84
static_1.exe`std::locale::_Impl::~_Impl(this=0x000105c5f0a0) at
locale.cc:243
frame #6: 0x000100929d8e
static_1.exe`std::locale::operator=(this=0x000105c611c0,
__other=0x7ffeefbfdad8) at locale_classes.h:568
frame #7: 0x000100927aec
static_1.exe`std::ios_base::_M_init(this=0x000105c610f0) at
ios_locale.cc:44
frame #8: 0x00010096cef1 static_1.exe`std::basic_ios >::init(this=0x000105c610f0,
__sb=0x000105c60840) at basic_ios.tcc:129
frame #9: 0x000105afcdf9 libstdc++.6.dylib`std::ios_base::Init::Init()
+ 681
frame #10: 0x000105ad30a0
libsio.2.12.dylib`_GLOBAL__sub_I_SIO_blockManager.cc + 16
frame #11: 0x000104859cc8
dyld`ImageLoaderMachO::doModInitFunctions(ImageLoader::LinkContext const&) +
518
frame #12: 0x000104859ec6
dyld`ImageLoaderMachO::doInitialization(ImageLoader::LinkContext const&) + 40
frame #13: 0x0001048550da
dyld`ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&,
unsigned int, char const*, ImageLoader::InitializerTimingList&,
ImageLoader::UninitedUpwards&) + 358
frame #14: 0x00010485506d
dyld`ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&,
unsigned int, char const*, ImageLoader::InitializerTimingList&,
ImageLoader::UninitedUpwards&) + 249
frame #15: 0x00010485506d
dyld`ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&,
unsigned int, char const*, ImageLoader::InitializerTimingList&,
ImageLoader::UninitedUpwards&) + 249
frame #16: 0x000104854254
dyld`ImageLoader::processInitializers(ImageLoader::LinkContext const&, unsigned
int, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 134
frame #17: 0x0001048542e8
dyld`ImageLoader::runInitializers(ImageLoader::LinkContext const&,
ImageLoader::InitializerTimingList&) + 74
frame #18: 0x000104843774 dyld`dyld::initializeMainExecutable() + 199
frame #19: 0x00010484878f dyld`dyld::_main(macho_header const*,
unsigned long, int, char const**, char const**, char const**, unsigned long*) +
6237
frame #20: 0x0001048424f6 dyld`dyldbootstrap::start(macho_header
const*, int, char const**, long, macho_header const*, unsigned long*) + 1154
frame #21: 0x000104842036 dyld`_dyld_start + 54

[Bug rtl-optimization/49330] Integer arithmetic on addresses optimised with pointer arithmetic rules

2019-01-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49330

--- Comment #20 from Richard Biener  ---
For stage3/gcc/*.o statistics show we perform 21051052 base_alias_check calls
and in the end 706852 times it is the one that would have disambiguated
things compared to if we remove it (thus as if we do base_alias_check last).

Note there's also

  base = find_base_term (x_addr);
  if (base && (GET_CODE (base) == LABEL_REF
               || (GET_CODE (base) == SYMBOL_REF
                   && CONSTANT_POOL_ADDRESS_P (base))))
    return 0;

which is suspicious but I guess harder to hit in practice so things go wrong.

base_alias_check is not exactly the first thing we check (but nearly) so
we'd roughly lose 3% disambiguations from RTL alias analysis if we scrap
base_alias_check completely.

That's probably too much.

Note the CONSTANT_POOL_ADDRESS_P thing isn't necessary and subsumed by
following checks so we could remove that without losing anything
(it hits only 84 times at all in the above set and later checks subsume it).

[Bug rtl-optimization/49330] Integer arithmetic on addresses optimised with pointer arithmetic rules

2019-01-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49330

--- Comment #21 from Richard Biener  ---
Created attachment 45389
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45389&action=edit
statistic patch

patch I added to record statistics

[Bug tree-optimization/88767] 'unroll and jam' not optimizing some loops

2019-01-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2019-01-09
                 CC|                            |matz at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener  ---
What's the room for improvement?  Why's unrolling the innermost loop not
profitable?

[Bug fortran/88750] [9 Regression] runtime error in statically linked binaries

2019-01-09 Thread iains at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88750

--- Comment #23 from Iain Sandoe  ---
(In reply to Jürgen Reuter from comment #22)
> This is the output from the lldb command (but this was not a debug build of
> gcc yet):
> $ lldb ./static_1.exe
> (lldb) target create "./static_1.exe"
> Current executable set to './static_1.exe' (x86_64).
> (lldb) run



> __sb=0x000105c60840) at basic_ios.tcc:129
> frame #9: 0x000105afcdf9
> libstdc++.6.dylib`std::ios_base::Init::Init() + 681
> frame #10: 0x000105ad30a0

 so, you have a combination of things linking libstdc++ statically and
dynamically .. that seems fragile at best.

Having said that - the tricky thing now is to determine what has "broken" (it's
probably going to be hard without a "before" and "after" case).

[Bug libstdc++/88204] New test case 26_numerics/complex/operators/more_constexpr.cc from r266416 fails

2019-01-09 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88204

--- Comment #3 from Jonathan Wakely  ---
Fixed for GNU/Linux and AIX. Please reopen if it's still failing on Darwin.

[Bug fortran/88750] [9 Regression] runtime error in statically linked binaries

2019-01-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88750

--- Comment #24 from Richard Biener  ---
(In reply to Iain Sandoe from comment #23)
> (In reply to Jürgen Reuter from comment #22)
> > This is the output from the lldb command (but this was not a debug build of
> > gcc yet):
> > $ lldb ./static_1.exe
> > (lldb) target create "./static_1.exe"
> > Current executable set to './static_1.exe' (x86_64).
> > (lldb) run
> 
> 
> 
> > __sb=0x000105c60840) at basic_ios.tcc:129
> > frame #9: 0x000105afcdf9
> > libstdc++.6.dylib`std::ios_base::Init::Init() + 681
> > frame #10: 0x000105ad30a0
> 
>  so, you have a combination of things linking libstdc++ statically and
> dynamically .. that seems fragile at best.
> 
> Having said that - the tricky thing now is to determine what has "broken"
> (it's probably going to be hard without a "before" and "after" case).

Indeed - somehow you didn't get a statically linked executable.  Quoting the
full final link command would be interesting.

[Bug fortran/88768] New: Derived type io in conjunction with allocatable component and recursion fails

2019-01-09 Thread mscfd at gmx dot net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88768

Bug ID: 88768
   Summary: Derived type io in conjunction with allocatable
component and recursion fails
   Product: gcc
   Version: 8.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: mscfd at gmx dot net
  Target Milestone: ---

This is a strange bug, which requires some kind of dt IO (defined as "
generic :: write(unformatted) => write_unformatted", but not used!), an
allocatable component with dimension(:) (a "character(len=:), allocatable"
triggers the bug as well), and a recursive function.

If the "write(unformatted)"-part is commented out, the bug does not occur.
Without recursing, the return value is fine (variable y). Also, if the
dimension(:) is omitted in the declaration of r, then the bug disappears as
well.

The code either shows funny values for z or segfaults. Valgrind shows an illegal
memory read.


module mod

implicit none
private

type, public :: t
   real, dimension(:), allocatable :: r
contains
   procedure :: set
   generic :: assignment(=) => set
   procedure :: recurse
   generic :: write(unformatted) => write_unformatted
   procedure :: write_unformatted
end type t

contains

subroutine set(self, x)
   class(t), intent(out) :: self
   class(t), intent(in) :: x
   real, dimension(:), allocatable :: tmp
   if (allocated(x%r)) then
  ! make a local copy to avoid any aliasing issues
  tmp = x%r
  self%r = tmp
   end if
end subroutine set

recursive function recurse(self, i) result(x)
   type(t) :: x
   class(t), intent(in) :: self
   integer,  intent(in) :: i
   if (i > 0) then
  x = self%recurse(i-1)
   else
  x = self
   end if
end function recurse

subroutine write_unformatted(dtv, unit, iostat, iomsg)
   class(t), intent(in):: dtv
   integer,  intent(in):: unit
   integer,  intent(out)   :: iostat
   character(len=*), intent(inout) :: iomsg
   write(unit, iostat=iostat, iomsg=iomsg) 'unformatted'
end subroutine write_unformatted

end module mod


program dt_io

use mod
implicit none

type(t) :: x, y, z

x%r = [1.23, 2.21]
y = x%recurse(0)  ! fine
z = x%recurse(1)  ! fails

print *, x%r
print *, y%r
print *, z%r

end program dt_io

[Bug fortran/88750] [9 Regression] runtime error in statically linked binaries

2019-01-09 Thread juergen.reuter at desy dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88750

--- Comment #25 from Jürgen Reuter  ---
(In reply to Richard Biener from comment #24)
> (In reply to Iain Sandoe from comment #23)
> > (In reply to Jürgen Reuter from comment #22)

> 
> Indeed - somehow you didn't get a statically linked executable.  Quoting the
> full final link command would be interesting.

The full link commands can be found here, I believe: 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88750#c14

Our code generates code for particle physics simulations in the form of dynamic
libraries that get linked and loaded. For batch clusters, we attempted to
provide static binaries for these simulations, however, we have order 10-15
external libraries that can be linked to our code (which are partially
mandatory). There are some of them which only exist as dynamic libraries, so
there our approach cannot result in a purely static binary. The static stdc++
library is sucked in via the libtool link mode/flag -static-libtool-libs while
the dynamic ones are sucked in via the external C++ libraries that are
available only dynamically.

[Bug sanitizer/88684] [7/8/9 Regression] Please make SANITIZER_NON_UNIQUE_TYPEINFO a runtime flag (or always true)

2019-01-09 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88684

--- Comment #8 from Martin Liška  ---
I created upstream patch candidate:
https://reviews.llvm.org/D56485

[Bug sanitizer/88684] [7/8/9 Regression] Please make SANITIZER_NON_UNIQUE_TYPEINFO a runtime flag (or always true)

2019-01-09 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88684

Martin Liška  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |9.0

[Bug c/88766] [9 Regression] Rejects valid? C code since r259641

2019-01-09 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88766

--- Comment #1 from Jakub Jelinek  ---
Reduced testcase:

struct S { int s; };

void
foo (void)
{
  void *p = &(struct S) { 0 };
  void *q = &({ (struct S) { 0 }; });
}

The p initializer is accepted, q is rejected.
By my reading this is invalid, C99 6.5.2.5/6 says:
"If the compound literal occurs outside the body of a function, the object
has static storage duration; otherwise, it has automatic storage duration
associated with the enclosing block."
and the statement expression is still a compound statement and thus the
compound literal is associated with the statement expression's block.  So it is
the same thing as:
void
bar (void)
{
  void *r = &({ int a = 0; a; });
}
which fails with the same diagnostics.

Joseph, do you agree?
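
A minimal sketch of the lifetime rule quoted above, using only standard C99
(no statement expressions). The function names are made up for illustration.
The first function takes the address of a compound literal whose enclosing
block is the function body; the second copies the value out of an inner block
instead of keeping a pointer past that block's closing brace.

```c
struct S { int s; };

/* The compound literal lives until the end of the enclosing block,
   here the function body, so p stays valid for the whole function. */
int literal_in_function_block(void)
{
    struct S *p = &(struct S) { 41 };
    p->s += 1;
    return p->s;
}

/* Copying the value out of an inner block is fine; keeping a pointer
   to the inner block's compound literal past the closing brace would
   not be. */
int value_copied_out_of_inner_block(void)
{
    struct S copy;
    {
        struct S *q = &(struct S) { 7 };
        copy = *q;          /* q's target is still alive here */
    }
    return copy.s;
}
```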

[Bug tree-optimization/88763] Better Output for Loop Unswitching

2019-01-09 Thread marius.messerschmidt at googlemail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88763

--- Comment #2 from Marius Messerschmidt  ---
Sorry but I do not fully understand what you mean. Do you suggest using
different command line arguments?

So far I tried:

-fdump-tree-all
-fdump-tree-unswitch

and

-fopt-info-all-optall

But none of them told me all the things that I would wish to know, most
importantly the reason why a particular loop was skipped during unswitching
(e.g. because it is not invariant; right now -fdump-tree-unswitch already
reports a few things like too-many-instructions or too-many-branches).

[Bug rtl-optimization/88751] Performance regression reload vs lra

2019-01-09 Thread krebbel at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88751

--- Comment #2 from Andreas Krebbel  ---
(In reply to Richard Biener from comment #1)
...
> Would be interesting to know the sparseness of regs / BBs for your testcase
> at the point of LRA and whether compacting regs (do we ever do that?) might
> be a good idea in general.  (we do compact BBs regularly)

Good point. Only 9352 of the 27089 pseudos appear to be actually referenced.
Hence the following patch fixes the problem for me:

diff --git a/gcc/ira.c b/gcc/ira.c
index c8f2df43dd1..965819e1ef9 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -5157,6 +5157,7 @@ ira (FILE *f)
   int ira_max_point_before_emit;
   bool saved_flag_caller_saves = flag_caller_saves;
   enum ira_region saved_flag_ira_region = flag_ira_region;
+  int i, num_used_regs = 0;

   clear_bb_flags ();

@@ -5172,12 +5173,17 @@ ira (FILE *f)

   ira_conflicts_p = optimize > 0;

+  /* Determine the number of pseudos actually requiring coloring.  */
+  for (i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++)
+num_used_regs += !!(DF_REG_USE_COUNT (i) + DF_REG_DEF_COUNT (i));
+
   /* If there are too many pseudos and/or basic blocks (e.g. 10K
  pseudos and 10K blocks or 100K pseudos and 1K blocks), we will
  use simplified and faster algorithms in LRA.  */
   lra_simple_p
 = (ira_use_lra_p
-   && max_reg_num () >= (1 << 26) / last_basic_block_for_fn (cfun));
+   && num_used_regs >= (1 << 26) / last_basic_block_for_fn (cfun));
+
   if (lra_simple_p)
 {
   /* It permits to skip live range splitting in LRA.  */
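
The idea behind the patch can be sketched outside of GCC as follows. This is
a stand-alone approximation, not the code in ira.c: the function names and
the plain arrays standing in for the DF use/def counts are assumptions made
for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Count registers that have at least one use or definition. */
size_t count_used_regs(const unsigned *uses, const unsigned *defs, size_t n)
{
    size_t used = 0;
    for (size_t i = 0; i < n; i++)
        used += (uses[i] + defs[i]) != 0;
    return used;
}

/* Same shape as the threshold in the patch: regs >= 2^26 / blocks. */
bool wants_simple_lra(size_t num_regs, size_t num_blocks)
{
    return num_regs >= ((size_t)1 << 26) / num_blocks;
}
```

With a hypothetical 5000 basic blocks (the report does not state the real
number), the threshold is 13421: the raw count of 27089 pseudos would select
the simplified algorithms, while the 9352 pseudos actually referenced would
not.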

[Bug rtl-optimization/49330] Integer arithmetic on addresses optimised with pointer arithmetic rules

2019-01-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49330

--- Comment #22 from Richard Biener  ---
Things we fail to disambiguate are

(mem:TF (pre_dec:SI (reg/f:SI 7 sp)) [0  S16 A8])
 vs.
(mem/c:TF (plus:SI (reg/f:SI 19 frame)
  (const_int -16 [0xfff0])) [1  S16 A128])

or

(mem:SI (pre_dec:SI (reg/f:SI 7 sp)) [3  S4 A32])
 vs.
(mem/f/c:SI (symbol_ref:SI ("argv") [flags 0x2] )
[2 argv+0 S4 A32])

where I don't find anything besides CSELIB cselib_sp_based_value_p handling
in find_base_term that could be the one handling it?

I guess we should be able to somehow handle both sp and frame based
accesses in a more conservative way?

[Bug tree-optimization/88739] [7/8/9 Regression] Big-endian union bug

2019-01-09 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739

--- Comment #25 from Wilco  ---
(In reply to rguent...@suse.de from comment #17)
> On Tue, 8 Jan 2019, wilco at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739
> > 
> > --- Comment #16 from Wilco  ---
> > I think we need to simplify the many BIG_ENDIAN macros so it is feasible to 
> > get
> > big-endian to work reliably on all targets. There seem to be far too many
> > options which affect too many unrelated things. Big-endian is fundamentally
> > about memory byte ordering, so allowing to different byte/bit orderings in
> > registers just makes things overly complex without any benefit.
> 
> It's unfortunately not the compiler writers choice but the CPU designers.

It's more a bad ABI choice. The initial Arm ABI had 4-byte aligned
little-endian long long and big-endian doubles! ARM2 only supported
little-endian so it didn't matter at the time. However it doesn't allow
unaligned accesses, tightly packed bitfields and runtime endian swapping as
required by the embedded space, or hardware floating point. No surprise it was
replaced by the Arm EABI.

[Bug tree-optimization/69196] [5 Regression] code size regression with jump threading at -O2

2019-01-09 Thread sebastian.hu...@embedded-brains.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69196

--- Comment #29 from Sebastian Huber  ---
Just for reference some numbers for GCC 7.4.0 and GCC 9.0.0 20190104:

sparc-rtems5-gcc --version
sparc-rtems5-gcc (GCC) 7.4.0 20181206 (RTEMS 5, RSB
ddba5372522da341fa20b2c75dfe966231cb6790, Newlib
df6915f029ac9acd2b479ea898388cbd7dda4974)
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

sparc-rtems5-gcc -c -O2 -o vprintk.7.4.0.o vprintk.i

sparc-rtems6-gcc --version
sparc-rtems6-gcc (GCC) 9.0.0 20190104 (RTEMS 6, RSB
cd4a4f61ea5bbd4236f7717a94cd5e67f8b3ad20, Newlib
34d9bb709390b14b4ed0b1ea2656bf6bf5a055c3)
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

sparc-rtems6-gcc -c -O2 -o vprintk.9.0.0.o vprintk.i

size *.o
   text    data     bss     dec     hex filename
    688       0       0     688     2b0 vprintk.4.9.4.o
   1272       0       0    1272     4f8 vprintk.6.0.0.o
    933       0       0     933     3a5 vprintk.7.4.0.o
    825       0       0     825     339 vprintk.9.0.0.o

It seems the code size is quite volatile for this test case.

[Bug tree-optimization/88739] [7/8/9 Regression] Big-endian union bug

2019-01-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739

--- Comment #26 from Richard Biener  ---
(In reply to Wilco from comment #25)
> (In reply to rguent...@suse.de from comment #17)
> > On Tue, 8 Jan 2019, wilco at gcc dot gnu.org wrote:
> > 
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739
> > > 
> > > --- Comment #16 from Wilco  ---
> > > I think we need to simplify the many BIG_ENDIAN macros so it is feasible 
> > > to get
> > > big-endian to work reliably on all targets. There seem to be far too many
> > > options which affect too many unrelated things. Big-endian is 
> > > fundamentally
> > > about memory byte ordering, so allowing to different byte/bit orderings in
> > > registers just makes things overly complex without any benefit.
> > 
> > It's unfortunately not the compiler writers choice but the CPU designers.
> 
> It's more a bad ABI choice. The initial Arm ABI had 4-byte aligned
> little-endian long long and big-endian doubles! ARM2 only supported
> little-endian so it didn't matter at the time. However it doesn't allow
> unaligned accesses, tightly packed bitfields and runtime endian swapping as
> required by the embedded space, or hardware floating point. No surprise it
> was replaced by the Arm EABI.

Whatever ;)

Did anybody test the patch?  Testing on x86_64 will be quite pointless...

[Bug tree-optimization/88739] [7/8/9 Regression] Big-endian union bug

2019-01-09 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739

--- Comment #27 from Wilco  ---
(In reply to Eric Botcazou from comment #22)
> > Is it really pure RTL, therefore not used in tree? So the above patch using
> > BITS_BIG_ENDIAN for tree stuff would be incorrect to use it?
> 
> I wouldn't say incorrect, just inappropriate and unnecessary.  And, yes, it
> isn't used at the tree level and should stay so IMO.  BYTES_BIG_ENDIAN alone
> already implicitly enforces a numbering on bits.

I mean incorrect as in the optimization would still trigger and give incorrect
results if BITS_BIG_ENDIAN == BYTES_BIG_ENDIAN (given that BITS_BIG_ENDIAN has
no bearing on the bitfield offsets used on tree level).

[Bug rtl-optimization/88769] New: Call to sin() optimized away, disregarding possible side-effect (errno)

2019-01-09 Thread per at pz dot se
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88769

Bug ID: 88769
   Summary: Call to sin() optimized away, disregarding possible
side-effect (errno)
   Product: gcc
   Version: 7.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: per at pz dot se
  Target Milestone: ---

(This is my first GCC bug report, so please have patience with me...)

Test program:

#include <math.h>

void foo(float x) 
{
  sin(x);
}


When compiling with -O1 (or higher), the call to sin() is optimized away:


        .file   "test.c"
        .text
        .globl  foo
        .type   foo, @function
foo:
.LFB0:
        .cfi_startproc
        rep ret
        .cfi_endproc
.LFE0:
        .size   foo, .-foo
        .ident  "GCC: (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0"
        .section        .note.GNU-stack,"",@progbits



However, the sin() call has possible side-effects; according to the glibc docs,
it sets errno to EDOM in case the argument is an infinity.

The math errno can be disabled with -fno-math-errno, but according to the GCC
docs it is enabled by default, and compiling with -fmath-errno given explicitly
makes no difference.

The behavior is the same with GCC 8.2 and "trunk" (via godbolt.org).

clang 6.0 does not optimize away the sin() call, except when called with
-fno-math-errno.



$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu
7.3.0-27ubuntu1~18.04' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs
--enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr
--with-gcc-major-version-only --program-suffix=-7
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie
--with-system-zlib --with-target-system-zlib --enable-objc-gc=auto
--enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64
--with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic
--enable-offload-targets=nvptx-none --without-cuda-driver
--enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu
--target=x86_64-linux-gnu
Thread model: posix
gcc version 7.3.0 (Ubuntu 7.3.0-27ubuntu1~18.04)

[Bug fortran/88750] [9 Regression] runtime error in statically linked binaries

2019-01-09 Thread iains at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88750

--- Comment #26 from Iain Sandoe  ---
(In reply to Jürgen Reuter from comment #25)
> (In reply to Richard Biener from comment #24)
> > (In reply to Iain Sandoe from comment #23)
> > > (In reply to Jürgen Reuter from comment #22)
> 
> > 
> > Indeed - somehow you didn't get a statically linked executable.  Quoting the
> > full final link command would be interesting.
> 
> The full link commands can be found here, I believe: 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88750#c14
> 
> Our code generates code for particle physics simulations in the form of
> dynamic libraries that get linked and loaded. For batch clusters, we
> attempted to provide static binaries for these simulations, however, we have
> order 10-15 external libraries that can be linked to our code (which are
> partially mandatory). There are some of them which only exist as dynamic
> libraries, so there our approach cannot result in a purely static binary.
> The static stdc++ library is sucked in via the libtool link mode/flag
> -static-libtool-libs while the dynamic ones are sucked in via the external
> C++ libraries that are available only dynamically.

So .. I appreciate it can be difficult with a sophisticated project.  However,
it would seem prudent to try to arrange that you have only one instance of the
c++ library.  Imagine creating an object in one instance, and that object
somehow finds its way to be destroyed in a different one.

I've spent some time trying to make it possible to link GCC Darwin projects
'statically', (modulo the libSystem, which must be dynamic) - but that's only
going to work if all the project dependent libs are available as convenience
libs (or, I suppose, if no used dynamic ones have any external deps other than
libSystem).

If that's not possible, then it's most likely better to arrange to do a link -r
on everything that can be found as convenience .. and then link the result with
-lstdc++.

It might be that it worked before mostly from luck - although I'd still like to
have a reference for a known "working" static linked case.  As the c++ library
grows, this is only going to be more fragile.

[Bug rtl-optimization/88770] New: Redundant load opt. or CSE pessimizes code

2019-01-09 Thread bisqwit at iki dot fi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88770

Bug ID: 88770
   Summary: Redundant load opt. or CSE pessimizes code
   Product: gcc
   Version: 8.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bisqwit at iki dot fi
  Target Milestone: ---

For this code (-xc -std=c99 or -xc++ -std=c++17):

struct guu { int a; int b; float c; char d; };

extern void test(struct guu);

void caller()
{
test( (struct guu){.a = 3, .b = 5, .c = 7, .d = 9} );
test( (struct guu){.a = 3, .b = 5, .c = 7, .d = 9} );
}

CSE (or some other form of redundant loads optimization) pessimizes the code.
Problem occurs on optimization levels -O1 and higher, including -Os.

If the function "caller" calls test() just once, the resulting code is (-O3
-fno-optimize-sibling-calls, stack alignment/push/pops omitted for brevity):

movabs  rdi, 21474836483
movabs  rsi, 39743127552
call    test

If "caller" calls test() twice, the code is a lot longer and not just twice as
long. (Stack alignment/push/pops omitted for brevity):

movabs  rbp, 21474836483
mov     rdi, rbp
movabs  rbx, 38654705664
mov     rsi, rbx
or      rbx, 1088421888
or      rsi, 1088421888
call    test
mov     rsi, rbx
mov     rdi, rbp
call    test

If we change caller() such that the parameters in the two calls are not
identical:

void caller()
{
test( (struct guu){.a = 3, .b = 5, .c = 7, .d = 9} );
test( (struct guu){.a = 3, .b = 6, .c = 7, .d = 10} );
}

The generated code is optimal again as expected:

movabs  rdi, 21474836483
movabs  rsi, 39743127552
call    test
movabs  rdi, 25769803779
movabs  rsi, 44038094848
call    test

The problem in the first examples is that the compiler sees that the same
parameter is used twice, and it tries to save it in a callee-saves register, in
order to reuse the same values on the second call. However re-initializing the
registers from scratch would have been more efficient.
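For reference, the movabs immediates in the listings above can be reproduced from the struct's in-memory image. The following is only a sketch, assuming a little-endian x86-64 target using the System V calling convention (where a 16-byte struct like this one travels in rdi and rsi) and zeroed padding bytes; the helper name pack_guu is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct guu { int a; int b; float c; char d; };

/* Build the 16-byte object image of a struct guu in a zeroed buffer and
   copy it into two 64-bit words, low word first.  Padding stays zero.  */
static void pack_guu(int a, int b, float c, char d, uint64_t out[2])
{
  unsigned char image[sizeof(struct guu)] = {0};

  /* Assumption: the usual 16-byte layout with 4-byte int and float.  */
  assert(sizeof image == 2 * sizeof(uint64_t));

  memcpy(image + offsetof(struct guu, a), &a, sizeof a);
  memcpy(image + offsetof(struct guu, b), &b, sizeof b);
  memcpy(image + offsetof(struct guu, c), &c, sizeof c);
  memcpy(image + offsetof(struct guu, d), &d, sizeof d);
  memcpy(out, image, sizeof image);
}
```

Under those assumptions, pack_guu(3, 5, 7.0f, 9, w) yields w[0] = 21474836483 and w[1] = 39743127552, i.e. the two constants loaded into rdi and rsi. The second word is 38654705664 (9 shifted left by 32) combined with 1088421888 (the bit pattern of 7.0f), which is the split visible in the two-call listing.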

The problem occurs on GCC versions 4.8.1 and newer. It does not occur in GCC
version 4.7.4, which generated different code that is otherwise inefficient.

For reference, the problem also exists in Clang versions 3.5 and newer, but not
in versions 3.4 and earlier.

[Bug tree-optimization/88739] [7/8/9 Regression] Big-endian union bug

2019-01-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739

--- Comment #28 from Richard Biener  ---
(In reply to Wilco from comment #27)
> (In reply to Eric Botcazou from comment #22)
> > > Is it really pure RTL, therefore not used in tree? So the above patch 
> > > using
> > > BITS_BIG_ENDIAN for tree stuff would be incorrect to use it?
> > 
> > I wouldn't say incorrect, just inappropriate and unnecessary.  And, yes, it
> > isn't used at the tree level and should stay so IMO.  BYTES_BIG_ENDIAN alone
> > already implicitly enforces a numbering on bits.
> 
> I mean incorrect as in the optimization would still trigger and give
> incorrect results if BITS_BIG_ENDIAN == BYTES_BIG_ENDIAN (given that
> BITS_BIG_ENDIAN has no bearing on the bitfield offsets used on tree level).

Given that it matters for

  /* If I2 is setting a pseudo to a constant and I3 is setting some
 sub-part of it to another constant, merge them by making a new
 constant.  */
  if (i1 == 0
...
  if (GET_CODE (dest) == ZERO_EXTRACT)
{
...
  if (BITS_BIG_ENDIAN)
offset = GET_MODE_PRECISION (dest_mode) - width - offset;

and VN tries to do sth similar I wonder if it does matter after all...

That said, the docs also refer to 'bit-field instructions' but do not
elaborate further -- I guess zero_extract is such but I'd have guessed
BIT_FIELD_REF (on trees) is as well.  But yes, RTL expansion adjusts
things based on BITS_BIG_ENDIAN so it looks like GENERIC doesn't care
(or assumes BITS_BIG_ENDIAN == BYTES_BIG_ENDIAN).

[Bug tree-optimization/88739] [7/8/9 Regression] Big-endian union bug

2019-01-09 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739

--- Comment #29 from Wilco  ---
(In reply to Richard Biener from comment #26)

> Did anybody test the patch?  Testing on x86_64 will be quite pointless...

Well that generates _18 = BIT_FIELD_REF <_2, 16, 14>; and becomes:

ubfx    x1, x20, 2, 16

This extracts bits 2-17 of the 30-bit value instead of bits 14-29. The issue is
that we're using a bitfield reference on a value that is claimed not to be a
bitfield in comment 6. So I can't see how using BIT_FIELD_REF could ever work
correctly.
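To make the two bit ranges concrete, here is a small sketch in C that models an unsigned bit-field extract in the style of ubfx, with bit 0 as the least significant bit. The function name and any input values are illustrative only and are not taken from the bug's test case.

```c
#include <assert.h>
#include <stdint.h>

/* Model of an unsigned bit-field extract: return `width` bits of `value`
   starting at bit `lsb`, where bit 0 is the least significant bit.  */
static uint64_t extract_bits(uint64_t value, unsigned lsb, unsigned width)
{
  assert(width >= 1 && width <= 64 && lsb < 64 && lsb + width <= 64);
  uint64_t mask = (width == 64) ? ~UINT64_C(0)
                                : ((UINT64_C(1) << width) - 1);
  return (value >> lsb) & mask;
}
```

With this model, extract_bits(v, 2, 16) reads bits 2-17 and extract_bits(v, 14, 16) reads bits 14-29, so a value with only bits 14-29 set gives different results for the two calls.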

[Bug tree-optimization/88739] [7/8/9 Regression] Big-endian union bug

2019-01-09 Thread ebotcazou at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739

--- Comment #30 from Eric Botcazou  ---
> That said, the docs also refer to 'bit-field instructions' but do not
> elaborate further -- I guess zero_extract is such but I'd have guessed
> BIT_FIELD_REF (on trees) is as well.  But yes, RTL expansion adjusts
> things based on BITS_BIG_ENDIAN so it looks like GENERIC doesn't care
> (or assumes BITS_BIG_ENDIAN == BYTES_BIG_ENDIAN).

Yes, BYTES_BIG_ENDIAN is implicitly propagated to bits at the tree level.
I don't think that we want to support BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN at
the tree level, that would be a nightmare.

[Bug tree-optimization/88760] GCC unrolling is suboptimal

2019-01-09 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760

Wilco  changed:

   What|Removed |Added

 CC||wilco at gcc dot gnu.org

--- Comment #4 from Wilco  ---
(In reply to ktkachov from comment #2)
> Created attachment 45386 [details]
> aarch64-llvm output with -Ofast -mcpu=cortex-a57
> 
> I'm attaching the full LLVM aarch64 output.
> 
> The output you quoted is with -funroll-loops. If that's not given, GCC
> doesn't seem to unroll by default at all (on aarch64 or x86_64 from my
> testing).
> 
> Is there anything we can do to make the default unrolling a bit more
> aggressive?

I don't think the RTL unroller works at all. It doesn't have the right
settings, and doesn't understand how to unroll, so we always get inefficient
and bloated code.

To do unrolling correctly it has to be integrated at tree level - for example
when vectorization isn't possible/beneficial, unrolling might still be a good
idea.

[Bug tree-optimization/88767] 'unroll and jam' not optimizing some loops

2019-01-09 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767

Bill Schmidt  changed:

   What|Removed |Added

 Status|WAITING |UNCONFIRMED
 Ever confirmed|1   |0

--- Comment #2 from Bill Schmidt  ---
Hi Richard -- This was reported to us internally.  The performance of this test
case on a P8 server indicates that disabling complete unrolling and applying
unroll-and-jam could produce about a 1.5x speedup.  I am going to have our
performance team verify that this is the case using just the options that Li
Jia used; the original report modified the source to provide the results of
unroll-and-jam since the reporter didn't know how to disable cunrolli.  I'll
post the results here when we have them.

[Bug tree-optimization/88763] Better Output for Loop Unswitching

2019-01-09 Thread marius.messerschmidt at googlemail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88763

--- Comment #3 from Marius Messerschmidt  ---
Sorry but I do not fully understand what you mean. Do you suggest using
different command line arguments?

So far I tried:

-fdump-tree-all
-fdump-tree-unswitch

and

-fopt-info-all-optall

But none of them told me all the things that I would like to know, most
importantly the reason why a particular loop was skipped during unswitching
(e.g. because it is not invariant). Right now -fdump-tree-unswitch already
reports a few things, like too-many-instructions or too-many-branches.
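For readers following along, this is the kind of loop the pass is about. It is a hand-written sketch of the transformation at source level, with made-up function names, and not output from GCC.

```c
#include <assert.h>
#include <stddef.h>

/* Candidate for unswitching: `scale` never changes inside the loop, so
   the branch condition is loop-invariant.  */
static long sum_switched(const int *v, size_t n, int scale)
{
  assert(v != NULL || n == 0);
  long s = 0;
  for (size_t i = 0; i < n; i++) {
    if (scale)
      s += 2L * v[i];
    else
      s += v[i];
  }
  return s;
}

/* Hand-unswitched form: the invariant test is hoisted out and the loop
   is duplicated, one copy per branch.  */
static long sum_unswitched(const int *v, size_t n, int scale)
{
  assert(v != NULL || n == 0);
  long s = 0;
  if (scale) {
    for (size_t i = 0; i < n; i++)
      s += 2L * v[i];
  } else {
    for (size_t i = 0; i < n; i++)
      s += v[i];
  }
  return s;
}
```

If the condition depended on v[i] instead of scale, it would no longer be invariant and the loop could not be unswitched this way; that is the sort of reason the dump output could usefully report.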

[Bug tree-optimization/88767] 'unroll and jam' not optimizing some loops

2019-01-09 Thread matz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767

--- Comment #3 from Michael Matz  ---
I don't see anything to improve either (as far as unroll-and-jam is concerned).
It's quite possible that cunrolli is harming more than helping in this case,
but with it disabled it seems the code is as it should be.

So, please state what you want to see changed: unroll-and-jam or cunrolli?

[Bug tree-optimization/88739] [7/8/9 Regression] Big-endian union bug

2019-01-09 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739

--- Comment #31 from rguenther at suse dot de  ---
On Wed, 9 Jan 2019, wilco at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739
> 
> --- Comment #29 from Wilco  ---
> (In reply to Richard Biener from comment #26)
> 
> > Did anybody test the patch?  Testing on x86_64 will be quite pointless...
> 
> Well that generates _18 = BIT_FIELD_REF <_2, 16, 14>; and becomes:
> 
> ubfx    x1, x20, 2, 16
> 
> This extracts bits 2-17 of the 30-bit value instead of bits 14-29. The issue 
> is
> that we're using a bitfield reference on a value that is claimed not to be a
> bitfield in comment 6. So I can't see how using BIT_FIELD_REF could ever work
> correctly.

So that's because TYPE_PRECISION != GET_MODE_PRECISION and the
BIT_FIELD_REF expansion counting from GET_MODE_PRECISION I suppose.

Thus there is a RTL expansion side of the bug after all?

The "fixed" RTL is

(insn 6 5 7 (set (reg:SI 95)
(lshiftrt:SI (reg/v:SI 94 [ ulAddr ])
(const_int 2 [0x2]))) "t.c":42:48 -1
 (nil))

(insn 7 6 8 (set (reg:SI 96)
(and:SI (reg:SI 95)
(const_int 1073741823 [0x3fffffff]))) "t.c":42:48 -1
 (nil))

(insn 8 7 9 (set (subreg:DI (reg:HI 97) 0)
(zero_extract:DI (subreg:DI (reg:SI 96) 0)
(const_int 16 [0x10])
(const_int 2 [0x2]))) "t.c":44:8 -1
 (nil))

so the 30bit value is in reg:SI 96 (the :30 cast causes the
and with 0x3fffffff) but then the zero_extract we generate
is bogus.

So maybe the :30 cast should have been a shift for BYTES_BIG_ENDIAN?

We might be able to work around this by optimization on GIMPLE,
combining

  _1 = ulAddr_3(D) >> 2;
  _2 = () _1;
  _6 = BIT_FIELD_REF <_2, 16, 14>;

as far as eliminating at least the non-mode precision type...

Of course that would just work around the underlying RTL expansion
bug?

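Note we can end up with things like

 _2 = (  _3 = (  _5 = _2 + 3;

as well so shifting at the conversion might not be the correct
answer (but instead BIT_FIELD_REF expansion needs to be fixed).

Alternatively we could declare it invalid GIMPLE and require
BIT_FIELD_REF positions to be always relative to the mode
(but then I'd rather disallow BIT_FIELD_REF on non-mode
precision entities...).

Sth like the following might fix the RTL expansion issue
which then generates

Test_func:
ubfx    x0, x0, 2, 16
cmp w0, 1
bne .L6
mov w0, 0

and just

(insn 6 5 7 (set (reg:SI 95)
(lshiftrt:SI (reg/v:SI 94 [ ulAddr ])
(const_int 2 [0x2]))) "t.c":42:48 -1
 (nil))

(insn 7 6 8 (set (reg:SI 96)
(and:SI (reg:SI 95)
(const_int 1073741823 [0x3fffffff]))) "t.c":42:48 -1
 (nil))

(insn 8 7 9 (set (reg:SI 97)
(zero_extend:SI (subreg:HI (reg:SI 96) 2))) "t.c":44:8 -1
 (nil))

Index: gcc/expr.c
===
--- gcc/expr.c  (revision 267553)
+++ gcc/expr.c  (working copy)
@@ -10562,6 +10562,15 @@ expand_expr_real_1 (tree exp, rtx target
   infinitely recurse.  */
gcc_assert (tem != exp);
 
+   /* When extracting from non-mode bitsize entities adjust the
+  bit position for BYTES_BIG_ENDIAN.  */
+   if (INTEGRAL_TYPE_P (TREE_TYPE (tem))
+   && (TYPE_PRECISION (TREE_TYPE (tem))
+   < GET_MODE_BITSIZE (as_a  (TYPE_MODE 
(TREE_TYPE (tem)
+   && BYTES_BIG_ENDIAN)
+ bitpos += (GET_MODE_BITSIZE (as_a  (TYPE_MODE 
(TREE_TYPE (tem
+- TYPE_PRECISION (TREE_TYPE (tem)));
+
/* If TEM's type is a union of variable size, pass TARGET to the 
inner
   computation, since it will need a temporary and TARGET is known
   to have to do.  This occurs in unchecked conversion in Ada.  */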

[Bug tree-optimization/88767] 'unroll and jam' not optimizing some loops

2019-01-09 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767

--- Comment #4 from rguenther at suse dot de  ---
On Wed, 9 Jan 2019, wschmidt at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767
> 
> Bill Schmidt  changed:
> 
>What|Removed |Added
> 
>  Status|WAITING |UNCONFIRMED
>  Ever confirmed|1   |0
> 
> --- Comment #2 from Bill Schmidt  ---
> Hi Richard -- This was reported to us internally.  The performance of this 
> test
> case on a P8 server indicates that disabling complete unrolling and applying
> unroll-and-jam could produce about a 1.5x speedup.  I am going to have our
> performance team verify that this is the case using just the options that Li
> Jia used; the original report modified the source to provide the results of
> unroll-and-jam since the reporter didn't know how to disable cunrolli.  I'll
> post the results here when we have them.

Note for cases like this it would be nice to extend our set of loop 
pragmas so you could say

#pragma GCC loop unroll-and-jam [factor]

on the outer loop which should then disable unrolling of the inner.

If source modification is possible, that is.  Using 
-fdisable-tree-cunrolli isn't meant to be a "production thing"

[Bug tree-optimization/88767] 'unroll and jam' not optimizing some loops

2019-01-09 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767

--- Comment #5 from Bill Schmidt  ---
From the original reporter:

Partially unrolling the outermost loop in the innermost loop body enables data
reuse for array A (see source) thereby improving the mem-ops/compute ratio and
providing the performance gain.
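As an illustration of the data reuse described here, the following sketch shows unroll-and-jam by 2 on a small loop nest. The kernel is hypothetical (the reporter's source is not reproduced in this thread); the array `a` only plays the role that array A is said to play in the test case.

```c
#include <assert.h>
#include <stddef.h>

enum { N = 4, M = 3 };  /* N is even, so the outer unroll has no remainder */

/* Baseline nest: every a[j] is loaded again for each outer iteration.  */
static void nest_plain(const int a[M], const int b[N][M], int out[N])
{
  for (size_t i = 0; i < N; i++) {
    int acc = 0;
    for (size_t j = 0; j < M; j++)
      acc += a[j] * b[i][j];
    out[i] = acc;
  }
}

/* Unroll-and-jam by 2: two outer iterations share one inner loop, so each
   a[j] is loaded once and used twice.  */
static void nest_unroll_jam(const int a[M], const int b[N][M], int out[N])
{
  assert(N % 2 == 0);
  for (size_t i = 0; i < N; i += 2) {
    int acc0 = 0, acc1 = 0;
    for (size_t j = 0; j < M; j++) {
      int aj = a[j];             /* the reused value */
      acc0 += aj * b[i][j];
      acc1 += aj * b[i + 1][j];
    }
    out[i] = acc0;
    out[i + 1] = acc1;
  }
}
```

Both functions compute the same result; the second does half as many loads of `a` per multiply, which is the improved mem-ops/compute ratio mentioned above. If the inner loop is completely unrolled first, there is no inner loop left to jam into.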

[Bug tree-optimization/88739] [7/8/9 Regression] Big-endian union bug

2019-01-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739

--- Comment #32 from Richard Biener  ---
(In reply to rguent...@suse.de from comment #31)
> On Wed, 9 Jan 2019, wilco at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739
> > 
> > --- Comment #29 from Wilco  ---
> > (In reply to Richard Biener from comment #26)
> > 
> > > Did anybody test the patch?  Testing on x86_64 will be quite pointless...
> > 
> > Well that generates _18 = BIT_FIELD_REF <_2, 16, 14>; and becomes:
> > 
> > ubfx    x1, x20, 2, 16
> > 
> > This extracts bits 2-17 of the 30-bit value instead of bits 14-29. The 
> > issue is
> > that we're using a bitfield reference on a value that is claimed not to be a
> > bitfield in comment 6. So I can't see how using BIT_FIELD_REF could ever 
> > work
> > correctly.
> 
> So that's because TYPE_PRECISION != GET_MODE_PRECISION and the
> BIT_FIELD_REF expansion counting from GET_MODE_PRECISION I suppose.
> 
> Thus there is a RTL expansion side of the bug after all?
> 
> The "fixed" RTL is
> 
> (insn 6 5 7 (set (reg:SI 95)
> (lshiftrt:SI (reg/v:SI 94 [ ulAddr ])
> (const_int 2 [0x2]))) "t.c":42:48 -1
>  (nil))
> 
> (insn 7 6 8 (set (reg:SI 96)
> (and:SI (reg:SI 95)
> (const_int 1073741823 [0x3fffffff]))) "t.c":42:48 -1
>  (nil))
> 
> (insn 8 7 9 (set (subreg:DI (reg:HI 97) 0)
> (zero_extract:DI (subreg:DI (reg:SI 96) 0)
> (const_int 16 [0x10])
> (const_int 2 [0x2]))) "t.c":44:8 -1
>  (nil))
> 
> so the 30bit value is in reg:SI 96 (the :30 cast causes the
> and with 0x3fffffff) but then the zero_extract we generate
> is bogus.
> 
> So maybe the :30 cast should have been a shift for BYTES_BIG_ENDIAN?
> 
> We might be able to work around this by optimization on GIMPLE,
> combining
> 
>   _1 = ulAddr_3(D) >> 2;
>   _2 = () _1;
>   _6 = BIT_FIELD_REF <_2, 16, 14>;
> 
> as far as eliminating at least the non-mode precision type...
> 
> Of course that would just work around the underlying RTL expansion
> bug?
> 
> Note we can end up with things like
> 
>  _2 = (  _3 = (  _5 = _2 + 3;
> 
> as well so shifting at the conversion might not be the correct
> answer (but instead BIT_FIELD_REF expansion needs to be fixed).
> 
> Alternatively we could declare it invalid GIMPLE and require
> BIT_FIELD_REF positions to be always relative to the mode
> (but then I'd rather disallow BIT_FIELD_REF on non-mode
> precision entities...).
> 
> Sth like the following might fix the RTL expansion issue
> which then generates
> 
> Test_func:
> ubfx    x0, x0, 2, 16
> cmp w0, 1
> bne .L6
> mov w0, 0
> 
> and just
> 
> (insn 6 5 7 (set (reg:SI 95)
> (lshiftrt:SI (reg/v:SI 94 [ ulAddr ])
> (const_int 2 [0x2]))) "t.c":42:48 -1
>  (nil))
> 
> (insn 7 6 8 (set (reg:SI 96)
> (and:SI (reg:SI 95)
> (const_int 1073741823 [0x3fffffff]))) "t.c":42:48 -1
>  (nil))
> 
> (insn 8 7 9 (set (reg:SI 97)
> (zero_extend:SI (subreg:HI (reg:SI 96) 2))) "t.c":44:8 -1
>  (nil))
> 
> Index: gcc/expr.c
> ===
> --- gcc/expr.c  (revision 267553)
> +++ gcc/expr.c  (working copy)
> @@ -10562,6 +10562,15 @@ expand_expr_real_1 (tree exp, rtx target
>infinitely recurse.  */
> gcc_assert (tem != exp);
>  
> +   /* When extracting from non-mode bitsize entities adjust the
> +  bit position for BYTES_BIG_ENDIAN.  */
> +   if (INTEGRAL_TYPE_P (TREE_TYPE (tem))
> +   && (TYPE_PRECISION (TREE_TYPE (tem))
> +   < GET_MODE_BITSIZE (as_a  (TYPE_MODE 
> (TREE_TYPE (tem)
> +   && BYTES_BIG_ENDIAN)
> + bitpos += (GET_MODE_BITSIZE (as_a  (TYPE_MODE 
> (TREE_TYPE (tem
> +- TYPE_PRECISION (TREE_TYPE (tem)));
> +
> /* If TEM's type is a union of variable size, pass TARGET to the 
> inner
>computation, since it will need a temporary and TARGET is known
>to have to do.  This occurs in unchecked conversion in Ada.  */

Btw, this needs to be amended for WORDS_BIG_ENDIAN of course.  I guess
we might even run into the case that such BIT_FIELD_REF references
a non-contiguous set of bits... (that's also true for BITS_BIG_ENDIAN !=
BYTES_BIG_ENDIAN I guess).
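The arithmetic behind the bit-position adjustment can be checked by hand. This is a sketch under the assumption that a big-endian bit position is counted from the most significant end of its container, in the same way as the `GET_MODE_PRECISION (dest_mode) - width - offset` flip quoted earlier in this thread; lsb_from_msb_pos is a hypothetical helper, not a GCC function.

```c
#include <assert.h>

/* Convert a bit position counted from the most significant end of a
   `container_bits`-wide value into a position counted from bit 0 (the
   least significant bit).  */
static unsigned lsb_from_msb_pos(unsigned container_bits, unsigned width,
                                 unsigned msb_pos)
{
  assert(msb_pos + width <= container_bits);
  return container_bits - width - msb_pos;
}
```

For a 16-bit field at position 14, a 32-bit container gives lsb 2, which matches the ubfx operands discussed above. Adding the difference between mode size and type precision (32 - 30 = 2) to the position, as the patch does, gives position 16 and therefore lsb 0, the same answer as treating the container as 30 bits wide.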

[Bug tree-optimization/88767] 'unroll and jam' not optimizing some loops

2019-01-09 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767

--- Comment #6 from Bill Schmidt  ---
Yes, we don't want to encourage disabling cunrolli by hand for production use. 
This test case is interesting because it shows a tension between complete
unrolling of inner loops and classical HPC loop optimization, which wants
control over memory access patterns.  I think we will eventually have to
address this more generally.

[Bug rtl-optimization/88770] Redundant load opt. or CSE pessimizes code

2019-01-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88770

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization, ra
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-01-09
 CC||vmakarov at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
I guess being constants would make this a job for lra remat?

Confirmed also on trunk.

[Bug tree-optimization/88767] 'unroll and jam' not optimizing some loops

2019-01-09 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767

--- Comment #7 from Bill Schmidt  ---
(In reply to Michael Matz from comment #3)
> I don't see anything to improve either (as far as unroll-and-jam is
> concerned).
> It's quite possible that cunrolli is harming more than helping in this case,
> but with it disabled it seems the code is as it should be.
> 
> So, please state what you want to see changed: unroll-and-jam or cunrolli?

The question in my mind is what to do about the phase interaction between the
two.  Classical optimizations of loop nests for HPC code optimize memory access
patterns, and cunrolli takes some of the options off the table before
unroll-and-jam (in this case) can analyze the loop.

[Bug c/88769] Call to sin() optimized away, disregarding possible side-effect (errno)

2019-01-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88769

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-01-09
 CC||jsm28 at gcc dot gnu.org
  Component|tree-optimization   |c
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
This is because GCC thinks sin() doesn't set errno.

DEF_LIB_BUILTIN(BUILT_IN_SIN, "sin", BT_FN_DOUBLE_DOUBLE,
ATTR_MATHFN_FPROUNDING)

According to the C standard no error conditions are documented for sin or cos,
specifically no domain error is documented.

[Bug tree-optimization/88760] GCC unrolling is suboptimal

2019-01-09 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760

--- Comment #5 from Wilco  ---
(In reply to Wilco from comment #4)
> (In reply to ktkachov from comment #2)
> > Created attachment 45386 [details]
> > aarch64-llvm output with -Ofast -mcpu=cortex-a57
> > 
> > I'm attaching the full LLVM aarch64 output.
> > 
> > The output you quoted is with -funroll-loops. If that's not given, GCC
> > doesn't seem to unroll by default at all (on aarch64 or x86_64 from my
> > testing).
> > 
> > Is there anything we can do to make the default unrolling a bit more
> > aggressive?
> 
> I don't think the RTL unroller works at all. It doesn't have the right
> settings, and doesn't understand how to unroll, so we always get inefficient
> and bloated code.
> 
> To do unrolling correctly it has to be integrated at tree level - for
> example when vectorization isn't possible/beneficial, unrolling might still
> be a good idea.

To add some numbers to the conversation, the gain LLVM gets from default
unrolling is 4.5% on SPECINT2017 and 1.0% on SPECFP2017.

This clearly shows there is huge potential from unrolling, *if* we can teach
GCC to unroll properly like LLVM. That means early unrolling, using good
default settings and using a trailing loop rather than inefficient peeling.
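As a source-level picture of "a trailing loop", here is a minimal sketch of a reduction unrolled by 4 with the leftover iterations handled after the main loop. It is hand-written for illustration and does not claim to be what LLVM or GCC actually emits.

```c
#include <assert.h>
#include <stddef.h>

/* Sum with the main loop unrolled by 4 and a trailing loop that handles
   the remaining 0-3 elements.  */
static long sum_unrolled4(const int *v, size_t n)
{
  assert(v != NULL || n == 0);
  long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
  size_t i = 0;

  for (; i + 4 <= n; i += 4) {   /* unrolled main loop */
    s0 += v[i];
    s1 += v[i + 1];
    s2 += v[i + 2];
    s3 += v[i + 3];
  }

  long s = s0 + s1 + s2 + s3;
  for (; i < n; i++)             /* trailing loop for the remainder */
    s += v[i];
  return s;
}
```

The unrolled body is entered directly when there are at least four elements; peeling would instead run the leftover iterations before the main loop.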

[Bug tree-optimization/88767] 'unroll and jam' not optimizing some loops

2019-01-09 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767

--- Comment #8 from rguenther at suse dot de  ---
On Wed, 9 Jan 2019, wschmidt at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767
> 
> --- Comment #7 from Bill Schmidt  ---
> (In reply to Michael Matz from comment #3)
> > I don't see anything to improve either (as far as unroll-and-jam is
> > concerned).
> > It's quite possible that cunrolli is harming more than helping in this case,
> > but with it disabled it seems the code is as it should be.
> > 
> > So, please state what you want to see changed: unroll-and-jam or cunrolli?
> 
> The question in my mind is what to do about the phase interaction between the
> two.  Classical optimizations of loop nests for HPC code optimize memory 
> access
> patterns, and cunrolli takes some of the options off the table before
> unroll-and-jam (in this case) can analyze the loop.

An improvement of the heuristics could be to turn down
--param max-completely-peel-times and friends for cunrolli.

cunrolli is important to remove abstraction in C++ since none of the
scalar optimization passes knows to unroll loops "virtually" (it's on
my list to experiment with such an idea for value-numbering)

[Bug tree-optimization/88771] New: [9 Regression] Misleading -Werror=array-bounds error

2019-01-09 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88771

Bug ID: 88771
   Summary: [9 Regression] Misleading -Werror=array-bounds error
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Keywords: diagnostic
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: marxin at gcc dot gnu.org
  Target Milestone: ---

Starting from r264956 I see an error for:

$ cat om.i
typedef struct {
  int a;
} * b;

char *c, *x;
int f;

void d() {
  b e;
  char a = f + 1 ?: f;
  __builtin_strncpy(c, x, f);
  if (a)
    e->a = 0;
}

$ gcc  om.i -c -O2 -Werror=array-bounds
om.i: In function ‘d’:
om.i:11:3: error: ‘__builtin_strncpy’ pointer overflow between offset 0 and
size [-1, 9223372036854775807] [-Werror=array-bounds]
   11 |   __builtin_strncpy(c, x, f);
  |   ^~
cc1: some warnings being treated as errors

$ gcc  om.i -c -O2 -Werror=array-bounds -m32
om.i: In function ‘d’:
om.i:11:3: error: ‘__builtin_strncpy’ pointer overflow between offset 0 and
size [4294967295, 2147483647] [-Werror=array-bounds]
   11 |   __builtin_strncpy(c, x, f);
  |   ^~
cc1: some warnings being treated as errors

[Bug tree-optimization/88771] [9 Regression] Misleading -Werror=array-bounds error

2019-01-09 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88771

Martin Liška  changed:

   What|Removed |Added

   Last reconfirmed||2019-1-9
 CC||msebor at gcc dot gnu.org,
   ||rguenth at gcc dot gnu.org
  Known to work||8.2.0
   Target Milestone|--- |9.0
  Known to fail||9.0

[Bug tree-optimization/88760] GCC unrolling is suboptimal

2019-01-09 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760

--- Comment #6 from rguenther at suse dot de  ---
On Wed, 9 Jan 2019, wilco at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760
> 
> --- Comment #5 from Wilco  ---
> (In reply to Wilco from comment #4)
> > (In reply to ktkachov from comment #2)
> > > Created attachment 45386 [details]
> > > aarch64-llvm output with -Ofast -mcpu=cortex-a57
> > > 
> > > I'm attaching the full LLVM aarch64 output.
> > > 
> > > The output you quoted is with -funroll-loops. If that's not given, GCC
> > > doesn't seem to unroll by default at all (on aarch64 or x86_64 from my
> > > testing).
> > > 
> > > Is there anything we can do to make the default unrolling a bit more
> > > aggressive?
> > 
> > I don't think the RTL unroller works at all. It doesn't have the right
> > settings, and doesn't understand how to unroll, so we always get inefficient
> > and bloated code.
> > 
> > To do unrolling correctly it has to be integrated at tree level - for
> > example when vectorization isn't possible/beneficial, unrolling might still
> > be a good idea.
> 
> To add some numbers to the conversation, the gain LLVM gets from default
> unrolling is 4.5% on SPECINT2017 and 1.0% on SPECFP2017.
> 
> This clearly shows there is huge potential from unrolling, *if* we can teach
> GCC to unroll properly like LLVM. That means early unrolling, using good
> default settings and using a trailing loop rather than inefficient peeling.

I don't see why this cannot be done on RTL where we have vastly more
information of whether there are execution resources that can be
used by unrolling.  Note we also want unrolling to interleave
instructions to not rely on pre-reload scheduling which in turn means
having a good eye on register pressure (again sth not very well handled
on GIMPLE)

[Bug c/88769] Call to sin() optimized away, disregarding possible side-effect (errno)

2019-01-09 Thread per at pz dot se
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88769

--- Comment #2 from Per Zetterlund  ---
The POSIX standard describes domain error conditions for sin() :
http://pubs.opengroup.org/onlinepubs/9699919799/functions/sin.html .

I guess there is a discrepancy between the C standard and the POSIX standard in
this case.

[Bug tree-optimization/88771] [9 Regression] Misleading -Werror=array-bounds error

2019-01-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88771

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
It's the restrict pass doing this after VRP figured we can simplify things
via threading:

# .MEM_22 = VDEF <.MEM_7(D)>
__builtin_strncpy (pretmp_9, pretmp_19, 18446744073709551615);

not sure what the warning is about though but I guess it's triggered
by seeing that e->a = 0 store?
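The constant in that call is what -1 becomes when converted to a 64-bit unsigned size, which suggests (as a reading of the dump, not something confirmed here) that the threaded path is the one where f is -1, since `f + 1 ?: f` evaluates to f exactly when f + 1 is zero. A small sketch of the conversions, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Convert a signed int length to an unsigned size of a given width, as
   happens implicitly when an int is passed where a size is expected.  */
static uint64_t len_as_size64(int len) { return (uint64_t)len; }
static uint32_t len_as_size32(int len) { return (uint32_t)len; }
```

len_as_size64(-1) is 18446744073709551615 and len_as_size32(-1) is 4294967295, the lower bound printed in the -m32 diagnostic. The upper bounds in the two diagnostics, 9223372036854775807 and 2147483647, are the largest signed 64-bit and 32-bit values.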

The testcase seems to be reduced ad absurdum and the bisection looks odd.

Can you attach some original source?

[Bug tree-optimization/88771] [9 Regression] Misleading -Werror=array-bounds error

2019-01-09 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88771

--- Comment #2 from Martin Liška  ---
Created attachment 45390
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45390&action=edit
original test-case

Original test that fails just with -m32:

$ gcc  om-original.i -c -O2 -Werror=array-bounds -m32
In file included from /usr/include/string.h:494,
 from /usr/include/X11/Xfuncs.h:46,
 from ../../../include/X11/Xlibint.h:335,
 from omGeneric.c:53:
In function ‘strncpy’,
inlined from ‘read_EncodingInfo’ at omGeneric.c:1836:9:
/usr/include/bits/string_fortified.h:106:10: error: ‘__builtin_strncpy’ pointer
overflow between offset 0 and size [4294967295, 2147483647]
[-Werror=array-bounds]
  106 |   return __builtin___strncpy_chk (__dest, __src, __len, __bos
(__dest));
  | 
^~~ 
cc1: some warnings being treated as errors

[Bug tree-optimization/88771] [9 Regression] Misleading -Werror=array-bounds error

2019-01-09 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88771

--- Comment #3 from Martin Liška  ---
Original test-case started to produce the warning since r263662.

[Bug tree-optimization/88760] GCC unrolling is suboptimal

2019-01-09 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760

--- Comment #7 from Wilco  ---
(In reply to rguent...@suse.de from comment #6)
> On Wed, 9 Jan 2019, wilco at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760
> > 
> > --- Comment #5 from Wilco  ---
> > (In reply to Wilco from comment #4)
> > > (In reply to ktkachov from comment #2)
> > > > Created attachment 45386 [details]
> > > > aarch64-llvm output with -Ofast -mcpu=cortex-a57
> > > > 
> > > > I'm attaching the full LLVM aarch64 output.
> > > > 
> > > > The output you quoted is with -funroll-loops. If that's not given, GCC
> > > > doesn't seem to unroll by default at all (on aarch64 or x86_64 from my
> > > > testing).
> > > > 
> > > > Is there anything we can do to make the default unrolling a bit more
> > > > aggressive?
> > > 
> > > I don't think the RTL unroller works at all. It doesn't have the right
> > > settings, and doesn't understand how to unroll, so we always get 
> > > inefficient
> > > and bloated code.
> > > 
> > > To do unrolling correctly it has to be integrated at tree level - for
> > > example when vectorization isn't possible/beneficial, unrolling might 
> > > still
> > > be a good idea.
> > 
> > To add some numbers to the conversation, the gain LLVM gets from default
> > unrolling is 4.5% on SPECINT2017 and 1.0% on SPECFP2017.
> > 
> > This clearly shows there is huge potential from unrolling, *if* we can teach
> > GCC to unroll properly like LLVM. That means early unrolling, using good
> > default settings and using a trailing loop rather than inefficient peeling.
> 
> I don't see why this cannot be done on RTL where we have vastly more
> information of whether there are execution resources that can be
> used by unrolling.  Note we also want unrolling to interleave
> instructions to not rely on pre-reload scheduling which in turn means
> having a good eye on register pressure (again sth not very well handled
> on GIMPLE)

The main issue is that other loop optimizations are done on tree, so things
like addressing modes, loop invariants, CSEs are run on the non-unrolled
version. Then when we unroll in RTL we end up with very non-optimal code.
Typical unrolled loop starts like this:

add x13, x2, 1
add x14, x2, 2
add x11, x2, 3
add x10, x2, 4
ldr w30, [x4, x13, lsl 2]
add x9, x2, 5
add x5, x2, 6
add x12, x2, 7
ldr d23, [x3, x2, lsl 3]
... rest of unrolled loop

So basically it decides to create a new induction variable for every unrolled
copy in the loop. This often leads to spills just because it creates way too
many redundant addressing instructions. It also blocks scheduling between
iterations since the alias optimization doesn't appear to understand simple
constant differences between indices.
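A rough source-level analogy of the two shapes, written by hand: one index per unrolled copy versus a single induction variable with constant offsets. A compiler may well generate the same code for both functions, so this only illustrates the structure being described, not the RTL unroller's actual output.

```c
#include <assert.h>
#include <stddef.h>

/* One separately computed index per unrolled copy, in the spirit of the
   add x13/x14/x11/... sequence above.  n must be a multiple of 4.  */
static long sum_many_indices(const int *v, size_t n)
{
  assert(v != NULL && n % 4 == 0);
  long s = 0;
  for (size_t i = 0; i < n; i += 4) {
    size_t i1 = i + 1, i2 = i + 2, i3 = i + 3;  /* extra index values */
    s += v[i];
    s += v[i1];
    s += v[i2];
    s += v[i3];
  }
  return s;
}

/* A single induction variable (the pointer p); the unrolled copies use
   constant offsets from it.  n must be a multiple of 4.  */
static long sum_one_pointer(const int *v, size_t n)
{
  assert(v != NULL && n % 4 == 0);
  long s = 0;
  for (const int *p = v, *end = v + n; p != end; p += 4)
    s += (long)p[0] + p[1] + p[2] + p[3];
  return s;
}
```

In the second form the constant offsets can be folded into the addressing mode, and the fixed distance between the accesses is explicit.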

So unrolling should definitely be done at a high level just like vectorization.

[Bug bootstrap/88450] [9 regression] ICE in stage 2 compiler while configuring libgcc

2019-01-09 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88450

Jakub Jelinek  changed:

   What|Removed |Added

 CC||uros at gcc dot gnu.org

--- Comment #15 from Jakub Jelinek  ---
Because you are using -march=native -mtune=native by default, it is unclear
what exact ISA/tuning that is.  From the dump I assume it must be something
with -mavx or -mavx2, so that is what I've used, but it would be nice to know
it more exactly.  I guess ./xgcc -B ./ -v -xc /dev/null -S would reveal that.

Anyway, it seems that on gimplify.ii the problematic change in
assign_stack_local_1 is triggered just once, and to me it looks completely
unnecessary.  assign_stack_temp_for_type is called with BLKmode, 24, and the
gimple_stmt_iterator type.  This is done because of the
seq = gsi_split_seq_after (iter);
call in gimplify_cleanup_point_expr inlined into gimplify_expr, where iter is
clearly passed by invisible reference.
assign_stack_temp_for_type calls get_stack_local_alignment which does:
  if (mode == BLKmode)
    alignment = BIGGEST_ALIGNMENT;
  else
    alignment = GET_MODE_ALIGNMENT (mode);

  /* Allow the frond-end to (possibly) increase the alignment of this
     stack slot.  */
  if (! type)
    type = lang_hooks.types.type_for_mode (mode, 0);

  return STACK_SLOT_ALIGNMENT (type, mode, alignment);
It seems a complete waste to me to try to align the 24 byte structure to a
32 byte boundary and allocate 48 bytes for it on the stack, then dynamically
adjust the start so that it is 32 byte aligned.  BIGGEST_ALIGNMENT is 256 bits
because of -mavx (could be even 512 bits for -mavx512f).

So, first of all, I'd think we should in i386 STACK_SLOT_ALIGNMENT undo this
unnecessary overalignment.  But that doesn't explain why you get a segfault
elsewhere.
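One plausible accounting for the 48 bytes, as a sketch only: it assumes the stack slot area is guaranteed 16-byte alignment and that enough slack is reserved to realign to 32 bytes at run time. GCC's actual computation may differ, and both helper names are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to a multiple of align (align must be a power of two).  */
static size_t align_up(size_t x, size_t align)
{
  assert(align != 0 && (align & (align - 1)) == 0);
  return (x + align - 1) & ~(align - 1);
}

/* Bytes to reserve so that an object of `size` bytes can be placed at an
   `align`-aligned address inside a region that is only guaranteed to be
   `base_align`-aligned; the total is rounded up to base_align.  */
static size_t reserve_for_realign(size_t size, size_t align,
                                  size_t base_align)
{
  size_t slack = align > base_align ? align - base_align : 0;
  return align_up(size + slack, base_align);
}
```

Under these assumptions reserve_for_realign(24, 32, 16) is 48, while keeping the slot at 16-byte alignment would need reserve_for_realign(24, 16, 16), which is 32.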

[Bug libgcc/88772] New: Exception handling configured mode does not match the one finally used

2019-01-09 Thread ylatuya at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88772

Bug ID: 88772
   Summary: Exception handling configured mode does not match the
one finally used
   Product: gcc
   Version: 8.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libgcc
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ylatuya at gmail dot com
  Target Milestone: ---

I am building a multilib GCC+MinGW toolchain targeting Windows. I have built
the cross toolchain, which compiled and works correctly and I am now trying to
build the native one.
The cross toolchain is configured with:
../configure --prefix /home/andoni/mingw/linux/w64 --libdir
/home/andoni/mingw/linux/w64/lib --enable-introspection  
--with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm'
--disable-shared --disable-libgomp --disable-libquadmath
--disable-libquadmath-support --disable-libmudflap --disable-libmpx
--disable-libssp --disable-nls --enable-threads=posix --enable-__cxa_atexit
--enable-lto --enable-plugin --enable-multiarch --enable-languages=c,c++
--enable-long-long 
--with-sysroot=/home/andoni/mingw/linux/w64/x86_64-w64-mingw32/sysroot 
--with-local-prefix=/home/andoni/mingw/linux/w64/x86_64-w64-mingw32/sysroot 
--target=x86_64-w64-mingw32


The native toolchain is configured with the same settings, only changing the
host:
../configure --prefix /home/andoni/mingw/windows/w64 --libdir
/home/andoni/mingw/windows/w64/lib --disable-introspection  
--with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm'
--disable-shared --disable-libgomp --disable-libquadmath
--disable-libquadmath-support --disable-libmudflap --disable-libmpx
--disable-libssp --disable-nls --enable-threads=posix --enable-__cxa_atexit
--enable-lto --enable-plugin --enable-multiarch --enable-languages=c,c++
--enable-long-long 
--with-sysroot=/home/andoni/mingw/windows/w64/x86_64-w64-mingw32/sysroot 
--with-local-prefix=/home/andoni/mingw/windows/w64/x86_64-w64-mingw32/sysroot 
--target=x86_64-w64-mingw32 --host=x86_64-w64-mingw32

In neither of them do I force SJLJ or disable it, so according to the
documentation and the headers it should be using SEH for 64 bits and SJLJ for
32 bits:
gcc/config/i386/cygming.h
/* If configured with --disable-sjlj-exceptions, use DWARF2 for 32-bit
   mode else default to SJLJ.  64-bit code uses SEH unless you request
   SJLJ.  */


But what happens is that it ends up using i386/t-dw2-eh instead of
i386/t-seh-eh, and there is a compilation error:

../../../../libgcc/unwind.inc: In function '_Unwind_RaiseException_Phase2':
../../../../libgcc/unwind.inc:53:62: error: 'struct _Unwind_Exception' has no
member named 'private_2'; did you mean 'private_'?
   match_handler = (uw_identify_context (context) == exc->private_2
                                                          ^
                                                          private_

The error seems to be in the switch case for x86_64-mingw32 in
libgcc/config.host:

    # This has to match the logic for DWARF2_UNWIND_INFO in gcc/config/i386/cygming.h
    if test x$ac_cv_sjlj_exceptions = xyes; then
        tmake_eh_file="i386/t-sjlj-eh"
    elif test "${host_address}" = 32; then
        # biarch -m32 with --disable-sjlj-exceptions
        tmake_eh_file="i386/t-dw2-eh"
        md_unwind_header=i386/w32-unwind.h
    else
        tmake_eh_file="i386/t-seh-eh"
    fi
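
Read as a decision function, the config.host fragment has only one path to i386/t-dw2-eh: the SJLJ check must be negative and host_address must compare equal to 32. A minimal C sketch of the same branching follows; the function names are made up for illustration, and the real logic is the shell fragment quoted above.

```c
#include <string.h>

/* Hypothetical C rendering of the shell logic quoted from libgcc/config.host.
   sjlj:         non-zero if ac_cv_sjlj_exceptions is "yes"
   host_address: 32 or 64, as determined by libgcc's configure  */
const char *pick_eh_file(int sjlj, int host_address)
{
    if (sjlj)
        return "i386/t-sjlj-eh";
    if (host_address == 32)
        return "i386/t-dw2-eh";   /* biarch -m32 with --disable-sjlj-exceptions */
    return "i386/t-seh-eh";
}

/* Convenience check: does this combination select the SEH makefile fragment? */
int selects_seh(int sjlj, int host_address)
{
    return strcmp(pick_eh_file(sjlj, host_address), "i386/t-seh-eh") == 0;
}
```

If the fragment behaves like this sketch, ending up with t-dw2-eh on a 64-bit host would suggest that host_address came out as 32 in the native build. That is an inference from the snippet, not something confirmed in this report.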

[Bug bootstrap/88450] [9 regression] ICE in stage 2 compiler while configuring libgcc

2019-01-09 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88450

--- Comment #16 from Jakub Jelinek  ---
The following patch does that.  Guess the issues reported in this PR might go
away with that, but it is really just an attempt to fix inefficiency in the
generated code rather than fix the wrong-code issue we have somewhere.

--- gcc/config/i386/i386.c.jj   2019-01-08 22:33:34.605708026 +0100
+++ gcc/config/i386/i386.c  2019-01-09 15:11:35.902663636 +0100
@@ -29679,6 +29679,17 @@ ix86_local_alignment (tree exp, machine_
   && (!type || !TYPE_USER_ALIGN (type))
   && (!decl || !DECL_USER_ALIGN (decl)))
 align = 32;
+  /* Similarly, don't do dynamic stack realignment just because
+ we need a BLKmode stack slot and have high BIGGEST_ALIGNMENT.
+ This is what get_stack_local_alignment returns regardless of
+ the actual needs, undo that here.  */
+  if (align == BIGGEST_ALIGNMENT
+  && mode == BLKmode
+  && !decl
+  && type
+  && align > TYPE_ALIGN (type)
+  && align > MAX_SUPPORTED_STACK_ALIGNMENT)
+align = MAX (TYPE_ALIGN (type), MAX_SUPPORTED_STACK_ALIGNMENT);

   /* If TYPE is NULL, we are allocating a stack slot for caller-save
  register in MODE.  We will return the largest alignment of XF
--- gcc/function.c.jj   2019-01-09 11:15:31.539836837 +0100
+++ gcc/function.c  2019-01-09 15:10:52.971371328 +0100
@@ -919,8 +919,10 @@ assign_stack_temp_for_type (machine_mode

 So for requests which depended on the rounding of SIZE, we go ahead
 and round it now.  We also make sure ALIGNMENT is at least
-BIGGEST_ALIGNMENT.  */
-  gcc_assert (mode != BLKmode || align == BIGGEST_ALIGNMENT);
+minimum of BIGGEST_ALIGNMENT and MAX_SUPPORTED_STACK_ALIGNMENT.  */
+  gcc_assert (mode != BLKmode
+ || align >= MIN (BIGGEST_ALIGNMENT,
+  MAX_SUPPORTED_STACK_ALIGNMENT));
   p->slot = assign_stack_local_1 (mode,
  (mode == BLKmode
   ? aligned_upper_bound (size,
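
For readers who want the condition added to ix86_local_alignment in isolation, here is a stand-alone sketch using plain integers instead of GCC's tree and mode types. The struct, its field names and the function name are hypothetical; only the shape of the test follows the hunk above.

```c
/* All alignments are in bits, as in GCC.  */
struct slot_request {
    unsigned align;        /* alignment requested so far */
    unsigned type_align;   /* TYPE_ALIGN of the type, 0 if there is no type */
    int is_blkmode;        /* non-zero when mode == BLKmode */
    int has_decl;          /* non-zero when a decl is attached */
};

/* Mirror of the added check: if a BLKmode temporary without a decl was
   bumped to BIGGEST_ALIGNMENT although neither its type nor the supported
   stack alignment calls for it, fall back to the larger of those two.  */
unsigned clamp_local_alignment(struct slot_request r,
                               unsigned biggest_alignment,
                               unsigned max_supported_stack_alignment)
{
    if (r.align == biggest_alignment
        && r.is_blkmode
        && !r.has_decl
        && r.type_align != 0
        && r.align > r.type_align
        && r.align > max_supported_stack_alignment) {
        return r.type_align > max_supported_stack_alignment
                   ? r.type_align
                   : max_supported_stack_alignment;
    }
    return r.align;
}
```

With illustrative numbers (not those of any particular target), a request of 256 bits for a type needing 64 bits is reduced to 128 when the supported stack alignment is 128, and is left at 256 when the supported stack alignment is already 256.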

[Bug rtl-optimization/49330] Integer arithmetic on addresses optimised with pointer arithmetic rules

2019-01-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49330

--- Comment #23 from Richard Biener  ---
(In reply to Richard Biener from comment #22)
> Things we fail to disambiguate are
> 
> (mem:TF (pre_dec:SI (reg/f:SI 7 sp)) [0  S16 A8])
>  vs.
> (mem/c:TF (plus:SI (reg/f:SI 19 frame)
>   (const_int -16 [0xfff0])) [1  S16 A128])
> 
> or
> 
> (mem:SI (pre_dec:SI (reg/f:SI 7 sp)) [3  S4 A32])
>  vs.
> (mem/f/c:SI (symbol_ref:SI ("argv") [flags 0x2]  argv>) [2 argv+0 S4 A32])
> 
> where I don't find anything besides CSELIB cselib_sp_based_value_p handling
> in find_base_term that could be the one handling it?
> 
> I guess we should be able to somehow handle both sp and frame based
> accesses in a more conservative way?

it's really 99% like this which is why eventually that CONST_INT restriction
worked so "well".

Can we easily identify spill slot accesses somehow?  The parameter accesses
(frame references?) should simply get appropriate MEM_EXPRs IMHO.

[Bug target/84010] problematic TLS code generation on 64-bit SPARC

2019-01-09 Thread ebotcazou at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84010

--- Comment #14 from Eric Botcazou  ---
Author: ebotcazou
Date: Wed Jan  9 14:34:20 2019
New Revision: 267771

URL: https://gcc.gnu.org/viewcvs?rev=267771&root=gcc&view=rev
Log:
PR target/84010
* config/sparc/sparc.c (sparc_legitimize_tls_address): Only use Pmode
consistently in TLS address generation and adjust code to the renaming
of patterns.  Mark calls to __tls_get_addr as const.
* config/sparc/sparc.md (tgd_hi22): Turn into...
(tgd_hi22): ...this and use Pmode throughout.
(tgd_lo10): Turn into...
(tgd_lo10): ...this and use Pmode throughout.
(tgd_add32): Merge into...
(tgd_add64): Likewise.
(tgd_add): ...this and use Pmode throughout.
(tldm_hi22): Turn into...
(tldm_hi22): ...this and use Pmode throughout.
(tldm_lo10): Turn into...
(tldm_lo10): ...this and use Pmode throughout.
(tldm_add32): Merge into...
(tldm_add64): Likewise.
(tldm_add): ...this and use Pmode throughout.
(tldm_call32): Merge into...
(tldm_call64): Likewise.
(tldm_call): ...this and use Pmode throughout.
(tldo_hix22): Turn into...
(tldo_hix22): ...this and use Pmode throughout.
(tldo_lox10): Turn into...
(tldo_lox10): ...this and use Pmode throughout.
(tldo_add32): Merge into...
(tldo_add64): Likewise.
(tldo_add): ...this and use Pmode throughout.
(tie_hi22): Turn into...
(tie_hi22): ...this and use Pmode throughout.
(tie_lo10): Turn into...
(tie_lo10): ...this and use Pmode throughout.
(tie_ld64): Use DImode throughout.
(tie_add32): Merge into...
(tie_add64): Likewise.
(tie_add): ...this and use Pmode throughout.
(tle_hix22_sp32): Merge into...
(tle_hix22_sp64): Likewise.
(tle_hix22): ...this and use Pmode throughout.
(tle_lox22_sp32): Merge into...
(tle_lox22_sp64): Likewise.
(tle_lox22): ...this and use Pmode throughout.
(*tldo_ldub_sp32): Merge into...
(*tldo_ldub_sp64): Likewise.
(*tldo_ldub): ...this and use Pmode throughout.
(*tldo_ldub1_sp32): Merge into...
(*tldo_ldub1_sp64): Likewise.
(*tldo_ldub1): ...this and use Pmode throughout.
(*tldo_ldub2_sp32): Merge into...
(*tldo_ldub2_sp64): Likewise.
(*tldo_ldub2): ...this and use Pmode throughout.
(*tldo_ldsb1_sp32): Merge into...
(*tldo_ldsb1_sp64): Likewise.
(*tldo_ldsb1): ...this and use Pmode throughout.
(*tldo_ldsb2_sp32): Merge into...
(*tldo_ldsb2_sp64): Likewise.
(*tldo_ldsb2): ...this and use Pmode throughout.
(*tldo_ldub3_sp64): Use DImode throughout.
(*tldo_ldsb3_sp64): Likewise.
(*tldo_lduh_sp32): Merge into...
(*tldo_lduh_sp64): Likewise.
(*tldo_lduh): ...this and use Pmode throughout.
(*tldo_lduh1_sp32): Merge into...
(*tldo_lduh1_sp64): Likewise.
(*tldo_lduh1): ...this and use Pmode throughout.
(*tldo_ldsh1_sp32): Merge into...
(*tldo_ldsh1_sp64): Likewise.
(*tldo_ldsh1): ...this and use Pmode throughout.
(*tldo_lduh2_sp64): Use DImode throughout.
(*tldo_ldsh2_sp64): Likewise.
(*tldo_lduw_sp32): Merge into...
(*tldo_lduw_sp64): Likewise.
(*tldo_lduw): ...this and use Pmode throughout.
(*tldo_lduw1_sp64): Use DImode throughout.
(*tldo_ldsw1_sp64): Likewise.
(*tldo_ldx_sp64): Likewise.
(*tldo_stb_sp32): Merge into...
(*tldo_stb_sp64): Likewise.
(*tldo_stb): ...this and use Pmode throughout.
(*tldo_sth_sp32): Merge into...
(*tldo_sth_sp64): Likewise.
(*tldo_sth): ...this and use Pmode throughout.
(*tldo_stw_sp32): Merge into...
(*tldo_stw_sp64): Likewise.
(*tldo_stw): ...this and use Pmode throughout.
(*tldo_stx_sp64): Use DImode throughout.

Added:
trunk/gcc/testsuite/gcc.target/sparc/tls-ld-int16.c
trunk/gcc/testsuite/gcc.target/sparc/tls-ld-int32.c
trunk/gcc/testsuite/gcc.target/sparc/tls-ld-int64.c
trunk/gcc/testsuite/gcc.target/sparc/tls-ld-int8.c
trunk/gcc/testsuite/gcc.target/sparc/tls-ld-uint16.c
trunk/gcc/testsuite/gcc.target/sparc/tls-ld-uint32.c
trunk/gcc/testsuite/gcc.target/sparc/tls-ld-uint8.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sparc/sparc.c
trunk/gcc/config/sparc/sparc.md
trunk/gcc/testsuite/ChangeLog

[Bug tree-optimization/88739] [7/8/9 Regression] Big-endian union bug

2019-01-09 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739

--- Comment #33 from Wilco  ---
(In reply to Richard Biener from comment #32)

> > 
> > Index: gcc/expr.c
> > ===
> > --- gcc/expr.c  (revision 267553)
> > +++ gcc/expr.c  (working copy)
> > @@ -10562,6 +10562,15 @@ expand_expr_real_1 (tree exp, rtx target
> >infinitely recurse.  */
> > gcc_assert (tem != exp);
> >  
> > +   /* When extracting from non-mode bitsize entities adjust the
> > +  bit position for BYTES_BIG_ENDIAN.  */
> > +   if (INTEGRAL_TYPE_P (TREE_TYPE (tem))
> > +   && (TYPE_PRECISION (TREE_TYPE (tem))
> > +   < GET_MODE_BITSIZE (as_a  (TYPE_MODE 
> > (TREE_TYPE (tem)
> > +   && BYTES_BIG_ENDIAN)
> > + bitpos += (GET_MODE_BITSIZE (as_a  (TYPE_MODE 
> > (TREE_TYPE (tem
> > +- TYPE_PRECISION (TREE_TYPE (tem)));
> > +
> > /* If TEM's type is a union of variable size, pass TARGET to the 
> > inner
> >computation, since it will need a temporary and TARGET is known
> >to have to do.  This occurs in unchecked conversion in Ada.  */
> 
> Btw, this needs to be amended for WORDS_BIG_ENDIAN of course.  I guess
> we might even run into the case that such BIT_FIELD_REF references
> a non-contiguous set of bits... (that's also true for BITS_BIG_ENDIAN !=
> BYTES_BIG_ENDIAN I guess).

Was that meant to be instead of or in addition to the tree-ssa-sccvn.c patch? With
both I get:

lsr w20, w1, 2
...
and w1, w20, 65535

With only the expr.c patch it starts to look as expected:

lsr w20, w1, 2
...
lsr w1, w20, 14

And with the latter case the new torture test now passes on big-endian!
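
The adjustment made by the quoted expr.c hunk can be looked at on its own. This is a minimal sketch with plain integers; the function name is hypothetical, and the INTEGRAL_TYPE_P check from the patch is left out because there is no tree type here.

```c
/* Stand-alone version of the bit-position adjustment: when the value's
   precision is smaller than the bit size of its machine mode and bytes
   are big-endian, skip the unused leading bits.  */
unsigned adjust_bitpos(unsigned bitpos, unsigned mode_bitsize,
                       unsigned precision, int bytes_big_endian)
{
    if (bytes_big_endian && precision < mode_bitsize)
        bitpos += mode_bitsize - precision;
    return bitpos;
}
```

For example, with an illustrative 30-bit value held in a 32-bit mode, a bit position of 0 becomes 2 on a big-endian target and stays 0 on a little-endian one.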

[Bug target/84010] problematic TLS code generation on 64-bit SPARC

2019-01-09 Thread ebotcazou at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84010

--- Comment #15 from Eric Botcazou  ---
Author: ebotcazou
Date: Wed Jan  9 14:39:18 2019
New Revision: 267772

URL: https://gcc.gnu.org/viewcvs?rev=267772&root=gcc&view=rev
Log:
PR target/84010

Added:
branches/gcc-8-branch/gcc/testsuite/gcc.target/sparc/tls-ld-int16.c
  - copied unchanged from r267771,
trunk/gcc/testsuite/gcc.target/sparc/tls-ld-int16.c
branches/gcc-8-branch/gcc/testsuite/gcc.target/sparc/tls-ld-int32.c
  - copied unchanged from r267771,
trunk/gcc/testsuite/gcc.target/sparc/tls-ld-int32.c
branches/gcc-8-branch/gcc/testsuite/gcc.target/sparc/tls-ld-int64.c
  - copied unchanged from r267771,
trunk/gcc/testsuite/gcc.target/sparc/tls-ld-int64.c
branches/gcc-8-branch/gcc/testsuite/gcc.target/sparc/tls-ld-int8.c
  - copied unchanged from r267771,
trunk/gcc/testsuite/gcc.target/sparc/tls-ld-int8.c
branches/gcc-8-branch/gcc/testsuite/gcc.target/sparc/tls-ld-uint16.c
  - copied unchanged from r267771,
trunk/gcc/testsuite/gcc.target/sparc/tls-ld-uint16.c
branches/gcc-8-branch/gcc/testsuite/gcc.target/sparc/tls-ld-uint32.c
  - copied unc

[Bug target/84010] problematic TLS code generation on 64-bit SPARC

2019-01-09 Thread ebotcazou at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84010

--- Comment #16 from Eric Botcazou  ---
Author: ebotcazou
Date: Wed Jan  9 14:41:55 2019
New Revision: 267773

URL: https://gcc.gnu.org/viewcvs?rev=267773&root=gcc&view=rev
Log:
PR target/84010

Added:
branches/gcc-7-branch/gcc/testsuite/gcc.target/sparc/tls-ld-int16.c
  - copied unchanged from r267771,
trunk/gcc/testsuite/gcc.target/sparc/tls-ld-int16.c
branches/gcc-7-branch/gcc/testsuite/gcc.target/sparc/tls-ld-int32.c
  - copied unchanged from r267771,
trunk/gcc/testsuite/gcc.target/sparc/tls-ld-int32.c
branches/gcc-7-branch/gcc/testsuite/gcc.target/sparc/tls-ld-int64.c
  - copied unchanged from r267771,
trunk/gcc/testsuite/gcc.target/sparc/tls-ld-int64.c
branches/gcc-7-branch/gcc/testsuite/gcc.target/sparc/tls-ld-int8.c
  - copied unchanged from r267771,
trunk/gcc/testsuite/gcc.target/sparc/tls-ld-int8.c
branches/gcc-7-branch/gcc/testsuite/gcc.target/sparc/tls-ld-uint16.c
  - copied unchanged from r267771,
trunk/gcc/testsuite/gcc.target/sparc/tls-ld-uint16.c
branches/gcc-7-branch/gcc/testsuite/gcc.target/sparc/tls-ld-uint32.c
  - copied unc

[Bug target/84010] problematic TLS code generation on 64-bit SPARC

2019-01-09 Thread jrtc27 at jrtc27 dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84010

--- Comment #17 from James Clarke  ---
Ah, great, thanks, that's indeed a nicer way of writing the patterns.

[Bug target/84010] problematic TLS code generation on 64-bit SPARC

2019-01-09 Thread ebotcazou at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84010

Eric Botcazou  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #18 from Eric Botcazou  ---
Fixed at last in upcoming 7.5, 8.3 and 9.x releases.

[Bug target/84010] problematic TLS code generation on 64-bit SPARC

2019-01-09 Thread ebotcazou at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84010

--- Comment #19 from Eric Botcazou  ---
> Ah, great, thanks, that's indeed a nicer way of writing the patterns.

You're welcome.  Don't hesitate to ping next time I drop the ball for so long.

[Bug target/84010] problematic TLS code generation on 64-bit SPARC

2019-01-09 Thread jrtc27 at jrtc27 dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84010

--- Comment #20 from James Clarke  ---
(In reply to Eric Botcazou from comment #19)
> > Ah, great, thanks, that's indeed a nicer way of writing the patterns.
> 
> You're welcome.  Don't hesitate to ping next time I drop the ball for so
> long.

I had forgotten myself that a fix was never committed, probably because I
remembered writing the patch, otherwise I would have pinged it long ago!

[Bug c/88769] Call to sin() optimized away, disregarding possible side-effect (errno)

2019-01-09 Thread egallager at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88769

Eric Gallager  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 CC||egallager at gcc dot gnu.org
 Resolution|--- |DUPLICATE

--- Comment #3 from Eric Gallager  ---
dup of bug 80042

*** This bug has been marked as a duplicate of bug 80042 ***

[Bug middle-end/80042] gcc thinks sin/cos don't set errno

2019-01-09 Thread egallager at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80042

Eric Gallager  changed:

   What|Removed |Added

 CC||per at pz dot se

--- Comment #6 from Eric Gallager  ---
*** Bug 88769 has been marked as a duplicate of this bug. ***

[Bug middle-end/87836] ICE in cc1 for gcc-6.5.0 with SPARC hardware

2019-01-09 Thread gary_mills at fastmail dot fm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87836

--- Comment #30 from Gary Mills  ---
A build of gcc-7 on SPARC just completed successfully with a much larger
configuration:

$
/export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/gcc-7.3.0/configure
CC=/usr/gcc/4.9/bin/gcc CXX=/usr/gcc/4.9/bin/g++
F77=/usr/gcc/4.9/bin/gfortran FC=/usr/gcc/4.9/bin/gfortran CFLAGS=-O2
-mno-app-regs LDFLAGS=-m32 PKG_CONFIG_PATH=/usr/lib/pkgconfig
--prefix=/usr/gcc/7 --mandir=/usr/gcc/7/share/man
--bindir=/usr/gcc/7/bin --sbindir=/usr/gcc/7/bin
--libdir=/usr/gcc/7/lib --libexecdir=/usr/gcc/7/lib
--with-pkgversion=OpenIndiana 7.3.0-OI-0
--with-bugurl=https://bugs.openindiana.org --enable-languages=c,c++
--without-gnu-ld --with-ld=/usr/bin/ld --without-gnu-as
--with-as=/usr/bin/as LDFLAGS=-R/usr/gcc/7/lib

There still was no ICE.  I'm going to try an even larger configuration next in
an attempt to identify which configuration setting causes the ICE.  I'm
suspicious of these three:

CONFIGURE_OPTIONS+= --host $(GNU_ARCH)
CONFIGURE_OPTIONS+= --build $(GNU_ARCH)
CONFIGURE_OPTIONS+= --target $(GNU_ARCH)

which are part of the OI Makefile.  Note the missing equal signs (=).  I only
noticed these a few days ago.  I'll include these at the very end of my
testing.

[Bug bootstrap/88450] [9 regression] ICE in stage 2 compiler while configuring libgcc

2019-01-09 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88450

--- Comment #17 from Jakub Jelinek  ---
Though, the more I look at it, the more I'm for reversion of the patch + deal
with it in the assign_stack_local caller that needs that.

[Bug bootstrap/88450] [9 regression] ICE in stage 2 compiler while configuring libgcc

2019-01-09 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88450

--- Comment #18 from Jakub Jelinek  ---
Created attachment 45391
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45391&action=edit
gcc9-pr88450.patch

Untested patch that does that.

[Bug middle-end/86979] [9 Regression] ICE: in maybe_record_trace_start, at dwarf2cfi.c:2348 with -m32 on darwin

2019-01-09 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86979

--- Comment #10 from Alexander Monakov  ---
As discussed with Andrew offline, the real problem is creating a path where
stack pointer is decremented twice - that is really not supposed to happen (so
the issue could appear even in absence of REG_ARGS_SIZE notes). We'll be having
another look to find the root cause.

[Bug libgcc/88772] Exception handling configured mode does not match the one finally used

2019-01-09 Thread ebotcazou at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88772

Eric Botcazou  changed:

   What|Removed |Added

 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2019-01-09
 CC||ebotcazou at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Eric Botcazou  ---
What's the result of the configure check of libgcc for SJLJ?  It should be
visible in the config.log file in the libgcc build directory:

whether the compiler is configured for setjmp/longjmp exceptions...

[Bug demangler/88539] A memory leak issue was discovered in cplus-dem.c

2019-01-09 Thread nickc at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88539

Nick Clifton  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||nickc at gcc dot gnu.org
 Resolution|--- |WONTFIX

--- Comment #3 from Nick Clifton  ---
Sorry, but a leak of 10 bytes is just not serious enough to be worth
worrying about.  Especially when these programs do not run continuously
but instead terminate shortly after they are invoked.

[Bug c++/88572] error: braces around scalar initializer - should be a warning

2019-01-09 Thread wjwray at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88572

--- Comment #13 from Will Wray  ---
Re-reviewing, I notice that the patch I posted in comment #9
now rejects nested empty-brace scalar init:

  int i{{}};

which was previously accepted. So we'll need a decision on this too.

Clang rejects with -pedantic-errors or warns otherwise:
   pedantic error / warning: too many braces around scalar initializer

MSVC rejects:
   error: 'initializing': cannot convert from 'initializer list' to 'int'
   note: Too many braces around initializer for 'int'

I reckon that Clang is right to reject under -pedantic, else accept and warn.

This Quora post comes to a similar conclusion:

https://www.quora.com/Is-double-braced-scalar-initialization-allowed-by-the-C-standard-int-x

>accepting {{}} for int seems like a harmless language extension.

[Bug tree-optimization/88771] [9 Regression] Misleading -Werror=array-bounds error

2019-01-09 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88771

--- Comment #4 from Martin Sebor  ---
The warning is triggered by the excessive size argument in the strncpy call. 
The excessive size makes the call invalid regardless of the values of the two
pointer arguments.

This happens both with the reduced test case in comment #0 and with the
translation unit and -m32.  The warning code just looks at the call:

  __builtin_strncpy (_65, buf_30, 4294967295);

I don't see much the warning code alone can do to handle this case.  We have
talked about at least two approaches to dealing with these invalid calls earlier.
Jeff's preference is to replace them with traps.  Others have suggested
replacing them with __builtin_unreachable().
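
To see why a bound of 4294967295 is invalid whatever the pointers are, the sketch below models the -m32 size_t with uint32_t. How the bound arose in the reporter's code is not shown in this thread; the wrapping "length minus one" is only one plausible way to get it and is an assumption, as are the function names.

```c
#include <stdint.h>

/* Hypothetical bound computation: "length minus one" wraps around to
   4294967295 when the length is zero, because the arithmetic is unsigned.  */
uint32_t bound_from_length(uint32_t len)
{
    return len - 1u;
}

/* No object can be larger than PTRDIFF_MAX bytes; with 32-bit pointers
   that is INT32_MAX (2147483647).  */
int exceeds_max_object_size32(uint32_t n)
{
    return n > (uint32_t)INT32_MAX;
}
```

Here bound_from_length(0) is 4294967295, and exceeds_max_object_size32 reports it as larger than any object could be, which is why the call can be diagnosed by looking at the size argument alone.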

[Bug tree-optimization/88771] [9 Regression] Misleading -Werror=array-bounds error

2019-01-09 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88771

--- Comment #5 from Martin Sebor  ---
That said, the size range in the warning output is wrong.  It should be just
4294967295.  The warning should probably also be changed to -Wstringop-overflow
which diagnoses both out-of-bounds writes and reads.  I can look into that.

[Bug c/88766] [9 Regression] Rejects valid? C code since r259641

2019-01-09 Thread joseph at codesourcery dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88766

--- Comment #2 from joseph at codesourcery dot com  ---
Yes, I think that (a) a statement expression is not an lvalue and (b) if 
it were (or if the code were changed to move the unary '&' inside the 
statement expression), the code would be taking the address of an object 
whose lifetime had ended by the time that address is used.
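
A small illustration of both points, using the GNU C statement-expression extension (so it needs GCC or a compatible compiler). The struct and function names are invented for the example; it shows one way to avoid the problem, not the code from the report.

```c
struct pair { int a, b; };

/* The value of the statement expression is copied into `p`, which lives
   until the function returns, so taking its address is fine.  Applying
   unary '&' directly to ({ ... }) is what gets rejected, since a statement
   expression is not an lvalue; and `tmp` itself is gone once the statement
   expression has been evaluated.  */
int sum_pair(int a, int b)
{
    struct pair p = ({ struct pair tmp = { a, b }; tmp; });
    struct pair *pp = &p;
    return pp->a + pp->b;
}
```

Calling sum_pair(2, 3) returns 5; the pointer is only ever formed to the named object p, never to the temporary inside the braces.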

[Bug tree-optimization/88763] Better Output for Loop Unswitching

2019-01-09 Thread dmalcolm at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88763

--- Comment #4 from David Malcolm  ---
(In reply to Marius Messerschmidt from comment #3)
> Sorry but I do not fully understand what you mean. Do you suggest using
> different command line arguments?

I believe Richard is referring to the internal API used for dumping; right now
it's presumably just writing to a FILE *, and this doesn't show up for
-fopt-info*.

> So far I tried:
> 
> -fdump-tree-all
> -fdump-tree-unswitch
> 
> and
> 
> -fopt-info-all-optall
> 
> But none of them told me all the things that I would wish to know, most
> important the reason why a particular loop was skipped during unswitching
> (e.g. because it is not invariant or so (right now it already reports a few
> things with -fdump-tree-unswitch like too-many-instructions or
> too-many-branches))

Am taking a look.

[Bug libgcc/88772] Exception handling configured mode does not match the one finally used

2019-01-09 Thread ylatuya at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88772

--- Comment #2 from Andoni  ---
(In reply to Eric Botcazou from comment #1)
> What's the result of the configure check of libgcc for SJLJ?  It should be
> visible in the config.log file in the libgcc build directory:
> 
> whether the compiler is configured for setjmp/longjmp exceptions...

I just wiped the build to start a clean build from scratch, but I remember
checking this and it was "no". I can confirm it in ~1 hour.
