[Bug fortran/88750] [9 Regression] runtime error in statically linked binaries
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88750

--- Comment #17 from Iain Sandoe ---
(In reply to Jürgen Reuter from comment #16)
> Yes, after the problem occurred, I did a completely clean new build of gmp,
> mpfr, mpc, gcc (configured with ../configure --prefix=/usr/local/
> --with-gmp=/usr/local/ --with-mpfr=/usr/local/ --with-mpc=/usr/local/
> --enable-checking=release --enable-languages=c,c++,fortran,lto),
> all the tools our software depends, and our software.

OK, FWIW (thinking a bit more last night) if you examine the logs from building GCC, you will see the same linker complaint in the log for building libstdc++.dylib. Which kinda reinforces the expectation that this is not the source of the problem.

However, I'm thinking to try and construct some small experiment to check that the newer ld64 doesn't do something active as well as complain.

> It turns out that
> external C++ libraries linked into our (Fortran) project via bind(C)

I might be wrong, but I suspect there was some change to the C binding around that time too. I also recall seeing a recent patch go by to fix a problem in that area (but I'm not sure if it's been applied yet). Will let Dominique comment on that.

> are not
> a problem if they have been built via libtool, such that a .dylib, a .a and
> a .la file are present. The two projects that have problem either exist as
> .dylib and .a produced by hand-written configure and makefiles (i.e. not
> using autotools), or only as dynamic libraries produced via cmake and make.

That's an interesting observation; what we need is to find the specific difference in the output exe.

* Narrowing this down by knowing where and what causes the problem will become important at some point, so a debug build and lldb session could be a useful next step.

* As a general rule, it's also useful to see if an -O0 build exhibits the problem, in case it's an optimisation issue.
[Bug middle-end/88758] [9 Regression] 186.crafty in SPEC CPU 2000 failed to build
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88758

--- Comment #5 from Martin Liška ---
What about this:

$ cat 11.i
void PreEvaluate(void);
int main() {
  PreEvaluate();
  return 0;
}

$ cat 22.i
extern int a[];
int b;
int c;

void PreEvaluate(void) {
  b = 0;
  for (; b < 8; b++)
    a[b] = c * (b > 0 ? b - 1 : 0);
}

$ gcc-8 11.i 22.i -flto -O3 -shared -fPIC
$ gcc 11.i 22.i -flto -O3 -shared -fPIC
during GIMPLE pass: dom
22.i: In function ‘PreEvaluate’:
22.i:5:6: internal compiler error: Segmentation fault
    5 | void PreEvaluate(void) {
      |      ^
0xc186df crash_signal
    /home/marxin/Programming/gcc/gcc/toplev.c:326
0x76d8910f ???
    /usr/src/debug/glibc-2.27-6.1.x86_64/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
0xeb6184 location_wrapper_p(tree_node const*)
    /home/marxin/Programming/gcc/gcc/tree.h:3807
0xeb6184 tree_strip_any_location_wrapper(tree_node*)
    /home/marxin/Programming/gcc/gcc/tree.h:3819
0xeb6184 initializer_each_zero_or_onep(tree_node const*)
    /home/marxin/Programming/gcc/gcc/tree.c:11239
0xeb6264 initializer_each_zero_or_onep(tree_node const*)
    /home/marxin/Programming/gcc/gcc/tree.c:11259
0x1083fcf gimple_simplify_MULT_EXPR
    /dev/shm/objdir/gcc/gimple-match.c:47953
0xfa636f gimple_simplify
    /dev/shm/objdir/gcc/gimple-match.c:90161
0xfa79a3 gimple_resimplify2(gimple**, gimple_match_op*, tree_node* (*)(tree_node*))
    /home/marxin/Programming/gcc/gcc/gimple-match-head.c:285
0x10bb1df gimple_simplify(gimple*, gimple_match_op*, gimple**, tree_node* (*)(tree_node*), tree_node* (*)(tree_node*))
    /home/marxin/Programming/gcc/gcc/gimple-match-head.c:895
0x98f334 fold_stmt_1
    /home/marxin/Programming/gcc/gcc/gimple-fold.c:4934
0xd2c566 dom_opt_dom_walker::optimize_stmt(basic_block_def*, gimple_stmt_iterator)
    /home/marxin/Programming/gcc/gcc/tree-ssa-dom.c:1967
0xd2db2c dom_opt_dom_walker::before_dom_children(basic_block_def*)
    /home/marxin/Programming/gcc/gcc/tree-ssa-dom.c:1468
0x13fd3a7 dom_walker::walk(basic_block_def*)
    /home/marxin/Programming/gcc/gcc/domwalk.c:353
0xd2e99d execute
    /home/marxin/Programming/gcc/gcc/tree-ssa-dom.c:706
[Bug c++/88752] ICE in enclosing_instantiation_of, at cp/pt.c:13328
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88752

Matthias Kretz changed:

What|Removed|Added
Attachment #45376 is obsolete|0|1

--- Comment #4 from Matthias Kretz ---
Created attachment 45385 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45385&action=edit
valid code test case

True, I made an error in the verification script. Better reduction attached.
[Bug tree-optimization/88739] [7/8/9 Regression] Big-endian union bug
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739

--- Comment #24 from rguenther at suse dot de ---
On Wed, 9 Jan 2019, dongjianqiang2 at huawei dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739
>
> --- Comment #23 from John Dong ---
> diff -urp a/gcc/expr.c b/gcc/expr.c
> --- a/gcc/expr.c    2019-01-09 03:19:03.750205982 +0800
> +++ b/gcc/expr.c    2019-01-09 03:38:23.414174738 +0800
> @@ -10760,6 +10760,16 @@ expand_expr_real_1 (tree exp, rtx target
>              && GET_MODE_CLASS (ext_mode) == MODE_INT)
>            reversep = TYPE_REVERSE_STORAGE_ORDER (type);
>
> +        int modePrecision = GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (tem)));
> +        int typePrecision = TYPE_PRECISION (TREE_TYPE (tem));
> +        int shiftSize = modePrecision - typePrecision;
> +        rtx regTarget = gen_reg_rtx (GET_MODE (op0));
> +
> +        if (shiftSize && REG_P (op0))
> +          op0 = expand_shift (LSHIFT_EXPR, GET_MODE (op0), op0,
> +                              shiftSize, regTarget,
> +                              TYPE_UNSIGNED (TREE_TYPE (tem)));
> +
>          op0 = extract_bit_field (op0, bitsize, bitpos, unsignedp,
>                                   (modifier == EXPAND_STACK_PARM
>                                    ? NULL_RTX : target),
>
> Tried to fix the bug when expand.

The bug is clearly in value-numbering, not RTL expansion.
[Bug rtl-optimization/49330] Integer arithmetic on addresses optimised with pointer arithmetic rules
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49330

--- Comment #19 from Richard Biener ---
(In reply to Richard Biener from comment #18)
> So for find_base_term to compute sth conservative we'd need to track
> RTX_SURELY_NON_POINTER (what RTX is surely _not_ based on a pointer
> and thus can be ignored). And when find_base_term ever figures
> two bases in say a PLUS it has to conservatively return 0.
>
> I fear the existing REG_POINTER does not help at all. For the testcase
> we have
>
> (plus:DI (reg:DI 83 [ d.0_2 ])
>     (symbol_ref:DI ("y") [flags 0x2] ))
>
> where reg:DI 83 is not marked with REG_POINTER and find_base_term
> doesn't find it to be an alternate base. For the testcase the
> offending MEM has a MEM_EXPR and we have proper points-to info.
>
> IMHO the proper solution is to kill base_alias_check or all problematic
> cases in find_base_term (binary ops with more than one non-CONST_INT
> operand).
>
> And eventually make sure to more properly preserve MEM_EXPRs.
>
> Maybe sth as "simple" as the following which of course fixes the
> testcase but will make find_base_term fail on any variable-indexed
> thing.
>
> diff --git a/gcc/alias.c b/gcc/alias.c
> index 93f53543d12..3a66e10b431 100644
> --- a/gcc/alias.c
> +++ b/gcc/alias.c
> @@ -2009,12 +2009,14 @@ find_base_term (rtx x, vec
>         rtx base = find_base_term (tmp1, visited_vals);
>         if (base != NULL_RTX
>             && ((REG_P (tmp1) && REG_POINTER (tmp1))
> -               || known_base_value_p (base)))
> +               || known_base_value_p (base))
> +           && CONST_INT_P (tmp2))
>           return base;
>         base = find_base_term (tmp2, visited_vals);
>         if (base != NULL_RTX
>             && ((REG_P (tmp2) && REG_POINTER (tmp2))
> -               || known_base_value_p (base)))
> +               || known_base_value_p (base))
> +           && CONST_INT_P (tmp1))
>           return base;
>
>         /* We could not determine which of the two operands was the

"Benchmarking" this by comparing cc1 with/without the patch shows a difference mostly in scheduling (but the number of differences is comparatively small!). Also overall text size shrinks with the patch (whatever that means).

On GIMPLE we try hard not to construct addresses "based" on the wrong object. In fact IVOPTs has code to avoid building IVs based on things like &a - &b, and propagation avoids turning uintptr_t arithmetic into pointer arithmetic even if it can see that the integers were converted from addresses. None of this can be done on RTL, since we lost the distinction between pointers and integers and there's only PLUS.

So I have a _very_ hard time seeing how RTL can ever be fixed to discover bases for alias analysis purposes without just resorting to MEM_EXPRs. That is, unless we want to live with this kind of wrong-code bug.

Similarly fishy is may_be_sp_based_p.
[Bug fortran/88750] [9 Regression] runtime error in statically linked binaries
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88750

--- Comment #18 from Iain Sandoe ---
(In reply to Jürgen Reuter from comment #14)

does the application use exceptions?

> This one is failing:
> gfortran -g -O2 -Wl,-rpath -Wl,/usr/local/packages/OpenLoops/lib -o
> static_1.exe .libs/static_1.exe_prclib_dispatcher.o
> /usr/local/lib/libstdc++.a

^^^ please confirm that this is from the "current compiler build".

> -L/usr/local/packages/gcc_9.0/_build/x86_64-apple-darwin18.2.0/libstdc++-v3/
> libsupc++/.libs -lm

note - no "-lSystem -lgcc_ext.10.5" (which is what I'd expect).

>
> while that one is working:
>
> gfortran -g -O2 -Wl,-rpath -Wl,/usr/local/packages/OpenLoops/lib -o
> static_1.exe .libs/static_1.exe_prclib_dispatcher.o
> libsupc++/.libs -lSystem -lgcc_ext.10.5 /usr/local//lib/libHepMC.a -lstdc++
> -llcio -lm

^^^ this looks like the build process in this case is adding libs that the compiler driver normally adds (they are not present in the case above).

* If you can, extract these two Fortran link lines and execute them separately in the build dir with "-v", so that we can see the output of the compiler driver's internal link line and what its search paths are.

* According to your posted otool output, the version of libstdc++.dylib that is bound is the one in /usr/local/lib/, which is where you pick up the static lib in the non-working case.

* The object files used to build the static (.a) and dynamic (.dylib) versions of libstdc++ are the same, so we really need to pin down where the issue occurs.

* DYLD_PRINT_LIBRARIES=1 DYLD_PRINT_BINDINGS=1 will show you which libraries are used, and from which library each symbol is resolved. It will probably produce a lot of output.
[Bug tree-optimization/87214] [9 Regression] r263772 miscompiled 520.omnetpp_r in SPEC CPU 2017
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87214

rsandifo at gcc dot gnu.org changed:

What|Removed|Added
Status|REOPENED|ASSIGNED
Assignee|unassigned at gcc dot gnu.org|rsandifo at gcc dot gnu.org

--- Comment #4 from rsandifo at gcc dot gnu.org ---
Mine then.
[Bug libstdc++/88204] New test case 26_numerics/complex/operators/more_constexpr.cc from r266416 fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88204

--- Comment #2 from Jonathan Wakely ---
Author: redi
Date: Wed Jan 9 09:37:34 2019
New Revision: 267757

URL: https://gcc.gnu.org/viewcvs?rev=267757&root=gcc&view=rev
Log:
PR libstdc++/88204 disable std::complex tests

The IBM128 long double format isn't foldable in constant expressions, so
conditionally skip the std::complex cases when they'll fail.

    PR libstdc++/88204
    * testsuite/26_numerics/complex/operators/more_constexpr.cc: Do not
    test std::complex if long double format is IBM128.
    * testsuite/26_numerics/complex/requirements/more_constexpr.cc:
    Likewise.

Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/testsuite/26_numerics/complex/operators/more_constexpr.cc
    trunk/libstdc++-v3/testsuite/26_numerics/complex/requirements/more_constexpr.cc
[Bug tree-optimization/88763] Better Output for Loop Unswitching
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88763

Richard Biener changed:

What|Removed|Added
Status|UNCONFIRMED|NEW
Last reconfirmed||2019-01-09
CC||dmalcolm at gcc dot gnu.org
Ever confirmed|0|1

--- Comment #1 from Richard Biener ---
I guess the logging should be switched to dump_* so that -fopt-info- can report these.
[Bug c++/88761] [8/9 Regression] ICE in tsubst_copy, at cp/pt.c:15478 when chaining lambda calls & fold-expressions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88761

Richard Biener changed:

What|Removed|Added
Priority|P3|P2
Known to work||7.3.1
Version|8.2.0|8.2.1
Target Milestone|9.0|8.3
Summary|[9 Regression] ICE in tsubst_copy, at cp/pt.c:15478 when chaining lambda calls & fold-expressions|[8/9 Regression] ICE in tsubst_copy, at cp/pt.c:15478 when chaining lambda calls & fold-expressions
[Bug middle-end/88758] [9 Regression] 186.crafty in SPEC CPU 2000 failed to build
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88758

Richard Biener changed:

What|Removed|Added
Priority|P3|P1
[Bug tree-optimization/88760] GCC unrolling is suboptimal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760

--- Comment #1 from Richard Biener ---
So LLVM unrolls 4 times while GCC (always) unrolls 8 times. The unrolled body for GCC (x86_64 this time) is

.L4:
        movl    (%rdx), %ecx
        vmovsd  (%rax), %xmm8
        addq    $32, %rdx
        addq    $64, %rax
        vmovsd  -56(%rax), %xmm9
        vmovsd  -48(%rax), %xmm10
        vfmadd231sd     (%rsi,%rcx,8), %xmm8, %xmm0
        movl    -28(%rdx), %ecx
        vmovsd  -40(%rax), %xmm11
        vmovsd  -32(%rax), %xmm12
        vfmadd231sd     (%rsi,%rcx,8), %xmm9, %xmm0
        movl    -24(%rdx), %ecx
        vmovsd  -24(%rax), %xmm13
        vmovsd  -16(%rax), %xmm14
        vfmadd231sd     (%rsi,%rcx,8), %xmm10, %xmm0
        movl    -20(%rdx), %ecx
        vmovsd  -8(%rax), %xmm15
        vfmadd231sd     (%rsi,%rcx,8), %xmm11, %xmm0
        movl    -16(%rdx), %ecx
        vfmadd231sd     (%rsi,%rcx,8), %xmm12, %xmm0
        movl    -12(%rdx), %ecx
        vfmadd231sd     (%rsi,%rcx,8), %xmm13, %xmm0
        movl    -8(%rdx), %ecx
        vfmadd231sd     (%rsi,%rcx,8), %xmm14, %xmm0
        movl    -4(%rdx), %ecx
        vfmadd231sd     (%rsi,%rcx,8), %xmm15, %xmm0
        cmpq    %rax, %r9
        jne     .L4

and what you quoted is the prologue. You didn't quote LLVM's prologue, but if I read my clang's output correctly it uses a loop there. (Is there something like -fdump-tree-optimized for clang?)

Our RTL unroller cannot do a loopy prologue, but it always has this jump-into peeled copies thing.

Using --param max-unroll-times=4 produces

.L4:
        movl    (%rdx), %ecx
        vmovsd  (%rax), %xmm2
        addq    $16, %rdx
        addq    $32, %rax
        vmovsd  -24(%rax), %xmm3
        vmovsd  -16(%rax), %xmm4
        vfmadd231sd     (%rsi,%rcx,8), %xmm2, %xmm0
        movl    -12(%rdx), %ecx
        vmovsd  -8(%rax), %xmm5
        vfmadd231sd     (%rsi,%rcx,8), %xmm3, %xmm0
        movl    -8(%rdx), %ecx
        vfmadd231sd     (%rsi,%rcx,8), %xmm4, %xmm0
        movl    -4(%rdx), %ecx
        vfmadd231sd     (%rsi,%rcx,8), %xmm5, %xmm0
        cmpq    %rax, %r8
        jne     .L4

which is nearly equivalent to clang's variant?
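A source-level C++ sketch of the two remainder strategies contrasted above. The loop shape (s += a[i] * x[idx[i]] with 32-bit indices) is inferred from the assembly. The function names are hypothetical, and this only illustrates the control flow, not what either compiler literally emits. dot_peeled_prologue mimics the "jump into peeled copies" prologue of GCC's RTL unroller. dot_loop_prologue handles the n % 4 leftover iterations with a small loop, which is how comment #1 reads clang's output.

```cpp
#include <cassert>
#include <cstddef>

// Reference loop: the shape inferred from the asm (32-bit index loads via
// movl, double loads, one FMA into a single accumulator per element).
double dot_ref(const double* a, const unsigned* idx, const double* x,
               std::size_t n) {
  double s = 0.0;
  for (std::size_t i = 0; i < n; ++i)
    s += a[i] * x[idx[i]];
  return s;
}

// RTL-unroller style: execute n % 4 iterations by jumping into a
// fall-through chain of peeled body copies, then run the 4x unrolled loop.
double dot_peeled_prologue(const double* a, const unsigned* idx,
                           const double* x, std::size_t n) {
  double s = 0.0;
  std::size_t i = 0;
  switch (n % 4) {
    case 3: s += a[i] * x[idx[i]]; ++i;  // fall through
    case 2: s += a[i] * x[idx[i]]; ++i;  // fall through
    case 1: s += a[i] * x[idx[i]]; ++i;  // fall through
    default: break;
  }
  assert((n - i) % 4 == 0);  // what remains is a whole number of unrolled trips
  for (; i < n; i += 4) {
    s += a[i] * x[idx[i]];
    s += a[i + 1] * x[idx[i + 1]];
    s += a[i + 2] * x[idx[i + 2]];
    s += a[i + 3] * x[idx[i + 3]];
  }
  return s;
}

// Loop-prologue style: a small rolled loop handles the n % 4 leftover
// iterations before the same 4x unrolled body.
double dot_loop_prologue(const double* a, const unsigned* idx,
                         const double* x, std::size_t n) {
  double s = 0.0;
  std::size_t i = 0;
  for (const std::size_t rem = n % 4; i < rem; ++i)
    s += a[i] * x[idx[i]];
  for (; i < n; i += 4) {
    s += a[i] * x[idx[i]];
    s += a[i + 1] * x[idx[i + 1]];
    s += a[i + 2] * x[idx[i + 2]];
    s += a[i + 3] * x[idx[i + 3]];
  }
  return s;
}
```

All three variants add the terms to one accumulator in the same order, so no floating-point reassociation is involved. Only the loop control around the unrolled body differs.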
[Bug rtl-optimization/88331] [9 Regression] ICE in rtl_verify_bb_layout, at cfgrtl.c:2987
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88331

--- Comment #15 from Jakub Jelinek ---
Author: jakub
Date: Wed Jan 9 10:16:10 2019
New Revision: 267758

URL: https://gcc.gnu.org/viewcvs?rev=267758&root=gcc&view=rev
Log:
    PR rtl-optimization/88331
    * function.c (assign_stack_local_1): Don't set dynamic_align_addr if
    not currently_expanding_to_rtl.

    * gcc.target/i386/pr88331.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr88331.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/function.c
    trunk/gcc/testsuite/ChangeLog
[Bug libstdc++/87855] std::optional only copy-constructible if T is trivially copy-constructible
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87855

--- Comment #22 from Jonathan Wakely ---
Author: redi
Date: Wed Jan 9 10:17:10 2019
New Revision: 267759

URL: https://gcc.gnu.org/viewcvs?rev=267759&root=gcc&view=rev
Log:
PR libstdc++/87855 fix optional for types with non-trivial copy/move

Backport both parts of the fix for PR libstdc++/87855, as well as a test
tweak from r263657 to avoid having to adjust dg-error line numbers.

    * testsuite/20_util/optional/cons/value_neg.cc: Change dg-error to
    dg-prune-output. Remove unused header.

    Backport from mainline
    2019-01-08 Jonathan Wakely

    When the contained value is not trivially copy (or move) constructible
    the union's copy (or move) constructor will be deleted, and so the
    _Optional_payload delegating constructors are invalid. G++ fails to
    diagnose this because it incorrectly performs copy elision in the
    delegating constructors. Clang does diagnose it (llvm.org/PR40245).

    The solution is to avoid performing any copy (or move) when the
    contained value's copy (or move) constructor isn't trivial. Instead the
    contained value can be constructed by calling _M_construct. This is OK,
    because the relevant constructor doesn't need to be constexpr when the
    contained value isn't trivially copy (or move) constructible.

    Additionally, this patch removes a lot of code duplication in the
    _Optional_payload partial specializations and the _Optional_base
    partial specialization, by hoisting it into common base classes.

    The Python pretty printer for std::optional needs to be adjusted to
    support the new layout. Retain support for the old layout, and add a
    test to verify that the support still works.

    PR libstdc++/87855
    * include/std/optional (_Optional_payload_base): New class template
    for common code hoisted from _Optional_payload specializations. Use
    a template for the union, to allow a partial specialization for
    types with non-trivial destructors. Add constructors for in-place
    initialization to the union.
    (_Optional_payload(bool, const _Optional_payload&)): Use _M_construct
    to perform non-trivial copy construction, instead of relying on
    non-standard copy elision in a delegating constructor.
    (_Optional_payload(bool, _Optional_payload&&)): Likewise for
    non-trivial move construction.
    (_Optional_payload): Derive from _Optional_payload_base and use it
    for everything except the non-trivial assignment operators, which are
    defined as needed.
    (_Optional_payload): Derive from the specialization
    _Optional_payload and add a destructor.
    (_Optional_base_impl::_M_destruct, _Optional_base_impl::_M_reset):
    Forward to corresponding members of _Optional_payload.
    (_Optional_base_impl::_M_is_engaged, _Optional_base_impl::_M_get):
    Hoist common members from _Optional_base.
    (_Optional_base): Make all members and base class public.
    (_Optional_base::_M_get, _Optional_base::_M_is_engaged): Move to
    _Optional_base_impl.
    * python/libstdcxx/v6/printers.py (StdExpOptionalPrinter): Add
    support for new std::optional layout.
    * testsuite/libstdc++-prettyprinters/compat.cc: New test.

    Backport from mainline
    2018-11-19 Ville Voutilainen

    PR libstdc++/87855
    Also implement P0602R4 (variant and optional
    should propagate copy/move triviality) for std::optional.
    * include/std/optional (_Optional_payload): Change
    the main constraints to check constructibility in
    addition to assignability.
    (operator=): Make constexpr.
    (_M_reset): Likewise.
    (_M_construct): Likewise.
    (operator->): Likewise.
    * testsuite/20_util/optional/assignment/8.cc: Adjust.
    * testsuite/20_util/optional/assignment/9.cc: New.

Added:
    branches/gcc-8-branch/libstdc++-v3/testsuite/20_util/optional/assignment/9.cc
Modified:
    branches/gcc-8-branch/libstdc++-v3/ChangeLog
    branches/gcc-8-branch/libstdc++-v3/include/std/optional
    branches/gcc-8-branch/libstdc++-v3/testsuite/20_util/optional/assignment/8.cc
    branches/gcc-8-branch/libstdc++-v3/testsuite/20_util/optional/cons/value_neg.cc
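A minimal C++17 sketch of the idea in the log above; it is not libstdc++'s actual code. A union whose member has a non-trivial copy constructor gets a deleted implicit copy constructor. The hypothetical Payload below therefore copies the contained value with placement new, in the spirit of _M_construct, instead of copying the union. The std::optional static_asserts assume a library that implements P0602R4, which this backport brings to libstdc++ for GCC 8.3.

```cpp
#include <cassert>
#include <new>
#include <optional>
#include <string>
#include <type_traits>
#include <utility>

// A union with a variant member whose copy constructor is non-trivial has
// its implicit copy constructor defined as deleted.
union StringSlot {
  StringSlot() {}
  ~StringSlot() {}
  std::string value;
};
static_assert(!std::is_copy_constructible<StringSlot>::value,
              "the union itself cannot be copied");

// Hypothetical minimal payload: the copy constructor never copies the union;
// it constructs the contained value in place when the source is engaged.
template <typename T>
struct Payload {
  union Storage {
    Storage() {}
    ~Storage() {}
    T value;
  } storage;
  bool engaged = false;

  Payload() = default;
  Payload(const Payload& other) {
    if (other.engaged)
      construct(other.storage.value);  // placement-new copy of the value only
  }
  Payload& operator=(const Payload&) = delete;  // assignment left out of the sketch
  ~Payload() { reset(); }

  template <typename... Args>
  void construct(Args&&... args) {
    ::new (static_cast<void*>(&storage.value)) T(std::forward<Args>(args)...);
    engaged = true;
  }

  void reset() {
    if (engaged) {
      storage.value.~T();
      engaged = false;
    }
  }

  const T& get() const {
    assert(engaged);  // only meaningful while a value is contained
    return storage.value;
  }
};

// With P0602R4, std::optional<T> is trivially copy constructible when T is,
// and still copy constructible when T's copy constructor is non-trivial.
static_assert(std::is_trivially_copy_constructible<std::optional<int>>::value, "");
static_assert(std::is_copy_constructible<std::optional<std::string>>::value, "");
static_assert(!std::is_trivially_copy_constructible<std::optional<std::string>>::value, "");
```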
[Bug libstdc++/87855] std::optional only copy-constructible if T is trivially copy-constructible
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87855

Jonathan Wakely changed:

What|Removed|Added
Status|ASSIGNED|RESOLVED
Resolution|---|FIXED
Target Milestone|---|8.3

--- Comment #21 from Jonathan Wakely ---
Also fixed for GCC 8.3.
[Bug target/88756] [nvptx, openacc] Override too many num_workers in nvptx plugin, instead of erroring out
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88756

--- Comment #2 from Tom de Vries ---
(In reply to Tom de Vries from comment #0)
> For the user, it's somewhat confusing that this passes with warning when
> compiling as C++, and fails to execute when compiling as C.
> I wonder why we don't do the
> same in the plugin, that is, override with warning.
>
> We would have the more acceptable difference of "compile with warning and
> run" vs "compile and run with warning".

Thomas, any comments from an OpenACC usability perspective?

Thanks,
- Tom
[Bug middle-end/88758] [9 Regression] 186.crafty in SPEC CPU 2000 failed to build
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88758

--- Comment #6 from Jakub Jelinek ---
Author: jakub
Date: Wed Jan 9 10:24:43 2019
New Revision: 267760

URL: https://gcc.gnu.org/viewcvs?rev=267760&root=gcc&view=rev
Log:
    PR middle-end/88758
    * tree.c (initializer_each_zero_or_onep) : Use
    vector_cst_elt instead of VECTOR_CST_ENCODED_ELT.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree.c
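One plausible reading of this fix follows; it is an inference and is not stated in the log. A VECTOR_CST stores only a compressed set of "encoded" elements. Walking all of a vector's elements with VECTOR_CST_ENCODED_ELT can therefore index past the encoded array, whereas vector_cst_elt can derive any element. The C++ model below is a simplified sketch of that encoding as described in gcc/vector-builder.h: npatterns interleaved patterns, each storing 1, 2 or 3 leading elements. The struct and function names are hypothetical, and plain longs stand in for tree constants.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a compressed vector constant.
struct EncodedVec {
  std::size_t nelts;              // full number of vector elements
  std::size_t npatterns;          // number of interleaved patterns
  std::size_t nelts_per_pattern;  // 1: {X, X, ...}  2: {X, Y, Y, ...}  3: {X, Y, Y+S, Y+2S, ...}
  std::vector<long> encoded;      // npatterns * nelts_per_pattern stored values
};

// Counterpart of an encoded-element accessor: only the stored values exist.
long encoded_elt(const EncodedVec& v, std::size_t i) {
  assert(i < v.encoded.size());  // indices past the encoding are invalid
  return v.encoded[i];
}

// Counterpart of a full-element accessor: derives any element 0..nelts-1.
long full_elt(const EncodedVec& v, std::size_t i) {
  assert(i < v.nelts);
  if (i < v.encoded.size())
    return v.encoded[i];
  const std::size_t pattern = i % v.npatterns;
  const std::size_t count = i / v.npatterns;  // position of i within its pattern
  const std::size_t last = (v.nelts_per_pattern - 1) * v.npatterns + pattern;
  const long final_val = v.encoded[last];
  if (v.nelts_per_pattern < 3)
    return final_val;  // 1 or 2 stored elements: the tail repeats the last one
  const long step = final_val - v.encoded[last - v.npatterns];
  return final_val + static_cast<long>(count - 2) * step;  // 3: linear series
}
```

For example, the series {0, 1, ..., 7} needs only three stored values ({0, 1, 2} with one pattern). A loop that runs to the full element count while indexing the encoded array would read five values that do not exist.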
[Bug c/88766] New: [9 Regression] Rejects valid? C code since r259641
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88766

            Bug ID: 88766
           Summary: [9 Regression] Rejects valid? C code since r259641
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Keywords: rejects-valid
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: marxin at gcc dot gnu.org
                CC: jakub at gcc dot gnu.org, jsm28 at gcc dot gnu.org
  Target Milestone: ---

The following code (reduced from gpg2) now fails to compile:

$ cat dns-stuff.i
struct dns_options {
  struct {
    void *a;
    int b;
  };
  int *socks_host;
  char *socks_user;
  char *socks_password;
};
static char tor_socks_user[1], tor_socks_password[1];
struct {
  int socks_host;
} libdns;
int d;
int *c();
int ax() {
  int *az;
  int ba;
  az = c((&__extension__({
           (struct dns_options){{0, 0}, 0, 0,
                                .socks_host = &libdns.socks_host,
                                .socks_user = tor_socks_user,
                                .socks_password = tor_socks_password};
         })),
         &ba);
  d = *az;
  return 0;
}

$ gcc dns-stuff.i
dns-stuff.i: In function ‘ax’:
dns-stuff.i:19:11: error: lvalue required as unary ‘&’ operand
   19 |   az = c((&__extension__({
      |           ^
[Bug rtl-optimization/88331] [9 Regression] ICE in rtl_verify_bb_layout, at cfgrtl.c:2987
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88331

Jakub Jelinek changed:

What|Removed|Added
Status|NEW|RESOLVED
Resolution|---|FIXED

--- Comment #16 from Jakub Jelinek ---
Fixed.
[Bug middle-end/88758] [9 Regression] 186.crafty in SPEC CPU 2000 failed to build
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88758

Jakub Jelinek changed:

What|Removed|Added
Status|ASSIGNED|RESOLVED
Resolution|---|FIXED

--- Comment #7 from Jakub Jelinek ---
Fixed. If you manage to turn the testcase into a form suitable for the testsuite, please commit it with this PR's number in the ChangeLog.
[Bug c/88766] [9 Regression] Rejects valid? C code since r259641
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88766

Richard Biener changed:

What|Removed|Added
Target Milestone|---|9.0
[Bug tree-optimization/88760] GCC unrolling is suboptimal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760

--- Comment #2 from ktkachov at gcc dot gnu.org ---
Created attachment 45386 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45386&action=edit
aarch64-llvm output with -Ofast -mcpu=cortex-a57

I'm attaching the full LLVM aarch64 output.

The output you quoted is with -funroll-loops. If that's not given, GCC doesn't seem to unroll by default at all (on aarch64 or x86_64 from my testing).

Is there anything we can do to make the default unrolling a bit more aggressive?
[Bug fortran/88750] [9 Regression] runtime error in statically linked binaries
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88750

--- Comment #19 from Jürgen Reuter ---
(In reply to Iain Sandoe from comment #18)
> (In reply to Jürgen Reuter from comment #14)
>
> does the application use exceptions?

No exceptions, only a poor man's C signal catcher.

>
> > /usr/local/lib/libstdc++.a
>
> ^^^ please confirm that this is from the "current compiler build".
>

Yes, they are the same. Unfortunately, there is no uninstall target for gcc, but all stdc++ libraries in /usr/local/lib are from my clean build on Jan 8.

>
> ^^^ this looks like the build process in this case is adding libs that
> the compiler driver normally adds ( they are not present in the case above ).
>

Yes, that is for a different reason (a different build with a tutorial C and C++ wrapper for our code), but they don't hurt here.

> * If you can extract these two fortran link lines - and then execute them
> separately in the build dir with "-v" so that we can see the output of the
> compiler-driver's internal link line and what its search paths are.

This is the output for the non-working linking:

$ gfortran -g -O2 -Wl,-rpath -Wl,/usr/local/packages/OpenLoops/lib -o static_1.exe .libs/static_1.exe_prclib_dispatcher.o -L/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/whizard-core -L/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src -L/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/hepmc -L/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/lcio -L/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/hoppet -L/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/looptools -L/usr/local/packages/OpenLoops/lib -L/usr/local/lib -L../src ./.libs/static_1_lib.a -L/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/models /Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/whizard-core/.libs/libwhizard_main.a /Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/.libs/libomega.a /Users/reuter/Physik/whizard/trunk/_build_quasi_naked/omega/src/.libs/libomega_core.a /Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/.libs/libwhizard.a /Users/reuter/Physik/whizard/trunk/_build_quasi_naked/vamp/src/.libs/libvamp.a /Users/reuter/Physik/whizard/trunk/_build_quasi_naked/circe1/src/.libs/libcirce1.a /Users/reuter/Physik/whizard/trunk/_build_quasi_naked/circe2/src/.libs/libcirce2.a -lcuttools -lopenloops -loneloop -lolcommon -lrambo /usr/local/lib/libLHAPDF.a /usr/local//lib/libHepMC.a -llcio /usr/local/lib/libstdc++.a -L/usr/local/packages/gcc_9.0/_build/x86_64-apple-darwin18.2.0/libstdc++-v3/src -L/usr/local/packages/gcc_9.0/_build/x86_64-apple-darwin18.2.0/libstdc++-v3/src/.libs -L/usr/local/packages/gcc_9.0/_build/x86_64-apple-darwin18.2.0/libstdc++-v3/libsupc++/.libs -lm -v
Driving: gfortran -g -O2 -Wl,-rpath -Wl,/usr/local/packages/OpenLoops/lib -o static_1.exe .libs/static_1.exe_prclib_dispatcher.o -L/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/whizard-core -L/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src -L/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/hepmc -L/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/lcio -L/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/hoppet -L/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/looptools -L/usr/local/packages/OpenLoops/lib -L/usr/local/lib -L../src ./.libs/static_1_lib.a -L/Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/models /Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/whizard-core/.libs/libwhizard_main.a /Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/.libs/libomega.a /Users/reuter/Physik/whizard/trunk/_build_quasi_naked/omega/src/.libs/libomega_core.a /Users/reuter/Physik/whizard/trunk/_build_quasi_naked/src/.libs/libwhizard.a /Users/reuter/Physik/whizard/trunk/_build_quasi_naked/vamp/src/.libs/libvamp.a /Users/reuter/Physik/whizard/trunk/_build_quasi_naked/circe1/src/.libs/libcirce1.a /Users/reuter/Physik/whizard/trunk/_build_quasi_naked/circe2/src/.libs/libcirce2.a -lcuttools -lopenloops -loneloop -lolcommon -lrambo /usr/local/lib/libLHAPDF.a /usr/local//lib/libHepMC.a -llcio /usr/local/lib/libstdc++.a -L/usr/local/packages/gcc_9.0/_build/x86_64-apple-darwin18.2.0/libstdc++-v3/src -L/usr/local/packages/gcc_9.0/_build/x86_64-apple-darwin18.2.0/libstdc++-v3/src/.libs -L/usr/local/packages/gcc_9.0/_build/x86_64-apple-darwin18.2.0/libstdc++-v3/libsupc++/.libs -lm -v -mmacosx-version-min=10.14.0 -asm_macosx_version_min=10.14 -l gfortran -shared-libgcc
Using built-in specs.
COLLECT_GCC=gfortran
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-apple-darwin18.2.0/9.0.0/lto-wrapper
Target: x86_64-apple-darwin18.2.0
Configured with: ../configure --prefix=/usr/local/ --with-gmp=/usr/local/ --with-mpfr=/usr/local/ --with-mpc=/usr/local/ --enable-checking=release --enable-languages=c,c++,fortran,lto
Thread model: posix
gcc version 9.0.0 20190107 (experimental) (GCC)
Reading specs from /usr/local/lib/gcc/x86_64-apple-darwin18.
[Bug tree-optimization/88760] GCC unrolling is suboptimal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760

--- Comment #3 from Richard Biener ---
(In reply to ktkachov from comment #2)
> Created attachment 45386 [details]
> aarch64-llvm output with -Ofast -mcpu=cortex-a57
>
> I'm attaching the full LLVM aarch64 output.
>
> The output you quoted is with -funroll-loops. If that's not given, GCC
> doesn't seem to unroll by default at all (on aarch64 or x86_64 from my
> testing).
>
> Is there anything we can do to make the default unrolling a bit more
> aggressive?

Well, the RTL loop unroller is not enabled by default at any optimization level (unless you are using FDO). There are also related flags not enabled (-fsplit-ivs-in-unroller and -fvariable-expansion-in-unroller).

The RTL loop unroller is simply not good at estimating the benefit of unrolling (which is also why you usually see it unrolling --param max-unroll-times times), and the tunables it has are not very well tuned across targets.

Micha did quite extensive benchmarking (on x86_64) which shows that the cases where unrolling is profitable are rare and the reason is often hard to understand. That's of course in the context of CPUs having caches of pre-decoded/fused/etc. instructions that optimize issue, which makes peeled prologues expensive, as well as even more specialized caches for small loops that avoid further frontend costs. Not sure if Arm architectures have any of this.

I generally don't believe in unrolling as a separately profitable transform. Rather, unrolling could be done as part of another transform (vectorization is the best example). For something still done on RTL that would then include scheduling, which is where the best cost estimates should be available (and if you do this post-reload then you even have a very good idea of register pressure).

This is also why I think a standalone unrolling phase belongs on RTL, since I don't see a good way of estimating cost/benefit on GIMPLE (see how difficult it is to cost vectorization vs. non-vectorization there).
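Here is a rough source-level picture of what variable expansion in an unrolled loop (-fvariable-expansion-in-unroller, mentioned above) amounts to. It is a C++ sketch, not GCC's RTL transformation, and the function name and the 4-way split are assumptions. Each unrolled copy accumulates into its own copy of the variable, which breaks the serial dependence on a single accumulator, and the copies are combined at the end. For floating point this reassociates the additions, so it is only valid where reassociation is permitted (for example under -ffast-math).

```cpp
#include <cassert>
#include <cstddef>

// Dot product unrolled 4x with "expanded" accumulators: each unrolled copy
// updates its own partial sum (hypothetical name, illustration only).
double dot_expanded(const double* a, const double* b, std::size_t n) {
  double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    s0 += a[i] * b[i];          // the four accumulation chains are
    s1 += a[i + 1] * b[i + 1];  // independent, so their additions
    s2 += a[i + 2] * b[i + 2];  // can overlap in the pipeline
    s3 += a[i + 3] * b[i + 3];
  }
  assert(n - i < 4);  // at most three iterations are left over
  for (; i < n; ++i)
    s0 += a[i] * b[i];
  // Combining the partial sums changes the order of the additions compared
  // with a single sequential accumulator.
  return (s0 + s1) + (s2 + s3);
}
```

With integer-valued inputs, as in a quick check, every intermediate sum is exact, so the reordering cannot show up. With general doubles the result may differ in the last bits from the sequential loop.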
[Bug fortran/88750] [9 Regression] runtime error in statically linked binaries
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88750

--- Comment #20 from Jürgen Reuter ---
Created attachment 45387 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45387&action=edit
DYLD_PRINT output non-working example

DYLD_PRINT_LIBRARIES=1 DYLD_PRINT_BINDINGS=1 ./static_1.exe > non_working_output 2>&1
[Bug fortran/88750] [9 Regression] runtime error in statically linked binaries
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88750

--- Comment #21 from Jürgen Reuter ---
Created attachment 45388 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45388&action=edit
DYLD_PRINT output working example

DYLD_PRINT_LIBRARIES=1 DYLD_PRINT_BINDINGS=1 ./static_1.exe > working_output 2>&1
[Bug tree-optimization/88767] New: 'unroll and jam' not optimizing some loops
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767 Bug ID: 88767 Summary: 'unroll and jam' not optimizing some loops Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: helijia at gcc dot gnu.org Target Milestone: --- The test source is as follows: __attribute__((noinline)) void calculate(const double* __restrict__ A, const double* __restrict__ B, double* __restrict__ C) { unsigned int l_m = 0; unsigned int l_n = 0; unsigned int l_k = 0; A = (const double*)__builtin_assume_aligned(A,16); B = (const double*)__builtin_assume_aligned(B,16); C = (double*)__builtin_assume_aligned(C,16); for ( l_n = 0; l_n < 9; l_n++ ) { // loop 1 for ( l_m = 0; l_m < 10; l_m++ ) { C[(l_n*10)+l_m] = 0.0; } // loop 2 for ( l_k = 0; l_k < 17; l_k++ ) { // loop 3 for ( l_m = 0; l_m < 10; l_m++ ) { // loop 4 C[(l_n*10)+l_m] += A[(l_k*20)+l_m] * B[(l_n*20)+l_k]; } } } } #define SIZE 36 double A[SIZE][SIZE] __attribute__((aligned(16))); double B[SIZE][SIZE] __attribute__((aligned(16))); double C[SIZE][SIZE] __attribute__((aligned(16))); int main() { long r, i, j; for (i=0; i < SIZE; i++) { for (j=0; j < SIZE; j++) { A[i][j] = 1.0; B[i][j] = 2.0; C[i][j] = 3.0; } } for (r=0; r < 100; r++) { calculate(&A[0][0],&B[0][0], &C[0][0]); } return 0; } First, I compile the test case with the following command. g++ unroll_jam_bug.cpp -O3 -funroll-loops -floop-unroll-and-jam -o unroll_jam_bug -fdump-tree-unrolljam-details. In the generated file of unroll_jam_bug.cpp.143t.unrolljam, I found that there is no unroll and jam optimization for the loop in the calculate function. Second, I added the -fdump-tree-all parameter to the command line. I found that the innermost loop(loop 3 and 4) is completely unrolled because pass_data_complete_unrolli pass thinks innermost loop is small. As the inner loop is fully expanded, the original loop becomes large. 
When pass_loop_jam considers the loop, it checks whether unroll_factor * (number of insns in the loop) > 200. If so, the optimization is abandoned; otherwise it proceeds.

Based on this analysis, I tried disabling the cunrolli pass with the following command line:

g++ unroll_jam_bug.cpp -O3 -mcpu=power8 -fdisable-tree-cunrolli -floop-unroll-and-jam -o unroll_jam_bug -fdump-tree-unrolljam-details

With this command, unroll-and-jam is performed, but there still seems to be room for optimization.

Original code:

for ( l_n = 0; l_n < 9; l_n++ ) {
  for ( l_m = 0; l_m < 10; l_m++ ) {
    C[(l_n*10)+l_m] = 0.0;
  }
  for ( l_k = 0; l_k < 17; l_k++ ) {
    for ( l_m = 0; l_m < 10; l_m++ ) {
      C[(l_n*10)+l_m] += A[(l_k*20)+l_m] * B[(l_n*20)+l_k];
    }
  }
}

After the unroll-and-jam pass:

for ( l_n = 0; l_n < 9; l_n++ ) {
  for ( l_m = 0; l_m < 10; l_m++ ) {
    C[(l_n*10)+l_m] = 0.0;
  }
  for ( l_k = 0; l_k < 17; l_k += 2 ) {
    for ( l_m = 0; l_m < 10; l_m++ ) {
      C[(l_n*10)+l_m] += A[(l_k*20)+l_m] * B[(l_n*20)+l_k];
      C[(l_n*10)+l_m] += A[(l_k*20 + 20)+l_m] * B[(l_n*20)+l_k + 1];
    }
  }
}
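To make the transformation concrete, here is a hand-written sketch of unroll-and-jam by a factor of 2 on the reporter's loop nest. It is an illustration under stated assumptions, not GCC's generated code: plain C99 with `restrict`, and the names `calculate_ref`, `calculate_jammed` and `jam_too_big` are hypothetical. Unlike the schematic "after" listing above, it includes the epilogue that an odd l_k trip count (17) requires, and the small size check only mirrors the "unroll_factor * insns > 200" rule described in the report.

```c
/* Reference version: the reporter's loop nest without alignment hints. */
void calculate_ref(const double *restrict A, const double *restrict B,
                   double *restrict C)
{
    for (unsigned n = 0; n < 9; n++) {
        for (unsigned m = 0; m < 10; m++)
            C[n * 10 + m] = 0.0;
        for (unsigned k = 0; k < 17; k++)
            for (unsigned m = 0; m < 10; m++)
                C[n * 10 + m] += A[k * 20 + m] * B[n * 20 + k];
    }
}

/* Unroll-and-jam by 2: the l_k loop is unrolled twice and the two copies
   of the inner l_m loop are fused ("jammed") into one loop body. */
void calculate_jammed(const double *restrict A, const double *restrict B,
                      double *restrict C)
{
    for (unsigned n = 0; n < 9; n++) {
        for (unsigned m = 0; m < 10; m++)
            C[n * 10 + m] = 0.0;
        unsigned k = 0;
        for (; k + 1 < 17; k += 2)           /* k = 0, 2, ..., 14 */
            for (unsigned m = 0; m < 10; m++) {
                C[n * 10 + m] += A[k * 20 + m] * B[n * 20 + k];
                C[n * 10 + m] += A[(k + 1) * 20 + m] * B[n * 20 + k + 1];
            }
        for (; k < 17; k++)                  /* epilogue: k == 16 only */
            for (unsigned m = 0; m < 10; m++)
                C[n * 10 + m] += A[k * 20 + m] * B[n * 20 + k];
    }
}

/* Hypothetical model of the size check from the report: give up when the
   unrolled body would exceed the limit (200 in the report). */
int jam_too_big(unsigned unroll_factor, unsigned ninsns, unsigned limit)
{
    return unroll_factor * ninsns > limit;
}
```

Because each C element still receives its products in increasing l_k order, both versions produce bit-identical results; the jammed version only changes how many times the inner loop is entered.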
[Bug fortran/88750] [9 Regression] runtime error in statically linked binaries
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88750 --- Comment #22 from Jürgen Reuter ---
This is the output from the lldb command (but this was not a debug build of gcc yet):

$ lldb ./static_1.exe
(lldb) target create "./static_1.exe"
Current executable set to './static_1.exe' (x86_64).
(lldb) run
Process 36799 launched: './static_1.exe' (x86_64)
static_1.exe(36799,0x1048f75c0) malloc: *** error for object 0x105c5eee0: pointer being freed was not allocated
static_1.exe(36799,0x1048f75c0) malloc: *** set a breakpoint in malloc_error_break to debug
Process 36799 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
    frame #0: 0x7fff5a2d023e libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill:
->  0x7fff5a2d023e <+10>: jae    0x7fff5a2d0248            ; <+20>
    0x7fff5a2d0240 <+12>: movq   %rax, %rdi
    0x7fff5a2d0243 <+15>: jmp    0x7fff5a2ca3b7            ; cerror_nocancel
    0x7fff5a2d0248 <+20>: retq
Target 0: (static_1.exe) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x7fff5a2d023e libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x7fff5a386c1c libsystem_pthread.dylib`pthread_kill + 285
    frame #2: 0x7fff5a2391c9 libsystem_c.dylib`abort + 127
    frame #3: 0x7fff5a3486e2 libsystem_malloc.dylib`malloc_vreport + 545
    frame #4: 0x7fff5a3484a3 libsystem_malloc.dylib`malloc_report + 152
    frame #5: 0x000100929c84 static_1.exe`std::locale::_Impl::~_Impl(this=0x000105c5f0a0) at locale.cc:243
    frame #6: 0x000100929d8e static_1.exe`std::locale::operator=(this=0x000105c611c0, __other=0x7ffeefbfdad8) at locale_classes.h:568
    frame #7: 0x000100927aec static_1.exe`std::ios_base::_M_init(this=0x000105c610f0) at ios_locale.cc:44
    frame #8: 0x00010096cef1 static_1.exe`std::basic_ios >::init(this=0x000105c610f0, __sb=0x000105c60840) at basic_ios.tcc:129
    frame #9: 0x000105afcdf9 libstdc++.6.dylib`std::ios_base::Init::Init() + 681
    frame #10: 0x000105ad30a0 libsio.2.12.dylib`_GLOBAL__sub_I_SIO_blockManager.cc + 16
    frame #11: 0x000104859cc8 dyld`ImageLoaderMachO::doModInitFunctions(ImageLoader::LinkContext const&) + 518
    frame #12: 0x000104859ec6 dyld`ImageLoaderMachO::doInitialization(ImageLoader::LinkContext const&) + 40
    frame #13: 0x0001048550da dyld`ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, char const*, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 358
    frame #14: 0x00010485506d dyld`ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, char const*, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 249
    frame #15: 0x00010485506d dyld`ImageLoader::recursiveInitialization(ImageLoader::LinkContext const&, unsigned int, char const*, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 249
    frame #16: 0x000104854254 dyld`ImageLoader::processInitializers(ImageLoader::LinkContext const&, unsigned int, ImageLoader::InitializerTimingList&, ImageLoader::UninitedUpwards&) + 134
    frame #17: 0x0001048542e8 dyld`ImageLoader::runInitializers(ImageLoader::LinkContext const&, ImageLoader::InitializerTimingList&) + 74
    frame #18: 0x000104843774 dyld`dyld::initializeMainExecutable() + 199
    frame #19: 0x00010484878f dyld`dyld::_main(macho_header const*, unsigned long, int, char const**, char const**, char const**, unsigned long*) + 6237
    frame #20: 0x0001048424f6 dyld`dyldbootstrap::start(macho_header const*, int, char const**, long, macho_header const*, unsigned long*) + 1154
    frame #21: 0x000104842036 dyld`_dyld_start + 54
[Bug rtl-optimization/49330] Integer arithmetic on addresses optimised with pointer arithmetic rules
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49330 --- Comment #20 from Richard Biener ---
For stage3/gcc/*.o, statistics show we perform 21051052 base_alias_check calls, and in 706852 of them base_alias_check is the check that disambiguates where nothing else would (measured as if we did base_alias_check last). Note there's also

  base = find_base_term (x_addr);
  if (base && (GET_CODE (base) == LABEL_REF
               || (GET_CODE (base) == SYMBOL_REF
                   && CONSTANT_POOL_ADDRESS_P (base))))
    return 0;

which is suspicious, but I guess harder to hit in practice in a way that makes things go wrong. base_alias_check is not exactly the first thing we check (but nearly), so we'd lose roughly 3% of the disambiguations from RTL alias analysis if we scrapped base_alias_check completely. That's probably too much. Note the CONSTANT_POOL_ADDRESS_P check isn't necessary and is subsumed by the following checks, so we could remove it without losing anything (it hits only 84 times in the above set, and later checks subsume it).
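As a quick check of the "roughly 3%" estimate, using only the two counts quoted in this comment:

```latex
\frac{706852}{21051052} \approx 0.0336 \approx 3.4\,\%
```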
[Bug rtl-optimization/49330] Integer arithmetic on addresses optimised with pointer arithmetic rules
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49330 --- Comment #21 from Richard Biener --- Created attachment 45389 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45389&action=edit statistic patch patch I added to record statistics
[Bug tree-optimization/88767] 'unroll and jam' not optimizing some loops
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2019-01-09
                 CC|                            |matz at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener ---
What's the room for improvement? Why's unrolling the innermost loop not profitable?
[Bug fortran/88750] [9 Regression] runtime error in statically linked binaries
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88750 --- Comment #23 from Iain Sandoe ---
(In reply to Jürgen Reuter from comment #22)
> This is the output from the lldb command (but this was not a debug build of
> gcc yet):
> $ lldb ./static_1.exe
> (lldb) target create "./static_1.exe"
> Current executable set to './static_1.exe' (x86_64).
> (lldb) run
> __sb=0x000105c60840) at basic_ios.tcc:129
>     frame #9: 0x000105afcdf9
> libstdc++.6.dylib`std::ios_base::Init::Init() + 681
>     frame #10: 0x000105ad30a0

So, you have a combination of things linking libstdc++ statically and dynamically... that seems fragile at best.

Having said that, the tricky thing now is to determine what has "broken" (it's probably going to be hard without a "before" and "after" case).
[Bug libstdc++/88204] New test case 26_numerics/complex/operators/more_constexpr.cc from r266416 fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88204 --- Comment #3 from Jonathan Wakely --- Fixed for GNU/Linux and AIX. Please reopen if it's still failing on Darwin.
[Bug fortran/88750] [9 Regression] runtime error in statically linked binaries
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88750 --- Comment #24 from Richard Biener ---
(In reply to Iain Sandoe from comment #23)
> (In reply to Jürgen Reuter from comment #22)
> > This is the output from the lldb command (but this was not a debug build of
> > gcc yet):
> > $ lldb ./static_1.exe
> > (lldb) target create "./static_1.exe"
> > Current executable set to './static_1.exe' (x86_64).
> > (lldb) run
> >
>
> > __sb=0x000105c60840) at basic_ios.tcc:129
> >     frame #9: 0x000105afcdf9
> > libstdc++.6.dylib`std::ios_base::Init::Init() + 681
> >     frame #10: 0x000105ad30a0
>
> so, you have a combination of things linking libstdc++ statically and
> dynamically .. that seems fragile at best.
>
> Having said that - the tricky thing now is to determine what has "broken"
> (it's probably going to be hard without a "before" and "after" case).

Indeed, somehow you didn't get a statically linked executable. Quoting the full final link command would be interesting.
[Bug fortran/88768] New: Derived type io in conjunction with allocatable component and recursion fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88768

Bug ID: 88768
Summary: Derived type io in conjunction with allocatable component and recursion fails
Product: gcc
Version: 8.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: fortran
Assignee: unassigned at gcc dot gnu.org
Reporter: mscfd at gmx dot net
Target Milestone: ---

This is a strange bug. It requires some kind of derived-type I/O (defined as "generic :: write(unformatted) => write_unformatted", but never used!), an allocatable component with dimension(:) (a "character(len=:), allocatable" component triggers the bug as well), and a recursive function. If the "write(unformatted)" part is commented out, the bug does not occur. Without recursion, the return value is fine (variable y). Also, if dimension(:) is omitted in the declaration of r, the bug disappears as well. The code either prints wrong values for z or segfaults. Valgrind reports an illegal memory read.

module mod
  implicit none
  private

  type, public :: t
    real, dimension(:), allocatable :: r
  contains
    procedure :: set
    generic :: assignment(=) => set
    procedure :: recurse
    generic :: write(unformatted) => write_unformatted
    procedure :: write_unformatted
  end type t

contains

  subroutine set(self, x)
    class(t), intent(out) :: self
    class(t), intent(in) :: x
    real, dimension(:), allocatable :: tmp
    if (allocated(x%r)) then
      ! make a local copy to avoid any aliasing issues
      tmp = x%r
      self%r = tmp
    end if
  end subroutine set

  recursive function recurse(self, i) result(x)
    type(t) :: x
    class(t), intent(in) :: self
    integer, intent(in) :: i
    if (i > 0) then
      x = self%recurse(i-1)
    else
      x = self
    end if
  end function recurse

  subroutine write_unformatted(dtv, unit, iostat, iomsg)
    class(t), intent(in) :: dtv
    integer, intent(in) :: unit
    integer, intent(out) :: iostat
    character(len=*), intent(inout) :: iomsg
    write(unit, iostat=iostat, iomsg=iomsg) 'unformatted'
  end subroutine write_unformatted

end module mod

program dt_io
  use mod
  implicit none
  type(t) :: x, y, z
  x%r = [1.23, 2.21]
  y = x%recurse(0)   ! fine
  z = x%recurse(1)   ! fails
  print *, x%r
  print *, y%r
  print *, z%r
end program dt_io
[Bug fortran/88750] [9 Regression] runtime error in statically linked binaries
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88750 --- Comment #25 from Jürgen Reuter ---
(In reply to Richard Biener from comment #24)
> (In reply to Iain Sandoe from comment #23)
> > (In reply to Jürgen Reuter from comment #22)
>
> Indeed - somehow you didn't get a statically linked executable. Quoting the
> full final link command would be interesting.

The full link commands can be found here, I believe:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88750#c14

Our code generates code for particle physics simulations in the form of dynamic libraries that get linked and loaded. For batch clusters, we attempted to provide static binaries for these simulations; however, there are on the order of 10-15 external libraries that can be linked to our code (some of them mandatory). Some of them exist only as dynamic libraries, so in those cases our approach cannot produce a purely static binary. The static libstdc++ is pulled in via the libtool link-mode flag -static-libtool-libs, while the dynamic libstdc++ is pulled in via the external C++ libraries that are available only as dynamic libraries.
[Bug sanitizer/88684] [7/8/9 Regression] Please make SANITIZER_NON_UNIQUE_TYPEINFO a runtime flag (or always true)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88684 --- Comment #8 from Martin Liška --- I created upstream patch candidate: https://reviews.llvm.org/D56485
[Bug sanitizer/88684] [7/8/9 Regression] Please make SANITIZER_NON_UNIQUE_TYPEINFO a runtime flag (or always true)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88684

Martin Liška  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |9.0
[Bug c/88766] [9 Regression] Rejects valid? C code since r259641
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88766 --- Comment #1 from Jakub Jelinek ---
Reduced testcase:

struct S { int s; };

void
foo (void)
{
  void *p = &(struct S) { 0 };
  void *q = &({ (struct S) { 0 }; });
}

The p initializer is accepted, q is rejected. By my reading this is invalid; C99 6.5.2.5/6 says:
"If the compound literal occurs outside the body of a function, the object has static storage duration; otherwise, it has automatic storage duration associated with the enclosing block."
and the statement expression is still a compound statement, so the compound literal is associated with the statement expression's block. So it is the same thing as:

void
bar (void)
{
  void *r = &({ int a = 0; a; });
}

which fails with the same diagnostics. Joseph, do you agree?
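The storage-duration rule quoted from C99 6.5.2.5/6 can be seen in a small standard C99 sketch. It deliberately avoids GNU statement expressions, so it only illustrates the rule itself, and the function names are hypothetical. A file-scope compound literal lives for the whole program, a literal in a function body lives until the end of that body, and a literal in an inner block ends at that block's closing brace, which is the situation Jakub argues applies to a compound literal inside ({ ... }).

```c
struct S { int s; };

/* File scope: static storage duration, so the pointer stays valid for
   the whole program run. */
static struct S *file_scope_ptr = &(struct S) { 42 };

int read_file_scope(void)
{
    return file_scope_ptr->s;
}

/* Function body: automatic storage tied to the function's outermost
   block, so p may be used anywhere in this body. */
int use_block_scope(void)
{
    struct S *p = &(struct S) { 0 };
    p->s = 7;
    return p->s;
}

/* Inner block: the literal's lifetime ends at the inner closing brace,
   so its value is read while still inside that block. Keeping a pointer
   to it past the brace would leave a dangling pointer. */
int inner_block_scope(void)
{
    int seen;
    {
        struct S *q = &(struct S) { 1 };
        seen = q->s;
    }
    return seen;
}
```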
[Bug tree-optimization/88763] Better Output for Loop Unswitching
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88763 --- Comment #2 from Marius Messerschmidt --- Sorry, but I do not fully understand what you mean. Do you suggest using different command-line arguments? So far I have tried -fdump-tree-all, -fdump-tree-unswitch and -fopt-info-all-optall, but none of them told me everything I would like to know, most importantly the reason why a particular loop was skipped during unswitching (e.g. because the condition is not invariant). Right now -fdump-tree-unswitch already reports a few reasons, such as too-many-instructions or too-many-branches.
[Bug rtl-optimization/88751] Performance regression reload vs lra
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88751 --- Comment #2 from Andreas Krebbel ---
(In reply to Richard Biener from comment #1)
...
> Would be interesting to know the sparseness of regs / BBs for your testcase
> at the point of LRA and whether compacting regs (do we ever do that?) might
> be a good idea in general. (we do compact BBs regularly)

Good point. Only 9352 of the 27089 pseudos appear to be actually referenced. Hence the following patch fixes the problem for me:

diff --git a/gcc/ira.c b/gcc/ira.c
index c8f2df43dd1..965819e1ef9 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -5157,6 +5157,7 @@ ira (FILE *f)
   int ira_max_point_before_emit;
   bool saved_flag_caller_saves = flag_caller_saves;
   enum ira_region saved_flag_ira_region = flag_ira_region;
+  int i, num_used_regs = 0;
 
   clear_bb_flags ();
 
@@ -5172,12 +5173,17 @@ ira (FILE *f)
 
   ira_conflicts_p = optimize > 0;
 
+  /* Determine the number of pseudos actually requiring coloring.  */
+  for (i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++)
+    num_used_regs += !!(DF_REG_USE_COUNT (i) + DF_REG_DEF_COUNT (i));
+
   /* If there are too many pseudos and/or basic blocks (e.g. 10K
      pseudos and 10K blocks or 100K pseudos and 1K blocks), we will
      use simplified and faster algorithms in LRA.  */
   lra_simple_p
     = (ira_use_lra_p
-       && max_reg_num () >= (1 << 26) / last_basic_block_for_fn (cfun));
+       && num_used_regs >= (1 << 26) / last_basic_block_for_fn (cfun));
+
   if (lra_simple_p)
     {
       /* It permits to skip live range splitting in LRA.  */
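Below is a stand-alone model of the heuristic the patch changes, useful for experimenting with the numbers. It is a hypothetical sketch, not IRA code: plain arrays stand in for the DF_REG_USE_COUNT/DF_REG_DEF_COUNT accessors, and the function names are invented for illustration. The comment does not give the testcase's basic-block count, so the 5000 used in the usage note is an assumed value.

```c
#include <stdbool.h>
#include <stddef.h>

/* Count pseudos with at least one use or def, in the spirit of the loop
   added by the patch; use_count/def_count model DF_REG_USE_COUNT and
   DF_REG_DEF_COUNT. */
size_t count_used_pseudos(const unsigned *use_count,
                          const unsigned *def_count,
                          size_t first_pseudo, size_t max_reg)
{
    size_t used = 0;
    for (size_t i = first_pseudo; i < max_reg; i++)
        used += (use_count[i] + def_count[i]) != 0;   /* same as !!(...) */
    return used;
}

/* The threshold test: use LRA's simplified, faster algorithms when the
   register count reaches 2^26 / number of basic blocks. */
bool use_simplified_lra(size_t num_regs, size_t num_basic_blocks)
{
    return num_regs >= ((size_t)1 << 26) / num_basic_blocks;
}
```

With an assumed 5000 blocks the threshold is 2^26 / 5000 = 13421, so counting all 27089 pseudos selects the simplified algorithms, while counting only the 9352 referenced ones does not.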
[Bug rtl-optimization/49330] Integer arithmetic on addresses optimised with pointer arithmetic rules
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49330 --- Comment #22 from Richard Biener ---
Things we fail to disambiguate are

  (mem:TF (pre_dec:SI (reg/f:SI 7 sp)) [0 S16 A8])
vs.
  (mem/c:TF (plus:SI (reg/f:SI 19 frame)
        (const_int -16 [0xfff0])) [1 S16 A128])

or

  (mem:SI (pre_dec:SI (reg/f:SI 7 sp)) [3 S4 A32])
vs.
  (mem/f/c:SI (symbol_ref:SI ("argv") [flags 0x2] ) [2 argv+0 S4 A32])

where I can't find anything besides the CSELIB cselib_sp_based_value_p handling in find_base_term that could be the one handling it. I guess we should somehow be able to handle both sp-based and frame-based accesses in a more conservative way?
[Bug tree-optimization/88739] [7/8/9 Regression] Big-endian union bug
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739 --- Comment #25 from Wilco ---
(In reply to rguent...@suse.de from comment #17)
> On Tue, 8 Jan 2019, wilco at gcc dot gnu.org wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739
> >
> > --- Comment #16 from Wilco ---
> > I think we need to simplify the many BIG_ENDIAN macros so it is feasible to
> > get
> > big-endian to work reliably on all targets. There seem to be far too many
> > options which affect too many unrelated things. Big-endian is fundamentally
> > about memory byte ordering, so allowing to different byte/bit orderings in
> > registers just makes things overly complex without any benefit.
>
> It's unfortunately not the compiler writers choice but the CPU designers.

It's more a bad ABI choice. The initial Arm ABI had 4-byte aligned little-endian long long and big-endian doubles! ARM2 only supported little-endian, so it didn't matter at the time. However, it doesn't allow unaligned accesses, tightly packed bitfields and runtime endian swapping as required by the embedded space, or hardware floating point. No surprise it was replaced by the Arm EABI.
[Bug tree-optimization/69196] [5 Regression] code size regression with jump threading at -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69196 --- Comment #29 from Sebastian Huber ---
Just for reference, some numbers for GCC 7.4.0 and GCC 9.0.0 20190104:

sparc-rtems5-gcc --version
sparc-rtems5-gcc (GCC) 7.4.0 20181206 (RTEMS 5, RSB ddba5372522da341fa20b2c75dfe966231cb6790, Newlib df6915f029ac9acd2b479ea898388cbd7dda4974)
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

sparc-rtems5-gcc -c -O2 -o vprintk.7.4.0.o vprintk.i

sparc-rtems6-gcc --version
sparc-rtems6-gcc (GCC) 9.0.0 20190104 (RTEMS 6, RSB cd4a4f61ea5bbd4236f7717a94cd5e67f8b3ad20, Newlib 34d9bb709390b14b4ed0b1ea2656bf6bf5a055c3)
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

sparc-rtems6-gcc -c -O2 -o vprintk.9.0.0.o vprintk.i

size *.o
   text    data     bss     dec     hex filename
    688       0       0     688     2b0 vprintk.4.9.4.o
   1272       0       0    1272     4f8 vprintk.6.0.0.o
    933       0       0     933     3a5 vprintk.7.4.0.o
    825       0       0     825     339 vprintk.9.0.0.o

It seems the code size is quite volatile for this test case.
[Bug tree-optimization/88739] [7/8/9 Regression] Big-endian union bug
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739 --- Comment #26 from Richard Biener ---
(In reply to Wilco from comment #25)
> (In reply to rguent...@suse.de from comment #17)
> > On Tue, 8 Jan 2019, wilco at gcc dot gnu.org wrote:
> >
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739
> > >
> > > --- Comment #16 from Wilco ---
> > > I think we need to simplify the many BIG_ENDIAN macros so it is feasible
> > > to get
> > > big-endian to work reliably on all targets. There seem to be far too many
> > > options which affect too many unrelated things. Big-endian is
> > > fundamentally
> > > about memory byte ordering, so allowing to different byte/bit orderings in
> > > registers just makes things overly complex without any benefit.
> >
> > It's unfortunately not the compiler writers choice but the CPU designers.
>
> It's more a bad ABI choice. The initial Arm ABI had 4-byte aligned
> little-endian long long and big-endian doubles! ARM2 only supported
> little-endian so it didn't matter at the time. However it doesn't allow
> unaligned accesses, tightly packed bitfields and runtime endian swapping as
> required by the embedded space, or hardware floating point. No surprise it
> was replaced by the Arm EABI.

Whatever ;)

Did anybody test the patch? Testing on x86_64 will be quite pointless...
[Bug tree-optimization/88739] [7/8/9 Regression] Big-endian union bug
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739 --- Comment #27 from Wilco ---
(In reply to Eric Botcazou from comment #22)
> > Is it really pure RTL, therefore not used in tree? So the above patch using
> > BITS_BIG_ENDIAN for tree stuff would be incorrect to use it?
>
> I wouldn't say incorrect, just inappropriate and unnecessary. And, yes, it
> isn't used at the tree level and should stay so IMO. BYTES_BIG_ENDIAN alone
> already implicitly enforces a numbering on bits.

I mean incorrect as in the optimization would still trigger and give incorrect results if BITS_BIG_ENDIAN == BYTES_BIG_ENDIAN (given that BITS_BIG_ENDIAN has no bearing on the bitfield offsets used at the tree level).
[Bug rtl-optimization/88769] New: Call to sin() optimized away, disregarding possible side-effect (errno)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88769

Bug ID: 88769
Summary: Call to sin() optimized away, disregarding possible side-effect (errno)
Product: gcc
Version: 7.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: per at pz dot se
Target Milestone: ---

(This is my first GCC bug report, so please be patient with me...)

Test program:

#include <math.h>

void foo(float x)
{
    sin(x);
}

When compiling with -O1 (or higher), the call to sin() is optimized away:

        .file   "test.c"
        .text
        .globl  foo
        .type   foo, @function
foo:
.LFB0:
        .cfi_startproc
        rep ret
        .cfi_endproc
.LFE0:
        .size   foo, .-foo
        .ident  "GCC: (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0"
        .section        .note.GNU-stack,"",@progbits

However, the sin() call has a possible side effect: according to the glibc documentation, it sets errno to EDOM if the argument is an infinity. Setting errno from math functions can be disabled with -fno-math-errno, but according to the GCC documentation it is enabled by default, and compiling with -fmath-errno makes no difference. The behavior is the same with GCC 8.2 and trunk (via godbolt.org). clang 6.0 does not optimize away the sin() call, except when invoked with -fno-math-errno.

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.3.0-27ubuntu1~18.04' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 7.3.0 (Ubuntu 7.3.0-27ubuntu1~18.04)
[Bug fortran/88750] [9 Regression] runtime error in statically linked binaries
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88750 --- Comment #26 from Iain Sandoe ---
(In reply to Jürgen Reuter from comment #25)
> (In reply to Richard Biener from comment #24)
> > (In reply to Iain Sandoe from comment #23)
> > > (In reply to Jürgen Reuter from comment #22)
> >
> > Indeed - somehow you didn't get a statically linked executable. Quoting the
> > full final link command would be interesting.
>
> The full link commands can be found here, I believe:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88750#c14
>
> Our code generates code for particle physics simulations in the form of
> dynamic libraries that get linked and loaded. For batch clusters, we
> attempted to provide static binaries for these simulations, however, we have
> order 10-15 external libraries that can be linked to our code (which are
> partially mandatory). There are some of them which only exist as dynamic
> libraries, so there our approach cannot result in a purely static binary.
> The static stdc++ library is sucked in via the libtool link mode/flag
> -static-libtool-libs while the dynamic ones are sucked in via the external
> C++ libraries that are available only dynamically.

So... I appreciate it can be difficult with a sophisticated project. However, it would seem prudent to arrange to have only one instance of the C++ library. Imagine creating an object in one instance, and that object somehow finding its way to being destroyed in a different one.

I've spent some time trying to make it possible to link GCC Darwin projects 'statically' (modulo libSystem, which must be dynamic), but that's only going to work if all the project's dependent libs are available as convenience libs (or, I suppose, if none of the dynamic ones used has any external deps other than libSystem). If that's not possible, then it's most likely better to do a link -r on everything that is available as a convenience lib... and then link the result with -lstdc++.
It might be that it worked before mostly by luck, although I'd still like to have a reference for a known "working" statically linked case. As the C++ library grows, this is only going to become more fragile.
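The hazard Iain describes, an object created by one copy of a library and destroyed by another, can be illustrated with a deliberately simplified toy model. This is a hypothetical sketch, not libstdc++ or Darwin malloc code: each `lib_instance` keeps its own private registry of allocations, standing in for the per-copy state that a statically linked libstdc++ and libstdc++.6.dylib each carry. Handing an object to the copy that did not create it is rejected, loosely analogous to the "pointer being freed was not allocated" abort in comment #22.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_LIVE 16

/* Toy "library instance": owns a private list of live allocations. */
struct lib_instance {
    void *live[MAX_LIVE];
    size_t count;
};

void *inst_alloc(struct lib_instance *lib, size_t size)
{
    if (lib->count == MAX_LIVE)
        return NULL;
    void *p = malloc(size);
    if (p != NULL)
        lib->live[lib->count++] = p;
    return p;
}

/* Frees p only if this instance allocated it. Returns false otherwise,
   the toy equivalent of "pointer being freed was not allocated". */
bool inst_free(struct lib_instance *lib, void *p)
{
    for (size_t i = 0; i < lib->count; i++) {
        if (lib->live[i] == p) {
            lib->live[i] = lib->live[--lib->count];
            free(p);
            return true;
        }
    }
    return false;
}
```

In the real process a mismatch like this aborts rather than returning false. The model only shows why two independent copies of the same runtime state should not share objects.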
[Bug rtl-optimization/88770] New: Redundant load opt. or CSE pessimizes code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88770

Bug ID: 88770
Summary: Redundant load opt. or CSE pessimizes code
Product: gcc
Version: 8.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: bisqwit at iki dot fi
Target Milestone: ---

For this code (-xc -std=c99 or -xc++ -std=c++17):

struct guu { int a; int b; float c; char d; };

extern void test(struct guu);

void caller()
{
    test( (struct guu){.a = 3, .b = 5, .c = 7, .d = 9} );
    test( (struct guu){.a = 3, .b = 5, .c = 7, .d = 9} );
}

CSE (or some other form of redundant-load optimization) pessimizes the code. The problem occurs at optimization levels -O1 and higher, including -Os.

If the function "caller" calls test() just once, the resulting code is (-O3 -fno-optimize-sibling-calls; stack alignment/push/pops omitted for brevity):

    movabs  rdi, 21474836483
    movabs  rsi, 39743127552
    call    test

If "caller" calls test() twice, the code is a lot longer, not just twice as long (stack alignment/push/pops omitted for brevity):

    movabs  rbp, 21474836483
    mov     rdi, rbp
    movabs  rbx, 38654705664
    mov     rsi, rbx
    or      rbx, 1088421888
    or      rsi, 1088421888
    call    test
    mov     rsi, rbx
    mov     rdi, rbp
    call    test

If we change caller() so that the arguments in the two calls are not identical:

void caller()
{
    test( (struct guu){.a = 3, .b = 5, .c = 7, .d = 9} );
    test( (struct guu){.a = 3, .b = 6, .c = 7, .d = 10} );
}

the generated code is optimal again, as expected:

    movabs  rdi, 21474836483
    movabs  rsi, 39743127552
    call    test
    movabs  rdi, 25769803779
    movabs  rsi, 44038094848
    call    test

The problem in the first example is that the compiler sees that the same argument is used twice and tries to keep it in callee-saved registers in order to reuse the values for the second call. However, re-initializing the registers from scratch would have been more efficient.

The problem occurs in GCC 4.8.1 and newer.
It does not occur in GCC version 4.7.4, which generated different code that is otherwise inefficient. For reference, the problem also exists in Clang versions 3.5 and newer, but not in versions 3.4 and earlier.
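The large movabs constants above are just the two eightbytes of `struct guu` that the x86-64 System V ABI passes in rdi and rsi. The sketch below recomputes them. It assumes IEEE-754 binary32 `float`, 32-bit `int`, and zero padding after `d` (as the compiler's constants suggest); the helper names `guu_lo` and `guu_hi` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

struct guu { int a; int b; float c; char d; };

/* First eightbyte (rdi): a in the low 32 bits, b in the high 32 bits. */
uint64_t guu_lo(const struct guu *g)
{
    return (uint64_t)(uint32_t)g->a | ((uint64_t)(uint32_t)g->b << 32);
}

/* Second eightbyte (rsi): the bit pattern of c in the low 32 bits and d
   in the next byte; the remaining padding bytes are taken to be zero. */
uint64_t guu_hi(const struct guu *g)
{
    uint32_t cbits;
    memcpy(&cbits, &g->c, sizeof cbits);   /* 7.0f -> 0x40E00000 */
    return (uint64_t)cbits | ((uint64_t)(unsigned char)g->d << 32);
}
```

For the first call this gives 3 + (5 << 32) = 21474836483 and 0x40E00000 + (9 << 32) = 39743127552. The 38654705664 in the two-call version is just 9 << 32, before the `or` with 1088421888 (0x40E00000, the bits of 7.0f).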
[Bug tree-optimization/88739] [7/8/9 Regression] Big-endian union bug
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739 --- Comment #28 from Richard Biener ---
(In reply to Wilco from comment #27)
> (In reply to Eric Botcazou from comment #22)
> > > Is it really pure RTL, therefore not used in tree? So the above patch
> > > using
> > > BITS_BIG_ENDIAN for tree stuff would be incorrect to use it?
> >
> > I wouldn't say incorrect, just inappropriate and unnecessary. And, yes, it
> > isn't used at the tree level and should stay so IMO. BYTES_BIG_ENDIAN alone
> > already implicitly enforces a numbering on bits.
>
> I mean incorrect as in the optimization would still trigger and give
> incorrect results if BITS_BIG_ENDIAN == BYTES_BIG_ENDIAN (given that
> BITS_BIG_ENDIAN has no bearing on the bitfield offsets used on tree level).

Given that it matters for

      /* If I2 is setting a pseudo to a constant and I3 is setting some
         sub-part of it to another constant, merge them by making a new
         constant.  */
      if (i1 == 0
...
          if (GET_CODE (dest) == ZERO_EXTRACT)
            {
...
              if (BITS_BIG_ENDIAN)
                offset = GET_MODE_PRECISION (dest_mode) - width - offset;

and VN tries to do something similar, I wonder whether it matters after all...

That said, the docs also refer to 'bit-field instructions' but do not elaborate further; I guess zero_extract is such, but I'd have guessed BIT_FIELD_REF (on trees) is as well. But yes, RTL expansion adjusts things based on BITS_BIG_ENDIAN, so it looks like GENERIC doesn't care (or assumes BITS_BIG_ENDIAN == BYTES_BIG_ENDIAN).
[Bug tree-optimization/88739] [7/8/9 Regression] Big-endian union bug
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739 --- Comment #29 from Wilco ---
(In reply to Richard Biener from comment #26)

> Did anybody test the patch? Testing on x86_64 will be quite pointless...

Well, that generates _18 = BIT_FIELD_REF <_2, 16, 14>; and becomes:

    ubfx    x1, x20, 2, 16

This extracts bits 2-17 of the 30-bit value instead of bits 14-29. The issue is that we're using a bitfield reference on a value that is claimed not to be a bitfield in comment 6. So I can't see how using BIT_FIELD_REF could ever work correctly.
[Bug tree-optimization/88739] [7/8/9 Regression] Big-endian union bug
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739 --- Comment #30 from Eric Botcazou ---
> That said, the docs also refer to 'bit-field instructions' but do not
> elaborate further -- I guess zero_extract is such but I'd have guessed
> BIT_FIELD_REF (on trees) is as well. But yes, RTL expansion adjusts
> things based on BITS_BIG_ENDIAN so it looks like GENERIC doesn't care
> (or assumes BITS_BIG_ENDIAN == BYTES_BIG_ENDIAN).

Yes, BYTES_BIG_ENDIAN is implicitly propagated to bits at the tree level. I don't think that we want to support BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN at the tree level; that would be a nightmare.
[Bug tree-optimization/88760] GCC unrolling is suboptimal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760

Wilco  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilco at gcc dot gnu.org

--- Comment #4 from Wilco ---
(In reply to ktkachov from comment #2)
> Created attachment 45386
> aarch64-llvm output with -Ofast -mcpu=cortex-a57
>
> I'm attaching the full LLVM aarch64 output.
>
> The output you quoted is with -funroll-loops. If that's not given, GCC
> doesn't seem to unroll by default at all (on aarch64 or x86_64 from my
> testing).
>
> Is there anything we can do to make the default unrolling a bit more
> aggressive?

I don't think the RTL unroller works at all. It doesn't have the right settings and doesn't understand how to unroll, so we always get inefficient and bloated code.

To do unrolling correctly, it has to be integrated at the tree level; for example, when vectorization isn't possible or beneficial, unrolling might still be a good idea.
[Bug tree-optimization/88767] 'unroll and jam' not optimizing some loops
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767 Bill Schmidt changed: What|Removed |Added Status|WAITING |UNCONFIRMED Ever confirmed|1 |0 --- Comment #2 from Bill Schmidt --- Hi Richard -- This was reported to us internally. The performance of this test case on a P8 server indicates that disabling complete unrolling and applying unroll-and-jam could produce about a 1.5x speedup. I am going to have our performance team verify that this is the case using just the options that Li Jia used; the original report modified the source to provide the results of unroll-and-jam since the reporter didn't know how to disable cunrolli. I'll post the results here when we have them.
[Bug tree-optimization/88763] Better Output for Loop Unswitching
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88763 --- Comment #3 from Marius Messerschmidt --- Sorry, but I do not fully understand what you mean. Do you suggest using different command-line arguments? So far I have tried -fdump-tree-all, -fdump-tree-unswitch and -fopt-info-all-optall, but none of them told me all the things I would like to know, most importantly the reason why a particular loop was skipped during unswitching (e.g. because the condition is not invariant). Right now -fdump-tree-unswitch already reports a few reasons, such as too-many-instructions or too-many-branches.
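For context on the question above: unswitching only applies when the condition tested inside the loop is loop-invariant, which is why "not invariant" is one of the skip reasons being asked for. A hand-written C sketch of the transformation (not GCC's actual output; the function names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Before unswitching: FLAG never changes inside the loop, yet it is
   tested on every iteration. */
long sum_before(const int *a, size_t n, int flag)
{
    assert(a != NULL || n == 0);
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        if (flag)
            s += a[i];
        else
            s -= a[i];
    }
    return s;
}

/* After unswitching: the invariant test is hoisted out and the loop is
   duplicated, one copy per outcome of the test. */
long sum_after(const int *a, size_t n, int flag)
{
    assert(a != NULL || n == 0);
    long s = 0;
    if (flag) {
        for (size_t i = 0; i < n; i++)
            s += a[i];
    } else {
        for (size_t i = 0; i < n; i++)
            s -= a[i];
    }
    return s;
}
```

If the condition depended on `i` or on memory written in the loop, it would not be invariant and this hoisting would not be valid.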
[Bug tree-optimization/88767] 'unroll and jam' not optimizing some loops
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767 --- Comment #3 from Michael Matz --- I don't see anything to improve either (as far as unroll-and-jam is concerned). It's quite possible that cunrolli is harming more than helping in this case, but with it disabled it seems the code is as it should be. So, please state what you want to see changed: unroll-and-jam or cunrolli?
[Bug tree-optimization/88739] [7/8/9 Regression] Big-endian union bug
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739 --- Comment #31 from rguenther at suse dot de --- On Wed, 9 Jan 2019, wilco at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739 > > --- Comment #29 from Wilco --- > (In reply to Richard Biener from comment #26) > > > Did anybody test the patch? Testing on x86_64 will be quite pointless... > > Well that generates _18 = BIT_FIELD_REF <_2, 16, 14>; and becomes: > > ubfx x1, x20, 2, 16 > > This extracts bits 2-17 of the 30-bit value instead of bits 14-29. The issue > is > that we're using a bitfield reference on a value that is claimed not to be a > bitfield in comment 6. So I can't see how using BIT_FIELD_REF could ever work > correctly. So that's because TYPE_PRECISION != GET_MODE_PRECISION and the BIT_FIELD_REF expansion counting from GET_MODE_PRECISION I suppose. Thus there is a RTL expansion side of the bug after all? The "fixed" RTL is (insn 6 5 7 (set (reg:SI 95) (lshiftrt:SI (reg/v:SI 94 [ ulAddr ]) (const_int 2 [0x2]))) "t.c":42:48 -1 (nil)) (insn 7 6 8 (set (reg:SI 96) (and:SI (reg:SI 95) (const_int 1073741823 [0x3fffffff]))) "t.c":42:48 -1 (nil)) (insn 8 7 9 (set (subreg:DI (reg:HI 97) 0) (zero_extract:DI (subreg:DI (reg:SI 96) 0) (const_int 16 [0x10]) (const_int 2 [0x2]))) "t.c":44:8 -1 (nil)) so the 30bit value is in reg:SI 96 (the :30 cast causes the and with 0x3fffffff) but then the zero_extract we generate is bogus. So maybe the :30 cast should have been a shift for BYTES_BIG_ENDIAN? We might be able to work around this by optimization on GIMPLE, combining _1 = ulAddr_3(D) >> 2; _2 = () _1; _6 = BIT_FIELD_REF <_2, 16, 14>; as far as eliminating at least the non-mode precision type... Of course that would just work around the underlying RTL expansion bug? 
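Note we can end up with things like _2 = ( _3 = ( _5 = _2 + 3; as well so shifting at the conversion might not be the correct answer (but instead BIT_FIELD_REF expansion needs to be fixed). Alternatively we could declare it invalid GIMPLE and require BIT_FIELD_REF positions to be always relative to the mode (but then I'd rather disallow BIT_FIELD_REF on non-mode precision entities...). Sth like the following might fix the RTL expansion issue which then generates Test_func: ubfx x0, x0, 2, 16 cmp w0, 1 bne .L6 mov w0, 0 and just (insn 6 5 7 (set (reg:SI 95) (lshiftrt:SI (reg/v:SI 94 [ ulAddr ]) (const_int 2 [0x2]))) "t.c":42:48 -1 (nil)) (insn 7 6 8 (set (reg:SI 96) (and:SI (reg:SI 95) (const_int 1073741823 [0x3fffffff]))) "t.c":42:48 -1 (nil)) (insn 8 7 9 (set (reg:SI 97) (zero_extend:SI (subreg:HI (reg:SI 96) 2))) "t.c":44:8 -1 (nil)) Index: gcc/expr.c === --- gcc/expr.c (revision 267553) +++ gcc/expr.c (working copy) @@ -10562,6 +10562,15 @@ expand_expr_real_1 (tree exp, rtx target infinitely recurse. */ gcc_assert (tem != exp); + /* When extracting from non-mode bitsize entities adjust the + bit position for BYTES_BIG_ENDIAN. */ + if (INTEGRAL_TYPE_P (TREE_TYPE (tem)) + && (TYPE_PRECISION (TREE_TYPE (tem)) + < GET_MODE_BITSIZE (as_a (TYPE_MODE (TREE_TYPE (tem) + && BYTES_BIG_ENDIAN) + bitpos += (GET_MODE_BITSIZE (as_a (TYPE_MODE (TREE_TYPE (tem +- TYPE_PRECISION (TREE_TYPE (tem))); + /* If TEM's type is a union of variable size, pass TARGET to the inner computation, since it will need a temporary and TARGET is known to have to do. This occurs in unchecked conversion in Ada. */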
[Bug tree-optimization/88767] 'unroll and jam' not optimizing some loops
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767 --- Comment #4 from rguenther at suse dot de --- On Wed, 9 Jan 2019, wschmidt at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767 > > Bill Schmidt changed: > >What|Removed |Added > > Status|WAITING |UNCONFIRMED > Ever confirmed|1 |0 > > --- Comment #2 from Bill Schmidt --- > Hi Richard -- This was reported to us internally. The performance of this > test > case on a P8 server indicates that disabling complete unrolling and applying > unroll-and-jam could produce about a 1.5x speedup. I am going to have our > performance team verify that this is the case using just the options that Li > Jia used; the original report modified the source to provide the results of > unroll-and-jam since the reporter didn't know how to disable cunrolli. I'll > post the results here when we have them. Note for cases like this it would be nice to extend our set of loop pragmas so you could say #pragma GCC loop unroll-and-jam [factor] on the outer loop which should then disable unrolling of the inner. If source modification is possible, that is. Using -fdisable-tree-cunrolli isn't meant to be a "production thing"
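There is no unroll-and-jam pragma today, but GCC 8 and later already accept a per-loop `#pragma GCC unroll n`, and the GCC manual documents that the values 0 and 1 block any unrolling of the loop that follows. The sketch below applies it to an inner loop as a source-level alternative to -fdisable-tree-cunrolli. Whether this is enough for unroll-and-jam to then transform the outer loop in this PR's test case is not established here, and `matvec` is a hypothetical example, not the reported source.

```c
#include <assert.h>
#include <stddef.h>

#define N 4

/* c[i] += sum over j of a[j] * b[i][j]. The pragma applies only to the
   inner j loop and asks GCC not to unroll it. */
void matvec(int c[N], const int a[N], const int b[N][N])
{
    assert(c != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < N; i++) {
#pragma GCC unroll 1
        for (size_t j = 0; j < N; j++)
            c[i] += a[j] * b[i][j];
    }
}
```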
[Bug tree-optimization/88767] 'unroll and jam' not optimizing some loops
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767 --- Comment #5 from Bill Schmidt --- From the original reporter: Partially unrolling the outermost loop in the innermost loop body enables data reuse for array A (see source) thereby improving the mem-ops/compute ratio and providing the performance gain.
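A hand-written sketch of the transformation the reporter describes, assuming a simple nest in which array `a` is indexed only by the inner loop (the actual test case is not reproduced here; names and the row-major layout of `b` are hypothetical). Unroll-and-jam by 2 runs two outer iterations inside one inner loop, so each `a[j]` is loaded once and used twice:

```c
#include <assert.h>
#include <stddef.h>

/* Plain nest: every outer iteration reloads all of a[0..m-1]. */
void nest_plain(long *c, const long *a, const long *b, size_t n, size_t m)
{
    assert(c != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < m; j++)
            c[i] += a[j] * b[i * m + j];
}

/* Unroll-and-jam by 2: rows i and i+1 share one pass over j. */
void nest_jammed(long *c, const long *a, const long *b, size_t n, size_t m)
{
    assert(c != NULL && a != NULL && b != NULL);
    size_t i = 0;
    for (; i + 1 < n; i += 2)
        for (size_t j = 0; j < m; j++) {
            long aj = a[j];                       /* loaded once ...   */
            c[i]     += aj * b[i * m + j];        /* ... used for row i */
            c[i + 1] += aj * b[(i + 1) * m + j];  /* ... and row i + 1  */
        }
    for (; i < n; i++)                            /* leftover row if n is odd */
        for (size_t j = 0; j < m; j++)
            c[i] += a[j] * b[i * m + j];
}
```

This halves the loads of `a` per multiply-add in the jammed part, which is the improved mem-ops/compute ratio mentioned above.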
[Bug tree-optimization/88739] [7/8/9 Regression] Big-endian union bug
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739 --- Comment #32 from Richard Biener --- (In reply to rguent...@suse.de from comment #31) > On Wed, 9 Jan 2019, wilco at gcc dot gnu.org wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739 > > > > --- Comment #29 from Wilco --- > > (In reply to Richard Biener from comment #26) > > > > > Did anybody test the patch? Testing on x86_64 will be quite pointless... > > > > Well that generates _18 = BIT_FIELD_REF <_2, 16, 14>; and becomes: > > > > ubfx x1, x20, 2, 16 > > > > This extracts bits 2-17 of the 30-bit value instead of bits 14-29. The > > issue is > > that we're using a bitfield reference on a value that is claimed not to be a > > bitfield in comment 6. So I can't see how using BIT_FIELD_REF could ever > > work > > correctly. > > So that's because TYPE_PRECISION != GET_MODE_PRECISION and the > BIT_FIELD_REF expansion counting from GET_MODE_PRECISION I suppose. > > Thus there is a RTL expansion side of the bug after all? > > The "fixed" RTL is > > (insn 6 5 7 (set (reg:SI 95) > (lshiftrt:SI (reg/v:SI 94 [ ulAddr ]) > (const_int 2 [0x2]))) "t.c":42:48 -1 > (nil)) > > (insn 7 6 8 (set (reg:SI 96) > (and:SI (reg:SI 95) > (const_int 1073741823 [0x3fffffff]))) "t.c":42:48 -1 > (nil)) > > (insn 8 7 9 (set (subreg:DI (reg:HI 97) 0) > (zero_extract:DI (subreg:DI (reg:SI 96) 0) > (const_int 16 [0x10]) > (const_int 2 [0x2]))) "t.c":44:8 -1 > (nil)) > > so the 30bit value is in reg:SI 96 (the :30 cast causes the > and with 0x3fffffff) but then the zero_extract we generate > is bogus. > > So maybe the :30 cast should have been a shift for BYTES_BIG_ENDIAN? > > We might be able to work around this by optimization on GIMPLE, > combining > > _1 = ulAddr_3(D) >> 2; > _2 = () _1; > _6 = BIT_FIELD_REF <_2, 16, 14>; > > as far as eliminating at least the non-mode precision type... > > Of course that would just work around the underlying RTL expansion > bug? 
> > Note we can end up with things like > > _2 = ( _3 = ( _5 = _2 + 3; > > as well so shifting at the conversion might not be the correct > answer (but instead BIT_FIELD_REF expansion needs to be fixed). > > Alternatively we could declare it invalid GIMPLE and require > BIT_FIELD_REF positions to be always relative to the mode > (but then I'd rather disallow BIT_FIELD_REF on non-mode > precision entities...). > > Sth like the following might fix the RTL expansion issue > which then generates > > Test_func: > ubfx x0, x0, 2, 16 > cmp w0, 1 > bne .L6 > mov w0, 0 > > and just > > (insn 6 5 7 (set (reg:SI 95) > (lshiftrt:SI (reg/v:SI 94 [ ulAddr ]) > (const_int 2 [0x2]))) "t.c":42:48 -1 > (nil)) > > (insn 7 6 8 (set (reg:SI 96) > (and:SI (reg:SI 95) > (const_int 1073741823 [0x3fffffff]))) "t.c":42:48 -1 > (nil)) > > (insn 8 7 9 (set (reg:SI 97) > (zero_extend:SI (subreg:HI (reg:SI 96) 2))) "t.c":44:8 -1 > (nil)) > > Index: gcc/expr.c > === > --- gcc/expr.c (revision 267553) > +++ gcc/expr.c (working copy) > @@ -10562,6 +10562,15 @@ expand_expr_real_1 (tree exp, rtx target >infinitely recurse. */ > gcc_assert (tem != exp); > > + /* When extracting from non-mode bitsize entities adjust the > + bit position for BYTES_BIG_ENDIAN. */ > + if (INTEGRAL_TYPE_P (TREE_TYPE (tem)) > + && (TYPE_PRECISION (TREE_TYPE (tem)) > + < GET_MODE_BITSIZE (as_a (TYPE_MODE > (TREE_TYPE (tem) > + && BYTES_BIG_ENDIAN) > + bitpos += (GET_MODE_BITSIZE (as_a (TYPE_MODE > (TREE_TYPE (tem > +- TYPE_PRECISION (TREE_TYPE (tem))); > + > /* If TEM's type is a union of variable size, pass TARGET to the > inner >computation, since it will need a temporary and TARGET is known >to have to do. This occurs in unchecked conversion in Ada. */ Btw, this needs to be amended for WORDS_BIG_ENDIAN of course. I guess we might even run into the case that such BIT_FIELD_REF references a non-contiguous set of bits... (that's also true for BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN I guess).
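The arithmetic of the proposed expr.c hunk, restated as a standalone C function so it can be checked in isolation. This is only a sketch: the parameters stand in for INTEGRAL_TYPE_P, TYPE_PRECISION, GET_MODE_BITSIZE and BYTES_BIG_ENDIAN, `adjust_bitpos` is a hypothetical name, and, as the comment above notes, WORDS_BIG_ENDIAN is not handled.

```c
#include <assert.h>
#include <stdbool.h>

/* When a value's precision is narrower than its mode and the target is
   BYTES_BIG_ENDIAN, shift the BIT_FIELD_REF position by the number of
   padding bits (mode bits - precision), as the patch hunk does. */
unsigned adjust_bitpos(unsigned bitpos, unsigned precision,
                       unsigned mode_bits, bool integral,
                       bool bytes_big_endian)
{
    assert(precision <= mode_bits);
    if (integral && precision < mode_bits && bytes_big_endian)
        bitpos += mode_bits - precision;
    return bitpos;
}
```

For the 30-bit value held in SImode in this PR, a position of 14 becomes 14 + (32 - 30) = 16 on a big-endian target and is left unchanged otherwise.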
[Bug tree-optimization/88767] 'unroll and jam' not optimizing some loops
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767 --- Comment #6 from Bill Schmidt --- Yes, we don't want to encourage disabling cunrolli by hand for production use. This test case is interesting because it shows a tension between complete unrolling of inner loops and classical HPC loop optimization, which wants control over memory access patterns. I think we will eventually have to address this more generally.
[Bug rtl-optimization/88770] Redundant load opt. or CSE pessimizes code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88770 Richard Biener changed: What|Removed |Added Keywords||missed-optimization, ra Status|UNCONFIRMED |NEW Last reconfirmed||2019-01-09 CC||vmakarov at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #1 from Richard Biener --- I guess being constants would make this a job for lra remat? Confirmed also on trunk.
[Bug tree-optimization/88767] 'unroll and jam' not optimizing some loops
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767 --- Comment #7 from Bill Schmidt --- (In reply to Michael Matz from comment #3) > I don't see anything to improve either (as far as unroll-and-jam is > concerned). > It's quite possible that cunrolli is harming more than helping in this case, > but with it disabled it seems the code is as it should be. > > So, please state what you want to see changed: unroll-and-jam or cunrolli? The question in my mind is what to do about the phase interaction between the two. Classical optimizations of loop nests for HPC code optimize memory access patterns, and cunrolli takes some of the options off the table before unroll-and-jam (in this case) can analyze the loop.
[Bug c/88769] Call to sin() optimized away, disregarding possible side-effect (errno)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88769 Richard Biener changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2019-01-09 CC||jsm28 at gcc dot gnu.org Component|tree-optimization |c Ever confirmed|0 |1 --- Comment #1 from Richard Biener --- This is because GCC thinks sin() doesn't set errno. DEF_LIB_BUILTIN(BUILT_IN_SIN, "sin", BT_FN_DOUBLE_DOUBLE, ATTR_MATHFN_FPROUNDING) According to the C standard no error conditions are documented for sin or cos, specifically no domain error is documented.
[Bug tree-optimization/88760] GCC unrolling is suboptimal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760 --- Comment #5 from Wilco --- (In reply to Wilco from comment #4) > (In reply to ktkachov from comment #2) > > Created attachment 45386 [details] > > aarch64-llvm output with -Ofast -mcpu=cortex-a57 > > > > I'm attaching the full LLVM aarch64 output. > > > > The output you quoted is with -funroll-loops. If that's not given, GCC > > doesn't seem to unroll by default at all (on aarch64 or x86_64 from my > > testing). > > > > Is there anything we can do to make the default unrolling a bit more > > aggressive? > > I don't think the RTL unroller works at all. It doesn't have the right > settings, and doesn't understand how to unroll, so we always get inefficient > and bloated code. > > To do unrolling correctly it has to be integrated at tree level - for > example when vectorization isn't possible/beneficial, unrolling might still > be a good idea. To add some numbers to the conversation, the gain LLVM gets from default unrolling is 4.5% on SPECINT2017 and 1.0% on SPECFP2017. This clearly shows there is huge potential from unrolling, *if* we can teach GCC to unroll properly like LLVM. That means early unrolling, using good default settings and using a trailing loop rather than inefficient peeling.
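To illustrate the loop shape described above (an unrolled main body followed by a trailing loop for the leftover iterations, rather than peeling iterations up front), a hand-written C sketch. It is not GCC or LLVM output, and `sum_unrolled` is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>

/* Sum an array, unrolled by 4. The main loop handles n / 4 blocks with
   a single exit test per block; the trailing loop finishes the 0-3
   leftover elements. */
long sum_unrolled(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        s += (long) a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; i++)          /* trailing (remainder) loop */
        s += a[i];
    return s;
}
```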
[Bug tree-optimization/88767] 'unroll and jam' not optimizing some loops
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767 --- Comment #8 from rguenther at suse dot de --- On Wed, 9 Jan 2019, wschmidt at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767 > > --- Comment #7 from Bill Schmidt --- > (In reply to Michael Matz from comment #3) > > I don't see anything to improve either (as far as unroll-and-jam is > > concerned). > > It's quite possible that cunrolli is harming more than helping in this case, > > but with it disabled it seems the code is as it should be. > > > > So, please state what you want to see changed: unroll-and-jam or cunrolli? > > The question in my mind is what to do about the phase interaction between the > two. Classical optimizations of loop nests for HPC code optimize memory > access > patterns, and cunrolli takes some of the options off the table before > unroll-and-jam (in this case) can analyze the loop. An improvement of the heuristics could be to turn down --param max-completely-peel-times and friends for cunrolli. cunrolli is important to remove abstraction in C++ since none of the scalar optimization passes knows how to unroll loops "virtually" (it's on my list to experiment with such an idea for value-numbering)
[Bug tree-optimization/88771] New: [9 Regression] Misleading -Werror=array-bounds error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88771 Bug ID: 88771 Summary: [9 Regression] Misleading -Werror=array-bounds error Product: gcc Version: 9.0 Status: UNCONFIRMED Keywords: diagnostic Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: marxin at gcc dot gnu.org Target Milestone: --- Starting from r264956 I see an error for: $ cat om.i typedef struct { int a; } * b; char *c, *x; int f; void d() { b e; char a = f + 1 ?: f; __builtin_strncpy(c, x, f); if (a) e->a = 0; } $ gcc om.i -c -O2 -Werror=array-bounds om.i: In function ‘d’: om.i:11:3: error: ‘__builtin_strncpy’ pointer overflow between offset 0 and size [-1, 9223372036854775807] [-Werror=array-bounds] 11 | __builtin_strncpy(c, x, f); | ^~ cc1: some warnings being treated as errors $ gcc om.i -c -O2 -Werror=array-bounds -m32 om.i: In function ‘d’: om.i:11:3: error: ‘__builtin_strncpy’ pointer overflow between offset 0 and size [4294967295, 2147483647] [-Werror=array-bounds] 11 | __builtin_strncpy(c, x, f); | ^~ cc1: some warnings being treated as errors
[Bug tree-optimization/88771] [9 Regression] Misleading -Werror=array-bounds error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88771 Martin Liška changed: What|Removed |Added Last reconfirmed||2019-1-9 CC||msebor at gcc dot gnu.org, ||rguenth at gcc dot gnu.org Known to work||8.2.0 Target Milestone|--- |9.0 Known to fail||9.0
[Bug tree-optimization/88760] GCC unrolling is suboptimal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760 --- Comment #6 from rguenther at suse dot de --- On Wed, 9 Jan 2019, wilco at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760 > > --- Comment #5 from Wilco --- > (In reply to Wilco from comment #4) > > (In reply to ktkachov from comment #2) > > > Created attachment 45386 [details] > > > aarch64-llvm output with -Ofast -mcpu=cortex-a57 > > > > > > I'm attaching the full LLVM aarch64 output. > > > > > > The output you quoted is with -funroll-loops. If that's not given, GCC > > > doesn't seem to unroll by default at all (on aarch64 or x86_64 from my > > > testing). > > > > > > Is there anything we can do to make the default unrolling a bit more > > > aggressive? > > > > I don't think the RTL unroller works at all. It doesn't have the right > > settings, and doesn't understand how to unroll, so we always get inefficient > > and bloated code. > > > > To do unrolling correctly it has to be integrated at tree level - for > > example when vectorization isn't possible/beneficial, unrolling might still > > be a good idea. > > To add some numbers to the conversation, the gain LLVM gets from default > unrolling is 4.5% on SPECINT2017 and 1.0% on SPECFP2017. > > This clearly shows there is huge potential from unrolling, *if* we can teach > GCC to unroll properly like LLVM. That means early unrolling, using good > default settings and using a trailing loop rather than inefficient peeling. I don't see why this cannot be done on RTL where we have vastly more information of whether there are execution resources that can be used by unrolling. Note we also want unrolling to interleave instructions to not rely on pre-reload scheduling which in turn means having a good eye on register pressure (again sth not very well handled on GIMPLE)
[Bug c/88769] Call to sin() optimized away, disregarding possible side-effect (errno)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88769 --- Comment #2 from Per Zetterlund --- The POSIX standard describes domain error conditions for sin() : http://pubs.opengroup.org/onlinepubs/9699919799/functions/sin.html . I guess there is a discrepancy between the C standard and the POSIX standard in this case.
[Bug tree-optimization/88771] [9 Regression] Misleading -Werror=array-bounds error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88771 Richard Biener changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever confirmed|0 |1 --- Comment #1 from Richard Biener --- It's the restrict pass doing this after VRP figured we can simplify things via threading: # .MEM_22 = VDEF <.MEM_7(D)> __builtin_strncpy (pretmp_9, pretmp_19, 18446744073709551615); not sure what the warning is about though but I guess it's triggered by seeing that e->a = 0 store? The testcase seems to be reduced ad absurdum and the bisection looks odd. Can you attach some original source?
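The odd numbers in these diagnostics come from converting the int `f` to size_t for the strncpy length. On 64-bit targets (size_t)-1 is 18446744073709551615, the constant in the call above; with -m32 it is 4294967295, the lower bound printed in the -m32 diagnostic. A short C sketch of that conversion and of a guard that rejects negative lengths before they wrap; `strncpy_bound` and `strncpy_checked` are hypothetical names, not part of the test case:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>   /* SIZE_MAX */
#include <string.h>

/* The implicit conversion in __builtin_strncpy (c, x, f) when f is an
   int: a negative f wraps to a huge size_t, e.g. -1 becomes SIZE_MAX. */
size_t strncpy_bound(int f)
{
    return (size_t) f;
}

/* Hypothetical guard: refuse negative lengths instead of letting them
   wrap. Returns 0 on success, -1 if F is negative. */
int strncpy_checked(char *dst, const char *src, int f)
{
    assert(dst != NULL && src != NULL);
    if (f < 0)
        return -1;
    strncpy(dst, src, (size_t) f);
    return 0;
}
```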
[Bug tree-optimization/88771] [9 Regression] Misleading -Werror=array-bounds error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88771 --- Comment #2 from Martin Liška --- Created attachment 45390 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45390&action=edit original test-case Original test that fails just with -m32: $ gcc om-original.i -c -O2 -Werror=array-bounds -m32 In file included from /usr/include/string.h:494, from /usr/include/X11/Xfuncs.h:46, from ../../../include/X11/Xlibint.h:335, from omGeneric.c:53: In function ‘strncpy’, inlined from ‘read_EncodingInfo’ at omGeneric.c:1836:9: /usr/include/bits/string_fortified.h:106:10: error: ‘__builtin_strncpy’ pointer overflow between offset 0 and size [4294967295, 2147483647] [-Werror=array-bounds] 106 | return __builtin___strncpy_chk (__dest, __src, __len, __bos (__dest)); | ^~~ cc1: some warnings being treated as errors
[Bug tree-optimization/88771] [9 Regression] Misleading -Werror=array-bounds error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88771 --- Comment #3 from Martin Liška --- Original test-case started to produce the warning since r263662.
[Bug tree-optimization/88760] GCC unrolling is suboptimal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760 --- Comment #7 from Wilco --- (In reply to rguent...@suse.de from comment #6) > On Wed, 9 Jan 2019, wilco at gcc dot gnu.org wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760 > > > > --- Comment #5 from Wilco --- > > (In reply to Wilco from comment #4) > > > (In reply to ktkachov from comment #2) > > > > Created attachment 45386 [details] > > > > aarch64-llvm output with -Ofast -mcpu=cortex-a57 > > > > > > > > I'm attaching the full LLVM aarch64 output. > > > > > > > > The output you quoted is with -funroll-loops. If that's not given, GCC > > > > doesn't seem to unroll by default at all (on aarch64 or x86_64 from my > > > > testing). > > > > > > > > Is there anything we can do to make the default unrolling a bit more > > > > aggressive? > > > > > > I don't think the RTL unroller works at all. It doesn't have the right > > > settings, and doesn't understand how to unroll, so we always get > > > inefficient > > > and bloated code. > > > > > > To do unrolling correctly it has to be integrated at tree level - for > > > example when vectorization isn't possible/beneficial, unrolling might > > > still > > > be a good idea. > > > > To add some numbers to the conversation, the gain LLVM gets from default > > unrolling is 4.5% on SPECINT2017 and 1.0% on SPECFP2017. > > > > This clearly shows there is huge potential from unrolling, *if* we can teach > > GCC to unroll properly like LLVM. That means early unrolling, using good > > default settings and using a trailing loop rather than inefficient peeling. > > I don't see why this cannot be done on RTL where we have vastly more > information of whether there are execution resources that can be > used by unrolling. 
Note we also want unrolling to interleave > instructions to not rely on pre-reload scheduling which in turn means > having a good eye on register pressure (again sth not very well handled > on GIMPLE) The main issue is that other loop optimizations are done on tree, so things like addressing modes, loop invariants, CSEs are run on the non-unrolled version. Then when we unroll in RTL we end up with very non-optimal code. Typical unrolled loop starts like this: add x13, x2, 1 add x14, x2, 2 add x11, x2, 3 add x10, x2, 4 ldr w30, [x4, x13, lsl 2] add x9, x2, 5 add x5, x2, 6 add x12, x2, 7 ldr d23, [x3, x2, lsl 3] ... rest of unrolled loop So basically it decides to create a new induction variable for every unrolled copy in the loop. This often leads to spills just because it creates way too many redundant addressing instructions. It also blocks scheduling between iterations since the alias optimization doesn't appear to understand simple constant differences between indices. So unrolling should definitely be done at a high level just like vectorization.
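For contrast with the listing above, a hand-written C sketch of an unrolled loop that keeps one induction variable per array and addresses each copy at a constant offset from it (x[0] through x[3]), so no separate index register per copy is needed. It assumes int inputs and a long result; `dot4` is hypothetical, and whether GCC would compile it this way is not claimed.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product unrolled by 4 with ONE pointer per array. Each copy uses
   a constant offset from that pointer instead of its own index
   variable. Four accumulators keep the copies from forming a single
   serial dependency chain; a trailing loop handles n % 4 elements. */
long dot4(const int *x, const int *y, size_t n)
{
    assert(x != NULL && y != NULL);
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    const int *end4 = x + (n & ~(size_t) 3);
    while (x != end4) {
        s0 += (long) x[0] * y[0];
        s1 += (long) x[1] * y[1];
        s2 += (long) x[2] * y[2];
        s3 += (long) x[3] * y[3];
        x += 4;
        y += 4;
    }
    for (size_t r = n & 3; r != 0; r--)   /* trailing loop */
        s0 += (long) *x++ * *y++;
    return s0 + s1 + s2 + s3;
}
```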
[Bug bootstrap/88450] [9 regression] ICE in stage 2 compiler while configuring libgcc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88450 Jakub Jelinek changed: What|Removed |Added CC||uros at gcc dot gnu.org --- Comment #15 from Jakub Jelinek --- Because you are using -march=native -mtune=native by default, it is unclear what exact ISA/tuning that is. From the dump I assume it must be something with -mavx or -mavx2, so that is what I've used, but it would be nice to know it more exactly. I guess ./xgcc -B ./ -v -xc /dev/null -S would reveal that. Anyway, it seems that on gimplify.ii the problematic change in assign_stack_local_1 is triggered just once, and to me it looks completely unnecessary. assign_stack_temp_for_type is called with BLKmode, 24, and gimple_stmt_iterator type. This is done because of the seq = gsi_split_seq_after (iter); call in gimplify_cleanup_point_expr inlined into gimplify_expr, where iter is clearly passed by invisible reference. assign_stack_temp_for_type calls get_stack_local_alignment which does: if (mode == BLKmode) alignment = BIGGEST_ALIGNMENT; else alignment = GET_MODE_ALIGNMENT (mode); /* Allow the frond-end to (possibly) increase the alignment of this stack slot. */ if (! type) type = lang_hooks.types.type_for_mode (mode, 0); return STACK_SLOT_ALIGNMENT (type, mode, alignment); It seems a complete waste to me to try to align the 24-byte structure to a 32-byte boundary and allocate 48 bytes for it on the stack, then dynamically adjust the start so that it is 32-byte aligned. BIGGEST_ALIGNMENT is 256 bits because of -mavx (could be even 512 bits for -mavx512f). So, first of all, I'd think we should undo this unnecessary overalignment in the i386 STACK_SLOT_ALIGNMENT. But that doesn't explain why you get a segfault elsewhere.
[Bug libgcc/88772] New: Exception handling configured mode does not match the one finally used
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88772 Bug ID: 88772 Summary: Exception handling configured mode does not match the one finally used Product: gcc Version: 8.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libgcc Assignee: unassigned at gcc dot gnu.org Reporter: ylatuya at gmail dot com Target Milestone: --- I am building a multilib GCC+MinGW toolchain targeting Windows. I have built the cross toolchain, which compiled and works correctly, and I am now trying to build the native one. The cross toolchain is configured with: ../configure --prefix /home/andoni/mingw/linux/w64 --libdir /home/andoni/mingw/linux/w64/lib --enable-introspection --with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm' --disable-shared --disable-libgomp --disable-libquadmath --disable-libquadmath-support --disable-libmudflap --disable-libmpx --disable-libssp --disable-nls --enable-threads=posix --enable-__cxa_atexit --enable-lto --enable-plugin --enable-multiarch --enable-languages=c,c++ --enable-long-long --with-sysroot=/home/andoni/mingw/linux/w64/x86_64-w64-mingw32/sysroot --with-local-prefix=/home/andoni/mingw/linux/w64/x86_64-w64-mingw32/sysroot --target=x86_64-w64-mingw32 The native toolchain is configured with the same settings, only changing the host: ../configure --prefix /home/andoni/mingw/windows/w64 --libdir /home/andoni/mingw/windows/w64/lib --disable-introspection --with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm' --disable-shared --disable-libgomp --disable-libquadmath --disable-libquadmath-support --disable-libmudflap --disable-libmpx --disable-libssp --disable-nls --enable-threads=posix --enable-__cxa_atexit --enable-lto --enable-plugin --enable-multiarch --enable-languages=c,c++ --enable-long-long --with-sysroot=/home/andoni/mingw/windows/w64/x86_64-w64-mingw32/sysroot --with-local-prefix=/home/andoni/mingw/windows/w64/x86_64-w64-mingw32/sysroot --target=x86_64-w64-mingw32 --host=x86_64-w64-mingw32 In neither of them do I force SJLJ or disable it, so according to the documentation and the headers it should be using SEH for 64 bits and SJLJ for 32 bits: gcc/config/i386/cygming.h 369 /* If configured with --disable-sjlj-exceptions, use DWARF2 for 32-bit 370 mode else default to SJLJ. 64-bit code uses SEH unless you request 371 SJLJ. */ But what happens is that it ends up using i386/t-dw2-eh instead of i386/t-seh-eh, and there is a compilation error: ../../../../libgcc/unwind.inc: In function '_Unwind_RaiseException_Phase2': ../../../../libgcc/unwind.inc:53:62: error: 'struct _Unwind_Exception' has no member named 'private_2'; did you mean 'private_'? match_handler = (uw_identify_context (context) == exc->private_2 ^ private_ The error seems to be in the switch case for x86_64-mingw32 in libgcc/config.host: 762 > # This has to match the logic for DWARF2_UNWIND_INFO in gcc/config/i386/cygming.h 763 > if test x$ac_cv_sjlj_exceptions = xyes; then 764 > > tmake_eh_file="i386/t-sjlj-eh" 765 > elif test "${host_address}" = 32; then 766 > # biarch -m32 with --disable-sjlj-exceptions 767 >> tmake_eh_file="i386/t-dw2-eh" 768 > > md_unwind_header=i386/w32-unwind.h 769 > else 770 > > tmake_eh_file="i386/t-seh-eh" 771 > fi
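Reading the quoted config.host branch as a decision function makes the symptom easier to reason about: i386/t-dw2-eh is selected only when the SJLJ cache variable is not "yes" and host_address is 32. A hypothetical C model of that branch (names are illustrative; it mirrors only the quoted shell logic, not the rest of config.host):

```c
#include <assert.h>
#include <stdbool.h>

/* EH makefile fragments the quoted branch can select. */
enum eh_fragment {
    EH_SJLJ,   /* i386/t-sjlj-eh */
    EH_DW2,    /* i386/t-dw2-eh  */
    EH_SEH     /* i386/t-seh-eh  */
};

/* SJLJ if the configure cache says so, DWARF2 for a 32-bit multilib,
   otherwise SEH. */
enum eh_fragment mingw_eh_fragment(bool sjlj_exceptions, int host_address)
{
    assert(host_address == 32 || host_address == 64);
    if (sjlj_exceptions)
        return EH_SJLJ;
    if (host_address == 32)
        return EH_DW2;   /* biarch -m32 with --disable-sjlj-exceptions */
    return EH_SEH;
}
```

Under this reading, if a 64-bit libgcc build picks up t-dw2-eh, one thing worth checking is what value host_address had in that multilib's configure run.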
[Bug bootstrap/88450] [9 regression] ICE in stage 2 compiler while configuring libgcc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88450 --- Comment #16 from Jakub Jelinek --- The following patch does that. Guess the issues reported in this PR might go away with that, but it is really just an attempt to fix inefficiency in the generated code rather than fix the wrong-code issue we have somewhere. --- gcc/config/i386/i386.c.jj 2019-01-08 22:33:34.605708026 +0100 +++ gcc/config/i386/i386.c 2019-01-09 15:11:35.902663636 +0100 @@ -29679,6 +29679,17 @@ ix86_local_alignment (tree exp, machine_ && (!type || !TYPE_USER_ALIGN (type)) && (!decl || !DECL_USER_ALIGN (decl))) align = 32; + /* Similarly, don't do dynamic stack realignment just because + we need a BLKmode stack slot and have high BIGGEST_ALIGNMENT. + This is what get_stack_local_alignment returns regardless of + the actual needs, undo that here. */ + if (align == BIGGEST_ALIGNMENT + && mode == BLKmode + && !decl + && type + && align > TYPE_ALIGN (type) + && align > MAX_SUPPORTED_STACK_ALIGNMENT) +align = MAX (TYPE_ALIGN (type), MAX_SUPPORTED_STACK_ALIGNMENT); /* If TYPE is NULL, we are allocating a stack slot for caller-save register in MODE. We will return the largest alignment of XF --- gcc/function.c.jj 2019-01-09 11:15:31.539836837 +0100 +++ gcc/function.c 2019-01-09 15:10:52.971371328 +0100 @@ -919,8 +919,10 @@ assign_stack_temp_for_type (machine_mode So for requests which depended on the rounding of SIZE, we go ahead and round it now. We also make sure ALIGNMENT is at least -BIGGEST_ALIGNMENT. */ - gcc_assert (mode != BLKmode || align == BIGGEST_ALIGNMENT); +minimum of BIGGEST_ALIGNMENT and MAX_SUPPORTED_STACK_ALIGNMENT. */ + gcc_assert (mode != BLKmode + || align >= MIN (BIGGEST_ALIGNMENT, + MAX_SUPPORTED_STACK_ALIGNMENT)); p->slot = assign_stack_local_1 (mode, (mode == BLKmode ? aligned_upper_bound (size,
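The new condition in ix86_local_alignment, restated as a plain C function so its effect can be checked in isolation. This is a sketch: `undo_blk_overalign` is a hypothetical name, booleans stand in for `mode == BLKmode` and for `decl` and `type` being non-null, and the numbers used in the checks (for example 128 bits for MAX_SUPPORTED_STACK_ALIGNMENT) are placeholders, not the real i386 values.

```c
#include <assert.h>
#include <stdbool.h>

/* Alignments are in bits; BIGGEST_ALIGNMENT and
   MAX_SUPPORTED_STACK_ALIGNMENT are passed in rather than taken from a
   real target. */
unsigned undo_blk_overalign(unsigned align, unsigned biggest_alignment,
                            unsigned max_supported_stack_alignment,
                            bool blkmode, bool has_decl, bool has_type,
                            unsigned type_align)
{
    assert(align != 0 && biggest_alignment != 0);
    if (align == biggest_alignment
        && blkmode
        && !has_decl
        && has_type
        && align > type_align
        && align > max_supported_stack_alignment)
        /* MAX (TYPE_ALIGN (type), MAX_SUPPORTED_STACK_ALIGNMENT) */
        align = type_align > max_supported_stack_alignment
                    ? type_align : max_supported_stack_alignment;
    return align;
}
```

When the condition fires, the result is at least MAX_SUPPORTED_STACK_ALIGNMENT, so it satisfies the relaxed gcc_assert in assign_stack_temp_for_type, which now only requires MIN (BIGGEST_ALIGNMENT, MAX_SUPPORTED_STACK_ALIGNMENT) for BLKmode slots.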
[Bug rtl-optimization/49330] Integer arithmetic on addresses optimised with pointer arithmetic rules
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49330 --- Comment #23 from Richard Biener --- (In reply to Richard Biener from comment #22) > Things we fail to disambiguate are > > (mem:TF (pre_dec:SI (reg/f:SI 7 sp)) [0 S16 A8]) > vs. > (mem/c:TF (plus:SI (reg/f:SI 19 frame) > (const_int -16 [0xfffffffffffffff0])) [1 S16 A128]) > > or > > (mem:SI (pre_dec:SI (reg/f:SI 7 sp)) [3 S4 A32]) > vs. > (mem/f/c:SI (symbol_ref:SI ("argv") [flags 0x2] argv>) [2 argv+0 S4 A32]) > > where I don't find anything besides CSELIB cselib_sp_based_value_p handling > in find_base_term that could be the one handling it? > > I guess we should be able to somehow handle both sp and frame based > accesses in a more conservative way? It's really 99% like this, which is why eventually that CONST_INT restriction worked so "well". Can we easily identify spill slot accesses somehow? The parameter accesses (frame references?) should simply get appropriate MEM_EXPRs IMHO.
[Bug target/84010] problematic TLS code generation on 64-bit SPARC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84010 --- Comment #14 from Eric Botcazou --- Author: ebotcazou Date: Wed Jan 9 14:34:20 2019 New Revision: 267771 URL: https://gcc.gnu.org/viewcvs?rev=267771&root=gcc&view=rev Log: PR target/84010 * config/sparc/sparc.c (sparc_legitimize_tls_address): Only use Pmode consistently in TLS address generation and adjust code to the renaming of patterns. Mark calls to __tls_get_addr as const. * config/sparc/sparc.md (tgd_hi22): Turn into... (tgd_hi22): ...this and use Pmode throughout. (tgd_lo10): Turn into... (tgd_lo10): ...this and use Pmode throughout. (tgd_add32): Merge into... (tgd_add64): Likewise. (tgd_add): ...this and use Pmode throughout. (tldm_hi22): Turn into... (tldm_hi22): ...this and use Pmode throughout. (tldm_lo10): Turn into... (tldm_lo10): ...this and use Pmode throughout. (tldm_add32): Merge into... (tldm_add64): Likewise. (tldm_add): ...this and use Pmode throughout. (tldm_call32): Merge into... (tldm_call64): Likewise. (tldm_call): ...this and use Pmode throughout. (tldo_hix22): Turn into... (tldo_hix22): ...this and use Pmode throughout. (tldo_lox10): Turn into... (tldo_lox10): ...this and use Pmode throughout. (tldo_add32): Merge into... (tldo_add64): Likewise. (tldo_add): ...this and use Pmode throughout. (tie_hi22): Turn into... (tie_hi22): ...this and use Pmode throughout. (tie_lo10): Turn into... (tie_lo10): ...this and use Pmode throughout. (tie_ld64): Use DImode throughout. (tie_add32): Merge into... (tie_add64): Likewise. (tie_add): ...this and use Pmode throughout. (tle_hix22_sp32): Merge into... (tle_hix22_sp64): Likewise. (tle_hix22): ...this and use Pmode throughout. (tle_lox22_sp32): Merge into... (tle_lox22_sp64): Likewise. (tle_lox22): ...this and use Pmode throughout. (*tldo_ldub_sp32): Merge into... (*tldo_ldub_sp64): Likewise. (*tldo_ldub): ...this and use Pmode throughout. (*tldo_ldub1_sp32): Merge into... (*tldo_ldub1_sp64): Likewise. (*tldo_ldub1): ...this and use Pmode throughout. 
(*tldo_ldub2_sp32): Merge into... (*tldo_ldub2_sp64): Likewise. (*tldo_ldub2): ...this and use Pmode throughout. (*tldo_ldsb1_sp32): Merge into... (*tldo_ldsb1_sp64): Likewise. (*tldo_ldsb1): ...this and use Pmode throughout. (*tldo_ldsb2_sp32): Merge into... (*tldo_ldsb2_sp64): Likewise. (*tldo_ldsb2): ...this and use Pmode throughout. (*tldo_ldub3_sp64): Use DImode throughout. (*tldo_ldsb3_sp64): Likewise. (*tldo_lduh_sp32): Merge into... (*tldo_lduh_sp64): Likewise. (*tldo_lduh): ...this and use Pmode throughout. (*tldo_lduh1_sp32): Merge into... (*tldo_lduh1_sp64): Likewise. (*tldo_lduh1): ...this and use Pmode throughout. (*tldo_ldsh1_sp32): Merge into... (*tldo_ldsh1_sp64): Likewise. (*tldo_ldsh1): ...this and use Pmode throughout. (*tldo_lduh2_sp64): Use DImode throughout. (*tldo_ldsh2_sp64): Likewise. (*tldo_lduw_sp32): Merge into... (*tldo_lduw_sp64): Likewise. (*tldo_lduw): ...this and use Pmode throughout. (*tldo_lduw1_sp64): Use DImode throughout. (*tldo_ldsw1_sp64): Likewise. (*tldo_ldx_sp64): Likewise. (*tldo_stb_sp32): Merge into... (*tldo_stb_sp64): Likewise. (*tldo_stb): ...this and use Pmode throughout. (*tldo_sth_sp32): Merge into... (*tldo_sth_sp64): Likewise. (*tldo_sth): ...this and use Pmode throughout. (*tldo_stw_sp32): Merge into... (*tldo_stw_sp64): Likewise. (*tldo_stw): ...this and use Pmode throughout. (*tldo_stx_sp64): Use DImode throughout. Added: trunk/gcc/testsuite/gcc.target/sparc/tls-ld-int16.c trunk/gcc/testsuite/gcc.target/sparc/tls-ld-int32.c trunk/gcc/testsuite/gcc.target/sparc/tls-ld-int64.c trunk/gcc/testsuite/gcc.target/sparc/tls-ld-int8.c trunk/gcc/testsuite/gcc.target/sparc/tls-ld-uint16.c trunk/gcc/testsuite/gcc.target/sparc/tls-ld-uint32.c trunk/gcc/testsuite/gcc.target/sparc/tls-ld-uint8.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/sparc/sparc.c trunk/gcc/config/sparc/sparc.md trunk/gcc/testsuite/ChangeLog
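The tests added here (tls-ld-int8.c through tls-ld-uint32.c) load thread-local integers of each width. The sketch below is a hypothetical stand-in, not the actual contents of those files. It assumes GCC's `__thread` storage class and the `tls_model` attribute, and requests the local-dynamic model so that a PIC build on 64-bit SPARC would go through the `tldm_*`/`tldo_*` patterns renamed above. In a non-PIC executable the compiler and linker may use a cheaper model instead.

```c
#include <stdint.h>

/* Request the local-dynamic TLS model for each variable (GCC attribute). */
#define LD_TLS __attribute__((tls_model("local-dynamic")))

static __thread int8_t   tls_i8  LD_TLS = -8;
static __thread uint8_t  tls_u8  LD_TLS = 8;
static __thread int16_t  tls_i16 LD_TLS = -16;
static __thread uint16_t tls_u16 LD_TLS = 16;
static __thread int32_t  tls_i32 LD_TLS = -32;
static __thread uint32_t tls_u32 LD_TLS = 32;
static __thread int64_t  tls_i64 LD_TLS = -64;

/* Widen every variable to 64 bits so that both the sign-extending and
   the zero-extending narrow loads (ldsb/ldub, ldsh/lduh, ldsw/lduw)
   are exercised. */
int64_t sum_tls(void)
{
  return (int64_t) tls_i8 + tls_u8 + tls_i16 + tls_u16
         + tls_i32 + tls_u32 + tls_i64;
}
```

The positive and negative values cancel pairwise, so a correct build returns -64. A wrong extension mode on a narrow load would show up as a different sum.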
[Bug tree-optimization/88739] [7/8/9 Regression] Big-endian union bug
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88739 --- Comment #33 from Wilco --- (In reply to Richard Biener from comment #32) > > > > Index: gcc/expr.c > > === > > --- gcc/expr.c (revision 267553) > > +++ gcc/expr.c (working copy) > > @@ -10562,6 +10562,15 @@ expand_expr_real_1 (tree exp, rtx target > >infinitely recurse. */ > > gcc_assert (tem != exp); > > > > + /* When extracting from non-mode bitsize entities adjust the > > + bit position for BYTES_BIG_ENDIAN. */ > > + if (INTEGRAL_TYPE_P (TREE_TYPE (tem)) > > + && (TYPE_PRECISION (TREE_TYPE (tem)) > > + < GET_MODE_BITSIZE (as_a (TYPE_MODE > > (TREE_TYPE (tem) > > + && BYTES_BIG_ENDIAN) > > + bitpos += (GET_MODE_BITSIZE (as_a (TYPE_MODE > > (TREE_TYPE (tem > > +- TYPE_PRECISION (TREE_TYPE (tem))); > > + > > /* If TEM's type is a union of variable size, pass TARGET to the > > inner > >computation, since it will need a temporary and TARGET is known > >to have to do. This occurs in unchecked conversion in Ada. */ > > Btw, this needs to be amended for WORDS_BIG_ENDIAN of course. I guess > we might even run into the case that such BIT_FIELD_REF references > a non-contiguous set of bits... (that's also true for BITS_BIG_ENDIAN != > BYTES_BIG_ENDIAN I guess). Was that meant to be instead or in addition to the tree-ssa-sccvn.c patch? With both I get: lsr w20, w1, 2 ... and w1, w20, 65535 With only the expr.c patch it starts to look as expected: lsr w20, w1, 2 ... lsr w1, w20, 14 And with the latter case the new torture test now passes on big-endian!
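The quoted expr.c hunk adjusts the bit position for integral objects whose precision is narrower than their mode. The helper below is a hypothetical restatement of that arithmetic only. The value occupies the low-order `precision` bits of the container, and with big-endian bytes those bits come last in memory order, so the position moves by the `mode_bitsize - precision` padding bits. As Richard notes, WORDS_BIG_ENDIAN and BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN are not handled here.

```c
#include <stdbool.h>

/* Mirror of the quoted adjustment: shift the bit position past the
   unused high-order bits, which precede the value on big-endian. */
unsigned adjust_bitpos(unsigned bitpos, unsigned mode_bitsize,
                       unsigned precision, bool bytes_big_endian)
{
  if (bytes_big_endian && precision < mode_bitsize)
    bitpos += mode_bitsize - precision;
  return bitpos;
}
```

For example, a 24-bit-precision integer held in a 32-bit mode moves from bit 0 to bit 8 on big-endian and stays put on little-endian. A full-precision type is unaffected on either.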
[Bug target/84010] problematic TLS code generation on 64-bit SPARC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84010 --- Comment #15 from Eric Botcazou --- Author: ebotcazou Date: Wed Jan 9 14:39:18 2019 New Revision: 267772 URL: https://gcc.gnu.org/viewcvs?rev=267772&root=gcc&view=rev Log: PR target/84010 * config/sparc/sparc.c (sparc_legitimize_tls_address): Only use Pmode consistently in TLS address generation and adjust code to the renaming of patterns. Mark calls to __tls_get_addr as const. * config/sparc/sparc.md (tgd_hi22): Turn into... (tgd_hi22): ...this and use Pmode throughout. (tgd_lo10): Turn into... (tgd_lo10): ...this and use Pmode throughout. (tgd_add32): Merge into... (tgd_add64): Likewise. (tgd_add): ...this and use Pmode throughout. (tldm_hi22): Turn into... (tldm_hi22): ...this and use Pmode throughout. (tldm_lo10): Turn into... (tldm_lo10): ...this and use Pmode throughout. (tldm_add32): Merge into... (tldm_add64): Likewise. (tldm_add): ...this and use Pmode throughout. (tldm_call32): Merge into... (tldm_call64): Likewise. (tldm_call): ...this and use Pmode throughout. (tldo_hix22): Turn into... (tldo_hix22): ...this and use Pmode throughout. (tldo_lox10): Turn into... (tldo_lox10): ...this and use Pmode throughout. (tldo_add32): Merge into... (tldo_add64): Likewise. (tldo_add): ...this and use Pmode throughout. (tie_hi22): Turn into... (tie_hi22): ...this and use Pmode throughout. (tie_lo10): Turn into... (tie_lo10): ...this and use Pmode throughout. (tie_ld64): Use DImode throughout. (tie_add32): Merge into... (tie_add64): Likewise. (tie_add): ...this and use Pmode throughout. (tle_hix22_sp32): Merge into... (tle_hix22_sp64): Likewise. (tle_hix22): ...this and use Pmode throughout. (tle_lox22_sp32): Merge into... (tle_lox22_sp64): Likewise. (tle_lox22): ...this and use Pmode throughout. (*tldo_ldub_sp32): Merge into... (*tldo_ldub_sp64): Likewise. (*tldo_ldub): ...this and use Pmode throughout. (*tldo_ldub1_sp32): Merge into... (*tldo_ldub1_sp64): Likewise. (*tldo_ldub1): ...this and use Pmode throughout. 
(*tldo_ldub2_sp32): Merge into... (*tldo_ldub2_sp64): Likewise. (*tldo_ldub2): ...this and use Pmode throughout. (*tldo_ldsb1_sp32): Merge into... (*tldo_ldsb1_sp64): Likewise. (*tldo_ldsb1): ...this and use Pmode throughout. (*tldo_ldsb2_sp32): Merge into... (*tldo_ldsb2_sp64): Likewise. (*tldo_ldsb2): ...this and use Pmode throughout. (*tldo_ldub3_sp64): Use DImode throughout. (*tldo_ldsb3_sp64): Likewise. (*tldo_lduh_sp32): Merge into... (*tldo_lduh_sp64): Likewise. (*tldo_lduh): ...this and use Pmode throughout. (*tldo_lduh1_sp32): Merge into... (*tldo_lduh1_sp64): Likewise. (*tldo_lduh1): ...this and use Pmode throughout. (*tldo_ldsh1_sp32): Merge into... (*tldo_ldsh1_sp64): Likewise. (*tldo_ldsh1): ...this and use Pmode throughout. (*tldo_lduh2_sp64): Use DImode throughout. (*tldo_ldsh2_sp64): Likewise. (*tldo_lduw_sp32): Merge into... (*tldo_lduw_sp64): Likewise. (*tldo_lduw): ...this and use Pmode throughout. (*tldo_lduw1_sp64): Use DImode throughout. (*tldo_ldsw1_sp64): Likewise. (*tldo_ldx_sp64): Likewise. (*tldo_stb_sp32): Merge into... (*tldo_stb_sp64): Likewise. (*tldo_stb): ...this and use Pmode throughout. (*tldo_sth_sp32): Merge into... (*tldo_sth_sp64): Likewise. (*tldo_sth): ...this and use Pmode throughout. (*tldo_stw_sp32): Merge into... (*tldo_stw_sp64): Likewise. (*tldo_stw): ...this and use Pmode throughout. (*tldo_stx_sp64): Use DImode throughout. 
Added: branches/gcc-8-branch/gcc/testsuite/gcc.target/sparc/tls-ld-int16.c - copied unchanged from r267771, trunk/gcc/testsuite/gcc.target/sparc/tls-ld-int16.c branches/gcc-8-branch/gcc/testsuite/gcc.target/sparc/tls-ld-int32.c - copied unchanged from r267771, trunk/gcc/testsuite/gcc.target/sparc/tls-ld-int32.c branches/gcc-8-branch/gcc/testsuite/gcc.target/sparc/tls-ld-int64.c - copied unchanged from r267771, trunk/gcc/testsuite/gcc.target/sparc/tls-ld-int64.c branches/gcc-8-branch/gcc/testsuite/gcc.target/sparc/tls-ld-int8.c - copied unchanged from r267771, trunk/gcc/testsuite/gcc.target/sparc/tls-ld-int8.c branches/gcc-8-branch/gcc/testsuite/gcc.target/sparc/tls-ld-uint16.c - copied unchanged from r267771, trunk/gcc/testsuite/gcc.target/sparc/tls-ld-uint16.c branches/gcc-8-branch/gcc/testsuite/gcc.target/sparc/tls-ld-uint32.c - copied unc
[Bug target/84010] problematic TLS code generation on 64-bit SPARC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84010 --- Comment #16 from Eric Botcazou --- Author: ebotcazou Date: Wed Jan 9 14:41:55 2019 New Revision: 267773 URL: https://gcc.gnu.org/viewcvs?rev=267773&root=gcc&view=rev Log: PR target/84010 * config/sparc/sparc.c (sparc_legitimize_tls_address): Only use Pmode consistently in TLS address generation and adjust code to the renaming of patterns. Mark calls to __tls_get_addr as const. * config/sparc/sparc.md (tgd_hi22): Turn into... (tgd_hi22): ...this and use Pmode throughout. (tgd_lo10): Turn into... (tgd_lo10): ...this and use Pmode throughout. (tgd_add32): Merge into... (tgd_add64): Likewise. (tgd_add): ...this and use Pmode throughout. (tldm_hi22): Turn into... (tldm_hi22): ...this and use Pmode throughout. (tldm_lo10): Turn into... (tldm_lo10): ...this and use Pmode throughout. (tldm_add32): Merge into... (tldm_add64): Likewise. (tldm_add): ...this and use Pmode throughout. (tldm_call32): Merge into... (tldm_call64): Likewise. (tldm_call): ...this and use Pmode throughout. (tldo_hix22): Turn into... (tldo_hix22): ...this and use Pmode throughout. (tldo_lox10): Turn into... (tldo_lox10): ...this and use Pmode throughout. (tldo_add32): Merge into... (tldo_add64): Likewise. (tldo_add): ...this and use Pmode throughout. (tie_hi22): Turn into... (tie_hi22): ...this and use Pmode throughout. (tie_lo10): Turn into... (tie_lo10): ...this and use Pmode throughout. (tie_ld64): Use DImode throughout. (tie_add32): Merge into... (tie_add64): Likewise. (tie_add): ...this and use Pmode throughout. (tle_hix22_sp32): Merge into... (tle_hix22_sp64): Likewise. (tle_hix22): ...this and use Pmode throughout. (tle_lox22_sp32): Merge into... (tle_lox22_sp64): Likewise. (tle_lox22): ...this and use Pmode throughout. (*tldo_ldub_sp32): Merge into... (*tldo_ldub_sp64): Likewise. (*tldo_ldub): ...this and use Pmode throughout. (*tldo_ldub1_sp32): Merge into... (*tldo_ldub1_sp64): Likewise. (*tldo_ldub1): ...this and use Pmode throughout. 
(*tldo_ldub2_sp32): Merge into... (*tldo_ldub2_sp64): Likewise. (*tldo_ldub2): ...this and use Pmode throughout. (*tldo_ldsb1_sp32): Merge into... (*tldo_ldsb1_sp64): Likewise. (*tldo_ldsb1): ...this and use Pmode throughout. (*tldo_ldsb2_sp32): Merge into... (*tldo_ldsb2_sp64): Likewise. (*tldo_ldsb2): ...this and use Pmode throughout. (*tldo_ldub3_sp64): Use DImode throughout. (*tldo_ldsb3_sp64): Likewise. (*tldo_lduh_sp32): Merge into... (*tldo_lduh_sp64): Likewise. (*tldo_lduh): ...this and use Pmode throughout. (*tldo_lduh1_sp32): Merge into... (*tldo_lduh1_sp64): Likewise. (*tldo_lduh1): ...this and use Pmode throughout. (*tldo_ldsh1_sp32): Merge into... (*tldo_ldsh1_sp64): Likewise. (*tldo_ldsh1): ...this and use Pmode throughout. (*tldo_lduh2_sp64): Use DImode throughout. (*tldo_ldsh2_sp64): Likewise. (*tldo_lduw_sp32): Merge into... (*tldo_lduw_sp64): Likewise. (*tldo_lduw): ...this and use Pmode throughout. (*tldo_lduw1_sp64): Use DImode throughout. (*tldo_ldsw1_sp64): Likewise. (*tldo_ldx_sp64): Likewise. (*tldo_stb_sp32): Merge into... (*tldo_stb_sp64): Likewise. (*tldo_stb): ...this and use Pmode throughout. (*tldo_sth_sp32): Merge into... (*tldo_sth_sp64): Likewise. (*tldo_sth): ...this and use Pmode throughout. (*tldo_stw_sp32): Merge into... (*tldo_stw_sp64): Likewise. (*tldo_stw): ...this and use Pmode throughout. (*tldo_stx_sp64): Use DImode throughout. 
Added: branches/gcc-7-branch/gcc/testsuite/gcc.target/sparc/tls-ld-int16.c - copied unchanged from r267771, trunk/gcc/testsuite/gcc.target/sparc/tls-ld-int16.c branches/gcc-7-branch/gcc/testsuite/gcc.target/sparc/tls-ld-int32.c - copied unchanged from r267771, trunk/gcc/testsuite/gcc.target/sparc/tls-ld-int32.c branches/gcc-7-branch/gcc/testsuite/gcc.target/sparc/tls-ld-int64.c - copied unchanged from r267771, trunk/gcc/testsuite/gcc.target/sparc/tls-ld-int64.c branches/gcc-7-branch/gcc/testsuite/gcc.target/sparc/tls-ld-int8.c - copied unchanged from r267771, trunk/gcc/testsuite/gcc.target/sparc/tls-ld-int8.c branches/gcc-7-branch/gcc/testsuite/gcc.target/sparc/tls-ld-uint16.c - copied unchanged from r267771, trunk/gcc/testsuite/gcc.target/sparc/tls-ld-uint16.c branches/gcc-7-branch/gcc/testsuite/gcc.target/sparc/tls-ld-uint32.c - copied unc
[Bug target/84010] problematic TLS code generation on 64-bit SPARC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84010 --- Comment #17 from James Clarke --- Ah, great, thanks, that's indeed a nicer way of writing the patterns.
[Bug target/84010] problematic TLS code generation on 64-bit SPARC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84010 Eric Botcazou changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #18 from Eric Botcazou --- Fixed at last in upcoming 7.5, 8.3 and 9.x releases.
[Bug target/84010] problematic TLS code generation on 64-bit SPARC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84010 --- Comment #19 from Eric Botcazou --- > Ah, great, thanks, that's indeed a nicer way of writing the patterns. You're welcome. Don't hesitate to ping next time I drop the ball for so long.
[Bug target/84010] problematic TLS code generation on 64-bit SPARC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84010 --- Comment #20 from James Clarke --- (In reply to Eric Botcazou from comment #19) > > Ah, great, thanks, that's indeed a nicer way of writing the patterns. > > You're welcome. Don't hesitate to ping next time I drop the ball for so > long. I had forgotten myself that a fix was never committed, probably because I remembered writing the patch, otherwise I would have pinged it long ago!
[Bug c/88769] Call to sin() optimized away, disregarding possible side-effect (errno)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88769 Eric Gallager changed: What|Removed |Added Status|NEW |RESOLVED CC||egallager at gcc dot gnu.org Resolution|--- |DUPLICATE --- Comment #3 from Eric Gallager --- dup of bug 80042 *** This bug has been marked as a duplicate of bug 80042 ***
[Bug middle-end/80042] gcc thinks sin/cos don't set errno
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80042 Eric Gallager changed: What|Removed |Added CC||per at pz dot se --- Comment #6 from Eric Gallager --- *** Bug 88769 has been marked as a duplicate of this bug. ***
[Bug middle-end/87836] ICE in cc1 for gcc-6.5.0 with SPARC hardware
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87836 --- Comment #30 from Gary Mills --- A build of gcc-7 on SPARC just completed successfully with a much larger configuration: $ /export/home/mills/Downloads/code/oi-userland/components/developer/gcc-7/gcc-7.3.0/configure CC=/usr/gcc/4.9/bin/gcc CXX=/usr/gcc/4.9/bin/g++ F77=/usr/gcc/4.9/bin/gfortran FC=/usr/gcc/4.9/bin/gfortran CFLAGS=-O2 -mno-app-regs LDFLAGS=-m32 PKG_CONFIG_PATH=/usr/lib/pkgconfig --prefix=/usr/gcc/7 --mandir=/usr/gcc/7/share/man --bindir=/usr/gcc/7/bin --sbindir=/usr/gcc/7/bin --libdir=/usr/gcc/7/lib --libexecdir=/usr/gcc/7/lib --with-pkgversion=OpenIndiana 7.3.0-OI-0 --with-bugurl=https://bugs.openindiana.org --enable-languages=c,c++ --without-gnu-ld --with-ld=/usr/bin/ld --without-gnu-as --with-as=/usr/bin/as LDFLAGS=-R/usr/gcc/7/lib There still was no ICE. I'm going to try an even larger configuration next in an attempt to identify which configuration setting causes the ICE. I'm suspicious of these three: CONFIGURE_OPTIONS+= --host $(GNU_ARCH) CONFIGURE_OPTIONS+= --build $(GNU_ARCH) CONFIGURE_OPTIONS+= --target $(GNU_ARCH) which are part of the OI Makefile. Note the missing equal signs (=). I only noticed these a few days ago. I'll include these at the very end of my testing.
[Bug bootstrap/88450] [9 regression] ICE in stage 2 compiler while configuring libgcc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88450 --- Comment #17 from Jakub Jelinek --- Though, the more I look at it, the more I favor reverting the patch and dealing with it in the assign_stack_local caller that needs it.
[Bug bootstrap/88450] [9 regression] ICE in stage 2 compiler while configuring libgcc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88450 --- Comment #18 from Jakub Jelinek --- Created attachment 45391 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45391&action=edit gcc9-pr88450.patch Untested patch that does that.
[Bug middle-end/86979] [9 Regression] ICE: in maybe_record_trace_start, at dwarf2cfi.c:2348 with -m32 on darwin
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86979 --- Comment #10 from Alexander Monakov --- As discussed with Andrew offline, the real problem is creating a path where the stack pointer is decremented twice. That is really not supposed to happen (so the issue could appear even in the absence of REG_ARGS_SIZE notes). We'll be having another look to find the root cause.
[Bug libgcc/88772] Exception handling configured mode does not match the one finally used
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88772 Eric Botcazou changed: What|Removed |Added Status|UNCONFIRMED |WAITING Last reconfirmed||2019-01-09 CC||ebotcazou at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #1 from Eric Botcazou --- What's the result of the configure check of libgcc for SJLJ? It should be visible in the config.log file in the libgcc build directory: whether the compiler is configured for setjmp/longjmp exceptions...
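The configure check above distinguishes the setjmp/longjmp (SJLJ) exception model from table-driven unwinding. The following is only a conceptual sketch of the SJLJ idea in plain C, not libgcc's implementation. A "try" registers a jump buffer, and a "throw" longjmps back to it. The names `current_handler`, `throw_error`, `checked_div` and `try_div` are hypothetical.

```c
#include <setjmp.h>

static jmp_buf *current_handler;    /* innermost active "try" */

static void throw_error(int code)
{
  longjmp(*current_handler, code);  /* code must be nonzero */
}

static int checked_div(int a, int b)
{
  if (b == 0)
    throw_error(1);
  return a / b;
}

/* Return a / b, or fallback if the division "throws". */
int try_div(int a, int b, int fallback)
{
  jmp_buf env;
  jmp_buf *saved = current_handler;
  volatile int result = fallback;   /* volatile: survives longjmp */

  current_handler = &env;           /* push the handler */
  if (setjmp(env) == 0)
    result = checked_div(a, b);     /* normal path */
  /* Otherwise we arrived via longjmp and keep the fallback value. */
  current_handler = saved;          /* pop the handler */
  return result;
}
```

The cost profile is what distinguishes the models: SJLJ pays for `setjmp` on every protected region even when nothing is thrown, whereas table-driven unwinding is essentially free until a throw happens.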
[Bug demangler/88539] A memory leak issue was discovered in cplus-dem.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88539 Nick Clifton changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED CC||nickc at gcc dot gnu.org Resolution|--- |WONTFIX --- Comment #3 from Nick Clifton --- Sorry, but a leak of 10 bytes is just not serious enough to be worth worrying about. Especially when these programs do not run continuously but instead terminate shortly after they are invoked.
[Bug c++/88572] error: braces around scalar initializer - should be a warning
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88572 --- Comment #13 from Will Wray --- Re-reviewing, I notice that the patch I posted in comment #9 now rejects nested empty-brace scalar init: int i{{}}; which was previously accepted. So we'll need a decision on this too. Clang rejects with -pedantic-errors or warns otherwise: pedantic error / warning: too many braces around scalar initializer MSVC rejects: error: 'initializing': cannot convert from 'initializer list' to 'int' note: Too many braces around initializer for 'int' I reckon that Clang is right to reject under -pedantic and otherwise accept with a warning. This Quora post comes to a similar conclusion: https://www.quora.com/Is-double-braced-scalar-initialization-allowed-by-the-C-standard-int-x : "accepting {{}} for int seems like a harmless language extension."
[Bug tree-optimization/88771] [9 Regression] Misleading -Werror=array-bounds error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88771 --- Comment #4 from Martin Sebor --- The warning is triggered by the excessive size argument in the strncpy call. The excessive size makes the call invalid regardless of the values of the two pointer arguments. This happens both with the reduced test case in comment #0 and with the translation unit and -m32. The warning code just looks at the call: __builtin_strncpy (_65, buf_30, 4294967295); I don't see much that the warning code alone can do to handle this case. We have talked about at least two approaches to dealing with these invalid calls earlier. Jeff's preference is to replace them with traps. Others have suggested replacing them with __builtin_unreachable().
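The following arithmetic shows why 4294967295 is "excessive" under -m32. It assumes an ILP32 target, where `size_t` is 32 bits so the constant equals `SIZE_MAX`. It also assumes GCC's usual limit of `PTRDIFF_MAX` as the largest valid object size. The macro and function names are hypothetical and model only that comparison, not GCC's warning code.

```c
#include <stdint.h>

/* On ILP32, size_t and ptrdiff_t are 32 bits wide. */
#define ILP32_SIZE_MAX    UINT32_MAX   /* 4294967295 */
#define ILP32_PTRDIFF_MAX INT32_MAX    /* 2147483647 */

/* Nonzero if a size argument cannot describe any valid object. */
int size_is_excessive(uint64_t n)
{
  return n > (uint64_t) ILP32_PTRDIFF_MAX;
}
```

Any bound above 2147483647 fails this test, so the call is invalid no matter what the pointer arguments are. That is consistent with the observation that the warning needs nothing beyond the call itself.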
[Bug tree-optimization/88771] [9 Regression] Misleading -Werror=array-bounds error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88771 --- Comment #5 from Martin Sebor --- That said, the size range in the warning output is wrong. It should be just 4294967295. The warning should probably also be changed to -Wstringop-overflow which diagnoses both out-of-bounds writes and reads. I can look into that.
[Bug c/88766] [9 Regression] Rejects valid? C code since r259641
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88766 --- Comment #2 from joseph at codesourcery dot com --- Yes, I think that (a) a statement expression is not an lvalue and (b) if it were (or if the code were changed to move the unary '&' inside the statement expression), the code would be taking the address of an object whose lifetime had ended by the time that address is used.
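This GNU C sketch contrasts the two cases in the comment. It is hypothetical and is not the test case from the PR. The value of a statement expression is copied out as an rvalue, so using it is fine. Taking its address, or returning the address of an object declared inside it, is not.

```c
/* GNU C statement expressions: the last expression statement gives
   the value of the whole construct; objects declared inside it die at
   the closing brace. */
int stmt_expr_value(void)
{
  int v = ({ int tmp = 40; tmp + 2; });   /* OK: the value is copied */

  /* Problematic forms, left commented out:
       int *p = &({ int tmp = 40; tmp; });   not an lvalue
       int *q = ({ int tmp = 40; &tmp; });   points at a dead object */
  return v;
}
```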
[Bug tree-optimization/88763] Better Output for Loop Unswitching
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88763 --- Comment #4 from David Malcolm --- (In reply to Marius Messerschmidt from comment #3) > Sorry but I do not fully understand what you mean. Do you suggest using > different command line arguments? I believe Richard is referring to the internal API used for dumping; right now it's presumably just writing to a FILE *, and this doesn't show up for -fopt-info*. > So far I tried: > > -fdump-tree-all > -fdump-tree-unswitch > > and > > -fopt-info-all-optall > > But none of them told me all the things that I would wish to know, most > importantly the reason why a particular loop was skipped during unswitching > (e.g. because it is not invariant or so (right now it already reports a few > things with -fdump-tree-unswitch like too-many-instructions or > too-many-branches)) Am taking a look.
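For readers unfamiliar with the transformation under discussion, here is a hand-written, source-level picture of loop unswitching. It is an illustration only: GCC performs the transformation on GIMPLE, and its size limits (the too-many-instructions / too-many-branches checks mentioned above) are not modeled. The function names are hypothetical.

```c
#include <stddef.h>

/* Before: the test on 'scale' is loop-invariant but is evaluated on
   every iteration. */
void scale_orig(int *a, size_t n, int scale, int k)
{
  for (size_t i = 0; i < n; i++)
    {
      if (scale)
        a[i] *= k;
      else
        a[i] += k;
    }
}

/* After unswitching: the invariant test is hoisted out and the loop
   body is duplicated, one copy per outcome.  A condition that depends
   on the loop (for example on a[i]) could not be hoisted this way,
   which is one reason a loop gets skipped. */
void scale_unswitched(int *a, size_t n, int scale, int k)
{
  if (scale)
    for (size_t i = 0; i < n; i++)
      a[i] *= k;
  else
    for (size_t i = 0; i < n; i++)
      a[i] += k;
}
```

The duplication is also why unswitching has size limits: each hoisted condition roughly doubles the loop's code.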
[Bug libgcc/88772] Exception handling configured mode does not match the one finally used
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88772 --- Comment #2 from Andoni --- (In reply to Eric Botcazou from comment #1) > What's the result of the configure check of libgcc for SJLJ? It should be > visible in the config.log file in the libgcc build directory: > > whether the compiler is configured for setjmp/longjmp exceptions... I just wiped the build to start a clean build from scratch, but I remember checking this and it was "no". I can confirm it in ~1 hour.