[Bug c/69616] New: optimization of 8 movb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69616

Bug ID: 69616
Summary: optimization of 8 movb
Product: gcc
Version: 5.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: izaberina at gmail dot com
Target Milestone: ---

I'm on Arch Linux on x86_64, using gcc 5.3.0. From this code:

    char tape[65536];

    void f()
    {
        tape[0] = 0;
        tape[1] = 0;
        tape[2] = 0;
        tape[3] = 0;
        tape[4] = 0;
        tape[5] = 0;
        tape[6] = 0;
        tape[7] = 0;
    }

gcc produces 8 movb at any -O level, while clang produces 1 movq. Why is that not being optimized?

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-unknown-linux-gnu/5.3.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: /build/gcc/src/gcc-5.3.0/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-shared --enable-threads=posix --enable-libmpx --with-system-zlib --with-isl --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch --disable-libssp --enable-gnu-unique-object --enable-linker-build-id --enable-lto --enable-plugin --enable-install-libiberty --with-linker-hash-style=gnu --enable-gnu-indirect-function --disable-multilib --disable-werror --enable-checking=release
Thread model: posix
gcc version 5.3.0 (GCC)
[Bug target/69616] optimization of 8 movb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69616 Markus Trippelsdorf changed: What|Removed |Added Target||x86_64-*-*, i?86-*-* Status|UNCONFIRMED |NEW Last reconfirmed||2016-02-02 CC||trippels at gcc dot gnu.org Component|c |target Ever confirmed|0 |1 --- Comment #1 from Markus Trippelsdorf --- https://goo.gl/k9lDZQ
[Bug target/69577] [5/6 Regression] wrong code with -fno-forward-propagate -mavx and 128bit arithmetics since r215450
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69577 --- Comment #7 from rsandifo at gcc dot gnu.org --- (In reply to Uroš Bizjak from comment #6) > IMO, we should revert r215450, and fix a couple of cases using narrowing > conversions with gen_lowpart that were introduced after r215450. Please give me a few days to look at it first. I still think r215450 is correct and reverting it is likely to regress code quality.
[Bug target/69616] optimization of 8 movb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69616 Richard Biener changed: What|Removed |Added Keywords||missed-optimization --- Comment #2 from Richard Biener --- Generic BB vectorization could do this, I have partial patches to enable it. And we can combine stores on RTL. We have several duplicate bugreports here.
[Bug target/69532] FAIL: gcc.target/arm/{vect-,}fmaxmin.c execution test on armv7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69532 --- Comment #4 from david.sherwood at arm dot com --- (In reply to vries from comment #3) > Also for the non-vect version: > ... > FAIL: gcc.target/arm/fmaxmin.c execution test > ... Hi, if you are not already fixing this, I can take a look if you want?
[Bug tree-optimization/69615] 0 to limit signed range checks don't always use unsigned compare
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69615 Richard Biener changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2016-02-02 CC||rguenth at gcc dot gnu.org Component|rtl-optimization|tree-optimization Ever confirmed|0 |1 --- Comment #1 from Richard Biener --- GCC relies on some fold() routine to do this and we end up with ;; Function r0_to_imax_2 (null) ;; enabled by -tree-original { if ((unsigned int) x <= 2147483645) { ext (); } } ;; Function r0_to_imax_1 (null) ;; enabled by -tree-original { if (x >= 0 && x != 2147483647) { ext (); } } ;; Function r0_to_imax (null) ;; enabled by -tree-original { if (x >= 0) { ext (); } so it seems we are confused by the trick that triggers first, replacing the <= INT_MAX-1 compare with a != INT_MAX compare and that not being handled in the range construction code. Looks like ifcombine doesn't handle it either (maybe_fold_and_comparisons). void ext(void); void r0_to_imax_1(int x){ if (x>=0 && x<=(__INT_MAX__-1)) ext(); }
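The rewrite this comment refers to can be sketched in plain C. This is only an illustration, assuming a 32-bit two's-complement int; the names in_range_signed, in_range_unsigned and check_equivalent are made up here and do not appear in the report. A negative x converts to an unsigned value above INT_MAX, so one unsigned compare covers both signed tests.

```c
#include <assert.h>
#include <limits.h>

/* Signed form from the test case: two comparisons. */
int in_range_signed(int x)
{
    return x >= 0 && x <= INT_MAX - 1;
}

/* Unsigned form: a negative x becomes a value above INT_MAX after the
   conversion, so a single compare rejects it as well. */
int in_range_unsigned(int x)
{
    return (unsigned int) x <= (unsigned int) INT_MAX - 1u;
}

/* Hypothetical helper: aborts if the two forms disagree for x. */
void check_equivalent(int x)
{
    assert(in_range_signed(x) == in_range_unsigned(x));
}
```

Calling check_equivalent on boundary values such as INT_MIN, -1, 0, INT_MAX - 1 and INT_MAX is a quick way to convince oneself that the two forms agree.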
[Bug target/69614] [6 Regression] wrong code with -Os -fno-expensive-optimizations -fschedule-insns -mtpcs-leaf-frame -fira-algorithm=priority @ armv7a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69614 Richard Biener changed: What|Removed |Added Target Milestone|--- |6.0
[Bug rtl-optimization/69606] [5/6 Regression] wrong code at -Os and above on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69606 Richard Biener changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org --- Comment #2 from Richard Biener --- Mine.
[Bug tree-optimization/69615] 0 to limit signed range checks don't always use unsigned compare
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69615 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #2 from Jakub Jelinek --- Shouldn't be hard to teach the range code in reassoc about this, but stage1 material.
[Bug target/69617] New: PowerPC/e6500: Atomic byte/halfword operations not properly supported
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69617

Bug ID: 69617
Summary: PowerPC/e6500: Atomic byte/halfword operations not properly supported
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: sebastian.hu...@embedded-brains.de
Target Milestone: ---

The PowerPC/e6500 support lacks support for the load/store byte/halfword with decoration indexed instructions:

    #include <stdatomic.h>

    unsigned char inc_uchar(atomic_uchar *a)
    {
        return atomic_fetch_add(a, 1);
    }

    unsigned short inc_ushort(atomic_ushort *a)
    {
        return atomic_fetch_add(a, 1);
    }

powerpc-rtems4.12-gcc -O2 -Wall -Wextra -pedantic -mcpu=e6500 -m32 -S atomic.c
cat atomic.s

        .file "atomic.c"
        .machine power4
        .section ".text"
        .align 2
        .p2align 4,,15
        .globl inc_uchar
        .type inc_uchar, @function
    inc_uchar:
        rlwinm 8,3,3,27,28
        li 7,255
        xori 8,8,0x18
        li 6,1
        sync
        slw 7,7,8
        slw 6,6,8
        rlwinm 9,3,0,0,29
    .L2:
        lwarx 3,0,9
        add 5,3,6
        andc 10,3,7
        and 5,5,7
        or 10,10,5
        stwcx. 10,0,9
        bne- 0,.L2
        isync
        srw 3,3,8
        rlwinm 3,3,0,0xff
        blr
        .size inc_uchar, .-inc_uchar
        .align 2
        .p2align 4,,15
        .globl inc_ushort
        .type inc_ushort, @function
    inc_ushort:
        rlwinm 8,3,3,27,27
        li 7,0
        xori 8,8,0x10
        ori 7,7,0x
        li 6,1
        sync
        slw 7,7,8
        slw 6,6,8
        rlwinm 9,3,0,0,29
    .L6:
        lwarx 3,0,9
        add 5,3,6
        andc 10,3,7
        and 5,5,7
        or 10,10,5
        stwcx. 10,0,9
        bne- 0,.L6
        isync
        srw 3,3,8
        rlwinm 3,3,0,0x
        blr
        .size inc_ushort, .-inc_ushort
        .ident "GCC: (GNU) 6.0.0 20160202 (experimental)"
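The generated assembly emulates the narrow atomic with a word-sized lwarx/stwcx. retry loop plus masking. A rough C-level analogue of that idea is sketched below, using a compare-exchange loop in place of load-reserve/store-conditional. The function name fetch_add_byte_lane and the "lane" parameter are assumptions for illustration only; lanes are counted from the least significant byte, so the endianness-dependent mapping from a byte address to a lane is deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Atomically add `value` to one byte lane of *word and return the old byte.
   `lane` counts from the least significant byte (0..3). */
uint8_t fetch_add_byte_lane(_Atomic uint32_t *word, unsigned lane, uint8_t value)
{
    assert(lane < 4);
    unsigned shift = lane * 8;
    uint32_t mask = (uint32_t) 0xff << shift;
    uint32_t old = atomic_load(word);
    uint32_t desired;

    do {
        /* Add inside the lane only; the mask drops any carry that would
           otherwise spill into the neighbouring byte. */
        uint32_t sum = (old + ((uint32_t) value << shift)) & mask;
        desired = (old & ~mask) | sum;
        /* On failure, `old` is refreshed with the current word and we retry. */
    } while (!atomic_compare_exchange_weak(word, &old, desired));

    return (uint8_t) (old >> shift);
}
```

Native byte and halfword atomics would make both the masking and the extra shifts unnecessary, which is the point of the report.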
[Bug target/69616] optimization of 8 movb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69616 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #3 from Jakub Jelinek --- Like PR22141. Note that we probably want to deal with this at the GIMPLE level instead of RTL, that is just too late, and handle bitfields and other adjacent memory operations in there shortly before expansion, then at least for bitfields go through some GIMPLE passes (ccp, forwprop?) to clean that up.
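For readers who want the merged store today, a hand-written equivalent can be sketched as follows. This is only an illustration: clear8_bytes mirrors the reporter's f(), while clear8_merged and first_bytes_zero are hypothetical names. A fixed-size memset of 8 bytes is commonly lowered to a single 64-bit store by optimizing compilers, but that is an expectation to check in the generated assembly, not a guarantee.

```c
#include <assert.h>
#include <string.h>

char tape[65536];

/* Eight separate byte stores, as in the report. */
void clear8_bytes(void)
{
    tape[0] = 0;
    tape[1] = 0;
    tape[2] = 0;
    tape[3] = 0;
    tape[4] = 0;
    tape[5] = 0;
    tape[6] = 0;
    tape[7] = 0;
}

/* Hand-merged variant: one call that clears the same eight bytes. */
void clear8_merged(void)
{
    memset(tape, 0, 8);
}

/* Returns 1 if the first n bytes of tape are all zero, 0 otherwise. */
int first_bytes_zero(size_t n)
{
    assert(n <= sizeof tape);
    for (size_t i = 0; i < n; i++)
        if (tape[i] != 0)
            return 0;
    return 1;
}
```

Both functions leave tape in the same state, so the difference is purely in the code the compiler emits.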
[Bug target/69532] FAIL: gcc.target/arm/{vect-,}fmaxmin.c execution test on armv7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69532 --- Comment #5 from vries at gcc dot gnu.org --- (In reply to david.sherwood from comment #4) > (In reply to vries from comment #3) > > Also for the non-vect version: > > ... > > FAIL: gcc.target/arm/fmaxmin.c execution test > > ... > > Hi, if you are not already fixing this, I can take a look if you want? Please do :) Thanks, - Tom
[Bug tree-optimization/67921] [6 Regression] "internal compiler error: in build_polynomial_chrec, at tree-chrec.h:147" when using -fsanitize=undefined
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67921 amker at gcc dot gnu.org changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #11 from amker at gcc dot gnu.org --- Should be fixed.
[Bug rtl-optimization/69609] [6 Regression] block reordering consumes an inordinate amount of time, REE consumes much memory
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69609 Richard Biener changed: What|Removed |Added Status|UNCONFIRMED |NEW Keywords||compile-time-hog, ||memory-hog Last reconfirmed||2016-02-02 Component|c |rtl-optimization CC||rguenth at gcc dot gnu.org Ever confirmed|0 |1 Summary|block reordering consumes |[6 Regression] block |an inordinate amount of |reordering consumes an |time|inordinate amount of time, ||REE consumes much memory Target Milestone|--- |6.0 --- Comment #1 from Richard Biener --- Also takes a lot of memory with GCC 4.9 at least (killed at 2Gb). GCC 5 seems to peak at ~1.6GB. Note that with this kind of generated code I _always_ recommend -O1. -O2 simply has too many quadraticnesses. This case is a load of indirect jumps and loops and thus a very twisted CFG. I'd say the number of BBs hits the BB reordering slowness. First memory peak is for RTL pre (400MB), then rest_of_handle_ud_dce (800MB), then REE (on trunk 2.1GB!). All basically DF issues. It looks like we leak somewhere as well. Thus first tracking the memory-use regression towards GCC 5 at -O2. -O1 behaves reasonably (300MB memory use, 30sec compile-time for GCC 6), only out-lier is df live&initialized regs: 11.51 (39%) usr 0.08 ( 9%) sys 11.47 (37%) wall 0 kB ( 0%) ggc well, we know DF is slow and memory hungry.
[Bug libgomp/69597] execution failure for libgomp.oacc-c-c++-common/atomic_capture-1.c with -flto
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69597 --- Comment #2 from Richard Biener --- OACC uses IPA PTA unconditionally, right?
[Bug target/69618] New: PowerPC/e6500: Atomic fence operations not properly supported
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69618

Bug ID: 69618
Summary: PowerPC/e6500: Atomic fence operations not properly supported
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: sebastian.hu...@embedded-brains.de
Target Milestone: ---

On the PowerPC/e6500, elemental synchronization operations should be used for acquire/release barriers. Currently a lwsync is used instead:

    #include <stdatomic.h>

    void release(void)
    {
        atomic_thread_fence(memory_order_release);
    }

    void acquire(void)
    {
        atomic_thread_fence(memory_order_acquire);
    }

powerpc-rtems4.12-gcc -O2 -Wall -Wextra -pedantic -mcpu=e6500 -m32 -S fence.c
cat fence.s

        .file "fence.c"
        .machine power4
        .section ".text"
        .align 2
        .p2align 4,,15
        .globl release
        .type release, @function
    release:
        lwsync
        blr
        .size release, .-release
        .align 2
        .p2align 4,,15
        .globl acquire
        .type acquire, @function
    acquire:
        lwsync
        blr
        .size acquire, .-acquire
        .ident "GCC: (GNU) 6.0.0 20160202 (experimental)"

See also e6500 Core Reference Manual, 5.5.5.2.1 (Simplified memory barrier recommendations) and EREF: A Programmer’s Reference Manual for Freescale Power Architecture Processors (06/2014), 7.4.8.3 (Forcing Load and Store Ordering (Memory Barriers)). For acquire semantics, an "ESYNC 12" instruction should be used (load-load and load-store barrier), and for release semantics an "ESYNC 5" instruction (store-store and load-store barrier). See also Memory Model Rationales, section "Why ordering constraints are never limited to loads or stores" (www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2176.html).

In contrast to the general PowerPC architecture, the e6500 core honours load-store ordering (EREF, 7.4.8.2 (Architecture Ordering Requirements), number 4), so maybe an "ESYNC 8" instruction (load-load barrier) is sufficient for acquire semantics and an "ESYNC 1" instruction (store-store barrier) for release semantics. See EREF, 7.4.8.4 (Architectural Memory Access Ordering), 7.4.8.6.1 (Acquire Lock and Import Shared Memory) and 7.4.8.7.1 (Export Shared Memory and Release Lock).
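To show where such fences typically sit in source code, here is a small message-passing sketch in C11. It is an illustration only: publish, try_consume, payload and ready are hypothetical names, and nothing here is specific to the e6500. Which machine instruction each fence becomes is exactly what the report is about.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;        /* plain data handed from writer to reader */
static atomic_int ready;   /* flag, accessed only with relaxed atomics */

/* Writer side: the release fence orders the payload store before the flag store. */
void publish(int value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader side: returns 1 and copies the payload to *out once it is published,
   otherwise returns 0. The acquire fence orders the flag load before the
   payload load. */
int try_consume(int *out)
{
    assert(out != NULL);
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return 0;
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return 1;
}
```

In a real program publish and try_consume would run on different threads; the single-threaded call sequence publish(42) followed by try_consume(&v) only exercises the control flow.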
[Bug libgomp/69597] execution failure for libgomp.oacc-c-c++-common/atomic_capture-1.c with -flto
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69597 --- Comment #3 from vries at gcc dot gnu.org --- (In reply to Richard Biener from comment #2) > OACC uses IPA PTA unconditionally, right? It uses it by default. I think -fno-ipa-pta should work as expected.
[Bug target/69614] [6 Regression] wrong code with -Os -fno-expensive-optimizations -fschedule-insns -mtpcs-leaf-frame -fira-algorithm=priority @ armv7a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69614 ktkachov at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2016-02-02 CC||ktkachov at gcc dot gnu.org Known to work||4.9.4, 5.3.1 Ever confirmed|0 |1 --- Comment #1 from ktkachov at gcc dot gnu.org --- Reproduced with the command line: -Os -fno-expensive-optimizations -fschedule-insns -mtpcs-leaf-frame -fira-algorithm=priority -march=armv7-a -mfloat-abi=hard -marm -mfpu=vfpv4 Note that you didn't specify a --with-fpu value in your gcc configuration and didn't use an -mfpu option in your command line, so gcc defaulted to the value 'vfp' for the fpu. For an architecture like armv7-a it's usually more common to use values like neon or neon-vfpv4 or vfpv4 if you don't want NEON. 'vfp' is a very old fpu level. Though this particular bug reproduces with -mfpu=vfp as well. That's just for future reference if you want a gcc targeting a more realistic armv7-a setup by default.
[Bug rtl-optimization/69606] [5/6 Regression] wrong code at -Os and above on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69606 --- Comment #3 from Richard Biener --- So before VRP2 we have : load_dst_16 = b; # RANGE [2, 65535] NONZERO 65535 _12 = (int) load_dst_16; # RANGE [0, 255] _9 = (unsigned char) load_dst_16; e = _9; # RANGE [0, 255] NONZERO 255 _11 = (int) _9; d = _12; # RANGE [0, 1] NONZERO 1 _14 = 1 % 0; c = _14; return 0; the range on _12 is bogus, it was once conditional: : a.0_4 = a; # RANGE [-128, 127] _5 = (int) a.0_4; # RANGE [-128, 127] g_6 = ~_5; b.1_7 = b; # RANGE [0, 65535] NONZERO 65535 _8 = (int) b.1_7; if (_8 > 1) goto ; else goto ; : # RANGE [0, 255] _9 = (unsigned char) b.1_7; e = _9; # RANGE [0, 255] NONZERO 255 _11 = (int) _9; # RANGE [2, 65535] NONZERO 65535 _12 = _8 | _11; d = _12; : # RANGE [-128, 127] # g_1 = PHI # RANGE [0, 1] NONZERO 1 _14 = 1 % g_1; c = _14; return 0; And it's the bswap pass breaking it: 16 bit load in target endianness found at: _12 = (int) load_dst_16; ... : a.0_4 = a; # RANGE [-128, 127] _5 = (int) a.0_4; # RANGE [-128, 127] g_6 = ~_5; load_dst_16 = b; # RANGE [2, 65535] NONZERO 65535 _12 = (int) load_dst_16; because it does if (!useless_type_conversion_p (TREE_TYPE (tgt), load_type)) { val_tmp = make_temp_ssa_name (aligned_load_type, NULL, "load_dst"); load_stmt = gimple_build_assign (val_tmp, val_expr); gimple_set_vuse (load_stmt, n->vuse); gsi_insert_before (&gsi, load_stmt, GSI_SAME_STMT); gimple_assign_set_rhs_with_ops (&gsi, NOP_EXPR, val_tmp); } else { gimple_assign_set_rhs_with_ops (&gsi, MEM_REF, val_expr); gimple_set_vuse (cur_stmt, n->vuse); } thus simply replaces the RHS and before: /* Move cur_stmt just before one of the load of the original to ensure it has the same VUSE. See PR61517 for what could go wrong. */ gsi_move_before (&gsi, &gsi_ins); gsi = gsi_for_stmt (cur_stmt); which in this case moves the computation to a different BB.
[Bug middle-end/68542] [6 Regression] 10% 481.wrf performance regression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68542 --- Comment #8 from Ilya Enkovich --- Author: ienkovich Date: Tue Feb 2 09:46:26 2016 New Revision: 233068 URL: https://gcc.gnu.org/viewcvs?rev=233068&root=gcc&view=rev Log: gcc/ 2016-02-02 Yuri Rumyantsev PR middle-end/68542 * config/i386/i386.c (ix86_expand_branch): Add support for conditional branch with vector comparison. * config/i386/sse.md (VI48_AVX): New mode iterator. (define_expand "cbranch4): Add support for conditional branch with vector comparison. * tree-vect-loop.c (optimize_mask_stores): New function. * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize has_mask_store field of vect_info. * tree-vectorizer.c (vectorize_loops): Invoke optimaze_mask_stores for vectorized loops having masked stores after vec_info destroy. * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and correspondent macros. (optimize_mask_stores): Add prototype. gcc/testsuite 2016-02-02 Yuri Rumyantsev PR middle-end/68542 * gcc.dg/vect/vect-mask-store-move-1.c: New test. * gcc.target/i386/avx2-vect-mask-store-move1.c: New test. Added: trunk/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c trunk/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.c trunk/gcc/config/i386/sse.md trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-vect-loop.c trunk/gcc/tree-vect-stmts.c trunk/gcc/tree-vectorizer.c trunk/gcc/tree-vectorizer.h
[Bug rtl-optimization/69606] [5/6 Regression] wrong code at -Os and above on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69606 --- Comment #4 from Richard Biener --- Testing Index: gcc/tree-ssa-math-opts.c === *** gcc/tree-ssa-math-opts.c(revision 233067) --- gcc/tree-ssa-math-opts.c(working copy) *** bswap_replace (gimple *cur_stmt, gimple *** 2622,2627 --- 2622,2629 /* Move cur_stmt just before one of the load of the original to ensure it has the same VUSE. See PR61517 for what could go wrong. */ + if (gimple_bb (cur_stmt) != gimple_bb (src_stmt)) + reset_flow_sensitive_info (gimple_assign_lhs (cur_stmt)); gsi_move_before (&gsi, &gsi_ins); gsi = gsi_for_stmt (cur_stmt);
[Bug tree-optimization/69619] New: [6 Regression] compilation doesn't terminate during CCMP expansion
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69619 Bug ID: 69619 Summary: [6 Regression] compilation doesn't terminate during CCMP expansion Product: gcc Version: 6.0 Status: UNCONFIRMED Keywords: compile-time-hog, memory-hog Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ktkachov at gcc dot gnu.org Target Milestone: --- Target: aarch64 Testcase: int a, b, c, d; int e[1]; void fn1 () { int *f = &d; c = 6; for (; c; c--) { b = 0; for (; b <= 5; b++) { short g = e[(b + 2) * 9 + c]; *f = *f == a && e[(b + 2) * 9 + c]; } } } Given -O3 for aarch64 GCC doesn't terminate compilation and seems to keep eating more and more memory. The testcase does contain undefined behaviour and GCC warns about it: mycrash.c: In function 'fn1': mycrash.c:13:22: warning: iteration 1 invokes undefined behavior [-Waggressive-loop-optimizations] short g = e[(b + 2) * 9 + c]; ~^ mycrash.c:11:7: note: within this loop for (; b <= 5; b++) but GCC shouldn't go into an infinite loop. At -O2 the testcase compiles instantaneously. Interrupting compilation in gdb and dumping the callstack shows that it's recursing deeply into the ccmp expansion code. After a few seconds of compilation I see a stack frame more than 400 levels deep with expand_ccmp_expr appearing periodically in there, though I don't know if that's the ccmp expand code's fault or the tree optimisers' fault. I can provide the tree dumps if needed, though they should be easy to reproduce with a recent aarch64 compiler
[Bug tree-optimization/69619] [6 Regression] compilation doesn't terminate during CCMP expansion
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69619 ktkachov at gcc dot gnu.org changed: What|Removed |Added CC||wdijkstr at arm dot com Target Milestone|--- |6.0 Known to fail||6.0
[Bug testsuite/69620] New: gcc.dg/tree-ssa/loop-19.c scan-tree-dump-times optimized fails for powerpc64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69620 Bug ID: 69620 Summary: gcc.dg/tree-ssa/loop-19.c scan-tree-dump-times optimized fails for powerpc64-linux-gnu Product: gcc Version: 4.9.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: testsuite Assignee: unassigned at gcc dot gnu.org Reporter: tiago.brusamarello at datacom dot ind.br Target Milestone: --- Hi. The test "gcc.dg/tree-ssa/loop-19.c scan-tree-dump-times optimized" is failing for gcc 4.9.1 on powerpc64-gnu-linux: Executing on host: powerpc64-unknown-linux-gnu-gcc /opt/x-tools/powerpc64-unknown-linux-gnu_v1.0/test-suite/gcc/testsuite/gcc.dg/tree-ssa/loop-19.c -fno-diagnostics-show-caret -fdiagnostics-color=never -O3 -fno-tree-loop-distribute-patterns -fno-prefetch-loop-arrays -fdump-tree-optimized -fno-common -S -o loop-19.s(timeout = 300) spawn powerpc64-unknown-linux-gnu-gcc /opt/x-tools/powerpc64-unknown-linux-gnu_v1.0/test-suite/gcc/testsuite/gcc.dg/tree-ssa/loop-19.c -fno-diagnostics-show-caret -fdiagnostics-color=never -O3 -fno-tree-loop-distribute-patterns -fno-prefetch-loop-arrays -fdump-tree-optimized -fno-common -S -o loop-19.s PASS: gcc.dg/tree-ssa/loop-19.c (test for excess errors) FAIL: gcc.dg/tree-ssa/loop-19.c scan-tree-dump-times optimized "MEM.(base: &|symbol: )a," 2 FAIL: gcc.dg/tree-ssa/loop-19.c scan-tree-dump-times optimized "MEM.(base: &|symbol: )c," 2
[Bug c/69602] [6 Regression] over-ambitious logical-op warning on EAGAIN vs EWOULDBLOCK
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69602 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #5 from Jakub Jelinek --- Even if we look through macros, I'd actually think we should warn here. Because this is actually: #define EAGAIN 11 #define EWOULDBLOCK EAGAIN extern int *__errno_location (void) __attribute__ ((__nothrow__, __leaf__, __const__)); #define errno (*__errno_location ()) int foo () { if (errno == EAGAIN || errno == EWOULDBLOCK) return 1; return 0; } and even errno.h claims that EWOULDBLOCK is EAGAIN. The warning on this started with r222408.
[Bug c++/69277] [6 Regression] ICE mangling a flexible array member
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69277 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #5 from Jakub Jelinek --- So, any progress on this? If the updated patch needs pinging, please do ping it, patches should be pinged after a week.
[Bug tree-optimization/69615] 0 to limit signed range checks don't always use unsigned compare
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69615 --- Comment #3 from Peter Cordes --- @Richard and Jakub: That's just addressing the first part of my report, the problem with x <= (INT_MAX-1), right? You may have missed the second part of the problem, since I probably buried it under too much detail with the first: In the case where the limit is variable, but can easily be proven to itself be in the range [0 .. INT_MAX-1) or much smaller: // gcc always fails to optimize this to an unsigned compare, but clang succeeds void rangecheck_var(int64_t x, int64_t lim2) { //lim2 >>= 60; lim2 &= 0xf; // let the compiler figure out the limited range of limit if (x>=0 && x
[Bug target/69577] [5/6 Regression] wrong code with -fno-forward-propagate -mavx and 128bit arithmetics since r215450
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69577 rsandifo at gcc dot gnu.org changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rsandifo at gcc dot gnu.org --- Comment #8 from rsandifo at gcc dot gnu.org --- Testing a patch.
[Bug tree-optimization/69615] 0 to limit signed range checks don't always use unsigned compare
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69615 --- Comment #4 from Jakub Jelinek --- I suppose even that is doable in the reassoc framework, or it could be done in match.pd just using the recorded value ranges, like richi has handled PR69595, or both.
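The variable-limit case under discussion can be sketched like this. The condition in the quoted report is cut off, so the `x < lim` comparison below is an assumption about what it said, and the function names are invented for illustration. Once the limit is known to be non-negative, a negative x converts to an unsigned value larger than any possible limit, so one unsigned compare is enough.

```c
#include <assert.h>
#include <stdint.h>

/* Two signed compares; the mask keeps lim in the range 0..15. */
int check_two_compares(int64_t x, int64_t lim)
{
    lim &= 0xf;
    return x >= 0 && x < lim;
}

/* One unsigned compare; valid because lim is provably non-negative here. */
int check_one_compare(int64_t x, int64_t lim)
{
    lim &= 0xf;
    assert(lim >= 0);   /* the value-range fact the optimizer would need */
    return (uint64_t) x < (uint64_t) lim;
}
```

The assert spells out the recorded value range that a match.pd or reassoc based transformation would rely on.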
[Bug target/69619] [6 Regression] compilation doesn't terminate during CCMP expansion
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69619 --- Comment #1 from ktkachov at gcc dot gnu.org --- Just increasing the size of 'e' avoids undefined behaviour. The following doesn't give a warning and still shows the bug: int a, b, c, d; int e[100]; void fn1 () { int *f = &d; c = 6; for (; c; c--) { b = 0; for (; b <= 5; b++) { short g = e[(b + 2) * 9 + c]; *f = *f == a && e[(b + 2) * 9 + c]; } } }
[Bug c++/69621] New: extern std::string used as reference template-argument does not have [abi:cxx11] tag applied
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69621 Bug ID: 69621 Summary: extern std::string used as reference template-argument does not have [abi:cxx11] tag applied Product: gcc Version: 5.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: ed at catmur dot co.uk Target Milestone: --- #include template struct S { static void f(); }; extern std::string s; void g() { S::f(); } asm emitted for g(): jmp S::f() Expected: jmp S::f() The latter will be emitted if there is a definition for the std::string in the translation unit. Broken versions: 5.1.0, 5.2.0, 5.3.0 Workaround: change S to take its template-parameter by pointer: jmp S<&(s[abi:cxx11])>::f() Original reporter: http://stackoverflow.com/questions/35151104/stdstring-as-template-parameter-and-abi-tag-in-gcc-5 Possibly related: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66971
[Bug rtl-optimization/69570] [6 Regression] if-conversion bug on i?86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69570 --- Comment #8 from Bernd Schmidt --- Looks like it can be slightly reduced, removing not executed paths. template < typename T > constexpr inline const T & min (const T &a, const T &b) { if (b < a) return b; return a; } template < typename T > constexpr inline const T & max (const T &a, const T &b) { if (a < b) return b; return a; } static inline void foo (unsigned x, unsigned y, unsigned z, double &h, double &s, double &l) { double r = x / 255.0; double g = y / 255.0; double b = z / 255.0; double m = max (r, max (g, b)); double n = min (r, min (g, b)); double d = m - n; double e = m + n; h = 0.0, s = 0.0, l = e / 2.0; if (d > 0.0) { s = l > 0.5 ? d / (2.0 - e) : d / e; if (m == g) h = (b - r) / d + 2.0; h /= 6.0; } } __attribute__ ((noinline, noclone)) void bar (unsigned x[3], double y[3]) { double h, s, l; foo (x[0], x[1], x[2], h, s, l); y[0] = h; y[1] = s; y[2] = l; } int main () { unsigned x[3] = { 0, 128, 0 }; double y[3]; bar (x, y); if (__builtin_fabs (y[0] - 0.3) > 0.001) __builtin_abort (); return 0; }
[Bug target/69622] New: compiler reordering of non-temporal (write-combining) stores produces significant performance hit
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69622

Bug ID: 69622
Summary: compiler reordering of non-temporal (write-combining) stores produces significant performance hit
Product: gcc
Version: 5.3.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: peter at cordes dot ca
Target Milestone: ---
Target: i386-linux-gnu, x86_64-linux-gnu

IDK whether to mark this as "target" or something else. Other architectures might have similar write-combining stores that are sensitive to writing whole cache-lines at once.

For background, see this SO question: http://stackoverflow.com/questions/25778302/wrong-gcc-generated-assembly-ordering-results-in-performance-hit

In an unrolled copy loop, gcc decides to emit vmovntdq stores in a different order than they appear in the source. There's no correctness issue, but the number of fill buffers is very limited (maybe each core has 10 or so?). So it's *much* better to write all of one cacheline, then all of the next cacheline. See my answer on that SO question for lots of discussion and links.

The poster of that question got a 33% speedup (from ~10.2M packets per second to ~13.3M packets per second) by putting the loads and stores in source order in the binary. (Unknown hardware and surrounding code, but presumably this loop is *the* bottleneck in his app.) Anyway, real numbers show that this isn't just a theoretical argument that some code would be better.

Compilable test-case that demonstrates the issue:

    #include
    #include

    //#define compiler_writebarrier() __asm__ __volatile__ ("")
    #define compiler_writebarrier() // empty.

    void copy_mcve(void *const destination, const void *const source, const size_t bytes)
    {
        __m256i *dst = destination;
        const __m256i *src = source;
        const __m256i *dst_endp = (destination + bytes);

        while (dst < dst_endp) {
            __m256i m0 = _mm256_load_si256( src + 0 );
            __m256i m1 = _mm256_load_si256( src + 1 );
            __m256i m2 = _mm256_load_si256( src + 2 );
            __m256i m3 = _mm256_load_si256( src + 3 );

            _mm256_stream_si256( dst+0, m0 );
            compiler_writebarrier(); // even one anywhere in the loop is enough for current gcc
            _mm256_stream_si256( dst+1, m1 );
            compiler_writebarrier();
            _mm256_stream_si256( dst+2, m2 );
            compiler_writebarrier();
            _mm256_stream_si256( dst+3, m3 );
            compiler_writebarrier();

            src += 4;
            dst += 4;
        }
    }

compiles (with the barriers defined as a no-op) to (gcc 5.3.0 -O3 -march=haswell: http://goo.gl/CwtpS7):

    copy_mcve:
        addq     %rdi, %rdx
        cmpq     %rdx, %rdi
        jnb      .L7
    .L5:
        vmovdqa  32(%rsi), %ymm2
        subq     $-128, %rdi
        subq     $-128, %rsi
        vmovdqa  -64(%rsi), %ymm1
        vmovdqa  -32(%rsi), %ymm0
        vmovdqa  -128(%rsi), %ymm3
        # If dst is aligned, the four halves of two cache lines are {A B} {C D}:
        vmovntdq %ymm2, -96(%rdi)    # B
        vmovntdq %ymm1, -64(%rdi)    # C
        vmovntdq %ymm0, -32(%rdi)    # D
        vmovntdq %ymm3, -128(%rdi)   # A
        cmpq     %rdi, %rdx
        ja       .L5
        vzeroupper
    .L7:
        ret

If the output buffer is aligned, that B C D A store ordering maximally separates the two halves of the first cache line, giving the most opportunity for partially-full fill buffers to get flushed.

Doing the +32 load first makes no sense with that placement of the pointer-increment instructions. Doing the +0 load first could save a byte of code-size by not needing a displacement byte. I'm guessing that's what one optimizer function was going for when it put the subs there, but then something else came along and re-ordered the loads. Is there something that tries to touch both cache-lines as early as possible, to trigger the loads? Assuming the buffer is 64B-aligned?

Doing the subs after the last store would save another insn byte, because one of the stores could use an empty displacement as well. That's where clang puts the pointer increments (and it keeps the loads and stores in source order). clang also uses vmovaps / vmovntps. It's probably a holdover from saving an insn byte in the non-VEX encoding of the 128b insn, but does make the output work with AVX1 instead of requiring AVX2.

Using a 2-register addressing mode for the loads could save a sub instruction inside the loop. Increment dst normally, but reference src with a 2-register addressing mode with dst and a register initialized with src-dst. (In the godbolt link, uncomment the #define ADDRESSING_MODE_HACK. With ugly enough source, gcc can be bludgeoned into making code like that. It wastes insns in the intro, though, apparently to avoid 3-component (base+index+disp) addresses.)
[Bug tree-optimization/68715] [6 Regression] ice: in harmful_stmt_in_region, at graphite-scop-detection.c:1043
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68715 ktkachov at gcc dot gnu.org changed: What|Removed |Added Last reconfirmed|2016-01-07 00:00:00 |2016-2-2 CC||ktkachov at gcc dot gnu.org --- Comment #4 from ktkachov at gcc dot gnu.org --- Seeing this on aarch64 with current trunk at -Ofast -floop-interchange with the C testcase with ISL 0.15: int a[1], c[1]; int b, d, e; void fn1 (int p1) { for (;;) ; } int fn3 () { for (; e; e++) c[e] = 2; for (; d; d--) a[d] = 8; return 0; } int fn5 (int); int fn2 () { fn3 (); } void fn4 () { fn1 (b || fn5 (fn2 ())); }
[Bug target/69619] [6 Regression] compilation doesn't terminate during CCMP expansion
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69619 --- Comment #2 from Wilco --- Changing to c = 3 generates code after a short time. The issue is recursive calls to expand_ccmp_expr during the 2 possible options tried to determine costs. That makes the algorithm exponential. A fix would be to expand the LHS and RHS of both gs0 and gs1 in expand_ccmp_expr_1 (and not in gen_ccmp_first) as these expansions will be identical no matter what order we choose to expand gs0 and gs1 in. It's not clear how easy that can be achieved using the existing interfaces though.
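The growth described here can be illustrated with a toy model. This is not GCC's ccmp code: expand_try_both_orders, expand_operands_once and the calls counter are hypothetical, and each "expansion" is reduced to a counted recursive call. Trying both orders at every nesting level doubles the work per level, while expanding the shared operands once keeps it linear.

```c
#include <assert.h>

/* Number of expansion steps performed; reset before each experiment. */
unsigned long calls;

/* Toy model of costing both orders: each level re-expands its operand twice. */
int expand_try_both_orders(int depth)
{
    assert(depth >= 0);
    calls++;
    if (depth == 0)
        return 1;
    int cost_first = expand_try_both_orders(depth - 1);   /* order gs0, gs1 */
    int cost_second = expand_try_both_orders(depth - 1);  /* order gs1, gs0 */
    return (cost_first < cost_second ? cost_first : cost_second) + 1;
}

/* Toy model of the suggested fix: expand the operand once and reuse the
   result for both candidate orders. */
int expand_operands_once(int depth)
{
    assert(depth >= 0);
    calls++;
    if (depth == 0)
        return 1;
    int cost = expand_operands_once(depth - 1);
    return cost + 1;
}
```

With a nesting depth of 10 the first variant makes 2047 calls and the second makes 11, which matches the observation that a slightly smaller test case still finishes quickly.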
[Bug c++/69623] New: CWG 1388; Invalid deduction of non-trailing template parameter pack
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69623 Bug ID: 69623 Summary: CWG 1388; Invalid deduction of non-trailing template parameter pack Product: gcc Version: 6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: colu...@gmx-topmail.de Target Milestone: --- template void f(T..., U...) {} int main() {f();} --- This shouldn't compile. Although U is deduced to the empty pack via [temp.arg.explicit]/3, T isn't, because it isn't trailing. I.e. this code should yield a deduction failure. Also see https://llvm.org/bugs/show_bug.cgi?id=26435.
[Bug c/69602] [6 Regression] over-ambitious logical-op warning on EAGAIN vs EWOULDBLOCK
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69602 --- Comment #6 from Manuel López-Ibáñez --- (In reply to Jakub Jelinek from comment #5) > Even if we look through macros, I'd actually think we should warn here. I think we should NOT look through macros. The purpose of the warning is to catch mistakes like xxx && !!xxx or 0 < A || A > 0, but if the user writes f() && g() and both functions return the same value, it is clearly not a bug, even if GCC knows that they are the same function. For the purposes of this warning (and probably several others), we should treat macros as variables and functions. Thus, (errno == EAGAIN || errno == EWOULDBLOCK) /* no warning: EAGAIN and EWOULDBLOCK are not the same macro */ (errno == EAGAIN || EAGAIN == errno) /* warn: logical 'or' of the same expressions */ Of course, this is not trivial to implement at the moment (mostly because we don't have locations for macros that expand to constants, thus even finding that something comes from a macro is not possible in most cases).
[Bug c/69602] [6 Regression] over-ambitious logical-op warning on EAGAIN vs EWOULDBLOCK
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69602 --- Comment #7 from Manuel López-Ibáñez --- (In reply to Eric Blake from comment #0) > However, as shown by the sample code below, gcc 6.0's new warning is > over-ambitious, and is likely to _cause_ rather than cure user bugs, when > uninformed users unaware that Linux has the two errno values equal dumb down > the code to silence the warning, but in the process break their code on > other platforms where it is important to check for both values. As a work-around, something like: if (0 || errno == EAGAIN || errno == EWOULDBLOCK) silences the warning (although it should not).
[Bug rtl-optimization/69606] [5 Regression] wrong code at -Os and above on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69606 Richard Biener changed: What|Removed |Added Known to work||6.0 Summary|[5/6 Regression] wrong code |[5 Regression] wrong code |at -Os and above on |at -Os and above on |x86_64-linux-gnu|x86_64-linux-gnu Known to fail||5.3.0 --- Comment #5 from Richard Biener --- Fixed on trunk so far. --- Comment #6 from Richard Biener --- Author: rguenth Date: Tue Feb 2 12:39:36 2016 New Revision: 233069 URL: https://gcc.gnu.org/viewcvs?rev=233069&root=gcc&view=rev Log: 2016-02-02 Richard Biener PR tree-optimization/69606 * tree-ssa-math-opts.c (bswap_replace): Clear flow sensitive info on the result before moving a stmt. * gcc.dg/torture/pr69606.c: New testcase. Added: trunk/gcc/testsuite/gcc.dg/torture/pr69606.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-ssa-math-opts.c
[Bug c/69602] [6 Regression] over-ambitious logical-op warning on EAGAIN vs EWOULDBLOCK
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69602 --- Comment #8 from Jakub Jelinek --- (In reply to Manuel López-Ibáñez from comment #7) > (In reply to Eric Blake from comment #0) > > However, as shown by the sample code below, gcc 6.0's new warning is > > over-ambitious, and is likely to _cause_ rather than cure user bugs, when > > uninformed users unaware that Linux has the two errno values equal dumb down > > the code to silence the warning, but in the process break their code on > > other platforms where it is important to check for both values. > > As a work-around, something like: > > if (0 || errno == EAGAIN || errno == EWOULDBLOCK) > > silences the warning (although it should not). That is something that should be fixed. But if (errno == EAGAIN || (EWOULDBLOCK != EAGAIN && errno == EWOULDBLOCK)) could be better workaround.
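The second form suggested above can be wrapped in a small helper. This is only a sketch: the name would_block is invented here, and whether a given GCC version stays quiet on it is not something this example claims.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: true if err means "try again later".
   The EWOULDBLOCK test is only meaningful on platforms where the two
   macros differ; where they are equal the second operand folds to 0. */
static bool would_block(int err)
{
    return err == EAGAIN
           || (EWOULDBLOCK != EAGAIN && err == EWOULDBLOCK);
}
```

The check stays correct on platforms where EAGAIN and EWOULDBLOCK have different values, which is the portability concern raised in comment #0.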
[Bug rtl-optimization/69307] [4.9/5/6 Regression] wrong code with -O2 -fselective-scheduling @ armv7a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69307 --- Comment #6 from Andrey Belevantsev --- Created attachment 37551 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37551&action=edit proposed patch Here before reload we're trying to rename a hard register. At the very final point of choosing the new register we forget to properly check hard_regno_nregs when checking liveness restrictions (though we did all the way to that point). Then we incorrectly choose the original register as it seems to be good enough. Fixed by looping over all registers specified in hard_regno_nregs at that place, too.
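To make the missing check concrete, here is a toy model in C. It is not the selective scheduler's code: the bitmask representation and the name regs_free are assumptions for illustration only. The point is that a value occupying several hard registers needs every one of them checked, not just the first.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model (hypothetical, not GCC internals): bit r of `live` is set
   when hard register r is live and therefore unavailable. A value in a
   wide mode occupies `nregs` consecutive registers starting at `regno`. */
static bool regs_free(uint32_t live, unsigned regno, unsigned nregs)
{
    for (unsigned r = regno; r < regno + nregs; r++) {
        if (r >= 32 || (live & (UINT32_C(1) << r)))
            return false;   /* some register in the group is taken */
    }
    return true;
}
```

With register 3 live, a one-register value fits at register 2, but a two-register value starting at 2 does not; checking only the first register would wrongly accept it.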
[Bug target/69622] compiler reordering of non-temporal (write-combining) stores produces significant performance hit
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69622 Richard Biener changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2016-02-02 Ever confirmed|0 |1 --- Comment #1 from Richard Biener --- A workaround is -fno-schedule-insns2. I suppose the compiler is trying to increase the distance of the loads and stores (in a greedy way) to reduce the impact on load latency in the general premise of moving loads up and stores down. In fact with -fno-schedule-insns2 you can see that we end up with .L5: vmovdqa 32(%rsi), %ymm2 vmovdqa 64(%rsi), %ymm1 vmovdqa 96(%rsi), %ymm0 vmovdqa (%rsi), %ymm3 vmovntdq %ymm3, (%rdi) vmovntdq %ymm2, 32(%rdi) vmovntdq %ymm1, 64(%rdi) vmovntdq %ymm0, 96(%rdi) which is because we do TER the zero-offset load (thus RTL expand it right before the store). Possibly scheduling tries to fix that up but does a miserable job.
[Bug target/67032] [4.9/5/6 Regression] Geode optimizations incorrectly return -NaN
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67032 Uroš Bizjak changed: What|Removed |Added Keywords|ra | Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com --- Comment #13 from Uroš Bizjak --- Well... at the end, it was target problem. For the RA to work as expected, we have to provide some sensible cost values for MMX and SSE register moves. Following patch fixes the problem: --cut here-- diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index b500233..121e802 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -595,17 +595,17 @@ struct processor_costs geode_cost = { {4, 6, 6}, /* cost of storing fp registers in SFmode, DFmode and XFmode */ - 1, /* cost of moving MMX register */ - {1, 1}, /* cost of loading MMX registers + 2, /* cost of moving MMX register */ + {2, 2}, /* cost of loading MMX registers in SImode and DImode */ - {1, 1}, /* cost of storing MMX registers + {2, 2}, /* cost of storing MMX registers in SImode and DImode */ - 1, /* cost of moving SSE register */ - {1, 1, 1}, /* cost of loading SSE registers + 2, /* cost of moving SSE register */ + {2, 2, 8}, /* cost of loading SSE registers in SImode, DImode and TImode */ - {1, 1, 1}, /* cost of storing SSE registers + {2, 2, 8}, /* cost of storing SSE registers in SImode, DImode and TImode */ - 1, /* MMX or SSE register to integer */ + 6, /* MMX or SSE register to integer */ 64, /* size of l1 cache. */ 128, /* size of l2 cache. */ 32, /* size of prefetch block */ --cut here--
[Bug target/69614] [6 Regression] wrong code with -Os -fno-expensive-optimizations -fschedule-insns -mtpcs-leaf-frame -fira-algorithm=priority @ armv7a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69614 --- Comment #2 from ktkachov at gcc dot gnu.org --- Bisection showed this started with r228302. But I'm not sure if that's the cause or just exposes a latent bug.
[Bug target/69619] [6 Regression] compilation doesn't terminate during CCMP expansion
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69619 --- Comment #3 from Wilco --- A simple workaround is to calculate cost1 early and only try the 2nd option if the cost is low (ie. it's not a huge expression that may evaluate into lots of ccmps). A slightly more advanced way would be to walk prep_seq_1 to count the actual number of ccmp instructions.
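The effect of the proposed guard can be seen in a toy call-count model. This is not expand_ccmp_expr: the function calls_guarded and the use of remaining depth as a stand-in for "cost" are assumptions made purely to show the growth rates.

```c
/* Toy model (hypothetical): count recursive expansions for a chain of
   `depth` comparisons. The first ordering is always tried; the second
   ordering is tried only when the remaining depth is at most `limit`,
   standing in for "only try the 2nd option if the cost is low". */
static unsigned long calls_guarded(unsigned depth, unsigned limit)
{
    if (depth == 0)
        return 1;

    unsigned long n = 1 + calls_guarded(depth - 1, limit);
    if (depth <= limit)
        n += calls_guarded(depth - 1, limit);   /* second ordering */
    return n;
}
```

With the guard effectively disabled (limit equal to depth) the count doubles at every level; with a small limit it grows linearly once the chain is longer than the limit.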
[Bug target/69614] [6 Regression] wrong code with -Os -fno-expensive-optimizations -fschedule-insns -mtpcs-leaf-frame -fira-algorithm=priority @ armv7a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69614 --- Comment #3 from ktkachov at gcc dot gnu.org --- Note that -mtpcs-leaf-frame was deprecated in GCC 5 due to a number of bugs with it: https://gcc.gnu.org/gcc-5/changes.html There are a number of known issues with these options relating to the old ABI: https://gcc.gnu.org/bugzilla/buglist.cgi?quicksearch=mapcs-frame&list_id=139237
[Bug c/69624] New: sanitize-coverage=trace-pc miscompiles kernel
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624 Bug ID: 69624 Summary: sanitize-coverage=trace-pc miscompiles kernel Product: gcc Version: 6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: jirislaby at gmail dot com Target Milestone: --- I have commit a8175057d14fa8ff8cc4589edf55a6855d9afdf4 Author: Dmitry Vyukov Date: Mon Nov 9 19:59:08 2015 +0100 new coverage that uses shared buffer applied to kernel 4.4. I am seeing crashes in netlink_bind at 0xd5dc: d5bd: 4c 89 e2 mov %r12,%rdx d5c0: e8 00 00 00 00 callq d5c5 d5c1: R_X86_64_PC32 __sw_hweight32-0x4 d5c5: 03 83 d0 02 00 00 add 0x2d0(%rbx),%eax d5cb: 48 c1 ea 03 shr $0x3,%rdx d5cf: 41 89 c5 mov %eax,%r13d d5d2: 48 b8 00 00 00 00 00 movabs $0xdc00,%rax d5d9: fc ff df d5dc: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) because rdx is 0. rdx is fetched from r12, then __sw_hweight32 is called, it zeroes rdx and (%rdx,%rax,1) dereference is then rax == 0xdc00 dereference which leads to a crash.
[Bug target/69613] [6 Regression] wrong code with -O and simple 128bit arithmetics and vectors @ aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69613 ktkachov at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2016-02-02 CC||ktkachov at gcc dot gnu.org Known to work||4.9.4, 5.3.1 Ever confirmed|0 |1 --- Comment #1 from ktkachov at gcc dot gnu.org --- Confirmed
[Bug c/69624] sanitize-coverage=trace-pc miscompiles kernel
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624 --- Comment #1 from Jiri Slaby --- Created attachment 37552 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37552&action=edit __sw_hweight32 assembly
[Bug c/69624] sanitize-coverage=trace-pc miscompiles kernel
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624 --- Comment #2 from Jiri Slaby --- Created attachment 37553 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37553&action=edit __sanitizer_cov_trace_pc implementation This guy actually changes rdx.
[Bug c/69624] sanitize-coverage=trace-pc miscompiles kernel
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624 --- Comment #3 from Jiri Slaby --- Preprocessed code: http://www.fi.muni.cz/~xslaby/sklad/af_netlink.i This one results in the code from initial description. I.e. rdx is loaded before a call.
[Bug c++/69621] extern std::string used as reference template-argument does not have [abi:cxx11] tag applied
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69621 Jonathan Wakely changed: What|Removed |Added Keywords||wrong-code Status|UNCONFIRMED |NEW Last reconfirmed||2016-02-02 Ever confirmed|0 |1
[Bug libgomp/69625] New: deadlock in libgomp.c/doacross-1.c test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69625 Bug ID: 69625 Summary: deadlock in libgomp.c/doacross-1.c test Product: gcc Version: 6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libgomp Assignee: unassigned at gcc dot gnu.org Reporter: vogt at linux dot vnet.ibm.com CC: jakub at gcc dot gnu.org Target Milestone: --- Target: s390x Created attachment 37554 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37554&action=edit .s file of test program On s390x with -march=z196 -O2/-O3 the test hangs with a deadlock (and also doacross-[2.3].c and doacross-1.C, but I haven't looked at them yet). I've stripped down the test to this: -- snip -- #include #define N 64 int b[N / 16][8][4]; int main () { int i, j, k, l; (void)l; #pragma omp parallel { printf("+++\n"); #pragma omp for schedule(static, 0) ordered (3) nowait for (i = 2; i < N / 16 - 1; i++) for (j = 0; j < 8; j += 2) for (k = 1; k <= 3; k++) { #pragma omp atomic write b[i][j][k] = 11; #pragma omp ordered depend(sink: i, j - 2, k - 1) \ depend(sink: i - 2, j - 2, k + 1) #pragma omp ordered depend(sink: i - 3, j + 2, k - 2) if (j >= 2 && k > 1) { #pragma omp atomic read l = b[i][j - 2][k - 1]; } #pragma omp atomic write b[i][j][k] = 22; if (i >= 4 && j >= 2 && k < 3) { #pragma omp atomic read l = b[i - 2][j - 2][k + 1]; } #pragma omp ordered depend(source) #pragma omp atomic write b[i][j][k] = 33; } printf("---\n"); } printf("done\n"); return 0; } -- snip -- (See attachment for full .s file.) (Running on an LPAR with 17 cores inside gdb.) The function GOMP_parallel starts threads 2 to 17 which enter and leave the parallel region (they print both "+++" and "---" then hang in a team_barrier_wait_final() call in gomp_thread_start. Only then thread 1 runs the thread function. 
gomp_team_start (fn, data, num_threads, flags, gomp_new_team (num_threads)); fn (data); Thread 1 comes across 0x8b7a <+522>: brasl %r14,0x87b0 with %r10 == 2 (which presumably contains k), then continues through 0x8cf6 <+902>: brasl %r14,0x86f0 and finally comes back to 0x8b7a <+522>: brasl %r14,0x87b0 with %r10 == 3. In GOMP_doacross_wait() it ends up calling doacross_spin() and never gets out of that again: doacross_spin (array, flattened, cur); 0x03fff7ef5562 <+282>: lg %r1,0(%r5) 0x03fff7ef5568 <+288>: clgr %r1,%r2 0x03fff7ef556c <+292>: jle 0x3fff7ef5562 The value of r1 (= *r5 (= *array?)) remains 6 (since there's no other thread left that could modify it) while the value of r2 is 0xfffb4a1. To me this looks as if doacross_spin() compares an integer value with an address or rubbish. Any ideas what's going on?
[Bug libstdc++/69626] New: [6 Regression] std::strtoll no longer defined in c++98 mode
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69626 Bug ID: 69626 Summary: [6 Regression] std::strtoll no longer defined in c++98 mode Product: gcc Version: 6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: redi at gcc dot gnu.org Target Milestone: --- for __cplusplus < 201103L bits/c++config.h does: # ifndef _GLIBCXX_USE_C99_STDLIB # define _GLIBCXX_USE_C99_STDLIB _GLIBCXX98_USE_C99_STDLIB # endif but acinclude.m4 never defines _GLIBCXX98_USE_C99_STDLIB so the C99 stdlib.h functions are no longer defined for C++98. PR 69350 says we shouldn't be defining those non-C++98 functions anyway, but removing them now was not intentional (and we probably don't want to remove them for -std=gnu++98 anyway). #include int main() { &std::strtoll; } $ g++ -std=c++98 ll.cc ll.cc: In function ‘int main()’: ll.cc:5:4: error: ‘strtoll’ is not a member of ‘std’ &std::strtoll; ^~~ ll.cc:5:4: note: suggested alternative: In file included from /home/jwakely/gcc/6/include/c++/6.0.0/cstdlib:75:0, from ll.cc:1: /usr/include/stdlib.h:209:22: note: ‘strtoll’ extern long long int strtoll (const char *__restrict __nptr, ^~~
[Bug c++/69627] New: [6 Regression] Conditional jump or move depends on uninitialised value(s) in (anonymous namespace)::layout::get_state_at_point
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69627 Bug ID: 69627 Summary: [6 Regression] Conditional jump or move depends on uninitialised value(s) in (anonymous namespace)::layout::get_state_at_point Product: gcc Version: 6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: marxin at gcc dot gnu.org Target Milestone: --- Hello. $ cat tc2.c typedef float __m128; void test_1 () { __m128 myvec[2]; int const *ptr; myvec[1]/ptr; } $ valgrind --leak-check=yes --trace-children=yes ./gcc/xgcc -B gcc tc2.c produces: ==14470== Conditional jump or move depends on uninitialised value(s) ==14470==at 0x1901E24: (anonymous namespace)::layout::get_state_at_point(int, int, int, int, (anonymous namespace)::point_state*) (diagnostic-show-locus.c:725) ==14470==by 0x1901861: (anonymous namespace)::layout::print_source_line(int, (anonymous namespace)::line_bounds*) (diagnostic-show-locus.c:561) ==14470==by 0x190210A: diagnostic_show_locus(diagnostic_context*, diagnostic_info const*) (diagnostic-show-locus.c:835) ==14470==by 0x8B50E6: c_diagnostic_finalizer(diagnostic_context*, diagnostic_info*) (c-opts.c:167) ==14470==by 0x18FEBCC: diagnostic_report_diagnostic(diagnostic_context*, diagnostic_info*) (diagnostic.c:800) ==14470==by 0x18FFF96: error_at_rich_loc(rich_location*, char const*, ...) 
(diagnostic.c:1173) ==14470==by 0x85E415: binary_op_error(rich_location*, tree_code, tree_node*, tree_node*) (c-common.c:3865) ==14470==by 0x7FF1E0: build_binary_op(unsigned int, tree_code, tree_node*, tree_node*, int) (c-typeck.c:11577) ==14470==by 0x7DF958: parser_build_binary_op(unsigned int, tree_code, c_expr, c_expr) (c-typeck.c:3515) ==14470==by 0x81D970: c_parser_binary_expression(c_parser*, c_expr*, tree_node*) (c-parser.c:6636) ==14470==by 0x81C27B: c_parser_conditional_expression(c_parser*, c_expr*, tree_node*) (c-parser.c:6279) ==14470==by 0x81BF8B: c_parser_expr_no_commas(c_parser*, c_expr*, tree_node*) (c-parser.c:6196) ==14470== ==14470== Conditional jump or move depends on uninitialised value(s) ==14470==at 0x1901E24: (anonymous namespace)::layout::get_state_at_point(int, int, int, int, (anonymous namespace)::point_state*) (diagnostic-show-locus.c:725) ==14470==by 0x190199F: (anonymous namespace)::layout::print_annotation_line(int, (anonymous namespace)::line_bounds) (diagnostic-show-locus.c:603) ==14470==by 0x1902129: diagnostic_show_locus(diagnostic_context*, diagnostic_info const*) (diagnostic-show-locus.c:837) ==14470==by 0x8B50E6: c_diagnostic_finalizer(diagnostic_context*, diagnostic_info*) (c-opts.c:167) ==14470==by 0x18FEBCC: diagnostic_report_diagnostic(diagnostic_context*, diagnostic_info*) (diagnostic.c:800) ==14470==by 0x18FFF96: error_at_rich_loc(rich_location*, char const*, ...) 
(diagnostic.c:1173) ==14470==by 0x85E415: binary_op_error(rich_location*, tree_code, tree_node*, tree_node*) (c-common.c:3865) ==14470==by 0x7FF1E0: build_binary_op(unsigned int, tree_code, tree_node*, tree_node*, int) (c-typeck.c:11577) ==14470==by 0x7DF958: parser_build_binary_op(unsigned int, tree_code, c_expr, c_expr) (c-typeck.c:3515) ==14470==by 0x81D970: c_parser_binary_expression(c_parser*, c_expr*, tree_node*) (c-parser.c:6636) ==14470==by 0x81C27B: c_parser_conditional_expression(c_parser*, c_expr*, tree_node*) (c-parser.c:6279) ==14470==by 0x81BF8B: c_parser_expr_no_commas(c_parser*, c_expr*, tree_node*) (c-parser.c:6196) Thanks, Martin
[Bug c++/69628] New: [6 Regression] Conditional jump or move depends on uninitialised value(s)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69628 Bug ID: 69628 Summary: [6 Regression] Conditional jump or move depends on uninitialised value(s) Product: gcc Version: 6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: marxin at gcc dot gnu.org Target Milestone: --- Hello. Running: $ echo "0'';" | valgrind --leak-check=yes --trace-children=yes ./gcc/xg++ -Bgcc -std=c++14 -xc++ - produces: ==14976== Conditional jump or move depends on uninitialised value(s) ==14976==at 0xB1A59D: lex_charconst(cpp_token const*) (c-lex.c:1252) ==14976==by 0xB18692: c_lex_with_flags(tree_node**, unsigned int*, unsigned char*, int) (c-lex.c:550) ==14976==by 0x92889A: cp_lexer_get_preprocessor_token(cp_lexer*, cp_token*) (parser.c:792) ==14976==by 0x928545: cp_lexer_new_main() (parser.c:656) ==14976==by 0x92BD1A: cp_parser_new() (parser.c:3687) ==14976==by 0x97B2C0: c_parse_file() (parser.c:37354) ==14976==by 0xB23FDA: c_common_parse_file() (c-opts.c:1064) ==14976==by 0x11225EE: compile_file() (toplev.c:465) ==14976==by 0x1124B96: do_compile() (toplev.c:1988) ==14976==by 0x1124E21: toplev::main(int, char**) (toplev.c:2096) ==14976==by 0x1B4DE9F: main (main.c:39) Thanks, Martin
[Bug target/69619] [6 Regression] compilation doesn't terminate during CCMP expansion
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69619 ktkachov at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2016-02-02 Ever confirmed|0 |1 --- Comment #4 from ktkachov at gcc dot gnu.org --- Confirmed then
[Bug c/69624] sanitize-coverage=trace-pc miscompiles kernel
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624 Jakub Jelinek changed: What|Removed |Added Status|UNCONFIRMED |WAITING Last reconfirmed||2016-02-02 Ever confirmed|0 |1 --- Comment #4 from Jakub Jelinek --- What gcc options are you using on the preprocessed source to trigger this?
[Bug c/69624] sanitize-coverage=trace-pc miscompiles kernel
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624 --- Comment #5 from Jiri Slaby --- (In reply to Jakub Jelinek from comment #4) > What gcc options are you using on the preprocessed source to trigger this? By default this: gcc-6 -nostdinc -fno-strict-aliasing -fno-common -std=gnu89 -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m64 -falign-jumps=1 -falign-loops=1 -mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mskip-rax-setup -mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time -maccumulate-outgoing-args -pipe -fno-delete-null-pointer-checks -O2 --param=allow-store-data-races=0 -fstack-protector -Wno-unused-but-set-variable -fno-omit-frame-pointer -fno-optimize-sibling-calls -fno-var-tracking-assignments -fasynchronous-unwind-tables -pg -mfentry -fno-inline-functions-called-once -fno-strict-overflow -fconserve-stack -fsanitize=kernel-address -fasan-shadow-offset=0xdc00 --param asan-stack=1 --param asan-globals=1 --param asan-instrumentation-with-call-threshold=1 -fsanitize-coverage=trace-pc -S -o - af_netlink.i And this simplified one produces the same around the call: gcc-6 -nostdinc -O2 -std=gnu89 -fsanitize=kernel-address -fasan-shadow-offset=0xdc00 --param asan-stack=1 --param asan-globals=1 --param asan-instrumentation-with-call-threshold=1 -fsanitize-coverage=trace-pc -S -o - af_netlink.i
[Bug c/69624] sanitize-coverage=trace-pc miscompiles kernel
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624 --- Comment #6 from Dmitry Vyukov --- Also what gcc version? I've tried: gcc version 6.0.0 20160105 (experimental) (GCC) $ gcc /tmp/af_netlink.c -c -O2 -fsanitize-coverage=trace-pc -fsanitize=kernel-address --param asan-stack=1 --param asan-globals=1 --param asan-instrumentation-with-call-threshold=1 -fasan-shadow-offset=0xdc00 (which should resemble what kernel uses), but I don't see any similar code fragments.
[Bug c/69624] sanitize-coverage=trace-pc miscompiles kernel
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624 --- Comment #7 from Jiri Slaby --- (In reply to Dmitry Vyukov from comment #6) > Also what gcc version? $ gcc-6 --version gcc-6 (SUSE Linux) 6.0.0 20160121 (experimental) [trunk revision 232670] > I've tried: > gcc version 6.0.0 20160105 (experimental) (GCC) > $ gcc /tmp/af_netlink.c -c -O2 -fsanitize-coverage=trace-pc > -fsanitize=kernel-address --param asan-stack=1 --param asan-globals=1 > --param asan-instrumentation-with-call-threshold=1 > -fasan-shadow-offset=0xdc00 With this I see that too: movq %r12, %rdx #APP # 28 "../arch/x86/include/asm/arch_hweight.h" 1 661: call __sw_hweight32 662: #...more ALTINST crap #NO_APP addl 720(%rbx), %eax shrq $3, %rdx movl %eax, %r13d movabsq $-2305847407260205056, %rax cmpb $0, (%rdx,%rax)
[Bug c/69624] sanitize-coverage=trace-pc miscompiles kernel
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624 --- Comment #8 from Dmitry Vyukov --- First of all, are you sure that r12 is not 0 before the call? Dereference of 0xdc00 is how KASAN reacts to a NULL deref: it does a shadow check before the memory accesses. If the original address is NULL, the shadow check will go to 0xdc00. I see such GPFs quite frequently, so that's what I would assume first. If you just switched to gcc6, then it can be some latent bug (undefined behavior), which started to fire with a new compiler. p.s. I can reproduce the generated code now.
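The shadow check described above amounts to a shift and an add. A minimal sketch follows, assuming the shift of 3 and the offset constant exactly as they appear in the shrq $3 and movabsq instructions quoted in comment #7; the names KASAN_SHADOW_OFFSET_EXAMPLE and kasan_shadow_addr are made up for this example and are not kernel identifiers.

```c
#include <stdint.h>

/* Offset taken from the movabsq immediate in comment #7 (assumption:
   it is the shadow offset used by that build). */
#define KASAN_SHADOW_OFFSET_EXAMPLE ((uint64_t)-2305847407260205056LL)

/* Hypothetical helper: address of the shadow byte that is checked
   before a memory access to `addr`. */
static uint64_t kasan_shadow_addr(uint64_t addr)
{
    return (addr >> 3) + KASAN_SHADOW_OFFSET_EXAMPLE;
}
```

For a NULL address the result is the offset itself, which is why a fault at that constant usually points at a NULL dereference in the instrumented code rather than at the shadow check.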
[Bug c/69624] sanitize-coverage=trace-pc miscompiles kernel
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624 --- Comment #9 from Jiri Slaby --- (In reply to Dmitry Vyukov from comment #8) > First of all, are you sure that r12 is not 0 before the call? Yes. > Deference of 0xdc00 is how KASAN reacts on NULL deref, it does > shadow check before the memory accesses. If original address is NULL, the > shadow check will go to 0xdc00. I see such GPFs quite > frequently, so that's what I would assume first. I know, I thought so first too. But later, I debugged that to a gcc bug :). > If you just switched to gcc6, then it can be some latent bug (undefined > behavior), which started to fire with a new compiler. W/ CONFIG_KCOV=n (i.e. no -fsanitize-coverage), it works, apparently.
[Bug c++/69628] [6 Regression] Conditional jump or move depends on uninitialised value(s) in lex_charconst(cpp_token const*) (c-lex.c:1252)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69628 Richard Biener changed: What|Removed |Added Target Milestone|--- |6.0
[Bug c++/69627] [6 Regression] Conditional jump or move depends on uninitialised value(s) in (anonymous namespace)::layout::get_state_at_point
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69627 Richard Biener changed: What|Removed |Added Target Milestone|--- |6.0
[Bug libstdc++/69626] [6 Regression] std::strtoll etc. no longer defined in c++98 mode
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69626 Richard Biener changed: What|Removed |Added Target Milestone|--- |6.0
[Bug c/69624] sanitize-coverage=trace-pc miscompiles kernel
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624 --- Comment #10 from Jakub Jelinek --- If you are calling a function (__sw_hweight32) without letting gcc know you do that, are you sure that function call does not modify any registers other than "flags" and "rax"?
[Bug target/69577] [5/6 Regression] wrong code with -fno-forward-propagate -mavx and 128bit arithmetics since r215450
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69577 rsandifo at gcc dot gnu.org changed: What|Removed |Added URL||https://gcc.gnu.org/ml/gcc- ||patches/2016-02/msg00127.ht ||ml --- Comment #9 from rsandifo at gcc dot gnu.org --- Patch posted as: https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00127.html Would definitely appreciate wider testing, e.g. for SPEC performance. (Assuming the patch is OK in principle of course.)
[Bug c/69624] sanitize-coverage=trace-pc miscompiles kernel
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624 --- Comment #11 from Jiri Slaby --- (In reply to Jakub Jelinek from comment #10) > If you are calling a function (__sw_hweight32) without letting gcc know you > do that, are you sure that function call does not modify any registers other > than "flags" and "rax"? Not at all, I suppose. See attachment #37552. __sw_hweight32 changes only retval (rax) and parameter (rdi). But __sw_hweight32 proper is as well instrumented by the coverage hook. And it changes whatever... In any case, what is the proper way of annotating an asm() directive when there is a call inside?
[Bug c/69624] sanitize-coverage=trace-pc miscompiles kernel
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624 --- Comment #12 from Jiri Slaby --- (In reply to Jiri Slaby from comment #11) > __sw_hweight32 changes only retval (rax) and parameter (rdi). ... and rdi is stored to and restored from stack.
[Bug c/69624] sanitize-coverage=trace-pc miscompiles kernel
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624 --- Comment #13 from Jakub Jelinek --- Seems hweight.c is compiled with -fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11 but that of course expects that all the functions in there are leaf; if they aren't leaf, you'd also need to compile with the same flags all the functions that might be called from there. So I bet you want to arrange for hweight.c to be compiled without -fsanitize-coverage=trace-pc and without other options that might introduce calls.
[Bug rtl-optimization/69570] [6 Regression] if-conversion bug on i?86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69570 Bernd Schmidt changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |INVALID --- Comment #9 from Bernd Schmidt --- Ah, of course. 804856f: df ec fucomip %st(4),%st pc 0x804856f 0x804856f st0 0.5019607843137254902230771913540508 (raw 0x3ffe8080808080808081) st1 1 (raw 0x3fff8000) st2 0.2509803921568627451115385956770254 (raw 0x3ffd8080808080808081) st3 0 (raw 0x) st4 0.5019607843137254832299731788225472 (raw 0x3ffe8080808080808000) An equality comparison of floating point numbers, on x87. One of those was just loaded from a stack slot, the other was kept in a register the whole time. This code needs -ffloat-store, or -mpc64, when compiled for 32-bit x86.
[Bug c/69624] sanitize-coverage=trace-pc miscompiles kernel
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624 --- Comment #14 from Dmitry Vyukov --- Wait, I already disabled instrumentation of hweight.c because of this: +# Kernel does not boot if we instrument this file as it uses custom calling +# convention (see CONFIG_ARCH_HWEIGHT_CFLAGS). +KCOV_INSTRUMENT_hweight.o := n If you apply the latest kcov patch "[PATCH v6] kernel: add kcov code coverage", it should work.
[Bug rtl-optimization/69570] [6 Regression] if-conversion bug on i?86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69570 --- Comment #10 from Jakub Jelinek --- (In reply to Bernd Schmidt from comment #9) > Ah, of course. > > 804856f: df ec fucomip %st(4),%st > > pc 0x804856f 0x804856f > st00.5019607843137254902230771913540508 (raw > 0x3ffe8080808080808081) > st11 (raw 0x3fff8000) > st20.2509803921568627451115385956770254 (raw > 0x3ffd8080808080808081) > st30 (raw 0x) > st40.5019607843137254832299731788225472 (raw > 0x3ffe8080808080808000) > > An equality comparison of floating point numbers, on x87. One of those was > just loaded from a stack slot, the other was kept in a register the whole > time. > > This code needs -ffloat-store, or -mpc64, when compiled for 32-bit x86. Or -fexcess-precision=standard.
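When the flags above cannot be changed, a source-level approach is to force both operands through memory at double width before comparing. The sketch below is an assumption-laden illustration, not something proposed in the bug: the name equal_as_double is invented, and it relies on volatile stores rounding each value to double, which is similar in spirit to what -ffloat-store does for variables.

```c
#include <stdbool.h>

/* Hypothetical helper: compare two values after each has been stored
   to, and reloaded from, a double-sized memory slot, so neither side
   carries x87 extended precision into the comparison. */
static bool equal_as_double(double a, double b)
{
    volatile double x = a;
    volatile double y = b;
    return x == y;
}
```

On targets that already evaluate in double (for example x86-64 with SSE2) the helper behaves like a plain ==.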
[Bug c/69624] sanitize-coverage=trace-pc miscompiles kernel
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624 --- Comment #15 from Jiri Slaby --- (In reply to Dmitry Vyukov from comment #14) > If you apply the latest kcov patch "[PATCH v6] kernel: add kcov code > coverage", it should work. Could you please push that to the syzkaller tree [1] then? [1] https://github.com/dvyukov/linux/commits/kcov
[Bug c/69624] sanitize-coverage=trace-pc miscompiles kernel
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624 --- Comment #16 from Dmitry Vyukov --- > Could you please push that to the syzkaller tree [1] then? Sorry, the syzkaller page referred to an outdated patch. I was hoping that Andrew would take it soon, so that I could update the link to a more respected location. Updated now: https://github.com/dvyukov/linux/commit/33787098ffaaa83b8a7ccf519913ac5fd6125931 If you have any other issues, feel free to contact syzkal...@googlegroups.com. Re the original issue. We could call a special __sanitizer_cov_trace_pc_special callback which would save/restore all registers when a file is compiled with -fcall-saved*. But it is probably not worth it while we have a single case. We could also add a finer-grained function attribute which disables kcov instrumentation of a single function. But the same reasoning applies.
[Bug c/69624] sanitize-coverage=trace-pc miscompiles kernel
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624 --- Comment #17 from Dmitry Vyukov --- Jakub, I guess you can close this. Sorry again.
[Bug tree-optimization/69595] [6 Regression] Bogus -Warray-bound warning due to missed optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69595 --- Comment #3 from Richard Biener --- Author: rguenth Date: Tue Feb 2 15:19:32 2016 New Revision: 233076 URL: https://gcc.gnu.org/viewcvs?rev=233076&root=gcc&view=rev Log: 2016-02-02 Richard Biener PR tree-optimization/69595 * match.pd: Add range test simplifications to true/false. * gcc.dg/Warray-bounds-17.c: New testcase. Added: trunk/gcc/testsuite/gcc.dg/Warray-bounds-17.c Modified: trunk/gcc/ChangeLog trunk/gcc/match.pd trunk/gcc/testsuite/ChangeLog
[Bug target/69613] [6 Regression] wrong code with -O and simple 128bit arithmetics and vectors @ aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69613 --- Comment #2 from ktkachov at gcc dot gnu.org --- Bisection shows this started with r226901, the big copyrename dropping patch. I didn't investigate whether it's actually the cause of the bug or just exposes another latent one.
[Bug tree-optimization/69599] [6 Regression] libgomp.c fipa-pta tests compiled with -flto -flto-partition=max fail in execution
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69599 vries at gcc dot gnu.org changed: What|Removed |Added Target Milestone|--- |6.0 Summary|libgomp.c fipa-pta tests|[6 Regression] libgomp.c |compiled with -flto |fipa-pta tests compiled |-flto-partition=max fail in |with -flto |execution |-flto-partition=max fail in ||execution --- Comment #2 from vries at gcc dot gnu.org --- omp-nested-2.c is omp-nested-1.c with -fipa-pta. omp-nested-1.c with -fipa-pta -flto -flto-partition=max passes for gcc-5-branch. So, marking as 6 regression.
[Bug tree-optimization/69595] [6 Regression] Bogus -Warray-bound warning due to missed optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69595 Richard Biener changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #4 from Richard Biener --- Fixed.
[Bug rtl-optimization/69570] [6 Regression] if-conversion bug on i?86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69570 --- Comment #11 from Tom Hughes --- This is C++ so -fexcess-precision=standard is no help as that is C only. Likewise -ffloat-store is, as I understand it, not much help in real world code because you need to make sure that you force stores in order to trigger it? Using -mpc64 seems very scary as I believe it alters the global state of the program. So short of -mfpmath=sse I suspect the only solution is to replace the equality with a comparison that allows some variation. I'll just slink off and go back to hating x87 FP math I think...
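For illustration, a comparison "that allows some variation" could look roughly like the sketch below. The helper name nearly_equal and both tolerance defaults are hypothetical, not taken from the reporter's code; suitable tolerances depend on the application.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper (not from the bug report): treats a and b as equal
// when they differ by no more than an absolute floor or a relative
// fraction of the larger magnitude. Both tolerances are assumptions.
inline bool nearly_equal(double a, double b,
                         double rel_tol = 1e-9, double abs_tol = 1e-12) {
    const double diff = std::fabs(a - b);
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return diff <= std::max(abs_tol, rel_tol * scale);
}
```

With such a helper, two values that differ only in the low bits of the x87 extended mantissa, as in the register dump quoted in comment #10, would compare as equal, while clearly distinct values would not.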
[Bug lto/69630] New: [6 Regression] LTO ICE in types_same_for_odr at ipa-devirt.c:402
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69630 Bug ID: 69630 Summary: [6 Regression] LTO ICE in types_same_for_odr at ipa-devirt.c:402 Product: gcc Version: 6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: lto Assignee: unassigned at gcc dot gnu.org Reporter: burnus at gcc dot gnu.org Target Milestone: --- Running the attached C++ (.ii) file fails with: $ g++ -r -nostdlib -O2 -flto -Wsuggest-final-methods test.ii lto1: internal compiler error: Segmentation fault 0xa78a4f crash_signal ../../gcc/toplev.c:335 0x87b824 types_same_for_odr(tree_node const*, tree_node const*, bool) ../../gcc/ipa-devirt.c:402 0x885bc5 possible_polymorphic_call_targets(tree_node*, long, ipa_polymorphic_call_context, bool*, void**, bool) ../../gcc/ipa-devirt.c:3200 0x887427 possible_polymorphic_call_targets(cgraph_edge*, bool*, void**, bool) ../../gcc/ipa-utils.h:114 0x887427 ipa_devirt ../../gcc/ipa-devirt.c:3575
[Bug lto/69630] [6 Regression] LTO ICE in types_same_for_odr at ipa-devirt.c:402
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69630 --- Comment #1 from Tobias Burnus --- Created attachment 37557 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37557&action=edit test.ii test case
[Bug libstdc++/69626] [6 Regression] std::strtoll etc. no longer defined in c++98 mode
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69626 Jonathan Wakely changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2016-02-02 Assignee|unassigned at gcc dot gnu.org |redi at gcc dot gnu.org Ever confirmed|0 |1
[Bug target/67032] [4.9/5/6 Regression] Geode optimizations incorrectly return -NaN
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67032 --- Comment #14 from uros at gcc dot gnu.org --- Author: uros Date: Tue Feb 2 16:07:24 2016 New Revision: 233079 URL: https://gcc.gnu.org/viewcvs?rev=233079&root=gcc&view=rev Log: PR target/67032 * config/i386/i386.c (geode_cost): Increase cost of MMX and SSE moves. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.c
[Bug target/67032] [4.9/5/6 Regression] Geode optimizations incorrectly return -NaN
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67032 --- Comment #15 from uros at gcc dot gnu.org --- Author: uros Date: Tue Feb 2 16:08:56 2016 New Revision: 233080 URL: https://gcc.gnu.org/viewcvs?rev=233080&root=gcc&view=rev Log: PR target/67032 * config/i386/i386.c (geode_cost): Increase cost of MMX and SSE moves. Modified: branches/gcc-5-branch/gcc/ChangeLog branches/gcc-5-branch/gcc/config/i386/i386.c
[Bug target/67032] [4.9/5/6 Regression] Geode optimizations incorrectly return -NaN
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67032 --- Comment #16 from uros at gcc dot gnu.org --- Author: uros Date: Tue Feb 2 16:10:04 2016 New Revision: 233081 URL: https://gcc.gnu.org/viewcvs?rev=233081&root=gcc&view=rev Log: PR target/67032 * config/i386/i386.c (geode_cost): Increase cost of MMX and SSE moves. Modified: branches/gcc-4_9-branch/gcc/ChangeLog branches/gcc-4_9-branch/gcc/config/i386/i386.c
[Bug rtl-optimization/67609] [5 Regression] Generates wrong code for SSE2 _mm_load_pd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67609 rsandifo at gcc dot gnu.org changed: What|Removed |Added CC||rsandifo at gcc dot gnu.org --- Comment #43 from rsandifo at gcc dot gnu.org --- FWIW, the proposed patch for PR69577 fixes this testcase with the aarch64_cannot_change_mode_class change reverted. The code quality looks slightly better too.
[Bug target/67032] [4.9/5/6 Regression] Geode optimizations incorrectly return -NaN
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67032 Uroš Bizjak changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #17 from Uroš Bizjak --- Fixed everywhere.
[Bug c++/69631] New: Bogus overflow in constant expression error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69631 Bug ID: 69631 Summary: Bogus overflow in constant expression error Product: gcc Version: 6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: mpolacek at gcc dot gnu.org Target Milestone: --- Starting with the C++ delayed folding merge, we reject this test with -fwrapv: struct C { static const unsigned short max = static_cast<unsigned short>((32767 * 2 + 1)); }; q.C:2:80: error: overflow in constant expression [-fpermissive] static const unsigned short max = static_cast<unsigned short>((32767 * 2 + 1)); ^ q.C:2:80: error: overflow in constant expression [-fpermissive] q.C:2:80: error: overflow in constant expression [-fpermissive] q.C:2:80: error: overflow in constant expression [-fpermissive]
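To see why the diagnostic is bogus, the arithmetic can be checked at compile time. The sketch below assumes the cast target in the test case is unsigned short (the template argument appears to have been stripped from the archived text) and that int is at least 32 bits wide; the name kWide is introduced here only for illustration.

```cpp
#include <limits>

// Assumption: int is wide enough that 32767 * 2 + 1 cannot overflow.
static_assert(std::numeric_limits<int>::max() >= 65535,
              "this sketch assumes int holds 65535");

// The initializer is evaluated in int, where it is simply 65535.
constexpr int kWide = 32767 * 2 + 1;
static_assert(kWide == 65535, "no overflow occurs in int");

// Same shape as the reported test case; 65535 always fits in unsigned short.
struct C {
    static const unsigned short max = static_cast<unsigned short>(kWide);
};
static_assert(C::max == 65535, "the cast preserves the value");
```

Since every step stays in range, no overflow takes place in the constant expression, with or without -fwrapv.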
[Bug tree-optimization/67282] Wrong code with -floop-nest-optimize
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67282 --- Comment #2 from Marek Polacek --- I can reproduce this one with: Using built-in specs. COLLECT_GCC=./xgcc Target: x86_64-unknown-linux-gnu Configured with: /home/marek/src/gcc/configure --enable-languages=c,c++ --enable-checking=yes -with-system-zlib --disable-bootstrap --disable-libvtv --disable-libcilkrts --disable-libitm --disable-libgomp --disable-libcc1 --disable-libstdcxx-pch --disable-libssp --enable-isl Thread model: posix gcc version 5.3.1 20160202 (GCC) But not anymore with 6.
[Bug lto/69630] [6 Regression] LTO ICE in types_same_for_odr at ipa-devirt.c:402
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69630 Tobias Burnus changed: What|Removed |Added Target Milestone|--- |6.0
[Bug rtl-optimization/69570] [6 Regression] if-conversion bug on i?86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69570 --- Comment #12 from Bernd Schmidt --- Or lose the equality tests on the max values, and instead use something like: if (b > r && b >= g) ... I suppose that could still have problems if b and g are equal and one of them is spilled. Someone who knows that code would have to say whether that's a problem or not.
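A minimal sketch of that idea, assuming the affected code picks the largest of three colour channels: the enum Channel and the function dominant_channel are hypothetical names, not taken from the reported program. Only ordering comparisons between the inputs are used, so no value is tested for equality against a separately computed maximum.

```cpp
// Hypothetical names; the real code in the report is not shown in this thread.
enum class Channel { Red, Green, Blue };

// Selects the largest channel using ordering comparisons only.
// Ties resolve in the order red, green, blue.
inline Channel dominant_channel(double r, double g, double b) {
    if (r >= g && r >= b) return Channel::Red;
    if (g > r && g >= b) return Channel::Green;
    return Channel::Blue;  // here b > r and b > g, or an input is NaN
}
```

As the comment notes, a tie between two channels could still be decided differently when one operand carries excess precision, but the function always returns exactly one channel, so no branch is silently skipped.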
[Bug c++/69632] New: No error issued for declaring a parameter having a late-specified return type without the 'auto' type specifier
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69632 Bug ID: 69632 Summary: No error issued for declaring a parameter having a late-specified return type without the 'auto' type specifier Product: gcc Version: 6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: ppalka at gcc dot gnu.org Target Milestone: --- g++ does not issue an error for the following invalid declaration: int foo (long (int) -> char); This does not appear to be a regression. All versions of g++ (since the introduction of C++11 support) accept this code.
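For contrast, a well-formed use of a late-specified return type spells the leading type specifier as auto. The sketch below is only an illustration; the names CharFromInt, to_char and call_with_65 are made up here and do not come from the report.

```cpp
#include <type_traits>

// With 'auto' as the leading specifier, the trailing return type is valid
// and names the ordinary function type char(int).
using CharFromInt = auto (int) -> char;
static_assert(std::is_same<CharFromInt, char(int)>::value,
              "trailing return type names the same function type");

// Hypothetical function matching that type.
inline char to_char(int v) { return static_cast<char>(v); }

// Hypothetical function taking a pointer to such a function as a parameter.
inline int call_with_65(CharFromInt* callback) { return callback(65); }
```

In the reported declaration the leading specifier is long instead of auto, which is the part g++ should reject.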
[Bug c++/69277] [6 Regression] ICE mangling a flexible array member
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69277 --- Comment #6 from Martin Sebor --- In response to another patch for a related problem Jason asked me to change the representation of flexible array members in C++. The alternate representation has an impact on how this bug is dealt with so it's on hold pending Jason's decision. The patch with the alternate representation was posted last week and pinged last night: https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01901.html In his response from this morning Jason requests some additional tweaks to the patch. I should have an updated one by tomorrow.
[Bug rtl-optimization/69633] New: [6 Regression] Redundant move is generated after r228097
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69633 Bug ID: 69633 Summary: [6 Regression] Redundant move is generated after r228097 Product: gcc Version: 6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ysrumyan at gmail dot com Target Milestone: --- Sorry that we noticed this regression only now and not in September. After Makarov's fix for 61578 (and the s390 regression) we noticed that, for the attached simple test-case extracted from a real benchmark, one more redundant move instruction is generated (up to the 20160202 compiler build): before fix (postreload dump) 86: NOTE_INSN_BASIC_BLOCK 4 40: dx:QI=[si:SI] 41: ax:QI=[si:SI+0x1] 42: {si:SI=si:SI+0x3;clobber flags:CC;} 43: dx:SI=zero_extend(dx:QI) 44: ax:SI=zero_extend(ax:QI) 45: cx:SI=zero_extend([si:SI-0x1]) 46: {di:SI=dx:SI*0x4c8b;clobber flags:CC;} 47: {bx:SI=ax:SI*0x9646;clobber flags:CC;} 48: {bx:SI=bx:SI+di:SI;clobber flags:CC;} 49: {di:SI=cx:SI*0x1d2f;clobber flags:CC;} 50: NOTE_INSN_DELETED 51: bx:SI=bx:SI+di:SI+0x8000 52: {bx:SI=bx:SI>>0x10;clobber flags:CC;} 53: [bp:SI]=bx:QI 96: bx:SI=dx:SI 55: {bx:SI=bx:SI<<0xf;clobber flags:CC;} 57: {bx:SI=bx:SI-dx:SI;clobber flags:CC;} after fix 86: NOTE_INSN_BASIC_BLOCK 4 40: dx:QI=[si:SI] 41: ax:QI=[si:SI+0x1] 42: {si:SI=si:SI+0x3;clobber flags:CC;} 43: dx:SI=zero_extend(dx:QI) 44: ax:SI=zero_extend(ax:QI) 45: cx:SI=zero_extend([si:SI-0x1]) 46: {di:SI=dx:SI*0x4c8b;clobber flags:CC;} 47: {bx:SI=ax:SI*0x9646;clobber flags:CC;} 48: {bx:SI=bx:SI+di:SI;clobber flags:CC;} 49: {di:SI=cx:SI*0x1d2f;clobber flags:CC;} 50: NOTE_INSN_DELETED 51: bx:SI=bx:SI+di:SI+0x8000 52: {bx:SI=bx:SI>>0x10;clobber flags:CC;} 53: [bp:SI]=bx:QI 96: bx:SI=dx:SI 55: {bx:SI=bx:SI<<0xf;clobber flags:CC;} 98: di:SI=bx:SI !! redundant move 57: {di:SI=di:SI-dx:SI;clobber flags:CC;} As a result, we got a >3% slowdown on Silvermont in pie & 32-bit mode.
[Bug c++/69632] No error issued for declaring a parameter having a late-specified return type without the 'auto' type specifier
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69632 Jonathan Wakely changed: What|Removed |Added Keywords||accepts-invalid Status|UNCONFIRMED |NEW Last reconfirmed||2016-02-02 Ever confirmed|0 |1
[Bug rtl-optimization/69633] [6 Regression] Redundant move is generated after r228097
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69633 --- Comment #1 from Yuri Rumyantsev --- Created attachment 37559 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37559&action=edit test-case to reproduce Needs to be compiled with -O2 -m32 -pie -fPIE. I assume that -march=slm is not needed.