[Bug regression/40886] New: No loop counter reversal for simple loops anymore
Given this simple program:

main() { int i; for (i = 0; i < 10; i++) f2(); }

compiled with -O2, gcc33-hammer detects that the loop index is not used in the loop body and rewrites it to a downwards counting loop, eliminating one instruction (inc/cmp -> dec):

        movl    $9, %ebx
        .p2align 4,,7
.L6:
        xorl    %eax, %eax
        call    f2
        decl    %ebx
        jns     .L6
        popq    %rbx

but gcc 4.0 - 4.3 don't anymore (tried 4.1, 4.3 and 4.4):

        xorl    %ebx, %ebx
        .p2align 4,,7
.L2:
        xorl    %eax, %eax
        incl    %ebx
        call    f2
        cmpl    $10, %ebx
        jne     .L2

--
Summary: No loop counter reversal for simple loops anymore
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: regression
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: andi-gcc at firstfloor dot org
GCC host triplet: x86_64-linux
GCC target triplet: x86_64-linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40886
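For readers who want to see the transformation at the source level, here is a minimal sketch of what the reversal amounts to. It assumes a stand-in f2() that only counts its calls (the report's f2() is external and unknown), and the names count_up and count_down are made up for illustration. This is not what the RTL loop optimizer does internally, only the equivalent C.

```c
/* Hypothetical stand-in for the external f2() from the report:
   it only records how often it was called. */
int f2_calls;

void f2(void)
{
    f2_calls++;
}

/* The loop as written in the report: count up, compare against 10. */
void count_up(void)
{
    int i;

    for (i = 0; i < 10; i++)
        f2();
}

/* Hand-reversed form: count down from 9 and stop once i goes negative.
   This shape maps onto a decrement followed by a branch on the sign flag
   (decl/jns), so no separate compare is needed. */
void count_down(void)
{
    int i;

    for (i = 9; i >= 0; i--)
        f2();
}
```

Both functions call f2() ten times; the rewrite is only valid because the body never reads i.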
[Bug regression/40886] [4.3/4.4/4.5 Regression] No loop counter reversal for simple loops anymore
--- Comment #4 from andi-gcc at firstfloor dot org 2009-08-07 08:50 --- The RTL loop optimizer does this optimization. I had to fix it a couple of years ago for unsigned variables. I think the loop optimizer still does it, just the gcc 4 frontend doesn't give it input RTL with a suitable pattern anymore. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40886
[Bug regression/40886] [4.3/4.4/4.5 Regression] No loop counter reversal for simple loops anymore
--- Comment #6 from andi-gcc at firstfloor dot org 2009-08-07 09:38 --- It worked on x86 at least -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40886
[Bug regression/40886] [4.3/4.4/4.5 Regression] No loop counter reversal for simple loops anymore
--- Comment #8 from andi-gcc at firstfloor dot org 2009-08-07 09:52 --- At least my example in the original bug description shows that the optimization worked on gcc 3.3. If your theory doesn't explain this then your theory is wrong. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40886
[Bug other/37280] weak symbol regression breaks linux kernel
--- Comment #1 from hp at gcc dot gnu dot org 2008-08-29 16:02 --- (In reply to comment #0) > I attached a preprocessed test case. Where? --- Comment #2 from andi-gcc at firstfloor dot org 2008-08-29 16:02 --- Created an attachment (id=16164) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16164&action=view) preprocessed test case -- hp at gcc dot gnu dot org changed: What|Removed |Added CC||hp at gcc dot gnu dot org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37280
[Bug tree-optimization/37312] New: -Os significantly faster than -O2 on test case
[component might be wrong] The appended test case is significantly faster with -Os -funroll-all-loops (~5%) versus -O2 -funroll-all-loops in gcc 4.4 ( gcc version 4.4.0 20080829; that is shortly after the IRA merge) on a Core2 (Merom) In earlier gcc versions they are about the same performance. The -Os improvement is against all earlier versions (good!) but it should be in -O2 too. I tried -fno-tree-pre as it was suggested and it didn't make a difference. -- Summary: -Os significantly faster than -O2 on test case Product: gcc Version: 4.4.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: andi-gcc at firstfloor dot org GCC host triplet: x86_64-linux GCC target triplet: x86-64-linux http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37312
[Bug tree-optimization/37312] -Os significantly faster than -O2 on test case
--- Comment #1 from andi-gcc at firstfloor dot org 2008-09-01 11:22 --- Created an attachment (id=16178) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16178&action=view) test case checksum functions extracted from the Linux kernel. Not preprocessed, but should compile on any x86 ISO-C system -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37312
[Bug tree-optimization/37312] -Os significantly faster than -O2 on test case
--- Comment #3 from andi-gcc at firstfloor dot org 2008-09-01 14:20 --- Thanks for the us^whelpful comment. If you can suggest a way to do carry preserving addition without inline assembler that would be fine, otherwise not. -Os seems to do something that improves it at least (and that is new in 4.4, 4.3 didn't do that). I suppose -O2 does something more that then makes it worse again. I merely filed it because I thought it would be interesting to fix that something to not pessimize code. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37312
[Bug lto/95928] New: LTO through ar breaks weak function resolution
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95928

Bug ID: 95928
Summary: LTO through ar breaks weak function resolution
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: lto
Assignee: unassigned at gcc dot gnu.org
Reporter: andi-gcc at firstfloor dot org
CC: marxin at gcc dot gnu.org
Target Milestone: ---
Target: x86_64-linux

Created attachment 48791 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48791&action=edit
dummy.c

With the attached test case (extracted from the Linux kernel) the expected behavior is that the strong version of __x64_sys_capget overrides the weak version in sys_ni.i.

This works with LTO when the object files are linked directly, but doesn't work (weak version of function is output) when the linking is through a .a file.

Works:

gcc -flto -c sys_ni.i
gcc -flto -c capability.i
gcc -O2 -flto dummy.c sys_ni.o capability.o
# sys_ni_syscall doesn't appear, so the strong version is chosen
objdump --disassemble=__x64_sys_capget | grep sys_ni_syscall

Breaks:

gcc -flto -c sys_ni.i
gcc -flto -c capability.i
rm -f x.a
gcc-ar q x.a sys_ni.o capability.o
gcc -O2 -flto dummy.c x.a
# sys_ni_syscall appears, so the weak version is incorrectly chosen
objdump --disassemble=__x64_sys_capget | grep sys_ni_syscall

This seems to be a regression: it worked on gcc-7, but breaks on gcc 9/10. Don't have any immediate versions to test.
[Bug lto/95928] LTO through ar breaks weak function resolution
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95928 --- Comment #2 from Andi Kleen --- Created attachment 48793 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48793&action=edit capability.i
[Bug lto/95928] LTO through ar breaks weak function resolution
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95928 --- Comment #1 from Andi Kleen --- Created attachment 48792 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48792&action=edit sys_ni.i
[Bug lto/95928] LTO through ar breaks weak function resolution
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95928 --- Comment #3 from Andi Kleen --- Versions reproduced: gcc version 10.1.1 20200507 [revision dd38686d9c810cecbaa80bb82ed91caaa58ad635] (SUSE Linux) gcc-9 (SUSE Linux) 9.3.1 20200406 [revision 6db837a5288ee3ca5ec504fbd5a765817e556ac2] Version which worked correctly: gcc version 7.5.0 (SUSE Linux) binutils: GNU ld (GNU Binutils; openSUSE Tumbleweed) 2.34.0.20200325-1
[Bug bootstrap/95934] New: bootstrap fails in compiler assert in sanitizer_platform_limits_posix.cpp:1136
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95934

Bug ID: 95934
Summary: bootstrap fails in compiler assert in sanitizer_platform_limits_posix.cpp:1136
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: bootstrap
Assignee: unassigned at gcc dot gnu.org
Reporter: andi-gcc at firstfloor dot org
Target Milestone: ---

commit e74c281bf4955eea7fdc5f21b43e29fa0235a5b0 (HEAD -> trunk, origin/trunk, origin/master, origin/HEAD)

make bootstrap fails with

../../../../gcc/libsanitizer/sanitizer_common/sanitizer_internal_defs.h:336:30: note: in expansion of macro 'IMPL_COMPILER_ASSERT'
  336 | #define COMPILER_CHECK(pred) IMPL_COMPILER_ASSERT(pred, __LINE__)
      | ^~~~
../../../../gcc/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h:1442:3: note: in expansion of macro 'COMPILER_CHECK'
 1442 | COMPILER_CHECK(sizeof(((__sanitizer_##CLASS *)NULL)->MEMBER) == \
      | ^~
../../../../gcc/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp:1136:1: note: in expansion of macro 'CHECK_SIZE_AND_OFFSET'
 1136 | CHECK_SIZE_AND_OFFSET(ipc_perm, mode);
      | ^

Works fine when I comment out the assert. There already is an ifdef checking for lots of cases; it seems it doesn't work on mine either.

This is with a recent glibc-2.31-5.9.x86_64 (opensuse glibc)

Perhaps the assert should just be disabled like this? (patch is likely white space damaged)

diff --git a/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp b/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp
index b4f8f67b664..bb6377b70cb 100644
--- a/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp
+++ b/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp
@@ -1133,7 +1133,7 @@ CHECK_SIZE_AND_OFFSET(ipc_perm, cgid);
 /* On aarch64 glibc 2.20 and earlier provided incorrect mode field. */
 /* On Arm newer glibc provide a different mode field, it's hard to detect
    so just disable the check. */
-CHECK_SIZE_AND_OFFSET(ipc_perm, mode);
+//CHECK_SIZE_AND_OFFSET(ipc_perm, mode);
 #endif

 CHECK_TYPE_SIZE(shmid_ds);
[Bug lto/95928] LTO through ar breaks weak function resolution
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95928 --- Comment #4 from Andi Kleen --- Reproduced on trunk too 11.0-200626 e74c281bf4955eea7fdc5f21b43e29fa0235a5b0
[Bug lto/95928] LTO through ar breaks weak function resolution
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95928 --- Comment #5 from Andi Kleen --- It doesn't seem to be the plugin itself, I compiled trunk with the gcc-7 lto-plugin.c and it fails too.
[Bug lto/95928] LTO through ar breaks weak function resolution
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95928 --- Comment #8 from Andi Kleen --- It works fine without LTO. Otherwise the Linux kernel wouldn't work. It relies on this behavior for its syscalls. The test case is extracted from there.
[Bug lto/95928] LTO through ar breaks weak function resolution
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95928 --- Comment #9 from Andi Kleen --- I think the STB_SECONDARY stuff is only needed if ld -r is used, but not for ar
[Bug lto/95928] LTO through ar breaks weak function resolution
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95928 --- Comment #12 from Andi Kleen --- Okay. I only compared gcc-7 (working) vs gcc-9 (broken), but always with LTO. Looking at the kernel link it also uses --whole-archive. Perhaps that makes a difference? I'll redo the test case with --whole-archive (will need some fixes)
[Bug target/93346] New: gcc not generate BZHI
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93346 Bug ID: 93346 Summary: gcc not generate BZHI Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: andi-gcc at firstfloor dot org Target Milestone: --- Target: x86_64
[Bug target/93346] gcc not generate BZHI
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93346

--- Comment #1 from Andi Kleen ---

typedef unsigned u;
u bzhi(u src, u inx) { return src & ((1 << inx) - 1); }

with -O2 -march=skylake generates

        movl    %esi, %r8d
        movl    $1, %esi
        shlx    %r8d, %esi, %esi
        leal    -1(%rsi), %eax
        andl    %edi, %eax
        ret

clang generates the expected

        bzhil   %esi, %edi, %eax
        retq
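To make the comparison concrete, here is a small sketch of the mask idiom next to a portable reference model of what BZHI computes. The function names low_bits_mask and bzhi_reference are made up for this note, and the reference model is written from the instruction's documented behaviour (index taken from the low 8 bits, source passed through unchanged when the index is 32 or more), not from anything in GCC. The mask idiom uses 1u rather than 1 so the shift stays in unsigned arithmetic.

```c
typedef unsigned u;

/* The mask idiom from the test case, with an unsigned constant.
   Only defined for inx < 32: shifting a 32-bit value by 32 or more
   is undefined behaviour in C. */
u low_bits_mask(u src, u inx)
{
    return src & ((1u << inx) - 1);
}

/* Portable reference model of 32-bit BZHI: zero all bits of src from
   bit position inx upwards.  The hardware only looks at the low 8 bits
   of the index and returns src unchanged when that index is >= 32. */
u bzhi_reference(u src, u inx)
{
    inx &= 0xff;
    if (inx >= 32)
        return src;
    return src & ((1u << inx) - 1);
}
```

The two agree for every index below 32, which is the range where the C idiom is defined; that is why a compiler may pick BZHI for it. Code that wants the instruction explicitly can use the `_bzhi_u32` intrinsic from `<immintrin.h>` when building with BMI2 enabled.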
[Bug c/50584] No warning for passing small array to C99 static array declarator
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50584 Andi Kleen changed: What|Removed |Added CC||andi-gcc at firstfloor dot ||org --- Comment #1 from Andi Kleen 2013-02-19 06:51:18 UTC --- Confirmed. Still happens with 4.7/4.8
[Bug target/55947] non constant memory models lose HLE qualifiers
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55947

--- Comment #3 from Andi Kleen 2013-03-13 13:49:10 UTC ---

It was pointed out to me that <atomic> triggers this when compiled with no optimization. For HLE, wrong hints would be generated.

bool test_and_set(memory_order __m = memory_order_seq_cst) noexcept
{
  return __atomic_test_and_set (&_M_i, __m);
}

bool foo(std::atomic_flag fl)
{
  return fl.test_and_set(std::memory_order_relaxed);
}
[Bug target/55948] __atomic_clear / __atomic_store_n ignore HLE_RELEASE flags
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55948 Andi Kleen changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution||FIXED --- Comment #2 from Andi Kleen 2013-03-13 13:53:33 UTC --- patch was checked in some time ago
[Bug tree-optimization/56618] New: inline assembler with too many lines causes ICE in account_size_time, at ipa-inline-analysis.c
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56618

Bug #: 56618
Summary: inline assembler with too many lines causes ICE in account_size_time, at ipa-inline-analysis.c
Classification: Unclassified
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: andi-...@firstfloor.org

A 6.4 million line inline assembler statement (happened in some auto generated real code) causes an overflow in the inliner cost estimation per cpu, resulting in an ICE.

Reproducer:

#!/usr/bin/python
print "int foo(void) {"
print " asm("
for i in range(640):
    print r'"\n"'
print " );"
print "}"

./longasm.py > l.c
gcc l.c

Observed back to 4.7 at least, but much older compilers should be ok. So it's a regression.

Patch has been posted at http://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg50027.html

There was some discussion, but no approval.
[Bug target/56619] New: i386 hle atomic intrinsics flags are undocumented
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56619 Bug #: 56619 Summary: i386 hle atomic intrinsics flags are undocumented Classification: Unclassified Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: target AssignedTo: unassig...@gcc.gnu.org ReportedBy: andi-...@firstfloor.org Patch has been posted here http://patchwork.ozlabs.org/patch/198096/ but so far not reviewed/approved
[Bug target/56619] i386 hle atomic intrinsics flags are undocumented
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56619 --- Comment #1 from Andi Kleen 2013-03-14 13:18:32 UTC --- This is a more complete version of the documentation (also including RTM intrinsics), again not approved: http://patchwork.ozlabs.org/patch/211504/
[Bug target/53315] simple xtest program generates ICE
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53315 Andi Kleen changed: What|Removed |Added Status|NEW |RESOLVED Resolution||FIXED --- Comment #20 from Andi Kleen 2013-03-15 13:55:28 UTC --- Fixed for some time
[Bug target/56619] i386 hle atomic intrinsics flags are undocumented
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56619 Andi Kleen changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution||FIXED --- Comment #3 from Andi Kleen 2013-03-15 13:56:36 UTC --- Fix checked into trunk
[Bug rtl-optimization/56912] New: scheduler change breaks linux kernel LTO build with 4.8
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56912

Bug #: 56912
Summary: scheduler change breaks linux kernel LTO build with 4.8
Classification: Unclassified
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: andi-...@firstfloor.org

For the Linux kernel LTO build I get ICEs during LTO (segfaults) with the recent 4.8 branch. I bisected it down to this patch and reverting fixes it. No simple test case unfortunately as it is LTO.

Backport from mainline
2013-02-25 Andrey Belevantsev
Alexander Monakov
PR middle-end/56077
* sched-deps.c (sched_analyze_insn): When reg_pending_barrier, flush pending lists also on non-jumps. Adjust comment.

Typical crash:

#7
#8 sched_analyze_1 (deps=0x7fff550a5c00, x=0x7fa0311e2ed0, insn=0x7fa0311f40d8) at ../../gcc/gcc/sched-deps.c:2479
#9 0x00b668d5 in sched_analyze_insn (deps=deps@entry=0x7fff550a5c00, x=0x7fa0311e2e70, insn=insn@entry=0x7fa0311f40d8) at ../../gcc/gcc/sched-deps.c:2859
#10 0x00b6859b in deps_analyze_insn (deps=deps@entry=0x7fff550a5c00, insn=insn@entry=0x7fa0311f40d8) at ../../gcc/gcc/sched-deps.c:3505
#11 0x00b689c3 in sched_analyze (deps=0x7fff550a5c00, head=, tail=0x7fa0311f8c18) at ../../gcc/gcc/sched-deps.c:3653
#12 0x0070b635 in compute_block_dependences (bb=0) at ../../gcc/gcc/sched-rgn.c:2702
#13 sched_rgn_compute_dependencies (rgn=rgn@entry=5) at ../../gcc/gcc/sched-rgn.c:3140
#14 0x0070df84 in schedule_region (rgn=5) at ../../gcc/gcc/sched-rgn.c:2915
#15 schedule_insns () at ../../gcc/gcc/sched-rgn.c:3299
#16 schedule_insns () at ../../gcc/gcc/sched-rgn.c:3278
#17 0x0070e3b1 in rest_of_handle_sched2 () at ../../gcc/gcc/sched-rgn.c:3523
#18 0x006b534e in execute_one_pass (pass=pass@entry=0x112e240 ) at ../../gcc/gcc/passes.c:2084
#19 0x006b56bd in execute_pass_list (pass=0x112e240 ) at ../../gcc/gcc/passes.c:2139
#20 0x006b56cf in execute_pass_list (pass=0x112d840 ) at ../../gcc/gcc/passes.c:2140
#21 0x006b56cf in execute_pass_list (pass=0x112d8a0 ) at ../../gcc/gcc/passes.c:2140
#22 0x00792043 in tree_rest_of_compilation (fndecl=0x7fa03f899700) at ../../gcc/gcc/tree-optimize.c:422
#23 0x00536f7b in cgraph_expand_function (node=0x7fa03c49b5a0)
[Bug lto/54206] New: build in source dir breaks lto plugin detection
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54206

Bug #: 54206
Summary: build in source dir breaks lto plugin detection
Classification: Unclassified
Product: gcc
Version: 4.7.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: lto
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: andi-...@firstfloor.org

When building inside the source dir with an ld that supports plugins, this check in gcc/configure.ac fails:

AC_MSG_CHECKING(linker plugin support)
gcc_cv_lto_plugin=0
if test -f liblto_plugin.la; then

The result is a compiler without ld plugin support.

It works with a separate build dir.

Should either handle this or error out on in-tree builds.
[Bug lto/54206] build in source dir breaks lto plugin detection
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54206 --- Comment #2 from Andi Kleen 2012-08-09 12:39:14 UTC --- I didn't do it, but I had to debug a user's gcc config who did it. WONTFIX would be wrong, if you really don't support it error out in configure please instead of silent breakage (glibc does that). But fixing it would be better I think.
[Bug lto/54206] build in source dir breaks lto plugin detection
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54206

--- Comment #5 from Andi Kleen 2012-10-04 18:50:52 UTC ---

This is the configure snippet glibc is using for this. Someone with better autoconf-fu than me could add it.

if test "`cd $srcdir; /bin/pwd`" = "`/bin/pwd`"; then
  AC_MSG_ERROR([you must configure in a separate build directory])
fi
[Bug lto/55066] New: lto integer-cst change causes ICE
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55066

Bug #: 55066
Summary: lto integer-cst change causes ICE
Classification: Unclassified
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: lto
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: andi-...@firstfloor.org

A large LTO allyes kernel build (no small test case unfortunately) recently started ICEing during the LTO phase with

linux-lto-2.6/drivers/isdn/hardware/eicon/message.c:12035:0: internal compiler error: in widest_int_cst_value, at tree.c:10214
 static byte mixer_request(dword Id, word Number, DIVA_CAPI_ADAPTER *a, PLCI *plci, APPL *appl, API_PARSE *msg)
 ^
0x8c05f8 widest_int_cst_value(tree_node const*)
        ../../gcc/gcc/tree.c:10213
0x81987f find_bswap_1
        ../../gcc/gcc/tree-ssa-math-opts.c:1669
0x819a23 find_bswap_1
        ../../gcc/gcc/tree-ssa-math-opts.c:1733
0x81a194 find_bswap
        ../../gcc/gcc/tree-ssa-math-opts.c:1779
0x81a194 execute_optimize_bswap
        ../../gcc/gcc/tree-ssa-math-opts.c:1905
Please submit a full bug report,

I bisected it down to this change from Richi:

2012-10-18 Richard Guenther

* lto-streamer.h (enum LTO_tags): Add LTO_integer_cst.
* lto-streamer-in.c (lto_input_tree): Use it.
* lto-streamer-out.c (lto_output_tree): Likewise, for !TREE_OVERFLOW integer constants only.
* tree-streamer-in.c (unpack_ts_int_cst_value_fields): New function.
(unpack_value_fields): Call it.
(streamer_read_integer_cst): Simplify.
* tree-streamer-out.c (pack_ts_int_cst_value_fields): New function.
(streamer_pack_tree_bitfields): Call it.
(streamer_write_integer_cst): Adjust.
[Bug lto/55066] lto integer-cst change causes ICE
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55066 Andi Kleen changed: What|Removed |Added Status|WAITING |RESOLVED Resolution||FIXED --- Comment #2 from Andi Kleen 2012-10-25 11:45:38 UTC --- Ok seems to work now with latest trunk
[Bug lto/54095] Unnecessary static variable renaming
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54095 --- Comment #14 from Andi Kleen 2012-10-25 14:20:31 UTC --- Is there a chance to fix this in 4.8? What remains to be done?
[Bug target/55139] New: __atomic store does not support __ATOMIC_HLE_RELEASE
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55139

Bug #: 55139
Summary: __atomic store does not support __ATOMIC_HLE_RELEASE
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: andi-...@firstfloor.org

volatile int slock;

void unlock(void)
{
  int free_val = 1;
  __atomic_store(&slock, &free_val, __ATOMIC_RELEASE|__ATOMIC_HLE_RELEASE);
}

spin.c: In function 'unlock':
spin.c:6:16: warning: invalid memory model argument 3 of '__atomic_store' [-Winvalid-memory-model]
  __atomic_store(&slock, &free_val, __ATOMIC_RELEASE|__ATOMIC_HLE_RELEASE);

But XRELEASE MOV ... is allowed in TSX.
[Bug target/55139] __atomic store does not support __ATOMIC_HLE_RELEASE
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55139

--- Comment #1 from Andi Kleen 2012-11-07 04:03:53 UTC ---

This is an interesting one. This is the gcc code:

enum memmodel
{
  MEMMODEL_RELAXED = 0,
  MEMMODEL_CONSUME = 1,
  MEMMODEL_ACQUIRE = 2,
  MEMMODEL_RELEASE = 3,
  MEMMODEL_ACQ_REL = 4,
  MEMMODEL_SEQ_CST = 5,
  MEMMODEL_LAST = 6
};

#define MEMMODEL_MASK ((1<<16)-1)

  enum memmodel model;

  model = get_memmodel (CALL_EXPR_ARG (exp, 2));
  if ((model & MEMMODEL_MASK) != MEMMODEL_RELAXED
      && (model & MEMMODEL_MASK) != MEMMODEL_SEQ_CST
      && (model & MEMMODEL_MASK) != MEMMODEL_RELEASE)
    {
      error ("invalid memory model for %<__atomic_store%>");
      return NULL_RTX;
    }

HLE_STORE is 1 << 16, so outside the enum range.

But when looking at the assembler we see that the & MEMMODEL_MASK gets optimized away; it just generates a direct sequence of 32bit cmps. This makes all the != fail, even though they should succeed.

I presume the optimizer assumes nothing can be outside the enum. I tried to expand the enum by adding

  MEMMODEL_ARCH1 = 1 << 16,
  MEMMODEL_ARCH2 = 1 << 17,
  MEMMODEL_ARCH3 = 1 << 18,
  MEMMODEL_ARCH4 = 1 << 19

But still doesn't work.

Questions:
- Is it legal for the optimizer to assume this?
- Why does extending the enum not help?

We could fix it by not using an enum here of course, but I wonder if this is an underlying optimizer bug.
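As an illustration of the "not using an enum" option mentioned above, here is a minimal standalone sketch, not the GCC code itself. It carries the model as a plain unsigned value so that flag bits above the enum range are representable, and masks them off before checking the base model. HLE_RELEASE_FLAG and store_model_is_valid are hypothetical names for this note; only the enum values and MEMMODEL_MASK are taken from the comment.

```c
/* Base memory models, with the values quoted in the comment above. */
enum memmodel {
    MEMMODEL_RELAXED = 0,
    MEMMODEL_CONSUME = 1,
    MEMMODEL_ACQUIRE = 2,
    MEMMODEL_RELEASE = 3,
    MEMMODEL_ACQ_REL = 4,
    MEMMODEL_SEQ_CST = 5,
    MEMMODEL_LAST = 6
};

#define MEMMODEL_MASK ((1 << 16) - 1)

/* Hypothetical name for the target flag bit; the comment only says
   that the HLE store flag is 1 << 16. */
#define HLE_RELEASE_FLAG (1u << 16)

/* Return 1 if the base model is one that a store accepts.  The argument
   is an unsigned integer, not an enum, so target flag bits cannot be
   assumed away; they are masked off explicitly before the comparison. */
int store_model_is_valid(unsigned model)
{
    unsigned base = model & MEMMODEL_MASK;

    return base == MEMMODEL_RELAXED
        || base == MEMMODEL_SEQ_CST
        || base == MEMMODEL_RELEASE;
}
```

With this shape, MEMMODEL_RELEASE combined with the flag bit is accepted, while an acquire model is still rejected for stores.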
[Bug libstdc++/55233] New: libstdc++ atomic does not support hle_acquire/release
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55233 Bug #: 55233 Summary: libstdc++ atomic does not support hle_acquire/release Classification: Unclassified Product: gcc Version: 4.8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ AssignedTo: unassig...@gcc.gnu.org ReportedBy: andi-...@firstfloor.org CC: kyuk...@gcc.gnu.org Target: x86_64-linux The underlying __atomic_* C intrinsics support TSX HLE ACQUIRE/RELEASE, but libstdc++ atomic does not define the enum values or mask the bits. However extending the libstdc++ enum may run into a variant of PR55139
[Bug target/55139] __atomic store does not support __ATOMIC_HLE_RELEASE
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55139 --- Comment #3 from Andi Kleen 2012-11-07 14:45:17 UTC --- I saw the problem both with bootstrapped and non bootstrapped (4.6 base) compilers I haven't checked if it's always the missing and, but it's likely Ok I can change everything to int, but why did extending the enum not help? I fear that I may hide some bug this way. Also how to fix libstdc++? It seems to have a similar enum in its headers (it doesn't support HLE yet, but that's another bug)
[Bug target/55139] __atomic store does not support __ATOMIC_HLE_RELEASE
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55139 --- Comment #4 from Andi Kleen 2012-11-09 14:06:20 UTC --- My earlier analysis was not correct. I was chasing the wrong warning. Rather the problem is in c-common.c, where the atomic models are checked again. I'm sending a patch for that.
[Bug lto/41589] lto does not eliminate unused variables
--- Comment #1 from andi-gcc at firstfloor dot org 2009-10-05 14:04 --- Created an attachment (id=18710) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18710&action=view) tlto1.c -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41589
[Bug lto/41589] New: lto does not eliminate unused variables
Using built-in specs.
COLLECT_GCC=gcc45
COLLECT_LTO_WRAPPER=/pkg/gcc-4.5-091004/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.5.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc/configure --prefix=/pkg/gcc-4.5-091004 --enable-checking=release --enable-languages=c,c++ --disable-nls --enable-lto
Thread model: posix
gcc version 4.5.0 20091004 (experimental) (GCC)

With the attached simple lto test case I would have expected f1 and f2 and y to be optimized away when built with -fwhole-program. But that was not the case.

nm ./tlto
...
00601020 b completed.5856
00601010 W data_start
00601028 b dtor_idx.5858
00400560 T f1
00400550 T f2
004004f0 t frame_dummy
00400520 T main
         U printf@@GLIBC_2.2.5
00601030 B y

--
Summary: lto does not eliminate unused variables
Product: gcc
Version: 4.5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: lto
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: andi-gcc at firstfloor dot org
GCC host triplet: x86_64-linux
GCC target triplet: x86_64-linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41589
[Bug lto/41589] lto does not eliminate unused variables
--- Comment #2 from andi-gcc at firstfloor dot org 2009-10-05 14:04 --- Created an attachment (id=18711) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18711&action=view) tlto2.c -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41589
[Bug lto/41589] lto does not eliminate unused variables
--- Comment #3 from andi-gcc at firstfloor dot org 2009-10-05 14:04 --- Created an attachment (id=18712) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18712&action=view) Makefile.lto -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41589
[Bug lto/41591] New: documentation should document interaction of -flto and -fwhole-program
e.g. when -fwhole-program should be specified, at compile or at link time, and that it applies to the object files. I figured it out using the source and #gcc, but it would be better in the texinfo file. -- Summary: documentation should document interaction of -flto and -fwhole-program Product: gcc Version: 4.5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: lto AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: andi-gcc at firstfloor dot org GCC host triplet: x86_64-linux http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41591
[Bug tree-optimization/41589] lto does not eliminate unused variables
--- Comment #6 from andi-gcc at firstfloor dot org 2009-10-05 15:42 --- I use binutils 2.19 (from opensuse 11.1). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41589
[Bug lto/46905] -flto -fno-lto does not disable lto
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46905 --- Comment #5 from Andi Kleen 2011-01-08 23:16:25 UTC --- Slim LTO will take some time (next stage1). I also plan to drop most of the code, because with a forced plugin the ELF code in collect2 should not be needed anymore.
[Bug lto/46905] -flto -fno-lto does not disable lto
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46905 --- Comment #6 from Andi Kleen 2011-01-08 23:56:48 UTC --- And to add: if you have more fixes for -fno-lto please add them now, don't wait.
[Bug preprocessor/47311] [4.6 Regression][C++0x] ICE in tsubst @cp/pt.c:10502
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47311 --- Comment #19 from Andi Kleen 2011-01-17 19:59:23 UTC --- Sounds like a valgrind bug to me. It should know that the string instruction does not examine the values after the terminator character and the length.
[Bug bootstrap/58090] New: bootstrap fails comparison with --enable-gather-detailed-mem-stats
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58090 Bug ID: 58090 Summary: bootstrap fails comparison with --enable-gather-detailed-mem-stats Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: bootstrap Assignee: unassigned at gcc dot gnu.org Reporter: andi-gcc at firstfloor dot org On x86_64-linux Works without --enable-gather-detailed-mem-stats make[2]: *** [compare] Error 1 make[1]: *** [stage3-bubble] Error 2 make: *** [all] Error 2
[Bug target/50302] New: inefficient float->double conversion in AVX with -mtune=generic
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50302

Bug #: 50302
Summary: inefficient float->double conversion in AVX with -mtune=generic
Classification: Unclassified
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: andi-...@firstfloor.org

I noticed that with AVX and -mtune=generic, when converting a single float to a double, gcc still generates

vunpcklps reg,reg
vcvtps2pd reg,reg

instead of the more straightforward and likely more power efficient

vcvtss2sd reg,reg

AFAIK the first sequence was only needed on some older AMD CPUs with SSE to avoid a conversion penalty. Does it really still make sense for AVX? Perhaps that should be fixed for tune=generic?

Test case:

#include <stdio.h>

float a = 1, b = 2;
float c;

int main(void)
{
  c = a + b;
  printf("%f\n", c);
  return 0;
}
[Bug middle-end/49282] malloc corruption in large lto1-wpa run during inline edge heap resize
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49282 --- Comment #7 from Andi Kleen 2011-09-13 16:00:24 UTC --- I haven't tried 32bit or GCOV recently, so not sure. I can try next time. I was still stuck on the other problem with the confused linker plugin ids.
[Bug lto/50511] New: gcc lto streamer in fragments memory badly
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50511 Bug #: 50511 Summary: gcc lto streamer in fragments memory badly Classification: Unclassified Product: gcc Version: 4.7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: lto AssignedTo: unassig...@gcc.gnu.org ReportedBy: andi-...@firstfloor.org I ran into a problem while testing LTO on a quite large project with a lot of object files: the lto streamer fragmented the memory map badly by constantly mapping and unmapping the input files with mmap. I ended up with a memory map with lots of 2 page holes between mappings. Eventually it bumped into the default 64k max number of mappings limit on Linux and errored out because mmap failed. Workaround was to increase this limit (sysctl -w vm.max_map_count = 20) However gcc should be more efficient in its mappings. I think the problem is the one off cache in lto_file_read() being too dumb. Looking into a fix.
[Bug lto/50568] [4.7 Regression] Massive LTO failures
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50568 --- Comment #2 from Andi Kleen 2011-09-29 15:58:26 UTC --- Looking...
[Bug lto/50568] [4.7 Regression] Massive LTO failures
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50568 --- Comment #6 from Andi Kleen 2011-09-29 18:03:21 UTC --- I don't see the problem on a 64bit bootstrap-lto. I guess I must have written some 32bit unsafe code.
[Bug lto/50568] [4.7 Regression] Massive LTO failures
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50568 --- Comment #9 from Andi Kleen 2011-09-29 18:17:08 UTC --- Created attachment 25381 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25381 Use long long in lto-plugin Can you please test this patch? Thanks.
[Bug lto/50568] [4.7 Regression] Massive LTO failures
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50568 --- Comment #10 from Andi Kleen 2011-09-29 18:19:02 UTC --- I did the same patch (with long long). I think using long long here is ok because lto-plugin only builds on modern and non-weird hosts, and they should all have long long anyway. uint64_t is probably fine too.
[Bug lto/50568] [4.7 Regression] Massive LTO failures
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50568 --- Comment #14 from Andi Kleen 2011-09-29 18:27:11 UTC --- But that's what I did?
% diffstat plugin-fix
lto-plugin.c | 11 ++-
1 file changed, 6 insertions(+), 5 deletions(-)
I don't see why long long cannot be used on the platforms supporting plugins (windows, darwin, linux)
[Bug lto/50568] [4.7 Regression] Massive LTO failures
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50568 --- Comment #15 from Andi Kleen 2011-09-29 18:28:18 UTC --- Hmm good point. Maybe the splay tree can be fixed. Otherwise we have to use 32bit ids on 32bit, but then the risk of collisions is higher again.
[Bug lto/50568] [4.7 Regression] Massive LTO failures
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50568 --- Comment #18 from Andi Kleen 2011-09-29 20:36:50 UTC --- Created attachment 25384 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25384 fix + splay tree I have some unrelated trouble with a 32bit bootstrap currently. This patch should fix all the problems with the splay tree, by allocating the key separately. Can you give it a test please? Thanks
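The idea of allocating the key separately can be sketched roughly as follows. This is only an illustration and not the code from the attachment: `tree_key`, `make_id_key`, `key_id` and `free_id_key` are hypothetical names, and `tree_key` merely stands in for a pointer-sized key type. On a 32bit host such a key cannot hold a 64bit id directly, so the sketch stores the id out of line and uses its address as the key.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pointer-sized key type; only 32 bits wide on a 32bit host. */
typedef uintptr_t tree_key;

/* Store the 64bit id out of line and use its address as the key. */
tree_key make_id_key(unsigned long long id)
{
    unsigned long long *slot = malloc(sizeof *slot);
    assert(slot != NULL);
    *slot = id;
    return (tree_key)slot;
}

/* Recover the full 64bit id from a key made by make_id_key. */
unsigned long long key_id(tree_key key)
{
    return *(const unsigned long long *)key;
}

/* Release the out-of-line storage when the tree node goes away. */
void free_id_key(tree_key key)
{
    free((void *)key);
}
```

With keys built this way, the tree's comparison callback has to compare `key_id(a)` with `key_id(b)` rather than the key values themselves.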
[Bug lto/50568] [4.7 Regression] Massive LTO failures
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50568 --- Comment #24 from Andi Kleen 2011-09-29 23:06:35 UTC --- Thanks. Does it work with this change?
[Bug lto/50568] [4.7 Regression] Massive LTO failures
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50568 --- Comment #27 from Andi Kleen 2011-09-29 23:21:12 UTC --- Hmm is that just for efficiency or did you fix another bug? (not worrying about efficiency too much because this tree has only one entry per input file)
[Bug lto/50568] [4.7 Regression] Massive LTO failures
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50568 --- Comment #28 from Andi Kleen 2011-09-29 23:22:17 UTC --- I don't understand which overflow you refer to. Can you please clarify? afaik a - b is the standard way to write these comparison functions.
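For reference, a hedged sketch of why a plain a - b comparison can be a concern with wide keys. The thread does not show which overflow is meant, so this is only one possible reading: if the ids are 64 bits wide and the comparison function returns int, the difference has to be narrowed and its sign can come out wrong. Both function names are hypothetical.

```c
#include <assert.h>

/* Subtraction-style comparison: the 64bit difference is converted to int.
   The conversion is implementation-defined for out-of-range values and on
   typical hosts keeps only the low bits, so the sign may be wrong. */
int compare_by_subtraction(unsigned long long a, unsigned long long b)
{
    return (int)(a - b);
}

/* Ordering-style comparison: returns -1, 0 or 1 for any pair of inputs. */
int compare_by_ordering(unsigned long long a, unsigned long long b)
{
    return (a > b) - (a < b);
}
```

For small values both versions agree in sign; they can only diverge once the difference no longer fits in an int.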
[Bug lto/50568] [4.7 Regression] Massive LTO failures
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50568 --- Comment #30 from Andi Kleen 2011-09-29 23:33:52 UTC --- Okay. Can you post the patch then?
[Bug target/50583] Many __sync_XXX builtin functions are incorrect
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50583 --- Comment #6 from Andi Kleen 2011-09-30 23:35:29 UTC --- Can't say I'm a fan of adding such a heavy weight sequence into an intrinsic. Maybe better to simply leave out the intrinsics that cannot be implemented with loops? If someone wants a loop they better open code it. It would be nice if you implemented the ors, nands and ands with bts, btr etc if the second argument is a constant and only has one bit set.
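A small sketch of the condition mentioned above, "a constant with only one bit set". The helper names are hypothetical, and whether the compiler actually turns the last function into a locked bit-test-and-set is an assumption about code generation, not something the sketch can show. `__sync_fetch_and_or` is the existing GCC builtin.

```c
#include <assert.h>

/* True if exactly one bit is set in mask. */
int has_single_bit(unsigned long mask)
{
    return mask != 0 && (mask & (mask - 1)) == 0;
}

/* Index of the lowest set bit; mask must be non-zero. */
int lowest_bit_index(unsigned long mask)
{
    int index = 0;

    assert(mask != 0);
    while ((mask & 1UL) == 0) {
        mask >>= 1;
        index++;
    }
    return index;
}

/* Atomic OR with a single-bit constant: the kind of call that could, in
   principle, be lowered to one locked bit instruction instead of a loop. */
int test_and_set_bit3(unsigned long *word)
{
    unsigned long old = __sync_fetch_and_or(word, 1UL << 3);
    return (int)((old >> 3) & 1UL);
}
```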
[Bug tree-optimization/50587] New: ICE init_range_entry, at tree-ssa-reassoc.c:1698 caused by recent change
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50587 Bug #: 50587 Summary: ICE init_range_entry, at tree-ssa-reassoc.c:1698 caused by recent change Classification: Unclassified Product: gcc Version: unknown Status: UNCONFIRMED Severity: major Priority: P3 Component: tree-optimization AssignedTo: unassig...@gcc.gnu.org ReportedBy: andi-...@firstfloor.org Jakub, your recent change
PR tree-optimization/46309
* fold-const.c (make_range, merge_ranges): Remove prototypes.
(make_range_step): New function.
(make_range): Use it.
* tree.h (make_range_step): New prototypes.
* Makefile.in (tree-ssa-reassoc.o): Depend on $(DIAGNOSTIC_CORE_H).
* tree-ssa-reassoc.c: Include diagnostic-core.h.
(struct range_entry): New type.
(init_range_entry, range_entry_cmp, update_range_test, optimize_range_tests): New functions.
(reassociate_bb): Call optimize_range_tests.
* gcc.dg/pr46309.c: New test.
breaks my LTO kernel builds. I get a lot of
internal compiler error: in init_range_entry, at tree-ssa-reassoc.c:1698
in different files. With the patch reverted things are fine.
[Bug tree-optimization/50602] New: ICE in tree_nrv, at tree-nrv.c:155 during large LTO build
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50602 Bug #: 50602 Summary: ICE in tree_nrv, at tree-nrv.c:155 during large LTO build Classification: Unclassified Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassig...@gcc.gnu.org ReportedBy: andi-...@firstfloor.org I get this at the end of a large 32bit LTO build. Cannot give you a small test case unless you want the full builddir. Bisect is difficult because the build relies on some recent other fixes. But I have a core file:
#6 0x00b47eb4 in fancy_abort (file=Unhandled dwarf expression opcode 0xf3 ) at ../../gcc/gcc/diagnostic.c:893
#7 0x0075ac05 in tree_nrv () at ../../gcc/gcc/tree-nrv.c:155
#8 0x0068d7ab in execute_one_pass (pass=0x10a9ac0) at ../../gcc/gcc/passes.c:2064
#9 0x0068dae5 in execute_pass_list (pass=0x10a9ac0) at ../../gcc/gcc/passes.c:2119
#10 0x0075df39 in tree_rest_of_compilation (fndecl=0x2b84d3ea6900) at ../../gcc/gcc/tree-optimize.c:420
#11 0x0051ad36 in cgraph_expand_function (node=0x2b84e28a7360) at ../../gcc/gcc/cgraphunit.c:1805
#12 0x0051c612 in cgraph_output_in_order () at ../../gcc/gcc/cgraphunit.c:1962
#13 cgraph_optimize () at ../../gcc/gcc/cgraphunit.c:2136
#14 0x004cb7c5 in lto_main () at ../../gcc/gcc/lto/lto.c:2872
...
#7 0x0075ac05 in tree_nrv () at ../../gcc/gcc/tree-nrv.c:155
155 gcc_assert (ret_val == result);
(gdb) p ret_val
$3 = (tree_node *) 0x0
(gdb) pt result
type = union tree_node { tree_base base; tree_typed typed; tree_common common; tree_int_cst int_cst; tree_real_cst real_cst; tree_fixed_cst fixed_cst; tree_vector vector; tree_string string; tree_complex complex; tree_identifier identifier; tree_decl_minimal decl_minimal; tree_decl_common decl_common; tree_decl_with_rtl decl_with_rtl; tree_decl_non_common decl_non_common; tree_parm_decl parm_decl; tree_decl_with_vis decl_with_vis; tree_var_decl var_decl; tree_field_decl field_decl; tree_label_decl label_decl; tree_result_decl result_decl; tree_const_decl const_decl; tree_type_decl type_decl; tree_function_decl function_decl; tree_translation_unit_decl translation_unit_decl; tree_type_common type_common; tree_type_with_lang_specific type_with_lang_specific; tree_type_non_common type_non_common; tree_list list; tree_vec vec; tree_exp exp; tree_ssa_name ssa_name; tree_block block; tree_binfo binfo; tree_statement_list stmt_list; tree_constructor constructor; tree_omp_clause omp_clause; tree_optimization_option optimization; tree_target_option target_option; } *
[Bug tree-optimization/50602] ICE in tree_nrv, at tree-nrv.c:155 during large LTO build
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50602 Andi Kleen changed: What|Removed |Added Version|unknown |4.7.0 --- Comment #1 from Andi Kleen 2011-10-03 16:40:47 UTC --- Seen with gcc version 4.7.0 20111002 (experimental) (GCC)
[Bug tree-optimization/50602] ICE in tree_nrv, at tree-nrv.c:155 during large LTO build
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50602 --- Comment #3 from Andi Kleen 2011-10-04 13:22:09 UTC --- Hmm, are you saying gdb fooled me? Any other suggestions how to debug it?
[Bug c/50624] New: detecting array overflows regressed
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50624 Bug #: 50624 Summary: detecting array overflows regressed Classification: Unclassified Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c AssignedTo: unassig...@gcc.gnu.org ReportedBy: andi-...@firstfloor.org Created attachment 25424 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25424 overflow tester The attached program tests 5 different array overflows that the compiler should be able to detect at compile time. gcc 4.5 detects 2 out of 5 with -O2 -Wall: overflow.c:14:7: warning: array subscript is above array bounds overflow.c:22:12: warning: array subscript is above array bounds Current mainline detects zero. gcc version 4.7.0 20111002 (experimental) (GCC)
[Bug c/50624] detecting array overflows regressed
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50624 --- Comment #2 from Andi Kleen 2011-10-05 18:56:24 UTC --- Thanks. It's not a pure regression. Even 4.5 misses some easy cases: especially the local stack array case, which should be in theory really easy.
[Bug c/50624] detecting array overflows regressed
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50624 --- Comment #5 from Andi Kleen 2011-10-06 14:49:19 UTC --- Easy case = constant expressions as index? Would the frontend be able to handle short array[1]; i = 1; array[i]?
[Bug other/50636] New: GC in large LTO builds cause excessive fragmentation in memory map
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50636 Bug #: 50636 Summary: GC in large LTO builds cause excessive fragmentation in memory map Classification: Unclassified Product: gcc Version: 4.7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: other AssignedTo: unassig...@gcc.gnu.org ReportedBy: andi-...@firstfloor.org When doing a very large LTO build I fail with "out of virtual memory". Some investigation showed the problem was not actually running out of memory, but gcc excessively fragmenting its memory map. The Linux kernel has a default limit of 64k mappings per process and the fragmentation exceeded that. This led to gc mmap allocations failing and other problems. A workaround is to increase /proc/sys/vm/max_map_count. Looking at /proc/$(pidof lto1)/maps I see there are lots of 1-3 page holes between other anonymous memory. I think that's caused by ggc-page's free_pages() function freeing too early and in too small chunks (and perhaps LTO garbage collecting too much?)
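A rough way to watch how close a process gets to the mapping limit is to count the lines of its maps file and compare that with the limit. The sketch below assumes a Linux /proc layout with one mapping per line; the function names are hypothetical and this is not part of gcc.

```c
#include <assert.h>
#include <stdio.h>

/* Count the lines in a maps-style file; each line is one mapping.
   Returns -1 if the file cannot be opened. */
long count_mappings(const char *maps_path)
{
    FILE *f;
    long count = 0;
    int c;

    assert(maps_path != NULL);
    f = fopen(maps_path, "r");
    if (f == NULL)
        return -1;
    while ((c = getc(f)) != EOF)
        if (c == '\n')
            count++;
    fclose(f);
    return count;
}

/* Read a single integer limit such as /proc/sys/vm/max_map_count.
   Returns -1 on failure. */
long read_limit(const char *limit_path)
{
    FILE *f;
    long limit = -1;

    assert(limit_path != NULL);
    f = fopen(limit_path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &limit) != 1)
        limit = -1;
    fclose(f);
    return limit;
}
```

Calling `count_mappings` on the maps file of the lto1 process and `read_limit` on /proc/sys/vm/max_map_count gives the two numbers to compare.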
[Bug other/50639] New: -flto=jobserver broken on large LTO build
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50639 Bug #: 50639 Summary: -flto=jobserver broken on large LTO build Classification: Unclassified Product: gcc Version: 4.7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: other AssignedTo: unassig...@gcc.gnu.org ReportedBy: andi-...@firstfloor.org On a -j8 Linux kernel LTO build with -flto=jobserver I always end up with
make[3]: *** read jobs pipe: No such file or directory. Stop.
make[3]: *** Waiting for unfinished jobs
lto-wrapper: make returned 2 exit status
/usr/local/bin/ld-plugin: lto-wrapper failed
at the final link stage. The link is actually succeeding, but something confuses the jobserver. It works with -flto=8. The message comes from make. It could be a regression, but it's hard to say because much older builds usually ran into other problems.
[Bug other/50636] GC in large LTO builds cause excessive fragmentation in memory map
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50636 --- Comment #3 from Andi Kleen 2011-10-06 21:31:56 UTC --- I would prefer to free in 2MB chunks if possible. I was experimenting with increasing the quire size from 1 to 2MB so that a modern kernel with transparent huge pages can always get a huge page. If the freeing also happens in the same chunks the kernel can keep the 2MB pages together. But yes, MADV_DONTNEED makes sense too.
[Bug other/50636] GC in large LTO builds cause excessive fragmentation in memory map
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50636 --- Comment #5 from Andi Kleen 2011-10-06 21:46:32 UTC --- If it's a 2MB page then madvise MADV_DONTNEED will split it if it's not 2MB aligned. It would be good to optimize the freeing pattern so that this happens rarely. I will try to get some numbers on how much 2MB pages improve performance. If they help, we could also do a MADV_HUGEPAGE by default. But a short term fix would be simply to have a threshold for free pages before freeing?
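To make the alignment point concrete, here is a minimal sketch that shrinks a free range to the part that starts and ends on a 2MB boundary, so that releasing it would not cut through a huge page. It assumes a 2MB huge page size; `HUGE_PAGE_SIZE` and `huge_aligned_subrange` are hypothetical names, and address overflow near the top of the address space is not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HUGE_PAGE_SIZE ((uintptr_t)2 * 1024 * 1024)  /* assumed 2MB huge page */

/* Shrink [start, start + len) to the largest sub-range whose start and end
   are both 2MB aligned. Returns the aligned length (0 if none fits) and
   stores the aligned start in *aligned_start. */
size_t huge_aligned_subrange(uintptr_t start, size_t len,
                             uintptr_t *aligned_start)
{
    uintptr_t end = start + len;
    uintptr_t first = (start + HUGE_PAGE_SIZE - 1) & ~(HUGE_PAGE_SIZE - 1);
    uintptr_t last = end & ~(HUGE_PAGE_SIZE - 1);

    assert(aligned_start != NULL);
    *aligned_start = first;
    if (last <= first)
        return 0;
    return (size_t)(last - first);
}
```

A free routine could pass only the aligned sub-range to madvise and keep the unaligned head and tail on its free list until neighbouring pages are freed as well.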
[Bug tree-optimization/50644] New: ICE in set_is_used added today
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50644 Bug #: 50644 Summary: ICE in set_is_used added today Classification: Unclassified Product: gcc Version: 4.7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassig...@gcc.gnu.org ReportedBy: andi-...@firstfloor.org Since updating to today's trunk I get an ICE in set_is_used while building a LTOed linux kernel. Yesterday it didn't happen. Running a bisect. Here's the crash
#7
#8 set_is_used (var=) at ../../gcc/gcc/tree-flow-inline.h:562
#9 mark_all_vars_used_1 (var=) at ../../gcc/gcc/tree-ssa-live.c:379
#10 0x00860b3e in walk_tree_1 (tp=0x2b11d2f00c00, func=0x7a4390, data=0x4296a40, pset=0x0, lh=0) at ../../gcc/gcc/tree.c:10448
#11 0x00860f89 in walk_tree_1 (tp=0x2b11d2efacd0, func=0x7a4390, data=0x4296a40, pset=0x0, lh=0) at ../../gcc/gcc/tree.c:10526
#12 0x007a4eb5 in mark_all_vars_used (data=, expr_p=) at ../../gcc/gcc/tree-ssa-live.c:595
#13 remove_unused_locals (data=, expr_p=) at ../../gcc/gcc/tree-ssa-live.c:798
#14 0x0068c268 in execute_function_todo (data=Unhandled dwarf expression opcode 0xf3 ) at ../../gcc/gcc/passes.c:1695
#15 0x0068d114 in execute_todo (flags=2132516) at ../../gcc/gcc/passes.c:1741
#16 0x0068f3ce in execute_one_ipa_transform_pass (ipa_pass=0x10ac6e0, node=0x2b11e3116ea0) at ../../gcc/gcc/passes.c:1919
#17 execute_all_ipa_transforms (ipa_pass=0x10ac6e0, node=0x2b11e3116ea0) at ../../gcc/gcc/passes.c:1947
#18 0x0075fd20 in tree_rest_of_compilation (fndecl=0x2b11d2ed7300) at ../../gcc/gcc/tree-optimize.c:413
#19 0x0051b8a6 in cgraph_expand_function (node=0x2b11e3116ea0) at ../../gcc/gcc/cgraphunit.c:1805
#20 0x0051d182 in cgraph_output_in_order () at ../../gcc/gcc/cgraphunit.c:1962
#21 cgraph_optimize () at ../../gcc/gcc/cgraphunit.c:2136
...
(gdb) up
#8 set_is_used (var=) at ../../gcc/gcc/tree-flow-inline.h:562
562 ann->used = true;
(gdb) p ann
$1 = (var_ann_d *) 0x0
[Bug tree-optimization/50644] ICE in set_is_used added today
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50644 Andi Kleen changed: What|Removed |Added CC||matz at gcc dot gnu.org --- Comment #1 from Andi Kleen 2011-10-06 23:53:57 UTC --- Problem is caused by
commit 6d3d8bf0e6cb73524be01e28cb82a484cd3d11fd
Author: matz
Date: Thu Oct 6 15:18:12 2011 +
* tree-flow.h (get_var_ann): Don't declare.
* tree-flow-inline.h (get_var_ann): Remove.
(set_is_used): Use var_ann, not get_var_ann.
* tree-dfa.c (add_referenced_var): Inline body of get_var_ann.
* tree-profile.c (gimple_gen_edge_profiler): Call find_referenced_var_in.
(gimple_gen_interval_profiler): Ditto.
(gimple_gen_pow2_profiler): Ditto.
(gimple_gen_one_value_profiler): Ditto.
(gimple_gen_average_profiler): Ditto.
(gimple_gen_ior_profiler): Ditto.
(gimple_gen_ic_profiler): Ditto plus call add_referenced_var.
(gimple_gen_ic_func_profiler): Call add_referenced_var.
* tree-mudflap.c (execute_mudflap_function_ops): Call add_referenced_var.
I cannot give you a small test case because it needs a full LTO builddir
[Bug lto/44992] ld -r breaks LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44992 Andi Kleen changed: What|Removed |Added Status|REOPENED|RESOLVED Resolution||FIXED --- Comment #10 from Andi Kleen 2011-10-07 05:42:50 UTC --- I consider this fixed now because everything I need works together with HJ's binutils. Mainstream binutils will hopefully catch up eventually.
[Bug lto/46905] -flto -fno-lto does not disable lto
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46905 Andi Kleen changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution||FIXED --- Comment #8 from Andi Kleen 2011-10-07 05:43:55 UTC --- AFAIK this works now.
[Bug lto/45475] target use in libcpp breaks LTO bootstrap
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45475 Andi Kleen changed: What|Removed |Added Status|NEW |RESOLVED Resolution||FIXED --- Comment #12 from Andi Kleen 2011-10-07 05:45:29 UTC --- Fixed for quite some time
[Bug other/50636] GC in large LTO builds cause excessive fragmentation in memory map
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50636 --- Comment #6 from Andi Kleen 2011-10-07 05:47:54 UTC --- *** Bug 50302 has been marked as a duplicate of this bug. ***
[Bug middle-end/49282] malloc corruption in large lto1-wpa run during inline edge heap resize
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49282 Andi Kleen changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution||FIXED --- Comment #8 from Andi Kleen 2011-10-07 05:49:13 UTC --- Haven't seen this for some time with different builds, so it's probably fixed
[Bug other/50636] GC in large LTO builds cause excessive fragmentation in memory map
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50636 --- Comment #7 from Andi Kleen 2011-10-07 05:50:40 UTC --- *** Bug 50511 has been marked as a duplicate of this bug. ***
[Bug lto/50511] gcc lto streamer in fragments memory badly
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50511 Andi Kleen changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution||DUPLICATE --- Comment #1 from Andi Kleen 2011-10-07 05:50:40 UTC --- Was likely the same problem as 50636 *** This bug has been marked as a duplicate of bug 50636 ***
[Bug target/50302] inefficient float->double conversion in AVX with -mtune=generic
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50302 Andi Kleen changed: What|Removed |Added Status|NEW |RESOLVED Resolution||DUPLICATE --- Comment #2 from Andi Kleen 2011-10-07 05:47:54 UTC --- Was actually a dup of the GC problem. I tried fixing the one-off cache, but it didn't fix the fragmentation *** This bug has been marked as a duplicate of bug 50636 ***
[Bug lto/44463] whopr does not work with weak functions
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44463 --- Comment #12 from Andi Kleen 2011-10-07 05:52:08 UTC --- Honza, I think that is fixed now, correct? I should probably drop my workarounds but haven't yet
[Bug target/50302] inefficient float->double conversion in AVX with -mtune=generic
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50302 --- Comment #4 from Andi Kleen 2011-10-07 14:40:02 UTC --- Sorry yes my mistake.
[Bug other/50636] GC in large LTO builds cause excessive fragmentation in memory map
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50636 --- Comment #10 from Andi Kleen 2011-10-07 14:44:10 UTC --- To track the pattern you can simply use strace or ftrace (I did ftrace). I checked the kernel code now and if the madvise is big enough it won't split up the 2MB page. So doing it aggressively should be ok, but still it may be beneficial to skip it for very scattered pages. I suspect other OSes don't have MADV_DONTNEED; they would probably need Honza's pool idea. I did a prototype patch now, will be testing it.
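For readers unfamiliar with the call, a minimal sketch of what releasing pages with MADV_DONTNEED looks like, assuming a Linux-like system with anonymous private mappings. It is not the prototype patch; `release_pages_keep_mapping` is a hypothetical name. The point is that the memory is handed back while the mapping stays in place, so no new hole appears in the memory map.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Map len bytes of anonymous memory, dirty them, then hand the pages back
   with MADV_DONTNEED while keeping the mapping itself in place.
   Returns 0 on success, -1 if mmap or madvise fails. */
int release_pages_keep_mapping(size_t len)
{
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    int rc;

    if (p == MAP_FAILED)
        return -1;
    memset(p, 0xab, len);                /* touch every page */
    rc = madvise(p, len, MADV_DONTNEED); /* free the pages, keep the mapping */
    if (rc == 0)
        p[0] = 1;                        /* the range is still usable */
    munmap(p, len);
    return rc == 0 ? 0 : -1;
}
```

By contrast, calling munmap on a few pages in the middle of a larger mapping splits it in two, which is how the mapping count grows.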
[Bug other/50636] GC in large LTO builds cause excessive fragmentation in memory map
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50636 --- Comment #11 from Andi Kleen 2011-10-08 16:47:54 UTC --- Created attachment 25445 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25445 patchkit I tested this patchkit which implements most of the ideas from this bug, unfortunately still the same problem (with 20% threshold and madvise). From the core file I see a lot of 2-3 page holes still:
65218 load65205 00019000 2ae45c515000 bfea1000 2**12 CONTENTS, ALLOC, LOAD
65219 load65206 00018000 2ae45c52f000 bfeba000 2**12 CONTENTS, ALLOC, LOAD
65220 load65207 00013000 2ae45c548000 bfed2000 2**12 CONTENTS, ALLOC, LOAD
65221 load65208 00019000 2ae45c55d000 bfee5000 2**12 CONTENTS, ALLOC, LOAD
65222 load65209 1000 2ae45c577000 bfefe000 2**12 CONTENTS, ALLOC, LOAD
65223 load65210 00044000 2ae45c579000 bfeff000 2**12 CONTENTS, ALLOC, LOAD
65224 load65211 0003d000 2ae45c5be000 bff43000 2**12 CONTENTS, ALLOC, LOAD
65225 load65212 00021000 2ae45c5fc000 bff8 2**12 CONTENTS, ALLOC, LOAD
65226 load65213 6000 2ae45c61e000 bffa1000 2**12 CONTENTS, ALLOC, LOAD
65227 load65214 0002d000 2ae45c625000 bffa7000 2**12 CONTENTS, ALLOC, LOAD
65228 load65215 00041000 2ae45c653000 bffd4000 2**12 CONTENTS, ALLOC, LOAD
65229 load65216 b000 2ae45c695000 c0015000 2**12 CONTENTS, ALLOC, LOAD
65230 load65217 1000 2ae45c6a1000 c002 2**12 CONTENTS, ALLOC, LOAD
65231 load65218 f000 2ae45c6a3000 c0021000 2**12 CONTENTS, ALLOC, LOAD
65232 load65219 1000 2ae45c6b3000 c003 2**12 CONTENTS, ALLOC, LOAD
65233 load65220 00031000 2ae45c6b5000 c0031000 2**12 CONTENTS, ALLOC, LOAD
65234 load65221 0001a000 2ae45c6e7000 c0062000 2**12 CONTENTS, ALLOC, LOAD
65235 load65222 0001c000 2ae45c702000 c007c000 2**12 CONTENTS, ALLOC, LOAD
[Bug lto/50666] New: bad error reporting for TMPDIR full
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50666 Bug #: 50666 Summary: bad error reporting for TMPDIR full Classification: Unclassified Product: gcc Version: 4.7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: lto AssignedTo: unassig...@gcc.gnu.org ReportedBy: andi-...@firstfloor.org When $TMPDIR gets full for a LTO build you just get "ld exited with error 1" which is very unhelpful. It would be better if there was a real error message for this.
[Bug other/50636] GC in large LTO builds cause excessive fragmentation in memory map
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50636 Andi Kleen changed: What|Removed |Added Attachment #25445|0 |1 is obsolete|| --- Comment #12 from Andi Kleen 2011-10-08 19:54:59 UTC --- Created attachment 25446 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25446 updated patchkit This version seems to work. I am at under 1000 mappings now for the case that failed previously.
[Bug other/50636] GC in large LTO builds cause excessive fragmentation in memory map
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50636 --- Comment #14 from Andi Kleen 2011-10-08 21:10:13 UTC --- Thanks for the review. Fixed the accounting. I'll leave the xmalloc_failed hook out for now: it would need a retry path which is somewhat complicated. If it's needed I would probably just add another separate threshold that forces munmapping. BTW I also filed a bug for the glibc problem this triggered: http://sourceware.org/bugzilla/show_bug.cgi?id=13276
[Bug middle-end/25957] -fstack-protector code on i386/x86_64 can be improved.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25957 --- Comment #11 from Andi Kleen 2011-10-08 23:27:02 UTC --- I just checked and the problem is still there with 4.7.0 20111002
xorq %fs:40, %rax
jne .L4
addq $120, %rsp
.cfi_remember_state
.cfi_def_cfa_offset 8
ret
.L4:
.cfi_restore_state
.p2align 4,,6
call __stack_chk_fail
.cfi_endproc
.LFE0:
Unnecessary wasteful alignment for the call to abort. The basic block should be marked cold.
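At the source level, the same intent (a failure path that is never expected to run) can be expressed with GCC's cold and noreturn function attributes and `__builtin_expect`. This is only a user-level analogy under that assumption, not how the stack protector epilogue is generated inside the compiler; `check_failed` and `checked_return` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical failure handler: cold tells GCC the path is unlikely, so it
   can be moved out of line and optimized for size; noreturn documents that
   control never comes back. */
__attribute__((cold, noreturn)) void check_failed(void)
{
    abort();
}

/* Compare a guard value against the expected one, in the style of a
   stack-protector epilogue; the failure branch only calls the cold handler. */
int checked_return(unsigned long guard, unsigned long expected, int value)
{
    if (__builtin_expect(guard != expected, 0))
        check_failed();
    return value;
}
```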
[Bug tree-optimization/50602] ICE in tree_nrv, at tree-nrv.c:155 during large LTO build
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50602 --- Comment #5 from Andi Kleen 2011-10-09 02:31:38 UTC --- Looked at this again now. debug_function doesn't work. TDF_UID was also not available, but I hardcoded it.
(gdb) call debug_function (cfun->decl, 1<<8)
(gdb)
Neither does the other call:
(gdb) call print_generic_expr(stderr, result, 1 << 8)
(gdb)
Looking at the code
0x0075a827 : test %rax,%rax
0x0075a82a : je 0x75a7f2
0x0075a82c : cmp %rax,%rbx
0x0075a82f : je 0x75a7f2
0x0075a831 : mov $0xbe3bb7,%edx
0x0075a836 : mov $0x9b,%esi
0x0075a83b : mov $0xbe3b7a,%edi
0x0075a840 : callq 0xb488e0
0x0075a845 : nopl (%rax)
I'm in the fancy_abort. When I go up and print $rax I get 0
(gdb) p $rax
$11 = 0
But this cannot be because the test tested it for 0. Perhaps the unwind information below is wrong?
[Bug tree-optimization/50602] ICE in tree_nrv, at tree-nrv.c:155 during large LTO build
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50602 --- Comment #6 from Andi Kleen 2011-10-09 04:05:52 UTC --- I changed the code now to save ret_val in a volatile global. This is a bit better
(gdb) p saved_ret_val
$5 = (volatile tree) 0x2afc557b68c0
(gdb) p result
$6 = (tree_node *) 0x2afbfb754a00
Still not sure how to print them, maybe stderr is broken. Looking at raw output I see one of them is a VAR_DECL and the other a RESULT_DECL
(gdb) p result->decl_minimal.uid
$9 = 83837
(gdb) p saved_ret_val->decl_minimal.uid
$10 = 3599083
(gdb) p cfun->decl->decl_minimal.uid
$3 = 83835
Searching for the second uid in the dump files I see it first in 045i.whole-program:
: return D.3599083;
and the first doesn't appear in any file (that means the current pass added it?) The third is first in 049.inline
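The debugging trick used here, copying a value into a volatile global so that it survives optimization and can be read from a core file, looks roughly like this in isolation. The names are hypothetical stand-ins, not the actual change made to tree-nrv.c.

```c
#include <assert.h>
#include <stddef.h>

/* Debug-only global: the pointer object itself is volatile, so the store
   below must be emitted and the value stays readable from a core file. */
const void *volatile saved_debug_value;

/* Hypothetical stand-in for the function under investigation. */
int values_match(const void *ret_val, const void *result)
{
    saved_debug_value = ret_val;   /* keep a copy the debugger can find */
    return ret_val == result;
}
```

Unlike a local that lives only in a register, the global does not depend on the debugger reconstructing register state for an outer frame.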
[Bug lto/50679] New: Linux kernel LTO tracking bug
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50679 Bug #: 50679 Summary: Linux kernel LTO tracking bug Classification: Unclassified Product: gcc Version: 4.7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: lto AssignedTo: unassig...@gcc.gnu.org ReportedBy: andi-...@firstfloor.org Meta bug to track various problems encountered while building the Linux kernel with LTO
[Bug lto/50620] "undefined reference" errors / csmith lto testing
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50620 Andi Kleen changed: What|Removed |Added CC||andi-gcc at firstfloor dot ||org --- Comment #2 from Andi Kleen 2011-10-09 14:33:18 UTC --- I see a similar problem in a large LTO build (at least Honza thinks it's the same)
[Bug tree-optimization/50644] ICE in set_is_used added today
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50644 --- Comment #5 from Andi Kleen 2011-10-13 13:22:40 UTC --- Note I need to keep reverting this patch to do any substantial builds. I hear it's also failing for others too. Any progress in fixing it? Thanks.
[Bug other/50783] New: builtin c++ demanger does not handle clones
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50783 Bug #: 50783 Summary: builtin c++ demanger does not handle clones Classification: Unclassified Product: gcc Version: 4.7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: other AssignedTo: unassig...@gcc.gnu.org ReportedBy: andi-...@firstfloor.org From an unrelated double bootstrap failure:
In function 'rtx_def* gen_movsfcc(rtx, rtx, rtx, rtx)':
vs
In function 'int _ZL8recog_61P7rtx_defS0_Pi.isra.35(rtvec_def**, rtx)':
Seems like the function with .isra.35 was not correctly demangled.
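One conceivable way to handle such names, sketched here purely as an illustration, is to split the symbol at the first '.' so that the mangled base name can be demangled on its own and the clone suffix appended afterwards. This assumes that the mangled part never contains a '.'; the function names are hypothetical and this is not how the builtin demangler is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the length of the mangled base name, i.e. everything before the
   first '.' in the symbol. Without a '.', the whole string is the base. */
size_t clone_base_length(const char *symbol)
{
    const char *dot = strchr(symbol, '.');
    return dot ? (size_t)(dot - symbol) : strlen(symbol);
}

/* Return a pointer to the clone suffix (".isra.35" style), or to the
   terminating NUL when the symbol has no suffix. */
const char *clone_suffix(const char *symbol)
{
    return symbol + clone_base_length(symbol);
}
```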