[Bug c/56434] New: document that __attribute__((__malloc__)) assumes returned pointer has BIGGEST_ALIGNMENT
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56434

             Bug #: 56434
           Summary: document that __attribute__((__malloc__)) assumes returned pointer has BIGGEST_ALIGNMENT
    Classification: Unclassified
           Product: gcc
           Version: 4.7.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: c...@pobox.com

The docs say that __attribute__((__malloc__)) only has one effect: informing the compiler that returned pointers do not alias other pointers. But reading the compiler output, and then reading gcc source code, proves that it also has a second effect: informing the compiler that returned pointers are aligned to BIGGEST_ALIGNMENT. To quote expand_call:

    /* The return value from a malloc-like function is a pointer.  */
    if (TREE_CODE (rettype) == POINTER_TYPE)
      mark_reg_pointer (temp, BIGGEST_ALIGNMENT);

This should be added to the documentation.

As a side issue, BIGGEST_ALIGNMENT changes on the i386 target depending on whether -mavx is specified (128 vs. 256). Is it really a good idea for gcc to assume different things about the behavior of malloc() depending on -mavx? It seems that perhaps an alignment of 128 should always be conferred on malloc on the i386 platform, regardless of -mavx? What would the new target macro be? SMALLEST_BIGGEST_ALIGNMENT? :)
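For readers who want to see what the attribute is being asked to promise, here is a minimal sketch. The names my_alloc and observed_alignment are hypothetical and do not come from the report; the wrapper simply forwards to malloc, and the helper reports the alignment an address actually has so it can be compared with whatever the compiler assumes.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper (not from the report).  The attribute tells GCC
   that the returned pointer does not alias any other pointer.  */
__attribute__((__malloc__))
void *my_alloc(size_t n)
{
    return malloc(n);
}

/* Largest power of two, capped at 4096, that divides the address.
   This is the alignment the pointer really has at run time.  */
size_t observed_alignment(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    size_t align = 1;

    while (align < 4096 && (a & align) == 0)
        align <<= 1;
    return align;
}
```

Calling observed_alignment(my_alloc(32)) on a given platform shows how much alignment the allocator actually delivers. Compiling a caller with gcc -O2 -S is one way to see what the compiler assumed.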
[Bug rtl-optimization/56434] document that __attribute__((__malloc__)) assumes returned pointer has BIGGEST_ALIGNMENT
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56434 --- Comment #2 from Chip Salzenberg 2013-02-25 17:51:36 UTC --- I detected this by observing that inlined strlen() on a malloc'd pointer did not first do an unaligned prologue. I expected it to first advance by bytes until it detected alignment, but it didn't do any of that; it leapt right into the word-sized optimized loop. This suggests that the compiler knows that an 8-byte-aligned (say) pointer has its low seven bits off and will evaporate away any code that depends on them being nonzero. Or is the strlen inlining special-cased?
[Bug rtl-optimization/56434] document that __attribute__((__malloc__)) assumes returned pointer has BIGGEST_ALIGNMENT
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56434 --- Comment #3 from Chip Salzenberg 2013-02-25 17:54:23 UTC --- I meant the low three bits off, for a maximum value of seven. Of course.
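The arithmetic behind that correction can be sketched in a few lines of C. The helper names are made up for illustration. For a power-of-two alignment of A bytes, the low log2(A) address bits are zero, so for A = 8 the mask is 7 (three bits), and an "unaligned prologue" would advance by at most 7 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Mask of the low address bits that must be zero for a pointer aligned
   to align_bytes (assumed to be a power of two).  For 8 this is 7.  */
uintptr_t low_bits_mask(size_t align_bytes)
{
    return (uintptr_t)align_bytes - 1;
}

/* Number of bytes a byte-wise prologue would have to step over before
   addr reaches the next align_bytes boundary (0 if already aligned).  */
size_t bytes_to_alignment(uintptr_t addr, size_t align_bytes)
{
    return (size_t)(((uintptr_t)0 - addr) & low_bits_mask(align_bytes));
}
```

If the compiler believes the pointer is already 8-byte aligned, bytes_to_alignment is known to be 0 at compile time, which is consistent with the prologue disappearing from the generated code.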
[Bug rtl-optimization/56434] document that __attribute__((__malloc__)) assumes returned pointer has BIGGEST_ALIGNMENT
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56434 --- Comment #5 from Chip Salzenberg 2013-02-25 20:02:24 UTC --- Indeed. So MALLOC_ABI_ALIGNMENT should perhaps default to the largest alignment of all the C89 types, with platform overrides as needed?
[Bug rtl-optimization/56434] document that __attribute__((__malloc__)) assumes returned pointer has BIGGEST_ALIGNMENT
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56434 --- Comment #7 from Chip Salzenberg 2013-03-21 00:31:40 UTC --- So ... is there still a question of the Right Thing here? It seems that fixing MALLOC_ABI_ALIGNMENT for the world, and ensuring that BIGGEST_ALIGNMENT never affects the ABI, are the actions to take. If this were done soon we could even see it fixed for 4.8.0. Help?
[Bug rtl-optimization/56434] document that __attribute__((__malloc__)) assumes returned pointer has BIGGEST_ALIGNMENT
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56434 --- Comment #10 from Chip Salzenberg 2013-03-22 20:20:11 UTC --- Thanks muchly. Then MALLOC_ABI_ALIGNMENT will need fixing, as Jakub observes, but that needed to happen anyway.
[Bug target/56726] New: i386: MALLOC_ABI_ALIGNMENT is too small (usually)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56726

             Bug #: 56726
           Summary: i386: MALLOC_ABI_ALIGNMENT is too small (usually)
    Classification: Unclassified
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: c...@pobox.com

Observed malloc alignment for the i386 ABI is double POINTER_SIZE. BITS_PER_WORD, the current default, is usually too small. (It's right only on X32.)

Proposed patch:

--- gcc/config/i386/i386.h	(revision 197055)
+++ gcc/config/i386/i386.h	(working copy)
@@ -815,6 +815,14 @@
   x86_field_alignment (FIELD, COMPUTED)
 #endif
 
+/* The maximum alignment 'malloc' honors.
+
+   This value is taken from glibc documentation for memalign().  It may
+   be up to double the very conservative GCC default.  This should be safe,
+   since even the GCC 4.8 default of BIGGEST_ALIGNMENT usually worked.  */
+
+#define MALLOC_ABI_ALIGNMENT (POINTER_SIZE * 2)
+
 /* If defined, a C expression to compute the alignment given to a
    constant that is being placed in memory.  EXP is the constant and
    ALIGN is the alignment that the object would ordinarily have
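GCC target macros such as POINTER_SIZE and MALLOC_ABI_ALIGNMENT are expressed in bits, so the proposed value works out to 64 bits (8 bytes) with 32-bit pointers and 128 bits (16 bytes) with 64-bit pointers. The sketch below only reproduces that arithmetic in portable C. The function names are invented for illustration and are not part of the patch.

```c
#include <limits.h>
#include <stddef.h>

/* Value the patch would give MALLOC_ABI_ALIGNMENT, in bits, for a
   target whose pointers are pointer_size_bits wide.  */
unsigned proposed_malloc_align_bits(unsigned pointer_size_bits)
{
    return pointer_size_bits * 2;
}

/* Pointer width of the machine this code is compiled for, in bits.  */
unsigned host_pointer_size_bits(void)
{
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}
```

On the host, proposed_malloc_align_bits(host_pointer_size_bits()) / CHAR_BIT equals 2 * sizeof(void *), the figure quoted for glibc malloc later in this thread.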
[Bug rtl-optimization/56434] document that __attribute__((__malloc__)) assumes returned pointer has BIGGEST_ALIGNMENT
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56434 --- Comment #12 from Chip Salzenberg 2013-03-25 19:15:10 UTC --- Thank you. I've filed #56726 with a patch to update MALLOC_ABI_ALIGNMENT on i386.
[Bug target/56726] i386: MALLOC_ABI_ALIGNMENT is too small (usually)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56726 --- Comment #2 from Chip Salzenberg 2013-03-25 21:35:19 UTC --- I'm a bit skeptical of that. Glibc malloc alignment is 2 * sizeof(void*), and void* in X32 is 32 bits. Unless X32 code uses the x86_64 libc, I am confused. PS: Hi, HJ
[Bug target/56726] i386: MALLOC_ABI_ALIGNMENT is too small (usually)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56726 --- Comment #4 from Chip Salzenberg 2013-03-25 22:35:57 UTC --- If I'm reading that correctly, it seems to agree with my patch. It looks like MALLOC_ABI_ALIGNMENT of POINTER_SIZE*2 is always either correct or smaller than necessary, but never too large. If MALLOC_ABI_ALIGNMENT is smaller than necessary then optimizations may be missed (depending on the values). But if it is too large then performance *will* suffer. It might even cause exceptions from unaligned accesses, but i386 is very forgiving, so it'll just be slower for no apparent reason. Perhaps the glibc version differences in malloc should be advertised with __attribute__ on the malloc declarations. Perhaps a new pragma or attribute is required to do this 100% right. But in the meantime I like POINTER_SIZE*2.
[Bug target/56726] i386: MALLOC_ABI_ALIGNMENT is too small (usually)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56726 --- Comment #6 from Chip Salzenberg 2013-03-29 06:05:19 UTC --- May I have this accepted?
[Bug target/20020] x86_64 - 128 bit structs not targeted to TImode
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20020 --- Comment #31 from Chip Salzenberg 2012-08-14 22:46:12 UTC --- I've tested the attached patch, and I find that it succeeds in preventing the current missed optimizations in structs passed by value from affecting 128-bit structs. IOW: Works for me. Thanks!
[Bug middle-end/28831] [4.6/4.7/4.8 Regression] Aggregate copy not elided when using a return value as a pass-by-value parameter
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28831 --- Comment #17 from Chip Salzenberg 2012-08-14 22:50:01 UTC --- The patch posted in Bug 20020 prevents missed optimization for 128-bit structures on x86_64. So this bug does seem to be all about the BLKmode.
[Bug target/20020] x86_64 - 128 bit structs not targeted to TImode
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20020 --- Comment #32 from Chip Salzenberg 2012-08-14 23:09:01 UTC ---
More good data: this patch reduces the size of libstdc++.so by 0.5%

$ size usr/lib/libstdc++.so.6.0.17 /usr/lib/libstdc++.so.6.0.17
   text    data     bss     dec     hex filename
 949608   36200   85088 1070896  105730 usr/lib/libstdc++.so.6.0.17
 955484   36200   85088 1076772  106e24 /usr/lib/libstdc++.so.6.0.17
[Bug target/20020] x86_64 - 128 bit structs not targeted to TImode
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20020 --- Comment #39 from Chip Salzenberg 2012-08-15 09:13:36 UTC --- avoiding BLKmode avoids unnecessary spills to memory. See Bug 28831 and Bug 41194 for examples.
[Bug middle-end/28831] [4.6/4.7/4.8 Regression] Aggregate copy not elided when using a return value as a pass-by-value parameter
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28831 --- Comment #18 from Chip Salzenberg 2012-08-15 18:00:39 UTC --- What will it take to get this fixed? Pass by value is Big in C++11 style, with move semantics designed to tie right into the optimization that's being missed here. This is sucking a lot for C++.
[Bug rtl-optimization/44194] struct returned by value generates useless stores
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44194 --- Comment #44 from Chip Salzenberg 2012-09-12 23:21:21 UTC --- Note that the x86 target has been changed in svn to use TImode for 128-bit structures, and structures bigger than 128 bits may not be passed in registers, so triggering this bug may be quite different now.
[Bug rtl-optimization/44194] struct returned by value generates useless stores
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44194 --- Comment #48 from Chip Salzenberg 2012-09-14 17:23:08 UTC --- May Shub-Internet not see you as you pass.
[Bug rtl-optimization/54585] New: stack space allocated but never used when calling functions that return structs in registers
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54585

             Bug #: 54585
           Summary: stack space allocated but never used when calling functions that return structs in registers
    Classification: Unclassified
           Product: gcc
           Version: 4.7.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: c...@pobox.com

Now that bug #44194 is fixed, and a returned structure used as a parameter is no longer stored unnecessarily, a new bug is visible: a stack frame is being allocated that is entirely unused.

On x86_64 target with the fix for 44194 backported to the 4.7 branch, this code:

    #include <stdint.h>
    struct blargh { uint32_t a, b, c; } foo();
    void bar(uint32_t a, uint32_t b, uint32_t c);
    void func() {
        struct blargh s = foo();
        bar(s.a, s.b, s.c);
    }

no longer uses any stack memory at all, but still the function call reserves 24 bytes with "subq $24,%rsp" and promptly returns it with "addq $24,%rsp". The generated code looks like this:

    func:
            .cfi_startproc
            xorl    %eax, %eax
            subq    $24, %rsp
            .cfi_def_cfa_offset 32
            call    foo
            movq    %rax, %rsi
            movl    %eax, %edi
            addq    $24, %rsp
            .cfi_def_cfa_offset 8
            shrq    $32, %rsi
            jmp     bar
            .cfi_endproc
[Bug middle-end/28831] [4.7/4.8/4.9 Regression] Aggregate copy not elided when using a return value as a pass-by-value parameter
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28831 --- Comment #22 from Chip Salzenberg --- Anyone? Bueller?
[Bug target/56726] i386: MALLOC_ABI_ALIGNMENT is too small (usually)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56726 --- Comment #7 from Chip Salzenberg --- Should this ticket have status CONFIRMED ? Also I suspect it's been fixed in trunk...
[Bug rtl-optimization/54585] stack space allocated but never used when calling functions that return structs in registers
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54585 --- Comment #1 from Chip Salzenberg --- I'd like to suggest this ticket be at least CONFIRMED what with the code samples in the ticket. What will it take to fix this?
[Bug rtl-optimization/54585] stack space allocated but never used when calling functions that return structs in registers
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54585 --- Comment #3 from Chip Salzenberg --- It's worth it for cache reasons, I believe. The data cache works better if you don't spread out the stack data unnecessarily. More concretely, if the stack frame can entirely disappear then you also reduce the instruction count. That's fewer instructions to dispatch and less icache pressure.
[Bug libstdc++/54025] New: atomic won't compile: chrono::duration::duration() is not C++11 compliant
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54025

             Bug #: 54025
           Summary: atomic won't compile: chrono::duration::duration() is not C++11 compliant
    Classification: Unclassified
           Product: gcc
           Version: 4.7.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: c...@pobox.com

Attempting to compile atomic fails, because the duration default constructor is not " = default" as required by the standard, but instead explicitly initializes its representation. Here is what libstdc++ says:

    constexpr duration() : __r() { }

but here is what the standard says should be there, and if I make the change, compilation succeeds:

    constexpr duration() = default;

Test source:

    #include <atomic>
    #include <chrono>
    using namespace std;
    using namespace chrono;
    int main() {
        atomic> dur;
    }

Error before patch:

    /usr/include/c++/4.7/atomic: In instantiation of ‘struct std::atomic > >’:
    atdur.cc:6:35: required from here
    /usr/include/c++/4.7/atomic:160:7: error: function ‘std::atomic<_Tp>::atomic() [with _Tp = std::chrono::duration >]’ defaulted on its first declaration with an exception-specification that differs from the implicit declaration ‘constexpr std::atomic > >::atomic()’
[Bug libstdc++/54025] atomic won't compile: chrono::duration::duration() is not C++11 compliant
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54025 --- Comment #1 from Chip Salzenberg 2012-07-19 02:56:57 UTC --- Created attachment 27829 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27829 patch to duration default ctor
[Bug libstdc++/54075] [4.7.1] unordered_map 3x slower than 4.6.2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54075 --- Comment #16 from Chip Salzenberg 2012-07-26 22:50:17 UTC ---
In my tests, with this patch, 4.7.1 is about 10% slower than 4.6 ... a vast improvement but certainly not parity.

    ./bench46        1.75s user 0.82s system 99% cpu  2.577 total
    ./bench47        8.01s user 2.78s system 99% cpu 10.800 total
    ./bench47+patch  1.95s user 0.80s system 99% cpu  2.764 total
[Bug libstdc++/54075] [4.7.1] unordered_map insert 3x slower than 4.6.2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54075 --- Comment #18 from Chip Salzenberg 2012-07-26 23:38:34 UTC --- I couldn't say. I don't understand the issue, I'm just reporting results and deploying packages for my fellow devs.
[Bug libstdc++/54075] [4.7.1] unordered_map insert 3x slower than 4.6.2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54075 --- Comment #20 from Chip Salzenberg 2012-07-27 01:00:14 UTC --- Are you talking to me? 'cause I was providing results for the patch already committed to svn, using the code in this very bug description.
[Bug middle-end/28831] [4.6/4.7/4.8 Regression] Aggregate copy not elided when using a return value as a pass-by-value parameter
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28831

Chip Salzenberg changed:

           What    |Removed                     |Added
                 CC|                            |chip at pobox dot com

--- Comment #15 from Chip Salzenberg 2012-08-06 00:37:36 UTC ---
Ping. I've just run into this with the tip of the gcc 4.7.1 branch. Is there a workaround? Some way to label the struct as not needing to be stored? Something like __attribute__((noaddress));

We want to pass and return structs by value as current C++ style recommends, but the extra register spills are dragging down performance. For small key classes we've switched to using big integers with masking functions, but for larger ones there is no workaround that we know of.

Given this code:

    extern val_t foo();
    extern int bar(val_t);
    int main() { return bar(foo()); }

When val_t is a struct of two int64_t on x86_64, the code has two extra stores:

    > movq    %rax, (%rsp)
    > movq    %rdx, 8(%rsp)

and the stack frame is larger and there is no tail call optimization.

When val_t is __int128 on x86_64, the code is optimal: tail call, no extra stores, smaller stack frame (because there is no need to store the value).
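A self-contained variant of that test case might look like the sketch below. The comment only says that val_t is "a struct of two int64_t", so the field names, the bodies of foo and bar, and the wrapper name pass_through are all assumptions made so the file compiles on its own. The noinline attribute keeps GCC from folding everything together, so the call sequence is still visible in the assembly.

```c
#include <stdint.h>

/* Assumed layout: the thread only specifies "a struct of two int64_t".  */
typedef struct { int64_t lo, hi; } val_t;

/* Stand-ins for the extern functions in the comment; the bodies are
   invented so the example links by itself.  */
__attribute__((noinline)) val_t foo(void)
{
    val_t v = { 1, 2 };
    return v;
}

__attribute__((noinline)) int bar(val_t v)
{
    return (int)(v.lo + v.hi);
}

/* The pattern under discussion: a by-value return fed straight into a
   by-value parameter.  */
int pass_through(void)
{
    return bar(foo());
}
```

Compiling this with gcc -O2 -S and reading the body of pass_through is one way to check whether a given GCC version still emits the extra stores or the unused stack adjustment.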
[Bug middle-end/28831] [4.6/4.7/4.8 Regression] Aggregate copy not elided when using a return value as a pass-by-value parameter
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28831 --- Comment #16 from Chip Salzenberg 2012-08-06 00:57:13 UTC --- Addendum: In cut down test cases where I only pass by value or only return by value, but not both, I find no extra stores, which is good; but I still find a lot of unnecessary frame allocation (either $24 or $40, depending), and tail call is still missing.
[Bug rtl-optimization/44194] struct returned by value generates useless stores
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44194

Chip Salzenberg changed:

           What    |Removed                     |Added
                 CC|                            |chip at pobox dot com

--- Comment #42 from Chip Salzenberg 2012-08-06 01:22:43 UTC ---
Is bug #28831 a dup of this?
[Bug target/20020] x86_64 - 128 bit structs not targeted to TImode
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20020

Chip Salzenberg changed:

           What    |Removed                     |Added
                 CC|                            |chip at pobox dot com

--- Comment #13 from Chip Salzenberg 2012-08-06 22:52:41 UTC ---
Is this bug obsolete now?
[Bug libstdc++/69191] Wrong equality comparison between error_code and error_condition + segfault
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69191 --- Comment #16 from Chip Salzenberg --- Still happening in 7.2
[Bug middle-end/28831] [4.8/4.9/4.10 Regression] Aggregate copy not elided when using a return value as a pass-by-value parameter
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28831 --- Comment #24 from Chip Salzenberg --- In 4.8.2 (Ubuntu trusty), the copy is finally elided. Good job! But stack space is still allocated for the copy that is not made. So it's not all fixed.
[Bug target/56726] i386: MALLOC_ABI_ALIGNMENT is too small (usually)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56726 --- Comment #8 from Chip Salzenberg --- Further research says that the alignment of a malloc(N) will be >= N if there is a basic type that requires alignment N. So we may be able to ramp this up quite a bit.
[Bug target/56726] i386: MALLOC_ABI_ALIGNMENT is too small (usually)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56726 --- Comment #11 from Chip Salzenberg --- Indeed, 16 is required by the ABI; see http://www.x86-64.org/documentation/abi.pdf page 12. Only the SIMD __m256 is bigger than 16, and there seems no end to Intel's extensions to SIMD registers, so holding at 16 seems like the Right Thing.
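If malloc is only assumed to give 16-byte alignment, code that needs 32-byte-aligned storage (for example for __m256) has to ask for it explicitly. A minimal sketch using C11 aligned_alloc follows; the helper names alloc_aligned32 and is_aligned are hypothetical, and overflow of the rounded size is not handled.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate at least n bytes on a 32-byte boundary.  C11 aligned_alloc
   expects the size to be a multiple of the alignment, so round up.
   Release the result with free().  */
void *alloc_aligned32(size_t n)
{
    size_t rounded = (n + 31) & ~(size_t)31;
    return aligned_alloc(32, rounded);
}

/* Nonzero if p is aligned to align bytes (align must be a power of two).  */
int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

This keeps the malloc assumption at 16 while still letting callers obtain wider alignment when a SIMD type requires it.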