[Bug tree-optimization/21827] unroll misses simple elimination - works with manual unroll
--- Comment #7 from martsummsw at hotmail dot com 2006-08-27 06:37 ---
I am the reporter of this bug (with a new email address).
This problem seems to be solved with 4.1.1 =)
(Consider only #5 and forward - the first is wrong/irrelevant)

--
martsummsw at hotmail dot com changed:

  What |Removed |Added
  CC   |        |martsummsw at hotmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21827
[Bug tree-optimization/21827] unroll misses simple elimination - works with manual unroll
--- Comment #9 from martsummsw at hotmail dot com 2006-08-27 13:21 ---
Hmmm - I was (also) wrong when I claimed it was solved in 4.1.1.
It is improved, since the example that goes wrong in #5 is now right, but it is just the limit (for when the compiler gets confused) that has been pushed a bit, e.g.

    for (int bp=0;bp<11;++bp) // Up to 11 is fine unrolled in gcc 4.1.1

However, 12 and above, e.g.

    for (int bp=0;bp<12;++bp) // this still produces the poor performing code

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21827
[Bug tree-optimization/21827] unroll misses simple elimination - works with manual unroll
--- Comment #11 from martsummsw at hotmail dot com 2006-08-27 19:33 ---
You are right =)
I recall I did play with some params in 3.4, but without result. I did not in 4.0, since I did not expect such a (in my head) fairly low number to be "large" ...

It would be really nice if gcc had an option forcing it to compile both unrolled and not unrolled versions of loops of known size, and "at last" deciding on the speed gain versus the extra space used. In this case, with e.g. 14 iterations, the code size is not even doubled, yet the speed is increased by more than 400%. (I know gcc cannot know how much faster it is.)

The #pragma would also be really nice. I could dream about a pragma with the following behaviour ...

    #pragma unroll-next-loop [guess x1,x2,x3]

and if guess was used (for unknown sizes) it expanded "for (int u=0;u [...]

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21827
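For reference, GCC 8 and later accept a per-loop `#pragma GCC unroll n`, which covers the fixed-count part of the wish above; it has nothing like the "guess" list. A minimal sketch, assuming a hypothetical function `sum_fixed` and a 14-iteration loop chosen only to match the count mentioned above (it is not the loop from this report):

```cpp
// Hypothetical example: sum_fixed and its loop body are illustrative only,
// not the code from the bug report.
int sum_fixed(const int* data)
{
    int total = 0;
    // Ask GCC (8 or later) to unroll the following loop up to 14 times.
    // Other compilers may ignore this pragma or warn about it.
#pragma GCC unroll 14
    for (int bp = 0; bp < 14; ++bp)
        total += data[bp];
    return total;
}
```

The pragma must sit directly before the loop it applies to. Whether the unrolled version is actually faster still has to be measured for the case at hand.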
[Bug c++/23793] New: Unhealthy optimization. Accessing double with reinterpret_cast.
This is an error report. However, I will provide some background for my little piece of code. (The code itself is very simple.)

I will try to post on comp.std.c++ in order to make this a part of C++ (maybe C); otherwise I might come back to beg you to implement it just in your compiler.

There are many reasons. (One, and the best, is to switch on a double based on intervals.) Therefore I would like a VERY FAST FUNCTION to return the sign of a double (and float and long double) ...
(And a compare on a double, d<=-0.0 (without branch), won't do the trick. And I can't blame you, because you will have to respect NaN.)

Since C/C++ does not have this fast function (skipping NaN) (which I hope will come), I have no other option than to write it myself (and cheat!)
That means that my trick will only work on doubles in IEEE 754 - with a size of two ints. The size part of double (in my case 8 bytes) and int (in my case 4 bytes) could probably be fixed with the right macros. However, my code will only work on x86 and other machines accepting the IEEE 754 standard. I think Motorola does not follow this - but never mind.

The code with the bug is: (Read the sign bit and push it to be one or zero)

    int is_not_positive(double v)
    {
      return ((reinterpret_cast<unsigned int*>(&v)[1]) >> 31);
    }

This works with option O1 (and below) but fails with O2 (and above).
The correct O1 code (but not fast code) looks like this:

        .file   "bug.cpp"
        .text
        .align 2
    .globl _Z15is_not_positived
        .type   _Z15is_not_positived, @function
    _Z15is_not_positived:
    .LFB3:
        pushl   %ebp
    .LCFI0:
        movl    %esp, %ebp
    .LCFI1:
        subl    $8, %esp
    .LCFI2:
        fldl    8(%ebp)
        fstpl   -8(%ebp)
        movl    -4(%ebp), %eax
        shrl    $31, %eax
        movl    %ebp, %esp
        popl    %ebp
        ret
    .LFE3:
        .size   _Z15is_not_positived, .-_Z15is_not_positived
        .section        .note.GNU-stack,"",@progbits
        .ident  "GCC: (GNU) 3.3.5-20050130 (Gentoo 3.3.5.20050130-r1, ssp-3.3.5.20050130-1, pie-8.7.7.1)"

The wrong optimization simply removes:

        fldl    8(%ebp)
        fstpl   -8(%ebp)   // I guess that it removes the store.
---
The "wished optimized code" is (notice this is partly manually written, so I might be wrong. I am not an assembler expert):

    .LFB4:
        pushl   %ebp
    .LCFI0:
        movl    %esp, %ebp
    .LCFI1:
        movl    12(%ebp), %eax
        popl    %ebp
        shrl    $31, %eax
        ret
    .LFE4:

I am sorry that I have not tested it with a newer version. (However, I am not too bright, and the last time I did emerge gcc (accepting the newest version) I got problems compiling my kernel.)

I hope the answer is "just upgrade you stupid man..."

Regards
Thorbjørn Martsum

PS: BTW... I have found a workaround using unsigned long long. This works with O3 and has only one extra instruction compared to "my best":

        movl    12(%ebp), %eax
        popl    %ebp

expands to

        movl    12(%ebp), %ecx
        popl    %ebp
        movl    %ecx, %eax

(a missing peephole pattern (?))
But still WAY WAY better than what MS VS6 gives =)

--
           Summary: Unhealthy optimization. Accessing double with reinterpret_cast.
           Product: gcc
           Version: 3.3.5
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: c++
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: martsummsw at hotmail dot com
                CC: gcc-bugs at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23793
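The O2 failure described in this report is consistent with strict aliasing, which GCC enables at -O2 and above: reading a double through a pointer to an integer type is undefined behaviour, so the compiler may drop the store. A minimal sketch of a well-defined alternative, assuming an 8-byte IEEE 754 double as in the report; it reuses the report's function name, but the body is an illustration and not the reporter's unsigned long long workaround:

```cpp
#include <cstdint>
#include <cstring>

// Returns 1 when the sign bit of v is set (negative values and -0.0),
// otherwise 0. Assumes an 8-byte IEEE 754 double.
int is_not_positive(double v)
{
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch assumes a 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &v, sizeof bits);   // well-defined copy of the object representation
    return static_cast<int>(bits >> 63);   // the sign bit is the most significant bit
}
```

Compilers typically turn the fixed-size `memcpy` into a plain register or memory move, so this form does not have to cost more than the cast.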
[Bug c++/23793] Unhealthy optimization. Accessing double with reinterpret_cast.
--- Additional Comments From martsummsw at hotmail dot com 2005-09-10 04:58 ---
First of all - thank you. And I promise never to report an error against such an old version.

Second, I am however happy to have reported it, since nobody on comp.lang.c++ knew. I found nothing on the internet or with

    man math.h | grep -i sign   // this might have changed since my old version

Third, I know that the reinterpret_cast is worse than bad practice and strictly speaking need not work. However, I am not casting an int into a char array, where I could hit e.g. an odd address (or?), and I guess that no padding could be in my way (?) So I could see no *reason* for it not to work. I know a union would do it (more) safely. I was just so stupid as to think that it would not be as efficient.

In sum: I am happy that I have been told about signbit - and I am really sorry to have wasted your time...

Regards
Thorbjørn

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23793
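The signbit mentioned above is the C99 macro from <math.h>; since C++11 it is also available as `std::signbit` in <cmath>. A minimal sketch, assuming a hypothetical wrapper named `sign_bit_of` that returns 0 or 1 as the function in the report does:

```cpp
#include <cmath>

// Hypothetical wrapper name; std::signbit itself is standard since C++11.
// Returns 1 when the sign bit of v is set (including -0.0), otherwise 0.
int sign_bit_of(double v)
{
    return std::signbit(v) ? 1 : 0;
}
```

Unlike a comparison such as `v < 0.0`, this tells -0.0 apart from 0.0 and works on the sign bit alone, without any pointer casts.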