[Bug tree-optimization/21827] unroll misses simple elimination - works with manual unroll

2006-08-26 Thread martsummsw at hotmail dot com


--- Comment #7 from martsummsw at hotmail dot com  2006-08-27 06:37 ---
I am the reporter of this bug (with a new email address).
This problem seems to be solved in 4.1.1 =)

(Consider only comment #5 onward; the first ones are wrong/irrelevant.)


-- 

martsummsw at hotmail dot com changed:

   What|Removed |Added

 CC|        |martsummsw at hotmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21827



[Bug tree-optimization/21827] unroll misses simple elimination - works with manual unroll

2006-08-27 Thread martsummsw at hotmail dot com


--- Comment #9 from martsummsw at hotmail dot com  2006-08-27 13:21 ---
Hmmm, I was (also) wrong when I claimed it was solved in 4.1.1. It is improved,
since the example from #5 that went wrong is now handled correctly, but only
the limit (at which the compiler gets confused) has been pushed a bit.

e.g.
for (int bp=0;bp<11;++bp) 
// Up to 11 iterations are unrolled fine in gcc 4.1.1

However, with 12 iterations or more, e.g.
for (int bp=0;bp<12;++bp)
// this still produces the poorly performing code
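The test case from #5 is not reproduced in this thread, so the following is only a hypothetical illustration of the pattern in the bug title: a loop with a compile-time trip count of 12 whose branches test a constant mask (the mask `kMask` and both function names are made up for this sketch). Once the loop is completely unrolled, every test of a mask bit is a constant and can be eliminated, which is what the hand-unrolled version does explicitly. Whether a given gcc completely unrolls such a loop depends on its peeling limits, such as `--param max-completely-peel-times` and `--param max-completely-peeled-insns`.

```cpp
#include <cassert>

// Hypothetical 12-bit constant mask: 0xA5B has bits 0, 1, 3, 4, 6, 9 and 11 set.
constexpr unsigned kMask = 0xA5Bu;

// Loop form: the trip count (12) is known at compile time, so complete
// unrolling would turn every 'kMask & (1u << bp)' into a constant.
int masked_sum_loop(const int* a)
{
    int sum = 0;
    for (int bp = 0; bp < 12; ++bp)
        if (kMask & (1u << bp))
            sum += a[bp];
    return sum;
}

// Hand-unrolled form: only the elements selected by the set bits remain.
int masked_sum_manual(const int* a)
{
    return a[0] + a[1] + a[3] + a[4] + a[6] + a[9] + a[11];
}
```

Both functions should compute the same value; the point of the bug report is that the compiler should be able to reach the second form from the first on its own.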


-- 





[Bug tree-optimization/21827] unroll misses simple elimination - works with manual unroll

2006-08-27 Thread martsummsw at hotmail dot com


--- Comment #11 from martsummsw at hotmail dot com  2006-08-27 19:33 ---
You are right =) 

I recall I did play with some params in 3.4, without result, but I did not
in 4.0, since I did not expect such a (in my head) fairly low number to count
as "large"...

It would be really nice if gcc had an option forcing it to compile both unrolled
and non-unrolled versions for known loop sizes and then, at the end, weigh the
speed gain against the extra space used. In this case, with e.g. 14 iterations,
the code size is not even doubled, yet the speed increases by more than 400%.
(I know gcc cannot know how much faster it is.)

A #pragma would also be really nice.
I could dream about a pragma with the following behaviour:
#pragma unroll-next-loop [guess x1,x2,x3] 

and if guess was used (for unknown sizes), it would expand "for (int u=0;u…
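Later GCC releases (8 onward) added something close to the pragma wished for above: `#pragma GCC unroll N`, placed immediately before a `for`, `while` or `do` loop, requests that the loop be unrolled N times (0 or 1 disables unrolling). It has no "guess" list for unknown trip counts like the imagined pragma. A minimal sketch applying it to a 14-iteration loop (the function `sum14` is hypothetical):

```cpp
#include <cassert>

// Sums a[0..13]. The pragma (GCC 8+) asks for the following loop to be
// unrolled 14 times, i.e. completely, regardless of the default peeling limits.
int sum14(const int* a)
{
    int total = 0;
#pragma GCC unroll 14
    for (int u = 0; u < 14; ++u)
        total += a[u];
    return total;
}
```

Older compilers that do not know the pragma should just ignore it (possibly with an unknown-pragma warning), so the code stays portable.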



[Bug c++/23793] New: Unhealthy optimization. Accessing double with reinterpret_cast.

2005-09-09 Thread martsummsw at hotmail dot com
This is a bug report. However, I will first provide some background
for my little piece of code (the code itself is very simple).

I will try to post on comp.std.c++ in order to make this part of C++
(maybe C); otherwise I might come back and beg you to implement it just in your
compiler.

There are many reasons (the first and best is to switch on a double by
intervals). Therefore I would like a VERY FAST FUNCTION that returns the sign of a
double (and float and long double)...

(And a branch-free comparison on a double, d<=-0.0, won't do the trick. And I
can't blame you, because you have to respect NaN.)

Since C/C++ does not have this fast function (one that ignores NaN),
which I hope will come, I have no other option than to
write it myself (and cheat!).

That means that my trick will only work on IEEE 754 doubles that are the size
of two ints.

The size assumptions about double (in my case 8 bytes) and int (in my case
4 bytes) could probably be handled with the right macros.

However, my code will only work on x86 and other little-endian machines that
follow the IEEE 754 standard. I think Motorola does not follow this, but never mind.
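Instead of macros, the size and format assumptions above can be checked at compile time. A minimal sketch, assuming a C++11 compiler (`static_assert` is C++11; `std::numeric_limits<double>::is_iec559` is standard). Byte order cannot be checked portably this way before C++20's `std::endian` in `<bit>`, so the little-endian assumption (index [1] being the high word) is not covered here.

```cpp
#include <cassert>
#include <climits>
#include <limits>

// double must use the IEEE 754 (IEC 559) format, so the sign is the top bit.
static_assert(std::numeric_limits<double>::is_iec559,
              "double must be an IEEE 754 type");

// The trick reads a double as two unsigned ints and picks index [1].
static_assert(sizeof(double) == 2 * sizeof(unsigned int),
              "double must be exactly two unsigned ints wide");

// The '>> 31' shift assumes a 32-bit unsigned int.
static_assert(CHAR_BIT * sizeof(unsigned int) == 32,
              "unsigned int must be 32 bits wide");
```

If any assumption fails, compilation stops with the given message instead of silently producing a wrong sign.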

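The code with the bug is (read the sign bit and shift it down to one or zero):
int is_not_positive(double v)
{
  // Reads the high 32-bit word of v (index [1] on little-endian x86)
  // and shifts its top bit, the IEEE 754 sign bit, down to bit 0.
  return ((reinterpret_cast<unsigned int*>(&v)[1]) >> 31);
}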

This works with -O1 (and below)
but fails with -O2 (and above).

The correct (but not very fast) -O1 code looks like this:

        .file   "bug.cpp"
        .text
        .align 2
        .globl _Z15is_not_positived
        .type   _Z15is_not_positived, @function
_Z15is_not_positived:
.LFB3:
        pushl   %ebp
.LCFI0:
        movl    %esp, %ebp
.LCFI1:
        subl    $8, %esp
.LCFI2:
        fldl    8(%ebp)
        fstpl   -8(%ebp)
        movl    -4(%ebp), %eax
        shrl    $31, %eax
        movl    %ebp, %esp
        popl    %ebp
        ret
.LFE3:
        .size   _Z15is_not_positived, .-_Z15is_not_positived
        .section        .note.GNU-stack,"",@progbits
        .ident  "GCC: (GNU) 3.3.5-20050130 (Gentoo 3.3.5.20050130-r1, ssp-3.3.5.20050130-1, pie-8.7.7.1)"

The wrong optimization simply removes:
        fldl    8(%ebp)
        fstpl   -8(%ebp)
(I guess that it removes the store.)
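A likely explanation for the removed store (an interpretation, not something stated in the report) is strict aliasing: -O2 enables `-fstrict-aliasing`, under which the compiler may assume an `unsigned int` lvalue never refers to the bytes of a `double`, so the copy of `v` looks dead. A minimal well-defined sketch, assuming IEEE 754 binary64 doubles, copies the bits with `std::memcpy` instead (the name `is_not_positive_memcpy` is hypothetical); compilers generally optimize such a fixed-size `memcpy` into a plain load.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns 1 if the sign bit of v is set (including -0.0), else 0.
// Copying the bytes avoids reading a double through an unsigned int*,
// and reading the whole 64-bit word avoids depending on byte order.
int is_not_positive_memcpy(double v)
{
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "64-bit double expected");
    std::uint64_t bits;
    std::memcpy(&bits, &v, sizeof bits);   // no aliasing violation
    return static_cast<int>(bits >> 63);   // top bit = IEEE 754 sign bit
}
```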

---
The "wished-for" optimized code is (note that this is partly written by hand, so
I might be wrong; I am not an assembler expert):

.LFB4:
        pushl   %ebp
.LCFI0:
        movl    %esp, %ebp
.LCFI1:
        movl    12(%ebp), %eax
        popl    %ebp
        shrl    $31, %eax
        ret
.LFE4:

I am sorry that I have not tested it with a newer version.
(I am not too bright, and the last time I emerged gcc, accepting the newest
version, I got problems compiling my kernel.)
I hope the answer is "just upgrade, you stupid man..."

Regards 
Thorbjørn Martsum

PS: 
BTW...
I have found a workaround using unsigned long long.
It works with -O3 and has only one extra instruction compared to "my best":

        movl    12(%ebp), %eax
        popl    %ebp

expands to

        movl    12(%ebp), %ecx
        popl    %ebp
        movl    %ecx, %eax

(a missing peephole pattern?)
But still WAY, WAY better than what MS VS6 gives =)

-- 
   Summary: Unhealthy optimization. Accessing double with reinterpret_cast.
   Product: gcc
   Version: 3.3.5
Status: UNCONFIRMED
  Severity: normal
  Priority: P2
 Component: c++
    AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: martsummsw at hotmail dot com
CC: gcc-bugs at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23793


[Bug c++/23793] Unhealthy optimization. Accessing double with reinterpret_cast.

2005-09-09 Thread martsummsw at hotmail dot com

--- Additional Comments From martsummsw at hotmail dot com  2005-09-10 04:58 ---
First of all, thank you.
And I promise never to report an error against such an old version.

Second, I am still happy to have reported it, since nobody on
comp.lang.c++ knew of signbit. I found nothing on the internet or with
man math.h | grep -i sign   // this might have changed since my old version

Third, I know that the reinterpret_cast is worse than bad behavior and is
strictly not required to work. However, I am not casting an int into a char array,
where I could hit e.g. an odd address (or?), and I guess that no
padding could be in my way(?).
So I could see no *reason* for it not to work. I know a union would do it
(more) safely. I was just stupid enough to think that it would not be as efficient.
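For comparison, a minimal sketch of the union approach mentioned above (the name `is_not_positive_union` is hypothetical). Reading a union member other than the one last written is undefined in ISO C++, but the GCC manual documents that type punning through a union is allowed even with `-fstrict-aliasing`, provided the access goes through the union type; so this is a GCC-specific guarantee, not a portable one. It assumes IEEE 754 binary64 doubles.

```cpp
#include <cassert>
#include <cstdint>

// Returns 1 if the sign bit of v is set, else 0, by punning through a union.
int is_not_positive_union(double v)
{
    union {
        double d;
        std::uint64_t u;
    } pun;
    pun.d = v;                               // write the double member
    return static_cast<int>(pun.u >> 63);    // read the integer member's top bit
}
```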

In sum: I am happy that I have been told about signbit,
and I am really sorry to have wasted your time...
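The function the reporter was pointed to is `signbit` from C99 `<math.h>`, available in C++11 as `std::signbit` in `<cmath>`. It reports the sign bit directly: it is true for -0.0 (where `d <= 0.0` cannot tell the zeros apart) and also for NaNs whose sign bit is set (where every comparison is false). A minimal sketch wrapping it in the same 0/1 interface as the original function (the wrapper name is hypothetical):

```cpp
#include <cassert>
#include <cmath>

// 1 if the sign bit of v is set (negative values, -0.0, negative NaNs), else 0.
int is_not_positive_signbit(double v)
{
    return std::signbit(v) ? 1 : 0;
}
```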

Regards 
Thorbjørn


-- 

