Re: x86 gcc lacks simple optimization

2013-12-06 Thread Jeff Law
On 12/06/13 07:17, Richard Biener wrote:
On Fri, Dec 6, 2013 at 2:52 PM, Konstantin Vladimirov wrote:
Hi,

Richard, I tried to add LSHIFT_EXPR case to tree-scalar-evolution.c and now it yields code like (x86 again):

.L5:
        movzbl  4(%esi,%eax,4), %edx
        movb    %dl, 4(%ebx,%eax,4)
        addl    $1, %eax
        cmpl    %

Re: x86 gcc lacks simple optimization

2013-12-06 Thread Richard Biener
On Fri, Dec 6, 2013 at 2:52 PM, Konstantin Vladimirov wrote:
> Hi,
>
> Richard, I tried to add LSHIFT_EXPR case to tree-scalar-evolution.c
> and now it yields code like (x86 again):
>
> .L5:
>         movzbl  4(%esi,%eax,4), %edx
>         movb    %dl, 4(%ebx,%eax,4)
>         addl    $1, %eax
>         cmpl    %ecx, %eax
>         jne     .L5
>
> So

Re: x86 gcc lacks simple optimization

2013-12-06 Thread Konstantin Vladimirov
Hi,

Richard, I tried to add LSHIFT_EXPR case to tree-scalar-evolution.c and now it yields code like (x86 again):

.L5:
        movzbl  4(%esi,%eax,4), %edx
        movb    %dl, 4(%ebx,%eax,4)
        addl    $1, %eax
        cmpl    %ecx, %eax
        jne     .L5

So, excessive lea is gone. It is great, thank you so much. But I wonder what else can I

Re: x86 gcc lacks simple optimization

2013-12-06 Thread Marc Glisse
On Fri, 6 Dec 2013, Konstantin Vladimirov wrote:
> Consider code:
>
> int foo(char *t, char *v, int w)
> {
>   int i;
>
>   for (i = 1; i != w; ++i)
>   {
>     int x = i << 2;

A side note, but something too few people seem to be aware of: writing i<<2 can pessimize code compared to i*4 (and it is never faster). That
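The side note above contrasts i<<2 with i*4. As a minimal sketch (not code from the thread), the two spellings of the loop body can be put side by side. The names copy_shift and copy_mul are hypothetical, and the const qualifier and the assert on w are additions for illustration. For the small non-negative values of i used here, both forms compute the same index, so any difference is only in what the compiler is able to do with them.

```c
#include <assert.h>

/* Hypothetical name: the loop from the thread, indexing with i << 2. */
int copy_shift(const char *t, char *v, int w)
{
    assert(w >= 1);              /* added precondition: the loop starts at 1 */
    for (int i = 1; i != w; ++i) {
        int x = i << 2;          /* shift spelling, as in the original post */
        v[x + 4] = t[x + 4];
    }
    return 0;
}

/* Hypothetical name: the same loop, indexing with i * 4 instead. */
int copy_mul(const char *t, char *v, int w)
{
    assert(w >= 1);
    for (int i = 1; i != w; ++i) {
        int x = i * 4;           /* multiply spelling suggested in the side note */
        v[x + 4] = t[x + 4];
    }
    return 0;
}
```

Compiling both functions with the same options and comparing the generated loops is one way to see whether a given GCC version treats the two forms differently.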

Re: x86 gcc lacks simple optimization

2013-12-06 Thread H.J. Lu
On Fri, Dec 6, 2013 at 2:25 AM, Richard Biener wrote:
> On Fri, Dec 6, 2013 at 11:19 AM, Konstantin Vladimirov
> wrote:
>> Hi,
>>
>> nothing changes if everything is unsigned and we are guaranteed to not
>> raise UB on overflow:
>>
>> unsigned foo(unsigned char *t, unsigned char *v, unsigned w)
>

Re: x86 gcc lacks simple optimization

2013-12-06 Thread Richard Biener
On Fri, Dec 6, 2013 at 11:19 AM, Konstantin Vladimirov wrote:
> Hi,
>
> nothing changes if everything is unsigned and we are guaranteed to not
> raise UB on overflow:
>
> unsigned foo(unsigned char *t, unsigned char *v, unsigned w)
> {
>   unsigned i;
>
>   for (i = 1; i != w; ++i)
>   {
>     unsigned x =

Re: x86 gcc lacks simple optimization

2013-12-06 Thread Konstantin Vladimirov
Hi,

nothing changes if everything is unsigned and we are guaranteed to not raise UB on overflow:

unsigned foo(unsigned char *t, unsigned char *v, unsigned w)
{
  unsigned i;

  for (i = 1; i != w; ++i)
  {
    unsigned x = i << 2;
    v[x + 4] = t[x + 4];
  }

  return 0;
}

yields:

.L5:
        leal    0(,%eax,4), %edx
        add
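To make the "no UB on overflow" remark concrete, here is a small sketch, assuming a 32-bit unsigned type (uint32_t is used so the width is explicit). The helper byte_index is hypothetical and only isolates the index expression (i << 2) + 4; foo_unsigned mirrors the function quoted above, with a const qualifier and an assert added for illustration. Unsigned arithmetic wraps modulo 2^32 rather than being undefined, which is the property the message relies on.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the index (i << 2) + 4 in 32-bit unsigned arithmetic.
   Both the shift and the addition wrap modulo 2^32; neither is undefined. */
uint32_t byte_index(uint32_t i)
{
    uint32_t x = (uint32_t)(i << 2);
    return (uint32_t)(x + 4u);
}

/* Mirrors the unsigned loop from the message; the name is hypothetical. */
unsigned foo_unsigned(const unsigned char *t, unsigned char *v, unsigned w)
{
    assert(w >= 1u);             /* added precondition: the loop starts at 1 */
    for (unsigned i = 1; i != w; ++i) {
        unsigned x = i << 2;
        v[x + 4] = t[x + 4];
    }
    return 0;
}
```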

Re: x86 gcc lacks simple optimization

2013-12-06 Thread Richard Biener
On Fri, Dec 6, 2013 at 9:30 AM, Konstantin Vladimirov wrote:
> Hi,
>
> Consider code:
>
> int foo(char *t, char *v, int w)
> {
>   int i;
>
>   for (i = 1; i != w; ++i)
>   {
>     int x = i << 2;
>     v[x + 4] = t[x + 4];
>   }
>
>   return 0;
> }
>
> Compile it to x86 (I used both gcc 4.7.2 and gcc 4.8.1) with o

Re: x86 gcc lacks simple optimization

2013-12-06 Thread Jakub Jelinek
On Fri, Dec 06, 2013 at 12:30:54PM +0400, Konstantin Vladimirov wrote:
> Consider code:
>
> int foo(char *t, char *v, int w)
> {
>   int i;
>
>   for (i = 1; i != w; ++i)
>   {
>     int x = i << 2;
>     v[x + 4] = t[x + 4];
>   }
>
>   return 0;
> }

This is either job of ivopts pass, dunno why it doesn't consi

Re: x86 gcc lacks simple optimization

2013-12-06 Thread Konstantin Vladimirov
Hi,

The x86 example was only for ease of reproduction. I am pretty sure this is an architecture-independent issue. Say on ARM:

.L2:
        mov     ip, r3, asl #2
        add     ip, ip, #4
        add     r3, r3, #1
        ldrb    r4, [r0, ip]    @ zero_extendqisi2
        cmp     r3, r2
        strb    r4, [r1, ip]
        bne     .L2

This may be improved to:

.L2:
        add     r3, r3,
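The improved ARM loop is cut off above, so the following is not a transcription of it. It is a hand-written C sketch of the general idea the thread is asking the compiler to apply: instead of recomputing (i << 2) + 4 on every iteration, keep a running byte offset and add 4 each time (strength reduction of the index). The name foo_strength_reduced and the variable off are hypothetical, and the const qualifier and assert are additions for illustration.

```c
#include <assert.h>

/* Hypothetical hand-optimized variant of foo from the thread:
   the offset (i << 2) + 4 is carried across iterations instead of
   being recomputed with a shift and an add each time. */
int foo_strength_reduced(const char *t, char *v, int w)
{
    assert(w >= 1);              /* added precondition: the loop starts at 1 */
    int off = (1 << 2) + 4;      /* offset for the first iteration, i == 1 */
    for (int i = 1; i != w; ++i) {
        v[off] = t[off];
        off += 4;                /* next i advances the offset by 1 << 2 */
    }
    return 0;
}
```

Whether a compiler emits the equivalent of this form from the original source depends on its induction-variable analysis, which is the subject of the replies in this thread.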

Re: x86 gcc lacks simple optimization

2013-12-06 Thread David Brown
On 06/12/13 09:30, Konstantin Vladimirov wrote:
> Hi,
>
> Consider code:
>
> int foo(char *t, char *v, int w)
> {
>   int i;
>
>   for (i = 1; i != w; ++i)
>   {
>     int x = i << 2;
>     v[x + 4] = t[x + 4];
>   }
>
>   return 0;
> }
>
> Compile it to x86 (I used both gcc 4.7.2 and gcc 4.8.1) with options:
>