Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-19 Thread Uros Bizjak
On Fri, Dec 14, 2012 at 11:47 AM, Yuri Rumyantsev wrote: > With your new fix that add if-then-else splitting for memory operand I > got expected performance speed-up - +6.7% for Atom and +8.4% for SNB. > We need to do all testing this weekend and I will get you our final > feedback on Monday. Af

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-14 Thread Yuri Rumyantsev
Hi Uros, With your new fix that add if-then-else splitting for memory operand I got expected performance speed-up - +6.7% for Atom and +8.4% for SNB. We need to do all testing this weekend and I will get you our final feedback on Monday. Thanks ahead for all your help. Yuri. 2012/12/13 Uros Bizj

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-13 Thread Jan Hubicka
> > Honza, I think the pass manager should call default_rtl_profile () before > > each > > RTL pass to avoid this, no? > > Please note that we have plenty of existing peephole2s that use > optimize_insn_for_speed_p predicate. It is assumed to work ... It is set by peep2 pass static void peephole

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-13 Thread Jan Hubicka
> On Wed, Dec 12, 2012 at 7:32 PM, Uros Bizjak wrote: > > On Wed, Dec 12, 2012 at 3:45 PM, Richard Biener > > wrote: > > > >>> I assume that this is not right way for fixing such simple performance > >>> anomaly since we need to do redundant work - combine load to > >>> conditional and then split

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-13 Thread Uros Bizjak
On Thu, Dec 13, 2012 at 4:02 PM, Yuri Rumyantsev wrote: > We did not see any performance improvement on Atom in 32-bit mode at > routelookup from eembc_2_0 (eembc_1_1). I assume that for x86_64 the patch works as expected. Let's take a bigger hammer for 32bit targets - the splitter that effectiv

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-13 Thread Yuri Rumyantsev
Uros, We did not see any performance improvement on Atom in 32-bit mode at routelookup from eembc_2_0 (eembc_1_1). Best regards. Yuri. 2012/12/13 Uros Bizjak : > On Thu, Dec 13, 2012 at 3:27 PM, Uros Bizjak wrote: > >>> The patch proposed by Uros is useless since we don't have free scratch >>>

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-13 Thread Uros Bizjak
On Thu, Dec 13, 2012 at 3:27 PM, Uros Bizjak wrote: >> The patch proposed by Uros is useless since we don't have free scratch >> register to do splitting of memory operand: >> >> ;; regs ever live 0[ax] 1[dx] 2[cx] 3[bx] 4[si] 5[di] 6[bp] 7[sp] >> 17[flags] >> >> ... >> >> (insn 96 131 13

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-13 Thread Richard Biener
On Thu, Dec 13, 2012 at 3:23 PM, Yuri Rumyantsev wrote: > Hi Guys, > > The patch proposed by Uros is useless since we don't have free scratch > register to do splitting of memory operand: > > ;; regs ever live 0[ax] 1[dx] 2[cx] 3[bx] 4[si] 5[di] 6[bp] 7[sp] > 17[flags] > > ... > > (insn 96

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-13 Thread Uros Bizjak
On Thu, Dec 13, 2012 at 3:23 PM, Yuri Rumyantsev wrote: > The patch proposed by Uros is useless since we don't have free scratch > register to do splitting of memory operand: > > ;; regs ever live 0[ax] 1[dx] 2[cx] 3[bx] 4[si] 5[di] 6[bp] 7[sp] > 17[flags] > > ... > > (insn 96 131 132 7 (

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-13 Thread Yuri Rumyantsev
Hi Guys, The patch proposed by Uros is useless since we don't have free scratch register to do splitting of memory operand: ;; regs ever live 0[ax] 1[dx] 2[cx] 3[bx] 4[si] 5[di] 6[bp] 7[sp] 17[flags] ... (insn 96 131 132 7 (set (reg/v/f:SI 6 bp [orig:70 trie_root ] [70]) (if_the

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-13 Thread Richard Biener
On Thu, Dec 13, 2012 at 11:20 AM, Uros Bizjak wrote: > On Thu, Dec 13, 2012 at 10:51 AM, Richard Biener > wrote: > > I assume that this is not right way for fixing such simple performance > anomaly since we need to do redundant work - combine load to > conditional and then split it ba

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-13 Thread Uros Bizjak
On Thu, Dec 13, 2012 at 10:51 AM, Richard Biener wrote: I assume that this is not right way for fixing such simple performance anomaly since we need to do redundant work - combine load to conditional and then split it back in peephole2? Does it look reasonable? Why we should p

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-13 Thread Richard Biener
On Wed, Dec 12, 2012 at 7:32 PM, Uros Bizjak wrote: > On Wed, Dec 12, 2012 at 3:45 PM, Richard Biener > wrote: > >>> I assume that this is not right way for fixing such simple performance >>> anomaly since we need to do redundant work - combine load to >>> conditional and then split it back in pe

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-12 Thread Richard Henderson
On 12/12/2012 10:32 AM, Uros Bizjak wrote: > Please check the attached patch, it implements this limitation in a correct > way: > - keeps memory operands for -Os or cold parts of the executable > - doesn't increase register pressure > - handles all situations where memory operand can propagate int

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-12 Thread Uros Bizjak
On Wed, Dec 12, 2012 at 3:45 PM, Richard Biener wrote: >> I assume that this is not right way for fixing such simple performance >> anomaly since we need to do redundant work - combine load to >> conditional and then split it back in peephole2? Does it look >> reasonable? Why we should produce no

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-12 Thread Richard Biener
On Wed, Dec 12, 2012 at 3:39 PM, Yuri Rumyantsev wrote: > Guys, > > I assume that this is not right way for fixing such simple performance > anomaly since we need to do redundant work - combine load to > conditional and then split it back in peephole2? Does it look > reasonable? Why we should prod

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-12 Thread Yuri Rumyantsev
Guys, I assume that this is not right way for fixing such simple performance anomaly since we need to do redundant work - combine load to conditional and then split it back in peephole2? Does it look reasonable? Why we should produce non-efficient instruction that must be splitted later? Best re

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-12 Thread Richard Biener
On Wed, Dec 12, 2012 at 1:55 PM, Uros Bizjak wrote: > On Wed, Dec 12, 2012 at 12:44 PM, Richard Biener > wrote: > >>> This fix is aimed to remove performance degradation introduced by new >>> LRA phase that in fact is combining problem. Gcc combiner does >>> propagation of memory load to if-then-

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-12 Thread Uros Bizjak
On Wed, Dec 12, 2012 at 12:44 PM, Richard Biener wrote: >> This fix is aimed to remove performance degradation introduced by new >> LRA phase that in fact is combining problem. Gcc combiner does >> propagation of memory load to if-then-else gimple that was splitted >> back by old reload phase. LR

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-12 Thread Yuri Rumyantsev
Hi Richard, I assume that this fix does not affect on code size since such pattern happens very rare although I can add a check on it if you insist. Register pressure is not a issue here since I assume that additional fill won't affect on performance as cmove with memory operand. I decided to not

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-12 Thread Richard Biener
On Wed, Dec 12, 2012 at 12:47 PM, Yuri Rumyantsev wrote: > Hi Uros, > > This fix is for all x86 platforms, we tested it on core2/corei7, > atom/atom2 and AMD and got performance improvement +6% -- +11%. So I > don't think we need to introduce additional tune feature. > > Sorry for my typo with gcc

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-12 Thread Yuri Rumyantsev
Hi Uros, This fix is for all x86 platforms, we tested it on core2/corei7, atom/atom2 and AMD and got performance improvement +6% -- +11%. So I don't think we need to introduce additional tune feature. Sorry for my typo with gcc version - I meant mainline only since 4.7 does not use LRA. Thanks. Yu

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-12 Thread Richard Biener
On Wed, Dec 12, 2012 at 12:27 PM, Yuri Rumyantsev wrote: > Hi All, > > This fix is aimed to remove performance degradation introduced by new > LRA phase that in fact is combining problem. Gcc combiner does > propagation of memory load to if-then-else gimple that was splitted > back by old reload p

Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-12 Thread Uros Bizjak
On Wed, Dec 12, 2012 at 12:27 PM, Yuri Rumyantsev wrote: > This fix is aimed to remove performance degradation introduced by new > LRA phase that in fact is combining problem. Gcc combiner does > propagation of memory load to if-then-else gimple that was splitted > back by old reload phase. LRA d