On Fri, Dec 14, 2012 at 11:47 AM, Yuri Rumyantsev wrote:
> With your new fix that adds if-then-else splitting for memory operands I
> got the expected performance speed-up: +6.7% for Atom and +8.4% for SNB.
> We need to do all testing this weekend and I will get you our final
> feedback on Monday.
Af
Hi Uros,
With your new fix that adds if-then-else splitting for memory operands I
got the expected performance speed-up: +6.7% for Atom and +8.4% for SNB.
We need to do all testing this weekend and I will get you our final
feedback on Monday.
Thanks in advance for all your help.
Yuri.
2012/12/13 Uros Bizj
> > Honza, I think the pass manager should call default_rtl_profile () before
> > each RTL pass to avoid this, no?
>
> Please note that we have plenty of existing peephole2s that use
> optimize_insn_for_speed_p predicate. It is assumed to work ...
It is set by peep2 pass
static void
peephole
> On Wed, Dec 12, 2012 at 7:32 PM, Uros Bizjak wrote:
> > On Wed, Dec 12, 2012 at 3:45 PM, Richard Biener wrote:
> >
> >>> I assume that this is not the right way to fix such a simple performance
> >>> anomaly, since we need to do redundant work - combine load to
> >>> conditional and then split
On Thu, Dec 13, 2012 at 4:02 PM, Yuri Rumyantsev wrote:
> We did not see any performance improvement on Atom in 32-bit mode on
> routelookup from eembc_2_0 (eembc_1_1).
I assume that for x86_64 the patch works as expected. Let's take a
bigger hammer for 32bit targets - the splitter that effectiv
Uros,
We did not see any performance improvement on Atom in 32-bit mode on
routelookup from eembc_2_0 (eembc_1_1).
Best regards.
Yuri.
2012/12/13 Uros Bizjak :
> On Thu, Dec 13, 2012 at 3:27 PM, Uros Bizjak wrote:
>
>>> The patch proposed by Uros is useless since we don't have a free scratch
>>>
On Thu, Dec 13, 2012 at 3:27 PM, Uros Bizjak wrote:
>> The patch proposed by Uros is useless since we don't have a free scratch
>> register to split the memory operand:
>>
>> ;; regs ever live 0[ax] 1[dx] 2[cx] 3[bx] 4[si] 5[di] 6[bp] 7[sp] 17[flags]
>>
>> ...
>>
>> (insn 96 131 13
On Thu, Dec 13, 2012 at 3:23 PM, Yuri Rumyantsev wrote:
> Hi Guys,
>
> The patch proposed by Uros is useless since we don't have a free scratch
> register to split the memory operand:
>
> ;; regs ever live 0[ax] 1[dx] 2[cx] 3[bx] 4[si] 5[di] 6[bp] 7[sp] 17[flags]
>
> ...
>
> (insn 96
On Thu, Dec 13, 2012 at 3:23 PM, Yuri Rumyantsev wrote:
> The patch proposed by Uros is useless since we don't have a free scratch
> register to split the memory operand:
>
> ;; regs ever live 0[ax] 1[dx] 2[cx] 3[bx] 4[si] 5[di] 6[bp] 7[sp] 17[flags]
>
> ...
>
> (insn 96 131 132 7 (
Hi Guys,
The patch proposed by Uros is useless since we don't have a free scratch
register to split the memory operand:
;; regs ever live 0[ax] 1[dx] 2[cx] 3[bx] 4[si] 5[di] 6[bp] 7[sp] 17[flags]
...
(insn 96 131 132 7 (set (reg/v/f:SI 6 bp [orig:70 trie_root ] [70])
(if_the
On Thu, Dec 13, 2012 at 11:20 AM, Uros Bizjak wrote:
> On Thu, Dec 13, 2012 at 10:51 AM, Richard Biener wrote:
>
> I assume that this is not the right way to fix such a simple performance
> anomaly, since we need to do redundant work - combine load to
> conditional and then split it ba
On Thu, Dec 13, 2012 at 10:51 AM, Richard Biener wrote:
I assume that this is not the right way to fix such a simple performance
anomaly, since we need to do redundant work - combine load to
conditional and then split it back in peephole2? Does it look
reasonable? Why should we p
On Wed, Dec 12, 2012 at 7:32 PM, Uros Bizjak wrote:
> On Wed, Dec 12, 2012 at 3:45 PM, Richard Biener wrote:
>
>>> I assume that this is not the right way to fix such a simple performance
>>> anomaly, since we need to do redundant work - combine load to
>>> conditional and then split it back in pe
On 12/12/2012 10:32 AM, Uros Bizjak wrote:
> Please check the attached patch, it implements this limitation in a correct
> way:
> - keeps memory operands for -Os or cold parts of the executable
> - doesn't increase register pressure
> - handles all situations where memory operand can propagate int
On Wed, Dec 12, 2012 at 3:45 PM, Richard Biener wrote:
>> I assume that this is not the right way to fix such a simple performance
>> anomaly, since we need to do redundant work - combine load to
>> conditional and then split it back in peephole2? Does it look
>> reasonable? Why should we produce no
On Wed, Dec 12, 2012 at 3:39 PM, Yuri Rumyantsev wrote:
> Guys,
>
> I assume that this is not the right way to fix such a simple performance
> anomaly, since we need to do redundant work - combine load to
> conditional and then split it back in peephole2? Does it look
> reasonable? Why should we prod
Guys,
I assume that this is not the right way to fix such a simple performance
anomaly, since we need to do redundant work - combine load to
conditional and then split it back in peephole2? Does it look
reasonable? Why should we produce an inefficient instruction that must
be split later?
Best re
On Wed, Dec 12, 2012 at 1:55 PM, Uros Bizjak wrote:
> On Wed, Dec 12, 2012 at 12:44 PM, Richard Biener wrote:
>
>>> This fix aims to remove a performance degradation introduced by the new
>>> LRA phase that is in fact a combiner problem. The GCC combiner
>>> propagates a memory load into an if-then-
On Wed, Dec 12, 2012 at 12:44 PM, Richard Biener wrote:
>> This fix aims to remove a performance degradation introduced by the new
>> LRA phase that is in fact a combiner problem. The GCC combiner
>> propagates a memory load into an if-then-else gimple that was split
>> back by the old reload phase. LR
Hi Richard,
I assume that this fix does not affect code size, since such a pattern
occurs very rarely, although I can add a check for it if you insist.
Register pressure is not an issue here, since I assume that an additional
fill won't affect performance as much as a cmove with a memory operand. I
decided to not
On Wed, Dec 12, 2012 at 12:47 PM, Yuri Rumyantsev wrote:
> Hi Uros,
>
> This fix is for all x86 platforms; we tested it on core2/corei7,
> atom/atom2 and AMD and got performance improvements of +6% to +11%. So I
> don't think we need to introduce an additional tune feature.
>
> Sorry for my typo with gcc
Hi Uros,
This fix is for all x86 platforms; we tested it on core2/corei7,
atom/atom2 and AMD and got performance improvements of +6% to +11%. So I
don't think we need to introduce an additional tune feature.
Sorry for my typo with the gcc version - I meant mainline only, since 4.7
does not use LRA.
Thanks.
Yu
On Wed, Dec 12, 2012 at 12:27 PM, Yuri Rumyantsev wrote:
> Hi All,
>
> This fix aims to remove a performance degradation introduced by the new
> LRA phase that is in fact a combiner problem. The GCC combiner
> propagates a memory load into an if-then-else gimple that was split
> back by the old reload p
On Wed, Dec 12, 2012 at 12:27 PM, Yuri Rumyantsev wrote:
> This fix aims to remove a performance degradation introduced by the new
> LRA phase that is in fact a combiner problem. The GCC combiner
> propagates a memory load into an if-then-else gimple that was split
> back by the old reload phase. LRA d