[RFC, ARM] later split of symbol_refs

2012-06-27 Thread Dmitry Melnik
Hi, We'd like to draw attention to CodeSourcery's patch for the ARM backend, from which GCC mainline can gain 4% on SPEC2K INT: http://cgit.openembedded.org/openembedded/plain/recipes/gcc/gcc-4.5/linaro/gcc-4.5-linaro-r99369.patch (the patch is also attached). Originally, we noticed that GNU Go works 6

Re: [RFC, ARM] later split of symbol_refs

2012-06-29 Thread Dmitry Melnik
On 06/27/2012 07:55 PM, Ramana Radhakrishnan wrote: > I must admit that I had been suggesting to Zhenqiang about turning > this off by tightening the movsi_insn predicates rather than adding a > split, but given that it appears to produce enough benefit in this > case I don't have any reasons to

Re: [RFC, ARM] later split of symbol_refs

2012-06-29 Thread Dmitry Melnik
On 06/27/2012 07:53 PM, Richard Earnshaw wrote: Please update the ChangeLog entry (it's not appropriate to mention Sourcery G++) and add a comment as Steven has suggested. Otherwise OK. Updated. Ok to commit now? -- Best regards, Dmitry 2009-05-29 Julian Brown gcc/ * config/arm/arm

Re: [RFC, ARM] later split of symbol_refs

2012-07-04 Thread Dmitry Melnik
On 06/29/2012 06:31 PM, Ramana Radhakrishnan wrote: Ok with this comment? +;; Split symbol_refs at the later stage (after cprop), instead of generating +;; movt/movw pair directly at expand. Otherwise corresponding high_sum +;; and lo_sum would be merged back into memory load at cprop. Howeve

Re: [PATCH, ARM] Support NEON's VABD with combine pass

2011-09-12 Thread Dmitry Melnik
Interesting but I would be a bit defensive and make sure that this matches only if -ffast-math in the FP case. You are sort of relying on the fact that vsub wouldn't be generated without ffast-math but I'd rather be defensive about it. (This is in case it's not clear in the non-intrinsics case)

[RFC, ARM][PATCH 0/5] Enhancements to handling of Thumb-2 conditional insns

2011-12-30 Thread Dmitry Melnik
Hi, This series of patches solves a few issues we found with Thumb-2 conditional insns. These fixes include: 1) Split if_then_else into cond_execs to generate only the required minimum of IT-blocks; 2) Grouping conditional insns of the same INSN_PRIORITY to avoid excessive splitting of IT-blocks; 3)

[RFC, ARM][PATCH 1/5] Split if_then_else into cond_execs

2011-12-30 Thread Dmitry Melnik
This patch adds splits for if_then_else into cond_execs. This helps generate the minimum number of IT-blocks for two consecutive if_then_elses, e.g. one ITETE insn instead of the two ITE insns that would result if if_then_else were expanded directly into assembly code. There are three splitters for the cases when b

[RFC, ARM][PATCH 2/5] Try not to split IT-blocks by scheduling conditional insns together

2011-12-30 Thread Dmitry Melnik
more target hooks just to save correct can_issue_more value. This has reduced code size by 144 bytes on SPEC2K INT with -O2 (no regressions). 2011-12-29 Dmitry Melnik gcc/ * config/arm/arm.c (arm_variable_issue, arm_sched_init, arm_sched_finish, arm_sched_re

[RFC, ARM][PATCH 3/5] Adjust the maximum number of if-converted insns to 4

2011-12-30 Thread Dmitry Melnik
branch insn and code won't grow. This limit is applied for each of converted conditional branches. This reduces code size by 96 bytes on SPEC2K INT with -O2 (with +4 byte regression on one test). 2011-12-29 Dmitry Melnik gcc/ * config/arm/arm.h (MAX_CONDITIONAL_EXECUTE): New macro.

[RFC, ARM][PATCH 4/5] Limit on frequency in if-conversion

2011-12-30 Thread Dmitry Melnik
If one of the branches has a significantly greater probability than the other, then it may be better to rely on the CPU's branch prediction and block reordering than to put rarely executed instructions into the pipeline. In this patch we set a 10% frequency ratio as the cutoff. On SPEC2K INT with -O2 this
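The cutoff described in this message can be illustrated with a small sketch. It is not the code from the patch: the function name, the macro name, and the reading of "frequency ratio" as rarer-arm frequency divided by more-frequent-arm frequency are all assumptions made for illustration.

```c
#include <stdbool.h>

/* Hypothetical name; the 10% value comes from the message,
   everything else here is an assumption. */
#define FREQ_RATIO_CUTOFF_PERCENT 10

/* Return true when if-conversion looks worthwhile under the assumed
   rule: the rarer arm runs at least 10% as often as the more
   frequent arm. Otherwise the branch is left for the predictor. */
bool worth_if_converting(unsigned then_freq, unsigned else_freq)
{
    unsigned long long lo = then_freq < else_freq ? then_freq : else_freq;
    unsigned long long hi = then_freq < else_freq ? else_freq : then_freq;

    /* Integer form of lo / hi >= 10 / 100, avoiding division by zero. */
    return lo * 100ULL >= hi * FREQ_RATIO_CUTOFF_PERCENT;
}
```

With this rule a 50/50 branch is a candidate for conversion, while a 95/5 branch is left as a real branch.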

[RFC, ARM][PATCH 5/5] Swap passes peephole2 and if_after_reload

2011-12-30 Thread Dmitry Melnik
After Thumb-2's peephole2 adds flag clobbering on suitable insns in order to generate 16-bit encoding for them, if-conversion can't transform these insns into cond_execs. In theory, if the instruction were converted to conditional form, it would also use 16-bit encoding, so the flag clobbering

[PATCH, ARM] Cortex-A8 backend fixes

2012-02-09 Thread Dmitry Melnik
This patch fixes a few things in the pipeline description of the ARM Cortex-A8. 1) arm_no_early_alu_shift_value_dep() checks early dependence only for one argument, ignoring the dependence on the register used as the shift amount. For example, this function is used as a condition in a bypass that sets dep_cost=0

[PATCH, ARM] Support NEON's VABD with combine pass

2011-07-29 Thread Dmitry Melnik
This patch adds two define_insn patterns for the NEON vabd instruction so that the combine pass recognizes expressions matching (vabs (vsub ...)) patterns as vabd. This patch reduces the code size of the x264 binary from 649143 to 648343 (800 bytes, or 0.12%) and increases its performance on average by 2.5% on
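As a scalar reference for what a lane-wise absolute difference computes, here is a minimal sketch. The function name is hypothetical, and whether a given GCC version vectorizes this loop and combines the result into vabd depends on the target flags (the reply in this thread raises -ffast-math for the floating-point case); none of that is asserted here.

```c
#include <math.h>
#include <stddef.h>

/* Scalar reference: out[i] = |a[i] - b[i]|, i.e. an absolute value
   applied to a subtraction, the shape the message describes as
   (vabs (vsub ...)). The function name is illustrative only. */
void abs_diff_f32(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = fabsf(a[i] - b[i]);
}
```

Calling it on {1, 5, -2, 0.5} and {3, 2, -2, 1.5} writes the per-element absolute differences into out.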

[PATCH, ARM] Reload register class fix for NEON constants

2011-04-25 Thread Dmitry Melnik
Hi All, The attached patch changes the reload class for NEON constant vectors from GENERAL_REGS to NO_REGS. The issue was found on this code from libevas: void _op_blend_p_caa_dp(unsigned *s, unsigned* e, unsigned *d, unsigned c) { while (d < e) { *d = ( (*s) >> 8) & 0x00ff00ff)