2014-07-03 15:11 GMT+04:00 Uros Bizjak <ubiz...@gmail.com>:
> On Thu, Jul 3, 2014 at 12:45 PM, Ilya Enkovich <enkovich....@gmail.com> wrote:
>
>>>> Silvermont processors have penalty for instructions having 4+ bytes of
>>>> prefixes (including escape bytes in opcode). This situation happens
>>>> when REX prefix is used in SSE4 instructions. This patch tries to
>>>> avoid such situation by preferring xmm0-xmm7 usage over xmm8-xmm15 in
>>>> those instructions. I achieved it by adding new tuning flag and new
>>>> alternatives affected by tuning.
>>>
>>>> SSE4 instructions are not very widely used by GCC but I see some
>>>> significant gains caused by this patch (tested on Avoton on -O3).
>>>
>>>> 2014-07-02  Ilya Enkovich  <ilya.enkov...@intel.com>
>>>
>>>>         * config/i386/constraints.md (Yr): New.
>>>>         * config/i386/i386.h (reg_class): Add NO_REX_SSE_REGS.
>>>>         (REG_CLASS_NAMES): Likewise.
>>>>         (REG_CLASS_CONTENTS): Likewise.
>>>>         * config/i386/sse.md (*vec_concatv2sf_sse4_1): Add alternatives
>>>>         which use only NO_REX_SSE_REGS.
>>>
>>> You don't need to add alternatives, just change existing alternatives
>>> from "x" to "Yr". The allocator will handle reduced register set just
>>> fine.
>>
>> Hi,
>>
>> Thanks for review!
>>
>> My first patch version did such replacement. Performance results were
>> OK but I got into stability issues due to peephole2 pass. Peepholes
>> may exchange operands of instructions and ignore register restrictions
>> assuming all SSE registers are homogeneous. It caused unrecognized
>> instructions on some tests. I preferred to add a new alternative
>> instead of fixing peephole and possibly other similar problems.
>
> No, please rather fix the peephole2 patterns. It is just a matter of
> putting satisfies_constraint_Xx to their insn condition. In effect,
> peephole2 pass is nullifying your optimization. Also, RA is still free
> to allocate unwanted registers, even when prefixed with "?".

I didn't find a nice way to fix the peephole2 patterns so that they take
register constraints into account. Is there any way to do it? Also,
fully restricting xmm8-xmm15 does not seem right: using those registers
is just costly, not disallowed.

Ilya

>
> Uros.
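
For reference, the byte count behind the penalty described in the quoted patch description can be sketched in C. This is only an illustrative model, not GCC code: it assumes a legacy-encoded (non-VEX) SSE4.1 register-to-register instruction with the 0x66 prefix and the two-byte 0F 38 / 0F 3A escape, and it ignores segment overrides and memory operands. The struct and function names are hypothetical.

```c
#include <stdbool.h>

/* Illustrative model only (hypothetical names, not taken from GCC).
   Describes the bytes that precede the final opcode byte of a
   legacy-encoded (non-VEX) SSE instruction.  */
struct sse_encoding {
    bool has_66_prefix;  /* 0x66 operand-size prefix is present */
    bool needs_rex;      /* a REX byte is required */
    int escape_bytes;    /* 1 for 0F, 2 for 0F 38 or 0F 3A */
};

/* Count prefix bytes plus opcode escape bytes, the quantity the
   patch description says is penalized at 4 or more.  */
static int prefix_and_escape_bytes(struct sse_encoding e)
{
    return (e.has_66_prefix ? 1 : 0) + (e.needs_rex ? 1 : 0) + e.escape_bytes;
}

/* xmm8-xmm15 can only be encoded with a REX prefix.  */
static bool xmm_needs_rex(int regno)
{
    return regno >= 8;
}

/* Assumed shape of a typical SSE4.1 reg-reg instruction:
   66 [REX] 0F 38 <opcode> <modrm>.  */
static int sse4_prefix_bytes(int dst_xmm, int src_xmm)
{
    struct sse_encoding e;
    e.has_66_prefix = true;
    e.needs_rex = xmm_needs_rex(dst_xmm) || xmm_needs_rex(src_xmm);
    e.escape_bytes = 2;
    return prefix_and_escape_bytes(e);
}
```

Under these assumptions, sse4_prefix_bytes(0, 7) stays below the 4-byte threshold, while any operand in xmm8-xmm15 reaches it. That is why preferring xmm0-xmm7 helps, and also why xmm8-xmm15 are only more expensive rather than invalid.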