2014-07-03 15:11 GMT+04:00 Uros Bizjak <ubiz...@gmail.com>:
> On Thu, Jul 3, 2014 at 12:45 PM, Ilya Enkovich <enkovich....@gmail.com> wrote:
>
>>>> Silvermont processors have penalty for instructions having 4+ bytes of
>>>> prefixes (including escape bytes in opcode).  This situation happens
>>>> when REX prefix is used in SSE4 instructions.  This patch tries to avoid
>>>> such situation by preferring xmm0-xmm7 usage over xmm8-xmm15 in those
>>>> instructions.  I achieved it by adding new tuning flag and new
>>>> alternatives affected by tuning.
>>>
>>>> SSE4 instructions are not very widely used by GCC but I see some
>>>> significant gains caused by this patch (tested on Avoton on -O3).
>>>
>>>> 2014-07-02  Ilya Enkovich  <ilya.enkov...@intel.com>
>>>
>>>> * config/i386/constraints.md (Yr): New.
>>>> * config/i386/i386.h (reg_class): Add NO_REX_SSE_REGS.
>>>> (REG_CLASS_NAMES): Likewise.
>>>> (REG_CLASS_CONTENTS): Likewise.
>>>> * config/i386/sse.md (*vec_concatv2sf_sse4_1): Add alternatives
>>>> which use only NO_REX_SSE_REGS.
>>>
>>> You don't need to add alternatives, just change existing alternatives
>>> from "x" to "Yr". The allocator will handle reduced register set just
>>> fine.
>>
>> Hi,
>>
>> Thanks for review!
>>
>> My first patch version did such replacement. Performance results were
>> OK but I got into stability issues due to peephole2 pass.  Peepholes
>> may exchange operands of instructions and ignore register restrictions
>> assuming all SSE registers are homogeneous.  It caused unrecognized
>> instructions on some tests.  I preferred to add a new alternative
>> instead of fixing peephole and possibly other similar problems.
>
> No, please rather fix the peephole2 patterns. It is just a matter of
> putting satisfies_constraint_Xx to their insn condition. In effect,
> peephole2 pass is nullifying your optimization. Also, RA is still free
> to allocate unwanted registers, even when prefixed with "?".

I didn't find a nice way to fix the peephole2 patterns so that they take
register constraints into account.  Is there any way to do it?
Also, fully restricting xmm8-15 does not seem right.  Using them is just
costly, not disallowed.
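
To make the cost concrete, here is a rough sketch of the byte counting
(a simplified model, assuming a register-register SSE4.1 instruction
encoded as 66 [REX] 0F 38/3A opcode; the helper name is made up for
illustration and is not part of the patch):

```c
#include <stdbool.h>

/* Hypothetical helper, not part of the patch: count the prefix and
   escape bytes of a register-register SSE4.1 instruction encoded as
   66 [REX] 0F 38/3A opcode.  REG and RM are xmm register numbers
   in the range 0-15.  */
static int
sse4_prefix_bytes (int reg, int rm)
{
  /* xmm8-xmm15 need a REX prefix to encode the high register bit.  */
  bool need_rex = reg >= 8 || rm >= 8;

  return 1                      /* mandatory 66 prefix */
         + (need_rex ? 1 : 0)   /* optional REX prefix */
         + 2;                   /* 0F 38 or 0F 3A escape bytes */
}
```

In this model an instruction using only xmm0-xmm7 has 3 such bytes, while
one touching xmm8-xmm15 has 4 and so reaches the penalty threshold.  The
instruction is still valid, just slower.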

Ilya

>
> Uros.
