On September 23, 2014 5:33:35 PM CEST, Uros Bizjak <ubiz...@gmail.com> wrote:
>On Tue, Sep 23, 2014 at 5:22 PM, Vladimir Makarov <vmaka...@redhat.com>
>wrote:
>
>>>> You are right: constrain_operands is not up to LRA's possibilities, and
>>>> we should make the following change:
>>>>
>>>> Index: recog.c
>>>> ===================================================================
>>>> --- recog.c     (revision 215337)
>>>> +++ recog.c     (working copy)
>>>> @@ -2639,7 +2639,10 @@ constrain_operands (int strict)
>>>>                                || (strict < 0 && CONSTANT_P (op))
>>>>                                /* During reload, accept a pseudo  */
>>>>                                || (reload_in_progress && REG_P (op)
>>>> -                                  && REGNO (op) >= FIRST_PSEUDO_REGISTER)))
>>>> +                                  && REGNO (op) >= FIRST_PSEUDO_REGISTER)
>>>> +                              /* LRA can put reg value into memory if
>>>> +                                 it is necessary.  */
>>>> +                              || (strict <= 0 && targetm.lra_p () && REG_P (op)))
>>>>                     win = 1;
>>>>                   else if (insn_extra_address_constraint (cn)
>>>>                            /* Every address operand can be reloaded to fit.  */
>>>>
>>>> But that is a different story (for insns with a single alternative
>>>> containing only "m").
>>>>
>>>> I guess I should submit such a change for recog.c as a separate
>>>> patch.
>>> I think that the above is the right approach to fix PR60704, so the
>>> current PR60704 fix [1] should be reverted.
>>>
>>> [1] https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/i386/i386.md?r1=208989&r2=208988&pathrev=208989
>>>
>>>
>> OK.  I can submit a patch reverting it, plus the change in recog.c.
>>
>> I still have a question: do we really need
>>
>> (eq_attr "alternative" "1")
>>               (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS
>>                            || optimize_function_for_size_p (cfun)")
>>
>> As I wrote, I'd always enable the alternative.  I don't expect a
>> performance improvement from disabling this alternative when the r->x
>> path is slow (as I heard, it is implemented internally by moving through
>> the cache anyway).  Even if it is slow, I believe it is still not slower
>> than r->m->x.  What do you think?
>
>The "r->x" alternative results in "vector" decoding on amdfam10. This
>is AMD-speak for microcoded instructions, and AMD optimization manual
>strongly recommends avoiding them. I have CC'd Ganesh, maybe he can
>provide more relevant data on the performance impact.

IIRC a vector-decoded instruction merely limits the frontend, which can
decode and dispatch at most one such insn at a time.  So the performance
impact depends on the context.
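
A toy model of that point, as a sketch only: assume a frontend that decodes
up to three direct-path insns per cycle, while a vector-path insn has to
decode alone in its own cycle.  The width of three and the name
decode_cycles are assumptions made for illustration, not figures from any
manual or code from GCC.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed decode width for direct-path insns (illustrative only).  */
enum { DECODE_WIDTH = 3 };

/* Hypothetical helper: INSNS[i] is nonzero for a vector-path insn and
   zero for a direct-path one.  Returns the number of decode cycles the
   toy frontend needs for the whole stream of N insns.  */
unsigned
decode_cycles (const int *insns, size_t n)
{
  unsigned cycles = 0;
  size_t i = 0;

  assert (insns != NULL || n == 0);

  while (i < n)
    {
      if (insns[i])
        i++;                    /* Vector insn: takes the cycle alone.  */
      else
        {
          /* Group up to DECODE_WIDTH consecutive direct-path insns.  */
          size_t k = 0;
          while (i < n && k < DECODE_WIDTH && !insns[i])
            {
              i++;
              k++;
            }
        }
      cycles++;
    }
  return cycles;
}
```

Under these assumptions, six direct-path insns decode in 2 cycles, and
replacing the third one with a vector-path insn makes that 3.  Whether the
extra decode cycle shows up in run time depends on whether the surrounding
code is limited by the frontend at all.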

Richard.

>Thanks,
>Uros.

