On Tue, Sep 23, 2014 at 5:22 PM, Vladimir Makarov <vmaka...@redhat.com> wrote:

>>> You are right constrain_operands is not upto LRA possibilities and we 
>>> should make the following change:
>>>
>>> Index: recog.c
>>> ===================================================================
>>> --- recog.c     (revision 215337)
>>> +++ recog.c     (working copy)
>>> @@ -2639,7 +2639,10 @@ constrain_operands (int strict)
>>>                                || (strict < 0 && CONSTANT_P (op))
>>>                                /* During reload, accept a pseudo  */
>>>                                || (reload_in_progress && REG_P (op)
>>> -                                  && REGNO (op) >= FIRST_PSEUDO_REGISTER)))
>>> +                                  && REGNO (op) >= FIRST_PSEUDO_REGISTER)
>>> +                              /* LRA can put reg value into memory if
>>> +                                 it is necessary.  */
>>> +                              || (strict <= 0 && targetm.lra_p () && REG_P 
>>> (op)))
>>>                     win = 1;
>>>                   else if (insn_extra_address_constraint (cn)
>>>                            /* Every address operand can be reloaded to fit. 
>>>  */
>>>
>>> But that is a different story (for insns with single alternative containing 
>>> only "m").
>>>
>>> I guess I should submit such change for recog.c as a separate patch.
>> I think that the above is the right approach to fix PR60704, so the
>> current PR60704 fix [1] should be reverted.
>>
>> [1] 
>> https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/i386/i386.md?r1=208989&r2=208988&pathrev=208989
>>
>>
> Ok. I can submit patch reverting it + the change in recog.c.
>
> I have still a question: do we really need
>
> (eq_attr "alternative" "1")
>               (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS
>                            || optimize_function_for_size_p (cfun)")
>
> As I wrote I'd always enable the alternative.  I don't expect performance 
> improvement in disabling this alternative when path r->x is slow (as I heard 
> it is implemented internally by moving through cache anyway).  Even it is 
> slow I believe it is still not faster than r->m->x.  What do you think?

The "r->x" alternative results in "vector" decoding on amdfam10. This
is AMD-speak for microcoded instructions, and AMD optimization manual
strongly recommends avoiding them. I have CC'd Ganesh, maybe he can
provide more relevant data on the performance impact.

Thanks,
Uros.

Reply via email to