On September 23, 2014 5:33:35 PM CEST, Uros Bizjak <ubiz...@gmail.com> wrote:
>On Tue, Sep 23, 2014 at 5:22 PM, Vladimir Makarov <vmaka...@redhat.com> wrote:
>
>>>> You are right constrain_operands is not upto LRA possibilities and
>>>> we should make the following change:
>>>>
>>>> Index: recog.c
>>>> ===================================================================
>>>> --- recog.c (revision 215337)
>>>> +++ recog.c (working copy)
>>>> @@ -2639,7 +2639,10 @@ constrain_operands (int strict)
>>>>                    || (strict < 0 && CONSTANT_P (op))
>>>>                    /* During reload, accept a pseudo  */
>>>>                    || (reload_in_progress && REG_P (op)
>>>> -                      && REGNO (op) >= FIRST_PSEUDO_REGISTER)))
>>>> +                      && REGNO (op) >= FIRST_PSEUDO_REGISTER)
>>>> +                  /* LRA can put reg value into memory if
>>>> +                     it is necessary.  */
>>>> +                  || (strict <= 0 && targetm.lra_p () && REG_P (op)))
>>>>                  win = 1;
>>>>                else if (insn_extra_address_constraint (cn)
>>>>                         /* Every address operand can be reloaded to fit.  */
>>>>
>>>> But that is a different story (for insns with single alternative
>>>> containing only "m").
>>>>
>>>> I guess I should submit such change for recog.c as a separate patch.
>>> I think that the above is the right approach to fix PR60704, so the
>>> current PR60704 fix [1] should be reverted.
>>>
>>> [1] https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/i386/i386.md?r1=208989&r2=208988&pathrev=208989
>>>
>>>
>> Ok. I can submit patch reverting it + the change in recog.c.
>>
>> I have still a question: do we really need
>>
>> (eq_attr "alternative" "1")
>>   (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS
>>                || optimize_function_for_size_p (cfun)")
>>
>> As I wrote I'd always enable the alternative. I don't expect
>> performance improvement in disabling this alternative when path r->x is
>> slow (as I heard it is implemented internally by moving through cache
>> anyway). Even it is slow I believe it is still not faster than
>> r->m->x. What do you think?
>
>The "r->x" alternative results in "vector" decoding on amdfam10. This
>is AMD-speak for microcoded instructions, and AMD optimization manual
>strongly recommends avoiding them. I have CC'd Ganesh, maybe he can
>provide more relevant data on the performance impact.

IIRC a vector-decoded instruction merely limits the frontend, which can
decode and dispatch at most one such insn at a time. So the performance
impact depends on the context.

Richard.

>Thanks,
>Uros.
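
For readers following the recog.c hunk quoted above, here is a minimal, self-contained sketch of the acceptance test for a memory constraint before and after the proposed change. It is only a model: `op_kind`, `pass_state`, `mem_ok_before` and `mem_ok_after` are hypothetical names, not GCC's rtx API, and the first clause (the operand already is a suitable memory reference) is assumed, since it lies outside the quoted hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GCC internals; this is not the real rtx API. */
enum op_kind { OP_MEM, OP_CONST, OP_HARD_REG, OP_PSEUDO_REG };

struct pass_state {
  bool reload_in_progress;  /* models the reload_in_progress flag */
  bool lra_p;               /* models the result of targetm.lra_p () */
};

static bool is_reg(enum op_kind k)
{
  return k == OP_HARD_REG || k == OP_PSEUDO_REG;
}

/* Model of the condition before the patch.  The OP_MEM clause is an
   assumption; it is not part of the quoted hunk.  */
bool mem_ok_before(enum op_kind op, int strict, struct pass_state st)
{
  return op == OP_MEM
         /* Before reload, accept what reload can turn into mem.  */
         || (strict < 0 && op == OP_CONST)
         /* During reload, accept a pseudo.  */
         || (st.reload_in_progress && op == OP_PSEUDO_REG);
}

/* Model of the condition after the patch: with LRA enabled and a
   non-strict check, any register operand is accepted as well.  */
bool mem_ok_after(enum op_kind op, int strict, struct pass_state st)
{
  return mem_ok_before(op, strict, st)
         || (strict <= 0 && st.lra_p && is_reg(op));
}

/* A few cases that show where the two predicates differ.  */
void self_check(void)
{
  struct pass_state lra = { false, true };
  struct pass_state reload = { true, false };

  assert(!mem_ok_before(OP_HARD_REG, 0, lra));
  assert(mem_ok_after(OP_HARD_REG, 0, lra));
  assert(!mem_ok_after(OP_HARD_REG, 1, lra));     /* strict check unchanged */
  assert(mem_ok_after(OP_PSEUDO_REG, 0, reload)); /* old reload clause kept */
  assert(!mem_ok_after(OP_HARD_REG, 0, reload));  /* no LRA, no new clause */
}
```

In this model the only behavioural difference is the last clause: under LRA, a register operand passes a non-strict check against an "m"-only alternative, which matches the single-alternative case mentioned in the quoted message.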