On Wed, Jun 27, 2012 at 5:02 AM, Richard Henderson <r...@redhat.com> wrote:
> The problem I'd like to solve is stuff like
>
>         pxor    %xmm4, %xmm4
>         ...
>         movdqa  %xmm4, %xmm2
>         pcmpgtd %xmm0, %xmm2
>
> In that there's no point performing the copy from xmm4
> rather than just emitting a new pxor insn.
>
> The Real Problem, as I see it, is that at the point (g)cse
> runs we have no visibility into the 2-operand matching
> constraint on that pcmpgtd so we make the wrong choice
> in sharing the zero.
>
> If we're using AVX, instead of SSE, we don't use matching
> constraints and given the 3-operand insn, hoisting the zero
> is the right and proper thing to do because we won't need
> to emit that movdqa.
>
> Of course, this fires for normal integer code as well.
> Some cases it's a clear win:
>
> -:   41 be 1f 00 00 00       mov    $0x1f,%r14d
> ...
> -:   4c 89 f1                mov    %r14,%rcx
> +:   b9 1f 00 00 00          mov    $0x1f,%ecx
>
> sometimes not (increased code size):
>
> -:   41 bd 01 00 00 00       mov    $0x1,%r13d
> -:   4d 89 ec                mov    %r13,%r12
> +:   41 bc 01 00 00 00       mov    $0x1,%r12d
> +:   41 bd 01 00 00 00       mov    $0x1,%r13d
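For concreteness, the vector pattern quoted above can be reproduced from C with SSE2 intrinsics. This is my own sketch, not code from the thread, and the function name is hypothetical:

```c
#include <emmintrin.h>

/* Hypothetical reproducer: lanes of x that are negative become
   all-ones.  Under SSE2 the generated pcmpgtd is destructive
   (2-operand), so the zero input must either be copied from a
   shared register (movdqa) or re-materialized (pxor); with AVX's
   3-operand vpcmpgtd neither is needed.  */
__m128i
negative_mask (__m128i x)
{
  __m128i zero = _mm_setzero_si128 ();  /* pxor  %xmmN, %xmmN   */
  return _mm_cmpgt_epi32 (zero, x);     /* 0 > x, i.e. x < 0    */
}
```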
I suppose that might be fixed if, instead of

+  /* Only use the constant when it's just as cheap as a reg move.  */
+  if (set_src_cost (c, optimize_function_for_speed_p (cfun)) == 0)
+    return c;

you'd unconditionally use size costs?

> although the total difference is minimal, and ambiguous:
>
>              new text     old text
> cc1          13971302     13971342
> cc1plus      15882736     15882728
>
> Also, note that in the first case above, r14 is otherwise
> unused, and we wind up with an unnecessary save/restore of
> the register in the function.
>
> Thoughts?

We have an inverse issue elsewhere in that we don't CSE
a propagated constant but get

  mov $0, (%eax)
  mov $0, 4(%eax)
  ...

instead of doing one register clearing and then re-using
that register as zero.  But I suppose reload is not exactly
the place to fix that ;)

Richard.

>
> r~
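The inverse issue is easy to trigger; here is a minimal reproducer of my own construction (not taken from any PR), where a single register clear reused for both stores would be smaller at -Os than two immediate-operand stores:

```c
/* My own minimal example: constant propagation turns both stores
   into `mov $0x0, mem` with immediate operands, rather than
   clearing one register once (`xor %eax,%eax`) and storing it
   twice, which would be the shorter encoding.  */
struct pair { int a, b; };

void
clear_pair (struct pair *p)
{
  p->a = 0;   /* mov $0x0,(%rdi)    */
  p->b = 0;   /* mov $0x0,0x4(%rdi) */
}
```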