On Wed, Jan 06, 2010 at 04:18:06PM +0100, Jakub Jelinek wrote:
> On Wed, Jan 06, 2010 at 10:15:58AM +0000, Andrew Haley wrote:
> > On 01/06/2010 09:59 AM, Mark Colby wrote:
> > >>>> Yabbut, how come RTL cse can handle it in x86_64, but PPC not?
> > >>>
> > >>> Probably because the RTL on x86_64 uses and's and ior's, but PPC uses
> > >>> set's of zero_extract's (insvsi).
> > >>
> > >> Aha! Yes, that'll probably be it. It should be easy to fix cse to
> > >> recognize those too.
> >
> > > I'm not familiar with the gcc source yet, but just in case I get the
> > > time to look at this, could anyone give me a file/line ref to dive
> > > into and examine?
> >
> > Would you believe cse.c? :-)
> >
> > I can't find the line without investigating further.
> >
> > Andrew.
> >
> > P.S. This is a nontrivial task if you don't know gcc, but might be a
> > good place for a beginner to start. OTOH, might be hard: no way to
> > know without digging.
>
> I've dug a little bit and this optimizes the testcase on PowerPC 32-bit.
> The patch is completely untested though.
>
> On PowerPC 64-bit which apparently doesn't use ZERO_EXTRACT in this case I
> see a different issue. It generates
> li 3,0
> ori 3,3,32820
> sldi 3,3,16
> while IMHO 2 insns to load the constant would be completely sufficient,
Indeed.
> apparently rs6000_emit_set_long_const needs work.
> lis 3,0x8034
> extsw 3,3
> or
> li 3,0x401a
> sldi 3,3,17
> etc. do IMHO the same.
Huh? I don't think so:
- the first one loads 0xffff_ffff_8034_0000 into r3 (lis sign-extends the
  shifted immediate on 64-bit), so the extsw is redundant and the upper
  half is wrong
- the second one ends up with 0x0000_0000_8034_0000 in r3, and looks optimal.
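For anyone who wants to check the arithmetic, here is a rough Python model
of the instructions involved. It is only a sketch: the helper names (li,
lis, ori, sldi, extsw) are stand-ins that mirror the mnemonics, they are not
taken from GCC, and they model just the 64-bit register effects needed for
these three sequences.

```python
MASK64 = (1 << 64) - 1  # registers are modelled as unsigned 64-bit patterns


def sign_extend(value, bits):
    """Sign-extend the low `bits` bits of value to a 64-bit pattern."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value & MASK64


def li(imm16):
    # li rD,imm: load a sign-extended 16-bit immediate
    return sign_extend(imm16, 16)


def lis(imm16):
    # lis rD,imm: load imm << 16, sign-extended from 32 to 64 bits
    return sign_extend((imm16 & 0xFFFF) << 16, 32)


def ori(reg, imm16):
    # ori rD,rS,imm: OR in a zero-extended 16-bit immediate
    return reg | (imm16 & 0xFFFF)


def sldi(reg, shift):
    # sldi rD,rS,n: shift left by n bits within 64 bits
    return (reg << shift) & MASK64


def extsw(reg):
    # extsw rD,rS: sign-extend the low 32 bits to 64 bits
    return sign_extend(reg, 32)


current = sldi(ori(li(0), 32820), 16)  # the 3-insn sequence GCC emits
first = extsw(lis(0x8034))             # first proposed alternative
second = sldi(li(0x401A), 17)          # second proposed alternative

for name, value in (("current", current), ("first", first), ("second", second)):
    print(f"{name:8s} 0x{value:016x}")
```

Under this model only the second alternative matches what the current
3-insn sequence produces; the first differs in the upper 32 bits, with or
without the extsw.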
Gabriel