On Mon, Feb 11, 2019 at 07:17:16AM -0600, Bill Schmidt wrote:
> At -O0 (if I hand-inline everything myself to avoid errors), we scalarize
> the modulo/masking operation into a rldicl for each doubleword. I really
> don't see any reason to change the code.
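For readers following the thread, here is a portable C sketch of the per-doubleword modulo semantics being discussed. The helper name shift_left_v2di_model is hypothetical, and the sketch assumes that each 64-bit element's shift count is taken modulo 64 (only the low six bits are used). That is the kind of masking a single rldicl per doubleword can perform. It models the semantics only, not the code GCC actually generates.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of a V2DI shift-left with modulo semantics.
   Each shift count is reduced to its low six bits (count % 64), which is
   what the per-doubleword masking mentioned above does. */
static void shift_left_v2di_model(const uint64_t a[2], const uint64_t count[2],
                                  uint64_t out[2])
{
    for (int i = 0; i < 2; i++) {
        uint64_t n = count[i] & 63; /* same as count[i] % 64 */
        out[i] = a[i] << n;         /* n < 64, so the C shift is well defined */
    }
}
```

For example, with a = {1, 3} and count = {4, 68}, the result is {16, 48}, because 68 is reduced to 4 before shifting.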
So what does this look like at expand (at -O0)? Is it something that is
done at gimple level, is it expand itself, is it some target thing?

> >> For -mcpu=power9, we get close, but have some bad register allocation and
> >> an unnecessary extend:
> >>
> >>   xxspltib 0,4     <- why not just xxspltib 32,4?
> >>   xxlor 32,0,0     <- wasted copy
> >
> > Yeah, huh. Where does that come from... I blame splitters after reload.
>
> This only happens at -O2 and up, FWIW. At -O1 we allocate the registers
> reasonably.

Heh.

> >> Weird. I just tried adding -mvsx
> >
> > Does it _need_ VSX anyway? Are these builtins defined without it, too?
>
> Yes (vector long long / V2DImode requires VSX).

So something like

/* { dg-do run } */
/* { dg-require-effective-target vsx_hw } */
/* { dg-options "-mvsx -O2" } */

then?

> I pointed to the bugzilla in another reply -- which was "resolved" with a
> hack.
> I consider it still broken this way...

Reopen that PR?

> I tested a revised version of the patch overnight and will submit shortly.

Thanks.


Segher