On Sun, Feb 10, 2019 at 08:42:42PM -0600, Bill Schmidt wrote: > On 2/10/19 4:05 PM, Segher Boessenkool wrote: > > On Sun, Feb 10, 2019 at 10:10:02AM -0600, Bill Schmidt wrote: > >> I've added executable tests for both shift-right algebraic and shift-right > >> logical. > >> Both fail prior to applying the patch, and work correctly afterwards. > > Please add a test for left shifts, as well? > > Can do. I verified that left shifts were not broken and figured a test case > had been added then, but have not checked. Good to test this particular > scenario, though.
Well you actually added code for left shifts, too ;-) > >> 2019-02-08 Bill Schmidt <wschm...@linux.ibm.com> > >> > >> * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Shift-right > >> and shift-left vector built-ins need to include a TRUNC_MOD_EXPR > >> for correct semantics. Also, don't expand a vector-splat if there > >> is a type mismatch; let the back end handle it. > > Does it always result in just the shift instruction now? Does the modulo > > ever remain? (Maybe at -O0?) Modulo is hugely expensive; it is always > > modulo by a power of two, so a simple bitmask, so maybe write that directly > > instead? > > We always get the shift. For -mcpu=power8, we always load the mask from > memory rather than generating the vspltisb, which is not ideal code > generation, > but is at least correct. _Just_ the shift, I meant. I know we load a constant from memory, for constants; I am worried about non-constants, do those get a modulo, or just a mask. Also at -O0; if we get an actual division at -O0, there probably are other cases where it isn't optimised away as well. > For -mcpu=power9, we get close, but have some bad register allocation and > an unnecessary extend: > > xxspltib 0,4 <- why not just xxspltib 32,4? > xxlor 32,0,0 <- wasted copy Yeah, huh. Where does that come from... I blame splitters after reload. > vextsb2d 0,0 <- unnecessary due to vsrad semantics > vsrad 2,2,0 > > Again, this is at least correct. We have more work to do to produce the > most efficient code, but we have PR89213 open for that. Yes. But I'd like to see at least a mask instead of a modulo. > >> + /* It's sometimes useful to use one of these to build a > >> + splat for V2DImode, since the upper bits will be ignored. > >> + Don't fold if we detect that situation, as we end up > >> + losing the splat instruction in this case. */ > >> + if (size != TREE_INT_CST_LOW (TYPE_SIZE (TREE_TYPE (TREE_TYPE (lhs))))) > >> + return false; > > This isn't really detecting that situation... Instead, it doesn't fold > > whenever the size of the splatted elements isn't the same as the size of > > the elts of the target vector. That's probably perfectly safe, but please > > spell it out. It's fine to mention the motivating case, of course. > > Yep, will correct. Actually, as I look back at my notes, I believe that this > change is not necessary after all (same code generated with and without it). > I'll verify. Ha, that would be nice :-) > > Testing for vsx_hw but not enabling vsx is probably wrong, too. > > Weird. I just tried adding -mvsx Does it _need_ VSX anyway? Are these builtins defined without it, too? > and I get this peculiar error we've seen > before about AMD graphics card offloading: > > spawn -ignore SIGHUP /home/wschmidt/gcc/build/gcc-mainline-test2/gcc/xgcc > -B/home/wschmidt/gcc/build/gcc-mainline-test2/gcc/ > /home/wschmidt/gcc/gcc-mainline-t\ > est2/gcc/testsuite/gcc.target/powerpc/srad-modulo.c > -fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers > -fdiagnostics-color=never -O2 -mvsx -lm -o \ > ./srad-modulo.exe^M > ^[[01m^[[Kcc1:^[[m^[[K ^[[01;31m^[[Kerror: ^[[m^[[Kargument to > '^[[01m^[[K-O^[[m^[[K' should be a non-negative integer, > '^[[01m^[[Kg^[[m^[[K', '^[[01m^[[Ks^[[\ > m^[[K' or '^[[01m^[[Kfast^[[m^[[K'^M > compiler exited with status 1 Huh, why do you get colour escapes at all? I thought the testsuite used GCC_COLORS= to get rid of that nonsense. It's quite unreadable like this. > Executing on host: /home/wschmidt/gcc/build/gcc-mainline-test2/gcc/xgcc > -B/home/wschmidt/gcc/build/gcc-mainline-test2/gcc/ offload_gcn7262.c > -fno-diagnosti\ > cs-show-caret -fno-diagnostics-show-line-numbers -fdiagnostics-color=never > -foffload=amdgcn-unknown-amdhsa -S -o offload_gcn7262.s (timeout = 300) > spawn -ignore SIGHUP /home/wschmidt/gcc/build/gcc-mainline-test2/gcc/xgcc > -B/home/wschmidt/gcc/build/gcc-mainline-test2/gcc/ offload_gcn7262.c > -fno-diagnostic\ > s-show-caret -fno-diagnostics-show-line-numbers -fdiagnostics-color=never > -foffload=amdgcn-unknown-amdhsa -S -o offload_gcn7262.s^M > xgcc: fatal error: GCC is not configured to support amdgcn-unknown-amdhsa as > offload target^M > compilation terminated.^M > compiler exited with status 1 This is a test to see if there is offload stuff. There isn't. > FAIL: gcc.target/powerpc/srad-modulo.c (test for excess errors) > Excess errors: > ^[[01m^[[Kcc1:^[[m^[[K ^[[01;31m^[[Kerror: ^[[m^[[Kargument to > '^[[01m^[[K-O^[[m^[[K' should be a non-negative integer, > '^[[01m^[[Kg^[[m^[[K', '^[[01m^[[Ks^[[\ > m^[[K' or '^[[01m^[[Kfast^[[m^[[K' > > Any ideas what's causing this? I can't test with -mvsx until it's fixed. > Do we still have a bug open on this? I have never seen this before, not in bugzilla, not in my own testing (I do however almost always use --disable-libgomp, it breaks way too often, impeding all other progress). > > Does it need -O3, does -O2 not work? > > -O2 is fine, can change to that. Please do. > > Should this testcase check expected machine code as well? > > I think we should defer that to the fix for PR89213. We aren't at > ideal code quite yet. Gotcha. Segher