On 2/11/19 6:43 AM, Segher Boessenkool wrote: > On Sun, Feb 10, 2019 at 08:42:42PM -0600, Bill Schmidt wrote: >> On 2/10/19 4:05 PM, Segher Boessenkool wrote: >>> On Sun, Feb 10, 2019 at 10:10:02AM -0600, Bill Schmidt wrote: >>>> I've added executable tests for both shift-right algebraic and shift-right >>>> logical. >>>> Both fail prior to applying the patch, and work correctly afterwards. >>> Please add a test for left shifts, as well? >> Can do. I verified that left shifts were not broken and figured a test case >> had been added then, but have not checked. Good to test this particular >> scenario, though. > Well you actually added code for left shifts, too ;-)
Nope. :-) It just looks like it because of the way the diff hunks come out. The changes are to shift-right-logical and shift-right-algebraic. But it still doesn't hurt to add the test. > >>>> 2019-02-08 Bill Schmidt <wschm...@linux.ibm.com> >>>> >>>> * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Shift-right >>>> and shift-left vector built-ins need to include a TRUNC_MOD_EXPR >>>> for correct semantics. Also, don't expand a vector-splat if there >>>> is a type mismatch; let the back end handle it. >>> Does it always result in just the shift instruction now? Does the modulo >>> ever remain? (Maybe at -O0?) Modulo is hugely expensive; it is always >>> modulo by a power of two, so a simple bitmask, so maybe write that directly >>> instead? >> We always get the shift. For -mcpu=power8, we always load the mask from >> memory rather than generating the vspltisb, which is not ideal code >> generation, >> but is at least correct. > _Just_ the shift, I meant. I know we load a constant from memory, for > constants; I am worried about non-constants, do those get a modulo, or > just a mask. Also at -O0; if we get an actual division at -O0, there > probably are other cases where it isn't optimised away as well. Even at -O1 this is all folded away in GIMPLE: test_sradi_4 (vi64_t a) { vi64_t result; <bb 2> [local count: 1073741824]: result_3 = a_2(D) >> 4; return result_3; } At -O0 (if I hand-inline everything myself to avoid errors), we scalarize the modulo/masking operation into a rldicl for each doubleword. I really don't see any reason to change the code. > >> For -mcpu=power9, we get close, but have some bad register allocation and >> an unnecessary extend: >> >> xxspltib 0,4 <- why not just xxspltib 32,4? >> xxlor 32,0,0 <- wasted copy > Yeah, huh. Where does that come from... I blame splitters after reload. This only happens at -O2 and up, FWIW. At -O1 we allocate the registers reasonably. > >> vextsb2d 0,0 <- unnecessary due to vsrad semantics >> vsrad 2,2,0 >> >> Again, this is at least correct. We have more work to do to produce the >> most efficient code, but we have PR89213 open for that. > Yes. But I'd like to see at least a mask instead of a modulo. Again, there is no modulo by the time we exit GIMPLE. Our efficiency failings are all in the back end. > >>>> + /* It's sometimes useful to use one of these to build a >>>> + splat for V2DImode, since the upper bits will be ignored. >>>> + Don't fold if we detect that situation, as we end up >>>> + losing the splat instruction in this case. */ >>>> + if (size != TREE_INT_CST_LOW (TYPE_SIZE (TREE_TYPE (TREE_TYPE (lhs))))) >>>> + return false; >>> This isn't really detecting that situation... Instead, it doesn't fold >>> whenever the size of the splatted elements isn't the same as the size of >>> the elts of the target vector. That's probably perfectly safe, but please >>> spell it out. It's fine to mention the motivating case, of course. >> Yep, will correct. Actually, as I look back at my notes, I believe that this >> change is not necessary after all (same code generated with and without it). >> I'll verify. > Ha, that would be nice :-) Confirmed, BTW. > >>> Testing for vsx_hw but not enabling vsx is probably wrong, too. >> Weird. I just tried adding -mvsx > Does it _need_ VSX anyway? Are these builtins defined without it, too? Yes (vector long long / V2DImode requires VSX). > >> and I get this peculiar error we've seen >> before about AMD graphics card offloading: >> >> spawn -ignore SIGHUP /home/wschmidt/gcc/build/gcc-mainline-test2/gcc/xgcc >> -B/home/wschmidt/gcc/build/gcc-mainline-test2/gcc/ >> /home/wschmidt/gcc/gcc-mainline-t\ >> est2/gcc/testsuite/gcc.target/powerpc/srad-modulo.c >> -fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers >> -fdiagnostics-color=never -O2 -mvsx -lm -o \ >> ./srad-modulo.exe^M >> ^[[01m^[[Kcc1:^[[m^[[K ^[[01;31m^[[Kerror: ^[[m^[[Kargument to >> '^[[01m^[[K-O^[[m^[[K' should be a non-negative integer, >> '^[[01m^[[Kg^[[m^[[K', '^[[01m^[[Ks^[[\ >> m^[[K' or '^[[01m^[[Kfast^[[m^[[K'^M >> compiler exited with status 1 > Huh, why do you get colour escapes at all? I thought the testsuite used > GCC_COLORS= to get rid of that nonsense. It's quite unreadable like this. Yep, that's why it took me a while to figure out what was going on. ;-) It was objecting to "-O3 -mvsx" looking like one argument because of an extra set of parens in my dg-options directive. > >> Executing on host: /home/wschmidt/gcc/build/gcc-mainline-test2/gcc/xgcc >> -B/home/wschmidt/gcc/build/gcc-mainline-test2/gcc/ offload_gcn7262.c >> -fno-diagnosti\ >> cs-show-caret -fno-diagnostics-show-line-numbers -fdiagnostics-color=never >> -foffload=amdgcn-unknown-amdhsa -S -o offload_gcn7262.s (timeout = 300) >> spawn -ignore SIGHUP /home/wschmidt/gcc/build/gcc-mainline-test2/gcc/xgcc >> -B/home/wschmidt/gcc/build/gcc-mainline-test2/gcc/ offload_gcn7262.c >> -fno-diagnostic\ >> s-show-caret -fno-diagnostics-show-line-numbers -fdiagnostics-color=never >> -foffload=amdgcn-unknown-amdhsa -S -o offload_gcn7262.s^M >> xgcc: fatal error: GCC is not configured to support amdgcn-unknown-amdhsa as >> offload target^M >> compilation terminated.^M >> compiler exited with status 1 > This is a test to see if there is offload stuff. There isn't. Which should never be run for RUNTESTFLAGS="powerpc.exp=vec-srad.c" ... something is really hosed up. > >> FAIL: gcc.target/powerpc/srad-modulo.c (test for excess errors) >> Excess errors: >> ^[[01m^[[Kcc1:^[[m^[[K ^[[01;31m^[[Kerror: ^[[m^[[Kargument to >> '^[[01m^[[K-O^[[m^[[K' should be a non-negative integer, >> '^[[01m^[[Kg^[[m^[[K', '^[[01m^[[Ks^[[\ >> m^[[K' or '^[[01m^[[Kfast^[[m^[[K' >> >> Any ideas what's causing this? I can't test with -mvsx until it's fixed. >> Do we still have a bug open on this? > I have never seen this before, not in bugzilla, not in my own testing (I do > however almost always use --disable-libgomp, it breaks way too often, > impeding all other progress). I pointed to the bugzilla in another reply -- which was "resolved" with a hack. I consider it still broken this way... I tested a revised version of the patch overnight and will submit shortly. Bill > >>> Does it need -O3, does -O2 not work? >> -O2 is fine, can change to that. > Please do. > >>> Should this testcase check expected machine code as well? >> I think we should defer that to the fix for PR89213. We aren't at >> ideal code quite yet. > Gotcha. > > > Segher >