On Sun, Feb 10, 2019 at 08:42:42PM -0600, Bill Schmidt wrote:
> On 2/10/19 4:05 PM, Segher Boessenkool wrote:
> > On Sun, Feb 10, 2019 at 10:10:02AM -0600, Bill Schmidt wrote:
> >> I've added executable tests for both shift-right algebraic and shift-right 
> >> logical.
> >> Both fail prior to applying the patch, and work correctly afterwards.
> > Please add a test for left shifts, as well?
> 
> Can do.  I verified that left shifts were not broken and figured a test case
> had been added then, but have not checked.  Good to test this particular
> scenario, though.

Well you actually added code for left shifts, too ;-)

> >> 2019-02-08  Bill Schmidt  <wschm...@linux.ibm.com>
> >>
> >>    * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Shift-right
> >>    and shift-left vector built-ins need to include a TRUNC_MOD_EXPR
> >>    for correct semantics.  Also, don't expand a vector-splat if there
> >>    is a type mismatch; let the back end handle it.
> > Does it always result in just the shift instruction now?  Does the modulo
> > ever remain?  (Maybe at -O0?)  Modulo is hugely expensive; it is always
> > modulo by a power of two, so a simple bitmask, so maybe write that directly
> > instead?
> 
> We always get the shift.  For -mcpu=power8, we always load the mask from
> memory rather than generating the vspltisb, which is not ideal code 
> generation,
> but is at least correct.

_Just_ the shift, I meant.  I know we load a constant from memory, for
constants; I am worried about non-constants, do those get a modulo, or
just a mask.  Also at -O0; if we get an actual division at -O0, there
probably are other cases where it isn't optimised away as well.

> For -mcpu=power9, we get close, but have some bad register allocation and
> an unnecessary extend:
> 
>         xxspltib 0,4   <- why not just xxspltib 32,4?
>         xxlor 32,0,0   <- wasted copy

Yeah, huh.  Where does that come from...  I blame splitters after reload.

>         vextsb2d 0,0   <- unnecessary due to vsrad semantics
>         vsrad 2,2,0
> 
> Again, this is at least correct.  We have more work to do to produce the
> most efficient code, but we have PR89213 open for that.

Yes.  But I'd like to see at least a mask instead of a modulo.

> >> +  /* It's sometimes useful to use one of these to build a
> >> +     splat for V2DImode, since the upper bits will be ignored.
> >> +     Don't fold if we detect that situation, as we end up
> >> +     losing the splat instruction in this case.  */
> >> +  if (size != TREE_INT_CST_LOW (TYPE_SIZE (TREE_TYPE (TREE_TYPE (lhs)))))
> >> +    return false;
> > This isn't really detecting that situation...  Instead, it doesn't fold
> > whenever the size of the splatted elements isn't the same as the size of
> > the elts of the target vector.  That's probably perfectly safe, but please
> > spell it out.  It's fine to mention the motivating case, of course.
> 
> Yep, will correct.  Actually, as I look back at my notes, I believe that this
> change is not necessary after all (same code generated with and without it).
> I'll verify.

Ha, that would be nice :-)

> > Testing for vsx_hw but not enabling vsx is probably wrong, too.
> 
> Weird.  I just tried adding -mvsx

Does it _need_ VSX anyway?  Are these builtins defined without it, too?

> and I get this peculiar error we've seen
> before about AMD graphics card offloading:
> 
> spawn -ignore SIGHUP /home/wschmidt/gcc/build/gcc-mainline-test2/gcc/xgcc 
> -B/home/wschmidt/gcc/build/gcc-mainline-test2/gcc/ 
> /home/wschmidt/gcc/gcc-mainline-t\
> est2/gcc/testsuite/gcc.target/powerpc/srad-modulo.c 
> -fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers 
> -fdiagnostics-color=never -O2 -mvsx -lm -o \
> ./srad-modulo.exe^M
> ^[[01m^[[Kcc1:^[[m^[[K ^[[01;31m^[[Kerror: ^[[m^[[Kargument to 
> '^[[01m^[[K-O^[[m^[[K' should be a non-negative integer, 
> '^[[01m^[[Kg^[[m^[[K', '^[[01m^[[Ks^[[\
> m^[[K' or '^[[01m^[[Kfast^[[m^[[K'^M
> compiler exited with status 1

Huh, why do you get colour escapes at all?  I thought the testsuite used
GCC_COLORS= to get rid of that nonsense.  It's quite unreadable like this.

> Executing on host: /home/wschmidt/gcc/build/gcc-mainline-test2/gcc/xgcc 
> -B/home/wschmidt/gcc/build/gcc-mainline-test2/gcc/ offload_gcn7262.c    
> -fno-diagnosti\
> cs-show-caret -fno-diagnostics-show-line-numbers -fdiagnostics-color=never  
> -foffload=amdgcn-unknown-amdhsa -S -o offload_gcn7262.s    (timeout = 300)
> spawn -ignore SIGHUP /home/wschmidt/gcc/build/gcc-mainline-test2/gcc/xgcc 
> -B/home/wschmidt/gcc/build/gcc-mainline-test2/gcc/ offload_gcn7262.c 
> -fno-diagnostic\
> s-show-caret -fno-diagnostics-show-line-numbers -fdiagnostics-color=never 
> -foffload=amdgcn-unknown-amdhsa -S -o offload_gcn7262.s^M
> xgcc: fatal error: GCC is not configured to support amdgcn-unknown-amdhsa as 
> offload target^M
> compilation terminated.^M
> compiler exited with status 1

This is a test to see if there is offload stuff.  There isn't.

> FAIL: gcc.target/powerpc/srad-modulo.c (test for excess errors)
> Excess errors:
> ^[[01m^[[Kcc1:^[[m^[[K ^[[01;31m^[[Kerror: ^[[m^[[Kargument to 
> '^[[01m^[[K-O^[[m^[[K' should be a non-negative integer, 
> '^[[01m^[[Kg^[[m^[[K', '^[[01m^[[Ks^[[\
> m^[[K' or '^[[01m^[[Kfast^[[m^[[K'
> 
> Any ideas what's causing this?  I can't test with -mvsx until it's fixed.
> Do we still have a bug open on this?

I have never seen this before, not in bugzilla, not in my own testing (I do
however almost always use --disable-libgomp, it breaks way too often,
impeding all other progress).

> > Does it need -O3, does -O2 not work?
> 
> -O2 is fine, can change to that.

Please do.

> > Should this testcase check expected machine code as well?
> 
> I think we should defer that to the fix for PR89213.  We aren't at
> ideal code quite yet.

Gotcha.


Segher

Reply via email to