Re: [PATCH] rs6000: Vector shift-right should honor modulo semantics

Bill Schmidt Mon, 11 Feb 2019 05:17:42 -0800

On 2/11/19 6:43 AM, Segher Boessenkool wrote:
> On Sun, Feb 10, 2019 at 08:42:42PM -0600, Bill Schmidt wrote:
>> On 2/10/19 4:05 PM, Segher Boessenkool wrote:
>>> On Sun, Feb 10, 2019 at 10:10:02AM -0600, Bill Schmidt wrote:
>>>> I've added executable tests for both shift-right algebraic and shift-right 
>>>> logical.
>>>> Both fail prior to applying the patch, and work correctly afterwards.
>>> Please add a test for left shifts, as well?
>> Can do.  I verified that left shifts were not broken and figured a test case
>> had been added then, but have not checked.  Good to test this particular
>> scenario, though.
> Well you actually added code for left shifts, too ;-)


Nope. :-)  It just looks like it because of the way the diff hunks come out.
The changes are to shift-right-logical and shift-right-algebraic.  But it
still doesn't hurt to add the test.

>
>>>> 2019-02-08  Bill Schmidt  <wschm...@linux.ibm.com>
>>>>
>>>>    * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Shift-right
>>>>    and shift-left vector built-ins need to include a TRUNC_MOD_EXPR
>>>>    for correct semantics.  Also, don't expand a vector-splat if there
>>>>    is a type mismatch; let the back end handle it.
>>> Does it always result in just the shift instruction now?  Does the modulo
>>> ever remain?  (Maybe at -O0?)  Modulo is hugely expensive; it is always
>>> modulo by a power of two, so a simple bitmask, so maybe write that directly
>>> instead?
>> We always get the shift.  For -mcpu=power8, we always load the mask from
>> memory rather than generating the vspltisb, which is not ideal code 
>> generation,
>> but is at least correct.
> _Just_ the shift, I meant.  I know we load a constant from memory, for
> constants; I am worried about non-constants, do those get a modulo, or
> just a mask.  Also at -O0; if we get an actual division at -O0, there
> probably are other cases where it isn't optimised away as well.

Even at -O1 this is all folded away in GIMPLE:

test_sradi_4 (vi64_t a)
{
  vi64_t result;

  <bb 2> [local count: 1073741824]:
  result_3 = a_2(D) >> 4;
  return result_3;

}

At -O0 (if I hand-inline everything myself to avoid errors), we scalarize
the modulo/masking operation into a rldicl for each doubleword.  I really
don't see any reason to change the code.

>
>> For -mcpu=power9, we get close, but have some bad register allocation and
>> an unnecessary extend:
>>
>>         xxspltib 0,4   <- why not just xxspltib 32,4?
>>         xxlor 32,0,0   <- wasted copy
> Yeah, huh.  Where does that come from...  I blame splitters after reload.

This only happens at -O2 and up, FWIW.  At -O1 we allocate the registers
reasonably.

>
>>         vextsb2d 0,0   <- unnecessary due to vsrad semantics
>>         vsrad 2,2,0
>>
>> Again, this is at least correct.  We have more work to do to produce the
>> most efficient code, but we have PR89213 open for that.
> Yes.  But I'd like to see at least a mask instead of a modulo.

Again, there is no modulo by the time we exit GIMPLE.  Our efficiency
failings are all in the back end.

>
>>>> +  /* It's sometimes useful to use one of these to build a
>>>> +     splat for V2DImode, since the upper bits will be ignored.
>>>> +     Don't fold if we detect that situation, as we end up
>>>> +     losing the splat instruction in this case.  */
>>>> +  if (size != TREE_INT_CST_LOW (TYPE_SIZE (TREE_TYPE (TREE_TYPE (lhs)))))
>>>> +    return false;
>>> This isn't really detecting that situation...  Instead, it doesn't fold
>>> whenever the size of the splatted elements isn't the same as the size of
>>> the elts of the target vector.  That's probably perfectly safe, but please
>>> spell it out.  It's fine to mention the motivating case, of course.
>> Yep, will correct.  Actually, as I look back at my notes, I believe that this
>> change is not necessary after all (same code generated with and without it).
>> I'll verify.
> Ha, that would be nice :-)

Confirmed, BTW.

>
>>> Testing for vsx_hw but not enabling vsx is probably wrong, too.
>> Weird.  I just tried adding -mvsx
> Does it _need_ VSX anyway?  Are these builtins defined without it, too?

Yes (vector long long / V2DImode requires VSX).

>
>> and I get this peculiar error we've seen
>> before about AMD graphics card offloading:
>>
>> spawn -ignore SIGHUP /home/wschmidt/gcc/build/gcc-mainline-test2/gcc/xgcc 
>> -B/home/wschmidt/gcc/build/gcc-mainline-test2/gcc/ 
>> /home/wschmidt/gcc/gcc-mainline-t\
>> est2/gcc/testsuite/gcc.target/powerpc/srad-modulo.c 
>> -fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers 
>> -fdiagnostics-color=never -O2 -mvsx -lm -o \
>> ./srad-modulo.exe^M
>> ^[[01m^[[Kcc1:^[[m^[[K ^[[01;31m^[[Kerror: ^[[m^[[Kargument to 
>> '^[[01m^[[K-O^[[m^[[K' should be a non-negative integer, 
>> '^[[01m^[[Kg^[[m^[[K', '^[[01m^[[Ks^[[\
>> m^[[K' or '^[[01m^[[Kfast^[[m^[[K'^M
>> compiler exited with status 1
> Huh, why do you get colour escapes at all?  I thought the testsuite used
> GCC_COLORS= to get rid of that nonsense.  It's quite unreadable like this.

Yep, that's why it took me a while to figure out what was going on. ;-)  It was
objecting to "-O3 -mvsx" looking like one argument because of an extra set of
parens in my dg-options directive.

>
>> Executing on host: /home/wschmidt/gcc/build/gcc-mainline-test2/gcc/xgcc 
>> -B/home/wschmidt/gcc/build/gcc-mainline-test2/gcc/ offload_gcn7262.c    
>> -fno-diagnosti\
>> cs-show-caret -fno-diagnostics-show-line-numbers -fdiagnostics-color=never  
>> -foffload=amdgcn-unknown-amdhsa -S -o offload_gcn7262.s    (timeout = 300)
>> spawn -ignore SIGHUP /home/wschmidt/gcc/build/gcc-mainline-test2/gcc/xgcc 
>> -B/home/wschmidt/gcc/build/gcc-mainline-test2/gcc/ offload_gcn7262.c 
>> -fno-diagnostic\
>> s-show-caret -fno-diagnostics-show-line-numbers -fdiagnostics-color=never 
>> -foffload=amdgcn-unknown-amdhsa -S -o offload_gcn7262.s^M
>> xgcc: fatal error: GCC is not configured to support amdgcn-unknown-amdhsa as 
>> offload target^M
>> compilation terminated.^M
>> compiler exited with status 1
> This is a test to see if there is offload stuff.  There isn't.

Which should never be run for RUNTESTFLAGS="powerpc.exp=vec-srad.c" ... 
something
is really hosed up.

>
>> FAIL: gcc.target/powerpc/srad-modulo.c (test for excess errors)
>> Excess errors:
>> ^[[01m^[[Kcc1:^[[m^[[K ^[[01;31m^[[Kerror: ^[[m^[[Kargument to 
>> '^[[01m^[[K-O^[[m^[[K' should be a non-negative integer, 
>> '^[[01m^[[Kg^[[m^[[K', '^[[01m^[[Ks^[[\
>> m^[[K' or '^[[01m^[[Kfast^[[m^[[K'
>>
>> Any ideas what's causing this?  I can't test with -mvsx until it's fixed.
>> Do we still have a bug open on this?
> I have never seen this before, not in bugzilla, not in my own testing (I do
> however almost always use --disable-libgomp, it breaks way too often,
> impeding all other progress).

I pointed to the bugzilla in another reply -- which was "resolved" with a hack.
I consider it still broken this way...

I tested a revised version of the patch overnight and will submit shortly.

Bill

>
>>> Does it need -O3, does -O2 not work?
>> -O2 is fine, can change to that.
> Please do.
>
>>> Should this testcase check expected machine code as well?
>> I think we should defer that to the fix for PR89213.  We aren't at
>> ideal code quite yet.
> Gotcha.
>
>
> Segher
>

Re: [PATCH] rs6000: Vector shift-right should honor modulo semantics

Reply via email to