[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting

Carl Love Wed, 15 May 2019 14:48:48 -0700

https://bugs.kde.org/show_bug.cgi?id=406256


--- Comment #11 from Carl Love <c...@us.ibm.com> ---
Julian:


> https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.kde.org_sho
> w-5Fbug.cgi-3Fid-3D406256&d=DwIFaQ&c=jf_iaSHvJObTbx-
> siA1ZOg&r=RFEmMkZAk
> --
> _wFGN5tkM_A&m=5o2Sjz00Tqqxz4dOXWvuGwWsbia9aURAkghSmeY0Sm0&s=1rRXfsCpJ
> Dor_oQ1SSmOvBBYZkeJgD7aT4Iz_gYMxGk&e=
> 
> --- Comment #8 from Julian Seward <jsew...@acm.org> ---
> Thanks for the respin.  I have mostly only minor comments about
> it.  Is OK to
> land provided all the comments below are addressed, except for the
> one about
> vectorising negateVF32, which would be nice to fix if you can do so
> relatively
> easily, but is not essential.
> 
> Also, when landing, please split the patch into two parts: the
> implementation
> and the tests, and land the implementation first.

Done.  I had all the changes in one patch to make it easier to move
files around to 5 different machines for testing.

> 
> 
> --- a/VEX/priv/guest_ppc_toIR.c
> +++ b/VEX/priv/guest_ppc_toIR.c
> 
> is_Zero_Vector, is_Denorm_Vector and dnormV32_adj have vectorised
> nicely.  Is
> it also possible to do negateVF32 with vectors, rather lane
> by lane as at
> present?  Note, for a vector version of is_NaN, you can see more or
> less how
> to do it by looking at isNan() in host_ppc_isel.c.
> 
Renamed negateVF32(value) to negate_Vector( size, value) to make the
naming more consistent.  Also, structured it to generalize easily to
more vector sizes.  Currently just supporting vector of F32.  This
change requires creating the new function is_NaN_Vector(size, value) as
you eluded to above.  Again, structured the function to easily extend
to more vector sizes.

> 
> +static IRExpr* dnormV32_adj ( IRExpr* src )
> 
> nit: maybe rename this to be more consistent with your other vector-
> helper
> function names (is_Zero_Vector etc)
> 

Yea, consistency is a good thing.  Renamed dnormV32_adj() to
dnorm_adj_Vector().  Note did not restructure the function to easily
extend to other vector sizes.  Left that for the future.
> 
> +   assign ( VSCR_NJ_mask, binop( Iop_64HLtoV128,
> +                                 unop( Iop_1Sto64,
> +                                       mkexpr( VSCR_NJ ) ) ,
> +                                 unop( Iop_1Sto64,
> +                                       mkexpr( VSCR_NJ ) ) ) );
> 
> nit: VSCR_NJ isn't used past this point.  Change its type to Ity_I64
> and lift
> the 1Sto64 operation into that definition, so it isn't duplicated
> here.
> 

Yea, that would be better.  Fixed.

> 
> --- a/coregrind/m_dispatch/dispatch-ppc32-linux.S
> +++ b/coregrind/m_dispatch/dispatch-ppc32-linux.S
> 
>  LafterFP2:
> +       /* set host Vector Status Control Register bit NJ to zero
> +          to ensure the host generate subnormal results for the
> +          vector floating point instructions. */
> +        mfvscr  16  /* Clear NJ bit */
> +        vspltisw 9,0x1                   /* 4x 0x00000001 */
> +        vspltisw 8,0x0                   /* zero */
> +        vsldoi   9,8,9,0x2               /* <<2*8 => 4x 0x00010000
> */
> +        vnor     9,9,9                   /* 4x 0xFFFEFFFF */
> +       vand     16,16,9                 /* Mask out NJ bit */
> +        mtvscr  16
> 
> (1) Guard these with #ifdef HAS_ALTIVEC like the other Altivec stuff
> in
> this file.  Otherwise this will fail when run on a non-Altivec
> enabled target
> (do we still support any of those)?  And the same for the other to
> assembly
> files.


> (2) (As a check) is the above sequence runnable even on the lowest
> level
> Altivec subset?  Otherwise (again) it will fail at run time.
> 
> (3) I wasn't entirely clear what the changes to the post-run
> invariant checks
> are (after label "postamble:").  IIUC, they already do check that
> VSCR[NJ].host == 0, but the comments are wrong, and you've updated
> the
> comments, but not the code?  Can you clarify/re-check?

When I went in to fix up the code per you comments, I noticed that the
code I added to set VSCR[NJ] = 0 is not doing anything. I had missed
the code a bit lower:

       /* set host AltiVec control word to the default mode expected           
           by VEX-generated code. */                                            
   ld 6,.tocent__vgPlain_machine_ppc64_has_VMX@toc(2)                           
   ld 6,0(6)                                                                    
        cmpldi  6,0                                                             
        beq     .LafterVMX2                                                     

        vspltisw 3,0x0  /* generate zero */                                     
        mtvscr  3    

which is forcing the entire VSCR register to zero.  The instruction
vtvscr moves the contents of the specified register to VSCR.  This is
actually what I had originally been looking for when I was trying to
figure out why Valgrind always generated subnormal results but natively
I wasn't getting subnormal results on BE.  This code to set VSCR to
zero exists on ppc32, ppc64le, ppc64be.  

The comment /* Check VSCR[NJ] == 1 */ really throws me, hence I was
trying to make it clear that we need the NJ bit to be zero.  The code
is checking for the bit to be set 1, if it is, then it calls the
invariant violation.  Chanaged the comment to make this clearer to the
reader what is going on.   

So, I removed my code to set VSCR[NJ] as it is overwritten anyway.  I
updated the code with comments to make it clearer where VSCR[NJ] gets
set to zero and really what is going on with the invarient check.  So,
I claim there is now no functional changes in the assembly code.

Retested on Power 6, Power 7, Power 8LE, Power 8BE, Power 9.

> 
> diff --git a/memcheck/tests/ppc32/vgcore.9687
> b/memcheck/tests/ppc32/vgcore.9687
> new file mode 100644
> 
> This should definitely not be in the patch!

Argh!! not sure how I managed that.  Fixed
> 

Given the changes to the assembly functions are not trivial, probably
best for another quick review before committing.

I have updated the the corresponding bugzilla:  Bug 406256 - PPC64,
vector floating point instructions don't handle subnormal according to
VSCR[NJ] bit setting
      https://bugs.kde.org/show_bug.cgi?id=406256

The changes to this patch require the patch for adding support to
vlogefp and vexptefp instructions to change due to the function name
changes.   

https://bugs.kde.org/show_bug.cgi?id=407340

This patch needs your OK on the name of the added Iop.

                        Carl

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 406256] PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting

Reply via email to