https://bugs.kde.org/show_bug.cgi?id=406256
--- Comment #11 from Carl Love <c...@us.ibm.com> --- Julian: > https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.kde.org_sho > w-5Fbug.cgi-3Fid-3D406256&d=DwIFaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=RFEmMkZAk > -- > _wFGN5tkM_A&m=5o2Sjz00Tqqxz4dOXWvuGwWsbia9aURAkghSmeY0Sm0&s=1rRXfsCpJ > Dor_oQ1SSmOvBBYZkeJgD7aT4Iz_gYMxGk&e= > > --- Comment #8 from Julian Seward <jsew...@acm.org> --- > Thanks for the respin. I have mostly only minor comments about > it. Is OK to > land provided all the comments below are addressed, except for the > one about > vectorising negateVF32, which would be nice to fix if you can do so > relatively > easily, but is not essential. > > Also, when landing, please split the patch into two parts: the > implementation > and the tests, and land the implementation first. Done. I had all the changes in one patch to make it easier to move files around to 5 different machines for testing. > > > --- a/VEX/priv/guest_ppc_toIR.c > +++ b/VEX/priv/guest_ppc_toIR.c > > is_Zero_Vector, is_Denorm_Vector and dnormV32_adj have vectorised > nicely. Is > it also possible to do negateVF32 with vectors, rather lane > by lane as at > present? Note, for a vector version of is_NaN, you can see more or > less how > to do it by looking at isNan() in host_ppc_isel.c. > Renamed negateVF32(value) to negate_Vector( size, value) to make the naming more consistent. Also, structured it to generalize easily to more vector sizes. Currently just supporting vector of F32. This change requires creating the new function is_NaN_Vector(size, value) as you eluded to above. Again, structured the function to easily extend to more vector sizes. > > +static IRExpr* dnormV32_adj ( IRExpr* src ) > > nit: maybe rename this to be more consistent with your other vector- > helper > function names (is_Zero_Vector etc) > Yea, consistency is a good thing. Renamed dnormV32_adj() to dnorm_adj_Vector(). Note did not restructure the function to easily extend to other vector sizes. Left that for the future. > > + assign ( VSCR_NJ_mask, binop( Iop_64HLtoV128, > + unop( Iop_1Sto64, > + mkexpr( VSCR_NJ ) ) , > + unop( Iop_1Sto64, > + mkexpr( VSCR_NJ ) ) ) ); > > nit: VSCR_NJ isn't used past this point. Change its type to Ity_I64 > and lift > the 1Sto64 operation into that definition, so it isn't duplicated > here. > Yea, that would be better. Fixed. > > --- a/coregrind/m_dispatch/dispatch-ppc32-linux.S > +++ b/coregrind/m_dispatch/dispatch-ppc32-linux.S > > LafterFP2: > + /* set host Vector Status Control Register bit NJ to zero > + to ensure the host generate subnormal results for the > + vector floating point instructions. */ > + mfvscr 16 /* Clear NJ bit */ > + vspltisw 9,0x1 /* 4x 0x00000001 */ > + vspltisw 8,0x0 /* zero */ > + vsldoi 9,8,9,0x2 /* <<2*8 => 4x 0x00010000 > */ > + vnor 9,9,9 /* 4x 0xFFFEFFFF */ > + vand 16,16,9 /* Mask out NJ bit */ > + mtvscr 16 > > (1) Guard these with #ifdef HAS_ALTIVEC like the other Altivec stuff > in > this file. Otherwise this will fail when run on a non-Altivec > enabled target > (do we still support any of those)? And the same for the other to > assembly > files. > (2) (As a check) is the above sequence runnable even on the lowest > level > Altivec subset? Otherwise (again) it will fail at run time. > > (3) I wasn't entirely clear what the changes to the post-run > invariant checks > are (after label "postamble:"). IIUC, they already do check that > VSCR[NJ].host == 0, but the comments are wrong, and you've updated > the > comments, but not the code? Can you clarify/re-check? When I went in to fix up the code per you comments, I noticed that the code I added to set VSCR[NJ] = 0 is not doing anything. I had missed the code a bit lower: /* set host AltiVec control word to the default mode expected by VEX-generated code. */ ld 6,.tocent__vgPlain_machine_ppc64_has_VMX@toc(2) ld 6,0(6) cmpldi 6,0 beq .LafterVMX2 vspltisw 3,0x0 /* generate zero */ mtvscr 3 which is forcing the entire VSCR register to zero. The instruction vtvscr moves the contents of the specified register to VSCR. This is actually what I had originally been looking for when I was trying to figure out why Valgrind always generated subnormal results but natively I wasn't getting subnormal results on BE. This code to set VSCR to zero exists on ppc32, ppc64le, ppc64be. The comment /* Check VSCR[NJ] == 1 */ really throws me, hence I was trying to make it clear that we need the NJ bit to be zero. The code is checking for the bit to be set 1, if it is, then it calls the invariant violation. Chanaged the comment to make this clearer to the reader what is going on. So, I removed my code to set VSCR[NJ] as it is overwritten anyway. I updated the code with comments to make it clearer where VSCR[NJ] gets set to zero and really what is going on with the invarient check. So, I claim there is now no functional changes in the assembly code. Retested on Power 6, Power 7, Power 8LE, Power 8BE, Power 9. > > diff --git a/memcheck/tests/ppc32/vgcore.9687 > b/memcheck/tests/ppc32/vgcore.9687 > new file mode 100644 > > This should definitely not be in the patch! Argh!! not sure how I managed that. Fixed > Given the changes to the assembly functions are not trivial, probably best for another quick review before committing. I have updated the the corresponding bugzilla: Bug 406256 - PPC64, vector floating point instructions don't handle subnormal according to VSCR[NJ] bit setting https://bugs.kde.org/show_bug.cgi?id=406256 The changes to this patch require the patch for adding support to vlogefp and vexptefp instructions to change due to the function name changes. https://bugs.kde.org/show_bug.cgi?id=407340 This patch needs your OK on the name of the added Iop. Carl -- You are receiving this mail because: You are watching all bug changes.