Hi Matt,

Thanks for your comment! You are right, the sign() in Mesa is really good. I found it hard to write that in C code. Beignet also supports implementations written in the Gen IR defined in Beignet, which maps almost directly to Gen ASM. I will follow your suggestion. Thanks!
Ruiling

> -----Original Message-----
> From: Matt Turner [mailto:[email protected]]
> Sent: Friday, January 30, 2015 4:01 AM
> To: Song, Ruiling
> Cc: [email protected]
> Subject: Re: [Beignet] [PATCH] libocl: refine implementation of sign().
>
> On Wed, Jan 28, 2015 at 11:18 PM, Ruiling Song <[email protected]>
> wrote:
> > Avoid if-branching.
> >
> > Signed-off-by: Ruiling Song <[email protected]>
> > ---
> >  backend/src/libocl/tmpl/ocl_common.tmpl.cl | 16 +++++++++-------
> >  1 file changed, 9 insertions(+), 7 deletions(-)
> >
> > diff --git a/backend/src/libocl/tmpl/ocl_common.tmpl.cl b/backend/src/libocl/tmpl/ocl_common.tmpl.cl
> > index db7b0d8..77bd2d3 100644
> > --- a/backend/src/libocl/tmpl/ocl_common.tmpl.cl
> > +++ b/backend/src/libocl/tmpl/ocl_common.tmpl.cl
> > @@ -17,6 +17,7 @@
> >   */
> >  #include "ocl_common.h"
> >  #include "ocl_float.h"
> > +#include "ocl_relational.h"
> >
> >
> >  /////////////////////////////////////////////////////////////////////////////
> >  // Common Functions
> > @@ -55,11 +56,12 @@ OVERLOADABLE float smoothstep(float e0, float e1, float x) {
> >  }
> >
> >  OVERLOADABLE float sign(float x) {
> > -  if(x > 0)
> > -    return 1;
> > -  if(x < 0)
> > -    return -1;
> > -  if(x == -0.f)
> > -    return -0.f;
> > -  return 0.f;
> > +  union {float f; unsigned u;} ieee;
> > +  ieee.f = x;
> > +  unsigned k = ieee.u;
> > +  float r = (k&0x80000000) ? -1.0f : 1.0f;
> > +  // differentiate +0.0f -0.0f
> > +  float s = 0.0f * r;
> > +  s = (x == 0.0f) ? s : r;
> > +  return isnan(x) ? 0.0f : s;
> >  }
> > --
> > 1.7.10.4
>
> I don't know if the structure of Beignet allows it (I see that the
> implementation is in OpenCL C rather than hardware instructions), but Mesa
> implements sign() for GLSL in three instructions:
>
> cmp.nz.f0 null x:f 0.0:f
> and ret:ud x:ud 0x80000000:ud
> (+f0) or ret:ud ret:ud 0x3f800000:ud
>
> The AND instruction extracts the sign bit, and the predicated OR instruction
> ORs in the hex value of 1.0 if x is not zero.
>
> This gives +1.0 if x > 0.0
>            +0.0 if x == +0.0
>            -0.0 if x == -0.0
>            -1.0 if x < 0.0
>
> And since the CMP.NZ's src1 is zero, you can move the conditional mod back
> into the instruction that generated x.
>
> The CL spec says you also have to handle NaN, which this implementation
> doesn't do, but that should just be an additional two instructions, I think:
>
> <CMP for NaN> (I don't remember precisely... CMPN.U maybe?)
> (+f0) mov ret:f 0.0f
>
> I think this should be a few instructions shorter than what your code will
> compile to.
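
For anyone who wants to check the bit trick on a CPU, here is a rough C sketch of the sequence described above. It is not the Mesa or Beignet code: the function name sign_bits is made up for illustration, memcpy stands in for the :ud register reinterpretation, and isnan() from <math.h> stands in for the NaN compare whose exact Gen mnemonic is left open in the mail.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: mirrors the cmp / and / predicated-or sequence. */
static float sign_bits(float x)
{
    uint32_t bits;
    float ret;

    memcpy(&bits, &x, sizeof bits);   /* view the float as raw bits (x:ud) */

    bits &= 0x80000000u;              /* and: keep only the sign bit */
    if (x != 0.0f)                    /* cmp.nz: flag is set when x != 0 */
        bits |= 0x3f800000u;          /* (+f0) or: OR in the bits of 1.0f */

    memcpy(&ret, &bits, sizeof ret);

    /* Extra step for OpenCL: NaN passes the != 0 test above, so override it. */
    return isnan(x) ? 0.0f : ret;
}
```

For zero inputs only the sign bit survives, so +0.0f and -0.0f come back unchanged. For any other non-NaN input the result is 1.0f with the sign of x.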
