On Mon, Dec 12, 2011 at 8:20 AM, Jose Fonseca <[email protected]> wrote:
> ----- Original Message -----
>> On Mon, Dec 12, 2011 at 2:10 PM, Jose Fonseca <[email protected]> wrote:
>> >
>> > ----- Original Message -----
>> >> On Mon, Dec 12, 2011 at 1:24 PM, Jose Fonseca <[email protected]> wrote:
>> >> > I saw this email yesterday, but did not have time to reply.
>> >> >
>> >> > ----- Original Message -----
>> >> >> On Mon, Nov 21, 2011 at 6:52 PM, Eric Anholt <[email protected]> wrote:
>> >> >> > On Sun, 20 Nov 2011 15:08:55 +0100, Marek Olšák <[email protected]> wrote:
>> >> >> >> unpack_float_z_Z24_X8 fails with -O2 in:
>> >> >> >> - fbo-blit-d24s8
>> >> >> >> - fbo-depth-sample-compare
>> >> >> >> - fbo-readpixels-depth-formats
>> >> >> >> - glean/depthStencil
>> >> >> >>
>> >> >> >> With -O0, it works fine.
>> >> >> >>
>> >> >> >> I am removing all the assertions. There's not much point in having them, is there?
>> >> >> >
>> >> >> > Is -ffast-math involved at all?
>> >> >>
>> >> >> Yes, -fno-fast-math makes the problem go away.
>> >> >>
>> >> >> > At a guess, replacing "* scale" with "/ (float)0xffffff" makes the failure happen either way?
>> >> >>
>> >> >> Yes. Only using GLdouble fixes it.
>> >> >>
>> >> >> It makes sense. The mantissa without the sign has 23 bits, but 24 bits are required to exactly represent 0xffffff, if I am not mistaken.
>> >> >
>> >> > I'm afraid this is wrong: single-precision floating point can describe 24-bit uints (and therefore unorms) without any loss, because although the mantissa has 23 bits, it only needs to represent all but the most significant bit, i.e., there's an implicit 24th bit always set to one.
>> >> >
>> >> > The fact that -fno-fast-math makes the problem go away is yet another clear proof that the issue is _not_ lack of precision of single-precision floating point, as -fno-fast-math will still use single-precision floating-point numbers. On the 32-bit x86 architecture, -fno-fast-math ensures that all intermediate steps are done in single precision instead of the wider Intel x87 80-bit format, and I suspect the problem here is that the rounding with the 80-bit registers yields wrong results.
>> >> >
>> >> > I also disagree that using double precision is a good solution. Either fast-math serves our purposes, or it doesn't. Using double precision to work around issues with single-precision fast-math is the *worst* thing we could do.
>> >> >
>> >> > Does the assertion failure even happen on 64 bits? I doubt it, as the x64 ABI establishes the use of SSE for all floating point, and x87 80-bit floating point will never be used. So effectively this is making all archs slower.
>> >> >
>> >> > Honestly, I think the patch should be recalled. And we should consider disabling fast-math, and maybe enabling -mfpmath=sse on 32-bit x86 (i.e., requiring SSE).
>> >>
>> >> You're probably right. Feel free to revert & fix the issue in some other way.
>> >
>> > OK.
>> >
>> > Could you please confirm whether the assertion failures you saw were in 32-bit or 64-bit mode?
>>
>> 32-bit.
>
> Thanks.
>
> In that case I propose dropping fast-math, at least on 32-bit x86, as it causes problems here (the unorm24 <-> f32 conversions happen in a lot of places in Mesa's source tree) and sometimes produces worse code [1], and simply using the subset of finer-grained options that really meets our needs [2]:
>
>   -fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only,
>   -fno-rounding-math, -fno-signaling-nans and -fcx-limited-range
>
> About -mfpmath=sse on 32-bit: although I believe that depending on SSE would be alright, it looks like -mfpmath=sse is not a clear winner on 32 bits, because calls to libc still need to use x87 registers.
>
> Jose
>
> [1] http://programerror.com/2009/09/when-gccs-ffast-math-isnt/
> [2] http://gcc.gnu.org/onlinedocs/gcc-4.6.2/gcc/Optimize-Options.html
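
For reference, the claim that 24-bit values fit exactly in a float is easy to check with a small standalone program (a minimal sketch, not Mesa code): the 23-bit mantissa plus the implicit leading bit hold every 24-bit unsigned integer, and hence every 24-bit unorm such as a Z24 depth value, exactly; the first 25-bit value is the first one to be rounded.

  /* Standalone sketch (not Mesa code): show that 32-bit floats hold all
   * 24-bit unsigned integers exactly (23 mantissa bits + implicit leading
   * bit), while a 25-bit value already gets rounded.
   */
  #include <assert.h>
  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
     uint32_t z;

     /* Exact round trip for every 24-bit value. */
     for (z = 0; z <= 0xffffff; z++) {
        float f = (float) z;
        assert((uint32_t) f == z);
     }

     /* 0xffffff is exact; 0x1000001 (25 bits) is not. */
     printf("0xffffff  -> %.1f\n", (double) (float) 0xffffffu);   /* 16777215.0 */
     printf("0x1000001 -> %.1f\n", (double) (float) 0x1000001u);  /* 16777216.0 (rounded) */

     return 0;
  }

Nothing in this sketch exercises x87 intermediates, so it should pass with or without -ffast-math; the failures in this thread are about how the scaled intermediate result gets rounded, not about representability.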
Thanks for digging into this, guys.

I'm happy to drop -ffast-math (or use -fno-fast-math), but it would be interesting to do at least a few before/after comparisons just to make sure there are no surprises in performance.

In any case, don't we still need to use double precision for Z32 packing/unpacking? I ran into a failure in _mesa_pack_float_z_row() for Z32 on Saturday (debug build on x64).

I'd like to leave Marek's patch committed as-is for now until we settle on another solution.

-Brian
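
On the Z32 question: 32-bit depth values do exceed the 24 bits a float can hold, so double precision (53-bit mantissa) really is needed there. A minimal sketch of why, assuming the usual unorm32 <-> float scaling rather than quoting _mesa_pack_float_z_row() itself:

  /* Sketch (assumed scaling, not Mesa's actual code): a 32-bit unorm depth
   * value loses its low bits when scaled through a float, but survives a
   * double, whose 53-bit mantissa holds every 32-bit integer exactly.
   */
  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
     uint32_t z = 0xfffffffe;   /* arbitrary 32-bit depth value */

     /* unorm32 -> normalized [0,1] value */
     float  f = (float) z  / (float) 0xffffffffu;    /* both operands round to 2^32 */
     double d = (double) z / (double) 0xffffffffu;   /* both operands exact */

     /* back to unorm32 (final scale done in double to stay in range) */
     uint32_t from_f = (uint32_t) (f * (double) 0xffffffffu + 0.5);
     uint32_t from_d = (uint32_t) (d * (double) 0xffffffffu + 0.5);

     printf("z = 0x%08x  via float = 0x%08x  via double = 0x%08x\n",
            (unsigned) z, (unsigned) from_f, (unsigned) from_d);
     return 0;
  }

On an IEEE-754 system the float path should come back as 0xffffffff while the double path returns the original 0xfffffffe, which is consistent with keeping the double-precision path for Z32 regardless of what happens to -ffast-math.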
