Brian,
On 2002.04.12 20:14 Brian Paul wrote:
> ...
>
> I'd like to see Mesa satisfy the 255*255=255 identity. Is it hard to
> implement that in the MMX code? If it is, we could let it go for now
> and see if anyone complains.
>
I guess you haven't received my previous email yet. This is already
satisfied by the MMX code, which (as it is now) _always_ gives the exact
results for 8 bit, including in these extreme cases.
But your comments more or less answer my previous question on this as
well.
> ...
>
> It's been at least a year since I touched that code. As far as I can
> remember the comments are correct. Though I don't remember if it was
> an issue at 5/6/5 or 8/8/8 color depth, or both. I don't know what
> else might have changed since then to cause different results with
> Glean.
>
It's an issue just with 8/8/8 color depth.
>
> > Thanks for all your good work, by the way!
>
> Yes!
>
> -Brian
>
So I guess it's probably best to leave the code as it is now... but wait!
What if we do:
t/255 ~= (t + (t >> 8) + (t >> 15)) >> 8
This gives 255 for t = 255*255 = 65025, since (65025 + 254 + 1) >> 8 =
65280 >> 8 = 255.
I made some further enquiries:
- it also needs 16-bit arithmetic only.
- it fails to give the exact result in just 4,241,987 out of 16,777,216
possible cases, i.e., it is exact about 75% of the time.
- it is very easy to code, and in fact already done for the MMX code (see the
initial patch attached).
- it also gives a 6% speedup: my benchmark goes from the previous 3.637088
sec to 3.429032 sec. There should be a little more once I optimize the
assembly code further, since the absence of rounding frees some registers.
- and glean likes it, since it gives an error of just 0.522796 !!
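
For anyone who wants to check the arithmetic outside of the assembly, here is a minimal C sketch of the approximation above. The function names (div255_approx, div255_round, count_mismatches) are hypothetical and not taken from Mesa, and the sweep only covers products t = a*b of two 8-bit values, so it is not meant to reproduce the case count quoted above.

```c
#include <stdint.h>

/* Hypothetical helper: t/255 ~= (t + (t >> 8) + (t >> 15)) >> 8.
 * For t <= 255*255 = 65025 the sum peaks at 65280, so it still
 * fits in 16 bits. */
static uint16_t div255_approx(uint16_t t)
{
    uint16_t s = (uint16_t)(t + (t >> 8) + (t >> 15));
    return (uint16_t)(s >> 8);
}

/* Reference: t/255 rounded to nearest, done in wider arithmetic. */
static uint16_t div255_round(uint16_t t)
{
    return (uint16_t)(((uint32_t)t * 2u + 255u) / 510u);
}

/* Count the products t = a*b (a, b in 0..255) for which the
 * approximation differs from the rounded reference. */
static unsigned count_mismatches(void)
{
    unsigned n = 0;
    for (unsigned a = 0; a < 256; a++) {
        for (unsigned b = 0; b < 256; b++) {
            uint16_t t = (uint16_t)(a * b);
            if (div255_approx(t) != div255_round(t))
                n++;
        }
    }
    return n;
}
```

Calling div255_approx(255 * 255) returns 255, which is the identity being discussed, while small inputs such as t = 255 come out one lower than the rounded reference; that is where the inexact cases come from.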
I guess we've got ourselves a new method: Fonseca's method!! He!He!
Now seriously: I would appreciate comments on this, as I don't much believe
in wonderful discoveries...
José Fonseca
PS: In case you're wondering whether all these details really matter so
much, most of the stuff here will apply to the C code optimizations as
well. And don't forget that when 64-bit processors get into our homes
we can do what is being done now in the MMX code directly in the C code! Of
course, most of the readers have 3D cards that do this much faster &
better... but there's no fun in that! ;-)
PPS: I'm renaming the subjects because I've noticed that my threads have
the nasty habit of going on and on forever, and I want people to still
read my emails! ;-)
Index: mmx_blend.S
===================================================================
RCS file: /cvsroot/mesa3d/Mesa/src/X86/mmx_blend.S,v
retrieving revision 1.8
diff -u -r1.8 mmx_blend.S
--- mmx_blend.S 10 Apr 2002 16:32:32 -0000 1.8
+++ mmx_blend.S 12 Apr 2002 20:48:55 -0000
@@ -39,7 +39,15 @@
*
* achieving the exact results
*/
-#define GMBT_ROUNDOFF 1
+#define GMBT_ROUNDOFF 0
+
+/* instead of the roundoff this adds a small correction to satisfy the OpenGL criteria
+ *
+ * t/255 ~= (t + (t >> 8) + (t >> 15)) >> 8
+ *
+ * note that although is faster than rounding off it doesn't give always the exact results
+ */
+#define GMBT_GEOMETRIC_CORRECTION 1
/*
* do
@@ -282,6 +290,14 @@
PADDW ( MM3, MM2 ) /* t1 + (t1 >> 8) ~= (t1/255) << 8 */
PADDW ( MM5, MM6 ) /* t2 + (t2 >> 8) ~= (t2/255) << 8 */
+
+#if GMBT_GEOMETRIC_CORRECTION
+ PSRLW ( CONST(7), MM3 ) /* t1 >> 15 */
+ PSRLW ( CONST(7), MM5 ) /* t2 >> 15 */
+
+ PADDW ( MM3, MM2 ) /* t1 + (t1 >> 8) + (t1 >>15) ~= (t1/255) << 8 */
+ PADDW ( MM5, MM6 ) /* t2 + (t2 >> 8) + (t2 >>15) ~= (t2/255) << 8 */
+#endif
#endif
#if GMBT_SIGNED_ARITHMETIC