On Thu, Aug 14, 2014 at 1:30 PM, Ian Romanick <[email protected]> wrote:
> On 08/13/2014 11:58 PM, Matt Turner wrote:
>> On Wed, Aug 13, 2014 at 9:52 PM, Ilia Mirkin <[email protected]> wrote:
>>> I left all the variants as separate operations in the glsl ir. However
>>> for gallium I only added the fine version, as it seems like DDX can do
>>> pretty much whatever it wants. I was on the fence about adding coarse
>>> versions as well and then using the FragmentShaderDerivative hint to
>>> select one or the other in the glsl -> tgsi conversion.
>>>
>>> In the case of nv50/nvc0, doing the fine version is pretty much the only
>>> (easy) way of doing derivatives. I haven't traced the blob to see how it
>>> handles things yet. In any case, on nv50/nvc0 all this is completely
>>> moot, at least for now. Curious about what the situation with other
>>> hardware is.
>>
>> i965 already implements coarse and fine derivatives, selectable by the
>> derivatives hint, coarse default.
>
> I don't think that's the same thing. The "fine" derivatives in i965
> definitely do not meet this requirement:
>
> "...second-order fine derivatives, e.g., dFdxFine(dFdxFine(x))
> will properly reflect the difference between the independent
> fine derivatives computed within the 2x2 square."
>
> As it is now, dFdxFine(dFdxFine(x*x*x))) will always be zero in the i965
> driver. Two pixels on the same line will have different dFdy, but the
> dFdx will be the same. Right?

I sent a question about this to the list earlier (with no response
other than my own), but I believe the dFdxFine(dFdxFine(x)) example to
be a typo in the spec. Look at Issue 2, which explicitly talks about
dFdxFine(dFdyFine(...)). There's no way to get a second-order
derivative in a single variable from only 2 points, so that would need
a block larger than 2x2.

>
> Is there a piglit test for that specific part? (I haven't looked at the
> piglit list at all.)

There's a piglit test for dFdxFine(dFdyFine()).

>
>> The calculation of the derivative itself isn't faster for coarse
>> derivatives, but it was discovered that if all of the samples of a
>> sample_d are from the same LOD, it's a bunch faster on Haswell at
>> least. See commit 848c0e72. And with coarse derivatives they are.
>>
>> Maybe other hardware has similar optimizations?
>>
>>> Also, the extension spec claims to require GLSL 4.00, which seems a
>>> little extreme. Instead I restrict it to core contexts. Let me know if
>>> I should change this.
>>
>> Making it core-only doesn't help, nor does it satisfy the GLSL >= 4.0
>> requirement in the spec. I'm not sure if we have a way to arbitrarily
>> limit an extension to being exposed under certain GLSL versions... ?
>> _______________________________________________
>> mesa-dev mailing list
>> [email protected]
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
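
To make the 2x2 argument concrete, here is a rough toy model in Python of
how derivatives behave inside a single quad. It is only a sketch, not
driver code: the helper names (sample, dfdx_fine, dfdy_fine, dfdy_coarse)
are made up for illustration, and having the coarse variant take its
difference from the left column is an assumption, since an implementation
is free to pick which pixels it uses.

```python
# Toy model of derivatives within one 2x2 pixel quad (illustrative only).
# quad[row][col] holds the value of some expression at each pixel.

def sample(f, x0, y0):
    """Evaluate f(x, y) at the four pixels of the quad whose corner is (x0, y0)."""
    return [[f(x0 + col, y0 + row) for col in range(2)] for row in range(2)]

def dfdx_fine(quad):
    # One horizontal difference per row, shared by both pixels in that row.
    return [[row[1] - row[0]] * 2 for row in quad]

def dfdy_fine(quad):
    # One vertical difference per column, shared by both pixels in that column.
    diffs = [quad[1][col] - quad[0][col] for col in range(2)]
    return [list(diffs), list(diffs)]

def dfdy_coarse(quad):
    # Assumption: a single vertical difference (left column) for the whole quad.
    d = quad[1][0] - quad[0][0]
    return [[d, d], [d, d]]

cubic = sample(lambda x, y: x ** 3, 3, 5)
xy = sample(lambda x, y: x * y, 3, 5)

# Same-variable second derivative: both pixels in a row already share one
# dFdx value, so differencing again always yields zero.
print(dfdx_fine(dfdx_fine(cubic)))   # [[0, 0], [0, 0]]

# Mixed second derivative: fine dFdy differs per column, so dFdx of it
# can be non-zero within the quad.
print(dfdx_fine(dfdy_fine(xy)))      # [[1, 1], [1, 1]]

# With a coarse inner derivative the mixed term collapses to zero too.
print(dfdx_fine(dfdy_coarse(xy)))    # [[0, 0], [0, 0]]
```

In this model only the mixed case, dFdxFine(dFdyFine(...)), can carry
information beyond first order, which fits the reading that Issue 2 is
what the spec language was meant to describe.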
