On Sun, 2017-01-22 at 00:20 -0800, Francisco Jerez wrote: > "Juan A. Suarez Romero" <[email protected]> writes: > > > Rewrite atan2(y,x) to cover (+/-)INF values. > > > > This fixes several test cases in Vulkan CTS > > (dEQP-VK.glsl.builtin.precision.atan2.*) > > > > v2: do not flush denorms to 0 (jasuarez) > > --- > > src/compiler/spirv/vtn_glsl450.c | 48 > > +++++++++++++++++++++++++++++++++++----- > > 1 file changed, 42 insertions(+), 6 deletions(-) > > > > diff --git a/src/compiler/spirv/vtn_glsl450.c > > b/src/compiler/spirv/vtn_glsl450.c > > index 0d32fddbef..d52a22c0c3 100644 > > --- a/src/compiler/spirv/vtn_glsl450.c > > +++ b/src/compiler/spirv/vtn_glsl450.c > > @@ -299,18 +299,47 @@ build_atan(nir_builder *b, nir_ssa_def *y_over_x) > > return nir_fmul(b, tmp, nir_fsign(b, y_over_x)); > > } > > > > +/* > > + * Computes atan2(y,x) > > + */ > > static nir_ssa_def * > > build_atan2(nir_builder *b, nir_ssa_def *y, nir_ssa_def *x) > > { > > nir_ssa_def *zero = nir_imm_float(b, 0.0f); > > - > > - /* If |x| >= 1.0e-8 * |y|: */ > > - nir_ssa_def *condition = > > - nir_fge(b, nir_fabs(b, x), > > - nir_fmul(b, nir_imm_float(b, 1.0e-8f), nir_fabs(b, y))); > > + nir_ssa_def *inf = nir_imm_float(b, INFINITY); > > + nir_ssa_def *minus_inf = nir_imm_float(b, -INFINITY); > > + nir_ssa_def *m_3_pi_4 = nir_fmul(b, nir_imm_float(b, 3.0f), > > + nir_imm_float(b, M_PI_4f)); > > + > > + /* if y == +-INF */ > > + nir_ssa_def *y_is_inf = nir_feq(b, nir_fabs(b, y), inf); > > + > > + /* if x == +-INF */ > > + nir_ssa_def *x_is_inf = nir_feq(b, nir_fabs(b, x), inf); > > + > > + /* Case: y is +-INF */ > > + nir_ssa_def *y_is_inf_then = > > + nir_fmul(b, nir_fsign(b, y), > > + nir_bcsel(b, nir_feq(b, x, inf), > > + nir_imm_float(b, M_PI_4f), > > + nir_bcsel(b, nir_feq(b, x, minus_inf), > > + m_3_pi_4, > > + nir_imm_float(b, M_PI_2f)))); > > + > > + /* Case: x is +-INF */ > > + nir_ssa_def *x_is_inf_then = > > + nir_fmul(b, nir_fsign(b, y), > > + nir_bcsel(b, nir_feq(b, x, inf), > > + zero, > > + nir_imm_float(b, M_PIf))); > > + > > I don't think we need all these special cases. The majority of the > infinity/zero handling rules required by IEEE are fairly natural and > would be taken care of without any additional effort by the > floating-point division operation and single-argument atan function > below if they propagated infinities and zeroes according to IEEE rules. > > I had a look at the test results myself and noticed that the failures > are for the most part due to a precision problem in the current > implementation that doesn't only affect infinity -- Relative precision > also explodes as x grows above certain point, infinities just make the > problem catastrophic and cause it to return NaN instead of the > expected finite value. The reason for the precision problem is that > fdiv is later on lowered into an fmul+frcp sequence, and the latter may > flush the result to zero if the denominator was so huge that its > reciprocal would be denormalized. If the numerator happened to be > infinite you may end up with ∞/huge = NaN for the same reason. >
Right. For this case I'd submitted a patch to the test itself, that roughly speaking assumes any result as possible if denominator is big enough. https://gerrit.khronos.org/#/c/524/ I understand with your alternative proposal you would also handle this case correctly, making the CTS change not required, right? > On top of that there seem to be other issues with the current atan2 > implementation: > > - It doesn't handle zeros correctly. This may be related to your > observation that denorm arguments cause it to give bogus results, but > the problem doesn't seem to be related to denorms in particular, but > to the fact that denorms can get flushed to -0 which is in turn > handled incorrectly. The reason is that the existing code uses 'y >= > 0' to determine on which side of the branch cut we are, but that > causes the discontinuity to end up along the y=-epsilon line instead > of along the y=0 line as IEEE requires -- IOW, with the current > implementation very small negative y values behave as if they were > positive which causes the result to have a large absolute error of > 2π. > > - It doesn't give IEEE-compliant results when both arguments are > simultaneously infinite. This is not surprising given that IEEE > defining atan2(∞, ∞) = π/4 is fairly artificial (as are the other > rules for combinations of positive or negative infinity), strictly > speaking taking the limit along any direction other than the diagonal > would be as right or wrong. To make the matter worse IEEE disagrees > with itself on the direction limits are taken when it goes on and > defines e.g. atan2(+0, -0) = π taken along the horizontal. Luckily > GLSL specifically allows implementations to deviate from IEEE rules > in a neighborhood of zero ("Results are undefined if x and y are both > 0."), so we don't need to care about that one. > > Except for the last point, these issues seem serious enough to be worth > fixing -- I'll reply to this thread with an alternative implementation > of atan2 that addresses the first two issues (and actually uses less > instructions than the current implementation), plus another, optional > patch that addresses the third issue in order to make the CTS tests > happy (we can implement the whole oddball atan2(±∞, ±∞) corner cases as > IEEE says with only three additional Gen instructions by being bit > smart, but one could argue that Khronos should just make atan2(±∞, ±∞) > undefined since they're already deviating from IEEE-compliant behavior > at (±0, ±0)). > > P.S.: Been waiting for hours to get jenkins results and it looks like > the CI is busted, will send the patches once I get positive > results -- Likely not today ;). > > > + /* If x > 0 */ > > + nir_ssa_def *x_not_zero = > > + nir_fne(b, x, zero); > > > > /* Then...call atan(y/x) and fix it up: */ > > nir_ssa_def *atan1 = build_atan(b, nir_fdiv(b, y, x)); > > + > > nir_ssa_def *r_then = > > nir_bcsel(b, nir_flt(b, x, zero), > > nir_fadd(b, atan1, > > @@ -323,7 +352,14 @@ build_atan2(nir_builder *b, nir_ssa_def *y, > > nir_ssa_def *x) > > nir_ssa_def *r_else = > > nir_fmul(b, nir_fsign(b, y), nir_imm_float(b, M_PI_2f)); > > > > - return nir_bcsel(b, condition, r_then, r_else); > > + /* Everything together */ > > + return nir_bcsel(b, y_is_inf, > > + y_is_inf_then, > > + nir_bcsel(b, x_is_inf, > > + x_is_inf_then, > > + nir_bcsel(b, x_not_zero, > > + r_then, > > + r_else))); > > } > > > > static nir_ssa_def * > > -- > > 2.11.0 > > > > _______________________________________________ > > mesa-dev mailing list > > [email protected] > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev _______________________________________________ mesa-dev mailing list [email protected] https://lists.freedesktop.org/mailman/listinfo/mesa-dev
