Re: [Mesa-dev] [RFC PATCH] Add GL_MESA_ieee_fp_alu_mode specification draft

2020-02-24 Thread Ian Romanick
On 2/23/20 5:57 PM, Ilia Mirkin wrote:
> ---
> 
> We talked about something like this a while back, but the end result
> was inconclusive. I added a TGSI MUL_ZERO_WINS shader property for nine.
> But it'd be nice for wine to be able to control this too.
> 
> I couldn't actually find any evidence of the discussion from 2017 or so,
> so ... let's have another one.
> 
>  docs/specs/MESA_ieee_fp_alu_mode.spec | 136 ++
>  1 file changed, 136 insertions(+)
>  create mode 100644 docs/specs/MESA_ieee_fp_alu_mode.spec
> 
> diff --git a/docs/specs/MESA_ieee_fp_alu_mode.spec b/docs/specs/MESA_ieee_fp_alu_mode.spec
> new file mode 100644
> index 000..cb274f06571
> --- /dev/null
> +++ b/docs/specs/MESA_ieee_fp_alu_mode.spec
> @@ -0,0 +1,136 @@
> +Name
> +
> +MESA_ieee_fp_alu_mode
> +
> +Name Strings
> +
> +GL_MESA_ieee_fp_alu_mode
> +
> +Contact
> +
> +Ilia Mirkin, ilia 'at' x.org
> +
> +IP Status
> +
> +No known IP issues.
> +
> +Status
> +
> +Proposed
> +
> +Version
> +
> +Number
> +
> +TBD
> +
> +Dependencies
> +
> +OpenGL 3.0 or OpenGL ES 3.0 is required.
> +
> +The extension is written against the OpenGL 3.0 and OpenGL ES 3.0
> +specifications.
> +
> +Overview
> +
> +Pre-GL3 hardware did not generally have full IEEE floating point operation
> +support. Among other things, 0 * Infinity would work out to 0, and NaNs
> +might not be generated, or otherwise be treated improperly. GL3-class and
> +later hardware introduced full IEEE FP support, including NaN, Infinity,
> +and the proper generation of these.
> +
> +Some software targeted at older hardware makes assumptions about how the
> +shader ALU works. To accommodate these, GL3-class hardware has a way to
> +change how the shader ALU behaves. There are no standards around this, and
> +different hardware has different ways of dealing with it. However, these
> +modes were designed specifically with such older software in mind.
> +
> +This extension introduces a way to configure a context to be in non-IEEE
> +ALU mode. This extension does not specify precisely what this means, as
> +each vendor has something different. Generally it means non-IEEE-compliant
> +handling of multiplication, as well as any other unspecified changes.

I think many of the other things are specified.  They're the non-IEEE
behaviors of GL_ARB_vertex_program and GL_ARB_fragment_program, and
those mimic the required behavior of early DX shader models.  There are
a bunch of cases that specify that zero is generated when IEEE would
require NaN.

If there's just a small handful of things like this, we'd probably be
better adding a couple new built-in functions to do the job.  The
problem on Intel hardware is... we really, really don't want to switch
to non-IEEE mode because it changes how a bunch of things work, and we
haven't tested any of that in many years.  I'd much rather put in some
kind of work-arounds for things that don't want multiplication or pow()
to generate NaN.
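
To make the multiplication case concrete, here is a minimal C sketch of a
"zero wins" multiply. The helper name mul_zero_wins is hypothetical (it only
echoes the TGSI MUL_ZERO_WINS property mentioned above), and returning +0
regardless of operand signs is an assumption, not something the draft
specifies.

```c
#include <math.h>

/* Hypothetical helper: legacy-style multiply where a zero operand always
 * wins, so 0 * Infinity and 0 * NaN yield 0 instead of NaN.
 * Assumption: the result is +0 regardless of the operands' signs. */
static float mul_zero_wins(float a, float b)
{
    if (a == 0.0f || b == 0.0f)
        return 0.0f;
    /* Every other case follows ordinary IEEE multiplication. */
    return a * b;
}
```

With IEEE arithmetic, 0.0f * INFINITY is NaN; mul_zero_wins(0.0f, INFINITY)
is 0, while non-zero operands behave exactly as a plain multiply.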

As for the mechanism, I'm very strongly in favor of something that would
be locked-in when the shader is compiled.  I really want to avoid any
potential that an external glEnable could trigger a recompile.

The more I think about it... having an extension that adds a handful of
built-in functions that give old shader model behavior would be a good
idea.  We could even test it. :)  I've looked at a lot of shaders, and I've
seen a lot of not-quite-what-they-wanted methods for avoiding NaN
behavior in a bunch of these functions.  Having a special version of
inversesqrt() that returns FLT_MAX for 0 would be useful to a lot of
users.  As part of the spec we could even provide canonical versions of
the functions so that users could copy-and-paste:

#ifndef GL_MESA_foo

float inversesqrt_nonIEEE(float x)
{
...
}

#endif

> +
> +New Tokens
> +
> +Accepted by the <cap> parameter of Enable, Disable, and IsEnabled, by
> +the <pname> parameter of GetBooleanv, GetIntegerv, GetFloatv, and
> +GetDoublev:
> +
> +IEEE_FP_ALU_MODE_MESA  0x
> +
> +
> +Changes to GLSL Section 4.1.4 Floats:
> +
> +Add the following paragraph:
> +
> +If the shader is being executed in a context with
> +IEEE_FP_ALU_MODE_MESA disabled, multiplication shall produce the following
> +(non-IEEE-compliant) result:
> +
> +   float a = 0;
> +   float b = Infinity;
> +   float c = a * b; // c == 0
> +
> +There may be other implications from this mode being enabled, including
> +clamping of non-finite values, or anything else the hardware mode happens
> +to enable to achieve compatibility.
> +
> +New State
> +
> +(add to table 6.52, Miscellaneous, p.392)
> +
> +   Initial
> +Get Value  Type   Get Command   Value Description   Sec.   Attribute
> +

Re: [Mesa-dev] [RFC PATCH] Add GL_MESA_ieee_fp_alu_mode specification draft

2020-02-24 Thread Ilia Mirkin
On Mon, Feb 24, 2020 at 1:10 PM Ian Romanick wrote:
>
> I think many of the other things are specified.  They're the non-IEEE
> behaviors of GL_ARB_vertex_program and GL_ARB_fragment_program, and
> those mimic the required behavior of early DX shader models.  There are
> a bunch of cases that specify that zero is generated when IEEE would
> require NaN.
>
> If there's just a small handful of things like this, we'd probably be
> better adding a couple new built-in functions to do the job.  The
> problem on Intel hardware is... we really, really don't want to switch
> to non-IEEE mode because it changes how a bunch of things work, and we
> haven't tested any of that in many years.  I'd much rather put in some
> kind of work-arounds for things that don't want multiplication or pow()
> to generate NaN.

So basically anything that ever involves multiplication needs to have
these variants. Things like dot, the various crazy ops of days past
whose names escape me but involve complex calculations, etc. Things
like pow are questionable (depends on if they get decomposed or not),
and things like rcp/rsq unquestionably produce NaN's (or Infinity,
sorry not 100% sure but easily checked) on NVIDIA irrespective of that
mode being enabled.
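
As a rough C sketch of why every multiply-based op would need a variant:
a dot product built from IEEE multiplies turns a single 0 * Infinity term
into NaN for the whole result, while one built from a "zero wins" multiply
does not. All function names here are hypothetical, and the zero-wins rule
(any zero operand gives +0) is an assumption about the legacy behavior.

```c
#include <math.h>

/* Hypothetical legacy multiply: a zero operand forces the result to 0. */
static float mul_zero_wins(float a, float b)
{
    return (a == 0.0f || b == 0.0f) ? 0.0f : a * b;
}

/* Three-component dot product using ordinary IEEE multiplication. */
static float dot3_ieee(const float a[3], const float b[3])
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

/* The same dot product with every multiply replaced by the legacy one. */
static float dot3_zero_wins(const float a[3], const float b[3])
{
    return mul_zero_wins(a[0], b[0]) +
           mul_zero_wins(a[1], b[1]) +
           mul_zero_wins(a[2], b[2]);
}
```

With a = {0, 1, 2} and b = {Infinity, 3, 4}, dot3_ieee() is NaN and
dot3_zero_wins() is 11.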

Also on Intel hardware, as you mention, the "non-ieee" mode is ...
interesting, so to allow for that, I didn't want to say anything other
than the positive cases. If you have no interest in exposing this, I
could rewrite this in a NVIDIA/AMD-friendly manner.

>
> As for the mechanism, I'm very strongly in favor of something that would
> be locked-in when the shader is compiled.  I really want to avoid any
> potential that an external glEnable could trigger a recompile.

Stefan Dösinger suggested a context flag on IRC. I'd be fine with that
too, even if I have to go create 2 exts due to GLX/EGL.

>
> The more I think about it... having an extension that adds a handful of
> built-in functions that give old shader model behavior would be a good
> idea.  We could even test it. :)  I've looked at a lot of shaders, and I've
> seen a lot of not-quite-what-they-wanted methods for avoiding NaN
> behavior in a bunch of these functions.  Having a special version of
> inversesqrt() that returns FLT_MAX for 0 would be useful to a lot of
> users.  As part of the spec we could even provide canonical versions of
> the functions so that users could copy-and-paste

That would preclude nv50-series hardware from benefiting, since i

Re: [Mesa-dev] [RFC PATCH] Add GL_MESA_ieee_fp_alu_mode specification draft

2020-02-24 Thread Matteo Bruni
On Mon, Feb 24, 2020 at 8:21 PM Ilia Mirkin wrote:
>
> On Mon, Feb 24, 2020 at 1:10 PM Ian Romanick wrote:
> >
> > I think many of the other things are specified.  They're the non-IEEE
> > behaviors of GL_ARB_vertex_program and GL_ARB_fragment_program, and
> > those mimic the required behavior of early DX shader models.  There are
> > a bunch of cases that specify that zero is generated when IEEE would
> > require NaN.
> >
> > If there's just a small handful of things like this, we'd probably be
> > better adding a couple new built-in functions to do the job.  The
> > problem on Intel hardware is... we really, really don't want to switch
> > to non-IEEE mode because it changes how a bunch of things work, and we
> > haven't tested any of that in many years.  I'd much rather put in some
> > kind of work-arounds for things that don't want multiplication or pow()
> > to generate NaN.
>
> So basically anything that ever involves multiplication needs to have
> these variants. Things like dot, the various crazy ops of days past
> whose names escape me but involve complex calculations, etc. Things
> like pow are questionable (depends on if they get decomposed or not),
> and things like rcp/rsq unquestionably produce NaN's (or Infinity,
> sorry not 100% sure but easily checked) on NVIDIA irrespective of that
> mode being enabled.
>
> Also on Intel hardware, as you mention, the "non-ieee" mode is ...
> interesting, so to allow for that, I didn't want to say anything other
> than the positive cases. If you have no interest in exposing this, I
> could rewrite this in a NVIDIA/AMD-friendly manner.
>
> >
> > As for the mechanism, I'm very strongly in favor of something that would
> > be locked-in when the shader is compiled.  I really want to avoid any
> > potential that an external glEnable could trigger a recompile.
>
> Stefan Dösinger suggested a context flag on IRC. I'd be fine with that
> too, even if I have to go create 2 exts due to GLX/EGL.
>
> >
> > The more I think about it... having an extension that adds a handful of
> > built-in functions that give old shader model behavior would be a good
> > idea.  We could even test it. :)  I've looked at a lot of shaders, and I've
> > seen a lot of not-quite-what-they-wanted methods for avoiding NaN
> > behavior in a bunc

Re: [Mesa-dev] [RFC PATCH] Add GL_MESA_ieee_fp_alu_mode specification draft

2020-02-24 Thread Ian Romanick
On 2/24/20 11:21 AM, Ilia Mirkin wrote:
> On Mon, Feb 24, 2020 at 1:10 PM Ian Romanick wrote:
>>
>> I think many of the other things are specified.  They're the non-IEEE
>> behaviors of GL_ARB_vertex_program and GL_ARB_fragment_program, and
>> those mimic the required behavior of early DX shader models.  There are
>> a bunch of cases that specify that zero is generated when IEEE would
>> require NaN.
>>
>> If there's just a small handful of things like this, we'd probably be
>> better adding a couple new built-in functions to do the job.  The
>> problem on Intel hardware is... we really, really don't want to switch
>> to non-IEEE mode because it changes how a bunch of things work, and we
>> haven't tested any of that in many years.  I'd much rather put in some
>> kind of work-arounds for things that don't want multiplication or pow()
>> to generate NaN.
> 
> So basically anything that ever involves multiplication needs to have
> these variants. Things like dot, the various crazy ops of days past
> whose names escape me but involve complex calculations, etc. Things
> like pow are questionable (depends on if they get decomposed or not),
> and things like rcp/rsq unquestionably produce NaN's (or Infinity,
> sorry not 100% sure but easily checked) on NVIDIA irrespective of that
> mode being enabled.
> 
> Also on Intel hardware, as you mention, the "non-ieee" mode is ...
> interesting, so to allow for that, I didn't want to say anything other
> than the positive cases. If you have no interest in exposing this, I
> could rewrite this in a NVIDIA/AMD-friendly manner.

I'd like to expose it because I see so many apps roll their own.  I
don't really want to use our hardware mode.  Even if I did, it's
beneficial to take a more holistic approach.  I think the whole compiler
stack wants to know that some operations behave differently so that the
optimizer can take advantage of that.  We mostly have options in the
compiler to force more NaN-correct behavior, but it would be nice to
know that the results of some operations have a different range, allowing
additional optimizations.

I've been planning to write an issue, but I haven't gotten around to it
yet.  I think the next thing to add to range tracking is
"must_be_finite" and "must_be_a_number".  For a lot of existing things we
can easily reason about these predicates.  nir_op_fsat always produces a
finite number.  nir_op_min produces a number if either source is a
number, and it produces a f

Re: [Mesa-dev] [RFC PATCH] Add GL_MESA_ieee_fp_alu_mode specification draft

2020-02-24 Thread Ilia Mirkin
On Sun, Feb 23, 2020 at 8:57 PM Ilia Mirkin wrote:
>
> ---
>
> We talked about something like this a while back, but the end result
> was inconclusive. I added a TGSI MUL_ZERO_WINS shader property for nine.
> But it'd be nice for wine to be able to control this too.
>
> I couldn't actually find any evidence of the discussion from 2017 or so,
> so ... let's have another one.

Ian, Matteo, thanks for your feedback.

Based on IRC discussion today, it looks like radeonsi is not
interested in gaining such a feature, and Ian's comments all point
away from exposing the actual hardware bits, which I think means this
ext is DOA. I don't think support in just nouveau + r600 justifies the
effort of getting this going -- too small of a user base between them,
and they can use nine if they really want better perf. I guess that's
why this ext died back in 2017 too, but at least now there will
hopefully be an easier-to-find record of it.

Cheers,

  -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC PATCH] Add GL_MESA_ieee_fp_alu_mode specification draft

2020-02-24 Thread Jacob Lifshay
See also: http://bugs.libre-riscv.org/show_bug.cgi?id=188

It might be worthwhile to consider a Vulkan extension to support this as a
translation target for DX9 as well as other old HW?

Jacob Lifshay


Re: [Mesa-dev] [RFC PATCH] Add GL_MESA_ieee_fp_alu_mode specification draft

2020-02-24 Thread Ilia Mirkin
On Mon, Feb 24, 2020 at 6:56 PM Ilia Mirkin wrote:
>
> Ian, Matteo, thanks for your feedback.
>
> Based on IRC discussion today, it looks like radeonsi is not
> interested in gaining such a feature, and Ian's comments all point
> away from exposing the actual hardware bits, which I think means this
> ext is DOA. I don't think support in just nouveau + r600 justifies the
> effort of getting this going -- too small of a user base between them,
> and they can use nine if they really want better perf. I guess that's
> why this ext died back in 2017 too, but at least now there will
> hopefully be an easier-to-find record of it.
>
> Cheers,
>
>   -ilia

Oh, and for posterity, strfllw in #winehackers tracked down the
original thread from 2017:

https://lists.freedesktop.org/archives/mesa-dev/2017-January/140613.html

Not sure how I missed it, I even looked in Jan 2017, but there we are.
Apparently part of the difficulty for radeonsi is LLVM, but perhaps
ACO could soften some of that blow.

  -ilia