Re: [Numpy-discussion] Floating point precision expectations in NumPy

2021-07-30 Thread Jerry Morrison
On Tue, Jul 27, 2021 at 4:55 PM Sebastian Berg wrote:

> Hi all,
>
> there is a proposal to add some Intel specific fast math routine to
> NumPy:
>
> https://github.com/numpy/numpy/pull/19478
>
> Part of numerical algorithms is that there is always a speed vs.
> precision trade-off: giving a more precise result is slower.
>
> So there is a question what the general precision expectation should be
> in NumPy.  And how much is it acceptable to diverge in the
> precision/speed trade-off depending on CPU/system?
>
> I doubt we can formulate very clear rules here, but any input on what
> precision you would expect or trade-offs seem acceptable would be
> appreciated!
>
>
> Some more details
> -----------------
>
> This is mainly interesting e.g. for functions like logarithms,
> trigonometric functions, or cubic roots.
>
> Some basic functions (multiplication, addition) are correctly rounded
> as per the IEEE standard and give the best possible result, but most
> other functions are typically only correct to within small numerical
> errors.
>
> This is typically measured as "ULP":
>
>  https://en.wikipedia.org/wiki/Unit_in_the_last_place
>
> where 0.5 ULP would be the best possible result.
>
>
> Merging the PR may mean relaxing the current precision slightly in some
> places.  In general Intel advertises 4 ULP of precision (although the
> actual precision for most functions seems better).
>
>
> Here are two tables, one from glibc and one for the Intel functions:
>
>
> https://www.gnu.org/software/libc/manual/html_node/Errors-in-Math-Functions.html
> (Mainly the LA column)
> https://software.intel.com/content/www/us/en/develop/documentation/onemkl-vmperfdata/top/real-functions/measured-accuracy-of-all-real-vm-functions.html
>
>
> Different implementations give different accuracy, but formulating
> some guidelines/expectations (or referencing existing ones) would be
> useful guidance.
>
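For concreteness, the ULP error described above can be measured directly with NumPy, by comparing a float32 result against a float64 reference computed at the exact same inputs. This is only a sketch; the function name and the chosen inputs are illustrative, not part of any NumPy API.

```python
import numpy as np

def ulp_error_f32(computed_f32, reference_f64):
    """Error of a float32 result, measured in float32 ULPs against a
    float64 reference.  np.spacing(x) is the gap between x and the next
    representable float of the same dtype, i.e. one unit in the last
    place at x."""
    one_ulp = np.spacing(np.abs(reference_f64).astype(np.float32)).astype(np.float64)
    return np.abs(computed_f32.astype(np.float64) - reference_f64) / one_ulp

# Evaluate log in float32 and compare against a float64 "ground truth"
# computed at the exact float32 input values.
x32 = np.linspace(0.5, 10.0, 1000, dtype=np.float32)
reference = np.log(x32.astype(np.float64))
err = ulp_error_f32(np.log(x32), reference)
print(err.max())  # maximum error in ULPs; 0.5 ULP is the best achievable
```

The same harness works for any of the vectorized functions in question (exp, sin, cbrt, ...), which makes it easy to check a given build against a stated ULP bound.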

"Close enough" depends on the application, but non-linear models can
exhibit the "butterfly effect", where the results diverge if they aren't
identical.
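The butterfly effect is easy to demonstrate: in a chaotic iteration, even a one-ULP difference in the input grows exponentially until the trajectories are macroscopically different. A minimal sketch using the logistic map (the parameter and starting point are arbitrary choices):

```python
import numpy as np

# Two trajectories of the chaotic logistic map x -> r*x*(1 - x),
# started one ULP apart, end up macroscopically different.
r = 3.9
a = np.float64(0.3)
b = np.nextafter(a, 1.0)   # smallest possible perturbation of a
print(f"initial difference: {b - a:.3e}")   # ~5.6e-17
for _ in range(100):
    a = r * a * (1.0 - a)
    b = r * b * (1.0 - b)
print(f"difference after 100 steps: {abs(a - b):.3e}")
```

So if two library implementations differ by even one ULP per call, downstream results of such a model are not comparable value-for-value, only statistically.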

For a certain class of scientific programming applications, reproducibility
is paramount.

Development teams may use a variety of development laptops, workstations,
scientific computing clusters, and cloud computing platforms. If the tests
pass on your machine but fail in CI, you have a debugging problem.

If your published scientific article links to source code that replicates
your computation, scientists will expect to be able to run that code, now
or in a couple decades, and replicate the same outputs. They'll be using
different OS releases and maybe different CPU + accelerator architectures.

Reproducible Science is good. Replicated Science is better.


Clearly there are other applications where it's easy to trade
reproducibility and some precision for speed.
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Floating point precision expectations in NumPy

2021-07-30 Thread Sebastian Berg
On Fri, 2021-07-30 at 11:04 -0700, Jerry Morrison wrote:
> On Tue, Jul 27, 2021 at 4:55 PM Sebastian Berg <sebast...@sipsolutions.net>
> wrote:
> 
> > Hi all,
> > 
> > there is a proposal to add some Intel specific fast math routine to
> > NumPy:
> > 
> >     https://github.com/numpy/numpy/pull/19478
> > 
> > Part of numerical algorithms is that there is always a speed vs.
> > precision trade-off: giving a more precise result is slower.
> > 




I have to make a correction: I linked the SVML table, which is distinct
from VML (which the PR proposes); the actual precision table is here:

https://github.com/numpy/numpy/pull/19485#issuecomment-887995864



> "Close enough" depends on the application but non-linear models can
> get the
> "butterfly effect" where the results diverge if they aren't
> identical.


Right, so my hope was to gauge what the general expectation is.  I take
it you expect high accuracy.

The error for the computations themselves seems low at first sight, but
of course it can explode quickly in non-linear settings...
(In the chaotic systems I worked with, the shadowing theorem would
usually alleviate such worries, and testing the integration would be
more important.  But I am sure that for certain questions things may be
far more tricky.)


> 
> For a certain class of scientific programming applications,
> reproducibility
> is paramount.
> 
> Development teams may use a variety of development laptops,
> workstations,
> scientific computing clusters, and cloud computing platforms. If the
> tests
> pass on your machine but fail in CI, you have a debugging problem.
> 
> If your published scientific article links to source code that
> replicates
> your computation, scientists will expect to be able to run that code,
> now
> or in a couple decades, and replicate the same outputs. They'll be
> using
> different OS releases and maybe different CPU + accelerator
> architectures.
> 
> Reproducible Science is good. Replicated Science is better.
> 
> 
> Clearly there are other applications where it's easy to trade
> reproducibility and some precision for speed.


Agreed, although there are so many factors, often out of our control,
that I am not sure that true replicability is achievable without
containers :(.

It would be amazing if NumPy could have a "replicable" mode, but I am
not sure how that could be done, or if the "ground work" in the math
and linear algebra libraries even exists.


However, even if it is practically impossible to make things fully
replicable, there is an argument for improving reproducibility and
replicability, e.g. by choosing the high-accuracy version here, even if
replicability is impossible to actually guarantee.

Cheers,

Sebastian



