Re: [Numpy-discussion] Floating point precision expectations in NumPy

2021-08-19 Thread Jerry Morrison
On Fri, Jul 30, 2021 at 12:22 PM Sebastian Berg 
wrote:

> On Fri, 2021-07-30 at 11:04 -0700, Jerry Morrison wrote:
> > On Tue, Jul 27, 2021 at 4:55 PM Sebastian Berg <
> > sebast...@sipsolutions.net>
> > wrote:
> >
> > > Hi all,
> > >
> > > there is a proposal to add some Intel-specific fast math routines to
> > > NumPy:
> > >
> > > https://github.com/numpy/numpy/pull/19478
> > >
> > > Part of numerical algorithms is that there is always a speed vs.
> > > precision trade-off: giving a more precise result is slower.
> > >
>
> 
>


> > "Close enough" depends on the application but non-linear models can
> > get the
> > "butterfly effect" where the results diverge if they aren't
> > identical.
>
>
> Right, so my hope was to gauge what the general expectation is.  I take
> it you expect high accuracy.
>
> The error for the computations themselves seems low at first sight, but
> of course it can explode quickly in non-linear settings...
> (In the chaotic systems I worked with, the shadowing theorem would
> usually alleviate such worries. And testing the integration would be
> more important.  But I am sure for certain questions things may be far
> more tricky.)
>

I'll put forth an expectation that after installing a specific set of
libraries, the floating point results would be identical across platforms
and into the future. Ideally developers could install library updates (for
hardware compatibility, security fixes, or other reasons) and still get
identical results.

That expectation is for reproducibility, not high accuracy. So it'd be fine
to install different libraries [or maybe use those pip package options in
brackets, whatever they do?] to trade accuracy for speed. Could any
particular choice of accuracy still provide reproducible results across
platforms and time?



> > For a certain class of scientific programming applications,
> > reproducibility is paramount.
> >
> > Development teams may use a variety of development laptops,
> > workstations, scientific computing clusters, and cloud computing
> > platforms. If the tests pass on your machine but fail in CI, you
> > have a debugging problem.
> >
> > If your published scientific article links to source code that
> > replicates your computation, scientists will expect to be able to
> > run that code, now or in a couple decades, and replicate the same
> > outputs. They'll be using different OS releases and maybe different
> > CPU + accelerator architectures.
> >
> > Reproducible Science is good. Replicated Science is better.
> >
> > Clearly there are other applications where it's easy to trade
> > reproducibility and some precision for speed.
>
>
> Agreed, although there are so many factors, often out of our control,
> that I am not sure that true replicability is achievable without
> containers :(.
>
> It would be amazing if NumPy could have a "replicable" mode, but I am
> not sure how that could be done, or if the "ground work" in the math
> and linear algebra libraries even exists.
>
>
> However, even if it is practically impossible to make things
> replicable, there is an argument for improving reproducibility and
> replicability, e.g. by choosing the high-accuracy version here, even
> if it is impossible to actually guarantee.
>

Yes! Let's at least have reproducibility in mind and work on improving
it, e.g. by removing failure modes.
(Ditto for security :-)


[Numpy-discussion] Adding New Feature

2021-08-19 Thread Bhavay Malhotra
Dear Team,

I’m thinking of adding a new feature in response to issue #19039.

The feature is a function that checks whether the data types of two numpy
arrays are the same.

If the numpy arrays have different data types, the function returns False;
otherwise it returns True.

Please consider my proposed feature and reply so that I can send my PR
accordingly.

Looking forward to a prompt reply.

Thank you.

Regards,

Bhavay


Re: [Numpy-discussion] Adding New Feature

2021-08-19 Thread Matti Picus


On 19/8/21 6:15 pm, Bhavay Malhotra wrote:

Dear Team,

I’m thinking of adding a new feature in response to issue #19039.

The feature is a function that checks whether the data types of two
numpy arrays are the same.

If the numpy arrays have different data types, the function returns
False; otherwise it returns True.

Thank you.

Regards,

Bhavay


As we discussed on the issue https://github.com/numpy/numpy/issues/19039,


Is there a use-case where "b.dtype == c.dtype" would not suffice?
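
For readers skimming the archive, a minimal illustration of that existing
one-liner (the array names b and c are hypothetical, chosen to match the
snippet above):

    import numpy as np

    b = np.array([1, 2, 3], dtype=np.int64)
    c = np.array([1.0, 2.0, 3.0], dtype=np.float64)

    # The check that already exists today: dtype objects compare by equality.
    print(b.dtype == c.dtype)                  # False, int64 != float64
    print(b.dtype == c.astype(b.dtype).dtype)  # True after casting c to int64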


Matti



Re: [Numpy-discussion] Floating point precision expectations in NumPy

2021-08-19 Thread Stanley Seibert
On Thu, Aug 19, 2021 at 2:13 AM Jerry Morrison <
jerry.morrison+nu...@gmail.com> wrote:

>
> I'll put forth an expectation that after installing a specific set of
> libraries, the floating point results would be identical across platforms
> and into the future. Ideally developers could install library updates (for
> hardware compatibility, security fixes, or other reasons) and still get
> identical results.
>
> That expectation is for reproducibility, not high accuracy. So it'd be
> fine to install different libraries [or maybe use those pip package options
> in brackets, whatever they do?] to trade accuracy for speed. Could any
> particular choice of accuracy still provide reproducible results across
> platforms and time?
>

While this would be nice, in practice bit-identical results for floating
point NumPy functions across different operating systems and future time are
going to be impractical to achieve.  IEEE-754 helps by specifying the
result of basic floating point operations, but once you move into special
math functions (like cos()) or other algorithms that can be implemented in
several "mathematically equivalent" ways, bit-level stability basically
becomes impossible without snapshotting your entire software stack.  Many
of these special math functions are provided by the operating system's math
library, which generally makes no such guarantees.

Quick example: Suppose you want to implement sum() on a floating point
array.  If you start at the beginning of the array and iterate to the end,
adding each element to an accumulator, you will get one answer.  If you do
mathematically equivalent pairwise summations (using a temporary array for
storage), you will get a different, and probably more accurate, answer.
Neither answer will (in general) be the same as summing those numbers
together with infinite precision, then rounding to the closest floating
point number at the end.  We could decide to make the specification for
sum() also specify the algorithm for computing sum() to ensure we make the
same round-off errors every time.  However, this kind of detailed
specification might be harder to write for other functions, or might even
lock the library into accuracy bugs that can't be fixed in the future.
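
To make the sum() example concrete, here is a small, purely illustrative
sketch (array size and values are arbitrary; np.sum uses pairwise summation
for floating point reductions, while the Python loop below is the naive
left-to-right version):

    import math
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.random(100_000).astype(np.float32)

    # Naive left-to-right accumulation in float32.
    naive = np.float32(0.0)
    for v in x:
        naive = naive + v

    # NumPy's sum, which uses pairwise summation for this reduction.
    pairwise = np.sum(x)

    # A high-accuracy reference: compensated summation in double precision.
    reference = math.fsum(float(v) for v in x)

    print(naive, pairwise, reference)
    # The results typically differ in the trailing digits; the naive loop
    # drifts furthest from the double-precision reference.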

I think the most pragmatic things you can hope for are:

   - Bit-identical results with containers that snapshot everything,
     including the system math library.
   - Libraries that specify their accuracy levels when possible, and
     disclose when algorithm changes will affect the bit-identicalness
     of results.

On a meta-level, if analysis conclusions depend on getting bit-identical
results from floating point operations, then you really want to use a
higher precision float and/or an algorithm less sensitive to round-off
error.  Floating point numbers are not real numbers.  :)
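
As one hedged illustration of those two mitigations (the names kahan_sum and
x are made up for this sketch): compensated (Kahan) summation is less
sensitive to round-off, and np.sum's dtype argument lets you accumulate in a
wider type.

    import numpy as np

    def kahan_sum(values):
        # Compensated (Kahan) summation: carry the lost low-order bits along.
        total = np.float32(0.0)
        comp = np.float32(0.0)
        for v in values:
            y = v - comp
            t = total + y
            comp = (t - total) - y
            total = t
        return total

    rng = np.random.default_rng(1)
    x = rng.random(100_000).astype(np.float32)

    print(kahan_sum(x))                 # float32 arithmetic, error-compensated
    print(np.sum(x, dtype=np.float64))  # or simply accumulate in float64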