Re: [Numpy-discussion] New NEP: merging multiarray and umath

2018-03-08 Thread Gregor Thalhammer

Hi,

A long time ago I wrote a wrapper to use the optimised and parallelized math 
functions from Intel's vector math library: 
geggo/uvml: Provide vectorized math function (MKL) for numpy 
(https://github.com/geggo/uvml)


I found it useful to inject (some of) the fast methods into numpy via 
np.set_numeric_ops(), to gain more performance without changing my programs.

While this original project is outdated, I can imagine that a centralised way 
to swap the implementation of math functions is useful. Therefore I suggest 
keeping np.set_numeric_ops(), although admittedly I do not understand all the 
technical implications of the proposed change.

Best,
Gregor
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] New NEP: merging multiarray and umath

2018-03-11 Thread Gregor Thalhammer


> On 09.03.2018 at 02:06, Nathaniel Smith wrote:
> 
> On Thu, Mar 8, 2018 at 1:52 AM, Gregor Thalhammer
> <gregor.thalham...@gmail.com> wrote:
>> 
>> Hi,
>> 
>> A long time ago I wrote a wrapper to use the optimised and parallelized math
>> functions from Intel's vector math library:
>> geggo/uvml: Provide vectorized math function (MKL) for numpy
>> 
>> I found it useful to inject (some of) the fast methods into numpy via
>> np.set_numeric_ops(), to gain more performance without changing my programs.
>> 
>> While this original project is outdated, I can imagine that a centralised
>> way to swap the implementation of math functions is useful. Therefore I
>> suggest keeping np.set_numeric_ops(), although admittedly I do not understand
>> all the technical implications of the proposed change.
> 
> The main part of the proposal is to merge the two libraries; the
> question of whether to deprecate set_numeric_ops is a bit separate.
> There's no technical obstacle to keeping it, except the usual issue of
> having more cruft to maintain :-).


> 
> It's usually true that any monkeypatching interface will be useful to
> someone under some circumstances, but we usually don't consider this a
> good enough reason on its own to add and maintain these kinds of
> interfaces. And an unfortunate side-effect of these kinds of hacky
> interfaces is that they can end up removing the pressure to solve
> problems properly. In this case, better solutions would include:
> 
> - Adding support for accelerated vector math libraries to NumPy
> directly (e.g. MKL, yeppp)
> 
> - Overriding the inner loops inside ufuncs like numpy.add that
> np.ndarray.__add__ ultimately calls. This would speed up all addition
> (whether or not it uses Python + syntax), would be a more general
> solution (e.g. you could monkeypatch np.exp to use MKL's fast
> vectorized exp), would let you skip reimplementing all the tricky
> shared bits of the ufunc logic, etc. Conceptually it's not even very
> hacky, because we allow you to add new loops to existing ufuncs; making
> it possible to replace existing loops wouldn't be a big stretch. (In
> fact it's possible that we already allow this; I haven't checked.)
> 
> So I still lean towards deprecating set_numeric_ops. It's not the most
> crucial part of the proposal though; if it turns out to be too
> controversial then I'll take it out.

Dear Nathaniel,

Since you referred to this reply of yours in your latest post in this thread, 
I will comment here.

First, I agree that set_numeric_ops() is not very important for replacing numpy 
math functions with faster implementations, mostly because it covers only the 
basic operations (+, *, boolean operations), which are fast anyhow; only pow 
can be accelerated by a substantial factor.

I also agree that adding support for optimised math function libraries directly 
to numpy might be a better solution than patching numpy. But in the past there 
have been a couple of proposals to add fast vectorised math functions directly 
to numpy, e.g. for a GSoC project. There have always been long discussions 
about maintainability, testing, vendor lock-in, and free versus non-free 
software, and all attempts failed. Only the Intel accelerated Python 
distribution claims that it boosted performance for transcendental functions, 
but I do not know how they achieved this or whether it could be integrated 
into the official numpy.

Therefore I think there is some need for an "official" way to swap numpy math 
functions at the user (Python) level at runtime. As Julian commented, you want 
this flexibility because of speed and accuracy trade-offs.

Just replacing the inner loop might be an alternative, but I am not sure. 
Many optimised vector math libraries require contiguous arrays, so they do not 
fulfil the expectations numpy has for an inner loop, which must handle strided 
data. So you would need to allocate memory, copy, and free memory for each 
call to the inner loop. I imagine this adds quite some overhead that you could 
avoid with a completely custom ufunc. 
On the other hand, setting up a ufunc from inner loop functions is easy, since 
you can reuse all the numpy machinery. I disagree with you that you have to 
reimplement the whole ufunc machinery if you swap math functions at the ufunc 
level.
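
To make the contiguity point concrete, here is a minimal Python sketch. 
contiguous_only_exp is a hypothetical stand-in for a vector math routine that 
accepts only contiguous buffers (internally it simply calls np.exp), and 
exp_any_layout is an illustrative wrapper, not part of numpy. It shows the 
allocate-and-copy step that strided input forces:

```python
import numpy as np

def contiguous_only_exp(src, dst):
    """Hypothetical stand-in for a vendor routine that needs contiguous buffers."""
    if not (src.flags.c_contiguous and dst.flags.c_contiguous):
        raise ValueError("contiguous buffers required")
    np.exp(src, out=dst)

def exp_any_layout(x):
    """Apply the contiguous-only routine to an array with arbitrary strides."""
    src = np.ascontiguousarray(x)  # copies only if x is not already contiguous
    dst = np.empty_like(src)       # freshly allocated contiguous output
    contiguous_only_exp(src, dst)
    return dst

a = np.linspace(0.0, 1.0, 10)
strided = a[::2]                   # a view on every second element
result = exp_any_layout(strided)

# The contiguous version of a strided view is a separate buffer, i.e. a copy.
copied = not np.shares_memory(np.ascontiguousarray(strided), strided)
```

For already contiguous input np.ascontiguousarray returns the array without 
copying, so the extra cost arises only for strided views such as a[::2].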

Stupid question: how do I get the first argument of 
 int PyUFunc_ReplaceLoopBySignature(PyUFuncObject *ufunc,
(PyUFuncObject: https://docs.scipy.org/doc/numpy/reference/c-api.types-and-structures.html#c.PyUFuncObject)
e.g. for np.add?

So, please consider this when refactoring/redesigning the ufunc module.

Gregor



> 
> -n
> 
> -- 
> Nathaniel J. Smith -- https://vorpus.org <https://vorpus.org/>

Re: [Numpy-discussion] Floating point precision expectations in NumPy

2021-07-28 Thread Gregor Thalhammer


> On 28.07.2021 at 01:50, Sebastian Berg wrote:
> 
> Hi all,
> 
> there is a proposal to add some Intel specific fast math routine to
> NumPy:
> 
> https://github.com/numpy/numpy/pull/19478

Many years ago I wrote a package
https://github.com/geggo/uvml
that makes the VML, a fast implementation of transcendental math functions, 
available for numpy. I don't know if it still compiles.
It uses Intel VML, which is designed for processing arrays, not the SVML 
intrinsics. This makes it less machine dependent (optimized implementations 
are selected automatically depending on the availability of, e.g., SSE, AVX, 
or AVX512); you just link to a library. It compiles as an external module and 
can be activated at runtime. 

Different precision models can be selected at runtime (globally). I think 
Intel advocates using the LA (low accuracy) mode as a good compromise between 
performance and accuracy. Different people have strongly diverging opinions 
about what to expect.

The speedups possibly gained by these approaches often evaporate in 
non-benchmark applications, because for those functions performance is often 
limited by memory bandwidth, unless all your data stays in CPU cache. By 
default I would go for high accuracy mode, with the option to switch to low 
accuracy if one urgently needs the better performance. But then one should use 
different approaches for speeding up numpy.
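
The accuracy side of this trade-off is usually expressed in ULP, as in the 
quoted message below. Here is a minimal sketch of how one might measure it for 
exp, assuming Python 3.9+ (for math.ulp) and using the decimal module as a 
high-precision reference. crude_exp is a hypothetical, deliberately inaccurate 
stand-in for a low-accuracy routine, not any vendor implementation:

```python
import math
from decimal import Decimal, getcontext

getcontext().prec = 50  # reference precision far beyond float64

def ulp_error(approx, x):
    """Error of `approx` as an approximation to exp(x), in float64 ULPs."""
    exact = Decimal(x).exp()                   # high-precision reference value
    one_ulp = Decimal(math.ulp(float(exact)))  # spacing of float64 near exp(x)
    return float(abs(Decimal(approx) - exact) / one_ulp)

def crude_exp(x, terms=10):
    """Hypothetical low-accuracy exp: a truncated Taylor series."""
    total, term = 1.0, 1.0
    for n in range(1, terms):
        term *= x / n
        total += term
    return total

err_libm = ulp_error(math.exp(1.0), 1.0)    # the platform's math library
err_crude = ulp_error(crude_exp(1.0), 1.0)  # the truncated series
```

Comparing err_libm and err_crude against a bound such as the 4 ULP figure 
quoted below is one way to decide whether a faster routine is acceptable.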

Gregor


> 
> part of numerical algorithms is that there is always a speed vs.
> precision trade-off, giving a more precise result is slower.
> 
> So there is a question what the general precision expectation should be
> in NumPy.  And how much is it acceptable to diverge in the
> precision/speed trade-off depending on CPU/system?
> 
> I doubt we can formulate very clear rules here, but any input on what
> precision you would expect or trade-offs seem acceptable would be
> appreciated!
> 
> 
> Some more details
> -----------------
> 
> This is mainly interesting e.g. for functions like logarithms,
> trigonometric functions, or cubic roots.
> 
> Some basic functions (multiplication, addition) are correct as per IEEE
> standard and give the best possible result, but these are typically
> only correct within very small numerical errors.
> 
> This is typically measured as "ULP":
> 
> https://en.wikipedia.org/wiki/Unit_in_the_last_place
> 
> where 0.5 ULP would be the best possible result.
> 
> 
> Merging the PR may mean relaxing the current precision slightly in some
> places.  In general Intel advertises 4 ULP of precision (although the
> actual precision for most functions seems better).
> 
> 
> Here are two tables, one from glibc and one for the Intel functions:
> 
> https://www.gnu.org/software/libc/manual/html_node/Errors-in-Math-Functions.html
> (Mainly the LA column) 
> https://software.intel.com/content/www/us/en/develop/documentation/onemkl-vmperfdata/top/real-functions/measured-accuracy-of-all-real-vm-functions.html
> 
> 
> Different implementations give different accuracy, but formulating some
> guidelines/expectations (or referencing them) would be useful guidance. 
> 
> For basic 
> 