Announcing Python-Blosc2 4.0.0
==============================

This is a major version release in which we have accelerated computation via
multithreading using the [miniexpr library](https://github.com/Blosc/miniexpr/tree/main).
We have also changed the wheel layout to comply with PEP 427 and added support
for the [blosc2-openzl plugin](https://github.com/Blosc/blosc2-openzl).
You can think of Python-Blosc2 4.x as an extension of NumPy/numexpr that:

- Can deal with NDArray compressed objects using first-class codecs & filters.
- Performs many kinds of math expressions, including reductions, indexing...
- Supports multi-threading and SIMD acceleration (via numexpr/miniexpr).
- Can operate with data from other libraries (like PyTables, h5py, Zarr, Dask, etc.).
- Supports the NumPy ufunc mechanism: mix and match NumPy and Blosc2 computations.
- Integrates with Numba and Cython via UDFs (User Defined Functions).
- Adheres to modern array API standard conventions (https://data-apis.org/array-api/).
- Can perform linear algebra operations (like ``blosc2.tensordot()``).

Here is a glimpse of the kind of acceleration the new miniexpr engine can
deliver (on an Ubuntu 24.04 box with an i9-13900K CPU and 32 GB of RAM):

In [1]: import numpy as np

In [2]: import blosc2

In [3]: import numexpr as ne

In [4]: %time a = np.linspace(0., 1., int(1e9), dtype=np.float32)
CPU times: user 1.41 s, sys: 1.14 s, total: 2.54 s
Wall time: 2.54 s

In [5]: %time b2a = blosc2.linspace(0., 1., int(1e9), dtype=np.float32)
CPU times: user 6.89 s, sys: 776 ms, total: 7.67 s
Wall time: 2.2 s

In [6]: %time np.sum(np.sin(a + 0.5))
CPU times: user 1.05 s, sys: 435 ms, total: 1.48 s
Wall time: 1.48 s
Out[6]: np.float32(8.068454e+08)

In [7]: %time ne.evaluate("sum(sin(a + 0.5))")
CPU times: user 3.79 s, sys: 41.9 ms, total: 3.83 s
Wall time: 3.83 s
Out[7]: array(8.0684536e+08)

In [8]: %time blosc2.evaluate("sum(sin(a + 0.5))")
CPU times: user 2.96 s, sys: 459 ms, total: 3.42 s
Wall time: 683 ms  # 2.2x faster than NumPy and 5.6x faster than NumExpr
Out[8]: np.float32(8.068454e+08)

In [9]: %time blosc2.evaluate("sum(sin(b2a + 0.5))")
CPU times: user 3.55 s, sys: 31.4 ms, total: 3.58 s
Wall time: 176 ms  # 8.4x faster than NumPy and 21.7x faster than NumExpr
Out[9]: np.float32(8.068453e+08)

In [10]: %time np.sum(np.sin(b2a + 0.5))  # Blosc2 arrays support NumPy ufunc/array interfaces
CPU times: user 3.46 s, sys: 53.2 ms, total: 3.52 s
Wall time: 174 ms
Out[10]: np.float32(8.068453e+08)

Here you can see blosc2.evaluate() acting as a drop-in replacement for
numexpr.evaluate(): it supports parallelized reductions and works
transparently with both native NumPy and Blosc2 arrays. It achieves far
better performance by making more effective use of the cache hierarchy in
modern CPUs. See the rationale at:
https://ironarray.io/blog/miniexpr-powered-blosc2.

In addition, Python-Blosc2 can work transparently with data either in
memory, on disk, or on the network. In these days when memory prices have
skyrocketed, compression is an important subject, especially when it does
not mean a drop in performance -- and we are committed to increasing the
number of scenarios where this is the case.

More info: https://blosc.org/python-blosc2/

Cheers,
-- Francesc Alted
_______________________________________________
NumPy-Discussion mailing list -- [email protected]
