[Numpy-discussion] NEP 54 - SIMD infrastructure evolution to C++ and adopting Google Highway
Hi all,

We have just merged NEP 54, "SIMD infrastructure evolution: adopting Google Highway when moving to C++?", with Draft status after a long review at https://github.com/numpy/numpy/pull/24138. It looks like it wasn't sent to this list before. Please see https://numpy.org/neps/nep-0054-simd-cpp-highway.html for the rendered version (complete text below).

This is a complex topic, and the NEP mostly captures a discussion of the pros and cons of moving to Highway, and in what form. Most folks actively working on SIMD code in NumPy have weighed in during one of several calls: in the community meeting and in the 3-weekly meeting of the recently formed NumPy Optimization Team. I think we can summarize the current status as follows:

- Google Highway is now included in the main repo as a git submodule.
- We are +1 on using Highway for high-level operations where possible given accuracy constraints, and are already doing so for sorting functionality.
- We are -1 on using Highway's dynamic dispatch; we prefer to stay with our current dynamic dispatch support via the build system, which has worked well for us for ~4 years now.
- We are +0 to +0.5 on using Highway's form of 'universal intrinsics', in preference to moving our own universal intrinsics from C to C++. Both would be a major improvement on the current state of our C implementation.
- On that last decision there isn't complete consensus, and Highway is also missing a few things that NumPy does have and that we'd like to see it gain. In particular, a way to prototype and test new SIMD intrinsics from Python (see https://numpy.org/neps/nep-0054-simd-cpp-highway.html#the-simd-unit-testing-module ).

Cheers,
Ralf

Full text of the NEP:

====================================================================================
NEP 54 — SIMD infrastructure evolution: adopting Google Highway when moving to C++?
====================================================================================

:Author: Sayed Adel, Jan Wassenberg, Matti Picus, Ralf Gommers, Chris Sidebottom
:Status: Draft
:Type: Standards Track
:Created: 2023-07-06
:Resolution: TODO

Abstract
--------

We are moving the SIMD intrinsic framework, Universal Intrinsics, from C to C++. We have also moved to Meson as the build system. The Google Highway intrinsics project is proposing we use Highway instead of our Universal Intrinsics as described in `NEP 38`_. This is a complex and multi-faceted decision; this NEP is an attempt to describe the trade-offs involved and what would need to be done.

Motivation and Scope
--------------------

We want to refactor the C-based Universal Intrinsics (see `NEP 38`_) to C++. This work had been ongoing for some time when Google's Highway was suggested as an alternative. Highway was already written in C++ and had support for scalable SVE and other reusable components (such as VQSort).

The move from C to C++ is motivated by (a) code readability and ease of development, and (b) the need to add support for sizeless SIMD instructions (e.g., ARM's SVE, RISC-V's RVV).

As an example of the readability improvement, here is a typical line of C code from our current C universal intrinsics framework:

.. code::

    // The @name@ is the numpy-specific templating in .c.src files
    npyv_@sfx@ a5 = npyv_load_@sfx@(src1 + npyv_nlanes_@sfx@ * 4);

This will change (as implemented in PR `gh-21057`_) to:

.. code:: C++

    auto a5 = Load(src1 + nlanes * 4);

If the above C++ code were to use Highway under the hood, it would look quite similar: Highway uses similarly understandable names, such as ``Load``, for individual portable intrinsics.

The ``@sfx@`` in the C version above is the template variable for type identifiers, e.g.: ``#sfx = u8, s8, u16, s16, u32, s32, u64, s64, f32, f64#``. Explicit use of bitsize-encoded types like this won't work for sizeless SIMD instruction sets. With C++ this is easier to handle; PR `gh-21057`_ shows how and contains more complete examples of what the C++ code will look like.
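The `@sfx@` templating bakes the element type and bit width into each generated name, which is why it does not carry over to sizeless instruction sets. As an illustration, here is a minimal, self-contained sketch in which a single C++ template covers every element type and the lane count is known only at run time. All names (`sketch::Tag`, `Lanes`, `Load`, `Add`, `Store`, `AddArrays`) are hypothetical and emulated with scalar code. They are not the API from gh-21057 or from Highway, which differ in their signatures and in how vectors are represented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sketch {

// Hypothetical descriptor: the element type is a template parameter and the
// lane count is a run-time value, as on sizeless ISAs such as SVE or RVV.
template <typename T>
struct Tag {
    std::size_t lanes;
};

template <typename T>
std::size_t Lanes(Tag<T> d) { return d.lanes; }

// Hypothetical "vector register", emulated with scalar storage.
template <typename T>
using Vec = std::vector<T>;

template <typename T>
Vec<T> Load(Tag<T> d, const T* p) { return Vec<T>(p, p + Lanes(d)); }

template <typename T>
Vec<T> Add(const Vec<T>& a, const Vec<T>& b) {
    Vec<T> r(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        r[i] = static_cast<T>(a[i] + b[i]);  // wraps for unsigned types
    }
    return r;
}

template <typename T>
void Store(const Vec<T>& v, T* p) { std::copy(v.begin(), v.end(), p); }

// One template replaces the per-suffix copies (u8, s8, ..., f64) that
// .c.src templating would generate; nothing here hard-codes a vector width.
template <typename T>
void AddArrays(Tag<T> d, const T* a, const T* b, T* out, std::size_t n) {
    const std::size_t nlanes = Lanes(d);
    assert(nlanes > 0);
    std::size_t i = 0;
    for (; i + nlanes <= n; i += nlanes) {
        Store(Add(Load(d, a + i), Load(d, b + i)), out + i);
    }
    for (; i < n; ++i) {
        out[i] = static_cast<T>(a[i] + b[i]);  // scalar tail
    }
}

}  // namespace sketch
```

Calling `sketch::AddArrays(sketch::Tag<float>{4}, a, b, out, n)` processes four lanes per iteration and handles the remainder in a scalar tail. Changing the value stored in the `Tag` changes the width without touching the loop, and that is the property sizeless ISAs require.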
The scope of this NEP includes discussing most relevant aspects of adopting Google Highway to replace our current Universal Intrinsics framework, including but not limited to:

- Maintainability, availability of domain expertise, ease of onboarding new contributors, and other social aspects,
- Key technical differences and constraints that may impact NumPy's internal design or performance,
- Build system related aspects,
- Release timing related aspects.

Out of scope (at least for now) is revisiting other aspects of our current SIMD support strategy:

- accuracy vs. performance trade-offs when adding SIMD support to a function
- use of SVML and x86-simd-sort (and possibly its equivalents for aarch64)
- pulling in individual bits or algorithms of Highway (as in `gh-24018`_) or SLEEF (as discussed in that same PR)

Usage and Impact
----------------

N/A - there will be no significant user-visible changes.

Backward compatibility
----------------------

There will be no changes in user-facin
[Numpy-discussion] Fwd: incomplete BLAS/CBLAS linking (Telling meson build which CBLAS/LAPACK (LAPACKE?) to use via pkgconfig module)
(I took this off-list unintentionally, so I'm forwarding each email to the list now) -- Forwarded message - From: Ralf Gommers Date: Thu, Dec 28, 2023 at 8:51 PM Subject: Re: incomplete BLAS/CBLAS linking (Telling meson build which CBLAS/LAPACK (LAPACKE?) to use via pkgconfig module) To: Dr. Thomas Orgis On Mon, Dec 25, 2023 at 10:37 AM Dr. Thomas Orgis < thomas.or...@uni-hamburg.de> wrote: > Happy holidays … but I have an issue still that hopefully can be > addressed with the meson blas detection you are upstreaming(?). > Happy holidays to you too. And yes, I hope so:) > > On Wed, 6 Dec 2023 18:06:01 +0100 > Ralf Gommers wrote: > > > > Well, now is another day. Pkgsrc uses python -m build and I added > > > > > > -Csetup-args=-Dblas=${CBLAS_PC} > -Csetup-args=-Dlapack=${LAPACK_PC} > > > > > > which seems to work out fine using cblas.pc and lapack.pc in the case > > > of the netlib install. In fact, most linking is done only to libblas.so > > > instead of libcblas.so, as the linker is smart enough to throw away the > > > unused lib. > > > > > > > Great, thanks for confirming! > > This works for numpy and also installs scipy nicely, but this produces > a broken scipy install when using netlib reference libraries from > pkgsrc. These come as > > libblas.so > liblapack.so (NEEDing libblas.so) > libcblas.so (NEEDing libblas.so) > liblapacke.so (NEEDing liblapack.so, hence libblas.so) > > and their respective .pc files. This is the natural order that occurs to > me when building from netlib upstream. This should work fine. It's auto-detected in NumPy already, and will be in SciPy in the future. For now, using `-Dblas=blas -Dlapack=lapack` in the SciPy build should work. > This also means that one could > just replace BLAS and put stock LAPACK on top, what optimized BLAS libs > usually start out with. This is indeed possible, if unusual. It's supported, and it's one reason why we have separate `blas` and `lapack` flags.
I'd discourage distros from shipping something like that by default though, since it tends to lead to problems. Arch Linux used to do this, shipping an OpenBLAS without LAPACK symbols. Luckily they finally fixed that. Shipping non-default build configs like that is invariably a bad idea, and should only be done if there's a pressing need. > Only that they tend to pack all symbols into > one common library, which project builds like numpy then rely on. > > Telling the meson build that BLAS is libcblas works as long as actually > CBLAS symbols are used. Please never do this. The library is BLAS, so you should use `-Dblas=blas` for NumPy. It will find `cblas` just fine that way. > If not — I presume now, as I didn't yet see the > actual build lines that are triggered via the python -m build and meson > indirections — the linker might discard the -lcblas and leave symbols > unresolved (--as-needed but no --no-undefined). > > This happens with scipy: > > $ LANG=C readelf -d > /data/pkg/lib/python3.11/site-packages/scipy/sparse/linalg/_dsolve/_superlu.so > |grep NEEDED > 0x0001 (NEEDED) Shared library: [libm.so.6] > 0x0001 (NEEDED) Shared library: [libc.so.6] > This is probably a bug in SciPy. The build target depends on both `blas` and `lapack`, and sets `-DUSE_VENDOR_BLAS=1`. However, it looks like it should also depend on `cblas`. If you add `cblas` to this line, I think it'll fix the issue: https://github.com/scipy/scipy/blob/6452a48c9611d16140b160091de6cf5299fadd9f/scipy/sparse/linalg/_dsolve/meson.build#L208 .
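Ralf's diagnosis rests on the NEEDED entries in the `readelf -d` output quoted above. The vendored `_superlu.so` lists only `libm.so.6` and `libc.so.6`, so no BLAS library is loaded with it. Below is a minimal C++ sketch that automates this check. It assumes the `(NEEDED) Shared library: [name]` line format shown in this thread. `NeededLibraries`, `LinksBlas`, and the "name contains `blas`" heuristic are illustrative assumptions and are not part of NumPy, SciPy, or Meson.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Extract library names from the NEEDED entries of `readelf -d` output,
// e.g. "0x0001 (NEEDED) Shared library: [libm.so.6]" -> "libm.so.6".
std::vector<std::string> NeededLibraries(const std::string& readelf_output) {
    const std::string marker = "Shared library: [";
    std::vector<std::string> libs;
    std::istringstream in(readelf_output);
    std::string line;
    while (std::getline(in, line)) {
        if (line.find("(NEEDED)") == std::string::npos) continue;
        const std::size_t start = line.find(marker);
        if (start == std::string::npos) continue;
        const std::size_t begin = start + marker.size();
        const std::size_t end = line.find(']', begin);
        if (end == std::string::npos) continue;
        libs.push_back(line.substr(begin, end - begin));
    }
    return libs;
}

// Hypothetical heuristic: treat any NEEDED library whose name contains
// "blas" (libblas, libcblas, libopenblas_openmp, ...) as a BLAS provider.
bool LinksBlas(const std::vector<std::string>& needed) {
    for (const std::string& lib : needed) {
        if (lib.find("blas") != std::string::npos) return true;
    }
    return false;
}
```

The substring heuristic would miss BLAS providers whose library names do not contain `blas` (MKL's `libmkl_rt`, for example). A real check would need a list of known providers.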
> It would link against libopenblas_openmp.so if that had been the CBLAS > (and LAPACK) choice and all would be fine, but here, it should link > with libcblas.so, or directly to libblas.so, just like our regular > install of superlu: > > $ LANG=C readelf -d /data/pkg/lib/libsuperlu.so|grep NEEDED > 0x0001 (NEEDED) Shared library: [libblas.so.3] > 0x0001 (NEEDED) Shared library: [libm.so.6] > 0x0001 (NEEDED) Shared library: [libc.so.6] > > Of course, just not vendoring superlu would be one solution for scipy, > but I think the deeper issue with the meson BLAS support should be > solved: The 4 parts of the BLAS canon (not talking about SCALAPACK etc. > yet) need to be handled explicitly. > > It is confusing, though, as meson prints this: > > Run-time dependency blas found: YES 3.11.0 > Run-time dependency cblas found: YES 3.11.0 > Run-time dependency lapack found: YES 3.11.0 > > It suggests that it looked for and found 3 libraries, but actually, it > only cared about -llapack and -lcblas. It needs to find -lblas directly, > too (or the cblas package separately, for that matter, not as a component > of blas). > > Is that easily fixable from your side? (I'm assuming numpy, scipy and > the future stock BLAS support of meson are handled together.) Is this > just an oversight on the scipy side and they could link the vendored > superlu with
[Numpy-discussion] Fwd: incomplete BLAS/CBLAS linking (Telling meson build which CBLAS/LAPACK (LAPACKE?) to use via pkgconfig module)
-- Forwarded message - From: Dr. Thomas Orgis Date: Fri, Dec 29, 2023 at 12:00 AM Subject: Re: incomplete BLAS/CBLAS linking (Telling meson build which CBLAS/LAPACK (LAPACKE?) to use via pkgconfig module) To: Ralf Gommers On Thu, 28 Dec 2023 20:51:27 +0100, Ralf Gommers wrote: > > libblas.so > > liblapack.so (NEEDing libblas.so) > > libcblas.so (NEEDing libblas.so) > > liblapacke.so (NEEDing liblapack.so, hence libblas.so) > > > > and their respective .pc files. This is the natural order that occurs to > > me when building from netlib upstream. > > > This should work fine. It's auto-detected in NumPy already, and will be in > SciPy in the future. For now, using `-Dblas=blas -Dlapack=lapack` in the > SciPy build should work. I noticed that with -Dblas=blas, which is in pkgsrc now. The detection code sets cblas and finds libcblas by dark magic / defaults that happen to match. But what if my setup uses -Dblas=netlib_blas? Then the internal guesswork would fail. Please consider a mode where the user specifies separate names for all 4 components. For package builds, we do not want any guesswork, including assuming that libblas.so is accompanied by libcblas.so with that exact name. So I'd like

-Dblas=$BLAS_PACKAGE -Dcblas=$CBLAS_PACKAGE \
-Dlapack=$LAPACK_PACKAGE -Dlapacke=$LAPACKE_PACKAGE

where the values may or may not all be the same. If I fail to provide one of those, feel free to guess the rest (for example, assuming/trying that all of those are openblas if I say -Dblas=openblas). I also realized that including LAPACK in OpenBLAS is needed, but any new BLAS code could start out by replacing the netlib pieces one by one. The partitioning is there, and it is probably good for managing complexity and limiting the scope of the individual libraries. > > Telling the meson build that BLAS is libcblas works as long as actually > > CBLAS symbols are used. > > > Please never do this. The library is BLAS, so you should use `-Dblas=blas` > for NumPy.
> It will find `cblas` just fine that way.

Oh. As I wrote before, we now have

-Csetup-args=-Dblas=${CBLAS_PC}
-Csetup-args=-Dlapack=${LAPACK_PC}

for math/py-numpy. That's CBLAS_PC, not BLAS_PC. And this works.

> This is probably a bug in SciPy.

Well, apparently it's just a miscommunication between the two of us. Scipy is fine with

-Csetup-args=-Dblas=${BLAS_PC}
-Csetup-args=-Dlapack=${LAPACK_PC}

locating libcblas by inferring it from libblas, and finding cblas in openblas_foobar, apparently. It prints these lines:

Run-time dependency blas found: YES 3.11.0
Run-time dependency cblas found: YES 3.11.0
Run-time dependency lapack found: YES 3.11.0
blas: blas
lapack : lapack

While the numpy build does this:

Run-time dependency cblas found: YES 3.11.0
Message: BLAS symbol suffix:
Run-time dependency lapack found: YES 3.11.0
blas: cblas
lapack : lapack

This looks similar to the case of openblas_openmp for -Dblas and -Dlapack:

Run-time dependency openblas_openmp found: YES 0.3.24
Message: BLAS symbol suffix:
Run-time dependency openblas_openmp found: YES 0.3.24
blas: openblas_openmp
lapack : openblas_openmp

So scipy locates cblas based on the name blas, but doesn't really use cblas. Numpy is happy with libcblas bringing libblas in and calls it blas, but really uses the cblas interface. This looks a bit confusing.

I guess it makes more sense to continue that discussion on the meson PRs for this functionality … as it transcends NumPy anyway. I hope we can settle on something that works for autodetection and prescription of all parts. And I need to ponder whether I leave it at -Dblas=$CBLAS_PC for pkgsrc for now. It's somewhat wrong, but also more correct, as NumPy _really_ means to use the CBLAS API, not BLAS.

Alrighty then,

Thomas

-- Dr.
Thomas Orgis HPC @ Universität Hamburg ___ NumPy-Discussion mailing list -- numpy-discussion@python.org To unsubscribe send an email to numpy-discussion-le...@python.org https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ Member address: arch...@mail-archive.com
[Numpy-discussion] Re: incomplete BLAS/CBLAS linking (Telling meson build which CBLAS/LAPACK (LAPACKE?) to use via pkgconfig module)
(re-sending to list) On Fri, Dec 29, 2023 at 11:34 AM Ralf Gommers wrote: > > > On Fri, Dec 29, 2023 at 12:00 AM Dr. Thomas Orgis < > thomas.or...@uni-hamburg.de> wrote: > >> On Thu, 28 Dec 2023 20:51:27 +0100 >> Ralf Gommers wrote: >> >> > > libblas.so >> > > liblapack.so (NEEDing libblas.so) >> > > libcblas.so (NEEDing libblas.so) >> > > liblapacke.so (NEEDing liblapack.so, hence libblas.so) >> > > >> > > and their respective .pc files. This is the natural order that occurs >> to >> > > me when building from netlib upstream. >> > >> > >> > This should work fine. It's auto-detected in NumPy already, and will be >> in >> > SciPy in the future. For now, using `-Dblas=blas -Dlapack=lapack` in the >> > SciPy build should work. >> >> I noticed that with -Dblas=blas, which is in pkgsrc now. The detection >> code sets cblas and finds libcblas by dark magic / defaults that happen >> to match. But what if my setup uses -Dblas=netlib_blas? Then the >> internal guesswork would fail. >> > > If the library name is libcblas.so it will still be found. If it's also a > nonstandard name, then yes it's going to fail. I'd say though that (a) this > isn't a real-world situation as far as we know, (b) just don't do this as a > packager, and (c) if you really must, you can still make it work by > providing a custom `cblas.pc` (see > http://scipy.github.io/devdocs/building/blas_lapack.html#using-pkg-config-to-detect-libraries-in-a-nonstandard-location > ). > > Please consider a mode where the user specifies separate names for all >> 4 components. For package builds, we do not want any guess work, >> including assuming that libblas.so is accompanied by libcblas.so with >> that exact name. >> >> So I'd like >> >> -Dblas=$BLAS_PACKAGE -Dcblas=$CBLAS_PACKAGE \ >> -Dlapack=$LAPACK_PACKAGE -Dlapacke=$LAPACKE_PACKAGE >> >> where the values may all be the same or not.
If I fail to provide one >> of those, feel free to guess for the rest (for example, assuming/trying >> that all of those are openblas if I say -Dblas=openblas). > > > We don't use LAPACKE, so that one can be ignored. For CBLAS, I'd honestly > rather get a bug report than add new CLI flags for a situation that seems > to be purely hypothetical. Things work on all known distributions I > believe, and this design isn't new: it is the same one that numpy.distutils > uses. We can consider a new `-Dcblas` flag at any point; there is nothing > in the design preventing us from adding it later. But I'd rather only do so > if there's a real need. > > >> I also realized that including LAPACK in OpenBLAS is needed, but any >> new BLAS code could start out just replacing the netlib piece by piece. >> The partitioning is there and it is probably good for managing the >> complexity, limiting scope of the individual libraries. >> >> > > Telling the meson build that BLAS is libcblas works as long as >> actually >> > > CBLAS symbols are used. >> > >> > >> > Please never do this. The library is BLAS, so you should use >> `-Dblas=blas` >> > for NumPy. It will find `cblas` just fine that way. >> >> Oh. As I wrote before, we now have >> >> -Csetup-args=-Dblas=${CBLAS_PC} >> -Csetup-args=-Dlapack=${LAPACK_PC} >> >> for math/py-numpy. That's CBLAS_PC, not BLAS_PC. And this works. >> > > I assume that it also passes if you pass in BLAS_PC? > > >> >> > This is probably a bug in SciPy. >> >> Well, apparently it's just a miscommunication between us two. Scipy is >> fine with >> > > Phew:) I also just confirmed by writing a new SciPy CI job for the split > Netlib BLAS situation, based on how openSUSE packages it. And that passes. > > >> >> -Csetup-args=-Dblas=${BLAS_PC} >> -Csetup-args=-Dlapack=${LAPACK_PC} >> >> locating libcblas by inferring it from libblas, and finding cblas in >> openblas_foobar, apparently.
It prints those lines: >> >> Run-time dependency blas found: YES 3.11.0 >> Run-time dependency cblas found: YES 3.11.0 >> Run-time dependency lapack found: YES 3.11.0 >> blas: blas >> lapack : lapack >> >> While the numpy build does this: >> >> Run-time dependency cblas found: YES 3.11.0 >> Message: BLAS symbol suffix: >> Run-time dependency lapack found: YES 3.11.0 >> blas: cblas >> lapack : lapack >> >> This looks similar to the case of openblas_openmp for -Dblas and -Dlapack: >> >> Run-time dependency openblas_openmp found: YES 0.3.24 >> Message: BLAS symbol suffix: >> Run-time dependency openblas_openmp found: YES 0.3.24 >> blas: openblas_openmp >> lapack : openblas_openmp >> >> So scipy locates cblas based on the name blas, but doesn't really use >> cblas. > > > It does in a few places, like SuperLU. > > >> Numpy is happy with libcblas bringing libblas in and calls it >> blas, but really uses the cblas interface. This looks a bit confusing. >> > > I may be able to add something to the docs, but there should be no > confusion. We need "BLAS with CBLAS symbols". CBLAS should sim
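Ralf's requirement, a "BLAS with CBLAS symbols", can be checked mechanically. Rather than relying on a library's file name, look for *defined* `cblas_` entries in its symbol table. Below is a minimal C++ sketch. It assumes input in the column layout printed by `readelf -s`, with the symbol name in the last column and `UND` in the Ndx column for imported symbols. `DefinedCblasSymbols`, `MissingSymbols`, and any list of required symbols are hypothetical. This is not what NumPy's build actually checks.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Collect the defined cblas_* symbols from `readelf -s` style output.
// Assumes the symbol name is the last whitespace-separated field and that
// undefined (imported) symbols carry "UND" in the Ndx column.
std::set<std::string> DefinedCblasSymbols(const std::string& readelf_output) {
    std::set<std::string> names;
    std::istringstream in(readelf_output);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::vector<std::string> tokens;
        for (std::string tok; fields >> tok;) tokens.push_back(tok);
        if (tokens.empty()) continue;
        bool undefined = false;
        for (const std::string& t : tokens) {
            if (t == "UND") undefined = true;
        }
        if (undefined) continue;  // imported, not provided by this library
        std::string name = tokens.back();
        name = name.substr(0, name.find('@'));  // drop a symbol version, if any
        if (name.rfind("cblas_", 0) == 0) names.insert(name);
    }
    return names;
}

// Return the required symbols that the library does not define.
std::vector<std::string> MissingSymbols(const std::set<std::string>& defined,
                                        const std::vector<std::string>& required) {
    std::vector<std::string> missing;
    for (const std::string& sym : required) {
        if (defined.count(sym) == 0) missing.push_back(sym);
    }
    return missing;
}
```

Skipping `UND` entries matters because a library that merely imports `cblas_` functions from another library would otherwise be counted as providing them. `readelf -s` can also list a symbol in both `.dynsym` and `.symtab`; the `std::set` removes such duplicates.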
[Numpy-discussion] Re: incomplete BLAS/CBLAS linking (Telling meson build which CBLAS/LAPACK (LAPACKE?) to use via pkgconfig module)
On Sat, Dec 30, 2023 at 1:57 PM Dr. Thomas Orgis < thomas.or...@uni-hamburg.de> wrote: > > On Fri, 29 Dec 2023 11:34:04 +0100 > Ralf Gommers wrote: > > > If the library name is libcblas.so it will still be found. If it's also a > > nonstandard name, then yes it's going to fail. I'd say though that (a) > this > > isn't a real-world situation as far as we know, > > It can be more funny. I just noticed on an Ubuntu system (following > Debian for sure, here) that there are both > > /usr/lib/x86_64-linux-gnu/libblas.so.3 > /usr/lib/x86_64-linux-gnu/libcblas.so.3 > > but those belong to different packages. The first contains BLAS and > CBLAS API and is installed from netlib code. > > $ readelf -d -s /usr/lib/x86_64-linux-gnu/libblas.so.3 | grep cblas_ | wc > -l > 184 > > The second is installed alongside ATLAS. > > $ readelf -d -s /usr/lib/x86_64-linux-gnu/libcblas.so.3 | grep cblas_ | > wc -l > 154 > > The symbol lists differ: each contains functions that the other lacks. > > $ ldd /usr/lib/x86_64-linux-gnu/libcblas.so.3 > linux-vdso.so.1 (0x7ffcb572) > libatlas.so.3 => /lib/x86_64-linux-gnu/libatlas.so.3 > (0x7fd9b27ee000) > libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x7fd9b25c6000) > libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x7fd9b24df000) > /lib64/ld-linux-x86-64.so.2 (0x7fd9b2bae000) > > I _guess_ this situation would be mostly fine since libblas has enough > of the CBLAS symbols to prevent location of the wrong libcblas next to > it by the meson search. > > Quick followup regarding netlib splits. Debian only recently folded > libcblas into libblas, as > > https://lists.debian.org/debian-devel/2019/10/msg00273.html > > notes. Not that long ago … esp. considering stable debian. Not sure > when this appeared.
And of course numpy is the point where things were > broken: > > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=913567 > > I'm now looking into how Debian actually produces a combined BLAS+CBLAS > from netlib, as we're using the CMake build system and I do not see an > option to do that. The upstream build produces separate libraries, so I > assumed that is a case that one should handle. Yes, Debian made quite a mess there. We do have a CI job in NumPy for Netlib on Debian, though, and indeed it works fine because the CBLAS symbols are already found inside libblas.so. > But it is a demonstration that any guess that libcblas belongs to > libblas just from the name may be wrong in real-world installations. > Letting this sink in some more, I realized the more fundamental reason for treating them together: when we express dependencies, we do so for a *package* (i.e., a packaged version of some project), not for a specific build output like a shared library or a header file. In this case it's a little obscured by BLAS being an interface and the libblas/libcblas mix, but it's still the case that we're looking for multiple installed things from a single package. So we want "MKL" or "Netlib BLAS", where MKL is not only a shared library (or set of them), but for example also the corresponding header file (mkl_cblas.h rather than cblas.h). The situation you are worrying about is basically that of an unknown package with a set of shared libraries and headers that have non-standard names. I'd say that that's then simply an unsupported package, until someone comes to report the situation and we can add support for it (or file a bug against that package and convince the authors not to make such a mess). I think this point is actually important, and I hope you can appreciate it as a packager: we need to depend on packages (things that have URLs to source repos, maintainers, etc.), not random library names. > > Here, it might be a strange installation remnant.
>
> $ dpkg -L libatlas3-base
> /.
> /usr
> /usr/lib
> /usr/lib/x86_64-linux-gnu
> /usr/lib/x86_64-linux-gnu/atlas
> /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
> /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3
> /usr/lib/x86_64-linux-gnu/libatlas.so.3.10.3
> /usr/lib/x86_64-linux-gnu/libcblas.so.3.10.3
> /usr/lib/x86_64-linux-gnu/libf77blas.so.3.10.3
> /usr/lib/x86_64-linux-gnu/liblapack_atlas.so.3.10.3
> /usr/share
> /usr/share/doc
> /usr/share/doc/libatlas3-base
> /usr/share/doc/libatlas3-base/README.Debian
> /usr/share/doc/libatlas3-base/changelog.Debian.gz
> /usr/share/doc/libatlas3-base/copyright
> /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3
> /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3
> /usr/lib/x86_64-linux-gnu/libatlas.so.3
> /usr/lib/x86_64-linux-gnu/libcblas.so.3
> /usr/lib/x86_64-linux-gnu/libf77blas.so.3
> /usr/lib/x86_64-linux-gnu/liblapack_atlas.so.3
>
> An eclectic list of redundant libraries. But as it is, this is a case where
>
> 1. libatlas has BLAS, libcblas has corresponding CBLAS (only
>    referencing ATLAS-specific ABI).
>
> 2. libcblas does _not_ work together with libblas (uses ATL_ symbols).
> I'll n