On Wed, 2022-10-26 at 06:30 +0000, Torsten Keßler wrote:
> Hi! I'm Torsten Keßler (tpkessler in AUR and on GitHub) from Saarland, a
> federal state in the south west of Germany. With this email
> I'm applying to become a trusted user.
> After graduating with a PhD in applied mathematics this year I'm now
> a post-doc with a focus on numerical analysis, the art of solving physical
> problems with mathematically sound algorithms on a computer.
> I've been using Arch Linux on my private machines (and at work) since my
> first weeks at university ten years ago. After initial distro hopping a
> friend recommended Arch. I immediately liked the way it handles packages
> via pacman, its wiki and the flexibility of its installation process.
>
> Owing to their massively parallel architecture, GPUs have emerged as the
> leading platform for computationally expensive problems: Machine
> Learning/AI, real-world engineering problems, simulation of complex
> physical systems. For a long time, nVidia's CUDA framework (closed
> source, exclusively for their GPUs) has dominated this field. In 2015,
> AMD announced ROCm, their open source compute framework for GPUs. A
> common interface to CUDA, called HIP, makes it possible to write code
> that compiles and runs both on AMD and nVidia hardware. I've been
> closely following the development of ROCm on GitHub, trying to compile
> the stack from time to time. But only since 2020, the kernel includes
> all the necessary code to compile the ROCm stack on Arch Linux. Around
> this time I've started to contribute to rocm-arch on GitHub, a
> collection of PKGBUILDs for ROCm (with around 50 packages). Soon after
> that, I became the main contributor to the repository and, since 2021,
> I've been the maintainer of the whole ROCm stack.
>
> We have an active issue tracker and recently started a discussion page
> for rocm-arch. Most of the open issues as of now are for bookkeeping of
> patches we applied to run ROCm on Arch Linux.
> Many of them are linked to
> an upstream issue and a corresponding pull request that fixes the
> issues. This way I've already contributed code to a couple of libraries
> of the ROCm stack.
>
> Over the years, many libraries have added official support for ROCm,
> including tensorflow, pytorch, python-cupy, python-numba (not actively
> maintained anymore) and blender. Support of ROCm for the latter
> generated large interest in the community and is one reason Sven
> contacted me, asking me if I would be interested to take care of ROCm in
> [community]. In its current version, ROCm support for blender works out
> of the box. Just install hip-runtime-amd from the AUR and enable the HIP
> backend in blender's settings for rendering. The machine learning
> libraries require more dependencies from the AUR. Once installed,
> pytorch and tensorflow are known to work on Vega GPUs and the recent
> RDNA architecture.
>
> My first action as a TU would be to add basic support of ROCm to
> [community], i.e. the low level libraries, including HIP and an open
> source runtime for OpenCL based on ROCm. That would be enough to run
> blender with its ROCm backend. At the same time, I would expand the wiki
> article on ROCm. The interaction with the community would also move from
> the issue tracker of rocm-arch to the Arch Linux bug tracker and the
> forums. In a second phase I would add the high level libraries that
> would enable users to quickly compile and run complex libraries such as
> tensorflow, pytorch or cupy.
Huge +1 from me here. It would be awesome to bring ROCm to the official
repos. I have not done it myself because I am currently split between
tons of projects, which makes it hard to find the time for the initial
work and then commit to maintaining the stack, so I am very excited to
have someone take this item off my endless TODO list!

> #BEGIN Technical details
>
> The minimal package list for HIP which includes the runtime libraries
> for basic GPU programming and the GPU compiler (hipcc) comprises eight
> packages
>
> * rocm-cmake (basic cmake files for ROCm)
> * rocm-llvm (upstream llvm with to-be-merged changes by AMD)
> * rocm-device-libs (implements math functions for all GPU architectures)
> * comgr (runtime library, "compiler support" for rocm-llvm)
> * hsakmt-roct (interface to the amdgpu kernel driver)
> * hsa-rocr (runtime for HSA compute kernels)
> * rocminfo (display information on HSA agents: GPU and possibly CPU)
> * hip-runtime-amd (runtime and compiler for HIP, a C++ dialect inspired
> by CUDA C++)
>
> All but rocm-llvm are small libraries under the permissive MIT license.
> Since ROCm 5.2, all packages successfully build in a clean chroot and
> are distributed in the community repo arch4edu.
>
> The application libraries for numerical linear algebra, sparse matrices
> or random numbers start with roc and hip (rocblas, rocsparse, rocrand).
> The hip* packages are designed in such a way that they would also work
> with CUDA if hip is configured with CUDA instead of a ROCm/HSA backend.
> With few exceptions (rocthrust, rccl) these packages are licensed under MIT.
>
> Possible issues:
> There are three packages that are not fully working under Arch Linux or
> lack an open source license. The first is rocm-gdb, a fork of gdb with
> GPU support. To work properly it needs a kernel module currently not
> available in upstream linux but only as part of AMD's dkms modules. But
> they only work with specific kernel versions.
> Support for this from my
> side on Arch Linux was dropped a while ago. One closed source package is
> hsa-amd-aqlprofile. As the name suggests it is used for profiling as
> part of rocprofiler. Above mentioned packages are only required for
> debugging and profiling but are no runtime dependencies of the big
> machine learning libraries or any other package with ROCm support I'm
> aware of. The third package is rocm-core, a package only part of the
> meta packages for ROCm with no influence on the ROCm runtime. It
> provides a single header and a library with a single function that
> returns the current ROCm version. No source code has been published by
> AMD so far and the official package lacks a license file.
>
> A second issue is GPU support. AMD officially only supports the
> professional compute GPUs. This does not mean that ROCm is not working
> on consumer cards but merely that AMD cannot guarantee all
> functionalities through extensive testing. Recently, ROCm added support
> for Navi 21 (RX 6800 onwards), see
>
> https://docs.amd.com/bundle/Hardware_and_Software_Reference_Guide/page/Hardware_and_Software_Support.html
>
> I own a Vega 56 (gfx900) that is officially supported, so I can test all
> packages before publishing them on the AUR / in [community].

I own an RX 5700 XT (gfx1010), in case specific testing is required.

> #END Technical details
>
> In the long term, I would like to foster Arch Linux as the leading
> platform for scientific computing. This includes Machine Learning
> libraries in the official repositories as well as packages for classical
> "number crunching" such as petsc, trilinos and packages that depend on
> them: deal-ii, dune or ngsolve.

+1 on this too. My day job is supporting the Python scientific
computing / data science ecosystem, with a focus on packaging, so I am
looking forward to this, and to helping out where I can.

> The sponsors of my application are Sven (svenstaro) and Bruno (archange).
>
> I'm looking forward to the upcoming discussion and your feedback on
> my application.
>
> Best,
> Torsten

That said, I skimmed Torsten's PKGBUILDs, and the only thing I noticed
was the missing -DCMAKE_BUILD_TYPE=None argument in the CMake packages,
contrary to the recommendations in [1], which I wouldn't consider a big
deal anyway. So no roast from me, against Sven's expectations :P

Overall, I am very happy we have someone interested in working on ROCm
support in the official repos, and am looking forward to working with
Torsten. +1 on the candidate from me!

[1] https://wiki.archlinux.org/title/CMake_package_guidelines

Cheers,
Filipe Laíns