On Mon, 7 Nov 2022 19:05:16 +0000 Torsten Keßler <t.kess...@posteo.de> wrote:
> Hi Justin!
>
> > Some ROC repositories include documentation (cmake, device libs, hip),
> > maybe it would make sense to include those in `/usr/share/doc/${pkgname}`?
> That's a very good idea. For some packages AMD bundles the documentation
> with the package (rocm-dbgapi); sometimes it's shipped separately, see
> hip-doc [1].
>
> > The limited support of ROCm has been one of the main things locking me
> > into Nvidia for my workstations.
> Yes, that's really the main drawback of ROCm. CUDA works on almost any
> Nvidia GPU (even on mobile variants). I hope AMD will change their policy
> with Navi 30+.
>
> > Have you tried contacting AMD about `rocm-core`?
> Others already did. AMD support promised to release the source code in
> March [2].
>
> > Finding information about ROCm support in consumer cards really isn't
> > easy – but I guess with CUDA I just expect it to work with recent Nvidia
> > cards?
> Do you mean the common HIP abstraction layer (like hipfft, hipblas, ...)?
> Yes, that should work with any recent CUDA version. But I haven't tried
> this as I don't have access to an Nvidia GPU. Furthermore, this feature
> (HIP with CUDA) has never been requested by the community at rocm-arch.
> I think Nvidia users just stick with CUDA and don't need HIP.

I mean that with ROCm I'm not sure whether a GPU I'm going to buy will
support it.

> > Maybe it would be a good idea to provide testing scripts / documents for
> > them, so they can report back once you push things into testing?
> Absolutely! There are HIP examples [3] from AMD which check basic HIP
> language features. Additionally, we have `rocm-validation-suite` which
> offers several tests.
>
> > Having a list of tested cards in the wiki would be great as well.
> I agree! Once we have an established test suite, this should be
> straightforward.
>
> Best!
> Torsten
>
> [1] http://repo.radeon.com/rocm/apt/5.3/pool/main/h/hip-doc/
> [2] https://github.com/RadeonOpenCompute/ROCm/issues/1705#issuecomment-1081599282
> [3] https://github.com/ROCm-Developer-Tools/HIP-Examples
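To make the testing scripts idea a bit more concrete: I was picturing
something as small as the program below as a first smoke test that TUs
without ROCm experience could compile and run. This is only a sketch on my
side, not a proposal for the actual test suite; it assumes nothing beyond
hipcc from hip-runtime-amd and the standard HIP runtime API (the file name
is made up).

    // smoke_test.cpp -- build with: hipcc smoke_test.cpp -o smoke_test
    #include <hip/hip_runtime.h>
    #include <cstdio>
    #include <vector>

    // Trivial kernel: c[i] = a[i] + b[i]
    __global__ void vec_add(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    // Abort on the first HIP runtime error and report where it happened.
    #define HIP_CHECK(expr) do { hipError_t e_ = (expr); \
        if (e_ != hipSuccess) { \
            std::printf("HIP error: %s (line %d)\n", hipGetErrorString(e_), __LINE__); \
            return 1; } } while (0)

    int main() {
        const int n = 1 << 20;
        const size_t bytes = n * sizeof(float);
        std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

        float *da = nullptr, *db = nullptr, *dc = nullptr;
        HIP_CHECK(hipMalloc((void**)&da, bytes));
        HIP_CHECK(hipMalloc((void**)&db, bytes));
        HIP_CHECK(hipMalloc((void**)&dc, bytes));
        HIP_CHECK(hipMemcpy(da, a.data(), bytes, hipMemcpyHostToDevice));
        HIP_CHECK(hipMemcpy(db, b.data(), bytes, hipMemcpyHostToDevice));

        const int threads = 256;
        const int blocks  = (n + threads - 1) / threads;
        hipLaunchKernelGGL(vec_add, dim3(blocks), dim3(threads), 0, 0,
                           da, db, dc, n);
        HIP_CHECK(hipGetLastError());
        HIP_CHECK(hipMemcpy(c.data(), dc, bytes, hipMemcpyDeviceToHost));

        // Every element should be 1.0 + 2.0 = 3.0 if the kernel really ran.
        int wrong = 0;
        for (int i = 0; i < n; ++i)
            if (c[i] != 3.0f) ++wrong;
        std::printf(wrong == 0 ? "PASS\n" : "FAIL\n");

        HIP_CHECK(hipFree(da));
        HIP_CHECK(hipFree(db));
        HIP_CHECK(hipFree(dc));
        return wrong == 0 ? 0 : 1;
    }

If that prints PASS on a given card, trying the blender HIP backend next is
probably a reasonable second step.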
> On 06.11.22 at 23:10, aur-general-requ...@lists.archlinux.org wrote:
>
> > Date: Sun, 6 Nov 2022 20:01:14 +0100
> > From: Justin Kromlinger <hashwo...@archlinux.org>
> > Subject: Re: TU Application - tpkessler
> >
> > Hi Torsten!
> >
> > On Wed, 26 Oct 2022 06:30:33 +0000
> > Torsten Keßler <t.kess...@posteo.de> wrote:
> >
> >> Hi! I'm Torsten Keßler (tpkessler in AUR and on GitHub) from Saarland,
> >> a federal state in the south west of Germany. With this email I'm
> >> applying to become a trusted user.
> >> After graduating with a PhD in applied mathematics this year, I'm now a
> >> post-doc with a focus on numerical analysis, the art of solving physical
> >> problems with mathematically sound algorithms on a computer.
> >> I've been using Arch Linux on my private machines (and at work) since my
> >> first weeks at university ten years ago. After initial distro hopping a
> >> friend recommended Arch. I immediately liked the way it handles packages
> >> via pacman, its wiki and the flexibility of its installation process.
> >
> > Soon we can switch the Arch Linux IRC main language to German!
> >
> >> Owing to their massively parallel architecture, GPUs have emerged as the
> >> leading platform for computationally expensive problems: Machine
> >> Learning/AI, real-world engineering problems, simulation of complex
> >> physical systems. For a long time, Nvidia's CUDA framework (closed
> >> source, exclusive to their GPUs) has dominated this field. In 2015, AMD
> >> announced ROCm, their open source compute framework for GPUs. A common
> >> interface to CUDA, called HIP, makes it possible to write code that
> >> compiles and runs both on AMD and Nvidia hardware. I've been closely
> >> following the development of ROCm on GitHub, trying to compile the stack
> >> from time to time. But only since 2020 has the kernel included all the
> >> necessary code to compile the ROCm stack on Arch Linux. Around that time
> >> I started to contribute to rocm-arch on GitHub, a collection of
> >> PKGBUILDs for ROCm (with around 50 packages). Soon after that, I became
> >> the main contributor to the repository and, since 2021, I've been the
> >> maintainer of the whole ROCm stack.
> >> We have an active issue tracker and recently started a discussion page
> >> for rocm-arch. Most of the open issues as of now are for bookkeeping of
> >> patches we applied to run ROCm on Arch Linux. Many of them are linked to
> >> an upstream issue and a corresponding pull request that fixes the issue.
> >> This way I've already contributed code to a couple of libraries of the
> >> ROCm stack.
> >>
> >> Over the years, many libraries have added official support for ROCm,
> >> including tensorflow, pytorch, python-cupy, python-numba (not actively
> >> maintained anymore) and blender. ROCm support for the latter generated
> >> great interest in the community and is one reason Sven contacted me,
> >> asking whether I would be interested in taking care of ROCm in
> >> [community]. In its current version, ROCm support for blender works out
> >> of the box: just install hip-runtime-amd from the AUR and enable the HIP
> >> backend in blender's settings for rendering. The machine learning
> >> libraries require more dependencies from the AUR. Once installed,
> >> pytorch and tensorflow are known to work on Vega GPUs and the recent
> >> RDNA architecture.
> >>
> >> My first action as a TU would be to add basic support for ROCm to
> >> [community], i.e. the low level libraries, including HIP and an open
> >> source runtime for OpenCL based on ROCm. That would be enough to run
> >> blender with its ROCm backend. At the same time, I would expand the wiki
> >> article on ROCm. The interaction with the community would also move from
> >> the issue tracker of rocm-arch to the Arch Linux bug tracker and the
> >> forums.
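Coming back to the tested-cards list: for the wiki (and for bug reports once
things move to the bug tracker), a tiny report tool would keep the entries
uniform. Again just a sketch, assuming the usual hipGetDeviceCount /
hipGetDeviceProperties calls; as far as I know gcnArchName is the property
field that carries the gfx target.

    // report_gpus.cpp -- build with: hipcc report_gpus.cpp -o report_gpus
    // Prints one line per visible GPU: index, marketing name, gfx target, VRAM.
    #include <hip/hip_runtime.h>
    #include <cstdio>

    int main() {
        int count = 0;
        if (hipGetDeviceCount(&count) != hipSuccess || count == 0) {
            std::printf("no HIP devices found\n");
            return 1;
        }
        for (int i = 0; i < count; ++i) {
            hipDeviceProp_t prop;
            if (hipGetDeviceProperties(&prop, i) != hipSuccess)
                continue;
            std::printf("device %d: %s (%s), %zu MiB VRAM\n",
                        i, prop.name, prop.gcnArchName,
                        prop.totalGlobalMem / (1024 * 1024));
        }
        return 0;
    }

One line of that output plus the list of packages that were tested would
already make a useful wiki entry.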
> >> In a second phase I would add the high level libraries that would
> >> enable users to quickly compile and run complex libraries such as
> >> tensorflow, pytorch or cupy.
> >
> > The limited support of ROCm has been one of the main things locking me
> > into Nvidia for my workstations. Having stuff in community would
> > certainly help with that!
> >
> >> #BEGIN Technical details
> >>
> >> The minimal package list for HIP, which includes the runtime libraries
> >> for basic GPU programming and the GPU compiler (hipcc), comprises eight
> >> packages:
> >>
> >> * rocm-cmake (basic cmake files for ROCm)
> >> * rocm-llvm (upstream llvm with to-be-merged changes by AMD)
> >> * rocm-device-libs (implements math functions for all GPU architectures)
> >> * comgr (runtime library, "compiler support" for rocm-llvm)
> >> * hsakmt-roct (interface to the amdgpu kernel driver)
> >> * hsa-rocr (runtime for HSA compute kernels)
> >> * rocminfo (displays information on HSA agents: GPU and possibly CPU)
> >> * hip-runtime-amd (runtime and compiler for HIP, a C++ dialect inspired
> >>   by CUDA C++)
> >
> > PKGBUILDs look good to me. Some ROC repositories include documentation
> > (cmake, device libs, hip), maybe it would make sense to include those in
> > `/usr/share/doc/${pkgname}`?
> >
> >> All but rocm-llvm are small libraries under the permissive MIT license.
> >> Since ROCm 5.2, all packages successfully build in a clean chroot and
> >> are distributed in the community repo arch4edu.
> >>
> >> The application libraries for numerical linear algebra, sparse matrices
> >> or random numbers start with roc and hip (rocblas, rocsparse, rocrand).
> >> The hip* packages are designed in such a way that they would also work
> >> with CUDA if hip is configured with a CUDA instead of a ROCm/HSA
> >> backend. With few exceptions (rocthrust, rccl) these packages are
> >> licensed under MIT.
> >>
> >> Possible issues:
> >> There are three packages that are not fully working under Arch Linux or
> >> lack an open source license. The first is rocm-gdb, a fork of gdb with
> >> GPU support. To work properly it needs a kernel module that is currently
> >> not available in upstream linux but only as part of AMD's dkms modules,
> >> and those only work with specific kernel versions. Support for this from
> >> my side on Arch Linux was dropped a while ago. One closed source package
> >> is hsa-amd-aqlprofile. As the name suggests, it is used for profiling as
> >> part of rocprofiler. The above-mentioned packages are only required for
> >> debugging and profiling; they are not runtime dependencies of the big
> >> machine learning libraries or of any other package with ROCm support I'm
> >> aware of. The third package is rocm-core, a package that is only part of
> >> the meta packages for ROCm and has no influence on the ROCm runtime. It
> >> provides a single header and a library with a single function that
> >> returns the current ROCm version. No source code has been published by
> >> AMD so far and the official package lacks a license file.
> >
> > Have you tried contacting AMD about `rocm-core`? It seems odd to keep
> > such a small thing closed source / without a license.
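As a small aside on rocm-core (and not as a substitute for whatever its
library exposes): the HIP runtime at least has its own version query, which
might cover the simple "which version is installed" cases until AMD
publishes the source. A sketch; note that the returned integers are the HIP
runtime/driver versions, not the ROCm release version, and their exact
encoding differs between the ROCm and CUDA backends, so I'd treat them as
opaque numbers.

    // hip_version.cpp -- build with: hipcc hip_version.cpp -o hip_version
    #include <hip/hip_runtime.h>
    #include <cstdio>

    int main() {
        int runtime = 0, driver = 0;
        // HIP runtime version; encoding is backend specific.
        if (hipRuntimeGetVersion(&runtime) == hipSuccess)
            std::printf("HIP runtime version: %d\n", runtime);
        // Version of the underlying driver as seen by HIP.
        if (hipDriverGetVersion(&driver) == hipSuccess)
            std::printf("HIP driver version:  %d\n", driver);
        return 0;
    }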
Recently, ROCm added support > >> for Navi 21 (RX 6800 onwards), see > >> > >> https://docs.amd.com/bundle/Hardware_and_Software_Reference_Guide/page/Hardware_and_Software_Support.html > >> > >> I own a Vega 56 (gfx900) that is officially supported, so I can test all > >> packages before publishing them on the AUR / in [community]. > > Finding information about ROCm support in consumer cards really isn't easy > > – but I guess with > > CUDA I just expect it to work with recent Nvidia cards? > > > > I would guess that we have a bunch of TUs with Radeon RX 5000/6000 (and > > soon 7000) series cards, > > but without the needed knowledge / use case for ROCm. Maybe it would be a > > good idea to provide > > testing scripts / documents for them, so they can report back once you push > > things into testing? > > > > Having a list of tested cards in the wiki would be great as well. > > > >> #END Technical details > >> > >> On the long term, I would like to foster Arch Linux as the leading > >> platform for scientific computing. This includes Machine Learning > >> libraries in the official repositories as well as packages for classical > >> "number crunching" such as petsc, trilinos and packages that depend on > >> them: deal-ii, dune or ngsolve. > >> > >> The sponsors of my application are Sven (svenstaro) and Bruno (archange). > >> > >> I'm looking forward to the upcoming the discussion and your feedback on > >> my application. > >> > >> Best, > >> Torsten > > Best Regards > > Justin > > > > > > > -- hashworks Web https://hashworks.net Public Key 0x4FE7F4FEAC8EBE67