On Fri, Dec 11, 2020 at 10:57 AM Douglas Eadline <deadl...@eadline.org> wrote:

> Second, and most importantly, CentOS will not matter to HPC
> (and maybe other sectors as well). Distributions will become
> second-class citizens to containers. All that is needed is a
> base OS to run the container (think Singularity).
As much as I agree that containers can, do, and will continue to solve
problems for a lot of things in user-space, they fall short for
anything related to kernel-space, precisely because they're just
another process. Not being VMs, they depend on the kernel of the host
they're running on, and on all of its driver stack. And HPC, being
focused on performance, is still very much about kernel-space: think
parallel file systems, interconnects, network drivers, GPU drivers,
etc., or even MPI implementations to some extent.

It only takes a quick look at the entry point of the containers that
some vendors provide to realize that making them work in multiple
environments is not that far from supporting applications on multiple
distributions in the first place: detecting which versions of which
drivers are present on the host, to select matching libraries inside
the container, is anything but straightforward or efficient (see the
rough sketch at the end of this message).

Anything that interacts with a GPU or an interconnect adapter will
have specific requirements about drivers and system-level settings
(like kernel parameters) where containers will be of little to no
help. If you want to make sure that your container works with any
version of the OFED stack, any GPU driver version, and any kernel
version, and still provides decent performance, you'll probably need a
massive amount of extra work to support the multitude of possible
host-side configurations.

> Years ago in the early days of Warewulf, Greg Kurtzer
> (Warewulf/Singularity) talked about the idea of bundling the
> essential/minimal OS and libraries with applications in a custom
> Warewulf VNFS image. The scheduler would then boot the application
> image -- everything works.

This is the same approach that many HPC sites have taken over the
years to decouple system-level software from user-level software as
much as possible: deploy compute nodes with a bare-minimal OS
installation (to cover the kernel, drivers, and low-level
hardware-related stacks), and provide the user-level software
(scientific applications) as modules, over NFS and the like,
independently of the OS distribution.

> An open source project will release a container that "contains"
> everything it needs to run (along with the container recipe).
> Using Singularity you can also sign the container to assure
> provenance of the code. The scheduler runs containers. Simple.

Provided it can access the hardware resources it needs, yes. :)

> The need to maintain library version trees and Modules goes away.
> Of course, if you are a developer writing your own application, you
> need specific libraries, but not system-wide. Build the application
> in your working directory, include any specific libraries you need
> in the local source tree, and fold it all into a container.

That's also a domain where containers only go part of the way:
containerized applications are fine as long as they're autonomous and
happy being alone in their own world. But as soon as they need to
interact with another application (hello, DL frameworks), things get
more complicated: what version of that container do you need? Will it
work with your own container? Will you end up building yet another
container that bundles both applications? What if there's a 3rd
element in the workflow? Suddenly, it's back to selecting application
versions and making them work together. A little bit like a...
distribution? :)

> Bottom line, it is all good, we are moving on.

Things are changing for sure. But the devil is in the details.
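For the curious, here is roughly the kind of gymnastics those vendor
entry points end up doing. This is a minimal sketch in Python rather
than the usual shell, and the library layout it assumes (/opt/libs,
the nvidia-4xx directory names) is entirely made up for illustration;
only /proc/driver/nvidia/version is a real host-side source of truth:

    #!/usr/bin/env python3
    # Sketch of a container entry point matching host drivers to the
    # user-space libraries bundled in the image. The layout under
    # /opt/libs is hypothetical.
    import os
    import platform
    import re

    def host_nvidia_driver():
        """Return the host's NVIDIA driver version, or None."""
        try:
            with open("/proc/driver/nvidia/version") as f:
                m = re.search(r"Kernel Module\s+(\d+\.\d+)", f.read())
                return m.group(1) if m else None
        except OSError:
            return None

    def pick_userspace_libs(driver, lib_root="/opt/libs"):
        """Pick the bundled stack matching the host driver branch,
        assuming one directory per supported branch, e.g.
        /opt/libs/nvidia-450, /opt/libs/nvidia-465 (hypothetical)."""
        if driver is None:
            return os.path.join(lib_root, "cpu-only")
        branch = driver.split(".")[0]
        candidate = os.path.join(lib_root, "nvidia-" + branch)
        if os.path.isdir(candidate):
            return candidate
        raise RuntimeError(
            "host driver %s (kernel %s) not supported by any stack "
            "bundled in this container" % (driver, platform.release()))

    if __name__ == "__main__":
        libs = pick_userspace_libs(host_nvidia_driver())
        # Typically prepended to LD_LIBRARY_PATH before exec'ing the
        # actual application.
        print("export LD_LIBRARY_PATH=%s:$LD_LIBRARY_PATH" % libs)

And that's just one GPU driver: multiply by OFED releases, kernel ABIs
and MPI flavors, and "runs anywhere" starts to sound optimistic.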
Cheers,
--
Kilian