On Mon, Nov 26, 2018 at 4:27 PM John Hearns via Beowulf <beowulf@beowulf.org> wrote:
> I have come across this question in a few locations. Being specific, I am
> a fan of the Julia language. On the Julia forum a respected developer
> recently asked what the options were for keeping code developed on a laptop
> in sync with code being deployed on an HPC system.

In keeping with the rest of the buzzwords, where does CI/CD fit between "code developed" and "code being deployed"? Once you have a mechanism for this, can't it be used for the final deployment? Or could CD even take care of that final deployment automatically?

> There was some discussion of having Git style repositories which can be
> synced to/from.

Yes, that would work fine. Why would git not be compatible with an HPC setup? And why restrict yourself to git and not talk about distributed version control systems in general?

> My suggestion was an ssh mount of the home directory on the HPC system,
> which I have configured effectively in the past when using remote HPC
> systems.

I don't quite parse the first part of the phrase - care to reformulate/elaborate?

> Again their workflow is to develop on the laptop and upload code to Github
> type repositories. Then when running on a cloud service the software is
> downloaded from the Repo.

The way I read it, this is very much restricted to code that can be run immediately after download, i.e. code written in a scripting language. That might fit your HPC universe, but the parallel one I live in still mostly runs code built and maybe even optimized on the HPC system it runs on. This includes software delivered in binary form from ISVs, open source code (e.g. GROMACS), or code developed in-house - they all have in common using an internode (e.g. MPI) or intranode (OpenMP, CUDA) communication and/or control library directly, not through a deep stack.

> There are of course HPC services on the cloud, with gateways to access
> them.
>
> This leads me to ask - should we be presenting HPC services as a 'cloud'
> service, no matter that it is a non-virtualised on-premise setup?

What's in a name? It's called cloud computing today, but it was called grid computing 10-15 years ago... For many years, before the cloud craze began, scientists might have had access to some HPC resources in their own institution, in other institutions in the same city, country, continent or even across continents. How is this different from having access to an on-premise install of e.g. OpenStack, or to a cloud computing offer somewhere else that also uses OpenStack? The only advantage in some cases is that the on-premise stuff might be better integrated with the "home" setup (i.e. common file systems, common user management, or - why not? - better documentation :)), which improves the user experience, but the functionality is very similar or the same.

To come back to your initial topic - a git repo can just as well be synced to a login node of a cluster (wherever that is located) or to a VM in the AWS cloud (wherever that is located).

> I think out loud that many HPC codes depend crucially on a $HOME directory
> being present on the compute nodes as the codes look for dot files etc. in
> $HOME. I guess this can be dealt with by fake $HOMES which again sync back
> to the Repo.

I don't follow you here... $HOME, dot files, repo, syncing back? And why "Repo" with a capital letter, is it supposed to be a name or something special?

In my HPC universe, people need not only code, but also data - usually LOTS of data. Replicating the code (for scripting languages) or the binaries (for compiled stuff) would be trivial; replicating the data would not. Also, pulling the data in or pushing it out (e.g. to/from AWS) on the fly whenever the instance is brought up would be slow and costly.
And by the way, this is in no way a new idea - queueing systems have long had the concept of "pre" and "post" job stages, which could be used to pull in code and/or data to the node(s) on which the job would be running and clean up afterwards.

Cheers,
Bogdan
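
For illustration only, the "pre"/"post" stage idea mentioned above can be sketched in a few lines of Python. This is not how any particular queueing system implements it; the names run_staged and git_stage_in, the "src" subdirectory and the scratch_root parameter are all made up for this example. The only external tool assumed is a git client on the node.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_staged(stage_in, job, scratch_root=None):
    """Hypothetical helper: "pre" stage, job, then "post" stage cleanup.

    stage_in(workdir) pulls code and/or data into a fresh scratch directory,
    job(workdir) does the actual work, and the directory is removed
    afterwards even if the job fails.
    """
    workdir = Path(tempfile.mkdtemp(prefix="job-", dir=scratch_root))
    try:
        stage_in(workdir)      # "pre" stage: fetch code and/or data
        return job(workdir)    # the job itself
    finally:
        # "post" stage: clean up the node-local scratch space
        shutil.rmtree(workdir, ignore_errors=True)


def git_stage_in(repo_url, ref="HEAD"):
    """Return a stage-in function that clones repo_url into workdir/src.

    Assumes git is installed on the node and repo_url is reachable from it.
    """
    def _stage(workdir):
        src = workdir / "src"
        subprocess.run(["git", "clone", "--quiet", repo_url, str(src)],
                       check=True)
        subprocess.run(["git", "-C", str(src), "checkout", "--quiet", ref],
                       check=True)
    return _stage
```

Something like run_staged(git_stage_in(url), job) would then cover the code side. As said above, staging large data sets the same way on every run is where this gets slow and costly.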