I have done that for several clients.

1. Staging data is a pain. The simplest approach was to make it part of
   the job script, or to have the job depend on a separate staging job.
   Where bandwidth is an issue, we have implemented bbcp.
2. Depending on size and connectivity, you can use hosts files or
   create a subdomain for the cluster nodes. I prefer the latter. Just
   use static IPs for your cloud nodes. You do, of course, need to
   ensure connectivity between the networks.
3. SchedMD has the info on cloud nodes:
   https://slurm.schedmd.com/elastic_computing.html
4. Try to isolate everything you use so it isn't overly dependent on
   another group's services (e.g., DNS, authentication) unless you are
   kept aware of any changes they make so you aren't surprised. Also,
   avoid network mounts on the cloud nodes; performance takes a big hit
   when that traffic goes over a direct connect or VPN.
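The dependent staging job mentioned in point 1 might look something like
this (the script names are made up for illustration; --parsable and
--dependency=afterok are standard sbatch options):

```shell
# Submit the staging job first and capture its job ID.
# stage_data.sh is a placeholder script that copies/bbcp's the input data.
stage_id=$(sbatch --parsable stage_data.sh)

# The compute job starts only after the staging job completes successfully.
sbatch --dependency=afterok:"${stage_id}" run_analysis.sh
```

If staging fails, the compute job stays pending and can be cancelled
rather than burning cloud time on nodes with no data.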
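For point 3, the SchedMD page boils down to Slurm's power-save
mechanism. A minimal sketch of the relevant slurm.conf pieces follows;
the script paths, node names, and hardware specs are placeholders, not
values from this thread:

```
# slurm.conf -- minimal cloud/elastic sketch (paths and node specs are placeholders)
ResumeProgram=/opt/slurm/bin/resume.sh     # script that provisions a cloud node
SuspendProgram=/opt/slurm/bin/suspend.sh   # script that tears it down
ResumeTimeout=600       # seconds to wait for a node to boot and register
SuspendTime=300         # idle seconds before a node is suspended

# Nodes marked State=CLOUD are hidden until powered up on demand
NodeName=cloud[001-010] CPUs=8 RealMemory=30000 State=CLOUD
PartitionName=cloud Nodes=cloud[001-010] MaxTime=INFINITE State=UP
```

The resume/suspend scripts are where your cloud provider's API calls
go; Slurm only invokes them and waits for slurmd to check in.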

Brian Andrus


On 12/15/2020 12:02 PM, Sajesh Singh wrote:

We are currently investigating the use of the cloud scheduling features within an on-site Slurm installation and were wondering if anyone had any experiences with this feature that they would be willing to share. In particular I am interested to know:

https://slurm.schedmd.com/elastic_computing.html

1) Recommendations for staging the data needed by the nodes in the cloud

2) How did you handle name resolution

3) Any resources/documentation in particular that proved helpful while setting up the environment

4) Any bits of advice or horror stories that may be helpful in avoiding pitfalls.

Regards,

-SS-
