Brian,
  Thank you for the info. I will definitely keep your recommendations handy while 
putting this together.

-SS-


From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Brian 
Andrus
Sent: Tuesday, December 15, 2020 3:14 PM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] Burst to AWS cloud


I have done that for several clients.

  1.  Staging data is a pain. The simplest approach was to make it part of the 
job script, or to make the job itself dependent on a separate staging job. 
Where bandwidth is an issue, we have used bbcp.
  2.  Depending on size and connectivity, you can use hosts files or create a 
subdomain for the cluster nodes. I prefer the latter. Just use static IPs for 
your cloud nodes. You do need to ensure connectivity between the networks, of 
course.
  3.  SchedMD has the info on cloud nodes: 
https://slurm.schedmd.com/elastic_computing.html
  4.  Try to isolate everything you use so it isn't overly dependent on some 
other group's services (e.g., DNS, authentication, etc.) unless you can be aware 
of any changes they are making, so you aren't surprised. Also, avoid network 
mounts on nodes. Performance takes a big hit when you have that going over a 
direct-connect or VPN.
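
As a rough sketch of the "separate staging job" approach from item 1: the 
staging job is submitted first, and the compute job is submitted with a 
dependency on it. This assumes the standard sbatch options --parsable and 
--dependency=afterok; the script names stage_data.sh and compute.sh are 
hypothetical, and the helpers only build the command lines rather than run 
them.

```python
import shlex


def staging_submit_cmd(stage_script):
    """Build the command that submits the staging job.

    --parsable makes sbatch print only the job ID (optionally 'jobid;cluster').
    """
    return ["sbatch", "--parsable", stage_script]


def parse_job_id(sbatch_output):
    """Extract the job ID from 'sbatch --parsable' output."""
    return sbatch_output.strip().split(";")[0]


def compute_submit_cmd(compute_script, stage_job_id):
    """Build the command that submits the compute job.

    afterok means the job becomes eligible only if the staging job
    finished with exit code zero.
    """
    return ["sbatch", f"--dependency=afterok:{stage_job_id}", compute_script]


if __name__ == "__main__":
    # Hypothetical script names, for illustration only.
    print(shlex.join(staging_submit_cmd("stage_data.sh")))
    # Pretend sbatch answered with this job ID and cluster name.
    job_id = parse_job_id("12345;cloud\n")
    print(shlex.join(compute_submit_cmd("compute.sh", job_id)))
```

In practice the first command would be run (for example with 
subprocess.run) and its output passed to parse_job_id. If staging fails, 
the dependent job never starts, so no cloud node time is spent on a job 
whose data is missing.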

Brian Andrus


On 12/15/2020 12:02 PM, Sajesh Singh wrote:
We are currently investigating the use of the cloud scheduling features within 
an on-site Slurm installation and were wondering if anyone had any experiences 
with this feature that they wish to share. In particular, I am interested to 
know:

https://slurm.schedmd.com/elastic_computing.html

1) Recommendations for staging the data needed by the nodes in the cloud
2) How did you handle name resolution?
3) Any resources/documentation in particular that proved helpful while setting 
up the environment
4) Any bits of advice or horror stories that may be helpful in avoiding 
pitfalls.


Regards,

-SS-
