Hi, I just finished setting up a Slurm cluster that consists of multi-region cloud servers and on-premises servers.
My Slurm cluster environment is as follows, and I want to run jobs across worker nodes in multiple regions.

Slurm master server: created in the GCP KR region.
Worker node #1: created in the same region as the Slurm master server; has 2 NVIDIA T4 GPUs.
Worker node #2: created in the GCP US region; has 2 NVIDIA T4 GPUs.
Worker node #3: one of the on-premises servers; has 8 NVIDIA T4 GPUs.

In this environment, can I run a single Slurm job that combines the 2 GPUs on server #1 with the 2 GPUs on server #2, or the 2 GPUs on server #1 with the on-premises server #3?

In the tests I have run so far, multi-region GPU combinations failed: the jobs ran only on the worker node of a single region.

Are there any mechanisms or rules that govern how multiple worker nodes are combined for a job? And is there a priority rule for selecting among multiple worker nodes?

Thanks