Re: [slurm-users] How to queue jobs based on non-existent features

2020-08-13 Thread Thomas M. Payerle
I have not had a chance to look at your code, but find it intriguing, although I am not sure about use cases. Do you do anything to lock out other jobs from the affected node? E.g., you submit a job with unsatisfiable constraint foo. The tool scanning the cluster detects a job queued with foo cons

Re: [slurm-users] How to queue jobs based on non-existent features

2020-08-13 Thread Raj Sahae
Hi All, I have developed a first solution to this issue that I brought up back in early July. I don't think it is complete enough to be the final solution for everyone but it does work and I think it's a good starting place to showcase the value of this feature and iterate for improvement. I wa

Re: [slurm-users] [External] Re: openmpi / UCX / srun

2020-08-13 Thread Max Quast
Hey stijn, thank you very much for the advice! Answer to your questions: Q: are you using rdma-core with mellanox ofed? A: only mellanox ofed, no rdma-core Q: and do you have any uverbs_write error messages in dmesg on the hosts? A: Yes, I have! I have set: 'UCX_TLS=tcp,self,sm' on the slurmd'

Re: [slurm-users] Only 2 jobs will start per GPU node despite 4 GPU's being present

2020-08-13 Thread Jodie H. Sprouse
Hello Tina, Thank you for the suggestions and responses!!! As of right now, it seems to be working with taking off the "CPUs=" altogether from gres.conf. The original thought process was to have 4 set aside to always go to the GPU; not so sure that is necessary as long as the CPU partition can