Hi,

We have a similar configuration: a very heterogeneous cluster running cons_tres. Users need to specify CPU, memory, GPU and time, and the scheduler will place their job somewhere. Indeed, there is currently no guarantee that you won't be left with a node whose GPUs are unusable because no CPUs or memory are left on it.
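For illustration only, a request of that kind might look like the batch script below. The numbers are made up (they mirror the 10-CPUs-per-GPU example from the original question rather than anything from our cluster), and the echo line is just a stand-in for a real application.

```bash
#!/bin/bash
# Hypothetical resource request: 2 GPUs, 10 CPUs per GPU, 32 GB RAM, 1 day.
#SBATCH --gpus=2
#SBATCH --cpus-per-gpu=10
#SBATCH --mem=32G
#SBATCH --time=1-00:00:00

# Slurm exports these variables inside a job; outside of Slurm they are
# unset, so fall back to placeholders instead of failing.
summary="job=${SLURM_JOB_ID:-none} gpus=${SLURM_GPUS:-unset}"
echo "$summary"

# A real job would start its application here, e.g.:
# srun ./my_gpu_app    # hypothetical program name
```

Because the request names no partition or node type, the scheduler is free to place the job on any node that can satisfy it.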
We use one partition with 100% of the nodes and a time limit of 2 days, and a second partition with ~90% of the nodes and a limit of 7 days. This gives shorter jobs a chance to run without having to wait behind long jobs. We also use weights for the nodes, such that smaller nodes (resource-wise) will be selected first. This prevents smaller jobs from filling up the larger nodes (unless the smaller nodes are already occupied).

HTH,
    Yair.

On Mon, Feb 8, 2021 at 1:41 PM Ansgar Esztermann-Kirchner <aesz...@mpibpc.mpg.de> wrote:

> Hello List,
>
> we're running a heterogeneous cluster (just x86_64, but a lot of
> different node types from 8 to 64 HW threads, 1 to 4 GPUs).
> Our processing power (for our main application, at least) is
> exclusively provided by the GPUs, so cons_tres looks quite promising:
> depending on the size of the job, request an appropriate number of
> GPUs. Of course, you have to request some CPUs as well -- ideally,
> evenly distributed among the GPUs (e.g. 10 per GPU on a 20-core, 2-GPU
> node; 16 on a 64-core, 4-GPU node).
> Of course, one could use different partitions for different nodes, and
> then submit individual jobs with CPU requests tailored to one such
> partition, but I'd prefer a more flexible approach where a given job
> could run on any large enough node.
>
> Is there anyone with a similar setup? Any config options I've missed,
> or do you have a work-around?
>
> Thanks,
>
> A.
>
> --
> Ansgar Esztermann
> Sysadmin Dep. Theoretical and Computational Biophysics
> http://www.mpibpc.mpg.de/grubmueller/esztermann

--
Yair Yarom | System Group (DevOps)
The Rachel and Selim Benin School of Computer Science and Engineering
The Hebrew University of Jerusalem
T +972-2-5494522 | F +972-2-5494522
ir...@cs.huji.ac.il
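
P.S. In case it helps, here is a rough sketch of how two such partitions and the node weights could be written in slurm.conf. The node names, node counts, CPU/memory figures, partition names and weight values are all invented for the example and are not our real configuration; only the 2-day and 7-day limits and the roughly 90% split follow what I described above.

```
# Hypothetical slurm.conf fragment (illustrative values only)
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory
GresTypes=gpu

# Nodes with the lowest Weight are allocated first,
# so the small nodes fill up before the large ones.
NodeName=small[01-16] CPUs=8  RealMemory=32000  Gres=gpu:1 Weight=10
NodeName=large[01-04] CPUs=64 RealMemory=256000 Gres=gpu:4 Weight=100

# All 20 nodes, jobs limited to 2 days.
PartitionName=short Nodes=ALL MaxTime=2-00:00:00 Default=YES

# 18 of the 20 nodes (90%), jobs limited to 7 days.
PartitionName=long Nodes=small[01-15],large[01-03] MaxTime=7-00:00:00
```

The nodes left out of the long partition can only ever be held for 2 days at a time, which is what gives short jobs a chance.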