Yes, I agree about the reservation; that was the next thing I was going to focus on.
Please do share your reservation config.

On Wed, Nov 26, 2025, 3:26 PM Christopher Samuel via slurm-users <[email protected]> wrote:

> On 11/13/25 2:16 pm, Lee via slurm-users wrote:
>
> > 1. When I look at our 8 non-MIG DGXs, via `scontrol show node=dgxXY |
> > grep Gres`, 7/8 DGXs report "Gres=gpu:*H100*:8(S:0-1)" while dgx09
> > reports "Gres=gpu:*h100*:8(S:0-1)"
>
> Two thoughts:
>
> 1) Looking at the 24.11 code when it's using NVML to get the names
> everything gets lowercased - so I wonder if these new ones are getting
> correctly discovered by NVML but the older ones are not and so using the
> uppercase values in your config?
>
> gpu_common_underscorify_tolower(device_name);
>
> I would suggest making sure the GPU names are lower-cased everywhere for
> consistency.
>
> 2) From memory (away from work at the moment) slurmd caches hwloc
> library information in an XML file - you might want to go and find that
> on an older and newer node and compare those to see if you see the same
> difference there. It could be interesting to see if you stop slurmd on
> an older node, move that XML file out of the way start slurmd whether
> that changes how it reports the node.
>
> Also I saw you posted "slurmd -G" on the new one, could you post that
> from an older one too please?
>
> Best of luck,
> Chris
> --
> Chris Samuel : http://www.csamuel.org/ : Philadelphia, PA, USA
>
> --
> slurm-users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
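To check the case mismatch across all nodes at once rather than one `scontrol show node=dgxXY` at a time, something like the sketch below might help. It is only a sketch under assumptions: `check_gres_case` is a hypothetical helper name, it assumes `scontrol show node -o` prints one node per line as space-separated `Key=Value` fields, and it only inspects the first GPU type in each node's `Gres` string.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Slurm): reads `scontrol show node -o`
# output on stdin, prints each node's Gres string, and flags nodes whose
# GPU type name (e.g. "H100" in "gpu:H100:8(S:0-1)") contains uppercase.
# Assumption: only the first gres entry on each node is checked.
# Usage: scontrol show node -o | check_gres_case
check_gres_case() {
  awk '
  {
    node = ""; gres = ""
    # Pick out the NodeName= and Gres= fields from the one-line record.
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^NodeName=/) node = substr($i, 10)
      else if ($i ~ /^Gres=/) gres = substr($i, 6)
    }
    if (gres == "" || gres == "(null)") next
    # "gpu:H100:8(S:0-1)" splits into gpu, H100, 8(S, 0-1; the type is
    # field 2 unless the entry has no type (field 2 starts with a count).
    n = split(gres, parts, ":")
    flag = ""
    if (n >= 3 && parts[2] !~ /^[0-9]/ && parts[2] != tolower(parts[2]))
      flag = "  <-- GPU type contains uppercase"
    print node ": " gres flag
  }'
}
```

If the older nodes are the only ones flagged, that would fit the theory that their type names come from the config rather than from NVML autodetection.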
