I can't make hardware changes, but I still want to make use of the cluster. Let's keep the discussion on how to get Slurm to avoid the failing GPUs, if that's possible.
On Fri, Jun 4, 2021 at 11:13 AM Jason Simms <sim...@lafayette.edu> wrote:
> Unpopular opinion: remove the failing GPU.
>
> JLS
>
> On Fri, Jun 4, 2021 at 2:07 PM Ahmad Khalifa <underoath...@gmail.com>
> wrote:
>
>> Because there are failing GPUs that I'm trying to avoid.
>>
>> On Fri, Jun 4, 2021 at 5:04 AM Stephan Roth <stephan.r...@ee.ethz.ch>
>> wrote:
>>
>>> On 03.06.21 07:11, Ahmad Khalifa wrote:
>>> > How do I send a job to a particular GPU card using its ID (0, 1, 2, etc.)?
>>>
>>> Why do you need to access a GPU based on its ID?
>>>
>>> If it's to select a certain GPU type, there are other methods you can use.
>>>
>>> You could create partitions for the same GPU types or add features.
>>> Because our nodes are heterogeneous, with mixed GPU types, we do the
>>> latter: we added a feature for the GPU architecture and one for the GPU
>>> type to each node.
>>>
>>> Cheers,
>>> Stephan
>
> --
> *Jason L. Simms, Ph.D., M.P.H.*
> Manager of Research and High-Performance Computing
> XSEDE Campus Champion
> Lafayette College
> Information Technology Services
> 710 Sullivan Rd | Easton, PA 18042
> Office: 112 Skillman Library
> p: (610) 330-5632
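For reference, Stephan's per-node features scheme (one feature for the GPU architecture, one for the GPU type) could look roughly like the sketch below. It is a minimal illustration under stated assumptions: the node names, GPU counts and feature tags (`arch_*`, `type_*`) are hypothetical, and the script only prints candidate `NodeName` lines for slurm.conf plus a matching `sbatch --constraint` option; it does not talk to Slurm. It assumes Python 3.9+.

```python
import shlex
from dataclasses import dataclass


@dataclass
class GpuNode:
    """Hypothetical description of one compute node and its GPUs."""
    name: str
    gpu_count: int
    gpu_arch: str  # illustrative tag such as "volta", not a Slurm keyword
    gpu_type: str  # illustrative tag such as "v100"

    def features(self) -> list[str]:
        # One feature for the GPU architecture and one for the GPU type,
        # mirroring the two-feature scheme described in the thread.
        return [f"arch_{self.gpu_arch}", f"type_{self.gpu_type}"]

    def slurm_conf_line(self) -> str:
        # Candidate NodeName entry for slurm.conf. CPUs, RealMemory, etc. are
        # omitted, and GPUs also need GresTypes=gpu plus gres.conf entries.
        return (f"NodeName={self.name} Gres=gpu:{self.gpu_count} "
                f"Features={','.join(self.features())}")


def constraint_option(*features: str) -> str:
    # --constraint for sbatch/srun; "&" requires every listed feature.
    # shlex.quote protects the "&" from the shell when the line is pasted.
    return shlex.quote("--constraint=" + "&".join(features))


if __name__ == "__main__":
    nodes = [
        GpuNode("gpu01", 4, "volta", "v100"),
        GpuNode("gpu02", 2, "turing", "rtx2080ti"),
    ]
    for node in nodes:
        print(node.slurm_conf_line())
    # "job.sh" is a hypothetical job script name.
    print("sbatch", constraint_option("arch_volta", "type_v100"), "job.sh")
```

Note that features select nodes, not individual cards, so on their own they keep a job off a whole node rather than off one failing GPU inside a node that also has healthy ones.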