Thank you for your input, Jason. I wasn't trying to "chide" you in any way, and I appreciate your contribution to the discussion.
On Fri, Jun 4, 2021 at 11:37 AM Jason Simms <sim...@lafayette.edu> wrote:

> You don't need to chide me for making what is, to me, a reasonable
> solution. *You* may not be able to make hardware changes, but why the
> people who can would want failing GPUs remaining in a system is anathema to
> my approach to cluster management. In other words, I do not recommend you
> try to find a workaround to a solution that, in my opinion, is best solved
> by eliminating the faulty hardware. I understand the impulse, and if there
> is a simple solution to specifying a specific GPU, then fine, do that. But
> again it goes against treating such resources as generic - nodes and
> hardware should be thought of as cattle, not pets, and should be managed
> accordingly. Again, I believe you are trying to solve a problem that should
> not be yours to solve. Sorry if this irritates you.
>
> JLS
>
> On Fri, Jun 4, 2021 at 2:17 PM Ahmad Khalifa <underoath...@gmail.com>
> wrote:
>
>> I can't make hardware changes, but I still want to make use of the
>> cluster. Let's keep the discussion on how to get slurm to do it, if that's
>> possible.
>>
>> On Fri, Jun 4, 2021 at 11:13 AM Jason Simms <sim...@lafayette.edu> wrote:
>>
>>> Unpopular opinion: remove the failing GPU.
>>>
>>> JLS
>>>
>>> On Fri, Jun 4, 2021 at 2:07 PM Ahmad Khalifa <underoath...@gmail.com>
>>> wrote:
>>>
>>>> Because there are failing GPUs that I'm trying to avoid.
>>>>
>>>> On Fri, Jun 4, 2021 at 5:04 AM Stephan Roth <stephan.r...@ee.ethz.ch>
>>>> wrote:
>>>>
>>>>> On 03.06.21 07:11, Ahmad Khalifa wrote:
>>>>> > How to send a job to a particular gpu card using its ID (0,1,2...etc)?
>>>>>
>>>>> Why do you need to access a GPU based on its ID?
>>>>>
>>>>> If it's to select a certain GPU type, there are other methods you can
>>>>> use.
>>>>>
>>>>> You could create partitions for the same GPU types or add features.
>>>>> Due to our heterogeneous nodes with mixed GPU types we do the latter,
>>>>> we added a feature for the GPU architectures and one for the GPU types
>>>>> to each node.
>>>>>
>>>>> Cheers,
>>>>> Stephan
>>>>>
>>>
>>> --
>>> *Jason L. Simms, Ph.D., M.P.H.*
>>> Manager of Research and High-Performance Computing
>>> XSEDE Campus Champion
>>> Lafayette College
>>> Information Technology Services
>>> 710 Sullivan Rd | Easton, PA 18042
>>> Office: 112 Skillman Library
>>> p: (610) 330-5632
>>>
>>
>