scontrol release nnnnn

where nnnnn is the job ID. I'm not sure whether Slurm can be set to release held jobs automatically, but I would not want it to: on a faulty system the job would go into a loop of start, fail, start.
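If there are many such jobs, something along these lines might save some typing once the prolog is fixed. This is only a rough sketch: the helper name held_after_failed_launch is made up for illustration, and it assumes the pending reason shows up in squeue exactly as "launch failed requeued held", so check the output on your own cluster before piping anything into scontrol.

```shell
# Hypothetical helper: read "jobid|reason" lines on stdin and print the
# job IDs whose reason is exactly "launch failed requeued held".
held_after_failed_launch() {
  awk -F'|' '$2 == "launch failed requeued held" { print $1 }'
}

# Intended use on a host with Slurm installed (left commented out here):
#   -h drops the header, -t PD selects pending jobs,
#   %i is the job ID and %r is the reason.
# squeue -h -t PD -o '%i|%r' | held_after_failed_launch | xargs -r -n1 scontrol release
```

Running the squeue part on its own first shows which jobs would be released, which keeps a human in the loop rather than releasing automatically.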
Doug

On Tue, Jan 22, 2019 at 10:45 AM Roger Moye <rm...@quantlab.com> wrote:
> This morning we had several jobs fail with the “launch failed requeued held”
> state. We traced this to a failed prolog. We fixed the problem, but the
> jobs remained in this state.
>
> Is there a way to configure Slurm so that it will automatically release
> the job from the held state so that it can run? There were plenty of
> healthy nodes for this job, so I’d prefer that the job not remain held
> indefinitely.
>
> Thanks!
>
> -Roger
>
> Roger Moye
> HPC Engineer
> QUANTLAB Financial, LLC