the queue system does not start the jobs?
I mean, for example, I run this script with the job ID:
$ why_job_does_not_start.sh 12345
and it prints:
"There is not enough memory on free nodes" or "There are not enough nodes
to start"
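
For illustration, a minimal sketch of what such a script could start from.
It only relays the reason code that squeue itself reports (for example
"Resources" or "Priority"), which is less detailed than the messages
above. The function name why_pending is hypothetical; the squeue options
used (-h, -j, -o with %T and %r) are standard ones.

```shell
#!/bin/sh
# why_pending JOBID
# Print the job state and the reason code that Slurm reports for it.
# Hypothetical helper, not part of Slurm.
why_pending() {
    # -h drops the header line; %T is the job state, %r the reason field.
    out=$(squeue -h -j "$1" -o '%T %r') || return 1
    if [ -z "$out" ]; then
        echo "job $1 not found in queue" >&2
        return 1
    fi
    state=${out%% *}    # text before the first space
    reason=${out#* }    # text after the first space
    echo "job $1 is $state, reason: $reason"
}
```

Called as "why_pending 12345" on a login node. It assumes a single
(non-array) job, so squeue returns one line.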
--
Pavel Vashchenkov
start long time. All small jobs
(requiring 1 or 2 nodes) start, but my big job, which requires all
nodes, is still waiting in the queue. I know that the requested resources
do not exceed the real resources of the cluster, because the job has
started before, when there were no other jobs.
--
Pavel Vashchenkov
In
100 GB (RealMem - AllocMem))?
PS On other nodes the situation is similar:
RealMemory=257433 AllocMem=180224 FreeMem=7913
On a free node (not allocated to any job right now):
RealMemory=257433 AllocMem=0 FreeMem=159610
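
The difference between the two memory figures can be checked with a small
sketch like the one below. It assumes the lines above are taken from
"scontrol show node" output and that the values are in MB;
unallocated_mem is a hypothetical helper name, not a Slurm tool. As far
as I understand, Slurm accounts memory against RealMemory minus AllocMem,
while FreeMem is what the operating system reports as free, so the two
numbers can differ a lot.

```shell
#!/bin/sh
# unallocated_mem 'RealMemory=... AllocMem=... FreeMem=...'
# Print RealMemory - AllocMem for one line of key=value node fields.
# Hypothetical helper, not part of Slurm.
unallocated_mem() {
    # Extract the numeric value following each key; empty if the key is absent.
    real=$(printf '%s\n' "$1" | sed -n 's/.*RealMemory=\([0-9][0-9]*\).*/\1/p')
    alloc=$(printf '%s\n' "$1" | sed -n 's/.*AllocMem=\([0-9][0-9]*\).*/\1/p')
    if [ -z "$real" ] || [ -z "$alloc" ]; then
        echo "RealMemory or AllocMem missing" >&2
        return 1
    fi
    echo $(( real - alloc ))
}
```

For the busy node above this gives 77209 (the OS reports only 7913 as
free), and for the free node it gives 257433.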
--
Pavel Vashchenkov
I've found that this question was raised two years ago:
https://bugs.schedmd.com/show_bug.cgi?id=4717
And it's still unsolved :(
--
Pavel Vashchenkov
02.03.2020 17:28, Pavel Vashchenkov wrote:
> 28.02.2020 20:53, Renfro, Michael wrote:
>> When I made similar queues, and only
bs to
MaxCPUsPerNode-1 or less. If I use all available CPU cores, a GPU job does
not start in the 'gpu' partition on the same nodes.
--
Pavel Vashchenkov
>
>> On Feb 27, 2020, at 9:51 PM, Pavel Vashchenkov wrote:
>>
Hello,
I have a hybrid cluster with 2 GPUs and two 20-core CPUs on each node.
I created two partitions:
- "cpu" for CPU-only jobs, which are allowed to allocate up to 38 cores
  per node
- "gpu" for GPU-only jobs, which are allowed to allocate up to 2 GPUs
  and 2 CPU cores.
Respective sections in slur