I ran some other tests and got nearly the same results. The 4 minutes in my previous post means roughly 50% overhead, so about 24000 minutes of direct run time becomes about 35000 minutes via Slurm. I will post the details later. The methodology I used is:

1- Submit a job to a specific node (compute-0-0) via Slurm from the frontend and record the elapsed run time (or add the time command to the script).
2- ssh to the same node (compute-0-0) and run the program directly under time.
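To make the comparison concrete, the two measurements look roughly like this (job.sh and ./myprog are just placeholders for my actual script and program):

    # run via Slurm, pinned to the node; job.sh wraps the program in "time ./myprog"
    [user@frontend]$ sbatch --nodelist=compute-0-0 job.sh
    [user@frontend]$ sacct -j <jobid> -o JobID,Elapsed    # elapsed time as Slurm reports it

    # run directly on the same node
    [user@frontend]$ ssh compute-0-0
    [user@compute-0-0]$ time ./myprog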
So the hardware is the same in both cases. I have to say that the frontend has small differences from compute-0-0, but that is not important because, as I said before, the program is installed under /usr and not on the shared file system.

I think the overhead of the Slurm process that queries the node to collect runtime information is not negligible. For example, squeue updates the run time every second. How can I tell Slurm not to query so frequently, for example to update the node information only every 10 seconds? (My guess at the relevant settings is at the end of this message.) Though I am not sure how much effect that would have.

Regards,
Mahmood

On Fri, Apr 20, 2018 at 10:39 AM, Loris Bennett <loris.benn...@fu-berlin.de> wrote:
> Hi Mahmood,
>
> Rather than the overhead being 50%, maybe it is just 4 minutes. If
> another job runs for a week, that might not be a problem. In addition,
> you just have one data point, so it is rather difficult to draw any
> conclusion.
>
> However, I think that it is unlikely that Slurm is responsible for
> this difference. What can happen is that, if a node is powered down
> before the job starts, then the clock starts ticking as soon as the job
> is assigned to the node. This means that the elapsed time also includes
> the time for the node to be provisioned. If this is not relevant in
> your case, then you are probably just not comparing like with like,
> e.g. is the hardware underlying /tmp identical in both cases?
>
> Cheers,
>
> Loris
>
> --
> Dr. Loris Bennett (Mr.)
> ZEDAT, Freie Universität Berlin   Email loris.benn...@fu-berlin.de
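P.S. By reducing the polling I mean something like the following. I am not sure these are actually the parameters that control it, so please treat the names (JobAcctGatherFrequency, --acctg-freq) and the value of 10 seconds as my guesses rather than a tested configuration:

    # slurm.conf (my guess at the knob): how often job accounting is sampled, in seconds
    JobAcctGatherFrequency=10

    # or per job at submit time (also a guess):
    [user@frontend]$ sbatch --acctg-freq=task=10 --nodelist=compute-0-0 job.sh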