The larger cluster is using NFS. I can see how that could be related to the difference in behaviour between the clusters.
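In case anyone wants to check the same thing on their cluster, this is roughly how I've been confirming where the output file lives (plain coreutils and /proc; `actimeo`/`acregmax`/`noac` are the NFS attribute-caching mount options I understand to be relevant, though I haven't confirmed they explain the delay):

```shell
#!/bin/bash
# Show the file system type backing the current (job output) directory.
df -T .

# List any NFS mounts with their options; look for attribute-caching
# settings such as actimeo=, acregmax=, or noac.
grep -E '\bnfs' /proc/mounts || echo "no NFS mounts found"
```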
The buffering behaviour is the same if I tail the file from the node running
the job. The only thing that seems to change the behaviour is whether I use
srun to create a job step or not.

On Wed, Feb 10, 2021 at 4:09 PM Aaron Jackson <aaron.jack...@nottingham.ac.uk> wrote:

> Is it being written to NFS? You say on your local dev cluster it's a
> single node. Is it also the login node as well as compute? In that case
> I guess there is no NFS. The larger cluster will be using some sort of
> shared storage, so whichever shared file system you are using likely has
> caching.
>
> If you are able to connect directly to the node which is running the
> job, you can try tailing from there. It'll likely update immediately if
> what I said above is the case.
>
> Cheers,
> Aaron
>
> On 9 February 2021 at 23:47 GMT, Maria Semple wrote:
>
> > Hello all,
> >
> > I've noticed an odd behaviour with job steps in some Slurm environments.
> > When a script is launched directly as a job, the output is written to
> > file immediately. When the script is launched as a step in a job, output
> > is written in ~30 second chunks. This doesn't happen in all Slurm
> > environments, but if it happens in one, it seems to always happen. For
> > example, on my local development cluster, which is a single node on
> > Ubuntu 18, I don't experience this. On a large CentOS 7 based cluster,
> > I do.
> >
> > Below is a simple reproducible example:
> >
> > loop.sh:
> > #!/bin/bash
> > for i in {1..100}
> > do
> >   echo $i
> >   sleep 1
> > done
> >
> > withsteps.sh:
> > #!/bin/bash
> > srun ./loop.sh
> >
> > Then from the command line, running sbatch loop.sh followed by tail -f
> > slurm-<job #>.out prints the job output in smaller chunks, which appears
> > to be related to file system buffering or the time it takes for the tail
> > process to notice that the file has updated. Running cat on the file
> > every second shows that the output is in the file immediately after it
> > is emitted by the script.
> > If you run sbatch withsteps.sh instead, tail-ing or repeatedly cat-ing
> > the output file will show that the job output is written in a chunk of
> > 30-35 lines.
> >
> > I'm hoping this is something that is possible to work around, potentially
> > related to an OS setting, the way Slurm was compiled, or a Slurm setting.
>
> --
> Research Fellow
> School of Computer Science
> University of Nottingham

--
Thanks,
Maria
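P.S. One thing I'm planning to try, on the assumption that the delay comes from srun's stdio handling rather than the file system (that's a guess at this point, not something I've confirmed), is a modified withsteps.sh:

```shell
#!/bin/bash
# Option 1: srun's -u / --unbuffered flag asks it not to buffer the
# step's stdout, so output should appear as soon as it is generated.
srun --unbuffered ./loop.sh

# Option 2: keep srun's defaults but force line buffering in the child
# process via coreutils stdbuf, so each echo is flushed as a full line.
srun stdbuf -oL ./loop.sh
```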