Run salloc with a smaller number of nodes or tasks, then check the open
connections with lsof (or whatever tool you prefer for listing IP
connections). IIRC, each srun/node in the allocation needs 70-80 IP
connections to the node running salloc, so a large node count can
exhaust the default file descriptor limit.
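As a rough back-of-the-envelope check of that estimate, the sketch below compares the expected connection count with the soft file descriptor limit of the current process. The figure of 80 connections per node is just the upper end of the recollection above, not a documented Slurm value, and the function names are made up for illustration.

```python
import resource

# Assumption: upper end of the "70-80 connections per node" recollection
# above; this is not a documented Slurm constant.
CONNECTIONS_PER_NODE = 80


def estimated_connections(nodes, per_node=CONNECTIONS_PER_NODE):
    """Rough number of sockets held open on the node running salloc."""
    return nodes * per_node


def max_nodes_for_limit(fd_limit, per_node=CONNECTIONS_PER_NODE):
    """Largest node count whose estimated connections fit within fd_limit."""
    return fd_limit // per_node


if __name__ == "__main__":
    # Soft and hard limits on open file descriptors for this process.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"soft limit: {soft}, hard limit: {hard}")
    if soft == resource.RLIM_INFINITY:
        print("soft limit is unlimited")
    else:
        print(f"estimated max nodes: {max_nodes_for_limit(soft)}")
```

With a soft limit of 1024, a common Linux default, this works out to about 12 nodes. The estimate ignores descriptors salloc already holds for other purposes, so the real ceiling would be somewhat lower.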
On 2/2/2021 1:14 PM, Patrick Goetz wrote:
That sounds like a Linux issue. You probably need to raise the maximum
limit for open file descriptors somewhere.
Maybe start here:
https://rtcamp.com/tutorials/linux/increase-open-files-limit/
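For a quick per-process experiment, something like the following sketch might help. It raises the soft limit toward the hard limit and then launches a command that inherits it. This is only an illustration: the function names are hypothetical, it affects just this process and its children, and persistent changes are normally made in shell or system configuration (for example `ulimit -n` or /etc/security/limits.conf).

```python
import resource
import subprocess


def raise_nofile_soft_limit(target):
    """Raise the soft open-files limit to target, capped at the hard limit.

    Returns the soft limit in effect afterwards. Never lowers the limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to do
    # An unprivileged process cannot exceed the hard limit.
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        soft = new_soft
    return soft


def run_with_limit(cmd, target):
    """Run cmd (a list of arguments) as a child that inherits the raised limit."""
    raise_nofile_soft_limit(target)
    return subprocess.run(cmd).returncode
```

A call such as `run_with_limit(["salloc", "-N", "64"], 8192)` would then start salloc with the higher soft limit, provided the hard limit allows it. The node count and target here are arbitrary example values.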
On 2/2/21 11:50 AM, Prentice Bisbal wrote:
Has anyone seen this error message before? A user just reported it. A
Google search doesn't turn up anything useful. I mean, I understand
what too many open files means, but I'm surprised to see it in the
context of salloc.
salloc: error: Error on msg accept socket: Too many open files