Run salloc with a smaller number of nodes or tasks, then look at lsof (or your preferred way of listing IP connections) to see how many are open. IIRC, each srun/node in the allocation needs 70-80 IP connections to the node running salloc, so a large node count can exhaust the default file descriptor limit.
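
For a rough sense of the arithmetic, here is a minimal Python sketch. It assumes a Linux node with /proc mounted and a finite soft limit, and it takes the 70-80 connections per node figure from my recollection above, not from the Slurm documentation. The function names are made up for illustration, and it inspects the Python process itself rather than salloc.

```python
import os
import resource


def nodes_that_fit(soft_limit, open_now, per_node=80):
    """Estimate how many nodes fit in the remaining descriptor budget.

    per_node is an assumption based on the recalled 70-80 figure.
    Assumes soft_limit is a finite number.
    """
    if per_node <= 0:
        raise ValueError("per_node must be positive")
    return max(0, (soft_limit - open_now) // per_node)


def current_fd_usage():
    """Return (open_fds, soft_limit) for this Python process (Linux only)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # /proc/self/fd has one entry per open descriptor; the count includes
    # the descriptor os.listdir() itself opens to read the directory.
    open_fds = len(os.listdir("/proc/self/fd"))
    return open_fds, soft


if __name__ == "__main__":
    open_fds, soft = current_fd_usage()
    print(f"open descriptors: {open_fds}, soft limit: {soft}")
    for per_node in (70, 80):
        print(f"at {per_node} per node: room for about "
              f"{nodes_that_fit(soft, open_fds, per_node)} nodes")
```

With a soft limit of 1024 and 10 descriptors already open, nodes_that_fit(1024, 10, 80) comes out to 12, so an allocation much larger than that would be expected to hit the limit if the per-node figure is right.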

On 2/2/2021 1:14 PM, Patrick Goetz wrote:
That sounds like a Linux issue. You probably need to raise the maximum number of open file descriptors somewhere.

Maybe start here:
 https://rtcamp.com/tutorials/linux/increase-open-files-limit/
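
As a sketch of what raising the limit looks like from inside a process, the Python below lifts the soft RLIMIT_NOFILE toward the hard limit using the standard resource module. The function name and the target parameter are hypothetical. The change only affects the calling process and any children it starts afterwards, so it would not help an salloc that is already running; in a shell, the analogous step is `ulimit -n` before launching salloc.

```python
import resource


def raise_nofile_soft_limit(target=None):
    """Raise the soft open-files limit, never past the hard limit.

    target is a hypothetical parameter: the desired soft limit, or None
    to go as high as the hard limit allows. Returns the (soft, hard)
    pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY
    if soft == unlimited:
        # Nothing to raise.
        return soft, hard
    if hard == unlimited:
        # No finite ceiling; only move if the caller named a target.
        new_soft = soft if target is None else target
    else:
        new_soft = hard if target is None else min(target, hard)
    new_soft = max(new_soft, soft)  # never lower the existing limit
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("before:", resource.getrlimit(resource.RLIMIT_NOFILE))
    print("after: ", raise_nofile_soft_limit())
```

Going past the hard limit needs root or a system-wide configuration change, which is the part the tutorial linked above covers.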

On 2/2/21 11:50 AM, Prentice Bisbal wrote:
Has anyone seen this error message before? A user just reported it. A Google search doesn't turn up anything useful. I mean, I understand what too many open files means, but I'm surprised to see it in the context of salloc.

salloc: error: Error on msg accept socket: Too many open files


