Aug 3, 2021, 11:12 by than...@debian.org:

> Thanks a lot for the patches Ahzo. Especially fixing the file handle leak
> should help a lot.
Actually, that patch was more of a workaround than a fix. The real cause of the "Too many open files" issue is a behavior change in the multiprocessing.Process class in python3: it now opens a pipe internally, which it did not do in python2.

The solution is to call the close() method of multiprocessing.Process to close the internal pipe. The attached u2-close-the-internal-pipe-of-multiprocessing.Process.patch does just that. However, this method was only added in python 3.7, so attempting to use it fails in earlier versions of python3 (and also in python2).

Regards,
Ahzo
From 27ee32c1fc773fbbb7e54036acc5df6453c10131 Mon Sep 17 00:00:00 2001
From: Ahzo <a...@tutanota.com>
Date: Wed, 4 Aug 2021 23:23:20 +0200
Subject: [PATCH] close the internal pipe of multiprocessing.Process

Every finished, but not yet logged worker holds an open fd.
Thus when following a long running worker, so many finished tests can
accumulate, that the open files limit (ulimit -n) is reached.
This then causes the test suite to fail with
'OSError: [Errno 24] Too many open files'.

The open fd is due to a behavior change of the multiprocessing.Process
class in python3. It now opens a pipe internally, which it did not do
in python2.
The solution is to call the close() method of multiprocessing.Process
to close the internal pipe.
However, this method is only available since python 3.7.
---
 sage/src/sage/doctest/forker.py | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/sage/src/sage/doctest/forker.py b/sage/src/sage/doctest/forker.py
index 045975c1b5..170ce6824d 100644
--- a/sage/src/sage/doctest/forker.py
+++ b/sage/src/sage/doctest/forker.py
@@ -1891,6 +1891,14 @@ class DocTestDispatcher(SageObject):
                             # report(), parallel testing can easily fail
                             # with a "Too many open files" error.
                             w.save_result_output()
+                            # In python3 multiprocessing.Process also
+                            # opens a pipe internally, which has to be
+                            # closed here, as well.
+                            # But afterwards, exitcode and pid are
+                            # no longer available.
+                            w.copied_exitcode = w.exitcode
+                            w.copied_pid = w.pid
+                            w.close()
                             finished.append(w)
                     workers = new_workers

@@ -1910,10 +1918,10 @@ class DocTestDispatcher(SageObject):
                             self.controller.reporter.report(
                                 w.source,
                                 w.killed,
-                                w.exitcode,
+                                w.copied_exitcode,
                                 w.result,
                                 w.output,
-                                pid=w.pid)
+                                pid=w.copied_pid)

                             pending_tests -= 1

--
2.30.2
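
For illustration, the order of operations in the patch (copy exitcode and pid first, then close) can be reproduced outside Sage with a plain multiprocessing.Process. This is only a sketch under stated assumptions, not part of the patch: finish_worker, demo and _exit_with_code_3 are hypothetical names, the "fork" start method is assumed to be available (POSIX), and the getattr guard is just one possible way to cope with python versions before 3.7, where close() does not exist.

```python
import multiprocessing
import sys


def _exit_with_code_3():
    # Child process body: exit with a recognisable status.
    sys.exit(3)


def finish_worker(proc):
    """Copy exitcode and pid from a finished Process, then release its fds.

    Hypothetical helper for illustration only; the patch does the same
    steps inline on the worker object.
    """
    # These attributes must be read before close(): once the Process
    # object is closed, accessing them raises ValueError.
    copied_exitcode = proc.exitcode
    copied_pid = proc.pid
    # Process.close() was added in python 3.7. On older versions the
    # attribute is missing, so the call is skipped (an assumption of
    # this sketch; the patch itself calls close() unconditionally).
    close = getattr(proc, "close", None)
    if close is not None:
        close()
    return copied_exitcode, copied_pid


def demo():
    # Assumption: the "fork" start method exists on this platform.
    ctx = multiprocessing.get_context("fork")
    proc = ctx.Process(target=_exit_with_code_3)
    proc.start()
    proc.join()  # close() requires that the process has finished
    exitcode, pid = finish_worker(proc)
    try:
        proc.exitcode
        still_readable = True
    except ValueError:
        still_readable = False
    return exitcode, pid, still_readable
```

On python 3.7 or later, demo() is expected to return the child's exit code and pid together with False, because reading exitcode on a closed Process object raises ValueError. That is why the patch stores both values on the worker before calling w.close().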