potiuk commented on code in PR #46891:
URL: https://github.com/apache/airflow/pull/46891#discussion_r2213153842
##########
airflow-core/tests/unit/cli/commands/test_api_server_command.py:
##########
@@ -296,16 +297,31 @@ def test_run_command_daemon(
)
]
mock_pid_file.assert_has_calls([mock.call(mock_setup_locations.return_value[0],
-1)])
- assert mock_open.mock_calls == [
- mock.call(mock_setup_locations.return_value[1], "a"),
- mock.call().__enter__(),
- mock.call(mock_setup_locations.return_value[2], "a"),
- mock.call().__enter__(),
- mock.call().truncate(0),
- mock.call().truncate(0),
- mock.call().__exit__(None, None, None),
- mock.call().__exit__(None, None, None),
- ]
+ if sys.version_info >= (3, 13):
+ # extra close is called in Python 3.13+ to close the file
descriptors
+ assert mock_open.mock_calls == [
+ mock.call(mock_setup_locations.return_value[1], "a"),
+ mock.call().__enter__(),
+ mock.call(mock_setup_locations.return_value[2], "a"),
+ mock.call().__enter__(),
+ mock.call().truncate(0),
+ mock.call().truncate(0),
+ mock.call().__exit__(None, None, None),
+ mock.call().close(),
+ mock.call().__exit__(None, None, None),
+ mock.call().close(),
+ ]
+ else:
+ assert mock_open.mock_calls == [
+ mock.call(mock_setup_locations.return_value[1], "a"),
+ mock.call().__enter__(),
+ mock.call(mock_setup_locations.return_value[2], "a"),
+ mock.call().__enter__(),
+ mock.call().truncate(0),
+ mock.call().truncate(0),
+ mock.call().__exit__(None, None, None),
+ mock.call().__exit__(None, None, None),
+ ]
Review Comment:
OK. I looked a bit closer and I **think** this is a side effect of
https://github.com/python/cpython/pull/25425
AND @amoghrajesh @kaxil @ashb -> I tink this one also uncovers the root
cause of some potential issue we have with file descriptors sharing that some
of our users experienced - for example
https://github.com/apache/airflow/issues/51512 - which we might want to address
without waiting for the Python 3.13 fix. It seems that we should have a bit
more manual closing and generally managing some of the file descriptors when we
are forking tasks from supervisor.
This change in Python 3.13 https://github.com/python/cpython/pull/25425
addresses this veeeeery long standing issue:
https://bugs.python.org/issue34321 -> where mmap always unnecessarily cloed
file descriptors even when it was not needed. When forking and mmaping some
file description might be shared between the forks or they might be duplicated.
In this case it seems that previously thsoe file descriptors were duplicated
and (if I understand correctly) you could not close them when you forked and
performed mmap - because if you wanted to size/resize such memory-mapped files,
it would fail.
The related issue mentions that after this change is implemented:
> After the mmap() call has returned, the file descriptor, fd, can
> be closed immediately without invalidating the mapping.
Also it mentions that:
> The cloning of the file descriptor seems to be done to ensure that the
file cannot be closed behind mmap's back, but if you are mmap()'ing a lot of
memory regions of a file this can cause a 'Too many open files' error.
Which I thik the underlying "open" method is running the close() immediately
after mmaping in Python 3.13 - previously it had to wait with closing the file
description after the forked process returns (I guess).
Which is also an interesting thing for us - this **could be** (as the
description mentioned) a root cuase for some "too many files open" issues when
a lot of forks /mmap calls happened - because some duplicated filedescriptors
could not be closed. For example this one:
https://github.com/apache/airflow/issues/51512 -> I think we are falling the
victim of this one and possibly we should rework a bit how file descriptors are
shared between supervisor and tasks - so that we can close them early.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]