potiuk commented on PR #53149: URL: https://github.com/apache/airflow/pull/53149#issuecomment-3091971532
Ok: https://github.com/apache/airflow/pull/53506 is now updated with a (nearly) complete solution. The only thing it is missing is pre-commit to synchronize "shared" dependencies with distributions using them automatically (I pretended we have such a commit and copied dependencies manually - but pre-commit can automatically synchronize them - similarly as we do already in providers. How it works: * All shared distributions are sub-packages of `airflow_shared` package. This package follows similar approach we are using in providers with legacy namespace packages: ``` __path__ = __import__("pkgutil").extend_path(__path__, __name__) ``` * This way - they can effectively refer to each other via: ```python from ..timezones.timezone import is_naive ``` Because they are effective all sub-packages of the "airflow_shared" so `airflow_shared.logging` can refer to `airflow_shared.timezones` with `..timezones`. * The 'logging` and `timezones` are both symlinked to the same `_shared` package of `task-sdk` - and they are "force-included" there in task-sdk's pyproject.toml for sdist. Result when you build sdist package: ``` -rw-r--r-- 0 0 0 785 Feb 2 2020 apache_airflow_task_sdk-1.1.0/tests/task_sdk/io/__init__.py -rw-r--r-- 0 0 0 11212 Feb 2 2020 apache_airflow_task_sdk-1.1.0/tests/task_sdk/io/test_path.py -rw-r--r-- 0 0 0 9530 Feb 2 2020 apache_airflow_task_sdk-1.1.0/tests/task_sdk/log/test_log.py -rw-r--r-- 0 0 0 843 Feb 2 2020 apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_shared/logging/__init__.py -rw-r--r-- 0 0 0 828 Feb 2 2020 apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_shared/logging/config.py -rw-r--r-- 0 0 0 1016 Feb 2 2020 apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_shared/logging/structlog.py -rw-r--r-- 0 0 0 843 Feb 2 2020 apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_shared/timezones/__init__.py -rw-r--r-- 0 0 0 10070 Feb 2 2020 apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_shared/timezones/timezone.py ``` * The imports are "clean" and predictable. In task sdk your imports will be `from airflow.task_sdk._shared.logging` and `from airflow.task_sdk._shared.timezones` .. Can't be cleaner. * I updated `setup_idea.py` to automatically add "shared" libraries to IntelliJ project (including tests in those folders) - so that you can run tests written for "internal logging" and "internal timezone" testing directly from the IDE * Also - those "shared" libraries are now actually "real" packages (!). They can build locally and even released - for example in CI if we need them - and (purely theorethically) - we could even use them as "regular" dependencies. Which opens up interesting possibility - for example `airflow-ctl` could **actually** use those packages as regular dependencies - rather thn inlined. The only difference in this case will be how you declare dependencies: * for inlining case, the package folders should be symlinked, their dependencies (not the packages themselves) should be added as dependencies of the "including" distribution, the "apache-airflow-shared-*" dependencies should be aded as devel-dependencies so that pytest tests can run in all cases * for "regular" case - the "apache-airflow-shared-*" packages should be declared as regular dependencies (not dev ones) We do not have to do it of course, but it opens an interesting possibility - when releasing `airflow-ctl` or other standalone distributions, that will nicely work and `airflow-ctl` could direcly use `import airflow_shared.logging` without having to do all the symlinking and `_shared` sub-package. And it will **just work** as usual. No magic whatsoever. * Also those are "real" distributions also for development and testing - which means that you can do this: ``` cd shared/logging uv sync uv run pytest ``` And it will do what you expect it to do: venv will be updated to **only** contains logging (and timezone because it is dev dependency of logging) dependencies and it will run all the tests. This means that we can completely separately work on "logging" and "timezone" and any other shared library, we can run tests, and (most importantly) when we run tests this way, we make sure that logging does not use any external packages that are not declared as dependency - which means that it will be impossible to accidentally do "import airflow" in logging or timezone - because `uv sync` will make sure that there is no "airflow-core" available for it. This means that our CI will automatically verify that the shared packages are properly isolated and do not do "forbidden" stuff. * There are two drawbacks I am currently aware of (but extremely small and absolutely not blockers) 1) When you refer another shared library from your shared library, you need to #noqa it for parent relative import - I think it's a good idea to not allow parent relative imports in general, but this is the exceptional case that should be rare and is justified to use it. ``` from ..timezones.timezone import is_naive # noqa: TID252 ``` 2) When you work directly on the sources of such shared library, IntelliJ does not find and recognize those `..timezone` relative imports (but it works when you run tests etc. - it's only IDE being confused that .. is not **real** parent but a legacy namespace package. HOWEVER, it DOES work in the "including" projects where the shared libraries are symlinked - and this is, I guess in most cases where people will be working on any changes anyway - so this is really, really small thing: <img width="1163" height="238" alt="Screenshot 2025-07-19 at 09 08 52" src="https://github.com/user-attachments/assets/8c181a97-f54b-421f-9ad2-d40257376f09" /> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
