potiuk commented on PR #53149:
URL: https://github.com/apache/airflow/pull/53149#issuecomment-3091971532

   Ok: https://github.com/apache/airflow/pull/53506 is now updated with a 
(nearly) complete solution. The only thing it is missing is pre-commit to 
synchronize "shared" dependencies with distributions using them automatically 
(I pretended we have such a commit and copied dependencies manually - but 
pre-commit can automatically synchronize them - similarly as we do already in 
providers.
   
   How it works:
   
   * All shared distributions are sub-packages of `airflow_shared` package. 
This package follows similar approach we are using in providers with legacy 
namespace packages:
   ```
   __path__ = __import__("pkgutil").extend_path(__path__, __name__)
   ```
   
   * This way - they can effectively refer to each other via:
   
   ```python
   from ..timezones.timezone import is_naive
   ```
   
   Because they are effective all sub-packages of the "airflow_shared" so 
`airflow_shared.logging` can refer to `airflow_shared.timezones` with 
`..timezones`.
   
   * The 'logging` and `timezones` are both symlinked to the same `_shared` 
package of `task-sdk` - and they are "force-included" there in task-sdk's 
pyproject.toml for sdist.
   
   Result when you build sdist package:
   
   ```
   -rw-r--r--  0 0      0         785 Feb  2  2020 
apache_airflow_task_sdk-1.1.0/tests/task_sdk/io/__init__.py
   -rw-r--r--  0 0      0       11212 Feb  2  2020 
apache_airflow_task_sdk-1.1.0/tests/task_sdk/io/test_path.py
   -rw-r--r--  0 0      0        9530 Feb  2  2020 
apache_airflow_task_sdk-1.1.0/tests/task_sdk/log/test_log.py
   -rw-r--r--  0 0      0         843 Feb  2  2020 
apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_shared/logging/__init__.py
   -rw-r--r--  0 0      0         828 Feb  2  2020 
apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_shared/logging/config.py
   -rw-r--r--  0 0      0        1016 Feb  2  2020 
apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_shared/logging/structlog.py
   -rw-r--r--  0 0      0         843 Feb  2  2020 
apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_shared/timezones/__init__.py
   -rw-r--r--  0 0      0       10070 Feb  2  2020 
apache_airflow_task_sdk-1.1.0/src/airflow/sdk/_shared/timezones/timezone.py
   ```
   
   * The imports are "clean" and predictable. In task sdk your imports will be 
`from airflow.task_sdk._shared.logging` and `from 
airflow.task_sdk._shared.timezones` .. Can't be cleaner.
   
   * I updated `setup_idea.py` to automatically add "shared" libraries to 
IntelliJ project (including tests in those folders) - so that you can run tests 
written for "internal logging" and "internal timezone" testing directly from 
the IDE 
   
   
   * Also - those "shared" libraries are now actually "real" packages (!). They 
can build locally and even released - for example in CI if we need them - and 
(purely theorethically) - we could even use them as "regular" dependencies. 
Which opens up interesting possibility - for example `airflow-ctl` could 
**actually** use those packages as regular dependencies - rather thn inlined. 
The only difference in this case will be how you declare dependencies:
      * for inlining case, the package folders should be symlinked, their 
dependencies (not the packages themselves) should be added as dependencies of 
the "including" distribution, the "apache-airflow-shared-*" dependencies should 
be aded as devel-dependencies so that pytest tests can run in all cases
      * for "regular" case - the "apache-airflow-shared-*" packages should be 
declared as regular dependencies (not dev ones)
   
   We do not have to do it of course, but it opens an interesting possibility - 
when releasing `airflow-ctl` or other standalone distributions, that will 
nicely work and `airflow-ctl` could direcly use `import airflow_shared.logging` 
without having to do all the symlinking and `_shared` sub-package. And it will 
**just work** as usual. No magic whatsoever.
   
   * Also those are "real" distributions also for development and testing - 
which means that you can do this:
   
   ```
   cd shared/logging
   uv sync
   uv run pytest
   ```
   
   And it will do what you expect it to do: venv will be updated to **only** 
contains logging (and timezone because it is dev dependency of logging) 
dependencies and it will run all the tests. This means that we can completely 
separately work on "logging" and "timezone" and any other shared library, we 
can run tests, and (most importantly) when we run tests this way, we make sure 
that logging does not use any external packages that are not declared as 
dependency - which means that it will be impossible to accidentally do "import 
airflow" in logging or timezone - because `uv sync` will make sure that there 
is no "airflow-core" available for it. This means that our CI will 
automatically verify that the shared packages are properly isolated and do not 
do "forbidden" stuff.
   
   * There are two drawbacks I am currently aware of (but extremely small and 
absolutely not blockers)
   
   1) When you refer another shared library from your shared library, you need 
to #noqa it for parent relative import - I think it's a good idea to not allow 
parent relative imports in general, but this is the exceptional case that 
should be rare and is justified to use it.
   
   ```
   from ..timezones.timezone import is_naive  # noqa: TID252
   ```
   
   2) When you work directly on the sources of such shared library, IntelliJ 
does not find and recognize those `..timezone` relative imports (but it works 
when you run tests etc. - it's only IDE being confused that .. is not **real** 
parent but a legacy namespace package.
   
   HOWEVER, it DOES work in the "including" projects where the shared libraries 
are symlinked - and this is, I guess in most cases where people will be working 
on any changes anyway - so this is really, really small thing:
   
   <img width="1163" height="238" alt="Screenshot 2025-07-19 at 09 08 52" 
src="https://github.com/user-attachments/assets/8c181a97-f54b-421f-9ad2-d40257376f09";
 />
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to