jaklan commented on issue #20974:
URL: https://github.com/apache/airflow/issues/20974#issuecomment-3041781553
First round of tests (Airflow 2.10.1, locally using `airflow standalone`):
1. `**kwargs`:
```python
def dummy_callable(**kwargs):
    print(kwargs)
```
a. `system_site_packages=False` + `use_dill=False`:
```shell
TypeError: cannot pickle 'module' object
```
> As expected, caused by, among others, `'macros': <module 'airflow.macros' from '<some/path>'>`
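This failure can be reproduced without Airflow. The sketch below assumes the callable's keyword arguments are serialized as one dict with the standard `pickle` module, which is what the error message suggests. A single module object among otherwise plain values is enough to make the whole payload fail. The stdlib `json` module stands in for `airflow.macros`, and the helper `try_pickle` and the dict keys are hypothetical.

```python
import json
import pickle


def try_pickle(payload):
    """Hypothetical helper: None if payload pickles, else the error message."""
    try:
        pickle.dumps(payload)
    except TypeError as exc:
        return str(exc)
    return None


# Plain values (like ds or params) pickle fine on their own.
plain = {"ds": "2025-07-06", "params": {"x": 1}, "test_mode": False}

# One module-valued entry (json stands in for airflow.macros)
# is enough to make pickling of the whole dict fail.
with_module = dict(plain, macros=json)

print(try_pickle(plain))        # None
print(try_pickle(with_module))  # cannot pickle 'module' object
```

Because `**kwargs` receives the whole context as one dict, the one bad value sinks every other key as well.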
b. `system_site_packages=False` + `use_dill=True` + `requirements=["apache-airflow==2.10.1", "dill==0.3.1.1"]`:
```shell
TypeError: code() argument 13 must be str, not int
```
> I expected that one to work, but oh well 🙃. No idea what `code()` is.
<details>
<summary>Traceback</summary>

```shell
[2025-07-06, 09:48:14 UTC] {process_utils.py:186} INFO - Executing cmd: /var/folders/rr/t205jkwj18l8gddtwjvzdtpr0000gn/T/venv6oahkg8d/bin/python /var/folders/rr/t205jkwj18l8gddtwjvzdtpr0000gn/T/venv-call8v8a8wby/script.py /var/folders/rr/t205jkwj18l8gddtwjvzdtpr0000gn/T/venv-call8v8a8wby/script.in /var/folders/rr/t205jkwj18l8gddtwjvzdtpr0000gn/T/venv-call8v8a8wby/script.out /var/folders/rr/t205jkwj18l8gddtwjvzdtpr0000gn/T/venv-call8v8a8wby/string_args.txt /var/folders/rr/t205jkwj18l8gddtwjvzdtpr0000gn/T/venv-call8v8a8wby/termination.log
[2025-07-06, 09:48:14 UTC] {process_utils.py:190} INFO - Output:
[2025-07-06, 09:48:16 UTC] {process_utils.py:194} INFO - Traceback (most recent call last):
[2025-07-06, 09:48:16 UTC] {process_utils.py:194} INFO - File "/var/folders/rr/t205jkwj18l8gddtwjvzdtpr0000gn/T/venv-call8v8a8wby/script.py", line 29, in <module>
[2025-07-06, 09:48:16 UTC] {process_utils.py:194} INFO - arg_dict = dill.load(file)
[2025-07-06, 09:48:16 UTC] {process_utils.py:194} INFO - ^^^^^^^^^^^^^^^
[2025-07-06, 09:48:16 UTC] {process_utils.py:194} INFO - File "/var/folders/rr/t205jkwj18l8gddtwjvzdtpr0000gn/T/venv6oahkg8d/lib/python3.11/site-packages/dill/_dill.py", line 270, in load
[2025-07-06, 09:48:16 UTC] {process_utils.py:194} INFO - return Unpickler(file, ignore=ignore, **kwds).load()
[2025-07-06, 09:48:16 UTC] {process_utils.py:194} INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-06, 09:48:16 UTC] {process_utils.py:194} INFO - File "/var/folders/rr/t205jkwj18l8gddtwjvzdtpr0000gn/T/venv6oahkg8d/lib/python3.11/site-packages/dill/_dill.py", line 472, in load
[2025-07-06, 09:48:16 UTC] {process_utils.py:194} INFO - obj = StockUnpickler.load(self)
[2025-07-06, 09:48:16 UTC] {process_utils.py:194} INFO - ^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-06, 09:48:16 UTC] {process_utils.py:194} INFO - TypeError: code() argument 13 must be str, not int
```
</details>
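A plausible explanation for the confusing message, offered as an assumption and not verified against this run: `code()` is the constructor of `types.CodeType`, the type of compiled function bodies, which `dill` has to rebuild when it unpickles a function by value. Since Python 3.11 that constructor expects a `qualname` string as its 13th positional argument, whereas the 3.8 to 3.10 layout put `firstlineno` (an int) in that slot. `dill==0.3.1.1` was released long before Python 3.11 (the venv above runs `python3.11`), so it presumably still uses the old layout. The sketch feeds the old layout to the current constructor by hand; on Python 3.11 or newer it reproduces the exact error text. `sample` and `build_with_legacy_layout` are hypothetical names for illustration only.

```python
import sys
import types


def sample(x):
    return x + 1


def build_with_legacy_layout(co):
    """Hypothetical helper: call code() with the pre-3.11 positional layout."""
    legacy_args = (
        co.co_argcount,
        co.co_posonlyargcount,
        co.co_kwonlyargcount,
        co.co_nlocals,
        co.co_stacksize,
        co.co_flags,
        co.co_code,
        co.co_consts,
        co.co_names,
        co.co_varnames,
        co.co_filename,
        co.co_name,
        co.co_firstlineno,  # argument 13 in the old layout: an int
        b"",                # line table placeholder; never reached on 3.11+
        co.co_freevars,
        co.co_cellvars,
    )
    try:
        types.CodeType(*legacy_args)
    except TypeError as exc:
        return str(exc)
    return None


if sys.version_info >= (3, 11):
    # On 3.11+ argument 13 must be the qualname (a str), so this prints:
    # code() argument 13 must be str, not int
    print(build_with_legacy_layout(sample.__code__))
```

If this is indeed the cause, a `dill` release with Python 3.11 support in `requirements` might behave differently, but that is untested here.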
2. `get_current_context()`:
```python
def dummy_callable():
    from airflow.operators.python import get_current_context

    context = get_current_context()
    print(context)
```
a. `system_site_packages=False` + `use_dill=False` + `requirements=["apache-airflow==2.10.1"]`:
```shell
airflow.exceptions.AirflowException: Current context was requested but no context was found! Are you running within an airflow task?
```
b. `system_site_packages=False` + `use_dill=True` + `requirements=["apache-airflow==2.10.1", "dill==0.3.1.1"]`:
```shell
airflow.exceptions.AirflowException: Current context was requested but no context was found! Are you running within an airflow task?
```
> Based on the discussion above, both are expected, I guess.
3. Individual context parameters:
```python
def dummy_callable(ds, dag, task, ti, params, ...):
    print(ds)
    print(dag)
    ...
```
a. `system_site_packages=False` + `use_dill=False`:
- what works: strings (e.g. `ds`), integers (e.g. `expanded_ti_count`), dicts (e.g. `params`), lists (e.g. `inlets`), bools (e.g. `test_mode`)
- what doesn't work due to `TypeError: cannot pickle 'module' object`: `dag`, `task`, `macros`, `dag_run`
- what doesn't work due to `TypeError: generate_report() missing 1 required positional argument`: `task_instance`, `ti`, `conn`, `var`
- what doesn't work due to `ModuleNotFoundError: No module named 'airflow'`: `conf`
- what doesn't work due to `ModuleNotFoundError: No module named 'lazy_object_proxy'`: `triggering_dataset_events`
- what doesn't work due to `ModuleNotFoundError: No module named 'pendulum'`: `pendulum.DateTime` values (e.g. `logical_date`)
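The manual triage above can be approximated by a small helper that pickles each value separately and groups the keys by outcome. This is only a sketch, assuming plain `pickle` on the host side: it catches the `cannot pickle 'module' object` group, but not the failures that only appear inside the virtualenv when unpickling (`ModuleNotFoundError`) or the keys that are never passed at all. The helper `triage_context` and the fake context values are hypothetical.

```python
import pickle
import types
from collections import defaultdict


def triage_context(context):
    """Hypothetical helper: group keys by whether pickle.dumps() succeeds."""
    groups = defaultdict(list)
    for key, value in context.items():
        try:
            pickle.dumps(value)
        except Exception as exc:  # e.g. TypeError or pickle.PicklingError
            groups[type(exc).__name__].append(key)
        else:
            groups["ok"].append(key)
    return dict(groups)


# Hypothetical context: plain values plus one module-valued entry.
fake_context = {
    "ds": "2025-07-06",
    "expanded_ti_count": 1,
    "params": {"env": "dev"},
    "inlets": [],
    "test_mode": False,
    "macros": types.ModuleType("fake_macros"),
}

print(triage_context(fake_context))
# {'ok': ['ds', 'expanded_ti_count', 'params', 'inlets', 'test_mode'], 'TypeError': ['macros']}
```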
b. `system_site_packages=False` + `use_dill=False` + `requirements=["apache-airflow==2.10.1", "pendulum==3.0.0", "lazy-object-proxy==1.10.0"]`:
- what started to work: `conf`, `triggering_dataset_events`, `pendulum.DateTime` values (e.g. `logical_date`)
- the rest unchanged
c. `system_site_packages=False` + `use_dill=True` + `requirements=["apache-airflow==2.10.1", "pendulum==3.0.0", "lazy-object-proxy==1.10.0", "dill==0.3.1.1"]`:
- what started to work: `dag`, `task`, `macros`, `dag_run`
- the rest unchanged

> Nothing surprising here
So to sum up:
1. I was surprised that `**kwargs` didn't work with `dill` enabled, and the error was very confusing.
2. `get_current_context()` indeed never works in the callable of the `PythonVirtualenvOperator`.
3. Individual context parameters work as expected; it's just important to a) specify the required dependencies in `requirements` and b) remember that some variables, like `task_instance` or `var`, are simply never passed.
If the mentioned issue with `**kwargs` and `dill` is not solvable, I would say individual context params are the way to go as a recommendation. Both the fact that `ti` and `var` are not passed and the need to add `apache-airflow`, `pendulum` and `lazy_object_proxy` (or to enable `system_site_packages`) are already mentioned in the [docs](https://airflow.apache.org/docs/apache-airflow-providers-standard/stable/operators/python.html#id1):
> Unfortunately, Airflow does not support serializing `var`, `ti` and
`task_instance` due to incompatibilities with the underlying library. For
Airflow context variables make sure that you either have access to Airflow
through setting `system_site_packages` to `True` or add `apache-airflow` to the
`requirements` argument. Otherwise you won’t have access to the most context
variables of Airflow in `op_kwargs`. If you want the context related to
datetime objects like `data_interval_start` you can add `pendulum` and
`lazy_object_proxy`.