jaklan commented on issue #20974:
URL: https://github.com/apache/airflow/issues/20974#issuecomment-3041781553

   First round of tests (Airflow 2.10.1, locally using `airflow standalone`):
   
   1. `**kwargs`:
   
       ```python
       def dummy_callable(**kwargs):
           print(kwargs)
       ```
   
       a. `system_site_packages=False` + `use_dill=False`:
        
       ```shell
       TypeError: cannot pickle 'module' object
       ```
       > As expected; caused by, among others, `'macros': <module 'airflow.macros' from '<some/path'>`
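       This failure is plain `pickle` behaviour, not something Airflow-specific. A dict of simple values pickles fine, but a single module object anywhere inside it (like `macros`) aborts the whole dump. Below is a minimal stdlib sketch. It uses `types` as a stand-in for `airflow.macros`, and the `try_pickle` helper is hypothetical:

       ```python
       import pickle
       import types


       def try_pickle(value):
           """Return None if `value` pickles, otherwise the TypeError message."""
           try:
               pickle.dumps(value)
           except TypeError as exc:
               return str(exc)
           return None


       # Context-like dicts: plain values are fine, a module object is not.
       print(try_pickle({"ds": "2025-07-06", "params": {"a": 1}}))  # None
       print(try_pickle({"ds": "2025-07-06", "macros": types}))
       ```

       The second call reports the same "cannot pickle 'module' object" error seen above.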
   
       b. `system_site_packages=False` + `use_dill=True` + `requirements=["apache-airflow==2.10.1", "dill==0.3.1.1"]`:
   
       ```shell
       TypeError: code() argument 13 must be str, not int
       ```
   
       > I expected that one to work, but oh well 🙃 No idea what `code()` is.
   
       <details>
       
       <summary>Traceback</summary>
       
       ```shell
       [2025-07-06, 09:48:14 UTC] {process_utils.py:186} INFO - Executing cmd: /var/folders/rr/t205jkwj18l8gddtwjvzdtpr0000gn/T/venv6oahkg8d/bin/python /var/folders/rr/t205jkwj18l8gddtwjvzdtpr0000gn/T/venv-call8v8a8wby/script.py /var/folders/rr/t205jkwj18l8gddtwjvzdtpr0000gn/T/venv-call8v8a8wby/script.in /var/folders/rr/t205jkwj18l8gddtwjvzdtpr0000gn/T/venv-call8v8a8wby/script.out /var/folders/rr/t205jkwj18l8gddtwjvzdtpr0000gn/T/venv-call8v8a8wby/string_args.txt /var/folders/rr/t205jkwj18l8gddtwjvzdtpr0000gn/T/venv-call8v8a8wby/termination.log
       [2025-07-06, 09:48:14 UTC] {process_utils.py:190} INFO - Output:
       [2025-07-06, 09:48:16 UTC] {process_utils.py:194} INFO - Traceback (most recent call last):
       [2025-07-06, 09:48:16 UTC] {process_utils.py:194} INFO -   File "/var/folders/rr/t205jkwj18l8gddtwjvzdtpr0000gn/T/venv-call8v8a8wby/script.py", line 29, in <module>
       [2025-07-06, 09:48:16 UTC] {process_utils.py:194} INFO -     arg_dict = dill.load(file)
       [2025-07-06, 09:48:16 UTC] {process_utils.py:194} INFO -                ^^^^^^^^^^^^^^^
       [2025-07-06, 09:48:16 UTC] {process_utils.py:194} INFO -   File "/var/folders/rr/t205jkwj18l8gddtwjvzdtpr0000gn/T/venv6oahkg8d/lib/python3.11/site-packages/dill/_dill.py", line 270, in load
       [2025-07-06, 09:48:16 UTC] {process_utils.py:194} INFO -     return Unpickler(file, ignore=ignore, **kwds).load()
       [2025-07-06, 09:48:16 UTC] {process_utils.py:194} INFO -            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       [2025-07-06, 09:48:16 UTC] {process_utils.py:194} INFO -   File "/var/folders/rr/t205jkwj18l8gddtwjvzdtpr0000gn/T/venv6oahkg8d/lib/python3.11/site-packages/dill/_dill.py", line 472, in load
       [2025-07-06, 09:48:16 UTC] {process_utils.py:194} INFO -     obj = StockUnpickler.load(self)
       [2025-07-06, 09:48:16 UTC] {process_utils.py:194} INFO -           ^^^^^^^^^^^^^^^^^^^^^^^^^
       [2025-07-06, 09:48:16 UTC] {process_utils.py:194} INFO - TypeError: code() argument 13 must be str, not int
       ```
   
        </details>
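       The `code()` in this error is most likely the constructor of Python code objects (`types.CodeType`, whose type name is `code`), called while dill rebuilds a pickled function. On Python 3.11, which the venv path above shows is in use, that constructor's 13th positional parameter is `qualname` (a `str`) and the 14th is `firstlineno` (an `int`). `co_qualname` is also new in 3.11. The error therefore fits a dill release that predates Python 3.11 (the pinned `dill==0.3.1.1` is one) using the older field order, so that an `int` lands in the `qualname` slot. That diagnosis is an assumption and has not been verified against this setup. The stdlib check below confirms the two facts it relies on:

       ```python
       import sys
       import types


       def code_type_name():
           """Name of the type behind `code()` in the traceback."""
           code_obj = (lambda: None).__code__
           assert type(code_obj) is types.CodeType
           return type(code_obj).__name__


       def has_qualname_field():
           """True on Python 3.11+, where code objects gained `co_qualname`."""
           return hasattr((lambda: None).__code__, "co_qualname")


       print(code_type_name())  # code
       print(sys.version_info[:2], has_qualname_field())
       ```

       If that reading is right, the fix would be a dill version that supports the interpreter on both sides (worker and venv), rather than any change to the callable.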
   
   2. `get_current_context()`:
   
       ```python
       def dummy_callable():
           from airflow.operators.python import get_current_context
           context = get_current_context()
           print(context)
       ```
   
       a. `system_site_packages=False` + `use_dill=False` + `requirements=["apache-airflow==2.10.1"]`:
   
       ```shell
       airflow.exceptions.AirflowException: Current context was requested but no context was found! Are you running within an airflow task?
       ```
   
       b. `system_site_packages=False` + `use_dill=True` + `requirements=["apache-airflow==2.10.1", "dill==0.3.1.1"]`:
   
       ```shell
       airflow.exceptions.AirflowException: Current context was requested but no context was found! Are you running within an airflow task?
       ```
   
       > Based on the discussion above, both are expected, I guess.
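       The log in 1b suggests why. The callable runs in a separate interpreter (`venv.../bin/python script.py ...`), so anything the worker process only holds in memory does not exist there. Below is a minimal stdlib analogy. It assumes the current context lives in a process-local variable; the `_CURRENT_CONTEXT` list is a hypothetical stand-in, not Airflow's code:

       ```python
       import subprocess
       import sys

       # Hypothetical stand-in for the worker's in-memory "current context".
       _CURRENT_CONTEXT = [{"ds": "2025-07-06"}]


       def run_in_fresh_interpreter(code):
           """Run `code` in a new Python process and return its stdout."""
           result = subprocess.run(
               [sys.executable, "-c", code], capture_output=True, text=True, check=True
           )
           return result.stdout.strip()


       print(_CURRENT_CONTEXT)  # [{'ds': '2025-07-06'}]: visible in this process
       # The child process starts from scratch, so the parent's variable is absent.
       print(run_in_fresh_interpreter("print(globals().get('_CURRENT_CONTEXT'))"))  # None
       ```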
   
   3. Individual context parameters:
   
       ```python
       def dummy_callable(ds, dag, task, ti, params):  # ... plus other context params
           print(ds)
           print(dag)
           ...  # and so on for the remaining params
       ```
   
       a. `system_site_packages=False` + `use_dill=False`:
        
       - what works: strings (e.g. `ds`), integers (e.g. `expanded_ti_count`), dicts (e.g. `params`), lists (e.g. `inlets`), bools (e.g. `test_mode`)
       - what doesn't work due to `TypeError: cannot pickle 'module' object`: `dag`, `task`, `macros`, `dag_run`
       - what doesn't work due to `TypeError: generate_report() missing 1 required positional argument`: `task_instance`, `ti`, `conn`, `var`
       - what doesn't work due to `ModuleNotFoundError: No module named 'airflow'`: `conf`
       - what doesn't work due to `ModuleNotFoundError: No module named 'lazy_object_proxy'`: `triggering_dataset_events`
       - what doesn't work due to `ModuleNotFoundError: No module named 'pendulum'`: `pendulum.DateTime` values (e.g. `logical_date`)
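       Because `ti`, `task_instance`, `conn` and `var` only fail once the task runs, a signature check can flag them earlier. The sketch below uses a hypothetical `unsupported_context_params` helper. Its `NEVER_PASSED` set is copied from the results above, not from Airflow's source, and the `generate_report` callable is only illustrative:

       ```python
       import inspect

       # Copied from test 3a above (Airflow 2.10.1): these names never reach the venv
       # callable, so declaring them fails with "missing 1 required positional argument".
       NEVER_PASSED = frozenset({"ti", "task_instance", "conn", "var"})


       def unsupported_context_params(func):
           """Return, sorted, the parameters of `func` that will never be passed."""
           params = inspect.signature(func).parameters
           return sorted(name for name in params if name in NEVER_PASSED)


       def generate_report(ds, params, ti):  # illustrative callable
           print(ds, params, ti)


       print(unsupported_context_params(generate_report))  # ['ti']
       ```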
       
       b. `system_site_packages=False` + `use_dill=False` + `requirements=["apache-airflow==2.10.1", "pendulum==3.0.0", "lazy-object-proxy==1.10.0"]`:
   
       - what started to work: `conf`, `triggering_dataset_events`, `pendulum.DateTime` values (e.g. `logical_date`)
       - the rest without changes
   
       c. `system_site_packages=False` + `use_dill=True` + `requirements=["apache-airflow==2.10.1", "pendulum==3.0.0", "lazy-object-proxy==1.10.0", "dill==0.3.1.1"]`:
   
       - what started to work: `dag`, `task`, `macros`, `dag_run`
       - the rest without changes
   
       > Nothing surprising here
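       Results a to c can be condensed into a lookup from parameter names to the venv settings they need. The `plan_venv` helper below is hypothetical. Its tables cover only the parameters tested here and reuse the same pins, so treat it as a summary of these observations, not of Airflow's rules:

       ```python
       import inspect

       # Derived from tests 3a-3c above (Airflow 2.10.1); partial by design.
       NEEDS_AIRFLOW = {"conf", "triggering_dataset_events", "dag", "task", "macros", "dag_run"}
       NEEDS_PENDULUM = {"logical_date"}
       NEEDS_LAZY_OBJECT_PROXY = {"triggering_dataset_events"}
       NEEDS_DILL = {"dag", "task", "macros", "dag_run"}


       def plan_venv(func):
           """Return (requirements, use_dill) for the context params `func` declares."""
           names = set(inspect.signature(func).parameters)
           requirements = []
           if names & NEEDS_AIRFLOW:
               requirements.append("apache-airflow==2.10.1")
           if names & NEEDS_PENDULUM:
               requirements.append("pendulum==3.0.0")
           if names & NEEDS_LAZY_OBJECT_PROXY:
               requirements.append("lazy-object-proxy==1.10.0")
           use_dill = bool(names & NEEDS_DILL)
           if use_dill:
               requirements.append("dill==0.3.1.1")
           return requirements, use_dill


       def dummy_callable(ds, dag, logical_date):
           print(ds, dag, logical_date)


       print(plan_venv(dummy_callable))
       # (['apache-airflow==2.10.1', 'pendulum==3.0.0', 'dill==0.3.1.1'], True)
       ```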
   
   So to sum up:
   1. I was surprised that `**kwargs` didn't work with `dill` enabled, and the error was very confusing.
   2. `get_current_context()` indeed never works in the callable of the `PythonVirtualenvOperator`.
   3. Individual context parameters work as expected; it's just important to a) specify the required dependencies in `requirements` and b) remember that some variables, like `task_instance` or `var`, are simply never passed.
   
   If the issue with `**kwargs` and `dill` mentioned above is not solvable, I would say individual context params are the way to go as the recommendation. The fact that `ti` and `var` are not passed, and the need to add `apache-airflow`, `pendulum` and `lazy_object_proxy` (or to enable `system_site_packages`), are both already mentioned in the [docs](https://airflow.apache.org/docs/apache-airflow-providers-standard/stable/operators/python.html#id1):
   
   > Unfortunately, Airflow does not support serializing `var`, `ti` and `task_instance` due to incompatibilities with the underlying library. For Airflow context variables make sure that you either have access to Airflow through setting `system_site_packages` to `True` or add `apache-airflow` to the `requirements` argument. Otherwise you won’t have access to the most context variables of Airflow in `op_kwargs`. If you want the context related to datetime objects like `data_interval_start` you can add `pendulum` and `lazy_object_proxy`.
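   The recommendation can be put together as a venv task that asks only for the context values it needs. In the sketch below, the callable body and `task_id` are hypothetical, and the other operator arguments mirror test 3b. Airflow is imported inside `build_task()` so the file loads without Airflow installed:

   ```python
   def generate_report(ds, params, logical_date=None):
       """Venv callable that declares only the context values it uses."""
       # Any imports this callable needs belong here: it runs in a separate
       # interpreter. `logical_date` needs `pendulum` in `requirements`.
       return f"report for {ds} with {params} (logical_date={logical_date})"


   def build_task():
       # Imported here so this module can be loaded without Airflow installed.
       from airflow.operators.python import PythonVirtualenvOperator

       return PythonVirtualenvOperator(
           task_id="generate_report",
           python_callable=generate_report,
           requirements=[
               "apache-airflow==2.10.1",
               "pendulum==3.0.0",
               "lazy-object-proxy==1.10.0",
           ],
           system_site_packages=False,
           use_dill=False,
       )


   print(generate_report("2025-07-06", {"a": 1}))
   # report for 2025-07-06 with {'a': 1} (logical_date=None)
   ```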

