FaresDev8 opened a new issue, #53127:
URL: https://github.com/apache/airflow/issues/53127

   ### Apache Airflow Provider(s)
   
   google
   
   ### Versions of Apache Airflow Providers
   
   The issue occurs on two versions:
   - **apache-airflow-providers-google==12.0.0**
   - **apache-airflow-providers-google==15.1.0**
   
   ### Apache Airflow version
   
   2.10.5
   
   ### Operating System
   
   Debian Linux 12
   
   ### Deployment
   
   Docker-Compose
   
   ### Deployment details
   
   Docker version 27.5.1
   No tools are used other than Airflow.
   
   
   ### What happened
   
   Code references that cause the issue:
   
   
https://github.com/apache/airflow/blob/e142ab96a0ecbb953e9fb8ab6d17b4c2a7624aba/providers/google/src/airflow/providers/google/cloud/hooks/bigquery.py#L1663-L1666
   
   
https://github.com/apache/airflow/blob/e142ab96a0ecbb953e9fb8ab6d17b4c2a7624aba/providers/google/src/airflow/providers/google/cloud/hooks/bigquery.py#L2061-L2062
   
   While executing a job that creates an external table, the code above fails with the following error:
   ```
   File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/google/cloud/hooks/bigquery.py", line 2077, in _format_schema_for_description
       for field in schema["fields"]:
                    ~~~~~~^^^^^^^^^^
   KeyError: 'fields'
   ```
   
   After debugging, I found that jobs of this kind return an empty dictionary as the value of the `"schema"` key. Because the key itself exists, the check in the first code reference passes, and the second code reference (lines 2061-2062) then accesses `schema["fields"]` on that empty dictionary, which fails because there is no `"fields"` key.
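
To make the failure mode concrete, here is a minimal sketch. The function `format_schema_for_description`, its `(name, type)` output, and the `{"schema": {}}` payload are assumptions for illustration, not the provider's real code; the sketch only mirrors the `schema["fields"]` access from the traceback above. It shows that a key-existence check passes for an empty schema while the subsequent lookup still raises `KeyError`.

```python
# Hypothetical, simplified stand-in for the hook's schema formatter. It only
# mirrors the `schema["fields"]` access from the traceback; it is not the
# provider's real implementation.
def format_schema_for_description(schema: dict) -> list:
    description = []
    for field in schema["fields"]:  # raises KeyError when "fields" is absent
        description.append((field["name"], field["type"]))
    return description


# Assumed response shape for a statement such as CREATE EXTERNAL TABLE:
# the "schema" key exists, but its value is an empty dictionary.
query_results = {"schema": {}}

key_present = "schema" in query_results  # True: a key check passes
value_present = bool(query_results["schema"])  # False: the value is empty

try:
    format_schema_for_description(query_results["schema"])
    error = None
except KeyError as exc:
    error = exc

print(key_present, value_present, repr(error))  # True False KeyError('fields')
```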
   
   ### What you think should happen instead
   
   The code below should check whether the value of the `"schema"` key is non-empty, rather than only checking whether the key itself exists:
   
   
https://github.com/apache/airflow/blob/e142ab96a0ecbb953e9fb8ab6d17b4c2a7624aba/providers/google/src/airflow/providers/google/cloud/hooks/bigquery.py#L1663-L1666
   
   Checking the value instead of the key ensures that `_format_schema_for_description()` is never called without proper schema values.
   
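   To fix this, lines 1663-1666 could be changed to the following. Using `query_results.get("schema")` instead of `query_results["schema"]` keeps the check safe even if the key is missing entirely:
   
   ```python
   if query_results.get("schema"):
       self.description = _format_schema_for_description(query_results["schema"])
   else:
       self.description = []
   ```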
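
A minimal, self-contained sketch of the proposed guard follows. The names `build_description` and `format_schema_for_description` and the simplified `(name, type)` tuples are hypothetical stand-ins, not the provider's API. It shows that checking the value returns an empty description both when `"schema"` is missing and when it is an empty dictionary, and still formats a populated schema.

```python
# Hypothetical, simplified formatter; the provider's real function may build
# richer tuples, but the `schema["fields"]` access is the same.
def format_schema_for_description(schema: dict) -> list:
    return [(field["name"], field["type"]) for field in schema["fields"]]


def build_description(query_results: dict) -> list:
    # Test the *value* of "schema", not only the key. dict.get() also
    # covers responses where the key is missing entirely.
    schema = query_results.get("schema")
    if schema:
        return format_schema_for_description(schema)
    return []


print(build_description({}))  # key missing -> []
print(build_description({"schema": {}}))  # empty value -> []
print(build_description({"schema": {"fields": [{"name": "NUMBER", "type": "INTEGER"}]}}))
# -> [('NUMBER', 'INTEGER')]
```

Returning an empty list matches the `else` branch of the existing code, so queries that produce no result set get an empty cursor description instead of an error.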
   
   
   ### How to reproduce
   
   To reproduce the problem:
   1. Install Airflow and the Google provider. The exact versions matter little, because this code exists in multiple versions.
   2. Write a SQL statement that creates an external table. This ensures that no schema is returned.
   3. Execute the statement using the cursor from `BigQueryHook`:
   
   ```python
   from airflow.providers.google.cloud.hooks.bigquery import BigQueryHook

   gcp_conn_id = "google_cloud_default"  # replace with your GCP connection ID
   sql = "CREATE EXTERNAL TABLE ..."  # placeholder: the statement from step 2
   hook = BigQueryHook(gcp_conn_id=gcp_conn_id, use_legacy_sql=False)
   conn = hook.get_conn()
   cursor = conn.cursor()
   cursor.execute(operation=sql)  # fails with KeyError: 'fields'
   ```
   
   ### Anything else
   
   This problem is in the hook code itself and appears whenever anyone executes a query that returns no result set, such as creating or dropping tables, data transformation statements, and other system queries.
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   