dabla opened a new issue, #3284:
URL: https://github.com/apache/arrow-adbc/issues/3284

   ### What happened?
   
   Hello, I'm an Apache Airflow contributor working together with @zeroshade to 
implement an [Apache Arrow provider for 
Airflow](https://github.com/apache/airflow/pull/52330/). I've been implementing 
an 
[ADBCHook](https://github.com/apache/airflow/pull/52330/files#diff-8b068a49eea0ba56d52c9e4e90dfe6f5a315ed6cbb4ec2055e0d0046af71090a)
 which allows Airflow to integrate with ADBC as this is how Airflow integrates 
with different databases like ODBC or JDBC.  While testing ADBC in Airflow with 
a Postgres database I've came across an issue with JSONB columns in tables.  If 
I ommit the JSOB column from the INSERT statement everything works, I don't 
have any issues with the native [Postgres](https://www.psycopg.org/) driver and 
thus native PostgresHook in Airflow.
   
   When logging the schema of the table, I see the JSONB column is typed as a 
pyarrow string:
   
   ```
   table_schema: index: int64
   node_type: string
   functional_key: string
   id: string
   name: string
   valid_from_date: timestamp[us]
   valid_to_date: timestamp[us]
   micro_functions: string
   ```
   
   This SQL statement to perform inserts are generated dynamically by Airflow 
depending on the targeted database (e.g. dialect) as this is done the same way 
with other integrations (e.g. ODBC/JDBC) as we also have to support UPSERT 
statements which isn't supported by the adbc_ingest method, but even while 
doing a pure INSERT instead of an UPSERT or by using the adbc_ingest method, 
the same error arises when a JSONB column is present in the table.
   
   ### Stack Trace
   
   ```
   ProgrammingError: INVALID_ARGUMENT: Failed to prepare query: ERROR:  column 
"micro_functions" is of type jsonb but expression is of type text
   LINE 1: ...h_switch_partner_id) VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,...
                                                                ^
   HINT:  You will need to rewrite or cast the expression.
   
   Query was:INSERT INTO nodes.node (index, node_type, functional_key, id, 
name, valid_from_date, valid_to_date, micro_functions) VALUES 
($1,$2,$3,$4,$5,$6,$7,$8) ON CONFLICT (id, valid_from_date) DO UPDATE SET index 
= excluded.index, node_type = excluded.node_type, functional_key = 
excluded.functional_key, name = excluded.name, valid_to_date = 
excluded.valid_to_date, micro_functions = excluded.micro_functions. SQLSTATE: 
42804
   File 
"/usr/local/lib/python3.12/site-packages/airflow/sdk/execution_time/task_runner.py",
 line 867 in run
   
   File 
"/usr/local/lib/python3.12/site-packages/airflow/sdk/execution_time/task_runner.py",
 line 1159 in _execute_task
   
   File 
"/usr/local/lib/python3.12/site-packages/airflow/sdk/bases/operator.py", line 
397 in wrapper
   
   File "/usr/local/airflow/includes/common/sql/sql.py", line 178 in execute
   
   File "/usr/local/airflow/plugins/infrabel/hooks/adbc.py", line 385 in 
insert_rows
   
   File 
"/usr/local/lib64/python3.12/site-packages/adbc_driver_manager/dbapi.py", line 
771 in executemany
   
   File "/usr/local/airflow/adbc_driver_manager/_lib.pyx", line 1674 in 
adbc_driver_manager._lib._blocking_call_impl
   
   File "/usr/local/airflow/adbc_driver_manager/_lib.pyx", line 1667 in 
adbc_driver_manager._lib._blocking_call_impl
   
   File "/usr/local/airflow/adbc_driver_manager/_lib.pyx", line 1391 in 
adbc_driver_manager._lib.AdbcStatement.execute_update
   
   File "/usr/local/airflow/adbc_driver_manager/_lib.pyx", line 261 in 
adbc_driver_manager._lib.check_error
   ```
   
   ### How can we reproduce the bug?
   
   ```
   CREATE TABLE nodes.node (
        "index" int8 NULL,
        node_type text NULL,
        functional_key text NULL,
        id text NOT NULL,
        "name" text NULL,
        valid_from_date timestamp NOT NULL,
        valid_to_date timestamp NULL,
        micro_functions jsonb NULL,
        CONSTRAINT int_node_pkey PRIMARY KEY (id, valid_from_date)
   );
   CREATE INDEX node_valid_from_date_valid_to_date_idx ON nodes.node USING 
btree (valid_from_date, valid_to_date);
   CREATE INDEX ix_nodes_node_index ON nodes.node USING btree (index);
   ```
   
   ### Environment/Setup
   
   Tested on Linux with latest pyarrow 21.0.0 and adbc-driver-postgresql 1.7.0.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to