dabla opened a new issue, #3284: URL: https://github.com/apache/arrow-adbc/issues/3284
### What happened? Hello, I'm an Apache Airflow contributor working together with @zeroshade to implement an [Apache Arrow provider for Airflow](https://github.com/apache/airflow/pull/52330/). I've been implementing an [ADBCHook](https://github.com/apache/airflow/pull/52330/files#diff-8b068a49eea0ba56d52c9e4e90dfe6f5a315ed6cbb4ec2055e0d0046af71090a) which allows Airflow to integrate with ADBC as this is how Airflow integrates with different databases like ODBC or JDBC. While testing ADBC in Airflow with a Postgres database I've came across an issue with JSONB columns in tables. If I ommit the JSOB column from the INSERT statement everything works, I don't have any issues with the native [Postgres](https://www.psycopg.org/) driver and thus native PostgresHook in Airflow. When logging the schema of the table, I see the JSONB column is typed as a pyarrow string: ``` table_schema: index: int64 node_type: string functional_key: string id: string name: string valid_from_date: timestamp[us] valid_to_date: timestamp[us] micro_functions: string ``` This SQL statement to perform inserts are generated dynamically by Airflow depending on the targeted database (e.g. dialect) as this is done the same way with other integrations (e.g. ODBC/JDBC) as we also have to support UPSERT statements which isn't supported by the adbc_ingest method, but even while doing a pure INSERT instead of an UPSERT or by using the adbc_ingest method, the same error arises when a JSONB column is present in the table. ### Stack Trace ``` ProgrammingError: INVALID_ARGUMENT: Failed to prepare query: ERROR: column "micro_functions" is of type jsonb but expression is of type text LINE 1: ...h_switch_partner_id) VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,... ^ HINT: You will need to rewrite or cast the expression. Query was:INSERT INTO nodes.node (index, node_type, functional_key, id, name, valid_from_date, valid_to_date, micro_functions) VALUES ($1,$2,$3,$4,$5,$6,$7,$8) ON CONFLICT (id, valid_from_date) DO UPDATE SET index = excluded.index, node_type = excluded.node_type, functional_key = excluded.functional_key, name = excluded.name, valid_to_date = excluded.valid_to_date, micro_functions = excluded.micro_functions. SQLSTATE: 42804 File "/usr/local/lib/python3.12/site-packages/airflow/sdk/execution_time/task_runner.py", line 867 in run File "/usr/local/lib/python3.12/site-packages/airflow/sdk/execution_time/task_runner.py", line 1159 in _execute_task File "/usr/local/lib/python3.12/site-packages/airflow/sdk/bases/operator.py", line 397 in wrapper File "/usr/local/airflow/includes/common/sql/sql.py", line 178 in execute File "/usr/local/airflow/plugins/infrabel/hooks/adbc.py", line 385 in insert_rows File "/usr/local/lib64/python3.12/site-packages/adbc_driver_manager/dbapi.py", line 771 in executemany File "/usr/local/airflow/adbc_driver_manager/_lib.pyx", line 1674 in adbc_driver_manager._lib._blocking_call_impl File "/usr/local/airflow/adbc_driver_manager/_lib.pyx", line 1667 in adbc_driver_manager._lib._blocking_call_impl File "/usr/local/airflow/adbc_driver_manager/_lib.pyx", line 1391 in adbc_driver_manager._lib.AdbcStatement.execute_update File "/usr/local/airflow/adbc_driver_manager/_lib.pyx", line 261 in adbc_driver_manager._lib.check_error ``` ### How can we reproduce the bug? ``` CREATE TABLE nodes.node ( "index" int8 NULL, node_type text NULL, functional_key text NULL, id text NOT NULL, "name" text NULL, valid_from_date timestamp NOT NULL, valid_to_date timestamp NULL, micro_functions jsonb NULL, CONSTRAINT int_node_pkey PRIMARY KEY (id, valid_from_date) ); CREATE INDEX node_valid_from_date_valid_to_date_idx ON nodes.node USING btree (valid_from_date, valid_to_date); CREATE INDEX ix_nodes_node_index ON nodes.node USING btree (index); ``` ### Environment/Setup Tested on Linux with latest pyarrow 21.0.0 and adbc-driver-postgresql 1.7.0. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
