flisboac commented on issue #9650: URL: https://github.com/apache/iceberg/issues/9650#issuecomment-2054171388
Well, now errors are happening on a `MERGE INTO` as well. Usage is the same as what the OP reported. Also, because I'm using PySpark, a more detailed error report is somewhat hidden behind PySpark's abstractions, so the only stacktrace immediately available is this one:

```text
Traceback (most recent call last):
  File "/mnt1/yarn/usercache/hadoop/appcache/application_1713122184304_0002/container_1713122184304_0002_01_000001/run_cdc_job.py", line 2110, in <module>
    main()
  File "/mnt1/yarn/usercache/hadoop/appcache/application_1713122184304_0002/container_1713122184304_0002_01_000001/run_cdc_job.py", line 2106, in main
    cdc_processor.run()
  File "/mnt1/yarn/usercache/hadoop/appcache/application_1713122184304_0002/container_1713122184304_0002_01_000001/run_cdc_job.py", line 344, in run
    self._do_run(execution_context)
  File "/mnt1/yarn/usercache/hadoop/appcache/application_1713122184304_0002/container_1713122184304_0002_01_000001/run_cdc_job.py", line 412, in _do_run
    self._do_run_cdc(manifest_file, now=now)
  File "/mnt1/yarn/usercache/hadoop/appcache/application_1713122184304_0002/container_1713122184304_0002_01_000001/run_cdc_job.py", line 602, in _do_run_cdc
    merge_sql_count = self._spark_session.sql(merge_sql, args=merge_sql_params).count()
  File "/mnt1/yarn/usercache/hadoop/appcache/application_1713122184304_0002/container_1713122184304_0002_01_000001/pyspark.zip/pyspark/sql/session.py", line 1631, in sql
  File "/mnt1/yarn/usercache/hadoop/appcache/application_1713122184304_0002/container_1713122184304_0002_01_000001/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
  File "/mnt1/yarn/usercache/hadoop/appcache/application_1713122184304_0002/container_1713122184304_0002_01_000001/pyspark.zip/pyspark/errors/exceptions/captured.py", line 185, in deco
pyspark.errors.exceptions.captured.IllegalArgumentException: Comparison method violates its general contract!
```

The `MERGE INTO` query looks like this:

```sql
MERGE INTO spark_catalog.DATABASE.TABLE target
USING __cdc_source_data__ source
ON (
    1=1
    AND (target.int_field_1 BETWEEN :__CDC_PDFILTER_MIN__int_field_1 AND :__CDC_PDFILTER_MAX__int_field_1)
    AND (target.int_field_2 BETWEEN :__CDC_PDFILTER_MIN__int_field_2 AND :__CDC_PDFILTER_MAX__int_field_2)
    AND (target.int_field_3 BETWEEN :__CDC_PDFILTER_MIN__int_field_3 AND :__CDC_PDFILTER_MAX__int_field_3)
    AND target.int_field_1 <=> source.int_field_1
    AND target.int_field_2 <=> source.int_field_2
    AND target.int_field_3 <=> source.int_field_3
)
WHEN MATCHED AND source.operation = 'D' THEN DELETE
WHEN MATCHED AND source.operation IN ('U', 'I') THEN UPDATE SET
    target.int_field_1 = source.int_field_1,
    target.int_field_2 = source.int_field_2,
    target.int_field_3 = source.int_field_3,
    -- A lot more field assignments here
    target.dt_geracao = cast('2024-04-14T19:20:06.774Z' as timestamp)
WHEN NOT MATCHED AND source.operation in ('I', 'U') THEN INSERT (
    int_field_1,
    int_field_2,
    int_field_3,
    -- A lot more column names here
) VALUES (
    source.int_field_1,
    source.int_field_2,
    source.int_field_3,
    -- A lot more column values here
    cast('2024-04-14T19:20:06.774Z' as timestamp)
)
```
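
On the detailed error being hidden behind PySpark's abstractions: one way to surface the Java frames is the Spark SQL option `spark.sql.pyspark.jvmStacktrace.enabled`, which is off by default. The sketch below assumes a PySpark version that accepts `args=` in `SparkSession.sql`, as in the traceback. The helper name `run_merge_with_jvm_trace` and its parameters are made up for illustration and are not part of the job in this report.

```python
import sys
from typing import Any, Mapping

# Spark SQL option that appends the JVM stack trace to PySpark exceptions.
JVM_TRACE_CONF = "spark.sql.pyspark.jvmStacktrace.enabled"


def run_merge_with_jvm_trace(spark: Any, merge_sql: str, params: Mapping[str, Any]) -> int:
    """Hypothetical helper: run a parameterized statement with JVM traces on.

    `spark` is expected to be a pyspark.sql.SparkSession. It is typed as Any
    so that this sketch does not need to import pyspark itself.
    """
    # Ask PySpark to include the JVM stack trace in captured exceptions.
    spark.conf.set(JVM_TRACE_CONF, "true")
    try:
        # Same call shape as in the traceback: named parameters go in `args`.
        return spark.sql(merge_sql, args=dict(params)).count()
    except Exception as exc:
        # With the option on, the message should carry the Java frames too.
        print(f"MERGE failed: {exc}", file=sys.stderr)
        raise
```

In a job like the one above, this would stand in for the direct `self._spark_session.sql(merge_sql, args=merge_sql_params).count()` call. The Java frames in the resulting message should show which comparator raises the `IllegalArgumentException`.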