flisboac commented on issue #9650:
URL: https://github.com/apache/iceberg/issues/9650#issuecomment-2054171388
Well, the error now happens on a `MERGE INTO` as well. Usage is the same
as what the OP reported. Because I'm using PySpark, the more detailed error
report is hidden behind PySpark's abstractions, so the only stack trace
immediately available is this one:
```text
Traceback (most recent call last):
  File "/mnt1/yarn/usercache/hadoop/appcache/application_1713122184304_0002/container_1713122184304_0002_01_000001/run_cdc_job.py", line 2110, in <module>
    main()
  File "/mnt1/yarn/usercache/hadoop/appcache/application_1713122184304_0002/container_1713122184304_0002_01_000001/run_cdc_job.py", line 2106, in main
    cdc_processor.run()
  File "/mnt1/yarn/usercache/hadoop/appcache/application_1713122184304_0002/container_1713122184304_0002_01_000001/run_cdc_job.py", line 344, in run
    self._do_run(execution_context)
  File "/mnt1/yarn/usercache/hadoop/appcache/application_1713122184304_0002/container_1713122184304_0002_01_000001/run_cdc_job.py", line 412, in _do_run
    self._do_run_cdc(manifest_file, now=now)
  File "/mnt1/yarn/usercache/hadoop/appcache/application_1713122184304_0002/container_1713122184304_0002_01_000001/run_cdc_job.py", line 602, in _do_run_cdc
    merge_sql_count = self._spark_session.sql(merge_sql, args=merge_sql_params).count()
  File "/mnt1/yarn/usercache/hadoop/appcache/application_1713122184304_0002/container_1713122184304_0002_01_000001/pyspark.zip/pyspark/sql/session.py", line 1631, in sql
  File "/mnt1/yarn/usercache/hadoop/appcache/application_1713122184304_0002/container_1713122184304_0002_01_000001/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
  File "/mnt1/yarn/usercache/hadoop/appcache/application_1713122184304_0002/container_1713122184304_0002_01_000001/pyspark.zip/pyspark/errors/exceptions/captured.py", line 185, in deco
pyspark.errors.exceptions.captured.IllegalArgumentException: Comparison method violates its general contract!
```
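For anyone trying to get past the truncated Python-side trace, here is a minimal sketch of one way to surface the JVM frames. It assumes the Spark SQL setting `spark.sql.pyspark.jvmStacktrace.enabled` is available in the Spark version in use. The helper names `enable_jvm_stacktrace` and `format_failure` are hypothetical and are not part of the job above.

```python
import traceback

# Spark SQL setting that asks PySpark to append the JVM stack trace to the
# exceptions it raises on the Python side (assumed available in this version).
JVM_STACKTRACE_CONF = "spark.sql.pyspark.jvmStacktrace.enabled"


def enable_jvm_stacktrace(spark):
    """Turn on JVM stack traces for a SparkSession-like object.

    `spark` only needs a `conf.set(key, value)` method, as SparkSession has.
    """
    spark.conf.set(JVM_STACKTRACE_CONF, "true")


def format_failure(exc):
    """Render an exception, its traceback and any chained causes as text."""
    return "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
```

With this in place, calling `enable_jvm_stacktrace(spark)` before the failing `spark.sql(...)` call and logging `format_failure(exc)` in an `except` block should show which comparator on the JVM side raised the error, assuming the setting behaves as documented.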
The `MERGE INTO` query looks like this:
```sql
MERGE INTO spark_catalog.DATABASE.TABLE target
USING __cdc_source_data__ source
ON (
    1=1
    AND (target.int_field_1 BETWEEN :__CDC_PDFILTER_MIN__int_field_1 AND :__CDC_PDFILTER_MAX__int_field_1)
    AND (target.int_field_2 BETWEEN :__CDC_PDFILTER_MIN__int_field_2 AND :__CDC_PDFILTER_MAX__int_field_2)
    AND (target.int_field_3 BETWEEN :__CDC_PDFILTER_MIN__int_field_3 AND :__CDC_PDFILTER_MAX__int_field_3)
    AND target.int_field_1 <=> source.int_field_1
    AND target.int_field_2 <=> source.int_field_2
    AND target.int_field_3 <=> source.int_field_3
)
WHEN MATCHED AND source.operation = 'D' THEN DELETE
WHEN MATCHED AND source.operation IN ('U', 'I') THEN UPDATE SET
    target.int_field_1 = source.int_field_1,
    target.int_field_2 = source.int_field_2,
    target.int_field_3 = source.int_field_3,
    -- A lot more field assignments here
    target.dt_geracao = cast('2024-04-14T19:20:06.774Z' as timestamp)
WHEN NOT MATCHED AND source.operation IN ('I', 'U') THEN INSERT
(
    int_field_1,
    int_field_2,
    int_field_3,
    -- A lot more column names here
) VALUES (
    source.int_field_1,
    source.int_field_2,
    source.int_field_3,
    -- A lot more column values here
    cast('2024-04-14T19:20:06.774Z' as timestamp)
)
```
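For reproduction purposes, the named parameters in the `ON` clause could be built along these lines. This is only a sketch: the helper `build_pdfilter_params` and the shape of its `bounds` argument are assumptions for illustration, not the actual code from `run_cdc_job.py`. Only the parameter names are taken from the query above.

```python
def build_pdfilter_params(bounds):
    """Build the named-parameter dict for the BETWEEN filters in the query.

    `bounds` maps a column name to a (min, max) pair (hypothetical input
    shape). Keys in the result match the `:__CDC_PDFILTER_MIN__<column>` and
    `:__CDC_PDFILTER_MAX__<column>` markers, without the leading colon.
    """
    params = {}
    for column, (low, high) in bounds.items():
        if low > high:
            raise ValueError(f"min is greater than max for column {column!r}")
        params[f"__CDC_PDFILTER_MIN__{column}"] = low
        params[f"__CDC_PDFILTER_MAX__{column}"] = high
    return params
```

The result would then be passed the same way the traceback shows, for example `spark.sql(merge_sql, args=build_pdfilter_params({"int_field_1": (1, 100)}))`, assuming a Spark version whose `SparkSession.sql` accepts `args`.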