npennequin opened a new issue, #14239:
URL: https://github.com/apache/iceberg/issues/14239
### Apache Iceberg version
1.10.0 (latest release)
### Query engine
Spark
### Please describe the bug 🐞
We are using a Spark job to remove expired rows from an Iceberg table based
on a Unix timestamp column (stored as an integer).
The query is:
DELETE FROM $table
WHERE status = 'Deleted'
AND deleted_date < $timestamp
This works correctly when the table has `'write.delete.mode' = 'merge-on-read'`.
However, with the table in `copy-on-write` mode, the query fails with an
`IllegalStateException` in the last stage of the deletion, which rewrites the data:
```
Caused by: java.lang.IllegalStateException: Not an instance of java.lang.CharSequence: 1234
at org.apache.iceberg.data.GenericRecord.get(GenericRecord.java:138)
at org.apache.iceberg.data.InternalRecordWrapper.get(InternalRecordWrapper.java:101)
at org.apache.iceberg.types.Comparators$StructLikeComparator.compare(Comparators.java:121)
at org.apache.iceberg.types.Comparators$StructLikeComparator.compare(Comparators.java:94)
at org.apache.iceberg.util.StructLikeWrapper.equals(StructLikeWrapper.java:91)
at java.base/java.util.HashMap.getNode(HashMap.java:568)
at java.base/java.util.HashMap.containsKey(HashMap.java:592)
at java.base/java.util.HashSet.contains(HashSet.java:204)
at org.apache.iceberg.util.StructLikeSet.contains(StructLikeSet.java:61)
at org.apache.iceberg.data.DeleteFilter.lambda$applyEqDeletes$0(DeleteFilter.java:209)
at org.apache.iceberg.deletes.Deletes$1.shouldKeep(Deletes.java:101)
at org.apache.iceberg.util.Filter$Iterator.shouldKeep(Filter.java:48)
at org.apache.iceberg.io.FilterIterator.advance(FilterIterator.java:66)
at org.apache.iceberg.io.FilterIterator.hasNext(FilterIterator.java:49)
at org.apache.iceberg.spark.source.BaseReader.next(BaseReader.java:135)
at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:120)
at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:158)
at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1(DataSourceRDD.scala:63)
at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1$adapted(DataSourceRDD.scala:63)
at scala.Option.exists(Option.scala:376)
at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:168)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
at org.apache.spark.scheduler.Task.run(Task.scala:141)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
```
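The message at the top of the trace has the shape produced when a value is read by position together with an expected Java class, and the stored value is not an instance of that class. The block below is a minimal, self-contained sketch of that pattern only. It is not Iceberg's code: `TypeCheckedGetDemo` and its `get` helper are hypothetical names, and the `(status, deleted_date)` row layout is an assumption taken from the columns in the query. Whether a position or schema mismatch in the equality-delete path is the actual cause here is not confirmed.

```java
public class TypeCheckedGetDemo {
  // Hypothetical stand-in for a class-checked positional getter (not Iceberg's API).
  // Returns the value at `pos` if it is null or an instance of `javaClass`,
  // otherwise throws with a message shaped like the one in the trace.
  static <T> T get(Object[] row, int pos, Class<T> javaClass) {
    Object value = row[pos];
    if (value == null || javaClass.isInstance(value)) {
      return javaClass.cast(value);
    }
    throw new IllegalStateException(
        "Not an instance of " + javaClass.getName() + ": " + value);
  }

  public static void main(String[] args) {
    // Assumed layout: position 0 is status (string), position 1 is deleted_date (int).
    Object[] row = {"Deleted", 1234};

    // Reading each position with the matching class succeeds.
    CharSequence status = get(row, 0, CharSequence.class);
    Integer deletedDate = get(row, 1, Integer.class);
    System.out.println(status + " " + deletedDate);

    // Reading the integer position as a string fails with the same kind of message.
    try {
      get(row, 1, CharSequence.class);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

If the real failure follows this pattern, the comparator would be reading a field of the data row or the delete row with a type that belongs to a different column, which would be consistent with the integer value `1234` being checked against `java.lang.CharSequence`.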
### Willingness to contribute
- [ ] I can contribute a fix for this bug independently
- [x] I would be willing to contribute a fix for this bug with guidance from
the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time