npennequin opened a new issue, #14239:
URL: https://github.com/apache/iceberg/issues/14239

   ### Apache Iceberg version
   
   1.10.0 (latest release)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   We are using a Spark job to remove expired rows from an Iceberg table based on a Unix timestamp column (stored as an integer).
   
   The query is:
   
       DELETE FROM $table
       WHERE status = 'Deleted'
       AND deleted_date < $timestamp
   
   This works correctly when the table has `'write.delete.mode' = 'merge-on-read'`.
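   For context, a table configured along these lines matches the setup described (the schema here is a hypothetical sketch; only the `status` and `deleted_date` columns and the write-mode property come from this report):
   
   ```sql
   -- Hypothetical minimal schema; column names taken from the DELETE above
   CREATE TABLE $table (
       status       STRING,
       deleted_date INT      -- Unix timestamp stored as an integer
   )
   USING iceberg
   TBLPROPERTIES ('write.delete.mode' = 'copy-on-write');
   ```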
   
   However, when the table is in `copy-on-write` mode, the query fails with an `IllegalStateException` in the final stage of the delete, where the affected data files are rewritten:
   
   ```
   Caused by: java.lang.IllegalStateException: Not an instance of java.lang.CharSequence: 1234
        at org.apache.iceberg.data.GenericRecord.get(GenericRecord.java:138)
        at org.apache.iceberg.data.InternalRecordWrapper.get(InternalRecordWrapper.java:101)
        at org.apache.iceberg.types.Comparators$StructLikeComparator.compare(Comparators.java:121)
        at org.apache.iceberg.types.Comparators$StructLikeComparator.compare(Comparators.java:94)
        at org.apache.iceberg.util.StructLikeWrapper.equals(StructLikeWrapper.java:91)
        at java.base/java.util.HashMap.getNode(HashMap.java:568)
        at java.base/java.util.HashMap.containsKey(HashMap.java:592)
        at java.base/java.util.HashSet.contains(HashSet.java:204)
        at org.apache.iceberg.util.StructLikeSet.contains(StructLikeSet.java:61)
        at org.apache.iceberg.data.DeleteFilter.lambda$applyEqDeletes$0(DeleteFilter.java:209)
        at org.apache.iceberg.deletes.Deletes$1.shouldKeep(Deletes.java:101)
        at org.apache.iceberg.util.Filter$Iterator.shouldKeep(Filter.java:48)
        at org.apache.iceberg.io.FilterIterator.advance(FilterIterator.java:66)
        at org.apache.iceberg.io.FilterIterator.hasNext(FilterIterator.java:49)
        at org.apache.iceberg.spark.source.BaseReader.next(BaseReader.java:135)
        at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:120)
        at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:158)
        at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1(DataSourceRDD.scala:63)
        at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1$adapted(DataSourceRDD.scala:63)
        at scala.Option.exists(Option.scala:376)
        at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
        at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:168)
        at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
        at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
        at org.apache.spark.scheduler.Task.run(Task.scala:141)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
        at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
        at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
   ```
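   For what it's worth, the message matches the type guard in `GenericRecord.get`, which refuses to return a value that is not an instance of the requested Java class. The snippet below is a standalone sketch of that pattern (not the actual Iceberg source): asking for a `CharSequence` where the stored value is the `Integer` 1234 produces exactly the reported message, which suggests the equality-delete projection is reading the integer `deleted_date` value through a field it expects to hold a string.
   
   ```java
   // Standalone sketch of the type check that throws in GenericRecord.get
   // (illustration only; not the actual Iceberg source).
   public class TypeMismatchSketch {
   
       // Mirrors the guard: return the value only if it has the expected type.
       static <T> T get(Object value, Class<T> expectedClass) {
           if (value != null && !expectedClass.isInstance(value)) {
               throw new IllegalStateException(
                   "Not an instance of " + expectedClass.getName() + ": " + value);
           }
           return expectedClass.cast(value);
       }
   
       public static void main(String[] args) {
           // A string field read as CharSequence works fine.
           System.out.println(get("Deleted", CharSequence.class));
   
           // An int field (e.g. deleted_date) read as CharSequence reproduces
           // the reported message.
           try {
               get(1234, CharSequence.class);
           } catch (IllegalStateException e) {
               System.out.println(e.getMessage());
           }
       }
   }
   ```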
   
   ### Willingness to contribute
   
   - [ ] I can contribute a fix for this bug independently
   - [x] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

