tom-s-powell commented on PR #14750: URL: https://github.com/apache/iceberg/pull/14750#issuecomment-3616594751
We are also going to need https://github.com/apache/iceberg/pull/14751 for a fully testable change. The test I would like to add to https://github.com/apache/iceberg/blob/main/spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/sql/TestTableEncryption.java is:

```java
@TestTemplate
public void testDropTableWithPurge() {
  List<Object[]> dataFileTable =
      sql("SELECT file_path FROM %s.%s", tableName, MetadataTableType.ALL_DATA_FILES);
  List<String> dataFiles =
      dataFileTable.stream().map(row -> (String) row[0]).collect(Collectors.toList());
  assertThat(dataFiles).isNotEmpty();
  assertThat(dataFiles)
      .allSatisfy(filePath -> assertThat(localInput(filePath).exists()).isTrue());

  sql("DROP TABLE %s PURGE", tableName);

  assertThatThrownBy(() -> catalog.loadTable(tableIdent))
      .isInstanceOf(NoSuchTableException.class);
  assertThat(dataFiles)
      .allSatisfy(filePath -> assertThat(localInput(filePath).exists()).isFalse());
}
```

Without the changes in https://github.com/apache/iceberg/pull/14751 we get the following:

```
Caused by: java.lang.IllegalStateException: Cannot return the encryption keys after serialization
	at org.apache.iceberg.encryption.StandardEncryptionManager.encryptionKeys(StandardEncryptionManager.java:170)
	at org.apache.iceberg.encryption.EncryptionUtil.decryptManifestListKeyMetadata(EncryptionUtil.java:132)
	at org.apache.iceberg.BaseManifestListFile.decryptKeyMetadata(BaseManifestListFile.java:47)
	at org.apache.iceberg.encryption.EncryptingFileIO.newInputFile(EncryptingFileIO.java:115)
	at org.apache.iceberg.AllManifestsTable$ManifestListReadTask.file(AllManifestsTable.java:223)
	at org.apache.iceberg.AllManifestsTable$ManifestListReadTask.file(AllManifestsTable.java:160)
	at org.apache.iceberg.spark.source.RowDataReader.open(RowDataReader.java:87)
	at org.apache.iceberg.spark.source.RowDataReader.open(RowDataReader.java:43)
```

If we have the changes from https://github.com/apache/iceberg/pull/14751 without the changes in this PR we get the following:

```
org.apache.iceberg.exceptions.RuntimeIOException: Failed to open file: file:/var/folders/2v/qzfl1x_137l3dyycx4j_d_29081c3k/T/hive5130984441689838261/table/metadata/snap-1670846233047793002-1-e953eed2-338c-45fc-8060-6722e78ea54a.avro
	at app//org.apache.iceberg.avro.AvroIterable.newFileReader(AvroIterable.java:113)
	at app//org.apache.iceberg.avro.AvroIterable.iterator(AvroIterable.java:78)
	at app//org.apache.iceberg.io.CloseableIterable$7$1.<init>(CloseableIterable.java:205)
	at app//org.apache.iceberg.io.CloseableIterable$7.iterator(CloseableIterable.java:204)
	at app//org.apache.iceberg.io.CloseableIterable$7$1.<init>(CloseableIterable.java:205)
	at app//org.apache.iceberg.io.CloseableIterable$7.iterator(CloseableIterable.java:204)
	at app//org.apache.iceberg.io.CloseableIterable$7$1.<init>(CloseableIterable.java:205)
	at app//org.apache.iceberg.io.CloseableIterable$7.iterator(CloseableIterable.java:204)
	at app//org.apache.iceberg.io.CloseableIterable$7.iterator(CloseableIterable.java:196)
	at app//org.apache.iceberg.util.Filter.lambda$filter$0(Filter.java:34)
	at app//org.apache.iceberg.io.CloseableIterable$2.iterator(CloseableIterable.java:89)
	at app//org.apache.iceberg.spark.source.RowDataReader.open(RowDataReader.java:99)
	at app//org.apache.iceberg.spark.source.RowDataReader.open(RowDataReader.java:43)
	at app//org.apache.iceberg.spark.source.BaseReader.next(BaseReader.java:141)
	at app//org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:148)
	at app//org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:186)
	at app//org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1(DataSourceRDD.scala:72)
	at app//org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1$adapted(DataSourceRDD.scala:72)
	at app//scala.Option.exists(Option.scala:406)
	at app//org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:72)
	at app//org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:103)
	at app//org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:72)
	at app//org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at app//scala.collection.Iterator$$anon$9.hasNext(Iterator.scala:593)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.hashAgg_doAggregateWithKeys_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at app//org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at app//org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:50)
	at app//scala.collection.Iterator$$anon$9.hasNext(Iterator.scala:593)
	at app//org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:143)
	at app//org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:57)
	at app//org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:111)
	at app//org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
	at app//org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:171)
	at app//org.apache.spark.scheduler.Task.run(Task.scala:147)
	at app//org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:647)
	at app//org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:80)
	at app//org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:77)
	at app//org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:99)
	at app//org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:650)
	at [email protected]/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at [email protected]/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at [email protected]/java.lang.Thread.run(Thread.java:1583)
Caused by: org.apache.avro.InvalidAvroMagicException: Not an Avro data file
	at app//org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:79)
	at app//org.apache.iceberg.avro.AvroIterable.newFileReader(AvroIterable.java:104)
	... 42 more
```
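For context on the first failure mode: the pattern behind "Cannot return the encryption keys after serialization" can be sketched as below. This is a hypothetical illustration, not Iceberg's `StandardEncryptionManager` — the class `KeyCache` and its field are made up. The idea is that a transient key map is dropped when the object is Java-serialized (e.g. shipped to a Spark executor), so any later access must fail fast rather than silently return nothing:

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the key map is transient, so it deserializes to null
// on the executor side, and access after serialization must fail fast.
public class KeyCache implements Serializable {
  private transient Map<String, byte[]> keysById = new HashMap<>();

  public Map<String, byte[]> encryptionKeys() {
    if (keysById == null) { // field was lost during serialization
      throw new IllegalStateException(
          "Cannot return the encryption keys after serialization");
    }
    return keysById;
  }
}
```

Under this reading, #14751 is about decrypting the manifest-list key metadata before the read task is serialized, so the executor never needs to call back into the driver-side key map.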
