tom-s-powell commented on PR #14750: URL: https://github.com/apache/iceberg/pull/14750#issuecomment-3616594751
We are also going to need https://github.com/apache/iceberg/pull/14751 for a fully testable change. The test I would like to add to https://github.com/apache/iceberg/blob/main/spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/sql/TestTableEncryption.java is:

```java
@TestTemplate
public void testDropTableWithPurge() {
  List<Object[]> dataFileTable =
      sql("SELECT file_path FROM %s.%s", tableName, MetadataTableType.ALL_DATA_FILES);
  List<String> dataFiles =
      dataFileTable.stream().map(row -> (String) row[0]).collect(Collectors.toList());
  assertThat(dataFiles).isNotEmpty();
  assertThat(dataFiles)
      .allSatisfy(filePath -> assertThat(localInput(filePath).exists()).isTrue());

  sql("DROP TABLE %s PURGE", tableName);

  assertThatThrownBy(() -> catalog.loadTable(tableIdent))
      .isInstanceOf(NoSuchTableException.class);
  assertThat(dataFiles)
      .allSatisfy(filePath -> assertThat(localInput(filePath).exists()).isFalse());
}
```

Without the changes in https://github.com/apache/iceberg/pull/14751 we get the following:

```
Caused by: java.lang.IllegalStateException: Cannot return the encryption keys after serialization
	at org.apache.iceberg.encryption.StandardEncryptionManager.encryptionKeys(StandardEncryptionManager.java:170)
	at org.apache.iceberg.encryption.EncryptionUtil.decryptManifestListKeyMetadata(EncryptionUtil.java:132)
	at org.apache.iceberg.BaseManifestListFile.decryptKeyMetadata(BaseManifestListFile.java:47)
	at org.apache.iceberg.encryption.EncryptingFileIO.newInputFile(EncryptingFileIO.java:115)
	at org.apache.iceberg.AllManifestsTable$ManifestListReadTask.file(AllManifestsTable.java:223)
	at org.apache.iceberg.AllManifestsTable$ManifestListReadTask.file(AllManifestsTable.java:160)
	at org.apache.iceberg.spark.source.RowDataReader.open(RowDataReader.java:87)
	at org.apache.iceberg.spark.source.RowDataReader.open(RowDataReader.java:43)
```

If we have the changes from https://github.com/apache/iceberg/pull/14751 without the changes in this PR we get the following:

```
org.apache.iceberg.exceptions.RuntimeIOException: Failed to open file: file:/var/folders/2v/qzfl1x_137l3dyycx4j_d_29081c3k/T/hive5130984441689838261/table/metadata/snap-1670846233047793002-1-e953eed2-338c-45fc-8060-6722e78ea54a.avro
	at app//org.apache.iceberg.avro.AvroIterable.newFileReader(AvroIterable.java:113)
	at app//org.apache.iceberg.avro.AvroIterable.iterator(AvroIterable.java:78)
	at app//org.apache.iceberg.io.CloseableIterable$7$1.<init>(CloseableIterable.java:205)
	at app//org.apache.iceberg.io.CloseableIterable$7.iterator(CloseableIterable.java:204)
	at app//org.apache.iceberg.io.CloseableIterable$7$1.<init>(CloseableIterable.java:205)
	at app//org.apache.iceberg.io.CloseableIterable$7.iterator(CloseableIterable.java:204)
	at app//org.apache.iceberg.io.CloseableIterable$7$1.<init>(CloseableIterable.java:205)
	at app//org.apache.iceberg.io.CloseableIterable$7.iterator(CloseableIterable.java:204)
	at app//org.apache.iceberg.io.CloseableIterable$7.iterator(CloseableIterable.java:196)
	at app//org.apache.iceberg.util.Filter.lambda$filter$0(Filter.java:34)
	at app//org.apache.iceberg.io.CloseableIterable$2.iterator(CloseableIterable.java:89)
	at app//org.apache.iceberg.spark.source.RowDataReader.open(RowDataReader.java:99)
	at app//org.apache.iceberg.spark.source.RowDataReader.open(RowDataReader.java:43)
	at app//org.apache.iceberg.spark.source.BaseReader.next(BaseReader.java:141)
	at app//org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:148)
	at app//org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:186)
	at app//org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1(DataSourceRDD.scala:72)
	at app//org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1$adapted(DataSourceRDD.scala:72)
	at app//scala.Option.exists(Option.scala:406)
	at app//org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:72)
	at app//org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:103)
	at app//org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:72)
	at app//org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at app//scala.collection.Iterator$$anon$9.hasNext(Iterator.scala:593)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.hashAgg_doAggregateWithKeys_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at app//org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at app//org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:50)
	at app//scala.collection.Iterator$$anon$9.hasNext(Iterator.scala:593)
	at app//org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:143)
	at app//org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:57)
	at app//org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:111)
	at app//org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
	at app//org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:171)
	at app//org.apache.spark.scheduler.Task.run(Task.scala:147)
	at app//org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:647)
	at app//org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:80)
	at app//org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:77)
	at app//org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:99)
	at app//org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:650)
	at [email protected]/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at [email protected]/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at [email protected]/java.lang.Thread.run(Thread.java:1583)
Caused by: org.apache.avro.InvalidAvroMagicException: Not an Avro data file
	at app//org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:79)
	at app//org.apache.iceberg.avro.AvroIterable.newFileReader(AvroIterable.java:104)
	... 42 more
```
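For context on the first failure mode: the pattern behind "Cannot return the encryption keys after serialization" can be sketched as below. This is a hypothetical illustration, not Iceberg's `StandardEncryptionManager` — the class `KeyCache` and its field are made up. The idea is that a transient key map is dropped when the object is Java-serialized (e.g. shipped to a Spark executor), so any later access must fail fast rather than silently return nothing:

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the key map is transient, so it deserializes to null
// on the executor side, and access after serialization must fail fast.
public class KeyCache implements Serializable {
  private transient Map<String, byte[]> keysById = new HashMap<>();

  public Map<String, byte[]> encryptionKeys() {
    if (keysById == null) { // field was lost during serialization
      throw new IllegalStateException(
          "Cannot return the encryption keys after serialization");
    }
    return keysById;
  }
}
```

Under this reading, #14751 is about decrypting the manifest-list key metadata before the read task is serialized, so the executor never needs to call back into the driver-side key map.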
