waichee opened a new issue, #9406: URL: https://github.com/apache/iceberg/issues/9406
### Apache Iceberg version

1.3.1

### Query engine

Spark

### Please describe the bug 🐞

**Setup**

We use the following Spark libraries to write to Iceberg on EMR:
- `org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.3.1`
- `org.apache.iceberg:iceberg-spark-extensions-3.4_2.12:1.3.1`

We have set up the DynamoDB lock manager with the Glue catalog for our Iceberg table.

Spark config:
```
sparkConf
    .set("spark.sql.catalog.iceberg", "org.apache.iceberg.spark.SparkCatalog")
    .set("spark.sql.catalog.iceberg.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .set("spark.sql.catalog.iceberg.lock-impl", "org.apache.iceberg.aws.dynamodb.DynamoDbLockManager")
    .set("spark.sql.catalog.iceberg.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .set("spark.sql.catalog.iceberg.warehouse", warehousePath)
    .set("spark.sql.catalog.iceberg.lock.table", locktableName)
    .set("spark.sql.defaultCatalog", "iceberg");
```

Table properties:
```
Map<String, String> tableOptions = Map.of(
    "provider", CATALOG_PROVIDER,
    "write.parquet.compression-codec", "snappy",
    "write.delete.mode", "copy-on-write",
    "write.update.mode", "merge-on-read",
    "write.merge.mode", "merge-on-read",
    "write.spark.accept-any-schema", "true");
```

Java code that writes to the Iceberg table:
```
dataframe
    .writeTo(String.format("%s.%s", database, tablename))
    .option("mergeSchema", "true")
    .append();
```

**What happened**

We noticed that the `.append()` write to a table failed with the following exception:
```
Caused by: org.apache.iceberg.exceptions.CommitFailedException: Cannot commit iceberg.zdp_l1_scd2_cdc_sharddb.fraud_scores because base metadata location 's3://bucket/iceberg/db/tableName/metadata/metadata1.json' is not same as the current Glue location 's3://bucket/iceberg/db/tableName/metadata/metadata2.json'
    at org.apache.iceberg.aws.glue.GlueTableOperations.checkMetadataLocation(GlueTableOperations.java:272)
    at org.apache.iceberg.aws.glue.GlueTableOperations.doCommit(GlueTableOperations.java:158)
```

When we restart the job, it throws a 404 (not found) while looking up the manifest list for the Iceberg table:
```
Caused by: software.amazon.awssdk.services.s3.model.NoSuchKeyException: null (Service: S3, Status Code: 404, Request ID: BGEGZNRNM1YR7MGW, Extended Request ID: qTaJ1r8Ky1vMKOAU5+r4djAQY9r05jGqvFFGLJz9WD6upWROlpIroSEefjGsw2OQI41k/pHKRRI=) (Service: S3, Status Code: 404, Request ID: BGEGZNRNM1YR7MGW)
    at software.amazon.awssdk.services.s3.model.NoSuchKeyException$BuilderImpl.build(NoSuchKeyException.java:126)
    at software.amazon.awssdk.services.s3.model.NoSuchKeyException$BuilderImpl.build(NoSuchKeyException.java:80)
    at software.amazon.awssdk.services.s3.internal.handlers.ExceptionTranslationInterceptor.modifyException(ExceptionTranslationInterceptor.java:63)
    at software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain.modifyException(ExecutionInterceptorChain.java:202)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.ExceptionReportingUtils.runModifyException(ExceptionReportingUtils.java:54)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.ExceptionReportingUtils.reportFailureToInterceptors(ExceptionReportingUtils.java:38)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:39)
    at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
    at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:198)
    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103)
    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:171)
    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:82)
    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:179)
    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:76)
    at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
    at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:56)
    at software.amazon.awssdk.services.s3.DefaultS3Client.headObject(DefaultS3Client.java:5495)
    at org.apache.iceberg.aws.s3.BaseS3File.getObjectMetadata(BaseS3File.java:85)
    at org.apache.iceberg.aws.s3.S3InputFile.getLength(S3InputFile.java:77)
    at org.apache.iceberg.avro.AvroIterable.newFileReader(AvroIterable.java:100)
    at org.apache.iceberg.avro.AvroIterable.iterator(AvroIterable.java:76)
    at org.apache.iceberg.avro.AvroIterable.iterator(AvroIterable.java:36)
    at org.apache.iceberg.relocated.com.google.common.collect.Iterables.addAll(Iterables.java:333)
    at org.apache.iceberg.relocated.com.google.common.collect.Lists.newLinkedList(Lists.java:241)
    at org.apache.iceberg.ManifestLists.read(ManifestLists.java:45)
    at org.apache.iceberg.BaseSnapshot.cacheManifests(BaseSnapshot.java:146)
    at org.apache.iceberg.BaseSnapshot.dataManifests(BaseSnapshot.java:172)
    at org.apache.iceberg.MergingSnapshotProducer.apply(MergingSnapshotProducer.java:826)
    at org.apache.iceberg.SnapshotProducer.apply(SnapshotProducer.java:226)
    at org.apache.iceberg.SnapshotProducer.lambda$commit$2(SnapshotProducer.java:376)
    at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
    at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219)
    at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203)
    at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196)
    at org.apache.iceberg.SnapshotProducer.commit(SnapshotProducer.java:374)
    at org.apache.iceberg.spark.source.SparkWrite.commitOperation(SparkWrite.java:222)
    at org.apache.iceberg.spark.source.SparkWrite.access$1300(SparkWrite.java:84)
    at org.apache.iceberg.spark.source.SparkWrite$BatchAppend.commit(SparkWrite.java:285)
    at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:422)
    at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2$(WriteToDataSourceV2Exec.scala:382)
    at org.apache.spark.sql.execution.datasources.v2.AppendDataExec.writeWithV2(WriteToDataSourceV2Exec.scala:248)
    at org.apache.spark.sql.execution.datasources.v2.V2ExistingTableWriteExec.run(WriteToDataSourceV2Exec.scala:360)
    at org.apache.spark.sql.execution.datasources.v2.V2ExistingTableWriteExec.run$(WriteToDataSourceV2Exec.scala:359)
    at org.apache.spark.sql.execution.datasources.v2.AppendDataExec.run(WriteToDataSourceV2Exec.scala:248)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:104)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:250)
    at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:123)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$9(SQLExecution.scala:160)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:250)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$8(SQLExecution.scala:160)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:271)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:159)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:69)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:101)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:97)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:554)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:107)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:554)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:530)
    at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:97)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:84)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:82)
    at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:142)
    at org.apache.spark.sql.DataFrameWriterV2.runCommand(DataFrameWriterV2.scala:195)
    at org.apache.spark.sql.DataFrameWriterV2.append(DataFrameWriterV2.scala:149)
```

Inspecting Glue and S3, we see the following:
- Glue: metadata = `s3://bucket/iceberg/db/tableName/metadata/metadata2.json`, previous_metadata = `s3://bucket/iceberg/db/tableName/metadata/metadata1.json`
- S3: both metadata.json files exist. However, metadata2.json points to a manifest list location that does not exist (a sketch of how such a check can be done follows this list).
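For reference, this is roughly the kind of check that shows the dangling pointer: download the metadata.json that Glue currently points at, read the `manifest-list` of its current snapshot, and HEAD that object in S3. This is only an illustrative sketch using the AWS SDK v2 S3 client and Jackson, with placeholder bucket/key names, not the exact code we ran:
```
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.HeadObjectRequest;
import software.amazon.awssdk.services.s3.model.NoSuchKeyException;

public class CheckManifestList {
  public static void main(String[] args) throws Exception {
    // Placeholders -- substitute the real bucket and the metadata.json key Glue points at.
    String bucket = "bucket";
    String metadataKey = "iceberg/db/tableName/metadata/metadata2.json";

    try (S3Client s3 = S3Client.create()) {
      // Download the metadata.json that Glue currently references.
      byte[] metadataBytes = s3.getObjectAsBytes(
          GetObjectRequest.builder().bucket(bucket).key(metadataKey).build()).asByteArray();
      JsonNode metadata = new ObjectMapper().readTree(metadataBytes);

      // Locate the current snapshot and its manifest list (field names per the Iceberg table spec).
      long currentSnapshotId = metadata.get("current-snapshot-id").asLong();
      for (JsonNode snapshot : metadata.get("snapshots")) {
        if (snapshot.get("snapshot-id").asLong() != currentSnapshotId) {
          continue;
        }
        String manifestList = snapshot.get("manifest-list").asText(); // s3://bucket/...
        String key = manifestList.replaceFirst("^s3://[^/]+/", "");
        try {
          // HEAD the manifest list; a missing object surfaces as NoSuchKeyException (404).
          s3.headObject(HeadObjectRequest.builder().bucket(bucket).key(key).build());
          System.out.println("manifest list exists: " + manifestList);
        } catch (NoSuchKeyException e) {
          System.out.println("DANGLING manifest list: " + manifestList);
        }
      }
    }
  }
}
```
A missing manifest list like this matches the `NoSuchKeyException` we see when the job is restarted.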
We are unable to use the table after this without manually rolling the metadata pointer in Glue back to the previous metadata file (a sketch of that rollback is at the end of this report). Is there a bug in the optimistic locking with the Glue catalog in this case?

Would appreciate some pointers on what could have caused this.
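For completeness, this is the kind of manual rollback we mean. The Glue catalog keeps the current metadata pointer in the `metadata_location` table parameter (with the previous one in `previous_metadata_location`), so rolling back amounts to repointing that parameter at the last known-good metadata.json. A minimal sketch using the AWS SDK v2 Glue client; the database, table, and location values are placeholders:
```
import java.util.HashMap;
import java.util.Map;
import software.amazon.awssdk.services.glue.GlueClient;
import software.amazon.awssdk.services.glue.model.GetTableRequest;
import software.amazon.awssdk.services.glue.model.Table;
import software.amazon.awssdk.services.glue.model.TableInput;
import software.amazon.awssdk.services.glue.model.UpdateTableRequest;

public class RollbackGlueMetadataPointer {
  public static void main(String[] args) {
    // Placeholders -- the Glue database/table and the last known-good metadata.json.
    String database = "db";
    String tableName = "tableName";
    String goodMetadata = "s3://bucket/iceberg/db/tableName/metadata/metadata1.json";

    try (GlueClient glue = GlueClient.create()) {
      Table table = glue.getTable(
          GetTableRequest.builder().databaseName(database).name(tableName).build()).table();

      // Repoint the Iceberg metadata pointer kept in the Glue table parameters.
      Map<String, String> params = new HashMap<>(table.parameters());
      params.put("previous_metadata_location", params.get("metadata_location"));
      params.put("metadata_location", goodMetadata);

      TableInput input = TableInput.builder()
          .name(table.name())
          .tableType(table.tableType())
          .parameters(params)
          .storageDescriptor(table.storageDescriptor())
          .build();

      glue.updateTable(
          UpdateTableRequest.builder().databaseName(database).tableInput(input).build());
    }
  }
}
```
After repointing, the table is readable again from the earlier snapshot, but any files written by the broken commit are left orphaned until they are cleaned up separately.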