edgarRd opened a new issue, #7635:
URL: https://github.com/apache/iceberg/issues/7635

   ### Apache Iceberg version
   
   1.2.1 (latest release)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   # Environment
   
   Spark 3.3.2
   Iceberg 1.2.1
   
   Tabular's docker environment: 
https://github.com/tabular-io/docker-spark-iceberg/blob/main/docker-compose.yml
   
   # Repro
   
   ## 1. Setup 
   ```sql
   CREATE TABLE demo.nyc.contacts2
   (
     id bigint NOT NULL COMMENT 'unique id',
     first_name string,
     last_name string,
     neighborhood string
   )
   TBLPROPERTIES(
     'format-version'='2',
     'write.delete.mode'='merge-on-read',
     'write.update.mode'='merge-on-read',
     'write.merge.mode'='merge-on-read'
   );
   
   ALTER TABLE demo.nyc.contacts2 SET IDENTIFIER FIELDS id;
   
   INSERT INTO demo.nyc.contacts2
   SELECT /*+ COALESCE(1) */ *
   FROM VALUES (1, 'Adam', 'Smith', 'SoHo'), 
   (2, 'Virginia', 'Smith', 'SoHo'), 
   (3, 'Thomas', 'Lao', 'Midtown'),
   (4, 'John', 'Books', 'Williamsburg'),
   (5, 'Anna', 'Frank', 'Midtown');
   ```
   
   Validate setup, expectations:
   * 1 data file
   * 5 rows
   ```
   SELECT * FROM demo.nyc.contacts2;
   SELECT * FROM demo.nyc.contacts2.files;
   ```
   
   ## 2. Setup branch
   
   ```sql
   ALTER TABLE demo.nyc.contacts2 CREATE BRANCH ds20230501 RETAIN 730 DAYS;
   
   INSERT INTO  demo.nyc.contacts2.branch_ds20230501
   SELECT /*+ COALESCE(1) */ *
   FROM VALUES (6, 'Peter', 'Smith', 'Chelsea'), (7, 'John', 'Connor', 
'Greenwich Village');
   ```
   
   Validate branch setup, expectations:
   * 7 rows
   * 2 total data files (1 in main branch, 1 in branch `ds20230501`)
   
   ```sql
   SELECT * FROM demo.nyc.contacts2.branch_ds20230501;
   SELECT * FROM demo.nyc.contacts2.all_files;
   ```
   
   ## 3. Test merge-on-read delete on branch for 1 row within data file
   ```sql
   DELETE FROM demo.nyc.contacts2.branch_ds20230501 WHERE id=7;
   ```
   
   Previous command fails with:
   ```
   spark-sql> DELETE FROM demo.nyc.contacts2.branch_ds20230501 WHERE id=7;
   23/05/17 20:04:45 ERROR SparkSQLDriver: Failed in [DELETE FROM 
demo.nyc.contacts2.branch_ds20230501 WHERE id=7]
   org.apache.iceberg.exceptions.ValidationException: Cannot delete file where 
some, but not all, rows match filter ref(name="id") == 7: 
s3://warehouse/nyc/contacts2/data/00000-1051-74b41dde-49d4-4a85-844c-0ef79e1257f6-00001.parquet
        at 
org.apache.iceberg.exceptions.ValidationException.check(ValidationException.java:49)
        at 
org.apache.iceberg.ManifestFilterManager.manifestHasDeletedFiles(ManifestFilterManager.java:377)
        at 
org.apache.iceberg.ManifestFilterManager.filterManifest(ManifestFilterManager.java:307)
        at 
org.apache.iceberg.ManifestFilterManager.lambda$filterManifests$0(ManifestFilterManager.java:189)
        at 
org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
        at org.apache.iceberg.util.Tasks$Builder.access$300(Tasks.java:69)
        at org.apache.iceberg.util.Tasks$Builder$1.run(Tasks.java:315)
        at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
   org.apache.iceberg.exceptions.ValidationException: Cannot delete file where 
some, but not all, rows match filter ref(name="id") == 7: 
s3://warehouse/nyc/contacts2/data/00000-1051-74b41dde-49d4-4a85-844c-0ef79e1257f6-00001.parquet
        at 
org.apache.iceberg.exceptions.ValidationException.check(ValidationException.java:49)
        at 
org.apache.iceberg.ManifestFilterManager.manifestHasDeletedFiles(ManifestFilterManager.java:377)
        at 
org.apache.iceberg.ManifestFilterManager.filterManifest(ManifestFilterManager.java:307)
        at 
org.apache.iceberg.ManifestFilterManager.lambda$filterManifests$0(ManifestFilterManager.java:189)
        at 
org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
        at org.apache.iceberg.util.Tasks$Builder.access$300(Tasks.java:69)
        at org.apache.iceberg.util.Tasks$Builder$1.run(Tasks.java:315)
        at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
   ```
   
   ## 4. Delete from main branch also fails
   
   This error is different, but fails consistently with `400` (bad request) so 
I wonder if there's some incorrect handling here. Possibly related to the 
environment as well as using Tabular's docker setup.
   
   ```sql
   DELETE FROM demo.nyc.contacts2 WHERE id=3;
   ```
   
   ```
   Caused by: software.amazon.awssdk.services.s3.model.S3Exception: null 
(Service: S3, Status Code: 400, Request ID: 17600716A659BCC7, Extended Request 
ID: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855)
        at 
software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleErrorResponse(AwsXmlPredicatedResponseHandler.java:156)
        at 
software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleResponse(AwsXmlPredicatedResponseHandler.java:108)
        at 
software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:85)
        at 
software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:43)
        at 
software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler$Crc32ValidationResponseHandler.handle(AwsSyncClientHandler.java:95)
        at 
software.amazon.awssdk.core.internal.handler.BaseClientHandler.lambda$successTransformationResponseHandler$7(BaseClientHandler.java:270)
        at 
software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:40)
        at 
software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:30)
        at 
software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
        at 
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:73)
        at 
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42)
        at 
software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78)
        at 
software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40)
        at 
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:50)
        at 
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:36)
        at 
software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:81)
        at 
software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36)
        at 
software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
        at 
software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56)
        at 
software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36)
        at 
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80)
        at 
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60)
        at 
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42)
        at 
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:48)
        at 
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:31)
        at 
software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
        at 
software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
        at 
software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
        at 
software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
        at 
software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:193)
        at 
software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103)
        at 
software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:171)
        at 
software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:82)
        at 
software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:179)
        at 
software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:76)
        at 
software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
        at 
software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:56)
        at 
software.amazon.awssdk.services.s3.DefaultS3Client.putObject(DefaultS3Client.java:9321)
        at 
org.apache.iceberg.aws.s3.S3OutputStream.completeUploads(S3OutputStream.java:435)
        at 
org.apache.iceberg.aws.s3.S3OutputStream.close(S3OutputStream.java:269)
        at 
org.apache.iceberg.shaded.org.apache.parquet.io.DelegatingPositionOutputStream.close(DelegatingPositionOutputStream.java:38)
        at 
org.apache.iceberg.shaded.org.apache.parquet.hadoop.ParquetFileWriter.end(ParquetFileWriter.java:1197)
        at 
org.apache.iceberg.parquet.ParquetWriter.close(ParquetWriter.java:255)
        at 
org.apache.iceberg.deletes.PositionDeleteWriter.close(PositionDeleteWriter.java:75)
        at 
org.apache.iceberg.io.RollingFileWriter.closeCurrentWriter(RollingFileWriter.java:122)
        at 
org.apache.iceberg.io.RollingFileWriter.close(RollingFileWriter.java:147)
        at 
org.apache.iceberg.io.RollingPositionDeleteWriter.close(RollingPositionDeleteWriter.java:35)
        at 
org.apache.iceberg.io.ClusteredWriter.closeCurrentWriter(ClusteredWriter.java:118)
        at org.apache.iceberg.io.ClusteredWriter.close(ClusteredWriter.java:110)
        at 
org.apache.iceberg.io.ClusteredPositionDeleteWriter.close(ClusteredPositionDeleteWriter.java:34)
        at 
org.apache.iceberg.spark.source.SparkPositionDeltaWrite$DeleteOnlyDeltaWriter.close(SparkPositionDeltaWrite.java:477)
        at 
org.apache.iceberg.spark.source.SparkPositionDeltaWrite$DeleteOnlyDeltaWriter.commit(SparkPositionDeltaWrite.java:460)
        at 
org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.$anonfun$run$1(WriteDeltaExec.scala:176)
        at 
org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1538)
        at 
org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run(WriteDeltaExec.scala:203)
        at 
org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run$(WriteDeltaExec.scala:142)
        at 
org.apache.spark.sql.execution.datasources.v2.DeltaWithMetadataWritingSparkTask.run(WriteDeltaExec.scala:208)
        at 
org.apache.spark.sql.execution.datasources.v2.ExtendedV2ExistingTableWriteExec.$anonfun$writeWithV2$2(WriteDeltaExec.scala:101)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:136)
        at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to