supsupsap opened a new issue, #9674: URL: https://github.com/apache/iceberg/issues/9674
### Apache Iceberg version

1.4.3 (latest release)

### Query engine

Spark

### Please describe the bug 🐞

Hi all! I have a problem with unpredictable behavior of the Iceberg rewrite data files procedure.

**ENV:**

> Spark 3.5.0 on k8s
> Spark config - `{"driver_cores": "6", "driver_memory": "40Gi", "executor_cores": "2", "executor_memory": "50Gi", "num_executors": "30"}`
> Iceberg 1.4.2
> S3

Rewrite data files procedure config (a sketch of the full invocation is included after the run summaries below):

```
strategy="binpack",
options={
    "rewrite-job-order": "bytes-asc",
    "max-file-group-size-bytes": f"{1024*1024*1024}",
    "target-file-size-bytes": f"{512*1024*1024}",
    "min-input-files": "5",
    "max-concurrent-file-group-rewrites": "30",
    "partial-progress.enabled": "true",
    "partial-progress.max-commits": "20",
}
```

Running the procedure three times with the same parameters on the same table produced different results and different errors. I can't understand what I'm doing wrong; maybe someone can explain what is going on?

**TLDR - The first run reported no file rewrite failures, yet it did not rewrite all the files that were eligible for rewriting. Subsequent runs on the same, unchanged table rewrote additional files and produced errors related to `Premature end of Content-Length`.**

Three jobs were run with the same parameters on a partitioned table with many small files, and here are the results I got. (No new data was written to this table between the runs of this procedure.)

**1st try:**

> duration - 83 minutes
> rewritten_data_files_count = 100638
> added_data_files_count = 52
> rewritten_bytes_count = 17814636741
> failed_data_files_count = 0 (!!!)

**Errors/Warnings:**

1) There were many warnings like this:

```
2024-02-01T14:09:25,893 WARN [Rewrite-Service-15] org.apache.iceberg.util.Tasks - Retrying task after failure: Cannot commit: stale table metadata
org.apache.iceberg.exceptions.CommitFailedException: Cannot commit: stale table metadata
```

2)

```
2024-02-01T14:13:39,857 WARN [Rewrite-Service-4] org.apache.hadoop.hive.metastore.RetryingMetaStoreClient - MetaStoreClient lost connection. Attempting to reconnect (1 of 1) after 1s. getTable
org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset by peer (Write failed)
...
Caused by: java.net.SocketException: Connection reset by peer (Write failed)
```

3)

```
2024-02-01T14:13:39,958 WARN [Rewrite-Service-6] org.apache.iceberg.util.Tasks - Retrying task after failure: Base metadata location 's3a://obs-zdp-warehouse-stage-mz/stage.db/test_partitioned/metadata/15461-ff3d5cc3-b0bb-4847-a433-79526c16e22a.metadata.json' is not same as the current table metadata location 's3a://obs-zdp-warehouse-stage-mz/stage.db/test_partitioned/metadata/15462-262e2ad3-9bd9-409f-bbc1-b1fefa891477.metadata.json' for stage.test_partitioned
```

4)

```
2024-02-01T14:13:49,268 ERROR [Rewrite-Service-24] org.apache.iceberg.actions.RewriteDataFilesCommitManager - Cannot commit groups [RewriteFileGroup{info=FileGroupInfo{globalIndex=8, partitionIndex=1, partition=PartitionData{date=19753, date2=19752, event_type=stupid_event}}, numRewrittenFiles=6062, numAddedFiles=3, numRewrittenBytes=1073690623}, RewriteFileGroup{info=FileGroupInfo{globalIndex=3, partitionIndex=1, partition=PartitionData{date=19753, date2=19753, event_type=neutral_event}}, numRewrittenFiles=6049, numAddedFiles=3, numRewrittenBytes=1073712029}, RewriteFileGroup{info=FileGroupInfo{globalIndex=5, partitionIndex=1, partition=PartitionData{date=19753, date2=19752, event_type=neutral_event}}, numRewrittenFiles=6062, numAddedFiles=3, numRewrittenBytes=1073741011}], attempting to clean up written files
org.apache.iceberg.exceptions.CommitFailedException: Cannot commit: stale table metadata
```

5)

```
2024-02-01T14:19:12,950 WARN [ForkJoinPool.commonPool-worker-1] org.apache.thrift.transport.TIOStreamTransport - Error closing output stream.
java.net.SocketException: Socket closed
```

**Summarizing the first launch:** the first run took a long time and rewrote a large number of files, and notably it reported no failed files. There were several warning and error messages related to stale table metadata, connection issues, and socket closures, but judging from the output there were no file rewrite failures in the end. The new snapshot contains around 180k files.

**We looked inside S3 and saw that some partitions still contained a lot of small files, so we ran the procedure a second time** (a sketch of how this check can be done from the `files` metadata table instead is included at the end of this report).

**2nd try:**

> duration - 25 minutes
> rewritten_data_files_count = 75415 - (what? why? Surely nothing should have been rewritten here, given the 1st try and its failed_data_files_count of 0?)
> added_data_files_count = 58
> rewritten_bytes_count = 13346723044
> failed_data_files_count = 9463 (now we have failed files; I remind you that nothing was added to the table between the runs)

**Errors/Warnings:**

There were many errors like this:

```
2024-02-01T14:47:34,693 INFO [Rewrite-Service-26] org.apache.spark.scheduler.DAGScheduler - Job 18 failed: save at SparkBinPackDataRewriter.java:62, took 316.504764 s
2024-02-01T14:47:34,694 ERROR [Rewrite-Service-26] org.apache.spark.sql.execution.datasources.v2.AppendDataExec - Data source write support IcebergBatchWrite(table=iceberg_hive.stage.test_partitioned, format=PARQUET) is aborting.
2024-02-01T14:47:34,700 INFO [Rewrite-Service-26] org.apache.iceberg.spark.source.SparkCleanupUtil - Deleted 0 file(s) using bulk deletes (job abort)
2024-02-01T14:47:34,700 ERROR [Rewrite-Service-26] org.apache.spark.sql.execution.datasources.v2.AppendDataExec - Data source write support IcebergBatchWrite(table=iceberg_hive.stage.test_partitioned, format=PARQUET) aborted.
2024-02-01T14:47:34,701 ERROR [Rewrite-Service-26] org.apache.iceberg.spark.actions.RewriteDataFilesSparkAction - Failure during rewrite group FileGroupInfo{globalIndex=21, partitionIndex=1, partition=PartitionData{date=19753, date2=19752, event_type=neutral_event}}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 23.0 failed 4 times, most recent failure: Lost task 0.3 in stage 23.0 (TID 67) (10.16.37.136 executor 5): org.apache.iceberg.exceptions.RuntimeIOException: org.apache.iceberg.aws.shaded.org.apache.http.ConnectionClosedException: Premature end of Content-Length delimited message body (expected: 233,561,838; received: 143,761,408)
...
Caused by: org.apache.iceberg.aws.shaded.org.apache.http.ConnectionClosedException: Premature end of Content-Length delimited message body (expected: 233,561,838; received: 143,761,408)
```

**Summarizing the 2nd launch:** a lot of files were rewritten again, and we ended up with far fewer files than after the first launch. At the same time, we received many errors about rewriting files, all related to `Premature end of Content-Length`.

**3rd try:**

> duration - 13 minutes
> rewritten_data_files_count = 2615 - (again some files were rewritten, far fewer this time, but at least now it is clear where they come from: they are part of the files that failed in the second attempt)
> added_data_files_count = 2
> rewritten_bytes_count = 463745143
> failed_data_files_count = 6885

**Errors:**

There were many errors like this:

```
[February 1, 2024 8:14 PM] {spark_submit_k8s_operator.py:67} INFO - 2024-02-01T15:14:47,097 WARN [task-result-getter-0] org.apache.spark.scheduler.TaskSetManager - Lost task 0.0 in stage 5.0 (TID 13) (10.16.33.130 executor 25): org.apache.iceberg.exceptions.RuntimeIOException: org.apache.iceberg.aws.shaded.org.apache.http.ConnectionClosedException: Premature end of Content-Length delimited message body (expected: 296,640,009; received: 144,359,424)
```

**Summarizing the 3rd launch:** some files could be rewritten, some could not, and again the failures were related to `Premature end of Content-Length`.
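For completeness, here is a minimal PySpark sketch of how the procedure is presumably invoked with the options listed above. The use of `spark.sql()` around the `CALL` is an assumption; the catalog and table names (`iceberg_hive`, `stage.test_partitioned`) are taken from the log output, and the procedure result columns match the counters quoted in the run summaries.

```python
# Hedged sketch of the rewrite invocation, assuming it goes through the
# rewrite_data_files Spark procedure via spark.sql(); an existing `spark`
# session is assumed.
options = {
    "rewrite-job-order": "bytes-asc",
    "max-file-group-size-bytes": f"{1024 * 1024 * 1024}",  # 1 GiB file groups
    "target-file-size-bytes": f"{512 * 1024 * 1024}",      # 512 MiB target files
    "min-input-files": "5",
    "max-concurrent-file-group-rewrites": "30",
    "partial-progress.enabled": "true",
    "partial-progress.max-commits": "20",
}

# Render the options dict as a Spark SQL map literal: map('k1', 'v1', 'k2', 'v2', ...)
options_map = ", ".join(f"'{k}', '{v}'" for k, v in options.items())

result = spark.sql(
    f"""
    CALL iceberg_hive.system.rewrite_data_files(
      table => 'stage.test_partitioned',
      strategy => 'binpack',
      options => map({options_map})
    )
    """
)

# The result row carries rewritten_data_files_count, added_data_files_count,
# rewritten_bytes_count and failed_data_files_count, as quoted in the summaries above.
result.show(truncate=False)
```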
Full stack trace of the `Premature end of Content-Length` error:

```
[2024-02-07, 07:48:05 UTC] {spark_submit_k8s_operator.py:67} INFO - 2024-02-07T07:48:01,330 INFO [dispatcher-CoarseGrainedScheduler] org.apache.spark.scheduler.TaskSetManager - Starting task 1.0 in stage 6.0 (TID 10) (10.16.35.80, executor 1, partition 1, PROCESS_LOCAL, 14586 bytes)
2024-02-07T07:48:01,339 WARN [task-result-getter-2] org.apache.spark.scheduler.TaskSetManager - Lost task 0.0 in stage 5.0 (TID 0) (10.16.35.80 executor 1): org.apache.iceberg.exceptions.RuntimeIOException: org.apache.iceberg.aws.shaded.org.apache.http.ConnectionClosedException: Premature end of Content-Length delimited message body (expected: 297,221,738; received: 144,236,544)
    at org.apache.iceberg.parquet.ParquetReader$FileIterator.advance(ParquetReader.java:149)
    at org.apache.iceberg.parquet.ParquetReader$FileIterator.next(ParquetReader.java:126)
    at org.apache.iceberg.io.FilterIterator.advance(FilterIterator.java:65)
    at org.apache.iceberg.io.FilterIterator.hasNext(FilterIterator.java:49)
    at org.apache.iceberg.spark.source.BaseReader.next(BaseReader.java:135)
    at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:120)
    at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:158)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1(DataSourceRDD.scala:63)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1$adapted(DataSourceRDD.scala:63)
    at scala.Option.exists(Option.scala:376)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
    at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.$anonfun$run$1(WriteToDataSourceV2Exec.scala:441)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1397)
    at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run(WriteToDataSourceV2Exec.scala:486)
    at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run$(WriteToDataSourceV2Exec.scala:425)
    at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:491)
    at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:388)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
    at org.apache.spark.scheduler.Task.run(Task.scala:141)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.iceberg.aws.shaded.org.apache.http.ConnectionClosedException: Premature end of Content-Length delimited message body (expected: 297,221,738; received: 144,236,544)
    at org.apache.iceberg.aws.shaded.org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:178)
    at org.apache.iceberg.aws.shaded.org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:135)
    at java.base/java.io.FilterInputStream.read(FilterInputStream.java:133)
    at java.base/java.io.FilterInputStream.read(FilterInputStream.java:133)
    at software.amazon.awssdk.core.io.SdkFilterInputStream.read(SdkFilterInputStream.java:66)
    at org.apache.iceberg.aws.s3.S3InputStream.read(S3InputStream.java:109)
    at org.apache.iceberg.shaded.org.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:102)
    at org.apache.iceberg.shaded.org.apache.parquet.io.DelegatingSeekableInputStream.readFullyHeapBuffer(DelegatingSeekableInputStream.java:127)
    at org.apache.iceberg.shaded.org.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:91)
    at org.apache.iceberg.shaded.org.apache.parquet.hadoop.ParquetFileReader$ConsecutivePartList.readAll(ParquetFileReader.java:1850)
    at org.apache.iceberg.shaded.org.apache.parquet.hadoop.ParquetFileReader.internalReadRowGroup(ParquetFileReader.java:990)
    at org.apache.iceberg.shaded.org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:940)
    at org.apache.iceberg.parquet.ParquetReader$FileIterator.advance(ParquetReader.java:147)
    ... 32 more
```
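For reference, the "a lot of small files remain in some partitions" check mentioned above can be done against Iceberg's `files` metadata table instead of listing S3 directly. A minimal PySpark sketch, assuming the same catalog and table names and the 512 MiB target file size from the rewrite options:

```python
# Hedged sketch: count remaining data files below the 512 MiB target per partition
# using the `files` metadata table (`partition` and `file_size_in_bytes` are standard
# columns of that metadata table); an existing `spark` session is assumed.
from pyspark.sql import functions as F

target_bytes = 512 * 1024 * 1024

small_files_per_partition = (
    spark.table("iceberg_hive.stage.test_partitioned.files")
    .where(F.col("file_size_in_bytes") < target_bytes)
    .groupBy("partition")
    .agg(
        F.count("*").alias("small_file_count"),
        F.sum("file_size_in_bytes").alias("small_file_bytes"),
    )
    .orderBy(F.col("small_file_count").desc())
)

small_files_per_partition.show(truncate=False)
```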