supsupsap opened a new issue, #9674:
URL: https://github.com/apache/iceberg/issues/9674
### Apache Iceberg version
1.4.3 (latest release)
### Query engine
Spark
### Please describe the bug 🐞
Hi all!
I have a problem with Iceberg Rewrite DataFiles unmanageable behavior.
**ENV:**
> Spark 3.5.0 in k8s
> Spark config - ({ "driver_cores": "6", "driver_memory": "40Gi",
"executor_cores": "2", "executor_memory": "50Gi", "num_executors": "30})
> Iceberg 1.4.2
> S3
Rewrite datafiles procedure config -
```
strategy="binpack",
options={
"rewrite-job-order": "bytes-asc",
"max-file-group-size-bytes": f"{1024*1024*1024}",
"target-file-size-bytes": f"{512*1024*1024}",
"min-input-files": "5",
"max-concurrent-file-group-rewrites": "30",
"partial-progress.enabled": "true",
"partial-progress.max-commits": "20",
}
```
The fact is that 3 times running procedures with the same parameters on the
same table gave different results and different errors, I can't understand what
I'm doing wrong, maybe someone can explain to me what's the matter?
**TLDR - The first run did not produce errors related to overwriting files
and did not overwrite all files which were good for rewrite.
Subsequent runs overwritten more files than the first one on the same table.
And generated errors related to Premature end of Content-Length**
3 jobs were run with the same parameters on a partitioned table with many
small files. And here are the results I got...
(No new data was written to this table between the runs of this procedure)
**1st try:**
> duration - 83 minutes
> rewritten_data_files_count = 100638
> added_data_files_count = 52
> rewritten_bytes_count = 17814636741
> failed_data_files_count = 0 (!!!)
**Errors/Warnings:**
It were many warnings like this:
```
2024-02-01T14:09:25,893 WARN [Rewrite-Service-15]
org.apache.iceberg.util.Tasks - Retrying task after failure: Cannot commit:
stale table metadata
org.apache.iceberg.exceptions.CommitFailedException: Cannot commit: stale
table metadata
```
2)
```
2024-02-01T14:13:39,857 WARN [Rewrite-Service-4]
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient - MetaStoreClient lost
connection. Attempting to reconnect (1 of 1) after 1s. getTable
org.apache.thrift.transport.TTransportException: java.net.SocketException:
Connection reset by peer (Write failed)
...
Caused by: java.net.SocketException: Connection reset by peer (Write failed)
```
3)
`2024-02-01T14:13:39,958 WARN [Rewrite-Service-6]
org.apache.iceberg.util.Tasks - Retrying task after failure: Base metadata
location
's3a://obs-zdp-warehouse-stage-mz/stage.db/test_partitioned/metadata/15461-ff3d5cc3-b0bb-4847-a433-79526c16e22a.metadata.json'
is not same as the current table metadata location
's3a://obs-zdp-warehouse-stage-mz/stage.db/test_partitioned/metadata/15462-262e2ad3-9bd9-409f-bbc1-b1fefa891477.metadata.json'
for stage.test_partitioned`
4)
```
2024-02-01T14:13:49,268 ERROR [Rewrite-Service-24]
org.apache.iceberg.actions.RewriteDataFilesCommitManager - Cannot commit groups
[RewriteFileGroup{info=FileGroupInfo{globalIndex=8, partitionIndex=1,
partition=PartitionData{date=19753, date2=19752, event_type=stupid_event}},
numRewrittenFiles=6062, numAddedFiles=3, numRewrittenBytes=1073690623},
RewriteFileGroup{info=FileGroupInfo{globalIndex=3, partitionIndex=1,
partition=PartitionData{date=19753, date2=19753, event_type=neutral_event}},
numRewrittenFiles=6049, numAddedFiles=3, numRewrittenBytes=1073712029},
RewriteFileGroup{info=FileGroupInfo{globalIndex=5, partitionIndex=1,
partition=PartitionData{date=19753, date2=19752, event_type=neutral_event}},
numRewrittenFiles=6062, numAddedFiles=3, numRewrittenBytes=1073741011}],
attempting to clean up written files
org.apache.iceberg.exceptions.CommitFailedException: Cannot commit: stale
table metadata
```
5)
```
2024-02-01T14:19:12,950 WARN [ForkJoinPool.commonPool-worker-1]
org.apache.thrift.transport.TIOStreamTransport - Error closing output stream.
java.net.SocketException: Socket closed
```
**Summarizing the first launch -** The first launch demonstrated a long
duration and a high number of rewritten files. However, it was notable for not
having any failed files. Also there were several warning and error messages
related to stale table metadata, connection issues, and socket closure.
Looking at the output, there were no errors in the end with overwriting
files.
And for a new snapshot we have around 180k files.
**We looked inside s3 and saw that there are still a lot of small files in
some partitions - So we run it a second time.**
**2nd try:**
> duration - 25 minutes
> rewritten_data_files_count = 75415 - (what? why? Surely nothing here
should have been overwritten? because of 1st try and 0 in
failed_data_files_count?)
> added_data_files_count = 58
> rewritten_bytes_count = 13346723044
> failed_data_files_count = 9463 (Now we have failed files, I remind you
that nothing was added to the table between the runs)
Errors/Warnings:
It were many errors like this:
```
2024-02-01T14:47:34,693 INFO [Rewrite-Service-26]
org.apache.spark.scheduler.DAGScheduler - Job 18 failed: save at
SparkBinPackDataRewriter.java:62, took 316.504764 s
2024-02-01T14:47:34,694 ERROR [Rewrite-Service-26]
org.apache.spark.sql.execution.datasources.v2.AppendDataExec - Data source
write support IcebergBatchWrite(table=iceberg_hive.stage.test_partitioned,
format=PARQUET) is aborting.
2024-02-01T14:47:34,700 INFO [Rewrite-Service-26]
org.apache.iceberg.spark.source.SparkCleanupUtil - Deleted 0 file(s) using bulk
deletes (job abort)
2024-02-01T14:47:34,700 ERROR [Rewrite-Service-26]
org.apache.spark.sql.execution.datasources.v2.AppendDataExec - Data source
write support IcebergBatchWrite(table=iceberg_hive.stage.test_partitioned,
format=PARQUET) aborted.
2024-02-01T14:47:34,701 ERROR [Rewrite-Service-26]
org.apache.iceberg.spark.actions.RewriteDataFilesSparkAction - Failure during
rewrite group FileGroupInfo{globalIndex=21, partitionIndex=1,
partition=PartitionData{date=19753, date2=19752, event_type=neutral_event}}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in
stage 23.0 failed 4 times, most recent failure: Lost task 0.3 in stage 23.0
(TID 67) (10.16.37.136 executor 5):
org.apache.iceberg.exceptions.RuntimeIOException:
org.apache.iceberg.aws.shaded.org.apache.http.ConnectionClosedException:
Premature end of Content-Length delimited message body (expected: 233,561,838;
received: 143,761,408)
...
Caused by:
org.apache.iceberg.aws.shaded.org.apache.http.ConnectionClosedException:
Premature end of Content-Length delimited message body (expected: 233,561,838;
received: 143,761,408)
```
**summarizing the 2nd launch:** A lot of files were overwritten, and we
received a much smaller number of files compared to the first launch. At the
same time, we also received many errors regarding overwriting files and all
related to: Premature end of Content-Length…
**3rd try:**
> duration - 13 minutes
> rewritten_data_files_count = 2615 - (and again, we have overwritten files,
much smaller, but there are, now at least it is clear from where that this is
part of the files from failed in the second attempt)
> added_data_files_count = 2
> rewritten_bytes_count = 463745143
> failed_data_files_count = 6885
Errors
It were many errors like this:
`[February 1, 2024 8:14 PM] {spark_submit_k8s_operator.py:67} INFO -
2024-02-01T15:14:47,097 WARN [task-result-getter-0]
org.apache.spark.scheduler.TaskSetManager - Lost task 0.0 in stage 5.0 (TID 13)
(10.16.33.130 executor 25): org.apache.iceberg.exceptions.RuntimeIOException:
org.apache.iceberg.aws.shaded.org.apache.http.ConnectionClosedException:
Premature end of Content-Length delimited message body (expected: 296,640,009;
received: 144,359,424)`
**summarizing the 3rd launch:** Some were able to overwrite, some were not.
And again problems related to Premature end of Content-Length…
Full Stacktrace of Premature end of Content-Length:
```
[2024-02-07, 07:48:05 UTC] {spark_submit_k8s_operator.py:67} INFO -
2024-02-07T07:48:01,330 INFO [dispatcher-CoarseGrainedScheduler]
org.apache.spark.scheduler.TaskSetManager - Starting task 1.0 in stage 6.0 (TID
10) (10.16.35.80, executor 1, partition 1, PROCESS_LOCAL, 14586 bytes)
2024-02-07T07:48:01,339 WARN [task-result-getter-2]
org.apache.spark.scheduler.TaskSetManager - Lost task 0.0 in stage 5.0 (TID 0)
(10.16.35.80 executor 1): org.apache.iceberg.exceptions.RuntimeIOException:
org.apache.iceberg.aws.shaded.org.apache.http.ConnectionClosedException:
Premature end of Content-Length delimited message body (expected: 297,221,738;
received: 144,236,544)
at
org.apache.iceberg.parquet.ParquetReader$FileIterator.advance(ParquetReader.java:149)
at
org.apache.iceberg.parquet.ParquetReader$FileIterator.next(ParquetReader.java:126)
at org.apache.iceberg.io.FilterIterator.advance(FilterIterator.java:65)
at org.apache.iceberg.io.FilterIterator.hasNext(FilterIterator.java:49)
at org.apache.iceberg.spark.source.BaseReader.next(BaseReader.java:135)
at
org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:120)
at
org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:158)
at
org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1(DataSourceRDD.scala:63)
at
org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1$adapted(DataSourceRDD.scala:63)
at scala.Option.exists(Option.scala:376)
at
org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
at
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
Source)
at
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at
org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
at
org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.$anonfun$run$1(WriteToDataSourceV2Exec.scala:441)
at
org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1397)
at
org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run(WriteToDataSourceV2Exec.scala:486)
at
org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run$(WriteToDataSourceV2Exec.scala:425)
at
org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:491)
at
org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:388)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
at
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:141)
at
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
at
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by:
org.apache.iceberg.aws.shaded.org.apache.http.ConnectionClosedException:
Premature end of Content-Length delimited message body (expected: 297,221,738;
received: 144,236,544)
at
org.apache.iceberg.aws.shaded.org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:178)
at
org.apache.iceberg.aws.shaded.org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:135)
at java.base/java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.base/java.io.FilterInputStream.read(FilterInputStream.java:133)
at
software.amazon.awssdk.core.io.SdkFilterInputStream.read(SdkFilterInputStream.java:66)
at org.apache.iceberg.aws.s3.S3InputStream.read(S3InputStream.java:109)
at
org.apache.iceberg.shaded.org.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:102)
at
org.apache.iceberg.shaded.org.apache.parquet.io.DelegatingSeekableInputStream.readFullyHeapBuffer(DelegatingSeekableInputStream.java:127)
at
org.apache.iceberg.shaded.org.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:91)
at
org.apache.iceberg.shaded.org.apache.parquet.hadoop.ParquetFileReader$ConsecutivePartList.readAll(ParquetFileReader.java:1850)
at
org.apache.iceberg.shaded.org.apache.parquet.hadoop.ParquetFileReader.internalReadRowGroup(ParquetFileReader.java:990)
at
org.apache.iceberg.shaded.org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:940)
at
org.apache.iceberg.parquet.ParquetReader$FileIterator.advance(ParquetReader.java:147)
... 32 more
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]