Re: [PR] Docs: Fix description of min-input-files option of Spark rewrite_data_files procedure [iceberg]

via GitHub Thu, 19 Jun 2025 10:08:12 -0700


pvary commented on code in PR #13355:
URL: https://github.com/apache/iceberg/pull/13355#discussion_r2157413509



##########
docs/docs/spark-procedures.md:
##########
@@ -406,7 +406,7 @@ Iceberg can compact data files in parallel using Spark with the `rewriteDataFile
 | `target-file-size-bytes` | 536870912 (512 MB, default value of `write.target-file-size-bytes` from [table properties](configuration.md#write-properties)) | Target output file size |
 | `min-file-size-bytes` | 75% of target file size | Files under this threshold will be considered for rewriting regardless of any other criteria |
 | `max-file-size-bytes` | 180% of target file size | Files with sizes above this threshold will be considered for rewriting regardless of any other criteria |
-| `min-input-files` | 5 | Any file group exceeding this number of files will be rewritten regardless of other criteria |
+| `min-input-files` | 5 | Any file group (with at least two files) having this number of files or more will be rewritten regardless of other criteria |

Review Comment:
   Maybe add this as a separate sentence at the end? That would be easier to understand than trying to decipher what `with at least two files` means in this context.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

