Re: [PR] Doc: Update rewrite data files spark procedure [iceberg]

via GitHub Fri, 25 Oct 2024 14:53:11 -0700


himadripal commented on code in PR #11396:
URL: https://github.com/apache/iceberg/pull/11396#discussion_r1817404590



##########
docs/docs/spark-procedures.md:
##########
@@ -402,7 +403,8 @@ Iceberg can compact data files in parallel using Spark with the `rewriteDataFile
 | `rewrite-all` | false | Force rewriting of all provided files overriding other options |
 | `max-file-group-size-bytes` | 107374182400 (100GB) | Largest amount of data that should be rewritten in a single file group. The entire rewrite operation is broken down into pieces based on partitioning and within partitions based on size into file-groups.  This helps with breaking down the rewriting of very large partitions which may not be rewritable otherwise due to the resource constraints of the cluster. |
 | `delete-file-threshold` | 2147483647 | Minimum number of deletes that needs to be associated with a data file for it to be considered for rewriting |
-
+| `output-spec-id` | current partition spec id | Desired partition spec ID to be used for rewriting data files. This allows data files to be rewritten with a custom partition spec. |

Review Comment:
   `output-spec-id`: maybe mention that it needs to be the ID of one of the
   table's existing partition specs (for example, one of its previous specs).
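
   To make the constraint concrete, here is a minimal sketch of the kind of
   check the docs could describe. It assumes the caller already knows the
   table's spec IDs (in the Java API they are exposed by `Table.specs()`);
   the helper name `build_rewrite_call` and its parameters are hypothetical
   and exist only for illustration. The `CALL ... rewrite_data_files(...)`
   shape follows the procedure syntax already used in `spark-procedures.md`.

```python
def build_rewrite_call(catalog, table, output_spec_id, existing_spec_ids):
    """Build a Spark SQL CALL for rewrite_data_files with 'output-spec-id'.

    Hypothetical helper: existing_spec_ids is assumed to hold the IDs of
    the partition specs the table already has (current and previous).
    """
    # Reject IDs that do not belong to any spec known to the table.
    if output_spec_id not in existing_spec_ids:
        raise ValueError(
            f"output-spec-id {output_spec_id} is not one of the table's "
            f"existing partition specs: {sorted(existing_spec_ids)}"
        )
    # Option values are passed as strings inside map(...).
    return (
        f"CALL {catalog}.system.rewrite_data_files("
        f"table => '{table}', "
        f"options => map('output-spec-id', '{output_spec_id}'))"
    )


if __name__ == "__main__":
    # Example: the table is assumed to have specs 0 and 1.
    print(build_rewrite_call("my_catalog", "db.sample", 1, {0, 1}))
```

   The resulting string could then be passed to `spark.sql(...)`. Whether
   the procedure itself rejects unknown IDs is not something this sketch
   verifies, which is why stating the requirement in the docs would help.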



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-h...@iceberg.apache.org
