[GitHub] [iceberg] vshel opened a new issue, #5997: Iceberg table maintenance/compaction within AWS

GitBox Mon, 17 Oct 2022 02:42:09 -0700


vshel opened a new issue, #5997:
URL: https://github.com/apache/iceberg/issues/5997


   ### Query engine
   
   Spark3
   
   ### Question
   
   Hello, I have a ~6 TB Iceberg table with ~10,000 partitions in S3, and I 
am using the Glue catalog. What is the correct way to run compaction on such a 
table?
   
   From the documentation (https://iceberg.apache.org/docs/latest/maintenance/), 
I can run:
   ```
   SparkActions
       .get()
       .rewriteDataFiles(table)
       .filter(Expressions.equal("date", "2020-08-18"))
       .option("target-file-size-bytes", Long.toString(500 * 1024 * 1024)) // 500 MB
       .execute();
   ```
   This is going to execute on a single AWS instance. How do I scale this to 
many instances so that compaction runs in parallel on many partitions at once? 
Is there out-of-the-box support for this?
   Additionally, the table is constantly updated. Am I supposed to pause all 
updates until compaction finishes?
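
For context on the two questions above, here is a minimal sketch of the options that seem relevant. As far as I understand, `rewriteDataFiles` runs as ordinary Spark jobs, so the rewrite work is spread over whatever executors the Spark application has, and Iceberg commits use optimistic concurrency, so pausing writers should not be strictly required (a conflicting commit can still fail a compaction commit). The option keys are the ones I know from Iceberg's `RewriteDataFiles` action. The class name `CompactionOptionsSketch`, the helper names, and the values 10 and 20 are assumptions for illustration, not recommendations.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds the string-valued options for a rewriteDataFiles run.
class CompactionOptionsSketch {

    /** Converts mebibytes to bytes using long arithmetic to avoid int overflow. */
    static long mib(long n) {
        return n * 1024L * 1024L;
    }

    /** Builds the options map. All numeric values passed in are illustrative assumptions. */
    static Map<String, String> rewriteOptions(long targetFileMib, int concurrentGroups, int maxCommits) {
        Map<String, String> options = new LinkedHashMap<>();
        // Same target size as in the snippet from the documentation.
        options.put("target-file-size-bytes", Long.toString(mib(targetFileMib)));
        // Number of file groups rewritten at the same time.
        options.put("max-concurrent-file-group-rewrites", Integer.toString(concurrentGroups));
        // Commit finished file groups in several smaller commits instead of one large one.
        options.put("partial-progress.enabled", "true");
        options.put("partial-progress.max-commits", Integer.toString(maxCommits));
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> options = rewriteOptions(500, 10, 20);
        options.forEach((key, value) -> System.out.println(key + "=" + value));
        // With the Iceberg Spark runtime on the classpath, the map would be passed as:
        // SparkActions.get().rewriteDataFiles(table).options(options).execute();
    }
}
```

Dropping the `.filter(...)` call would let a single run cover all partitions. With partial progress enabled, a commit conflict with a concurrent writer should cost only the affected file groups rather than the whole run.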
   
   Thank you.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
