rdsarvar opened a new pull request, #11368: URL: https://github.com/apache/iceberg/pull/11368
# Description

Currently, data file rewrites support specifying the output spec ID to use. This PR adds the ability to provide a partition spec itself, which is added to the table as a non-default spec if it does not already exist.

# Benefits

These changes would make it simpler to tier partition granularity by time range. As an example: say your table is heavily used, queries mostly target the most recent data, and you still want people to be able to query back in time. You could achieve additional performance improvements by applying more granular partitions in the base table and then running a compaction job by tiers:

1. Short-term compaction (reuses the table definition: high granularity, gets rid of as many small files as possible)
2. Long-term compaction (uses a specified partition spec that is not the default: lower granularity, cuts down the metadata stored for the table)

# Notes for Reviewers

**Note: This is definitely not complete and I am open to all feedback, whether some of this functionality already exists elsewhere or whether it should be done differently.**

The part I'm mostly iffy on is modifying `BaseUpdatePartitionSpec.java` with `table.updateSpec()` instead of having something like `table.addSpec(partitionSpec).addNonDefaultSpec().commit()`.
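To make the tiering idea under Benefits concrete, here is a minimal sketch of how a compaction job might choose an output spec per partition based on its age. It does not use any Iceberg API: the class name `CompactionTiers`, the spec IDs `0` and `1`, and the 30-day cutoff are all hypothetical values chosen for illustration, not part of this PR.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

public class CompactionTiers {
    // Hypothetical spec IDs: 0 stands for the table's default (fine-grained) spec,
    // 1 for a coarser, non-default spec registered on the table.
    static final int DEFAULT_SPEC_ID = 0;
    static final int COARSE_SPEC_ID = 1;

    // Hypothetical cutoff between the short-term and long-term tiers.
    static final long SHORT_TERM_DAYS = 30;

    // Returns the spec ID a rewrite of this partition would target:
    // recent data keeps the default spec, older data moves to the coarse one.
    static int outputSpecIdFor(LocalDate partitionDate, LocalDate today) {
        long ageDays = ChronoUnit.DAYS.between(partitionDate, today);
        return ageDays <= SHORT_TERM_DAYS ? DEFAULT_SPEC_ID : COARSE_SPEC_ID;
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2024, 10, 22);
        // 7 days old: short-term tier, default spec.
        System.out.println(outputSpecIdFor(LocalDate.of(2024, 10, 15), today));
        // Roughly 10 months old: long-term tier, coarse spec.
        System.out.println(outputSpecIdFor(LocalDate.of(2024, 1, 1), today));
    }
}
```

In a real job, the returned ID would presumably be passed to the rewrite as its output spec, or replaced by the partition spec itself if the change proposed here is accepted.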