bryanck opened a new pull request, #13327:
URL: https://github.com/apache/iceberg/pull/13327

   This PR adds an option to `RewriteDataFiles` to set a predicate for 
filtering out data files by attributes. You can already specify an expression 
to filter out data files, but there is currently no way to filter out data 
files by attributes such as the data file location.
   
   We have tables with data landing in multiple regions in S3. When new data is 
committed, we trigger various processes, such as moving data in remote regions 
to the table location and running `RewriteDataFiles` to compact the data. This 
new file filter option lets us exclude data files in remote regions (based on 
their location) so that we only compact data local to the table location. This 
avoids concurrency issues (moving a file while it is being compacted) and lets 
the server-side file move take precedence over loading files from remote 
regions.
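
   As a rough illustration of the location-based filtering described above, the 
sketch below keeps only files whose path falls under the table location. It does 
not use the Iceberg API: `FileInfo`, `underLocation`, and `selectForCompaction` 
are hypothetical names invented for this example, and the option actually added 
to `RewriteDataFiles` by this PR may have a different name and signature.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class LocalFileFilterSketch {

  // Hypothetical stand-in for the data file attributes a predicate would inspect.
  static final class FileInfo {
    private final String location;

    FileInfo(String location) {
      this.location = location;
    }

    String location() {
      return location;
    }
  }

  // Builds a predicate that accepts only files stored under the table location.
  // A trailing slash is enforced so "s3://bucket/tbl" does not match "s3://bucket/tbl2/...".
  static Predicate<FileInfo> underLocation(String tableLocation) {
    String prefix = tableLocation.endsWith("/") ? tableLocation : tableLocation + "/";
    return file -> file.location().startsWith(prefix);
  }

  // Applies the predicate to the candidate files, dropping those in remote regions.
  static List<FileInfo> selectForCompaction(List<FileInfo> candidates, Predicate<FileInfo> filter) {
    return candidates.stream().filter(filter).collect(Collectors.toList());
  }
}
```

   With a predicate like this, files still waiting to be moved from a remote 
region are skipped by compaction and are only picked up once they sit under the 
table location.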


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

