bryanck opened a new pull request, #13327: URL: https://github.com/apache/iceberg/pull/13327
This PR adds an option to `RewriteDataFiles` to set a predicate that filters data files by their attributes. You can already specify an expression to filter data files, but there is currently no way to filter them by file attributes such as the data file location.

We have tables with data landing in multiple S3 regions. When new data is committed, we trigger several processes: one moves data from remote regions to the table location, and another runs rewrite data files to compact the data. The new file filter option lets us exclude data files in remote regions (based on their location), so that we only compact data local to the table location. This prevents concurrency issues (moving data while it is being compacted) and lets the server-side file move take precedence over loading files from remote regions.
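To illustrate the idea of a location-based file filter, here is a minimal sketch in plain Java. It does not use the Iceberg API or the option this PR adds: `FileInfo`, `underPrefix`, `selectForCompaction`, and the bucket names are all hypothetical stand-ins, chosen only to show how a predicate on the file location could keep local files and skip remote ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class LocationFileFilter {

    // Hypothetical stand-in for data file metadata; this is NOT Iceberg's DataFile.
    public static final class FileInfo {
        public final String location;
        public final long sizeBytes;

        public FileInfo(String location, long sizeBytes) {
            this.location = location;
            this.sizeBytes = sizeBytes;
        }
    }

    // Builds a predicate that accepts only files stored under the table location.
    // A trailing slash is added so "s3://bucket/tbl" does not match "s3://bucket/tbl2/...".
    public static Predicate<FileInfo> underPrefix(String tableLocation) {
        String prefix = tableLocation.endsWith("/") ? tableLocation : tableLocation + "/";
        return file -> file.location.startsWith(prefix);
    }

    // Returns the files accepted by the predicate, preserving input order.
    public static List<FileInfo> selectForCompaction(List<FileInfo> files, Predicate<FileInfo> keep) {
        List<FileInfo> selected = new ArrayList<>();
        for (FileInfo file : files) {
            if (keep.test(file)) {
                selected.add(file);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Assumed example paths: one bucket holds the table, another is a remote landing area.
        List<FileInfo> files = Arrays.asList(
            new FileInfo("s3://local-bucket/db/tbl/data/a.parquet", 10L),
            new FileInfo("s3://remote-bucket/landing/b.parquet", 20L),
            new FileInfo("s3://local-bucket/db/tbl/data/c.parquet", 30L));

        List<FileInfo> local = selectForCompaction(files, underPrefix("s3://local-bucket/db/tbl"));
        for (FileInfo file : local) {
            System.out.println(file.location);
        }
    }
}
```

In this sketch the remote file is simply left out of the compaction candidates, so a separate process can move it without racing the rewrite. How the actual option is named and wired into `RewriteDataFiles` is defined by the PR itself.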