RussellSpitzer commented on PR #16330: URL: https://github.com/apache/iceberg/pull/16330#issuecomment-4452121571
I kind of want to do something smarter than this long term, but this is probably a good first step. For example, just because a file was "written" with a sort order doesn't mean it shouldn't be re-sorted. In the original doc I proposed looking at overlaps and only selecting files for rewriting where the overlap depth reached a certain level.

For example, if I have these files:

    [1 - 100] - SortId 1
    [1 - 100] - SortId 1
    [1 - 100] - SortId 1
    [1 - 100] - No SortId

Just rewriting the last file doesn't make sense, and ignoring the first three is probably a mistake.
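
To make the overlap-depth idea concrete, here is a rough sketch of selecting files by how deeply their sort-key ranges overlap, rather than by whether a sort order id was recorded. Everything in it is an assumption for illustration: `FileRange`, `overlap_depths`, and `select_for_rewrite` are hypothetical names, not Iceberg APIs; bounds are treated as inclusive integers on a single sort column; and the threshold of 3 is arbitrary.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FileRange:
    """Hypothetical stand-in for a data file's inclusive sort-key bounds."""
    name: str                     # assumed unique per file
    lower: int
    upper: int
    sort_order_id: Optional[int]  # None means no sort order id was recorded


def overlap_depths(files: List[FileRange]) -> Dict[str, int]:
    """For each file, the largest number of files (itself included)
    that cover a single key somewhere inside that file's range."""
    # For closed intervals, the deepest point is always reached at some
    # interval's lower bound, so only those points need to be checked.
    points = {f.lower for f in files}
    depth_at = {
        p: sum(1 for f in files if f.lower <= p <= f.upper) for p in points
    }
    return {
        f.name: max(d for p, d in depth_at.items() if f.lower <= p <= f.upper)
        for f in files
    }


def select_for_rewrite(files: List[FileRange], min_depth: int) -> List[str]:
    """Pick files whose overlap depth reaches min_depth.
    The sort order id is deliberately ignored here."""
    depths = overlap_depths(files)
    return [f.name for f in files if depths[f.name] >= min_depth]


# The scenario from the comment, plus one disjoint file for contrast.
files = [
    FileRange("a", 1, 100, 1),
    FileRange("b", 1, 100, 1),
    FileRange("c", 1, 100, 1),
    FileRange("d", 1, 100, None),
    FileRange("e", 200, 300, 1),
]
selected = select_for_rewrite(files, min_depth=3)
print(selected)
```

With these inputs the four fully overlapping files are selected together, whether or not they carry a sort order id, and the disjoint file is left alone. The scan is quadratic in the number of files; a real implementation would more likely use a sorted sweep over the bounds.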
