ludwigxia opened a new pull request, #13638: URL: https://github.com/apache/iceberg/pull/13638
Our streaming jobs generate a large volume of metadata files. The recommended way to remove old metadata files is to set `write.metadata.delete-after-commit.enabled=true`. However, that setting only deletes metadata files that are tracked in the metadata log; it does not delete orphaned metadata files. Running `remove_orphan_files` would clean up such files, but traversing the entire data directory imposes substantial performance overhead. Because streaming jobs follow an append-only pattern, orphaned data files rarely occur in practice. This PR therefore introduces an `only_metadata` option that restricts cleanup to the metadata directory, which reduces overall processing time for users who only need the metadata directory cleaned.
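
To illustrate why limiting the scan helps, here is a minimal Python sketch of the idea, not the Iceberg implementation. The function `find_orphans`, its parameters, the example paths, and the assumption that metadata lives under `<table_location>/metadata` are all hypothetical; the sketch also leaves out the age filter that a real orphan cleanup would apply.

```python
def find_orphans(listed_files, referenced_files, table_location, only_metadata=False):
    """Return listed files that nothing in the table references.

    Hypothetical sketch: with only_metadata=True, only files under
    <table_location>/metadata are considered, so files in the data
    directory are never examined.
    """
    base = table_location.rstrip("/")
    # Restrict the candidate set to the metadata directory when requested.
    prefix = base + ("/metadata/" if only_metadata else "/")
    referenced = set(referenced_files)
    return sorted(
        path
        for path in listed_files
        if path.startswith(prefix) and path not in referenced
    )


# Illustrative paths only (assumed layout, not taken from the PR).
TABLE = "/warehouse/db/tbl"
listed = [
    TABLE + "/data/00000-0.parquet",
    TABLE + "/data/00001-0.parquet",        # unreferenced data file
    TABLE + "/metadata/v1.metadata.json",   # unreferenced metadata file
    TABLE + "/metadata/v2.metadata.json",
    TABLE + "/metadata/snap-1.avro",
]
referenced = [
    TABLE + "/data/00000-0.parquet",
    TABLE + "/metadata/v2.metadata.json",
    TABLE + "/metadata/snap-1.avro",
]

metadata_orphans = find_orphans(listed, referenced, TABLE, only_metadata=True)
all_orphans = find_orphans(listed, referenced, TABLE)
```

In this sketch the metadata-only pass never looks at entries under `data/`, which is where an append-only streaming table keeps most of its files. The trade-off is that an unreferenced data file, such as `00001-0.parquet` above, is only found by the full pass.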
