ludwigxia opened a new pull request, #13638:
URL: https://github.com/apache/iceberg/pull/13638

   Our streaming jobs generate a large volume of metadata files. The 
recommended way to remove old ones is to set 
`write.metadata.delete-after-commit.enabled=true`. However, this only deletes 
metadata files that are tracked in the metadata log; it does not delete 
orphaned metadata files.
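
   To illustrate why files outside the metadata log survive, here is a toy 
model of log-based cleanup. It is a simplified sketch, not Iceberg's actual 
implementation: `trim_metadata_log` and its parameters are hypothetical names, 
and `max_previous` only loosely mirrors the 
`write.metadata.previous-versions-max` table property.

```python
from typing import List, Tuple


def trim_metadata_log(
    metadata_log: List[str], new_file: str, max_previous: int
) -> Tuple[List[str], List[str]]:
    """Toy model (assumption): return (kept, deleted) after one commit.

    Only names that appear in the log can ever land in `deleted`.
    """
    if max_previous < 0:
        raise ValueError("max_previous must be non-negative")
    entries = metadata_log + [new_file]
    # Keep the newest file plus at most `max_previous` older versions.
    keep = max_previous + 1
    if len(entries) <= keep:
        return entries, []
    return entries[-keep:], entries[:-keep]


log = ["v1.metadata.json", "v2.metadata.json", "v3.metadata.json"]
kept, deleted = trim_metadata_log(log, "v4.metadata.json", max_previous=2)
```

A metadata file written by a commit attempt that never succeeded is not in 
`log`, so this kind of cleanup never sees it, no matter how many commits 
follow.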
   
   Running `remove_orphan_files` would address such files, but it traverses 
the entire data directory, which imposes substantial performance overhead. 
Because streaming jobs follow an append-only pattern, orphaned data files 
rarely occur in practice. This PR therefore introduces an `only_metadata` 
option that cleans only the metadata directory, reducing overall processing 
latency for users who only need metadata directory cleanup.
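
   The idea of a metadata-only scan can be sketched as follows. This is an 
illustration under stated assumptions, not the code in this PR: 
`find_orphan_metadata_files` is a hypothetical helper, the table is assumed to 
live on a local filesystem with a `metadata/` subdirectory, and `referenced` 
is assumed to already hold every file the table still needs (metadata JSON 
files, manifest lists, manifests, statistics files).

```python
from pathlib import Path
from typing import Iterable, List


def find_orphan_metadata_files(
    table_location: str, referenced: Iterable[str], older_than_ts: float
) -> List[Path]:
    """List unreferenced files directly under <table_location>/metadata.

    Hypothetical helper: the data directory is never listed, which is where
    the time saving comes from.
    """
    metadata_dir = Path(table_location) / "metadata"
    referenced_names = {Path(p).name for p in referenced}
    orphans = []
    for path in sorted(metadata_dir.iterdir()):
        if not path.is_file():
            continue
        if path.name in referenced_names:
            continue
        # Skip recent files: they may belong to a commit still in progress.
        if path.stat().st_mtime >= older_than_ts:
            continue
        orphans.append(path)
    return orphans
```

The age cutoff matters as much here as in a full orphan scan: a file written 
by an in-flight commit is not yet referenced, so only files older than a safe 
threshold should be treated as orphans.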


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

