RussellSpitzer commented on issue #12150: URL: https://github.com/apache/iceberg/issues/12150#issuecomment-2630889126
These are really questions for a whole book or a series of talks. I would recommend checking out https://www.youtube.com/playlist?list=PLkifVhhWtccxBSrKFPXOmjAFFEpeYii5K for all the Iceberg Summit videos from last year.

For short answers:

You should run all of those maintenance operations. The most important for most people are Rewrite Metadata and Expire Snapshots. The others are more contextual and expensive to actually run, so whether to use them is usage dependent imho.

The Spark APIs use distributed computing; that's the biggest difference. The Java APIs in Iceberg are also much lower level, meant more for users building engines or doing custom logic.

MOR (merge-on-read) is faster on write, slower on read. Good for sparse deletes.

COW (copy-on-write) is slower on write, faster on read. Good for dense deletes (many deletes in the same file, 30% or more of the file replaced).

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
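
For anyone who wants something concrete to start from, the short answers in the comment could be sketched roughly as below. This is only an illustration, not the commenter's own setup. It assumes Spark SQL with the Iceberg extensions enabled and builds the statements as plain strings so it runs without a cluster. The catalog name `my_catalog`, the table `db.events`, the helper names, and the 7-day retention are hypothetical placeholders. Mapping "Rewrite Metadata" to the `rewrite_manifests` procedure is an assumption, and the 30% cutoff is just the rule of thumb from the comment, not something Iceberg enforces.

```python
from datetime import datetime, timedelta


def maintenance_statements(catalog, table, now, retain_days=7):
    """Build Spark SQL CALL statements for Iceberg maintenance procedures.

    `catalog`, `table`, and `retain_days` are illustrative placeholders.
    """
    # The TIMESTAMP literal is interpreted in the Spark session time zone.
    cutoff = (now - timedelta(days=retain_days)).strftime("%Y-%m-%d %H:%M:%S")
    return [
        # The two operations the comment calls most important
        # (assumption: "Rewrite Metadata" means rewrite_manifests).
        f"CALL {catalog}.system.rewrite_manifests(table => '{table}')",
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{table}', older_than => TIMESTAMP '{cutoff}')",
        # The more expensive, usage-dependent operations.
        f"CALL {catalog}.system.rewrite_data_files(table => '{table}')",
        f"CALL {catalog}.system.remove_orphan_files(table => '{table}')",
    ]


def write_mode_statement(qualified_table, fraction_of_file_deleted,
                         dense_threshold=0.30):
    """Pick COW for dense deletes and MOR for sparse ones.

    `dense_threshold` encodes the "30% or more of the file replaced"
    rule of thumb; treat it as a starting point, not a hard limit.
    """
    if fraction_of_file_deleted >= dense_threshold:
        mode = "copy-on-write"
    else:
        mode = "merge-on-read"
    return (
        f"ALTER TABLE {qualified_table} SET TBLPROPERTIES ("
        f"'write.delete.mode'='{mode}', "
        f"'write.update.mode'='{mode}', "
        f"'write.merge.mode'='{mode}')"
    )


if __name__ == "__main__":
    now = datetime(2025, 2, 10, 12, 0, 0)
    for stmt in maintenance_statements("my_catalog", "db.events", now):
        print(stmt)  # with a live session this would be spark.sql(stmt)
    print(write_mode_statement("my_catalog.db.events", 0.05))
```

With a real `SparkSession`, each string would be passed to `spark.sql(...)`. Building the strings separately keeps the retention cutoff and the write mode easy to review before anything touches the table.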