aokolnychyi opened a new pull request, #8972: URL: https://github.com/apache/iceberg/pull/8972
This PR migrates the action for rewriting manifests to use rolling writers. Currently, we collect all entries in a Spark partition into a list to determine how many entries must be written, and then decide whether to split them across multiple manifest files. This is slow because it forces Spark to materialize every record in a partition before writing starts. It also consumes a lot of memory, since the entire Spark partition is loaded into memory at once.
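The difference between the two approaches can be illustrated with a minimal, self-contained sketch. This is not the code from the PR and does not use Iceberg's actual writer classes: `RollingWriterSketch`, `RollingWriter`, `writeRolling`, and the entry-count threshold are all hypothetical names and assumptions chosen for illustration (a real rolling writer would more likely roll on file size in bytes). The point is only that entries are consumed one at a time from an iterator and a new output "file" is started whenever the current one reaches its target, so the full partition never needs to be collected into a list first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; these are not Iceberg classes.
final class RollingWriterSketch {

  // Stand-in for a rolling writer. Each inner list represents one output file.
  // Assumption: the roll threshold is an entry count, purely for simplicity.
  static final class RollingWriter<T> {
    private final int targetEntriesPerFile;
    private final List<List<T>> closedFiles = new ArrayList<>();
    private List<T> currentFile = new ArrayList<>();

    RollingWriter(int targetEntriesPerFile) {
      if (targetEntriesPerFile <= 0) {
        throw new IllegalArgumentException("target must be positive");
      }
      this.targetEntriesPerFile = targetEntriesPerFile;
    }

    // Writes one entry and rolls to a new file once the target is reached.
    void write(T entry) {
      currentFile.add(entry);
      if (currentFile.size() >= targetEntriesPerFile) {
        rollToNewFile();
      }
    }

    private void rollToNewFile() {
      closedFiles.add(currentFile);
      currentFile = new ArrayList<>();
    }

    // Closes the last file (if it holds anything) and returns all files.
    List<List<T>> close() {
      if (!currentFile.isEmpty()) {
        rollToNewFile();
      }
      return closedFiles;
    }
  }

  // Streams entries straight into the writer. Only the file currently being
  // written is buffered here; a real writer would flush entries to storage.
  static <T> List<List<T>> writeRolling(Iterator<T> entries, int targetEntriesPerFile) {
    RollingWriter<T> writer = new RollingWriter<>(targetEntriesPerFile);
    while (entries.hasNext()) {
      writer.write(entries.next());
    }
    return writer.close();
  }

  public static void main(String[] args) {
    Iterator<String> entries = Arrays.asList("a", "b", "c", "d", "e").iterator();
    List<List<String>> files = writeRolling(entries, 2);
    System.out.println(files); // prints [[a, b], [c, d], [e]]
  }
}
```

In this sketch the decision to split is made incrementally while writing, rather than up front from a total entry count, which is why the input can remain a lazily evaluated iterator.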