mxm opened a new pull request, #14358: URL: https://github.com/apache/iceberg/pull/14358
DynamicWriteResultAggregator uses the ManifestOutputFileFactory class to write temporary manifests. The Dynamic Sink is meant to support writing to a very large number of tables, even within a single checkpoint, so instead of keeping every factory around we hold them in a cache with an eviction policy. The problem is that the factory for a given table can be evicted during a checkpoint flush while writes for that table are still being processed. When that happens, a new factory is created for the table and generates the same output paths again, which overwrites manifest files that were already written. We must avoid recreating the output file factory during checkpoint flushing. It is fine to drop factories through cache eviction afterwards, because the output paths of a factory are scoped by checkpoint id.
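To make the failure mode concrete, here is a minimal, self-contained sketch. It does not use the Iceberg classes: `PathFactory`, its path layout, the per-instance file counter, and the `flush` helper are all hypothetical stand-ins, and the "pinned" map is just one possible way to keep a factory alive for the duration of a flush, not necessarily what this PR implements. The sketch assumes that a recreated factory restarts its file numbering, which is what makes the regenerated path collide.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these types are Iceberg APIs.
class ManifestPathSketch {

  /** Stand-in for an output file factory: paths are scoped by table and checkpoint id. */
  static final class PathFactory {
    private final String table;
    private final long checkpointId;
    private int fileCount = 0; // assumption: numbering restarts with every new instance

    PathFactory(String table, long checkpointId) {
      this.table = table;
      this.checkpointId = checkpointId;
    }

    String nextPath() {
      return table + "/chk-" + checkpointId + "/manifest-" + (fileCount++) + ".avro";
    }
  }

  /** Small LRU cache that evicts the least recently used entry beyond maxSize. */
  static <K, V> Map<K, V> lruCache(final int maxSize) {
    return new LinkedHashMap<K, V>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize;
      }
    };
  }

  /**
   * Simulates one checkpoint flush that writes one manifest per entry in tables.
   * With pinDuringFlush = false, factories live only in a size-1 LRU cache.
   * With pinDuringFlush = true, factories are held in a plain map until the flush ends.
   */
  static List<String> flush(List<String> tables, long checkpointId, boolean pinDuringFlush) {
    Map<String, PathFactory> factories =
        pinDuringFlush ? new HashMap<String, PathFactory>() : ManifestPathSketch.<String, PathFactory>lruCache(1);
    List<String> paths = new ArrayList<>();
    for (String table : tables) {
      PathFactory factory = factories.get(table);
      if (factory == null) {
        // Cache miss: after an eviction this recreates the factory and resets its counter.
        factory = new PathFactory(table, checkpointId);
        factories.put(table, factory);
      }
      paths.add(factory.nextPath());
    }
    return paths;
  }

  static boolean hasDuplicates(List<String> paths) {
    return new HashSet<>(paths).size() < paths.size();
  }

  public static void main(String[] args) {
    List<String> tables = List.of("a", "b", "a");
    System.out.println("evicting: " + flush(tables, 7L, false));
    System.out.println("pinned:   " + flush(tables, 7L, true));
  }
}
```

In the evicting run, the second write for table `a` arrives after its factory has been pushed out by table `b`, so the new factory hands out the first path again. In the pinned run the same factory instance is reused within the flush and the paths stay distinct; once the flush is over the map can be discarded, since the next checkpoint id changes the path prefix anyway.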
