hydrogenlee commented on PR #14080: URL: https://github.com/apache/iceberg/pull/14080#issuecomment-3350041356
Thanks for your suggestions @stevenzwu @gyfora. But I still feel that cleaning up state in `initializeState` to avoid a large state size (rather than to free memory) isn't the right approach, because:

1. The cleanup logic would be split across two different places.
2. It could mask a bug in `snapshotState`. For example (although unlikely), suppose all subtasks retain their state without clearing it, and the state is then cleaned up in `initializeState`. With relatively small parallelism everything seems fine, even though a bug exists, because the cleanup in `initializeState` lets execution continue without surfacing the error. With higher parallelism (e.g., thousands or more), the job may get stuck in the INITIALIZING state, because `initializeState` would never get a chance to run.

I think the key to avoiding future errors is to ensure that `globalStatisticsState` contains only one element.
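To make the scenario in point 2 concrete, here is a minimal sketch in plain Java with no Flink dependency. It assumes `globalStatisticsState` is redistributed union-style, so that on restore every subtask is handed the entries written by all subtasks. The class `UnionStateSketch`, its methods, and the `onlySubtaskZeroWrites` flag are hypothetical names for illustration only, not the operator's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation; not Flink or Iceberg API.
class UnionStateSketch {

    // What one subtask writes at snapshot time. The list starts empty,
    // which models clearing the state before writing.
    public static List<String> snapshot(
            int subtaskIndex, String globalStats, boolean onlySubtaskZeroWrites) {
        List<String> state = new ArrayList<>();
        if (!onlySubtaskZeroWrites || subtaskIndex == 0) {
            state.add(globalStats);
        }
        return state;
    }

    // Assumed union redistribution: each restoring subtask receives the
    // concatenation of the entries written by every subtask.
    public static List<String> restoreUnion(int parallelism, boolean onlySubtaskZeroWrites) {
        List<String> union = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            union.addAll(snapshot(i, "stats", onlySubtaskZeroWrites));
        }
        return union;
    }

    public static void main(String[] args) {
        int parallelism = 1000;
        // Intended behavior: a single element, regardless of parallelism.
        System.out.println(restoreUnion(parallelism, true).size());
        // Buggy snapshot: every subtask keeps a copy, so each restoring
        // subtask has to load `parallelism` copies.
        System.out.println(restoreUnion(parallelism, false).size());
    }
}
```

Under this assumption, the buggy case makes the restored list grow with parallelism for every subtask, and that load happens before any cleanup placed in `initializeState` could take effect.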
