hydrogenlee commented on PR #14080:
URL: https://github.com/apache/iceberg/pull/14080#issuecomment-3350041356

   Thanks for your suggestions @stevenzwu @gyfora. But I still feel that 
cleaning up state in `initializeState` to avoid a large state size (rather than 
to free memory) isn't the right approach, because:
   
   1. The cleanup logic would be split into two different places.
   2. If `snapshotState` contains incorrect logic, for example (although 
unlikely) all subtasks retain their state without clearing it, and the state 
is then cleaned up in `initializeState`, the job might seem fine when the 
parallelism is relatively small (even though a bug exists, because the cleanup 
in `initializeState` lets execution continue without surfacing the error). But 
when the parallelism is higher (e.g., thousands or more), the job may get stuck 
in the INITIALIZING state, because `initializeState` would never get the 
chance to run.
   
   I think the key to avoiding future errors is to ensure that the 
`globalStatisticsState` contains only one element. 
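
   To illustrate the invariant, here is a minimal plain-Java sketch (no Flink 
dependency). It assumes the state behaves like a union list state, where every 
subtask receives the entries of all subtasks on restore; the comment above does 
not state that. The class and method names (`UnionStateSketch`, `snapshot`, 
`restoreUnion`, `restoreSingle`) are hypothetical and not taken from the PR.

```java
import java.util.ArrayList;
import java.util.List;

public class UnionStateSketch {

  // Hypothetical snapshot rule: only subtask 0 keeps the global statistics,
  // every other subtask snapshots an empty list.
  static List<String> snapshot(int subtaskIndex, String globalStats) {
    List<String> state = new ArrayList<>();
    if (subtaskIndex == 0) {
      state.add(globalStats);
    }
    return state;
  }

  // Simulates union redistribution (an assumption of this sketch): on restore,
  // each subtask sees the concatenation of what all subtasks snapshotted.
  static List<String> restoreUnion(List<List<String>> snapshots) {
    List<String> union = new ArrayList<>();
    for (List<String> perSubtask : snapshots) {
      union.addAll(perSubtask);
    }
    return union;
  }

  // Fails fast instead of silently cleaning up when more than one element
  // is restored, so a faulty snapshot path is surfaced as an error.
  static String restoreSingle(List<String> restored) {
    if (restored.size() > 1) {
      throw new IllegalStateException(
          "Expected at most one element, found " + restored.size());
    }
    return restored.isEmpty() ? null : restored.get(0);
  }

  public static void main(String[] args) {
    int parallelism = 4;
    List<List<String>> snapshots = new ArrayList<>();
    for (int i = 0; i < parallelism; i++) {
      snapshots.add(snapshot(i, "stats"));
    }
    System.out.println(restoreSingle(restoreUnion(snapshots)));
  }
}
```

   Under that assumption, if every subtask retained its entry, each subtask 
would restore as many elements as the parallelism, so the total restored volume 
would grow quadratically. Failing fast in `restoreSingle` is one possible way to 
surface such a bug rather than mask it.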


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

