NickyYe commented on pull request #2562: URL: https://github.com/apache/hadoop/pull/2562#issuecomment-748338149
> Thanks for the information - this may explain why HDFS-12703 was needed, as some exceptions which were not logged at that time, caused the decommission thread to stop running until the NN was restarted. The change there was to catch the exception. > > The change here looks correct to me, but as the issue exists on the trunk branch, we should fix it there first, and then backport to 3.3, 3.2, 3.1 and 2.10 so the fix is in place across all branches. Due to HDFS-14854, the fix on trunk could be a very different one, since it doesn't make sense to change the new interface with a boolean parameter to stopTrackingNode while DatanodeAdminBackoffMonitor does't need. Looks a better fix would be introduce a cancelledNodes to DatanodeAdminDefaultMonitor, just like DatanodeAdminBackoffMonitor . Then in stopTrackingNode, don't remove dn from outOfServiceNodeBlocks, but add it to cancelledNodes for further process. However, the change would be a little bit bigger. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
