NickyYe commented on pull request #2562:
URL: https://github.com/apache/hadoop/pull/2562#issuecomment-748338149


   > Thanks for the information - this may explain why HDFS-12703 was needed, 
as some exceptions which were not logged at that time, caused the decommission 
thread to stop running until the NN was restarted. The change there was to 
catch the exception.
   > 
   > The change here looks correct to me, but as the issue exists on the trunk 
branch, we should fix it there first, and then backport to 3.3, 3.2, 3.1 and 
2.10 so the fix is in place across all branches.
   
   Due to HDFS-14854, the fix on trunk could be a very different one, since it 
doesn't make sense to add a boolean parameter to stopTrackingNode in the new 
interface when DatanodeAdminBackoffMonitor doesn't need it.
   
   It looks like a better fix would be to introduce a cancelledNodes collection in 
DatanodeAdminDefaultMonitor, just like the one in DatanodeAdminBackoffMonitor. Then, in 
stopTrackingNode, don't remove the dn from outOfServiceNodeBlocks, but add it to 
cancelledNodes for further processing.
   
   However, the change would be a little bit bigger.
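
   To make the cancelledNodes idea concrete, here is a rough sketch. It is not 
the real Hadoop code: DefaultMonitorSketch, runMonitorTick, isTracked and 
pendingCancellations are hypothetical names, nodes are plain strings instead of 
DatanodeDescriptor, and synchronized stands in for whatever locking the real 
monitor relies on.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Simplified stand-in for DatanodeAdminDefaultMonitor; NOT the real class. */
class DefaultMonitorSketch {
  // Node -> blocks still to be checked (stand-in for outOfServiceNodeBlocks).
  private final Map<String, List<Long>> outOfServiceNodeBlocks = new HashMap<>();
  // Nodes whose decommission/maintenance was cancelled, awaiting cleanup.
  private final Queue<String> cancelledNodes = new ArrayDeque<>();

  synchronized void startTrackingNode(String dn) {
    outOfServiceNodeBlocks.putIfAbsent(dn, new ArrayList<>());
  }

  /** Only records the cancellation; the tracking map is left untouched. */
  synchronized void stopTrackingNode(String dn) {
    cancelledNodes.add(dn);
  }

  /** One monitor pass: drain cancellations first, then do the usual checks. */
  synchronized void runMonitorTick() {
    String dn;
    while ((dn = cancelledNodes.poll()) != null) {
      outOfServiceNodeBlocks.remove(dn);
    }
    // ... the normal per-node block checks would follow here ...
  }

  synchronized boolean isTracked(String dn) {
    return outOfServiceNodeBlocks.containsKey(dn);
  }

  synchronized int pendingCancellations() {
    return cancelledNodes.size();
  }
}
```

   With this shape, stopTrackingNode never modifies outOfServiceNodeBlocks 
itself; the removal only happens inside the monitor's own pass, so the interface 
shared with DatanodeAdminBackoffMonitor would not need an extra parameter.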




