ayushtkn commented on PR #5396: URL: https://github.com/apache/hadoop/pull/5396#issuecomment-1433606299
Viraj, I am sorry, but I am totally against this. I am writing this because I don't want to ghost you and then, if someone comes along and agrees, shoot a vote against it. I have mentioned a bunch of reasons above. I even think that if a client is connected to a datanode and happily reading a file, it might get impacted; AFAIK block locations can be cached as well, and there are many other reasons. I won't make you a list; I am pretty sure you are already aware of almost all of them. A service like the datanode killing itself doesn't sound feasible to me at all. Having these hooks in a service which holds data amounts to the same thing while opening up ways to get exploited, which sounds even riskier to me.

This is something cluster admin services should handle. A datanode going down or having trouble is a basic use case for HDFS; that is where replication pitches in. Ideally it should just alarm the admins, and they should figure out what went wrong. Maybe a restart won't fix things and you would be stuck in a loop: do a shutdown, shoot a BR (block report) to the nodes you are still connected to, then restart.

Metrics are there which can tell you which datanode is dead, so advanced cluster-administration services can leverage that. There is [JMXJsonServlet](https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/jmx/JMXJsonServlet.java) which can be leveraged. If those services are allowed to trigger a restart and do operations like shutdown, they should be allowed to fetch metrics as well. Alternatively, there is a getDatanodeStats API which can take DEAD as a parameter, and such logic can be built on a periodic check.

Regarding the cloud and metrics concerns: I got a chance to talk to some cloud infra folks at my org, and we do have ways to get metrics. I am not sharing how, because I don't know how professionally safe that is for me, but there are ways to do so. So this can be handled at the deployment level.
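To illustrate the external-service approach, here is a minimal sketch of how an admin service could read the dead-datanode count from the NameNode's JMXJsonServlet output (the servlet is typically served at `http://<namenode>:9870/jmx`). The bean name `Hadoop:service=NameNode,name=FSNamesystemState` and the `NumDeadDataNodes` attribute are standard Hadoop metrics, but treat the exact names and port as assumptions to verify against your deployment; the HTTP fetch is omitted here and an illustrative sample payload is parsed instead:

```python
import json

def count_dead_datanodes(jmx_json: str) -> int:
    """Parse /jmx servlet output and return NumDeadDataNodes.

    Looks for the FSNamesystemState bean; returns 0 if it is absent.
    """
    beans = json.loads(jmx_json).get("beans", [])
    for bean in beans:
        if bean.get("name") == "Hadoop:service=NameNode,name=FSNamesystemState":
            return int(bean.get("NumDeadDataNodes", 0))
    return 0

# Sample payload shaped like the servlet's response (values illustrative):
sample = json.dumps({
    "beans": [
        {
            "name": "Hadoop:service=NameNode,name=FSNamesystemState",
            "NumLiveDataNodes": 9,
            "NumDeadDataNodes": 1,
        }
    ]
})

print(count_dead_datanodes(sample))  # -> 1
```

A periodic poller built on this (or on `DistributedFileSystem.getDataNodeStats(DatanodeReportType.DEAD)` from Java) can then decide to alert or restart, keeping the decision outside the datanode itself.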
It should be done there only; as for this auto-shutdown logic based on some factors, I am just repeating myself: I am totally against it.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
