ayushtkn commented on PR #5396:
URL: https://github.com/apache/hadoop/pull/5396#issuecomment-1433606299

   Viraj, I am sorry, but I am totally against this. I am writing this because 
I don't want to ghost you and then, if someone comes along and agrees, shoot a 
vote against it.
   
   I have mentioned a bunch of reasons above. I even think that if some client 
is connected to a datanode and happily reading a file, it might get impacted; 
AFAIK block locations can be cached as well, and there are many other reasons. 
I don't want to give you a full list, as I am pretty sure you are aware of 
almost all of them...
   
   A service like the datanode killing itself doesn't sound feasible to me at 
all. Having these hooks in a service which holds data amounts to doing the 
same thing while opening up ways to get exploited. That sounds even more risky 
to me.
   
   This is something cluster admin services should handle. A datanode going 
down or having trouble is a basic use case for HDFS; that is where replication 
pitches in.
   
   Ideally it should just alarm the admins, and they should figure out what 
went wrong. Maybe a restart won't fix things and you would be stuck in a loop: 
doing a shutdown, shooting a BR to the ones you are still connected to, and 
then restarting.
   
   Metrics are there which can tell you which datanode is dead, so advanced 
cluster administration services can leverage that. There is 
[JMXJsonServlet](https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/jmx/JMXJsonServlet.java)
 which can be used for this. If those services are allowed to trigger a 
restart and do operations like shutdown, they should be allowed to fetch 
metrics as well. Alternatively, there is an API, getDatanodeStats, which can 
take DEAD as a parameter, and such logic can be built as a periodic check.
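   For what it's worth, the periodic check could look roughly like this. This 
is only a sketch: the NameNode's `/jmx` endpoint (served by JMXJsonServlet) is 
real, but the exact bean and attribute names (`FSNamesystemState`, 
`DeadNodes`) and the hostnames below are illustrative and should be verified 
against your own cluster's `/jmx` output.

   ```python
   import json

   # Sketch of an external admin-side check: poll the NameNode's /jmx
   # endpoint and extract the dead-datanode list, instead of having the
   # datanode shut itself down. Here the HTTP fetch is omitted and we
   # parse a sample payload; in practice you would GET
   # http://<namenode>:9870/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState

   def dead_datanodes(jmx_payload: str) -> list[str]:
       """Return hostnames of dead datanodes from a /jmx JSON response."""
       doc = json.loads(jmx_payload)
       for bean in doc.get("beans", []):
           if bean.get("name", "").endswith("name=FSNamesystemState"):
               # DeadNodes is itself a JSON-encoded string in the JMX output.
               return sorted(json.loads(bean.get("DeadNodes", "{}")).keys())
       return []

   # Example payload, trimmed to the fields this check needs
   # (hostname is made up for illustration):
   sample = json.dumps({
       "beans": [{
           "name": "Hadoop:service=NameNode,name=FSNamesystemState",
           "NumDeadDataNodes": 1,
           "DeadNodes": json.dumps({"dn3.example.com": {"lastContact": 912}}),
       }]
   })

   print(dead_datanodes(sample))  # ['dn3.example.com']
   ```

   An external service running this on a timer can then alarm the admins or 
trigger its own restart/shutdown workflow, keeping that decision outside the 
datanode process.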
   
   Regarding the cloud thing and the metrics stuff: I got a chance to talk to 
some cloud infra folks at my org, and we do have ways to get metrics. I am not 
sharing how, because I don't know how professionally safe that is for me, but 
there are ways to do so.
   
   So, this can be handled at the deployment level, and it should be done 
there only. As for this auto-shutdown logic based on some factors: I am just 
repeating myself, but I am totally against it...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
