ayushtkn commented on PR #5396: URL: https://github.com/apache/hadoop/pull/5396#issuecomment-1433606299
Viraj, I am sorry, but I am totally against this. I am writing this because I don't want to ghost you and then, if someone comes along and agrees, shoot a vote against it. I have mentioned a bunch of reasons above. I even think that if a client is connected to a datanode and happily reading a file, it might get impacted; AFAIK block locations can be cached as well, and there are many other reasons. I won't make you a list; I am pretty sure you are already aware of almost all of them. A service like the datanode killing itself doesn't sound feasible to me at all. Having these hooks in a service which holds data amounts to the same thing while opening up ways to get exploited, which sounds even riskier to me.

This is something cluster admin services should handle. A datanode going down or having trouble is a basic use case for HDFS; that is where replication pitches in. Ideally it should just alarm the admins, and they should figure out what went wrong. Maybe a restart won't fix things and you would be stuck in a loop: do a shutdown, shoot a BR (block report) to the nodes you are still connected to, then restart.

Metrics are there which can tell you which datanode is dead, so advanced cluster-administration services can leverage that. There is [JMXJsonServlet](https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/jmx/JMXJsonServlet.java) which can be leveraged. If those services are allowed to trigger a restart and do operations like shutdown, they should be allowed to fetch metrics as well. Alternatively, there is a getDatanodeStats API which can take DEAD as a parameter, and such logic can be built on a periodic check.

Regarding the cloud and metrics concerns: I got a chance to talk to some cloud infra folks at my org, and we do have ways to get metrics. I am not sharing how, because I don't know how professionally safe that is for me, but there are ways to do so. So this can be handled at the deployment level.
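To illustrate the external-service approach, here is a minimal sketch of how an admin service could read the dead-datanode count from the NameNode's JMXJsonServlet output (the servlet is typically served at `http://<namenode>:9870/jmx`). The bean name `Hadoop:service=NameNode,name=FSNamesystemState` and the `NumDeadDataNodes` attribute are standard Hadoop metrics, but treat the exact names and port as assumptions to verify against your deployment; the HTTP fetch is omitted here and an illustrative sample payload is parsed instead:

```python
import json

def count_dead_datanodes(jmx_json: str) -> int:
    """Parse /jmx servlet output and return NumDeadDataNodes.

    Looks for the FSNamesystemState bean; returns 0 if it is absent.
    """
    beans = json.loads(jmx_json).get("beans", [])
    for bean in beans:
        if bean.get("name") == "Hadoop:service=NameNode,name=FSNamesystemState":
            return int(bean.get("NumDeadDataNodes", 0))
    return 0

# Sample payload shaped like the servlet's response (values illustrative):
sample = json.dumps({
    "beans": [
        {
            "name": "Hadoop:service=NameNode,name=FSNamesystemState",
            "NumLiveDataNodes": 9,
            "NumDeadDataNodes": 1,
        }
    ]
})

print(count_dead_datanodes(sample))  # -> 1
```

A periodic poller built on this (or on `DistributedFileSystem.getDataNodeStats(DatanodeReportType.DEAD)` from Java) can then decide to alert or restart, keeping the decision outside the datanode itself.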
It should be done there only; as for this auto-shutdown logic based on some factors, I am just repeating myself: I am totally against it.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
