Thanks for this. In the end, I think I'm going to assume there's enough
traffic that the node state riak-java-client keeps track of stays
reasonably up to date.
Of course I have yet another question. Even if I assume the state of
each node is correct, how do I know if the cluster overall is considered
healthy? This may not be a valid question, but I hope it is. For
example, if the cluster configuration requires 3 nodes to write, I can
write some fairly detailed code in riak-java-client to realize that's
the configuration and count that there are enough healthy nodes.
However, if I'm using something like haproxy, I'm not sure there's a
great spot to put that logic.
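To make the counting idea concrete, here is a minimal sketch. It is only an illustration: NodeState is a stand-in enum rather than the client's own state type, and the required count of 3 is just the number from the example above, not something read from the cluster configuration.

```java
import java.util.List;

public class QuorumHealth {
    // Stand-in for the client's per-node state (an assumption for this sketch);
    // only RUNNING is treated as healthy.
    enum NodeState { RUNNING, HEALTH_CHECKING, SHUTDOWN }

    // True when at least `required` of the given node states are RUNNING.
    static boolean enoughHealthyNodes(List<NodeState> states, int required) {
        long running = states.stream()
                .filter(s -> s == NodeState.RUNNING)
                .count();
        return running >= required;
    }

    public static void main(String[] args) {
        List<NodeState> states = List.of(
                NodeState.RUNNING, NodeState.RUNNING, NodeState.HEALTH_CHECKING);
        System.out.println(enoughHealthyNodes(states, 3)); // false: only two nodes are RUNNING
        System.out.println(enoughHealthyNodes(states, 2)); // true
    }
}
```

In the real client the list would come from looping over cluster.getNodes() and calling getNodeState() on each, as in the snippet from issue 456 quoted further down.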
Is there a way to query the cluster overall to ask a health question
like this?
-DB
On 12/8/15 1:09 PM, Alexander Sicular wrote:
Besides just plainly writing a key, you could also do something like (pseudo
code):
Riak.put(canaryKey, pw=n_val){
If ok -> cool!
If borked -> sad face
}
The important bit is that pw (primary write) equals your replication value
(n_val). This means that all copies in the virtual node replica set need to go
to virtual nodes allocated to their primary physical machines. This is a way you
can check cluster status from the app level, i.e., whether the cluster is in
some kind of borked state.
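A rough Java rendering of the canary idea, as a sketch only. The canaryPut hook and the ClusterStatus enum are hypothetical names for illustration; the hook receives the pw value to use and reports whether the store succeeded. With riak-java-client the hook would presumably wrap a store of the canary key with the pw option set, but that wiring is an assumption and is not shown here.

```java
import java.util.function.IntPredicate;

public class CanaryCheck {
    // Hypothetical result type for this sketch.
    enum ClusterStatus { OK, DEGRADED }

    // Runs the canary write with pw = nVal. A false return or an exception
    // from the hook is treated as a degraded cluster.
    static ClusterStatus check(IntPredicate canaryPut, int nVal) {
        try {
            return canaryPut.test(nVal) ? ClusterStatus.OK : ClusterStatus.DEGRADED;
        } catch (RuntimeException e) {
            return ClusterStatus.DEGRADED;
        }
    }

    public static void main(String[] args) {
        // Simulated clusters: the write succeeds only if enough primaries are up.
        IntPredicate allPrimariesUp = pw -> 3 >= pw;
        IntPredicate onePrimaryDown = pw -> 2 >= pw;
        System.out.println(check(allPrimariesUp, 3)); // OK
        System.out.println(check(onePrimaryDown, 3)); // DEGRADED
    }
}
```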
-Alexander
@siculars
http://siculars.posthaven.com
Sent from my iRotaryPhone
On Dec 8, 2015, at 14:13, David Byron <[email protected]> wrote:
I'm still curious what people think here. As I stare at this longer, I'd like
to be able to call RiakNode.checkHealth(), but it's private.
HealthMonitorTask.run only calls checkHealth some of the time, so without
the ability to call it directly, I think I'm getting a stale notion of health
in circumstances like I outlined below -- when the last operation was
successful, but the node has since gone down.
Thanks for your input.
-DB
On 12/2/15 10:25 PM, David Byron wrote:
I'm implementing a health check for a service of mine that uses riak.
I've seen this code from
https://github.com/basho/riak-java-client/issues/456:
RiakCluster cluster = clientInstance.getRiakCluster();
List<RiakNode> nodes = cluster.getNodes();
for (RiakNode node : nodes)
{
State state = node.getNodeState();
}
and it's great. From what I can tell, it depends on some background
processing that keeps track of the state of the nodes. I did a quick
test though, and if I run 'riak stop' from the command line and then
this loop with no intervening operations, the nodes report RUNNING. Even
after some time passes (more than three minutes), still RUNNING.
However, if I do run an intervening operation (some actual query of
data for example) that fails, the nodes then report HEALTH_CHECKING.
Then, after 'riak start', the nodes report RUNNING again. I suppose
that's good.
So, I'm trying to decide how to implement the health check. The above
loop doesn't seem to be enough, but do I really need to do something like:
final RiakFuture<Void, Void> future = cluster.execute(new PingOperation());
try {
future.await();
future.get();
} catch (ExecutionException | InterruptedException e) {
// bad
}
// good
Maybe it's sufficient to only do this if all the nodes report RUNNING? I
suppose there's always a small window in time where a node could report
bad, but via a ping I'd learn it was up...so I'm torn. Any suggestions
for whether pinging every time is correct, or there's something more
efficient (and safe)?
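One way the "only ping if all the nodes report RUNNING" option could look, as a sketch under assumptions: NodeState is a stand-in enum, and the ping is passed in as a hypothetical BooleanSupplier so the example does not depend on the client library. In real code the supplier would wrap the PingOperation future shown above.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

public class HealthCheck {
    // Stand-in for the client's per-node state (an assumption for this sketch).
    enum NodeState { RUNNING, HEALTH_CHECKING, SHUTDOWN }

    // Cheap check first: if any cached state is not RUNNING, report unhealthy
    // without touching the network. Otherwise confirm with a ping, because a
    // cached RUNNING may be stale.
    static boolean isHealthy(List<NodeState> cachedStates, BooleanSupplier ping) {
        if (cachedStates.isEmpty()) {
            return false;
        }
        for (NodeState state : cachedStates) {
            if (state != NodeState.RUNNING) {
                return false;
            }
        }
        try {
            return ping.getAsBoolean();
        } catch (RuntimeException e) {
            return false; // a failed ping counts as unhealthy
        }
    }

    public static void main(String[] args) {
        List<NodeState> allRunning = List.of(NodeState.RUNNING, NodeState.RUNNING);
        // Cached state says RUNNING, but the ping fails, so the result is false.
        System.out.println(isHealthy(allRunning, () -> false));
    }
}
```

The tradeoff is the window mentioned above: a node that has recovered but still shows HEALTH_CHECKING is reported unhealthy until the cached state catches up.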
Thanks for your help.
-DB
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com