Re: cluster health check using riak-java-client

David Byron Sun, 13 Dec 2015 17:33:51 -0800

Thanks for this. I think in the end I'm going to assume there's sufficient traffic that the node state riak-java-client keeps track of is up to date enough.

Of course I have yet another question. Even if I assume the state of each node is correct, how do I know if the cluster overall is considered healthy? This may not be a valid question, but I hope it is. For example, if the cluster configuration requires 3 nodes to write, I can write some fairly detailed code in riak-java-client to realize that's the configuration and count that there are enough healthy nodes. However, if I'm using something like haproxy, I'm not sure there's a great spot to put that logic.

Is there a way to query the cluster overall to ask a health question like this?
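The "count that there are enough healthy nodes" idea could look roughly like the stdlib-only sketch below. NodeState and QuorumHealth are hypothetical names: NodeState just mirrors state names that come up in this thread (RUNNING, HEALTH_CHECKING), and in a real service the list would presumably be built from cluster.getNodes() and node.getNodeState(). Treat it as a heuristic only: counting RUNNING nodes says nothing about whether the primaries for a particular key are up, and with Riak's default sloppy quorum a w=3 write can still succeed via fallback vnodes.

```java
import java.util.List;

// Hypothetical stand-in for the per-node states the client reports;
// not riak-java-client's own enum.
enum NodeState { RUNNING, HEALTH_CHECKING, SHUTTING_DOWN, SHUTDOWN }

final class QuorumHealth {
    private QuorumHealth() {}

    /** Counts the nodes the client currently believes are RUNNING. */
    static long runningCount(List<NodeState> states) {
        return states.stream().filter(s -> s == NodeState.RUNNING).count();
    }

    /**
     * Client-side heuristic: "healthy enough" if at least requiredNodes
     * nodes are RUNNING. It has no idea which partitions live on which
     * node, so it can over-report health.
     */
    static boolean hasEnoughRunning(List<NodeState> states, int requiredNodes) {
        if (requiredNodes < 1) {
            throw new IllegalArgumentException("requiredNodes must be >= 1");
        }
        return runningCount(states) >= requiredNodes;
    }
}
```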


-DB

On 12/8/15 1:09 PM, Alexander Sicular wrote:

Besides just plainly writing a key, you could also do something like (pseudo 
code):

Riak.put(canaryKey, pw=n_val){
   If ok -> cool!
   If borked -> sad face
}

The important bit is that pw (primary write) equals your replication value (n_val). 
This means that all copies in the virtual node replica set need to go to virtual 
nodes allocated to their primary physical machines, not to fallbacks. This is a 
way you can check cluster status from the app level, as in: is the cluster in 
some kind of borked state?
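From Java, the "borked" branch of the pseudocode can show up either as an exception or as a write that doesn't come back in time. A minimal stdlib-only sketch that classifies both; CanaryCheck and CanaryResult are hypothetical names, and the canary write is passed in as a Callable. In riak-java-client 2.x that write would, I believe, be a StoreValue with StoreValue.Option.PW set to a Quorum equal to n_val, but check that against the client docs rather than taking it from here.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

enum CanaryResult { OK, FAILED, TIMED_OUT }

final class CanaryCheck {
    private CanaryCheck() {}

    /** Runs the caller's canary write with a deadline and classifies the outcome. */
    static CanaryResult run(Callable<Void> canaryWrite, long timeout, TimeUnit unit) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<Void> f = executor.submit(canaryWrite);
            try {
                f.get(timeout, unit);
                return CanaryResult.OK;          // "cool!"
            } catch (TimeoutException e) {
                f.cancel(true);                  // stop waiting on a hung write
                return CanaryResult.TIMED_OUT;
            } catch (ExecutionException e) {
                return CanaryResult.FAILED;      // the write threw, e.g. pw not met
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                f.cancel(true);
                return CanaryResult.FAILED;
            }
        } finally {
            executor.shutdownNow();              // interrupts the task if still running
        }
    }
}
```

Creating an executor per call keeps the sketch self-contained; a long-running service would more likely reuse one.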

-Alexander

@siculars
http://siculars.posthaven.com

On Dec 8, 2015, at 14:13, David Byron wrote:

I'm still curious what people think here.  As I stare at this longer, I'd like 
to be able to call RiakNode.checkHealth(), but it's private.

HealthMonitorTask.run only calls checkHealth some of the time, so without the 
ability to call it directly, I think I'm getting a stale notion of health in 
circumstances like the one I outlined below, when the last operation was 
successful but the node has since gone down.
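One way to stop depending on when HealthMonitorTask happens to run, without needing the private checkHealth(), is to run your own active probe (for example the ping David asks about further down) and cache its result for a bounded time. A minimal sketch; CachedHealth and its parameters are hypothetical names, not part of riak-java-client, and the probe and clock are supplied by the caller.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

/**
 * Caches the result of a caller-supplied health probe and re-runs the
 * probe once the cached result is at least maxAgeNanos old.
 */
final class CachedHealth {
    private final BooleanSupplier probe;
    private final LongSupplier clockNanos;
    private final long maxAgeNanos;
    private boolean hasResult;
    private boolean lastResult;
    private long lastCheckedAt;

    CachedHealth(BooleanSupplier probe, LongSupplier clockNanos, long maxAgeNanos) {
        this.probe = probe;
        this.clockNanos = clockNanos;
        this.maxAgeNanos = maxAgeNanos;
    }

    /** Returns the cached result if it is fresh enough, otherwise probes again. */
    synchronized boolean isHealthy() {
        long now = clockNanos.getAsLong();
        if (!hasResult || now - lastCheckedAt >= maxAgeNanos) {
            lastResult = probe.getAsBoolean();
            lastCheckedAt = now;
            hasResult = true;
        }
        return lastResult;
    }
}
```

In a real service the clock would be System::nanoTime; a fake clock just makes the caching behavior easy to check.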

Thanks for your input.

-DB

On 12/2/15 10:25 PM, David Byron wrote:
I'm implementing a health check for a service of mine that uses riak.
I've seen this code from
https://github.com/basho/riak-java-client/issues/456:

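import java.util.List;
import com.basho.riak.client.core.RiakCluster;
import com.basho.riak.client.core.RiakNode;

RiakCluster cluster = clientInstance.getRiakCluster();
List<RiakNode> nodes = cluster.getNodes();
for (RiakNode node : nodes)
{
   // the state the client last recorded for this node
   RiakNode.State state = node.getNodeState();
}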

and it's great.  From what I can tell, it depends on some background
processing that keeps track of the state of the nodes.  I did a quick
test though, and if I run 'riak stop' from the command line and then
this loop with no intervening operations, the nodes report RUNNING. Even
after some time passes (more than three minutes), still RUNNING.

However, if I do run an intervening operation (an actual query of data,
for example) that fails, the nodes then report HEALTH_CHECKING.
Then, after 'riak start', the nodes report RUNNING again.  I suppose
that's good.

So, I'm trying to decide how to implement the health check.  The above
loop doesn't seem to be enough, but do I really need to do something like:

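import java.util.concurrent.ExecutionException;
import com.basho.riak.client.core.RiakFuture;
import com.basho.riak.client.core.operations.PingOperation;

final RiakFuture<Void, Void> future = cluster.execute(new PingOperation());

try {
   future.get();   // blocks until the ping succeeds or fails
   // good: a node answered the ping
} catch (ExecutionException e) {
   // bad: the ping failed
} catch (InterruptedException e) {
   Thread.currentThread().interrupt();   // preserve the interrupt status
   // bad: interrupted before the ping completed
}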

Maybe it's sufficient to only do this if all the nodes report RUNNING? I
suppose there's always a small window in time where a node could report
bad while a ping would show it was up, so I'm torn.  Any suggestions
on whether pinging every time is correct, or whether there's something more
efficient (and safe)?

Thanks for your help.

-DB


_______________________________________________
riak-users mailing list
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
