Having got the first stage of my client connector module working nicely against a single node, I'm now looking at how to make it cluster-aware, maintaining multiple connections for reliability and load-spreading. What are some good strategies to take here?
My current plan involves connecting to a (randomly chosen from a list?)
seed node, to query the list of peers in the cluster, then make a
selection of some number of those to be "primary" nodes, and some more
as "backup" nodes. The primary nodes will be used to spread actual
query load around, while the backups sit idle, simply as a fast way to
fail over to some known-working connection if a primary goes down. By
registering an interest in topology and status change messages, the
client can keep the list of available nodes up-to-date.
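
To make the plan above concrete, here is a rough Python sketch of the selection and failover steps. It is only an illustration under my own assumptions: the names choose_nodes and promote_backup are made up, peers are plain address strings, and the peer list is assumed to have already been fetched from the seed node. It is not taken from any existing driver.

```python
import random


def choose_nodes(peers, n_primary, n_backup, rng=None):
    """Randomly split a peer list into (primaries, backups).

    Hypothetical helper: `peers` is whatever list the seed node
    reported, here just a list of address strings.
    """
    rng = rng or random.Random()
    shuffled = list(peers)
    rng.shuffle(shuffled)
    primaries = shuffled[:n_primary]
    backups = shuffled[n_primary:n_primary + n_backup]
    return primaries, backups


def promote_backup(primaries, backups, failed):
    """Drop a failed primary and promote the first backup, if any.

    Returns new lists and leaves the inputs untouched. If `failed`
    is not currently a primary, nothing changes.
    """
    if failed not in primaries:
        return list(primaries), list(backups)
    primaries = [n for n in primaries if n != failed]
    backups = list(backups)
    if backups:
        primaries.append(backups.pop(0))
    return primaries, backups
```

A topology or status change handler would then call promote_backup when a primary is reported down, and re-run choose_nodes (or top up the backups) when new peers appear.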
1. What is a good way to handle prepared statements here? Should they
be prepared on all the (primary/all?) nodes, or just one? Some
applications I could imagine having just a handful of heavily-used
prepared statements, so they'd become a hotspot on one node if they
weren't spread around. But then what to do as new nodes become
elected as primaries? Should they be prepared eagerly on
connection? Lazily at next use?
2. Secondly, what are suggested ways to actually spread load among the
primaries? I could imagine a simple round-robin, or something more
fancy involving picking the node with the fewest outstanding
requests, or the one on which we've been responsible for the least
processing time recently, or something else... Do client libraries
generally provide a selection of these mechanisms, or just pick one?
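
For the sake of discussion, the first two policies mentioned in point 2 might look something like this in Python. This is a minimal sketch with invented class names (RoundRobin, LeastOutstanding) and a made-up pick()/done() interface, with nodes represented as plain strings; I'm not claiming any real client library does it this way.

```python
import itertools


class RoundRobin:
    """Hand out primaries in a fixed rotating order."""

    def __init__(self, nodes):
        self._cycle = itertools.cycle(list(nodes))

    def pick(self):
        return next(self._cycle)


class LeastOutstanding:
    """Pick the primary with the fewest in-flight requests.

    The caller must call done(node) when a response (or error)
    arrives, otherwise the counts drift upwards.
    """

    def __init__(self, nodes):
        self.outstanding = {n: 0 for n in nodes}

    def pick(self):
        # Ties go to the node that was listed first.
        node = min(self.outstanding, key=self.outstanding.get)
        self.outstanding[node] += 1
        return node

    def done(self, node):
        self.outstanding[node] -= 1
```

Both expose the same pick() method, so a client could accept either as a pluggable policy object; a "least recent processing time" variant would fit the same shape with a different bookkeeping dict.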
--
Paul "LeoNerd" Evans
[email protected]
ICQ# 4135350 | Registered Linux# 179460
http://www.leonerd.org.uk/
