Wow! Awesome. Give me a bit to plug this into my environment. The other way I was going to attempt this was to use the health-check file option of the ping request handler: I'd write a separate process in Python or something that would query ZooKeeper for the active nodes, and if the current box's IP is there, create the health-check file so the ping succeeds.
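For the archive, that fallback could be sketched roughly like this. This is a minimal sketch, not tested code from this thread: the file path is hypothetical, and fetching the live-nodes list from ZooKeeper (e.g. reading the children of /live_nodes with a ZooKeeper client) is assumed to happen elsewhere.

```python
import pathlib

# Hypothetical path; must match the healthcheckFile configured on the
# /admin/ping handler in solrconfig.xml.
HEALTHCHECK_FILE = pathlib.Path("/var/solr/data/health_check")

def update_healthcheck_file(live_nodes, my_node_name,
                            healthcheck_file=HEALTHCHECK_FILE):
    """Create the health-check file when this node appears in ZooKeeper's
    live-nodes list, and remove it when it does not, so /admin/ping starts
    failing and the load balancer drops the node.

    `live_nodes` is assumed to be fetched separately (e.g. the children of
    the /live_nodes znode). Returns True if the file exists afterwards.
    """
    if my_node_name in live_nodes:
        healthcheck_file.touch()
    else:
        healthcheck_file.unlink(missing_ok=True)
    return healthcheck_file.exists()
```

A cron job or small daemon would call this every few seconds; Solr's live-node entries typically look like `10.0.0.5:8983_solr`, so `my_node_name` would have to be built in that format rather than being a bare IP.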
I'd prefer not to introduce yet another process that I need to keep running, so this looks promising.

Jim

On Wed, Jul 24, 2013 at 11:49 AM, Timothy Potter [via Lucene] <
ml-node+s472066n4080116...@n3.nabble.com> wrote:

> Hi Jim,
>
> Based on our discussion, I cooked up this solution for my book Solr in
> Action and would appreciate you looking it over to see if it meets
> your needs. The basic idea is to extend Solr's built-in
> PingRequestHandler to verify a replica is connected to Zookeeper and
> is in the "active" state. To enable this, install the custom JAR and
> then update your solrconfig.xml to use this class instead of the
> built-in one for the /admin/ping request handler:
>
> <requestHandler name="/admin/ping"
>     class="sia.ch13.ClusterStateAwarePingRequestHandler">
>
> >>>> Code <<<<
>
> package sia.ch13;
>
> import org.apache.solr.cloud.CloudDescriptor;
> import org.apache.solr.cloud.ZkController;
> import org.apache.solr.common.SolrException;
> import org.apache.solr.common.cloud.ClusterState;
> import org.apache.solr.common.cloud.Slice;
> import org.apache.solr.core.CoreContainer;
> import org.apache.solr.core.CoreDescriptor;
> import org.apache.solr.core.SolrCore;
> import org.apache.solr.handler.PingRequestHandler;
> import org.apache.solr.request.SolrQueryRequest;
> import org.apache.solr.response.SolrQueryResponse;
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
>
> /**
>  * Extends Solr's PingRequestHandler to check a replica's cluster
>  * status as part of the health check.
>  */
> public class ClusterStateAwarePingRequestHandler extends
>     PingRequestHandler {
>
>   public static Logger log =
>       LoggerFactory.getLogger(ClusterStateAwarePingRequestHandler.class);
>
>   @Override
>   public void handleRequestBody(SolrQueryRequest solrQueryRequest,
>       SolrQueryResponse solrQueryResponse) throws Exception {
>     // delegate to the base class to check the status of this local index
>     super.handleRequestBody(solrQueryRequest, solrQueryResponse);
>
>     // if ping status is OK, then check cluster state of this core
>     if ("OK".equals(solrQueryResponse.getValues().get("status"))) {
>       verifyThisReplicaIsActive(solrQueryRequest.getCore());
>     }
>   }
>
>   /**
>    * Verifies this replica is "active".
>    */
>   protected void verifyThisReplicaIsActive(SolrCore solrCore)
>       throws SolrException {
>     String replicaState = "unknown";
>     String nodeName = "?";
>     String shardName = "?";
>     String collectionName = "?";
>     String role = "?";
>     Exception exc = null;
>     try {
>       CoreDescriptor coreDescriptor = solrCore.getCoreDescriptor();
>       CoreContainer coreContainer = coreDescriptor.getCoreContainer();
>       CloudDescriptor cloud = coreDescriptor.getCloudDescriptor();
>
>       shardName = cloud.getShardId();
>       collectionName = cloud.getCollectionName();
>       role = (cloud.isLeader() ? "Leader" : "Replica");
>
>       ZkController zkController = coreContainer.getZkController();
>       if (zkController != null) {
>         nodeName = zkController.getNodeName();
>         if (zkController.isConnected()) {
>           ClusterState clusterState = zkController.getClusterState();
>           Slice slice = clusterState.getSlice(collectionName, shardName);
>           replicaState = (slice != null) ?
>               slice.getState() : "gone";
>         } else {
>           replicaState = "not connected to Zookeeper";
>         }
>       } else {
>         replicaState = "Zookeeper not enabled/configured";
>       }
>     } catch (Exception e) {
>       replicaState = "error determining cluster state";
>       exc = e;
>     }
>
>     if ("active".equals(replicaState)) {
>       log.info(String.format(
>           "%s at %s for %s in the %s collection is active.",
>           role, nodeName, shardName, collectionName));
>     } else {
>       // fail the ping by raising an exception
>       String errMsg = String.format(
>           "%s at %s for %s in the %s collection is not active! State is: %s",
>           role, nodeName, shardName, collectionName, replicaState);
>       if (exc != null) {
>         throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
>             errMsg, exc);
>       } else {
>         throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
>             errMsg);
>       }
>     }
>   }
> }
>
> On Tue, Jul 23, 2013 at 1:46 PM, jimtronic <[hidden email]> wrote:
>
> > I think the best bet here would be a ping-like handler that would simply
> > return the state of only this box in the cluster:
> >
> > Something like /admin/state which would return
> > "down", "active", "leader", "recovering"
> >
> > I'm not really sure where to begin, however. Any ideas?
> >
> > jim
> >
> > On Mon, Jul 22, 2013 at 12:52 PM, Timothy Potter [via Lucene] <
> > [hidden email]> wrote:
> >
> >> There is, but I couldn't get it to work in my environment on Jetty, see:
> >>
> >> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201306.mbox/%3CCAJt9Wnib+p_woYODtrSPhF==v8Vx==mDBd_qH=x_knbw-BnPXQ@...%3E
> >>
> >> Let me know if you have any better luck.
> >> I had to resort to something hacky but was out of time I could devote
> >> to such unproductive endeavors ;-)
> >>
> >> On Mon, Jul 22, 2013 at 10:49 AM, jimtronic <[hidden email]> wrote:
> >>
> >> > I'm not sure why it went down exactly -- I restarted the process and
> >> > lost the logs. (d'oh!)
> >> >
> >> > An OOM seems likely, however. Is there a setting for killing the
> >> > process when solr encounters an OOM?
> >> >
> >> > Thanks!
> >> >
> >> > Jim
> >> >
> >> > --
> >> > View this message in context:
> >> > http://lucene.472066.n3.nabble.com/Node-down-but-not-out-tp4079495p4079507.html
> >> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Node-down-but-not-out-tp4079495p4079856.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
--
View this message in context: http://lucene.472066.n3.nabble.com/Node-down-but-not-out-tp4079495p4080125.html
Sent from the Solr - User mailing list archive at Nabble.com.