Wow! Awesome. Give me a bit to try to plug this into my environment.

The other way I was going to attempt this was to use the health check file
option for the ping request handler. I would have to write a separate
process in Python or something that would poll ZooKeeper for active nodes
and, if the current box's IP is there, create the health check file,
which would make the ping succeed.
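
Roughly, the file-toggling half of that idea might look like the sketch below.
It is only an illustration: the class name, method name, and file name are
made up, and it assumes the set of active nodes has already been read from
ZooKeeper by some other means (that lookup is not shown). I wrote it in Java
to match the handler code quoted below.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;

// Hypothetical helper, not part of Solr: toggles a health check file
// based on whether this node appears in a set of active nodes.
final class HealthCheckFileUpdater {

    private HealthCheckFileUpdater() {
    }

    /**
     * Creates the health check file if thisNode is listed in activeNodes,
     * and removes it otherwise. Returns true if the node was listed, i.e.
     * the file is present when the call completes.
     *
     * Assumption: activeNodes was obtained from ZooKeeper elsewhere.
     */
    static boolean update(Set<String> activeNodes, String thisNode,
                          Path healthCheckFile) throws IOException {
        if (activeNodes.contains(thisNode)) {
            try {
                Files.createFile(healthCheckFile);
            } catch (FileAlreadyExistsException alreadyThere) {
                // file is already in place; nothing to do
            }
            return true;
        }
        // node is not active: remove the file so the ping starts failing
        Files.deleteIfExists(healthCheckFile);
        return false;
    }
}
```

Something would still have to call update() on a schedule and keep running,
which is exactly the extra moving part I'd like to avoid.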

I'd prefer not to introduce yet another process that I need to keep
running, so this looks promising.

Jim

On Wed, Jul 24, 2013 at 11:49 AM, Timothy Potter [via Lucene] <
ml-node+s472066n4080116...@n3.nabble.com> wrote:

> Hi Jim,
>
> Based on our discussion, I cooked up this solution for my book Solr in
> Action and would appreciate you looking it over to see if it meets
> your needs. The basic idea is to extend Solr's built-in
> PingRequestHandler to verify a replica is connected to ZooKeeper and
> is in the "active" state. To enable this, install the custom JAR and
> then update your solrconfig.xml to use this class instead of the
> built-in one for the /admin/ping request handler:
>
> <requestHandler name="/admin/ping"
> class="sia.ch13.ClusterStateAwarePingRequestHandler">
>
>
>
> >>>> Code <<<<
>
> package sia.ch13;
>
> import org.apache.solr.cloud.CloudDescriptor;
> import org.apache.solr.cloud.ZkController;
> import org.apache.solr.common.SolrException;
> import org.apache.solr.common.cloud.ClusterState;
> import org.apache.solr.common.cloud.Slice;
> import org.apache.solr.core.CoreContainer;
> import org.apache.solr.core.CoreDescriptor;
> import org.apache.solr.core.SolrCore;
> import org.apache.solr.handler.PingRequestHandler;
> import org.apache.solr.request.SolrQueryRequest;
> import org.apache.solr.response.SolrQueryResponse;
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
>
> /**
>  * Extends Solr's PingRequestHandler to check a replica's cluster
>  * status as part of the health check.
>  */
> public class ClusterStateAwarePingRequestHandler extends PingRequestHandler {
>
>     public static Logger log =
>             LoggerFactory.getLogger(ClusterStateAwarePingRequestHandler.class);
>
>     @Override
>     public void handleRequestBody(SolrQueryRequest solrQueryRequest,
>             SolrQueryResponse solrQueryResponse) throws Exception {
>         // delegate to the base class to check the status of this local index
>         super.handleRequestBody(solrQueryRequest, solrQueryResponse);
>
>         // if ping status is OK, then check cluster state of this core
>         if ("OK".equals(solrQueryResponse.getValues().get("status"))) {
>             verifyThisReplicaIsActive(solrQueryRequest.getCore());
>         }
>     }
>
>     /**
>      * Verifies this replica is "active".
>      */
>     protected void verifyThisReplicaIsActive(SolrCore solrCore)
>             throws SolrException {
>         String replicaState = "unknown";
>         String nodeName = "?";
>         String shardName = "?";
>         String collectionName = "?";
>         String role = "?";
>         Exception exc = null;
>         try {
>             CoreDescriptor coreDescriptor = solrCore.getCoreDescriptor();
>             CoreContainer coreContainer = coreDescriptor.getCoreContainer();
>             CloudDescriptor cloud = coreDescriptor.getCloudDescriptor();
>
>             shardName = cloud.getShardId();
>             collectionName = cloud.getCollectionName();
>             role = (cloud.isLeader() ? "Leader" : "Replica");
>
>             ZkController zkController = coreContainer.getZkController();
>             if (zkController != null) {
>                 nodeName = zkController.getNodeName();
>                 if (zkController.isConnected()) {
>                     ClusterState clusterState = zkController.getClusterState();
>                     Slice slice = clusterState.getSlice(collectionName, shardName);
>                     replicaState = (slice != null) ? slice.getState() : "gone";
>                 } else {
>                     replicaState = "not connected to Zookeeper";
>                 }
>             } else {
>                 replicaState = "Zookeeper not enabled/configured";
>             }
>         } catch (Exception e) {
>             replicaState = "error determining cluster state";
>             exc = e;
>         }
>
>         if ("active".equals(replicaState)) {
>             log.info(String.format("%s at %s for %s in the %s collection is active.",
>                     role, nodeName, shardName, collectionName));
>         } else {
>             // fail the ping by raising an exception
>             String errMsg = String.format(
>                     "%s at %s for %s in the %s collection is not active! State is: %s",
>                     role, nodeName, shardName, collectionName, replicaState);
>             if (exc != null) {
>                 throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, errMsg, exc);
>             } else {
>                 throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, errMsg);
>             }
>         }
>     }
> }
>
> On Tue, Jul 23, 2013 at 1:46 PM, jimtronic <[hidden email]> wrote:
>
> > I think the best bet here would be a ping-like handler that would simply
> > return the state of only this box in the cluster:
> >
> > Something like /admin/state which would return
> > "down","active","leader","recovering"
> >
> > I'm not really sure where to begin however. Any ideas?
> >
> > jim
> >
> > On Mon, Jul 22, 2013 at 12:52 PM, Timothy Potter [via Lucene]
> > <[hidden email]> wrote:
> >
> >> There is but I couldn't get it to work in my environment on Jetty, see:
> >>
> >> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201306.mbox/%3CCAJt9Wnib+p_woYODtrSPhF==v8Vx==mDBd_qH=x_knbw-BnPXQ@...%3E
> >>
> >> Let me know if you have any better luck. I had to resort to something
> >> hacky but was out of time I could devote to such unproductive
> >> endeavors ;-)
> >>
> >> On Mon, Jul 22, 2013 at 10:49 AM, jimtronic <[hidden email]> wrote:
> >>
> >> > I'm not sure why it went down exactly -- I restarted the process and
> >> > lost the logs. (d'oh!)
> >> >
> >> > An OOM seems likely, however. Is there a setting for killing the
> >> > process when Solr encounters an OOM?
> >> >
> >> > Thanks!
> >> >
> >> > Jim
> >> >
> >> >
> >> >
> >> > --
> >> > View this message in context:
> >> > http://lucene.472066.n3.nabble.com/Node-down-but-not-out-tp4079495p4079507.html
> >> > Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >>
> >
> >
> >
> >
>
>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Node-down-but-not-out-tp4079495p4080125.html
Sent from the Solr - User mailing list archive at Nabble.com.
