Re: Node down, but not out

jimtronic Wed, 24 Jul 2013 13:36:50 -0700

Well, it seems to work. I wonder what the best way to test this would be.
How can I remove a node from the cluster while still keeping it up and running?


Jim
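However the node is taken out, a load balancer only ever sees the result of /admin/ping. When verifyThisReplicaIsActive throws its SolrException with ErrorCode.SERVER_ERROR, Solr answers with HTTP 500 instead of 200. The sketch below is a minimal probe of that kind, which is also a quick way to watch the handler flip while testing. The class name PingProbe and its timeout handling are illustrative assumptions, not part of Solr.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical load-balancer-style probe for Solr's /admin/ping handler. */
final class PingProbe {
    private PingProbe() {}

    /**
     * Returns true only if the ping URL answers HTTP 200 within the timeout.
     * A failed cluster-state check comes back as HTTP 500; an unreachable or
     * hung node shows up as an IOException. Both count as unhealthy.
     */
    static boolean isHealthy(String pingUrl, int timeoutMillis) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) URI.create(pingUrl).toURL().openConnection();
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            conn.setRequestMethod("GET");
            return conn.getResponseCode() == HttpURLConnection.HTTP_OK;
        } catch (IOException | IllegalArgumentException e) {
            // connection refused, timeout, or malformed URL
            return false;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }
}
```

For example, `PingProbe.isHealthy("http://localhost:8983/solr/collection1/admin/ping", 2000)`, where the host, port, and core name are placeholders for your own nodes. Treating connection failures as unhealthy means a hung node drops out of rotation the same way as one that reports a bad state.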

On Wed, Jul 24, 2013 at 12:10 PM, Jim Musil <jimtro...@gmail.com> wrote:

> Wow! Awesome. Give me a bit to try to plug this into my environment.
>
> The other way I was going to attempt this was to use the health check file
> option for the ping request handler. I would have to write a separate
> process in Python or something that would query ZooKeeper for active nodes
> and, if the current box's IP is there, create the health check file,
> which would make the ping succeed.
>
> I'd prefer not to introduce yet another process that I need to keep
> running, so this looks promising.
>
> Jim
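The core of the health-check-file alternative can be sketched as a single step such a sidecar would repeat: create the file when this node is in the live-node list, delete it otherwise, so that PingRequestHandler's healthcheckFile option reports the node as disabled once it drops out. This is a minimal sketch under assumptions. Reading the live node names from ZooKeeper (SolrCloud keeps them under /live_nodes) is left out, the node-name format used below (such as `10.0.0.5:8983_solr`) should be checked against your own cluster, and the class HealthCheckFileSync is hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;

/**
 * Hypothetical sidecar step: keep the ping handler's health check file in
 * sync with whether this node is listed among the cluster's live nodes.
 * Fetching liveNodes from ZooKeeper is assumed to happen elsewhere.
 */
final class HealthCheckFileSync {
    private HealthCheckFileSync() {}

    /**
     * Creates the file if thisNode is live and deletes it otherwise.
     * Returns true if the node is now enabled (file present).
     */
    static boolean sync(Path healthCheckFile, Set<String> liveNodes, String thisNode)
            throws IOException {
        if (liveNodes.contains(thisNode)) {
            try {
                Files.createFile(healthCheckFile); // ping starts succeeding
            } catch (FileAlreadyExistsException alreadyEnabled) {
                // already enabled; nothing to do
            }
            return true;
        }
        Files.deleteIfExists(healthCheckFile); // ping now reports the node as disabled
        return false;
    }
}
```

A loop would call sync every few seconds with the latest live-node list; because both branches are idempotent, repeated calls are harmless. The cost, as noted above, is one more process to keep running.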
>
> On Wed, Jul 24, 2013 at 11:49 AM, Timothy Potter [via Lucene] <
> ml-node+s472066n4080116...@n3.nabble.com> wrote:
>
>> Hi Jim,
>>
>> Based on our discussion, I cooked up this solution for my book Solr in
>> Action and would appreciate you looking it over to see if it meets
>> your needs. The basic idea is to extend Solr's built-in
>> PingRequestHandler to verify that a replica is connected to ZooKeeper and
>> is in the "active" state. To enable this, install the custom JAR and
>> then update your solrconfig.xml to use this class instead of the
>> built-in one for the /admin/ping request handler:
>>
>> <requestHandler name="/admin/ping"
>>                 class="sia.ch13.ClusterStateAwarePingRequestHandler">
>>   <!-- keep the rest of your existing /admin/ping settings here -->
>> </requestHandler>
>>
>> >>>> Code <<<<
>>
>> package sia.ch13;
>>
>> import org.apache.solr.cloud.CloudDescriptor;
>> import org.apache.solr.cloud.ZkController;
>> import org.apache.solr.common.SolrException;
>> import org.apache.solr.common.cloud.ClusterState;
>> import org.apache.solr.common.cloud.Replica;
>> import org.apache.solr.common.cloud.Slice;
>> import org.apache.solr.common.cloud.ZkStateReader;
>> import org.apache.solr.core.CoreContainer;
>> import org.apache.solr.core.CoreDescriptor;
>> import org.apache.solr.core.SolrCore;
>> import org.apache.solr.handler.PingRequestHandler;
>> import org.apache.solr.request.SolrQueryRequest;
>> import org.apache.solr.response.SolrQueryResponse;
>> import org.slf4j.Logger;
>> import org.slf4j.LoggerFactory;
>>
>> /**
>>  * Extends Solr's PingRequestHandler to check a replica's cluster
>>  * status as part of the health check.
>>  */
>> public class ClusterStateAwarePingRequestHandler extends PingRequestHandler {
>>
>>     private static final Logger log =
>>             LoggerFactory.getLogger(ClusterStateAwarePingRequestHandler.class);
>>
>>     @Override
>>     public void handleRequestBody(SolrQueryRequest solrQueryRequest,
>>                                   SolrQueryResponse solrQueryResponse) throws Exception {
>>         // delegate to the base class to check the status of this local index
>>         super.handleRequestBody(solrQueryRequest, solrQueryResponse);
>>
>>         // if the local ping is OK, then check the cluster state of this core
>>         if ("OK".equals(solrQueryResponse.getValues().get("status"))) {
>>             verifyThisReplicaIsActive(solrQueryRequest.getCore());
>>         }
>>     }
>>
>>     /**
>>      * Verifies this replica is "active"; otherwise throws a SolrException,
>>      * which fails the ping.
>>      */
>>     protected void verifyThisReplicaIsActive(SolrCore solrCore) throws SolrException {
>>         String replicaState = "unknown";
>>         String nodeName = "?";
>>         String shardName = "?";
>>         String collectionName = "?";
>>         String role = "?";
>>         Exception exc = null;
>>         try {
>>             CoreDescriptor coreDescriptor = solrCore.getCoreDescriptor();
>>             CoreContainer coreContainer = coreDescriptor.getCoreContainer();
>>             CloudDescriptor cloud = coreDescriptor.getCloudDescriptor();
>>
>>             shardName = cloud.getShardId();
>>             collectionName = cloud.getCollectionName();
>>             role = (cloud.isLeader() ? "Leader" : "Replica");
>>
>>             ZkController zkController = coreContainer.getZkController();
>>             if (zkController != null) {
>>                 nodeName = zkController.getNodeName();
>>                 if (zkController.isConnected()) {
>>                     ClusterState clusterState = zkController.getClusterState();
>>                     Slice slice = clusterState.getSlice(collectionName, shardName);
>>                     // Slice.getState() is the state of the whole shard, not of this
>>                     // core, so look up this core's own replica entry instead
>>                     Replica replica = (slice != null)
>>                             ? slice.getReplica(cloud.getCoreNodeName()) : null;
>>                     replicaState = (replica != null)
>>                             ? replica.getStr(ZkStateReader.STATE_PROP) : "gone";
>>                 } else {
>>                     replicaState = "not connected to Zookeeper";
>>                 }
>>             } else {
>>                 replicaState = "Zookeeper not enabled/configured";
>>             }
>>         } catch (Exception e) {
>>             replicaState = "error determining cluster state";
>>             exc = e;
>>         }
>>
>>         if ("active".equals(replicaState)) {
>>             log.info(String.format("%s at %s for %s in the %s collection is active.",
>>                     role, nodeName, shardName, collectionName));
>>         } else {
>>             // fail the ping by raising an exception
>>             String errMsg = String.format(
>>                     "%s at %s for %s in the %s collection is not active! State is: %s",
>>                     role, nodeName, shardName, collectionName, replicaState);
>>             if (exc != null) {
>>                 throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, errMsg, exc);
>>             } else {
>>                 throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, errMsg);
>>             }
>>         }
>>     }
>> }
>>
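One way to exercise the failure branches without taking a live node out of the cluster is to pull the decision ladder in verifyThisReplicaIsActive into a plain method with no Solr or ZooKeeper types and unit test that. The sketch below is hypothetical: ReplicaHealth and its parameters are not part of the handler above. It mirrors the handler's state strings, with a null replica state standing in for a replica that is no longer listed in the cluster state.

```java
/**
 * Hypothetical, dependency-free version of the decision made in
 * verifyThisReplicaIsActive(): given what the handler learned from
 * ZooKeeper, return the effective state. Only "active" passes the ping.
 */
final class ReplicaHealth {
    private ReplicaHealth() {}

    /**
     * @param zkConfigured true if the node runs in SolrCloud mode
     * @param zkConnected  true if the ZooKeeper session is currently connected
     * @param replicaState this core's state from the cluster state,
     *                     or null if the replica is no longer listed
     */
    static String effectiveState(boolean zkConfigured, boolean zkConnected,
                                 String replicaState) {
        if (!zkConfigured) {
            return "Zookeeper not enabled/configured";
        }
        if (!zkConnected) {
            return "not connected to Zookeeper";
        }
        return (replicaState != null) ? replicaState : "gone";
    }

    /** True only when the effective state is exactly "active". */
    static boolean passesPing(boolean zkConfigured, boolean zkConnected,
                              String replicaState) {
        return "active".equals(effectiveState(zkConfigured, zkConnected, replicaState));
    }
}
```

If the handler delegated to a method like this, each failure case (lost ZooKeeper session, replica missing from the cluster state, "down" or "recovering") becomes a one-line assertion, and the handler itself is left with only gathering the three inputs.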
>> On Tue, Jul 23, 2013 at 1:46 PM, jimtronic <[hidden email]> wrote:
>>
>> > I think the best bet here would be a ping-like handler that would simply
>> > return the state of only this box in the cluster:
>> >
>> > Something like /admin/state, which would return
>> > "down", "active", "leader", or "recovering".
>> >
>> > I'm not really sure where to begin, however. Any ideas?
>> >
>> > jim
>> >
>> > On Mon, Jul 22, 2013 at 12:52 PM, Timothy Potter [via Lucene] <[hidden email]> wrote:
>> >
>> >> There is, but I couldn't get it to work in my environment on Jetty; see:
>> >>
>> >> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201306.mbox/%3CCAJt9Wnib+p_woYODtrSPhF==v8Vx==mDBd_qH=x_knbw-BnPXQ@...%3E
>> >>
>> >> Let me know if you have any better luck. I had to resort to something
>> >> hacky, having run out of the time I could devote to such unproductive
>> >> endeavors ;-)
>> >>
>> >> On Mon, Jul 22, 2013 at 10:49 AM, jimtronic <[hidden email]> wrote:
>> >>
>> >> > I'm not sure why it went down exactly -- I restarted the process and
>> >> > lost the logs. (d'oh!)
>> >> >
>> >> > An OOM seems likely, however. Is there a setting for killing the
>> >> > process when Solr encounters an OOM?
>> >> >
>> >> > Thanks!
>> >> >
>> >> > Jim
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > View this message in context:
>> >> > http://lucene.472066.n3.nabble.com/Node-down-but-not-out-tp4079495p4079507.html
>> >> > Sent from the Solr - User mailing list archive at Nabble.com.
>> >
>> >
>> >
>> >
>> > --
>> > View this message in context:
>> > http://lucene.472066.n3.nabble.com/Node-down-but-not-out-tp4079495p4079856.html
>> > Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Node-down-but-not-out-tp4079495p4080169.html
Sent from the Solr - User mailing list archive at Nabble.com.
