On Mon, 5 Mar 2012 11:26:20 -0500, Mark Miller <markrmil...@gmail.com> wrote:
On Mar 5, 2012, at 10:01 AM, dar...@ontrenet.com wrote:

If one of those 10 indexing nodes goes down or falls out of sync and comes back, does ZK block indexing until that single node catches
back up?

No - if a node falls out of sync or comes back, the rest of the
cluster continues as normal and the node goes into recovery.

In recovery, the node tries two things to catch up: first it tries to
peer sync - if it's off by fewer than 100 updates, it will simply
exchange updates with the leader and come back into sync. If it's off
by more than that, it will start buffering updates from the leader,
replicate the full index from the leader, and then apply its buffered
updates to come back in sync.
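To make the two recovery paths concrete, here is a toy model of the decision. This is a hedged sketch, not Solr's code: only the 100-update threshold comes from the description above, while `Replica`, `recover`, and the `(version, key, value)` update log are hypothetical stand-ins for Solr's internals.

```python
from dataclasses import dataclass, field

# Threshold from the description above: fewer than 100 missing updates -> peer sync.
PEER_SYNC_LIMIT = 100


@dataclass
class Replica:
    """Hypothetical stand-in for a core: index contents plus an ordered update log."""
    index: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # (version, key, value) tuples

    def apply(self, update):
        version, key, value = update
        self.index[key] = value
        self.log.append(update)


def recover(node: Replica, leader: Replica, incoming: list) -> str:
    """Bring `node` back in sync with `leader`.

    `incoming` holds (version, key, value) updates the leader forwards
    while recovery is in progress.
    """
    have = {version for version, _, _ in node.log}
    missing = [u for u in leader.log if u[0] not in have]

    if len(missing) < PEER_SYNC_LIMIT:
        # Peer sync: fetch only the updates this node lacks, then keep going.
        for update in missing + list(incoming):
            node.apply(update)
        return "peer sync"

    # Too far behind: buffer what arrives, copy the full index, replay the buffer.
    buffered = list(incoming)
    node.index = dict(leader.index)
    node.log = list(leader.log)
    for update in buffered:
        node.apply(update)
    return "full replication"
```

For example, a replica holding the first 120 of the leader's 150 updates is 30 behind and peer-syncs, while an empty replica is 150 behind and takes the buffer, replicate, replay path; both end up with the same index.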

The only time indexing is stopped for a node is if that node loses
its connection to zookeeper. All other nodes that can still talk to
zookeeper will continue indexing. How soon we decide that we can't
talk to zookeeper depends on the zk session timeout - I have to check,
but for an embedded ensemble, we may currently be defaulting this a
little low.
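For reference, the session timeout a client ends up with is not simply what it asks for: a ZooKeeper server clamps the requested value to [minSessionTimeout, maxSessionTimeout], which default to 2x and 20x tickTime. A minimal sketch of that arithmetic, assuming the default tickTime of 2000 ms (the embedded ensemble's actual tickTime, and what Solr requests, are assumptions here); `negotiate_session_timeout` is a hypothetical helper name. The `negotiated timeout 10000` in the logs below is consistent with a 10000 ms request passing through unchanged.

```python
def negotiate_session_timeout(requested_ms, tick_time_ms=2000,
                              min_timeout_ms=None, max_timeout_ms=None):
    """Clamp a requested session timeout the way a ZooKeeper server does.

    Unless set explicitly, the bounds default to 2x and 20x tickTime.
    """
    lo = min_timeout_ms if min_timeout_ms is not None else 2 * tick_time_ms
    hi = max_timeout_ms if max_timeout_ms is not None else 20 * tick_time_ms
    return max(lo, min(requested_ms, hi))
```

With the defaults, a 10000 ms request stays 10000, a 1000 ms request is raised to 4000, and a 60000 ms request is capped at 40000.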

That would suggest that in our case at some point Solr drops the connection to ZK and is unable to restore it, even after restarting Tomcat many times.

I know ZK is running fine: it responds with imok when I ask ruok. When I restart Tomcat I see these bad things in ZK's log:

2012-03-05 17:55:07,084 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@213] - Accepted socket connection from /141.105.120.152:52328
2012-03-05 17:55:07,090 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@792] - Connection request from old client /141.105.120.152:52328; will be dropped if server is in r-o mode
2012-03-05 17:55:07,091 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@838] - Client attempting to establish new session at /141.105.120.152:52328
2012-03-05 17:55:07,094 [myid:] - INFO [SyncThread:0:FileTxnLog@199] - Creating new log file: log.1
2012-03-05 17:55:07,107 [myid:] - INFO [SyncThread:0:ZooKeeperServer@604] - Established session 0x135e3ffdb540000 with negotiated timeout 10000 for client /141.105.120.152:52328
2012-03-05 17:55:07,206 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@617] - Got user-level KeeperException when processing sessionid:0x135e3ffdb540000 type:delete cxid:0xb zxid:0x5 txntype:-1 reqpath:n/a Error Path:/live_nodes/cn003.openindex.io:80_solr Error:KeeperErrorCode = NoNode for /live_nodes/cn003.openindex.io:80_solr
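The ruok check mentioned above can be scripted with a plain TCP socket. A minimal sketch, assuming the default client port 2181 (as in the log above) and a server that still answers four-letter commands (newer ZooKeeper releases only answer those listed in `4lw.commands.whitelist`); `zk_ruok` is a hypothetical helper name. Note that imok only says the server process is up and serving its client port; it says nothing about whether a particular client's session is healthy.

```python
import socket


def zk_ruok(host="localhost", port=2181, timeout_s=3.0):
    """Send ZooKeeper's 'ruok' command; a running server answers 'imok'."""
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        sock.sendall(b"ruok")
        reply = b""
        # The server writes its answer and then closes the connection.
        while True:
            chunk = sock.recv(64)
            if not chunk:
                break
            reply += chunk
    return reply == b"imok"
```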

Solr will not come back up, even with a clean ZK data dir. I'll clear the dataDir of one of the stubborn Solr nodes and retry. ... The Solr node comes back up, finally. Here's the ZK log:

2012-03-05 17:56:55,939 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@213] - Accepted socket connection from /141.105.120.152:36311
2012-03-05 17:56:55,944 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@792] - Connection request from old client /141.105.120.152:36311; will be dropped if server is in r-o mode
2012-03-05 17:56:55,944 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@838] - Client attempting to establish new session at /141.105.120.152:36311
2012-03-05 17:56:55,967 [myid:] - INFO [SyncThread:0:ZooKeeperServer@604] - Established session 0x135e3ffdb540001 with negotiated timeout 10000 for client /141.105.120.152:36311
2012-03-05 17:56:56,058 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@617] - Got user-level KeeperException when processing sessionid:0x135e3ffdb540001 type:delete cxid:0x3 zxid:0x6b txntype:-1 reqpath:n/a Error Path:/live_nodes/cn003.openindex.io:80_solr Error:KeeperErrorCode = NoNode for /live_nodes/cn003.openindex.io:80_solr
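Read by level, the two ZK excerpts above look alike for the failing and the successful start. The only WARN in each is the old-client notice, which ZooKeeper 3.4 servers typically log for clients whose library predates read-only-mode support. The NoNode on a delete of /live_nodes/... is logged at INFO as a user-level KeeperException: the client asked to delete a node that did not exist. It is plausibly Solr clearing a stale live node before registering, though that reading is an assumption. The sketch below splits pasted log text (one entry per line or run together) into entries and counts levels. The regex assumes exactly the layout shown here, and `parse_zk_log` and `level_counts` are hypothetical helpers.

```python
import re
from collections import Counter

# Each entry starts with a "YYYY-MM-DD HH:MM:SS,mmm" timestamp.
TIMESTAMP = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}"
ENTRY = re.compile(
    rf"^({TIMESTAMP}) \[myid:[^\]]*\] - (\w+)\s+\[(.*?)\] - (.*)$", re.DOTALL
)


def parse_zk_log(text):
    """Split ZooKeeper log text into (timestamp, level, thread, message) tuples."""
    chunks = re.split(rf"\s+(?={TIMESTAMP} )", text.strip())
    entries = []
    for chunk in chunks:
        match = ENTRY.match(chunk)
        if match:
            entries.append(match.groups())
    return entries


def level_counts(entries):
    """Count entries per log level (INFO, WARN, ...)."""
    return Counter(level for _, level, _, _ in entries)
```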

I'm not sure what the problem is, but it looks like Solr won't start properly if there's an issue after it lists all the segment files. It may not be a ZK or cloud problem at all. Any suggestions?

Thanks


- Mark Miller
lucidimagination.com
