Re: [SoldCloud] Slow indexing

Markus Jelsma Mon, 05 Mar 2012 10:00:37 -0800

On Mon, 5 Mar 2012 11:26:20 -0500, Mark Miller <markrmil...@gmail.com> wrote:

On Mar 5, 2012, at 10:01 AM, dar...@ontrenet.com wrote:

If one of those 10 indexing nodes goes down or falls out of sync and comes back, does ZK block the state of indexing until that single node catches
back up?


No - if a node falls out of sync or comes back, the rest of the
cluster continues as normal and the node goes into recovery.

In recovery, the node tries two things to catch up: first it tries to
peer sync - if it's off by less than 100 updates, it will simply
exchange updates with the leader and come back into sync. If it's off
by more than that, it will start buffering updates from the leader,
replicate the full index from the leader, and then apply its buffered
updates to come back in sync.
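
As a rough illustration of the two recovery paths described above, here is a minimal Python sketch. It is not Solr's implementation: the function name `recover`, the list-of-update-ids model of an index, and the assumption that the replica holds a strict prefix of the leader's updates are all hypothetical simplifications. Only the 100-update threshold comes from the description above.

```python
from typing import List, Sequence, Tuple

# Threshold taken from the description above; everything else is illustrative.
PEER_SYNC_LIMIT = 100


def recover(
    replica: Sequence[int],
    leader: Sequence[int],
    arriving: Sequence[int] = (),
    limit: int = PEER_SYNC_LIMIT,
) -> Tuple[str, List[int]]:
    """Return (strategy, recovered_index) for a replica that fell behind.

    Assumption: an index is modelled as an ordered list of update ids and
    the replica holds a prefix of the leader's list.
    """
    missed = list(leader[len(replica):])
    if len(missed) < limit:
        # Peer sync: fetch only the handful of missed updates.
        return "peer_sync", list(replica) + missed

    # Too far behind: buffer updates that arrive while replicating,
    # copy the full index from the leader, then replay the buffer.
    buffered = list(arriving)
    recovered = list(leader)
    recovered.extend(buffered)
    return "replicate", recovered


if __name__ == "__main__":
    print(recover(list(range(50)), list(range(120)))[0])
    print(recover(list(range(10)), list(range(500)), arriving=[500, 501])[0])
```

In this model the rest of the cluster is never consulted or paused; only the recovering node does extra work, which matches the answer above.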

The only time indexing is stopped for a node is if that node loses
its connection to zookeeper. All other nodes that can still talk to
zookeeper will continue indexing. How soon we consider that we can't
talk to zookeeper depends on the zk session timeout - I have to look,
but for an embedded ensemble, we may be defaulting this a little low
currently.
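
For context on the session timeout: a ZooKeeper server does not simply accept the timeout a client asks for. It clamps the request between its minimum and maximum session timeouts, which default to 2x and 20x the server's tickTime. A small Python sketch of that clamping follows; the function name is hypothetical, and the 2000 ms tickTime is the stock default, assumed here rather than taken from this thread.

```python
from typing import Optional


def negotiate_session_timeout(
    requested_ms: int,
    tick_time_ms: int = 2000,  # assumed: ZooKeeper's stock tickTime
    min_ms: Optional[int] = None,
    max_ms: Optional[int] = None,
) -> int:
    """Clamp a client's requested session timeout to the server's bounds."""
    # Defaults mirror minSessionTimeout = 2 * tickTime and
    # maxSessionTimeout = 20 * tickTime.
    lower = 2 * tick_time_ms if min_ms is None else min_ms
    upper = 20 * tick_time_ms if max_ms is None else max_ms
    return max(lower, min(requested_ms, upper))


if __name__ == "__main__":
    for requested in (1000, 10000, 60000):
        print(requested, negotiate_session_timeout(requested))
```

Under these assumed defaults, a request for 10000 ms passes through unchanged, while very low or very high requests are pulled into the 4000 to 40000 ms range.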

That would suggest that in our case, at some point Solr drops the connection to ZK and is unable to restore it, even after restarting Tomcat many times.

I know ZK is running fine and responds with imok when I ask ruok. When I restart Tomcat I see these bad things in ZK's log:

2012-03-05 17:55:07,084 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@213] - Accepted socket connection from /141.105.120.152:52328
2012-03-05 17:55:07,090 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@792] - Connection request from old client /141.105.120.152:52328; will be dropped if server is in r-o mode
2012-03-05 17:55:07,091 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@838] - Client attempting to establish new session at /141.105.120.152:52328
2012-03-05 17:55:07,094 [myid:] - INFO [SyncThread:0:FileTxnLog@199] - Creating new log file: log.1
2012-03-05 17:55:07,107 [myid:] - INFO [SyncThread:0:ZooKeeperServer@604] - Established session 0x135e3ffdb540000 with negotiated timeout 10000 for client /141.105.120.152:52328
2012-03-05 17:55:07,206 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@617] - Got user-level KeeperException when processing sessionid:0x135e3ffdb540000 type:delete cxid:0xb zxid:0x5 txntype:-1 reqpath:n/a Error Path:/live_nodes/cn003.openindex.io:80_solr Error:KeeperErrorCode = NoNode for /live_nodes/cn003.openindex.io:80_solr

Solr will not come back up, even with a clean ZK data dir. I'll clear the dataDir of one of the stubborn Solr nodes and retry. ... The Solr node comes back up, finally. Here's the ZK log:

2012-03-05 17:56:55,939 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@213] - Accepted socket connection from /141.105.120.152:36311
2012-03-05 17:56:55,944 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@792] - Connection request from old client /141.105.120.152:36311; will be dropped if server is in r-o mode
2012-03-05 17:56:55,944 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@838] - Client attempting to establish new session at /141.105.120.152:36311
2012-03-05 17:56:55,967 [myid:] - INFO [SyncThread:0:ZooKeeperServer@604] - Established session 0x135e3ffdb540001 with negotiated timeout 10000 for client /141.105.120.152:36311
2012-03-05 17:56:56,058 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@617] - Got user-level KeeperException when processing sessionid:0x135e3ffdb540001 type:delete cxid:0x3 zxid:0x6b txntype:-1 reqpath:n/a Error Path:/live_nodes/cn003.openindex.io:80_solr Error:KeeperErrorCode = NoNode for /live_nodes/cn003.openindex.io:80_solr

I'm not sure what the problem is, but it looks like Solr won't start properly if there's an issue after listing all segment files. It may not be a ZK or cloud problem at all. Any suggestions?


Thanks


- Mark Miller
lucidimagination.com
