On Mar 4, 2012, at 5:43 PM, Markus Jelsma wrote:

> everything stalls after it lists all segment files and that a ZK state change
> has occurred.
Can you get a stack trace here? I'll try to respond to more tomorrow.

What version of trunk are you using? We have been making fixes and improvements all the time, so I need a frame of reference.

When a client node cannot talk to ZooKeeper, it must reject updates (searches will still work), because it may not know certain things it should (what if a leader changes?). Why can't the node talk to ZooKeeper? Perhaps the load on the server is so high that it cannot respond to ZooKeeper within the session timeout? I really don't know yet. When this happens, though, it forces a recovery when/if the node can reconnect to ZooKeeper.

We have not yet started on optimizing bulk indexing. Currently, an update is added locally *before* it is sent in parallel to each replica. Then we wait for each response before responding to the client. We plan to offer more optimizations and options around this. Feedback will be useful in making some of these improvements.

- Mark Miller
lucidimagination.com
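For readers following the thread, here is a minimal Python sketch of the update path described above, under loudly stated assumptions: `ShardLeader`, `Replica`, and `ZkDisconnectedError` are hypothetical names, not Solr classes; replicas are in-memory objects rather than remote endpoints; and recovery is only flagged, not performed. It illustrates the ordering Mark describes: reject updates while the ZooKeeper session is lost (searches are still served), apply an update locally first, send it to the replicas in parallel, and wait for every response before answering the client.

```python
from concurrent.futures import ThreadPoolExecutor


class ZkDisconnectedError(Exception):
    """Hypothetical error: update refused because the ZooKeeper session is lost."""


class Replica:
    """Hypothetical in-memory stand-in for a remote replica."""

    def __init__(self, name):
        self.name = name
        self.docs = []

    def add(self, doc):
        self.docs.append(doc)
        return self.name  # acknowledgement sent back to the leader


class ShardLeader:
    """Hypothetical leader: local add first, then parallel fan-out to replicas."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.local_docs = []
        self.zk_connected = True
        self.needs_recovery = False

    def on_zk_disconnect(self):
        # Cluster state (e.g. who the leader is) may change while we are away.
        self.zk_connected = False
        self.needs_recovery = True

    def on_zk_reconnect(self):
        # A real node would run recovery here; this sketch only leaves the flag set.
        self.zk_connected = True

    def search(self, predicate):
        # Searches are served from local data even without ZooKeeper.
        return [d for d in self.local_docs if predicate(d)]

    def update(self, doc):
        if not self.zk_connected:
            raise ZkDisconnectedError("no ZooKeeper session; update rejected")
        # 1. Apply the update locally *before* forwarding it.
        self.local_docs.append(doc)
        # 2. Send the update to every replica in parallel.
        with ThreadPoolExecutor(max_workers=max(1, len(self.replicas))) as pool:
            futures = [pool.submit(r.add, doc) for r in self.replicas]
            # 3. Wait for each replica's response; result() re-raises any failure.
            acks = [f.result() for f in futures]
        # 4. Only now respond to the client.
        return acks
```

In this sketch a replica failure surfaces to the caller through `f.result()`; how Solr actually handles a failed replica is not covered here.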