Hi

Perhaps, but my indexers just failed and receive a Service Unavailable error. The nodes themselves started to report multiple exceptions after many hours of indexing: one for an external file field, which usually is not a problem, and errors about not being able to talk to ZK.

2012-03-04 16:37:18,786 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Recovery failed - trying again...
2012-03-04 16:44:23,803 ERROR [solr.core.SolrCore] - [http-80-15] - : Error opening external value source file: java.io.FileNotFoundException: /mnt/disk_1/solr/openindex_a/external_ctr (No such file or directory)
2012-03-04 16:45:11,768 ERROR [solr.core.SolrCore] - [http-80-15] - : Error opening external value source file: java.io.FileNotFoundException: /mnt/disk_2/solr/openindex_b/external_ctr (No such file or directory)
2012-03-04 16:48:37,334 ERROR [solr.servlet.SolrDispatchFilter] - [http-80-16] - : null:ClientAbortException: java.net.SocketException: Broken pipe
        at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:369)
        at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:448)
        at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:318)
        at org.apache.catalina.connector.OutputBuffer.flush(OutputBuffer.java:296)
        at org.apache.catalina.connector.CoyoteOutputStream.flush(CoyoteOutputStream.java:98)
        at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:278)
        at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122)
        at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212)
        at org.apache.solr.common.util.FastWriter.flush(FastWriter.java:115)
        at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:394)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:268)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.SocketException: Broken pipe
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
        at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:756)
        at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:448)
        at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:363)
        at org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite(InternalOutputBuffer.java:780)
        at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:126)
        at org.apache.coyote.http11.InternalOutputBuffer.doWrite(InternalOutputBuffer.java:593)
        at org.apache.coyote.Response.doWrite(Response.java:560)
        at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:364)
        ... 22 more



2012-03-04 18:29:05,950 ERROR [solr.core.SolrCore] - [http-80-12] - : org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
        at org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:679)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:251)
        at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:97)
        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:135)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:406)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:255)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
        at java.lang.Thread.run(Thread.java:662)
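
For context, the external_ctr file in the FileNotFoundException above belongs to an ExternalFileField; Solr resolves it against a file named external_<fieldname> in the core's dataDir, which matches the paths in the log. A declaration for such a field in schema.xml looks roughly like the sketch below (field and type names are illustrative only, not our actual schema):

  <!-- sketch only: illustrative names, not the actual schema -->
  <fieldType name="extCtr" class="solr.ExternalFileField"
             keyField="id" defVal="0" valType="pfloat"/>
  <field name="ctr" type="extCtr" indexed="false" stored="false"/>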

The Tomcat instances had to be restarted but won't come back up. We've seen this before, and usually restarting twice brings a node back up, but at this moment stopping all nodes and ZK, clearing ZK, and restarting the nodes won't bring them back. After restarting, the log files show the usual info, but everything stalls after they list all segment files and report that a ZK state change has occurred.

We've seen similar issues before where the nodes could no longer talk to ZK for some reason and then everything went down, even though ZK was still running. The ZK log began to show a lot of connection warnings like:

2012-03-04 22:27:36,725 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@792] - Connection request from old client /141.105.120.151:49810; will be dropped if server is in r-o mode
2012-03-04 22:27:36,726 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@838] - Client attempting to establish new session at /141.105.120.151:49810
2012-03-04 22:27:36,729 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@792] - Connection request from old client /141.105.120.153:36788; will be dropped if server is in r-o mode
2012-03-04 22:27:36,729 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@838] - Client attempting to establish new session at /141.105.120.153:36788

But also other problems:

... client /141.105.120.154:36442 (no session established for client)
2012-03-04 22:27:23,195 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x135dfcfda000001, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:224)
        at java.lang.Thread.run(Thread.java:662)
2012-03-04 22:27:23,196 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1000] - Closed socket connection for client /141.105.120.152:48526 which had sessionid 0x135dfcfda000001
2012-03-04 22:27:23,224 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x135dfcfda000002, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:224)
        at java.lang.Thread.run(Thread.java:662)

I'm quite clueless now and have 5 test nodes that are unable to resume operations. I'll try again tomorrow and see if cleaning the Solr dataDir helps.

Thanks

-------- Original Message --------
Subject: Re: [SolrCloud] Slow indexing
Date: Sun, 4 Mar 2012 22:34:20 +0100
From: eks dev <eks...@googlemail.com>
To: solr-user@lucene.apache.org, markus.jel...@openindex.io
Reply-To: solr-user@lucene.apache.org

Hmm, looks like you are facing exactly the phenomenon I asked about.
See my question here:
http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/61326

On Sun, Mar 4, 2012 at 9:24 PM, Markus Jelsma
<markus.jel...@openindex.io> wrote:
Hi,

With auto-committing disabled, we can now index many millions of documents in our test environment on a 5-node cluster with 5 shards and a replication factor of 2. The documents are uploaded from map/reduce. No significant changes were made to solrconfig and there are no update processors enabled.
We are using a trunk revision from this weekend.
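
To be precise about "auto-committing disabled": the autoCommit block in solrconfig.xml is simply left commented out, roughly as in the sketch below (the values shown are illustrative only):

  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- autoCommit disabled; the values here are illustrative only
    <autoCommit>
      <maxDocs>10000</maxDocs>
      <maxTime>60000</maxTime>
    </autoCommit>
    -->
  </updateHandler>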

The indexing speed is well below what we are used to seeing; we can easily index 5 million documents on a non-cloud-enabled Solr 3.x instance within an hour. What could be going on? There aren't many open TCP connections, the number of file descriptors is stable, and I/O is low but CPU time is high! Each node has two Solr cores, both writing to their own dedicated disk.

The indexing speed is stable: it was slow at the start and still is. It has now been running for well over 6 hours and only 3.5 million documents have been indexed. Another strange detail is that the node receiving all incoming documents (we're not yet using a client-side Solr server pool) has much larger disk usage than all the other nodes. This is peculiar, as we expected all replicas to be about the same size.

The receiving node has slightly higher CPU usage than the other nodes, but the thread dump shows a very large number of threads of type cmdDistribExecutor-8-thread-292260 (295090) with 0-100 ms CPU time. At the top of the list these threads all have < 20 ms, but near the bottom it rises to just over 100 ms. All nodes have a couple of http-80-30 (121994) threads with very high CPU time each.

Is this a known issue? Did I miss something? Any ideas?

Thanks
