Hi

Perhaps, but my indexers just failed and receive a Service Unavailable error. The nodes themselves started to report multiple exceptions after many hours of indexing: one for an external file field, which usually is not a problem, and errors about not being able to talk to ZK.

2012-03-04 16:37:18,786 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Recovery failed - trying again...
2012-03-04 16:44:23,803 ERROR [solr.core.SolrCore] - [http-80-15] - : Error opening external value source file: java.io.FileNotFoundException: /mnt/disk_1/solr/openindex_a/external_ctr (No such file or directory)
2012-03-04 16:45:11,768 ERROR [solr.core.SolrCore] - [http-80-15] - : Error opening external value source file: java.io.FileNotFoundException: /mnt/disk_2/solr/openindex_b/external_ctr (No such file or directory)
2012-03-04 16:48:37,334 ERROR [solr.servlet.SolrDispatchFilter] - [http-80-16] - : null:ClientAbortException: java.net.SocketException: Broken pipe
        at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:369)
        at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:448)
        at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:318)
        at org.apache.catalina.connector.OutputBuffer.flush(OutputBuffer.java:296)
        at org.apache.catalina.connector.CoyoteOutputStream.flush(CoyoteOutputStream.java:98)
        at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:278)
        at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122)
        at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212)
        at org.apache.solr.common.util.FastWriter.flush(FastWriter.java:115)
        at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:394)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:268)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.SocketException: Broken pipe
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
        at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:756)
        at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:448)
        at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:363)
        at org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite(InternalOutputBuffer.java:780)
        at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:126)
        at org.apache.coyote.http11.InternalOutputBuffer.doWrite(InternalOutputBuffer.java:593)
        at org.apache.coyote.Response.doWrite(Response.java:560)
        at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:364)
        ... 22 more



2012-03-04 18:29:05,950 ERROR [solr.core.SolrCore] - [http-80-12] - : org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
        at org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:679)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:251)
        at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:97)
        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:135)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:406)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:255)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
        at java.lang.Thread.run(Thread.java:662)
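
For context, the external_ctr file in the FileNotFoundException above belongs to an ExternalFileField; Solr resolves it against a file named external_<fieldname> in the core's dataDir, which matches the paths in the log. A declaration for such a field in schema.xml looks roughly like the sketch below (field and type names are illustrative only, not our actual schema):

  <!-- sketch only: illustrative names, not the actual schema -->
  <fieldType name="extCtr" class="solr.ExternalFileField"
             keyField="id" defVal="0" valType="pfloat"/>
  <field name="ctr" type="extCtr" indexed="false" stored="false"/>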

The Tomcat instances had to be restarted but won't come back up. We've seen this before, and usually restarting twice brings a node back up, but at this moment stopping all nodes and ZK, clearing ZK, and restarting the nodes won't bring them back. After restarting, the log files show the usual info, but everything stalls after they list all segment files and report that a ZK state change has occurred.

We've seen similar issues before where the nodes could no longer talk to ZK for some reason and then everything went down, even though ZK was still running. The ZK log began to show a lot of connection warnings like:

2012-03-04 22:27:36,725 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@792] - Connection request from old client /141.105.120.151:49810; will be dropped if server is in r-o mode
2012-03-04 22:27:36,726 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@838] - Client attempting to establish new session at /141.105.120.151:49810
2012-03-04 22:27:36,729 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@792] - Connection request from old client /141.105.120.153:36788; will be dropped if server is in r-o mode
2012-03-04 22:27:36,729 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@838] - Client attempting to establish new session at /141.105.120.153:36788

But also other problems:

... client /141.105.120.154:36442 (no session established for client)
2012-03-04 22:27:23,195 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x135dfcfda000001, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:224)
        at java.lang.Thread.run(Thread.java:662)
2012-03-04 22:27:23,196 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1000] - Closed socket connection for client /141.105.120.152:48526 which had sessionid 0x135dfcfda000001
2012-03-04 22:27:23,224 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x135dfcfda000002, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:224)
        at java.lang.Thread.run(Thread.java:662)

I'm quite clueless now and have 5 test nodes that are unable to resume operations. I'll try again tomorrow and see if cleaning the Solr dataDir helps.

Thanks

-------- Original Message --------
Subject: Re: [SolrCloud] Slow indexing
Date: Sun, 4 Mar 2012 22:34:20 +0100
From: eks dev <eks...@googlemail.com>
To: solr-user@lucene.apache.org, markus.jel...@openindex.io
Reply-To: solr-user@lucene.apache.org

Hmm, looks like you are facing exactly the phenomenon I asked about.
See my question here:
http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/61326

On Sun, Mar 4, 2012 at 9:24 PM, Markus Jelsma
<markus.jel...@openindex.io> wrote:
Hi,

With auto-committing disabled, we can now index many millions of documents in our test environment on a 5-node cluster with 5 shards and a replication factor of 2. The documents are uploaded from map/reduce. No significant changes were made to solrconfig and there are no update processors enabled.
We are using a trunk revision from this weekend.
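
To be precise about "auto-committing disabled": the autoCommit block in solrconfig.xml is simply left commented out, roughly as in the sketch below (the values shown are illustrative only):

  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- autoCommit disabled; the values here are illustrative only
    <autoCommit>
      <maxDocs>10000</maxDocs>
      <maxTime>60000</maxTime>
    </autoCommit>
    -->
  </updateHandler>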

The indexing speed is well below what we are used to seeing; we can easily index 5 million documents on a non-cloud-enabled Solr 3.x instance within an hour. What could be going on? There aren't many open TCP connections, the number of file descriptors is stable, and I/O is low but CPU time is high! Each node has two Solr cores, both writing to their own dedicated disk.

The indexing speed is stable: it was slow at the start and still is. It has now been running for well over 6 hours and only 3.5 million documents have been indexed. Another strange detail is that the node receiving all incoming documents (we're not yet using a client-side Solr server pool) has much larger disk usage than all the other nodes. This is peculiar, as we expected all replicas to be about the same size.

The receiving node has slightly higher CPU usage than the other nodes, but the thread dump shows a very large number of threads of type cmdDistribExecutor-8-thread-292260 (295090) with 0-100 ms CPU time. At the top of the list these threads all have < 20 ms, but near the bottom it rises to just over 100 ms. All nodes have a couple of http-80-30 (121994) threads with very high CPU time each.

Is this a known issue? Did I miss something? Any ideas?

Thanks
