Hi
Perhaps, but my indexers just failed and are receiving a Service Unavailable
error. The nodes themselves started to report multiple exceptions after
many hours of indexing: one for an external file field, which usually is
not a problem, and errors about not being able to talk to ZK.
2012-03-04 16:37:18,786 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Recovery failed - trying again...
2012-03-04 16:44:23,803 ERROR [solr.core.SolrCore] - [http-80-15] - : Error opening external value source file: java.io.FileNotFoundException: /mnt/disk_1/solr/openindex_a/external_ctr (No such file or directory)
2012-03-04 16:45:11,768 ERROR [solr.core.SolrCore] - [http-80-15] - : Error opening external value source file: java.io.FileNotFoundException: /mnt/disk_2/solr/openindex_b/external_ctr (No such file or directory)
2012-03-04 16:48:37,334 ERROR [solr.servlet.SolrDispatchFilter] - [http-80-16] - : null:ClientAbortException: java.net.SocketException: Broken pipe
        at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:369)
        at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:448)
        at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:318)
        at org.apache.catalina.connector.OutputBuffer.flush(OutputBuffer.java:296)
        at org.apache.catalina.connector.CoyoteOutputStream.flush(CoyoteOutputStream.java:98)
        at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:278)
        at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122)
        at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212)
        at org.apache.solr.common.util.FastWriter.flush(FastWriter.java:115)
        at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:394)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:268)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.SocketException: Broken pipe
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
        at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:756)
        at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:448)
        at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:363)
        at org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite(InternalOutputBuffer.java:780)
        at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:126)
        at org.apache.coyote.http11.InternalOutputBuffer.doWrite(InternalOutputBuffer.java:593)
        at org.apache.coyote.Response.doWrite(Response.java:560)
        at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:364)
        ... 22 more
2012-03-04 18:29:05,950 ERROR [solr.core.SolrCore] - [http-80-12] - : org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
        at org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:679)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:251)
        at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:97)
        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:135)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:406)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:255)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
        at java.lang.Thread.run(Thread.java:662)
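If it helps to rule out ZK itself, a minimal standalone probe along these lines can be run from one of the Solr hosts to check whether a session can be established and the cluster znodes are readable. This is only a sketch against the plain ZooKeeper client API; the connect string is a placeholder and the znode paths (/live_nodes, /clusterstate.json) are assumed from the current SolrCloud layout:

import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ZkProbe {
  public static void main(String[] args) throws Exception {
    // Use the same connect string the Solr nodes use (placeholder below).
    String connect = args.length > 0 ? args[0] : "localhost:2181";
    final CountDownLatch connected = new CountDownLatch(1);
    ZooKeeper zk = new ZooKeeper(connect, 15000, new Watcher() {
      public void process(WatchedEvent event) {
        if (event.getState() == Event.KeeperState.SyncConnected) {
          connected.countDown();
        }
      }
    });
    if (!connected.await(10, TimeUnit.SECONDS)) {
      // No session within 10 seconds: ZK unreachable or handshake failing.
      System.err.println("No SyncConnected event, client state: " + zk.getState());
    } else {
      // Znode paths assumed from the current SolrCloud layout.
      List<String> liveNodes = zk.getChildren("/live_nodes", false);
      System.out.println("live_nodes: " + liveNodes);
      System.out.println("clusterstate.json present: "
          + (zk.exists("/clusterstate.json", false) != null));
    }
    zk.close();
  }
}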
The Tomcat instances had to be restarted but won't come back up. We've seen
this before, and usually restarting twice brings a node back up; at this
moment, stopping all nodes and ZK, clearing ZK, and restarting the nodes
won't bring them back up. The log files after restarting show the usual
info, but everything stalls after it lists all segment files and reports
that a ZK state change has occurred.
We've seen similar issues before where a node could no longer talk to ZK
for some reason and then everything went down, even though ZK was still running.
The ZK log began to show a lot of connection warning messages like:
2012-03-04 22:27:36,725 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@792] - Connection request from old client /141.105.120.151:49810; will be dropped if server is in r-o mode
2012-03-04 22:27:36,726 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@838] - Client attempting to establish new session at /141.105.120.151:49810
2012-03-04 22:27:36,729 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@792] - Connection request from old client /141.105.120.153:36788; will be dropped if server is in r-o mode
2012-03-04 22:27:36,729 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@838] - Client attempting to establish new session at /141.105.120.153:36788
There were also other problems:
client /141.105.120.154:36442 (no session established for client)
2012-03-04 22:27:23,195 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x135dfcfda000001, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:224)
        at java.lang.Thread.run(Thread.java:662)
2012-03-04 22:27:23,196 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1000] - Closed socket connection for client /141.105.120.152:48526 which had sessionid 0x135dfcfda000001
2012-03-04 22:27:23,224 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x135dfcfda000002, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:224)
        at java.lang.Thread.run(Thread.java:662)
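To line these server-side warnings up with what the clients experience, something like the sketch below could be left running next to the indexers and its state transitions compared against the EndOfStreamException timestamps above. Again this is just the stock ZooKeeper client; the connect string and session timeout are placeholders:

import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ZkSessionLogger implements Watcher {
  private static final SimpleDateFormat FMT =
      new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS");

  // Print every session state change (SyncConnected, Disconnected, Expired).
  public void process(WatchedEvent event) {
    System.out.println(FMT.format(new Date()) + " state=" + event.getState());
  }

  public static void main(String[] args) throws Exception {
    String connect = args.length > 0 ? args[0] : "localhost:2181";
    ZooKeeper zk = new ZooKeeper(connect, 15000, new ZkSessionLogger());
    System.out.println("initial state=" + zk.getState());
    Thread.sleep(Long.MAX_VALUE); // keep the session open until killed
  }
}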
I'm quite clueless now and have 5 test nodes that are unable to resume
operations. I will try again tomorrow and see if cleaning the Solr dataDir
helps.
Thanks
-------- Original Message --------
Subject: Re: [SolrCloud] Slow indexing
Date: Sun, 4 Mar 2012 22:34:20 +0100
From: eks dev <eks...@googlemail.com>
To: solr-user@lucene.apache.org, markus.jel...@openindex.io
Reply-To: solr-user@lucene.apache.org
Hmm, looks like you are facing exactly the phenomenon I asked about.
See my question here:
http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/61326
On Sun, Mar 4, 2012 at 9:24 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:
Hi,

With auto-committing disabled we can now index many millions of documents
in our test environment on a 5-node cluster with 5 shards and a replication
factor of 2. The documents are uploaded from map/reduce. No significant
changes were made to solrconfig and there are no update processors enabled.
We are using a trunk revision from this weekend.

The indexing speed is well below what we are used to seeing; we can easily
index 5 million documents on a non-cloud-enabled Solr 3.x instance within
an hour. What could be going on? There aren't many open TCP connections,
the number of file descriptors is stable, and I/O is low, but CPU time is
high! Each node has two Solr cores, both writing to their own dedicated
disk. The indexing speed is stable; it was slow at the start and still is.
It has now been running for well over 6 hours and only 3.5 million
documents are indexed.

Another strange detail is that the node receiving all incoming documents
(we're not yet using a client-side Solr server pool) has a much larger
disk usage than all other nodes. This is peculiar, as we expected all
replicas to be about the same size.
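(For reference, by a client-side Solr server pool I mean something along the lines of the sketch below, i.e. letting a ZK-aware client route the updates instead of posting everything to one node. This is only an illustration: the ZK connect string and collection name are placeholders, and it assumes the CloudSolrServer that is on trunk; an LBHttpSolrServer over the node URLs would be the other obvious option.)

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CloudIndexer {
  public static void main(String[] args) throws Exception {
    // Placeholder ZK connect string and collection name.
    CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
    server.setDefaultCollection("openindex");
    for (int i = 0; i < 1000; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-" + i);
      server.add(doc); // updates are spread over the live nodes, not sent to a single node
    }
    server.commit();
    server.shutdown();
  }
}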
The receiving node has slightly higher CPU than the other nodes, but the
thread dump shows a very large number of threads of type
cmdDistribExecutor-8-thread-292260 (295090) with 0-100 ms CPU time. At the
top of the list these threads all have < 20 ms time, but near the bottom it
rises to just over 100 ms. All nodes have a couple of http-80-30 (121994)
threads, each with very high CPU time.

Is this a known issue? Did I miss something? Any ideas?

Thanks
--