If I'm reading this right, you have 420M docs on a single shard? If that's true,
you are pushing the envelope of what I've seen work and still perform well. Your
OOM errors are the proverbial 'smoking gun' that you're putting too many docs
on too few nodes.

You say that the document count is "growing quite rapidly". My expectation is
that your problems will only get worse as you cram more docs into your shard.

You're correct that adding more memory (and consequently more JVM
memory?) only gets you so far before you start running into GC trouble.
When you hit full GC pauses, they'll get longer and longer as the heap
grows, which is its own problem. And you don't want a huge JVM heap at the
expense of operating system memory, because Lucene uses MMapDirectory and
relies on the OS page cache to hold the index. See Uwe's excellent blog:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
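
To make the heap-versus-OS-memory trade-off concrete, here is a rough
back-of-the-envelope sketch. The 61 GiB of RAM and the 550 GiB index come
from your message; the heap sizes are hypothetical values I picked for
illustration, and the calculation ignores memory used by the OS itself and
other processes, so treat the numbers as optimistic.

```python
def page_cache_coverage(ram_gib: float, heap_gib: float, index_gib: float) -> float:
    """Fraction of the index that could sit in the OS page cache.

    Simplifying assumption: everything not given to the JVM heap is
    available for caching index files (in reality it is somewhat less).
    """
    if heap_gib >= ram_gib:
        raise ValueError("heap must be smaller than physical RAM")
    page_cache_gib = ram_gib - heap_gib
    return min(1.0, page_cache_gib / index_gib)


RAM_GIB = 61      # r3.2xl instance, from the original message
INDEX_GIB = 550   # current index size, from the original message

# Hypothetical heap sizes, purely for comparison.
for heap_gib in (8, 16, 32):
    coverage = page_cache_coverage(RAM_GIB, heap_gib, INDEX_GIB)
    print(f"{heap_gib:>2} GiB heap -> {coverage:.1%} of the index cacheable")
```

Whatever heap you choose, only a small fraction of a 550 GiB index fits in
memory on one of these machines, and every GiB added to the heap shrinks it
further.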

I'd _strongly_ recommend you do "the sizing exercise". There are lots of
details here:
https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

You've already done some of this inadvertently; unfortunately, it sounds like
it's in production. If I were going to guess, I'd say the maximum number of
docs on any shard should be less than half what you currently have. So you
need to figure out how many docs you expect to host in this collection
eventually (call that N) and have at least N/200M shards.
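
The N/200M rule of thumb can be sketched as a tiny calculation. The 200M
ceiling is only my guess from above, not a measured limit, and the
1-billion figure is a made-up projection; the sizing exercise is what
should give you the real numbers.

```python
DOCS_PER_SHARD_CEILING = 200_000_000  # rough guess from this thread, not a hard limit


def min_shards(expected_docs: int, per_shard: int = DOCS_PER_SHARD_CEILING) -> int:
    """Smallest shard count that keeps every shard at or under per_shard docs."""
    if expected_docs <= 0 or per_shard <= 0:
        raise ValueError("document counts must be positive")
    # Integer ceiling division, avoids floating-point rounding.
    return -(-expected_docs // per_shard)


print(min_shards(420_000_000))    # today's 420M docs -> 3
print(min_shards(1_000_000_000))  # hypothetical 1B docs -> 5
```

Since the count is growing quickly, size for the number of docs you expect
eventually, not for what you have today.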

There are various strategies when the answer is "I don't know". For
instance, you might add new collections when you max out and then use
"collection aliasing" to query them.
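
As a sketch of the aliasing idea, the snippet below only builds the
Collections API CREATEALIAS request URL; it does not send it. The host,
the alias name "panopto_all", and the collection names "panopto_1" and
"panopto_2" are hypothetical placeholders, so substitute your own.

```python
from urllib.parse import urlencode


def create_alias_url(solr_base: str, alias: str, collections: list[str]) -> str:
    """Build a Collections API CREATEALIAS URL pointing alias at collections."""
    if not collections:
        raise ValueError("an alias needs at least one collection")
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    }
    # safe="," keeps the comma-separated collection list readable.
    return f"{solr_base}/admin/collections?{urlencode(params, safe=',')}"


# Hypothetical host and names, for illustration only.
url = create_alias_url(
    "http://localhost:8983/solr",
    "panopto_all",
    ["panopto_1", "panopto_2"],
)
print(url)
```

Clients then query the alias as if it were a single collection, and you
re-issue CREATEALIAS with a longer list whenever you add a new collection.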

Best,
Erick

On Tue, Apr 26, 2016 at 3:49 PM, Stephen Lewis <sle...@panopto.com> wrote:
> Hello,
>
> I'm looking for some guidance on the best steps for tuning a solr cloud
> cluster which is heavy on writes. We are currently running a solr cloud
> fleet composed of one core, one shard, and three nodes. The cloud is hosted
> in AWS, and each solr node is on its own linux r3.2xl instance with 8 cpu
> and 61 GiB mem, and a 2TB EBS volume attached. Our index is currently 550
> GiB over 420M documents, and growing quite rapidly. We are currently doing
> a bit more than 1000 document writes/deletes per second.
>
> Recently, we've hit some trouble with our production cloud. We have had the
> process on individual instances die a few times, and we see the following
> error messages being logged (expanded logs at the bottom of the email):
>
> ERROR - 2016-04-26 00:56:43.873; org.apache.solr.common.SolrException;
> null:org.eclipse.jetty.io.EofException
>
> WARN  - 2016-04-26 00:55:29.571; org.eclipse.jetty.servlet.ServletHandler;
> /solr/panopto/select
> java.lang.IllegalStateException: Committed
>
> WARN  - 2016-04-26 00:55:29.571; org.eclipse.jetty.server.Response;
> Committed before 500 {trace=org.eclipse.jetty.io.EofException
>
>
> Another time we saw this happen, we had java OOM errors (expanded logs at
> the bottom):
>
> WARN  - 2016-04-25 22:58:43.943; org.eclipse.jetty.servlet.ServletHandler;
> Error for /solr/panopto/select
> java.lang.OutOfMemoryError: Java heap space
> ERROR - 2016-04-25 22:58:43.945; org.apache.solr.common.SolrException;
> null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
> ...
> Caused by: java.lang.OutOfMemoryError: Java heap space
>
>
> When the cloud goes into recovery during live indexing, it takes about 4-6
> hours for a node to recover, but when we turn off indexing, recovery only
> takes about 90 minutes.
>
> Moreover, we see that deletes are extremely slow. We do batch deletes of
> about 300 documents based on two value filters, and this takes about one
> minute:
>
> Research online suggests that a larger disk cache
> <https://wiki.apache.org/solr/SolrPerformanceProblems> could be helpful,
> but I also see from an older page
> <http://wiki.apache.org/lucene-java/ImproveSearchingSpeed> on tuning for
> Lucene that turning down the swappiness on our Linux instances may be
> preferred to simply increasing space for the disk cache.
>
> Moreover, to scale in the past, we've simply rolled our cluster while
> increasing the memory on the new machines, but I wonder if we're hitting
> the limit for how much we should scale vertically. My impression is that
> sharding will allow us to warm searchers faster and maintain a more
> effective cache as we scale. Will we really be helped by sharding, or is it
> only a matter of total CPU/Memory in the cluster?
>
> Thanks!
>
> Stephen
>
> (206)753-9320
> stephen-lewis.net
>
> Logs:
>
> ERROR - 2016-04-26 00:56:43.873; org.apache.solr.common.SolrException;
> null:org.eclipse.jetty.io.EofException
> at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:142)
> at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:107)
> at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
> at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
> at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
> at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207)
> at org.apache.solr.util.FastWriter.flush(FastWriter.java:141)
> at org.apache.solr.util.FastWriter.flushBuffer(FastWriter.java:155)
> at
> org.apache.solr.response.TextResponseWriter.close(TextResponseWriter.java:83)
> at
> org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:42)
> at
> org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:765)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:426)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> at org.eclipse.jetty.server.Server.handle(Server.java:368)
> at
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
> at
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
> at
> org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
> at
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
> at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
> at
> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
> at
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
> at java.lang.Thread.run(Thread.java:745)
>
> WARN  - 2016-04-25 22:58:43.943; org.eclipse.jetty.servlet.ServletHandler;
> Error for /solr/panopto/select
> java.lang.OutOfMemoryError: Java heap space
> ERROR - 2016-04-25 22:58:43.945; org.apache.solr.common.SolrException;
> null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
> at
> org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:793)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:434)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> at org.eclipse.jetty.server.Server.handle(Server.java:368)
> at
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
> at
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
> at
> org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
> at
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
> at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
> at
> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
> at
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.OutOfMemoryError: Java heap space
>
> WARN  - 2016-04-26 00:56:43.873; org.eclipse.jetty.server.Response;
> Committed before 500 {trace=org.eclipse.jetty.io.EofException
> at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:142)
> at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:107)
> at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
> at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
> at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
> at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207)
> at org.apache.solr.util.FastWriter.flush(FastWriter.java:141)
> at org.apache.solr.util.FastWriter.flushBuffer(FastWriter.java:155)
> at
> org.apache.solr.response.TextResponseWriter.close(TextResponseWriter.java:83)
> at
> org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:42)
> at
> org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:765)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:426)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> at org.eclipse.jetty.server.Server.handle(Server.java:368)
> at
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
> at
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
> at
> org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
> at
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
> at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
> at
> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
> at
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
> at java.lang.Thread.run(Thread.java:745)
> ,code=500}
