> If I'm reading this right, you have 420M docs on a single shard?

Yep, you were reading it right. Thanks for your guidance. We will do various prototyping following "the sizing exercise".
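As a starting point for that prototyping, here is a minimal sketch of the shard-count arithmetic behind the "N/200M shards, at least" rule of thumb in the quoted reply below. It assumes a cap of 200M docs per shard, which is a rough guess from that reply and not a measured limit; the function name and the 1 billion document target are illustrative placeholders only.

```python
def min_shards(expected_docs: int, max_docs_per_shard: int = 200_000_000) -> int:
    """Smallest shard count that keeps every shard at or under the cap.

    max_docs_per_shard defaults to the rough 200M guess from the thread;
    the sizing exercise should replace it with a measured value.
    """
    if expected_docs <= 0 or max_docs_per_shard <= 0:
        raise ValueError("document counts must be positive")
    # Integer ceiling division: round up so that no shard exceeds the cap.
    return -(-expected_docs // max_docs_per_shard)


if __name__ == "__main__":
    current_docs = 420_000_000       # document count reported in this thread
    eventual_docs = 1_000_000_000    # hypothetical growth target, not from the thread
    print(min_shards(current_docs))
    print(min_shards(eventual_docs))
```

This only gives a lower bound on the shard count. Heap usage, query load, and recovery time still have to be measured against a prototype.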
Best,
Stephen

On Tue, Apr 26, 2016 at 6:17 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> If I'm reading this right, you have 420M docs on a single shard? If that's
> true you are pushing the envelope of what I've seen work and be performant.
> Your OOM errors are the proverbial 'smoking gun' that you're putting too
> many docs on too few nodes.
>
> You say that the document count is "growing quite rapidly". My expectation
> is that your problems will only get worse as you cram more docs into your
> shard.
>
> You're correct that adding more memory (and consequently more JVM
> memory?) only gets you so far before you start running into GC trouble.
> When you hit full GC pauses they'll get longer and longer, which is its own
> problem. And you don't want to have huge JVM memory at the expense
> of op system memory due to the fact that Lucene uses MMapDirectory, see
> Uwe's excellent blog:
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> I'd _strongly_ recommend you do "the sizing exercise". There are lots of
> details here:
> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> You've already done some of this inadvertently; unfortunately it sounds
> like it's in production. If I were going to guess, I'd say the maximum
> number of docs on any shard should be less than half what you currently
> have. So you need to figure out how many docs you expect to host in this
> collection eventually and have N/200M shards. At least.
>
> There are various strategies when the answer is "I don't know": you
> might add new collections when you max out and then use "collection
> aliasing" to query them, etc.
>
> Best,
> Erick
>
> On Tue, Apr 26, 2016 at 3:49 PM, Stephen Lewis <sle...@panopto.com> wrote:
> > Hello,
> >
> > I'm looking for some guidance on the best steps for tuning a solr cloud
> > cluster which is heavy on writes.
> > We are currently running a solr cloud fleet composed of one core, one
> > shard, and three nodes. The cloud is hosted in AWS, and each solr node
> > is on its own linux r3.2xl instance with 8 cpu and 61 GiB mem, and a
> > 2TB EBS volume attached. Our index is currently 550 GiB over 420M
> > documents, and growing quite rapidly. We are currently doing a bit more
> > than 1000 document writes/deletes per second.
> >
> > Recently, we've hit some trouble with our production cloud. We have had
> > the process on individual instances die a few times, and we see the
> > following error messages being logged (expanded logs at the bottom of
> > the email):
> >
> > ERROR - 2016-04-26 00:56:43.873; org.apache.solr.common.SolrException; null:org.eclipse.jetty.io.EofException
> >
> > WARN - 2016-04-26 00:55:29.571; org.eclipse.jetty.servlet.ServletHandler; /solr/panopto/select
> > java.lang.IllegalStateException: Committed
> >
> > WARN - 2016-04-26 00:55:29.571; org.eclipse.jetty.server.Response; Committed before 500 {trace=org.eclipse.jetty.io.EofException
> >
> >
> > Another time we saw this happen, we had java OOM errors (expanded logs
> > at the bottom):
> >
> > WARN - 2016-04-25 22:58:43.943; org.eclipse.jetty.servlet.ServletHandler; Error for /solr/panopto/select
> > java.lang.OutOfMemoryError: Java heap space
> > ERROR - 2016-04-25 22:58:43.945; org.apache.solr.common.SolrException; null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
> > ...
> > Caused by: java.lang.OutOfMemoryError: Java heap space
> >
> >
> > When the cloud goes into recovery during live indexing, it takes about
> > 4-6 hours for a node to recover, but when we turn off indexing, recovery
> > only takes about 90 minutes.
> >
> > Moreover, we see that deletes are extremely slow.
> > We do batch deletes of about 300 documents based on two value filters,
> > and this takes about one minute:
> >
> > Research online suggests that a larger disk cache
> > <https://wiki.apache.org/solr/SolrPerformanceProblems> could be helpful,
> > but I also see from an older page
> > <http://wiki.apache.org/lucene-java/ImproveSearchingSpeed> on tuning for
> > Lucene that turning down the swappiness on our Linux instances may be
> > preferred to simply increasing space for the disk cache.
> >
> > Moreover, to scale in the past, we've simply rolled our cluster while
> > increasing the memory on the new machines, but I wonder if we're hitting
> > the limit for how much we should scale vertically. My impression is that
> > sharding will allow us to warm searchers faster and maintain a more
> > effective cache as we scale. Will we really be helped by sharding, or is
> > it only a matter of total CPU/Memory in the cluster?
> >
> > Thanks!
> >
> > Stephen
> >
> > (206)753-9320
> > stephen-lewis.net
> >
> > Logs:
> >
> > ERROR - 2016-04-26 00:56:43.873; org.apache.solr.common.SolrException; null:org.eclipse.jetty.io.EofException
> >     at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:142)
> >     at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:107)
> >     at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
> >     at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
> >     at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
> >     at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207)
> >     at org.apache.solr.util.FastWriter.flush(FastWriter.java:141)
> >     at org.apache.solr.util.FastWriter.flushBuffer(FastWriter.java:155)
> >     at org.apache.solr.response.TextResponseWriter.close(TextResponseWriter.java:83)
> >     at org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:42)
> >     at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:765)
> >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:426)
> >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
> >     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> >     at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> >     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> >     at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
> >     at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> >     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
> >     at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
> >     at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> >     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
> >     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> >     at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> >     at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
> >     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> >     at org.eclipse.jetty.server.Server.handle(Server.java:368)
> >     at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
> >     at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
> >     at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
> >     at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
> >     at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
> >     at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
> >     at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
> >     at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
> >     at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
> >     at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
> >     at java.lang.Thread.run(Thread.java:745)
> >
> > WARN - 2016-04-25 22:58:43.943; org.eclipse.jetty.servlet.ServletHandler; Error for /solr/panopto/select
> > java.lang.OutOfMemoryError: Java heap space
> > ERROR - 2016-04-25 22:58:43.945; org.apache.solr.common.SolrException; null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
> >     at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:793)
> >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:434)
> >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
> >     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> >     at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> >     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> >     at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
> >     at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> >     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
> >     at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
> >     at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> >     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
> >     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> >     at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> >     at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
> >     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> >     at org.eclipse.jetty.server.Server.handle(Server.java:368)
> >     at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
> >     at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
> >     at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
> >     at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
> >     at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
> >     at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
> >     at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
> >     at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
> >     at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
> >     at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
> >     at java.lang.Thread.run(Thread.java:745)
> > Caused by: java.lang.OutOfMemoryError: Java heap space
> >
> > WARN - 2016-04-26 00:56:43.873; org.eclipse.jetty.server.Response; Committed before 500 {trace=org.eclipse.jetty.io.EofException
> >     at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:142)
> >     at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:107)
> >     at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
> >     at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
> >     at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
> >     at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207)
> >     at org.apache.solr.util.FastWriter.flush(FastWriter.java:141)
> >     at org.apache.solr.util.FastWriter.flushBuffer(FastWriter.java:155)
> >     at org.apache.solr.response.TextResponseWriter.close(TextResponseWriter.java:83)
> >     at org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:42)
> >     at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:765)
> >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:426)
> >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
> >     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> >     at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> >     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> >     at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
> >     at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> >     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
> >     at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
> >     at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> >     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
> >     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> >     at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> >     at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
> >     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> >     at org.eclipse.jetty.server.Server.handle(Server.java:368)
> >     at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
> >     at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
> >     at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
> >     at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
> >     at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
> >     at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
> >     at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
> >     at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
> >     at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
> >     at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
> >     at java.lang.Thread.run(Thread.java:745)
> > ,code=500}
>

--
Stephen
(206)753-9320
stephen-lewis.net