Help me understand these newrelic graphs
Here are some screenshots of our SolrCloud cluster via New Relic: http://postimg.org/gallery/2hyzyeyc/ We currently have a 5-node cluster, and all indexing is done on separate machines and shipped over. Our machines are running on SSDs with 18G of RAM (index size is 8G). We only have 1 shard at the moment, with replicas on all 5 machines. I'm guessing that's a bit of a waste? How come when we do our bulk updating the response time actually decreases? I would think the load would be higher, therefore response time should be higher. Any way I can decrease the response time? Thanks
Re: Help me understand these newrelic graphs
Ahh.. it's including the add operation. That makes sense then. A bit silly on NR's part that they don't break it down. Otis, our index is only 8G so I don't consider that big by any means, but our queries can get a bit complex with a bit of faceting. Do you still think it makes sense to shard? How easy would this be to get working?

On Thu, Mar 13, 2014 at 4:02 PM, Otis Gospodnetic < otis.gospodne...@gmail.com> wrote:
> Hi,
>
> I think NR has support for breaking by handler, no? Just checked - no. Only webapp controller, but that doesn't apply to Solr.
>
> SPM should be more helpful when it comes to monitoring Solr - you can filter by host, handler, collection/core, etc. -- you can see the demo - https://apps.sematext.com/demo - though this is plain Solr, not SolrCloud.
>
> If your index is big or queries are complex, shard it and parallelize search.
>
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
> On Thu, Mar 13, 2014 at 6:17 PM, ralph tice wrote:
> > I think your response time is including the average response for an add operation, which generally returns very quickly and due to sheer number are averaging out the response time of your queries. New Relic should break out requests based on which handler they're hitting, but they don't seem to.
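The averaging effect ralph describes can be sketched numerically. A minimal illustration (the request counts and latencies below are hypothetical, not taken from the New Relic graphs):

```python
def mean_response_ms(requests):
    """requests: list of (count, avg_ms) pairs, one per request type.
    Returns the blended mean New Relic would report across all of them."""
    total = sum(count for count, _ in requests)
    return sum(count * ms for count, ms in requests) / total

# Normal traffic: queries only, ~100 ms each.
normal = mean_response_ms([(1_000, 100.0)])

# During a bulk update: the same queries plus many cheap adds (~2 ms each).
# The adds dominate the request count, so the overall mean drops even
# though query latency itself did not improve.
bulk = mean_response_ms([(1_000, 100.0), (9_000, 2.0)])

print(normal)  # 100.0
print(bulk)    # 11.8
```

This is why the graph shows response time *decreasing* during bulk indexing: the per-handler query latency is unchanged, but the all-requests average is pulled down by the flood of fast add operations.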
Re: Help me understand these newrelic graphs
If that is the case, what would help?

On Thu, Mar 13, 2014 at 8:46 PM, Otis Gospodnetic < otis.gospodne...@gmail.com> wrote:
> It really depends, hard to give a definitive instruction without more pieces of info. E.g. if your CPUs are all maxed out and you already have a high number of concurrent queries, then sharding may not be of any help at all.
>
> Otis
Re: Help me understand these newrelic graphs
Here is a screenshot of the host information: http://postimg.org/image/vub5ihxix/ As you can see we have 24-core CPUs and the load is only at 5-7.5.
Re: Help me understand these newrelic graphs
Otis, I want to get those spikes down lower if possible. As mentioned in the above posts, the 25ms timing you are seeing is not really accurate, because that's the average response time for ALL requests, including the bulk add operations, which are generally super fast. Our true response time is around 100ms.

On Fri, Mar 14, 2014 at 10:54 AM, Otis Gospodnetic < otis.gospodne...@gmail.com> wrote:
> Are you trying to bring that 24.9 ms response time down?
> Looks like there is room for more aggressive sharding there, yes.
>
> Otis
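For background on what sharding buys you here: SolrCloud's default compositeId router splits the 32-bit hash ring evenly and assigns each document to a shard by hashing its uniqueKey, so queries fan out and run in parallel across shards. A toy stand-in for the idea (CRC32 instead of Solr's actual MurmurHash3, purely illustrative):

```python
import zlib

def shard_for(doc_id, num_shards):
    """Toy hash router: map a document id onto one of num_shards
    evenly sized hash ranges (the idea behind compositeId routing)."""
    h = zlib.crc32(doc_id.encode()) & 0xFFFFFFFF
    return h * num_shards // (1 << 32)

# With a decent hash, documents spread roughly evenly, so each shard
# holds ~1/N of the index and each query only scans its own slice.
counts = [0, 0]
for i in range(10_000):
    counts[shard_for("doc-%d" % i, 2)] += 1
print(counts)  # roughly even split across the 2 shards
```

The win is that complex queries (heavy faceting included) are executed per shard and merged, at the cost of an extra distributed-merge step; whether that nets out positive depends on where the CPUs stand, as Otis notes above.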
Solr Cloud collection keep going down?
We have 2 collections with 1 shard each, replicated over 5 servers in the cluster. We see a lot of flapping (down or recovering) on one of the collections. When this happens, the other collection hosted on the same machine is still marked as active. When this happens it takes a fairly long time (~30 minutes) for the collection to come back online, if at all. I find that it's usually more reliable to completely shut down Solr on the affected machine and bring it back up with its core disabled. We then re-enable the core when it's marked as active.

A few questions:

1) What is the healthcheck in SolrCloud? Put another way, what is failing that marks one collection as down but the other on the same machine as up?

2) Why does recovery take forever when a node goes down, even if it's only down for 30 seconds? Our index is only 7-8G and we are running on SSDs.

3) What can be done to diagnose and fix this problem?
Re: Solr Cloud collection keep going down?
iter.write(OutputStreamWriter.java:207)
at org.apache.solr.util.FastWriter.flush(FastWriter.java:141)
at org.apache.solr.util.FastWriter.write(FastWriter.java:55)
at org.apache.solr.response.RubyWriter.writeStr(RubyResponseWriter.java:87)
at org.apache.solr.response.JSONWriter.writeNamedListAsFlat(JSONResponseWriter.java:285)
at org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:301)
at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:188)
at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:183)
at org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:299)
at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:188)
at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:183)
at org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:299)
at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:188)
at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:183)
at org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:299)
at org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:95)
at org.apache.solr.response.RubyResponseWriter.write(RubyResponseWriter.java:37)
at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:768)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:440)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.net.SocketException: Connection reset
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118)
at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
at org.eclipse.jetty.io.ByteArrayBuffer.writeTo(ByteArrayBuffer.java:375)
at org.eclipse.jetty.io.bio.StreamEndPoint.flush(StreamEndPoint.java:164)
at org.eclipse.jetty.io.bio.StreamEndPoint.flush(StreamEndPoint.java:182)
at org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:838)
... 51 more
,code=500}
Re: Solr Cloud collection keep going down?
Shawn,

Thanks for pointing me in the right direction. After consulting the above document I *think* that the problem may be too large of a heap, which may be affecting GC collection and hence causing ZK timeouts.

We have around 20G of memory on these machines, with a heap min/max of 6G and 10G respectively (-Xms6G -Xmx10G). The rest was set aside for disk cache. Why did we choose 6-10? No other reason than we wanted to allot enough for disk cache, and then everything else was thrown at Solr. Does this sound about right?

I took some screenshots from VisualVM and our New Relic reporting, as well as some relevant portions of our solrconfig.xml. Any thoughts/comments would be greatly appreciated.

http://postimg.org/gallery/4t73sdks/1fc10f9c/

Thanks

On Sat, Mar 22, 2014 at 2:26 PM, Shawn Heisey wrote:
> Unless you are actually using the ping request handler, the healthcheck config will not matter. Or were you referring to something else?
>
> Referencing the logs you included in your reply: The EofException errors happen because your client code times out and disconnects before the request it made has completed. That is most likely just a symptom that has nothing at all to do with the problem.
>
> Read the following wiki page. What I'm going to say below will reference information you can find there:
>
> http://wiki.apache.org/solr/SolrPerformanceProblems
>
> Relevant side note: The default zookeeper client timeout is 15 seconds. A typical zookeeper config defines tickTime as 2 seconds, and the timeout cannot be configured to be more than 20 times the tickTime, which means it cannot go beyond 40 seconds. The default timeout value of 15 seconds is usually more than enough, unless you are having performance problems.
>
> If you are not actually taking Solr instances down, then the fact that you are seeing the log replay messages indicates to me that something is taking so much time that the connection to Zookeeper times out. When it finally responds, it will attempt to recover the index, which means first it will replay the transaction log and then it might replicate the index from the shard leader.
>
> Replaying the transaction log is likely the reason it takes so long to recover. The wiki page I linked above has a "slow startup" section that explains how to fix this.
>
> There is some kind of underlying problem that is causing the zookeeper connection to time out. It is most likely garbage collection pauses or insufficient RAM to cache the index, possibly both.
>
> You did not indicate how much total RAM you have or how big your Java heap is. As the wiki page mentions in the SSD section, SSD is not a substitute for having enough RAM to cache a significant percentage of your index.
>
> Thanks,
> Shawn
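A quick back-of-the-envelope check of the memory budget described in this thread (the 1G OS overhead figure is a guess; the rest are the numbers posted above):

```python
# Per-node memory budget, using the figures mentioned in this thread.
total_ram_gb = 20
max_heap_gb = 10      # -Xmx10G
index_size_gb = 8     # on-disk index
os_overhead_gb = 1    # OS + other processes (assumed)

# Whatever the heap doesn't claim is available as OS page cache.
page_cache_gb = total_ram_gb - max_heap_gb - os_overhead_gb
print(page_cache_gb)                     # 9
print(page_cache_gb >= index_size_gb)    # True: the whole index fits in cache
```

By this arithmetic the index can be fully cached even at the 10G heap ceiling, which points back at GC pauses on the oversized heap (rather than cache starvation) as the likelier cause of the ZooKeeper timeouts, consistent with Shawn's diagnosis.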
Question on highlighting edgegrams
In 3.5.0 we have the following. If we searched for "c" with highlighting enabled we would get back results such as: cdat crocdile cool beans But in the latest Solr (4.7) we get the full words highlighted back. Did something change between these versions with regards to highlighting? Thanks
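For context on why a one-letter query matches at all: an EdgeNGramFilter (the field type quoted later in this thread shows maxGramSize="30"; the minGramSize here is assumed to be 1) indexes every leading prefix of each token. A rough sketch of the index-side token stream, not Lucene's actual implementation:

```python
def edge_ngrams(token, min_gram=1, max_gram=30):
    """Emit the leading prefixes of a token, as Lucene's
    EdgeNGramFilter does on the index side."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

print(edge_ngrams("cdat"))  # ['c', 'cd', 'cda', 'cdat']
```

The behavioral difference is believed to lie in the character offsets attached to each gram: 3.x gave each gram its own narrow offsets (so only the matched prefix was wrapped in highlight tags), while 4.x gives every gram the offsets of the whole original token, so the highlighter wraps the full word. That matches the symptom described here, though it's worth confirming against the Lucene changelog.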
Re: Question on highlighting edgegrams
Bump

On Mon, Mar 24, 2014 at 3:00 PM, Software Dev wrote:
> In 3.5.0 we have the following (field type XML partially eaten by the list archive):
>
> positionIncrementGap="100">
> maxGramSize="30"/>
>
> If we searched for "c" with highlighting enabled we would get back results such as:
>
> cdat
> crocdile
> cool beans
>
> But in the latest Solr (4.7) we get the full words highlighted back. Did something change between these versions with regards to highlighting?
>
> Thanks
Replication (Solr Cloud)
I see that by default in SolrCloud my collections are replicating. Should this be disabled in SolrCloud, as this is already handled by it?

From the documentation:

"The Replication screen shows you the current replication state for the named core you have specified. In Solr, replication is for the index only. SolrCloud has supplanted much of this functionality, but if you are still using index replication, you can use this screen to see the replication state:"

I just want to make sure before I disable it that if we send an update to one server, the document will be correctly replicated across all nodes. Thanks
Re: Replication (Solr Cloud)
Thanks for the reply. I'll make sure NOT to disable it.
Re: Solr Cloud collection keep going down?
Can anyone else chime in? Thanks
Re: Replication (Solr Cloud)
One other question: if I optimize a collection on one node, does this get replicated to all others when finished?
Re: Replication (Solr Cloud)
Ehh.. found out the hard way. I optimized the collection on 1 machine and when it was completed it replicated to the others and took my cluster down. Shitty
Re: Replication (Solr Cloud)
So it's generally a bad idea to optimize, I gather?

On Tue, Mar 25, 2014 at 11:16 AM, Shawn Heisey wrote:
> It doesn't get replicated -- each core in the collection will be optimized. In older versions it might have done them all at once, but I believe that newer versions only do one core at a time.
>
> Doing an optimize on a Solr core results in a LOT of I/O. If your Solr install is having performance issues, that will push it over the edge. When SolrCloud ends up with a performance problem in one place, they tend to multiply and cause MORE problems. It can get bad enough that the whole cluster goes down because it's trying to do a recovery on every node. For that reason, it's extremely important that you have enough system resources available across your cloud (RAM in particular) to avoid performance issues.
>
> Thanks,
> Shawn
Re: Replication (Solr Cloud)
"In older versions it might have done them all at once, but I believe that newer versions only do one core at a time."

It looks like it did it all at once, and I'm on the latest (4.7).
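A rough sense of the scale of I/O behind Shawn's warning: an optimize merges the whole index down to one segment, so each core reads roughly its full index size and writes it back out. Using the figures from this thread (and treating "read + write the whole index" as a crude approximation of a full merge):

```python
index_size_gb = 8   # per-core index size mentioned in this thread
replicas = 5        # one replica per node

# A full optimize rewrites the index: read every segment, write one
# merged segment -- roughly index_size in and index_size out per core.
io_per_node_gb = 2 * index_size_gb
cluster_io_gb = io_per_node_gb * replicas

print(io_per_node_gb)  # 16
print(cluster_io_gb)   # 80
```

If all five replicas optimize at once (as reported above on 4.7), that is on the order of 80G of disk traffic hitting the cluster simultaneously while it is still serving queries, which fits the "took my cluster down" outcome.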
Re: Question on highlighting edgegrams
Same problem here: http://lucene.472066.n3.nabble.com/Solr-4-x-EdgeNGramFilterFactory-and-highlighting-td4114748.html On Tue, Mar 25, 2014 at 9:39 AM, Software Dev wrote: > Bump > > On Mon, Mar 24, 2014 at 3:00 PM, Software Dev > wrote: >> In 3.5.0 we have the following. >> >> > positionIncrementGap="100"> >> >> >> >> > maxGramSize="30"/> >> >> >> >> >> >> >> >> If we searched for "c" with highlighting enabled we would get back >> results such as: >> >> cdat >> crocdile >> cool beans >> >> But in the latest Solr (4.7) we get the full words highlighted back. >> Did something change from these versions with regards to highlighting? >> >> Thanks
What contributes to disk IO?
What are the main contributing factors for Solr Cloud generating a lot of disk IO? A lot of reads? Writes? Insufficient RAM? I would think if there was enough disk cache available for the whole index there would be little to no disk IO.
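A quick back-of-envelope check of the "enough disk cache" reasoning above, using the numbers mentioned elsewhere in this thread (18G RAM, 12G JVM heap, 8G index) — the figures are illustrative, not a measurement:

```python
# Back-of-envelope check of whether the OS page cache can hold the index.
# RAM left after the JVM heap is roughly what the OS can use to cache the
# index files; if the index is bigger than that, reads must hit disk.

def cache_headroom_gb(total_ram_gb, jvm_heap_gb, index_size_gb):
    """Return page-cache room left after the heap, minus the index size.
    Negative means the index cannot be fully cached."""
    available_for_cache = total_ram_gb - jvm_heap_gb
    return available_for_cache - index_size_gb

headroom = cache_headroom_gb(total_ram_gb=18, jvm_heap_gb=12, index_size_gb=8)
print(headroom)  # -2: with a 12G heap, an 8G index cannot be fully cached
```

By this rough accounting, a large heap can starve the page cache even on a machine with far more RAM than index.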
Re: Question on highlighting edgegrams
Is this a known bug? On Tue, Mar 25, 2014 at 1:12 PM, Software Dev wrote: > Same problem here: > http://lucene.472066.n3.nabble.com/Solr-4-x-EdgeNGramFilterFactory-and-highlighting-td4114748.html
What are my options?
We have a collection named "items". These are simply products that we sell. A large part of our scoring involves boosting on certain metrics for each product (amount sold, total GMS, ratings, etc). Some of these metrics are actually split across multiple tables. We are currently re-indexing the complete document anytime any of these values changes. I'm wondering if there is a better way? Some ideas: 1) Partial update the document. Is this even possible? 2) Add a parent-child relationship on Item and its metrics? 3) Dump all metrics to a file and use that as it changes throughout the day? I forgot the actual component that does it. Either way, can it handle multiple values? 4) Something else? I appreciate any feedback. Thanks
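On option 1: Solr has supported atomic ("partial") updates since 4.0, provided all fields are stored and the update log is enabled. A minimal sketch of the JSON payload such an update would carry — the field names (`amount_sold`, `rating`) and doc id are hypothetical:

```python
import json

# Sketch of a Solr atomic update: only the metric fields are rewritten,
# using the "set" modifier. The resulting JSON would be POSTed to the
# collection's /update handler. Field names here are made up.

def atomic_update(doc_id, metrics):
    """Build an update body that overwrites only the given metric fields."""
    doc = {"id": doc_id}
    for field, value in metrics.items():
        doc[field] = {"set": value}  # "set" replaces; "inc" would increment
    return json.dumps([doc])

payload = atomic_update("sku-123", {"amount_sold": 42, "rating": 4.5})
print(payload)
```

Note that under the hood Solr still re-indexes the whole document from stored fields, so this saves client work and data shipping, not indexing cost.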
Re: Question on highlighting edgegrams
Certainly I am not the only user experiencing this? On Wed, Mar 26, 2014 at 1:11 PM, Software Dev wrote: > Is this a known bug?
Re: Question on highlighting edgegrams
Shalin, I am running 4.7 and seeing this behavior :( On Thu, Mar 27, 2014 at 10:36 PM, Shalin Shekhar Mangar wrote: > Yes, there are known bugs with EdgeNGram filters. I think they are fixed in > 4.4 > > See https://issues.apache.org/jira/browse/LUCENE-3907 > > On Fri, Mar 28, 2014 at 10:17 AM, Software Dev > wrote: >> Certainly I am not the only user experiencing this? > > -- > Regards, > Shalin Shekhar Mangar.
Highlighting bug with edgegrams
In 3.5.0 we have the following. If we searched for "c" with highlighting enabled we would get back results such as: cdat crocdile cool beans But in the latest Solr (4.7.1) we get the full words highlighted back. Did something change from these versions with regards to highlighting? Thanks Found an old post but no info: http://lucene.472066.n3.nabble.com/Solr-4-x-EdgeNGramFilterFactory-and-highlighting-td4114748.html
Re: Sharding and replicas (Solr Cloud)
Sorry about the confusion. I meant I created my config via the ZkCLI and then I wanted to create my core via the CollectionsAPI. I *think* I have it working but was wondering why there are a crazy amount of core names under the admin "Core Selector"? When I create X amount of shards via the bootstrap command I think it only creates 1 core. Am I missing something? On Thu, Nov 7, 2013 at 1:06 PM, Shawn Heisey wrote: > On 11/7/2013 1:58 PM, Mark wrote: > >> If I create my collection via the ZkCLI (https://cwiki.apache.org/ >> confluence/display/solr/Command+Line+Utilities) how do I configure the >> number of shards and replicas? >> > > I was not aware that you could create collections with zkcli. I did not > think that was possible. Use the collections API: > > http://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_ > Collections_API > > Thanks, > Shawn > >
Re: Sharding and replicas (Solr Cloud)
I too want to be in control of everything that is created. Here is what I'm trying to do. 1) Start up a cluster of 5 Solr Instances 2) Import the configuration to Zookeeper 3) Manually create a collection via the collections api with number of shards and replication factor Now there are some issues with step 3. After creating the collection and reloading the GUI, I always see: - *collection1:* org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException: Could not find configName for collection collection1 found:null until I restart the cluster. Is there a way around this? Also, after creating the collection it creates a directory in $SOLR_HOME/home. So in this example it created ${SOLR_HOME}/collection1_shard1_replica1 and ${SOLR_HOME}/collection1_shard1_replica2. What happens when I rename both of these to the same name in the core admin? On Thu, Nov 7, 2013 at 3:15 PM, Shawn Heisey wrote: > If you create it with numShards=1 and replicationFactor=2, you'll end up > with a total of 2 cores across all your Solr instances. For my simple > cloud install, these are the numbers that I'm using. One shard, a total of > two copies. > > If you create it with the numbers given on the wiki page, numShards=3 and > replicationFactor=4, there would be a total of 12 cores created across all > your servers. The maxShardsPerNode parameter defaults to 1, which means > that only 1 core per instance (SolrCloud node) is allowed for that > collection. 
If there aren't enough Solr instances for the numbers you have > entered, the creation will fail. > > I don't know any details about what the bootstrap_conf parameter actually > does when it creates collections. I've never used it - I want to be in > control of the configs and collections that get created. > > Thanks, > Shawn > >
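Shawn's arithmetic above can be written out as a small check — a sketch only, mirroring the numShards × replicationFactor rule and the maxShardsPerNode cap he describes:

```python
# A collection creates numShards * replicationFactor cores in total, and
# maxShardsPerNode (default 1) caps how many may land on each node, so the
# node count must be large enough or the creation fails.

def cores_needed(num_shards, replication_factor):
    return num_shards * replication_factor

def creation_ok(num_shards, replication_factor, num_nodes, max_shards_per_node=1):
    return cores_needed(num_shards, replication_factor) <= num_nodes * max_shards_per_node

print(cores_needed(1, 2))    # 2  (Shawn's simple cloud: one shard, two copies)
print(cores_needed(3, 4))    # 12 (the wiki example)
print(creation_ok(3, 4, 5))  # False: 12 cores won't fit on 5 nodes at 1 each
```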
Solr Cloud Bulk Indexing Questions
We are testing our shiny new Solr Cloud architecture but we are experiencing some issues when doing bulk indexing. We have 5 solr cloud machines running and 3 indexing machines (separate from the cloud servers). The indexing machines pull ids off a queue, then they index and ship over a document via a CloudSolrServer. It appears that the indexers are too fast, because the load (particularly disk io) on the solr cloud machines spikes through the roof, making the entire cluster unusable. It's kind of odd because the total index size is not even large, ie < 10GB. Are there any optimizations/enhancements I could try to help alleviate these problems? I should note that for the above collection we only have 1 shard that's replicated across all machines, so all machines have the full index. Would we benefit from switching to a ConcurrentUpdateSolrServer where all updates get sent to 1 machine and 1 machine only? We could then remove this machine from the cluster that handles user requests. Thanks for any input.
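One knob the indexers themselves control is pacing: sending fixed-size batches with a pause between them instead of a firehose of single adds. A sketch of that idea — the `send` callback stands in for a real `CloudSolrServer.add(batch)` call, and the batch size and delay are made-up tuning values:

```python
import time

# Throttled bulk indexing sketch: group documents into batches and pause
# between batches so the cluster gets room to flush and merge. Tune
# batch_size and delay_s against observed disk IO on the Solr nodes.

def batched(ids, batch_size):
    for i in range(0, len(ids), batch_size):
        yield ids[i:i + batch_size]

def index_all(ids, send, batch_size=500, delay_s=0.5):
    for batch in batched(ids, batch_size):
        send(batch)          # one request per batch, not per document
        time.sleep(delay_s)  # back off between batches

sent = []
index_all(list(range(1200)), sent.append, batch_size=500, delay_s=0)
print([len(b) for b in sent])  # [500, 500, 200]
```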
Re: Solr Cloud Bulk Indexing Questions
We have a soft commit every 5 seconds and hard commit every 30. As far as docs/second, I would guess around 200/sec, which doesn't seem that high. On Mon, Jan 20, 2014 at 2:26 PM, Erick Erickson wrote: > Questions: How often do you commit your updates? What is your > indexing rate in docs/second? > > In a SolrCloud setup, you should be using a CloudSolrServer. If the > server is having trouble keeping up with updates, switching to CUSS > probably wouldn't help. > > So I suspect there's something not optimal about your setup that's > the culprit. > > Best, > Erick
Re: Solr Cloud Bulk Indexing Questions
We also noticed that disk IO shoots up to 100% on 1 of the nodes. Do all updates get sent to one machine or something? On Mon, Jan 20, 2014 at 2:42 PM, Software Dev wrote: > We have a soft commit every 5 seconds and hard commit every 30. As > far as docs/second, I would guess around 200/sec which doesn't seem that > high.
Re: Solr Cloud Bulk Indexing Questions
4.6.0 On Mon, Jan 20, 2014 at 2:47 PM, Mark Miller wrote: > What version are you running? > > - Mark
Removing a node from Solr Cloud
What is the process for completely removing a node from Solr Cloud? We recently removed one but it's still showing up as "Gone" in the Cloud admin. Thanks
Setting leaderVoteWait for auto discovered cores
How is this accomplished? We currently have an empty solr.xml (auto-discovery) so I'm not sure where to put this value?
Re: Removing a node from Solr Cloud
Thanks. Any way to accomplish this if the machine crashed (ie, can't unload it from the admin)? On Tue, Jan 21, 2014 at 11:25 AM, Anshum Gupta wrote: > You could unload the cores. This optionally also deletes the data and > instance directory. > Look at http://wiki.apache.org/solr/CoreAdmin#UNLOAD. > > -- > Anshum Gupta > http://www.anshumgupta.net
Re: Solr Cloud Bulk Indexing Questions
Any other suggestions? On Mon, Jan 20, 2014 at 2:49 PM, Software Dev wrote: > 4.6.0
Re: Solr Cloud Bulk Indexing Questions
A suggestion would be to hard commit much less often, ie every 10 minutes, and see if there is a change.
- Will try this

How much system RAM ? JVM Heap ? Enough space in RAM for system disk cache ?
- We have 18G of ram, 12 dedicated to Solr, but as of right now the total index size is only 5GB

What is the size of your documents ? A few KB, MB, ... ?
- Under 1MB

Ah, and what about network IO ? Could that be a limiting factor ?
- Again, total index size is only 5GB so I don't know if this would be a problem

On Wed, Jan 22, 2014 at 12:26 AM, Andre Bois-Crettez wrote: > 1 node having more load should be the leader (because of the extra work > of receiving and distributing updates, but my experiences show only a > bit more CPU usage, and no difference in disk IO). > > A suggestion would be to hard commit much less often, ie every 10 > minutes, and see if there is a change. > How much system RAM ? JVM Heap ? Enough space in RAM for system disk cache > ? > What is the size of your documents ? A few KB, MB, ... ? > Ah, and what about network IO ? Could that be a limiting factor ? > > André > > -- > André Bois-Crettez > Software Architect > Search Developer > http://www.kelkoo.com/
Re: Solr Cloud Bulk Indexing Questions
Thanks for the suggestions. After reading that document I feel even more confused though, because I always thought that hard commits should be less frequent than soft commits. Is there any way to configure autoCommit, softCommit values on a per request basis? The majority of the time we have a small flow of updates coming in and we would like to see them ASAP. However we occasionally need to do some bulk indexing (once a week or less) and the need to see those updates right away isn't as critical. I would say 95% of the time we are in "Index-Light Query-Light/Heavy" mode and the other 5% is "Index-Heavy Query-Light/Heavy" mode. Thanks On Wed, Jan 22, 2014 at 5:33 PM, Erick Erickson wrote: > When you're doing hard commits, is it with openSearcher = true or > false? It should probably be false... > > Here's a rundown of the soft/hard commit consequences: > > http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ > > I suspect (but, of course, can't prove) that you're over-committing > and hitting segment > merges without meaning to... > > FWIW, > Erick
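On the per-request question above: Solr does accept a `commitWithin` parameter on individual update requests, which sets a visibility deadline per request rather than relying only on global autoCommit. A sketch of how a client might pick the window — the host, collection name, and window sizes are illustrative:

```python
from urllib.parse import urlencode

# Per-request commit control via commitWithin (milliseconds): trickle
# updates get a short visibility window, weekly bulk loads a long one,
# without touching the global autoCommit settings. Endpoint is made up.

def update_url(base, bulk=False):
    window_ms = 600000 if bulk else 5000  # 10 min for bulk, 5 s otherwise
    return base + "/update?" + urlencode({"commitWithin": window_ms})

print(update_url("http://solr1:8983/solr/items"))
# http://solr1:8983/solr/items/update?commitWithin=5000
```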
Re: Solr Cloud Bulk Indexing Questions
Also, any suggestions on debugging? What should I look for and how? Thanks
Re: Solr Cloud Bulk Indexing Questions
Does maxWriteMBPerSec apply to NRTCachingDirectoryFactory? I only see maxMergeSizeMB and maxCachedMB as configuration values. On Thu, Jan 23, 2014 at 11:05 AM, Otis Gospodnetic < otis.gospodne...@gmail.com> wrote: > Hi, > > Have you tried maxWriteMBPerSec? > > http://search-lucene.com/?q=maxWriteMBPerSec&fc_project=Solr > > Otis > -- > Performance Monitoring * Log Analytics * Search Analytics > Solr & Elasticsearch Support * http://sematext.com/
SolrCloudServer questions
Can someone clarify what the following options are: - updatesToLeaders - shutdownLBHttpSolrServer - parallelUpdates Also, I remember in older versions of Solr there was a more compact, efficient format used between SolrJ and Solr. Does this still exist in the latest version of Solr? If so, is it the default? Thanks
Disabling Commit/Auto-Commit (SolrCloud)
Is there a way to disable commit/hard-commit at runtime? For example, we usually have our hard-commit and soft-commit intervals set really low, but when we do bulk indexing we would like to disable them to increase performance. If there isn't an easy way of doing this, would simply pushing a new solrconfig to SolrCloud work?
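One way this is commonly handled — assuming the solrconfig.xml uses property substitution — is to parameterize the commit intervals so a bulk load can override them with a system property and a collection reload, rather than editing the file each time (the property names here are conventional, not required):

```xml
<!-- solrconfig.xml: commit intervals read from system properties, with the
     usual low defaults. Before a bulk load, start nodes with e.g.
     -Dsolr.autoCommit.maxTime=600000 to commit far less often; a value
     <= 0 typically disables the auto-commit tracker entirely. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:30000}</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:5000}</maxTime>
  </autoSoftCommit>
</updateHandler>
```

Pushing a modified solrconfig.xml to ZooKeeper and reloading the collection also works; the property approach just avoids re-uploading the config for each bulk run.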
Re: SolrCloudServer questions
Which, if any, of these settings would be beneficial when bulk uploading? On Fri, Jan 31, 2014 at 11:05 AM, Mark Miller wrote: > > On Jan 31, 2014, at 1:56 PM, Greg Walters > wrote: > > > I'm assuming you mean CloudSolrServer here. If I'm wrong please ignore > my response. > > > >> -updatesToLeaders > > > > Only send documents to shard leaders while indexing. This saves > cross-talk between slaves and leaders, which results in more efficient > document routing. > > Right, but recently this has less of an effect because CloudSolrServer can > now hash documents and directly send them to the right place. This option > has become more historical. Just make sure you set the correct id field on > the CloudSolrServer instance for this hashing to work (I think it defaults > to "id"). > > >> shutdownLBHttpSolrServer > > > > CloudSolrServer uses an LBHttpSolrServer behind the scenes to distribute > requests (that aren't updates directly to leaders). Where did you find > this? I don't see this in the javadoc anywhere, but it is a boolean in the > CloudSolrServer class. It looks like when you create a new CloudSolrServer > and pass it your own LBHttpSolrServer, the boolean gets set to false and the > CloudSolrServer won't shut down the LBHttpSolrServer when it gets shut down. > > >> parallelUpdates > > > > The javadocs don't have any description for this one, but I checked out > the code for CloudSolrServer and if parallelUpdates is set it looks like it > executes update statements to multiple shards at the same time. > > Right, we should def add some javadoc, but this sends updates to shards in > parallel rather than with a single thread. Can really increase update > speed. Still not as powerful as using CloudSolrServer from multiple > threads, but a nice improvement nonetheless. > > - Mark > > http://about.me/markrmiller > > > I'm no dev but I can read, so please excuse any errors on my part.
> > Thanks, > > Greg > > On Jan 31, 2014, at 11:40 AM, Software Dev > wrote: > > > >> Can someone clarify what the following options are: > >> > >> - updatesToLeaders > >> - shutdownLBHttpSolrServer > >> - parallelUpdates > >> > >> Also, I remember in older versions of Solr there was a more compact, > >> efficient format used between SolrJ and Solr. Does this still > >> exist in the latest version of Solr? If so, is it the default? > >> > >> Thanks
Re: SolrCloudServer questions
Our use case is we have 3 indexing machines pulling off a Kafka queue, and they are all sending individual updates. On Fri, Jan 31, 2014 at 12:54 PM, Mark Miller wrote: > Just make sure parallel updates is set to true. > > If you want to load even faster, you can use the bulk add methods, or if > you need more fine-grained responses, use the single add from multiple > threads (though bulk add can also be done via multiple threads if you > really want to try and push the max). > > - Mark > > http://about.me/markrmiller
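Mark's advice above (bulk add rather than individual updates) can be sketched without any Solr dependencies: the batching logic below is plain Java, and the actual SolrJ call is shown only as a comment since it needs the solrj jar on the classpath. The class name `BulkIndexer` and the batch size are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class BulkIndexer {
    // Split a document list into fixed-size batches so each request to Solr
    // carries many documents instead of one. Batch size is a tuning knob.
    static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            batches.add(docs.subList(i, Math.min(i + batchSize, docs.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 10; i++) ids.add("doc" + i);

        for (List<String> batch : partition(ids, 4)) {
            // With SolrJ, each batch would become SolrInputDocuments and be
            // sent in one round trip, e.g.:
            //   cloudSolrServer.add(docsForBatch);  // the bulk add method
            // Running this loop from several threads pushes throughput
            // further, as suggested in the thread.
            System.out.println("batch of " + batch.size());
        }
        // prints: batch of 4, batch of 4, batch of 2
    }
}
```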
Re: SolrCloudServer questions
Also, if we are seeing a huge CPU spike on the leader when doing a bulk index, would changing any of the options help? On Sat, Feb 1, 2014 at 2:59 PM, Software Dev wrote: > Our use case is we have 3 indexing machines pulling off a Kafka queue, and > they are all sending individual updates. > > On Fri, Jan 31, 2014 at 12:54 PM, Mark Miller wrote: >> Just make sure parallel updates is set to true. >> >> If you want to load even faster, you can use the bulk add methods, or if >> you need more fine-grained responses, use the single add from multiple >> threads (though bulk add can also be done via multiple threads if you >> really want to try and push the max). >> >> - Mark >> >> http://about.me/markrmiller
How does Solr parse schema.xml?
Can anyone point me in the right direction? I'm trying to duplicate the functionality of the analysis request handler so we can wrap a service around it to return the terms given a string of text. We would like to read the same schema.xml file to configure the analyzer, tokenizer, etc., but I can't seem to find the class that actually does the parsing of that file. Thanks