On 5/2/2013 4:24 AM, Annette Newton wrote:
> Hi Shawn,
>
> Thanks so much for your response. We basically are very write intensive
> and write throughput is pretty essential to our product. Reads are
> sporadic and actually is functioning really well.
>
> We write on average (at the moment) 8-12 batches of 35 documents per
> minute. But we really will be looking to write more in the future, so need
> to work out scaling of solr and how to cope with more volume.
>
> Schema (I have changed the names) :
>
> http://pastebin.com/x1ry7ieW
>
> Config:
>
> http://pastebin.com/pqjTCa7L

This is very clean. There's probably more you could remove or comment out, but generally speaking I couldn't find any glaring issues. In particular, you have disabled autowarming, which is good, because autowarming is a major contributor to commit speed problems.

The first thing I think I'd try is increasing zkClientTimeout to 30 or 60 seconds. You can set it on the startup commandline or in solr.xml; I would probably use the latter. Here's a solr.xml fragment that uses a system property or a 15 second default:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores"
         zkClientTimeout="${zkClientTimeout:15000}"
         hostPort="${jetty.port:}" hostContext="solr">

General thoughts; these changes might not help this particular issue:

You've got autoCommit with openSearcher=true. This is a hard commit that opens a new searcher each time. If it were me, I would set that up with openSearcher=false and either do explicit soft commits from my application or set up autoSoftCommit with a shorter timeframe than autoCommit.

This might simply be a scaling issue, where you'll need to spread the load wider than four shards. I know that there are financial considerations with that, and they might not be small, so let's leave that alone for now.

The memory problems might be a symptom or a cause of the scaling issue I just mentioned. You said you're using facets, which can be a real memory hog even with only a few of them. Have you tried facet.method=enum to see how it performs? You'd need to switch to it exclusively and never fall back to the default of fc. You could put that in the defaults or invariants section of your request handler(s).

Another way to reduce memory usage for facets is to use disk-based docValues on version 4.2 or later for the facet fields, but this will increase your index size, and your index is already quite large. Depending on your index contents, the increase may be small or large.

Something to just mention: It looks like your solrconfig.xml has hard-coded absolute paths for dataDir and updateLog.
This is fine if you'll only ever have one core/collection on each server, but it'll be a disaster if you have multiples. I could be wrong about how these get interpreted in SolrCloud -- they might actually be relative despite starting with a slash.

Thanks,
Shawn
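
P.S. In case it helps, the commit and facet suggestions above might look roughly like this in solrconfig.xml. Treat it as a sketch rather than a tested config: the maxTime values (five minutes for hard commits, 15 seconds for soft commits) and the /select handler name are assumptions on my part, not taken from your pastebin, so adjust them to match what you actually have.

```xml
<config>
  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- Hard commit: flushes to disk but does not open a new searcher.
         The 300000 ms interval is an assumed value. -->
    <autoCommit>
      <maxTime>300000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>

    <!-- Soft commit: makes new documents visible to searches.
         Runs on a shorter interval than autoCommit; 15000 ms is an assumed value. -->
    <autoSoftCommit>
      <maxTime>15000</maxTime>
    </autoSoftCommit>
  </updateHandler>

  <!-- The handler name "/select" is an assumption; use your own handler(s). -->
  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <!-- Use the enum facet method instead of the default fc. -->
      <str name="facet.method">enum</str>
    </lst>
  </requestHandler>
</config>
```

If you put facet.method in defaults, a request can still override it. Moving it to an invariants section instead would enforce enum for every request through that handler.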