Peter,

Are you using per-segment faceting, e.g., SOLR-1617? That could help your situation.
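If memory serves, on trunk the per-segment method is exposed as facet.method=fcs, settable per request or as a handler default. A minimal sketch, assuming a standard /select SearchHandler (the handler name and defaults here are illustrative):

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <!-- per-segment field cache faceting (SOLR-1617, trunk/4.x) -->
      <str name="facet.method">fcs</str>
    </lst>
  </requestHandler>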
On Sun, Sep 12, 2010 at 12:26 PM, Peter Sturge <peter.stu...@gmail.com> wrote:
> Hi,
>
> Below are some notes regarding Solr cache tuning that should prove
> useful for anyone who uses Solr with frequent commits (e.g. <5min).
>
> Environment:
> Solr 1.4.1 or branch_3x trunk.
> Note the 4.x trunk has lots of neat new features, so the notes here
> are likely less relevant to the 4.x environment.
>
> Overview:
> Our Solr environment makes extensive use of faceting, we perform
> commits every 30secs, and the indexes tend to be on the large-ish
> side (>20 million docs).
> Note: for our data, when we commit we are always adding new data,
> never changing existing data.
> This type of environment can be tricky to tune, as Solr is more
> geared toward fast reads than frequent writes.
>
> Symptoms:
> If you have used faceting in searches while also performing frequent
> commits, you've likely encountered the dreaded OutOfMemory or GC
> overhead limit exceeded errors.
> In high commit rate environments this is almost always due to
> multiple 'onDeck' searchers and autowarming - i.e. new searchers
> don't finish autowarming their caches before the next commit()
> comes along and invalidates them.
> Once this starts happening regularly, your Solr JVM will likely run
> out of memory, as the number of searchers (and their cache arrays)
> keeps growing until the JVM dies of thirst.
> To check whether your Solr environment is suffering from this, turn
> on INFO level logging and look for: 'PERFORMANCE WARNING:
> Overlapping onDeckSearchers=x'.
>
> In tests, we've only ever seen this problem when using faceting with
> facet.method=fc.
>
> Some solutions to this are:
> - reduce the commit rate, to allow searchers to fully warm before
>   the next commit
> - reduce or eliminate the autowarming in caches
> - both of the above
>
> The trouble is, if you're doing NRT commits you likely have a good
> reason for it, and reducing/eliminating autowarming will very
> significantly hurt search performance in high commit rate
> environments.
>
> Solution:
> Here are some setup steps we've used that allow lots of faceting (we
> typically search with at least 20-35 different facet fields, plus
> date faceting/sorting) on large indexes while keeping decent search
> performance:
>
> 1. First, consider using the enum method for facet searches
> (facet.method=enum) unless you have a lot of memory on your machine.
> In our tests, this method uses much less memory and autowarms more
> quickly than fc. (Note: I've not tried the new segment-based 'fcs'
> option, as I can't find support for it in branch_3x - it looks nice
> for 4.x though.)
> Admittedly, for our data enum is not quite as fast for searching as
> fc, but short of purchasing a Taiwanese RAM factory it's a
> worthwhile tradeoff.
> If you do have access to lots of memory, AND you can guarantee that
> the index won't grow beyond the memory capacity (i.e. you have some
> sort of deletion policy in place), fc can be a lot faster than enum
> when searching with lots of facets across many terms.
>
> 2. Second, we've found that LRUCache is faster at autowarming than
> FastLRUCache - in our tests, about 20% faster. Maybe this is just
> our environment - your mileage may vary.
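(An aside on the 'Overlapping onDeckSearchers' warning above: solrconfig.xml also lets you cap how many searchers may warm concurrently, so runaway commits fail fast rather than stacking up warming searchers. A sketch - 2 is the usual stock value, tune to taste:

  <!-- commits that would exceed this cap fail with an error instead
       of queueing yet another warming searcher -->
  <maxWarmingSearchers>2</maxWarmingSearchers>

)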
> So, our filterCache section in solrconfig.xml looks like this:
>
>   <filterCache
>     class="solr.LRUCache"
>     size="3600"
>     initialSize="1400"
>     autowarmCount="3600"/>
>
> For a 28GB index running in a quad-core x64 VMware instance with 30
> warmed facet fields, Solr runs at ~4GB, and the filterCache size
> stat usually sits in the region of ~2400.
>
> 3. It's also a good idea to have some firstSearcher/newSearcher
> event listener queries so new data can populate the caches.
> Of course, what you put in these depends on the facets you need/use.
> We've found a good combination is a firstSearcher with as many
> facets in the search as your environment can handle, then a subset
> of the most common facets for the newSearcher.
>
> 4. We also set:
>   <useColdSearcher>true</useColdSearcher>
> just in case.
>
> 5. Another key to search performance with high commit rates is to
> use two Solr instances - one for the high commit rate indexing, and
> one for searching.
> The read-only searching instance can be a remote replica, or a local
> read-only instance that reads the same core as the indexing instance
> (for the latter, you'll need something that periodically refreshes -
> i.e. runs commit()).
> This way, you can tune the indexing instance for write performance
> and the searching instance, as above, for max read performance.
>
> Using the setup above, we get fantastic search speed for small facet
> sets (well under 1 sec), and really good searching for large facet
> sets (a couple of secs, depending on index size, number of facets,
> unique terms, etc.), even when searching against largeish indexes
> (>20 million docs).
> We have yet to see any OOM or GC errors using the techniques above,
> even in low memory conditions.
>
> I hope there are people who find this useful. I know I've spent a
> lot of time looking for stuff like this, so hopefully this will save
> someone some time.
>
>
> Peter
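For anyone wanting a concrete starting point for the listeners in point 3, the stock solrconfig.xml syntax looks something like the sketch below - the field names are purely illustrative, and you'd list your own facets:

  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">*:*</str>
        <str name="facet">true</str>
        <!-- warm as many facet fields as the environment can handle -->
        <str name="facet.field">field_a</str>
        <str name="facet.field">field_b</str>
      </lst>
    </arr>
  </listener>
  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">*:*</str>
        <str name="facet">true</str>
        <!-- only the most common facets here, so autowarming stays
             shorter than the commit interval -->
        <str name="facet.field">field_a</str>
      </lst>
    </arr>
  </listener>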