Re: Tuning Solr caches with high commit rates (NRT)

Peter Sturge Mon, 13 Sep 2010 01:09:57 -0700

1. You can run multiple Solr instances in separate JVMs, with both
having their solr.xml configured to use the same index folder.
You need to be careful that one and only one of these instances ever
updates the index at a time. The best way to ensure this is to use
one instance for writing only, while the other is read-only and never
writes to the index. This
read-only instance is the one to use for tuning for high search
performance. Even though the RO instance doesn't write to the index,
it still needs periodic (albeit empty) commits to kick off
autowarming/cache refresh.
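
As a rough sketch of the periodic empty commit, something like the Python below could run alongside the read-only instance. The URL and port (localhost:8984) and the 30 second interval are assumptions for illustration only, and the function names are made up; the only Solr-specific part is POSTing a <commit/> message to the update handler.

```python
import time
import urllib.request

# Hypothetical URL of the read-only instance's update handler (an assumption).
RO_UPDATE_URL = "http://localhost:8984/solr/update"


def build_commit_request(update_url=RO_UPDATE_URL):
    """Build an HTTP POST that carries an empty <commit/> message."""
    return urllib.request.Request(
        update_url,
        data=b"<commit/>",
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def refresh_forever(interval_secs=30, update_url=RO_UPDATE_URL):
    """Send an empty commit every interval_secs so a new searcher is opened."""
    while True:
        try:
            request = build_commit_request(update_url)
            with urllib.request.urlopen(request, timeout=10) as response:
                response.read()
        except OSError as exc:  # urllib's URLError is a subclass of OSError
            print("commit failed:", exc)
        time.sleep(interval_secs)
```

Calling refresh_forever() starts the loop. The interval should stay longer than the time the read-only instance needs to autowarm, otherwise the refresh itself can cause overlapping searchers.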


Depending on your needs, you might not need to have 2 separate
instances. We need it because the 'write' instance is also doing a lot
of metadata pre-write operations in the same JVM as Solr, and so has
its own memory requirements.

2. We use sharding all the time, and it works just fine with this
scenario, as the RO instance is simply another shard in the pack.
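
For illustration, a distributed query that includes the RO instance as one of the shards might be built like this. The host names and ports are placeholders, not our actual setup, and the helper name is invented; the 'shards' request parameter takes a comma-separated list of host:port/path entries.

```python
from urllib.parse import urlencode


def build_sharded_query_url(base_url, shards, query="*:*"):
    """Return a /select URL that fans the query out to the given shards.

    base_url -- instance that receives the request, e.g. http://host:8983/solr
    shards   -- list of host:port/path entries (no http:// prefix)
    """
    params = [("q", query), ("shards", ",".join(shards))]
    return base_url + "/select?" + urlencode(params)


# Hypothetical layout: two index shards plus the read-only instance.
example_url = build_sharded_query_url(
    "http://localhost:8984/solr",
    ["shard1.example.com:8983/solr", "shard2.example.com:8983/solr", "localhost:8984/solr"],
)
```

The instance that receives the request merges the results from every shard in the list, so the RO instance needs no special treatment beyond being listed.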


On Sun, Sep 12, 2010 at 8:46 PM, Peter Karich <peat...@yahoo.de> wrote:
> Peter,
>
> thanks a lot for your in-depth explanations!
> Your findings will be definitely helpful for my next performance
> improvement tests :-)
>
> Two questions:
>
> 1. How would I do that:
>
>> or a local read-only instance that reads the same core as the indexing
>> instance (for the latter, you'll need something that periodically refreshes 
>> - i.e. runs commit()).
>
>
> 2. Did you try sharding with your current setup (e.g. one big,
> nearly-static index and a tiny write+read index)?
>
> Regards,
> Peter.
>
>> Hi,
>>
>> Below are some notes regarding Solr cache tuning that should prove
>> useful for anyone who uses Solr with frequent commits (e.g. <5min).
>>
>> Environment:
>> Solr 1.4.1 or branch_3x trunk.
>> Note the 4.x trunk has lots of neat new features, so the notes here
>> are likely less relevant to the 4.x environment.
>>
>> Overview:
>> Our Solr environment makes extensive use of faceting, we perform
>> commits every 30secs, and the indexes tend to be on the large-ish side
>> (>20million docs).
>> Note: For our data, when we commit, we are always adding new data,
>> never changing existing data.
>> This type of environment can be tricky to tune, as Solr is more geared
>> toward fast reads than frequent writes.
>>
>> Symptoms:
>> If anyone has used faceting in searches where you are also performing
>> frequent commits, you've likely encountered the dreaded OutOfMemoryError or
>> 'GC overhead limit exceeded' errors.
>> In high commit rate environments, this is almost always due to
>> multiple 'onDeck' searchers and autowarming - i.e. new searchers don't
>> finish autowarming their caches before the next commit()
>> comes along and invalidates them.
>> Once this starts happening on a regular basis, it is likely your
>> Solr's JVM will run out of memory eventually, as the number of
>> searchers (and their cache arrays) will keep growing until the JVM
>> dies of thirst.
>> To check if your Solr environment is suffering from this, turn on INFO
>> level logging, and look for: 'PERFORMANCE WARNING: Overlapping
>> onDeckSearchers=x'.
>>
>> In tests, we've only ever seen this problem when using faceting, and
>> facet.method=fc.
>>
>> Some solutions to this are:
>>     Reduce the commit rate to allow searchers to fully warm before the
>> next commit
>>     Reduce or eliminate the autowarming in caches
>>     Both of the above
>>
>> The trouble is, if you're doing NRT commits, you likely have a good
>> reason for it, and reducing/eliminating autowarming will very
>> significantly impact search performance in high commit rate
>> environments.
>>
>> Solution:
>> Here are some setup steps we've used that allow lots of faceting (we
>> typically search with at least 20-35 different facet fields, and date
>> faceting/sorting) on large indexes, and still keep decent search
>> performance:
>>
>> 1. Firstly, you should consider using the enum method for facet
>> searches (facet.method=enum) unless you've got A LOT of memory on your
>> machine. In our tests, this method uses a lot less memory and
>> autowarms more quickly than fc. (Note, I've not tried the new
>> segment-based 'fcs' option, as I can't find support for it in
>> branch_3x - looks nice for 4.x though.)
>> Admittedly, for our data, enum is not quite as fast for searching as
>> fc, but short of purchasing a Taiwanese RAM factory, it's a worthwhile
>> tradeoff.
>> If you do have access to LOTS of memory, AND you can guarantee that
>> the index won't grow beyond the memory capacity (i.e. you have some
>> sort of deletion policy in place), fc can be a lot faster than enum
>> when searching with lots of facets across many terms.
>>
>> 2. Secondly, we've found that LRUCache is faster at autowarming than
>> FastLRUCache - in our tests, about 20% faster. Maybe this is just our
>> environment - your mileage may vary.
>>
>> So, our filterCache section in solrconfig.xml looks like this:
>>     <filterCache
>>       class="solr.LRUCache"
>>       size="3600"
>>       initialSize="1400"
>>       autowarmCount="3600"/>
>>
>> For a 28GB index, running in a quad-core x64 VMware instance with 30
>> warmed facet fields, Solr uses ~4GB of memory. The filterCache size
>> shown in the stats is usually in the region of ~2400.
>>
>> 3. It's also a good idea to have some sort of
>> firstSearcher/newSearcher event listener queries to allow new data to
>> populate the caches.
>> Of course, what you put in these is dependent on the facets you need/use.
>> We've found a good combination is a firstSearcher with as many facets
>> in the search as your environment can handle, then a subset of the
>> most common facets for the newSearcher.
>>
>> 4. We also set:
>>    <useColdSearcher>true</useColdSearcher>
>> just in case.
>>
>> 5. Another key area for search performance with high commits is to use
>> 2 Solr instances - one for the high commit rate indexing, and one for
>> searching.
>> The read-only searching instance can be a remote replica, or a local
>> read-only instance that reads the same core as the indexing instance
>> (for the latter, you'll need something that periodically refreshes -
>> i.e. runs commit()).
>> This way, you can tune the indexing instance for writing performance
>> and the searching instance as above for max read performance.
>>
>> Using the setup above, we get fantastic searching speed for small
>> facet sets (well under 1sec), and really good searching for large
>> facet sets (a couple of secs depending on index size, number of
>> facets, unique terms etc. etc.),
>> even when searching against large-ish indexes (>20million docs).
>> We have yet to see any OOM or GC errors using the techniques above,
>> even in low memory conditions.
>>
>> I hope there are people that find this useful. I know I've spent a lot
>> of time looking for stuff like this, so hopefully, this will save
>> someone some time.
>>
>>
>> Peter
>>
>
>

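For the log check described in the quoted notes, a small sketch like the one below could count how often the overlapping-searcher warning appears. It assumes only the message text quoted there; the function name is made up, and the rest of each log line (timestamp, logger name) is ignored because it depends on the logging setup.

```python
import re

# Matches the warning text quoted in the notes and captures the searcher count.
OVERLAP_RE = re.compile(r"PERFORMANCE WARNING: Overlapping onDeckSearchers=(\d+)")


def overlapping_searcher_counts(log_lines):
    """Return the onDeckSearchers value from every matching log line."""
    counts = []
    for line in log_lines:
        match = OVERLAP_RE.search(line)
        if match:
            counts.append(int(match.group(1)))
    return counts
```

Passing an open log file (it iterates line by line) gives a list of counts; a list that keeps growing, or values that climb over time, suggests commits are arriving faster than autowarming can finish.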