BTW, what is NRT?

Dennis Gearon

Signature Warning
----------------
EARTH has a Right To Life, otherwise we all die.
Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Fri, 9/17/10, Peter Sturge <peter.stu...@gmail.com> wrote:

From: Peter Sturge <peter.stu...@gmail.com>
Subject: Re: Tuning Solr caches with high commit rates (NRT)
To: solr-user@lucene.apache.org
Date: Friday, September 17, 2010, 2:18 AM

Hi,

It's great to see such a fantastic response to this thread - NRT is
alive and well!

I'm hoping to collate this information and add it to the wiki when I
get a few free cycles (thanks Erik for the heads up).

In the meantime, I thought I'd add a few tidbits of additional
information that might prove useful:

1. The first thing to note is that the techniques/setup described in
this thread don't fix the underlying potential for OutOfMemory errors -
an index can always grow large enough to ask its JVM for more memory
than is available for cache. These techniques do, however, mitigate the
risk, and provide an efficient balance between memory use and search
performance. There are some interesting discussions going on for both
Lucene and Solr regarding the '2 pounds of baloney into a 1 pound bag'
problem of unbounded caches, with a number of interesting strategies.
One strategy that I like, but haven't found in the discussion lists, is
auto-limiting cache size/warming based on available resources (similar
to the way file system caches use free memory). This would allow caches
to adjust to their memory environment as indexes grow.

2. A note regarding lockType in solrconfig.xml for dual Solr instances:
it's best not to use 'none' as a value for lockType - this sets the
lockType to null, and as the source comments note, this is a recipe for
disaster - so use 'simple' instead (see the config sketch after this
list).

3. Chris mentioned setting maxWarmingSearchers to 1 as a way of
minimizing the number of onDeckSearchers. This is a prudent move -
thanks Chris for bringing this up!
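For reference, the relevant solrconfig.xml entries look something like
this (a minimal sketch based on the stock 1.4/3.x config layout -
adjust to your own file):

    <!-- in the <mainIndex> (and <indexDefaults>) section: -->
    <lockType>simple</lockType>

    <!-- in the <query> section, per Chris's suggestion: -->
    <maxWarmingSearchers>1</maxWarmingSearchers>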
All the best,
Peter

On Tue, Sep 14, 2010 at 2:00 PM, Peter Karich <peat...@yahoo.de> wrote:

Peter Sturge,

this was a nice hint, thanks again! If you are here in Germany anytime,
I can invite you to a beer or an apfelschorle! :-)
I only needed to change the lockType to none in the solrconfig.xml,
disable the replication, and set the data dir to the master data dir!

Regards,
Peter Karich.

Earlier in the thread, Peter Karich had written:

Hi Peter,

this scenario would be really great for us - I didn't know that this is
possible and works, so: thanks!
At the moment we are doing something similar by replicating to the
read-only instance, but the replication is somewhat lengthy and
resource-intensive at this data volume ;-)

Regards,
Peter.

That was in reply to this note from Peter Sturge:

1. You can run multiple Solr instances in separate JVMs, with both
having their solr.xml configured to use the same index folder.
You need to be careful that one and only one of these instances will
ever update the index at a time. The best way to ensure this is to use
one instance for writing only, while the other is read-only and never
writes to the index. This read-only instance is the one to tune for
high search performance. Even though the RO instance doesn't write to
the index, it still needs periodic (albeit empty) commits to kick off
autowarming/cache refresh.
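As a rough sketch of that refresh (the host/port below are assumptions -
a default single-core setup on port 8983): an empty commit can be as
simple as POSTing the following to the read-only instance's update
handler, e.g. from a cron job:

    <!-- POST to http://localhost:8983/solr/update
         with Content-Type: text/xml -->
    <commit/>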
Depending on your needs, you might not need to have 2 separate
instances. We need it because the 'write' instance is also doing a lot
of metadata pre-write operations in the same JVM as Solr, and so has
its own memory requirements.

2. We use sharding all the time, and it works just fine with this
scenario, as the RO instance is simply another shard in the pack.

On Sun, Sep 12, 2010 at 8:46 PM, Peter Karich <peat...@yahoo.de> wrote:

Peter,

thanks a lot for your in-depth explanations!
Your findings will definitely be helpful for my next performance
improvement tests :-)

Two questions:

1. How would I do that:

> or a local read-only instance that reads the same core as the
> indexing instance (for the latter, you'll need something that
> periodically refreshes - i.e. runs commit()).

2. Did you try sharding with your current setup (e.g. one big,
nearly-static index and a tiny write+read index)?

Regards,
Peter.

Those questions were in response to Peter Sturge's original notes:

Hi,

Below are some notes regarding Solr cache tuning that should prove
useful for anyone who uses Solr with frequent commits (e.g. <5min).

Environment:
Solr 1.4.1 or branch_3x trunk.
Note: the 4.x trunk has lots of neat new features, so the notes here
are likely less relevant to the 4.x environment.

Overview:
Our Solr environment makes extensive use of faceting, we perform
commits every 30secs, and the indexes tend to be on the large-ish side
(>20million docs).
Note: for our data, when we commit we are always adding new data,
never changing existing data.
This type of environment can be tricky to tune, as Solr is more geared
toward fast reads than frequent writes.

Symptoms:
If you have used faceting in searches where you are also performing
frequent commits, you've likely encountered the dreaded OutOfMemory or
GC Overhead Limit Exceeded errors.
In high commit rate environments, this is almost always due to
multiple 'onDeck' searchers and autowarming - i.e. new searchers don't
finish autowarming their caches before the next commit() comes along
and invalidates them.
Once this starts happening on a regular basis, it is likely your
Solr's JVM will eventually run out of memory, as the number of
searchers (and their cache arrays) will keep growing until the JVM
dies of thirst.
To check if your Solr environment is suffering from this, turn on INFO
level logging and look for: 'PERFORMANCE WARNING: Overlapping
onDeckSearchers=x'.

In tests, we've only ever seen this problem when using faceting with
facet.method=fc.

Some solutions to this are:
    Reduce the commit rate to allow searchers to fully warm before the
    next commit
    Reduce or eliminate the autowarming in caches
    Both of the above

The trouble is, if you're doing NRT commits, you likely have a good
reason for it, and reducing/eliminating autowarming will very
significantly impact search performance in high commit rate
environments.

Solution:
Here are some setup steps we've used that allow lots of faceting (we
typically search with at least 20-35 different facet fields, plus date
faceting/sorting) on large indexes, and still keep decent search
performance:

1. Firstly, you should consider using the enum method for facet
searches (facet.method=enum) unless you've got A LOT of memory on your
machine. In our tests, this method uses a lot less memory and
autowarms more quickly than fc. (Note: I've not tried the new
segment-based 'fcs' option, as I can't find support for it in
branch_3x - it looks nice for 4.x though.)
Admittedly, for our data, enum is not quite as fast for searching as
fc, but short of purchasing a Taiwanese RAM factory, it's a worthwhile
tradeoff.
If you do have access to LOTS of memory, AND you can guarantee that
the index won't grow beyond the memory capacity (i.e. you have some
sort of deletion policy in place), fc can be a lot faster than enum
when searching with lots of facets across many terms.
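If you'd rather not pass facet.method=enum on every request, one way
is to make it a default in the request handler - a minimal sketch,
assuming the stock 'standard' handler from the example solrconfig.xml:

    <requestHandler name="standard" class="solr.SearchHandler"
                    default="true">
      <lst name="defaults">
        <!-- applied unless a request overrides it -->
        <str name="facet.method">enum</str>
      </lst>
    </requestHandler>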
2. Secondly, we've found that LRUCache is faster at autowarming than
FastLRUCache - in our tests, about 20% faster. Maybe this is just our
environment - your mileage may vary.

So, our filterCache section in solrconfig.xml looks like this:

    <filterCache
      class="solr.LRUCache"
      size="3600"
      initialSize="1400"
      autowarmCount="3600"/>

For a 28GB index running in a quad-core x64 VMware instance with 30
warmed facet fields, Solr runs at ~4GB. The filterCache size in the
stats page usually shows in the region of ~2400.

3. It's also a good idea to have some firstSearcher/newSearcher event
listener queries to allow new data to populate the caches. Of course,
what you put in these depends on the facets you need/use.
We've found a good combination is a firstSearcher with as many facets
in the search as your environment can handle, then a subset of the
most common facets for the newSearcher.
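A minimal sketch of such listeners, in the <query> section of
solrconfig.xml (the facet fields here are just placeholders -
substitute your own):

    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">*:*</str>
          <str name="facet">true</str>
          <str name="facet.field">field_a</str>
          <str name="facet.field">field_b</str>
          <!-- ...as many facet fields as your environment can handle -->
        </lst>
      </arr>
    </listener>
    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <!-- a subset of the most common facets -->
          <str name="q">*:*</str>
          <str name="facet">true</str>
          <str name="facet.field">field_a</str>
        </lst>
      </arr>
    </listener>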
4. We also set:

    <useColdSearcher>true</useColdSearcher>

just in case.

5. Another key area for search performance with high commits is to use
2 Solr instances - one for the high commit rate indexing, and one for
searching.
The read-only searching instance can be a remote replica, or a local
read-only instance that reads the same core as the indexing instance
(for the latter, you'll need something that periodically refreshes -
i.e. runs commit()).
This way, you can tune the indexing instance for write performance and
the searching instance, as above, for maximum read performance.

Using the setup above, we get fantastic searching speed for small
facet sets (well under 1sec), and really good searching for large
facet sets (a couple of secs, depending on index size, number of
facets, unique terms, etc.), even when searching against large-ish
indexes (>20million docs).
We have yet to see any OOM or GC errors using the techniques above,
even in low memory conditions.

I hope there are people who find this useful. I know I've spent a lot
of time looking for stuff like this, so hopefully this will save
someone some time.

Peter