Hi Dennis,

These are the Lucene file segments that hold the index data on the file system.
Have a look at: http://wiki.apache.org/solr/SolrPerformanceFactors
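
For example, a Solr data/index directory might contain something like
this (just an illustration - exact file names and extensions vary by
Lucene version and index settings):

   segments.gen  segments_4
   _0.fdt  _0.fdx  _0.fnm  _0.frq  _0.nrm  _0.prx  _0.tii  _0.tis

Each group of files sharing a _N prefix is one segment; the segments_N
file tracks which segments make up the current index.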

Peter


On Mon, Sep 13, 2010 at 7:02 AM, Dennis Gearon <gear...@sbcglobal.net> wrote:
> BTW, what is a segment?
>
> I've only heard about them in the last 2 weeks here on the list.
> Dennis Gearon
>
> Signature Warning
> ----------------
> EARTH has a Right To Life,
>  otherwise we all die.
>
> Read 'Hot, Flat, and Crowded'
> Laugh at http://www.yert.com/film.php
>
>
> --- On Sun, 9/12/10, Jason Rutherglen <jason.rutherg...@gmail.com> wrote:
>
>> From: Jason Rutherglen <jason.rutherg...@gmail.com>
>> Subject: Re: Tuning Solr caches with high commit rates (NRT)
>> To: solr-user@lucene.apache.org
>> Date: Sunday, September 12, 2010, 7:52 PM
>> Yeah, there's no patch... I think Yonik can write it. :-)  The Lucene
>> version shouldn't matter.  The distributed faceting could in theory
>> easily be applied to multiple segments; however, the way it's written
>> makes it a challenge for me to untangle and apply successfully to a
>> working patch.  Also, I don't have this as an itch to scratch at the
>> moment.
>>
>> On Sun, Sep 12, 2010 at 7:18 PM, Peter Sturge <peter.stu...@gmail.com>
>> wrote:
>> > Hi Jason,
>> >
>> > I've tried some limited testing with the 4.x trunk using fcs, and I
>> > must say, I really like the idea of per-segment faceting.
>> > I was hoping to see it in 3.x, but I don't see this option in the
>> > branch_3x trunk. Is your SOLR-1606 patch referred to in SOLR-1617
>> > the one to use with 3.1?
>> > There seem to be a number of Solr issues tied to this - one of them
>> > being LUCENE-1785. Can the per-segment faceting patch work with
>> > Lucene 2.9/branch_3x?
>> >
>> > Thanks,
>> > Peter
>> >
>> >
>> >
>> > On Mon, Sep 13, 2010 at 12:05 AM, Jason Rutherglen
>> > <jason.rutherg...@gmail.com> wrote:
>> >> Peter,
>> >>
>> >> Are you using per-segment faceting, e.g., SOLR-1617?  That could
>> >> help your situation.
>> >>
>> >> On Sun, Sep 12, 2010 at 12:26 PM, Peter Sturge
>> >> <peter.stu...@gmail.com> wrote:
>> >>> Hi,
>> >>>
>> >>> Below are some notes regarding Solr cache tuning that should prove
>> >>> useful for anyone who uses Solr with frequent commits (e.g. <5min).
>> >>>
>> >>> Environment:
>> >>> Solr 1.4.1 or branch_3x trunk.
>> >>> Note the 4.x trunk has lots of neat new features, so the notes here
>> >>> are likely less relevant to the 4.x environment.
>> >>>
>> >>> Overview:
>> >>> Our Solr environment makes extensive use of faceting, we perform
>> >>> commits every 30 secs, and the indexes tend to be on the large-ish
>> >>> side (>20 million docs).
>> >>> Note: For our data, when we commit, we are always adding new data,
>> >>> never changing existing data.
>> >>> This type of environment can be tricky to tune, as Solr is more
>> >>> geared toward fast reads than frequent writes.
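>> >>>
>> >>> (For reference, if you let Solr itself issue the commits rather
>> >>> than the client, a 30 sec interval is configured in the
>> >>> <updateHandler> section of solrconfig.xml:
>> >>>    <autoCommit>
>> >>>      <maxTime>30000</maxTime>  <!-- commit within 30,000ms of an add -->
>> >>>    </autoCommit>
>> >>> )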
>> >>>
>> >>> Symptoms:
>> >>> If you've used faceting in searches where you are also performing
>> >>> frequent commits, you've likely encountered the dreaded OutOfMemory
>> >>> or GC Overhead Exceeded errors.
>> >>> In high commit rate environments, this is almost always due to
>> >>> multiple 'onDeck' searchers and autowarming - i.e. new searchers
>> >>> don't finish autowarming their caches before the next commit()
>> >>> comes along and invalidates them.
>> >>> Once this starts happening on a regular basis, your Solr JVM will
>> >>> likely run out of memory eventually, as the number of searchers
>> >>> (and their cache arrays) will keep growing until the JVM dies of
>> >>> thirst.
>> >>> To check whether your Solr environment is suffering from this, turn
>> >>> on INFO level logging and look for: 'PERFORMANCE WARNING:
>> >>> Overlapping onDeckSearchers=x'.
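>> >>>
>> >>> (Related setting, for illustration: maxWarmingSearchers in
>> >>> solrconfig.xml caps how many searchers may warm concurrently -
>> >>> commits beyond the cap fail rather than stacking up searchers:
>> >>>    <maxWarmingSearchers>2</maxWarmingSearchers>
>> >>> It doesn't reduce the warming cost itself, but it stops the
>> >>> searcher count growing without bound.)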
>> >>>
>> >>> In tests, we've only ever seen this problem when using faceting
>> >>> with facet.method=fc.
>> >>>
>> >>> Some solutions to this are:
>> >>>    - Reduce the commit rate to allow searchers to fully warm
>> >>>      before the next commit
>> >>>    - Reduce or eliminate the autowarming in caches (see the
>> >>>      snippet just below)
>> >>>    - Both of the above
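>> >>>
>> >>> As a minimal sketch (the sizes here are placeholders, not
>> >>> recommendations), switching off filterCache autowarming in
>> >>> solrconfig.xml is just:
>> >>>    <filterCache
>> >>>      class="solr.LRUCache"
>> >>>      size="512"
>> >>>      initialSize="512"
>> >>>      autowarmCount="0"/>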
>> >>>
>> >>> The trouble is, if you're doing NRT commits, you likely have a
>> >>> good reason for it, and reducing/eliminating autowarming will very
>> >>> significantly impact search performance in high commit rate
>> >>> environments.
>> >>>
>> >>> Solution:
>> >>> Here are some setup steps we've used that allow lots of faceting
>> >>> (we typically search with at least 20-35 different facet fields,
>> >>> plus date faceting/sorting) on large indexes, and still keep decent
>> >>> search performance:
>> >>>
>> >>> 1. Firstly, you should consider using the enum method for facet
>> >>> searches (facet.method=enum) unless you've got A LOT of memory on
>> >>> your machine. In our tests, this method uses a lot less memory and
>> >>> autowarms more quickly than fc. (Note: I've not tried the new
>> >>> segment-based 'fcs' option, as I can't find support for it in
>> >>> branch_3x - it looks nice for 4.x though.)
>> >>> Admittedly, for our data, enum is not quite as fast for searching
>> >>> as fc, but short of purchasing a Taiwanese RAM factory, it's a
>> >>> worthwhile tradeoff.
>> >>> If you do have access to LOTS of memory, AND you can guarantee that
>> >>> the index won't grow beyond the memory capacity (i.e. you have some
>> >>> sort of deletion policy in place), fc can be a lot faster than enum
>> >>> when searching with lots of facets across many terms.
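>> >>>
>> >>> For illustration (the field name 'host' is just an example), enum
>> >>> can be set per-request:
>> >>>    /select?q=*:*&facet=true&facet.field=host&facet.method=enum
>> >>> or per-field, if you want to mix methods across fields:
>> >>>    /select?q=*:*&facet=true&facet.field=host&f.host.facet.method=enum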
>> >>>
>> >>> 2. Secondly, we've found that LRUCache is faster at autowarming
>> >>> than FastLRUCache - in our tests, about 20% faster. Maybe this is
>> >>> just our environment - your mileage may vary.
>> >>>
>> >>> So, our filterCache section in solrconfig.xml looks like this:
>> >>>    <filterCache
>> >>>      class="solr.LRUCache"
>> >>>      size="3600"
>> >>>      initialSize="1400"
>> >>>      autowarmCount="3600"/>
>> >>>
>> >>> For a 28GB index, running in a quad-core x64 VMware instance with
>> >>> 30 warmed facet fields, Solr runs at ~4GB. The filterCache size
>> >>> stat is usually in the region of ~2400.
>> >>>
>> >>> 3. It's also a good idea to have some firstSearcher/newSearcher
>> >>> event listener queries to allow new data to populate the caches.
>> >>> Of course, what you put in these depends on the facets you
>> >>> need/use.
>> >>> We've found a good combination is a firstSearcher with as many
>> >>> facets in the search as your environment can handle, then a subset
>> >>> of the most common facets for the newSearcher.
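>> >>>
>> >>> As a rough sketch of what this looks like in the <query> section
>> >>> of solrconfig.xml (the facet fields here are placeholders - use
>> >>> your own):
>> >>>    <listener event="firstSearcher" class="solr.QuerySenderListener">
>> >>>      <arr name="queries">
>> >>>        <lst>
>> >>>          <str name="q">*:*</str>
>> >>>          <str name="facet">true</str>
>> >>>          <str name="facet.field">host</str>
>> >>>          <str name="facet.field">user</str>
>> >>>          <str name="facet.field">type</str>
>> >>>        </lst>
>> >>>      </arr>
>> >>>    </listener>
>> >>>    <listener event="newSearcher" class="solr.QuerySenderListener">
>> >>>      <arr name="queries">
>> >>>        <lst>
>> >>>          <str name="q">*:*</str>
>> >>>          <str name="facet">true</str>
>> >>>          <str name="facet.field">host</str>
>> >>>        </lst>
>> >>>      </arr>
>> >>>    </listener>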
>> >>>
>> >>> 4. We also set:
>> >>>    <useColdSearcher>true</useColdSearcher>
>> >>> just in case.
>> >>>
>> >>> 5. Another key to search performance with high commit rates is to
>> >>> use 2 Solr instances - one for the high commit rate indexing, and
>> >>> one for searching.
>> >>> The read-only searching instance can be a remote replica, or a
>> >>> local read-only instance that reads the same core as the indexing
>> >>> instance (for the latter, you'll need something that periodically
>> >>> refreshes - i.e. runs commit()).
>> >>> This way, you can tune the indexing instance for writing
>> >>> performance and the searching instance, as above, for max read
>> >>> performance.
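>> >>>
>> >>> For the remote replica route, a minimal Solr 1.4 replication setup
>> >>> (host, port and poll interval are examples only) would look
>> >>> something like:
>> >>>    <!-- indexing (master) instance -->
>> >>>    <requestHandler name="/replication" class="solr.ReplicationHandler">
>> >>>      <lst name="master">
>> >>>        <str name="replicateAfter">commit</str>
>> >>>      </lst>
>> >>>    </requestHandler>
>> >>>    <!-- searching (slave) instance -->
>> >>>    <requestHandler name="/replication" class="solr.ReplicationHandler">
>> >>>      <lst name="slave">
>> >>>        <str name="masterUrl">http://indexhost:8983/solr/replication</str>
>> >>>        <str name="pollInterval">00:00:30</str>
>> >>>      </lst>
>> >>>    </requestHandler>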
>> >>>
>> >>> Using the setup above, we get fantastic search speed for small
>> >>> facet sets (well under 1 sec), and really good searching for large
>> >>> facet sets (a couple of secs, depending on index size, number of
>> >>> facets, unique terms, etc.), even when searching against large-ish
>> >>> indexes (>20 million docs).
>> >>> We have yet to see any OOM or GC errors using the techniques above,
>> >>> even in low memory conditions.
>> >>>
>> >>> I hope people find this useful. I know I've spent a lot of time
>> >>> looking for stuff like this, so hopefully this will save someone
>> >>> some time.
>> >>>
>> >>>
>> >>> Peter
>> >>>
>> >>
>> >
>>
>
