Hi Dennis,

These are the Lucene file segments that hold the index data on the
file system. Have a look at:
http://wiki.apache.org/solr/SolrPerformanceFactors
Peter

On Mon, Sep 13, 2010 at 7:02 AM, Dennis Gearon <gear...@sbcglobal.net> wrote:
> BTW, what is a segment?
>
> I've only heard about them in the last 2 weeks here on the list.
>
> Dennis Gearon
>
> Signature Warning
> ----------------
> EARTH has a Right To Life,
> otherwise we all die.
>
> Read 'Hot, Flat, and Crowded'
> Laugh at http://www.yert.com/film.php
>
>
> --- On Sun, 9/12/10, Jason Rutherglen <jason.rutherg...@gmail.com> wrote:
>
>> From: Jason Rutherglen <jason.rutherg...@gmail.com>
>> Subject: Re: Tuning Solr caches with high commit rates (NRT)
>> To: solr-user@lucene.apache.org
>> Date: Sunday, September 12, 2010, 7:52 PM
>>
>> Yeah, there's no patch... I think Yonik can write it. :-) Yah... The
>> Lucene version shouldn't matter. The distributed faceting can, in
>> theory, easily be applied to multiple segments; however, the way
>> it's written is, for me, a challenge to untangle and apply
>> successfully to a working patch. Also, I don't have this as an itch
>> to scratch at the moment.
>>
>> On Sun, Sep 12, 2010 at 7:18 PM, Peter Sturge <peter.stu...@gmail.com> wrote:
>>> Hi Jason,
>>>
>>> I've tried some limited testing with the 4.x trunk using fcs, and I
>>> must say, I really like the idea of per-segment faceting.
>>> I was hoping to see it in 3.x, but I don't see this option in the
>>> branch_3x trunk. Is your SOLR-1606 patch referred to in SOLR-1617
>>> the one to use with 3.1?
>>> There seem to be a number of Solr issues tied to this - one of them
>>> being LUCENE-1785. Can the per-segment faceting patch work with
>>> Lucene 2.9/branch_3x?
>>>
>>> Thanks,
>>> Peter
>>>
>>>
>>> On Mon, Sep 13, 2010 at 12:05 AM, Jason Rutherglen
>>> <jason.rutherg...@gmail.com> wrote:
>>>> Peter,
>>>>
>>>> Are you using per-segment faceting, eg, SOLR-1617? That could help
>>>> your situation.
>>>> On Sun, Sep 12, 2010 at 12:26 PM, Peter Sturge <peter.stu...@gmail.com> wrote:
>>>>> Hi,
>>>>>
>>>>> Below are some notes regarding Solr cache tuning that should prove
>>>>> useful for anyone who uses Solr with frequent commits (e.g. <5min).
>>>>>
>>>>> Environment:
>>>>> Solr 1.4.1 or branch_3x trunk.
>>>>> Note the 4.x trunk has lots of neat new features, so the notes
>>>>> here are likely less relevant to the 4.x environment.
>>>>>
>>>>> Overview:
>>>>> Our Solr environment makes extensive use of faceting, we perform
>>>>> commits every 30secs, and the indexes tend to be on the large-ish
>>>>> side (>20 million docs).
>>>>> Note: For our data, when we commit, we are always adding new data,
>>>>> never changing existing data.
>>>>> This type of environment can be tricky to tune, as Solr is more
>>>>> geared toward fast reads than frequent writes.
>>>>>
>>>>> Symptoms:
>>>>> If anyone has used faceting in searches where you are also
>>>>> performing frequent commits, you've likely encountered the dreaded
>>>>> OutOfMemory or GC Overhead Exceeded errors.
>>>>> In high commit rate environments, this is almost always due to
>>>>> multiple 'onDeck' searchers and autowarming - i.e. new searchers
>>>>> don't finish autowarming their caches before the next commit()
>>>>> comes along and invalidates them.
>>>>> Once this starts happening on a regular basis, it is likely your
>>>>> Solr's JVM will eventually run out of memory, as the number of
>>>>> searchers (and their cache arrays) will keep growing until the JVM
>>>>> dies of thirst.
>>>>> To check if your Solr environment is suffering from this, turn on
>>>>> INFO level logging, and look for: 'PERFORMANCE WARNING:
>>>>> Overlapping onDeckSearchers=x'.
>>>>>
>>>>> In tests, we've only ever seen this problem when using faceting,
>>>>> and facet.method=fc.
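[Editor's note: one guard against overlapping searchers piling up, not
mentioned in the original post, is the maxWarmingSearchers setting in
solrconfig.xml. A minimal sketch; the value shown is illustrative:]

```xml
<!-- solrconfig.xml: cap the number of searchers that may warm
     concurrently. Once the cap is hit, further commits fail fast
     instead of stacking up warming searchers until the JVM runs
     out of memory. The value 2 is illustrative, not a recommendation
     from the original post. -->
<maxWarmingSearchers>2</maxWarmingSearchers>
```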
>>>>>
>>>>> Some solutions to this are:
>>>>> - Reduce the commit rate to allow searchers to fully warm before
>>>>>   the next commit
>>>>> - Reduce or eliminate the autowarming in caches
>>>>> - Both of the above
>>>>>
>>>>> The trouble is, if you're doing NRT commits, you likely have a
>>>>> good reason for it, and reducing/eliminating autowarming will very
>>>>> significantly impact search performance in high commit rate
>>>>> environments.
>>>>>
>>>>> Solution:
>>>>> Here are some setup steps we've used that allow lots of faceting
>>>>> (we typically search with at least 20-35 different facet fields,
>>>>> and date faceting/sorting) on large indexes, and still keep decent
>>>>> search performance:
>>>>>
>>>>> 1. Firstly, you should consider using the enum method for facet
>>>>> searches (facet.method=enum) unless you've got A LOT of memory on
>>>>> your machine. In our tests, this method uses a lot less memory and
>>>>> autowarms more quickly than fc. (Note, I've not tried the new
>>>>> segment-based 'fcs' option, as I can't find support for it in
>>>>> branch_3x - looks nice for 4.x though.)
>>>>> Admittedly, for our data, enum is not quite as fast for searching
>>>>> as fc, but short of purchasing a Taiwanese RAM factory, it's a
>>>>> worthwhile tradeoff.
>>>>> If you do have access to LOTS of memory, AND you can guarantee
>>>>> that the index won't grow beyond the memory capacity (i.e. you
>>>>> have some sort of deletion policy in place), fc can be a lot
>>>>> faster than enum when searching with lots of facets across many
>>>>> terms.
>>>>>
>>>>> 2. Secondly, we've found that LRUCache is faster at autowarming
>>>>> than FastLRUCache - in our tests, about 20% faster. Maybe this is
>>>>> just our environment - your mileage may vary.
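[Editor's note: one way to apply facet.method=enum everywhere, rather
than per request, is to set it as a default on the search handler in
solrconfig.xml. A sketch under assumptions - the handler name and
facet field are illustrative, not from the original post:]

```xml
<!-- solrconfig.xml: make enum the default facet method so every
     request through this handler uses it without having to pass
     facet.method on the URL. "/search" and "category" are
     hypothetical names for illustration. -->
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="facet">true</str>
    <str name="facet.method">enum</str>
    <str name="facet.field">category</str>
  </lst>
</requestHandler>
```

Per-request parameters still override these defaults, so individual
queries can switch back to fc where memory allows.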
>>>>>
>>>>> So, our filterCache section in solrconfig.xml looks like this:
>>>>>
>>>>>   <filterCache
>>>>>     class="solr.LRUCache"
>>>>>     size="3600"
>>>>>     initialSize="1400"
>>>>>     autowarmCount="3600"/>
>>>>>
>>>>> For a 28GB index, running in a quad-core x64 VMware instance, with
>>>>> 30 warmed facet fields, Solr runs at ~4GB. The filterCache size
>>>>> shown in the stats page is usually in the region of ~2400.
>>>>>
>>>>> 3. It's also a good idea to have firstSearcher/newSearcher event
>>>>> listener queries to allow new data to populate the caches.
>>>>> Of course, what you put in these depends on the facets you
>>>>> need/use.
>>>>> We've found a good combination is a firstSearcher with as many
>>>>> facets in the search as your environment can handle, then a
>>>>> subset of the most common facets for the newSearcher.
>>>>>
>>>>> 4. We also set:
>>>>>   <useColdSearcher>true</useColdSearcher>
>>>>> just in case.
>>>>>
>>>>> 5. Another key area for search performance with high commits is
>>>>> to use 2 Solr instances - one for the high commit rate indexing,
>>>>> and one for searching.
>>>>> The read-only searching instance can be a remote replica, or a
>>>>> local read-only instance that reads the same core as the indexing
>>>>> instance (for the latter, you'll need something that periodically
>>>>> refreshes - i.e. runs commit()).
>>>>> This way, you can tune the indexing instance for writing
>>>>> performance and the searching instance as above for max read
>>>>> performance.
>>>>>
>>>>> Using the setup above, we get fantastic searching speed for small
>>>>> facet sets (well under 1 sec), and really good searching for large
>>>>> facet sets (a couple of secs depending on index size, number of
>>>>> facets, unique terms etc.),
>>>>> even when searching against largeish indexes (>20 million docs).
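[Editor's note: the firstSearcher/newSearcher listeners in step 3 are
configured in solrconfig.xml with solr.QuerySenderListener. A minimal
sketch; the query and facet field names are illustrative, not from the
original post:]

```xml
<!-- solrconfig.xml: warm caches whenever a searcher opens.
     firstSearcher fires on startup (no old caches to copy from), so
     it carries the heavier facet set; newSearcher fires on each
     commit, so it warms only the most common facets.
     "category" and "status" are hypothetical field names. -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="facet">true</str>
      <str name="facet.method">enum</str>
      <str name="facet.field">category</str>
      <str name="facet.field">status</str>
    </lst>
  </arr>
</listener>
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="facet">true</str>
      <str name="facet.field">category</str>
    </lst>
  </arr>
</listener>
```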
>>>>> We have yet to see any OOM or GC errors using the techniques
>>>>> above, even in low memory conditions.
>>>>>
>>>>> I hope there are people who find this useful. I know I've spent a
>>>>> lot of time looking for stuff like this, so hopefully this will
>>>>> save someone some time.
>>>>>
>>>>>
>>>>> Peter