This all applies to having more than one processor though - if you have one processor, then non-concurrent can also make sense.
But especially with the young space, you want concurrency - with up to 98% of objects being short-lived, and multiple threads generally creating new objects, it's a huge boon to collect the young space concurrently.

Mark Miller wrote:
> Walter Underwood wrote:
>
>> For batch-oriented computing, like Hadoop, the most efficient GC is
>> probably a non-concurrent, non-generational GC.
>
> Okay - for batch we somewhat agree I guess - if you can stand any length
> of pausing, non-concurrent can be nice, because you don't pay for thread
> sync communication. Only with a small heap size though (less than 100MB
> is what I've seen). You would pause the batch job while GC takes place.
> If you have 8 processors, and you are pausing all of them to collect a
> large heap using only 1 processor, that doesn't make much sense to me.
> The thread communication pain will be far outweighed by using more
> processors to do the collection faster, and not "stop the world" for
> your batch job so long. Stopping your application dead in its tracks,
> and then only using one of the available processors to collect a large
> heap, while the rest sit idle, doesn't make much sense.
>
> I also don't agree it ever really makes sense not to do generational
> collection. What is your argument here? Generational collection is
> **way** more efficient for short-lived objects, which tend to be up to
> 98% of the objects in most applications. The only way I see that making
> sense is if you have almost no short-lived objects (which occurs in
> what, .0001% of apps if at all?). The Sun JVM doesn't even offer a
> non-generational approach anymore. It's just standard GC practice.
>
>> I doubt that there are many
>> batch-oriented applications of Solr, though.
>>
>> The rest of the advice is intended to be general and it sounds like we
>> agree about sizing.
>> If the nursery is not big enough, the tenured space will be
>> used for allocations that have a short lifetime and that will increase
>> the length and/or frequency of major collections.
>
> Yes - I wasn't arguing with every point - I was picking and choosing :)
> After the heap size, the size of the young generation is the most
> important factor.
>
>> Cache evictions are the interesting part, because they cause a constant
>> rate of tenured space garbage. In most servers, you can get a big enough
>> nursery that major collections are very rare. That won't happen in Solr
>> because of cache evictions.
>>
>> The IBM JVM is excellent. Their concurrent generational GC policy is
>> "gencon".
>
> Yeah, I actually know very little about the IBM JVM, so I wasn't really
> commenting. But from the info I gleaned here and on a couple quick web
> searches, I'm not too impressed by its GC.
>
>> wunder
>>
>> -----Original Message-----
>> From: Mark Miller [mailto:markrmil...@gmail.com]
>> Sent: Friday, September 25, 2009 10:31 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr and Garbage Collection
>>
>> My bad - later, it looks as if you're giving general advice, and that's
>> what I took issue with.
>>
>> Any Collector that is not doing generational collection is essentially
>> from the dark ages and shouldn't be used.
>>
>> Any Collector that doesn't have concurrent options, unless possibly
>> you're running a tiny app (under 100MB of RAM), or only have a single
>> CPU, is also dark ages, and not fit for a server environment.
>>
>> I haven't kept up with IBM's JVM, but it sounds like they are well
>> behind Sun in GC then.
>>
>> - Mark
>>
>> Walter Underwood wrote:
>>
>>> As I said, I was using the IBM JVM, not the Sun JVM. The "concurrent low
>>> pause" collector is only in the Sun JVM.
>>>
>>> I just found this excellent article about the various IBM GC options
>>> for a Lucene application with a 100GB heap:
>>>
>>> http://www.nearinfinity.com/blogs/aaron_mccurry/tuning_the_ibm_jvm_for_large_h.html
>>>
>>> wunder
>>>
>>> -----Original Message-----
>>> From: Mark Miller [mailto:markrmil...@gmail.com]
>>> Sent: Friday, September 25, 2009 10:03 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Solr and Garbage Collection
>>>
>>> Walter Underwood wrote:
>>>
>>>> 30ms is not better or worse than 1s until you look at the service
>>>> requirements. For many applications, it is worth dedicating 10% of your
>>>> processing time to GC if that makes the worst-case pause short.
>>>>
>>>> On the other hand, my experience with the IBM JVM was that the maximum
>>>> query rate was 2-3X better with the concurrent generational GC compared
>>>> to any of their other GC algorithms, so we got the best throughput
>>>> along with the shortest pauses.
>>>
>>> With which collector? Since the very early JVMs, all GC is generational.
>>> Most of the collectors (other than the Serial Collector) also work
>>> concurrently. By default, they are concurrent on different generations,
>>> but you can add concurrency to the "other" generation with each now too.
>>>
>>>> Solr garbage generation (for queries) seems to have two major components:
>>>> per-request garbage and cache evictions. With a generational collector,
>>>> these two are handled by separate parts of the collector.
>>>
>>> Different parts of the collector? It's a different collector depending
>>> on the generation. The young generation is collected with a copy
>>> collector. This is because almost all the objects in the young
>>> generation are likely dead, and a copy collector only needs to visit
>>> live objects. So it's very efficient.
>>> The tenured generation uses something more along the
>>> lines of mark and sweep or mark and compact.
>>>
>>>> Per-request garbage should completely fit in the short-term heap
>>>> (nursery), so that it can be collected rapidly and returned to use for
>>>> further requests. If the nursery is too small, the per-request
>>>> allocations will be made in tenured space and sit there until the next
>>>> major GC. Cache evictions are almost always in long-term storage
>>>> (tenured space) because an LRU algorithm guarantees that the garbage
>>>> will be old.
>>>>
>>>> Check the growth rate of tenured space (under constant load, of course)
>>>> while increasing the size of the nursery. That rate should drop when the
>>>> nursery gets big enough, then not drop much further as it is increased
>>>> more. After that, reduce the size of tenured space until major GCs start
>>>> happening "too often" (a judgment call). A bigger tenured space means
>>>> longer major GCs and thus longer pauses, so you don't want it oversized
>>>> by too much.
>>>
>>> With the concurrent low pause collector, the goal is to avoid "major"
>>> collections, by collecting *before* the tenured space is filled. If you
>>> are getting "major" collections, you need to tune your settings - the
>>> whole point of that collector is to avoid "major" collections, and do
>>> almost all of the work while your application is not paused. There are
>>> still 2 brief pauses during the collection, but they should not be
>>> significant at all.
>>>
>>>> Also check the hit rates of your caches. If the hit rate is low, say 20%
>>>> or less, make that cache much bigger or set it to zero. Either one will
>>>> reduce the number of cache evictions.
>>>> If you have an HTTP cache in front of Solr,
>>>> zero may be the right choice, since the HTTP cache is cherry-picking
>>>> the easily cacheable requests.
>>>>
>>>> Note that a commit nearly doubles the memory required, because you have
>>>> two live Searcher objects with all their caches. Make sure you have
>>>> headroom for a commit.
>>>>
>>>> If you want to test the tenured space usage, you must test with
>>>> real-world queries. Those are the only way to get accurate cache
>>>> eviction rates.
>>>>
>>>> wunder
>>>>
>>>> -----Original Message-----
>>>> From: Jonathan Ariel [mailto:ionat...@gmail.com]
>>>> Sent: Friday, September 25, 2009 9:34 AM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: Solr and Garbage Collection
>>>>
>>>> BTW why will making them equal lower the frequency of GC?
>>>>
>>>> On 9/25/09, Fuad Efendi <f...@efendi.ca> wrote:
>>>>
>>>>>> Bigger heaps lead to bigger GC pauses in general.
>>>>>
>>>>> Opposite viewpoint:
>>>>> 1sec GC happening once an hour is MUCH BETTER than 30ms GC
>>>>> once-per-second.
>>>>>
>>>>> To lower frequency of GC: -Xms4096m -Xmx4096m (make it equal!)
>>>>>
>>>>> Use -server option.
>>>>>
>>>>> -server option of JVM is 'native CPU code', I remember WebLogic 7
>>>>> console with SUN JVM 1.3 not showing any GC (just horizontal line).
>>>>>
>>>>> -Fuad
>>>>> http://www.linkedin.com/in/liferay

--
- Mark

http://www.lucidimagination.com
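
The tuning procedure quoted in this thread (watch the growth rate of tenured space under constant load while resizing the nursery) can be measured from inside the JVM with the standard java.lang.management beans. What follows is only a rough sketch, not anything posted by the participants: the class name GcProbe and its method names are made up for illustration, and matching pool names on "old" or "tenured" is a heuristic assumption, since pool names differ between JVM vendors and collectors.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper, for illustration only.
class GcProbe {

    // Bytes currently used in heap pools that look like the tenured/old generation.
    // Assumption: the pool name contains "old" or "tenured"; check your JVM's names.
    public static long tenuredUsedBytes() {
        long used = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName().toLowerCase();
            boolean looksTenured = name.contains("old") || name.contains("tenured");
            if (pool.getType() == MemoryType.HEAP && looksTenured) {
                MemoryUsage usage = pool.getUsage(); // may be null if the pool is invalid
                if (usage != null) {
                    used += usage.getUsed();
                }
            }
        }
        return used;
    }

    // Total collections reported by all collectors since JVM start.
    public static long totalCollections() {
        long count = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount(); // -1 when the collector does not report it
            if (c > 0) {
                count += c;
            }
        }
        return count;
    }

    // Growth rate between two samples; negative if a collection freed space in between.
    public static double growthBytesPerSecond(long usedBefore, long usedAfter, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        return (usedAfter - usedBefore) * 1000.0 / elapsedMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        long intervalMillis = 1000;
        long usedBefore = tenuredUsedBytes();
        long gcsBefore = totalCollections();
        Thread.sleep(intervalMillis);
        long usedAfter = tenuredUsedBytes();
        long gcsAfter = totalCollections();
        System.out.printf("tenured growth: %.0f bytes/s, collections in interval: %d%n",
                growthBytesPerSecond(usedBefore, usedAfter, intervalMillis),
                gcsAfter - gcsBefore);
    }
}
```

To follow the advice above, one would sample this inside the Solr JVM under a steady, real-world query load, repeat with a larger nursery, and compare the rates. A sample that spans a collection of the tenured space shows a negative rate and should be discarded.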