Hello,

2011/3/14 Markus Jelsma <markus.jel...@openindex.io>
> That depends on your GC settings and generation sizes. And, instead of
> UseParallelGC you'd better use UseParNewGC in combination with CMS.
>

JConsole now shows a different profile output, but load is still high and
performance is still bad (a rough sketch of the startup flags is at the
bottom of this mail). Btw, here is the thread profile from newrelic:

https://skitch.com/meralan/rwscm/thread-profiler-solr-new-relic-rpm

Note that we do use a form of sharding, so maybe all the time spent waiting
in handleRequestBody results from sharding?

> See 22: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html
>
> > It's actually, as I understand it, expected JVM behavior to see the heap
> > rise to close to its limit before it gets GC'd, that's how Java GC
> > works. Whether that should happen every 20 seconds or what, I don't
> > know.
> >
> > Another option is setting better JVM garbage collection arguments, so GC
> > doesn't "stop the world" so often. I have had good luck with my Solr
> > using this: -XX:+UseParallelGC
> >
> > On 3/14/2011 4:15 PM, Doğacan Güney wrote:
> > > Hello again,
> > >
> > > 2011/3/14 Markus Jelsma<markus.jel...@openindex.io>
> > >
> > >>> Hello,
> > >>>
> > >>> 2011/3/14 Markus Jelsma<markus.jel...@openindex.io>
> > >>>
> > >>>> Hi Doğacan,
> > >>>>
> > >>>> Are you, at some point, running out of heap space? In my experience,
> > >>>> that's the common cause of increased load and excessively high
> > >>>> response times (or timeouts).
> > >>>
> > >>> How much of a heap size would be enough? Our index size is growing
> > >>> slowly, but we did not have this problem a couple of weeks ago, when
> > >>> the index was maybe 100mb smaller.
> > >>
> > >> Telling how much heap space is needed isn't easy. It usually needs to
> > >> be increased when you run out of memory and get those nasty OOM
> > >> errors; are you getting them? Replication events will increase heap
> > >> usage due to cache warming queries and autowarming.
> > >
> > > Nope, no OOM errors.
> > >
> > >>> We left most of the caches in solrconfig as default and only
> > >>> increased filterCache to 1024. We only ask for "id"s (which are
> > >>> unique) and no other fields during queries (though we do faceting).
> > >>> Btw, 1.6gb of our index is stored fields (we store everything for
> > >>> now, even though we do not get them during queries), and about 1gb is
> > >>> the rest of the index.
> > >>
> > >> Hmm, it seems 4000 would be enough indeed. What about the fieldCache,
> > >> are there a lot of entries? Is there an insanity count? Do you use
> > >> boost functions?
> > >
> > > Insanity count is 0 and fieldCache has 12 entries. We do use some
> > > boosting functions.
> > >
> > > Btw, I am monitoring output via jconsole with 8gb of ram and it still
> > > goes up to 8gb every 20 seconds or so, GC runs, and it falls back down
> > > to 1gb.
> > >
> > > Btw, our current revision was just a random choice, but up until two
> > > weeks ago it had been rock-solid, so we have been reluctant to update
> > > to another version. Would you recommend upgrading to the latest trunk?
> > >
> > >> It might not have anything to do with memory at all, but I'm just
> > >> asking. There may be a bug in your revision causing this.
> > >>
> > >>> Anyway, Xmx was 4000m, we tried increasing it to 8000m but did not
> > >>> get any improvement in load. I can try monitoring with Jconsole with
> > >>> 8gigs of heap to see if it helps.
> > >>>
> > >>>> Cheers,
> > >>>>
> > >>>>> Hello everyone,
> > >>>>>
> > >>>>> First of all, here is our Solr setup:
> > >>>>>
> > >>>>> - Solr nightly build 986158
> > >>>>> - Running Solr inside the default Jetty that comes with the Solr
> > >>>>>   build
> > >>>>> - 1 write-only master, 4 read-only slaves (quad core 5640 with 24gb
> > >>>>>   of RAM)
> > >>>>> - Index replicated (on optimize) to slaves via Solr Replication
> > >>>>> - Size of index is around 2.5gb
> > >>>>> - No incremental writes, index is created from scratch (delete old
> > >>>>>   documents -> commit new documents -> optimize) every 6 hours
> > >>>>> - Avg # of requests per second is around 60 (for a single slave)
> > >>>>> - Avg time per request is around 25ms (before having problems)
> > >>>>> - Load on each slave is around 2
> > >>>>>
> > >>>>> We have been using this setup for months without any problem.
> > >>>>> However, last week we started to experience very weird performance
> > >>>>> problems, like:
> > >>>>>
> > >>>>> - Avg time per request increased from 25ms to 200-300ms (even
> > >>>>>   higher if we don't restart the slaves)
> > >>>>> - Load on each slave increased from 2 to 15-20 (Solr uses 400%-600%
> > >>>>>   CPU)
> > >>>>>
> > >>>>> When we profile Solr we see two very strange things:
> > >>>>>
> > >>>>> 1 - This is the jconsole output:
> > >>>>>
> > >>>>> https://skitch.com/meralan/rwwcf/mail-886x691
> > >>>>>
> > >>>>> As you see, GC runs every 10-15 seconds and collects more than 1gb
> > >>>>> of memory. (Actually, if you wait more than 10 minutes you see
> > >>>>> spikes up to 4gb consistently.)
> > >>>>>
> > >>>>> 2 - This is the newrelic output:
> > >>>>>
> > >>>>> https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm
> > >>>>>
> > >>>>> As you see, Solr spends a ridiculously long time in the
> > >>>>> SolrDispatchFilter.doFilter() method.
> > >>>>>
> > >>>>> Apart from these, when we clean the index directory, re-replicate,
> > >>>>> and restart each slave one by one, we see some relief in the
> > >>>>> system, but after some time the servers start to melt down again.
> > >>>>> Although deleting the index and re-replicating doesn't solve the
> > >>>>> problem, we think these problems are somehow related to
> > >>>>> replication, because the symptoms started after a replication and
> > >>>>> the system heals itself for a while after re-replication. I also
> > >>>>> see lucene-write.lock files on the slaves (we don't have write.lock
> > >>>>> files on the master), which I think we shouldn't see.
> > >>>>>
> > >>>>> If anyone can give us any sort of ideas, we will appreciate it.
> > >>>>>
> > >>>>> Regards,
> > >>>>> Dogacan Guney

--
Doğacan Güney
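
P.S. For concreteness, below is a minimal sketch of the kind of startup line
I mean above, assuming the stock example/start.jar Jetty launcher and the
8000m heap we tried; the CMS-related values are illustrative guesses, not
something we have tuned or verified on our index:

  # ParNew + CMS instead of UseParallelGC, with GC logging so pauses show up
  # in gc.log as well as in jconsole. Heap size matches the 8000m we tested;
  # the occupancy settings are assumptions, not measured values.
  java -Xms8000m -Xmx8000m \
       -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
       -XX:+CMSParallelRemarkEnabled \
       -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly \
       -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log \
       -jar start.jar

  # Optional: watch heap/GC utilisation from the shell every 5 seconds,
  # alongside (or instead of) jconsole.
  jstat -gcutil <solr pid> 5000

As I understand it, ParNew is the young-generation collector meant to run
alongside CMS, while UseParallelGC is a throughput collector whose full
collections stop the world, so the two flags suggested earlier in the thread
are alternatives rather than something to combine.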