Hello,

2011/3/14 Markus Jelsma <markus.jel...@openindex.io>
> That depends on your GC settings and generation sizes. And, instead of
> UseParallelGC you'd better use UseParNewGC in combination with CMS.
>

JConsole now shows a different profile output, but load is still high and
performance is still bad (a rough sketch of the startup flags is at the
bottom of this mail). Btw, here is the thread profile from newrelic:

https://skitch.com/meralan/rwscm/thread-profiler-solr-new-relic-rpm

Note that we do use a form of sharding, so maybe all the time spent waiting
in handleRequestBody results from sharding?

> See 22: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html
>
> > It's actually, as I understand it, expected JVM behavior to see the heap
> > rise to close to its limit before it gets GC'd, that's how Java GC
> > works. Whether that should happen every 20 seconds or what, I don't
> > know.
> >
> > Another option is setting better JVM garbage collection arguments, so GC
> > doesn't "stop the world" so often. I have had good luck with my Solr
> > using this: -XX:+UseParallelGC
> >
> > On 3/14/2011 4:15 PM, Doğacan Güney wrote:
> > > Hello again,
> > >
> > > 2011/3/14 Markus Jelsma<markus.jel...@openindex.io>
> > >
> > >>> Hello,
> > >>>
> > >>> 2011/3/14 Markus Jelsma<markus.jel...@openindex.io>
> > >>>
> > >>>> Hi Doğacan,
> > >>>>
> > >>>> Are you, at some point, running out of heap space? In my experience,
> > >>>> that's the common cause of increased load and excessively high
> > >>>> response times (or timeouts).
> > >>>
> > >>> How much of a heap size would be enough? Our index size is growing
> > >>> slowly, but we did not have this problem a couple of weeks ago, when
> > >>> the index was maybe 100mb smaller.
> > >>
> > >> Telling how much heap space is needed isn't easy. It usually needs to
> > >> be increased when you run out of memory and get those nasty OOM
> > >> errors; are you getting them? Replication events will increase heap
> > >> usage due to cache warming queries and autowarming.
> > >
> > > Nope, no OOM errors.
> > >
> > >>> We left most of the caches in solrconfig as default and only
> > >>> increased filterCache to 1024. We only ask for "id"s (which are
> > >>> unique) and no other fields during queries (though we do faceting).
> > >>> Btw, 1.6gb of our index is stored fields (we store everything for
> > >>> now, even though we do not get them during queries), and about 1gb is
> > >>> the rest of the index.
> > >>
> > >> Hmm, it seems 4000 would be enough indeed. What about the fieldCache,
> > >> are there a lot of entries? Is there an insanity count? Do you use
> > >> boost functions?
> > >
> > > Insanity count is 0 and fieldCache has 12 entries. We do use some
> > > boosting functions.
> > >
> > > Btw, I am monitoring output via jconsole with 8gb of ram and it still
> > > goes up to 8gb every 20 seconds or so, GC runs, and it falls back down
> > > to 1gb.
> > >
> > > Btw, our current revision was just a random choice, but up until two
> > > weeks ago it had been rock-solid, so we have been reluctant to update
> > > to another version. Would you recommend upgrading to the latest trunk?
> > >
> > >> It might not have anything to do with memory at all, but I'm just
> > >> asking. There may be a bug in your revision causing this.
> > >>
> > >>> Anyway, Xmx was 4000m, we tried increasing it to 8000m but did not
> > >>> get any improvement in load. I can try monitoring with Jconsole with
> > >>> 8gigs of heap to see if it helps.
> > >>>
> > >>>> Cheers,
> > >>>>
> > >>>>> Hello everyone,
> > >>>>>
> > >>>>> First of all, here is our Solr setup:
> > >>>>>
> > >>>>> - Solr nightly build 986158
> > >>>>> - Running Solr inside the default Jetty that comes with the Solr
> > >>>>>   build
> > >>>>> - 1 write-only master, 4 read-only slaves (quad core 5640 with 24gb
> > >>>>>   of RAM)
> > >>>>> - Index replicated (on optimize) to slaves via Solr Replication
> > >>>>> - Size of index is around 2.5gb
> > >>>>> - No incremental writes, index is created from scratch (delete old
> > >>>>>   documents -> commit new documents -> optimize) every 6 hours
> > >>>>> - Avg # of requests per second is around 60 (for a single slave)
> > >>>>> - Avg time per request is around 25ms (before having problems)
> > >>>>> - Load on each slave is around 2
> > >>>>>
> > >>>>> We have been using this setup for months without any problem.
> > >>>>> However, last week we started to experience very weird performance
> > >>>>> problems, like:
> > >>>>>
> > >>>>> - Avg time per request increased from 25ms to 200-300ms (even
> > >>>>>   higher if we don't restart the slaves)
> > >>>>> - Load on each slave increased from 2 to 15-20 (Solr uses 400%-600%
> > >>>>>   CPU)
> > >>>>>
> > >>>>> When we profile Solr we see two very strange things:
> > >>>>>
> > >>>>> 1 - This is the jconsole output:
> > >>>>>
> > >>>>> https://skitch.com/meralan/rwwcf/mail-886x691
> > >>>>>
> > >>>>> As you see, GC runs every 10-15 seconds and collects more than 1gb
> > >>>>> of memory. (Actually, if you wait more than 10 minutes you see
> > >>>>> spikes up to 4gb consistently.)
> > >>>>>
> > >>>>> 2 - This is the newrelic output:
> > >>>>>
> > >>>>> https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm
> > >>>>>
> > >>>>> As you see, Solr spends a ridiculously long time in the
> > >>>>> SolrDispatchFilter.doFilter() method.
> > >>>>>
> > >>>>> Apart from these, when we clean the index directory, re-replicate,
> > >>>>> and restart each slave one by one, we see some relief in the
> > >>>>> system, but after some time the servers start to melt down again.
> > >>>>> Although deleting the index and re-replicating doesn't solve the
> > >>>>> problem, we think these problems are somehow related to
> > >>>>> replication, because the symptoms started after a replication and
> > >>>>> the system heals itself for a while after re-replication. I also
> > >>>>> see lucene-write.lock files on the slaves (we don't have write.lock
> > >>>>> files on the master), which I think we shouldn't see.
> > >>>>>
> > >>>>> If anyone can give us any sort of ideas, we will appreciate it.
> > >>>>>
> > >>>>> Regards,
> > >>>>> Dogacan Guney

--
Doğacan Güney
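
P.S. For concreteness, below is a minimal sketch of the kind of startup line
I mean above, assuming the stock example/start.jar Jetty launcher and the
8000m heap we tried; the CMS-related values are illustrative guesses, not
something we have tuned or verified on our index:

  # ParNew + CMS instead of UseParallelGC, with GC logging so pauses show up
  # in gc.log as well as in jconsole. Heap size matches the 8000m we tested;
  # the occupancy settings are assumptions, not measured values.
  java -Xms8000m -Xmx8000m \
       -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
       -XX:+CMSParallelRemarkEnabled \
       -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly \
       -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log \
       -jar start.jar

  # Optional: watch heap/GC utilisation from the shell every 5 seconds,
  # alongside (or instead of) jconsole.
  jstat -gcutil <solr pid> 5000

As I understand it, ParNew is the young-generation collector meant to run
alongside CMS, while UseParallelGC is a throughput collector whose full
collections stop the world, so the two flags suggested earlier in the thread
are alternatives rather than something to combine.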