2011/3/14 Markus Jelsma <markus.jel...@openindex.io>

> Mmm. SearchHandler.handleRequestBody takes care of sharding. Could your
> system suffer from
> http://wiki.apache.org/solr/DistributedSearch#Distributed_Deadlock ?
>
>
We increased the thread limit (which was 10000 before), but it did not help.
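
In case it is useful to anyone looking at the deadlock angle: the way we
are checking is simply a thread dump on a busy slave, roughly like this
(the PID is a placeholder):

  # take a thread dump of the Solr/Jetty JVM (replace 12345 with the real pid)
  jstack 12345 > /tmp/solr-threads.txt
  # count threads that are parked waiting
  grep -c 'java.lang.Thread.State: WAITING' /tmp/solr-threads.txt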

Anyway, we will try to disable sharding tomorrow. Maybe this can give us a
better picture.
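
To compare, the plan is to time the same query against one slave with and
without the shards parameter, something along these lines (host names and
the query are placeholders, not our real setup):

  # plain, non-distributed query against a single slave
  time curl -s 'http://slave1:8983/solr/select?q=*:*&fl=id&rows=10' > /dev/null

  # the same query fanned out over the shards
  time curl -s 'http://slave1:8983/solr/select?q=*:*&fl=id&rows=10&shards=slave1:8983/solr,slave2:8983/solr' > /dev/null

If the plain query stays fast while the sharded one is slow, that would
point at the distribution step rather than the local searches.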

Thanks for the help, everyone.
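
For reference, the startup we are experimenting with at the moment looks
roughly like this (the heap size is just what we are testing with, not a
recommendation):

  java -Xms8000m -Xmx8000m \
       -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
       -verbose:gc -XX:+PrintGCDetails \
       -jar start.jar

The GC logging flags are only there so we can correlate the pauses we see
in JConsole with actual collections.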


> I'm not sure; I haven't seen a similar issue in a sharded environment,
> probably because it was a controlled environment.
>
>
> > Hello,
> >
> > 2011/3/14 Markus Jelsma <markus.jel...@openindex.io>
> >
> > > That depends on your GC settings and generation sizes. And instead of
> > > UseParallelGC, you'd better use UseParNewGC in combination with CMS.
> >
> > JConsole now shows a different profile output but load is still high and
> > performance is still bad.
> >
> > Btw, here is the thread profile from newrelic:
> >
> > https://skitch.com/meralan/rwscm/thread-profiler-solr-new-relic-rpm
> >
> > Note that we do use a form of sharding, so maybe all the time spent
> > waiting in handleRequestBody results from sharding?
> >
> > > See 22: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html
> > >
> > > > It's actually, as I understand it, expected JVM behavior to see the
> > > > heap rise close to its limit before it gets GC'd; that's how Java
> > > > GC works.  Whether that should happen every 20 seconds or what, I
> > > > don't know.
> > > >
> > > > Another option is setting better JVM garbage collection arguments, so
> > > > GC doesn't "stop the world" so often. I have had good luck with my
> > > > Solr using this:  -XX:+UseParallelGC
> > > >
> > > > On 3/14/2011 4:15 PM, Doğacan Güney wrote:
> > > > > Hello again,
> > > > >
> > > > > 2011/3/14 Markus Jelsma<markus.jel...@openindex.io>
> > > > >
> > > > >>> Hello,
> > > > >>>
> > > > >>> 2011/3/14 Markus Jelsma<markus.jel...@openindex.io>
> > > > >>>
> > > > >>>> Hi Doğacan,
> > > > >>>>
> > > > >>>> Are you, at some point, running out of heap space? In my
> > > > >>>> experience, that's the common cause of increased load and
> > > > >>>> excessively high response times (or timeouts).
> > > > >>>
> > > > >>> How much heap would be enough? Our index size is growing slowly,
> > > > >>> but we did not have this problem a couple of weeks ago, when the
> > > > >>> index was maybe 100mb smaller.
> > > > >>
> > > > >> It isn't easy to say how much heap space is needed. It usually
> > > > >> needs to be increased when you run out of memory and get those
> > > > >> nasty OOM errors; are you getting them?
> > > > >> Replication events will increase heap usage due to cache warming
> > > > >> queries and autowarming.
> > > > >
> > > > > Nope, no OOM errors.
> > > > >
> > > > >>> We left most of the caches in solrconfig at their defaults and
> > > > >>> only increased filterCache to 1024. We only ask for "id"s (which
> > > > >>> are unique) and no other fields during queries (though we do
> > > > >>> faceting). Btw, 1.6gb of our index is stored fields (we store
> > > > >>> everything for now, even though we do not get them during
> > > > >>> queries), and about 1gb is the index itself.
> > > > >>
> > > > >> Hmm, it seems 4000m would be enough indeed. What about the
> > > > >> fieldCache: are there a lot of entries? Is there an insanity
> > > > >> count? Do you use boost functions?
> > > > >
> > > > > Insanity count is 0 and fieldCache has 12 entries. We do use some
> > > > > boosting functions.
> > > > >
> > > > > Btw, I am monitoring output via jconsole with 8gb of heap, and it
> > > > > still climbs to 8gb every 20 seconds or so; then GC runs and it
> > > > > falls back down to 1gb.
> > > > >
> > > > > Btw, our current revision was just a random choice, but up until
> > > > > two weeks ago it had been rock-solid, so we have been reluctant to
> > > > > update to another version. Would you recommend upgrading to the
> > > > > latest trunk?
> > > > >
> > > > >> It might not have anything to do with memory at all, but I'm just
> > > > >> asking. There may be a bug in your revision causing this.
> > > > >>
> > > > >>> Anyway, Xmx was 4000m; we tried increasing it to 8000m but did
> > > > >>> not get any improvement in load. I can try monitoring with
> > > > >>> JConsole with 8 gigs of heap to see if it helps.
> > > > >>>
> > > > >>>> Cheers,
> > > > >>>>
> > > > >>>>> Hello everyone,
> > > > >>>>>
> > > > >>>>> First of all here is our Solr setup:
> > > > >>>>>
> > > > >>>>> - Solr nightly build 986158
> > > > >>>>> - Running Solr inside the default Jetty that comes with the
> > > > >>>>>   Solr build
> > > > >>>>> - 1 write-only master, 4 read-only slaves (quad core 5640 with
> > > > >>>>>   24gb of RAM)
> > > > >>>>> - Index replicated (on optimize) to slaves via Solr Replication
> > > > >>>>> - Size of index is around 2.5gb
> > > > >>>>> - No incremental writes; the index is created from scratch
> > > > >>>>>   (delete old documents -> commit new documents -> optimize)
> > > > >>>>>   every 6 hours
> > > > >>>>> - Avg # of request per second is around 60 (for a single slave)
> > > > >>>>> - Avg time per request is around 25ms (before having problems)
> > > > >>>>> - Load on each slave is around 2
> > > > >>>>>
> > > > >>>>> We have been using this set-up for months without any problem.
> > > > >>>>> However, last week we started to experience very weird
> > > > >>>>> performance problems like:
> > > > >>>>>
> > > > >>>>> - Avg time per request increased from 25ms to 200-300ms (even
> > > > >>>>>   higher if we don't restart the slaves)
> > > > >>>>> - Load on each slave increased from 2 to 15-20 (Solr uses
> > > > >>>>>   400%-600% CPU)
> > > > >>>>>
> > > > >>>>> When we profile Solr we see two very strange things:
> > > > >>>>>
> > > > >>>>> 1 - This is the jconsole output:
> > > > >>>>>
> > > > >>>>> https://skitch.com/meralan/rwwcf/mail-886x691
> > > > >>>>>
> > > > >>>>> As you see, GC runs every 10-15 seconds and collects more than
> > > > >>>>> 1gb of memory. (Actually, if you wait more than 10 minutes you
> > > > >>>>> see spikes up to 4gb consistently.)
> > > > >>>>>
> > > > >>>>> 2 - This is the newrelic output:
> > > > >>>>>
> > > > >>>>> https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm
> > > > >>>>>
> > > > >>>>> As you see, Solr spends a ridiculously long time in the
> > > > >>>>> SolrDispatchFilter.doFilter() method.
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> Apart from these, when we clean the index directory,
> > > > >>>>> re-replicate, and restart each slave one by one, we see some
> > > > >>>>> relief in the system, but after some time the servers start to
> > > > >>>>> melt down again. Although deleting the index and replicating
> > > > >>>>> doesn't solve the problem, we think these problems are somehow
> > > > >>>>> related to replication, because the symptoms started after a
> > > > >>>>> replication and the system once healed itself after a
> > > > >>>>> replication. I also see lucene-write.lock files on the slaves
> > > > >>>>> (we don't have write.lock files on the master), which I think
> > > > >>>>> we shouldn't see.
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> If anyone can give any sort of ideas, we will appreciate it.
> > > > >>>>>
> > > > >>>>> Regards,
> > > > >>>>> Dogacan Guney
>
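
P.S. For the fieldCache numbers mentioned above: we are just reading them
off the admin stats page of each slave, roughly like this (the host name
is a placeholder and the exact field names may differ between versions):

  # look for fieldCache entry and insanity counts in the stats output
  curl -s 'http://slave1:8983/solr/admin/stats.jsp' | grep -i -E 'insanity|entries_count'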



-- 
Doğacan Güney
