More details, please. Have you tried all of the different GC implementations? Is there enough memory assigned to the JVM to run comfortably, but not much more? (The OS uses spare memory for disk buffers a lot better than Java does.)
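
If it helps to answer the two questions above, here is a minimal sketch that reports which collectors a JVM is actually running and how much heap it has. It uses only the standard java.lang.management API. The class name GcReport is made up for illustration, and the code reports on the JVM it runs in, so to see Tomcat's numbers it would have to be called from inside that JVM (or the same MXBeans read over JMX).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Solr or Tomcat.
final class GcReport {

    /** One line per active collector: name, collection count, total time in ms. */
    static List<String> collectorLines() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + " count=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    /** Heap currently in use, in bytes. */
    static long heapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    /** Maximum heap in bytes (roughly the -Xmx setting); -1 if undefined. */
    static long heapMaxBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getMax();
    }

    public static void main(String[] args) {
        for (String line : collectorLines()) {
            System.out.println(line);
        }
        System.out.println("heap used=" + heapUsedBytes() + " max=" + heapMaxBytes());
    }
}
```

Logging this output at intervals would show whether collection counts and times climb steadily or only jump right before the crash.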
How many threads are there? Distributed search uses two searches, both parallelized with one thread per shard. Perhaps they're building up?

Do a heap scan with text output every, say, 6 hours. If something is building up, you might spot it. YourKit is really nice for this kind of problem.

Also, RMI is very hard on GC. Are you connecting to Solr or to Tomcat with it?

Lance

On Tue, Dec 21, 2010 at 7:09 PM, Alexey Kovyrin <ale...@kovyrin.net> wrote:
> Hello guys,
>
> We at scribd.com have recently deployed our new search cluster based
> on Dec 1st, 2010 branch_3x solr code and we're very happy about the
> new features it brings.
> Though looks like we have a weird problem here: once a day our servers
> handling sharded search queries (frontend servers that receive
> requests and then fan them out to backend machines) die. Everything
> looks cool for a day, memory usage is stable, GC is doing its work as
> usual.... and then eventually we get a weird GC activity spike that
> kills whole VM and the only way to bring it back is to kill -9 the
> tomcat6 vm and restart it. We've tried different GC tuning options,
> tried to reduce caches to almost a zero size, still no luck.
>
> So I was wondering if there were any known issues with solr branch 3x
> in the last month that could have caused this kind of problems or if
> we could provide any more information that could help to track down
> the issue.
>
> Thanks.
>
> --
> Alexey Kovyrin
> http://kovyrin.net/
>

--
Lance Norskog
goks...@gmail.com