> I did heap dump + heap histogram before killing the jvm today and the
only really suspicious thing was the top line in the histogram:
> class [B,
> 81883 instances,
> 3,974,092,842 bytes

> Most of the instances (actually all of around a hundred of them I
> checked with jhat) look almost the same in terms of references:

> org.apache.lucene.store.NIOFSDirectory$NIOFSIndexIn...@0x2aab14a47120
> (98 bytes) : field buffer
> java.nio.HeapByteBuf...@0x2aab14a475a8 (55 bytes) : field hb

What do you mean 'instance'? Are there 100 shards in one Solr instance?
And each instance has one NIOFSDirectory object?
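
If you want to confirm that those index input buffers really account
for the ~4 GB of byte[], the OQL page in jhat can total them up. I
haven't double-checked the syntax, and I'm assuming the truncated class
name in your output is NIOFSDirectory$NIOFSIndexInput, but something
along these lines should be close:

  select o.buffer.length
  from org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput o
  where o.buffer != null

Summing those lengths and comparing against the [B total in the
histogram would tell you whether the index inputs explain all of it.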

So, the machine has no real indexes and does nothing but broker
searches to shards? And all of the indexes are empty?

What do the logs say just before the fail? It sounds like maybe some
anomalous event triggers a bug.
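
If detailed GC logging isn't already enabled, it may be worth turning
it on so the spike shows up with timestamps. Assuming a Sun JDK 6 under
tomcat6, roughly these flags in JAVA_OPTS (the log path is just a
placeholder) would do it:

  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
  -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/log/tomcat6/gc.log
  -XX:+HeapDumpOnOutOfMemoryError

That way the GC log, catalina.out and the heap dump can be lined up
around the time of death.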

You might try rolling back to an earlier branch_3x revision.

Lance

On Sat, Dec 25, 2010 at 11:15 AM, Alexey Kovyrin <ale...@kovyrin.net> wrote:
> Today I've managed to get the following from a "dead" server:
> - gc log since start till the death of the service
> - jstat -gc -t 1000 output since start till the end
> - thread stack dump before killing the server
> - heap histogram before killing the server
> - heap dump
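>
> For reference, all of that was collected with the standard JDK tools,
> more or less like this (the pid and file names are just placeholders):
>
>   jstat -gc -t <pid> 1000 > jstat-gc.log
>   jstack <pid> > thread-dump.txt
>   jmap -histo <pid> > histogram.txt
>   jmap -dump:format=b,file=heap.hprof <pid>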
>
>
>
> On Fri, Dec 24, 2010 at 11:36 PM, Lance Norskog <goks...@gmail.com> wrote:
>> More details, please. Have you tried all of the different GC
>> implementations? Is there enough memory assigned to the JVM to run
>> comfortably, but not much more? (The OS uses spare memory as disk
>> buffers a lot better than Java does.)
>
> We have 24 GB of RAM on the server and dedicate 6 GB to the Java VM
> (-Xmx6000m); we used to give it 12 GB, but that did not make any
> difference.
>
>> How many threads are there? Distributed search uses two searches, both
>> parallelized with 1 thread per shard. Perhaps they're building up?
>
>
> There are usually around 150-200 threads in the jvm.
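> (A quick way to get that number, in case it's useful: something like
> "jstack <pid> | grep -c nid=", where <pid> is just a placeholder for
> the tomcat6 process id; it counts the thread header lines in the dump.)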
>
>> Do a heap scan with text output every, say, 6 hours. If there is
>> something building up, you might spot it.
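>>
>> Something like a cron entry along these lines would do it (the pid
>> and the output path are just placeholders):
>>
>>   0 */6 * * *  jmap -histo <tomcat-pid> > /var/tmp/histo-`date +\%s`.txt
>>
>> Diffing the top entries between consecutive snapshots usually shows
>> what is accumulating.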
>
> I did heap dump + heap histogram before killing the jvm today and the
> only really suspicious thing was the top line in the histogram:
> class [B,
> 81883 instances,
> 3,974,092,842 bytes
>
> Most of the instances (actually all of around a hundred of them I
> checked with jhat) look almost the same in terms of references:
>
> org.apache.lucene.store.NIOFSDirectory$NIOFSIndexIn...@0x2aab14a47120
> (98 bytes) : field buffer
> java.nio.HeapByteBuf...@0x2aab14a475a8 (55 bytes) : field hb
>
>
>> Also, RMI is very hard on GC. Are you connecting to Solr or to Tomcat with it?
>
> I believe we don't use it here.
>
>> On Tue, Dec 21, 2010 at 7:09 PM, Alexey Kovyrin <ale...@kovyrin.net> wrote:
>>> Hello guys,
>>>
>>> We at scribd.com have recently deployed our new search cluster based
>>> on the Dec 1st, 2010 branch_3x Solr code, and we're very happy about
>>> the new features it brings.
>>> Though it looks like we have a weird problem here: once a day, the
>>> servers handling sharded search queries (frontend servers that receive
>>> requests and then fan them out to backend machines) die. Everything
>>> looks fine for a day, memory usage is stable, GC is doing its work as
>>> usual... and then eventually we get a weird spike in GC activity that
>>> kills the whole VM, and the only way to bring it back is to kill -9 the
>>> tomcat6 process and restart it. We've tried different GC tuning options
>>> and tried reducing the caches to almost zero size, but still no luck.
>>>
>>> So I was wondering if there are any known issues with the Solr
>>> branch_3x code from the last month that could cause this kind of
>>> problem, or if we could provide any more information that would help
>>> track down the issue.
>>>
>>> Thanks.
>>>
>>> --
>>> Alexey Kovyrin
>>> http://kovyrin.net/
>>>
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>>
>
>
>
> --
> Alexey Kovyrin
> http://kovyrin.net/
>



-- 
Lance Norskog
goks...@gmail.com
