Re: Java heap space

2006-04-29 Thread Marcus Stratmann
Chris Hostetter wrote:
> interesting .. are you getting the OutOfMemory on an actual delete
> operation or when doing a commit after executing some deletes?

Yes, on a delete operation. I'm not doing any commits until the end of
all delete operations.
After reading this I was curious if using commits during deleting would
have any effect. So I tested doing a commit after 10,000 deletes at a
time (which, I know, is not recommended). But that simply didn't change
anything.

Meanwhile I found out that I can delete about 10,000 more documents
(before getting an OOM) by increasing the heap space by 500M.
Unfortunately we need to delete about 200,000 documents on each update,
which would mean adding 10G to the heap space, not to mention the same
number of inserts.


> part of the problem may be that under the covers, any delete involves
> doing a query (even if you are deleting by uniqueKey, that's implemented
> as a delete by Term, which requires iterating over a TermEnum to find the
> relevant document), and if your index is big enough, loading that TermEnum
> may be the cause of your OOM.

Yes, I thought so, too. And in fact I get OOM even if I just submit search
queries.
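
(For reference, the delete-by-Term path described in the quoted explanation
looks roughly like this against the Lucene API of that era; this is only a
sketch, not Solr's actual code, and the index path and "id" field name are
placeholders.)

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;

    public class DeleteByTermSketch {
        public static void main(String[] args) throws Exception {
            // Opening the IndexReader is where the large term/index
            // structures of a big index get loaded into the heap.
            IndexReader reader = IndexReader.open("/path/to/index");
            // A delete by uniqueKey boils down to a delete by Term:
            // Lucene walks the term index to find the matching document(s).
            int deleted = reader.deleteDocuments(new Term("id", "12345"));
            reader.close();
            System.out.println("deleted " + deleted + " doc(s)");
        }
    }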


> Maybe, maybe not ... what options are you using in your solrconfig.xml's
> indexDefaults and mainIndex blocks?

I adopted the default values from the example installation, which looked
quite reasonable to me.


> ... 10 million documents could be the
> magic point at which your mergeFactor triggers the merging of several
> large segments into one uber segment -- which may be big enough to cause
> an OOM when the IndexReader tries to open it.

Yes, I'm using the default mergeFactor of 10, and since 10 million is 10^7
this is what appeared suspicious to me.
Is it right that the mergeFactor cannot be changed once the index has
been built?
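
(To illustrate where these settings live: the values in solrconfig.xml's
indexDefaults/mainIndex blocks are applied to Lucene's IndexWriter whenever
Solr opens one, roughly as in the sketch below. This is not Solr's actual
code; the path and values are placeholders, with mergeFactor 10 and
maxBufferedDocs 1000 matching the example configuration's defaults.)

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class MergeSettingsSketch {
        public static void main(String[] args) throws Exception {
            // false = open an existing index rather than creating a new one
            IndexWriter writer =
                new IndexWriter("/path/to/index", new StandardAnalyzer(), false);
            writer.setMergeFactor(10);       // segments that accumulate before a merge is triggered
            writer.setMaxBufferedDocs(1000); // docs buffered in RAM before a new segment is flushed
            writer.close();
        }
    }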

Marcus




Re: Java heap space

2006-04-29 Thread Yonik Seeley

On 4/29/06, Marcus Stratmann <[EMAIL PROTECTED]> wrote:

Yes, on a delete operation. I'm not doing any commits until the end of
all delete operations.


I assume this is a delete-by-id and not a delete-by-query?  They work
very differently.

There is some state stored for each pending delete-by-id... there is
a HashMap with an entry for each id that needs to be
deleted.  This state shouldn't be that large though.

In fact, delete-by-id does nothing with the Lucene index at all until you
do a commit.
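
(A toy illustration of the kind of bookkeeping described here, not Solr's
actual code: pending delete-by-id operations are just remembered in a map,
and only turned into work against the Lucene index when a commit happens.
The "id" field name is a placeholder.)

    import java.util.HashMap;
    import java.util.Map;

    public class PendingDeletesSketch {
        // one entry per uniqueKey that still needs to be deleted
        // (the value itself is unimportant for this sketch)
        private final Map<String, Integer> pendingDeletes = new HashMap<String, Integer>();

        public void delete(String id) {
            // cheap: no Lucene work yet, just remember the id
            pendingDeletes.put(id, 1);
        }

        public void commit() {
            for (String id : pendingDeletes.keySet()) {
                // at commit time each id becomes a Term delete against the
                // index, e.g. reader.deleteDocuments(new Term("id", id))
                System.out.println("would delete id " + id);
            }
            pendingDeletes.clear();
        }
    }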


After reading this I was curious if using commits during deleting would
have any effect. So I tested doing a commit after 10,000 deletes at a
time (which, I know, is not recommended). But that simply didn't change
anything.


Strange... that suggests it's not the state kept in the HashMap.


Meanwhile I found out that I can delete about 10,000 more documents
(before getting an OOM) by increasing the heap space by 500M.


Something else is going on.


Unfortunately we need to delete about 200,000 documents on each update,
which would mean adding 10G to the heap space, not to mention the same
number of inserts.


If you are first deleting so you can re-add a newer version of the
document, you don't need to... overwriting older documents based on
the uniqueKeyField is something Solr does for you!
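
(In other words, it should be enough to post the new version of the
document and let Solr replace the old one by uniqueKey. A minimal sketch of
such an update, assuming the example installation's default /update URL and
made-up field names:)

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class AddWithoutDeleteSketch {
        public static void main(String[] args) throws Exception {
            // Default update URL of the example installation; adjust as needed.
            URL url = new URL("http://localhost:8983/solr/update");
            String doc = "<add><doc>"
                       + "<field name=\"id\">12345</field>"
                       + "<field name=\"title\">new version of the document</field>"
                       + "</doc></add>";
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setDoOutput(true);
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
            OutputStream out = conn.getOutputStream();
            out.write(doc.getBytes("UTF-8"));
            out.close();
            // No explicit delete is sent; the existing doc with the same
            // uniqueKey is replaced by Solr.
            System.out.println("HTTP " + conn.getResponseCode());
        }
    }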


Yes, I thought so, too. And in fact I get OOM even if I just submit search
queries.


Is it possible to use a profiler to see where all the memory is going?
It sounds like you may have uncovered a memory leak somewhere.
Also, what OS, JVM, and appserver are you using?


-Yonik