Just wanted to add the solution to this problem, in case someone finds
the matching description in the archives (see below).

By reducing the granularity of the timestamp field (stored as an slong)
from seconds to minutes, the number of unique values was cut by a factor
of about 60 (there are only about 500,000 minutes in a year), and memory
use dropped accordingly.
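The fix can be sketched as a simple pre-indexing step: truncate each timestamp to minute resolution before writing it into the sort field, so that up to 60 second-resolution values collapse into one term. This is a minimal illustration with hypothetical names (the original post doesn't show the indexing code):

```java
// Minimal sketch of the granularity reduction described above.
// Assumes "created" holds a Unix timestamp in seconds (an assumption,
// not shown in the original post).
public class TimestampGranularity {

    // Truncate a seconds-resolution timestamp to minute resolution,
    // so far fewer unique terms end up in Lucene's sort cache.
    static long toMinutes(long epochSeconds) {
        return epochSeconds / 60L;
    }

    public static void main(String[] args) {
        long createdSeconds = 1184071337L;        // seconds since the epoch
        long createdMinutes = toMinutes(createdSeconds);
        // Index createdMinutes instead of createdSeconds; sort order
        // within the same minute is lost, which was acceptable here.
        System.out.println(createdMinutes);
    }
}
```

Sorting on the minute-resolution value is still monotonic, so `sort=created+desc` behaves the same apart from ties within a minute.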

Chris


Chris Laux wrote:
> Hi again,
> 
> in the meantime I discovered the use of jmap (I'm not a Java programmer)
> and found that all the memory was being used up by String and char[]
> objects.
> 
> The Lucene docs have the following to say on sorting memory use:
> 
>> For String fields, the cache is larger: in addition to the above
> array, the value of every term in the field is kept in memory. If there
> are many unique terms in the field, this could be quite large.
> 
> (http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/Sort.html)
> 
> I am sorting on the "slong" schema type, which is of course stored as a
> string. The above quote seems to indicate that a field can be something
> other than a string for the purposes of sorting, whereas I took it from
> LiA (Lucene in Action) that everything is a string to Lucene.
> 
> What can I do to make sure the additional memory is not used by every
> unique term? I.e., how do I keep the slong from being a "String field"?
> 
> Cheers,
> Chris
> 
> 
> Chris Laux wrote:
>> Hi all,
>>
>> I've been struggling with this problem for over a month now, and
>> although memory issues have been discussed often, I don't seem to be
>> able to find a fitting solution.
>>
>> The index is only 1.5 GB, but memory use quickly fills the 1 GB heap
>> maximum on a 2 GB machine. This then works fine until
>> auto-warming starts. Switching the latter off altogether is unattractive
>> as it leads to response times of up to 30 s. When auto-warming starts, I
>> get this error:
>>
>>> SEVERE: Error during auto-warming of
>> key:org.apache.solr.search.QueryResultKey
>> @e0b93139:java.lang.OutOfMemoryError: Java heap space
>>
>> Now when I reduce the size of caches (to a fraction of the default
>> settings) and number of warming Searchers (to 2), memory use is not
>> reduced and the problem stays. Only deactivating auto-warming will help.
>> When I set the heap size limit higher (and go into swap space), all the
>> extra memory seems to be used up right away, independently from
>> auto-warming.
>>
>> This all seems to be closely connected to sorting by a numerical field,
>> as switching this off does make memory use a lot more friendly.
>>
>> Is it normal to need that much memory for such a small index?
>>
>> I suspect the problem is in Lucene; would it be better to post on
>> their list?
>>
>> Does anyone know a better way of getting the sorting done?
>>
>> Thanks in advance for your help,
>>
>> Chris
>>
>>
>> This is the field setup in schema.xml:
>>
>> <field name="id" type="long" stored="true" required="true"
>> multiValued="false" />
>> <field name="user-id" type="long" stored="true" required="true"
>> multiValued="false" />
>> <field name="text" type="text" indexed="true" multiValued="false" />
>> <field name="created" type="slong" indexed="true" multiValued="false" />
>>
>> And this is a sample query:
>>
>> select/?q=solr&start=0&rows=20&sort=created+desc
>>
>>
> 
