Some more info,

  Profiling the heap dump shows
"org.apache.lucene.index.ReadOnlySegmentReader" as the biggest object
- taking up almost 80% of total memory (6G) - see the attached
screenshot for a smaller dump. There are some norms objects - I'm not
sure where they are coming from, as I've set omitNorms=true for all
indexed fields.

I also noticed that if I run a generic query that hits 100 million
records and then follow up with a specific query that hits only 1
record, the second query still causes an increase in heap usage.
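One pattern that would explain this is lazy, first-use loading of per-field data: the first query that touches a field pays a cost proportional to the index size, no matter how few documents match. A purely illustrative Python sketch of that mechanism (my guess, not Solr's actual code; the field names are just examples from the schema quoted below):

```python
# Illustrative sketch (not Solr/Lucene code): why a 1-hit query can still
# grow the heap. The first query touching a field materializes a
# per-document array whose size depends on the index size, not the hit count.
class LazyFieldCache:
    def __init__(self, num_docs):
        self.num_docs = num_docs
        self.arrays = {}  # field name -> per-doc array, loaded on first use

    def get(self, field):
        if field not in self.arrays:
            # Cost is O(num_docs) bytes no matter how few docs match.
            self.arrays[field] = bytearray(self.num_docs)
        return self.arrays[field]

    def resident_bytes(self):
        return sum(len(a) for a in self.arrays.values())

cache = LazyFieldCache(1000)  # tiny doc count so the sketch runs quickly
cache.get("ts")               # a "generic" query touches field ts
cache.get("bcid")             # a "specific" 1-hit query touches field bcid
print(cache.resident_bytes()) # 2000: both fields fully materialized
```

Under that model, the second (1-hit) query grows the heap simply because it is the first to touch its field.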

It looks like a few bytes are being loaded into memory for each
document. I've checked the schema - all indexed fields have
omitNorms=true and all caches are commented out - so I'm still looking
for what else might put things in memory that the GC doesn't collect.
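To put "a few bytes per document" in perspective, some back-of-the-envelope arithmetic (using the common rule of thumb that Lucene norms cost 1 byte per document per normed field - these are estimates, not measurements):

```python
# Back-of-the-envelope: per-document costs at 800 million documents.
# Assumes the usual rule of thumb of 1 byte/doc/field for Lucene norms;
# treat these as estimates, not measurements.
num_docs = 800_000_000

def gib(n_bytes):
    return n_bytes / 2**30

norms_one_field = num_docs * 1  # 1 byte per doc per normed field
print(f"norms, 1 field : {gib(norms_one_field):.2f} GiB")
print(f"norms, 8 fields: {gib(norms_one_field * 8):.2f} GiB")
```

At this scale even one stray normed field is about 0.75 GiB, and eight would be about 6 GiB - the same order as the jump described above.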

I also saw https://issues.apache.org/jira/browse/SOLR-1111 for Solr
1.4 (which I'm using). I'm not sure if that can cause any problem. I
do use range queries for dates - would that have any effect?

Any other ideas?

Thanks,
-vivek

On Thu, May 14, 2009 at 8:38 PM, vivek sar <vivex...@gmail.com> wrote:
> Thanks Mark.
>
> I checked all the items you mentioned,
>
> 1) I've set omitNorms=true for all my indexed fields (stored-only
> fields, I guess, don't matter)
> 2) I've tried commenting out all caches in the solrconfig.xml, but
> that doesn't help much
> 3) I've tried commenting out the first- and new-searcher listener
> settings in the solrconfig.xml - the only way that helps is that
> memory usage doesn't spike at startup, but that's only because there
> is no auto-warming query to run. I also noticed that commenting out
> the searchers slows down all other queries to Solr.
> 4) I don't have any sort or facet in my queries
> 5) I'm not sure how to change the "Lucene term interval" from Solr -
> is there a way to do that?
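On item 5: in Lucene itself this is IndexWriter.setTermIndexInterval() (default 128); whether Solr 1.4 exposes it in solrconfig.xml, I'm not sure. Its memory effect is roughly linear - a rough Python model, where the per-entry byte cost and term count are guesses for illustration only:

```python
# Rough model of Lucene's in-memory term index: every Nth term of the
# term dictionary is held in RAM (N = termIndexInterval, default 128).
# The ~120 bytes/entry (Term + TermInfo + String + char[]) is a guess
# for illustration, not a measured figure.
def term_index_bytes(total_terms, interval=128, bytes_per_entry=120):
    indexed_terms = total_terms // interval
    return indexed_terms * bytes_per_entry

total_terms = 420_000_000  # hypothetical unique terms across the index
default = term_index_bytes(total_terms)                  # interval 128
raised = term_index_bytes(total_terms, interval=1024)    # interval 1024
print(default // raised)  # raising the interval 8x cuts the index ~8x
```

The trade-off is slower term lookups, since a lookup scans up to `interval` terms from the nearest indexed entry.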
>
> I've been playing around with this memory issue all day and have
> found that it's the search that's hogging the memory. Any time there
> is a search over all the records (800 million), the heap consumption
> jumps by 5G. This makes me think there has to be some configuration in
> Solr that's causing some terms per document to be loaded into memory.
>
> I've posted my settings several times on this forum, but no one has
> been able to pinpoint what configuration might be causing this. If
> someone is interested, I can attach the solrconfig and schema files
> as well. Here are the settings again, under the query tag,
>
> <query>
>   <maxBooleanClauses>1024</maxBooleanClauses>
>   <enableLazyFieldLoading>true</enableLazyFieldLoading>
>   <queryResultWindowSize>50</queryResultWindowSize>
>   <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
>   <HashDocSet maxSize="3000" loadFactor="0.75"/>
>   <useColdSearcher>false</useColdSearcher>
>   <maxWarmingSearchers>2</maxWarmingSearchers>
> </query>
>
> and schema,
>
>  <field name="id" type="long" indexed="true" stored="true"
> required="true" omitNorms="true" compressed="false"/>
>
>  <field name="atmps" type="integer" indexed="false" stored="true"
> compressed="false"/>
>  <field name="bcid" type="string" indexed="true" stored="true"
> omitNorms="true" compressed="false"/>
>  <field name="cmpcd" type="string" indexed="true" stored="true"
> omitNorms="true" compressed="false"/>
>  <field name="ctry" type="string" indexed="true" stored="true"
> omitNorms="true" compressed="false"/>
>  <field name="dlt" type="date" indexed="false" stored="true"
> default="NOW/HOUR"  compressed="false"/>
>  <field name="dmn" type="string" indexed="true" stored="true"
> omitNorms="true" compressed="false"/>
>  <field name="eaddr" type="string" indexed="true" stored="true"
> omitNorms="true" compressed="false"/>
>  <field name="emsg" type="string" indexed="false" stored="true"
> compressed="false"/>
>  <field name="erc" type="string" indexed="false" stored="true"
> compressed="false"/>
>  <field name="evt" type="string" indexed="true" stored="true"
> omitNorms="true" compressed="false"/>
>  <field name="from" type="string" indexed="true" stored="true"
> omitNorms="true" compressed="false"/>
>  <field name="lfid" type="string" indexed="true" stored="true"
> omitNorms="true" compressed="false"/>
>  <field name="lsid" type="string" indexed="true" stored="true"
> omitNorms="true" compressed="false"/>
>  <field name="prsid" type="string" indexed="true" stored="true"
> omitNorms="true" compressed="false"/>
>  <field name="rc" type="string" indexed="false" stored="true"
> compressed="false"/>
>  <field name="rmcd" type="string" indexed="false" stored="true"
> compressed="false"/>
>  <field name="rmscd" type="string" indexed="false" stored="true"
> compressed="false"/>
>  <field name="scd" type="string" indexed="true" stored="true"
> omitNorms="true" compressed="false"/>
>  <field name="sip" type="string" indexed="false" stored="true"
> compressed="false"/>
>  <field name="ts" type="date" indexed="true" stored="false"
> default="NOW/HOUR" omitNorms="true"/>
>
>  <!-- catchall field, containing all other searchable text fields
>       (implemented via copyField further on in this schema) -->
>  <field name="all" type="text_ws" indexed="true" stored="false"
> omitNorms="true" multiValued="true"/>
>
> Any help is greatly appreciated.
>
> Thanks,
> -vivek
>
> On Thu, May 14, 2009 at 6:22 PM, Mark Miller <markrmil...@gmail.com> wrote:
>> 800 million docs is on the high side for modern hardware.
>>
>> If even one field has norms on, you're talking almost 800 MB right there. And
>> then if another Searcher is brought up while the old one is serving (which
>> happens when you update)? Doubled.
>>
>> Your best bet is to distribute across a couple of machines.
>>
>> To minimize memory, you'd want to turn caching off or down, not facet, not
>> sort, turn off all norms, and possibly get at the Lucene term index interval
>> and raise it. Drop the on-deck searchers setting. Even then, 800
>> million... time to distribute, I'd think.
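Mark's 800 MB figure, spelled out (a sketch assuming the usual 1 byte/doc/field rule of thumb for Lucene norms):

```python
# Sketch of the norms arithmetic: norms cost roughly 1 byte per document
# per normed field, and while a new searcher warms alongside the old
# one, both copies are resident at once.
num_docs = 800_000_000
norms_mb = num_docs * 1 / 10**6  # one normed field, in MB
print(round(norms_mb))           # ~800 MB for a single normed field
print(round(norms_mb * 2))       # doubled during searcher overlap
```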
>>
>> vivek sar wrote:
>>>
>>> Some update on this issue,
>>>
>>> 1) I attached jconsole to my app and monitored the memory usage.
>>> During indexing the memory usage goes up and down, which I think is
>>> normal. The memory remains around the min heap size (4G) for
>>> indexing, but as soon as I run a search the tenured heap usage jumps
>>> to 6G and remains there. Subsequent searches increase the heap
>>> usage even more until it reaches the max (8G) - after which
>>> everything (indexing and searching) becomes slow.
>>>
>>> The search query is a very generic one in this case, which goes
>>> through all the cores (4 of them - 800 million records), finds 400
>>> million matches, and returns 100 rows.
>>>
>>> Does the Solr searcher hold references to objects in memory? I
>>> couldn't find any setting that would tell me it does, but every
>>> search causing the heap to go up is definitely suspicious.
>>>
>>> 2) I ran jmap -histo to get the top objects (this is on a smaller
>>> instance with 2G memory, before running a search - after running a
>>> search I wasn't able to run jmap),
>>>
>>>  num     #instances         #bytes  class name
>>> ----------------------------------------------
>>>   1:       3890855      222608992  [C
>>>   2:       3891673      155666920  java.lang.String
>>>   3:       3284341      131373640  org.apache.lucene.index.TermInfo
>>>   4:       3334198      106694336  org.apache.lucene.index.Term
>>>   5:           271       26286496  [J
>>>   6:            16       26273936  [Lorg.apache.lucene.index.Term;
>>>   7:            16       26273936  [Lorg.apache.lucene.index.TermInfo;
>>>   8:        320512       15384576
>>> org.apache.lucene.index.FreqProxTermsWriter$PostingList
>>>   9:         10335       11554136  [I
>>>
>>> I'm not sure what the first one ([C) is. I couldn't profile it to
>>> see where all the Strings are being allocated - any ideas?
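For what it's worth, [C is the JVM's internal name for char[] - the arrays backing the Strings. Summing the term-dictionary-related rows of the histogram above:

```python
# Summing the term-index-related rows from the jmap histogram above.
# [C is the JVM's name for char[] (the backing arrays of the Strings).
histo_bytes = {
    "[C": 222_608_992,
    "java.lang.String": 155_666_920,
    "org.apache.lucene.index.TermInfo": 131_373_640,
    "org.apache.lucene.index.Term": 106_694_336,
}
total = sum(histo_bytes.values())
print(f"{total / 2**20:.0f} MiB")  # total for these four rows
```

That's roughly 590 MiB on a 2 GB instance, much of which is presumably the in-memory term index (Term/TermInfo plus their Strings) - exactly what raising the term index interval would shrink.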
>>>
>>> Any ideas on what Searcher might be holding on and how can we change
>>> that behavior?
>>>
>>> Thanks,
>>> -vivek
>>>
