An update on this issue:

1) I attached jconsole to my app and monitored the memory usage.
During indexing the memory usage goes up and down, which I think is
normal. The memory remains around the min heap size (4G) during
indexing, but as soon as I run a search the tenured heap usage jumps
up to 6G and stays there. Subsequent searches increase the heap
usage even more, until it reaches the max (8G) - after which everything
(indexing and searching) becomes slow.
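
As an aside, the numbers jconsole shows can also be logged from inside
the JVM with the standard management API, if anyone wants to track this
over time without a GUI. A minimal sketch (the pool names, e.g.
"Tenured Gen" vs. "PS Old Gen", vary by garbage collector):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class HeapWatch {
    public static void main(String[] args) {
        // One line per memory pool; the tenured (old) generation pool
        // is the one that jumped to 6G after the first search.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) continue;  // pool may be invalid/unsupported
            System.out.printf("%-25s used=%,d max=%,d%n",
                    pool.getName(), usage.getUsed(), usage.getMax());
        }
    }
}
```

Calling this periodically (or right after a search) should show which
pool is pinned near its max.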

The search query in this case is a very generic one that goes through
all the cores (4 of them - 800 million records total), finds 400 million
matches, and returns 100 rows.

Does the Solr searcher hold on to references to objects in memory? I
couldn't find any setting suggesting it does, but every search driving
the heap up is definitely suspicious.
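
I don't know Solr's internals well enough to say, but the pattern I
suspect is a cache keyed by the open index reader that holds
per-document arrays for as long as the reader is alive. This is just a
hypothetical sketch of that retention pattern - not Solr's or Lucene's
actual code, and all the names here are made up:

```java
import java.util.HashMap;
import java.util.Map;

public class SearcherCacheSketch {
    // Hypothetical stand-in for a per-reader field cache: entries stay
    // live as long as the reader key is referenced somewhere.
    static final Map<Object, long[]> CACHE = new HashMap<Object, long[]>();

    static long[] getCached(Object reader, int numDocs) {
        long[] values = CACHE.get(reader);
        if (values == null) {
            values = new long[numDocs];  // one slot per document in the index
            CACHE.put(reader, values);   // retained until the reader is dropped
        }
        return values;
    }

    public static void main(String[] args) {
        Object reader = new Object();    // stand-in for an open reader
        long[] first = getCached(reader, 1000);
        long[] second = getCached(reader, 1000);
        System.out.println(first == second);  // true: same array is reused
        System.out.println(CACHE.size());     // 1
    }
}
```

With 800 million documents, even one such array per field would be
multiple gigabytes, which would match the jump we see on first search.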

2) I ran jmap -histo to get the top objects (this is on a smaller
instance with 2G of memory, before running a search - after running
a search I wasn't able to run jmap):

 num     #instances         #bytes  class name
----------------------------------------------
   1:       3890855      222608992  [C
   2:       3891673      155666920  java.lang.String
   3:       3284341      131373640  org.apache.lucene.index.TermInfo
   4:       3334198      106694336  org.apache.lucene.index.Term
   5:           271       26286496  [J
   6:            16       26273936  [Lorg.apache.lucene.index.Term;
   7:            16       26273936  [Lorg.apache.lucene.index.TermInfo;
   8:        320512       15384576  org.apache.lucene.index.FreqProxTermsWriter$PostingList
   9:         10335       11554136  [I

I'm not sure what the first one ([C) is. I also couldn't profile it to
find out what is allocating all those Strings - any ideas?
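
One way to decode the histogram names and sanity-check the sizes:
entries like [C are the JVM's internal names for array types, and
dividing #bytes by #instances gives the average per-instance size.

```java
public class HistoDecode {
    public static void main(String[] args) {
        // JVM array-type names as they appear in jmap histograms:
        System.out.println(char[].class.getName());  // [C
        System.out.println(long[].class.getName());  // [J
        System.out.println(int[].class.getName());   // [I

        // Average per-instance sizes implied by the histogram above:
        System.out.println(222608992L / 3890855);    // ~57 bytes per char[]
        System.out.println(131373640L / 3284341);    // 40 bytes per TermInfo
        System.out.println(106694336L / 3334198);    // 32 bytes per Term
    }
}
```

So the top entry is most likely the backing char arrays of all those
Strings, and the String/TermInfo/Term counts being nearly equal suggests
they all belong to the in-memory term index.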

Any ideas on what the Searcher might be holding on to, and how we can
change that behavior?

Thanks,
-vivek


On Thu, May 14, 2009 at 11:33 AM, vivek sar <vivex...@gmail.com> wrote:
> I don't know if field type has any impact on the memory usage - does it?
>
> Our use cases require complete matches, thus there is no need of any
> analysis in most cases - does it matter in terms of memory usage?
>
> Also, is there any default caching used by Solr if I comment out all
> the caches under query in solrconfig.xml? I also don't have any
> auto-warming queries.
>
> Thanks,
> -vivek
>
> On Wed, May 13, 2009 at 4:24 PM, Erick Erickson <erickerick...@gmail.com> 
> wrote:
>> Warning: I'm waaaay out of my competency range when I comment
>> on SOLR, but I've seen the statement that string fields are NOT
>> tokenized while text fields are, and I notice that almost all of your fields
>> are string type.
>>
>> Would someone more knowledgeable than me care to comment on whether
>> this is at all relevant? Offered in the spirit that sometimes there are
>> things
>> so basic that only an amateur can see them <G>....
>>
>> Best
>> Erick
>>
>> On Wed, May 13, 2009 at 4:42 PM, vivek sar <vivex...@gmail.com> wrote:
>>
>>> Thanks Otis.
>>>
>>> Our use case doesn't require any sorting or faceting. I'm wondering if
>>> I've configured anything wrong.
>>>
>>> I've got a total of 25 fields (15 are indexed and stored, the other 10
>>> are just stored). All my fields are basic data types - which I thought
>>> are not sorted. My id field is the unique key.
>>>
>>> Is there any field here that might be getting sorted?
>>>
>>>  <field name="id" type="long" indexed="true" stored="true"
>>> required="true" omitNorms="true" compressed="false"/>
>>>
>>>   <field name="atmps" type="integer" indexed="false" stored="true"
>>> compressed="false"/>
>>>   <field name="bcid" type="string" indexed="true" stored="true"
>>> omitNorms="true" compressed="false"/>
>>>   <field name="cmpcd" type="string" indexed="true" stored="true"
>>> omitNorms="true" compressed="false"/>
>>>   <field name="ctry" type="string" indexed="true" stored="true"
>>> omitNorms="true" compressed="false"/>
>>>   <field name="dlt" type="date" indexed="false" stored="true"
>>> default="NOW/HOUR"  compressed="false"/>
>>>   <field name="dmn" type="string" indexed="true" stored="true"
>>> omitNorms="true" compressed="false"/>
>>>   <field name="eaddr" type="string" indexed="true" stored="true"
>>> omitNorms="true" compressed="false"/>
>>>   <field name="emsg" type="string" indexed="false" stored="true"
>>> compressed="false"/>
>>>   <field name="erc" type="string" indexed="false" stored="true"
>>> compressed="false"/>
>>>   <field name="evt" type="string" indexed="true" stored="true"
>>> omitNorms="true" compressed="false"/>
>>>   <field name="from" type="string" indexed="true" stored="true"
>>> omitNorms="true" compressed="false"/>
>>>   <field name="lfid" type="string" indexed="true" stored="true"
>>> omitNorms="true" compressed="false"/>
>>>   <field name="lsid" type="string" indexed="true" stored="true"
>>> omitNorms="true" compressed="false"/>
>>>   <field name="prsid" type="string" indexed="true" stored="true"
>>> omitNorms="true" compressed="false"/>
>>>   <field name="rc" type="string" indexed="false" stored="true"
>>> compressed="false"/>
>>>   <field name="rmcd" type="string" indexed="false" stored="true"
>>> compressed="false"/>
>>>   <field name="rmscd" type="string" indexed="false" stored="true"
>>> compressed="false"/>
>>>   <field name="scd" type="string" indexed="true" stored="true"
>>> omitNorms="true" compressed="false"/>
>>>   <field name="sip" type="string" indexed="false" stored="true"
>>> compressed="false"/>
>>>   <field name="ts" type="date" indexed="true" stored="false"
>>> default="NOW/HOUR" omitNorms="true"/>
>>>
>>>
>>>   <!-- catchall field, containing all other searchable text fields
>>>        (implemented via copyField further on in this schema) -->
>>>   <field name="all" type="text_ws" indexed="true" stored="false"
>>> omitNorms="true" multiValued="true"/>
>>>
>>> Thanks,
>>> -vivek
>>>
>>> On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
>>> <otis_gospodne...@yahoo.com> wrote:
>>> >
>>> > Hi,
>>> > Some answers:
>>> > 1) The .tii files in the Lucene index.  When you sort, all distinct
>>> > values for the field(s) used for sorting are held in memory.  Similarly
>>> > for facet fields.  Also Solr caches.
>>> > 2) ramBufferSizeMB dictates, more or less, how much memory Lucene/Solr
>>> > will consume during indexing.  There is no need to commit every 50K
>>> > docs unless you want to trigger snapshot creation.
>>> > 3) see 1) above
>>> >
>>> > 1.5 billion docs per instance, where each doc is circa 1KB?  I doubt
>>> > that's going to fly. :)
>>> >
>>> > Otis
>>> > --
>>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>> >
>>> >
>>> >
>>> > ----- Original Message ----
>>> >> From: vivek sar <vivex...@gmail.com>
>>> >> To: solr-user@lucene.apache.org
>>> >> Sent: Wednesday, May 13, 2009 3:04:46 PM
>>> >> Subject: Solr memory requirements?
>>> >>
>>> >> Hi,
>>> >>
>>> >>   I'm pretty sure this has been asked before, but I couldn't find a
>>> >> complete answer in the forum archive. Here are my questions,
>>> >>
>>> >> 1) When Solr starts up, what does it load into memory? Let's say
>>> >> I have 4 cores, each 50G in size. When Solr comes up, how much
>>> >> of that is loaded into memory?
>>> >>
>>> >> 2) How much memory is required during index time? If I'm committing
>>> >> 50K records at a time (1 record = 1KB) using solrj, how much memory do
>>> >> I need to give Solr?
>>> >>
>>> >> 3) Is there a minimum memory requirement by Solr to maintain a certain
>>> >> size index? Is there any benchmark on this?
>>> >>
>>> >> Here are some of my configuration from solrconfig.xml,
>>> >>
>>> >> 1) 64
>>> >> 2) All the caches (under query tag) are commented out
>>> >> 3) Few others,
>>> >>       a)  true    ==>
>>> >> would this require memory?
>>> >>       b)  50
>>> >>       c) 200
>>> >>       d)
>>> >>       e) false
>>> >>       f)  2
>>> >>
>>> >> The problem we are having is following,
>>> >>
>>> >> I've given Solr 6G of RAM. As the total index size (all cores
>>> >> combined) starts growing, the Solr memory consumption goes up. With 800
>>> >> million documents, I see Solr already taking up all the memory at
>>> >> startup. After that, commits and searches all become slow. We
>>> >> will have a distributed setup with multiple Solr instances (around
>>> >> 8) on four boxes, but our requirement is to have each Solr instance
>>> >> maintain at least around 1.5 billion documents.
>>> >>
>>> >> We are trying to see if we can somehow reduce the Solr memory
>>> >> footprint. If someone can provide pointers on which parameters affect
>>> >> memory and what effect they have, we can then decide whether we want
>>> >> each parameter or not. I'm not sure if there is a minimum memory
>>> >> requirement for Solr to be able to maintain large indexes. I've used
>>> >> Lucene before and that didn't require anything by default - it used up
>>> >> memory only during index and search times - not otherwise.
>>> >>
>>> >> Any help is very much appreciated.
>>> >>
>>> >> Thanks,
>>> >> -vivek
>>> >
>>> >
>>>
>>
>
