Thanks Otis. Our use case doesn't require any sorting or faceting. I'm wondering if I've configured anything wrong.

I have a total of 25 fields (15 are indexed and stored, the other 10 are just stored). All my fields are basic data types, which I thought are not sorted. My id field is the unique key.

Is there any field here that might be getting sorted?

<field name="id" type="long" indexed="true" stored="true" required="true" omitNorms="true" compressed="false"/>
<field name="atmps" type="integer" indexed="false" stored="true" compressed="false"/>
<field name="bcid" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
<field name="cmpcd" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
<field name="ctry" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
<field name="dlt" type="date" indexed="false" stored="true" default="NOW/HOUR" compressed="false"/>
<field name="dmn" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
<field name="eaddr" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
<field name="emsg" type="string" indexed="false" stored="true" compressed="false"/>
<field name="erc" type="string" indexed="false" stored="true" compressed="false"/>
<field name="evt" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
<field name="from" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
<field name="lfid" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
<field name="lsid" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
<field name="prsid" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
<field name="rc" type="string" indexed="false" stored="true" compressed="false"/>
<field name="rmcd" type="string" indexed="false" stored="true" compressed="false"/>
<field name="rmscd" type="string" indexed="false" stored="true" compressed="false"/>
<field name="scd" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
<field name="sip" type="string" indexed="false" stored="true" compressed="false"/>
<field name="ts" type="date" indexed="true" stored="false" default="NOW/HOUR" omitNorms="true"/>

<!-- catchall field, containing all other searchable text fields (implemented via copyField further on in this schema) -->
<field name="all" type="text_ws" indexed="true" stored="false" omitNorms="true" multiValued="true"/>

Thanks,
-vivek

On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
>
> Hi,
> Some answers:
> 1) .tii files in the Lucene index. When you sort, all distinct values for
> the field(s) used for sorting. Similarly for facet fields. Solr caches.
> 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will consume
> during indexing. There is no need to commit every 50K docs unless you want
> to trigger snapshot creation.
> 3) see 1) above
>
> 1.5 billion docs per instance where each doc is cca 1KB? I doubt that's
> going to fly. :)
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
>> From: vivek sar <vivex...@gmail.com>
>> To: solr-user@lucene.apache.org
>> Sent: Wednesday, May 13, 2009 3:04:46 PM
>> Subject: Solr memory requirements?
>>
>> Hi,
>>
>> I'm pretty sure this has been asked before, but I couldn't find a
>> complete answer in the forum archive. Here are my questions,
>>
>> 1) When solr starts up what does it loads up in the memory? Let's say
>> I've 4 cores with each core 50G in size. When Solr comes up how much
>> of it would be loaded in memory?
>>
>> 2) How much memory is required during index time? If I'm committing
>> 50K records at a time (1 record = 1KB) using solrj, how much memory do
>> I need to give to Solr.
>>
>> 3) Is there a minimum memory requirement by Solr to maintain a certain
>> size index? Is there any benchmark on this?
>>
>> Here are some of my configuration from solrconfig.xml,
>>
>> 1) 64
>> 2) All the caches (under query tag) are commented out
>> 3) Few others,
>> a) true ==>
>> would this require memory?
>> b) 50
>> c) 200
>> d)
>> e) false
>> f) 2
>>
>> The problem we are having is following,
>>
>> I've given Solr RAM of 6G. As the total index size (all cores
>> combined) start growing the Solr memory consumption goes up. With 800
>> million documents, I see Solr already taking up all the memory at
>> startup. After that the commits, searches everything become slow. We
>> will be having distributed setup with multiple Solr instances (around
>> 8) on four boxes, but our requirement is to have each Solr instance at
>> least maintain around 1.5 billion documents.
>>
>> We are trying to see if we can somehow reduce the Solr memory
>> footprint. If someone can provide a pointer on what parameters affect
>> memory and what effects it has we can then decide whether we want that
>> parameter or not. I'm not sure if there is any minimum Solr
>> requirement for it to be able maintain large indexes. I've used Lucene
>> before and that didn't require anything by default - it used up memory
>> only during index and search times - not otherwise.
>>
>> Any help is very much appreciated.
>>
>> Thanks,
>> -vivek
>
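
For a rough sense of scale, the figures quoted above (50K records per commit, about 1KB per record, 1.5 billion documents per instance, 6G of RAM) can be turned into raw data volumes with a few lines of arithmetic. This is only a sketch: the constant and function names are made up for illustration, it assumes 1 KB = 1024 bytes and exactly 1 KB per document, and it says nothing about actual index size or how much heap Solr needs, both of which depend on the schema and the queries.

```python
# Back-of-envelope sizing for the figures quoted in this thread.
# Assumptions (not from Solr itself): 1 KB = 1024 bytes, every document is
# exactly 1 KB of raw data. Real index size and heap usage will differ.

DOC_SIZE_BYTES = 1024               # "1 record = 1KB"
DOCS_PER_COMMIT = 50_000            # records committed at a time
DOCS_PER_INSTANCE = 1_500_000_000   # target documents per Solr instance
HEAP_BYTES = 6 * 1024**3            # 6G of RAM given to Solr


def raw_bytes(num_docs: int, doc_size: int = DOC_SIZE_BYTES) -> int:
    """Raw (pre-index) data volume in bytes for num_docs documents."""
    return num_docs * doc_size


# Raw data in one commit batch, in MiB.
batch_mib = raw_bytes(DOCS_PER_COMMIT) / 1024**2

# Raw data held by one instance at the target document count, in GiB.
instance_gib = raw_bytes(DOCS_PER_INSTANCE) / 1024**3

# How many times larger the raw data is than the configured heap.
data_to_heap_ratio = raw_bytes(DOCS_PER_INSTANCE) / HEAP_BYTES

print(f"raw data per commit batch: {batch_mib:.1f} MiB")
print(f"raw data per instance:     {instance_gib:.1f} GiB")
print(f"raw data / heap:           {data_to_heap_ratio:.0f}x")
```

Under these assumptions a commit batch is small compared with the heap, while the raw data per instance is far larger than it, so the per-instance target is the number to look at first.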