I don't know whether the field type has any impact on memory usage - does it? Our use cases require exact matches, so there is no need for any analysis in most cases - does that matter in terms of memory usage?
Also, is there any default caching used by Solr if I comment out all the caches under query in solrconfig.xml? I also don't have any auto-warming queries.

Thanks,
-vivek

On Wed, May 13, 2009 at 4:24 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> Warning: I'm waaaay out of my competency range when I comment
> on SOLR, but I've seen the statement that string fields are NOT
> tokenized while text fields are, and I notice that almost all of your fields
> are string type.
>
> Would someone more knowledgeable than me care to comment on whether
> this is at all relevant? Offered in the spirit that sometimes there are
> things so basic that only an amateur can see them <G>....
>
> Best
> Erick
>
> On Wed, May 13, 2009 at 4:42 PM, vivek sar <vivex...@gmail.com> wrote:
>
>> Thanks Otis.
>>
>> Our use case doesn't require any sorting or faceting. I'm wondering if
>> I've configured anything wrong.
>>
>> I got total of 25 fields (15 are indexed and stored, other 10 are just
>> stored). All my fields are basic data type - which I thought are not
>> sorted. My id field is unique key.
>>
>> Is there any field here that might be getting sorted?
>>
>> <field name="id" type="long" indexed="true" stored="true" required="true" omitNorms="true" compressed="false"/>
>>
>> <field name="atmps" type="integer" indexed="false" stored="true" compressed="false"/>
>> <field name="bcid" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
>> <field name="cmpcd" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
>> <field name="ctry" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
>> <field name="dlt" type="date" indexed="false" stored="true" default="NOW/HOUR" compressed="false"/>
>> <field name="dmn" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
>> <field name="eaddr" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
>> <field name="emsg" type="string" indexed="false" stored="true" compressed="false"/>
>> <field name="erc" type="string" indexed="false" stored="true" compressed="false"/>
>> <field name="evt" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
>> <field name="from" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
>> <field name="lfid" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
>> <field name="lsid" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
>> <field name="prsid" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
>> <field name="rc" type="string" indexed="false" stored="true" compressed="false"/>
>> <field name="rmcd" type="string" indexed="false" stored="true" compressed="false"/>
>> <field name="rmscd" type="string" indexed="false" stored="true" compressed="false"/>
>> <field name="scd" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
>> <field name="sip" type="string" indexed="false" stored="true" compressed="false"/>
>> <field name="ts" type="date" indexed="true" stored="false" default="NOW/HOUR" omitNorms="true"/>
>>
>>
>> <!-- catchall field, containing all other searchable text fields (implemented via copyField further on in this schema -->
>> <field name="all" type="text_ws" indexed="true" stored="false" omitNorms="true" multiValued="true"/>
>>
>> Thanks,
>> -vivek
>>
>> On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
>> >
>> > Hi,
>> > Some answers:
>> > 1) .tii files in the Lucene index. When you sort, all distinct values for the field(s) used for sorting. Similarly for facet fields. Solr caches.
>> > 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will consume during indexing. There is no need to commit every 50K docs unless you want to trigger snapshot creation.
>> > 3) see 1) above
>> >
>> > 1.5 billion docs per instance where each doc is cca 1KB? I doubt that's going to fly. :)
>> >
>> > Otis
>> > --
>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> >
>> >
>> >
>> > ----- Original Message ----
>> >> From: vivek sar <vivex...@gmail.com>
>> >> To: solr-user@lucene.apache.org
>> >> Sent: Wednesday, May 13, 2009 3:04:46 PM
>> >> Subject: Solr memory requirements?
>> >>
>> >> Hi,
>> >>
>> >> I'm pretty sure this has been asked before, but I couldn't find a
>> >> complete answer in the forum archive. Here are my questions,
>> >>
>> >> 1) When solr starts up what does it loads up in the memory? Let's say
>> >> I've 4 cores with each core 50G in size. When Solr comes up how much
>> >> of it would be loaded in memory?
>> >>
>> >> 2) How much memory is required during index time? If I'm committing
>> >> 50K records at a time (1 record = 1KB) using solrj, how much memory do
>> >> I need to give to Solr.
>> >>
>> >> 3) Is there a minimum memory requirement by Solr to maintain a certain
>> >> size index? Is there any benchmark on this?
>> >>
>> >> Here are some of my configuration from solrconfig.xml,
>> >>
>> >> 1) 64
>> >> 2) All the caches (under query tag) are commented out
>> >> 3) Few others,
>> >> a) true ==> would this require memory?
>> >> b) 50
>> >> c) 200
>> >> d)
>> >> e) false
>> >> f) 2
>> >>
>> >> The problem we are having is following,
>> >>
>> >> I've given Solr RAM of 6G. As the total index size (all cores
>> >> combined) start growing the Solr memory consumption goes up. With 800
>> >> million documents, I see Solr already taking up all the memory at
>> >> startup. After that the commits, searches everything become slow. We
>> >> will be having distributed setup with multiple Solr instances (around
>> >> 8) on four boxes, but our requirement is to have each Solr instance at
>> >> least maintain around 1.5 billion documents.
>> >>
>> >> We are trying to see if we can somehow reduce the Solr memory
>> >> footprint. If someone can provide a pointer on what parameters affect
>> >> memory and what effects it has we can then decide whether we want that
>> >> parameter or not. I'm not sure if there is any minimum Solr
>> >> requirement for it to be able maintain large indexes. I've used Lucene
>> >> before and that didn't require anything by default - it used up memory
>> >> only during index and search times - not otherwise.
>> >>
>> >> Any help is very much appreciated.
>> >>
>> >> Thanks,
>> >> -vivek
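
For a rough sense of scale, the figures quoted in this thread can be plugged into a back-of-the-envelope calculation. This is only a sketch: the document counts, the 1KB record size, and the 6G of RAM come from the messages above, while the helper name `raw_data_gb` and the choice of decimal units are assumptions made for illustration. It estimates raw record volume only, not the actual Lucene index size or heap usage.

```python
# Back-of-the-envelope arithmetic using figures quoted in the thread.
# Assumption: decimal units (1 KB = 1,000 bytes, 1 GB = 1,000,000,000 bytes).
# This estimates raw record volume only; real index size and heap use
# depend on the schema, stored fields, and Lucene internals.

KB = 1_000
GB = 1_000_000_000


def raw_data_gb(num_docs: int, bytes_per_doc: int = 1 * KB) -> float:
    """Return the raw size in GB of num_docs records of bytes_per_doc each."""
    return num_docs * bytes_per_doc / GB


heap_gb = 6  # RAM given to Solr, per the original message

for label, docs in [("current", 800_000_000), ("target", 1_500_000_000)]:
    size = raw_data_gb(docs)
    print(f"{label}: {docs:,} docs ~ {size:,.0f} GB raw, "
          f"{size / heap_gb:,.0f}x the {heap_gb} GB heap")
```

Under these assumptions the raw records come to about 800 GB at 800 million documents and about 1,500 GB at 1.5 billion. An index is not held entirely in the heap, so the ratio to 6G only indicates scale, but it does give some context for the doubt expressed about 1.5 billion docs per instance.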