I don't know if field type has any impact on the memory usage - does it?

Our use cases require complete matches, so most fields need no analysis
at all - does that make a difference to memory usage?
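
For context, the difference between an un-analyzed and an analyzed type in
schema.xml looks roughly like the sketch below. The two definitions follow
the stock example schema shipped with Solr and are an assumption about this
setup, not a copy of it: a StrField value is indexed verbatim as a single
term, while a TextField value goes through whatever analyzer chain its type
declares.

```xml
<!-- Assumed to match the stock example schema.xml; not copied from this installation. -->

<!-- Not analyzed: the whole field value becomes one indexed term. -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

<!-- Analyzed: the value is split on whitespace into separate terms. -->
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```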

Also, is there any default caching used by Solr if I comment out all
the caches under query in solrconfig.xml? I also don't have any
auto-warming queries.
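
For reference, the caches in question are the ones declared under the query
element. A sketch of that section with all three commented out, assuming the
element and class names from the stock example solrconfig.xml; the sizes are
illustrative and not taken from this setup:

```xml
<query>
  <!-- Hypothetical values, following the stock example solrconfig.xml. -->
  <!--
  <filterCache      class="solr.LRUCache" size="512" initialSize="512" autowarmCount="128"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>
  <documentCache    class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  -->
</query>
```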

Thanks,
-vivek

On Wed, May 13, 2009 at 4:24 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> Warning: I'm waaaay out of my competency range when I comment
> on SOLR, but I've seen the statement that string fields are NOT
> tokenized while text fields are, and I notice that almost all of your fields
> are string type.
>
> Would someone more knowledgeable than me care to comment on whether
> this is at all relevant? Offered in the spirit that sometimes there are
> things so basic that only an amateur can see them <G>....
>
> Best
> Erick
>
> On Wed, May 13, 2009 at 4:42 PM, vivek sar <vivex...@gmail.com> wrote:
>
>> Thanks Otis.
>>
>> Our use case doesn't require any sorting or faceting. I'm wondering if
>> I've configured anything wrong.
>>
>> I've got a total of 25 fields (15 are indexed and stored, the other 10 are
>> just stored). All my fields are basic data types - which I thought are not
>> sorted. My id field is the unique key.
>>
>> Is there any field here that might be getting sorted?
>>
>>  <field name="id" type="long" indexed="true" stored="true"
>> required="true" omitNorms="true" compressed="false"/>
>>
>>   <field name="atmps" type="integer" indexed="false" stored="true"
>> compressed="false"/>
>>   <field name="bcid" type="string" indexed="true" stored="true"
>> omitNorms="true" compressed="false"/>
>>   <field name="cmpcd" type="string" indexed="true" stored="true"
>> omitNorms="true" compressed="false"/>
>>   <field name="ctry" type="string" indexed="true" stored="true"
>> omitNorms="true" compressed="false"/>
>>   <field name="dlt" type="date" indexed="false" stored="true"
>> default="NOW/HOUR"  compressed="false"/>
>>   <field name="dmn" type="string" indexed="true" stored="true"
>> omitNorms="true" compressed="false"/>
>>   <field name="eaddr" type="string" indexed="true" stored="true"
>> omitNorms="true" compressed="false"/>
>>   <field name="emsg" type="string" indexed="false" stored="true"
>> compressed="false"/>
>>   <field name="erc" type="string" indexed="false" stored="true"
>> compressed="false"/>
>>   <field name="evt" type="string" indexed="true" stored="true"
>> omitNorms="true" compressed="false"/>
>>   <field name="from" type="string" indexed="true" stored="true"
>> omitNorms="true" compressed="false"/>
>>   <field name="lfid" type="string" indexed="true" stored="true"
>> omitNorms="true" compressed="false"/>
>>   <field name="lsid" type="string" indexed="true" stored="true"
>> omitNorms="true" compressed="false"/>
>>   <field name="prsid" type="string" indexed="true" stored="true"
>> omitNorms="true" compressed="false"/>
>>   <field name="rc" type="string" indexed="false" stored="true"
>> compressed="false"/>
>>   <field name="rmcd" type="string" indexed="false" stored="true"
>> compressed="false"/>
>>   <field name="rmscd" type="string" indexed="false" stored="true"
>> compressed="false"/>
>>   <field name="scd" type="string" indexed="true" stored="true"
>> omitNorms="true" compressed="false"/>
>>   <field name="sip" type="string" indexed="false" stored="true"
>> compressed="false"/>
>>   <field name="ts" type="date" indexed="true" stored="false"
>> default="NOW/HOUR" omitNorms="true"/>
>>
>>
>>   <!-- catchall field, containing all other searchable text fields
>> (implemented
>>        via copyField further on in this schema  -->
>>   <field name="all" type="text_ws" indexed="true" stored="false"
>> omitNorms="true" multiValued="true"/>
>>
>> Thanks,
>> -vivek
>>
>> On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
>> <otis_gospodne...@yahoo.com> wrote:
>> >
>> > Hi,
>> > Some answers:
>> > 1) The .tii files in the Lucene index.  When you sort, all distinct values
>> > for the field(s) used for sorting are loaded as well.  Similarly for facet
>> > fields.  Plus the Solr caches.
>> > 2) ramBufferSizeMB dictates, more or less, how much memory Lucene/Solr
>> > will consume during indexing.  There is no need to commit every 50K docs
>> > unless you want to trigger snapshot creation.
>> > 3) see 1) above
>> >
>> > 1.5 billion docs per instance where each doc is cca 1KB?  I doubt that's
>> going to fly. :)
>> >
>> > Otis
>> > --
>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> >
>> >
>> >
>> > ----- Original Message ----
>> >> From: vivek sar <vivex...@gmail.com>
>> >> To: solr-user@lucene.apache.org
>> >> Sent: Wednesday, May 13, 2009 3:04:46 PM
>> >> Subject: Solr memory requirements?
>> >>
>> >> Hi,
>> >>
>> >>   I'm pretty sure this has been asked before, but I couldn't find a
>> >> complete answer in the forum archive. Here are my questions,
>> >>
>> >> 1) When Solr starts up, what does it load into memory? Let's say
>> >> I have 4 cores, each 50G in size. When Solr comes up, how much
>> >> of that would be loaded into memory?
>> >>
>> >> 2) How much memory is required at index time? If I'm committing
>> >> 50K records at a time (1 record = 1KB) using solrj, how much memory do
>> >> I need to give to Solr?
>> >>
>> >> 3) Is there a minimum memory requirement by Solr to maintain a certain
>> >> size index? Is there any benchmark on this?
>> >>
>> >> Here is some of my configuration from solrconfig.xml,
>> >>
>> >> 1) 64
>> >> 2) All the caches (under query tag) are commented out
>> >> 3) Few others,
>> >>       a)  true    ==>
>> >> would this require memory?
>> >>       b)  50
>> >>       c) 200
>> >>       d)
>> >>       e) false
>> >>       f)  2
>> >>
>> >> The problem we are having is the following,
>> >>
>> >> I've given Solr 6G of RAM. As the total index size (all cores
>> >> combined) starts growing, the Solr memory consumption goes up. With 800
>> >> million documents, I see Solr already taking up all the memory at
>> >> startup. After that, commits and searches all become slow. We
>> >> will have a distributed setup with multiple Solr instances (around
>> >> 8) on four boxes, but our requirement is for each Solr instance to
>> >> maintain at least around 1.5 billion documents.
>> >>
>> >> We are trying to see if we can somehow reduce the Solr memory
>> >> footprint. If someone can provide a pointer on which parameters affect
>> >> memory and what effect each has, we can then decide whether we want that
>> >> parameter or not. I'm not sure if Solr has any minimum memory
>> >> requirement to be able to maintain large indexes. I've used Lucene
>> >> before and that didn't require anything by default - it used up memory
>> >> only during index and search times - not otherwise.
>> >>
>> >> Any help is very much appreciated.
>> >>
>> >> Thanks,
>> >> -vivek
>> >
>> >
>>
>
