Re: Lucene FieldCache - Out of memory exception

Rahul R Wed, 02 May 2012 22:28:48 -0700

Jack,
Yes, the queries work fine till I hit the OOM. The fields that start with
S_* are strings, F_* are floats, I_* are ints and so on. The dynamic field
definitions from schema.xml :
   <dynamicField name="S_*" type="string"  indexed="true" stored="true" omitNorms="true"/>
   <dynamicField name="I_*" type="sint"    indexed="true" stored="true" omitNorms="true"/>
   <dynamicField name="F_*" type="sfloat"  indexed="true" stored="true" omitNorms="true"/>
   <dynamicField name="D_*" type="date"    indexed="true" stored="true" omitNorms="true"/>
   <dynamicField name="B_*" type="boolean" indexed="true" stored="true" omitNorms="true"/>


*Each FieldCache will be an array with maxdoc entries (your total number of
documents - 1.4 million) times the size of the field value or whatever a
string reference is in your JVM*
So if I understand correctly, every field (dynamic or normal) will have its
own field cache. Will the size of the field cache for any field be (maxDocs
* sizeOfField)? If the field has only 100 unique values, will it occupy
(100 * sizeOfField) or will it still be (maxDocs * sizeOfField)?
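
As a rough illustration of the sizing quoted above, the arithmetic can be
sketched as follows. This is a back-of-envelope estimate under stated
assumptions, not a measurement: it assumes each per-document entry takes 4
bytes (an int ordinal, or an int or float value), that a string field
additionally keeps one String per unique term, and that each String costs
about 40 bytes of overhead plus 2 bytes per character. The class name
FieldCacheEstimate and its constants are made up for this sketch.

```java
// Back-of-envelope estimate only; all byte sizes are assumptions, not measured values.
final class FieldCacheEstimate {

    // Assumed size of one per-document slot (int ordinal, int value or float value).
    static final int BYTES_PER_DOC_ENTRY = 4;

    // Assumed fixed overhead of one String object, excluding its characters.
    static final int ASSUMED_STRING_OVERHEAD_BYTES = 40;

    /** Per-document array: one slot per document, however often values repeat. */
    static long perDocArrayBytes(long maxDoc) {
        return maxDoc * BYTES_PER_DOC_ENTRY;
    }

    /** Table of unique terms for a string field (2 bytes per char plus assumed overhead). */
    static long termTableBytes(long uniqueTerms, int avgChars) {
        return uniqueTerms * (ASSUMED_STRING_OVERHEAD_BYTES + 2L * avgChars);
    }

    /** Total for one string field: per-document array plus unique-term table. */
    static long stringFieldBytes(long maxDoc, long uniqueTerms, int avgChars) {
        return perDocArrayBytes(maxDoc) + termTableBytes(uniqueTerms, avgChars);
    }

    public static void main(String[] args) {
        long maxDoc = 1400000L;   // documents in the index (figure from this thread)
        long uniqueTerms = 300L;  // average unique terms per facet field
        int avgChars = 20;        // average value length in characters
        int facetFields = 440;    // fields enabled for faceting

        long perField = stringFieldBytes(maxDoc, uniqueTerms, avgChars);
        System.out.println("per field:  " + perField + " bytes");
        System.out.println("all fields: " + (perField * facetFields) + " bytes");
    }
}
```

Under these assumptions the per-document array dominates: about 5.6 MB per
field whether the field has 100 or 1000 unique values, and about 2.47 GB if
all 440 facet fields were cached at some point.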

*Roughly what is the typical or average length of one of your facet field
values? And, on average, how many unique terms are there within a typical
faceted field?*
Field values vary from 10 to 30 characters in length, averaging perhaps 20.
The number of unique terms within a faceted field varies from 100 to 1000,
averaging about 300. How will the number of unique terms affect performance?

*3 GB sounds like it might not be enough for such heavy use of faceting. It
is probably not the 50-70 number, but the 440 or accumulated number across
many queries that pushes the memory usage up*
I am using the 32-bit jdk1.5.0_14. With a 32-bit JDK, I believe the heap
cannot be made any larger than this.

*When you hit OOM, what does the Solr admin stats display say for
FieldCache?*
I don't have Solr deployed as a separate web app. All Solr jar files are
present in my webapp's WEB-INF\lib directory. I use EmbeddedSolrServer. So
is there a way I can get the information that the admin page would show?
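
One JVM-level fallback, sketched here under assumptions, is to log heap usage
from inside the webapp with the standard java.lang.management API (available
since JDK 5). This does not reproduce the FieldCache section of the admin
stats page; it only shows how much heap is in use, for example before and
after a faceted query on fields that have not been faceted on before. The
class name HeapSnapshot is made up for this sketch.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: reports JVM heap usage only, not FieldCache contents.
final class HeapSnapshot {

    /** Heap currently in use, in bytes, as reported by the JVM. */
    static long usedHeapBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    /** Maximum heap in bytes, or -1 if the JVM does not define a maximum. */
    static long maxHeapBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getMax();
    }

    /** One-line summary suitable for writing to the application log. */
    static String describe() {
        return "heap used=" + usedHeapBytes() + " bytes, max=" + maxHeapBytes() + " bytes";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Logging HeapSnapshot.describe() around faceted queries would at least show
whether heap use climbs as new facet fields are queried for the first time.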

Thank you for your time.

-Rahul


On Wed, May 2, 2012 at 5:19 PM, Jack Krupansky <j...@basetechnology.com> wrote:

> The FieldCache gets populated the first time a given field is referenced
> as a facet and then will stay around forever. So, as additional queries get
> executed with different facet fields, the number of FieldCache entries will
> grow.
>
> If I understand what you have said, these faceted queries do work
> initially, but after a while they stop working with OOM, correct?
>
> The size of a single FieldCache depends on the field type. Since you are
> using dynamic fields, it depends on your "dynamicField" types - which you
> have not told us about. From your query I see that your fields start with
> "S_" and "F_" - presumably you have dynamic field types "S_*" and "F_*"?
> Are they strings, integers, floats, or what?
>
> Each FieldCache will be an array with maxdoc entries (your total number of
> documents - 1.4 million) times the size of the field value or whatever a
> string reference is in your JVM.
>
> String fields will take more space than numeric fields for the FieldCache,
> since a separate table is maintained for the unique terms in that field.
> Roughly what is the typical or average length of one of your facet field
> values? And, on average, how many unique terms are there within a typical
> faceted field?
>
> If you can convert many of these faceted fields to simple integers the
> size should go down dramatically, but that depends on your application.
>
> 3 GB sounds like it might not be enough for such heavy use of faceting. It
> is probably not the 50-70 number, but the 440 or accumulated number across
> many queries that pushes the memory usage up.
>
> When you hit OOM, what does the Solr admin stats display say for
> FieldCache?
>
> -- Jack Krupansky
>
> -----Original Message----- From: Rahul R
> Sent: Wednesday, May 02, 2012 2:22 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Lucene FieldCache - Out of memory exception
>
>
> Here is one sample query that I picked up from the log file :
>
> q=*%3A*&fq=Category%3A%223__107%22&
> fq=S_P1540477699%3A%22MICROCIRCUIT%2C+LINE+TRANSCEIVERS%22&
> rows=0&facet=true&facet.mincount=1&facet.limit=2&
> facet.field=S_C1503120369&facet.field=S_P1406389942&
> facet.field=S_P1430116878&facet.field=S_P1430116881&
> facet.field=S_P1406453552&facet.field=S_P1406451296&
> facet.field=S_P1406452465&facet.field=S_C2968809156&
> facet.field=S_P1406389980&facet.field=S_P1540477699&
> facet.field=S_P1406389982&facet.field=S_P1406389984&
> facet.field=S_P1406451284&facet.field=S_P1406389926&
> facet.field=S_P1424886581&facet.field=S_P2017662632&
> facet.field=F_P1946367021&facet.field=S_P1430116884&
> facet.field=S_P2017662620&facet.field=F_P1406451304&
> facet.field=F_P1406451306&facet.field=F_P1406451308&
> facet.field=S_P1500901421&facet.field=S_P1507138990&
> facet.field=I_P1406452433&facet.field=I_P1406453565&
> facet.field=I_P1406452463&facet.field=I_P1406453573&
> facet.field=I_P1406451324&facet.field=I_P1406451288&
> facet.field=S_P1406451282&facet.field=S_P1406452471&
> facet.field=S_P1424886605&facet.field=S_P1946367015&
> facet.field=S_P1424886598&facet.field=S_P1946367018&
> facet.field=S_P1406453556&facet.field=S_P1406389932&
> facet.field=S_P2017662623&facet.field=S_P1406450978&
> facet.field=F_P1406452455&facet.field=S_P1406389972&
> facet.field=S_P1406389974&facet.field=S_P1406389986&
> facet.field=F_P1946367027&facet.field=F_P1406451294&
> facet.field=F_P1406451286&facet.field=F_P1406451328&
> facet.field=S_P1424886593&facet.field=S_P1406453567&
> facet.field=S_P2017662629&facet.field=S_P1406453571&
> facet.field=F_P1946367030&facet.field=S_P1406453569&
> facet.field=S_P2017662626&facet.field=S_P1406389978&
> facet.field=F_P1946367024
>
> My primary question here is, can Solr handle this kind of query with so
> many facet fields? I have tried using both enum and fc for facet.method and
> there is no improvement with either.
>
> Appreciate any help on this. Thank you.
>
> - Rahul
>
>
> On Mon, Apr 30, 2012 at 2:53 PM, Rahul R <rahul.s...@gmail.com> wrote:
>
>> Hello,
>> I am using solr 1.3 with jdk 1.5.0_14 and weblogic 10MP1 application
>> server on Solaris. I use embedded solr server. More details :
>> Number of docs in solr index : 1.4 million
>> Physical size of index : 640MB
>> Total number of fields in the index : 700 (99% of these are dynamic
>> fields)
>> Total number of fields enabled for faceting : 440
>> Avg number of facet fields participating in a faceted query : 50-70
>> Total RAM allocated to weblogic appserver : 3GB (max possible)
>>
>> In a multi user environment with 3 users using this application for a
>> period of around 40 minutes, the application runs out of memory. Analysis
>> of the heap dump shows that almost 85% of the memory is retained by the
>> FieldCache. Now I understand that the field cache is out of our control,
>> but would appreciate some suggestions on how to handle this issue.
>>
>> Some questions on this front :
>> - some mail threads on this forum seem to indicate that there could be
>> some connection between having dynamic fields and usage of FieldCache. Is
>> this true ? Most of the fields in my index are dynamic fields.
>> - as mentioned above, most of my faceted queries could have around 50-70
>> facet fields (I would do SolrQuery.addFacetField() for around 50-70 fields
>> per query). Could this be the source of the problem ? Is this too high for
>> solr to support ?
>> - Initially, I had a facet.sort defined in solrconfig.xml. Since
>> FieldCache builds up on sorting, I even removed the facet.sort and tried,
>> but no respite. The behavior is same as before.
>> - The document id that I have for each document is quite big (around 50
>> characters on average). Can this be a problem ? I reduced this to around
>> 15 characters and tried but still there is no improvement.
>> - Can the size of the data be a problem ? But on this forum, I see many
>> users talking of more than 100 million documents in their index. I have
>> only 1.4 million with physical size of 640MB. The physical server on which
>> this application is running, has sufficient RAM and CPU.
>> - What gets stored in the FieldCache ? Is it the entire document or just
>> the document Id ?
>>
>>
>> Any help is much appreciated. Thank you.
>>
>> regards
>> Rahul
>>
>>
>>
>>
>
