On Wed, Jun 30, 2010 at 6:19 PM, Robert Petersen <rober...@buy.com> wrote:
> Most of these hundreds of facet fields have tens of values, but a couple have
> thousands. Is thousands of different values too many for enum, or is that
> still ok?  If so, I could apply it across the board to all of the fields...

enum can still handle thousands, but is often slower (and remember to
increase the size of your filterCache, which will now see greater
usage).
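
As a rough sizing aid for that filterCache: with facet.method=enum, each
unique term of a faceted field can occupy one filterCache entry, and the
cache's size setting counts entries. The entry count to plan for is therefore
on the order of the total number of unique terms across the enum-faceted
fields. Below is a minimal Python sketch of that arithmetic. The field names
and term counts are made-up placeholders, not numbers from this thread.

```python
def filter_cache_entries(terms_per_field):
    """Estimate how many filterCache entries enum faceting could use.

    terms_per_field maps a field name to its number of unique terms.
    Assumption: one cache entry per unique term, which is an upper bound.
    """
    return sum(terms_per_field.values())


# Hypothetical layout: 300 small fields with 50 unique terms each.
small_fields = {f"{i}_contentAttributeToken": 50 for i in range(300)}

print(filter_cache_entries(small_fields))
```

Fields switched to facet.method=fc are left out of the sum, since they no
longer go through the filterCache for faceting.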

I would do facet.method=enum for the default and then override that
for those few fields with thousands of unique terms via
f.123_contentAttributeToken.facet.method=fc
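
For illustration, here is a small Python sketch that builds such a request:
enum as the default, with a per-field override to fc. The host, port, handler
path, and the field name 45_contentAttributeToken are assumptions made for
this example. Only facet.method and f.<field>.facet.method come from the
advice above.

```python
from urllib.parse import urlencode


def facet_query(facet_fields, fc_fields):
    """Build a query string that facets with enum by default and
    overrides the method to fc for the listed high-cardinality fields."""
    params = [("q", "*:*"), ("facet", "true"), ("facet.method", "enum")]
    for name in facet_fields:
        params.append(("facet.field", name))
    for name in fc_fields:
        # Per-field override: f.<fieldname>.facet.method=fc
        params.append((f"f.{name}.facet.method", "fc"))
    return urlencode(params)


query = facet_query(
    ["45_contentAttributeToken", "123_contentAttributeToken"],  # first name is hypothetical
    ["123_contentAttributeToken"],
)
# Hypothetical Solr endpoint; adjust to the actual deployment.
print("http://localhost:8983/solr/select?" + query)
```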

-Yonik
http://www.lucidimagination.com

> -----Original Message-----
> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
> Sent: Wednesday, June 30, 2010 1:38 PM
> To: solr-user@lucene.apache.org
> Subject: Re: OOM on uninvert field request
>
> On Tue, Jun 29, 2010 at 7:32 PM, Robert Petersen <rober...@buy.com> wrote:
>> Hello, I am trying to find the right max and min heap settings for Java 1.6
>> on a 20GB index with 8 million docs, running the 1.6_018 JVM with Solr 1.4.
>> I currently have Java set to an even 4GB (export JAVA_OPTS="-Xmx4096m
>> -Xms4096m") for both min and max, which is doing pretty well, but we are
>> occasionally still getting the below OOM errors.  We're running on dual
>> quad-core Xeons with 16GB memory installed.
>>
>> Is the memSize mentioned in the INFO for the uninvert in bytes? Does
>> memSize=29604020 mean 29MB?
>
> Yes.
>
>> We have a few hundred of these fields and they contain ints used as IDs, so
>> I guess they could eat all the memory once they are all uninverted, after we
>> apply load and enough queries are performed.  Does the field type matter?
>> Would int be better than string if these are lookup IDs sparsely populated
>> across the index?
>
> No, using UnInvertedField faceting, the fieldType won't matter much at
> all for the space it takes up.
>
> The key here is that it looks like the number of unique terms in these
> fields is low - you would probably do much better with
> facet.method=enum (which iterates over terms rather than documents).
>
> -Yonik
> http://www.lucidimagination.com
>
