On Mon, Aug 3, 2009 at 2:43 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:

> > I'm getting the following warning in my logs:
> >
> > 2009-08-03 13:41:40,114 [http-127.0.0.1-8080-1] WARN
> > org.apache.solr.core.SolrCore - Approaching too many values for
> > UnInvertedField faceting on field 'originaltext' : bucket size=15802492
> >
> > What's the impact of that?  If the number of values (number of unique
> > terms for that field, or some other "values"?) exceeds that limit, will
> > faceting for that field go back to a different technique and be slower,
> > or...?
>
> It will throw an exception.
>
> This method of faceting wasn't really designed for big full-text fields.
> The enum method should work better for this... try something like the
> following:
>
> f.originaltext.facet.method=enum
> facet.enum.cache.minDf=10000
>
> -Yonik
> http://www.lucidimagination.com
>

Hmm, that's a hard thing to sell to the user and my boss, as it makes the
query time go from nearly always being sub-second (frequently less than 60
ms) to ranging up to nearly 4 seconds for a new query not already in the
cache.  (My test was with 100 facets being requested, which may be
reasonable, as one reason to facet on a full-text field is to provide a
dynamic word cloud.)

How can I mitigate the time it takes with the enum method?  Do I need to ask
for more facet values in my facet-warming query (I set facet.limit to 1, as
it didn't seem to matter to the FieldValueCache)?  And/or do I need to raise
the autowarmCount on the FilterCache?  If speed is the primary concern
rather than memory, should I bother with the minDf setting?

I guess I should update my code to use the enum method on all the fields
that are likely to risk crossing this line.  Should I be looking at the
termInstances property on the fields that are displayed in the
FieldValueCache on the stats page, and figuring those on the order of 10
million are likely to grow past the limit?
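
To make that concrete, here is a rough sketch of the kind of change I have
in mind, written in Python just for brevity.  The helper name, the 10
million cutoff, and the termInstances numbers are all my own assumptions
for illustration (the cutoff is a guess, not a documented Solr limit); the
facet parameters are the ones suggested above.

```python
from urllib.parse import urlencode

# Assumption: my own rough cutoff, not a documented Solr limit.
TERM_INSTANCES_THRESHOLD = 10_000_000


def enum_facet_params(term_instances, threshold=TERM_INSTANCES_THRESHOLD,
                      min_df=None):
    """Return (name, value) pairs forcing facet.method=enum on risky fields.

    term_instances maps field name -> termInstances as read off the stats
    page (hypothetical helper; values are entered by hand here).
    """
    params = []
    for field, count in sorted(term_instances.items()):
        if count >= threshold:
            # Per-field override, same form as f.originaltext.facet.method
            params.append((f"f.{field}.facet.method", "enum"))
    # Only send minDf when at least one field actually uses enum.
    if params and min_df is not None:
        params.append(("facet.enum.cache.minDf", str(min_df)))
    return params


# Hypothetical numbers, for illustration only.
stats = {"originaltext": 12_000_000, "author": 120_000}

query = [("q", "*:*"), ("facet", "true"), ("facet.limit", "100")]
query += [("facet.field", name) for name in sorted(stats)]
query += enum_facet_params(stats, min_df=10000)

print(urlencode(query))
```

Fields under the cutoff get no override, so they keep whatever faceting
method they use today.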

-- 
Stephen Duncan Jr
www.stephenduncanjr.com
