Hi Chris,

Thanks for the info. I looked into the "docValues" option earlier, but
docValues doesn't support TextField, and we need TextField so we can use
various tokenizers and analysis filters (shingle, pattern filter, etc.). We
need the faceting to be on the terms within the text field, not on the
field value as a whole (which is what a string field does). One use case is
generating tag clouds from social conversations.

The enum option is interesting. From its description it did not seem
suitable for this purpose, but I will try it out and see.
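
For reference, here is a minimal sketch of the request I plan to test,
written in Python only to spell out the parameters. The host, the core name
(collection1), the field name (conversation_text), and the minDf value of 25
are placeholders I made up for illustration, not values from our setup.

```python
from urllib.parse import urlencode


def build_enum_facet_query(field, min_df=25, limit=50):
    """Build a Solr query string that facets on `field` with facet.method=enum.

    `min_df` is passed as facet.enum.cache.minDf: terms that match fewer
    documents than this should bypass the filterCache. The defaults here are
    arbitrary starting points, not tuned values.
    """
    params = {
        "q": "*:*",
        "rows": 0,                          # only the facet counts are needed
        "facet": "true",
        "facet.field": field,
        "facet.method": "enum",             # walk the terms instead of un-inverting
        "facet.enum.cache.minDf": min_df,
        "facet.limit": limit,
        "facet.mincount": 1,
        "wt": "json",
    }
    return urlencode(params)


# Hypothetical field, host, and core names.
query = build_enum_facet_query("conversation_text")
url = "http://localhost:8983/solr/collection1/select?" + query
print(url)
```

The idea would be to start with a moderate minDf and adjust it while
watching the filterCache eviction counts.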

Regards,
Dave

On Thu, Feb 27, 2014 at 8:24 PM, Chris Hostetter
<hossman_luc...@fucit.org> wrote:

>
> : Yes, the memory and CPU spiked for that machine. Another issue I found in
> : the log was "SolrException: Too many values for UnInvertedField faceting
> : on field".
> : I was using the fc method. Will changing the method/params help?
>
> the fc/fcs faceting methods really aren't going to work well with
> something like an indexed full text field where it has to build an
> UnInvertedField with a huge volume of unique terms.
>
> : One thing I don't understand is that the query was returning only a
> : single document, but the facet still seems to be having the issue.
>
> the data structures for faceting (which are the same for sorting in the
> single valued case) are optimized for re-use -- regardless of the number of
> documents that match, the FieldCache & UnInvertedField structures are
> built up for the entire index.  You pay up front with Heap space to get
> faster speed for your overall requests in return.
>
> For your situation, there are two possible solutions to try...
>
> 1) facet.method=enum
>
> this is the classic alternative for faceting, it's typically much slower
> than the fc & fcs methods but that's because it lets you trade speed for
> RAM.  One specific thing you have to watch out for is that this will
> usually use the filterCache, and since you are almost certainly going to
> have more terms in this facet field than any workable size of your
> filterCache, there's going to be a lot of wasted time constantly evicting
> things from that cache -- playing with facet.enum.cache.minDf should help.
>
>
> https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-Thefacet.enum.cache.minDfParameter
>
> 2) use docValues="true" on your field (with facet.method=fc or fcs)
>
> I haven't done much experimenting with this, particularly in our "facet
> on full text" type situation, but when you use docValues, in theory,
> in memory fieldCache and UnInvertedField structures aren't needed --
> instead much smaller structures are kept in the heap that refer down
> directly to the DocValue structures memory mapped from disk (which are
> created when you add/commit to your index -- they don't need "un-inverted"
> at query time)
>
> I, for one, would definitely be interested to know if reindexing your full
> text field with docValues makes the faceting feasible...
>
> https://cwiki.apache.org/confluence/display/solr/DocValues
>
> -Hoss
> http://www.lucidworks.com/
>
