Hi Chris,

The enum option is working for us, with suitable minDf settings. We are
able to do faceting with decent speed using this.

Thanks a lot,
Dave


On Fri, Feb 28, 2014 at 9:09 AM, David Miller <davthehac...@gmail.com> wrote:

> Hi Chris,
>
> Thanks for the info. I had looked into the docValues option earlier,
> but docValues doesn't support TextField, and we need a text field so we can
> use various tokenizers and analyzers (like shingle and pattern filters). We
> need the faceting to be on the terms within the text field, not on the
> field value as a whole (which is what a string field gives you). One use
> case is generating tag clouds from social conversations.
>
> The enum option is interesting. From its description it didn't seem
> suitable for this purpose, but I will try it out and see.
>
> Regards,
> Dave
>
>
> On Thu, Feb 27, 2014 at 8:24 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:
>
>>
>> : Yes, the memory and CPU spiked for that machine. Another issue I found in
>> : the log was "SolrException: Too many values for UnInvertedField faceting on
>> : field".
>> : I was using the fc method. Will changing the method/params help?
>>
>> The fc/fcs faceting methods really aren't going to work well with
>> something like an indexed full text field, where Solr has to build an
>> UnInvertedField with a huge volume of unique terms.
>>
>> : One thing I don't understand is that the query was returning only a single
>> : document, but the facet still seems to be having the issue.
>>
>> The data structures for faceting (which are the same ones used for sorting
>> in the single-valued case) are optimized for re-use: regardless of the
>> number of documents that match, the FieldCache & UnInvertedField structures
>> are built up for the entire index. You pay up front in heap space, and get
>> faster overall requests in return.
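
To make the up-front cost concrete, here is a toy Python model (our own simplification, not Solr's actual UnInvertedField code) of a structure that is built once over every document and then reused by each request. The mini index, its terms, and the class name UnInvertedModel are all hypothetical.

```python
# Toy model (NOT Solr's real code) of why fc-style faceting costs heap in
# proportion to the whole index, even when a query matches one document.
from collections import Counter

# Hypothetical mini "index": doc id -> analyzed terms of a full text field.
INDEX = {
    0: ["solr", "facet", "memory"],
    1: ["solr", "heap", "cache"],
    2: ["facet", "enum", "cache"],
}


class UnInvertedModel:
    """Built once over every document, then reused by every request."""

    def __init__(self, index):
        # Per-document term lists cover ALL docs, not just the matching ones.
        self.doc_terms = {doc: list(terms) for doc, terms in index.items()}
        self.unique_terms = {t for terms in index.values() for t in terms}

    def facet_counts(self, matching_docs):
        # Per-request work only touches the matching docs...
        counts = Counter()
        for doc in matching_docs:
            counts.update(set(self.doc_terms[doc]))
        return counts


model = UnInvertedModel(INDEX)  # ...but this up-front cost is index-wide
print(sorted(model.facet_counts({1}).items()))  # query matching one document
# prints [('cache', 1), ('heap', 1), ('solr', 1)]
```

The point of the model is the asymmetry: facet_counts visits only the one matching document, but the per-document term lists were built for all three. With a real full text field, that index-wide structure is what consumes the heap.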
>>
>> For your situation, there are two possible solutions to try...
>>
>> 1) facet.method=enum
>>
>> This is the classic alternative for faceting. It's typically much slower
>> than the fc & fcs methods, but that's because it lets you trade speed for
>> RAM. One specific thing you have to watch out for is that this will
>> usually use the filterCache, and since you are almost certainly going to
>> have more terms in this facet field than any workable size of your
>> filterCache, there's going to be a lot of wasted time constantly evicting
>> things from that cache. Playing with facet.enum.cache.minDf should help.
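
Here is a minimal sketch of what an enum faceting request could look like, written in Python with only the standard library. The parameter names facet.method and facet.enum.cache.minDf are the ones discussed above. The host, core name (collection1), field name (text_shingles), query, and the minDf value of 100 are hypothetical placeholders to tune for your own index. The uses_filter_cache helper reflects our reading of the documented minDf rule and is only illustrative.

```python
# Sketch of a Solr facet request using facet.method=enum.
# Host, core name, field name, query, and minDf value are hypothetical.
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/collection1/select"  # hypothetical


def enum_facet_url(query, field, min_df, limit=50):
    params = [
        ("q", query),
        ("rows", "0"),                 # only the facet counts are wanted
        ("facet", "true"),
        ("facet.field", field),
        ("facet.method", "enum"),      # walk the terms instead of un-inverting
        # Terms matching fewer than min_df docs skip the filterCache.
        ("facet.enum.cache.minDf", str(min_df)),
        ("facet.limit", str(limit)),
        ("wt", "json"),
    ]
    return SOLR_SELECT + "?" + urlencode(params)


def uses_filter_cache(doc_freq, min_df):
    # Illustrative reading of the documented rule: the filterCache is used
    # for a term only when its document frequency is at least minDf.
    return doc_freq >= min_df


print(enum_facet_url("conversation_id:42", "text_shingles", 100))
```

Raising minDf keeps the many low-frequency terms of a full text field out of the filterCache, so the cache holds only the terms frequent enough to be worth caching, which reduces the eviction churn described above.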
>>
>>
>> https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-Thefacet.enum.cache.minDfParameter
>>
>> 2) use docValues="true" on your field (with facet.method=fc or fcs)
>>
>> I haven't done much experimenting with this, particularly in your "facet
>> on full text" type situation, but when you use docValues, in theory, the
>> in-memory FieldCache and UnInvertedField structures aren't needed.
>> Instead, much smaller structures are kept in the heap that refer down
>> directly to the DocValues structures memory-mapped from disk (which are
>> created when you add/commit to your index; they don't need to be
>> "un-inverted" at query time).
>>
>> I, for one, would definitely be interested to know if reindexing your full
>> text field with docValues makes the faceting feasible...
>>
>> https://cwiki.apache.org/confluence/display/solr/DocValues
>>
>> -Hoss
>> http://www.lucidworks.com/
>>
>
>
