Hi Chris,

The enum option is working for us, with suitable minDf settings. We are able to do faceting with decent speed using this.

Thanks a lot,
Dave

On Fri, Feb 28, 2014 at 9:09 AM, David Miller <davthehac...@gmail.com> wrote:

> Hi Chris,
>
> Thanks for the info. I have looked into the "docValues" option earlier.
> But docValues doesn't support textField and we require textField to enable
> various tokenizers and analyzers (like shingle, pattern filter etc.). We
> require the faceting to be on terms within the text field, not on the field
> as a whole (which string does). A use case is to generate tag clouds from
> social conversations.
>
> The enum option is interesting. From its description it seemed not
> suitable for this purpose. I will try that out and see.
>
> Regards,
> Dave
>
> On Thu, Feb 27, 2014 at 8:24 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:
>
>> : Yes, the memory and cpu spiked for that machine. Another issue I found in
>> : the log was "SolrException: Too many values for UnInvertedField faceting on
>> : field".
>> : I was using the fc method. Will changing the method/params help?
>>
>> the fc/fcs faceting methods really aren't going to work well with
>> something like an indexed full text field where it has to build an
>> UnInvertedField with a huge volume of unique terms.
>>
>> : One thing I don't understand is that, the query was returning only a single
>> : document, but the facet still seems to be having the issue.
>>
>> the data structures for faceting (which are the same for sorting in the
>> single valued case) are optimized for re-use -- regardless of the number of
>> documents that match, the FieldCache & UnInvertedField structures are
>> built up for the entire index. You pay up front with Heap space to get
>> faster speed for your overall requests in return.
>>
>> For your situation, there are two possible solutions to try...
>>
>> 1) facet.method=enum
>>
>> this is the classic alternative for faceting, it's typically much slower
>> than the fc & fcs methods but that's because it lets you trade speed for
>> RAM. One specific thing you have to watch out for is that this will
>> usually use the filterCache, and since you are almost certainly going to
>> have more terms in this facet field than any workable size of your
>> filterCache, there's going to be a lot of wasted time constantly evicting
>> things from that cache -- playing with facet.enum.cache.minDf should help.
>>
>> https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-Thefacet.enum.cache.minDfParameter
>>
>> 2) use docValues="true" on your field (with facet.method=fc or fcs)
>>
>> I haven't done much experimenting with this, particularly in our "facet
>> on full text" type situation, but when you use docValues, in theory,
>> in memory fieldCache and UnInvertedField structures aren't needed --
>> instead much smaller structures are kept in the heap that refer down
>> directly to the DocValue structures memory mapped from disk (which are
>> created when you add/commit to your index -- they don't need "un-inverted"
>> at query time)
>>
>> I, for one, would definitely be interested to know if reindexing your full
>> text field with docValues makes the faceting feasible...
>>
>> https://cwiki.apache.org/confluence/display/solr/DocValues
>>
>> -Hoss
>> http://www.lucidworks.com/
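
For anyone landing on this thread later, here is a rough sketch of the kind of request the enum approach amounts to. It only builds the query URL and does not contact Solr. The host, core name ("conversations"), field name ("text_shingles") and the minDf value of 100 are all made up for illustration; the thread does not say what values were actually used, so treat them as placeholders to tune against your own filterCache size.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_enum_facet_url(base_url, field, min_df, query="*:*", limit=50):
    """Build a Solr select URL that facets on `field` using facet.method=enum.

    Terms whose document frequency is below `min_df` are counted without
    going through the filterCache, which is the knob suggested in the thread.
    """
    params = {
        "q": query,
        "rows": 0,                        # only facet counts are wanted here
        "facet": "true",
        "facet.field": field,
        "facet.method": "enum",
        "facet.enum.cache.minDf": min_df,
        "facet.limit": limit,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical host, core and field names; the minDf value is a guess.
url = build_enum_facet_url(
    "http://localhost:8983/solr/conversations/select",
    "text_shingles",
    min_df=100,
)
print(url)

# Round-trip the query string to confirm the parameters survive encoding.
parsed = parse_qs(urlsplit(url).query)
print(parsed["facet.method"], parsed["facet.enum.cache.minDf"])
```

Raising min_df keeps more low-frequency terms out of the filterCache at the cost of recomputing their counts on every request, so the right value depends on how many distinct terms the field has compared with the cache size.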