: thanks, but that's what i started with, but it took an even longer time
: and threw this:
:
: Approaching too many values for UnInvertedField faceting on field 'text'
: : bucket size=15560140
: Approaching too many values for UnInvertedField faceting on field 'text'
: : bucket size=15619075
: Exception during facet counts:org.apache.solr.common.SolrException: Too
: many values for UnInvertedField faceting on field text
right ... facet.method=fc is a good default, but cases like full text
faceting can cause it to seriously blow up the memory ... i didn't even
realize it was possible to get it to fail this way, i would have just
expected an OutOfMemoryError.

facet.method=enum is probably your best bet in this situation, precisely
because it does a linear scan over the terms ... it's slower because it's
safer. the one speedup you might be able to get is to ensure you don't
use the filterCache -- that way you don't waste time constantly
caching/overwriting DocSets (see the PS below for the param that controls
this).

and FWIW...

: > If facet search is not the correct approach, i thought about using
: > something like org.apache.lucene.misc.HighFreqTerms, but i'm not sure
: > how to do this in solr. Should i implement a request handler that
: > executes this kind of

HighFreqTerms just looks at the raw docfreq for the terms, nearly
identical to the TermsComponent -- there is no way to deal with your
"subset of documents" requirement using an approach like that.

If the subsets you have to deal with are fixed, finite, and
non-overlapping, using distinct cores for each subset (which you can
aggregate using distributed search when you don't want this type of
query) can also be a wise choice in many situations (ie: if you have a
"books" core and a "movies" core you can search both using distributed
search, or hit the terms component on just one of them to get the top
terms for that core -- examples of both in the PS).

-Hoss
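PS: the filterCache knob for enum faceting is facet.enum.cache.minDf --
terms with a docfreq below that value skip the filterCache, so setting it
higher than the number of docs in your index effectively turns the cache
off for faceting. a rough sketch of the request (the host, port, and the
10000000 cutoff are just assumptions for the example):

  http://localhost:8983/solr/select?q=*:*
    &facet=true
    &facet.field=text
    &facet.method=enum
    &facet.enum.cache.minDf=10000000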
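if you go the per-core route, the TermsComponent gets you the top terms
of one core with no faceting at all -- assuming the stock /terms handler
from the example solrconfig.xml is registered, something like:

  http://localhost:8983/solr/terms?terms=true
    &terms.fl=text
    &terms.limit=10
    &terms.sort=count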
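and aggregating the cores is just an ordinary query with a shards param
listing every core to search (the core names here are made up to match
the books/movies example above):

  http://localhost:8983/solr/books/select?q=your+query
    &shards=localhost:8983/solr/books,localhost:8983/solr/movies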