The time factor has more to do with the number of distinct values in the field being faceted on than it does with the number of documents. With 1 million documents there are probably a lot more indexed terms in the "contents" field than there are with only 1000 documents.
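To make the point about distinct terms concrete, here is a toy sketch of facet counting by walking every term of a field. It is not Solr's actual code: the function name, the `postings` dict, and the sample data are all hypothetical, and a real index is far more compact than a dict of sets. It only illustrates that the loop runs once per distinct term in the field, no matter how few documents matched the query.

```python
from typing import Dict, Set


def facet_counts_by_term_enumeration(
    postings: Dict[str, Set[int]],  # hypothetical inverted index: term -> ids of docs containing it
    matching_docs: Set[int],        # ids of the docs that matched the query
    mincount: int = 1,
) -> Dict[str, int]:
    """Count, for every term in the field, how many matching docs contain it."""
    counts: Dict[str, int] = {}
    # The loop visits every distinct term in the field, so the work grows
    # with the size of the term dictionary, not with len(matching_docs).
    for term, docs_with_term in postings.items():
        n = len(docs_with_term & matching_docs)
        if n >= mincount:
            counts[term] = n
    return counts


# Tiny made-up index: three terms, five documents.
postings = {
    "dirt": {1, 2, 5},
    "soil": {2, 3},
    "rock": {4},
}
matching_docs = {1, 2}

print(facet_counts_by_term_enumeration(postings, matching_docs))
```

Growing `postings` from three terms to a few million makes the loop that much longer even when `matching_docs` stays at 37 ids, which is the effect described above.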
Because the index is inverted (it maps terms to documents, not documents to terms), there is no efficient way for Solr's faceting code to know just which terms are in the 37 docs that match your query -- it has to check them all.

The good news is that if you can make your filterCache big enough, it won't matter which 37 (or 37,000) documents match your next query where you facet on the contents field -- the facet counts should compute much faster.

For fields where Solr can tell it will have just one value, it can do some optimizations to use the FieldCache instead of iterating over every term in the field you're faceting on, but that wouldn't apply to your "contents" field, since it is tokenized into many terms per document.

: I'm doing a facet search like the following. The content field schema is
:
: <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
: <filter class="solr.StandardFilterFactory"/>
: <filter class="solr.StopFilterFactory"
: ignoreCase="true" words="stopwords.txt"/>
: <filter class="solr.LowerCaseFilterFactory"/>
: <filter class="solr.TrimFilterFactory"/>
:
:
: /solr/select?q=dirt
: field:www.example.com&facet=true&facet.field=content&facet.limit=-1&facet.mincount=1
:
: If I run this on a server with a total of 1000 pages that contain
: pages for www.example.com, it returns in about 1 second, and gives me
: 37 docs, and quite a few facet values.
:
: If I run this same search on a server with over 1,000,000 pages in
: total, including the pages that are in the first example, it returns
: in about 2 minutes! still giving me 37 docs and the same amount of
: facet values.
:
: Seems to me the search should have been constrained to
: field:www.example.com in both cases, so perhaps shouldn't be much
: different in time to execute.
:
: Is there any more information on facet searching that will explain
: what's going on?

-Hoss