To make a dictionary with a 'minimum document count', you need to build the dictionary from facets. Faceting will create this for you, but it will allocate memory for every last term. Since facet values come back sorted by count, the last N facet values will have the smallest document counts.
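For example (a sketch; the field name "spell" is made up, but facet.mincount and facet.limit are standard Solr facet parameters), a request like this returns only the terms that appear in at least 10 documents:

http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=spell&facet.mincount=10&facet.limit=-1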
To get term counts for hundreds of millions of terms, I think you need a separate program that walks the terms. It would be very easy to pull the terms and counts, and print the terms with count > N (a rough sketch is appended below the quoted message). Lucene's CheckIndex program gives a nice base for this kind of thing.

On Thu, Aug 26, 2010 at 3:09 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:
>
> : What you're talking about is effectively promoting the spellcheck
> : index to a first-class Solr index, instead of an appendage bolted on
> : the side of an existing core. Given sharding and distributed search,
> : this may be a better design.
>
> Even w/o promoting the spell index to be a "main" index, it still seems
> like the "rebuild" aspect of the SpellCheckComponent could be improved to
> take advantage of regular Lucene IndexReader semantics: don't reopen the
> reader used to serve SpellCheckComponent requests until the "new" index
> is completely built.
>
> I'm actually really surprised that it doesn't work that way right now --
> but I imagine this has to do with the way the SpellCheckComponent deals
> with the SpellChecker abstraction that hides the index -- still, it seems
> like there's room for improvement there.
>
>
> -Hoss
>
> --
> http://lucenerevolution.org/ ... October 7-8, Boston
> http://bit.ly/stump-hoss ... Stump The Chump!

--
Lance Norskog
goks...@gmail.com
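A minimal sketch of the term-walking program described above, using the Lucene 3.x TermEnum API (the class name and command-line arguments are made up for illustration):

import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.store.FSDirectory;

// Walks every term in an index and prints the ones whose document
// frequency exceeds a threshold. Usage: TermCountWalker <indexDir> <N>
public class TermCountWalker {
  public static void main(String[] args) throws Exception {
    File indexDir = new File(args[0]);          // path to the Lucene index
    int minCount = Integer.parseInt(args[1]);   // the "count > N" threshold
    IndexReader reader = IndexReader.open(FSDirectory.open(indexDir), true);
    try {
      TermEnum terms = reader.terms();          // enumerates all terms, in sorted order
      while (terms.next()) {
        Term t = terms.term();
        if (terms.docFreq() > minCount) {
          System.out.println(t.field() + ":" + t.text() + "\t" + terms.docFreq());
        }
      }
      terms.close();
    } finally {
      reader.close();
    }
  }
}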