To build a dictionary with a 'minimum document count' you need to build
the dictionary from facets. Faceting will compute this for you, but it
will allocate memory for every last term. Since facets come back in
descending count order, the last N facet values will have the smallest
counts.
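
For example, a request along these lines (the field name 'spell' is
just a placeholder) would return only terms that appear in at least 5
documents:

  http://localhost:8983/solr/select?q=*:*&rows=0&facet=true
      &facet.field=spell&facet.limit=-1&facet.mincount=5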

To get term counts for hundreds of millions of terms, I think you need
a separate program that walks the terms. It would be very easy to pull
each term and its document count, and print the terms with count > N.
Lucene's CheckIndex program gives a nice base for this kind of thing.
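
Here is a minimal sketch with the Lucene 3.x TermEnum API; the index
path argument and the threshold are placeholders:

  import java.io.File;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.index.TermEnum;
  import org.apache.lucene.store.FSDirectory;

  public class TermCountDump {
    public static void main(String[] args) throws Exception {
      int minCount = 100;  // only print terms in more than this many docs
      IndexReader reader =
          IndexReader.open(FSDirectory.open(new File(args[0])));
      TermEnum terms = reader.terms();  // walks every term, in order
      while (terms.next()) {
        Term t = terms.term();
        if (terms.docFreq() > minCount) {
          System.out.println(
              t.field() + "\t" + t.text() + "\t" + terms.docFreq());
        }
      }
      terms.close();
      reader.close();
    }
  }

This streams one term at a time, so it never holds the whole term
dictionary in memory the way faceting does.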

On Thu, Aug 26, 2010 at 3:09 PM, Chris Hostetter
<hossman_luc...@fucit.org> wrote:
>
> : What you're talking about is effectively promoting the spellcheck
> : index to a first-class Solr index, instead of an appendage bolted on
> : the side of an existing core. Given sharding and distributed search,
> : this may be a better design.
>
> even w/o promoting the spell index to be a "main" index, it still seems
> like the "rebuild" aspect of SpellCheckComponent could be improved to
> take advantage of regular Lucene IndexReader semantics: don't reopen the
> reader used to serve SpellCheckComponent requests until the "new" index is
> completely built.
>
> I'm actually really surprised that it doesn't work that way right now --
> but I imagine this has to do with the way the SpellCheckComponent deals
> with the SpellChecker abstraction that hides the index -- still, it seems
> like there's room for improvement there.
>
>
> -Hoss
>
> --
> http://lucenerevolution.org/  ...  October 7-8, Boston
> http://bit.ly/stump-hoss      ...  Stump The Chump!
>
>
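
On Hoss's reopen point: the usual Lucene pattern would be something
like the sketch below -- build the new spell index against a separate
Directory, and only open and swap in a reader once that build has
committed. The class and method names here are hypothetical, not the
actual SpellCheckComponent code (and real code would need reference
counting before closing the old reader):

  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.store.Directory;

  public class SpellIndexHolder {
    private volatile IndexReader current;

    public SpellIndexHolder(Directory dir) throws Exception {
      this.current = IndexReader.open(dir);
    }

    public IndexReader reader() {
      return current;  // requests always see a completely built index
    }

    // Call only after the rebuild has committed to 'rebuilt'.
    public void swapAfterRebuild(Directory rebuilt) throws Exception {
      IndexReader fresh = IndexReader.open(rebuilt);
      IndexReader old = current;
      current = fresh;  // swap; in-flight requests keep using 'old'
      old.close();      // simplification: see refcounting note above
    }
  }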



-- 
Lance Norskog
goks...@gmail.com
