Re: Luke / get doc count for each term

Yonik Seeley Tue, 16 Jun 2009 13:57:46 -0700

The doc count for each term is stored directly in the index, with the big
caveat that it doesn't take deleted docs into account.  That addresses
the "get doc count for each term" part.
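
To illustrate the deleted-docs caveat, here is a minimal sketch using a toy
in-memory inverted index.  It is not Lucene's or Solr's API; the class and
method names (TermDocFreqSketch, docFreq, liveDocFreq) are made up for
illustration, and it assumes deletes only mark a doc without rewriting the
postings.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model only: hypothetical names, not the Lucene/Solr API.
public class TermDocFreqSketch {
    // term -> ids of the documents that contain it (a posting list)
    private final Map<String, Set<Integer>> postings = new TreeMap<>();
    // ids marked as deleted; postings are left untouched (assumption)
    private final Set<Integer> deleted = new HashSet<>();

    public void add(int docId, String... terms) {
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    public void delete(int docId) {
        deleted.add(docId);
    }

    // Stored per-term count: cheap to read, but still includes deleted docs.
    public int docFreq(String term) {
        return postings.getOrDefault(term, Collections.emptySet()).size();
    }

    // Live count: has to walk the posting list and skip deleted docs.
    public int liveDocFreq(String term) {
        int live = 0;
        for (int docId : postings.getOrDefault(term, Collections.emptySet())) {
            if (!deleted.contains(docId)) {
                live++;
            }
        }
        return live;
    }

    public static void main(String[] args) {
        TermDocFreqSketch idx = new TermDocFreqSketch();
        idx.add(0, "solr");
        idx.add(1, "solr", "luke");
        idx.add(2, "luke");
        idx.delete(1);
        // Stored count vs. live count for the term "solr"
        System.out.println(idx.docFreq("solr") + " " + idx.liveDocFreq("solr"));
    }
}
```

The stored count is a single lookup, while the live count costs a pass over
the posting list, which is the trade-off behind the caveat.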

"get doc count for each field" is a different question... see below.

On Tue, Jun 16, 2009 at 1:57 PM, Ryan McKinley<ryan...@gmail.com> wrote:
> Hi-
>
> I'm trying to use the LukeRequestHandler with an index of ~9 million
> docs.  I know that counting the top / distinct terms for each field is
> expensive and can take a LONG time to return.
>
> Is there a faster way to check the number of documents for each field?
>  Currently this gets the doc count for each term:
>
>      if( sfield != null && sfield.indexed() ) {
>        Query q = qp.parse( fieldName+":[* TO *]" );
>        int docCount = searcher.numDocs( q, matchAllDocs );

That looks like it gets the doc count for each field, as opposed to each term.

> Looking at it again, that could be replaced with:
>
>      if( sfield != null && sfield.indexed() ) {
>        Query q = qp.parse( fieldName+":[* TO *]" );
>        int docCount = searcher.getDocSet( q ).size();

Correct.  Unfortunately it probably won't save you much (one set intersection).
I don't (currently) know of a way to get this info quicker.

In a specific application, the fastest way would be to index a boolean
or another single token for each document that has the field you are
interested in, then count the number of docs for that single token
rather than for all tokens in the field.
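
A rough sketch of that marker-token idea, again as a toy in-memory model
rather than real Solr/Lucene code.  The marker field name "has_field" and
all class and method names here are hypothetical; the point is only that
the per-field count becomes the size of one posting list instead of a
union over every term in the field.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: hypothetical names, not the Lucene/Solr API.
public class FieldMarkerSketch {
    // Hypothetical extra field holding one token per populated field.
    static final String MARKER_FIELD = "has_field";

    // field -> term -> ids of documents containing that term in that field
    private final Map<String, Map<String, Set<Integer>>> index = new HashMap<>();

    // Index the terms of one field of one document, plus a single marker token.
    public void addField(int docId, String field, String... terms) {
        for (String term : terms) {
            post(field, term, docId);
        }
        if (terms.length > 0) {
            post(MARKER_FIELD, field, docId);
        }
    }

    private void post(String field, String term, int docId) {
        index.computeIfAbsent(field, f -> new HashMap<>())
             .computeIfAbsent(term, t -> new HashSet<>())
             .add(docId);
    }

    // Slow path: union the posting lists of every term in the field.
    public int countByUnion(String field) {
        Set<Integer> docs = new HashSet<>();
        for (Set<Integer> posting : index.getOrDefault(field, Collections.emptyMap()).values()) {
            docs.addAll(posting);
        }
        return docs.size();
    }

    // Fast path: size of the single posting list for the marker token.
    public int countByMarker(String field) {
        return index.getOrDefault(MARKER_FIELD, Collections.emptyMap())
                    .getOrDefault(field, Collections.emptySet())
                    .size();
    }

    public static void main(String[] args) {
        FieldMarkerSketch idx = new FieldMarkerSketch();
        idx.addField(0, "title", "solr", "luke");
        idx.addField(1, "title", "lucene");
        idx.addField(1, "author", "ryan");
        idx.addField(2, "author", "yonik");
        idx.addField(3, "title", "solr");
        // Both approaches should agree on the number of docs per field.
        System.out.println(idx.countByUnion("title") + " " + idx.countByMarker("title"));
        System.out.println(idx.countByUnion("author") + " " + idx.countByMarker("author"));
    }
}
```

In a real index the marker count would carry the same deleted-docs caveat
as any other per-term doc count.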

-Yonik
http://www.lucidimagination.com

> Is there any faster option than running a query for each field?
>
> thanks
> ryan
>
