The doc count for each term is stored directly in the index, with the big caveat that it doesn't take deleted docs into account. That addresses the "get doc count for each term".
"get doc count for each field" is a different question... see below. On Tue, Jun 16, 2009 at 1:57 PM, Ryan McKinley<ryan...@gmail.com> wrote: > Hi- > > I'm trying to use the LukeRequestHandler with an index of ~9 million > docs. I know that counting the top / distinct terms for each field is > expensive and can take a LONG time to return. > > Is there a faster way to check the number of documents for each field? > Currently this gets the doc count for each term: > > if( sfield != null && sfield.indexed() ) { > Query q = qp.parse( fieldName+":[* TO *]" ); > int docCount = searcher.numDocs( q, matchAllDocs ); That looks like it gets the doc count for each field, as opposed to each term. > Looking at it again, that could be replaced with: > > if( sfield != null && sfield.indexed() ) { > Query q = qp.parse( fieldName+":[* TO *]" ); > int docCount = searcher.getDocSet( q ).size(); Correct. Unfortunately it probably won't save you much (one set intersection). I don't (currently) know of a way to get this info quicker. In a specific application, the fastest way would be to index a boolean or another single token for each document that had the field you were interested in.... then count the number of docs for the single token rather than all tokens in the field. -Yonik http://www.lucidimagination.com > Is there any faster option then running a query for each field? > > thanks > ryan >