On Thu, Aug 9, 2012 at 10:20 AM, tech.vronk <t...@vronk.net> wrote:
> Hello,
>
> I wonder how to figure out the total token count in a collection (per
> index), i.e. the size of a corpus/collection measured in tokens.
>
You want this statistic, which gives the total number of tokens in an indexed field:

http://lucene.apache.org/core/4_0_0-ALPHA/core/org/apache/lucene/index/Terms.html#getSumTotalTermFreq%28%29

It is per field, so for the size of a whole index, add it up over the fields you care about.

-- lucidimagination.com
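Per the Javadoc, getSumTotalTermFreq() is the sum of TermsEnum.totalTermFreq() over every term in the field. It returns -1 if the codec does not store the measure or if the field omits term frequencies and positions, and it does not account for deleted documents. Summing each distinct term's total frequency counts every token the analyzer emitted exactly once. The toy, stdlib-only model below shows that arithmetic. It is not Lucene code: the class and method names are hypothetical, and whitespace splitting stands in for a real analyzer.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model of one indexed field (hypothetical, NOT Lucene's API).
 * It tallies each term's total frequency across all documents, then
 * sums those tallies: the quantity Terms.getSumTotalTermFreq() reports.
 */
public class TokenCount {

    // Assumption: plain whitespace tokenization. A real Lucene analyzer
    // may split, normalize, or drop tokens (e.g. stopwords) differently.
    static String[] tokenize(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // term -> total number of occurrences in the whole collection
    static Map<String, Long> totalTermFreqs(List<String> docs) {
        Map<String, Long> freqs = new HashMap<>();
        for (String doc : docs) {
            for (String token : tokenize(doc)) {
                freqs.merge(token, 1L, Long::sum);
            }
        }
        return freqs;
    }

    // Summing totalTermFreq over all terms yields the field's token count
    static long sumTotalTermFreq(List<String> docs) {
        long sum = 0;
        for (long f : totalTermFreqs(docs).values()) {
            sum += f;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<String> corpus = List.of("the cat sat", "the dog sat down");
        // 5 distinct terms, but 3 + 4 = 7 tokens in total
        System.out.println(sumTotalTermFreq(corpus)); // prints 7
    }
}
```

The distinction that matters for corpus size is tokens versus terms. In the example, "the" and "sat" each occur twice, so the field has 5 distinct terms but 7 tokens. The per-term sum recovers the 7.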