On 09.08.2012 18:02, Robert Muir wrote:
> On Thu, Aug 9, 2012 at 10:20 AM, tech.vronk <t...@vronk.net> wrote:
>> Hello,
>> I wonder how to figure out the total token count in a collection (per
>> index), i.e. the size of a corpus/collection measured in tokens.
>
> You want to use this statistic, which tells you the number of tokens for
> an indexed field:
> http://lucene.apache.org/core/4_0_0-ALPHA/core/org/apache/lucene/index/Terms.html#getSumTotalTermFreq%28%29
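
For anyone reading along, here is a rough sketch of what that statistic represents. It is plain Java and does not call Lucene at all: the class TokenCountSketch, its methods, and the whitespace tokenizer are hypothetical stand-ins made up for illustration. The idea is that the per-field value is the sum, over every distinct term, of how often that term occurs across all documents, which equals the number of tokens indexed for the field. In Lucene itself the Javadoc says getSumTotalTermFreq() returns -1 when the codec does not store this measure or the field omits term frequencies, so real code should check for that.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Conceptual sketch only: this does NOT use Lucene. It mimics what a
// per-field "sum of total term frequencies" statistic stands for.
final class TokenCountSketch {

    // Hypothetical stand-in for an analyzer: lowercase, split on whitespace,
    // and count how often each term occurs across all field values.
    static Map<String, Long> termFrequencies(List<String> fieldValues) {
        Map<String, Long> freqs = new TreeMap<>();
        for (String value : fieldValues) {
            for (String token : value.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    freqs.merge(token, 1L, Long::sum);
                }
            }
        }
        return freqs;
    }

    // Sum of the per-term totals, i.e. the token count for the field.
    static long sumTotalTermFreq(Map<String, Long> freqs) {
        long sum = 0;
        for (long freq : freqs.values()) {
            sum += freq;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Three tiny "documents" for one field.
        List<String> docs = List.of("the quick fox", "the lazy dog", "The end");
        Map<String, Long> freqs = termFrequencies(docs);
        System.out.println("tokens in field: " + sumTotalTermFreq(freqs)); // prints 8
    }
}
```

To get a figure for a whole index rather than a single field, the same sum would be taken over every indexed field of interest.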
just to say:
thank you, this seems to work well!
matej