In trunk (flex) you can ask each segment for its unique term count. But computing the unique term count across all segments is necessarily costly (it requires merging the segments' terms to de-dup them), as Hoss described.
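
To make the cost concrete, here is a minimal sketch in Python (not Lucene code) of why the per-segment counts cannot simply be added up. It assumes each segment exposes its terms as a sorted list of unique strings; the function name `count_unique_terms` and the sample segments are hypothetical, chosen only for illustration.

```python
import heapq
from itertools import groupby


def count_unique_terms(segments):
    """Count distinct terms across segments.

    Each segment is assumed to be a sorted iterable of its own unique terms.
    """
    # Lazy k-way merge: the output is one sorted stream over all segments.
    merged = heapq.merge(*segments)
    # In a sorted stream equal terms are adjacent, so each run counts once.
    return sum(1 for _ in groupby(merged))


# Hypothetical per-segment term lists (already sorted, unique within a segment).
segment_a = ["apple", "banana", "cherry"]
segment_b = ["banana", "cherry", "date"]
segment_c = ["apple", "elderberry"]
segments = [segment_a, segment_b, segment_c]

# Cheap: each segment already knows its own count, but the sum double-counts.
per_segment_total = sum(len(s) for s in segments)

# Costly: every term of every segment has to be visited to de-dup.
unique_total = count_unique_terms(segments)

print(per_segment_total, unique_total)
```

The sum of the per-segment counts is only an upper bound on the true count. The exact number needs a pass over every term in every segment, which is the expensive part.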

Mike

On Tue, Jul 27, 2010 at 12:27 PM, Burton-West, Tom <tburt...@umich.edu> wrote:
> Hi Jason,
>
> Are you looking for the total number of unique terms or total number of term
> occurrences?
>
> CheckIndex reports both, but does a bunch of other work so is probably not
> the fastest.
>
> If you are looking for total number of term occurrences, you might look at
> contrib/org/apache/lucene/misc/HighFreqTerms.java.
>
> If you are just looking for the total number of unique terms, I wonder if
> there is some low level API that would allow you to just access the in-memory
> representation of the tii file and then multiply the number of terms in it by
> your indexDivisor (default 128). I haven't dug in to the code so I don't
> actually know how the tii file gets loaded into a data structure in memory.
> If there is API access, it seems like this might be the quickest way to get
> the number of unique terms. (Of course you would have to do this for each
> segment).
>
> Tom
> -----Original Message-----
> From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
> Sent: Monday, July 26, 2010 8:39 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Total number of terms in an index?
>
>
> : Sorry, like the subject, I mean the total number of terms.
>
> it's not stored anywhere, so the only way to fetch it is to actually
> iterate all of the terms and count them (that's why LukeRequestHandler is
> slow to compute this particular value)
>
> If i remember right, someone mentioned at one point that flex would let
> you store data about stuff like this in your index as part of the segment
> writing, but frankly i'm still not sure how that will help -- because
> unless your index is fully optimized, you still have to iterate the terms
> in each segment to 'de-dup' them.
>
>
> -Hoss
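
The tii-based estimate Tom suggests above comes down to simple arithmetic, sketched here in Python. This is not a Lucene API: the function name `estimate_segment_terms` is hypothetical, and the sketch assumes the in-memory terms index holds every 128th term of a segment, starting with the first. Under that assumption the entry count only pins the true term count down to a range.

```python
def estimate_segment_terms(index_entries, interval=128):
    """Return (low, high) bounds on a segment's unique term count.

    Assumes the terms index samples every `interval`-th term, starting
    with the first, so n entries cover between (n - 1) * interval + 1
    and n * interval terms.
    """
    if index_entries == 0:
        return (0, 0)
    low = (index_entries - 1) * interval + 1
    high = index_entries * interval
    return (low, high)


# Hypothetical segment whose terms index holds 10 entries.
low, high = estimate_segment_terms(10)
print(low, high)
```

The estimate is still per segment, so adding the estimates across segments overcounts any term that appears in more than one segment.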