Hello,

> On Mon, 2007-07-30 at 00:30 -0700, Chris Hostetter wrote:
> > : Is it possible to get the values from the ValueSource (or from
> > : getFieldCacheCounts) sorted by its natural order (from lowest to
> > : highest values)?
> >
> > well, an inverted term index is already a data structure listing terms
> > from lowest to highest and the associated documents -- so if you want to
> > iterate from low to high between a range and find matching docs you should
> > just use the TermEnum
>
> Ok. Unfortunately I don't see how I can get a TermEnum for a specific
> field (e.g. "price")... I tried
>
>   TermEnum te = searcher.getReader().terms(new Term(field, ""));
>
> but this returns also terms for several other fields.

Correct, see
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/index/IndexReader.html#terms()

> Is it possible at all to get a TermEnum for a specific field?

AFAIK not directly. Normally, I use something like:

  IndexReader reader = searcher.getReader();
  // terms(t) positions the enum at the first term >= t, so the loop has
  // to stop itself once the enumeration runs into the next field
  TermEnum terms = reader.terms(new Term(field, ""));
  // equals() rather than ==, in case the field variable is not interned
  while (terms.term() != null && terms.term().field().equals(field)) {
    // do things
    terms.next();
  }

> Then if I had this TermEnum, how can I check if a Term is in my
> DocSet? In other words, I would like to read Terms for a specific
> field from my DocSet - so that I could determine all price terms
> for my DocSet.

Is your DocSet some sort of filter? If so, you can fill a new filter
inside the while loop. Before the loop:

  BitSet docFilter = new BitSet(reader.maxDoc());
  TermDocs docs = reader.termDocs();

and in the while loop:

  // position docs on the documents of the current term
  docs.seek(terms);
  while (docs.next()) {
    docFilter.set(docs.doc());
  }

If your DocSet is not a BitSet, you might be able to construct one for it.

Regards Ard

> Is there a way to achieve this?
>
> Thanx in advance,
> cheers,
> Martin
>
> > -- the whole point of the FieldCache (and
> > FieldCacheSource) is to have a "reverse inverted index" so you can quickly
> > fetch the indexed value if you know the docId.
> >
> > perhaps you should elaborate a little more on what it is you are trying to
> > do so we can help you figure out how to do it more efficiently ... i know
> > you mentioned computing price ranges in your first message ... but you
> > also didn't post any clear code about that part of your problem, just that
> > the *other* part of your code that iterated over every doc was too slow
> > ... perhaps you shouldn't be iterating over every doc to figure out your
> > ranges .. perhaps you can iterate over the terms themselves?
> >
> > hang on ... rereading your first message i just noticed something i
> > definitely didn't spot before...
> >
> > >> Fairly long: getFieldCacheCounts for the cat field takes ~70 ms
> > >> for the second request, while reading prices takes ~600 ms.
> >
> > ...i clearly missed this, and fixated on your assertion that your reading
> > of field values took longer than the stock methods -- but you're not just
> > comparing the time needed by different methods, you're also timing
> > different fields.
> >
> > this actually makes a lot of sense since there are probably a lot fewer
> > unique values for the cat field, so there are a lot fewer discrete values
> > to deal with when computing counts.
> >
> > -Hoss
>
> --
> Martin Grotzke
> http://www.javakaffee.de/blog/
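
PS: the "seek to the first term of the field, stop at the next field, collect the docs" idea from the reply above can be tried out without Lucene. The sketch below is a stand-in, not Lucene code: a sorted map plays the role of the term dictionary (keys are field name plus term text, so all terms of one field sit next to each other), one java.util.BitSet per term plays the role of TermDocs, and the class FieldTermScan, its methods and the sample data are all made up for illustration. It returns the terms of one field that occur in a given DocSet, in ascending term order, together with the number of matching docs.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Stand-in for a term dictionary, NOT Lucene code. Keys are field + NUL + text,
// so the map is ordered by field name first and by term text second.
class FieldTermScan {

    static final char SEP = '\0';

    // Helper: build a doc id set from a list of ids.
    static BitSet docs(int... ids) {
        BitSet bits = new BitSet();
        for (int id : ids) {
            bits.set(id);
        }
        return bits;
    }

    // Hypothetical sample data: term -> ids of the documents containing it.
    // Prices are zero-padded so that string order matches numeric order.
    static NavigableMap<String, BitSet> sampleIndex() {
        NavigableMap<String, BitSet> index = new TreeMap<>();
        index.put("cat" + SEP + "books", docs(0, 1));
        index.put("price" + SEP + "0010", docs(0, 2));
        index.put("price" + SEP + "0025", docs(1));
        index.put("price" + SEP + "0100", docs(3));
        index.put("title" + SEP + "lucene", docs(0, 1, 2, 3));
        return index;
    }

    // Returns, in ascending term order, every term of `field` that has at
    // least one document in docSet, mapped to the number of such documents.
    static Map<String, Integer> termsInDocSet(NavigableMap<String, BitSet> index,
                                              String field, BitSet docSet) {
        Map<String, Integer> result = new LinkedHashMap<>();
        String prefix = field + SEP;
        // tailMap starts at the first key >= (field, ""), which mirrors
        // seeking the term enumeration to new Term(field, "").
        for (Map.Entry<String, BitSet> e : index.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // ran into the next field: stop, as in the while loop above
            }
            BitSet hits = (BitSet) e.getValue().clone();
            hits.and(docSet); // keep only documents that are in the DocSet
            if (!hits.isEmpty()) {
                result.put(e.getKey().substring(prefix.length()), hits.cardinality());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet docSet = docs(0, 1, 2);
        System.out.println(termsInDocSet(sampleIndex(), "price", docSet));
        // prints {0010=2, 0025=1}
    }
}
```

The price 0100 is dropped because its only document (3) is not in the DocSet, and the scan never touches the cat or title terms. Whether this beats iterating over every doc in a real index depends on how many distinct terms the field has, which is the same effect as the cat versus price timing difference quoted above.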