On Mon, 2007-07-30 at 00:30 -0700, Chris Hostetter wrote:
> : Is it possible to get the values from the ValueSource (or from
> : getFieldCacheCounts) sorted by its natural order (from lowest to
> : highest values)?
>
> well, an inverted term index is already a data structure listing terms
> from lowest to highest and the associated documents -- so if you want to
> iterate from low to high between a range and find matching docs you should
> just use the TermEnum

Ok. Unfortunately I don't see how I can get a TermEnum for a specific
field (e.g. "price")...

I tried
  TermEnum te = searcher.getReader().terms(new Term(field, ""));
but this also returns terms for several other fields.

Is it possible at all to get a TermEnum for a specific field?

Then, if I had this TermEnum, how could I check whether a Term is in my
DocSet? In other words, I would like to read the Terms for a specific
field from my DocSet, so that I could determine all price terms for my
DocSet.

Is there a way to achieve this?

Thanx in advance,
cheers,
Martin

> -- the whole point of the FieldCache (and
> FieldCacheSource) is to have a "reverse inverted index" so you can quickly
> fetch the indexed value if you know the docId.
>
> perhaps you should elaborate a little more on what it is you are trying to
> do so we can help you figure out how to do it more efficiently ... i know
> you mentioned computing price ranges in your first message ... but you
> also didn't post any clear code about that part of your problem, just that
> the *other* part of your code that iterated over every doc was too slow
> ... perhaps you shouldn't be iterating over every doc to figure out your
> ranges .. perhaps you can iterate over the terms themselves?
>
>
> hang on ... rereading your first message i just noticed something i
> definitely didn't spot before...
>
> >> Fairly long: getFieldCacheCounts for the cat field takes ~70 ms
> >> for the second request, while reading prices takes ~600 ms.
>
> ...i clearly missed this, and fixated on your assertion that your reading
> of field values took longer than the stock methods -- but you're not just
> comparing the time needed by different methods, you're also timing
> different fields.
>
> this actually makes a lot of sense since there are probably a lot fewer
> unique values for the cat field, so there are a lot fewer discrete values
> to deal with when computing counts.
>
>
>
>
> -Hoss
>
--
Martin Grotzke
http://www.javakaffee.de/blog/
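
A sketch of one way to approach the two questions above. It is not taken
from the thread and does not use Lucene itself. It assumes the behaviour
described in the message: the term dictionary is one list sorted by field
and then by term text, so seeking to (field, "") lands on the first term of
that field and the enumeration then runs on into later fields unless the
loop stops once the field name changes. The class FieldTermsSketch, its
nested Term class and the TreeMap-based index are hypothetical stand-ins
chosen so the example runs on its own. With the real API the same loop would
presumably compare te.term().field() against the requested field after each
te.next(), and test the documents from reader.termDocs(term) against
docSet.exists(doc).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

public class FieldTermsSketch {

    /** Hypothetical stand-in for a Lucene Term: ordered by field, then by text. */
    public static final class Term implements Comparable<Term> {
        public final String field;
        public final String text;

        public Term(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public int compareTo(Term other) {
            int byField = field.compareTo(other.field);
            return byField != 0 ? byField : text.compareTo(other.text);
        }
    }

    /**
     * Returns the term texts of one field, in index order, that occur in at
     * least one document of docSet. The index maps each term to its doc ids.
     */
    public static List<String> termsForField(
            SortedMap<Term, int[]> index, String field, Set<Integer> docSet) {
        List<String> result = new ArrayList<>();
        // Seek to the first term >= (field, ""), i.e. the start of the field.
        for (Map.Entry<Term, int[]> entry : index.tailMap(new Term(field, "")).entrySet()) {
            // The sorted list continues into later fields: stop at the boundary.
            if (!entry.getKey().field.equals(field)) {
                break;
            }
            // Keep the term if any of its documents is in the doc set.
            for (int doc : entry.getValue()) {
                if (docSet.contains(doc)) {
                    result.add(entry.getKey().text);
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SortedMap<Term, int[]> index = new TreeMap<>();
        index.put(new Term("cat", "books"), new int[] {0, 1});
        index.put(new Term("price", "0010"), new int[] {0});
        index.put(new Term("price", "0025"), new int[] {1, 2});
        index.put(new Term("price", "0300"), new int[] {3});
        index.put(new Term("title", "lucene"), new int[] {0, 3});

        Set<Integer> docSet = new HashSet<>(Arrays.asList(0, 2));
        System.out.println(termsForField(index, "price", docSet)); // prints [0010, 0025]
    }
}
```

The prices are zero-padded in the sample data because the terms are compared
as strings, so "lowest to highest" only matches numeric order if the indexed
text sorts that way.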