Re: How to read values of a field efficiently

Martin Grotzke Sun, 19 Aug 2007 10:01:47 -0700

On Mon, 2007-07-30 at 00:30 -0700, Chris Hostetter wrote:
> : Is it possible to get the values from the ValueSource (or from
> : getFieldCacheCounts) sorted by its natural order (from lowest to
> : highest values)?
> 
> well, an inverted term index is already a data structure listing terms
> from lowest to highest and the associated documents -- so if you want to
> iterate from low to high between a range and find matching docs you should
> just use the TermEnum
Ok. Unfortunately I don't see how I can get a TermEnum for a specific
field (e.g. "price")... I tried


TermEnum te = searcher.getReader().terms(new Term(field, ""));

but this also returns terms for several other fields.
Is it possible at all to get a TermEnum for a specific field?
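
A minimal sketch of why the enumeration runs into other fields, assuming
the term dictionary is ordered by field name first and term text second:
seeking to (field, "") lands on the first term of that field, and iteration
then continues into later fields unless the caller stops when the field name
changes. FieldTerm and termsForField are hypothetical stand-ins built on a
TreeSet, not the Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class FieldTermsSketch {

    /** Hypothetical stand-in for an indexed term: ordered by field, then by text. */
    static final class FieldTerm implements Comparable<FieldTerm> {
        final String field;
        final String text;

        FieldTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public int compareTo(FieldTerm other) {
            int byField = field.compareTo(other.field);
            return byField != 0 ? byField : text.compareTo(other.text);
        }
    }

    /** Seeks to the first term of the field and stops once the field changes. */
    static List<String> termsForField(TreeSet<FieldTerm> dictionary, String field) {
        List<String> texts = new ArrayList<>();
        for (FieldTerm term : dictionary.tailSet(new FieldTerm(field, ""), true)) {
            if (!term.field.equals(field)) {
                break; // past the last term of the requested field
            }
            texts.add(term.text);
        }
        return texts;
    }

    public static void main(String[] args) {
        TreeSet<FieldTerm> dictionary = new TreeSet<>();
        dictionary.add(new FieldTerm("cat", "books"));
        dictionary.add(new FieldTerm("price", "0010"));
        dictionary.add(new FieldTerm("price", "0025"));
        dictionary.add(new FieldTerm("title", "lucene"));
        // Without the field check, "title:lucene" would be returned as well.
        System.out.println(termsForField(dictionary, "price"));
    }
}
```

With a real TermEnum the same idea would apply: keep calling next() only
while the current term still belongs to the requested field.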

Then if I had this TermEnum, how can I check if a Term is in my
DocSet? In other words, I would like to read Terms for a specific
field from my DocSet - so that I could determine all price terms
for my DocSet.

Is there a way to achieve this?
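
To make the question concrete, here is a rough sketch of what I have in
mind, written against plain Java collections rather than the Lucene/Solr
classes: a sorted map from term to the ids of the documents containing it
stands in for the postings of one field, and a BitSet stands in for the
DocSet. The names termsInDocSet, postings and docSet are hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TermsInDocSetSketch {

    /**
     * Returns the terms, in sorted order, that occur in at least one document
     * of docSet. postings maps each term of a single field to the ids of the
     * documents that contain it.
     */
    static List<String> termsInDocSet(TreeMap<String, int[]> postings, BitSet docSet) {
        List<String> matching = new ArrayList<>();
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            for (int docId : entry.getValue()) {
                if (docSet.get(docId)) {
                    matching.add(entry.getKey());
                    break; // one hit is enough for this term
                }
            }
        }
        return matching;
    }

    public static void main(String[] args) {
        TreeMap<String, int[]> pricePostings = new TreeMap<>();
        pricePostings.put("0010", new int[] {0, 3});
        pricePostings.put("0025", new int[] {1});
        pricePostings.put("0100", new int[] {2, 4});

        BitSet docSet = new BitSet();
        docSet.set(1);
        docSet.set(4);

        System.out.println(termsInDocSet(pricePostings, docSet));
    }
}
```

The cost of this approach grows with the number of distinct terms in the
field, since every term's postings may have to be visited.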

Thanx in advance,
cheers,
Martin


>  -- the whole point of the FieldCache (and
> FieldCacheSource) is to have a "reverse inverted index" so you can quickly
> fetch the indexed value if you know the docId.
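
As I understand it, the "reverse inverted index" amounts to an array indexed
by docId. A minimal sketch under that assumption, for a single-valued field
and using plain Java collections instead of the actual FieldCache classes
(uninvert, postings and maxDoc are hypothetical names):

```java
import java.util.Map;
import java.util.TreeMap;

class UninvertSketch {

    /**
     * Builds a docId -> value array from term -> docIds postings.
     * Assumes each document has at most one value for the field;
     * documents without a value keep null.
     */
    static String[] uninvert(TreeMap<String, int[]> postings, int maxDoc) {
        String[] valueByDoc = new String[maxDoc];
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            for (int docId : entry.getValue()) {
                valueByDoc[docId] = entry.getKey();
            }
        }
        return valueByDoc;
    }

    public static void main(String[] args) {
        TreeMap<String, int[]> pricePostings = new TreeMap<>();
        pricePostings.put("0010", new int[] {0, 3});
        pricePostings.put("0025", new int[] {1});

        String[] valueByDoc = uninvert(pricePostings, 5);
        // Lookup by docId is a single array access once the array is built.
        System.out.println(valueByDoc[3]);
    }
}
```

Building the array walks every term once; after that, reading the value for
a known docId no longer touches the term dictionary.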
> 
> perhaps you should elaborate a little more on what it is you are trying to
> do so we can help you figure out how to do it more efficiently ... i know
> you mentioned computing price ranges in your first message ... but you
> also didn't post any clear code about that part of your problem, just that
> the *other* part of your code that iterated over every doc was too slow
> ... perhaps you shouldn't be iterating over every doc to figure out your
> ranges .. perhaps you can iterate over the terms themselves?
> 
> 
> hang on ... rereading your first message i just noticed something i
> definitely didn't spot before...
> 
> >> Fairly long: getFieldCacheCounts for the cat field takes ~70 ms
> >> for the second request, while reading prices takes ~600 ms.
> 
> ...i clearly missed this, and fixated on your assertion that your reading
> of field values took longer than the stock methods -- but you're not just
> comparing the time needed by different methods, you're also timing
> different fields.
> 
> this actually makes a lot of sense since there are probably a lot fewer
> unique values for the cat field, so there are a lot fewer discrete values
> to deal with when computing counts.
> 
> 
> 
> 
> -Hoss
> 
-- 
Martin Grotzke
http://www.javakaffee.de/blog/
