RE: How to read values of a field efficiently

Ard Schrijvers Sun, 19 Aug 2007 12:39:55 -0700

Hello,

> 
> On Mon, 2007-07-30 at 00:30 -0700, Chris Hostetter wrote:
> > : Is it possible to get the values from the ValueSource (or from
> > : getFieldCacheCounts) sorted by its natural order (from lowest to
> > : highest values)?
> > 
> > well, an inverted term index is already a data structure listing terms
> > from lowest to highest and the associated documents -- so if you want to
> > iterate from low to high between a range and find matching docs you should
> > just use the TermEnum
> Ok. Unfortunately I don't see how I can get a TermEnum for a specific
> field (e.g. "price")... I tried
> 
> TermEnum te = searcher.getReader().terms(new Term(field, ""));
> 
> but this returns also terms for several other fields.


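Correct: terms(Term) only positions the enumeration at the given term (or
the first term after it) and then continues through all later fields. See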
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/index/IndexReader.html#terms()

> Is it possible at all to get a TermEnum for a specific field?

AFAIK not directly. Normally, I use something like:

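TermEnum terms = searcher.getReader().terms(new Term(field, ""));
try {
        // the enum runs on into later fields, so stop when the field changes
        while (terms.term() != null && terms.term().field().equals(field)) {
                // do things
                if (!terms.next()) break;   // no more terms
        }
} finally {
        terms.close();
}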
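
To illustrate why you have to stop yourself: the term dictionary is one
sorted list, ordered by field name first and term text second, so seeking to
(field, "") just drops you at the start of that field. Below is a minimal
stdlib-only sketch of that ordering. It is not the Lucene API;
FieldTermsSketch and its methods are made-up names for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for a term dictionary; not part of Lucene.
class FieldTermsSketch {
    // Terms are stored as "field\u0000text", so natural String order
    // sorts by field first and by term text second.
    private final TreeSet<String> dict = new TreeSet<>();

    void add(String field, String text) {
        dict.add(field + '\u0000' + text);
    }

    // Seek to the first term >= (field, "") and walk forward
    // until the field changes.
    List<String> termsForField(String field) {
        List<String> result = new ArrayList<>();
        String prefix = field + '\u0000';
        for (String entry : dict.tailSet(prefix)) {
            if (!entry.startsWith(prefix)) {
                break; // reached the next field, so stop here
            }
            result.add(entry.substring(prefix.length()));
        }
        return result;
    }
}
```

Without the startsWith check the loop would simply keep going into whatever
field sorts next, which is what you saw with your TermEnum.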

> 
> Then if I had this TermEnum, how can I check if a Term is in my
> DocSet? In other words, I would like to read Terms for a specific
> field from my DocSet - so that I could determine all price terms
> for my DocSet.

Is your DocSet some sort of filter? If so, you can fill a new filter in your
while loop. Create it before the loop, like

BitSet docFilter = new BitSet(reader.maxDoc());
TermDocs docs = reader.termDocs();

and in the while loop:

        docs.seek(terms);
        while (docs.next()) {
           docFilter.set(docs.doc());
        }

If your DocSet is not a BitSet, you might be able to construct one from it.
A term is then "in" your DocSet if any of its docs is set in that BitSet.
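
A rough sketch of that check, assuming the DocSet is available as a
java.util.BitSet. The postings map here is only a stand-in for what you would
read through TermEnum and TermDocs, and TermsInDocSetSketch is a hypothetical
name, not something that exists in Lucene or Solr.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; the postings map stands in for TermEnum + TermDocs.
class TermsInDocSetSketch {
    // Returns, in sorted term order, the terms that occur in at least
    // one document of docSet.
    static List<String> termsInDocSet(TreeMap<String, int[]> postings, BitSet docSet) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, int[]> e : postings.entrySet()) {
            for (int doc : e.getValue()) {
                if (docSet.get(doc)) {
                    hits.add(e.getKey());
                    break; // one matching doc is enough for this term
                }
            }
        }
        return hits;
    }
}
```

Because the terms are visited in sorted order, the result also comes out
sorted from lowest to highest, which is what you asked for originally.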

Regards Ard

> 
> Is there a way to achieve this?
> 
> Thanx in advance,
> cheers,
> Martin
> 
> 
> >  -- the whole point of the FieldCache (and
> > FieldCacheSource) is to have a "reverse inverted index" so you can quickly
> > fetch the indexed value if you know the docId.
> > 
> > perhaps you should elaborate a little more on what it is you are trying to
> > do so we can help you figure out how to do it more efficiently ... i know
> > you mentioned computing price ranges in your first message ... but you
> > also didn't post any clear code about that part of your problem, just that
> > the *other* part of your code that iterated over every doc was too slow
> > ... perhaps you shouldn't be iterating over every doc to figure out your
> > ranges .. perhaps you can iterate over the terms themselves?
> > 
> > 
> > hang on ... rereading your first message i just noticed something i
> > definitely didn't spot before...
> > 
> > >> Fairly long: getFieldCacheCounts for the cat field takes ~70 ms
> > >> for the second request, while reading prices takes ~600 ms.
> > 
> > ...i clearly missed this, and fixated on your assertion that your reading
> > of field values took longer than the stock methods -- but you're not just
> > comparing the time needed by different methods, you're also timing
> > different fields.
> > 
> > this actually makes a lot of sense since there are probably a lot fewer
> > unique values for the cat field, so there are a lot fewer discrete values
> > to deal with when computing counts.
> > 
> > 
> > 
> > 
> > -Hoss
> > 
> -- 
> Martin Grotzke
> http://www.javakaffee.de/blog/
>
