On Mon, 2007-07-30 at 00:30 -0700, Chris Hostetter wrote:
> : Is it possible to get the values from the ValueSource (or from
> : getFieldCacheCounts) sorted by its natural order (from lowest to
> : highest values)?
>
> well, an inverted term index is already a data structure listing terms
> from lowest to highest and the associated documents -- so if you want to
> iterate from low to high between a range and find matching docs you should
> just use the TermEnum -- the whole point of the FieldCache (and
> FieldCacheSource) is to have a "reverse inverted index" so you can quickly
> fetch the indexed value if you know the docId.

Ok, I will have a look at the TermEnum and try this.

>
> perhaps you should elaborate a little more on what it is you are trying to
> do so we can help you figure out how to do it more efficiently ...

I want to read all values of the price field of the found docs, and
calculate the mean value and the standard deviation. Based on the min
value (mean - deviation), the max value (mean + deviation) and the number
of prices, I calculate price ranges. Then I iterate over the sorted array
of prices and count how many prices fall into the current range.

This sorting (Arrays.sort) takes a lot of time, which is why I asked
whether it's possible to read the values in sorted order. But reading
this, I think it would also be possible to skip the sorting, check for
each price which bucket it belongs to, and increment the counter for that
bucket - this should also be a possibility for optimization.

> ... perhaps you shouldn't be iterating over every doc to figure out your
> ranges .. perhaps you can iterate over the terms themselves?

Are you referring to TermEnum with this?

Thanx && cheers,
Martin

>
>
> hang on ... rereading your first message i just noticed something i
> definitely didn't spot before...
>
> >> Fairly long: getFieldCacheCounts for the cat field takes ~70 ms
> >> for the second request, while reading prices takes ~600 ms.
>
> ...i clearly missed this, and fixated on your assertion that your reading
> of field values took longer than the stock methods -- but you're not just
> comparing the time needed by different methods, you're also timing
> different fields.
>
> this actually makes a lot of sense since there are probably a lot fewer
> unique values for the cat field, so there are a lot fewer discrete values
> to deal with when computing counts.
>
>
> -Hoss
>

--
Martin Grotzke
http://www.javakaffee.de/blog/
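
The "skip sorting and count per bucket" idea from the message above could look roughly like the following plain-Java sketch. It does not use any Lucene or Solr API. The class name PriceBuckets, the method countBuckets, the equal-width ranges, the caller-supplied number of buckets, and the clamping of prices outside [mean - deviation, mean + deviation] into the first and last bucket are all assumptions made for illustration; the message does not say how the ranges are actually derived.

```java
import java.util.Arrays;

class PriceBuckets {

    /**
     * Counts how many prices fall into each of numBuckets equal-width ranges
     * spanning [mean - sd, mean + sd], without sorting the input.
     * Assumption: prices outside that interval are clamped into the
     * first or last bucket.
     */
    static int[] countBuckets(double[] prices, int numBuckets) {
        if (prices.length == 0 || numBuckets <= 0) {
            return new int[Math.max(numBuckets, 0)];
        }

        // Mean of all prices.
        double sum = 0.0;
        for (double p : prices) {
            sum += p;
        }
        double mean = sum / prices.length;

        // Population standard deviation.
        double squaredDiffs = 0.0;
        for (double p : prices) {
            squaredDiffs += (p - mean) * (p - mean);
        }
        double sd = Math.sqrt(squaredDiffs / prices.length);

        double min = mean - sd;
        double width = (2.0 * sd) / numBuckets;

        // Final pass: compute each price's bucket index directly,
        // so no Arrays.sort call is needed.
        int[] counts = new int[numBuckets];
        for (double p : prices) {
            int idx = (width == 0.0) ? 0 : (int) Math.floor((p - min) / width);
            if (idx < 0) {
                idx = 0;
            }
            if (idx >= numBuckets) {
                idx = numBuckets - 1;
            }
            counts[idx]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] prices = {1, 2, 3, 4, 5};
        System.out.println(Arrays.toString(countBuckets(prices, 3))); // prints [2, 1, 2]
    }
}
```

This replaces the O(n log n) sort with three linear passes over the price array. Whether that helps in practice depends on how much of the ~600 ms is spent sorting rather than reading the field values.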