On 6/29/2011 7:50 PM, Yonik Seeley wrote:
OK, your filter queries have hundreds of terms in them (and that means
hundreds of term lookups, each of which uses the term index).
Thus, your termIndexInterval change is the leading suspect for the
slowdown. A termIndexInterval of 1024 means that
a term lookup will seek to the closest preceding 1024th term and then call
next() until the desired term is found. Hence instead of calling
next() an average of 64 times internally (with the default interval of 128),
it's now 512 times.
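To make the arithmetic concrete, here is a minimal Python sketch of the seek-then-scan lookup described above. It assumes a simplified in-memory model: a sorted, unique term list stands in for the full term dictionary, and every `interval`-th term stands in for the term index. The function names and the synthetic term list are hypothetical, and this is not Lucene's actual on-disk format or code.

```python
import bisect


def build_term_index(terms, interval):
    # Keep every `interval`-th term as the sparse index
    # (a simplified stand-in for the term index, not Lucene's format).
    return terms[::interval]


def lookup(terms, index, interval, target):
    """Find `target` in sorted `terms`; return (position, next_calls)."""
    # Seek: the rightmost indexed term that is <= target.
    slot = bisect.bisect_right(index, target) - 1
    if slot < 0:
        raise KeyError(target)
    pos = slot * interval
    # Scan: step forward one term at a time, like repeated next() calls.
    next_calls = 0
    while pos < len(terms) and terms[pos] < target:
        pos += 1
        next_calls += 1
    if pos == len(terms) or terms[pos] != target:
        raise KeyError(target)
    return pos, next_calls


def average_next_calls(terms, interval):
    # Average number of next() calls over a lookup of every term.
    index = build_term_index(terms, interval)
    total = sum(lookup(terms, index, interval, t)[1] for t in terms)
    return total / len(terms)


# Hypothetical sorted, unique term dictionary (zero-padding keeps
# lexicographic order the same as numeric order).
terms = [f"term{i:07d}" for i in range(4096)]

for interval in (128, 1024):
    print(interval, average_next_calls(terms, interval))
```

This prints 63.5 for an interval of 128 and 511.5 for 1024, i.e. roughly interval / 2. Under the same model, a filter with, say, 300 terms (a hypothetical count) would go from about 19,000 next() calls to about 153,000.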
Of course there is still a mystery about why your tii (which is the
term index) would be so much bigger instead of smaller...
It turns out I got the two indexes backwards; the smaller one was the
new index. I may have mixed up the indexes on some of the other files
too, but they weren't much different, so I'm not going to try to figure
out where any mistakes might be.
Earlier in the afternoon I figured this out, removed termIndexInterval
from my config, and rebuilt the index. I had originally put this in to
speed up indexing. The evidence I had available at the time told me
that this goal was accomplished, but the rebuild actually went faster
without the statement. Warming times are now averaging under 10 seconds
even with the warmup count back up to 8. This is still slower than I
would like, but it is a major improvement. Even more important, I
understand what happened.
I was thinking perhaps I might actually decrease the termIndexInterval
value below the default of 128. I know from reading the HathiTrust
blog that memory usage for the tii file is much more than the size of
the file would indicate, but if halving the interval grows the file from
13MB to 26MB, it would probably still be OK.
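A rough way to weigh that trade-off is sketched below in Python. It assumes the tii file holds about one entry per `interval` terms, so its size scales with 128 / interval (fixed per-file overhead ignored), and that a lookup scans about (interval - 1) / 2 terms on average. The 13MB starting point is the figure above; `estimate` is a hypothetical helper, and because in-memory usage exceeds the file size, this estimates file size only, not heap usage.

```python
MB = 1024 * 1024


def estimate(base_tii_bytes, new_interval, base_interval=128):
    """Rough tii file size and average next() calls for a new interval.

    Assumes size scales with base_interval / new_interval (overhead
    ignored) and that a lookup scans (interval - 1) / 2 terms on average.
    """
    size = base_tii_bytes * base_interval / new_interval
    avg_scan = (new_interval - 1) / 2
    return size, avg_scan


for interval in (1024, 128, 64, 32):
    size, scan = estimate(13 * MB, interval)
    print(f"interval={interval:<5d} tii~{size / MB:5.1f} MB  avg next()~{scan:.1f}")
```

Each halving of the interval roughly doubles the tii file and halves the average scan per lookup, so the gain per step shrinks while the memory cost grows.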
Are any index intervals for the other Lucene files configurable in a
similar manner? I know that screwing too much with the defaults can
make things much worse, so I would be very careful with any adjustments,
and try to fully understand why any performance gain or loss occurred.
Thanks,
Shawn