Hi Yonik,
Thank you for the explanation. If the primary goal was to save index
space for a very specific subclass of fields, the implementation
certainly makes more sense. I wonder, though, whether it would also make
sense to support a query-time-only boolean that disables tf
independently, on a per-field basis? Or, perhaps (and this may be
demonstrating my naivete), allowing Similarity to be overridden on a
per-field basis? I imagine it could make scoring even more confusing
than it sometimes already is, though. It's an atrocious hack on my part,
but I seem to have largely achieved my tf goals this way: I overrode
the getSimilarity methods in PhraseQuery and TermQuery to return a
fixed-tf Similarity implementation whenever the query's field is one of
those I care about. From the looks of it, though, generalizing the
change into anything other than a hack would touch a rather large
number of places in the code.
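Roughly, the hack amounts to choosing a tf function per field. Below is a minimal, Lucene-free sketch of that choice, offered only as an illustration. The names `TfFunction`, `SqrtTf`, `FixedTf`, and `PerFieldTf` are hypothetical stand-ins for the real Similarity subclass and the getSimilarity overrides. `SqrtTf` mirrors the square-root tf that DefaultSimilarity uses.

```java
import java.util.Set;

// Hypothetical sketch: these types model the idea only; they are NOT Lucene's API.
interface TfFunction {
    float tf(float freq);
}

// Mirrors DefaultSimilarity's tf: the square root of the raw term frequency.
final class SqrtTf implements TfFunction {
    public float tf(float freq) {
        return (float) Math.sqrt(freq);
    }
}

// "Fixed" tf: any term that occurs at all contributes the same value, 1.0.
final class FixedTf implements TfFunction {
    public float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }
}

// Picks a tf function per field, much as the getSimilarity override
// picks a Similarity per query based on the query's field.
final class PerFieldTf {
    private final Set<String> fixedTfFields;
    private final TfFunction fixed = new FixedTf();
    private final TfFunction standard = new SqrtTf();

    PerFieldTf(Set<String> fixedTfFields) {
        this.fixedTfFields = fixedTfFields;
    }

    TfFunction forField(String field) {
        return fixedTfFields.contains(field) ? fixed : standard;
    }
}
```

For example, with a hypothetical `tags` field, `new PerFieldTf(Set.of("tags")).forField("tags").tf(9f)` returns 1.0 while `forField("body").tf(9f)` returns 3.0. Positions are still indexed and usable by phrase queries; only scoring ignores repeat occurrences.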
Best regards,
Aaron
Yonik Seeley wrote:
On Fri, Sep 18, 2009 at 9:38 AM, Aaron McKee <ucbmc...@gmail.com> wrote:
I suppose I'm curious why the omitTfAndPositions option conflates two
apparently independent features.
This relates to the index format, and is mainly for performance/size
benefits when tf and positions are not needed. In the index, it's impossible to
omit the tf info and keep the position info (the frequency is the
number of positions).
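The point that the frequency is the number of positions can be pictured with a toy postings entry. This is a hypothetical model and not Lucene's actual on-disk format. In this model the only per-document data is the position list, and freq is derived from it. That makes it easy to see why positions cannot be kept while tf is dropped. The fallback of freq = 1 when positions are omitted is an assumption of the sketch.

```java
// Hypothetical model of one posting (a term's entry for one document).
// Not Lucene's index format; it only illustrates why tf and positions go together.
final class Posting {
    final int docId;
    private final int[] positions; // null means positions were omitted at index time

    private Posting(int docId, int[] positions) {
        this.docId = docId;
        this.positions = positions == null ? null : positions.clone();
    }

    // A posting that records every position of the term in the document.
    static Posting withPositions(int docId, int... positions) {
        return new Posting(docId, positions);
    }

    // A posting that records only that the term occurs in the document.
    static Posting docOnly(int docId) {
        return new Posting(docId, null);
    }

    boolean hasPositions() {
        return positions != null;
    }

    // freq is not stored on its own: it is the number of recorded positions.
    // With positions omitted there is nothing to count, so this sketch
    // assumes the term occurs once.
    int freq() {
        return positions == null ? 1 : positions.length;
    }
}
```

Here `Posting.withPositions(7, 2, 5, 11).freq()` is 3. The count falls out of the positions, so an index that keeps positions has effectively kept tf as well.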
-Yonik
http://www.lucidimagination.com