Hi Yonik,

Thank you for the explanation. If the primary goal was to save index space for a very specific subclass of fields, the implementation certainly makes more sense. I wonder, though, whether it would also make sense to support a query-time-only boolean that disables TF independently, on a per-field basis. Or perhaps (and this may be demonstrating my naivete) to allow Similarity to be overridden on a per-field basis? I imagine it could make scoring even more confusing than it sometimes already is, though.

It's an atrocious hack on my part, but I seem to have largely achieved my tf goals this way: I overrode the getSimilarity methods in PhraseQuery and TermQuery to return a fixed-tf Similarity implementation if the field is one of those I care about. From the looks of it, though, generalizing the change into anything other than a hack would touch a rather large number of code points.
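To make the idea concrete, here is a rough, self-contained sketch of the per-field selection described above. It is not the actual patch and does not use Lucene's classes: TfScorer, SqrtTf, FixedTf and PerFieldTf are hypothetical stand-ins invented for illustration, and the field names "tags" and "body" are made up. The only Lucene behaviour assumed is that the default tf is the square root of the frequency.

```java
import java.util.Set;

// Hypothetical stand-in for a scoring hook; this is NOT Lucene's Similarity class.
interface TfScorer {
    float tf(float freq);
}

// Default behaviour, assumed to be sqrt(freq) as in Lucene's DefaultSimilarity.
final class SqrtTf implements TfScorer {
    public float tf(float freq) {
        return (float) Math.sqrt(freq);
    }
}

// Fixed tf: any positive frequency scores as 1, an absent term as 0.
final class FixedTf implements TfScorer {
    public float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }
}

// Picks a scorer by field name, mirroring the "is this field in my set?" check.
final class PerFieldTf {
    private final Set<String> fixedTfFields;
    private final TfScorer fixed = new FixedTf();
    private final TfScorer fallback = new SqrtTf();

    PerFieldTf(Set<String> fixedTfFields) {
        this.fixedTfFields = fixedTfFields;
    }

    TfScorer forField(String field) {
        return fixedTfFields.contains(field) ? fixed : fallback;
    }
}

final class PerFieldTfDemo {
    public static void main(String[] args) {
        PerFieldTf selector = new PerFieldTf(Set.of("tags"));
        // A term occurring 9 times in each field:
        System.out.println(selector.forField("tags").tf(9f)); // prints 1.0
        System.out.println(selector.forField("body").tf(9f)); // prints 3.0
    }
}
```

In the real hack the selection happens inside the overridden getSimilarity methods rather than in a separate selector class, which is part of why generalizing it touches so many places.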

Best regards,
Aaron


Yonik Seeley wrote:
On Fri, Sep 18, 2009 at 9:38 AM, Aaron McKee <ucbmc...@gmail.com> wrote:
I suppose I'm curious why the omitTfAndPositions option conflates two
apparently independent features.

This relates to the index format, and is more for performance/size
benefits when they are not needed.  In the index, it's impossible to
omit the tf info and keep the position info (the frequency is the
number of positions).

-Yonik
http://www.lucidimagination.com
