Re: Disabling tf (term frequency) during indexing and/or scoring

Aaron McKee Fri, 18 Sep 2009 08:05:53 -0700


Hi Yonik,

Thank you for the explanation. If the primary goal was to save index space for a very specific subclass of fields, the implementation certainly makes more sense. I wonder, though, if it could also make sense to support a query-time only boolean to optionally disable TF independently, on a per-field basis? Or, perhaps (and this may be demonstrating my naivete), allowing Similarity to be overridden on a per-field basis? I imagine it could make scoring even more confusing than it sometimes already is, though.

It's an atrocious hack on my part, but I largely seem to have achieved my tf goals in this manner; I overrode the getSimilarity methods in PhraseQuery and TermQuery to return a fixed-tf Similarity implementation if the field value is in the set of those I care about. From the looks of it, though, generalizing the change into anything other than a hack would touch a rather large number of code points.
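To make the idea concrete, here is a rough, self-contained sketch of the per-field selection logic, not the actual patch. It deliberately avoids Lucene's own classes: TfScorer, SqrtTf, FixedTf and PerFieldTf are hypothetical stand-ins, and the field names are made up. The only Lucene behaviour assumed is that the default tf is the square root of the term frequency.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the tf part of a Similarity; not a Lucene type.
interface TfScorer {
    float tf(float freq);
}

// Mirrors the default behaviour: tf grows as sqrt(freq).
final class SqrtTf implements TfScorer {
    public float tf(float freq) {
        return (float) Math.sqrt(freq);
    }
}

// Fixed tf: a term that occurs at all contributes 1.0, however often it occurs.
final class FixedTf implements TfScorer {
    public float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }
}

// Chooses a scorer by field name, which is what the overridden
// getSimilarity methods do in the hack described above.
final class PerFieldTf {
    private final Set<String> fixedTfFields;
    private final TfScorer defaultScorer = new SqrtTf();
    private final TfScorer fixedScorer = new FixedTf();

    PerFieldTf(Set<String> fixedTfFields) {
        this.fixedTfFields = new HashSet<String>(fixedTfFields);
    }

    TfScorer scorerFor(String field) {
        return fixedTfFields.contains(field) ? fixedScorer : defaultScorer;
    }

    float tf(String field, float freq) {
        return scorerFor(field).tf(freq);
    }
}

final class PerFieldTfDemo {
    public static void main(String[] args) {
        // "tags" is an illustrative field name, not one from my schema.
        PerFieldTf scoring = new PerFieldTf(new HashSet<String>(Arrays.asList("tags")));
        System.out.println("tags, freq 9: " + scoring.tf("tags", 9.0f));
        System.out.println("body, freq 9: " + scoring.tf("body", 9.0f));
    }
}
```

In the real hack, the field check sits inside the query classes themselves, which is why generalizing it would mean touching every query type that computes a score.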


Best regards,
Aaron


Yonik Seeley wrote:

On Fri, Sep 18, 2009 at 9:38 AM, Aaron McKee <ucbmc...@gmail.com> wrote:

I suppose I'm curious why the omitTfAndPositions option conflates two
apparently independent features.


This relates to the index format, and is more for performance/size
benefits when they are not needed.  In the index, it's impossible to
omit the tf info and keep the position info (the frequency is the
number of positions).

-Yonik
http://www.lucidimagination.com
