Flatten term frequency

Vincenzo D'Amore Thu, 29 Nov 2018 02:44:43 -0800

Hi all,

I have a relevancy problem. I think I know one solution for it, but I
would like to know whether, in your experience, there is a better one.


For example, I have two documents that both have "termA" in their "title"
field. The former repeats "termA" several times, while the latter contains
the term only once. When searching for "termA", the former gets a higher
score due to TF/IDF.

The two documents are fairly similar, so I don't want the term frequency
in the title to boost the score.
The only solution I know to flatten the score when term frequencies differ
is to configure my own similarity class in the schema, one that always
returns 1 for term frequency.
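
To make the idea concrete, here is a minimal standalone sketch of the
arithmetic only. It is not Lucene or Solr code, and the class and method
names (FlatTfDemo, classicTf, flatTf) are hypothetical. It assumes the
classic TF/IDF behaviour where the tf factor is the square root of the term
frequency, and compares it with a flattened factor that ignores repetitions.

```java
// Standalone sketch with no Lucene dependency: it only mirrors the tf arithmetic.
// All names here are hypothetical and chosen for illustration.
class FlatTfDemo {

    // Assumed classic TF/IDF tf factor: grows with the square root of the count.
    static float classicTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Flattened tf factor: any number of occurrences counts as a single match.
    static float flatTf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }

    public static void main(String[] args) {
        // One document has "termA" once in the title, the other four times.
        float[] freqs = {1f, 4f};
        for (float freq : freqs) {
            System.out.println("freq=" + freq
                    + " classicTf=" + classicTf(freq)
                    + " flatTf=" + flatTf(freq));
        }
    }
}
```

In a real setup the same override would presumably live in a subclass of
the similarity in use (for example, overriding tf(float) in a
ClassicSimilarity subclass) and be referenced from the schema. With the
flattened factor both documents get the same tf contribution for "termA".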

I'm curious to know whether you know another way. At first I thought of
omitting term frequency at index time.

Looking around, I found an old issue,
https://issues.apache.org/jira/browse/LUCENE-1561, where omitTF was
renamed to omitTermFreqAndPositions.

What I've understood is that omitting term frequency also implies removing
term positions, so very likely omitting term frequency is not what I'm
looking for.

As I said, I'm curious to know if you know another way and, as usual,
thanks in advance for your time and for your patience.

Best regards,
Vincenzo


-- 
Vincenzo D'Amore
