Re: Can we manipulate termfreq to count as 1 for multiple matches?

Chris Hostetter Fri, 22 Mar 2013 12:55:36 -0700

: parameter "omitTermFreqAndPositions"

the key thing to remember being: if you use this, then by omitting 
positions you can no longer do phrase queries on that field.


: or you can use a custom similarity class that overrides the term freq and
: return one for only that field.
: http://wiki.apache.org/solr/SchemaXml#Similarity

There is actually a Similarity class already written, designed to target 
this specific problem of "keyword spamming" in text fields...

: > Document_1
: > Name = Blue Jeans
: > Description = This jeans is very soft.  Jeans is pretty nice.
: >
: > Now, If I Search for "Jeans" then "Jeans" is found in 2 places in
: > Description field.

...first off, it's important to remember that 'tf' doesn't affect things in 
isolation -- usually there is also a "lengthNorm" factor that would 
penalize the score of that document compared to another one that had a 
short description that only included the word Jeans once (ie: "These are 
Red Jeans")
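
To make the interplay concrete, here is a rough sketch in plain Java. It 
does not call Lucene; it assumes the classic defaults of tf = sqrt(freq) 
and lengthNorm = 1/sqrt(numTerms), and ignores idf, boosts and the lossy 
byte encoding of norms. The class name, method names and token counts 
(whitespace tokens, no stopword removal) are my own assumptions for 
illustration.

```java
// Sketch only: models the tf and lengthNorm factors with plain arithmetic.
// Nothing here is Lucene API; all names are hypothetical.
class TfNormSketch {

    // Assumed default: tf grows with the square root of the raw frequency.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Assumed default: shorter fields get a larger norm
    // (no index-time boost, no byte encoding of the norm).
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // The document-dependent part of the score; idf is identical for
    // both documents when they are matched by the same query term.
    static double docFactor(int freq, int numTerms) {
        return tf(freq) * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // "This jeans is very soft. Jeans is pretty nice."
        // assumed 9 tokens, "jeans" appears twice
        double longDescription = docFactor(2, 9);
        // "These are Red Jeans": assumed 4 tokens, "jeans" appears once
        double shortDescription = docFactor(1, 4);
        System.out.println("long description:  " + longDescription);
        System.out.println("short description: " + shortDescription);
    }
}
```

Under those assumptions the longer description comes out at about 0.47 
and the shorter one at 0.5, so repeating the term does not automatically 
put the longer document on top.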

Using the SweetSpotSimilarity, you can specify the ideal values (ie: the 
"sweet spot") you anticipate in a typical document for both the tf and 
lengthNorm ... 

https://lucene.apache.org/solr/4_2_0/solr-core/org/apache/solr/search/similarities/SweetSpotSimilarityFactory.html
https://lucene.apache.org/core/4_2_0/misc/org/apache/lucene/misc/SweetSpotSimilarity.html

...so if you want to say that "1 to 4 instances of the term are equally 
good, and above that start to reward docs more" you could configure the tf 
function to do that.
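
As a sketch of what such a tf function looks like, the snippet below 
re-implements the "baseline" tf shape described in the SweetSpotSimilarity 
javadoc linked above: flat at a base value up to a minimum frequency, 
then square-root growth. It is standalone Java, not the Lucene class; the 
class name and the choice of min = 4 and base = 1 are assumptions picked 
to match the "1 to 4 instances" example. Check the javadoc for the real 
parameter names and defaults before relying on it.

```java
// Sketch only: a standalone copy of the baseline tf shape, not Lucene code.
class SweetSpotTfSketch {

    // Flat at `base` for 0 < freq <= min, then sqrt growth above min.
    // A frequency of zero always scores zero.
    static float baselineTf(float freq, float min, float base) {
        if (freq == 0.0f) {
            return 0.0f;
        }
        return (freq <= min)
            ? base
            : (float) Math.sqrt(freq + (base * base) - min);
    }

    public static void main(String[] args) {
        float min = 4.0f;   // assumed: 1 to 4 occurrences are equally good
        float base = 1.0f;  // assumed: the tf value inside the sweet spot
        for (int freq = 0; freq <= 8; freq++) {
            System.out.println(freq + " -> " + baselineTf(freq, min, base));
        }
    }
}
```

With these values, frequencies 1 through 4 all map to 1.0, and the factor 
only starts to climb from the fifth occurrence onward.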

(If you really want the same tf() scoring factor for all docs, regardless 
of how many times the term is mentioned -- then you would need to write 
your own Similarity subclass at the moment)
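
A minimal sketch of that idea follows. To keep it compilable without 
Lucene on the classpath, StandInSimilarity is a hypothetical stand-in 
that only mimics the tf(float freq) hook of Lucene 4.x's 
DefaultSimilarity; in a real deployment you would extend the Lucene class 
instead and wire it into the schema as described on the SchemaXml wiki 
page. ConstantTfSimilarity is likewise just an illustrative name.

```java
// Hypothetical stand-in for the tf() hook of Lucene 4.x's
// DefaultSimilarity. Not the real class; it exists only so this
// sketch compiles on its own.
class StandInSimilarity {
    public float tf(float freq) {
        return (float) Math.sqrt(freq);
    }
}

// Flattens tf: any number of matches in the field counts the same
// as a single match, and no match still scores zero.
class ConstantTfSimilarity extends StandInSimilarity {
    @Override
    public float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }
}
```

Note that this only flattens the tf factor; whatever length normalization 
the base class applies is still in play unless you override that as well.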

-Hoss
