You should start by checking out "SweetSpotSimilarity". It was designed
largely around the idea of dealing with things like excessively verbose
titles and keyword stuffing in summary text. You can configure your
expectation for what a "normal" length doc is, and docs will be
penalized for being longer than that. Similarly, you can say what a
"reasonable" tf is, and docs that exceed that won't get any added boost
(which, in conjunction with the lengthNorm penalty, penalizes docs that
stuff keywords).
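
To get a feel for the shape of those two curves, here is a minimal Python sketch of the length norm and the hyperbolic tf as I understand the formulas in the Lucene source. It is a standalone illustration, not Solr configuration and not the Lucene code itself. The function names and the parameter values in the usage lines are made up for the example, so treat them as assumptions and check the javadocs linked below for the real option names.

```python
import math


def length_norm(num_terms, lo, hi, steepness):
    """Plateau-shaped length norm (illustrative re-implementation).

    Returns 1.0 for any length inside [lo, hi] and decays toward 0
    the further num_terms falls outside that range.
    """
    # Zero inside the sweet spot, twice the overshoot outside it.
    distance_outside = abs(num_terms - lo) + abs(num_terms - hi) - (hi - lo)
    return 1.0 / math.sqrt(steepness * distance_outside + 1.0)


def hyperbolic_tf(freq, tf_min, tf_max, base, x_offset):
    """S-shaped tf factor that levels off at tf_max (illustrative)."""
    if freq == 0:
        return 0.0
    x = freq - x_offset
    # (base^x - base^-x) / (base^x + base^-x) is tanh(x * ln(base));
    # using tanh directly avoids overflow for large term frequencies.
    return tf_min + (tf_max - tf_min) / 2.0 * (math.tanh(x * math.log(base)) + 1.0)


# Hypothetical settings: a title of 1 to 5 terms is "normal".
for n in (3, 6, 9, 20):
    print(n, round(length_norm(n, lo=1, hi=5, steepness=0.5), 3))

# Hypothetical settings: the tf factor is capped at 2.0.
for f in (1, 5, 10, 20, 100):
    print(f, round(hyperbolic_tf(f, tf_min=0.0, tf_max=2.0, base=1.3, x_offset=10), 3))
```

The point to notice is that a doc inside the length range gets the full norm of 1.0, and repeating a term more and more often pushes the tf factor toward the ceiling but never past it. That combination is why stuffing keywords into a long field stops paying off.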

https://lucene.apache.org/solr/5_0_0/solr-core/org/apache/solr/search/similarities/SweetSpotSimilarityFactory.html

https://lucene.apache.org/core/5_0_0/misc/org/apache/lucene/misc/doc-files/ss.computeLengthNorm.svg
https://lucene.apache.org/core/5_0_0/misc/org/apache/lucene/misc/doc-files/ss.hyperbolicTf.svg


-Hoss
http://www.lucidworks.com/
