The name escapes me but Lucene used to have a special Similarity impl for this sort of stuff. I think it's still there.
We implemented a slightly better Similarity that used a Gaussian distribution and was thus smoother. Try doing that.

Otis
--
Performance Monitoring - http://sematext.com/spm

On Nov 7, 2012 4:16 AM, "Dotan Cohen" <dotanco...@gmail.com> wrote:
> Hi all! One area where I am applying Solr deals with variable-length
> posts by users: think of anything from one-word posts ("Cool!" with an
> attached photo) to blog-post length (500-1000 words). Due to field
> normalization, the short posts get the highest Solr score, while the
> long, informative posts are pushed to the end of the results.
> Therefore I am moving to disable field normalization with
> omitNorms=true. However, there do exist crafty users who stuff
> hundreds of irrelevant words together to get noticed. I would
> therefore like to try some sort of exponential (or even linear) field
> normalization where no normalization is performed on one-word
> documents, but the longest documents (over a few hundred words) get
> some penalty.
>
> Are there any facilities in Solr for performing this? I would of
> course prefer query-time computation, as that lets me do other things
> with the data, but if only index-time computation is possible then I
> can accept that.
>
> Thank you!
>
> --
> Dotan Cohen
>
> http://gibberish.co.il
> http://what-is-what.com
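A minimal standalone sketch of the kind of smoother, Gaussian-shaped length normalization described above: full weight for short posts, then a smooth Gaussian falloff for longer ones. The method name and the idealLength/sigma parameters are illustrative choices, not Lucene API; in a real index one would plug a curve like this into a custom Similarity subclass.

```java
// Sketch of a Gaussian-shaped length norm: no penalty up to IDEAL_LENGTH
// terms, then a smooth Gaussian decay for longer documents. Constants are
// illustrative and would need tuning against real post lengths.
public class GaussianLengthNorm {
    static final double IDEAL_LENGTH = 50.0;  // term count with no penalty
    static final double SIGMA = 300.0;        // controls how fast long docs decay

    static float lengthNorm(int numTerms) {
        if (numTerms <= IDEAL_LENGTH) {
            return 1.0f;  // one-word posts keep full weight
        }
        double excess = numTerms - IDEAL_LENGTH;
        return (float) Math.exp(-(excess * excess) / (2.0 * SIGMA * SIGMA));
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 50, 500, 1000}) {
            System.out.printf("%4d terms -> norm %.3f%n", n, lengthNorm(n));
        }
    }
}
```

Unlike the default 1/sqrt(numTerms) norm, this curve does not reward one-word posts over medium-length ones; it only starts discounting once a document is clearly longer than the ideal range.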
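For reference, the interim step mentioned in the quoted mail, disabling norms entirely, is a per-field setting in Solr's schema.xml; the field name and type below are assumptions for illustration:

```xml
<!-- schema.xml: omitNorms="true" drops length normalization for this field,
     so short and long posts are not rescaled by document length.
     Field name and type are illustrative. -->
<field name="post_body" type="text_general" indexed="true" stored="true"
       omitNorms="true"/>
```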