The name escapes me but Lucene used to have a special Similarity impl for
this sort of stuff. I think it's still there.

We implemented a slightly better Similarity that used a Gaussian distribution
and was thus smoother. Try doing that.
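A minimal sketch of that idea (the built-in impl Otis may mean is Lucene's SweetSpotSimilarity, which flattens the norm over a configurable length range). In a real setup this math would live in a custom Lucene Similarity, e.g. an override of its length-norm computation; the class and parameter names below are illustrative, not Lucene API. Lengths up to a chosen "sweet spot" get no penalty, matching the request that one-word posts go unpenalized, and longer documents decay smoothly on a Gaussian tail:

```java
// Sketch of a one-sided Gaussian length norm. Class name and parameters
// are hypothetical; in practice this would back a custom Similarity.
public class GaussianLengthNorm {

    // Documents up to this many terms get norm = 1.0 (no penalty).
    private final double sweetSpot;
    // Controls how quickly the penalty grows past the sweet spot.
    private final double sigma;

    public GaussianLengthNorm(double sweetSpot, double sigma) {
        this.sweetSpot = sweetSpot;
        this.sigma = sigma;
    }

    // Returns a value in (0, 1]: 1.0 at or below the sweet spot,
    // smoothly smaller for keyword-stuffed, very long documents.
    public double norm(int numTerms) {
        if (numTerms <= sweetSpot) {
            return 1.0; // short and normal-length posts: untouched
        }
        double d = numTerms - sweetSpot;
        return Math.exp(-(d * d) / (2.0 * sigma * sigma));
    }

    public static void main(String[] args) {
        GaussianLengthNorm n = new GaussianLengthNorm(300, 200);
        System.out.println(n.norm(1));    // prints 1.0 (one-word post, no penalty)
        System.out.println(n.norm(300));  // prints 1.0 (at the sweet spot)
        System.out.println(n.norm(1000)); // well under 1.0: stuffed post penalized
    }
}
```

The Gaussian tail is what makes this "smoother" than a hard cutoff: the penalty ramps up gradually instead of kicking in at a fixed length.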

Otis
--
Performance Monitoring - http://sematext.com/spm
On Nov 7, 2012 4:16 AM, "Dotan Cohen" <dotanco...@gmail.com> wrote:

> Hi all! One area where I am applying Solr deals with variable-length
> posts by users: think of things from one word posts ("Cool!" with an
> attached photo) to blog-post length (500-1000 words). Due to Field
> Normalization, the short posts get the highest Solr score, while the
> long, informative posts are pushed to the end of the results.
> Therefore I am moving to remove Field Normalization with
> omitNorms=true. However, there do exist crafty users who stuff
> hundreds of irrelevant words together to get noticed. Therefore, I
> would like to try some sort of exponential (or even linear) Field
> Normalization where no normalization is performed on one-word
> documents but the longest documents (over a few hundred words) get
> some penalty.
>
> Are there any facilities in Solr for performing this? I would of
> course prefer query-time computation as that lets me do other things
> with the data, but if only index-time computation is possible then I
> can accept that.
>
> Thank you!
>
> --
> Dotan Cohen
>
> http://gibberish.co.il
> http://what-is-what.com
>
