RE: per-fieldtype similarity not working

Markus Jelsma Fri, 08 Jun 2012 06:05:04 -0700

Excellent!
Thanks


 
 
-----Original message-----
> From:Robert Muir <rcm...@gmail.com>
> Sent: Fri 08-Jun-2012 13:06
> To: Markus Jelsma <markus.jel...@openindex.io>
> Cc: solr-user@lucene.apache.org
> Subject: Re: per-fieldtype similarity not working
> 
> On Fri, Jun 8, 2012 at 5:04 AM, Markus Jelsma
> <markus.jel...@openindex.io> wrote:
> > Thanks Robert,
> >
> > The difference in scores is clear now so it shouldn't matter as queryNorm 
> > doesn't affect ranking but coord does. Can you explain why coord is left 
> > out now and why it is considered to skew results and why queryNorm skews 
> > results? And which specific new ranking algorithms they confuse, BM25F?
> 
> I think it's easiest to compare the two TF normalization functions.
> DefaultSimilarity really needs something like coord because its
> function (sqrt) grows very fast for a single term.
> On the other hand, consider BM25's tf/(tf+lengthNorm): it saturates
> rather quickly for a single term, so when multiple terms are being
> scored, huge numbers of occurrences of a single term won't dominate
> the overall score.
> 
> You can see this visually here (give it a second to load, and imagine
> documentLength = averageDocumentLength and k=1.2):
> http://www.wolframalpha.com/input/?i=plot+sqrt%28x%29%2C+x%2F%28x%2B1.2%29%2C+x%3D1+to+100
> 
> >
> > Also, I would expect the default SchemaSimilarityFactory to behave the same 
> > as DefaultSimilarity; this might raise some further confusion down the line.
> 
> That's OK: I'd rather the very expert case (per-field scoring) be
> trickier than have a trap for people who try to use any algorithm
> other than TFIDFSimilarity.
> 
> -- 
> lucidimagination.com
>
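
For anyone who cannot load the plot linked in the quoted message, here is a
rough numeric sketch of the same comparison. It assumes, as the message does,
documentLength = averageDocumentLength and k=1.2, so that BM25's TF component
reduces to tf/(tf+1.2). The function names are made up for illustration and
are not Lucene API.

```python
import math

K = 1.2  # k from the quoted message; assumes docLength == avgDocLength


def sqrt_tf(tf):
    """TF component described for DefaultSimilarity: sqrt(tf)."""
    return math.sqrt(tf)


def saturating_tf(tf, k=K):
    """Simplified BM25 TF component, tf / (tf + k); always below 1."""
    return tf / (tf + k)


if __name__ == "__main__":
    print(f"{'tf':>4} {'sqrt(tf)':>10} {'tf/(tf+k)':>10}")
    for tf in (1, 2, 5, 10, 50, 100):
        print(f"{tf:>4} {sqrt_tf(tf):>10.3f} {saturating_tf(tf):>10.3f}")
```

The sqrt column keeps growing with tf, while the tf/(tf+k) column flattens out
below 1, which is the saturation the message describes.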
