Excellent! Thanks
-----Original message-----
> From: Robert Muir <rcm...@gmail.com>
> Sent: Fri 08-Jun-2012 13:06
> To: Markus Jelsma <markus.jel...@openindex.io>
> Cc: solr-user@lucene.apache.org
> Subject: Re: per-fieldtype similarity not working
>
> On Fri, Jun 8, 2012 at 5:04 AM, Markus Jelsma
> <markus.jel...@openindex.io> wrote:
> > Thanks Robert,
> >
> > The difference in scores is clear now so it shouldn't matter as queryNorm
> > doesn't affect ranking but coord does. Can you explain why coord is left
> > out now and why it is considered to skew results and why queryNorm skews
> > results? And which specific new ranking algorithms they confuse, BM25F?
>
> I think it's easiest to compare the two TF normalization functions.
> DefaultSimilarity really needs something like this because its
> function (sqrt) grows very fast for a single term.
> On the other hand, consider BM25's: tf/(tf+lengthNorm). It saturates
> rather quickly for a single term, so when multiple terms are being
> scored, huge numbers of occurrences of a single term won't dominate
> the overall score.
>
> You can see this visually here (give it a second to load, and imagine
> documentLength = averageDocumentLength and k=1.2):
> http://www.wolframalpha.com/input/?i=plot+sqrt%28x%29%2C+x%2F%28x%2B1.2%29%2C+x%3D1+to+100
>
> > Also, I would expect the default SchemaSimilarityFactory to behave the same
> > as DefaultSimilarity; this might raise some further confusion down the line.
>
> That's ok: I'd rather the very expert case (Per-Field scoring) be
> trickier than have a trap for people that try to use any algorithm
> other than TFIDFSimilarity.
>
> --
> lucidimagination.com
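
For anyone who wants numbers rather than the plot, here is a minimal sketch of the two TF normalization curves from the quoted message. It assumes documentLength = averageDocumentLength, so the length term reduces to k = 1.2 as in the linked plot. The function names are illustrative only and are not Lucene or Solr API; the BM25 form is the simplified tf/(tf+k) from the message, not the full scoring formula.

```python
import math

K = 1.2  # assumed value, taken from the plot in the quoted message


def sqrt_tf(tf):
    """TF component as described for DefaultSimilarity: sqrt(tf)."""
    return math.sqrt(tf)


def bm25_tf(tf, k=K):
    """Simplified BM25 TF component: tf / (tf + k).

    Assumes documentLength == averageDocumentLength, so the
    length normalization term collapses to the constant k.
    """
    return tf / (tf + k)


# Print both curves side by side for a few term frequencies.
print(f"{'tf':>4} {'sqrt(tf)':>10} {'tf/(tf+k)':>10}")
for tf in (1, 2, 5, 10, 50, 100):
    print(f"{tf:>4} {sqrt_tf(tf):>10.3f} {bm25_tf(tf):>10.3f}")
```

The sqrt column keeps growing without bound, while the BM25 column stays below 1 however often the term occurs, which is the saturation effect described above.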