Tracked it down to this ticket:

https://issues.apache.org/jira/browse/LUCENE-6590

which changed the implementation of normalize() in
org.apache.lucene.search.similarities.TFIDFSimilarity.

I've asked for comment on that ticket.
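To see why a change in normalize() can add an extra idf factor, here is a minimal Java model of classic TF-IDF scoring for a single-term query. It is a sketch, not Lucene's code and not a description of what LUCENE-6590 actually changed: the class and method names are hypothetical, and it assumes the classic definitions queryWeight = idf * boost * queryNorm, fieldWeight = tf * idf * fieldNorm, and queryNorm = 1 / sqrt(sum of squared query weights). With one term, queryNorm works out to 1 / (idf * boost), so queryWeight is 1 and the score collapses to tf * idf, matching the 4.10 output. If queryNorm is 1 instead, as the 5.5 explain reports, the score becomes tf * idf^2.

```java
/**
 * Hypothetical model of classic TF-IDF scoring for a single-term query.
 * Not Lucene source code: names and structure are illustrative only.
 */
final class SingleTermScoreModel {

    private SingleTermScoreModel() {}

    /** queryNorm = 1 / sqrt(sum of squared query weights); one term contributes (idf * boost)^2. */
    static double queryNorm(double idf, double boost) {
        double weight = idf * boost;
        return 1.0 / Math.sqrt(weight * weight);
    }

    /**
     * score = queryWeight * fieldWeight, where
     * queryWeight = idf * boost * queryNorm and fieldWeight = tf * idf * fieldNorm.
     */
    static double score(double tf, double idf, double boost, double queryNorm, double fieldNorm) {
        double queryWeight = idf * boost * queryNorm;
        double fieldWeight = tf * idf * fieldNorm;
        return queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        double idf = 5.339603; // idf value taken from the 5.5 explain output

        // queryNorm computed from the query itself (1/idf): queryWeight is 1, score ~ tf * idf
        double normalized = score(1.0, idf, 1.0, queryNorm(idf, 1.0), 1.0);

        // queryNorm fixed at 1, as the 5.5 explain shows: score ~ tf * idf^2
        double unnormalized = score(1.0, idf, 1.0, 1.0, 1.0);

        System.out.println("queryNorm = 1/idf -> " + normalized);
        System.out.println("queryNorm = 1     -> " + unnormalized);
    }
}
```

Running main prints roughly 5.339603 for the first case and roughly 28.51136 for the second, which are the idf and the final score in the 5.5 explain output.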

Upayavira

On Fri, 10 Jun 2016, at 01:39 AM, Ahmet Arslan wrote:
> Hi,
> 
> I have wondered the same thing before and failed to decipher
> TFIDFSimilarity. Scoring looks like tf*idf*idf to me.
> 
> I would appreciate it if someone could shed some light on this.
> 
> Thanks,
> Ahmet
> 
> 
> 
> On Friday, June 10, 2016 12:37 AM, Upayavira <u...@odoko.co.uk> wrote:
> I've just done a very simple, single term query against a 4.10 system
> and a 5.5 system, each with much the same data.
> 
> The score for the 4.10 system was made up solely of the field weight,
> which (with fieldNorm = 1 here) is:
>   score = tf * idf
> 
> Whereas in the 5.5 system there is an additional "query weight", which
> is idf * queryNorm. Since queryNorm is 1 here, the final score is now:
>   score = query_weight * field_weight
>         = (idf * 1) * (tf * idf)
>         = tf * idf^2
> 
> Can anyone explain why this new "query weight" element has appeared in
> our scores somewhere between 4.10 and 5.5?
> 
> Thanks!
> 
> Upayavira
> 
> 4.10 score ========================================================
>       "2937439": {
>         "match": true,
>         "value": 5.5993805,
>         "description": "weight(description:obama in 394012) [DefaultSimilarity], result of:",
>         "details": [
>           {
>             "match": true,
>             "value": 5.5993805,
>             "description": "fieldWeight in 394012, product of:",
>             "details": [
>               {
>                 "match": true,
>                 "value": 1,
>                 "description": "tf(freq=1.0), with freq of:",
>                 "details": [
>                   {
>                     "match": true,
>                     "value": 1,
>                     "description": "termFreq=1.0"
>                   }
>                 ]
>               },
>               {
>                 "match": true,
>                 "value": 5.5993805,
>                 "description": "idf(docFreq=56010, maxDocs=5568765)"
>               },
>               {
>                 "match": true,
>                 "value": 1,
>                 "description": "fieldNorm(doc=394012)"
>               }
>             ]
>           }
>         ]
> 5.5 score ========================================================
>       "2502281":{
>         "match":true,
>         "value":28.51136,
>         "description":"weight(description:obama in 43472) [], result of:",
>         "details":[{
>             "match":true,
>             "value":28.51136,
>             "description":"score(doc=43472,freq=1.0), product of:",
>             "details":[{
>                 "match":true,
>                 "value":5.339603,
>                 "description":"queryWeight, product of:",
>                 "details":[{
>                     "match":true,
>                     "value":5.339603,
>                     "description":"idf(docFreq=31905, maxDocs=2446459)"},
>                   {
>                     "match":true,
>                     "value":1.0,
>                     "description":"queryNorm"}]},
>               {
>                 "match":true,
>                 "value":5.339603,
>                 "description":"fieldWeight in 43472, product of:",
>                 "details":[{
>                     "match":true,
>                     "value":1.0,
>                     "description":"tf(freq=1.0), with freq of:",
>                     "details":[{
>                         "match":true,
>                         "value":1.0,
>                         "description":"termFreq=1.0"}]},
>                   {
>                     "match":true,
>                     "value":5.339603,
>                     "description":"idf(docFreq=31905, maxDocs=2446459)"},
>                   {
>                     "match":true,
>                     "value":1.0,
>                     "description":"fieldNorm(doc=43472)"}]}]}]},
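As a cross-check, the values in both explain outputs can be recomputed from the docFreq and maxDocs figures they print. The Java sketch below assumes the classic DefaultSimilarity formulas idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq), with fieldNorm = 1 and queryNorm = 1 as reported; the class name ExplainCheck is hypothetical. Under those assumptions the 4.10 score equals idf, while the 5.5 score equals idf squared (5.339603^2 is about 28.51136).

```java
/**
 * Hypothetical helper that recomputes the explain values above,
 * assuming the classic formulas idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq).
 */
final class ExplainCheck {

    private ExplainCheck() {}

    /** Classic idf, computed in double and narrowed to float. */
    static float idf(long docFreq, long maxDocs) {
        return (float) (Math.log(maxDocs / (double) (docFreq + 1)) + 1.0);
    }

    /** Classic tf: square root of the term frequency. */
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    public static void main(String[] args) {
        // 4.10: score = fieldWeight = tf * idf * fieldNorm (no separate queryWeight shown)
        float idf410 = idf(56010, 5568765);
        float score410 = tf(1.0f) * idf410 * 1.0f;

        // 5.5: score = queryWeight * fieldWeight = (idf * queryNorm) * (tf * idf * fieldNorm)
        float idf55 = idf(31905, 2446459);
        float queryNorm55 = 1.0f; // queryNorm as reported in the 5.5 explain
        float score55 = (idf55 * queryNorm55) * (tf(1.0f) * idf55 * 1.0f);

        System.out.println("4.10: idf=" + idf410 + " score=" + score410); // explain reports 5.5993805 for both
        System.out.println("5.5:  idf=" + idf55 + " score=" + score55);   // explain reports 5.339603 and 28.51136
    }
}
```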
