strange behavior of scores and term proximity use

Ariel Zerbib Wed, 16 Nov 2011 11:07:12 -0800

Hi,

For this term proximity query: ab_main_title_l0:"og54ct8n to be or not to be 5w8ojsx2"~1000


http://localhost:8888/solr/select?q=ab_main_title_l0%3A%22og54ct8n+to+be+or+not+to+be+5w8ojsx2%22~1000&sort=score+desc&start=0&rows=3&fl=ab_main_title_l0%2Cscore%2Cid&debugQuery=true

The first three results are the following:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">5</int>
</lst>
<result name="response" numFound="318" start="0" maxScore="3.0814114">
  <doc>
    <long name="id">2315190010001021</long>
    <arr name="ab_main_title_l0">
      <str>og54ct8n To be or not to be a Jew. 5w8ojsx2</str>
    </arr>
    <float name="score">3.0814114</float></doc>
  <doc>
    <long name="id">2313006480001021</long>
    <arr name="ab_main_title_l0">
      <str>og54ct8n To be or not to be 5w8ojsx2</str>
    </arr>
    <float name="score">3.0814114</float></doc>
  <doc>
    <long name="id">2356410250001021</long>
    <arr name="ab_main_title_l0">
      <str>og54ct8n Rumspringa : to be or not to be Amish / 5w8ojsx2</str>
    </arr>
    <float name="score">3.0814114</float></doc>
</result>
<lst name="debug">
  <str name="rawquerystring">ab_main_title_l0:"og54ct8n to be or not to be
5w8ojsx2"~1000</str>
  <str name="querystring">ab_main_title_l0:"og54ct8n to be or not to be
5w8ojsx2"~1000</str>
  <str name="parsedquery">PhraseQuery(ab_main_title_l0:"og54ct8n to be or
not to be 5w8ojsx2"~1000)</str>
  <str name="parsedquery_toString">ab_main_title_l0:"og54ct8n to be or not
to be 5w8ojsx2"~1000</str>
  <lst name="explain">
    <str name="2315190010001021">
5.337161 = (MATCH) weight(ab_main_title_l0:"og54ct8n to be or not to be
5w8ojsx2"~1000 in 378403) [DefaultSimilarity], result of:
  5.337161 = fieldWeight in 378403, product of:
    0.57735026 = tf(freq=0.33333334), with freq of:
      0.33333334 = phraseFreq=0.33333334
    29.581549 = idf(), sum of:
      1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      4.3826413 = idf(docFreq=112108, maxDocs=3301436)
      6.3982043 = idf(docFreq=14937, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
    0.3125 = fieldNorm(doc=378403)
</str>
    <str name="2313006480001021">
9.244234 = (MATCH) weight(ab_main_title_l0:"og54ct8n to be or not to be
5w8ojsx2"~1000 in 482807) [DefaultSimilarity], result of:
  9.244234 = fieldWeight in 482807, product of:
    1.0 = tf(freq=1.0), with freq of:
      1.0 = phraseFreq=1.0
    29.581549 = idf(), sum of:
      1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      4.3826413 = idf(docFreq=112108, maxDocs=3301436)
      6.3982043 = idf(docFreq=14937, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
    0.3125 = fieldNorm(doc=482807)
</str>
    <str name="2356410250001021">
5.337161 = (MATCH) weight(ab_main_title_l0:"og54ct8n to be or not to be
5w8ojsx2"~1000 in 1317563) [DefaultSimilarity], result of:
  5.337161 = fieldWeight in 1317563, product of:
    0.57735026 = tf(freq=0.33333334), with freq of:
      0.33333334 = phraseFreq=0.33333334
    29.581549 = idf(), sum of:
      1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      4.3826413 = idf(docFreq=112108, maxDocs=3301436)
      6.3982043 = idf(docFreq=14937, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
    0.3125 = fieldNorm(doc=1317563)
</str>
  </lst>
</lst>
</response>

The version used is a 4.0 snapshot from October.

I have 2 questions about the result:
- Why do the scores in the debug output differ from the scores in the result list?
- What is the expected behavior of this kind of term proximity query?
          - The debug scores seem to be correctly ordered, but the result scores seem to be wrong.
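
For reference, the debug values can be reproduced by hand. The following is only a minimal sketch, assuming DefaultSimilarity computes tf as sqrt(freq) and the sloppy phrase frequency as 1 / (distance + 1). The idf values and fieldNorm are copied from the explain output above; the function names and the edit distance of 2 (inferred from phraseFreq=0.33333334) are my own assumptions, not something Solr reports.

```python
import math

# Per-term idf values and fieldNorm, copied from the explain output above.
IDF_TERMS = [1.0012436, 3.0405464, 5.3583193, 4.3826413,
             6.3982043, 3.0405464, 5.3583193, 1.0017256]
FIELD_NORM = 0.3125


def sloppy_freq(distance):
    """Assumed DefaultSimilarity behavior: 1 / (distance + 1)."""
    return 1.0 / (distance + 1)


def field_weight(phrase_freq, idf_terms=IDF_TERMS, field_norm=FIELD_NORM):
    """fieldWeight = tf * idf * fieldNorm, with tf assumed to be sqrt(freq)."""
    tf = math.sqrt(phrase_freq)
    return tf * sum(idf_terms) * field_norm


# Exact phrase match (distance 0), as for doc 2313006480001021.
exact = field_weight(sloppy_freq(0))
# Two extra positions inside the phrase window (assumed distance 2),
# as for docs 2315190010001021 and 2356410250001021.
sloppy = field_weight(sloppy_freq(2))

print(exact, sloppy)
```

Up to float rounding, this gives the 9.244234 and 5.337161 shown in the explain section. Neither value matches the 3.0814114 reported for all three documents in the result list.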


Thanks,
Ariel
