I don't have any additional questions, and won't, until you are able to
supply the information requested in my previous response.
-- Jack Krupansky
-----Original Message-----
From: Kuchekar
Sent: Friday, September 13, 2013 1:46 PM
To: solr-user@lucene.apache.org
Subject: Re: Different Responses for 4.4 and 3.5 solr index
Hi,
Following are the debug query results:
*Solr 3.5*
<lst name="profile_D48699">
<bool name="match">true</bool>
<float name="value">60.67038</float>
<str name="description">sum of:</str>
<arr name="details">
<lst>
<bool name="match">true</bool>
<float name="value">60.67038</float>
<str name="description">max plus 1.0 times others of:</str>
<arr name="details">
<lst>
<bool name="match">true</bool>
<float name="value">0.44362593</float>
<str name="description">weight(content:cancer^0.5 in 21506339), product of:</str>
<arr name="details">
<lst>
<bool name="match">true</bool>
<float name="value">0.009291923</float>
<str name="description">queryWeight(content:cancer^0.5), product of:</str>
<arr name="details">
<lst>
<bool name="match">true</bool>
<float name="value">0.5</float>
<str name="description">boost</str>
</lst>
<lst>
<bool name="match">true</bool>
<float name="value">3.5684927</float>
<str name="description">idf(docFreq=1682287, maxDocs=21947370)</str>
</lst>
<lst>
<bool name="match">true</bool>
<float name="value">0.005207758</float>
<str name="description">queryNorm</str>
</lst>
</arr>
</lst>
<lst>
<bool name="match">true</bool>
<float name="value">47.74318</float>
<str name="description">fieldWeight(content:cancer in 21506339), product of:</str>
<arr name="details">
<lst>
<bool name="match">true</bool>
<float name="value">13.379088</float>
<str name="description">tf(termFreq(content:cancer)=179)</str>
</lst>
<lst>
<bool name="match">true</bool>
<float name="value">3.5684927</float>
<str name="description">idf(docFreq=1682287, maxDocs=21947370)</str>
</lst>
<lst>
<bool name="match">true</bool>
<float name="value">1.0</float>
<str name="description">fieldNorm(field=content, doc=21506339)</str>
</lst>
</arr>
</lst>
</arr>
</lst>
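For reference, the 3.5 explanation above can be reproduced by hand. This is a minimal sketch assuming the classic Lucene DefaultSimilarity formulas, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), with every input copied from the debug output; the helper names are illustrative only.

```python
import math

# Classic Lucene TF-IDF building blocks (DefaultSimilarity), assumed here.
def tf(freq: float) -> float:
    return math.sqrt(freq)

def idf(doc_freq: int, max_docs: int) -> float:
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_weight(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    i = idf(doc_freq, max_docs)
    query_weight = boost * i * query_norm        # queryWeight, product of
    field_weight = tf(freq) * i * field_norm     # fieldWeight, product of
    return query_weight, field_weight, query_weight * field_weight

# Inputs copied from the Solr 3.5 explanation for content:cancer^0.5.
qw, fw, w = term_weight(freq=179, doc_freq=1682287, max_docs=21947370,
                        boost=0.5, query_norm=0.005207758, field_norm=1.0)
print(f"queryWeight={qw:.6g} fieldWeight={fw:.6g} weight={w:.6g}")
```

Run as-is, it prints values matching queryWeight (0.009291923), fieldWeight (47.74318) and weight (0.44362593) up to float rounding, so the 3.5 numbers are internally consistent with those formulas.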
*Solr 4.4 debug query:*
<lst name="profile_D48699">
<bool name="match">true</bool>
<float name="value">67.04259</float>
<str name="description">max plus 1.0 times others of:</str>
<arr name="details">
<lst>
<bool name="match">true</bool>
<float name="value">0.75314933</float>
<str name="description">weight(content:cancer^0.5 in 20543947) [DefaultSimilarity], result of:</str>
<arr name="details">
<lst>
<bool name="match">true</bool>
<float name="value">0.75314933</float>
<str name="description">score(doc=20543947,freq=515.0 = termFreq=515.0 ), product of:</str>
<arr name="details">
<lst>
<bool name="match">true</bool>
<float name="value">0.009295603</float>
<str name="description">queryWeight, product of:</str>
<arr name="details">
<lst>
<bool name="match">true</bool>
<float name="value">0.5</float>
<str name="description">boost</str>
</lst>
<lst>
<bool name="match">true</bool>
<float name="value">3.5702603</float>
<str name="description">idf(docFreq=1678887, maxDocs=21941764)</str>
</lst>
<lst>
<bool name="match">true</bool>
<float name="value">0.005207241</float>
<str name="description">queryNorm</str>
</lst>
</arr>
</lst>
<lst>
<bool name="match">true</bool>
<float name="value">81.0221</float>
<str name="description">fieldWeight in 20543947, product of:</str>
<arr name="details">
<lst>
<bool name="match">true</bool>
<float name="value">22.693611</float>
<str name="description">tf(freq=515.0), with freq of:</str>
<arr name="details">
<lst>
<bool name="match">true</bool>
<float name="value">515.0</float>
<str name="description">termFreq=515.0</str>
</lst>
</arr>
</lst>
<lst>
<bool name="match">true</bool>
<float name="value">3.5702603</float>
<str name="description">idf(docFreq=1678887, maxDocs=21941764)</str>
</lst>
<lst>
<bool name="match">true</bool>
<float name="value">1.0</float>
<str name="description">fieldNorm(doc=20543947)</str>
</lst>
</arr>
</lst>
</arr>
</lst>
</arr>
</lst>
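Putting the two explanations side by side, the change in this clause can be split into its factors. The sketch below assumes boost (0.5) and fieldNorm (1.0) are equal in both versions, as the output shows, and the DefaultSimilarity formulas; the ratio of the two weights is then the product of a tf ratio, a squared idf ratio (idf appears in both queryWeight and fieldWeight), and a queryNorm ratio.

```python
import math

# Per-version inputs copied from the two explanations (content:cancer^0.5).
v35 = dict(freq=179, doc_freq=1682287, max_docs=21947370, query_norm=0.005207758)
v44 = dict(freq=515, doc_freq=1678887, max_docs=21941764, query_norm=0.005207241)

def idf(doc_freq, max_docs):
    return 1.0 + math.log(max_docs / (doc_freq + 1))

# Term weight = boost * queryNorm * idf^2 * sqrt(freq) * fieldNorm, so with
# equal boost and fieldNorm the version-to-version ratio factors cleanly.
tf_ratio = math.sqrt(v44["freq"] / v35["freq"])
idf_sq_ratio = (idf(v44["doc_freq"], v44["max_docs"])
                / idf(v35["doc_freq"], v35["max_docs"])) ** 2
norm_ratio = v44["query_norm"] / v35["query_norm"]

print(f"tf ratio:        {tf_ratio:.4f}")
print(f"idf^2 ratio:     {idf_sq_ratio:.4f}")
print(f"queryNorm ratio: {norm_ratio:.4f}")
print(f"combined:        {tf_ratio * idf_sq_ratio * norm_ratio:.4f}")
print(f"observed:        {0.75314933 / 0.44362593:.4f}")
```

With these inputs the tf ratio (about 1.70, from 515 vs 179 occurrences) accounts for almost all of the change, while idf and queryNorm move by well under 1%. Note also that this clause contributes only about 0.44 and 0.75 to totals of 60.67 and 67.04; the other clauses of the "max plus 1.0 times others" sum are not included in the excerpt.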
Searching for the term 'cancer' in the field 'content' shows the count to
be 515.
Please let me know if you have any questions or concerns.
Thanks.
Kuchekar, Nilesh
On Fri, Sep 13, 2013 at 12:36 AM, Jack Krupansky
<j...@basetechnology.com> wrote:
There may be some token filters that are emitting a different number of
terms. There are so many changes between 3.5 and 4.4 that it simply isn't
worth the trouble to track down all of them. In some cases, there may be
bugs in 3.5 that were fixed in one of the intervening releases.
Do you have a specific example: the input text, plus the field, field
type and analyzer where the tf differs? That should suggest where the
differences come from.
Do you have any specific reason to believe that one of the counts is more
right than the other?
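To illustrate how the analysis chain alone can change tf, here is a toy sketch. The analyzers are hypothetical pure-Python stand-ins, not Solr's tokenizers or filters: the same text yields different counts for "cancer" depending on whether lowercasing and a deliberately crude plural-stripping step are applied.

```python
import re
from collections import Counter

# Hypothetical, simplified analysis steps (NOT Solr's real components).
def whitespace_tokens(text):
    return text.split()

def strip_punct(tokens):
    cleaned = (re.sub(r"\W", "", t) for t in tokens)
    return [t for t in cleaned if t]

def lowercase(tokens):
    return [t.lower() for t in tokens]

def naive_plural_stem(tokens):
    # Crude stand-in for a stemming filter: drop a trailing "s".
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]

def term_count(tokens, term):
    return Counter(tokens)[term]

text = "Cancer research: cancer, cancers and CANCER screening."
chain_a = strip_punct(whitespace_tokens(text))   # no lowercasing
chain_b = lowercase(chain_a)                     # + lowercase
chain_c = naive_plural_stem(chain_b)             # + plural stripping

print(term_count(chain_a, "cancer"))  # 1
print(term_count(chain_b, "cancer"))  # 3
print(term_count(chain_c, "cancer"))  # 4
```

For a real field type, the Analysis screen in the Solr admin UI shows the tokens produced by each stage of the chain, which is a direct way to compare what 3.5 and 4.4 emit for the same input text.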
-- Jack Krupansky
-----Original Message-----
From: Kuchekar
Sent: Thursday, September 12, 2013 4:50 PM
To: solr-user@lucene.apache.org
Cc: Stefan Matheis
Subject: Re: Different Responses for 4.4 and 3.5 solr index
Hi,
After further triage, we found that the term frequency (tf) for the same
field in the same document differs between Solr 3.5 and 4.4.
For example, if the word "fruits" appears 20 times in a field, Solr 3.5
reports a tf of 8, whereas Solr 4.4 reports 20. That changes the score.
We also see that the value of 'idf', which depends on maxDocs, has
changed.
Are there any changes to the 'termFrequency' and 'idf' functions in Solr
4.4 compared to Solr 3.5?
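On the idf question: the debug output above fits the same DefaultSimilarity idf formula in both versions; what differs are the index statistics fed into it (docFreq 1682287 vs 1678887, maxDocs 21947370 vs 21941764). A minimal sketch, assuming that formula:

```python
import math

# Classic Lucene idf (DefaultSimilarity): 1 + ln(maxDocs / (docFreq + 1)).
def idf(doc_freq, max_docs):
    return 1.0 + math.log(max_docs / (doc_freq + 1))

# Index statistics copied from the two debug outputs for content:cancer.
idf_35 = idf(1682287, 21947370)
idf_44 = idf(1678887, 21941764)
rel_change = (idf_44 - idf_35) / idf_35
print(f"3.5 idf={idf_35:.7f}  4.4 idf={idf_44:.7f}  change={rel_change:.4%}")
```

The relative change is about 0.05%, so in this example idf is not what moves the ranking.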
Looking forward to your reply.
Thanks.
Kuchekar, Nilesh
On Thu, Sep 12, 2013 at 11:30 AM, Kuchekar <kuchekar.nil...@gmail.com>
wrote:
Hi,
Any updates on this? Does the ranking computation depend on the 'maxDoc'
value in Solr? Is this happening because the 'maxDoc' value changes after
each optimization? In Solr 4.4, every time an optimization is run the
'maxDoc' value is reset, whereas this is not the case in Solr 3.5.
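On maxDoc: in both Lucene 3.x and 4.x, DefaultSimilarity computes idf from maxDoc, which still counts documents that were deleted (or replaced by an update) but not yet merged away, and docFreq likewise still includes them. An optimize that merges down to a single segment expunges them, so maxDoc drops to numDocs and idf shifts. The index sizes below are hypothetical, chosen only to show the direction of the effect.

```python
import math

def idf(doc_freq, max_docs):
    # DefaultSimilarity-style idf, assumed here.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

# Hypothetical index: 1,000,000 live docs plus 50,000 deleted-but-unmerged
# docs; 80,000 live docs and 10,000 deleted docs contain the term.
live_docs, deleted_docs = 1_000_000, 50_000
live_df, deleted_df = 80_000, 10_000

# Before optimize: deleted docs still count toward maxDoc and docFreq.
before = idf(live_df + deleted_df, live_docs + deleted_docs)
# After merging to one segment: deletes are gone and maxDoc == numDocs.
after = idf(live_df, live_docs)
print(f"idf before optimize: {before:.4f}, after: {after:.4f}")
```

Since each term's weight carries idf squared, shifts like this can change the relative weight of the terms in a multi-term query, which is one way a ranking can move after an optimize.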
Looking forward to the reply.
Thanks.
Kuchekar, Nilesh
On Wed, Aug 28, 2013 at 3:32 PM, Michael Sokolov
<msoko...@safaribooksonline.com> wrote:
We've been seeing changes in our rankings as well. I don't have a
definite answer yet, since we're waiting on an index rebuild, but our
current working theory is that the change to the default omitNorms="true"
for primitive types may have had an effect, possibly through follow-on
confusion: our developers may have omitted norms from some other fields
they shouldn't have.
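On norms: with DefaultSimilarity, a field's norm shrinks as the field gets longer, and omitNorms="true" removes that factor entirely (fieldNorm is then 1.0). A simplified sketch follows; it ignores the single-byte encoding Lucene uses to store norms and any overlap discounting. Both debug outputs earlier in the thread show fieldNorm=1.0 for content, which is what you would expect if norms are omitted on that field.

```python
import math

def length_norm(num_terms, field_boost=1.0):
    # DefaultSimilarity-style length norm: boost / sqrt(number of terms).
    return field_boost / math.sqrt(num_terms)

def field_norm(num_terms, omit_norms):
    # With omitNorms="true" no norm is stored, so the factor is always 1.0.
    return 1.0 if omit_norms else length_norm(num_terms)

for n in (4, 100, 10_000):
    print(n, field_norm(n, omit_norms=False), field_norm(n, omit_norms=True))
```

A short matching field gets a larger norm than a long one, so turning norms off (or on) for a field can reorder results even when tf and idf are unchanged.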
-Mike
On 08/26/2013 09:46 AM, Stefan Matheis wrote:
Did you check the scoring? (Use fl=*,score to retrieve it.) Additionally,
debugQuery=true might provide more information about how the score was
calculated.
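As a sketch of this suggestion, the snippet below builds a request URL carrying both parameters. The base URL is hypothetical; substitute each server's host, port and core path.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute each server's host, port and core path.
BASE_URL = "http://localhost:8983/solr/select"

def debug_url(base_url: str, q: str) -> str:
    # fl=*,score adds the score to every returned document;
    # debugQuery=true adds the per-document score explanations.
    params = [("q", q), ("fl", "*,score"), ("debugQuery", "true")]
    return f"{base_url}?{urlencode(params)}"

url = debug_url(BASE_URL, "Apple")
print(url)
```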
- Stefan
On Monday, August 26, 2013 at 12:46 AM, Kuchekar wrote:
Hi,
The responses from 4.4 and 3.5 in the current scenario differ in the
order in which the results are returned.
For example:
Response from Solr 3.5: id:A, id:B, id:C, id:D ...
Response from Solr 4.4: id:C, id:A, id:D, id:B ...
Looking forward to your reply.
Thanks.
Kuchekar, Nilesh
On Sun, Aug 25, 2013 at 11:32 AM, Stefan Matheis
<matheis.ste...@gmail.com> wrote:
Kuchekar (hope that's your first name?),
you didn't tell us how they differ. Do you get an actual error? Or does
the result contain documents you didn't expect? Or the other way round:
are some missing that you'd expect to be there?
- Stefan
On Sunday, August 25, 2013 at 4:43 PM, Kuchekar wrote:
Hi,
We get different responses when we query Solr 4.4 and 3.5 using the same
query params.
My query params are as follows:
facet=true
&facet.mincount=1
&facet.limit=25
&qf=content^0.0+p_last_name^500.0+p_first_name^50.0+strong_topic^0.0+first_author_topic^0.0+last_author_topic^0.0+title_topic^0.0
&wt=javabin
&version=2
&rows=10
&f.affiliation_org.facet.limit=150
&fl=p_id,p_first_name,p_last_name
&start=0
&q=Apple
&facet.field=affiliation_org
&fq=table:profile
&fq=num_content:[*+TO+1500]
&fq=name:"Apple"
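To compare the two servers on identical input, the parameters above can be kept as (name, value) pairs and rendered once per server. A minimal sketch with hypothetical hostnames; debugQuery=true and score in fl are additions for the comparison, and wt=javabin/version=2 are swapped for wt=xml so the response can be read directly.

```python
from urllib.parse import urlencode

# Hypothetical base URLs for the two servers being compared.
SERVERS = {
    "3.5": "http://solr35.example:8983/solr/select",
    "4.4": "http://solr44.example:8983/solr/select",
}

# Parameters from the message above as (name, value) pairs. Spaces here are
# what the "+" characters in the original URL encode.
PARAMS = [
    ("facet", "true"),
    ("facet.mincount", "1"),
    ("facet.limit", "25"),
    ("qf", "content^0.0 p_last_name^500.0 p_first_name^50.0 strong_topic^0.0 "
           "first_author_topic^0.0 last_author_topic^0.0 title_topic^0.0"),
    ("rows", "10"),
    ("f.affiliation_org.facet.limit", "150"),
    ("fl", "p_id,p_first_name,p_last_name,score"),  # score added
    ("start", "0"),
    ("q", "Apple"),
    ("facet.field", "affiliation_org"),
    ("fq", "table:profile"),
    ("fq", "num_content:[* TO 1500]"),
    ("fq", 'name:"Apple"'),
    ("debugQuery", "true"),  # added: per-document score explanations
    ("wt", "xml"),           # swapped from javabin for readability
]

urls = {version: f"{base}?{urlencode(PARAMS)}" for version, base in SERVERS.items()}
for version, url in urls.items():
    print(version, url)
```

A list of pairs is used instead of a dict because fq appears three times; a dict would keep only the last value. urlencode also turns spaces into +, which is how the qf and fq values appear in the original parameters.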
The content in both (Solr 4.4 and Solr 3.5) is the same.
The solrconfig.xml files for 3.5 and 4.4 are similarly constructed.
Is there something I am missing that might have changed in 4.4 and might
be causing this issue? The "qf" params look the same.
Looking forward to your reply.
Thanks.
Kuchekar, Nilesh