Thanks. No immediate, obvious problem stands out, but I need to study it
more closely (which I am doing now).
For the "good" query I see idf(doc: ca=10 067=10), which looks exactly
correct.
But for the "bad" query I see idf(text: ca=16 067=9), which doesn't look
right. I can believe that there are additional docs containing "ca" in some
field, but the text field should have at least as many occurrences of "067"
as the doc field.
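
As a rough cross-check of those idf numbers, here is a minimal sketch. It assumes the classic Lucene 3.x DefaultSimilarity formula, idf = 1 + ln(numDocs / (docFreq + 1)), with a phrase's idf being the sum of its terms' idfs. The index size of 82 is an assumption, picked only because it reproduces both reported values; the function names are hypothetical.

```python
import math


def term_idf(doc_freq, num_docs):
    """Classic Lucene-style idf for one term: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def phrase_idf(doc_freqs, num_docs):
    """Phrase idf as the sum of the idfs of the phrase's terms."""
    return sum(term_idf(df, num_docs) for df in doc_freqs)


# Assumption: 82 documents in the index (inferred from the numbers, not stated anywhere).
NUM_DOCS = 82

good = phrase_idf([10, 10], NUM_DOCS)  # doc field: ca=10, 067=10
bad = phrase_idf([16, 9], NUM_DOCS)    # text field: ca=16, 067=9

print(f"doc  field phrase idf: {good:.5f}")
print(f"text field phrase idf: {bad:.5f}")
```

Under these assumptions both values line up with the debug output (about 6.0176 and 5.6776), so the arithmetic itself looks consistent; the suspicious part is the docFreq of 9 for "067" in the text field.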
Any chance that you modified your schema, including copyFields, since the
first 5 documents were added? If so, you need to re-index them all.
And, the ca=16 suggests that you have additional copyFields that target the
"text" field. Is that the case?
Are you using the official release of 3.4 or was this a snapshot nightly
build?
What schema version do you have? Look for the
<schema name="example" version="n.m"> element in schema.xml.
Thanks.
-- Jack Krupansky
-----Original Message-----
From: Shalom
Sent: Thursday, August 09, 2012 9:27 AM
To: solr-user@lucene.apache.org
Subject: Re: search on default field returns less documents
Jack, thanks for your reply.
We are using Solr 3.4.
We use the standard Lucene query parser.
I added debugQuery=true. This is the result when searching for ca067, which
returns 5 documents:
<lst name="debug">
<str name="rawquerystring">ca067</str>
<str name="querystring">ca067</str>
<str name="parsedquery">PhraseQuery(text:"ca 067")</str>
<str name="parsedquery_toString">text:"ca 067"</str>
<lst name="explain">
<str name="219">
0.1108914 = (MATCH) weight(text:"ca 067" in 75), product of:
1.0 = queryWeight(text:"ca 067"), product of:
5.67764 = idf(text: ca=16 067=9)
0.17612952 = queryNorm
0.1108914 = fieldWeight(text:"ca 067" in 75), product of:
1.0 = tf(phraseFreq=1.0)
5.67764 = idf(text: ca=16 067=9)
0.01953125 = fieldNorm(field=text, doc=75)
</str><str name="215">
0.088713124 = (MATCH) weight(text:"ca 067" in 71), product of:
1.0 = queryWeight(text:"ca 067"), product of:
5.67764 = idf(text: ca=16 067=9)
0.17612952 = queryNorm
0.088713124 = fieldWeight(text:"ca 067" in 71), product of:
1.0 = tf(phraseFreq=1.0)
5.67764 = idf(text: ca=16 067=9)
0.015625 = fieldNorm(field=text, doc=71)
</str><str name="216">
0.088713124 = (MATCH) weight(text:"ca 067" in 72), product of:
1.0 = queryWeight(text:"ca 067"), product of:
5.67764 = idf(text: ca=16 067=9)
0.17612952 = queryNorm
0.088713124 = fieldWeight(text:"ca 067" in 72), product of:
1.0 = tf(phraseFreq=1.0)
5.67764 = idf(text: ca=16 067=9)
0.015625 = fieldNorm(field=text, doc=72)
</str><str name="218">
0.06653485 = (MATCH) weight(text:"ca 067" in 74), product of:
1.0 = queryWeight(text:"ca 067"), product of:
5.67764 = idf(text: ca=16 067=9)
0.17612952 = queryNorm
0.06653485 = fieldWeight(text:"ca 067" in 74), product of:
1.0 = tf(phraseFreq=1.0)
5.67764 = idf(text: ca=16 067=9)
0.01171875 = fieldNorm(field=text, doc=74)
</str><str name="217">
0.0554457 = (MATCH) weight(text:"ca 067" in 73), product of:
1.0 = queryWeight(text:"ca 067"), product of:
5.67764 = idf(text: ca=16 067=9)
0.17612952 = queryNorm
0.0554457 = fieldWeight(text:"ca 067" in 73), product of:
1.0 = tf(phraseFreq=1.0)
5.67764 = idf(text: ca=16 067=9)
0.009765625 = fieldNorm(field=text, doc=73)
</str></lst>
This is the result when searching for doc:ca067, which returns 10 documents:
<lst name="debug">
<str name="rawquerystring">doc:ca067</str>
<str name="querystring">doc:ca067</str>
<str name="parsedquery">PhraseQuery(doc:"ca 067")</str>
<str name="parsedquery_toString">doc:"ca 067"</str>
<lst name="explain">
<str name="215">
1.8805147 = (MATCH) weight(doc:"ca 067" in 71), product of:
0.99999994 = queryWeight(doc:"ca 067"), product of:
6.0176477 = idf(doc: ca=10 067=10)
0.16617788 = queryNorm
1.8805149 = fieldWeight(doc:"ca 067" in 71), product of:
1.0 = tf(phraseFreq=1.0)
6.0176477 = idf(doc: ca=10 067=10)
0.3125 = fieldNorm(field=doc, doc=71)
</str><str name="216">
1.8805147 = (MATCH) weight(doc:"ca 067" in 72), product of:
0.99999994 = queryWeight(doc:"ca 067"), product of:
6.0176477 = idf(doc: ca=10 067=10)
0.16617788 = queryNorm
1.8805149 = fieldWeight(doc:"ca 067" in 72), product of:
1.0 = tf(phraseFreq=1.0)
6.0176477 = idf(doc: ca=10 067=10)
0.3125 = fieldNorm(field=doc, doc=72)
</str><str name="217">
1.8805147 = (MATCH) weight(doc:"ca 067" in 73), product of:
0.99999994 = queryWeight(doc:"ca 067"), product of:
6.0176477 = idf(doc: ca=10 067=10)
0.16617788 = queryNorm
1.8805149 = fieldWeight(doc:"ca 067" in 73), product of:
1.0 = tf(phraseFreq=1.0)
6.0176477 = idf(doc: ca=10 067=10)
0.3125 = fieldNorm(field=doc, doc=73)
</str><str name="218">
1.8805147 = (MATCH) weight(doc:"ca 067" in 74), product of:
0.99999994 = queryWeight(doc:"ca 067"), product of:
6.0176477 = idf(doc: ca=10 067=10)
0.16617788 = queryNorm
1.8805149 = fieldWeight(doc:"ca 067" in 74), product of:
1.0 = tf(phraseFreq=1.0)
6.0176477 = idf(doc: ca=10 067=10)
0.3125 = fieldNorm(field=doc, doc=74)
</str><str name="219">
1.8805147 = (MATCH) weight(doc:"ca 067" in 75), product of:
0.99999994 = queryWeight(doc:"ca 067"), product of:
6.0176477 = idf(doc: ca=10 067=10)
0.16617788 = queryNorm
1.8805149 = fieldWeight(doc:"ca 067" in 75), product of:
1.0 = tf(phraseFreq=1.0)
6.0176477 = idf(doc: ca=10 067=10)
0.3125 = fieldNorm(field=doc, doc=75)
</str><str name="220">
1.8805147 = (MATCH) weight(doc:"ca 067" in 76), product of:
0.99999994 = queryWeight(doc:"ca 067"), product of:
6.0176477 = idf(doc: ca=10 067=10)
0.16617788 = queryNorm
1.8805149 = fieldWeight(doc:"ca 067" in 76), product of:
1.0 = tf(phraseFreq=1.0)
6.0176477 = idf(doc: ca=10 067=10)
0.3125 = fieldNorm(field=doc, doc=76)
</str><str name="221">
1.8805147 = (MATCH) weight(doc:"ca 067" in 77), product of:
0.99999994 = queryWeight(doc:"ca 067"), product of:
6.0176477 = idf(doc: ca=10 067=10)
0.16617788 = queryNorm
1.8805149 = fieldWeight(doc:"ca 067" in 77), product of:
1.0 = tf(phraseFreq=1.0)
6.0176477 = idf(doc: ca=10 067=10)
0.3125 = fieldNorm(field=doc, doc=77)
</str><str name="222">
1.8805147 = (MATCH) weight(doc:"ca 067" in 78), product of:
0.99999994 = queryWeight(doc:"ca 067"), product of:
6.0176477 = idf(doc: ca=10 067=10)
0.16617788 = queryNorm
1.8805149 = fieldWeight(doc:"ca 067" in 78), product of:
1.0 = tf(phraseFreq=1.0)
6.0176477 = idf(doc: ca=10 067=10)
0.3125 = fieldNorm(field=doc, doc=78)
</str><str name="223">
1.8805147 = (MATCH) weight(doc:"ca 067" in 79), product of:
0.99999994 = queryWeight(doc:"ca 067"), product of:
6.0176477 = idf(doc: ca=10 067=10)
0.16617788 = queryNorm
1.8805149 = fieldWeight(doc:"ca 067" in 79), product of:
1.0 = tf(phraseFreq=1.0)
6.0176477 = idf(doc: ca=10 067=10)
0.3125 = fieldNorm(field=doc, doc=79)
</str><str name="224">
1.8805147 = (MATCH) weight(doc:"ca 067" in 80), product of:
0.99999994 = queryWeight(doc:"ca 067"), product of:
6.0176477 = idf(doc: ca=10 067=10)
0.16617788 = queryNorm
1.8805149 = fieldWeight(doc:"ca 067" in 80), product of:
1.0 = tf(phraseFreq=1.0)
6.0176477 = idf(doc: ca=10 067=10)
0.3125 = fieldNorm(field=doc, doc=80)
</str></lst>
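
For anyone reading the explain trees above, here is a minimal sketch of how the factors combine, using only the numbers printed in the output. It assumes the structure the explain text itself shows (score = queryWeight * fieldWeight, queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm); the function name is hypothetical.

```python
def explain_score(idf, query_norm, tf, field_norm):
    """Recombine the factors shown in a Solr explain tree into the final score."""
    query_weight = idf * query_norm        # "queryWeight ... product of"
    field_weight = tf * idf * field_norm   # "fieldWeight ... product of"
    return query_weight * field_weight


# Values copied from the explain output for document 219 (internal doc 75).
text_score = explain_score(idf=5.67764, query_norm=0.17612952, tf=1.0, field_norm=0.01953125)
doc_score = explain_score(idf=6.0176477, query_norm=0.16617788, tf=1.0, field_norm=0.3125)

print(f"text:\"ca 067\" score: {text_score:.7f}")
print(f"doc:\"ca 067\"  score: {doc_score:.7f}")
```

The idf values of the two queries are close, so most of the score gap comes from fieldNorm (0.3125 for doc versus about 0.0195 or less for text), which would be consistent with the text field holding many more tokens per document.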
As a reminder, we have 10 documents whose doc field contains these names:
ca067sac 201205 At A Glance v0.pdf
ca067sac 201205 Builder Activity v0.pdf
ca067sac 201205 Foreclosure v0.pdf
ca067sac 201205 Hili Activity v0.pdf
ca067sac 201205 LCP Activity v0.pdf
ca067sac 201205 Lender Activity v0.pdf
ca067sac 201205 Title Activity v0.pdf
ca067sac 201205 Transaction Rpt TO v0.pdf
ca067sac 201205 Transaction Rpt v0.pdf
ca067sac 201205 Unknown Escrow-Title v0.pdf
If I search for ca067 I get 5 results, and searching for only 067 I get 9
results, whereas searching for doc:ca067 or doc:067 I get 10 results.
This is what my textgen field type looks like:
<fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="0" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="0" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
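
To illustrate why ca067 ends up parsed as the phrase "ca 067", here is a minimal sketch of the splitting step. It is only a rough stand-in for WordDelimiterFilterFactory, assuming it splits at letter/digit boundaries and on non-alphanumeric characters; it ignores catenation, stopwords, synonyms, and token positions, and split_word_parts is a hypothetical helper, not a Solr API.

```python
import re
from typing import List


def split_word_parts(token: str) -> List[str]:
    """Split a whitespace token into runs of letters and runs of digits, lowercased.

    Case changes do not cause a split here, loosely mirroring splitOnCaseChange="0".
    """
    return [part.lower() for part in re.findall(r"[A-Za-z]+|[0-9]+", token)]


# Query side: the single token "ca067" becomes two terms, hence a phrase query.
print(split_word_parts("ca067"))

# Index side: the first token of each file name splits into three parts.
print(split_word_parts("ca067sac"))
```

Since the query yields the parts "ca" and "067" in sequence, the parser builds a phrase query over them, which is what parsedquery shows.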
Thank you