Re: search on default field returns less documents

Jack Krupansky Thu, 09 Aug 2012 09:12:29 -0700

Thanks. No immediate, obvious problem stands out, but I need to study it more closely (which I am doing now).

For the "good" query I see idf(doc: ca=10 067=10), which looks exactly correct.

But for the "bad" query I see idf(text: ca=16 067=9), which doesn't look right. I can believe that there are additional docs containing "ca" in some field, but the text field should have at least as many occurrences of "067" as the doc field.
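
As a sanity check on those numbers, here is a minimal sketch that recomputes the idf values from the document frequencies in the debug output. It assumes the Lucene 3.x DefaultSimilarity formula, idf = 1 + ln(numDocs / (docFreq + 1)), with a phrase's idf being the sum of its terms' idfs. The index size of 82 documents is not stated anywhere in this thread; it is an assumption, chosen because it reproduces the reported values.

```python
import math

def idf(doc_freq: int, num_docs: int) -> float:
    """Lucene 3.x DefaultSimilarity idf: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def phrase_idf(doc_freqs, num_docs: int) -> float:
    """A phrase query's idf is the sum of the idfs of its terms."""
    return sum(idf(df, num_docs) for df in doc_freqs)

# Assumption: 82 is inferred from the reported idf values, not given in the thread.
NUM_DOCS = 82

good = phrase_idf([10, 10], NUM_DOCS)  # doc field:  ca=10, 067=10
bad = phrase_idf([16, 9], NUM_DOCS)    # text field: ca=16, 067=9

# fieldWeight for document 219 = tf * idf * fieldNorm, using values from the explain output
field_weight_219 = 1.0 * bad * 0.01953125

print(round(good, 4), round(bad, 4), round(field_weight_219, 4))
```

If that assumption holds, both idf values are internally consistent, so the scoring arithmetic is fine and the question is only why "067" has a document frequency of 9 in the text field.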

Any chance that you modified your schema, including copyFields, since the first 5 documents were added? If so, you need to re-index them all.

And, the ca=16 suggests that you have additional copyFields that target the "text" field. Is that the case?

Are you using the official release of 3.4 or was this a snapshot nightly build?

What schema version do you have? Look for "<schema name="example" version="n.m">" in schema.xml.
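
If it is easier than reading the file by eye, both the schema version and the copyFields that feed "text" can be pulled out with a few lines of script. This is only a sketch: the sample schema below is hypothetical (the field names, the "title" source and version 1.4 are made up for illustration), and in practice you would read your own schema.xml instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; replace with the contents of your own schema.xml.
SAMPLE_SCHEMA = """<schema name="example" version="1.4">
  <fields>
    <field name="doc" type="textgen" indexed="true" stored="true"/>
    <field name="text" type="textgen" indexed="true" stored="false" multiValued="true"/>
  </fields>
  <copyField source="doc" dest="text"/>
  <copyField source="title" dest="text"/>
</schema>"""

def schema_summary(xml_text: str, target: str = "text"):
    """Return the schema version and the sources of all copyFields whose dest is `target`."""
    root = ET.fromstring(xml_text)
    sources = [cf.get("source") for cf in root.iter("copyField")
               if cf.get("dest") == target]
    return root.get("version"), sources

version, sources = schema_summary(SAMPLE_SCHEMA)
print("schema version:", version)
print("copyField sources for 'text':", sources)
```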


Thanks.

-- Jack Krupansky

-----Original Message-----
From: Shalom

Sent: Thursday, August 09, 2012 9:27 AM
To: solr-user@lucene.apache.org
Subject: Re: search on default field returns less documents

Jack, Thanks for your reply.

We are using Solr 3.4.

We use the standard Lucene query parser.

I added debugQuery=true. This is the result when searching for ca067 and
getting 5 documents:

<lst name="debug"><str name="rawquerystring">ca067</str><str
name="querystring">ca067</str><str name="parsedquery">PhraseQuery(text:"ca
067")</str><str name="parsedquery_toString">text:"ca 067"</str><lst
name="explain"><str name="219">
0.1108914 = (MATCH) weight(text:"ca 067" in 75), product of:
 1.0 = queryWeight(text:"ca 067"), product of:
   5.67764 = idf(text: ca=16 067=9)
   0.17612952 = queryNorm
 0.1108914 = fieldWeight(text:"ca 067" in 75), product of:
   1.0 = tf(phraseFreq=1.0)
   5.67764 = idf(text: ca=16 067=9)
   0.01953125 = fieldNorm(field=text, doc=75)
</str><str name="215">
0.088713124 = (MATCH) weight(text:"ca 067" in 71), product of:
 1.0 = queryWeight(text:"ca 067"), product of:
   5.67764 = idf(text: ca=16 067=9)
   0.17612952 = queryNorm
 0.088713124 = fieldWeight(text:"ca 067" in 71), product of:
   1.0 = tf(phraseFreq=1.0)
   5.67764 = idf(text: ca=16 067=9)
   0.015625 = fieldNorm(field=text, doc=71)
</str><str name="216">
0.088713124 = (MATCH) weight(text:"ca 067" in 72), product of:
 1.0 = queryWeight(text:"ca 067"), product of:
   5.67764 = idf(text: ca=16 067=9)
   0.17612952 = queryNorm
 0.088713124 = fieldWeight(text:"ca 067" in 72), product of:
   1.0 = tf(phraseFreq=1.0)
   5.67764 = idf(text: ca=16 067=9)
   0.015625 = fieldNorm(field=text, doc=72)
</str><str name="218">
0.06653485 = (MATCH) weight(text:"ca 067" in 74), product of:
 1.0 = queryWeight(text:"ca 067"), product of:
   5.67764 = idf(text: ca=16 067=9)
   0.17612952 = queryNorm
 0.06653485 = fieldWeight(text:"ca 067" in 74), product of:
   1.0 = tf(phraseFreq=1.0)
   5.67764 = idf(text: ca=16 067=9)
   0.01171875 = fieldNorm(field=text, doc=74)
</str><str name="217">
0.0554457 = (MATCH) weight(text:"ca 067" in 73), product of:
 1.0 = queryWeight(text:"ca 067"), product of:
   5.67764 = idf(text: ca=16 067=9)
   0.17612952 = queryNorm
 0.0554457 = fieldWeight(text:"ca 067" in 73), product of:
   1.0 = tf(phraseFreq=1.0)
   5.67764 = idf(text: ca=16 067=9)
   0.009765625 = fieldNorm(field=text, doc=73)
</str></lst>


This is the result when searching for doc:ca067 and getting 10 documents:

<lst name="debug"><str name="rawquerystring">doc:ca067</str><str
name="querystring">doc:ca067</str><str
name="parsedquery">PhraseQuery(doc:"ca 067")</str><str
name="parsedquery_toString">doc:"ca 067"</str><lst name="explain"><str
name="215">
1.8805147 = (MATCH) weight(doc:"ca 067" in 71), product of:
 0.99999994 = queryWeight(doc:"ca 067"), product of:
   6.0176477 = idf(doc: ca=10 067=10)
   0.16617788 = queryNorm
 1.8805149 = fieldWeight(doc:"ca 067" in 71), product of:
   1.0 = tf(phraseFreq=1.0)
   6.0176477 = idf(doc: ca=10 067=10)
   0.3125 = fieldNorm(field=doc, doc=71)
</str><str name="216">
1.8805147 = (MATCH) weight(doc:"ca 067" in 72), product of:
 0.99999994 = queryWeight(doc:"ca 067"), product of:
   6.0176477 = idf(doc: ca=10 067=10)
   0.16617788 = queryNorm
 1.8805149 = fieldWeight(doc:"ca 067" in 72), product of:
   1.0 = tf(phraseFreq=1.0)
   6.0176477 = idf(doc: ca=10 067=10)
   0.3125 = fieldNorm(field=doc, doc=72)
</str><str name="217">
1.8805147 = (MATCH) weight(doc:"ca 067" in 73), product of:
 0.99999994 = queryWeight(doc:"ca 067"), product of:
   6.0176477 = idf(doc: ca=10 067=10)
   0.16617788 = queryNorm
 1.8805149 = fieldWeight(doc:"ca 067" in 73), product of:
   1.0 = tf(phraseFreq=1.0)
   6.0176477 = idf(doc: ca=10 067=10)
   0.3125 = fieldNorm(field=doc, doc=73)
</str><str name="218">
1.8805147 = (MATCH) weight(doc:"ca 067" in 74), product of:
 0.99999994 = queryWeight(doc:"ca 067"), product of:
   6.0176477 = idf(doc: ca=10 067=10)
   0.16617788 = queryNorm
 1.8805149 = fieldWeight(doc:"ca 067" in 74), product of:
   1.0 = tf(phraseFreq=1.0)
   6.0176477 = idf(doc: ca=10 067=10)
   0.3125 = fieldNorm(field=doc, doc=74)
</str><str name="219">
1.8805147 = (MATCH) weight(doc:"ca 067" in 75), product of:
 0.99999994 = queryWeight(doc:"ca 067"), product of:
   6.0176477 = idf(doc: ca=10 067=10)
   0.16617788 = queryNorm
 1.8805149 = fieldWeight(doc:"ca 067" in 75), product of:
   1.0 = tf(phraseFreq=1.0)
   6.0176477 = idf(doc: ca=10 067=10)
   0.3125 = fieldNorm(field=doc, doc=75)
</str><str name="220">
1.8805147 = (MATCH) weight(doc:"ca 067" in 76), product of:
 0.99999994 = queryWeight(doc:"ca 067"), product of:
   6.0176477 = idf(doc: ca=10 067=10)
   0.16617788 = queryNorm
 1.8805149 = fieldWeight(doc:"ca 067" in 76), product of:
   1.0 = tf(phraseFreq=1.0)
   6.0176477 = idf(doc: ca=10 067=10)
   0.3125 = fieldNorm(field=doc, doc=76)
</str><str name="221">
1.8805147 = (MATCH) weight(doc:"ca 067" in 77), product of:
 0.99999994 = queryWeight(doc:"ca 067"), product of:
   6.0176477 = idf(doc: ca=10 067=10)
   0.16617788 = queryNorm
 1.8805149 = fieldWeight(doc:"ca 067" in 77), product of:
   1.0 = tf(phraseFreq=1.0)
   6.0176477 = idf(doc: ca=10 067=10)
   0.3125 = fieldNorm(field=doc, doc=77)
</str><str name="222">
1.8805147 = (MATCH) weight(doc:"ca 067" in 78), product of:
 0.99999994 = queryWeight(doc:"ca 067"), product of:
   6.0176477 = idf(doc: ca=10 067=10)
   0.16617788 = queryNorm
 1.8805149 = fieldWeight(doc:"ca 067" in 78), product of:
   1.0 = tf(phraseFreq=1.0)
   6.0176477 = idf(doc: ca=10 067=10)
   0.3125 = fieldNorm(field=doc, doc=78)
</str><str name="223">
1.8805147 = (MATCH) weight(doc:"ca 067" in 79), product of:
 0.99999994 = queryWeight(doc:"ca 067"), product of:
   6.0176477 = idf(doc: ca=10 067=10)
   0.16617788 = queryNorm
 1.8805149 = fieldWeight(doc:"ca 067" in 79), product of:
   1.0 = tf(phraseFreq=1.0)
   6.0176477 = idf(doc: ca=10 067=10)
   0.3125 = fieldNorm(field=doc, doc=79)
</str><str name="224">
1.8805147 = (MATCH) weight(doc:"ca 067" in 80), product of:
 0.99999994 = queryWeight(doc:"ca 067"), product of:
   6.0176477 = idf(doc: ca=10 067=10)
   0.16617788 = queryNorm
 1.8805149 = fieldWeight(doc:"ca 067" in 80), product of:
   1.0 = tf(phraseFreq=1.0)
   6.0176477 = idf(doc: ca=10 067=10)
   0.3125 = fieldNorm(field=doc, doc=80)
</str></lst>


As a reminder, we have 10 documents whose doc field holds these names:
ca067sac 201205 At A Glance v0.pdf
ca067sac 201205 Builder Activity v0.pdf
ca067sac 201205 Foreclosure v0.pdf
ca067sac 201205 Hili Activity v0.pdf
ca067sac 201205 LCP Activity v0.pdf
ca067sac 201205 Lender Activity v0.pdf
ca067sac 201205 Title Activity v0.pdf
ca067sac 201205 Transaction Rpt TO v0.pdf
ca067sac 201205 Transaction Rpt v0.pdf
ca067sac 201205 Unknown Escrow-Title v0.pdf


If I search for ca067 I get 5 results, and searching for only 067 I get 9
results, whereas searching for doc:ca067 or doc:067 I get 10 results.


This is what my textgen field type looks like:
<fieldType name="textgen" class="solr.TextField"
positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory" />
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" enablePositionIncrements="true" />
<filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="0" splitOnCaseChange="0" />
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory" />
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true" />
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" enablePositionIncrements="true" />
<filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="0"
catenateNumbers="0" catenateAll="0" splitOnCaseChange="0" />
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
</fieldType>
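
To illustrate why a query for ca067 is parsed as the phrase "ca 067", here is a rough sketch of the splitting this analyzer chain performs. It is only an approximation, not the real WordDelimiterFilterFactory: it assumes the letter/digit split is left enabled (it is not turned off in the config above), and it ignores stop words, synonyms, catenation and position increments.

```python
import re

def split_word_parts(token: str):
    """Approximate the word delimiter split: break on non-alphanumerics
    and on letter/digit transitions, then lowercase each part."""
    return [part.lower() for part in re.findall(r"[A-Za-z]+|[0-9]+", token)]

def analyze(text: str):
    """Whitespace-tokenize, then split each token into its word and number parts."""
    parts = []
    for token in text.split():
        parts.extend(split_word_parts(token))
    return parts

print(analyze("ca067"))
print(analyze("ca067sac 201205 At A Glance v0.pdf"))
```

Under these assumptions the query ca067 yields the two adjacent parts "ca" and "067", which the query parser turns into a phrase query, and the indexed file name ca067sac yields "ca", "067" and "sac" in the same order, so the phrase should match.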



Thank you



--

View this message in context: http://lucene.472066.n3.nabble.com/search-on-default-field-returns-less-documents-tp3999896p4000145.html
Sent from the Solr - User mailing list archive at Nabble.com.
