The IDF figures come from how many documents in the entire index contain the term in that field; they say nothing about the specific document.
So it means that the term is in that field for some document somewhere,
but not in that particular document I believe...

Which leads me to wonder if the document is getting indexed as you expect,
although there's nothing in the data that you've provided that I can point
to as the culprit, it all looks like it *should* work....

If you can get a copy of Luke and look at the document in question and/or
look at the "schema browser" for that particular field it might help, but
frankly I'm at a loss to understand what the problem is...

Sorry I can't be of more help
Erick

On Tue, Jul 26, 2011 at 1:04 PM, Robert Petersen <rober...@buy.com> wrote:
> That didn't help. Seems like another case where I should get matches but
> don't and this time it is only for some documents. Others with similar
> content do match just fine. The debug output 'explain other' section for a
> non-matching document seems to say the term frequency is 0 for my problematic
> term, although I know it is in the content.
>
> I ended up making a synonym to do what the analysis stack *should* be doing:
> splitting LaserJet on case changes. IE putting LaserJet, laser jet in
> synonyms at index time makes this work. I don't know why though.
>
> Question: Does this debug output mean it is matching the terms but the term
> frequency vector is returning 0 for the frequency of this term. IE Does this
> mean the term is in the doc but not in the tf array?
>
> 0.0 = no match on required clause (moreWords:"laser jet")
>>>   0.0 = weight(moreWords:"laser jet" in 32497), product of:
>>>     0.60590804 = queryWeight(moreWords:"laser jet"), product of:
>>>       14.597603 = idf(moreWords: laser=26731 jet=12685)
>>>       0.041507367 = queryNorm
>>>     0.0 = fieldWeight(moreWords:"laser jet" in 32497), product of:
>>>       0.0 = tf(phraseFreq=0.0)
>>>       14.597603 = idf(moreWords: laser=26731 jet=12685)
>>>       0.078125 = fieldNorm(field=moreWords, doc=32497)
>
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Monday, July 25, 2011 3:28 PM
> To: solr-user@lucene.apache.org
> Subject: Re: please help explaining debug output
>
> Hmmm, I can't find a convenient 1.4.0 to download, but re-indexing is a good
> idea since this seems like it *should* work.
>
> Erick
>
> On Mon, Jul 25, 2011 at 5:32 PM, Robert Petersen <rober...@buy.com> wrote:
>> I'm still on solr 1.4.0 and the analysis page looks like they should match,
>> and other products with the same content do in fact match. I'm reindexing
>> the non-matching ones to rule that out.
>>
>> -----Original Message-----
>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>> Sent: Monday, July 25, 2011 1:58 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: please help explaining debug output
>>
>> Hmmm, I'm assuming that moreWords is your default text field, yes?
>>
>> But it works for me (tm), using 1.4.1. What version of Solr are you on?
>>
>> Also, take a glance at the admin/analysis page, that might help...
>>
>> Gotta run
>>
>> Erick
>>
>> On Mon, Jul 25, 2011 at 4:52 PM, Robert Petersen <rober...@buy.com> wrote:
>>> Sorry, to clarify a search for P1102W matches all three docs but a
>>> search for p1102w LaserJet only matches the second two. Someone asked
>>> me a question while I was typing and I got distracted, apologies for any
>>> confusion.
>>>
>>> -----Original Message-----
>>> From: Robert Petersen [mailto:rober...@buy.com]
>>> Sent: Monday, July 25, 2011 1:42 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: please help explaining debug output
>>>
>>> I have three documents with the following product titles in a text field
>>> called moreWords with analysis stack matching the solr example text
>>> field definition.
>>>
>>> 1. HP LaserJet P1102W Monochrome Laser Printer
>>> <http://www.buy.com/prod/hp-laserjet-p1102w-monochrome-laser-printer/q/loc/101/213824965.html>
>>>
>>> 2. HP CE285A (85A) Remanufactured Black Toner Cartridge for
>>> LaserJet M1212nf, P1102, P1102W Series
>>> <http://www.buy.com/prod/hp-ce285a-85a-remanufactured-black-toner-cartridge-for-laserjet/q/loc/101/217145536.html>
>>>
>>> 3. Black HP CE285A Toner Cartridge For LaserJet P1102W, LaserJet
>>> M1130, LaserJet M1132, LaserJet M1210
>>> <http://www.buy.com/prod/black-hp-ce285a-toner-cartridge-for-laserjet-p1102w-laserjet-m1130/q/loc/101/222045267.html>
>>>
>>> A search for P1102W matches (2) and (3), but not (1) above. Can someone
>>> explain the debug output? It looks like I am getting a non-match on (1)
>>> because term frequency is zero? Am I reading that right? If so, how
>>> could that be? the searched terms are equivalently in all three docs. I
>>> don't get it.
>>>
>>> <lst name="debug">
>>> <str name="rawquerystring">p1102w LaserJet </str>
>>> <str name="querystring">p1102w LaserJet </str>
>>> <str name="parsedquery">+PhraseQuery(moreWords:"p 1102 w") +PhraseQuery(moreWords:"laser jet")</str>
>>> <str name="parsedquery_toString">+moreWords:"p 1102 w" +moreWords:"laser jet"</str>
>>> <lst name="explain">
>>> <str name="222045267">
>>> 3.64852 = (MATCH) sum of:
>>>   2.4758534 = weight(moreWords:"p 1102 w" in 6667236), product of:
>>>     0.7955347 = queryWeight(moreWords:"p 1102 w"), product of:
>>>       19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)
>>>       0.041507367 = queryNorm
>>>     3.1121879 = fieldWeight(moreWords:"p 1102 w" in 6667236), product of:
>>>       1.7320508 = tf(phraseFreq=3.0)
>>>       19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)
>>>       0.09375 = fieldNorm(field=moreWords, doc=6667236)
>>>   1.1726664 = weight(moreWords:"laser jet" in 6667236), product of:
>>>     0.60590804 = queryWeight(moreWords:"laser jet"), product of:
>>>       14.597603 = idf(moreWords: laser=26731 jet=12685)
>>>       0.041507367 = queryNorm
>>>     1.9353869 = fieldWeight(moreWords:"laser jet" in 6667236), product of:
>>>       1.4142135 = tf(phraseFreq=2.0)
>>>       14.597603 = idf(moreWords: laser=26731 jet=12685)
>>>       0.09375 = fieldNorm(field=moreWords, doc=6667236)
>>> </str>
>>> <str name="222045265">
>>> 2.8656518 = (MATCH) sum of:
>>>   1.4294347 = weight(moreWords:"p 1102 w" in 6684158), product of:
>>>     0.7955347 = queryWeight(moreWords:"p 1102 w"), product of:
>>>       19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)
>>>       0.041507367 = queryNorm
>>>     1.7968225 = fieldWeight(moreWords:"p 1102 w" in 6684158), product of:
>>>       1.0 = tf(phraseFreq=1.0)
>>>       19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)
>>>       0.09375 = fieldNorm(field=moreWords, doc=6684158)
>>>   1.4362172 = weight(moreWords:"laser jet" in 6684158), product of:
>>>     0.60590804 = queryWeight(moreWords:"laser jet"), product of:
>>>       14.597603 = idf(moreWords: laser=26731 jet=12685)
>>>       0.041507367 = queryNorm
>>>     2.3703551 = fieldWeight(moreWords:"laser jet" in 6684158), product of:
>>>       1.7320508 = tf(phraseFreq=3.0)
>>>       14.597603 = idf(moreWords: laser=26731 jet=12685)
>>>       0.09375 = fieldNorm(field=moreWords, doc=6684158)
>>> </str>
>>> </lst>
>>> <str name="otherQuery">sku:213824965
>>> </str>
>>> <lst name="explainOther">
>>> <str name="213824965">
>>> 0.0 = (NON-MATCH) Failure to meet condition(s) of required/prohibited clause(s)
>>>   1.1911955 = weight(moreWords:"p 1102 w" in 32497), product of:
>>>     0.7955347 = queryWeight(moreWords:"p 1102 w"), product of:
>>>       19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)
>>>       0.041507367 = queryNorm
>>>     1.4973521 = fieldWeight(moreWords:"p 1102 w" in 32497), product of:
>>>       1.0 = tf(phraseFreq=1.0)
>>>       19.166107 = idf(moreWords: p=189166 1102=1135 w=445720)
>>>       0.078125 = fieldNorm(field=moreWords, doc=32497)
>>>   0.0 = no match on required clause (moreWords:"laser jet")
>>>     0.0 = weight(moreWords:"laser jet" in 32497), product of:
>>>       0.60590804 = queryWeight(moreWords:"laser jet"), product of:
>>>         14.597603 = idf(moreWords: laser=26731 jet=12685)
>>>         0.041507367 = queryNorm
>>>       0.0 = fieldWeight(moreWords:"laser jet" in 32497), product of:
>>>         0.0 = tf(phraseFreq=0.0)
>>>         14.597603 = idf(moreWords: laser=26731 jet=12685)
>>>         0.078125 = fieldNorm(field=moreWords, doc=32497)
>>> </str>
>>> </lst>
>>>
>>
>
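
For anyone reading the explain output quoted above, the arithmetic can be re-derived by hand. What follows is only a rough Python sketch, not Solr or Lucene code. The function name `phrase_weight` is made up for illustration, and the square-root tf is inferred from the quoted values (tf(phraseFreq=2.0) = 1.4142135, tf(phraseFreq=3.0) = 1.7320508) rather than from any configuration shown in this thread. It recombines the factors for the "laser jet" clause and shows why a phraseFreq of 0 zeroes the clause however large the idf is.

```python
import math


def phrase_weight(phrase_freq, idf, query_norm, field_norm):
    """Recombine the factors the explain output lists for one clause.

    Hypothetical helper; the formula is read off the quoted output.
    """
    tf = math.sqrt(phrase_freq)            # the tf(phraseFreq=...) line
    query_weight = idf * query_norm        # the queryWeight(...) line
    field_weight = tf * idf * field_norm   # the fieldWeight(...) line
    return query_weight * field_weight     # the weight(...) line


# Values copied from the explain output for moreWords:"laser jet".
IDF_LASER_JET = 14.597603
QUERY_NORM = 0.041507367

# Document 222045267: the phrase occurs twice, fieldNorm 0.09375.
matching = phrase_weight(2.0, IDF_LASER_JET, QUERY_NORM, 0.09375)

# Document 213824965: phraseFreq is 0, fieldNorm 0.078125.
non_matching = phrase_weight(0.0, IDF_LASER_JET, QUERY_NORM, 0.078125)

print(matching)
print(non_matching)
```

With phraseFreq=2.0 the result agrees with the 1.1726664 reported for the matching document to within rounding. With phraseFreq=0.0 the product collapses to 0.0, and since the clause is required (the leading + in the parsed query), the document is reported as a NON-MATCH even though the idf line still shows non-zero index-wide counts for laser and jet.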