Hi Erick,

By TermsComponent, I think you meant me to try the following?
http://vottopg15.ottawa.ibm.com:8983/solr/testdata/terms?terms.f1=ALL_FIELDS&terms.prefix=the

If so, I tried it and I'm getting 0 hits:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="terms"/>
</response>

In fact, I'm getting 0 hits on anything I pass to "terms.prefix".

Another thing I noticed is this. Using the Solr Admin Console's Schema Browser, after selecting the field "ALL_FIELDS" and clicking on the Load Term Info button, I'm seeing "be" in the list! Like so:

4 localhost abc a...@localhost.com com intern be /intern abclocalhostcom user

I don't understand what I'm looking at here (in the Schema Browser), or whether it is related to my issue at all (I'm seeing "be" listed here and wondering if it has something to do with my issue). If I click on any of the listed words, I get a hit, but I get 0 hits when I click on "be".

Thanks.

Steve

On Tue, Jul 5, 2016 at 7:07 PM, Steven White <swhite4...@gmail.com> wrote:

> Thanks for the quick reply Erick.
>
> Here is the analyzer I'm using:
>
> <fieldType name="all_raw_text" class="solr.TextField"
>     positionIncrementGap="100" autoGeneratePhraseQueries="true">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt"
>         ignoreCase="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1"
>         generateNumberParts="1" splitOnCaseChange="0" catenateWords="1"
>         splitOnNumerics="1" stemEnglishPossessive="1" generateWordParts="1"
>         catenateAll="1" catenateNumbers="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishPossessiveFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory"
>         protected="protwords.txt"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
>
> If in fact it is my analyzer, what part of it is causing this?
> If not, I'm not clear about the "TermsComponent" that you suggested having me
> look into. How do I "point" it at my field? I have zero knowledge about this.
> Is this something I do from Solr's Admin Console via the Schema Browser link?
>
> Steve
>
> On Tue, Jul 5, 2016 at 6:51 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> My guess is that your field analysis isn't stripping the various
>> non-alphanumeric characters, thus "the]" is actually a token in your index,
>> square bracket and all. If that's true, it certainly doesn't match the
>> stopword "the".
>>
>> You can check by using the TermsComponent, pointing it at your field
>> and setting terms.prefix=the
>>
>> See:
>> https://cwiki.apache.org/confluence/display/solr/The+Terms+Component
>>
>> Best,
>> Erick
>>
>> On Tue, Jul 5, 2016 at 2:34 PM, Steven White <swhite4...@gmail.com> wrote:
>>
>> > Hi Everyone,
>> >
>> > I'm trying to understand why I get a hit when I search for "the}" but not
>> > when I search for "the" (searches are done without the quotes and "the" is
>> > a stopword in my case).
>> >
>> > Here is the debugQuery output using "the}":
>> > "debug": {
>> >   "rawquerystring": "the}",
>> >   "querystring": "the}",
>> >   "parsedquery": "(+DisjunctionMaxQuery(((ALL_FIELDS:the} ALL_FIELDS:the))~1.0))/no_coord",
>> >   "parsedquery_toString": "+((ALL_FIELDS:the} ALL_FIELDS:the))~1.0",
>> >   "explain": {
>> >     "-1.5.1804": "\n0.14220011 = sum of:\n 0.14220011 = weight(ALL_FIELDS:the in 0) [DefaultSimilarity], result of:\n 0.14220011 = score(doc=0,freq=2.0), product of:\n 0.51863563 = queryWeight, product of:\n 2.4816046 = idf(docFreq=4, maxDocs=22)\n 0.20899205 = queryNorm\n 0.27418116 = fieldWeight in 0, product of:\n 1.4142135 = tf(freq=2.0), with freq of:\n 2.0 = termFreq=2.0\n 2.4816046 = idf(docFreq=4, maxDocs=22)\n 0.078125 = fieldNorm(doc=0)\n",
>> >     "-1.5.3552": "\n0.14220011 = sum of:\n 0.14220011 = weight(ALL_FIELDS:the in 0) [DefaultSimilarity], result of:\n 0.14220011 = score(doc=0,freq=2.0), product of:\n 0.51863563 = queryWeight, product of:\n 2.4816046 = idf(docFreq=4, maxDocs=22)\n 0.20899205 = queryNorm\n 0.27418116 = fieldWeight in 0, product of:\n 1.4142135 = tf(freq=2.0), with freq of:\n 2.0 = termFreq=2.0\n 2.4816046 = idf(docFreq=4, maxDocs=22)\n 0.078125 = fieldNorm(doc=0)\n",
>> >     "-1.5.3554": "\n0.14220011 = sum of:\n 0.14220011 = weight(ALL_FIELDS:the in 1) [DefaultSimilarity], result of:\n 0.14220011 = score(doc=1,freq=2.0), product of:\n 0.51863563 = queryWeight, product of:\n 2.4816046 = idf(docFreq=4, maxDocs=22)\n 0.20899205 = queryNorm\n 0.27418116 = fieldWeight in 1, product of:\n 1.4142135 = tf(freq=2.0), with freq of:\n 2.0 = termFreq=2.0\n 2.4816046 = idf(docFreq=4, maxDocs=22)\n 0.078125 = fieldNorm(doc=1)\n",
>> >     "-1.5.1802": "\n0.1137601 = sum of:\n 0.1137601 = weight(ALL_FIELDS:the in 0) [DefaultSimilarity], result of:\n 0.1137601 = score(doc=0,freq=2.0), product of:\n 0.51863563 = queryWeight, product of:\n 2.4816046 = idf(docFreq=4, maxDocs=22)\n 0.20899205 = queryNorm\n 0.21934493 = fieldWeight in 0, product of:\n 1.4142135 = tf(freq=2.0), with freq of:\n 2.0 = termFreq=2.0\n 2.4816046 = idf(docFreq=4, maxDocs=22)\n 0.0625 = fieldNorm(doc=0)\n"
>> >   },
>> >   "QParser": "ExtendedDismaxQParser",
>> >   "altquerystring": null,
>> >   "boost_queries": null,
>> >   "parsed_boost_queries": [],
>> >   "boostfuncs": null,
>> >   "filter_queries": [
>> >     "ISBN_GROUP_ID:2"
>> >   ],
>> >   "parsed_filter_queries": [
>> >     "ISBN_GROUP_ID:2"
>> >   ],
>> >
>> > Here is the debugQuery output using "the":
>> > "debug": {
>> >   "rawquerystring": "the",
>> >   "querystring": "the",
>> >   "parsedquery": "(+())/no_coord",
>> >   "parsedquery_toString": "+()",
>> >   "explain": {},
>> >   "QParser": "ExtendedDismaxQParser",
>> >   "altquerystring": null,
>> >   "boost_queries": null,
>> >   "parsed_boost_queries": [],
>> >   "boostfuncs": null,
>> >   "filter_queries": [
>> >     "ISBN_GROUP_ID:2"
>> >   ],
>> >   "parsed_filter_queries": [
>> >     "ISBN_GROUP_ID:2"
>> >   ],
>> >
>> > As expected, I get no hits when I search for just "}":
>> > "debug": {
>> >   "rawquerystring": "}",
>> >   "querystring": "}",
>> >   "parsedquery": "(+DisjunctionMaxQuery((ALL_FIELDS:})~1.0))/no_coord",
>> >   "parsedquery_toString": "+(ALL_FIELDS:})~1.0",
>> >   "explain": {},
>> >   "QParser": "ExtendedDismaxQParser",
>> >   "altquerystring": null,
>> >   "boost_queries": null,
>> >   "parsed_boost_queries": [],
>> >   "boostfuncs": null,
>> >   "filter_queries": [
>> >     "ISBN_GROUP_ID:2"
>> >   ],
>> >   "parsed_filter_queries": [
>> >     "ISBN_GROUP_ID:2"
>> >   ],
>> >
>> > In case it matters, I'm also getting a hit when I search for "the." or
>> > "the]" or "the/" or "the," or "the=" etc.
>> >
>> > Thanks in advance.
>> >
>> > Steve
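
For anyone following the thread, here is a minimal Python sketch of how a TermsComponent request for one field and prefix might be put together. The host, core name ("testdata") and field ("ALL_FIELDS") are taken from the messages above; the helper name build_terms_url, the terms.limit value and wt=json are assumptions made for illustration, and the sketch assumes a /terms request handler is configured for the core. The field is named with the terms.fl parameter, which ends in a lowercase letter "l" (for "field list"), not the digit "1".

```python
from urllib.parse import urlencode


def build_terms_url(base_url, core, field, prefix, limit=10):
    """Build a TermsComponent URL (hypothetical helper, for illustration only).

    terms.fl     -> the field whose indexed terms should be listed
    terms.prefix -> only return terms starting with this prefix
    terms.limit  -> maximum number of terms to return
    """
    params = {
        "terms.fl": field,
        "terms.prefix": prefix,
        "terms.limit": limit,
        "wt": "json",
    }
    return f"{base_url}/{core}/terms?{urlencode(params)}"


# Host, core and field as they appear earlier in this thread.
url = build_terms_url(
    "http://vottopg15.ottawa.ibm.com:8983/solr",
    "testdata",
    "ALL_FIELDS",
    "the",
)
print(url)
```

Opening the printed URL in a browser (or fetching it with any HTTP client) should list the indexed terms in ALL_FIELDS that start with "the", along with their document counts, which is one way to see whether tokens such as "the]" or "the}" are really in the index.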