Thanks for the quick reply, Erick. Here is the analyzer I'm using:

<fieldType name="all_raw_text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" generateNumberParts="1" splitOnCaseChange="0" catenateWords="1" splitOnNumerics="1" stemEnglishPossessive="1" generateWordParts="1" catenateAll="1" catenateNumbers="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

If it is in fact my analyzer, what part of it is causing this? If not, I'm not clear about the "TermsComponent" that you suggested I look into. How do I "point" it at my field? I have zero knowledge about this. Is this something I do from Solr's Admin Console via the Schema Browser link?

Steve

On Tue, Jul 5, 2016 at 6:51 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> My guess is that your field analysis isn't stripping the various non
> alpha-num characters, thus "the]" is actually a token in your index,
> square bracket and all. If that's true, it certainly doesn't match the
> stopword "the".
>
> You can check by using the TermsComponent, pointing it at your field
> and setting terms.prefix=the
>
> See:
> https://cwiki.apache.org/confluence/display/solr/The+Terms+Component
>
> Best,
> Erick
>
> On Tue, Jul 5, 2016 at 2:34 PM, Steven White <swhite4...@gmail.com> wrote:
> > Hi Everyone,
> >
> > I'm trying to understand why I get a hit when I search for "the}" but not
> > when I search for "the" (searches are done without the quotes and "the" is
> > a stopword in my case).
> >
> > Here is the debugQuery output using "the}":
> > "debug": {
> >   "rawquerystring": "the}",
> >   "querystring": "the}",
> >   "parsedquery": "(+DisjunctionMaxQuery(((ALL_FIELDS:the} ALL_FIELDS:the))~1.0))/no_coord",
> >   "parsedquery_toString": "+((ALL_FIELDS:the} ALL_FIELDS:the))~1.0",
> >   "explain": {
> >     "-1.5.1804": "\n0.14220011 = sum of:\n 0.14220011 = weight(ALL_FIELDS:the in 0) [DefaultSimilarity], result of:\n 0.14220011 = score(doc=0,freq=2.0), product of:\n 0.51863563 = queryWeight, product of:\n 2.4816046 = idf(docFreq=4, maxDocs=22)\n 0.20899205 = queryNorm\n 0.27418116 = fieldWeight in 0, product of:\n 1.4142135 = tf(freq=2.0), with freq of:\n 2.0 = termFreq=2.0\n 2.4816046 = idf(docFreq=4, maxDocs=22)\n 0.078125 = fieldNorm(doc=0)\n",
> >     "-1.5.3552": "\n0.14220011 = sum of:\n 0.14220011 = weight(ALL_FIELDS:the in 0) [DefaultSimilarity], result of:\n 0.14220011 = score(doc=0,freq=2.0), product of:\n 0.51863563 = queryWeight, product of:\n 2.4816046 = idf(docFreq=4, maxDocs=22)\n 0.20899205 = queryNorm\n 0.27418116 = fieldWeight in 0, product of:\n 1.4142135 = tf(freq=2.0), with freq of:\n 2.0 = termFreq=2.0\n 2.4816046 = idf(docFreq=4, maxDocs=22)\n 0.078125 = fieldNorm(doc=0)\n",
> >     "-1.5.3554": "\n0.14220011 = sum of:\n 0.14220011 = weight(ALL_FIELDS:the in 1) [DefaultSimilarity], result of:\n 0.14220011 = score(doc=1,freq=2.0), product of:\n 0.51863563 = queryWeight, product of:\n 2.4816046 = idf(docFreq=4, maxDocs=22)\n 0.20899205 = queryNorm\n 0.27418116 = fieldWeight in 1, product of:\n 1.4142135 = tf(freq=2.0), with freq of:\n 2.0 = termFreq=2.0\n 2.4816046 = idf(docFreq=4, maxDocs=22)\n 0.078125 = fieldNorm(doc=1)\n",
> >     "-1.5.1802": "\n0.1137601 = sum of:\n 0.1137601 = weight(ALL_FIELDS:the in 0) [DefaultSimilarity], result of:\n 0.1137601 = score(doc=0,freq=2.0), product of:\n 0.51863563 = queryWeight, product of:\n 2.4816046 = idf(docFreq=4, maxDocs=22)\n 0.20899205 = queryNorm\n 0.21934493 = fieldWeight in 0, product of:\n 1.4142135 = tf(freq=2.0), with freq of:\n 2.0 = termFreq=2.0\n 2.4816046 = idf(docFreq=4, maxDocs=22)\n 0.0625 = fieldNorm(doc=0)\n"
> >   },
> >   "QParser": "ExtendedDismaxQParser",
> >   "altquerystring": null,
> >   "boost_queries": null,
> >   "parsed_boost_queries": [],
> >   "boostfuncs": null,
> >   "filter_queries": [
> >     "ISBN_GROUP_ID:2"
> >   ],
> >   "parsed_filter_queries": [
> >     "ISBN_GROUP_ID:2"
> >   ],
> >
> > Here is the debugQuery output using "the":
> > "debug": {
> >   "rawquerystring": "the",
> >   "querystring": "the",
> >   "parsedquery": "(+())/no_coord",
> >   "parsedquery_toString": "+()",
> >   "explain": {},
> >   "QParser": "ExtendedDismaxQParser",
> >   "altquerystring": null,
> >   "boost_queries": null,
> >   "parsed_boost_queries": [],
> >   "boostfuncs": null,
> >   "filter_queries": [
> >     "ISBN_GROUP_ID:2"
> >   ],
> >   "parsed_filter_queries": [
> >     "ISBN_GROUP_ID:2"
> >   ],
> >
> > As expected, I get no hits when I search for just "}":
> > "debug": {
> >   "rawquerystring": "}",
> >   "querystring": "}",
> >   "parsedquery": "(+DisjunctionMaxQuery((ALL_FIELDS:})~1.0))/no_coord",
> >   "parsedquery_toString": "+(ALL_FIELDS:})~1.0",
> >   "explain": {},
> >   "QParser": "ExtendedDismaxQParser",
> >   "altquerystring": null,
> >   "boost_queries": null,
> >   "parsed_boost_queries": [],
> >   "boostfuncs": null,
> >   "filter_queries": [
> >     "ISBN_GROUP_ID:2"
> >   ],
> >   "parsed_filter_queries": [
> >     "ISBN_GROUP_ID:2"
> >   ],
> >
> > In case it matters, I'm also getting a hit when I search for "the." or
> > "the]" or "the/" or "the," or "the=" etc.
> >
> > Thanks in advance.
> >
> > Steve
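
For anyone following the thread, here is a rough sketch of what "pointing the TermsComponent at a field" with terms.prefix=the could look like as a plain HTTP request. It only builds the URL and does not contact Solr. The host, port, and core name ("mycore") are placeholders, the helper name build_terms_url is hypothetical, and it assumes a /terms request handler is registered in solrconfig.xml (the stock example configs include one, but yours may not). The field name ALL_FIELDS is taken from the debug output above; terms.fl, terms.prefix and terms.limit are the parameters described on the Terms Component page linked in the thread.

```python
from urllib.parse import urlencode


def build_terms_url(core_url, field, prefix, limit=50):
    """Build a TermsComponent URL listing indexed terms in `field` starting with `prefix`.

    core_url is a placeholder such as http://localhost:8983/solr/mycore and
    assumes a /terms handler is configured for that core.
    """
    params = {
        "terms": "true",         # enable the component (often already the handler default)
        "terms.fl": field,       # the field to "point" the component at
        "terms.prefix": prefix,  # only return terms beginning with this string
        "terms.limit": limit,    # cap on the number of terms returned
        "wt": "json",
    }
    return core_url.rstrip("/") + "/terms?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder host and core name; substitute your own.
    print(build_terms_url("http://localhost:8983/solr/mycore", "ALL_FIELDS", "the"))
```

Opening the printed URL in a browser (or fetching it with curl) should return the indexed terms that start with "the" along with their document counts. If entries such as "the}" or "the." show up in that list, the punctuation is surviving analysis at index time.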