My guess is that your field analysis isn't stripping the various
non-alphanumeric characters, so "the]" is actually a token in your index,
square bracket and all. If that's true, it certainly doesn't match the
stopword "the".
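
To illustrate the idea, here is a toy model in Python. It is not Solr's
analysis code: it assumes a chain that only splits on whitespace, lowercases,
and drops stopwords, and the names STOPWORDS and analyze are made up for this
sketch. Your real chain is whatever your schema defines.

```python
# Toy model of an analysis chain (NOT Solr code): whitespace split,
# lowercase, then stopword removal. All names here are hypothetical.
STOPWORDS = {"the"}

def analyze(text):
    """Return the tokens that survive the toy chain."""
    tokens = [t.lower() for t in text.split()]
    # A stopword filter compares whole tokens, so "the]" is not "the".
    return [t for t in tokens if t not in STOPWORDS]

print(analyze("the"))           # the bare stopword is dropped
print(analyze("the] the} the")) # punctuated variants survive
```

In this model a bare "the" leaves nothing to search for, while "the]" passes
straight through the stopword filter because the comparison is on the whole
token.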

You can check by using the TermsComponent, pointing it at your field
and setting terms.prefix=the

See:
https://cwiki.apache.org/confluence/display/solr/The+Terms+Component
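
A minimal sketch of building that request in Python, assuming the stock
/terms request handler is enabled in solrconfig.xml. The host, port, and core
name (mycore) are placeholders, and ALL_FIELDS is taken from the debug output
quoted below; terms_url is just a helper name for this example.

```python
from urllib.parse import urlencode

def terms_url(base_url, field, prefix, limit=20):
    """Build a Terms Component request URL for one field and prefix."""
    params = {
        "terms": "true",         # enable the Terms Component
        "terms.fl": field,       # field whose indexed terms to list
        "terms.prefix": prefix,  # only return terms starting with this
        "terms.limit": limit,    # cap the number of terms returned
        "wt": "json",
    }
    return f"{base_url}/terms?{urlencode(params)}"

# Placeholder base URL; substitute your own host and core.
url = terms_url("http://localhost:8983/solr/mycore", "ALL_FIELDS", "the")
print(url)
```

Open the printed URL in a browser or fetch it with curl. If terms such as
"the]" or "the}" show up in the response, the punctuation is making it into
the index.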

Best,
Erick

On Tue, Jul 5, 2016 at 2:34 PM, Steven White <swhite4...@gmail.com> wrote:
> Hi Everyone,
>
> I'm trying to understand why I get a hit when I search for "the}" but not
> when I search for "the" (searches are done without the quotes and "the" is
> a stopword in my case).
>
> Here is the debugQuery output using "the}":
>   "debug": {
>     "rawquerystring": "the}",
>     "querystring": "the}",
>     "parsedquery": "(+DisjunctionMaxQuery(((ALL_FIELDS:the}
> ALL_FIELDS:the))~1.0))/no_coord",
>     "parsedquery_toString": "+((ALL_FIELDS:the} ALL_FIELDS:the))~1.0",
>     "explain": {
>       "-1.5.1804": "\n0.14220011 = sum of:\n  0.14220011 =
> weight(ALL_FIELDS:the in 0) [DefaultSimilarity], result of:\n    0.14220011
> = score(doc=0,freq=2.0), product of:\n      0.51863563 = queryWeight,
> product of:\n        2.4816046 = idf(docFreq=4, maxDocs=22)\n
>  0.20899205 = queryNorm\n      0.27418116 = fieldWeight in 0, product of:\n
>        1.4142135 = tf(freq=2.0), with freq of:\n          2.0 =
> termFreq=2.0\n        2.4816046 = idf(docFreq=4, maxDocs=22)\n
>  0.078125 = fieldNorm(doc=0)\n",
>       "-1.5.3552": "\n0.14220011 = sum of:\n  0.14220011 =
> weight(ALL_FIELDS:the in 0) [DefaultSimilarity], result of:\n    0.14220011
> = score(doc=0,freq=2.0), product of:\n      0.51863563 = queryWeight,
> product of:\n        2.4816046 = idf(docFreq=4, maxDocs=22)\n
>  0.20899205 = queryNorm\n      0.27418116 = fieldWeight in 0, product of:\n
>        1.4142135 = tf(freq=2.0), with freq of:\n          2.0 =
> termFreq=2.0\n        2.4816046 = idf(docFreq=4, maxDocs=22)\n
>  0.078125 = fieldNorm(doc=0)\n",
>       "-1.5.3554": "\n0.14220011 = sum of:\n  0.14220011 =
> weight(ALL_FIELDS:the in 1) [DefaultSimilarity], result of:\n    0.14220011
> = score(doc=1,freq=2.0), product of:\n      0.51863563 = queryWeight,
> product of:\n        2.4816046 = idf(docFreq=4, maxDocs=22)\n
>  0.20899205 = queryNorm\n      0.27418116 = fieldWeight in 1, product of:\n
>        1.4142135 = tf(freq=2.0), with freq of:\n          2.0 =
> termFreq=2.0\n        2.4816046 = idf(docFreq=4, maxDocs=22)\n
>  0.078125 = fieldNorm(doc=1)\n",
>       "-1.5.1802": "\n0.1137601 = sum of:\n  0.1137601 =
> weight(ALL_FIELDS:the in 0) [DefaultSimilarity], result of:\n    0.1137601
> = score(doc=0,freq=2.0), product of:\n      0.51863563 = queryWeight,
> product of:\n        2.4816046 = idf(docFreq=4, maxDocs=22)\n
>  0.20899205 = queryNorm\n      0.21934493 = fieldWeight in 0, product of:\n
>        1.4142135 = tf(freq=2.0), with freq of:\n          2.0 =
> termFreq=2.0\n        2.4816046 = idf(docFreq=4, maxDocs=22)\n
>  0.0625 = fieldNorm(doc=0)\n"
>     },
>     "QParser": "ExtendedDismaxQParser",
>     "altquerystring": null,
>     "boost_queries": null,
>     "parsed_boost_queries": [],
>     "boostfuncs": null,
>     "filter_queries": [
>       "ISBN_GROUP_ID:2"
>     ],
>     "parsed_filter_queries": [
>       "ISBN_GROUP_ID:2"
>     ],
>
> Here is the debugQuery output using "the"
>   "debug": {
>     "rawquerystring": "the",
>     "querystring": "the",
>     "parsedquery": "(+())/no_coord",
>     "parsedquery_toString": "+()",
>     "explain": {},
>     "QParser": "ExtendedDismaxQParser",
>     "altquerystring": null,
>     "boost_queries": null,
>     "parsed_boost_queries": [],
>     "boostfuncs": null,
>     "filter_queries": [
>       "ISBN_GROUP_ID:2"
>     ],
>     "parsed_filter_queries": [
>       "ISBN_GROUP_ID:2"
>     ],
>
> As expected, I get no hits when I search for just "}":
>   "debug": {
>     "rawquerystring": "}",
>     "querystring": "}",
>     "parsedquery": "(+DisjunctionMaxQuery((ALL_FIELDS:})~1.0))/no_coord",
>     "parsedquery_toString": "+(ALL_FIELDS:})~1.0",
>     "explain": {},
>     "QParser": "ExtendedDismaxQParser",
>     "altquerystring": null,
>     "boost_queries": null,
>     "parsed_boost_queries": [],
>     "boostfuncs": null,
>     "filter_queries": [
>       "ISBN_GROUP_ID:2"
>     ],
>     "parsed_filter_queries": [
>       "ISBN_GROUP_ID:2"
>     ],
>
> In case it matters, I'm also getting a hit when I search for "the." or
> "the]" or "the/" or "the," or "the=" etc.
>
> Thanks in advance.
>
> Steve
