Thanks for the reply. So, as an aside, should I remove the solr.WhitespaceTokenizerFactory and solr.WordDelimiterFilterFactory from the query analyzer part?
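On the question above: a Solr analyzer built from <filter> elements needs a <tokenizer>, so WhitespaceTokenizerFactory would have to be replaced by some other tokenizer rather than simply removed. Following the index/query asymmetry Jack describes in the quoted message, an untested sketch of a query analyzer that keeps the word and number parts but turns off catenation and preserveOriginal might look like this. It assumes the same txt/synonyms.txt and txt/stopwords.txt files as the existing config; only the WordDelimiterFilterFactory parameters differ from the current query analyzer.

```xml
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory"
          synonyms="txt/synonyms.txt" ignoreCase="true" expand="true"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true"
          words="txt/stopwords.txt" />
  <!-- query side: emit only word and number parts; no catenated terms
       and no preserved original token, so fewer extra terms (precision) -->
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1" catenateWords="0"
          catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"
          preserveOriginal="0"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```

As Jack points out, this should not change anything for a single-term query such as "Food", so it is a cleanup rather than a fix for the dismax behavior.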
Any idea in which direction I should poke around? I deactivated dismax for now, but would really like to use it.

Wouter Admiraal

2015-06-04 16:54 GMT+02:00 Jack Krupansky <jack.krupan...@gmail.com>:
> The empty parentheses in the parsed query say something odd is going on
> with query-time analysis, which is essentially generating an empty term.
> That may not be the cause of your specific issue, but at least it says
> that something is unexplained here.
>
> Generally, there is an asymmetry between the index and query analyzers when
> the word delimiter filter is used: at index time you typically generate
> extra terms to aid in recall, while at query time the extra terms are not
> generated, to aid in precision. In particular, you would just generate the
> word and number parts, and not preserve the original token. But... that
> should not matter if there is only a single query term. So, something else
> is going on here.
>
> -- Jack Krupansky
>
> On Thu, Jun 4, 2015 at 10:03 AM, Wouter Admiraal <w...@wadmiraal.net> wrote:
>
>> Hi, thanks for the response.
>>
>> Label field:
>> <field name="label" type="text" indexed="true" stored="true"
>>        termVectors="true" omitNorms="true"/>
>>
>> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>>   <analyzer type="index">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>             words="txt/stopwords.txt" />
>>     <filter class="solr.WordDelimiterFilterFactory"
>>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"
>>             preserveOriginal="1"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>     <filter class="solr.NGramFilterFactory" minGramSize="3"
>>             maxGramSize="25"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.SynonymFilterFactory"
>>             synonyms="txt/synonyms.txt" ignoreCase="true" expand="true"/>
>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>             words="txt/stopwords.txt" />
>>     <filter class="solr.WordDelimiterFilterFactory"
>>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"
>>             preserveOriginal="1"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>   </analyzer>
>> </fieldType>
>>
>> I can surely optimize the above config a bit, maybe only use one
>> <analyzer> for both query and index. But for now, this is what it
>> does.
>>
>> Just as a side-question: is dismax *supposed* to match fields exactly
>> with the search query? Or is my expectation correct, meaning it should
>> "tokenize" the field, just as with regular searches? It just doesn't
>> seem intuitive to me.
>>
>> Thank you again for your help.
>>
>> Kind regards,
>> Wouter Admiraal
>>
>>
>> 2015-06-04 14:52 GMT+02:00 Shawn Heisey <apa...@elyograg.org>:
>> > On 6/4/2015 1:22 AM, Wouter Admiraal wrote:
>> >> When I turn on debug, I get the following:
>> >>
>> >> "debug": {
>> >>   "rawquerystring": "Food",
>> >>   "querystring": "Food",
>> >>   "parsedquery": "(+DisjunctionMaxQuery((label:Food^3.0)) ())/no_coord",
>> >>   "parsedquery_toString": "+(label:Food^3.0) ()",
>> >>   "explain": {},
>> >>   "QParser": "DisMaxQParser",
>> >>   "altquerystring": null,
>> >>   "boostfuncs": null,
>> >>   ...
>> >> }
>> >>
>> >> I don't understand how/why this doesn't use a "contains" operator.
>> >> This was the behavior on the old 1.4 instance. I went through the
>> >> changelog for 1.4 to 5.1, but I didn't find any explicit information
>> >> about dismax behaving differently, except that the "mm" parameter needs a
>> >> default. I tried many values for mm (including 0, 100%, 100, etc.) but
>> >> to no avail.
>> >
>> > In your schema.xml, what is the definition of the label field, and the
>> > fieldType definition of the type used in the label field? That will
>> > determine exactly how the query is parsed and whether individual words
>> > will match. I wasn't using dismax or edismax back when I was running
>> > 1.4, so I can't say anything about how it used to work, only how it
>> > works now.
>> >
>> > Thanks,
>> > Shawn
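For reproducing the behavior in the debug output above (DisMaxQParser, querying the label field with a boost of 3), a request handler with matching dismax defaults could be sketched in solrconfig.xml as follows. The handler name /select and the mm value are assumptions, not the actual configuration from this thread; mm is set explicitly because the upgrade notes mentioned above point to it needing a default. Adding debugQuery=true to a request returns the "debug" section shown above.

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- parse q with DisMax ("QParser": "DisMaxQParser" in the debug output) -->
    <str name="defType">dismax</str>
    <!-- search the label field with a boost of 3 (shows up as label:Food^3.0) -->
    <str name="qf">label^3</str>
    <!-- assumed value; the thread tried several (0, 100%, 100) -->
    <str name="mm">100%</str>
  </lst>
</requestHandler>
```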