Re: When using Dismax, Solr 5.1 tries to compare the entire field to the search string, instead of only using keywords

Wouter Admiraal Thu, 04 Jun 2015 08:40:42 -0700

Thanks for the reply.

So, as an aside, should I remove the solr.WhitespaceTokenizerFactory
and solr.WordDelimiterFilterFactory from the query analyzer part?
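
Below is a sketch of the query-time analyzer Jack describes in his reply, applied to the "text" fieldType quoted further down. It keeps the same chain, but WordDelimiterFilterFactory emits only word and number parts, with no catenated terms and no preserved original token. It is untested: the attribute values are an assumption about how his advice maps onto this schema, and the index-time analyzer stays unchanged.

```xml
<!-- Query-time analyzer only; <analyzer type="index"> stays as it is (untested sketch). -->
<analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="txt/synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="txt/stopwords.txt" />
    <!-- Query side: word and number parts only, no catenation,
         no original token (the index/query asymmetry Jack describes). -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"
            preserveOriginal="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```

Every analyzer chain needs a tokenizer, so WhitespaceTokenizerFactory stays either way. Per Jack's reply, this change alone should not affect a single-term query like "Food". It addresses the side question rather than the dismax issue itself.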


Any idea in which direction I should poke around? I deactivated dismax
for now, but would really like to use it.


Wouter Admiraal


2015-06-04 16:54 GMT+02:00 Jack Krupansky <jack.krupan...@gmail.com>:
> The empty parentheses in the parsed query say something odd is going on
> with query-time analysis, which is essentially generating an empty term.
> That may not be the cause of your specific issue, but at least it says
> that something is unexplained here.
>
> Generally, there is an asymmetry between the index and query analyzers when
> the word delimiter filter is used - at index time you typically generate
> extra terms to aid in recall, while at query time the extra terms are not
> generated to aid in precision. In particular, you would just generate the
> word and number parts, and not preserve the original token. But... that
> should not matter if there is only a single query term. So, something else
> is going on here.
>
> -- Jack Krupansky
>
> On Thu, Jun 4, 2015 at 10:03 AM, Wouter Admiraal <w...@wadmiraal.net> wrote:
>
>> Hi, thanks for the response.
>>
>> Label field:
>> <field name="label" type="text" indexed="true" stored="true"
>> termVectors="true" omitNorms="true"/>
>>
>> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>>     <analyzer type="index">
>>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>         <filter class="solr.StopFilterFactory" ignoreCase="true"
>> words="txt/stopwords.txt" />
>>         <filter class="solr.WordDelimiterFilterFactory"
>> generateWordParts="1" generateNumberParts="1" catenateWords="1"
>> catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"
>> preserveOriginal="1"/>
>>         <filter class="solr.LowerCaseFilterFactory"/>
>>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>         <filter class="solr.NGramFilterFactory" minGramSize="3"
>> maxGramSize="25"/>
>>     </analyzer>
>>     <analyzer type="query">
>>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>         <filter class="solr.SynonymFilterFactory"
>> synonyms="txt/synonyms.txt" ignoreCase="true" expand="true"/>
>>         <filter class="solr.StopFilterFactory" ignoreCase="true"
>> words="txt/stopwords.txt" />
>>         <filter class="solr.WordDelimiterFilterFactory"
>> generateWordParts="1" generateNumberParts="1" catenateWords="1"
>> catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"
>> preserveOriginal="1"/>
>>         <filter class="solr.LowerCaseFilterFactory"/>
>>     </analyzer>
>> </fieldType>
>>
>> I could certainly optimize the above config a bit, maybe by using a single
>> <analyzer> for both query and index. But for now, this is what it
>> does.
>>
>> Just as a side-question: is dismax *supposed* to match fields exactly
>> with the search query? Or is my expectation correct, meaning it should
>> "tokenize" the field, just as with regular searches? It just doesn't
>> seem intuitive to me.
>>
>> Thank you again for your help.
>>
>> Kind regards,
>> Wouter Admiraal
>>
>>
>> 2015-06-04 14:52 GMT+02:00 Shawn Heisey <apa...@elyograg.org>:
>> > On 6/4/2015 1:22 AM, Wouter Admiraal wrote:
>> >> When I turn on debug, I get the following:
>> >>
>> >> "debug": {
>> >>   "rawquerystring": "Food",
>> >>   "querystring": "Food",
>> >>   "parsedquery": "(+DisjunctionMaxQuery((label:Food^3.0)) ())/no_coord",
>> >>   "parsedquery_toString": "+(label:Food^3.0) ()",
>> >>   "explain": {},
>> >>   "QParser": "DisMaxQParser",
>> >>   "altquerystring": null,
>> >>   "boostfuncs": null,
>> >>   ...
>> >> }
>> >>
>> >> I don't understand how/why this doesn't use a "contains" operator.
>> >> This was the behavior on the old 1.4 instance. I went through the
>> >> changelogs from 1.4 to 5.1, but I didn't find any explicit information
>> >> about dismax behaving differently, except that the "mm" parameter needs a
>> >> default. I tried many values for mm (including 0, 100%, 100, etc.), but
>> >> to no avail.
>> >
>> > In your schema.xml, what is the definition of the label field, and the
>> > fieldType definition of the type used in the label field?  That will
>> > determine exactly how the query is parsed and whether individual words
>> > will match.  I wasn't using dismax or edismax back when I was running
>> > 1.4, so I can't say anything about how it used to work, only how it
>> > works now.
>> >
>> > Thanks,
>> > Shawn
>> >
>>
