from:"Wouter Admiraal"

When using Dismax, Solr 5.1 tries to compare the entire field to the search string, instead of only using keywords

2015-06-04 Thread Wouter Admiraal

Hi all.

Sorry about the title, but I don't know how to be more explicit than
that. I am upgrading a Solr 1.4 install to Solr 5.1. I went through all
the changes, updated my schema.xml, etc. Everything works (I
re-indexed instead of migrating the existing index). I can search for
documents, no problem there.

Where I do have a problem is with dismax. It doesn't behave as it did
before. It must be a configuration issue, or maybe I never really
understood how it is supposed to work.

I have 2 documents, which can be summarized as follows:

{
  "label": "Food Inc",
  "keywords": ["Food", "Nutrition"]
}

{
  "label": "Food check online",
  "keywords": ["Internet", "Health"]
}

If I disable dismax and search for "Food" (?q=Food), I find both
documents. So far, so good.

If I turn dismax on and add a boost to the label, I get 0 results
(?q=Food&defType=dismax&qf=label^3.0).

If I turn dismax on and add a boost to the keywords, I get 1 result
("Food Inc", which has a keyword "Food";
?q=Food&defType=dismax&qf=keywords^2.0).
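The three requests above can be rebuilt with a short Python sketch, e.g. to replay them against a test core. Only the `q`, `defType` and `qf` values come from the message; the base URL (`localhost:8983`, core `mycore`) is a hypothetical placeholder. `debugQuery=true` is the standard Solr parameter that adds a "debug" section to the response. `urlencode` escapes the `^` boost as `%5E`, which Solr decodes back to the same value.

```python
from urllib.parse import urlencode

# Hypothetical base URL: host, port and core name are placeholders.
BASE_URL = "http://localhost:8983/solr/mycore/select"


def select_url(params):
    """Build a /select request URL; urlencode escapes '^' as %5E."""
    return BASE_URL + "?" + urlencode(params)


# Default query parser (dismax off): both documents were returned.
plain = select_url({"q": "Food"})

# dismax with a boosted label field: no documents were returned.
dismax_label = select_url({"q": "Food", "defType": "dismax", "qf": "label^3.0"})

# dismax with a boosted keywords field: only "Food Inc" was returned.
dismax_keywords = select_url({"q": "Food", "defType": "dismax", "qf": "keywords^2.0"})

# Same label request with debugQuery=true, to get the "debug" section.
debug_label = select_url(
    {"q": "Food", "defType": "dismax", "qf": "label^3.0", "debugQuery": "true"}
)

for url in (plain, dismax_label, dismax_keywords, debug_label):
    print(url)
```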

So, from what I understand, it tries to match the search term
*exactly* when dismax is enabled, but uses "contains keyword" logic
when dismax is disabled (same for edismax). In other words, "Food" !== "Food
Inc" with dismax on.

When I turn on debug, I get the following:

"debug": {
  "rawquerystring": "Food",
  "querystring": "Food",
  "parsedquery": "(+DisjunctionMaxQuery((label:Food^3.0)) ())/no_coord",
  "parsedquery_toString": "+(label:Food^3.0) ()",
  "explain": {},
  "QParser": "DisMaxQParser",
  "altquerystring": null,
  "boostfuncs": null,
  ...
}

I don't understand how or why this doesn't use a "contains" operator.
This was the behavior on the old 1.4 instance. I went through the
changelogs from 1.4 to 5.1, but I can't find any explicit information
about dismax behaving differently, except that the "mm" parameter needs a
default. I tried many values for mm (including 0, 100%, 100, etc.), but
to no avail.
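Why mm cannot help here: "mm" (minimum should match) only sets how many of the query's own clauses must match, and the single-term query "Food" produces just one clause (the required `DisjunctionMaxQuery` in the debug output). The simplified Python sketch below resolves the common mm forms: plain integers and percentages only. Solr's conditional syntax such as `3<90%` is omitted, and clamping the result to 0..n is an assumption of the sketch. It shows that 0, 100% and 100 all resolve to 0 or 1 of a single clause, so none of them can make a term that is absent from the index match.

```python
def required_clauses(mm, n):
    """Resolve a simplified dismax "mm" value for a query with n clauses.

    Supports plain integers ("2", "-1") and percentages ("75%", "-25%") only;
    conditional specs such as "3<90%" are omitted. Clamping the result to
    0..n is an assumption of this sketch.
    """
    mm = mm.strip()
    if mm.endswith("%"):
        pct = int(mm[:-1])
        if pct < 0:
            # Negative percentage: that share of clauses (rounded down) may be missing.
            result = n - (n * -pct) // 100
        else:
            # Positive percentage of the clauses, rounded down.
            result = (n * pct) // 100
    else:
        value = int(mm)
        # Negative integer: that many clauses may be missing.
        result = n + value if value < 0 else value
    return max(0, min(n, result))


# The single-term query "Food" has exactly one clause.
for mm in ("0", "100%", "100"):
    print(mm, "->", required_clauses(mm, 1))
# 0 -> 0, 100% -> 1, 100 -> 1

# With four clauses, a percentage actually changes the requirement.
print(required_clauses("50%", 4))  # 2
```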

Thanks for your help.

Best regards,

Wouter Admiraal

Re: When using Dismax, Solr 5.1 tries to compare the entire field to the search string, instead of only using keywords

2015-06-04 Thread Wouter Admiraal

Hi, thanks for the response.

Label field:

[field and fieldType XML not preserved in the archive]

I can surely optimize the above config a bit, maybe use just one
for both query and index. But for now, this is what it
does.

Just as a side question: is dismax *supposed* to match the entire field
exactly against the search query? Or is my expectation correct, meaning it
should "tokenize" the field, just as with regular searches? It just doesn't
seem intuitive to me.

Thank you again for your help.

Kind regards,
Wouter Admiraal


2015-06-04 14:52 GMT+02:00 Shawn Heisey:
> On 6/4/2015 1:22 AM, Wouter Admiraal wrote:
>> When I turn on debug, I get the following:
>>
>> "debug": {
>>   "rawquerystring": "Food",
>>   "querystring": "Food",
>>   "parsedquery": "(+DisjunctionMaxQuery((label:Food^3.0)) ())/no_coord",
>>   "parsedquery_toString": "+(label:Food^3.0) ()",
>>   "explain": {},
>>   "QParser": "DisMaxQParser",
>>   "altquerystring": null,
>>   "boostfuncs": null,
>>   ...
>> }
>>
>> I don't understand how/why this doesn't use a "contains" operator.
>> This was the behavior on the old 1.4 instance. I went through the
>> changelog for 1.4 to 5.1, but I don't find any explicit information
>> about dismax behaving differently, except the "mm" parameter needs a
>> default. I tried many values for mm (including 0, 100%, 100, etc) but
>> to no avail.
>
> In your schema.xml, what is the definition of the label field, and the
> fieldType definition of the type used in the label field?  That will
> determine exactly how the query is parsed and whether individual words
> will match.  I wasn't using dismax or edismax back when I was running
> 1.4, so I can't say anything about how it used to work, only how it
> works now.
>
> Thanks,
> Shawn
>

Re: When using Dismax, Solr 5.1 tries to compare the entire field to the search string, instead of only using keywords

2015-06-04 Thread Wouter Admiraal

Thanks for the reply.

So, as an aside, should I remove the solr.WhitespaceTokenizerFactory
and solr.WordDelimiterFilterFactory from the query analyzer?

Any idea where I should start poking around? I have deactivated dismax
for now, but would really like to use it.


Wouter Admiraal


2015-06-04 16:54 GMT+02:00 Jack Krupansky:
> The empty parentheses in the parsed query say something odd is going on
> with query-time analysis: it is essentially generating an empty term.
> That may not be the cause of your specific issue, but at least it says
> that something is unexplained here.
>
> Generally, there is an asymmetry between the index and query analyzers when
> the word delimiter filter is used: at index time you typically generate
> extra terms to aid in recall, while at query time the extra terms are not
> generated, to aid in precision. In particular, you would just generate the
> word and number parts, and not preserve the original token. But... that
> should not matter if there is only a single query term. So, something else
> is going on here.
>
> -- Jack Krupansky
>
> On Thu, Jun 4, 2015 at 10:03 AM, Wouter Admiraal wrote:
>
>> Hi, thanks for the response.
>>
>> Label field:
>> > termVectors="true" omitNorms="true"/>
>>
>> 
>> 
>> 
>> > words="txt/stopwords.txt" />
>> > generateWordParts="1" generateNumberParts="1" catenateWords="1"
>> catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"
>> preserveOriginal="1"/>
>> 
>> 
>> > maxGramSize="25"/>
>> 
>> 
>> 
>> > synonyms="txt/synonyms.txt" ignoreCase="true" expand="true"/>
>> > words="txt/stopwords.txt" />
>> > generateWordParts="1" generateNumberParts="1" catenateWords="1"
>> catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"
>> preserveOriginal="1"/>
>> 
>> 
>> 
>>
>> I can surely optimize the above config a bit, maybe only use one
>>  for both query and index. But for now, this is what it
>> does.
>>
>> Just as a side-question: is dismax *supposed* to match fields exactly
>> with the search query? Or is my expectation correct, meaning it should
>> "tokenize" the field, just as with regular searches? It just doesn't
>> seem intuitive to me.
>>
>> Thank you again for your help.
>>
>> Kind regards,
>> Wouter Admiraal
>>
>>
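Jack's point about index/query asymmetry can be illustrated with a deliberately simplified Python stand-in for a word delimiter filter. The function `word_delimiter` and its keyword arguments are hypothetical; they only loosely mirror the `catenateWords` and `preserveOriginal` attributes in the config above, and the real Solr filter can also split on case changes and letter/digit transitions, tracks token positions, and more. At index time extra terms are emitted for recall; at query time only the parts are kept for precision. A plain word such as "Food" comes out as the same single term either way, consistent with Jack's remark that the asymmetry should not matter for a one-term query.

```python
import re


def word_delimiter(token, catenate_words=False, preserve_original=False):
    """Simplified stand-in for a word delimiter filter (hypothetical).

    Splits only on runs of non-alphanumeric characters; optionally adds the
    original token and the concatenation of the parts.
    """
    parts = [p for p in re.split(r"[^0-9A-Za-z]+", token) if p]
    if parts == [token]:
        # Nothing to split: the token passes through unchanged.
        return [token]
    terms = [token] if preserve_original else []
    terms.extend(parts)
    if catenate_words and len(parts) > 1:
        terms.append("".join(parts))
    return terms


# Index time: emit extra terms to improve recall.
print(word_delimiter("Wi-Fi", catenate_words=True, preserve_original=True))
# ['Wi-Fi', 'Wi', 'Fi', 'WiFi']

# Query time: keep only the parts to improve precision.
print(word_delimiter("Wi-Fi"))
# ['Wi', 'Fi']

# A single plain word yields the same one term in both modes.
print(word_delimiter("Food", catenate_words=True, preserve_original=True))
print(word_delimiter("Food"))
# ['Food'] in both cases
```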
