Re: dismax query does not match with additional field in qf

Jack Krupansky Tue, 07 Oct 2014 08:26:28 -0700

I think what is happening is that your last term, the naked apostrophe, is analyzing to zero terms and simply being ignored, but when you add the extra field, a string field, you now have another term in the query, and you have mm set to 100%, so that "new" term must match. It probably fails because you have no naked apostrophe term in that field in the index.

Probably none of your string field terms were matching before, but that wasn't apparent since the tokenized text matched. But with this naked apostrophe term, there is no way to tell Lucene to match "no" term, so it required the string term to match, which won't happen since only the full string is indexed.
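To make that reasoning concrete, here is a toy model in Python. It is not Solr code. The regex analyzer is only a crude stand-in for text_general, the five chunks are copied from the parsed query in the original message, and the indexed feederstate value "SUCCESS" is a made-up placeholder.

```python
import re

def analyze_text(chunk):
    """Crude stand-in for a tokenizing analyzer (NOT the real text_general chain)."""
    return re.findall(r"[a-z0-9_.:]+", chunk.lower())

def analyze_string(chunk):
    """A string field does no analysis: the whole chunk is the single term."""
    return [chunk]

def required_clauses(chunks, analyzers):
    """Build one clause per chunk, keeping only fields that produced terms.
    A chunk that produces no terms in any field yields no clause at all."""
    clauses = []
    for chunk in chunks:
        per_field = {}
        for field, analyze in analyzers.items():
            terms = analyze(chunk)
            if terms:
                per_field[field] = terms
        if per_field:
            clauses.append(per_field)
    return clauses

def matches(clauses, index):
    """mm=100%: every clause must match in at least one of its fields."""
    return all(
        any(any(t in index.get(field, set()) for t in terms)
            for field, terms in clause.items())
        for clause in clauses
    )

# The five chunks as they appear in the debug output of the original message.
chunks = ["abc_<iframe", "src='loadLocale.js'",
          "onload='javascript:document.XSSed=", "name", "'"]

# Hypothetical index contents for the one test document.
index = {
    "name_tokenized": {"abc_", "iframe", "src", "loadlocale.js",
                       "onload", "javascript:document.xssed", "name"},
    "feederstate": {"SUCCESS"},  # placeholder value, not from the thread
}

one_field = required_clauses(chunks, {"name_tokenized": analyze_text})
two_fields = required_clauses(
    chunks, {"name_tokenized": analyze_text, "feederstate": analyze_string})

print(len(one_field), matches(one_field, index))
print(len(two_fields), matches(two_fields, index))
```

With name_tokenized alone, the apostrophe chunk produces no clause, so only four clauses are required. Once feederstate is added, the apostrophe survives as a fifth required clause that nothing in the index can satisfy.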

Generally, you need to escape all special characters in a query. Then hopefully your string field will match.
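A minimal sketch of client-side escaping, in Python. The character set below is my assumption, based on the special characters of the Lucene classic query syntax; the function name is hypothetical and this is not an official Solr API. SolrJ users would normally reach for ClientUtils.escapeQueryChars instead.

```python
# Assumed set of query-syntax characters (Lucene classic query parser).
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&/')

def escape_query_chars(text: str) -> str:
    """Prefix every assumed special character with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)

print(escape_query_chars('onload=\'javascript:document.XSSed="name"\''))
```

The apostrophe is not in this assumed set, so it passes through unchanged. Whether escaping alone helps in this exact case is worth checking against the debug=true output.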


-- Jack Krupansky

-----Original Message-----
From: Andreas Hubold

Sent: Tuesday, September 30, 2014 11:14 AM
To: solr-user@lucene.apache.org
Subject: dismax query does not match with additional field in qf

Hi,

I ran into a problem with the Solr dismax query parser. We're using Solr
4.10.0 and the field types mentioned below are taken from the example
schema.xml.

In a test we have a document with rather strange content in a field
named "name_tokenized" of type "text_general":

abc_<iframe src='loadLocale.js' onload='javascript:document.XSSed="name"'width=0 height=0>


(It's a test for XSS bug detection, but that doesn't matter here.)

I can find the document when I use the following dismax query with qf
set to field "name_tokenized" only:

http://localhost:44080/solr/studio/editor?deftype=dismax&q=abc_%3Ciframe+src%3D%27loadLocale.js%27+onload%3D%27javascript%3Adocument.XSSed%3D%22name%22%27&debug=true&echoParams=all&qf=name_tokenized^2

If I submit exactly the same query but add another field "feederstate"
to the qf parameter, I don't get any results anymore. The field is of
type "string".

http://localhost:44080/solr/studio/editor?deftype=dismax&q=abc_%3Ciframe+src%3D%27loadLocale.js%27+onload%3D%27javascript%3Adocument.XSSed%3D%22name%22%27&debug=true&echoParams=all&qf=name_tokenized^2%20feederstate
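As a quick check of what the q parameter in these URLs contains, it can be decoded with the Python standard library. This is only a sketch, and the variable names are mine.

```python
from urllib.parse import unquote_plus

# Raw value of the q parameter, copied from the request URLs above.
raw_q = ("abc_%3Ciframe+src%3D%27loadLocale.js%27+onload%3D%27"
         "javascript%3Adocument.XSSed%3D%22name%22%27")

decoded_q = unquote_plus(raw_q)   # '+' becomes a space, %XX becomes the character
chunks = decoded_q.split()        # whitespace-separated pieces of the query

print(decoded_q)
print(len(chunks), repr(decoded_q[-1]))
```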

The decoded value of q is: abc_<iframe src='loadLocale.js'
onload='javascript:document.XSSed="name"' and it seems the trailing
single-quote causes problems here. (In fact, I can find the document
when I remove the last char)
The parsed query for the latter case is

(
  +((
    DisjunctionMaxQuery((feederstate:abc_<iframe | ((name_tokenized:abc_ name_tokenized:iframe)^2.0))~0.1)
    DisjunctionMaxQuery((feederstate:src='loadLocale.js' | ((name_tokenized:src name_tokenized:loadlocale.js)^2.0))~0.1)
    DisjunctionMaxQuery((feederstate:onload='javascript:document.XSSed= | ((name_tokenized:onload name_tokenized:javascript:document.xssed)^2.0))~0.1)
    DisjunctionMaxQuery((feederstate:name | name_tokenized:name^2.0)~0.1)
    DisjunctionMaxQuery((feederstate:')~0.1)
  )~5)
  DisjunctionMaxQuery((textbody:"abc_ iframe src loadlocale.js onload javascript:document.xssed name" | name_tokenized:"abc_ iframe src loadlocale.js onload javascript:document.xssed name"^2.0)~0.1)
)/no_coord


I've configured the handler with <str name="mm">100%</str> so that all
of the 5 dismax queries at the top must match. But this one does not match:

DisjunctionMaxQuery((feederstate:')~0.1)


I'd expect that an additional field in the qf parameter would not lead
to fewer matches.
Okay, the above example is a rather crude test but I'd like to
understand it. Is this a bug in Solr?

I've also found https://issues.apache.org/jira/browse/SOLR-3047 which
sounds somewhat similar.

Regards,

Andreas
