Hi,

I ran into a problem with the Solr dismax query parser. We're using Solr 4.10.0 and the field types mentioned below are taken from the example schema.xml.

In a test we have a document with rather strange content in a field named "name_tokenized" of type "text_general":

abc_<iframe src='loadLocale.js' onload='javascript:document.XSSed="name"' width=0 height=0>

(It's a test for XSS bug detection, but that doesn't matter here.)

I can find the document when I use the following dismax query with qf set to field "name_tokenized" only:

http://localhost:44080/solr/studio/editor?deftype=dismax&q=abc_%3Ciframe+src%3D%27loadLocale.js%27+onload%3D%27javascript%3Adocument.XSSed%3D%22name%22%27&debug=true&echoParams=all&qf=name_tokenized^2

If I submit exactly the same query but add a second field, "feederstate" (of type "string"), to the qf parameter, I no longer get any results.

http://localhost:44080/solr/studio/editor?deftype=dismax&q=abc_%3Ciframe+src%3D%27loadLocale.js%27+onload%3D%27javascript%3Adocument.XSSed%3D%22name%22%27&debug=true&echoParams=all&qf=name_tokenized^2%20feederstate

The decoded value of q is: abc_<iframe src='loadLocale.js' onload='javascript:document.XSSed="name"' and it seems that the trailing single quote causes the problem. (In fact, I can find the document when I remove that last character.)
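
For anyone who wants to double-check the decoding, here is a minimal Python sketch using only the standard library. The variable names are made up for illustration; the encoded string is copied from the URLs above.

```python
from urllib.parse import unquote_plus

# Raw q value copied from the request URLs above.
raw_q = (
    "abc_%3Ciframe+src%3D%27loadLocale.js%27+onload%3D%27"
    "javascript%3Adocument.XSSed%3D%22name%22%27"
)

# unquote_plus turns '+' into a space and resolves %XX escapes.
decoded_q = unquote_plus(raw_q)

print(decoded_q)
# The last character is the trailing single quote discussed above.
print(repr(decoded_q[-1]))
```
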
The parsed query for the second (two-field) request is:

(
  +((
    DisjunctionMaxQuery((feederstate:abc_<iframe | ((name_tokenized:abc_ name_tokenized:iframe)^2.0))~0.1)
    DisjunctionMaxQuery((feederstate:src='loadLocale.js' | ((name_tokenized:src name_tokenized:loadlocale.js)^2.0))~0.1)
    DisjunctionMaxQuery((feederstate:onload='javascript:document.XSSed= | ((name_tokenized:onload name_tokenized:javascript:document.xssed)^2.0))~0.1)
    DisjunctionMaxQuery((feederstate:name | name_tokenized:name^2.0)~0.1)
    DisjunctionMaxQuery((feederstate:')~0.1)
  )~5)

  DisjunctionMaxQuery((textbody:"abc_ iframe src loadlocale.js onload javascript:document.xssed name" | name_tokenized:"abc_ iframe src loadlocale.js onload javascript:document.xssed name"^2.0)~0.1)
)/no_coord


I've configured the handler with <str name="mm">100%</str>, so all five DisjunctionMaxQuery clauses at the top must match. But this one does not match:

DisjunctionMaxQuery((feederstate:')~0.1)
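
The following is a rough Python model of what seems to be going on. It is not Solr's code. The function names, the regular expressions, and the simplified stand-ins for the "string" and "text_general" analyzers are all assumptions made for illustration. The idea: the query is split into chunks, each chunk is analyzed per field, and a chunk only becomes a required clause if at least one field produces a term for it.

```python
import re

def split_clauses(q):
    # Assumed simplification: a double-quoted phrase is one chunk,
    # everything else is split on whitespace.
    pattern = r'"([^"]*)"|([^\s"]+)'
    return [m.group(1) if m.group(1) is not None else m.group(2)
            for m in re.finditer(pattern, q)]

def analyze_string(chunk):
    # "string" fields are not tokenized: the whole chunk is one term.
    return [chunk]

def analyze_text_general(chunk):
    # Crude stand-in for the tokenizer plus lowercasing: punctuation
    # such as < ' = is dropped, so a lone quote yields no terms.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9_.:]+", chunk)]

def required_clauses(q, analyzers):
    clauses = []
    for chunk in split_clauses(q):
        per_field = {f: analyze(chunk) for f, analyze in analyzers.items()}
        # Fields that produce no terms drop out of the clause.
        per_field = {f: terms for f, terms in per_field.items() if terms}
        if per_field:  # a clause with no fields left disappears entirely
            clauses.append(per_field)
    return clauses

q = "abc_<iframe src='loadLocale.js' onload='javascript:document.XSSed=\"name\"'"

one_field = required_clauses(q, {"name_tokenized": analyze_text_general})
two_fields = required_clauses(q, {"name_tokenized": analyze_text_general,
                                  "feederstate": analyze_string})

print(len(one_field))   # clauses required with qf=name_tokenized
print(len(two_fields))  # clauses required with qf=name_tokenized feederstate
print(two_fields[-1])   # the extra clause contributed by the lone quote
```

In this model, the lone quote yields no terms for name_tokenized, so with a single field there are only four clauses for mm=100% to satisfy. With feederstate added, the quote survives as a literal term and becomes a fifth required clause that only feederstate could match.
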


I'd expect that adding a field to the qf parameter would never lead to fewer matches. Admittedly, the example above is a rather contrived test, but I'd like to understand the behavior. Is this a bug in Solr?

I also found https://issues.apache.org/jira/browse/SOLR-3047, which sounds somewhat similar.

Regards,
Andreas
