Hi,
I ran into a problem with the Solr dismax query parser. We're using Solr
4.10.0 and the field types mentioned below are taken from the example
schema.xml.
In a test we have a document with rather strange content in a field
named "name_tokenized" of type "text_general":
abc_<iframe src='loadLocale.js' onload='javascript:document.XSSed="name"' width=0
height=0>
(It's a test for XSS bug detection, but that doesn't matter here.)
I can find the document when I use the following dismax query with qf
set to field "name_tokenized" only:
http://localhost:44080/solr/studio/editor?deftype=dismax&q=abc_%3Ciframe+src%3D%27loadLocale.js%27+onload%3D%27javascript%3Adocument.XSSed%3D%22name%22%27&debug=true&echoParams=all&qf=name_tokenized^2
If I submit exactly the same query but add another field, "feederstate"
(of type "string"), to the qf parameter, I no longer get any results:
http://localhost:44080/solr/studio/editor?deftype=dismax&q=abc_%3Ciframe+src%3D%27loadLocale.js%27+onload%3D%27javascript%3Adocument.XSSed%3D%22name%22%27&debug=true&echoParams=all&qf=name_tokenized^2%20feederstate
The decoded value of q is:
abc_<iframe src='loadLocale.js' onload='javascript:document.XSSed="name"'
It seems the trailing single quote causes the problem here. In fact, I
can find the document again when I remove that last character.
The parsed query for the latter case is:
(
  +((
    DisjunctionMaxQuery((feederstate:abc_<iframe | ((name_tokenized:abc_ name_tokenized:iframe)^2.0))~0.1)
    DisjunctionMaxQuery((feederstate:src='loadLocale.js' | ((name_tokenized:src name_tokenized:loadlocale.js)^2.0))~0.1)
    DisjunctionMaxQuery((feederstate:onload='javascript:document.XSSed= | ((name_tokenized:onload name_tokenized:javascript:document.xssed)^2.0))~0.1)
    DisjunctionMaxQuery((feederstate:name | name_tokenized:name^2.0)~0.1)
    DisjunctionMaxQuery((feederstate:')~0.1)
  )~5)
  DisjunctionMaxQuery((textbody:"abc_ iframe src loadlocale.js onload javascript:document.xssed name" | name_tokenized:"abc_ iframe src loadlocale.js onload javascript:document.xssed name"^2.0)~0.1)
)/no_coord
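To double-check how q ends up as five clauses, here is a small Python sketch that decodes the q value from the URL and splits it the way the parsed query suggests. The splitting rule (whitespace-separated chunks, with a double-quoted run treated as its own clause) is only my assumption derived from the debug output, not Solr's actual parser code, and split_clauses is a hypothetical helper name.

```python
import re
from urllib.parse import unquote_plus

# The q parameter exactly as it appears in the request URL.
RAW_Q = (
    "abc_%3Ciframe+src%3D%27loadLocale.js%27+onload%3D%27"
    "javascript%3Adocument.XSSed%3D%22name%22%27"
)


def split_clauses(q):
    """Assumed splitting rule: a double-quoted run is one clause,
    everything else is split on whitespace (and at double quotes)."""
    return re.findall(r'"[^"]*"|[^\s"]+', q)


decoded = unquote_plus(RAW_Q)  # '+' becomes a space, %XX becomes the character
clauses = split_clauses(decoded)

print(decoded)
for clause in clauses:
    print(repr(clause))
```

Under that assumption the last clause is the lone single quote, which lines up with the fifth DisjunctionMaxQuery in the debug output.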
I've configured the handler with <str name="mm">100%</str> so that all
five DisjunctionMaxQuery clauses at the top must match. But this one,
which contains only the feederstate term for the trailing quote, does
not match:
DisjunctionMaxQuery((feederstate:')~0.1)
I'd expect that an additional field in the qf parameter would not lead
to fewer matches.
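To make my understanding of the behaviour concrete, here is a toy model in Python. It is not Solr code: the regex tokenizer is only a rough stand-in for the text_general analysis (chosen so that it reproduces the terms shown in the parsed query), the feederstate value "SOME_STATE" is made up, and the rule that a clause producing no terms in any qf field is dropped from the mm count is my assumption.

```python
import re

# Rough stand-in for text_general: word characters, optionally joined by '.' or ':'.
TOKEN = re.compile(r"[A-Za-z0-9_]+(?:[.:][A-Za-z0-9_]+)*")


def analyze_text(value):
    """Approximate tokenized field: lowercased word tokens."""
    return [t.lower() for t in TOKEN.findall(value)]


def analyze_string(value):
    """A string field is not analyzed: the whole value is one term."""
    return [value]


ANALYZERS = {"name_tokenized": analyze_text, "feederstate": analyze_string}


def matches(clauses, qf, doc):
    """Toy mm=100%: every clause that yields terms must match in some qf field."""
    index = {field: set(ANALYZERS[field](value)) for field, value in doc.items()}
    for clause in clauses:
        text = clause.strip('"')
        per_field = {f: ANALYZERS[f](text) for f in qf}
        per_field = {f: terms for f, terms in per_field.items() if terms}
        if not per_field:
            continue  # assumed: a clause with no terms anywhere is dropped
        if not any(index[f] & set(terms) for f, terms in per_field.items()):
            return False
    return True


doc = {
    "name_tokenized": "abc_<iframe src='loadLocale.js' "
    "onload='javascript:document.XSSed=\"name\"' width=0 height=0>",
    "feederstate": "SOME_STATE",  # hypothetical value
}
clauses = ["abc_<iframe", "src='loadLocale.js'",
           "onload='javascript:document.XSSed=", '"name"', "'"]

print(matches(clauses, ["name_tokenized"], doc))
print(matches(clauses, ["name_tokenized", "feederstate"], doc))
print(matches(clauses[:-1], ["name_tokenized", "feederstate"], doc))
```

In this model the lone quote yields no term for name_tokenized but a literal term for feederstate, so with both fields in qf it becomes a required clause that nothing in the index can satisfy. That would explain what I see, but I don't know whether it is how Solr actually decides this.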
Okay, the above example is a rather crude test but I'd like to
understand it. Is this a bug in Solr?
I've also found https://issues.apache.org/jira/browse/SOLR-3047 which
sounds somewhat similar.
Regards,
Andreas