[ https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023884#comment-17023884 ]
ASF subversion and git services commented on SOLR-14189:
--------------------------------------------------------

Commit fd49c903b8193aa27c56655915c1bf741135fa18 in lucene-solr's branch refs/heads/gradle-master from Uwe Schindler
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=fd49c90 ]

SOLR-14189: Add changes entry

> Some whitespace characters bypass zero-length test in query parsers leading
> to 400 Bad Request
> ----------------------------------------------------------------------------------------------
>
>                 Key: SOLR-14189
>                 URL: https://issues.apache.org/jira/browse/SOLR-14189
>             Project: Solr
>          Issue Type: Improvement
>          Components: query parsers
>            Reporter: Andy Webb
>            Assignee: Uwe Schindler
>            Priority: Major
>             Fix For: master (9.0), 8.5
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> The edismax and some other query parsers treat pure whitespace queries as
> empty queries, but they use Java's {{String.trim()}} method to normalise
> queries. That method [only treats characters 0-32 as
> whitespace|https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#trim--].
> Other whitespace characters exist - such as {{U+3000 IDEOGRAPHIC SPACE}} -
> which bypass the test and lead to {{400 Bad Request}} responses - see for
> example {{/solr/mycollection/select?q=%E3%80%80&defType=edismax}} vs
> {{/solr/mycollection/select?q=%20&defType=edismax}}. The first fails with the
> exception:
> {noformat}
> org.apache.solr.search.SyntaxError: Cannot parse '': Encountered "<EOF>" at
> line 1, column 0. Was expecting one of: <NOT> ... "+" ... "-" ... <BAREOPER>
> ... "(" ... "*" ... <QUOTED> ... <TERM> ... <PREFIXTERM> ... <WILDTERM> ...
> <REGEXPTERM> ... "[" ... "{" ... <LPARAMS> ... "filter(" ... <NUMBER> ...
> <TERM> ...
> {noformat}
> [PR 1172|https://github.com/apache/lucene-solr/pull/1172] updates the dismax,
> edismax and rerank query parsers to use
> [StringUtils.isWhitespace()|https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#isWhitespace-java.lang.String-]
> which is aware of all whitespace characters.
> Prior to the change, rerank behaves differently for U+3000 and U+0020 - with
> the change, both the below give the "mandatory parameter" message:
> {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%E3%80%80}}
> - generic 400 Bad Request
> {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%20}}
> - 400 reporting "reRankQuery parameter is mandatory"
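
For readers who want to reproduce the difference described in the issue, here is a minimal, JDK-only sketch. It is not the code from PR 1172: the class name {{WhitespaceCheck}} and both helper methods are hypothetical, and {{isAllWhitespace}} only approximates commons-lang's {{StringUtils.isWhitespace()}} with a {{Character.isWhitespace()}} loop so that the example runs without extra dependencies.

```java
// Illustrative sketch only; names are hypothetical and not taken from Solr.
final class WhitespaceCheck {

    // Pre-fix style of check: trim() strips only characters up to U+0020,
    // so an ideographic space survives and the query looks non-empty.
    static boolean isEmptyByTrim(String q) {
        return q == null || q.trim().isEmpty();
    }

    // Approximation of a Unicode-aware check: true when every character
    // (if any) is whitespace according to Character.isWhitespace().
    static boolean isAllWhitespace(String q) {
        if (q == null) {
            return false;
        }
        for (int i = 0; i < q.length(); i++) {
            if (!Character.isWhitespace(q.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String asciiSpace = " ";            // what q=%20 decodes to
        String ideographicSpace = "\u3000"; // what q=%E3%80%80 decodes to

        System.out.println(isEmptyByTrim(asciiSpace));         // true
        System.out.println(isEmptyByTrim(ideographicSpace));   // false
        System.out.println(isAllWhitespace(asciiSpace));       // true
        System.out.println(isAllWhitespace(ideographicSpace)); // true
    }
}
```

Under these assumptions, the trim-based check lets the U+3000 query through to the parser as if it had content, while the per-character check treats both inputs as blank, which matches the behaviour the issue asks for.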