[ https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andy Webb updated SOLR-14189:
-----------------------------
Description:
The edismax and some other query parsers treat queries consisting only of
whitespace as empty queries, but they use Java's {{String.trim()}} method to
normalise queries. That method [only strips leading and trailing characters
with code points 0-32 (U+0000 to U+0020)|https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#trim--].
Other whitespace characters, such as {{U+3000 IDEOGRAPHIC SPACE}}, bypass this
test and lead to {{400 Bad Request}} responses. For example, compare
{{/solr/mycollection/select?q=%E3%80%80&defType=edismax}} with
{{/solr/mycollection/select?q=%20&defType=edismax}}. The first fails with the
exception:
{noformat}
org.apache.solr.search.SyntaxError: Cannot parse '': Encountered "<EOF>" at
line 1, column 0. Was expecting one of: <NOT> ... "+" ... "-" ... <BAREOPER>
... "(" ... "*" ... <QUOTED> ... <TERM> ... <PREFIXTERM> ... <WILDTERM> ...
<REGEXPTERM> ... "[" ... "{" ... <LPARAMS> ... "filter(" ... <NUMBER> ...
<TERM> ...
{noformat}
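The following minimal Java sketch reproduces the difference described above outside Solr. It decodes the two example {{q}} values and compares {{String.trim()}} with {{Character.isWhitespace()}}. {{emptyAfterTrim()}} is a hypothetical, simplified stand-in for the parsers' emptiness check, not Solr's actual code.

```java
import java.net.URLDecoder;

public class TrimVsWhitespace {
    // Hypothetical, simplified emptiness check: a query counts as empty
    // only if String.trim() leaves nothing behind.
    static boolean emptyAfterTrim(String q) {
        return q == null || q.trim().isEmpty();
    }

    public static void main(String[] args) throws Exception {
        // Decode the two q parameters from the example URLs.
        String ideographic = URLDecoder.decode("%E3%80%80", "UTF-8"); // U+3000
        String space = URLDecoder.decode("%20", "UTF-8");              // U+0020

        // trim() strips U+0020 but not U+3000 (code point > U+0020).
        System.out.println(emptyAfterTrim(space));       // true
        System.out.println(emptyAfterTrim(ideographic)); // false

        // Character.isWhitespace() does treat U+3000 as whitespace.
        System.out.println(Character.isWhitespace(ideographic.charAt(0))); // true
    }
}
```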
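[PR 1172|https://github.com/apache/lucene-solr/pull/1172] updates the dismax,
edismax and rerank query parsers to use
[StringUtils.stripToNull()|https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#stripToNull-java.lang.String-],
which strips every leading and trailing character for which
{{Character.isWhitespace()}} returns true, including {{U+3000}}. (Per its
Javadoc, {{Character.isWhitespace()}} excludes the non-breaking spaces
{{U+00A0}}, {{U+2007}} and {{U+202F}}.)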
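A minimal sketch of the fixed behaviour follows. So that it runs without the commons-lang3 dependency, it assumes a hypothetical {{stripToNull()}} helper that follows the documented semantics of {{StringUtils.stripToNull()}}: strip leading and trailing characters for which {{Character.isWhitespace()}} is true, and return null if nothing remains. It is an illustration, not the library's or Solr's code.

```java
public class StripToNullSketch {
    // Hypothetical stand-in for commons-lang3 StringUtils.stripToNull():
    // strips leading/trailing chars where Character.isWhitespace() is true,
    // returning null if the result would be empty.
    static String stripToNull(String s) {
        if (s == null) {
            return null;
        }
        int start = 0;
        int end = s.length();
        while (start < end && Character.isWhitespace(s.charAt(start))) {
            start++;
        }
        while (end > start && Character.isWhitespace(s.charAt(end - 1))) {
            end--;
        }
        return start == end ? null : s.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(stripToNull(" ") == null);      // true
        System.out.println(stripToNull("\u3000") == null); // true: now treated as empty
        System.out.println(stripToNull("\u3000foo "));     // foo
        // Non-breaking space is excluded by Character.isWhitespace().
        System.out.println(stripToNull("\u00A0") == null); // false
    }
}
```

On Java 11 and later, {{String.strip()}} and {{String.isBlank()}} use the same {{Character.isWhitespace()}} definition, so they would give an equivalent emptiness test without the extra library.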
was:
The edismax and some other query parsers treat pure whitespace queries as empty
queries, but they use Java's {{String.trim()}} method to normalise queries.
That method only treats characters 0-32 as whitespace. Other whitespace
characters exist - such as {{U+3000 IDEOGRAPHIC SPACE}} - which bypass the test
and lead to {{400 Bad Request}} responses - see for example
{{/solr/mycollection/select?q=%E3%80%80&defType=edismax}} vs
{{/solr/mycollection/select?q=%20&defType=edismax}}. The first fails with the
exception:
{noformat}
org.apache.solr.search.SyntaxError: Cannot parse '': Encountered "<EOF>" at
line 1, column 0. Was expecting one of: <NOT> ... "+" ... "-" ... <BAREOPER>
... "(" ... "*" ... <QUOTED> ... <TERM> ... <PREFIXTERM> ... <WILDTERM> ...
<REGEXPTERM> ... "[" ... "{" ... <LPARAMS> ... "filter(" ... <NUMBER> ...
<TERM> ...
{noformat}
PR: https://github.com/apache/lucene-solr/pull/1172
> Some whitespace characters bypass zero-length test in query parsers leading
> to 400 Bad Request
> ----------------------------------------------------------------------------------------------
>
> Key: SOLR-14189
> URL: https://issues.apache.org/jira/browse/SOLR-14189
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Components: query parsers
> Reporter: Andy Webb
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)