The following query is failing:

((Google +))

My analysis chain ultimately reduces this to 'google', but the following error appears in my log (Solr 3.2.0; 3.4.0 fails as well):

SEVERE: org.apache.solr.common.SolrException: org.apache.lucene.queryParser.ParseException: Cannot parse '( (Google +))': Encountered " ")" ") "" at line 1, column 12.

If I change it to 'Google+' or 'Goo+gle' it works.

Below is the fieldType definition. The pattern filter is designed to strip leading and trailing punctuation characters while leaving any punctuation in the middle of a term alone. It does affect a standalone plus sign, reducing it to a zero-length term, which the length filter then removes at the end of the chain. In the 'Google+' variant, the pattern filter simply strips the trailing plus sign and the query does not fail. Am I seeing a bug here, or is there a problem with my fieldType?
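
To make the pattern filter's effect concrete, here is a small standalone sketch that applies the same regex and replacement to the tokens mentioned above. It uses plain java.util.regex rather than Solr, and it assumes the filter replaces every match (the class name PunctStripDemo and the strip helper are made up for illustration). It only shows what the filter does to a token it is handed, not how the query parser treats the raw query string before analysis.

```java
import java.util.regex.Pattern;

class PunctStripDemo {
    // Same pattern as in the fieldType: leading punctuation, a lazy middle
    // group, and trailing punctuation, anchored to the whole token.
    private static final Pattern EDGE_PUNCT =
        Pattern.compile("^(\\p{Punct}*)(.*?)(\\p{Punct}*)$");

    // Keeps only the middle group, mirroring replacement="$2".
    // Assumption: replace-all semantics, as with java.util.regex replaceAll.
    static String strip(String token) {
        return EDGE_PUNCT.matcher(token).replaceAll("$2");
    }

    public static void main(String[] args) {
        String[] tokens = {"Google", "+", "Google+", "Goo+gle"};
        for (String token : tokens) {
            System.out.println("'" + token + "' -> '" + strip(token) + "'");
        }
        // Prints:
        // 'Google' -> 'Google'
        // '+' -> ''
        // 'Google+' -> 'Google'
        // 'Goo+gle' -> 'Goo+gle'
    }
}
```

So a lone '+' does become an empty term at this stage, while 'Google+' and 'Goo+gle' come through as described.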

<fieldType name="genText" class="solr.TextField" sortMissingLast="true" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$"
            replacement="$2"
            allowempty="false"
    />
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnCaseChange="1"
            splitOnNumerics="1"
            stemEnglishPossessive="1"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="1"
            catenateNumbers="1"
            catenateAll="0"
            preserveOriginal="1"
    />
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.LengthFilterFactory" min="1" max="512"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$"
            replacement="$2"
            allowempty="false"
    />
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnCaseChange="1"
            splitOnNumerics="1"
            stemEnglishPossessive="1"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
            preserveOriginal="1"
    />
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.LengthFilterFactory" min="1" max="512"/>
  </analyzer>
</fieldType>