I'm having an issue where searches that contain ampersands aren't being
handled correctly. I need them to be dropped at index time *AND* query
time. When documents come in and are indexed, the ampersands are
successfully dropped when they go into my stemmed field (when I facet on
the stemmed field they aren't in the list), but when I actually search with
a term containing an ampersand, I get no results.

E.g. if I search for the string "light fit" or "light and fit" I get
results, but when I search for "light & fit" I get none, even though the
SnowballPorterFilterFactory should be dropping the "&" at query time just
as it does for the "and", so all 3 queries *should* be equivalent.

I've tried adding a synonym, which shows up in my
_schema_analysis_synonyms_default.json (I only have one default file), in
both this form and its inverse:

"and":[
      "&",
      "and"],


I've also tried adding the StopWord filter to my fieldtype with & in the
stopwords (though this shouldn't be necessary because the SnowballPorter
should be dropping it anyway) and it still doesn't work.

Is there some kind of special handling I need for ampersands? I'm thinking
that Solr must be interpreting it as some kind of operator and I need to
tell Solr that it's actually literal text so the SnowballPorter knows to
drop it. Using backslashes or URL encoding instead doesn't work, though.
Does anyone have any ideas?

I can obviously just remove any ampersands from the q before I submit the
query to Solr and get the correct results, so this is not a game-breaking
problem, but I'm more curious as to *why* this is happening and how to fix
it correctly.
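
For reference, a minimal sketch of that client-side workaround, written in
Python purely for illustration. The helper name strip_standalone_ampersands
is made up, and it assumes only free-standing "&" tokens should be removed,
so something like "AT&T" is left alone:

```python
import re

# Matches an ampersand that stands alone as a whitespace-delimited token.
_STANDALONE_AMP = re.compile(r"(?<!\S)&(?!\S)")


def strip_standalone_ampersands(q: str) -> str:
    """Remove free-standing '&' tokens and collapse the leftover whitespace."""
    without_amp = _STANDALONE_AMP.sub(" ", q)
    return " ".join(without_amp.split())


# Hypothetical usage: clean the user's input before sending it to Solr as q.
cleaned = strip_standalone_ampersands("stemmed_description:light & fit")
```

This only hides the symptom on the client side; it does not explain why the
query-time analysis keeps the "&".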

Cheers,

Callum.

Extra info:

I'm using Solr 5.5.2 in cloud mode.

The q in the queries is specified like this and is parsed in the following
way:

"rawquerystring":"stemmed_description:light & fit",
"querystring":"stemmed_description:light & fit",
"parsedquery":"(+(+stemmed_description:light +DisjunctionMaxQuery((stemmed_description:&)) +DisjunctionMaxQuery((stemmed_description:fit))))/no_coord",
"parsedquery_toString":"+(+stemmed_description:light +(stemmed_description:&) +(stemmed_description:fit))",
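
A sketch of how a request producing this kind of debug output could be
built, assuming a hypothetical host and collection name (localhost:8983 and
mycollection are placeholders, not my real setup) and the standard
debugQuery=true parameter. urlencode turns the ampersand into %26, so it
reaches Solr as part of q rather than being read as a parameter separator:

```python
from urllib.parse import urlencode


def build_debug_url(base_url: str, q: str) -> str:
    """Build a select URL with debugQuery enabled and q safely URL-encoded."""
    params = {"q": q, "debugQuery": "true", "wt": "json"}
    return base_url + "?" + urlencode(params)


# Placeholder endpoint; substitute the real host and collection.
url = build_debug_url(
    "http://localhost:8983/solr/mycollection/select",
    "stemmed_description:light & fit",
)
```

The only literal "&" characters left in the resulting URL are the ones
separating the parameters, which is consistent with the rawquerystring above
still containing the ampersand.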

I have a stemmed field in my schema (schema version 1.5) defined like this:

<field name="stemmed_description" type="stemmed_text" indexed="true"
       stored="false" required="false" multiValued="true"/>

with a field type defined like this:

    <!-- Stemmed text type -->
    <fieldType name="stemmed_text" class="solr.TextField"
               positionIncrementGap="100" omitNorms="true">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"
                catenateWords="1"
                preserveOriginal="0"
                splitOnNumerics="0"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        <filter class="solr.ManagedSynonymFilterFactory" managed="default"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"
                catenateWords="1"
                preserveOriginal="1"
                splitOnNumerics="0"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>

        <filter class="solr.SnowballPorterFilterFactory" language="English"/>
      </analyzer>
    </fieldType>
