Trey Grainger created SOLR-14434:
------------------------------------
Summary: Multiterm Analyzer Not Persisted in Managed Schema
Key: SOLR-14434
URL: https://issues.apache.org/jira/browse/SOLR-14434
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Components: Schema and Analysis
Affects Versions: 8.5.1, 8.4.1, 8.5, 8.3.1, 8.4, 8.3, 8.1.1, 8.2, 8.1, 8.0
Reporter: Trey Grainger
In addition to "{{index}}" and "{{query}}" analyzers, Solr supports adding an
explicit "{{multiterm}}" analyzer to schema {{fieldType}} definitions. This
allows specific control over analysis for wildcard terms, prefix queries,
range queries, etc. For example, the following definition would cause the
wildcard query "{{hats*}}" to be stemmed to "{{hat*}}" rather than left as
"{{hats*}}", and thus match the indexed term "{{hat}}".
{code:xml}
<fieldType class="solr.TextField" multiValued="true" name="multiterm_test"
           positionIncrementGap="100" termOffsets="true" termVectors="true">
  <analyzer type="index">
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" expand="true"
            ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
  <analyzer type="multiterm">
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
</fieldType>
{code}
This works fine if you use a non-managed schema (i.e. a {{schema.xml}} file) OR
if you use a managed schema (i.e. a {{managed-schema}} file) and push your
schema directly to ZooKeeper. However, starting with Solr 8.0, if you use the
Schema API to add a {{fieldType}}, the {{multiterm}} analyzers are not
persisted (only the {{index}} and {{query}} analyzers are).
This bug seems to have originated from LUCENE-8497, which refactored this code
area substantially. The bug is caused by the managed schema being able to READ
in the {{multiterm}} analyzers from the schema file, but then being unable to
WRITE them back out. Since pushing the schema directly to ZooKeeper only
requires Solr to read them in, this bug would not have been obvious in initial
testing. However, since the Schema API reads in the schema file, writes an
updated schema out to ZooKeeper (where the bug occurs), and then reads the file
back in, all of the {{multiterm}} analyzers get stripped out.
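For illustration, after a Schema API round trip the persisted {{fieldType}}
from the example above would look roughly like the following. This is a sketch
based on the behavior described here, not output copied from an actual
{{managed-schema}} file; the exact attribute order and whitespace that Solr
writes are assumptions.
{code:xml}
<fieldType class="solr.TextField" multiValued="true" name="multiterm_test"
           positionIncrementGap="100" termOffsets="true" termVectors="true">
  <analyzer type="index">
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" expand="true"
            ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
  <!-- The multiterm analyzer block is missing here: it was read in,
       but never written back out. -->
</fieldType>
{code}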
I've identified the problematic code and am looking into an appropriate fix.