Hello,

We are upgrading to Solr 7.6.0 and noticed that SynonymFilter and WordDelimiterFilter have been deprecated. The Solr documentation recommends using SynonymGraphFilter and WordDelimiterGraphFilter instead. In our current schema, we have a text field type defined as:

<fieldType name="text_syn" class="solr.TextField" omitPositions="true" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnNumerics="0" generateWordParts="1" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="0" preserveOriginal="1"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnNumerics="0" generateWordParts="0" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="1" splitOnCaseChange="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

In the index phase we have both SynonymFilter and WordDelimiterFilter configured:

    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnNumerics="0" generateWordParts="1" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="0" preserveOriginal="1"/>

The Solr documentation states that these graph filters produce correct token graphs but cannot consume an input token graph correctly, and that when either graph filter is used during indexing, it must be followed by a FlattenGraphFilter.
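
As I read it, the pattern the documentation describes for a single graph filter looks roughly like the sketch below. This is only my understanding, not our actual schema: the field type name text_graph_sketch is hypothetical, the chain is trimmed to one graph filter for illustration, and it says nothing about combining two graph filters in one chain.

```xml
<!-- Sketch only: "text_graph_sketch" is a hypothetical name, not part of our schema. -->
<fieldType name="text_graph_sketch" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Graph filter that produces a token graph -->
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <!-- The indexer cannot consume a graph, so it is flattened right after the graph filter -->
    <filter class="solr.FlattenGraphFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```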

I am confused about how to replace the deprecated filters with the new SynonymGraphFilter and WordDelimiterGraphFilter. A few questions:

1. Regarding the FlattenGraphFilter, should it be used only once, or after each graph filter? Can we configure it like this?

    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" splitOnNumerics="0" generateWordParts="1" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="0" preserveOriginal="1"/>
    <filter class="solr.FlattenGraphFilterFactory"/>

2. Is it possible to have two graph filters, i.e. both SynonymGraphFilter and WordDelimiterGraphFilter, in the same analysis chain? If not, what is the best option to replace our current config?

3. With the StopFilterFactory in between SynonymGraphFilter and WordDelimiterGraphFilter, I get a few index errors:

    Exception writing document id XXXXXX to the index; possible analysis error
    Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1

   But if I move the StopFilter before the SynonymGraphFilter, the errors are gone. I guess the StopFilter messes up the SynonymGraphFilter output? I am not sure whether this is a Solr defect or whether there is a guideline that StopFilter should not be placed after graph filters.

Thanks in advance for your input.

Thanks,
Wei