Hi, I've created the following filter chain in a field type; the idea is to use it for autocompletion:

<tokenizer class="solr.WhitespaceTokenizerFactory"/> <!-- create tokens separated by whitespace -->
<filter class="solr.LowerCaseFilterFactory"/> <!-- lowercase everything -->
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" /> <!-- throw out stopwords -->
<filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all" /> <!-- throw out everything except a-z -->
<!-- actually, here I would like to join multiple tokens together again, to provide one token for the EdgeNGramFilterFactory -->
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" /> <!-- create edgeNGram tokens for autocomplete matches -->

With this filter chain, the EdgeNGramFilterFactory receives multiple tokens whenever the input string contains whitespace, so edge n-grams are generated for each word separately. This leads to the following results:

Input Query: "George Cloo"

Matches:
- "George Harrison"
- "John Clooridge"
- "George Smith"
- "George Clooney"
- etc.

However, only "George Clooney" should match in the autocompletion use case.

Therefore, I'd like to add a filter before the EdgeNGramFilterFactory that concatenates all the tokens generated by the WhitespaceTokenizerFactory.

Are there filters that can do this? If not, are there examples of how to implement a custom TokenFilter?

Thanks!

-robert
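
To make the difference concrete, here is a minimal plain-Java sketch of the two behaviours. It is not a Solr or Lucene TokenFilter and uses no Lucene classes; the class and method names (EdgeNGramSketch, matchesPerToken, matchesConcatenated) are made up for illustration. It assumes that stopword removal is skipped, that the query is only lowercased and cleaned (not n-grammed), and that query terms are combined with OR in the per-token case.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only: simulates the filter chain with plain strings.
final class EdgeNGramSketch {

    // Lowercase, split on whitespace, strip everything except a-z.
    // (Stopword removal is omitted in this sketch.)
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            String cleaned = raw.replaceAll("[^a-z]", "");
            if (!cleaned.isEmpty()) {
                tokens.add(cleaned);
            }
        }
        return tokens;
    }

    // Prefixes of the token with lengths min..max (capped at the token length).
    static Set<String> edgeNGrams(String token, int min, int max) {
        Set<String> grams = new LinkedHashSet<>();
        int limit = Math.min(max, token.length());
        for (int len = min; len <= limit; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    // Current behaviour: every whitespace token gets its own edge n-grams,
    // and a document matches if ANY query term equals one of them.
    static boolean matchesPerToken(String indexed, String query) {
        Set<String> grams = new LinkedHashSet<>();
        for (String token : analyze(indexed)) {
            grams.addAll(edgeNGrams(token, 1, 25));
        }
        for (String term : analyze(query)) {
            if (grams.contains(term)) {
                return true;
            }
        }
        return false;
    }

    // Desired behaviour: tokens are joined into one string before n-gramming,
    // so the whole (joined) query must be a prefix of the joined input.
    static boolean matchesConcatenated(String indexed, String query) {
        String joinedIndexed = String.join("", analyze(indexed));
        String joinedQuery = String.join("", analyze(query));
        return edgeNGrams(joinedIndexed, 1, 25).contains(joinedQuery);
    }

    public static void main(String[] args) {
        String query = "George Cloo";
        String[] names = {"George Harrison", "John Clooridge", "George Smith", "George Clooney"};
        for (String name : names) {
            System.out.println(name
                    + " | per-token: " + matchesPerToken(name, query)
                    + " | concatenated: " + matchesConcatenated(name, query));
        }
    }
}
```

In this sketch the per-token variant accepts every name that shares a word prefix with the query, while the concatenated variant only accepts names whose joined form starts with "georgecloo". A real implementation would still have to do the joining inside the analysis chain, which is what the question above is about.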