First, use PatternReplaceCharFilterFactory. The difference is that PatternReplaceCharFilterFactory works on the entire input, whereas PatternReplaceFilterFactory works only on the tokens emitted by the tokenizer. A concrete example using WhitespaceTokenizerFactory would be

    this [is some ] text

PatternReplaceFilterFactory would see 5 tokens: "this", "[is", "some", "]", and "text". No single token contains both the opening and the closing bracket, so it would be very hard to do what you want.
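To make the difference concrete, here is a rough sketch in plain Python. It is not Solr code: it only imitates a whitespace tokenizer with str.split(), and it assumes a non-greedy bracket pattern, \[.*?\], as a stand-in for whatever pattern you end up configuring. The function names are made up for illustration.

```python
import re

# Assumed pattern: an opening bracket, as little as possible, a closing bracket.
BRACKETED = re.compile(r"\[.*?\]")


def token_filter_style(text):
    """Imitate a token filter: split on whitespace first,
    then apply the pattern to each token on its own."""
    return [BRACKETED.sub(" ", token) for token in text.split()]


def char_filter_style(text):
    """Imitate a char filter: apply the pattern to the whole input,
    then split the result on whitespace."""
    return BRACKETED.sub(" ", text).split()


if __name__ == "__main__":
    text = "this [is some ] text"
    print(token_filter_style(text))  # ['this', '[is', 'some', ']', 'text']
    print(char_filter_style(text))   # ['this', 'text']
```

The token-by-token version leaves all five tokens untouched because the pattern never sees both brackets at once, while the whole-input version drops the bracketed words before tokenizing.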
PatternReplaceCharFilterFactory will see the entire input as one string and operate on it, _then_ send it through the tokenizer.

Also, don't be fooled by the fact that the _stored_ data will still contain the removed words. When you get the doc back from Solr you'll see the original input, brackets and all. In the above example, if you returned the field you'd still see

    this [is some ] text

when the doc matched. This doc would be found when searching for "this" or "text", but _not_ when searching for "is" or "some".

You want some pattern like

<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\[.*?\]" replacement=" "/>

Best,
Erick

On Wed, May 10, 2017 at 6:08 PM, Michael Tobias <mtob...@btinternet.com> wrote:
> I am sure this is very simple but I cannot get the pattern right.
>
> How can I use solr.PatternReplaceFilterFactory to remove all words in
> brackets from being indexed?
>
> eg [ignore this]
>
> thanks
>
> Michael
>