All,

I am having trouble getting my regex pattern to work properly. I have tried PatternReplaceFilterFactory after the standard tokenizer:

<filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement=" " replace="all"/>

and PatternReplaceCharFilterFactory before it:

<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([^a-zA-Z0-9])" replacement=" " replace="all"/>

It looks like this should work to remove everything except letters and numbers.

<charFilter class="solr.HTMLStripCharFilterFactory"/>
<filter class="solr.ASCIIFoldingFilterFactory"/>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true" />
<filter class="solr.LengthFilterFactory" min="2" max="999"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement=" " replace="all"/>

I am left with quite a few facet items like this:

<int name="_ view">1443</int>
<int name="view _">1599</int>

Can anyone suggest what may be going on here? I have verified that my regex works properly here: http://www.fileformat.info/tool/regex.htm

Adam