Using Solr 3.5 (also tried trunk), I have the following charFilter defined on my fieldType (just extended text_general to keep things simple):

  <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(\w)\1{2,}+" replaceWith="$1$1"/>

The intent of this charFilter is to match any character that is repeated more than twice in a row and collapse the run down to a max of two, i.e.

  fooobarrrr => foobarr

Using the analysis form, I end up with:

  fba

Here is the full <fieldType> definition (just the one addition of the leading <charFilter>):

  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(\w)\1{2,}+" replaceWith="$1$1"/>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

It seems like my regex and replacement strategy should work. To prove it, I wrote a little Regex.java class that borrows some code from the PatternReplaceCharFilter class. When I run the following with my little hack, I get the expected result:

  [~/dev]$ java Regex "(\\w)\\1{2,}+" fooobarrrr "\$1\$1"
  result: foobarr

Is this a known issue, or does anyone know how to work around it? If not, I'll open a JIRA, but I wanted to check here first.

Cheers,
Tim

>>>> Regex.java <<<<

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class Regex {

  public static void main(String[] args) throws Exception {
    String toCompile = args[0];
    Pattern p = Pattern.compile(toCompile);
    System.out.println("result: " + processPattern(p, args[1], args[2]));
  }

  // borrowed from PatternReplaceCharFilter.java
  private static CharSequence processPattern(Pattern pattern, CharSequence input, String replacement) {
    final Matcher m = pattern.matcher(input);

    final StringBuffer cumulativeOutput = new StringBuffer();
    int cumulative = 0;
    int lastMatchEnd = 0;
    while (m.find()) {
      final int groupSize = m.end() - m.start();
      final int skippedSize = m.start() - lastMatchEnd;
      lastMatchEnd = m.end();

      final int lengthBeforeReplacement = cumulativeOutput.length() + skippedSize;
      m.appendReplacement(cumulativeOutput, replacement);
      final int replacementSize = cumulativeOutput.length() - lengthBeforeReplacement;

      if (groupSize != replacementSize) {
        if (replacementSize < groupSize) {
          cumulative += groupSize - replacementSize;
          int atIndex = lengthBeforeReplacement + replacementSize;
          //System.err.println(atIndex + "!" + cumulative);
          //addOffCorrectMap(atIndex, cumulative);
        }
      }
    }
    m.appendTail(cumulativeOutput);
    return cumulativeOutput;
  }
}
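
As a shorter cross-check of the regex itself, here is a minimal sketch that runs the same pattern and replacement through plain String.replaceAll, with no code borrowed from Solr. The class and method names (CollapseRepeats, collapse) are made up for illustration. It only confirms what java.util.regex does with this pattern; it does not reproduce whatever the charFilter is doing in the analysis form.

```java
class CollapseRepeats {

  // Same pattern as in the charFilter: one word character followed by
  // two or more repeats of it (possessive quantifier).
  private static final String PATTERN = "(\\w)\\1{2,}+";

  // Same replacement as in the charFilter: keep exactly two copies.
  private static final String REPLACEMENT = "$1$1";

  // Hypothetical helper: collapses every run of 3+ identical word
  // characters down to two.
  static String collapse(String input) {
    return input.replaceAll(PATTERN, REPLACEMENT);
  }

  public static void main(String[] args) {
    System.out.println(collapse("fooobarrrr")); // prints "foobarr"
  }
}
```

Runs of exactly two characters are left alone, since the pattern needs at least three in a row to match.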