Dear Solr users,

my data looks like this:

    j]s(dh)fjk [hf]sjkadh asdj(kfh) [skdjfh aslkfjhalwe uigfrhj bsd bsdfga sjfg asdlfj.

If I want to query for the first "word", the following queries must match:

    j]s(dh)fjk
    j]s(dh)fjk
    j]sdhfjk
    jsdhfjk
    dhf

So the matching should ignore some characters like ( ) [ ] and should match substrings.

So far I have the following field definition in the schema.xml:

    <fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="[\[\]\(\)]" replacement="" replace="all" />
        <filter class="solr.LowerCaseFilterFactory"/>
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="2" />
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="[\[\]\(\)]" replacement="" replace="all" />
        <filter class="solr.LowerCaseFilterFactory"/>
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="2" />
      </analyzer>
    </fieldType>

With this definition the matching works as planned, but the highlighting does not: the special characters seem to shift the <em> tags to the wrong positions. For example, searching for "jsdhfjk" misses the last 3 letters of the word (= the 3 special characters removed by PatternReplaceFilterFactory):

    <em>j]s(dh)</em>fjk

Solr has so many bells and whistles. What must I do to get correctly working highlighting?

Kind regards,
Stefan
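
P.S. To make the symptom concrete, here is a small Python sketch of what I suspect is going on. It is only my guess at the mechanism and not Solr's actual code: the function name naive_highlight is made up, and I assume the match offsets are computed on the token after the special characters were removed and then applied unchanged to the original text.

```python
import re

# Same character class as in the PatternReplaceFilterFactory pattern above.
STRIP = re.compile(r"[\[\]()]")


def naive_highlight(original: str, query: str) -> str:
    """Hypothetical highlighter: finds the match in the stripped text,
    then applies those offsets to the original text without correcting
    for the characters that were removed."""
    stripped = STRIP.sub("", original).lower()
    needle = STRIP.sub("", query).lower()
    start = stripped.find(needle)
    if start < 0:
        return original  # no match, nothing to highlight
    end = start + len(needle)
    # Offsets refer to the stripped text, but are used on the original.
    return original[:start] + "<em>" + original[start:end] + "</em>" + original[end:]


print(naive_highlight("j]s(dh)fjk", "jsdhfjk"))  # <em>j]s(dh)</em>fjk
```

This reproduces exactly the wrong output I get from Solr, so I assume the offsets would have to be mapped back to the original text somewhere.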