Dear Solr users,

My data looks like this:

j]s(dh)fjk [hf]sjkadh asdj(kfh) [skdjfh aslkfjhalwe uigfrhj bsd bsdfga sjfg asdlfj.

If I query for the first "word", all of the following queries must match:

j]s(dh)fjk
j]sdhfjk
jsdhfjk
dhf

So the matching should ignore certain characters, namely ( ) [ ], and should also match substrings.
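The intended matching rule can be stated as a small plain-Python check. This is a sketch of the requirement only, not of what Solr does internally. It assumes that "match" means the query, with ( ) [ ] removed and lowercased, occurs as a substring of the indexed word normalized the same way. The function names `normalize` and `matches` are illustrative.

```python
import re

# The characters that should be ignored: ( ) [ ]
IGNORED = re.compile(r"[\[\]()]")


def normalize(s: str) -> str:
    """Drop the ignored characters and lowercase the rest."""
    return IGNORED.sub("", s).lower()


def matches(indexed_word: str, query: str) -> bool:
    """True if the normalized query occurs anywhere in the normalized word."""
    return normalize(query) in normalize(indexed_word)


word = "j]s(dh)fjk"
for q in ["j]s(dh)fjk", "j]sdhfjk", "jsdhfjk", "dhf"]:
    print(q, matches(word, q))
```

The schema approximates this substring behaviour by indexing 2-character n-grams (minGramSize="2" maxGramSize="2") rather than performing a literal substring search.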

So far I have the following field definition in schema.xml:

    <fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="[\[\]\(\)]" replacement="" replace="all"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="2"/>
      </analyzer>
      <analyzer type="query">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="[\[\]\(\)]" replacement="" replace="all"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="2"/>
      </analyzer>
    </fieldType>


With this definition, matching works as planned, but highlighting does not: the special characters seem to shift the <em> tags to the wrong positions. For example, searching for "jsdhfjk" leaves the last 3 letters of the word unhighlighted, the same number as the 3 special characters removed by PatternReplaceFilterFactory:

<em>j]s(dh)</em>fjk
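The symptom can be reproduced with a small Python simulation of the index-time chain. This is plain Python, not Solr code, and it rests on an assumption not verified against a specific Lucene release: that the n-gram filter computes each gram's offsets as the token's start offset plus the gram's position in the already filtered term. Since the pattern filter changes the term text but leaves the token offsets alone, the gram offsets stop lining up with the original text once characters are removed. Emitting a single <em> span per token, from the first to the last matching gram, is a simplification of what a real highlighter does.

```python
import re

# Same character class as the PatternReplaceFilterFactory in the schema.
BRACKETS = re.compile(r"[\[\]()]")


def whitespace_tokens(text):
    """Yield (term, start, end) the way a whitespace tokenizer would."""
    for m in re.finditer(r"\S+", text):
        yield m.group(), m.start(), m.end()


def analyze(term):
    """Pattern replace, then lowercase; token offsets are NOT adjusted."""
    return BRACKETS.sub("", term).lower()


def bigrams(term, tok_start):
    """Yield (gram, start, end) for 2-character grams.

    ASSUMPTION: offset = token start + index into the *filtered* term,
    so offsets fall short once characters were removed from the term.
    """
    for i in range(len(term) - 1):
        yield term[i:i + 2], tok_start + i, tok_start + i + 2


def highlight(text, query):
    wanted = {g for g, _, _ in bigrams(analyze(query), 0)}
    out, last = [], 0
    for term, start, _end in whitespace_tokens(text):
        hits = [(s, e) for g, s, e in bigrams(analyze(term), start) if g in wanted]
        if not hits:
            continue
        # Simplification: one <em> span from the first to the last hit.
        s, e = min(h[0] for h in hits), max(h[1] for h in hits)
        out.append(text[last:s] + "<em>" + text[s:e] + "</em>")
        last = e
    out.append(text[last:])
    return "".join(out)


print(highlight("j]s(dh)fjk", "jsdhfjk"))  # <em>j]s(dh)</em>fjk
```

Under this assumption, the highlighted span ends at offset 7 (the length of the filtered term "jsdhfjk") instead of 10 (the length of the original word), which matches the three missing letters.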

Solr has so many bells and whistles; what do I need to do to get highlighting to work correctly?

Kind regards,
Stefan


