Hi,

I have a string that I am indexing:

abc - def (note the space on either side of the hyphen)

It is processed by this analyzer chain:


    <fieldType name="shingle" class="solr.TextField"
               positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="org.apache.lucene.analysis.icu.ICUNormalizer2CharFilterFactory"
                    name="nfkc" mode="compose"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterGraphFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="0"
                catenateNumbers="0" catenateAll="0" preserveOriginal="0"
                splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="0"/>
        <filter class="solr.FlattenGraphFilterFactory"/>
        <filter class="solr.PatternReplaceFilterFactory"
                pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
                outputUnigrams="false" fillerToken=""/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.LimitTokenCountFilterFactory"
                maxTokenCount="10000" consumeAllTokens="false"/>
        <filter class="solr.LengthFilterFactory" min="1" max="255"/>
      </analyzer>


I get two shingle tokens at the end:

"abc" "def"

I want to get a single shingle, "abc def", instead. What can I change in this chain to get that?
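
To make the goal concrete, here is a toy Python model of bigram shingling. It is not Lucene's implementation: the `shingles` helper is hypothetical, and the idea that the dropped hyphen leaves a position gap that gets filled with the empty fillerToken is only my assumption about what is happening.

```python
def shingles(tokens, size=2, sep=" "):
    """Join each run of `size` adjacent tokens into one shingle (toy model)."""
    return [sep.join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]

# Desired: "abc" and "def" are treated as adjacent tokens.
wanted = shingles(["abc", "def"])

# Assumed cause: the removed "-" leaves a gap, modelled here as an empty filler.
with_gap = shingles(["abc", "", "def"])

print(wanted)
print(with_gap)
```

If that assumption holds, what I need is for the hyphen to disappear without leaving a gap between "abc" and "def".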


Thanks
Nawab
