Hi,

I have the following string to index:

    abc - def

(note the space on either side of the hyphen). It is processed by this filter chain:

<fieldType name="shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="org.apache.lucene.analysis.icu.ICUNormalizer2CharFilterFactory" name="nfkc" mode="compose"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            preserveOriginal="0" splitOnCaseChange="1" splitOnNumerics="1"
            stemEnglishPossessive="0"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="false" fillerToken=""/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="10000" consumeAllTokens="false"/>
    <filter class="solr.LengthFilterFactory" min="1" max="255"/>
  </analyzer>
</fieldType>

At the end of the chain I get two shingle tokens:

    "abc" "def"

What I want is the single shingle "abc def". What can I tweak to get this?

Thanks,
Nawab