Re: Strip out punctuation at the end of token

Shawn Heisey Thu, 23 Nov 2017 07:22:48 -0800

On 11/23/2017 8:06 AM, marotosg wrote:

I am trying to strip out any "." at the end of a token, but I would like to
keep the original token as well.
This is my index analyzer:
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1" catenateWords="1"
          catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
          preserveOriginal="1"/>
  <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>


I was thinking of using solr.PatternReplaceFilterFactory, but I see that it
won't keep the original token.


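The WordDelimiterFilterFactory that you have configured will do that. With preserveOriginal="1", it emits the original token alongside the version with the punctuation stripped.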

Here I have taken your analysis chain, added it to a test install of Solr, and tried it out. It appears to be doing exactly what you want it to do.


https://www.dropbox.com/s/5puf7rzbypdcspu/wdf-analysis-marotosg.png?dl=0
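
If you want to repeat the test yourself, the chain can be wrapped in a field type and checked on the Analysis screen of the admin UI. This is only a minimal sketch: the fieldType name "text_wdf_test" and the positionIncrementGap value are my assumptions, not something from your schema, and the filters are copied unchanged from your message.

```xml
<!-- Hypothetical field type for testing only; the name is an assumption. -->
<fieldType name="text_wdf_test" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Split on whitespace only, so trailing punctuation stays on the token. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- preserveOriginal="1" keeps the original token next to the split parts. -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
            preserveOriginal="1"/>
    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a field type like this selected on the Analysis screen, an index-time value such as "example." should show both "example." and "example" after the WordDelimiterFilterFactory step.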

Thanks,
Shawn
