Re: Strip out punctuation at the end of token

marotosg Fri, 24 Nov 2017 01:32:37 -0800

Hi Shawn,
Thanks for your reply. Actually, my issue is with the last token of a string: it keeps the dot.

In your example, "Testing. This is a test. Test.", the last token is kept as "Test.".

Is there a reason for that behaviour that I'm not seeing?

Thanks,
Sergio

Shawn Heisey-2 wrote
> On 11/23/2017 8:06 AM, marotosg wrote:
>> I am trying to strip out any "." at the end of a token but I would like to
>> keep the original token as well.
>> This is my index analyzer:
>>
>> <analyzer type="index">
>>   <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>   <filter class="solr.WordDelimiterFilterFactory"
>>           generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>           catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
>>           preserveOriginal="1"/>
>>   <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false"/>
>>   <filter class="solr.LowerCaseFilterFactory"/>
>> </analyzer>
>>
>> I was thinking of using solr.PatternReplaceFilterFactory, but I see that
>> it won't keep the original token.
> 
> The WordDelimiterFilterFactory that you have configured will do that.
> 
> Here I have taken your analysis chain, added it to a test install of 
> Solr, and tried it out.  It appears to be doing exactly what you want it 
> to do.
> 
> https://www.dropbox.com/s/5puf7rzbypdcspu/wdf-analysis-marotosg.png?dl=0
> 
> Thanks,
> Shawn
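To make the expected token stream easier to reason about, here is a rough Python model of the index chain quoted above. It is a simplified sketch, not Lucene's WordDelimiterFilter: the function names are hypothetical, and it assumes the filter splits only on runs of non-alphanumeric characters and also emits the unmodified token when preserveOriginal="1" and splitting changed it. splitOnCaseChange, the catenate* options, number handling, and token positions are ignored, and ASCIIFoldingFilter is skipped because the sample input is plain ASCII. The pattern_replace helper mimics the in-place rewrite that solr.PatternReplaceFilterFactory would do with a pattern such as \.$ and an empty replacement.

```python
import re

def whitespace_tokenize(text: str) -> list[str]:
    # Like WhitespaceTokenizer: split on whitespace only, so a trailing "."
    # stays attached to its token ("Test." stays "Test.").
    return text.split()

def word_delimiter(token: str, preserve_original: bool = True) -> list[str]:
    # Simplified stand-in for WordDelimiterFilter (an assumption, not Lucene code):
    # split on runs of non-alphanumeric characters and drop empty pieces.
    parts = [p for p in re.split(r"[^0-9A-Za-z]+", token) if p]
    out = []
    # preserveOriginal="1": also emit the unmodified token when splitting changed it.
    if preserve_original and parts != [token]:
        out.append(token)
    out.extend(parts)
    return out

def pattern_replace(token: str, pattern: str = r"\.$", replacement: str = "") -> str:
    # Mimics PatternReplaceFilterFactory: the token is rewritten in place,
    # so only the modified form survives and the original is lost.
    return re.sub(pattern, replacement, token)

def analyze(text: str, preserve_original: bool = True) -> list[str]:
    tokens = []
    for tok in whitespace_tokenize(text):
        tokens.extend(word_delimiter(tok, preserve_original))
    # ASCIIFoldingFilter is skipped (plain-ASCII input); LowerCaseFilter runs last.
    return [t.lower() for t in tokens]

sample = "Testing. This is a test. Test."
print(analyze(sample))
# ['testing.', 'testing', 'this', 'is', 'a', 'test.', 'test', 'test.', 'test']
print([pattern_replace(t).lower() for t in whitespace_tokenize(sample)])
# ['testing', 'this', 'is', 'a', 'test', 'test']
```

Under these assumptions, every token ending in "." (not only the last one) yields both a dotted original and a stripped copy, while the pattern-replace variant keeps only the stripped form. The analysis screen in the Solr Admin UI remains the reliable way to confirm what the real filters emit.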

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
