rmuir commented on PR #12172:
URL: https://github.com/apache/lucene/pull/12172#issuecomment-1448749282

   Note: if we fix it here, maybe the stemmer should deal with this case too?
https://github.com/snowballstem/snowball/blob/master/algorithms/romanian.sbl#L26-L27
   
   Alternatively, a token filter could be added that "normalizes/folds" these and
runs before the stop filter and the stem filter to take care of it. It would have
the advantage of giving the user a choice (they can just create a CustomAnalyzer
and remove the normalization if they don't want that folding). But it would be
overkill if we should really just fix the stopwords and the stemmer. Sorry, I'm
not knowledgeable about Romanian, so I don't know whether treating them "the
same" causes problems.
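A minimal sketch of such a folding step, in plain Java with no Lucene dependency. It assumes (this is not stated above) that "these" are the Romanian letters s/t with cedilla (U+015F, U+0163 and their capitals) versus with comma below (U+0219, U+021B and their capitals), and it arbitrarily folds toward the comma-below forms; the real target form would have to match whatever the stopword list and the stemmer expect. The class and method names are hypothetical, not existing Lucene API.

```java
import java.util.Set;

/**
 * Hypothetical sketch (not Lucene API): folds the cedilla forms of Romanian
 * s/t to the comma-below forms, so both spellings reach the stopword check
 * and the stemmer as the same term.
 */
public final class RomanianDiacriticFolder {
    private RomanianDiacriticFolder() {}

    /** Maps one cedilla letter to its comma-below counterpart; other chars pass through. */
    public static char fold(char c) {
        switch (c) {
            case '\u015F': return '\u0219'; // s-cedilla (U+015F) -> s-comma-below (U+0219)
            case '\u015E': return '\u0218'; // S-cedilla (U+015E) -> S-comma-below (U+0218)
            case '\u0163': return '\u021B'; // t-cedilla (U+0163) -> t-comma-below (U+021B)
            case '\u0162': return '\u021A'; // T-cedilla (U+0162) -> T-comma-below (U+021A)
            default: return c;
        }
    }

    /**
     * Folds the first {@code length} chars of {@code buffer} in place, similar to
     * how a token filter edits a term buffer. Returns true if anything changed.
     */
    public static boolean foldInPlace(char[] buffer, int length) {
        boolean changed = false;
        for (int i = 0; i < length; i++) {
            char folded = fold(buffer[i]);
            if (folded != buffer[i]) {
                buffer[i] = folded;
                changed = true;
            }
        }
        return changed;
    }

    /** Convenience wrapper that folds a whole string. */
    public static String fold(String term) {
        char[] buf = term.toCharArray();
        foldInPlace(buf, buf.length);
        return new String(buf);
    }

    /** Folding runs before the stopword lookup, so either spelling matches. */
    public static boolean isStopword(String term, Set<String> stopwords) {
        return stopwords.contains(fold(term));
    }

    public static void main(String[] args) {
        // Hypothetical stopword list stored with comma-below forms only.
        Set<String> stopwords = Set.of("a\u0219a");
        System.out.println(isStopword("a\u015Fa", stopwords)); // prints true (cedilla input)
        System.out.println(isStopword("a\u0219a", stopwords)); // prints true (comma-below input)
    }
}
```

In an actual Lucene filter, incrementToken() could call foldInPlace(termAtt.buffer(), termAtt.length()) on its CharTermAttribute; because the mapping is one char to one char, the term length never changes. Whichever direction is chosen, the stopword file and the stemmer's suffix tables would need to use the same folded form.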


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-h...@lucene.apache.org
