rmuir commented on PR #12172: URL: https://github.com/apache/lucene/pull/12172#issuecomment-1448749282
Note: if we fix it here, maybe the stemmer should deal with this case too? https://github.com/snowballstem/snowball/blob/master/algorithms/romanian.sbl#L26-L27

Alternatively, a TokenFilter could be added that "normalizes/folds" these and runs before the stop filter and the stem filter to take care of it. That would have the advantage of giving users a choice: they can just create a CustomAnalyzer and leave out the normalization if they don't want that folding. But it would be overkill if we should really just fix the stopwords and the stemmer.

Sorry, I'm not knowledgeable about Romanian, so I don't know whether treating them "the same" causes problems.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
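
To make the "fold first, then stop and stem" ordering from the comment concrete, here is a rough, Lucene-free sketch in plain Java. It rests on several assumptions that the comment does not state: that the characters in question are the cedilla and comma-below variants of s and t, that folding goes from the cedilla forms to the comma-below forms (the opposite direction would work equally well), and that the stop list is stored in comma-below form. The class name `RomanianFoldSketch`, its methods, and the one-word stop list are hypothetical and are not part of Lucene or Snowball.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not Lucene's API: fold variant characters before stop-word removal.
class RomanianFoldSketch {

    // Assumed folding direction: cedilla forms (U+015E, U+015F, U+0162, U+0163)
    // are rewritten to the comma-below forms (U+0218, U+0219, U+021A, U+021B).
    static String fold(String token) {
        StringBuilder sb = new StringBuilder(token.length());
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            switch (c) {
                case '\u015E': sb.append('\u0218'); break; // capital S
                case '\u015F': sb.append('\u0219'); break; // small s
                case '\u0162': sb.append('\u021A'); break; // capital T
                case '\u0163': sb.append('\u021B'); break; // small t
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Folding runs first, so a stop list written with one variant also matches the other.
    static List<String> foldThenStop(List<String> tokens, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            String folded = fold(token);
            if (!stopWords.contains(folded)) {
                kept.add(folded);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical one-word stop list, stored in comma-below form.
        Set<String> stopWords = Set.of("\u0219i");
        // The first input token uses the cedilla form of the same word.
        List<String> tokens = List.of("\u015Fi", "carte");
        System.out.println(foldThenStop(tokens, stopWords));
    }
}
```

In an actual analyzer the same ordering would apply: the folding step would sit ahead of the stop and stem filters in the chain, and a user who does not want the folding could build a custom chain without it. Whether folding is linguistically safe for Romanian is the open question raised in the comment, and this sketch does not settle it.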