[GitHub] [lucene] rmuir commented on pull request #12172: Add Romanian stopwords with s&t with comma

via GitHub Tue, 28 Feb 2023 11:36:02 -0800


rmuir commented on PR #12172:
URL: https://github.com/apache/lucene/pull/12172#issuecomment-1448749282


   Note: if we fix it here, maybe the stemmer should deal with this case too? 
https://github.com/snowballstem/snowball/blob/master/algorithms/romanian.sbl#L26-L27
   
   Alternatively, a TokenFilter could be added that "normalizes/folds" these and 
runs before the StopFilter and the stem filter to take care of it. It would have 
the advantage of giving users a choice (they can create a CustomAnalyzer and 
leave out the normalization if they don't want that folding). But it would be 
overkill if we should really just fix the stopwords and the stemmer. Sorry, I'm 
not knowledgeable about Romanian, so I don't know if it causes problems to treat 
them "the same". 
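
   For illustration only, the folding step could look roughly like the sketch 
below. It is plain Java with no Lucene dependency, and the class name 
RomanianFold and method fold are hypothetical. It assumes the fold goes from the 
cedilla forms (U+015E, U+015F, U+0162, U+0163) to the comma-below forms (U+0218, 
U+0219, U+021A, U+021B); the opposite direction would work equally well as long 
as the stopword list and the stemmer agree with it.

```java
public final class RomanianFold {
    private RomanianFold() {}

    /**
     * Hypothetical helper: replaces s/t with cedilla by s/t with comma below.
     * All other characters are copied through unchanged.
     */
    public static String fold(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '\u015E': out.append('\u0218'); break; // capital S: cedilla to comma below
                case '\u015F': out.append('\u0219'); break; // small s: cedilla to comma below
                case '\u0162': out.append('\u021A'); break; // capital T: cedilla to comma below
                case '\u0163': out.append('\u021B'); break; // small t: cedilla to comma below
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "si" (and) written with the cedilla s folds to the comma-below spelling.
        String folded = fold("\u015Fi");
        System.out.println(folded.equals("\u0219i")); // prints "true"
    }
}
```

   In an analyzer, the same per-character mapping would sit in a filter placed 
ahead of the stop filter and the stemmer, so both see a single spelling.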


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


