Trey314159 commented on PR #12172: URL: https://github.com/apache/lucene/pull/12172#issuecomment-1452625462
> After reading up on the history of these characters, I think we should treat them "the same" for Romanian always.

Yeah, I agree.

> I think a filter may be worthwhile as a separate PR?

Sounds good. A token filter called `romanian_normalization` or some such sounds good and in keeping with other analyzers.

> fwiw I'm subscribed that list and haven't seen a message in 10 years. I think they are just using github issues/PRs now?

It's definitely low volume. The archives show one message every month or two. If the moderator doesn't approve it by Monday I'll try something else. Please feel free to contact them through another channel if you can/want to.

> So I think the most ideal situation would be to both fix snowball and then map cedilla to "correct" forms with a TokenFilter?

Yeah. I guess if Snowball moves relatively quickly it would make sense to wait for them to make their change so as not to need to have a mapping going one way (comma to cedilla), and then switching it to be the opposite a short while later, though that _is_ what we're going to do locally since we aren't planning on upgrading particularly soon, while config changes are relatively easy.
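
For reference, the cedilla-to-comma mapping discussed above could look roughly like the sketch below. This is plain Java rather than an actual Lucene `TokenFilter`; the class name `RomanianCedillaFolding` and its methods are hypothetical and exist only to illustrate the four code points involved. The interim workaround mentioned above (comma to cedilla) would simply swap the two sides of each case.

```java
// Hypothetical helper, not part of Lucene: folds the legacy cedilla forms of
// Romanian s/t onto the comma-below forms so both spellings compare equal.
final class RomanianCedillaFolding {
    private RomanianCedillaFolding() {}

    // Maps one char; anything other than the four cedilla letters is returned unchanged.
    static char foldChar(char c) {
        switch (c) {
            case '\u015E': return '\u0218'; // S with cedilla -> S with comma below
            case '\u015F': return '\u0219'; // s with cedilla -> s with comma below
            case '\u0162': return '\u021A'; // T with cedilla -> T with comma below
            case '\u0163': return '\u021B'; // t with cedilla -> t with comma below
            default: return c;
        }
    }

    // Applies foldChar to every char of the input and returns the folded string.
    static String fold(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            out.append(foldChar(input.charAt(i)));
        }
        return out.toString();
    }
}
```

In a real filter the same switch would presumably run over the term buffer in place; since all eight characters are single UTF-16 code units, the term length would not change.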