rmuir commented on issue #14235: URL: https://github.com/apache/lucene/issues/14235#issuecomment-2657956964
This one looks to me like another dictionary bug. Unfortunately, the current options we have to "tolerate" such bugs don't work in this case, but perhaps they can be improved.

The affix file in question looks like this:

```
SFX ô Y 138   # this indicates that 138 rules should follow
... 137 rules follow ...
SFX õ Y 29    # this indicates a new header, and that 29 rules follow
```

The parser is mad because it expects one more rule for ô (LATIN SMALL LETTER O WITH CIRCUMFLEX), but instead it receives a header for õ (LATIN SMALL LETTER O WITH TILDE).

I will try to make a standalone reproducer test; the current one takes 5 minutes to run and that's not fun. Then I'll see if the logic can be adjusted to handle it.

Separately, we can also work with the upstream maintainers of this dictionary to fix the bug, because their dictionary may not be working as intended with other parsers (e.g. the hunspell C code).
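To make the mismatch described above concrete, here is a minimal standalone sketch of a check that reports affix blocks whose header announces more rules than actually follow. It is not Lucene's or Hunspell's parser: the class `AffixHeaderCheck` and the method `findMismatches` are hypothetical names. It assumes whitespace-separated fields and treats a line with a different flag, arriving while rules are still expected, as the next header.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical standalone checker; not part of Lucene or Hunspell. */
class AffixHeaderCheck {

  /** Returns one message per affix block that has fewer rules than its header declares. */
  static List<String> findMismatches(List<String> lines) {
    List<String> problems = new ArrayList<>();
    String kind = null; // "SFX" or "PFX" of the block being read
    String flag = null; // flag of the block being read
    int declared = 0;   // rule count announced by the header
    int seen = 0;       // rules read so far in this block

    for (String line : lines) {
      String[] parts = line.trim().split("\\s+");
      boolean affixLine =
          parts.length >= 4 && (parts[0].equals("SFX") || parts[0].equals("PFX"));
      if (!affixLine) {
        continue; // blank lines, comments, and other directives are ignored
      }
      boolean inBlock = kind != null && seen < declared;
      if (inBlock && parts[0].equals(kind) && parts[1].equals(flag)) {
        seen++; // one more rule for the current header
        continue;
      }
      if (inBlock) {
        // A different flag showed up before the declared count was reached.
        problems.add(describe(kind, flag, declared, seen));
      }
      int count;
      try {
        count = Integer.parseInt(parts[3]); // header form: SFX <flag> <Y|N> <count>
      } catch (NumberFormatException e) {
        problems.add("unexpected rule outside any block: " + line.trim());
        continue;
      }
      kind = parts[0];
      flag = parts[1];
      declared = count;
      seen = 0;
    }
    if (kind != null && seen < declared) {
      problems.add(describe(kind, flag, declared, seen)); // file ended mid-block
    }
    return problems;
  }

  private static String describe(String kind, String flag, int declared, int seen) {
    return kind + " " + flag + ": header declares " + declared + " rules, found " + seen;
  }
}
```

A check along these lines could be run over a dictionary's `.aff` file before reporting the problem upstream, since it names the flag and both counts.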