rmuir commented on issue #14235:
URL: https://github.com/apache/lucene/issues/14235#issuecomment-2657956964

   This one looks to me like another dictionary bug. Unfortunately, the current 
options we have to "tolerate" such bugs don't work in this case, but perhaps 
they can be improved.
   
   The affix file in question looks like this:
   ```
   SFX ô Y 138 # this indicates that 138 rules should follow
   ... 137 rules follow ...
   SFX õ Y 29 # this indicates a new header, and that 29 rules follow
   ```
   
   The parser is mad because it expects one more rule for ô (LATIN SMALL LETTER 
O WITH CIRCUMFLEX), but instead it receives a new header for õ (LATIN SMALL 
LETTER O WITH TILDE).
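
   To illustrate the mismatch, here is a minimal sketch of a check that compares 
each header's declared rule count against the rule lines that actually follow. 
It is not Lucene's parser: the class `AffixHeaderCheck` and the method 
`findCountMismatches` are hypothetical names, and the header detection 
(third field `Y`/`N`, fourth field numeric) is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Lucene: compares declared and actual affix rule counts. */
final class AffixHeaderCheck {

  /** Returns one message per PFX/SFX group whose header count differs from the rules found. */
  static List<String> findCountMismatches(List<String> lines) {
    List<String> problems = new ArrayList<>();
    String flag = null; // flag of the group currently being read
    int declared = 0;   // rule count announced by that group's header
    int seen = 0;       // rule lines actually read for that group

    for (String raw : lines) {
      String line = raw.trim();
      if (line.isEmpty() || line.startsWith("#")) {
        continue; // skip blank lines and whole-line comments
      }
      String[] p = line.split("\\s+");
      if (p.length < 4 || !(p[0].equals("SFX") || p[0].equals("PFX"))) {
        continue; // not an affix line
      }
      // Assumption: a header looks like "SFX <flag> Y|N <count>".
      boolean looksLikeHeader =
          (p[2].equals("Y") || p[2].equals("N")) && p[3].matches("\\d+");
      // While rules for the current flag are still owed, a same-flag line is a rule.
      boolean groupOpen = flag != null && seen < declared;

      if (looksLikeHeader && !(groupOpen && p[1].equals(flag))) {
        if (flag != null && seen != declared) {
          problems.add(describe(flag, declared, seen));
        }
        flag = p[1];
        declared = Integer.parseInt(p[3]);
        seen = 0;
      } else {
        seen++;
      }
    }
    // The last group has no following header, so check it here.
    if (flag != null && seen != declared) {
      problems.add(describe(flag, declared, seen));
    }
    return problems;
  }

  private static String describe(String flag, int declared, int seen) {
    return String.format("%s: header declares %d rules, found %d", flag, declared, seen);
  }
}
```

   Passing the lines of an affix file to `AffixHeaderCheck.findCountMismatches` 
returns an empty list when every group is complete. For a file shaped like the 
one above, it would report the ô group as declaring 138 rules with only 137 
found.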
   
   I will try to make a standalone reproducer test, since this one takes 5 
minutes to run and that's not fun. Then I'll see if the logic can be adjusted 
to handle it.
   
   Separately, we can work with the upstream maintainers of this dictionary to 
fix the bug, because their dictionary may not be working as intended with other 
parsers (e.g. the hunspell C code).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

