[GitHub] [lucene-solr] dweiss commented on pull request #2277: LUCENE-9716: Hunspell: support flag usage before its format is even specified

GitBox Thu, 04 Feb 2021 09:26:54 -0800


dweiss commented on pull request #2277:
URL: https://github.com/apache/lucene-solr/pull/2277#issuecomment-773476961



   Hi Peter. The file size itself doesn't matter if we can assume that these 
flags have to occur within some kind of leading buffer which we can rewind. 
Implementing this is technically easy: for example, wrap the input in a 
BufferedInputStream with a reasonably large internal buffer and set a mark on 
the zeroth byte. Once you reach your flags, you reset the stream. 
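
   A minimal sketch of the mark/reset idea, not the actual patch. The class 
name `FlagSniffer`, the 64 KB limit and the `sniffFlagType` helper are all 
assumptions made for illustration; only `BufferedInputStream.mark`/`reset` and 
the Hunspell `FLAG` keyword come from existing APIs and formats.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

final class FlagSniffer {
  // Hypothetical cap on how far into the .aff file the FLAG line may appear.
  static final int MAX_PROLOGUE_BYTES = 64 * 1024;

  /**
   * Reads up to MAX_PROLOGUE_BYTES, looks for a "FLAG <type>" line, then
   * rewinds the stream to where it started. Returns null if no FLAG line
   * is found within the prologue.
   */
  static String sniffFlagType(BufferedInputStream in) throws IOException {
    in.mark(MAX_PROLOGUE_BYTES);                     // remember the current position
    byte[] head = in.readNBytes(MAX_PROLOGUE_BYTES); // Java 11+
    in.reset();                                      // rewind to the marked position

    // ISO-8859-1 maps every byte to exactly one char, so this decode cannot
    // fail; that is enough to spot an ASCII keyword such as FLAG.
    String text = new String(head, StandardCharsets.ISO_8859_1);
    for (String line : text.split("\\R")) {
      if (line.startsWith("FLAG ")) {
        return line.substring(5).trim();
      }
    }
    return null;
  }
}
```

   After the call returns, the caller can parse the same stream from the 
first byte again, this time knowing the flag format up front.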
   
   The only problem I see here is a minor risk that the buffer limit falls in 
the middle of a multi-byte UTF-8 sequence, for example, which could trigger 
some kind of decoding exception... but this can be worked around.
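
   One possible workaround, sketched under assumptions (the `PrefixDecoder` 
class and `decodePrefix` helper are hypothetical names): decode the prefix 
with a `CharsetDecoder` and pass `endOfInput = false`, so a multi-byte 
sequence cut off by the buffer limit is left unconsumed instead of being 
reported as malformed.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class PrefixDecoder {
  /**
   * Decodes the first len bytes as UTF-8. A trailing, incomplete multi-byte
   * sequence is silently dropped; genuinely malformed input still throws.
   */
  static String decodePrefix(byte[] bytes, int len) throws CharacterCodingException {
    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    ByteBuffer in = ByteBuffer.wrap(bytes, 0, len);
    // UTF-8 never decodes to more chars than it has bytes.
    CharBuffer out = CharBuffer.allocate(len);

    // endOfInput = false: "more bytes may follow", so a truncated sequence
    // at the end yields UNDERFLOW rather than a malformed-input error.
    CoderResult result = decoder.decode(in, out, false);
    if (result.isError()) {
      result.throwException();
    }
    out.flip();
    return out.toString();
  }
}
```

   Since the prefix is only scanned for flags and then discarded on reset, 
dropping a partial trailing character should be harmless in this scheme.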
   
   I'll try to do this, time permitting. It's not much of a problem - it can be 
done later too. I have to limit my time for Lucene to reasonable chunks though. 
:)
   
   > for Kinyarwanda it's 38MB.
   
   Is this one of the OpenOffice dictionaries? Once they all parse 
successfully, they'd make a good baseline test.

