rmuir commented on pull request #2459:
URL: https://github.com/apache/lucene-solr/pull/2459#issuecomment-791585657


   For Greek, if you analyze the character distribution of the dictionary
(I use https://scripts.sil.org/UnicodeCharacterCount ), you can see that the
smallest character in the whole dictionary is `0x386` (decimal 902) and the
largest is `0x3CE` (decimal 974). So, even simpler, you could exploit that and
encode a single `int base = 0x386; // smallest char in use` for this whole
dictionary, and all characters would be single-byte encoded, since every
offset from the base is at most 72.
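
A minimal sketch of that idea, under assumptions: the class and method names (`BaseOffsetCodec`, `encode`, `decode`) are hypothetical and not taken from the PR, and the range check assumes the `0x386`..`0x3CE` bounds quoted above hold for every character in the dictionary. Each char is stored as its offset from the base, one byte per char.

```java
final class BaseOffsetCodec {
    // Bounds quoted in the comment above; assumed to cover the whole dictionary.
    static final int BASE = 0x386; // smallest char in use
    static final int MAX = 0x3CE;  // largest char in use

    // Hypothetical helper: stores each char as (char - BASE) in a single byte.
    static byte[] encode(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            int c = s.charAt(i);
            if (c < BASE || c > MAX) {
                throw new IllegalArgumentException(
                    "char out of range: 0x" + Integer.toHexString(c));
            }
            out[i] = (byte) (c - BASE); // offset is in 0..72, so it fits in one byte
        }
        return out;
    }

    // Hypothetical helper: inverse of encode, adds BASE back to each byte.
    static String decode(byte[] bytes) {
        char[] chars = new char[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            chars[i] = (char) (BASE + (bytes[i] & 0xFF));
        }
        return new String(chars);
    }
}
```

With this scheme a five-letter Greek word takes five bytes, versus ten bytes in UTF-8, where each of these characters needs two bytes.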

