rmuir commented on pull request #2459: URL: https://github.com/apache/lucene-solr/pull/2459#issuecomment-791585657
For Greek, if you analyze the character distribution of the dictionary (I use https://scripts.sil.org/UnicodeCharacterCount), you can see that the smallest character in the whole dictionary is `0x386` (decimal 902) and the largest is `0x3CE` (decimal 974). So, even simpler, you could exploit that: store a single `int base = 0x386; // smallest char in use` for the whole dictionary and encode each character as its offset from `base`. The range covers only 73 code points, so every character would be single-byte encoded.
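
A minimal sketch of the base-offset idea, assuming the `0x386`..`0x3CE` range quoted above. The class `BaseOffsetCodec` and its `encode`/`decode` methods are hypothetical names used for illustration only; they are not part of Lucene or of this pull request.

```java
// Hypothetical illustration, not Lucene code: store each char as one byte
// holding its distance from the smallest char used in the dictionary.
final class BaseOffsetCodec {
    // Smallest char in use for the Greek dictionary (value from the comment above).
    static final int BASE = 0x386;

    // Encode each char as (char - BASE). Fails loudly if a char does not fit in one byte.
    static byte[] encode(String word) {
        byte[] out = new byte[word.length()];
        for (int i = 0; i < word.length(); i++) {
            int delta = word.charAt(i) - BASE;
            if (delta < 0 || delta > 0xFF) {
                throw new IllegalArgumentException(
                    "char outside single-byte range: U+" + Integer.toHexString(word.charAt(i)));
            }
            out[i] = (byte) delta;
        }
        return out;
    }

    // Reverse the mapping: mask to an unsigned byte, then add BASE back.
    static String decode(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append((char) (BASE + (b & 0xFF)));
        }
        return sb.toString();
    }
}
```

With this scheme a five-letter Greek word takes five bytes, whereas UTF-8 needs two bytes for each character in this range. The explicit range check is a design choice for the sketch: a real dictionary format would need to decide how to handle characters that fall outside the single-byte window.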