donnerpeter commented on pull request #2459: URL: https://github.com/apache/lucene-solr/pull/2459#issuecomment-792642905
> For Greek, if you analyze the character distribution of the dictionary (I use https://scripts.sil.org/UnicodeCharacterCount), you can see that the smallest character in the whole dictionary is `0x386` (decimal 902) and the largest is `0x3CE` (decimal 974). So, even simpler, you could exploit that and encode a single `int base = 0x386; // smallest char in use` for this whole dictionary, and all characters would be single-byte encoded.

@rmuir This brings memory usage down a bit, but not as much as I'd hoped (11.1 -> 9.4 MB for Greek, 463 -> 458 MB total). Since it also complicates the code, I'd leave this out, at least for now.
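
For readers following along, here is a minimal sketch of the base-offset idea from the quoted suggestion. It is not the code from this pull request: the class and method names are hypothetical, and the only values taken from the thread are the bounds `0x386` and `0x3CE`.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of Lucene or of this PR.
class BaseOffsetEncoding {
    // Smallest and largest characters reported for the Greek dictionary in the quoted comment.
    static final int BASE = 0x386;
    static final int MAX = 0x3CE;

    /** Stores each char as (c - BASE) in a single byte; rejects chars outside [BASE, MAX]. */
    static byte[] encode(String word) {
        byte[] out = new byte[word.length()];
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (c < BASE || c > MAX) {
                throw new IllegalArgumentException("char out of range: U+" + Integer.toHexString(c));
            }
            out[i] = (byte) (c - BASE); // MAX - BASE = 0x48, so the offset always fits in one byte
        }
        return out;
    }

    /** Reverses encode() by adding BASE back to each unsigned byte value. */
    static String decode(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append((char) (BASE + (b & 0xFF)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String word = "\u03bb\u03cc\u03b3\u03bf\u03c2"; // a five-letter Greek word, written with escapes
        byte[] packed = encode(word);
        System.out.println("base-offset bytes: " + packed.length);
        System.out.println("UTF-8 bytes: " + word.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("round trip ok: " + decode(packed).equals(word));
    }
}
```

In this sketch each letter takes one byte, whereas UTF-8 needs two bytes for every character in this range. How much that saves in practice depends on how the dictionary data is actually stored, which may explain the modest numbers reported above.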