donnerpeter commented on pull request #2459:
URL: https://github.com/apache/lucene-solr/pull/2459#issuecomment-792642905


   > For Greek, if you analyze the character distribution of the dictionary
   > (I use https://scripts.sil.org/UnicodeCharacterCount), you can see that the
   > smallest character in the whole dictionary is `0x386` (decimal 902) and the
   > largest is `0x3CE` (decimal 974). So, even simpler, you could exploit that
   > and encode a single `int base = 0x386; // smallest char in use` for this
   > whole dictionary, and all characters would be single-byte encoded.
   
   @rmuir This reduces memory usage a bit, but not as much as I'd hoped
(11.1 -> 9.4 MB for Greek, 463 -> 458 MB total). Since it also complicates the
code, I'd leave it out, at least for now.
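
For readers following along, the base-offset idea quoted above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from this pull request: the class `BaseOffsetCodec` and its `encode`/`decode` methods are hypothetical names, and the sketch assumes every character of a dictionary lies within 255 code units of the chosen base.

```java
// Hypothetical helper, not part of Lucene: stores each char as one byte
// holding its distance from a per-dictionary base character.
final class BaseOffsetCodec {
  private BaseOffsetCodec() {}

  /** Encodes each char as (char - base); rejects chars outside [base, base + 255]. */
  static byte[] encode(String word, int base) {
    byte[] out = new byte[word.length()];
    for (int i = 0; i < word.length(); i++) {
      int delta = word.charAt(i) - base;
      if (delta < 0 || delta > 0xFF) {
        throw new IllegalArgumentException(
            "char at index " + i + " does not fit in one byte relative to base");
      }
      out[i] = (byte) delta;
    }
    return out;
  }

  /** Reverses encode(): reads each byte as unsigned and adds the base back. */
  static String decode(byte[] bytes, int base) {
    char[] chars = new char[bytes.length];
    for (int i = 0; i < bytes.length; i++) {
      chars[i] = (char) (base + (bytes[i] & 0xFF));
    }
    return new String(chars);
  }
}
```

With `base = 0x386`, a four-letter Greek word such as `"\u03ac\u03bb\u03c6\u03b1"` encodes to four bytes, compared with eight bytes as Java `char`s or as UTF-8. The range check is what makes the scheme dictionary-specific: a dictionary that mixes scripts would need a fallback encoding.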




