mikemccand commented on PR #888:
URL: https://github.com/apache/lucene/pull/888#issuecomment-1792595250

   > @uschindler <https://github.com/uschindler> pushed 0 commits.
   
   Huh, how do you do that?
   
   Mike McCandless
   
   http://blog.mikemccandless.com
   
   
   On Fri, Nov 3, 2023 at 10:42 AM Uwe Schindler ***@***.***>
   wrote:
   
   > @mikemccand <https://github.com/mikemccand>: If you want to see the
   > changes I reverted, see the above comparison:
   > https://github.com/apache/lucene/compare/36de2bb7fa7a0587a102cf5c4d35ac8f94976bbd..c1b626c0636821f4d7c085895359489e7dfa330f
   >
   > Those changes need to be re-applied to the repo in the correct files (not
   > sure where this code lives now; it looks like BytesRefBlockPool, but I have
   > no idea, sorry).
   >
   > I think I know after looking into those changes what the problem was.
   > Internally BytesRefHash uses BIG ENDIAN, because some parts of the byte
   > array are "UTF-8 like" encoded (if the highest bit is set, another byte
   > follows). As this is stupid to do, and dropping it costs only a few bytes
   > more storage, I removed that encoding to always use shorts instead of
   > "byte or BE short". Once the encoding no longer needs to be "UTF-8 like",
   > it can use native order. But for safety you could also use LE encoding
   > to match current CPUs (ARM is also LE now).
   >
   > So we have 2 possibilities:
   >
   >    - Change the internal encoding of BytesRefHash: remove the Big Endian
   >    UTF-8 like encoding (or call it vShort) and switch to Little Endian
   >    shorts.
   >    - Use native encoding, which also works and also helps CPUs like s390.
   >    This PR supports this, but it is questionable for the reasons Robert
   >    said.
   >
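
To make the two length encodings in the quoted message concrete, here is a minimal Java sketch. It is not the Lucene source: the class name `LengthPrefixSketch` and all method names are hypothetical, and the layout is only an assumption based on the description above ("if highest bit is set another byte follows"). Variant A is the "byte or BE short" form. It needs big-endian order because the flag bit must land in the first byte in memory. Variant B is a fixed two-byte little-endian short, which needs no flag bit and so could use any byte order.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical illustration only; names and layout are assumptions, not Lucene's API.
final class LengthPrefixSketch {
  private static final VarHandle BE_SHORT =
      MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.BIG_ENDIAN);
  private static final VarHandle LE_SHORT =
      MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.LITTLE_ENDIAN);

  // Variant A: one byte for lengths below 128, otherwise a big-endian short
  // with the top bit set. Returns the number of bytes written.
  static int writeVarLength(byte[] buf, int pos, int length) {
    if (length < 0 || length > 0x7FFF) {
      throw new IllegalArgumentException("length out of range: " + length);
    }
    if (length < 128) {
      buf[pos] = (byte) length;
      return 1;
    }
    // Big endian puts the high byte, which carries the flag bit, first.
    BE_SHORT.set(buf, pos, (short) (length | 0x8000));
    return 2;
  }

  static int readVarLength(byte[] buf, int pos) {
    if ((buf[pos] & 0x80) == 0) {
      return buf[pos]; // flag bit clear: single-byte length
    }
    return ((short) BE_SHORT.get(buf, pos)) & 0x7FFF; // strip the flag bit
  }

  // Variant B: always two bytes, little endian, no flag bit.
  static int writeFixedLength(byte[] buf, int pos, int length) {
    if (length < 0 || length > 0xFFFF) {
      throw new IllegalArgumentException("length out of range: " + length);
    }
    LE_SHORT.set(buf, pos, (short) length);
    return 2;
  }

  static int readFixedLength(byte[] buf, int pos) {
    return ((short) LE_SHORT.get(buf, pos)) & 0xFFFF; // read as unsigned
  }

  public static void main(String[] args) {
    byte[] buf = new byte[2];
    int used = writeVarLength(buf, 0, 300);
    System.out.println("variable: " + used + " bytes, length " + readVarLength(buf, 0));
    used = writeFixedLength(buf, 0, 300);
    System.out.println("fixed:    " + used + " bytes, length " + readFixedLength(buf, 0));
  }
}
```

In this sketch the fixed form costs one extra byte for every entry shorter than 128 bytes. In exchange, the reader never has to inspect the first byte, so the byte order becomes a free choice.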
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


