mikemccand commented on issue #13519: URL: https://github.com/apache/lucene/issues/13519#issuecomment-2206798896
> > I know we are discussing how to make the conflated quantization and distance metric combinations math work out (i.e. how to fix the issue), but I'm just trying to get the big picture of what the issue even is. How would we explain the buggy behavior to users?
>
> The scalar quantizer currently assumes the destination `byte` values are unsigned. However, Java cannot have unsigned byte values that are `> 127`. Consequently, when trying to utilize the full number of bits in `int8` the scoring falls apart.

OK, thanks. Regardless of the distance metric, `int8` is buggy today. (Hmm, are there any special cases where `int8` is working? I see blog posts about using `int8` in Elasticsearch...) I would think that even if your vectors are, e.g., always non-negative in all values, `int8` quantization will still try (in a buggy way) to use the full 8 bits, and will still be buggy.

> Looking back, when I finished up the ScalarQuantization support for int4, I should have disallowed `8` bits as a parameter to prevent this confusion. I allowed it and I shouldn't have.

Well, since it doesn't work at all today, I think we can freely change/fix the behavior, e.g. in 9.12 or even a 9.11.2? We could simply remove support for `int8`? Anyone using it today is getting terrible results, so they really should not be using it, and on upgrade, when they see Lucene no longer supports it, they can fix their code to go to either `int4` or `int7`? We could (later, not rushing) explore the fixes you suggested above to actually get it working, but given the discussions so far, the math sounds complicated :)

> However, Java cannot have unsigned byte values that are `> 127`.

Yeah, this is always so annoying :) And if you try to negate the special `-128` byte value, you get `-128` back. And `(byte) Math.abs(-128)` also returns `-128`. This makes for an evil line of interview questioning...
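To make the signed-byte pitfall concrete, here is a minimal Java sketch. It is not Lucene's `ScalarQuantizer`: it assumes a simple linear mapping of each float into the integer range `[0, levels]`, and the names `SignedByteDemo`, `quantize`, `dotSigned` and `dotUnsigned` are hypothetical. With `levels = 255` (the full 8 bits), any code above 127 wraps to a negative `byte` on the narrowing cast, so a dot product over the raw signed bytes no longer matches the one over the intended unsigned codes. With `levels = 127` (7 bits), every code fits in a signed `byte` and both interpretations agree.

```java
public class SignedByteDemo {

  // Hypothetical linear quantizer (an assumption, not Lucene's code):
  // maps each v in [min, max] to an integer code in [0, levels].
  static byte[] quantize(float[] v, float min, float max, int levels) {
    byte[] out = new byte[v.length];
    for (int i = 0; i < v.length; i++) {
      float clamped = Math.max(min, Math.min(max, v[i]));
      int code = Math.round((clamped - min) / (max - min) * levels);
      // Narrowing cast: any code > 127 silently wraps to a negative byte.
      out[i] = (byte) code;
    }
    return out;
  }

  // Dot product using Java's default (signed) interpretation of each byte.
  static int dotSigned(byte[] a, byte[] b) {
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  // Dot product recovering the intended unsigned codes in [0, 255].
  static int dotUnsigned(byte[] a, byte[] b) {
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += Byte.toUnsignedInt(a[i]) * Byte.toUnsignedInt(b[i]);
    }
    return sum;
  }

  public static void main(String[] args) {
    float[] x = {0.1f, 0.5f, 1.0f};
    float[] y = {0.2f, 0.6f, 0.9f};

    // levels=255: some codes exceed 127, so the two dot products differ.
    // levels=127: all codes fit in a signed byte, so they agree.
    for (int levels : new int[] {255, 127}) {
      byte[] qx = quantize(x, 0f, 1f, levels);
      byte[] qy = quantize(y, 0f, 1f, levels);
      System.out.println("levels=" + levels
          + " signed dot=" + dotSigned(qx, qy)
          + " unsigned dot=" + dotUnsigned(qx, qy));
    }

    // The -128 corner case: negation and abs both come back as -128.
    byte minByte = -128;
    System.out.println((byte) (-minByte));       // prints -128
    System.out.println((byte) Math.abs(minByte)); // prints -128
  }
}
```

`Byte.toUnsignedInt` is one standard way to read a Java `byte` as a value from 0 to 255; staying within 7 bits sidesteps the question entirely, at the cost of one bit of precision.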