mikemccand commented on issue #13519:
URL: https://github.com/apache/lucene/issues/13519#issuecomment-2206798896

   > > I know we are discussing how to make the conflated quantization and 
distance metric combinations math work out (i.e. how to fix the issue), but I'm 
just trying to get the big picture of what the issue even is. How would we 
explain the buggy behavior to users?
   > 
   > The scalar quantizer currently assumes the destination `byte` values are 
unsigned. However, Java cannot have unsigned byte values that are `> 127`. 
Consequently, when trying to utilize the full number of bits in `int8` the 
scoring falls apart.
   
   OK, thanks.  Regardless of the distance metric, `int8` is buggy today.  
(Hmm, are there any special cases where `int8` is working?  I see blog posts 
about using `int8` in Elasticsearch...).  I would think even if your vectors 
are e.g. always non-negative in all values, `int8` quantization will still try 
(in a buggy way) to use the full 8 bits, and will still be buggy.
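
   To make the failure mode concrete, here is a minimal sketch of what happens when a value meant to be unsigned lands in a Java `byte`. The class and method names are hypothetical and this is not Lucene's scorer code; it only assumes a quantizer that emits values in `0..255` and a dot product that multiplies the raw `byte`s.

```java
public class UnsignedByteWrap {
    // Dot product over the raw bytes, i.e. Java's signed view (-128..127).
    static int signedDot(byte[] a, byte[] b) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Dot product that masks each byte back to its intended unsigned value (0..255).
    static int unsignedDot(byte[] a, byte[] b) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += (a[i] & 0xFF) * (b[i] & 0xFF);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Assumed quantizer output: 200 does not fit in a signed byte and wraps to -56.
        byte[] a = {(byte) 200, (byte) 10};
        byte[] b = {(byte) 100, (byte) 20};
        System.out.println(a[0]);              // -56
        System.out.println(signedDot(a, b));   // -5400
        System.out.println(unsignedDot(a, b)); // 20200
    }
}
```

   Any quantized value above 127 flips sign in the raw view, so two vectors that should score as similar can end up with a large negative score.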
   
   > Looking back, when I finished up the ScalarQuantization support for int4, 
I should have disallowed `8` bits as a parameter to prevent this confusion. I 
allowed it and I shouldn't have.
   
   Well since it doesn't work at all today, I think we can freely change/fix 
the behavior, e.g. in 9.12 or even a 9.11.2?  We could simply remove support 
for `int8`?  Anyone who is using it today is getting terrible results so they 
really should not be using it, and on upgrade, when they see Lucene no longer 
supports it, they can fix their code to either go to `int4` or `int7`?
   
   We could (later, not rushing) try to explore the fixes you suggested above 
to actually get it working, but given the discussions so far, the math sounds 
complicated :)
   
   > However, Java cannot have unsigned byte values that are `> 127`.
   
   Yeah this is always so annoying :)  And if you try to negate the special 
`-128` byte value and cast the result back to `byte`, you get `-128` back.  And 
`(byte) Math.abs(-128)` also returns `-128`.  This makes for an evil line of 
interview questioning...
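
   A short sketch of those `-128` edge cases, for anyone who wants to try them. The class and method names are made up for illustration; the behavior follows from Java promoting `byte` to `int` before arithmetic and wrapping on the narrowing cast back to `byte`.

```java
public class ByteEdgeCases {
    // -b is computed as an int (128 for b == -128); the cast back to byte wraps to -128.
    static byte negate(byte b) {
        return (byte) -b;
    }

    // Math.abs(int) returns 128 for -128; the narrowing cast wraps it back to -128.
    static byte absAsByte(byte b) {
        return (byte) Math.abs(b);
    }

    public static void main(String[] args) {
        byte min = Byte.MIN_VALUE;           // -128
        System.out.println(negate(min));     // -128
        System.out.println(absAsByte(min));  // -128
        System.out.println(min & 0xFF);      // 128, the unsigned reading of the same bits
    }
}
```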
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

