mccullocht opened a new pull request, #15411:
URL: https://github.com/apache/lucene/pull/15411

   Loss from quantization can yield unexpected values from the corrected dot 
product, sometimes producing
   out-of-bounds results. This is more likely when the inputs are "extreme" in 
the sense that they lie very far
   from the segment-level centroid.
   
   * Bound euclidean distance to a non-negative value -- negative values do not 
make any sense.
   * Clamp dot product/cosine score to `[-1,1]` as the normalized dot product 
should always return values in this range.
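   The two fixes above amount to simple clamps applied after the quantization 
correction. A minimal sketch (hypothetical helper names, not the actual Lucene 
implementation):

   ```java
   // Hypothetical helpers illustrating the clamping described above; not the
   // actual Lucene code.
   public class ScoreClamp {
       // Normalized dot product / cosine should lie in [-1, 1]; quantization
       // loss can push the corrected value outside that range.
       static float clampDotProduct(float dot) {
           return Math.max(-1f, Math.min(1f, dot));
       }

       // A euclidean distance can never be negative, so bound it at 0.
       static float boundEuclidean(float dist) {
           return Math.max(0f, dist);
       }

       public static void main(String[] args) {
           System.out.println(clampDotProduct(1.07f));   // out-of-bounds high -> 1.0
           System.out.println(clampDotProduct(-1.2f));   // out-of-bounds low -> -1.0
           System.out.println(boundEuclidean(-0.003f));  // negative distance -> 0.0
       }
   }
   ```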
   
   This works well enough for 4+ bit quantization but may not work as well for 
1-bit quantization, where the loss is much greater.
   Maximum inner product scores are also left unclamped, as I haven't 
determined whether clamping makes sense there.
   
   Also fixes `testSingleVectorCase` to l2-normalize all vectors for 
`DOT_PRODUCT` similarity.
   
   This is a partial fix for #15408


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

