shubhamvishu opened a new pull request, #12726:
URL: https://github.com/apache/lucene/pull/12726

   ### Description
   
   While going through the 
[VectorUtil](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/VectorUtil.java)
 class, I observed that `VectorUtil#l2normalize` has no check for a unit 
vector, so passing a unit vector goes through the whole L2 normalization 
(which is not required; it should exit early). I confirmed this by trying out 
a simple example, 
`VectorUtil.l2normalize(VectorUtil.l2normalize(nonUnitVector))`, and it 
performed every calculation twice. One could argue that the user should not 
call this for a unit vector, but I believe there are cases where the user 
simply wants to perform the L2 normalization without checking the vector or if 
there are some overflowing values.
   
   TL;DR: We should exit early in `VectorUtil#l2normalize`, returning the same 
input vector if it's a unit vector.
   
   This is easily avoidable if we introduce a light check to see whether the 
squared sum of the input vector (named `l1norm` in this PR's check, although it 
is the squared L2 norm rather than the L1 norm) is equal to 1.0, or rather just 
check `Math.abs(l1norm - 1.0d) <= 1e-5` (as in this PR), because the dot 
product of a unit vector with itself (`v x v`) is often not exactly 1.0 but, 
for example, `0.9999999403953552`. With a delta of `1e-5`, we would treat any 
vector `v` whose `v x v` lies within `1e-5` of 1.0 (roughly `0.99999` to 
`1.00001`) as a unit vector, i.e. already L2 normalized. That seems fine since 
the delta is really small, and the check itself is not a heavy one.
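
   To make the proposed check concrete, here is a minimal standalone sketch of 
the early exit. It is not the actual Lucene code: the class name 
`L2NormalizeSketch`, the plain-loop dot product, and the exception thrown for a 
zero vector are assumptions made for illustration. Only the 
`Math.abs(... - 1.0d) <= 1e-5` comparison comes from the description above.

```java
final class L2NormalizeSketch {
  // Tolerance from the PR description; the exact value is open to discussion.
  private static final double EPSILON = 1e-5;

  private L2NormalizeSketch() {}

  /**
   * Normalizes v in place to unit length and returns it. If v already has
   * (approximately) unit length, it is returned untouched.
   */
  static float[] l2normalize(float[] v) {
    // Squared sum, i.e. the dot product of v with itself.
    double squaredSum = 0.0;
    for (float x : v) {
      squaredSum += (double) x * x;
    }
    // Assumption for this sketch: reject a zero vector outright.
    if (squaredSum == 0.0) {
      throw new IllegalArgumentException("Cannot normalize a zero-length vector");
    }
    // Early exit: close enough to 1.0 means v is treated as already normalized.
    if (Math.abs(squaredSum - 1.0d) <= EPSILON) {
      return v;
    }
    double length = Math.sqrt(squaredSum);
    for (int i = 0; i < v.length; i++) {
      v[i] = (float) (v[i] / length);
    }
    return v;
  }
}
```

   With this sketch, calling `l2normalize` a second time on its own result 
should hit the early exit and skip the square root and the division loop, 
leaving only the one pass that computes the squared sum.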
   
   I'm not sure whether a similar check existed before (I tried to check but 
didn't find any history), so I'm looking forward to thoughts on whether this 
makes sense to add. Thanks!
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

