[GitHub] [lucene] msokolov commented on a diff in pull request #1076: Add safety checks to KnnVectorField; fixed issue with copying BytesRef

2022-08-20 Thread GitBox


msokolov commented on code in PR #1076:
URL: https://github.com/apache/lucene/pull/1076#discussion_r950691134


##
lucene/core/src/java/org/apache/lucene/index/VectorEncoding.java:
##
@@ -21,12 +21,8 @@
 public enum VectorEncoding {
 
   /**
-   * Encodes vector using 8 bits of precision per sample. Use only with 
DOT_PRODUCT similarity.
-   * NOTE: this can enable significant storage savings and faster searches, at 
the cost of some
-   * possible loss of precision. In order to use it, all vectors must be of 
the same norm, as
-   * measured by the sum of the squares of the scalar values, and those values 
must be in the range
-   * [-128, 127]. This applies to both document and query vectors. Using 
nonconforming vectors can
-   * result in errors or poor search results.
+   * Encodes vector using 8 bits of precision per sample. NOTE: this can 
enable significant storage

Review Comment:
   I added back a comment about the range requirement, and also added the 
safety checks to `toBytesRef` -- you convinced me it's too dangerous otherwise, 
and in any case we only do it a few times per query (per segment). Maybe later 
we can move to codec so we only do once per query.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] msokolov merged pull request #1076: Add safety checks to KnnVectorField; fixed issue with copying BytesRef

2022-08-20 Thread GitBox


msokolov merged PR #1076:
URL: https://github.com/apache/lucene/pull/1076


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] msokolov merged pull request #1075: don't call BitSet.cardinality() more than needed

2022-08-20 Thread GitBox


msokolov merged PR #1075:
URL: https://github.com/apache/lucene/pull/1075


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] msokolov commented on pull request #1074: Fix for bad cast when sorting a KnnVectors index over BytesRef

2022-08-20 Thread GitBox


msokolov commented on PR #1074:
URL: https://github.com/apache/lucene/pull/1074#issuecomment-1221307962

   hmm new testRandomBytes failed with seed CBC12662A8273C68


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] msokolov commented on a diff in pull request #1073: fix VectorUtil.dotProductScore normalization

2022-08-20 Thread GitBox


msokolov commented on code in PR #1073:
URL: https://github.com/apache/lucene/pull/1073#discussion_r950691627


##
lucene/core/src/java/org/apache/lucene/util/VectorUtil.java:
##
@@ -270,7 +270,8 @@ public static float dotProduct(BytesRef a, BytesRef b) {
*/
   public static float dotProductScore(BytesRef a, BytesRef b) {
 // divide by 2 * 2^14 (maximum absolute value of product of 2 signed 
bytes) * len
-return (1 + dotProduct(a, b)) / (float) (a.length * (1 << 15));
+float maxValue = (float) (a.length * (1 << 14));
+return (maxValue + dotProduct(a, b)) / (2 * maxValue);

Review Comment:
   ah ... yes!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] msokolov merged pull request #1073: fix VectorUtil.dotProductScore normalization

2022-08-20 Thread GitBox


msokolov merged PR #1073:
URL: https://github.com/apache/lucene/pull/1073


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] msokolov merged pull request #1074: Fix for bad cast when sorting a KnnVectors index over BytesRef

2022-08-20 Thread GitBox


msokolov merged PR #1074:
URL: https://github.com/apache/lucene/pull/1074


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org