[GitHub] [lucene] msokolov commented on a diff in pull request #1076: Add safety checks to KnnVectorField; fixed issue with copying BytesRef
msokolov commented on code in PR #1076: URL: https://github.com/apache/lucene/pull/1076#discussion_r950691134 ## lucene/core/src/java/org/apache/lucene/index/VectorEncoding.java: ## @@ -21,12 +21,8 @@ public enum VectorEncoding { /** - * Encodes vector using 8 bits of precision per sample. Use only with DOT_PRODUCT similarity. - * NOTE: this can enable significant storage savings and faster searches, at the cost of some - * possible loss of precision. In order to use it, all vectors must be of the same norm, as - * measured by the sum of the squares of the scalar values, and those values must be in the range - * [-128, 127]. This applies to both document and query vectors. Using nonconforming vectors can - * result in errors or poor search results. + * Encodes vector using 8 bits of precision per sample. NOTE: this can enable significant storage Review Comment: I added back a comment about the range requirement, and also added the safety checks to `toBytesRef` -- you convinced me it's too dangerous otherwise, and in any case we only do it a few times per query (per segment). Maybe later we can move to codec so we only do once per query. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] msokolov merged pull request #1076: Add safety checks to KnnVectorField; fixed issue with copying BytesRef
msokolov merged PR #1076: URL: https://github.com/apache/lucene/pull/1076 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] msokolov merged pull request #1075: don't call BitSet.cardinality() more than needed
msokolov merged PR #1075: URL: https://github.com/apache/lucene/pull/1075 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] msokolov commented on pull request #1074: Fix for bad cast when sorting a KnnVectors index over BytesRef
msokolov commented on PR #1074: URL: https://github.com/apache/lucene/pull/1074#issuecomment-1221307962 hmm new testRandomBytes failed with seed CBC12662A8273C68 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] msokolov commented on a diff in pull request #1073: fix VectorUtil.dotProductScore normalization
msokolov commented on code in PR #1073: URL: https://github.com/apache/lucene/pull/1073#discussion_r950691627 ## lucene/core/src/java/org/apache/lucene/util/VectorUtil.java: ## @@ -270,7 +270,8 @@ public static float dotProduct(BytesRef a, BytesRef b) { */ public static float dotProductScore(BytesRef a, BytesRef b) { // divide by 2 * 2^14 (maximum absolute value of product of 2 signed bytes) * len -return (1 + dotProduct(a, b)) / (float) (a.length * (1 << 15)); +float maxValue = (float) (a.length * (1 << 14)); +return (maxValue + dotProduct(a, b)) / (2 * maxValue); Review Comment: ah ... yes! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] msokolov merged pull request #1073: fix VectorUtil.dotProductScore normalization
msokolov merged PR #1073: URL: https://github.com/apache/lucene/pull/1073 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] msokolov merged pull request #1074: Fix for bad cast when sorting a KnnVectors index over BytesRef
msokolov merged PR #1074: URL: https://github.com/apache/lucene/pull/1074 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org