[GitHub] [lucene] LuXugang commented on a diff in pull request #1062: Optimize TermInSetQuery for terms that match all docs in a segment

2022-08-19 Thread GitBox
LuXugang commented on code in PR #1062: URL: https://github.com/apache/lucene/pull/1062#discussion_r949980320 ## lucene/core/src/java/org/apache/lucene/search/TermInSetQuery.java: ## @@ -293,6 +296,9 @@ private WeightOrDocIdSet rewrite(LeafReaderContext context) throws IOExcept

[GitHub] [lucene-solr] ceciliassis closed pull request #450: Add rule exception for "imento" and "mento" suffix

2022-08-19 Thread GitBox
ceciliassis closed pull request #450: Add rule exception for "imento" and "mento" suffix URL: https://github.com/apache/lucene-solr/pull/450 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the speci

[GitHub] [lucene] gsmiller commented on a diff in pull request #1062: Optimize TermInSetQuery for terms that match all docs in a segment

2022-08-19 Thread GitBox
gsmiller commented on code in PR #1062: URL: https://github.com/apache/lucene/pull/1062#discussion_r950251125 ## lucene/core/src/java/org/apache/lucene/search/TermInSetQuery.java: ## @@ -293,6 +296,9 @@ private WeightOrDocIdSet rewrite(LeafReaderContext context) throws IOExcept

[GitHub] [lucene] LuXugang commented on a diff in pull request #1062: Optimize TermInSetQuery for terms that match all docs in a segment

2022-08-19 Thread GitBox
LuXugang commented on code in PR #1062: URL: https://github.com/apache/lucene/pull/1062#discussion_r950306914 ## lucene/core/src/java/org/apache/lucene/search/TermInSetQuery.java: ## @@ -293,6 +296,9 @@ private WeightOrDocIdSet rewrite(LeafReaderContext context) throws IOExcept

[GitHub] [lucene] LuXugang commented on a diff in pull request #1062: Optimize TermInSetQuery for terms that match all docs in a segment

2022-08-19 Thread GitBox
LuXugang commented on code in PR #1062: URL: https://github.com/apache/lucene/pull/1062#discussion_r950307252 ## lucene/core/src/java/org/apache/lucene/search/TermInSetQuery.java: ## @@ -293,6 +296,9 @@ private WeightOrDocIdSet rewrite(LeafReaderContext context) throws IOExcept

[GitHub] [lucene] msokolov opened a new pull request, #1073: fix VectorUtil.dotProductScore normalization

2022-08-19 Thread GitBox
msokolov opened a new pull request, #1073: URL: https://github.com/apache/lucene/pull/1073 There was a thinko in the recently pushed VectorUtil.dotProductScore: it lacked the needed offset to the score, adding 1 when it should have added `-128 * -128 * dimension`.

[jira] [Commented] (LUCENE-10681) ArrayIndexOutOfBoundsException while indexing large binary file

2022-08-19 Thread Michael Sokolov (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17581941#comment-17581941 ] Michael Sokolov commented on LUCENE-10681: -- I think so; please close this one

[jira] [Commented] (LUCENE-10318) Reuse HNSW graphs when merging segments?

2022-08-19 Thread Michael Sokolov (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17581942#comment-17581942 ] Michael Sokolov commented on LUCENE-10318: -- Another idea I played with at one

[GitHub] [lucene] msokolov commented on pull request #1054: LUCENE-10577: enable quantization of HNSW vectors to 8 bits

2022-08-19 Thread GitBox
msokolov commented on PR #1054: URL: https://github.com/apache/lucene/pull/1054#issuecomment-1220873853 Thanks so much for the review @jtibshirani -- I will post some PRs with fixes today, although I may not be able to address everything you raised

[GitHub] [lucene] msokolov opened a new pull request, #1074: Fix for bad cast when sorting a KnnVectors index over BytesRef

2022-08-19 Thread GitBox
msokolov opened a new pull request, #1074: URL: https://github.com/apache/lucene/pull/1074 Thanks @jtibshirani for noticing this one! Clearly we were missing some tests, so I beefed up BaseKnnVectorsFormatTestCase a bit, adding a specific sorted index test over bytes and also a testRandomBy

[GitHub] [lucene] msokolov opened a new pull request, #1075: don't call BitSet.cardinality() more than needed

2022-08-19 Thread GitBox
msokolov opened a new pull request, #1075: URL: https://github.com/apache/lucene/pull/1075

[GitHub] [lucene] msokolov commented on a diff in pull request #1054: LUCENE-10577: enable quantization of HNSW vectors to 8 bits

2022-08-19 Thread GitBox
msokolov commented on code in PR #1054: URL: https://github.com/apache/lucene/pull/1054#discussion_r950423184 ## lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java: ## @@ -133,22 +130,21 @@ private TopDocs searchLeaf(LeafReaderContext ctx, Weight filterWeight) th

[GitHub] [lucene] msokolov commented on a diff in pull request #1054: LUCENE-10577: enable quantization of HNSW vectors to 8 bits

2022-08-19 Thread GitBox
msokolov commented on code in PR #1054: URL: https://github.com/apache/lucene/pull/1054#discussion_r950423611 ## lucene/core/src/java/org/apache/lucene/codecs/lucene94/Lucene94HnswVectorsWriter.java: ## @@ -249,6 +261,29 @@ private void writeSortingField(FieldWriter fieldData, i

[GitHub] [lucene] msokolov opened a new pull request, #1076: Add safety checks to KnnVectorField; fixed issue with copying BytesRef

2022-08-19 Thread GitBox
msokolov opened a new pull request, #1076: URL: https://github.com/apache/lucene/pull/1076 Adds some type safety checks to KnnVectorField. Adds a unit test to exercise the safety checks in TestField. The unit test uncovered a bad bug where I had used a length instead of a "to" position i

[GitHub] [lucene] msokolov commented on a diff in pull request #1054: LUCENE-10577: enable quantization of HNSW vectors to 8 bits

2022-08-19 Thread GitBox
msokolov commented on code in PR #1054: URL: https://github.com/apache/lucene/pull/1054#discussion_r950503788 ## lucene/core/src/java/org/apache/lucene/document/KnnVectorField.java: ## @@ -117,6 +160,21 @@ public KnnVectorField(String name, float[] vector, FieldType fieldType)

[GitHub] [lucene] msokolov commented on a diff in pull request #1071: LUCENE-9583: Remove RandomAccessVectorValuesProducer

2022-08-19 Thread GitBox
msokolov commented on code in PR #1071: URL: https://github.com/apache/lucene/pull/1071#discussion_r950507613 ## lucene/core/src/java/org/apache/lucene/index/VectorValues.java: ## @@ -192,36 +176,5 @@ public int advance(int target) throws IOException { public long cost() {

[GitHub] [lucene] msokolov commented on a diff in pull request #1058: LUCENE-10207: TermInSetQuery now provides a ScoreSupplier with cost estimation for use in TermInSetQuery

2022-08-19 Thread GitBox
msokolov commented on code in PR #1058: URL: https://github.com/apache/lucene/pull/1058#discussion_r950508835 ## lucene/core/src/java/org/apache/lucene/search/TermInSetQuery.java: ## @@ -345,15 +345,62 @@ public BulkScorer bulkScorer(LeafReaderContext context) throws IOExceptio

[GitHub] [lucene] msokolov commented on a diff in pull request #1058: LUCENE-10207: TermInSetQuery now provides a ScoreSupplier with cost estimation for use in TermInSetQuery

2022-08-19 Thread GitBox
msokolov commented on code in PR #1058: URL: https://github.com/apache/lucene/pull/1058#discussion_r950509327 ## lucene/CHANGES.txt: ## @@ -95,6 +95,9 @@ Improvements - * LUCENE-10592: Build HNSW Graph on indexing. (Mayya Sharipova, Adrien Grand, Julie Tib

[jira] [Commented] (LUCENE-8118) ArrayIndexOutOfBoundsException in TermsHashPerField.writeByte during indexing

2022-08-19 Thread Jira
[ https://issues.apache.org/jira/browse/LUCENE-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17582032#comment-17582032 ] Luís Filipe Nassif commented on LUCENE-8118: I did some workarounds in our p

[GitHub] [lucene] jtibshirani commented on a diff in pull request #1071: LUCENE-9583: Remove RandomAccessVectorValuesProducer

2022-08-19 Thread GitBox
jtibshirani commented on code in PR #1071: URL: https://github.com/apache/lucene/pull/1071#discussion_r950621376 ## lucene/core/src/java/org/apache/lucene/index/VectorValues.java: ## @@ -192,36 +176,5 @@ public int advance(int target) throws IOException { public long cost()

[GitHub] [lucene] jtibshirani commented on a diff in pull request #1071: LUCENE-9583: Remove RandomAccessVectorValuesProducer

2022-08-19 Thread GitBox
jtibshirani commented on code in PR #1071: URL: https://github.com/apache/lucene/pull/1071#discussion_r950621473 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ## @@ -133,7 +132,7 @@ private HnswGraphBuilder( * accessor for the vectors */

[GitHub] [lucene] jtibshirani commented on a diff in pull request #1071: LUCENE-9583: Remove RandomAccessVectorValuesProducer

2022-08-19 Thread GitBox
jtibshirani commented on code in PR #1071: URL: https://github.com/apache/lucene/pull/1071#discussion_r950622069 ## lucene/core/src/test/org/apache/lucene/util/hnsw/KnnGraphTester.java: ## @@ -308,41 +302,6 @@ private void printFanoutHist(Path indexPath) throws IOException {

[GitHub] [lucene] jtibshirani commented on pull request #1071: LUCENE-9583: Remove RandomAccessVectorValuesProducer

2022-08-19 Thread GitBox
jtibshirani commented on PR #1071: URL: https://github.com/apache/lucene/pull/1071#issuecomment-1221189385 Thank you for the reviews. I'm going to merge since it seems you're both on board with the change.

[GitHub] [lucene] jtibshirani commented on a diff in pull request #1076: Add safety checks to KnnVectorField; fixed issue with copying BytesRef

2022-08-19 Thread GitBox
jtibshirani commented on code in PR #1076: URL: https://github.com/apache/lucene/pull/1076#discussion_r950623197 ## lucene/core/src/java/org/apache/lucene/index/VectorEncoding.java: ## @@ -21,12 +21,8 @@ public enum VectorEncoding { /** - * Encodes vector using 8 bits of

[GitHub] [lucene] jtibshirani commented on a diff in pull request #1054: LUCENE-10577: enable quantization of HNSW vectors to 8 bits

2022-08-19 Thread GitBox
jtibshirani commented on code in PR #1054: URL: https://github.com/apache/lucene/pull/1054#discussion_r950624287 ## lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java: ## @@ -133,22 +130,21 @@ private TopDocs searchLeaf(LeafReaderContext ctx, Weight filterWeight)

[GitHub] [lucene] jtibshirani merged pull request #1071: LUCENE-9583: Remove RandomAccessVectorValuesProducer

2022-08-19 Thread GitBox
jtibshirani merged PR #1071: URL: https://github.com/apache/lucene/pull/1071

[jira] [Commented] (LUCENE-9583) How should we expose VectorValues.RandomAccess?

2022-08-19 Thread ASF subversion and git services (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17582097#comment-17582097 ] ASF subversion and git services commented on LUCENE-9583: - Commi

[GitHub] [lucene] jtibshirani opened a new pull request, #1077: Remove KnnVectorsFormat#currentVersion

2022-08-19 Thread GitBox
jtibshirani opened a new pull request, #1077: URL: https://github.com/apache/lucene/pull/1077 These internal versions only make sense within a codec definition, and aren't meant to be exposed and compared across codecs. Since this method is only used in tests, we can move the check to

[GitHub] [lucene] jtibshirani commented on a diff in pull request #1054: LUCENE-10577: enable quantization of HNSW vectors to 8 bits

2022-08-19 Thread GitBox
jtibshirani commented on code in PR #1054: URL: https://github.com/apache/lucene/pull/1054#discussion_r950631004 ## lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsFormat.java: ## @@ -76,6 +78,15 @@ public static KnnVectorsFormat forName(String name) { /** Returns a {

[GitHub] [lucene] jtibshirani commented on a diff in pull request #1054: LUCENE-10577: enable quantization of HNSW vectors to 8 bits

2022-08-19 Thread GitBox
jtibshirani commented on code in PR #1054: URL: https://github.com/apache/lucene/pull/1054#discussion_r950631270 ## lucene/core/src/java/org/apache/lucene/util/VectorUtil.java: ## @@ -213,4 +243,48 @@ public static void add(float[] u, float[] v) { u[i] += v[i]; }

[GitHub] [lucene] jtibshirani commented on a diff in pull request #1054: LUCENE-10577: enable quantization of HNSW vectors to 8 bits

2022-08-19 Thread GitBox
jtibshirani commented on code in PR #1054: URL: https://github.com/apache/lucene/pull/1054#discussion_r950631408 ## lucene/core/src/java/org/apache/lucene/codecs/KnnFieldVectorsWriter.java: ## @@ -20,8 +20,12 @@ import java.io.IOException; import org.apache.lucene.util.Account

[GitHub] [lucene] jtibshirani commented on a diff in pull request #1073: fix VectorUtil.dotProductScore normalization

2022-08-19 Thread GitBox
jtibshirani commented on code in PR #1073: URL: https://github.com/apache/lucene/pull/1073#discussion_r950632314 ## lucene/core/src/java/org/apache/lucene/util/VectorUtil.java: ## @@ -270,7 +270,8 @@ public static float dotProduct(BytesRef a, BytesRef b) { */ public stati