[GitHub] [lucene] iverase opened a new pull request #685: LUCENE-10415: FunctionScoreQuery and IndexOrDocValuesQuery delegate Weight#count.
iverase opened a new pull request #685: URL: https://github.com/apache/lucene/pull/685 These query wrappers do not modify the set of matching documents so they can delegate Weight#count. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
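The delegation idea in this PR description can be shown in miniature. The sketch below is hypothetical and uses none of Lucene's classes: `SimpleWeight`, `constant`, and `boosted` are made-up names, and the only assumption carried over from Lucene is the convention that a count of -1 means "cannot be computed cheaply". A wrapper that changes only scores matches exactly the same documents, so it can forward `count()` unchanged.

```java
/** Hypothetical miniature of count delegation; these are not Lucene's Query/Weight classes. */
class CountDelegationSketch {
  /** Stand-in for a weight: knows how many docs match and how to score one. */
  interface SimpleWeight {
    /** Number of matching docs, or -1 when it cannot be computed without iterating. */
    int count();

    float score(int doc);
  }

  /** A trivial inner weight with a fixed count and a fixed score (illustration only). */
  static SimpleWeight constant(int count, float score) {
    return new SimpleWeight() {
      @Override
      public int count() {
        return count;
      }

      @Override
      public float score(int doc) {
        return score;
      }
    };
  }

  /** Wrapper that rescales scores but leaves the set of matching docs untouched. */
  static SimpleWeight boosted(SimpleWeight inner, float boost) {
    return new SimpleWeight() {
      @Override
      public int count() {
        // Same matching docs as the inner weight, so its count is still valid.
        return inner.count();
      }

      @Override
      public float score(int doc) {
        return inner.score(doc) * boost;
      }
    };
  }
}
```

A wrapper that filtered documents could not do this; it would have to return -1 or count for itself.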
[GitHub] [lucene] mayya-sharipova commented on a change in pull request #656: LUCENE-10382: Support filtering in KnnVectorQuery
mayya-sharipova commented on a change in pull request #656:
URL: https://github.com/apache/lucene/pull/656#discussion_r807653039

## File path: lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java ##

@@ -24,19 +24,36 @@ import java.util.Objects;
 import org.apache.lucene.codecs.KnnVectorsReader;
 import org.apache.lucene.document.KnnVectorField;
+import org.apache.lucene.index.FieldInfo;
 import org.apache.lucene.index.IndexReader;
 import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.index.VectorSimilarityFunction;
+import org.apache.lucene.index.VectorValues;
+import org.apache.lucene.util.BitSet;
+import org.apache.lucene.util.BitSetIterator;
 import org.apache.lucene.util.Bits;
+import org.apache.lucene.util.FixedBitSet;

-/** Uses {@link KnnVectorsReader#search} to perform nearest neighbour search. */
+/**
+ * Uses {@link KnnVectorsReader#search} to perform nearest neighbour search.
+ *
+ * This query also allows for performing a kNN search subject to a filter. In this case, it first
+ * executes the filter for each leaf, then chooses a strategy dynamically:
+ *
+ *
+ * If the filter cost is less than k, just execute an exact search
+ * Otherwise run a kNN search subject to the filter
+ * the kNN search visits too many vectors without completing, stop and run an exact search

Review comment:
**if** the KNN search ?

## File path: lucene/core/src/test/org/apache/lucene/search/TestKnnVectorQuery.java ##

@@ -455,6 +484,61 @@ public void testRandom() throws IOException {
     }
   }

+  /** Tests with random vectors and a random filter. Uses RandomIndexWriter. */
+  public void testRandomWithFilter() throws IOException {
+    int numDocs = 200;
+    int dimension = atLeast(5);
+    int numIters = atLeast(10);
+    try (Directory d = newDirectory()) {
+      RandomIndexWriter w = new RandomIndexWriter(random(), d);
+      for (int i = 0; i < numDocs; i++) {
+        Document doc = new Document();
+        doc.add(new KnnVectorField("field", randomVector(dimension)));
+        doc.add(new NumericDocValuesField("tag", i));
+        doc.add(new IntPoint("tag", i));
+        w.addDocument(doc);
+      }
+      w.close();
+
+      try (IndexReader reader = DirectoryReader.open(d)) {
+        IndexSearcher searcher = newSearcher(reader);
+        for (int i = 0; i < numIters; i++) {
+          int lower = random().nextInt(50);
+
+          // Check that when filter is restrictive, we use exact search
+          Query filter = IntPoint.newRangeQuery("tag", lower, lower + 6);
+          KnnVectorQuery query = new KnnVectorQuery("field", randomVector(dimension), 5, filter);
+          TopDocs results = searcher.search(query, numDocs);
+          assertEquals(TotalHits.Relation.EQUAL_TO, results.totalHits.relation);
+          assertEquals(results.totalHits.value, 5);

Review comment:
How do we know that we used the exact search? Are we judging by the equality of `results.totalHits.value` and `results.scoreDocs.length`? I guess in most cases this is true.

Another idea is always use `TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO` for the approximate search results as returned in `KnnVectorQuery.searchLeaf`:

```java
TopDocs results = approximateSearch(ctx, acceptDocs, visitedLimit);
if (results.totalHits.relation == TotalHits.Relation.EQUAL_TO) {
  return ;
} else {
```
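The strategy described in the javadoc under review can be sketched without any Lucene types. This is a minimal illustration, not the code from the PR: `FilteredKnnSketch`, `chooseStrategy`, and both "visited" parameters are hypothetical, and the approximate search is reduced to a number of vectors it would have to visit.

```java
/** Hypothetical sketch of the per-leaf strategy choice; not the PR's implementation. */
class FilteredKnnSketch {
  enum Strategy {
    EXACT,
    APPROXIMATE,
    APPROXIMATE_THEN_EXACT
  }

  /**
   * @param filterCost number of documents the filter matches in this leaf
   * @param k number of nearest neighbours requested
   * @param visitedByApproximate vectors the graph search would visit (assumed input)
   * @param visitedLimit budget after which the graph search gives up (assumed input)
   */
  static Strategy chooseStrategy(
      int filterCost, int k, int visitedByApproximate, int visitedLimit) {
    if (filterCost < k) {
      // Fewer candidates than requested hits: scoring all of them is the cheap option.
      return Strategy.EXACT;
    }
    if (visitedByApproximate > visitedLimit) {
      // The graph search exhausted its budget without completing: fall back to exact.
      return Strategy.APPROXIMATE_THEN_EXACT;
    }
    return Strategy.APPROXIMATE;
  }

  public static void main(String[] args) {
    System.out.println(chooseStrategy(3, 5, 0, 100));
    System.out.println(chooseStrategy(50, 5, 20, 100));
    System.out.println(chooseStrategy(50, 5, 200, 100));
  }
}
```

The review question above is about observability: from the outside, the three outcomes are hard to tell apart unless the result carries a marker such as the total-hits relation.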
[GitHub] [lucene] rmuir opened a new pull request #686: LUCENE-10421: use Constant instead of relying upon timestamp
rmuir opened a new pull request #686:
URL: https://github.com/apache/lucene/pull/686

All the other uses of `System.currentTimeMillis` (both Java and test code) are no good, but I'd rather tackle them in a followup issue (I will make a JIRA).

Eventually, we can ban use of wall clock time with forbidden-apis. But for now, I'd just like to have nightly benchmarks again :)
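A small illustration of why a constant seed restores reproducibility. `SeedSketch`, `FIXED_SEED`, and `draw` are hypothetical names; only `java.util.Random` is real API, and the value 42 is an arbitrary constant chosen for the example.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical illustration: a fixed seed makes a pseudo-random sequence repeatable. */
class SeedSketch {
  // Assumed constant for illustration; a wall-clock seed would differ on every JVM start.
  static final long FIXED_SEED = 42L;

  /** Draws n pseudo-random ints in [0, 16) from a generator seeded with the given value. */
  static int[] draw(long seed, int n) {
    Random random = new Random(seed);
    int[] out = new int[n];
    for (int i = 0; i < n; i++) {
      out[i] = random.nextInt(16);
    }
    return out;
  }

  public static void main(String[] args) {
    // Same seed, same sequence, on every run.
    System.out.println(Arrays.equals(draw(FIXED_SEED, 8), draw(FIXED_SEED, 8)));
  }
}
```

Seeding from `System.currentTimeMillis()` instead would give each process its own sequence, which is what makes downstream results differ from run to run.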
[jira] [Commented] (LUCENE-10391) Reuse data structures across HnswGraph invocations
[ https://issues.apache.org/jira/browse/LUCENE-10391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493146#comment-17493146 ] Robert Muir commented on LUCENE-10391: -- Actually these nightly benchmarks have not even been running. See LUCENE-10421 > Reuse data structures across HnswGraph invocations > -- > > Key: LUCENE-10391 > URL: https://issues.apache.org/jira/browse/LUCENE-10391 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Assignee: Julie Tibshirani >Priority: Minor > Time Spent: 2h 20m > Remaining Estimate: 0h > > Creating HNSW graphs involves doing many repeated calls to HnswGraph#search. > Profiles from nightly benchmarks suggest that allocating data-structures > incurs both lots of heap allocations > ([http://people.apache.org/~mikemccand/lucenebench/2022.01.23.18.03.17.html#profiler_1kb_indexing_vectors_4_heap)] > and CPU usage > ([http://people.apache.org/~mikemccand/lucenebench/2022.01.23.18.03.17.html#profiler_1kb_indexing_vectors_4_cpu).] > It looks like reusing data structures across invocations would be a > low-hanging fruit that could help save significant CPU? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-10423) Remove uses of wall-clock time in codebase
Robert Muir created LUCENE-10423:

             Summary: Remove uses of wall-clock time in codebase
                 Key: LUCENE-10423
                 URL: https://issues.apache.org/jira/browse/LUCENE-10423
             Project: Lucene - Core
          Issue Type: Bug
            Reporter: Robert Muir

Followup to LUCENE-10421

Code in the library shouldn't rely on wall-clock time. If you look at all the places doing this, they are basically all bad news.

Most tests doing this are "iterating for some amount of wall-clock time" which causes them to instead just be non-reproducible. These should be changed to use a fixed number of loop iterations instead.

It would really be great to ban this stuff in forbidden apis. It is even in the configuration file, just currently commented out.
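The test pattern criticized here, and the suggested replacement, might look like the following. This is a hypothetical before/after; `LoopSketch` and both method names are made up, and the loop body is reduced to a counter.

```java
/** Hypothetical before/after for a test loop; names are illustrative only. */
class LoopSketch {
  /** Non-reproducible: the iteration count depends on machine speed and load. */
  static int iterateForMillis(long millis) {
    long deadline = System.currentTimeMillis() + millis;
    int iterations = 0;
    while (System.currentTimeMillis() < deadline) {
      iterations++; // stand-in for one round of test work
    }
    return iterations;
  }

  /** Reproducible: always performs exactly the requested number of iterations. */
  static int iterateFixed(int count) {
    int iterations = 0;
    for (int i = 0; i < count; i++) {
      iterations++; // stand-in for one round of test work
    }
    return iterations;
  }

  public static void main(String[] args) {
    // The first number varies from run to run; the second never does.
    System.out.println(iterateForMillis(5));
    System.out.println(iterateFixed(1000));
  }
}
```

With a fixed count, a failing seed replays the same amount of work on any machine.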
[GitHub] [lucene] rmuir commented on pull request #681: LUCENE-10322: Enable -Xlint:path and -Xlint:-exports
rmuir commented on pull request #681:
URL: https://github.com/apache/lucene/pull/681#issuecomment-1041345085

Yeah, those are actually API bugs? We have public methods that have non-public classes in their signature. Looks like this will be more complex to fix up.

In this example of `ByteBufferIndexInput.newInstance` and its `ByteBufferGuard` parameter, I think a better solution is to make `ByteBufferIndexInput.newInstance` package-private. The only callers are `ByteBuffersDirectory` and `MMapDirectory` which are in the same package. Then we don't need to make `ByteBufferGuard` public.
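The shape of the fix proposed here, narrowing the method rather than widening the helper type, can be sketched with stand-in classes. Everything below is hypothetical: `VisibilitySketch`, `Guard`, and `Input` only mimic the roles of `ByteBufferGuard` and `ByteBufferIndexInput` and share no code with them.

```java
/** Hypothetical stand-ins for the classes named in the comment; not Lucene's real code. */
class VisibilitySketch {
  /** Package-private helper, playing the role of the non-public parameter type. */
  static final class Guard {
    final String resource;

    Guard(String resource) {
      this.resource = resource;
    }
  }

  /** Public type whose factory takes the non-public helper. */
  public static class Input {
    private final Guard guard;

    private Input(Guard guard) {
      this.guard = guard;
    }

    // Package-private on purpose: a public factory would expose Guard in a public
    // signature, which is the kind of leak the lint warning reports.
    static Input newInstance(Guard guard) {
      return new Input(guard);
    }

    public String describe() {
      return "input guarded by " + guard.resource;
    }
  }
}
```

Callers in the same package keep working, and nothing outside the package ever sees `Guard` in a signature.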
[GitHub] [lucene] mogui commented on pull request #679: Monitor Improvements LUCENE-10422
mogui commented on pull request #679:
URL: https://github.com/apache/lucene/pull/679#issuecomment-1041408992

@romseygeek I've updated with the requested changes
[GitHub] [lucene] dweiss commented on pull request #681: LUCENE-10322: Enable -Xlint:path and -Xlint:-exports
dweiss commented on pull request #681:
URL: https://github.com/apache/lucene/pull/681#issuecomment-1041410379

> Yeah, those are actually API bugs?

They do look like API issues to me. Useful warning, by the way.
[GitHub] [lucene] mayya-sharipova commented on a change in pull request #649: LUCENE-10408 Better encoding of doc Ids in vectors
mayya-sharipova commented on a change in pull request #649:
URL: https://github.com/apache/lucene/pull/649#discussion_r807992105

## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsWriter.java ##

@@ -206,14 +203,22 @@ private void writeMeta(
     meta.writeVLong(vectorIndexOffset);
     meta.writeVLong(vectorIndexLength);
     meta.writeInt(field.getVectorDimension());
-    meta.writeInt(docIds.length);
-    for (int docId : docIds) {
-      // TODO: delta-encode, or write as bitset
-      meta.writeVInt(docId);
+
+    // write docIDs
+    int count = docsWithField.cardinality();
+    meta.writeInt(count);
+    if (count == maxDoc) {
+      meta.writeByte((byte) -1);
+      ; // dense marker, each document has a vector value

Review comment:
Addressed in 47042f2
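The dense-marker idea visible in this hunk can be illustrated with plain `java.io` streams. This is a sketch under assumptions, not the codec's format: `DocIdMetaSketch` is hypothetical, and the sparse branch (a 0 marker followed by raw ints) is invented for the example because the excerpt does not show what the PR writes in that case.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical sketch of "write a marker instead of the IDs when every doc has a value". */
class DocIdMetaSketch {
  static final byte DENSE = -1; // same marker value as in the diff
  static final byte SPARSE = 0; // assumption: the excerpt does not show the sparse encoding

  /** Serializes the doc IDs that have a vector; docIds is assumed sorted and unique. */
  static byte[] writeDocIds(int[] docIds, int maxDoc) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream meta = new DataOutputStream(bytes)) {
      meta.writeInt(docIds.length);
      if (docIds.length == maxDoc) {
        // Dense: ordinal i is doc i, so no per-document data is needed.
        meta.writeByte(DENSE);
      } else {
        meta.writeByte(SPARSE);
        for (int docId : docIds) {
          meta.writeInt(docId);
        }
      }
    } catch (IOException e) {
      // Not expected for an in-memory stream; rethrown unchecked to keep the sketch simple.
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }
}
```

In this sketch a dense field costs 5 bytes of metadata regardless of segment size, while a sparse one grows by 4 bytes per document.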
[GitHub] [lucene] mayya-sharipova commented on a change in pull request #649: LUCENE-10408 Better encoding of doc Ids in vectors
mayya-sharipova commented on a change in pull request #649:
URL: https://github.com/apache/lucene/pull/649#discussion_r807992728

## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsReader.java ##

@@ -424,38 +448,45 @@ public int docID() {

     @Override
     public int nextDoc() {
-      if (++ord >= size()) {
+      if (++ord >= size) {
         doc = NO_MORE_DOCS;
       } else {
-        doc = ordToDoc[ord];
+        doc = ordToDocOperator.applyAsInt(ord);
       }
       return doc;
     }

     @Override
     public int advance(int target) {
       assert docID() < target;
-      ord = Arrays.binarySearch(ordToDoc, ord + 1, ordToDoc.length, target);
-      if (ord < 0) {
-        ord = -(ord + 1);
+
+      if (ordToDoc == null) {
+        ord = target;
+      } else {
+        ord = Arrays.binarySearch(ordToDoc, ord + 1, ordToDoc.length, target);
+        if (ord < 0) {
+          ord = -(ord + 1);
+        }
       }
-      assert ord <= ordToDoc.length;
-      if (ord == ordToDoc.length) {
+
+      assert ord <= size;
+      if (ord == size) {
         doc = NO_MORE_DOCS;
       } else {
-        doc = ordToDoc[ord];
+        doc = ordToDocOperator.applyAsInt(ord);
+        ;

Review comment:
Addressed in 47042f2
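The `advance` logic in this hunk can be exercised on its own. The following is a standalone sketch, assuming a `null` mapping means "dense" as in the diff; `OrdAdvanceSketch` is hypothetical, and it uses `>=` rather than the diff's `==` so that an out-of-range target in the dense case also ends iteration.

```java
import java.util.Arrays;

/** Hypothetical standalone version of the ord/doc advance logic shown in the diff. */
class OrdAdvanceSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /**
   * Returns the first doc >= target that has a vector, or NO_MORE_DOCS.
   *
   * @param ordToDoc sorted doc IDs by ordinal, or null when every doc has a vector (dense)
   * @param size number of vectors
   * @param currentOrd ordinal the iterator is currently on (-1 before the first call)
   */
  static int advance(int[] ordToDoc, int size, int currentOrd, int target) {
    int ord;
    if (ordToDoc == null) {
      // Dense: ordinal and doc ID coincide, so no search is needed.
      ord = target;
    } else {
      ord = Arrays.binarySearch(ordToDoc, currentOrd + 1, ordToDoc.length, target);
      if (ord < 0) {
        // Not found: binarySearch returns -(insertionPoint) - 1.
        ord = -(ord + 1);
      }
    }
    if (ord >= size) {
      return NO_MORE_DOCS;
    }
    return ordToDoc == null ? ord : ordToDoc[ord];
  }

  public static void main(String[] args) {
    int[] sparse = {2, 5, 9};
    System.out.println(advance(sparse, 3, -1, 4)); // prints 5
    System.out.println(advance(null, 5, -1, 3)); // prints 3
  }
}
```

The dense path replaces a binary search with a single assignment, which is the saving that dropping the `ordToDoc` array buys.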
[GitHub] [lucene] mayya-sharipova commented on a change in pull request #649: LUCENE-10408 Better encoding of doc Ids in vectors
mayya-sharipova commented on a change in pull request #649:
URL: https://github.com/apache/lucene/pull/649#discussion_r807993536

## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsReader.java ##

@@ -266,12 +268,12 @@ private Bits getAcceptOrds(Bits acceptDocs, FieldEntry fieldEntry) {
     return new Bits() {
       @Override
       public boolean get(int index) {
-        return acceptDocs.get(fieldEntry.ordToDoc[index]);
+        return acceptDocs.get(fieldEntry.ordToDoc(index));

Review comment:
Great comment, addressed in 47042f2
[GitHub] [lucene] mayya-sharipova commented on a change in pull request #649: LUCENE-10408 Better encoding of doc Ids in vectors
mayya-sharipova commented on a change in pull request #649:
URL: https://github.com/apache/lucene/pull/649#discussion_r807998586

## File path: lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseKnnVectorsFormatTestCase.java ##

@@ -1018,4 +1020,57 @@ public void testAdvance() throws Exception {
       }
     }
   }
+
+  public void testVectorValuesReportCorrectDocs() throws Exception {
+    final int numDocs = atLeast(1000);
+    final int dim = random().nextInt(20) + 1;
+    final VectorSimilarityFunction similarityFunction =
+        VectorSimilarityFunction.values()[
+            random().nextInt(VectorSimilarityFunction.values().length)];
+
+    float fieldValuesCheckSum = 0f;
+    int fieldDocCount = 0;
+    long fieldSumDocIDs = 0;
+
+    try (Directory dir = newDirectory();
+        RandomIndexWriter w = new RandomIndexWriter(random(), dir, newIndexWriterConfig())) {
+      for (int i = 0; i < numDocs; i++) {
+        Document doc = new Document();
+        int docID = random().nextInt(numDocs);
+        doc.add(new StoredField("id", docID));
+        if (random().nextInt(4) == 3) {
+          float[] vector = randomVector(dim);
+          doc.add(new KnnVectorField("knn_vector", vector, similarityFunction));
+          fieldValuesCheckSum += vector[0];
+          fieldDocCount++;
+          fieldSumDocIDs += docID;
+        }
+        w.addDocument(doc);
+      }
+
+      if (random().nextBoolean()) {
+        w.forceMerge(1);
+      }
+
+      try (IndexReader r = w.getReader()) {
+        float checksum = 0;

Review comment:
@jtibshirani Thanks for your feedback and comment. What did you mean by "vectors were out of order"? `VectorValues` `extends DocIdSetIterator` and are expected to be accessed in the increasing doc IDs order.
Or did you mean `RandomAccessVectorValues`? I think this class doesn't concern itself with doc Ids.
[GitHub] [lucene] mayya-sharipova commented on a change in pull request #649: LUCENE-10408 Better encoding of doc Ids in vectors
mayya-sharipova commented on a change in pull request #649: URL: https://github.com/apache/lucene/pull/649#discussion_r807998586 ## File path: lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseKnnVectorsFormatTestCase.java ## @@ -1018,4 +1020,57 @@ public void testAdvance() throws Exception { } } } + + public void testVectorValuesReportCorrectDocs() throws Exception { +final int numDocs = atLeast(1000); +final int dim = random().nextInt(20) + 1; +final VectorSimilarityFunction similarityFunction = +VectorSimilarityFunction.values()[ +random().nextInt(VectorSimilarityFunction.values().length)]; + +float fieldValuesCheckSum = 0f; +int fieldDocCount = 0; +long fieldSumDocIDs = 0; + +try (Directory dir = newDirectory(); +RandomIndexWriter w = new RandomIndexWriter(random(), dir, newIndexWriterConfig())) { + for (int i = 0; i < numDocs; i++) { +Document doc = new Document(); +int docID = random().nextInt(numDocs); +doc.add(new StoredField("id", docID)); +if (random().nextInt(4) == 3) { + float[] vector = randomVector(dim); + doc.add(new KnnVectorField("knn_vector", vector, similarityFunction)); + fieldValuesCheckSum += vector[0]; + fieldDocCount++; + fieldSumDocIDs += docID; +} +w.addDocument(doc); + } + + if (random().nextBoolean()) { +w.forceMerge(1); + } + + try (IndexReader r = w.getReader()) { +float checksum = 0; Review comment: @jtibshirani Thanks for your feedback and comment. What did you mean by "vectors were out of order"? `VectorValues` `extends DocIdSetIterator` and are expected to be accessed in the increasing doc ID order. Or did you mean `RandomAccessVectorValues`? I think this class doesn't concern itself with doc Ids. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
[GitHub] [lucene] mayya-sharipova commented on a change in pull request #649: LUCENE-10408 Better encoding of doc Ids in vectors
mayya-sharipova commented on a change in pull request #649: URL: https://github.com/apache/lucene/pull/649#discussion_r807998586 ## File path: lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseKnnVectorsFormatTestCase.java ## @@ -1018,4 +1020,57 @@ public void testAdvance() throws Exception { } } } + + public void testVectorValuesReportCorrectDocs() throws Exception { +final int numDocs = atLeast(1000); +final int dim = random().nextInt(20) + 1; +final VectorSimilarityFunction similarityFunction = +VectorSimilarityFunction.values()[ +random().nextInt(VectorSimilarityFunction.values().length)]; + +float fieldValuesCheckSum = 0f; +int fieldDocCount = 0; +long fieldSumDocIDs = 0; + +try (Directory dir = newDirectory(); +RandomIndexWriter w = new RandomIndexWriter(random(), dir, newIndexWriterConfig())) { + for (int i = 0; i < numDocs; i++) { +Document doc = new Document(); +int docID = random().nextInt(numDocs); +doc.add(new StoredField("id", docID)); +if (random().nextInt(4) == 3) { + float[] vector = randomVector(dim); + doc.add(new KnnVectorField("knn_vector", vector, similarityFunction)); + fieldValuesCheckSum += vector[0]; + fieldDocCount++; + fieldSumDocIDs += docID; +} +w.addDocument(doc); + } + + if (random().nextBoolean()) { +w.forceMerge(1); + } + + try (IndexReader r = w.getReader()) { +float checksum = 0; Review comment: @jtibshirani Thanks for your feedback and comment. What did you mean by "vectors were out of order"? `VectorValues` `extends DocIdSetIterator` and are expected to be accessed in the increasing doc ID order. Or did you mean `RandomAccessVectorValues`? This class doesn't concern itself with doc Ids, so we should not worry about docIds in this case. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
[GitHub] [lucene] mayya-sharipova commented on a change in pull request #649: LUCENE-10408 Better encoding of doc Ids in vectors
mayya-sharipova commented on a change in pull request #649: URL: https://github.com/apache/lucene/pull/649#discussion_r807998586 ## File path: lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseKnnVectorsFormatTestCase.java ## @@ -1018,4 +1020,57 @@ public void testAdvance() throws Exception { } } } + + public void testVectorValuesReportCorrectDocs() throws Exception { +final int numDocs = atLeast(1000); +final int dim = random().nextInt(20) + 1; +final VectorSimilarityFunction similarityFunction = +VectorSimilarityFunction.values()[ +random().nextInt(VectorSimilarityFunction.values().length)]; + +float fieldValuesCheckSum = 0f; +int fieldDocCount = 0; +long fieldSumDocIDs = 0; + +try (Directory dir = newDirectory(); +RandomIndexWriter w = new RandomIndexWriter(random(), dir, newIndexWriterConfig())) { + for (int i = 0; i < numDocs; i++) { +Document doc = new Document(); +int docID = random().nextInt(numDocs); +doc.add(new StoredField("id", docID)); +if (random().nextInt(4) == 3) { + float[] vector = randomVector(dim); + doc.add(new KnnVectorField("knn_vector", vector, similarityFunction)); + fieldValuesCheckSum += vector[0]; + fieldDocCount++; + fieldSumDocIDs += docID; +} +w.addDocument(doc); + } + + if (random().nextBoolean()) { +w.forceMerge(1); + } + + try (IndexReader r = w.getReader()) { +float checksum = 0; Review comment: @jtibshirani Thanks for your feedback and comment. What did you mean by "vectors were out of order"? `VectorValues` `extends DocIdSetIterator` and are expected to be accessed in the increasing doc ID order. Or did you mean `RandomAccessVectorValues`? This class doesn't concern itself with doc Ids, so we should not worry docIds. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
[GitHub] [lucene-solr] janhoy commented on pull request #2641: SOLR-15965 Use proper signatures for SolrAuth
janhoy commented on pull request #2641:
URL: https://github.com/apache/lucene-solr/pull/2641#issuecomment-1041761443

So the benefit of backporting to 8x is that we get a more secure PKI for the lifetime of 8x (12+ months), and that you get an upgrade path 8.x -> 8.11.2 -> 9.x where rolling upgrades will work ootb without any param settings. Fair enough. Perhaps add to the 8.11.2 release-notes (wiki) that this release makes rolling upgrade to 9.x easier.
[jira] [Commented] (LUCENE-10391) Reuse data structures across HnswGraph invocations
[ https://issues.apache.org/jira/browse/LUCENE-10391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493305#comment-17493305 ] Michael McCandless commented on LUCENE-10391: - Sorry for the nightly benchmarks down-time! I think I [pushed a fix just now|https://github.com/mikemccand/luceneutil/commit/36eec79e5ea3cb336c38d53bd4ea35bd6847b4c5] that should get them running again ... cross fingers! > Reuse data structures across HnswGraph invocations > -- > > Key: LUCENE-10391 > URL: https://issues.apache.org/jira/browse/LUCENE-10391 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Assignee: Julie Tibshirani >Priority: Minor > Time Spent: 2h 20m > Remaining Estimate: 0h > > Creating HNSW graphs involves doing many repeated calls to HnswGraph#search. > Profiles from nightly benchmarks suggest that allocating data-structures > incurs both lots of heap allocations > ([http://people.apache.org/~mikemccand/lucenebench/2022.01.23.18.03.17.html#profiler_1kb_indexing_vectors_4_heap)] > and CPU usage > ([http://people.apache.org/~mikemccand/lucenebench/2022.01.23.18.03.17.html#profiler_1kb_indexing_vectors_4_cpu).] > It looks like reusing data structures across invocations would be a > low-hanging fruit that could help save significant CPU? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10421) Non-deterministic results from KnnVectorQuery?
[ https://issues.apache.org/jira/browse/LUCENE-10421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493306#comment-17493306 ] Michael McCandless commented on LUCENE-10421: - +1 for a constant. 42 seems good? > Non-deterministic results from KnnVectorQuery? > -- > > Key: LUCENE-10421 > URL: https://issues.apache.org/jira/browse/LUCENE-10421 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > [Nightly benchmarks|https://home.apache.org/~mikemccand/lucenebench/] have > been upset for the past ~1.5 weeks because it looks like {{KnnVectorQuery}} > is giving slightly different results on every run, even on an identical > (deterministically constructed – single thread indexing, flush by doc count, > {{{}SerialMergeSchedule{}}}, {{{}LogDocCountMergePolicy{}}}, etc.) index each > night. It produces failures like this, which then abort the benchmark to > help us catch any recent accidental bug that alters our precise top N search > hits and scores: > {noformat} > Traceback (most recent call last): > File “/l/util.nightly/src/python/nightlyBench.py”, line 2177, in > run() > File “/l/util.nightly/src/python/nightlyBench.py”, line 1225, in run > raise RuntimeError(‘search result differences: %s’ % str(errors)) > RuntimeError: search result differences: > [“query=KnnVectorQuery:vector[-0.07267512,...][10] filter=None sort=None > groupField=None hitCount=10: hit 4 has wrong field/score value ([20844660], > ‘0.92060816’) vs ([254438\ > 06], ‘0.920046’)“, “query=KnnVectorQuery:vector[-0.12073054,...][10] > filter=None sort=None groupField=None hitCount=10: hit 7 has wrong > field/score value ([25501982], ‘0.99630797’) vs ([13688085], ‘0.9961489’)“, > “qu\ > ery=KnnVectorQuery:vector[0.02227773,...][10] filter=None sort=None > groupField=None hitCount=10: hit 0 has wrong field/score value ([4741915], > ‘0.9481132’) vs ([14220828], ‘0.9579846’)“, 
“query=KnnVectorQuery:vector\ > [0.024077624,...][10] filter=None sort=None groupField=None hitCount=10: hit > 0 has wrong field/score value ([7472373], ‘0.8460249’) vs ([12577825], > ‘0.8378446’)“]{noformat} > At first I thought this might be expected because of the recent (awesome!!) > improvements to HNSW, so I tried to simply "regold". But the regold did not > "take", so it indeed looks like there is some non-determinism here. > I pinged [~msoko...@gmail.com] and he found this random seeding that is most > likely the cause? > {noformat} > public final class HnswGraphBuilder { > /** Default random seed for level generation * */ > private static final long DEFAULT_RAND_SEED = System.currentTimeMillis(); > {noformat} > Can we somehow make this deterministic instead? Or maybe the nightly > benchmarks could somehow pass something in to make results deterministic for > benchmarking? Or ... we could also relax the benchmarks to accept > non-determinism for {{KnnVectorQuery}} task? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10421) Non-deterministic results from KnnVectorQuery?
[ https://issues.apache.org/jira/browse/LUCENE-10421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493320#comment-17493320 ] Robert Muir commented on LUCENE-10421: -- 42 patch is here: https://github.com/apache/lucene/pull/686 > Non-deterministic results from KnnVectorQuery? > -- > > Key: LUCENE-10421 > URL: https://issues.apache.org/jira/browse/LUCENE-10421 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > [Nightly benchmarks|https://home.apache.org/~mikemccand/lucenebench/] have > been upset for the past ~1.5 weeks because it looks like {{KnnVectorQuery}} > is giving slightly different results on every run, even on an identical > (deterministically constructed – single thread indexing, flush by doc count, > {{{}SerialMergeSchedule{}}}, {{{}LogDocCountMergePolicy{}}}, etc.) index each > night. It produces failures like this, which then abort the benchmark to > help us catch any recent accidental bug that alters our precise top N search > hits and scores: > {noformat} > Traceback (most recent call last): > File “/l/util.nightly/src/python/nightlyBench.py”, line 2177, in > run() > File “/l/util.nightly/src/python/nightlyBench.py”, line 1225, in run > raise RuntimeError(‘search result differences: %s’ % str(errors)) > RuntimeError: search result differences: > [“query=KnnVectorQuery:vector[-0.07267512,...][10] filter=None sort=None > groupField=None hitCount=10: hit 4 has wrong field/score value ([20844660], > ‘0.92060816’) vs ([254438\ > 06], ‘0.920046’)“, “query=KnnVectorQuery:vector[-0.12073054,...][10] > filter=None sort=None groupField=None hitCount=10: hit 7 has wrong > field/score value ([25501982], ‘0.99630797’) vs ([13688085], ‘0.9961489’)“, > “qu\ > ery=KnnVectorQuery:vector[0.02227773,...][10] filter=None sort=None > groupField=None hitCount=10: hit 0 has wrong field/score value ([4741915], > ‘0.9481132’) vs ([14220828], 
‘0.9579846’)“, “query=KnnVectorQuery:vector\ > [0.024077624,...][10] filter=None sort=None groupField=None hitCount=10: hit > 0 has wrong field/score value ([7472373], ‘0.8460249’) vs ([12577825], > ‘0.8378446’)“]{noformat} > At first I thought this might be expected because of the recent (awesome!!) > improvements to HNSW, so I tried to simply "regold". But the regold did not > "take", so it indeed looks like there is some non-determinism here. > I pinged [~msoko...@gmail.com] and he found this random seeding that is most > likely the cause? > {noformat} > public final class HnswGraphBuilder { > /** Default random seed for level generation * */ > private static final long DEFAULT_RAND_SEED = System.currentTimeMillis(); > {noformat} > Can we somehow make this deterministic instead? Or maybe the nightly > benchmarks could somehow pass something in to make results deterministic for > benchmarking? Or ... we could also relax the benchmarks to accept > non-determinism for {{KnnVectorQuery}} task? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10391) Reuse data structures across HnswGraph invocations
[ https://issues.apache.org/jira/browse/LUCENE-10391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493345#comment-17493345 ]

Julie Tibshirani commented on LUCENE-10391:
-------------------------------------------

Oh okay thanks, ignore my analysis above then. Funny how I managed to see improvement even when they weren't running!

> Reuse data structures across HnswGraph invocations
> --------------------------------------------------
>
> Key: LUCENE-10391
> URL: https://issues.apache.org/jira/browse/LUCENE-10391
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Adrien Grand
> Assignee: Julie Tibshirani
> Priority: Minor
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> Creating HNSW graphs involves doing many repeated calls to HnswGraph#search.
> Profiles from nightly benchmarks suggest that allocating data-structures
> incurs both lots of heap allocations
> (http://people.apache.org/~mikemccand/lucenebench/2022.01.23.18.03.17.html#profiler_1kb_indexing_vectors_4_heap)
> and CPU usage
> (http://people.apache.org/~mikemccand/lucenebench/2022.01.23.18.03.17.html#profiler_1kb_indexing_vectors_4_cpu).
> It looks like reusing data structures across invocations would be a
> low-hanging fruit that could help save significant CPU?
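The idea in the issue description can be sketched in isolation. The example below is not Lucene's `HnswGraph#search`; it is a toy graph traversal in which the class `ReusableSearchState` and its method are hypothetical. It shows the general pattern: allocate the visited set and the work queue once, then clear them at the start of each call instead of allocating fresh ones.

```java
import java.util.ArrayDeque;
import java.util.BitSet;

/** Illustrative only: scratch structures are allocated once and reused per call. */
final class ReusableSearchState {
  private final BitSet visited;                                  // reused across calls
  private final ArrayDeque<Integer> queue = new ArrayDeque<>();  // reused across calls

  ReusableSearchState(int numNodes) {
    this.visited = new BitSet(numNodes);
  }

  /** Counts the nodes reachable from {@code entry} in an adjacency-list graph. */
  int countReachable(int[][] neighbors, int entry) {
    // Reset state left over from the previous invocation instead of reallocating.
    visited.clear();
    queue.clear();

    visited.set(entry);
    queue.add(entry);
    int count = 0;
    while (!queue.isEmpty()) {
      int node = queue.poll();
      count++;
      for (int next : neighbors[node]) {
        if (!visited.get(next)) {
          visited.set(next);
          queue.add(next);
        }
      }
    }
    return count;
  }
}
```

The trade-off is that such an object is no longer safe to share between threads, so each indexing thread would need its own instance.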
[GitHub] [lucene] gsmiller commented on a change in pull request #678: LUCENE-10398: Add static method for getting Terms from LeafReader
gsmiller commented on a change in pull request #678:
URL: https://github.com/apache/lucene/pull/678#discussion_r808256634

## File path: lucene/core/src/java/org/apache/lucene/document/FeatureQuery.java ##
@@ -111,12 +111,9 @@ public Explanation explain(LeafReaderContext context, int doc) throws IOExceptio
       @Override
       public Scorer scorer(LeafReaderContext context) throws IOException {
-        Terms terms = context.reader().terms(fieldName);
-        if (terms == null) {
-          return null;
-        }
+        Terms terms = Terms.terms(context.reader(), fieldName);
         TermsEnum termsEnum = terms.iterator();
-        if (termsEnum.seekExact(new BytesRef(featureName)) == false) {
+        if (!termsEnum.seekExact(new BytesRef(featureName))) {

Review comment:
As a side note, in case it's helpful, I know with IntelliJ at least you can disable the suggestion it likes to give to convert all these `== false` occurrences to `!` if that irritates you :)
[GitHub] [lucene] gsmiller commented on pull request #678: LUCENE-10398: Add static method for getting Terms from LeafReader
gsmiller commented on pull request #678:
URL: https://github.com/apache/lucene/pull/678#issuecomment-1041899789

Thanks for the quick iteration! This looks good to me now. As I mentioned before, I'm going to wait a couple days before merging in case anyone else wants to chime in with feedback or opposition to adding this functionality, but I'd consider this ready to go from my perspective.

As a side note, in the future, it makes it a little easier to review if you avoid force pushing changes and leave the git commit history in place. That way I can easily look at what's changed since I last reviewed. I know a lot of people are in the habit of squashing commit history to keep it clean, but GitHub makes that super easy to do when actually merging your pull request, so no need to do that on your side. Just a future note. Thanks again!
[GitHub] [lucene] gsmiller commented on a change in pull request #685: LUCENE-10415: FunctionScoreQuery and IndexOrDocValuesQuery delegate Weight#count.
gsmiller commented on a change in pull request #685:
URL: https://github.com/apache/lucene/pull/685#discussion_r808281208

## File path: lucene/CHANGES.txt ##
@@ -615,6 +615,8 @@ Improvements
 * LUCENE-10062: Switch taxonomy faceting to use numeric doc values for storing ordinals instead of binary doc values with its own custom encoding. (Greg Miller)
+
+* LUCENE-10415: FunctionScoreQuery and IndexOrDocValuesQuery delegate Weight#count. (Ignacio Vera)

Review comment:
This should go under 9.1, right?
[GitHub] [lucene] jtibshirani commented on a change in pull request #656: LUCENE-10382: Support filtering in KnnVectorQuery
jtibshirani commented on a change in pull request #656:
URL: https://github.com/apache/lucene/pull/656#discussion_r808308244

## File path: build.gradle ##
@@ -183,3 +183,5 @@ apply from: file('gradle/hacks/turbocharge-jvm-opts.gradle')
 apply from: file('gradle/hacks/dummy-outputs.gradle')
 apply from: file('gradle/pylucene/pylucene.gradle')
+sourceCompatibility = JavaVersion.VERSION_16

Review comment:
Definitely not! Somehow this file gets automatically changed, and I accidentally included it with `git add -u`.
[GitHub] [lucene] jtibshirani commented on a change in pull request #656: LUCENE-10382: Support filtering in KnnVectorQuery
jtibshirani commented on a change in pull request #656:
URL: https://github.com/apache/lucene/pull/656#discussion_r808356718

## File path: lucene/core/src/test/org/apache/lucene/search/TestKnnVectorQuery.java ##
@@ -455,6 +484,61 @@ public void testRandom() throws IOException {
     }
   }
 
+  /** Tests with random vectors and a random filter. Uses RandomIndexWriter. */
+  public void testRandomWithFilter() throws IOException {
+    int numDocs = 200;
+    int dimension = atLeast(5);
+    int numIters = atLeast(10);
+    try (Directory d = newDirectory()) {
+      RandomIndexWriter w = new RandomIndexWriter(random(), d);
+      for (int i = 0; i < numDocs; i++) {
+        Document doc = new Document();
+        doc.add(new KnnVectorField("field", randomVector(dimension)));
+        doc.add(new NumericDocValuesField("tag", i));
+        doc.add(new IntPoint("tag", i));
+        w.addDocument(doc);
+      }
+      w.close();
+
+      try (IndexReader reader = DirectoryReader.open(d)) {
+        IndexSearcher searcher = newSearcher(reader);
+        for (int i = 0; i < numIters; i++) {
+          int lower = random().nextInt(50);
+
+          // Check that when filter is restrictive, we use exact search
+          Query filter = IntPoint.newRangeQuery("tag", lower, lower + 6);
+          KnnVectorQuery query = new KnnVectorQuery("field", randomVector(dimension), 5, filter);
+          TopDocs results = searcher.search(query, numDocs);
+          assertEquals(TotalHits.Relation.EQUAL_TO, results.totalHits.relation);
+          assertEquals(results.totalHits.value, 5);

Review comment:
Thanks for catching this. I actually got confused here and wrote test assertions that are misleading. Since `KnnVectorQuery` is rewritten to `DocAndScoreQuery`, none of the information about visited nodes is preserved. Therefore we can't tell if exact or approximate search was used. I will rework this test.

I will open a follow-up issue to discuss this. I don't feel like we have a perfect grasp on what total hits should mean in the context of kNN search, especially since it differs between `LeafReader#searchNearestVectors` and the output of `KnnVectorQuery`.
[GitHub] [lucene] vigyasharma commented on pull request #677: LUCENE-10084: Rewrite DocValuesFieldExistsQuery to MatchAllDocsQuery when all docs have the field
vigyasharma commented on pull request #677:
URL: https://github.com/apache/lucene/pull/677#issuecomment-1042264970

> This looks great. I left a tiny comment related to tests. Could you also add an entry to `CHANGES.txt` under "Lucene 9.1.0"?

Thank you for reviewing this PR, @jtibshirani. I've added the `CHANGES.txt` entry and updated the unit tests so that they no longer assert on the search result.
[jira] [Commented] (LUCENE-10176) Remove VectorValues#size()
[ https://issues.apache.org/jira/browse/LUCENE-10176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493518#comment-17493518 ]

Julie Tibshirani commented on LUCENE-10176:
-------------------------------------------

Sorry to be jumping in late with a question. What is the motivation for removing VectorValues#size()? We have the information available and it could be helpful in some contexts. For example https://github.com/apache/lucene/pull/656 proposes to add a query KnnVectorFieldExistsQuery. This query could benefit from VectorValues#size() to try to rewrite to MatchAllDocsQuery when all docs have a vector.

> Remove VectorValues#size()
> --------------------------
>
> Key: LUCENE-10176
> URL: https://issues.apache.org/jira/browse/LUCENE-10176
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Adrien Grand
> Priority: Major
>
> This method doesn't seem to be used anywhere except by
> SimpleTextKnnVectorsReader#search, which uses it in an incorrect way by using
> it as the total number of hits matching a nearest-neighbor search (it is
> incorrect because this number might be higher than the number of vectors
> having a value because of deletes).
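The rewrite mentioned in the comment can be sketched without any Lucene types. This is an assumption-laden illustration, not the proposed query: `SegmentStats` is a hypothetical per-segment summary holding the segment's document count and the number of documents that carry a vector (the kind of number `VectorValues#size()` would supply). The check says an "exists" query may be replaced by a match-all query only when every segment has a vector for every document.

```java
import java.util.List;

/** Illustrative only: decides whether a field-exists query could become match-all. */
final class ExistsRewriteSketch {
  /** Hypothetical per-segment summary; not a Lucene class. */
  static final class SegmentStats {
    final int maxDoc;       // number of documents in the segment
    final int vectorCount;  // number of documents that have a vector

    SegmentStats(int maxDoc, int vectorCount) {
      this.maxDoc = maxDoc;
      this.vectorCount = vectorCount;
    }
  }

  /** Returns true only if every segment has a vector for every document. */
  static boolean canRewriteToMatchAll(List<SegmentStats> segments) {
    for (SegmentStats s : segments) {
      if (s.vectorCount != s.maxDoc) {
        return false; // at least one document lacks a vector
      }
    }
    return true;
  }
}
```

How deleted documents affect both numbers is exactly the concern raised in the issue description, so a real implementation would need to confirm that the two counts are measured on the same basis.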
[GitHub] [lucene] jtibshirani commented on a change in pull request #649: LUCENE-10408 Better encoding of doc Ids in vectors
jtibshirani commented on a change in pull request #649:
URL: https://github.com/apache/lucene/pull/649#discussion_r808504111

## File path: lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseKnnVectorsFormatTestCase.java ##
@@ -1018,4 +1020,57 @@ public void testAdvance() throws Exception {
       }
     }
   }
+
+  public void testVectorValuesReportCorrectDocs() throws Exception {
+    final int numDocs = atLeast(1000);
+    final int dim = random().nextInt(20) + 1;
+    final VectorSimilarityFunction similarityFunction =
+        VectorSimilarityFunction.values()[
+            random().nextInt(VectorSimilarityFunction.values().length)];
+
+    float fieldValuesCheckSum = 0f;
+    int fieldDocCount = 0;
+    long fieldSumDocIDs = 0;
+
+    try (Directory dir = newDirectory();
+        RandomIndexWriter w = new RandomIndexWriter(random(), dir, newIndexWriterConfig())) {
+      for (int i = 0; i < numDocs; i++) {
+        Document doc = new Document();
+        int docID = random().nextInt(numDocs);
+        doc.add(new StoredField("id", docID));
+        if (random().nextInt(4) == 3) {
+          float[] vector = randomVector(dim);
+          doc.add(new KnnVectorField("knn_vector", vector, similarityFunction));
+          fieldValuesCheckSum += vector[0];
+          fieldDocCount++;
+          fieldSumDocIDs += docID;
+        }
+        w.addDocument(doc);
+      }
+
+      if (random().nextBoolean()) {
+        w.forceMerge(1);
+      }
+
+      try (IndexReader r = w.getReader()) {
+        float checksum = 0;

Review comment:
Sorry I read this too fast and wrote a confusing comment :) This check looks good to me.

## File path: lucene/CHANGES.txt ##
@@ -204,6 +204,8 @@ Optimizations
 * LUCENE-10367: Optimize CoveringQuery for the case when the minimum number of matching clauses is a constant. (LuYunCheng via Adrien Grand)
+* LUCENE-10408 Better encoding of doc Ids in vectors (Mayya Sharipova, Julie Tibshirani, Adrien Grand)

Review comment:
Thanks for including me! I'm also fine if you omit me when I'm a reviewer.
[GitHub] [lucene] iverase commented on a change in pull request #685: LUCENE-10415: FunctionScoreQuery and IndexOrDocValuesQuery delegate Weight#count.
iverase commented on a change in pull request #685:
URL: https://github.com/apache/lucene/pull/685#discussion_r808708059

## File path: lucene/CHANGES.txt ##
@@ -615,6 +615,8 @@ Improvements
 * LUCENE-10062: Switch taxonomy faceting to use numeric doc values for storing ordinals instead of binary doc values with its own custom encoding. (Greg Miller)
+
+* LUCENE-10415: FunctionScoreQuery and IndexOrDocValuesQuery delegate Weight#count. (Ignacio Vera)

Review comment:
of course, what an oversight :) thanks @gsmiller!
[GitHub] [lucene] iverase merged pull request #685: LUCENE-10415: FunctionScoreQuery and IndexOrDocValuesQuery delegate Weight#count.
iverase merged pull request #685:
URL: https://github.com/apache/lucene/pull/685
[jira] [Commented] (LUCENE-10415) FunctionScoreQuery and IndexOrDocValuesQuery should delegate Weight#count
[ https://issues.apache.org/jira/browse/LUCENE-10415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493705#comment-17493705 ]

ASF subversion and git services commented on LUCENE-10415:
----------------------------------------------------------

Commit 84e34dc4683ba43a0ebe5e942ee117b64b29cdec in lucene's branch refs/heads/main from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=84e34dc ]

LUCENE-10415: FunctionScoreQuery and IndexOrDocValuesQuery delegate Weight#count. (#685)

These query wrappers do not modify the set of matching documents so they can delegate Weight#count.

> FunctionScoreQuery and IndexOrDocValuesQuery should delegate Weight#count
> -------------------------------------------------------------------------
>
> Key: LUCENE-10415
> URL: https://issues.apache.org/jira/browse/LUCENE-10415
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Adrien Grand
> Priority: Minor
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> We have a number of query wrappers that do not modify the set of matching
> documents like FunctionScoreQuery and IndexOrDocValuesQuery. These queries
> should delegate Weight#count.
[jira] [Commented] (LUCENE-10415) FunctionScoreQuery and IndexOrDocValuesQuery should delegate Weight#count
[ https://issues.apache.org/jira/browse/LUCENE-10415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493709#comment-17493709 ]

ASF subversion and git services commented on LUCENE-10415:
----------------------------------------------------------

Commit 423573759f74645e0f2cf4a092d8e2d51b75b559 in lucene's branch refs/heads/branch_9x from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=4235737 ]

LUCENE-10415: FunctionScoreQuery and IndexOrDocValuesQuery delegate Weight#count. (#685)

These query wrappers do not modify the set of matching documents so they can delegate Weight#count.

> FunctionScoreQuery and IndexOrDocValuesQuery should delegate Weight#count
> -------------------------------------------------------------------------
>
> Key: LUCENE-10415
> URL: https://issues.apache.org/jira/browse/LUCENE-10415
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Adrien Grand
> Priority: Minor
> Time Spent: 40m
> Remaining Estimate: 0h
>
> We have a number of query wrappers that do not modify the set of matching
> documents like FunctionScoreQuery and IndexOrDocValuesQuery. These queries
> should delegate Weight#count.
[jira] [Resolved] (LUCENE-10415) FunctionScoreQuery and IndexOrDocValuesQuery should delegate Weight#count
[ https://issues.apache.org/jira/browse/LUCENE-10415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ignacio Vera resolved LUCENE-10415.
-----------------------------------
Fix Version/s: 9.1
Assignee: Ignacio Vera
Resolution: Fixed

> FunctionScoreQuery and IndexOrDocValuesQuery should delegate Weight#count
> -------------------------------------------------------------------------
>
> Key: LUCENE-10415
> URL: https://issues.apache.org/jira/browse/LUCENE-10415
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Adrien Grand
> Assignee: Ignacio Vera
> Priority: Minor
> Fix For: 9.1
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> We have a number of query wrappers that do not modify the set of matching
> documents like FunctionScoreQuery and IndexOrDocValuesQuery. These queries
> should delegate Weight#count.
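The delegation described in this issue can be sketched with simplified stand-ins. Nothing below is Lucene's real `Weight` API: `SimpleWeight`, `TermLikeWeight`, and `BoostingWeight` are hypothetical types, and the only convention borrowed from Lucene is that a count of `-1` means "not cheaply known". The point is that a wrapper which changes scores but not the set of matching documents can safely forward `count()` to the wrapped weight rather than falling back to `-1`.

```java
/** Illustrative only: simplified stand-ins, not Lucene's Weight classes. */
final class DelegatingCountSketch {
  /** Hypothetical minimal weight: scores a doc and may know its match count. */
  interface SimpleWeight {
    double score(int doc);

    /** Number of matching docs, or -1 when it cannot be computed cheaply. */
    default int count() {
      return -1;
    }
  }

  /** A weight that knows its match count up front (for example from index statistics). */
  static final class TermLikeWeight implements SimpleWeight {
    private final int docFreq;

    TermLikeWeight(int docFreq) {
      this.docFreq = docFreq;
    }

    @Override
    public double score(int doc) {
      return 1.0;
    }

    @Override
    public int count() {
      return docFreq;
    }
  }

  /** A wrapper that rescales scores only; the matching set is unchanged. */
  static final class BoostingWeight implements SimpleWeight {
    private final SimpleWeight inner;
    private final double boost;

    BoostingWeight(SimpleWeight inner, double boost) {
      this.inner = inner;
      this.boost = boost;
    }

    @Override
    public double score(int doc) {
      return inner.score(doc) * boost;
    }

    @Override
    public int count() {
      // Same matching docs as the wrapped weight, so its count is still valid.
      return inner.count();
    }
  }
}
```

If the wrapped weight itself reports `-1`, the wrapper reports `-1` too, so callers keep their existing fallback path.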