[GitHub] [lucene] mayya-sharipova commented on issue #11769: TestKnnVectorQuery.testScoreEuclidean fails
mayya-sharipova commented on issue #11769: URL: https://github.com/apache/lucene/issues/11769#issuecomment-1251024321 Thanks @msokolov. I ran the above test on main, and it doesn't fail anymore. Closing the issue. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mayya-sharipova closed issue #11769: TestKnnVectorQuery.testScoreEuclidean fails
mayya-sharipova closed issue #11769: TestKnnVectorQuery.testScoreEuclidean fails URL: https://github.com/apache/lucene/issues/11769
[GitHub] [lucene] mayya-sharipova commented on pull request #11784: NeighborArray is now fixed size
mayya-sharipova commented on PR #11784: URL: https://github.com/apache/lucene/pull/11784#issuecomment-1251045793 @msokolov Thanks for tackling this. I was also thinking of removing resizing from `NeighborArray`, which makes the logic simpler. I was thinking a better approach would be to leave it to `NeighborArray` users to define `maxSize`, and not add +1 in the `NeighborArray` class itself as this PR suggests. For example, [OnHeapHnswGraph](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java#L62-L66) already adds +1 when creating `NeighborArray`. What do you think?
[GitHub] [lucene] msokolov commented on issue #11769: TestKnnVectorQuery.testScoreEuclidean fails
msokolov commented on issue #11769: URL: https://github.com/apache/lucene/issues/11769#issuecomment-1251087019 I just backported to 9.x and 9_4 branches
[GitHub] [lucene] msokolov commented on pull request #11784: NeighborArray is now fixed size
msokolov commented on PR #11784: URL: https://github.com/apache/lucene/pull/11784#issuecomment-1251132612

> I was thinking a better approach would be to leave it to uses of NeighborArray to define maxSize, and not add +1 in the NeighborArray class itself as this PR suggests

I guess I was thinking that since this class only has a single use, it wouldn't matter? But it definitely is better encapsulation to move the sizing logic to the place where we know how many we need. +1 to have consumers do it, especially since at least in one place they already do :) I'll follow up with a patch
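The sizing convention agreed on in this exchange can be sketched as follows. This is a minimal illustration, not Lucene's code: `FixedNeighborArray` and `NeighborArrayDemo.forMaxConn` are hypothetical names, and the `maxConn + 1` capacity at the call site is an assumption based on the remark that `OnHeapHnswGraph` already adds +1 when creating a `NeighborArray`.

```java
// Hypothetical, simplified stand-in for a fixed-size neighbor list; not Lucene's NeighborArray.
final class FixedNeighborArray {
  private final int[] nodes;
  private final float[] scores;
  private int size;

  // The caller decides the capacity; the class itself adds no hidden slack.
  FixedNeighborArray(int maxSize) {
    nodes = new int[maxSize];
    scores = new float[maxSize];
  }

  // Appends a neighbor; fails loudly instead of resizing when the array is full.
  void add(int node, float score) {
    if (size == nodes.length) {
      throw new IllegalStateException("neighbor array is full: " + size);
    }
    nodes[size] = node;
    scores[size] = score;
    size++;
  }

  int size() {
    return size;
  }
}

final class NeighborArrayDemo {
  // Assumed call site: room for maxConn neighbors plus one temporary extra
  // that is held only until the worst neighbor is evicted.
  static FixedNeighborArray forMaxConn(int maxConn) {
    return new FixedNeighborArray(maxConn + 1);
  }
}
```

Keeping the `+ 1` at the call site means the array never has to guess how its consumer uses it, which is the encapsulation argument made above.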
[GitHub] [lucene] mayya-sharipova commented on pull request #11781: Diversity check bugfix
mayya-sharipova commented on PR #11781: URL: https://github.com/apache/lucene/pull/11781#issuecomment-1251139127 @msokolov Thanks for tackling this. I ran ann benchmarks with this change, and I am happy to confirm that in my test, recall with this PR is the same as in the 9.3 branch. QPS is lower, but we can investigate QPS later.

**glove-100-angular M:16 efConstruction:100**

| | 9.3 recall | 9.3 QPS | this PR recall | this PR QPS |
| --- | -: | ---: | -: | --: |
| n_cands=10 | 0.620 | 2745.933 | 0.620 | 1675.500 |
| n_cands=20 | 0.680 | 2288.665 | 0.680 | 1512.744 |
| n_cands=40 | 0.746 | 1770.243 | 0.746 | 1040.240 |
| n_cands=80 | 0.809 | 1226.738 | 0.809 | 695.236 |
| n_cands=120 | 0.843 | 948.908 | 0.843 | 525.914 |
| n_cands=200 | 0.878 | 671.781 | 0.878 | 351.529 |
| n_cands=400 | 0.918 | 392.265 | 0.918 | 207.854 |
| n_cands=600 | 0.937 | 282.403 | 0.937 | 144.311 |
| n_cands=800 | 0.949 | 214.620 | 0.949 | 116.875 |
[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #11781: Diversity check bugfix
mayya-sharipova commented on code in PR #11781: URL: https://github.com/apache/lucene/pull/11781#discussion_r974364476

##########
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java:
##########
@@ -316,49 +316,49 @@ private boolean isDiverse(BytesRef candidate, NeighborArray neighbors, float sco
    */
   private int findWorstNonDiverse(NeighborArray neighbors) throws IOException {
     for (int i = neighbors.size() - 1; i > 0; i--) {
-      if (isWorstNonDiverse(i, neighbors, neighbors.score[i])) {
+      if (isWorstNonDiverse(i, neighbors)) {
         return i;
       }
     }
     return neighbors.size() - 1;
   }
 
-  private boolean isWorstNonDiverse(
-      int candidate, NeighborArray neighbors, float minAcceptedSimilarity) throws IOException {
+  private boolean isWorstNonDiverse(int candidateIndex, NeighborArray neighbors)
+      throws IOException {
+    int candidateNode = neighbors.node[candidateIndex];
     return switch (vectorEncoding) {
-      case BYTE -> isWorstNonDiverse(
-          candidate, vectors.binaryValue(candidate), neighbors, minAcceptedSimilarity);
+      case BYTE -> isWorstNonDiverse(candidateIndex, vectors.binaryValue(candidateNode), neighbors);
       case FLOAT32 -> isWorstNonDiverse(
-          candidate, vectors.vectorValue(candidate), neighbors, minAcceptedSimilarity);
+          candidateIndex, vectors.vectorValue(candidateNode), neighbors);
     };
   }
 
   private boolean isWorstNonDiverse(
-      int candidateIndex, float[] candidate, NeighborArray neighbors, float minAcceptedSimilarity)
-      throws IOException {
-    for (int i = candidateIndex - 1; i > -0; i--) {
+      int candidateIndex, float[] candidateVector, NeighborArray neighbors) throws IOException {
+    float minAcceptedSimilarity = neighbors.score[candidateIndex];
+    for (int i = candidateIndex - 1; i >= 0; i--) {
       float neighborSimilarity =
-          similarityFunction.compare(candidate, vectorsCopy.vectorValue(neighbors.node[i]));
-      // node i is too similar to node j given its score relative to the base node
+          similarityFunction.compare(candidateVector, vectorsCopy.vectorValue(neighbors.node[i]));
+      // candidate node is too similar to node i given its score relative to the base node
       if (neighborSimilarity >= minAcceptedSimilarity) {
-        return false;
+        return true;
       }
     }
-    return true;
+    return false;
   }
 
   private boolean isWorstNonDiverse(
-      int candidateIndex, BytesRef candidate, NeighborArray neighbors, float minAcceptedSimilarity)
-      throws IOException {
-    for (int i = candidateIndex - 1; i > -0; i--) {
+      int candidateIndex, BytesRef candidateVector, NeighborArray neighbors) throws IOException {

Review Comment:
I am surprised that with this big change, we had only a small reduction in recall. I guess the reason could be that in our tests the diversity check was really relevant only for a small number of nodes; in the majority of cases the algorithm just eliminated the most distant node.
[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #11781: Diversity check bugfix
mayya-sharipova commented on code in PR #11781: URL: https://github.com/apache/lucene/pull/11781#discussion_r974364724

##########
lucene/core/src/test/org/apache/lucene/util/hnsw/TestHnswGraph.java:
##########
@@ -555,6 +556,78 @@ public void testDiversity() throws IOException {
     assertLevel0Neighbors(builder.hnsw, 5, 1, 4);
   }
 
+  public void testDiversityFallback() throws IOException {
+    vectorEncoding = randomVectorEncoding();
+    similarityFunction = VectorSimilarityFunction.EUCLIDEAN;
+    // Some test cases can't be exercised in two dimensions;
+    // in particular if a new neighbor displaces an existing neighbor
+    // by being closer to the target, yet none of the existing neighbors is closer to the new vector
+    // than to the target -- ie they all remain diverse, so we simply drop the farthest one.
+    float[][] values = {
+      {0, 0, 0},
+      {0, 1, 0},
+      {0, 0, 2},
+      {1, 0, 0},
+      {0, 0.4f, 0}
+    };
+    MockVectorValues vectors = new MockVectorValues(values);
+    // First add nodes until everybody gets a full neighbor list
+    HnswGraphBuilder builder =
+        HnswGraphBuilder.create(
+            vectors, vectorEncoding, similarityFunction, 1, 10, random().nextInt());
+    // node 0 is added by the builder constructor
+    // builder.addGraphNode(vectors.vectorValue(0));
+    RandomAccessVectorValues vectorsCopy = vectors.copy();
+    builder.addGraphNode(1, vectorsCopy);
+    builder.addGraphNode(2, vectorsCopy);
+    assertLevel0Neighbors(builder.hnsw, 0, 1, 2);
+    // 2 is closer to 0 than 1, so it is excluded as non-diverse
+    assertLevel0Neighbors(builder.hnsw, 1, 0);
+    // 1 is closer to 0 than 2, so it is excluded as non-diverse
+    assertLevel0Neighbors(builder.hnsw, 2, 0);
+
+    builder.addGraphNode(3, vectorsCopy);
+    // this is one case we are testing; 2 has been displaced by 3

Review Comment:
nice test!
[GitHub] [lucene] msokolov commented on pull request #11784: NeighborArray is now fixed size
msokolov commented on PR #11784: URL: https://github.com/apache/lucene/pull/11784#issuecomment-1251146315 Also -- now that I see this I realize that most likely we are never exercising this resize capability, so removing it won't really help performance / memory usage as I was hoping. But it still seems like a good cleanup?
[GitHub] [lucene] iverase opened a new issue, #11785: Improve tessellator performance by delaying calls of`isIntersectingPolygon`
iverase opened a new issue, #11785: URL: https://github.com/apache/lucene/issues/11785

### Description

This method iterates over all the remaining edges of a polygon to check whether a given edge intersects any of them. Currently the method is called when curing local intersections or splitting the polygon, both of which already iterate over the polygon edges, so it is potentially O(n^2) in the number of edges of the polygon. The calls are part of a big conditional, but currently they are not in the last position. Just moving the call to the last position brings a very nice performance improvement. For example, for the polygons shared on https://github.com/apache/lucene/issues/11777:

[FE-2456.txt](https://github.com/apache/lucene/files/9577391/FE-2456.txt):
without change: 542.682 seconds
with change: 229.524 seconds

[ORG-24132378.txt](https://github.com/apache/lucene/files/9577398/ORG-24132378.txt):
without change: too long, I did not have the patience to let it finish.
with change: 1416.57 seconds
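The effect described in this issue comes from short-circuit evaluation of `&&`: an operand placed last is only evaluated when every earlier operand is true. The sketch below illustrates that general idea only. `ShortCircuitSketch`, `cheapCheck` and `expensiveCheck` are hypothetical stand-ins and do not reproduce the tessellator's actual conditional.

```java
final class ShortCircuitSketch {
  // Counts how often the expensive check actually runs.
  static int expensiveCalls = 0;

  // Hypothetical stand-in for an O(n) scan over all edges, such as isIntersectingPolygon.
  static boolean expensiveCheck(int[] edges, int edge) {
    expensiveCalls++;
    for (int e : edges) {
      if (e == edge) {
        return true;
      }
    }
    return false;
  }

  // Hypothetical O(1) condition that rejects many candidates.
  static boolean cheapCheck(int edge) {
    return edge % 2 == 0;
  }

  // Expensive check first: it runs for every edge.
  static int countExpensiveFirst(int[] edges) {
    int matches = 0;
    for (int edge : edges) {
      if (expensiveCheck(edges, edge) && cheapCheck(edge)) {
        matches++;
      }
    }
    return matches;
  }

  // Expensive check last: it runs only for edges that pass the cheap check.
  static int countExpensiveLast(int[] edges) {
    int matches = 0;
    for (int edge : edges) {
      if (cheapCheck(edge) && expensiveCheck(edges, edge)) {
        matches++;
      }
    }
    return matches;
  }
}
```

Both variants return the same result. Only the number of expensive scans differs, which is why reordering the conditional is a safe optimization as long as the checks have no side effects.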
[GitHub] [lucene] iverase opened a new pull request, #11786: Improve tessellator performance by delaying calls to the method #isIntersectingPolygon
iverase opened a new pull request, #11786: URL: https://github.com/apache/lucene/pull/11786 See https://github.com/apache/lucene/issues/11785 With this change, the bottleneck of the tessellator moves to the algorithm that eliminates holes from the polygon.
[GitHub] [lucene] jpountz commented on issue #11773: Could `PointRangeQuery`'s boundary values used for `NumericComparator` to calculate `estimatedNumberOfMatches`
jpountz commented on issue #11773: URL: https://github.com/apache/lucene/issues/11773#issuecomment-1251190154 The `estimatedNumberOfMatches` should still be very close to the actual number, so I'm not expecting that a more precise value would change when we rebuild the `DocIdSet` of top-k candidates, would it?
[GitHub] [lucene] msokolov commented on a diff in pull request #11781: Diversity check bugfix
msokolov commented on code in PR #11781: URL: https://github.com/apache/lucene/pull/11781#discussion_r974411586

##########
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java:
##########
@@ -316,49 +316,49 @@ private boolean isDiverse(BytesRef candidate, NeighborArray neighbors, float sco
    */
   private int findWorstNonDiverse(NeighborArray neighbors) throws IOException {
     for (int i = neighbors.size() - 1; i > 0; i--) {
-      if (isWorstNonDiverse(i, neighbors, neighbors.score[i])) {
+      if (isWorstNonDiverse(i, neighbors)) {
         return i;
       }
     }
     return neighbors.size() - 1;
   }
 
-  private boolean isWorstNonDiverse(
-      int candidate, NeighborArray neighbors, float minAcceptedSimilarity) throws IOException {
+  private boolean isWorstNonDiverse(int candidateIndex, NeighborArray neighbors)
+      throws IOException {
+    int candidateNode = neighbors.node[candidateIndex];
     return switch (vectorEncoding) {
-      case BYTE -> isWorstNonDiverse(
-          candidate, vectors.binaryValue(candidate), neighbors, minAcceptedSimilarity);
+      case BYTE -> isWorstNonDiverse(candidateIndex, vectors.binaryValue(candidateNode), neighbors);
       case FLOAT32 -> isWorstNonDiverse(
-          candidate, vectors.vectorValue(candidate), neighbors, minAcceptedSimilarity);
+          candidateIndex, vectors.vectorValue(candidateNode), neighbors);
     };
   }
 
   private boolean isWorstNonDiverse(
-      int candidateIndex, float[] candidate, NeighborArray neighbors, float minAcceptedSimilarity)
-      throws IOException {
-    for (int i = candidateIndex - 1; i > -0; i--) {
+      int candidateIndex, float[] candidateVector, NeighborArray neighbors) throws IOException {
+    float minAcceptedSimilarity = neighbors.score[candidateIndex];
+    for (int i = candidateIndex - 1; i >= 0; i--) {
       float neighborSimilarity =
-          similarityFunction.compare(candidate, vectorsCopy.vectorValue(neighbors.node[i]));
-      // node i is too similar to node j given its score relative to the base node
+          similarityFunction.compare(candidateVector, vectorsCopy.vectorValue(neighbors.node[i]));
+      // candidate node is too similar to node i given its score relative to the base node
       if (neighborSimilarity >= minAcceptedSimilarity) {
-        return false;
+        return true;
       }
     }
-    return true;
+    return false;
   }
 
   private boolean isWorstNonDiverse(
-      int candidateIndex, BytesRef candidate, NeighborArray neighbors, float minAcceptedSimilarity)
-      throws IOException {
-    for (int i = candidateIndex - 1; i > -0; i--) {
+      int candidateIndex, BytesRef candidateVector, NeighborArray neighbors) throws IOException {

Review Comment:
I know - how did this garbage even work at all! :frowning_face: It's kind of astonishing how insensitive this whole process is to the diversity checking. Initially we didn't have it at all though (just always pick the closest neighbors), and things still kind of worked. Then I had the wonky implementation that did not sort the neighbors while indexing, but did some best-effort kind of thing, and still it mostly worked. So we need good tests here to ensure we are doing the right thing, because bugs here can lead to small degradations that are easy to miss.
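To make the corrected check in the diff above easier to follow, here is a self-contained sketch of the same logic on plain arrays. It is an illustration under assumptions, not the Lucene implementation: `DiversitySketch` and its array-based inputs are hypothetical, neighbors are assumed to be ordered from most to least similar to the base node, and the similarity `1 / (1 + squaredDistance)` is chosen only so that larger values mean closer vectors.

```java
final class DiversitySketch {
  // Similarity where larger means closer: 1 / (1 + squared Euclidean distance).
  static float similarity(float[] a, float[] b) {
    float sum = 0;
    for (int i = 0; i < a.length; i++) {
      float d = a[i] - b[i];
      sum += d * d;
    }
    return 1f / (1f + sum);
  }

  // nodes: neighbor ids ordered from most to least similar to the base node.
  // scores[i]: similarity of nodes[i] to the base node.
  // Returns the index of the worst non-diverse neighbor, or the last index
  // (the most distant neighbor) if all neighbors are diverse.
  static int findWorstNonDiverse(float[][] vectors, int[] nodes, float[] scores) {
    for (int i = nodes.length - 1; i > 0; i--) {
      if (isWorstNonDiverse(i, vectors, nodes, scores)) {
        return i;
      }
    }
    return nodes.length - 1;
  }

  // A candidate is non-diverse if some better-ranked neighbor is at least as
  // similar to it as the base node is.
  static boolean isWorstNonDiverse(
      int candidateIndex, float[][] vectors, int[] nodes, float[] scores) {
    float[] candidateVector = vectors[nodes[candidateIndex]];
    float minAcceptedSimilarity = scores[candidateIndex];
    for (int i = candidateIndex - 1; i >= 0; i--) {
      float neighborSimilarity = similarity(candidateVector, vectors[nodes[i]]);
      if (neighborSimilarity >= minAcceptedSimilarity) {
        return true;
      }
    }
    return false;
  }
}
```

The two fixes visible in the diff show up here as well: the loop runs down to index 0 inclusive, and a too-similar better-ranked neighbor makes the candidate non-diverse (`return true`) rather than the other way around.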
[GitHub] [lucene] msokolov merged pull request #11781: Diversity check bugfix
msokolov merged PR #11781: URL: https://github.com/apache/lucene/pull/11781
[GitHub] [lucene] msokolov closed issue #11782: Fix bugs in HNSW diversity check introduced in LUCENE-10577
msokolov closed issue #11782: Fix bugs in HNSW diversity check introduced in LUCENE-10577 URL: https://github.com/apache/lucene/issues/11782
[GitHub] [lucene] msokolov commented on issue #11782: Fix bugs in HNSW diversity check introduced in LUCENE-10577
msokolov commented on issue #11782: URL: https://github.com/apache/lucene/issues/11782#issuecomment-1251216990 merged #11781 and cherry-picked to `branch_9x` and `branch_9_4`
[GitHub] [lucene] jpountz commented on issue #11761: Expand TieredMergePolicy deletePctAllowed limits
jpountz commented on issue #11761: URL: https://github.com/apache/lucene/issues/11761#issuecomment-1251264246 I got some numbers for write amplification for the case tested in `TestTieredMergePolicy#testSimulateUpdates`:

| Allowed percentage of deletes | Write amplification |
| - | - |
| 50 (max) | 4.34 |
| 33 (default) | 4.34 |
| 20 (min) | 4.68 |
| 10 | 6.13 |
| 5 | 8.76 |
| 4 | 10.31 |
| 3 | 12.97 |
| 2 | 18.76 |
| 1 | 37.89 |
| 0 | 10779.78 |

Assuming these numbers are representative, maybe we could allow users to configure 5% as the allowed percentage of deletes that their indexes may have, which translates to ~2x more write amplification compared to the default of 33% according to the above numbers.

For reference, the algorithm that `TieredMergePolicy` uses to keep the number of deletes under the threshold consists of running the most balanced merge (with a small bias towards merges that reclaim more deletes) until the number of deletes of the index is under the threshold.
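The "~2x" estimate can be checked directly against the table. The snippet below is only an arithmetic sketch: `WriteAmplificationSketch` is a hypothetical helper, and its values are copied from the table above rather than measured here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class WriteAmplificationSketch {
  // Values copied from the table above: allowed % of deletes -> write amplification.
  static Map<Integer, Double> measured() {
    Map<Integer, Double> m = new LinkedHashMap<>();
    m.put(50, 4.34);
    m.put(33, 4.34);
    m.put(20, 4.68);
    m.put(10, 6.13);
    m.put(5, 8.76);
    m.put(4, 10.31);
    m.put(3, 12.97);
    m.put(2, 18.76);
    m.put(1, 37.89);
    m.put(0, 10779.78);
    return m;
  }

  // Write amplification at the given setting relative to the 33% default.
  // Only settings listed in the table are supported.
  static double relativeToDefault(int allowedDeletesPct) {
    Map<Integer, Double> m = measured();
    return m.get(allowedDeletesPct) / m.get(33);
  }
}
```

With these numbers, `relativeToDefault(5)` is 8.76 / 4.34, roughly 2.02, which matches the ~2x figure, while `relativeToDefault(1)` is already about 8.7.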
[GitHub] [lucene] msokolov opened a new issue, #11787: Handle degenerate case where all HNSW search candidates are filtered
msokolov opened a new issue, #11787: URL: https://github.com/apache/lucene/issues/11787

### Description

This test failure reproduces every time. What seems to happen is that we search with a filter that retains > 50% of documents, yet we hit an unlucky condition where the graph is not fully connected and every candidate node we visit gets filtered, so we end up with 0 results. It's kind of a degenerate case that is pretty unlikely to arise in a real graph, yet it seems we ought to have some kind of fallback to exact search for this case.

./gradlew :lucene:core:test --tests "org.apache.lucene.search.TestKnnVectorQuery.testFilterWithSameScore" -Ptests.jvms=1 -Ptests.jvmargs=-XX:TieredStopAtLevel=1 -Ptests.seed=C5E04AD69C13E006 -Ptests.gui=true -Ptests.file.encoding=ISO-8859-1

### Version and environment details

_No response_
[GitHub] [lucene] msokolov merged pull request #11747: update DOAP and releaseWizard to reflect migration to github
msokolov merged PR #11747: URL: https://github.com/apache/lucene/pull/11747
[GitHub] [lucene] reta opened a new issue, #11788: Upgrade ANTLR to version 4.11.1
reta opened a new issue, #11788: URL: https://github.com/apache/lucene/issues/11788

### Description

Apache Lucene is using quite an old version of ANTLR, 4.5.1-1. By itself, this is not a showstopper, but a more profound issue is that some ANTLR 3.x bits are used [1]. Since ANTLR 4.10.x (or even earlier), the compatibility layer with the `3.x` release line has been dropped in `4.x` (please see [2]), which makes Apache Lucene impossible to use with recent ANTLR 4.10.x+ releases [3]. A sample exception is below.

```
> java.lang.UnsupportedOperationException: java.io.InvalidClassException: org.antlr.v4.runtime.atn.ATN; Could not deserialize ATN with version 3 (expected 4).
>         at org.antlr.antlr4.runtime@4.11.1/org.antlr.v4.runtime.atn.ATNDeserializer.deserialize(ATNDeserializer.java:56)
>         at org.antlr.antlr4.runtime@4.11.1/org.antlr.v4.runtime.atn.ATNDeserializer.deserialize(ATNDeserializer.java:48)
>         at org.apache.lucene.expressions@10.0.0-SNAPSHOT/org.apache.lucene.expressions.js.JavascriptLexer.<clinit>(JavascriptLexer.java:279)
```

[1] https://github.com/apache/lucene/blob/main/lucene/expressions/src/java/org/apache/lucene/expressions/js/JavascriptLexer.java#L189
[2] https://github.com/antlr/antlr4/commit/c68e127a7cf14470565d6e6ae1eff06db3e56ea7
[3] https://github.com/opensearch-project/OpenSearch/pull/4546

@uschindler @jpountz any objections to migrating to ANTLR `4.11.1`? I would be happy to offer my help here, thank you.
[GitHub] [lucene] sashashura opened a new pull request, #11789: GitHub Workflows security hardening
sashashura opened a new pull request, #11789: URL: https://github.com/apache/lucene/pull/11789 This PR adds an explicit [permissions section](https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#permissions) to workflows. This is a security best practice because by default workflows run with an [extended set of permissions](https://docs.github.com/en/actions/security-guides/automatic-token-authentication#permissions-for-the-github_token) (except for `on: pull_request` [from external forks](https://securitylab.github.com/research/github-actions-preventing-pwn-requests/)). By specifying any permission explicitly, all others are set to none. By using the principle of least privilege, the damage a compromised workflow can do (because of an [injection](https://securitylab.github.com/research/github-actions-untrusted-input/) or a compromised third-party tool or action) is restricted. It is recommended to have the [most strict permissions on the top level](https://github.com/ossf/scorecard/blob/main/docs/checks.md#token-permissions) and grant write permissions on the [job level](https://docs.github.com/en/actions/using-jobs/assigning-permissions-to-jobs) case by case.
[GitHub] [lucene] jtibshirani commented on issue #11782: Fix bugs in HNSW diversity check introduced in LUCENE-10577
jtibshirani commented on issue #11782: URL: https://github.com/apache/lucene/issues/11782#issuecomment-1251626019 @msokolov a test case started failing regularly after you merged the change. Here's an example repro line:

```
./gradlew test --tests TestKnnVectorQuery.testFilterWithSameScore -Dtests.seed=1951CEB96E0899ED -Dtests.locale=en-PR -Dtests.timezone=Antarctica/South_Pole -Dtests.asserts=true -Dtests.file.encoding=UTF-8
```
[GitHub] [lucene] msokolov commented on issue #11782: Fix bugs in HNSW diversity check introduced in LUCENE-10577
msokolov commented on issue #11782: URL: https://github.com/apache/lucene/issues/11782#issuecomment-1251640822 Thanks, I had opened https://github.com/apache/lucene/issues/11787. I'm not entirely sure this is unexpected? But maybe the graphs have become sparser somehow??
[GitHub] [lucene] msokolov commented on issue #11787: Handle degenerate case where all HNSW search candidates are filtered
msokolov commented on issue #11787: URL: https://github.com/apache/lucene/issues/11787#issuecomment-1251652637 This test is really testing a pathological case ... when the vectors are all the same, everything is equidistant from everything else and "nearest neighbor" ceases to really even mean anything. I'm not sure we should actually have this test other than to verify that there is no crash. Maybe I'm misunderstanding, but what is the test really asserting?
[GitHub] [lucene] msokolov opened a new pull request, #11790: Mark HNSW search results incomplete when fewer than topK are found
msokolov opened a new pull request, #11790: URL: https://github.com/apache/lucene/pull/11790 This addresses a random test failure that came up recently due to another fix. I think this failure exposed a hole in our logic; when a search returns fewer results than requested *and we have not explored the entire graph*, we should fall back to exhaustive search. This can happen in degenerate cases such as the one this test creates.
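The fallback rule stated in this pull request can be sketched as below. This is a simplified illustration under assumptions, not the code in the PR: `KnnFallbackSketch`, `ApproximateResult` and its `incomplete` flag are hypothetical names, and exact search is modeled as a brute-force scan over the documents accepted by the filter.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

final class KnnFallbackSketch {
  // Hypothetical holder for an approximate (graph) search result.
  static final class ApproximateResult {
    final List<Integer> docs;
    // True when the graph search stopped without exploring the entire graph.
    final boolean incomplete;

    ApproximateResult(List<Integer> docs, boolean incomplete) {
      this.docs = docs;
      this.incomplete = incomplete;
    }
  }

  static float squaredDistance(float[] a, float[] b) {
    float sum = 0;
    for (int i = 0; i < a.length; i++) {
      float d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  // Exhaustive search: rank every doc accepted by the filter, keep the k closest.
  static List<Integer> exactSearch(float[][] vectors, float[] query, IntPredicate filter, int k) {
    List<Integer> accepted = new ArrayList<>();
    for (int doc = 0; doc < vectors.length; doc++) {
      if (filter.test(doc)) {
        accepted.add(doc);
      }
    }
    // The sort is stable, so ties keep ascending doc order.
    accepted.sort(
        Comparator.comparingDouble((Integer doc) -> squaredDistance(vectors[doc], query)));
    return new ArrayList<>(accepted.subList(0, Math.min(k, accepted.size())));
  }

  // Fall back only when results are short AND the graph was not fully explored.
  static List<Integer> search(
      ApproximateResult approximate, float[][] vectors, float[] query, IntPredicate filter, int k) {
    if (approximate.docs.size() < k && approximate.incomplete) {
      return exactSearch(vectors, query, filter, k);
    }
    return approximate.docs;
  }
}
```

Checking both conditions matters: a short result from a fully explored graph simply means fewer than k documents match, so an exhaustive pass would find nothing new.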
[GitHub] [lucene] jtibshirani commented on issue #11787: Handle degenerate case where all HNSW search candidates are filtered
jtibshirani commented on issue #11787: URL: https://github.com/apache/lucene/issues/11787#issuecomment-1251681230 Thanks for digging into this! I added this test to exercise the tie-breaking logic. But now I think it wasn't a good idea -- HNSW is known to exhibit very poor performance when vectors are duplicated. And this test takes it to an extreme! It's not really a scenario we support well. Maybe we could just remove this test. It wasn't critical, and I could always follow up with a better way to test tie-breaking.
[GitHub] [lucene] jtibshirani commented on issue #11782: Fix bugs in HNSW diversity check introduced in LUCENE-10577
jtibshirani commented on issue #11782: URL: https://github.com/apache/lucene/issues/11782#issuecomment-1251691399 Oh oops, I had missed that. I made a comment on the issue.
[GitHub] [lucene] patelprateek opened a new issue, #11791: cardinality estimation for query filters
patelprateek opened a new issue, #11791: URL: https://github.com/apache/lucene/issues/11791 ### Description For large-scale data, query filters can take a long time to execute and return data. The returned data can also be large, on the order of millions of documents. Is there any functionality that gives a quick approximate estimate for query filters, which could then be used to decide whether to run the query or not? If not, I would like to know any recommendations or ideas on how we could implement or build that functionality.
[GitHub] [lucene] msokolov commented on issue #11787: Handle degenerate case where all HNSW search candidates are filtered
msokolov commented on issue #11787: URL: https://github.com/apache/lucene/issues/11787#issuecomment-1251697346 We could keep the test if we did this: https://github.com/apache/lucene/pull/11790 which would cause a fallback to a full scan in this kind of case. It seems like a reasonable fallback to me.
[GitHub] [lucene] iverase merged pull request #11786: Improve tessellator performance by delaying calls to the method #isIntersectingPolygon
iverase merged PR #11786: URL: https://github.com/apache/lucene/pull/11786
[GitHub] [lucene] dweiss commented on pull request #11789: GitHub Workflows security hardening
dweiss commented on PR #11789: URL: https://github.com/apache/lucene/pull/11789#issuecomment-1251873349 LGTM.
[GitHub] [lucene] iverase closed issue #11785: Improve tessellator performance by delaying calls of`isIntersectingPolygon`
iverase closed issue #11785: Improve tessellator performance by delaying calls of`isIntersectingPolygon` URL: https://github.com/apache/lucene/issues/11785
[GitHub] [lucene] iverase commented on issue #11785: Improve tessellator performance by delaying calls of`isIntersectingPolygon`
iverase commented on issue #11785: URL: https://github.com/apache/lucene/issues/11785#issuecomment-1251887364 closed in https://github.com/apache/lucene/pull/11786