[GitHub] [lucene] mayya-sharipova commented on issue #11769: TestKnnVectorQuery.testScoreEuclidean fails

2022-09-19 Thread GitBox
mayya-sharipova commented on issue #11769: URL: https://github.com/apache/lucene/issues/11769#issuecomment-1251024321 Thanks @msokolov. I ran the above test on main, and it doesn't fail anymore. Closing the issue. -- This is an automated message from the Apache Git Service. To re

[GitHub] [lucene] mayya-sharipova closed issue #11769: TestKnnVectorQuery.testScoreEuclidean fails

2022-09-19 Thread GitBox
mayya-sharipova closed issue #11769: TestKnnVectorQuery.testScoreEuclidean fails URL: https://github.com/apache/lucene/issues/11769

[GitHub] [lucene] mayya-sharipova commented on pull request #11784: NeighborArray is now fixed size

2022-09-19 Thread GitBox
mayya-sharipova commented on PR #11784: URL: https://github.com/apache/lucene/pull/11784#issuecomment-1251045793 @msokolov Thanks for tackling this. I was also thinking of removing resizing from `NeighborArray`, which makes the logic simpler. I was thinking a better approach would be to leav

[GitHub] [lucene] msokolov commented on issue #11769: TestKnnVectorQuery.testScoreEuclidean fails

2022-09-19 Thread GitBox
msokolov commented on issue #11769: URL: https://github.com/apache/lucene/issues/11769#issuecomment-1251087019 I just backported to 9.x and 9_4 branches

[GitHub] [lucene] msokolov commented on pull request #11784: NeighborArray is now fixed size

2022-09-19 Thread GitBox
msokolov commented on PR #11784: URL: https://github.com/apache/lucene/pull/11784#issuecomment-1251132612 > I was thinking a better approach would be to leave it to uses of NeighborArray to define maxSize, and not add +1 in the NeighborArray class itself as this PR suggests I guess I
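For readers following the `maxSize` discussion in #11784, here is a minimal sketch of a fixed-capacity neighbor list in which the caller, not the class, decides whether to reserve a spare slot (for example `maxConn + 1` when it adds a candidate before pruning). `FixedNeighborList`, `removeIndex`, and the add-then-prune pattern are hypothetical illustrations, not Lucene's actual `NeighborArray` API.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical fixed-capacity neighbor list (NOT Lucene's NeighborArray):
// the caller chooses maxSize once and the backing arrays never grow.
final class FixedNeighborList {
  private final int[] nodes;
  private final float[] scores;
  private int size;

  FixedNeighborList(int maxSize) {
    if (maxSize <= 0) {
      throw new IllegalArgumentException("maxSize must be positive: " + maxSize);
    }
    nodes = new int[maxSize];
    scores = new float[maxSize];
  }

  /** Appends a neighbor; fails fast instead of resizing once capacity is reached. */
  void add(int node, float score) {
    if (size == nodes.length) {
      throw new IllegalStateException("full: maxSize=" + nodes.length);
    }
    nodes[size] = node;
    scores[size] = score;
    size++;
  }

  /** Removes entry i by shifting later entries left; capacity stays the same. */
  void removeIndex(int i) {
    Objects.checkIndex(i, size);
    int tail = size - i - 1;
    System.arraycopy(nodes, i + 1, nodes, i, tail);
    System.arraycopy(scores, i + 1, scores, i, tail);
    size--;
  }

  int size() { return size; }
  int capacity() { return nodes.length; }
  int node(int i) { return nodes[Objects.checkIndex(i, size)]; }
  float score(int i) { return scores[Objects.checkIndex(i, size)]; }

  @Override
  public String toString() {
    return Arrays.toString(Arrays.copyOf(nodes, size));
  }

  public static void main(String[] args) {
    int maxConn = 2;
    // The caller reserves one spare slot because it adds before pruning.
    FixedNeighborList neighbors = new FixedNeighborList(maxConn + 1);
    neighbors.add(7, 0.9f);
    neighbors.add(3, 0.5f);
    neighbors.add(1, 0.7f); // temporarily holds maxConn + 1 entries
    neighbors.removeIndex(1); // prune the weakest entry (score 0.5)
    System.out.println(neighbors); // [7, 1]
  }
}
```

Failing fast on overflow, rather than silently growing, makes it obvious when a caller sized the array too small.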

[GitHub] [lucene] mayya-sharipova commented on pull request #11781: Diversity check bugfix

2022-09-19 Thread GitBox
mayya-sharipova commented on PR #11781: URL: https://github.com/apache/lucene/pull/11781#issuecomment-1251139127 @msokolov Thanks for tackling this. I ran ann benchmarks with this change, and I'm happy to confirm that in my test, recall with this PR is the same as in the 9.3 branch, although QP
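The comparison above is in terms of recall. As a reference point, here is a minimal sketch of recall@k as ANN benchmarks commonly define it: the fraction of the exact top-k doc ids that the approximate search also returned. This is an assumption about the metric; the thread does not state the benchmark tool's exact definition, and `RecallAtK` is a hypothetical helper.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: recall@k = |exact top-k ∩ approximate top-k| / |exact top-k|.
final class RecallAtK {
  static double recall(int[] exactTopK, int[] approxTopK) {
    Set<Integer> truth = new HashSet<>();
    for (int doc : exactTopK) {
      truth.add(doc);
    }
    if (truth.isEmpty()) {
      return 1.0; // nothing to find
    }
    Set<Integer> found = new HashSet<>(); // a set, so duplicate hits are not double-counted
    for (int doc : approxTopK) {
      if (truth.contains(doc)) {
        found.add(doc);
      }
    }
    return (double) found.size() / truth.size();
  }

  public static void main(String[] args) {
    int[] exact = {4, 8, 15, 16};
    int[] approx = {4, 15, 23, 16};
    System.out.println(recall(exact, approx)); // 0.75
  }
}
```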

[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #11781: Diversity check bugfix

2022-09-19 Thread GitBox
mayya-sharipova commented on code in PR #11781: URL: https://github.com/apache/lucene/pull/11781#discussion_r974364476 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ## @@ -316,49 +316,49 @@ private boolean isDiverse(BytesRef candidate, NeighborArray
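The `isDiverse` check under review is, as far as we understand, Lucene's version of the HNSW neighbor-selection heuristic: a candidate is kept only if it is more similar to the base node than to every neighbor already selected. A minimal sketch, assuming a Euclidean similarity of `1 / (1 + squaredDistance)` (our understanding of Lucene's `EUCLIDEAN` function). The diff's variant takes a `BytesRef` (byte vectors); the sketch uses `float[]` for readability, and `DiversitySketch` is hypothetical, not the code from #11781.

```java
import java.util.List;

// Hypothetical sketch of the HNSW diversity heuristic (NOT HnswGraphBuilder's code).
final class DiversitySketch {
  /** Squared Euclidean distance between two equal-length vectors. */
  static float squareDistance(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      float d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  /** Euclidean similarity in (0, 1]; higher means closer. */
  static float similarity(float[] a, float[] b) {
    return 1f / (1f + squareDistance(a, b));
  }

  /**
   * A candidate is diverse if it is less similar to every already-selected neighbor
   * than it is to the base node; candidateScore = similarity(candidate, base).
   */
  static boolean isDiverse(float[] candidate, List<float[]> selected, float candidateScore) {
    for (float[] neighbor : selected) {
      if (similarity(candidate, neighbor) >= candidateScore) {
        return false; // an existing neighbor already covers this direction
      }
    }
    return true;
  }

  public static void main(String[] args) {
    float[] base = {0f, 0f};
    List<float[]> selected = List.of(new float[] {1f, 0f});
    float[] sameSide = {1.1f, 0f};
    float[] otherSide = {-1f, 0f};
    System.out.println(isDiverse(sameSide, selected, similarity(sameSide, base))); // false
    System.out.println(isDiverse(otherSide, selected, similarity(otherSide, base))); // true
  }
}
```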

[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #11781: Diversity check bugfix

2022-09-19 Thread GitBox
mayya-sharipova commented on code in PR #11781: URL: https://github.com/apache/lucene/pull/11781#discussion_r974364724 ## lucene/core/src/test/org/apache/lucene/util/hnsw/TestHnswGraph.java: ## @@ -555,6 +556,78 @@ public void testDiversity() throws IOException { assertLeve

[GitHub] [lucene] msokolov commented on pull request #11784: NeighborArray is now fixed size

2022-09-19 Thread GitBox
msokolov commented on PR #11784: URL: https://github.com/apache/lucene/pull/11784#issuecomment-1251146315 Also -- now that I see this I realize that most likely we are never exercising this resize capability, so removing it won't really help performance / memory usage as I was hoping. But i

[GitHub] [lucene] iverase opened a new issue, #11785: Improve tessellator performance by delaying calls of`isIntersectingPolygon`

2022-09-19 Thread GitBox
iverase opened a new issue, #11785: URL: https://github.com/apache/lucene/issues/11785 ### Description This method iterates over all the remaining edges of a polygon to check if a given edge intersects any of them. Currently the method is called when curing local intersections or sp
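Because `isIntersectingPolygon` scans every remaining edge, its cost grows with polygon size. A generic illustration of deferring such a scan: run cheap local checks first and let `&&` short-circuit, so the O(n) pass only happens when those checks pass. `DeferredIntersectionCheck`, `acceptCandidate`, and the `passesLocalChecks` flag are hypothetical; the actual reordering in the tessellator may differ.

```java
import java.util.List;

// Hypothetical illustration (NOT Lucene's Tessellator): defer an O(n) edge scan
// until a cheap local check has already passed.
final class DeferredIntersectionCheck {
  static final class Segment {
    final double ax, ay, bx, by;

    Segment(double ax, double ay, double bx, double by) {
      this.ax = ax;
      this.ay = ay;
      this.bx = bx;
      this.by = by;
    }
  }

  static int expensiveScans = 0; // counts full scans, for illustration only

  /** Twice the signed area of triangle (a, b, c); the sign gives the turn direction. */
  static double orient(double ax, double ay, double bx, double by, double cx, double cy) {
    return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax);
  }

  /** True if the segments properly cross; touching and collinear cases are ignored. */
  static boolean crosses(Segment s, Segment t) {
    double d1 = orient(s.ax, s.ay, s.bx, s.by, t.ax, t.ay);
    double d2 = orient(s.ax, s.ay, s.bx, s.by, t.bx, t.by);
    double d3 = orient(t.ax, t.ay, t.bx, t.by, s.ax, s.ay);
    double d4 = orient(t.ax, t.ay, t.bx, t.by, s.bx, s.by);
    return d1 * d2 < 0 && d3 * d4 < 0;
  }

  /** The expensive part: checks the candidate edge against every remaining edge. */
  static boolean intersectsAnyEdge(Segment candidate, List<Segment> remainingEdges) {
    expensiveScans++;
    for (Segment edge : remainingEdges) {
      if (crosses(candidate, edge)) {
        return true;
      }
    }
    return false;
  }

  /** Cheap check first; && short-circuits, so the full scan runs only when needed. */
  static boolean acceptCandidate(Segment candidate, boolean passesLocalChecks, List<Segment> remainingEdges) {
    return passesLocalChecks && !intersectsAnyEdge(candidate, remainingEdges);
  }
}
```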

[GitHub] [lucene] iverase opened a new pull request, #11786: Improve tessellator performance by delaying calls to the method #isIntersectingPolygon

2022-09-19 Thread GitBox
iverase opened a new pull request, #11786: URL: https://github.com/apache/lucene/pull/11786 See https://github.com/apache/lucene/issues/11785 With this change, the bottleneck of the tessellator moves to the algorithm that eliminates holes from the polygon.

[GitHub] [lucene] jpountz commented on issue #11773: Could `PointRangeQuery`'s boundary values used for `NumericComparator` to calculate `estimatedNumberOfMatches`

2022-09-19 Thread GitBox
jpountz commented on issue #11773: URL: https://github.com/apache/lucene/issues/11773#issuecomment-1251190154 The `estimatedNumberOfMatches` should still be very close to the actual number, so I'm not expecting that a more precise value would change when we rebuild the `DocIdSet` of top-k c

[GitHub] [lucene] msokolov commented on a diff in pull request #11781: Diversity check bugfix

2022-09-19 Thread GitBox
msokolov commented on code in PR #11781: URL: https://github.com/apache/lucene/pull/11781#discussion_r974411586 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ## @@ -316,49 +316,49 @@ private boolean isDiverse(BytesRef candidate, NeighborArray neighb

[GitHub] [lucene] msokolov merged pull request #11781: Diversity check bugfix

2022-09-19 Thread GitBox
msokolov merged PR #11781: URL: https://github.com/apache/lucene/pull/11781

[GitHub] [lucene] msokolov closed issue #11782: Fix bugs in HNSW diversity check introduced in LUCENE-10577

2022-09-19 Thread GitBox
msokolov closed issue #11782: Fix bugs in HNSW diversity check introduced in LUCENE-10577 URL: https://github.com/apache/lucene/issues/11782

[GitHub] [lucene] msokolov commented on issue #11782: Fix bugs in HNSW diversity check introduced in LUCENE-10577

2022-09-19 Thread GitBox
msokolov commented on issue #11782: URL: https://github.com/apache/lucene/issues/11782#issuecomment-1251216990 merged #11781 and cherry-picked to `branch_9x` and `branch_9_4`

[GitHub] [lucene] jpountz commented on issue #11761: Expand TieredMergePolicy deletePctAllowed limits

2022-09-19 Thread GitBox
jpountz commented on issue #11761: URL: https://github.com/apache/lucene/issues/11761#issuecomment-1251264246 I got some numbers for write amplification for the case tested in `TestTieredMergePolicy#testSimulateUpdates`: | Allowed percentage of deletes | Write amplification | | ---
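Write amplification is commonly defined as everything written to the index (initial flushes plus merge rewrites) divided by what was flushed. A minimal sketch of that ratio, assuming document counts (bytes work the same way); the exact definition used by `TestTieredMergePolicy#testSimulateUpdates` may differ, some definitions count only merge rewrites, and the counts in `main` are made up.

```java
// Hypothetical helper: writeAmplification = (flushed + rewrittenByMerges) / flushed.
final class WriteAmplification {
  static double compute(long docsFlushed, long docsRewrittenByMerges) {
    if (docsFlushed <= 0) {
      throw new IllegalArgumentException("docsFlushed must be positive");
    }
    if (docsRewrittenByMerges < 0) {
      throw new IllegalArgumentException("docsRewrittenByMerges must be non-negative");
    }
    return (double) (docsFlushed + docsRewrittenByMerges) / docsFlushed;
  }

  public static void main(String[] args) {
    // Made-up counts: 1,000 docs flushed, merges rewrote 2,500 docs in total.
    System.out.println(compute(1_000, 2_500)); // 3.5
  }
}
```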

[GitHub] [lucene] msokolov opened a new issue, #11787: Handle degenerate case where all HNSW search candidates are filtered

2022-09-19 Thread GitBox
msokolov opened a new issue, #11787: URL: https://github.com/apache/lucene/issues/11787 ### Description This test failure reproduces every time. What seems to happen is that we search with a filter that retains > 50% of documents yet we hit an unlucky condition where the graph is not

[GitHub] [lucene] msokolov merged pull request #11747: update DOAP and releaseWizard to reflect migration to github

2022-09-19 Thread GitBox
msokolov merged PR #11747: URL: https://github.com/apache/lucene/pull/11747

[GitHub] [lucene] reta opened a new issue, #11788: Upgrade ANTLR to version 4.11.1

2022-09-19 Thread GitBox
reta opened a new issue, #11788: URL: https://github.com/apache/lucene/issues/11788 ### Description Apache Lucene is using quite an old version of ANTLR, 4.5.1-1. By itself, this is not a showstopper, but a more profound issue is that some ANTLR 3.x bits are used [1]. Since ANTLR 4.10.x

[GitHub] [lucene] sashashura opened a new pull request, #11789: GitHub Workflows security hardening

2022-09-19 Thread GitBox
sashashura opened a new pull request, #11789: URL: https://github.com/apache/lucene/pull/11789 This PR adds explicit [permissions section](https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#permissions) to workflows. This is a security best practice becaus

[GitHub] [lucene] jtibshirani commented on issue #11782: Fix bugs in HNSW diversity check introduced in LUCENE-10577

2022-09-19 Thread GitBox
jtibshirani commented on issue #11782: URL: https://github.com/apache/lucene/issues/11782#issuecomment-1251626019 @msokolov a test case started failing regularly after you merged the change. Here's an example repro line: ``` ./gradlew test --tests TestKnnVectorQuery.testFilterWithS

[GitHub] [lucene] msokolov commented on issue #11782: Fix bugs in HNSW diversity check introduced in LUCENE-10577

2022-09-19 Thread GitBox
msokolov commented on issue #11782: URL: https://github.com/apache/lucene/issues/11782#issuecomment-1251640822 Thanks, I had opened https://github.com/apache/lucene/issues/11787. I'm not entirely sure this is unexpected? But maybe the graphs have become sparser somehow??

[GitHub] [lucene] msokolov commented on issue #11787: Handle degenerate case where all HNSW search candidates are filtered

2022-09-19 Thread GitBox
msokolov commented on issue #11787: URL: https://github.com/apache/lucene/issues/11787#issuecomment-1251652637 This test is really testing a pathological case ... when the vectors are all the same everything is equidistant from everything else and "nearest neighbor" ceases to really even me

[GitHub] [lucene] msokolov opened a new pull request, #11790: Mark HNSW search results incomplete when fewer than topK are found

2022-09-19 Thread GitBox
msokolov opened a new pull request, #11790: URL: https://github.com/apache/lucene/pull/11790 This addresses a random test failure that came up recently due to another fix. I think this failure exposed a hole in our logic; when a search returns fewer results than requested *and we have not e
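The fallback discussed here (a full scan when the graph search comes back with fewer than topK hits) can be sketched as follows. This is a hypothetical illustration, not `KnnVectorQuery`'s implementation: `ExactFallbackSketch`, `searchWithFallback`, and the `1 / (1 + squaredDistance)` similarity are assumptions, and equal scores (as in the identical-vector test from #11787) are ordered by lower doc id to keep results deterministic.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: treat a short approximate result as incomplete and rescan exactly.
final class ExactFallbackSketch {
  /** Euclidean similarity 1 / (1 + squared distance); higher is closer. */
  static double similarity(float[] a, float[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return 1.0 / (1.0 + sum);
  }

  /** Brute-force top-k over docs accepted by the filter; ties go to the lower doc id. */
  static int[] exactTopK(float[][] vectors, float[] query, BitSet acceptDocs, int k) {
    List<Integer> docs = new ArrayList<>();
    for (int d = acceptDocs.nextSetBit(0); d >= 0; d = acceptDocs.nextSetBit(d + 1)) {
      docs.add(d);
    }
    docs.sort(Comparator.comparingDouble((Integer d) -> -similarity(vectors[d], query))
        .thenComparingInt(d -> d));
    return docs.stream().limit(k).mapToInt(Integer::intValue).toArray();
  }

  static int[] searchWithFallback(
      int[] approxHits, float[][] vectors, float[] query, BitSet acceptDocs, int k) {
    if (approxHits.length >= k) {
      return approxHits; // the graph search found enough results
    }
    // Fewer than k hits: the filter may have cut the graph walk short, so scan exactly.
    return exactTopK(vectors, query, acceptDocs, k);
  }
}
```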

[GitHub] [lucene] jtibshirani commented on issue #11787: Handle degenerate case where all HNSW search candidates are filtered

2022-09-19 Thread GitBox
jtibshirani commented on issue #11787: URL: https://github.com/apache/lucene/issues/11787#issuecomment-1251681230 Thanks for digging into this! I added this test to exercise the tie-breaking logic. But now I think it wasn't a good idea -- HNSW is known to exhibit very poor performance when

[GitHub] [lucene] jtibshirani commented on issue #11782: Fix bugs in HNSW diversity check introduced in LUCENE-10577

2022-09-19 Thread GitBox
jtibshirani commented on issue #11782: URL: https://github.com/apache/lucene/issues/11782#issuecomment-1251691399 Oh oops, I had missed that. I made a comment on the issue.

[GitHub] [lucene] patelprateek opened a new issue, #11791: cardinality estimation for query filters

2022-09-19 Thread GitBox
patelprateek opened a new issue, #11791: URL: https://github.com/apache/lucene/issues/11791 ### Description For large-scale data, query filters can take a long time to execute and return data. The returned data can also be large, like millions of documents. Is there any functionali

[GitHub] [lucene] msokolov commented on issue #11787: Handle degenerate case where all HNSW search candidates are filtered

2022-09-19 Thread GitBox
msokolov commented on issue #11787: URL: https://github.com/apache/lucene/issues/11787#issuecomment-1251697346 We could keep the test if we did this: https://github.com/apache/lucene/pull/11790 which would cause fallback to a full scan in this kind of case. It seems like a reasonable fallba

[GitHub] [lucene] iverase merged pull request #11786: Improve tessellator performance by delaying calls to the method #isIntersectingPolygon

2022-09-19 Thread GitBox
iverase merged PR #11786: URL: https://github.com/apache/lucene/pull/11786

[GitHub] [lucene] dweiss commented on pull request #11789: GitHub Workflows security hardening

2022-09-19 Thread GitBox
dweiss commented on PR #11789: URL: https://github.com/apache/lucene/pull/11789#issuecomment-1251873349 LGTM.

[GitHub] [lucene] iverase closed issue #11785: Improve tessellator performance by delaying calls of`isIntersectingPolygon`

2022-09-19 Thread GitBox
iverase closed issue #11785: Improve tessellator performance by delaying calls of`isIntersectingPolygon` URL: https://github.com/apache/lucene/issues/11785

[GitHub] [lucene] iverase commented on issue #11785: Improve tessellator performance by delaying calls of`isIntersectingPolygon`

2022-09-19 Thread GitBox
iverase commented on issue #11785: URL: https://github.com/apache/lucene/issues/11785#issuecomment-1251887364 closed in https://github.com/apache/lucene/pull/11786