[GitHub] [lucene] vsop-479 opened a new pull request, #12528: Early terminate visit low cardinality BKD leaf when current value greater than upper point for one dim point.

2023-08-30 Thread via GitHub
vsop-479 opened a new pull request, #12528: URL: https://github.com/apache/lucene/pull/12528 Since values in a BKD leaf are sorted on the sorted dim, terminating the visit early once the current packed value is greater than the upper point (for one-dim points) may give better performance. Furthermore, f
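
The idea described in this pull request can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the PR: it uses plain `int` values instead of packed `byte[]` values, and the class `SortedLeafScan` and method `visitInRange` are hypothetical names that do not exist in Lucene.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; this is not Lucene's BKD reader code. */
public class SortedLeafScan {

  /**
   * Returns the indexes of all values within [lower, upper].
   * Assumes sortedValues is in ascending order, as the values of a
   * one-dimensional leaf would be on the sorted dimension.
   */
  public static List<Integer> visitInRange(int[] sortedValues, int lower, int upper) {
    List<Integer> hits = new ArrayList<>();
    for (int i = 0; i < sortedValues.length; i++) {
      int value = sortedValues[i];
      if (value > upper) {
        // Every remaining value is at least as large, so none can match.
        break;
      }
      if (value >= lower) {
        hits.add(i);
      }
    }
    return hits;
  }
}
```

Without the `break`, the loop would compare every remaining value against the range even though the sort order already guarantees a miss.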

[GitHub] [lucene] mkhludnev commented on issue #7350: FieldCacheRangeFilter missing from MIGRATE.html [LUCENE-6288]

2023-08-30 Thread via GitHub
mkhludnev commented on issue #7350: URL: https://github.com/apache/lucene/issues/7350#issuecomment-1698682533 Hi, @PenghaiZhang Using FieldCache is obviously discouraged in favor of DocValues (and/or Points???). You can start from https://lucene.apache.org/core/9_5_0/core/org/apac

[GitHub] [lucene] gashutos commented on pull request #12520: Honor topvalue while determining isMissingvalueCompetitive in case bottom is not set

2023-08-30 Thread via GitHub
gashutos commented on PR #12520: URL: https://github.com/apache/lucene/pull/12520#issuecomment-1698994072 > Looks good, can you add a test for missing bottom value as well? Thanks @backslasht . Yes the missing value comparison with bottom (as missing value) is in this test. ht

[GitHub] [lucene] backslasht commented on pull request #12520: Honor topvalue while determining isMissingvalueCompetitive in case bottom is not set

2023-08-30 Thread via GitHub
backslasht commented on PR #12520: URL: https://github.com/apache/lucene/pull/12520#issuecomment-1699028517 > > Looks good, can you add a test for missing ~~bottom~~ minimum value for desc sort as well? > > Thanks @backslasht . Yes the missing value comparison with bottom (as missing

[GitHub] [lucene] gashutos commented on pull request #12520: Honor topvalue while determining isMissingvalueCompetitive in case bottom is not set

2023-08-30 Thread via GitHub
gashutos commented on PR #12520: URL: https://github.com/apache/lucene/pull/12520#issuecomment-1699043021 > > > Looks good, can you add a test for missing ~bottom~ minimum value for desc sort as well? > > > > > > Thanks @backslasht . Yes the missing value comparison with bottom (a

[GitHub] [lucene] benwtrent commented on issue #12527: Optimize readInts24 performance for DocIdsWriter

2023-08-30 Thread via GitHub
benwtrent commented on issue #12527: URL: https://github.com/apache/lucene/issues/12527#issuecomment-1699427835 I honestly don't know how big `new long[(count/8) * 3];` could get. If this thing can be large ((Integer.MAX_VALUE - 1)/8)*3, it would be good to use some sort of statically

[GitHub] [lucene] iverase commented on issue #12527: Optimize readInts24 performance for DocIdsWriter

2023-08-30 Thread via GitHub
iverase commented on issue #12527: URL: https://github.com/apache/lucene/issues/12527#issuecomment-1699464755 `count` can only be as big as `max_point_on_leaf_nodes` so that's ok. I would like not to create an array every time this method is called, can we find a way to reuse that array?
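
One way to reuse the array, sketched under assumptions: keep a scratch `long[]` as a field and grow it only when a larger `count` arrives. The class `Ints24Reader`, the helper `pack8`, the `LongBuffer` input, and the bit layout (eight 24-bit values in three longs) are all choices made for this illustration and are not taken from `DocIdsWriter`. The sketch also assumes `count` is a multiple of 8.

```java
import java.nio.LongBuffer;

/** Hypothetical sketch; names and bit layout are assumptions, not Lucene's API. */
public class Ints24Reader {
  // Reused across calls, so a warm reader allocates nothing per call.
  private long[] scratch = new long[0];

  /** Decodes count 24-bit values from in into out; count must be a multiple of 8. */
  public void readInts24(LongBuffer in, int count, int[] out) {
    if (count % 8 != 0) {
      throw new IllegalArgumentException("count must be a multiple of 8");
    }
    int numLongs = (count / 8) * 3; // 8 values * 24 bits = 3 longs
    if (scratch.length < numLongs) {
      scratch = new long[numLongs]; // grows only when a larger count is seen
    }
    in.get(scratch, 0, numLongs); // one bulk read into the reused buffer
    for (int i = 0, j = 0; i < count; i += 8, j += 3) {
      long l1 = scratch[j], l2 = scratch[j + 1], l3 = scratch[j + 2];
      out[i] = (int) (l1 >>> 40);
      out[i + 1] = (int) (l1 >>> 16) & 0xffffff;
      out[i + 2] = (int) (((l1 & 0xffff) << 8) | (l2 >>> 56));
      out[i + 3] = (int) (l2 >>> 32) & 0xffffff;
      out[i + 4] = (int) (l2 >>> 8) & 0xffffff;
      out[i + 5] = (int) (((l2 & 0xff) << 16) | (l3 >>> 48));
      out[i + 6] = (int) (l3 >>> 24) & 0xffffff;
      out[i + 7] = (int) l3 & 0xffffff;
    }
  }

  /** Packs 8 values, each below 2^24, into 3 longs using the layout decoded above. */
  public static long[] pack8(int[] v) {
    long l1 = ((long) v[0] << 40) | ((long) v[1] << 16) | (v[2] >>> 8);
    long l2 = ((long) (v[2] & 0xff) << 56) | ((long) v[3] << 32)
        | ((long) v[4] << 8) | (v[5] >>> 16);
    long l3 = ((long) (v[5] & 0xffff) << 48) | ((long) v[6] << 24) | v[7];
    return new long[] {l1, l2, l3};
  }
}
```

Because `count` is bounded by the maximum number of points per leaf, the scratch buffer stops growing after the largest leaf has been read. The trade-off is that the reader instance is no longer safe to share between threads.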

[GitHub] [lucene] benwtrent commented on a diff in pull request #12529: Introduce a random vector scorer in HNSW builder/searcher

2023-08-30 Thread via GitHub
benwtrent commented on code in PR #12529: URL: https://github.com/apache/lucene/pull/12529#discussion_r1310578040 ## lucene/core/src/java/org/apache/lucene/util/hnsw/RandomVectorScorerProvider.java: ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

[GitHub] [lucene] benwtrent commented on a diff in pull request #12529: Introduce a random vector scorer in HNSW builder/searcher

2023-08-30 Thread via GitHub
benwtrent commented on code in PR #12529: URL: https://github.com/apache/lucene/pull/12529#discussion_r1310584453 ## lucene/core/src/java/org/apache/lucene/util/hnsw/RandomVectorScorer.java: ## @@ -0,0 +1,114 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [lucene] benwtrent commented on a diff in pull request #12529: Introduce a random vector scorer in HNSW builder/searcher

2023-08-30 Thread via GitHub
benwtrent commented on code in PR #12529: URL: https://github.com/apache/lucene/pull/12529#discussion_r1310567808 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ## @@ -231,30 +184,34 @@ private void initializeFromGraph( } } - private void a

[GitHub] [lucene] benwtrent commented on a diff in pull request #12529: Introduce a random vector scorer in HNSW builder/searcher

2023-08-30 Thread via GitHub
benwtrent commented on code in PR #12529: URL: https://github.com/apache/lucene/pull/12529#discussion_r1310555224 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ## @@ -205,24 +168,14 @@ private void initializeFromGraph( initializedNodes.add

[GitHub] [lucene] benwtrent commented on a diff in pull request #12529: Introduce a random vector scorer in HNSW builder/searcher

2023-08-30 Thread via GitHub
benwtrent commented on code in PR #12529: URL: https://github.com/apache/lucene/pull/12529#discussion_r1310431863 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ## @@ -165,11 +134,7 @@ private HnswGraphBuilder( * @param vectorsToAdd the vectors fo

[GitHub] [lucene] benwtrent commented on a diff in pull request #12529: Introduce a random vector scorer in HNSW builder/searcher

2023-08-30 Thread via GitHub
benwtrent commented on code in PR #12529: URL: https://github.com/apache/lucene/pull/12529#discussion_r1310435808 ## lucene/core/src/java/org/apache/lucene/util/hnsw/RandomVectorScorerProvider.java: ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

[GitHub] [lucene] benwtrent commented on a diff in pull request #12529: Introduce a random vector scorer in HNSW builder/searcher

2023-08-30 Thread via GitHub
benwtrent commented on code in PR #12529: URL: https://github.com/apache/lucene/pull/12529#discussion_r1310584183 ## lucene/core/src/java/org/apache/lucene/util/hnsw/RandomVectorScorer.java: ## @@ -0,0 +1,114 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [lucene] jainankitk commented on issue #12527: Optimize readInts24 performance for DocIdsWriter

2023-08-30 Thread via GitHub
jainankitk commented on issue #12527: URL: https://github.com/apache/lucene/issues/12527#issuecomment-1699706675 With the change: Run 1: ``` |Heap used for stored fields || 0 | MB |

[GitHub] [lucene] jainankitk commented on issue #12527: Optimize readInts24 performance for DocIdsWriter

2023-08-30 Thread via GitHub
jainankitk commented on issue #12527: URL: https://github.com/apache/lucene/issues/12527#issuecomment-1699704470 I ran the workload a few more times, and somehow the difference was not as much: Without the patch Run 1: ``` |Heap used for

[GitHub] [lucene] jpountz commented on pull request #12526: Speed up disjunctions by computing estimations of the score of the k-th top hit up-front.

2023-08-30 Thread via GitHub
jpountz commented on PR #12526: URL: https://github.com/apache/lucene/pull/12526#issuecomment-1699741992 I added a few tasks that I'm adding here for reference to see how it plays with disjunctions that have more terms or different document frequencies: ``` OrHighVeryLow: 2005 mous

[GitHub] [lucene] vigyasharma commented on issue #12524: Test failure in TestIndexWriter.testDeleteUnusedFiles on Windows 11

2023-08-30 Thread via GitHub
vigyasharma commented on issue #12524: URL: https://github.com/apache/lucene/issues/12524#issuecomment-1699740128 Thanks for surfacing this issue @vsop-479. Going by the documentation in code, we rely on Windows to keep the file open even though it's merged away because we have a reader ope

[GitHub] [lucene] jpountz commented on pull request #12520: Honor topvalue while determining isMissingvalueCompetitive in case bottom is not set

2023-08-30 Thread via GitHub
jpountz commented on PR #12520: URL: https://github.com/apache/lucene/pull/12520#issuecomment-1699744965 I'll merge next week if nobody beats me to it.

[GitHub] [lucene] msokolov commented on pull request #12526: Speed up disjunctions by computing estimations of the score of the k-th top hit up-front.

2023-08-30 Thread via GitHub
msokolov commented on PR #12526: URL: https://github.com/apache/lucene/pull/12526#issuecomment-1699796771 > OrHighHigh sees a major speedup: I think you meant OrHighLow, which is indeed very nicely improved

[GitHub] [lucene] jpountz commented on pull request #12526: Speed up disjunctions by computing estimations of the score of the k-th top hit up-front.

2023-08-30 Thread via GitHub
jpountz commented on PR #12526: URL: https://github.com/apache/lucene/pull/12526#issuecomment-1699812531 Oops, yes indeed OrHighLow.

[GitHub] [lucene] jimczi commented on a diff in pull request #12529: Introduce a random vector scorer in HNSW builder/searcher

2023-08-30 Thread via GitHub
jimczi commented on code in PR #12529: URL: https://github.com/apache/lucene/pull/12529#discussion_r1310859816 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ## @@ -165,11 +134,7 @@ private HnswGraphBuilder( * @param vectorsToAdd the vectors for w

[GitHub] [lucene] jimczi commented on a diff in pull request #12529: Introduce a random vector scorer in HNSW builder/searcher

2023-08-30 Thread via GitHub
jimczi commented on code in PR #12529: URL: https://github.com/apache/lucene/pull/12529#discussion_r1310861588 ## lucene/core/src/java/org/apache/lucene/util/hnsw/RandomVectorScorerProvider.java: ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [lucene] jimczi commented on a diff in pull request #12529: Introduce a random vector scorer in HNSW builder/searcher

2023-08-30 Thread via GitHub
jimczi commented on code in PR #12529: URL: https://github.com/apache/lucene/pull/12529#discussion_r1310862094 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ## @@ -205,24 +168,14 @@ private void initializeFromGraph( initializedNodes.add(ne

[GitHub] [lucene] jimczi commented on a diff in pull request #12529: Introduce a random vector scorer in HNSW builder/searcher

2023-08-30 Thread via GitHub
jimczi commented on code in PR #12529: URL: https://github.com/apache/lucene/pull/12529#discussion_r1310862426 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ## @@ -231,30 +184,34 @@ private void initializeFromGraph( } } - private void addV

[GitHub] [lucene] jimczi commented on a diff in pull request #12529: Introduce a random vector scorer in HNSW builder/searcher

2023-08-30 Thread via GitHub
jimczi commented on code in PR #12529: URL: https://github.com/apache/lucene/pull/12529#discussion_r1310869676 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ## @@ -366,38 +324,11 @@ private void popToScratch(GraphBuilderKnnCollector candidates) {

[GitHub] [lucene] jimczi commented on a diff in pull request #12529: Introduce a random vector scorer in HNSW builder/searcher

2023-08-30 Thread via GitHub
jimczi commented on code in PR #12529: URL: https://github.com/apache/lucene/pull/12529#discussion_r1310870581 ## lucene/core/src/java/org/apache/lucene/util/hnsw/RandomVectorScorerProvider.java: ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [lucene] jimczi commented on a diff in pull request #12529: Introduce a random vector scorer in HNSW builder/searcher

2023-08-30 Thread via GitHub
jimczi commented on code in PR #12529: URL: https://github.com/apache/lucene/pull/12529#discussion_r1310878840 ## lucene/core/src/java/org/apache/lucene/util/hnsw/RandomVectorScorer.java: ## @@ -0,0 +1,114 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] [lucene] vsop-479 commented on issue #12524: Test failure in TestIndexWriter.testDeleteUnusedFiles on Windows 11

2023-08-30 Thread via GitHub
vsop-479 commented on issue #12524: URL: https://github.com/apache/lucene/issues/12524#issuecomment-1700297940 > Maybe something changed in Windows 11 which does not maintain this behavior anymore? I think you are right @vigyasharma . I will try to find another Windows version to ver