[GitHub] [lucene] romseygeek commented on a diff in pull request #860: LUCENE-10553: Fix WANDScorer's handling of 0 and +Infty.

2022-05-03 Thread GitBox
romseygeek commented on code in PR #860: URL: https://github.com/apache/lucene/pull/860#discussion_r863538587 ## lucene/core/src/java/org/apache/lucene/search/WANDScorer.java: ## @@ -86,7 +86,6 @@ static int scalingFactor(float f) { * sure we do not miss any matches. */

[GitHub] [lucene] mayya-sharipova commented on pull request #792: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc

2022-05-03 Thread GitBox
mayya-sharipova commented on PR #792: URL: https://github.com/apache/lucene/pull/792#issuecomment-1116030632 @LuXugang I've created a feature branch: https://github.com/apache/lucene/tree/vectors-disi-direct From my side, we are good to merge this PR into it, but I wonder if @jtibshi

[jira] [Commented] (LUCENE-10436) Combine DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery into a single FieldExistsQuery?

2022-05-03 Thread Alan Woodward (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531180#comment-17531180 ] Alan Woodward commented on LUCENE-10436: The backport of this has inadvertently

[jira] [Created] (LUCENE-10555) avoid repeated NumericLeafComparator setScorer called

2022-05-03 Thread jianping weng (Jira)
jianping weng created LUCENE-10555: -- Summary: avoid repeated NumericLeafComparator setScorer called Key: LUCENE-10555 URL: https://issues.apache.org/jira/browse/LUCENE-10555 Project: Lucene - Core

[jira] [Updated] (LUCENE-10555) avoid repeated NumericLeafComparator setScorer calls

2022-05-03 Thread jianping weng (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jianping weng updated LUCENE-10555: --- Summary: avoid repeated NumericLeafComparator setScorer calls (was: avoid repeated Numeric

[GitHub] [lucene] wjp719 opened a new pull request, #864: LUCENE-10555: avoid repeated NumericLeafComparator setScorer calls

2022-05-03 Thread GitBox
wjp719 opened a new pull request, #864: URL: https://github.com/apache/lucene/pull/864 Elasticsearch uses `CancellableBulkScorer` to quickly cancel long-running query execution by splitting one segment's docs into many doc sets. For every doc set, `collector.setScorer(scorer)` is called, then

[GitHub] [lucene] mocobeta commented on pull request #833: LUCENE-10411: Add NN vectors support to ExitableDirectoryReader

2022-05-03 Thread GitBox
mocobeta commented on PR #833: URL: https://github.com/apache/lucene/pull/833#issuecomment-1116116020 Hi, I am very new to this issue and interested in adding a benchmark for ExitableDirectoryReader to luceneutil. I think it wouldn't be a "benchmark" since it won't measure search performance,

[GitHub] [lucene] mocobeta commented on a diff in pull request #833: LUCENE-10411: Add NN vectors support to ExitableDirectoryReader

2022-05-03 Thread GitBox
mocobeta commented on code in PR #833: URL: https://github.com/apache/lucene/pull/833#discussion_r863835107 ## lucene/core/src/test/org/apache/lucene/index/TestExitableDirectoryReader.java: ## @@ -428,6 +430,107 @@ public void testDocValues() throws IOException { directory.

[GitHub] [lucene] rmuir commented on a diff in pull request #633: LUCENE-10216: Use MergeScheduler and MergePolicy to run addIndexes(CodecReader[]) merges.

2022-05-03 Thread GitBox
rmuir commented on code in PR #633: URL: https://github.com/apache/lucene/pull/633#discussion_r863840319 ## lucene/test-framework/src/java/org/apache/lucene/tests/index/MockRandomMergePolicy.java: ## @@ -86,6 +86,20 @@ public MergeSpecification findMerges( return mergeSpec;

[GitHub] [lucene] mocobeta commented on a diff in pull request #833: LUCENE-10411: Add NN vectors support to ExitableDirectoryReader

2022-05-03 Thread GitBox
mocobeta commented on code in PR #833: URL: https://github.com/apache/lucene/pull/833#discussion_r863840972 ## lucene/core/src/test/org/apache/lucene/index/TestExitableDirectoryReader.java: ## @@ -428,6 +430,107 @@ public void testDocValues() throws IOException { directory.

[GitHub] [lucene] mocobeta commented on a diff in pull request #833: LUCENE-10411: Add NN vectors support to ExitableDirectoryReader

2022-05-03 Thread GitBox
mocobeta commented on code in PR #833: URL: https://github.com/apache/lucene/pull/833#discussion_r863855038 ## lucene/core/src/test/org/apache/lucene/index/TestExitableDirectoryReader.java: ## @@ -428,6 +430,107 @@ public void testDocValues() throws IOException { directory.

[GitHub] [lucene] LuXugang opened a new pull request, #865: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc

2022-05-03 Thread GitBox
LuXugang opened a new pull request, #865: URL: https://github.com/apache/lucene/pull/865 follow-up of https://github.com/apache/lucene/pull/792 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

[GitHub] [lucene] LuXugang commented on pull request #865: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc

2022-05-03 Thread GitBox
LuXugang commented on PR #865: URL: https://github.com/apache/lucene/pull/865#issuecomment-1116261200 Hi @mayya-sharipova , full PR presented. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [lucene] jpountz commented on pull request #833: LUCENE-10411: Add NN vectors support to ExitableDirectoryReader

2022-05-03 Thread GitBox
jpountz commented on PR #833: URL: https://github.com/apache/lucene/pull/833#issuecomment-1116331660 Actually I was thinking of search performance as something that we would like to measure, since these wrappers can induce some performance overhead. E.g. the baseline could run on the raw re

[jira] [Updated] (LUCENE-10527) Use bigger maxConn for last layer in HNSW

2022-05-03 Thread Julie Tibshirani (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-10527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julie Tibshirani updated LUCENE-10527: -- Description: Recently I was rereading the HNSW paper ([https://arxiv.org/pdf/1603.09

[GitHub] [lucene] jtibshirani commented on pull request #792: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc

2022-05-03 Thread GitBox
jtibshirani commented on PR #792: URL: https://github.com/apache/lucene/pull/792#issuecomment-1116736228 Thanks for running those benchmarks! I'm working on a quick review before you merge it. One thing I was surprised about with the benchmarks is that you find searches take 60ms or

[GitHub] [lucene] mocobeta commented on pull request #833: LUCENE-10411: Add NN vectors support to ExitableDirectoryReader

2022-05-03 Thread GitBox
mocobeta commented on PR #833: URL: https://github.com/apache/lucene/pull/833#issuecomment-1116850417 > the baseline could run on the raw reader and the contender would wrap the reader with ExitableDirectoryReader and a very large timeout that's almost certainly not going to be hit, so that

[GitHub] [lucene] mayya-sharipova commented on pull request #862: LUCENE-9848 Sort HNSW graph neighbors for construction

2022-05-03 Thread GitBox
mayya-sharipova commented on PR #862: URL: https://github.com/apache/lucene/pull/862#issuecomment-1116854352 @msokolov Thanks so much for your feedback. I've addressed it in 160904904c94ffd4d194fc2509124c0e2eb9c44a. I've also done another set of benchmarks with new changes, this time

[GitHub] [lucene] jtibshirani commented on a diff in pull request #792: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc

2022-05-03 Thread GitBox
jtibshirani commented on code in PR #792: URL: https://github.com/apache/lucene/pull/792#discussion_r864338089 ## lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsReader.java: ## @@ -320,13 +323,19 @@ private static class FieldEntry { final int numL

[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #862: LUCENE-9848 Sort HNSW graph neighbors for construction

2022-05-03 Thread GitBox
mayya-sharipova commented on code in PR #862: URL: https://github.com/apache/lucene/pull/862#discussion_r864407239 ## lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java: ## @@ -21,32 +21,64 @@ /** * NeighborArray encodes the neighbors of a node and their mu

[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #862: LUCENE-9848 Sort HNSW graph neighbors for construction

2022-05-03 Thread GitBox
mayya-sharipova commented on code in PR #862: URL: https://github.com/apache/lucene/pull/862#discussion_r864407369 ## lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java: ## @@ -72,8 +104,38 @@ public void removeLast() { size--; } + public void removeI

[GitHub] [lucene] LuXugang commented on a diff in pull request #792: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc

2022-05-03 Thread GitBox
LuXugang commented on code in PR #792: URL: https://github.com/apache/lucene/pull/792#discussion_r864437136 ## lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsReader.java: ## @@ -335,23 +344,23 @@ private static class FieldEntry { dimension = inp

[GitHub] [lucene] LuXugang commented on a diff in pull request #792: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc

2022-05-03 Thread GitBox
LuXugang commented on code in PR #792: URL: https://github.com/apache/lucene/pull/792#discussion_r864438991 ## lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsReader.java: ## @@ -335,23 +344,23 @@ private static class FieldEntry { dimension = inp

[GitHub] [lucene] mocobeta opened a new pull request, #866: Make CONTRIBUTING.md a bit more succinct

2022-05-03 Thread GitBox
mocobeta opened a new pull request, #866: URL: https://github.com/apache/lucene/pull/866 I think it'd be good to try to keep the contributing guide succinct as far as possible so that it is helpful for experienced developers (the main targets of the document, I think). - remove speci

[GitHub] [lucene] mocobeta commented on pull request #866: Make CONTRIBUTING.md a bit more succinct

2022-05-03 Thread GitBox
mocobeta commented on PR #866: URL: https://github.com/apache/lucene/pull/866#issuecomment-1116946180 It's a trivial change in documentation. I'll wait until tomorrow then merge it to main. -- This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [lucene] LuXugang commented on a diff in pull request #792: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc

2022-05-03 Thread GitBox
LuXugang commented on code in PR #792: URL: https://github.com/apache/lucene/pull/792#discussion_r864475729 ## lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsReader.java: ## @@ -507,8 +515,90 @@ public BytesRef binaryValue(int targetOrd) throws IOExce

[GitHub] [lucene] LuXugang commented on a diff in pull request #792: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc

2022-05-03 Thread GitBox
LuXugang commented on code in PR #792: URL: https://github.com/apache/lucene/pull/792#discussion_r864476448 ## lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsReader.java: ## @@ -335,23 +344,23 @@ private static class FieldEntry { dimension = inp

[GitHub] [lucene] LuXugang commented on pull request #792: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc

2022-05-03 Thread GitBox
LuXugang commented on PR #792: URL: https://github.com/apache/lucene/pull/792#issuecomment-1116957549 > Maybe you could give some information about your machine and benchmark set-up (was there a warmup?) Here is my benchmark test demo: https://github.com/LuXugang/Lucene-7.5.0/commit