[GitHub] [lucene] zhaih commented on a diff in pull request #12114: Use radix sort to sort postings when index sorting is enabled.

2023-01-31 Thread via GitHub
zhaih commented on code in PR #12114: URL: https://github.com/apache/lucene/pull/12114#discussion_r1091567169 ## lucene/core/src/java/org/apache/lucene/index/FreqProxTermsWriter.java: ## @@ -379,27 +272,24 @@ public int advance(final int target) throws IOException { @Over
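When index sorting is enabled, each term's postings must be reordered by the new doc IDs before they are written. The sketch below illustrates the technique named in the PR title. It is an LSD radix sort over non-negative int doc IDs that carries a parallel `freqs` array along. It is not the code in `FreqProxTermsWriter`, and the class and method names are hypothetical.

```java
import java.util.Arrays;

public final class PostingsRadixSort {

  /**
   * Sorts docs ascending with an LSD radix sort (8 bits per pass, 4 passes)
   * and applies the same permutation to freqs. Each pass is stable, which is
   * what makes LSD radix sort correct.
   */
  static void sort(int[] docs, int[] freqs) {
    int n = docs.length;
    int[] docBuf = new int[n];
    int[] freqBuf = new int[n];
    for (int shift = 0; shift < 32; shift += 8) {
      // counts[b + 1] = number of entries whose current byte equals b
      int[] counts = new int[257];
      for (int i = 0; i < n; i++) {
        counts[((docs[i] >>> shift) & 0xFF) + 1]++;
      }
      // Prefix sums: counts[b] becomes the start offset of bucket b
      for (int b = 0; b < 256; b++) {
        counts[b + 1] += counts[b];
      }
      // Scatter in input order so equal keys keep their relative order
      for (int i = 0; i < n; i++) {
        int dst = counts[(docs[i] >>> shift) & 0xFF]++;
        docBuf[dst] = docs[i];
        freqBuf[dst] = freqs[i];
      }
      System.arraycopy(docBuf, 0, docs, 0, n);
      System.arraycopy(freqBuf, 0, freqs, 0, n);
    }
  }

  public static void main(String[] args) {
    int[] docs = {42, 7, 300, 7000, 1};
    int[] freqs = {3, 1, 5, 2, 9};
    sort(docs, freqs);
    System.out.println(Arrays.toString(docs) + " " + Arrays.toString(freqs));
  }
}
```

Unlike a comparison sort, the cost is a fixed number of linear passes, independent of how the doc IDs were shuffled by the index sort.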

[GitHub] [lucene] rmuir commented on pull request #12118: Add `FeatureQuery` weight caching in non-scoring case

2023-01-31 Thread via GitHub
rmuir commented on PR #12118: URL: https://github.com/apache/lucene/pull/12118#issuecomment-1410148671 i don't understand this issue. The only purpose of this query is for scoring. If you don't want scores, drop the clause completely. -- This is an automated message from the Apache Git Se

[GitHub] [lucene] benwtrent commented on pull request #12050: Reuse HNSW graph for intialization during merge

2023-01-31 Thread via GitHub
benwtrent commented on PR #12050: URL: https://github.com/apache/lucene/pull/12050#issuecomment-1410368372 > Ah since Lucene95 has just been released, I think we should move this to Lucene 96? @zhaih Do you mean create a new Codec version? From what I can tell, nothing in the
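PR #12050 proposes seeding the merged segment's HNSW graph from an existing segment's graph instead of building it from scratch. The sketch below shows only the ordinal bookkeeping this implies, under stated assumptions. It treats the graph as a single-level adjacency list, whereas real HNSW graphs are multi-level. It also assumes a hypothetical `oldToNew` map that sends each initializer node to its ordinal in the merged segment, or -1 for deleted docs. Names are illustrative, not `Lucene95HnswVectorsWriter` code.

```java
import java.util.ArrayList;
import java.util.List;

public final class GraphInitSketch {

  /**
   * Copies a single-level adjacency list into a graph sized for the merged
   * segment, translating ordinals via oldToNew. Nodes mapped to -1 are
   * dropped together with edges that point at them. Merged nodes not covered
   * by the initializer start empty and would then be inserted with the
   * normal HNSW insertion procedure.
   */
  static List<List<Integer>> remap(List<List<Integer>> oldGraph, int[] oldToNew, int newSize) {
    List<List<Integer>> newGraph = new ArrayList<>(newSize);
    for (int i = 0; i < newSize; i++) {
      newGraph.add(new ArrayList<>());
    }
    for (int oldNode = 0; oldNode < oldGraph.size(); oldNode++) {
      int newNode = oldToNew[oldNode];
      if (newNode < 0) {
        continue; // node was deleted, nothing to copy
      }
      for (int oldNeighbor : oldGraph.get(oldNode)) {
        int newNeighbor = oldToNew[oldNeighbor];
        if (newNeighbor >= 0) {
          newGraph.get(newNode).add(newNeighbor);
        }
      }
    }
    return newGraph;
  }
}
```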

[GitHub] [lucene] benwtrent commented on pull request #12118: Add `FeatureQuery` weight caching in non-scoring case

2023-01-31 Thread via GitHub
benwtrent commented on PR #12118: URL: https://github.com/apache/lucene/pull/12118#issuecomment-1410402325 @rmuir > i don't understand this issue. The only purpose of this query is for scoring. If you don't want scores, drop the clause completely. A `FeatureField` provides a u

[GitHub] [lucene] rmuir commented on pull request #12118: Add `FeatureQuery` weight caching in non-scoring case

2023-01-31 Thread via GitHub
rmuir commented on PR #12118: URL: https://github.com/apache/lucene/pull/12118#issuecomment-1410418901 So just rewrite it to a TermWeight in createWeight if scores are not needed? No need to duplicate the logic.

[GitHub] [lucene] rmuir commented on pull request #12118: Add `FeatureQuery` weight caching in non-scoring case

2023-01-31 Thread via GitHub
rmuir commented on PR #12118: URL: https://github.com/apache/lucene/pull/12118#issuecomment-1410426974 example pseudocode:
```
@Override
public Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, float boost) throws IOException {
  if (!scoreMode.needs
```

[GitHub] [lucene] benwtrent commented on pull request #12118: Add `FeatureQuery` weight caching in non-scoring case

2023-01-31 Thread via GitHub
benwtrent commented on PR #12118: URL: https://github.com/apache/lucene/pull/12118#issuecomment-1410445640 I like that @rmuir! It keeps the nice API for FeatureFields and removes code duplication.

[GitHub] [lucene] rmuir commented on pull request #12118: Add `FeatureQuery` weight caching in non-scoring case

2023-01-31 Thread via GitHub
rmuir commented on PR #12118: URL: https://github.com/apache/lucene/pull/12118#issuecomment-1410447591 stolen from SynonymQuery lol. and not sure about why it doesn't pass ScoreMode straight thru and instead hardcodes COMPLETE_NO_SCORES, seems wrong. but you got the idea.
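The suggestion in this thread is to avoid duplicating matching logic. When the caller does not need scores, `createWeight` hands back the weight of an equivalent term query. The self-contained model below uses hypothetical stand-in types (`Mode`, `SimpleWeight`, `TermMatchQuery`, `FeatureScoringQuery`), not Lucene's real `ScoreMode`, `Weight` or `TermQuery`. Following the last comment, it passes the score mode straight through rather than hardcoding a non-scoring mode.

```java
public final class DelegatingWeightSketch {

  /** Stand-in for a score mode: records whether scores are needed. */
  enum Mode {
    COMPLETE(true), COMPLETE_NO_SCORES(false), TOP_SCORES(true);

    final boolean needsScores;

    Mode(boolean needsScores) {
      this.needsScores = needsScores;
    }
  }

  /** Stand-in for a weight; describe() just shows which path was taken. */
  interface SimpleWeight {
    String describe();
  }

  /** Plain term matching, reused by the feature query when scores are not needed. */
  record TermMatchQuery(String field, String term) {
    SimpleWeight createWeight(Mode mode) {
      return () -> "term:" + field + ":" + term + " mode=" + mode;
    }
  }

  record FeatureScoringQuery(String field, String feature) {
    SimpleWeight createWeight(Mode mode) {
      if (!mode.needsScores) {
        // No scores needed: delegate to the term query instead of duplicating its logic,
        // passing the caller's mode through unchanged.
        return new TermMatchQuery(field, feature).createWeight(mode);
      }
      return () -> "feature:" + field + ":" + feature + " mode=" + mode;
    }
  }
}
```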

[GitHub] [lucene] jpountz commented on a diff in pull request #12116: Improve document API for stored fields.

2023-01-31 Thread via GitHub
jpountz commented on code in PR #12116: URL: https://github.com/apache/lucene/pull/12116#discussion_r1092033320 ## lucene/core/src/java/org/apache/lucene/document/StoredValue.java: ## @@ -0,0 +1,184 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more +

[GitHub] [lucene] rmuir commented on pull request #12118: Add `FeatureQuery` weight caching in non-scoring case

2023-01-31 Thread via GitHub
rmuir commented on PR #12118: URL: https://github.com/apache/lucene/pull/12118#issuecomment-1410515203 I'm ok with changes but i still don't understand the use-case. Pulling all documents containing features, then calculating your own score throws away all the efficiency of FeatureField (e.

[GitHub] [lucene] rmuir commented on pull request #12118: Add `FeatureQuery` weight caching in non-scoring case

2023-01-31 Thread via GitHub
rmuir commented on PR #12118: URL: https://github.com/apache/lucene/pull/12118#issuecomment-1410532359 does it make sense? From my perspective the reason to use `FeatureField` is for the WAND-skipping. So if you ask for it not to do scoring, it can't skip, and it defeats the entire purpose.
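The WAND-skipping argument works like this. If the index knows an upper bound on scores within a block of documents, a top-k collector can skip any block whose bound cannot beat the current k-th best score. The sketch below is a simplified block-max illustration with hypothetical names and data, not Lucene's implementation. Without scoring there is no competitive threshold, so every block would have to be visited.

```java
import java.util.PriorityQueue;

public final class BlockMaxSkipSketch {

  /** Number of blocks whose documents were actually scored in the last call. */
  static int blocksScored;

  /**
   * Returns the top-k scores in descending order. blocks[i] holds the scores
   * of the documents in block i; blockMax[i] is an upper bound on all of them.
   */
  static double[] topK(double[][] blocks, double[] blockMax, int k) {
    PriorityQueue<Double> heap = new PriorityQueue<>(); // min-heap holding the current top-k
    blocksScored = 0;
    for (int b = 0; b < blocks.length; b++) {
      // Once k hits are collected, a block whose bound can't beat the k-th best is skipped.
      if (heap.size() == k && blockMax[b] <= heap.peek()) {
        continue;
      }
      blocksScored++;
      for (double score : blocks[b]) {
        if (heap.size() < k) {
          heap.add(score);
        } else if (score > heap.peek()) {
          heap.poll();
          heap.add(score);
        }
      }
    }
    double[] result = new double[heap.size()];
    for (int i = result.length - 1; i >= 0; i--) {
      result[i] = heap.poll(); // smallest first, so fill from the end
    }
    return result;
  }
}
```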

[GitHub] [lucene] jpountz commented on pull request #12118: Add `FeatureQuery` weight caching in non-scoring case

2023-01-31 Thread via GitHub
jpountz commented on PR #12118: URL: https://github.com/apache/lucene/pull/12118#issuecomment-1410736154 For the record this need comes from implementing sparse retrieval similarly to what's discussed at #11799, so `FeatureField` no longer stores features but regular terms here. One option

[GitHub] [lucene] jpountz commented on a diff in pull request #12114: Use radix sort to sort postings when index sorting is enabled.

2023-01-31 Thread via GitHub
jpountz commented on code in PR #12114: URL: https://github.com/apache/lucene/pull/12114#discussion_r1092226923 ## lucene/core/src/java/org/apache/lucene/index/FreqProxTermsWriter.java: ## @@ -379,27 +272,24 @@ public int advance(final int target) throws IOException { @Ov

[GitHub] [lucene] jmazanec15 commented on a diff in pull request #12050: Reuse HNSW graph for intialization during merge

2023-01-31 Thread via GitHub
jmazanec15 commented on code in PR #12050: URL: https://github.com/apache/lucene/pull/12050#discussion_r1092275814 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsWriter.java: ## @@ -489,6 +485,220 @@ public void mergeOneField(FieldInfo fieldInfo, Me

[GitHub] [lucene] jpountz commented on pull request #11900: Reduce bloom filter size by using the optimal count for hash functions.

2023-01-31 Thread via GitHub
jpountz commented on PR #11900: URL: https://github.com/apache/lucene/pull/11900#issuecomment-1410801742 @jfboeuf I took a stab at removing the versioning logic to simplify the change, I plan on merging it soon if this works for you.
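The title of PR #11900 refers to the optimal number of hash functions. For a Bloom filter with m bits holding n elements, the textbook optimum is k = (m/n) * ln 2. This value minimizes the approximate false-positive rate p = (1 - e^(-kn/m))^k. The sketch below computes both formulas. The names are hypothetical, and it is not the code from the PR.

```java
public final class BloomMath {

  /** Optimal hash count k = (m / n) * ln 2, rounded and clamped to at least 1. */
  static int optimalNumHashes(long numBits, long numElements) {
    if (numBits <= 0 || numElements <= 0) {
      throw new IllegalArgumentException("sizes must be positive");
    }
    double k = (double) numBits / numElements * Math.log(2);
    return Math.max(1, (int) Math.round(k));
  }

  /** Approximate false-positive probability (1 - e^(-k*n/m))^k. */
  static double falsePositiveRate(long numBits, long numElements, int numHashes) {
    double exponent = -(double) numHashes * numElements / numBits;
    return Math.pow(1 - Math.exp(exponent), numHashes);
  }

  public static void main(String[] args) {
    long m = 1024;
    long n = 100;
    int k = optimalNumHashes(m, n);
    System.out.println("k=" + k + " p=" + falsePositiveRate(m, n, k));
  }
}
```

With too few hashes, each lookup checks too few bits. With too many, the filter fills up faster. The optimum balances the two.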

[GitHub] [lucene] zhaih commented on pull request #12050: Reuse HNSW graph for intialization during merge

2023-01-31 Thread via GitHub
zhaih commented on PR #12050: URL: https://github.com/apache/lucene/pull/12050#issuecomment-1410818168 > Do you mean create a new Codec version? From what I can tell, nothing in the underlying storage format has changed and the only reason Lucene95HnswVectorsReader is cast is for Lucene95Hn

[GitHub] [lucene] benwtrent commented on a diff in pull request #12050: Reuse HNSW graph for intialization during merge

2023-01-31 Thread via GitHub
benwtrent commented on code in PR #12050: URL: https://github.com/apache/lucene/pull/12050#discussion_r1092319484 ## lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborQueue.java: ## @@ -56,6 +56,8 @@ long apply(long v) { // Whether the search stopped early because it r

[GitHub] [lucene] jmazanec15 commented on a diff in pull request #12050: Reuse HNSW graph for intialization during merge

2023-01-31 Thread via GitHub
jmazanec15 commented on code in PR #12050: URL: https://github.com/apache/lucene/pull/12050#discussion_r1092337143 ## lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborQueue.java: ## @@ -56,6 +56,8 @@ long apply(long v) { // Whether the search stopped early because it

[GitHub] [lucene] benwtrent commented on a diff in pull request #12050: Reuse HNSW graph for intialization during merge

2023-01-31 Thread via GitHub
benwtrent commented on code in PR #12050: URL: https://github.com/apache/lucene/pull/12050#discussion_r1092368089 ## lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborQueue.java: ## @@ -56,6 +56,8 @@ long apply(long v) { // Whether the search stopped early because it r

[GitHub] [lucene] javanna opened a new pull request, #12121: Remove VectorUtil#toBytesRef

2023-01-31 Thread via GitHub
javanna opened a new pull request, #12121: URL: https://github.com/apache/lucene/pull/12121 The method is currently only used in its corresponding test method.

[GitHub] [lucene] javanna opened a new pull request, #12122: Adjust return type for VectorUtil methods

2023-01-31 Thread via GitHub
javanna opened a new pull request, #12122: URL: https://github.com/apache/lucene/pull/12122 Two of the methods (squareDistance and dotProduct) that take byte arrays return a float while the variable used to store the value is an int. They can just return an int.
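Every term of a byte-vector dot product or squared distance is an integer, so accumulating into and returning an `int` loses nothing. Below is a standalone sketch of that shape, using a hypothetical class name rather than Lucene's `VectorUtil` source. Each product is at most 128 * 128 = 16384 in magnitude, and each squared difference is at most 255 * 255 = 65025. An `int` accumulator is therefore safe up to roughly 33,000 dimensions for the squared distance, and more for the dot product.

```java
public final class ByteVectorMath {

  /** Dot product of two equal-length byte vectors, accumulated exactly in an int. */
  static int dotProduct(byte[] a, byte[] b) {
    checkLengths(a, b);
    int total = 0;
    for (int i = 0; i < a.length; i++) {
      total += a[i] * b[i]; // both bytes are promoted to int before multiplying
    }
    return total;
  }

  /** Squared Euclidean distance of two equal-length byte vectors, as an int. */
  static int squareDistance(byte[] a, byte[] b) {
    checkLengths(a, b);
    int total = 0;
    for (int i = 0; i < a.length; i++) {
      int diff = a[i] - b[i]; // ranges from -255 to 255
      total += diff * diff;
    }
    return total;
  }

  private static void checkLengths(byte[] a, byte[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector dimensions differ: " + a.length + " != " + b.length);
    }
  }
}
```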

[GitHub] [lucene] benwtrent commented on a diff in pull request #12122: Adjust return type for VectorUtil methods

2023-01-31 Thread via GitHub
benwtrent commented on code in PR #12122: URL: https://github.com/apache/lucene/pull/12122#discussion_r1092395951 ## lucene/core/src/java/org/apache/lucene/util/VectorUtil.java: ## @@ -181,7 +181,7 @@ private static float squareDistanceUnrolled(float[] v1, float[] v2, int index

[GitHub] [lucene] javanna commented on a diff in pull request #12122: Adjust return type for VectorUtil methods

2023-01-31 Thread via GitHub
javanna commented on code in PR #12122: URL: https://github.com/apache/lucene/pull/12122#discussion_r1092402069 ## lucene/core/src/java/org/apache/lucene/util/VectorUtil.java: ## @@ -181,7 +181,7 @@ private static float squareDistanceUnrolled(float[] v1, float[] v2, int index)

[GitHub] [lucene] javanna commented on issue #12028: Add newSetQuery for IntField, LongField, FloatField, DoubleField

2023-01-31 Thread via GitHub
javanna commented on issue #12028: URL: https://github.com/apache/lucene/issues/12028#issuecomment-1411142472 Looks like this issue is addressed with the PR above? Can we close it or is there anything left to do that I am missing?

[GitHub] [lucene] mdmarshmallow commented on pull request #11958: GITHUB-11868: Add FilterIndexInput and FilterIndexOutput wrapper classes

2023-01-31 Thread via GitHub
mdmarshmallow commented on PR #11958: URL: https://github.com/apache/lucene/pull/11958#issuecomment-1411221077 Hi, I was wondering if this could be merged. I think I addressed all the feedback given here and it has been approved for quite a while now. Thanks!