[GitHub] [lucene] zhaih commented on a diff in pull request #12114: Use radix sort to sort postings when index sorting is enabled.
zhaih commented on code in PR #12114:
URL: https://github.com/apache/lucene/pull/12114#discussion_r1091567169

## lucene/core/src/java/org/apache/lucene/index/FreqProxTermsWriter.java: ##
@@ -379,27 +272,24 @@ public int advance(final int target) throws IOException {

     @Override
     public int docID() {
-      return docIt < 0 ? -1 : docIt >= upto ? NO_MORE_DOCS : docs[docIt];
+      return docIt < 0 ? -1 : docs[docIt];
     }

     @Override
-    public int freq() throws IOException {
-      return withFreqs && docIt < upto ? freqs[docIt] : 1;
+    public int nextDoc() throws IOException {
+      return docs[++docIt];
     }

     @Override
-    public int nextDoc() throws IOException {
-      if (++docIt >= upto) return NO_MORE_DOCS;
-      return docs[docIt];
+    public long cost() {
+      return upTo;
     }

-    /** Returns the wrapped {@link PostingsEnum}. */
-    PostingsEnum getWrapped() {
-      return in;
+    @Override
+    public int freq() throws IOException {

Review Comment:
So we're removing `freq` support because no one is really using it?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12118: Add `FeatureQuery` weight caching in non-scoring case
rmuir commented on PR #12118:
URL: https://github.com/apache/lucene/pull/12118#issuecomment-1410148671

i don't understand this issue. The only purpose of this query is for scoring. If you don't want scores, drop the clause completely.
[GitHub] [lucene] benwtrent commented on pull request #12050: Reuse HNSW graph for intialization during merge
benwtrent commented on PR #12050:
URL: https://github.com/apache/lucene/pull/12050#issuecomment-1410368372

> Ah since Lucene95 has just been released, I think we should move this to Lucene 96?

@zhaih Do you mean create a new Codec version? From what I can tell, nothing in the underlying storage format has changed and the only reason `Lucene95HnswVectorsReader` is cast is for `Lucene95HnswVectorsReader#getGraph`, which already existed.

Could you clarify your concern?
[GitHub] [lucene] benwtrent commented on pull request #12118: Add `FeatureQuery` weight caching in non-scoring case
benwtrent commented on PR #12118:
URL: https://github.com/apache/lucene/pull/12118#issuecomment-1410402325

@rmuir

> i don't understand this issue. The only purpose of this query is for scoring. If you don't want scores, drop the clause completely.

A `FeatureField` provides a useful extension point for learned-sparse retrieval models (see linked issue). These models provide multiple `feature` and `score` pairs. These fields will likely match relevant documents that are not previously matched by other means.

A perfectly valid (and powerful) query would be `BooleanQuery` with multiple `SHOULD` clauses containing `FeatureQuery` objects (obviously, with minimum should match > 0). Note that no other field is being queried.

Dropping all those clauses would be a `match_all` and not accurately reflect the matching document set.

Being able to iterate the entire recall set for matching multiple `FeatureField` values will provide useful insight.
[GitHub] [lucene] rmuir commented on pull request #12118: Add `FeatureQuery` weight caching in non-scoring case
rmuir commented on PR #12118:
URL: https://github.com/apache/lucene/pull/12118#issuecomment-1410418901

So just rewrite it to a TermWeight in createWeight if scores are not needed? No need to duplicate the logic.
[GitHub] [lucene] rmuir commented on pull request #12118: Add `FeatureQuery` weight caching in non-scoring case
rmuir commented on PR #12118:
URL: https://github.com/apache/lucene/pull/12118#issuecomment-1410426974

example pseudocode:

```
@Override
public Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, float boost)
    throws IOException {
  if (!scoreMode.needsScores()) {
    // if scores are not needed, let TermWeight deal with optimizing that case.
    TermQuery tq = new TermQuery(new Term(fieldName, featureName));
    return searcher
        .rewrite(tq)
        .createWeight(searcher, ScoreMode.COMPLETE_NO_SCORES, boost);
  }
  ...
}
```

This would ensure that it gets all the optimizations of TermQuery and keep the code simple.
[GitHub] [lucene] benwtrent commented on pull request #12118: Add `FeatureQuery` weight caching in non-scoring case
benwtrent commented on PR #12118:
URL: https://github.com/apache/lucene/pull/12118#issuecomment-1410445640

I like that @rmuir! It keeps the nice API for FeatureFields and removes code duplication.
[GitHub] [lucene] rmuir commented on pull request #12118: Add `FeatureQuery` weight caching in non-scoring case
rmuir commented on PR #12118:
URL: https://github.com/apache/lucene/pull/12118#issuecomment-1410447591

stolen from SynonymQuery lol. and not sure about why it doesn't pass ScoreMode straight thru and instead hardcodes COMPLETE_NO_SCORES, seems wrong. but you got the idea.
[GitHub] [lucene] jpountz commented on a diff in pull request #12116: Improve document API for stored fields.
jpountz commented on code in PR #12116:
URL: https://github.com/apache/lucene/pull/12116#discussion_r1092033320

## lucene/core/src/java/org/apache/lucene/document/StoredValue.java: ##
@@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.document;
+
+import java.util.Objects;
+import org.apache.lucene.index.IndexableField;
+import org.apache.lucene.util.BytesRef;
+
+/**
+ * Abstraction around a stored value.
+ *
+ * @see IndexableField
+ */
+public final class StoredValue {
+
+  /** Type of a {@link StoredValue}. */
+  public enum Type {
+    INTEGER,
+    LONG,
+    FLOAT,
+    DOUBLE,
+    BINARY,
+    STRING;
+  }
+
+  private final Type type;
+  private int intValue;
+  private long longValue;
+  private float floatValue;
+  private double doubleValue;
+  private BytesRef binaryValue;
+  private String stringValue;
+
+  /** Ctor for integer values. */
+  public StoredValue(int value) {
+    type = Type.INTEGER;
+    intValue = value;
+  }
+
+  /** Ctor for long values. */
+  public StoredValue(long value) {
+    type = Type.LONG;
+    longValue = value;
+  }
+
+  /** Ctor for float values. */
+  public StoredValue(float value) {
+    type = Type.FLOAT;
+    floatValue = value;
+  }
+
+  /** Ctor for double values. */
+  public StoredValue(double value) {
+    type = Type.DOUBLE;
+    doubleValue = value;
+  }
+
+  /** Ctor for binary values. */
+  public StoredValue(BytesRef value) {
+    type = Type.BINARY;
+    binaryValue = Objects.requireNonNull(value);
+  }
+
+  /** Ctor for binary values. */
+  public StoredValue(String value) {
+    type = Type.STRING;
+    stringValue = Objects.requireNonNull(value);
+  }
+
+  /** Retrieve the type of the stored value. */
+  public Type getType() {
+    return type;
+  }
+
+  /** Set an integer value. */
+  public void setIntValue(int value) {
+    if (type != Type.INTEGER) {
+      throw new IllegalArgumentException("Cannot set an integer on a " + type + " value");
+    }
+    intValue = value;
+  }
+
+  /** Set a long value. */
+  public void setLongValue(long value) {
+    if (type != Type.LONG) {
+      throw new IllegalArgumentException("Cannot set a long on a " + type + " value");
+    }
+    longValue = value;
+  }
+
+  /** Set a float value. */
+  public void setFloatValue(float value) {
+    if (type != Type.FLOAT) {
+      throw new IllegalArgumentException("Cannot set a float on a " + type + " value");
+    }
+    floatValue = value;
+  }
+
+  /** Set a double value. */
+  public void setLongValue(double value) {
+    if (type != Type.DOUBLE) {

Review Comment:
Yes! Fixed.
[GitHub] [lucene] rmuir commented on pull request #12118: Add `FeatureQuery` weight caching in non-scoring case
rmuir commented on PR #12118:
URL: https://github.com/apache/lucene/pull/12118#issuecomment-1410515203

I'm ok with changes but i still don't understand the use-case. Pulling all documents containing features, then calculating your own score throws away all the efficiency of FeatureField (e.g. early termination) and will be way too slow as the worst-case is scoring `O(maxdoc)` which could be e.g. a billion.

It would be better to look at `Rescorer` api, e.g. keep the scores ON for the FeatureField, but pull top 500 or 1000 and re-rank those with anything fancy. It keeps everything fast and bounded.
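The "pull the top N, then re-rank" idea above can be sketched without any Lucene classes. This is only an illustration of the bounded two-phase pattern: `TwoPhaseRankingSketch`, `Hit`, `topN` and `rerank` are hypothetical names, not the `Rescorer` API, and the in-memory sort stands in for a real top-N collector that would skip non-competitive documents.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntToDoubleFunction;

/** Hypothetical two-phase ranking sketch; not Lucene's Rescorer API. */
final class TwoPhaseRankingSketch {

  /** A (doc, score) pair; stands in for a search hit. */
  static final class Hit {
    final int doc;
    final double score;

    Hit(int doc, double score) {
      this.doc = doc;
      this.score = score;
    }
  }

  /** First pass: keep only the n best hits by their cheap score. */
  static List<Hit> topN(List<Hit> hits, int n) {
    List<Hit> sorted = new ArrayList<>(hits);
    sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
    return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
  }

  /** Second pass: apply the expensive score to the bounded candidate list only. */
  static List<Hit> rerank(List<Hit> candidates, IntToDoubleFunction expensiveScore) {
    List<Hit> rescored = new ArrayList<>();
    for (Hit h : candidates) {
      rescored.add(new Hit(h.doc, expensiveScore.applyAsDouble(h.doc)));
    }
    rescored.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
    return rescored;
  }
}
```

The expensive function runs at most N times regardless of how many documents matched, which is the "fast and bounded" property the comment asks for.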
[GitHub] [lucene] rmuir commented on pull request #12118: Add `FeatureQuery` weight caching in non-scoring case
rmuir commented on PR #12118:
URL: https://github.com/apache/lucene/pull/12118#issuecomment-1410532359

does it make sense? From my perspective the reason to use `FeatureField` is for the WAND-skipping. So if you ask for it not to do scoring, it can't skip, and it defeats the entire purpose.
[GitHub] [lucene] jpountz commented on pull request #12118: Add `FeatureQuery` weight caching in non-scoring case
jpountz commented on PR #12118:
URL: https://github.com/apache/lucene/pull/12118#issuecomment-1410736154

For the record this need comes from implementing sparse retrieval similarly to what's discussed at #11799, so `FeatureField` no longer stores features but regular terms here. One option is to reuse `FeatureField` for this. Another option could be to reuse `TermQuery` by configuring the `Similarity`'s `SimScorer` to properly decode the frequency.
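To make "decode the frequency" concrete, here is a minimal sketch of storing a positive float weight in an integer term frequency and reading it back. The bit layout (dropping the low 15 bits of the IEEE 754 representation) is an assumption for illustration, and `FeatureFreqCodecSketch`, `encode` and `decode` are hypothetical names rather than Lucene API.

```java
/** Hypothetical codec sketch: packs a positive float weight into an int "frequency". */
final class FeatureFreqCodecSketch {

  /** Keeps the sign, exponent and top 8 mantissa bits of the float (assumed layout). */
  static int encode(float weight) {
    if (!(weight > 0f) || Float.isInfinite(weight)) {
      throw new IllegalArgumentException("weight must be positive and finite, got " + weight);
    }
    return Float.floatToIntBits(weight) >>> 15;
  }

  /** Restores a float from the stored frequency; the dropped mantissa bits read as zero. */
  static float decode(int freq) {
    return Float.intBitsToFloat(freq << 15);
  }
}
```

Because positive floats order the same way as their bit patterns, a larger weight never encodes to a smaller frequency, and the decoded value rounds down to slightly less than or equal to the original. A scorer that calls something like `decode` on the term frequency would then see the approximate weight instead of a raw count.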
[GitHub] [lucene] jpountz commented on a diff in pull request #12114: Use radix sort to sort postings when index sorting is enabled.
jpountz commented on code in PR #12114:
URL: https://github.com/apache/lucene/pull/12114#discussion_r1092226923

## lucene/core/src/java/org/apache/lucene/index/FreqProxTermsWriter.java: ##
@@ -379,27 +272,24 @@ public int advance(final int target) throws IOException {

     @Override
     public int docID() {
-      return docIt < 0 ? -1 : docIt >= upto ? NO_MORE_DOCS : docs[docIt];
+      return docIt < 0 ? -1 : docs[docIt];
     }

     @Override
-    public int freq() throws IOException {
-      return withFreqs && docIt < upto ? freqs[docIt] : 1;
+    public int nextDoc() throws IOException {
+      return docs[++docIt];
     }

     @Override
-    public int nextDoc() throws IOException {
-      if (++docIt >= upto) return NO_MORE_DOCS;
-      return docs[docIt];
+    public long cost() {
+      return upTo;
     }

-    /** Returns the wrapped {@link PostingsEnum}. */
-    PostingsEnum getWrapped() {
-      return in;
+    @Override
+    public int freq() throws IOException {

Review Comment:
With this change, fields that have frequencies are now handled by `SortingPostingsEnum` while `SortingDocsEnum` focuses on fields that only index docs.
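The new `nextDoc()` in the diff drops the explicit `upto` bounds check. The quoted hunk does not show how the array is filled, so the following is a sketch under one assumption: the sorted doc IDs are followed by a single `NO_MORE_DOCS` sentinel. `SentinelDocsIteratorSketch` is a hypothetical stand-alone class, not the PR's `SortingDocsEnum`.

```java
import java.util.Arrays;

/** Hypothetical doc ID iterator that relies on a trailing sentinel instead of a bounds check. */
final class SentinelDocsIteratorSketch {
  /** Same value Lucene uses for DocIdSetIterator.NO_MORE_DOCS. */
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final int[] docs; // upTo sorted doc IDs followed by one NO_MORE_DOCS entry (assumed layout)
  private final int upTo;
  private int docIt = -1;

  SentinelDocsIteratorSketch(int[] sortedDocs) {
    upTo = sortedDocs.length;
    docs = Arrays.copyOf(sortedDocs, upTo + 1);
    docs[upTo] = NO_MORE_DOCS;
  }

  /** Returns -1 before iteration starts, then the current doc or the sentinel. */
  int docID() {
    return docIt < 0 ? -1 : docs[docIt];
  }

  /** No branch needed: the last slot already holds NO_MORE_DOCS. */
  int nextDoc() {
    return docs[++docIt];
  }

  /** Number of real documents, excluding the sentinel. */
  long cost() {
    return upTo;
  }
}
```

As with any iterator of this kind, calling `nextDoc()` again after it has returned `NO_MORE_DOCS` is outside the contract; in this sketch it would read past the end of the array.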
[GitHub] [lucene] jmazanec15 commented on a diff in pull request #12050: Reuse HNSW graph for intialization during merge
jmazanec15 commented on code in PR #12050:
URL: https://github.com/apache/lucene/pull/12050#discussion_r1092275814

## lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsWriter.java: ##
@@ -489,6 +485,220 @@ public void mergeOneField(FieldInfo fieldInfo, MergeState mergeState) throws IOE
     }
   }

+  private HnswGraphBuilder createFloatVectorHnswGraphBuilder(

Review Comment:
Oh I see. Makes sense. I updated.
[GitHub] [lucene] jpountz commented on pull request #11900: Reduce bloom filter size by using the optimal count for hash functions.
jpountz commented on PR #11900:
URL: https://github.com/apache/lucene/pull/11900#issuecomment-1410801742

@jfboeuf I took a stab at removing the versioning logic to simplify the change. I plan on merging it soon if this works for you.
[GitHub] [lucene] zhaih commented on pull request #12050: Reuse HNSW graph for intialization during merge
zhaih commented on PR #12050:
URL: https://github.com/apache/lucene/pull/12050#issuecomment-1410818168

> Do you mean create a new Codec version? From what I can tell, nothing in the underlying storage format has changed and the only reason Lucene95HnswVectorsReader is cast is for Lucene95HnswVectorsReader#getGraph, which already existed.

@benwtrent You're right, I had the impression that this work was based on the newly created codec, but yeah, we don't need a new codec for it. Sorry for the confusion.
[GitHub] [lucene] benwtrent commented on a diff in pull request #12050: Reuse HNSW graph for intialization during merge
benwtrent commented on code in PR #12050:
URL: https://github.com/apache/lucene/pull/12050#discussion_r1092319484

## lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborQueue.java: ##
@@ -56,6 +56,8 @@ long apply(long v) {
   // Whether the search stopped early because it reached the visited nodes limit
   private boolean incomplete;

+  public static final NeighborQueue EMPTY_MAX_HEAP_NEIGHBOR_QUEUE = new NeighborQueue(1, true);

Review Comment:
It is nice to have a static thing like this. But `EMPTY_MAX_HEAP_NEIGHBOR_QUEUE#add(int, float)` is possible. This seems dangerous to me as somebody might accidentally call `search` and then add values to this static object.

If we are going to have a static object like this, it would be good if it was an `EmptyNeighborQueue` that disallows `add` or any mutating action.
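The concern above is that a shared static instance of a mutable class can be modified by any caller. A minimal sketch of the suggested alternative follows, using a toy queue rather than Lucene's `NeighborQueue`; `SimpleNeighborQueue` and `EmptyNeighborQueueSketch` are hypothetical names introduced only for illustration.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a mutable neighbor queue; not Lucene's NeighborQueue. */
class SimpleNeighborQueue {
  private int[] nodes = new int[4];
  private float[] scores = new float[4];
  private int size;

  /** Appends a node and its score, growing the backing arrays when full. */
  void add(int node, float score) {
    if (size == nodes.length) {
      nodes = Arrays.copyOf(nodes, size * 2);
      scores = Arrays.copyOf(scores, size * 2);
    }
    nodes[size] = node;
    scores[size] = score;
    size++;
  }

  int size() {
    return size;
  }
}

/** Shared empty instance that rejects mutation, so it is safe to expose as a constant. */
final class EmptyNeighborQueueSketch extends SimpleNeighborQueue {
  static final SimpleNeighborQueue INSTANCE = new EmptyNeighborQueueSketch();

  private EmptyNeighborQueueSketch() {}

  @Override
  void add(int node, float score) {
    throw new UnsupportedOperationException("the shared empty queue is read-only");
  }
}
```

Every mutating method would need the same override, which is part of why simply dropping the constant, as discussed in the follow-up comments, can be the cheaper option when the class carries a lot of mutable state.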
[GitHub] [lucene] jmazanec15 commented on a diff in pull request #12050: Reuse HNSW graph for intialization during merge
jmazanec15 commented on code in PR #12050:
URL: https://github.com/apache/lucene/pull/12050#discussion_r1092337143

## lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborQueue.java: ##
@@ -56,6 +56,8 @@ long apply(long v) {
   // Whether the search stopped early because it reached the visited nodes limit
   private boolean incomplete;

+  public static final NeighborQueue EMPTY_MAX_HEAP_NEIGHBOR_QUEUE = new NeighborQueue(1, true);

Review Comment:
You are right, I did not think about this. Given how much mutable state there is, I am wondering if it might just be better to get rid of this. What do you think?
[GitHub] [lucene] benwtrent commented on a diff in pull request #12050: Reuse HNSW graph for intialization during merge
benwtrent commented on code in PR #12050:
URL: https://github.com/apache/lucene/pull/12050#discussion_r1092368089

## lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborQueue.java: ##
@@ -56,6 +56,8 @@ long apply(long v) {
   // Whether the search stopped early because it reached the visited nodes limit
   private boolean incomplete;

+  public static final NeighborQueue EMPTY_MAX_HEAP_NEIGHBOR_QUEUE = new NeighborQueue(1, true);

Review Comment:
@jmazanec15 simply removing it and going back to the way it was (since all the following loops would be empty) should be OK imo.

Either way I am good.
[GitHub] [lucene] javanna opened a new pull request, #12121: Remove VectorUtil#toBytesRef
javanna opened a new pull request, #12121:
URL: https://github.com/apache/lucene/pull/12121

The method is currently only used in its corresponding test method.
[GitHub] [lucene] javanna opened a new pull request, #12122: Adjust return type for VectorUtil methods
javanna opened a new pull request, #12122:
URL: https://github.com/apache/lucene/pull/12122

Two of the methods (squareDistance and dotProduct) that take byte arrays return a float while the variable used to store the value is an int. They can just return an int.
[GitHub] [lucene] benwtrent commented on a diff in pull request #12122: Adjust return type for VectorUtil methods
benwtrent commented on code in PR #12122:
URL: https://github.com/apache/lucene/pull/12122#discussion_r1092395951

## lucene/core/src/java/org/apache/lucene/util/VectorUtil.java: ##
@@ -181,7 +181,7 @@ private static float squareDistanceUnrolled(float[] v1, float[] v2, int index) {
   }

   /** Returns the sum of squared differences of the two vectors. */
-  public static float squareDistance(byte[] a, byte[] b) {
+  public static int squareDistance(byte[] a, byte[] b) {

Review Comment:
`EUCLIDEAN#compare(byte[], byte[])` needs to be updated because switching this to int turns `1/(1 + int)` into integer division, whereas previously it would return a fractional value.

Something like `1f/(1f + VectorUtil#squareDistance(byte[], byte[]))`
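The pitfall described above is easy to reproduce in isolation. The sketch below uses a hypothetical class, `ByteEuclideanSketch`, with its own `squareDistance`; it is not `VectorUtil` or `VectorSimilarityFunction`, only a small demonstration of int versus float division in the `1 / (1 + distance)` score.

```java
/** Hypothetical byte-vector Euclidean score sketch; not Lucene's VectorUtil. */
final class ByteEuclideanSketch {

  /** Sum of squared differences, accumulated and returned as an int. */
  static int squareDistance(byte[] a, byte[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector dimensions differ: " + a.length + " != " + b.length);
    }
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      int diff = a[i] - b[i];
      sum += diff * diff;
    }
    return sum;
  }

  /** Buggy variant: int / int truncates, so every non-zero distance scores 0. */
  static float scoreTruncated(byte[] a, byte[] b) {
    return 1 / (1 + squareDistance(a, b));
  }

  /** Fixed variant: the float literals force floating-point division. */
  static float score(byte[] a, byte[] b) {
    return 1f / (1f + squareDistance(a, b));
  }

  public static void main(String[] args) {
    byte[] a = {1, 2, 3};
    byte[] b = {2, 2, 5};
    System.out.println("distance  = " + squareDistance(a, b));
    System.out.println("truncated = " + scoreTruncated(a, b));
    System.out.println("fixed     = " + score(a, b));
  }
}
```

With the truncating variant, only identical vectors get a non-zero score, so a ranking built on it would collapse to a tie for almost every pair.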
[GitHub] [lucene] javanna commented on a diff in pull request #12122: Adjust return type for VectorUtil methods
javanna commented on code in PR #12122:
URL: https://github.com/apache/lucene/pull/12122#discussion_r1092402069

## lucene/core/src/java/org/apache/lucene/util/VectorUtil.java: ##
@@ -181,7 +181,7 @@ private static float squareDistanceUnrolled(float[] v1, float[] v2, int index) {
   }

   /** Returns the sum of squared differences of the two vectors. */
-  public static float squareDistance(byte[] a, byte[] b) {
+  public static int squareDistance(byte[] a, byte[] b) {

Review Comment:
yep should be fixed now. I am glad we had that code inspection.
[GitHub] [lucene] javanna commented on issue #12028: Add newSetQuery for IntField, LongField, FloatField, DoubleField
javanna commented on issue #12028:
URL: https://github.com/apache/lucene/issues/12028#issuecomment-1411142472

Looks like this issue is addressed with the PR above? Can we close it or is there anything left to do that I am missing?
[GitHub] [lucene] mdmarshmallow commented on pull request #11958: GITHUB-11868: Add FilterIndexInput and FilterIndexOutput wrapper classes
mdmarshmallow commented on PR #11958:
URL: https://github.com/apache/lucene/pull/11958#issuecomment-1411221077

Hi, I was wondering if this could be merged. I think I addressed all the feedback given here and it has been approved for quite a while now. Thanks!