date:20230131

[GitHub] [lucene] zhaih commented on a diff in pull request #12114: Use radix sort to sort postings when index sorting is enabled.

2023-01-31 Thread via GitHub



zhaih commented on code in PR #12114:
URL: https://github.com/apache/lucene/pull/12114#discussion_r1091567169


##
lucene/core/src/java/org/apache/lucene/index/FreqProxTermsWriter.java:
##
@@ -379,27 +272,24 @@ public int advance(final int target) throws IOException {
 
 @Override
 public int docID() {
-  return docIt < 0 ? -1 : docIt >= upto ? NO_MORE_DOCS : docs[docIt];
+  return docIt < 0 ? -1 : docs[docIt];
 }
 
 @Override
-public int freq() throws IOException {
-  return withFreqs && docIt < upto ? freqs[docIt] : 1;
+public int nextDoc() throws IOException {
+  return docs[++docIt];
 }
 
 @Override
-public int nextDoc() throws IOException {
-  if (++docIt >= upto) return NO_MORE_DOCS;
-  return docs[docIt];
+public long cost() {
+  return upTo;
 }
 
-/** Returns the wrapped {@link PostingsEnum}. */
-PostingsEnum getWrapped() {
-  return in;
+@Override
+public int freq() throws IOException {

Review Comment:
   So we're removing `freq` support because no one is really using it?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] rmuir commented on pull request #12118: Add `FeatureQuery` weight caching in non-scoring case

2023-01-31 Thread via GitHub



rmuir commented on PR #12118:
URL: https://github.com/apache/lucene/pull/12118#issuecomment-1410148671

   i don't understand this issue. The only purpose of this query is for 
scoring. If you don't want scores, drop the clause completely.



[GitHub] [lucene] benwtrent commented on pull request #12050: Reuse HNSW graph for intialization during merge

2023-01-31 Thread via GitHub



benwtrent commented on PR #12050:
URL: https://github.com/apache/lucene/pull/12050#issuecomment-1410368372

   > Ah since Lucene95 has just been released, I think we should move this to 
Lucene 96?
   
   @zhaih 
   
   Do you mean create a new Codec version? From what I can tell, nothing in the 
underlying storage format has changed and the only reason 
`Lucene95HnswVectorsReader` is cast is for 
`Lucene95HnswVectorsReader#getGraph`, which already existed.
   
   Could you clarify your concern?
   



[GitHub] [lucene] benwtrent commented on pull request #12118: Add `FeatureQuery` weight caching in non-scoring case

2023-01-31 Thread via GitHub



benwtrent commented on PR #12118:
URL: https://github.com/apache/lucene/pull/12118#issuecomment-1410402325

   @rmuir 
   
   > i don't understand this issue. The only purpose of this query is for 
scoring. If you don't want scores, drop the clause completely.
   
   A `FeatureField` provides a useful extension point for learned-sparse 
retrieval models (see linked issue). These models provide multiple `feature` 
and `score` pairs. These fields will likely match relevant documents that are 
not previously matched by other means.
   
   A perfectly valid (and powerful) query would be `BooleanQuery` with multiple 
`SHOULD` clauses containing `FeatureQuery` objects (obviously, with minimum 
should match > 0). Note that no other field is being queried. Dropping all 
those clauses would be a `match_all` and not accurately reflect the matching 
document set.
   
   Being able to iterate the entire recall set for matching multiple 
`FeatureField` values will provide useful insight.
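
To make the point about the matching set concrete, here is a minimal sketch in plain Java with no Lucene APIs. It assumes each feature's postings are available as an array of distinct doc IDs; `ShouldMatchSketch` and `matching` are hypothetical names used only for illustration. It shows that a disjunction with a minimum-should-match threshold selects a specific subset of documents, which is not what a match-all would return.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration; not Lucene code. */
final class ShouldMatchSketch {

  /**
   * Returns the sorted doc IDs that appear in at least minShouldMatch of the given
   * postings lists. Each list is assumed to hold distinct doc IDs.
   */
  static List<Integer> matching(int[][] postings, int minShouldMatch) {
    // TreeMap keeps doc IDs sorted while counting how many lists contain each one.
    Map<Integer, Integer> counts = new TreeMap<>();
    for (int[] docs : postings) {
      for (int doc : docs) {
        counts.merge(doc, 1, Integer::sum);
      }
    }
    List<Integer> result = new ArrayList<>();
    for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
      if (e.getValue() >= minShouldMatch) {
        result.add(e.getKey());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    int[][] postings = {{0, 3, 7}, {3, 5}, {7, 9}};
    System.out.println(matching(postings, 1)); // prints [0, 3, 5, 7, 9]
    System.out.println(matching(postings, 2)); // prints [3, 7]
  }
}
```

With ten documents in the index, a match-all would return all ten, whereas the disjunction above returns five of them at a threshold of one and two at a threshold of two.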



[GitHub] [lucene] rmuir commented on pull request #12118: Add `FeatureQuery` weight caching in non-scoring case

2023-01-31 Thread via GitHub



rmuir commented on PR #12118:
URL: https://github.com/apache/lucene/pull/12118#issuecomment-1410418901

   So just rewrite it to a TermWeight in createWeight if scores are not needed? 
No need to duplicate the logic.



[GitHub] [lucene] rmuir commented on pull request #12118: Add `FeatureQuery` weight caching in non-scoring case

2023-01-31 Thread via GitHub



rmuir commented on PR #12118:
URL: https://github.com/apache/lucene/pull/12118#issuecomment-1410426974

   example pseudocode:
   ```
   @Override
   public Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, float boost)
       throws IOException {
     if (!scoreMode.needsScores()) {
       // if scores are not needed, let TermWeight deal with optimizing that case.
       TermQuery tq = new TermQuery(new Term(fieldName, featureName));
       return searcher
           .rewrite(tq)
           .createWeight(searcher, ScoreMode.COMPLETE_NO_SCORES, boost);
     }
     ...
   }
   ```
   
   This would ensure that it gets all the optimizations of TermQuery and keep 
the code simple.



[GitHub] [lucene] benwtrent commented on pull request #12118: Add `FeatureQuery` weight caching in non-scoring case

2023-01-31 Thread via GitHub



benwtrent commented on PR #12118:
URL: https://github.com/apache/lucene/pull/12118#issuecomment-1410445640

   I like that @rmuir! It keeps the nice API for FeatureFields and removes 
code duplication.



[GitHub] [lucene] rmuir commented on pull request #12118: Add `FeatureQuery` weight caching in non-scoring case

2023-01-31 Thread via GitHub



rmuir commented on PR #12118:
URL: https://github.com/apache/lucene/pull/12118#issuecomment-1410447591

   stolen from SynonymQuery lol. and not sure about why it doesn't pass 
ScoreMode straight thru and instead hardcodes COMPLETE_NO_SCORES, seems wrong. 
but you got the idea.



[GitHub] [lucene] jpountz commented on a diff in pull request #12116: Improve document API for stored fields.

2023-01-31 Thread via GitHub



jpountz commented on code in PR #12116:
URL: https://github.com/apache/lucene/pull/12116#discussion_r1092033320


##
lucene/core/src/java/org/apache/lucene/document/StoredValue.java:
##
@@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.document;
+
+import java.util.Objects;
+import org.apache.lucene.index.IndexableField;
+import org.apache.lucene.util.BytesRef;
+
+/**
+ * Abstraction around a stored value.
+ *
+ * @see IndexableField
+ */
+public final class StoredValue {
+
+  /** Type of a {@link StoredValue}. */
+  public enum Type {
+INTEGER,
+LONG,
+FLOAT,
+DOUBLE,
+BINARY,
+STRING;
+  }
+
+  private final Type type;
+  private int intValue;
+  private long longValue;
+  private float floatValue;
+  private double doubleValue;
+  private BytesRef binaryValue;
+  private String stringValue;
+
+  /** Ctor for integer values. */
+  public StoredValue(int value) {
+type = Type.INTEGER;
+intValue = value;
+  }
+
+  /** Ctor for long values. */
+  public StoredValue(long value) {
+type = Type.LONG;
+longValue = value;
+  }
+
+  /** Ctor for float values. */
+  public StoredValue(float value) {
+type = Type.FLOAT;
+floatValue = value;
+  }
+
+  /** Ctor for double values. */
+  public StoredValue(double value) {
+type = Type.DOUBLE;
+doubleValue = value;
+  }
+
+  /** Ctor for binary values. */
+  public StoredValue(BytesRef value) {
+type = Type.BINARY;
+binaryValue = Objects.requireNonNull(value);
+  }
+
+  /** Ctor for binary values. */
+  public StoredValue(String value) {
+type = Type.STRING;
+stringValue = Objects.requireNonNull(value);
+  }
+
+  /** Retrieve the type of the stored value. */
+  public Type getType() {
+return type;
+  }
+
+  /** Set an integer value. */
+  public void setIntValue(int value) {
+if (type != Type.INTEGER) {
+  throw new IllegalArgumentException("Cannot set an integer on a " + type + " value");
+}
+intValue = value;
+  }
+
+  /** Set a long value. */
+  public void setLongValue(long value) {
+if (type != Type.LONG) {
+  throw new IllegalArgumentException("Cannot set a long on a " + type + " value");
+}
+longValue = value;
+  }
+
+  /** Set a float value. */
+  public void setFloatValue(float value) {
+if (type != Type.FLOAT) {
+  throw new IllegalArgumentException("Cannot set a float on a " + type + " value");
+}
+floatValue = value;
+  }
+
+  /** Set a double value. */
+  public void setLongValue(double value) {
+if (type != Type.DOUBLE) {

Review Comment:
   Yes! Fixed.
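
For readers following the diff, the pattern under review is a type-tagged holder whose setters reject a value of the wrong type. The sketch below is a trimmed-down, hypothetical version of that pattern in plain Java. The class name `TypedValueSketch` and the setter name `setDoubleValue` are assumptions made for illustration; the thread does not show what the fix actually looked like.

```java
/** Hypothetical, trimmed-down holder illustrating type-guarded setters. */
final class TypedValueSketch {

  enum Type {
    LONG,
    DOUBLE
  }

  private final Type type;
  private long longValue;
  private double doubleValue;

  TypedValueSketch(long value) {
    type = Type.LONG;
    longValue = value;
  }

  TypedValueSketch(double value) {
    type = Type.DOUBLE;
    doubleValue = value;
  }

  Type getType() {
    return type;
  }

  /** Only valid on a holder that was created with a long. */
  void setLongValue(long value) {
    if (type != Type.LONG) {
      throw new IllegalArgumentException("Cannot set a long on a " + type + " value");
    }
    longValue = value;
  }

  /** Assumed name: the setter is named after the type it accepts. */
  void setDoubleValue(double value) {
    if (type != Type.DOUBLE) {
      throw new IllegalArgumentException("Cannot set a double on a " + type + " value");
    }
    doubleValue = value;
  }

  long getLongValue() {
    return longValue;
  }

  double getDoubleValue() {
    return doubleValue;
  }
}
```

Because the type tag is final, a holder can be reused for new values of the same type while any attempt to change its type fails fast.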




[GitHub] [lucene] rmuir commented on pull request #12118: Add `FeatureQuery` weight caching in non-scoring case

2023-01-31 Thread via GitHub



rmuir commented on PR #12118:
URL: https://github.com/apache/lucene/pull/12118#issuecomment-1410515203

   I'm ok with changes but i still don't understand the use-case. Pulling all 
documents containing features, then calculating your own score throws away all 
the efficiency of FeatureField (e.g. early termination) and will be way too 
slow as the worst-case is scoring `O(maxdoc)` which could be e.g. a billion.
   
   It would be better to look at `Rescorer` api, e.g. keep the scores ON for 
the FeatureField, but pull top 500 or 1000 and re-rank those with anything 
fancy. It keeps everything fast and bounded.
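
The two-phase approach described above can be sketched in plain Java, without any Lucene APIs. The class and method names (`TopKRerankSketch`, `topK`, `rerank`) are hypothetical, and the sketch deliberately does not model WAND skipping: its first pass still looks at every cheap score. It only illustrates the bounded part of the idea, namely that the expensive scoring function runs on k candidates rather than on every matching document.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.IntToDoubleFunction;

/** Hypothetical illustration of first-pass retrieval plus re-ranking; not Lucene code. */
final class TopKRerankSketch {

  /** Keeps the k docs with the highest cheap score, using a bounded min-heap. */
  static List<Integer> topK(float[] cheapScores, int k) {
    PriorityQueue<Integer> heap =
        new PriorityQueue<>(Comparator.comparingDouble((Integer doc) -> cheapScores[doc]));
    for (int doc = 0; doc < cheapScores.length; doc++) {
      heap.add(doc);
      if (heap.size() > k) {
        heap.poll(); // evict the current lowest-scoring doc
      }
    }
    return new ArrayList<>(heap);
  }

  /** Re-scores only the candidates and returns them best-first. */
  static List<Integer> rerank(List<Integer> candidates, IntToDoubleFunction expensiveScore) {
    List<Integer> sorted = new ArrayList<>(candidates);
    sorted.sort(
        Comparator.comparingDouble((Integer doc) -> expensiveScore.applyAsDouble(doc))
            .reversed());
    return sorted;
  }

  public static void main(String[] args) {
    float[] cheap = {0.1f, 0.9f, 0.5f, 0.8f, 0.2f};
    List<Integer> candidates = topK(cheap, 3); // docs 1, 2 and 3, in no particular order
    // The "fancy" score here is just the doc ID, to show that the order can change.
    System.out.println(rerank(candidates, doc -> doc)); // prints [3, 2, 1]
  }
}
```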



[GitHub] [lucene] rmuir commented on pull request #12118: Add `FeatureQuery` weight caching in non-scoring case

2023-01-31 Thread via GitHub



rmuir commented on PR #12118:
URL: https://github.com/apache/lucene/pull/12118#issuecomment-1410532359

   does it make sense? From my perspective the reason to use `FeatureField` is 
for the WAND-skipping. So if you ask for it not to do scoring, it can't skip, 
and it defeats the entire purpose.



[GitHub] [lucene] jpountz commented on pull request #12118: Add `FeatureQuery` weight caching in non-scoring case

2023-01-31 Thread via GitHub



jpountz commented on PR #12118:
URL: https://github.com/apache/lucene/pull/12118#issuecomment-1410736154

   For the record this need comes from implementing sparse retrieval similarly 
to what's discussed at #11799, so `FeatureField` no longer stores features but 
regular terms here. One option is to reuse `FeatureField` for this. Another 
option could be to reuse `TermQuery` by configuring the `Similarity`'s 
`SimScorer` to properly decode the frequency.
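
As background for "decode the frequency", the sketch below shows one way a float weight can be packed into an integer term frequency and recovered at scoring time. It is plain Java with hypothetical names (`FeatureFreqCodecSketch`, `encode`, `decode`). The bit layout, which drops the 15 low mantissa bits, is an assumption chosen for illustration and should not be read as a statement of Lucene's API.

```java
/** Hypothetical illustration of packing a float feature value into a term frequency. */
final class FeatureFreqCodecSketch {

  /** Keeps the sign, exponent and top 8 mantissa bits; assumes a positive, finite value. */
  static int encode(float featureValue) {
    return Float.floatToIntBits(featureValue) >>> 15;
  }

  /** Restores the float, with the dropped mantissa bits set to zero. */
  static float decode(int freq) {
    return Float.intBitsToFloat(freq << 15);
  }

  public static void main(String[] args) {
    System.out.println(decode(encode(2.5f))); // exactly representable in the kept bits
    System.out.println(decode(encode(0.1f))); // slightly below 0.1 because of truncation
  }
}
```

A scorer that wants the original weight would call the decoding step on the stored frequency before applying its scoring function, which is the role the comment above assigns to the `SimScorer`.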



[GitHub] [lucene] jpountz commented on a diff in pull request #12114: Use radix sort to sort postings when index sorting is enabled.

2023-01-31 Thread via GitHub



jpountz commented on code in PR #12114:
URL: https://github.com/apache/lucene/pull/12114#discussion_r1092226923


##
lucene/core/src/java/org/apache/lucene/index/FreqProxTermsWriter.java:
##
@@ -379,27 +272,24 @@ public int advance(final int target) throws IOException {
 
 @Override
 public int docID() {
-  return docIt < 0 ? -1 : docIt >= upto ? NO_MORE_DOCS : docs[docIt];
+  return docIt < 0 ? -1 : docs[docIt];
 }
 
 @Override
-public int freq() throws IOException {
-  return withFreqs && docIt < upto ? freqs[docIt] : 1;
+public int nextDoc() throws IOException {
+  return docs[++docIt];
 }
 
 @Override
-public int nextDoc() throws IOException {
-  if (++docIt >= upto) return NO_MORE_DOCS;
-  return docs[docIt];
+public long cost() {
+  return upTo;
 }
 
-/** Returns the wrapped {@link PostingsEnum}. */
-PostingsEnum getWrapped() {
-  return in;
+@Override
+public int freq() throws IOException {

Review Comment:
   With this change, fields that have frequencies are now handled by 
`SortingPostingsEnum` while `SortingDocsEnum` focuses on fields that only index 
docs.
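
The diff above drops the explicit bounds checks from `docID()` and `nextDoc()`. One way such an iterator can stay correct is to terminate the doc array with a sentinel value. The sketch below is a plain-Java illustration of that technique under this assumption; `SentinelDocsSketch` is a hypothetical class and is not the actual `SortingDocsEnum`.

```java
import java.util.Arrays;

/** Hypothetical docs-only iterator; not the actual SortingDocsEnum. */
final class SentinelDocsSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final int[] docs; // sorted doc IDs followed by a NO_MORE_DOCS sentinel
  private int docIt = -1;

  SentinelDocsSketch(int[] sortedDocs) {
    // Copy into an array that is one slot longer and put the sentinel in the last slot.
    docs = Arrays.copyOf(sortedDocs, sortedDocs.length + 1);
    docs[sortedDocs.length] = NO_MORE_DOCS;
  }

  /** Returns -1 before iteration starts, then the current entry (possibly the sentinel). */
  int docID() {
    return docIt < 0 ? -1 : docs[docIt];
  }

  /**
   * No bounds check: the sentinel is returned once the real docs are exhausted. Callers
   * must stop calling nextDoc() after they have seen NO_MORE_DOCS.
   */
  int nextDoc() {
    return docs[++docIt];
  }

  public static void main(String[] args) {
    SentinelDocsSketch it = new SentinelDocsSketch(new int[] {2, 5, 9});
    for (int doc = it.nextDoc(); doc != NO_MORE_DOCS; doc = it.nextDoc()) {
      System.out.println(doc);
    }
  }
}
```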




[GitHub] [lucene] jmazanec15 commented on a diff in pull request #12050: Reuse HNSW graph for intialization during merge

2023-01-31 Thread via GitHub



jmazanec15 commented on code in PR #12050:
URL: https://github.com/apache/lucene/pull/12050#discussion_r1092275814


##
lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsWriter.java:
##
@@ -489,6 +485,220 @@ public void mergeOneField(FieldInfo fieldInfo, MergeState mergeState) throws IOE
 }
   }
 
+  private HnswGraphBuilder createFloatVectorHnswGraphBuilder(

Review Comment:
   Oh I see. Makes sense. I updated.




[GitHub] [lucene] jpountz commented on pull request #11900: Reduce bloom filter size by using the optimal count for hash functions.

2023-01-31 Thread via GitHub



jpountz commented on PR #11900:
URL: https://github.com/apache/lucene/pull/11900#issuecomment-1410801742

   @jfboeuf I took a stab at removing the versioning logic to simplify the 
change, I plan on merging it soon if this works for you.



[GitHub] [lucene] zhaih commented on pull request #12050: Reuse HNSW graph for intialization during merge

2023-01-31 Thread via GitHub



zhaih commented on PR #12050:
URL: https://github.com/apache/lucene/pull/12050#issuecomment-1410818168

   > Do you mean create a new Codec version? From what I can tell, nothing in 
the underlying storage format has changed and the only reason 
Lucene95HnswVectorsReader is cast is for Lucene95HnswVectorsReader#getGraph, 
which already existed.
   
   @benwtrent You're right, I had the impression that this work was based on the 
newly created codec, but yeah, we don't need a new codec for it. Sorry for the 
confusion.



[GitHub] [lucene] benwtrent commented on a diff in pull request #12050: Reuse HNSW graph for intialization during merge

2023-01-31 Thread via GitHub



benwtrent commented on code in PR #12050:
URL: https://github.com/apache/lucene/pull/12050#discussion_r1092319484


##
lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborQueue.java:
##
@@ -56,6 +56,8 @@ long apply(long v) {
   // Whether the search stopped early because it reached the visited nodes limit
   private boolean incomplete;
 
+  public static final NeighborQueue EMPTY_MAX_HEAP_NEIGHBOR_QUEUE = new NeighborQueue(1, true);

Review Comment:
   It is nice to have a static thing like this. But 
`EMPTY_MAX_HEAP_NEIGHBOR_QUEUE#add(int, float)` is possible. This seems 
dangerous to me, as somebody might accidentally call `search` and then add 
values to this static object.
   
   If we are going to have a static object like this, it would be good if it 
were an `EmptyNeighborQueue` that disallows `add` or any other mutating action. 
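
The suggestion above can be sketched in plain Java as follows. `SimpleQueueSketch` is a hypothetical stand-in for a mutable queue and is not Lucene's `NeighborQueue`; `EmptyQueueSketch` is likewise an illustrative name. The point is only that a shared static instance is safe when every mutating method is overridden to throw.

```java
/** Hypothetical stand-in for a mutable queue; not Lucene's NeighborQueue. */
class SimpleQueueSketch {
  private int size;

  void add(int node, float score) {
    size++; // a real queue would also store the node and its score
  }

  int size() {
    return size;
  }
}

/** Shared empty instance that rejects mutation instead of silently accumulating state. */
final class EmptyQueueSketch extends SimpleQueueSketch {
  static final EmptyQueueSketch INSTANCE = new EmptyQueueSketch();

  private EmptyQueueSketch() {}

  @Override
  void add(int node, float score) {
    throw new UnsupportedOperationException("the shared empty queue is immutable");
  }

  public static void main(String[] args) {
    System.out.println(INSTANCE.size()); // prints 0
    try {
      INSTANCE.add(1, 0.5f);
    } catch (UnsupportedOperationException e) {
      System.out.println("add rejected: " + e.getMessage());
    }
  }
}
```

The trade-off is that every mutating method on the base class needs an override, which is part of why simply dropping the shared constant can be the simpler option when the class carries a lot of mutable state.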




[GitHub] [lucene] jmazanec15 commented on a diff in pull request #12050: Reuse HNSW graph for intialization during merge

2023-01-31 Thread via GitHub



jmazanec15 commented on code in PR #12050:
URL: https://github.com/apache/lucene/pull/12050#discussion_r1092337143


##
lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborQueue.java:
##
@@ -56,6 +56,8 @@ long apply(long v) {
   // Whether the search stopped early because it reached the visited nodes limit
   private boolean incomplete;
 
+  public static final NeighborQueue EMPTY_MAX_HEAP_NEIGHBOR_QUEUE = new NeighborQueue(1, true);

Review Comment:
   You are right, I did not think about this. Given how much mutable state 
there is, I am wondering if it might just be better to get rid of this. What do 
you think?




[GitHub] [lucene] benwtrent commented on a diff in pull request #12050: Reuse HNSW graph for intialization during merge

2023-01-31 Thread via GitHub



benwtrent commented on code in PR #12050:
URL: https://github.com/apache/lucene/pull/12050#discussion_r1092368089


##
lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborQueue.java:
##
@@ -56,6 +56,8 @@ long apply(long v) {
   // Whether the search stopped early because it reached the visited nodes limit
   private boolean incomplete;
 
+  public static final NeighborQueue EMPTY_MAX_HEAP_NEIGHBOR_QUEUE = new NeighborQueue(1, true);

Review Comment:
   @jmazanec15 simply removing it and going back to the way it was (since all 
the following loops would be empty) should be OK imo. Either way I am good.




[GitHub] [lucene] javanna opened a new pull request, #12121: Remove VectorUtil#toBytesRef

2023-01-31 Thread via GitHub



javanna opened a new pull request, #12121:
URL: https://github.com/apache/lucene/pull/12121

   The method is currently only used in its corresponding test method.
   



[GitHub] [lucene] javanna opened a new pull request, #12122: Adjust return type for VectorUtil methods

2023-01-31 Thread via GitHub



javanna opened a new pull request, #12122:
URL: https://github.com/apache/lucene/pull/12122

   Two of the methods (squareDistance and dotProduct) that take byte arrays 
return a float while the variable used to store the value is an int. They can 
just return an int.
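
A minimal sketch of why an `int` return type is natural for byte vectors, written in plain Java. `ByteVectorMathSketch` is a hypothetical class and not the actual `VectorUtil` implementation: each byte is promoted to `int` before the arithmetic, so every term and the running total are already integers.

```java
/** Hypothetical illustration; not the actual VectorUtil implementation. */
final class ByteVectorMathSketch {

  /** Sum of squared differences; every term is an int, so the total is an int. */
  static int squareDistance(byte[] a, byte[] b) {
    checkSameLength(a, b);
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      int diff = a[i] - b[i]; // bytes are promoted to int before subtracting
      sum += diff * diff;
    }
    return sum;
  }

  /** Sum of element-wise products, accumulated as an int. */
  static int dotProduct(byte[] a, byte[] b) {
    checkSameLength(a, b);
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  private static void checkSameLength(byte[] a, byte[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException(
          "vector dimensions differ: " + a.length + " != " + b.length);
    }
  }

  public static void main(String[] args) {
    byte[] a = {1, -2, 3};
    byte[] b = {4, 5, 6};
    System.out.println(squareDistance(a, b)); // prints 67
    System.out.println(dotProduct(a, b)); // prints 12
  }
}
```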



[GitHub] [lucene] benwtrent commented on a diff in pull request #12122: Adjust return type for VectorUtil methods

2023-01-31 Thread via GitHub



benwtrent commented on code in PR #12122:
URL: https://github.com/apache/lucene/pull/12122#discussion_r1092395951


##
lucene/core/src/java/org/apache/lucene/util/VectorUtil.java:
##
@@ -181,7 +181,7 @@ private static float squareDistanceUnrolled(float[] v1, float[] v2, int index) {
   }
 
   /** Returns the sum of squared differences of the two vectors. */
-  public static float squareDistance(byte[] a, byte[] b) {
+  public static int squareDistance(byte[] a, byte[] b) {

Review Comment:
   `EUCLIDEAN#compare(byte[], byte[])` needs to be updated because switching 
this to int changes the score to `1/(1 + int)`, which is integer division, 
whereas previously it would return a fractional value. 
   
   Something like `1f/(1f + VectorUtil#squareDistance(byte[], byte[]))`
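
The pitfall raised above is easy to reproduce in isolation. The sketch below uses hypothetical names (`EuclideanScoreSketch`, `truncatedScore`, `score`) and plain Java only; it is not the code from the pull request.

```java
/** Hypothetical illustration of the integer-division pitfall; not the code from the PR. */
final class EuclideanScoreSketch {

  /** Integer division: the result is truncated to 0 for any squared distance above 0. */
  static float truncatedScore(int squareDistance) {
    return 1 / (1 + squareDistance);
  }

  /** Float division keeps the fractional part. */
  static float score(int squareDistance) {
    return 1f / (1f + squareDistance);
  }

  public static void main(String[] args) {
    System.out.println(truncatedScore(3)); // prints 0.0
    System.out.println(score(3)); // prints 0.25
  }
}
```

With the truncating version, every pair of non-identical vectors would receive the same score of zero, so ranking by similarity would stop working.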




[GitHub] [lucene] javanna commented on a diff in pull request #12122: Adjust return type for VectorUtil methods

2023-01-31 Thread via GitHub



javanna commented on code in PR #12122:
URL: https://github.com/apache/lucene/pull/12122#discussion_r1092402069


##
lucene/core/src/java/org/apache/lucene/util/VectorUtil.java:
##
@@ -181,7 +181,7 @@ private static float squareDistanceUnrolled(float[] v1, float[] v2, int index) {
   }
 
   /** Returns the sum of squared differences of the two vectors. */
-  public static float squareDistance(byte[] a, byte[] b) {
+  public static int squareDistance(byte[] a, byte[] b) {

Review Comment:
   yep should be fixed now. I am glad we had that code inspection.




[GitHub] [lucene] javanna commented on issue #12028: Add newSetQuery for IntField, LongField, FloatField, DoubleField

2023-01-31 Thread via GitHub



javanna commented on issue #12028:
URL: https://github.com/apache/lucene/issues/12028#issuecomment-1411142472

   Looks like this issue is addressed with the PR above? Can we close it or is 
there anything left to do that I am missing?



[GitHub] [lucene] mdmarshmallow commented on pull request #11958: GITHUB-11868: Add FilterIndexInput and FilterIndexOutput wrapper classes

2023-01-31 Thread via GitHub



mdmarshmallow commented on PR #11958:
URL: https://github.com/apache/lucene/pull/11958#issuecomment-1411221077

   Hi, I was wondering if this could be merged. I think I addressed all the 
feedback given here and it has been approved for quite a while now. Thanks!


