Re: [PR] Remove usage of IndexSearcher#Search(Query, Collector) from monitor package [lucene]

2024-09-10 Thread via GitHub


javanna commented on code in PR #13735:
URL: https://github.com/apache/lucene/pull/13735#discussion_r1751543562


##
lucene/core/src/java/org/apache/lucene/search/CollectorManager.java:
##
@@ -46,4 +48,28 @@ public interface CollectorManager {
* called after collection is finished on all provided collectors.
*/
   T reduce(Collection collectors) throws IOException;
+
+  /**
+   * Wrap a provided {@link Collector} with a thin {@code CollectorManager} 
wrapper for use with
+   * {@link IndexSearcher#search(Query, CollectorManager)}. The wrapping 
{@code CollectorManager}
+   * provides no {@link CollectorManager#reduce(Collection)} implementation, 
so the wrapped {@code
+   * Collector} needs to do all relevant work while collecting.
+   *
+   * Note: This is only safe to use when {@code IndexSearcher} is created 
with no executor (see:
+   * {@link IndexSearcher#IndexSearcher(IndexReader, Executor)}), or the 
provided collector is
+   * threadsafe.
+   */
+  static  CollectorManager wrap(C in) {
+return new CollectorManager() {
+  @Override
+  public C newCollector() {
+return in;
+  }
+
+  @Override
+  public Void reduce(Collection collectors) {
+return null;

Review Comment:
   Right, agreed. We can assume that the collection holds the collectors that 
newCollector returned.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Convert more classes to record classes [lucene]

2024-09-10 Thread via GitHub


shubhamvishu commented on code in PR #13328:
URL: https://github.com/apache/lucene/pull/13328#discussion_r1751550108


##
lucene/misc/src/java/org/apache/lucene/misc/util/fst/UpToTwoPositiveIntOutputs.java:
##
@@ -42,36 +42,11 @@
 public final class UpToTwoPositiveIntOutputs extends Outputs {
 
   /** Holds two long outputs. */
-  public static final class TwoLongs {
-public final long first;
-public final long second;
-
-public TwoLongs(long first, long second) {
-  this.first = first;
-  this.second = second;
+  public record TwoLongs(long first, long second) {
+public TwoLongs {
   assert first >= 0;
   assert second >= 0;
 }
-
-@Override
-public String toString() {
-  return "TwoLongs:" + first + "," + second;
-}

Review Comment:
   Yes, its fine in this case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Convert more classes to record classes [lucene]

2024-09-10 Thread via GitHub


shubhamvishu commented on code in PR #13328:
URL: https://github.com/apache/lucene/pull/13328#discussion_r1751558206


##
lucene/core/src/java/org/apache/lucene/util/FixedBits.java:
##
@@ -17,15 +17,7 @@
 package org.apache.lucene.util;
 
 /** Immutable twin of FixedBitSet. */
-final class FixedBits implements Bits {
-
-  final long[] bits;
-  final int length;
-
-  FixedBits(long[] bits, int length) {
-this.bits = bits;
-this.length = length;
-  }
+record FixedBits(long[] bits, int length) implements Bits {

Review Comment:
   Sure!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Convert more classes to record classes [lucene]

2024-09-10 Thread via GitHub


shubhamvishu commented on code in PR #13328:
URL: https://github.com/apache/lucene/pull/13328#discussion_r1751557625


##
lucene/test-framework/src/java/org/apache/lucene/tests/codecs/asserting/AssertingLiveDocsFormat.java:
##
@@ -68,9 +68,7 @@ public String toString() {
 return "Asserting(" + in + ")";
   }
 
-  static class AssertingBits implements Bits {
-final Bits in;
-
+  record AssertingBits(Bits in) implements Bits {

Review Comment:
   I agree, lets keep it as is.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Remove usage of IndexSearcher#Search(Query, Collector) from monitor package [lucene]

2024-09-10 Thread via GitHub


javanna commented on code in PR #13735:
URL: https://github.com/apache/lucene/pull/13735#discussion_r1751560638


##
lucene/core/src/java/org/apache/lucene/search/CollectorManager.java:
##
@@ -46,4 +48,28 @@ public interface CollectorManager {
* called after collection is finished on all provided collectors.
*/
   T reduce(Collection collectors) throws IOException;
+
+  /**
+   * Wrap a provided {@link Collector} with a thin {@code CollectorManager} 
wrapper for use with
+   * {@link IndexSearcher#search(Query, CollectorManager)}. The wrapping 
{@code CollectorManager}
+   * provides no {@link CollectorManager#reduce(Collection)} implementation, 
so the wrapped {@code
+   * Collector} needs to do all relevant work while collecting.
+   *
+   * Note: This is only safe to use when {@code IndexSearcher} is created 
with no executor (see:
+   * {@link IndexSearcher#IndexSearcher(IndexReader, Executor)}), or the 
provided collector is
+   * threadsafe.
+   */
+  static  CollectorManager wrap(C in) {

Review Comment:
   Naming is the most difficult bit :) Some options: `forSequentialExecution` , 
`createSequentialManager`, `singleThreadedManager` , or something along those 
lines.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] add TFSimilarity class [lucene]

2024-09-10 Thread via GitHub


cpoerschke commented on PR #13749:
URL: https://github.com/apache/lucene/pull/13749#issuecomment-2340084262

   > ... use the TF like a payload ...
   
   Initially I thought that this would require a custom `Term[Query|Scorer]` 
style change but then from a brief chat with @seanmacavaney (thank you!) I 
learnt that maybe `TFSimilarity` could be a thing here instead and it turns out 
`TestConjunctions` already had it as an inner class also.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] add TFSimilarity class [lucene]

2024-09-10 Thread via GitHub


cpoerschke opened a new pull request, #13749:
URL: https://github.com/apache/lucene/pull/13749

   Motivation is to use the TF like a payload but without needing to have 
payloads and positions.
   
   see also #13731 `PayloadScoreQuery` javadoc update w.r.t. `SpanQuery` use
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] add TFSimilarity class [lucene]

2024-09-10 Thread via GitHub


seanmacavaney commented on code in PR #13749:
URL: https://github.com/apache/lucene/pull/13749#discussion_r1751677536


##
lucene/core/src/java/org/apache/lucene/search/similarities/TFSimilarity.java:
##
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search.similarities;
+
+import org.apache.lucene.index.FieldInvertState;
+import org.apache.lucene.search.CollectionStatistics;
+import org.apache.lucene.search.TermStatistics;
+
+/** Similarity that returns the TF as score */
+public class TFSimilarity extends Similarity {
+
+  @Override
+  public long computeNorm(FieldInvertState state) {
+return 1; // we dont care
+  }
+
+  @Override
+  public SimScorer scorer(
+  float boost, CollectionStatistics collectionStats, TermStatistics... 
termStats) {
+return new SimScorer() {
+  @Override
+  public float score(float freq, long norm) {
+return freq;

Review Comment:
   Should this be `boost * freq`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] add TFSimilarity class [lucene]

2024-09-10 Thread via GitHub


cpoerschke commented on code in PR #13749:
URL: https://github.com/apache/lucene/pull/13749#discussion_r1751688741


##
lucene/core/src/java/org/apache/lucene/search/similarities/TFSimilarity.java:
##
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search.similarities;
+
+import org.apache.lucene.index.FieldInvertState;
+import org.apache.lucene.search.CollectionStatistics;
+import org.apache.lucene.search.TermStatistics;
+
+/** Similarity that returns the TF as score */
+public class TFSimilarity extends Similarity {
+
+  @Override
+  public long computeNorm(FieldInvertState state) {
+return 1; // we dont care
+  }
+
+  @Override
+  public SimScorer scorer(
+  float boost, CollectionStatistics collectionStats, TermStatistics... 
termStats) {
+return new SimScorer() {
+  @Override
+  public float score(float freq, long norm) {
+return freq;

Review Comment:
   Good catch, thanks! Will also add test coverage to match, marking as draft 
in the meantime.
   ```suggestion
   return boost * freq;
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Provide a custom hash implementation in HnswLock [lucene]

2024-09-10 Thread via GitHub


dweiss commented on code in PR #13751:
URL: https://github.com/apache/lucene/pull/13751#discussion_r1751723180


##
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswLock.java:
##
@@ -67,4 +66,10 @@ public void close() {
   lock.unlock();
 }
   }
+
+  static int hash(int v1, int v2) {
+int result = 31 + v1;
+result = 31 * result + v2;

Review Comment:
   (or just drop the constant shifting part and leave v1 * 31 + v2...).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Remove usage of IndexSearcher#search(Query, Collector) from join package [lucene]

2024-09-10 Thread via GitHub


javanna commented on code in PR #13747:
URL: https://github.com/apache/lucene/pull/13747#discussion_r1751774184


##
lucene/join/src/java/org/apache/lucene/search/join/JoinUtil.java:
##
@@ -530,49 +528,104 @@ public static Query createJoinQuery(
 
 final Query rewrittenFromQuery = searcher.rewrite(fromQuery);
 final Query rewrittenToQuery = searcher.rewrite(toQuery);
-GlobalOrdinalsWithScoreCollector globalOrdinalsWithScoreCollector;
+Supplier 
globalOrdinalsWithScoreCollector;
+final OrdinalMap finalOrdinalMap = ordinalMap;
 switch (scoreMode) {
   case Total:
 globalOrdinalsWithScoreCollector =
-new GlobalOrdinalsWithScoreCollector.Sum(joinField, ordinalMap, 
valueCount, min, max);
+() ->
+new GlobalOrdinalsWithScoreCollector.Sum(
+joinField, finalOrdinalMap, valueCount, min, max);
 break;
   case Min:
 globalOrdinalsWithScoreCollector =
-new GlobalOrdinalsWithScoreCollector.Min(joinField, ordinalMap, 
valueCount, min, max);
+() ->
+new GlobalOrdinalsWithScoreCollector.Min(
+joinField, finalOrdinalMap, valueCount, min, max);
 break;
   case Max:
 globalOrdinalsWithScoreCollector =
-new GlobalOrdinalsWithScoreCollector.Max(joinField, ordinalMap, 
valueCount, min, max);
+() ->
+new GlobalOrdinalsWithScoreCollector.Max(
+joinField, finalOrdinalMap, valueCount, min, max);
 break;
   case Avg:
 globalOrdinalsWithScoreCollector =
-new GlobalOrdinalsWithScoreCollector.Avg(joinField, ordinalMap, 
valueCount, min, max);
+() ->
+new GlobalOrdinalsWithScoreCollector.Avg(
+joinField, finalOrdinalMap, valueCount, min, max);
 break;
   case None:
 if (min <= 1 && max == Integer.MAX_VALUE) {
-  GlobalOrdinalsCollector globalOrdinalsCollector =
-  new GlobalOrdinalsCollector(joinField, ordinalMap, valueCount);
-  searcher.search(rewrittenFromQuery, globalOrdinalsCollector);
+  LongBitSet collectedOrdinals =
+  searcher.search(
+  rewrittenFromQuery,
+  new CollectorManager() {

Review Comment:
   do we expect users to benefit from this collector manager outside of 
`JoinUtil`? If so maybe it deserves to be in its own outer class, and include 
the merge method (now called combine) in it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add support for intra-segment search concurrency [lucene]

2024-09-10 Thread via GitHub


mikemccand commented on code in PR #13542:
URL: https://github.com/apache/lucene/pull/13542#discussion_r1751974584


##
lucene/core/src/java/org/apache/lucene/search/TotalHitCountCollectorManager.java:
##
@@ -28,17 +36,114 @@
  */
 public class TotalHitCountCollectorManager
 implements CollectorManager {
+
+  private final boolean hasSegmentPartitions;
+
+  /**
+   * Creates a new total hit count collector manager. The collectors returned 
by {@link
+   * #newCollector()} don't support intra-segment concurrency. Use the other 
constructor if segments
+   * partitions are being searched.
+   */
+  public TotalHitCountCollectorManager() {
+this(false);
+  }
+
+  /**
+   * Creates a new total hit count collector manager, providing a flag that 
signals whether segment
+   * partitions are being searched, in which case the different collector need 
to share state to
+   * ensure consistent behaviour across partitions of the same segment. There 
are segment partitions
+   * when the {@link IndexSearcher#slices(List)} methods returns leaf slices 
that target leaf reader
+   * partitions.
+   *
+   * @see IndexSearcher#slices(List)
+   * @see org.apache.lucene.search.IndexSearcher.LeafReaderContextPartition
+   * @param hasSegmentPartitions whether the collector manager needs to create 
collectors that
+   * support searching segment partitions in parallel (via intra-segment 
search concurrency)
+   */
+  public TotalHitCountCollectorManager(boolean hasSegmentPartitions) {

Review Comment:
   > I made the change to accept a `LeafSlice[]` in the constructor instead of 
a boolean. That feels more automatic, although not perfect :)
   
   I think this is better -- there is no more trap, since the 
`CollectorManager` now "does the right thing" if there are partitions in the 
`LeafSlice[]`, or not.
   
   We can improve it later (post 10.0.0) ... ideally, somehow, `IndexSearcher` 
would share the slices with the `CollectorManager` up front to inform it.  Or, 
we can optimize enough for the "might contain partitions" case and just use 
that code path always (better).  But let's not block this awesome first step 
with this "optimization drives API" wrinkle...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Convert more classes to record classes [lucene]

2024-09-10 Thread via GitHub


mikemccand commented on code in PR #13328:
URL: https://github.com/apache/lucene/pull/13328#discussion_r1752005115


##
lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java:
##
@@ -163,11 +156,6 @@ public boolean equals(Object o) {
   FieldAndWeight that = (FieldAndWeight) o;
   return Float.compare(that.weight, weight) == 0 && Objects.equals(field, 
that.field);
 }

Review Comment:
   > (I tried to quickly look up what record does wrt equality for floats and 
doubles but couldn't find it, will look later)
   
   I found one mention of floating point numbers in the [JEP for 
records](https://openjdk.org/jeps/395):
   
   ```
   In addition, for all record classes the implicitly declared equals method is 
implemented so that it is reflexive and that it behaves consistently with 
hashCode for record classes that have floating point components. Again, 
explicitly declared equals and hashCode methods should behave similarly.
   ```
   
   I'm not sure I understand that paragraph :)



##
lucene/MIGRATE.md:
##
@@ -193,6 +193,7 @@ access the members using method calls instead of field 
accesses. Affected classe
 
 - `IOContext`, `MergeInfo`, and `FlushInfo` (GITHUB#13205)
 - `BooleanClause` (GITHUB#13261)
+- `CollectionStatistics`, `TermStatistics` and `LeafMetadata` (GITHUB#13328)

Review Comment:
   And  maybe change this wording too (same as `CHANGES.txt`)?  I think 
@jpountz asked for `TopDocs` to also be mentioned?



##
lucene/CHANGES.txt:
##
@@ -118,6 +118,8 @@ API Changes
   argument with a `FacetsCollectorManager` and update the return type to 
include both `TopDocs` results as well as
   facets results. (Luca Cavanna)
 
+* GITHUB#13328: Convert CollectionStatistics, TermStatistics and LeafMetadata 
etc. to record classes. (Shubham Chaudhary)

Review Comment:
   Maybe reword to `Convert many basic Lucene classes to record classes, 
including X, Y, and Z`?  It's not just these three classes, if I'm reading the 
PR correctly, and the `etc.` should not have so much power ;)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add Bulk Scorer For ToParentBlockJoinQuery [lucene]

2024-09-10 Thread via GitHub


Mikep86 commented on code in PR #13697:
URL: https://github.com/apache/lucene/pull/13697#discussion_r1752115832


##
lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java:
##
@@ -242,14 +242,25 @@ public int nextDoc() throws IOException {
 
 @Override
 public int advance(int target) throws IOException {
-  if (target >= parentBits.length()) {
-return doc = NO_MORE_DOCS;
-  }
-  final int firstChildTarget = target == 0 ? 0 : 
parentBits.prevSetBit(target - 1) + 1;
+  final int prevParent =
+  target == 0 ? 0 : parentBits.prevSetBit(Math.min(target, 
parentBits.length()) - 1);
+
   int childDoc = childApproximation.docID();
-  if (childDoc < firstChildTarget) {
-childDoc = childApproximation.advance(firstChildTarget);
+  if (childDoc < prevParent) {
+childDoc = childApproximation.advance(prevParent);
+  }
+
+  // TODO: Can we assume that doc 0 is never a real parent that also 
matches the child query?

Review Comment:
   Because we use 0 to indicate "no previous parent doc", we cannot detect the 
case when:
   
   - A real parent doc exists at 0
   - That parent doc matches the child query
   
   I don't think this is a problem since any real parent doc at 0 would never 
be returned by `ToParentBlockJoinQuery` since it cannot have child docs. 
Thoughts?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add support for intra-segment search concurrency [lucene]

2024-09-10 Thread via GitHub


reta commented on code in PR #13542:
URL: https://github.com/apache/lucene/pull/13542#discussion_r1752230454


##
lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java:
##
@@ -233,7 +235,13 @@ public IndexSearcher(IndexReaderContext context, Executor 
executor) {
 ? leaves ->
 leaves.isEmpty()
 ? new LeafSlice[0]
-: new LeafSlice[] {new LeafSlice(new ArrayList<>(leaves))}
+: new LeafSlice[] {
+  new LeafSlice(
+  new ArrayList<>(

Review Comment:
   We probably could use the immutable list without allocating `new ArrayList`?
   
   ```suggestion
 leaves.stream()
 .map(LeafReaderContextPartition::createForEntireSegment)
 .toList())
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add dynamic range facets [lucene]

2024-09-10 Thread via GitHub


stefanvodita commented on PR #13689:
URL: https://github.com/apache/lucene/pull/13689#issuecomment-2341420365

   I've addressed the feedback and there don't seem to be any blockers. Thank 
you Mike and Robert for reviewing!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add dynamic range facets [lucene]

2024-09-10 Thread via GitHub


stefanvodita merged PR #13689:
URL: https://github.com/apache/lucene/pull/13689


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] DrillSideways#search: use bounded wildcard for the list of drill-sideways CollectorManagers [lucene]

2024-09-10 Thread via GitHub


stefanvodita commented on PR #13750:
URL: https://github.com/apache/lucene/pull/13750#issuecomment-2341497570

   @epotyom - could you still add a CHANGES entry to keep track of this change?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add Bulk Scorer For ToParentBlockJoinQuery [lucene]

2024-09-10 Thread via GitHub


Mikep86 commented on code in PR #13697:
URL: https://github.com/apache/lucene/pull/13697#discussion_r1752115832


##
lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java:
##
@@ -242,14 +242,25 @@ public int nextDoc() throws IOException {
 
 @Override
 public int advance(int target) throws IOException {
-  if (target >= parentBits.length()) {
-return doc = NO_MORE_DOCS;
-  }
-  final int firstChildTarget = target == 0 ? 0 : 
parentBits.prevSetBit(target - 1) + 1;
+  final int prevParent =
+  target == 0 ? 0 : parentBits.prevSetBit(Math.min(target, 
parentBits.length()) - 1);
+
   int childDoc = childApproximation.docID();
-  if (childDoc < firstChildTarget) {
-childDoc = childApproximation.advance(firstChildTarget);
+  if (childDoc < prevParent) {
+childDoc = childApproximation.advance(prevParent);
+  }
+
+  // TODO: Can we assume that doc 0 is never a real parent that also 
matches the child query?

Review Comment:
   ~~Because we use 0 to indicate "no previous parent doc", we cannot detect 
the case when:
   
   - A real parent doc exists at 0
   - That parent doc matches the child query
   
   I don't think this is a problem since any real parent doc at 0 would never 
be returned by `ToParentBlockJoinQuery` since it cannot have child docs. 
Thoughts?~~
   
   This question is based on a misunderstanding on my part. Reworking this part 
of the code now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add Bulk Scorer For ToParentBlockJoinQuery [lucene]

2024-09-10 Thread via GitHub


Mikep86 commented on code in PR #13697:
URL: https://github.com/apache/lucene/pull/13697#discussion_r1752115832


##
lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java:
##
@@ -242,14 +242,25 @@ public int nextDoc() throws IOException {
 
 @Override
 public int advance(int target) throws IOException {
-  if (target >= parentBits.length()) {
-return doc = NO_MORE_DOCS;
-  }
-  final int firstChildTarget = target == 0 ? 0 : 
parentBits.prevSetBit(target - 1) + 1;
+  final int prevParent =
+  target == 0 ? 0 : parentBits.prevSetBit(Math.min(target, 
parentBits.length()) - 1);
+
   int childDoc = childApproximation.docID();
-  if (childDoc < firstChildTarget) {
-childDoc = childApproximation.advance(firstChildTarget);
+  if (childDoc < prevParent) {
+childDoc = childApproximation.advance(prevParent);
+  }
+
+  // TODO: Can we assume that doc 0 is never a real parent that also 
matches the child query?

Review Comment:
   Because we use 0 to indicate "no previous parent doc", we cannot detect the 
case when:
   
   - A real parent doc exists at 0
- That parent doc matches the child query
   
   I don't think this is a problem since any real parent doc at 0 would never 
be returned by `ToParentBlockJoinQuery` since it cannot have child docs. 
Thoughts?
   
   Edit: This question is based on a misunderstanding on my part. Reworking 
this part of the code now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Should KNN indexing throw an exception if `beamWidth < maxConn` to alert users to misconfiguration? [lucene]

2024-09-10 Thread via GitHub


mikemccand closed issue #13752: Should KNN indexing throw an exception if 
`beamWidth < maxConn` to alert users to misconfiguration?
URL: https://github.com/apache/lucene/issues/13752


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Reorder checks in LRUQueryCache#count [lucene]

2024-09-10 Thread via GitHub


jpountz merged PR #13742:
URL: https://github.com/apache/lucene/pull/13742


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add support for intra-segment search concurrency [lucene]

2024-09-10 Thread via GitHub


reta commented on code in PR #13542:
URL: https://github.com/apache/lucene/pull/13542#discussion_r1752702414


##
lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java:
##
@@ -233,7 +235,13 @@ public IndexSearcher(IndexReaderContext context, Executor 
executor) {
 ? leaves ->
 leaves.isEmpty()
 ? new LeafSlice[0]
-: new LeafSlice[] {new LeafSlice(new ArrayList<>(leaves))}
+: new LeafSlice[] {
+  new LeafSlice(
+  new ArrayList<>(

Review Comment:
   Ah, thanks, I got confused by [1] where `Collections.singletonList` is used 
... (I think non issue in this case, nothing to sort)
   
   [1] 
https://github.com/apache/lucene/pull/13542/files#diff-a05ffca5b7693ec880539a5eee7bbfa2d15105e5e8a6105c064d6b66f9226318R286



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add support for intra-segment search concurrency [lucene]

2024-09-10 Thread via GitHub


javanna commented on code in PR #13542:
URL: https://github.com/apache/lucene/pull/13542#discussion_r1752702691


##
lucene/facet/src/test/org/apache/lucene/facet/TestDrillSideways.java:
##
@@ -142,7 +142,7 @@ protected Facets buildFacetsResult(
   private IndexSearcher getNewSearcher(IndexReader reader) {
 // Do not wrap with an asserting searcher, since DrillSidewaysQuery doesn't
 // implement all the required components like Weight#scorer.
-IndexSearcher searcher = newSearcher(reader, true, false, 
random().nextBoolean());
+IndexSearcher searcher = newSearcher(reader, true, false, 
Concurrency.INTER_SEGMENT);

Review Comment:
   I opened #13753 to track this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[I] DrillSideways does not support intra-segment concurrency [lucene]

2024-09-10 Thread via GitHub


javanna opened a new issue, #13753:
URL: https://github.com/apache/lucene/issues/13753

   With the introduction of intra-segment search concurrency (see #13542), 
`DrillSideways` is the only case that disables it in tests which can't support 
intra-segment slicing given its implementation requirements to go through all 
docs in a segment at once. This is asserted as well in its score method which 
checks that min and max are respectively `0` and `NO_MORE_DOCS`. 
   
   This issue is to discuss a plan to support intra-segment concurrency in 
`DrillSideways`. Ideally, we'd be able to one day enable intra-segment slicing 
by default, once #13745 is addressed, but that would not make `IndexSearcher` 
usable in `DrillSideways` out of the box which is not a good user experience. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Replace Map with IntObjectHashMap for DV producer [lucene]

2024-09-10 Thread via GitHub


jpountz merged PR #13686:
URL: https://github.com/apache/lucene/pull/13686


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add dynamic range facets [lucene]

2024-09-10 Thread via GitHub


stefanvodita commented on code in PR #13689:
URL: https://github.com/apache/lucene/pull/13689#discussion_r1752721942


##
lucene/facet/src/java/org/apache/lucene/facet/range/DynamicRangeUtil.java:
##
@@ -0,0 +1,276 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.range;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.Callable;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Future;
+import org.apache.lucene.facet.FacetsCollector;
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.search.LongValues;
+import org.apache.lucene.search.LongValuesSource;
+import org.apache.lucene.util.IOUtils;
+import org.apache.lucene.util.InPlaceMergeSorter;
+
+/**
+ * Methods to create dynamic ranges for numeric fields.
+ *
+ * @lucene.experimental
+ */
+public final class DynamicRangeUtil {
+
+  private DynamicRangeUtil() {}
+
+  /**
+   * Construct dynamic ranges using the specified weight field to generate 
equi-weight range for the
+   * specified numeric bin field
+   *
+   * @param weightFieldName Name of the specified weight field
+   * @param weightValueSource Value source of the weight field
+   * @param fieldValueSource Value source of the value field
+   * @param facetsCollector FacetsCollector
+   * @param topN Number of requested ranges
+   * @param exec An executor service that is used to do the computation
+   * @return A list of DynamicRangeInfo that contains count, relevance, min, 
max, and centroid for
+   * each range
+   */
+  public static List computeDynamicRanges(
+  String weightFieldName,
+  LongValuesSource weightValueSource,
+  LongValuesSource fieldValueSource,
+  FacetsCollector facetsCollector,
+  int topN,
+  ExecutorService exec)
+  throws IOException {
+
+List matchingDocsList = 
facetsCollector.getMatchingDocs();
+int totalDoc = matchingDocsList.stream().mapToInt(matchingDoc -> 
matchingDoc.totalHits).sum();
+long[] values = new long[totalDoc];
+long[] weights = new long[totalDoc];
+long totalWeight = 0;
+int overallLength = 0;
+
+List> futures = new ArrayList<>();
+List tasks = new ArrayList<>();
+for (FacetsCollector.MatchingDocs matchingDocs : matchingDocsList) {
+  if (matchingDocs.totalHits > 0) {
+SegmentOutput segmentOutput = new 
SegmentOutput(matchingDocs.totalHits);
+
+// [1] retrieve values and associated weights concurrently
+SegmentTask task =
+new SegmentTask(matchingDocs, fieldValueSource, weightValueSource, 
segmentOutput);
+tasks.add(task);
+futures.add(exec.submit(task));
+  }
+}
+
+// [2] wait for all segment runs to finish
+for (Future future : futures) {
+  try {
+future.get();
+  } catch (InterruptedException ie) {
+throw new RuntimeException(ie);
+  } catch (ExecutionException ee) {
+IOUtils.rethrowAlways(ee.getCause());
+  }
+}
+
+// [3] merge the segment value and weight arrays into one array 
respectively and update the
+// total weights
+// and valid value length
+for (SegmentTask task : tasks) {
+  SegmentOutput curSegmentOutput = task.segmentOutput;
+  // if segment total weight overflows, return null
+  if (curSegmentOutput == null) {
+return null;
+  }
+
+  assert curSegmentOutput.values.length == curSegmentOutput.weights.length;
+
+  try {
+totalWeight = Math.addExact(curSegmentOutput.segmentTotalWeight, 
totalWeight);
+  } catch (ArithmeticException ae) {
+throw new IllegalArgumentException(
+"weight field \"" + weightFieldName + "\": long totalWeight value 
out of bounds", ae);
+  }
+
+  int currSegmentLen = curSegmentOutput.segmentIdx;
+  System.arraycopy(curSegmentOutput.values, 0, values, overallLength, 
currSegmentLen);
+  System.arraycopy(curSegmentOutput.weights, 0, weights, overallLength, 
currSegmentLen);
+  overallLength += currSegmentLen;
+}
+return computeDynamicNumericRanges(values, weights

Re: [PR] Convert more classes to record classes [lucene]

2024-09-10 Thread via GitHub


jpountz merged PR #13328:
URL: https://github.com/apache/lucene/pull/13328


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add AbstractKnnVectorQuery.seed for seeded HNSW [lucene]

2024-09-10 Thread via GitHub


benwtrent commented on PR #13635:
URL: https://github.com/apache/lucene/pull/13635#issuecomment-2342018139

   I haven't been able to review fully, but will soon. 
   
   I like this idea. Do you have any scripts & test data where you have tested 
that this Lucene implementation works and gives as good as results as you had 
in the original paper?
   
   One thing that still bugs me is attaching yet another parameter to a common 
method. Especially a parameter that will be rarely used (my gut reaction is 
choosing the correct seed query is difficult and average users will never use 
it). I will think more about this. Maybe its just fine and I am overreacting.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add support for intra-segment search concurrency [lucene]

2024-09-10 Thread via GitHub


javanna commented on PR #13542:
URL: https://github.com/apache/lucene/pull/13542#issuecomment-2342018120

   Thanks all for the feedback and reviews! The journey only begins as this PR 
is getting merged, looking forward to making intra-segment concurrency support 
shine soon :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add support for intra-segment search concurrency [lucene]

2024-09-10 Thread via GitHub


javanna merged PR #13542:
URL: https://github.com/apache/lucene/pull/13542


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Decouple within-query concurrency from the index's segment geometry [LUCENE-8675] [lucene]

2024-09-10 Thread via GitHub


javanna closed issue #9721: Decouple within-query concurrency from the index's 
segment geometry [LUCENE-8675]
URL: https://github.com/apache/lucene/issues/9721


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Decouple within-query concurrency from the index's segment geometry [LUCENE-8675] [lucene]

2024-09-10 Thread via GitHub


javanna closed issue #9721: Decouple within-query concurrency from the index's 
segment geometry [LUCENE-8675]
URL: https://github.com/apache/lucene/issues/9721


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Convert more classes to record classes [lucene]

2024-09-10 Thread via GitHub


stefanvodita commented on PR #13328:
URL: https://github.com/apache/lucene/pull/13328#issuecomment-2342037249

   I think we've run into a conflict with #13689. @jpountz - should I fix that 
or are you already doing it?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Convert more classes to record classes [lucene]

2024-09-10 Thread via GitHub


jpountz commented on PR #13328:
URL: https://github.com/apache/lucene/pull/13328#issuecomment-2342059732

   @javanna just fixed it, thanks @javanna !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Update QueryUtils to use CollectorManager [lucene]

2024-09-10 Thread via GitHub


msfroh commented on code in PR #13748:
URL: https://github.com/apache/lucene/pull/13748#discussion_r1752775181


##
lucene/test-framework/src/java/org/apache/lucene/tests/search/QueryUtils.java:
##
@@ -345,175 +348,201 @@ public static void checkSkipTo(final Query q, final 
IndexSearcher s) throws IOEx
   // FUTURE: ensure scorer.doc()==-1
 
   final float maxDiff = 1e-5f;
-  final LeafReader[] lastReader = {null};
 
-  s.search(
-  q,
-  new SimpleCollector() {
-private Scorable sc;
-private Scorer scorer;
-private DocIdSetIterator iterator;
-private int leafPtr;
+  List lastReaders =
+  s.search(
+  q,
+  new CollectorManager>() {
+@Override
+public SimpleCollectorWithLastReader newCollector() {
+  return new SimpleCollectorWithLastReader() {
+LeafReader lastReader = null;
+private Scorable sc;
+private Scorer scorer;
+private DocIdSetIterator iterator;
+private int leafPtr;
+
+@Override
+public void setScorer(Scorable scorer) {
+  this.sc = scorer;
+}
 
-@Override
-public void setScorer(Scorable scorer) {
-  this.sc = scorer;
-}
+@Override
+public void collect(int doc) throws IOException {
+  float score = sc.score();
+  lastDoc[0] = doc;
+  try {
+if (scorer == null) {
+  Query rewritten = s.rewrite(q);
+  Weight w = s.createWeight(rewritten, 
ScoreMode.COMPLETE, 1);
+  LeafReaderContext context = 
readerContextArray.get(leafPtr);
+  scorer = w.scorer(context);
+  iterator = scorer.iterator();
+}
+
+int op = order[(opidx[0]++) % order.length];
+// System.out.println(op==skip_op ?
+// "skip("+(sdoc[0]+1)+")":"next()");
+boolean more =
+op == skip_op
+? iterator.advance(scorer.docID() + 1)
+!= DocIdSetIterator.NO_MORE_DOCS
+: iterator.nextDoc() != 
DocIdSetIterator.NO_MORE_DOCS;
+int scorerDoc = scorer.docID();
+float scorerScore = scorer.score();
+float scorerScore2 = scorer.score();
+float scoreDiff = Math.abs(score - scorerScore);
+float scorerDiff = Math.abs(scorerScore2 - 
scorerScore);
+
+boolean success = false;
+try {
+  assertTrue(more);
+  assertEquals("scorerDoc=" + scorerDoc + ",doc=" + 
doc, scorerDoc, doc);
+  assertTrue(
+  "score=" + score + ", scorerScore=" + 
scorerScore,
+  scoreDiff <= maxDiff);
+  assertTrue(
+  "scorerScorer=" + scorerScore + ", 
scorerScore2=" + scorerScore2,
+  scorerDiff <= maxDiff);
+  success = true;
+} finally {
+  if (!success) {
+if (LuceneTestCase.VERBOSE) {
+  StringBuilder sbord = new StringBuilder();
+  for (int i = 0; i < order.length; i++) {
+sbord.append(order[i] == skip_op ? " skip()" : 
" next()");
+  }
+  System.out.println(
+  "ERROR matching docs:"
+  + "\n\t"
+  + (doc != scorerDoc ? "--> " : "")
+  + "doc="
+  + doc
+  + ", scorerDoc="
+  + scorerDoc
+  + "\n\t"
+  + (!more ? "--> " : "")
+  + "tscorer.more="
+  + more
+  + "\n\t"
+  + (scoreDiff > maxDiff ? "--> " : "")
+  + "scorerScore="
+  + scorerScore
+  + " scoreDiff="
+  + scoreDiff
+  

Re: [PR] Add dynamic range facets [lucene]

2024-09-10 Thread via GitHub


stefanvodita commented on code in PR #13689:
URL: https://github.com/apache/lucene/pull/13689#discussion_r1752721942


##
lucene/facet/src/java/org/apache/lucene/facet/range/DynamicRangeUtil.java:
##
@@ -0,0 +1,276 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.range;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.Callable;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Future;
+import org.apache.lucene.facet.FacetsCollector;
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.search.LongValues;
+import org.apache.lucene.search.LongValuesSource;
+import org.apache.lucene.util.IOUtils;
+import org.apache.lucene.util.InPlaceMergeSorter;
+
+/**
+ * Methods to create dynamic ranges for numeric fields.
+ *
+ * @lucene.experimental
+ */
+public final class DynamicRangeUtil {
+
+  private DynamicRangeUtil() {}
+
+  /**
+   * Construct dynamic ranges using the specified weight field to generate 
equi-weight range for the
+   * specified numeric bin field
+   *
+   * @param weightFieldName Name of the specified weight field
+   * @param weightValueSource Value source of the weight field
+   * @param fieldValueSource Value source of the value field
+   * @param facetsCollector FacetsCollector
+   * @param topN Number of requested ranges
+   * @param exec An executor service that is used to do the computation
+   * @return A list of DynamicRangeInfo that contains count, relevance, min, 
max, and centroid for
+   * each range
+   */
+  public static List computeDynamicRanges(
+  String weightFieldName,
+  LongValuesSource weightValueSource,
+  LongValuesSource fieldValueSource,
+  FacetsCollector facetsCollector,
+  int topN,
+  ExecutorService exec)
+  throws IOException {
+
+List matchingDocsList = 
facetsCollector.getMatchingDocs();
+int totalDoc = matchingDocsList.stream().mapToInt(matchingDoc -> 
matchingDoc.totalHits).sum();
+long[] values = new long[totalDoc];
+long[] weights = new long[totalDoc];
+long totalWeight = 0;
+int overallLength = 0;
+
+List> futures = new ArrayList<>();
+List tasks = new ArrayList<>();
+for (FacetsCollector.MatchingDocs matchingDocs : matchingDocsList) {
+  if (matchingDocs.totalHits > 0) {
+SegmentOutput segmentOutput = new 
SegmentOutput(matchingDocs.totalHits);
+
+// [1] retrieve values and associated weights concurrently
+SegmentTask task =
+new SegmentTask(matchingDocs, fieldValueSource, weightValueSource, 
segmentOutput);
+tasks.add(task);
+futures.add(exec.submit(task));
+  }
+}
+
+// [2] wait for all segment runs to finish
+for (Future future : futures) {
+  try {
+future.get();
+  } catch (InterruptedException ie) {
+throw new RuntimeException(ie);
+  } catch (ExecutionException ee) {
+IOUtils.rethrowAlways(ee.getCause());
+  }
+}
+
+// [3] merge the segment value and weight arrays into one array 
respectively and update the
+// total weights
+// and valid value length
+for (SegmentTask task : tasks) {
+  SegmentOutput curSegmentOutput = task.segmentOutput;
+  // if segment total weight overflows, return null
+  if (curSegmentOutput == null) {
+return null;
+  }
+
+  assert curSegmentOutput.values.length == curSegmentOutput.weights.length;
+
+  try {
+totalWeight = Math.addExact(curSegmentOutput.segmentTotalWeight, 
totalWeight);
+  } catch (ArithmeticException ae) {
+throw new IllegalArgumentException(
+"weight field \"" + weightFieldName + "\": long totalWeight value 
out of bounds", ae);
+  }
+
+  int currSegmentLen = curSegmentOutput.segmentIdx;
+  System.arraycopy(curSegmentOutput.values, 0, values, overallLength, 
currSegmentLen);
+  System.arraycopy(curSegmentOutput.weights, 0, weights, overallLength, 
currSegmentLen);
+  overallLength += currSegmentLen;
+}
+return computeDynamicNumericRanges(values, weights

Re: [PR] Update QueryUtils to use CollectorManager [lucene]

2024-09-10 Thread via GitHub


javanna commented on code in PR #13748:
URL: https://github.com/apache/lucene/pull/13748#discussion_r1752834380


##
lucene/test-framework/src/java/org/apache/lucene/tests/search/QueryUtils.java:
##
@@ -345,175 +348,201 @@ public static void checkSkipTo(final Query q, final 
IndexSearcher s) throws IOEx
   // FUTURE: ensure scorer.doc()==-1
 
   final float maxDiff = 1e-5f;
-  final LeafReader[] lastReader = {null};
 
-  s.search(
-  q,
-  new SimpleCollector() {
-private Scorable sc;
-private Scorer scorer;
-private DocIdSetIterator iterator;
-private int leafPtr;
+  List lastReaders =
+  s.search(
+  q,
+  new CollectorManager>() {
+@Override
+public SimpleCollectorWithLastReader newCollector() {
+  return new SimpleCollectorWithLastReader() {
+LeafReader lastReader = null;
+private Scorable sc;
+private Scorer scorer;
+private DocIdSetIterator iterator;
+private int leafPtr;
+
+@Override
+public void setScorer(Scorable scorer) {
+  this.sc = scorer;
+}
 
-@Override
-public void setScorer(Scorable scorer) {
-  this.sc = scorer;
-}
+@Override
+public void collect(int doc) throws IOException {
+  float score = sc.score();
+  lastDoc[0] = doc;
+  try {
+if (scorer == null) {
+  Query rewritten = s.rewrite(q);
+  Weight w = s.createWeight(rewritten, 
ScoreMode.COMPLETE, 1);
+  LeafReaderContext context = 
readerContextArray.get(leafPtr);
+  scorer = w.scorer(context);
+  iterator = scorer.iterator();
+}
+
+int op = order[(opidx[0]++) % order.length];
+// System.out.println(op==skip_op ?
+// "skip("+(sdoc[0]+1)+")":"next()");
+boolean more =
+op == skip_op
+? iterator.advance(scorer.docID() + 1)
+!= DocIdSetIterator.NO_MORE_DOCS
+: iterator.nextDoc() != 
DocIdSetIterator.NO_MORE_DOCS;
+int scorerDoc = scorer.docID();
+float scorerScore = scorer.score();
+float scorerScore2 = scorer.score();
+float scoreDiff = Math.abs(score - scorerScore);
+float scorerDiff = Math.abs(scorerScore2 - 
scorerScore);
+
+boolean success = false;
+try {
+  assertTrue(more);
+  assertEquals("scorerDoc=" + scorerDoc + ",doc=" + 
doc, scorerDoc, doc);
+  assertTrue(
+  "score=" + score + ", scorerScore=" + 
scorerScore,
+  scoreDiff <= maxDiff);
+  assertTrue(
+  "scorerScorer=" + scorerScore + ", 
scorerScore2=" + scorerScore2,
+  scorerDiff <= maxDiff);
+  success = true;
+} finally {
+  if (!success) {
+if (LuceneTestCase.VERBOSE) {
+  StringBuilder sbord = new StringBuilder();
+  for (int i = 0; i < order.length; i++) {
+sbord.append(order[i] == skip_op ? " skip()" : 
" next()");
+  }
+  System.out.println(
+  "ERROR matching docs:"
+  + "\n\t"
+  + (doc != scorerDoc ? "--> " : "")
+  + "doc="
+  + doc
+  + ", scorerDoc="
+  + scorerDoc
+  + "\n\t"
+  + (!more ? "--> " : "")
+  + "tscorer.more="
+  + more
+  + "\n\t"
+  + (scoreDiff > maxDiff ? "--> " : "")
+  + "scorerScore="
+  + scorerScore
+  + " scoreDiff="
+  + scoreDiff
+ 

Re: [PR] Convert more classes to record classes [lucene]

2024-09-10 Thread via GitHub


uschindler commented on code in PR #13328:
URL: https://github.com/apache/lucene/pull/13328#discussion_r1752842660


##
lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java:
##
@@ -163,11 +156,6 @@ public boolean equals(Object o) {
   FieldAndWeight that = (FieldAndWeight) o;
   return Float.compare(that.weight, weight) == 0 && Objects.equals(field, 
that.field);
 }

Review Comment:
   Hi,
   The equals methods for floats is generated by invokedynamic using a helper. 
When a record component is a float, the autogenerated impl is generated at 
runtime as combination of several Method handles. It compares them using this 
code:
   
   
https://github.com/openjdk/jdk/blob/9785e19f3f87306cabc26a862d35b89d41cfef93/src/java.base/share/classes/java/lang/runtime/ObjectMethods.java#L157
   
   This is equivalent to the already existing code. So please remove all custom 
equals methods.
   
   Unfortunately in the merged PR there are so one more. I will add a not on 
them. Please remove them in a followup.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Convert more classes to record classes [lucene]

2024-09-10 Thread via GitHub


uschindler commented on code in PR #13328:
URL: https://github.com/apache/lucene/pull/13328#discussion_r1752842660


##
lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java:
##
@@ -163,11 +156,6 @@ public boolean equals(Object o) {
   FieldAndWeight that = (FieldAndWeight) o;
   return Float.compare(that.weight, weight) == 0 && Objects.equals(field, 
that.field);
 }

Review Comment:
   Hi,
   The equals methods for floats is generated by invokedynamic using a helper. 
When a record component is a float, the autogenerated impl is generated at 
runtime as combination of several Method handles. It compares them using this 
code:
   
   
https://github.com/openjdk/jdk/blob/9785e19f3f87306cabc26a862d35b89d41cfef93/src/java.base/share/classes/java/lang/runtime/ObjectMethods.java#L157
   
   The method handles used by the code generator are saved in a map:
   
   
https://github.com/openjdk/jdk/blob/9785e19f3f87306cabc26a862d35b89d41cfef93/src/java.base/share/classes/java/lang/runtime/ObjectMethods.java#L101
   
   This is equivalent to the already existing code. So please remove all custom 
equals methods.
   
   Unfortunately in the merged PR there are so one more. I will add a not on 
them. Please remove them in a followup.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Convert more classes to record classes [lucene]

2024-09-10 Thread via GitHub


uschindler commented on code in PR #13328:
URL: https://github.com/apache/lucene/pull/13328#discussion_r1752860507


##
lucene/core/src/java/org/apache/lucene/search/MaxScoreAccumulator.java:
##
@@ -101,11 +93,6 @@ public boolean equals(Object o) {
   return docBase == result.docBase && Float.compare(result.score, score) 
== 0;
 }
 
-@Override

Review Comment:
   Please also remove the equals, see this comment: 
https://github.com/apache/lucene/pull/13328#discussion_r1752842660



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Convert more classes to record classes [lucene]

2024-09-10 Thread via GitHub


uschindler commented on code in PR #13328:
URL: https://github.com/apache/lucene/pull/13328#discussion_r1752842660


##
lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java:
##
@@ -163,11 +156,6 @@ public boolean equals(Object o) {
   FieldAndWeight that = (FieldAndWeight) o;
   return Float.compare(that.weight, weight) == 0 && Objects.equals(field, 
that.field);
 }

Review Comment:
   Hi,
   The equals methods for floats is generated by invokedynamic at runtime using 
a bootstrapping:
   
   
https://github.com/openjdk/jdk/blob/9785e19f3f87306cabc26a862d35b89d41cfef93/src/java.base/share/classes/java/lang/runtime/ObjectMethods.java#L382
   
   The class file of a record only contains a stub equals/hashcode/to string 
which delegates to an invokedynamic call passing the class which allows the 
bootstrapped to get all record components and construct a method handle for the 
whole method implementation.
   
   When a record component is a float, the autogenerated impl using this impl 
to handle floats. It compares them using this code:
   
   
https://github.com/openjdk/jdk/blob/9785e19f3f87306cabc26a862d35b89d41cfef93/src/java.base/share/classes/java/lang/runtime/ObjectMethods.java#L157
   
   The method handles used by the code generator are saved in a map:
   
   
https://github.com/openjdk/jdk/blob/9785e19f3f87306cabc26a862d35b89d41cfef93/src/java.base/share/classes/java/lang/runtime/ObjectMethods.java#L101
   
   This is equivalent to the already existing code. So please remove all custom 
equals methods.
   
   Unfortunately in the merged PR there are some more. Please remove them in a 
followup.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Remove usage of IndexSearcher#search(Query, Collector) from join package [lucene]

2024-09-10 Thread via GitHub


msfroh commented on code in PR #13747:
URL: https://github.com/apache/lucene/pull/13747#discussion_r1752887025


##
lucene/join/src/java/org/apache/lucene/search/join/JoinUtil.java:
##
@@ -530,49 +528,104 @@ public static Query createJoinQuery(
 
 final Query rewrittenFromQuery = searcher.rewrite(fromQuery);
 final Query rewrittenToQuery = searcher.rewrite(toQuery);
-GlobalOrdinalsWithScoreCollector globalOrdinalsWithScoreCollector;
+Supplier 
globalOrdinalsWithScoreCollector;
+final OrdinalMap finalOrdinalMap = ordinalMap;
 switch (scoreMode) {
   case Total:
 globalOrdinalsWithScoreCollector =
-new GlobalOrdinalsWithScoreCollector.Sum(joinField, ordinalMap, 
valueCount, min, max);
+() ->
+new GlobalOrdinalsWithScoreCollector.Sum(
+joinField, finalOrdinalMap, valueCount, min, max);
 break;
   case Min:
 globalOrdinalsWithScoreCollector =
-new GlobalOrdinalsWithScoreCollector.Min(joinField, ordinalMap, 
valueCount, min, max);
+() ->
+new GlobalOrdinalsWithScoreCollector.Min(
+joinField, finalOrdinalMap, valueCount, min, max);
 break;
   case Max:
 globalOrdinalsWithScoreCollector =
-new GlobalOrdinalsWithScoreCollector.Max(joinField, ordinalMap, 
valueCount, min, max);
+() ->
+new GlobalOrdinalsWithScoreCollector.Max(
+joinField, finalOrdinalMap, valueCount, min, max);
 break;
   case Avg:
 globalOrdinalsWithScoreCollector =
-new GlobalOrdinalsWithScoreCollector.Avg(joinField, ordinalMap, 
valueCount, min, max);
+() ->
+new GlobalOrdinalsWithScoreCollector.Avg(
+joinField, finalOrdinalMap, valueCount, min, max);
 break;
   case None:
 if (min <= 1 && max == Integer.MAX_VALUE) {
-  GlobalOrdinalsCollector globalOrdinalsCollector =
-  new GlobalOrdinalsCollector(joinField, ordinalMap, valueCount);
-  searcher.search(rewrittenFromQuery, globalOrdinalsCollector);
+  LongBitSet collectedOrdinals =
+  searcher.search(
+  rewrittenFromQuery,
+  new CollectorManager() {

Review Comment:
   Good idea!
   
   I've created a new `MergeableCollector` interface that adds a `merge` method 
to `Collector`. Then we can reuse a `MergeableCollectorManager` to fold all 
results into a single collector in the `reduce` step. At the very least, I can 
use it for the two cases in this method (`GlobalOrdinalsCollector` and 
`GlobalOrdinalsWithScoreCollector`). I should be able to use them once I tackle 
the terms and numeric collectors.
   
   I've kept them package-private for now, so I can use them throughout the 
`join` module. I don't know if anyone else wants to implement `reduce` as a 
pair-wise merge. (Most of the existing `reduce` implementations just do an 
N-way merge.)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Convert more classes to record classes [lucene]

2024-09-10 Thread via GitHub


uschindler commented on code in PR #13328:
URL: https://github.com/apache/lucene/pull/13328#discussion_r1752895892


##
lucene/core/src/java/org/apache/lucene/search/MaxScoreAccumulator.java:
##
@@ -101,11 +93,6 @@ public boolean equals(Object o) {
   return docBase == result.docBase && Float.compare(result.score, score) 
== 0;
 }
 
-@Override

Review Comment:
   To string is also useless here. Please remove, too 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Use range optimizations for "slow" MultiTermQueries when terms happen to be contiguous [lucene]

2024-09-10 Thread via GitHub


github-actions[bot] commented on PR #13693:
URL: https://github.com/apache/lucene/pull/13693#issuecomment-2342381311

   This PR has not had activity in the past 2 weeks, labeling it as stale. If 
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you 
for your contribution!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Implement Accountable for NFARunAutomaton [lucene]

2024-09-10 Thread via GitHub


zhaih merged PR #13741:
URL: https://github.com/apache/lucene/pull/13741


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] add TFSimilarity class [lucene]

2024-09-10 Thread via GitHub


rmuir commented on code in PR #13749:
URL: https://github.com/apache/lucene/pull/13749#discussion_r1752981586


##
lucene/core/src/java/org/apache/lucene/search/similarities/TFSimilarity.java:
##
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search.similarities;
+
+import org.apache.lucene.index.FieldInvertState;
+import org.apache.lucene.search.CollectionStatistics;
+import org.apache.lucene.search.TermStatistics;
+
+/** Similarity that returns the TF as score */
+public class TFSimilarity extends Similarity {
+
+  @Override
+  public long computeNorm(FieldInvertState state) {
+return 1; // we dont care
+  }

Review Comment:
   I'd also support removing this footgun as a separate PR. Similarity is 
supposed to be an "expert" interface, but this abstract method that impacts the 
index format is trappy, for the reasons shown here.
   
   Maybe it should have a default implementation (currently copied about 3 or 4 
different places in subclasses)? 
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/similarities/Similarity.java#L114
   
   I wouldn't mess with any other method, except their javadocs. If we have a 
default implementation for `computeNorm` then we can suggest in these places a 
code snippet of how to decode it (e.g. link to SmallFloat.byte4ToInt or 
whatever) to help users:
   * 
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/similarities/Similarity.java#L158
   * 
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/similarities/Similarity.java#L167
   
   It would make the class easier to use. Users might look at BM25Similarity to 
try to figure out how to do it today, but that one has heavy optimizations such 
as precomputed length table. A couple javadocs would go a long way on the 
query-side there.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Remove usage of IndexSearcher#search(Query, Collector) from join package [lucene]

2024-09-10 Thread via GitHub


msfroh commented on PR #13747:
URL: https://github.com/apache/lucene/pull/13747#issuecomment-2342434431

   @javanna  -- I've managed to get the remaining numeric / terms collectors in 
the Join module working with multiple search threads.
   
   I can add them to this PR, but the diff is pretty massive. I'm thinking of 
holding off for another PR, but I'm happy to go either way. 
   
   There is probably value in "atomically" jumping from the current "always 
single-threaded" mode straight to "everything works with a multithreaded 
searcher", versus this PR's current state where global ordinal-based joins work 
with a multithreaded searcher but numeric/term-based joins don't.
   
   Thanks a lot for the review!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Convert more classes to record classes [lucene]

2024-09-10 Thread via GitHub


shubhamvishu commented on code in PR #13328:
URL: https://github.com/apache/lucene/pull/13328#discussion_r1753129302


##
lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java:
##
@@ -163,11 +156,6 @@ public boolean equals(Object o) {
   FieldAndWeight that = (FieldAndWeight) o;
   return Float.compare(that.weight, weight) == 0 && Objects.equals(field, 
that.field);
 }

Review Comment:
   Nice! I'll check and followup with removing others I could find.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Convert more classes to record classes [lucene]

2024-09-10 Thread via GitHub


shubhamvishu commented on code in PR #13328:
URL: https://github.com/apache/lucene/pull/13328#discussion_r1753134745


##
lucene/core/src/java/org/apache/lucene/search/MaxScoreAccumulator.java:
##
@@ -101,11 +93,6 @@ public boolean equals(Object o) {
   return docBase == result.docBase && Float.compare(result.score, score) 
== 0;
 }
 
-@Override

Review Comment:
   Sure, I'll fix it in a followup PR. I could only find this instance in the 
classes changed to records where we have removable custom equals/toString but 
I'll take another pass to be sure.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[I] Nightly gh action "buildAndPushRelease and smokeTestRelease.py" should save release.log on failure [lucene]

2024-09-10 Thread via GitHub


dweiss opened a new issue, #13754:
URL: https://github.com/apache/lucene/issues/13754

   ### Description
   
   It'll be useful to diagnose problems in case of failures, such as this one:
   https://github.com/apache/lucene/actions/runs/10803880083/job/29968434986
   
   ### Version and environment details
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Nightly gh action "buildAndPushRelease and smokeTestRelease.py" should save release.log on failure [lucene]

2024-09-10 Thread via GitHub


dweiss closed issue #13754: Nightly gh action "buildAndPushRelease and 
smokeTestRelease.py" should save release.log on failure
URL: https://github.com/apache/lucene/issues/13754


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Modernize switch statements in FST [lucene]

2024-09-10 Thread via GitHub


mrhbj closed pull request #13755: Modernize switch statements in FST
URL: https://github.com/apache/lucene/pull/13755


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Modernize switch statements in FST [lucene]

2024-09-10 Thread via GitHub


mrhbj opened a new pull request, #13755:
URL: https://github.com/apache/lucene/pull/13755

   ### Description
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org