[PR] [9_10] Mark TermInSetQuery ctors with varargs terms as deprecated [lucene]

2023-12-01 Thread via GitHub


slow-J opened a new pull request, #12864:
URL: https://github.com/apache/lucene/pull/12864

   Issue: https://github.com/apache/lucene/issues/12243.
   The deprecated items here are being removed in `main` branch through 
PR:https://github.com/apache/lucene/pull/12837.
   
   For methods calling this, also adding new methods taking a collection param 
and deprecating previous ones with varargs param: 
SortedSetDocValuesField#newSlowSetQuery, SortedDocValuesField#newSlowSetQuery, 
KeywordField#newSetQuery
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Fix intermittently failing TestParallelLeafReader [lucene]

2023-12-01 Thread via GitHub


ChrisHegarty opened a new pull request, #12865:
URL: https://github.com/apache/lucene/pull/12865

   This commit fixes the intermittently failing TestParallelLeafReader.
   
   The ParallelLeafReader requires the document order to be consistent across 
indexes - each document contains the union of the fields of all documents with 
the same document number. The test asserts this. But now, with 
MockRandomMergePolicy potentially reversing the doc ID order while merging, 
this invalidates the assumption of the test indexes and assertions. The 
solution is to just ensure that no merging actually happens in these tiny test 
indexes.
   
   Similar root cause as #12850.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Prevent extra similarity computation for single-level graphs [lucene]

2023-12-01 Thread via GitHub


kaivalnp opened a new pull request, #12866:
URL: https://github.com/apache/lucene/pull/12866

   ### Description
   
   
[`#findBestEntryPoint`](https://github.com/apache/lucene/blob/4bc7850465dfac9dc0638d9ee782007883869ffe/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java#L151)
 is used to determine the entry point for the last level of HNSW search
   
   It finds the single best-scoring node from [all upper 
levels](https://github.com/apache/lucene/blob/4bc7850465dfac9dc0638d9ee782007883869ffe/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java#L159)
 - but performs an [unnecessary 
computation](https://github.com/apache/lucene/blob/4bc7850465dfac9dc0638d9ee782007883869ffe/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java#L157)
 (along with [recording one visited 
node](https://github.com/apache/lucene/blob/4bc7850465dfac9dc0638d9ee782007883869ffe/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java#L154))
 when the graph just has 1 level (so the entry node is just the overall graph's 
entry node)
   
   Also added a test to demonstrate this (fails without the changes in PR) -- 
where we visit `graph.size() + 1` nodes when the `topK` is high (should be a 
maximum of `graph.size()`)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Fix intermittently failing TestParallelLeafReader [lucene]

2023-12-01 Thread via GitHub


ChrisHegarty commented on PR #12865:
URL: https://github.com/apache/lucene/pull/12865#issuecomment-1836024282

   The test was previously seen to fail about 1 in every couple of hundred 
runs, with:
   
   ```
   org.junit.ComparisonFailure: expected: but was:
at __randomizedtesting.SeedInfo.seed([D07DE7201C057238:E432507488980B27]:0)
at org.junit.Assert.assertEquals(Assert.java:117)
at org.junit.Assert.assertEquals(Assert.java:146)
at 
org.apache.lucene.index.TestParallelLeafReader.queryTest(TestParallelLeafReader.java:264)
at 
org.apache.lucene.index.TestParallelLeafReader.testQueries(TestParallelLeafReader.java:49)
...
   ```
   
   After the change, the test has not failed in several hundreds of thousands 
of runs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add ParentJoin KNN support [lucene]

2023-12-01 Thread via GitHub


benwtrent commented on PR #12434:
URL: https://github.com/apache/lucene/pull/12434#issuecomment-1836025921

   @david-sitsky sorry for the confusion, it was renamed 
`DiversifyingChildren*KnnVectorQuery`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Prevent extra similarity computation for single-level graphs [lucene]

2023-12-01 Thread via GitHub


benwtrent commented on code in PR #12866:
URL: https://github.com/apache/lucene/pull/12866#discussion_r1412053821


##
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java:
##
@@ -100,19 +100,10 @@ private static void search(
   HnswGraphSearcher graphSearcher,
   Bits acceptOrds)
   throws IOException {
-int initialEp = graph.entryNode();
-if (initialEp == -1) {
-  return;
+int ep = graphSearcher.findBestEntryPoint(scorer, graph, knnCollector);

Review Comment:
   I think you should still check `collector.earlyTerminated()` after finding 
the best entry point as we increment the visited count after each check. 
Meaning we could have visited too many by one, and there is no point in 
continuing.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-01 Thread via GitHub


stefanvodita commented on PR #12844:
URL: https://github.com/apache/lucene/pull/12844#issuecomment-1836064137

   I did 5 benchmark runs for 4 configurations. To avoid making this comment 
way too large, I'll just report the averages across the 5 runs per 
configuration.
   It looks like there are some genuine differences between the baseline and 
cadidate, but I'm not sure the differences between different candidate 
configurations are significant. How do we trade off between the extra 
miliseconds and the memory savings?
   
   ```
   baseline
   recall  latency nDocfanout  maxConn beamWidth   visited index ms
   0.7200.16   1   0   64  250 100 25021.00
post-filter
   0.5420.23   10  0   64  250 100 43624   1.00
post-filter
   0.5120.28   20  0   64  250 100 111019  1.00
post-filter
   
   candidate with INITIAL_CAPACITY == 10
   recall  latency nDocfanout  maxConn beamWidth   visited index ms
   0.7200.16   1   0   64  250 100 24981.00
post-filter
   0.5420.23   10  0   64  250 100 45063   1.00
post-filter
   0.5120.28   20  0   64  250 100 116683  1.00
post-filter
   
   candidate with INITIAL_CAPACITY == 100
   recall  latency nDocfanout  maxConn beamWidth   visited index ms
   0.7200.16   1   0   64  250 100 25281.00
post-filter
   0.5420.23   10  0   64  250 100 46055   1.00
post-filter
   0.5120.28   20  0   64  250 100 115407  1.00
post-filter
   
   candidate with INITIAL_CAPACITY == 1000
   recall  latency nDocfanout  maxConn beamWidth   visited index ms
   0.7200.16   1   0   64  250 100 25181.00
post-filter
   0.5420.23   10  0   64  250 100 46617   1.00
post-filter
   0.5120.29   20  0   64  250 100 118359  1.00
post-filter
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-01 Thread via GitHub


benwtrent commented on PR #12844:
URL: https://github.com/apache/lucene/pull/12844#issuecomment-1836068375

   @stefanvodita 
   
   > How do we trade off between the extra miliseconds and the memory savings?
   
   It would be good to know the actual memory savings. I don't know how to 
measure the trade-off without know what we are trading off for.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Reconcile changelog 9.9.0 section [lucene]

2023-12-01 Thread via GitHub


ChrisHegarty opened a new pull request, #12867:
URL: https://github.com/apache/lucene/pull/12867

   Reconcile the changelog between branch_9_9 and main. This change just 
reorders a number of entries in _main_ to match that of branch_9_9. As 
identified by Mike's script, #12860
   
   There are still a couple of minor inconsistencies between the changelog 
entries, but since they require a little more than just reordering, I'll 
separate them out.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add simple tool to diff entries in lucene's CHANGES.txt that should be identical [lucene]

2023-12-01 Thread via GitHub


ChrisHegarty commented on code in PR #12860:
URL: https://github.com/apache/lucene/pull/12860#discussion_r1412073640


##
dev-tools/scripts/diff_lucene_changes.py:
##
@@ -0,0 +1,79 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import re
+import subprocess
+import sys
+import tempfile
+import urllib.request
+
+'''
+A simple tool to see diffs between main's version of CHANGES.txt entries for
+a given release vs the stable branch's version.  It's best to keep these 1)
+identical and 2) matching what changes were actually backported to be honest
+to users and avoid future annoying conflicts on backport.
+'''
+
+# e.g. python3 -u diff_lucene_changes.py branch_9_9 main 9.9.0
+
+#
+
+def get_changes_url(branch_name):
+  if os.path.isdir(branch_name):
+url = f'file://{branch_name}/lucene/CHANGES.txt'
+  else:
+url = 
f'https://raw.githubusercontent.com/apache/lucene/{branch_name}/lucene/CHANGES.txt'
+  print(f'NOTE: resolving {branch_name} --> {url}')
+  return url
+
+def extract_release_section(changes_txt, release_name):
+  open('/x/tmp/foo.txt', 'wb').write(changes_txt)

Review Comment:
   Did you mean just `/tmp` here? I had to modify this to run on my Mac.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add static function in TaskExecutor to retrieve the results for a collection of Future [lucene]

2023-12-01 Thread via GitHub


shubhamvishu commented on PR #12798:
URL: https://github.com/apache/lucene/pull/12798#issuecomment-1836093934

   Exactly @javanna Do you think it would make sense to have a new 
`FutureUtil` class and add this function there?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-01 Thread via GitHub


dungba88 commented on code in PR #12844:
URL: https://github.com/apache/lucene/pull/12844#discussion_r1412105704


##
lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java:
##
@@ -32,17 +32,20 @@
  * @lucene.internal
  */
 public class NeighborArray {
+  private static final int INITIAL_CAPACITY = 10;
   private final boolean scoresDescOrder;
+  private final int maxSize;
   private int size;
   float[] score;
   int[] node;
   private int sortedNodeSize;
   public final ReadWriteLock rwlock = new ReentrantReadWriteLock(true);
 
   public NeighborArray(int maxSize, boolean descOrder) {
-node = new int[maxSize];
-score = new float[maxSize];
+node = new int[INITIAL_CAPACITY];

Review Comment:
   I was new to this class, just curious what happen if INITIAL_CAPACITY > 
maxSize. I thought maxSize is the maximum size that we ever needed. 
   
   But looking at previous code, it seems the array can still be expanded even 
if we already fully pre-allocated it with maxSize, from the line `node = 
ArrayUtil.grow(node);`. Is it just because it was initial grow on-demand, then 
change to full prelocation, but didn't clean that up.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-01 Thread via GitHub


dungba88 commented on code in PR #12844:
URL: https://github.com/apache/lucene/pull/12844#discussion_r1412105704


##
lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java:
##
@@ -32,17 +32,20 @@
  * @lucene.internal
  */
 public class NeighborArray {
+  private static final int INITIAL_CAPACITY = 10;
   private final boolean scoresDescOrder;
+  private final int maxSize;
   private int size;
   float[] score;
   int[] node;
   private int sortedNodeSize;
   public final ReadWriteLock rwlock = new ReentrantReadWriteLock(true);
 
   public NeighborArray(int maxSize, boolean descOrder) {
-node = new int[maxSize];
-score = new float[maxSize];
+node = new int[INITIAL_CAPACITY];

Review Comment:
   I was new to this class, just curious what happen if INITIAL_CAPACITY > 
maxSize. I thought maxSize is the maximum size that we ever needed. 
   
   But looking at previous code, it seems the array can still be expanded even 
if we already fully pre-allocated it with maxSize, from the line `node = 
ArrayUtil.grow(node);`. Is it just because it was initial grow on-demand, then 
change to full prelocation, but didn't clean that up.
   
   If it's indeed the maximum size that we ever need, maybe 
`Math.min(INITIAL_CAPACITY, maxSize)` would be better.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Prevent extra similarity computation for single-level graphs [lucene]

2023-12-01 Thread via GitHub


kaivalnp commented on code in PR #12866:
URL: https://github.com/apache/lucene/pull/12866#discussion_r1412139289


##
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java:
##
@@ -100,19 +100,10 @@ private static void search(
   HnswGraphSearcher graphSearcher,
   Bits acceptOrds)
   throws IOException {
-int initialEp = graph.entryNode();
-if (initialEp == -1) {
-  return;
+int ep = graphSearcher.findBestEntryPoint(scorer, graph, knnCollector);

Review Comment:
   Makes sense! Added now..



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Reuse BitSet when there are deleted documents in the index instead of creating new BitSet [lucene]

2023-12-01 Thread via GitHub


Pulkitg64 commented on PR #12857:
URL: https://github.com/apache/lucene/pull/12857#issuecomment-1836188173

   Thanks @shubhamvishu for taking a look.
   > I went through the change but I didn't understand how are we not reusing 
the bitset in the current approach. We do wrap the BitSetIterator with a 
FilteredDocIdSetIterator when there are deleted docs right which would 
eventually use the bitset to advance the inner iterator(See 
[this](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/FilteredDocIdSetIterator.java#L72-L87)).
   
   Sorry! I think, I should have used different title for this PR. The part in 
the current approach, which I am trying to optimize is that when the iterator 
is of ```BitSetIterator``` instance and ```live docs``` are not null. So in 
current approach we create a new BitSet while taking live docs into 
consideration. But this bitset creation is a linear time complexity process, 
because to create bitset we need to iterate over all matched docs. This BitSet 
creation is not required as we can wrap both matched docs bitset and live docs 
bitset under single Bits instance which can be later used directly during 
approximate search. So instead of creating new Bitset, we are computing if a 
document is valid for searching or not at runtime. This saves us time to create 
new BitSet.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Reconcile changelog 9.9.0 section [lucene]

2023-12-01 Thread via GitHub


ChrisHegarty merged PR #12867:
URL: https://github.com/apache/lucene/pull/12867


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-01 Thread via GitHub


dungba88 commented on code in PR #12844:
URL: https://github.com/apache/lucene/pull/12844#discussion_r1412155078


##
lucene/core/src/java/org/apache/lucene/util/ArrayUtil.java:
##
@@ -330,6 +330,29 @@ public static int[] growExact(int[] array, int newLength) {
 return copy;
   }
 
+  /**
+   * Returns an array whose size is at least {@code minLength}, generally 
over-allocating
+   * exponentially, but never allocating more than {@code maxLength} elements.
+   */
+  public static int[] growInRange(int[] array, int minLength, int maxLength) {
+if (array.length >= minLength) {
+  return array;
+}
+if (minLength > maxLength) {
+  throw new IllegalArgumentException(
+  "requested minimum array length "
+  + minLength
+  + " is larger than requested maximum array length "
+  + maxLength);
+}

Review Comment:
   I think it would be strange that we allow minSize to be > maxSize, that 
could introduce latent bugs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-01 Thread via GitHub


dungba88 commented on code in PR #12844:
URL: https://github.com/apache/lucene/pull/12844#discussion_r1412167183


##
lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java:
##
@@ -32,17 +32,20 @@
  * @lucene.internal
  */
 public class NeighborArray {
+  private static final int INITIAL_CAPACITY = 10;
   private final boolean scoresDescOrder;
+  private final int maxSize;
   private int size;
   float[] score;
   int[] node;
   private int sortedNodeSize;
   public final ReadWriteLock rwlock = new ReentrantReadWriteLock(true);
 
   public NeighborArray(int maxSize, boolean descOrder) {
-node = new int[maxSize];
-score = new float[maxSize];
+node = new int[INITIAL_CAPACITY];

Review Comment:
   This seems to come from the `M` parameter, which is maximum number of 
connection (neighbors) a node can have, which is M on upper level and M*2 on 
level 0.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-01 Thread via GitHub


benwtrent commented on code in PR #12844:
URL: https://github.com/apache/lucene/pull/12844#discussion_r1412176786


##
lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java:
##
@@ -32,17 +32,20 @@
  * @lucene.internal
  */
 public class NeighborArray {
+  private static final int INITIAL_CAPACITY = 10;
   private final boolean scoresDescOrder;
+  private final int maxSize;
   private int size;
   float[] score;
   int[] node;
   private int sortedNodeSize;
   public final ReadWriteLock rwlock = new ReentrantReadWriteLock(true);
 
   public NeighborArray(int maxSize, boolean descOrder) {
-node = new int[maxSize];
-score = new float[maxSize];
+node = new int[INITIAL_CAPACITY];

Review Comment:
   We should never ever allow `INITIAL_CAPACITY` to be used to create these 
arrays when it is larger than `maxSize`. This would be a serious bug IMO.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-01 Thread via GitHub


msokolov commented on code in PR #12844:
URL: https://github.com/apache/lucene/pull/12844#discussion_r1412182912


##
lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java:
##
@@ -32,17 +32,20 @@
  * @lucene.internal
  */
 public class NeighborArray {
+  private static final int INITIAL_CAPACITY = 10;
   private final boolean scoresDescOrder;
+  private final int maxSize;
   private int size;
   float[] score;
   int[] node;
   private int sortedNodeSize;
   public final ReadWriteLock rwlock = new ReentrantReadWriteLock(true);
 
   public NeighborArray(int maxSize, boolean descOrder) {
-node = new int[maxSize];
-score = new float[maxSize];
+node = new int[INITIAL_CAPACITY];

Review Comment:
   I think it's some leftover stuff from before, yes. Although it's also 
possible we have a bug in our maxSize initialization? I don't think we do 
though.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-01 Thread via GitHub


msokolov commented on code in PR #12844:
URL: https://github.com/apache/lucene/pull/12844#discussion_r1412186499


##
lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java:
##
@@ -32,17 +32,20 @@
  * @lucene.internal
  */
 public class NeighborArray {
+  private static final int INITIAL_CAPACITY = 10;
   private final boolean scoresDescOrder;
+  private final int maxSize;
   private int size;
   float[] score;
   int[] node;
   private int sortedNodeSize;
   public final ReadWriteLock rwlock = new ReentrantReadWriteLock(true);
 
   public NeighborArray(int maxSize, boolean descOrder) {
-node = new int[maxSize];
-score = new float[maxSize];
+node = new int[INITIAL_CAPACITY];

Review Comment:
   I really don't understand why `candidate with INITIAL_CAPACITY == 1000` 
would be slower. Is it GC-related? Have you been able to look at JFR at all?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Jvm Crashes occassionaly with Lucene 8.10.0, JDK 11.0.15+10 [lucene]

2023-12-01 Thread via GitHub


msokolov commented on issue #12863:
URL: https://github.com/apache/lucene/issues/12863#issuecomment-1836250307

   If the JVM crashes, it's generally considered a JVM bug. I'll note there is 
a more recent JDK 11 release - they seem to be up to 11.0.20. Have you 
considered upgrading? These point release generally include bug fixes, although 
I can't speak to whether this particular issue has been addressed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-01 Thread via GitHub


dungba88 commented on code in PR #12844:
URL: https://github.com/apache/lucene/pull/12844#discussion_r1412215432


##
lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java:
##
@@ -32,17 +32,20 @@
  * @lucene.internal
  */
 public class NeighborArray {
+  private static final int INITIAL_CAPACITY = 10;
   private final boolean scoresDescOrder;
+  private final int maxSize;
   private int size;
   float[] score;
   int[] node;
   private int sortedNodeSize;
   public final ReadWriteLock rwlock = new ReentrantReadWriteLock(true);
 
   public NeighborArray(int maxSize, boolean descOrder) {
-node = new int[maxSize];
-score = new float[maxSize];
+node = new int[INITIAL_CAPACITY];

Review Comment:
   Could it be due to the over-allocation that every node now needs to allocate 
1000 size array whether they need it or not?
   
   I think that's the more reason we should use min(INITIAL_SIZE, maxSize)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-01 Thread via GitHub


dungba88 commented on code in PR #12844:
URL: https://github.com/apache/lucene/pull/12844#discussion_r1412215432


##
lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java:
##
@@ -32,17 +32,20 @@
  * @lucene.internal
  */
 public class NeighborArray {
+  private static final int INITIAL_CAPACITY = 10;
   private final boolean scoresDescOrder;
+  private final int maxSize;
   private int size;
   float[] score;
   int[] node;
   private int sortedNodeSize;
   public final ReadWriteLock rwlock = new ReentrantReadWriteLock(true);
 
   public NeighborArray(int maxSize, boolean descOrder) {
-node = new int[maxSize];
-score = new float[maxSize];
+node = new int[INITIAL_CAPACITY];

Review Comment:
   Could it be due to the over-allocation that every node now needs to allocate 
1000 size array whether they need it or not?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add simple tool to diff entries in lucene's CHANGES.txt that should be identical [lucene]

2023-12-01 Thread via GitHub


mikemccand commented on code in PR #12860:
URL: https://github.com/apache/lucene/pull/12860#discussion_r1412235383


##
dev-tools/scripts/diff_lucene_changes.py:
##
@@ -0,0 +1,79 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import re
+import subprocess
+import sys
+import tempfile
+import urllib.request
+
+'''
+A simple tool to see diffs between main's version of CHANGES.txt entries for
+a given release vs the stable branch's version.  It's best to keep these 1)
+identical and 2) matching what changes were actually backported to be honest
+to users and avoid future annoying conflicts on backport.
+'''
+
+# e.g. python3 -u diff_lucene_changes.py branch_9_9 main 9.9.0
+
+#
+
+def get_changes_url(branch_name):
+  if os.path.isdir(branch_name):
+url = f'file://{branch_name}/lucene/CHANGES.txt'
+  else:
+url = 
f'https://raw.githubusercontent.com/apache/lucene/{branch_name}/lucene/CHANGES.txt'
+  print(f'NOTE: resolving {branch_name} --> {url}')
+  return url
+
+def extract_release_section(changes_txt, release_name):
+  open('/x/tmp/foo.txt', 'wb').write(changes_txt)

Review Comment:
   Er, woops, this was not supposed to be committed -- it was just there for 
temporary debugging -- I'll remove!  Good catch, thanks @ChrisHegarty ;)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Jvm Crashes occassionaly with Lucene 8.10.0, JDK 11.0.15+10 [lucene]

2023-12-01 Thread via GitHub


rmuir commented on issue #12863:
URL: https://github.com/apache/lucene/issues/12863#issuecomment-1836306192

   I don't think its a jvm bug. This is what happens when you try to read from 
a closed indexreader that is backed by mmap. because we've unmapped the byte 
buffer  when `close()` was called -> SIGSEGV.
   
   So it is a bug in the application (accessing close'd indexreader), but it 
causes the JVM crash. It is a tale as old as time, you can find numerous 
threads on it elsewhere. I'm not gonna repeat it all again.
   
   We make a best effort to detect such abuses and prevent this, but its not 
possible to do safely without using @uschindler 's Panama-based provider and a 
modern JVM.
   
   So either:
   * fix your code not to access close'd indexreaders
   * upgrade to recent lucene and java and use panama provider. You will get 
AlreadyClosedException or similar instead.
   * call `setUseUnmap(false)` on your MMapDirectory. The downside is much disk 
and address space will now be tied to the java garbage collector...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Jvm Crashes occassionaly with Lucene 8.10.0, JDK 11.0.15+10 [lucene]

2023-12-01 Thread via GitHub


rmuir closed issue #12863: Jvm Crashes occassionaly with Lucene 8.10.0, JDK 
11.0.15+10
URL: https://github.com/apache/lucene/issues/12863


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Prevent extra similarity computation for single-level graphs [lucene]

2023-12-01 Thread via GitHub


kaivalnp commented on PR #12866:
URL: https://github.com/apache/lucene/pull/12866#issuecomment-1836321336

   Thanks @benwtrent :)
   
   > But a CHANGES entry for Lucene 9.10
   
   I did not see a section for "9.10" in the [current 
CHANGES.txt](https://github.com/apache/lucene/blob/b231e5b213200f4049dcc7364cc1452d11cd3f2c/lucene/CHANGES.txt)
 -- any idea when this will be added?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Prevent extra similarity computation for single-level graphs [lucene]

2023-12-01 Thread via GitHub


benwtrent commented on PR #12866:
URL: https://github.com/apache/lucene/pull/12866#issuecomment-1836322275

   @kaivalnp you can add it :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Jvm Crashes occassionaly with Lucene 8.10.0, JDK 11.0.15+10 [lucene]

2023-12-01 Thread via GitHub


uschindler commented on issue #12863:
URL: https://github.com/apache/lucene/issues/12863#issuecomment-1836326667

   Thanks @rmuir,
   I can confirm the above pattern happens when you access an 
IndexReader/IndexSearcher form another thread if it was closed.
   
   To see AlreadyClosedExceptions you need to upgrade to at least Lucene ~9.5 
(better use latest 9.9) and use Java 19 or later. This configuration will give 
you a AlreadyClosedException instead of crashing your JVM.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Reproducible error in TestLucene90HnswVectorsFormat.testIndexedValueNotAliased [lucene]

2023-12-01 Thread via GitHub


benwtrent commented on issue #12840:
URL: https://github.com/apache/lucene/issues/12840#issuecomment-1836360899

   fixed via: https://github.com/apache/lucene/pull/12848


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Reproducible error in TestLucene90HnswVectorsFormat.testIndexedValueNotAliased [lucene]

2023-12-01 Thread via GitHub


benwtrent closed issue #12840: Reproducible error in 
TestLucene90HnswVectorsFormat.testIndexedValueNotAliased 
URL: https://github.com/apache/lucene/issues/12840


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Prevent extra similarity computation for single-level graphs [lucene]

2023-12-01 Thread via GitHub


kaivalnp commented on PR #12866:
URL: https://github.com/apache/lucene/pull/12866#issuecomment-1836394910

   Replicated [this commit](https://github.com/apache/lucene/commit/94b879a5) 
(which added the 9.9.0 entry) for 9.10.0
   Please let me know if I missed something..


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Trying using Murmurhash 3 for bloom filters [lucene]

2023-12-01 Thread via GitHub


shubhamvishu opened a new pull request, #12868:
URL: https://github.com/apache/lucene/pull/12868

   ### Description
   We are currently using Murmurhash 
2([MurmurHash64.java](https://github.com/apache/lucene/blob/main/lucene/codecs/src/java/org/apache/lucene/codecs/bloom/MurmurHash64.java))
 in the bloom filters implementation in lucene where we also have Murmurhash 3 
(the latest one available in the 
[MurmurHash](https://en.wikipedia.org/wiki/MurmurHash) family of hash 
functions) and provides better performance, avalanche effect etc. It provides 
128-bit variant which Murmurhash 2 doesn't. This PR aims to use Murmurhash 3 
with bloom filters and see if that helps.
    
   Note : We are already using murmur hash 3 in 
[BytesRefHash](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/BytesRefHash.java#L412-L414)
 implementation (#)
   
    
   Next steps : Run `luceneutil` benchmarks and share the results 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Make NeighborArray fixed size [lucene]

2023-12-01 Thread via GitHub


msokolov closed issue #11783: Make NeighborArray fixed size
URL: https://github.com/apache/lucene/issues/11783


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Make NeighborArray fixed size [lucene]

2023-12-01 Thread via GitHub


msokolov commented on issue #11783:
URL: https://github.com/apache/lucene/issues/11783#issuecomment-1836436845

   resolving this since I think we did it and are now going back to more 
dynamic allocation strategy LOL


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Improve Javadoc [lucene]

2023-12-01 Thread via GitHub


mikemccand commented on PR #12508:
URL: https://github.com/apache/lucene/pull/12508#issuecomment-1836443822

   Woops, thank you for your attention to detail @lukas-vlcek!  And sorry for 
the crazy long time to respond.  I'll merge this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Improve Javadoc [lucene]

2023-12-01 Thread via GitHub


mikemccand merged PR #12508:
URL: https://github.com/apache/lucene/pull/12508


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Changed byte to int for prefix_length_key [lucene]

2023-12-01 Thread via GitHub


msokolov merged PR #12507:
URL: https://github.com/apache/lucene/pull/12507


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Try using Murmurhash 3 for bloom filters [lucene]

2023-12-01 Thread via GitHub


shubhamvishu commented on PR #12868:
URL: https://github.com/apache/lucene/pull/12868#issuecomment-1836499432

   Below are the `luceneutil` benchmark results for `wikimediumall`. Looks all 
flat and good to me.
   
   
   ```
  TaskQPS baseline  StdDevQPS my_modified_version  StdDev   
 Pct diff p-value
LowIntervalsOrdered6.08  (5.0%)6.00  (6.3%) 
  -1.3% ( -12% -   10%) 0.463
 TermDTSort   97.64  (4.6%)   96.36  (5.6%) 
  -1.3% ( -11% -9%) 0.417
 OrHighHigh   28.51  (5.7%)   28.28  (5.7%) 
  -0.8% ( -11% -   11%) 0.651
  HighTermMonthSort 2287.85  (2.9%) 2270.76  (3.3%) 
  -0.7% (  -6% -5%) 0.447
   MedTermDayTaxoFacets   14.70  (4.6%)   14.60  (4.0%) 
  -0.7% (  -8% -8%) 0.620
  BrowseDayOfYearTaxoFacets3.94 (28.1%)3.91 (28.0%) 
  -0.7% ( -44% -   77%) 0.941
Respell   54.80  (1.8%)   54.45  (2.2%) 
  -0.6% (  -4% -3%) 0.307
   BrowseDateTaxoFacets3.92 (27.5%)3.89 (27.4%) 
  -0.6% ( -43% -   74%) 0.942
  BrowseMonthSSDVFacets4.45 (10.1%)4.43  (9.7%) 
  -0.6% ( -18% -   21%) 0.845
MedSpanNear   91.41  (2.6%)   90.86  (3.7%) 
  -0.6% (  -6% -5%) 0.554
MedIntervalsOrdered6.61  (3.7%)6.57  (4.4%) 
  -0.5% (  -8% -7%) 0.671
   HighIntervalsOrdered6.73  (3.8%)6.69  (4.2%) 
  -0.5% (  -8% -7%) 0.691
  HighTermTitleSort  121.44  (4.0%)  120.97  (4.6%) 
  -0.4% (  -8% -8%) 0.778
   HighSpanNear5.31  (2.6%)5.29  (3.2%) 
  -0.4% (  -6% -5%) 0.701
LowSpanNear   50.68  (2.6%)   50.51  (3.3%) 
  -0.3% (  -6% -5%) 0.731
  OrHighMed   46.96  (3.3%)   46.83  (3.2%) 
  -0.3% (  -6% -6%) 0.788
   OrHighNotLow  235.93  (5.2%)  235.59  (6.5%) 
  -0.1% ( -11% -   12%) 0.937
   HighSloppyPhrase4.70  (4.7%)4.70  (5.6%) 
  -0.1% (  -9% -   10%) 0.931
  BrowseMonthTaxoFacets4.15 (35.3%)4.15 (35.3%) 
  -0.1% ( -52% -  108%) 0.991
 Fuzzy2   54.22  (1.1%)   54.15  (1.3%) 
  -0.1% (  -2% -2%) 0.745
   OrNotHighLow  517.03  (1.6%)  516.41  (1.6%) 
  -0.1% (  -3% -3%) 0.816
 Fuzzy1   38.59  (1.2%)   38.56  (1.3%) 
  -0.1% (  -2% -2%) 0.808
BrowseRandomLabelSSDVFacets2.77  (7.5%)2.77  (7.5%) 
  -0.0% ( -14% -   16%) 0.986
AndHighHigh   29.06  (3.2%)   29.06  (3.1%) 
  -0.0% (  -6% -6%) 0.991
 AndHighMed   55.89  (2.9%)   55.91  (2.7%) 
   0.0% (  -5% -5%) 0.970
  OrHighLow  319.86  (2.2%)  320.00  (2.0%) 
   0.0% (  -4% -4%) 0.945
 IntNRQ   21.94  (2.5%)   21.96  (3.4%) 
   0.1% (  -5% -6%) 0.905
  LowPhrase  136.01  (4.1%)  136.20  (3.8%) 
   0.1% (  -7% -8%) 0.908
   OrNotHighMed  208.82  (3.6%)  209.20  (3.9%) 
   0.2% (  -7% -8%) 0.879
  OrNotHighHigh  241.45  (3.8%)  241.96  (4.9%) 
   0.2% (  -8% -9%) 0.880
  OrHighNotHigh  191.17  (4.6%)  191.65  (5.7%) 
   0.3% (  -9% -   11%) 0.878
 AndHighLow  300.40  (3.0%)  301.26  (2.7%) 
   0.3% (  -5% -6%) 0.754
   BrowseDateSSDVFacets0.90 (12.4%)0.91 (10.9%) 
   0.3% ( -20% -   26%) 0.933
BrowseRandomLabelTaxoFacets3.34 (24.3%)3.36 (24.6%) 
   0.4% ( -39% -   65%) 0.963
   OrHighNotMed  215.89  (5.5%)  216.69  (6.3%) 
   0.4% ( -10% -   12%) 0.843
   Wildcard   61.64  (2.0%)   61.91  (2.2%) 
   0.4% (  -3% -4%) 0.501
   HighTermTitleBDVSort5.43  (4.3%)5.46  (4.5%) 
   0.5% (  -8% -9%) 0.729
   PKLookup  136.14  (1.9%)  136.97  (1.9%) 
   0.6% (  -3% -4%) 0.310
  HighTermDayOfYearSort  195.65  (4.1%)  196.91  (4.6%) 
   0.6% (  -7% -9%) 0.641
MedTerm  455.25  (5.3%)  458.30  (5.8%) 
   0.7% (  -9% -   12%) 0.702
LowSloppyPhrase   30.11  (2.0%)   30.36  (1.8%) 
   0.8% (  -2% -4%) 0.171
   HighTerm  367.19  (6.1%)  370.21  (6.3%) 
   0.8% ( -10% -   14%) 0.6

Re: [PR] Reuse BitSet when there are deleted documents in the index instead of creating new BitSet [lucene]

2023-12-01 Thread via GitHub


kaivalnp commented on code in PR #12857:
URL: https://github.com/apache/lucene/pull/12857#discussion_r1412393818


##
lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java:
##
@@ -118,13 +118,38 @@ private TopDocs getLeafResults(LeafReaderContext ctx, 
Weight filterWeight) throw
   return NO_RESULTS;
 }
 
-BitSet acceptDocs = createBitSet(scorer.iterator(), liveDocs, maxDoc);
-int cost = acceptDocs.cardinality();
+DocIdSetIterator iterator = scorer.iterator();
+BitSet bitSet =
+iterator instanceof BitSetIterator
+? ((BitSetIterator) iterator).getBitSet()
+: BitSet.of(iterator, maxDoc);
+Bits acceptDocs =
+new Bits() {
+  @Override
+  public boolean get(int index) {
+return liveDocs != null ? liveDocs.get(index) & bitSet.get(index) 
: bitSet.get(index);
+  }
+
+  @Override
+  public int length() {
+return bitSet.cardinality();
+  }

Review Comment:
   `length()` is more like the [maximum 
doc](https://github.com/apache/lucene/blob/0e96b9cd8c1267acee02684bcac5be89d60b3bf4/lucene/core/src/java/org/apache/lucene/util/Bits.java#L73)
 you can request, this should be `bitSet.length()`?



##
lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java:
##
@@ -118,13 +118,38 @@ private TopDocs getLeafResults(LeafReaderContext ctx, 
Weight filterWeight) throw
   return NO_RESULTS;
 }
 
-BitSet acceptDocs = createBitSet(scorer.iterator(), liveDocs, maxDoc);
-int cost = acceptDocs.cardinality();
+DocIdSetIterator iterator = scorer.iterator();
+BitSet bitSet =
+iterator instanceof BitSetIterator
+? ((BitSetIterator) iterator).getBitSet()
+: BitSet.of(iterator, maxDoc);
+Bits acceptDocs =
+new Bits() {
+  @Override
+  public boolean get(int index) {
+return liveDocs != null ? liveDocs.get(index) & bitSet.get(index) 
: bitSet.get(index);

Review Comment:
   nit: Can we make this cleaner by `return (liveDocs == null || 
liveDocs.get(index)) && bitSet.get(index)` ?



##
lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java:
##
@@ -118,13 +118,38 @@ private TopDocs getLeafResults(LeafReaderContext ctx, 
Weight filterWeight) throw
   return NO_RESULTS;
 }
 
-BitSet acceptDocs = createBitSet(scorer.iterator(), liveDocs, maxDoc);
-int cost = acceptDocs.cardinality();
+DocIdSetIterator iterator = scorer.iterator();
+BitSet bitSet =
+iterator instanceof BitSetIterator
+? ((BitSetIterator) iterator).getBitSet()
+: BitSet.of(iterator, maxDoc);
+Bits acceptDocs =
+new Bits() {

Review Comment:
   Perhaps we can wrap the `BitSet` in a new `Bits` only for the case we're 
trying to optimize (when `iterator instanceof BitSetIterator`) -- not changing 
the `BitSet.of` flow like @shubhamvishu also mentioned?



##
lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java:
##
@@ -118,13 +118,38 @@ private TopDocs getLeafResults(LeafReaderContext ctx, 
Weight filterWeight) throw
   return NO_RESULTS;
 }
 
-BitSet acceptDocs = createBitSet(scorer.iterator(), liveDocs, maxDoc);
-int cost = acceptDocs.cardinality();
+DocIdSetIterator iterator = scorer.iterator();
+BitSet bitSet =
+iterator instanceof BitSetIterator
+? ((BitSetIterator) iterator).getBitSet()
+: BitSet.of(iterator, maxDoc);
+Bits acceptDocs =
+new Bits() {
+  @Override
+  public boolean get(int index) {
+return liveDocs != null ? liveDocs.get(index) & bitSet.get(index) 
: bitSet.get(index);
+  }
+
+  @Override
+  public int length() {
+return bitSet.cardinality();
+  }
+};
+
+FilteredDocIdSetIterator filteredDocIdSetIterator =
+new FilteredDocIdSetIterator(new BitSetIterator(bitSet, maxDoc)) {
+  @Override
+  protected boolean match(int doc) {
+return liveDocs == null || liveDocs.get(doc);
+  }
+};
+
+int cost = acceptDocs.length();

Review Comment:
   The `cost` here determines what limit to set for `#approximateSearch`
   
   If we use `acceptDocs.length()`, this will be equal to `maxDoc` (and always 
complete graph search without falling back to exact search, even when we want 
to..)
   
   Perhaps this should be `acceptDocs.cardinality()`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscri

Re: [PR] Initial impl of MMapDirectory for Java 22 [lucene]

2023-12-01 Thread via GitHub


uschindler commented on PR #12706:
URL: https://github.com/apache/lucene/pull/12706#issuecomment-1836540146

   I tested the new code - and at the same time commenting out the Java 21 
fallback code to check for "closed" in the `IllegalStateException` message here:
   
   
https://github.com/apache/lucene/blob/5c9152917614ecee0eef7612ab71c24c9d92529c/lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java#L116-L121
   
   With JDK 22 build 23 the test fails because it rethrows the 
`IllegalStateException` due to the `isAlive()` check didn't work. With build 26 
beasting the test worked for hours:
   
   ```
   $ ./gradlew :lucene:core:beast --tests TestMMapDirectory.testAceWithThreads 
-Ptests.multiplier=500 -Ptests.dups=100
   ```
   
   (build 23 or earlier Java versions like 21 failed on first or second build)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Prevent extra similarity computation for single-level graphs [lucene]

2023-12-01 Thread via GitHub


benwtrent merged PR #12866:
URL: https://github.com/apache/lucene/pull/12866


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Simplifying text area stream in Luke- ticket 12809 [lucene]

2023-12-01 Thread via GitHub


pratikshelarkar opened a new pull request, #12869:
URL: https://github.com/apache/lucene/pull/12869

   Hi, 
   Simplifying text area stream in Luke- ticket 12809
   
   This is my first contribution to lucene. Can you please review my code and 
advice. I'll try my best to add this enhancement.
   
   Thanks,


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Simplifying TextAreaPrintStream in Luke [lucene]

2023-12-01 Thread via GitHub


pratikshelarkar commented on issue #12809:
URL: https://github.com/apache/lucene/issues/12809#issuecomment-1836594465

   Hi, Can you please review my PR for this ticket?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-01 Thread via GitHub


stefanvodita commented on PR #12844:
URL: https://github.com/apache/lucene/pull/12844#issuecomment-1836599463

   I did some memory profiling and it doesn't look promising. Let's take 
initial capacity 100 as an example. Total allocation is 3.41GB compared to 
1.29GB on the baseline, and peak heap usage is 946MiB compared to 547MiB on 
baseline. It's worse for smaller initial capacities and larger capacities 
devolve into the baseline behavior.
   
   I might try a few runs on a different data-set, but maybe these neighbor 
arrays tend to be mostly used in the general case.
   
   
   Baseline: 
[baseline.jfr.zip](https://github.com/apache/lucene/files/13530935/baseline.jfr.zip)
   https://github.com/apache/lucene/assets/41467371/a2a09c9c-ef91-4227-a4f8-655404b2cb21";>
   
   
   Candidate: 
[candidate-100.jfr.zip](https://github.com/apache/lucene/files/13530936/candidate-100.jfr.zip)
   https://github.com/apache/lucene/assets/41467371/80c2141a-fc6f-4624-9f7a-fb264ad36ef5";>
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-01 Thread via GitHub


stefanvodita commented on code in PR #12844:
URL: https://github.com/apache/lucene/pull/12844#discussion_r1412454853


##
lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java:
##
@@ -32,17 +32,20 @@
  * @lucene.internal
  */
 public class NeighborArray {
+  private static final int INITIAL_CAPACITY = 10;
   private final boolean scoresDescOrder;
+  private final int maxSize;
   private int size;
   float[] score;
   int[] node;
   private int sortedNodeSize;
   public final ReadWriteLock rwlock = new ReentrantReadWriteLock(true);
 
   public NeighborArray(int maxSize, boolean descOrder) {
-node = new int[maxSize];
-score = new float[maxSize];
+node = new int[INITIAL_CAPACITY];

Review Comment:
   > what happen if INITIAL_CAPACITY > maxSize
   
   Great point. I fixed this and tested with `INITIAL_CAPACITY == 1000` again. 
In the profiler, it looks the same as running the baseline. I assume nodes tend 
not to have 1000 neighbors.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Reuse BitSet when there are deleted documents in the index instead of creating new BitSet [lucene]

2023-12-01 Thread via GitHub


shubhamvishu commented on PR #12857:
URL: https://github.com/apache/lucene/pull/12857#issuecomment-1836604882

   @kaivalnp We could use the `acceptDocs.cardinality()` when its a 
`BitSetIterator` to get the upper bound which might have some deletes but that 
would still change the decision sometimes of whether to go for exact search or 
not. Since we don't know how many of those docs are live but we do know the num 
of deletes in the segment(we don't know the intersections of these two). One 
thing that might be tried is to come up with some heuristic that adds some 
penalty to the cost based on the num of deletes in the segment (i.e. 
`ctx.reader().numDeletedDocs()/ctx.reader().maxDoc()`). Like maybe if there are 
10% deletes we could for eg decrease the cost by 10% or maybe 5%. This might 
help in cases where we miss falling back to exact search. Though this would 
need some thorough benchmarking to see what works best.
   
   On separate note, I'm thinking if there is some use case where we don't 
require to know this cost upfront and directly go for approximate search only 
for instance. Currently, this optimization only kicks in when the iterator is 
of `BitSetIterator` but if its possible to ignore this cost step or get this 
cost by some other heuristic/approximation then we could completely make it 
completely lazily evaluated using `DISI#advance(docid)` for those use cases. 
@msokolov @benwtrent Maybe you could share your thoughts on this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-01 Thread via GitHub


benwtrent commented on code in PR #12844:
URL: https://github.com/apache/lucene/pull/12844#discussion_r1412475261


##
lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java:
##
@@ -32,17 +32,20 @@
  * @lucene.internal
  */
 public class NeighborArray {
+  private static final int INITIAL_CAPACITY = 10;
   private final boolean scoresDescOrder;
+  private final int maxSize;
   private int size;
   float[] score;
   int[] node;
   private int sortedNodeSize;
   public final ReadWriteLock rwlock = new ReentrantReadWriteLock(true);
 
   public NeighborArray(int maxSize, boolean descOrder) {
-node = new int[maxSize];
-score = new float[maxSize];
+node = new int[INITIAL_CAPACITY];

Review Comment:
   @stefanvodita with a maxConn of `64`, nodes will at most have `128` 
neighbors (maybe `129` as when we exceed the accounted size, we will then 
remove one before storing).
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-01 Thread via GitHub


stefanvodita commented on code in PR #12844:
URL: https://github.com/apache/lucene/pull/12844#discussion_r1412502016


##
lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java:
##
@@ -32,17 +32,20 @@
  * @lucene.internal
  */
 public class NeighborArray {
+  private static final int INITIAL_CAPACITY = 10;
   private final boolean scoresDescOrder;
+  private final int maxSize;
   private int size;
   float[] score;
   int[] node;
   private int sortedNodeSize;
   public final ReadWriteLock rwlock = new ReentrantReadWriteLock(true);
 
   public NeighborArray(int maxSize, boolean descOrder) {
-node = new int[maxSize];
-score = new float[maxSize];
+node = new int[INITIAL_CAPACITY];

Review Comment:
   Oh, I see. Is there some recommended value or normal range for maxConn?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-01 Thread via GitHub


zhaih commented on code in PR #12844:
URL: https://github.com/apache/lucene/pull/12844#discussion_r1412533096


##
lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java:
##
@@ -32,17 +32,20 @@
  * @lucene.internal
  */
 public class NeighborArray {
+  private static final int INITIAL_CAPACITY = 10;
   private final boolean scoresDescOrder;
+  private final int maxSize;
   private int size;
   float[] score;
   int[] node;
   private int sortedNodeSize;
   public final ReadWriteLock rwlock = new ReentrantReadWriteLock(true);
 
   public NeighborArray(int maxSize, boolean descOrder) {
-node = new int[maxSize];
-score = new float[maxSize];
+node = new int[INITIAL_CAPACITY];

Review Comment:
   I think it's just due to having consuming more memory -> more GC cycles 
needed -> higher latency.
   
   So I rechecked the code, when we insert a node, we will first collect 
`beamWidth` number of candidates, and then try to diversely add those 
candidates to the neighborArray. So I think:
* in case that `beamWidth > maxSize`, we can just init this with `maxSize` 
and done, because it's likely in a larger graph that the first fill will 
directly fill the `NeighborArray` to full and there's no point on resizing it 
with any init size.
* in case that `beamWidth < maxSize`, we can just init this with 
`beamWidth` such that the init fill will likely fill the array in a nearly full 
state?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Reuse BitSet when there are deleted documents in the index instead of creating new BitSet [lucene]

2023-12-01 Thread via GitHub


benwtrent commented on PR #12857:
URL: https://github.com/apache/lucene/pull/12857#issuecomment-1836748924

   Is our goal memory usage or speed?
   
   We could use `FixedBitSet#intersectionCount` and keep from having to create 
a new bit set that is the intersection. 
   
   I am honestly not sure if the implementation here is any faster than just 
creating the bit set upfront and checking it. During search, you now have to 
check two bitsets now instead of one. 
   
   If the filter happens to be `< number of docs visited in a typical search`, 
your implementation here seems like it would be slower.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Reuse BitSet when there are deleted documents in the index instead of creating new BitSet [lucene]

2023-12-01 Thread via GitHub


benwtrent commented on PR #12857:
URL: https://github.com/apache/lucene/pull/12857#issuecomment-1836750284

   Broad feedback: any "optimizations" without benchmarking aren't 
optimizations, they are just guesses.
   
   I am curious to see if this helps CPU usage in anyway. I could see it 
helping memory usage.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add ParentJoin KNN support [lucene]

2023-12-01 Thread via GitHub


david-sitsky commented on PR #12434:
URL: https://github.com/apache/lucene/pull/12434#issuecomment-1836848114

   > @david-sitsky sorry for the confusion, it was renamed 
`DiversifyingChildren*KnnVectorQuery`
   
   Ah.. no worries, thanks.  We should update the changelog 
https://lucene.apache.org/core/9_8_0/changes/Changes.html#v9.8.0.new_features 
since it is still referring to the old classnames.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Try using Murmurhash 3 for bloom filters [lucene]

2023-12-01 Thread via GitHub


shubhamvishu commented on PR #12868:
URL: https://github.com/apache/lucene/pull/12868#issuecomment-1837045741

   @mikemccand rightly pointed out that `luceneutil` doesn't use bloom filter 
postings format by default and we should enable it for `id` field and rerun the 
benchmarks to see the impact. Will rerun and share the results here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org