Re: [I] TestKnnGraph.testMultiThreadedSearch random test failure [lucene]
benwtrent commented on issue #14327: URL: https://github.com/apache/lucene/issues/14327#issuecomment-2730201462 Git bisect blames: a6a96cde1c65fddb65363f0090a0202fd6db329c Which, if the scores are the same between docs, makes sense to me. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Implement bulk adding methods for dynamic pruning. [lucene]
jpountz commented on code in PR #14365: URL: https://github.com/apache/lucene/pull/14365#discussion_r1999318407 ## lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java: ## @@ -251,6 +252,30 @@
```java
@Override
public void visit(IntsRef ref) {
  final int[] docs = ref.ints;
  for (int i = ref.offset, to = ref.offset + ref.length; i < to; i++) {
    int docID = docs[i];
    if (docID > maxDocVisited) {
      adder.add(docID);
    }
  }
}

@Override
public void visit(DocIdSetIterator iterator) throws IOException {
  assert iterator.docID() == -1;
  int docID = iterator.nextDoc();
  if (docID <= maxDocVisited) {
    docID = iterator.advance(maxDocVisited + 1);
  }
  while (docID != DocIdSetIterator.NO_MORE_DOCS) {
    adder.add(docID);
    docID = iterator.nextDoc();
  }
}
```
Review Comment: Could it just be a simple loop such as:
```java
for (int doc = iterator.advance(maxDocVisited + 1); doc != NO_MORE_DOCS; doc = iterator.nextDoc()) {
  adder.add(doc);
}
```
Re: [PR] [DRAFT] Case-insensitive matching over union of strings [lucene]
msfroh commented on PR #14350: URL: https://github.com/apache/lucene/pull/14350#issuecomment-2730390727 Instead of a boolean flag, what if we define an interface that specifies the folding rules? It could have two methods: one that folds input characters to a canonical representation (before sorting) and one that expands from the canonical representation to the characters that should be matched. We could ship ASCII and Turkish implementations to start, say. If someone has a Romanian corpus that has a mix of characters with and without diacritics, they might strip diacritics on input and expand them for matching. (That would effectively combine lowercase and ASCII folding.)
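A rough sketch of what such a fold/expand interface could look like — all names here and the ASCII implementation are illustrative only, not existing Lucene API:

```java
// Hypothetical interface, illustrating the proposal above. Names are made up.
interface CaseFolder {
  // Fold a code point to its canonical (sortable) representation.
  int fold(int codePoint);

  // Expand a canonical code point to all code points that should match it.
  int[] expand(int foldedCodePoint);
}

// Minimal ASCII implementation: canonical form is lowercase.
final class AsciiCaseFolder implements CaseFolder {
  @Override
  public int fold(int cp) {
    return (cp >= 'A' && cp <= 'Z') ? cp + ('a' - 'A') : cp;
  }

  @Override
  public int[] expand(int cp) {
    // A lowercase letter matches itself and its uppercase counterpart.
    return (cp >= 'a' && cp <= 'z')
        ? new int[] {cp, cp - ('a' - 'A')}
        : new int[] {cp};
  }
}
```

A Turkish implementation would differ only in how `i`, `İ`, `ı`, and `I` fold and expand.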
Re: [PR] [DRAFT] Case-insensitive matching over union of strings [lucene]
rmuir commented on PR #14350: URL: https://github.com/apache/lucene/pull/14350#issuecomment-2730397885 I think my ask is misunderstood; it is just to follow the Unicode standard. There are two mappings for simple case folding:
* Default
* Alternate (Turkish/Azeri)
Re: [PR] [DRAFT] Case-insensitive matching over union of strings [lucene]
rmuir commented on PR #14350: URL: https://github.com/apache/lucene/pull/14350#issuecomment-2730410784 If you want to do fancy Romanian accent removal, use an analyzer and normalize your data. That's what a search engine is all about. But if we want to provide some limited runtime expansion (which I'm still very unsure about), let's just stick with the standards and not invent something else. No need for interfaces, abstractions, or lambdas. `boolean` is the correct solution.
Re: [PR] add Automaton.toMermaid() for emitting mermaid state charts [lucene]
rmuir commented on PR #14360: URL: https://github.com/apache/lucene/pull/14360#issuecomment-2730119438 I'll keep the PR up here. Actually, as a first step, I'd rather improve the existing toDot() and regex toString(). It would help the logic here, too. There's no need to escape codepoints as `\u0010`; we could do `U+10` instead. And it is very conservative about what is "printable". If I want to make them look pretty, I edit the .dot file and fix some of these things. We can definitely try to help remove that step. For mermaid, it does allow accessible descriptions via "hover" with the mouse. It would be nice to be "aggressive" about displaying characters but at the same time provide the ability to see the Unicode range if you want. I'll see what the possibilities are with dot here too.
Re: [PR] Decode doc ids in BKD leaves with auto-vectorized loops [lucene]
gf2121 commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2729902081 I raised a PR for the annotation: https://github.com/mikemccand/luceneutil/pull/354.
Re: [PR] Introduce new encoding of BPV 21 for DocIdsWriter used in BKD Tree [lucene]
gf2121 commented on PR #14361: URL: https://github.com/apache/lucene/pull/14361#issuecomment-2729280384 OK, I get the expected results (multiple of 16 faster than multiple of 8) when I force `-XX:UseAVX=3`. It can be seen that AVX3 is slower on this chip, which may be why Java disables it by default.
```
Benchmark                                                 Mode  Cnt   Score   Error  Units
Decode21Benchmark.decode21Scalar                         thrpt    5  28.375 ± 0.064  ops/ms
Decode21Benchmark.decode21Scalar:asm                     thrpt       NaN               ---
Decode21Benchmark.decode21Vector                         thrpt    5  41.844 ± 0.182  ops/ms
Decode21Benchmark.decode21Vector:asm                     thrpt       NaN               ---
Decode21Benchmark.decode21VectorFloorToMultipleOf16      thrpt    5  64.471 ± 0.218  ops/ms
Decode21Benchmark.decode21VectorFloorToMultipleOf16:asm  thrpt       NaN               ---
Decode21Benchmark.decode21VectorFloorToMultipleOf8       thrpt    5  39.665 ± 0.120  ops/ms
Decode21Benchmark.decode21VectorFloorToMultipleOf8:asm   thrpt       NaN               ---
```
Re: [PR] Decode doc ids in BKD leaves with auto-vectorized loops [lucene]
gf2121 commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2729806648 The nightly benchmark confirmed the speedup: https://benchmarks.mikemccandless.com/2025.03.16.18.04.58.html. Thanks again for the profiling guidance and for helping figure out simpler and faster code!
Re: [I] Multi-threaded vector search over multiple segments can lead to inconsistent results [lucene]
tteofili commented on issue #14180: URL: https://github.com/apache/lucene/issues/14180#issuecomment-2730194375

> Could you please let me know which future version of Elasticsearch will resolve the vector search consistency problem?

We are investigating a proper solution to this issue that addresses determinism while retaining good efficiency (e.g., see some early experiments in this [draft PR](https://github.com/apache/lucene/pull/14191)).
Re: [PR] Completion FSTs to be loaded off-heap by default [lucene]
javanna commented on code in PR #14364: URL: https://github.com/apache/lucene/pull/14364#discussion_r1999422163 ## lucene/suggest/src/java/org/apache/lucene/search/suggest/document/CompletionPostingsFormat.java: ## @@ -122,11 +122,6 @@
```java
   private final FSTLoadMode fstLoadMode;

-  /** Used only by core Lucene at read-time via Service Provider instantiation */
-  public CompletionPostingsFormat(String name) {
```
Review Comment: This was unused and redundant.
Re: [PR] Speedup merging of HNSW graphs [lucene]
mayya-sharipova commented on PR #14331: URL: https://github.com/apache/lucene/pull/14331#issuecomment-2730506974 @msokolov Thanks for the comment. I've experimented with setting beamCandidates to `M * 3` when building merged graphs, increasing it from the previous `M * 2`. Graphs look better, and there are still significant speedups. Evaluation was done with luceneutil on these datasets:

1. **quora-E5-small**; 522931 docs; 384 dims; 7 bits quantized; cosine metric
   - baseline: index time: **112.41s**, force merge: **113.81s**
   - candidate: index time: **81.55s**, force merge: **70.87s**
2. **cohere-wikipedia-v2**; 1M docs; 768 dims; 7 bits quantized; cosine metric
   - baseline: index time: **158.1s**, force merge: **425.20s**
   - candidate: index time: **122.95s**, force merge: **239.28s**
3. **gist**; 1M docs; 960 dims; 7 bits quantized; euclidean metric
   - baseline: index time: **141.82s**, force merge: **536.07s**
   - candidate: index time: **119.26s**, force merge: **279.05s**
4. **cohere-wikipedia-v3**; 1M docs; 1024 dims; 7 bits quantized; dot_product metric
   - baseline: index time: **211.86s**, force merge: **654.97s**
   - candidate: index time: **168.22s**, force merge: **414.12s**
Re: [PR] Integrating GPU based Vector Search using cuVS [lucene]
kaivalnp commented on PR #14131: URL: https://github.com/apache/lucene/pull/14131#issuecomment-2730545360 Exciting change! Since this PR adds a new codec for vector search, I wanted to point to #14178 along similar lines -- adding a new Faiss-based KNN format to index and query vectors. Faiss (https://github.com/facebookresearch/faiss) is _"a library for efficient similarity search and clustering of dense vectors"_. It supports various features like vector transforms (eg PCA), indexing algorithms (eg IVF, HNSW, etc), quantization techniques (eg PQ), search strategies (eg 2-step refinement), and different hardware (including [GPUs](https://github.com/facebookresearch/faiss/wiki/Faiss-on-the-GPU) -- it also has support for cuVS) -- and adding this codec would allow users to make use of (most of) these features! Internally, the format calls the [C API](https://github.com/facebookresearch/faiss/blob/main/c_api/INSTALL.md) of Faiss using Panama (https://openjdk.org/projects/panama) FFI. The codec is present in the sandbox module, and does _not_ add Faiss as a dependency of Lucene -- it only relies on the shared library (along with all dependencies) being present at runtime (on `$LD_LIBRARY_PATH` or `-Djava.library.path`). Would appreciate feedback on the PR!
Re: [PR] [DRAFT] Case-insensitive matching over union of strings [lucene]
msfroh commented on PR #14350: URL: https://github.com/apache/lucene/pull/14350#issuecomment-2730578138 Okay, got it! That's the piece that I was misunderstanding. I didn't realize that Turkish/Azeri is the **only** other valid folding. I kept thinking of it as just an example where the naïve folding wouldn't work. I'll go ahead and update this PR with that in mind.
Re: [PR] Completion FSTs to be loaded off-heap by default [lucene]
javanna commented on code in PR #14364: URL: https://github.com/apache/lucene/pull/14364#discussion_r1999421774 ## lucene/suggest/src/java/org/apache/lucene/search/suggest/document/Completion101PostingsFormat.java: ## @@ -25,17 +25,9 @@
```java
 * @lucene.experimental
 */
public class Completion101PostingsFormat extends CompletionPostingsFormat {
-  /** Creates a {@link Completion101PostingsFormat} that will load the completion FST on-heap. */
+  /** Creates a {@link Completion101PostingsFormat} that will load the completion FST off-heap. */
   public Completion101PostingsFormat() {
-    this(FSTLoadMode.ON_HEAP);
-  }
-
-  /**
-   * Creates a {@link Completion101PostingsFormat} that will use the provided fstLoadMode
-   * to determine if the completion FST should be loaded on or off heap.
-   */
-  public Completion101PostingsFormat(FSTLoadMode fstLoadMode) {
```
Review Comment: I find these constructors taking the FST load mode trappy, in that they make readers think that calling them allows overriding the load mode. In practice, the load mode is only ever used at read time, and that codepath goes through SPI loading, so via the default constructor. I think removing these is the best path forward to avoid confusion.
Re: [I] A little optimization about LZ4 [lucene]
jainankitk commented on issue #14347: URL: https://github.com/apache/lucene/issues/14347#issuecomment-2730860749 I am not sure it is a good idea to have this as a user parameter. But I am wondering if the default for `BEST_SPEED` should be using a preset dict, as that compromises speed for compression. Users can always use `BEST_COMPRESSION` if they are willing to compromise on speed in favor of compression.
[PR] Completion FSTs to be loaded off-heap by default [lucene]
javanna opened a new pull request, #14364: URL: https://github.com/apache/lucene/pull/14364 All the existing completion postings formats load their FSTs on-heap. It is possible to customize that behaviour by maintaining a custom postings format that overrides the FST load mode. TestSuggestField attempted to test all modes, but in practice it can only test the default load mode, because the read path relies on the default constructor called via SPI, which does not allow providing the FST load mode. This commit changes the default value to OFF_HEAP and addresses the testing gap in TestSuggestField, so that all load modes are tested. This unveiled a bug when loading an FST off-heap where the index input is a MemorySegmentIndexInput, which is fixed by #14271.
Re: [PR] Support load per-iteration replacement of NamedSPI [lucene]
javanna commented on PR #14275: URL: https://github.com/apache/lucene/pull/14275#issuecomment-2729969364 I opened #14364 to make the suggested change to the completion postings format, let me know what you think.
Re: [PR] Address completion fields testing gap and truly allow loading FST off heap [lucene]
javanna commented on PR #14270: URL: https://github.com/apache/lucene/pull/14270#issuecomment-2729966315 I opened #14364 to still address the testing gap, but also change the default load mode to off heap.
Re: [PR] Address completion fields testing gap and truly allow loading FST off heap [lucene]
javanna closed pull request #14270: Address completion fields testing gap and truly allow loading FST off heap URL: https://github.com/apache/lucene/pull/14270
Re: [PR] BooleanScorer doesn't optimize for TwoPhaseIterator [lucene]
dsmiley commented on PR #14357: URL: https://github.com/apache/lucene/pull/14357#issuecomment-2729965328 An aside: `org.apache.lucene.search.DisjunctionScorer.TwoPhase#matches` looks kind of sad, in that each matches() call is going to build a priority queue of "unverified matches" (DisiWrapper holding TwoPhaseIterator). It seems strange to populate one when visiting each doc instead of maintaining a fixed pre-sorted array of them, since we know which clauses have TPIs. The DisiWrapper could have a TPI index (by match cost) into an array of TwoPhaseIterators. The selected unverifiedMatches per matches() call might be noted via a bitmask/bitset that is cheap to set, clear, and iterate set bits. Or we could just use an array of DisiWrapper that is cleared and filled. No matchCost comparisons or heap manipulation. Not sure if I'm over-optimizing here. The use case bringing me here has only one TPI, and its approximation is all docs.
Re: [PR] Decode doc ids in BKD leaves with auto-vectorized loops [lucene]
jpountz commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2729975580 Fantastic speedup. Nice to see tasks like `TermDayOfYearSort` also take advantage of this change.
Re: [PR] BooleanScorer doesn't optimize for TwoPhaseIterator [lucene]
jpountz commented on PR #14357: URL: https://github.com/apache/lucene/pull/14357#issuecomment-2729988278 The current approach is indeed probably not the fastest. We should add a task to the nightly benchmarks if we want to optimize this. Something like a disjunction of phrase queries (possibly both for top-n collection and counting)?
Re: [I] Opening of vector files with ReadAdvice.RANDOM_PRELOAD [lucene]
jainankitk commented on issue #14348: URL: https://github.com/apache/lucene/issues/14348#issuecomment-2730902349

> Lucene currently uses ReadAdvice.RANDOM when opening these files. I think it would be better to use RANDOM_PRELOAD.

As per the documentation for RANDOM_PRELOAD: _This should only be used on very small files that can be expected to fit in RAM with very high confidence._ Hence, I am wondering if it can have an adverse impact in cases with a small page cache. While https://github.com/apache/lucene/pull/13264 adds a way to configure ReadAdvice, that applies to all file formats, I guess. It would be great if users could tune the ReadAdvice for specific formats, depending on the server page cache and data size.
Re: [PR] BooleanScorer doesn't optimize for TwoPhaseIterator [lucene]
dsmiley commented on PR #14357: URL: https://github.com/apache/lucene/pull/14357#issuecomment-2730909459 BTW I don't have plans to explore this further. Anyone should feel free to take over. Or abandon if nobody cares -- I admit it's very unusual to even have a top-level disjunction, let alone one with an expensive clause, as implied by the TPI. I found this while writing a simple test, so simple that it wasn't real-world representative. The real-world representative has the disjunction deeper than the top level, which doesn't show this performance problem because the bulk scorer is then not used.
Re: [PR] Optimize ConcurrentMergeScheduler for Multi-Tenant Indexing [lucene]
mccullocht commented on code in PR #14335: URL: https://github.com/apache/lucene/pull/14335#discussion_r1999677796 ## lucene/core/src/java/org/apache/lucene/index/MultiTenantMergeScheduler.java: ## @@ -0,0 +1,72 @@
```java
package org.apache.lucene.index;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.io.IOException;

/**
 * A multi-tenant merge scheduler that shares a global thread pool across multiple IndexWriters.
 */
public class MultiTenantMergeScheduler extends MergeScheduler {

  // Shared global thread pool with lazy initialization
  private static class LazyHolder {
    static final ExecutorService MERGE_THREAD_POOL =
        Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors() / 2);
  }

  private static ExecutorService getMergeThreadPool() {
    return LazyHolder.MERGE_THREAD_POOL;
  }

  // Use getMergeThreadPool() instead of direct access

  @Override
  public void merge(MergeScheduler.MergeSource mergeSource, MergeTrigger trigger) throws IOException {
    while (mergeSource.hasPendingMerges()) { // Use hasPendingMerges() instead of relying on null check
      MergePolicy.OneMerge merge = mergeSource.getNextMerge();
      if (merge == null) {
        break; // Explicitly exit if no merge is available
      }

      // Submit merge task to the shared thread pool
      MERGE_THREAD_POOL.submit(() -> {
        try {
          mergeSource.merge(merge);
        } catch (IOException e) {
          throw new RuntimeException("Merge operation failed", e);
        }
      });

      // Cleanup completed merges
      activeMerges.removeIf(Future::isDone);
    }
  }

  @Override
  public void close() throws IOException {
```
Review Comment: IIRC `IndexWriter.close()` will call `MergeScheduler.close()`, which means that you would wait for all merges across all indexes using this scheduler to complete. The `IndexWriter` that is currently calling `close()` will stop scheduling new merges, but the other writers may not, which means this may not terminate. To fix this you may need to map active merges back to the `IndexWriter` or `Directory` they belong to.
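One possible shape for that fix, sketched with plain JDK types (the class and method names here are hypothetical, not Lucene API): key the submitted merge futures by their owning writer or directory, so that close() only waits on its own merges instead of every tenant's.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not Lucene code: track merge futures per source so a
// closing writer waits only for merges it scheduled itself.
class PerSourceMergeTracker {
  private final ExecutorService pool = Executors.newFixedThreadPool(2);
  private final Map<Object, List<Future<?>>> bySource = new ConcurrentHashMap<>();

  // Submit a merge on behalf of a given source (e.g., an IndexWriter or Directory).
  void submit(Object source, Runnable merge) {
    bySource.computeIfAbsent(source, k -> new CopyOnWriteArrayList<>())
        .add(pool.submit(merge));
  }

  // Wait only for this source's merges; other sources keep running.
  void close(Object source) {
    for (Future<?> f : bySource.getOrDefault(source, List.of())) {
      try {
        f.get();
      } catch (InterruptedException | ExecutionException e) {
        throw new RuntimeException(e);
      }
    }
    bySource.remove(source);
  }

  // Shut down the shared pool itself (e.g., at JVM shutdown).
  void shutdown() {
    pool.shutdown();
  }
}
```

The same idea could live inside the scheduler by keying on the `MergeSource` passed to `merge(...)`.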
Re: [PR] Implement #docIDRunEnd() on `DisjunctionDISIApproximation`. [lucene]
jpountz merged PR #14363: URL: https://github.com/apache/lucene/pull/14363
Re: [PR] Completion FSTs to be loaded off-heap by default [lucene]
jpountz commented on code in PR #14364: URL: https://github.com/apache/lucene/pull/14364#discussion_r1999714466 ## lucene/suggest/src/test/org/apache/lucene/search/suggest/document/TestSuggestField.java: ## @@ -951,7 +951,16 @@ static IndexWriterConfig iwcWithSuggestField(Analyzer analyzer, final Set
Re: [PR] [DRAFT] Case-insensitive matching over union of strings [lucene]
rmuir commented on PR #14350: URL: https://github.com/apache/lucene/pull/14350#issuecomment-2731329809 it is confusing, because the Unicode case folding algorithm is supposed to work for everyone. But here's the problem:

for most of the world:
* lowercase i has a dot, uppercase I has no dot.

for the Turkish/Azeri world:
* lowercase i corresponds to İ with a dot, and lowercase ı without a dot corresponds to I

that's why they need their own separate mappings.
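For illustration only (this uses the JDK's locale-sensitive case conversion, not Lucene code or the exact Unicode case-folding tables), the two mappings show up directly, with the boolean selecting the Turkish/Azeri alternate as proposed in the thread:

```java
import java.util.Locale;

public class CaseFoldingDemo {
  private static final Locale TURKISH = new Locale("tr");

  // Default simple mapping vs. the Turkish/Azeri alternate, selected by a boolean.
  static String lower(String s, boolean turkic) {
    return s.toLowerCase(turkic ? TURKISH : Locale.ROOT);
  }

  public static void main(String[] args) {
    System.out.println(lower("I", false));        // "i": default, I gains a dot
    System.out.println(lower("I", true));         // "ı": Turkish, I stays dotless
    System.out.println("i".toUpperCase(TURKISH)); // "İ": Turkish, i keeps its dot
  }
}
```

Every other letter folds identically under both mappings; only the dotted/dotless i family differs.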
[PR] Add Issue Tracker Link under 'Editing Content on the Lucene™ Sites' [lucene-site]
DivyanshIITB opened a new pull request, #78: URL: https://github.com/apache/lucene-site/pull/78 This PR adds a direct link to the [Lucene Issue Tracker](https://issues.apache.org/jira/projects/LUCENE) under the "Editing Content on the Lucene™ sites" section in site-instructions.md. Changes Made: - Updated site-instructions.md to include a reference to the issue tracker. - Placed the link right after the mention of the GitHub and Gitbox repositories for better visibility. Reason for the Change: - Helps contributors quickly find where to report website-related issues. - Improves documentation clarity and accessibility. Fixes #72
[PR] Implement bulk adding methods for dynamic pruning. [lucene]
gf2121 opened a new pull request, #14365: URL: https://github.com/apache/lucene/pull/14365

(no comment)
Re: [I] Opening of vector files with ReadAdvice.RANDOM_PRELOAD [lucene]
jpountz commented on issue #14348: URL: https://github.com/apache/lucene/issues/14348#issuecomment-2730966937

For what it's worth, it's possible to override the read advice of vectors with something like that:

```java
Path path = ...;
Directory dir = new FilterDirectory(FSDirectory.open(path)) {
  @Override
  public IndexInput openInput(String name, IOContext context) throws IOException {
    if (name.endsWith(".vex") || name.endsWith(".vec") || name.endsWith(".veb")) {
      context = context.withReadAdvice(ReadAdvice.RANDOM_PRELOAD);
    }
    return super.openInput(name, context);
  }
};
```
Re: [PR] Optimize ConcurrentMergeScheduler for Multi-Tenant Indexing [lucene]
DivyanshIITB commented on code in PR #14335: URL: https://github.com/apache/lucene/pull/14335#discussion_r1998645110

## lucene/core/src/java/org/apache/lucene/index/MultiTenantMergeScheduler.java:

```diff
@@ -0,0 +1,44 @@
+package org.apache.lucene.index;
+
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.io.IOException;
+
+/**
+ * A multi-tenant merge scheduler that shares a global thread pool across multiple IndexWriters.
+ */
+public class MultiTenantMergeScheduler extends MergeScheduler {
+
+    // Shared global thread pool
+    private static final ExecutorService MERGE_THREAD_POOL = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors() / 2);
```

Review Comment: Thanks for the suggestion! I've updated the implementation to use lazy static initialization by wrapping the thread pool in a nested static class. This ensures the thread pool is only created when first accessed, avoiding unnecessary resource allocation. Now, instead of directly initializing MERGE_THREAD_POOL, I'm using a LazyHolder class with a getMergeThreadPool() method to ensure thread-safe lazy initialization. This should prevent the thread pool from being created when the policy is unused. Let me know if you have any further feedback!
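The holder idiom referred to in the reply can be sketched like this (class and method names here are illustrative, not necessarily the PR's exact code):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class LazyPoolSketch {
    // Initialization-on-demand holder idiom: the JVM initializes LazyHolder
    // (and thus the pool) only on first call to getMergeThreadPool(), and
    // class-initialization rules make this thread-safe without explicit locking.
    private static final class LazyHolder {
        static final ExecutorService MERGE_THREAD_POOL =
            Executors.newFixedThreadPool(
                // Math.max(1, ...) guards single-core machines, where
                // availableProcessors() / 2 would be 0 and
                // newFixedThreadPool(0) throws IllegalArgumentException
                Math.max(1, Runtime.getRuntime().availableProcessors() / 2));
    }

    public static ExecutorService getMergeThreadPool() {
        return LazyHolder.MERGE_THREAD_POOL;
    }
}
```

Note that the original `availableProcessors() / 2` sizing, without the `Math.max` guard, would fail on a single-core machine.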
Re: [PR] Optimize ConcurrentMergeScheduler for Multi-Tenant Indexing [lucene]
DivyanshIITB commented on code in PR #14335: URL: https://github.com/apache/lucene/pull/14335#discussion_r1998646386

## lucene/core/src/java/org/apache/lucene/index/MultiTenantMergeScheduler.java:

```diff
@@ -0,0 +1,44 @@
+package org.apache.lucene.index;
+
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.io.IOException;
+
+/**
+ * A multi-tenant merge scheduler that shares a global thread pool across multiple IndexWriters.
+ */
+public class MultiTenantMergeScheduler extends MergeScheduler {
+
+    // Shared global thread pool
+    private static final ExecutorService MERGE_THREAD_POOL = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors() / 2);
+
+    @Override
+    public void merge(MergeScheduler.MergeSource mergeSource, MergeTrigger trigger) throws IOException {
+        while (true) {
+            MergePolicy.OneMerge merge = mergeSource.getNextMerge();
+            if (merge == null) {
+                break;
+            }
+            // Submit merge task to the shared thread pool
+            Future future = MERGE_THREAD_POOL.submit(() -> {
+                try {
+                    mergeSource.merge(merge);
+                } catch (IOException e) {
+                    throw new RuntimeException(e);
+                }
+            });
+
+            try {
+                future.get(); // Ensure the task completes
+            } catch (Exception e) {
+                throw new IOException("Merge operation failed", e);
+            }
```

Review Comment: Thanks for the feedback! You're absolutely right: blocking on `future.get()` was making the merges sequential. I've removed it so that merges can now proceed asynchronously in the background. Let me know if this aligns with the intended behavior!
Re: [PR] Optimize ConcurrentMergeScheduler for Multi-Tenant Indexing [lucene]
DivyanshIITB commented on code in PR #14335: URL: https://github.com/apache/lucene/pull/14335#discussion_r1998645593

## lucene/core/src/java/org/apache/lucene/index/MultiTenantMergeScheduler.java:

```diff
@@ -0,0 +1,44 @@
+package org.apache.lucene.index;
+
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.io.IOException;
+
+/**
+ * A multi-tenant merge scheduler that shares a global thread pool across multiple IndexWriters.
+ */
+public class MultiTenantMergeScheduler extends MergeScheduler {
+
+    // Shared global thread pool
+    private static final ExecutorService MERGE_THREAD_POOL = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors() / 2);
+
+    @Override
+    public void merge(MergeScheduler.MergeSource mergeSource, MergeTrigger trigger) throws IOException {
+        while (true) {
+            MergePolicy.OneMerge merge = mergeSource.getNextMerge();
+            if (merge == null) {
```

Review Comment: Thanks for pointing this out! I've updated the loop condition to use hasPendingMerges() instead of relying on null. This ensures better adherence to the documented API behavior. Let me know if this aligns with what you were suggesting!
Re: [PR] Optimize ConcurrentMergeScheduler for Multi-Tenant Indexing [lucene]
DivyanshIITB commented on code in PR #14335: URL: https://github.com/apache/lucene/pull/14335#discussion_r1998646829

## lucene/core/src/java/org/apache/lucene/index/MultiTenantMergeScheduler.java:

```diff
@@ -0,0 +1,44 @@
+package org.apache.lucene.index;
+
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.io.IOException;
+
+/**
+ * A multi-tenant merge scheduler that shares a global thread pool across multiple IndexWriters.
+ */
+public class MultiTenantMergeScheduler extends MergeScheduler {
+
+    // Shared global thread pool
+    private static final ExecutorService MERGE_THREAD_POOL = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors() / 2);
+
+    @Override
+    public void merge(MergeScheduler.MergeSource mergeSource, MergeTrigger trigger) throws IOException {
+        while (true) {
+            MergePolicy.OneMerge merge = mergeSource.getNextMerge();
+            if (merge == null) {
+                break;
+            }
+            // Submit merge task to the shared thread pool
+            Future future = MERGE_THREAD_POOL.submit(() -> {
+                try {
+                    mergeSource.merge(merge);
+                } catch (IOException e) {
+                    throw new RuntimeException(e);
+                }
+            });
+
+            try {
+                future.get(); // Ensure the task completes
+            } catch (Exception e) {
+                throw new IOException("Merge operation failed", e);
+            }
+        }
+    }
+
+    @Override
+    public void close() {
```

Review Comment: Thanks for pointing this out! I've updated close() to properly wait for all running merges, similar to sync() in ConcurrentMergeScheduler. Now, merge() tracks active tasks, and I've added cleanup for completed futures. Also, I introduced shutdownThreadPool() to allow safe global shutdown. Let me know if this works!
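A minimal sketch of the behavior described in these replies (non-blocking submission plus a close() that drains in-flight work) could look like this; the names and structure are hypothetical simplifications, not the PR's actual code:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncMergeSketch {
    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private final List<Future<?>> pending = new CopyOnWriteArrayList<>();

    // Submit the merge without calling get() on its future, so merges run
    // concurrently in the background instead of being serialized one at a time.
    public void submit(Runnable mergeTask) {
        pending.removeIf(Future::isDone); // prune completed futures
        pending.add(pool.submit(mergeTask));
    }

    // Similar in spirit to ConcurrentMergeScheduler#sync(): wait for all
    // in-flight merges before shutting the pool down.
    public void close() throws Exception {
        for (Future<?> f : pending) {
            f.get();
        }
        pool.shutdown();
    }
}
```

The key difference from the original code is that `get()` moves from the submit path to the close path, so only shutdown blocks on merge completion.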
Re: [PR] Optimize ConcurrentMergeScheduler for Multi-Tenant Indexing [lucene]
DivyanshIITB commented on code in PR #14335: URL: https://github.com/apache/lucene/pull/14335#discussion_r1998648834

## lucene/core/src/test/org/apache/lucene/index/TestMultiTenantMergeScheduler.java:

```diff
@@ -0,0 +1,30 @@
+package org.apache.lucene.index;
+
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.RAMDirectory;
+import org.apache.lucene.analysis.standard.StandardAnalyzer;
+import org.apache.lucene.document.Document;
+import org.apache.lucene.document.TextField;
+import org.apache.lucene.util.LuceneTestCase;
+
+public class TestMultiTenantMergeScheduler extends LuceneTestCase {
+
+    public void testMultiTenantMergeScheduler() throws Exception {
```

Review Comment: Thanks for the feedback! I've expanded the test suite to include multiple writers consuming the scheduler, different merge scheduling times, and staggered start/close behavior. Also, I've referred to TestConcurrentMergeScheduler for structuring test cases to verify concurrency. Let me know if you have additional suggestions!
Re: [PR] Optimize ConcurrentMergeScheduler for Multi-Tenant Indexing [lucene]
DivyanshIITB commented on PR #14335: URL: https://github.com/apache/lucene/pull/14335#issuecomment-2729358357

I have a request: kindly ignore the following two deleted files in the "Files Changed" section:
- "KeepOnlyLastCommitDeletionPolicy.java"
- "ConcurrentMergeScheduler.java"
Re: [PR] Introduce new encoding of BPV 21 for DocIdsWriter used in BKD Tree [lucene]
gf2121 commented on PR #14361: URL: https://github.com/apache/lucene/pull/14361#issuecomment-2729094239

Results on `wikimediumall`:

```
                Task    QPS baseline  StdDev    QPS my_modified_version  StdDev    Pct diff    p-value
              IntSet    94.25 (1.4%)           93.50 (1.1%)              -0.8% ( -3% -  1%)    0.069
 CountFilteredIntNRQ    43.06 (2.1%)           43.27 (2.2%)               0.5% ( -3% -  4%)    0.529
      FilteredIntNRQ    79.65 (3.1%)           80.63 (3.4%)               1.2% ( -5% -  8%)    0.285
              IntNRQ    81.27 (3.6%)           82.37 (3.8%)               1.3% ( -5% -  9%)    0.311
```

| file | baseline | candidate | diff |
| --- | --- | --- | --- |
| _32.kdd | 58829701 | 58713421 | -0.20% |
| _65.kdd | 63532709 | 63171629 | -0.57% |
| _98.kdd | 64475136 | 63986556 | -0.76% |
| _cb.kdd | 64765401 | 64210351 | -0.86% |
| _fe.kdd | 63945666 | 63346246 | -0.94% |
| _fp.kdd | 6619883 | 6094047 | -7.94% |
| _g0.kdd | 6422847 | 5915855 | -7.89% |
| _gb.kdd | 6483860 | 5981968 | -7.74% |
| _gm.kdd | 6494743 | 6005943 | -7.53% |
| _gx.kdd | 6253202 | 5811974 | -7.06% |
| _gy.kdd | 522267 | 522267 | 0.00% |
| _gz.kdd | 521444 | 521444 | 0.00% |
| _h0.kdd | 522729 | 522729 | 0.00% |
| _h1.kdd | 504805 | 504805 | 0.00% |
| _h2.kdd | 463669 | 463669 | 0.00% |
Re: [PR] Introduce new encoding of BPV 21 for DocIdsWriter used in BKD Tree [lucene]
jpountz commented on PR #14361: URL: https://github.com/apache/lucene/pull/14361#issuecomment-2729147376

Should we floor to a multiple of 16 instead of 8 so that we have a perfect second loop with AVX-512 as well? (By the way, which of your machines produced the above benchmark results?) Otherwise, the change makes sense to me.
Re: [PR] Add support for two-phase iterators to DenseConjunctionBulkScorer. [lucene]
jpountz commented on code in PR #14359: URL: https://github.com/apache/lucene/pull/14359#discussion_r1998546217

## lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java:

```diff
@@ -238,9 +296,77 @@ private void scoreWindowUsingBitSet(
     windowMatches.clear();
   }
 
+  private void scoreWindowUsingLeapFrog(
```

Review Comment: The contract is that `TwoPhaseIterator#matches()` verifies if the current doc ID of the approximation matches. So we cannot load matches from the approximation into a bit set and then call `matches()`, as it wouldn't check the correct doc ID. We could change the API, but I'm not sure if that would buy anything, as our most typical two-phase iterator, PhraseQuery, is similar: `PostingsEnum#nextPosition()` reads the next position of the current doc ID.
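To make the contract concrete, here is a toy model (deliberately not the real Lucene classes, though the names mirror them) of how a two-phase iterator must be consumed: advance the approximation, then confirm each candidate with matches(), which is only meaningful for the approximation's current doc:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class TwoPhaseToy {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Toy approximation: iterates candidate doc IDs in order, tracking
    // a "current doc" the way DocIdSetIterator does.
    static class Approximation {
        private final Iterator<Integer> it;
        int doc = -1;
        Approximation(List<Integer> docs) { this.it = docs.iterator(); }
        int nextDoc() { return doc = it.hasNext() ? it.next() : NO_MORE_DOCS; }
    }

    // matches() verifies only the approximation's *current* doc, which is why
    // approximated docs cannot be buffered into a bit set and confirmed later.
    static boolean matches(Approximation approx, List<Integer> trueMatches) {
        return trueMatches.contains(approx.doc);
    }

    static List<Integer> collect(List<Integer> approxDocs, List<Integer> trueMatches) {
        List<Integer> out = new ArrayList<>();
        Approximation approx = new Approximation(approxDocs);
        for (int doc = approx.nextDoc(); doc != NO_MORE_DOCS; doc = approx.nextDoc()) {
            if (matches(approx, trueMatches)) { // confirm immediately, at this doc
                out.add(doc);
            }
        }
        return out;
    }
}
```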
Re: [PR] Add support for two-phase iterators to DenseConjunctionBulkScorer. [lucene]
jpountz commented on code in PR #14359: URL: https://github.com/apache/lucene/pull/14359#discussion_r1998546819

## lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java:

```diff
@@ -238,9 +296,77 @@ private void scoreWindowUsingBitSet(
     windowMatches.clear();
   }
 
+  private void scoreWindowUsingLeapFrog(
+      LeafCollector collector,
+      Bits acceptDocs,
+      List approximations,
+      List twoPhases,
+      int min,
+      int max)
+      throws IOException {
+    if (approximations.size() == 1) {
+      // Since a TwoPhaseIterator comes with its approximation, the number of approximations is gte
+      // the number of TwoPhaseIterators
+      assert twoPhases.size() <= 1;
+      TwoPhaseIterator twoPhase = twoPhases.get(0);
+      DocIdSetIterator approximation = approximations.isEmpty() ? null : approximations.get(0);
```

Review Comment: Hmm, you're right. This means I'm missing a test case. I'll look into it.
Re: [PR] Add support for two-phase iterators to DenseConjunctionBulkScorer. [lucene]
gf2121 commented on code in PR #14359: URL: https://github.com/apache/lucene/pull/14359#discussion_r1998596143

## lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java:

```diff
@@ -238,9 +296,77 @@ private void scoreWindowUsingBitSet(
     windowMatches.clear();
   }
 
+  private void scoreWindowUsingLeapFrog(
```

Review Comment:
> The contract is that TwoPhaseIterator#matches() verifies if the current doc ID of the approximation matches.

Oh yes, you are right. Thanks for the explanation!
Re: [PR] Introduce new encoding of BPV 21 for DocIdsWriter used in BKD Tree [lucene]
gf2121 commented on PR #14361: URL: https://github.com/apache/lucene/pull/14361#issuecomment-2729250628

Thanks for the feedback,

> Should we floor to a multiple of 16 instead of 8 so that we have a perfect second loop with AVX-512 as well?

That is what I thought initially. But my AVX-512 machine (hopefully it is) somehow only processes 256 bits at a time for its `vpand` and `vpslld` instructions, so flooring to a multiple of 8 works on it. Flooring to a multiple of 16 makes the benchmark a bit slower on my machine, as more remaining longs need to be read, but I'm fine with flooring to a multiple of 16 if there are machines requiring that.

```
Benchmark                                                 Mode  Cnt   Score   Error   Units
Decode21Benchmark.decode21Scalar                         thrpt    5  28.114 ? 0.013  ops/ms
Decode21Benchmark.decode21Scalar:asm                     thrpt         NaN            ---
Decode21Benchmark.decode21Vector                         thrpt    5  49.160 ? 1.661  ops/ms
Decode21Benchmark.decode21Vector:asm                     thrpt         NaN            ---
Decode21Benchmark.decode21VectorFloorToMultipleOf16      thrpt    5  74.828 ? 0.463  ops/ms
Decode21Benchmark.decode21VectorFloorToMultipleOf16:asm  thrpt         NaN            ---
Decode21Benchmark.decode21VectorFloorToMultipleOf8       thrpt    5  81.078 ? 0.397  ops/ms
Decode21Benchmark.decode21VectorFloorToMultipleOf8:asm   thrpt         NaN            ---
```

cpu flags:

```
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat umip pku ospke avx512_vnni arch_capabilities
```

> By the way, which of your machines produced the above benchmark results?

The luceneutil results were obtained on the Intel chip (Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz (AVX 512)).
[PR] Introduce new encoding of BPV 21 for DocIdsWriter used in BKD Tree [lucene]
gf2121 opened a new pull request, #14361: URL: https://github.com/apache/lucene/pull/14361

This PR tries another way to implement the idea of https://github.com/apache/lucene/pull/13521, taking advantage of an auto-vectorized loop to decode ints like we did for bpv24 in https://github.com/apache/lucene/pull/14203. One thing that needs to be pointed out is that the remainder loop does not get vectorized (again!) since `512 / 3 = 170` is not a multiple of 8, hence the `floorToMultipleOf8` trick.

**Mac M2**

```
Benchmark                                            Mode  Cnt    Score   Error   Units
Decode21Benchmark.decode21Scalar                    thrpt    5   92.405 ± 0.521  ops/ms
Decode21Benchmark.decode21Vector                    thrpt    5  108.325 ± 1.517  ops/ms
Decode21Benchmark.decode21VectorFloorToMultipleOf8  thrpt    5  141.691 ± 3.948  ops/ms
```

**Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz (AVX 512)**

```
Benchmark                                                Mode  Cnt   Score   Error   Units
Decode21Benchmark.decode21Scalar                        thrpt    5  29.134 ? 0.087  ops/ms
Decode21Benchmark.decode21Scalar:asm                    thrpt         NaN            ---
Decode21Benchmark.decode21Vector                        thrpt    5  45.180 ? 0.479  ops/ms
Decode21Benchmark.decode21Vector:asm                    thrpt         NaN            ---
Decode21Benchmark.decode21VectorFloorToMultipleOf8      thrpt    5  76.330 ? 2.858  ops/ms
Decode21Benchmark.decode21VectorFloorToMultipleOf8:asm  thrpt         NaN            ---
```

cc @expani who raised this neat idea.
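The `floorToMultipleOf8` trick amounts to masking off the low bits so the main loop's trip count is a multiple of the vector width, leaving a small scalar tail (a sketch of the idea, not the PR's exact code):

```java
public class FloorSketch {
    // Floor a non-negative count down to a multiple of 8: the main
    // (auto-vectorizable) loop handles the first floorToMultipleOf8(count)
    // elements, and a scalar remainder loop handles the rest.
    public static int floorToMultipleOf8(int count) {
        return count & ~7; // same as count - (count % 8) for count >= 0
    }

    public static void main(String[] args) {
        // 512 / 3 = 170 values per block at BPV 21; 170 is not a multiple of 8,
        // so the vectorized loop covers 168 values and the tail loop covers 2.
        System.out.println(floorToMultipleOf8(170)); // 168
        System.out.println(floorToMultipleOf8(16));  // 16
    }
}
```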
Re: [PR] Add support for two-phase iterators to DenseConjunctionBulkScorer. [lucene]
gf2121 commented on code in PR #14359: URL: https://github.com/apache/lucene/pull/14359#discussion_r1998258104

## lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java:

```diff
@@ -238,9 +296,77 @@ private void scoreWindowUsingBitSet(
     windowMatches.clear();
   }
 
+  private void scoreWindowUsingLeapFrog(
```

Review Comment: Out of curiosity: why are we choosing such a per-doc conjunction algorithm instead of scoring approximations into a bitset and checking `matches` for existing bits? Actually I also thought about something like `IndexedDISI#intoBitset` to speed up range queries on `Match.IF_DOC_HAS_VALUE` :)
Re: [I] TestKnnGraph.testMultiThreadedSearch random test failure [lucene]
gf2121 commented on issue #14327: URL: https://github.com/apache/lucene/issues/14327#issuecomment-2729377397

Seeing a similar failure as well:

```
> java.lang.AssertionError: [doc=0 score=0.990099 shardIndex=-1, doc=3 score=0.49751243 shardIndex=-1, doc=5 score=0.21691975 shardIndex=-1, doc=6 score=0.19960079 shardIndex=-1, doc=8 score=0.17825313 shardIndex=-1] expected:<5> but was:<8>
>   at __randomizedtesting.SeedInfo.seed([7E836993445B6326]:0)
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.failNotEquals(Assert.java:835)
>   at org.junit.Assert.assertEquals(Assert.java:647)
>   at org.apache.lucene.index.TestKnnGraph.assertResults(TestKnnGraph.java:356)
>   at org.apache.lucene.index.TestKnnGraph.lambda$testMultiThreadedSearch$0(TestKnnGraph.java:308)
>   at java.base/java.lang.Thread.run(Thread.java:1575)
>
> java.lang.AssertionError: [doc=0 score=0.990099 shardIndex=-1, doc=3 score=0.49751243 shardIndex=-1, doc=5 score=0.21691975 shardIndex=-1, doc=6 score=0.19960079 shardIndex=-1, doc=8 score=0.17825313 shardIndex=-1] expected:<5> but was:<8>
>   at __randomizedtesting.SeedInfo.seed([7E836993445B6326]:0)
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.failNotEquals(Assert.java:835)
>   at org.junit.Assert.assertEquals(Assert.java:647)
>   at org.apache.lucene.index.TestKnnGraph.assertResults(TestKnnGraph.java:356)
>   at org.apache.lucene.index.TestKnnGraph.lambda$testMultiThreadedSearch$0(TestKnnGraph.java:308)
>   at java.base/java.lang.Thread.run(Thread.java:1575)
>
> java.lang.AssertionError: [doc=0 score=0.990099 shardIndex=-1, doc=3 score=0.49751243 shardIndex=-1, doc=5 score=0.21691975 shardIndex=-1, doc=6 score=0.19960079 shardIndex=-1, doc=8 score=0.17825313 shardIndex=-1] expected:<5> but was:<8>
>   at __randomizedtesting.SeedInfo.seed([7E836993445B6326]:0)
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.failNotEquals(Assert.java:835)
>   at org.junit.Assert.assertEquals(Assert.java:647)
>   at org.apache.lucene.index.TestKnnGraph.assertResults(TestKnnGraph.java:356)
>   at org.apache.lucene.index.TestKnnGraph.lambda$testMultiThreadedSearch$0(TestKnnGraph.java:308)
>   at java.base/java.lang.Thread.run(Thread.java:1575)

2> NOTE: reproduce with: gradlew test --tests TestKnnGraph.testMultiThreadedSearch -Dtests.seed=7E836993445B6326 -Dtests.locale=kam -Dtests.timezone=America/Guyana -Dtests.asserts=true -Dtests.file.encoding=UTF-8
```
Re: [PR] add Automaton.toMermaid() for emitting mermaid state charts [lucene]
dweiss commented on PR #14360: URL: https://github.com/apache/lucene/pull/14360#issuecomment-2728443809

I've looked at the docs of mermaid and toyed around a bit. I agree that infinite loops are so ugly that one's eyes start to bleed. Maybe we should stick to graphviz.
Re: [PR] Optimize ConcurrentMergeScheduler for Multi-Tenant Indexing [lucene]
vigyasharma commented on code in PR #14335: URL: https://github.com/apache/lucene/pull/14335#discussion_r1998041333

## lucene/core/src/java/org/apache/lucene/index/MultiTenantMergeScheduler.java:

```diff
@@ -0,0 +1,44 @@
+package org.apache.lucene.index;
+
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.io.IOException;
+
+/**
+ * A multi-tenant merge scheduler that shares a global thread pool across multiple IndexWriters.
+ */
+public class MultiTenantMergeScheduler extends MergeScheduler {
+
+    // Shared global thread pool
+    private static final ExecutorService MERGE_THREAD_POOL = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors() / 2);
+
+    @Override
+    public void merge(MergeScheduler.MergeSource mergeSource, MergeTrigger trigger) throws IOException {
+        while (true) {
+            MergePolicy.OneMerge merge = mergeSource.getNextMerge();
+            if (merge == null) {
```

Review Comment: Can we use `hasPendingMerges()` instead? I know existing schedulers assume `null` means there are no more merges, but it's not the documented API behavior.

## lucene/core/src/java/org/apache/lucene/index/MultiTenantMergeScheduler.java:

```diff
@@ -0,0 +1,44 @@
+package org.apache.lucene.index;
+
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.io.IOException;
+
+/**
+ * A multi-tenant merge scheduler that shares a global thread pool across multiple IndexWriters.
+ */
+public class MultiTenantMergeScheduler extends MergeScheduler {
+
+    // Shared global thread pool
+    private static final ExecutorService MERGE_THREAD_POOL = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors() / 2);
```

Review Comment: This will create a threadpool even when this policy is not used. Let's use lazy static initialization.

## lucene/core/src/test/org/apache/lucene/index/TestMultiTenantMergeScheduler.java:

```diff
@@ -0,0 +1,30 @@
+package org.apache.lucene.index;
+
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.RAMDirectory;
+import org.apache.lucene.analysis.standard.StandardAnalyzer;
+import org.apache.lucene.document.Document;
+import org.apache.lucene.document.TextField;
+import org.apache.lucene.util.LuceneTestCase;
+
+public class TestMultiTenantMergeScheduler extends LuceneTestCase {
+
+    public void testMultiTenantMergeScheduler() throws Exception {
```

Review Comment: This test is not sufficient. We need to test (and also implement) a lot more scenarios, like multiple writers consuming this scheduler, scheduling different merges and starting/closing at different times. You can refer to the tests in `TestConcurrentMergeScheduler` as a starting point.

## lucene/core/src/java/org/apache/lucene/index/MultiTenantMergeScheduler.java:

```diff
@@ -0,0 +1,44 @@
+package org.apache.lucene.index;
+
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.io.IOException;
+
+/**
+ * A multi-tenant merge scheduler that shares a global thread pool across multiple IndexWriters.
+ */
+public class MultiTenantMergeScheduler extends MergeScheduler {
+
+    // Shared global thread pool
+    private static final ExecutorService MERGE_THREAD_POOL = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors() / 2);
+
+    @Override
+    public void merge(MergeScheduler.MergeSource mergeSource, MergeTrigger trigger) throws IOException {
+        while (true) {
+            MergePolicy.OneMerge merge = mergeSource.getNextMerge();
+            if (merge == null) {
+                break;
+            }
+            // Submit merge task to the shared thread pool
+            Future future = MERGE_THREAD_POOL.submit(() -> {
+                try {
+                    mergeSource.merge(merge);
+                } catch (IOException e) {
+                    throw new RuntimeException(e);
+                }
+            });
+
+            try {
+                future.get(); // Ensure the task completes
+            } catch (Exception e) {
+                throw new IOException("Merge operation failed", e);
+            }
```

Review Comment: I don't think this achieves concurrent background merging like we'd want. I think this code will submit a merge request and block on its future for completion, effectively making all merges sequential.

## lucene/core/src/java/org/apache/lucene/index/MultiTenantMergeScheduler.java:

```diff
@@ -0,0 +1,44 @@
+package org.apache.lucene.index;
+
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.io.IOException;
+
+/**
+ * A multi-tenant merge scheduler that shares a global thread pool across multiple IndexWriter
```
Re: [PR] Add support for two-phase iterators to DenseConjunctionBulkScorer. [lucene]
gf2121 commented on code in PR #14359: URL: https://github.com/apache/lucene/pull/14359#discussion_r1998219212

## lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java: ## @@ -238,9 +296,77 @@

```java
  private void scoreWindowUsingBitSet(
    // ...
    windowMatches.clear();
  }

  private void scoreWindowUsingLeapFrog(
      LeafCollector collector,
      Bits acceptDocs,
      List<DocIdSetIterator> approximations,
      List<TwoPhaseIterator> twoPhases,
      int min,
      int max)
      throws IOException {
    if (approximations.size() == 1) {
      // Since a TwoPhaseIterator comes with its approximation, the number of approximations is gte
      // the number of TwoPhaseIterators
      assert twoPhases.size() <= 1;
      TwoPhaseIterator twoPhase = twoPhases.get(0);
      DocIdSetIterator approximation = approximations.isEmpty() ? null : approximations.get(0);
```

Review Comment:
```suggestion
      DocIdSetIterator approximation = approximations.get(0);
      TwoPhaseIterator twoPhase = twoPhases.isEmpty() ? null : twoPhases.get(0);
```

## lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java: ## @@ -238,9 +296,77 @@

```java
  private void scoreWindowUsingLeapFrog(
```

Review Comment: Out of curiosity: why are we choosing such a per-doc conjunction algorithm instead of scoring approximations into a bitset and checking `matches` for existing bits? Actually I also thought about something like `IndexedDISI#intoBitset` to speed up range queries on sparse doc values :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
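The leap-frog conjunction being discussed advances each sorted doc-ID stream past the other until they agree on a document. A minimal stand-alone sketch of that pattern is below, using sorted `int[]` posting lists in place of Lucene's `DocIdSetIterator`; the class is hypothetical and not the PR's implementation, which also window-bounds the scan and checks `TwoPhaseIterator#matches` on candidates.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of leap-frog intersection over two sorted, duplicate-free doc-ID lists.
class LeapFrogSketch {
  /** Returns the doc IDs present in both sorted lists. */
  static List<Integer> intersect(int[] lead, int[] follow) {
    List<Integer> matches = new ArrayList<>();
    int i = 0, j = 0;
    while (i < lead.length && j < follow.length) {
      if (lead[i] == follow[j]) {
        // Both streams agree on this doc: it is a conjunction match.
        matches.add(lead[i]);
        i++;
        j++;
      } else if (lead[i] < follow[j]) {
        i++; // advance the lagging stream
      } else {
        j++;
      }
    }
    return matches;
  }
}
```

The bitset alternative raised in the comment instead materializes one clause's matches for a window into a bitset and tests the other clauses' candidates against it, trading per-doc advances for bulk set operations.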
[I] Support modifying segmentInfos.counter in IndexWriter [lucene]
guojialiang92 opened a new issue, #14362: URL: https://github.com/apache/lucene/issues/14362

### Description

Can we support modifying `segmentInfos.counter` in `IndexWriter`? This can be used to skip some segment names when writing. In the scenario of enabling `segment replication` in an Elasticsearch cluster, this control can be implemented during primary shard peer recovery and replica shard promotion to avoid segment counter conflicts. Please help evaluate; if possible, I will submit a Pull Request.
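The control being requested amounts to letting a caller bump the segment name counter forward (never backward), so names already used by another copy of the index are skipped. A hypothetical sketch of that semantics is below; the class and method names are illustrative only, not existing Lucene API.

```java
// Hypothetical sketch of a monotonic counter advance, as the issue proposes.
// SegmentInfos.counter is what IndexWriter uses to derive segment names ("_0", "_1", ...).
class SegmentCounterSketch {
  private long counter;

  /** Advances the counter to at least {@code minValue}; never moves it backward. */
  synchronized long advanceTo(long minValue) {
    counter = Math.max(counter, minValue);
    return counter;
  }

  synchronized long getCounter() {
    return counter;
  }
}
```

Making the operation monotonic is what avoids conflicts during replica promotion: a newly promoted primary can advance past the highest segment name it has seen, without ever reusing a name.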
Re: [PR] Specialise DirectMonotonicReader when it only contains one block [lucene]
iverase closed pull request #14358: Specialise DirectMonotonicReader when it only contains one block URL: https://github.com/apache/lucene/pull/14358