Re: [PR] [Bug] Fix for stored fields force merge regression [lucene]

2025-05-16 Thread via GitHub


github-actions[bot] commented on PR #14512:
URL: https://github.com/apache/lucene/pull/14512#issuecomment-2887885025

   This PR has not had activity in the past 2 weeks, labeling it as stale. If 
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you 
for your contribution!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Speed up exhaustive evaluation. [lucene]

2025-05-16 Thread via GitHub


gf2121 commented on code in PR #14679:
URL: https://github.com/apache/lucene/pull/14679#discussion_r2093432298


##
lucene/core/src/java/org/apache/lucene/search/similarities/Similarity.java:
##
@@ -208,6 +208,25 @@ protected SimScorer() {}
  */
 public abstract float score(float freq, long norm);
 
+/**
+ * Batch-score documents. This method scores {@code size} documents at once. The default
+ * implementation can be found below:
+ *
+ * 
+ * for (int i = 0; i < size; ++i) {
+ *   scores[i] = score(freqs[i], norms[i]);
+ * }
+ * 
+ *
+ * @see #score(float, long)
+ * @lucene.internal
+ */
+public void score(int size, int[] freqs, long[] norms, float[] scores) {
+  for (int i = 0; i < size; ++i) {
+scores[i] = score(freqs[i], norms[i]);

Review Comment:
   > We may also be able to do a bit better than calling score in a loop
   
   Yeah! I played with `BM25` a bit and the results look promising:
   
   ```
   Benchmark                               Mode  Cnt   Score   Error   Units
   VectorizedBM25Benchmark.scoreBaseline  thrpt    5  10.991 ± 0.356  ops/us
   VectorizedBM25Benchmark.scoreVector    thrpt    5  15.149 ± 0.029  ops/us
   ```
   ```
   public static void scoreBaseline(int size, int[] freqs, long[] norms, float[] scores, float[] cache, int weight, float[] buffer) {
     for (int i = 0; i < size; ++i) {
       float normInverse = cache[((byte) norms[i]) & 0xFF];
       scores[i] = weight - weight / (1f + freqs[i] * normInverse);
     }
   }
   
   public static void scoreVector(int size, int[] freqs, long[] norms, float[] scores, float[] cache, int weight, float[] buffer) {
     for (int i = 0; i < size; ++i) {
       buffer[i] = cache[((byte) norms[i]) & 0xFF];
     }
     for (int i = 0; i < size; ++i) {
       scores[i] = weight - weight / (1f + freqs[i] * buffer[i]);
     }
   }
   ```
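
A minimal, self-contained sketch of the two variants quoted in this comment, to check that splitting the loop does not change the scores. The class name `BatchScoreSketch`, the method name `scoreTwoPass` (standing in for `scoreVector`) and the contents of `cache` are assumptions made for illustration; they are not taken from the PR. One plausible reading of the benchmark is that separating the table lookup from the arithmetic leaves a second loop that the JIT can auto-vectorize more easily, but the comment itself does not state the reason.

```java
import java.util.Arrays;

class BatchScoreSketch {
  // One pass: the table lookup and the arithmetic are interleaved per document.
  static void scoreBaseline(
      int size, int[] freqs, long[] norms, float[] scores, float[] cache, int weight) {
    for (int i = 0; i < size; ++i) {
      float normInverse = cache[((byte) norms[i]) & 0xFF];
      scores[i] = weight - weight / (1f + freqs[i] * normInverse);
    }
  }

  // Two passes: first gather the lookups into a buffer, then run pure arithmetic.
  static void scoreTwoPass(
      int size, int[] freqs, long[] norms, float[] scores, float[] cache, int weight,
      float[] buffer) {
    for (int i = 0; i < size; ++i) {
      buffer[i] = cache[((byte) norms[i]) & 0xFF];
    }
    for (int i = 0; i < size; ++i) {
      scores[i] = weight - weight / (1f + freqs[i] * buffer[i]);
    }
  }

  public static void main(String[] args) {
    // Hypothetical cache contents; the real values depend on the similarity in use.
    float[] cache = new float[256];
    for (int i = 0; i < cache.length; i++) {
      cache[i] = 1f / (1f + i);
    }
    int[] freqs = {1, 2, 3, 4};
    long[] norms = {0L, 10L, 200L, 255L};
    float[] a = new float[freqs.length];
    float[] b = new float[freqs.length];
    scoreBaseline(freqs.length, freqs, norms, a, cache, 2);
    scoreTwoPass(freqs.length, freqs, norms, b, cache, 2, new float[freqs.length]);
    System.out.println("same scores: " + Arrays.equals(a, b));
  }
}
```

Both methods perform the same floating-point operations in the same order for each document, so the outputs should match exactly; only the loop structure differs.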






Re: [PR] Speed up exhaustive evaluation. [lucene]

2025-05-16 Thread via GitHub


jpountz commented on code in PR #14679:
URL: https://github.com/apache/lucene/pull/14679#discussion_r2093664788


##
lucene/core/src/java/org/apache/lucene/search/similarities/Similarity.java:
##
@@ -208,6 +208,25 @@ protected SimScorer() {}
  */
 public abstract float score(float freq, long norm);
 
+/**
+ * Batch-score documents. This method scores {@code size} documents at once. The default
+ * implementation can be found below:
+ *
+ * 
+ * for (int i = 0; i < size; ++i) {
+ *   scores[i] = score(freqs[i], norms[i]);
+ * }
+ * 
+ *
+ * @see #score(float, long)
+ * @lucene.internal
+ */
+public void score(int size, int[] freqs, long[] norms, float[] scores) {
+  for (int i = 0; i < size; ++i) {
+scores[i] = score(freqs[i], norms[i]);

Review Comment:
   Exciting!






Re: [PR] Use per-segment K in filtered KNN fallback logic (fixes 14671) [lucene]

2025-05-16 Thread via GitHub


msokolov merged PR #14680:
URL: https://github.com/apache/lucene/pull/14680





Re: [I] Nightly benchark regression on pre-filtered vector search [lucene]

2025-05-16 Thread via GitHub


msokolov closed issue #14671: Nightly benchark regression on pre-filtered 
vector search
URL: https://github.com/apache/lucene/issues/14671





Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2025-05-16 Thread via GitHub


RKSPD commented on issue #12615:
URL: https://github.com/apache/lucene/issues/12615#issuecomment-2887741322

   > I am actually in the process of extending Lucene Codec for JVector DiskANN 
integration. Note this work is part of 
[opensearch-project/k-NN#2386](https://github.com/opensearch-project/k-NN/issues/2386)
 I can share my branch once it's ready and perhaps later can try and take a 
stab at bringing it back into Lucene as an optional KNN codec. Since jVector is 
a JVM library it would be ideal to have its DiskANN implementation supported as 
a Lucene codec to make it easier to integrate back into OpenSearch and other 
upstream dependencies.
   
   
   I'm working on a spinoff similar to OpenSearch's JVector codec as a 
standalone `KnnVectorsFormat` for Lucene. This would live in the sandbox module 
for now and would integrate with Lucene's existing vector APIs and codec SPI. 
I'll open that spinoff issue.





Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2025-05-16 Thread via GitHub


RKSPD commented on issue #12615:
URL: https://github.com/apache/lucene/issues/12615#issuecomment-2887805413

   > > I am actually in the process of extending Lucene Codec for JVector 
DiskANN integration. Note this work is part of 
[opensearch-project/k-NN#2386](https://github.com/opensearch-project/k-NN/issues/2386)
 I can share my branch once it's ready and perhaps later can try and take a 
stab at bringing it back into Lucene as an optional KNN codec. Since jVector is 
a JVM library it would be ideal to have its DiskANN implementation supported as 
a Lucene codec to make it easier to integrate back into OpenSearch and other 
upstream dependencies.
   > 
   > I'm working on a spinoff similar to OpenSearch's JVector codec as a 
standalone `KnnVectorsFormat` for Lucene. This would live in the sandbox module 
for now and would integrate with Lucene's existing vector APIs and codec SPI. 
I'll open that spinoff issue.
   
   Opened 
[#14681](https://github.com/apache/lucene/issues/14681#issue-3070014510); I 
would really appreciate feedback! Thank you :D





Re: [PR] Improve BytesRef creation from String [lucene]

2025-05-16 Thread via GitHub


github-actions[bot] commented on PR #14678:
URL: https://github.com/apache/lucene/pull/14678#issuecomment-2887286703

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog-check 
label to it and you will stop receiving this reminder on future updates to the 
PR.





Re: [PR] Improve BytesRef creation from String [lucene]

2025-05-16 Thread via GitHub


github-actions[bot] commented on PR #14678:
URL: https://github.com/apache/lucene/pull/14678#issuecomment-2887298479

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog-check 
label to it and you will stop receiving this reminder on future updates to the 
PR.





Re: [I] Nightly benchmark regression on 2025.05.01 [lucene]

2025-05-16 Thread via GitHub


rmuir commented on issue #14630:
URL: https://github.com/apache/lucene/issues/14630#issuecomment-2887482208

   Prime suspect: 
https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/commit/027f29fb4104bac71151c47ce637fb18579a4a36
   
   You may want to look at other changes to flags and such in the Arch package 
between 6.12 and 6.14, but I would bet a beer on that one.





Re: [PR] Improve BytesRef creation from String [lucene]

2025-05-16 Thread via GitHub


schlosna commented on PR #14678:
URL: https://github.com/apache/lucene/pull/14678#issuecomment-2887624646

   Added `BytesRefBenchmark`, comparing the existing `new 
BytesRef(CharSequence)` against `new BytesRef(String)`. It shows a roughly 2x 
to 6x throughput improvement on an AMD EPYC 7R13 processor:
   
   ```
   # AMD EPYC 7R13 Processor
   # JMH version: 1.37
   # VM version: JDK 24.0.1, OpenJDK 64-Bit Server VM, 24.0.1+9-FR
   # VM invoker: /home/dev/.gradle/jdks/amazon-corretto-24.0.1.9.1-linux-x64/bin/java
   # VM options: -XX:+UseParallelGC -XX:TieredStopAtLevel=1 -XX:ActiveProcessorCount=1 -Dfile.encoding=UTF-8 -Duser.country=US -Duser.language=en -Duser.variant
   # Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
   # Warmup: 3 iterations, 3 s each
   # Measurement: 5 iterations, 3 s each
   
                                                               (before)          (after)
                                                   bytesRefCharSequence   bytesRefString
   Benchmark          (length)      (type)   Mode  Cnt    Score   Error    Score   Error   Units
   BytesRefBenchmark        10       ASCII  thrpt    5   75.650 ± 4.078  180.382 ± 7.672  ops/us
   BytesRefBenchmark        10  ISO_8859_1  thrpt    5   66.688 ± 3.957  114.655 ± 7.510  ops/us
   BytesRefBenchmark        10   UTF_8_BMP  thrpt    5   46.678 ± 2.387   77.036 ± 2.484  ops/us
   BytesRefBenchmark        10      UTF_16  thrpt    5   26.289 ± 1.376   49.021 ± 1.740  ops/us
   BytesRefBenchmark       100       ASCII  thrpt    5    9.106 ± 0.487   47.246 ± 1.218  ops/us
   BytesRefBenchmark       100  ISO_8859_1  thrpt    5    8.447 ± 0.628   23.809 ± 0.488  ops/us
   BytesRefBenchmark       100   UTF_8_BMP  thrpt    5    5.129 ± 0.141   10.431 ± 0.336  ops/us
   BytesRefBenchmark       100      UTF_16  thrpt    5    2.963 ± 0.127    5.904 ± 0.271  ops/us
   BytesRefBenchmark      1000       ASCII  thrpt    5    0.917 ± 0.038    5.891 ± 0.428  ops/us
   BytesRefBenchmark      1000  ISO_8859_1  thrpt    5    0.830 ± 0.024    1.981 ± 0.077  ops/us
   BytesRefBenchmark      1000   UTF_8_BMP  thrpt    5    0.468 ± 0.014    0.950 ± 0.057  ops/us
   BytesRefBenchmark      1000      UTF_16  thrpt    5    0.298 ± 0.011    0.537 ± 0.021  ops/us
   ```
   
   





Re: [PR] Use per-segment K in filtered KNN fallback logic (fixes 14671) [lucene]

2025-05-16 Thread via GitHub


github-actions[bot] commented on PR #14680:
URL: https://github.com/apache/lucene/pull/14680#issuecomment-2887632883

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog-check 
label to it and you will stop receiving this reminder on future updates to the 
PR.





[PR] Use per-segment K in filtered KNN fallback logic (fixes 14671) [lucene]

2025-05-16 Thread via GitHub


msokolov opened a new pull request, #14680:
URL: https://github.com/apache/lucene/pull/14680

   Originally (before optimistic KNN query):
   
   ```
   recall  latency(ms)  nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  selectivity  filterType  vec_disk(MB)  vec_RAM(MB)  indexType
    0.693        7.374   100   100      50       16         50         no     3753         0.05  pre-filter         0.000        0.000       HNSW
    0.705        7.752   100   100     100       16         50         no     4680         0.05  pre-filter         0.000        0.000       HNSW
    0.764       13.086   100   100     250       16         50         no     6307         0.05  pre-filter         0.000        0.000       HNSW
   ```
   
   Mainline (filtering broken):
   
   ```
   recall  latency(ms)  nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
    0.894       31.510   100   100      50       16         50         no      0.00      Infinity             0            0.00         0.000        0.000       HNSW
    1.000       46.757   100   100     100       16         50         no      0.00      Infinity             0            0.00         0.000        0.000       HNSW
    1.000       52.029   100   100     250       16         50         no      0.00      Infinity             0            0.00         0.000        0.000       HNSW
   ```
   
   With this fix:
   
   ```
   recall  latency(ms)  nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  selectivity  filterType  vec_disk(MB)  vec_RAM(MB)  indexType
    0.734        9.018   100   100      50       16         50         no     8951         0.05  pre-filter         0.000        0.000       HNSW
    0.742       10.259   100   100     100       16         50         no     9857         0.05  pre-filter         0.000        0.000       HNSW
    0.771       10.981   100   100     250       16         50         no    12117         0.05  pre-filter         0.000        0.000       HNSW
   ```





Re: [I] Nightly benchmark regression on 2025.05.01 [lucene]

2025-05-16 Thread via GitHub


uschindler commented on issue #14630:
URL: https://github.com/apache/lucene/issues/14630#issuecomment-2886130445

   > I'm also not liking these unrelated hardware errors!! But they are 
pre-existing for a long time now...
   
   This happens more often on modern hardware. You often get correctable PCIe 
checksum errors when the NVMe is heavily used. According to the hardware 
builder, this is no reason to complain, because the kernel is working as 
expected (repeating the PCIe request or correcting the CRC error). They say it 
is only a reason to change the board or other hardware if you get more than a 
few per minute.
   
   The problem is more the verbose error logging of the kernel. It should be 
reduced because it alarms users about things that aren't important.





Re: [I] Nightly benchmark regression on 2025.05.01 [lucene]

2025-05-16 Thread via GitHub


uschindler commented on issue #14630:
URL: https://github.com/apache/lucene/issues/14630#issuecomment-2886118006

   > Hmm downgrading to Java 23 is not so simple ... I got it installed, 
cutover benchy's N places to use Java 23, but then Lucene's `main` insists in 
at least two places that I'm running Java 24. Could I temporarily turn off this 
check? Or is there a real/subtle reason why Java 23 will not work anymore 
(besides that it is officially EOL'd)?
   
   You can't compile the expressions module anymore. The change @jpountz 
mentions would not prevent you from building.
   
   The biggest problem is the vector code, which was already updated to Java 24.





[I] TestStressNRTReplication may never terminate (exceed suite timeout) [lucene]

2025-05-16 Thread via GitHub


dweiss opened a new issue, #14664:
URL: https://github.com/apache/lucene/issues/14664

   ### Description
   
   It hangs in the 'restarter' thread on this condition:
   ```
   while (startupThreads.size() > 0) {
     Thread.sleep(10);
   }
   ```
   The main thread just joins the restarter and never ends. 
   
   The bug is easy to reproduce if you add a sleep before 
`startupThreads.add(t)`. That call is currently invoked after the sub-thread 
has been started, so the sub-thread can remove itself from the `startupThreads` 
list before it has been added to it. The late add then leaves a stale entry 
behind, and the list never becomes empty.
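
The ordering issue described above can be sketched in isolation. This is not the code from `TestStressNRTReplication`; the class `StartupRegistrySketch`, the helpers `launch` and `launchAndDrain`, and the use of a synchronized `HashSet` are assumptions for illustration. The point it shows is that registering the thread before calling `start()` guarantees that the thread's own `remove` can only run after the `add`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class StartupRegistrySketch {
  // Assumption: a synchronized set stands in for the test's startupThreads collection.
  static final Set<Thread> startupThreads = Collections.synchronizedSet(new HashSet<>());

  // Hypothetical helper: wraps the work so the thread deregisters itself when done.
  static Thread launch(Runnable work) {
    Thread t =
        new Thread(
            () -> {
              try {
                work.run();
              } finally {
                startupThreads.remove(Thread.currentThread());
              }
            });
    startupThreads.add(t); // register first ...
    t.start(); // ... then start, so remove() can never run before add()
    return t;
  }

  // Starts n short-lived threads, waits for them, and reports whether the registry is empty.
  static boolean launchAndDrain(int n) {
    List<Thread> threads = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      threads.add(launch(() -> {}));
    }
    try {
      for (Thread t : threads) {
        t.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return startupThreads.isEmpty();
  }

  public static void main(String[] args) {
    System.out.println("registry drained: " + launchAndDrain(200));
  }
}
```

Swapping the `add` and `start` lines reintroduces the window in which a fast thread finishes and removes itself before it has been registered, which is the hang described in this issue.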
   
   ### Version and environment details
   
   _No response_





Re: [PR] Fix termination condition in TestStressNRTReplication. [lucene]

2025-05-16 Thread via GitHub


github-actions[bot] commented on PR #14665:
URL: https://github.com/apache/lucene/pull/14665#issuecomment-2879192882

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog-check 
label to it and you will stop receiving this reminder on future updates to the 
PR.





Re: [PR] Fix termination condition in TestStressNRTReplication. [lucene]

2025-05-16 Thread via GitHub


dweiss commented on code in PR #14665:
URL: https://github.com/apache/lucene/pull/14665#discussion_r2088329682


##
lucene/replicator/src/test/org/apache/lucene/replicator/nrt/TestStressNRTReplication.java:
##
@@ -994,26 +998,26 @@ public void run() {
 } finally {
   starting[idx] = false;
   startupThreads.remove(Thread.currentThread());
+  message("N" + idx + ": top: removed thread");
 }
   }
 };
+startupThreads.add(t);
 t.setName("start R" + idx);
 t.start();
-startupThreads.add(t);
   }

Review Comment:
   This is the actual fix.






Re: [I] Nightly benchmark regression on 2025.05.01 [lucene]

2025-05-16 Thread via GitHub


mikemccand commented on issue #14630:
URL: https://github.com/apache/lucene/issues/14630#issuecomment-2887249181

   Thanks @uschindler!  I was able to get @jpountz's idea to work -- it ran in 
last night's run (2025-05-15) and it looks to me like Java 23 -> 24 was not 
responsible for the slowdown!  
[`VectorSearch`](https://benchmarks.mikemccandless.com/VectorSearch.html) is 
still slow on the last data point, and same for 
[`CountAndHighHigh`](https://benchmarks.mikemccandless.com/CountAndHighHigh.html).
  This is on Lucene's `main` 
https://github.com/apache/lucene/commit/10c31217a91563ce42a0ab26a677738c181a7b63.
   
   I think this is sort of good news -- Java upgrade isn't to blame.
   
   But then it leaves the sinister question of what was to blame ... lemme get 
the full list of changed packages on that day.





Re: [PR] Override ValueSource.FromDoubleValuesSource.getSortField [lucene]

2025-05-16 Thread via GitHub


dsmiley merged PR #14654:
URL: https://github.com/apache/lucene/pull/14654





Re: [PR] Improve BytesRef creation from String [lucene]

2025-05-16 Thread via GitHub


vigyasharma commented on code in PR #14678:
URL: https://github.com/apache/lucene/pull/14678#discussion_r2093976930


##
lucene/CHANGES.txt:
##
@@ -41,6 +41,7 @@ Optimizations
 -
 * GITHUB#14011: Reduce allocation rate in HNSW concurrent merge. (Viliam 
Durina)
 * GITHUB#14022: Optimize DFS marking of connected components in HNSW by 
reducing stack depth, improving performance and reducing allocations. 
(Viswanath Kuchibhotla)
+* GITHUB#14678: Optimize BytesRef creation from Strings, improving throughput 
and reducing allocations. (David Schlosnagle)
 

Review Comment:
   This could go in 10.3, any reason to keep this for 11.0?






Re: [PR] Added toString() method to BytesRefBuilder [lucene]

2025-05-16 Thread via GitHub


vigyasharma commented on code in PR #14676:
URL: https://github.com/apache/lucene/pull/14676#discussion_r2093986452


##
lucene/CHANGES.txt:
##
@@ -49,6 +49,9 @@ Bug Fixes
 
 * GITHUB#14075: Remove duplicate and add missing entry on brazilian portuguese 
stopwords list. (Arthur Caccavo)
 
+* GITHUB#14161: PointInSetQuery's constructor now throws 
IllegalArgumentException

Review Comment:
   We can put this in 10.3, doesn't need to wait for 11.0



##
lucene/core/src/test/org/apache/lucene/search/TestPointQueries.java:
##
@@ -2599,4 +2599,33 @@ public void 
testPointInSetQuerySkipsNonMatchingSegments() throws IOException {
 w.close();
 dir.close();
   }
+
+  public void testOutOfOrderValuesInPointInSetQuery() throws Exception {

Review Comment:
   Thanks for adding this test!



##
lucene/core/src/java/org/apache/lucene/util/BytesRefBuilder.java:
##
@@ -171,4 +171,9 @@ public boolean equals(Object obj) {
   public int hashCode() {
 throw new UnsupportedOperationException();
   }
+
+  @Override
+  public String toString() {
+return this.get().toString();

Review Comment:
   This will return hex-encoded bytes. Is that okay for the exception message? 
Should we use `utf8ToString` instead?
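
To make the trade-off in this question concrete, here is a standard-library sketch of the two renderings. It does not call Lucene: `toHex` only imitates a bracketed, space-separated hex dump (the exact format produced by `BytesRef.toString()` is an assumption here), and `toUtf8` imitates decoding the bytes as UTF-8. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class BytesToStringSketch {
  // Renders each byte as unsigned hex, e.g. "[6c 75]". Imitation only, not Lucene code.
  static String toHex(byte[] bytes) {
    StringBuilder sb = new StringBuilder("[");
    for (int i = 0; i < bytes.length; i++) {
      if (i > 0) {
        sb.append(' ');
      }
      sb.append(Integer.toHexString(bytes[i] & 0xFF));
    }
    return sb.append(']').toString();
  }

  // Decodes the bytes as UTF-8 text; only readable if the bytes really are text.
  static String toUtf8(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] term = "lucene".getBytes(StandardCharsets.UTF_8);
    System.out.println(toHex(term)); // [6c 75 63 65 6e 65]
    System.out.println(toUtf8(term)); // lucene

    // Binary point values, like the single bytes used in the test in this PR,
    // decode to control characters, so the hex form may be the more useful one there.
    byte[] point = {2};
    System.out.println(toHex(point)); // [2]
  }
}
```

Which form is better for the exception message likely depends on whether the builder usually holds text or encoded binary values.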



##
lucene/core/src/test/org/apache/lucene/search/TestPointQueries.java:
##
@@ -2599,4 +2599,33 @@ public void 
testPointInSetQuerySkipsNonMatchingSegments() throws IOException {
 w.close();
 dir.close();
   }
+
+  public void testOutOfOrderValuesInPointInSetQuery() throws Exception {
+IllegalArgumentException expected =
+expectThrows(
+IllegalArgumentException.class,
+() -> {
+  new PointInSetQuery(
+  "foo",
+  1,
+  1,
+  new PointInSetQuery.Stream() {
+private final BytesRef[] values = {
+  newBytesRef(new byte[] {2}), newBytesRef(new byte[] {1}) 
// out of order
+};
+int index = 0;
+
+@Override
+public BytesRef next() {
+  return index < values.length ? values[index++] : null;
+}
+  }) {
+@Override
+protected String toString(byte[] point) {

Review Comment:
   Why did we need this override?


