Re: [PR] [Bug] Fix for stored fields force merge regression [lucene]
github-actions[bot] commented on PR #14512: URL: https://github.com/apache/lucene/pull/14512#issuecomment-2887885025 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contribution! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Speed up exhaustive evaluation. [lucene]
gf2121 commented on code in PR #14679: URL: https://github.com/apache/lucene/pull/14679#discussion_r2093432298

## lucene/core/src/java/org/apache/lucene/search/similarities/Similarity.java: ##
@@ -208,6 +208,25 @@ protected SimScorer() {}
      */
     public abstract float score(float freq, long norm);
 
+    /**
+     * Batch-score documents. This method scores {@code size} documents at once. The default
+     * implementation can be found below:
+     *
+     *
+     * for (int i = 0; i < size; ++i) {
+     *   scores[i] = score(freqs[i], norms[i]);
+     * }
+     *
+     *
+     * @see #score(float, long)
+     * @lucene.internal
+     */
+    public void score(int size, int[] freqs, long[] norms, float[] scores) {
+      for (int i = 0; i < size; ++i) {
+        scores[i] = score(freqs[i], norms[i]);

Review Comment:
> We may also be able to do a bit better than calling score in a loop

Yeah! I played with `BM25` a bit and the result looks promising:

```
Benchmark                              Mode   Cnt   Score    Error   Units
VectorizedBM25Benchmark.scoreBaseline  thrpt    5  10.991 ± 0.356  ops/us
VectorizedBM25Benchmark.scoreVector    thrpt    5  15.149 ± 0.029  ops/us
```

```
public static void scoreBaseline(int size, int[] freqs, long[] norms, float[] scores, float[] cache, int weight, float[] buffer) {
  for (int i = 0; i < size; ++i) {
    float normInverse = cache[((byte) norms[i]) & 0xFF];
    scores[i] = weight - weight / (1f + freqs[i] * normInverse);
  }
}

public static void scoreVector(int size, int[] freqs, long[] norms, float[] scores, float[] cache, int weight, float[] buffer) {
  for (int i = 0; i < size; ++i) {
    buffer[i] = cache[((byte) norms[i]) & 0xFF];
  }
  for (int i = 0; i < size; ++i) {
    scores[i] = weight - weight / (1f + freqs[i] * buffer[i]);
  }
}
```
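A minimal, self-contained sketch of the loop-splitting idea in the comment above, for readers who want to check that the two variants agree. The class name `BatchScoreSketch`, the method name `scoreSplit`, and the contents of the 256-entry `cache` are assumptions made for illustration; this is not the code from the PR. Whether the second loop is actually auto-vectorized depends on the JIT and the hardware.

```java
import java.util.Arrays;

public class BatchScoreSketch {

  // Single loop: the table lookup and the arithmetic are interleaved.
  static void scoreBaseline(
      int size, int[] freqs, long[] norms, float[] scores, float[] cache, int weight) {
    for (int i = 0; i < size; ++i) {
      float normInverse = cache[((byte) norms[i]) & 0xFF];
      scores[i] = weight - weight / (1f + freqs[i] * normInverse);
    }
  }

  // Two loops: first gather the lookups into a buffer, then run a pure
  // arithmetic loop over plain arrays.
  static void scoreSplit(
      int size, int[] freqs, long[] norms, float[] scores, float[] cache, int weight,
      float[] buffer) {
    for (int i = 0; i < size; ++i) {
      buffer[i] = cache[((byte) norms[i]) & 0xFF];
    }
    for (int i = 0; i < size; ++i) {
      scores[i] = weight - weight / (1f + freqs[i] * buffer[i]);
    }
  }

  public static void main(String[] args) {
    // Hypothetical lookup table; the real one is derived from the similarity's parameters.
    float[] cache = new float[256];
    for (int i = 0; i < cache.length; i++) {
      cache[i] = 1f / (1f + i);
    }
    int[] freqs = {1, 2, 3, 7};
    long[] norms = {0L, 17L, 255L, 300L};
    float[] a = new float[freqs.length];
    float[] b = new float[freqs.length];

    scoreBaseline(freqs.length, freqs, norms, a, cache, 2);
    scoreSplit(freqs.length, freqs, norms, b, cache, 2, new float[freqs.length]);

    // Both variants perform the same float operations per element.
    System.out.println(Arrays.equals(a, b));
  }
}
```

The split version trades one extra pass over a small buffer for a second loop that contains no indirect loads, which is the shape a compiler can vectorize most easily.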
Re: [PR] Speed up exhaustive evaluation. [lucene]
jpountz commented on code in PR #14679: URL: https://github.com/apache/lucene/pull/14679#discussion_r2093664788

## lucene/core/src/java/org/apache/lucene/search/similarities/Similarity.java: ##
@@ -208,6 +208,25 @@ protected SimScorer() {}
      */
     public abstract float score(float freq, long norm);
 
+    /**
+     * Batch-score documents. This method scores {@code size} documents at once. The default
+     * implementation can be found below:
+     *
+     *
+     * for (int i = 0; i < size; ++i) {
+     *   scores[i] = score(freqs[i], norms[i]);
+     * }
+     *
+     *
+     * @see #score(float, long)
+     * @lucene.internal
+     */
+    public void score(int size, int[] freqs, long[] norms, float[] scores) {
+      for (int i = 0; i < size; ++i) {
+        scores[i] = score(freqs[i], norms[i]);

Review Comment:
Exciting!
Re: [PR] Use per-segment K in filtered KNN fallback logic (fixes 14671) [lucene]
msokolov merged PR #14680: URL: https://github.com/apache/lucene/pull/14680
Re: [I] Nightly benchmark regression on pre-filtered vector search [lucene]
msokolov closed issue #14671: Nightly benchmark regression on pre-filtered vector search URL: https://github.com/apache/lucene/issues/14671
Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]
RKSPD commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-2887741322

> I am actually in the process of extending Lucene Codec for JVector DiskANN integration. Note this work is part of [opensearch-project/k-NN#2386](https://github.com/opensearch-project/k-NN/issues/2386) I can share my branch once it's ready and perhaps later can try and take a stab at bringing it back into Lucene as an optional KNN codec. Since jVector is a JVM library it would be ideal to have its DiskANN implementation supported as a Lucene codec to make it easier to integrate back into OpenSearch and other upstream dependencies.

I'm working on a spinoff similar to OpenSearch's JVector codec as a standalone `KnnVectorsFormat` for Lucene. This would live in the sandbox module for now and would integrate with Lucene's existing vector APIs and codec SPI. I'll open that spinoff issue.
Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]
RKSPD commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-2887805413

> > I am actually in the process of extending Lucene Codec for JVector DiskANN integration. Note this work is part of [opensearch-project/k-NN#2386](https://github.com/opensearch-project/k-NN/issues/2386) I can share my branch once it's ready and perhaps later can try and take a stab at bringing it back into Lucene as an optional KNN codec. Since jVector is a JVM library it would be ideal to have its DiskANN implementation supported as a Lucene codec to make it easier to integrate back into OpenSearch and other upstream dependencies.
>
> I'm working on a spinoff similar to OpenSearch's JVector codec as a standalone `KnnVectorsFormat` for Lucene. This would live in the sandbox module for now and would integrate with Lucene's existing vector APIs and codec SPI. I'll open that spinoff issue.

Opened [#14681](https://github.com/apache/lucene/issues/14681#issue-3070014510), would really appreciate feedback! Thank you :D
Re: [PR] Improve BytesRef creation from String [lucene]
github-actions[bot] commented on PR #14678: URL: https://github.com/apache/lucene/pull/14678#issuecomment-2887286703 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR.
Re: [PR] Improve BytesRef creation from String [lucene]
github-actions[bot] commented on PR #14678: URL: https://github.com/apache/lucene/pull/14678#issuecomment-2887298479 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR.
Re: [I] Nightly benchmark regression on 2025.05.01 [lucene]
rmuir commented on issue #14630: URL: https://github.com/apache/lucene/issues/14630#issuecomment-2887482208

Prime suspect: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/commit/027f29fb4104bac71151c47ce637fb18579a4a36

You may look at other changes to flags and such on the Arch package between 6.12 and 6.14, but I would bet a beer on that one.
Re: [PR] Improve BytesRef creation from String [lucene]
schlosna commented on PR #14678: URL: https://github.com/apache/lucene/pull/14678#issuecomment-2887624646

Added `BytesRefBenchmark` comparing the existing `new BytesRef(CharSequence)` against `new BytesRef(String)`, demonstrating roughly 2x to 6x throughput improvement on an AMD EPYC 7R13 Processor:

```
# AMD EPYC 7R13 Processor
# JMH version: 1.37
# VM version: JDK 24.0.1, OpenJDK 64-Bit Server VM, 24.0.1+9-FR
# VM invoker: /home/dev/.gradle/jdks/amazon-corretto-24.0.1.9.1-linux-x64/bin/java
# VM options: -XX:+UseParallelGC -XX:TieredStopAtLevel=1 -XX:ActiveProcessorCount=1 -Dfile.encoding=UTF-8 -Duser.country=US -Duser.language=en -Duser.variant
# Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 3 iterations, 3 s each
# Measurement: 5 iterations, 3 s each

                                                      (before)              (after)
                                                      bytesRefCharSequence  bytesRefString
Benchmark          (length)      (type)   Mode  Cnt    Score    Error        Score    Error   Units
BytesRefBenchmark        10       ASCII  thrpt    5   75.650 ±  4.078      180.382 ±  7.672  ops/us
BytesRefBenchmark        10  ISO_8859_1  thrpt    5   66.688 ±  3.957      114.655 ±  7.510  ops/us
BytesRefBenchmark        10   UTF_8_BMP  thrpt    5   46.678 ±  2.387       77.036 ±  2.484  ops/us
BytesRefBenchmark        10      UTF_16  thrpt    5   26.289 ±  1.376       49.021 ±  1.740  ops/us
BytesRefBenchmark       100       ASCII  thrpt    5    9.106 ±  0.487       47.246 ±  1.218  ops/us
BytesRefBenchmark       100  ISO_8859_1  thrpt    5    8.447 ±  0.628       23.809 ±  0.488  ops/us
BytesRefBenchmark       100   UTF_8_BMP  thrpt    5    5.129 ±  0.141       10.431 ±  0.336  ops/us
BytesRefBenchmark       100      UTF_16  thrpt    5    2.963 ±  0.127        5.904 ±  0.271  ops/us
BytesRefBenchmark      1000       ASCII  thrpt    5    0.917 ±  0.038        5.891 ±  0.428  ops/us
BytesRefBenchmark      1000  ISO_8859_1  thrpt    5    0.830 ±  0.024        1.981 ±  0.077  ops/us
BytesRefBenchmark      1000   UTF_8_BMP  thrpt    5    0.468 ±  0.014        0.950 ±  0.057  ops/us
BytesRefBenchmark      1000      UTF_16  thrpt    5    0.298 ±  0.011        0.537 ±  0.021  ops/us
```
Re: [PR] Use per-segment K in filtered KNN fallback logic (fixes 14671) [lucene]
github-actions[bot] commented on PR #14680: URL: https://github.com/apache/lucene/pull/14680#issuecomment-2887632883 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR.
[PR] Use per-segment K in filtered KNN fallback logic (fixes 14671) [lucene]
msokolov opened a new pull request, #14680: URL: https://github.com/apache/lucene/pull/14680

Originally (before optimistic KNN query):

```
recall  latency(ms)  nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  selectivity  filterType  vec_disk(MB)  vec_RAM(MB)  indexType
 0.693        7.374   100   100      50       16         50         no     3753         0.05  pre-filter         0.000        0.000       HNSW
 0.705        7.752   100   100     100       16         50         no     4680         0.05  pre-filter         0.000        0.000       HNSW
 0.764       13.086   100   100     250       16         50         no     6307         0.05  pre-filter         0.000        0.000       HNSW
```

Mainline (filtering broken):

```
recall  latency(ms)  nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.894       31.510   100   100      50       16         50         no      0.00      Infinity             0            0.00         0.000        0.000       HNSW
 1.000       46.757   100   100     100       16         50         no      0.00      Infinity             0            0.00         0.000        0.000       HNSW
 1.000       52.029   100   100     250       16         50         no      0.00      Infinity             0            0.00         0.000        0.000       HNSW
```

With this fix:

```
recall  latency(ms)  nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  selectivity  filterType  vec_disk(MB)  vec_RAM(MB)  indexType
 0.734        9.018   100   100      50       16         50         no     8951         0.05  pre-filter         0.000        0.000       HNSW
 0.742       10.259   100   100     100       16         50         no     9857         0.05  pre-filter         0.000        0.000       HNSW
 0.771       10.981   100   100     250       16         50         no    12117         0.05  pre-filter         0.000        0.000       HNSW
```
Re: [I] Nightly benchmark regression on 2025.05.01 [lucene]
uschindler commented on issue #14630: URL: https://github.com/apache/lucene/issues/14630#issuecomment-2886130445

> I'm also not liking these unrelated hardware errors!! But they are pre-existing for a long time now...

This happens more often on modern hardware. You often get PCIe checksum errors, which can be corrected, when the NVMe is heavily used. According to the hardware builder this is not a reason to complain, because the kernel is working as expected (repeating the PCIe request or correcting the CRC error). They say it's only a reason to change the board or other hardware if you get more than a few per minute. The problem is more the verbose error logging of the kernel. It should be reduced because it alarms users about things which aren't important.
Re: [I] Nightly benchmark regression on 2025.05.01 [lucene]
uschindler commented on issue #14630: URL: https://github.com/apache/lucene/issues/14630#issuecomment-2886118006

> Hmm downgrading to Java 23 is not so simple ... I got it installed, cutover benchy's N places to use Java 23, but then Lucene's `main` insists in at least two places that I'm running Java 24. Could I temporarily turn off this check? Or is there a real/subtle reason why Java 23 will not work anymore (besides that it is officially EOL'd)?

You can't compile the expressions module anymore. The change @jpountz mentions would not prevent you from building. The biggest problem is the vector code, which was already updated to Java 24.
[I] TestStressNRTReplication may never terminate (exceed suite timeout) [lucene]
dweiss opened a new issue, #14664: URL: https://github.com/apache/lucene/issues/14664

### Description

It hangs in the 'restarter' thread on this condition:

```
while (startupThreads.size() > 0) {
  Thread.sleep(10);
}
```

The main thread just joins the restarter and never ends.

The bug is easy to show if you add a sleep before `startupThreads.add(t);`. That call is currently made after the sub-thread has been started, which means the sub-thread can remove itself from the `startupThreads` list before it has been added to it. The list then keeps a stale entry and never becomes empty.

### Version and environment details

_No response_
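The ordering problem described in the issue can be reproduced deterministically outside the test. The sketch below is an illustration only: the class `StartupRegistrationSketch` and its `run` method are hypothetical names, and a `join()` stands in for the "sleep before `startupThreads.add(t)`" mentioned above so the outcome does not depend on timing.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class StartupRegistrationSketch {

  /** Returns how many threads are still registered after the worker has finished. */
  static int run(boolean registerBeforeStart) throws InterruptedException {
    Set<Thread> startupThreads = Collections.synchronizedSet(new HashSet<>());

    // The worker deregisters itself when it is done, like the finally block in the test.
    Thread t = new Thread(() -> startupThreads.remove(Thread.currentThread()));

    if (registerBeforeStart) {
      startupThreads.add(t); // register first, so the worker's remove() always finds it
    }
    t.start();
    t.join(); // worst case for the late registration: the worker has already finished
    if (!registerBeforeStart) {
      startupThreads.add(t); // too late: the entry is never removed again
    }
    return startupThreads.size();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("register after start:  " + run(false)); // prints 1 (stale entry)
    System.out.println("register before start: " + run(true));  // prints 0
  }
}
```

With the late registration, a loop such as `while (startupThreads.size() > 0)` would spin forever, which matches the hang reported here.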
Re: [PR] Fix termination condition in TestStressNRTReplication. [lucene]
github-actions[bot] commented on PR #14665: URL: https://github.com/apache/lucene/pull/14665#issuecomment-2879192882 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR.
Re: [PR] Fix termination condition in TestStressNRTReplication. [lucene]
dweiss commented on code in PR #14665: URL: https://github.com/apache/lucene/pull/14665#discussion_r2088329682

## lucene/replicator/src/test/org/apache/lucene/replicator/nrt/TestStressNRTReplication.java: ##
@@ -994,26 +998,26 @@ public void run() {
             } finally {
               starting[idx] = false;
               startupThreads.remove(Thread.currentThread());
+              message("N" + idx + ": top: removed thread");
             }
           }
         };
+    startupThreads.add(t);
     t.setName("start R" + idx);
     t.start();
-    startupThreads.add(t);
   }

Review Comment:
This is the actual fix.
Re: [I] Nightly benchmark regression on 2025.05.01 [lucene]
mikemccand commented on issue #14630: URL: https://github.com/apache/lucene/issues/14630#issuecomment-2887249181

Thanks @uschindler! I was able to get @jpountz's idea to work -- it ran in last night's run (2025-05-15) and it looks to me like Java 23 -> 24 was not responsible for the slowdown! [`VectorSearch`](https://benchmarks.mikemccandless.com/VectorSearch.html) is still slow on the last data point, and same for [`CountAndHighHigh`](https://benchmarks.mikemccandless.com/CountAndHighHigh.html). This is on Lucene's `main` https://github.com/apache/lucene/commit/10c31217a91563ce42a0ab26a677738c181a7b63.

I think this is sort of good news -- the Java upgrade isn't to blame. But then it leaves the sinister question of what was to blame ... lemme get the full list of changed packages on that day.
Re: [PR] Override ValueSource.FromDoubleValuesSource.getSortField [lucene]
dsmiley merged PR #14654: URL: https://github.com/apache/lucene/pull/14654
Re: [PR] Improve BytesRef creation from String [lucene]
vigyasharma commented on code in PR #14678: URL: https://github.com/apache/lucene/pull/14678#discussion_r2093976930

## lucene/CHANGES.txt: ##
@@ -41,6 +41,7 @@ Optimizations
 ---------------------
 * GITHUB#14011: Reduce allocation rate in HNSW concurrent merge. (Viliam Durina)
 * GITHUB#14022: Optimize DFS marking of connected components in HNSW by reducing stack depth, improving performance and reducing allocations. (Viswanath Kuchibhotla)
+* GITHUB#14678: Optimize BytesRef creation from Strings, improving throughput and reducing allocations. (David Schlosnagle)

Review Comment:
This could go in 10.3, any reason to keep this for 11.0?
Re: [PR] Added toString() method to BytesRefBuilder [lucene]
vigyasharma commented on code in PR #14676: URL: https://github.com/apache/lucene/pull/14676#discussion_r2093986452

## lucene/CHANGES.txt: ##
@@ -49,6 +49,9 @@ Bug Fixes
 * GITHUB#14075: Remove duplicate and add missing entry on brazilian portuguese stopwords list. (Arthur Caccavo)
+* GITHUB#14161: PointInSetQuery's constructor now throws IllegalArgumentException

Review Comment:
We can put this in 10.3, doesn't need to wait for 11.0

## lucene/core/src/test/org/apache/lucene/search/TestPointQueries.java: ##
@@ -2599,4 +2599,33 @@ public void testPointInSetQuerySkipsNonMatchingSegments() throws IOException {
     w.close();
     dir.close();
   }
+
+  public void testOutOfOrderValuesInPointInSetQuery() throws Exception {

Review Comment:
Thanks for adding this test!

## lucene/core/src/java/org/apache/lucene/util/BytesRefBuilder.java: ##
@@ -171,4 +171,9 @@ public boolean equals(Object obj) {
   public int hashCode() {
     throw new UnsupportedOperationException();
   }
+
+  @Override
+  public String toString() {
+    return this.get().toString();

Review Comment:
This will return hex encoded bytes. Is that okay for the exception message? Should we use `utf8ToString` instead?

## lucene/core/src/test/org/apache/lucene/search/TestPointQueries.java: ##
@@ -2599,4 +2599,33 @@ public void testPointInSetQuerySkipsNonMatchingSegments() throws IOException {
     w.close();
     dir.close();
   }
+
+  public void testOutOfOrderValuesInPointInSetQuery() throws Exception {
+    IllegalArgumentException expected =
+        expectThrows(
+            IllegalArgumentException.class,
+            () -> {
+              new PointInSetQuery(
+                  "foo",
+                  1,
+                  1,
+                  new PointInSetQuery.Stream() {
+                    private final BytesRef[] values = {
+                      newBytesRef(new byte[] {2}), newBytesRef(new byte[] {1}) // out of order
+                    };
+                    int index = 0;
+
+                    @Override
+                    public BytesRef next() {
+                      return index < values.length ? values[index++] : null;
+                    }
+                  }) {
+                @Override
+                protected String toString(byte[] point) {

Review Comment:
Why did we need this override?
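To make the hex-versus-UTF-8 question in the review above concrete, here is a small plain-Java sketch that does not depend on Lucene. The helpers `hex` and `utf8` are hypothetical stand-ins: `hex` imitates the bracketed, space-separated hex format that `BytesRef.toString()` produces as far as I recall, and `utf8` simply decodes the bytes as UTF-8 text, which is an assumption about what `utf8ToString` amounts to for valid input.

```java
import java.nio.charset.StandardCharsets;

public class HexVsUtf8Sketch {

  // Hypothetical stand-in for a hex rendering such as "[66 6f 6f]".
  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder("[");
    for (int i = 0; i < bytes.length; i++) {
      if (i > 0) {
        sb.append(' ');
      }
      sb.append(Integer.toHexString(bytes[i] & 0xff));
    }
    return sb.append(']').toString();
  }

  // Hypothetical stand-in for decoding the same bytes as UTF-8 text.
  static String utf8(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] text = "foo".getBytes(StandardCharsets.UTF_8);
    System.out.println(hex(text));  // prints [66 6f 6f]
    System.out.println(utf8(text)); // prints foo

    // A binary value like the {2} used in the test above is not readable text.
    byte[] binary = {2};
    System.out.println(hex(binary));           // prints [2]
    System.out.println(utf8(binary).length()); // prints 1 (a single control character)
  }
}
```

For textual terms the decoded form reads better in an exception message, while for binary-encoded values the hex form is the one that stays legible, so the right choice depends on what the builder usually holds.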