[PR] [9_10] Mark TermInSetQuery ctors with varargs terms as deprecated [lucene]

2023-12-01 Thread via GitHub
slow-J opened a new pull request, #12864: URL: https://github.com/apache/lucene/pull/12864 Issue: https://github.com/apache/lucene/issues/12243. The deprecated items here are being removed in the `main` branch through PR https://github.com/apache/lucene/pull/12837. For methods calling

[PR] Fix intermittently failing TestParallelLeafReader [lucene]

2023-12-01 Thread via GitHub
ChrisHegarty opened a new pull request, #12865: URL: https://github.com/apache/lucene/pull/12865 This commit fixes the intermittently failing TestParallelLeafReader. The ParallelLeafReader requires the document order to be consistent across indexes - each document contains the union o

[PR] Prevent extra similarity computation for single-level graphs [lucene]

2023-12-01 Thread via GitHub
kaivalnp opened a new pull request, #12866: URL: https://github.com/apache/lucene/pull/12866 ### Description [`#findBestEntryPoint`](https://github.com/apache/lucene/blob/4bc7850465dfac9dc0638d9ee782007883869ffe/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java#L
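The PR title says an extra similarity computation happens for single-level graphs. The Java sketch below shows one way such a short-circuit could look, under stated assumptions: `LayeredGraph`, `EntryPointSketch`, and the greedy descent are hypothetical simplifications, not `HnswGraphSearcher`'s actual code. With only one level there is nothing to descend through, so scoring the entry node up front is presumably redundant, because the level-0 search scores it anyway.

```java
import java.util.function.IntToDoubleFunction;

/**
 * Hypothetical, simplified layered graph: neighbors[level][node] lists the
 * neighbors of a node on that level (empty arrays for nodes absent there).
 */
final class LayeredGraph {
  final int entryNode;
  final int[][][] neighbors;

  LayeredGraph(int entryNode, int[][][] neighbors) {
    this.entryNode = entryNode;
    this.neighbors = neighbors;
  }

  int numLevels() {
    return neighbors.length;
  }
}

final class EntryPointSketch {
  private EntryPointSketch() {}

  /** Greedy descent from the top level down to level 1; returns the entry node for level 0. */
  static int findBestEntryPoint(LayeredGraph graph, IntToDoubleFunction similarity) {
    int ep = graph.entryNode;
    if (graph.numLevels() <= 1) {
      // Single-level graph: no upper levels to descend through, so skip scoring here.
      return ep;
    }
    double best = similarity.applyAsDouble(ep);
    for (int level = graph.numLevels() - 1; level >= 1; level--) {
      while (true) {
        int next = ep;
        for (int nbr : graph.neighbors[level][ep]) {
          double score = similarity.applyAsDouble(nbr);
          if (score > best) {
            best = score;
            next = nbr;
          }
        }
        if (next == ep) {
          break; // local optimum on this level
        }
        ep = next;
      }
    }
    return ep;
  }
}
```

With a single-level graph the similarity function is never called by this method; with more levels the greedy descent runs as usual.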

Re: [PR] Fix intermittently failing TestParallelLeafReader [lucene]

2023-12-01 Thread via GitHub
ChrisHegarty commented on PR #12865: URL: https://github.com/apache/lucene/pull/12865#issuecomment-1836024282 The test was previously seen to fail about 1 in every couple of hundred runs, with: ``` org.junit.ComparisonFailure: expected: but was: at __randomizedtesting.SeedInfo.

Re: [PR] Add ParentJoin KNN support [lucene]

2023-12-01 Thread via GitHub
benwtrent commented on PR #12434: URL: https://github.com/apache/lucene/pull/12434#issuecomment-1836025921 @david-sitsky sorry for the confusion, it was renamed `DiversifyingChildren*KnnVectorQuery` -- This is an automated message from the Apache Git Service. To respond to the message, pl

Re: [PR] Prevent extra similarity computation for single-level graphs [lucene]

2023-12-01 Thread via GitHub
benwtrent commented on code in PR #12866: URL: https://github.com/apache/lucene/pull/12866#discussion_r1412053821 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java: ## @@ -100,19 +100,10 @@ private static void search( HnswGraphSearcher graphSearch

Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-01 Thread via GitHub
stefanvodita commented on PR #12844: URL: https://github.com/apache/lucene/pull/12844#issuecomment-1836064137 I did 5 benchmark runs for 4 configurations. To avoid making this comment way too large, I'll just report the averages across the 5 runs per configuration. It looks like there ar

Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-01 Thread via GitHub
benwtrent commented on PR #12844: URL: https://github.com/apache/lucene/pull/12844#issuecomment-1836068375 @stefanvodita > How do we trade off between the extra milliseconds and the memory savings? It would be good to know the actual memory savings. I don't know how to measure

[PR] Reconcile changelog 9.9.0 section [lucene]

2023-12-01 Thread via GitHub
ChrisHegarty opened a new pull request, #12867: URL: https://github.com/apache/lucene/pull/12867 Reconcile the changelog between branch_9_9 and main. This change just reorders a number of entries in _main_ to match that of branch_9_9, as identified by Mike's script, #12860. There are
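The distinction this PR relies on, reordered entries versus genuinely different ones, can be sketched as follows. The actual tool from #12860 is a Python script; this Java version is only an illustration and assumes version sections start with header lines like `======================= Lucene 9.9.0 =======================`. `ChangesSectionDiff` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ChangesSectionDiff {
  private ChangesSectionDiff() {}

  /** Collects the non-blank lines of one version's section (assumed "=== Lucene X.Y.Z ===" headers). */
  static List<String> section(String changesText, String version) {
    List<String> entries = new ArrayList<>();
    boolean inSection = false;
    for (String line : changesText.split("\n")) {
      if (line.startsWith("====")) {
        if (inSection) {
          break; // reached the next version's header
        }
        inSection = line.contains("Lucene " + version + " ");
        continue;
      }
      if (inSection && !line.isBlank()) {
        entries.add(line.strip());
      }
    }
    return entries;
  }

  /** Distinguishes a pure reordering from a real content difference. */
  static String compare(List<String> a, List<String> b) {
    if (a.equals(b)) {
      return "identical";
    }
    List<String> sortedA = new ArrayList<>(a);
    List<String> sortedB = new ArrayList<>(b);
    Collections.sort(sortedA);
    Collections.sort(sortedB);
    return sortedA.equals(sortedB) ? "same entries, different order" : "different entries";
  }
}
```

Comparing sorted copies keeps the check order-insensitive while still catching an entry that exists on only one branch.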

Re: [PR] Add simple tool to diff entries in lucene's CHANGES.txt that should be identical [lucene]

2023-12-01 Thread via GitHub
ChrisHegarty commented on code in PR #12860: URL: https://github.com/apache/lucene/pull/12860#discussion_r1412073640 ## dev-tools/scripts/diff_lucene_changes.py: ## @@ -0,0 +1,79 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +# Licensed to the Apache Software Foundation (A

Re: [PR] Add static function in TaskExecutor to retrieve the results for a collection of Future [lucene]

2023-12-01 Thread via GitHub
shubhamvishu commented on PR #12798: URL: https://github.com/apache/lucene/pull/12798#issuecomment-1836093934 Exactly, @javanna. Do you think it would make sense to have a new `FutureUtil` class and add this function there?
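A static helper of the kind discussed here could look like the sketch below. `FutureUtilSketch.getAll` is a hypothetical name, and the exception handling (rethrow an `IOException` or unchecked cause, wrap anything else) is an assumption, not what `TaskExecutor` actually does.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

/** Hypothetical helper; the class and method names are illustrative only. */
final class FutureUtilSketch {
  private FutureUtilSketch() {}

  /** Waits for every future in iteration order and unwraps failures. */
  static <T> List<T> getAll(Collection<? extends Future<T>> futures) throws IOException {
    List<T> results = new ArrayList<>(futures.size());
    for (Future<T> future : futures) {
      try {
        results.add(future.get());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
        throw new RuntimeException(e);
      } catch (ExecutionException e) {
        Throwable cause = e.getCause();
        if (cause instanceof IOException) {
          throw (IOException) cause;
        }
        if (cause instanceof RuntimeException) {
          throw (RuntimeException) cause;
        }
        if (cause instanceof Error) {
          throw (Error) cause;
        }
        throw new RuntimeException(cause);
      }
    }
    return results;
  }
}
```

Results come back in the collection's iteration order, so callers can zip them with the tasks they submitted.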

Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-01 Thread via GitHub
dungba88 commented on code in PR #12844: URL: https://github.com/apache/lucene/pull/12844#discussion_r1412105704 ## lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java: ## @@ -32,17 +32,20 @@ * @lucene.internal */ public class NeighborArray { + private stat

Re: [PR] Prevent extra similarity computation for single-level graphs [lucene]

2023-12-01 Thread via GitHub
kaivalnp commented on code in PR #12866: URL: https://github.com/apache/lucene/pull/12866#discussion_r1412139289 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java: ## @@ -100,19 +100,10 @@ private static void search( HnswGraphSearcher graphSearche

Re: [PR] Reuse BitSet when there are deleted documents in the index instead of creating new BitSet [lucene]

2023-12-01 Thread via GitHub
Pulkitg64 commented on PR #12857: URL: https://github.com/apache/lucene/pull/12857#issuecomment-1836188173 Thanks @shubhamvishu for taking a look. > I went through the change but I didn't understand how are we not reusing the bitset in the current approach. We do wrap the BitSetIterator w

Re: [PR] Reconcile changelog 9.9.0 section [lucene]

2023-12-01 Thread via GitHub
ChrisHegarty merged PR #12867: URL: https://github.com/apache/lucene/pull/12867

Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-01 Thread via GitHub
dungba88 commented on code in PR #12844: URL: https://github.com/apache/lucene/pull/12844#discussion_r1412155078 ## lucene/core/src/java/org/apache/lucene/util/ArrayUtil.java: ## @@ -330,6 +330,29 @@ public static int[] growExact(int[] array, int newLength) { return copy;
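PR #12844 adds a `growInRange` helper to `ArrayUtil` next to `growExact`. Its exact semantics are not shown here, so the following is only a sketch assuming the natural reading of the name: grow to at least `minLength` with the usual over-allocation, but never beyond `maxLength` (for `NeighborArray`, presumably the maximum number of neighbors a node may hold). `GrowInRangeSketch` and its simplified `oversize` policy are hypothetical.

```java
import java.util.Arrays;

/** Illustrative sketch only; not the code from the PR. */
final class GrowInRangeSketch {
  private GrowInRangeSketch() {}

  /**
   * Simplified, hypothetical over-allocation policy: about 1/8 extra, at least 3 slots.
   * (Lucene's real ArrayUtil.oversize also accounts for element size and alignment.)
   */
  static int oversize(int minLength) {
    long extra = Math.max(3, minLength >> 3);
    return (int) Math.min(Integer.MAX_VALUE - 8L, minLength + extra);
  }

  /** Ensures room for minLength elements without ever allocating more than maxLength. */
  static int[] growInRange(int[] array, int minLength, int maxLength) {
    if (minLength > maxLength) {
      throw new IllegalArgumentException(
          "minLength (" + minLength + ") > maxLength (" + maxLength + ")");
    }
    if (array.length >= minLength) {
      return array; // already big enough, no copy
    }
    int newLength = Math.min(oversize(minLength), maxLength);
    return Arrays.copyOf(array, newLength);
  }
}
```

Capping at `maxLength` avoids wasted slots, but starting from a small initial capacity means more intermediate copies; that is the trade-off the benchmark and memory-profiling comments in this thread are weighing.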

Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-01 Thread via GitHub
dungba88 commented on code in PR #12844: URL: https://github.com/apache/lucene/pull/12844#discussion_r1412167183 ## lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java: ## @@ -32,17 +32,20 @@ * @lucene.internal */ public class NeighborArray { + private stat

Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-01 Thread via GitHub
benwtrent commented on code in PR #12844: URL: https://github.com/apache/lucene/pull/12844#discussion_r1412176786 ## lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java: ## @@ -32,17 +32,20 @@ * @lucene.internal */ public class NeighborArray { + private sta

Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-01 Thread via GitHub
msokolov commented on code in PR #12844: URL: https://github.com/apache/lucene/pull/12844#discussion_r1412182912 ## lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java: ## @@ -32,17 +32,20 @@ * @lucene.internal */ public class NeighborArray { + private stat

Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-01 Thread via GitHub
msokolov commented on code in PR #12844: URL: https://github.com/apache/lucene/pull/12844#discussion_r1412186499 ## lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java: ## @@ -32,17 +32,20 @@ * @lucene.internal */ public class NeighborArray { + private stat

Re: [I] Jvm Crashes occassionaly with Lucene 8.10.0, JDK 11.0.15+10 [lucene]

2023-12-01 Thread via GitHub
msokolov commented on issue #12863: URL: https://github.com/apache/lucene/issues/12863#issuecomment-1836250307 If the JVM crashes, it's generally considered a JVM bug. I'll note there is a more recent JDK 11 release - they seem to be up to 11.0.20. Have you considered upgrading? These point

Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-01 Thread via GitHub
dungba88 commented on code in PR #12844: URL: https://github.com/apache/lucene/pull/12844#discussion_r1412215432 ## lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java: ## @@ -32,17 +32,20 @@ * @lucene.internal */ public class NeighborArray { + private stat

Re: [PR] Add simple tool to diff entries in lucene's CHANGES.txt that should be identical [lucene]

2023-12-01 Thread via GitHub
mikemccand commented on code in PR #12860: URL: https://github.com/apache/lucene/pull/12860#discussion_r1412235383 ## dev-tools/scripts/diff_lucene_changes.py: ## @@ -0,0 +1,79 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +# Licensed to the Apache Software Foundation (ASF

Re: [I] Jvm Crashes occassionaly with Lucene 8.10.0, JDK 11.0.15+10 [lucene]

2023-12-01 Thread via GitHub
rmuir commented on issue #12863: URL: https://github.com/apache/lucene/issues/12863#issuecomment-1836306192 I don't think it's a JVM bug. This is what happens when you try to read from a closed IndexReader that is backed by mmap, because we've unmapped the byte buffer when `close()` was cal

Re: [I] Jvm Crashes occassionaly with Lucene 8.10.0, JDK 11.0.15+10 [lucene]

2023-12-01 Thread via GitHub
rmuir closed issue #12863: Jvm Crashes occassionaly with Lucene 8.10.0, JDK 11.0.15+10 URL: https://github.com/apache/lucene/issues/12863

Re: [PR] Prevent extra similarity computation for single-level graphs [lucene]

2023-12-01 Thread via GitHub
kaivalnp commented on PR #12866: URL: https://github.com/apache/lucene/pull/12866#issuecomment-1836321336 Thanks @benwtrent :) > But a CHANGES entry for Lucene 9.10 I did not see a section for "9.10" in the [current CHANGES.txt](https://github.com/apache/lucene/blob/b231e5b2132

Re: [PR] Prevent extra similarity computation for single-level graphs [lucene]

2023-12-01 Thread via GitHub
benwtrent commented on PR #12866: URL: https://github.com/apache/lucene/pull/12866#issuecomment-1836322275 @kaivalnp you can add it :)

Re: [I] Jvm Crashes occassionaly with Lucene 8.10.0, JDK 11.0.15+10 [lucene]

2023-12-01 Thread via GitHub
uschindler commented on issue #12863: URL: https://github.com/apache/lucene/issues/12863#issuecomment-1836326667 Thanks @rmuir, I can confirm the above pattern happens when you access an IndexReader/IndexSearcher from another thread after it was closed. To see AlreadyClosedExceptions
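The failure mode in this issue is a reader being closed and unmapped while another thread still uses it. The usual remedy is reference counting, so the memory is released only after the last user is done; in Lucene that is the job of `IndexReader#incRef`/`tryIncRef`/`decRef` and `SearcherManager#acquire`/`release`. The class below is a hypothetical, self-contained illustration of the pattern, not Lucene code, with `IllegalStateException` standing in for `AlreadyClosedException`.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical resource illustrating reference counting; it is not Lucene's IndexReader.
 * The "released" flag stands in for unmapping the underlying files.
 */
final class RefCountedResource implements AutoCloseable {
  private final AtomicInteger refCount = new AtomicInteger(1); // the owner's reference
  private volatile boolean released;

  /** Takes a reference only if the resource has not been released yet. */
  boolean tryIncRef() {
    int count;
    while ((count = refCount.get()) > 0) {
      if (refCount.compareAndSet(count, count + 1)) {
        return true;
      }
    }
    return false;
  }

  void decRef() {
    int remaining = refCount.decrementAndGet();
    if (remaining == 0) {
      released = true; // last reference gone: only now is it safe to unmap
    } else if (remaining < 0) {
      throw new IllegalStateException("too many decRef calls");
    }
  }

  /** Fails with an exception instead of touching released memory. */
  int read() {
    if (refCount.get() <= 0) {
      throw new IllegalStateException("already closed"); // stand-in for AlreadyClosedException
    }
    return 42; // placeholder for reading from the mapped buffer
  }

  boolean isReleased() {
    return released;
  }

  @Override
  public void close() {
    decRef(); // the owner drops its reference; threads still holding one keep it alive
  }
}
```

A search thread would call `tryIncRef()`, do its reads, and call `decRef()` in a `finally` block; the owner's `close()` then releases the resource only once no search is in flight.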

Re: [I] Reproducible error in TestLucene90HnswVectorsFormat.testIndexedValueNotAliased [lucene]

2023-12-01 Thread via GitHub
benwtrent commented on issue #12840: URL: https://github.com/apache/lucene/issues/12840#issuecomment-1836360899 fixed via: https://github.com/apache/lucene/pull/12848

Re: [I] Reproducible error in TestLucene90HnswVectorsFormat.testIndexedValueNotAliased [lucene]

2023-12-01 Thread via GitHub
benwtrent closed issue #12840: Reproducible error in TestLucene90HnswVectorsFormat.testIndexedValueNotAliased URL: https://github.com/apache/lucene/issues/12840

Re: [PR] Prevent extra similarity computation for single-level graphs [lucene]

2023-12-01 Thread via GitHub
kaivalnp commented on PR #12866: URL: https://github.com/apache/lucene/pull/12866#issuecomment-1836394910 Replicated [this commit](https://github.com/apache/lucene/commit/94b879a5) (which added the 9.9.0 entry) for 9.10.0. Please let me know if I missed something.

[PR] Trying using Murmurhash 3 for bloom filters [lucene]

2023-12-01 Thread via GitHub
shubhamvishu opened a new pull request, #12868: URL: https://github.com/apache/lucene/pull/12868 ### Description We are currently using Murmurhash 2([MurmurHash64.java](https://github.com/apache/lucene/blob/main/lucene/codecs/src/java/org/apache/lucene/codecs/bloom/MurmurHash64.java)) in
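For reference, here is a compact Java port of the public-domain MurmurHash3 x86_32 function. It only illustrates what the newer hash family does; the PR may well use the 64- or 128-bit variant instead, since the existing bloom filter code uses a 64-bit MurmurHash2. `Murmur3Sketch` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

/** Sketch of the public-domain MurmurHash3 x86_32 variant; the PR may use a different variant. */
final class Murmur3Sketch {
  private Murmur3Sketch() {}

  static int murmurhash3_x86_32(byte[] data, int offset, int len, int seed) {
    final int c1 = 0xcc9e2d51;
    final int c2 = 0x1b873593;
    int h1 = seed;
    int roundedEnd = offset + (len & 0xfffffffc); // round down to whole 4-byte blocks
    for (int i = offset; i < roundedEnd; i += 4) {
      // little-endian load of one 4-byte block
      int k1 = (data[i] & 0xff) | ((data[i + 1] & 0xff) << 8)
          | ((data[i + 2] & 0xff) << 16) | (data[i + 3] << 24);
      k1 *= c1;
      k1 = Integer.rotateLeft(k1, 15);
      k1 *= c2;
      h1 ^= k1;
      h1 = Integer.rotateLeft(h1, 13);
      h1 = h1 * 5 + 0xe6546b64;
    }
    // tail: the remaining 0-3 bytes
    int k1 = 0;
    switch (len & 0x03) {
      case 3:
        k1 = (data[roundedEnd + 2] & 0xff) << 16;
        // fall through
      case 2:
        k1 |= (data[roundedEnd + 1] & 0xff) << 8;
        // fall through
      case 1:
        k1 |= data[roundedEnd] & 0xff;
        k1 *= c1;
        k1 = Integer.rotateLeft(k1, 15);
        k1 *= c2;
        h1 ^= k1;
        break;
      default:
        break;
    }
    // finalization mix (fmix32) spreads the remaining entropy across all bits
    h1 ^= len;
    h1 ^= h1 >>> 16;
    h1 *= 0x85ebca6b;
    h1 ^= h1 >>> 13;
    h1 *= 0xc2b2ae35;
    h1 ^= h1 >>> 16;
    return h1;
  }

  static int hash(String s, int seed) {
    byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
    return murmurhash3_x86_32(bytes, 0, bytes.length, seed);
  }
}
```

The stronger final avalanche step (`fmix32`) is the main practical difference from MurmurHash2 and is what helps bit positions derived from the hash stay well distributed.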

Re: [I] Make NeighborArray fixed size [lucene]

2023-12-01 Thread via GitHub
msokolov closed issue #11783: Make NeighborArray fixed size URL: https://github.com/apache/lucene/issues/11783

Re: [I] Make NeighborArray fixed size [lucene]

2023-12-01 Thread via GitHub
msokolov commented on issue #11783: URL: https://github.com/apache/lucene/issues/11783#issuecomment-1836436845 Resolving this, since I think we did it and are now going back to a more dynamic allocation strategy LOL

Re: [PR] Improve Javadoc [lucene]

2023-12-01 Thread via GitHub
mikemccand commented on PR #12508: URL: https://github.com/apache/lucene/pull/12508#issuecomment-1836443822 Woops, thank you for your attention to detail @lukas-vlcek! And sorry for the crazy long time to respond. I'll merge this.

Re: [PR] Improve Javadoc [lucene]

2023-12-01 Thread via GitHub
mikemccand merged PR #12508: URL: https://github.com/apache/lucene/pull/12508

Re: [PR] Changed byte to int for prefix_length_key [lucene]

2023-12-01 Thread via GitHub
msokolov merged PR #12507: URL: https://github.com/apache/lucene/pull/12507

Re: [PR] Try using Murmurhash 3 for bloom filters [lucene]

2023-12-01 Thread via GitHub
shubhamvishu commented on PR #12868: URL: https://github.com/apache/lucene/pull/12868#issuecomment-1836499432 Below are the `luceneutil` benchmark results for `wikimediumall`. Looks all flat and good to me. ``` TaskQPS baseline StdDevQPS my_modified_version

Re: [PR] Reuse BitSet when there are deleted documents in the index instead of creating new BitSet [lucene]

2023-12-01 Thread via GitHub
kaivalnp commented on code in PR #12857: URL: https://github.com/apache/lucene/pull/12857#discussion_r1412393818 ## lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java: ## @@ -118,13 +118,38 @@ private TopDocs getLeafResults(LeafReaderContext ctx, Weight f

Re: [PR] Initial impl of MMapDirectory for Java 22 [lucene]

2023-12-01 Thread via GitHub
uschindler commented on PR #12706: URL: https://github.com/apache/lucene/pull/12706#issuecomment-1836540146 I tested the new code - and at the same time commenting out the Java 21 fallback code to check for "closed" in the `IllegalStateException` message here: https://github.com/apac

Re: [PR] Prevent extra similarity computation for single-level graphs [lucene]

2023-12-01 Thread via GitHub
benwtrent merged PR #12866: URL: https://github.com/apache/lucene/pull/12866

[PR] Simplifying text area stream in Luke- ticket 12809 [lucene]

2023-12-01 Thread via GitHub
pratikshelarkar opened a new pull request, #12869: URL: https://github.com/apache/lucene/pull/12869 Hi, this simplifies the text area stream in Luke (ticket 12809). This is my first contribution to Lucene. Can you please review my code and advise? I'll try my best to add this enhancement.

Re: [I] Simplifying TextAreaPrintStream in Luke [lucene]

2023-12-01 Thread via GitHub
pratikshelarkar commented on issue #12809: URL: https://github.com/apache/lucene/issues/12809#issuecomment-1836594465 Hi, can you please review my PR for this ticket?

Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-01 Thread via GitHub
stefanvodita commented on PR #12844: URL: https://github.com/apache/lucene/pull/12844#issuecomment-1836599463 I did some memory profiling and it doesn't look promising. Let's take initial capacity 100 as an example. Total allocation is 3.41GB compared to 1.29GB on the baseline, and peak hea

Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-01 Thread via GitHub
stefanvodita commented on code in PR #12844: URL: https://github.com/apache/lucene/pull/12844#discussion_r1412454853 ## lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java: ## @@ -32,17 +32,20 @@ * @lucene.internal */ public class NeighborArray { + private

Re: [PR] Reuse BitSet when there are deleted documents in the index instead of creating new BitSet [lucene]

2023-12-01 Thread via GitHub
shubhamvishu commented on PR #12857: URL: https://github.com/apache/lucene/pull/12857#issuecomment-1836604882 @kaivalnp We could use `acceptDocs.cardinality()` when it's a `BitSetIterator` to get an upper bound, which might include some deleted docs, but that would still change the decision somet

Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-01 Thread via GitHub
benwtrent commented on code in PR #12844: URL: https://github.com/apache/lucene/pull/12844#discussion_r1412475261 ## lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java: ## @@ -32,17 +32,20 @@ * @lucene.internal */ public class NeighborArray { + private sta

Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-01 Thread via GitHub
stefanvodita commented on code in PR #12844: URL: https://github.com/apache/lucene/pull/12844#discussion_r1412502016 ## lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java: ## @@ -32,17 +32,20 @@ * @lucene.internal */ public class NeighborArray { + private

Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-01 Thread via GitHub
zhaih commented on code in PR #12844: URL: https://github.com/apache/lucene/pull/12844#discussion_r1412533096 ## lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java: ## @@ -32,17 +32,20 @@ * @lucene.internal */ public class NeighborArray { + private static

Re: [PR] Reuse BitSet when there are deleted documents in the index instead of creating new BitSet [lucene]

2023-12-01 Thread via GitHub
benwtrent commented on PR #12857: URL: https://github.com/apache/lucene/pull/12857#issuecomment-1836748924 Is our goal memory usage or speed? We could use `FixedBitSet#intersectionCount` and keep from having to create a new bit set that is the intersection. I am honestly not s
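The `FixedBitSet#intersectionCount` suggestion amounts to counting the documents that are both accepted by the filter and live, without materializing a third bit set. A minimal sketch over raw `long[]` words follows; the helper names are hypothetical and this is not Lucene's `FixedBitSet` implementation.

```java
/** Sketch over raw 64-bit words; counts an intersection without materializing it. */
final class BitSetIntersectionSketch {
  private BitSetIntersectionSketch() {}

  static void set(long[] words, int index) {
    words[index >> 6] |= 1L << index; // Java masks the shift distance to 0..63
  }

  /** Counts bits set in both a and b without allocating a third bit set. */
  static long intersectionCount(long[] a, long[] b) {
    long count = 0;
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      count += Long.bitCount(a[i] & b[i]); // popcount of the AND of each word pair
    }
    return count;
  }
}
```

This saves the allocation but still scans every word, so whether it helps CPU time is exactly the kind of question a benchmark has to answer.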

Re: [PR] Reuse BitSet when there are deleted documents in the index instead of creating new BitSet [lucene]

2023-12-01 Thread via GitHub
benwtrent commented on PR #12857: URL: https://github.com/apache/lucene/pull/12857#issuecomment-1836750284 Broad feedback: any "optimizations" without benchmarking aren't optimizations; they are just guesses. I am curious to see if this helps CPU usage in any way. I could see it helpi

Re: [PR] Add ParentJoin KNN support [lucene]

2023-12-01 Thread via GitHub
david-sitsky commented on PR #12434: URL: https://github.com/apache/lucene/pull/12434#issuecomment-1836848114 > @david-sitsky sorry for the confusion, it was renamed `DiversifyingChildren*KnnVectorQuery` Ah, no worries, thanks. We should update the changelog https://lucene.apache.o

Re: [PR] Try using Murmurhash 3 for bloom filters [lucene]

2023-12-01 Thread via GitHub
shubhamvishu commented on PR #12868: URL: https://github.com/apache/lucene/pull/12868#issuecomment-1837045741 @mikemccand rightly pointed out that `luceneutil` doesn't use bloom filter postings format by default and we should enable it for `id` field and rerun the benchmarks to see the impa