[I] Change behavior for finding segmentMerges from contains to finding IDs? [lucene]

2024-06-10 Thread via GitHub
ameyakarve opened a new issue, #13477: URL: https://github.com/apache/lucene/issues/13477 Context: I was working on a custom merge policy implementation: https://github.com/apache/lucene/blob/edba83e63652f414c50305b7c3b545f374d1108c/lucene/core/src/java/org/apache/lucene/index/IndexWr

Re: [PR] Fix global score update bug in MultiLeafKnnCollector [lucene]

2024-06-10 Thread via GitHub
benwtrent commented on PR #13463: URL: https://github.com/apache/lucene/pull/13463#issuecomment-2159534702 @gsmiller did you have more than one segment? This branch of the code only occurs if there is more than one segment. By default, the buffer size is 1GB, which for smaller d

Re: [PR] Fix global score update bug in MultiLeafKnnCollector [lucene]

2024-06-10 Thread via GitHub
gsmiller commented on PR #13463: URL: https://github.com/apache/lucene/pull/13463#issuecomment-2159506702 Thanks @benwtrent. As another data point, I ran `knnPerfTest` with the vectors that `luceneutil` downloads as part of `setup.py` (`enwiki-20120502-lines-1k-100d.vec` / `vector-task-100d

Re: [PR] This commit adds a new test CMS that always provides intra-merge parallelism [lucene]

2024-06-10 Thread via GitHub
benwtrent commented on PR #13475: URL: https://github.com/apache/lucene/pull/13475#issuecomment-2159258542 I ran some more tests. My logic for "move assertion to when we first read the index segment values" still doesn't work. It's possible with index sorting that we read doc values o

Re: [PR] TaskExecutor should not fork unnecessarily [lucene]

2024-06-10 Thread via GitHub
original-brownbear commented on PR #13472: URL: https://github.com/apache/lucene/pull/13472#issuecomment-2159183411 > we may need bigger queues, because a single search operation may create many more tasks than before? Right, an alternative would be to count in-progress searches at th

Re: [PR] TaskExecutor should not fork unnecessarily [lucene]

2024-06-10 Thread via GitHub
javanna commented on PR #13472: URL: https://github.com/apache/lucene/pull/13472#issuecomment-2159105375 I took some time to digest the suggested code changes and the discussions above. I get the sizing issues with using two thread pools (one executing `IndexSearcher#search` or whatever ope

Re: [PR] TaskExecutor should not fork unnecessarily [lucene]

2024-06-10 Thread via GitHub
javanna commented on code in PR #13472: URL: https://github.com/apache/lucene/pull/13472#discussion_r1633706065 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -112,15 +102,10 @@ RunnableFuture createTask(Callable callable) { () -> {

Re: [PR] Introduces efSearch as a separate parameter in KNN{Byte:Float}VectorQuery [lucene]

2024-06-10 Thread via GitHub
shatejas commented on PR #13407: URL: https://github.com/apache/lucene/pull/13407#issuecomment-2159053204 > But we removed it, because we found it wasn't needed. Is there a comment chain which I can look at to better understand this? It would be helpful if it is linked. Thanks! @m

Re: [PR] Introduces efSearch as a separate parameter in KNN{Byte:Float}VectorQuery [lucene]

2024-06-10 Thread via GitHub
shatejas commented on PR #13407: URL: https://github.com/apache/lucene/pull/13407#issuecomment-2159048277 > if we can abstract this and any other parameter that can come in future for any algorithm in a class `SearchParameters` or `VectorSearchParameters` I thought about this and it i

Re: [PR] Introduces efSearch as a separate parameter in KNN{Byte:Float}VectorQuery [lucene]

2024-06-10 Thread via GitHub
shatejas commented on PR #13407: URL: https://github.com/apache/lucene/pull/13407#issuecomment-2159001847 > Now the user interface itself when querying assumes "HNSW-esque" things. I am wondering why this wasn't raised in [#12551](https://github.com/apache/lucene/pull/12551) >

Re: [PR] Fix typo in StringValueFacetCountsExample.java [lucene]

2024-06-10 Thread via GitHub
gsmiller commented on PR #13474: URL: https://github.com/apache/lucene/pull/13474#issuecomment-2158659743 Thanks @paulk-asert ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific commen

Re: [PR] Expose FlatVectorsFormat [lucene]

2024-06-10 Thread via GitHub
msokolov commented on code in PR #13469: URL: https://github.com/apache/lucene/pull/13469#discussion_r1633432342 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsReader.java: ## @@ -189,6 +191,18 @@ public ByteVectorValues getByteVectorValu

Re: [PR] Expose FlatVectorsFormat [lucene]

2024-06-10 Thread via GitHub
benwtrent commented on code in PR #13469: URL: https://github.com/apache/lucene/pull/13469#discussion_r1633387782 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsReader.java: ## @@ -189,6 +191,18 @@ public ByteVectorValues getByteVectorVal

Re: [PR] Expose FlatVectorsFormat [lucene]

2024-06-10 Thread via GitHub
benwtrent commented on code in PR #13469: URL: https://github.com/apache/lucene/pull/13469#discussion_r1633378674 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsReader.java: ## @@ -189,6 +191,18 @@ public ByteVectorValues getByteVectorVal

Re: [PR] Introduces efSearch as a separate parameter in KNN{Byte:Float}VectorQuery [lucene]

2024-06-10 Thread via GitHub
msokolov commented on PR #13407: URL: https://github.com/apache/lucene/pull/13407#issuecomment-2158554884 We used to have this as a separate parameter, but after discussing, we realized this is completely equivalent to running search with larger k (=efSearch) and then discarding all but the

Re: [PR] This commit adds a new test CMS that always provides intra-merge parallelism [lucene]

2024-06-10 Thread via GitHub
benwtrent commented on PR #13475: URL: https://github.com/apache/lucene/pull/13475#issuecomment-2158456017 Making `norms` & `terms` merge synchronously (not forking to the intra-merge pool) makes the assertions go away, but then (surprise surprise), another test fails 🤦 ``` grad

Re: [PR] This commit adds a new test CMS that always provides intra-merge parallelism [lucene]

2024-06-10 Thread via GitHub
benwtrent commented on PR #13475: URL: https://github.com/apache/lucene/pull/13475#issuecomment-2158370906 After adjusting the assertion, some other assertions are now being triggered. ``` java.lang.AssertionError: [body] norms must not be cached twice at org.apach

[PR] Make Gradle dashboard easy to find by adding a badge [lucene]

2024-06-10 Thread via GitHub
stefanvodita opened a new pull request, #13476: URL: https://github.com/apache/lucene/pull/13476 ### Description #12293 made it so we would publish builds which can be viewed at [ge.apache.org](https://ge.apache.org/scans?search.buildToolType=gradle&search.rootProjectNames=lucene-root

Re: [I] Expose flat vectors in "user space" [lucene]

2024-06-10 Thread via GitHub
msokolov commented on issue #13468: URL: https://github.com/apache/lucene/issues/13468#issuecomment-2158343964 See https://github.com/apache/lucene/pull/13469. This still leaves search() as throwing `UnsupportedOperationException` but enables scoring using quantized vectors. I think typical

Re: [PR] This commit adds a new test CMS that always provides intra-merge parallelism [lucene]

2024-06-10 Thread via GitHub
benwtrent commented on PR #13475: URL: https://github.com/apache/lucene/pull/13475#issuecomment-2158343032 Here is an example of an assertion tripping: ``` java.lang.AssertionError: DocValuesProducer are only supposed to be consumed in the thread in which they have been acquired. B

Re: [PR] Add timeout support to AbstractVectorSimilarityQuery [lucene]

2024-06-10 Thread via GitHub
kaivalnp commented on PR #13285: URL: https://github.com/apache/lucene/pull/13285#issuecomment-2158288293 Summary of latest changes: 1. Resolved merge conflicts 2. Moved `CHANGES.txt` entry from 9.11 -> 9.12 since the prior is now released 3. `#Scorer` is now `final` and not overrid

Re: [PR] Remove ByteBufferIndexInput and update all Panama implementations (MMap and Vector) to Java 21 [lucene]

2024-06-10 Thread via GitHub
uschindler commented on PR #13146: URL: https://github.com/apache/lucene/pull/13146#issuecomment-2158250327 I reopened #13325.

Re: [I] Examine performance of individual data accessor methods of MemorySegmentIndexInput [lucene]

2024-06-10 Thread via GitHub
uschindler commented on issue #13325: URL: https://github.com/apache/lucene/issues/13325#issuecomment-2158241513 Hi, I reopened this to make a further investigation. @dsmiley talked to me at Berlinbuzzwords and he also commented on #13146. He has seen a major slowdown on Apache So

Re: [PR] Remove ByteBufferIndexInput and update all Panama implementations (MMap and Vector) to Java 21 [lucene]

2024-06-10 Thread via GitHub
uschindler commented on PR #13146: URL: https://github.com/apache/lucene/pull/13146#issuecomment-2158221471 Hi @dsmiley, Thanks for the quick talk on Berlinbuzzwords. Actually this looks like the same issue we have seen in the dacapobench. When back at home I will try to write a JMH be

Re: [PR] Silence odd test runner warnings after gradle upgrade [lucene]

2024-06-10 Thread via GitHub
dweiss commented on PR #13471: URL: https://github.com/apache/lucene/pull/13471#issuecomment-2157846324 Thanks. I've backported it to 9x as well.

Re: [PR] Silence odd test runner warnings after gradle upgrade [lucene]

2024-06-10 Thread via GitHub
dweiss merged PR #13471: URL: https://github.com/apache/lucene/pull/13471

Re: [PR] Fix typo in StringValueFacetCountsExample.java [lucene]

2024-06-10 Thread via GitHub
stefanvodita commented on PR #13474: URL: https://github.com/apache/lucene/pull/13474#issuecomment-2157728116 Thanks for finding this!

Re: [PR] Fix typo in StringValueFacetCountsExample.java [lucene]

2024-06-10 Thread via GitHub
stefanvodita merged PR #13474: URL: https://github.com/apache/lucene/pull/13474

Re: [PR] Add new test case "testGetLines" for lucene/core/analysis/WordlistLoader [lucene]

2024-06-10 Thread via GitHub
stefanvodita merged PR #13419: URL: https://github.com/apache/lucene/pull/13419

Re: [I] Expose flat vectors in "user space" [lucene]

2024-06-10 Thread via GitHub
navneet1v commented on issue #13468: URL: https://github.com/apache/lucene/issues/13468#issuecomment-2157491442 > Currently if you make a KnnFloatVectorField or a KnnByteVectorField you get an HNSW graph even if you don't want it. We have all the tools to support this use case, but the API