Re: [PR] Remove recurse into sub block when scan leaf block in IDVersionSegmentTermsEnumFrame#scanToTermLeaf. [lucene]

2024-09-17 Thread via GitHub
vsop-479 commented on PR #13786: URL: https://github.com/apache/lucene/pull/13786#issuecomment-2357386172 @mikemccand Please take a look when you get a chance. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the UR

Re: [PR] Move anonymous Weight implementation in PointRangeQuery to named class [lucene]

2024-09-17 Thread via GitHub
github-actions[bot] commented on PR #13711: URL: https://github.com/apache/lucene/pull/13711#issuecomment-2357244427 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] HNSW BP reorder tool [lucene]

2024-09-17 Thread via GitHub
github-actions[bot] commented on PR #13683: URL: https://github.com/apache/lucene/pull/13683#issuecomment-2357244456 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

[PR] Disable intra-merge parallelism for all structures but kNN vectors [lucene]

2024-09-17 Thread via GitHub
benwtrent opened a new pull request, #13799: URL: https://github.com/apache/lucene/pull/13799 After adjusting tests that truly exercise intra-merge parallelism, more issues have arisen. See: https://github.com/apache/lucene/issues/13798 To be risk averse & due to the soon to be relea

Re: [PR] Speed up advancing within a block. [lucene]

2024-09-17 Thread via GitHub
jpountz commented on PR #13692: URL: https://github.com/apache/lucene/pull/13692#issuecomment-2356421569 > Maybe we could somehow dynamically optimize I had a similar intuition, since this seems to be highly consistent, iterators should be able to predict by how many documents they ne

Re: [PR] Speed up advancing within a block. [lucene]

2024-09-17 Thread via GitHub
jpountz commented on PR #13692: URL: https://github.com/apache/lucene/pull/13692#issuecomment-2356416483 Performance went back up: https://benchmarks.mikemccandless.com/CountAndHighMed.html.

Re: [PR] Add a Better Binary Quantizer (RaBitQ) format for dense vectors [lucene]

2024-09-17 Thread via GitHub
benwtrent commented on PR #13651: URL: https://github.com/apache/lucene/pull/13651#issuecomment-2356359326 @ShashwatShivam so, the flat codec version is sneaky: depending on when you cloned the repo, it might not be doing anything. Lucene by default will return nothing for approx

Re: [PR] Add a Better Binary Quantizer (RaBitQ) format for dense vectors [lucene]

2024-09-17 Thread via GitHub
ShashwatShivam commented on PR #13651: URL: https://github.com/apache/lucene/pull/13651#issuecomment-2356278997 Following up on the above comment by tanyaroosta, the dataset I was using for benchmarking RaBitQ through Luceneutil (main branch) was Amazon's ASIN and query embeddings (which ar

Re: [PR] Add a Better Binary Quantizer (RaBitQ) format for dense vectors [lucene]

2024-09-17 Thread via GitHub
benwtrent commented on PR #13651: URL: https://github.com/apache/lucene/pull/13651#issuecomment-2356243291 @tanyaroosta we are still doing larger scale testing, but if you want to test with LuceneUtil, here is the branch I am using: https://github.com/mikemccand/luceneutil/compare/main...be

Re: [PR] Add a Better Binary Quantizer (RaBitQ) format for dense vectors [lucene]

2024-09-17 Thread via GitHub
tanyaroosta commented on PR #13651: URL: https://github.com/apache/lucene/pull/13651#issuecomment-2356189954 @benwtrent we are trying to run tests with the RaBitQ Lucene implementation, and are not able to replicate the numbers reported in the paper. Have you run tests as part of the imple

Re: [I] TestPerFieldDocValuesFormat.testThreads2 fails with java.lang.ArrayIndexOutOfBoundsException [lucene]

2024-09-17 Thread via GitHub
jpountz commented on issue #13798: URL: https://github.com/apache/lucene/issues/13798#issuecomment-2355990742 Agreed, this sounds safer with 9.12 around the corner.

Re: [I] TestPerFieldDocValuesFormat.testThreads2 fails with java.lang.ArrayIndexOutOfBoundsException [lucene]

2024-09-17 Thread via GitHub
benwtrent commented on issue #13798: URL: https://github.com/apache/lucene/issues/13798#issuecomment-2355997125 @jpountz I will do that. But, keep the testing that provides the parallelism to ensure we are covered when things are enabled in the future

Re: [I] TestPerFieldDocValuesFormat.testThreads2 fails with java.lang.ArrayIndexOutOfBoundsException [lucene]

2024-09-17 Thread via GitHub
benwtrent commented on issue #13798: URL: https://github.com/apache/lucene/issues/13798#issuecomment-2355986406 @jpountz given the feature freeze of 9.12, what do you think of disabling intra-merge parallelism for everything :/ and we enable it one at a time for things in the future as wrin

Re: [I] TestPerFieldDocValuesFormat.testThreads2 fails with java.lang.ArrayIndexOutOfBoundsException [lucene]

2024-09-17 Thread via GitHub
jpountz commented on issue #13798: URL: https://github.com/apache/lucene/issues/13798#issuecomment-2355929388 Wow, thanks for finding this, it's indeed broken. I'll look into it.

Re: [I] TestPerFieldDocValuesFormat.testThreads2 fails with java.lang.ArrayIndexOutOfBoundsException [lucene]

2024-09-17 Thread via GitHub
benwtrent commented on issue #13798: URL: https://github.com/apache/lucene/issues/13798#issuecomment-2355911571 Ah, looking at the `clone` code for merge state @jpountz ``` for (int i = 0; i < storedFieldsReaders.length; ++i) { if (storedFieldsReaders[i] != null) {

Re: [I] TestPerFieldDocValuesFormat.testThreads2 fails with java.lang.ArrayIndexOutOfBoundsException [lucene]

2024-09-17 Thread via GitHub
benwtrent commented on issue #13798: URL: https://github.com/apache/lucene/issues/13798#issuecomment-2355900516 Reverting point value parallelism fixes this bug. This tells me that how we merge point values from multiple threads is busted. I will see if there is a quick fix. If there is

Re: [I] TestPerFieldDocValuesFormat.testThreads2 fails with java.lang.ArrayIndexOutOfBoundsException [lucene]

2024-09-17 Thread via GitHub
benwtrent commented on issue #13798: URL: https://github.com/apache/lucene/issues/13798#issuecomment-2355815297 Intra merge concurrency is causing a race condition here it seems? I can debug where.

Re: [I] TestPerFieldDocValuesFormat.testThreads2 fails with java.lang.ArrayIndexOutOfBoundsException [lucene]

2024-09-17 Thread via GitHub
jpountz commented on issue #13798: URL: https://github.com/apache/lucene/issues/13798#issuecomment-2355807304 git bisect points to this commit: https://github.com/apache/lucene/commit/b940511b07b768be974f62cdc165ac948f5c686f

Re: [PR] Change docValuesSkipIndex from a boolean to an enum. [lucene]

2024-09-17 Thread via GitHub
jpountz merged PR #13784: URL: https://github.com/apache/lucene/pull/13784

Re: [PR] Cleanup redundant allocations and code around Comparator use [lucene]

2024-09-17 Thread via GitHub
jpountz merged PR #13795: URL: https://github.com/apache/lucene/pull/13795

Re: [PR] add RawTFSimilarity class [lucene]

2024-09-17 Thread via GitHub
cpoerschke merged PR #13749: URL: https://github.com/apache/lucene/pull/13749

[I] TestPerFieldDocValuesFormat.testThreads2 fails with java.lang.ArrayIndexOutOfBoundsException [lucene]

2024-09-17 Thread via GitHub
ChrisHegarty opened a new issue, #13798: URL: https://github.com/apache/lucene/issues/13798 Fails on main and 9.x ``` gradlew test --tests TestPerFieldDocValuesFormat -Dtests.seed=9BD00EA6CD907A21 -Dtests.nightly=true -Dtests.locale=gv -Dtests.timezone=America/Juneau -Dtests.asser

Re: [PR] Remove CollectorManager#forSequentialExecution [lucene]

2024-09-17 Thread via GitHub
javanna merged PR #13790: URL: https://github.com/apache/lucene/pull/13790

Re: [PR] Remove CollectorManager#forSequentialExecution [lucene]

2024-09-17 Thread via GitHub
javanna commented on PR #13790: URL: https://github.com/apache/lucene/pull/13790#issuecomment-2354777034 Thanks for the feedback @gsmiller I will go ahead and merge this.