Re: [PR] [Draft] Support Multi-Vector HNSW Search via Flat Vector Storage [lucene]
alessandrobenedetti commented on PR #14173: URL: https://github.com/apache/lucene/pull/14173#issuecomment-2743201362 @vigyasharma, from a first superficial pass, I see that this PR touches the same areas as my original, now outdated one (https://github.com/apache/lucene/pull/12314), but it seems to redo similar things from scratch. Aside from the difficulty of adapting an old pull request, are there other reasons? Is it better from any angle? I'll proceed with a deeper review, but if you already know of pain points in my original PR that were not worth porting to 2025, I would be glad to hear them! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] bump antlr 4.11.1 -> 4.13.2 [lucene]
rmuir merged PR #14388: URL: https://github.com/apache/lucene/pull/14388
Re: [PR] Optimize ParallelLeafReader to improve term vector fetching efficienc [lucene]
DivyanshIITB commented on code in PR #14373: URL: https://github.com/apache/lucene/pull/14373#discussion_r2009127681

## lucene/CHANGES.txt: ##
@@ -35,6 +35,10 @@ Optimizations -
 * GITHUB#14011: Reduce allocation rate in HNSW concurrent merge. (Viliam Durina)
 * GITHUB#14022: Optimize DFS marking of connected components in HNSW by reducing stack depth, improving performance and reducing allocations. (Viswanath Kuchibhotla)
+* GITHUB#14373: Optimized `ParallelLeafReader` to improve term vector fetching efficiency.
+- Fetches all term vectors once per reader instead of per field.
+- Reduces complexity from **O(n²) to O(n)**.
+- Enhances performance for documents with many fields. (Divyansh Agrawal)

Review Comment: I have modified `CHANGES.txt` as you said.
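The changelog entry above describes fetching term vectors once per reader instead of once per field. The following is a minimal sketch of that idea only, not the code merged in the PR: `ToyReader`, `perField`, and `perReader` are hypothetical names, none of them are Lucene classes, and the sketch assumes that every fetch loads all fields a reader owns, which is where a quadratic cost would come from.

```java
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model only: none of these types are Lucene classes. */
class TermVectorFetchSketch {

  /** Hypothetical stand-in for a sub-reader that owns several fields. */
  static final class ToyReader {
    private final Map<String, String> vectorsByField;
    int fetchCount = 0; // number of full per-document fetches performed

    ToyReader(Map<String, String> vectorsByField) {
      this.vectorsByField = vectorsByField;
    }

    /** Assumed expensive call: loads the vectors of every field this reader owns. */
    Map<String, String> fetchAll() {
      fetchCount++;
      return new LinkedHashMap<>(vectorsByField);
    }
  }

  /** Per-field strategy: one full fetch per field, so n fields trigger n fetches. */
  static Map<String, String> perField(Map<String, ToyReader> fieldToReader) {
    Map<String, String> out = new LinkedHashMap<>();
    for (Map.Entry<String, ToyReader> e : fieldToReader.entrySet()) {
      out.put(e.getKey(), e.getValue().fetchAll().get(e.getKey()));
    }
    return out;
  }

  /** Per-reader strategy: one full fetch per distinct reader, reused for all its fields. */
  static Map<String, String> perReader(Map<String, ToyReader> fieldToReader) {
    Map<ToyReader, Map<String, String>> fetched = new IdentityHashMap<>();
    Map<String, String> out = new LinkedHashMap<>();
    for (Map.Entry<String, ToyReader> e : fieldToReader.entrySet()) {
      Map<String, String> all = fetched.computeIfAbsent(e.getValue(), ToyReader::fetchAll);
      out.put(e.getKey(), all.get(e.getKey()));
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, String> vectors = new LinkedHashMap<>();
    vectors.put("title", "tv-title");
    vectors.put("body", "tv-body");
    vectors.put("tags", "tv-tags");
    ToyReader reader = new ToyReader(vectors);

    // All three fields live in the same sub-reader.
    Map<String, ToyReader> fieldToReader = new LinkedHashMap<>();
    for (String field : vectors.keySet()) {
      fieldToReader.put(field, reader);
    }

    perField(fieldToReader);
    System.out.println("per-field fetches:  " + reader.fetchCount);
    reader.fetchCount = 0;
    perReader(fieldToReader);
    System.out.println("per-reader fetches: " + reader.fetchCount);
  }
}
```

Both strategies return the same map; only the number of full fetches differs, and that difference grows with the number of fields per reader.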
Re: [PR] Optimize ParallelLeafReader to improve term vector fetching efficienc [lucene]
DivyanshIITB commented on PR #14373: URL: https://github.com/apache/lucene/pull/14373#issuecomment-2746245859

> > I successfully ran `./gradlew tidy` and the build was successful.
>
> GitHub build is still failing on spotless (formatting). `tidy` will change and reformat offending files for you; you then need to commit and push those changes.

Thanks for the review! 🙌 I have run `./gradlew tidy` and pushed the formatting fixes. Let me know if anything else is needed.
Re: [PR] Speed up advancing within a sparse block in IndexedDISI. [lucene]
vsop-479 commented on PR #14371: URL: https://github.com/apache/lucene/pull/14371#issuecomment-2747056694 @gf2121 For what it's worth, I implemented this patch, and measured with luceneutil on `wikimedium10m`.

Task                       QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff                  p-value
HighTermDayOfYearSort            470.94  (11.4%)                   441.83   (8.5%)     -6.2%  ( -23% -  15%)    0.051
MedSloppyPhrase                  183.03   (8.5%)                   175.29   (6.1%)     -4.2%  ( -17% -  11%)    0.070
LowSpanNear                      411.55  (10.3%)                   396.07   (9.5%)     -3.8%  ( -21% -  17%)    0.230
BrowseMonthSSDVFacets             27.44   (8.5%)                    26.47   (5.4%)     -3.6%  ( -16% -  11%)    0.113
MedTerm                         1178.02   (9.8%)                  1142.75   (6.9%)     -3.0%  ( -17% -  15%)    0.261
HighTerm                         929.38   (7.6%)                   902.10   (7.9%)     -2.9%  ( -17% -  13%)    0.232
MedPhrase                        251.86   (7.8%)                   245.03   (9.7%)     -2.7%  ( -18% -  16%)    0.330
OrHighNotLow                    1175.04   (9.4%)                  1143.49  (11.8%)     -2.7%  ( -21% -  20%)    0.425
OrNotHighHigh                    877.79   (7.1%)                   858.18   (7.3%)     -2.2%  ( -15% -  13%)    0.326
LowTerm                         1419.65   (9.5%)                  1389.75  (10.4%)     -2.1%  ( -20% -  19%)    0.504
MedIntervalsOrdered               59.98   (8.2%)                    58.76   (7.2%)     -2.0%  ( -16% -  14%)    0.403
TermDTSort                       454.05  (11.0%)                   444.95  (12.6%)     -2.0%  ( -23% -  24%)    0.593
Fuzzy1                           130.17   (3.4%)                   128.06   (4.1%)     -1.6%  (  -8% -   6%)    0.172
OrNotHighMed                    1033.67  (10.7%)                  1017.41  (10.3%)     -1.6%  ( -20% -  21%)    0.636
AndHighHigh                      388.27   (8.1%)                   382.25   (7.5%)     -1.5%  ( -15% -  15%)    0.530
HighSloppyPhrase                 132.66   (5.2%)                   130.62   (6.4%)     -1.5%  ( -12% -  10%)    0.402
LowIntervalsOrdered              637.45   (6.9%)                   627.85   (6.9%)     -1.5%  ( -14% -  13%)    0.488
IntNRQ                           438.20  (11.4%)                   431.78  (10.2%)     -1.5%  ( -20% -  22%)    0.668
HighTermTitleBDVSort             123.85   (8.9%)                   122.20   (8.7%)     -1.3%  ( -17% -  17%)    0.633
BrowseDayOfYearSSDVFacets         26.84  (10.1%)                    26.50   (9.3%)     -1.3%  ( -18% -  20%)    0.683
Fuzzy2                           102.89   (2.1%)                   101.68   (3.6%)     -1.2%  (  -6% -   4%)    0.204
LowSloppyPhrase                  683.96  (10.5%)                   676.73  (12.0%)     -1.1%  ( -21% -  23%)    0.766
Respell                          134.93   (2.2%)                   133.51   (2.6%)     -1.0%  (  -5% -   3%)    0.167
AndHighHighDayTaxoFacets          67.12   (4.2%)                    66.44   (4.6%)     -1.0%  (  -9% -   8%)    0.471
Wildcard                         379.66   (8.4%)                   376.74   (9.8%)     -0.8%  ( -17% -  19%)    0.790
MedSpanNear                      227.24   (4.8%)                   225.86   (5.6%)     -0.6%  ( -10% -  10%)    0.715
HighTermMonthSort               1811.33  (12.1%)                  1803.90  (13.4%)     -0.4%  ( -23% -  28%)    0.919
AndHighMed                       820.66   (8.4%)                   817.55   (9.8%)     -0.4%  ( -17% -  19%)    0.895
OrHighNotHigh                    757.11   (8.3%)                   754.48   (7.8%)     -0.3%  ( -15% -  17%)    0.892
OrNotHighLow                    1757.90  (11.0%)                  1754.29   (8.9%)     -0.2%  ( -18% -  22%)    0.948
MedTermDayTaxoFacets             148.12   (4.6%)                   148.00   (5.1%)     -0.1%  (  -9% -  10%)    0.956
PKLookup                         293.33   (4.4%)                   293.35   (2.9%)      0.0%  (  -7% -   7%)    0.995
Prefix3                          707.43  (14.9%)                   708.70  (12.0%)      0.2%  ( -23% -  31%)    0.967
HighSpanNear                      75.55   (3.7%)                    75.81   (4.5%)      0.3%  (  -7% -   8%)    0.793
AndHighMedDayTaxoFacets          217.31   (6.1%)                   218.27   (5.5%)      0.4%  ( -10% -  12%)    0.808
OrHighMedDayTaxoFacets            47.73   (4.8%)                    47.97   (3.9%)      0.5%  (  -7% -   9%)    0.712
range                           6721.40  (10.5%)                  6784.00   (9.0%)      0.9%  ( -16% -  22%)    0.763
HighTermTitleSort                138.34   (5.1%)                   139.89   (4.1%)      1.1%  (  -7% -  10%)    0.443
OrHighMed                        677.06  (14.7%)                   687.35  (10.5%)      1.5%  ( -20% -  31%)    0.707
HighIntervalsOrdered             119.78  (10.0%)                   121.82   (8.4%)      1.7%  ( -15% -  22%)    0.562
OrHighLow                       1099.81   (6
Re: [I] Exploring GPU based kNN vector search [lucene]
alessandrobenedetti commented on issue #13003: URL: https://github.com/apache/lucene/issues/13003#issuecomment-2743054765 Does https://github.com/apache/lucene/issues/14243 supersede this one? Should we close it?
Re: [PR] Optimize ParallelLeafReader to improve term vector fetching efficienc [lucene]
vigyasharma commented on PR #14373: URL: https://github.com/apache/lucene/pull/14373#issuecomment-2747063076 Changes merged. Thanks @DivyanshIITB !
Re: [PR] Fix comment typo [lucene]
gf2121 merged PR #14392: URL: https://github.com/apache/lucene/pull/14392
Re: [PR] Fix comment typo [lucene]
gf2121 commented on PR #14392: URL: https://github.com/apache/lucene/pull/14392#issuecomment-2747016784 Thanks @flat35hd99 !
[PR] Fix comment typo [lucene]
flat35hd99 opened a new pull request, #14392: URL: https://github.com/apache/lucene/pull/14392 ### Description
[PR] Pack file pointers when merging BKD trees [lucene]
iverase opened a new pull request, #14393: URL: https://github.com/apache/lucene/pull/14393 We currently use long arrays to hold file pointers. These arrays can get quite large when the number of points is large, which seems wasteful, especially since those file pointers are monotonically increasing. This commit proposes packing these arrays using existing Lucene algorithms to save heap during merging and to avoid humongous allocations. Relates to https://github.com/apache/lucene/issues/14382
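To make the motivation concrete, here is a rough sketch of why monotonically increasing file pointers pack well. It is an illustration under stated assumptions, not the PR's implementation and not a Lucene class: `MonotonicPackSketch` is a hypothetical name, and the scheme (store each pointer's deviation from a straight line through the first and last value, using a fixed number of bits) is a simplified stand-in for whichever existing Lucene algorithm the PR actually uses.

```java
/** Illustrative sketch only: not the pull request's code and not a Lucene class. */
final class MonotonicPackSketch {
  private final int size;
  private final long first;       // value at index 0
  private final long avgStep;     // average increment between neighbours
  private final long minDev;      // smallest deviation from the straight line
  private final int bitsPerValue; // bits needed for (deviation - minDev)
  private final long[] words;     // deviations stored back to back

  MonotonicPackSketch(long[] values) {
    size = values.length;
    first = size == 0 ? 0L : values[0];
    avgStep = size < 2 ? 0L : (values[size - 1] - first) / (size - 1);
    long min = 0L;
    long max = 0L; // the deviation at index 0 is always 0
    for (int i = 0; i < size; i++) {
      long dev = values[i] - (first + avgStep * i);
      min = Math.min(min, dev);
      max = Math.max(max, dev);
    }
    minDev = min;
    bitsPerValue = 64 - Long.numberOfLeadingZeros(max - min);
    if (bitsPerValue > 63) {
      throw new IllegalArgumentException("deviation range too wide for this sketch");
    }
    long[] packed = new long[(int) (((long) size * bitsPerValue + 63) >>> 6)];
    if (bitsPerValue > 0) {
      for (int i = 0; i < size; i++) {
        long v = values[i] - (first + avgStep * i) - minDev; // always >= 0
        long bitPos = (long) i * bitsPerValue;
        int word = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        packed[word] |= v << shift;
        if (shift + bitsPerValue > 64) {
          packed[word + 1] |= v >>> (64 - shift); // spill into the next word
        }
      }
    }
    words = packed;
  }

  /** Random access: rebuild the original value from the line plus the stored deviation. */
  long get(int index) {
    if (index < 0 || index >= size) {
      throw new IndexOutOfBoundsException("index " + index);
    }
    long dev = 0L;
    if (bitsPerValue > 0) {
      long bitPos = (long) index * bitsPerValue;
      int word = (int) (bitPos >>> 6);
      int shift = (int) (bitPos & 63);
      dev = words[word] >>> shift;
      if (shift + bitsPerValue > 64) {
        dev |= words[word + 1] << (64 - shift);
      }
      dev &= (1L << bitsPerValue) - 1;
    }
    return first + avgStep * index + minDev + dev;
  }

  int packedWords() {
    return words.length;
  }

  public static void main(String[] args) {
    // Made-up file pointers that grow by roughly 4 KiB per entry.
    long[] pointers = new long[1000];
    for (int i = 0; i < pointers.length; i++) {
      pointers[i] = 1000L + i * 4096L + (i % 7);
    }
    MonotonicPackSketch packed = new MonotonicPackSketch(pointers);
    System.out.println("plain longs:  " + pointers.length);
    System.out.println("packed longs: " + packed.packedWords());
  }
}
```

The closer the pointers lie to a straight line, the fewer bits each deviation needs, which is why a plain `long[]` is wasteful for this kind of data.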
Re: [PR] Speed up advancing within a sparse block in IndexedDISI. [lucene]
vsop-479 commented on PR #14371: URL: https://github.com/apache/lucene/pull/14371#issuecomment-2747066080 Maybe I should measure it with `DVBench` in luceneutil, or add a JMH benchmark.
Re: [PR] Optimize ParallelLeafReader to improve term vector fetching efficienc [lucene]
vigyasharma merged PR #14373: URL: https://github.com/apache/lucene/pull/14373
Re: [PR] Optimize ParallelLeafReader to improve term vector fetching efficienc [lucene]
vigyasharma commented on PR #14373: URL: https://github.com/apache/lucene/pull/14373#issuecomment-2745914617

> I successfully ran `./gradlew tidy` and the build was successful.

GitHub build is still failing on spotless (formatting). `tidy` will change and reformat offending files for you; you then need to commit and push those changes.