Re: [PR] [Draft] Support Multi-Vector HNSW Search via Flat Vector Storage [lucene]

2025-03-23 Thread via GitHub


alessandrobenedetti commented on PR #14173:
URL: https://github.com/apache/lucene/pull/14173#issuecomment-2743201362

   @vigyasharma, from a first superficial pass, I see that this PR touches on points similar to those in my original, now outdated, one:
   https://github.com/apache/lucene/pull/12314, but it seems to redo similar things from scratch.
   Aside from the difficulty of adapting an old pull request, are there other reasons?
   Is it better from any angle?

   I'll proceed with a deeper review, but if you already know of pain points in my original PR that were not worth porting to 2025, I would be glad to hear them!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] bump antlr 4.11.1 -> 4.13.2 [lucene]

2025-03-23 Thread via GitHub


rmuir merged PR #14388:
URL: https://github.com/apache/lucene/pull/14388





Re: [PR] Optimize ParallelLeafReader to improve term vector fetching efficienc [lucene]

2025-03-23 Thread via GitHub


DivyanshIITB commented on code in PR #14373:
URL: https://github.com/apache/lucene/pull/14373#discussion_r2009127681


##
lucene/CHANGES.txt:
##
@@ -35,6 +35,10 @@ Optimizations
 -
 * GITHUB#14011: Reduce allocation rate in HNSW concurrent merge. (Viliam 
Durina)
 * GITHUB#14022: Optimize DFS marking of connected components in HNSW by 
reducing stack depth, improving performance and reducing allocations. 
(Viswanath Kuchibhotla)
+* GITHUB#14373: Optimized `ParallelLeafReader` to improve term vector fetching 
efficiency.
+- Fetches all term vectors once per reader instead of per field.
+- Reduces complexity from **O(n²) to O(n)**.
+- Enhances performance for documents with many fields. (Divyansh Agrawal)

Review Comment:
   I have modified `CHANGES.txt` as you said.
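
The `CHANGES.txt` entry above describes fetching term vectors once per reader instead of once per field. A minimal sketch of that idea follows. It is not the code from the PR and does not use Lucene's API: `TermVectorFetchSketch`, `CountingReader`, and `loadAllVectors` are hypothetical stand-ins, and term vectors are modeled as plain strings. If each load costs time proportional to the number of fields, loading once per field is quadratic in the field count, while loading once per reader is linear.

```java
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model only; the names here are not Lucene classes. */
final class TermVectorFetchSketch {

  /** Stand-in for a sub-reader that counts how often its vectors are loaded. */
  static final class CountingReader {
    final Map<String, String> vectorsByField;
    int loads = 0;

    CountingReader(Map<String, String> vectorsByField) {
      this.vectorsByField = vectorsByField;
    }

    /** Loads every field's vector for the document (the expensive step). */
    Map<String, String> loadAllVectors(int docId) {
      loads++;
      return vectorsByField;
    }
  }

  /** Per-field approach: one full load for every field. */
  static Map<String, String> perField(Map<String, CountingReader> fieldToReader, int docId) {
    Map<String, String> out = new LinkedHashMap<>();
    for (Map.Entry<String, CountingReader> e : fieldToReader.entrySet()) {
      String vector = e.getValue().loadAllVectors(docId).get(e.getKey());
      if (vector != null) {
        out.put(e.getKey(), vector);
      }
    }
    return out;
  }

  /** Per-reader approach: each distinct reader is loaded once and reused. */
  static Map<String, String> oncePerReader(Map<String, CountingReader> fieldToReader, int docId) {
    Map<CountingReader, Map<String, String>> loaded = new IdentityHashMap<>();
    Map<String, String> out = new LinkedHashMap<>();
    for (Map.Entry<String, CountingReader> e : fieldToReader.entrySet()) {
      Map<String, String> all =
          loaded.computeIfAbsent(e.getValue(), reader -> reader.loadAllVectors(docId));
      String vector = all.get(e.getKey());
      if (vector != null) {
        out.put(e.getKey(), vector);
      }
    }
    return out;
  }
}
```

With three fields served by a single reader, `perField` triggers three loads and `oncePerReader` triggers one, while both return the same field-to-vector mapping.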






Re: [PR] Optimize ParallelLeafReader to improve term vector fetching efficienc [lucene]

2025-03-23 Thread via GitHub


DivyanshIITB commented on PR #14373:
URL: https://github.com/apache/lucene/pull/14373#issuecomment-2746245859

   > > I successfully ran `./gradlew tidy` and the build was successful.
   > 
   > GitHub build is still failing on spotless (formatting). `tidy` will change and reformat offending files for you; you need to commit and push those changes.
   
   Thanks for the review! 🙌 I have run `./gradlew tidy` and pushed the 
formatting fixes. Let me know if there's anything else needed.





Re: [PR] Speed up advancing within a sparse block in IndexedDISI. [lucene]

2025-03-23 Thread via GitHub


vsop-479 commented on PR #14371:
URL: https://github.com/apache/lucene/pull/14371#issuecomment-2747056694

   @gf2121 
   For what it's worth, I implemented this patch, and measured with luceneutil 
on `wikimedium10m`.
   
   
   Task                       QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff               p-value
   HighTermDayOfYearSort            470.94 (11.4%)                   441.83  (8.5%)     -6.2% (-23% - 15%)    0.051
   MedSloppyPhrase                  183.03  (8.5%)                   175.29  (6.1%)     -4.2% (-17% - 11%)    0.070
   LowSpanNear                      411.55 (10.3%)                   396.07  (9.5%)     -3.8% (-21% - 17%)    0.230
   BrowseMonthSSDVFacets             27.44  (8.5%)                    26.47  (5.4%)     -3.6% (-16% - 11%)    0.113
   MedTerm                         1178.02  (9.8%)                  1142.75  (6.9%)     -3.0% (-17% - 15%)    0.261
   HighTerm                         929.38  (7.6%)                   902.10  (7.9%)     -2.9% (-17% - 13%)    0.232
   MedPhrase                        251.86  (7.8%)                   245.03  (9.7%)     -2.7% (-18% - 16%)    0.330
   OrHighNotLow                    1175.04  (9.4%)                  1143.49 (11.8%)     -2.7% (-21% - 20%)    0.425
   OrNotHighHigh                    877.79  (7.1%)                   858.18  (7.3%)     -2.2% (-15% - 13%)    0.326
   LowTerm                         1419.65  (9.5%)                  1389.75 (10.4%)     -2.1% (-20% - 19%)    0.504
   MedIntervalsOrdered               59.98  (8.2%)                    58.76  (7.2%)     -2.0% (-16% - 14%)    0.403
   TermDTSort                       454.05 (11.0%)                   444.95 (12.6%)     -2.0% (-23% - 24%)    0.593
   Fuzzy1                           130.17  (3.4%)                   128.06  (4.1%)     -1.6% ( -8% -  6%)    0.172
   OrNotHighMed                    1033.67 (10.7%)                  1017.41 (10.3%)     -1.6% (-20% - 21%)    0.636
   AndHighHigh                      388.27  (8.1%)                   382.25  (7.5%)     -1.5% (-15% - 15%)    0.530
   HighSloppyPhrase                 132.66  (5.2%)                   130.62  (6.4%)     -1.5% (-12% - 10%)    0.402
   LowIntervalsOrdered              637.45  (6.9%)                   627.85  (6.9%)     -1.5% (-14% - 13%)    0.488
   IntNRQ                           438.20 (11.4%)                   431.78 (10.2%)     -1.5% (-20% - 22%)    0.668
   HighTermTitleBDVSort             123.85  (8.9%)                   122.20  (8.7%)     -1.3% (-17% - 17%)    0.633
   BrowseDayOfYearSSDVFacets         26.84 (10.1%)                    26.50  (9.3%)     -1.3% (-18% - 20%)    0.683
   Fuzzy2                           102.89  (2.1%)                   101.68  (3.6%)     -1.2% ( -6% -  4%)    0.204
   LowSloppyPhrase                  683.96 (10.5%)                   676.73 (12.0%)     -1.1% (-21% - 23%)    0.766
   Respell                          134.93  (2.2%)                   133.51  (2.6%)     -1.0% ( -5% -  3%)    0.167
   AndHighHighDayTaxoFacets          67.12  (4.2%)                    66.44  (4.6%)     -1.0% ( -9% -  8%)    0.471
   Wildcard                         379.66  (8.4%)                   376.74  (9.8%)     -0.8% (-17% - 19%)    0.790
   MedSpanNear                      227.24  (4.8%)                   225.86  (5.6%)     -0.6% (-10% - 10%)    0.715
   HighTermMonthSort               1811.33 (12.1%)                  1803.90 (13.4%)     -0.4% (-23% - 28%)    0.919
   AndHighMed                       820.66  (8.4%)                   817.55  (9.8%)     -0.4% (-17% - 19%)    0.895
   OrHighNotHigh                    757.11  (8.3%)                   754.48  (7.8%)     -0.3% (-15% - 17%)    0.892
   OrNotHighLow                    1757.90 (11.0%)                  1754.29  (8.9%)     -0.2% (-18% - 22%)    0.948
   MedTermDayTaxoFacets             148.12  (4.6%)                   148.00  (5.1%)     -0.1% ( -9% - 10%)    0.956
   PKLookup                         293.33  (4.4%)                   293.35  (2.9%)      0.0% ( -7% -  7%)    0.995
   Prefix3                          707.43 (14.9%)                   708.70 (12.0%)      0.2% (-23% - 31%)    0.967
   HighSpanNear                      75.55  (3.7%)                    75.81  (4.5%)      0.3% ( -7% -  8%)    0.793
   AndHighMedDayTaxoFacets          217.31  (6.1%)                   218.27  (5.5%)      0.4% (-10% - 12%)    0.808
   OrHighMedDayTaxoFacets            47.73  (4.8%)                    47.97  (3.9%)      0.5% ( -7% -  9%)    0.712
   range                           6721.40 (10.5%)                  6784.00  (9.0%)      0.9% (-16% - 22%)    0.763
   HighTermTitleSort                138.34  (5.1%)                   139.89  (4.1%)      1.1% ( -7% - 10%)    0.443
   OrHighMed                        677.06 (14.7%)                   687.35 (10.5%)      1.5% (-20% - 31%)    0.707
   HighIntervalsOrdered             119.78 (10.0%)                   121.82  (8.4%)      1.7% (-15% - 22%)    0.562
   OrHighLow                       1099.81  (6

Re: [I] Exploring GPU based kNN vector search [lucene]

2025-03-23 Thread via GitHub


alessandrobenedetti commented on issue #13003:
URL: https://github.com/apache/lucene/issues/13003#issuecomment-2743054765

   Does https://github.com/apache/lucene/issues/14243 supersede this one? Should we close it?





Re: [PR] Optimize ParallelLeafReader to improve term vector fetching efficienc [lucene]

2025-03-23 Thread via GitHub


vigyasharma commented on PR #14373:
URL: https://github.com/apache/lucene/pull/14373#issuecomment-2747063076

   Changes merged. Thanks @DivyanshIITB !





Re: [PR] Fix comment typo [lucene]

2025-03-23 Thread via GitHub


gf2121 merged PR #14392:
URL: https://github.com/apache/lucene/pull/14392





Re: [PR] Fix comment typo [lucene]

2025-03-23 Thread via GitHub


gf2121 commented on PR #14392:
URL: https://github.com/apache/lucene/pull/14392#issuecomment-2747016784

   Thanks @flat35hd99 !





[PR] Fix comment typo [lucene]

2025-03-23 Thread via GitHub


flat35hd99 opened a new pull request, #14392:
URL: https://github.com/apache/lucene/pull/14392

   ### Description
   
   
   





[PR] Pack file pointers when merging BKD trees [lucene]

2025-03-23 Thread via GitHub


iverase opened a new pull request, #14393:
URL: https://github.com/apache/lucene/pull/14393

   We are currently using long arrays to hold file pointers. These arrays can get pretty big when the number of points is large, which seems wasteful, especially since those file pointers are monotonically increasing. This commit proposes packing these arrays using existing Lucene algorithms to save some heap during merging and to avoid humongous allocations.
   
   relates https://github.com/apache/lucene/issues/14382
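
To illustrate why monotonically increasing file pointers pack well, here is a minimal sketch. It is not the code from this PR and does not use Lucene's packed-integer classes: `PackedMonotonicPointers` and its methods are hypothetical, written only to show the general technique. Each pointer is approximated by a straight line through the first and last values, and only the small deviation from that line is stored, using as few bits per value as the largest deviation needs.

```java
/** Hypothetical illustration of packing monotonically increasing file pointers. */
final class PackedMonotonicPointers {
  private final int size;
  private final long first;        // first pointer, kept in full
  private final long slope;        // average gap between consecutive pointers
  private final long minResidual;  // smallest deviation from the straight line
  private final int bitsPerValue;  // bits needed for the largest shifted deviation
  private final long[] blocks;     // deviations packed back to back

  PackedMonotonicPointers(long[] pointers) {
    size = pointers.length;
    first = size == 0 ? 0 : pointers[0];
    slope = size < 2 ? 0 : (pointers[size - 1] - first) / (size - 1);
    long min = 0;
    long max = 0;
    for (int i = 0; i < size; i++) {
      long residual = pointers[i] - (first + slope * i);
      min = Math.min(min, residual);
      max = Math.max(max, residual);
    }
    minResidual = min;
    bitsPerValue = 64 - Long.numberOfLeadingZeros(max - min);
    blocks = new long[(int) (((long) size * bitsPerValue + 63) >>> 6)];
    for (int i = 0; i < size; i++) {
      // Shift by the minimum so every stored value is non-negative.
      write(i, pointers[i] - (first + slope * i) - min);
    }
  }

  private void write(int index, long value) {
    if (bitsPerValue == 0) {
      return; // every pointer lies exactly on the line
    }
    long bitPos = (long) index * bitsPerValue;
    int block = (int) (bitPos >>> 6);
    int shift = (int) (bitPos & 63);
    blocks[block] |= value << shift;
    if (shift + bitsPerValue > 64) {
      // The value straddles two 64-bit blocks; spill the high bits.
      blocks[block + 1] |= value >>> (64 - shift);
    }
  }

  /** Returns the original pointer at the given index. */
  long get(int index) {
    long stored = 0;
    if (bitsPerValue > 0) {
      long bitPos = (long) index * bitsPerValue;
      int block = (int) (bitPos >>> 6);
      int shift = (int) (bitPos & 63);
      stored = blocks[block] >>> shift;
      if (shift + bitsPerValue > 64) {
        stored |= blocks[block + 1] << (64 - shift);
      }
      if (bitsPerValue < 64) {
        stored &= (1L << bitsPerValue) - 1;
      }
    }
    return first + slope * index + minResidual + stored;
  }

  int bitsPerValue() {
    return bitsPerValue;
  }
}
```

For the pointers `{100, 228, 356, 490, 612}` the average gap is 128 and the largest deviation from the line is 6, so this sketch needs 3 bits per value instead of the 64 bits per entry of a plain `long[]`.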
   





Re: [PR] Speed up advancing within a sparse block in IndexedDISI. [lucene]

2025-03-23 Thread via GitHub


vsop-479 commented on PR #14371:
URL: https://github.com/apache/lucene/pull/14371#issuecomment-2747066080

   Maybe I should measure it with `DVBench` in luceneutil, or add a JMH benchmark.





Re: [PR] Optimize ParallelLeafReader to improve term vector fetching efficienc [lucene]

2025-03-23 Thread via GitHub


vigyasharma merged PR #14373:
URL: https://github.com/apache/lucene/pull/14373





Re: [PR] Optimize ParallelLeafReader to improve term vector fetching efficienc [lucene]

2025-03-23 Thread via GitHub


vigyasharma commented on PR #14373:
URL: https://github.com/apache/lucene/pull/14373#issuecomment-2745914617

   > I successfully ran `./gradlew tidy` and the build was successful.
   
   GitHub build is still failing on spotless (formatting). `tidy` will change and reformat offending files for you; you need to commit and push those changes.

