Re: [PR] Fix StoredFieldsConsumer finish [lucene]

2024-10-21 Thread via GitHub


linfn commented on PR #13927:
URL: https://github.com/apache/lucene/pull/13927#issuecomment-2426869427

   @jpountz Done. Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Move BooleanScorer to work on top of Scorers rather than BulkScorers. [lucene]

2024-10-21 Thread via GitHub


jpountz commented on PR #13931:
URL: https://github.com/apache/lucene/pull/13931#issuecomment-2425831201

   I could confirm the speedup on a different machine:
   
   ```
                     Task  QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff                   p-value
      LowIntervalsOrdered          9.98   (4.7%)                     9.61   (6.7%)     -3.7%  ( -14% -    8%)  0.044
     HighIntervalsOrdered          2.34   (4.6%)                     2.26   (5.1%)     -3.1%  ( -12% -    6%)  0.043
      MedIntervalsOrdered         11.57   (4.6%)                    11.30   (4.1%)     -2.3%  ( -10% -    6%)  0.090
                CountTerm      13086.97   (4.6%)                 12820.41   (8.0%)     -2.0%  ( -13% -   11%)  0.324
                   IntNRQ        250.03   (8.7%)                   245.69  (10.8%)     -1.7%  ( -19% -   19%)  0.576
     HighTermTitleBDVSort         34.20   (2.9%)                    33.67   (4.0%)     -1.5%  (  -8% -    5%)  0.164
                  LowTerm       1671.38   (3.3%)                  1650.93   (5.6%)     -1.2%  (  -9% -    7%)  0.395
                   Fuzzy2        115.68   (1.4%)                   114.62   (2.2%)     -0.9%  (  -4% -    2%)  0.109
                   Fuzzy1        157.55   (1.7%)                   156.19   (2.0%)     -0.9%  (  -4% -    2%)  0.146
              MedSpanNear          4.34   (2.6%)                     4.30   (3.3%)     -0.8%  (  -6% -    5%)  0.402
              CountPhrase          5.18   (7.2%)                     5.14   (5.7%)     -0.8%  ( -12% -   13%)  0.703
                 PKLookup        457.36   (3.7%)                   454.41   (4.1%)     -0.6%  (  -8% -    7%)  0.603
              LowSpanNear          7.29   (2.6%)                     7.25   (3.0%)     -0.6%  (  -6% -    5%)  0.522
             HighSpanNear          9.02   (1.7%)                     8.97   (2.0%)     -0.5%  (  -4% -    3%)  0.402
         CountAndHighHigh         58.51   (1.3%)                    58.24   (1.4%)     -0.5%  (  -3% -    2%)  0.285
                  Respell         97.94   (1.8%)                    97.49   (1.9%)     -0.5%  (  -4% -    3%)  0.433
    HighTermDayOfYearSort       2991.77   (4.0%)                  2978.80   (4.7%)     -0.4%  (  -8% -    8%)  0.754
               TermDTSort        646.59   (2.4%)                   643.93   (3.2%)     -0.4%  (  -5% -    5%)  0.645
       Or2Terms2StopWords        422.30   (3.2%)                   420.68   (2.3%)     -0.4%  (  -5% -    5%)  0.663
               OrHighHigh         74.97   (3.1%)                    74.70   (2.2%)     -0.4%  (  -5% -    5%)  0.661
        HighTermMonthSort       1987.11   (2.0%)                  1981.26   (2.4%)     -0.3%  (  -4% -    4%)  0.673
          CountAndHighMed        169.81   (1.6%)                   169.32   (1.6%)     -0.3%  (  -3% -    2%)  0.567
        HighTermTitleSort        205.08   (2.9%)                   204.55   (3.1%)     -0.3%  (  -6% -    6%)  0.788
                OrHighLow       1389.51   (3.7%)                  1386.35   (4.2%)     -0.2%  (  -7% -    8%)  0.857
                OrHighMed        199.35   (3.2%)                   198.92   (2.4%)     -0.2%  (  -5% -    5%)  0.807
      And2Terms2StopWords        389.56   (4.1%)                   388.90   (3.9%)     -0.2%  (  -7% -    8%)  0.893
                  Prefix3        537.90   (2.6%)                   537.19   (1.9%)     -0.1%  (  -4% -    4%)  0.856
               HighPhrase         78.58   (3.8%)                    78.56   (2.9%)     -0.0%  (  -6% -    6%)  0.986
               AndHighLow       1526.88   (3.6%)                  1527.75   (4.2%)      0.1%  (  -7% -    8%)  0.963
              OrStopWords         47.00   (4.8%)                    47.07   (3.6%)      0.2%  (  -7% -    8%)  0.904
                And3Terms        274.37   (4.3%)                   274.95   (3.4%)      0.2%  (  -7% -    8%)  0.863
                 HighTerm        894.11   (6.6%)                   897.19   (9.1%)      0.3%  ( -14% -   17%)  0.890
                 Or3Terms        218.85   (4.2%)                   219.80   (3.1%)      0.4%  (  -6% -    8%)  0.712
          MedSloppyPhrase        115.99   (3.0%)                   116.54   (2.4%)      0.5%  (  -4% -    5%)  0.580
                  MedTerm        902.19   (5.9%)                   906.44   (8.7%)      0.5%  ( -13% -   15%)  0.841
             AndStopWords         40.54   (4.0%)                    40.79   (2.7%)      0.6%  (  -5% -    7%)  0.566
              AndHighHigh         95.06   (3.5%)                    95.75   (3.0%)      0.7%  (  -5% -    7%)  0.483
               AndHighMed        216.82   (3.6%)                   218.42   (3.2%)      0.7%  (  -5% -    7%)  0.488
                LowPhrase         39.04   (2.7%)                    39.34   (2.1%)      0.8%  (  -3% -    5%)  0.318
                 Wildcard        100.42   (4.4%)                   101.19   (3.9%)      0.8%  (  -7% -    9%)  0.562
                MedPhrase         52.82   (3.2%)                    53.31   (2.6%)      0.9%  (  -

[I] Move vector search from IndexInput to RandomAccessInput [lucene]

2024-10-21 Thread via GitHub


jpountz opened a new issue, #13938:
URL: https://github.com/apache/lucene/issues/13938

   ### Description
   
   Vector search currently loads vectors from disk by issuing a `seek()` 
followed by a `readFloats()`. We should instead:
- Add an absolute `readFloats()` method to `RandomAccessInput`
- Refactor the latest vector search file format to use `RandomAccessInput` 
instead of `IndexInput` to read vectors from disk.
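
The difference between the two access styles can be sketched with `java.nio.ByteBuffer`, used here only as an analogy: this is not Lucene's `IndexInput` or `RandomAccessInput` API, and the class and method names below are made up for illustration. A relative read first moves a shared position (the `seek()` step), while an absolute read takes the offset as an argument and leaves the position alone.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: ByteBuffer stands in for an on-disk vector file.
final class AbsoluteReadSketch {

  /** Stateful style: move the position, then read relative to it. */
  static float[] seekThenRead(ByteBuffer in, int byteOffset, int count) {
    in.position(byteOffset); // the "seek": mutates shared state
    float[] dst = new float[count];
    for (int i = 0; i < count; i++) {
      dst[i] = in.getFloat(); // relative read, advances the position
    }
    return dst;
  }

  /** Absolute style: the offset is a parameter, the position is untouched. */
  static float[] readAbsolute(ByteBuffer in, int byteOffset, int count) {
    float[] dst = new float[count];
    for (int i = 0; i < count; i++) {
      dst[i] = in.getFloat(byteOffset + i * Float.BYTES); // absolute read
    }
    return dst;
  }

  public static void main(String[] args) {
    ByteBuffer data = ByteBuffer.allocate(4 * Float.BYTES);
    for (int i = 0; i < 4; i++) {
      data.putFloat(i * Float.BYTES, i + 0.5f);
    }
    float[] a = seekThenRead(data, 2 * Float.BYTES, 2);
    System.out.println("position after seek+read: " + data.position());
    data.position(0);
    float[] b = readAbsolute(data, 2 * Float.BYTES, 2);
    System.out.println("position after absolute read: " + data.position());
    System.out.println(a[0] == b[0] && a[1] == b[1]);
  }
}
```

Because the absolute variant does not touch the position, callers do not need to coordinate around it, which is presumably part of the appeal of the proposal.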





[PR] Reduce the compiled size of the collect() method on `TopScoreDocCollector`. [lucene]

2024-10-21 Thread via GitHub


jpountz opened a new pull request, #13939:
URL: https://github.com/apache/lucene/pull/13939

   This comes from observations on https://tantivy-search.github.io/bench/ for 
exhaustive evaluation like `TOP_100_COUNT`. `collect()` is often inlined, but 
other methods that we'd like to see inlined like `PostingsEnum#nextDoc()` are 
not always inlined. This PR decreases the compiled size of `collect()` to make 
more room for other methods to be inlined.
   
   It does so by moving an assertion to `AssertingScorable` and extracting an 
uncommon code path to a method.
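
As a rough illustration of the second technique (extracting an uncommon code path), here is a minimal sketch. It is not the actual diff: `SmallCollectSketch` and its members are hypothetical, and it tracks bare scores rather than documents. The idea is that the hot method stays small in bytecode terms because the rarely taken branch lives in its own method.

```java
import java.util.PriorityQueue;

// Hypothetical sketch of a top-k score collector with a small hot path.
final class SmallCollectSketch {
  private final PriorityQueue<Float> topScores = new PriorityQueue<>(); // min-heap
  private final int k;
  private int totalHits;
  private float minCompetitiveScore = Float.NEGATIVE_INFINITY;

  SmallCollectSketch(int k) {
    this.k = k;
  }

  /** Hot path: a counter increment, one comparison and an early return. */
  void collect(float score) {
    totalHits++;
    if (score <= minCompetitiveScore) {
      return; // common case once the queue is full: hit is not competitive
    }
    collectCompetitiveHit(score);
  }

  /** Uncommon path, extracted so its bytecode does not count toward collect(). */
  private void collectCompetitiveHit(float score) {
    topScores.add(score);
    if (topScores.size() > k) {
      topScores.poll(); // drop the weakest score
    }
    if (topScores.size() == k) {
      minCompetitiveScore = topScores.peek();
    }
  }

  int totalHits() {
    return totalHits;
  }

  float minCompetitiveScore() {
    return minCompetitiveScore;
  }

  public static void main(String[] args) {
    SmallCollectSketch collector = new SmallCollectSketch(2);
    for (float s : new float[] {5f, 1f, 3f, 4f, 2f}) {
      collector.collect(s);
    }
    System.out.println(collector.totalHits() + " hits, min competitive score "
        + collector.minCompetitiveScore());
  }
}
```

Whether this changes inlining decisions in practice depends on the JVM's heuristics, so any effect would need to be measured.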





Re: [PR] Include java21 source folders to gradle source sets [lucene]

2024-10-21 Thread via GitHub


javanna commented on PR #13926:
URL: https://github.com/apache/lucene/pull/13926#issuecomment-2426005143

   Yes @dweiss, it is indeed complicated. I have tried manually and have not 
succeeded yet.





Re: [PR] Check ahead if we can get the count [lucene]

2024-10-21 Thread via GitHub


LuXugang commented on PR #13899:
URL: https://github.com/apache/lucene/pull/13899#issuecomment-2426154458

   > The logic makes sense to me but it's a bit hard to read, could we avoid 
touching `getDocIdSetIteratorOrNull` and only have new logic in the 
`Weight#count` impl?
   
   Thank you for your feedback, @jpountz! I really appreciate your suggestion. 
I’ve made the changes as you recommended.





Re: [PR] Speedup OrderIntervalsSource some more [lucene]

2024-10-21 Thread via GitHub


original-brownbear commented on code in PR #13937:
URL: https://github.com/apache/lucene/pull/13937#discussion_r1808845516


##
lucene/queries/src/java/org/apache/lucene/queries/intervals/OrderedIntervalsSource.java:
##
@@ -161,8 +163,8 @@ public int nextInterval() throws IOException {
 final int end = last.end();
 this.end = end;
 int slop = end - start + 1;
-for (IntervalIterator subIterator : subIterators) {
-  slop -= subIterator.width();
+for (int j = 0; j < subIterators.size(); j++) {

Review Comment:
   Done :)
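
For readers without the full diff at hand, the loop change in the hunk above can be sketched as follows. The hunk only shows the new loop header, so the loop body here (`subs.get(j).width()`) is an assumption, and `Widthed` is a hypothetical stand-in for `IntervalIterator`. Both variants compute the same slop; the indexed form avoids creating an `Iterator`, which is only a sensible trade for random-access lists.

```java
import java.util.List;

// Hypothetical sketch: enhanced-for vs. indexed loop over a random-access list.
final class SlopLoopSketch {

  /** Stand-in for an interval iterator; only width() matters here. */
  interface Widthed {
    int width();
  }

  static int slopWithIterator(int start, int end, List<Widthed> subs) {
    int slop = end - start + 1;
    for (Widthed sub : subs) { // allocates an Iterator unless the JIT removes it
      slop -= sub.width();
    }
    return slop;
  }

  static int slopWithIndex(int start, int end, List<Widthed> subs) {
    int slop = end - start + 1;
    for (int j = 0; j < subs.size(); j++) { // no Iterator object involved
      slop -= subs.get(j).width();
    }
    return slop;
  }

  public static void main(String[] args) {
    List<Widthed> subs = List.of(() -> 2, () -> 3);
    System.out.println(slopWithIterator(0, 9, subs) == slopWithIndex(0, 9, subs));
  }
}
```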






Re: [PR] Simplify PForUtil construction and cleanup its code gen a little [lucene]

2024-10-21 Thread via GitHub


original-brownbear commented on PR #13932:
URL: https://github.com/apache/lucene/pull/13932#issuecomment-2426308076

   @jpountz 🤦‍♂️ I noticed this too but kept attributing this to CPU savings 
helping JIT and the like on my weaker benchmark box (feeling really clever 
about myself) ... but now that you've asked I figured I'd double check... 
https://github.com/mikemccand/luceneutil/pull/307 ... very sorry about that I 
guess I should update a couple of my PR descriptions ... 





Re: [PR] Simplify PForUtil construction and cleanup its code gen a little [lucene]

2024-10-21 Thread via GitHub


original-brownbear commented on PR #13932:
URL: https://github.com/apache/lucene/pull/13932#issuecomment-2426635537

   We're also too confident in the results, I think: 
https://github.com/mikemccand/luceneutil/pull/308





Re: [PR] Simplify PForUtil construction and cleanup its code gen a little [lucene]

2024-10-21 Thread via GitHub


original-brownbear commented on PR #13932:
URL: https://github.com/apache/lucene/pull/13932#issuecomment-2426636569

   That said :) thanks Adrien, merging :)





Re: [PR] Simplify PForUtil construction and cleanup its code gen a little [lucene]

2024-10-21 Thread via GitHub


original-brownbear merged PR #13932:
URL: https://github.com/apache/lucene/pull/13932





Re: [PR] Use RandomAccessInput instead of seeking in Lucene90DocValuesProducer [lucene]

2024-10-21 Thread via GitHub


original-brownbear commented on PR #13894:
URL: https://github.com/apache/lucene/pull/13894#issuecomment-2426662403

   I think we simply underestimate the variance in luceneutil, which results in 
p-values that are too low. See 
https://github.com/mikemccand/luceneutil/pull/308 for a suggested fix. The QPS 
have since recovered from the tiny dip seen here.
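
To make the link between variance and p-values concrete, here is a small sketch. It assumes a Welch-style t statistic purely for illustration; I have not checked which test luceneutil actually uses, and the numbers are invented. If the standard deviation is underestimated by a factor of two, the statistic doubles, and a larger statistic corresponds to a smaller p-value.

```java
// Hypothetical sketch: effect of an underestimated standard deviation on a t statistic.
final class VarianceSketch {

  /** Welch t statistic for two samples summarized by mean, standard deviation and size. */
  static double welchT(double mean1, double sd1, int n1, double mean2, double sd2, int n2) {
    double standardError = Math.sqrt(sd1 * sd1 / n1 + sd2 * sd2 / n2);
    return (mean1 - mean2) / standardError;
  }

  public static void main(String[] args) {
    // Made-up QPS numbers: same means, but the second call halves the std dev.
    double tWithTrueSd = welchT(100.0, 2.0, 20, 99.0, 2.0, 20);
    double tWithUnderestimatedSd = welchT(100.0, 1.0, 20, 99.0, 1.0, 20);
    System.out.println("t (true sd): " + tWithTrueSd);
    System.out.println("t (underestimated sd): " + tWithUnderestimatedSd);
  }
}
```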





Re: [PR] Reduce the compiled size of the collect() method on `TopScoreDocCollector`. [lucene]

2024-10-21 Thread via GitHub


jpountz commented on PR #13939:
URL: https://github.com/apache/lucene/pull/13939#issuecomment-2426505269

   For reference, luceneutil shows no difference.





[I] Look into ACORN-1, or another algorithm to aid in filtered HNSW search [lucene]

2024-10-21 Thread via GitHub


benwtrent opened a new issue, #13940:
URL: https://github.com/apache/lucene/issues/13940

   ### Description
   
   Lucene already does OK in filtered kNN search, but it can be better. 
   
   An interesting paper in this area: https://arxiv.org/abs/2403.04871
   
   Weaviate has implemented this paper: 
https://github.com/weaviate/weaviate/pull/5369
   
   
   The key idea is a multi-expansion search of the graph. Instead of a pure 
fan-out that only looks at the current neighborhood, the neighborhoods of 
neighbors are explored as well, regardless of whether you have collected all 
the candidates or not.
   
   This does allow some nice properties and honestly, doesn't seem that 
difficult to implement.
   
   I would ignore the ACORN-γ part of the paper and focus on ACORN-1. 
   
   I think for graph construction and storage, building from quantized 
estimations & potentially bi-partite graph organization of the nodes would be 
overall better. 
   
   But, this "explore the next hop" thing does seem nice. 
   
   
   Also, our recent connectivity improvements will only make filtered search 
better.
   
   
   One additional thought, I wonder if we should also allow more than one entry 
point into the bottom layer with filtered search?
   
   Honestly, all this optimization tuning can get tricky as you consider the 
filtering percentages (did the user filter out only 1% of the docs or 80% of 
them).
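
The "explore the next hop" idea described above can be sketched roughly as follows. This is my reading of the description, not the ACORN-1 algorithm as specified in the paper and not Lucene's HNSW searcher: there are no distance computations, no visited set and no candidate queue, and all names are hypothetical. It only shows how second-hop expansion can surface filter-passing nodes that a one-hop fan-out would miss.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.IntPredicate;

// Hypothetical sketch of one expansion step with a second hop under a filter.
final class TwoHopExpansionSketch {

  /**
   * Returns the filter-passing nodes reachable from node in one or two hops.
   * graph[i] holds the neighbor ids of node i.
   */
  static List<Integer> expand(int[][] graph, int node, IntPredicate filter) {
    Set<Integer> candidates = new LinkedHashSet<>();
    for (int neighbor : graph[node]) {
      if (filter.test(neighbor)) {
        candidates.add(neighbor);
      }
      // Look at the neighbor's own neighborhood whether or not it passed the filter.
      for (int secondHop : graph[neighbor]) {
        if (secondHop != node && filter.test(secondHop)) {
          candidates.add(secondHop);
        }
      }
    }
    return new ArrayList<>(candidates);
  }

  public static void main(String[] args) {
    int[][] graph = {{1, 2}, {0, 3}, {0, 4}, {1}, {2}};
    // Neither direct neighbor of node 0 passes this filter, but second hops do.
    System.out.println(expand(graph, 0, id -> id >= 3));
  }
}
```

With a restrictive filter, a one-hop expansion from node 0 in this toy graph would yield no candidates at all, which is the situation the issue is concerned with.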





Re: [PR] Speedup PriorityQueue a little [lucene]

2024-10-21 Thread via GitHub


mikemccand commented on code in PR #13936:
URL: https://github.com/apache/lucene/pull/13936#discussion_r1808871571


##
lucene/core/src/java/org/apache/lucene/util/PriorityQueue.java:
##
@@ -117,7 +117,8 @@ public PriorityQueue(int maxSize, Supplier 
sentinelObjectSupplier) {
* ArrayIndexOutOfBoundsException} is thrown.
*/
   public void addAll(Collection elements) {
-if (this.size + elements.size() > this.maxSize) {
+int size = this.size;

Review Comment:
   Could you add comments explaining that the local variable assignment is done 
on purpose for performance reasons?  We don't want a future refactoring to 
"simplify" this code and cut back to `this.size`.



##
lucene/core/src/java/org/apache/lucene/util/PriorityQueue.java:
##
@@ -283,7 +293,7 @@ private final boolean upHeap(int origPos) {
 return i != origPos;
   }
 
-  private final void downHeap(int i) {
+  private void downHeap(int i, T[] heap, int size) {

Review Comment:
   Why are we removing `final` on `upHeap` and `downHeap`?  Does that somehow 
help performance?



##
lucene/core/src/java/org/apache/lucene/util/PriorityQueue.java:
##
@@ -270,7 +280,7 @@ public final boolean remove(T element) {
 return false;
   }
 
-  private final boolean upHeap(int origPos) {
+  private boolean upHeap(int origPos, T[] heap) {

Review Comment:
   > Whether or not it's worth doing this kind of optimization for the observed 
gain is a tricky question
   
   We've done such optimizations in the past for very hot hotspots in Lucene, 
e.g. `readVInt`, all the carefully gen'd code for decoding `int[]` blocks in 
different bit widths, etc.  But it clearly is a tricky judgement call in each 
case...
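
For context, the pattern under review (copying fields into locals and passing the heap array and size to the sift helpers) can be sketched on a toy `int` min-heap. This is not the code from the PR: the class below is hypothetical, and it includes the kind of explanatory comment requested in the first review comment. Whether the local copies actually help is something to benchmark rather than assume.

```java
import java.util.Arrays;

// Hypothetical sketch: a 1-based int min-heap whose helpers take the array as a parameter.
final class IntMinHeapSketch {
  private int[] heap = new int[16]; // heap[0] is unused
  private int size;

  void add(int value) {
    // Fields are copied to locals on purpose, for performance: do not
    // "simplify" this back to this.heap / this.size in a later refactoring.
    int[] heap = this.heap;
    int size = this.size + 1;
    if (size == heap.length) {
      heap = this.heap = Arrays.copyOf(heap, heap.length * 2);
    }
    heap[size] = value;
    this.size = size;
    upHeap(size, heap);
  }

  int pop() {
    int[] heap = this.heap; // local copies, same reasoning as in add()
    int size = this.size;
    if (size == 0) {
      throw new IllegalStateException("empty heap");
    }
    int top = heap[1];
    heap[1] = heap[size];
    size--;
    this.size = size;
    if (size > 0) {
      downHeap(1, heap, size);
    }
    return top;
  }

  private static void upHeap(int pos, int[] heap) {
    int node = heap[pos];
    int parent = pos >>> 1;
    while (parent > 0 && node < heap[parent]) {
      heap[pos] = heap[parent]; // shift the parent down
      pos = parent;
      parent = pos >>> 1;
    }
    heap[pos] = node;
  }

  private static void downHeap(int pos, int[] heap, int size) {
    int node = heap[pos];
    int child = pos << 1;
    while (child <= size) {
      if (child < size && heap[child + 1] < heap[child]) {
        child++; // pick the smaller of the two children
      }
      if (heap[child] >= node) {
        break;
      }
      heap[pos] = heap[child]; // shift the child up
      pos = child;
      child = pos << 1;
    }
    heap[pos] = node;
  }

  public static void main(String[] args) {
    IntMinHeapSketch h = new IntMinHeapSketch();
    for (int v : new int[] {5, 1, 4, 2, 3}) {
      h.add(v);
    }
    System.out.println(h.pop() + " " + h.pop() + " " + h.pop());
  }
}
```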






Re: [PR] Add BaseKnnVectorsFormatTestCase.testRecall() and fix old codecs [lucene]

2024-10-21 Thread via GitHub


msokolov commented on PR #13910:
URL: https://github.com/apache/lucene/pull/13910#issuecomment-2427329494

   > Since Lucene90 didn't support sparse vector values, I am not sure this is 
strictly necessary. But I can understand it from a consistency standpoint.
   
   After reflection, I don't think this is true. We always supported sparse 
vector indexes - I don't see how we could have avoided it, really. It seems to 
me this bug was introduced  
[here](https://github.com/apache/lucene/commit/a65cf8960a1057d98126256d1610292ad5c8f1b3#diff-b75a9bbd95dd9708267528a08f3d0b1093f74feb179f03f16f8dfb1857930077L271)
 and it was released as part of 9.8, meaning that releases 9.8 and later in the 
9.x series will be able to read indexes produced in 9.0 but will give 
meaningless results for HNSW searches over those indexes. This seems like 
something we maybe ought to make the user community aware of.





Re: [PR] Include java21 source folders to gradle source sets [lucene]

2024-10-21 Thread via GitHub


dweiss commented on PR #13926:
URL: https://github.com/apache/lucene/pull/13926#issuecomment-2427318916

   Also - this basically adds syntax highlighting and suggestions, forget about 
running tests with these classes - I don't think it'll work from the IDE.





Re: [PR] Speedup OrderIntervalsSource some more [lucene]

2024-10-21 Thread via GitHub


original-brownbear commented on PR #13937:
URL: https://github.com/apache/lucene/pull/13937#issuecomment-2426878304

   Thanks Adrien!





Re: [PR] Speedup OrderIntervalsSource some more [lucene]

2024-10-21 Thread via GitHub


original-brownbear merged PR #13937:
URL: https://github.com/apache/lucene/pull/13937





Re: [PR] Include java21 source folders to gradle source sets [lucene]

2024-10-21 Thread via GitHub


dweiss commented on PR #13926:
URL: https://github.com/apache/lucene/pull/13926#issuecomment-2427276085

   It is complicated also because there is some trickery in how Lucene compiles 
against those preview APIs - we don't use the preview option but instead fool 
the compiler into thinking these APIs are not in preview-mode (so that they can 
be used without the preview flag at runtime). It is definitely not something 
IDEs will easily digest.
   
   I managed to get the compilation working in IntelliJ using IntelliJ 
compilation mode, followed by manual tweaks. Sorry for lame Windows paths, but 
maybe it'll be helpful -
   
   1) add the right sources to the main21 module -
   
   
![image](https://github.com/user-attachments/assets/7b903682-120e-455a-b55d-ba9178f3571b)
   
   and redirect its output to a separate folder:
   
   
![image](https://github.com/user-attachments/assets/7baf3ac2-c5b6-4bb2-89c8-870ba7a85b5b)
   
   2) add custom compiler options to the java21 module:
   
   
![image](https://github.com/user-attachments/assets/4d63bc61-f0fa-4b74-954e-0263cd325199)
   
   rebuild and the sources in main21 will compile cleanly.
   
   
![image](https://github.com/user-attachments/assets/c9fba3ee-0ad5-4826-b44a-4ce6932f3fbf)
   
   Sadly, any time you re-import the project from gradle, this configuration 
tweak will likely be destroyed. For one-time use it may be an option though.





Re: [PR] Fix StoredFieldsConsumer finish [lucene]

2024-10-21 Thread via GitHub


jpountz merged PR #13927:
URL: https://github.com/apache/lucene/pull/13927





Re: [PR] Move BooleanScorer to work on top of Scorers rather than BulkScorers. [lucene]

2024-10-21 Thread via GitHub


jpountz merged PR #13931:
URL: https://github.com/apache/lucene/pull/13931





Re: [PR] Add a Better Binary Quantizer (RaBitQ) format for dense vectors [lucene]

2024-10-21 Thread via GitHub


benwtrent commented on PR #13651:
URL: https://github.com/apache/lucene/pull/13651#issuecomment-2427907718

   Here is some Lucene Util Benchmarking. Some of these numbers actually 
contradict some of my previous benchmarking for int4. Which is frustrating, I 
wonder what I did wrong then or now. Or if float32 got faster between then and 
now :)
   
   Regardless, this shows that bit quantization is generally as fast as int4 
search or faster and you can get good recall with some oversampling. Combined 
with the 32x reduction in space, it's pretty nice.
   
   The oversampling rates were `[1, 1.5, 2, 3, 4, 5]`. HNSW params 
`m=16,efsearch=100`. `Recall@100`.
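
The 32x figure follows from going from 32 bits per dimension (float32) to 1 bit per dimension. A small sketch of the arithmetic, counting only raw vector data; the helper name is made up, and the "Mem Required" columns in the tables presumably also cover data beyond the raw vectors, which this ignores:

```java
// Hypothetical sketch: raw vector storage at different bits per dimension.
final class QuantizedMemorySketch {

  /** Bytes for the vector data alone, ignoring graph and per-vector metadata. */
  static long vectorBytes(long numVectors, int dims, int bitsPerDim) {
    return numVectors * dims * bitsPerDim / 8;
  }

  public static void main(String[] args) {
    long n = 1_000_000L; // 1M vectors
    int dims = 1024;     // dimensionality quoted for Cohere v3
    long raw = vectorBytes(n, dims, 32);
    long oneBit = vectorBytes(n, dims, 1);
    System.out.println("float32 bytes: " + raw);
    System.out.println("1 bit bytes:   " + oneBit);
    System.out.println("ratio:         " + (raw / oneBit));
  }
}
```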
   
   ## Cohere v2 1M
   
   | quantization      | Index Time | Force Merge time | Mem Required |
   |-------------------|------------|------------------|--------------|
   | 1 bit             | 395.18     | 411.67           | 175.9MB      |
   | 4 bit (compress)  | 1877.47    | 491.13           | 439.7MB      |
   | 7 bit             | 500.59     | 820.53           | 833.9MB      |
   | raw               | 493.44     | 792.04           | 3132.8MB     |
   
   
![cohere-v2-bit-1M](https://github.com/user-attachments/assets/0e704dc7-d4a2-4f4a-98ca-3b23641cd4e9)
   
   ## Cohere v3 1M
   
   1M Cohere v3 1024
   
   | quantization      | Index Time | Force Merge time | Mem Required |
   |-------------------|------------|------------------|--------------|
   | 1 bit             | 338.97     | 342.61           | 208MB        |
   | 4 bit (compress)  | 1113.06    | 5490.36          | 578MB        |
   | 7 bit             | 437.63     | 744.12           | 1094MB       |
   | raw               | 408.75     | 798.11           | 4162MB       |
   
   
   
![cohere-v3-bit-1M](https://github.com/user-attachments/assets/94a25ce5-4cbf-4a7a-b07b-052ff730c9ca)
   
   ## e5Small
   
   | quantization      | Index Time | Force Merge time | Mem Required |
   |-------------------|------------|------------------|--------------|
   | 1 bit             | 161.84     | 42.37            | 57.6MB       |
   | 4 bit (compress)  | 665.54     | 660.33           | 123.2MB      |
   | 7 bit             | 267.13     | 89.99            | 219.6MB      |
   | raw               | 249.26     | 77.81            | 793.5MB      |
   
   
![e5small-bit-500k](https://github.com/user-attachments/assets/d649a54b-9da6-454f-9d9b-f7ff2b53ac78)
   

