Re: [PR] Avoid allocating liveDocs when no soft-deletes [lucene]

2024-10-11 Thread via GitHub


jpountz commented on code in PR #13895:
URL: https://github.com/apache/lucene/pull/13895#discussion_r1797605709


##
lucene/core/src/java/org/apache/lucene/index/PendingSoftDeletes.java:
##
@@ -76,15 +76,14 @@ void onNewReader(CodecReader reader, SegmentCommitInfo 
info) throws IOException
 hardDeletes.onNewReader(reader, info);
 // only re-calculate this if we haven't seen this generation
 if (dvGeneration < info.getDocValuesGen()) {
-  final DocIdSetIterator iterator =
-  FieldExistsQuery.getDocValuesDocIdSetIterator(field, reader);
-  int newDelCount;
-  if (iterator
-  != null) { // nothing is deleted we don't have a soft deletes field 
in this segment
-assert info.info.maxDoc() > 0 : "maxDoc is 0";
+  final int newDelCount;
+  var iterator = FieldExistsQuery.getDocValuesDocIdSetIterator(field, 
reader);
+  if (iterator != null && iterator.nextDoc() != 
DocIdSetIterator.NO_MORE_DOCS) {
+iterator = FieldExistsQuery.getDocValuesDocIdSetIterator(field, 
reader);

Review Comment:
   I wonder if we could change `applySoftDeletes` to start with the current doc 
ID so that we don't need to re-create an iterator here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Use growNoCopy and setInt in Util#toIntsRef. [lucene]

2024-10-11 Thread via GitHub


vsop-479 opened a new pull request, #13889:
URL: https://github.com/apache/lucene/pull/13889

   ### Description
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[I] Check ahead of time if the `count` can be obtained [lucene]

2024-10-11 Thread via GitHub


LuXugang opened a new issue, #13890:
URL: https://github.com/apache/lucene/issues/13890

   ### Description
   
   Currently, we traverse the `BKD` tree or perform a binary search using 
`DocValues` first, and then check whether the count can be obtained in the 
`count()` method of `IndexSortSortedNumericDocValuesRangeQuery`. 
   
   Should we consider providing a mechanism to perform this check beforehand,  
avoid unnecessary processing when dealing with a dense range.?   @jpountz 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[I] Create a community metrics dashboard [lucene]

2024-10-11 Thread via GitHub


stefanvodita opened a new issue, #13896:
URL: https://github.com/apache/lucene/issues/13896

   ### Description
   
   Beam has a rich community metrics 
[dashboard](http://35.193.202.176/d/code_velocity/code-velocity?orgId=1) 
([discussion](https://lists.apache.org/thread/0mqo7jfz0ooc1bzvggvw6h0gdlmbd1kl)).
 Would it help to have something similar?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[I] Create a bot to check if there is a CHANGES entry for new PRs [lucene]

2024-10-11 Thread via GitHub


stefanvodita opened a new issue, #13898:
URL: https://github.com/apache/lucene/issues/13898

   ### Description
   
   We always have to remind ourselves and others to add a CHANGES entry. We can 
probably write a GitHub action to check for changes to that file on every new 
PR and leave a comment warning the author if that's missing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Lazy initialize ForDeltaUtil and ForUtil in Lucene912PostingsReader [lucene]

2024-10-11 Thread via GitHub


original-brownbear commented on PR #13885:
URL: https://github.com/apache/lucene/pull/13885#issuecomment-2407269525

   Hmm I tried and failed miserably to find a solution to initialize in 
`reset()` that I fully understand :) It's quite hard to reason about all the 
possible paths here for me, but I see your point.
   As far as specialization goes, that one I could find a version that didn't 
show runtime regressions (because callsites + polymorphism).
   Maybe do a little cleanup along the lines of 
https://github.com/apache/lucene/pull/13892 first then get back to this? :)
   It's hard to fix this without complicating the code, this is the simplest 
version I could find but I see your points. I bet if we clean up all the 
duplication that we can cleanup without runtime overhead, we'll find an easier 
fix here :) wdyt?
   (this one also isn't so urgent, it's very few cycles runtime, mostly a p100 
thing for JVMs under GC pressure probably)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Simplify leaf slice calculation [lucene]

2024-10-11 Thread via GitHub


original-brownbear opened a new pull request, #13893:
URL: https://github.com/apache/lucene/pull/13893

   No need to go through the indirection of 2 wrapped functions, just put the 
logic in plain methods. Also, we can just outright set the field if there's no 
executor.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Optimize slice calculation in IndexSearcher a little [lucene]

2024-10-11 Thread via GitHub


original-brownbear commented on code in PR #13860:
URL: https://github.com/apache/lucene/pull/13860#discussion_r1797010600


##
lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java:
##
@@ -540,7 +523,43 @@ public int count(Query query) throws IOException {
* @lucene.experimental
*/
   public final LeafSlice[] getSlices() {
-return leafSlicesSupplier.get();
+LeafSlice[] res = leafSlices;
+if (res == null) {
+  res = computeAndCacheSlices();
+}
+return res;
+  }
+
+  private synchronized LeafSlice[] computeAndCacheSlices() {

Review Comment:
   Sure thing, finally got around to it in 
https://github.com/apache/lucene/pull/13893 :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] PR 13757 follow-up: add missing with-discountOverlaps Similarity constructor variants, CHANGES.txt entries (#13845) [lucene]

2024-10-11 Thread via GitHub


cpoerschke opened a new pull request, #13891:
URL: https://github.com/apache/lucene/pull/13891

   (cherry picked from commit dab731175c86cc25741f37845136d38a69a9d165)
   
   Resolved Conflicts:
lucene/CHANGES.txt
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Dry up both the two ImpactsEnum Implementation ins Lucene912PostingsReader [lucene]

2024-10-11 Thread via GitHub


original-brownbear opened a new pull request, #13892:
URL: https://github.com/apache/lucene/pull/13892

   These two share a lot of code, in particular the impacts implementation is 
100% identical. We can save a lot of code and potentially some cycles for 
method invocations by drying things up. The changes are just mechanical field 
movements with the following exceptions:
   
   1. One of the two implementations was using a bytes ref builder, one a bytes 
ref for holding the serialized impacts. The bytesref variant is faster so I 
used that for both when extracting.
   2. Some simple arithmetic simplifications around the levels that should be 
obvious.
   3. Removed the the logic for an index without positions in 
`BlockImpactsPostingsEnum`, that was dead code, we only set this thing up if 
there's positions.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Lazy initialize ForDeltaUtil and ForUtil in Lucene912PostingsReader [lucene]

2024-10-11 Thread via GitHub


jpountz commented on PR #13885:
URL: https://github.com/apache/lucene/pull/13885#issuecomment-2407713452

   I would expect putting something like `if (forUtil == null && 
termState.docFreq >= BLOCK_SIZE) { /* initialize forUtil */ }` in reset() to 
work (untested).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Lazy initialize ForDeltaUtil and ForUtil in Lucene912PostingsReader [lucene]

2024-10-11 Thread via GitHub


original-brownbear commented on PR #13885:
URL: https://github.com/apache/lucene/pull/13885#issuecomment-2407850593

   Ok this was my fault, I tried this approach but it failed for 
`EverythingEnum`. Now I realize though, `EverythingEnum` effectively sets up 
the `PForUtil` every time anyway. There the same trick seems to maybe make a 
little sense for the `ForDeltaUtil` so I went ahead and left that part in for 
it. WDYT? :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Check ahead if we can get the count [lucene]

2024-10-11 Thread via GitHub


LuXugang opened a new pull request, #13899:
URL: https://github.com/apache/lucene/pull/13899

   Check whether the IndexSortSortedNumericDocValuesRangeQuery's count can be 
obtained before traversing the BKD tree or performing binary search using 
DocValues.
   
   see https://github.com/apache/lucene/issues/13890
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] PR 13757 follow-up: add missing with-discountOverlaps Similarity constructor variants, CHANGES.txt entries (#13845) [lucene]

2024-10-11 Thread via GitHub


cpoerschke commented on PR #13858:
URL: https://github.com/apache/lucene/pull/13858#issuecomment-2407074358

   > Heads up: I moved the CHANGES entry in main and branch_10x to the 10.0.0 
section. ...
   
   Thanks!
   
   > ... Does this also need to backported to 9_12 for the next 9.12.1 patch 
release?
   
   That would be good, yes, #13891 opened for it.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org