Re: [PR] Avoid allocating liveDocs when no soft-deletes [lucene]
jpountz commented on code in PR #13895: URL: https://github.com/apache/lucene/pull/13895#discussion_r1797605709 ## lucene/core/src/java/org/apache/lucene/index/PendingSoftDeletes.java: ## @@ -76,15 +76,14 @@ void onNewReader(CodecReader reader, SegmentCommitInfo info) throws IOException hardDeletes.onNewReader(reader, info); // only re-calculate this if we haven't seen this generation if (dvGeneration < info.getDocValuesGen()) { - final DocIdSetIterator iterator = - FieldExistsQuery.getDocValuesDocIdSetIterator(field, reader); - int newDelCount; - if (iterator - != null) { // nothing is deleted we don't have a soft deletes field in this segment -assert info.info.maxDoc() > 0 : "maxDoc is 0"; + final int newDelCount; + var iterator = FieldExistsQuery.getDocValuesDocIdSetIterator(field, reader); + if (iterator != null && iterator.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) { +iterator = FieldExistsQuery.getDocValuesDocIdSetIterator(field, reader); Review Comment: I wonder if we could change `applySoftDeletes` to start with the current doc ID so that we don't need to re-create an iterator here? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[PR] Use growNoCopy and setInt in Util#toIntsRef. [lucene]
vsop-479 opened a new pull request, #13889: URL: https://github.com/apache/lucene/pull/13889 ### Description -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[I] Check ahead of time if the `count` can be obtained [lucene]
LuXugang opened a new issue, #13890: URL: https://github.com/apache/lucene/issues/13890 ### Description Currently, we traverse the `BKD` tree or perform a binary search using `DocValues` first, and then check whether the count can be obtained in the `count()` method of `IndexSortSortedNumericDocValuesRangeQuery`. Should we consider providing a mechanism to perform this check beforehand, avoid unnecessary processing when dealing with a dense range.? @jpountz -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[I] Create a community metrics dashboard [lucene]
stefanvodita opened a new issue, #13896: URL: https://github.com/apache/lucene/issues/13896 ### Description Beam has a rich community metrics [dashboard](http://35.193.202.176/d/code_velocity/code-velocity?orgId=1) ([discussion](https://lists.apache.org/thread/0mqo7jfz0ooc1bzvggvw6h0gdlmbd1kl)). Would it help to have something similar? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[I] Create a bot to check if there is a CHANGES entry for new PRs [lucene]
stefanvodita opened a new issue, #13898: URL: https://github.com/apache/lucene/issues/13898 ### Description We always have to remind ourselves and others to add a CHANGES entry. We can probably write a GitHub action to check for changes to that file on every new PR and leave a comment warning the author if that's missing. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Lazy initialize ForDeltaUtil and ForUtil in Lucene912PostingsReader [lucene]
original-brownbear commented on PR #13885: URL: https://github.com/apache/lucene/pull/13885#issuecomment-2407269525 Hmm I tried and failed miserably to find a solution to initialize in `reset()` that I fully understand :) It's quite hard to reason about all the possible paths here for me, but I see your point. As far as specialization goes, that one I could find a version that didn't show runtime regressions (because callsites + polymorphism). Maybe do a little cleanup along the lines of https://github.com/apache/lucene/pull/13892 first then get back to this? :) It's hard to fix this without complicating the code, this is the simplest version I could find but I see your points. I bet if we clean up all the duplication that we can cleanup without runtime overhead, we'll find an easier fix here :) wdyt? (this one also isn't so urgent, it's very few cycles runtime, mostly a p100 thing for JVMs under GC pressure probably) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[PR] Simplify leaf slice calculation [lucene]
original-brownbear opened a new pull request, #13893: URL: https://github.com/apache/lucene/pull/13893 No need to go through the indirection of 2 wrapped functions, just put the logic in plain methods. Also, we can just outright set the field if there's no executor. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Optimize slice calculation in IndexSearcher a little [lucene]
original-brownbear commented on code in PR #13860: URL: https://github.com/apache/lucene/pull/13860#discussion_r1797010600 ## lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java: ## @@ -540,7 +523,43 @@ public int count(Query query) throws IOException { * @lucene.experimental */ public final LeafSlice[] getSlices() { -return leafSlicesSupplier.get(); +LeafSlice[] res = leafSlices; +if (res == null) { + res = computeAndCacheSlices(); +} +return res; + } + + private synchronized LeafSlice[] computeAndCacheSlices() { Review Comment: Sure thing, finally got around to it in https://github.com/apache/lucene/pull/13893 :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[PR] PR 13757 follow-up: add missing with-discountOverlaps Similarity constructor variants, CHANGES.txt entries (#13845) [lucene]
cpoerschke opened a new pull request, #13891: URL: https://github.com/apache/lucene/pull/13891 (cherry picked from commit dab731175c86cc25741f37845136d38a69a9d165) Resolved Conflicts: lucene/CHANGES.txt -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[PR] Dry up both the two ImpactsEnum Implementation ins Lucene912PostingsReader [lucene]
original-brownbear opened a new pull request, #13892: URL: https://github.com/apache/lucene/pull/13892 These two share a lot of code, in particular the impacts implementation is 100% identical. We can save a lot of code and potentially some cycles for method invocations by drying things up. The changes are just mechanical field movements with the following exceptions: 1. One of the two implementations was using a bytes ref builder, one a bytes ref for holding the serialized impacts. The bytesref variant is faster so I used that for both when extracting. 2. Some simple arithmetic simplifications around the levels that should be obvious. 3. Removed the the logic for an index without positions in `BlockImpactsPostingsEnum`, that was dead code, we only set this thing up if there's positions. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Lazy initialize ForDeltaUtil and ForUtil in Lucene912PostingsReader [lucene]
jpountz commented on PR #13885: URL: https://github.com/apache/lucene/pull/13885#issuecomment-2407713452 I would expect putting something like `if (forUtil == null && termState.docFreq >= BLOCK_SIZE) { /* initialize forUtil */ }` in reset() to work (untested). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Lazy initialize ForDeltaUtil and ForUtil in Lucene912PostingsReader [lucene]
original-brownbear commented on PR #13885: URL: https://github.com/apache/lucene/pull/13885#issuecomment-2407850593 Ok this was my fault, I tried this approach but it failed for `EverythingEnum`. Now I realize though, `EverythingEnum` effectively sets up the `PForUtil` every time anyway. There the same trick seems to maybe make a little sense for the `ForDeltaUtil` so I went ahead and left that part in for it. WDYT? :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[PR] Check ahead if we can get the count [lucene]
LuXugang opened a new pull request, #13899: URL: https://github.com/apache/lucene/pull/13899 Check whether the IndexSortSortedNumericDocValuesRangeQuery's count can be obtained before traversing the BKD tree or performing binary search using DocValues. see https://github.com/apache/lucene/issues/13890 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] PR 13757 follow-up: add missing with-discountOverlaps Similarity constructor variants, CHANGES.txt entries (#13845) [lucene]
cpoerschke commented on PR #13858: URL: https://github.com/apache/lucene/pull/13858#issuecomment-2407074358 > Heads up: I moved the CHANGES entry in main and branch_10x to the 10.0.0 section. ... Thanks! > ... Does this also need to backported to 9_12 for the next 9.12.1 patch release? That would be good, yes, #13891 opened for it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org