Re: [PR] Lookup next when current doc is deleted in PerThreadPKLookup.lookup [lucene]

2024-07-09 Thread via GitHub
jpountz commented on code in PR #13556: URL: https://github.com/apache/lucene/pull/13556#discussion_r1671719100 ## lucene/core/src/test/org/apache/lucene/index/TestTermsEnum.java: ## @@ -998,6 +999,43 @@ public void testCommonPrefixTerms() throws Exception { d.close(); }

[PR] Lookup next when current doc is deleted in PerThreadPKLookup.lookup [lucene]

2024-07-09 Thread via GitHub
vsop-479 opened a new pull request, #13556: URL: https://github.com/apache/lucene/pull/13556 ### Description In the current implementation, we won't get the live doc if there are deletes on an unflushed segment. -- This is an automated message from the Apache Git Service. To respond to the

Re: [PR] Introduces efSearch as a separate parameter in KNN{Byte:Float}VectorQuery [lucene]

2024-07-09 Thread via GitHub
github-actions[bot] commented on PR #13407: URL: https://github.com/apache/lucene/pull/13407#issuecomment-2219129038 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Only apply deletion one time for unique term update in FrozenBufferedUpdates.applyTermDeletes [lucene]

2024-07-09 Thread via GitHub
github-actions[bot] commented on PR #13486: URL: https://github.com/apache/lucene/pull/13486#issuecomment-2219128808 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Replace AtomicLong with LongAdder in HitsThresholdChecker [lucene]

2024-07-09 Thread via GitHub
benwtrent commented on PR #13546: URL: https://github.com/apache/lucene/pull/13546#issuecomment-2218618086 @shubhamvishu looking at `BufferedUpdatesStream` & `bytesUsed`, that may be a candidate. However, benchmarking would have to prove it out.

Re: [PR] Replace AtomicLong with LongAdder in HitsThresholdChecker [lucene]

2024-07-09 Thread via GitHub
benwtrent commented on PR #13546: URL: https://github.com/apache/lucene/pull/13546#issuecomment-2218607252 @shubhamvishu stuff like this gets tricky when you see patterns like `if(pendingNumDocs.incrementAndGet() > IndexWriter.getActualMaxDocs()) {` There, you want to be ABSOLU

Re: [PR] Group memory arenas by segment to reduce costly `Arena.close()` [lucene]

2024-07-09 Thread via GitHub
magibney commented on PR #13555: URL: https://github.com/apache/lucene/pull/13555#issuecomment-2218482093 Follows an approach analogous to the ["custom arenas"](https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/foreign/Arena.html#custom-arenas) case described in Arena j

Re: [PR] Replace AtomicLong with LongAdder in HitsThresholdChecker [lucene]

2024-07-09 Thread via GitHub
shubhamvishu commented on PR #13546: URL: https://github.com/apache/lucene/pull/13546#issuecomment-2218438332 @benwtrent @original-brownbear Yes, I agree with both points. I haven't looked into the profiler but I was mainly referring to the usages I see in [DocumentsWriter](https://github.c

Re: [PR] Fix quantized vector writer ram estimates [lucene]

2024-07-09 Thread via GitHub
gautamworah96 commented on PR #13553: URL: https://github.com/apache/lucene/pull/13553#issuecomment-2218434337 Interesting. I didn't know the ram usage estimator is what is actually used for flushing based on ram size. I used to think it was only for external monitoring/alarming, and some o

Re: [PR] Replace AtomicLong with LongAdder in HitsThresholdChecker [lucene]

2024-07-09 Thread via GitHub
original-brownbear commented on PR #13546: URL: https://github.com/apache/lucene/pull/13546#issuecomment-2218302360 @shubhamvishu what Ben says, plus keep in mind that the adder is less consistent and probably slower for <= 2 or 3 threads competing scenarios on x86 hardware (that seems to b

Re: [PR] Replace AtomicLong with LongAdder in HitsThresholdChecker [lucene]

2024-07-09 Thread via GitHub
benwtrent commented on PR #13546: URL: https://github.com/apache/lucene/pull/13546#issuecomment-2218295960 > Does it make sense to change other AtomicLong occurrences @shubhamvishu I would say only if those usages are "many write & few read" scenarios.
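
The distinction drawn in this thread ("many write & few read" counters versus exact check-and-act patterns such as `pendingNumDocs.incrementAndGet() > ...`) can be illustrated with a minimal sketch. This is not code from the PR or from Lucene: the class `CounterSketch` and its methods `runAll`, `countWithAdder`, and `admitUpTo` are hypothetical names chosen for illustration. Only `java.util.concurrent.atomic.AtomicLong` and `LongAdder` are real JDK classes.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical illustration only; not taken from Lucene or PR #13546.
public class CounterSketch {

  // Starts `threads` workers that each run `task`, then waits for all of them.
  static void runAll(int threads, Runnable task) {
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(task);
      workers[i].start();
    }
    for (Thread t : workers) {
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
  }

  // "Many write & few read": writers only increment, and the total is read
  // once after all writers have finished, so LongAdder.sum() is exact here.
  static long countWithAdder(int threads, int incrementsPerThread) {
    LongAdder hits = new LongAdder();
    runAll(threads, () -> {
      for (int j = 0; j < incrementsPerThread; j++) {
        hits.increment();
      }
    });
    return hits.sum();
  }

  // Check-and-act: each caller needs the exact post-increment value to decide
  // whether it is still under the limit, so AtomicLong.incrementAndGet() is used.
  static long admitUpTo(long limit, int threads, int attemptsPerThread) {
    AtomicLong pending = new AtomicLong();
    AtomicLong admitted = new AtomicLong();
    runAll(threads, () -> {
      for (int j = 0; j < attemptsPerThread; j++) {
        if (pending.incrementAndGet() > limit) {
          pending.decrementAndGet(); // over the limit: undo the reservation
        } else {
          admitted.incrementAndGet();
        }
      }
    });
    return admitted.get();
  }

  public static void main(String[] args) {
    System.out.println(countWithAdder(4, 100_000)); // 400000
    System.out.println(admitUpTo(10, 4, 100));      // 10
  }
}
```

`LongAdder` has no atomic increment-and-read operation, and `sum()` taken while writers are still active is only a moving snapshot, which is why the second pattern stays on `AtomicLong`. As the thread notes, whether the switch pays off in the first pattern still has to be shown by benchmarking.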

Re: [PR] Fix quantized vector writer ram estimates [lucene]

2024-07-09 Thread via GitHub
benwtrent merged PR #13553: URL: https://github.com/apache/lucene/pull/13553

Re: [I] Examine performance of individual data accessor methods of MemorySegmentIndexInput when IndexInputs are closed in other threads (deoptimizations,...) [lucene]

2024-07-09 Thread via GitHub
magibney commented on issue #13325: URL: https://github.com/apache/lucene/issues/13325#issuecomment-2218097364 I extended the `Arena` interface to track refCount for https://github.com/apache/lucene/pull/13555 -- curious what you think of this approach?

Re: [PR] Replace AtomicLong with LongAdder in HitsThresholdChecker [lucene]

2024-07-09 Thread via GitHub
shubhamvishu commented on PR #13546: URL: https://github.com/apache/lucene/pull/13546#issuecomment-2218065362 Nice! Does it make sense to change other `AtomicLong` occurrences to `LongAdder`? I see a couple of other usages where there might be some opportunity to squeeze some more gains prob

[I] Pruning of estimating the point value count from BooleanScorerSupplier [lucene]

2024-07-09 Thread via GitHub
kkewwei opened a new issue, #13554: URL: https://github.com/apache/lucene/issues/13554 ### Description In #13199, we added `isEstimatedPointCountGreaterThanOrEqualTo` for dynamic pruning of the point value count estimate, but there are still many places that call `estimatePointCount` directly, dynamic pruning

Re: [I] Examine performance of individual data accessor methods of MemorySegmentIndexInput when IndexInputs are closed in other threads (deoptimizations,...) [lucene]

2024-07-09 Thread via GitHub
uschindler commented on issue #13325: URL: https://github.com/apache/lucene/issues/13325#issuecomment-2217896710 > > Possibly group multiple files into one arena. One Idea that came into my mind: MMapDirectory groups files belonging to the same segment together and uses a single Arena for t

Re: [PR] GITHUB#13175: Stop double-checking priority queue inserts in some FacetCount classes [lucene]

2024-07-09 Thread via GitHub
mikemccand merged PR #13488: URL: https://github.com/apache/lucene/pull/13488

Re: [I] Stop double-checking priority queue inserts [lucene]

2024-07-09 Thread via GitHub
mikemccand closed issue #13175: Stop double-checking priority queue inserts URL: https://github.com/apache/lucene/issues/13175

Re: [PR] GITHUB#13175: Stop double-checking priority queue inserts in some FacetCount classes [lucene]

2024-07-09 Thread via GitHub
mikemccand commented on PR #13488: URL: https://github.com/apache/lucene/pull/13488#issuecomment-2217879541 Woops, sorry @slow-J -- yes this can be merged now -- I'll merge! Thanks.

Re: [I] Examine performance of individual data accessor methods of MemorySegmentIndexInput when IndexInputs are closed in other threads (deoptimizations,...) [lucene]

2024-07-09 Thread via GitHub
magibney commented on issue #13325: URL: https://github.com/apache/lucene/issues/13325#issuecomment-2217862298 > Possibly group multiple files into one arena. One Idea that came into my mind: MMapDirectory groups files belonging to the same segment together and uses a single Arena for them.

Re: [PR] Introduce TestLucene90DocValuesFormatVariableSkipInterval for testing docvalues skipper index [lucene]

2024-07-09 Thread via GitHub
iverase merged PR #13550: URL: https://github.com/apache/lucene/pull/13550

Re: [I] Significant drop in recall for 8 bit Scalar Quantizer [lucene]

2024-07-09 Thread via GitHub
benwtrent commented on issue #13519: URL: https://github.com/apache/lucene/issues/13519#issuecomment-2217634878 @MilindShyani no worries dude! Enjoy your vacation and don't stress about this, it will still be here when you get back to the office :)

Re: [PR] Improve VectorUtil::xorBitCount perf on ARM [lucene]

2024-07-09 Thread via GitHub
ChrisHegarty commented on code in PR #13545: URL: https://github.com/apache/lucene/pull/13545#discussion_r1670333682 ## lucene/core/src/java/org/apache/lucene/util/VectorUtil.java: ## @@ -212,6 +212,14 @@ public static int int4DotProductPacked(byte[] unpacked, byte[] packed) {

Re: [PR] Introduce TestLucene90DocValuesFormatVariableSkipInterval for testing docvalues skipper index [lucene]

2024-07-09 Thread via GitHub
iverase commented on code in PR #13550: URL: https://github.com/apache/lucene/pull/13550#discussion_r1670101147 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesConsumer.java: ## @@ -96,6 +97,7 @@ public Lucene90DocValuesConsumer( state.segme

Re: [PR] Replace AtomicLong with LongAdder in HitsThresholdChecker [lucene]

2024-07-09 Thread via GitHub
jpountz commented on PR #13546: URL: https://github.com/apache/lucene/pull/13546#issuecomment-2216965045 I pushed an annotation, the speedup on some queries is huge: https://people.apache.org/~mikemccand/lucenebench/CombinedHighHigh.html.

Re: [PR] Improve VectorUtil::xorBitCount perf on ARM [lucene]

2024-07-09 Thread via GitHub
ChrisHegarty commented on PR #13545: URL: https://github.com/apache/lucene/pull/13545#issuecomment-2216914939 @uschindler Apologies, I didn't notice this when cherrypicking. Thanks for reverting (while I was sleeping ;-) )

Re: [PR] Introduce TestLucene90DocValuesFormatVariableSkipInterval for testing docvalues skipper index [lucene]

2024-07-09 Thread via GitHub
jpountz commented on code in PR #13550: URL: https://github.com/apache/lucene/pull/13550#discussion_r1669856325 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesConsumer.java: ## @@ -96,6 +97,7 @@ public Lucene90DocValuesConsumer( state.segme