Re: [PR] Performance improvements to use RWLock to access LRUQueryCache [lucene]

2024-05-09 Thread via GitHub
boicehuang commented on PR #13306: URL: https://github.com/apache/lucene/pull/13306#issuecomment-2103777811 > I think this is ready for merging. I can do the merging, but won't back port to 9x until we see nightlies. They might catch something we missed. > > @boicehuang could you add

Re: [I] Make intra tasks in IndexingChain.flush parallel execute. [lucene]

2024-05-09 Thread via GitHub
vsop-479 closed issue #13349: Make intra tasks in IndexingChain.flush parallel execute. URL: https://github.com/apache/lucene/issues/13349 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specifi

Re: [PR] Advoid the use of ImpactsDISI when no minimum competitive score has been set [lucene]

2024-05-09 Thread via GitHub
zhongshanhao commented on code in PR #13343: URL: https://github.com/apache/lucene/pull/13343#discussion_r1596142310 ## lucene/core/src/java/org/apache/lucene/search/BlockMaxConjunctionBulkScorer.java: ## @@ -56,9 +56,29 @@ final class BlockMaxConjunctionBulkScorer extends BulkS

Re: [PR] Avoid SegmentTermsEnumFrame reload block. [lucene]

2024-05-09 Thread via GitHub
github-actions[bot] commented on PR #13253: URL: https://github.com/apache/lucene/pull/13253#issuecomment-2103636752 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [I] Significant drop in recall for int8 scalar quantization using maximum_inner_product [lucene]

2024-05-09 Thread via GitHub
benwtrent commented on issue #13350: URL: https://github.com/apache/lucene/issues/13350#issuecomment-2103558162 OK, I ran it again, on my index where the flush was set at 28MB & force merged. This time I ran it over all 10k queries (previously it was just 1k, as calculating the true nearest

Re: [I] Decouple within-query concurrency from the index's segment geometry [LUCENE-8675] [lucene]

2024-05-09 Thread via GitHub
jpountz commented on issue #9721: URL: https://github.com/apache/lucene/issues/9721#issuecomment-2103468211 I'd really like to keep intra-segment parallelism simple and stick to splitting the doc ID space, which is the most natural approach for queries that produce good iterators like term

Re: [PR] Add timeout support to AbstractVectorSimilarityQuery [lucene]

2024-05-09 Thread via GitHub
kaivalnp commented on PR #13285: URL: https://github.com/apache/lucene/pull/13285#issuecomment-2103463681 Saw some merge conflicts after a recent commit and resolved those..

Re: [PR] Remove unused "implements Accountable". [lucene]

2024-05-09 Thread via GitHub
jpountz merged PR #13330: URL: https://github.com/apache/lucene/pull/13330

Re: [PR] Advoid the use of ImpactsDISI when no minimum competitive score has been set [lucene]

2024-05-09 Thread via GitHub
jpountz commented on code in PR #13343: URL: https://github.com/apache/lucene/pull/13343#discussion_r1596001365 ## lucene/core/src/java/org/apache/lucene/search/BlockMaxConjunctionBulkScorer.java: ## @@ -68,18 +88,10 @@ public int score(LeafCollector collector, Bits acceptDocs,

Re: [PR] Add IndexInput#prefetch. [lucene]

2024-05-09 Thread via GitHub
jpountz commented on PR #13337: URL: https://github.com/apache/lucene/pull/13337#issuecomment-2103397669 > also, i'm a little concerned about low-level parallelization of e.g. individual stored documents. seems like a lot of overhead! if you need 10,000 documents ranges, at least make a sin

Re: [PR] Add new VectorScorer interface to vector value iterators [lucene]

2024-05-09 Thread via GitHub
benwtrent merged PR #13181: URL: https://github.com/apache/lucene/pull/13181

Re: [I] Significant drop in recall for int8 scalar quantization using maximum_inner_product [lucene]

2024-05-09 Thread via GitHub
naveentatikonda commented on issue #13350: URL: https://github.com/apache/lucene/issues/13350#issuecomment-2103346625 > @naveentatikonda using lucene-util, scalar quantization, I get a recall@100 of `0.735`. > This looks much better. I will try to set it up and reproduce with lucene-

Re: [I] Significant drop in recall for int8 scalar quantization using maximum_inner_product [lucene]

2024-05-09 Thread via GitHub
benwtrent commented on issue #13350: URL: https://github.com/apache/lucene/issues/13350#issuecomment-2103282067 @naveentatikonda using lucene-util, scalar quantization, I get a recall@100 of `0.735`. I am calculating the recall by gathering the true 100 nearest neighbors from the tes

Re: [PR] Deprecate COSINE VectorSimilarity function [lucene]

2024-05-09 Thread via GitHub
Pulkitg64 commented on code in PR #13308: URL: https://github.com/apache/lucene/pull/13308#discussion_r1595465708 ## lucene/CHANGES.txt: ## @@ -102,6 +102,8 @@ API Changes Additionally, deprecated methods have been removed from ByteBuffersIndexInput, BooleanQuery and others.

Re: [PR] Deprecate COSINE VectorSimilarity function [lucene]

2024-05-09 Thread via GitHub
benwtrent commented on code in PR #13308: URL: https://github.com/apache/lucene/pull/13308#discussion_r1595456541 ## lucene/CHANGES.txt: ## @@ -102,6 +102,8 @@ API Changes Additionally, deprecated methods have been removed from ByteBuffersIndexInput, BooleanQuery and others.

Re: [PR] Add a MemorySegment Vector scorer - for scoring without copying on-heap [lucene]

2024-05-09 Thread via GitHub
ChrisHegarty commented on PR #13339: URL: https://github.com/apache/lucene/pull/13339#issuecomment-2102651378 > Backporting would only require that you may need to duplicate versions for 19, 20, 21+. 19 has no vectorization, but 20 and 21 have identical vector code but differences in memory

Re: [PR] Add a MemorySegment Vector scorer - for scoring without copying on-heap [lucene]

2024-05-09 Thread via GitHub
uschindler commented on PR #13339: URL: https://github.com/apache/lucene/pull/13339#issuecomment-2102634217 Backporting would only require that you may need to duplicate versions for 19, 20, 21+. 19 has no vectorization, but 20 and 21 have identical vector code but differences in memory seg

Re: [PR] Add a MemorySegment Vector scorer - for scoring without copying on-heap [lucene]

2024-05-09 Thread via GitHub
uschindler commented on PR #13339: URL: https://github.com/apache/lucene/pull/13339#issuecomment-2102610850 > > Question / confirmation please -- since this requires Panama, thus JDK 21+ (I think?) it can only target Lucene 10+, correct? > > Correct that it requires JDK 21+ and Panama

Re: [PR] Add a MemorySegment Vector scorer - for scoring without copying on-heap [lucene]

2024-05-09 Thread via GitHub
benwtrent commented on code in PR #13339: URL: https://github.com/apache/lucene/pull/13339#discussion_r1595406942 ## lucene/core/src/java21/org/apache/lucene/internal/vectorization/MemorySegmentByteVectorScorerSupplier.java: ## @@ -0,0 +1,237 @@ +/* + * Licensed to the Apache So

Re: [PR] Add a MemorySegment Vector scorer - for scoring without copying on-heap [lucene]

2024-05-09 Thread via GitHub
ChrisHegarty commented on PR #13339: URL: https://github.com/apache/lucene/pull/13339#issuecomment-2102593155 > Question / confirmation please -- since this requires Panama, thus JDK 21+ (I think?) it can only target Lucene 10+, correct? Correct that it requires JDK 21+ and Panama Vec

Re: [PR] Add a MemorySegment Vector scorer - for scoring without copying on-heap [lucene]

2024-05-09 Thread via GitHub
msokolov commented on PR #13339: URL: https://github.com/apache/lucene/pull/13339#issuecomment-2102582255 Question / confirmation please -- since this requires Panama, thus JDK 21+ (I think?) it can only target Lucene 10+, correct?

Re: [I] NRT failure due to FieldInfo & File mismatch [lucene]

2024-05-09 Thread via GitHub
benwtrent commented on issue #13353: URL: https://github.com/apache/lucene/issues/13353#issuecomment-2102572650 What makes matters worse, is that it doesn't even have to be ALL docs that failed, just some of them that had point values (or knn vector values, etc.). Anything that eagerly upda

Re: [I] NRT failure due to FieldInfo & File mismatch [lucene]

2024-05-09 Thread via GitHub
benwtrent commented on issue #13353: URL: https://github.com/apache/lucene/issues/13353#issuecomment-2102567007 I am having a difficult time figuring out how to fix this. It seems to me that if the segment is "hard deleted", we should reset all its FieldInfos as there isn't any data written

Re: [PR] Add a separate option to allow running Panama Vectorization for all tests with suitable C2 defaults [lucene]

2024-05-09 Thread via GitHub
uschindler commented on PR #13351: URL: https://github.com/apache/lucene/pull/13351#issuecomment-2102565315 My last comment here: It would be good to somehow document this in the file so it is clear why all those combinations discussed here are "correct", although they seem to be wrong. Aft

Re: [PR] Add a MemorySegment Vector scorer - for scoring without copying on-heap [lucene]

2024-05-09 Thread via GitHub
ChrisHegarty commented on code in PR #13339: URL: https://github.com/apache/lucene/pull/13339#discussion_r1595313325 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsFormat.java: ## @@ -139,7 +139,7 @@ public final class Lucene99HnswVectorsFormat exte

[I] Merges sometimes do lots of work even after being aborted [lucene]

2024-05-09 Thread via GitHub
DaveCTurner opened a new issue, #13354: URL: https://github.com/apache/lucene/issues/13354 ### Description We see some Lucene indices taking many seconds (occasionally minutes) to abort merges during rollback, doing a lot of now-pointless IO, with the merge thread spending all its ti

Re: [PR] Add a separate option to allow running Panama Vectorization for all tests with suitable C2 defaults [lucene]

2024-05-09 Thread via GitHub
ChrisHegarty merged PR #13351: URL: https://github.com/apache/lucene/pull/13351

Re: [PR] Add a separate option to allow running Panama Vectorization for all tests with suitable C2 defaults [lucene]

2024-05-09 Thread via GitHub
ChrisHegarty commented on PR #13351: URL: https://github.com/apache/lucene/pull/13351#issuecomment-2102346506 > Hm, actually when we keep "default" in the random pick, the CI builds sometimes use real conditions, so we should keep it! Agreed. I'm declaring that this PR is done. I wil

Re: [PR] Add a separate option to allow running Panama Vectorization for all tests with suitable C2 defaults [lucene]

2024-05-09 Thread via GitHub
uschindler commented on PR #13351: URL: https://github.com/apache/lucene/pull/13351#issuecomment-2102340226 Hm, actually when we keep "default" in the random pick, the CI builds sometimes use real conditions, so we should keep it! Let's keep the current state.

Re: [PR] Add a separate option to allow running Panama Vectorization for all tests with suitable C2 defaults [lucene]

2024-05-09 Thread via GitHub
uschindler commented on PR #13351: URL: https://github.com/apache/lucene/pull/13351#issuecomment-2102322383 If we remove default we should set the default value to null. Then it passes no sysprop at all.

Re: [PR] Add a separate option to allow running Panama Vectorization for all tests with suitable C2 defaults [lucene]

2024-05-09 Thread via GitHub
uschindler commented on PR #13351: URL: https://github.com/apache/lucene/pull/13351#issuecomment-2102316837 We can still keep the default parser in the provider code. It will be used by the default vectorization argument.

Re: [PR] Add a separate option to allow running Panama Vectorization for all tests with suitable C2 defaults [lucene]

2024-05-09 Thread via GitHub
uschindler commented on PR #13351: URL: https://github.com/apache/lucene/pull/13351#issuecomment-2102313946 Hi, good point. This simplifies logic more. I would also like to move the jvmargs over to randomization.gradle.

Re: [PR] Add a separate option to allow running Panama Vectorization for all tests with suitable C2 defaults [lucene]

2024-05-09 Thread via GitHub
ChrisHegarty commented on PR #13351: URL: https://github.com/apache/lucene/pull/13351#issuecomment-2102304137 There is a problem with `default` - it does not do what you might expect it to do - it does not enable Panama Vector (unless you prevent C2 from being disabled). For example:

Re: [PR] Add a MemorySegment Vector scorer - for scoring without copying on-heap [lucene]

2024-05-09 Thread via GitHub
ChrisHegarty commented on code in PR #13339: URL: https://github.com/apache/lucene/pull/13339#discussion_r1595176480 ## lucene/core/src/java/org/apache/lucene/store/FilterIndexInput.java: ## @@ -40,6 +48,19 @@ public static IndexInput unwrap(IndexInput in) { return in; }

Re: [PR] Add a separate option to allow running Panama Vectorization for all tests with suitable C2 defaults [lucene]

2024-05-09 Thread via GitHub
ChrisHegarty commented on PR #13351: URL: https://github.com/apache/lucene/pull/13351#issuecomment-2102236939 > The reason for the jvmargs is to not enable C2 for our short running tests with lots of randomization. This causes dramatic overhead (Robert tested this) on total test runtime. So

Re: [PR] Add a separate option to allow running Panama Vectorization for all tests with suitable C2 defaults [lucene]

2024-05-09 Thread via GitHub
uschindler commented on PR #13351: URL: https://github.com/apache/lucene/pull/13351#issuecomment-2102232774 See: https://github.com/apache/lucene/issues/10200

Re: [PR] Add a separate option to allow running Panama Vectorization for all tests with suitable C2 defaults [lucene]

2024-05-09 Thread via GitHub
uschindler commented on PR #13351: URL: https://github.com/apache/lucene/pull/13351#issuecomment-2102223583 > > Actually that's wanted: Because we want tests by default run fast. For that the CI env var sets empty jvmargs. With your latest change we randomly slowdown the tests. > > I

Re: [PR] Add a separate option to allow running Panama Vectorization for all tests with suitable C2 defaults [lucene]

2024-05-09 Thread via GitHub
ChrisHegarty commented on PR #13351: URL: https://github.com/apache/lucene/pull/13351#issuecomment-2102214628 > Actually that's wanted: Because we want tests by default run fast. For that the CI env var sets empty jvmargs. With your latest change we randomly slowdown the tests. It's n

Re: [PR] Add a separate option to allow running Panama Vectorization for all tests with suitable C2 defaults [lucene]

2024-05-09 Thread via GitHub
uschindler commented on PR #13351: URL: https://github.com/apache/lucene/pull/13351#issuecomment-2102209208 > There is still one small issue. C2 will still be disabled if `default` is randomly selected, e.g. > > ``` > $ ./gradlew :lucene:core:testOpts | egrep ".*defaultvectorizati

Re: [PR] Add a separate option to allow running Panama Vectorization for all tests with suitable C2 defaults [lucene]

2024-05-09 Thread via GitHub
ChrisHegarty commented on PR #13351: URL: https://github.com/apache/lucene/pull/13351#issuecomment-2102206160 I updated the default setting of `tests.defaultvectorization` to include when `default` is randomly selected. E.g. ``` $ ./gradlew :lucene:core:testOpts | egrep ".*default

Re: [PR] Add a separate option to allow running Panama Vectorization for all tests with suitable C2 defaults [lucene]

2024-05-09 Thread via GitHub
ChrisHegarty commented on PR #13351: URL: https://github.com/apache/lucene/pull/13351#issuecomment-2102182479 There is still one small issue. C2 will still be disabled if `default` is randomly selected, e.g. ``` $ ./gradlew :lucene:core:testOpts | egrep ".*defaultvectorization.*|.

Re: [PR] Terminate automaton after matched the whole prefix for PrefixQuery. [lucene]

2024-05-09 Thread via GitHub
vsop-479 commented on code in PR #13072: URL: https://github.com/apache/lucene/pull/13072#discussion_r1595034249 ## lucene/core/src/java/org/apache/lucene/util/automaton/RunAutomaton.java: ## @@ -67,12 +68,16 @@ protected RunAutomaton(Automaton a, int alphabetSize) { points

Re: [PR] Add a separate option to allow running Panama Vectorization for all tests with suitable C2 defaults [lucene]

2024-05-09 Thread via GitHub
uschindler commented on PR #13351: URL: https://github.com/apache/lucene/pull/13351#issuecomment-2102138884 I tested the combination by repeatedly executing: `gradlew :lucene:core:testOpts`, appending several parameters. All combinations seem right: - if you pass `-Ptests.defaultvectori

Re: [PR] Add a separate option to allow running Panama Vectorization for all tests with suitable C2 defaults [lucene]

2024-05-09 Thread via GitHub
ChrisHegarty commented on code in PR #13351: URL: https://github.com/apache/lucene/pull/13351#discussion_r1595089366 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java: ## @@ -47,7 +47,7 @@ public abstract class VectorizationProvider {

Re: [PR] Add a separate option to allow running Panama Vectorization for all tests with suitable C2 defaults [lucene]

2024-05-09 Thread via GitHub
uschindler commented on PR #13351: URL: https://github.com/apache/lucene/pull/13351#issuecomment-2102122837 I added a bit of code to make the integer vector enforcement always `false` if the default vectorization settings are used. -- This is an automated message from the Apache Git Servi