Re: [PR] Terminate automaton after matched the whole prefix for PrefixQuery. [lucene]

2024-05-08 Thread via GitHub
vsop-479 commented on code in PR #13072: URL: https://github.com/apache/lucene/pull/13072#discussion_r1595034249 ## lucene/core/src/java/org/apache/lucene/util/automaton/RunAutomaton.java: ## @@ -67,12 +68,16 @@ protected RunAutomaton(Automaton a, int alphabetSize) { points

Re: [I] Make intra tasks in IndexingChain.flush parallel execute. [lucene]

2024-05-08 Thread via GitHub
vsop-479 commented on issue #13349: URL: https://github.com/apache/lucene/issues/13349#issuecomment-2101865186 I think you are right, @jpountz. Since indexing already uses almost all resources in many cases, it may be less worthwhile to add an executor to make intra tasks execute in parallel for

Re: [PR] Add IndexInput#prefetch. [lucene]

2024-05-08 Thread via GitHub
uschindler commented on PR #13337: URL: https://github.com/apache/lucene/pull/13337#issuecomment-2101570838 In general we may still look at Robert's suggestion. If we plan to send a preload for many slices, we should think of adding another random API to `RandomAccessIndexInput`, something

Re: [PR] Add IndexInput#prefetch. [lucene]

2024-05-08 Thread via GitHub
uschindler commented on code in PR #13337: URL: https://github.com/apache/lucene/pull/13337#discussion_r1594719247 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java: ## @@ -50,6 +51,7 @@ abstract class MemorySegmentIndexInput extends IndexInput impl

Re: [I] Doc out of order issue from Lucene 8.10.1 [lucene]

2024-05-08 Thread via GitHub
jpountz commented on issue #13338: URL: https://github.com/apache/lucene/issues/13338#issuecomment-2101392431 This log is scary: out-of-order doc IDs, doc freq greater than total term freq, there is a major issue here. Are you doing something exotic on this index? Can you replicate this bug

[I] NRT failure due to SegmentInfo & File mismatch [lucene]

2024-05-08 Thread via GitHub
benwtrent opened a new issue, #13353: URL: https://github.com/apache/lucene/issues/13353 ### Description There has been a nasty test failure in ES for a while: https://github.com/elastic/elasticsearch/issues/105122 The test simulates a document indexing failure. It turns out, th

Re: [I] Make intra tasks in IndexingChain.flush parallel execute. [lucene]

2024-05-08 Thread via GitHub
jpountz commented on issue #13349: URL: https://github.com/apache/lucene/issues/13349#issuecomment-2101365494 Lucene already has a model for indexing/flushing concurrency that consists of indexing documents from multiple threads. I guess that the idea that you are suggesting could make sens

Re: [PR] Enable Panama Vector similarities implementation always in the CI [lucene]

2024-05-08 Thread via GitHub
uschindler commented on PR #13351: URL: https://github.com/apache/lucene/pull/13351#issuecomment-2101321131 Now let's go on vacation and, before that, have beers on German "Vatertag". Please backport those changes here, too, as this bug also affects 9.x -- This is an automated message from t

Re: [PR] Enable Panama Vector similarities implementation always in the CI [lucene]

2024-05-08 Thread via GitHub
uschindler commented on PR #13351: URL: https://github.com/apache/lucene/pull/13351#issuecomment-2101312191 ok, I pushed my changes: - `CI=true` is only useful for CI and preserves old behaviour - to enable default vectorization settings and enable Hotspot's C2 use `-Ptests.defaultvect

Re: [PR] Enable Panama Vector similarities implementation always in the CI [lucene]

2024-05-08 Thread via GitHub
uschindler commented on PR #13351: URL: https://github.com/apache/lucene/pull/13351#issuecomment-2101296885 I found a solution. It was a bit tricky, but works. Will commit to this branch. -- This is an automated message from the Apache Git Service. To respond to the message, please log on

Re: [PR] Enable Panama Vector similarities implementation always in the CI [lucene]

2024-05-08 Thread via GitHub
dweiss commented on PR #13351: URL: https://github.com/apache/lucene/pull/13351#issuecomment-2101242254 > P.S.: Let's please merge the default-tests.gradle and randomization.gradle! This code is unreadable! All in one file would be much better. I'm pretty sure there was some convolut

Re: [PR] Reduce memory usage of field maps in FieldInfos and BlockTree TermsReader. [lucene]

2024-05-08 Thread via GitHub
dsmiley commented on code in PR #13327: URL: https://github.com/apache/lucene/pull/13327#discussion_r1594462331 ## lucene/core/src/java/org/apache/lucene/index/FieldInfos.java: ## @@ -156,15 +148,38 @@ public FieldInfos(FieldInfo[] infos) { this.softDeletesField = softDelet

Re: [I] Significant drop in recall for int8 scalar quantization using maximum_inner_product [lucene]

2024-05-08 Thread via GitHub
naveentatikonda commented on issue #13350: URL: https://github.com/apache/lucene/issues/13350#issuecomment-2101074166 > The `cohere-768-IP`, this was coherev2 correct? @benwtrent Thanks for your response. I'm not exactly sure about the version of it. But, this is the dataset. https

Re: [PR] Enable Panama Vector similarities implementation always in the CI [lucene]

2024-05-08 Thread via GitHub
uschindler commented on PR #13351: URL: https://github.com/apache/lucene/pull/13351#issuecomment-2101049841 Hrm, I am hopeless at the moment. Need a bit of free time to walk around. The issue is that jvmArgs are resolved early, while the test properties are done in randomization.gradle. This cau

Re: [PR] Enable Panama Vector similarities implementation always in the CI [lucene]

2024-05-08 Thread via GitHub
uschindler commented on PR #13351: URL: https://github.com/apache/lucene/pull/13351#issuecomment-2100877006 My problem is: `randomization.gradle` resolves the test options lazily in `afterEvaluate` and assigns them to `ext.testOptionsResolved`. In contrast `defaults' test evaluates th

Re: [PR] Add a MemorySegment Vector scorer - for scoring without copying on-heap [lucene]

2024-05-08 Thread via GitHub
benwtrent commented on code in PR #13339: URL: https://github.com/apache/lucene/pull/13339#discussion_r1594222886 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsFormat.java: ## @@ -139,7 +139,7 @@ public final class Lucene99HnswVectorsFormat extends

Re: [PR] Add a MemorySegment Vector scorer - for scoring without copying on-heap [lucene]

2024-05-08 Thread via GitHub
ChrisHegarty commented on PR #13339: URL: https://github.com/apache/lucene/pull/13339#issuecomment-2100834266 > Still working on reviewing, busy busy busy and on vacation starting from tomorrow! I appreciate your ongoing review. Pause your review for now, enjoy your vacation, and I'l

Re: [PR] Add a MemorySegment Vector scorer - for scoring without copying on-heap [lucene]

2024-05-08 Thread via GitHub
uschindler commented on PR #13339: URL: https://github.com/apache/lucene/pull/13339#issuecomment-2100821063 Still working on reviewing, busy busy busy and on vacation starting from tomorrow! -- This is an automated message from the Apache Git Service. To respond to the message, please log

Re: [PR] Enable Panama Vector similarities implementation always in the CI [lucene]

2024-05-08 Thread via GitHub
uschindler commented on PR #13351: URL: https://github.com/apache/lucene/pull/13351#issuecomment-2100789697 I am trying to implement this, but actually there is some horrible code duplication in `test-defaults.gradle` and `randomization.gradle`. It resolves all test options 2 times and then

Re: [I] Significant drop in recall for int8 scalar quantization using maximum_inner_product [lucene]

2024-05-08 Thread via GitHub
benwtrent commented on issue #13350: URL: https://github.com/apache/lucene/issues/13350#issuecomment-2100722572 @naveentatikonda interesting results for sure. I tested with max-inner-product & CohereV2 and didn't see a drop like this. I will try and replicate. The `cohere-768-IP`, t

Re: [PR] Enable Panama Vector similarities implementation always in the CI [lucene]

2024-05-08 Thread via GitHub
uschindler commented on code in PR #13351: URL: https://github.com/apache/lucene/pull/13351#discussion_r159409 ## gradle/testing/randomization.gradle: ## @@ -107,10 +107,14 @@ allprojects { // vectorization related [propName: 'tests.vectorsize',

Re: [PR] Deprecate COSINE VectorSimilarity function [lucene]

2024-05-08 Thread via GitHub
benwtrent commented on PR #13308: URL: https://github.com/apache/lucene/pull/13308#issuecomment-2100628867 @Pulkitg64 could you also deprecate `VectorUtil.cosine`? That is technically a public API. -- This is an automated message from the Apache Git Service. To respond to the mess

Re: [PR] Deprecate COSINE VectorSimilarity function [lucene]

2024-05-08 Thread via GitHub
Pulkitg64 commented on PR #13308: URL: https://github.com/apache/lucene/pull/13308#issuecomment-2100613810 > > I didn't find any internal usages of this function. > > That doesn't make sense to me. IntelliJ tells me there are over 30 usages of VectorSimilarityFunction.COSINE. S

Re: [PR] Enable Panama Vector similarities implementation always in the CI [lucene]

2024-05-08 Thread via GitHub
uschindler commented on PR #13351: URL: https://github.com/apache/lucene/pull/13351#issuecomment-2100606534 I have a bit of a problem with that. Our CI infrastructure passes `CI=true`, and basically all we want there is that the "make tests fast" JVM option is removed (it removes the sett

Re: [PR] Enable Panama Vector similarities implementation always in the CI [lucene]

2024-05-08 Thread via GitHub
uschindler commented on code in PR #13351: URL: https://github.com/apache/lucene/pull/13351#discussion_r1594041203 ## gradle/testing/randomization.gradle: ## @@ -107,10 +107,14 @@ allprojects { // vectorization related [propName: 'tests.vectorsize',

Re: [I] How to run tests with the Panama Vector implementation [lucene]

2024-05-08 Thread via GitHub
uschindler commented on issue #13344: URL: https://github.com/apache/lucene/issues/13344#issuecomment-2100583761 Hi, that's a fine PR, but it still has some problems, so I'd tend to move away from "developers" using CI=true and instead use another setting. Will explain in the PR. -- This is

[I] TestIndexWriterMergePolicy.testStressUpdateSameDocumentWithMergeOnGetReader failure due to DWPT race condition [lucene]

2024-05-08 Thread via GitHub
benwtrent opened a new issue, #13352: URL: https://github.com/apache/lucene/issues/13352 ### Description This replicates rarely for me locally. To me this indicates a race condition instead of some logic error. ``` java.lang.AssertionError: seqNo=114262 vs maxSeqNo=11426

Re: [PR] Enable Panama Vector similarities implementation always in the CI [lucene]

2024-05-08 Thread via GitHub
dweiss commented on code in PR #13351: URL: https://github.com/apache/lucene/pull/13351#discussion_r1593961189 ## gradle/testing/randomization.gradle: ## @@ -107,10 +107,14 @@ allprojects { // vectorization related [propName: 'tests.vectorsize',

Re: [PR] Enable Panama Vector similarities implementation always in the CI [lucene]

2024-05-08 Thread via GitHub
dweiss commented on code in PR #13351: URL: https://github.com/apache/lucene/pull/13351#discussion_r1593959051 ## gradle/testing/randomization.gradle: ## @@ -107,10 +107,14 @@ allprojects { // vectorization related [propName: 'tests.vectorsize',

Re: [PR] Deprecate COSINE VectorSimilarity function [lucene]

2024-05-08 Thread via GitHub
benwtrent commented on PR #13308: URL: https://github.com/apache/lucene/pull/13308#issuecomment-2100451382 > I didn't find any internal usages of this function. That doesn't make sense to me. IntelliJ tells me there are over 30 usages of VectorSimilarityFunction.COSINE. -- This is

[PR] Enable Panama Vector similarities implementation always in the CI [lucene]

2024-05-08 Thread via GitHub
ChrisHegarty opened a new pull request, #13351: URL: https://github.com/apache/lucene/pull/13351 This commit enables the Panama Vector similarities implementation always in the CI. After this change, the CI will **always** run the tests with the Panama Vector similarities implementa

Re: [I] How to run tests with the Panama Vector implementation [lucene]

2024-05-08 Thread via GitHub
uschindler commented on issue #13344: URL: https://github.com/apache/lucene/issues/13344#issuecomment-2100253239 P.S.: I had the same idea to enable it randomly by adding another item to the random list. Next to that, maybe the best would be to have another option to do two things:

Re: [I] How to run tests with the Panama Vector implementation [lucene]

2024-05-08 Thread via GitHub
uschindler commented on issue #13344: URL: https://github.com/apache/lucene/issues/13344#issuecomment-2100249027 Hi, sorry for being late. I am busy at the moment, but as a quick fix I applied the following locally: ```diff gradle/testing/randomization.gradle | 8 ++-- 1 file cha

Re: [PR] [GENE-2434] - Load vault token from file [lucene-solr]

2024-05-08 Thread via GitHub
itygh commented on PR #2685: URL: https://github.com/apache/lucene-solr/pull/2685#issuecomment-2100212444 This is an automatic vacation reply from QQ Mail. Hello, I am currently on vacation and cannot reply to your email personally. I will reply as soon as possible after my vacation ends. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the UR

[PR] [GENE-2434] - Load vault token from file [lucene-solr]

2024-05-08 Thread via GitHub
puru-yanamala opened a new pull request, #2685: URL: https://github.com/apache/lucene-solr/pull/2685 (no comment) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubsc

Re: [I] How to run tests with the Panama Vector implementation [lucene]

2024-05-08 Thread via GitHub
ChrisHegarty commented on issue #13344: URL: https://github.com/apache/lucene/issues/13344#issuecomment-2100196605 @uschindler your idea above is clearly preferable, but another alternative is to just add to the set of values in `vectorsize`, that will result in enabling Panama Vector rand

Re: [PR] Add new VectorScorer interface to vector value iterators [lucene]

2024-05-08 Thread via GitHub
ChrisHegarty commented on code in PR #13181: URL: https://github.com/apache/lucene/pull/13181#discussion_r1593562581 ## lucene/core/src/java/org/apache/lucene/index/ByteVectorValues.java: ## @@ -75,4 +76,14 @@ public static void checkField(LeafReader in, String field) {

Re: [PR] Performance improvements to use RWLock to access LRUQueryCache [lucene]

2024-05-08 Thread via GitHub
boicehuang commented on PR #13306: URL: https://github.com/apache/lucene/pull/13306#issuecomment-2099953047 Current improvement numbers for the submitted code version. doc count | field cardinality | query point | baseline QPS | candidate QPS | diff percentage | diff -- | -- | -- | -