[PR] Fixed a flaky test TestKnnFloatVectorQuery.testFindFewer [lucene]

2025-02-12 Thread via GitHub
navneet1v opened a new pull request, #14223: URL: https://github.com/apache/lucene/pull/14223 ### Description Fixed a flaky test TestKnnFloatVectorQuery.testFindFewer This PR solves the issue: https://github.com/apache/lucene/issues/14175 Details on why the test was faili

Re: [I] Refactor QueryCache to improve concurrency and performance [lucene]

2025-02-12 Thread via GitHub
stefanvodita commented on issue #14222: URL: https://github.com/apache/lucene/issues/14222#issuecomment-2653320901 Sounds interesting, keen to see if we measure a performance improvement! -- This is an automated message from the Apache Git Service. To respond to the message, please log on

Re: [PR] Support DataInput as source for StoredField [lucene]

2025-02-12 Thread via GitHub
iverase commented on code in PR #14213: URL: https://github.com/apache/lucene/pull/14213#discussion_r1952672292 ## lucene/core/src/java/org/apache/lucene/index/StoredFieldVisitor.java: ## @@ -41,15 +40,17 @@ public abstract class StoredFieldVisitor { protected StoredFieldVisi

Re: [PR] Use Vector API to decode BKD docIds [lucene]

2025-02-12 Thread via GitHub
gf2121 commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2653948712 I refactored the code to use an inner loop. Results on wikimediumall (AVX512): ``` TaskQPS baseline StdDevQPS my_modified_version StdDevPct d

Re: [PR] Add new Acorn-esque filtered HNSW search heuristic [lucene]

2025-02-12 Thread via GitHub
benwtrent commented on PR #14160: URL: https://github.com/apache/lucene/pull/14160#issuecomment-2653702500 Barring any further discussion, I am gonna merge this soon. While the filtering search still isn't perfect, this is a marked improvement.

Re: [PR] Support DataInput as source for StoredField [lucene]

2025-02-12 Thread via GitHub
iverase commented on code in PR #14213: URL: https://github.com/apache/lucene/pull/14213#discussion_r1952663407 ## lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene50/compressing/Lucene50CompressingStoredFieldsReader.java: ## @@ -25,12 +25,7 @@ import org

Re: [PR] Support DataInput as source for StoredField [lucene]

2025-02-12 Thread via GitHub
iverase commented on PR #14213: URL: https://github.com/apache/lucene/pull/14213#issuecomment-2653747350 +1 I like the idea of encapsulating the DataInput and length inside `StoredFieldDataInput`, and yes, it gives symmetry between the read and the write side.
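The encapsulation iverase describes can be pictured as a small value type that carries the input and the byte length together, so the read side receives exactly what the write side framed. This is a hypothetical sketch, not the PR's `StoredFieldDataInput`: `LengthPrefixedValue` is an invented name, and `java.io.DataInput` stands in for Lucene's own `org.apache.lucene.store.DataInput`.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.io.UncheckedIOException;

/**
 * Hypothetical stand-in for the idea behind StoredFieldDataInput: keep a DataInput together
 * with the number of bytes that belong to one stored value. java.io.DataInput is used here
 * in place of Lucene's org.apache.lucene.store.DataInput.
 */
record LengthPrefixedValue(DataInput in, int length) {

  /** Write side: a length prefix followed by the raw bytes. */
  static void write(DataOutput out, byte[] value) {
    try {
      out.writeInt(value.length);
      out.write(value);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Read side: read the prefix and wrap the input, which is now positioned at the payload. */
  static LengthPrefixedValue read(DataInput in) {
    try {
      return new LengthPrefixedValue(in, in.readInt());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Consumes exactly {@code length} bytes from the wrapped input. */
  byte[] readBytes() {
    byte[] bytes = new byte[length];
    try {
      in.readFully(bytes);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes;
  }
}
```

Keeping the length next to the input means a consumer does not need a separate length argument, which mirrors how the writer frames the value.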

Re: [PR] Use Vector API to decode BKD docIds [lucene]

2025-02-12 Thread via GitHub
gf2121 commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2653001052 Inner loop performance gets better on the newest commit. ``` Mac M2 Benchmark (bpv) (countVariable) Mode Cnt Score Error Units BKDCodec

[PR] Enable error-prone checks for NonFinalStaticField [lucene]

2025-02-12 Thread via GitHub
msfroh opened a new pull request, #14228: URL: https://github.com/apache/lucene/pull/14228 ### Description On https://github.com/apache/lucene/pull/14221, I foolishly asserted that the error-prone `NonFinalStaticField` checks would be "a lot less noisy". It seems that PR and its mere
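For context on the check being enabled: error-prone's `NonFinalStaticField` pattern reports `static` fields that are not `final`, since such fields are shared mutable global state. A minimal illustration with made-up class and field names (not taken from the Lucene code base) of a field the check would likely report, next to the preferred form:

```java
/** Illustrative only: names are hypothetical, not taken from the Lucene code base. */
final class StaticFieldExamples {
  // Likely reported by NonFinalStaticField: any code could reassign this shared value.
  static int mutableDefaultLimit = 10;

  // Preferred: a constant that can never be reassigned.
  static final int DEFAULT_LIMIT = 10;

  private StaticFieldExamples() {}

  /** Returns the requested limit, or the constant default when none is given. */
  static int limitOrDefault(Integer requested) {
    return requested != null ? requested : DEFAULT_LIMIT;
  }
}
```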

Re: [PR] Fixed a flaky test TestKnnFloatVectorQuery.testFindFewer [lucene]

2025-02-12 Thread via GitHub
navneet1v commented on PR #14223: URL: https://github.com/apache/lucene/pull/14223#issuecomment-2655527582 A test unrelated to this change is failing. ``` TestForTooMuchCloning > test FAILED java.lang.AssertionError: too many calls to IndexInput.clone during merging: 523

[I] Estimate memory usage for merges [lucene]

2025-02-12 Thread via GitHub
carlosdelest opened a new issue, #14225: URL: https://github.com/apache/lucene/issues/14225 ### Description Merges should expose an estimation of the amount of heap memory that will be used for the actual merge to be done. This information should be exposed as part of the merge

Re: [I] TestOperations.testGetRandomAcceptedString failing [lucene]

2025-02-12 Thread via GitHub
rmuir commented on issue #14224: URL: https://github.com/apache/lucene/issues/14224#issuecomment-2654392511 I can reproduce it, thanks! My concern is around the `[]`, this is an empty "character class", maybe unlucky result from the RNG? Otherwise when 'empty' logic is wrong, usually you kn

Re: [I] TestOperations.testGetRandomAcceptedString failing [lucene]

2025-02-12 Thread via GitHub
benwtrent commented on issue #14224: URL: https://github.com/apache/lucene/issues/14224#issuecomment-2654393417 Thanks @rmuir ! I really don't know if it's a test data failure or something interesting actually broke.

Re: [I] TestOperations.testGetRandomAcceptedString failing [lucene]

2025-02-12 Thread via GitHub
rmuir commented on issue #14224: URL: https://github.com/apache/lucene/issues/14224#issuecomment-2654396301 I see `[]` in the regexp which matches no possible characters, so it creates Automata.makeEmpty(). Concatenation of empty with anything else returns empty, so that's why the test fail
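rmuir's explanation can be checked with a toy model in which each regular language is a finite set of strings. This is an assumption for illustration only: Lucene's `Automaton` represents languages as state machines, and `ToyLanguages` is a hypothetical class. The empty class `[]` contributes the empty language, and concatenation pairs every string of one side with every string of the other, so an empty operand leaves nothing to pair:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model (hypothetical): a finite regular language is the set of strings it accepts. */
final class ToyLanguages {
  private ToyLanguages() {}

  /** Language of an empty character class such as "[]": no strings at all. */
  static Set<String> emptyClass() {
    return Set.of();
  }

  /** Language of a single literal string. */
  static Set<String> literal(String s) {
    return Set.of(s);
  }

  /** Concatenation: every string of a followed by every string of b. */
  static Set<String> concatenate(Set<String> a, Set<String> b) {
    Set<String> result = new LinkedHashSet<>();
    for (String x : a) {
      for (String y : b) {
        result.add(x + y);
      }
    }
    return result; // empty whenever a or b is empty: there is no pair to combine
  }

  /** Left-to-right concatenation of several parts, like a regexp sequence. */
  static Set<String> concatenateAll(List<Set<String>> parts) {
    Set<String> result = Set.of(""); // identity element: the language containing only ""
    for (Set<String> part : parts) {
      result = concatenate(result, part);
    }
    return result;
  }
}
```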

Re: [PR] Use multi-select instead of a full sort for DynamicRange creation [lucene]

2025-02-12 Thread via GitHub
HoustonPutman commented on code in PR #13914: URL: https://github.com/apache/lucene/pull/13914#discussion_r1953282099 ## lucene/facet/src/java/org/apache/lucene/facet/range/DynamicRangeUtil.java: ## @@ -202,66 +208,83 @@ public SegmentOutput(int hitsLength) { * is used t
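The idea in the PR title, locating only the values at the range boundaries instead of fully sorting all hits, can be illustrated with a generic multi-quickselect. This is a sketch of the technique under that reading, not the code in `DynamicRangeUtil`: `MultiSelect.select` is a hypothetical helper that partially reorders a `long[]` so that each requested rank holds the value a full sort would put there.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical sketch of multi-select: place several order statistics without a full sort. */
final class MultiSelect {
  private MultiSelect() {}

  /**
   * Reorders values so that for every rank r in ranks (sorted, distinct, each in
   * [0, values.length)), values[r] equals what a full sort would put at index r.
   */
  static void select(long[] values, int[] ranks) {
    select(values, 0, values.length - 1, ranks, 0, ranks.length - 1);
  }

  private static void select(long[] v, int lo, int hi, int[] ranks, int rLo, int rHi) {
    if (lo >= hi || rLo > rHi) {
      return; // at most one element left, or no requested rank falls in [lo, hi]
    }
    // Three-way partition: [lo, lt) < pivot, [lt, gt] == pivot, (gt, hi] > pivot.
    long pivot = v[ThreadLocalRandom.current().nextInt(lo, hi + 1)];
    int lt = lo, i = lo, gt = hi;
    while (i <= gt) {
      if (v[i] < pivot) {
        swap(v, lt++, i++);
      } else if (v[i] > pivot) {
        swap(v, i, gt--);
      } else {
        i++;
      }
    }
    // Split the requested ranks into those left of, inside, and right of the pivot block.
    int leftEnd = rLo;
    while (leftEnd <= rHi && ranks[leftEnd] < lt) {
      leftEnd++;
    }
    int rightStart = leftEnd;
    while (rightStart <= rHi && ranks[rightStart] <= gt) {
      rightStart++; // these ranks already hold the pivot value
    }
    // Recurse only into the sides that still contain requested ranks.
    select(v, lo, lt - 1, ranks, rLo, leftEnd - 1);
    select(v, gt + 1, hi, ranks, rightStart, rHi);
  }

  private static void swap(long[] v, int a, int b) {
    long tmp = v[a];
    v[a] = v[b];
    v[b] = tmp;
  }
}
```

With random pivots, the expected cost for a small number of ranks m is roughly O(n log m) rather than the O(n log n) of a full sort, because partitions that contain no requested rank are never revisited.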

Re: [PR] OptimisticKnnVectorQuery [lucene]

2025-02-12 Thread via GitHub
benwtrent commented on PR #14226: URL: https://github.com/apache/lucene/pull/14226#issuecomment-2654818796 This is indeed interesting. I eagerly await some of the perf numbers ;). But, this re-entering the graph makes me think that collectors will need to track visitation. This way the
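One way to picture the suggestion is a visited set that survives across search passes, so a second entry into the graph skips nodes that an earlier pass already scored. This is a hypothetical sketch: the adjacency-list graph, `VisitedTrackingSearch`, and its methods are invented for illustration, and the simple breadth-first exploration stands in for Lucene's actual best-first HNSW search and `KnnCollector`.

```java
import java.util.ArrayDeque;
import java.util.BitSet;
import java.util.Deque;

/**
 * Hypothetical sketch: a visited set shared across multiple entries into the same graph,
 * so that re-entering does not score the same node twice.
 */
final class VisitedTrackingSearch {
  private final int[][] neighbors; // adjacency list: neighbors[node] = connected nodes
  private final BitSet visited; // persists across calls to explore()
  private int scoredCount; // total distinct nodes scored over all passes

  VisitedTrackingSearch(int[][] neighbors) {
    this.neighbors = neighbors;
    this.visited = new BitSet(neighbors.length);
  }

  /** Scores up to maxNodes not-yet-visited nodes reachable from entry; returns how many. */
  int explore(int entry, int maxNodes) {
    int newlyScored = 0;
    Deque<Integer> frontier = new ArrayDeque<>();
    frontier.add(entry);
    while (!frontier.isEmpty() && newlyScored < maxNodes) {
      int node = frontier.poll();
      if (visited.get(node)) {
        continue; // already scored in this pass or an earlier one
      }
      visited.set(node); // mark only when scored, so unscored frontier nodes stay eligible
      newlyScored++; // stand-in for computing this node's vector similarity
      for (int next : neighbors[node]) {
        if (!visited.get(next)) {
          frontier.add(next);
        }
      }
    }
    scoredCount += newlyScored;
    return newlyScored;
  }

  int scoredCount() {
    return scoredCount;
  }
}
```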

Re: [PR] OptimisticKnnVectorQuery [lucene]

2025-02-12 Thread via GitHub
msokolov commented on PR #14226: URL: https://github.com/apache/lucene/pull/14226#issuecomment-2654820602 Hmm, GitHub's test run failed with: ``` gradlew :lucene:core:test --tests "org.apache.lucene.search.TestByteVectorSimilarityQuery.testFallbackToExact" -Ptests.jvms=1 -Ptests.jv

Re: [PR] OptimisticKnnVectorQuery [lucene]

2025-02-12 Thread via GitHub
msokolov commented on PR #14226: URL: https://github.com/apache/lucene/pull/14226#issuecomment-2654826667 > But, this re-entering the graph makes me think that collectors will need to track visitation. This way the same vector path isn't visited multiple times. It's possible that once you en

Re: [I] TestOperations.testGetRandomAcceptedString failing [lucene]

2025-02-12 Thread via GitHub
rmuir commented on issue #14224: URL: https://github.com/apache/lucene/issues/14224#issuecomment-2654406157 I think a test against the old parser should show it. Maybe it indirectly did the `peek()`s and `match()`s differently in the case of no content within a `[]`, and didn't try to parse it

Re: [PR] Use multi-select instead of a full sort for DynamicRange creation [lucene]

2025-02-12 Thread via GitHub
HoustonPutman commented on code in PR #13914: URL: https://github.com/apache/lucene/pull/13914#discussion_r1953123437 ## lucene/facet/src/java/org/apache/lucene/facet/range/DynamicRangeUtil.java: ## @@ -202,66 +208,83 @@ public SegmentOutput(int hitsLength) { * is used t

[I] TestOperations.testGetRandomAcceptedString failing [lucene]

2025-02-12 Thread via GitHub
benwtrent opened a new issue, #14224: URL: https://github.com/apache/lucene/issues/14224 ### Description I bet the failure is due to the recent development around `Automaton` stuff. Git bisect led me to fe42efc5918 ``` TestOperations > testGetRandomAcceptedString FAI

Re: [I] HNSW connect components can take an inordinate amount of time [lucene]

2025-02-12 Thread via GitHub
benwtrent commented on issue #14214: URL: https://github.com/apache/lucene/issues/14214#issuecomment-2654755040 So, verifying the "fewDistinct" slowness, here is how connect components works in this adverse case: ``` 1> HNSW 1 [2025-02-12T20:14:45.641640Z; TEST-TestKnnFloatVector

[PR] OptimisticKnnVectorQuery [lucene]

2025-02-12 Thread via GitHub
msokolov opened a new pull request, #14226: URL: https://github.com/apache/lucene/pull/14226 ### Description This is a WIP patch to work out an idea for Knn hit collection that is deterministic and efficient in the sense that the number of hits collected per leaf scales with the size
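The preview cuts off mid-sentence, so the exact rule used in the PR is not shown here. As a rough illustration only, assuming "scales with the size" means each leaf's share of the requested k is proportional to its fraction of all documents, a per-leaf budget could be computed as below. `PerLeafK.perLeafK` is a hypothetical helper, not the PR's code.

```java
/** Hypothetical sketch: split a global k across leaves in proportion to leaf size. */
final class PerLeafK {
  private PerLeafK() {}

  /**
   * Returns a per-leaf hit budget proportional to each leaf's share of all documents,
   * rounded up so that (for k >= 1) a non-empty leaf never gets a budget of zero.
   */
  static int[] perLeafK(int k, int[] leafSizes) {
    long total = 0;
    for (int size : leafSizes) {
      total += size;
    }
    int[] budgets = new int[leafSizes.length];
    if (total == 0) {
      return budgets; // no documents: nothing to collect anywhere
    }
    for (int i = 0; i < leafSizes.length; i++) {
      // ceil(k * size / total) in integer arithmetic, capped at k
      long scaled = ((long) k * leafSizes[i] + total - 1) / total;
      budgets[i] = (int) Math.min(k, scaled);
    }
    return budgets;
  }
}
```

Because each budget depends only on k and the leaf sizes, not on the order in which leaves are searched, the allocation is deterministic; rounding up trades a few extra collected hits for not starving small leaves.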

Re: [I] TestOperations.testGetRandomAcceptedString failing [lucene]

2025-02-12 Thread via GitHub
rmuir commented on issue #14224: URL: https://github.com/apache/lucene/issues/14224#issuecomment-2655012824 This one was simpler; it is just a case where you should be getting a parsing error: https://github.com/apache/lucene/pull/14227

[PR] Fix failure found by TestOperations.testGetRandomAcceptedString [lucene]

2025-02-12 Thread via GitHub
rmuir opened a new pull request, #14227: URL: https://github.com/apache/lucene/pull/14227 string: `?+½]+]+Ř*+[\]ᖴﴁ.` expected: before #14193 ``` java.lang.IllegalArgumentException: expected ']' at position 17 ``` actual: after #14193 ``` REGEXP_CONCATENATION
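The expected behavior before #14193 was a parse error: in `[\]ᖴﴁ.` the backslash escapes the `]`, so the character class is never closed. A minimal sketch of that check follows; `CharClassScanner.findClassEnd` is a hypothetical helper, not Lucene's `RegExp` parser. It scans from an opening `[`, skips escaped characters, and throws when it runs out of input.

```java
/** Hypothetical sketch: locate the closing ']' of a character class, honoring '\' escapes. */
final class CharClassScanner {
  private CharClassScanner() {}

  /**
   * Given the index of an opening '[', returns the index of its closing ']'.
   *
   * @throws IllegalArgumentException if there is no '[' at openIndex or the class is never closed
   */
  static int findClassEnd(String regex, int openIndex) {
    if (regex.charAt(openIndex) != '[') {
      throw new IllegalArgumentException("no '[' at position " + openIndex);
    }
    int pos = openIndex + 1;
    while (pos < regex.length()) {
      char c = regex.charAt(pos);
      if (c == '\\') {
        pos += 2; // skip the backslash and the character it escapes, e.g. "\]"
      } else if (c == ']') {
        return pos;
      } else {
        pos++;
      }
    }
    // Ran off the end without an unescaped ']': report the end of input.
    throw new IllegalArgumentException(
        "expected ']' at position " + Math.min(pos, regex.length()));
  }
}
```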

Re: [I] TestOperations.testGetRandomAcceptedString failing [lucene]

2025-02-12 Thread via GitHub
benwtrent commented on issue #14224: URL: https://github.com/apache/lucene/issues/14224#issuecomment-2654345727 Here is the regex string that creates the empty operations: ``` regex str: (\?){1,}\½(\]){1,}\(\]){1,}((\Ř)*){1,}[]"ᖴﴁ". ```

Re: [PR] Upgrade errorprone to 2.36.0 [lucene]

2025-02-12 Thread via GitHub
dweiss merged PR #14216: URL: https://github.com/apache/lucene/pull/14216

Re: [PR] Clean up public non-final statics [lucene]

2025-02-12 Thread via GitHub
dweiss merged PR #14221: URL: https://github.com/apache/lucene/pull/14221

Re: [PR] Use multi-select instead of a full sort for DynamicRange creation [lucene]

2025-02-12 Thread via GitHub
HoustonPutman commented on code in PR #13914: URL: https://github.com/apache/lucene/pull/13914#discussion_r1953286478 ## lucene/facet/src/java/org/apache/lucene/facet/range/DynamicRangeUtil.java: ## @@ -202,66 +208,83 @@ public SegmentOutput(int hitsLength) { * is used t

Re: [I] TestOperations.testGetRandomAcceptedString failing [lucene]

2025-02-12 Thread via GitHub
rmuir commented on issue #14224: URL: https://github.com/apache/lucene/issues/14224#issuecomment-2654370971 @benwtrent I can look tonight

Re: [PR] Clean up public non-final statics [lucene]

2025-02-12 Thread via GitHub
msfroh commented on PR #14221: URL: https://github.com/apache/lucene/pull/14221#issuecomment-2654462496 > #14216 may help with this specifically `NonFinalStaticField` which was off since it was super noisy. https://errorprone.info/bugpattern/NonFinalStaticField Once #14216 and this PR

Re: [PR] Clean up public non-final statics [lucene]

2025-02-12 Thread via GitHub
dweiss commented on PR #14221: URL: https://github.com/apache/lucene/pull/14221#issuecomment-2654625492 Done. Thank you!

Re: [PR] supports force merge based on specified segments. [lucene]

2025-02-12 Thread via GitHub
cgejian commented on PR #14163: URL: https://github.com/apache/lucene/pull/14163#issuecomment-2655275009 @mikemccand This is my first time submitting a PR to the Lucene project. Could you please tell me if this PR can be merged? Looking forward to your reply, thank you.