Re: [PR] OptimisticKnnVectorQuery [lucene]

2025-02-20 Thread via GitHub
dungba88 commented on code in PR #14226: URL: https://github.com/apache/lucene/pull/14226#discussion_r1964980144 ## lucene/core/src/java/org/apache/lucene/search/OptimisticKnnVectorQuery.java: ## @@ -0,0 +1,205 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under on

[PR] PointInSetQuery clips segments by lower and upper [lucene]

2025-02-20 Thread via GitHub
hanbj opened a new pull request, #14268: URL: https://github.com/apache/lucene/pull/14268 ### Description When creating a PointInSetQuery object, the data in the packedPoints parameter is returned in order, so the maximum and minimum values can be determined when iterating over packedP

[PR] Reduce the number of comparisons when lowerPoint is equal to upperPoint [lucene]

2025-02-20 Thread via GitHub
hanbj opened a new pull request, #14267: URL: https://github.com/apache/lucene/pull/14267 ### Description When lowerPoint is equal to upperPoint, there is no need to compare against both lowerPoint and upperPoint. The number of comparisons can be reduced by half when collect

Re: [I] Refactor QueryCache to improve concurrency and performance [lucene]

2025-02-20 Thread via GitHub
sgup432 commented on issue #14222: URL: https://github.com/apache/lucene/issues/14222#issuecomment-2673153965 I got busy with other stuff but got some time to run an initial benchmark for this. I essentially micro-benchmarked the `putIfAbsent()` and `get()` methods in isolation for QueryCache

Re: [PR] Allow reading binary doc values as a RandomAccessInput [lucene]

2025-02-20 Thread via GitHub
github-actions[bot] commented on PR #13948: URL: https://github.com/apache/lucene/pull/13948#issuecomment-2673020214 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

[jira] [Updated] (LUCENE-6809) DictionaryCompoundWordTokenFilter should respect minSubwordSize also for fragments

2025-02-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated LUCENE-6809: --- Labels: pull-request-available (was: ) > DictionaryCompoundWordTokenFilter should respect m

Re: [I] DictionaryCompoundWordTokenFilter should respect minSubwordSize also for fragments [LUCENE-6809] [lucene]

2025-02-20 Thread via GitHub
renatoh commented on issue #7867: URL: https://github.com/apache/lucene/issues/7867#issuecomment-2672713650 In my opinion, the root issue is that DictionaryCompoundWordTokenFilter is not consuming the characters of a found word. As an example: The German word Schweinefleisch (literally tr

Re: [I] TestSsDvMultiRangeQuery.testDuelWithStandardDisjunction fails [lucene]

2025-02-20 Thread via GitHub
mkhludnev closed issue #14260: TestSsDvMultiRangeQuery.testDuelWithStandardDisjunction fails URL: https://github.com/apache/lucene/issues/14260 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

Re: [I] TestSsDvMultiRangeQuery.testDuelWithStandardDisjunction fails [lucene]

2025-02-20 Thread via GitHub
mkhludnev commented on issue #14260: URL: https://github.com/apache/lucene/issues/14260#issuecomment-2672662321 `10.x` nightly passed https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-10.x/128/changes Awaiting `main`.

Re: [I] Make HNSW merges faster [lucene]

2025-02-20 Thread via GitHub
benwtrent commented on issue #12440: URL: https://github.com/apache/lucene/issues/12440#issuecomment-2672345401 Their LID technique almost feels like bootstrapping with `log(n)` clusters... I wonder if we could simply gather `log(n)` clusters, then at merge time we merge like cluste

[I] Flaky `TestKnnByteVectorQueryMMap.testRandomWithFilter` test failures [lucene]

2025-02-20 Thread via GitHub
benwtrent opened a new issue, #14266: URL: https://github.com/apache/lucene/issues/14266 ### Description I noticed a 10x failure for `TestKnnByteVectorQueryMMap.testRandomWithFilter` and it's an interestingly weird edge case, and exceptionally improbable. I initially thought t

[PR] Reduce knn recall test flakiness [lucene]

2025-02-20 Thread via GitHub
benwtrent opened a new pull request, #14265: URL: https://github.com/apache/lucene/pull/14265 I have noticed some additional flakiness for knn recall. For example: ``` ./gradlew test --tests TestPerFieldKnnVectorsFormat.testRecall -Dtests.seed=FAEFE5196FDED25B -Dtests.local

Re: [I] Make HNSW merges faster [lucene]

2025-02-20 Thread via GitHub
benwtrent commented on issue #12440: URL: https://github.com/apache/lucene/issues/12440#issuecomment-2672002754 Some more recent research was shown to me by @tteofili https://arxiv.org/pdf/2501.13992 claiming not only faster search, but also more than 2x faster HNSW graph building (w

Re: [I] HNSW connect components can take an inordinate amount of time [lucene]

2025-02-20 Thread via GitHub
benwtrent commented on issue #14214: URL: https://github.com/apache/lucene/issues/14214#issuecomment-2671796667 So, these adverse scenarios where connect components has to do a ton of work all stem from us keeping the graph very sparse (e.g. only connecting diverse nodes). I wonder

Re: [I] TestSsDvMultiRangeQuery.testDuelWithStandardDisjunction fails [lucene]

2025-02-20 Thread via GitHub
benwtrent commented on issue #14260: URL: https://github.com/apache/lucene/issues/14260#issuecomment-2671560417 Ah, nice :). I ran it again locally, and it worked well. I think this issue can be closed. Thank you @mkhludnev!

Re: [PR] fix nightly test #14260 request all hits (#14263) [lucene]

2025-02-20 Thread via GitHub
mkhludnev merged PR #14264: URL: https://github.com/apache/lucene/pull/14264

[PR] fix nightly test #14260 request all hits (#14263) [lucene]

2025-02-20 Thread via GitHub
mkhludnev opened a new pull request, #14264: URL: https://github.com/apache/lucene/pull/14264 TestSsDvMultiRangeQuery#testDuelWithStandardDisjunction (cherry picked from commit 27079706ef1f8341b2033efde767e95045c91f6c)

Re: [I] TestSsDvMultiRangeQuery.testDuelWithStandardDisjunction fails [lucene]

2025-02-20 Thread via GitHub
mkhludnev commented on issue #14260: URL: https://github.com/apache/lucene/issues/14260#issuecomment-2671466187 Pushed fix #14263. Thanks @benwtrent. Luckily it breaks only the nightly test.

Re: [I] TestSsDvMultiRangeQuery.testDuelWithStandardDisjunction fails [lucene]

2025-02-20 Thread via GitHub
mkhludnev commented on issue #14260: URL: https://github.com/apache/lucene/issues/14260#issuecomment-2671467410 failure https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-main/1578/testReport/junit/org.apache.lucene.sandbox.search/TestSsDvMultiRangeQuery/testDuelWithStandardDisj

Re: [PR] fix #14260 assert error TestSsDvMultiRangeQuery#testDuelWithStandardDisjunction [lucene]

2025-02-20 Thread via GitHub
mkhludnev merged PR #14263: URL: https://github.com/apache/lucene/pull/14263

Re: [PR] SortedSet DV Multi Range query [lucene]

2025-02-20 Thread via GitHub
mkhludnev commented on PR #13974: URL: https://github.com/apache/lucene/pull/13974#issuecomment-2671447690 one idea: - optimize one seekCeil() call `if r.upper==r.lower || r.equals(r.lower)` TBC

Re: [PR] fix #14260 assert error TestSsDvMultiRangeQuery#testDuelWithStandardDisjunction [lucene]

2025-02-20 Thread via GitHub
mkhludnev commented on PR #14263: URL: https://github.com/apache/lucene/pull/14263#issuecomment-2671380907 related #13974 fix #14260

[PR] fix assert error TestSsDvMultiRangeQuery#testDuelWithStandardDisjunction #14260 [lucene]

2025-02-20 Thread via GitHub
mkhludnev opened a new pull request, #14263: URL: https://github.com/apache/lucene/pull/14263 ### Description Request all hits to fix the assert error in TestSsDvMultiRangeQuery#testDuelWithStandardDisjunction

Re: [PR] SortedSet DV Multi Range query [lucene]

2025-02-20 Thread via GitHub
mkhludnev commented on PR #13974: URL: https://github.com/apache/lucene/pull/13974#issuecomment-2670991159 pardon #14260