Re: [PR] Keep evaluating conjunction one doc-at-a-time until dynamic pruning kicks in. [lucene]

2025-06-01 Thread via GitHub


gf2121 commented on code in PR #14739:
URL: https://github.com/apache/lucene/pull/14739#discussion_r2117962626


##
lucene/core/src/java/org/apache/lucene/search/BlockMaxConjunctionBulkScorer.java:
##
@@ -87,18 +87,64 @@ public int score(LeafCollector collector, Bits acceptDocs, int min, int max) thr
       // NOTE: windowMax is inclusive
       int windowMax = Math.min(scorers[0].advanceShallow(windowMin), max - 1);
 
-      float maxWindowScore = Float.POSITIVE_INFINITY;
       if (0 < scorable.minCompetitiveScore) {
-        maxWindowScore = computeMaxScore(windowMin, windowMax);
+        float maxWindowScore = computeMaxScore(windowMin, windowMax);
+        scoreWindowScoreFirst(collector, acceptDocs, windowMin, windowMax + 1, maxWindowScore);
+      } else {
+        scoreWindowDocFirst(collector, acceptDocs, windowMin, windowMax + 1);

Review Comment:
   So `minCompetitiveScore` never gets a chance to be respected when a filter 
clause leads the query, because `windowMax` is then 
`DocIdSetIterator#NO_MORE_DOCS`. Could this cause a regression?
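
   To make the concern concrete, here is a minimal, Lucene-free sketch. It 
assumes, as the comment above states, that a leading filter clause yields a 
window ending at `NO_MORE_DOCS`. The class `WindowSketch`, its `advanceShallow` 
stand-in, and the `blockSize` parameter are hypothetical and only model how 
often the doc-first/score-first choice would be re-evaluated:

```java
public class WindowSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /**
   * Hypothetical stand-in for advanceShallow: returns the last doc of the block
   * containing target, or NO_MORE_DOCS when the lead clause exposes no block
   * structure (modelled here by blockSize <= 0, e.g. a filter clause).
   */
  static int advanceShallow(int target, int blockSize) {
    if (blockSize <= 0) {
      return NO_MORE_DOCS;
    }
    return (target / blockSize) * blockSize + blockSize - 1;
  }

  /** Counts how many times the per-window strategy choice would be made over [min, max). */
  static int countStrategyDecisions(int min, int max, int blockSize) {
    int decisions = 0;
    int windowMin = min;
    while (windowMin < max) {
      // windowMax is inclusive, as in the diff above
      int windowMax = Math.min(advanceShallow(windowMin, blockSize), max - 1);
      decisions++; // this is where minCompetitiveScore would be consulted
      windowMin = windowMax + 1;
    }
    return decisions;
  }

  public static void main(String[] args) {
    System.out.println("blocks of 128: " + countStrategyDecisions(0, 1000, 128));
    System.out.println("no block structure: " + countStrategyDecisions(0, 1000, 0));
  }
}
```

   With a single window, the choice is made once, before any hit has been 
collected, so in this model a later increase of the minimum competitive score 
is never consulted.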



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Support for Re-Ranking Queries using Late Interaction Model Multi-Vectors. [lucene]

2025-06-01 Thread via GitHub


romseygeek commented on PR #14729:
URL: https://github.com/apache/lucene/pull/14729#issuecomment-2927412166

   The advantage of a `Rescorer` is that it is explicitly only run over the 
hits in a `TopDocs` instance, whereas `FunctionScoreQuery` will run over the 
entire docid space if you let it.  So it's a natural fit for a late-interaction 
search process: run your first query over the whole document set to get a 
preliminary top-k, and then pass the resulting `TopDocs` to your rescorer.
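
   A minimal sketch of that two-phase flow, written without Lucene so it 
stays self-contained. `TwoPhaseSketch`, `Hit`, `firstPass` and `rescore` are 
hypothetical names, not the `Rescorer` API; the only point is that the 
expensive scorer is invoked once per preliminary hit rather than once per 
document:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntToDoubleFunction;

public class TwoPhaseSketch {
  /** A (doc, score) pair, standing in for one hit of a TopDocs. */
  static final class Hit {
    final int doc;
    final double score;

    Hit(int doc, double score) {
      this.doc = doc;
      this.score = score;
    }
  }

  /** First pass: score every doc with the cheap function and keep the k best. */
  static List<Hit> firstPass(int maxDoc, int k, IntToDoubleFunction cheap) {
    List<Hit> all = new ArrayList<>();
    for (int doc = 0; doc < maxDoc; doc++) {
      all.add(new Hit(doc, cheap.applyAsDouble(doc)));
    }
    all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
    return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
  }

  /** Second pass: re-score only the hits from the first pass, then re-sort them. */
  static List<Hit> rescore(List<Hit> topK, IntToDoubleFunction expensive) {
    List<Hit> out = new ArrayList<>();
    for (Hit h : topK) {
      out.add(new Hit(h.doc, expensive.applyAsDouble(h.doc)));
    }
    out.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
    return out;
  }

  public static void main(String[] args) {
    int[] expensiveCalls = {0};
    List<Hit> prelim = firstPass(1000, 10, doc -> doc % 100);
    List<Hit> finalHits = rescore(prelim, doc -> {
      expensiveCalls[0]++; // count how often the costly scorer runs
      return -doc;
    });
    System.out.println("expensive scorer calls: " + expensiveCalls[0]);
    System.out.println("best doc after rescoring: " + finalHits.get(0).doc);
  }
}
```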





Re: [PR] Add a DoubleValuesSource for scoring full precision vector similarity [lucene]

2025-06-01 Thread via GitHub


github-actions[bot] commented on PR #14708:
URL: https://github.com/apache/lucene/pull/14708#issuecomment-2928159436

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog-check 
label to it and you will stop receiving this reminder on future updates to the 
PR.





Re: [PR] Fix java doc in IndexWriter. [lucene]

2025-06-01 Thread via GitHub


github-actions[bot] commented on PR #14733:
URL: https://github.com/apache/lucene/pull/14733#issuecomment-2928438304

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog-check 
label to it and you will stop receiving this reminder on future updates to the 
PR.





Re: [PR] Fix java doc in IndexWriter. [lucene]

2025-06-01 Thread via GitHub


vsop-479 commented on code in PR #14733:
URL: https://github.com/apache/lucene/pull/14733#discussion_r2119876685


##
lucene/core/src/java/org/apache/lucene/index/IndexWriter.java:
##
@@ -469,9 +469,9 @@ public void onTicketBacklog() {
    * session can be quickly made available for searching without closing the writer nor calling
    * {@link #commit}.
    *
-   * Note that this is functionally equivalent to calling {#flush} and then opening a new reader.
-   * But the turnaround time of this method should be faster since it avoids the potentially costly
-   * {@link #commit}.
+   * Note that this is functionally equivalent to calling {@link #flush} and then opening a new

Review Comment:
   Thanks @stefanvodita, I resolved it.






[PR] build(deps): bump ruff from 0.11.9 to 0.11.12 in /dev-tools/scripts [lucene]

2025-06-01 Thread via GitHub


dependabot[bot] opened a new pull request, #14744:
URL: https://github.com/apache/lucene/pull/14744

   Bumps [ruff](https://github.com/astral-sh/ruff) from 0.11.9 to 0.11.12.
   
   Release notes
   Sourced from [ruff's releases](https://github.com/astral-sh/ruff/releases).
   
   0.11.12
   Release Notes
   Preview features
   
   - [airflow] Revise fix titles (AIR3) ([#18215](https://redirect.github.com/astral-sh/ruff/pull/18215))
   - [pylint] Implement missing-maxsplit-arg (PLC0207) ([#17454](https://redirect.github.com/astral-sh/ruff/pull/17454))
   - [pyupgrade] New rule UP050 (useless-class-metaclass-type) ([#18334](https://redirect.github.com/astral-sh/ruff/pull/18334))
   - [flake8-use-pathlib] Replace os.symlink with Path.symlink_to (PTH211) ([#18337](https://redirect.github.com/astral-sh/ruff/pull/18337))
   
   Bug fixes
   
   - [flake8-bugbear] Ignore __debug__ attribute in B010 ([#18357](https://redirect.github.com/astral-sh/ruff/pull/18357))
   - [flake8-async] Fix anyio.sleep argument name (ASYNC115, ASYNC116) ([#18262](https://redirect.github.com/astral-sh/ruff/pull/18262))
   - [refurb] Fix FURB129 autofix generating invalid syntax ([#18235](https://redirect.github.com/astral-sh/ruff/pull/18235))
   
   Rule changes
   
   - [flake8-implicit-str-concat] Add autofix for ISC003 ([#18256](https://redirect.github.com/astral-sh/ruff/pull/18256))
   - [pycodestyle] Improve the diagnostic message for E712 ([#18328](https://redirect.github.com/astral-sh/ruff/pull/18328))
   - [flake8-2020] Fix diagnostic message for != comparisons (YTT201) ([#18293](https://redirect.github.com/astral-sh/ruff/pull/18293))
   - [pyupgrade] Make fix unsafe if it deletes comments (UP010) ([#18291](https://redirect.github.com/astral-sh/ruff/pull/18291))
   
   Documentation
   
   - Simplify rules table to improve readability ([#18297](https://redirect.github.com/astral-sh/ruff/pull/18297))
   - Update editor integrations link in README ([#17977](https://redirect.github.com/astral-sh/ruff/pull/17977))
   - [flake8-bugbear] Add fix safety section (B006) ([#17652](https://redirect.github.com/astral-sh/ruff/pull/17652))
   
   Contributors
   
   - [@AlexWaygood](https://github.com/AlexWaygood)
   - [@CodeMan62](https://github.com/CodeMan62)
   - [@InSyncWithFoo](https://github.com/InSyncWithFoo)
   - [@Kalmaegi](https://github.com/Kalmaegi)
   - [@LaBatata101](https://github.com/LaBatata101)
   - [@Lee-W](https://github.com/Lee-W)
   - [@MaddyGuthridge](https://github.com/MaddyGuthridge)
   - [@MatthewMckee4](https://github.com/MatthewMckee4)
   - [@MichaReiser](https://github.com/MichaReiser)
   - [@Vasanth-96](https://github.com/Vasanth-96)
   - [@carljm](https://github.com/carljm)
   - [@charliermarsh](https://github.com/charliermarsh)
   - [@chirizxc](https://github.com/chirizxc)
   - [@dcreager](https://github.com/dcreager)
   - [@dhruvmanila](https://github.com/dhruvmanila)
   - [@dsherret](https://github.com/dsherret)
   - [@dylwil3](https://github.com/dylwil3)
   - [@felixscherz](https://github.com/felixscherz)
   - [@fennr](https://github.com/fennr)
   
   
   
   ... (truncated)
   
   

Re: [PR] Add a DoubleValuesSource for scoring full precision vector similarity [lucene]

2025-06-01 Thread via GitHub


vigyasharma commented on PR #14708:
URL: https://github.com/apache/lucene/pull/14708#issuecomment-2928167858

   Moved full precision scores logic to a separate 
`FullPrecisionFloatVectorSimilarityValuesSource` that can take a custom vector 
similarity function.





[PR] build(deps): bump basedpyright from 1.29.1 to 1.29.2 in /dev-tools/scripts [lucene]

2025-06-01 Thread via GitHub


dependabot[bot] opened a new pull request, #14745:
URL: https://github.com/apache/lucene/pull/14745

   Bumps [basedpyright](https://github.com/detachhead/basedpyright) from 1.29.1 
to 1.29.2.
   
   Commits
   
   - [cc4dced](https://github.com/DetachHead/basedpyright/commit/cc4dcede985490e029945df0981b644b0ae806df) 1.29.2
   - [c42ccb1](https://github.com/DetachHead/basedpyright/commit/c42ccb1e5324e0583968c4f6d804e6fb1f6b9f58) configure vscode to treat selfParameter and clsParameter semantic token t...
   - [539f430](https://github.com/DetachHead/basedpyright/commit/539f430b4ebb815e12e1564fcdb71d6133450a09) update package-lock.json
   - [a6174a3](https://github.com/DetachHead/basedpyright/commit/a6174a35fe105ebaa883caf7b7d8892c0380e36a) Merge tag '1.1.401' into merge-1.1.401
   - [79632e3](https://github.com/DetachHead/basedpyright/commit/79632e394fe037d6f1e6c7d0c403191fc5a1b4a3) Fix lint
   - [feaaec0](https://github.com/DetachHead/basedpyright/commit/feaaec000e30baca34d705e638c99ff18f98d2db) Replace with ts-expect-error as suggested
   - [fd1e97e](https://github.com/DetachHead/basedpyright/commit/fd1e97e09839112c7d8e9126df7b92b0a41c80f9) Revert accidental whitespace changes
   - [1c150a2](https://github.com/DetachHead/basedpyright/commit/1c150a27cee0942d5c50260e0f1dfc45ec49b3ff) Remove new static&class method tests, they should also have a decorator and a...
   - [46653d1](https://github.com/DetachHead/basedpyright/commit/46653d1db1a850eb6753da6dcaece03d72080171) Do not autocomplete @override when reportExplicitOverride is disabled
   - [ae64f08](https://github.com/DetachHead/basedpyright/commit/ae64f08f39e7a7bef18acca58fdafafb859bffc2) Published 1.1.401
   - Additional commits viewable in [compare view](https://github.com/detachhead/basedpyright/compare/v1.29.1...v1.29.2)
   
   
   
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=basedpyright&package-manager=pip&previous-version=1.29.1&new-version=1.29.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   [//]: # (dependabot-automerge-start)
   [//]: # (dependabot-automerge-end)
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually
   - `@dependabot show <dependency name> ignore conditions` will show all of 
the ignore conditions of the specified dependency
   - `@dependabot ignore this major version` will close this PR and stop 
Dependabot creating any more for this major version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this minor version` will close this PR and stop 
Dependabot creating any more for this minor version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this dependency` will close this PR and stop 
Dependabot creating any more for this dependency (unless you reopen the PR or 
upgrade to it yourself)
   
   
   





Re: [PR] deps(java): bump org.apache.rat:apache-rat from 0.14 to 0.16.1 [lucene]

2025-06-01 Thread via GitHub


github-actions[bot] commented on PR #14582:
URL: https://github.com/apache/lucene/pull/14582#issuecomment-2928199733

   This PR has not had activity in the past 2 weeks, labeling it as stale. If 
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you 
for your contribution!





[PR] build(deps): bump holidays from 0.71 to 0.73 in /dev-tools/scripts [lucene]

2025-06-01 Thread via GitHub


dependabot[bot] opened a new pull request, #14743:
URL: https://github.com/apache/lucene/pull/14743

   Bumps [holidays](https://github.com/vacanza/holidays) from 0.71 to 0.73.
   
   Release notes
   Sourced from [holidays's releases](https://github.com/vacanza/holidays/releases).
   
   v0.73
   Version 0.73
   Released May 19, 2025
   
   - Add Cocos Islands holidays ([#2532](https://redirect.github.com/vacanza/holidays/issues/2532) by [@tr33k](https://github.com/tr33k))
   - Add Grenada holidays ([#2524](https://redirect.github.com/vacanza/holidays/issues/2524) by [@nalin-28](https://github.com/nalin-28))
   - Add Nepal holidays ([#2386](https://redirect.github.com/vacanza/holidays/issues/2386) by [@ankushhKapoor](https://github.com/ankushhKapoor), [@arkid15r](https://github.com/arkid15r))
   - Add Togo holidays ([#2525](https://redirect.github.com/vacanza/holidays/issues/2525) by [@Roshan-1024](https://github.com/Roshan-1024), [@KJhellico](https://github.com/KJhellico))
   - Update Andorra holidays, add l10n support ([#2530](https://redirect.github.com/vacanza/holidays/issues/2530) by [@KJhellico](https://github.com/KJhellico))
   - Update Argentina holidays: add 2018 G20 Leaders' Summit for Buenos Aires ([#2529](https://redirect.github.com/vacanza/holidays/issues/2529) by [@PPsyrius](https://github.com/PPsyrius))
   - Update Philippines holidays: add special holiday May 12, 2025 ([#2539](https://redirect.github.com/vacanza/holidays/issues/2539) by [@KJhellico](https://github.com/KJhellico))
   - Update Vatican City holidays: add election and name day of Pope Leo XIV ([#2549](https://redirect.github.com/vacanza/holidays/issues/2549) by [@KJhellico](https://github.com/KJhellico))
   - Update documentation build: make PR links in changelog ([#2540](https://redirect.github.com/vacanza/holidays/issues/2540) by [@KJhellico](https://github.com/KJhellico))
   - Update pre-commit config ([#2548](https://redirect.github.com/vacanza/holidays/issues/2548) by [@KJhellico](https://github.com/KJhellico), [@arkid15r](https://github.com/arkid15r))
   
   Full Changelog: https://github.com/vacanza/holidays/compare/v0.72...v0.73
   
   v0.72
   Version 0.72
   Released May 5, 2025
   
   - Add Sao Tome and Principe holidays ([#2489](https://redirect.github.com/vacanza/holidays/issues/2489) by [@tr33k](https://github.com/tr33k), [@arkid15r](https://github.com/arkid15r))
   - Add Trinidad and Tobago holidays ([#2402](https://redirect.github.com/vacanza/holidays/issues/2402) by [@Roshan-1024](https://github.com/Roshan-1024), [@KJhellico](https://github.com/KJhellico))
   - Fix TestClosestHoliday current date handling ([#2517](https://redirect.github.com/vacanza/holidays/issues/2517) by [@KJhellico](https://github.com/KJhellico))
   - Fix typography: replace U+2019 with "'" and U+2013 with '-' ([#2523](https://redirect.github.com/vacanza/holidays/issues/2523) by [@KJhellico](https://github.com/KJhellico))
   - Update Canada holidays: add historical holidays ([#2507](https://redirect.github.com/vacanza/holidays/issues/2507) by [@PPsyrius](https://github.com/PPsyrius))
   - Update Ethiopia holidays: official source namings, WORKDAY category ([#2490](https://redirect.github.com/vacanza/holidays/issues/2490) by [@PPsyrius](https://github.com/PPsyrius))
   - Update India holidays: add missing Tamil Nadu holidays ([#2502](https://redirect.github.com/vacanza/holidays/issues/2502) by [@tr33k](https://github.com/tr33k), [@KJhellico](https://github.com/KJhellico))
   - Update README: add Snyk package health badge ([#2503](https://redirect.github.com/vacanza/holidays/issues/2503) by [@KJhellico](https://github.com/KJhellico))
   - Update Singapore holidays: 2025 Polling Day on May 3rd ([#2487](https://redirect.github.com/vacanza/holidays/issues/2487) by [@PPsyrius](https://github.com/PPsyrius))
   - Update Taiwan holidays: test case refactor ([#2498](https://redirect.github.com/vacanza/holidays/issues/2498) by [@PPsyrius](https://github.com/PPsyrius))
   - Update documentation build process ([#2501](https://redirect.github.com/vacanza/holidays/issues/2501) by [@KJhellico](https://github.com/KJhellico), [@arkid15r](https://github.com/arkid15r))
   - Update documentation tests: add AUTHORS.md checking ([#2492](https://redirect.github.com/vacanza/holidays/issues/2492) by [@KJhellico](https://github.com/KJhellico), [@arkid15r](https://github.com/arkid15r))
   - Add missing subdivisions aliases ([#2520](https://redirect.github.com/vacanza/holidays/issues/2520) by [@KJhellico](https://github.com/KJhellico))
   - Disable v1 incompatibility warning ([#2518](https://redirect.github.com/vacanza/holidays/issues/2518) by [@arkid15r](https://github.com/arkid15r))
   - Docstring cleanup for Indochinese countries ([#2505](https://redirect.github.com/vacanza/holidays/issues/2505) by [@PPsyrius](https://github.com/PPsyrius))
   - Extend Chinese Lunisolar calendar support ([#2488](https://redirect.github.com/vacanza/holidays/issues/2488) by https://github.com/KJh

Re: [PR] build(deps): bump ruff from 0.11.9 to 0.11.12 in /dev-tools/scripts [lucene]

2025-06-01 Thread via GitHub


github-actions[bot] commented on PR #14744:
URL: https://github.com/apache/lucene/pull/14744#issuecomment-2928169198

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog-check 
label to it and you will stop receiving this reminder on future updates to the 
PR.





Re: [PR] build(deps): bump basedpyright from 1.29.1 to 1.29.2 in /dev-tools/scripts [lucene]

2025-06-01 Thread via GitHub


github-actions[bot] commented on PR #14745:
URL: https://github.com/apache/lucene/pull/14745#issuecomment-2928169297

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog-check 
label to it and you will stop receiving this reminder on future updates to the 
PR.





Re: [PR] build(deps): bump holidays from 0.71 to 0.73 in /dev-tools/scripts [lucene]

2025-06-01 Thread via GitHub


github-actions[bot] commented on PR #14743:
URL: https://github.com/apache/lucene/pull/14743#issuecomment-2928169068

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog-check 
label to it and you will stop receiving this reminder on future updates to the 
PR.





Re: [PR] Add a DoubleValuesSource for scoring full precision vector similarity [lucene]

2025-06-01 Thread via GitHub


dungba88 commented on code in PR #14708:
URL: https://github.com/apache/lucene/pull/14708#discussion_r2119797821


##
lucene/core/src/java/org/apache/lucene/search/ByteVectorSimilarityValuesSource.java:
##
@@ -42,7 +78,35 @@ public VectorScorer getScorer(LeafReaderContext ctx) throws IOException {
       ByteVectorValues.checkField(ctx.reader(), fieldName);
       return null;
     }
-    return vectorValues.scorer(queryVector);
+    final FieldInfo fi = ctx.reader().getFieldInfos().fieldInfo(fieldName);
+    if (fi.getVectorDimension() != queryVector.length) {
+      throw new IllegalArgumentException(
+          "Query vector dimension does not match field dimension: "
+              + queryVector.length
+              + " != "
+              + fi.getVectorDimension());
+    }
+
+    // default vector scorer
+    if (useFullPrecision == false) {
+      return vectorValues.scorer(queryVector);
+    }
+
+    final VectorSimilarityFunction vectorSimilarityFunction = fi.getVectorSimilarityFunction();
+    return new VectorScorer() {
+      final KnnVectorValues.DocIndexIterator iterator = vectorValues.iterator();
+
+      @Override
+      public float score() throws IOException {
+        return vectorSimilarityFunction.compare(
+            queryVector, vectorValues.vectorValue(iterator.index()));

Review Comment:
   I'm putting https://github.com/apache/lucene/issues/14746 for further 
discussion on this topic.






[I] Can we support vectors to be loaded with direct I/O for full precision re-ranking? [lucene]

2025-06-01 Thread via GitHub


dungba88 opened a new issue, #14746:
URL: https://github.com/apache/lucene/issues/14746

   ### Description
   
   Spin-off from the discussion in https://github.com/apache/lucene/pull/14708. 
One concern with full-precision (FP) re-ranking for quantized vectors is that 
an off-heap vector reader will page in the FP vector data, which can then 
compete for memory with the quantized vector data used for HNSW graph search. 
Since HNSW performance suffers greatly when the vectors are not in memory, for 
instance on a machine with limited memory, can we support a mode that loads the 
FP vectors with direct I/O? (Or is this already possible?)
   
   For integrating with the existing quantized vectors codec, is my 
understanding correct that we will need to create a new codec/vector reader 
that extends the [existing 
reader](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/lucene102/Lucene102BinaryQuantizedVectorsFormat.java#L126C16-L126C31)
 and uses a different raw vector format?
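
   For readers unfamiliar with the term, here is a small sketch of why FP 
re-ranking touches a second copy of the vector data. The signed-byte scalar 
quantization below is a hypothetical scheme chosen for illustration, not the 
one used by Lucene's quantized vector formats, and `FullPrecisionRerankSketch` 
and its methods are made-up names:

```java
public class FullPrecisionRerankSketch {
  /** Quantizes components (clamped to [-1, 1]) to signed bytes; a hypothetical scheme. */
  static byte[] quantize(float[] v) {
    byte[] q = new byte[v.length];
    for (int i = 0; i < v.length; i++) {
      float clamped = Math.max(-1f, Math.min(1f, v[i]));
      q[i] = (byte) Math.round(clamped * 127f);
    }
    return q;
  }

  /** Full-precision dot product, with the same kind of dimension check as the PR diff. */
  static float dot(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException(
          "Query vector dimension does not match field dimension: " + a.length + " != " + b.length);
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Approximate dot product computed only from the quantized bytes. */
  static float approxDot(byte[] a, byte[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("dimension mismatch: " + a.length + " != " + b.length);
    }
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum / (127f * 127f); // undo the two scale factors
  }

  public static void main(String[] args) {
    float[] doc = {0.5f, -0.25f, 1f};
    float[] query = {0.1f, 0.9f, -0.3f};
    // Graph search would use the quantized score; re-ranking re-reads the float vector.
    System.out.println("quantized score: " + approxDot(quantize(doc), quantize(query)));
    System.out.println("full-precision score: " + dot(doc, query));
  }
}
```

   The quantized bytes are what the graph search reads repeatedly, while the 
float vectors are read only for the few candidates being re-ranked, which is 
why the two data sets have such different caching needs.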





Re: [PR] .editorconfig [lucene]

2025-06-01 Thread via GitHub


github-actions[bot] commented on PR #14740:
URL: https://github.com/apache/lucene/pull/14740#issuecomment-2928581410

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog-check 
label to it and you will stop receiving this reminder on future updates to the 
PR.





Re: [PR] .editorconfig [lucene]

2025-06-01 Thread via GitHub


dsmiley commented on PR #14740:
URL: https://github.com/apache/lucene/pull/14740#issuecomment-2928610050

   I simplified the glob patterns.
   
   I realized my QA was quite flawed because I had the "Google Java Format" 
IntelliJ plugin installed, which overrides the style settings.  I disabled it 
and spent hours wrestling with IntelliJ's settings to try to get it to match as 
closely as possible.  Alas... there remain irreconcilable differences, 
especially in line wrapping of Javadoc but also in line wrapping of long 
assignments.  This is the best I could do.  I updated the Groovy settings to 
match because of its syntactical closeness to Java.
   
   A reminder:  these settings are applied to certain lines of code or files 
when the user takes certain actions.  Yes, it can be done automatically on 
commit, but that's opt-in and I'm not sure I'd condone that in a project that 
uses a non-IDE solution as we do.





Re: [PR] Add a DoubleValuesSource for scoring full precision vector similarity [lucene]

2025-06-01 Thread via GitHub


vigyasharma commented on code in PR #14708:
URL: https://github.com/apache/lucene/pull/14708#discussion_r2118820224


##
lucene/core/src/java/org/apache/lucene/search/ByteVectorSimilarityValuesSource.java:
##
@@ -42,7 +78,35 @@ public VectorScorer getScorer(LeafReaderContext ctx) throws IOException {
       ByteVectorValues.checkField(ctx.reader(), fieldName);
       return null;
     }
-    return vectorValues.scorer(queryVector);
+    final FieldInfo fi = ctx.reader().getFieldInfos().fieldInfo(fieldName);
+    if (fi.getVectorDimension() != queryVector.length) {
+      throw new IllegalArgumentException(
+          "Query vector dimension does not match field dimension: "
+              + queryVector.length
+              + " != "
+              + fi.getVectorDimension());
+    }
+
+    // default vector scorer
+    if (useFullPrecision == false) {
+      return vectorValues.scorer(queryVector);
+    }
+
+    final VectorSimilarityFunction vectorSimilarityFunction = fi.getVectorSimilarityFunction();
+    return new VectorScorer() {
+      final KnnVectorValues.DocIndexIterator iterator = vectorValues.iterator();
+
+      @Override
+      public float score() throws IOException {
+        return vectorSimilarityFunction.compare(
+            queryVector, vectorValues.vectorValue(iterator.index()));

Review Comment:
   I do remember reading some results on the DiskANN issue where benchmarks 
indicated that having the vectors needed for ANN graph search in memory (the 
quantized vectors in this case) does lead to better performance. So maybe an 
option to use DIRECT_IO only for this makes sense.






Re: [PR] .editorconfig [lucene]

2025-06-01 Thread via GitHub


vigyasharma commented on PR #14740:
URL: https://github.com/apache/lucene/pull/14740#issuecomment-2926728995

   > I'm fine with adding these hints, although I don't use this convention 
myself
   
   Oh wait, I thought these changes were coming from the `editorconfig` in this 
PR. But looks like it's adding some stuff from my local IDE setup, duh! 
Surprising because I don't use this convention either!
   
   
   
   > Robert's comments should probably be reviewed and applied first - 
especially if adding this file may result in corrupting certain file types 
(Makefile).
   
   100% agree. We also don't need to add any extra hints, as long as we have 
parity with the tidy bot, and reformatting with this config doesn't change 
existing committed code.





Re: [PR] Add a DoubleValuesSource for scoring full precision vector similarity [lucene]

2025-06-01 Thread via GitHub


vigyasharma commented on code in PR #14708:
URL: https://github.com/apache/lucene/pull/14708#discussion_r2118819077


##
lucene/core/src/java/org/apache/lucene/search/ByteVectorSimilarityValuesSource.java:
##
@@ -42,7 +78,35 @@ public VectorScorer getScorer(LeafReaderContext ctx) throws IOException {
   ByteVectorValues.checkField(ctx.reader(), fieldName);
   return null;
 }
-return vectorValues.scorer(queryVector);
+final FieldInfo fi = ctx.reader().getFieldInfos().fieldInfo(fieldName);
+if (fi.getVectorDimension() != queryVector.length) {
+  throw new IllegalArgumentException(
+  "Query vector dimension does not match field dimension: "
+  + queryVector.length
+  + " != "
+  + fi.getVectorDimension());
+}
+
+// default vector scorer
+if (useFullPrecision == false) {
+  return vectorValues.scorer(queryVector);
+}
+
+final VectorSimilarityFunction vectorSimilarityFunction = fi.getVectorSimilarityFunction();
+return new VectorScorer() {
+  final KnnVectorValues.DocIndexIterator iterator = vectorValues.iterator();
+
+  @Override
+  public float score() throws IOException {
+return vectorSimilarityFunction.compare(
+queryVector, vectorValues.vectorValue(iterator.index()));

Review Comment:
   It's a valid concern for setups with limited memory.
   
   > As HNSW will suffer the performance if the vectors are not in RAM, I'm 
wondering if we can restrict the memory used by the re-ranking phase.
   
   Maybe. I wonder how we decide that the pages used for HNSW search are more 
important than the pages used for FP reranking. For an application that does KNN 
search and reranks via full-precision vectors, a query doesn't really complete 
until both phases are done. Wouldn't thrashing during reranking add to 
overall query latency? Maybe this is okay if you were reranking for only a 
subset of queries and the vast majority were still HNSW search only, but that 
seems very use-case specific. Might be best to let the OS page cache handle this?
   
   Anyway, I think this deserves its own discussion, perhaps in a separate 
issue? And as you and others have already mentioned, it can be handled 
independently of this PR.
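
To make the two phases discussed here concrete, the following is a minimal sketch of rescoring a candidate list with full-precision vectors. It is a toy model, not Lucene code: plain arrays stand in for the vector values, `FullPrecisionRerankSketch`, `dot`, and `rerank` are hypothetical names, and dot product is only one possible similarity. The candidate list is assumed to come from an earlier approximate (for example HNSW) search.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Toy model of rescoring; plain arrays stand in for Lucene's vector values. */
final class FullPrecisionRerankSketch {

  /** Dot product over full-precision float vectors (one possible similarity). */
  static float dot(float[] query, float[] doc) {
    if (query.length != doc.length) {
      throw new IllegalArgumentException(
          "Query vector dimension does not match field dimension: "
              + query.length
              + " != "
              + doc.length);
    }
    float sum = 0f;
    for (int i = 0; i < query.length; i++) {
      sum += query[i] * doc[i];
    }
    return sum;
  }

  /**
   * Re-orders candidate doc ids (the output of the first phase) by descending
   * full-precision similarity. Each candidate's vector is read exactly once;
   * in a real index this read is where page-cache misses would be paid.
   */
  static int[] rerank(int[] candidates, float[][] vectors, float[] query) {
    float[] scores = new float[vectors.length];
    for (int doc : candidates) {
      scores[doc] = dot(query, vectors[doc]);
    }
    Integer[] order = Arrays.stream(candidates).boxed().toArray(Integer[]::new);
    // Stable sort: candidates with equal scores keep their first-phase order.
    Arrays.sort(order, Comparator.comparingDouble((Integer doc) -> -scores[doc]));
    return Arrays.stream(order).mapToInt(Integer::intValue).toArray();
  }
}
```

Because the second phase touches only the candidates, its memory footprint depends on the candidate count rather than on the index size, which is one reason the question of who owns those pages can be discussed separately.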






Re: [I] Failing test: TestSpellChecking.testGeneratedSuggestions — ComparisonFailure with expected suggestion list [lucene]

2025-06-01 Thread via GitHub


N624-debu commented on issue #14741:
URL: https://github.com/apache/lucene/issues/14741#issuecomment-2927483252

   Thanks for the suggestion, @vigyasharma. I’d be happy to take a look at 
fixing the test output formatting and open a PR for it. 
   
   





[PR] Fix: Include tests.jvmargs in "Reproduce with" output [lucene]

2025-06-01 Thread via GitHub


N624-debu opened a new pull request, #14742:
URL: https://github.com/apache/lucene/pull/14742

   Appended tests.jvmargs from system properties to support accurate 
reproduction lines.
   
   ### Description
   
   This patch fixes an issue where the "Reproduce with" line generated by the 
test runner omits `tests.jvmargs`. This leads to loss of custom JVM flags 
(e.g., `-XX:+UseParallelGC`, `-XX:ActiveProcessorCount=1`) when attempting to 
reproduce test failures.
   
   The fix appends the system property `tests.jvmargs` using `addVmOpt`, 
ensuring the JVM args used during test execution are preserved in the 
reproduction line.
   
   Fix for #14741.
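
As an illustration of the idea in this description, here is a minimal sketch of building a reproduction line that carries `tests.jvmargs` along. Everything in it is an assumption made for illustration: `ReproduceLineSketch`, `build`, and `appendOpt` are hypothetical names (`appendOpt` merely stands in for the `addVmOpt` mentioned above), and the `gradlew ... -Dkey=value` layout is not claimed to match the test runner's real output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; not the Lucene test framework's actual code. */
final class ReproduceLineSketch {

  /** Builds a "Reproduce with" style line from a test name and selected properties. */
  static String build(String testClass, Map<String, String> props) {
    List<String> parts = new ArrayList<>();
    parts.add("gradlew test --tests " + testClass);
    // An option assumed to be in the line already (illustrative subset).
    appendOpt(parts, "tests.seed", props.get("tests.seed"));
    // The addition described above: carry the custom JVM flags over as well.
    appendOpt(parts, "tests.jvmargs", props.get("tests.jvmargs"));
    return String.join(" ", parts);
  }

  /** Appends -Dkey=value, quoting values that contain spaces; unset values are skipped. */
  private static void appendOpt(List<String> parts, String key, String value) {
    if (value == null || value.trim().isEmpty()) {
      return;
    }
    String rendered = value.contains(" ") ? "\"" + value + "\"" : value;
    parts.add("-D" + key + "=" + rendered);
  }
}
```

Skipping unset values keeps the line unchanged for runs that did not pass any custom JVM flags, so only failures that depended on them produce a longer command.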
   
   
   
   





Re: [PR] Fix: Include tests.jvmargs in "Reproduce with" output [lucene]

2025-06-01 Thread via GitHub


github-actions[bot] commented on PR #14742:
URL: https://github.com/apache/lucene/pull/14742#issuecomment-2927705148

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog-check 
label to it and you will stop receiving this reminder on future updates to the 
PR.





Re: [PR] Keep evaluating conjunction one doc-at-a-time until dynamic pruning kicks in. [lucene]

2025-06-01 Thread via GitHub


jpountz commented on code in PR #14739:
URL: https://github.com/apache/lucene/pull/14739#discussion_r2119453329


##
lucene/core/src/java/org/apache/lucene/search/BlockMaxConjunctionBulkScorer.java:
##
@@ -87,18 +87,64 @@ public int score(LeafCollector collector, Bits acceptDocs, int min, int max) thr
   // NOTE: windowMax is inclusive
   int windowMax = Math.min(scorers[0].advanceShallow(windowMin), max - 1);
 
-  float maxWindowScore = Float.POSITIVE_INFINITY;
   if (0 < scorable.minCompetitiveScore) {
-maxWindowScore = computeMaxScore(windowMin, windowMax);
+float maxWindowScore = computeMaxScore(windowMin, windowMax);
+scoreWindowScoreFirst(collector, acceptDocs, windowMin, windowMax + 1, maxWindowScore);
+  } else {
+scoreWindowDocFirst(collector, acceptDocs, windowMin, windowMax + 1);

Review Comment:
   I believe we've always had this problem? I remember trying to make things 
better, but the result either didn't look great or caused performance 
regressions with term queries, the case I care about the most.






Re: [PR] Keep evaluating conjunction one doc-at-a-time until dynamic pruning kicks in. [lucene]

2025-06-01 Thread via GitHub


gf2121 commented on code in PR #14739:
URL: https://github.com/apache/lucene/pull/14739#discussion_r2119549265


##
lucene/core/src/java/org/apache/lucene/search/BlockMaxConjunctionBulkScorer.java:
##
@@ -87,18 +87,64 @@ public int score(LeafCollector collector, Bits acceptDocs, int min, int max) thr
   // NOTE: windowMax is inclusive
   int windowMax = Math.min(scorers[0].advanceShallow(windowMin), max - 1);
 
-  float maxWindowScore = Float.POSITIVE_INFINITY;
   if (0 < scorable.minCompetitiveScore) {
-maxWindowScore = computeMaxScore(windowMin, windowMax);
+float maxWindowScore = computeMaxScore(windowMin, windowMax);
+scoreWindowScoreFirst(collector, acceptDocs, windowMin, windowMax + 1, maxWindowScore);
+  } else {
+scoreWindowDocFirst(collector, acceptDocs, windowMin, windowMax + 1);

Review Comment:
   > I believe we've always had this problem?
   
   I agree that the previous version could not skip windows, but within a window 
it only needed to evaluate the conjunction on competitive docs, while this PR 
could evaluate more.
   
   I'm not sure how much this will matter, though. The `FilteredAndHighHigh` 
tasks should exercise a similar case, and the numbers do not look bad. Let's 
move on.
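
The trade-off in this thread can be sketched with a toy two-clause conjunction. This is not the code in `BlockMaxConjunctionBulkScorer`: `WindowScoringSketch`, `docFirst`, `scoreFirst`, and the map-based postings are hypothetical stand-ins, and the counter only approximates the cost of advancing the second clause.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy two-clause conjunction; maps from doc id to score stand in for postings. */
final class WindowScoringSketch {

  /** Number of lookups against the second clause, a rough proxy for advance() calls. */
  int otherClauseChecks = 0;

  /** Doc-first: intersect every lead doc in [min, max) with the other clause. */
  List<Integer> docFirst(
      TreeMap<Integer, Float> lead, Map<Integer, Float> other, int min, int max) {
    List<Integer> hits = new ArrayList<>();
    for (int doc : lead.subMap(min, max).keySet()) {
      otherClauseChecks++;
      if (other.containsKey(doc)) {
        hits.add(doc);
      }
    }
    return hits;
  }

  /** Score-first: skip lead docs that cannot reach minCompetitiveScore before intersecting. */
  List<Integer> scoreFirst(
      TreeMap<Integer, Float> lead,
      Map<Integer, Float> other,
      int min,
      int max,
      float maxOtherScore,
      float minCompetitiveScore) {
    List<Integer> hits = new ArrayList<>();
    for (Map.Entry<Integer, Float> entry : lead.subMap(min, max).entrySet()) {
      float leadScore = entry.getValue();
      if (leadScore + maxOtherScore < minCompetitiveScore) {
        continue; // even the best possible match would not be competitive
      }
      otherClauseChecks++;
      Float otherScore = other.get(entry.getKey());
      if (otherScore != null && leadScore + otherScore >= minCompetitiveScore) {
        hits.add(entry.getKey());
      }
    }
    return hits;
  }
}
```

In this model the strategy is chosen once per window, so if a single window spans the whole segment and the minimum competitive score is still zero when it starts, every lead doc is intersected. That is the extra work the comment above refers to.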


