Re: [PR] Keep evaluating conjunction one doc-at-a-time until dynamic pruning kicks in. [lucene]
gf2121 commented on code in PR #14739: URL: https://github.com/apache/lucene/pull/14739#discussion_r2117962626
## lucene/core/src/java/org/apache/lucene/search/BlockMaxConjunctionBulkScorer.java:
##
@@ -87,18 +87,64 @@ public int score(LeafCollector collector, Bits acceptDocs, int min, int max) thr
       // NOTE: windowMax is inclusive
       int windowMax = Math.min(scorers[0].advanceShallow(windowMin), max - 1);
-      float maxWindowScore = Float.POSITIVE_INFINITY;
       if (0 < scorable.minCompetitiveScore) {
-        maxWindowScore = computeMaxScore(windowMin, windowMax);
+        float maxWindowScore = computeMaxScore(windowMin, windowMax);
+        scoreWindowScoreFirst(collector, acceptDocs, windowMin, windowMax + 1, maxWindowScore);
+      } else {
+        scoreWindowDocFirst(collector, acceptDocs, windowMin, windowMax + 1);
Review Comment: So `minCompetitiveScore` won't get a chance to be respected when a filter clause leads the query, because `windowMax` is `DocIdSetIterator#NO_MORE_DOCS`. Could this cause a regression?
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
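To make the concern above concrete, here is a minimal plain-Java model, not Lucene code; `WindowDispatchSketch` and its method are hypothetical names. It assumes, as the comment suggests, that a leading filter clause makes `advanceShallow` return `NO_MORE_DOCS`, so the whole range becomes one window, and that the score-first/doc-first choice is made once per window as in the diff. Per-doc pruning stands in for the real block-max skipping.

```java
/**
 * Hypothetical, simplified model of per-window dispatch: whether a window is scored
 * "score-first" (non-competitive docs may be skipped) or "doc-first" (every matching doc is
 * evaluated) is decided once, when the window starts.
 */
final class WindowDispatchSketch {

  /**
   * Scores docs [0, scores.length) in windows of {@code windowSize} docs and returns how many
   * docs were fully evaluated. After the first evaluated doc, the simulated collector raises
   * minCompetitiveScore to {@code thresholdAfterFirstHit}.
   */
  static int score(float[] scores, int windowSize, float thresholdAfterFirstHit) {
    float minCompetitiveScore = 0f;
    int evaluated = 0;
    for (int windowMin = 0; windowMin < scores.length; windowMin += windowSize) {
      // exclusive upper bound, like windowMax + 1 in the diff
      int windowEnd = Math.min(windowMin + windowSize, scores.length);
      // Checked once per window: a threshold raised mid-window is only seen by the next window.
      boolean scoreFirst = 0 < minCompetitiveScore;
      for (int doc = windowMin; doc < windowEnd; doc++) {
        if (scoreFirst && scores[doc] < minCompetitiveScore) {
          continue; // pruned: cannot beat the current threshold
        }
        evaluated++;
        minCompetitiveScore = thresholdAfterFirstHit; // collector raises the bar
      }
    }
    return evaluated;
  }
}
```

With scores `{1, 1, 1, 1, 5, 1, 1, 1}` and a threshold of 2 after the first hit, a single window covering all docs evaluates all 8, while windows of 2 docs evaluate only 3, because the raised `minCompetitiveScore` is picked up at the next window boundary.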
Re: [PR] Support for Re-Ranking Queries using Late Interaction Model Multi-Vectors. [lucene]
romseygeek commented on PR #14729: URL: https://github.com/apache/lucene/pull/14729#issuecomment-2927412166 The advantage of a `Rescorer` is that it is explicitly only run over the hits in a `TopDocs` instance, whereas `FunctionScoreQuery` will run over the entire docid space if you let it. So it's a natural fit for a late-interaction search process: run your first query over the whole document set to get a preliminary top-k, and then pass the resulting `TopDocs` to your rescorer.
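A sketch of the two-phase flow described above, written in plain Java rather than against Lucene's `Rescorer#rescore(IndexSearcher, TopDocs, int)`. The `Hit` record and the late-interaction score, assumed here to be MaxSim (each query token vector's best dot product over the document's token vectors, summed), are illustrative assumptions. The point is that the expensive multi-vector score is computed only for the first-pass hits. Requires Java 16+ for the record.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical two-phase search: a cheap first pass yields a preliminary top-k, and only those
 * hits are rescored with a late-interaction (MaxSim) score. Not Lucene's Rescorer API.
 */
final class LateInteractionRescoreSketch {

  /** A (doc id, score) pair, standing in for a ScoreDoc. */
  record Hit(int doc, float score) {}

  /** MaxSim: for each query token vector, take its best dot product over the doc's token vectors, then sum. */
  static float maxSim(float[][] queryTokens, float[][] docTokens) {
    float total = 0f;
    for (float[] q : queryTokens) {
      float best = Float.NEGATIVE_INFINITY;
      for (float[] d : docTokens) {
        float dot = 0f;
        for (int i = 0; i < q.length; i++) {
          dot += q[i] * d[i];
        }
        best = Math.max(best, dot);
      }
      total += best;
    }
    return total;
  }

  /** Rescores only the first-pass hits (never the whole doc id space) and keeps the best topN. */
  static List<Hit> rescore(
      List<Hit> firstPass, float[][] queryTokens, float[][][] docTokensByDoc, int topN) {
    List<Hit> rescored = new ArrayList<>();
    for (Hit hit : firstPass) {
      rescored.add(new Hit(hit.doc(), maxSim(queryTokens, docTokensByDoc[hit.doc()])));
    }
    rescored.sort(Comparator.comparingDouble(Hit::score).reversed());
    return new ArrayList<>(rescored.subList(0, Math.min(topN, rescored.size())));
  }
}
```

A document that would score highest under MaxSim but is missing from the first-pass hits is never rescored, so the cost is bounded by the size of the first-pass top-k rather than by the index size.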
Re: [PR] Add a DoubleValuesSource for scoring full precision vector similarity [lucene]
github-actions[bot] commented on PR #14708: URL: https://github.com/apache/lucene/pull/14708#issuecomment-2928159436 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR.
Re: [PR] Fix java doc in IndexWriter. [lucene]
github-actions[bot] commented on PR #14733: URL: https://github.com/apache/lucene/pull/14733#issuecomment-2928438304 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR.
Re: [PR] Fix java doc in IndexWriter. [lucene]
vsop-479 commented on code in PR #14733: URL: https://github.com/apache/lucene/pull/14733#discussion_r2119876685
## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java:
##
@@ -469,9 +469,9 @@ public void onTicketBacklog() {
    * session can be quickly made available for searching without closing the writer nor calling
    * {@link #commit}.
    *
-   * Note that this is functionally equivalent to calling {#flush} and then opening a new reader.
-   * But the turnaround time of this method should be faster since it avoids the potentially costly
-   * {@link #commit}.
+   * Note that this is functionally equivalent to calling {@link #flush} and then opening a new
Review Comment: Thanks @stefanvodita, I resolved it.
[PR] build(deps): bump ruff from 0.11.9 to 0.11.12 in /dev-tools/scripts [lucene]
dependabot[bot] opened a new pull request, #14744: URL: https://github.com/apache/lucene/pull/14744 Bumps [ruff](https://github.com/astral-sh/ruff) from 0.11.9 to 0.11.12.
Release notes
Sourced from ruff's releases (https://github.com/astral-sh/ruff/releases).
0.11.12
Release Notes
Preview features
- [airflow] Revise fix titles (AIR3) (#18215)
- [pylint] Implement missing-maxsplit-arg (PLC0207) (#17454)
- [pyupgrade] New rule UP050 (useless-class-metaclass-type) (#18334)
- [flake8-use-pathlib] Replace os.symlink with Path.symlink_to (PTH211) (#18337)
Bug fixes
- [flake8-bugbear] Ignore __debug__ attribute in B010 (#18357)
- [flake8-async] Fix anyio.sleep argument name (ASYNC115, ASYNC116) (#18262)
- [refurb] Fix FURB129 autofix generating invalid syntax (#18235)
Rule changes
- [flake8-implicit-str-concat] Add autofix for ISC003 (#18256)
- [pycodestyle] Improve the diagnostic message for E712 (#18328)
- [flake8-2020] Fix diagnostic message for != comparisons (YTT201) (#18293)
- [pyupgrade] Make fix unsafe if it deletes comments (UP010) (#18291)
Documentation
- Simplify rules table to improve readability (#18297)
- Update editor integrations link in README (#17977)
- [flake8-bugbear] Add fix safety section (B006) (#17652)
Contributors
@AlexWaygood @CodeMan62 @InSyncWithFoo @Kalmaegi @LaBatata101 @Lee-W @MaddyGuthridge @MatthewMckee4 @MichaReiser @Vasanth-96 @carljm @charliermarsh @chirizxc @dcreager @dhruvmanila @dsherret @dylwil3 @felixscherz @fennr ... (truncated)
Changelog
Sourced from ruff's changelog (https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md).
Re: [PR] Add a DoubleValuesSource for scoring full precision vector similarity [lucene]
vigyasharma commented on PR #14708: URL: https://github.com/apache/lucene/pull/14708#issuecomment-2928167858 Moved the full-precision scoring logic to a separate `FullPrecisionFloatVectorSimilarityValuesSource` that can take a custom vector similarity function.
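A minimal sketch of what a values source that "can take a custom vector similarity function" might look like, assuming a simple functional interface. `FullPrecisionValuesSketch` and `Similarity` are hypothetical names; the actual `FullPrecisionFloatVectorSimilarityValuesSource` constructor and types may differ.

```java
/**
 * Hypothetical sketch: scores raw (full-precision) float vectors with a caller-supplied
 * similarity function. Names are illustrative, not Lucene API.
 */
final class FullPrecisionValuesSketch {

  /** Pluggable similarity between the query vector and a document's raw vector. */
  @FunctionalInterface
  interface Similarity {
    float compare(float[] query, float[] doc);
  }

  static final Similarity DOT_PRODUCT =
      (a, b) -> {
        float dot = 0f;
        for (int i = 0; i < a.length; i++) {
          dot += a[i] * b[i];
        }
        return dot;
      };

  static final Similarity COSINE =
      (a, b) -> {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
          dot += a[i] * b[i];
          normA += a[i] * a[i];
          normB += b[i] * b[i];
        }
        return (float) (dot / Math.sqrt(normA * normB));
      };

  private final float[] queryVector;
  private final Similarity similarity;

  FullPrecisionValuesSketch(float[] queryVector, Similarity similarity) {
    this.queryVector = queryVector;
    this.similarity = similarity;
  }

  /** Score for one document, computed from its unquantized vector. */
  float score(float[] rawDocVector) {
    return similarity.compare(queryVector, rawDocVector);
  }
}
```

Passing a lambda such as `(a, b) -> a[0] * b[0]` swaps in a different scoring rule without a new class; the trade-off is that the source no longer follows the similarity the field was indexed with unless the caller passes that one.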
[PR] build(deps): bump basedpyright from 1.29.1 to 1.29.2 in /dev-tools/scripts [lucene]
dependabot[bot] opened a new pull request, #14745: URL: https://github.com/apache/lucene/pull/14745 Bumps [basedpyright](https://github.com/detachhead/basedpyright) from 1.29.1 to 1.29.2.
Commits
- cc4dced 1.29.2
- c42ccb1 configure vscode to treat selfParameter and clsParameter semantic token t...
- 539f430 update package-lock.json
- a6174a3 Merge tag '1.1.401' into merge-1.1.401
- 79632e3 Fix lint
- feaaec0 Replace with ts-expect-error as suggested
- fd1e97e Revert accidental whitespace changes
- 1c150a2 Remove new static&class method tests, they should also have a decorator and a...
- 46653d1 Do not autocomplete @override when reportExplicitOverride is disabled
- ae64f08 Published 1.1.401
- Additional commits viewable in compare view (https://github.com/detachhead/basedpyright/compare/v1.29.1...v1.29.2)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

---
Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
Re: [PR] deps(java): bump org.apache.rat:apache-rat from 0.14 to 0.16.1 [lucene]
github-actions[bot] commented on PR #14582: URL: https://github.com/apache/lucene/pull/14582#issuecomment-2928199733 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contribution!
[PR] build(deps): bump holidays from 0.71 to 0.73 in /dev-tools/scripts [lucene]
dependabot[bot] opened a new pull request, #14743: URL: https://github.com/apache/lucene/pull/14743 Bumps [holidays](https://github.com/vacanza/holidays) from 0.71 to 0.73.
Release notes
Sourced from holidays's releases (https://github.com/vacanza/holidays/releases).
v0.73
Version 0.73
Released May 19, 2025
- Add Cocos Islands holidays (#2532 by @tr33k)
- Add Grenada holidays (#2524 by @nalin-28)
- Add Nepal holidays (#2386 by @ankushhKapoor, @arkid15r)
- Add Togo holidays (#2525 by @Roshan-1024, @KJhellico)
- Update Andorra holidays, add l10n support (#2530 by @KJhellico)
- Update Argentina holidays: add 2018 G20 Leaders' Summit for Buenos Aires (#2529 by @PPsyrius)
- Update Philippines holidays: add special holiday May 12, 2025 (#2539 by @KJhellico)
- Update Vatican City holidays: add election and name day of Pope Leo XIV (#2549 by @KJhellico)
- Update documentation build: make PR links in changelog (#2540 by @KJhellico)
- Update pre-commit config (#2548 by @KJhellico, @arkid15r)
Full Changelog: https://github.com/vacanza/holidays/compare/v0.72...v0.73
v0.72
Version 0.72
Released May 5, 2025
- Add Sao Tome and Principe holidays (#2489 by @tr33k, @arkid15r)
- Add Trinidad and Tobago holidays (#2402 by @Roshan-1024, @KJhellico)
- Fix TestClosestHoliday current date handling (#2517 by @KJhellico)
- Fix typography: replace U+2019 with "'" and U+2013 with '-' (#2523 by @KJhellico)
- Update Canada holidays: add historical holidays (#2507 by @PPsyrius)
- Update Ethiopia holidays: official source namings, WORKDAY category (#2490 by @PPsyrius)
- Update India holidays: add missing Tamil Nadu holidays (#2502 by @tr33k, @KJhellico)
- Update README: add Snyk package health badge (#2503 by @KJhellico)
- Update Singapore holidays: 2025 Polling Day on May 3rd (#2487 by @PPsyrius)
- Update Taiwan holidays: test case refactor (#2498 by @PPsyrius)
- Update documentation build process (#2501 by @KJhellico, @arkid15r)
- Update documentation tests: add AUTHORS.md checking (#2492 by @KJhellico, @arkid15r)
- Add missing subdivisions aliases (#2520 by @KJhellico)
- Disable v1 incompatibility warning (#2518 by @arkid15r)
- Docstring cleanup for Indochinese countries (#2505 by @PPsyrius)
- Extend Chinese Lunisolar calendar support (#2488 by https://github.com/KJh
Re: [PR] build(deps): bump ruff from 0.11.9 to 0.11.12 in /dev-tools/scripts [lucene]
github-actions[bot] commented on PR #14744: URL: https://github.com/apache/lucene/pull/14744#issuecomment-2928169198 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR.
Re: [PR] build(deps): bump basedpyright from 1.29.1 to 1.29.2 in /dev-tools/scripts [lucene]
github-actions[bot] commented on PR #14745: URL: https://github.com/apache/lucene/pull/14745#issuecomment-2928169297 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR.
Re: [PR] build(deps): bump holidays from 0.71 to 0.73 in /dev-tools/scripts [lucene]
github-actions[bot] commented on PR #14743: URL: https://github.com/apache/lucene/pull/14743#issuecomment-2928169068 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR.
Re: [PR] Add a DoubleValuesSource for scoring full precision vector similarity [lucene]
dungba88 commented on code in PR #14708: URL: https://github.com/apache/lucene/pull/14708#discussion_r2119797821
## lucene/core/src/java/org/apache/lucene/search/ByteVectorSimilarityValuesSource.java:
##
@@ -42,7 +78,35 @@ public VectorScorer getScorer(LeafReaderContext ctx) throws IOException {
       ByteVectorValues.checkField(ctx.reader(), fieldName);
       return null;
     }
-    return vectorValues.scorer(queryVector);
+    final FieldInfo fi = ctx.reader().getFieldInfos().fieldInfo(fieldName);
+    if (fi.getVectorDimension() != queryVector.length) {
+      throw new IllegalArgumentException(
+          "Query vector dimension does not match field dimension: "
+              + queryVector.length
+              + " != "
+              + fi.getVectorDimension());
+    }
+
+    // default vector scorer
+    if (useFullPrecision == false) {
+      return vectorValues.scorer(queryVector);
+    }
+
+    final VectorSimilarityFunction vectorSimilarityFunction = fi.getVectorSimilarityFunction();
+    return new VectorScorer() {
+      final KnnVectorValues.DocIndexIterator iterator = vectorValues.iterator();
+
+      @Override
+      public float score() throws IOException {
+        return vectorSimilarityFunction.compare(
+            queryVector, vectorValues.vectorValue(iterator.index()));
Review Comment: I've opened https://github.com/apache/lucene/issues/14746 for further discussion on this topic.
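The diff above validates the query dimension and then chooses between the codec's default scorer and a full-precision one. A plain-Java sketch of that selection, assuming the default scorer works on int8-quantized copies of the vectors, shows why the two paths can score documents differently. The naive quantization is a stand-in, not Lucene's scheme, and all names are hypothetical.

```java
/**
 * Hypothetical sketch of scorer selection: check the query dimension, then return either a
 * default scorer over quantized vectors or a full-precision scorer over the raw vectors.
 */
final class ScorerSelectionSketch {

  /** Scores one document by ordinal. */
  @FunctionalInterface
  interface DocScorer {
    float score(int doc);
  }

  /** Naive int8 scalar quantization, assuming components lie in [-1, 1]. */
  static byte[] quantize(float[] v) {
    byte[] out = new byte[v.length];
    for (int i = 0; i < v.length; i++) {
      float clamped = Math.max(-1f, Math.min(1f, v[i]));
      out[i] = (byte) Math.round(clamped * 127f);
    }
    return out;
  }

  static DocScorer scorer(
      float[] query,
      int fieldDimension,
      float[][] rawVectors,
      byte[][] quantizedVectors,
      boolean useFullPrecision) {
    if (fieldDimension != query.length) {
      throw new IllegalArgumentException(
          "Query vector dimension does not match field dimension: "
              + query.length
              + " != "
              + fieldDimension);
    }
    if (useFullPrecision == false) {
      // default path: approximate dot product on the quantized copies
      byte[] quantizedQuery = quantize(query);
      return doc -> {
        int dot = 0;
        for (int i = 0; i < quantizedQuery.length; i++) {
          dot += quantizedQuery[i] * quantizedVectors[doc][i];
        }
        return dot / (127f * 127f);
      };
    }
    // full-precision path: exact dot product on the raw float vectors
    return doc -> {
      float dot = 0f;
      for (int i = 0; i < query.length; i++) {
        dot += query[i] * rawVectors[doc][i];
      }
      return dot;
    };
  }
}
```

For a query `{0.5, 0.5}` and a document `{0.3, 0.7}`, the full-precision score is 0.5 while the quantized one is 8128 / 16129, about 0.504. The full-precision path reads the raw vectors, which is the extra data that may need to be paged in during reranking.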
[I] Can we support vectors to be loaded with direct I/O for full precision re-ranking? [lucene]
dungba88 opened a new issue, #14746: URL: https://github.com/apache/lucene/issues/14746
### Description
Spin-off from the discussion in https://github.com/apache/lucene/pull/14708. One of the concerns with full-precision (FP) re-ranking (for quantized vectors) is that, if we use an off-heap vector reader, it will page in the FP vector data, which can then compete with the quantized vector data used for HNSW graph search. Since HNSW performance suffers greatly when the vectors are not in memory (for instance, when memory is limited), can we support a mode that loads the FP vectors with direct I/O? (Or is this already possible?) For integrating with the existing quantized vectors codec, is my understanding correct that we would need to create a new codec/vector reader that extends the [existing reader](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/lucene102/Lucene102BinaryQuantizedVectorsFormat.java#L126C16-L126C31) and uses a different raw vector format?
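Direct I/O (O_DIRECT) bypasses the page cache but requires the file offset, the read length and the buffer address to be aligned to the file system's block size. The sketch below shows only that alignment arithmetic for reading one FP vector, assuming a hypothetical flat file of little-endian float32 vectors where vector `ord` starts at byte `ord * dim * 4`; this is not Lucene's on-disk format. For real direct I/O, the JDK's non-standard `com.sun.nio.file.ExtendedOpenOption.DIRECT` could be passed to `FileChannel.open`, with the block size taken from `Files.getFileStore(path).getBlockSize()`; the sketch opens the file normally so it runs on any file system and does not itself bypass the page cache.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: block-aligned read of one float32 vector from a flat vector file. */
final class AlignedVectorReadSketch {

  /** Returns {alignedStart, alignedLength}: the whole blocks covering [offset, offset + length). */
  static long[] alignedRange(long offset, int length, int blockSize) {
    long start = (offset / blockSize) * blockSize; // round down
    long end = ((offset + length + blockSize - 1) / blockSize) * blockSize; // round up
    return new long[] {start, end - start};
  }

  static float[] readVector(Path file, int ord, int dim, int blockSize) throws IOException {
    long offset = (long) ord * dim * Float.BYTES;
    int length = dim * Float.BYTES;
    long[] range = alignedRange(offset, length, blockSize);
    int alignedLength = (int) range[1];

    // Direct I/O also needs the destination buffer's address aligned to the block size.
    ByteBuffer buf = ByteBuffer.allocateDirect(alignedLength + blockSize).alignedSlice(blockSize);
    buf.limit(alignedLength);
    try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
      long pos = range[0];
      while (buf.hasRemaining()) {
        int n = ch.read(buf, pos);
        if (n < 0) {
          break; // end of file: the last block may be partial
        }
        pos += n;
      }
    }
    buf.flip(); // limit = number of bytes actually read
    int skip = (int) (offset - range[0]);
    if (buf.limit() < skip + length) {
      throw new IOException("vector " + ord + " lies past the end of " + file);
    }
    buf.position(skip);
    float[] vector = new float[dim];
    buf.order(ByteOrder.LITTLE_ENDIAN).asFloatBuffer().get(vector);
    return vector;
  }
}
```

With 3-dimensional vectors and a 16-byte block, reading ord 2 requests the aligned range starting at offset 16 with length 32 and copies out bytes 24 to 35. The cost of direct I/O is this read amplification plus a real disk read on every access, which is the price of keeping FP reads from evicting the quantized pages HNSW needs.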
Re: [PR] .editorconfig [lucene]
github-actions[bot] commented on PR #14740: URL: https://github.com/apache/lucene/pull/14740#issuecomment-2928581410 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR.
Re: [PR] .editorconfig [lucene]
dsmiley commented on PR #14740: URL: https://github.com/apache/lucene/pull/14740#issuecomment-2928610050 I simplified the glob patterns. I realized my QA was quite flawed because I had the "Google Java Format" IntelliJ plugin installed, which overrides the style settings. I disabled it and spent hours wrestling with IntelliJ's settings to get them to match as closely as possible. Alas... there remain irreconcilable differences, especially in line wrapping of Javadoc but also in line wrapping of long assignments. This is the best I could do. I updated the Groovy settings to match because of Groovy's syntactic closeness to Java. A reminder: these settings are applied to certain lines of code or files only when the user takes certain actions. Yes, it can be done automatically on commit, but that's opt-in, and I'm not sure I'd condone that in a project that uses a non-IDE solution as we do.
Re: [PR] Add a DoubleValuesSource for scoring full precision vector similarity [lucene]
vigyasharma commented on code in PR #14708: URL: https://github.com/apache/lucene/pull/14708#discussion_r2118820224
## lucene/core/src/java/org/apache/lucene/search/ByteVectorSimilarityValuesSource.java:
##
@@ -42,7 +78,35 @@ public VectorScorer getScorer(LeafReaderContext ctx) throws IOException {
       ByteVectorValues.checkField(ctx.reader(), fieldName);
       return null;
     }
-    return vectorValues.scorer(queryVector);
+    final FieldInfo fi = ctx.reader().getFieldInfos().fieldInfo(fieldName);
+    if (fi.getVectorDimension() != queryVector.length) {
+      throw new IllegalArgumentException(
+          "Query vector dimension does not match field dimension: "
+              + queryVector.length
+              + " != "
+              + fi.getVectorDimension());
+    }
+
+    // default vector scorer
+    if (useFullPrecision == false) {
+      return vectorValues.scorer(queryVector);
+    }
+
+    final VectorSimilarityFunction vectorSimilarityFunction = fi.getVectorSimilarityFunction();
+    return new VectorScorer() {
+      final KnnVectorValues.DocIndexIterator iterator = vectorValues.iterator();
+
+      @Override
+      public float score() throws IOException {
+        return vectorSimilarityFunction.compare(
+            queryVector, vectorValues.vectorValue(iterator.index()));
Review Comment: I do remember reading some results on the DiskANN issue where benchmarks indicated that having the vectors needed for ANN graph search (the quantized vectors in this case) in memory does lead to better performance. So maybe an option to use DIRECT_IO just for this makes sense.
Re: [PR] .editorconfig [lucene]
vigyasharma commented on PR #14740: URL: https://github.com/apache/lucene/pull/14740#issuecomment-2926728995
> I'm fine with adding these hints, although I don't use this convention myself

Oh wait, I thought these changes were coming from the `editorconfig` in this PR. But it looks like it's adding some stuff from my local IDE setup, duh! Surprising, because I don't use this convention either!

> Robert's comments should probably be reviewed and applied first - especially if adding this file may result in corrupting certain file types (Makefile).

100% agree. We also don't need to add any extra hints, as long as we have parity with the tidy bot, and reformatting with this config doesn't change existing committed code.
Re: [PR] Add a DoubleValuesSource for scoring full precision vector similarity [lucene]
vigyasharma commented on code in PR #14708:
URL: https://github.com/apache/lucene/pull/14708#discussion_r2118819077

##
lucene/core/src/java/org/apache/lucene/search/ByteVectorSimilarityValuesSource.java:
##
@@ -42,7 +78,35 @@ public VectorScorer getScorer(LeafReaderContext ctx) throws IOException {
       ByteVectorValues.checkField(ctx.reader(), fieldName);
       return null;
     }
-    return vectorValues.scorer(queryVector);
+    final FieldInfo fi = ctx.reader().getFieldInfos().fieldInfo(fieldName);
+    if (fi.getVectorDimension() != queryVector.length) {
+      throw new IllegalArgumentException(
+          "Query vector dimension does not match field dimension: "
+              + queryVector.length
+              + " != "
+              + fi.getVectorDimension());
+    }
+
+    // default vector scorer
+    if (useFullPrecision == false) {
+      return vectorValues.scorer(queryVector);
+    }
+
+    final VectorSimilarityFunction vectorSimilarityFunction = fi.getVectorSimilarityFunction();
+    return new VectorScorer() {
+      final KnnVectorValues.DocIndexIterator iterator = vectorValues.iterator();
+
+      @Override
+      public float score() throws IOException {
+        return vectorSimilarityFunction.compare(
+            queryVector, vectorValues.vectorValue(iterator.index()));

Review Comment:
It's a valid concern for setups with limited memory.

> As HNSW performance will suffer if the vectors are not in RAM, I'm wondering if we can restrict the memory used by the re-ranking phase.

Maybe. I wonder how we would decide that the pages used for HNSW search are more important than the pages used for full-precision (FP) reranking. For an application that does KNN search and reranks with full-precision vectors, a query doesn't really complete until both phases are done. Wouldn't thrashing during reranking add to overall query latency? This might be acceptable if you rerank only a subset of queries and the vast majority still do only HNSW search, but that seems very use-case specific. Might it be best to let the OS page cache handle this?

Anyway, I think this deserves its own discussion, perhaps in a separate issue? And as you and others have already mentioned, it can be handled independently of this PR.
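To make the two-phase flow in this thread concrete (approximate KNN first, then full-precision reranking of only the returned hits), here is a minimal Java sketch. It is not the PR's `ByteVectorSimilarityValuesSource`. The class and method names are hypothetical, and it uses a plain dot product instead of Lucene's `VectorSimilarityFunction` so that it runs without Lucene on the classpath.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch, not Lucene's API: exact re-scoring of a small candidate set.
class FullPrecisionRerankSketch {

  // Plain integer dot product on raw byte vectors. Lucene's VectorSimilarityFunction
  // additionally maps raw similarities to scores, so absolute values differ.
  static float dotProduct(byte[] query, byte[] doc) {
    if (query.length != doc.length) {
      throw new IllegalArgumentException(
          "Query vector dimension does not match field dimension: "
              + query.length
              + " != "
              + doc.length);
    }
    int sum = 0;
    for (int i = 0; i < query.length; i++) {
      sum += query[i] * doc[i]; // byte * byte is promoted to int
    }
    return sum;
  }

  // Re-scores only the candidate doc ids (e.g. the top-k of a first, approximate pass)
  // against full-precision vectors and returns them best-first; ties keep input order.
  static int[] rerank(byte[] query, byte[][] fullPrecisionVectors, int[] candidates) {
    float[] scores = new float[candidates.length];
    Integer[] order = new Integer[candidates.length];
    for (int i = 0; i < candidates.length; i++) {
      scores[i] = dotProduct(query, fullPrecisionVectors[candidates[i]]);
      order[i] = i;
    }
    // Arrays.sort on objects is stable, so equal scores keep their original order.
    Arrays.sort(order, Comparator.comparingDouble((Integer pos) -> scores[pos]).reversed());
    int[] reranked = new int[candidates.length];
    for (int i = 0; i < order.length; i++) {
      reranked[i] = candidates[order[i]];
    }
    return reranked;
  }
}
```

Only the k candidates are read at full precision. That is why the memory question above is about how often those reads happen, not about the size of the whole index.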
Re: [I] Failing test: TestSpellChecking.testGeneratedSuggestions — ComparisonFailure with expected suggestion list [lucene]
N624-debu commented on issue #14741:
URL: https://github.com/apache/lucene/issues/14741#issuecomment-2927483252

Thanks for the suggestion, @vigyasharma. I’d be happy to take a look at fixing the test output formatting and open a PR for it.
[PR] Fix: Include tests.jvmargs in "Reproduce with" output [lucene]
N624-debu opened a new pull request, #14742:
URL: https://github.com/apache/lucene/pull/14742

Appended `tests.jvmargs` from system properties to support accurate reproduction lines.

### Description

This patch fixes an issue where the "Reproduce with" line generated by the test runner omits `tests.jvmargs`. This leads to the loss of custom JVM flags (e.g., `-XX:+UseParallelGC`, `-XX:ActiveProcessorCount=1`) when attempting to reproduce test failures.

The fix appends the `tests.jvmargs` system property using `addVmOpt`, ensuring that the JVM args used during test execution are preserved in the reproduction line.

Fix for #14741.
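Below is a minimal, hypothetical Java sketch of what "append `tests.jvmargs` to the reproduce line" means. The command layout (`gradlew test --tests ... -Dkey=value`) and the `appendOpt` helper are illustrative assumptions. This is not Lucene's actual `addVmOpt` implementation.

```java
// Hypothetical sketch of building a "Reproduce with" line; the command layout and the
// helper below are illustrative assumptions, not Lucene's actual addVmOpt.
class ReproduceLineSketch {

  // Appends " -Dkey=value", quoting the value when it contains whitespace; skips unset values.
  static void appendOpt(StringBuilder b, String key, String value) {
    if (value == null || value.isEmpty()) {
      return;
    }
    b.append(" -D").append(key).append('=');
    if (value.chars().anyMatch(Character::isWhitespace)) {
      b.append('"').append(value).append('"');
    } else {
      b.append(value);
    }
  }

  static String reproduceLine(String testFilter, String seed, String jvmArgs) {
    StringBuilder b = new StringBuilder("gradlew test --tests ").append(testFilter);
    appendOpt(b, "tests.seed", seed);
    // The idea of the fix: carry tests.jvmargs over so custom JVM flags are not lost.
    appendOpt(b, "tests.jvmargs", jvmArgs);
    return b.toString();
  }
}
```

In a real test listener, the third argument would come from `System.getProperty("tests.jvmargs")`. Quoting matters because JVM flag lists usually contain spaces.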
Re: [PR] Fix: Include tests.jvmargs in "Reproduce with" output [lucene]
github-actions[bot] commented on PR #14742:
URL: https://github.com/apache/lucene/pull/14742#issuecomment-2927705148

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR.
Re: [PR] Keep evaluating conjunction one doc-at-a-time until dynamic pruning kicks in. [lucene]
jpountz commented on code in PR #14739:
URL: https://github.com/apache/lucene/pull/14739#discussion_r2119453329

##
lucene/core/src/java/org/apache/lucene/search/BlockMaxConjunctionBulkScorer.java:
##
@@ -87,18 +87,64 @@ public int score(LeafCollector collector, Bits acceptDocs, int min, int max) thr
       // NOTE: windowMax is inclusive
       int windowMax = Math.min(scorers[0].advanceShallow(windowMin), max - 1);
-      float maxWindowScore = Float.POSITIVE_INFINITY;
       if (0 < scorable.minCompetitiveScore) {
-        maxWindowScore = computeMaxScore(windowMin, windowMax);
+        float maxWindowScore = computeMaxScore(windowMin, windowMax);
+        scoreWindowScoreFirst(collector, acceptDocs, windowMin, windowMax + 1, maxWindowScore);
+      } else {
+        scoreWindowDocFirst(collector, acceptDocs, windowMin, windowMax + 1);

Review Comment:
I believe we've always had this problem? I remember trying to make this better, but the attempts either didn't look great or caused performance regressions with term queries, which is the case I care about most.
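The concern in this thread depends on how windows are cut. Below is a minimal Java sketch under simplifying assumptions. The lead clause's blocks are modelled as a sorted array of inclusive block ends. A lead clause with no block boundaries (such as a filter) is modelled as reporting `NO_MORE_DOCS`, which matches the observation in this thread that `windowMax` is `DocIdSetIterator#NO_MORE_DOCS` when a filter clause leads. The helper names are hypothetical.

```java
// Hypothetical, simplified model of window selection; not Lucene code.
class WindowPartitionSketch {
  // Same value as DocIdSetIterator.NO_MORE_DOCS in Lucene.
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  // Models scorers[0].advanceShallow(target): an upper bound of the lead clause's block
  // containing target, or NO_MORE_DOCS when the clause has no block boundaries.
  static int advanceShallow(int[] leadBlockEnds, int target) {
    for (int end : leadBlockEnds) {
      if (end >= target) {
        return end;
      }
    }
    return NO_MORE_DOCS;
  }

  // Counts scoring windows over [min, max) using the rule from the diff:
  // windowMax = Math.min(advanceShallow(windowMin), max - 1), inclusive.
  static int countWindows(int[] leadBlockEnds, int min, int max) {
    int windows = 0;
    int windowMin = min;
    while (windowMin < max) {
      int windowMax = Math.min(advanceShallow(leadBlockEnds, windowMin), max - 1);
      windows++;
      // Only at a window boundary could a window whose max score is below
      // minCompetitiveScore be skipped as a whole.
      windowMin = windowMax + 1;
    }
    return windows;
  }
}
```

With block ends `{127, 255, 383}` and the range `[0, 400)`, the model produces four windows. With no block ends, it produces exactly one, so there is never a boundary at which to skip.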
Re: [PR] Keep evaluating conjunction one doc-at-a-time until dynamic pruning kicks in. [lucene]
gf2121 commented on code in PR #14739:
URL: https://github.com/apache/lucene/pull/14739#discussion_r2119549265

##
lucene/core/src/java/org/apache/lucene/search/BlockMaxConjunctionBulkScorer.java:
##
@@ -87,18 +87,64 @@ public int score(LeafCollector collector, Bits acceptDocs, int min, int max) thr
       // NOTE: windowMax is inclusive
       int windowMax = Math.min(scorers[0].advanceShallow(windowMin), max - 1);
-      float maxWindowScore = Float.POSITIVE_INFINITY;
       if (0 < scorable.minCompetitiveScore) {
-        maxWindowScore = computeMaxScore(windowMin, windowMax);
+        float maxWindowScore = computeMaxScore(windowMin, windowMax);
+        scoreWindowScoreFirst(collector, acceptDocs, windowMin, windowMax + 1, maxWindowScore);
+      } else {
+        scoreWindowDocFirst(collector, acceptDocs, windowMin, windowMax + 1);

Review Comment:
> I believe we've always had this problem?

I agree that the previous version could not skip windows, but within a window it only needed to run the conjunction on the competitive docs, while this PR may evaluate more. I'm not sure how much this will matter, though. The `FilteredAndHighHigh` task should exercise a similar case, and its numbers don't look bad. Let's move on.
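gf2121's point concerns the work done inside a window. With score-first evaluation, a lead document that cannot be competitive is dropped before the other clause is consulted, whereas doc-first evaluation intersects everything. Below is a minimal Java sketch of that difference. It is loosely in the spirit of `scoreWindowScoreFirst` and `scoreWindowDocFirst` but is not a copy of either. It assumes two clauses, postings as sorted arrays, additive scores, and a known per-window max score for the second clause. It also counts a doc as competitive when its score is `>= minCompetitive`. All names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical model of one window of a two-clause conjunction; not Lucene's scorer code.
// Each method returns {competitive hits, number of times the second clause was consulted}.
class WindowConjunctionSketch {

  // Doc-first: every lead doc is checked against the second clause, then scored.
  static int[] scoreDocFirst(
      int[] leadDocs, float[] leadScores, int[] otherDocs, float[] otherScores,
      float minCompetitive) {
    int hits = 0;
    int checks = 0;
    for (int i = 0; i < leadDocs.length; i++) {
      checks++;
      int j = Arrays.binarySearch(otherDocs, leadDocs[i]);
      if (j >= 0 && leadScores[i] + otherScores[j] >= minCompetitive) {
        hits++;
      }
    }
    return new int[] {hits, checks};
  }

  // Score-first: a lead doc whose score plus the second clause's max score in this window
  // cannot reach minCompetitive is skipped without consulting the second clause.
  static int[] scoreScoreFirst(
      int[] leadDocs, float[] leadScores, int[] otherDocs, float[] otherScores,
      float otherWindowMax, float minCompetitive) {
    int hits = 0;
    int checks = 0;
    for (int i = 0; i < leadDocs.length; i++) {
      if (leadScores[i] + otherWindowMax < minCompetitive) {
        continue; // cannot be competitive, whatever the second clause scores
      }
      checks++;
      int j = Arrays.binarySearch(otherDocs, leadDocs[i]);
      if (j >= 0 && leadScores[i] + otherScores[j] >= minCompetitive) {
        hits++;
      }
    }
    return new int[] {hits, checks};
  }
}
```

As long as `otherWindowMax` really is an upper bound on the second clause's scores in the window, both methods return the same hits. They differ only in how often the second clause is consulted, which is the extra evaluation discussed in this comment.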