Re: [PR] Add isEmpty in PriorityQueue. [lucene]

2025-06-23 Thread via GitHub
vsop-479 commented on PR #14814: URL: https://github.com/apache/lucene/pull/14814#issuecomment-2998497268 Thanks for explaining @msokolov, closed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] Add isEmpty in PriorityQueue. [lucene]

2025-06-23 Thread via GitHub
vsop-479 closed pull request #14814: Add isEmpty in PriorityQueue. URL: https://github.com/apache/lucene/pull/14814

Re: [PR] ci: enable gh annotations with ast-grep [lucene]

2025-06-23 Thread via GitHub
github-actions[bot] commented on PR #14840: URL: https://github.com/apache/lucene/pull/14840#issuecomment-2998468980 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

[PR] ci: enable gh annotations with ast-grep [lucene]

2025-06-23 Thread via GitHub
rmuir opened a new pull request, #14840: URL: https://github.com/apache/lucene/pull/14840 Error is still printed to console, but in a special structured format recognized by actions to highlight and annotate the problem in your PR. When clicking the build failure, annotations can be s

Re: [PR] Update the IOContext on IndexInput rather than the ReadAdvice [lucene]

2025-06-23 Thread via GitHub
github-actions[bot] commented on PR #14702: URL: https://github.com/apache/lucene/pull/14702#issuecomment-2998368176 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] IndexOrDocValuesQuery and IndexSortSortedNumericDocValuesRangeQuery should only be counted once when computing maxClauseCount [lucene]

2025-06-23 Thread via GitHub
github-actions[bot] commented on PR #14759: URL: https://github.com/apache/lucene/pull/14759#issuecomment-2998368018 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Add Query for reranking KnnFloatVectorQuery with full-precision vectors [lucene]

2025-06-23 Thread via GitHub
rmuir commented on code in PR #14009: URL: https://github.com/apache/lucene/pull/14009#discussion_r2162658278 ## lucene/CHANGES.txt: ## @@ -2453,7 +2454,7 @@ New Features * LUCENE-10385: Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery to speed up computi

Re: [PR] Introduce getQuantizedVectorValues method in LeafReader to access QuantizedByteVectorValues [lucene]

2025-06-23 Thread via GitHub
github-actions[bot] commented on PR #14792: URL: https://github.com/apache/lucene/pull/14792#issuecomment-2998156913 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

Re: [PR] Sometimes use `FixedBitSet` when doing HNSW searches [lucene]

2025-06-23 Thread via GitHub
benwtrent commented on PR #14836: URL: https://github.com/apache/lucene/pull/14836#issuecomment-2997126587 @msokolov YEP! Let me do that, sorry, should have posted

Re: [PR] assertDocValuesEquals should support sparse sorted doc_values [lucene]

2025-06-23 Thread via GitHub
github-actions[bot] commented on PR #14839: URL: https://github.com/apache/lucene/pull/14839#issuecomment-2998141485 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

[PR] assertDocValuesEquals should support sparse sorted doc_values [lucene]

2025-06-23 Thread via GitHub
parkertimmins opened a new pull request, #14839: URL: https://github.com/apache/lucene/pull/14839 In LuceneTestCase, assertDocValuesEquals compares the doc_values in two indices. For SortedSetDocValues, as well as other doc value types, it does not assume that every document has a value. Bu

Re: [PR] Sometimes use `FixedBitSet` when doing HNSW searches [lucene]

2025-06-23 Thread via GitHub
benwtrent merged PR #14836: URL: https://github.com/apache/lucene/pull/14836

Re: [PR] Add isEmpty in PriorityQueue. [lucene]

2025-06-23 Thread via GitHub
msokolov commented on PR #14814: URL: https://github.com/apache/lucene/pull/14814#issuecomment-2997973738 I believe the reason Collection has isEmpty() is there may be a way to more efficiently know if size > 0? But that doesn't apply here, and I don't think we ought to be adding sugar meth

Re: [PR] Sometimes use `FixedBitSet` when doing HNSW searches [lucene]

2025-06-23 Thread via GitHub
msokolov commented on PR #14836: URL: https://github.com/apache/lucene/pull/14836#issuecomment-2997958719 Thanks! Yes it makes sense to me to specialize when we think we know which approach is better. I'm kind of curious about the log(N) heuristic - I suppose tweaking it won't make a big di

Re: [I] Integrate a JVector codec for KNN searches [lucene]

2025-06-23 Thread via GitHub
dsmiley commented on issue #14681: URL: https://github.com/apache/lucene/issues/14681#issuecomment-2997685023 Non-default codecs don't have the high support burden that the default codec has, in terms of backwards compatibility and general documentation expectations. Few users will choose

Re: [PR] Make it possible to extend PatienceKnnQuery [lucene]

2025-06-23 Thread via GitHub
benwtrent commented on code in PR #14838: URL: https://github.com/apache/lucene/pull/14838#discussion_r2162003447 ## lucene/core/src/java/org/apache/lucene/search/PatienceKnnVectorQuery.java: ## @@ -123,8 +123,24 @@ public static PatienceKnnVectorQuery fromSeededQuery(SeededKnn

Re: [PR] Add Query for reranking KnnFloatVectorQuery with full-precision vectors [lucene]

2025-06-23 Thread via GitHub
vigyasharma commented on code in PR #14009: URL: https://github.com/apache/lucene/pull/14009#discussion_r2162100578 ## lucene/core/src/java/org/apache/lucene/search/RescoreTopNQuery.java: ## @@ -0,0 +1,128 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

Re: [PR] Sometimes use `FixedBitSet` when doing HNSW searches [lucene]

2025-06-23 Thread via GitHub
benwtrent commented on PR #14836: URL: https://github.com/apache/lucene/pull/14836#issuecomment-2997236824 @msokolov [baseline_and_candidate_jfr.zip](https://github.com/user-attachments/files/20869476/baseline_and_candidate_jfr.zip) Additional stats, bit compressed, 50k docs.

Re: [PR] Small enhancements to IndexWriter's InfoStream to support segment tracing [lucene]

2025-06-23 Thread via GitHub
msokolov commented on PR #14837: URL: https://github.com/apache/lucene/pull/14837#issuecomment-2997116185 Seems a bit weird to include a merge policy change in a PR that is mostly about changing logging, but OK. I think I would be happier if the PR was entitled "Don't invoke merge-on-commit

Re: [PR] Sometimes use `FixedBitSet` when doing HNSW searches [lucene]

2025-06-23 Thread via GitHub
msokolov commented on PR #14836: URL: https://github.com/apache/lucene/pull/14836#issuecomment-2997121067 > The expectedVisitedNodes is both empirical and intuitive. This could possibly be refined given the number of connections within the graph, but I think this is "good enough" for now.

Re: [I] [Minor] Release email URLs not clickable in some email clients [lucene]

2025-06-23 Thread via GitHub
stefanvodita commented on issue #12119: URL: https://github.com/apache/lucene/issues/12119#issuecomment-2997102505 Yeah, maybe. Both Gmail and Outlook work correctly for me now though 🤷

Re: [PR] Make it possible to extend PatienceKnnQuery [lucene]

2025-06-23 Thread via GitHub
github-actions[bot] commented on PR #14838: URL: https://github.com/apache/lucene/pull/14838#issuecomment-2997087456 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

Re: [PR] Make it possible to extend PatienceKnnQuery [lucene]

2025-06-23 Thread via GitHub
tteofili commented on PR #14838: URL: https://github.com/apache/lucene/pull/14838#issuecomment-2997086215 nit: this also fixes a minor javadoc issue

[PR] Make it possible to extend PatienceKnnQuery [lucene]

2025-06-23 Thread via GitHub
tteofili opened a new pull request, #14838: URL: https://github.com/apache/lucene/pull/14838 Currently `PatienceKnnQuery` extends and is constructed by an `AbstractKnnQuery`, which is not public and therefore prevents other classes from extending it (as the related ctor is also package private).

Re: [I] [Minor] Release email URLs not clickable in some email clients [lucene]

2025-06-23 Thread via GitHub
rmuir commented on issue #12119: URL: https://github.com/apache/lucene/issues/12119#issuecomment-2997015959 Maybe it has to do with whether the email had html or not, I could see the `>` becoming a `&gt;` and then get treated differently by different clients?

Re: [PR] Small enhancements to IndexWriter's InfoStream to support segment tracing [lucene]

2025-06-23 Thread via GitHub
github-actions[bot] commented on PR #14837: URL: https://github.com/apache/lucene/pull/14837#issuecomment-2996819436 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

[PR] Small enhancements to IndexWriter's InfoStream to support segment tracing [lucene]

2025-06-23 Thread via GitHub
mikemccand opened a new pull request, #14837: URL: https://github.com/apache/lucene/pull/14837 Some small fixes to IndexWriter's InfoStream logging, uncovered when working on the new segment tracing tool from luceneutil (example: https://githubsearch.mikemccandless.com/segments_15.html). I

Re: [PR] Sometimes use `FixedBitSet` when doing HNSW searches [lucene]

2025-06-23 Thread via GitHub
benwtrent commented on code in PR #14836: URL: https://github.com/apache/lucene/pull/14836#discussion_r2161790605 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java: ## @@ -41,6 +41,16 @@ public class HnswGraphSearcher extends AbstractHnswGraphSearcher {

Re: [PR] Sometimes use `FixedBitSet` when doing HNSW searches [lucene]

2025-06-23 Thread via GitHub
john-wagster commented on code in PR #14836: URL: https://github.com/apache/lucene/pull/14836#discussion_r2161776441 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java: ## @@ -41,6 +41,16 @@ public class HnswGraphSearcher extends AbstractHnswGraphSearche

Re: [I] Integrate a JVector codec for KNN searches [lucene]

2025-06-23 Thread via GitHub
benwtrent commented on issue #14681: URL: https://github.com/apache/lucene/issues/14681#issuecomment-2996722590 > It's not an either-or; it's both. The integration may better highlight how Lucene core can be improved as it allows a more "apples to apples" comparison for performance, e

[PR] Sometimes use `FixedBitSet` when doing HNSW searches [lucene]

2025-06-23 Thread via GitHub
benwtrent opened a new pull request, #14836: URL: https://github.com/apache/lucene/pull/14836 For smaller graphs, the overhead cost of a SparseFixedBitSet shows up in the performance metrics. This adjusts the bitset creation logic to be more similar to how we utilize [Sparse]FixedBit

Re: [PR] Sometimes use `FixedBitSet` when doing HNSW searches [lucene]

2025-06-23 Thread via GitHub
github-actions[bot] commented on PR #14836: URL: https://github.com/apache/lucene/pull/14836#issuecomment-2996663562 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

Re: [I] Integrate a JVector codec for KNN searches [lucene]

2025-06-23 Thread via GitHub
dsmiley commented on issue #14681: URL: https://github.com/apache/lucene/issues/14681#issuecomment-2996611142 bq. I do think there are things that Lucene can learn from JVector. It would be way better for the Lucene community as a whole to "do the hard thing" and improve Lucene directly ins

Re: [I] Expand TieredMergePolicy deletePctAllowed limits [lucene]

2025-06-23 Thread via GitHub
stefanvodita commented on issue #11761: URL: https://github.com/apache/lucene/issues/11761#issuecomment-2996527281 It's a good point @jpountz! Thank you also for the extra attention to TMP in #14823. I've had a look at the Amazon Product Search indexes and confirmed that the deletes perc

Re: [I] Integrate a JVector codec for KNN searches [lucene]

2025-06-23 Thread via GitHub
benwtrent commented on issue #14681: URL: https://github.com/apache/lucene/issues/14681#issuecomment-2996401932 > if the FP vectors are not stored in memory we have noticed that the graph structure is overall pretty lean and can fit on the JVM heap pretty easily, even on low heaps. T

Re: [I] action failures sent to bui...@lucene.apache.org [lucene]

2025-06-23 Thread via GitHub
rmuir commented on issue #14687: URL: https://github.com/apache/lucene/issues/14687#issuecomment-2996376085 we could also investigate https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features#Git.asf.yamlfeatures-GitHubActionsbuildstatusemails but I don't know if it will sp

Re: [PR] Add Query for reranking KnnFloatVectorQuery with full-precision vectors [lucene]

2025-06-23 Thread via GitHub
benwtrent commented on PR #14009: URL: https://github.com/apache/lucene/pull/14009#issuecomment-2996328248 kNN queries are completed in the rewrite phase, if any rescoring needs to be done, it should be done during that full phase. I would expect the experience to be: `Rescore

Re: [I] A workflow "bot" that would apply formatting fixes and create a PR against a PR? [lucene]

2025-06-23 Thread via GitHub
dweiss commented on issue #14835: URL: https://github.com/apache/lucene/issues/14835#issuecomment-2996310019 This looks nice. > Currently all tools just print to the console in CI. Not all of them - this one creates a report that's part of the action's result - https://githu

Re: [I] [Minor] Release email URLs not clickable in some email clients [lucene]

2025-06-23 Thread via GitHub
stefanvodita closed issue #12119: [Minor] Release email URLs not clickable in some email clients URL: https://github.com/apache/lucene/issues/12119

Re: [I] A workflow "bot" that would apply formatting fixes and create a PR against a PR? [lucene]

2025-06-23 Thread via GitHub
rmuir commented on issue #14835: URL: https://github.com/apache/lucene/issues/14835#issuecomment-2995975774 We could also try e.g. `--format github` option of tools to get better integration. I think they typically use "annotations" and might be much more friendly I'm not sure how autofixes

Re: [PR] build: replace six simple error prone checks [lucene]

2025-06-23 Thread via GitHub
rmuir merged PR #14831: URL: https://github.com/apache/lucene/pull/14831

[I] A workflow "bot" that would apply formatting fixes and create a PR against a PR? [lucene]

2025-06-23 Thread via GitHub
dweiss opened a new issue, #14835: URL: https://github.com/apache/lucene/issues/14835 ### Description There are more different tools now used to harness code quality. I wonder if it'd be possible to create a workflow that would: * run a set of designated formatting-validation tasks

Re: [I] [Minor] Release email URLs not clickable in some email clients [lucene]

2025-06-23 Thread via GitHub
stefanvodita commented on issue #12119: URL: https://github.com/apache/lucene/issues/12119#issuecomment-2995448188 I've not observed the same thing with the new 9.12.2 and 10.2.2 releases. Maybe it was just a one-time issue. I'm not convinced my original understanding was right anyway.

Re: [I] Remove -XX:ActiveProcessorCount=1 from template.gradle.properties [lucene]

2025-06-23 Thread via GitHub
dweiss commented on issue #14829: URL: https://github.com/apache/lucene/issues/14829#issuecomment-2995105823 Yes, I think it's reasonable. It is many, many different moving pieces in gradle - I couldn't figure out how the number of reported cores affects the final performance (but then - I