Re: [PR] deps(java): bump org.apache.rat:apache-rat from 0.14 to 0.16.1 [lucene]

2025-05-11 Thread via GitHub
rmuir commented on PR #14582: URL: https://github.com/apache/lucene/pull/14582#issuecomment-2870547128 First bumping to 0.15 via #14648; we can rebase the bot here after that. 0.16.x seems like a bigger change based on https://creadur.apache.org/rat/changes-report.html, so there

[PR] upgrade from rat 0.14 to rat 0.15 [lucene]

2025-05-11 Thread via GitHub
rmuir opened a new pull request, #14648: URL: https://github.com/apache/lucene/pull/14648 This upgrade doesn't break our build; it seems the API changes that cause issues might begin with 0.16: https://creadur.apache.org/rat/changes-report.html -- This is an automated message

[PR] Add comment about using InverseIntersectVisit and IntersectVisitor. [lucene]

2025-05-11 Thread via GitHub
vsop-479 opened a new pull request, #14647: URL: https://github.com/apache/lucene/pull/14647 ### Description

Re: [I] Dynamic threshold for DocIdSetBuilder [lucene]

2025-05-11 Thread via GitHub
prudhvigodithi commented on issue #14485: URL: https://github.com/apache/lucene/issues/14485#issuecomment-2870497304 So from my understanding, instead of creating one large BitSet for the entire segment (sized for maxDoc), the suggestion is to: - Create a smaller BitSet that only cove

[PR] Add instructions to help/IDEs.txt for VSCode and Neovim [lucene]

2025-05-11 Thread via GitHub
rmuir opened a new pull request, #14646: URL: https://github.com/apache/lucene/pull/14646 Both of these use the eclipse language server, so they just leverage existing `gradlew eclipse`. The trick is to disable Eclipse Language Server's built-in gradle integration and just use the .c

Re: [PR] build(deps): bump ruff from 0.11.7 to 0.11.8 in /dev-tools/scripts [lucene]

2025-05-11 Thread via GitHub
rmuir merged PR #14603: URL: https://github.com/apache/lucene/pull/14603

Re: [PR] build(deps): bump ruff from 0.11.7 to 0.11.8 in /dev-tools/scripts [lucene]

2025-05-11 Thread via GitHub
rmuir commented on PR #14603: URL: https://github.com/apache/lucene/pull/14603#issuecomment-2870383819 @dependabot rebase

Re: [PR] deps(java): bump de.jflex:jflex from 1.8.2 to 1.9.1 [lucene]

2025-05-11 Thread via GitHub
rmuir merged PR #14583: URL: https://github.com/apache/lucene/pull/14583

[I] investigate jflex 1.9.x buffer size/expansion feature [lucene]

2025-05-11 Thread via GitHub
rmuir opened a new issue, #14645: URL: https://github.com/apache/lucene/issues/14645 ### Description #14583 only bumps the dependency and regenerates, but doesn't take advantage of the new features. I think we are currently taking care of this with skeleton files in `gradle/generatio

Re: [PR] deps(java): bump de.jflex:jflex from 1.8.2 to 1.9.1 [lucene]

2025-05-11 Thread via GitHub
rmuir commented on code in PR #14583: URL: https://github.com/apache/lucene/pull/14583#discussion_r2083651585 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/classic/ClassicTokenizerImpl.java: ## @@ -438,6 +436,16 @@ public final void setBufferSize(int numChars) {

Re: [I] Create a bot to check if there is a CHANGES entry for new PRs [lucene]

2025-05-11 Thread via GitHub
stefanvodita commented on issue #13898: URL: https://github.com/apache/lucene/issues/13898#issuecomment-2870276813 I've been monitoring the jobs after the most recent batch of fixes and I'm happy with the results. The only change that the bot got wrong was #14638 ([logs](https://github.com/

[PR] Enable changelog verifier [lucene]

2025-05-11 Thread via GitHub
stefanvodita opened a new pull request, #14644: URL: https://github.com/apache/lucene/pull/14644 The changelog verifier will start to post comments on PRs and to add milestones.

Re: [PR] Use the preload hint on completion fields and memory terms dictionaries. [lucene]

2025-05-11 Thread via GitHub
jpountz merged PR #14634: URL: https://github.com/apache/lucene/pull/14634

Re: [PR] Clean up FileTypeHint a bit. [lucene]

2025-05-11 Thread via GitHub
jpountz merged PR #14635: URL: https://github.com/apache/lucene/pull/14635

Re: [I] Nightly benchmark regression on 2025.05.01 [lucene]

2025-05-11 Thread via GitHub
rmuir commented on issue #14630: URL: https://github.com/apache/lucene/issues/14630#issuecomment-2870192359
> Oh, hmmm, maybe not -- JDK 23 EOL'd.
You can still download it the old-fashioned way for a test: https://www.oracle.com/java/technologies/javase/jdk23-archive-downloads.html

Re: [I] Promote sandbox facets to the main facets module [lucene]

2025-05-11 Thread via GitHub
jpountz commented on issue #14619: URL: https://github.com/apache/lucene/issues/14619#issuecomment-2870187602 Facets already put the burden of choosing between taxonomy and doc-value-based faceting on users. If we introduce a new approach for faceting, I worry that it would make things even

Re: [I] Segment count (merging) can impact recall on KNN ParentJoin queries [lucene]

2025-05-11 Thread via GitHub
jpountz commented on issue #14643: URL: https://github.com/apache/lucene/issues/14643#issuecomment-2870180042 Why are the recall values so bad with parent-join queries (whether merging is enabled or not)? Is there a bug?

Re: [I] Nightly benchmark regression on 2025.05.01 [lucene]

2025-05-11 Thread via GitHub
msokolov closed issue #14630: Nightly benchmark regression on 2025.05.01 URL: https://github.com/apache/lucene/issues/14630

Re: [I] Segment count (merging) can impact recall on KNN ParentJoin queries [lucene]

2025-05-11 Thread via GitHub
msokolov commented on issue #14643: URL: https://github.com/apache/lucene/issues/14643#issuecomment-2869830216 Sadly, this is expected. It's not only parent-join, but any kind of approximate NN search. Think of the limit where we have as many segments as there are documents, recall will alw

Re: [I] TopFieldCollector mistakenly assumes that all leaves share the same index sort [lucene]

2025-05-11 Thread via GitHub
msokolov commented on issue #14399: URL: https://github.com/apache/lucene/issues/14399#issuecomment-2869828209 Would it make sense to have different collectors for the two cases, one with and one without a cache?